Episode: 3305
Title: HPR3305: Nagios part 2
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3305/hpr3305.mp3
Transcribed: 2025-10-24 20:30:02

---

This is Hacker Public Radio Episode 3305 for Friday, the 2nd of April 2021. Today's show is entitled "Nagios Part 2". It is hosted by norrist, is about 24 minutes long, and carries a clean flag. The summary is: follow-up to HPR3264; notifications, SNMP, remote checks.

This episode of HPR is brought to you by archive.org. Support universal access to all knowledge by heading over to archive.org/donate.

Well, I didn't get any feedback on my first Nagios episode, so the only reasonable explanation for that is that I perfectly explained what Nagios is and what it does, and the installation instructions I provided were flawless. Since no one had any questions or comments, I'm going to move on to some additional Nagios topics.

One thing I forgot to talk about in the intro is some reasons you may want to use Nagios, and why Nagios is interesting to hobbyists. First, it's just learning something new for the sake of learning something new. To me, that's always fun; I always want to know something about everything. Related to that, Nagios, or network monitoring in general, is a pretty valuable IT skill. If you work in IT or want to work in IT, generally knowing how network monitoring works is a positive, and depending on the position you take or the company you work at, knowing Nagios specifically will probably help.

Then, outside of just knowledge, there are some practical benefits of Nagios around the house. One, it might help you detect early signs of equipment failure. If you're monitoring a device and you start noticing that things are occasionally dropping, you may get the idea that it's not working like it's supposed to; or if you see disk errors or disks getting full, that could be an early warning that you may have some hardware to replace.

Next is monitoring self-hosted applications. If you have a VPN set up, or a version of ownCloud or something, and you want some notification that these things are down before you try to use them, something like Nagios can help.

And one of the big things I use Nagios for at home is my home security system. I have a lot of IP cameras: I have Ring cameras outside and some Amcrest cameras inside the house. I want to know when those things stop working; it'd be nice to know they were down before I go to look at something. So one of the classes of devices that I monitor is my network cameras. So far, every camera I've bought at least responds to a ping, so they can all be monitored by Nagios. And earlier, when I was talking about Pushover, I said that you can assign the Pushover contact to some services, not every service. So what I do is assign the Pushover contact to the security devices that I deem the most critical. Then I get a notification on my phone if they go offline, if the battery dies, or if the switch accidentally gets turned off, or something like that.

So there are a lot of benefits to learning Nagios, but none of them are really specific to Nagios itself. There are plenty of other options out there for monitoring, and all of them are worth exploring.

If you're looking for an alternative to Nagios, there are Nagios forks; Icinga is probably the most well-known fork of Nagios. And then there are plenty of other network monitoring tools. A couple that I recommend: one, if you have a Windows server, there's a network monitoring program called PRTG. It's a commercial program. It's not free software or anything like that, but it does have a free tier: you can run up to a certain number of checks for free, though I don't remember exactly how many. I like it because it's really simple to use and really simple to set up. Really, the downsides are that it requires Windows and that it's non-free. The other recommendation I can make, if you're just looking for an alternative to Nagios, especially if you're using Kubernetes or Docker, is Prometheus, which is really good at collecting data, and it has something called Alertmanager, which lets you alert off the data that's collected.

Unfortunately, spammers have ruined the ability to send email directly. Because spam from malware is such a problem, a lot of ISPs block sending email by blocking outbound port 25. Most residential ISPs do it, and even some hosting providers do it. But even if your hosting provider doesn't block outbound port 25, most mail servers are not going to accept mail from an IP range that's known to be a residential or VPS range, just because the risk of those emails being spam is so high. So there are a few ways to get around not being able to send email directly. I use an email sending service; the particular one I use is called SendGrid. SendGrid does all the work of keeping itself off the blacklists of known spammers, and most email servers will accept email that's sent via SendGrid. I'm not going to go into the specific instructions for configuring Postfix to relay email via SendGrid, but the SendGrid documentation is really good, and if that's the service you choose to use, their documentation will walk you through setting up Postfix to relay via SendGrid.
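
The author relays through Postfix and leaves the details to SendGrid's documentation. As a rough illustration of the same idea (submitting mail to an authenticated relay on the submission port, 587, with STARTTLS rather than delivering it directly on port 25), here is a minimal Python sketch using only the standard library. It is not part of the author's setup, and the relay host, port, and credentials are assumptions; use whatever your sending service documents.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text alert message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg: EmailMessage, host: str, port: int = 587,
                   username: Optional[str] = None,
                   password: Optional[str] = None) -> None:
    """Hand the message to an authenticated relay instead of delivering it
    directly to the recipient's mail server on port 25."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()  # encrypt the session before sending credentials
        if username is not None and password is not None:
            smtp.login(username, password)
        smtp.send_message(msg)
```

For example, send_via_relay(build_alert("nagios@example.lan", "me@example.com", "PROBLEM: cam1 DOWN", "Host cam1 is DOWN"), "smtp.relay.example", username="USER", password="SECRET"), where every address, host name, and credential is a placeholder.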

The other thing you'll need to do is make sure that the address you're sending to has a good alias. By default, Nagios sends its alerts to the Nagios admin at localhost, so you'll need an email alias set up to forward the emails for the Nagios admin to a mailbox that you're watching. There are plenty of other services out there like SendGrid: Amazon has its Simple Email Service, and there are things like Mailchimp. If you look around, there are probably plenty of options, and I think most of them have a free tier. I've never had to pay for my Nagios alerts, so unless you're blasting out hundreds of alerts a day, you can probably stay within the free tier of most of these mail providers.

Besides email notification, there are a couple of other notification options you can use, specifically if you want to get alerts on your phone. Probably the simplest way to get Nagios alerts on your phone is to install the Android app called aNag, A-N-A-G. It connects directly to the Nagios UI, just like you would in the browser, and it can periodically check the status of Nagios and generate phone notifications based on the status of the Nagios checks.

One of the downsides to aNag is that the phone has to be able to connect directly to the Nagios server. So if the Nagios server is on a private network, you may need a VPN or something similar to reach it from your phone. You might be tempted to put Nagios on the public internet, and that's okay. Nagios itself is secure, but it can be considered a security risk: if someone is able to brute-force your Nagios password, they can get a lot of information about your network. So if you decide to put Nagios on the public internet, take security very seriously. I recommend only using HTTPS and only ever logging in to Nagios via HTTPS.

If you need help setting up Apache with HTTPS, there are a lot of good guides out there, and there are even tools like Certbot that can get a Let's Encrypt certificate and configure Apache for you.

Another option for getting push alerts on your phone is a push notification service. Again, there are a few of these, but I use one called Pushover, at pushover.net. I like the Pushover app. It's a commercial service, but instead of paying monthly or per push notification, you pay $5 in the Play Store when you download the app. There is a free trial if you want to try it. Right now it's 30 days; it's been seven days in the past, but I'm sure that whenever you get around to looking at it, if it's what you want to use, they'll give you some free time to try it out.

To use Pushover with Nagios, we need to add a Pushover contact to the Nagios configs. When a notification is sent to the new Pushover contact, the contact definition runs a script that calls the Pushover API via curl. I'm sure you remember from the previous Nagios episode that in /etc/nagios there's a conf.d directory where you can put any config file and Nagios will load it. As long as it's named something.cfg, Nagios will find all the files in that directory and load them. So we'll create a file called pushover.cfg, put it in there, and then restart Nagios. I'm not going to bore you by reading out the contents of pushover.cfg, but it'll be in the show notes.
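
The author's pushover.cfg (in the show notes) calls the Pushover API with curl. As an alternative, here is a minimal Python sketch of a notification script making the same kind of call: it form-encodes the token, user, title, and message fields and POSTs them to Pushover's messages endpoint. The script name and its argument order are hypothetical; check Pushover's API documentation for the current fields and limits.

```python
import sys
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_payload(token: str, user: str, title: str, message: str) -> bytes:
    """Form-encode the fields the Pushover message API expects."""
    fields = {"token": token, "user": user, "title": title, "message": message}
    return urllib.parse.urlencode(fields).encode("utf-8")


def send(payload: bytes, timeout: float = 10.0) -> int:
    """POST the payload to Pushover and return the HTTP status code."""
    req = urllib.request.Request(PUSHOVER_URL, data=payload, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical argument order: app token, user key, title, message.
    print(send(build_payload(*sys.argv[1:5])))
```

For a service notification, the contact's notification command could call it as notify_pushover.py APP_TOKEN USER_KEY "$NOTIFICATIONTYPE$: $HOSTNAME$" "$SERVICEOUTPUT$", where the script path, token, and user key are placeholders.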

If you want to use Pushover for some specific checks, you can just add the contact to those checks, whether that's a single check or several. Or, if you want all your notifications to go via Pushover, you can modify the template definition that's used for the hosts and services and change the contact from the default to Pushover.

One of the benefits of Nagios is that you can write your own checks. If there's no plugin for what you want to monitor but you can write a script for it, Nagios can check it. Remember from the previous episode that the status of a Nagios check is based on its exit code: if the script exits with zero, that's OK; if it exits with one, that's a warning; and if it exits with two, that's critical. So to write a custom check, you just need a script that performs the check, uses some logic to work out the right exit code, and exits with that code.

As an example, I'll use a custom check that I wrote recently. I have a server that collects syslog from around the network, and occasionally, I don't know why, the syslog daemon just stops running. So instead of trying to figure out what's wrong with my syslog server, I just wrote a script to check that the syslog file is actually being written to and updated.

The log file's name contains the date, so the script looks for the file named with today's date plus .log, and then tests that the file has been modified within the last few minutes. The script exits zero if the syslog file is less than a minute old. It exits one, which is a warning, if it's less than 10 minutes old. And it exits two, which is critical, if it's more than 10 minutes old or if it can't find the file at all.

Since the server with the crashy syslog daemon isn't the same server as Nagios, we have to be able to run the script on a remote system. Nagios has a few different ways to run commands on remote servers. I prefer SSH, so that's the only one I'm going to talk about. There are some disadvantages to using SSH specifically: SSH is a fairly heavy network connection compared to some of the other options, so if you have a lot of checks to run, you may want to look at something other than executing via SSH.

There's a check_by_ssh plugin that's used to run checks on remote systems. Typically, you'll set up SSH key authentication for the user that runs Nagios, which in most cases is named nagios. What you want is for the nagios user to be able to log into the remote system without having to enter a password. Again, one cool thing about Nagios plugins and Nagios checks is that they can all be tested outside of Nagios, so you can try the command before you set it up in Nagios to make sure it's working. To test running a remote plugin via SSH, you can cd to the plugins directory and then run the check_by_ssh script with some flags: -H for the host, -u for the user, and capital -C for the path to the check on the remote server. Once you've verified the syntax of the check_by_ssh command you want to use, you can add it to a command file in the Nagios conf.d directory. Again, I'm not going to bore you by reading off Nagios configs, but the example will be in the show notes. Now that you've added the command definition, you can use the check_syslog_age command as a service check for one of your hosts.
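
Since plugins can be tested outside Nagios, it can help to see exactly what Nagios does with a plugin's result: it reads the exit code (0 OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN) and uses the first line of output as the status text. The following Python sketch is a hypothetical helper, not a Nagios tool, that runs a plugin command and reports the state the same way. Treating timeouts and out-of-range exit codes as UNKNOWN is this sketch's own choice.

```python
import subprocess
import sys

# Plugin exit codes and the states Nagios maps them to.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv: list[str], timeout: float = 60.0) -> tuple[str, str]:
    """Run a plugin command and return (state name, first line of output)."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "UNKNOWN", f"timed out after {timeout:g} seconds"
    lines = result.stdout.splitlines()
    return STATES.get(result.returncode, "UNKNOWN"), (lines[0] if lines else "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the plugin command to run.
    state, output = run_check(sys.argv[1:])
    print(f"{state}: {output}")
```

For instance: python3 run_check.py /usr/lib/nagios/plugins/check_by_ssh -H syslog-server -C /usr/local/lib/nagios/check_syslog_age, where the host name and remote script path are hypothetical.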

The script I'm using to check the syslog age will be in the show notes. Not that you would have this specific use, but you can look and see how I'm doing the test and how I structured the logic of the script.
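
The author's actual script is in the show notes. What follows is an independent Python sketch that applies the thresholds he describes: OK if today's log was modified less than a minute ago, WARNING if less than 10 minutes ago, and CRITICAL if older or missing. The YYYY-MM-DD.log naming and the log-directory argument are assumptions, since the transcript only says the file name contains the date.

```python
import datetime
import os
import sys
import time
from typing import Optional, Tuple

OK, WARNING, CRITICAL = 0, 1, 2


def todays_log(directory: str, today: Optional[datetime.date] = None) -> str:
    """Path of today's log file (the YYYY-MM-DD.log naming is assumed)."""
    today = datetime.date.today() if today is None else today
    return os.path.join(directory, f"{today:%Y-%m-%d}.log")


def syslog_age_state(path: str, now: Optional[float] = None,
                     warn_after: float = 60, crit_after: float = 600) -> Tuple[int, str]:
    """Return (exit code, status line) based on the file's modification age."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:  # missing or unreadable file counts as critical
        return CRITICAL, f"CRITICAL - {path} not found"
    if age < warn_after:
        return OK, f"OK - {path} updated {age:.0f}s ago"
    if age < crit_after:
        return WARNING, f"WARNING - {path} updated {age:.0f}s ago"
    return CRITICAL, f"CRITICAL - {path} updated {age:.0f}s ago"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: check_syslog_age.py /path/to/log/directory
    code, message = syslog_age_state(todays_log(sys.argv[1]))
    print(message)  # Nagios shows this line as the check output
    sys.exit(code)
```

Passing now explicitly makes the thresholds easy to test without waiting for a real log file to age.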

Another common method for monitoring servers with Nagios is SNMP. SNMP is really complicated, and honestly I have some mixed feelings about using it. I can't go into all the different variations and versions of SNMP or all the different SNMP authentication options, but I will show you how to get a minimal setup working so you can monitor a few things via SNMP.

And I want to give you a warning: the SNMP authentication option I'm going to demonstrate today is only appropriate for isolated networks. If you plan on using SNMP over a public network, I highly recommend looking at the more secure versions of SNMP, or even tunneling your SNMP traffic over SSH or a VPN. If you want to know more about SNMP (I'm definitely not an expert), one of my very favorite tech authors, Michael Lucas, just recently released a book called SNMP Mastery. It's a good book and goes into a lot of detail. If you need to know how to use SNMP securely, this is the source you need.

First we'll talk about the setup on the clients, the servers being monitored. On Ubuntu it's easy: just apt install snmpd. Then we need to make a few changes to the config file. By default, snmpd only listens on localhost, so you'll have to replace snmpd.conf; I'll have an example of the snmpd.conf I use in the show notes. It changes the listening address to all IP addresses, and it also sets a read-only community string. And again, using community strings is insecure, so don't do it over the internet. Finally, restart snmpd on the clients, and if you're using a firewall, open up the SNMP port, UDP 161. And again, last warning: if you put it on the internet, don't do it this way. Now we'll talk a little bit about setting up the checks in Nagios.

If you remember from the last episode, we talked about the directory /etc/nagios-plugins/config, where a lot of checks are already set up for you. Remember, a moment ago, when we were talking about the custom script, we had to write our own command to go along with it. That's not the case with these SNMP checks; they're already defined for you. If you look through snmp.cfg in /etc/nagios-plugins/config, you can get an idea of all the different checks you can do via SNMP. I'll have an example of some client configuration using SNMP in the show notes, and you can look through that as well to see how the command definitions are used.

Most of the SNMP definitions require or can take arguments. Whenever you need to pass an argument to a command, that argument is represented in the command definition by a placeholder: $ARG1$, ARG1 wrapped in dollar signs. In a lot of cases the arguments are optional. These SNMP checks require the community string, and some of the disk checks also require the disk number as an argument. In the service check definitions, the arguments that you pass to the commands are separated by exclamation points. You can also see that in the example.
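
To make the placeholder mechanics concrete, here is a simplified Python model of how a service's check_command value such as name!arg1!arg2 fills $ARG1$ and $ARG2$ in a command definition. It only illustrates the substitution: Nagios does this itself and also resolves other macros such as $HOSTADDRESS$, plus edge cases this sketch ignores. my_snmp_check and its options are hypothetical placeholders, not the real definitions from snmp.cfg.

```python
import re


def expand_args(command_line: str, check_command: str) -> str:
    """Fill $ARG1$, $ARG2$, ... in command_line from the '!'-separated values
    that follow the command name in check_command. Other macros are left as-is."""
    _command_name, *args = check_command.split("!")

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # In this sketch, a placeholder with no matching argument becomes empty.
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, command_line)
```

For example, expanding "my_snmp_check -H $HOSTADDRESS$ --community $ARG1$ --disk $ARG2$" with "my_snmp_disk!public!1" gives "my_snmp_check -H $HOSTADDRESS$ --community public --disk 1".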

Also, in the examples you can see how I add additional contacts, the Pushover contacts, and I have an example of how to change the number of check attempts. Normally the default is fine: a check has to fail five times before Nagios alerts, but if you want to change that to one, that's the max_check_attempts setting. You can see that in the example. I also have an example of changing the frequency of the check from the default of five minutes to one minute.

Another thing I like to do with Nagios is use it to monitor my remote servers for security updates. Nagios has plugins that can check whether system updates are required. They can tell you things like the number of pending updates; the check will be critical if any of the updates are security-related; and they can also tell you if a reboot is required because you've updated the kernel but you're not running the latest kernel. The check plugin has to be installed on the remote server. For Debian-based systems, the package is nagios-plugins-contrib, and that's a big package: it comes with a lot of checks, more than just one to see whether you need apt updates. On Red Hat-based systems, the package is nagios-plugins-check-updates. The command definitions for how to run it are listed in the show notes. These plugins take longer to run than typical checks, so you'll see in the example that I set the timeout with the -t flag to 120 seconds, just to give them extra time to do everything they have to do.

That's probably all the Nagios I can handle for now. If you have any comments or questions, leave them under the episode. Again, thanks for listening, and I'll see you guys next time.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contributing link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the Binary Revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.