hpr-knowledge-base/hpr_transcripts/hpr3305.txt

Episode: 3305
Title: HPR3305: Nagios part 2
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3305/hpr3305.mp3
Transcribed: 2025-10-24 20:30:02

---

This is Hacker Public Radio Episode 3305 for Friday, the 2nd of April 2021.
Today's show is entitled, Nagios Part 2.
It is hosted by norrist and is about 24 minutes long and carries a clean flag.
The summary is, follow up to hpr3264: notifications, SNMP, remote checks.
This episode of HPR is brought to you by archive.org.
Support universal access to all knowledge by heading over to archive.org forward slash donate.
Well, I didn't get any feedback on my first Nagios episode, so the only reasonable explanation
for that is that I perfectly explained what Nagios is and what it does and the installation
instructions that I provided were flawless.
So, no one had any questions or comments, so I'm going to move on to some additional Nagios topics.
One thing I forgot to talk about in the intro is some reasons you may want to use Nagios, or
why Nagios is interesting to hobbyists. So first, it's just learning something new for the sake of
learning something new. To me, that's always fun. I always want to know something about everything.
Related is that Nagios, or network monitoring in general, is a pretty valuable IT skill. So,
if you work in IT or you want to work in IT, generally knowing how network monitoring works is
a positive, and depending on the position you take or the company you work at,
knowing Nagios specifically will probably help. Then outside of just knowledge, you know,
some practical benefits of Nagios around the house: one, it might help you detect
some early signs of equipment failure. So, if you're monitoring a device and you start noticing
that things are occasionally dropping or something like that, you may get the idea that, you know,
it's not working like it's supposed to, or if you see disk errors or disks getting full,
that could be sort of an early warning that you may have some hardware to replace.
Next is, you know, monitoring self-hosted applications. So, if you have a VPN set up or
a version of ownCloud or something and you want some notification that these things are down before
you try to use them, something like Nagios can help. And then one of the big things I use
Nagios for at home is my home security system. I have a lot of
IP cameras, you know, I have Ring cameras outside and I have some Amcrest cameras inside
the house. And I want to know, you know, when those things stop working. It'd be nice to know
before I go to look at something that they were down. So, one of the groups
or classes of devices that I monitor is my network cameras. So far, every camera I've bought at least
responds to a ping, so they can all be monitored by Nagios. And earlier when I was talking about
Pushover, you know, I said that you can assign the Pushover contact to some services, not every service.
So, what I'll do is I'll assign the Pushover contact to the security devices that I, you know,
deem the most critical. Then I'll get a notification on my phone if they go offline, if the battery dies
or if the switch accidentally gets turned off or something like that. So, there's a lot of
benefits to learning Nagios, but none of them are really specific to Nagios itself. So, there's
plenty of other options out there for monitoring. All of them are worth exploring.
You know, if you're looking for an alternative to Nagios, there are Nagios forks.
Icinga is probably the most well-known fork of Nagios. And then there's some other network
monitoring tools. There's plenty of others. A couple others that I recommend: one is,
if you have a Windows server, there's a network monitoring program called PRTG.
It's a commercial program. It's not free or free software or anything like that, but it does have a
free tier. You can do up to X number of checks, and I don't remember how many checks it is, but you
can do X number of checks for free. And I like it because it's really simple to use and it's
really simple to set up. Really, the downsides are that it requires Windows and that it's
non-free. The other recommendation I can make, if you're just looking for an alternative to Nagios,
especially if you're using Kubernetes or Docker, is Prometheus, which is really good at
collecting data, and it has something called Alertmanager, which lets you alert off the data that's
collected. Unfortunately, spammers have ruined the ability to send emails directly.
Because spam from malware is such a problem, a lot of ISPs block sending email by blocking
outbound port 25. All the residential ISPs do it and even some of the hosting providers do it.
But even if your hosting provider doesn't block outbound port 25, most mail servers are not going
to accept mail from an IP range that's known to be a residential IP range or a VPS, just because
the risk of those emails being spam is so high. So there's a few ways to get around
not being able to send email directly. I use an email sending service. The particular one I use
is called SendGrid. And what they do is SendGrid goes through all the work of keeping
themselves off the blacklists of known spammers. And most email servers will accept email that's sent via
SendGrid. So I'm not going to go into the specific instructions for configuring Postfix to relay
email via SendGrid, but the SendGrid documentation is really good. And if that's the service you choose
to use, their documentation will walk you through setting up Postfix to relay via SendGrid.
The other thing you'll need to do is you'll need to make sure that the address you're sending to
has a good alias. So by default, Nagios sends its alerts to nagiosadmin at localhost.
So you'll need to make sure you have a good email alias set up to forward the emails from
nagiosadmin to a mailbox that you're watching. There's plenty of other services out there like
SendGrid, including Amazon's Simple Email Service, and there's things like Mailchimp. If you
look around, there's probably plenty of options. And I think most of them have a free tier.
So I've never had to pay for my Nagios checks. So unless you're just blasting out alerts,
hundreds a day, you can probably stay within the free tier of most of these mail providers.
So besides email notification, there's a couple other notification options you can use,
specifically if you want to get alerts on your phone. Probably the simplest
thing to do to get some Nagios alerts on your phone is to install the Android app called aNag,
A-N-A-G. Like I said, that's an Android app, and it connects directly to the
Nagios UI just like you would in the browser. And then it can periodically check the status
of Nagios and generate phone notifications based on the status of the Nagios checks.
One of the downsides to aNag is that the phone has to be able to directly connect to the
Nagios server. So if the Nagios server's on a private network, you may need a VPN or something to connect
to the Nagios server from your phone. And you might be tempted to put Nagios on the public
internet. And that's okay. Nagios itself is secure, but it can be considered a security risk: if
someone is able to brute force your Nagios password, they can get a lot of information about
your network. So if you decide to put Nagios on the public internet, take security very seriously.
I recommend only using HTTPS and only ever logging in to Nagios via HTTPS.
And if you need help setting up Apache with HTTPS, there's a lot of good guides out there,
and there's even tools like Certbot that can set up the Let's Encrypt certificate for you
and configure Apache for you. Another option for getting push alerts on your phone is a push
notification service. Again, there's a few of these, but I use one called Pushover.
It's at pushover.net. I like to use the Pushover app.
It's a commercial service, but instead of paying monthly or per push notification,
you pay $5 in the Play Store whenever you download the app. There is a free trial if you want to
try it. Right now it's 30 days. It's been seven days in the past, but I'm sure whenever you
get to look at it, if it's what you want to use, they'll have some free time for you to try it out.
So to use Pushover with Nagios, we need to add a Pushover contact to the Nagios configs.
When a notification is sent to the new Pushover contact, the contact
definition will run a script that calls the Pushover API via curl. So I'm sure you remember from
the previous Nagios episode that in /etc/nagios, there's a conf.d directory where you can put
any config file and Nagios will load it. As long as it's named something.cfg,
Nagios will find all the files in that directory and load them. So we'll create a file called
pushover.cfg, put it in there, and then restart Nagios. I'm not going to
bore you by reading out the contents of pushover.cfg, but it'll be in the show notes.
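For anyone who wants to see roughly what that API call involves, here is a minimal sketch in Python rather than curl. It is not the pushover.cfg from the show notes. It only shows the kind of POST request a notification script would send to the Pushover messages endpoint. The function names and the default title are made up for illustration, and the token and user key are placeholders you would get from your own Pushover account.

```python
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_pushover_request(app_token, user_key, message, title="Nagios"):
    """Build (but do not send) the POST request for one Pushover message."""
    fields = {
        "token": app_token,  # application API token (placeholder value)
        "user": user_key,    # user key of whoever receives the alert
        "title": title,      # hypothetical default, not from the show notes
        "message": message,
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(PUSHOVER_URL, data=data, method="POST")


def send_pushover(app_token, user_key, message, title="Nagios", timeout=10):
    """Send the message and return the HTTP status code."""
    request = build_pushover_request(app_token, user_key, message, title)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it makes it easy to inspect what would be posted before any real notification goes out.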
If you want to use Pushover for some specific checks, you can just add the contact to that check.
You can add it to a single check or multiple checks, or if you want to make all your
notifications go via Pushover, you can modify the template definition that's used for the hosts
and services and change the contact from the default to Pushover. One of the benefits of
Nagios is that you can write your own checks. So if there's not a plugin for what you want to
monitor, if you can write a script for it, Nagios can check it. So remember in the previous episode
that I mentioned the status of the Nagios checks is based on exit code. So if you run the script
and it exits with zero, that's OK. And if it exits with one, that's a warning. And if it exits
with two, that's critical. So to write a custom check, you just need to write a script that performs
the check and then does some logic to figure out the exit code and exits based on the result of
the logic check. So as an example, I'll use a custom check that I wrote recently. I have a server
that collects syslog from around the network. And occasionally, I don't know why, the syslog
daemon just stops running. So instead of trying to figure out what's wrong with my syslog server,
I just wrote a script to check that the syslog file is actually being written to and updated.
So the script looks for the expected log file. It's got the date in the name, so it looks for
whatever today's date is, dot log. And then it tests that the file has been modified within the last
few minutes. The script will exit zero if the syslog file is less than a minute old. It'll exit one,
which is a warning, if it's less than 10 minutes old. And it'll exit two, which is critical, if it's
more than 10 minutes old or if it can't find the file at all. And then since the server with the
crashy syslog bits isn't on the same server as Nagios, we have to be
able to run the script on a remote system. Nagios has a few different ways to run commands on
remote servers. I prefer to use SSH, so that's the only one I'm going to talk about. There are some
disadvantages to using SSH specifically. SSH is kind of a heavy network connection when compared
to some of the other options. So if you have a lot of checks to do, you may want to look at
something different than check_by_ssh. So there's a check_by_ssh plugin that's used to run
commands on remote systems. Typically, what you'll do is you'll set up SSH key authentication
for the user that's running Nagios. And in most cases, the user name is nagios. So what you want
is you want the nagios user to be able to log into the remote system without having to enter a
password. Again, one cool thing about Nagios plugins and Nagios checks is they can all be tested
outside of Nagios. So you can try the command before you set it up in Nagios to make sure it's
working. So to test running a remote plugin via SSH, you can cd to the plugins directory
and then run the check_by_ssh script with some flags, dash h for the host, dash u for the user,
and then dash capital c for the path to the check on the remote server. So now that you've verified
the syntax of the check_by_ssh command that you want to use, you can add that to a command file in
the Nagios conf.d directory. Again, I'm not going to bore you by reading off Nagios configs, but the example
will be in the show notes. So now that you've added the command definition, you can use the
check syslog age command as a service check for one of your hosts.
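Since any plugin can be run by hand, a small Python helper can show how the exit code maps to a status. This is only a sketch of that idea, not something from the show notes. The name run_check is made up, and the UNKNOWN state for exit code 3 is the usual plugin convention rather than something covered in this episode.

```python
import subprocess
import sys

# Exit-code convention used by Nagios plugins (3 = UNKNOWN is standard,
# though only 0, 1 and 2 are discussed in the episode).
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=30):
    """Run a plugin command and return (state name, first line of output)."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as error:
        # The plugin could not be started or took too long.
        return "UNKNOWN", str(error)
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    return state, first_line
```

To try a remote check this way, argv would be the same check_by_ssh command line you tested from the plugins directory, split into a list of arguments.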
And the script I'm using to check the syslog date will be in the show notes.
Not that you would have this specific use, but you can look and see how I'm doing the test
and how I structured the logic of the script.
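The logic described for the syslog check could be sketched in Python roughly like this. It is not the script from the show notes. The log directory and the YYYY-MM-DD.log file name are assumptions for illustration, since the exact path and date format are not given here. The thresholds are the ones mentioned above: under a minute is OK, under 10 minutes is a warning, anything else is critical.

```python
import datetime
import os
import time

OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical location; the real directory is not stated in the episode.
LOG_DIR = "/var/log/remote"


def todays_log_path(log_dir=LOG_DIR, today=None):
    """Build the expected file name, assuming a YYYY-MM-DD.log pattern."""
    today = today or datetime.date.today()
    return os.path.join(log_dir, today.strftime("%Y-%m-%d") + ".log")


def check_log_age(path, now=None):
    """Return (exit_code, message) based on the file's modification time."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated as critical.
        return CRITICAL, f"CRITICAL - {path} not found"
    if age < 60:
        return OK, f"OK - {path} updated {age:.0f}s ago"
    if age < 600:
        return WARNING, f"WARNING - {path} updated {age:.0f}s ago"
    return CRITICAL, f"CRITICAL - {path} updated {age:.0f}s ago"


def main():
    """Print the status line and return the exit code for Nagios."""
    code, message = check_log_age(todays_log_path())
    print(message)
    return code
```

Ending the file with a call that passes the result of main() to sys.exit would make it usable as a plugin, since Nagios only looks at the exit code and the printed line.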
So another common use for Nagios, or another common method for
monitoring servers, is via SNMP. SNMP is really complicated, and honestly I have some
mixed feelings about using it. I just can't go into all the different
variations and versions of SNMP or all the different SNMP authentication options,
but I will show you how to get a minimal setup working so you can monitor a few things via SNMP.
And I want to give you guys a warning. The SNMP authentication option that I am going to
demonstrate today is only appropriate for isolated networks. If you plan on using SNMP over a
public network, I highly recommend looking at some more secure versions of SNMP or even tunneling
your SNMP traffic over SSH or a VPN. If you want to know more about SNMP, I'm definitely not
an expert, but one of my very favorite tech authors, Michael Lucas, just recently released a book
called SNMP Mastery. It's a good book. It goes into a lot of detail. If you need to know
how to use SNMP securely, this is the source you need. So first we'll talk about the clients,
or the servers being monitored. We'll talk about the setup on those. So on Ubuntu it's easy,
just apt install snmpd. And we need to make a few changes to the config file. By default, snmpd only
listens on localhost. So you'll have to replace the snmpd.conf. I'll have an example in the show
notes of the snmpd.conf that I use. It changes the listening address to all IP addresses. And it
also sets a read-only community string. And again, using community strings is insecure. Don't do
it over the internet. And then finally on the clients, restart snmpd. And if you're using a firewall,
you'll need to open up the SNMP port, 161. And again, last warning, if you put it on the internet,
don't do it this way. So now we'll talk a little bit about setting up the checks in Nagios.
If you remember from the last episode, we talked about this directory. It's /etc/nagios-plugins/config.
There's a lot of checks that are already set up for you. Remember, just a second
ago, when we were talking about the custom script, we had to write our own command that went along
with the custom script. That's not the case with these SNMP checks. They're already defined for you.
So if you look in that snmp.cfg in the plugins directory in /etc, you can look through that file and
get an idea of all the different checks you can do via SNMP. I'll have an example of some client
configuration using SNMP in the show notes. You can look through those as well and get an idea of
how the command definitions are used. Most of the SNMP definitions require or can take arguments.
Whenever you need to pass a command an argument, that argument is represented in the check by a
placeholder, which is ARG1 in dollar signs. In a lot of cases, the arguments are optional.
In these SNMP checks, they require the community string, and then some of the disk checks require
an argument of the disk number. In the service check definitions, the arguments that you pass to
the commands are separated by exclamation points. You can also see that in the example.
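To make the placeholder idea concrete, here is a simplified Python sketch of how a value like name!arg1!arg2 lines up with the ARG placeholders in a command definition. It only imitates the substitution. Real Nagios also handles other macros and escaping. The command name, plugin path, and --disk flag in the test are hypothetical, not taken from the packaged snmp.cfg.

```python
def expand_check_command(check_command, command_line):
    """Split on '!' and substitute $ARG1$, $ARG2$, ... into command_line.

    check_command: the value from a service definition, e.g. "name!a!b".
    command_line:  the command definition's template containing $ARGn$.
    Returns (command name, expanded command line).
    """
    name, *args = check_command.split("!")
    for position, value in enumerate(args, start=1):
        command_line = command_line.replace(f"$ARG{position}$", value)
    return name, command_line
```

So a service that passes a community string and a disk number hands them over as the first and second arguments, in the order they appear after the command name.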
Also, in the examples, you can see how I add additional contacts, the Pushover contacts,
and I have an example of how to change the number of check attempts. Normally, the default is
fine: a check will have to fail five times before it's
going to alert. But if you want to change that to one, that's the max check attempts setting. You can see that
in the example. Also, I have an example of changing the frequency of the check from the default of
five minutes to one minute. Another thing I like to do with Nagios is use it to monitor any remote
servers I have for security updates. Nagios has plugins that can check to see if system
updates are required, and it can tell you things like the number of updates and the check will be
critical if any of the updates are security related, and it can also tell you if a reboot is
required because you've updated the kernel, but you're not running the latest kernel.
So the check plugin has to be installed on the remote server.
For Debian-based systems, the name of the plugin is nagios-plugins-contrib,
and that's a big package. It'll come with a lot of checks, more than just the one for checking
if you need apt updates. And then on Red Hat-based systems, the name of the plugin is nagios-plugins-check-updates.
The command definitions for how to run it are listed in the show notes. These plugins take
a little while to run, longer than typical, so you'll see in the example where I set the timeout
with the dash t flag to 120 seconds just to give it some extra time to do all the checks it has to do.
That's probably all the Nagios I can handle for now, so if you have any comments or questions,
leave them under the episode, and again, thanks for listening, and I'll see you guys next time.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link
to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound
and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself. Unless otherwise stated,
today's show is released under a Creative Commons
Attribution-ShareAlike 3.0 license.