Learning Nagios 4 – A Book Review

As a longtime user of GroundWork, I’ve always had an abstraction layer between me and Nagios. I’d always thought that having a better understanding of the internals of GroundWork would make it easier for me to use, but I didn’t take the opportunity to learn about Nagios until now.

The book, Learning Nagios 4, by Wojciech Kocjan, weighs in at 400 pages and is the second edition. I found the book to be very well written, and it contained a lot of good technical information that I thought was interesting and beneficial.

Chapter 1 introduces Nagios to the unfamiliar user, and Wojciech gives good examples that ensure system administrators that Nagios is suitable for them. can provide IT staff with a very good system to check infrastructure and software to ensure it’s working correctly.

Chapter 2 runs through installing and configuring Nagios. I was very pleased to see a book providing instructions on installing software from source, as it’s rather unusual in my experience to find books that don’t just provide installation by package manager. Going through common Nagios configurations was also interesting, as I learnt a few quirks about templates and precedence.

Chapter 3 is all about the web interface that compliments Nagios. As a user of Nagios by proxy through GroundWork I was a little shocked at the Web GUI and how different it was to the interface I was used to, but it is nice to see Nagios 4 has implemented PHP support so there’s a bigger avenue for theme customisation.

Chapter 4 talks about the basic plugins that are provided with Nagios. If you’re a follower of my blog you would’ve seen my Nagios plugins for OS X Server, some of which were co-authored with/by my friend Jedda Wignall. I learnt quite a bit about the inbuilt plugins that come with Nagios, including the plugins that can schedule package manager checks – very cool!

Chapter 5 discusses advanced configuration details, mainly about templates and the nuances to inheritance, along with describing what flapping actually is. I thought the section on using multiple configurations (like OS type, location etc) to generate a configuration for a specific machine was quite interesting, and would allow the user to create advanced host settings with relative ease.

Chapter 6 was a chapter that I found very interesting as it focused on alerts and contacts. As a former member of a very small team we were inundated by emails every day and it became hard to keep track of what was coming in. The authors example of constant email flooding was exactly what happened to us. It’s worth spending a bit more time setting up proper alerts to make sure the right information reaches the right people, rather than spamming everyone constantly.

Chapter 7 talks about passive checks, and how they compare to the normal active checks. NCSA, or the Nagios Service Check Acceptor is also discussed, which is a daemon on the client end that can send check results back to the monitoring service securely. I’ve not used either types of passive checks, so learning about them was quite interesting. I’m looking forward to putting them into good use some time.

Chapter 8 contains a ton of great information and detail about the remote server checks performed by SSH, and the Nagios Remote Plugin Executor (NRPE). The author provides good arguments for choosing either of the services, depending on your requirements. I hadn’t actually heard of NRPE before, but it looks to be quite powerful without the overhead of SSH connections by the host.

Chapter 9 is all about SNMP and how it can interact with Nagios. In past experience I’ve only ever had bash scripts to process SNMP responses, but now I know how to implement it properly into Nagios without having a conduit processing script. I also never really knew much about SNMP, so it was good to learn about what SNMP actually is, not just how to interact with it, which can be an issue in some technical books where interacting is explained, but the source/destination isn’t.

Chapter 10 starts off by covering getting Nagios working with Windows clients, which to me isn’t very applicable as I’m purely a Linux/Unix/OS X man myself so my eyes glazed over as I pushed through that section. Having said that, it’s good to know Nagios monitoring is fully supported in Windows with the appropriate software installed. Another concept that is looked at in Chapter 10 is the setup and configuration of a multi-server master/slave setup with multiple Nagios servers. Now, unfortunately (or fortunately, depending on which way you look at it) I’ve not been in a position where I’ve needed to have multiple Nagios servers performing checks, but it’s useful to know that it’s possible, and to have some instructions on getting it set up.

Chapter 11 is probably my favourite chapter of the book because it’s all about programming Nagios plugins. The book has a multitude of examples written in different languages. I’ve always done my scripts in Bash, but had never even thought of writing plugins in PHP, which is my strongest language. Having seen code for a few languages (like Tcl) that I’ve heard of but not used, this book has encouraged me to try other languages for Nagios plugins, and not limit myself to Bash.

Chapter 12, the final chapter, talks about the query handler which is used for two-way communications with Nagios. There’s also a section on Nagios Event Radio Dispatcher (or NERD) which can be used for a real-time notification system for alerts.

Overall, I would highly recommend this book to any sysadmins looking to implement an excellent monitoring solution that is easy to set up, yet powerful enough through its extensive plugin collection and flexibility. After reading this book I’ve come away with a stronger knowledge of Nagios that will benefit my work in the future.

Note: I was provided with a free eBook to review this book, however, this review is 100% genuine and contains my true thoughts about the book.


SSL Certificate Expiry Monitoring with Nagios

The expiration of an SSL certificate can render your secured service inoperable. In late September, LightSpeed’s SSL certificate for their activation server expired which made any install or update fail due to not being able to connect to said activation server via SSL. This problem lasted two days and was a problem worldwide. This could have been avoided by monitoring the SSL certificate using a monitoring solution like GroundWork. We currently have a script that checks certificates, but it’s only designed to look at certificates on OS X in /etc/certificates. I’ve since written a bash script that will obtain any web-accessible certificate via openssl and check its start and end expiry dates.

The command I use to get the SSL certificate and its expiry is as follows:

echo "QUIT" | openssl s_client -connect apple.com:443 2>/dev/null | openssl x509 -noout -startdate 2>/dev/null

First, we echo QUIT and pipe it to the openssl s_client -connect command to send a QUIT command to the server so openssl will finish up neatly (and won’t expect a ^C to quit the process). Next, that result is piped to openssl x509 which displays certificate information. I also send the noout flag to not prevent the output of the encoded version of the certificate. Finally, startdate or enddate depending on which date I want. With both of the openssl commands I redirect stderr (standard error) output to /dev/null because I’d rather ignore any errors at this time.

The rest of the script performs regex searches of the output (using grep), then formats the timestamp with date before doing a differential of the expiration timestamp, and the current timestamp (both of these are in UNIX Epoch time). Depending on how many you days in the expiration warning, if the number of seconds for the current Epoch time minus the expiration date is less than the expiration day count, a warning is thrown.

To use this script, all you need to specify is the address of the host, the port and number of days to throw a warning (e.g. specify 7 if you want a warning period of 7 days). For example, to get the SSL certificate for apple.com on the port 443 you would enter:

./check_ssl_certificate.expiry -h apple.com -p 443 -e 7

And you should get something like below:

OK - Certificate expires in 338 days.

Not only does this script verify the end date of the certificate, but it also verifies the start date of the certificate. For example, if your certificate was set to be valid from 30th December 2013, you might see something like the error below:

CRITICAL - Certificate is not valid for 30 days!

Or if your certificate has expired, you’ll see something like below:

CRITICAL - Certificate expired 18 days ago!

Hopefully this script will prevent your certificates from expiring without your knowledge, and you won’t have angry customers that can’t access your service(s)!

Check out the code on GitHub!


Time Machine Monitoring Nagios Script Updated for OS X Mavericks

Just a quick heads up that the Nagios script for Time Machine written by Jedda (and updated by me) has been updated with support for OS X Mavericks (10.9). Unfortunately the .TimeMachine.Results file no longer exists in Mavericks. At first my thought was to enumerate through the arrays in /Library/Preferences/com.apple.TimeMachine.plist but that was going to be very complex, and very difficult to do in Bash. In the end, tmutil (Man Page Link) came to the rescue.

Along with the other Mavericks scripts, I do the quick check to see what OS is running before doing a if/then statement depending on the OS. When running tmutil latestbackup you’ll get the path to the latest backup (note that the disk(s) must be connected otherwise the command will fail). The last folder in the path is the timestamp for the most recent backup. Using grep we can pull out the timestamp from the path:

tmutil latestbackup | grep -E -o "[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{6}"

Next up, using the date command we can convert the timestamp to the time in Unix Epoch (number of seconds from 1st January, 1970). Once the timestamp has been converted to the Unix timestamp, we get the current time in Unix time and measure the different between the two by subtracting the latest backup timestamp from the current timestamp.

This is a relatively minor update to the OS X Monitoring Tools codebase, but it’s handy to have another script updated for OS X Mavericks (10.9). As always, the code for the file is below:

Check out the code on GitHub!


Caching Server Nagios Script Updated With Port Checking

I’ve just updated the Nagios script for OS X Server’s Caching Server with support for ensuring Caching Server is running on a specific port.

In a normal setup of Caching Server, it has the ability to change ports during reboots, and if you have a restrictive firewall, you’ll have to specify a port for Caching Server via the command line. You can specify the port via the serveradmin command (make sure you stop the service first):

sudo serveradmin settings caching:Port = xxx

The xxx is the port you want to use for Caching Server. Normally, Caching Server will use 69010, but depending on your environment, that port may be taken by another service and so Caching Server will pick another port. Using that command above will get Caching Server to attempt to map to the specified port, but if it can’t it will fall back to finding another available port.

The script checks the port as defined in serveradmin settings caching:Port and cross-references it with serveradmin fullstatus caching:Port. If they are the same, we’ll continue through the script, but if it doesn’t I’ll throw a warning. Naturally, if serveradmin settings caching:Port returns 0, I skip any further port checks because a 0 port means randomly assign a port.

Check out the code on GitHub


FreeRADIUS Monitoring Script Updated for Mavericks

I’ve updated the FreeRADIUS monitoring script for Nagios with support for OS X Mavericks Server. Mavericks changed the way the FreeRADIUS server is started, along with the paths of execution and storage.

Like the update to the Caching Server 2 monitoring script, I had to write a check to see if the current OS is running 10.9, and if it is, perform a Mavericks-specific check. Like I mentioned in the other article, doing a version check and comparison isn’t particularly easy in Bash, but thankfully, I only have the compare the major and minor release numbers which is nice given that it’s also a float. Using bc (the arbitrary precision calculator language – I should get that on a shirt!) I can quite easily calculate the difference between 10.8 and 10.9. Anyway, check the code below for how I get the version number.

sw_vers -productVersion | grep -E -o "[0-9]+\.[0-9]"

Next, the comparison is performed to see whether the current OS is less than 10.9. If the current OS is less than 10.9, 1 is returned. If it’s the same (or greater), the result is 0. This code is below:

echo $osVersion '< 10.9' | bc -l

Note that the above example requires the variable $osVersion. If you were hard coding the values, you could do something like below:

echo '10.8 < 10.9' | bc -l

The major difference in my script for OS X Mavericks is now there's actually a process running called radiusd! Using ps and grep I now check to see if the FreeRADIUS server is running by doing this:

ps -ef | grep radiusd

Which, if the FreeRADIUS (or radiusd) is running, will return a non-empty string. If you run that and get an empty string (or nothing) back then your FreeRADIUS server isn't running. Shit. I recommend doing radiusd -X to start your RADIUS server in debug mode. That or you forgot to get RADIUS added to launchd by entering radiusconfig -start. Anyway, that's enough chit-chat, just get the damn code from the link below:

Check out the code on GitHub!