Caching Server Nagios Script Updated for Mavericks

Just a quick heads up that check_osx_caching, the monitoring script for Nagios and OS X that Jedda and I maintain, has been updated to support OS X Mavericks Server. Given that Caching Server 2 in 10.9 is more verbose about which content is taking up what space, I’ve added support for each content type, and each one is now included in the performance data that the script returns.

As content usage is not tracked in Mountain Lion’s Caching Server, I had to check which version of the operating system the script is running on. To do this, I used the sw_vers command with the -productVersion flag so that only the product version is returned. Now, on a first release of the OS you’ll get a clean float like 10.9 or 10.8, which makes it really easy to do a comparison using bc (an arbitrary-precision calculator language). The problem comes with numbers like 10.8.5 which, because they have two periods, don’t work with bc. I decided I could either write a function that does a full comparison of the version numbers, or just get the major and minor release and do a standard mathematical float comparison. Using grep I can pull the major and minor release information and assign it to a variable. See below for the code:

osVersion=$(sw_vers -productVersion | grep -E -o "[0-9]+\.[0-9]")

Next, the comparison is performed to see whether the current OS is less than 10.9. If the current OS is less than 10.9, 1 is returned. If it’s the same (or greater), the result is 0. This code is below:

echo $osVersion '< 10.9' | bc -l

Note that the above example requires the variable $osVersion. If you were hard coding the values, you could do something like below:

echo '10.8 < 10.9' | bc -l

Now, if the value of the expression is 0, we grab all the Mavericks Caching Server usage data and assign it to a variable called mavericksPerfData, which is appended to the end of the final return printf.
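The other option mentioned above, a full comparison of the version numbers, could look something like the sketch below. This is not what check_osx_caching does; version_lt is a hypothetical helper name, and awk is used so that it runs in any POSIX shell. Because each component is compared as an integer, it also copes with a version such as 10.10, which a float comparison would treat as smaller than 10.9.

```shell
# Hypothetical helper: succeeds (exit status 0) when version $1 is older than version $2.
# Missing components count as 0, so 10.9 and 10.9.0 compare as equal.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0   # first differing component is smaller
            if (xi > yi) exit 1   # first differing component is larger
        }
        exit 1                    # equal versions are not "less than"
    }'
}

version_lt 10.8.5 10.9 && echo "10.8.5 is older than 10.9"
version_lt 10.10 10.9 || echo "10.10 is not older than 10.9"
```

On a Mac this could be called as version_lt "$(sw_vers -productVersion)" 10.9, taking the Mavericks branch whenever it returns non-zero.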

I'll be updating our RRDtool graph for the Caching Server 2 and will post the code soon! Stay tuned.

Check out the code on GitHub!

RRDtool Graph for Nagios’ check_disk Command

Over the past few months I’ve been working more and more with RRDtool, and since we use GroundWork at work, I thought I’d start to better the default graphs that come with GroundWork. One of the graphs I thought needed the most improvement is the graph for check_disk. See the old graph below:

Disk Usage Old
The original GroundWork/RRDtool graph for disk usage.
Disk Usage New
The new and improved graph for disk usage!

As with my other graphs, I will be using the Outlined Area Graphs colour set for my graph areas and lines.

Without further ado, here is the code for the graph:

rrdtool graph - \
-E \
-P \
-h 180 \
-l 0 \
--grid-dash 1:2 \
-t "<big><b>Disk Utilisation</b></big>" \
-b 1024 \
-X 0 \
-a PNG \
-v "<b>Disk Usage (MB)</b>" \
DEF:diskCurr=rrd_source:ds_source_0:AVERAGE \
DEF:diskWarn=rrd_source:ds_source_1:AVERAGE \
DEF:diskCrit=rrd_source:ds_source_2:AVERAGE \
DEF:diskMax=rrd_source:ds_source_3:AVERAGE \
CDEF:diskPerc=diskCurr,diskMax,/,100,* \
CDEF:cdefDisk=diskCurr \
CDEF:cdefw=diskWarn \
CDEF:cdefc=diskCrit \
CDEF:cdefm=diskMax \
CDEF:warnPerc=diskWarn,diskMax,/,100,* \
CDEF:critPerc=diskCrit,diskMax,/,100,* \
AREA:diskCurr#54EC48:"<b>Space Used\:</b>\g" \
LINE2:cdefDisk#24CB14: \
GPRINT:diskCurr:LAST:" <b>%.0lf MB</b>\g" \
GPRINT:diskPerc:LAST:" <i><b>(%.0lf%%)</b></i>" \
LINE2:cdefm#4D18E4:"<i>Maximum Capacity\:</i>\g" \
GPRINT:cdefm:LAST:" <i>%.0lf MB</i>\n" \
LINE2:cdefw#C9B215:"Warning Threshold\:\g" \
GPRINT:cdefw:AVERAGE:" <i>%.0lf MB</i>\g" \
GPRINT:warnPerc:LAST:" <i>(%0.lf%%)</i>" \
LINE2:cdefc#CC3118:"<i>Critical Threshold\:</i>\g" \
GPRINT:cdefc:LAST:" <i>%.0lf MB</i>\g" \
GPRINT:critPerc:LAST:" <i>(%.0lf%%)</i>\n" \
-c BACK#FFFFFF \
-c CANVAS#FFFFFF \
-c GRID#C0C0C0 \
-c MGRID#404040 \
-c ARROW#FFFFFF \
-Y

  • rrdtool graph - — this will tell rrdtool that we’re making a graph, and the output will go to stdout (needed for GroundWork, otherwise you can put something like disk_usage.png).
  • -E — aka --slope-mode gives the graph a more organic and natural look.
  • -P — enables Pango markup for all text, so you can use HTML-like tags such as <b> or <i>. More tags are available; see the Pango Reference Manual.
  • -h 180 — this sets the height of the graph to 180 pixels.
  • -l 0 — this sets the lowest number (or limit) of the Y-axis to 0, because obviously, a storage device can’t have a negative capacity.
  • --grid-dash 1:2 — gives us nice and small dotted lines for the graph.
  • -t "<big><b>Disk Utilisation</b></big>" — the title for the graph, with Pango-supported HTML to make the title bigger and bolder (as the tags would suggest).
  • -b 1024 — as we are dealing with storage capacity, we measure the data in base 2 (1 k = 1024).
  • -X 0 — disables the unit exponent scaling for this graph.
  • -a PNG — formats the output of the graph as a PNG. Depending on the format you choose, other settings may be affected (e.g. --no-gridfit).
  • -v "<b>Disk Usage (MB)</b>" — this sets a vertical title on the left-side of the Y-axis.
  • DEF:diskCurr=rrd_source:ds_source_0:AVERAGE — this defines a variable called diskCurr which is sourced from the RRD. GroundWork automatically fetches this for you. From the command line, you would manually specify both the path to the RRD and the data source (e.g. server17.pretendco.com_ssh_disk_boot.rrd:_dev_disk0s2).
  • DEF:diskWarn=rrd_source:ds_source_1:AVERAGE — same as above, but gets the warning level for the disk (in MB).
  • DEF:diskCrit=rrd_source:ds_source_2:AVERAGE — same as above, but gets the critical level for the disk (in MB).
  • DEF:diskMax=rrd_source:ds_source_3:AVERAGE — same as above, but gets the maximum size of the disk (in MB).
  • CDEF:diskPerc=diskCurr,diskMax,/,100,* — this command (CDEF) defines a new variable calculated from existing ones, with the maths written in RPN (Reverse Polish Notation). In standard notation this is diskPerc = (diskCurr / diskMax) * 100, as we are calculating the percentage of the disk that is used. I won’t delve into the complexities of RPN, but take a look at the Wikipedia article.
  • CDEF:cdefDisk=diskCurr — assign another variable called cdefDisk which copies the diskCurr variable.
  • CDEF:cdefw=diskWarn — as above, copy one variable to another.
  • CDEF:cdefc=diskCrit — as above, copy one variable to another.
  • CDEF:cdefm=diskMax — as above, copy one variable to another.
  • CDEF:warnPerc=diskWarn,diskMax,/,100,* — calculate the warning level as a percentage. In standard mathematics, your equation would be warnPerc = (diskWarn / diskMax) * 100.
  • CDEF:critPerc=diskCrit,diskMax,/,100,* — like above, calculate the critical level as a percentage.
  • AREA:diskCurr#54EC48:"<b>Space Used\:</b>\g" — graph the area of the variable diskCurr with the colour #54EC48, then print Space Used: in bold, with a string modifier (\g) to strip whitespace at the end of the string.
  • LINE2:cdefDisk#24CB14: — draw a line on the graph that is 2 pixels thick, with the value of cdefDisk and the colour #24CB14. Note the extra colon (:) at the end; this means we’re not printing any string in the legend (below the graph).
  • GPRINT:diskCurr:LAST:" %.0lf MB\g" — print the last value of diskCurr as an integer with MB at the end.
  • GPRINT:diskPerc:LAST:" (%.0lf%%)" — print the percentage of the disk used. Note the double percentage: the first percentage symbol escapes the second (so RRDtool doesn’t think we’re trying to print a number).
  • LINE2:cdefm#4D18E4:"Maximum Capacity\:\g" — draw a line on the graph that represents the maximum capacity of the storage device.
  • GPRINT:cdefm:LAST:" %.0lf MB\n" — print the maximum capacity of the storage device in megabytes (MB).
  • LINE2:cdefw#C9B215:"Warning Threshold\:\g" — draw a line on the graph that represents the warning threshold of space used.
  • GPRINT:cdefw:AVERAGE:" %.0lf MB\g" — print the warning threshold in MB.
  • GPRINT:warnPerc:LAST:" (%0.lf%%)" — print the warning threshold as a percentage.
  • LINE2:cdefc#CC3118:"Critical Threshold\:\g" — draw a line on the graph that represents the critical threshold of space used.
  • GPRINT:cdefc:LAST:" %.0lf MB\g" — print the critical threshold in MB.
  • GPRINT:critPerc:LAST:" (%.0lf%%)\n" — print the critical threshold as a percentage.
  • -c BACK#FFFFFF — change the background colour of the graph to #FFFFFF (white).
  • -c CANVAS#FFFFFF — change the canvas colour (the actual graph itself) of the graph to #FFFFFF (white).
  • -c GRID#C0C0C0 — change the grid colour to #C0C0C0 (light gray).
  • -c MGRID#404040 — change the major grid colour to #404040 (dark gray).
  • -c ARROW#FFFFFF — change the arrow colours to #FFFFFF (white) to hide them.
  • -Y — enables the nice dynamic Y-axis grid that gives you whole numbers and ensures you don’t have too many horizontal lines that could make the graph messy or hard to understand.
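
To see how the RPN in the percentage CDEFs is evaluated, here is a small sketch of a stack-based evaluator written as a shell function around awk. The rpn function name and the sample values (350 MB used, 400 MB warning level, 500 MB maximum) are made up for illustration, and it only understands the four arithmetic operators, not the full RRDtool operator set. Division by zero is not handled.

```shell
# Hypothetical helper: evaluate a comma-separated RPN expression, CDEF style.
# Numbers are pushed onto a stack; an operator pops two values and pushes the result.
rpn() {
    awk -v expr="$1" 'BEGIN {
        n = split(expr, tok, ",")
        sp = 0
        for (i = 1; i <= n; i++) {
            t = tok[i]
            if (t == "+" || t == "-" || t == "*" || t == "/") {
                b = stack[sp--]
                a = stack[sp--]
                if (t == "+") r = a + b
                else if (t == "-") r = a - b
                else if (t == "*") r = a * b
                else r = a / b
                stack[++sp] = r
            } else {
                stack[++sp] = t + 0
            }
        }
        printf "%.0f\n", stack[sp]
    }'
}

# diskCurr,diskMax,/,100,* with 350 MB used of 500 MB
rpn "350,500,/,100,*"   # prints 70
# diskWarn,diskMax,/,100,* with a 400 MB warning level
rpn "400,500,/,100,*"   # prints 80
```

Reading the first expression left to right: push 350, push 500, divide (0.7), push 100, multiply (70), which matches (diskCurr / diskMax) * 100.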

You can also take a look at the gist for the code.


Folder Size Monitoring with Nagios and RRDtool

Folder Size Graph
The RRDtool graph in GroundWork.

Recently at my place of employment we’ve had a few customers whose log folders exploded in size and crashed their servers. Since we already monitor OS X Servers using GroundWork, I decided to write my own Nagios plugin that checks folder sizes, alerts if they get too large, and returns performance data for use with RRDtool.

Bash Hokery

The Bash script called (which can be obtained at the GitHub repo) should be uploaded to any folder on the OS X Server you want to monitor.

To test the script, cd into the directory containing it, then run ./ -f /Library/Logs -w 1024 -c 2048.

Your required flags for the Bash script are:

  • f — the path of the folder you want to get the size of (wrap the path in double quotes if it contains a space).
  • w — the warning level (in MB).
  • c — the critical level (in MB).

And the one optional flag, which shouldn’t be used for this particular Nagios plugin, is:

  • m — block size for the folder (e.g. k for KB, m for MB and g for GB)

For your service check command line, enter something like this: check_by_ssh_folder_size!/Library/Logs/!m!1024!2048. See below:

  • check_by_ssh_folder_size — this is my command in Nagios for the check.
  • !/Library/Logs — the folder path for sizing up.
  • !m — we want return data in MB.
  • !1024 — the warning level in MB.
  • !2048 — the critical level in MB.
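
To make the flags and thresholds concrete, here is a minimal sketch of how a folder size check along these lines might be written. It is not the script from the GitHub repo: check_folder and the perfdata label size are hypothetical names, the optional m flag is left out, and the only things assumed from Nagios itself are the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the label=value;warn;crit perfdata layout.

```shell
# Hypothetical sketch of a folder size check; sizes and thresholds are in MB.
# The ( ) body runs in a subshell, so variables stay local and exit sets the status.
check_folder() (
    folder="" warn="" crit=""
    OPTIND=1
    while getopts "f:w:c:" opt; do
        case "$opt" in
            f) folder=$OPTARG ;;
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            *) echo "UNKNOWN - usage: -f <folder> -w <warn MB> -c <crit MB>"; exit 3 ;;
        esac
    done
    if [ -z "$warn" ] || [ -z "$crit" ] || [ ! -d "$folder" ]; then
        echo "UNKNOWN - missing flag or folder not found"
        exit 3
    fi
    # du -sk reports the total in kilobytes; convert to whole megabytes.
    size_kb=$(du -sk "$folder" 2>/dev/null | awk '{print $1}')
    if [ -z "$size_kb" ]; then
        echo "UNKNOWN - could not read the size of $folder"
        exit 3
    fi
    size_mb=$((size_kb / 1024))
    perf="size=${size_mb}MB;${warn};${crit}"
    if [ "$size_mb" -ge "$crit" ]; then
        echo "CRITICAL - $folder is ${size_mb} MB | $perf"
        exit 2
    elif [ "$size_mb" -ge "$warn" ]; then
        echo "WARNING - $folder is ${size_mb} MB | $perf"
        exit 1
    fi
    echo "OK - $folder is ${size_mb} MB | $perf"
)
```

Calling check_folder -f /Library/Logs -w 1024 -c 2048 would then print a single status line followed by the perfdata after the pipe character, which is the part a grapher such as RRDtool consumes.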

RRDtool Doo-Hickey

Below is the RRDtool create command (this is used in GroundWork, but may be used for other platforms):

$RRDTOOL$ create $RRDNAME$ --step 300 --start n-1yr $LISTSTART$ DS:$LABEL#$:GAUGE:95040:U:U DS:$LABEL#$_wn:GAUGE:95040:U:U DS:$LABEL#$_cr:GAUGE:95040:U:U $LISTEND$ RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480
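
To get a feel for the numbers in the create command: each RRA is written as RRA:CF:xff:steps:rows, so the time it covers is step × steps × rows seconds. The quick calculation below works that out in days for the two archives above; rra_days is a hypothetical helper name used only for this illustration.

```shell
# Hypothetical helper: days of history kept by an RRA.
# Arguments: step in seconds, steps per row, number of rows.
rra_days() {
    echo $(( $1 * $2 * $3 / 86400 ))
}

rra_days 300 1 8640    # 5-minute averages: prints 30
rra_days 300 12 9480   # 1-hour averages (12 x 5 min): prints 395
```

So the first archive keeps 30 days at full 5-minute resolution, and the second keeps roughly 13 months of hourly averages.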

And below is the RRDtool update command for each check that is performed. Here we update the RRD file with the last check time, the folder size in MB, the warning level and critical level.


Finally, we have the RRDtool graph command that generates a nice, custom graph for visual output of performance data.

rrdtool graph - \
--slope-mode \
--height 180 \
--grid-dash 1:2 \
--title="Folder Size" \
--base 1024 \
--units-exponent 0 \
--vertical-label "Size (in MB)" \
--imgformat=PNG \
DEF:a=rrd_source:ds_source_0:AVERAGE \
DEF:w=rrd_source:ds_source_1:AVERAGE \
DEF:c=rrd_source:ds_source_2:AVERAGE \
CDEF:cdefa=a \
CDEF:cdefw=w \
CDEF:cdefc=c \
AREA:a#54EC48:"Space Used" \
LINE:cdefa#24BC14: \
GPRINT:a:LAST:"Current\: %.0lf MB" \
GPRINT:a:AVERAGE:"Average\: %.0lf MB" \
GPRINT:a:MAX:"Maximum\: %.0lf MB\n" \
LINE2:cdefw#ECD748:"Warning Threshold\:" \
GPRINT:cdefw:LAST:"%.0lf MB" \
LINE2:cdefc#EA644A:"Critical Threshold\:" \
GPRINT:cdefc:LAST:"%.0lf MB\n" \
CDEF:cdefws=a,cdefw,GT,a,0,IF \
AREA:cdefws#ECD748 \
CDEF:cdefcs=a,cdefc,GT,a,0,IF \
AREA:cdefcs#EA644A \
-c BACK#FFFFFF \
-c CANVAS#FFFFFF \
-c GRID#C0C0C0 \
-c MGRID#404040 \
-c ARROW#FFFFFF \
-Y

Explanation of the RRDtool graph

  • rrdtool graph - — this will tell rrdtool that we’re making a graph, and the output will go to stdout (needed for GroundWork, otherwise you can put something like folder_size.png)
  • --slope-mode — this gives the graph a nice organic look, rather than the default step-like lines.
  • --height 180 — make the graph 180 pixels in height.
  • --grid-dash 1:2 — this will make the grid lines slightly dashed (1:3 ratio will make a dotted line)
  • --title "Folder Size" — gives the graph a nice large title.
  • --base 1024 — as we are graphing storage, we want the numbers in base 2.
  • --units-exponent 0 — this will prevent automatic y-axis scaling (it messes up my graph)
  • --vertical-label "Size (in MB)" — puts a label on the left-hand side of the graph (text is printed vertically)
  • --imgformat=PNG — format the output image as a PNG.
  • DEF:a=rrd_source:ds_source_0:AVERAGE — set a variable `a` with the value of the folder size in MB.
  • DEF:w=rrd_source:ds_source_1:AVERAGE — set a variable `w` (warning) with the warning level value.
  • DEF:c=rrd_source:ds_source_2:AVERAGE — set a variable `c` (critical) with the critical level value.
  • CDEF:cdefa/w/c=a/w/c — define more variables for later calculations.
  • AREA:a#54EC48:"Space Used" — define an area on the graph with the variable (a), and the colour (#54EC48) (colour is made with rgb components) and a legend “Space Used”.
  • LINE:cdefa#24BC14: — graph a line at the top of the main area.
  • GPRINT:a:LAST:"Current\: %.0lf MB" — print the text “Current: XXX MB” to the graph (on the same line as the text “Space Used”) which has the most recent rrd entry.
  • GPRINT:a:AVERAGE:"Average\: %.0lf MB" — print the text “Average: XXX MB” to the graph (on the same line as the other text so far), which displays the average in MB.
  • GPRINT:a:MAX:"Maximum\: %.0lf MB\n" — print the text “Maximum: XXX MB” to the graph, along with a newline character at the end.
  • LINE2:cdefw#ECD748:"Warning Threshold\:" — draw a warning level line on the graph, with the rgb colour.
  • GPRINT:cdefw:LAST:"%.0lf MB" — print the warning level in MB.
  • LINE2:cdefc#EA644A:"Critical Threshold\:" — draw the critical level line on the graph.
  • GPRINT:cdefc:LAST:"%.0lf MB\n" — print the critical level in MB, along with a newline character.
  • CDEF:cdefws=a,cdefw,GT,a,0,IF — using a calculated define, we can work out if the folder size is larger than the warning level, and if it is, we’ll change the graph colour on the next line.
  • AREA:cdefws#ECD748 — using the calculation above, change the colour of the graph.
  • CDEF:cdefcs=a,cdefc,GT,a,0,IF — like above, do another calculation to see if the folder size is larger than the critical level.
  • AREA:cdefcs#EA644A — as above, colour the graph to the appropriate colour if it goes over the critical threshold.
  • -c BACK#FFFFFF — change the background colour of the graph to white (#FFFFFF).
  • -c CANVAS#FFFFFF — change the colour of the graph canvas to white.
  • -c GRID#C0C0C0 — change the grid colour to a light-ish grey.
  • -c ARROW#FFFFFF — change the colour of the graph arrows to white, to hide them altogether.
  • -Y — scale the graph to integers dynamically on the graph’s Y axis.
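
The two conditional CDEFs are the least obvious part of the graph, so here is a small sketch of what a,cdefw,GT,a,0,IF works out for a single data point. In RRDtool's RPN, GT pushes 1 when the first value is greater than the second, and IF then picks a when that result is non-zero and 0 otherwise. The over_threshold name and the sample sizes are made up for illustration.

```shell
# Hypothetical helper mirroring a,threshold,GT,a,0,IF for one data point:
# prints the size when it is above the threshold, otherwise 0.
over_threshold() {
    if [ "$1" -gt "$2" ]; then
        echo "$1"
    else
        echo 0
    fi
}

over_threshold 800 1024    # prints 0: nothing extra is drawn
over_threshold 1500 1024   # prints 1500: the warning-coloured area covers the value
over_threshold 2500 2048   # prints 2500: the critical-coloured area covers the value
```

Because the result is 0 below the threshold, the coloured AREA only shows up for the stretches of time where the folder is over the warning or critical level, and the critical area is drawn last so it sits on top.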

Once you’ve got this all set up, you should be getting wonderful graphs (like the one above), along with performance data.