Just a quick heads up that the check_osx_caching monitoring script for Nagios and OS X, which Jedda and I wrote, has been updated to support OS X Mavericks Server. Caching Server 2 in 10.9 reports in more detail which content is taking up how much space, so I’ve added support for each content type and included it in the performance data that the script returns.
As content usage is not tracked in Mountain Lion’s Caching Server, I had to check which version of the operating system the script is being run on. To do this, I used the sw_vers command with the -productVersion flag so that only the product version is returned. On a first release of the OS you’ll get a clean float like 10.9 or 10.8, which makes it really easy to do a comparison using bc (an arbitrary-precision calculator language). The problem comes with numbers like 10.8.5, which have two periods and so don’t work as numbers in bc. I decided I could either write a function that does a full comparison of the version numbers, or just take the major and minor release and do a standard float comparison. Using grep I can pull out the major and minor release information and assign it to a variable. See below for the code:
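The extraction step described above can be sketched roughly as follows. This is a minimal sketch rather than the script's exact code: the helper name major_minor and the fallback sample value 10.8.5 are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: keep only the leading "major.minor" part of a version
# string, so 10.8.5 becomes 10.8 and 10.9 stays 10.9.
major_minor() {
    printf '%s\n' "$1" | grep -oE '^[0-9]+\.[0-9]+'
}

# On OS X the version comes from sw_vers; elsewhere fall back to a sample
# value (an assumption, only so the sketch runs on other systems).
if command -v sw_vers >/dev/null 2>&1; then
    osVersion=$(major_minor "$(sw_vers -productVersion)")
else
    osVersion=$(major_minor "10.8.5")
fi

echo "$osVersion"
```

The resulting $osVersion is the kind of value the bc comparison expects.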
Next, the comparison is performed to see whether the current OS is less than 10.9. If the current OS is less than 10.9, 1 is returned. If it’s the same (or greater), the result is 0. This code is below:
echo $osVersion '< 10.9' | bc -l
Note that the above example requires the variable $osVersion. If you were hard coding the values, you could do something like below:
echo '10.8 < 10.9' | bc -l
If the result of the expression is 0, we grab all the Mavericks Caching Server usage data and assign it to a variable called mavericksPerfData, which is appended to the end of the final printf that returns the result.
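Putting the comparison and the conditional together might look something like this. It is a sketch under assumptions, not the script itself: the function name is_pre_mavericks, the sample version, and the performance data labels and values are all placeholders.

```shell
#!/bin/bash
# Succeeds when the given major.minor version is below 10.9.
# bc prints 1 when the comparison is true and 0 when it is false.
is_pre_mavericks() {
    [ "$(echo "$1 < 10.9" | bc -l)" -eq 1 ]
}

osVersion="10.9"        # assumed sample value; the script derives it from sw_vers
mavericksPerfData=""

if ! is_pre_mavericks "$osVersion"; then
    # Placeholder label and value: the real script gathers the
    # per-content-type usage from Caching Server 2 here.
    mavericksPerfData=" exampleContent=0MB"
fi

# The extra performance data is appended to the end of the final output line.
printf 'Caching Server status |cacheUsed=0MB%s\n' "$mavericksPerfData"
```

One caveat with the float approach: 10.10 evaluates as 10.1, which is less than 10.9, so later releases would need the major and minor fields compared as integers instead.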
I'll be updating our RRDtool graph for the Caching Server 2 and will post the code soon! Stay tuned.
Over the past few months I’ve been working more and more with RRDtool, and since we use GroundWork at work, I thought I’d start to better the default graphs that come with GroundWork. One of the graphs I thought needed the most improvement is the graph for check_disk. See the old graph below:
Along with my other graphs, I will be using the Outlined Area Graphs colour set for my graph areas and lines.
Without further ado, here is the code for the graph:
rrdtool graph - — this will tell rrdtool that we’re making a graph, and the output will go to stdout (needed for GroundWork, otherwise you can put something like disk_usage.png).
-E — aka --slope-mode gives the graph a more organic and natural look.
-P — enables Pango markup, so text can be styled with HTML-like tags such as <b> or <i>. More tags are available; check out the Pango Reference Manual.
-h 180 — this sets the height of the graph to 180 pixels.
-l 0 — this sets the lowest number (or limit) of the Y-axis to 0, because obviously, a storage device can’t have a negative capacity.
--grid-dash 1:2 — gives us nice and small dotted lines for the graph.
-t "<big><b>Disk Utilisation</b></big>" — the title for the graph, with Pango-supported HTML to make the title bigger and bolder (as the tags would suggest).
-b 1024 — as we are dealing with storage capacity, we measure the data in base2.
-X 0 — disables the unit exponent scaling for this graph.
-a PNG — formats the output of the graph as a PNG. The format you choose can affect some other settings (e.g. --no-gridfit).
-v "<b>Disk Usage (MB)</b>" — this sets a vertical title on the left-side of the Y-axis.
DEF:diskCurr=rrd_source:ds_source_0:AVERAGE — this defines a variable called diskCurr which is sourced from the RRD. GroundWork automatically fetches this for you. From the command line, you would manually specify both the path to the RRD and the data source (e.g. server17.pretendco.com_ssh_disk_boot.rrd:_dev_disk0s2).
DEF:diskWarn=rrd_source:ds_source_1:AVERAGE — same as above, but gets the warning level for the disk (in MB).
DEF:diskCrit=rrd_source:ds_source_2:AVERAGE — same as above, but gets the critical level for the disk (in MB).
DEF:diskMax=rrd_source:ds_source_3:AVERAGE — same as above, but gets the maximum size of the disk (in MB).
CDEF:diskPerc=diskCurr,diskMax,/,100,* — this command (CDEF) defines a new variable computed from one or more existing variables, with the math written in RPN (Reverse Polish Notation). In standard notation this is diskPerc = (diskCurr / diskMax) * 100, as we are calculating the percentage of the disk that is used. I won’t delve into the complexities of RPN, but take a look at the Wikipedia article.
CDEF:cdefDisk=diskCurr — assign another variable called cdefDisk which copies the diskCurr variable.
CDEF:cdefw=diskWarn — as above, copy one variable to another.
CDEF:cdefc=diskCrit — as above, copy one variable to another.
CDEF:cdefm=diskMax — as above, copy one variable to another.
CDEF:warnPerc=diskWarn,diskMax,/,100,* — calculate the warning level as a percentage. In standard mathematics, your equation would be warnPerc = (diskWarn / diskMax) * 100.
CDEF:critPerc=diskCrit,diskMax,/,100,* — like above, calculate the critical level as a percentage.
AREA:diskCurr#54EC48:"<b>Space Used\:</b>\g" — graph the area of the variable diskCurr with the colour #54EC48, then print Space Used: in bold, with a string modifier (\g) to strip whitespace at the end of the string.
LINE2:cdefDisk#24CB14: — draw a line on the graph that is 2 pixels thick, with the value of cdefDisk and the colour #24CB14. Note the extra colon (:) at the end, which means we’re not printing any string in the legend (below the graph).
GPRINT:diskCurr:LAST:" %.0lf MB\g" — print the last value of diskCurr as an integer with MB at the end.
GPRINT:diskPerc:LAST:" (%.0lf%%)" — print the percentage of the disk used. Note the double percentage: the first percentage symbol escapes the second (so RRDtool doesn't think we're trying to print a number).
LINE2:cdefm#4D18E4:"Maximum Capacity\:\g" — draw a line on the graph that represents the maximum capacity of the storage device.
GPRINT:cdefm:LAST:" %.0lf MB\n" — print the maximum capacity of the storage device in megabytes (MB).
LINE2:cdefw#C9B215:"Warning Threshold\:\g" — draw a line on the graph that represents the warning threshold of space used.
GPRINT:cdefw:AVERAGE:" %.0lf MB\g" — print the warning threshold in MB.
GPRINT:warnPerc:LAST:" (%.0lf%%)" — print the warning threshold as a percentage.
LINE2:cdefc#CC3118:"Critical Threshold\:\g" — draw a line on the graph that represents the critical threshold of space used.
GPRINT:cdefc:LAST:" %.0lf MB\g" — print the critical threshold in MB.
GPRINT:critPerc:LAST:" (%.0lf%%)\n" — print the critical threshold as a percentage.
-c BACK#FFFFFF — change the background colour of the graph to #FFFFFF (white).
-c CANVAS#FFFFFF — change the canvas colour (the actual graph itself) of the graph to #FFFFFF (white).
-c GRID#C0C0C0 — change the grid colour to #C0C0C0 (light gray).
-c MGRID#404040 — change the major grid colour to #404040 (dark gray).
-c ARROW#FFFFFF — change the arrow colours to #FFFFFF (white) to hide them.
-Y — enables the nice dynamic Y-axis grid that gives you whole numbers and ensures you don’t have too many horizontal lines that could make the graph messy or hard to understand.
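Assembled from the options listed above, the whole command can be sketched as a single call. The wrapper function name draw_disk_graph is an assumption for illustration, and rrd_source and ds_source_N are the GroundWork placeholders; from the command line you would replace them with a real RRD path and data source names. Arguments containing *, spaces or backslashes are single-quoted so the shell passes them to rrdtool untouched.

```shell
#!/bin/bash
# Sketch: every option from the list above, in the same order, in one call.
# rrd_source and ds_source_N are GroundWork placeholders, not real paths.
draw_disk_graph() {
    rrdtool graph - \
        -E -P -h 180 -l 0 \
        --grid-dash 1:2 \
        -t '<big><b>Disk Utilisation</b></big>' \
        -b 1024 -X 0 -a PNG \
        -v '<b>Disk Usage (MB)</b>' \
        DEF:diskCurr=rrd_source:ds_source_0:AVERAGE \
        DEF:diskWarn=rrd_source:ds_source_1:AVERAGE \
        DEF:diskCrit=rrd_source:ds_source_2:AVERAGE \
        DEF:diskMax=rrd_source:ds_source_3:AVERAGE \
        'CDEF:diskPerc=diskCurr,diskMax,/,100,*' \
        CDEF:cdefDisk=diskCurr \
        CDEF:cdefw=diskWarn \
        CDEF:cdefc=diskCrit \
        CDEF:cdefm=diskMax \
        'CDEF:warnPerc=diskWarn,diskMax,/,100,*' \
        'CDEF:critPerc=diskCrit,diskMax,/,100,*' \
        'AREA:diskCurr#54EC48:<b>Space Used\:</b>\g' \
        'LINE2:cdefDisk#24CB14:' \
        'GPRINT:diskCurr:LAST: %.0lf MB\g' \
        'GPRINT:diskPerc:LAST: (%.0lf%%)' \
        'LINE2:cdefm#4D18E4:Maximum Capacity\:\g' \
        'GPRINT:cdefm:LAST: %.0lf MB\n' \
        'LINE2:cdefw#C9B215:Warning Threshold\:\g' \
        'GPRINT:cdefw:AVERAGE: %.0lf MB\g' \
        'GPRINT:warnPerc:LAST: (%.0lf%%)' \
        'LINE2:cdefc#CC3118:Critical Threshold\:\g' \
        'GPRINT:cdefc:LAST: %.0lf MB\g' \
        'GPRINT:critPerc:LAST: (%.0lf%%)\n' \
        -c BACK#FFFFFF \
        -c CANVAS#FFFFFF \
        -c GRID#C0C0C0 \
        -c MGRID#404040 \
        -c ARROW#FFFFFF \
        -Y
}
```

Outside GroundWork, replacing the - after graph with a file name such as disk_usage.png writes the image to disk instead of stdout.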
Recently at my place of employment we’ve had a few customers whose log folders exploded in size and crashed their servers. Since we already monitor OS X Servers using GroundWork, I decided to write my own Nagios plugin that checks folder sizes, alerts if they get too large, and returns performance data for use with RRDtool.
The Bash script called check_folder_size.sh (which can be obtained at the GitHub repo) should be uploaded to any folder on the OS X Server you want to monitor.
To test the script, cd into the directory where the Bash script is, then run ./check_folder_size.sh -f /Library/Logs -w 1024 -c 2048.
Your required flags for the Bash script are:
f — the path for the folder you want to get the size for (wrap the path in "double quotes" if there’s a space)
w — the warning level (in MB).
c — the critical level (in MB).
And the one optional flag, which shouldn’t be used for this particular Nagios plugin, is:
m — block size for the folder (e.g. k for KB, m for MB and g for GB)
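To show the general shape of a check like this, here is a minimal sketch. It is not the check_folder_size.sh from the GitHub repo: the function name, the output wording and the folder_size performance data label are assumptions, and it always reports in MB rather than taking a block size flag. It relies only on du and the standard Nagios exit codes.

```shell
#!/bin/bash
# Illustrative sketch only, not the script from the repo.
# Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_folder_size() {
    local folder="$1" warn="$2" crit="$3" size perf

    if [ ! -d "$folder" ]; then
        echo "UNKNOWN - $folder is not a directory"
        return 3
    fi

    # du -sk reports the total in 1024-byte blocks; convert to whole MB
    size=$(( $(du -sk "$folder" | awk '{print $1}') / 1024 ))

    # Performance data in the form label=value[unit];warn;crit;min
    perf="folder_size=${size}MB;${warn};${crit};0"

    if [ "$size" -ge "$crit" ]; then
        echo "CRITICAL - $folder is ${size} MB | $perf"
        return 2
    elif [ "$size" -ge "$warn" ]; then
        echo "WARNING - $folder is ${size} MB | $perf"
        return 1
    fi

    echo "OK - $folder is ${size} MB | $perf"
    return 0
}
```

Calling check_folder_size /Library/Logs 1024 2048 would mirror the test command above, with the text after the | character being what RRDtool graphs.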
For your service check command line, enter something like this: check_by_ssh_folder_size!/Library/Logs/!m!1024!2048. See below:
check_by_ssh_folder_size — this is my command in Nagios for the check.
!/Library/Logs — the folder path for sizing up.
!m — we want return data in MB.
!1024 — the warning level in MB.
!2048 — the critical level in MB.
Below is the RRDtool create command (this is used in GroundWork, but may be used for other platforms)