I use Nagios and software RAID on my Linux boxes to make sure I don't lose any data, but I was unhappy with the script I used to monitor the arrays. Attached is a new one that is shorter and works better, because it leaves all the work to the mdadm tool instead of trying to parse /proc/mdstat.
#!/bin/sh
# (c) 2008 Jasper Spaans <j@jasper.es>
worst=0
msg=""
for dev in /dev/md?* ; do
    mdadm --misc -t "$dev"
    status=$?
    if [ "$status" -eq 0 ]; then
        msg="${msg} ${dev}: ok"
    elif [ "$status" -eq 1 ] ; then
        # Do not downgrade an earlier "unusable" result.
        if [ "$worst" -ne 2 ] ; then
            worst=1
        fi
        msg="${msg} ${dev}: degraded"
    elif [ "$status" -eq 2 ] ; then
        worst=2
        msg="${msg} ${dev}: unusable"
    fi
done
# Unquoted on purpose: word splitting trims the leading space.
echo $msg
exit $worst
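The script maps the exit status of mdadm straight onto a Nagios state (0 = OK, 1 = WARNING, 2 = CRITICAL). As far as I can tell from the mdadm man page, the test option can also return 4 when it fails to get information about the device, and the loop above silently ignores that case. A minimal sketch of a mapping that reports it as Nagios UNKNOWN (3) might look like this; md_state is a made-up helper name, not something mdadm or Nagios provides.

```shell
#!/bin/sh
# md_state is a hypothetical helper, not part of mdadm or Nagios.
# Map an mdadm test exit status ($1) to "<nagios code> <label>".
md_state() {
    case "$1" in
        0) echo "0 ok" ;;
        1) echo "1 degraded" ;;
        2) echo "2 unusable" ;;
        # Anything else (for example 4, an error reading the device)
        # is reported as Nagios UNKNOWN. This is an assumption about
        # how you would want such errors to show up.
        *) echo "3 unknown" ;;
    esac
}
```

Calling md_state with the value of $? right after the mdadm call would give both the code and the label in one place. Whether UNKNOWN should outrank CRITICAL when picking the worst state is a policy choice this sketch leaves open.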
It might be better to parse the /etc/mdadm.conf file. So your first line might read:
for dev in $(awk '$1 == "ARRAY" {print $2}' /etc/mdadm.conf | sort) ; do
This avoids checking my non-existent md3-md32 devices.
Correction: the line I would replace is line FIVE (the start of the "for" loop), not line ONE.
I find this necessary because in SL4.3 (a Fedora variant) there are 32 "md" devices listed in /dev, even though not all of them may be in use.
The correction therefore checks only the devices defined in /etc/mdadm.conf, which is where one would want to look.
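A minimal sketch of that idea as a reusable function, assuming the config file uses the usual "ARRAY /dev/mdN ..." lines. The name list_md_arrays is made up for illustration, and the default path is an assumption too (Debian-style systems keep the file in /etc/mdadm/mdadm.conf instead).

```shell
#!/bin/sh
# list_md_arrays is a hypothetical helper, not part of mdadm.
# Print the device field of every ARRAY line in an mdadm.conf-style
# file, one per line, sorted.
list_md_arrays() {
    # Assumed default location; pass another path as $1 if needed.
    conf="${1:-/etc/mdadm.conf}"
    # Only lines whose first field is exactly ARRAY match, so lines
    # starting with '#' are skipped.
    awk '$1 == "ARRAY" { print $2 }' "$conf" | sort
}
```

The loop in the script could then start with `for dev in $(list_md_arrays) ; do`. Matching on the first field avoids picking up commented-out ARRAY lines or other lines that merely mention "md".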