Having a real hardware raid controller is a nice thing: Especially in a server setup it helps you keeping data safe on multiple disks. Though, a common mistake is, having a raid controller and not monitoring it. Why? Let’s say, you have a simple type 1 array (one disk mirrored to another) and one of the disks fails. If your raid systems works it will continue to work. But if you did not setup a monitoring for it, you won’t notice it and the chance of a total data loss increases as you are running on one disk now.
So monitoring a raid is actually the step that makes your raid system as safe as you wanted it when setting it up. Some raids are quite easy to monitor, like a Linux software raid system. Some need special software. As I recently got a bunch of dedicated (Hetzner DS8000 and other) servers with 3ware raid controllers, I checked the common software repositories for monitoring software and was surprised not finding any suitable. So a web research showed me that there are Linux tools from 3ware. Of course they don’t provide .deb packages so you need to take of this yourself if you don’t want to install the software manually.
But there exists an unofficial Debian repository by Jonas Genannt (thank you!), providing recent packages of 3ware utilities under http://jonas.genannt.name/. Check the repository, it offers 3ware-3dms and 3ware-cli. 3ware-3dms is a web application for managing your raid controller via browser, BUT: think twice, if you want this. The application opens a privileged port (888) as it is not able to bind on the local interface and has a crappy user identification system. As I am not a friend of opening ports and closing them afterwards via firewall I dropped the web solution.
The „3ware-cli“ utility is just a command line interface to 3ware controllers. Just grab a .deb from the repository above and install it via „dpkg -i xxx.deb“. Aftwerwards you stark asking your controller questions about it’s status. The command is called „tw_cli“, so let’s give it a try with „info“ as parameter:
# tw_cli info Ctl Model (V)Ports Drives Units NotOpt RRate VRate BBU ------------------------------------------------------------------------ c0 8006-2LP 2 2 1 0 2 - - |
tw_cli told us, that there is one controller (meaning a real piece of raid hardware) called „c0“ with two drives. No we want more detailed information about the given controller:
# tw_cli info c0 Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy ------------------------------------------------------------------------------ u0 RAID-1 OK - - - 232.885 ON - Port Status Unit Size Blocks Serial --------------------------------------------------------------- p0 OK u0 232.88 GB 488397168 6RYBP4R9 p1 OK u0 232.88 GB 488397168 6RYBSHJC |
tw_cli reports that controller c0 has one unit „u0“. A unit is the device that your operating system is working with – the „virtual“ raid drive provided by the raid controller. There are two ports/drives in this unit, called „p0“ and „p1“. Both of them have „OK“ as status message meaning that the drives are running fine.
You also ask a drive directly by asking tw_cli for the port on the controller:
# tw_cli info c0 p0 Port Status Unit Size Blocks Serial --------------------------------------------------------------- p0 OK u0 232.88 GB 488397168 6RYBP4R9 # tw_cli info c0 p1 Port Status Unit Size Blocks Serial --------------------------------------------------------------- p1 OK u0 232.88 GB 488397168 6RYBSHJC
So you might already got the clue: As tw_cli is just a command line tool your task for an automated setup is setting up a cronjob checking the status of the ports (not the unit! the ports – trust me) regularly and sending a mail or nagios alarm when necessary. I just started writing a little shell script which, right now, just returns an exit status – 0 for a working raid and 1 for a problem:
#!/bin/bash UNIT=u0 CONTROLLER=c0 PORTS=( p0 p1 ) tw_check() { local regex=${1:-${UNIT}} local field=3 if [ $# -gt 0 ]; then field=2 fi local check=$(tw_cli info ${CONTROLLER} $1 \ | awk "/^$regex/ { print \${field} }") [ "XOK" = "X${check}" ] return $? } tw_check || exit 1 for PORT in ${PORTS[@]}; do tw_check ${PORT} || exit 1 done |
As you see you can configure unit, controller and ports. I have not checked this against systems with multiple controllers and units as I don’t have such a setup. But if you need you could just put the configuration stuff in a sourced configuration file.
After writing this little summary I checked all servers I am responsible of and noticed that nearly every server with hardware raid has a 3ware controller and can be checked with tw_cli. Fine…
Let me know how you manage your 3ware raid monitoring under GNU/Linux and Debian/Ubuntu based systems.