Friday, February 10, 2012

Dell PowerEdge 1850 + PowerVault 220S

Only a few moments ago I started this blog, and I already feel the need to share something with you.

I started work today, and one of my first duties of the day was checking Nagios for failed systems.

The first thing I noticed was a critical message for one of the servers I work with.
A closer look showed which one: a Dell PowerEdge 1850 with an attached PowerVault 220S.
And the entire vault was missing from the system.

So what went wrong?

As the vault wasn't even listed in the system properties anymore, my first thought was that somebody had accidentally unplugged it. Since we share a datacentre with a couple of other companies, that was a real possibility.
But I was relieved to discover that the vault itself was still plugged in correctly and was humming away in idle mode.

Okay, so something else had gone wrong.

A close look at the system logs revealed a reboot in the middle of the night, which is not uncommon for systems with automatic updates enabled.

A bit of googling turned up another blog (Nerhood Weblog) describing something very similar to my problem.

Running some system diagnostics revealed that a Dell PERC 3/DC RAID controller manages the vault.

Shutting down the server and doing a cold start didn't solve the problem, by the way. It's a crude gamble with these devices: sometimes they appear after a reboot, mostly they don't.
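Because the vault only shows up after some reboots, a tiny check that reports whether its filesystem is actually mounted would at least make the gamble visible right after boot. Here is a rough sketch in Python, not what I actually run: the mount point /mnt/vault and the function names are made up for illustration, and the only thing taken from Nagios is its usual plugin exit-code convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_mount(path, ismount=os.path.ismount):
    """Return (exit_code, message) for a Nagios-style check of one mount point.

    `ismount` can be swapped out so the logic is testable without real hardware.
    """
    if not path:
        return UNKNOWN, "UNKNOWN: no mount point given"
    if ismount(path):
        return OK, f"OK: {path} is mounted"
    return CRITICAL, f"CRITICAL: {path} is not mounted"


def main(argv):
    """Print the status line and return the exit code for the given arguments."""
    # /mnt/vault is a placeholder; pass the real mount point as the first argument.
    target = argv[1] if len(argv) > 1 else "/mnt/vault"
    code, message = check_mount(target)
    print(message)
    return code
```

To use it as a plugin, the script would end with something like `sys.exit(main(sys.argv))` (after `import sys`) so that Nagios picks up the exit code. This only notices that the storage is gone, of course; it does nothing about the controller itself.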

Where to go from here? I actually don't know. Since the devices have been out of warranty for a few years now, I think I'll try to find a replacement for the crappy controller, as it seems to be the main source of failure in this case.

If you happen to have these devices lying around, don't under any circumstances try to use them in this combination. You will fail.

I hope I'll find a fix for this. Somehow.