My housemate and I are in the process of rebuilding our fileserver with some lurvly new 500GB SATA disks. First, we tried Solaris (for the first time ever), built the system and, after a lot of faffing around, got it working with the disks in a ZFS RAID-Z pool. However, when we copied data to and fro, ZFS reported I/O errors, meaning that data had failed its checksums and couldn't be reconstructed from parity. Basically, a big bad sign.
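To show what "couldn't be reconstructed from parity" means, here is a simplified Python model of single-parity striping, as used by RAID-5 and, roughly, by RAID-Z1. It is only an illustration under that assumption, not the actual on-disk format of ZFS or mdadm, and the function names are made up for the example. XOR parity can rebuild one known-missing block. If a block is silently corrupted, parity can only show that something in the stripe is wrong, not which block is bad. ZFS keeps a checksum for every block so it can identify the bad one and then rebuild it from parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Single parity block for one stripe (simplified RAID-5 / RAID-Z1 model)."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover ONE missing data block from the surviving blocks plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


def parity_consistent(data_blocks, parity):
    """True if parity matches the data. A mismatch says the stripe is bad,
    but not which block: that needs per-block checksums, as ZFS keeps."""
    return make_parity(data_blocks) == parity


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(data)
    # Lose the middle block and rebuild it from the other two plus parity.
    print(rebuild_missing([data[0], data[2]], parity))  # b'BBBB'
    # Silently corrupt one block: parity no longer matches the data.
    print(parity_consistent([b"AAAA", b"BBBX", b"CCCC"], parity))  # False
```

With only one parity block, two bad blocks in the same stripe (or one bad block plus a flaky disk) cannot be recovered. In that situation ZFS can only report an unrecoverable error.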
Putting this down to the fact that we didn't really know what we were doing in Solaris, we switched to Ubuntu Server and used software RAID-5 (mdadm). However, every time we boot and start taxing the hard drives, for example by rebuilding the array, the kernel reports drives failing to respond or responding slowly, along with soft and hard retries. The errors come from every disk and every controller card we've tried, in every configuration.
Hardware:
AMD XP1700+
384MB PC2100 RAM
Abit NF7-S Mobo
4x 500GB WD AAKS disks
Silicon Image SI3112 (yellow card)
Silicon Image SI3112 (black card)
Silicon Image SI3114
Silicon Image SI3112 (on board)
We've tried every combination of cards in and out of the machine, with the onboard controller both enabled and disabled, and we always get some nasty errors. The SI3114 is the one that appears to fail silently: no errors are reported back to the OS, but data ends up corrupted (copied files give different MD5 sums from the originals).
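For anyone wanting to repeat the MD5 check, here is a minimal Python sketch that copies files and compares their MD5 sums, and can also scan a whole copied directory for mismatches. The function names are hypothetical, and it only uses the standard library (`hashlib`, `shutil`, `pathlib`).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MiB chunks to keep memory use low."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then report whether both files have the same MD5 sum."""
    shutil.copyfile(src, dst)
    return md5sum(src) == md5sum(dst)


def find_mismatches(src_dir, dst_dir):
    """Relative paths of files under src_dir whose copy under dst_dir
    is missing or has a different MD5 sum."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    bad = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if not dst.is_file() or md5sum(src) != md5sum(dst):
            bad.append(str(rel))
    return bad


if __name__ == "__main__":
    # Small self-check in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "src"), Path(tmp, "dst")
        src.mkdir()
        dst.mkdir()
        (src / "sample.bin").write_bytes(b"abc")
        print(copy_and_verify(src / "sample.bin", dst / "sample.bin"))  # True
        print(find_mismatches(src, dst))  # []
```

One caveat: reading a file back straight after writing it may be served from the OS page cache rather than from the disk, which can hide corruption on the controller path. Using files larger than the machine's 384MB of RAM, or dropping the cache as root on Linux (`sync; echo 3 > /proc/sys/vm/drop_caches`) before re-checking, forces a real read from disk.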
I've really grown to dislike Silicon Image based cards over the years, but I don't want to jump to the conclusion that they're the source of all our problems. If anyone can suggest anything at all that might help, that would be absolutely excellent.
All comments, suggestions or whatever are very welcome and appreciated.