Background:
I have a brand new Thecus N3200PRO with three WD15EADS 1.5 TB Green drives installed in a RAID-5 configuration (2.8 TB, 128K stripe size). I immediately upgraded the firmware to version 1.00.03a. I installed the DropBear SSH daemon, the SYSINFO module and DLM2 (Download Manager V2).
My system is connected to a Gigabit switch with Jumbo Frames turned off. It is plugged into the switch on the WAN port of the 3200pro.
I have also tested my system by connecting my 3200pro and my computer directly to each other using a Cat5e crossover cable and using static IP addresses on the LAN port on the 3200pro.
I have shut down every service I could. I even used my SSH access to shut down the CUPS daemon, then I disabled SSH, DLM2, Nsync, UPnP, FTP and EVERYTHING else except SMB, as I need to use the 3200pro as the main storage for the Windows systems on my network.
Please assume I have a very strong understanding of Linux, bash shell scripting and networking fundamentals. I am not a Sys Admin (I am an Oracle Data Architect if you must know and I work with Unix every day) but I do know my way around many different flavors of Unix.
Problem:
My 3200Pro seems to have problems sustaining a transfer of large files, or any transfer involving a large amount of data (gigabytes at a time, files of various sizes). I am using Windows XP on the client computer. I have tried the NIC at both 100 Mb/s and 1000 Mb/s. I have tried two different desktops and one laptop, all running Windows XP, and the same problem exists on each. I even tried Cygwin, using scp to push and pull files to and from the Thecus, but I still get the same results.
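To pin down exactly where a transfer stalls, something like the following could be run on a client that has Python available. This is only a rough sketch: the function name, the 4 MB chunk size and the 5-second stall threshold are my own assumptions, and because the client OS may buffer writes, the timings are approximate.

```python
import time

def copy_with_stall_log(src, dst, chunk_size=4 * 1024 * 1024, stall_seconds=5.0):
    """Copy src to dst in chunks and return (offset, seconds) for each slow chunk."""
    stalls = []
    offset = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            start = time.monotonic()
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            fout.flush()  # hand the data to the OS; the OS may still buffer it
            elapsed = time.monotonic() - start
            if elapsed >= stall_seconds:
                # Record where in the file the slow chunk started.
                stalls.append((offset, elapsed))
            offset += len(chunk)
    return stalls
```

Pointing dst at a path on the mapped SMB share and printing the returned list would show the byte offsets at which the copy hung and for how long.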
What I see is that, all of a sudden, the load average on the system spikes. Once it goes over 1.0 (it goes as high as 2.9), the file transfer freezes. The three HDD lights on the front stop flashing and only the HDD3 light stays on solid. This continues for long periods of time. Sometimes, if I wait long enough (more than 5 minutes or so), the system picks back up and continues transferring files, only to freeze again a few minutes later. Whenever the freeze occurs, the telltale signs are the single, solid light on HDD3 and a load average over 1.0. This is consistent and reproducible.
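To line the freezes up against the load average, the contents of /proc/loadavg can be captured over SSH at intervals and checked afterwards. The sketch below assumes the standard Linux /proc/loadavg layout (1-, 5- and 15-minute averages first); the function names are my own, and the sample lines are made up for illustration apart from the 2.9 peak I mentioned.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])

def over_threshold(samples, threshold=1.0):
    """Return the indexes of samples whose 1-minute load exceeds the threshold."""
    return [i for i, s in enumerate(samples) if parse_loadavg(s)[0] > threshold]

# Hypothetical captures, one line per sample.
samples = [
    "0.42 0.30 0.25 1/85 1234",
    "2.90 1.10 0.60 3/87 1301",
    "0.95 1.00 0.70 1/85 1350",
]
print(over_threshold(samples))
```

Each flagged index can then be matched against the time the HDD3 light went solid.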
I have looked at my smartctl output and the drives look fine. dmesg gives me nothing to write home about. I am fairly sure the drives themselves are not the issue, although I think their size may be a contributing factor.
Suspicions and Theories:
I think my problem may be due to the 500MHz AMD Geode processor. Maybe calculating the parity for a 3 TB RAID-5 on the fly is too much for this processor and memory configuration, and I may be asking too much of this device. I am prepared to accept that (although I would be very pissed, as I purchased drives from the Thecus-blessed list of drive models).
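For anyone unfamiliar with what "calculating the parity" involves: with three drives, each RAID-5 stripe holds two data chunks plus one parity chunk that is the XOR of the two. The toy sketch below only illustrates that idea; it is not how the kernel's RAID code is implemented, and the function name and chunk contents are made up.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

chunk_a = b"pictures"  # data chunk on one drive (made-up contents)
chunk_b = b"ofmykids"  # data chunk on a second drive
parity = xor_blocks(chunk_a, chunk_b)  # what gets written to the third drive

# If the drive holding chunk_a fails, it can be rebuilt from the other two.
rebuilt_a = xor_blocks(parity, chunk_b)
assert rebuilt_a == chunk_a
```

Every write to the array means the CPU has to produce or update a parity chunk like this, which is the work I suspect the Geode is struggling with.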
My hypothesis gained more weight when I started shutting down other modules and processes. When I shut down extraneous processes that were putting a load on the system, the file transfers ran longer between freezes. I still get the freezes, but at much longer intervals, and the probability that a file transfer continues (rather than dies) has increased as well.
I am hoping someone else has had a similar problem and figured out a work-around or a solution. I have been struggling to load data onto the device, and have only managed to load about 200GB over the past week or so after repeated attempts.
Suggestions/Next Steps:
So I am looking for suggestions on what to do or try next. One thought was to upgrade to a beta version of the firmware (M3800/N3200PRO beta firmware V1.00.05.1); however, if my problem truly is a CPU limitation for the drive size that I have, I don't know how this could help. I am afraid to try a beta firmware as I am not 100% sure I can go back down to 1.00.03a again.
Should I consider downgrading the firmware to 1.00.02, the version that came with the Thecus? Is it even possible to downgrade?
If the system does have the memory and horsepower to manage a 2.8TB software RAID-5, then could this be a kernel issue or an issue with SMB?
I do have a need for redundant storage with a modicum of fault tolerance. Another thought I had was to run two of the WD15EADS in a 1.5 TB RAID-1 configuration to store my more precious documents (pictures of the kids and my movies) and then mount the third 1.5 TB drive as a standalone volume with no redundancy, to store files I could tolerate losing to a drive failure. My only problem with this is that the UI doesn't seem to allow you to create two separate mount points. I think if you set up a RAID-1, the third drive gets stuck as a hot spare and is not addressable. I could always just hack Samba manually and create my own mount point, but this would be a last resort.
I am grasping at straws (and pulling my hair out of my head).
I could really use some help/guidance here.
Thank you.