ESX 3.X and Thecus N5200PRO
I'm having trouble getting a Thecus N5200 PRO to work with VMware ESX 3.X using iSCSI. Is there anyone else out there who has the same problem?
I have an HP server that runs ESX on its internal drives. My plan was to use the N5200 as storage for my virtual machines over iSCSI. I have no problem connecting ESX to the N5200 through iSCSI. The problem begins after I install a Windows virtual machine on the N5200: when I reboot it for the first time, it won't boot. It hangs a few seconds into the boot sequence. I've tried all kinds of configurations and settings, but nothing helps. It always hangs at the same spot, right after the Windows loading screen comes up and shows a progress bar. I've tried different versions of Windows, but it makes no difference. If I take the same virtual machine and put it on the local disk instead of the N5200, it works fine. I can't put my finger on the problem, and it's starting to be a bit of a pain. :surrender:
Re: ESX 3.X and Thecus N5200PRO
I realize this is MUUUUCH later than when you posted, but I encountered the same problem and went through a lot of roundabout work to find a solution.
The problem lies in how Windows generates SCSI requests and how the ESX software iSCSI initiator handles them. I could go into a technical discussion, but to keep it short and sweet: the software initiator thinks it's getting back bad packets because the CRC32C data digests on the SCSI responses it examines don't match what it expects, so it marks the payload bad and goes through an ugly re-request cycle.
The workaround (until an actual fix is added to ESX in a patch) is to disable the data digest, i.e. the checksum on the payload data. On your ESX host, edit /etc/vmkiscsi.conf, add or uncomment a line that reads "DataDigest=Never", then restart the iSCSI initiator (or reboot).
Note that for non-Windows guests, this is not required as they don't exhibit the same behavior.
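If you'd rather script the edit than make it by hand, here is a minimal Python sketch. It assumes /etc/vmkiscsi.conf is a simple line-oriented Key=Value file in which `#` starts a comment; that has not been checked against every ESX build, and the function name `set_data_digest_never` is made up for illustration. It rewrites any existing (possibly commented-out) DataDigest line to DataDigest=Never and appends one if none exists.

```python
import re

# Matches an active or commented-out DataDigest line, keeping its indentation.
# Assumes a plain "Key=Value" format with "#" comments (not verified for all builds).
_DIGEST_LINE = re.compile(r"^(\s*)#?\s*DataDigest\s*=.*$")


def set_data_digest_never(conf_text):
    """Return conf_text with every DataDigest line set to DataDigest=Never.

    If no DataDigest line exists at all, one is appended at the end.
    """
    lines = conf_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        match = _DIGEST_LINE.match(line)
        if match:
            lines[i] = match.group(1) + "DataDigest=Never"
            found = True
    if not found:
        lines.append("DataDigest=Never")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "# vmkiscsi.conf\n#DataDigest=Always\nHeaderDigest=Never\n"
    print(set_data_digest_never(sample), end="")
```

To use it, read the file, pass its contents through the function, write the result back (keep a backup copy), then restart the initiator or reboot as described above.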
Hopefully this is helpful even though I'm responding months later. =)
Re: ESX 3.X and Thecus N5200PRO
Oh my god! You're my hero. You'll be in my will for sure. It works like a charm now. Thanks a million :D
Re: ESX 3.X and Thecus N5200PRO
No problem! =) There are more issues with using the N5200B Pro with ESX, but once you get everything worked out, I've found it works really well.
Post some contact info in your profile and I can get in touch with you directly to help you out some more if you'd like.
Re: ESX 3.X and Thecus N5200PRO
Quote (originally posted by JSVM):
No problem! =) There are more issues with using the N5200B Pro with ESX, but once you get everything worked out, it works really well I've found.
Post some contact info in your profile and I can get in touch with you directly to help you out some more if you'd like.
Hi JSVM
You seem to have managed to get ESX to talk to the N5200 Pro, something I have been struggling with for a few days now. I was hoping you might be able to give me some helpful hints.
(I posted the following on VMware Communities without any response; perhaps you know what I'm on about.)
------
Hi all,
I'm having some grief getting the iSCSI software adapter to play nice with my Thecus N5200 Pro. I'm including some links to my website with screenshots of what I am seeing.
The server is a Sun X2200M2 with 2x AMD 2218 processors and 8GB RAM. VI 3.5 server is installed on the internal SAS drive and working nicely.
I've got all my gear on the same internal test subnet 192.168.2.0/24.
The server has 4 NICs. eth0 is configured at 192.168.2.200, and the virtual switch was auto-configured on that IP. The server hostname is schnitzel.tobyshouse.com (though that doesn't actually resolve). There is no DNS on the local subnet.
I created a new VMkernel port on 192.168.2.205 and also a Service Console port on 192.168.2.206, all on vSwitch1. I believe this is needed to make iSCSI happy. I then edited the properties of the iSCSI Software Adapter: enabled it and let it pick an iSCSI name and alias. Under Dynamic Discovery, I added the IP of the iSCSI target, 192.168.2.130 (default port). After some searching, it adds that target to the Send Targets list. However, it does not show the name of the target (volume?) in the Static Discovery section. (Shouldn't it show up there after auto-discovery?) CHAP authentication is off, as I have not enabled it on the target.
See config here:
tobyport.com/iscsi/Picture22.jpg
tobyport.com/iscsi/Picture23.jpg
tobyport.com/iscsi/Picture24.jpg
tobyport.com/iscsi/Picture25.jpg
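Since everything is meant to sit on 192.168.2.0/24, one quick sanity check is that the VMkernel port, the Service Console port, and the target really do share that subnet, so the software initiator can reach the target without routing. Below is a minimal sketch using Python's standard `ipaddress` module with the addresses from the post; the labels and the helper name `off_subnet` are my own. It only checks addressing; it cannot catch firewall rules, VLANs, or cabling problems.

```python
import ipaddress

# Addresses from the post above; the /24 is the test subnet it describes.
SUBNET = ipaddress.ip_network("192.168.2.0/24")
ADDRESSES = {
    "eth0 / default vSwitch": "192.168.2.200",
    "VMkernel port (vSwitch1)": "192.168.2.205",
    "Service Console (vSwitch1)": "192.168.2.206",
    "N5200 iSCSI target": "192.168.2.130",
}


def off_subnet(addresses, subnet):
    """Return the labels of any addresses that fall outside `subnet`."""
    return [label for label, ip in addresses.items()
            if ipaddress.ip_address(ip) not in subnet]


if __name__ == "__main__":
    problems = off_subnet(ADDRESSES, SUBNET)
    if problems:
        print("Not on", SUBNET, "->", ", ".join(problems))
    else:
        print("All addresses are on", SUBNET)
```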
Also, I am expecting (perhaps wrongly) that in the Details view of the iSCSI Software adapter I can see Targets=1 once the target is discovered/configured. Yet I do not:
tobyport.com/iscsi/Picture21.jpg
Now, to the target side. It's an N5200 Pro from Thecus.
Its hostname is n5200.tobyshouse.com (again, doesn't resolve) and its IQN is iqn.2008-01.com.tobyshouse:RAID.iscsi0.vg0.scuzzy
Interestingly, it does seem to see the VMware ESX 3.5 server: after the initiator probes the IP of the target, I get confirmation from the N5200 that the VMware box is talking to it (it shows the initiator IQN in the Initiator Information window, with State set to 'active').
tobyport.com/iscsi/Picture26.jpg
The N5200 is running firmware 2.00.04 apparently
Verified iSCSI initiators:
Windows: Microsoft initiator v2.0.4
StarPort initiator 3.5.2
Linux: open-iscsi 2.0-865
I believe I read elsewhere on the forums that people have managed to get this to work with ESX in the past.
I would expect the iSCSI volume to show up in the Storage configuration section once everything is working, but it does not at this stage.
All help and pointers greatly appreciated.
Regards
Toby
----
(End post from vmware communities)
Re: ESX 3.X and Thecus N5200PRO
Have you tried esxcfg-rescan on the iSCSI vmhba from the service console? Does a LUN show up?
Incidentally, there's another issue which you won't encounter until you try to create multiple iSCSI LUNs. ESX, on a given HBA, requires unique LUN numbers, even if the LUNs are on different ports/targets. Because Thecus' scripts generate the iSCSI config with every LUN as LUN 0 (just on different SCSI device numbers), ESX will only see the last one it scans. I had to build a new firmware (after reverse-engineering the firmware, including the way it's encrypted) with patched scripts that set a unique LUN number for each LUN.
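To illustrate the numbering fix: in an IET-style ietd.conf, each Target stanza lists its LUNs as `Lun N Path=...,Type=...` lines. The Python sketch below keeps one LUN counter running across all targets instead of restarting at 0 for each, so every LUN the ESX HBA scans gets a distinct number. This is not Thecus' actual script; the function name, IQNs, and device paths are hypothetical, and `Type=fileio` is an assumption (use whatever type the firmware already emits).

```python
def render_ietd_conf(targets):
    """Render IET-style Target/Lun stanzas with LUN numbers unique across targets.

    `targets` maps each target IQN to the list of device paths it exports.
    A single counter runs across all targets, instead of each target
    starting again at LUN 0.
    """
    lines = []
    next_lun = 0
    for iqn, paths in targets.items():
        lines.append(f"Target {iqn}")
        for path in paths:
            # Type=fileio is an assumption, not taken from the Thecus firmware.
            lines.append(f"    Lun {next_lun} Path={path},Type=fileio")
            next_lun += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical IQNs and device paths, for illustration only.
    print(render_ietd_conf({
        "iqn.2008-01.com.example:raid.iscsi0": ["/dev/vg0/iscsi0"],
        "iqn.2008-01.com.example:raid.iscsi1": ["/dev/vg0/iscsi1"],
    }), end="")
```

With two targets of one LUN each, this emits Lun 0 and Lun 1 rather than two LUN 0 entries.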
Re: ESX 3.X and Thecus N5200PRO
Quick suggestion -- once you see your LUN listed after running esxcfg-rescan (with your iSCSI vmhba), then do esxcfg-vmhbadevs to find the Linux SCSI device mappings in the service console.
Once you've done that, use fdisk to create a single large partition on the LUN, and change the partition type to FB (hex).
Then run "/etc/init.d/mgmt-vmware restart", log back in with the vpxclient/VIC, and try to create a new datastore.
If that fails, run vmkfstools -C vmfs3 -S NewVMFSName vmhba##:#:#:1, where the #s are the appropriate numbers for your iSCSI target device.
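If you're doing this for several LUNs, the device lookup and the final command can be scripted. Below is a minimal Python sketch, assuming `esxcfg-vmhbadevs` prints one `vmhbaA:T:L  /dev/sdX` pair per line (check this against your own output before relying on it). It only builds the vmkfstools argument list shown in the post, with partition 1 being the FB partition created with fdisk; the function names and the sample adapter numbers are hypothetical.

```python
import re

# Assumed esxcfg-vmhbadevs line format: "vmhbaA:T:L   /dev/sdX"
_LINE = re.compile(r"^(vmhba\d+:\d+:\d+)\s+(/dev/\S+)$")


def parse_vmhbadevs(output):
    """Map each vmhbaA:T:L path to its service-console device node."""
    mapping = {}
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def vmkfstools_argv(vmhba_path, label, partition=1):
    """Build 'vmkfstools -C vmfs3 -S <label> vmhbaA:T:L:P' as an argument list."""
    return ["vmkfstools", "-C", "vmfs3", "-S", label, f"{vmhba_path}:{partition}"]


if __name__ == "__main__":
    # Hypothetical sample output; adapter numbers differ between hosts.
    sample = "vmhba0:0:0     /dev/sda\nvmhba32:0:0    /dev/sdb\n"
    print(parse_vmhbadevs(sample))
    print(" ".join(vmkfstools_argv("vmhba32:0:0", "NewVMFSName")))
```

Building an argument list rather than a single string keeps the datastore label as one argument if you later pass it to `subprocess.run`.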
Re: ESX 3.X and Thecus N5200PRO
Thank you, I will try this when I get to the site later in the week and let you know.
Re: ESX 3.X and Thecus N5200PRO
Any updates? Added you to MSN but haven't seen you on. =) Let me know how it goes.
Re: ESX 3.X and Thecus N5200PRO
Hey JSVM - interesting results.
I powered the environment back up and was surprised to find it in a slightly different state than where I had left it, i.e., the initiator was no longer visible in the N5200's iSCSI options. Curious, I thought. I removed the whole iSCSI volume and re-added it. Still no dice.
I then remembered I had never configured the ESX firewall, apart from unloading it on a regular basis. So I unloaded it again and removed the target IP from the software initiator. I restarted once more for good measure and unloaded the firewall again. I then re-added the IP of the N5200 in the Targets section, and the N5200 saw the initiator once more, so I was back where I had previously been. Then I used the VIC to rescan for volumes and, to my great surprise, the iSCSI target showed up. Yay! Back under the Storage tab, it also showed my pretty iSCSI target volume. I formatted it as VMFS3 and away I went. For good measure, I updated the firewall to allow swISCSIClient and rebooted the ESX server. It now sees the target without me having to unload the firewall. Great success.
My best guess is that at one point while rebooting ESX I had forgotten to unload the firewall, so it could no longer see the target. I only have the one iSCSI LUN, so I doubt the LUN numbering issue was the cause of my grief.
Will apply the same rules I learned here to the production system which I will be building in a few days.
Thanks again for your help JSVM.
Re: ESX 3.X clustering and Thecus N5200PRO
Hey JSVM -
Thanks a million for your advice. This thread has been very helpful.
Have you had any luck clustering ESX on the Thecus N5200 Pro? I'm trying, and it seems I can only have one host connected at a time. I thought I had read something about a connection limit in Thecus' implementation of iSCSI, but I can't find the details again.
Have you had any such problems? Any thoughts?
Re: ESX 3.X and Thecus N5200PRO
Quote (originally posted by JSVM):
The workaround (until an actual fix gets added to ESX via a patch) is to disable hashing of payload data. On your ESX host, edit /etc/vmkiscsi.conf, and add or uncomment a line to say "DataDigest=Never", then restart the iSCSI initiator (or reboot).
I figured out that this definitely isn't required any longer with 2.04. I thought it was at first, but I discovered some other issues that I'm still trying to work out. (I'll post a follow-up when I know for sure.)
My main issue now is that I can't connect two machines (VMware ESX 3.5) to the same iSCSI LUN at the same time. I've tried editing the ietd.conf to give it 16 max connections and set InitialR2T to "yes" (per some instructions I found) but no go. If anyone can supply a working ietd.conf for reference, that would be much appreciated.
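For reference, per-target settings in an IET-style ietd.conf are written as `Name Value` lines inside the Target stanza. The Python sketch below renders such a stanza with the two settings mentioned (MaxConnections 16, InitialR2T Yes); the IQN comes from earlier in the thread, the device path and the function name `render_target` are hypothetical, and this is not a known-working config. One caveat worth checking: in the iSCSI specification (RFC 3720), MaxConnections is negotiated per session, so raising it may not be what stops a second ESX host, which opens its own session.

```python
def render_target(iqn, luns, params):
    """Render one IET-style Target stanza.

    `luns` is a list of (lun_number, device_path) pairs; `params` maps
    per-target parameter names to values, written as 'Name Value' lines.
    """
    lines = [f"Target {iqn}"]
    for lun, path in luns:
        # Type=fileio is an assumption, not taken from the Thecus firmware.
        lines.append(f"    Lun {lun} Path={path},Type=fileio")
    for name, value in params.items():
        lines.append(f"    {name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_target(
        "iqn.2008-01.com.tobyshouse:RAID.iscsi0.vg0.scuzzy",
        [(0, "/dev/vg0/iscsi0")],  # hypothetical device path
        {"MaxConnections": 16, "InitialR2T": "Yes"},
    ), end="")
```

Keep in mind that the Thecus scripts regenerate ietd.conf, so a hand-written stanza like this will not survive the next reconfiguration from the management interface.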
Re: ESX 3.X and Thecus N5200PRO
It's only required if you're running *WINDOWS* guests. I verified that you have to manually disable DataDigest on GA builds of ESX 3.0.2 and 3.5 -- however there was a patch developed for 3.5 that may have hit GA now, I'm not sure.
Regarding editing ietd.conf -- remember that every time you restart or make changes to the arrays a script will re-generate ietd.conf. Editing it manually will do no good unless you manually restart the iSCSI target daemon -- and even then it will get overwritten the next time you do anything that configures stuff in the Thecus' management.
I wrote a firmware patch (firmwares are encrypted cloops that are loaded from a partition that's not mounted during normal operation) to deal with editing the script that generates ietd.conf.
Re: ESX 3.X and Thecus N5200PRO
Oh, and as of right now we're running a small Lab Manager cluster on the N5200B Pro.
Re: ESX 3.X and Thecus N5200PRO
I was finally able to get everything working by simply reinstalling the two VMware ESX 3.5 servers. I'm not 100% sure why this was the fix, but the prevailing theories are that my problems resulted from upgrading from 3.01 to 3.02 to 3.5, OR from the order in which I configured iSCSI on the ESX side. Regardless, it DOES work now, although there are some pretty important caveats for those interested in using the Thecus 5200 series as an iSCSI target for ESX clustering:
The version of the iSCSI target (known as IET, the iSCSI Enterprise Target) in the 2.0.4 firmware is not new enough to support a key feature called "reserve and release". This feature is reportedly important for keeping a cluster stable by preventing nodes from stepping on each other. I have had no such problems yet, but hopefully Thecus will move to IET version 0.4.15 in 2.0.5. (The current version is 0.4.12, which definitely does not include the reserve/release code unless they have patched it.)
Now has anyone seen any good threads on iSCSI performance optimization on the Thecus? ;-)
Re: ESX 3.X and Thecus N5200PRO
Quote (originally posted by JSVM):
Have you tried esxcfg-rescan on the iSCSI vmhba from the service console? Does a LUN show up?
Incidentally, there's another issue which you won't encounter until you try to create multiple iSCSI LUNs. ESX, on a given HBA, requires unique LUN numbers, even if the LUNs are on different ports/targets. Because Thecus' scripts generate the iSCSI config with every LUN as LUN 0 (just on different SCSI device numbers), ESX will only see the last one it scans. I had to build a new firmware (after reverse-engineering the firmware, including the way it's encrypted) with patched scripts that set a unique LUN number for each LUN.
Is this patch something you can share? If so, my email address is cwindomsr at hotmail dot com.
I am currently having the same issues trying to get the N5200B Pro to work with ESX Server.
Thanks In Advance,
Charles A. Windom Sr.
cwindomsr at hotmail dot com