Linux expertise required please!
An interesting and tricky problem, but one I really need to resolve.
The system is FC 12 with two 1.5 TB drives, sda and sdb.
The drives are partitioned identically: sda1 is about 200 MB and sda2 is the rest. sda1 is reserved for /boot.
The system was originally built with the drives partitioned during installation. sda2 and sdb2 form a RAID 1 array, created with mdadm, as the device md0. LVM is then layered on md0 with one volume group called system. The filesystems /, /home and /var are separate logical volumes, so they appear as /dev/mapper/system-root, /dev/mapper/system-home and /dev/mapper/system-var, and the filesystem in use is ext3.
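The /dev/mapper names follow device-mapper's convention: volume group and logical volume names are joined with a single hyphen, and any hyphen inside either name is doubled so the separator stays unambiguous. A minimal Python sketch of that naming rule (dm_name is an illustrative helper, not part of LVM):

```python
def dm_name(vg: str, lv: str) -> str:
    """Return the /dev/mapper path device-mapper uses for an LVM logical volume.

    Hyphens inside the VG or LV name are doubled; the single hyphen
    between them separates the two names.
    """
    def escape(name: str) -> str:
        return name.replace("-", "--")

    return f"/dev/mapper/{escape(vg)}-{escape(lv)}"


for lv in ("root", "home", "var"):
    print(dm_name("system", lv))
```

For the volume group in this thread that gives /dev/mapper/system-root, /dev/mapper/system-home and /dev/mapper/system-var.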
And it all works, or it did until sdb failed very early on. The conclusion is that it was damaged in transit: it was very badly packed.
Long story short: I replaced sdb with a new drive and duplicated the partition table and MBR using dd if=/dev/sda of=/dev/sdb bs=512 count=1, i.e. copying the first 512 bytes from sda to sdb. The drive was then added back to the array and it rebuilt itself.
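For reference, the 512-byte sector that dd command copies is the MBR: 446 bytes of boot code, four 16-byte partition entries starting at byte 446, and the 0x55 0xAA signature in the last two bytes. A minimal Python sketch that decodes the partition entries from such a sector, for example one saved with dd if=/dev/sdb of=mbr.bin bs=512 count=1 (mbr.bin is only an example file name):

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table starts right after the boot code
ENTRY_SIZE = 16      # four 16-byte primary partition entries


def parse_mbr(sector: bytes) -> list:
    """Decode the used primary partition entries of an MBR sector.

    Returns a list of (bootable, type_id, start_lba, sector_count) tuples.
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for i in range(4):
        off = TABLE_OFFSET + i * ENTRY_SIZE
        status = sector[off]          # 0x80 = active (bootable) flag
        type_id = sector[off + 4]     # e.g. 0x83 Linux, 0xfd Linux raid autodetect
        start_lba, count = struct.unpack_from("<II", sector, off + 8)
        if type_id != 0:              # type 0x00 marks an empty slot
            entries.append((status == 0x80, type_id, start_lba, count))
    return entries


def read_mbr(path: str) -> list:
    """Read the first sector of a device or image file and decode it."""
    with open(path, "rb") as f:
        return parse_mbr(f.read(SECTOR_SIZE))
```

Run against the first sectors of both disks, the two lists should be identical if the dd copy worked.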
As it stood, sdb was not bootable, so I ran grub on sdb and copied the files in /boot to sdb1 via a temporary mount point, /boot2. All worked.
So, to test the system, I failed sda2 with mdadm /dev/md0 --fail /dev/sda2 and swapped the disks over. The system boots, but during the initial start-up sequence, while the system is running from the ramdisk, I get a problem with the filesystem check.
This isn't the exact output (I don't have the system to hand), but it was something like: fsck - /system/mapper-var clean
but then the line underneath says fail and it refuses to boot, dropping into a command prompt.
Logging into that as root (the only option) and running fsck manually gives a missing superblock error, but the filesystem itself is shown as clean. Running fsck -f to force a check shows no errors, and the filesystem can be mounted manually. However, trying to change to runlevel 3 with init 3 fails because the system can't write a lock file to /var.
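When init cannot write a lock file under /var, one quick thing to check from the maintenance shell is whether /var is mounted at all and, if so, whether it is read-only. /proc/mounts lists one mount per line (device, mount point, type, options, dump, pass), so a short Python sketch can report that:

```python
def mount_state(mounts_text: str, mountpoint: str) -> str:
    """Return 'rw', 'ro' or 'not mounted' for mountpoint, given /proc/mounts text.

    If the mount point appears more than once, the last (topmost) entry wins.
    """
    state = "not mounted"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")
            state = "ro" if "ro" in options else "rw"
    return state


def check_live(mountpoint: str = "/var") -> str:
    """Check the running system (Linux only)."""
    with open("/proc/mounts") as f:
        return mount_state(f.read(), mountpoint)
```

A "not mounted" or "ro" result for /var would be consistent with the lock-file failure; mount -o remount,rw / is the usual way to make the root filesystem writable for repair work.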
With the drives swapped back again, the system boots normally.
So, adding a third drive, sdc, and repeating the same setup gives me an array of sdb2 and sdc2, with sda failed and not part of it. But booting off either sdb or sdc gives the same error, while booting off sda, even though sda2 is not part of the array, boots normally.
Anyone have any ideas why, and how to fix it? Reinstalling is not an option, or only one of very last resort.
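With a third disk in play it is easy to lose track of which members the kernel has actually assembled on a given boot. /proc/mdstat shows this directly: one line per array listing its members (failed ones carry an (F) suffix), followed by a status line such as [2/1] [_U] for a two-disk mirror running on one disk. A minimal Python sketch that summarises that file, assuming the common layout shown in the comments:

```python
import re

# Header line of an active array, e.g. "md0 : active raid1 sdb2[1] sda2[0](F)"
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*active\s+(raid\d+)\s+(.*)$")
# Status line, e.g. "1464931136 blocks [2/1] [_U]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text: str) -> dict:
    """Summarise active arrays from /proc/mdstat text.

    Arrays whose header differs from the 'active raidN' form
    (inactive, auto-read-only) are skipped in this sketch.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, level, members = header.groups()
            current = name
            arrays[name] = {
                "level": level,
                "members": members.split(),
                # members marked (F) have been failed, e.g. with mdadm --fail
                "failed": [m for m in members.split() if m.endswith("(F)")],
            }
            continue
        status = STATUS_RE.search(line)
        if status and current:
            wanted, working = int(status.group(1)), int(status.group(2))
            arrays[current].update(
                wanted=wanted,
                working=working,
                flags=status.group(3),
                degraded=working < wanted,
            )
            current = None
    return arrays
```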
Re: Linux expertise required please!
Did you run grub-install /dev/sdb after you allowed md to rebuild your /boot array?
Re: Linux expertise required please!
Can't remember if it was before or after (the boot partition isn't part of the array)
Code:
grub> root (hd1,1)
grub> install (hd1)
But it is booting, at least up to when udev starts and looks for the filesystems, so it is reading, loading and running initrd and vmlinuz off the boot partition.
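GRUB legacy numbers both BIOS disks and partitions from zero, so /dev/sdb1 is (hd1,0) and /dev/sdb2 is (hd1,1). Since /boot is the first partition in this layout, the root (hd1,1) line may be worth double-checking against that. The hdN number also follows BIOS order at boot time, so the disk that is hd1 now becomes hd0 once it is moved into the first slot. A minimal Python sketch of the naming rule (grub_legacy_name is an illustrative helper; by default it assumes sda, sdb, ... are enumerated in BIOS order):

```python
import re


def grub_legacy_name(dev: str, bios_order=None) -> str:
    """Translate a Linux disk/partition name into GRUB legacy notation.

    GRUB legacy counts disks and partitions from 0, so /dev/sdb1 -> (hd1,0).
    bios_order is an optional list of disks in BIOS boot order, e.g.
    ["sdb", "sda"]; by default sda is assumed to be hd0, sdb hd1, and so on.
    """
    m = re.fullmatch(r"/dev/(sd[a-z])(\d*)", dev)
    if m is None:
        raise ValueError(f"unsupported device name: {dev}")
    disk, part = m.groups()
    index = bios_order.index(disk) if bios_order else ord(disk[-1]) - ord("a")
    if not part:
        return f"(hd{index})"                  # whole disk, where the boot loader goes
    return f"(hd{index},{int(part) - 1})"      # partition numbers start at 0


print(grub_legacy_name("/dev/sdb1"))                  # the /boot partition
print(grub_legacy_name("/dev/sdb2"))                  # the RAID member
print(grub_legacy_name("/dev/sdb1", ["sdb", "sda"]))  # once sdb boots first
```

A commonly used GRUB legacy recipe for making the second disk bootable in the first slot is to map it explicitly with device (hd0) /dev/sdb, then run root (hd0,0) and setup (hd0), so that the installed boot loader refers to hd0.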
Re: Linux expertise required please!
Could you paste the output of fdisk -l /dev/sd{a,b} && mdadm -D /dev/md{0,1} && pvs && vgs && lvs from your live environment, please?
Re: Linux expertise required please!
OK
Code:
Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00070b3a
Device Boot Start End Blocks Id System
/dev/sda1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 26 182401 1464931201 fd Linux raid autodetect
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00070b3a
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sdb2 26 182401 1464931201 fd Linux raid autodetect
/dev/md0:
Version : 0.90
Creation Time : Tue Jan 1 03:09:23 2002
Raid Level : raid1
Array Size : 1464931136 (1397.07 GiB 1500.09 GB)
Used Dev Size : 1464931136 (1397.07 GiB 1500.09 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue May 18 06:40:05 2010
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 8141778e:ee097228:bfe78010:bc810f04
Events : 0.299700
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
mdadm: cannot open /dev/md1: No such file or directory
PV VG Fmt Attr PSize PFree
/dev/md0 system lvm2 a- 1.36T 163.86G
VG #PV #LV #SN Attr VSize VFree
system 1 8 0 wz--n- 1.36T 163.86G
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
home system -wi-ao 11.72G
photos1 system -wi-ao 300.00G
photos2 system -wi-ao 300.00G
photos3 system -wi-ao 300.00G
photos4 system -wi-ao 300.00G
root system -wi-ao 14.65G
swap system -wi-ao 1.95G
var system -wi-ao 4.88G
There is only one RAID array, md0 (sda2 and sdb2). The creation time of 2002 is because I hadn't reset the system clock when I installed the OS about six weeks ago.
Can't see anything obviously wrong with the output, though. :(
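Two details in that output can be checked by hand. fdisk's cylinder here is 255 heads * 63 sectors * 512 bytes = 8,225,280 bytes (16,065 sectors), and sda1's 204,800 one-KiB blocks are 409,600 sectors, which is not a whole number of cylinders. That is why fdisk prints the "does not end on cylinder boundary" note. The md array is also slightly smaller than sda2 because a version 0.90 superblock is kept in a reserved area at the end of each member: the member size is rounded down to a multiple of 64 KiB and then 64 KiB is subtracted. A short Python sketch of both calculations, using the figures from the output above:

```python
HEADS, SECTORS_PER_TRACK, SECTOR_BYTES = 255, 63, 512


def cylinder_remainder(blocks_kib: int) -> int:
    """Sectors left over after splitting a partition (fdisk 1 KiB blocks) into whole cylinders."""
    sectors = blocks_kib * 1024 // SECTOR_BYTES
    return sectors % (HEADS * SECTORS_PER_TRACK)


def md090_usable_kib(partition_kib: int) -> int:
    """Usable size of a member with v0.90 metadata: round down to 64 KiB, then reserve 64 KiB."""
    reserved = 64
    return (partition_kib // reserved) * reserved - reserved


print(HEADS * SECTORS_PER_TRACK * SECTOR_BYTES)  # bytes per cylinder: 8225280
print(cylinder_remainder(204800))                # sda1: 7975 sectors past the last full cylinder
print(md090_usable_kib(1464931201))              # sda2: 1464931136, matching Array Size
```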
Re: Linux expertise required please!
Hi PeterB,
Not what you want to hear, but this is why I swapped to OpenSolaris and ZFS. I had all sorts of problems with Linux Software RAID. ZFS is IMO much better implemented.
Re: Linux expertise required please!
So sda1 and sdb1 are non-mirrored boot partitions? Does /dev/sdb1 have all the files that /dev/sda1 has on it? Did you build the FS on /dev/sdb1 yourself? Did you put the /boot label on the fs so the mounter can see it? What does your fstab look like? Personally I never put root on LVM, as you have to get LVM active before you can mount it.
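On the fstab question: Fedora installers usually refer to /boot in fstab by UUID= (older ones by LABEL=) rather than by /dev/sda1. A filesystem made with mkfs on the new sdb1 gets a fresh UUID and no label unless one is set (e2label sets the label, tune2fs -U sets a UUID), so such an entry would match nothing once sda is removed, and the boot-time fsck of fstab entries could then fail. A minimal Python sketch that lists fstab entries whose LABEL= or UUID= matches no known filesystem; the known labels and UUIDs are passed in as plain sets (on a real system they would come from something like blkid):

```python
def unresolved_fstab_entries(fstab_text: str, labels: set, uuids: set) -> list:
    """Return (spec, mountpoint) for fstab lines whose LABEL=/UUID= matches nothing known.

    labels and uuids describe the filesystems actually present. Entries given
    as plain device paths, such as /dev/mapper/system-var, are not checked.
    """
    known_uuids = {u.lower() for u in uuids}
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue                          # malformed line
        spec, mountpoint = fields[0], fields[1]
        if spec.startswith("LABEL=") and spec[len("LABEL="):] not in labels:
            missing.append((spec, mountpoint))
        elif spec.startswith("UUID=") and spec[len("UUID="):].lower() not in known_uuids:
            missing.append((spec, mountpoint))
    return missing
```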
Re: Linux expertise required please!
Quote:
Originally Posted by
b0redom
Hi PeterB,
Not what you want to hear, but this is why I swapped to OpenSolaris and ZFS. I had all sorts of problems with Linux Software RAID. ZFS is IMO much better implemented.
Not really an option in this case, sadly. mdadm seems to work OK apart from this glitch (although there have been problems, judging by the second link below).
Quote:
Originally Posted by
oolon
So sda1 and sdb1 are non-mirrored boot partitions? Does /dev/sdb1 have all the files that /dev/sda1 has on it? Did you build the FS on /dev/sdb1 yourself? Did you put the /boot label on the fs so the mounter can see it? What does your fstab look like? Personally I never put root on LVM, as you have to get LVM active before you can mount it.
sdb1 was mounted as /boot2 and I did
cp -r *
to copy the files over. The partition table was copied using dd on the first sector.
LVM starts from the initrd, and the boot stalls AFTER mounting the real filesystem, during the FS integrity check. fstab looks fine (I can't get to it at the moment, but I'll look into it this evening).
Basically I followed this guide
http://forums.fedoraforum.org/showth...ighlight=mdadm
There seems to be quite a debate here:
http://forums.fedoraforum.org/showth...ht=RAID+1+boot
Re: Linux expertise required please!
Output of tune2fs -l /dev/sdb1 would be useful
I notice you didn't post that fstab I asked for.
You may need to do this depending on what is in your fstab.
e2label /dev/sdb1 /boot
cp -r does not copy symlinks and there is one in /boot
Any references to hd0 in /boot2/grub/grub.conf?
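On the copy itself: GNU cp's -r normally copies symlinks inside the tree as links rather than following them, but the shell glob * does not match hidden (dot) files, and plain cp -r does not keep ownership or timestamps (cp -a does). Either way, it is easy to check whether /boot2 really matches /boot. A minimal Python sketch that compares two trees, comparing symlinks by target (so a missing menu.lst -> ./grub.conf link would show up) and regular files byte for byte:

```python
import filecmp
import os


def tree_differences(src: str, dst: str) -> list:
    """Return paths (relative to src) that are missing or different under dst.

    Hidden files are included, symlinks are compared by their target text
    instead of being followed, and regular files are compared byte for byte.
    Extra files that exist only under dst are not reported.
    """
    problems = []
    for root, dirs, files in os.walk(src):     # does not descend into symlinked dirs
        rel_root = os.path.relpath(root, src)
        for name in dirs + files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            a, b = os.path.join(src, rel), os.path.join(dst, rel)
            if os.path.islink(a) or os.path.islink(b):
                # both must be symlinks pointing at the same target
                if not (os.path.islink(a) and os.path.islink(b)) or os.readlink(a) != os.readlink(b):
                    problems.append(rel)
            elif not os.path.lexists(b):
                problems.append(rel)
            elif os.path.isdir(a) != os.path.isdir(b):
                problems.append(rel)
            elif os.path.isfile(a) and not filecmp.cmp(a, b, shallow=False):
                problems.append(rel)
    return sorted(problems)


# Example with the paths from this thread: print(tree_differences("/boot", "/boot2"))
```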
Re: Linux expertise required please!
Hmmm, thinking about it, when you boot up without disk 0 present, is the second disk then named /dev/sda? In that case grub is referencing the wrong place, as it will be hd0, not hd1, in failed mode. You can check this with dmesg.
Re: Linux expertise required please!
Quote:
Originally Posted by
oolon
Output of tune2fs -l /dev/sdb1 would be useful
I notice you didn't post that fstab I asked for.
You may need to do this depending on what is in your fstab.
e2label /dev/sdb1 /boot
cp -r does not copy symlinks and there is one in /boot
Any references to hd0 in /boot2/grub/grub.conf?
I can't get to the machine until this evening (no SSH-capable machine until then), but IIRC it all looked as I would expect.
Yes, hd0 is referenced in /boot2 (it was a copy of grub.conf from sda1). However, your comment (I've highlighted it) has got me wondering whether the symlink is missing (although then I wouldn't expect it to boot at all off that drive when it is connected as sda).
Quote:
Originally Posted by
oolon
Hmmm, thinking about it, when you boot up without disk 0 present, is the second disk then named /dev/sda? In that case grub is referencing the wrong place, as it will be hd0, not hd1, in failed mode. You can check this with dmesg.
No, I physically remove the drive and reconnect the other one in its place so it becomes sda, so cloning the contents of /boot to it would (and from memory does) reference the 'right' hd.
Re: Linux expertise required please!
I don't think it's that important; I just wanted to highlight it, because if you had changed grub.conf it could be the problem.
{boot}/grub
lrwxrwxrwx 1 root root 11 Feb 5 13:37 menu.lst -> ./grub.conf
Re: Linux expertise required please!
Quote:
Originally Posted by
peterb
I can't get to the machine until this evening (no SSH-capable machine until then), but IIRC it all looked as I would expect.
If it's just software you need, try PuTTY Portable: http://portableapps.com/apps/internet/putty_portable
Re: Linux expertise required please!
Quote:
Originally Posted by
watercooled
Looks interesting!
Re: Linux expertise required please!
PuTTY and WinSCP are great tools. PuTTY handles keys and SSH agents and, more importantly, also does port forwarding and can work through HTTPS proxies.
Re: Linux expertise required please!
WinSCP won't let you run any terminal commands that require a keypress ("Are you sure? Press Y/n to continue", etc.).
It's a royal PITA when you realise this just after you press Return and everything locks up.