HELP: Linux 2.6 system suddenly fails to boot
I am ripping my hair out over a Linux 2.6 system that has worked reliably for a number of years but suddenly failed to boot
this morning.
What happens is that the boot sequence gets to the point where it does swapon, and then can't find the swap partition. The
subsequent attempt to mount the primary partition readonly appears to work, but when it tries to go read/write, it fails with the error:
fsck.ext2: No such file or directory while trying to open /dev/sda3
It then prints out a message about the superblock possibly being corrupt and suggesting I run e2fsck against it. It doesn't drop
into single-user mode; it just processes a few more boot script instructions and hangs...apparently forever.
The really weird thing is this: I keep a backup Slackware partition on this same computer, and when I boot into it everything works
fine. More importantly, I can mount and browse the partitions I normally use (which are /dev/sda3 and /dev/sda4 in my normal boot
sequence, but are /dev/hdc3 and /dev/hdc4 when running under Slackware).
Most importantly of all, e2fsck reports no errors of any kind on the primary boot partition, not even bad blocks (I did the
non-destructive read/write test). I ran it against those partitions from within Slackware.
What the heck is going on?? How can a partition have a clean bill of health when run under one distribution and fail to mount when
booted under a different distribution on the same hardware??
FYI, my primary distro is LinuxFromScratch.
Any help would be greatly appreciated as I am totally dead in the water here, and am facing having to rebuild my entire system for
no reason that I can see.
Re: HELP: Linux 2.6 system suddenly fails to boot
On Fri, 25 Jan 2008 15:02:00 -0800, Mark Olbert wrote:
> What happens is that the boot sequence gets to the point where it does
> swapon, and then can't find the swap partition.
Sounds like the /dev directory has not been populated. /dev should have a
tmpfs mounted on it; that is, a chunk of kernel memory is formatted like a
file system and mounted as if it were a device. I think I remember that this
usually happens during the initrd phase. (Ask if you don't know what initrd is.)
As I remember it, the tmpfs is first created and populated during initrd,
then it is remounted under the actual root after initrd ends and
/etc/inittab processing has begun. Inittab usually runs rc.sysinit. Check the
contents of this file (/etc/rc.d/rc.sysinit) to see what is supposed to
happen after initrd processing ends.
Looking closer at the sequence of events around this may bring you one
step closer to the source of your trouble.
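One way to test this from the working Slackware boot is to mount the failing root partition read-only and look at what its on-disk /dev actually contains. A minimal sketch, assuming the failing root is the partition Slackware sees as /dev/hdc3 and that /mnt/lfs is a free mount point (both are assumptions); check_nodes is a hypothetical helper, not a standard tool:

```shell
# check_nodes ROOT NODE...
# Report whether each named node exists in ROOT/dev.
# Returns non-zero if any node is missing.
check_nodes() {
    root=$1
    shift
    missing=0
    for node in "$@"; do
        if [ -e "$root/dev/$node" ]; then
            echo "present: /dev/$node"
        else
            echo "MISSING: /dev/$node"
            missing=1
        fi
    done
    return $missing
}

# Example (run as root from the backup system; paths are assumptions):
#   mount -o ro /dev/hdc3 /mnt/lfs
#   check_nodes /mnt/lfs sda3 sda4
#   umount /mnt/lfs
```

If the nodes are missing from the on-disk /dev, then the boot relies on something creating them at boot time, and that step is the one to look at. If they are present, a tmpfs mounted over /dev would still hide them, so the question becomes what populates the tmpfs.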
> The subsequent attempt
> to mount the primary partition readonly appears to work, but when it
> tries to go read/write, it fails with the error:
> fsck.ext2: No such file or directory while trying to open /dev/
> It then prints out a message about the superblock possibly being corrupt
> and suggesting I run e2fsck against it.
Error messages are often not to the point.
Using the second boot option, you can access the initrd image of the
failing partition. Unpack it into a temporary directory, and explore it
to find the program it runs. (As things change, I never remember which is
which... Is there a file /init in the initrd image?)
The initrd image, as far as I recall, is a gzipped cpio image.
cd /tmp/initrd; gunzip -c /path/to/initrd | cpio -id
Once you have the exact list of actions, it is far easier to spot the
first anomaly in the sequence of messages on the boot screen.
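Something like the following can pull that list of actions out of a startup script. It is only a rough sketch: list_boot_actions is a hypothetical helper, the keyword list is just a starting point, and the example paths assume the image was unpacked into /tmp/initrd and that its startup script is called init, which may not be the case.

```shell
# list_boot_actions SCRIPT
# Print, with line numbers, the lines of SCRIPT that touch device nodes,
# mounts, swap, or filesystem checks.
list_boot_actions() {
    grep -nE 'mknod|udev|mount|swapon|fsck' "$1"
}

# Example (paths are assumptions):
#   list_boot_actions /tmp/initrd/init
#   list_boot_actions /etc/rc.d/rc.sysinit
```

Comparing the numbered lines against the order of messages on the boot screen shows which step is the first to go wrong.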