SuSE 10.0 Something broke: /dev/hd* and friends no longer get created, boot fails - Linux


  1. SuSE 10.0 Something broke: /dev/hd* and friends no longer get created, boot fails

    Hello.

    Last night I put a new I/O board in my machine, which means I had to
    boot for the first time in about two months. Something happened where I
    can no longer boot.

    A little while ago I added a rule file to /etc/udev/rules.d (a
    99-something which attempted to set permissions on /dev/ttyS0) but never
    tested it across a boot.

    First, rest assured I reversed the hardware and udev change, so my
    system should be the same as it was before. When getting ready to make
    the change I did a proper shutdown etc.

    I was negligent in three respects: I didn't keep up with my backups
    (most recent is a week or so ago), I never boot-tested my udev rules
    change, and I've been periodically running YOU updating everything it
    suggests including the kernel and whatnot but I hadn't been rebooting
    the machine to ensure all is well. So this problem could be caused by
    something that happened or something I did as long as two months ago and
    I didn't run across it until now.

    Here is my machine configuration:

    - 2.4(?) GHz P4, 1GB RAM, NVidia video, 10/100+USB1.1+1394 combo card,
    Audigy 2, USB2+FW combo card, Promise 20269-based IDE card, two hard
    disks (hda and hdg, both WD 120GB), LG dvd/cd writer, IDE zip drive
    (Dell Dimension 8200 with a couple peripheral changes)
    - Boot on hda1, swap on hda2 and hdg2, root on md0=hda3+hdg3, /home on
    md1=hda4+hdg4. The hdg1 partition is mounted on /altboot; I was going
    to rsync /boot onto it but I never got around to it.
    - All filesystems are ext3
    - Was running KDE with the NVidia driver

    When I shut down to add the new board one thing was a little odd - when
    I logged out of my user KDE session I was dropped to a console prompt
    rather than an xdm screen. I assumed that was simply because I had a
    YOU kernel update that I hadn't booted into yet. I logged in as
    root at the console prompt and executed 'halt'. The system seemed to
    shut down OK at that point (I use the verbose boot, no splash screen).

    I made the hardware change, then powered on. The kernel booted, initrd
    loaded, / passed fsck and was mounted (it forced fsck due to being 63
    days since last fsck). It detected and assembled both raid1 volumes BUT
    fsck failed on hda1 and hdg1. At first I thought "great, disk error or
    something". It dropped me to single-user, and when I tried rerunning
    fsck I realized it failed because /dev did not contain any hd* devices.

    I rebooted into "failsafe" with the same results except this time md0
    and md1 got fsck forced because it claimed 49710 days elapsed since last
    fsck. That makes me uncomfortable, obviously, but the root partition
    (md0) at least seemed to be OK from within single-user.

    It was at this point that I reverted the hardware change and my
    /etc/udev/rules.d/99-foo file (by removing the 99-foo file).

    Right now whether I try to boot into failsafe or normal mode I end up
    with /dev/hd* missing (/dev/md* is there). If I reboot into the same
    mode fsck doesn't get forced; if I switch from normal to failsafe or
    vice versa I get that weird 49710-day fsck (always the same number). It
    also doesn't matter whether I reboot or halt/powerdown then boot.
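
    A side note on the 49710 figure: it matches the number of whole days
    in 2^32 seconds. That would be consistent with a 32-bit time
    difference wrapping around, for example if the clock read earlier than
    the filesystem's last-check timestamp at that point in the boot. This is
    only an assumption, nothing here confirms it. The arithmetic, as a
    small sketch (the function name wrap_days is made up for illustration):

```shell
# Whole days contained in 2^32 seconds, i.e. the full range of an
# unsigned 32-bit seconds counter. 86400 is the number of seconds per day.
wrap_days() {
    echo $(( 4294967296 / 86400 ))
}

wrap_days   # prints 49710
```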

    A few things I was able to find:

    - It appears that /etc/init.d/boot.udev did not get run. I haven't
    figured out yet when it is supposed to run: before or after
    boot.localfs (which is where I end up in the single-user shell).

    - Sometimes udevd is running when I'm in single-user, sometimes not. I
    haven't figured out the pattern yet. As I write this, I booted failsafe
    and am in single-user with udevd running and /dev is missing the hd* files.

    - If I run boot.udev force-reload I get a properly populated /dev.

    - Note: While I'm concentrating on /dev/hd* (especially /dev/hda1)
    missing, I have not checked if that is the only thing missing. As I
    write this, /dev has some files such as tty*, lp*, parport*, ippp*,
    isdn*, console, and the misc devices (zero, mem, null, etc.).

    - /proc and /sys are mounted and appear to be OK. Particularly I
    checked that /sys/block is OK, including /sys/block/hda/hda1.

    - If I cd to /dev and run 'df' I see "-" as the device and "/dev" as the
    mount point (I don't know if that's normal or not).

    - Booting with the installation DVD (OpenSuSE Eval DVD for 10.0) comes
    up to the installation screens OK, but the repair options don't work
    because they can't figure out where my root is. It appears to find hda1 OK.

    - I tried searching google and google-groups for anything related to
    this but the only clue I was able to find was to verify /sys/block. I
    was unable to come up with a search string that produced something
    useful (a common problem with me, unfortunately).
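
    The /sys/block versus /dev comparison from the list above can be
    scripted, which also answers whether hd* is the only thing missing.
    A minimal sketch, assuming each device node is named after its sysfs
    directory (the default when no rule renames it); the function name
    missing_nodes and its two optional arguments are made up for
    illustration:

```shell
# List block devices known to sysfs that have no matching node in /dev.
# Optional arguments: sysfs block directory, device directory.
# Assumption: the node name equals the sysfs directory name.
missing_nodes() {
    sys_block=${1:-/sys/block}
    dev_dir=${2:-/dev}
    for disk in "$sys_block"/*; do
        [ -d "$disk" ] || continue
        # Check the whole disk and each partition directory beneath it.
        for entry in "$disk" "$disk"/*; do
            # Only directories holding a "dev" file (major:minor) are devices.
            [ -f "$entry/dev" ] || continue
            name=$(basename "$entry")
            [ -e "$dev_dir/$name" ] || echo "$name"
        done
    done
}

missing_nodes
```

    On a healthy system this should print nothing. On the broken boot
    described here it would presumably list hda, hda1 and so on, but not
    the md devices.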

    I appreciate any suggestions of what to try or what to look at.
    Hopefully this afternoon I'll have another 10.0 installation on another
    machine I can compare against, at least so I can see what is right and
    what is broken. Obviously I'm most suspicious that my attempt to use
    udev rules to modify ttyS0 permissions royally screwed things up - I'd
    never tried writing a udev rule before. I've reverted the file change
    as I mentioned, but I'd guess if the saved udevdb got messed up maybe
    that's what's wrong. I haven't posted to the udev lists, though; I want
    to see if there might be another reason or suggestion.

    Thanks in advance!

    ken


  2. Re: SuSE 10.0 Something broke: /dev/hd* and friends no longer get created, boot fails

    Ken Ryan wrote:
    > [snip]



    Further investigation shows something really bizarre.

    When I run udevinfo, e.g.

    udevinfo -q all -p /sys/block/hda/hda1

    all the lines look OK *except* the line

    N: ttyS0

    shows up for every device I query. It is also in the /dev/.udevdb files.
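
    To check every device rather than one at a time, the udevinfo output
    can be piped through a small filter that compares the N: line with
    the name the device would normally get. A sketch only: check_name is a
    made-up helper, and it assumes the expected node name is the last
    component of the sysfs path, which holds only if no rule renames the
    device.

```shell
# Read udevinfo-style output on stdin and report when the first "N:" line
# does not match the last component of the sysfs path given as $1.
# Assumption: the default node name equals that last path component.
check_name() {
    expected=$(basename "$1")
    actual=$(sed -n 's/^N: //p' | head -n 1)
    if [ "$actual" != "$expected" ]; then
        echo "MISMATCH $1: udev name is '$actual', expected '$expected'"
    fi
}

# Example with canned input standing in for real udevinfo output:
printf 'N: ttyS0\n' | check_name /sys/block/hda/hda1
```

    With the real tool the call would look like
    "udevinfo -q all -p /sys/block/hda/hda1 | check_name /sys/block/hda/hda1",
    repeated for each directory under /sys/block.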

    I'm certain now that my attempt to write a rule for permissions on ttyS0
    is the cause of this. The question is how do I fix it? I removed the
    rule I wrote, but something is remembering it. I looked around with
    find and grep but I don't know udev and the SuSE boot process well at
    all, so I'm having no luck figuring out where the problem is.

    Again, any tips would be immensely appreciated!

    Thanks...

    ken

