lenny regression initrd/lvm/ rootfs detection timeout - Debian



Thread: lenny regression initrd/lvm/ rootfs detection timeout

  1. lenny regression initrd/lvm/ rootfs detection timeout


    Hi,
    after upgrading an FSI RX/300 from etch to lenny the machine would not
    boot anymore. It got stuck in the initrd, not being able to find the
    root filesystem. The cause was that the aacraid driver took too long to
    make the root disk available. Thus the boot timed out while the initrd
    was waiting for the root filesystem to become available. After more
    than 45 seconds the root disk (sda on an aacraid) became available, but
    the boot failed anyway, dropping into the initrd shell. The cause was
    that the root is on an LVM volume on that disk, and the LVM detection
    does not get retried after more disks become available.

    I got the machine to boot by running /scripts/local-top/lvm2, which made
    the root filesystem in the LVM available, and pressing ctrl-d to
    continue booting.

    I think after more disks become available the initrd should retry
    running the LVM detection, otherwise a lot of LVM based systems might
    die/get stuck on upgrade.

    I'd consider this an RC bug - no clue whose fault this is though ...

    Flo
    --
    Florian Lohoff flo@rfc822.org +49-171-2280134
    Those who would give up a little freedom to get a little
    security shall soon have neither - Benjamin Franklin



  2. Re: lenny regression initrd/lvm/ rootfs detection timeout

    On Tue, Oct 07, 2008 at 03:02:05PM +0200, maximilian attems wrote:
    > Subject: Re: lenny regression initrd/lvm/ rootfs detection timeout
    >
    > On Tue, Oct 07, 2008 at 02:20:13PM +0200, Florian Lohoff wrote:
    > >
    > > Hi,
    > > after upgrading an FSI RX/300 from etch to lenny the machine would not
    > > boot anymore. It got stuck in the initrd not beeing able to find the
    > > root filesystem. The cause was that the aacraid took too long to make
    > > the root filesystem available. Thus the boot timed out and the initrd
    > > waited for the root filesystem to get available. After some seconds >45
    > > the root disks (sda on an aacraid) got available but the boot failed
    > > anyway dropping into the initrd. The cause was that the root is an lvm
    > > which is on that disk and the lvm does not get retried after more disks
    > > get available.
    > >
    > > I got the machine to boot by running /scripts/top-local/lvm2 which made
    > > the root filesystem in the lvm available and ctrl-d to continue booting.
    > >
    > > I think after more disks get available the initrd should retry running
    > > the lvm detection otherwise a lot of lvm based systems might die/get
    > > stuck on upgrade.
    > >
    > > I'd consider this a RC bug - no clue whose fault this is though ...

    >
    > standard answer boot with
    > rootdelay=X


    It worked with etch without that parameter and the upgrade did not add
    it, so it's a lenny regression - isn't it?

    And in my case it was a remote reboot where the machine did not come
    back - so I needed to go there physically - I was lucky, as I tested
    with a machine next door and not one 400km away ...

    Flo
    --
    Florian Lohoff flo@rfc822.org +49-171-2280134
    Those who would give up a little freedom to get a little
    security shall soon have neither - Benjamin Franklin



  3. Re: lenny regression initrd/lvm/ rootfs detection timeout

    On Tue, Oct 07, 2008 at 02:20:13PM +0200, Florian Lohoff wrote:
    >
    > Hi,
    > after upgrading an FSI RX/300 from etch to lenny the machine would not
    > boot anymore. It got stuck in the initrd not beeing able to find the
    > root filesystem. The cause was that the aacraid took too long to make
    > the root filesystem available. Thus the boot timed out and the initrd
    > waited for the root filesystem to get available. After some seconds >45
    > the root disks (sda on an aacraid) got available but the boot failed
    > anyway dropping into the initrd. The cause was that the root is an lvm
    > which is on that disk and the lvm does not get retried after more disks
    > get available.
    >
    > I got the machine to boot by running /scripts/top-local/lvm2 which made
    > the root filesystem in the lvm available and ctrl-d to continue booting.
    >
    > I think after more disks get available the initrd should retry running
    > the lvm detection otherwise a lot of lvm based systems might die/get
    > stuck on upgrade.
    >
    > I'd consider this a RC bug - no clue whose fault this is though ...


    standard answer: boot with
    rootdelay=X
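
A minimal sketch of how `rootdelay` might be made persistent on a lenny-era system. It assumes GRUB legacy with a `# kopt=` line in `/boot/grub/menu.lst`; the helper name `add_rootdelay` and the value 60 used below are illustrative and not taken from the thread.

```shell
#!/bin/sh
# add_rootdelay FILE SECONDS
# Appends rootdelay=SECONDS to the "# kopt=" line of a GRUB legacy menu.lst,
# unless that line already carries a rootdelay= parameter.
# Hypothetical helper for illustration only.
add_rootdelay() {
    file=$1
    secs=$2
    tmp="$file.tmp.$$"
    # Leave the file alone if a rootdelay is already configured
    if grep -q '^# kopt=.*rootdelay=' "$file"; then
        return 0
    fi
    # Rewrite through a temp file, then copy back to keep the file's mode
    sed "s/^# kopt=.*/& rootdelay=$secs/" "$file" > "$tmp" \
        && cat "$tmp" > "$file" \
        && rm -f "$tmp"
}
```

Under these assumptions it would be called as `add_rootdelay /boot/grub/menu.lst 60`, followed by `update-grub` to regenerate the kernel stanzas. A suitable value depends on how slowly the controller presents its disks.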


    --
    To UNSUBSCRIBE, email to debian-boot-REQUEST@lists.debian.org
    with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

  4. Re: lenny regression initrd/lvm/ rootfs detection timeout

    On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
    > > > standard answer boot with
    > > > rootdelay=X

    > >
    > > It worked with etch without that parameter and the upgraded did not add
    > > it so its a lenny regression - isnt it?

    >
    > no it was just luck that it didn't hit you previously.
    > kernel gives no guarantee on timing.


    This renders the "rootdelay=" argument moot - if the kernel gives no
    guarantee on timing, ANY rootdelay works just by luck. So coming back
    to this issue, I still consider this a bug - when a block device becomes
    available, the LVM code needs to scan it in case the rootfs is on LVM.

    Based on your statement, finding the rootfs in the initrd needs to be
    triggered by events rather than waited for.

    Flo
    --
    Florian Lohoff flo@rfc822.org +49-171-2280134
    Those who would give up a little freedom to get a little
    security shall soon have neither - Benjamin Franklin



  5. Re: lenny regression initrd/lvm/ rootfs detection timeout

    On Tue, Oct 07, 2008 at 03:05:47PM +0200, Florian Lohoff wrote:
    > On Tue, Oct 07, 2008 at 03:02:05PM +0200, maximilian attems wrote:
    > > Subject: Re: lenny regression initrd/lvm/ rootfs detection timeout
    > >
    > > On Tue, Oct 07, 2008 at 02:20:13PM +0200, Florian Lohoff wrote:
    > > >
    > > > Hi,
    > > > after upgrading an FSI RX/300 from etch to lenny the machine would not
    > > > boot anymore. It got stuck in the initrd not beeing able to find the
    > > > root filesystem. The cause was that the aacraid took too long to make
    > > > the root filesystem available. Thus the boot timed out and the initrd
    > > > waited for the root filesystem to get available. After some seconds >45
    > > > the root disks (sda on an aacraid) got available but the boot failed
    > > > anyway dropping into the initrd. The cause was that the root is an lvm
    > > > which is on that disk and the lvm does not get retried after more disks
    > > > get available.
    > > >
    > > > I got the machine to boot by running /scripts/top-local/lvm2 which made
    > > > the root filesystem in the lvm available and ctrl-d to continue booting.
    > > >
    > > > I think after more disks get available the initrd should retry running
    > > > the lvm detection otherwise a lot of lvm based systems might die/get
    > > > stuck on upgrade.
    > > >
    > > > I'd consider this a RC bug - no clue whose fault this is though ...

    > >
    > > standard answer boot with
    > > rootdelay=X

    >
    > It worked with etch without that parameter and the upgraded did not add
    > it so its a lenny regression - isnt it?


    no it was just luck that it didn't hit you previously.
    kernel gives no guarantee on timing.




  6. Re: lenny regression initrd/lvm/ rootfs detection timeout

    On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
    > On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
    > > > > standard answer boot with
    > > > > rootdelay=X
    > > >
    > > > It worked with etch without that parameter and the upgraded did not add
    > > > it so its a lenny regression - isnt it?

    > >
    > > no it was just luck that it didn't hit you previously.
    > > kernel gives no guarantee on timing.

    >
    > This renders the argument with "rootdelay=" moot - When the kernel gives
    > no guarantee on timing ANY rootdelay works just by luck. So coming back
    > to this issue i consider this still a bug - When a block device comes
    > available the lvm code needs to scan it in case the rootfs is an lvm.
    >
    > The whole issue with finding the rootfs in the initrd needs to be
    > triggered and not waited for base on the statement of yours.


    too late for such changes, and no, that is not the solution either.

    if you read the release notes for etch you will find a rootdelay
    chapter; it has probably been copied over to lenny.



  7. Re: lenny regression initrd/lvm/ rootfs detection timeout

    maximilian attems writes:

    > On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
    >> On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
    >>>>> standard answer boot with
    >>>>> rootdelay=X
    >>>>
    >>>> It worked with etch without that parameter and the upgraded did not add
    >>>> it so its a lenny regression - isnt it?
    >>>
    >>> no it was just luck that it didn't hit you previously.
    >>> kernel gives no guarantee on timing.

    >>
    >> This renders the argument with "rootdelay=" moot - When the kernel gives
    >> no guarantee on timing ANY rootdelay works just by luck. So coming back
    >> to this issue i consider this still a bug - When a block device comes
    >> available the lvm code needs to scan it in case the rootfs is an lvm.
    >>
    >> The whole issue with finding the rootfs in the initrd needs to be
    >> triggered and not waited for base on the statement of yours.

    >
    > too late for such changements and no that is not the solution either.


    Care to elaborate why not? The principle surely sounds better:
    instead of fragile arbitrary delays, wait until the event we're
    interested in happens. Actually, I always dreamed of a system where
    every dependency is encoded as udev rules, and the boot process only
    has to wait for the root device to appear. And I'm stuck now with a
    problem even rootdelay can't help: local-top/iscsi finishes before
    /dev/sda appears, so local-top/lvm has nothing to activate. And
    there's no rootdelay in between...
    --
    Thanks,
    Feri.



  8. Re: lenny regression initrd/lvm/ rootfs detection timeout

    This one time, at band camp, Ferenc Wagner said:
    > maximilian attems writes:
    >
    > > On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
    > >> On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
    > >>>>> standard answer boot with
    > >>>>> rootdelay=X
    > >>>>
    > >>>> It worked with etch without that parameter and the upgraded did not add
    > >>>> it so its a lenny regression - isnt it?
    > >>>
    > >>> no it was just luck that it didn't hit you previously.
    > >>> kernel gives no guarantee on timing.
    > >>
    > >> This renders the argument with "rootdelay=" moot - When the kernel gives
    > >> no guarantee on timing ANY rootdelay works just by luck. So coming back
    > >> to this issue i consider this still a bug - When a block device comes
    > >> available the lvm code needs to scan it in case the rootfs is an lvm.
    > >>
    > >> The whole issue with finding the rootfs in the initrd needs to be
    > >> triggered and not waited for base on the statement of yours.

    > >
    > > too late for such changements and no that is not the solution either.

    >
    > Care to elaborate why not? The principle surely sounds better:
    > instead of fragile arbitrary delays, wait until the event we're
    > interested in happens. Actually, I always dreamed of a system where
    > every dependency is encoded as udev rules, and the boot process only
    > has to wait for the root device to appear. And I'm stuck now with a
    > problem even rootdelay can't help: local-top/iscsi finishes before
    > /dev/sda appears, so local-top/lvm has nothing to activate. And
    > there's no rootdelay in between...


    (I have nothing to do with any of the affected software, just a comment
    as an interested person).

    If modprobe returns before the device is actually initialized and has
    created sysfs entries, this is probably not fixable in shell scripts.
    If, as I suspect, modprobe does not return immediately, this is probably
    a bug in the scripts that don't call udevsettle and wait for the sysfs
    entries to be turned into block devices for the next script to act on.
    Ferenc, since you are affected, can you test?

    Cheers,
    --
    -----------------------------------------------------------------
    | ,''`. Stephen Gran |
    | : :' : sgran@debian.org |
    | `. `' Debian user, admin, and developer |
    | `- http://www.debian.org |
    -----------------------------------------------------------------



  9. Re: lenny regression initrd/lvm/ rootfs detection timeout

    Stephen Gran writes:

    > This one time, at band camp, Ferenc Wagner said:
    >> maximilian attems writes:
    >>
    >>> On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
    >>>> On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
    >>>>>>> standard answer boot with
    >>>>>>> rootdelay=X
    >>>>>>
    >>>>>> It worked with etch without that parameter and the upgraded did not add
    >>>>>> it so its a lenny regression - isnt it?
    >>>>>
    >>>>> no it was just luck that it didn't hit you previously.
    >>>>> kernel gives no guarantee on timing.
    >>>>
    >>>> This renders the argument with "rootdelay=" moot - When the kernel gives
    >>>> no guarantee on timing ANY rootdelay works just by luck. So coming back
    >>>> to this issue i consider this still a bug - When a block device comes
    >>>> available the lvm code needs to scan it in case the rootfs is an lvm.
    >>>>
    >>>> The whole issue with finding the rootfs in the initrd needs to be
    >>>> triggered and not waited for base on the statement of yours.
    >>>
    >>> too late for such changements and no that is not the solution either.

    >>
    >> Care to elaborate why not? The principle surely sounds better:
    >> instead of fragile arbitrary delays, wait until the event we're
    >> interested in happens. Actually, I always dreamed of a system where
    >> every dependency is encoded as udev rules, and the boot process only
    >> has to wait for the root device to appear. And I'm stuck now with a
    >> problem even rootdelay can't help: local-top/iscsi finishes before
    >> /dev/sda appears, so local-top/lvm has nothing to activate. And
    >> there's no rootdelay in between...

    >
    > If modprobe returns before the device is actually initialized and has
    > created sysfs entries, this is probably not fixable in shell scripts.
    > If, as I suspect, modprobe does not return immediately, this is probably
    > a bug in the scripts that don't call udevsettle and wait for the sysfs
    > entries to be turned into block devices for the next script to act on.


    It isn't always a modprobe issue. For example, iSCSI can create new
    block devices long after the modules are loaded, depending on network
    delays (it may not be the case here, I'm not sure what iscsistart
    does). But you can also think of CONFIG_SCSI_SCAN_ASYNC... Events can
    arrive at any time (e.g. when you plug in your pendrive); udevsettle
    can only wait for the event queue to empty, not for future events.

    > Ferenc, since you are affected, can you test?


    Using udevsettle (udevadm settle) instead of sleep? Sure, but only
    tomorrow. Actually, that may be the best fix for the open-iscsi
    initramfs script, if iscsistart provides the timing guarantees the
    kernel does not. Thanks for the suggestion!
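
The substitution discussed here, waiting for the udev event queue instead of a fixed sleep, could look roughly like the sketch below. The function name `settle_udev` and the 30-second default are assumptions; `udevadm settle --timeout=` is the real interface, and the fallback to the older standalone `udevsettle` is assumed to accept the same option.

```shell
#!/bin/sh
# settle_udev [TIMEOUT]
# Waits until the udev event queue is empty, or TIMEOUT seconds have passed.
# Prefers "udevadm settle", falls back to the older "udevsettle" binary,
# and only sleeps for a fixed time if neither tool is present.
# Hypothetical helper; it cannot wait for events that have not been
# queued yet, which is the limitation raised in this thread.
settle_udev() {
    timeout=${1:-30}
    if command -v udevadm >/dev/null 2>&1; then
        udevadm settle --timeout="$timeout"
    elif command -v udevsettle >/dev/null 2>&1; then
        udevsettle --timeout="$timeout"
    else
        sleep "$timeout"
    fi
}
```

This only helps if the step before it (iscsistart in this case) has already caused the relevant events to be queued by the time it returns.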
    --
    Cheers,
    Feri.



  10. Re: lenny regression initrd/lvm/ rootfs detection timeout

    Ferenc Wagner writes:

    > Stephen Gran writes:
    >
    >> Ferenc, since you are affected, can you test?

    >
    > Using udevsettle (udevadm settle) instead of sleep?


    That seems to work for me in this case. Thanks for the tip!
    --
    Cheers,
    Feri.



  11. Re: lenny regression initrd/lvm/ rootfs detection timeout

    On Tue, Oct 07, 2008 at 07:04:09PM +0100, Stephen Gran wrote:
    > If modprobe returns before the device is actually initialized and has
    > created sysfs entries, this is probably not fixable in shell scripts.
    > If, as I suspect, modprobe does not return immediately, this is probably
    > a bug in the scripts that don't call udevsettle and wait for the sysfs
    > entries to be turned into block devices for the next script to act on.
    > Ferenc, since you are affected, can you test?


    The point is that an easy fix would be to rescan LVM devices on timeout
    instead of just checking whether the root device node exists.

    If the LVM PVs are not available at the time of the LVM scan but come
    up later, the whole boot process stops.

    So instead of going the "right" way of using some kind of udev trigger,
    one could, as a quick fix, rerun the LVM start script on the timeouts,
    which would solve the case of a logical volume as root on a late
    block device.
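
The quick fix described above might be sketched as a polling loop that reruns a rescan command on every pass instead of only testing for the device node. Everything here is hypothetical: the function name `wait_for_root`, its arguments, and the choice of rescan command are illustrations, not the actual initramfs-tools code.

```shell
#!/bin/sh
# wait_for_root DEV TRIES RESCAN [INTERVAL]
# Polls for DEV up to TRIES times, running the RESCAN command on each pass
# so that volume groups on late-appearing disks still get activated.
# Returns 0 as soon as DEV exists, 1 if it never shows up.
# Hypothetical sketch; it tests plain existence (-e), where a real
# script might insist on a block device (-b).
wait_for_root() {
    dev=$1
    tries=$2
    rescan=$3
    interval=${4:-1}
    while [ "$tries" -gt 0 ]; do
        # Skip the rescan if the device is already there
        [ -e "$dev" ] && return 0
        # Left unquoted on purpose so "lvm vgchange -ay" splits into words
        $rescan >/dev/null 2>&1
        [ -e "$dev" ] && return 0
        sleep "$interval"
        tries=$((tries - 1))
    done
    [ -e "$dev" ]
}
```

With an assumed volume group name, a call could look like `wait_for_root /dev/mapper/vg0-root 180 "lvm vgchange -ay"`. This remains polling rather than the event-driven approach argued for earlier in the thread, but it covers the case where the physical volumes appear after the first LVM scan.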

    Flo
    --
    Florian Lohoff flo@rfc822.org +49-171-2280134
    Those who would give up a little freedom to get a little
    security shall soon have neither - Benjamin Franklin


