Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7. - Kernel

  1. Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    Hello.

    Some time ago I bisected a commit that was making my OKI Anima 3300 laptop
    hang during boot.

    The offending commit was:

    ------------
    commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68
    Author: Linus Torvalds
    Date: Wed Mar 26 11:22:40 2008 -0700

    Revert "PCI: remove transparent bridge sizing"

    This reverts commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f, which
    caused various interesting problems for people, including wrong resource
    allocations. See for example bugzilla entry "2.6.25-rc2: ohci1394
    problem (MMIO broken)" at

    http://bugzilla.kernel.org/show_bug.cgi?id=10080

    [...]
    ------------

    It happened sometime after 2.6.25-rc7; since then I've been living on
    hand-patched kernels; I can't boot any official kernel from Ubuntu or
    Gentoo without patching them. For Ubuntu this would mean rebuilding the
    LiveCD and then manually patching every kernel update and restricted
    modules distribution...


    I filed this bug:

    http://bugzilla.kernel.org/show_bug.cgi?id=11054

    that seems to be getting no attention at all. It includes detailed lspci
    output.


    The specific bridge that fails (as far as I can tell from putting
    printk()'s into the kernel) is:

    00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
    (prog-if 01 [Subtractive decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
    ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
    SERR- Latency: 0
    Bus: primary=00, secondary=04, subordinate=08, sec-latency=64
    I/O behind bridge: 0000f000-00000fff
    Memory behind bridge: c3000000-c30fffff
    Prefetchable memory behind bridge: fff00000-000fffff
    Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort-
    BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
    PriDiscTmr- SecDiscTmr+ DiscTmrStat- DiscTmrSERREn-
    Capabilities: [b8] Subsystem: Gammagraphx, Inc. Device 0000
    Capabilities: [8c] HyperTransport: MSI Mapping Enable- Fixed-
    Mapping Address Base: 00000000fee00000


    A specific weird detail of the machine (I don't know whether it's
    related at all) is that the onboard ethernet controller is recognized as
    some kind of bridge:

    00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
    Subsystem: CLEVO/KAPOK Computer Device 5403
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
    ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
    SERR- Latency: 0 (250ns min, 5000ns max)
    Interrupt: pin A routed to IRQ 23
    Region 0: Memory at c0007000 (32-bit, non-prefetchable)
    [size=4K]
    Region 1: I/O ports at 30b8 [size=8]
    Capabilities: [44] Power Management version 2
    Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
    PME(D0+,D1+,D2+,D3hot+,D3cold+)
    Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
    Kernel driver in use: forcedeth


    Since this issue means that I either scrap my laptop or suffer a forced
    life of kernel micromanagement no matter which distro I choose, I'm very
    interested in helping with this. I'll try to find the problem myself if
    someone points me to what to look for.


    Regards, and thanks in advance,

    Juan Jesus.


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    On 11/10/2008 10:12 AM, GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
    > Hello.
    >
    > Time ago I bisected a commit that was making my OKI Anima 3300 laptop
    > hang during boot.


    Doesn't pci=norom help in your case? There was a patch which tried to resolve
    this issue in a different manner, but it was reverted too. This boot parameter
    was introduced as a replacement IIRC.

    > The offending commit was:
    >
    > ------------
    > commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68
    > Author: Linus Torvalds
    > Date: Wed Mar 26 11:22:40 2008 -0700
    >
    > Revert "PCI: remove transparent bridge sizing"
    >
    > This reverts commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f, which
    > caused various interesting problems for people, including wrong
    > resource
    > allocations. See for example bugzilla entry "2.6.25-rc2: ohci1394
    > problem (MMIO broken)" at
    >
    > http://bugzilla.kernel.org/show_bug.cgi?id=10080
    >
    > [...]

  3. Re: Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    On Mon, Nov 10, 2008 at 11:14:11AM +0100, Jiri Slaby wrote:
    > On 11/10/2008 10:12 AM, GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
    > > Hello.
    > >
    > > Time ago I bisected a commit that was making my OKI Anima 3300 laptop
    > > hang during boot.
    >
    > Doesn't pci=norom help in your case? There was a patch which tried to resolve
    > this issue in a different manner, but it was reverted too. This boot parameter
    > was introduced as a replacement IIRC.


    As the unfortunate author of both of the reverted patches and author
    of the pci=norom patch I can confirm that Jiri is correct. The issues
    that the patches addressed (with some unintended side effects caused
    by the reverted attempts) were PCI resource allocation failures observed
    during PCI hotplug. We were not seeing or trying to address boot-time
    PCI resource allocation failures or hangs.

    Gary

    --
    Gary Hade
    System x Enablement
    IBM Linux Technology Center
    503-578-4503 IBM T/L: 775-4503
    garyhade@us.ibm.com
    http://www.ibm.com/linux/ltc

  4. RE: Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    Hi again.

    > -----Original Message-----
    > From: Gary Hade [mailto:garyhade@us.ibm.com]
    >
    > > Doesn't pci=norom help in your case? There was a patch which tried to
    > > resolve this issue in a different manner, but it was reverted too.
    > > This boot parameter was introduced as a replacement IIRC.
    >
    > As the unfortunate author of both of the reverted patches and
    > author of the pci=norom patch I can confirm that Jiri is
    > correct. The issues that the patches addressed (with some
    > unintended side effects caused by the reverted attempts) were
    > PCI resource allocation failures observed during PCI hotplug.
    > We were not seeing or trying to address boot-time PCI
    > resource allocation failures or hangs.
    >
    > Gary


    I've tested pci=norom with the Ubuntu 8.10 AMD64 kernel, with no
    effect. I'll try to download and test the 32 bit version too, to check
    whether it has anything to do with the size of the resource_size_t type
    (defined in linux/types.h) being u64. Perhaps it's u32 in a 32 bit
    architecture.

    A problem with this bridge in my lspci info is that both the I/O and the
    prefetchable memory ranges behind the bridge have an end address BELOW the
    start address.


    I/O behind bridge: 0000f000-00000fff
    Memory behind bridge: c3000000-c30fffff
    Prefetchable memory behind bridge: fff00000-000fffff


    I don't know whether these ranges are supposed to encompass the BIOS ROM
    or whatever (and thus your pci=norom option). Another explanation for
    the hang may lie in the definition of resource_size() (defined in
    linux/ioport.h):


    static inline resource_size_t resource_size(struct resource *res)
    {
        return res->end - res->start + 1;
    }


    As you can see, the subtraction will overflow due to the end address
    being BELOW the start address, and the resulting size will be different
    when resource_size_t is u64 than it would be if it's u32.

    Moreover, I suppose that the intended width of the size for the I/O range
    would be u16.

    I mean: see this table of calculations:

                        Size, current u64     Size, u16/u32 clamp
    I/O                 0xFFFFFFFFFFFF2000    0x2000
    Prefetchable mem    0xFFFFFFFF00200000    0x200000


    Where "clamping" means and'ing with 0xffff for u16 clamp or 0xffffffff
    for u32 clamp (at least when the end address is below the start
    address).
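The table above can be reproduced in userspace. The following is a minimal sketch, not kernel code: it only assumes that resource_size() performs `end - start + 1` in whatever unsigned width resource_size_t has, as quoted above. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The resource_size() arithmetic (end - start + 1) evaluated at three
 * unsigned widths. Unsigned arithmetic wraps modulo 2^N, so when end is
 * below start the result depends on the width N of the type.
 * All names here are hypothetical, for illustration only.
 */
static uint64_t size_as_u64(uint64_t start, uint64_t end)
{
    return end - start + 1;
}

static uint32_t size_as_u32(uint32_t start, uint32_t end)
{
    return end - start + 1;
}

static uint16_t size_as_u16(uint16_t start, uint16_t end)
{
    /* Operands are promoted to int; the cast truncates back to 16 bits. */
    return (uint16_t)(end - start + 1);
}

/* Checks the values in the table against the ranges printed by lspci. */
static void check_lspci_ranges(void)
{
    /* I/O behind bridge: 0000f000-00000fff */
    assert(size_as_u64(0xf000, 0x0fff) == 0xFFFFFFFFFFFF2000ULL);
    assert(size_as_u16(0xf000, 0x0fff) == 0x2000);

    /* Prefetchable memory behind bridge: fff00000-000fffff */
    assert(size_as_u64(0xfff00000ULL, 0x000fffffULL) == 0xFFFFFFFF00200000ULL);
    assert(size_as_u32(0xfff00000U, 0x000fffffU) == 0x200000U);

    /* Memory behind bridge: c3000000-c30fffff (a normal range, 1MB) */
    assert(size_as_u64(0xc3000000ULL, 0xc30fffffULL) == 0x100000ULL);
}
```

Calling check_lspci_ranges() from a test program passes silently, which confirms that truncating the u64 result to 16 or 32 bits gives the same values as the "clamp" column.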


    You can see that, regardless of whether there's a ROM in the address range
    or not, the size calculation (at least from inspecting the simple
    arithmetic in resource_size()) seems to be wrong. It produces gigantic
    address range sizes, whereas the clamped values (8KB I/O, 2MB
    prefetchable mem) seem far more sensible.

    I tried to do some preliminary tests by putting conditions inside
    resource_size() to check for "reversed ranges" and clamp the size by
    anding with (res->flags & IORESOURCE_IO) ? 0xffffUL : 0xffffffffUL, but
    to no avail yet. It printed the IO mapping in the kernel messages, but
    it seems to have choked on the memory range??? I have to investigate
    more.
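The experiment described above might look roughly like the following userspace model. This is a sketch under stated assumptions, not the patch that was actually tested: struct sketch_resource and the SKETCH_ flag constants are hypothetical stand-ins for the kernel's struct resource and the IORESOURCE_IO / IORESOURCE_MEM flags from linux/ioport.h, and the message itself reports that this approach did not cure the hang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's resource flags (assumption). */
#define SKETCH_IORESOURCE_IO   0x00000100UL
#define SKETCH_IORESOURCE_MEM  0x00000200UL

/* Hypothetical userspace stand-in for struct resource. */
struct sketch_resource {
    uint64_t start;
    uint64_t end;
    unsigned long flags;
};

/*
 * resource_size() with the experimental check: for a "reversed" range
 * (end below start) keep only the low 16 bits for I/O windows and the
 * low 32 bits for memory windows. Normal ranges are left untouched.
 */
static uint64_t sketch_resource_size(const struct sketch_resource *res)
{
    uint64_t size = res->end - res->start + 1;

    if (res->end < res->start)
        size &= (res->flags & SKETCH_IORESOURCE_IO) ? 0xffffUL : 0xffffffffUL;

    return size;
}

/* Exercises the three windows reported by lspci for bridge 00:10.0. */
static void check_sketch_sizes(void)
{
    struct sketch_resource io  = { 0xf000, 0x0fff, SKETCH_IORESOURCE_IO };
    struct sketch_resource pf  = { 0xfff00000ULL, 0x000fffffULL, SKETCH_IORESOURCE_MEM };
    struct sketch_resource mem = { 0xc3000000ULL, 0xc30fffffULL, SKETCH_IORESOURCE_MEM };

    assert(sketch_resource_size(&io)  == 0x2000);    /* 8KB */
    assert(sketch_resource_size(&pf)  == 0x200000);  /* 2MB */
    assert(sketch_resource_size(&mem) == 0x100000);  /* 1MB, unchanged */
}
```

Masking only changes the reported size. It does not say whether a reversed range should be treated as a real window at all, which is a separate question from the arithmetic.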


    I don't know if what I've written above gives you any clue about my
    issue.


    Regards, and thanks in advance,

    Juan Jesus.

  5. Re: Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
    > resource_size_t type
    > (defined in linux/types.h) being u64. Perhaps it's u32 in a 32 bit
    > architecture.


    Unless you have RESOURCES_64BIT=y which is the default on x86_32 now.

  6. RE: Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    > -----Original Message-----
    > From: Jiri Slaby [mailto:jirislaby@gmail.com]
    >
    > GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
    > > resource_size_t type
    > > (defined in linux/types.h) being u64. Perhaps it's u32 in a 32 bit
    > > architecture.
    >
    > Unless you have RESOURCES_64BIT=y which is the default on x86_32 now.


    Ugh. Knowing this will save me from downloading, burning and testing the
    32 bit Ubuntu distro, whose 2.6.27 kernel will surely use that default
    configuration.

    I'll perform more tests, with more debug printk()'s, anyway.

    Do you remember if the cases Gary's patches were trying to fix included
    ranges rolling past the 32-bit address limit (end address below start
    address)?


    Regards,

    Juan Jesus.

  7. Re: Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    On Tue, Nov 11, 2008 at 12:43:28PM +0100, GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
    > > -----Original Message-----
    > > From: Jiri Slaby [mailto:jirislaby@gmail.com]
    > >
    > > GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
    > > > resource_size_t type
    > > > (defined in linux/types.h) being u64. Perhaps it's u32 in a 32 bit
    > > > architecture.
    > >
    > > Unless you have RESOURCES_64BIT=y which is the default on x86_32 now.
    >
    > Ugh. Knowing this will save me from downloading, burning and testing the
    > 32 bit Ubuntu distro, whose 2.6.27 kernel will surely use that default
    > configuration.
    >
    > I'll perform more tests, with more debug printk()'s, anyway.
    >
    > Do you remember if the cases Gary's patches were trying to fix included
    > ranges rolling past the 32-bit address limit (end address below start
    > address)?


    No, my patches were not trying to fix anything like that.

    I looked at the `lspci -vvv` output for some transparent bridges on
    one of our systems and found that the messed up looking ranges are
    not unique to your system. We also see that on our systems. I checked
    the lspci code and found that the displayed ranges are based on base
    and limit register values obtained directly from PCI config space
    for the device. I also noticed that with -vvv it will display values even
    if they do not represent valid ranges such as you might expect for the
    contents of base and limit registers on transparent bridges. This
    caused me to peek at the lspci man page which says:
    "-vvv Be even more verbose and display everything we are able
    to parse, even if it doesn’t look interesting at all
    (e.g., undefined memory regions)."
    You must be getting some of that "even if it doesn't look interesting
    at all" stuff.
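To illustrate how base and limit register contents turn into the ranges lspci prints, here is a minimal sketch. It assumes the standard PCI-to-PCI bridge (type 1) header encoding and ignores the optional upper 16/32-bit registers. The register values used in the check are inferred from the ranges in the earlier lspci output, not read from the actual machine, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: a decoded forwarding window. */
struct bridge_window {
    uint32_t base;
    uint32_t limit;
    bool     enabled;   /* false when base is above limit */
};

/* I/O Base/Limit: 8-bit registers, bits 7:4 hold address bits 15:12. */
static struct bridge_window decode_io_window(uint8_t base_reg, uint8_t limit_reg)
{
    struct bridge_window w;

    w.base    = (uint32_t)(base_reg & 0xf0) << 8;
    w.limit   = ((uint32_t)(limit_reg & 0xf0) << 8) | 0xfff;
    w.enabled = w.base <= w.limit;
    return w;
}

/* Memory Base/Limit: 16-bit registers, bits 15:4 hold address bits 31:20. */
static struct bridge_window decode_mem_window(uint16_t base_reg, uint16_t limit_reg)
{
    struct bridge_window w;

    w.base    = (uint32_t)(base_reg & 0xfff0) << 16;
    w.limit   = ((uint32_t)(limit_reg & 0xfff0) << 16) | 0xfffff;
    w.enabled = w.base <= w.limit;
    return w;
}

/* Register values below are inferred from the printed ranges (assumption). */
static void check_decoded_windows(void)
{
    struct bridge_window io  = decode_io_window(0xf0, 0x00);
    struct bridge_window mem = decode_mem_window(0xc300, 0xc300);
    struct bridge_window pf  = decode_mem_window(0xfff0, 0x0000);

    assert(io.base == 0x0000f000 && io.limit == 0x00000fff && !io.enabled);
    assert(mem.base == 0xc3000000 && mem.limit == 0xc30fffff && mem.enabled);
    assert(pf.base == 0xfff00000 && pf.limit == 0x000fffff && !pf.enabled);
}
```

Under this reading, a base above the limit marks a window that forwards nothing rather than a huge wrapped range, which would match the "undefined memory regions" wording in the man page.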

    Gary

    --
    Gary Hade
    System x Enablement
    IBM Linux Technology Center
    503-578-4503 IBM T/L: 775-4503
    garyhade@us.ibm.com
    http://www.ibm.com/linux/ltc

  8. Re: Regression: Boot hang sizing transparent PCI-to-PCI bridge since after 2.6.25-r7.

    On Mon, Nov 10, 2008 at 09:58:16AM -0800, Gary Hade wrote:
    > On Mon, Nov 10, 2008 at 11:14:11AM +0100, Jiri Slaby wrote:
    > > On 11/10/2008 10:12 AM, GARCIA DE SORIA LUCENA, JUAN JESUS wrote:
    > > > Hello.
    > > >
    > > > Time ago I bisected a commit that was making my OKI Anima 3300 laptop
    > > > hang during boot.
    > >
    > > Doesn't pci=norom help in your case? There was a patch which tried to resolve
    > > this issue in a different manner, but it was reverted too. This boot parameter
    > > was introduced as a replacement IIRC.
    >
    > As the unfortunate author of both of the reverted patches and author
    > of the pci=norom patch I can confirm that Jiri is correct. The issues
    > that the patches addressed (with some unintended side effects caused
    > by the reverted attempts) were PCI resource allocation failures observed
    > during PCI hotplug. We were not seeing or trying to address boot-time
    > PCI resource allocation failures or hangs.


    Correction. We were not trying to address boot-time hangs but I
    believe we may have been trying to address expansion ROM related
    PCI resource allocation failures that we were seeing during boot
    with certain PCI cards. However, I doubt that this is relevant
    to your problem. The transparent bridge sizing removal change is
    probably a red herring.

    Gary

    --
    Gary Hade
    System x Enablement
    IBM Linux Technology Center
    503-578-4503 IBM T/L: 775-4503
    garyhade@us.ibm.com
    http://www.ibm.com/linux/ltc
