acpi based pci gap calculation - v3 - Kernel

This is a discussion on acpi based pci gap calculation - v3 - Kernel ; Hi Andi, I don't see this patch being applied in any of the tree yet. Resending this incase you guys might have missed it. I assume this will go through the ACPI tree. The need for this patch was explained ...

+ Reply to Thread
Results 1 to 16 of 16

Thread: acpi based pci gap calculation - v3

  1. acpi based pci gap calculation - v3


    Hi Andi,

    I don't see this patch being applied in any of the tree yet.
    Resending this incase you guys might have missed it.
    I assume this will go through the ACPI tree.

    The need for this patch was explained in
    http://marc.info/?l=linux-kernel&m=121441818818598&w=2

    The v2 patch was discussed in
    http://marc.info/?l=linux-acpi&m=121433339619175&w=2

    I then sent a incremental patch on top of v2 to fix some conditions and
    handle the change in the e820_search_gap interface.

    In this v3 patch, i have folded the incremental fix into the v2 patch.
    Please apply this.

    Thanks,
    Alok

    --
    acpi based pci gap calculation - v3

    From: Alok N Kataria

    Evaluates the _CRS object under PCI0 looking for producer resources.
    Then searches the e820 memory space for a gap within these producer resources.

    Allows us to find a gap for the unclaimed pci resources or MMIO resources
    for hotplug devices within the BIOS allowed pci regions.

    v1->v2: Some changes in the print strings.
    Also note the prototype for e820_search_gap has changed in this
    iteration, but the return value is ignored while calling it over here.
    v2->v3: Pass the pci producer resources end_addr (got from _CRS) to the
    e820_search_gap function.
    We limit the search only within the producer region's address space.
    Also add a condition to check if the resource lies in the 32bit
    address range.

    Signed-off-by: Alok N Kataria
    Cc: Andi Kleen
    Cc: Len Brown
    Cc: Ingo Molnar
    ---

    arch/x86/pci/acpi.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++ +
    1 files changed, 80 insertions(+), 0 deletions(-)


    diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
    index 19af069..fff2c42 100644
    --- a/arch/x86/pci/acpi.c
    +++ b/arch/x86/pci/acpi.c
    @@ -4,6 +4,7 @@
    #include
    #include
    #include
    +#include
    #include "pci.h"

    struct pci_root_info {
    @@ -14,6 +15,11 @@ struct pci_root_info {
    int busnum;
    };

    +struct gap_info {
    + unsigned long gapstart;
    + unsigned long gapsize;
    +};
    +
    static acpi_status
    resource_to_addr(struct acpi_resource *resource,
    struct acpi_resource_address64 *addr)
    @@ -110,6 +116,78 @@ adjust_transparent_bridge_resources(struct pci_bus *bus)
    }
    }

    +static acpi_status search_gap(struct acpi_resource *resource, void *data)
    +{
    + struct acpi_resource_address64 addr;
    + acpi_status status;
    + struct gap_info *gap = data;
    + unsigned long long start_addr, end_addr;
    +
    + status = resource_to_addr(resource, &addr);
    + if (ACPI_SUCCESS(status) &&
    + addr.resource_type == ACPI_MEMORY_RANGE &&
    + addr.address_length > gap->gapsize) {
    + start_addr = addr.minimum + addr.translation_offset;
    + /*
    + * We want space only in the 32bit address range
    + */
    + if (start_addr < UINT_MAX) {
    + end_addr = start_addr + addr.address_length;
    + e820_search_gap(&gap->gapstart, &gap->gapsize,
    + start_addr, end_addr);
    + }
    + }
    +
    + return AE_OK;
    +}
    +
    +/*
    + * Search for a hole in the 32 bit address space for PCI to assign MMIO
    + * resources, for hotplug or unconfigured resources.
    + * We query the CRS object of the PCI root device to look for possible producer
    + * resources in the tree and consider these while calulating the start address
    + * for this hole.
    + */
    +static void pci_setup_gap(acpi_handle *handle)
    +{
    + struct gap_info gap;
    + acpi_status status;
    +
    + gap.gapstart = 0;
    + gap.gapsize = 0x400000;
    +
    + status = acpi_walk_resources(handle, METHOD_NAME__CRS,
    + search_gap, &gap);
    +
    + if (ACPI_SUCCESS(status)) {
    + unsigned long round;
    +
    + if (!gap.gapstart) {
    + printk(KERN_ERR "ACPI: Warning: Cannot find a gap "
    + "in the 32bit address range for PCI\n"
    + "ACPI: PCI devices may collide with "
    + "hotpluggable memory address range\n");
    + }
    + /*
    + * Round the gapstart, uses the same logic as in
    + * e820_gap_setup
    + */
    + round = 0x100000;
    + while ((gap.gapsize >> 4) > round)
    + round += round;
    + /* Fun with two's complement */
    + pci_mem_start = (gap.gapstart + round) & -round;
    +
    + printk(KERN_INFO "ACPI: PCI resources should "
    + "start at %lx (gap: %lx:%lx)\n",
    + pci_mem_start, gap.gapstart, gap.gapsize);
    + } else {
    + printk(KERN_ERR "ACPI: Error while searching for gap in "
    + "the 32bit address range for PCI\n");
    + }
    +}
    +
    +
    static void
    get_current_resources(struct acpi_device *device, int busnum,
    int domain, struct pci_bus *bus)
    @@ -220,6 +298,8 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_device *device, int do

    if (bus && (pci_probe & PCI_USE__CRS))
    get_current_resources(device, busnum, domain, bus);
    +
    + pci_setup_gap(device->handle);
    return bus;
    }



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: acpi based pci gap calculation - v3

    On Tuesday, July 15, 2008 11:59 am Alok Kataria wrote:
    > Hi Andi,
    >
    > I don't see this patch being applied in any of the tree yet.
    > Resending this incase you guys might have missed it.
    > I assume this will go through the ACPI tree.
    >
    > The need for this patch was explained in
    > http://marc.info/?l=linux-kernel&m=121441818818598&w=2
    >
    > The v2 patch was discussed in
    > http://marc.info/?l=linux-acpi&m=121433339619175&w=2
    >
    > I then sent a incremental patch on top of v2 to fix some conditions and
    > handle the change in the e820_search_gap interface.
    >
    > In this v3 patch, i have folded the incremental fix into the v2 patch.
    > Please apply this.


    This should probably go to linux-pci too.

    I wonder if this is stable enough to go into 2.6.27? Most of the PCI bugs we
    have at the moment are related to PCI resource allocation failures. My hope
    is that finding more space will fix most of them. Assuming this patch
    doesn't have any dependencies, I can put it in my linux-next branch.

    Also, TJ was working on something similar; any comments TJ?

    Thanks,
    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: acpi based pci gap calculation - v3

    On Tue, 2008-07-15 at 13:28 -0700, Jesse Barnes wrote:
    > On Tuesday, July 15, 2008 11:59 am Alok Kataria wrote:
    > > Hi Andi,
    > >
    > > I don't see this patch being applied in any of the tree yet.
    > > Resending this incase you guys might have missed it.
    > > I assume this will go through the ACPI tree.
    > >
    > > The need for this patch was explained in
    > > http://marc.info/?l=linux-kernel&m=121441818818598&w=2
    > >
    > > The v2 patch was discussed in
    > > http://marc.info/?l=linux-acpi&m=121433339619175&w=2
    > >
    > > I then sent a incremental patch on top of v2 to fix some conditions and
    > > handle the change in the e820_search_gap interface.
    > >
    > > In this v3 patch, i have folded the incremental fix into the v2 patch.
    > > Please apply this.

    >
    > This should probably go to linux-pci too.
    >
    > I wonder if this is stable enough to go into 2.6.27?


    I have tested it with couple of different BIOS settings and it seems to
    work as it should.

    > Most of the PCI bugs we
    > have at the moment are related to PCI resource allocation failures. My hope
    > is that finding more space will fix most of them. Assuming this patch
    > doesn't have any dependencies, I can put it in my linux-next branch.


    No dependencies, I had added a function e820_search_gap which is used by
    this patch. This function is already in the mainline tree.
    Thanks for applying.

    --
    Alok

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: acpi based pci gap calculation - v3

    On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
    > I have tested it with couple of different BIOS settings and it seems to
    > work as it should.
    >
    > > Most of the PCI bugs we
    > > have at the moment are related to PCI resource allocation failures. My
    > > hope is that finding more space will fix most of them. Assuming this
    > > patch doesn't have any dependencies, I can put it in my linux-next
    > > branch.

    >
    > No dependencies, I had added a function e820_search_gap which is used by
    > this patch. This function is already in the mainline tree.
    > Thanks for applying.


    Ok, I stuffed it in my linux-next branch. I'll let it sit there for a day or
    so though, just to shake out any build problems etc. in linux-next before
    asking Linus to pull the whole lot.

    Thanks,
    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: acpi based pci gap calculation - v3


    * Jesse Barnes wrote:

    > On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
    > > I have tested it with couple of different BIOS settings and it seems to
    > > work as it should.
    > >
    > > > Most of the PCI bugs we
    > > > have at the moment are related to PCI resource allocation failures. My
    > > > hope is that finding more space will fix most of them. Assuming this
    > > > patch doesn't have any dependencies, I can put it in my linux-next
    > > > branch.

    > >
    > > No dependencies, I had added a function e820_search_gap which is
    > > used by this patch. This function is already in the mainline tree.
    > > Thanks for applying.

    >
    > Ok, I stuffed it in my linux-next branch. I'll let it sit there for a
    > day or so though, just to shake out any build problems etc. in
    > linux-next before asking Linus to pull the whole lot.


    thanks. The general x86 infrastructure bits this patch depends on are
    upstream already.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: acpi based pci gap calculation - v3

    Jesse Barnes wrote:
    > On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
    >> I have tested it with couple of different BIOS settings and it seems to
    >> work as it should.
    >>
    >>> Most of the PCI bugs we
    >>> have at the moment are related to PCI resource allocation failures. My
    >>> hope is that finding more space will fix most of them. Assuming this
    >>> patch doesn't have any dependencies, I can put it in my linux-next
    >>> branch.

    >> No dependencies, I had added a function e820_search_gap which is used by
    >> this patch. This function is already in the mainline tree.
    >> Thanks for applying.

    >
    > Ok, I stuffed it in my linux-next branch. I'll let it sit there for a day or
    > so though, just to shake out any build problems etc. in linux-next before
    > asking Linus to pull the whole lot.


    For me it seems a little risky to push up that early. I think it would
    be better to let it sit longer in linux-next for this. After all this is
    a major change how IO resources are allocated. That is why I didn't pull
    it into the ACPI release branch.

    -Andi

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: acpi based pci gap calculation - v3

    Alok Kataria wrote:
    > Hi Andi,
    >
    > I don't see this patch being applied in any of the tree yet.
    > Resending this incase you guys might have missed it.


    Sorry still some backlog from vacation last week. However the focus
    right now is on patches for 2.6.27 and I don't think your patch
    is really tested well enough yet for that. It is fairly intrusive
    and changes a critical piece of code. More likely it's a .28 thing,
    or possibly a late merge candidate (but that's also not clear)

    -Andi
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: acpi based pci gap calculation - v3

    On Tuesday, July 15, 2008 10:40 pm Andi Kleen wrote:
    > Jesse Barnes wrote:
    > > On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
    > >> I have tested it with couple of different BIOS settings and it seems to
    > >> work as it should.
    > >>
    > >>> Most of the PCI bugs we
    > >>> have at the moment are related to PCI resource allocation failures. My
    > >>> hope is that finding more space will fix most of them. Assuming this
    > >>> patch doesn't have any dependencies, I can put it in my linux-next
    > >>> branch.
    > >>
    > >> No dependencies, I had added a function e820_search_gap which is used by
    > >> this patch. This function is already in the mainline tree.
    > >> Thanks for applying.

    > >
    > > Ok, I stuffed it in my linux-next branch. I'll let it sit there for a
    > > day or so though, just to shake out any build problems etc. in linux-next
    > > before asking Linus to pull the whole lot.

    >
    > For me it seems a little risky to push up that early. I think it would
    > be better to let it sit longer in linux-next for this. After all this is
    > a major change how IO resources are allocated. That is why I didn't pull
    > it into the ACPI release branch.


    The only problem there is that linux-next doesn't get nearly the sort of
    testing coverage we need for this kind of change. It's also small, and
    reverting it is easy, so if we run into big problems that we can't fix we can
    always back it out...

    I'm interested in hearing people's thoughts on this.

    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: acpi based pci gap calculation - v3


    > The only problem there is that linux-next doesn't get nearly the sort of
    > testing coverage we need for this kind of change.


    Normally I tend to wait for one -mm release, which seems to be tested
    by a reasonable number of people. If it survives that it is good
    to be tested in Linus' tree.

    Just stuffing this in in literally the last minute doesn't seem
    like a good idea.

    -Andi
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: acpi based pci gap calculation - v3

    On Wednesday, July 16, 2008 9:33 am Andi Kleen wrote:
    > > The only problem there is that linux-next doesn't get nearly the sort of
    > > testing coverage we need for this kind of change.

    >
    > Normally I tend to wait for one -mm release, which seems to be tested
    > by a reasonable number of people. If it survives that it is good
    > to be tested in Linus' tree.
    >
    > Just stuffing this in in literally the last minute doesn't seem
    > like a good idea.


    Well it's hardly last minute given that the merge window only opened a couple
    of days ago...

    But beyond that, now that I've thought about it a bit more I'm not even sure
    the patch is really correct (though it works on my test machines). Shouldn't
    we be looking at _PRS not _CRS? And ideally we should try to find even more
    space, not less. This patch made one of my machines lose quite a bit of
    space:

    ....
    Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
    ....
    ACPI: PCI resources should start at c0000000 (gap: bf000000:31000000)
    ....

    which is a step backwards. With that in mind, I reverted the patch before
    asking Linus to pull; I'm hopeful we can do better though. I'd love to never
    see "resource allocation failed" messages anymore.

    Thanks,
    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: acpi based pci gap calculation - v3

    Hi Jesse,

    On Wed, 2008-07-16 at 17:03 -0700, Jesse Barnes wrote:
    > On Wednesday, July 16, 2008 9:33 am Andi Kleen wrote:
    > > > The only problem there is that linux-next doesn't get nearly the sort of
    > > > testing coverage we need for this kind of change.

    > >
    > > Normally I tend to wait for one -mm release, which seems to be tested
    > > by a reasonable number of people. If it survives that it is good
    > > to be tested in Linus' tree.
    > >
    > > Just stuffing this in in literally the last minute doesn't seem
    > > like a good idea.

    >
    > Well it's hardly last minute given that the merge window only opened a couple
    > of days ago...
    >
    > But beyond that, now that I've thought about it a bit more I'm not even sure
    > the patch is really correct (though it works on my test machines). Shouldn't
    > we be looking at _PRS not _CRS?


    IMO, the current resource allocations will give us a better idea of
    which region is free.
    Besides, from what i read PRS is optional and not all BIOS's may export
    that.

    > And ideally we should try to find even more
    > space, not less. This patch made one of my machines lose quite a bit of
    > space:
    >
    > ...
    > Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
    > ...
    > ACPI: PCI resources should start at c0000000 (gap: bf000000:31000000)
    > ...
    >


    Yep, we should try to find more space but we should also make sure that
    this space doesn't clash with any other resource.
    As explained in my first mail while posting v1 of these patches, the
    kernel ignores the memory hotplug range while calculating this gap for
    PCI, and consequently these regions collide.
    With this patch we have put some constraints like looking only in the
    producer regions rather than all regions which are right now not
    reserved/used by the e820 map.

    However, looking back again i think we can improve this further in some
    cases.
    I have made some more changes in the e820_search_gap algorithm to find
    the *biggest* un-utilized gap in the 32bit address range, and have added
    some debug print messages.
    Can you please try this patch on your system and mail me your full
    dmesg.

    Thanks,
    Alok

    ---

    arch/x86/kernel/e820.c | 19 +++++++++++++++++++
    arch/x86/pci/acpi.c | 4 ++++
    2 files changed, 23 insertions(+), 0 deletions(-)


    diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
    index a5383ae..298aeec 100644
    --- a/arch/x86/kernel/e820.c
    +++ b/arch/x86/kernel/e820.c
    @@ -539,6 +539,8 @@ __init int e820_search_gap(unsigned long *gapstart, unsigned long *gapsize,

    last = (end_addr && end_addr < MAX_GAP_END) ? end_addr : MAX_GAP_END;

    + printk("E820_DEBUG: Searching for gap between (0x%08lx - 0x%08llx)\n",
    + start_addr, end_addr);
    while (--i >= 0) {
    unsigned long long start = e820.map[i].addr;
    unsigned long long end = start + e820.map[i].size;
    @@ -556,9 +558,26 @@ __init int e820_search_gap(unsigned long *gapstart, unsigned long *gapsize,
    if (gap >= *gapsize) {
    *gapsize = gap;
    *gapstart = end;
    + printk("E820_DEBUG: Found gap starting at "
    + "0x%08llx size 0x%08llx\n", end, gap);
    found = 1;
    }
    }
    + if (start > start_addr) {
    + unsigned long gap, prev_end;
    + prev_end = e820.map[i-1].addr + e820.map[i-1].size;
    + if (start_addr > prev_end) {
    + gap = start - start_addr;
    + if (gap >=*gapsize) {
    + *gapsize = gap;
    + *gapstart = start_addr;
    + printk("E820_DEBUG: Found gap at start starting at "
    + "0x%08llx size 0x%08llx\n", end, gap);
    + found = 1;
    + start = start_addr;
    + }
    + }
    + }
    if (start < last)
    last = start;
    }
    diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
    index fff2c42..1a88149 100644
    --- a/arch/x86/pci/acpi.c
    +++ b/arch/x86/pci/acpi.c
    @@ -131,8 +131,12 @@ static acpi_status search_gap(struct acpi_resource *resource, void *data)
    /*
    * We want space only in the 32bit address range
    */
    + printk("ACPI_DEBUG start_addr 0x%08lx gapsize 0x%08lx "
    + "address_length 0x%08lx\n", start_addr, gap->gapsize,
    + addr.address_length);
    if (start_addr < UINT_MAX) {
    end_addr = start_addr + addr.address_length;
    + printk("\t\tend_addr is 0x%08lx\n", end_addr);
    e820_search_gap(&gap->gapstart, &gap->gapsize,
    start_addr, end_addr);
    }


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: acpi based pci gap calculation - v3

    On Tue, 2008-07-22 at 14:50 -0700, Jesse Barnes wrote:
    > On Monday, July 21, 2008 10:59 am Alok Kataria wrote:
    > > Hi Jesse,
    > >
    > > Did you get a chance to try this patch on your box. Let me know what are
    > > the values you get now.

    >
    > Here's the dmesg from my box with the ACPI gap stuff applied.
    >
    > I'm still more inclined to TJ's approach though; it should give us a lot more
    > space for PCI devices; though you're right that avoiding conflicts is
    > definitely important too...


    Hi Jesse,

    Thanks for sending the log.

    In the log that you sent me, please note the following debug messages
    -------
    E820_DEBUG: Searching for gap between (0x00000000 - 0x100000000)
    E820_DEBUG: Found gap starting at 0xbf000000 size 0x40f00000
    Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
    -------

    This is the gap that was allocated by walking just the e820_map

    With my changes we query the _CRS resource and get following info
    ------
    ACPI_DEBUG start_addr 0xf8000000 gapsize 0x00400000 address_length 0x06b00000
    end_addr is 0xfeb00000
    E820_DEBUG: Searching for gap between (0xf8000000 - 0xfeb00000)
    E820_DEBUG: Found gap at start starting at 0x100000000 size 0x07f00000
    ACPI_DEBUG start_addr 0xbf000000 gapsize 0x07f00000 address_length 0x31000000
    end_addr is 0xf0000000
    E820_DEBUG: Searching for gap between (0xbf000000 - 0xf0000000)
    E820_DEBUG: Found gap starting at 0xbf000000 size 0x31000000
    ------

    So there are 2 producer regions one from [0xBF000000 - 0xF0000000] and
    another from [0xF8000000 - 0xFEB00000]. That means BIOS has reserved the
    area from [0xF0000000 - 0xF7FFFFFF] for some other resource.
    If you look a little below in the log there is this

    ----
    system 00:01: iomem range 0xf0000000-0xf7ffffff has been reserved
    ----

    So the gap that we had calculated first i.e. from e820_setup_gap did
    contain a collision i.e. though a resource was reserved from
    [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
    account. My patch fixes this issue.

    So, IMHO this is a BUG and should be fixed. Please let me know your
    views.

    Thanks,
    Alok

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: acpi based pci gap calculation - v3

    On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
    > In the log that you sent me, please note the following debug messages
    > -------
    > E820_DEBUG: Searching for gap between (0x00000000 - 0x100000000)
    > E820_DEBUG: Found gap starting at 0xbf000000 size 0x40f00000
    > Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
    > -------
    >
    > This is the gap that was allocated by walking just the e820_map
    >
    > With my changes we query the _CRS resource and get following info
    > ------
    > ACPI_DEBUG start_addr 0xf8000000 gapsize 0x00400000 address_length
    > 0x06b00000 end_addr is 0xfeb00000
    > E820_DEBUG: Searching for gap between (0xf8000000 - 0xfeb00000)
    > E820_DEBUG: Found gap at start starting at 0x100000000 size 0x07f00000
    > ACPI_DEBUG start_addr 0xbf000000 gapsize 0x07f00000 address_length
    > 0x31000000 end_addr is 0xf0000000
    > E820_DEBUG: Searching for gap between (0xbf000000 - 0xf0000000)
    > E820_DEBUG: Found gap starting at 0xbf000000 size 0x31000000
    > ------
    >
    > So there are 2 producer regions one from [0xBF000000 - 0xF0000000] and
    > another from [0xF8000000 - 0xFEB00000]. That means BIOS has reserved the
    > area from [0xF0000000 - 0xF7FFFFFF] for some other resource.
    > If you look a little below in the log there is this
    >
    > ----
    > system 00:01: iomem range 0xf0000000-0xf7ffffff has been reserved
    > ----
    >
    > So the gap that we had calculated first i.e. from e820_setup_gap did
    > contain a collision i.e. though a resource was reserved from
    > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
    > account. My patch fixes this issue.
    >
    > So, IMHO this is a BUG and should be fixed. Please let me know your
    > views.


    Yes, there's a conflict there, but on many machines it's probably a harmless
    one. My main concerns are these:
    1) it changes long standing behavior and doesn't fix any real reported bugs
    I'm aware of (feel free to point me at some)
    2) it looks like it will dramatically reduce the available PCI resource
    space on at least some platforms, and that space is already scarce in our
    current scheme

    So basically, I'm just feeling very conservative about any changes to resource
    allocation, that's all.

    If this change were coupled with one that allowed us to exploit more available
    address space for PCI resources I'd feel much more comfortable about it,
    which is why I'm so interested in TJ's work (/me goes off to read his wiki
    now).

    Thanks,
    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: acpi based pci gap calculation - v3

    Hi Jesse,

    On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
    > On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
    > > So the gap that we had calculated first i.e. from e820_setup_gap did
    > > contain a collision i.e. though a resource was reserved from
    > > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
    > > account. My patch fixes this issue.
    > >
    > > So, IMHO this is a BUG and should be fixed. Please let me know your
    > > views.

    >
    > Yes, there's a conflict there, but on many machines it's probably a harmless
    > one. My main concerns are these:
    > 1) it changes long standing behavior and doesn't fix any real reported bugs
    > I'm aware of (feel free to point me at some)


    The problem that I see is with Memory Hotadd, though my BIOS exports the
    correct SRAT table and tells the OS that some regions are hot pluggable,
    PCI gap calculation ignores that info and assigns some "option ROMS" in
    hot pluggable memory region.

    Because of this when i try to hot add memory, the OS sees a resource
    allocation conflict and bails out. Which shouldn't have happened,
    right ?

    > 2) it looks like it will dramatically reduce the available PCI resource
    > space on at least some platforms, and that space is already scarce in our
    > current scheme


    IMO, these gaps are used only for the option ROMS or hot pluggable
    devices which in itself are rare. So its not like we are reducing the
    whole available PCI resource space, but only the space that is needed by
    these optional ROMS.
    And i think its inevitable if we have to remove these conflicts with any
    other subsystem.
    >
    > So basically, I'm just feeling very conservative about any changes to resource
    > allocation, that's all.
    >
    > If this change were coupled with one that allowed us to exploit more available
    > address space for PCI resources I'd feel much more comfortable about it,
    > which is why I'm so interested in TJ's work (/me goes off to read his wiki
    > now).


    /me to having a look.

    Thanks,
    Alok
    >
    > Thanks,
    > Jesse


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: acpi based pci gap calculation - v3

    On Wednesday, July 23, 2008 10:10 am Alok Kataria wrote:
    > Hi Jesse,
    >
    > On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
    > > On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
    > > > So the gap that we had calculated first i.e. from e820_setup_gap did
    > > > contain a collision i.e. though a resource was reserved from
    > > > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
    > > > account. My patch fixes this issue.
    > > >
    > > > So, IMHO this is a BUG and should be fixed. Please let me know your
    > > > views.

    > >
    > > Yes, there's a conflict there, but on many machines it's probably a
    > > harmless one. My main concerns are these:
    > > 1) it changes long standing behavior and doesn't fix any real reported
    > > bugs I'm aware of (feel free to point me at some)

    >
    > The problem that I see is with Memory Hotadd, though my BIOS exports the
    > correct SRAT table and tells the OS that some regions are hot pluggable,
    > PCI gap calculation ignores that info and assigns some "option ROMS" in
    > hot pluggable memory region.
    >
    > Because of this when i try to hot add memory, the OS sees a resource
    > allocation conflict and bails out. Which shouldn't have happened,
    > right ?


    Ah ok, so you've got a real problem here. And I don't deny that there's a
    conflict that we should fix.

    > > 2) it looks like it will dramatically reduce the available PCI resource
    > > space on at least some platforms, and that space is already scarce
    > > in our current scheme

    >
    > IMO, these gaps are used only for the option ROMS or hot pluggable
    > devices which in itself are rare. So its not like we are reducing the
    > whole available PCI resource space, but only the space that is needed by
    > these optional ROMS.
    > And i think its inevitable if we have to remove these conflicts with any
    > other subsystem.


    Actually we have other allocations. In many cases the BIOS won't assign MMIO
    resources, so we do it at boot time. Most of the open PCI bugs we have today
    are resource allocation failures, and not just for ROMs, so hopefully you can
    understand my worry. Using all available gaps rather than just the largest
    seems like the obvious first step to increasing our available space, and
    would give us more leeway to avoid stomping on other reserved space.

    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: acpi based pci gap calculation - v3

    On Wed, 2008-07-23 at 10:52 -0700, Jesse Barnes wrote:
    > On Wednesday, July 23, 2008 10:10 am Alok Kataria wrote:
    > > Hi Jesse,
    > >
    > > On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
    > > > On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
    > > > > So the gap that we had calculated first i.e. from e820_setup_gap did
    > > > > contain a collision i.e. though a resource was reserved from
    > > > > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
    > > > > account. My patch fixes this issue.
    > > > >
    > > > > So, IMHO this is a BUG and should be fixed. Please let me know your
    > > > > views.
    > > >
    > > > Yes, there's a conflict there, but on many machines it's probably a
    > > > harmless one. My main concerns are these:
    > > > 1) it changes long standing behavior and doesn't fix any real reported
    > > > bugs I'm aware of (feel free to point me at some)

    > >
    > > The problem that I see is with Memory Hotadd, though my BIOS exports the
    > > correct SRAT table and tells the OS that some regions are hot pluggable,
    > > PCI gap calculation ignores that info and assigns some "option ROMS" in
    > > hot pluggable memory region.
    > >
    > > Because of this when i try to hot add memory, the OS sees a resource
    > > allocation conflict and bails out. Which shouldn't have happened,
    > > right ?

    >
    > Ah ok, so you've got a real problem here. And I don't deny that there's a
    > conflict that we should fix.


    Thanks :-)

    >
    > > > 2) it looks like it will dramatically reduce the available PCI resource
    > > > space on at least some platforms, and that space is already scarce
    > > > in our current scheme

    > >
    > > IMO, these gaps are used only for the option ROMS or hot pluggable
    > > devices which in itself are rare. So its not like we are reducing the
    > > whole available PCI resource space, but only the space that is needed by
    > > these optional ROMS.
    > > And i think its inevitable if we have to remove these conflicts with any
    > > other subsystem.

    >
    > Actually we have other allocations. In many cases the BIOS won't assign MMIO
    > resources, so we do it at boot time. Most of the open PCI bugs we have today
    > are resource allocation failures, and not just for ROMs, so hopefully you can
    > understand my worry. Using all available gaps rather than just the largest
    > seems like the obvious first step to increasing our available space, and
    > would give us more leeway to avoid stomping on other reserved space.
    >


    Yeah i agree that this should be the approach.
    I went through TJ's wiki and in the solutions section the first point
    talks about
    "keeping a list of all available PCI iomem regions which will be derived
    from the e820 map (and the CRS object of the root PCI device).

    TJ/Jesse, do we have any patches/work-in-progress which actually does
    this ?
    I can then add the CRS object stuff to that then.

    Thanks,
    Alok


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread