[PATCH] [0/18] GB pages hugetlb support - Kernel


Thread: [PATCH] [0/18] GB pages hugetlb support

  1. Re: [PATCH] [0/18] GB pages hugetlb support

    Andi,

    Are all the "interesting" cpuset related changes in patch:

    [PATCH] [1/18] Convert hugeltlb.c over to pass global state around in a structure

    ?

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] [0/18] GB pages hugetlb support

    What kernel version is this patchset against ... apparently not 2.6.25-rc5-mm1.

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214

  3. Re: [PATCH] [0/18] GB pages hugetlb support

    On Sun, Mar 16, 2008 at 10:11:32PM -0500, Paul Jackson wrote:
    > Andi,
    >
    > Are all the "interesting" cpuset related changes in patch:
    >
    > [PATCH] [1/18] Convert hugeltlb.c over to pass global state around in a structure


    That one, "Add basic support for more than one hstate in hugetlbfs",
    and partly "Add support to have individual hstates for each hugetlbfs mount".
    They all build on each other.
    Ideally, look at the end result of the whole series.

    -Andi

  4. Re: [PATCH] [0/18] GB pages hugetlb support

    On Mon, Mar 17, 2008 at 12:35:22AM -0500, Paul Jackson wrote:
    > What kernel version is this patchset against ... apparently not 2.6.25-rc5-mm1.

    This was against 2.6.25-rc4

    -Andi

  5. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    > node_boot_start is not page aligned?

    It is, but it is not necessarily GB aligned, and without this
    change alloc_bootmem sometimes returns memory that is not GB
    aligned even when GB alignment is requested. This was a nasty
    problem that took some time to track down.

    -Andi

  6. Re: [PATCH] [0/18] GB pages hugetlb support

    Andi wrote:
    > This was against 2.6.25-rc4


    Ok - I'll try that one.

    > Ideally look at the end result of the whole series.


    Ok. Thanks.

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214

  7. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    On Mon, Mar 17, 2008 at 12:02 AM, Andi Kleen wrote:
    > > node_boot_start is not page aligned?

    >
    > It is, but it is not necessarily GB aligned and without this
    > change sometimes alloc_bootmem when requesting GB alignment
    > doesn't return GB aligned memory. This was a nasty problem
    > that took some time to track down.


    Or is the problem in how preferred is computed?

    preferred = PFN_DOWN(ALIGN(preferred, align)) + offset;

    YH

  8. Re: [PATCH] [0/18] GB pages hugetlb support

    On Mon, Mar 17, 2008 at 02:00:18AM -0500, Paul Jackson wrote:
    > Andi wrote:
    > > This was against 2.6.25-rc4

    >
    > Ok - I'll try that one.


    I just updated the patches to a 2.6.25-rc6 base at
    ftp://firstfloor.org/pub/ak/gbpages/patches/
    and gave them a quick test, so you can use that one too.

    There was only a single easy reject.

    -Andi

  9. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    On Mon, Mar 17, 2008 at 12:17 AM, Yinghai Lu wrote:
    >
    > On Mon, Mar 17, 2008 at 12:02 AM, Andi Kleen wrote:
    > > > node_boot_start is not page aligned?

    > >
    > > It is, but it is not necessarily GB aligned and without this
    > > change sometimes alloc_bootmem when requesting GB alignment
    > > doesn't return GB aligned memory. This was a nasty problem
    > > that took some time to track down.

    >
    > or preferred has some problem?
    >
    >
    > preferred = PFN_DOWN(ALIGN(preferred, align)) + offset;
    >


    When node_boot_start is only 512M aligned and align is 1024M, offset
    could be 512M. It seems that
    i = ALIGN(i, incr) needs to take offset into account...
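
    The case described above can be checked with a small userspace
    model. This is only a sketch, not kernel code: align_up() stands in
    for the kernel's ALIGN() macro, the helper names are hypothetical,
    and 4 KiB pages are assumed. It shows that rounding up an index
    that is relative to the node start does not guarantee an aligned
    physical address when the node start itself is less aligned than
    the request.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/*
 * Model of aligning the bitmap index on its own: the index is relative
 * to the node start, so the result is aligned only relative to that
 * start. Returns the resulting physical address.
 */
static uint64_t phys_after_relative_align(uint64_t node_start, uint64_t i,
                                          uint64_t align)
{
    uint64_t incr = align >> PAGE_SHIFT;   /* alignment in pages */
    uint64_t idx = align_up(i, incr);      /* relative index, rounded up */

    return node_start + (idx << PAGE_SHIFT);
}

/* Returns 1 if addr is a multiple of align (a power of two), else 0. */
static int is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```

    With a node start of 512M, an index of 1 and a 1G alignment, the
    model returns 0x60000000 (1.5G), which is not 1G aligned. With a
    node start of 1G the same call returns an aligned address.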

    YH

  10. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    > when node_boot_start is 512M alignment, and align is 1024M, offset
    > could be 512M. it seems
    > i = ALIGN(i, incr) need to do sth with offset...


    It's possible that there are better fixes for this, but at least
    my simple patch seems to work here. I admit I was banging my
    head against this for some time, and when I did the fix I just
    wanted the bug to go away and didn't really go for subtlety.

    The bootmem allocator is quite spaghetti, in fact; it could
    really use some general cleanup (although it's not quite
    as bad yet as page_alloc.c).

    -Andi

  11. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    On Mon, Mar 17, 2008 at 12:41 AM, Andi Kleen wrote:
    > > when node_boot_start is 512M alignment, and align is 1024M, offset
    > > could be 512M. it seems
    > > i = ALIGN(i, incr) need to do sth with offset...

    >
    > It's possible that there are better fixes for this, but at least
    > my simple patch seems to work here. I admit I was banging my
    > head against this for some time and when I did the fix I just
    > wanted the bug to go away and didn't really go for subtleness.
    >
    > The bootmem allocator is quite spaghetti in fact, it could
    > really need some general clean up (although it's not quite
    > as bad yet as page_alloc.c)


    i = ALIGN(i+offset, incr) - offset;

    The same applies to the one in fail_block...

    This only happens when align is larger than the alignment of node_boot_start.
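
    As a rough check, the expression above can be compared in userspace
    with aligning the absolute page frame number, which is the form
    i = ALIGN(pfn + i, incr) - pfn used by the updated patch later in
    this thread. The sketch below is not kernel code. It assumes offset
    is computed the way the pre-patch __alloc_bootmem_core() does it
    (distance in pages from the node start to the next aligned
    boundary), and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Pages from the node start to the next incr-aligned pfn (0 if aligned). */
static uint64_t start_offset(uint64_t base_pfn, uint64_t incr)
{
    uint64_t rem = base_pfn & (incr - 1);

    return rem ? incr - rem : 0;
}

/* Absolute pfn chosen by: i = ALIGN(i + offset, incr) - offset */
static uint64_t abs_pfn_offset_variant(uint64_t base_pfn, uint64_t i,
                                       uint64_t incr)
{
    uint64_t off = start_offset(base_pfn, incr);

    return base_pfn + align_up(i + off, incr) - off;
}

/* Absolute pfn chosen by: i = ALIGN(pfn + i, incr) - pfn */
static uint64_t abs_pfn_absolute_variant(uint64_t base_pfn, uint64_t i,
                                         uint64_t incr)
{
    return base_pfn + (align_up(base_pfn + i, incr) - base_pfn);
}
```

    With 4 KiB pages, a 1G alignment is incr = 262144 pages. For a node
    start at 512M (pfn 131072) both forms land on pfn 262144, which is
    1G aligned. For a node start at 256M (pfn 65536) the offset form
    lands on pfn 131072 (512M), while the absolute form still lands on
    pfn 262144.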

    YH

  12. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    On Mon, Mar 17, 2008 at 01:10:31AM -0700, Yinghai Lu wrote:
    > please check the one against -mm and x86.git


    No, offset is not enough, because it is still relative to the zone
    start. I'm preparing an updated patch.

    -Andi

  13. Re: [PATCH] [4/18] Add basic support for more than one hstate in hugetlbfs

    Andi,

    Seems to me that both patches 2/18 and 4/18 are called:

    Add basic support for more than one hstate in hugetlbfs

    You probably want to change this detail.

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214

  14. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    please check the one against -mm and x86.git

    ---


  15. Re: [PATCH] [4/18] Add basic support for more than one hstate in hugetlbfs

    On Mon, Mar 17, 2008 at 03:09:42AM -0500, Paul Jackson wrote:
    > Andi,
    >
    > Seems to me that both patches 2/18 and 4/18 are called:
    >
    > Add basic support for more than one hstate in hugetlbfs
    >
    > You probably want to change this detail.


    Fixed, thanks. Indeed, the description went wrong on 4/18;
    2/18 was the correct one.

    -Andi

  16. Re: [PATCH] [11/18] Fix alignment bug in bootmem allocator

    > This only happens when align is larger than the alignment of node_boot_start.

    Here's an updated version of the patch with this addressed.
    Please review. The patch is somewhat more complicated, but
    actually makes the code a little cleaner now.

    -Andi


    Fix alignment bug in bootmem allocator

    Without this fix bootmem can return unaligned addresses when the start of a
    node is not aligned to the align value. Needed for reliably allocating
    gigabyte pages.

    I removed the offset variable because all tests should align themselves correctly
    now. Slight drawback might be that the bootmem allocator will spend
    some more time skipping bits in the bitmap initially, but that shouldn't
    be a big issue.

    Signed-off-by: Andi Kleen

    ---
    mm/bootmem.c | 24 ++++++++++++------------
    1 file changed, 12 insertions(+), 12 deletions(-)

    Index: linux/mm/bootmem.c
    ===================================================================
    --- linux.orig/mm/bootmem.c
    +++ linux/mm/bootmem.c
    @@ -195,8 +195,9 @@ void * __init
     __alloc_bootmem_core(struct bootmem_data *bdata, unsigned long size,
     	      unsigned long align, unsigned long goal, unsigned long limit)
     {
    -	unsigned long offset, remaining_size, areasize, preferred;
    -	unsigned long i, start = 0, incr, eidx, end_pfn;
    +	unsigned long remaining_size, areasize, preferred;
    +	unsigned long i, start, incr, eidx, end_pfn;
    +	unsigned long pfn;
     	void *ret;

     	if (!size) {
    @@ -218,10 +219,6 @@ __alloc_bootmem_core(struct bootmem_data
     		end_pfn = limit;

     	eidx = end_pfn - PFN_DOWN(bdata->node_boot_start);
    -	offset = 0;
    -	if (align && (bdata->node_boot_start & (align - 1UL)) != 0)
    -		offset = align - (bdata->node_boot_start & (align - 1UL));
    -	offset = PFN_DOWN(offset);

     	/*
     	 * We try to allocate bootmem pages above 'goal'
    @@ -236,15 +233,18 @@ __alloc_bootmem_core(struct bootmem_data
     	} else
     		preferred = 0;

    -	preferred = PFN_DOWN(ALIGN(preferred, align)) + offset;
    +	start = bdata->node_boot_start;
    +	preferred = PFN_DOWN(ALIGN(preferred + start, align) - start);
     	areasize = (size + PAGE_SIZE-1) / PAGE_SIZE;
     	incr = align >> PAGE_SHIFT ? : 1;
    +	pfn = PFN_DOWN(start);
    +	start = 0;

     restart_scan:
     	for (i = preferred; i < eidx; i += incr) {
     		unsigned long j;
     		i = find_next_zero_bit(bdata->node_bootmem_map, eidx, i);
    -		i = ALIGN(i, incr);
    +		i = ALIGN(pfn + i, incr) - pfn;
     		if (i >= eidx)
     			break;
     		if (test_bit(i, bdata->node_bootmem_map))
    @@ -258,11 +258,11 @@ restart_scan:
     		start = i;
     		goto found;
     	fail_block:
    -		i = ALIGN(j, incr);
    +		i = ALIGN(j + pfn, incr) - pfn;
     	}

    -	if (preferred > offset) {
    -		preferred = offset;
    +	if (preferred > 0) {
    +		preferred = 0;
     		goto restart_scan;
     	}
     	return NULL;
    @@ -278,7 +278,7 @@ found:
     	 */
     	if (align < PAGE_SIZE &&
     	    bdata->last_offset && bdata->last_pos+1 == start) {
    -		offset = ALIGN(bdata->last_offset, align);
    +		unsigned long offset = ALIGN(bdata->last_offset, align);
     		BUG_ON(offset > PAGE_SIZE);
     		remaining_size = PAGE_SIZE - offset;
     		if (size < remaining_size) {
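
    The core idea of the patch above, aligning the absolute page frame
    number and then converting back to a bitmap index, can be sketched
    as a standalone userspace model. This is a simplified illustration
    and not the kernel function: it uses one byte per page instead of a
    bitmap, ignores goal, limit and last_success, and scan_aligned()
    and align_up() are hypothetical names.

```c
#include <assert.h>

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/*
 * used[k] != 0 means page k (relative to the node start) is reserved.
 * base_pfn is the page frame number of the node start, incr the
 * alignment in pages, areasize the number of pages wanted (>= 1).
 * Returns the relative index of the first free run whose absolute pfn
 * is a multiple of incr, or -1 if there is none below eidx.
 */
static long scan_aligned(const unsigned char *used, unsigned long eidx,
                         unsigned long base_pfn, unsigned long incr,
                         unsigned long areasize)
{
    unsigned long i = 0, j;

    while (i < eidx) {
        /* Stand-in for find_next_zero_bit(): skip reserved pages. */
        while (i < eidx && used[i])
            i++;
        /* Align the absolute pfn, then convert back to an index. */
        i = align_up(base_pfn + i, incr) - base_pfn;
        if (i >= eidx)
            break;
        /* Check that pages [i, i + areasize) are free and in range. */
        for (j = i; j < i + areasize; j++)
            if (j >= eidx || used[j])
                break;
        if (j == i + areasize)
            return (long)i;
        /* The run failed at page j: resume the search just past it. */
        i = j + 1;
    }
    return -1;
}
```

    For example, with base_pfn = 4 and incr = 8 on an empty 32-page
    map, the model returns index 4, which corresponds to absolute pfn
    8. Aligning the index alone would have returned index 0, that is
    absolute pfn 4.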

  17. Re: [PATCH] [18/18] Implement hugepagesz= option for x86-64

    Andi wrote:
    + hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
    + hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
    + On x86 this option can be specified multiple times
    + interleaved with hugepages= to reserve huge pages
    + of different sizes. Valid pages sizes on x86-64
    + are 2M (when the CPU supports "pse") and 1G (when the
    + CPU supports the "pdpe1gb" cpuinfo flag)
    + Note that 1GB pages can only be allocated at boot time
    + using hugepages= and not freed afterwards.

    This seems to say that hugepages are required for hugepagesz to be
    useful, but hugepagesz is supported on PPC, whereas hugepages is not
    supported on PPC ...odd.

    Should those two HW lists be the same (and sorted in the same order,
    for ease of reading)?
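
    To illustrate how the interleaved options in the quoted description
    might pair up, here is a small userspace sketch. It is not the
    kernel's parser. It assumes that each hugepages= applies to the
    most recent hugepagesz=, that the size defaults to 2M when no
    hugepagesz= has been seen yet, and that only the two x86-64 sizes
    named in the text (2M and 1G) are recognized. The struct and
    function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { SZ_2M_IDX = 0, SZ_1G_IDX = 1, NR_SIZES = 2 };

/* Number of huge pages requested for each supported size. */
struct reservation {
    unsigned long count[NR_SIZES];
};

/*
 * Parse a space-separated option string. Returns 0 on success and -1
 * if a hugepagesz= value is not one of the recognized sizes.
 */
static int parse_hugepage_opts(const char *cmdline, struct reservation *r)
{
    int cur = SZ_2M_IDX;    /* assumed default size */
    const char *p = cmdline;

    memset(r, 0, sizeof(*r));
    while (*p) {
        size_t len = strcspn(p, " ");   /* length of this token */

        if (len >= 11 && strncmp(p, "hugepagesz=", 11) == 0) {
            const char *v = p + 11;
            size_t vlen = len - 11;

            if (vlen == 2 && strncmp(v, "2M", 2) == 0)
                cur = SZ_2M_IDX;
            else if (vlen == 2 && strncmp(v, "1G", 2) == 0)
                cur = SZ_1G_IDX;
            else
                return -1;
        } else if (len > 10 && strncmp(p, "hugepages=", 10) == 0) {
            /* The count applies to the currently selected size. */
            r->count[cur] = strtoul(p + 10, NULL, 10);
        }
        p += len;
        p += strspn(p, " ");            /* skip separators */
    }
    return 0;
}
```

    Under these assumptions, the string "hugepagesz=1G hugepages=2
    hugepagesz=2M hugepages=512" requests two 1G pages and 512 2M
    pages.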

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214

  18. Re: [PATCH] [0/18] GB pages hugetlb support

    Andi wrote:
    > I hacked in also cpuset support. It would be good if
    > Paul double checked that.


    Well, from what I can see, Ken Chen wrote the code that deals with
    constraints on hugetlb allocation. So I'll copy him on this reply,
    along with the other two subject matter experts I know of in this area,
    Christoph Lameter and Adam Litke.

    The following is the only cpuset related change I saw in this
    patchset. It looks pretty obvious to me ... just changing the code to
    adapt to Andi's new 'struct hstate' for holding what had been global
    hugetlb state.

    @@ -1228,18 +1252,18 @@ static int hugetlb_acct_memory(long delt
     	 * semantics that cpuset has.
     	 */
     	if (delta > 0) {
    -		if (gather_surplus_pages(delta) < 0)
    +		if (gather_surplus_pages(h, delta) < 0)
     			goto out;

    -		if (delta > cpuset_mems_nr(free_huge_pages_node)) {
    -			return_unused_surplus_pages(delta);
    +		if (delta > cpuset_mems_nr(h->free_huge_pages_node)) {
    +			return_unused_surplus_pages(h, delta);
     			goto out;
     		}
     	}


    Andi claimed, in one of his replies earlier on this thread, that there
    were further interactions with cpusets and later patches in the set
    that "Add basic support for more than one hstate in hugetlbfs
    and partly Add support to have individual hstates for each hugetlbfs
    mount", but I'm not understanding what that interaction is yet.
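
    The shape of the change quoted above, a per-node counter array that
    moves from a global into a structure passed as a parameter, can be
    modelled in userspace roughly as follows. This is a sketch under
    assumptions: struct hstate_model is a cut-down, hypothetical
    stand-in for the patchset's struct hstate, and mems_nr() only
    approximates cpuset_mems_nr() by summing the per-node counts over a
    caller-supplied node mask rather than the current task's cpuset.

```c
#include <assert.h>

#define MAX_NODES 4  /* hypothetical small node count for the sketch */

/* Hypothetical stand-in for the per-page-size state structure. */
struct hstate_model {
    unsigned long free_huge_pages_node[MAX_NODES];
};

/* Sum the per-node counts over the nodes set in allowed_mask. */
static unsigned long mems_nr(unsigned long allowed_mask,
                             const unsigned long *per_node)
{
    unsigned long total = 0;

    for (int n = 0; n < MAX_NODES; n++)
        if (allowed_mask & (1UL << n))
            total += per_node[n];
    return total;
}

/*
 * Returns 0 if delta pages of this size are free on the allowed
 * nodes, -1 otherwise. The size-specific state comes in through h
 * instead of through a global array.
 */
static int acct_check(const struct hstate_model *h,
                      unsigned long allowed_mask, long delta)
{
    if (delta > 0 &&
        (unsigned long)delta > mems_nr(allowed_mask, h->free_huge_pages_node))
        return -1;
    return 0;
}
```

    With one instance per page size, the same node mask can give
    different answers for each size, because every instance carries its
    own free counts.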

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214

  19. Re: [PATCH] [18/18] Implement hugepagesz= option for x86-64

    On Mon, Mar 17, 2008 at 04:29:39AM -0500, Paul Jackson wrote:
    > Andi wrote:
    > + hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
    > + hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
    > + On x86 this option can be specified multiple times
    > + interleaved with hugepages= to reserve huge pages
    > + of different sizes. Valid pages sizes on x86-64
    > + are 2M (when the CPU supports "pse") and 1G (when the
    > + CPU supports the "pdpe1gb" cpuinfo flag)
    > + Note that 1GB pages can only be allocated at boot time
    > + using hugepages= and not freed afterwards.
    >
    > This seems to say that hugepages are required for hugepagesz to be


    Yes, but that was already there before. I didn't change it.

    I agree it should be fixed, but I would prefer not to mix
    PPC-specific patches into my patchkit, so I hope someone
    else will do that afterwards.

    > useful, but hugepagesz is supported on PPC, whereas hugepages is not
    > supported on PPC ...odd.
    >
    > Should those two HW lists be the same (and sorted in the same order,
    > for ease of reading)?


    Not all architectures support hugepagesz=, in particular i386
    does not and possibly others. It is implemented by arch specific
    code.

    -Andi

  20. Re: [PATCH] [18/18] Implement hugepagesz= option for x86-64

    Andi wrote:
    > Yes, but that was already there before. I didn't change it.
    >
    > I agree it should be fixed, but i would prefer to not mix
    > PPC specific patches into my patchkit


    Ok - good plan.

    Do you know offhand what would be the correct HW list for hugepages and
    hugepagesz?

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214
