[RFC] [PATCH 0/5 V2] Huge page backed user-space stacks - Kernel

Thread: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

  1. [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    Certain workloads benefit if their data or text segments are backed by
    huge pages. The stack is no exception to this rule, but there is
    currently no mechanism that reliably backs a stack with huge pages.
    Doing this from userspace is excessively messy and has some awkward
    restrictions, particularly on POWER, where 256MB of address space is
    wasted if the stack is set up there.

    This patch stack introduces a personality flag that indicates that the
    kernel should set up the stack as a hugetlbfs-backed region. A userspace
    utility may set this flag and then exec a process whose stack is to be
    backed by hugetlb pages.
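
    A minimal sketch of the userspace side (the full hugectl utility appears
    later in this thread; the HUGETLB_STACK value is the one proposed by
    this patch set, not an existing kernel constant):

    #include <sys/personality.h>
    #include <unistd.h>

    #define HUGETLB_STACK 0x0020000

    int main(int argc, char **argv)
    {
    	/* OR the new flag into the current personality bits, then exec
    	 * the target; the new image is built with a hugetlb stack. */
    	personality(personality(0xffffffff) | HUGETLB_STACK);
    	return execvp(argv[1], &argv[1]);
    }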

    Eric Munson (5):
    Align stack boundaries based on personality
    Add shared and reservation control to hugetlb_file_setup
    Split boundary checking from body of do_munmap
    Build hugetlb backed process stacks
    [PPC] Setup stack memory segment for hugetlb pages

    arch/powerpc/mm/hugetlbpage.c | 6 +
    arch/powerpc/mm/slice.c | 11 ++
    fs/exec.c | 209 ++++++++++++++++++++++++++++++++++++++---
    fs/hugetlbfs/inode.c | 52 +++++++----
    include/asm-powerpc/hugetlb.h | 3 +
    include/linux/hugetlb.h | 22 ++++-
    include/linux/mm.h | 1 +
    include/linux/personality.h | 3 +
    ipc/shm.c | 2 +-
    mm/mmap.c | 11 ++-
    10 files changed, 284 insertions(+), 36 deletions(-)


  2. [PATCH 3/5] Split boundary checking from body of do_munmap

    Currently do_munmap pre-checks the address range to be unmapped against
    the valid address range for the process. However, during initial setup
    the stack may actually be outside this range; in particular, it may
    initially be placed at the 64-bit stack address and later moved to the
    normal 32-bit stack location. In a later patch we will want to unmap
    the stack as part of relocating it into huge pages.

    This patch moves the bulk of do_munmap into __do_munmap, which is not
    protected by the boundary checking. When an area that would normally
    fail these checks needs to be unmapped (e.g. unmapping a stack that was
    set up at the 64-bit TASK_SIZE for a 32-bit process), __do_munmap
    should be called directly. do_munmap continues to do the boundary
    checking and calls __do_munmap as appropriate.

    Signed-off-by: Eric Munson

    ---
    Based on 2.6.26-rc8-mm1

    include/linux/mm.h | 1 +
    mm/mmap.c | 11 +++++++++--
    2 files changed, 10 insertions(+), 2 deletions(-)

    diff --git a/include/linux/mm.h b/include/linux/mm.h
    index a4eeb3c..59c6f89 100644
    --- a/include/linux/mm.h
    +++ b/include/linux/mm.h
    @@ -1152,6 +1152,7 @@ out:
     	return ret;
     }

    +extern int __do_munmap(struct mm_struct *, unsigned long, size_t);
     extern int do_munmap(struct mm_struct *, unsigned long, size_t);

     extern unsigned long do_brk(unsigned long, unsigned long);
    diff --git a/mm/mmap.c b/mm/mmap.c
    index 5b62e5d..4e56369 100644
    --- a/mm/mmap.c
    +++ b/mm/mmap.c
    @@ -1881,17 +1881,24 @@ int split_vma(struct mm_struct * mm, struct vm_area_struct * vma,
     	return 0;
     }

    +int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
    +{
    +	if (start > TASK_SIZE || len > TASK_SIZE-start)
    +		return -EINVAL;
    +	return __do_munmap(mm, start, len);
    +}
    +
     /* Munmap is split into 2 main parts -- this part which finds
      * what needs doing, and the areas themselves, which do the
      * work. This now handles partial unmappings.
      * Jeremy Fitzhardinge
      */
    -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
    +int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
     {
     	unsigned long end;
     	struct vm_area_struct *vma, *prev, *last;

    -	if ((start & ~PAGE_MASK) || start > TASK_SIZE || len > TASK_SIZE-start)
    +	if (start & ~PAGE_MASK)
     		return -EINVAL;

     	if ((len = PAGE_ALIGN(len)) == 0)
    --
    1.5.6.1


  3. Re: [PATCH 1/5 V2] Align stack boundaries based on personality

    On Mon, 2008-07-28 at 12:17 -0700, Eric Munson wrote:
    >
    > +static unsigned long personality_page_align(unsigned long addr)
    > +{
    > +	if (current->personality & HUGETLB_STACK)
    > +#ifdef CONFIG_STACK_GROWSUP
    > +		return HPAGE_ALIGN(addr);
    > +#else
    > +		return addr & HPAGE_MASK;
    > +#endif
    > +
    > +	return PAGE_ALIGN(addr);
    > +}

    ....
    > - stack_top = PAGE_ALIGN(stack_top);
    > + stack_top = personality_page_align(stack_top);
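
    (For reference, the two roundings quoted above go in opposite directions.
    A minimal sketch, assuming a 4MB huge page size and the usual kernel
    definitions of HPAGE_MASK/HPAGE_ALIGN:

    	#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
    	#define HPAGE_ALIGN(addr)	(((addr) + HPAGE_SIZE - 1) & HPAGE_MASK)

    	/* with stack_top = 0xbff71000 and HPAGE_SIZE = 4MB:                */
    	/* grows up:   HPAGE_ALIGN(0xbff71000) == 0xc0000000, rounded up    */
    	/* grows down: 0xbff71000 & HPAGE_MASK == 0xbfc00000, rounded down  */

    so a grows-down stack has its top rounded down, keeping the huge page
    region below the original top.)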


    Just out of curiosity, why doesn't the existing small-page case seem to
    care about the stack growing up/down? Why do you need to care in the
    large page case?

    -- Dave


  4. Re: [PATCH 4/5 V2] Build hugetlb backed process stacks

    On Mon, 2008-07-28 at 12:17 -0700, Eric Munson wrote:
    >
    > +static int move_to_huge_pages(struct linux_binprm *bprm,
    > +			      struct vm_area_struct *vma, unsigned long shift)
    > +{
    > +	struct mm_struct *mm = vma->vm_mm;
    > +	struct vm_area_struct *new_vma;
    > +	unsigned long old_end = vma->vm_end;
    > +	unsigned long old_start = vma->vm_start;
    > +	unsigned long new_end = old_end - shift;
    > +	unsigned long new_start, length;
    > +	unsigned long arg_size = new_end - bprm->p;
    > +	unsigned long flags = vma->vm_flags;
    > +	struct file *hugefile = NULL;
    > +	unsigned int stack_hpages = 0;
    > +	struct page **from_pages = NULL;
    > +	struct page **to_pages = NULL;
    > +	unsigned long num_pages = (arg_size / PAGE_SIZE) + 1;
    > +	int ret;
    > +	int i;
    > +
    > +#ifdef CONFIG_STACK_GROWSUP


    Why do you have the #ifdef for the CONFIG_STACK_GROWSUP=y case in that
    first patch if you don't support CONFIG_STACK_GROWSUP=y?

    I think it might be worth some time to break this up a wee little bit.
    16 local variables is a bit on the beefy side.

    -- Dave


  5. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Mon, 2008-07-28 at 12:17 -0700, Eric Munson wrote:
    >
    > This patch stack introduces a personality flag that indicates the kernel
    > should setup the stack as a hugetlbfs-backed region. A userspace utility
    > may set this flag then exec a process whose stack is to be backed by
    > hugetlb pages.


    I didn't see it mentioned here, but these stacks are fixed-size, right?
    They can't actually grow and are fixed in size at exec() time, right?

    -- Dave


  6. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Mon, 28 Jul 2008, Dave Hansen wrote:

    > On Mon, 2008-07-28 at 12:17 -0700, Eric Munson wrote:
    > >
    > > This patch stack introduces a personality flag that indicates the kernel
    > > should setup the stack as a hugetlbfs-backed region. A userspace utility
    > > may set this flag then exec a process whose stack is to be backed by
    > > hugetlb pages.

    >
    > I didn't see it mentioned here, but these stacks are fixed-size, right?
    > They can't actually grow and are fixed in size at exec() time, right?
    >
    > -- Dave


    The stack VMA is a fixed size but the pages will be faulted in as needed.

    --
    Eric B Munson
    IBM Linux Technology Center
    ebmunson@us.ibm.com




  7. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson wrote:

    > Certain workloads benefit if their data or text segments are backed by
    > huge pages.


    oh. As this is a performance patch, it would be much better if its
    description contained some performance measurement results! Please.

  8. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson wrote:

    > Certain workloads benefit if their data or text segments are backed by
    > huge pages. The stack is no exception to this rule but there is no
    > mechanism currently that allows the backing of a stack reliably with
    > huge pages. Doing this from userspace is excessively messy and has some
    > awkward restrictions. Particularly on POWER where 256MB of address space
    > gets wasted if the stack is setup there.
    >
    > This patch stack introduces a personality flag that indicates the kernel
    > should setup the stack as a hugetlbfs-backed region. A userspace utility
    > may set this flag then exec a process whose stack is to be backed by
    > hugetlb pages.
    >
    > [patch list and diffstat snipped]


    That all looks surprisingly straightforward.

    Might there exist an x86 port which people can play with?

  9. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    /****************************************************************************
     * User front end for using huge pages	Copyright (C) 2008, IBM
     *
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the Lesser GNU General Public License as
     * published by the Free Software Foundation; either version 2.1 of the
     * License, or (at your option) any later version.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     * GNU Lesser General Public License for more details.
     *
     * You should have received a copy of the Lesser GNU General Public
     * License along with this program; if not, write to the
     * Free Software Foundation, Inc.,
     * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
     ****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>

    #define _GNU_SOURCE /* for getopt_long */
    #include <unistd.h>
    #include <getopt.h>
    #include <sys/personality.h>

    /* Personality bit for huge page backed stack */
    #ifndef HUGETLB_STACK
    #define HUGETLB_STACK 0x0020000
    #endif

    extern int errno;
    extern int optind;
    extern char *optarg;

    void print_usage()
    {
    	fprintf(stderr, "hugectl [options] target\n");
    	fprintf(stderr, "options:\n");
    	fprintf(stderr, " --help, -h  Prints this message.\n");
    	fprintf(stderr,
    		" --stack, -s Attempts to execute target program with a hugetlb page backed stack.\n");
    }

    void set_huge_stack()
    {
    	char *err;
    	unsigned long curr_per = personality(0xffffffff);

    	if (personality(curr_per | HUGETLB_STACK) == -1) {
    		err = strerror(errno);
    		fprintf(stderr,
    			"Error setting HUGE_STACK personality flag: '%s'\n",
    			err);
    		exit(-1);
    	}
    }

    int main(int argc, char **argv)
    {
    	char opts[] = "+hs";
    	int ret = 0, index = 0;
    	struct option long_opts[] = {
    		{"help", 0, 0, 'h'},
    		{"stack", 0, 0, 's'},
    		{0, 0, 0, 0},
    	};

    	if (argc < 2) {
    		print_usage();
    		return 0;
    	}

    	while (ret != -1) {
    		ret = getopt_long(argc, argv, opts, long_opts, &index);
    		switch (ret) {
    		case 's':
    			set_huge_stack();
    			break;

    		case '?':
    		case 'h':
    			print_usage();
    			return 0;

    		case -1:
    			break;

    		default:
    			ret = -1;
    			break;
    		}
    	}
    	index = optind;

    	if (execvp(argv[index], &argv[index]) == -1) {
    		ret = errno;
    		fprintf(stderr, "Error calling execvp: '%s'\n", strerror(ret));
    		return ret;
    	}

    	return 0;
    }
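
    (Assuming the utility above is built as "hugectl" and huge pages have
    been reserved via /proc/sys/vm/nr_hugepages, usage would be along the
    lines of: hugectl --stack ./target-app)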




  10. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Wed, 30 Jul 2008, Andrew Morton wrote:

    > On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson wrote:
    >
    > > [cover letter and diffstat snipped]

    >
    > That all looks surprisingly straightforward.
    >
    > Might there exist an x86 port which people can play with?
    >


    I have tested these patches on x86, x86_64, and ppc64, but not yet on ia64.
    There is a user space utility that I have been using to test with, which
    would be included in libhugetlbfs if this is merged into the kernel. I will
    send it out as a reply to this thread; performance numbers are also on the way.

    --
    Eric B Munson
    IBM Linux Technology Center
    ebmunson@us.ibm.com




  11. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (30/07/08 01:43), Andrew Morton didst pronounce:
    > On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson wrote:
    >
    > > Certain workloads benefit if their data or text segments are backed by
    > > huge pages.

    >
    > oh. As this is a performance patch, it would be much better if its
    > description contained some performance measurement results! Please.
    >


    I ran these patches through STREAM (http://www.cs.virginia.edu/stream/).
    STREAM itself was patched to allocate its data from the stack instead of
    statically for the test. The runs completed without any problem on x86,
    x86_64 and PPC64, and each test showed a performance gain from using
    hugepages. I can post the raw figures but they are not currently in an
    eye-friendly format. Here are some plots of the data though:

    x86: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps
    x86_64: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps
    ppc64-small: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps
    ppc64-large: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps

    The test was to run STREAM with different array sizes (plotted on the
    X-axis) and measure the average throughput (Y-axis). In each case,
    backing the stack with large pages yielded a performance gain.
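
    For illustration, the kind of modification involved is below (a sketch,
    not the actual patch; stock STREAM declares its arrays as file-scope
    statics, and the array size here is arbitrary):

    #include <stdio.h>

    #define N (2 * 1024 * 1024)	/* array elements, illustrative */

    int main(void)
    {
    	/* automatic instead of static storage, so the benchmark traffic
    	 * hits the stack; ~48MB of stack, so raise "ulimit -s" first */
    	double a[N], b[N], c[N];
    	long i;

    	for (i = 0; i < N; i++) {
    		a[i] = 1.0;
    		b[i] = 2.0;
    	}
    	for (i = 0; i < N; i++)		/* the STREAM "add" kernel */
    		c[i] = a[i] + b[i];

    	printf("%f\n", c[N / 2]);	/* defeat dead-code elimination */
    	return 0;
    }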

    --
    Mel Gorman
    Part-time PhD Student                          Linux Technology Center
    University of Limerick                         IBM Dublin Software Lab

  12. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Wed, 30 Jul 2008 18:23:18 +0100 Mel Gorman wrote:

    > On (30/07/08 01:43), Andrew Morton didst pronounce:
    > > On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson wrote:
    > >
    > > > Certain workloads benefit if their data or text segments are backed by
    > > > huge pages.

    > >
    > > oh. As this is a performance patch, it would be much better if its
    > > description contained some performance measurement results! Please.
    > >

    >
    > I ran these patches through STREAM (http://www.cs.virginia.edu/stream/).
    > STREAM itself was patched to allocate data from the stack instead of statically
    > for the test. They completed without any problem on x86, x86_64 and PPC64
    > and each test showed a performance gain from using hugepages. I can post
    > the raw figures but they are not currently in an eye-friendly format. Here
    > are some plots of the data though;
    >
    > x86: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps
    > x86_64: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps
    > ppc64-small: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps
    > ppc64-large: http://www.csn.ul.ie/~mel/postings/s...tream-stack.ps
    >
    > The test was to run STREAM with different array sizes (plotted on the
    > X-axis) and measure the average throughput (Y-axis). In each case,
    > backing the stack with large pages yielded a performance gain.


    So about a 10% speedup on x86 for most STREAM configurations. Handy -
    that's somewhat larger than most hugepage-conversions, iirc.

    Do we expect that this change will be replicated in other
    memory-intensive apps? (I do).


  13. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (30/07/08 10:34), Andrew Morton didst pronounce:
    > On Wed, 30 Jul 2008 18:23:18 +0100 Mel Gorman wrote:
    >
    > > [STREAM results snipped]

    >
    > So about a 10% speedup on x86 for most STREAM configurations. Handy -
    > that's somewhat larger than most hugepage-conversions, iirc.
    >


    It is a bit. Usually, I expect around 5%.

    > Do we expect that this change will be replicated in other
    > memory-intensive apps? (I do).
    >


    I expect so. I know SpecCPU has some benchmarks that are stack-dependent
    and would benefit from this patchset. I haven't experimented enough yet
    with other workloads to give a decent estimate. I've added Andrew Hastings
    to the cc as I believe he can make a good estimate of the sort of gains to
    be had by backing the stack with huge pages, based on experiments along
    those lines. Andrew?

    With Eric's patch and libhugetlbfs, we can automatically back text/data[1],
    malloc[2] and stacks without source modification. Fairly soon, libhugetlbfs
    will also be able to override shmget() to add SHM_HUGETLB. That should cover
    a lot of the memory-intensive apps without source modification.

    [1] It can partially remap non-hugepage-aligned segments but ideally the
    application would be relinked

    [2] Allocated via the morecore hook in glibc
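
    (For illustration, the malloc case needs no recompile; with libhugetlbfs
    it is driven by environment variables, along the lines of
    LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes ./app, while the stack
    case would use the personality flag from this patch set.)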

    --
    Mel Gorman
    Part-time PhD Student                          Linux Technology Center
    University of Limerick                         IBM Dublin Software Lab

  14. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    Mel Gorman wrote:

    > With Eric's patch and libhugetlbfs, we can automatically back text/data[1],
    > malloc[2] and stacks without source modification. Fairly soon, libhugetlbfs
    > will also be able to override shmget() to add SHM_HUGETLB. That should cover
    > a lot of the memory-intensive apps without source modification.


    So we are quite far down the road to having a VM that supports two page sizes, 4k and 2M?


  15. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Wed, 30 Jul 2008 20:30:10 +0100
    Mel Gorman wrote:

    > With Eric's patch and libhugetlbfs, we can automatically back text/data[1],
    > malloc[2] and stacks without source modification. Fairly soon, libhugetlbfs
    > will also be able to override shmget() to add SHM_HUGETLB. That should cover
    > a lot of the memory-intensive apps without source modification.


    The weak link in all of this still might be the need to reserve
    hugepages and the unreliability of dynamically allocating them.

    The dynamic allocation should be better nowadays, but I've lost track
    of how reliable it really is. What's our status there?

    Thanks.

  16. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Thursday 31 July 2008 03:34, Andrew Morton wrote:
    > On Wed, 30 Jul 2008 18:23:18 +0100 Mel Gorman wrote:
    > > On (30/07/08 01:43), Andrew Morton didst pronounce:
    > > > On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson wrote:
    > > > > Certain workloads benefit if their data or text segments are backed
    > > > > by huge pages.
    > > >
    > > > oh. As this is a performance patch, it would be much better if its
    > > > description contained some performance measurement results! Please.

    > >
    > > I ran these patches through STREAM (http://www.cs.virginia.edu/stream/).
    > > STREAM itself was patched to allocate data from the stack instead of
    > > statically for the test. They completed without any problem on x86,
    > > x86_64 and PPC64 and each test showed a performance gain from using
    > > hugepages. I can post the raw figures but they are not currently in an
    > > eye-friendly format. Here are some plots of the data though;
    > >
    > > x86: http://www.csn.ul.ie/~mel/postings/s...86-stream-stack.ps
    > > x86_64: http://www.csn.ul.ie/~mel/postings/s...86_64-stream-stack.ps
    > > ppc64-small: http://www.csn.ul.ie/~mel/postings/s...pc64-small-stream-stack.ps
    > > ppc64-large: http://www.csn.ul.ie/~mel/postings/s...pc64-large-stream-stack.ps
    > >
    > > The test was to run STREAM with different array sizes (plotted on the
    > > X-axis) and measure the average throughput (Y-axis). In each case,
    > > backing the stack with large pages yielded a performance gain.

    >
    > So about a 10% speedup on x86 for most STREAM configurations. Handy -
    > that's somewhat larger than most hugepage-conversions, iirc.


    Although it might be a bit unusual to have codes doing huge streaming
    memory operations on stack memory...

    We can see why IBM is so keen on their hugepages, though.


    > Do we expect that this change will be replicated in other
    > memory-intensive apps? (I do).


    Such as what? It would be nice to see some numbers with some HPC or java
    or DBMS workload using this. Not that I dispute it will help some cases,
    but 10% (or 20% for ppc) I guess is getting toward the best case, short
    of a specifically written TLB thrasher.

  17. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin wrote:

    > > Do we expect that this change will be replicated in other
    > > memory-intensive apps? (I do).

    >
    > Such as what? It would be nice to see some numbers with some HPC or java
    > or DBMS workload using this. Not that I dispute it will help some cases,
    > but 10% (or 20% for ppc) I guess is getting toward the best case, short
    > of a specifically written TLB thrasher.


    I didn't realise that STREAM is using vast amounts of automatic memory.
    I'd assumed that it was using sane amounts of stack, but the stack TLB
    slots were getting zapped by all the heap-memory activity. Oh well.

    I guess that effect is still there, but smaller.

    I agree that few real-world apps are likely to see gains of this
    order. More benchmarks, please.

  18. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Thursday 31 July 2008 16:14, Andrew Morton wrote:
    > On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin wrote:
    > > > Do we expect that this change will be replicated in other
    > > > memory-intensive apps? (I do).

    > >
    > > Such as what? It would be nice to see some numbers with some HPC or java
    > > or DBMS workload using this. Not that I dispute it will help some cases,
    > > but 10% (or 20% for ppc) I guess is getting toward the best case, short
    > > of a specifically written TLB thrasher.

    >
    > I didn't realise that STREAM is using vast amounts of automatic memory.
    > I'd assumed that it was using sane amounts of stack, but the stack TLB
    > slots were getting zapped by all the heap-memory activity. Oh well.


    An easy mistake to make because that's probably how STREAM would normally
    work. I think what Mel had done is to modify the STREAM kernel so as to
    have it operate on arrays of stack memory.


    > I guess that effect is still there, but smaller.


    I imagine it should be, unless you're using a CPU with separate TLBs for
    small and huge pages, and your large data set is mapped with huge pages,
    in which case you might now introduce *new* TLB contention between the
    stack and the dataset.

    Also, interestingly, I have actually seen some CPUs whose memory operations
    get significantly slower when operating on large pages than small (in the
    case when there is full TLB coverage for both sizes). This would make
    sense if the CPU only implements a fast L1 TLB for small pages.

    So for the vast majority of workloads, where stacks are relatively small
    (or slowly changing), and relatively hot, I suspect this could easily have
    no benefit at best and slowdowns at worst.

    But I'm not saying that as a reason not to merge it -- this is no
    different from any other hugepage allocations and as usual they have to be
    used selectively where they help.... I just wonder exactly where huge
    stacks will help.


    > I agree that few real-world apps are likely to see gains of this
    > order. More benchmarks, please


    Would be nice, if just out of morbid curiosity.

  19. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (30/07/08 13:07), Andrew Morton didst pronounce:
    > On Wed, 30 Jul 2008 20:30:10 +0100
    > Mel Gorman wrote:
    >
    > > With Erics patch and libhugetlbfs, we can automatically back text/data[1],
    > > malloc[2] and stacks without source modification. Fairly soon, libhugetlbfs
    > > will also be able to override shmget() to add SHM_HUGETLB. That should cover
    > > a lot of the memory-intensive apps without source modification.

    >
    > The weak link in all of this still might be the need to reserve
    > hugepages and the unreliability of dynamically allocating them.
    >
    > The dynamic allocation should be better nowadays, but I've lost track
    > of how reliable it really is. What's our status there?
    >


    We are a lot more reliable than we were although exact quantification is
    difficult because it's workload dependent. For a long time, I've been able
    to test bits and pieces with hugepages by allocating the pool at the time
    I needed it even after days of uptime. Previously this required a reboot.

    I've also been able to use the dynamic hugepage pool resizing effectively
    and we track how much it is succeeding and failing in /proc/vmstat (see the
    htlb fields) to watch for problems. Between that and /proc/pagetypeinfo, I am
    expecting to be able to identify availability problems. As an administrator
    can now set a minimum pool size and the maximum size of the pool (nr_hugepages
    and nr_overcommit_hugepages), the configuration difficulties should be eased.
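
    (For example, writing 20 to /proc/sys/vm/nr_hugepages guarantees a
    minimum pool of 20 huge pages, while writing 100 to
    /proc/sys/vm/nr_overcommit_hugepages additionally allows the pool to
    grow dynamically by up to 100 surplus pages.)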

    If it is found that anti-fragmentation can be broken down and pool
    resizing starts failing after X amount of time on Y workloads, there is
    still the option of using movablecore=BiggestPoolSizeIWillEverNeed
    and writing 1 to /proc/sys/vm/hugepages_treat_as_movable so the hugepage
    pool can grow/shrink reliably there.

    Overall, it's in pretty good shape.

    To be fair, one snag is that swap is almost required for pool resizing
    to work, as I never pushed to complete memory compaction
    (http://lwn.net/Articles/238837/). Hence, we depend on the workload
    having lots of filesystem-backed data for lumpy-reclaim to do its job,
    on pool resizing taking place between batch jobs, or on swap being
    configured, even if it's just for the duration of a pool resize.

    --
    Mel Gorman
    Part-time PhD Student                          Linux Technology Center
    University of Limerick                         IBM Dublin Software Lab

  20. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (31/07/08 16:26), Nick Piggin didst pronounce:
    > On Thursday 31 July 2008 16:14, Andrew Morton wrote:
    > > On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin wrote:
    > > > [earlier discussion snipped]
    > >
    > > I didn't realise that STREAM is using vast amounts of automatic memory.
    > > I'd assumed that it was using sane amounts of stack, but the stack TLB
    > > slots were getting zapped by all the heap-memory activity. Oh well.

    >
    > An easy mistake to make because that's probably how STREAM would normally
    > work. I think what Mel had done is to modify the STREAM kernel so as to
    > have it operate on arrays of stack memory.
    >


    Yes, I mentioned in the mail that STREAM was patched to use the stack for
    its data. It was as much to show that the patches were working as
    advertised, even though it is obviously an extreme case.

    I had seen stack-hugepage-backing as something that would improve
    performance in combination with other backing rather than having to stand
    entirely on its own. For example, I would expect many memory-intensive
    applications to gain more from having both malloc and the stack backed
    than from backing either in isolation.

    > > I guess that effect is still there, but smaller.

    >
    > I imagine it should be, unless you're using a CPU with separate TLBs for
    > small and huge pages, and your large data set is mapped with huge pages,
    > in which case you might now introduce *new* TLB contention between the
    > stack and the dataset


    Yes, this can happen, particularly on older CPUs. For example, the
    Pentium III in my crash-test laptop reports

    TLB and cache info:
    01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries
    02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries

    so a workload that sparsely addressed memory (i.e. >= 4MB strides on each
    reference) might suffer more TLB misses with large pages than with small.
    It's hardly new that there is uncertainty around when, where, and if
    hugepages are of benefit.

    > Also, interestingly I have actually seen some CPUs whose memory operations
    > get significantly slower when operating on large pages than small (in the
    > case when there is full TLB coverage for both sizes). This would make
    > sense if the CPU only implements a fast L1 TLB for small pages.
    >


    It's also possible there is a micro-TLB involved that only supports small
    pages. It's been the case for a while that what wins on one machine type
    may lose on another.

    > So for the vast majority of workloads, where stacks are relatively small
    > (or slowly changing), and relatively hot, I suspect this could easily have
    > no benefit at best and slowdowns at worst.
    >


    I wouldn't expect an application with small stacks to request its stack
    be backed by hugepages either. Ideally, it would be enabled because a
    large enough number of DTLB misses were found to be in the stack,
    although catching this sort of data is tricky.

    > But I'm not saying that as a reason not to merge it -- this is no
    > different from any other hugepage allocations and as usual they have to be
    > used selectively where they help.... I just wonder exactly where huge
    > stacks will help.
    >


    Benchmark-wise, SPECcpu and SPEComp have stack-dependent benchmarks.
    Computations that partition problems with recursion should benefit, as
    should some JVMs that heavily use the stack (see how many docs suggest
    setting ulimit -s unlimited). A bit out there, but stack-based languages
    would also stand to gain from this. The potential gap is threaded apps,
    as there will be stacks that are not the "main" stack. Backing those with
    hugepages depends on how they are allocated (malloc, it's easy;
    MAP_ANONYMOUS, not so much).

    > > I agree that few real-world apps are likely to see gains of this
    > > order. More benchmarks, please

    >
    > Would be nice, if just out of morbid curiosity
    >


    Benchmarks will happen; they just take time, you know the way. In the
    meantime, the STREAM one shows that this works and has an effect. I'm
    hoping Andrew Hastings will have figures at hand; I cc'd him elsewhere in
    the thread for comment.

    Thanks

    --
    Mel Gorman
    Part-time PhD Student                          Linux Technology Center
    University of Limerick                         IBM Dublin Software Lab
