[PATCH 0/4] Fix vmalloc regression - Kernel

This is a discussion on [PATCH 0/4] Fix vmalloc regression - Kernel ; Nick, This is the whole set of patches I was talking about. Patch 3 is the one that in fact fixes the problem Patches 1 and 2 are debugging aids I made use of, and could be possibly useful to ...

+ Reply to Thread
Results 1 to 9 of 9

Thread: [PATCH 0/4] Fix vmalloc regression

  1. [PATCH 0/4] Fix vmalloc regression

    Nick,

    This is the whole set of patches I was talking about.
    Patch 3 is the one that in fact fixes the problem
    Patches 1 and 2 are debugging aids I made use of, and could be possibly
    useful to others
    Patch 4 removes guard pages entirely for non-debug kernels, as we have already
    previously discussed.

    Hope it's all fine.


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [PATCH 2/4] show size of failing allocation

    if we can't service a vmalloc allocation, show size of
    the allocation that actually failed. Useful for
    debugging.

    Signed-off-by: Glauber Costa
    ---
    mm/vmalloc.c | 4 ++--
    1 files changed, 2 insertions(+), 2 deletions(-)

    diff --git a/mm/vmalloc.c b/mm/vmalloc.c
    index 95856d1..7db493d 100644
    --- a/mm/vmalloc.c
    +++ b/mm/vmalloc.c
    @@ -381,8 +381,8 @@ found:
    goto retry;
    }
    if (printk_ratelimit())
    - printk(KERN_WARNING "vmap allocation failed: "
    - "use vmalloc= to increase size.\n");
    + printk(KERN_WARNING "vmap allocation for size %d failed: "
    + "use vmalloc= to increase size.\n", size);
    return ERR_PTR(-EBUSY);
    }

    --
    1.5.6.5

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. [PATCH 3/4] restart search at beggining of vmalloc address

    Current vmalloc restart search for a free area in case we
    can't find one. The reason is there are areas which are lazily
    freed, and could be possibly freed now. However, current implementation
    start searching the tree from the last failing address, which is
    pretty much by definition at the end of address space. So, we fail.

    The proposal of this patch is to restart the search from the beginning
    of the requested vstart address. This fixes the regression in running
    KVM virtual machines for me, described in
    http://lkml.org/lkml/2008/10/28/349, caused by commit
    db64fe02258f1507e13fe5212a989922323685ce.

    Signed-off-by: Glauber Costa
    CC: Nick Piggin
    ---
    mm/vmalloc.c | 1 +
    1 files changed, 1 insertions(+), 0 deletions(-)

    diff --git a/mm/vmalloc.c b/mm/vmalloc.c
    index 7db493d..6fe2003 100644
    --- a/mm/vmalloc.c
    +++ b/mm/vmalloc.c
    @@ -378,6 +378,7 @@ found:
    if (!purged) {
    purge_vmap_area_lazy();
    purged = 1;
    + addr = ALIGN(vstart, align);
    goto retry;
    }
    if (printk_ratelimit())
    --
    1.5.6.5

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. [PATCH 4/4] Do not use guard pages in non-debug kernels

    In mm/vmalloc.c, make usage of guard pages dependant
    on CONFIG_DEBUG_PAGEALLOC.

    Signed-off-by: Glauber Costa
    ---
    mm/vmalloc.c | 25 +++++++++++++++----------
    1 files changed, 15 insertions(+), 10 deletions(-)

    diff --git a/mm/vmalloc.c b/mm/vmalloc.c
    index 6fe2003..ed73c6f 100644
    --- a/mm/vmalloc.c
    +++ b/mm/vmalloc.c
    @@ -28,6 +28,11 @@
    #include
    #include

    +#ifdef CONFIG_DEBUG_PAGEALLOC
    +#define GUARD_PAGE_SIZE PAGE_SIZE
    +#else
    +#define GUARD_PAGE_SIZE 0
    +#endif

    /*** Page table manipulation functions ***/

    @@ -363,7 +368,7 @@ retry:
    }

    while (addr + size >= first->va_start && addr + size <= vend) {
    - addr = ALIGN(first->va_end + PAGE_SIZE, align);
    + addr = ALIGN(first->va_end, align);

    n = rb_next(&first->rb_node);
    if (n)
    @@ -954,7 +959,7 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
    int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
    {
    unsigned long addr = (unsigned long)area->addr;
    - unsigned long end = addr + area->size - PAGE_SIZE;
    + unsigned long end = addr + area->size - GUARD_PAGE_SIZE;
    int err;

    err = vmap_page_range(addr, end, prot, *pages);
    @@ -1003,7 +1008,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
    /*
    * We always allocate a guard page.
    */
    - size += PAGE_SIZE;
    + size += GUARD_PAGE_SIZE;

    va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
    if (IS_ERR(va)) {
    @@ -1098,7 +1103,7 @@ struct vm_struct *remove_vm_area(const void *addr)
    struct vm_struct *vm = va->private;
    struct vm_struct *tmp, **p;
    free_unmap_vmap_area(va);
    - vm->size -= PAGE_SIZE;
    + vm->size -= GUARD_PAGE_SIZE;

    write_lock(&vmlist_lock);
    for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
    @@ -1226,7 +1231,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
    struct page **pages;
    unsigned int nr_pages, array_size, i;

    - nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
    + nr_pages = (area->size - GUARD_PAGE_SIZE) >> PAGE_SHIFT;
    array_size = (nr_pages * sizeof(struct page *));

    area->nr_pages = nr_pages;
    @@ -1451,7 +1456,7 @@ long vread(char *buf, char *addr, unsigned long count)
    read_lock(&vmlist_lock);
    for (tmp = vmlist; tmp; tmp = tmp->next) {
    vaddr = (char *) tmp->addr;
    - if (addr >= vaddr + tmp->size - PAGE_SIZE)
    + if (addr >= vaddr + tmp->size - GUARD_PAGE_SIZE)
    continue;
    while (addr < vaddr) {
    if (count == 0)
    @@ -1461,7 +1466,7 @@ long vread(char *buf, char *addr, unsigned long count)
    addr++;
    count--;
    }
    - n = vaddr + tmp->size - PAGE_SIZE - addr;
    + n = vaddr + tmp->size - GUARD_PAGE_SIZE - addr;
    do {
    if (count == 0)
    goto finished;
    @@ -1489,7 +1494,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
    read_lock(&vmlist_lock);
    for (tmp = vmlist; tmp; tmp = tmp->next) {
    vaddr = (char *) tmp->addr;
    - if (addr >= vaddr + tmp->size - PAGE_SIZE)
    + if (addr >= vaddr + tmp->size - GUARD_PAGE_SIZE)
    continue;
    while (addr < vaddr) {
    if (count == 0)
    @@ -1498,7 +1503,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
    addr++;
    count--;
    }
    - n = vaddr + tmp->size - PAGE_SIZE - addr;
    + n = vaddr + tmp->size - GUARD_PAGE_SIZE - addr;
    do {
    if (count == 0)
    goto finished;
    @@ -1544,7 +1549,7 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
    if (!(area->flags & VM_USERMAP))
    return -EINVAL;

    - if (usize + (pgoff << PAGE_SHIFT) > area->size - PAGE_SIZE)
    + if (usize + (pgoff << PAGE_SHIFT) > area->size - GUARD_PAGE_SIZE)
    return -EINVAL;

    addr += pgoff << PAGE_SHIFT;
    --
    1.5.6.5

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. [PATCH 1/4] don't call __vmalloc from other vmap internal functions

    If we do that, output of files like /proc/vmallocinfo
    will show things like "vmalloc_32", "vmalloc_user", or
    whomever the caller was as the caller. This info is not
    as useful as the real caller of the allocation.

    So, proposal is to call __vmalloc_node node directly, with
    matching parameters to save the caller information

    Signed-off-by: Glauber Costa
    ---
    mm/vmalloc.c | 12 ++++++++----
    1 files changed, 8 insertions(+), 4 deletions(-)

    diff --git a/mm/vmalloc.c b/mm/vmalloc.c
    index 0365369..95856d1 100644
    --- a/mm/vmalloc.c
    +++ b/mm/vmalloc.c
    @@ -1343,7 +1343,8 @@ void *vmalloc_user(unsigned long size)
    struct vm_struct *area;
    void *ret;

    - ret = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL);
    + ret = __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
    + PAGE_KERNEL, -1, __builtin_return_address(0));
    if (ret) {
    area = find_vm_area(ret);
    area->flags |= VM_USERMAP;
    @@ -1388,7 +1389,8 @@ EXPORT_SYMBOL(vmalloc_node);

    void *vmalloc_exec(unsigned long size)
    {
    - return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC);
    + return __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
    + -1, __builtin_return_address(0));
    }

    #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
    @@ -1408,7 +1410,8 @@ void *vmalloc_exec(unsigned long size)
    */
    void *vmalloc_32(unsigned long size)
    {
    - return __vmalloc(size, GFP_VMALLOC32, PAGE_KERNEL);
    + return __vmalloc_node(size, GFP_VMALLOC32, PAGE_KERNEL,
    + -1, __builtin_return_address(0));
    }
    EXPORT_SYMBOL(vmalloc_32);

    @@ -1424,7 +1427,8 @@ void *vmalloc_32_user(unsigned long size)
    struct vm_struct *area;
    void *ret;

    - ret = __vmalloc(size, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL);
    + ret = __vmalloc_node(size, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
    + -1, __builtin_return_address(0));
    if (ret) {
    area = find_vm_area(ret);
    area->flags |= VM_USERMAP;
    --
    1.5.6.5

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH 0/4] Fix vmalloc regression

    Glauber Costa wrote:
    > Nick,
    >
    > This is the whole set of patches I was talking about.
    > Patch 3 is the one that in fact fixes the problem...


    Yep, patch 3 works for me, thanks. Only the 32-bit kernel
    seems to need the patch, FWIW.


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH 0/4] Fix vmalloc regression

    On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
    > Nick,
    >
    > This is the whole set of patches I was talking about.
    > Patch 3 is the one that in fact fixes the problem
    > Patches 1 and 2 are debugging aids I made use of, and could be possibly
    > useful to others
    > Patch 4 removes guard pages entirely for non-debug kernels, as we have already
    > previously discussed.
    >
    > Hope it's all fine.


    OK, these all look good, but I may only push 3/4 for Linus in this round,
    along with some of the changes from my patch that you tested as well.

    With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps should
    turn off the lazy unmapping optimisation as well, so it catches use
    after free similarly to the page allocator... but probably it is a good
    idea at least to avoid the double-guard page for 2.6.28?

    Anyway thanks for these, I'll send them up to Andrew/Linus and cc you.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH 0/4] Fix vmalloc regression

    On Sat, Nov 08, 2008 at 01:58:32AM +0100, Nick Piggin wrote:
    > On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
    > > Nick,
    > >
    > > This is the whole set of patches I was talking about.
    > > Patch 3 is the one that in fact fixes the problem
    > > Patches 1 and 2 are debugging aids I made use of, and could be possibly
    > > useful to others
    > > Patch 4 removes guard pages entirely for non-debug kernels, as we have already
    > > previously discussed.
    > >
    > > Hope it's all fine.

    >
    > OK, these all look good, but I may only push 3/4 for Linus in this round,
    > along with some of the changes from my patch that you tested as well.


    Makes total sense.
    >
    > With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps should
    > turn off the lazy unmapping optimisation as well, so it catches use
    > after free similarly to the page allocator... but probably it is a good
    > idea at least to avoid the double-guard page for 2.6.28?


    Makes sense. Maybe poisoning after free would also be useful?
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH 0/4] Fix vmalloc regression

    On Saturday 08 November 2008 13:13, Glauber Costa wrote:
    > On Sat, Nov 08, 2008 at 01:58:32AM +0100, Nick Piggin wrote:
    > > On Fri, Nov 07, 2008 at 08:35:50PM -0200, Glauber Costa wrote:
    > > > Nick,
    > > >
    > > > This is the whole set of patches I was talking about.
    > > > Patch 3 is the one that in fact fixes the problem
    > > > Patches 1 and 2 are debugging aids I made use of, and could be possibly
    > > > useful to others
    > > > Patch 4 removes guard pages entirely for non-debug kernels, as we have
    > > > already previously discussed.
    > > >
    > > > Hope it's all fine.

    > >
    > > OK, these all look good, but I may only push 3/4 for Linus in this round,
    > > along with some of the changes from my patch that you tested as well.

    >
    > Makes total sense.


    OK, sent. Thanks again.


    > > With the DEBUG_PAGEALLOC case, I have been thinking that we perhaps
    > > should turn off the lazy unmapping optimisation as well, so it catches
    > > use after free similarly to the page allocator... but probably it is a
    > > good idea at least to avoid the double-guard page for 2.6.28?

    >
    > Makes sense. Maybe poisoning after free would also be useful?


    It's a problem because we're only dealing with virtual address, rather
    than real memory. So we don't really have anything to poison (we don't
    know what the caller will do with the memory). I guess it would be
    possible to poison in the page allocator or in vfree, but.... probably
    not worthwhile (after the immediate-unmap debug option).
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread