[PATCH 0/3 (RFC)](memory hotplug) freeing pages allocated by bootmem for hotremove - Kernel


Thread: [PATCH 0/3 (RFC)](memory hotplug) freeing pages allocated by bootmem for hotremove

  1. [PATCH 0/3 (RFC)](memory hotplug) freeing pages allocated by bootmem for hotremove


    Hello.

    I would like to post a patch set that frees pages allocated by bootmem,
    for memory hot-remove.

    My basic idea is to use the unused members of struct page to remember
    which user of bootmem (section number or node id) each page belongs to.
    When a section is being removed, the kernel can check this information,
    and several issues can be solved with it.

    1) When the memmap of the section being removed was allocated by bootmem
    on another section, it should/can be freed.
    2) When the memmap of the section being removed was allocated on the
    same section, it shouldn't be freed, because the section must already be
    offlined and all of its pages isolated from the page allocator.
    The kernel keeps it as is.
    3) When the section being removed holds another section's memmap, the
    kernel will be able to easily show the user which section should be
    removed before it. (Not implemented yet.)
    4) In case 2) above, the page migrator will be able to recognize the
    memmap and skip it during page isolation at page offline.
    Current page migration fails in this case because the page is just a
    reserved page, and the migrator can't tell whether it may be removed
    or not. With this patch it will be able to.
    (Not implemented yet.)
    5) Node information such as pgdat has similar issues, which this
    approach will also be able to solve.
    (Not implemented yet, but the node id is already remembered in the pages.)

    Fortunately, the current bootmem allocator just keeps the PageReserved
    flag set and doesn't use any other members of struct page; the users of
    bootmem don't use them either.
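
    Concretely, patch 1/3 below records this in the otherwise unused
    _mapcount and private fields of the pages backing the bootmem
    allocations. A simplified sketch of its helper (see the patch for the
    exact code):

    static void set_page_bootmem_info(unsigned long info, struct page *page,
                                      unsigned long flag)
    {
            /* a magic value in _mapcount tells section info and node info apart */
            if (flag == SECTION_INFO)
                    atomic_set(&page->_mapcount, SECTION_MAGIC);
            else
                    atomic_set(&page->_mapcount, NODE_INFO_MAGIC);

            /* the section number (or node id) itself goes into page->private */
            SetPagePrivate(page);
            set_page_private(page, info);
    }

    At hot-remove time, patch 2/3 reads page->private back to decide whether
    a memmap/usemap page lives on the section being removed (keep it as is)
    or on another section (free it).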

    This patch set needs Badari-san's generic __remove_pages() support patch.
    http://linux.derkeiler.com/Mailing-L.../msg02881.html

    I don't think this patch set is perfect yet, because some of the
    section/node information is smaller than one page and the bootmem
    allocator may mix other data into the same page. This patch set is
    still a trial, but I think it is a good starting point for everyone to
    understand what is necessary.

    Please comment.

    Other Todo:
    - Support for SPARSEMEM_VMEMMAP.
    Freeing vmemmap pages is more difficult than for normal sparsemem,
    because not only the memmap pages but also pages such as page tables
    must be removed. If the section being removed contains such pages, they
    must be migrated too. A relocatable page table is necessary.

    - Compile with other configs.
    This version is just for requesting comments.
    If this approach is accepted, I'll check them.
    - Follow the bootmem fixes by Yinghai Lu-san.
    (This patch set is still against 2.6.25-rc3-mm1 plus Badari-san's patch.)

    Thanks.



    --
    Yasunori Goto



  2. [PATCH 2/3 (RFC)](memory hotplug) free pages allocated by bootmem for hotremove


    This patch frees the memmap and usemap by using the registered information.

    Signed-off-by: Yasunori Goto

    ---
    mm/internal.h | 3 +--
    mm/page_alloc.c | 2 +-
    mm/sparse.c | 47 +++++++++++++++++++++++++++++++++++++++++------
    3 files changed, 43 insertions(+), 9 deletions(-)

    Index: current/mm/sparse.c
    ===================================================================
    --- current.orig/mm/sparse.c 2008-03-10 22:24:46.000000000 +0900
    +++ current/mm/sparse.c 2008-03-10 22:31:03.000000000 +0900
    @@ -8,6 +8,7 @@
    #include
    #include
    #include
    +#include "internal.h"
    #include
    #include
    #include
    @@ -361,28 +362,62 @@
    free_pages((unsigned long)memmap,
    get_order(sizeof(struct page) * nr_pages));
    }
    +
    +static void free_maps_by_bootmem(struct page *map, unsigned long nr_pages)
    +{
    + unsigned long maps_section_nr, removing_section_nr, i;
    + struct page *page = map;
    +
    + for (i = 0; i < nr_pages; i++, page++) {
    + maps_section_nr = pfn_to_section_nr(page_to_pfn(page));
    + removing_section_nr = page->private;
    +
    + /*
    + * If the removing section's memmap is placed on another section,
    + * it must be freed.
    + * Otherwise, nothing is necessary: the memmap is already isolated
    + * from the page allocator, and it is not used any more.
    + */
    + if (maps_section_nr != removing_section_nr) {
    + clear_page_bootmem_info(page);
    + __free_pages_bootmem(page, 0);
    + }
    + }
    +}
    #endif /* CONFIG_SPARSEMEM_VMEMMAP */

    static void free_section_usemap(struct page *memmap, unsigned long *usemap)
    {
    + struct page *usemap_page;
    + unsigned long nr_pages;
    +
    if (!usemap)
    return;

    + usemap_page = virt_to_page(usemap);
    /*
    * Check to see if allocation came from hot-plug-add
    */
    - if (PageSlab(virt_to_page(usemap))) {
    + if (PageSlab(usemap_page)) {
    kfree(usemap);
    if (memmap)
    __kfree_section_memmap(memmap, PAGES_PER_SECTION);
    return;
    }

    - /*
    - * TODO: Allocations came from bootmem - how do I free up ?
    - */
    - printk(KERN_WARNING "Not freeing up allocations from bootmem "
    - "- leaking memory\n");
    + /* free maps came from bootmem */
    + nr_pages = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
    + free_maps_by_bootmem(usemap_page, nr_pages);
    +
    + if (memmap) {
    + struct page *memmap_page;
    + memmap_page = virt_to_page(memmap);
    +
    + nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
    + >> PAGE_SHIFT;
    +
    + free_maps_by_bootmem(memmap_page, nr_pages);
    + }
    }

    /*
    Index: current/mm/page_alloc.c
    ===================================================================
    --- current.orig/mm/page_alloc.c 2008-03-10 22:24:46.000000000 +0900
    +++ current/mm/page_alloc.c 2008-03-10 22:29:20.000000000 +0900
    @@ -564,7 +564,7 @@
    /*
    * permit the bootmem allocator to evade page validation on high-order frees
    */
    -void __init __free_pages_bootmem(struct page *page, unsigned int order)
    +void __free_pages_bootmem(struct page *page, unsigned int order)
    {
    if (order == 0) {
    __ClearPageReserved(page);
    Index: current/mm/internal.h
    ===================================================================
    --- current.orig/mm/internal.h 2008-03-10 22:24:46.000000000 +0900
    +++ current/mm/internal.h 2008-03-10 22:29:20.000000000 +0900
    @@ -34,8 +34,7 @@
    atomic_dec(&page->_count);
    }

    -extern void __init __free_pages_bootmem(struct page *page,
    - unsigned int order);
    +extern void __free_pages_bootmem(struct page *page, unsigned int order);

    /*
    * function for dealing with page's order in buddy system.

    --
    Yasunori Goto



  3. [PATCH 1/3 (RFC)](memory hotplug) remember section_nr and node id for removing


    This patch registers the information needed to be able to remove the
    section or node structures later.

    Signed-off-by: Yasunori Goto

    include/linux/memory_hotplug.h | 10 ++++
    include/linux/mmzone.h | 1
    mm/bootmem.c | 1
    mm/memory_hotplug.c | 97 ++++++++++++++++++++++++++++++++++++++++-
    mm/sparse.c | 3 -
    5 files changed, 109 insertions(+), 3 deletions(-)

    Index: current/mm/bootmem.c
    ===================================================================
    --- current.orig/mm/bootmem.c 2008-03-10 16:42:54.000000000 +0900
    +++ current/mm/bootmem.c 2008-03-10 22:24:46.000000000 +0900
    @@ -401,6 +401,7 @@

    unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
    {
    + register_page_bootmem_info_node(pgdat);
    return free_all_bootmem_core(pgdat);
    }

    Index: current/include/linux/memory_hotplug.h
    ===================================================================
    --- current.orig/include/linux/memory_hotplug.h 2008-03-10 16:42:54.000000000 +0900
    +++ current/include/linux/memory_hotplug.h 2008-03-10 16:42:57.000000000 +0900
    @@ -11,6 +11,11 @@
    struct mem_section;

    #ifdef CONFIG_MEMORY_HOTPLUG
    +
    +#define SECTION_MAGIC 0xfffffffe
    +#define NODE_INFO_MAGIC 0xfffffffd
    +#define SECTION_INFO 0
    +#define NODE_INFO 1
    /*
    * pgdat resizing functions
    */
    @@ -145,6 +150,9 @@
    #endif /* CONFIG_NUMA */
    #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */

    +extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
    +extern void clear_page_bootmem_info(struct page *page);
    +
    #else /* ! CONFIG_MEMORY_HOTPLUG */
    /*
    * Stub functions for when hotplug is off
    @@ -192,5 +200,7 @@
    extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
    int nr_pages);
    extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
    +extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
    + unsigned long pnum);

    #endif /* __LINUX_MEMORY_HOTPLUG_H */
    Index: current/include/linux/mmzone.h
    ===================================================================
    --- current.orig/include/linux/mmzone.h 2008-03-10 16:42:54.000000000 +0900
    +++ current/include/linux/mmzone.h 2008-03-10 16:42:57.000000000 +0900
    @@ -938,6 +938,7 @@
    return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
    }
    extern int __section_nr(struct mem_section* ms);
    +extern unsigned long usemap_size(void);

    /*
    * We use the lower bits of the mem_map pointer to store
    Index: current/mm/memory_hotplug.c
    ===================================================================
    --- current.orig/mm/memory_hotplug.c 2008-03-10 16:42:54.000000000 +0900
    +++ current/mm/memory_hotplug.c 2008-03-10 22:22:25.000000000 +0900
    @@ -59,8 +59,103 @@
    return;
    }

    -
    #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
    +static void set_page_bootmem_info(unsigned long info, struct page *page,
    + unsigned long flag)
    +{
    +
    + if (flag == SECTION_INFO)
    + atomic_set(&page->_mapcount, SECTION_MAGIC);
    + else
    + atomic_set(&page->_mapcount, NODE_INFO_MAGIC);
    +
    + SetPagePrivate(page);
    + set_page_private(page, info);
    +
    +}
    +
    +void clear_page_bootmem_info(struct page *page)
    +{
    + int magic;
    +
    + magic = atomic_read(&page->_mapcount);
    + if (magic != SECTION_MAGIC && magic != NODE_INFO_MAGIC)
    + BUG();
    +
    + ClearPagePrivate(page);
    + set_page_private(page, 0);
    + reset_page_mapcount(page);
    +}
    +
    +void register_page_bootmem_info_section(unsigned long start_pfn)
    +{
    + unsigned long *usemap, mapsize, section_nr, i;
    + struct page *page, *memmap;
    +
    + if (!pfn_valid(start_pfn))
    + return;
    +
    + section_nr = pfn_to_section_nr(start_pfn);
    +
    + memmap = pfn_to_page(start_pfn); /* memmap for the section */
    +
    + /*
    + * Get page for the memmap's phys address
    + * XXX: need more consideration for sparse_vmemmap...
    + */
    + page = virt_to_page(memmap);
    + mapsize = sizeof(struct page) * PAGES_PER_SECTION;
    + mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
    +
    + /* remember memmap's page */
    + for (i = 0; i < mapsize; i++, page++)
    + set_page_bootmem_info(section_nr, page, SECTION_INFO);
    +
    + usemap = __nr_to_section(section_nr)->pageblock_flags;
    + page = virt_to_page(usemap);
    +
    + mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
    +
    + for (i = 0; i < mapsize; i++, page++)
    + set_page_bootmem_info(section_nr, page, SECTION_INFO);
    +
    +}
    +
    +void register_page_bootmem_info_node(struct pglist_data *pgdat)
    +{
    + unsigned long i, pfn, end_pfn, nr_pages;
    + int node = pgdat->node_id;
    + struct page *page;
    + struct zone *zone;
    +
    + nr_pages = PAGE_ALIGN(sizeof(struct pglist_data)) >> PAGE_SHIFT;
    + page = virt_to_page(pgdat);
    +
    + for (i = 0; i < nr_pages; i++, page++)
    + set_page_bootmem_info(node, page, NODE_INFO);
    +
    + zone = &pgdat->node_zones[0];
    + for (; zone < pgdat->node_zones + MAX_NR_ZONES - 1; zone++) {
    + if (zone->wait_table) {
    + nr_pages = zone->wait_table_hash_nr_entries
    + * sizeof(wait_queue_head_t);
    + nr_pages = PAGE_ALIGN(nr_pages) >> PAGE_SHIFT;
    + page = virt_to_page(zone->wait_table);
    +
    + for (i = 0; i < nr_pages; i++, page++)
    + set_page_bootmem_info(node, page, NODE_INFO);
    + }
    + }
    +
    + pfn = pgdat->node_start_pfn;
    + end_pfn = pfn + pgdat->node_spanned_pages;
    +
    + /* register_section info */
    + for (; pfn < end_pfn; pfn += PAGES_PER_SECTION)
    + register_page_bootmem_info_section(pfn);
    +
    +}
    +
    static int __add_zone(struct zone *zone, unsigned long phys_start_pfn)
    {
    struct pglist_data *pgdat = zone->zone_pgdat;
    Index: current/mm/sparse.c
    ===================================================================
    --- current.orig/mm/sparse.c 2008-03-10 16:42:54.000000000 +0900
    +++ current/mm/sparse.c 2008-03-10 22:24:46.000000000 +0900
    @@ -200,7 +200,6 @@
    /*
    * Decode mem_map from the coded memmap
    */
    -static
    struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
    {
    /* mask off the extra low bits of information */
    @@ -223,7 +222,7 @@
    return 1;
    }

    -static unsigned long usemap_size(void)
    +unsigned long usemap_size(void)
    {
    unsigned long size_bytes;
    size_bytes = roundup(SECTION_BLOCKFLAGS_BITS, 8) / 8;

    --
    Yasunori Goto



  4. [PATCH 3/3 (RFC)](memory hotplug) align maps for easy removing


    To make the memmap and usemap easier to free, this patch aligns these
    maps to the page size.

    I know the usemap is far smaller than a page, so aligning it to the page
    size wastes space; there may be a better way than this.

    Below are the pros and cons of my other ideas, but I'm not sure which
    way is better.

    a) Pack many sections' usemaps onto one page, and count how many
    sections use it in page_count.
    Pros.
    - Avoids wasting space.
    Cons.
    - This usemap page will be hard (or impossible) to remove due to the
    dependency.
    It should be allocated on an unmovable zone/node.
    (I'm not sure about the performance impact.)
    - The node structures may have to be packed like the usemap???

    b) Pack the memmap and usemap into one allocation.
    Pros.
    - May avoid wasting space if the sizes fit well.
    Cons.
    - If the sizes don't fit, it is the same as this patch.
    - This way is not good for SPARSEMEM_VMEMMAP.
    At the least, it goes in the opposite direction from Yinghai-san's fix.

    c) This way (the patch below).
    Pros.
    - Very easy to remove.
    Cons.
    - Wastes space.

    Any other ideas are welcome.
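
    To put a rough number on the wasted space (a back-of-the-envelope
    estimate; the exact usemap size depends on the section size and
    pageblock order of the config, but it is only a few dozen bytes):

        per section, before:  usemap_size()               ~ a few dozen bytes
        per section, after:   PAGE_ALIGN(usemap_size())   = 4096 bytes (4KB pages)

    So option c) spends roughly one nearly-empty page per section just for
    the usemap. The memmap side changes little in comparison, because
    sizeof(struct page) * PAGES_PER_SECTION is already a page multiple
    (or close to it) on common configs.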


    Signed-off-by: Yasunori Goto

    ---
    mm/sparse.c | 7 ++++---
    1 file changed, 4 insertions(+), 3 deletions(-)

    Index: current/mm/sparse.c
    ===================================================================
    --- current.orig/mm/sparse.c 2008-03-11 20:15:41.000000000 +0900
    +++ current/mm/sparse.c 2008-03-11 20:58:18.000000000 +0900
    @@ -244,7 +244,8 @@
    struct mem_section *ms = __nr_to_section(pnum);
    int nid = sparse_early_nid(ms);

    - usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
    + usemap = alloc_bootmem_pages_node(NODE_DATA(nid),
    + PAGE_ALIGN(usemap_size()));
    if (usemap)
    return usemap;

    @@ -264,8 +265,8 @@
    if (map)
    return map;

    - map = alloc_bootmem_node(NODE_DATA(nid),
    - sizeof(struct page) * PAGES_PER_SECTION);
    + map = alloc_bootmem_pages_node(NODE_DATA(nid),
    + PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION));
    return map;
    }
    #endif /* !CONFIG_SPARSEMEM_VMEMMAP */

    --
    Yasunori Goto



  5. Re: [PATCH 3/3 (RFC)](memory hotplug) align maps for easy removing

    On Fri, Mar 14, 2008 at 7:44 AM, Yasunori Goto wrote:
    > Index: current/mm/sparse.c
    > ===================================================================
    > --- current.orig/mm/sparse.c 2008-03-11 20:15:41.000000000 +0900
    > +++ current/mm/sparse.c 2008-03-11 20:58:18.000000000 +0900
    > @@ -244,7 +244,8 @@
    > struct mem_section *ms = __nr_to_section(pnum);
    > int nid = sparse_early_nid(ms);
    >
    > - usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
    > + usemap = alloc_bootmem_pages_node(NODE_DATA(nid),
    > + PAGE_ALIGN(usemap_size()));


    If we allocate usemaps contiguously, the old way could make different
    usemaps share one page. The usemap size is only about 24 bytes, aligned
    to 128 bytes (the SMP cache line size):

    sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
    sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
    sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
    sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
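
    For reference, a rough derivation of where that size comes from
    (configuration dependent; the numbers below assume x86_64 defaults of
    128MB sections, 4KB pages and pageblock_order == 9):

        SECTION_BLOCKFLAGS_BITS = (PAGES_PER_SECTION >> pageblock_order)
                                      * NR_PAGEBLOCK_BITS
                                = (32768 >> 9) * 3 = 192 bits
        usemap_size()           = roundup(192, 8) / 8 = 24 bytes

    so one 4KB page can hold the usemaps of well over a hundred sections.
    The 128-byte spacing in the log above is alloc_bootmem_node()'s default
    SMP_CACHE_BYTES alignment.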


    YH

  6. Re: [PATCH 3/3 (RFC)](memory hotplug) align maps for easy removing

    > If we allocate usemaps contiguously, the old way could make different
    > usemaps share one page. The usemap size is only about 24 bytes, aligned
    > to 128 bytes (the SMP cache line size):
    >
    > sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
    > sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
    > sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
    > sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24



    Yes, they can share one page.

    Until yesterday I was afraid that such a page would be hard to remove.
    If all sections' usemaps are allocated on section A, the other sections
    (B through Z) must be removed before section A, and if even one of them
    is busy, section A can't be removed. So I disliked that dependency.

    But I reconsidered it after reading your mail. Node structures like
    pgdat have the same property: if a section holds the pgdat for a node,
    it must wait for the other sections on that node to be removed first.
    So I'll try to keep the pgdat and the shared usemap page on the same
    section.

    Anyway, thanks for your comments.

    Bye.

    --
    Yasunori Goto



  7. Re: [PATCH 0/3 (RFC)](memory hotplug) freeing pages allocated by bootmem for hotremove

    On Fri, 2008-03-14 at 23:36 +0900, Yasunori Goto wrote:
    > Hello.
    >
    > I would like to post a patch set that frees pages allocated by bootmem,
    > for memory hot-remove.
    > [...]

    Do you have any updates to this? I am getting the following boot panic
    while testing it. Before I debug it, I want to make sure it's not
    already fixed. Please let me know.

    Thanks,
    Badari

    Linux version 2.6.25-rc5-mm1 (root@elm3b155) (gcc version 3.3.3 (SuSE Linux)) #2 SMP Fri Mar 21 07:48:29 PST 2008
    [boot]0012 Setup Arch
    NUMA associativity depth for CPU/Memory: 3
    adding cpu 0 to node 0
    node 0
    NODE_DATA() = c000000071fea100
    start_paddr = 0
    end_paddr = 72000000
    bootmap_paddr = 71fdb000
    reserve_bootmem 0 7cc000
    reserve_bootmem 23d0000 10000
    reserve_bootmem 77b6000 84a000
    reserve_bootmem 71fdb000 f000
    reserve_bootmem 71fea100 1e00
    reserve_bootmem 71febf68 14098
    PCI host bridge /pci@800000020000002 ranges:
    IO 0x000003fe00200000..0x000003fe002fffff -> 0x0000000000000000
    MEM 0x0000040080000000..0x00000400bfffffff -> 0x00000000c0000000
    PCI host bridge /pci@800000020000003 ranges:
    IO 0x000003fe00700000..0x000003fe007fffff -> 0x0000000000000000
    MEM 0x00000401c0000000..0x00000401ffffffff -> 0x00000000c0000000
    EEH: PCI Enhanced I/O Error Handling Enabled
    PPC64 nvram contains 7168 bytes
    Zone PFN ranges:
    DMA 0 -> 466944
    Normal 466944 -> 466944
    Movable zone start PFN for each node
    Node 0: 262144
    early_node_map[1] active PFN ranges
    0: 0 -> 466944
    [boot]0015 Setup Done
    Built 1 zonelists in Node order, mobility grouping on. Total pages: 451440
    Policy zone: DMA
    Kernel command line: root=/dev/sda3 selinux=0 elevator=cfq numa=debug kernelcore=1024M
    [boot]0020 XICS Init
    [boot]0021 XICS Done
    PID hash table entries: 4096 (order: 12, 32768 bytes)
    clocksource: timebase mult[1352e86] shift[22] registered
    Console: colour dummy device 80x25
    console handover: boot [udbg-1] -> real [hvc0]
    Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
    Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
    freeing bootmem node 0
    Unable to handle kernel paging request for data at address 0xcf7f80000000000c
    Faulting instruction address: 0xc0000000000ce3e8
    Oops: Kernel access of bad area, sig: 11 [#1]
    SMP NR_CPUS=32 NUMA pSeries
    Modules linked in:
    NIP: c0000000000ce3e8 LR: c0000000000cf714 CTR: 800000000013f270
    REGS: c0000000007639f0 TRAP: 0300 Not tainted (2.6.25-rc5-mm1)
    MSR: 8000000000009032 CR: 44002022 XER: 00000008
    DAR: cf7f80000000000c, DSISR: 0000000042010000
    TASK = c000000000689910[0] 'swapper' THREAD: c000000000760000 CPU: 0
    GPR00: fffffffffffffffd c000000000763c70 c000000000761be0 0000000000000000
    GPR04: cf7f800000000000 0000000000000000 0000000000000000 0000000000000001
    GPR08: 0000000000000000 fffffffffffffffe 0000000000000088 cf00000000000000
    GPR12: 0000000000004000 c00000000068a380 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 4000000001c00000 0000000000000000 0000000002241ed8 0000000000000000
    GPR24: 0000000002242148 0000000000000000 c000000071feb000 0000000000000000
    GPR28: c000000071feb000 0000000000000001 c0000000006e2bd8 cf7f800000000000
    NIP [c0000000000ce3e8] .set_page_bootmem_info+0x10/0x38
    LR [c0000000000cf714] .register_page_bootmem_info_section+0xc4/0x17c
    Call Trace:
    [c000000000763c70] [000000000000001a] 0x1a (unreliable)
    [c000000000763d10] [c0000000000cf8f0] .register_page_bootmem_info_node+0x124/0x158
    [c000000000763dc0] [c0000000006290e4] .free_all_bootmem_node+0x1c/0x3c
    [c000000000763e50] [c00000000061d618] .mem_init+0xbc/0x260
    [c000000000763ee0] [c00000000060bbcc] .start_kernel+0x2f4/0x3f4
    [c000000000763f90] [c000000000008594] .start_here_common+0x54/0xc0
    Instruction dump:
    eb61ffd8 eb81ffe0 eba1ffe8 7c0803a6 ebc1fff0 ebe1fff8 7d808120 4e800020
    2fa50000 3920fffe 3800fffd 409e000c <9124000c> 48000008 9004000c 38000800
    ---[ end trace 31fd0ba7d8756001 ]---
    Kernel panic - not syncing: Attempted to kill the idle task!



  8. Re: [PATCH 0/3 (RFC)](memory hotplug) freeing pages allocated by bootmem for hotremove

    >
    > Do you have any updates to this? I am getting the following boot panic
    > while testing it. Before I debug it, I want to make sure it's not
    > already fixed. Please let me know.


    Hmmmm. No, I don't. Could you debug it?
    This may come from the powerpc environment.

    Thanks.



    --
    Yasunori Goto



  9. Re: [PATCH 0/3 (RFC)](memory hotplug) freeing pages allocated by bootmem for hotremove

    On Sat, 2008-03-22 at 09:09 +0900, Yasunori Goto wrote:
    > >
    > > Do you have any updates to this? I am getting the following boot panic
    > > while testing it. Before I debug it, I want to make sure it's not
    > > already fixed. Please let me know.

    >
    > Hmmmm. No, I don't. Could you debug it?
    > This may come from the powerpc environment.
    >
    > Thanks.
    >


    Okay, it's an issue with CONFIG_SPARSEMEM_VMEMMAP=y.

    I disabled it for now.

    Thanks,
    Badari

