[PATCH] mm: allocate usemap at first instead of mem_map in sparse_init - Kernel



Thread: [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

  1. [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

    [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

    on powerpc,

    On Wed, Apr 2, 2008 at 12:22 PM, Badari Pulavarty wrote:
    >
    > On Wed, 2008-04-02 at 18:17 +1100, Michael Ellerman wrote:
    > > On Wed, 2008-04-02 at 12:38 +0530, Kamalesh Babulal wrote:
    > > > Andrew Morton wrote:
    > > > > On Wed, 02 Apr 2008 11:55:36 +0530 Kamalesh Babulal wrote:
    > > > >
    > > > >> Hi Andrew,
    > > > >>
    > > > >> The 2.6.25-rc8-mm1 kernel panics during bootup on the power machine(s).
    > > > >>
    > > > >> [ 0.000000] ------------[ cut here ]------------
    > > > >> [ 0.000000] kernel BUG at arch/powerpc/mm/init_64.c:240!
    > > > >> [ 0.000000] Oops: Exception in kernel mode, sig: 5 [#1]
    > > > >> [ 0.000000] SMP NR_CPUS=32 NUMA PowerMac
    > > > >> [ 0.000000] Modules linked in:
    > > > >> [ 0.000000] NIP: c0000000003d1dcc LR: c0000000003d1dc4 CTR: c00000000002b6ac
    > > > >> [ 0.000000] REGS: c00000000049b960 TRAP: 0700 Not tainted (2.6.25-rc8-mm1-autokern1)
    > > > >> [ 0.000000] MSR: 9000000000021032 CR: 44000088 XER: 20000000
    > > > >> [ 0.000000] TASK = c0000000003f9c90[0] 'swapper' THREAD: c000000000498000 CPU: 0
    > > > >> [ 0.000000] GPR00: c0000000003d1dc4 c00000000049bbe0 c0000000004989d0 0000000000000001
    > > > >> [ 0.000000] GPR04: d59aca40f0000000 000000000b000000 0000000000000010 0000000000000000
    > > > >> [ 0.000000] GPR08: 0000000000000004 0000000000000001 c00000027e520800 c0000000004bf0f0
    > > > >> [ 0.000000] GPR12: c0000000004bf020 c0000000003fa900 0000000000000000 0000000000000000
    > > > >> [ 0.000000] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    > > > >> [ 0.000000] GPR20: 0000000000000000 0000000000000000 0000000000000000 4000000001400000
    > > > >> [ 0.000000] GPR24: 00000000017d64b0 c0000000003d6250 0000000000000000 c000000000504000
    > > > >> [ 0.000000] GPR28: 0000000000000000 cf000000001f8000 0000000001000000 cf00000000000000
    > > > >> [ 0.000000] NIP [c0000000003d1dcc] .vmemmap_populate+0xb8/0xf4
    > > > >> [ 0.000000] LR [c0000000003d1dc4] .vmemmap_populate+0xb0/0xf4
    > > > >> [ 0.000000] Call Trace:
    > > > >> [ 0.000000] [c00000000049bbe0] [c0000000003d1dc4] .vmemmap_populate+0xb0/0xf4 (unreliable)
    > > > >> [ 0.000000] [c00000000049bc70] [c0000000003d2ee8] .sparse_mem_map_populate+0x38/0x60
    > > > >> [ 0.000000] [c00000000049bd00] [c0000000003c242c] .sparse_early_mem_map_alloc+0x54/0x94
    > > > >> [ 0.000000] [c00000000049bd90] [c0000000003c250c] .sparse_init+0xa0/0x20c
    > > > >> [ 0.000000] [c00000000049be50] [c0000000003ab7d0] .setup_arch+0x1ac/0x218
    > > > >> [ 0.000000] [c00000000049bee0] [c0000000003a36ac] .start_kernel+0xe0/0x3fc
    > > > >> [ 0.000000] [c00000000049bf90] [c000000000008594] .start_here_common+0x54/0xc0
    > > > >> [ 0.000000] Instruction dump:
    > > > >> [ 0.000000] 7fe3fb78 7ca02a14 4082000c 3860fff4 4800003c e92289c8 e96289c0 e9090002
    > > > >> [ 0.000000] e8eb0002 4bc575cd 60000000 78630fe0 <0b030000> 7ffff214 7fbfe840 7fe3fb78
    > > > >> [ 0.000000] ---[ end trace 31fd0ba7d8756001 ]---
    > > > >> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!

    >
    > mm-make-mem_map-allocation-continuous.patch
    > and its friends in -mm.
    >
    > You have to call sparse_init_one_section() on each pmap and usemap
    > as we allocate - since valid_section() depends on it (which is needed
    > by vmemmap_populate() to check if the section is populated or not).
    > On ppc, we need to call htab_bolted_mapping() on each section and
    > we need to skip existing sections.
    >
    > These patches tried to group all allocations together and then later
    > calls sparse_init_one_section() - which is not good


    So try allocating all the usemaps first, in one pass.
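The ordering problem described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: it assumes, hypothetically, that four sections share one large vmemmap mapping chunk, and every `model_*` name is invented for illustration. The populate step skips a chunk only if some section in it is already marked valid, so deferring the "init one section" step makes the same chunk get mapped repeatedly.

```c
#include <stdbool.h>
#include <string.h>

#define NR_SECTIONS        8
#define SECTIONS_PER_CHUNK 4	/* assumption: 4 sections share one mapping chunk */
#define NR_CHUNKS          (NR_SECTIONS / SECTIONS_PER_CHUNK)

static bool section_valid[NR_SECTIONS];	/* set by model_init_one_section() */
static int chunk_mapped[NR_CHUNKS];	/* how often each chunk was mapped */

/* Stand-in for valid_section(): true once the section has been initialised. */
static bool model_valid_section(int pnum)
{
	return section_valid[pnum];
}

/* Stand-in for vmemmap_populate(): map the chunk unless a valid section covers it. */
static void model_populate(int pnum)
{
	int first = (pnum / SECTIONS_PER_CHUNK) * SECTIONS_PER_CHUNK;

	for (int i = first; i < first + SECTIONS_PER_CHUNK; i++)
		if (model_valid_section(i))
			return;		/* chunk already mapped, skip it */
	chunk_mapped[pnum / SECTIONS_PER_CHUNK]++;
}

/* Stand-in for sparse_init_one_section(): mark the section valid. */
static void model_init_one_section(int pnum)
{
	section_valid[pnum] = true;
}

/* Returns the largest number of times any single chunk was mapped. */
int worst_mapping_count(bool init_immediately)
{
	int worst = 0;

	memset(section_valid, 0, sizeof(section_valid));
	memset(chunk_mapped, 0, sizeof(chunk_mapped));

	for (int pnum = 0; pnum < NR_SECTIONS; pnum++) {
		model_populate(pnum);
		if (init_immediately)
			model_init_one_section(pnum);
	}
	if (!init_immediately)
		for (int pnum = 0; pnum < NR_SECTIONS; pnum++)
			model_init_one_section(pnum);

	for (int c = 0; c < NR_CHUNKS; c++)
		if (chunk_mapped[c] > worst)
			worst = chunk_mapped[c];
	return worst;
}
```

In this model, calling `worst_mapping_count(false)` (all allocations first, initialisation later) maps each chunk once per section, while `worst_mapping_count(true)` maps each chunk exactly once. Whether the real powerpc failure follows this exact path is not confirmed by the thread; the model only mirrors the dependency Badari describes.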

    Signed-off-by: Yinghai Lu

    diff --git a/mm/sparse.c b/mm/sparse.c
    index d3cb085..782ebe5 100644
    --- a/mm/sparse.c
    +++ b/mm/sparse.c
    @@ -294,7 +294,7 @@ void __init sparse_init(void)
     	unsigned long pnum;
     	struct page *map;
     	unsigned long *usemap;
    -	struct page **section_map;
    +	unsigned long **usemap_map;
     	int size;
     	int node;

    @@ -305,27 +305,31 @@ void __init sparse_init(void)
     	 * make next 2M slip to one more 2M later.
     	 * then in big system, the memmory will have a lot hole...
     	 * here try to allocate 2M pages continously.
    +	 *
    +	 * powerpc hope to sparse_init_one_section right after each
    +	 * sparse_early_mem_map_alloc, so allocate usemap_map
    +	 * at first.
     	 */
    -	size = sizeof(struct page *) * NR_MEM_SECTIONS;
    -	section_map = alloc_bootmem(size);
    -	if (!section_map)
    -		panic("can not allocate section_map\n");
    +	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
    +	usemap_map = alloc_bootmem(size);
    +	if (!usemap_map)
    +		panic("can not allocate usemap_map\n");

     	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
     		if (!present_section_nr(pnum))
     			continue;
    -		section_map[pnum] = sparse_early_mem_map_alloc(pnum);
    +		usemap_map[pnum] = sparse_early_usemap_alloc(pnum);
     	}

     	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
     		if (!present_section_nr(pnum))
     			continue;

    -		map = section_map[pnum];
    +		map = sparse_early_mem_map_alloc(pnum);
     		if (!map)
     			continue;

    -		usemap = sparse_early_usemap_alloc(pnum);
    +		usemap = usemap_map[pnum];
     		if (!usemap)
     			continue;

    @@ -333,7 +337,7 @@ void __init sparse_init(void)
     								usemap);
     	}

    -	free_bootmem(__pa(section_map), size);
    +	free_bootmem(__pa(usemap_map), size);
     }

     #ifdef CONFIG_MEMORY_HOTPLUG
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

    On Wed, 2 Apr 2008 15:25:48 -0700 Yinghai Lu wrote:

    > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    >
    > on powerpc,
    >
    > On Wed, Apr 2, 2008 at 12:22 PM, Badari Pulavarty wrote:
    > >
    > > On Wed, 2008-04-02 at 18:17 +1100, Michael Ellerman wrote:
    > > > On Wed, 2008-04-02 at 12:38 +0530, Kamalesh Babulal wrote:
    > > > > Andrew Morton wrote:
    > > > > > On Wed, 02 Apr 2008 11:55:36 +0530 Kamalesh Babulal wrote:
    > > > > >
    > > > > >> Hi Andrew,
    > > > > >>
    > > > > >> The 2.6.25-rc8-mm1 kernel panics during bootup on the power machine(s).

    > >
    > > mm-make-mem_map-allocation-continuous.patch
    > > and its friends in -mm.
    > >
    > > You have to call sparse_init_one_section() on each pmap and usemap
    > > as we allocate - since valid_section() depends on it (which is needed
    > > by vmemmap_populate() to check if the section is populated or not).
    > > On ppc, we need to call htab_bolted_mapping() on each section and
    > > we need to skip existing sections.
    > >
    > > These patches tried to group all allocations together and then later
    > > calls sparse_init_one_section() - which is not good

    >
    > so try to allocate usemap at first altogether.


    I have to turn all the above crud into a proper changelog. I'd prefer that
    you do it.

    Unless this patch should be folded into another one, in which case it
    doesn't matter.

    > Signed-off-by: Yinghai Lu
    >
    > diff --git a/mm/sparse.c b/mm/sparse.c
    > index d3cb085..782ebe5 100644
    > --- a/mm/sparse.c
    > +++ b/mm/sparse.c


    We shouldn't merge this patch on its own because then that will leave a
    non-bisectable region in the powerpc history.

    So which patch is this patch fixing? Lexically it applies to
    mm-allocate-section_map-for-sparse_init.patch (and its updates). But is
    that where it logically lies?


  3. Re: [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

    On Wed, 2008-04-02 at 15:25 -0700, Yinghai Lu wrote:
    > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    > so try to allocate usemap at first altogether.
    >
    > Signed-off-by: Yinghai Lu
    >
    > diff --git a/mm/sparse.c b/mm/sparse.c
    > index d3cb085..782ebe5 100644
    > --- a/mm/sparse.c
    > +++ b/mm/sparse.c
    > @@ -294,7 +294,7 @@ void __init sparse_init(void)
    >  	unsigned long pnum;
    >  	struct page *map;
    >  	unsigned long *usemap;
    > -	struct page **section_map;
    > +	unsigned long **usemap_map;
    >  	int size;
    >  	int node;
    >
    > @@ -305,27 +305,31 @@ void __init sparse_init(void)
    >  	 * make next 2M slip to one more 2M later.
    >  	 * then in big system, the memmory will have a lot hole...
    >  	 * here try to allocate 2M pages continously.


    The comments are x86-64 specific. On ppc it's 16MB chunks.

    > +	 *
    > +	 * powerpc hope to sparse_init_one_section right after each
    > +	 * sparse_early_mem_map_alloc, so allocate usemap_map
    > +	 * at first.
    >  	 */
    > -	size = sizeof(struct page *) * NR_MEM_SECTIONS;
    > -	section_map = alloc_bootmem(size);
    > -	if (!section_map)
    > -		panic("can not allocate section_map\n");
    > +	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
    > +	usemap_map = alloc_bootmem(size);
    > +	if (!usemap_map)
    > +		panic("can not allocate usemap_map\n");
    >
    >  	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
    >  		if (!present_section_nr(pnum))
    >  			continue;
    > -		section_map[pnum] = sparse_early_mem_map_alloc(pnum);
    > +		usemap_map[pnum] = sparse_early_usemap_alloc(pnum);
    >  	}
    >
    >  	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
    >  		if (!present_section_nr(pnum))
    >  			continue;
    >
    > -		map = section_map[pnum];
    > +		map = sparse_early_mem_map_alloc(pnum);
    >  		if (!map)
    >  			continue;
    >
    > -		usemap = sparse_early_usemap_alloc(pnum);
    > +		usemap = usemap_map[pnum];
    >  		if (!usemap)
    >  			continue;


    You may want to move this check before calling
    sparse_early_mem_map_alloc(). We are also not handling errors properly
    (freeing the unused map or usemap) if we "continue". I know the original
    code is this way, but you touched it last.
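One way the suggestion above could look is sketched below as a userspace model, not as kernel code. It assumes `malloc()`/`free()` as stand-ins for the bootmem allocator, and the names `model_sparse_init`, `model_section` and the `fail_map_for` parameter are hypothetical, introduced only to exercise the failure path. The usemap is checked before the more expensive mem_map allocation, and a usemap that ends up unused is released instead of being leaked.

```c
#include <stdlib.h>

#define NR_SECTIONS 4

/* Hypothetical per-section record; stands in for struct mem_section. */
struct model_section {
	void *map;
	void *usemap;
};

static struct model_section sections[NR_SECTIONS];

/*
 * Model of the two-pass loop with the check moved up and cleanup added.
 * fail_map_for selects one section whose mem_map allocation fails
 * (use -1 for no failure). Returns the number of initialised sections.
 */
int model_sparse_init(int fail_map_for)
{
	void *usemap_map[NR_SECTIONS];
	int initialised = 0;

	/* Pass 1: allocate every usemap up front. */
	for (int pnum = 0; pnum < NR_SECTIONS; pnum++)
		usemap_map[pnum] = malloc(24);

	/* Pass 2: allocate each mem_map and initialise the section at once. */
	for (int pnum = 0; pnum < NR_SECTIONS; pnum++) {
		void *usemap = usemap_map[pnum];
		void *map;

		if (!usemap)
			continue;	/* checked before the mem_map allocation */

		map = (pnum == fail_map_for) ? NULL : malloc(64);
		if (!map) {
			free(usemap);	/* do not leak the unused usemap */
			continue;
		}

		sections[pnum].map = map;
		sections[pnum].usemap = usemap;
		initialised++;
	}
	return initialised;
}
```

With `fail_map_for` set to 2, three of the four sections are initialised and section 2 keeps both pointers NULL. In the kernel, the equivalent of `free()` here would have to go through the boot-time allocator, which this sketch does not attempt to model.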

    >
    > @@ -333,7 +337,7 @@ void __init sparse_init(void)
    >  								usemap);
    >  	}
    >
    > -	free_bootmem(__pa(section_map), size);
    > +	free_bootmem(__pa(usemap_map), size);
    >  }
    >
    >  #ifdef CONFIG_MEMORY_HOTPLUG


    Tested and boots my machine fine.

    Acked-by: Badari Pulavarty

    Thanks,
    Badari


  4. Re: [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

    On Wed, Apr 2, 2008 at 3:52 PM, Andrew Morton wrote:
    >
    > On Wed, 2 Apr 2008 15:25:48 -0700 Yinghai Lu wrote:
    >
    > > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    > >
    > > on powerpc,
    > >
    > > On Wed, Apr 2, 2008 at 12:22 PM, Badari Pulavarty wrote:
    > > >
    > > > On Wed, 2008-04-02 at 18:17 +1100, Michael Ellerman wrote:
    > > > > On Wed, 2008-04-02 at 12:38 +0530, Kamalesh Babulal wrote:
    > > > > > Andrew Morton wrote:
    > > > > > > On Wed, 02 Apr 2008 11:55:36 +0530 Kamalesh Babulal wrote:
    > > > > > >
    > > > > > >> Hi Andrew,
    > > > > > >>
    > > > > > >> The 2.6.25-rc8-mm1 kernel panics during bootup on the power machine(s).
    > > >
    > > > mm-make-mem_map-allocation-continuous.patch
    > > > and its friends in -mm.
    > > >
    > > > You have to call sparse_init_one_section() on each pmap and usemap
    > > > as we allocate - since valid_section() depends on it (which is needed
    > > > by vmemmap_populate() to check if the section is populated or not).
    > > > On ppc, we need to call htab_bolted_mapping() on each section and
    > > > we need to skip existing sections.
    > > >
    > > > These patches tried to group all allocations together and then later
    > > > calls sparse_init_one_section() - which is not good

    > >
    > > so try to allocate usemap at first altogether.

    >
    > I have to turn all the above crud into a proper changelog. I'd prefer that
    > you do it.
    >
    > Unless this patch should be folded into another one, in which case it
    > doesn't matter.
    >
    >
    > > Signed-off-by: Yinghai Lu
    > >
    > > diff --git a/mm/sparse.c b/mm/sparse.c
    > > index d3cb085..782ebe5 100644
    > > --- a/mm/sparse.c
    > > +++ b/mm/sparse.c

    >
    > We shouldn't merge this patch on its own because then that will leave a
    > non-bisectable region in the powerpc history.
    >
    > So which patch is this patch fixing? Lexically it applies to
    > mm-allocate-section_map-for-sparse_init.patch (and its updates). But is
    > that where it logically lies?


    Yes. We should fold

    mm-make-mem_map-allocation-continuous.patch
    mm-allocate-section_map-for-sparse_init.patch
    and this one

    into one big patch (not really that big).

    YH

  5. [PATCH] mm: make mem_map allocation continuous v2.


    vmemmap allocation currently gives:
    [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
    [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
    [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
    [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
    [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
    ....

    There is a 2M hole between each of them.

    The root cause is that a usemap (24 bytes) is allocated after every 2M
    mem_map, which pushes the next vmemmap chunk (2M) out to the next
    2M-aligned address.
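The effect can be reproduced with a toy bump allocator. This is a sketch under stated assumptions, not the real bootmem code: `bump_alloc` is a hypothetical allocator that only rounds a cursor up to the requested alignment, and the 128-byte alignment for small allocations is an assumption chosen to match the 0x80 spacing of the usemap addresses in the log output of this changelog.

```c
#include <stdint.h>

#define MAP_SIZE     (UINT64_C(2) << 20)	/* 2M mem_map chunk, 2M aligned */
#define USEMAP_SIZE  UINT64_C(24)		/* usemap size from the changelog */
#define USEMAP_ALIGN UINT64_C(128)		/* assumption: small-object alignment */

/* Hypothetical bump allocator: round the cursor up, then advance it. */
static uint64_t bump_alloc(uint64_t *cursor, uint64_t size, uint64_t align)
{
	uint64_t addr = (*cursor + align - 1) & ~(align - 1);

	*cursor = addr + size;
	return addr;
}

/* Address span covered by nr mem_maps when map and usemap alternate. */
uint64_t span_interleaved(int nr)
{
	uint64_t cursor = 0, first = 0, last = 0;

	for (int i = 0; i < nr; i++) {
		last = bump_alloc(&cursor, MAP_SIZE, MAP_SIZE);
		if (i == 0)
			first = last;
		/* The 24-byte usemap pushes the next 2M-aligned map 2M further. */
		bump_alloc(&cursor, USEMAP_SIZE, USEMAP_ALIGN);
	}
	return last + MAP_SIZE - first;
}

/* Address span covered by nr mem_maps when all usemaps are allocated first. */
uint64_t span_grouped(int nr)
{
	uint64_t cursor = 0, first = 0, last = 0;

	for (int i = 0; i < nr; i++)
		bump_alloc(&cursor, USEMAP_SIZE, USEMAP_ALIGN);
	for (int i = 0; i < nr; i++) {
		last = bump_alloc(&cursor, MAP_SIZE, MAP_SIZE);
		if (i == 0)
			first = last;
	}
	return last + MAP_SIZE - first;
}
```

For five sections, `span_interleaved(5)` covers 18M (a 4M stride between maps, as in the "before" listing), while `span_grouped(5)` covers 10M (a 2M stride, as in the "after" listing).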

    Solution:
    try to allocate the mem_maps contiguously.

    After the patch, we get:
    [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
    [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
    [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
    [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
    [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
    ....
    The usemaps will also share pages, because they are allocated contiguously too:
    sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
    sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
    sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
    sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
    ....

    So this makes the bootmem allocation more compact and uses less memory for the usemaps.
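The 24-byte figure and the page sharing can be checked with a little arithmetic. The constants below are assumptions about x86-64 at the time (128 MiB sections, 4 KiB pages, pageblocks of 2^9 pages, 3 flag bits per pageblock); none of them is stated in this thread, and the function names are hypothetical. Only the 0x80 stride comes from the log output above.

```c
#include <limits.h>

/* Assumed x86-64 values, not taken from the thread. */
#define ASSUMED_SECTION_SIZE_BITS 27	/* 128 MiB per memory section */
#define ASSUMED_PAGE_SHIFT        12	/* 4 KiB pages */
#define ASSUMED_PAGEBLOCK_ORDER   9	/* 512 pages per pageblock */
#define ASSUMED_PAGEBLOCK_BITS    3	/* flag bits stored per pageblock */

#define USEMAP_STRIDE 0x80UL		/* address spacing seen in the log output */
#define ASSUMED_PAGE_SIZE (1UL << ASSUMED_PAGE_SHIFT)

/* Size of one usemap in bytes, rounded up to whole unsigned longs. */
unsigned long usemap_bytes(void)
{
	unsigned long pages = 1UL << (ASSUMED_SECTION_SIZE_BITS - ASSUMED_PAGE_SHIFT);
	unsigned long blocks = pages >> ASSUMED_PAGEBLOCK_ORDER;
	unsigned long bits = blocks * ASSUMED_PAGEBLOCK_BITS;
	unsigned long bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	unsigned long longs = (bits + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/* How many usemaps fit in one page at the observed stride. */
unsigned long usemaps_per_page(void)
{
	return ASSUMED_PAGE_SIZE / USEMAP_STRIDE;
}
```

Under these assumptions a section has 64 pageblocks, so 192 bits or 24 bytes of flags, and 32 usemaps fit in one 4 KiB page at the 0x80 stride shown in the log.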

    For powerpc,
    Badari Pulavarty wrote:

    > You have to call sparse_init_one_section() on each pmap and usemap
    > as we allocate - since valid_section() depends on it (which is needed
    > by vmemmap_populate() to check if the section is populated or not).
    > On ppc, we need to call htab_bolted_mapping() on each section and
    > we need to skip existing sections.


    So try allocating all the usemaps first, in one pass.

    v2 replaces:
    [PATCH] mm: make mem_map allocation continuous.
    [PATCH] mm: allocate section_map for sparse_init
    [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

    Signed-off-by: Yinghai Lu

    diff --git a/mm/sparse.c b/mm/sparse.c
    index f6a43c0..2881222 100644
    --- a/mm/sparse.c
    +++ b/mm/sparse.c
    @@ -294,22 +294,48 @@ void __init sparse_init(void)
     	unsigned long pnum;
     	struct page *map;
     	unsigned long *usemap;
    +	unsigned long **usemap_map;
    +	int size;
    +
    +	/*
    +	 * map is using big page (aka 2M in x86 64 bit)
    +	 * usemap is less one page (aka 24 bytes)
    +	 * so alloc 2M (with 2M align) and 24 bytes in turn will
    +	 * make next 2M slip to one more 2M later.
    +	 * then in big system, the memory will have a lot of holes...
    +	 * here try to allocate 2M pages continously.
    +	 *
    +	 * powerpc need to call sparse_init_one_section right after each
    +	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
    +	 */
    +	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
    +	usemap_map = alloc_bootmem(size);
    +	if (!usemap_map)
    +		panic("can not allocate usemap_map\n");

     	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
     		if (!present_section_nr(pnum))
     			continue;
    +		usemap_map[pnum] = sparse_early_usemap_alloc(pnum);
    +	}

    -		map = sparse_early_mem_map_alloc(pnum);
    -		if (!map)
    +	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
    +		if (!present_section_nr(pnum))
     			continue;

    -		usemap = sparse_early_usemap_alloc(pnum);
    +		usemap = usemap_map[pnum];
     		if (!usemap)
     			continue;

    +		map = sparse_early_mem_map_alloc(pnum);
    +		if (!map)
    +			continue;
    +
     		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
     								usemap);
     	}
    +
    +	free_bootmem(__pa(usemap_map), size);
     }

     #ifdef CONFIG_MEMORY_HOTPLUG

  6. Re: [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init

    On Wed, Apr 2, 2008 at 5:44 PM, Yinghai Lu wrote:
    >
    > On Wed, Apr 2, 2008 at 3:52 PM, Andrew Morton wrote:
    > >
    > > On Wed, 2 Apr 2008 15:25:48 -0700 Yinghai Lu wrote:
    > >
    > > > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    > > >
    > > > on powerpc,
    > > >
    > > > On Wed, Apr 2, 2008 at 12:22 PM, Badari Pulavarty wrote:
    > > > >
    > > > > On Wed, 2008-04-02 at 18:17 +1100, Michael Ellerman wrote:
    > > > > > On Wed, 2008-04-02 at 12:38 +0530, Kamalesh Babulal wrote:
    > > > > > > Andrew Morton wrote:
    > > > > > > > On Wed, 02 Apr 2008 11:55:36 +0530 Kamalesh Babulal wrote:
    > > > so try to allocate usemap at first altogether.

    > >
    > > I have to turn all the above crud into a proper changelog. I'd prefer that
    > > you do it.
    > >
    > > Unless this patch should be folded into another one, in which case it
    > > doesn't matter.
    > >
    > >
    > > > Signed-off-by: Yinghai Lu
    > > >
    > > > diff --git a/mm/sparse.c b/mm/sparse.c
    > > > index d3cb085..782ebe5 100644
    > > > --- a/mm/sparse.c
    > > > +++ b/mm/sparse.c

    > >
    > > We shouldn't merge this patch on its own because then that will leave a
    > > non-bisectable region in the powerpc history.
    > >
    > > So which patch is this patch fixing? Lexically it applies to
    > > mm-allocate-section_map-for-sparse_init.patch (and its updates). But is
    > > that where it logically lies?

    >
    > yes. we should fold
    >
    >
    > mm-make-mem_map-allocation-continuous.patch
    >
    > mm-allocate-section_map-for-sparse_init.patch
    > and this one
    >


    Please check the big one:
    http://lkml.org/lkml/2008/4/2/650

    YH

  7. Re: [PATCH] mm: make mem_map allocation continuous v2.

    On Wed, 2 Apr 2008 18:30:24 -0700 Yinghai Lu wrote:

    > v2 replace:
    > [PATCH] mm: make mem_map allocation continuous.
    > [PATCH] mm: allocate section_map for sparse_init
    > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    >


    err, no.

    >
    > diff --git a/mm/sparse.c b/mm/sparse.c
    > index f6a43c0..2881222 100644
    > --- a/mm/sparse.c
    > +++ b/mm/sparse.c


    Sorry, but I'd rather not do it this way. We presently have this:

    mm-make-mem_map-allocation-continuous.patch
    mm-make-mem_map-allocation-continuous-checkpatch-fixes.patch
    mm-fix-alloc_bootmem_core-to-use-fast-searching-for-all-nodes.patch
    mm-allocate-section_map-for-sparse_init.patch
    mm-allocate-section_map-for-sparse_init-update.patch
    mm-allocate-section_map-for-sparse_init-update-fix.patch
    mm-allocate-section_map-for-sparse_init-powerpc-fix.patch
    mm-offset-align-in-alloc_bootmem.patch
    mm-make-reserve_bootmem-can-crossed-the-nodes.patch
    mm-make-reserve_bootmem-can-crossed-the-nodes-checkpatch-fixes.patch

    and you purport to throw some of them away and combine them into a single
    patch? We assume that the later patches will still apply and work on top
    of this newer patch? It is up to me to check that the replacement patch
    incorporates the third-party changes to the original patches?

    Too hard, too risky. Can't we just do a fix against 2.6.25-rc8-mm1?

  8. Re: [PATCH] mm: make mem_map allocation continuous v2.


    Looks good to me. And ia64 boots up with this patch too.
    Thanks.

    Acked-by: Yasunori Goto


    >
    > vmemmap allocation current got
    > [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
    > [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
    > [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
    > [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
    > [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
    > ...
    >
    > there is 2M hole between them.
    >
    > the rootcause is that usemap (24 bytes) will be allocated after every 2M
    > mem_map. and it will push next vmemmap (2M) to next align (2M).
    >
    > solution:
    > try to allocate mem_map continously.
    >
    > after patch, will get
    > [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
    > [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
    > [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
    > [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
    > [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
    > ...
    > and usemap will share in page because of they are allocated continuously too.
    > sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
    > sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
    > sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
    > sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
    > ...
    >
    > so we make the bootmem allocation more compact and use less memory for usemap.
    >
    > for power pc
    > Badari Pulavarty wrote:
    >
    > > You have to call sparse_init_one_section() on each pmap and usemap
    > > as we allocate - since valid_section() depends on it (which is needed
    > > by vmemmap_populate() to check if the section is populated or not).
    > > On ppc, we need to call htab_bolted_mapping() on each section and
    > > we need to skip existing sections.

    >
    > so try to allocate usemap at first altogether.
    >
    > v2 replace:
    > [PATCH] mm: make mem_map allocation continuous.
    > [PATCH] mm: allocate section_map for sparse_init
    > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    >
    > Signed-off-by: Yinghai Lu
    >
    > diff --git a/mm/sparse.c b/mm/sparse.c
    > index f6a43c0..2881222 100644
    > --- a/mm/sparse.c
    > +++ b/mm/sparse.c
    > @@ -294,22 +294,48 @@ void __init sparse_init(void)
    >  	unsigned long pnum;
    >  	struct page *map;
    >  	unsigned long *usemap;
    > +	unsigned long **usemap_map;
    > +	int size;
    > +
    > +	/*
    > +	 * map is using big page (aka 2M in x86 64 bit)
    > +	 * usemap is less one page (aka 24 bytes)
    > +	 * so alloc 2M (with 2M align) and 24 bytes in turn will
    > +	 * make next 2M slip to one more 2M later.
    > +	 * then in big system, the memory will have a lot of holes...
    > +	 * here try to allocate 2M pages continously.
    > +	 *
    > +	 * powerpc need to call sparse_init_one_section right after each
    > +	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
    > +	 */
    > +	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
    > +	usemap_map = alloc_bootmem(size);
    > +	if (!usemap_map)
    > +		panic("can not allocate usemap_map\n");
    >
    >  	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
    >  		if (!present_section_nr(pnum))
    >  			continue;
    > +		usemap_map[pnum] = sparse_early_usemap_alloc(pnum);
    > +	}
    >
    > -		map = sparse_early_mem_map_alloc(pnum);
    > -		if (!map)
    > +	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
    > +		if (!present_section_nr(pnum))
    >  			continue;
    >
    > -		usemap = sparse_early_usemap_alloc(pnum);
    > +		usemap = usemap_map[pnum];
    >  		if (!usemap)
    >  			continue;
    >
    > +		map = sparse_early_mem_map_alloc(pnum);
    > +		if (!map)
    > +			continue;
    > +
    >  		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
    >  				usemap);
    >  	}
    > +
    > +	free_bootmem(__pa(usemap_map), size);
    >  }
    >
    >  #ifdef CONFIG_MEMORY_HOTPLUG
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    > Please read the FAQ at http://www.tux.org/lkml/

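    The comment in the quoted patch says that alternating a 2M-aligned
    mem_map allocation with a 24-byte usemap allocation pushes each following
    2M chunk out to the next 2M boundary. The sketch below illustrates that
    arithmetic with a toy bump allocator. It is not the kernel's bootmem
    allocator: bump_alloc(), interleaved_waste(), usemap_first_waste() and
    the 8-byte alignment for the small allocations are all assumptions made
    for illustration only.

```c
#include <assert.h>

#define BIG_SIZE    (2UL << 20)  /* 2 MiB mem_map chunk, as in the patch comment */
#define BIG_ALIGN   (2UL << 20)  /* mem_map chunks are 2 MiB aligned */
#define SMALL_SIZE  24UL         /* usemap size quoted in the patch comment */
#define SMALL_ALIGN 8UL          /* assumed alignment, for illustration only */

/* Toy allocator state: next free offset. Hypothetical, not bootmem. */
static unsigned long cursor;

/* Round the cursor up to 'align' (a power of two), then hand out 'size' bytes. */
unsigned long bump_alloc(unsigned long size, unsigned long align)
{
	unsigned long start;

	assert(align != 0 && (align & (align - 1)) == 0);
	start = (cursor + align - 1) & ~(align - 1);
	cursor = start + size;
	return start;
}

/* Old order: mem_map then usemap for each section. Returns bytes lost to padding. */
unsigned long interleaved_waste(int sections)
{
	unsigned long waste = 0;
	int i;

	cursor = 0;
	for (i = 0; i < sections; i++) {
		unsigned long before = cursor;
		unsigned long start = bump_alloc(BIG_SIZE, BIG_ALIGN);

		waste += start - before;   /* gap skipped to reach 2 MiB alignment */
		bump_alloc(SMALL_SIZE, SMALL_ALIGN);
	}
	return waste;
}

/* New order: all usemaps first, then all mem_maps back to back. */
unsigned long usemap_first_waste(int sections)
{
	unsigned long waste = 0;
	int i;

	cursor = 0;
	for (i = 0; i < sections; i++)
		bump_alloc(SMALL_SIZE, SMALL_ALIGN);

	for (i = 0; i < sections; i++) {
		unsigned long before = cursor;
		unsigned long start = bump_alloc(BIG_SIZE, BIG_ALIGN);

		waste += start - before;   /* only the first chunk needs padding */
	}
	return waste;
}
```

    In this toy model the interleaved order leaves a hole of almost 2M before
    every mem_map chunk after the first, while the usemap-first order pads
    only once, so the 2M chunks end up contiguous. How much the real bootmem
    allocator loses depends on details this sketch ignores.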

    --
    Yasunori Goto



  9. Re: [PATCH] mm: make mem_map allocation continuous v2.

    On Wed, Apr 2, 2008 at 7:22 PM, Andrew Morton wrote:
    > On Wed, 2 Apr 2008 18:30:24 -0700 Yinghai Lu wrote:
    >
    > > v2 replace:
    > > [PATCH] mm: make mem_map allocation continuous.
    > > [PATCH] mm: allocate section_map for sparse_init
    > > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    > >

    >
    > err, no.
    >
    >
    > >
    > > diff --git a/mm/sparse.c b/mm/sparse.c
    > > index f6a43c0..2881222 100644
    > > --- a/mm/sparse.c
    > > +++ b/mm/sparse.c

    >
    > Sorry, but I'd rather not do it this way. We presently have this:
    >


    it replaces

    > mm-make-mem_map-allocation-continuous.patch
    > mm-make-mem_map-allocation-continuous-checkpatch-fixes.patch
    > mm-allocate-section_map-for-sparse_init.patch
    > mm-allocate-section_map-for-sparse_init-update.patch
    > mm-allocate-section_map-for-sparse_init-update-fix.patch
    > mm-allocate-section_map-for-sparse_init-powerpc-fix.patch


    others still needed

    so mm-make-mem-map-allocation-continuous.patch will not break powerpc and ia64

    YH

  10. Re: [PATCH] mm: make mem_map allocation continuous v2.

    Yinghai Lu wrote:
    > On Wed, Apr 2, 2008 at 7:22 PM, Andrew Morton wrote:
    >> On Wed, 2 Apr 2008 18:30:24 -0700 Yinghai Lu wrote:
    >>
    >> > v2 replace:
    >> > [PATCH] mm: make mem_map allocation continuous.
    >> > [PATCH] mm: allocate section_map for sparse_init
    >> > [PATCH] mm: allocate usemap at first instead of mem_map in sparse_init
    >> >

    >>
    >> err, no.
    >>
    >>
    >> >
    >> > diff --git a/mm/sparse.c b/mm/sparse.c
    >> > index f6a43c0..2881222 100644
    >> > --- a/mm/sparse.c
    >> > +++ b/mm/sparse.c

    >>
    >> Sorry, but I'd rather not do it this way. We presently have this:
    >>

    >
    > it replaces
    >
    >> mm-make-mem_map-allocation-continuous.patch
    >> mm-make-mem_map-allocation-continuous-checkpatch-fixes.patch
    >> mm-allocate-section_map-for-sparse_init.patch
    >> mm-allocate-section_map-for-sparse_init-update.patch
    >> mm-allocate-section_map-for-sparse_init-update-fix.patch
    >> mm-allocate-section_map-for-sparse_init-powerpc-fix.patch

    >
    > others still needed
    >
    > so mm-make-mem-map-allocation-continuous.patch will not break powerpc and ia64
    >
    > YH

    Hi,

    Thanks, the patch fixes the issue. I am able to bootup without the kernel panic.

    Tested-by: Kamalesh Babulal

    --
    Thanks & Regards,
    Kamalesh Babulal,
    Linux Technology Center,
    IBM, ISTL.
