[RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp


  1. [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    MIGRATE_RESERVE means that the page is for emergencies.
    So it shouldn't be cached in the pcp.

    Otherwise, the system runs an unnecessary risk of memory starvation
    because other CPUs can't use these emergency pages.



    Signed-off-by: KOSAKI Motohiro
    CC: Mel Gorman
    CC: Christoph Lameter

    ---
    mm/page_alloc.c | 12 +++++++++++-
    1 file changed, 11 insertions(+), 1 deletion(-)

    Index: b/mm/page_alloc.c
    ===================================================================
    --- a/mm/page_alloc.c 2008-11-06 06:01:15.000000000 +0900
    +++ b/mm/page_alloc.c 2008-11-06 06:27:41.000000000 +0900
    @@ -1002,6 +1002,7 @@ static void free_hot_cold_page(struct pa
     	struct zone *zone = page_zone(page);
     	struct per_cpu_pages *pcp;
     	unsigned long flags;
    +	int migratetype = get_pageblock_migratetype(page);
     
     	if (PageAnon(page))
     		page->mapping = NULL;
    @@ -1018,16 +1019,25 @@ static void free_hot_cold_page(struct pa
     	pcp = &zone_pcp(zone, get_cpu())->pcp;
     	local_irq_save(flags);
     	__count_vm_event(PGFREE);
    +
    +	set_page_private(page, migratetype);
    +
    +	/* the page for emergency shouldn't be cached */
    +	if (migratetype == MIGRATE_RESERVE) {
    +		free_one_page(zone, page, 0);
    +		goto out;
    +	}
     	if (cold)
     		list_add_tail(&page->lru, &pcp->list);
     	else
     		list_add(&page->lru, &pcp->list);
    -	set_page_private(page, get_pageblock_migratetype(page));
     	pcp->count++;
     	if (pcp->count >= pcp->high) {
     		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
     		pcp->count -= pcp->batch;
     	}
    +
    +out:
     	local_irq_restore(flags);
     	put_cpu();
     }
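A minimal user-space model of the patched free path may help make the behaviour change concrete. It is not kernel code: struct model_pcp, struct model_zone and model_free_hot_cold_page() are hypothetical stand-ins that only track page counts, and the migrate-type numbering is assumed from the mmzone.h of that era.

```c
/* Migrate type numbering assumed from 2.6.2x-era include/linux/mmzone.h. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

/* Hypothetical user-space stand-ins: only page counts are tracked. */
struct model_pcp {
	int count;  /* pages cached on this CPU's list */
	int high;   /* flush once count reaches this */
	int batch;  /* pages handed back to the zone per flush */
};

struct model_zone {
	long buddy_free;  /* pages on the zone's shared buddy free lists */
};

/*
 * Follows the control flow of the patched free_hot_cold_page():
 * MIGRATE_RESERVE pages skip the per-CPU list and go straight back to
 * the zone, where any CPU can allocate them; other pages are cached
 * and returned in batches once the list reaches pcp->high.
 */
static void model_free_hot_cold_page(struct model_zone *zone,
				     struct model_pcp *pcp, int migratetype)
{
	if (migratetype == MIGRATE_RESERVE) {
		zone->buddy_free++;               /* free_one_page(zone, page, 0) */
		return;
	}

	pcp->count++;                             /* list_add(&page->lru, &pcp->list) */
	if (pcp->count >= pcp->high) {
		zone->buddy_free += pcp->batch;   /* free_pages_bulk(...) */
		pcp->count -= pcp->batch;
	}
}
```

With high = 6 and batch = 2, five movable frees stay on the CPU list, a reserve free goes straight to the zone, and the next movable free triggers a batch flush of two pages.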


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    And the fastpath gets even more complex. Sigh.


  3. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Thu, Nov 06, 2008 at 09:16:58AM +0900, KOSAKI Motohiro wrote:
    > MIGRATE_RESERVE means that the page is for emergencies.
    > So it shouldn't be cached in the pcp.
    >


    It doesn't necessarily mean it's for emergencies. MIGRATE_RESERVE is one
    or more pageblocks at the beginning of the zone. While it's possible
    that the minimum page reserve for GFP_ATOMIC is located here, it's not
    mandatory.

    What MIGRATE_RESERVE can help is high-order atomic allocations used by
    some network drivers (a wireless one is what led to MIGRATE_RESERVE). As
    they are high-order allocations, they would be returned to the buddy
    allocator anyway.

    What your patch may help is the situation where the system is under intense
    memory pressure, is dipping routinely into the lowmem reserves and mixing
    with high-order atomic allocations. This seems a bit extreme.

    > Otherwise, the system runs an unnecessary risk of memory starvation
    > because other CPUs can't use these emergency pages.
    >
    >
    >
    > Signed-off-by: KOSAKI Motohiro
    > CC: Mel Gorman
    > CC: Christoph Lameter
    >


    This patch seems functionally sound but as Christoph points out, this
    adds another branch to the fast path. Now, I ran some tests and those that
    completed didn't show any problems but adding branches in the fast path can
    eventually lead to hard-to-detect performance problems.

    Do you have a situation in mind that this patch fixes up?

    Thanks



    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  4. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Thu, 6 Nov 2008 16:46:45 +0000
    Mel Gorman wrote:
    > > Otherwise, the system runs an unnecessary risk of memory starvation
    > > because other CPUs can't use these emergency pages.
    > >
    > >
    > >
    > > Signed-off-by: KOSAKI Motohiro
    > > CC: Mel Gorman
    > > CC: Christoph Lameter
    > >

    >
    > This patch seems functionally sound but as Christoph points out, this
    > adds another branch to the fast path. Now, I ran some tests and those that
    > completed didn't show any problems but adding branches in the fast path can
    > eventually lead to hard-to-detect performance problems.
    >

    Is dividing the pcp list into MIGRATE_TYPES bad?
    If it were divided, we could get rid of the scan.

    Thanks,
    -Kame



    > Do you have a situation in mind that this patch fixes up?
    >
    > Thanks
    >



  5. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    Hi Mel, Christoph,

    Thank you for the interesting comments!


    > > MIGRATE_RESERVE means that the page is for emergencies.
    > > So it shouldn't be cached in the pcp.

    >
    > It doesn't necessarily mean it's for emergencies. MIGRATE_RESERVE is one
    > or more pageblocks at the beginning of the zone. While it's possible
    > that the minimum page reserve for GFP_ATOMIC is located here, it's not
    > mandatory.
    >
    > What MIGRATE_RESERVE can help is high-order atomic allocations used by
    > some network drivers (a wireless one is what led to MIGRATE_RESERVE). As
    > they are high-order allocations, they would be returned to the buddy
    > allocator anyway.


    Yup.
    My patch is meaningless for high-order allocations because high-order
    allocations don't use the pcp.


    > What your patch may help is the situation where the system is under intense
    > memory pressure, is dipping routinely into the lowmem reserves and mixing
    > with high-order atomic allocations. This seems a bit extreme.


    It's not so extreme.

    Linux page reclaim can't run in interrupt context.
    So the network subsystem and drivers often use MIGRATE_RESERVE memory even
    though the system has plenty of reclaimable memory.

    At that time, any task in process context can use high-order allocations.


    > > Otherwise, the system runs an unnecessary risk of memory starvation
    > > because other CPUs can't use these emergency pages.
    > >
    > > Signed-off-by: KOSAKI Motohiro
    > > CC: Mel Gorman
    > > CC: Christoph Lameter
    > >

    >
    > This patch seems functionally sound but as Christoph points out, this
    > adds another branch to the fast path. Now, I ran some tests and those that
    > completed didn't show any problems but adding branches in the fast path can
    > eventually lead to hard-to-detect performance problems.
    >
    > Do you have a situation in mind that this patch fixes up?


    Ah, sorry, my description was too poor.
    This isn't a real workload issue, it is just

    Actually, I plan to rework the pcp because the following pcp list search
    in the fast path is NOT fast.

    In general, list searching often causes L1 cache misses, so it shouldn't be
    used in the fast path.


    static struct page *buffered_rmqueue(struct zone *preferred_zone,
    			struct zone *zone, int order, gfp_t gfp_flags)
    {
    (snip)
    		/* Find a page of the appropriate migrate type */
    		if (cold) {
    			list_for_each_entry_reverse(page, &pcp->list, lru)
    				if (page_private(page) == migratetype)
    					break;
    		} else {
    			list_for_each_entry(page, &pcp->list, lru)
    				if (page_private(page) == migratetype)
    					break;
    		}
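The cost KOSAKI describes comes from walking the list one struct page at a time. The sketch below is a hypothetical model, not the kernel function: it represents the pcp list as an array of migratetypes (the page_private() values) and counts how many entries a search touches from the hot or cold end.

```c
/* Migrate type numbering assumed from 2.6.2x-era include/linux/mmzone.h. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

/*
 * Hypothetical model of the pcp search: types[] holds the migratetype of
 * each cached page, hot end at index 0. Returns the index of the first
 * match, or -1 when the whole list was walked without a match (pages must
 * then come from the zone instead). *visited counts the entries touched;
 * in the kernel each one is a separate struct page.
 */
static int model_pcp_search(const int *types, int n, int migratetype,
			    int cold, int *visited)
{
	*visited = 0;
	for (int i = 0; i < n; i++) {
		int idx = cold ? n - 1 - i : i;  /* the _reverse walk starts at the tail */

		(*visited)++;
		if (types[idx] == migratetype)
			return idx;
	}
	return -1;
}
```

In a list such as { MOVABLE, MOVABLE, RESERVE, RESERVE, UNMOVABLE }, a hot-end search for UNMOVABLE touches all five entries. The two reserve entries can never be matched, because no caller asks for that type, so they only lengthen the walk.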


    Therefore, I'd like to make per-migratetype pcp lists.
    However, a MIGRATE_RESERVE list isn't useful because callers never request
    the reserve type; it is only an internal attribute.

    So I thought the "dropping reserve type page in pcp" patch is useful even on its own,
    and I posted it by itself to hear other developers' opinions.

    Actually, the current pcp is NOT fast, so discussing the number of
    branches isn't meaningful. The number of branches only matters when the
    fast path runs in time on the order of N branches, but the current pcp
    is slower than that.





  6. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Fri, Nov 07, 2008 at 10:42:24AM +0900, KAMEZAWA Hiroyuki wrote:
    > On Thu, 6 Nov 2008 16:46:45 +0000
    > Mel Gorman wrote:
    > > > Otherwise, the system runs an unnecessary risk of memory starvation
    > > > because other CPUs can't use these emergency pages.
    > > >
    > > >
    > > >
    > > > Signed-off-by: KOSAKI Motohiro
    > > > CC: Mel Gorman
    > > > CC: Christoph Lameter
    > > >

    > >
    > > This patch seems functionally sound but as Christoph points out, this
    > > adds another branch to the fast path. Now, I ran some tests and those that
    > > completed didn't show any problems but adding branches in the fast path can
    > > eventually lead to hard-to-detect performance problems.
    > >

    > Is dividing the pcp list into MIGRATE_TYPES bad?


    I do not understand what your question is.

    > If it were divided, we could get rid of the scan.
    >


    I don't know what you are saying here either. I think you are saying
    that we should avoid scanning the PCP lists for migrate types at all.
    That hurts anti-fragmentation as pages can be badly placed if any pcp
    page is used.

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  7. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Fri, Nov 07, 2008 at 01:37:14PM +0900, KOSAKI Motohiro wrote:
    > Hi Mel, Christoph,
    >
    > Thank you for the interesting comments!
    >
    >
    > > > MIGRATE_RESERVE means that the page is for emergencies.
    > > > So it shouldn't be cached in the pcp.

    > >
    > > It doesn't necessarily mean it's for emergencies. MIGRATE_RESERVE is one
    > > or more pageblocks at the beginning of the zone. While it's possible
    > > that the minimum page reserve for GFP_ATOMIC is located here, it's not
    > > mandatory.
    > >
    > > What MIGRATE_RESERVE can help is high-order atomic allocations used by
    > > some network drivers (a wireless one is what led to MIGRATE_RESERVE). As
    > > they are high-order allocations, they would be returned to the buddy
    > > allocator anyway.

    >
    > Yup.
    > My patch is meaningless for high-order allocations because high-order
    > allocations don't use the pcp.
    >
    > > What your patch may help is the situation where the system is under intense
    > > memory pressure, is dipping routinely into the lowmem reserves and mixing
    > > with high-order atomic allocations. This seems a bit extreme.

    >
    > It's not so extreme.
    >
    > Linux page reclaim can't run in interrupt context.
    > So the network subsystem and drivers often use MIGRATE_RESERVE memory even
    > though the system has plenty of reclaimable memory.
    >


    Why are they often using MIGRATE_RESERVE, have you confirmed that? For that
    to be happening, it implies that either memory is under intense pressure and
    free pages are often below watermarks due to interrupt contexts or they are
    frequently allocating high-order pages in interrupt context. Normal order-0
    allocations should be getting satisfied from elsewhere as if the free page
    counts are low, they would be direct reclaiming and that will likely be
    outside of the MIGRATE_RESERVE areas.

    > At that time, any task in process context can use high-order allocations.
    >
    > > > Otherwise, the system runs an unnecessary risk of memory starvation
    > > > because other CPUs can't use these emergency pages.
    > > >
    > > > Signed-off-by: KOSAKI Motohiro
    > > > CC: Mel Gorman
    > > > CC: Christoph Lameter
    > > >

    > >
    > > This patch seems functionally sound but as Christoph points out, this
    > > adds another branch to the fast path. Now, I ran some tests and those that
    > > completed didn't show any problems but adding branches in the fast path can
    > > eventually lead to hard-to-detect performance problems.
    > >
    > > Do you have a situation in mind that this patch fixes up?

    >
    > Ah, sorry, my description was too poor.
    > This isn't a real workload issue, it is just
    >
    > Actually, I plan to rework the pcp because the following pcp list search
    > in the fast path is NOT fast.
    >
    > In general, list searching often causes L1 cache misses, so it shouldn't be
    > used in the fast path.
    >
    > Therefore, I'd like to make per-migratetype pcp lists.


    That was actually how it was originally implemented and later moved to a list
    search. It got shot down on the grounds a per-cpu structure increased in size.

    > However, a MIGRATE_RESERVE list isn't useful because callers never request
    > the reserve type; it is only an internal attribute.
    >
    > So I thought the "dropping reserve type page in pcp" patch is useful even on its own,
    > and I posted it by itself to hear other developers' opinions.
    >
    > Actually, the current pcp is NOT fast, so discussing the number of
    > branches isn't meaningful. The number of branches only matters when the
    > fast path runs in time on the order of N branches, but the current pcp
    > is slower than that.
    >


    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  8. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Fri, 7 Nov 2008 10:42:42 +0000
    Mel Gorman wrote:

    > On Fri, Nov 07, 2008 at 10:42:24AM +0900, KAMEZAWA Hiroyuki wrote:
    > > On Thu, 6 Nov 2008 16:46:45 +0000
    > > Mel Gorman wrote:
    > > > > Otherwise, the system runs an unnecessary risk of memory starvation
    > > > > because other CPUs can't use these emergency pages.
    > > > >
    > > > >
    > > > >
    > > > > Signed-off-by: KOSAKI Motohiro
    > > > > CC: Mel Gorman
    > > > > CC: Christoph Lameter
    > > > >
    > > >
    > > > This patch seems functionally sound but as Christoph points out, this
    > > > adds another branch to the fast path. Now, I ran some tests and those that
    > > > completed didn't show any problems but adding branches in the fast path can
    > > > eventually lead to hard-to-detect performance problems.
    > > >

    > > Is dividing the pcp list into MIGRATE_TYPES bad?

    >
    > I do not understand what your question is.
    >

    Hmm, like this:

    pcp = &zone_pcp(zone, get_cpu())->pcp[migrate_type];
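Kame's one-liner indexes an array of per-CPU lists by migratetype. A minimal sketch of that layout, assuming one fixed-size LIFO per requestable type; struct model_split_pcp and its helpers are hypothetical, and giving MIGRATE_RESERVE no list at all is an assumption taken from KOSAKI's argument rather than part of Kame's suggestion.

```c
/* Migrate type numbering assumed from 2.6.2x-era include/linux/mmzone.h. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

/* Only the types a caller can request get a list (assumption). */
#define MODEL_PCP_TYPES (MIGRATE_MOVABLE + 1)
#define MODEL_PCP_MAX   64  /* hypothetical per-list capacity */

/* Hypothetical split pcp: one LIFO stack of page ids per migratetype. */
struct model_split_pcp {
	int count[MODEL_PCP_TYPES];
	long pages[MODEL_PCP_TYPES][MODEL_PCP_MAX];
};

static int model_cacheable(int migratetype)
{
	return migratetype >= 0 && migratetype < MODEL_PCP_TYPES;
}

/* Cache a freed page on its own type's list. Returns 0 when the page
 * must go straight back to the zone instead (uncached type or full list). */
static int model_split_free(struct model_split_pcp *pcp, int migratetype, long page)
{
	if (!model_cacheable(migratetype) || pcp->count[migratetype] >= MODEL_PCP_MAX)
		return 0;
	pcp->pages[migratetype][pcp->count[migratetype]++] = page;
	return 1;
}

/* Allocation indexes the right list directly, with no scan. Returns -1
 * when that list is empty and the zone would have to refill it. */
static long model_split_alloc(struct model_split_pcp *pcp, int migratetype)
{
	if (!model_cacheable(migratetype) || pcp->count[migratetype] == 0)
		return -1;
	return pcp->pages[migratetype][--pcp->count[migratetype]];
}
```

The cost Mel mentions is visible in the struct: every CPU now carries MODEL_PCP_TYPES lists instead of one.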


    Thanks,
    -Kame


  9. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Fri, Nov 07, 2008 at 08:02:51PM +0900, KAMEZAWA Hiroyuki wrote:
    > On Fri, 7 Nov 2008 10:42:42 +0000
    > Mel Gorman wrote:
    >
    > > On Fri, Nov 07, 2008 at 10:42:24AM +0900, KAMEZAWA Hiroyuki wrote:
    > > > On Thu, 6 Nov 2008 16:46:45 +0000
    > > > Mel Gorman wrote:
    > > > > > Otherwise, the system runs an unnecessary risk of memory starvation
    > > > > > because other CPUs can't use these emergency pages.
    > > > > >
    > > > > >
    > > > > >
    > > > > > Signed-off-by: KOSAKI Motohiro
    > > > > > CC: Mel Gorman
    > > > > > CC: Christoph Lameter
    > > > > >
    > > > >
    > > > > This patch seems functionally sound but as Christoph points out, this
    > > > > adds another branch to the fast path. Now, I ran some tests and those that
    > > > > completed didn't show any problems but adding branches in the fast path can
    > > > > eventually lead to hard-to-detect performance problems.
    > > > >
    > > > Is dividing the pcp list into MIGRATE_TYPES bad?

    > >
    > > I do not understand what your question is.
    > >

    > Hmm, like this:
    >
    > pcp = &zone_pcp(zone, get_cpu())->pcp[migrate_type];
    >


    Oh, do you mean splitting the list instead of searching? This is how it was
    originally implemented and shot down on the grounds it increased the size of
    a per-cpu structure.

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  10. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Fri, 7 Nov 2008, Mel Gorman wrote:

    > Oh, do you mean splitting the list instead of searching? This is how it was
    > originally implemented and shot down on the grounds it increased the size of
    > a per-cpu structure.


    The situation may be better with the cpu_alloc stuff. The big pcp array in
    struct zone for all possible processors will be gone and thus the memory
    requirements will be less.

  11. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    > > Oh, do you mean splitting the list instead of searching? This is how it was
    > > originally implemented and shot down on the grounds it increased the size of
    > > a per-cpu structure.

    >
    > The situation may be better with the cpu_alloc stuff. The big pcp array in
    > struct zone for all possible processors will be gone and thus the memory
    > requirements will be less.


    Yup, that's a very nice patch!




  12. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    > > > What your patch may help is the situation where the system is under intense
    > > > memory pressure, is dipping routinely into the lowmem reserves and mixing
    > > > with high-order atomic allocations. This seems a bit extreme.

    > >
    > > It's not so extreme.
    > >
    > > Linux page reclaim can't run in interrupt context.
    > > So the network subsystem and drivers often use MIGRATE_RESERVE memory even
    > > though the system has plenty of reclaimable memory.
    > >

    >
    > Why are they often using MIGRATE_RESERVE, have you confirmed that? For that
    > to be happening, it implies that either memory is under intense pressure and
    > free pages are often below watermarks due to interrupt contexts or they are
    > frequently allocating high-order pages in interrupt context. Normal order-0
    > allocations should be getting satisfied from elsewhere as if the free page
    > counts are low, they would be direct reclaiming and that will likely be
    > outside of the MIGRATE_RESERVE areas.


    If I insert a printk() in the MIGRATE_RESERVE path, I can easily observe
    MIGRATE_RESERVE page allocations even when no heavy workload is running.
    But that isn't my point.

    OK, I guess my patch description was too poor (and a bit pointless).
    So, let me try again.

    (1) As a general principle, the system should try to avoid OOM rather than
    maximize performance when a memory shortage happens.
    Using MIGRATE_RESERVE directly indicates that a memory shortage has happened,
    and pcp caching can prevent allocation by other CPUs.
    (2) MIGRATE_RESERVE is never searched for by buffered_rmqueue() because
    allocflags_to_migratetype() never returns MIGRATE_RESERVE.
    So it doesn't work as a cache.
    IOW, it doesn't help to increase performance.
    (3) If the system passes MIGRATE_RESERVE pages to free_hot_cold_page() continuously,
    pcp queueing can reduce the number of times zone->lock is taken.
    However, that is rare, because MIGRATE_RESERVE is emergency memory,
    and it is often used in interrupt context processing.
    Continuous emergency memory allocation in interrupt context isn't very sane.
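Point (2) rests on how allocation flags are turned into a migratetype. The sketch below is modelled on the mobility grouping in allocflags_to_migratetype() but is not a copy of it: the MODEL_GFP_* names and bit values are made up for illustration, and the mobility-disabled case is left out.

```c
/* Migrate type numbering assumed from 2.6.2x-era include/linux/mmzone.h. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

/* Illustrative bit values only; the real __GFP_RECLAIMABLE and
 * __GFP_MOVABLE constants live in include/linux/gfp.h and differ. */
#define MODEL_GFP_RECLAIMABLE 0x1u
#define MODEL_GFP_MOVABLE     0x2u

/*
 * Mobility grouping: the movable flag becomes bit 1 of the result and
 * the reclaimable flag bit 0, so requests with at most one of the two
 * flags map to UNMOVABLE, RECLAIMABLE or MOVABLE, never to RESERVE.
 */
static int model_allocflags_to_migratetype(unsigned int gfp_flags)
{
	return (((gfp_flags & MODEL_GFP_MOVABLE) != 0) << 1) |
	       ((gfp_flags & MODEL_GFP_RECLAIMABLE) != 0);
}
```

Setting both flags at once would alias MIGRATE_RESERVE (3) in this model; the kernel treats that combination as a caller error, so a well-formed request never asks the pcp search for reserve pages.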

    So, not queueing MIGRATE_RESERVE pages doesn't cause a performance regression,
    and it can increase reliability (a bit). I think the merits far outweigh the demerits.




    > > Therefore, I'd like to make per-migratetype pcp lists.

    >
    > That was actually how it was originally implemented and later moved to a list
    > search. It got shot down on the grounds a per-cpu structure increased in size.


    Yup, I believe your decision was right at that time.
    However, I think the conditions have changed (or can be changed).

    (1) The legacy pcp implementation is deeply tied to the struct zone size,
    and blowing up the struct zone size causes a performance regression
    because of increased cache misses.
    However, this is solved by Christoph's cpu-alloc patch.

    (2) The legacy pcp has no limit on the total number of pages.
    So, adding more lists directly increases the number of pages in the pcp,
    which can cause OOM problems in large NUMA environments.
    However, I think we can implement a limit on the total number of pages.
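Point (2) above suggests keeping per-type lists but limiting their combined size. One way that could look, as a hypothetical sketch: a single high watermark applies to the sum of all lists, and a flush drains the longest list first. Neither struct model_capped_pcp nor the drain policy comes from an actual patch.

```c
/* Migrate type numbering assumed from 2.6.2x-era include/linux/mmzone.h. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

#define MODEL_PCP_TYPES (MIGRATE_MOVABLE + 1)

/* Hypothetical split pcp with one 'high' limit on the sum of all lists,
 * so adding lists does not multiply the pages a CPU may hold. */
struct model_capped_pcp {
	int count[MODEL_PCP_TYPES];
	int total;  /* sum of count[] */
	int high;   /* limit on total, not per list */
	int batch;  /* pages returned to the zone per flush */
};

/*
 * Cache one freed page. Once the total reaches high, return up to batch
 * pages to the zone, always draining the currently longest list (one
 * possible policy, picked only for illustration). Returns the number of
 * pages flushed, or -1 for a type that is never cached.
 */
static int model_capped_free(struct model_capped_pcp *pcp, int migratetype)
{
	int flushed = 0;

	if (migratetype < 0 || migratetype >= MODEL_PCP_TYPES)
		return -1;

	pcp->count[migratetype]++;
	pcp->total++;
	if (pcp->total < pcp->high)
		return 0;

	while (flushed < pcp->batch && pcp->total > 0) {
		int victim = 0;

		for (int t = 1; t < MODEL_PCP_TYPES; t++)
			if (pcp->count[t] > pcp->count[victim])
				victim = t;
		pcp->count[victim]--;
		pcp->total--;
		flushed++;
	}
	return flushed;
}
```

Which list to drain, and how large high should be relative to the number of lists, are the open sizing questions; this sketch only shows that a shared cap keeps the per-CPU total bounded.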




  13. Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp

    On Tue, Nov 11, 2008 at 10:39:40PM +0900, KOSAKI Motohiro wrote:
    > > > > What your patch may help is the situation where the system is under intense
    > > > > memory pressure, is dipping routinely into the lowmem reserves and mixing
    > > > > with high-order atomic allocations. This seems a bit extreme.
    > > >
    > > > It's not so extreme.
    > > >
    > > > Linux page reclaim can't run in interrupt context.
    > > > So the network subsystem and drivers often use MIGRATE_RESERVE memory even
    > > > though the system has plenty of reclaimable memory.
    > > >

    > >
    > > Why are they often using MIGRATE_RESERVE, have you confirmed that? For that
    > > to be happening, it implies that either memory is under intense pressure and
    > > free pages are often below watermarks due to interrupt contexts or they are
    > > frequently allocating high-order pages in interrupt context. Normal order-0
    > > allocations should be getting satisfied from elsewhere as if the free page
    > > counts are low, they would be direct reclaiming and that will likely be
    > > outside of the MIGRATE_RESERVE areas.

    >
    > If I insert a printk() in the MIGRATE_RESERVE path, I can easily observe
    > MIGRATE_RESERVE page allocations even when no heavy workload is running.
    > But that isn't my point.
    >


    That's interesting. What is the size of a pageblock on your system and
    is min_free_kbytes aligned to that value? If it's not aligned, it would
    explain why MIGRATE_RESERVE pages are being used before the watermarks
    are hit.
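Mel's alignment question can be put into numbers with a small arithmetic sketch. It assumes a single zone that receives all of min_free_kbytes as its minimum watermark (real systems split it across zones) and that the number of reserve pageblocks is that watermark rounded up to whole pageblocks; all function names are hypothetical.

```c
/* Round x up to a multiple of align (align must be non-zero). */
static unsigned long model_roundup(unsigned long x, unsigned long align)
{
	return (x + align - 1) / align * align;
}

/* min_free_kbytes converted to pages, assuming a single zone that
 * receives the whole minimum watermark. */
static unsigned long model_pages_min(unsigned long min_free_kbytes,
				     unsigned int page_shift)
{
	return min_free_kbytes >> (page_shift - 10);
}

/* Pageblocks marked MIGRATE_RESERVE: the watermark rounded up to whole
 * pageblocks (assumed to match the reserve setup of that era). */
static unsigned long model_reserve_blocks(unsigned long pages_min,
					  unsigned int pageblock_order)
{
	unsigned long pageblock_nr_pages = 1UL << pageblock_order;

	return model_roundup(pages_min, pageblock_nr_pages) >> pageblock_order;
}

/* Reserve pages above the watermark: if the other migrate types run
 * dry, these can be handed out before the zone drops below pages_min. */
static unsigned long model_reserve_excess(unsigned long pages_min,
					  unsigned int pageblock_order)
{
	return (model_reserve_blocks(pages_min, pageblock_order) << pageblock_order)
	       - pages_min;
}
```

With 4 KiB pages and 512-page pageblocks, min_free_kbytes = 4096 gives a 1024-page watermark and two reserve blocks with no slack, while min_free_kbytes = 5000 gives 1250 pages, three reserve blocks and 286 reserve pages sitting above the watermark.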

    > OK, I guess my patch description was too poor (and a bit pointless).
    > So, let me try again.
    >
    > (1) As a general principle, the system should try to avoid OOM rather than
    > maximize performance when a memory shortage happens.
    > Using MIGRATE_RESERVE directly indicates that a memory shortage has happened,
    > and pcp caching can prevent allocation by other CPUs.


    MIGRATE_RESERVE does not directly indicate a memory shortage has
    occurred. Bear in mind that a number of pageblocks are marked
    MIGRATE_RESERVE based on the value of the watermarks. In general, the
    minimum number of pages kept free will be in the MIGRATE_RESERVE blocks
    but it is not mandatory.

    > (2) MIGRATE_RESERVE is never searched for by buffered_rmqueue() because
    > allocflags_to_migratetype() never returns MIGRATE_RESERVE.
    > So it doesn't work as a cache.
    > IOW, it doesn't help to increase performance.


    This is true. If MIGRATE_RESERVE pages are routinely being used and placed
    on the pcp lists, the lists are not being used to their full potential
    and your patch would make sense.

    > (3) If the system passes MIGRATE_RESERVE pages to free_hot_cold_page() continuously,
    > pcp queueing can reduce the number of times zone->lock is taken.
    > However, that is rare, because MIGRATE_RESERVE is emergency memory,


    Again, MIGRATE_RESERVE is not emergency memory.

    > and it is often used in interrupt context processing.
    > Continuous emergency memory allocation in interrupt context isn't very sane.
    >
    > So, not queueing MIGRATE_RESERVE pages doesn't cause a performance regression,
    > and it can increase reliability (a bit). I think the merits far outweigh the demerits.
    >


    I'm now inclined to agree if you have shown that MIGRATE_RESERVE pages are
    routinely ending up on the PCP lists.

    > > > Therefore, I'd like to make per-migratetype pcp lists.

    > >
    > > That was actually how it was originally implemented and later moved to a list
    > > search. It got shot down on the grounds a per-cpu structure increased in size.

    >
    > Yup, I believe your decision was right at that time.
    > However, I think the conditions have changed (or can be changed).
    >
    > (1) The legacy pcp implementation is deeply tied to the struct zone size,
    > and blowing up the struct zone size causes a performance regression
    > because of increased cache misses.
    > However, this is solved by Christoph's cpu-alloc patch.
    >


    Indeed.

    > (2) The legacy pcp has no limit on the total number of pages.
    > So, adding more lists directly increases the number of pages in the pcp,
    > which can cause OOM problems in large NUMA environments.
    > However, I think we can implement a limit on the total number of pages.
    >


    Yes, although it is tricky to know what the right size for each of the
    lists should be so that the overall PCP lists are not huge.

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab
