[RFC/PATCH 0/6] memcg: performance improvement et al. v3 - Kernel



Thread: [RFC/PATCH 0/6] memcg: performance improvement et al. v3

  1. [RFC/PATCH 0/6] memcg: performance improvement et al. v3

    This set is for the memory resource controller, its reviewers, and testers.

    Updated against 2.6.26-rc2 and added fixes.

    This set does:
    - remove the refcnt from page_cgroup. With this, the code can be simplified
    and we avoid a large number of calls made only to maintain the refcnt.
    - handle swap cache, which is currently ignored by the memory resource controller.
    - small optimizations.
    - make force_empty drop pages. (NEW)

    major changes: v2 -> v3
    - fixed shared memory handling.
    - added a call to request reclaim of memory from a specified memcg (NEW), for shmem.
    - added drop_all_pages_in_mem_cgroup before calling force_empty()
    - dropped 3 patches because they were already sent to the -mm queue.

    1/6 -- make force_empty drop pages. (NEW)
    2/6 -- remove refcnt from page_cgroup (shmem handling is fixed.)
    3/6 -- handle swap cache
    4/6 -- add an interface to reclaim memory from a memcg (NEW) (for shmem)
    5/6 -- small optimization with likely()/unlikely()
    6/6 -- remove a redundant check.

    If the feedback is positive, I'd like to send some of them against the -mm queue.

    This is based on
    - 2.6.26-rc2
    - memcg-avoid-unnecessary-initialization.patch (in -mm queue)
    - memcg-make-global-var-read_mostly.patch (in -mm queue)
    - memcg-better-migration-handling.patch (in -mm queue)
    Tested on an x86-64 box. Seems to work very well.

    Any comments are welcome.

    Thanks,
    -Kame

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [RFC/PATCH 1/6] memcg: drop_pages at force_empty.

    This is a NEW one.
    ==
    Now, when we remove a memcg, we call force_empty().
    This call drops all page_cgroup accounting in the mem_cgroup but doesn't
    drop the pages themselves. So some page-cache pages can remain as "not
    accounted" memory while they are alive (because a page is accounted only
    at add_to_page_cache()). If they are not used by another memcg, the
    global LRU will drop them.

    This patch tries to drop pages when removing a memcg. Other memcgs will
    reload and re-account the page caches.

    Consideration: should we add a knob to control whether pages are dropped?
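    The drop loop described above can be sketched in plain userspace C (all
    names here are hypothetical stand-ins, not the kernel API, and I assume
    the loop is meant to run while the counter is *not* empty):

```c
#include <assert.h>

/* Toy model of a res_counter: just a usage count. */
struct res_counter_model { unsigned long usage; };

static int check_empty(struct res_counter_model *cnt)
{
	return cnt->usage == 0;
}

/* Stand-in for try_to_free_mem_cgroup_pages(): frees up to `batch`
 * pages and returns how many were actually freed. */
static int reclaim_pass(struct res_counter_model *cnt, unsigned long batch)
{
	unsigned long freed = cnt->usage < batch ? cnt->usage : batch;
	cnt->usage -= freed;
	return (int)freed;
}

/* Loop while the counter is NOT empty; stop early once a pass makes
 * no progress ("we did as much as possible"). */
static void drop_all_pages_model(struct res_counter_model *cnt)
{
	while (!check_empty(cnt)) {
		if (!reclaim_pass(cnt, 32))
			break;
	}
}
```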

    Signed-off-by: KAMEZAWA Hiroyuki

    ---
    include/linux/res_counter.h | 11 +++++++++++
    mm/memcontrol.c | 21 ++++++++++++++++++---
    2 files changed, 29 insertions(+), 3 deletions(-)

    Index: linux-2.6.26-rc2/mm/memcontrol.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/memcontrol.c
    +++ linux-2.6.26-rc2/mm/memcontrol.c
    @@ -763,6 +763,20 @@ void mem_cgroup_end_migration(struct pag
    mem_cgroup_uncharge_page(newpage);
    }

    +
    +static void mem_cgroup_drop_all_pages(struct mem_cgroup *mem)
    +{
    + int progress;
    + while (!res_counter_check_empty(&mem->res)) {
    + progress = try_to_free_mem_cgroup_pages(mem,
    + GFP_HIGHUSER_MOVABLE);
    + if (!progress) /* we did as much as possible */
    + break;
    + cond_resched();
    + }
    + return;
    +}
    +
    /*
    * This routine traverse page_cgroup in given list and drop them all.
    * This routine ignores page_cgroup->ref_cnt.
    @@ -820,7 +834,12 @@ static int mem_cgroup_force_empty(struct
    if (mem_cgroup_subsys.disabled)
    return 0;

    + if (atomic_read(&mem->css.cgroup->count) > 0)
    + goto out;
    +
    css_get(&mem->css);
    + /* drop pages as much as possible */
    + mem_cgroup_drop_all_pages(mem);
    /*
    * page reclaim code (kswapd etc..) will move pages between
    * active_list <-> inactive_list while we don't take a lock.
    Index: linux-2.6.26-rc2/include/linux/res_counter.h
    ===================================================================
    --- linux-2.6.26-rc2.orig/include/linux/res_counter.h
    +++ linux-2.6.26-rc2/include/linux/res_counter.h
    @@ -151,4 +151,15 @@ static inline void res_counter_reset_fai
    cnt->failcnt = 0;
    spin_unlock_irqrestore(&cnt->lock, flags);
    }
    +
    +static inline int res_counter_check_empty(struct res_counter *cnt)
    +{
    + unsigned long flags;
    + int ret;
    +
    + spin_lock_irqsave(&cnt->lock, flags);
    + ret = (cnt->usage == 0);
    + spin_unlock_irqrestore(&cnt->lock, flags);
    + return ret;
    +}
    #endif


  3. [RFC/PATCH 3/6] memcg: handle swapcache.

    This includes changes to the memory.stat file.

    =
    Now swapcache is not accounted (because it had some troubles).

    This retries accounting the swap cache, based on the remove-refcnt patch.

    * If a page is in the swap cache, mem_cgroup_uncharge_page() will *not*
    uncharge the page even if page->mapcount == 0.
    * If a page is removed from the swap cache, mem_cgroup_uncharge_page()
    is called.
    * A new swapcache page is not charged until it is mapped. This lets us
    avoid complicated read-ahead troubles.

    In the memory.stat file, the "rss" member is changed to "anon/swapcache"
    ("rss" is not a precise name here...).
    When all processes in a cgroup exit, the rss/swapcache counter can still
    show some pages because of the lazy behavior of the LRU, so the word
    "rss" is confusing. I can easily imagine a user saying "Oh, there may be
    a memory leak..."
    Precise counting of swapcache is too costly to be handled in memcg.
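    The uncharge rule above can be sketched as a small userspace predicate
    (a model for illustration, not the kernel code): on a map-type uncharge,
    the charge is kept while the page is still a cache page, still mapped,
    or still in the swap cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the page state the uncharge path inspects. */
struct page_model {
	bool is_cache;		/* PAGE_CGROUP_FLAG_CACHE */
	bool mapped;		/* page_mapped() */
	bool swap_cache;	/* PageSwapCache() */
};

/* True when a MAPPED-type uncharge should leave the charge in place. */
static bool keep_charge_on_mapped_uncharge(const struct page_model *p)
{
	return p->is_cache || p->mapped || p->swap_cache;
}
```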

    Change log: v2->v3
    - adjusted to 2.6.26-rc2+x
    - changed "rss" in stat to "rss/swapcache". (stat value includes swapcache)
    Change log: v1->v2
    - adjusted to 2.6.25-mm1.

    Signed-off-by: KAMEZAWA Hiroyuki

    ---
    mm/memcontrol.c | 9 +++++----
    mm/migrate.c | 3 ++-
    mm/swap_state.c | 1 +
    3 files changed, 8 insertions(+), 5 deletions(-)

    Index: linux-2.6.26-rc2/mm/migrate.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/migrate.c
    +++ linux-2.6.26-rc2/mm/migrate.c
    @@ -359,7 +359,8 @@ static int migrate_page_move_mapping(str
    write_unlock_irq(&mapping->tree_lock);
    if (!PageSwapCache(newpage)) {
    mem_cgroup_uncharge_cache_page(page);
    - }
    + } else
    + mem_cgroup_uncharge_page(page);

    return 0;
    }
    Index: linux-2.6.26-rc2/mm/swap_state.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/swap_state.c
    +++ linux-2.6.26-rc2/mm/swap_state.c
    @@ -110,6 +110,7 @@ void __delete_from_swap_cache(struct pag
    total_swapcache_pages--;
    __dec_zone_page_state(page, NR_FILE_PAGES);
    INC_CACHE_INFO(del_total);
    + mem_cgroup_uncharge_page(page);
    }

    /**
    Index: linux-2.6.26-rc2/mm/memcontrol.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/memcontrol.c
    +++ linux-2.6.26-rc2/mm/memcontrol.c
    @@ -44,10 +44,10 @@ static struct kmem_cache *page_cgroup_ca
    */
    enum mem_cgroup_stat_index {
    /*
    - * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
    + * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss/swapcache.
    */
    MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
    - MEM_CGROUP_STAT_RSS, /* # of pages charged as rss */
    + MEM_CGROUP_STAT_RSS, /* # of pages charged as anon/swapcache */
    MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
    MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */

    @@ -696,7 +696,8 @@ void __mem_cgroup_uncharge_common(struct

    if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
    && ((pc->flags & PAGE_CGROUP_FLAG_CACHE)
    - || page_mapped(page)))
    + || page_mapped(page)
    + || PageSwapCache(page)))
    goto unlock;

    mz = page_cgroup_zoneinfo(pc);
    @@ -922,7 +923,7 @@ static const struct mem_cgroup_stat_desc
    u64 unit;
    } mem_cgroup_stat_desc[] = {
    [MEM_CGROUP_STAT_CACHE] = { "cache", PAGE_SIZE, },
    - [MEM_CGROUP_STAT_RSS] = { "rss", PAGE_SIZE, },
    + [MEM_CGROUP_STAT_RSS] = { "anon/swapcache", PAGE_SIZE, },
    [MEM_CGROUP_STAT_PGPGIN_COUNT] = {"pgpgin", 1, },
    [MEM_CGROUP_STAT_PGPGOUT_COUNT] = {"pgpgout", 1, },
    };


  4. [RFC/PATCH 4/6] memcg: shmem reclaim helper

    A new call, mem_cgroup_shrink_usage(), is added for shmem handling,
    and the unusual usage of mem_cgroup_charge/uncharge is removed.

    Currently, shmem calls mem_cgroup_charge() just to reclaim some pages
    from the mem_cgroup. In general, shmem is used by a particular process
    group rather than as a global resource (like file caches), so it's
    reasonable to reclaim pages from the mem_cgroup where the shmem is
    mainly used.

    Signed-off-by: KAMEZAWA Hiroyuki

    ---
    mm/memcontrol.c | 18 ++++++++++++++++++
    1 file changed, 18 insertions(+)

    Index: linux-2.6.26-rc2/mm/memcontrol.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/memcontrol.c
    +++ linux-2.6.26-rc2/mm/memcontrol.c
    @@ -783,6 +783,30 @@ static void mem_cgroup_drop_all_pages(st
    }

    /*
    + * A call to try to shrink memory usage under specified resource controller.
    + * This is typically used for page reclaiming for shmem for reducing side
    + * effect of page allocation from shmem, which is used by some mem_cgroup.
    + */
    +int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
    +{
    + struct mem_cgroup *mem;
    + int progress = 0;
    + int retry = MEM_CGROUP_RECLAIM_RETRIES;
    +
    + rcu_read_lock();
    + mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
    + css_get(&mem->css);
    + rcu_read_unlock();
    +
    + while(!progress && --retry) {
    + progress = try_to_free_mem_cgroup_pages(mem, gfp_mask);
    + }
    + if (!retry)
    + return -ENOMEM;
    + return 0;
    +}
    +
    +/*
    * This routine traverse page_cgroup in given list and drop them all.
    * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
    */
    Index: linux-2.6.26-rc2/mm/shmem.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/shmem.c
    +++ linux-2.6.26-rc2/mm/shmem.c
    @@ -1314,13 +1314,12 @@ repeat:
    unlock_page(swappage);
    if (error == -ENOMEM) {
    /* allow reclaim from this memory cgroup */
    - error = mem_cgroup_cache_charge(swappage,
    - current->mm, gfp & ~__GFP_HIGHMEM);
    + error = mem_cgroup_shrink_usage(current->mm,
    + gfp & ~__GFP_HIGHMEM);
    if (error) {
    page_cache_release(swappage);
    goto failed;
    }
    - mem_cgroup_uncharge_cache_page(swappage);
    }
    page_cache_release(swappage);
    goto repeat;


  5. [RFC/PATCH 2/6] memcg: remove refcnt

    Major changes are in shmem handling.

    ==
    This patch removes the refcnt from page_cgroup.

    After this,

    * A page is charged only when !page_mapped() && no page_cgroup is assigned:
    * an anon page is newly mapped, or
    * a file page is added to mapping->tree.

    * A page is uncharged only when:
    * an anon page is fully unmapped, or
    * a file page is removed from the LRU.

    There is no change in behavior from the user's point of view.

    This patch also removes unnecessary calls in rmap.c which were used only
    for refcnt management.
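    The charge-time rules above can be sketched as a userspace predicate
    (a hypothetical model, not the kernel code): a page needs a new charge
    only if it is not already mapped and is not an already-present
    file-cache page. page->mapping may hold a stale anon_vma pointer, which
    the PageAnon() flag distinguishes; a newly mapped anon page has a NULL
    mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the page state mem_cgroup_charge() inspects. */
struct charge_page {
	bool mapped;	/* page_mapped() */
	void *mapping;	/* page->mapping */
	bool anon;	/* PageAnon() */
};

static bool needs_charge(const struct charge_page *p)
{
	if (p->mapped)
		return false;	/* accounted when first mapped */
	if (p->mapping && !p->anon)
		return false;	/* file cache, accounted at add_to_page_cache() */
	return true;
}
```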

    Changelog: v2->v3
    - adjusted to 2.6.26-rc2
    - Fixed shmem's page_cgroup refcnt handling. (but it's still complicated...)
    - added detect-already-accounted-file-cache check to mem_cgroup_charge().

    Changelog: v1->v2
    adjusted to 2.6.25-mm1.

    Signed-off-by: KAMEZAWA Hiroyuki

    ---
    include/linux/memcontrol.h | 9 +---
    mm/filemap.c | 6 +-
    mm/memcontrol.c | 94 ++++++++++++++++++++++-----------------------
    mm/migrate.c | 3 -
    mm/rmap.c | 14 ------
    mm/shmem.c | 8 +--
    6 files changed, 61 insertions(+), 73 deletions(-)

    Index: linux-2.6.26-rc2/mm/memcontrol.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/memcontrol.c
    +++ linux-2.6.26-rc2/mm/memcontrol.c
    @@ -166,7 +166,6 @@ struct page_cgroup {
    struct list_head lru; /* per cgroup LRU list */
    struct page *page;
    struct mem_cgroup *mem_cgroup;
    - int ref_cnt; /* cached, mapped, migrating */
    int flags;
    };
    #define PAGE_CGROUP_FLAG_CACHE (0x1) /* charged as cache */
    @@ -185,6 +184,7 @@ static enum zone_type page_cgroup_zid(st
    enum charge_type {
    MEM_CGROUP_CHARGE_TYPE_CACHE = 0,
    MEM_CGROUP_CHARGE_TYPE_MAPPED,
    + MEM_CGROUP_CHARGE_TYPE_FORCE, /* used by force_empty */
    };

    /*
    @@ -552,9 +552,7 @@ retry:
    */
    if (pc) {
    VM_BUG_ON(pc->page != page);
    - VM_BUG_ON(pc->ref_cnt <= 0);
    -
    - pc->ref_cnt++;
    + VM_BUG_ON(!pc->mem_cgroup);
    unlock_page_cgroup(page);
    goto done;
    }
    @@ -570,10 +568,7 @@ retry:
    * thread group leader migrates. It's possible that mm is not
    * set, if so charge the init_mm (happens for pagecache usage).
    */
    - if (!memcg) {
    - if (!mm)
    - mm = &init_mm;
    -
    + if (likely(!memcg)) {
    rcu_read_lock();
    mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
    /*
    @@ -609,7 +604,6 @@ retry:
    }
    }

    - pc->ref_cnt = 1;
    pc->mem_cgroup = mem;
    pc->page = page;
    /*
    @@ -653,6 +647,17 @@ err:

    int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
    {
    + /*
    + * If already mapped, we don't have to account.
    + * If page cache, page->mapping has address_space.
    + * But page->mapping may have out-of-use anon_vma pointer,
    + * detect it by PageAnon() check. A newly-mapped anon page's page->mapping
    + * is NULL.
    + */
    + if (page_mapped(page) || (page->mapping && !PageAnon(page)))
    + return 0;
    + if (unlikely(!mm))
    + mm = &init_mm;
    return mem_cgroup_charge_common(page, mm, gfp_mask,
    MEM_CGROUP_CHARGE_TYPE_MAPPED, NULL);
    }
    @@ -660,32 +665,16 @@ int mem_cgroup_charge(struct page *page,
    int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
    gfp_t gfp_mask)
    {
    - if (!mm)
    + if (unlikely(!mm))
    mm = &init_mm;
    return mem_cgroup_charge_common(page, mm, gfp_mask,
    MEM_CGROUP_CHARGE_TYPE_CACHE, NULL);
    }

    -int mem_cgroup_getref(struct page *page)
    -{
    - struct page_cgroup *pc;
    -
    - if (mem_cgroup_subsys.disabled)
    - return 0;
    -
    - lock_page_cgroup(page);
    - pc = page_get_page_cgroup(page);
    - VM_BUG_ON(!pc);
    - pc->ref_cnt++;
    - unlock_page_cgroup(page);
    - return 0;
    -}
    -
    /*
    - * Uncharging is always a welcome operation, we never complain, simply
    - * uncharge.
    + * uncharge if !page_mapped(page)
    */
    -void mem_cgroup_uncharge_page(struct page *page)
    +void __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
    {
    struct page_cgroup *pc;
    struct mem_cgroup *mem;
    @@ -704,29 +693,41 @@ void mem_cgroup_uncharge_page(struct pag
    goto unlock;

    VM_BUG_ON(pc->page != page);
    - VM_BUG_ON(pc->ref_cnt <= 0);

    - if (--(pc->ref_cnt) == 0) {
    - mz = page_cgroup_zoneinfo(pc);
    - spin_lock_irqsave(&mz->lru_lock, flags);
    - __mem_cgroup_remove_list(mz, pc);
    - spin_unlock_irqrestore(&mz->lru_lock, flags);
    + if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
    + && ((pc->flags & PAGE_CGROUP_FLAG_CACHE)
    + || page_mapped(page)))
    + goto unlock;

    - page_assign_page_cgroup(page, NULL);
    - unlock_page_cgroup(page);
    + mz = page_cgroup_zoneinfo(pc);
    + spin_lock_irqsave(&mz->lru_lock, flags);
    + __mem_cgroup_remove_list(mz, pc);
    + spin_unlock_irqrestore(&mz->lru_lock, flags);

    - mem = pc->mem_cgroup;
    - res_counter_uncharge(&mem->res, PAGE_SIZE);
    - css_put(&mem->css);
    + page_assign_page_cgroup(page, NULL);
    + unlock_page_cgroup(page);

    - kmem_cache_free(page_cgroup_cache, pc);
    - return;
    - }
    + mem = pc->mem_cgroup;
    + res_counter_uncharge(&mem->res, PAGE_SIZE);
    + css_put(&mem->css);

    + kmem_cache_free(page_cgroup_cache, pc);
    + return;
    unlock:
    unlock_page_cgroup(page);
    }

    +void mem_cgroup_uncharge_page(struct page *page)
    +{
    + __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
    +}
    +
    +void mem_cgroup_uncharge_cache_page(struct page *page)
    +{
    + VM_BUG_ON(page_mapped(page));
    + __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
    +}
    +
    /*
    * Before starting migration, account against new page.
    */
    @@ -757,10 +758,13 @@ int mem_cgroup_prepare_migration(struct
    return ret;
    }

    -/* remove redundant charge */
    +/* remove redundant charge if migration failed*/
    void mem_cgroup_end_migration(struct page *newpage)
    {
    - mem_cgroup_uncharge_page(newpage);
    + /* At success, page->mapping is not NULL */
    + if (newpage->mapping)
    + __mem_cgroup_uncharge_common(newpage,
    + MEM_CGROUP_CHARGE_TYPE_FORCE);
    }


    @@ -779,7 +783,6 @@ static void mem_cgroup_drop_all_pages(st

    /*
    * This routine traverse page_cgroup in given list and drop them all.
    - * This routine ignores page_cgroup->ref_cnt.
    * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
    */
    #define FORCE_UNCHARGE_BATCH (128)
    @@ -809,7 +812,8 @@ static void mem_cgroup_force_empty_list(
    * if it's under page migration.
    */
    if (PageLRU(page)) {
    - mem_cgroup_uncharge_page(page);
    + __mem_cgroup_uncharge_common(page,
    + MEM_CGROUP_CHARGE_TYPE_FORCE);
    put_page(page);
    if (--count <= 0) {
    count = FORCE_UNCHARGE_BATCH;
    Index: linux-2.6.26-rc2/mm/filemap.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/filemap.c
    +++ linux-2.6.26-rc2/mm/filemap.c
    @@ -118,7 +118,7 @@ void __remove_from_page_cache(struct pag
    {
    struct address_space *mapping = page->mapping;

    - mem_cgroup_uncharge_page(page);
    + mem_cgroup_uncharge_cache_page(page);
    radix_tree_delete(&mapping->page_tree, page->index);
    page->mapping = NULL;
    mapping->nrpages--;
    @@ -476,12 +476,12 @@ int add_to_page_cache(struct page *page,
    mapping->nrpages++;
    __inc_zone_page_state(page, NR_FILE_PAGES);
    } else
    - mem_cgroup_uncharge_page(page);
    + mem_cgroup_uncharge_cache_page(page);

    write_unlock_irq(&mapping->tree_lock);
    radix_tree_preload_end();
    } else
    - mem_cgroup_uncharge_page(page);
    + mem_cgroup_uncharge_cache_page(page);
    out:
    return error;
    }
    Index: linux-2.6.26-rc2/mm/migrate.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/migrate.c
    +++ linux-2.6.26-rc2/mm/migrate.c
    @@ -358,8 +358,7 @@ static int migrate_page_move_mapping(str

    write_unlock_irq(&mapping->tree_lock);
    if (!PageSwapCache(newpage)) {
    - mem_cgroup_uncharge_page(page);
    - mem_cgroup_getref(newpage);
    + mem_cgroup_uncharge_cache_page(page);
    }

    return 0;
    Index: linux-2.6.26-rc2/include/linux/memcontrol.h
    ===================================================================
    --- linux-2.6.26-rc2.orig/include/linux/memcontrol.h
    +++ linux-2.6.26-rc2/include/linux/memcontrol.h
    @@ -35,6 +35,8 @@ extern int mem_cgroup_charge(struct page
    extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
    gfp_t gfp_mask);
    extern void mem_cgroup_uncharge_page(struct page *page);
    +extern void mem_cgroup_uncharge_cache_page(struct page *page);
    +extern int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask);
    extern void mem_cgroup_move_lists(struct page *page, bool active);
    extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
    struct list_head *dst,
    @@ -53,7 +55,6 @@ extern struct mem_cgroup *mem_cgroup_fro
    extern int
    mem_cgroup_prepare_migration(struct page *page, struct page *newpage);
    extern void mem_cgroup_end_migration(struct page *page);
    -extern int mem_cgroup_getref(struct page *page);

    /*
    * For memory reclaim.
    @@ -97,6 +98,14 @@ static inline int mem_cgroup_cache_charg
    static inline void mem_cgroup_uncharge_page(struct page *page)
    {
    }
    +static inline void mem_cgroup_uncharge_cache_page(struct page *page)
    +{
    +}
    +
    +static inline int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
    +{
    + return 0;
    +}

    static inline void mem_cgroup_move_lists(struct page *page, bool active)
    {
    @@ -123,10 +132,6 @@ static inline void mem_cgroup_end_migrat
    {
    }

    -static inline void mem_cgroup_getref(struct page *page)
    -{
    -}
    -
    static inline int mem_cgroup_calc_mapped_ratio(struct mem_cgroup *mem)
    {
    return 0;
    Index: linux-2.6.26-rc2/mm/rmap.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/rmap.c
    +++ linux-2.6.26-rc2/mm/rmap.c
    @@ -576,14 +576,8 @@ void page_add_anon_rmap(struct page *pag
    VM_BUG_ON(address < vma->vm_start || address >= vma->vm_end);
    if (atomic_inc_and_test(&page->_mapcount))
    __page_set_anon_rmap(page, vma, address);
    - else {
    + else
    __page_check_anon_rmap(page, vma, address);
    - /*
    - * We unconditionally charged during prepare, we uncharge here
    - * This takes care of balancing the reference counts
    - */
    - mem_cgroup_uncharge_page(page);
    - }
    }

    /**
    @@ -614,12 +608,6 @@ void page_add_file_rmap(struct page *pag
    {
    if (atomic_inc_and_test(&page->_mapcount))
    __inc_zone_page_state(page, NR_FILE_MAPPED);
    - else
    - /*
    - * We unconditionally charged during prepare, we uncharge here
    - * This takes care of balancing the reference counts
    - */
    - mem_cgroup_uncharge_page(page);
    }

    #ifdef CONFIG_DEBUG_VM
    Index: linux-2.6.26-rc2/mm/shmem.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/shmem.c
    +++ linux-2.6.26-rc2/mm/shmem.c
    @@ -961,13 +961,14 @@ found:
    shmem_swp_unmap(ptr);
    spin_unlock(&info->lock);
    radix_tree_preload_end();
    -uncharge:
    - mem_cgroup_uncharge_page(page);
    out:
    unlock_page(page);
    page_cache_release(page);
    iput(inode); /* allows for NULL */
    return error;
    +uncharge:
    + mem_cgroup_uncharge_cache_page(page);
    + goto out;
    }

    /*
    @@ -1319,7 +1320,7 @@ repeat:
    page_cache_release(swappage);
    goto failed;
    }
    - mem_cgroup_uncharge_page(swappage);
    + mem_cgroup_uncharge_cache_page(swappage);
    }
    page_cache_release(swappage);
    goto repeat;
    @@ -1389,7 +1390,7 @@ repeat:
    if (error || swap.val || 0 != add_to_page_cache_lru(
    filepage, mapping, idx, GFP_NOWAIT)) {
    spin_unlock(&info->lock);
    - mem_cgroup_uncharge_page(filepage);
    + mem_cgroup_uncharge_cache_page(filepage);
    page_cache_release(filepage);
    shmem_unacct_blocks(info->flags, 1);
    shmem_free_blocks(inode, 1);
    @@ -1398,7 +1399,6 @@ repeat:
    goto failed;
    goto repeat;
    }
    - mem_cgroup_uncharge_page(filepage);
    info->flags |= SHMEM_PAGEIN;
    }



  6. [RFC/PATCH 5/6] memcg: optimize branch

    Show branch direction hints for obvious conditions.
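    For reference, likely()/unlikely() expand to GCC's __builtin_expect(),
    which tells the compiler which way a branch usually goes so the common
    path can be laid out fall-through. A minimal userspace sketch (the
    macros match the kernel's definitions; parse_len is a toy user):

```c
#include <assert.h>

/* The kernel's definitions, from include/linux/compiler.h. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Toy user: NULL input is the rare error path. */
static int parse_len(const char *s)
{
	int n = 0;

	if (unlikely(s == 0))
		return -1;
	while (likely(s[n] != '\0'))
		n++;
	return n;
}
```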

    Signed-off-by: KAMEZAWA Hiroyuki

    ---
    mm/memcontrol.c | 8 ++++----
    1 file changed, 4 insertions(+), 4 deletions(-)

    Index: linux-2.6.26-rc2/mm/memcontrol.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/memcontrol.c
    +++ linux-2.6.26-rc2/mm/memcontrol.c
    @@ -550,7 +550,7 @@ retry:
    * The page_cgroup exists and
    * the page has already been accounted.
    */
    - if (pc) {
    + if (unlikely(pc)) {
    VM_BUG_ON(pc->page != page);
    VM_BUG_ON(!pc->mem_cgroup);
    unlock_page_cgroup(page);
    @@ -559,7 +559,7 @@ retry:
    unlock_page_cgroup(page);

    pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
    - if (pc == NULL)
    + if (unlikely(pc == NULL))
    goto err;

    /*
    @@ -616,7 +616,7 @@ retry:
    pc->flags = PAGE_CGROUP_FLAG_ACTIVE;

    lock_page_cgroup(page);
    - if (page_get_page_cgroup(page)) {
    + if (unlikely(page_get_page_cgroup(page))) {
    unlock_page_cgroup(page);
    /*
    * Another charge has been added to this page already.
    @@ -689,7 +689,7 @@ void __mem_cgroup_uncharge_common(struct
    */
    lock_page_cgroup(page);
    pc = page_get_page_cgroup(page);
    - if (!pc)
    + if (unlikely(!pc))
    goto unlock;

    VM_BUG_ON(pc->page != page);


  7. [RFC/PATCH 6/6] memcg: remove redundant check at charge

    Because of the remove-refcnt patch, it is now a very rare case that
    mem_cgroup_charge_common() is called against a page which is already
    accounted.

    mem_cgroup_charge_common() is called when:
    1. a page is added into the file cache.
    2. an anon page is _newly_ mapped.

    A racy case is a newly-swapped-in anonymous page referenced from
    multiple threads in do_swap_page() at the same time.
    (The page is not locked when mem_cgroup_charge() is called from
    do_swap_page.)

    Signed-off-by: KAMEZAWA Hiroyuki

    ---
    mm/memcontrol.c | 34 ++++------------------------------
    1 file changed, 4 insertions(+), 30 deletions(-)

    Index: linux-2.6.26-rc2/mm/memcontrol.c
    ===================================================================
    --- linux-2.6.26-rc2.orig/mm/memcontrol.c
    +++ linux-2.6.26-rc2/mm/memcontrol.c
    @@ -536,28 +536,6 @@ static int mem_cgroup_charge_common(stru
    if (mem_cgroup_subsys.disabled)
    return 0;

    - /*
    - * Should page_cgroup's go to their own slab?
    - * One could optimize the performance of the charging routine
    - * by saving a bit in the page_flags and using it as a lock
    - * to see if the cgroup page already has a page_cgroup associated
    - * with it
    - */
    -retry:
    - lock_page_cgroup(page);
    - pc = page_get_page_cgroup(page);
    - /*
    - * The page_cgroup exists and
    - * the page has already been accounted.
    - */
    - if (unlikely(pc)) {
    - VM_BUG_ON(pc->page != page);
    - VM_BUG_ON(!pc->mem_cgroup);
    - unlock_page_cgroup(page);
    - goto done;
    - }
    - unlock_page_cgroup(page);
    -
    pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
    if (unlikely(pc == NULL))
    goto err;
    @@ -618,15 +596,10 @@ retry:
    lock_page_cgroup(page);
    if (unlikely(page_get_page_cgroup(page))) {
    unlock_page_cgroup(page);
    - /*
    - * Another charge has been added to this page already.
    - * We take lock_page_cgroup(page) again and read
    - * page->cgroup, increment refcnt.... just retry is OK.
    - */
    res_counter_uncharge(&mem->res, PAGE_SIZE);
    css_put(&mem->css);
    kmem_cache_free(page_cgroup_cache, pc);
    - goto retry;
    + goto done;
    }
    page_assign_page_cgroup(page, pc);



  8. Re: [RFC/PATCH 4/6] memcg: shmem reclaim helper

    KAMEZAWA Hiroyuki wrote:
    > A new call, mem_cgroup_shrink_usage() is added for shmem handling
    > and removing not usual usage of mem_cgroup_charge/uncharge.
    >
    > Now, shmem calls mem_cgroup_charge() just for reclaim some pages from
    > mem_cgroup. In general, shmem is used by some process group and not for
    > global resource (like file caches). So, it's reasonable to reclaim pages from
    > mem_cgroup where shmem is mainly used.
    >
    > Signed-off-by: KAMEZAWA Hiroyuki
    >
    > ---
    > mm/memcontrol.c | 18 ++++++++++++++++++
    > 1 file changed, 18 insertions(+)
    >
    > Index: linux-2.6.26-rc2/mm/memcontrol.c
    > ===================================================================
    > --- linux-2.6.26-rc2.orig/mm/memcontrol.c
    > +++ linux-2.6.26-rc2/mm/memcontrol.c
    > @@ -783,6 +783,30 @@ static void mem_cgroup_drop_all_pages(st
    > }
    >
    > /*
    > + * A call to try to shrink memory usage under specified resource controller.
    > + * This is typically used for page reclaiming for shmem for reducing side
    > + * effect of page allocation from shmem, which is used by some mem_cgroup.
    > + */
    > +int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
    > +{
    > + struct mem_cgroup *mem;
    > + int progress = 0;
    > + int retry = MEM_CGROUP_RECLAIM_RETRIES;
    > +
    > + rcu_read_lock();
    > + mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
    > + css_get(&mem->css);
    > + rcu_read_unlock();
    > +
    > + while(!progress && --retry) {
    > + progress = try_to_free_mem_cgroup_pages(mem, gfp_mask);
    > + }


    This is wrong. How about:
    do {
    ...
    } while (!progress && --retry);

    > + if (!retry)
    > + return -ENOMEM;
    > + return 0;
    > +}
    > +
    > +/*
    > * This routine traverse page_cgroup in given list and drop them all.
    > * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
    > */


  9. Re: [RFC/PATCH 4/6] memcg: shmem reclaim helper

    On Wed, 14 May 2008 16:15:49 +0800
    Li Zefan wrote:

    > > + while(!progress && --retry) {
    > > + progress = try_to_free_mem_cgroup_pages(mem, gfp_mask);
    > > + }

    >
    > This is wrong. How about:
    > do {
    > ...
    > } while (!progress && --retry);
    >

    Ouch, or retry--....

    Thank you for catching.

    Thanks
    -Kame


  10. Re: [RFC/PATCH 4/6] memcg: shmem reclaim helper

    KAMEZAWA Hiroyuki wrote:
    > On Wed, 14 May 2008 16:15:49 +0800
    > Li Zefan wrote:
    >
    >>> + while(!progress && --retry) {
    >>> + progress = try_to_free_mem_cgroup_pages(mem, gfp_mask);
    >>> + }

    >> This is wrong. How about:
    >> do {
    >> ...
    >> } while (!progress && --retry);
    >>

    > Ouch, or retry--....
    >


    retry-- is still wrong: after the while loop exits, retry will be -1, and then:

    + if (!retry)
    + return -ENOMEM;

    > Thank you for catching.
    >



  11. Re: [RFC/PATCH 4/6] memcg: shmem reclaim helper

    On Wed, 14 May 2008 16:23:09 +0800
    Li Zefan wrote:

    > KAMEZAWA Hiroyuki wrote:
    > > On Wed, 14 May 2008 16:15:49 +0800
    > > Li Zefan wrote:
    > >
    > >>> + while(!progress && --retry) {
    > >>> + progress = try_to_free_mem_cgroup_pages(mem, gfp_mask);
    > >>> + }
    > >> This is wrong. How about:
    > >> do {
    > >> ...
    > >> } while (!progress && --retry);
    > >>

    > > Ouch, or retry--....
    > >

    >
    > retry-- is still wrong, because then after while, retry will be -1, and then:
    >
    > + if (!retry)
    > + return -ENOMEM;
    >

    OK, I'm rewriting it now.

    Thanks,
    -Kame


  12. Re: [RFC/PATCH 2/6] memcg: remove refcnt

    > /*
    > - * Uncharging is always a welcome operation, we never complain, simply
    > - * uncharge.
    > + * uncharge if !page_mapped(page)
    > */
    > -void mem_cgroup_uncharge_page(struct page *page)
    > +void __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)


    static void ?

    > {
    > struct page_cgroup *pc;
    > struct mem_cgroup *mem;
    > @@ -704,29 +693,41 @@ void mem_cgroup_uncharge_page(struct pag
    > goto unlock;
    >


    [..snip..]

    > Index: linux-2.6.26-rc2/include/linux/memcontrol.h
    > ===================================================================
    > --- linux-2.6.26-rc2.orig/include/linux/memcontrol.h
    > +++ linux-2.6.26-rc2/include/linux/memcontrol.h
    > @@ -35,6 +35,8 @@ extern int mem_cgroup_charge(struct page
    > extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
    > gfp_t gfp_mask);
    > extern void mem_cgroup_uncharge_page(struct page *page);
    > +extern void mem_cgroup_uncharge_cache_page(struct page *page);
    > +extern int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask);


    This function is defined and used in the 4th patch, so the declaration
    should be moved to that patch.

    > extern void mem_cgroup_move_lists(struct page *page, bool active);
    > extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
    > struct list_head *dst,
    > @@ -53,7 +55,6 @@ extern struct mem_cgroup *mem_cgroup_fro
    > extern int
    > mem_cgroup_prepare_migration(struct page *page, struct page *newpage);
    > extern void mem_cgroup_end_migration(struct page *page);
    > -extern int mem_cgroup_getref(struct page *page);
    >
    > /*
    > * For memory reclaim.
    > @@ -97,6 +98,14 @@ static inline int mem_cgroup_cache_charg
    > static inline void mem_cgroup_uncharge_page(struct page *page)
    > {
    > }


    need a blank line here

    > +static inline void mem_cgroup_uncharge_cache_page(struct page *page)
    > +{
    > +}
    > +


    [..snip..]

    > #ifdef CONFIG_DEBUG_VM
    > Index: linux-2.6.26-rc2/mm/shmem.c
    > ===================================================================
    > --- linux-2.6.26-rc2.orig/mm/shmem.c
    > +++ linux-2.6.26-rc2/mm/shmem.c
    > @@ -961,13 +961,14 @@ found:
    > shmem_swp_unmap(ptr);
    > spin_unlock(&info->lock);
    > radix_tree_preload_end();
    > -uncharge:
    > - mem_cgroup_uncharge_page(page);
    > out:
    > unlock_page(page);
    > page_cache_release(page);
    > iput(inode); /* allows for NULL */
    > return error;
    > +uncharge:
    > + mem_cgroup_uncharge_cache_page(page);
    > + goto out;
    > }
    >


    It seems the logic is changed here. Is it intended?

  13. Re: [RFC/PATCH 2/6] memcg: remove refcnt

    On Thu, 15 May 2008 09:42:36 +0800
    Li Zefan wrote:

    > > /*
    > > - * Uncharging is always a welcome operation, we never complain, simply
    > > - * uncharge.
    > > + * uncharge if !page_mapped(page)
    > > */
    > > -void mem_cgroup_uncharge_page(struct page *page)
    > > +void __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)

    >
    > static void ?
    >

    Ah yes. will fix.

    > > Index: linux-2.6.26-rc2/include/linux/memcontrol.h
    > > ===================================================================
    > > --- linux-2.6.26-rc2.orig/include/linux/memcontrol.h
    > > +++ linux-2.6.26-rc2/include/linux/memcontrol.h
    > > @@ -35,6 +35,8 @@ extern int mem_cgroup_charge(struct page
    > > extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
    > > gfp_t gfp_mask);
    > > extern void mem_cgroup_uncharge_page(struct page *page);
    > > +extern void mem_cgroup_uncharge_cache_page(struct page *page);
    > > +extern int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask);

    >
    > This function is defined and used in the 4th patch, so the declaration
    > should be moved to that patch.

    Ouch, I'll fix.


    >
    > > extern void mem_cgroup_move_lists(struct page *page, bool active);
    > > extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
    > > struct list_head *dst,
    > > @@ -53,7 +55,6 @@ extern struct mem_cgroup *mem_cgroup_fro
    > > extern int
    > > mem_cgroup_prepare_migration(struct page *page, struct page *newpage);
    > > extern void mem_cgroup_end_migration(struct page *page);
    > > -extern int mem_cgroup_getref(struct page *page);
    > >
    > > /*
    > > * For memory reclaim.
    > > @@ -97,6 +98,14 @@ static inline int mem_cgroup_cache_charg
    > > static inline void mem_cgroup_uncharge_page(struct page *page)
    > > {
    > > }

    >
    > need a blank line here
    >

    ok.

    > > +static inline void mem_cgroup_uncharge_cache_page(struct page *page)
    > > +{
    > > +}
    > > +

    >
    > [..snip..]
    >
    > > #ifdef CONFIG_DEBUG_VM
    > > Index: linux-2.6.26-rc2/mm/shmem.c
    > > ===================================================================
    > > --- linux-2.6.26-rc2.orig/mm/shmem.c
    > > +++ linux-2.6.26-rc2/mm/shmem.c
    > > @@ -961,13 +961,14 @@ found:
    > > shmem_swp_unmap(ptr);
    > > spin_unlock(&info->lock);
    > > radix_tree_preload_end();
    > > -uncharge:
    > > - mem_cgroup_uncharge_page(page);
    > > out:
    > > unlock_page(page);
    > > page_cache_release(page);
    > > iput(inode); /* allows for NULL */
    > > return error;
    > > +uncharge:
    > > + mem_cgroup_uncharge_cache_page(page);
    > > + goto out;
    > > }
    > >

    >
    > It seems the logic is changed here. Is it intended?
    >

    Intended. (If it succeeds, the uncharge is not necessary because there is no
    refcnt.) I'll add a comment.

    Thanks,
    -Kame


  14. Re: [RFC/PATCH 2/6] memcg: remove refcnt

    On Thu, 15 May 2008 10:57:40 +0900
    KAMEZAWA Hiroyuki wrote:
    > > > #ifdef CONFIG_DEBUG_VM
    > > > Index: linux-2.6.26-rc2/mm/shmem.c
    > > > ===================================================================
    > > > --- linux-2.6.26-rc2.orig/mm/shmem.c
    > > > +++ linux-2.6.26-rc2/mm/shmem.c
    > > > @@ -961,13 +961,14 @@ found:
    > > > shmem_swp_unmap(ptr);
    > > > spin_unlock(&info->lock);
    > > > radix_tree_preload_end();
    > > > -uncharge:
    > > > - mem_cgroup_uncharge_page(page);
    > > > out:
    > > > unlock_page(page);
    > > > page_cache_release(page);
    > > > iput(inode); /* allows for NULL */
    > > > return error;
    > > > +uncharge:
    > > > + mem_cgroup_uncharge_cache_page(page);
    > > > + goto out;
    > > > }
    > > >

    > >
    > > It seems the logic is changed here. Is it intended?
    > >

    > Intended. (If it succeeds, the uncharge is not necessary because there is no
    > refcnt.) I'll add a comment.
    >

    But patch 6/6 doesn't seem to be optimal in this case, and it has some
    trouble... I'll find a workaround. It seems shmem's memcg handling is
    complicated...

    Thanks,
    -Kame


