[-mm] Add an owner to the mm_struct (v8) - Kernel



Thread: [-mm] Add an owner to the mm_struct (v8)

  1. [-mm] Add an owner to the mm_struct (v8)



    Changelog v7
    ------------
    1. Make mm_need_new_owner() more readable
    2. Remove extra white space from init_task.h

    Changelog v6
    ------------

    1. Fix typos
    2. Document the use of delay_group_leader()

    Changelog v5
    ------------
    Remove the hooks for .owner from init_task.h and move it to init/main.c

    Changelog v4
    ------------
    1. Release rcu_read_lock() after acquiring task_lock(). Also get a reference
    to the task_struct
    2. Change cgroup mm_owner_changed callback to callback only if the
    cgroup of old and new task is different and to pass the old and new
    cgroups instead of task pointers
    3. Port the patch to 2.6.25-rc8-mm1

    Changelog v3
    ------------

    1. Add mm->owner change callbacks using cgroups

    This patch removes the mem_cgroup member from mm_struct and instead adds
    an owner. This approach was suggested by Paul Menage. The advantage of
    this approach is that, once mm->owner is known, the cgroup can be
    determined using the subsystem id. It also allows several control groups
    that are virtually grouped by mm_struct to exist independently of the
    memory controller, i.e., without adding a mem_cgroup-style pointer to
    mm_struct for each controller.

    A new config option CONFIG_MM_OWNER is added and the memory resource
    controller selects this config option.

    This patch also adds cgroup callbacks to notify subsystems when mm->owner
    changes. The mm_owner_changed callback is called with the task_lock()
    of the new task held, just prior to changing mm->owner.

    I am indebted to Paul Menage for the several reviews of this patchset
    and helping me make it lighter and simpler.

    This patch was tested on a powerpc box; it was compiled with the
    MM_OWNER config both turned on and off.

    After the thread group leader exits, it is moved to the init_css_set by
    cgroup_exit(); thus all future charges from running threads would
    be redirected to the init_css_set's subsystem.

    Signed-off-by: Balbir Singh
    ---

    fs/exec.c | 1
    include/linux/cgroup.h | 15 +++++++
    include/linux/memcontrol.h | 16 +-------
    include/linux/mm_types.h | 5 +-
    include/linux/sched.h | 13 ++++++
    init/Kconfig | 15 +++++++
    init/main.c | 1
    kernel/cgroup.c | 30 +++++++++++++++
    kernel/exit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++
    kernel/fork.c | 11 ++++-
    mm/memcontrol.c | 24 +-----------
    11 files changed, 181 insertions(+), 39 deletions(-)

    diff -puN fs/exec.c~memory-controller-add-mm-owner fs/exec.c
    --- linux-2.6.25-rc8/fs/exec.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/fs/exec.c 2008-04-03 22:43:27.000000000 +0530
    @@ -735,6 +735,7 @@ static int exec_mmap(struct mm_struct *m
    tsk->active_mm = mm;
    activate_mm(active_mm, mm);
    task_unlock(tsk);
    + mm_update_next_owner(mm);
    arch_pick_mmap_layout(mm);
    if (old_mm) {
    up_read(&old_mm->mmap_sem);
    diff -puN include/linux/cgroup.h~memory-controller-add-mm-owner include/linux/cgroup.h
    --- linux-2.6.25-rc8/include/linux/cgroup.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/include/linux/cgroup.h 2008-04-03 22:43:27.000000000 +0530
    @@ -300,6 +300,12 @@ struct cgroup_subsys {
    struct cgroup *cgrp);
    void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
    void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
    + /*
    + * This routine is called with the task_lock of mm->owner held
    + */
    + void (*mm_owner_changed)(struct cgroup_subsys *ss,
    + struct cgroup *old,
    + struct cgroup *new);
    int subsys_id;
    int active;
    int disabled;
    @@ -385,4 +391,13 @@ static inline int cgroupstats_build(stru

    #endif /* !CONFIG_CGROUPS */

    +#ifdef CONFIG_MM_OWNER
    +extern void
    +cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new);
    +#else /* !CONFIG_MM_OWNER */
    +static inline void
    +cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
    +{
    +}
    +#endif /* CONFIG_MM_OWNER */
    #endif /* _LINUX_CGROUP_H */
    diff -puN include/linux/init_task.h~memory-controller-add-mm-owner include/linux/init_task.h
    diff -puN include/linux/memcontrol.h~memory-controller-add-mm-owner include/linux/memcontrol.h
    --- linux-2.6.25-rc8/include/linux/memcontrol.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/include/linux/memcontrol.h 2008-04-03 22:43:27.000000000 +0530
    @@ -27,9 +27,6 @@ struct mm_struct;

    #ifdef CONFIG_CGROUP_MEM_RES_CTLR

    -extern void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p);
    -extern void mm_free_cgroup(struct mm_struct *mm);
    -
    #define page_reset_bad_cgroup(page) ((page)->page_cgroup = 0)

    extern struct page_cgroup *page_get_page_cgroup(struct page *page);
    @@ -48,8 +45,10 @@ extern unsigned long mem_cgroup_isolate_
    extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
    int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);

    +extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
    +
    #define mm_match_cgroup(mm, cgroup) \
    - ((cgroup) == rcu_dereference((mm)->mem_cgroup))
    + ((cgroup) == mem_cgroup_from_task((mm)->owner))

    extern int mem_cgroup_prepare_migration(struct page *page);
    extern void mem_cgroup_end_migration(struct page *page);
    @@ -73,15 +72,6 @@ extern long mem_cgroup_calc_reclaim_inac
    struct zone *zone, int priority);

    #else /* CONFIG_CGROUP_MEM_RES_CTLR */
    -static inline void mm_init_cgroup(struct mm_struct *mm,
    - struct task_struct *p)
    -{
    -}
    -
    -static inline void mm_free_cgroup(struct mm_struct *mm)
    -{
    -}
    -
    static inline void page_reset_bad_cgroup(struct page *page)
    {
    }
    diff -puN include/linux/mm_types.h~memory-controller-add-mm-owner include/linux/mm_types.h
    --- linux-2.6.25-rc8/include/linux/mm_types.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/include/linux/mm_types.h 2008-04-03 22:43:27.000000000 +0530
    @@ -230,8 +230,9 @@ struct mm_struct {
    /* aio bits */
    rwlock_t ioctx_list_lock; /* aio lock */
    struct kioctx *ioctx_list;
    -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
    - struct mem_cgroup *mem_cgroup;
    +#ifdef CONFIG_MM_OWNER
    + struct task_struct *owner; /* The thread group leader that */
    + /* owns the mm_struct. */
    #endif

    #ifdef CONFIG_PROC_FS
    diff -puN include/linux/sched.h~memory-controller-add-mm-owner include/linux/sched.h
    --- linux-2.6.25-rc8/include/linux/sched.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/include/linux/sched.h 2008-04-03 22:43:27.000000000 +0530
    @@ -2144,6 +2144,19 @@ static inline void migration_init(void)

    #define TASK_STATE_TO_CHAR_STR "RSDTtZX"

    +#ifdef CONFIG_MM_OWNER
    +extern void mm_update_next_owner(struct mm_struct *mm);
    +extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
    +#else
    +static inline void mm_update_next_owner(struct mm_struct *mm)
    +{
    +}
    +
    +static inline void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
    +{
    +}
    +#endif /* CONFIG_MM_OWNER */
    +
    #endif /* __KERNEL__ */

    #endif
    diff -puN init/Kconfig~memory-controller-add-mm-owner init/Kconfig
    --- linux-2.6.25-rc8/init/Kconfig~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/init/Kconfig 2008-04-03 22:45:18.000000000 +0530
    @@ -371,9 +371,21 @@ config RESOURCE_COUNTERS
    infrastructure that works with cgroups
    depends on CGROUPS

    +config MM_OWNER
    + bool "Enable ownership of mm structure"
    + help
    + This option enables mm_structs to have an owner. The advantage
    + of this approach is that it allows several independent memory-based
    + cgroup controllers to co-exist without much space overhead.
    +
    + This feature adds fork/exit overhead, so enable it only if
    + you need resource controllers.
    +
    config CGROUP_MEM_RES_CTLR
    bool "Memory Resource Controller for Control Groups"
    depends on CGROUPS && RESOURCE_COUNTERS
    + select MM_OWNER
    help
    Provides a memory resource controller that manages both page cache and
    RSS memory.
    @@ -386,6 +398,9 @@ config CGROUP_MEM_RES_CTLR
    Only enable when you're ok with these trade offs and really
    sure you need the memory resource controller.

    + This config option also selects the MM_OWNER config option,
    + which in turn adds some fork/exit overhead.
    +
    config SYSFS_DEPRECATED
    bool

    diff -puN kernel/cgroup.c~memory-controller-add-mm-owner kernel/cgroup.c
    --- linux-2.6.25-rc8/kernel/cgroup.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/kernel/cgroup.c 2008-04-03 22:43:27.000000000 +0530
    @@ -118,6 +118,7 @@ static int root_count;
    * be called.
    */
    static int need_forkexit_callback;
    +static int need_mm_owner_callback;

    /* convenient tests for these bits */
    inline int cgroup_is_removed(const struct cgroup *cgrp)
    @@ -2485,6 +2486,7 @@ static void __init cgroup_init_subsys(st
    }

    need_forkexit_callback |= ss->fork || ss->exit;
    + need_mm_owner_callback |= !!ss->mm_owner_changed;

    ss->active = 1;
    }
    @@ -2721,6 +2723,34 @@ void cgroup_fork_callbacks(struct task_s
    }
    }

    +#ifdef CONFIG_MM_OWNER
    +/**
    + * cgroup_mm_owner_callbacks - run callbacks when the mm->owner changes
    + * @old: the previous owner of the mm
    + * @new: the task taking over ownership
    + *
    + * Called on every change to mm->owner. mm_init_owner() does not
    + * invoke this routine, since it assigns the mm->owner the first time
    + * and does not change it.
    + */
    +void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
    +{
    + struct cgroup *oldcgrp, *newcgrp;
    +
    + if (need_mm_owner_callback) {
    + int i;
    + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
    + struct cgroup_subsys *ss = subsys[i];
    + oldcgrp = task_cgroup(old, ss->subsys_id);
    + newcgrp = task_cgroup(new, ss->subsys_id);
    + if (oldcgrp == newcgrp)
    + continue;
    + if (ss->mm_owner_changed)
    + ss->mm_owner_changed(ss, oldcgrp, newcgrp);
    + }
    + }
    +}
    +#endif /* CONFIG_MM_OWNER */
    +
    /**
    * cgroup_post_fork - called on a new task after adding it to the task list
    * @child: the task in question
    diff -puN kernel/exit.c~memory-controller-add-mm-owner kernel/exit.c
    --- linux-2.6.25-rc8/kernel/exit.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/kernel/exit.c 2008-04-04 00:56:51.000000000 +0530
    @@ -577,6 +577,94 @@ void exit_fs(struct task_struct *tsk)

    EXPORT_SYMBOL_GPL(exit_fs);

    +#ifdef CONFIG_MM_OWNER
    +/*
    + * Task p is exiting and it owned mm, so let's find a new owner for it
    + */
    +static inline int
    +mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
    +{
    + /*
    + * If there are other users of the mm and the owner (us) is exiting
    + * we need to find a new owner to take on the responsibility.
    + * When we use thread groups (CLONE_THREAD), the thread group
    + * leader is kept around in zombie state, even after it exits.
    + * delay_group_leader() ensures that if the group leader is around
    + * we need not select a new owner.
    + */
    + if (!mm)
    + return 0;
    + if (atomic_read(&mm->mm_users) <= 1)
    + return 0;
    + if (mm->owner != p)
    + return 0;
    + if (delay_group_leader(p))
    + return 0;
    + return 1;
    +}
    +
    +void mm_update_next_owner(struct mm_struct *mm)
    +{
    + struct task_struct *c, *g, *p = current;
    +
    +retry:
    + if (!mm_need_new_owner(mm, p))
    + return;
    +
    + rcu_read_lock();
    + /*
    + * Search in the children
    + */
    + list_for_each_entry(c, &p->children, sibling) {
    + if (c->mm == mm)
    + goto assign_new_owner;
    + }
    +
    + /*
    + * Search in the siblings
    + */
    + list_for_each_entry(c, &p->parent->children, sibling) {
    + if (c->mm == mm)
    + goto assign_new_owner;
    + }
    +
    + /*
    + * Search through everything else. We should not get
    + * here often
    + */
    + do_each_thread(g, c) {
    + if (c->mm == mm)
    + goto assign_new_owner;
    + } while_each_thread(g, c);
    +
    + rcu_read_unlock();
    + return;
    +
    +assign_new_owner:
    + BUG_ON(c == p);
    + get_task_struct(c);
    + /*
    + * The task_lock protects c->mm from changing.
    + * We always want mm->owner->mm == mm
    + */
    + task_lock(c);
    + /*
    + * Delay rcu_read_unlock() till we have the task_lock()
    + * to ensure that c does not slip away underneath us
    + */
    + rcu_read_unlock();
    + if (c->mm != mm) {
    + task_unlock(c);
    + put_task_struct(c);
    + goto retry;
    + }
    + cgroup_mm_owner_callbacks(mm->owner, c);
    + mm->owner = c;
    + task_unlock(c);
    + put_task_struct(c);
    +}
    +#endif /* CONFIG_MM_OWNER */
    +
    /*
    * Turn us into a lazy TLB process if we
    * aren't already..
    @@ -616,6 +704,7 @@ static void exit_mm(struct task_struct *
    /* We don't want this task to be frozen prematurely */
    clear_freeze_flag(tsk);
    task_unlock(tsk);
    + mm_update_next_owner(mm);
    mmput(mm);
    }

    diff -puN kernel/fork.c~memory-controller-add-mm-owner kernel/fork.c
    --- linux-2.6.25-rc8/kernel/fork.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/kernel/fork.c 2008-04-03 22:43:27.000000000 +0530
    @@ -358,14 +358,13 @@ static struct mm_struct * mm_init(struct
    mm->ioctx_list = NULL;
    mm->free_area_cache = TASK_UNMAPPED_BASE;
    mm->cached_hole_size = ~0UL;
    - mm_init_cgroup(mm, p);
    + mm_init_owner(mm, p);

    if (likely(!mm_alloc_pgd(mm))) {
    mm->def_flags = 0;
    return mm;
    }

    - mm_free_cgroup(mm);
    free_mm(mm);
    return NULL;
    }
    @@ -416,7 +415,6 @@ void mmput(struct mm_struct *mm)
    spin_unlock(&mmlist_lock);
    }
    put_swap_token(mm);
    - mm_free_cgroup(mm);
    mmdrop(mm);
    }
    }
    @@ -996,6 +994,13 @@ static void rt_mutex_init_task(struct ta
    #endif
    }

    +#ifdef CONFIG_MM_OWNER
    +void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
    +{
    + mm->owner = p;
    +}
    +#endif /* CONFIG_MM_OWNER */
    +
    /*
    * This creates a new process as a copy of the old one,
    * but does not actually start it yet.
    diff -puN mm/memcontrol.c~memory-controller-add-mm-owner mm/memcontrol.c
    --- linux-2.6.25-rc8/mm/memcontrol.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/mm/memcontrol.c 2008-04-03 22:46:51.000000000 +0530
    @@ -238,26 +238,12 @@ static struct mem_cgroup *mem_cgroup_fro
    css);
    }

    -static struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
    +struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
    {
    return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
    struct mem_cgroup, css);
    }

    -void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p)
    -{
    - struct mem_cgroup *mem;
    -
    - mem = mem_cgroup_from_task(p);
    - css_get(&mem->css);
    - mm->mem_cgroup = mem;
    -}
    -
    -void mm_free_cgroup(struct mm_struct *mm)
    -{
    - css_put(&mm->mem_cgroup->css);
    -}
    -
    static inline int page_cgroup_locked(struct page *page)
    {
    return bit_spin_is_locked(PAGE_CGROUP_LOCK_BIT, &page->page_cgroup);
    @@ -478,6 +464,7 @@ unsigned long mem_cgroup_isolate_pages(u
    int zid = zone_idx(z);
    struct mem_cgroup_per_zone *mz;

    + BUG_ON(!mem_cont);
    mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
    if (active)
    src = &mz->active_list;
    @@ -576,7 +563,7 @@ retry:
    mm = &init_mm;

    rcu_read_lock();
    - mem = rcu_dereference(mm->mem_cgroup);
    + mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
    /*
    * For every charge from the cgroup, increment reference count
    */
    @@ -1006,7 +993,6 @@ mem_cgroup_create(struct cgroup_subsys *

    if (unlikely((cont->parent) == NULL)) {
    mem = &init_mem_cgroup;
    - init_mm.mem_cgroup = mem;
    page_cgroup_cache = KMEM_CACHE(page_cgroup, SLAB_PANIC);
    } else
    mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
    @@ -1087,10 +1073,6 @@ static void mem_cgroup_move_task(struct
    if (!thread_group_leader(p))
    goto out;

    - css_get(&mem->css);
    - rcu_assign_pointer(mm->mem_cgroup, mem);
    - css_put(&old_mem->css);
    -
    out:
    mmput(mm);
    }
    diff -puN init/main.c~memory-controller-add-mm-owner init/main.c
    --- linux-2.6.25-rc8/init/main.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    +++ linux-2.6.25-rc8-balbir/init/main.c 2008-04-03 22:43:27.000000000 +0530
    @@ -537,6 +537,7 @@ asmlinkage void __init start_kernel(void
    printk(KERN_NOTICE);
    printk(linux_banner);
    setup_arch(&command_line);
    + mm_init_owner(&init_mm, &init_task);
    setup_command_line(command_line);
    unwind_setup();
    setup_per_cpu_areas();
    _

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  2. Re: [-mm] Add an owner to the mm_struct (v8)

    On Fri, Apr 4, 2008 at 1:05 AM, Balbir Singh wrote:
    >
    > After the thread group leader exits, it is moved to the init_css_set by
    > cgroup_exit(); thus all future charges from running threads would
    > be redirected to the init_css_set's subsystem.


    And its uncharges, which is more of the problem I was getting at
    earlier - surely when the mm is finally destroyed, all its virtual
    address space charges will be uncharged from the root cgroup rather
    than the correct cgroup, if we left the delayed group leader as the
    owner? Which is why I think the group leader optimization is unsafe.

    Paul

    >
    > Signed-off-by: Balbir Singh
    > ---
    >
    > fs/exec.c | 1
    > include/linux/cgroup.h | 15 +++++++
    > include/linux/memcontrol.h | 16 +-------
    > include/linux/mm_types.h | 5 +-
    > include/linux/sched.h | 13 ++++++
    > init/Kconfig | 15 +++++++
    > init/main.c | 1
    > kernel/cgroup.c | 30 +++++++++++++++
    > kernel/exit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++
    > kernel/fork.c | 11 ++++-
    > mm/memcontrol.c | 24 +-----------
    > 11 files changed, 181 insertions(+), 39 deletions(-)
    >
    > diff -puN fs/exec.c~memory-controller-add-mm-owner fs/exec.c
    > --- linux-2.6.25-rc8/fs/exec.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/fs/exec.c 2008-04-03 22:43:27.000000000 +0530
    > @@ -735,6 +735,7 @@ static int exec_mmap(struct mm_struct *m
    > tsk->active_mm = mm;
    > activate_mm(active_mm, mm);
    > task_unlock(tsk);
    > + mm_update_next_owner(mm);
    > arch_pick_mmap_layout(mm);
    > if (old_mm) {
    > up_read(&old_mm->mmap_sem);
    > diff -puN include/linux/cgroup.h~memory-controller-add-mm-owner include/linux/cgroup.h
    > --- linux-2.6.25-rc8/include/linux/cgroup.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/include/linux/cgroup.h 2008-04-03 22:43:27.000000000 +0530
    > @@ -300,6 +300,12 @@ struct cgroup_subsys {
    > struct cgroup *cgrp);
    > void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
    > void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
    > + /*
    > + * This routine is called with the task_lock of mm->owner held
    > + */
    > + void (*mm_owner_changed)(struct cgroup_subsys *ss,
    > + struct cgroup *old,
    > + struct cgroup *new);
    > int subsys_id;
    > int active;
    > int disabled;
    > @@ -385,4 +391,13 @@ static inline int cgroupstats_build(stru
    >
    > #endif /* !CONFIG_CGROUPS */
    >
    > +#ifdef CONFIG_MM_OWNER
    > +extern void
    > +cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new);
    > +#else /* !CONFIG_MM_OWNER */
    > +static inline void
    > +cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
    > +{
    > +}
    > +#endif /* CONFIG_MM_OWNER */
    > #endif /* _LINUX_CGROUP_H */
    > diff -puN include/linux/init_task.h~memory-controller-add-mm-owner include/linux/init_task.h
    > diff -puN include/linux/memcontrol.h~memory-controller-add-mm-owner include/linux/memcontrol.h
    > --- linux-2.6.25-rc8/include/linux/memcontrol.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/include/linux/memcontrol.h 2008-04-03 22:43:27.000000000 +0530
    > @@ -27,9 +27,6 @@ struct mm_struct;
    >
    > #ifdef CONFIG_CGROUP_MEM_RES_CTLR
    >
    > -extern void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p);
    > -extern void mm_free_cgroup(struct mm_struct *mm);
    > -
    > #define page_reset_bad_cgroup(page) ((page)->page_cgroup = 0)
    >
    > extern struct page_cgroup *page_get_page_cgroup(struct page *page);
    > @@ -48,8 +45,10 @@ extern unsigned long mem_cgroup_isolate_
    > extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
    > int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
    >
    > +extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
    > +
    > #define mm_match_cgroup(mm, cgroup) \
    > - ((cgroup) == rcu_dereference((mm)->mem_cgroup))
    > + ((cgroup) == mem_cgroup_from_task((mm)->owner))
    >
    > extern int mem_cgroup_prepare_migration(struct page *page);
    > extern void mem_cgroup_end_migration(struct page *page);
    > @@ -73,15 +72,6 @@ extern long mem_cgroup_calc_reclaim_inac
    > struct zone *zone, int priority);
    >
    > #else /* CONFIG_CGROUP_MEM_RES_CTLR */
    > -static inline void mm_init_cgroup(struct mm_struct *mm,
    > - struct task_struct *p)
    > -{
    > -}
    > -
    > -static inline void mm_free_cgroup(struct mm_struct *mm)
    > -{
    > -}
    > -
    > static inline void page_reset_bad_cgroup(struct page *page)
    > {
    > }
    > diff -puN include/linux/mm_types.h~memory-controller-add-mm-owner include/linux/mm_types.h
    > --- linux-2.6.25-rc8/include/linux/mm_types.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/include/linux/mm_types.h 2008-04-03 22:43:27.000000000 +0530
    > @@ -230,8 +230,9 @@ struct mm_struct {
    > /* aio bits */
    > rwlock_t ioctx_list_lock; /* aio lock */
    > struct kioctx *ioctx_list;
    > -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
    > - struct mem_cgroup *mem_cgroup;
    > +#ifdef CONFIG_MM_OWNER
    > + struct task_struct *owner; /* The thread group leader that */
    > + /* owns the mm_struct. */
    > #endif
    >
    > #ifdef CONFIG_PROC_FS
    > diff -puN include/linux/sched.h~memory-controller-add-mm-owner include/linux/sched.h
    > --- linux-2.6.25-rc8/include/linux/sched.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/include/linux/sched.h 2008-04-03 22:43:27.000000000 +0530
    > @@ -2144,6 +2144,19 @@ static inline void migration_init(void)
    >
    > #define TASK_STATE_TO_CHAR_STR "RSDTtZX"
    >
    > +#ifdef CONFIG_MM_OWNER
    > +extern void mm_update_next_owner(struct mm_struct *mm);
    > +extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
    > +#else
    > +static inline void mm_update_next_owner(struct mm_struct *mm)
    > +{
    > +}
    > +
    > +static inline void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
    > +{
    > +}
    > +#endif /* CONFIG_MM_OWNER */
    > +
    > #endif /* __KERNEL__ */
    >
    > #endif
    > diff -puN init/Kconfig~memory-controller-add-mm-owner init/Kconfig
    > --- linux-2.6.25-rc8/init/Kconfig~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/init/Kconfig 2008-04-03 22:45:18.000000000 +0530
    > @@ -371,9 +371,21 @@ config RESOURCE_COUNTERS
    > infrastructure that works with cgroups
    > depends on CGROUPS
    >
    > +config MM_OWNER
    > + bool "Enable ownership of mm structure"
    > + help
    > + This option enables mm_struct's to have an owner. The advantage
    > + of this approach is that it allows for several independent memory
    > + based cgroup controllers to co-exist independently without too
    > + much space overhead
    > +
    > + This feature adds fork/exit overhead. So enable this only if
    > + you need resource controllers
    > +
    > config CGROUP_MEM_RES_CTLR
    > bool "Memory Resource Controller for Control Groups"
    > depends on CGROUPS && RESOURCE_COUNTERS
    > + select MM_OWNER
    > help
    > Provides a memory resource controller that manages both page cache and
    > RSS memory.
    > @@ -386,6 +398,9 @@ config CGROUP_MEM_RES_CTLR
    > Only enable when you're ok with these trade offs and really
    > sure you need the memory resource controller.
    >
    > + This config option also selects MM_OWNER config option, which
    > + could in turn add some fork/exit overhead.
    > +
    > config SYSFS_DEPRECATED
    > bool
    >
    > diff -puN kernel/cgroup.c~memory-controller-add-mm-owner kernel/cgroup.c
    > --- linux-2.6.25-rc8/kernel/cgroup.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/kernel/cgroup.c 2008-04-03 22:43:27.000000000 +0530
    > @@ -118,6 +118,7 @@ static int root_count;
    > * be called.
    > */
    > static int need_forkexit_callback;
    > +static int need_mm_owner_callback;
    >
    > /* convenient tests for these bits */
    > inline int cgroup_is_removed(const struct cgroup *cgrp)
    > @@ -2485,6 +2486,7 @@ static void __init cgroup_init_subsys(st
    > }
    >
    > need_forkexit_callback |= ss->fork || ss->exit;
    > + need_mm_owner_callback |= !!ss->mm_owner_changed;
    >
    > ss->active = 1;
    > }
    > @@ -2721,6 +2723,34 @@ void cgroup_fork_callbacks(struct task_s
    > }
    > }
    >
    > +#ifdef CONFIG_MM_OWNER
    > +/**
    > + * cgroup_mm_owner_callbacks - run callbacks when the mm->owner changes
    > + * @p: the new owner
    > + *
    > + * Called on every change to mm->owner. mm_init_owner() does not
    > + * invoke this routine, since it assigns the mm->owner the first time
    > + * and does not change it.
    > + */
    > +void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
    > +{
    > + struct cgroup *oldcgrp, *newcgrp;
    > +
    > + if (need_mm_owner_callback) {
    > + int i;
    > + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
    > + struct cgroup_subsys *ss = subsys[i];
    > + oldcgrp = task_cgroup(old, ss->subsys_id);
    > + newcgrp = task_cgroup(new, ss->subsys_id);
    > + if (oldcgrp == newcgrp)
    > + continue;
    > + if (ss->mm_owner_changed)
    > + ss->mm_owner_changed(ss, oldcgrp, newcgrp);
    > + }
    > + }
    > +}
    > +#endif /* CONFIG_MM_OWNER */
    > +
    > /**
    > * cgroup_post_fork - called on a new task after adding it to the task list
    > * @child: the task in question
    > diff -puN kernel/exit.c~memory-controller-add-mm-owner kernel/exit.c
    > --- linux-2.6.25-rc8/kernel/exit.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/kernel/exit.c 2008-04-04 00:56:51.000000000 +0530
    > @@ -577,6 +577,94 @@ void exit_fs(struct task_struct *tsk)
    >
    > EXPORT_SYMBOL_GPL(exit_fs);
    >
    > +#ifdef CONFIG_MM_OWNER
    > +/*
    > + * Task p is exiting and it owned p, so lets find a new owner for it
    > + */
    > +static inline int
    > +mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
    > +{
    > + /*
    > + * If there are other users of the mm and the owner (us) is exiting
    > + * we need to find a new owner to take on the responsibility.
    > + * When we use thread groups (CLONE_THREAD), the thread group
    > + * leader is kept around in zombie state, even after it exits.
    > + * delay_group_leader() ensures that if the group leader is around
    > + * we need not select a new owner.
    > + */
    > + if (!mm)
    > + return 0;
    > + if (atomic_read(&mm->mm_users) <= 1)
    > + return 0;
    > + if (mm->owner != p)
    > + return 0;
    > + if (delay_group_leader(p))
    > + return 0;
    > + return 1;
    > +}
    > +
    > +void mm_update_next_owner(struct mm_struct *mm)
    > +{
    > + struct task_struct *c, *g, *p = current;
    > +
    > +retry:
    > + if (!mm_need_new_owner(mm, p))
    > + return;
    > +
    > + rcu_read_lock();
    > + /*
    > + * Search in the children
    > + */
    > + list_for_each_entry(c, &p->children, sibling) {
    > + if (c->mm == mm)
    > + goto assign_new_owner;
    > + }
    > +
    > + /*
    > + * Search in the siblings
    > + */
    > + list_for_each_entry(c, &p->parent->children, sibling) {
    > + if (c->mm == mm)
    > + goto assign_new_owner;
    > + }
    > +
    > + /*
    > + * Search through everything else. We should not get
    > + * here often
    > + */
    > + do_each_thread(g, c) {
    > + if (c->mm == mm)
    > + goto assign_new_owner;
    > + } while_each_thread(g, c);
    > +
    > + rcu_read_unlock();
    > + return;
    > +
    > +assign_new_owner:
    > + BUG_ON(c == p);
    > + get_task_struct(c);
    > + /*
    > + * The task_lock protects c->mm from changing.
    > + * We always want mm->owner->mm == mm
    > + */
    > + task_lock(c);
    > + /*
    > + * Delay rcu_read_unlock() till we have the task_lock()
    > + * to ensure that c does not slip away underneath us
    > + */
    > + rcu_read_unlock();
    > + if (c->mm != mm) {
    > + task_unlock(c);
    > + put_task_struct(c);
    > + goto retry;
    > + }
    > + cgroup_mm_owner_callbacks(mm->owner, c);
    > + mm->owner = c;
    > + task_unlock(c);
    > + put_task_struct(c);
    > +}
    > +#endif /* CONFIG_MM_OWNER */
    > +
    > /*
    > * Turn us into a lazy TLB process if we
    > * aren't already..
    > @@ -616,6 +704,7 @@ static void exit_mm(struct task_struct *
    > /* We don't want this task to be frozen prematurely */
    > clear_freeze_flag(tsk);
    > task_unlock(tsk);
    > + mm_update_next_owner(mm);
    > mmput(mm);
    > }
    >
    > diff -puN kernel/fork.c~memory-controller-add-mm-owner kernel/fork.c
    > --- linux-2.6.25-rc8/kernel/fork.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/kernel/fork.c 2008-04-03 22:43:27.000000000 +0530
    > @@ -358,14 +358,13 @@ static struct mm_struct * mm_init(struct
    >         mm->ioctx_list = NULL;
    >         mm->free_area_cache = TASK_UNMAPPED_BASE;
    >         mm->cached_hole_size = ~0UL;
    > -        mm_init_cgroup(mm, p);
    > +        mm_init_owner(mm, p);
    >
    >         if (likely(!mm_alloc_pgd(mm))) {
    >                 mm->def_flags = 0;
    >                 return mm;
    >         }
    >
    > -        mm_free_cgroup(mm);
    >         free_mm(mm);
    >         return NULL;
    > }
    > @@ -416,7 +415,6 @@ void mmput(struct mm_struct *mm)
    >                         spin_unlock(&mmlist_lock);
    >                 }
    >                 put_swap_token(mm);
    > -                mm_free_cgroup(mm);
    >                 mmdrop(mm);
    >         }
    > }
    > @@ -996,6 +994,13 @@ static void rt_mutex_init_task(struct ta
    > #endif
    > }
    >
    > +#ifdef CONFIG_MM_OWNER
    > +void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
    > +{
    > +        mm->owner = p;
    > +}
    > +#endif /* CONFIG_MM_OWNER */
    > +
    > /*
    > * This creates a new process as a copy of the old one,
    > * but does not actually start it yet.
    > diff -puN mm/memcontrol.c~memory-controller-add-mm-owner mm/memcontrol.c
    > --- linux-2.6.25-rc8/mm/memcontrol.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/mm/memcontrol.c 2008-04-03 22:46:51.000000000 +0530
    > @@ -238,26 +238,12 @@ static struct mem_cgroup *mem_cgroup_fro
    >                                 css);
    > }
    >
    > -static struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
    > +struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
    > {
    >         return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
    >                                 struct mem_cgroup, css);
    > }
    >
    > -void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p)
    > -{
    > -        struct mem_cgroup *mem;
    > -
    > -        mem = mem_cgroup_from_task(p);
    > -        css_get(&mem->css);
    > -        mm->mem_cgroup = mem;
    > -}
    > -
    > -void mm_free_cgroup(struct mm_struct *mm)
    > -{
    > -        css_put(&mm->mem_cgroup->css);
    > -}
    > -
    > static inline int page_cgroup_locked(struct page *page)
    > {
    >         return bit_spin_is_locked(PAGE_CGROUP_LOCK_BIT, &page->page_cgroup);
    > @@ -478,6 +464,7 @@ unsigned long mem_cgroup_isolate_pages(u
    >         int zid = zone_idx(z);
    >         struct mem_cgroup_per_zone *mz;
    >
    > +        BUG_ON(!mem_cont);
    >         mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
    >         if (active)
    >                 src = &mz->active_list;
    > @@ -576,7 +563,7 @@ retry:
    >                 mm = &init_mm;
    >
    >         rcu_read_lock();
    > -        mem = rcu_dereference(mm->mem_cgroup);
    > +        mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
    >         /*
    >          * For every charge from the cgroup, increment reference count
    >          */
    > @@ -1006,7 +993,6 @@ mem_cgroup_create(struct cgroup_subsys *
    >
    >         if (unlikely((cont->parent) == NULL)) {
    >                 mem = &init_mem_cgroup;
    > -                init_mm.mem_cgroup = mem;
    >                 page_cgroup_cache = KMEM_CACHE(page_cgroup, SLAB_PANIC);
    >         } else
    >                 mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
    > @@ -1087,10 +1073,6 @@ static void mem_cgroup_move_task(struct
    >         if (!thread_group_leader(p))
    >                 goto out;
    >
    > -        css_get(&mem->css);
    > -        rcu_assign_pointer(mm->mem_cgroup, mem);
    > -        css_put(&old_mem->css);
    > -
    > out:
    >         mmput(mm);
    > }
    > diff -puN init/main.c~memory-controller-add-mm-owner init/main.c
    > --- linux-2.6.25-rc8/init/main.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/init/main.c 2008-04-03 22:43:27.000000000 +0530
    > @@ -537,6 +537,7 @@ asmlinkage void __init start_kernel(void
    >         printk(KERN_NOTICE);
    >         printk(linux_banner);
    >         setup_arch(&command_line);
    > +        mm_init_owner(&init_mm, &init_task);
    >         setup_command_line(command_line);
    >         unwind_setup();
    >         setup_per_cpu_areas();
    > _
    >
    > --
    > Warm Regards,
    > Balbir Singh
    > Linux Technology Center
    > IBM, ISTL
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [-mm] Add an owner to the mm_struct (v8)

    Paul Menage wrote:
    > On Fri, Apr 4, 2008 at 1:05 AM, Balbir Singh wrote:
    >> After the thread group leader exits, it's moved to init_css_state by
    >> cgroup_exit(), thus all future charges from runnings threads would
    >> be redirected to the init_css_set's subsystem.

    >
    > And its uncharges, which is more of the problem I was getting at
    > earlier - surely when the mm is finally destroyed, all its virtual
    > address space charges will be uncharged from the root cgroup rather
    > than the correct cgroup, if we left the delayed group leader as the
    > owner? Which is why I think the group leader optimization is unsafe.


    The memory controller won't uncharge from the root cgroup, since each page
    carries its own mem_cgroup information. Other controllers, however, will need to
    monitor exit() callbacks to know when the leader is dead (sigh).
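
    The per-page accounting described here can be sketched in a small userspace
    model (all names hypothetical, not the kernel API): each charged page records
    the group it was charged against, so the uncharge path never consults
    mm->owner at all.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical model: a page remembers the group it was charged to, so
     * the uncharge hits the right counter even if the mm's owner (and hence
     * its current group) has changed in the meantime. */
    struct group { long usage; };
    struct page  { struct group *charged_to; };

    static void charge(struct page *pg, struct group *g)
    {
            pg->charged_to = g;
            g->usage++;
    }

    static void uncharge(struct page *pg)
    {
            pg->charged_to->usage--;   /* recorded group, not the owner's */
            pg->charged_to = NULL;
    }

    int main(void)
    {
            struct group orig = {0}, root = {0};
            struct page pg = {NULL};

            charge(&pg, &orig);        /* charged while the owner was in 'orig' */
            /* ... owner exits; the mm would now map to 'root' ... */
            uncharge(&pg);             /* still uncharges from 'orig' */

            assert(orig.usage == 0 && root.usage == 0);
            return 0;
    }
    ```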

    Not having the group leader optimization can introduce big overheads (consider
    thousands of tasks, with the group leader being the first one to exit).

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  4. Re: [-mm] Add an owner to the mm_struct (v8)

    On Fri, Apr 4, 2008 at 1:28 AM, Balbir Singh wrote:
    >
    > It won't uncharge for the memory controller from the root cgroup since each page
    > has the mem_cgroup information associated with it.


    Right, I realise that the memory controller is OK because of the ref counts.

    > For other controllers,
    > they'll need to monitor exit() callbacks to know when the leader is dead (sigh).


    That sounds like a nightmare ...

    >
    > Not having the group leader optimization can introduce big overheads (consider
    > thousands of tasks, with the group leader being the first one to exit).


    Can you test the overhead?

    As long as we find someone to pass the mm to quickly, it shouldn't be
    too bad - I think we're already optimized for that case. Generally the
    group leader's first child will be the new owner, and any subsequent
    times the owner exits, they're unlikely to have any children so
    they'll go straight to the sibling check and pass the mm to the
    parent's first child.

    Unless they all exit in strict sibling order and hence pass the mm
    along the chain one by one, we should be fine. And if that exit
    ordering does turn out to be common, then simply walking the child and
    sibling lists in reverse order to find a victim will minimize the
    amount of passing.

    One other thing occurred to me - what lock protects the child and
    sibling links? I don't see any documentation anywhere, but from the
    code it looks as though it's tasklist_lock rather than RCU - so maybe
    we should be holding that with a read_lock(), at least for the first
    two parts of the search? (The full thread search is RCU-safe).
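
    The reader side Paul is suggesting is the usual rwlock pattern; a userspace
    pthread analogue (hypothetical names, not the kernel's task-list code) of
    holding the read lock across the list walk:

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stddef.h>

    /* Userspace analogue of the suggested fix: hold the reader side of a
     * rwlock (tasklist_lock in the kernel) while walking the child and
     * sibling lists, since those links are not RCU-protected. */
    static pthread_rwlock_t list_lock = PTHREAD_RWLOCK_INITIALIZER;

    struct task { int id; struct task *next_sibling; };

    static struct task *find_with_mm(struct task *head, int wanted)
    {
            struct task *t, *found = NULL;

            pthread_rwlock_rdlock(&list_lock);   /* read_lock(&tasklist_lock) */
            for (t = head; t; t = t->next_sibling)
                    if (t->id == wanted) { found = t; break; }
            pthread_rwlock_unlock(&list_lock);   /* read_unlock(&tasklist_lock) */
            return found;
    }

    int main(void)
    {
            struct task c = {2, NULL}, b = {1, &c}, a = {0, &b};

            assert(find_with_mm(&a, 2) == &c);
            assert(find_with_mm(&a, 9) == NULL);
            return 0;
    }
    ```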

    Paul

  5. Re: [-mm] Add an owner to the mm_struct (v8)

    Paul Menage wrote:
    > On Fri, Apr 4, 2008 at 1:28 AM, Balbir Singh wrote:
    >> It won't uncharge for the memory controller from the root cgroup since each page
    >> has the mem_cgroup information associated with it.

    >
    > Right, I realise that the memory controller is OK because of the ref counts.
    >
    >> For other controllers,
    >> they'll need to monitor exit() callbacks to know when the leader is dead (sigh).

    >
    > That sounds like a nightmare ...
    >


    Yes, it would be, but worth the trouble. Is it really critical to move a dead
    cgroup leader to init_css_set in cgroup_exit()?

    >> Not having the group leader optimization can introduce big overheads (consider
    >> thousands of tasks, with the group leader being the first one to exit).

    >
    > Can you test the overhead?
    >


    I probably can write a program and see what the overhead looks like

    > As long as we find someone to pass the mm to quickly, it shouldn't be
    > too bad - I think we're already optimized for that case. Generally the
    > group leader's first child will be the new owner, and any subsequent
    > times the owner exits, they're unlikely to have any children so
    > they'll go straight to the sibling check and pass the mm to the
    > parent's first child.
    >
    > Unless they all exit in strict sibling order and hence pass the mm
    > along the chain one by one, we should be fine. And if that exit
    > ordering does turn out to be common, then simply walking the child and
    > sibling lists in reverse order to find a victim will minimize the
    > amount of passing.
    >



    Finding the next mm might not be all that bad, but doing it each time a task
    exits can be an overhead, especially for large multi-threaded programs. This can
    get severe if the new mm->owner belongs to a different cgroup, in which case we
    need to invoke the callbacks as well.

    If half the threads belonged to a different cgroup and the new mm->owner kept
    switching between cgroups, the overhead would be really high, with the callbacks
    and the mm->owner changing frequently.

    > One other thing occurred to me - what lock protects the child and
    > sibling links? I don't see any documentation anywhere, but from the
    > code it looks as though it's tasklist_lock rather than RCU - so maybe
    > we should be holding that with a read_lock(), at least for the first
    > two parts of the search? (The full thread search is RCU-safe).
    >


    You are right about the read_lock().

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  6. Re: [-mm] Add an owner to the mm_struct (v8)

    On Fri, Apr 4, 2008 at 2:25 AM, Balbir Singh wrote:
    > >> For other controllers,
    > >> they'll need to monitor exit() callbacks to know when the leader is dead (sigh).

    > >
    > > That sounds like a nightmare ...
    > >

    >
    > Yes, it would be, but worth the trouble. Is it really critical to move a dead
    > cgroup leader to init_css_set in cgroup_exit()?


    It struck me that this whole group leader optimization is broken as it
    stands since there could (in strange configurations) be multiple
    thread groups sharing the same mm.

    I wonder if we can't just delay the exit_mm() call of a group leader
    until all its threads have exited?

    >
    > > As long as we find someone to pass the mm to quickly, it shouldn't be
    > > too bad - I think we're already optimized for that case. Generally the
    > > group leader's first child will be the new owner, and any subsequent
    > > times the owner exits, they're unlikely to have any children so
    > > they'll go straight to the sibling check and pass the mm to the
    > > parent's first child.
    > >
    > > Unless they all exit in strict sibling order and hence pass the mm
    > > along the chain one by one, we should be fine. And if that exit
    > > ordering does turn out to be common, then simply walking the child and
    > > sibling lists in reverse order to find a victim will minimize the
    > > amount of passing.
    > >

    >
    >
    > Finding the next mm might not be all that bad, but doing it each time a task
    > exits, can be an overhead, specially for large multi threaded programs.


    Right, but we only have that overhead if we actually end up passing
    the mm from one to another each time they exit. It would be
    interesting to know what order the threads in a large multi-threaded
    process exit typically (when the main process exits and all the
    threads die).

    I guess it's likely to be one of:

    - in thread creation order (i.e. in order of parent->children list),
    in which case we should try to throw the mm to the parent's last child
    - in reverse creation order, in which case we should try to throw the
    mm to the parent's first child
    - in random order depending on which threads the scheduler runs first
    (in which case we can expect that a small fraction of the threads will
    have to throw the mm whichever end we start from)
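
    The effect of exit ordering on how often the mm gets thrown can be seen with a
    small simulation (entirely hypothetical, not kernel code): threads exit in
    creation order, and the new owner is picked from one end or the other of the
    still-alive sibling list.

    ```c
    #include <assert.h>

    #define N 8

    /* Simulate N sibling threads exiting in creation order (0, 1, 2, ...).
     * When the current owner exits, ownership passes to the first or the
     * last still-alive sibling.  Picking the end that exits last minimizes
     * the number of handoffs. */
    static int count_handoffs(int pass_to_last)
    {
            int alive[N], owner = 0, handoffs = 0;

            for (int i = 0; i < N; i++)
                    alive[i] = 1;

            for (int exiting = 0; exiting < N; exiting++) {
                    alive[exiting] = 0;
                    if (exiting != owner)
                            continue;
                    /* the owner exited: pick a new owner at the chosen end */
                    if (pass_to_last) {
                            for (int i = N - 1; i >= 0; i--)
                                    if (alive[i]) { owner = i; handoffs++; break; }
                    } else {
                            for (int i = 0; i < N; i++)
                                    if (alive[i]) { owner = i; handoffs++; break; }
                    }
            }
            return handoffs;
    }

    int main(void)
    {
            /* exits in creation order: passing to the first alive sibling
             * hands the mm along the whole chain ... */
            assert(count_handoffs(0) == N - 1);
            /* ... while passing to the last alive sibling does it once. */
            assert(count_handoffs(1) == 1);
            return 0;
    }
    ```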

    > This can
    > get severe if the new mm->owner belongs to a different cgroup, in which case we
    > need to use callbacks as well.
    >
    > If half the threads belonged to a different cgroup and the new mm->owner kept
    > switching between cgroups, the overhead would be really high, with the callbacks
    > and the mm->owner changing frequently.


    To me, it seems that setting up a *virtual address space* cgroup
    hierarchy and then putting half your threads in one group and half in
    another is asking for trouble. We need to not break in that
    situation, but I'm not sure it's a case to optimize for.

    Paul

  7. Re: [-mm] Add an owner to the mm_struct (v8)

    Paul Menage wrote:
    > On Fri, Apr 4, 2008 at 2:25 AM, Balbir Singh wrote:
    >> >> For other controllers,
    >> >> they'll need to monitor exit() callbacks to know when the leader is dead (sigh).
    >> >
    >> > That sounds like a nightmare ...
    >> >

    >>
    >> Yes, it would be, but worth the trouble. Is it really critical to move a dead
    >> cgroup leader to init_css_set in cgroup_exit()?

    >
    > It struck me that this whole group leader optimization is broken as it
    > stands since there could (in strange configurations) be multiple
    > thread groups sharing the same mm.
    >
    > I wonder if we can't just delay the exit_mm() call of a group leader
    > until all its threads have exited?
    >


    I'm not sure about this one. I suspect keeping the group_leader around is an
    optimization; changing exit_mm() for the group_leader might impact
    functionality or standards conformance, and could even break some applications.

    Repeating my question from earlier:

    Can we delay setting task->cgroups = &init_css_set for the group_leader until
    all threads have exited? If the user is then unable to remove a cgroup node, it
    will be for a valid reason: the group_leader is still around because the threads
    are still around. The user in that case should wait for notify_on_release.

    >> > As long as we find someone to pass the mm to quickly, it shouldn't be
    >> > too bad - I think we're already optimized for that case. Generally the
    >> > group leader's first child will be the new owner, and any subsequent
    >> > times the owner exits, they're unlikely to have any children so
    >> > they'll go straight to the sibling check and pass the mm to the
    >> > parent's first child.
    >> >
    >> > Unless they all exit in strict sibling order and hence pass the mm
    >> > along the chain one by one, we should be fine. And if that exit
    >> > ordering does turn out to be common, then simply walking the child and
    >> > sibling lists in reverse order to find a victim will minimize the
    >> > amount of passing.
    >> >

    >>
    >>
    >> Finding the next mm might not be all that bad, but doing it each time a task
    >> exits, can be an overhead, specially for large multi threaded programs.

    >
    > Right, but we only have that overhead if we actually end up passing
    > the mm from one to another each time they exit. It would be
    > interesting to know what order the threads in a large multi-threaded
    > process exit typically (when the main process exits and all the
    > threads die).
    >
    > I guess it's likely to be one of:
    >
    > - in thread creation order (i.e. in order of parent->children list),
    > in which case we should try to throw the mm to the parent's last child
    > - in reverse creation order, in which case we should try to throw the
    > mm to the parent's first child
    > - in random order depending on which threads the scheduler runs first
    > (in which case we can expect that a small fraction of the threads will
    > have to throw the mm whichever end we start from)
    >
    >> This can
    >> get severe if the new mm->owner belongs to a different cgroup, in which case we
    >> need to use callbacks as well.
    >>
    >> If half the threads belonged to a different cgroup and the new mm->owner kept
    >> switching between cgroups, the overhead would be really high, with the callbacks
    >> and the mm->owner changing frequently.

    >
    > To me, it seems that setting up a *virtual address space* cgroup
    > hierarchy and then putting half your threads in one group and half in
    > the another is asking for trouble. We need to not break in that
    > situation, but I'm not sure it's a case to optimize for.


    That could potentially happen, if the virtual address space cgroup and cpu
    control cgroup were bound together in the same hierarchy by the sysadmin.

    I measured the overhead of removing the delay_group_leader optimization and
    found a 4% impact on throughput (with volanomark, which is one of the few
    multi-threaded benchmarks I know of).

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  8. Re: [-mm] Add an owner to the mm_struct (v8)

    On Sat, Apr 5, 2008 at 7:47 AM, Balbir Singh wrote:
    >
    > Repeating my question earlier
    >
    > Can we delay setting task->cgroups = &init_css_set for the group_leader, until
    > all threads have exited?


    Potentially, yes. It also might make more sense to move the
    exit_cgroup() for all threads to a later point rather than special
    case delayed group leaders.

    > If the user is unable to remove a cgroup node, it will
    > be due a valid reason, the group_leader is still around, since the threads are
    > still around. The user in that case should wait for notify_on_release.
    >
    > >
    > > To me, it seems that setting up a *virtual address space* cgroup
    > > hierarchy and then putting half your threads in one group and half in
    > > the another is asking for trouble. We need to not break in that
    > > situation, but I'm not sure it's a case to optimize for.

    >
    > That could potentially happen, if the virtual address space cgroup and cpu
    > control cgroup were bound together in the same hierarchy by the sysadmin.


    Yes, I agree it could potentially happen. But it seems like a strange
    thing to do if you're planning not to have the same groupings for
    cpu and va.

    >
    > I measured the overhead of removing the delay_group_leader optimization and
    > found a 4% impact on throughput (with volanomark, that is one of the
    > multi-threaded benchmarks I know of).


    Interesting, I thought (although I've never actually looked at the
    code) that volanomark was more of a scheduling benchmark than a
    process start/exit benchmark. How frequently does it have processes
    (not threads) exiting?

    How many runs was that over? Ingo's recently posted volanomark tests
    against -rc7 showed ~3% random variation between runs.

    Paul

  9. Re: [-mm] Add an owner to the mm_struct (v8)

    Paul Menage wrote:
    > On Sat, Apr 5, 2008 at 7:47 AM, Balbir Singh wrote:
    >> Repeating my question earlier
    >>
    >> Can we delay setting task->cgroups = &init_css_set for the group_leader, until
    >> all threads have exited?

    >
    > Potentially, yes. It also might make more sense to move the
    > exit_cgroup() for all threads to a later point rather than special
    > case delayed group leaders.
    >


    Yes, that makes sense. I think that patch should be independent of this one
    though? What do you think?

    >> If the user is unable to remove a cgroup node, it will
    >> be due a valid reason, the group_leader is still around, since the threads are
    >> still around. The user in that case should wait for notify_on_release.
    >>
    >> >
    >> > To me, it seems that setting up a *virtual address space* cgroup
    >> > hierarchy and then putting half your threads in one group and half in
    >> > the another is asking for trouble. We need to not break in that
    >> > situation, but I'm not sure it's a case to optimize for.

    >>
    >> That could potentially happen, if the virtual address space cgroup and cpu
    >> control cgroup were bound together in the same hierarchy by the sysadmin.

    >
    > Yes, I agree it could potentially happen. But it seems like a strange
    > thing to do if you're planning to be not have the same groupings for
    > cpu and va.
    >


    It's easier to set it up that way. Usually the end user gets the same SLA for
    memory, CPU and other resources, so it makes sense to bind the controllers together.

    >> I measured the overhead of removing the delay_group_leader optimization and
    >> found a 4% impact on throughput (with volanomark, that is one of the
    >> multi-threaded benchmarks I know of).

    >
    > Interesting, I thought (although I've never actually looked at the
    > code) that volanomark was more of a scheduling benchmark than a
    > process start/exit benchmark. How frequently does it have processes
    > (not threads) exiting?
    >


    I could not find any other interesting benchmark for exercising fork/exit. I
    know that volanomark is heavily threaded, so I used it; the threads exit quickly
    after processing their messages, which I thought would make it a good test of
    the overhead.

    > How many runs was that over? Ingo's recently posted volanomark tests
    > against -rc7 showed ~3% random variation between runs.


    I ran the test four times and took the average of the runs. I did see some
    variation between runs, but I did not calculate the standard deviation.
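
    For reference, the sample standard deviation over a handful of runs is cheap to
    compute; a sketch with made-up throughput numbers (the actual volanomark
    figures were not posted):

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    /* Mean and sample standard deviation over a few benchmark runs.
     * The four throughput numbers below are invented for illustration. */
    int main(void)
    {
            double runs[] = { 10400.0, 10150.0, 9900.0, 10350.0 };
            int n = sizeof(runs) / sizeof(runs[0]);
            double sum = 0.0, var = 0.0;

            for (int i = 0; i < n; i++)
                    sum += runs[i];
            double mean = sum / n;

            for (int i = 0; i < n; i++)
                    var += (runs[i] - mean) * (runs[i] - mean);
            double sdev = sqrt(var / (n - 1));   /* sample (n-1) deviation */

            printf("mean=%.1f sdev=%.1f (%.1f%% of mean)\n",
                   mean, sdev, 100.0 * sdev / mean);

            /* with ~3% run-to-run noise, a 4% delta is barely significant */
            assert(sdev / mean < 0.04);
            return 0;
    }
    ```

    If the run-to-run deviation is of the same order as the measured 4% delta,
    more runs would be needed to call the difference significant.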

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  10. Re: [-mm] Add an owner to the mm_struct (v8)

    On Sat, Apr 5, 2008 at 10:48 AM, Balbir Singh wrote:
    > Paul Menage wrote:
    > > On Sat, Apr 5, 2008 at 7:47 AM, Balbir Singh wrote:
    > >> Repeating my question earlier
    > >>
    > >> Can we delay setting task->cgroups = &init_css_set for the group_leader, until
    > >> all threads have exited?

    > >
    > > Potentially, yes. It also might make more sense to move the
    > > exit_cgroup() for all threads to a later point rather than special
    > > case delayed group leaders.
    > >

    >
    > Yes, that makes sense. I think that patch should be independent of this one
    > though? What do you think?


    Yes, it would probably need to be a separate patch. The current
    positioning of cgroup_exit() is more or less inherited from cpusets.
    I'd need to figure out if a change like that would break anything.

    > >
    > > Yes, I agree it could potentially happen. But it seems like a strange
    > > thing to do if you're planning to be not have the same groupings for
    > > cpu and va.

    >
    > It's easier to set it up that way. Usually the end user gets the same SLA for
    > memory, CPU and other resources, so it makes sense to bind the controllers together.
    >


    True - but in that case why wouldn't they have the same SLA for
    virtual address space too?

    >
    > >> I measured the overhead of removing the delay_group_leader optimization and
    > >> found a 4% impact on throughput (with volanomark, that is one of the
    > >> multi-threaded benchmarks I know of).

    > >
    > > Interesting, I thought (although I've never actually looked at the
    > > code) that volanomark was more of a scheduling benchmark than a
    > > process start/exit benchmark. How frequently does it have processes
    > > (not threads) exiting?
    > >

    >
    > I could not find any other interesting benchmark for benchmarking fork/exits. I
    > know that volanomark is heavily threaded, so I used it. The threads quickly exit
    > after processing the messages, I thought that would be a good test to see the
    > overhead.


    But surely the performance of thread exits wouldn't be affected by the
    delay_group_leader(p) change, since none of the exiting threads would
    be a group leader. That optimization only matters when the entire
    process exits.

    Does oprofile show any interesting differences?

    Paul

  11. Re: [-mm] Add an owner to the mm_struct (v8)

    Paul Menage wrote:
    > On Sat, Apr 5, 2008 at 10:48 AM, Balbir Singh wrote:
    >> Paul Menage wrote:
    >> > On Sat, Apr 5, 2008 at 7:47 AM, Balbir Singh wrote:
    >> >> Repeating my question earlier
    >> >>
    >> >> Can we delay setting task->cgroups = &init_css_set for the group_leader, until
    >> >> all threads have exited?
    >> >
    >> > Potentially, yes. It also might make more sense to move the
    >> > exit_cgroup() for all threads to a later point rather than special
    >> > case delayed group leaders.
    >> >

    >>
    >> Yes, that makes sense. I think that patch should be independent of this one
    >> though? What do you think?

    >
    > Yes, it would probably need to be a separate patch. The current
    > positioning of cgroup_exit() is more or less inherited from cpusets.
    > I'd need to figure out if a change like that would break anything.
    >


    Yes, that's understandable.

    >> >
    >> > Yes, I agree it could potentially happen. But it seems like a strange
    >> > thing to do if you're planning to be not have the same groupings for
    >> > cpu and va.

    >>
    >> It's easier to set it up that way. Usually the end user gets the same SLA for
    >> memory, CPU and other resources, so it makes sense to bind the controllers together.
    >>

    >
    > True - but in that case why wouldn't they have the same SLA for
    > virtual address space too?
    >


    Yes, mostly. That's why I had made the virtual address space patches a config
    option on top of the memory controller.

    >> >> I measured the overhead of removing the delay_group_leader optimization and
    >> >> found a 4% impact on throughput (with volanomark, that is one of the
    >> >> multi-threaded benchmarks I know of).
    >> >
    >> > Interesting, I thought (although I've never actually looked at the
    >> > code) that volanomark was more of a scheduling benchmark than a
    >> > process start/exit benchmark. How frequently does it have processes
    >> > (not threads) exiting?
    >> >

    >>
    >> I could not find any other interesting benchmark for benchmarking fork/exits. I
    >> know that volanomark is heavily threaded, so I used it. The threads quickly exit
    >> after processing the messages, I thought that would be a good test to see the
    >> overhead.

    >
    > But surely the performance of thread exits wouldn't be affected by the
    > delay_group_leader(p) change, since none of the exiting threads would
    > be a group leader. That optimization only matters when the entire
    > process exits.
    >


    On the client side, each JVM instance exits after the test. I see the thread
    group leader exit as well through getdelays (I see TGID exits).

    > Does oprofile show any interesting differences?


    Need to try oprofile.

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  12. Re: [-mm] Add an owner to the mm_struct (v8)

    On Sat, Apr 5, 2008 at 11:59 AM, Balbir Singh wrote:
    > > But surely the performance of thread exits wouldn't be affected by the
    > > delay_group_leader(p) change, since none of the exiting threads would
    > > be a group leader. That optimization only matters when the entire
    > > process exits.
    > >

    >
    > On the client side, each JVM instance exits after the test. I see the thread
    > group leader exit as well through getdelays (I see TGID exits).


    How long does the test run for? How many threads does each client have?

    Paul

  13. Re: [-mm] Add an owner to the mm_struct (v8)

    On Sat, Apr 5, 2008 at 11:59 AM, Balbir Singh wrote:
    > >> It's easier to set it up that way. Usually the end user gets the same SLA for
    > >> memory, CPU and other resources, so it makes sense to bind the controllers together.
    > >>

    > >
    > > True - but in that case why wouldn't they have the same SLA for
    > > virtual address space too?
    > >

    >
    > Yes, mostly. That's why I had made the virtual address space patches as a config
    > option on top of the memory controller
    >


    *If* they want to use the virtual address space controller, that is.

    By that argument, you should make the memory and cpu controllers the
    same controller, since in your scenario they'll usually be used
    together.

    Paul

  14. Re: [-mm] Add an owner to the mm_struct (v8)

    Paul Menage wrote:
    > On Sat, Apr 5, 2008 at 11:59 AM, Balbir Singh wrote:
    >> > But surely the performance of thread exits wouldn't be affected by the
    >> > delay_group_leader(p) change, since none of the exiting threads would
    >> > be a group leader. That optimization only matters when the entire
    >> > process exits.
    >> >

    >>
    >> On the client side, each JVM instance exits after the test. I see the thread
    >> group leader exit as well through getdelays (I see TGID exits).

    >
    > How long does the test run for? How many threads does each client have?


    The test on each client side runs for about 10 seconds. I saw the client create
    up to 411 threads.

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  15. Re: [-mm] Add an owner to the mm_struct (v8)

    Paul Menage wrote:
    > On Sat, Apr 5, 2008 at 11:59 AM, Balbir Singh wrote:
    >> >> It's easier to set it up that way. Usually the end user gets the same SLA for
    >> >> memory, CPU and other resources, so it makes sense to bind the controllers together.
    >> >>
    >> >
    >> > True - but in that case why wouldn't they have the same SLA for
    >> > virtual address space too?
    >> >

    >>
    >> Yes, mostly. That's why I had made the virtual address space patches as a config
    >> option on top of the memory controller
    >>

    >
    > *If* they want to use the virtual address space controller, that is.
    >
    > By that argument, you should make the memory and cpu controllers the
    > same controller, since in your scenario they'll usually be used
    > together.


    Heh, Virtual address and memory are more closely interlinked than CPU and Memory.
    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  16. Re: [-mm] Add an owner to the mm_struct (v8)

    On Fri, 04 Apr 2008 13:35:44 +0530
    Balbir Singh wrote:

    > 1. Add mm->owner change callbacks using cgroups
    >
    > ...
    >
    > +config MM_OWNER
    > + bool "Enable ownership of mm structure"
    > + help
    > + This option enables mm_struct's to have an owner. The advantage
    > + of this approach is that it allows for several independent memory
    > + based cgroup controllers to co-exist independently without too
    > + much space overhead
    > +
    > + This feature adds fork/exit overhead. So enable this only if
    > + you need resource controllers


    Do we really want to offer this option to people? It's rather a low-level
    thing and it's likely to cause more confusion than it's worth. Remember
    that most kernels get to our users via kernel vendors - to what will they
    be setting this config option?

    > config CGROUP_MEM_RES_CTLR
    > bool "Memory Resource Controller for Control Groups"
    > depends on CGROUPS && RESOURCE_COUNTERS
    > + select MM_OWNER


    Presumably they'll always be setting it to "y" if they are enabling cgroups
    at all.

    > --- linux-2.6.25-rc8/kernel/cgroup.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    > +++ linux-2.6.25-rc8-balbir/kernel/cgroup.c 2008-04-03 22:43:27.000000000 +0530
    > @@ -118,6 +118,7 @@ static int root_count;
    > * be called.
    > */
    > static int need_forkexit_callback;
    > +static int need_mm_owner_callback;


    I suppose these should be __read_mostly.
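    [For context, a minimal userspace sketch, not the actual kernel patch: the kernel's `__read_mostly` macro (from `<linux/cache.h>`) places a variable in the `.data..read_mostly` section, so that rarely-written flags such as these do not share cache lines with frequently-written data. The macro is re-defined locally here so the sketch is self-contained.]

    ```c
    #include <stdio.h>

    /* In the kernel this comes from <linux/cache.h>; defined here only so
     * the example compiles standalone. */
    #define __read_mostly __attribute__((__section__(".data..read_mostly")))

    /* The two callback flags from the patch, written once at subsystem
     * registration time and read on every fork/exit thereafter. */
    static int need_forkexit_callback __read_mostly;
    static int need_mm_owner_callback __read_mostly;

    int main(void)
    {
        /* Flipped only when a subsystem registering the callback appears. */
        need_mm_owner_callback = 1;
        printf("%d %d\n", need_forkexit_callback, need_mm_owner_callback);
        return 0;
    }
    ```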


  17. Re: [-mm] Add an owner to the mm_struct (v8)

    Andrew Morton wrote:
    > On Fri, 04 Apr 2008 13:35:44 +0530
    > Balbir Singh wrote:
    >
    >> 1. Add mm->owner change callbacks using cgroups
    >>
    >> ...
    >>
    >> +config MM_OWNER
    >> + bool "Enable ownership of mm structure"
    >> + help
    >> + This option enables mm_struct's to have an owner. The advantage
    >> + of this approach is that it allows for several independent memory
    >> + based cgroup controllers to co-exist independently without too
    >> + much space overhead
    >> +
    >> + This feature adds fork/exit overhead. So enable this only if
    >> + you need resource controllers

    >
    > Do we really want to offer this option to people? It's rather a low-level
    > thing and it's likely to cause more confusion than it's worth. Remember
    > that most kernels get to our users via kernel vendors - to what will they
    > be setting this config option?
    >


    I suspect that this kernel option will not be explicitly set. This option
    will be selected by other config options (memory controller, swap namespace,
    revoke*).

    >> config CGROUP_MEM_RES_CTLR
    >> bool "Memory Resource Controller for Control Groups"
    >> depends on CGROUPS && RESOURCE_COUNTERS
    >> + select MM_OWNER

    >
    > Presumably they'll always be setting it to "y" if they are enabling cgroups
    > at all.
    >
    >> --- linux-2.6.25-rc8/kernel/cgroup.c~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
    >> +++ linux-2.6.25-rc8-balbir/kernel/cgroup.c 2008-04-03 22:43:27.000000000 +0530
    >> @@ -118,6 +118,7 @@ static int root_count;
    >> * be called.
    >> */
    >> static int need_forkexit_callback;
    >> +static int need_mm_owner_callback;

    >
    > I suppose these should be __read_mostly.
    >


    Yes, good point. I'll send out v9 with this fix.

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  18. Re: [-mm] Add an owner to the mm_struct (v8)

    On Tue, 08 Apr 2008 08:09:57 +0530 Balbir Singh wrote:

    > Andrew Morton wrote:
    > > On Fri, 04 Apr 2008 13:35:44 +0530
    > > Balbir Singh wrote:
    > >
    > >> 1. Add mm->owner change callbacks using cgroups
    > >>
    > >> ...
    > >>
    > >> +config MM_OWNER
    > >> + bool "Enable ownership of mm structure"
    > >> + help
    > >> + This option enables mm_struct's to have an owner. The advantage
    > >> + of this approach is that it allows for several independent memory
    > >> + based cgroup controllers to co-exist independently without too
    > >> + much space overhead
    > >> +
    > >> + This feature adds fork/exit overhead. So enable this only if
    > >> + you need resource controllers

    > >
    > > Do we really want to offer this option to people? It's rather a low-level
    > > thing and it's likely to cause more confusion than it's worth. Remember
    > > that most kernels get to our users via kernel vendors - to what will they
    > > be setting this config option?
    > >

    >
    > I suspect that this kernel option will not be explicitly set. This option
    > will be selected by other config options (memory controller, swap namespace,
    > revoke*).


    I believe that the way to do this is to not give the option a `help'
    section. That makes it a Kconfig-internal-only thing.
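    [Editorial note, a hypothetical Kconfig sketch: in Kconfig, the mechanism that actually hides a symbol from users is giving the `bool` no prompt string (the `help` text is optional either way); a prompt-less symbol can then only be enabled via `select` from options that need it:]

    ```kconfig
    config MM_OWNER
    	bool

    config CGROUP_MEM_RES_CTLR
    	bool "Memory Resource Controller for Control Groups"
    	depends on CGROUPS && RESOURCE_COUNTERS
    	select MM_OWNER
    ```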


  19. Re: [-mm] Add an owner to the mm_struct (v8)

    On Sat, Apr 5, 2008 at 11:31 PM, Balbir Singh wrote:
    > > *If* they want to use the virtual address space controller, that is.
    > >
    > > By that argument, you should make the memory and cpu controllers the
    > > same controller, since in your scenario they'll usually be used
    > > together.

    >
    > Heh, Virtual address and memory are more closely interlinked than CPU and Memory.


    If you consider virtual address space limits a useful way to limit
    swap usage, that's true.

    But if you don't, then memory and CPU are more closely linked since
    they represent real resource usage, whereas virtual address space is a
    more abstract quantity.

    Paul

  20. Re: [-mm] Add an owner to the mm_struct (v8)

    On Sat, Apr 5, 2008 at 10:38 PM, Balbir Singh wrote:
    > >
    > > How long does the test run for? How many threads does each client have?

    >
    > The test on each client side runs for about 10 seconds. I saw the client create
    > up to 411 threads.
    >


    I'm not convinced that an application that creates 400 threads and
    exits in 10 seconds is particularly representative of a high-performance
    application.

    But I agree that it's an example of something it may be worth trying
    to optimize for.

    You mention that you saw tgid exits - what order did the individual
    threads exit in? If we threw the mm to the last thread in the thread
    group rather than the first, would that help?
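    [A toy userspace model, not kernel code, of the heuristic Paul raises: when
    the current mm owner exits, the new owner can be picked by scanning the
    thread group forwards or backwards. If threads tend to exit in creation
    order, handing the mm to the last thread means fewer ownership changes.]

    ```c
    #include <stdio.h>

    #define NTHREADS 5

    /* 1 = thread still alive; index 0 is the original owner, which
     * has just exited in this scenario. */
    static int alive[NTHREADS] = {0, 1, 1, 1, 1};

    /* Current behaviour in the patch: hand the mm to the first live thread. */
    static int pick_first(void)
    {
        for (int i = 0; i < NTHREADS; i++)
            if (alive[i])
                return i;
        return -1;
    }

    /* Paul's suggestion: hand the mm to the last live thread instead. */
    static int pick_last(void)
    {
        for (int i = NTHREADS - 1; i >= 0; i--)
            if (alive[i])
                return i;
        return -1;
    }

    int main(void)
    {
        printf("first=%d last=%d\n", pick_first(), pick_last());
        return 0;
    }
    ```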

    Paul
