
Thread: [PATCH 0/12] memcg updates v5

  1. Re: [PATCH 0/12] memcg updates v5

    On Fri, 26 Sep 2008 14:24:55 +0900
    Daisuke Nishimura wrote:

    > On Fri, 26 Sep 2008 13:05:34 +0900, KAMEZAWA Hiroyuki wrote:
    > > On Fri, 26 Sep 2008 12:00:19 +0900
    > > Daisuke Nishimura wrote:
    > >
    > > > I'll test it with the updated version of 9-11 and report back to you.
    > > >

    > > Thank you. Below is the new one... (Sorry!)
    > >
    > > -Kame
    > > ==
    > > Check LRU bit under lru_lock.
    > >
    > > Signed-off-by: KAMEZAWA Hiroyuki
    > >
    > > mm/memcontrol.c | 9 +++++----
    > > 1 file changed, 5 insertions(+), 4 deletions(-)
    > >
    > > Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
    > > ===================================================================
    > > --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
    > > +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
    > > @@ -340,11 +340,12 @@ void mem_cgroup_move_lists(struct page *
    > >  	if (!trylock_page_cgroup(pc))
    > >  		return;
    > >
    > > -	if (PageCgroupUsed(pc) && PageCgroupLRU(pc)) {
    > > +	if (PageCgroupUsed(pc)) {
    > >  		mem = pc->mem_cgroup;
    > >  		mz = page_cgroup_zoneinfo(pc);
    > >  		spin_lock_irqsave(&mz->lru_lock, flags);
    > > -		__mem_cgroup_move_lists(pc, lru);
    > > +		if (PageCgroupLRU(pc))
    > > +			__mem_cgroup_move_lists(pc, lru);
    > >  		spin_unlock_irqrestore(&mz->lru_lock, flags);
    > >  	}
    > >  	unlock_page_cgroup(pc);
    > > @@ -564,8 +565,8 @@ __release_page_cgroup(struct memcg_percp
    > >  			spin_lock(&mz->lru_lock);
    > >  		}
    > >  		if (!PageCgroupUsed(pc) && PageCgroupLRU(pc)) {
    > > -			__mem_cgroup_remove_list(mz, pc);
    > >  			ClearPageCgroupLRU(pc);
    > > +			__mem_cgroup_remove_list(mz, pc);
    > >  		}
    > >  	}
    > >  	if (prev_mz)
    > > @@ -597,8 +598,8 @@ __set_page_cgroup_lru(struct memcg_percp
    > >  			spin_lock(&mz->lru_lock);
    > >  		}
    > >  		if (PageCgroupUsed(pc) && !PageCgroupLRU(pc)) {
    > > -			SetPageCgroupLRU(pc);
    > >  			__mem_cgroup_add_list(mz, pc);
    > > +			SetPageCgroupLRU(pc);
    > >  		}
    > >  	}
    > >
    > >

    >
    > Unfortunately, some bugs still remain...
    >

    I confirmed I can reproduce this.

    I found one interleaving that can cause it (and confirmed by printk that it
    actually happens) in set_lru():

        charge side                        lazy set_lru side
        -----------                        -----------------
        TestSetPageCgroupUsed(pc);
                                           if (PageCgroupUsed(pc) && !PageCgroupLRU(pc)) {
                                                   SetPageCgroupLRU(pc);
                                                   __mem_cgroup_add_list(mz, pc);
                                           }
        pc->mem_cgroup = mem;

    Then, the page_cgroup is added to a wrong LRU, one which doesn't match
    pc->mem_cgroup.
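
    For reference, the fix tested later in this thread closes this window by
    re-validating pc->mem_cgroup after taking the lru_lock. A minimal sketch of
    that idea (helper names follow the PCG_VALID patch posted further down;
    irq disabling is elided, and this is an illustration, not the literal fix):

        /*
         * Only link the page_cgroup if its owner did not change between
         * snapshotting pc->mem_cgroup and taking the lru_lock.
         */
        static void set_lru_checked(struct page_cgroup *pc)
        {
                struct mem_cgroup *mem = pc->mem_cgroup;        /* snapshot */
                struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);

                spin_lock(&mz->lru_lock);
                if (PageCgroupValid(pc) && !PageCgroupLRU(pc) &&
                    mem == pc->mem_cgroup) {
                        SetPageCgroupLRU(pc);
                        __mem_cgroup_add_list(mz, pc);
                }
                spin_unlock(&mz->lru_lock);
        }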

    But there is still more...still digging.

    Thanks,
    -kame



    > ------------[ cut here ]------------
    > WARNING: at lib/list_debug.c:51 list_del+0x5c/0x87()
    > list_del corruption. next->prev should be ffff88010ca291e8, but was dead000000200200
    > Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
    > autofs4 hidp rfcomm l2cap bluetooth sunrpc microcode dm_mirror dm_log dm_multipath dm_mod
    > rfkill input_polldev sbs sbshc battery ac lp sg e1000 ide_cd_mod cdrom button acpi_memhotplug
    > parport_pc rtc_cmos rtc_core parport serio_raw rtc_lib i2c_i801 i2c_core shpchp pcspkr
    > ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
    > Pid: 3940, comm: bash Tainted: G W 2.6.27-rc7-mm1-dd8bf0fe #1
    > Call Trace:
    > [] warn_slowpath+0xb4/0xd2
    > [] prepare_to_wait_exclusive+0x38/0x5a
    > [] finish_wait+0x32/0x5d
    > [] __wait_on_bit_lock+0x5b/0x66
    > [] __lock_page+0x5e/0x64
    > [] target_load+0x2a/0x58
    > [] place_entity+0x85/0xb3
    > [] enqueue_entity+0x16e/0x18f
    > [] zone_statistics+0x3a/0x5d
    > [] zone_statistics+0x3a/0x5d
    > [] get_page_from_freelist+0x455/0x5bf
    > [] list_del+0x5c/0x87
    > [] mem_cgroup_commit_charge+0x6f/0xdd
    > [] mem_cgroup_charge_common+0x4c/0x62
    > [] handle_mm_fault+0x222/0x791
    > [] zone_statistics+0x3a/0x5d
    > [] follow_page+0x2d/0x2c2
    > [] __get_user_pages+0x2f5/0x3f3
    > [] get_arg_page+0x46/0xa5
    > [] copy_strings+0xfc/0x1de
    > [] copy_strings_kernel+0x21/0x33
    > [] do_execve+0x140/0x256
    > [] sys_execve+0x35/0x4c
    > [] stub_execve+0x6a/0xc0
    > ---[ end trace 4eaa2a86a8e2da22 ]---
    > ------------[ cut here ]------------
    > WARNING: at lib/list_debug.c:48 list_del+0x30/0x87()
    > list_del corruption. prev->next should be ffff88010ca29210, but was dead000000100100
    > Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
    > autofs4 hidp rfcomm l2cap bluetooth sunrpc microcode dm_mirror dm_log dm_multipath dm_mod
    > rfkill input_polldev sbs sbshc battery ac lp sg e1000 ide_cd_mod cdrom button acpi_memhotplug
    > parport_pc rtc_cmos rtc_core parport serio_raw rtc_lib i2c_i801 i2c_core shpchp pcspkr
    > ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
    > Pid: 3940, comm: bash Tainted: G W 2.6.27-rc7-mm1-dd8bf0fe #1
    > Call Trace:
    > [] warn_slowpath+0xb4/0xd2
    > [] __getblk+0x25/0x21f
    > [] __ext3_journal_dirty_metadata+0x1e/0x46 [ext3]
    > [] __wake_up+0x38/0x4f
    > [] __mark_inode_dirty+0x15c/0x16b
    > [] touch_atime+0x109/0x112
    > [] mnt_drop_write+0x25/0xdc
    > [] generic_file_aio_read+0x4b8/0x515
    > [] list_del+0x30/0x87
    > [] __release_page_cgroup+0x68/0x8a
    > [] page_remove_rmap+0x10e/0x12e
    > [] unmap_vmas+0x476/0x7f2
    > [] exit_mmap+0xf0/0x176
    > [] secure_ip_id+0x45/0x4a
    > [] mmput+0x30/0x88
    > [] flush_old_exec+0x487/0x77c
    > [] vfs_read+0x11e/0x133
    > [] load_elf_binary+0x338/0x16b6
    > [] get_arg_page+0x46/0xa5
    > [] copy_strings+0x1cd/0x1de
    > [] search_binary_handler+0xb0/0x22e
    > [] do_execve+0x1a8/0x256
    > [] sys_execve+0x35/0x4c
    > [] stub_execve+0x6a/0xc0
    > ---[ end trace 4eaa2a86a8e2da22 ]---
    > ------------[ cut here ]------------
    > WARNING: at lib/list_debug.c:48 list_del+0x30/0x87()
    > list_del corruption. prev->next should be ffff88010c937d50, but was ffff88010ca052e8
    > Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
    > autofs4 hidp rfcomm l2cap bluetooth sunrpc microcode dm_mirror dm_log dm_multipath dm_mod
    > rfkill input_polldev sbs sbshc battery ac lp sg e1000 ide_cd_mod cdrom button acpi_memhotplug
    > parport_pc rtc_cmos rtc_core parport serio_raw rtc_lib i2c_i801 i2c_core shpchp pcspkr
    > ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
    > Pid: 3943, comm: shmem_test_02 Tainted: G W 2.6.27-rc7-mm1-dd8bf0fe #1
    > Call Trace:
    > [] warn_slowpath+0xb4/0xd2
    > [] shmem_getpage+0x75/0x7a0
    > [] zone_statistics+0x3a/0x5d
    > [] get_page_from_freelist+0x353/0x5bf
    > [] zone_statistics+0x3a/0x5d
    > [] get_page_from_freelist+0x353/0x5bf
    > [] list_del+0x30/0x87
    > [] mem_cgroup_commit_charge+0x6f/0xdd
    > [] mem_cgroup_charge_common+0x4c/0x62
    > [] do_wp_page+0x3ab/0x58a
    > [] handle_mm_fault+0x735/0x791
    > [] fcntl_setlk+0x233/0x263
    > [] do_page_fault+0x39c/0x773
    > [] error_exit+0x0/0x51
    > ---[ end trace 4eaa2a86a8e2da22 ]---
    > ------------[ cut here ]------------
    > WARNING: at lib/list_debug.c:48 list_del+0x30/0x87()
    > list_del corruption. prev->next should be ffff88010ca052e8, but was ffff88010c4068a0
    > Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
    > autofs4 hidp rfcomm l2cap bluetooth sunrpc microcode dm_mirror dm_log dm_multipath dm_mod
    > rfkill input_polldev sbs sbshc battery ac lp sg e1000 ide_cd_mod cdrom button acpi_memhotplug
    > parport_pc rtc_cmos rtc_core parport serio_raw rtc_lib i2c_i801 i2c_core shpchp pcspkr
    > ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
    > Pid: 3942, comm: shmem_test_02 Tainted: G W 2.6.27-rc7-mm1-dd8bf0fe #1
    > Call Trace:
    > [] warn_slowpath+0xb4/0xd2
    > [] rmqueue_bulk+0x61/0x8b
    > [] number+0x106/0x1f9
    > [] zone_statistics+0x3a/0x5d
    > [] get_page_from_freelist+0x353/0x5bf
    > [] free_pages_bulk+0x198/0x20b
    > [] __pagevec_free+0x21/0x2e
    > [] release_pages+0x151/0x19f
    > [] list_del+0x30/0x87
    > [] __release_page_cgroup+0x68/0x8a
    > [] __remove_from_page_cache+0x45/0x8f
    > [] remove_from_page_cache+0x27/0x2f
    > [] truncate_complete_page+0x49/0x59
    > [] truncate_inode_pages_range+0xbd/0x2ff
    > [] shmem_delete_inode+0x33/0xc4
    > [] shmem_delete_inode+0x0/0xc4
    > [] generic_delete_inode+0xb0/0x124
    > [] d_kill+0x21/0x43
    > [] dput+0x111/0x11f
    > [] __fput+0x14f/0x17e
    > [] remove_vma+0x3d/0x72
    > [] exit_mmap+0x157/0x176
    > [] mmput+0x30/0x88
    > [] exit_mm+0xff/0x10a
    > [] do_exit+0x210/0x7a5
    > [] audit_syscall_entry+0x12d/0x160
    > [] do_group_exit+0x66/0x96
    > [] system_call_fastpath+0x16/0x1b
    > ---[ end trace 4eaa2a86a8e2da22 ]---
    > ------------[ cut here ]------------
    > WARNING: at lib/list_debug.c:51 list_del+0x5c/0x87()
    > list_del corruption. next->prev should be ffff88010caf6b20, but was dead000000200200
    > Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
    > autofs4 hidp rfcomm l2cap bluetooth sunrpc microcode dm_mirror dm_log dm_multipath dm_mod
    > rfkill input_polldev sbs sbshc battery ac lp sg e1000 ide_cd_mod cdrom button acpi_memhotplug
    > parport_pc rtc_cmos rtc_core parport serio_raw rtc_lib i2c_i801 i2c_core shpchp pcspkr
    > ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
    > Pid: 3932, comm: page01 Tainted: G W 2.6.27-rc7-mm1-dd8bf0fe #1
    > Call Trace:
    > [] warn_slowpath+0xb4/0xd2
    > [] free_pages_bulk+0x198/0x20b
    > [] release_pages+0x18d/0x19f
    > [] list_del+0x5c/0x87
    > [] __release_page_cgroup+0x68/0x8a
    > [] page_remove_rmap+0x10e/0x12e
    > [] unmap_vmas+0x476/0x7f2
    > [] exit_mmap+0xf0/0x176
    > [] mmput+0x30/0x88
    > [] exit_mm+0xff/0x10a
    > [] do_exit+0x210/0x7a5
    > [] audit_syscall_entry+0x12d/0x160
    > [] do_group_exit+0x66/0x96
    > [] system_call_fastpath+0x16/0x1b
    > ---[ end trace 4eaa2a86a8e2da22 ]---
    > ------------[ cut here ]------------
    > WARNING: at lib/list_debug.c:48 list_del+0x30/0x87()
    > list_del corruption. prev->next should be ffff88010c4068a0, but was dead000000100100
    > Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
    > autofs4 hidp rfcomm l2cap bluetooth sunrpc microcode dm_mirror dm_log dm_multipath dm_mod
    > rfkill input_polldev sbs sbshc battery ac lp sg e1000 ide_cd_mod cdrom button acpi_memhotplug
    > parport_pc rtc_cmos rtc_core parport serio_raw rtc_lib i2c_i801 i2c_core shpchp pcspkr
    > ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
    > Pid: 3934, comm: page01 Tainted: G W 2.6.27-rc7-mm1-dd8bf0fe #1
    > Call Trace:
    > [] warn_slowpath+0xb4/0xd2
    > [] do_get_write_access+0x37d/0x3c3 [jbd]
    > [] __getblk+0x25/0x21f
    > [] bit_waitqueue+0x10/0xa0
    > [] do_get_write_access+0x37d/0x3c3 [jbd]
    > [] bit_waitqueue+0x10/0xa0
    > [] find_get_page+0x18/0xc4
    > [] bit_waitqueue+0x10/0xa0
    > [] list_del+0x30/0x87
    > [] __release_page_cgroup+0x68/0x8a
    > [] page_remove_rmap+0x10e/0x12e
    > [] unmap_vmas+0x476/0x7f2
    > [] exit_mmap+0xf0/0x176
    > [] mmput+0x30/0x88
    > [] exit_mm+0xff/0x10a
    > [] do_exit+0x210/0x7a5
    > [] audit_syscall_entry+0x12d/0x160
    > [] do_group_exit+0x66/0x96
    > [] system_call_fastpath+0x16/0x1b
    > ---[ end trace 4eaa2a86a8e2da22 ]---
    >



  2. Re: [PATCH 3/12] memcg make root cgroup unlimited.

    KAMEZAWA Hiroyuki wrote:
    > On Fri, 26 Sep 2008 14:11:00 +0530
    > Balbir Singh wrote:
    >
    >> KAMEZAWA Hiroyuki wrote:
    >>> Make the root cgroup of the memory resource controller have no limit.
    >>>
    >>> With this, users cannot set a limit on the root group. This is for making
    >>> the root cgroup a kind of trash-can.
    >>>
    >>> For accounting pages which have no owner, which are created by force_empty,
    >>> we need some cgroup with no limit. A patch rewriting force_empty will
    >>> follow this one.
    >>>
    >>> Signed-off-by: KAMEZAWA Hiroyuki

    >> This is an ABI change (although not too many people might be using it). I
    >> wonder if we should add memory.features (a set of flags; let users enable
    >> them and provide good defaults), like sched features.
    >>

    > I think a "feature" flag is too complicated at this stage.
    > We'll add more features, and things are not settled yet.
    >


    I know... but breaking the ABI is a bad, bad thing. We'll have to keep the
    feature flags extensible (add new things). If we all feel we don't have enough
    users affected by this change, I might agree with you and make that change.
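
    To make the idea concrete: a memory.features knob in the style of
    sched_features would reduce to a bitmask plus a test helper. A minimal
    sketch (all names here are hypothetical, not from this patchset):

        /* Hypothetical feature bitmask sketching the memory.features idea. */
        enum memcg_feature {
                MEMCG_FEAT_ROOT_UNLIMITED = 1 << 0,     /* root cgroup has no limit */
                /* new flags can be appended without breaking existing users */
        };

        static unsigned long memcg_features __read_mostly =
                MEMCG_FEAT_ROOT_UNLIMITED;              /* the "good defaults" */

        static inline bool memcg_feat(enum memcg_feature f)
        {
                return memcg_features & f;
        }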

    > Hmm, if you don't like this, how about calling try_to_free_page() at
    > force_empty() instead of move_account()?
    >


    Not sure I understand this.


    --
    Balbir

  3. Re: [PATCH 0/12] memcg updates v5

    KAMEZAWA Hiroyuki wrote:
    > On Fri, 26 Sep 2008 13:48:58 +0530
    > Balbir Singh wrote:
    >
    >> KAMEZAWA Hiroyuki wrote:
    >>> Hi, I updated the stack and reflected comments.
    >>> Against the latest mmotm. (rc7-mm1)
    >>>
    >>> Major changes from the previous one are:
    >>> - page_cgroup allocation/lookup manner is changed.
    >>> all FLATMEM/DISCONTIGMEM/SPARSEMEM and MEMORY_HOTPLUG is supported.
    >>> - force_empty is totally rewritten. and a problem that "force_empty takes long time"
    >>> in previous version is fixed (I think...)
    >>> - reordered patches.
    >>> - first half are easy ones.
    >>> - second half are big ones.
    >>>
    >>> I'm still testing with full debug option. No problem found yet.
    >>> (I'm afraid of race condition which have not been caught yet.)
    >>>
    >>> [1/12] avoid accounting special mappings not on LRU. (fix)
    >>> [2/12] move charge() call to swapped-in page under lock_page() (clean up)
    >>> [3/12] make root cgroup to be unlimited. (change semantics.)
    >>> [4/12] make page->mapping NULL before calling uncharge (clean up)
    >>> [5/12] make page->flags to use atomic ops. (changes in infrastructure)
    >>> [6/12] optimize stat. (clean up)
    >>> [7/12] add support function for moving account. (new function)
    >>> [8/12] rewrite force_empty to use move_account. (change semantics.)
    >>> [9/12] allocate all page_cgroup at boot. (changes in infrastructure)
    >>> [10/12] free page_cgroup from LRU in lazy way (optimize)
    >>> [11/12] add page_cgroup to LRU in lazy way (optimize)
    >>> [12/12] fix race at charging swap (fix by new logic.)
    >>>
    >>> *Any* comment is welcome.

    >> Kame,
    >>
    >> I'm beginning to review and test the patches now. It would be really nice to split
    >> the development patches from the maintenance ones. I think the full patchset has
    >> too many things and is confusing to look at.
    >>

    > I hope I can do that... but it may be difficult.
    > If you give me an ack, 1, 2, 4 and 6 can be pushed at an early stage.


    I think (1) might be OK, except for the accounting issues pointed out (a change
    in behaviour visible to the end user again, sigh!). Is (1) a serious issue? (2)
    seems OK, except for the locking change for mark_page_accessed. I am looking at
    (4) and (6) currently.

    --
    Balbir

  4. Re: [PATCH 1/12] memcg avoid accounting special mappings not on LRU

    KAMEZAWA Hiroyuki wrote:
    > On Fri, 26 Sep 2008 13:55:54 +0530
    > Balbir Singh wrote:
    >
    >> KAMEZAWA Hiroyuki wrote:
    >>> There are not-on-LRU pages which can be mapped, and they are not worth
    >>> accounting (because we can't shrink them and would need dirty code to
    >>> handle the special cases). We'd like to use the usual objrmap/radix-tree
    >>> protocol and don't want to account pages that are out of the VM's control.
    >>>
    >>> When special_mapping_fault() is called, page->mapping tends to be NULL,
    >>> and the page gets charged as an anonymous page.
    >>> insert_page() also handles some special pages from drivers.
    >>>
    >>> This patch avoids accounting such special pages.
    >>>

    >> Hmm... I am a little concerned that with these changes actual usage will be
    >> much more than what we report in memory.usage_in_bytes. Why not move them to
    >> the non-reclaimable LRU list as unevictable pages (once those patches go in,
    >> we can push this as well).

    >
    > Because they are not on LRU ...i.e. !PageLRU(page)
    >


    I understand... thanks for clarifying. My concern is w.r.t. accounting: we
    account it in RSS (file RSS).

    --
    Balbir

  5. Re: [PATCH 4/12] memcg make page->mapping NULL before calling uncharge

    KAMEZAWA Hiroyuki wrote:
    > This patch tries to make page->mapping NULL before
    > mem_cgroup_uncharge_cache_page() is called.
    >
    > "page->mapping == NULL" is a good check for "whether the page is still
    > on the radix-tree or not".
    > This patch also adds a BUG_ON() to mem_cgroup_uncharge_cache_page().
    >
    >
    > Signed-off-by: KAMEZAWA Hiroyuki


    Looks good, small nit-pick below

    > #endif
    > 	ClearPagePrivate(page);
    > 	set_page_private(page, 0);
    > -	page->mapping = NULL;
    > +	/* page->mapping contains a flag for PageAnon() */
    > +	if (PageAnon(page)) {
    > +		/* This page is uncharged at try_to_unmap(). */
    > +		page->mapping = NULL;
    > +	} else {
    > +		/* Obsolete file cache should be uncharged */
    > +		page->mapping = NULL;
    > +		mem_cgroup_uncharge_cache_page(page);
    > +	}
    >


    Isn't it better and correct coding style to do

    	/*
    	 * Uncharge obsolete file cache
    	 */
    	if (!PageAnon(page))
    		mem_cgroup_uncharge_cache_page(page);
    	/* else - uncharged at try_to_unmap() */
    	page->mapping = NULL;


    --
    Balbir

  6. Re: [PATCH 1/12] memcg avoid accounting special mappings not on LRU

    On Fri, 26 Sep 2008 15:02:13 +0530
    Balbir Singh wrote:

    > KAMEZAWA Hiroyuki wrote:
    > > On Fri, 26 Sep 2008 13:55:54 +0530
    > > Balbir Singh wrote:
    > >
    > >> KAMEZAWA Hiroyuki wrote:
    > >>> There are not-on-LRU pages which can be mapped, and they are not worth
    > >>> accounting (because we can't shrink them and would need dirty code to
    > >>> handle the special cases). We'd like to use the usual objrmap/radix-tree
    > >>> protocol and don't want to account pages that are out of the VM's control.
    > >>>
    > >>> When special_mapping_fault() is called, page->mapping tends to be NULL,
    > >>> and the page gets charged as an anonymous page.
    > >>> insert_page() also handles some special pages from drivers.
    > >>>
    > >>> This patch avoids accounting such special pages.
    > >>>
    > >> Hmm... I am a little concerned that with these changes actual usage will be
    > >> much more than what we report in memory.usage_in_bytes. Why not move them to
    > >> the non-reclaimable LRU list as unevictable pages (once those patches go in,
    > >> we can push this as well).

    > >
    > > Because they are not on LRU ...i.e. !PageLRU(page)
    > >

    >
    > I understand... thanks for clarifying. My concern is w.r.t. accounting: we
    > account it in RSS (file RSS).
    >

    Yes. Because page->mapping is NULL, they are accounted as RSS.

    AFAIK, what's bad about !PageLRU(page) pages is:
    1. we skip !PageLRU(page) pages in force_empty, so we cannot reduce the account.
    2. we don't know whether we can trust that such a page is treated as a usual LRU page.

    I'll update the description.

    Thanks,
    -Kame


  7. Re: [PATCH 6/12] memcg optimize percpu stat

    KAMEZAWA Hiroyuki wrote:
    > Some obvious optimizations to memcg.
    >
    > I found mem_cgroup_charge_statistics() is a little big (in object size) and
    > does unnecessary address calculation.
    > This patch is an optimization to reduce the size of this function.
    >
    > And res_counter_charge() is 'likely' to succeed.
    >
    > Changelog v3->v4:
    > - merged with an other leaf patch.
    >
    > Signed-off-by: KAMEZAWA Hiroyuki
    >
    > mm/memcontrol.c | 18 ++++++++++--------
    > 1 file changed, 10 insertions(+), 8 deletions(-)
    >
    > Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
    > ===================================================================
    > --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
    > +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
    > @@ -66,11 +66,10 @@ struct mem_cgroup_stat {
    >  /*
    >   * For accounting under irq disable, no need for increment preempt count.
    >   */
    > -static void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat *stat,
    > +static inline void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat_cpu *stat,
    >  		enum mem_cgroup_stat_index idx, int val)
    >  {
    > -	int cpu = smp_processor_id();
    > -	stat->cpustat[cpu].count[idx] += val;
    > +	stat->count[idx] += val;
    >  }
    >
    >  static s64 mem_cgroup_read_stat(struct mem_cgroup_stat *stat,
    > @@ -237,18 +236,21 @@ static void mem_cgroup_charge_statistics
    >  {
    >  	int val = (charge)? 1 : -1;
    >  	struct mem_cgroup_stat *stat = &mem->stat;
    > +	struct mem_cgroup_stat_cpu *cpustat;
    >
    >  	VM_BUG_ON(!irqs_disabled());
    > +
    > +	cpustat = &stat->cpustat[smp_processor_id()];
    >  	if (PageCgroupCache(pc))
    > -		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_CACHE, val);
    > +		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_CACHE, val);
    >  	else
    > -		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
    > +		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS, val);
    >
    >  	if (charge)
    > -		__mem_cgroup_stat_add_safe(stat,
    > +		__mem_cgroup_stat_add_safe(cpustat,
    >  				MEM_CGROUP_STAT_PGPGIN_COUNT, 1);
    >  	else
    > -		__mem_cgroup_stat_add_safe(stat,
    > +		__mem_cgroup_stat_add_safe(cpustat,
    >  				MEM_CGROUP_STAT_PGPGOUT_COUNT, 1);
    >  }
    >
    > @@ -609,7 +611,7 @@ static int mem_cgroup_charge_common(stru
    >  		css_get(&memcg->css);
    >  	}
    >
    > -	while (res_counter_charge(&mem->res, PAGE_SIZE)) {
    > +	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
    >  		if (!(gfp_mask & __GFP_WAIT))
    >  			goto out;
    >
    >


    Looks good to me

    Acked-by: Balbir Singh

    --
    Balbir

  8. Re: [PATCH 3/12] memcg make root cgroup unlimited.

    On Fri, 26 Sep 2008 14:59:48 +0530
    Balbir Singh wrote:

    > > I think a "feature" flag is too complicated at this stage.
    > > We'll add more features, and things are not settled yet.
    > >

    >
    > I know... but breaking the ABI is a bad, bad thing. We'll have to keep the
    > feature flags extensible (add new things). If we all feel we don't have enough
    > users affected by this change, I might agree with you and make that change.
    >

    Hmm... I'll drop this and add force_empty-may-fail logic to the force_empty changes.

    > > Hmm, if you don't like this, how about calling try_to_free_page() at
    > > force_empty() instead of move_account()?
    > >

    >
    > Not sure I understand this.
    >

    In the following series, force_empty() uses move_account() rather than
    forgetting all charges. This way, accounted file caches stay accounted and the
    whole accounting remains sane.

    Another choice is calling try_to_free_pages() at force_empty, rather than
    forgetting everything, and driving memory usage down to zero. This would also
    keep the whole accounting sane.
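
    As a minimal sketch, that second choice would be a loop over the existing
    per-cgroup reclaim entry point (the loop shape and error handling here are
    assumptions, not posted code):

        /* Shrink a cgroup's usage to zero via reclaim. May fail, e.g. when
         * the remaining pages are mlocked and cannot be reclaimed. */
        static int mem_cgroup_shrink_usage_to_zero(struct mem_cgroup *mem)
        {
                while (res_counter_read_u64(&mem->res, RES_USAGE) > 0) {
                        if (!try_to_free_mem_cgroup_pages(mem, GFP_KERNEL))
                                return -EBUSY;  /* no progress was made */
                        cond_resched();
                }
                return 0;
        }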

    Thanks,
    -Kame


  9. Re: [PATCH 4/12] memcg make page->mapping NULL before calling uncharge

    On Fri, 26 Sep 2008 15:17:48 +0530
    Balbir Singh wrote:

    > KAMEZAWA Hiroyuki wrote:
    > > This patch tries to make page->mapping NULL before
    > > mem_cgroup_uncharge_cache_page() is called.
    > >
    > > "page->mapping == NULL" is a good check for "whether the page is still
    > > on the radix-tree or not".
    > > This patch also adds a BUG_ON() to mem_cgroup_uncharge_cache_page().
    > >
    > >
    > > Signed-off-by: KAMEZAWA Hiroyuki

    >
    > Looks good, small nit-pick below
    >
    > > #endif
    > > 	ClearPagePrivate(page);
    > > 	set_page_private(page, 0);
    > > -	page->mapping = NULL;
    > > +	/* page->mapping contains a flag for PageAnon() */
    > > +	if (PageAnon(page)) {
    > > +		/* This page is uncharged at try_to_unmap(). */
    > > +		page->mapping = NULL;
    > > +	} else {
    > > +		/* Obsolete file cache should be uncharged */
    > > +		page->mapping = NULL;
    > > +		mem_cgroup_uncharge_cache_page(page);
    > > +	}
    > >

    >
    > Isn't it better and correct coding style to do
    >
    > 	/*
    > 	 * Uncharge obsolete file cache
    > 	 */
    > 	if (!PageAnon(page))
    > 		mem_cgroup_uncharge_cache_page(page);
    > 	/* else - uncharged at try_to_unmap() */
    > 	page->mapping = NULL;
    >

    Yeah, maybe.
    I always wonder what I should do when I want to add a comment to an
    if-then-else...

    But OK, I will remove the {}.

    Thanks,
    -Kame




  10. Re: [PATCH 0/12] memcg updates v5

    On Fri, 26 Sep 2008 15:01:46 +0530
    Balbir Singh wrote:

    > KAMEZAWA Hiroyuki wrote:
    > > On Fri, 26 Sep 2008 13:48:58 +0530
    > > Balbir Singh wrote:
    > >
    > >> KAMEZAWA Hiroyuki wrote:
    > >>> Hi, I updated the stack and reflected comments.
    > >>> Against the latest mmotm. (rc7-mm1)
    > >>>
    > >>> Major changes from the previous one are:
    > >>> - page_cgroup allocation/lookup manner is changed.
    > >>> all FLATMEM/DISCONTIGMEM/SPARSEMEM and MEMORY_HOTPLUG is supported.
    > >>> - force_empty is totally rewritten. and a problem that "force_empty takes long time"
    > >>> in previous version is fixed (I think...)
    > >>> - reordered patches.
    > >>> - first half are easy ones.
    > >>> - second half are big ones.
    > >>>
    > >>> I'm still testing with full debug option. No problem found yet.
    > >>> (I'm afraid of race condition which have not been caught yet.)
    > >>>
    > >>> [1/12] avoid accounting special mappings not on LRU. (fix)
    > >>> [2/12] move charge() call to swapped-in page under lock_page() (clean up)
    > >>> [3/12] make root cgroup to be unlimited. (change semantics.)
    > >>> [4/12] make page->mapping NULL before calling uncharge (clean up)
    > >>> [5/12] make page->flags to use atomic ops. (changes in infrastructure)
    > >>> [6/12] optimize stat. (clean up)
    > >>> [7/12] add support function for moving account. (new function)
    > >>> [8/12] rewrite force_empty to use move_account. (change semantics.)
    > >>> [9/12] allocate all page_cgroup at boot. (changes in infrastructure)
    > >>> [10/12] free page_cgroup from LRU in lazy way (optimize)
    > >>> [11/12] add page_cgroup to LRU in lazy way (optimize)
    > >>> [12/12] fix race at charging swap (fix by new logic.)
    > >>>
    > >>> *Any* comment is welcome.
    > >> Kame,
    > >>
    > >> I'm beginning to review and test the patches now. It would be really nice to split
    > >> the development patches from the maintenance ones. I think the full patchset has
    > >> too many things and is confusing to look at.
    > >>

    > > I hope I can do that... but it may be difficult.
    > > If you give me an ack, 1, 2, 4 and 6 can be pushed at an early stage.

    >
    > I think (1) might be OK, except for the accounting issues pointed out (a change
    > in behaviour visible to the end user again, sigh!).

    But it was just a BUG from my point of view...

    > Is (1) a serious issue?

    considering force_empty(), it's serious.

    > (2) seems OK, except for the locking change for mark_page_accessed. I am looking at
    > (4) and (6) currently.
    >

    Thanks,
    -Kame

    > --
    > Balbir
    >



  11. Re: [PATCH 0/12] memcg updates v5

    On Fri, 26 Sep 2008 14:24:55 +0900
    Daisuke Nishimura wrote:
    > Unfortunately, some bugs still remain...
    >


    Thank you for your patient, thorough testing!

    I'm now testing the following (and will run an overnight test).
    For the past hour it seems to work well
    (under your test, which usually panics within 10-20 min on my box).

    ==
    page_cgroup is not valid until pc->mem_cgroup is set to an appropriate value.
    There is a small race between Set-Used-Bit and Set-Proper-pc->mem_cgroup.
    This patch tries to fix that by adding a PCG_VALID bit.

    Signed-off-by: KAMEZAWA Hiroyuki

    include/linux/page_cgroup.h | 3 +++
    mm/memcontrol.c | 22 ++++++++++++++--------
    2 files changed, 17 insertions(+), 8 deletions(-)

    Index: mmotm-2.6.27-rc7+/include/linux/page_cgroup.h
    ===================================================================
    --- mmotm-2.6.27-rc7+.orig/include/linux/page_cgroup.h
    +++ mmotm-2.6.27-rc7+/include/linux/page_cgroup.h
    @@ -21,6 +21,7 @@ struct page_cgroup *lookup_page_cgroup(s

     enum {
     	/* flags for mem_cgroup */
    +	PCG_VALID, /* you can access this page cgroup */
     	PCG_LOCK, /* page cgroup is locked */
     	PCG_CACHE, /* charged as cache */
     	PCG_USED, /* this object is in use. */
    @@ -50,6 +51,10 @@ static inline int TestSetPageCgroup##una
     /* Cache flag is set only once (at allocation) */
     TESTPCGFLAG(Cache, CACHE)

    +TESTPCGFLAG(Valid, VALID)
    +SETPCGFLAG(Valid, VALID)
    +CLEARPCGFLAG(Valid, VALID)
    +
     TESTPCGFLAG(Used, USED)
     CLEARPCGFLAG(Used, USED)
     TESTSETPCGFLAG(Used, USED)
    Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
    ===================================================================
    --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
    +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
    @@ -340,7 +340,7 @@ void mem_cgroup_move_lists(struct page *
     	if (!trylock_page_cgroup(pc))
     		return;

    -	if (PageCgroupUsed(pc) && PageCgroupLRU(pc)) {
    +	if (PageCgroupValid(pc) && PageCgroupLRU(pc)) {
     		mem = pc->mem_cgroup;
     		mz = page_cgroup_zoneinfo(pc);
     		spin_lock_irqsave(&mz->lru_lock, flags);
    @@ -434,7 +434,7 @@ unsigned long mem_cgroup_isolate_pages(u
     	list_for_each_entry_safe_reverse(pc, tmp, src, lru) {
     		if (scan >= nr_to_scan)
     			break;
    -		if (unlikely(!PageCgroupUsed(pc)))
    +		if (unlikely(!PageCgroupValid(pc)))
     			continue;
     		page = pc->page;

    @@ -511,7 +511,7 @@ int mem_cgroup_move_account(struct page
     		return ret;
     	}

    -	if (!PageCgroupUsed(pc)) {
    +	if (!PageCgroupValid(pc)) {
     		res_counter_uncharge(&to->res, PAGE_SIZE);
     		goto out;
     	}
    @@ -580,6 +580,7 @@ __set_page_cgroup_lru(struct memcg_percp
     	unsigned long flags;
     	struct mem_cgroup_per_zone *mz, *prev_mz;
     	struct page_cgroup *pc;
    +	struct mem_cgroup *mem;
     	int i, nr;

     	local_irq_save(flags);
    @@ -589,6 +590,7 @@ __set_page_cgroup_lru(struct memcg_percp

     	for (i = nr - 1; i >= 0; i--) {
     		pc = mpv->vec[i];
    +		mem = pc->mem_cgroup;
     		mz = page_cgroup_zoneinfo(pc);
     		if (prev_mz != mz) {
     			if (prev_mz)
    @@ -596,9 +598,11 @@ __set_page_cgroup_lru(struct memcg_percp
     			prev_mz = mz;
     			spin_lock(&mz->lru_lock);
     		}
    -		if (PageCgroupUsed(pc) && !PageCgroupLRU(pc)) {
    -			SetPageCgroupLRU(pc);
    -			__mem_cgroup_add_list(mz, pc);
    +		if (PageCgroupValid(pc) && !PageCgroupLRU(pc)) {
    +			if (mem == pc->mem_cgroup) {
    +				SetPageCgroupLRU(pc);
    +				__mem_cgroup_add_list(mz, pc);
    +			}
     		}
     	}

    @@ -790,6 +794,7 @@ void mem_cgroup_commit_charge(struct pag
     	}

     	pc->mem_cgroup = mem;
    +	SetPageCgroupValid(pc);
     	set_page_cgroup_lru(pc);
     	css_put(&mem->css);
     	preempt_enable();
    @@ -928,6 +933,7 @@ __mem_cgroup_uncharge_common(struct page
     		return;
     	preempt_disable();
     	lock_page_cgroup(pc);
    +	ClearPageCgroupValid(pc);
     	ClearPageCgroupUsed(pc);
     	mem = pc->mem_cgroup;
     	unlock_page_cgroup(pc);
    @@ -970,7 +976,7 @@ int mem_cgroup_prepare_migration(struct

     	pc = lookup_page_cgroup(page);
     	lock_page_cgroup(pc);
    -	if (PageCgroupUsed(pc)) {
    +	if (PageCgroupValid(pc)) {
     		mem = pc->mem_cgroup;
     		css_get(&mem->css);
     		if (PageCgroupCache(pc)) {
    @@ -1086,7 +1092,7 @@ static void mem_cgroup_force_empty_list(
     	spin_lock_irqsave(&mz->lru_lock, flags);
     	list_for_each_entry_safe(pc, tmp, list, lru) {
     		page = pc->page;
    -		if (!PageCgroupUsed(pc))
    +		if (!PageCgroupValid(pc))
     			continue;
     		/* For avoiding race with speculative page cache handling. */
     		if (!PageLRU(page) || !get_page_unless_zero(page))


  12. Re: [PATCH 0/12] memcg updates v5

    On Fri, 26 Sep 2008 19:43:09 +0900
    KAMEZAWA Hiroyuki wrote:

    > On Fri, 26 Sep 2008 14:24:55 +0900
    > Daisuke Nishimura wrote:
    > > Unfortunately, some bugs still remain...
    > >

    >
    > Thank you for your patient, thorough testing!
    >
    > I'm now testing the following (and will run an overnight test).
    > For the past hour it seems to work well
    > (under your test, which usually panics within 10-20 min on my box).
    >

    This helps some, but causes other trouble. (But I know what it is.)
    I'll add lock_page_cgroup() to the charge side and see what happens.
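
    In other words, the charge side would take the pc lock across the whole
    commit. A minimal sketch of that direction (an assumption about the shape
    of the change, not the patch that was eventually posted):

        static int mem_cgroup_commit_charge_locked(struct page_cgroup *pc,
                                                   struct mem_cgroup *mem)
        {
                lock_page_cgroup(pc);
                if (TestSetPageCgroupUsed(pc)) {
                        /* someone else charged this page_cgroup first */
                        unlock_page_cgroup(pc);
                        return -EBUSY;
                }
                pc->mem_cgroup = mem;   /* published under the same lock */
                SetPageCgroupValid(pc);
                unlock_page_cgroup(pc);
                set_page_cgroup_lru(pc);
                return 0;
        }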

    Thanks,
    -Kame



    > ==
    > page_cgroup is not valid until pc->mem_cgroup is set to an appropriate value.
    > There is a small race between Set-Used-Bit and Set-Proper-pc->mem_cgroup.
    > This patch tries to fix that by adding a PCG_VALID bit.
    >
    > Signed-off-by: KAMEZAWA Hiroyuki
    >
    > include/linux/page_cgroup.h | 3 +++
    > mm/memcontrol.c | 22 ++++++++++++++--------
    > 2 files changed, 17 insertions(+), 8 deletions(-)
    >
    > Index: mmotm-2.6.27-rc7+/include/linux/page_cgroup.h
    > ===================================================================
    > --- mmotm-2.6.27-rc7+.orig/include/linux/page_cgroup.h
    > +++ mmotm-2.6.27-rc7+/include/linux/page_cgroup.h
    > @@ -21,6 +21,7 @@ struct page_cgroup *lookup_page_cgroup(s
    >
    > enum {
    > /* flags for mem_cgroup */
    > + PCG_VALID, /* you can access this page cgroup */
    > PCG_LOCK, /* page cgroup is locked */
    > PCG_CACHE, /* charged as cache */
    > PCG_USED, /* this object is in use. */
    > @@ -50,6 +51,10 @@ static inline int TestSetPageCgroup##una
    > /* Cache flag is set only once (at allocation) */
    > TESTPCGFLAG(Cache, CACHE)
    >
    > +TESTPCGFLAG(Valid, VALID)
    > +SETPCGFLAG(Valid, VALID)
    > +CLEARPCGFLAG(Valid, VALID)
    > +
    > TESTPCGFLAG(Used, USED)
    > CLEARPCGFLAG(Used, USED)
    > TESTSETPCGFLAG(Used, USED)
    > Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
    > ===================================================================
    > --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
    > +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
    > @@ -340,7 +340,7 @@ void mem_cgroup_move_lists(struct page *
    > if (!trylock_page_cgroup(pc))
    > return;
    >
    > - if (PageCgroupUsed(pc) && PageCgroupLRU(pc)) {
    > + if (PageCgroupValid(pc) && PageCgroupLRU(pc)) {
    > mem = pc->mem_cgroup;
    > mz = page_cgroup_zoneinfo(pc);
    > spin_lock_irqsave(&mz->lru_lock, flags);
    > @@ -434,7 +434,7 @@ unsigned long mem_cgroup_isolate_pages(u
    > list_for_each_entry_safe_reverse(pc, tmp, src, lru) {
    > if (scan >= nr_to_scan)
    > break;
    > - if (unlikely(!PageCgroupUsed(pc)))
    > + if (unlikely(!PageCgroupValid(pc)))
    > continue;
    > page = pc->page;
    >
    > @@ -511,7 +511,7 @@ int mem_cgroup_move_account(struct page
    > return ret;
    > }
    >
    > - if (!PageCgroupUsed(pc)) {
    > + if (!PageCgroupValid(pc)) {
    > res_counter_uncharge(&to->res, PAGE_SIZE);
    > goto out;
    > }
    > @@ -580,6 +580,7 @@ __set_page_cgroup_lru(struct memcg_percp
    > unsigned long flags;
    > struct mem_cgroup_per_zone *mz, *prev_mz;
    > struct page_cgroup *pc;
    > + struct mem_cgroup *mem;
    > int i, nr;
    >
    > local_irq_save(flags);
    > @@ -589,6 +590,7 @@ __set_page_cgroup_lru(struct memcg_percp
    >
    > for (i = nr - 1; i >= 0; i--) {
    > pc = mpv->vec[i];
    > + mem = pc->mem_cgroup;
    > mz = page_cgroup_zoneinfo(pc);
    > if (prev_mz != mz) {
    > if (prev_mz)
    > @@ -596,9 +598,11 @@ __set_page_cgroup_lru(struct memcg_percp
    > prev_mz = mz;
    > spin_lock(&mz->lru_lock);
    > }
    > - if (PageCgroupUsed(pc) && !PageCgroupLRU(pc)) {
    > - SetPageCgroupLRU(pc);
    > - __mem_cgroup_add_list(mz, pc);
    > + if (PageCgroupValid(pc) && !PageCgroupLRU(pc)) {
    > + if (mem == pc->mem_cgroup) {
    > + SetPageCgroupLRU(pc);
    > + __mem_cgroup_add_list(mz, pc);
    > + }
    > }
    > }
    >
    > @@ -790,6 +794,7 @@ void mem_cgroup_commit_charge(struct pag
    > }
    >
    > pc->mem_cgroup = mem;
    > + SetPageCgroupValid(pc);
    > set_page_cgroup_lru(pc);
    > css_put(&mem->css);
    > preempt_enable();
    > @@ -928,6 +933,7 @@ __mem_cgroup_uncharge_common(struct page
    > return;
    > preempt_disable();
    > lock_page_cgroup(pc);
    > + ClearPageCgroupValid(pc);
    > ClearPageCgroupUsed(pc);
    > mem = pc->mem_cgroup;
    > unlock_page_cgroup(pc);
    > @@ -970,7 +976,7 @@ int mem_cgroup_prepare_migration(struct
    >
    > pc = lookup_page_cgroup(page);
    > lock_page_cgroup(pc);
    > - if (PageCgroupUsed(pc)) {
    > + if (PageCgroupValid(pc)) {
    > mem = pc->mem_cgroup;
    > css_get(&mem->css);
    > if (PageCgroupCache(pc)) {
    > @@ -1086,7 +1092,7 @@ static void mem_cgroup_force_empty_list(
    > spin_lock_irqsave(&mz->lru_lock, flags);
    > list_for_each_entry_safe(pc, tmp, list, lru) {
    > page = pc->page;
    > - if (!PageCgroupUsed(pc))
    > + if (!PageCgroupValid(pc))
    > continue;
    > /* For avoiding race with speculative page cache handling. */
    > if (!PageLRU(page) || !get_page_unless_zero(page))
    >

  13. Re: [PATCH 9/12] memcg allocate all page_cgroup at boot

    On Fri, 26 Sep 2008 10:43:36 +0900
    KAMEZAWA Hiroyuki wrote:

    > > > -	VM_BUG_ON(pc->page != page);
    > > > +	pc = lookup_page_cgroup(page);
    > > > +	if (unlikely(!pc || !PageCgroupUsed(pc)))
    > > > +		return;
    > > > +	preempt_disable();
    > > > +	lock_page_cgroup(pc);
    > > > +	if (unlikely(page_mapped(page))) {
    > > > +		unlock_page_cgroup(pc);
    > > > +		preempt_enable();
    > > > +		return;
    > > > +	}

    > > Just for clarification, in what sequence will the page be mapped here?
    > > mem_cgroup_uncharge_page checks whether the page is mapped.
    > >


    I think I saw the page_mapped() case (in your shmem/page test).
    I'll check what causes this if I have time.

    Thanks,
    -Kame


  14. Re: [PATCH 0/12] memcg updates v5

    On Fri, 26 Sep 2008 19:36:02 +0900
    KAMEZAWA Hiroyuki wrote:

    > > I think (1) might be OK, except for the accounting issues pointed out (a change
    > > in behaviour visible to the end user again, sigh!).

    > But it was just a BUG from my point of view...
    >
    > > Is (1) a serious issue?

    > considering force_empty(), it's serious.
    >
    > > (2) seems OK, except for the locking change for mark_page_accessed. I am looking at
    > > (4) and (6) currently.
    > >


    I'll proceed in the following way next Monday,
    dividing the patches into 2 sets.

    In the early fix/optimize set:
    - push (2)
    - push (4)
    - push (6)
    - push (1)

    Drop (3).

    I don't want to remove (all?) pages-never-on-LRU before fixing force_empty.

    In the updates set:
    - introduce atomic flags. (5)
    - add move_account() function (7)
    - add memory.attribute to each memcg dir. (NEW)
    - enhance force_empty (was (8))
    - remove the "forget all" logic, and add an attribute to select between
      the following 2 behaviors:
      - call try_to_free_page() until the usage goes down to 0.
        This allows failure (if a page is mlocked, we can't free it). (NEW)
      - call move_account() to move all charges to its parent (as much as possible). (NEW)
      In the future, I'd like to add a trash-box cgroup for force_empty somewhere.
    - allocate all page_cgroup at boot (9)
    - lazy lru free/add (10, 11) with fixes.
    - fix race at charging swap. (12)

    After (9), every page and its page_cgroup have a one-to-one relationship, and
    we want to assume that "if a page is alive and on the LRU, it's accounted and
    has a page_cgroup." A sketch of the resulting lookup is below.
    (Another team, bio cgroup, wants to use page_cgroup, and I want to make that easy.)
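
    A minimal sketch of the lookup that (9) makes possible (FLATMEM flavour;
    the field names here are assumptions, not the posted patch):

        /* One statically allocated page_cgroup per pfn in each node. */
        struct page_cgroup *lookup_page_cgroup(struct page *page)
        {
                unsigned long pfn = page_to_pfn(page);
                pg_data_t *pgdat = NODE_DATA(page_to_nid(page));
                struct page_cgroup *base = pgdat->node_page_cgroup;

                if (unlikely(!base))
                        return NULL;
                return base + (pfn - pgdat->node_start_pfn);
        }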

    For this, the fix to force_empty's "forget all" behavior is necessary.
    SwapCache handling is also necessary, but I'd like to postpone it until the
    next set because it's complicated.

    After all of the above:
    - handle swap cache
    - Mem+Swap controller.
    - add trash-box feature?
    - add memory.shrink_usage_to file.

    It's a long way to what I really want to do....


    Thanks,
    -Kame


  15. Re: [PATCH 9/12] memcg allocate all page_cgroup at boot

    On Fri, 26 Sep 2008 14:54:22 +0900
    Daisuke Nishimura wrote:
    > > Sorry, my brain seems to be sleeping.. above page_mapped() check doesn't
    > > help this situation. Maybe this page_mapped() check is not necessary
    > > because it's of no use.
    > >
    > > I think this kind of problem will not be fixed until we handle SwapCache.
    > >

    > I've not fully understood yet what [12/12] does, but if we handle
    > swapcache properly, [12/12] would become unnecessary?
    >


    Let me illustrate the trouble more precisely.

    In do_swap_page(), the page is charged when the SwapCache lookup ends.
    Here,
    - the page is charged if it is not mapped.
    - it is not charged if it is already mapped.
    set_pte() etc. are done under the appropriate lock.

    On the other side, when a task exits, zap_pte_range() is called.
    It calls page_remove_rmap().

    Case A) The following is a race:

        Thread A (do_swap_page)         Thread B (zap_pte_range)

        (1) try charge (mapcount=1)
                                        (2) page_remove_rmap()
                                        (3) uncharge page
        (4) map it

    Then,
    at (1), mapcount=1 and this page is not charged.
    at (2), page_remove_rmap() is called and mapcount goes down to zero;
            uncharge (3) is called.
    at (4), at the end of do_swap_page(), page->mapcount=1 but the page is
            not charged.

    Case B) Another scenario:

        Thread A (do_swap_page)         Thread B (zap_pte_range)

        (1) try charge (mapcount=1)
                                        (2) page_remove_rmap()
        (3) map it
                                        (4) uncharge is called

    In (4), uncharge is called, but mapcount has gone back up to 1.

    The protocol in 12/12 is for case (A).
    On top of 12/12, double-checking page_mapped() under lock_page_cgroup()
    will be the fix for case (B).
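
    A minimal sketch of that case (B) double-check (names follow
    __mem_cgroup_uncharge_common(); an illustration, not the posted patch):

        static void uncharge_if_unmapped(struct page *page)
        {
                struct page_cgroup *pc = lookup_page_cgroup(page);
                struct mem_cgroup *mem;

                if (!pc || !PageCgroupUsed(pc))
                        return;
                lock_page_cgroup(pc);
                /* Re-check under the lock: a racing do_swap_page() may
                 * have mapped the page again in the meantime. */
                if (page_mapped(page)) {
                        unlock_page_cgroup(pc);
                        return;
                }
                ClearPageCgroupUsed(pc);
                mem = pc->mem_cgroup;
                unlock_page_cgroup(pc);
                res_counter_uncharge(&mem->res, PAGE_SIZE);
                css_put(&mem->css);
        }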

    Huu, I don't like swap-cache.
    Anyway, we'll have to handle swap cache later.

    Thanks,
    -Kame


  16. Re: [PATCH 5/12] memcg make page_cgroup->flags atomic

    KAMEZAWA Hiroyuki wrote:
    > This patch makes page_cgroup->flags use atomic ops and defines
    > functions (and macros) to access it.
    >
    > This patch itself makes memcg slower, but its final purpose is
    > to remove lock_page_cgroup() and allow fast access to page_cgroup.
    > (And total performance will increase after all patches are applied.)
    >


    Looks good to me

    Acked-by: Balbir Singh
    provided we push in the lockless ones too

    --
    Balbir

  17. Re: [PATCH 7/12] memcg add function to move account

    KAMEZAWA Hiroyuki wrote:
    > This patch provides a function to move account information of a page between
    > mem_cgroups.
    >


    What is the interface for moving the accounting? Is it an explicit call like
    force_empty? The other concern I have is about merging two LRU lists: when we
    move LRU pages from one mem_cgroup to another, where do we add them? To the
    head or the tail? I think we need to think about it and document it well.

    The other thing is that once we have multi-hierarchy support (which we really
    need), we need to move the accounting to the parent instead of root.

    > This moving of page_cgroup is done under
    > - lru_lock of source/destination mem_cgroup is held.


    I suppose you mean and instead of either for the lru_lock

    > - lock_page_cgroup() is held.
    >
    > Then, a routine which touches pc->mem_cgroup without lock_page_cgroup() should
    > confirm pc->mem_cgroup is still valid or not. Typical code can be the following.
    >
    > (while page is not under lock_page())
    > 	mem = pc->mem_cgroup;
    > 	mz = page_cgroup_zoneinfo(pc);
    > 	spin_lock_irqsave(&mz->lru_lock, flags);
    > 	if (pc->mem_cgroup == mem)
    > 		..... /* some list handling */
    > 	spin_unlock_irqrestore(&mz->lru_lock, flags);
    >
    > Or better way is
    > lock_page_cgroup(pc);
    > ....
    > unlock_page_cgroup(pc);
    >
    > But you should mind the lock nesting and avoid deadlock.
    > (trylock is better if it's OK.)
    >
    > If you find page_cgroup from mem_cgroup's LRU under mz->lru_lock,
    > you don't have to worry about what pc->mem_cgroup points to.
    >
    > Changelog: (v4) -> (v5)
    > - check for lock_page() is removed.
    > - rewrote description.
    >
    > Changelog: (v2) -> (v4)
    > - added lock_page_cgroup().
    > - splitted out from new-force-empty patch.
    > - added how-to-use text.
    > - fixed race in __mem_cgroup_uncharge_common().
    >
    > Signed-off-by: KAMEZAWA Hiroyuki
    >
    > mm/memcontrol.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
    > 1 file changed, 81 insertions(+), 3 deletions(-)
    >
    > Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
    > ===================================================================
    > --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
    > +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
    > @@ -426,6 +426,7 @@ int task_in_mem_cgroup(struct task_struc
    > void mem_cgroup_move_lists(struct page *page, enum lru_list lru)
    > {
    > struct page_cgroup *pc;
    > + struct mem_cgroup *mem;
    > struct mem_cgroup_per_zone *mz;
    > unsigned long flags;
    >
    > @@ -444,9 +445,14 @@ void mem_cgroup_move_lists(struct page *
    >
    > pc = page_get_page_cgroup(page);
    > if (pc) {
    > + mem = pc->mem_cgroup;
    > mz = page_cgroup_zoneinfo(pc);
    > spin_lock_irqsave(&mz->lru_lock, flags);
    > - __mem_cgroup_move_lists(pc, lru);
    > + /*
    > + * check against the race with move_account.
    > + */
    > + if (likely(mem == pc->mem_cgroup))
    > + __mem_cgroup_move_lists(pc, lru);
    > spin_unlock_irqrestore(&mz->lru_lock, flags);
    > }
    > unlock_page_cgroup(page);
    > @@ -567,6 +573,70 @@ unsigned long mem_cgroup_isolate_pages(u
    > return nr_taken;
    > }
    >
    > +/**
    > + * mem_cgroup_move_account - move account of the page
    > + * @page ... the target page of being moved.
    > + * @pc ... page_cgroup of the page.
    > + * @from ... mem_cgroup which the page is moved from.
    > + * @to ... mem_cgroup which the page is moved to.
    > + *
    > + * The caller must confirm following.
    > + * 1. disable irq.
    > + * 2. lru_lock of old mem_cgroup should be held.
    > + * 3. pc is guaranteed to be valid and on mem_cgroup's LRU.
    > + *
    > + * Because we cannot call try_to_free_page() here, the caller must guarantee
    > + * this moving of charge never fails. (if charge fails, this call fails.)
    > + * Currently this is called only against root cgroup.
    > + * which has no limitation of resource.
    > + * Returns 0 at success, returns 1 at failure.
    > + */
    > +int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc,
    > + struct mem_cgroup *from, struct mem_cgroup *to)
    > +{
    > + struct mem_cgroup_per_zone *from_mz, *to_mz;
    > + int nid, zid;
    > + int ret = 1;
    > +
    > + VM_BUG_ON(!irqs_disabled());
    > +
    > + nid = page_to_nid(page);
    > + zid = page_zonenum(page);
    > + from_mz = mem_cgroup_zoneinfo(from, nid, zid);
    > + to_mz = mem_cgroup_zoneinfo(to, nid, zid);
    > +
    > + if (res_counter_charge(&to->res, PAGE_SIZE)) {
    > + /* Now, we assume no_limit...no failure here. */
    > + return ret;
    > + }


    Please BUG_ON() if the charging fails, so we can be sure we catch broken
    assumptions.

    > + if (!try_lock_page_cgroup(page)) {
    > + res_counter_uncharge(&to->res, PAGE_SIZE);
    > + return ret;
    > + }
    > +
    > + if (page_get_page_cgroup(page) != pc) {
    > + res_counter_uncharge(&to->res, PAGE_SIZE);
    > + goto out;
    > + }
    > +
    > + if (spin_trylock(&to_mz->lru_lock)) {


    The spin_trylock is to avoid deadlocks, right?

    > + __mem_cgroup_remove_list(from_mz, pc);
    > + css_put(&from->css);
    > + res_counter_uncharge(&from->res, PAGE_SIZE);
    > + pc->mem_cgroup = to;
    > + css_get(&to->css);
    > + __mem_cgroup_add_list(to_mz, pc);
    > + ret = 0;
    > + spin_unlock(&to_mz->lru_lock);
    > + } else {
    > + res_counter_uncharge(&to->res, PAGE_SIZE);
    > + }
    > +out:
    > + unlock_page_cgroup(page);
    > +
    > + return ret;
    > +}
    > +
    > /*
    > * Charge the memory controller for page usage.
    > * Return
    > @@ -754,16 +824,24 @@ __mem_cgroup_uncharge_common(struct page
    > if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
    > && ((PageCgroupCache(pc) || page_mapped(page))))
    > goto unlock;
    > -
    > +retry:
    > + mem = pc->mem_cgroup;
    > mz = page_cgroup_zoneinfo(pc);
    > spin_lock_irqsave(&mz->lru_lock, flags);
    > + if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED &&
    > + unlikely(mem != pc->mem_cgroup)) {
    > + /* MAPPED account can be done without lock_page().
    > + Check race with mem_cgroup_move_account() */


    Coding style above is broken. Can this race really occur? Why do we get mem
    before acquiring the mz->lru_lock? We don't seem to be using it.

    > + spin_unlock_irqrestore(&mz->lru_lock, flags);
    > + goto retry;
    > + }
    > __mem_cgroup_remove_list(mz, pc);
    > spin_unlock_irqrestore(&mz->lru_lock, flags);
    >
    > page_assign_page_cgroup(page, NULL);
    > unlock_page_cgroup(page);
    >
    > - mem = pc->mem_cgroup;
    > +
    > res_counter_uncharge(&mem->res, PAGE_SIZE);
    > css_put(&mem->css);
    >
    >



    --
    Balbir

  18. Re: Re: [PATCH 7/12] memcg add function to move account

    ----- Original Message -----

    >KAMEZAWA Hiroyuki wrote:
    >> This patch provides a function to move account information of a page between
    >> mem_cgroups.
    >>

    >
    >What is the interface for moving the accounting? Is it an explicit call like
    >force_empty?


    My final purpose is moving the accounting at attach_task().

    > The other concern I have is about merging two LRU lists: when we move LRU
    > pages from one mem_cgroup to another, where do we add them? To the head or
    > the tail? I think we need to think about it and document it well.
    >

    Hmm, good point. Considering force_empty, the head is better.
    (But I haven't thought about this seriously, because I assumed the root
    cgroup is unlimited.)

    I don't think this kind of internal behavior should be documented as UI
    while it's hot.

    >The other thing is that once we have multi-hierarchy support (which we really
    >need), we need to move the accounting to the parent instead of root.
    >

    Yes, of course. Please do as you like if this goes in.
    Adding the logic to do so needs more patches, so please wait.

    But, IMHO, just using try_to_free_pages() is better for hierarchy, maybe.

    >> This moving of page_cgroup is done under
    >> - lru_lock of source/destination mem_cgroup is held.

    >
    >I suppose you mean and instead of either for the lru_lock
    >

    I'll fix this comment.

    >> - lock_page_cgroup() is held.
    >>
    >> Then, a routine which touches pc->mem_cgroup without lock_page_cgroup() should
    >> confirm pc->mem_cgroup is still valid or not. Typical code can be the following.
    >>
    >> (while page is not under lock_page())
    >> mem = pc->mem_cgroup;
    >> mz = page_cgroup_zoneinfo(pc);
    >> spin_lock_irqsave(&mz->lru_lock, flags);
    >> if (pc->mem_cgroup == mem)
    >> ...../* some list handling */
    >> spin_unlock_irqrestore(&mz->lru_lock, flags);
    >>
    >> Or a better way is:
    >> lock_page_cgroup(pc);
    >> ....
    >> unlock_page_cgroup(pc);
    >>
    >> But you should check the lock nesting and avoid deadlock.
    >> (trylock is better if it's ok.)
    >>
    >> If you find page_cgroup from mem_cgroup's LRU under mz->lru_lock,
    >> you don't have to worry about what pc->mem_cgroup points to.
    >>
    >> Changelog: (v4) -> (v5)
    >> - check for lock_page() is removed.
    >> - rewrote description.
    >>
    >> Changelog: (v2) -> (v4)
    >> - added lock_page_cgroup().
    >> - split out from new-force-empty patch.
    >> - added how-to-use text.
    >> - fixed race in __mem_cgroup_uncharge_common().
    >>
    >> Signed-off-by: KAMEZAWA Hiroyuki
    >>
    >> mm/memcontrol.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
    >> 1 file changed, 81 insertions(+), 3 deletions(-)
    >>
    >> Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
    >> ===================================================================
    >> --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
    >> +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
    >> @@ -426,6 +426,7 @@ int task_in_mem_cgroup(struct task_struc
    >> void mem_cgroup_move_lists(struct page *page, enum lru_list lru)
    >> {
    >> struct page_cgroup *pc;
    >> + struct mem_cgroup *mem;
    >> struct mem_cgroup_per_zone *mz;
    >> unsigned long flags;
    >>
    >> @@ -444,9 +445,14 @@ void mem_cgroup_move_lists(struct page *
    >>
    >> pc = page_get_page_cgroup(page);
    >> if (pc) {
    >> + mem = pc->mem_cgroup;
    >> mz = page_cgroup_zoneinfo(pc);
    >> spin_lock_irqsave(&mz->lru_lock, flags);
    >> - __mem_cgroup_move_lists(pc, lru);
    >> + /*
    >> + * check against the race with move_account.
    >> + */
    >> + if (likely(mem == pc->mem_cgroup))
    >> + __mem_cgroup_move_lists(pc, lru);
    >> spin_unlock_irqrestore(&mz->lru_lock, flags);
    >> }
    >> unlock_page_cgroup(page);
    >> @@ -567,6 +573,70 @@ unsigned long mem_cgroup_isolate_pages(u
    >> return nr_taken;
    >> }
    >>
    >> +/**
    >> + * mem_cgroup_move_account - move account of the page
    >> + * @page ... the target page being moved.
    >> + * @pc ... page_cgroup of the page.
    >> + * @from ... mem_cgroup which the page is moved from.
    >> + * @to ... mem_cgroup which the page is moved to.
    >> + *
    >> + * The caller must ensure the following:
    >> + * 1. disable irq.
    >> + * 2. lru_lock of old mem_cgroup should be held.
    >> + * 3. pc is guaranteed to be valid and on mem_cgroup's LRU.
    >> + *
    >> + * Because we cannot call try_to_free_page() here, the caller must guarantee
    >> + * this moving of charge never fails. (if charge fails, this call fails.)
    >> + * Currently this is called only against the root cgroup,
    >> + * which has no resource limit.
    >> + * Returns 0 on success, 1 on failure.
    >> + */
    >> +int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc,
    >> + struct mem_cgroup *from, struct mem_cgroup *to)
    >> +{
    >> + struct mem_cgroup_per_zone *from_mz, *to_mz;
    >> + int nid, zid;
    >> + int ret = 1;
    >> +
    >> + VM_BUG_ON(!irqs_disabled());
    >> +
    >> + nid = page_to_nid(page);
    >> + zid = page_zonenum(page);
    >> + from_mz = mem_cgroup_zoneinfo(from, nid, zid);
    >> + to_mz = mem_cgroup_zoneinfo(to, nid, zid);
    >> +
    >> + if (res_counter_charge(&to->res, PAGE_SIZE)) {
    >> + /* Now, we assume no_limit...no failure here. */
    >> + return ret;
    >> + }

    >
    >Please BUG_ON() if the charging fails, so we can be sure we catch broken
    >assumptions.
    >

    I'll remove this charge() and change the protocol to be
    "the page should be pre-charged to the destination cgroup before calling move."

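    A minimal sketch of what that revised protocol could look like from the
    caller's side, assuming mem_cgroup_move_account() has had its internal
    res_counter_charge()/uncharge of 'to' removed; the helper name and caller
    shape are assumptions, not code from this series:

        /* Hypothetical caller under the "pre-charge" protocol: the destination
         * is charged before the move, and the pre-charge is rolled back if the
         * move fails (e.g. the trylock inside lost a race). */
        static int sketch_precharge_and_move(struct page *page,
                                             struct page_cgroup *pc,
                                             struct mem_cgroup *from,
                                             struct mem_cgroup *to)
        {
                if (res_counter_charge(&to->res, PAGE_SIZE))
                        return -ENOMEM;         /* cannot pre-charge 'to' */

                if (mem_cgroup_move_account(page, pc, from, to)) {
                        /* move failed; undo the pre-charge */
                        res_counter_uncharge(&to->res, PAGE_SIZE);
                        return -EBUSY;          /* caller may retry */
                }
                return 0;
        }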

    >> + if (!try_lock_page_cgroup(page)) {
    >> + res_counter_uncharge(&to->res, PAGE_SIZE);
    >> + return ret;
    >> + }
    >> +
    >> + if (page_get_page_cgroup(page) != pc) {
    >> + res_counter_uncharge(&to->res, PAGE_SIZE);
    >> + goto out;
    >> + }
    >> +
    >> + if (spin_trylock(&to_mz->lru_lock)) {

    >
    >The spin_trylock is to avoid deadlocks, right?
    >

    yes.
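
    To spell out the deadlock the trylock avoids: two concurrent movers in
    opposite directions would take the two lru_locks in opposite order. A sketch
    of the AB-BA scenario (cgroup names hypothetical):

        /*
         * CPU0: move page A -> B              CPU1: move page B -> A
         * spin_lock(&mzA->lru_lock);          spin_lock(&mzB->lru_lock);
         * spin_lock(&mzB->lru_lock); <waits>  spin_lock(&mzA->lru_lock); <waits>
         *
         * Both would wait forever. spin_trylock(&to_mz->lru_lock) instead fails
         * immediately, the move returns 1, and the caller can back off and retry.
         */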

    >> + __mem_cgroup_remove_list(from_mz, pc);
    >> + css_put(&from->css);
    >> + res_counter_uncharge(&from->res, PAGE_SIZE);
    >> + pc->mem_cgroup = to;
    >> + css_get(&to->css);
    >> + __mem_cgroup_add_list(to_mz, pc);
    >> + ret = 0;
    >> + spin_unlock(&to_mz->lru_lock);
    >> + } else {
    >> + res_counter_uncharge(&to->res, PAGE_SIZE);
    >> + }
    >> +out:
    >> + unlock_page_cgroup(page);
    >> +
    >> + return ret;
    >> +}
    >> +
    >> /*
    >> * Charge the memory controller for page usage.
    >> * Return
    >> @@ -754,16 +824,24 @@ __mem_cgroup_uncharge_common(struct page
    >> if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
    >> && ((PageCgroupCache(pc) || page_mapped(page))))
    >> goto unlock;
    >> -
    >> +retry:
    >> + mem = pc->mem_cgroup;
    >> mz = page_cgroup_zoneinfo(pc);
    >> spin_lock_irqsave(&mz->lru_lock, flags);
    >> + if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED &&
    >> + unlikely(mem != pc->mem_cgroup)) {
    >> + /* MAPPED account can be done without lock_page().
    >> + Check race with mem_cgroup_move_account() */

    >
    >Coding style above is broken.

    will fix.

    >Can this race really occur?

    yes, I think so. please check SwapCache behavior.
    But this check may disappear after I rewrite the whole series.

    >Why do we get mem before acquiring the mz->lru_lock? We don't seem to
    >be using it.

    It's costly to call page_cgroup_zoneinfo() again, so I just save mem and
    compare it with pc->mem_cgroup.

    Thanks,
    -Kame
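
    A commented restatement of that recheck-and-retry pattern, mirroring the
    hunk quoted below; declarations and the ctype check from the diff are
    omitted here, and this is an illustration rather than the final code:

        retry:
                mem = pc->mem_cgroup;           /* snapshot, also used for the
                                                 * res_counter_uncharge() later */
                mz = page_cgroup_zoneinfo(pc);  /* zone info for that snapshot */
                spin_lock_irqsave(&mz->lru_lock, flags);
                if (unlikely(mem != pc->mem_cgroup)) {
                        /* mem_cgroup_move_account() changed the owner while we
                         * waited for the lock; relock against the new owner. */
                        spin_unlock_irqrestore(&mz->lru_lock, flags);
                        goto retry;
                }
                __mem_cgroup_remove_list(mz, pc);
                spin_unlock_irqrestore(&mz->lru_lock, flags);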

    >> + spin_unlock_irqrestore(&mz->lru_lock, flags);
    >> + goto retry;
    >> + }
    >> __mem_cgroup_remove_list(mz, pc);
    >> spin_unlock_irqrestore(&mz->lru_lock, flags);
    >>
    >> page_assign_page_cgroup(page, NULL);
    >> unlock_page_cgroup(page);
    >>
    >> - mem = pc->mem_cgroup;
    >> +
    >> res_counter_uncharge(&mem->res, PAGE_SIZE);
    >> css_put(&mem->css);
    >>
    >>

    >
    >
    >--
    > Balbir



  19. Re: [PATCH 0/12] memcg updates v5

    KAMEZAWA Hiroyuki wrote:
    > On Fri, 26 Sep 2008 19:36:02 +0900
    > KAMEZAWA Hiroyuki wrote:
    >
    >>> I think (1) might be OK, except for the accounting issues pointed out (change in
    >>> behaviour visible to end user again, sigh! ).

    >> But it was just a BUG from my point of view...
    >>
    >>> Is (1) a serious issue?

    >> considering force_empty(), it's serious.
    >>
    >>> (2) seems OK, except for the locking change for mark_page_accessed. I am looking at
    >>> (4) and (6) currently.
    >>>

    >
    > I'll do in following way in the next Monday.
    > Divide patches into 2 set
    >
    > in early fix/optimize set.
    > - push (2)
    > - push (4)
    > - push (6)
    > - push (1)
    >


    Yes, sounds reasonable

    > drops (3).
    >
    > I don't want to remove all pages-never-on-LRU before fixing force_empty.
    >
    > in updates
    > - introduce atomic flags. (5)
    > - add move_account() function (7)


    without (3), don't we have a problem pushing (7)?

    > - add memory.attribute to each memcg dir. (NEW)
    > - enhance force_empty (was (8))
    > - remove "forget all" logic and add an attribute to select one of the following 2 behaviors:
    > - call try_to_free_page() until the usage goes down to 0.
    > This allows failure (if a page is mlocked, we can't free it). (NEW)
    > - call move_account() to move all charges to its parent (as much as possible) (NEW)
    > In the future, I'd like to add a trash-box cgroup for force_empty somewhere.
    > - allocate all page cgroup at boot (9)
    > - lazy lru free/add (10,11) with fixes.
    > - fix race at charging swap. (12)
    >


    I think (9) is probably the most important. I'll review it today.

    > After (9), every page and page_cgroup has a one-to-one relationship and we want to
    > assume that "if page is alive and on LRU, it's accounted and has page_cgroup."
    > (other team, bio cgroup want to use page_cgroup and I want to make it easy.)
    >
    > For this, fixing force_empty's "forget all" behavior is necessary.
    > SwapCache handling is also necessary but I'd like to postpone until next set
    > because it's complicated.
    >
    > After above all.
    > - handle swap cache
    > - Mem+Swap controller.
    > - add trashbox feature ?
    > - add memory.shrink_usage_to file.
    >
    > It's a long way to what I really want to do....
    >


    Yes, a long way to go. I want to add:

    1) Multi-hierarchy support
    2) Support for soft-limits
    3) get swappiness working (there are patches posted for it by Yamamoto-San, but
    something is broken, I suspect even in global swappiness).



    >
    > Thanks,
    > -Kame



    --
    Balbir

  20. Re: [PATCH 0/12] memcg updates v5

    On Mon, 29 Sep 2008 08:32:07 +0530
    Balbir Singh wrote:

    > KAMEZAWA Hiroyuki wrote:
    > > On Fri, 26 Sep 2008 19:36:02 +0900
    > > KAMEZAWA Hiroyuki wrote:
    > >
    > >>> I think (1) might be OK, except for the accounting issues pointed out (change in
    > >>> behaviour visible to end user again, sigh! ).
    > >> But it was just a BUG from my point of view...
    > >>
    > >>> Is (1) a serious issue?
    > >> considering force_empty(), it's serious.
    > >>
    > >>> (2) seems OK, except for the locking change for mark_page_accessed. I am looking at
    > >>> (4) and (6) currently.
    > >>>

    > >
    > > I'll do in following way in the next Monday.
    > > Divide patches into 2 set
    > >
    > > in early fix/optimize set.
    > > - push (2)
    > > - push (4)
    > > - push (6)
    > > - push (1)
    > >

    >
    > Yes, sounds reasonable
    >

    I'll just post this set today.

    > > drops (3).
    > >
    > > I don't want to remove all pages-never-on-LRU before fixing force_empty.
    > >
    > > in updates
    > > - introduce atomic flags. (5)
    > > - add move_account() function (7)

    >
    > without (3), don't we have a problem pushing (7)?
    >

    I'll add -EBUSY behavior to force_empty.
    I'm now adding (merging) code to move_account.patch for supporting force_empty.
    It seems much clearer than v5.


    > > - add memory.attribute to each memcg dir. (NEW)
    > > - enhance force_empty (was (8))
    > > - remove "forget all" logic and add an attribute to select one of the following 2 behaviors:
    > > - call try_to_free_page() until the usage goes down to 0.
    > > This allows failure (if a page is mlocked, we can't free it). (NEW)
    > > - call move_account() to move all charges to its parent (as much as possible) (NEW)
    > > In the future, I'd like to add a trash-box cgroup for force_empty somewhere.
    > > - allocate all page cgroup at boot (9)
    > > - lazy lru free/add (10,11) with fixes.
    > > - fix race at charging swap. (12)
    > >

    >
    > I think (9) is probably the most important. I'll review it today
    >

    Thanks. No major changes in my current stack from the already-posted one.

    Thanks,
    -Kame

    > > After (9), every page and page_cgroup has a one-to-one relationship and we want to
    > > assume that "if page is alive and on LRU, it's accounted and has page_cgroup."
    > > (other team, bio cgroup want to use page_cgroup and I want to make it easy.)
    > >
    > > For this, fixing force_empty's "forget all" behavior is necessary.
    > > SwapCache handling is also necessary but I'd like to postpone until next set
    > > because it's complicated.
    > >
    > > After above all.
    > > - handle swap cache
    > > - Mem+Swap controller.
    > > - add trashbox feature ?
    > > - add memory.shrink_usage_to file.
    > >
    > > It's a long way to what I really want to do....
    > >

    >
    > Yes, a long way to go. I want to add:
    >
    > 1) Multi-hierarchy support
    > 2) Support for soft-limits
    > 3) get swappiness working (there are patches posted for it by Yamamoto-San, but
    > something is broken, I suspect even in global swappiness).
    >
    >
    >
    > >
    > > Thanks,
    > > -Kame

    >
    >
    > --
    > Balbir
    >


