[patch 00/19] VM pageout scalability improvements - Kernel

Thread: [patch 00/19] VM pageout scalability improvements

  1. Re: [patch 00/19] VM pageout scalability improvements

    On Fri, 04 Jan 2008 17:34:00 +0100
    Andi Kleen wrote:
    > Lee Schermerhorn writes:
    >
    > > We can easily [he says, glibly] reproduce the hang on the anon_vma lock

    >
    > Is that a NUMA platform? On non x86? Perhaps you just need queued spinlocks?


    I really think that the anon_vma and i_mmap_lock spinlock hangs are
    due to the lack of queued spinlocks. Not because I have seen your
    system hang, but because I've seen one of Larry's test systems here
    hang in scary/amusing ways.

    With queued spinlocks the system should just slow down, not hang.

    --
    All rights reversed.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
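
For readers unfamiliar with the idea behind the "slow down, not hang" remark above, here is a minimal user-space sketch of a ticket lock, one common form of queued spinlock, written with C11 atomics. It is an illustration under stated assumptions, not the kernel's implementation: the names ticket_lock, ticket_lock_init, ticket_lock_acquire and ticket_lock_release are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical ticket lock: "next" hands out tickets in arrival order,
 * "owner" is the ticket currently allowed to hold the lock.
 */
struct ticket_lock {
    atomic_uint next;
    atomic_uint owner;
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Draw a ticket, then spin until it is served. Returns the ticket drawn. */
static unsigned int ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ; /* busy-wait; real code would insert a CPU pause hint here */
    return me;
}

/* Only the holder updates "owner", so an increment hands off to the next ticket. */
static void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

Because waiters are served strictly in the order they drew tickets, a heavily contended lock makes every CPU wait its turn, but no single waiter can be starved indefinitely the way it can with an unfair test-and-set spinlock.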

  2. Re: [patch 00/19] VM pageout scalability improvements

    On Fri, 2008-01-04 at 17:34 +0100, Andi Kleen wrote:
    > Lee Schermerhorn writes:
    >
    > > We can easily [he says, glibly] reproduce the hang on the anon_vma lock

    >
    > Is that a NUMA platform? On non x86? Perhaps you just need queued spinlocks?


    We see this on both NUMA and non-NUMA, x86_64 and ia64. The basic
    criterion for reproducing it is to be able to run thousands [or low
    tens of thousands] of tasks, continually increasing the number until
    the system just goes into reclaim. Instead of swapping, the system
    seems to hang--unresponsive from the console, but with "soft lockup"
    messages spitting out every few seconds...


    Lee


    >
    > -Andi



  3. Re: [patch 00/19] VM pageout scalability improvements

    Rik van Riel wrote:

    >On Fri, 04 Jan 2008 17:34:00 +0100
    >Andi Kleen wrote:
    >
    >
    >>Lee Schermerhorn writes:
    >>
    >>
    >>
    >>>We can easily [he says, glibly] reproduce the hang on the anon_vma lock
    >>>
    >>>

    >>Is that a NUMA platform? On non x86? Perhaps you just need queued spinlocks?
    >>
    >>

    >
    >I really think that the anon_vma and i_mmap_lock spinlock hangs are
    >due to the lack of queued spinlocks. Not because I have seen your
    >system hang, but because I've seen one of Larry's test systems here
    >hang in scary/amusing ways
    >

    Changing the anon_vma->lock into a rwlock_t helps because
    page_lock_anon_vma() can take it for read, and that's where the
    contention is. However, it's the fact that under some tests most of
    the pages are in vmas queued to one anon_vma that causes so much
    lock contention.


    >
    >With queued spinlocks the system should just slow down, not hang.
    >
    >
    >
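
The reader/writer point made in this reply (that taking the anon_vma lock for read lets lookups proceed side by side) can be sketched in user space as follows. This is only an illustration, assuming a try-lock style interface; rw_sketch and its helper functions are hypothetical names and do not reflect how the kernel's rwlock_t is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: > 0 means that many readers, 0 free, -1 one writer. */
struct rw_sketch {
    atomic_int state;
};

static void rw_init(struct rw_sketch *l)
{
    atomic_init(&l->state, 0);
}

/* Readers only need "no writer"; any number of them may hold the lock. */
static bool rw_try_read_lock(struct rw_sketch *l)
{
    int s = atomic_load(&l->state);

    while (s >= 0) {
        /* On failure, s is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return true;
    }
    return false; /* a writer holds the lock */
}

static void rw_read_unlock(struct rw_sketch *l)
{
    atomic_fetch_sub(&l->state, 1);
}

/* A writer needs the lock completely free. */
static bool rw_try_write_lock(struct rw_sketch *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void rw_write_unlock(struct rw_sketch *l)
{
    atomic_store(&l->state, 0);
}
```

Readers no longer exclude each other, but they still all update the same lock word, so a workload that funnels most pages through a single anon_vma continues to bounce that one cache line between CPUs.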




  4. Re: [patch 06/19] split LRU lists into anon & file sets

    On Wed, 02 Jan 2008 17:41:50 -0500
    linux-kernel@vger.kernel.org wrote:


    > static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
    > - struct scan_control *sc, int priority)
    > + struct scan_control *sc, int priority, int file)
    > {
    > unsigned long pgmoved;
    > int pgdeactivate = 0;
    > @@ -1128,64 +1026,65 @@ static void shrink_active_list(unsigned
    > struct list_head list[NR_LRU_LISTS];
    > struct page *page;
    > struct pagevec pvec;
    > - int reclaim_mapped = 0;
    > - enum lru_list l;
    > + enum lru_list lru;



    > + /*
    > + * For sorting active vs inactive pages, we'll use the 'anon'
    > + * elements of the local list[] array and sort out the file vs
    > + * anon pages below.
    > + */


    This is not easy to read... (and this definition affects later patches).

    How about adding a new enum used only in this function? For example:
            LRU_STAY_ACTIVE = 0,
            LRU_MOVE_INACTIVE = 1,

    Thanks,
    -Kame
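
To make the suggestion above concrete, a function-local enum could index the temporary lists roughly as sketched below. Only LRU_STAY_ACTIVE and LRU_MOVE_INACTIVE come from the mail; the enum tag lru_sort, NR_LRU_SORT, struct page_sketch, sort_active_pages() and the "referenced pages stay active" rule are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Enumerator names as suggested in the mail; used only as local array indices. */
enum lru_sort {
    LRU_STAY_ACTIVE = 0,
    LRU_MOVE_INACTIVE = 1,
    NR_LRU_SORT
};

/* Hypothetical stand-in for struct page: only the bit this sketch needs. */
struct page_sketch {
    bool referenced;
};

/*
 * Count how many pages would land on each local list. The decision rule
 * (referenced pages stay active) is an assumption, not taken from the patch.
 */
static void sort_active_pages(const struct page_sketch *pages, size_t n,
                              size_t count[NR_LRU_SORT])
{
    count[LRU_STAY_ACTIVE] = 0;
    count[LRU_MOVE_INACTIVE] = 0;

    for (size_t i = 0; i < n; i++) {
        enum lru_sort dst = pages[i].referenced ? LRU_STAY_ACTIVE
                                                : LRU_MOVE_INACTIVE;
        count[dst]++;
    }
}
```

With dedicated names, a reader no longer has to remember that the 'anon' slots of list[] temporarily mean "active vs. inactive" inside this one function.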


  5. Re: [patch 07/19] split anon & file LRUs for memcontrol code

    On Wed, 02 Jan 2008 17:41:51 -0500
    linux-kernel@vger.kernel.org wrote:

    > Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
    > ===================================================================
    > --- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c 2008-01-02 15:55:55.000000000 -0500
    > +++ linux-2.6.24-rc6-mm1/mm/vmscan.c 2008-01-02 15:56:00.000000000 -0500
    > @@ -1230,13 +1230,13 @@ static unsigned long shrink_zone(int pri
    >
    > get_scan_ratio(zone, sc, percent);
    >


    I'd be happy if this calculation could later become the following:
    ==
    if (scan_global_lru(sc)) {
            get_scan_ratio(zone, sc, percent);
    } else {
            get_scan_ratio_cgroup(sc->cgroup, sc, percent);
    }
    ==
    To do this, mem_cgroup would need to have recent_rotated_file and
    recent_rotated_anon?

    Thanks,
    -Kame


  6. Re: [patch 00/19] VM pageout scalability improvements

    On Thu, 3 Jan 2008 12:00:00 -0500
    Rik van Riel wrote:

    > On Thu, 03 Jan 2008 11:52:08 -0500
    > Lee Schermerhorn wrote:
    >
    > > Also, I should point out that the full noreclaim series includes a
    > > couple of other patches NOT posted here by Rik:
    > >
    > > 1) treat swap backed pages as nonreclaimable when no swap space is
    > > available. This addresses a problem we've seen in real life, with
    > > vmscan spending a lot of time trying to reclaim anon/shmem/tmpfs/...
    > > pages only to find that there is no swap space--add_to_swap() fails.
    > > Maybe not a problem with Rik's new anon page handling.

    >
    > If there is no swap space, my VM code will not bother scanning
    > any anon pages. This has the same effect as moving the pages
    > to the no-reclaim list, with the extra benefit of being able to
    > resume scanning the anon lists once swap space is freed.
    >

    Is this 'avoid scanning anon if no swap' feature in this set?

    Thanks
    -Kame


  7. Re: [patch 07/19] split anon & file LRUs for memcontrol code

    KAMEZAWA Hiroyuki wrote:
    > On Wed, 02 Jan 2008 17:41:51 -0500
    > linux-kernel@vger.kernel.org wrote:
    >
    >> Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
    >> ===================================================================
    >> --- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c 2008-01-02 15:55:55.000000000 -0500
    >> +++ linux-2.6.24-rc6-mm1/mm/vmscan.c 2008-01-02 15:56:00.000000000 -0500
    >> @@ -1230,13 +1230,13 @@ static unsigned long shrink_zone(int pri
    >>
    >> get_scan_ratio(zone, sc, percent);
    >>

    >
    > I'd be happy if this calculation could later become the following:
    > ==
    > if (scan_global_lru(sc)) {
    >         get_scan_ratio(zone, sc, percent);
    > } else {
    >         get_scan_ratio_cgroup(sc->cgroup, sc, percent);
    > }
    > ==
    > To do this, mem_cgroup would need to have recent_rotated_file and
    > recent_rotated_anon?


    Yes, that makes sense.

    --
    Warm Regards,
    Balbir Singh
    Linux Technology Center
    IBM, ISTL

  8. Re: [patch 00/19] VM pageout scalability improvements

    On Mon, 7 Jan 2008 19:06:10 +0900
    KAMEZAWA Hiroyuki wrote:
    > On Thu, 3 Jan 2008 12:00:00 -0500
    > Rik van Riel wrote:


    > > If there is no swap space, my VM code will not bother scanning
    > > any anon pages. This has the same effect as moving the pages
    > > to the no-reclaim list, with the extra benefit of being able to
    > > resume scanning the anon lists once swap space is freed.
    > >

    > Is this 'avoiding scanning anon if no swap' feature in this set ?


    I seem to have lost that code in a forward merge.

    Dunno if I started the forward merge from an older series that
    Lee had or if I lost the code myself...

    I'll put it back in ASAP.

    --
    All rights reversed.
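
The missing behaviour discussed above (do not scan anon pages when there is no swap space) might look roughly like the following. This is a sketch under assumptions, not the code that was lost in the merge: adjust_for_swap(), the SCAN_ANON/SCAN_FILE indices and the free_swap_pages parameter are hypothetical names chosen for the example.

```c
#include <assert.h>

/* Hypothetical indices into a two-entry scan-percentage array. */
enum { SCAN_ANON = 0, SCAN_FILE = 1 };

/*
 * If no swap space is free, anon pages cannot be evicted, so spend the
 * whole scan budget on file pages. Otherwise keep the caller's split.
 */
static void adjust_for_swap(long free_swap_pages, unsigned int percent[2])
{
    if (free_swap_pages <= 0) {
        percent[SCAN_ANON] = 0;
        percent[SCAN_FILE] = 100;
    }
}
```

Because the check is made on every pass rather than by moving pages to another list, anon scanning resumes by itself as soon as swap space is freed, which is the benefit described in the quoted text.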

  9. Re: [patch 07/19] split anon & file LRUs for memcontrol code

    On Mon, 7 Jan 2008 19:04:55 +0900
    KAMEZAWA Hiroyuki wrote:

    > On Wed, 02 Jan 2008 17:41:51 -0500
    > linux-kernel@vger.kernel.org wrote:
    >
    > > Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
    > > ===================================================================
    > > --- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c 2008-01-02 15:55:55.000000000 -0500
    > > +++ linux-2.6.24-rc6-mm1/mm/vmscan.c 2008-01-02 15:56:00.000000000 -0500
    > > @@ -1230,13 +1230,13 @@ static unsigned long shrink_zone(int pri
    > >
    > > get_scan_ratio(zone, sc, percent);
    > >

    >
    > I'd be happy if this calculation could later become the following:
    > ==
    > if (scan_global_lru(sc)) {
    >         get_scan_ratio(zone, sc, percent);
    > } else {
    >         get_scan_ratio_cgroup(sc->cgroup, sc, percent);
    > }
    > ==
    > To do this, mem_cgroup would need to have recent_rotated_file and
    > recent_rotated_anon?


    One possible problem could be that the cgroup can also have
    pages reclaimed in global reclaim, not just in local cgroup
    reclaims.

    That is, these cgroup's pages can also disappear or get
    rotated without the cgroup's recent_rotated_file and
    recent_rotated_anon being affected at all.

    Still, having the cgroup do the same thing as the global
    zones is probably the best approximation.

    --
    All rights reversed.

  10. Re: [patch 00/19] VM pageout scalability improvements

    On Fri, 4 Jan 2008, Lee Schermerhorn wrote:

    > We see this on both NUMA and non-NUMA. x86_64 and ia64. The basic
    > criteria to reproduce is to be able to run thousands [or low 10s of
    > thousands] of tasks, continually increasing the number until the system
    > just goes into reclaim. Instead of swapping, the system seems to
    > hang--unresponsive from the console, but with "soft lockup" messages
    > spitting out every few seconds...


    Ditto here.


  11. Re: [patch 00/19] VM pageout scalability improvements

    On Mon, 7 Jan 2008 11:07:54 -0800 (PST)
    Christoph Lameter wrote:
    > On Fri, 4 Jan 2008, Lee Schermerhorn wrote:
    >
    > > We see this on both NUMA and non-NUMA. x86_64 and ia64. The basic
    > > criteria to reproduce is to be able to run thousands [or low 10s of
    > > thousands] of tasks, continually increasing the number until the system
    > > just goes into reclaim. Instead of swapping, the system seems to
    > > hang--unresponsive from the console, but with "soft lockup" messages
    > > spitting out every few seconds...

    >
    > Ditto here.


    I have some suspicions on what could be causing this.

    The most obvious suspect is get_scan_ratio() continuing to return
    100% file reclaim, 0% anon reclaim when the file LRUs have already
    been reduced to something very small, because reclaiming up to that
    point was easy.

    I plan to add some code to automatically set the anon reclaim to
    100% if (free + file_active + file_inactive <= zone->pages_high),
    meaning that reclaiming just file pages will not be able to free
    enough pages.

    --
    All rights reversed.
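
The check proposed in the mail above can be written out as a small stand-alone sketch. The condition is the one stated in the mail; struct zone_sketch, the SCAN_ANON/SCAN_FILE indices and force_anon_if_file_exhausted() are hypothetical names, and leaving the file percentage untouched is an assumption, since the mail does not say what should happen to it.

```c
#include <assert.h>

/* Hypothetical snapshot of the per-zone counters named in the mail. */
struct zone_sketch {
    unsigned long free;
    unsigned long file_active;
    unsigned long file_inactive;
    unsigned long pages_high;
};

/* Hypothetical indices into a two-entry scan-percentage array. */
enum { SCAN_ANON = 0, SCAN_FILE = 1 };

/*
 * If free pages plus every file page still would not reach the zone's
 * high watermark, file reclaim alone cannot succeed: force anon to 100%.
 */
static void force_anon_if_file_exhausted(const struct zone_sketch *z,
                                         unsigned int percent[2])
{
    if (z->free + z->file_active + z->file_inactive <= z->pages_high)
        percent[SCAN_ANON] = 100;
}
```

For example, with 10 free pages, 5 active and 5 inactive file pages, and a high watermark of 100, the sum is 20, so the anon percentage is forced to 100 even if the ratio calculation had returned 0.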
