Thread: [patch 0/7] cpuset writeback throttling

  1. Re: [patch 0/7] cpuset writeback throttling

    On Tue, 4 Nov 2008 20:45:17 -0600 (CST) Christoph Lameter wrote:

    > On Tue, 4 Nov 2008, Andrew Morton wrote:
    >
    > > In a memcg implementation what we would implement is "throttle
    > > page-dirtying tasks in this memcg when the memcg's dirty memory reaches
    > > 40% of its total".

    >
    > Right that is similar to what this patch does for cpusets. A memcg
    > implementation would need to figure out if we are currently part of a
    > memcg and then determine the percentage of memory that is dirty.
    >
    > That is one aspect. When performing writeback then we need to figure out
    > which inodes have dirty pages in the memcg and we need to start writeout
    > on those inodes and not on others that have their dirty pages elsewhere.
    > There are two components of this that are in this patch and that would
    > also have to be implemented for a memcg.


    Doable. lru->page->mapping->host is a good start.
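
    A minimal sketch of that lookup (illustrative only, assuming we already
    hold a page taken off an LRU list; page_to_host() is not an existing
    kernel function):

        /*
         * From a page on the LRU, reach the inode that owns it via its
         * address_space -- the lru->page->mapping->host chain.
         */
        static struct inode *page_to_host(struct page *page)
        {
                struct address_space *mapping = page_mapping(page);

                if (!mapping)           /* anonymous page: no backing inode */
                        return NULL;
                return mapping->host;   /* inode owning this page cache page */
        }

    A throttling path could then restrict writeback to inodes reached this
    way from pages charged to the cpuset or memcg in question.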

    > > But that doesn't solve the problem which this patchset is trying to
    > > solve, which is "don't let all the memory in all this group of nodes
    > > get dirty".

    >
    > This patch would solve the problem if the calculation of the dirty pages
    > would consider the active memcg and be able to determine the amount of
    > dirty pages (through some sort of additional memcg counters). That is just
    > the first part though. The second part of finding the inodes that have
    > dirty pages for writeback would require an association between memcgs and
    > inodes.


    We presently have that via the LRU. It has holes, but so does this per-cpuset
    scheme.

    > > What happens if cpuset A uses nodes 0,1,2,3,4,5,6,7,8,9 and cpuset B
    > > uses nodes 0,1? Can activity in cpuset A cause ooms in cpuset B?

    >
    > Yes if the activities of cpuset A cause all pages to be dirtied in cpuset
    > B and then cpuset B attempts to do writeback. This will fail to acquire
    > enough memory for writeback and make reclaim impossible.
    >
    > Typically cpusets are not overlapped like that but used to segment the
    > system.
    >
    > The system would work correctly if the dirty ratio calculation would be
    > done on all overlapping cpusets/memcg groups that contain nodes from
    > which allocations are permitted.


    That.


    Generally, I worry that this is a specific fix to a specific problem
    encountered on specific machines with specific setups and specific
    workloads, and that it's just all too low-level and myopic.

    And now we're back in the usual position where there's existing code and
    everyone says it's terribly wonderful and everyone is reluctant to step
    back and look at the big picture. Am I wrong?


    Plus: we need per-memcg dirty-memory throttling, and this is more
    important than per-cpuset, I suspect. How will the (already rather
    buggy) code look once we've stuffed both of them in there?


    I agree that there's a problem here, although given the amount of time
    that it's been there, I suspect that it is a very small problem.
    Someone please convince me that in three years time we will agree that
    merging this fix to that problem was a correct decision?



  2. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008 10:31:23 +0900 KAMEZAWA Hiroyuki wrote:

    > >
    > > Yes? Someone help me out here. I don't yet have my head around the
    > > overlaps and incompatibilities here. Perhaps the containers guys will
    > > wake up and put their thinking caps on?
    > >
    > >
    > >
    > > What happens if cpuset A uses nodes 0,1,2,3,4,5,6,7,8,9 and cpuset B
    > > uses nodes 0,1? Can activity in cpuset A cause ooms in cpuset B?
    > >

    > To help with this, per-node dirty-ratio throttling is necessary.
    >
    > Shouldn't we just have a new parameter, e.g. /proc/sys/vm/dirty_ratio_per_node?


    I guess that would work. But it is a general solution and will be less
    efficient for the particular setups which are triggering this problem.

    > /proc/sys/vm/dirty_ratio works for throttling the whole system dirty pages.
    > /proc/sys/vm/dirty_ratio_per_node works for throttling dirty pages in a node.
    >
    > Implementation will not be difficult, and it would work well enough against OOM.


    Yup. Just track per-node dirtiness and walk the LRU when it is over
    threshold.
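
    A rough sketch of such a check; node_nr_dirty() and node_nr_pages() stand
    in for per-node counters that would have to be added, they are not
    existing kernel interfaces:

        /* Hypothetical per-node dirty-ratio check. */
        static int node_over_dirty_ratio(int nid, unsigned int ratio)
        {
                unsigned long dirty = node_nr_dirty(nid);   /* assumed counter */
                unsigned long total = node_nr_pages(nid);   /* assumed counter */

                /* same as (dirty / total) * 100 > ratio, avoiding division */
                return dirty * 100 > total * ratio;
        }

    When the check fires, balance_dirty_pages() would walk that node's LRU
    and start writeback, as described above.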


  3. Re: [patch 0/7] cpuset writeback throttling

    On Tue, 4 Nov 2008 19:05:05 -0800
    Andrew Morton wrote:
    > Generally, I worry that this is a specific fix to a specific problem
    > encountered on specific machines with specific setups and specific
    > workloads, and that it's just all too low-level and myopic.
    >
    > And now we're back in the usual position where there's existing code and
    > everyone says it's terribly wonderful and everyone is reluctant to step
    > back and look at the big picture. Am I wrong?
    >
    >
    > Plus: we need per-memcg dirty-memory throttling, and this is more
    > important than per-cpuset, I suspect. How will the (already rather
    > buggy) code look once we've stuffed both of them in there?
    >
    >

    IIUC, Andrea Righi posted 2 patches around dirty_ratio in early October
    (I've added him to CC):

    (1) a patch adding dirty_ratio_pcm (granularity of 1/100000).
    (2) per-memcg dirty ratio (maybe this: http://lkml.org/lkml/2008/9/12/121).

    (1) just needs to be posted again.

    Because we have changed the page_cgroup implementation, (2) should be
    reworked; the rework itself will not be very difficult.
    (... we tend to get stuck in "which interface is best" discussions.)

    But memcg itself is not so weak against dirty pages, because we call
    try_to_free_pages() not because of memory shortage but because of the
    memory limit.

    BTW, in my current stack, the following are queued:
    a. handle SwapCache in a proper way in memcg.
    b. handle swap_cgroup (if configured)
    c. make LRU handling easier

    To make per-memcg dirty_ratio sane, (a) should go first; I am doing (a) now.
    If Andrea is too busy, I'll schedule dirty_ratio-for-memcg as my own work.

    Thanks,
    -Kame


  4. Re: [patch 0/7] cpuset writeback throttling

    On Tue, 4 Nov 2008, Andrew Morton wrote:

    >> That is one aspect. When performing writeback then we need to figure out
    >> which inodes have dirty pages in the memcg and we need to start writeout
    >> on those inodes and not on others that have their dirty pages elsewhere.
    >> There are two components of this that are in this patch and that would
    >> also have to be implemented for a memcg.

    >
    > Doable. lru->page->mapping->host is a good start.


    The block layer has a list of inodes that are dirty. From that we need to
    select ones that will improve the situation from the cpuset/memcg. How
    does the LRU come into this?

    >> This patch would solve the problem if the calculation of the dirty pages
    >> would consider the active memcg and be able to determine the amount of
    >> dirty pages (through some sort of additional memcg counters). That is just
    >> the first part though. The second part of finding the inodes that have
    >> dirty pages for writeback would require an association between memcgs and
    >> inodes.

    >
    > We presently have that via the LRU. It has holes, but so does this per-cpuset
    > scheme.


    How do I get to the LRU from the dirtied list of inodes?

    > Generally, I worry that this is a specific fix to a specific problem
    > encountered on specific machines with specific setups and specific
    > workloads, and that it's just all too low-level and myopic.
    >
    > And now we're back in the usual position where there's existing code and
    > everyone says it's terribly wonderful and everyone is reluctant to step
    > back and look at the big picture. Am I wrong?


    Well, everyone is just reluctant to do the work, it seems. Thus they fall
    back to a solution that I provided when memcg groups were not yet available.
    It would be best if someone could find a general scheme or generalize this
    patchset.

    > Plus: we need per-memcg dirty-memory throttling, and this is more
    > important than per-cpuset, I suspect. How will the (already rather
    > buggy) code look once we've stuffed both of them in there?


    The basics will still be the same:

    1. One needs to establish the dirty ratios of memcgs and monitor them.
    2. There needs to be a mechanism to perform writeout on the right inodes
    (see the sketch below).
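
    A sketch of the kind of bookkeeping point 2 implies; nothing like this
    exists in the kernel today, and the names are purely illustrative:

        /*
         * Per-memcg record of inodes holding pages charged to that memcg,
         * so writeback can be aimed at the right inodes.  An inode would be
         * added when one of its pages is dirtied while charged to the memcg.
         */
        struct memcg_dirty_inodes {
                spinlock_t       lock;
                struct list_head inodes;    /* inodes with dirty pages
                                               charged to this memcg */
        };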

    > I agree that there's a problem here, although given the amount of time
    > that it's been there, I suspect that it is a very small problem.


    It used to be only a problem for NUMA systems. Now it's also a problem for
    memcgs.

    > Someone please convince me that in three years time we will agree that
    > merging this fix to that problem was a correct decision?


    At the minimum: it provides a basis on top of which memcg support
    can be developed. There are likely major modifications needed to VM
    statistics to get there for memcg groups.





  5. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008 07:52:44 -0600 (CST)
    Christoph Lameter wrote:

    > On Tue, 4 Nov 2008, Andrew Morton wrote:
    >
    > >> That is one aspect. When performing writeback then we need to figure out
    > >> which inodes have dirty pages in the memcg and we need to start writeout
    > >> on those inodes and not on others that have their dirty pages elsewhere.
    > >> There are two components of this that are in this patch and that would
    > >> also have to be implemented for a memcg.

    > >
    > > Doable. lru->page->mapping->host is a good start.

    >
    > The block layer has a list of inodes that are dirty. From that we need to
    > select ones that will improve the situation from the cpuset/memcg. How
    > does the LRU come into this?


    In the simplest case, dirty-memory throttling can just walk the LRU
    writing back pages in the same way that kswapd does.

    There would probably be performance benefits in doing
    address_space-ordered writeback, so the dirty-memory throttling could
    pick a dirty page off the LRU, go find its inode and then feed that
    into __sync_single_inode().
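
    As a sketch of that flow (not actual kernel code): __sync_single_inode()
    itself is static to fs/fs-writeback.c, so an implementation would more
    likely push a batch of the inode's dirty pages through an exported
    writeback entry point such as do_writepages():

        /*
         * Sketch only: given a dirty page found on the LRU, start
         * address_space-ordered writeback on its inode.  The batch size is
         * arbitrary and locking/refcounting details are omitted.
         */
        static void writeback_inode_of_page(struct page *page)
        {
                struct address_space *mapping = page_mapping(page);
                struct writeback_control wbc = {
                        .sync_mode    = WB_SYNC_NONE,
                        .nr_to_write  = 1024,
                        .range_cyclic = 1,
                };

                if (!mapping)
                        return;
                do_writepages(mapping, &wbc);
        }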

    > >> This patch would solve the problem if the calculation of the dirty pages
    > >> would consider the active memcg and be able to determine the amount of
    > >> dirty pages (through some sort of additional memcg counters). That is just
    > >> the first part though. The second part of finding the inodes that have
    > >> dirty pages for writeback would require an association between memcgs and
    > >> inodes.

    > >
    > > We presently have that via the LRU. It has holes, but so does this per-cpuset
    > > scheme.

    >
    > How do I get to the LRU from the dirtied list of inodes?


    Don't need it.

    It'll be approximate and will have obvious scenarios of great inaccuracy,
    but it'll suffice for the workloads which this patchset addresses.



    It sounds like any memcg-based approach just won't be suitable for the
    people who are hitting this problem.

    But _are_ people hitting this problem? I haven't seen any real-looking
    reports in ages. Is there some workaround? If so, what is it? How
    serious is this problem now?


  6. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008, Andrew Morton wrote:

    > > > Doable. lru->page->mapping->host is a good start.

    > >
    > > The block layer has a list of inodes that are dirty. From that we need to
    > > select ones that will improve the situation from the cpuset/memcg. How
    > > does the LRU come into this?

    >
    > In the simplest case, dirty-memory throttling can just walk the LRU
    > writing back pages in the same way that kswapd does.


    That means running reclaim. But we are only interested in getting rid of
    dirty pages. Plus the filesystem guys have repeatedly pointed out that
    page sized I/O to random places in a file is not a good thing to do. There
    was actually talk of stopping kswapd from writing out pages!

    > There would probably be performance benefits in doing
    > address_space-ordered writeback, so the dirty-memory throttling could
    > pick a dirty page off the LRU, go find its inode and then feed that
    > into __sync_single_inode().


    We cannot call into the writeback functions for an inode from a reclaim
    context. We can write back single pages but not a range of pages from an
    inode due to various locking issues (see discussion on slab defrag
    patchset).

    > > How do I get to the LRU from the dirtied list of inodes?

    >
    > Don't need it.
    >
    > It'll be approximate and will have obvious scenarios of great inaccuracy,
    > but it'll suffice for the workloads which this patchset addresses.


    Sounds like a wild hack that runs against known limitations in terms
    of locking etc.

    > It sounds like any memcg-based approach just won't be suitable for the
    > people who are hitting this problem.


    Why not? If you can determine which memcgs an inode has dirty pages in,
    then the same scheme as proposed here will work.

    > But _are_ people hitting this problem? I haven't seen any real-looking
    > reports in ages. Is there some workaround? If so, what is it? How
    > serious is this problem now?


    Are there actually people who have memcg-based solutions deployed?
    No enterprise release includes it yet, so I guess there is not much
    use of it yet.

  7. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008 14:21:47 -0600 (CST)
    Christoph Lameter wrote:

    > On Wed, 5 Nov 2008, Andrew Morton wrote:
    >
    > > > > Doable. lru->page->mapping->host is a good start.
    > > >
    > > > The block layer has a list of inodes that are dirty. From that we need to
    > > > select ones that will improve the situation from the cpuset/memcg. How
    > > > does the LRU come into this?

    > >
    > > In the simplest case, dirty-memory throttling can just walk the LRU
    > > writing back pages in the same way that kswapd does.

    >
    > That means running reclaim. But we are only interested in getting rid of
    > dirty pages. Plus the filesystem guys have repeatedly pointed out that
    > page sized I/O to random places in a file is not a good thing to do. There
    > was actually talk of stopping kswapd from writing out pages!


    They don't have to be reclaimed.

    > > There would probably be performance benefits in doing
    > > address_space-ordered writeback, so the dirty-memory throttling could
    > > pick a dirty page off the LRU, go find its inode and then feed that
    > > into __sync_single_inode().

    >
    > We cannot call into the writeback functions for an inode from a reclaim
    > context. We can write back single pages but not a range of pages from an
    > inode due to various locking issues (see discussion on slab defrag
    > patchset).


    We're not in a reclaim context. We're in sys_write() context.

    > > But _are_ people hitting this problem? I haven't seen any real-looking
    > > reports in ages. Is there some workaround? If so, what is it? How
    > > serious is this problem now?

    >
    > Are there people who are actually having memcg based solutions deployed?
    > No enterprise release includes it yet so I guess that there is not much of
    > a use yet.


    If you know the answer then please provide it. If you don't, please
    say "I don't know".


  8. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008, Andrew Morton wrote:

    > > That means running reclaim. But we are only interested in getting rid of
    > > dirty pages. Plus the filesystem guys have repeatedly pointed out that
    > > page sized I/O to random places in a file is not a good thing to do. There
    > > was actually talk of stopping kswapd from writing out pages!

    >
    > They don't have to be reclaimed.


    Well, the LRU is used for reclaim. If you step over it then it's using the
    existing reclaim logic in vmscan.c, right?

    > > > There would probably be performance benefits in doing
    > > > address_space-ordered writeback, so the dirty-memory throttling could
    > > > pick a dirty page off the LRU, go find its inode and then feed that
    > > > into __sync_single_inode().

    > >
    > > We cannot call into the writeback functions for an inode from a reclaim
    > > context. We can write back single pages but not a range of pages from an
    > > inode due to various locking issues (see discussion on slab defrag
    > > patchset).

    >
    > We're not in a reclaim context. We're in sys_write() context.


    Dirtying a page can occur from a variety of kernel contexts.

    > > > But _are_ people hitting this problem? I haven't seen any real-looking
    > > > reports in ages. Is there some workaround? If so, what is it? How
    > > > serious is this problem now?

    > >
    > > Are there people who are actually having memcg based solutions deployed?
    > > No enterprise release includes it yet so I guess that there is not much of
    > > a use yet.

    >
    > If you know the answer then please provide it. If you don't, please
    > say "I don't know".


    I thought we were talking about memcg-related reports. I have dealt with
    scores of the cpuset-related ones in my prior job.

    Workarounds are:

    1. Reduce the global dirty ratios so that the number of dirty pages in a
    cpuset cannot become too high.

    2. Do not create small cpusets where the system can dirty all pages.

    3. Find other ways to limit the dirty pages (run sync once in a while or
    so).

  9. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008 14:40:05 -0600 (CST)
    Christoph Lameter wrote:

    > On Wed, 5 Nov 2008, Andrew Morton wrote:
    >
    > > > That means running reclaim. But we are only interested in getting rid of
    > > > dirty pages. Plus the filesystem guys have repeatedly pointed out that
    > > > page sized I/O to random places in a file is not a good thing to do. There
    > > > was actually talk of stopping kswapd from writing out pages!

    > >
    > > They don't have to be reclaimed.

    >
    > Well, the LRU is used for reclaim. If you step over it then it's using the
    > existing reclaim logic in vmscan.c, right?


    Only if you use it that way.

    I imagine that a suitable implementation would start IO on the page,
    then move it to the other end of the LRU, i.e. treat it as referenced.
    Pretty simple stuff.

    If we were to do writeout on the page's inode instead then we'd need
    to move the page out of the way somehow, presumably by rotating it.

    It's all workable.

    > > > > There would probably be performance benefits in doing
    > > > > address_space-ordered writeback, so the dirty-memory throttling could
    > > > > pick a dirty page off the LRU, go find its inode and then feed that
    > > > > into __sync_single_inode().
    > > >
    > > > We cannot call into the writeback functions for an inode from a reclaim
    > > > context. We can write back single pages but not a range of pages from an
    > > > inode due to various locking issues (see discussion on slab defrag
    > > > patchset).

    > >
    > > We're not in a reclaim context. We're in sys_write() context.

    >
    > Dirtying a page can occur from a variety of kernel contexts.


    This writeback will occur from one quite specific place:
    balance_dirty_pages(). That's called from sys_write() and pagefaults.
    Other scruffy places like splice too.

    But none of that matters - the fact is that we're _already_ doing
    writeback from balance_dirty_pages(). All we're talking about here is
    alternative schemes for looking up the pages to write.
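
    Roughly, the existing path looks like this (simplified), so a per-cpuset
    or per-memcg limit would just be one more check inside
    balance_dirty_pages(); the two helpers named below are assumptions, not
    existing kernel functions:

        /*
         *   sys_write()
         *     -> generic_file_aio_write()
         *       -> balance_dirty_pages_ratelimited(mapping)
         *         -> balance_dirty_pages(mapping)
         *
         * Hypothetical container-aware check, called from
         * balance_dirty_pages():
         */
        static void balance_container_dirty_pages(struct address_space *mapping)
        {
                if (over_container_dirty_limit(current))              /* assumed */
                        writeback_container_inodes(current, mapping); /* assumed */
        }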

    > > > > But _are_ people hitting this problem? I haven't seen any real-looking
    > > > > reports in ages. Is there some workaround? If so, what is it? How
    > > > > serious is this problem now?
    > > >
    > > > Are there people who are actually having memcg based solutions deployed?
    > > > No enterprise release includes it yet so I guess that there is not much of
    > > > a use yet.

    > >
    > > If you know the answer then please provide it. If you don't, please
    > > say "I don't know".

    >
    > I thought we were talking about memcg related reports. I have dealt with
    > scores of the cpuset related ones in my prior job.
    >
    > Workarounds are:
    >
    > 1. Reduce the global dirty ratios so that the number of dirty pages in a
    > cpuset cannot become too high.


    That would be less than the smallest node's memory capacity, I guess.

    > 2. Do not create small cpusets where the system can dirty all pages.
    >
    > 3. Find other ways to limit the dirty pages (run sync once in a while or
    > so).


    hm, OK.


    See, here's my problem: we have a pile of new code which fixes some
    problem. But the problem seems to be fairly small - it only affects a
    small number of sophisticated users and they already have workarounds
    in place.

    So the world wouldn't end if we just didn't merge it. Those users
    stick with their workarounds and the kernel remains simpler and
    smaller.

    How do we work out which is the best choice here? I don't have enough
    information to do this.



  10. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008, Andrew Morton wrote:

    > See, here's my problem: we have a pile of new code which fixes some
    > problem. But the problem seems to be fairly small - it only affects a
    > small number of sophisticated users and they already have workarounds
    > in place.


    Well yes... Great situation with those workarounds.

    > So the world wouldn't end if we just didn't merge it. Those users
    > stick with their workarounds and the kernel remains simpler and
    > smaller.
    >
    > How do we work out which is the best choice here? I don't have enough
    > information to do this.


    Not sure how to treat this. I am not involved with large NUMA at this
    point. So the people who are interested need to speak up if they want
    this.


  11. Re: [patch 0/7] cpuset writeback throttling

    On Wed, Nov 5, 2008 at 12:56 PM, Andrew Morton wrote:
    >>
    >> 1. Reduce the global dirty ratios so that the number of dirty pages in a
    >> cpuset cannot become too high.

    >
    > That would be less than the smallest node's memory capacity, I guess.


    Even that doesn't work - if there's a single global limit on dirty
    pages, then any cpuset/cgroup with access to enough memory can exhaust
    that limit and cause other processes to block when they try to write
    to disk. You need independent dirty counts to avoid that, whether it
    be per-node or per-cgroup.

    Paul

  12. Re: [patch 0/7] cpuset writeback throttling

    On Wed, 5 Nov 2008 14:04:42 -0800 (PST)
    David Rientjes wrote:

    > > So the world wouldn't end if we just didn't merge it. Those users
    > > stick with their workarounds and the kernel remains simpler and
    > > smaller.
    > >

    >
    > Agreed. This patchset is admittedly from a different time when cpusets
    > was the only relevant extension that needed to be done.
    >

    BTW, what is the problem this patch wants to fix?
    1. avoid slow-down of memory allocation by triggering write-out earlier.
    2. avoid OOM by throttling dirty pages.

    About 1, memcg's dirty_ratio (if implemented) can help when mounted as
    mount -t cgroup none /somewhere/ -o cpuset,memory
    (if the user can accept the overheads of memcg).

    About 2, a Google guy posted an OOM handler cgroup to linux-mm.

    > > How do we work out which is the best choice here? I don't have enough
    > > information to do this.
    > >

    >
    > If we are to support memcg-specific dirty ratios, that requires the
    > aforementioned statistics to be collected so that the calculation is even
    > possible. The series at
    >
    > http://marc.info/?l=linux-kernel&m=122123225006571
    > http://marc.info/?l=linux-kernel&m=122123241106902
    >

    Yes, we (memcg) need this kind of thing.

    is a step in that direction, although I'd prefer to see NR_UNSTABLE_NFS
    extracted separately from MEM_CGROUP_STAT_FILE_DIRTY so that
    throttle_vm_writeout() can also use the new statistics.
    >

    Thank you for input.

    Thanks,
    -Kame



  13. Re: [patch 0/7] cpuset writeback throttling

    On 2008-11-05 05:31, KAMEZAWA Hiroyuki wrote:
    > On Tue, 4 Nov 2008 19:05:05 -0800
    > Andrew Morton wrote:
    >> Generally, I worry that this is a specific fix to a specific problem
    >> encountered on specific machines with specific setups and specific
    >> workloads, and that it's just all too low-level and myopic.
    >>
    >> And now we're back in the usual position where there's existing code and
    >> everyone says it's terribly wonderful and everyone is reluctant to step
    >> back and look at the big picture. Am I wrong?
    >>
    >>
    >> Plus: we need per-memcg dirty-memory throttling, and this is more
    >> important than per-cpuset, I suspect. How will the (already rather
    >> buggy) code look once we've stuffed both of them in there?
    >>
    >>

    > IIUC, Andrea Righi posted 2 patches around dirty_ratio in early October
    > (I've added him to CC):
    >
    > (1) a patch adding dirty_ratio_pcm (granularity of 1/100000).
    > (2) per-memcg dirty ratio (maybe this: http://lkml.org/lkml/2008/9/12/121).
    >
    > (1) just needs to be posted again.
    >
    > Because we have changed the page_cgroup implementation, (2) should be
    > reworked; the rework itself will not be very difficult.
    > (... we tend to get stuck in "which interface is best" discussions.)
    >
    > But memcg itself is not so weak against dirty pages, because we call
    > try_to_free_pages() not because of memory shortage but because of the
    > memory limit.
    >
    > BTW, in my current stack, the following are queued:
    > a. handle SwapCache in a proper way in memcg.
    > b. handle swap_cgroup (if configured)
    > c. make LRU handling easier
    >
    > To make per-memcg dirty_ratio sane, (a) should go first; I am doing (a) now.
    > If Andrea is too busy, I'll schedule dirty_ratio-for-memcg as my own work.
    >


    Hi Kame,

    sorry for my late reply. If it's not too late tonight I'll rebase and test (1)
    against 2.6.28-rc2-mm1 and start reworking (2), also considering David's
    suggestion (split NR_UNSTABLE_NFS from NR_FILE_DIRTY).

    -Andrea
