[PATCH] Memory management livelock - Kernel

Thread: [PATCH] Memory management livelock

  1. Re: application syncing options (was Re: [PATCH] Memory management livelock)



    On Fri, 3 Oct 2008, david@lang.hm wrote:

    > On Fri, 3 Oct 2008, Nick Piggin wrote:
    >
    > > > *What* is, forever? Data integrity syncs should have pages operated on
    > > > in-order, until we get to the end of the range. Circular writeback could
    > > > go through again, possibly, but no more than once.

    > >
    > > OK, I have been able to reproduce it somewhat. It is not a livelock,
    > > but what is happening is that direct IO read basically does an fsync
    > > on the file before performing the IO. The fsync gets stuck behind the
    > > dd that is dirtying the pages, and ends up following behind it and
    > > doing all its IO for it.
    > >
    > > The following patch avoids the issue for direct IO, by using the range
    > > syncs rather than trying to sync the whole file.
    > >
    > > The underlying problem I guess is unchanged. Is it really a problem,
    > > though? The way I'd love to solve it is actually by adding another bit
    > > or two to the pagecache radix tree, that can be used to transiently tag
    > > the tree for future operations. That way we could record the dirty and
    > > writeback pages up front, and then only bother with operating on them.
    > >
    > > That's *if* it really is a problem. I don't have much pity for someone
    > > doing buffered IO and direct IO to the same pages of the same file

    >
    > I've seen lots of discussions here about different options in syncing. in this
    > case a fix is to do a fsync of a range.


    It fixes the bug in concurrent direct read+buffered write, but won't fix
    the bug with concurrent sync+buffered write.
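
    (A minimal sketch of the range-sync idea above --- not the actual patch:
    write back and wait on only the byte range the direct I/O read touches,
    instead of syncing the whole file and getting stuck behind a writer that
    keeps dirtying it. filemap_write_and_wait_range() is a real helper in
    current kernels; the wrapper below is purely illustrative.)

        #include <linux/fs.h>

        /* Illustrative only: sync just [pos, pos + len) before a direct
         * I/O read, rather than filemap_write_and_wait() on the whole file. */
        static int dio_flush_range(struct address_space *mapping,
                                   loff_t pos, size_t len)
        {
                if (!len)
                        return 0;
                return filemap_write_and_wait_range(mapping, pos,
                                                    pos + (loff_t)len - 1);
        }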

    > I've also seen discussions of how the
    > kernel filesystem code can do ordered writes without having to wait for them
    > with the use of barriers, is this capability exported to userspace? if so,
    > could you point me at documentation for it?


    It isn't. And it is good that it isn't --- the more complicated API, the
    more maintenance work.

    Mikulas

    > David Lang
    >


  2. Re: application syncing options (was Re: [PATCH] Memory management livelock)

    On Sun, 5 Oct 2008, Mikulas Patocka wrote:

    > On Fri, 3 Oct 2008, david@lang.hm wrote:
    >
    >> I've also seen discussions of how the
    >> kernel filesystem code can do ordered writes without having to wait for them
    >> with the use of barriers, is this capability exported to userspace? if so,
    >> could you point me at documentation for it?

    >
    > It isn't. And it is good that it isn't --- the more complicated API, the
    > more maintenance work.


    I can understand that most software would not want to deal with
    complications like this, but for things that have requirements similar to
    journaling filesystems (databases for example) it would seem that there
    would be advantages to exposing these capabilities.

    David Lang

  3. Re: [PATCH 2/3] Fix fsync livelock

    On Sun, 5 Oct 2008 20:01:46 -0400 (EDT)
    Mikulas Patocka wrote:

    > I assume that if very few people complained about the livelock till
    > now, very few people will see degraded write performance. My patch
    > blocks the writes only if the livelock happens, so if the livelock
    > doesn't happen in unpatched kernel for most people, the patch won't
    > make it worse.


    I object to calling this a livelock. It's not.
    And yes, fsync is slow and lots of people are seeing that.
    It's not helped by how ext3 is implemented (where fsync is effectively
    equivalent of a sync for many cases).
    But again, moving the latency to "innocent" parties is not acceptable.

    >
    > > If the fsync() implementation isn't smart enough, sure, lets improve
    > > it. But not by shifting latency around... lets make it more
    > > efficient at submitting IO.
    > > If we need to invent something like "chained IO" where if you wait
    > > on the last of the chain, you wait on the entire chain, so be it.

    >
    > This looks madly complicated. And ineffective, because if some page
    > was submitted before fsync() was invoked, and is under writeback
    > while fsync() is called, fsync() still has to wait on it.


    so?
    just make a chain per inode always...


    --
    Arjan van de Ven Intel Open Source Technology Centre
    For development, discussion and tips for power savings,
    visit http://www.lesswatts.org

  4. Re: [PATCH 2/3] Fix fsync livelock

    On Sun, Oct 05, 2008 at 08:01:46PM -0400, Mikulas Patocka wrote:
    > This looks madly complicated. And ineffective, because if some page was
    > submitted before fsync() was invoked, and is under writeback while fsync()
    > is called, fsync() still has to wait on it.


    fsync() waiting on pre-issued writeback pages is the correct
    behaviour.

    IOW, if the page is under writeback at the time an fsync() is
    issued (e.g. issued by pdflush), the page was *not clean* at the
    time the fsync() was called and hence must be clean when fsync()
    returns. fsync() needs to wait for all pages under I/O at the time
    it is called, not just the dirty pages it issues I/O on.....

    Cheers,

    Dave.
    --
    Dave Chinner
    david@fromorbit.com

  5. Re: [PATCH 2/3] Fix fsync livelock



    On Sun, 5 Oct 2008, Arjan van de Ven wrote:

    > On Sun, 5 Oct 2008 20:01:46 -0400 (EDT)
    > Mikulas Patocka wrote:
    >
    > > I assume that if very few people complained about the livelock till
    > > now, very few people will see degraded write performance. My patch
    > > blocks the writes only if the livelock happens, so if the livelock
    > > doesn't happen in unpatched kernel for most people, the patch won't
    > > make it worse.

    >
    > I object to calling this a livelock. It's not.


    It unlocks itself when the whole disk is written, and that can take several
    hours (or days, if you have a many-terabyte array). So formally it is not a
    livelock; from the user's experience it is --- he sees an unkillable
    process in 'D' state for many hours.

    > And yes, fsync is slow and lots of people are seeing that.
    > It's not helped by how ext3 is implemented (where fsync is effectively
    > equivalent of a sync for many cases).
    > But again, moving the latency to "innocent" parties is not acceptable.
    >
    > >
    > > > If the fsync() implementation isn't smart enough, sure, lets improve
    > > > it. But not by shifting latency around... lets make it more
    > > > efficient at submitting IO.
    > > > If we need to invent something like "chained IO" where if you wait
    > > > on the last of the chain, you wait on the entire chain, so be it.

    > >
    > > This looks madly complicated. And ineffective, because if some page
    > > was submitted before fsync() was invoked, and is under writeback
    > > while fsync() is called, fsync() still has to wait on it.

    >
    > so?
    > just make a chain per inode always...


    The point is that many fsync()s may run in parallel and you have just one
    inode and just one chain. And if you add two-word list_head to a page, to
    link it on this list, many developers will hate it for increasing its
    size.

    See the work done by Nick Piggin somewhere in this thread. He uses just
    one bit in the radix tree to mark pages to process. But he needs to
    serialize all syncs on a given file, so they no longer run in parallel.

    Mikulas

  6. Re: application syncing options (was Re: [PATCH] Memory management livelock)

    On Sun, 5 Oct 2008, david@lang.hm wrote:

    > On Sun, 5 Oct 2008, Mikulas Patocka wrote:
    >
    > > On Fri, 3 Oct 2008, david@lang.hm wrote:
    > >
    > > > I've also seen discussions of how the
    > > > kernel filesystem code can do ordered writes without having to wait for
    > > > them
    > > > with the use of barriers, is this capability exported to userspace? if so,
    > > > could you point me at documentation for it?

    > >
    > > It isn't. And it is good that it isn't --- the more complicated API, the
    > > more maintenance work.

    >
    > I can understand that most software would not want to deal with complications
    > like this, but for things that have requirements similar to journaling
    > filesystems (databases for example) it would seem that there would be
    > advantages to exposing these capabilities.
    >
    > David Lang


    If you invent new interface that allows submitting several ordered IOs
    from userspace, it will require excessive maintenance overhead over long
    period of time. So it should be only justified, if the performance
    improvement is excessive as well.

    It should not be like "here you improve 10% performance on some synthetic
    benchmark in one application that was rewritten to support the new
    interface" and then create a few more security vulnerabilities (because of
    the complexity of the interface) and damage overall Linux progress,
    because everyone is catching bugs in the new interface and checking it for
    correctness.

    Mikulas

  7. Re: [PATCH 2/3] Fix fsync livelock

    On Sun, 5 Oct 2008 23:30:51 -0400 (EDT)
    > The point is that many fsync()s may run in parallel and you have just
    > one inode and just one chain. And if you add two-word list_head to a
    > page, to link it on this list, many developers will hate it for
    > increasing its size.


    why to a page?
    a list head in the inode and chain up the bios....
    or not make an actual list but just a "is the previous one done" thing
    it's not all that hard to get something that works on a per inode basis,
    that gives "wait for all io upto this one".



    --
    Arjan van de Ven Intel Open Source Technology Centre
    For development, discussion and tips for power savings,
    visit http://www.lesswatts.org

  8. Re: [PATCH 2/3] Fix fsync livelock

    On Sun, 5 Oct 2008, Arjan van de Ven wrote:

    > On Sun, 5 Oct 2008 23:30:51 -0400 (EDT)
    > > The point is that many fsync()s may run in parallel and you have just
    > > one inode and just one chain. And if you add two-word list_head to a
    > > page, to link it on this list, many developers will hate it for
    > > increasing its size.

    >
    > why to a page?
    > a list head in the inode and chain up the bios....


    And if you want to wait for a bio submitted by a different process?
    There's no way you can find the bio from the page.

    > or not make an actual list but just a "is the previous one done" thing
    > it's not all that hard to get something that works on a per inode basis,
    > that gives "wait for all io up to this one".


    So code it

    Mikulas

  9. Re: [PATCH 2/3] Fix fsync livelock

    On Mon, 6 Oct 2008 09:00:14 -0400 (EDT)
    Mikulas Patocka wrote:

    > On Sun, 5 Oct 2008, Arjan van de Ven wrote:
    >
    > > On Sun, 5 Oct 2008 23:30:51 -0400 (EDT)
    > > > The point is that many fsync()s may run in parallel and you have
    > > > just one inode and just one chain. And if you add two-word
    > > > list_head to a page, to link it on this list, many developers
    > > > will hate it for increasing its size.

    > >
    > > why to a page?
    > > a list head in the inode and chain up the bios....

    >
    > And if you want to wait for a bio submitted by a different process?
    > There's no way you can find the bio from the page.


    the point is that the kernel would always chain it to the inode,
    independent of who or when it is submitted


    --
    Arjan van de Ven Intel Open Source Technology Centre
    For development, discussion and tips for power savings,
    visit http://www.lesswatts.org

  10. Re: [PATCH 2/3] Fix fsync livelock

    On Mon, 6 Oct 2008, Arjan van de Ven wrote:

    > On Mon, 6 Oct 2008 09:00:14 -0400 (EDT)
    > Mikulas Patocka wrote:
    >
    > > On Sun, 5 Oct 2008, Arjan van de Ven wrote:
    > >
    > > > On Sun, 5 Oct 2008 23:30:51 -0400 (EDT)
    > > > > The point is that many fsync()s may run in parallel and you have
    > > > > just one inode and just one chain. And if you add two-word
    > > > > list_head to a page, to link it on this list, many developers
    > > > > will hate it for increasing its size.
    > > >
    > > > why to a page?
    > > > a list head in the inode and chain up the bios....

    > >
    > > And if you want to wait for a bio submitted by a different process?
    > > There's no way you can find the bio from the page.

    >
    > the point is that the kernel would always chain it to the inode,
    > independent of who or when it is submitted


    If you add a list to an inode, you need to protect it with a spinlock. So
    you take one more spinlock for every write bio submitted --- a lot of
    developers would hate it.

    Another problem: how do you walk all the dirty pages and submit bios for
    them?

    Allocating and submitting a bio can block (if you run out of some mempool),
    and in that case it waits until some other bio finishes. During this time,
    more dirty pages can be created.

    Also, if you find a page that is both dirty and under writeback, you need
    to wait until that writeback finishes and then initiate another writeback
    (because the old writeback may be writing stale data). You block again,
    and more dirty pages can appear.

    And if you block and more dirty pages appear, you are prone to the
    livelock.

    [ In Nick Piggin's patch, the whole address space is locked, dirty pages
    are marked in one non-blocking pass, and the marked pages are written in a
    second, blocking pass --- so that if more dirty pages appear while bios are
    being submitted, the new pages are skipped ]
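
    [ A conceptual sketch of that two-pass scheme --- the helpers below are
    hypothetical and this is not the patch itself, though mainline later
    gained something similar with a "to-write" radix-tree tag
    (tag_pages_for_writeback()): ]

        static int fsync_two_pass(struct address_space *mapping)
        {
                /* Pass 1 (non-blocking): tag the pages that are dirty at
                 * this instant; pages dirtied later never get the tag. */
                tag_currently_dirty_pages(mapping);            /* hypothetical */

                /* Pass 2 (may block on bio allocation and writeback):
                 * write back and wait on tagged pages only, so newly
                 * dirtied pages cannot prolong the sync. */
                return write_and_wait_tagged_pages(mapping);   /* hypothetical */
        }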

    Mikulas

  11. Re: application syncing options (was Re: [PATCH] Memory management livelock)

    On Sun, 5 Oct 2008, Mikulas Patocka wrote:

    > On Sun, 5 Oct 2008, david@lang.hm wrote:
    >
    >> On Sun, 5 Oct 2008, Mikulas Patocka wrote:
    >>
    >>> On Fri, 3 Oct 2008, david@lang.hm wrote:
    >>>
    >>>> I've also seen discussions of how the
    >>>> kernel filesystem code can do ordered writes without having to wait for
    >>>> them
    >>>> with the use of barriers, is this capability exported to userspace? if so,
    >>>> could you point me at documentation for it?
    >>>
    >>> It isn't. And it is good that it isn't --- the more complicated API, the
    >>> more maintenance work.

    >>
    >> I can understand that most software would not want to deal with complications
    >> like this, but for things that have requirements similar to journaling
    >> filesystems (databases for example) it would seem that there would be
    >> advantages to exposing these capabilities.
    >>
    >> David Lang

    >
    > If you invent new interface that allows submitting several ordered IOs
    > from userspace, it will require excessive maintenance overhead over long
    > period of time. So it should be only justified, if the performance
    > improvement is excessive as well.
    >
    > It should not be like "here you improve 10% performance on some synthetic
    > benchmark in one application that was rewritten to support the new
    > interface" and then create a few more security vulnerabilities (because of
    > the complexity of the interface) and damage overall Linux progress,
    > because everyone is catching bugs in the new interface and checking it for
    > correctness.


    the same benchmarks that show that it's far better for the in-kernel
    filesystem code to use write barriers should apply for FUSE filesystems.

    this isn't a matter of a few % in performance; if an application is
    sync-limited in a way that can be converted to write-ordered, the potential
    is for the application to speed up by many times.

    programs that maintain indexes or caches of data that lives in other files
    will be able to write data && barrier && write index && fsync and double
    their performance vs write data && fsync && write index && fsync
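
    (the second sequence --- the only one possible today --- looks roughly
    like this in userspace C; descriptors and error handling are only
    sketched, and the helper name is made up:

        #include <unistd.h>

        /* the data must be durable before the index that points at it,
         * so the application blocks twice */
        static int update_with_fsyncs(int data_fd, int index_fd,
                                      const void *data, size_t dlen,
                                      const void *idx, size_t ilen)
        {
                if (write(data_fd, data, dlen) != (ssize_t)dlen)
                        return -1;
                if (fsync(data_fd) != 0)        /* first wait */
                        return -1;
                if (write(index_fd, idx, ilen) != (ssize_t)ilen)
                        return -1;
                return fsync(index_fd);         /* second wait */
        }

    a userspace barrier would let the first wait disappear)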

    databases can potentially do even better. today they need to fsync data to
    disk before they can update their journal to indicate that the data has
    been written; with a barrier they could order the writes so that the write
    to the journal doesn't happen until after the writes of the data. they
    would never need to call an fsync at all (when emptying the journal)

    for systems without solid-state drives or battery-backed caches, the
    ability to eliminate fsyncs by being able to rely on the order of the
    writes is a huge benefit.

    David Lang

  12. Re: application syncing options (was Re: [PATCH] Memory management livelock)

    > > If you invent new interface that allows submitting several ordered IOs
    > > from userspace, it will require excessive maintenance overhead over long
    > > period of time. So it should be only justified, if the performance
    > > improvement is excessive as well.
    > >
    > > It should not be like "here you improve 10% performance on some synthetic
    > > benchmark in one application that was rewritten to support the new
    > > interface" and then create a few more security vulnerabilities (because of
    > > the complexity of the interface) and damage overall Linux progress,
    > > because everyone is catching bugs in the new interface and checking it for
    > > correctness.

    >
    > the same benchmarks that show that it's far better for the in-kernel
    > filesystem code to use write barriers should apply for FUSE filesystems.


    FUSE is slow by design, and it is used in cases where performance isn't
    crucial.

    > this isn't a matter of a few % in performance; if an application is
    > sync-limited in a way that can be converted to write-ordered, the potential is
    > for the application to speed up by many times.
    >
    > programs that maintain indexes or caches of data that lives in other files
    > will be able to write data && barrier && write index && fsync and double their
    > performance vs write data && fsync && write index && fsync


    They can do: write data with O_SYNC; write another piece of data with
    O_SYNC.

    And the only difference from barriers is the waiting time after the first
    O_SYNC before the second I/O is submitted (such delay wouldn't happen with
    barriers).
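
    (Roughly, the O_SYNC pattern being described --- illustrative only,
    error handling trimmed; with O_SYNC each write returns only once the
    data is on stable storage, so the ordering falls out of the blocking:)

        #include <unistd.h>

        /* fd_a and fd_b are assumed to have been opened with O_SYNC */
        static int write_two_ordered(int fd_a, const void *a, size_t alen,
                                     int fd_b, const void *b, size_t blen)
        {
                if (write(fd_a, a, alen) != (ssize_t)alen)
                        return -1;      /* returns after a is durable */
                if (write(fd_b, b, blen) != (ssize_t)blen)
                        return -1;      /* hence ordered after a      */
                return 0;
        }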

    And since the I/O delay is in milliseconds and the process wakeup time is
    tens of microseconds, it doesn't look like eliminating the process wakeup
    time would gain more than a few percent.

    > databases can potentially do even better. today they need to fsync data to
    > disk before they can update their journal to indicate that the data has been
    > written; with a barrier they could order the writes so that the write to the
    > journal doesn't happen until after the writes of the data. they would never
    > need to call an fsync at all (when emptying the journal)


    Good databases can pack several user transactions into one fsync() write.
    If the database server is properly engineered, it accumulates all user
    transactions committed so far into one chunk, writes that chunk with one
    fsync() call and then reports successful commit to the clients.
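
    (Group commit in miniature --- a sequential sketch, not taken from any
    real database; a real server gathers the batch from concurrent client
    sessions. The point: N pending transactions share one write and one
    fsync(), so throughput is set by transactions-per-fsync rather than
    fsyncs-per-transaction.)

        #include <sys/uio.h>
        #include <unistd.h>

        /* write all queued transaction records, make them durable with a
         * single fsync(), and only then acknowledge the n commits
         * (short writes are ignored here for brevity) */
        static int commit_batch(int log_fd, const struct iovec *txns, int n)
        {
                if (writev(log_fd, txns, n) < 0)
                        return -1;
                return fsync(log_fd);
        }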

    So if you increase fsync() latency, it should have no effect on the
    transactional throughput --- only on latency of transactions. Similarly,
    if you decrease fsync() latency, it won't increase number of processed
    transactions.

    Certainly, there are primitive embedded database libraries that fsync()
    after each transaction, but they don't have good performance anyway.

    > for systems without solid-state drives or battery-backed caches, the ability
    > to eliminate fsyncs by being able to rely on the order of the writes is a huge
    > benefit.


    I may ask --- where are the applications that require extra slow fsync()
    latency? Databases are not that, they batch transactions.

    If you want to improve things, you can try:
    * implement O_DSYNC (like O_SYNC, but doesn't update inode mtime)
    * implement range_fsync and range_fdatasync (sync on a file range --- the
    kernel already has support for that, you can just add a syscall)
    * turn on the FUA bit for O_DSYNC writes, which eliminates the need to
    flush the drive cache in the O_DSYNC call

    --- these are definitely less invasive than a new I/O submission interface.
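
    (For reference, a data-only range sync is already reachable from
    userspace as sync_file_range(2), available since Linux 2.6.17 --- note
    that it writes back data pages only, with no metadata and no drive-cache
    flush, so it is weaker than the range_fdatasync proposed above:)

        #define _GNU_SOURCE
        #include <fcntl.h>

        static int sync_byte_range(int fd, off64_t off, off64_t len)
        {
                return sync_file_range(fd, off, len,
                                       SYNC_FILE_RANGE_WAIT_BEFORE |
                                       SYNC_FILE_RANGE_WRITE |
                                       SYNC_FILE_RANGE_WAIT_AFTER);
        }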

    Mikulas

    > David Lang
    >


  13. Re: application syncing options (was Re: [PATCH] Memory management livelock)

    On Tue, 7 Oct 2008, Mikulas Patocka wrote:

    >>> If you invent new interface that allows submitting several ordered IOs
    >>> from userspace, it will require excessive maintenance overhead over long
    >>> period of time. So it should be only justified, if the performance
    >>> improvement is excessive as well.
    >>>
    >>> It should not be like "here you improve 10% performance on some synthetic
    >>> benchmark in one application that was rewritten to support the new
    >>> interface" and then create a few more security vulnerabilities (because of
    >>> the complexity of the interface) and damage overall Linux progress,
    >>> because everyone is catching bugs in the new interface and checking it for
    >>> correctness.

    >>
    >> the same benchmarks that show that it's far better for the in-kernel
    >> filesystem code to use write barriers should apply for FUSE filesystems.

    >
    > FUSE is slow by design, and it is used in cases where performance isn't
    > crucial.


    FUSE is slow, but I don't believe that it's a design goal for it to be
    slow; it's a limitation of the implementation. so things that could speed
    it up would be a good thing.

    >> this isn't a matter of a few % in performance; if an application is
    >> sync-limited in a way that can be converted to write-ordered, the potential is
    >> for the application to speed up by many times.
    >>
    >> programs that maintain indexes or caches of data that lives in other files
    >> will be able to write data && barrier && write index && fsync and double their
    >> performance vs write data && fsync && write index && fsync

    >
    > They can do: write data with O_SYNC; write another piece of data with
    > O_SYNC.
    >
    > And the only difference from barriers is the waiting time after the first
    > O_SYNC before the second I/O is submitted (such delay wouldn't happen with
    > barriers).
    >
    > And since the I/O delay is in milliseconds and the process wakeup time is
    > tens of microseconds, it doesn't look like eliminating the process wakeup
    > time would gain more than a few percent.


    each sync write needs to wait for a disk rotation (and a seek if you are
    writing to different files). if you only do two writes you save one disk
    rotation, if you do five writes you save four disk rotations
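
    (for scale: at 7,200 RPM one revolution is roughly 8.3 ms, so five
    dependent synchronous writes pay on the order of 40 ms of rotational
    latency, versus roughly one revolution's worth if they could all be
    queued in order up front)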

    >> databases can potentially do even better. today they need to fsync data to
    >> disk before they can update their journal to indicate that the data has been
    >> written; with a barrier they could order the writes so that the write to the
    >> journal doesn't happen until after the writes of the data. they would never
    >> need to call an fsync at all (when emptying the journal)

    >
    > Good databases can pack several user transactions into one fsync() write.
    > If the database server is properly engineered, it accumulates all user
    > transactions committed so far into one chunk, writes that chunk with one
    > fsync() call and then reports successful commit to the clients.


    if there are multiple users doing transactions at the same time they will
    benefit from overlapping the fsyncs. but each user session cannot complete
    their transaction until the fsync completes

    > So if you increase fsync() latency, it should have no effect on the
    > transactional throughput --- only on latency of transactions. Similarly,
    > if you decrease fsync() latency, it won't increase number of processed
    > transactions.


    only if you have all your transactions happening in parallel. in the real
    world programs sometimes need to wait for one transaction to complete so
    that they can do the next one.

    > Certainly, there are primitive embedded database libraries that fsync()
    > after each transaction, but they don't have good performance anyway.
    >
    >> for systems without solid-state drives or battery-backed caches, the ability
    >> to eliminate fsyncs by being able to rely on the order of the writes is a huge
    >> benefit.

    >
    > I may ask --- where are the applications that require extra slow fsync()
    > latency? Databases are not that, they batch transactions.
    >
    > If you want to improve things, you can try:
    > * implement O_DSYNC (like O_SYNC, but doesn't update inode mtime)
    > * implement range_fsync and range_fdatasync (sync on a file range --- the
    > kernel already has support for that, you can just add a syscall)
    > * turn on the FUA bit for O_DSYNC writes, which eliminates the need to
    > flush the drive cache in the O_DSYNC call
    >
    > --- these are definitely less invasive than a new I/O submission interface.


    but all of these require that the application stop and wait for each
    separate write to take place before proceeding to the next step.

    if this doesn't matter, then why the big push to have the in-kernel
    filesystems start using barriers? I understood that this resulted in large
    performance increases in the places that they are used from just being
    able to avoid having to drain the entire request queue, and you are saying
    that the applications would not only need to wait for the queue to flush,
    but for the disk to acknowledge the write.

    syncs are slow, in some cases _very_ slow.

    David Lang

  14. Re: [PATCH 2/3] Fix fsync livelock

    On Sun 2008-10-05 17:30:19, Arjan van de Ven wrote:
    > On Sun, 5 Oct 2008 20:01:46 -0400 (EDT)
    > Mikulas Patocka wrote:
    >
    > > I assume that if very few people complained about the livelock till
    > > now, very few people will see degraded write performance. My patch
    > > blocks the writes only if the livelock happens, so if the livelock
    > > doesn't happen in unpatched kernel for most people, the patch won't
    > > make it worse.

    >
    > I object to calling this a livelock. It's not.


    8 hours of a process in D state is a livelock. And we can do a minimal fix
    here; this almost never happens in real life anyway.

    Latency imposed on the writer should not be a problem...
    --
    (english) http://www.livejournal.com/~pavelmachek
    (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pav...rses/blog.html

  15. Re: RFC: one-bit mutexes (was: Re: [PATCH 2/3] Memory management livelock)

    On Monday 06 October 2008 09:11, Mikulas Patocka wrote:
    > Hi
    >
    > I removed the repeated code and created new bit mutexes. They are
    > space-efficient mutexes that consume only one bit. See the next 3 patches.


    Pretty reasonable to have.


    > If you are concerned about the size of an inode, I can convert other
    > mutexes to bit mutexes: i_mutex and inotify_mutex.


    I wouldn't worry for now. mutexes can be unlocked much faster than bit
    mutexes, especially in the fastpath. And due to slab, it would be
    unlikely to actually save any space.


    > I could also create
    > bit_spinlock (one-bit spinlock that uses test_and_set_bit) and save space
    > for address_space->tree_lock, address_space->i_mmap_lock,
    > address_space->private_lock, inode->i_lock.


    We have that already. It is much much faster to unlock spinlocks than
    bit spinlocks in general (if you own the word exclusively, then it's
    not, but then you would be less likely to save space), and we can also
    do proper FIFO ticket locks with a larger word.
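
    (The existing primitive here is presumably bit_spin_lock() /
    bit_spin_unlock() from <linux/bit_spinlock.h>; a small usage
    illustration with a made-up structure:)

        #include <linux/bit_spinlock.h>

        struct tiny {
                unsigned long flags;    /* bit 0 doubles as the lock */
                int counter;
        };

        static void tiny_inc(struct tiny *t)
        {
                bit_spin_lock(0, &t->flags);    /* effectively spins on a
                                                 * test-and-set of bit 0 */
                t->counter++;
                bit_spin_unlock(0, &t->flags);
        }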


    > Look at it and say what you think about the idea of condensing mutexes
    > into single bits.


    Looks pretty good to me.

  16. Re: RFC: one-bit mutexes (was: Re: [PATCH 2/3] Memory management livelock)

    > > If you are concerned about the size of an inode, I can convert other
    > > mutexes to bit mutexes: i_mutex and inotify_mutex.

    >
    > I wouldn't worry for now. mutexes can be unlocked much faster than bit
    > mutexes, especially in the fastpath. And due to slab, it would be
    > unlikely to actually save any space.


    Maybe inotify_mutex. You are right that i_mutex is so heavily contended
    that slowing it down to save a few words wouldn't be good. Do you know
    about any inotify-intensive workload?

    > > I could also create
    > > bit_spinlock (one-bit spinlock that uses test_and_set_bit) and save space
    > > for address_space->tree_lock, address_space->i_mmap_lock,
    > > address_space->private_lock, inode->i_lock.

    >
    > We have that already. It is much much faster to unlock spinlocks than
    > bit spinlocks in general (if you own the word exclusively, then it's
    > not, but then you would be less likely to save space), and we can also
    > do proper FIFO ticket locks with a larger word.


    BTW, why do spinlocks on x86(64) have 32 bits and not 8 bits or 16 bits?
    Are atomic 32-bit instructions faster?

    Can an x86(64) system have 256 CPUs?

    Mikulas

  17. Re: RFC: one-bit mutexes (was: Re: [PATCH 2/3] Memory management livelock)

    On Tuesday 21 October 2008 07:14, Mikulas Patocka wrote:
    > > > If you are concerned about the size of an inode, I can convert other
    > > > mutexes to bit mutexes: i_mutex and inotify_mutex.

    > >
    > > I wouldn't worry for now. mutexes can be unlocked much faster than bit
    > > mutexes, especially in the fastpath. And due to slab, it would be
    > > unlikely to actually save any space.

    >
    > Maybe inotify_mutex. You are right that i_mutex is so heavily contended
    > that slowing it down to save a few words wouldn't be good. Do you know
    > about any inotify-intensive workload?


    Don't really know, no. I think most desktop environments use it to
    some extent, but no idea how much.


    > > > I could also create
    > > > bit_spinlock (one-bit spinlock that uses test_and_set_bit) and save
    > > > space for address_space->tree_lock, address_space->i_mmap_lock,
    > > > address_space->private_lock, inode->i_lock.

    > >
    > > We have that already. It is much much faster to unlock spinlocks than
    > > bit spinlocks in general (if you own the word exclusively, then it's
    > > not, but then you would be less likely to save space), and we can also
    > > do proper FIFO ticket locks with a larger word.

    >
    > BTW, why do spinlocks on x86(64) have 32 bits and not 8 bits or 16 bits?
    > Are atomic 32-bit instructions faster?


    In the case of <= 256 CPUs, they could be an unsigned short I think.
    Probably it has never been found to be a huge win because they are
    often beside other ints or longs. I think I actually booted up the
    kernel with 16-bit spinlocks when doing the FIFO locks, but never
    sent a patch for it... Don't let me stop you from trying though.


    > Can an x86(64) system have 256 CPUs?


    Well, none that I know of which actually exist. SGI is hoping to have
    4096 CPU x86 systems as far as I can tell.
