[PATCH] Properly notify block layer of sync writes - Kernel

  1. [PATCH] Properly notify block layer of sync writes

    Hi,

    fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
    then immediately wait on them. Conceptually, that makes them sync writes
    and we should treat them as such so that the IO schedulers can handle
    them appropriately.
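
The idea in the paragraph above can be sketched in user space: the sync hint is one bit in the rw argument, and a scheduler can test it to decide whether someone is waiting on the request. This is a minimal model, not kernel code. The `model_` names are hypothetical, the bit position is assumed to match the 2.6.26-era `BIO_RW_SYNC`, and the classification rule (reads are always sync, writes only when flagged) is an assumption about how a scheduler might use the bit.

```c
#include <stdbool.h>

/* Illustrative values only; assumed to mirror 2.6.26-era headers. */
enum { MODEL_READ = 0, MODEL_WRITE = 1 };
enum { MODEL_BIO_RW_SYNC = 4 };  /* bit position of the sync hint (assumption) */

/* A write whose submitter intends to wait for completion. */
#define MODEL_WRITE_SYNC (MODEL_WRITE | (1 << MODEL_BIO_RW_SYNC))

/* True if the request moves data towards the device (low bit set). */
static bool model_is_write(int rw)
{
    return (rw & 1) != 0;
}

/* True if the submitter set the sync hint bit. */
static bool model_has_sync_hint(int rw)
{
    return (rw & (1 << MODEL_BIO_RW_SYNC)) != 0;
}

/*
 * Hypothetical scheduler-side rule: a reader is always waiting for its
 * data, while a write counts as sync only when the hint is set.
 */
static bool model_treat_as_sync(int rw)
{
    return !model_is_write(rw) || model_has_sync_hint(rw);
}
```

Under this model a plain write issued and then waited on is indistinguishable from background writeback, which is the situation the patch is meant to fix.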

    This patch fixes a write starvation issue that Lin Ming reported, where
    xx is stuck for more than 2 minutes because of a large amount of
    synchronous IO in the system:

    INFO: task kjournald:20558 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    kjournald D ffff810010820978 6712 20558 2
    ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
    ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
    0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
    Call Trace:
    [] kobject_get+0x12/0x17
    [] getnstimeofday+0x2f/0x83
    [] sync_buffer+0x0/0x3f
    [] io_schedule+0x5d/0x9f
    [] sync_buffer+0x3b/0x3f
    [] __wait_on_bit+0x40/0x6f
    [] sync_buffer+0x0/0x3f
    [] out_of_line_wait_on_bit+0x6c/0x78
    [] wake_bit_function+0x0/0x23
    [] sync_dirty_buffer+0x98/0xcb
    [] journal_commit_transaction+0x97d/0xcb6
    [] lock_timer_base+0x26/0x4b
    [] kjournald+0xc1/0x1fb
    [] autoremove_wake_function+0x0/0x2e
    [] kjournald+0x0/0x1fb
    [] kthread+0x47/0x74
    [] schedule_tail+0x28/0x5d
    [] child_rip+0xa/0x12
    [] kthread+0x0/0x74
    [] child_rip+0x0/0x12

    Lin Ming confirms that this patch fixes the issue. I've run tests with
    it for the past week and no ill effects have been observed, so I'm
    proposing it for inclusion into 2.6.26.

    Signed-off-by: Jens Axboe
    ---
     fs/buffer.c        |   13 ++++++++-----
     include/linux/fs.h |    1 +
     2 files changed, 9 insertions(+), 5 deletions(-)

    diff --git a/fs/buffer.c b/fs/buffer.c
    index a073f3f..0f51c0f 100644
    --- a/fs/buffer.c
    +++ b/fs/buffer.c
    @@ -821,7 +821,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
     				 * contents - it is a noop if I/O is still in
     				 * flight on potentially older contents.
     				 */
    -				ll_rw_block(SWRITE, 1, &bh);
    +				ll_rw_block(SWRITE_SYNC, 1, &bh);
     				brelse(bh);
     				spin_lock(lock);
     			}
    @@ -2940,16 +2940,19 @@ void ll_rw_block(int rw, int nr, struct buffer_head *bhs[])
     	for (i = 0; i < nr; i++) {
     		struct buffer_head *bh = bhs[i];

    -		if (rw == SWRITE)
    +		if (rw == SWRITE || rw == SWRITE_SYNC)
     			lock_buffer(bh);
     		else if (test_set_buffer_locked(bh))
     			continue;

    -		if (rw == WRITE || rw == SWRITE) {
    +		if (rw == WRITE || rw == SWRITE || rw == SWRITE_SYNC) {
     			if (test_clear_buffer_dirty(bh)) {
     				bh->b_end_io = end_buffer_write_sync;
     				get_bh(bh);
    -				submit_bh(WRITE, bh);
    +				if (rw == SWRITE_SYNC)
    +					submit_bh(WRITE_SYNC, bh);
    +				else
    +					submit_bh(WRITE, bh);
     				continue;
     			}
     		} else {
    @@ -2978,7 +2981,7 @@ int sync_dirty_buffer(struct buffer_head *bh)
     	if (test_clear_buffer_dirty(bh)) {
     		get_bh(bh);
     		bh->b_end_io = end_buffer_write_sync;
    -		ret = submit_bh(WRITE, bh);
    +		ret = submit_bh(WRITE_SYNC, bh);
     		wait_on_buffer(bh);
     		if (buffer_eopnotsupp(bh)) {
     			clear_buffer_eopnotsupp(bh);
    diff --git a/include/linux/fs.h b/include/linux/fs.h
    index d490779..f25f95d 100644
    --- a/include/linux/fs.h
    +++ b/include/linux/fs.h
    @@ -83,6 +83,7 @@ extern int dir_notify_enable;
     #define READ_SYNC (READ | (1 << BIO_RW_SYNC))
     #define READ_META (READ | (1 << BIO_RW_META))
     #define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC))
    +#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNC))
     #define WRITE_BARRIER ((1 << BIO_RW) | (1 << BIO_RW_BARRIER))

     #define SEL_IN 1
    --
    1.5.6
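
The dispatch logic that the patch gives ll_rw_block() can be modelled in user space to make the mapping from the caller's rw value to the submit_bh() command explicit. This is a sketch under stated assumptions: the `m_` names are hypothetical, the numeric values are assumed to match 2.6.26-era headers, and buffer locking, dirty-bit handling and the read path are left out.

```c
#include <stdbool.h>

/* Illustrative values only; assumed to mirror 2.6.26-era headers. */
enum { M_READ = 0, M_WRITE = 1, M_READA = 2, M_SWRITE = 3 };
enum { M_BIO_RW_SYNC = 4 };  /* bit position of the sync hint (assumption) */

#define M_WRITE_SYNC  (M_WRITE  | (1 << M_BIO_RW_SYNC))
#define M_SWRITE_SYNC (M_SWRITE | (1 << M_BIO_RW_SYNC))

/*
 * Models the first test in the patched loop: SWRITE and SWRITE_SYNC wait
 * for the buffer lock, every other rw value skips buffers that are
 * already locked.
 */
static bool m_waits_for_lock(int rw)
{
    return rw == M_SWRITE || rw == M_SWRITE_SYNC;
}

/*
 * Models the second test: the command handed to submit_bh() for a dirty
 * buffer. Returns -1 for rw values that do not take the write path.
 */
static int m_submit_cmd(int rw)
{
    if (rw == M_WRITE || rw == M_SWRITE || rw == M_SWRITE_SYNC)
        return rw == M_SWRITE_SYNC ? M_WRITE_SYNC : M_WRITE;
    return -1;  /* read path, not modelled here */
}
```

In this model only SWRITE_SYNC carries the sync bit through to the block layer. Plain WRITE and SWRITE callers keep their previous behaviour, which matches the diff above.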


    --
    Jens Axboe

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] Properly notify block layer of sync writes

    On Fri, 27 Jun 2008 15:18:31 +0200
    Jens Axboe wrote:

    > Lin Ming confirms that this patch fixes the issue. I've run tests with
    > it for the past week and no ill effects have been observed, so I'm
    > proposing it for inclusion into 2.6.26.


    I expect we'll be wanting this in 2.6.25.x also?



  3. Re: [PATCH] Properly notify block layer of sync writes

    On Tue, Jul 01 2008, Andrew Morton wrote:
    > On Fri, 27 Jun 2008 15:18:31 +0200
    > Jens Axboe wrote:
    >
    > > Lin Ming confirms that this patch fixes the issue. I've run tests with
    > > it for the past week and no ill effects have been observed, so I'm
    > > proposing it for inclusion into 2.6.26.

    >
    > I expect we'll be wanting this in 2.6.25.x also?


    Yeah, I think so.

    --
    Jens Axboe
