Re: ext3: slow symlink corruption on umount... - Kernel

This is a discussion on Re: ext3: slow symlink corruption on umount... - Kernel ; Some additional info -- and attempting to cast a wider net on the CC: I do not see the long symlink corruption with mount -o data=writeback and we've now seen a couple cases where the symlink corruption does not require ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 32

Thread: Re: ext3: slow symlink corruption on umount...

  1. Re: ext3: slow symlink corruption on umount...

    Some additional info -- and attempting to cast a wider
    net on the CC:

    I do not see the long symlink corruption with mount -o data=writeback
    and we've now seen a couple cases where the symlink corruption
    does not require a umount...

    Arthur

    On Fri, Oct 24, 2008 at 11:37:34AM -0700, Arthur Jones wrote:
    > Hi All, I'm seeing slow symlink corruption on ext3 on linux-2.6.27,
    > yesterday's linux-2.6 git tree and 2.6.9 RHEL4.7. I.e. every kernel
    > I've tried I see this effect. To reproduce this, I need:
    >
    > * 250MB + tar file in memory (tmpfs or in the buffer cache)
    > * long symlinks in the tar file (over 60 characters)
    > * umount immediately after untarring
    >
    > What I see is that the symlinks are corrupted, e.g.:
    >
    > # ls -l etc/vmware-vix-disklib
    > etc/vmware-vix-disklib -> ??f
    >
    > fsck shows:
    >
    > Symlink /etc/vmware-vix-disklib (inode #16454) is invalid.
    >
    > Debugfs shows:
    >
    > debugfs: stat <16454>
    > Inode: 16454 Type: symlink Mode: 0777 Flags: 0x0 Generation: 1431972005
    > User: 0 Group: 0 Size: 65
    > File ACL: 0 Directory ACL: 0
    > Links: 1 Blockcount: 8
    > Fragment: Address: 0 Number: 0 Size: 0
    > ctime: 0x4900ac69 -- Thu Oct 23 09:55:05 2008
    > atime: 0x4900ac84 -- Thu Oct 23 09:55:32 2008
    > mtime: 0x4900ac69 -- Thu Oct 23 09:55:05 2008
    > BLOCKS:
    > (0):56034
    > TOTAL: 1
    >
    > I'm still tracking down exactly what's going on. Anyone seen
    > anything like this before? ext2 does not show this effect (I've
    > not tried ext4). It happens when the backing block device is
    > a SATA drive or flash.
    >
    > Thanks,
    >
    > Arthur

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: ext3: slow symlink corruption on umount...

    This one's turning out to be a slippery fish.
    I have found that the corruption appears to
    be due to ->writepage() not getting called at all
    for any of the long symlinks...

    Ring any bells anyone? Any ideas where to look
    or what to test? This is my first foray into
    ext3 and I could definitely use some expert advice...

    Arthur

    On Mon, Oct 27, 2008 at 09:54:23AM -0700, Arthur Jones wrote:
    > Some additional info -- and attempting to cast a wider
    > net on the CC:
    >
    > I do not see the long symlink corruption with mount -o data=writeback
    > and we've now seen a couple cases where the symlink corruption
    > does not require a umount...
    >
    > Arthur
    >
    > On Fri, Oct 24, 2008 at 11:37:34AM -0700, Arthur Jones wrote:
    > > Hi All, I'm seeing slow symlink corruption on ext3 on linux-2.6.27,
    > > yesterday's linux-2.6 git tree and 2.6.9 RHEL4.7. I.e. every kernel
    > > I've tried I see this effect. To reproduce this, I need:
    > >
    > > * 250MB + tar file in memory (tmpfs or in the buffer cache)
    > > * long symlinks in the tar file (over 60 characters)
    > > * umount immediately after untarring
    > >
    > > What I see is that the symlinks are corrupted, e.g.:
    > >
    > > # ls -l etc/vmware-vix-disklib
    > > etc/vmware-vix-disklib -> ??f
    > >
    > > fsck shows:
    > >
    > > Symlink /etc/vmware-vix-disklib (inode #16454) is invalid.
    > >
    > > Debugfs shows:
    > >
    > > debugfs: stat <16454>
    > > Inode: 16454 Type: symlink Mode: 0777 Flags: 0x0 Generation: 1431972005
    > > User: 0 Group: 0 Size: 65
    > > File ACL: 0 Directory ACL: 0
    > > Links: 1 Blockcount: 8
    > > Fragment: Address: 0 Number: 0 Size: 0
    > > ctime: 0x4900ac69 -- Thu Oct 23 09:55:05 2008
    > > atime: 0x4900ac84 -- Thu Oct 23 09:55:32 2008
    > > mtime: 0x4900ac69 -- Thu Oct 23 09:55:05 2008
    > > BLOCKS:
    > > (0):56034
    > > TOTAL: 1
    > >
    > > I'm still tracking down exactly what's going on. Anyone seen
    > > anything like this before? ext2 does not show this effect (I've
    > > not tried ext4). It happens when the backing block device is
    > > a SATA drive or flash.
    > >
    > > Thanks,
    > >
    > > Arthur

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: ext3: slow symlink corruption on umount...

    Arthur Jones wrote:
    > This one's turning out to be a slippery fish.
    > I have found that the corruption appears to
    > be due to ->writepage() not getting called at all
    > for any of the long symlinks...
    >
    > Ring any bells anyone? Any ideas where to look
    > or what to test? This is my first foray into
    > ext3 and I could definitely use some expert advice...
    >
    > Arthur



    Sorry for the silence, this is a nice bug you've found

    I can reproduce this at least, with this script:

    #!/bin/bash

    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    rm -f /mnt/test2/*
    dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
    touch
    /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryvery veryveryveryveryverylongfilename
    ln -s
    /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryvery veryveryveryveryverylongfilename
    /mnt/test2/link
    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    ls /mnt/test2/
    umount /mnt/test2

    I'll look into it ...

    -Eric
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: ext3: slow symlink corruption on umount...

    On Wed, Oct 29, 2008 at 03:36:33PM -0500, Eric Sandeen wrote:
    >
    > Sorry for the silence, this is a nice bug you've found
    >
    > I'll look into it ...


    You may want to take a quick look at this thread:

    http://lkml.org/lkml/2008/10/28/413

    - Ted
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: ext3: slow symlink corruption on umount...

    Theodore Tso wrote:
    > On Wed, Oct 29, 2008 at 03:36:33PM -0500, Eric Sandeen wrote:
    >> Sorry for the silence, this is a nice bug you've found
    >>
    >> I'll look into it ...

    >
    > You may want to take a quick look at this thread:
    >
    > http://lkml.org/lkml/2008/10/28/413
    >
    > - Ted


    I thought about that, but at least at first glance I don't see how the
    gfp mask change would cause this behavior...? At least, I don't think
    we're seeing recursion back into the filesystem... but I'll ponder that.

    (Also, Arthur reports seeing this as long ago as 2.6.9...)

    -Eric
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: ext3: slow symlink corruption on umount...

    Hi Eric, ...

    On Thu, Oct 30, 2008 at 06:38:30AM -0700, Eric Sandeen wrote:
    > Theodore Tso wrote:
    > > On Wed, Oct 29, 2008 at 03:36:33PM -0500, Eric Sandeen wrote:
    > >> Sorry for the silence, this is a nice bug you've found
    > >>
    > >> I'll look into it ...

    > >
    > > You may want to take a quick look at this thread:
    > >
    > > http://lkml.org/lkml/2008/10/28/413
    > >
    > > - Ted

    >
    > I thought about that, but at least at first glance I don't see how the
    > gfp mask change would cause this behavior...? At least, I don't think
    > we're seeing recursion back into the filesystem... but I'll ponder that.


    Back when I thought this was a 2.6.9 bug, I tried
    that patch. There was no change in behavior...

    Arthur
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: ext3: slow symlink corruption on umount...

    Hi Eric, ...

    On Wed, Oct 29, 2008 at 01:36:33PM -0700, Eric Sandeen wrote:
    > [...]
    > I'll look into it ...


    In the cases that fail, I'm seeing bdi_write_congested()
    return 2 at the top of write_cache_pages(). In the working
    case, bdi_write_congested() returns 0 and the inodes are
    found twice in the s_io list in generic_sync_sb_inodes,
    first as i_state==7, where they are skipped, then a second
    time as i_state==4, where ->writepage() is then called...

    Arthur
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: ext3: slow symlink corruption on umount...

    Arthur Jones wrote:
    > Hi Eric, ...
    >
    > On Wed, Oct 29, 2008 at 01:36:33PM -0700, Eric Sandeen wrote:
    >> [...]
    >> I'll look into it ...

    >
    > In the cases that fail, I'm seeing bdi_write_congested()
    > return 2 at the top of write_cache_pages(). In the working
    > case, bdi_write_congested() returns 0 and the inodes are
    > found twice in the s_io list in generic_sync_sb_inodes,
    > first as i_state==7, where they are skipped, then a second
    > time as i_state==4, where ->writepage() is then called...
    >
    > Arthur


    Something is definitely racy here; in my simple testcase I get failures
    maybe 30-50% of the time...

    If I add a mark_buffer_dirty to write_end_fn, it always passes but I
    need to think a bit about that.

    -Eric

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: ext3: slow symlink corruption on umount...

    Hi Eric, ...

    On Thu, Oct 30, 2008 at 10:40:57AM -0700, Arthur Jones wrote:
    > [...]
    > return 2 at the top of write_cache_pages(). In the working
    > case, bdi_write_congested() returns 0 and the inodes are
    > found twice in the s_io list in generic_sync_sb_inodes,
    > first as i_state==7, where they are skipped, then a second
    > time as i_state==4, where ->writepage() is then called...


    To clarify, they are not on the list twice, rather
    a separate call to generic_sync_sb_inodes sees them
    again as i_state==4...

    Arthur
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: ext3: slow symlink corruption on umount...

    Hi Eric, ...

    On Thu, Oct 30, 2008 at 11:03:49AM -0700, Eric Sandeen wrote:
    > [...]
    > Something is definitely racy here; in my simple testcase I get failures
    > maybe 30-50% of the time...


    Some more info: in the working case, the inodes are put
    back on sb->s_dirty at then next ext3_sync_fs() call:

    __fsync_super -> DQUOT_SYNC -> ext3_sync_fs -> log_wait_commit

    In the failing case, journal_start_commit returns 0 in ext_sync_fs
    and the inodes disappear into never-never land...

    Arthur
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: ext3: slow symlink corruption on umount...

    On Thu, Oct 30, 2008 at 02:34:00PM -0700, Arthur Jones wrote:
    > Hi Eric, ...
    >
    > On Thu, Oct 30, 2008 at 11:03:49AM -0700, Eric Sandeen wrote:
    > > [...]
    > > Something is definitely racy here; in my simple testcase I get failures
    > > maybe 30-50% of the time...

    >
    > Some more info: in the working case, the inodes are put
    > back on sb->s_dirty at then next ext3_sync_fs() call:
    >
    > __fsync_super -> DQUOT_SYNC -> ext3_sync_fs -> log_wait_commit
    >
    > In the failing case, journal_start_commit returns 0 in ext_sync_fs
    > and the inodes disappear into never-never land...


    More details, these are dumps at __log_start_commit in the
    call chain described above, the first column is the failing
    case, the next column is working case, t_expires is the delta
    from the time the dump was taken:

    journal->j_flags 0x10 0x10
    journal->j_tail_sequence 515 519
    journal->j_transaction_sequence 517 522
    journal->j_commit_sequence 514 519
    journal->j_commit_request 516 520

    journal->j_running_transaction->t_tid 516 521
    journal->j_running_transaction->t_state 0 0
    journal->j_running_transaction->t_updates 0 0
    journal->j_running_transaction->t_handle_count 27305 27344
    journal->j_running_transaction->t_expires -566 28

    Can you tell from this whether the transactions
    are messed up or whether we're just missing a
    wake_up? Any other info you'd like to see?

    Arthur
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: ext3: slow symlink corruption on umount...

    Arthur Jones wrote:
    > On Thu, Oct 30, 2008 at 02:34:00PM -0700, Arthur Jones wrote:
    >> Hi Eric, ...
    >>
    >> On Thu, Oct 30, 2008 at 11:03:49AM -0700, Eric Sandeen wrote:
    >>> [...]
    >>> Something is definitely racy here; in my simple testcase I get failures
    >>> maybe 30-50% of the time...

    >> Some more info: in the working case, the inodes are put
    >> back on sb->s_dirty at then next ext3_sync_fs() call:
    >>
    >> __fsync_super -> DQUOT_SYNC -> ext3_sync_fs -> log_wait_commit
    >>
    >> In the failing case, journal_start_commit returns 0 in ext_sync_fs
    >> and the inodes disappear into never-never land...

    >
    > More details, these are dumps at __log_start_commit in the
    > call chain described above, the first column is the failing
    > case, the next column is working case, t_expires is the delta
    > from the time the dump was taken:
    >
    > journal->j_flags 0x10 0x10
    > journal->j_tail_sequence 515 519
    > journal->j_transaction_sequence 517 522
    > journal->j_commit_sequence 514 519
    > journal->j_commit_request 516 520
    >
    > journal->j_running_transaction->t_tid 516 521
    > journal->j_running_transaction->t_state 0 0
    > journal->j_running_transaction->t_updates 0 0
    > journal->j_running_transaction->t_handle_count 27305 27344
    > journal->j_running_transaction->t_expires -566 28
    >
    > Can you tell from this whether the transactions
    > are messed up or whether we're just missing a
    > wake_up? Any other info you'd like to see?


    That's kind of along the lines of what I'm seeing; also, in particular,
    I'm never seeing the buffer_head in question (the one for the block
    which contains the slow link's data) transition from jbddirty to normal
    BH_Dirty. I've had to take a break from this today, but will be back at
    it a bit later... since I have a solid testcase I'm sure I'll get to the
    bottom of it ... I'll probably hook up akpm's buffer tracing
    infrastructure, just need to find a decent thing to trigger on to dump
    out the history.

    -Eric
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: ext3: slow symlink corruption on umount...

    On Friday 31 October 2008 00:38, Eric Sandeen wrote:
    > Theodore Tso wrote:
    > > On Wed, Oct 29, 2008 at 03:36:33PM -0500, Eric Sandeen wrote:
    > >> Sorry for the silence, this is a nice bug you've found
    > >>
    > >> I'll look into it ...

    > >
    > > You may want to take a quick look at this thread:
    > >
    > > http://lkml.org/lkml/2008/10/28/413
    > >
    > > - Ted

    >
    > I thought about that, but at least at first glance I don't see how the
    > gfp mask change would cause this behavior...? At least, I don't think
    > we're seeing recursion back into the filesystem... but I'll ponder that.


    That's definitely a bug which I'll have to fix for 2.6.28, but
    I agree it's unlikely to recurse frequently like this (would only
    happen under high memory pressure, and only when writeout from
    page reclaim happens).


    > (Also, Arthur reports seeing this as long ago as 2.6.9...)


    OK, well that would confirm it.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. [PATCH] ext3: wait on all pending commits in ext3_sync_fs

    Hi Eric, This patch fixes the problem for me, and
    seems to put the buffers on the dirty list at the
    place where they are put on the list during the working
    case. Despite having rooted around in the innards of
    ext3 for the last few days, I cannot say that I have
    any sense of whether this patch will cause problems
    elsewhere or even if this is the best place to
    intercede.

    I post the complete patch not because I think it
    should be committed as is, but rather to try
    to explain the logic that brought it about. At the
    very least, this should be reviewed by the experts
    here to make sure there is no collateral damage.

    Arthur

    -------------------
    In ext3_sync_fs, we only wait for a commit to
    finish if we started it, but there may be one
    already in progress which will not be synced.

    In the case of a data=ordered umount with pending long
    symlinks which are delayed due to a long list
    of other I/O on the backing block device, this
    causes the buffer associated with the long symlinks
    to not be moved to the inode dirty list in the second
    phase of fsync_super. Then, before they can be dirtied
    again, kjournald exits, seeing the UMOUNT flag and the
    dirty pages are never written to the backing block device,
    causing long symlink corruption and exposing new or
    previously freed block data to userspace.

    This can be reproduced with a script created
    by Eric Sandeen :

    #!/bin/bash

    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    rm -f /mnt/test2/*
    dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
    touch
    /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryvery veryveryveryveryverylongfilename
    ln -s
    /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryvery veryveryveryveryverylongfilename
    /mnt/test2/link
    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    ls /mnt/test2/
    umount /mnt/test2

    To ensure all commits are synced, we flush
    all journal commits now when sync_fs'ing ext3.

    Signed-off-by: Arthur Jones
    ---
    fs/ext3/super.c | 8 +++++++-
    1 files changed, 7 insertions(+), 1 deletions(-)

    diff --git a/fs/ext3/super.c b/fs/ext3/super.c
    index 18eaa78..053659a 100644
    --- a/fs/ext3/super.c
    +++ b/fs/ext3/super.c
    @@ -2392,7 +2392,13 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
    if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
    if (wait)
    log_wait_commit(EXT3_SB(sb)->s_journal, target);
    - }
    + } else if (wait)
    + /*
    + * We may have a commit in progress, clear it out
    + * before we go on...
    + */
    + ext3_force_commit(sb);
    +
    return 0;
    }

    --
    1.5.4.3
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs

    On Mon, 3 Nov 2008 10:44:26 -0800
    Arthur Jones wrote:

    > Hi Eric, This patch fixes the problem for me, and
    > seems to put the buffers on the dirty list at the
    > place where they are put on the list during the working
    > case. Despite having rooted around in the innards of
    > ext3 for the last few days, I cannot say that I have
    > any sense of whether this patch will cause problems
    > elsewhere or even if this is the best place to
    > intercede.
    >
    > I post the complete patch not because I think it
    > should be committed as is, but rather to try
    > to explain the logic that brought it about. At the
    > very least, this should be reviewed by the experts
    > here to make sure there is no collateral damage.
    >
    > Arthur
    >
    > -------------------
    > In ext3_sync_fs, we only wait for a commit to
    > finish if we started it, but there may be one
    > already in progress which will not be synced.


    argh.

    > --- a/fs/ext3/super.c
    > +++ b/fs/ext3/super.c
    > @@ -2392,7 +2392,13 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
    > if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
    > if (wait)
    > log_wait_commit(EXT3_SB(sb)->s_journal, target);
    > - }
    > + } else if (wait)
    > + /*
    > + * We may have a commit in progress, clear it out
    > + * before we go on...
    > + */
    > + ext3_force_commit(sb);
    > +
    > return 0;
    > }


    Can we do

    sb->s_dirt = 0;
    if (wait)
    ext3_force_commit(...);
    else
    journal_start_commit(...);

    ?


    Also, I wonder if that `sb->s_dirt = 0' is correct if
    journal_start_commit() didn't start a commit?

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs

    Arthur Jones wrote:
    > Hi Eric, This patch fixes the problem for me, and
    > seems to put the buffers on the dirty list at the
    > place where they are put on the list during the working
    > case. Despite having rooted around in the innards of
    > ext3 for the last few days, I cannot say that I have
    > any sense of whether this patch will cause problems
    > elsewhere or even if this is the best place to
    > intercede.
    >
    > I post the complete patch not because I think it
    > should be committed as is, but rather to try
    > to explain the logic that brought it about. At the
    > very least, this should be reviewed by the experts
    > here to make sure there is no collateral damage.
    >
    > Arthur


    Seems like the right approach; I too was thinking that the problem was
    we just weren't either kicking off, or waiting for, the log commit at
    unmount time.

    I've had to step away from this problem for a couple days but will
    eyeball this soon, it seems like the right root cause & general approach
    to a fix, to me.

    Thanks!

    -Eric

    > -------------------
    > In ext3_sync_fs, we only wait for a commit to
    > finish if we started it, but there may be one
    > already in progress which will not be synced.
    >
    > In the case of a data=ordered umount with pending long
    > symlinks which are delayed due to a long list
    > of other I/O on the backing block device, this
    > causes the buffer associated with the long symlinks
    > to not be moved to the inode dirty list in the second
    > phase of fsync_super. Then, before they can be dirtied
    > again, kjournald exits, seeing the UMOUNT flag and the
    > dirty pages are never written to the backing block device,
    > causing long symlink corruption and exposing new or
    > previously freed block data to userspace.
    >
    > This can be reproduced with a script created
    > by Eric Sandeen :
    >
    > #!/bin/bash
    >
    > umount /mnt/test2
    > mount /dev/sdb4 /mnt/test2
    > rm -f /mnt/test2/*
    > dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
    > touch
    > /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryvery veryveryveryveryverylongfilename
    > ln -s
    > /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryvery veryveryveryveryverylongfilename
    > /mnt/test2/link
    > umount /mnt/test2
    > mount /dev/sdb4 /mnt/test2
    > ls /mnt/test2/
    > umount /mnt/test2
    >
    > To ensure all commits are synced, we flush
    > all journal commits now when sync_fs'ing ext3.
    >
    > Signed-off-by: Arthur Jones
    > ---
    > fs/ext3/super.c | 8 +++++++-
    > 1 files changed, 7 insertions(+), 1 deletions(-)
    >
    > diff --git a/fs/ext3/super.c b/fs/ext3/super.c
    > index 18eaa78..053659a 100644
    > --- a/fs/ext3/super.c
    > +++ b/fs/ext3/super.c
    > @@ -2392,7 +2392,13 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
    > if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
    > if (wait)
    > log_wait_commit(EXT3_SB(sb)->s_journal, target);
    > - }
    > + } else if (wait)
    > + /*
    > + * We may have a commit in progress, clear it out
    > + * before we go on...
    > + */
    > + ext3_force_commit(sb);
    > +
    > return 0;
    > }
    >


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs

    Hi Andrew, ...

    On Mon, Nov 03, 2008 at 11:33:18AM -0800, Andrew Morton wrote:
    > [...]
    > > --- a/fs/ext3/super.c
    > > +++ b/fs/ext3/super.c
    > > @@ -2392,7 +2392,13 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
    > > if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
    > > if (wait)
    > > log_wait_commit(EXT3_SB(sb)->s_journal, target);
    > > - }
    > > + } else if (wait)
    > > + /*
    > > + * We may have a commit in progress, clear it out
    > > + * before we go on...
    > > + */
    > > + ext3_force_commit(sb);
    > > +
    > > return 0;
    > > }

    >
    > Can we do
    >
    > sb->s_dirt = 0;
    > if (wait)
    > ext3_force_commit(...);
    > else
    > journal_start_commit(...);


    I think this is what you had in mind:


    diff --git a/fs/ext3/super.c b/fs/ext3/super.c
    index 18eaa78..2b48b85 100644
    --- a/fs/ext3/super.c
    +++ b/fs/ext3/super.c
    @@ -2386,13 +2386,12 @@ static void ext3_write_super (struct super_block * sb)

    static int ext3_sync_fs(struct super_block *sb, int wait)
    {
    - tid_t target;
    -
    sb->s_dirt = 0;
    - if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
    - if (wait)
    - log_wait_commit(EXT3_SB(sb)->s_journal, target);
    - }
    + if (wait)
    + ext3_force_commit(sb);
    + else
    + journal_start_commit(EXT3_SB(sb)->s_journal, NULL);
    +
    return 0;
    }

    I tried this and it too fixes the problem. FWIW I agree it
    looks better...

    Tested-by: Arthur Jones

    Arthur
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs

    On Mon, 3 Nov 2008 12:14:28 -0800
    Arthur Jones wrote:

    > Hi Andrew, ...
    >
    > On Mon, Nov 03, 2008 at 11:33:18AM -0800, Andrew Morton wrote:
    > > [...]
    > > > --- a/fs/ext3/super.c
    > > > +++ b/fs/ext3/super.c
    > > > @@ -2392,7 +2392,13 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
    > > > if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
    > > > if (wait)
    > > > log_wait_commit(EXT3_SB(sb)->s_journal, target);
    > > > - }
    > > > + } else if (wait)
    > > > + /*
    > > > + * We may have a commit in progress, clear it out
    > > > + * before we go on...
    > > > + */
    > > > + ext3_force_commit(sb);
    > > > +
    > > > return 0;
    > > > }

    > >
    > > Can we do
    > >
    > > sb->s_dirt = 0;
    > > if (wait)
    > > ext3_force_commit(...);
    > > else
    > > journal_start_commit(...);

    >
    > I think this is what you had in mind:


    yup.

    >
    > diff --git a/fs/ext3/super.c b/fs/ext3/super.c
    > index 18eaa78..2b48b85 100644
    > --- a/fs/ext3/super.c
    > +++ b/fs/ext3/super.c
    > @@ -2386,13 +2386,12 @@ static void ext3_write_super (struct super_block * sb)
    >
    > static int ext3_sync_fs(struct super_block *sb, int wait)
    > {
    > - tid_t target;
    > -
    > sb->s_dirt = 0;
    > - if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
    > - if (wait)
    > - log_wait_commit(EXT3_SB(sb)->s_journal, target);
    > - }
    > + if (wait)
    > + ext3_force_commit(sb);
    > + else
    > + journal_start_commit(EXT3_SB(sb)->s_journal, NULL);
    > +
    > return 0;
    > }
    >
    > I tried this and it too fixes the problem. FWIW I agree it
    > looks better...
    >


    OK. But still, questions remain.

    - should we clear s_dirt if log_wait_commit() didn't work?

    - ext3_force_commit() clears s_dirt also

    - should ext3_sync_fs() be dropping the ext3_force_commit() return
    value on the floor?

    This is all rather worrisome.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs

    Hi Andrew, ...

    On Mon, Nov 03, 2008 at 12:37:50PM -0800, Andrew Morton wrote:
    > [...]
    > OK. But still, questions remain.
    >
    > - should we clear s_dirt if log_wait_commit() didn't work?
    >
    > - ext3_force_commit() clears s_dirt also
    >
    > - should ext3_sync_fs() be dropping the ext3_force_commit() return
    > value on the floor?


    - does ext4 have a similar issue (ext4_sync_fs looks the same)?

    Arthur
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs

    On Mon, Nov 03, 2008 at 01:13:13PM -0800, Andrew Morton wrote:
    > On Mon, 3 Nov 2008 12:58:54 -0800
    > Arthur Jones wrote:
    >
    > > Hi Andrew, ...
    > >
    > > On Mon, Nov 03, 2008 at 12:37:50PM -0800, Andrew Morton wrote:
    > > > [...]
    > > > OK. But still, questions remain.
    > > >
    > > > - should we clear s_dirt if log_wait_commit() didn't work?
    > > >
    > > > - ext3_force_commit() clears s_dirt also
    > > >
    > > > - should ext3_sync_fs() be dropping the ext3_force_commit() return
    > > > value on the floor?


    Should all the callers of ->sync_fs also be dropping the error returns
    on the floor?

    > >
    > > - does ext4 have a similar issue (ext4_sync_fs looks the same)?
    > >

    >
    > I'm sure it does. Someone(tm) should prepare the ext4 patch once we've
    > settled on the ext3 patch.


    I'll take care of it. And I would reflect the error returns up to
    ext3_sync_fs's callers, although at the moment it's not clear it would
    make much of a difference. :-(

    - Ted
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 2 1 2 LastLast