[PATCH] fsstack: fsstack_copy_inode_size locking - Kernel

This is a discussion on [PATCH] fsstack: fsstack_copy_inode_size locking - Kernel ; LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP. But akpm ...

+ Reply to Thread
Results 1 to 12 of 12

Thread: [PATCH] fsstack: fsstack_copy_inode_size locking

  1. [PATCH] fsstack: fsstack_copy_inode_size locking

    LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
    unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
    seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.

    But akpm was dissatisfied with the resulting patch: its lack of commentary,
    the #ifs, the nesting around i_size_read, the lack of attention to i_blocks.
    I promised to redo it with the general spin_lock_32bit() he proposed; but
    disliked the result, partly because "32bit" obscures the real constraints,
    which are best commented within fsstack_copy_inode_size itself.

    This version adds those comments, and uses sizeof comparisons which the
    compiler can optimize out, instead of CONFIG_SMP, CONFIG_LSF. BITS_PER_LONG.

    Signed-off-by: Hugh Dickins
    ---
    Should follow mmotm's git-unionfs-fixup.patch

    fs/stack.c | 58 +++++++++++++++++++++++++++++++------
    include/linux/fs_stack.h | 3 -
    2 files changed, 50 insertions(+), 11 deletions(-)

    --- mmotm/fs/stack.c 2008-06-27 13:39:18.000000000 +0100
    +++ linux/fs/stack.c 2008-06-27 18:24:19.000000000 +0100
    @@ -19,16 +19,56 @@
    * This function cannot be inlined since i_size_{read,write} is rather
    * heavy-weight on 32-bit systems
    */
    -void fsstack_copy_inode_size(struct inode *dst, const struct inode *src)
    +void fsstack_copy_inode_size(struct inode *dst, struct inode *src)
    {
    -#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
    - spin_lock(&dst->i_lock);
    -#endif
    - i_size_write(dst, i_size_read(src));
    - dst->i_blocks = src->i_blocks;
    -#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
    - spin_unlock(&dst->i_lock);
    -#endif
    + loff_t i_size;
    + blkcnt_t i_blocks;
    +
    + /*
    + * i_size_read() includes its own seqlocking and protection from
    + * preemption (see include/linux/fs.h): we need nothing extra for
    + * that here, and prefer to avoid nesting locks than attempt to
    + * keep i_size and i_blocks in synch together.
    + */
    + i_size = i_size_read(src);
    +
    + /*
    + * But if CONFIG_LSF (on 32-bit), we ought to make an effort to keep
    + * the two halves of i_blocks in synch despite SMP or PREEMPT - though
    + * stat's generic_fillattr() doesn't bother, and we won't be applying
    + * quotas (where i_blocks does become important) at the upper level.
    + *
    + * We don't actually know what locking is used at the lower level; but
    + * if it's a filesystem that supports quotas, it will be using i_lock
    + * as in inode_add_bytes(). tmpfs uses other locking, and its 32-bit
    + * is (just) able to exceed 2TB i_size with the aid of holes; but its
    + * i_blocks cannot carry into the upper long without almost 2TB swap -
    + * let's ignore that case.
    + */
    + if (sizeof(i_blocks) > sizeof(long))
    + spin_lock(&src->i_lock);
    + i_blocks = src->i_blocks;
    + if (sizeof(i_blocks) > sizeof(long))
    + spin_unlock(&src->i_lock);
    +
    + /*
    + * If CONFIG_SMP on 32-bit, it's vital for fsstack_copy_inode_size()
    + * to hold some lock around i_size_write(), otherwise i_size_read()
    + * may spin forever (see include/linux/fs.h). We don't necessarily
    + * hold i_mutex when this is called, so take i_lock for that case.
    + *
    + * And if CONFIG_LSF (on 32-bit), continue our effort to keep the
    + * two halves of i_blocks in synch despite SMP or PREEMPT: use i_lock
    + * for that case too, and do both at once by combining the tests.
    + *
    + * There is none of this locking overhead in the 64-bit case.
    + */
    + if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long))
    + spin_lock(&dst->i_lock);
    + i_size_write(dst, i_size);
    + dst->i_blocks = i_blocks;
    + if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long))
    + spin_unlock(&dst->i_lock);
    }
    EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);

    --- mmotm/include/linux/fs_stack.h 2008-06-27 13:39:19.000000000 +0100
    +++ linux/include/linux/fs_stack.h 2008-06-27 14:08:00.000000000 +0100
    @@ -21,8 +21,7 @@

    /* externs for fs/stack.c */
    extern void fsstack_copy_attr_all(struct inode *dest, const struct inode *src);
    -extern void fsstack_copy_inode_size(struct inode *dst,
    - const struct inode *src);
    +extern void fsstack_copy_inode_size(struct inode *dst, struct inode *src);

    /* inlines */
    static inline void fsstack_copy_attr_atime(struct inode *dest,
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [PATCH] unionfs: fix memory leak

    Unionfs has slowly been leaking memory: although many places do explicitly
    free the dentry's lower_paths array (and I've not changed those, in case
    the ordering is important), not all do - and adding a WARN_ON didn't seem
    to finger any particular culprit. So free_dentry_private_data needs to
    kfree lower_paths (other freeers are good about setting it to NULL).

    Signed-off-by: Hugh Dickins
    ---
    Should follow mmotm's git-unionfs-fixup.patch

    fs/unionfs/lookup.c | 1 +
    1 file changed, 1 insertion(+)

    --- mmotm/fs/unionfs/lookup.c 2008-06-27 13:39:18.000000000 +0100
    +++ linux/fs/unionfs/lookup.c 2008-06-27 14:08:00.000000000 +0100
    @@ -498,6 +498,7 @@ void free_dentry_private_data(struct den
    {
    if (!dentry || !dentry->d_fsdata)
    return;
    + kfree(UNIONFS_D(dentry)->lower_paths);
    kmem_cache_free(unionfs_dentry_cachep, dentry->d_fsdata);
    dentry->d_fsdata = NULL;
    }
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] fsstack: fsstack_copy_inode_size locking

    On Sun, Jun 29, 2008 at 01:05:09AM +0100, Hugh Dickins wrote:
    > LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
    > unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
    > seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.
    >
    > But akpm was dissatisfied with the resulting patch: its lack of commentary,
    > the #ifs, the nesting around i_size_read, the lack of attention to i_blocks.
    > I promised to redo it with the general spin_lock_32bit() he proposed; but
    > disliked the result, partly because "32bit" obscures the real constraints,
    > which are best commented within fsstack_copy_inode_size itself.
    >
    > This version adds those comments, and uses sizeof comparisons which the
    > compiler can optimize out, instead of CONFIG_SMP, CONFIG_LSF. BITS_PER_LONG.


    Btw, I hope fsstack doesn't rely on i_size having any particular
    meaning. As far as the VFS is concerned i_size is field only used by
    the filesystem (or library routines like generic_file_*).

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] fsstack: fsstack_copy_inode_size locking

    On Sun, 29 Jun 2008, Christoph Hellwig wrote:
    >
    > Btw, I hope fsstack doesn't rely on i_size having any particular
    > meaning. As far as the VFS is concerned i_size is field only used by
    > the filesystem (or library routines like generic_file_*).


    Interesting point. I can't speak for fsstack itself (I'm not even
    sure if it's anything beyond fs/stack.c and the tag I used to identify
    where this patch lies); but certainly fs/stack.c doesn't use i_size
    for anything, just duplicates it from the lower filesystem.

    unionfs (which I think you don't care for at all in general) does
    look as if it assumes it's the lower file size in a few places,
    when copying up or truncating. Isn't that reasonable? Wouldn't
    users make the same assumption?

    Or are you saying that filesystems which don't support the usual
    meaning of inode->i_size (leave it 0?) would supply their own
    equivalent to vmtruncate() if they support truncation, and their
    own getattr which fills in stat->size from somewhere else.

    Yes, I think you are saying that: unionfs may not play well with them.

    Hugh
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] fsstack: fsstack_copy_inode_size locking


    Hugh Dickins:
    > LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
    > unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
    > seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.


    I don't know why dst->i_lock is affected by src->i_size_seqcount.
    Do you mean that your test issued write(2) to the lower/actual file so
    frequently that i_size_read() in unionfs always failed?

    Is your test
    iogen01 export LTPROOT; rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:doio.f1.$$ 1000b:doio.f2.$$
    line in runtest/fs?


    Junjiro Okajima
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] fsstack: fsstack_copy_inode_size locking

    On Mon, 30 Jun 2008, hooanon05@yahoo.co.jp wrote:
    > Hugh Dickins:
    > > LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
    > > unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
    > > seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.


    (That, of course, is already fixed in Erez's unionfs git, so in mm;
    but I thought I'd better repeat the history because some of akpm's
    objections to pushing the fix to mainline came from the lack of
    this explanation when the patch was submitted for mainline.)

    >
    > I don't know why dst->i_lock is affected by src->i_size_seqcount.


    It certainly shouldn't be. The problem would have come from two
    racing i_size_write(dst)s, one of the unlocked increments getting
    lost, leaving seqcount odd, so the next i_size_read(dst) would
    spin forever waiting for it to go even.

    > Do you mean that your test issued write(2) to the lower/actual file


    I was merely running LTP, on a Core(2) Quad machine, with /tmp as
    unionfs mount of a tmpfs. It wouldn't have been doing anything
    directly to the lower file, that would all have been coming via
    the unionfs mount.

    > so frequently that i_size_read() in unionfs always failed?


    I'm not sure what you mean by that. i_size_read() doesn't fail,
    but it may loop; and if the seqcount has got out of step from
    concurrent unlocked i_size_write()s, then it'll spin forever.

    >
    > Is your test
    > iogen01 export LTPROOT; rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:doio.f1.$$ 1000b:doio.f2.$$
    > line in runtest/fs?


    That looks like it: I don't think I ever tried it as a standalone test,
    just investigated what was happening when standard LTP runs hung here.

    Hugh
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] fsstack: fsstack_copy_inode_size locking


    Hugh Dickins:
    > It certainly shouldn't be. The problem would have come from two
    > racing i_size_write(dst)s, one of the unlocked increments getting
    > lost, leaving seqcount odd, so the next i_size_read(dst) would
    > spin forever waiting for it to go even.


    I see.
    The unlocked increment can cause the next i_size_read() hang.


    > I'm not sure what you mean by that. i_size_read() doesn't fail,
    > but it may loop; and if the seqcount has got out of step from
    > concurrent unlocked i_size_write()s, then it'll spin forever.


    What I meant by "fail" was "loop" actually.
    And I understand that you didn't writing directly (bypassing unionfs)
    too.


    Junjiro Okajima
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] unionfs: fix memory leak

    In message , Hugh Dickins writes:
    > Unionfs has slowly been leaking memory: although many places do explicitly
    > free the dentry's lower_paths array (and I've not changed those, in case
    > the ordering is important), not all do - and adding a WARN_ON didn't seem
    > to finger any particular culprit. So free_dentry_private_data needs to
    > kfree lower_paths (other freeers are good about setting it to NULL).
    >
    > Signed-off-by: Hugh Dickins
    > ---
    > Should follow mmotm's git-unionfs-fixup.patch
    >
    > fs/unionfs/lookup.c | 1 +
    > 1 file changed, 1 insertion(+)
    >
    > --- mmotm/fs/unionfs/lookup.c 2008-06-27 13:39:18.000000000 +0100
    > +++ linux/fs/unionfs/lookup.c 2008-06-27 14:08:00.000000000 +0100
    > @@ -498,6 +498,7 @@ void free_dentry_private_data(struct den
    > {
    > if (!dentry || !dentry->d_fsdata)
    > return;
    > + kfree(UNIONFS_D(dentry)->lower_paths);
    > kmem_cache_free(unionfs_dentry_cachep, dentry->d_fsdata);
    > dentry->d_fsdata = NULL;
    > }



    Thanks, I'll take a closer look at this to ensure that all paths kfree
    lower_paths as needed.

    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] fsstack: fsstack_copy_inode_size locking

    In message , Hugh Dickins writes:
    > On Sun, 29 Jun 2008, Christoph Hellwig wrote:
    > >
    > > Btw, I hope fsstack doesn't rely on i_size having any particular
    > > meaning. As far as the VFS is concerned i_size is field only used by
    > > the filesystem (or library routines like generic_file_*).

    >
    > Interesting point. I can't speak for fsstack itself (I'm not even
    > sure if it's anything beyond fs/stack.c and the tag I used to identify
    > where this patch lies); but certainly fs/stack.c doesn't use i_size
    > for anything, just duplicates it from the lower filesystem.
    >
    > unionfs (which I think you don't care for at all in general) does
    > look as if it assumes it's the lower file size in a few places,
    > when copying up or truncating. Isn't that reasonable? Wouldn't
    > users make the same assumption?
    >
    > Or are you saying that filesystems which don't support the usual
    > meaning of inode->i_size (leave it 0?) would supply their own
    > equivalent to vmtruncate() if they support truncation, and their
    > own getattr which fills in stat->size from somewhere else.
    >
    > Yes, I think you are saying that: unionfs may not play well with them.
    >
    > Hugh


    Hugh, yes, the only place in fsstack where i_size is used is to copy the
    lower i_size to the upper one verbatim. If this assumption is incorrect for
    some lower file systems, then stackable file systems in general may have a
    problem with this assumption; in that case, we'll need an alternative way to
    "interpret" the lower i_size, and report the i_size in the upper inode (and
    hence to the user).

    Is there such an alternative?

    BTW, ecryptfs may have a problem with this, b/c it uses i_size_read/write
    b/t the lower and upper inodes. If some filesystems have a different
    interpretation for i_size, then stacking ecryptfs on top of them could be an
    issue. Mike?

    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH] fsstack: fsstack_copy_inode_size locking

    On Mon, Jun 30, 2008 at 02:19:27PM -0400, Erez Zadok wrote:
    > BTW, ecryptfs may have a problem with this, b/c it uses
    > i_size_read/write b/t the lower and upper inodes. If some
    > filesystems have a different interpretation for i_size, then
    > stacking ecryptfs on top of them could be an issue. Mike?


    eCryptfs depends on the results of i_size_read() on the lower inode
    when it needs to interpolate the filesize when the metadata is stored
    in the lower file's xattr region.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH] fsstack: fsstack_copy_inode_size locking

    On Mon, Jun 30, 2008 at 04:49:01PM -0500, Michael Halcrow wrote:
    > eCryptfs depends on the results of i_size_read() on the lower inode
    > when it needs to interpolate the filesize when the metadata is stored
    > in the lower file's xattr region.


    please use vfs_getattr to get the size in thst case.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH] fsstack: fsstack_copy_inode_size locking

    In message <20080701070350.GA22914@infradead.org>, Christoph Hellwig writes:
    > On Mon, Jun 30, 2008 at 04:49:01PM -0500, Michael Halcrow wrote:
    > > eCryptfs depends on the results of i_size_read() on the lower inode
    > > when it needs to interpolate the filesize when the metadata is stored
    > > in the lower file's xattr region.

    >
    > please use vfs_getattr to get the size in thst case.


    So, Christoph, is using vfs_getattr() the preferred choice for stackable
    file systems in general to get the lower inode size?

    If so, the Hugh's fsstack_copy_inode_size patch will have to be updated: and
    we'll probably have to change the names of the src/dst parameters to
    lower/upper, possibly watch out for whether "src" was the lower or upper
    inode, and be careful about locking semantics when calling vfs_getattr.

    Who'd have thought that a concept as simple as
    dst->i_size = src->i_size
    could turn out to be so hairy... 1/2 a :-)

    Thanks,
    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread