[PATCH] vfs: Fix lock inversion in drop_pagecache_sb() - Kernel


  1. [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()

    Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
    before calling __invalidate_mapping_pages(). We just have to make sure
    inode won't go away from under us by keeping reference to it and putting
    the reference only after we have safely resumed the scan of the inode
    list. A bit tricky but not too bad...

    Signed-off-by: Jan Kara
    CC: Fengguang Wu
    CC: David Chinner

    ---
    fs/drop_caches.c | 8 +++++++-
    1 files changed, 7 insertions(+), 1 deletions(-)

    diff --git a/fs/drop_caches.c b/fs/drop_caches.c
    index 59375ef..f5aae26 100644
    --- a/fs/drop_caches.c
    +++ b/fs/drop_caches.c
    @@ -14,15 +14,21 @@ int sysctl_drop_caches;

     static void drop_pagecache_sb(struct super_block *sb)
     {
    -	struct inode *inode;
    +	struct inode *inode, *toput_inode = NULL;

     	spin_lock(&inode_lock);
     	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
     		if (inode->i_state & (I_FREEING|I_WILL_FREE))
     			continue;
    +		__iget(inode);
    +		spin_unlock(&inode_lock);
     		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
    +		iput(toput_inode);
    +		toput_inode = inode;
    +		spin_lock(&inode_lock);
     	}
     	spin_unlock(&inode_lock);
    +	iput(toput_inode);
     }

     void drop_pagecache(void)
    --
    1.5.2.4

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
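The iteration trick in the patch (pin the current inode with __iget(), drop inode_lock for the work, and release the previous inode only once the scan has safely moved on) can be sketched outside the kernel. The following is a minimal userspace model, assuming a pthread mutex in place of inode_lock and a singly linked list in place of sb->s_inodes. `struct obj`, `obj_get()`, `obj_put()`, `drop_pages()`, `drop_all_pages()` and `obj_add()` are hypothetical stand-ins for the inode, __iget(), iput(), __invalidate_mapping_pages(), drop_pagecache_sb() and inode creation. Like iput(), `obj_put()` takes the list lock itself, so it may only be called with the lock dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inode. */
struct obj {
	int refcount;      /* protected by list_lock */
	int freeing;       /* models I_FREEING|I_WILL_FREE */
	int nrpages;       /* models i_mapping->nrpages */
	struct obj *next;  /* list linkage, protected by list_lock */
};

/* Hypothetical stand-ins for sb->s_inodes and inode_lock. */
static struct obj *obj_list;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int list_lock_held;  /* debug aid; only meaningful single-threaded */

static void lock_list(void)
{
	pthread_mutex_lock(&list_lock);
	list_lock_held = 1;
}

static void unlock_list(void)
{
	list_lock_held = 0;
	pthread_mutex_unlock(&list_lock);
}

/* Models __iget(): the caller must hold list_lock. */
static void obj_get(struct obj *o)
{
	assert(list_lock_held);
	o->refcount++;
}

/* Models iput(): takes list_lock itself, so the caller must not hold it.
 * Dropping the last reference unlinks and frees the object. */
static void obj_put(struct obj *o)
{
	if (!o)
		return;
	lock_list();
	if (--o->refcount == 0) {
		struct obj **pp = &obj_list;

		while (*pp != o)
			pp = &(*pp)->next;
		*pp = o->next;
		free(o);
	}
	unlock_list();
}

/* Models __invalidate_mapping_pages(): must run without list_lock. */
static void drop_pages(struct obj *o)
{
	assert(!list_lock_held);
	o->nrpages = 0;
}

/* The scan pattern from the patch. */
static void drop_all_pages(void)
{
	struct obj *o, *toput = NULL;

	lock_list();
	for (o = obj_list; o; o = o->next) {
		if (o->freeing)
			continue;
		obj_get(o);        /* pin o so it survives the unlocked section */
		unlock_list();
		drop_pages(o);
		obj_put(toput);    /* previous node: safe, we have moved past it */
		toput = o;         /* keep o pinned: o->next is read after relock */
		lock_list();
	}
	unlock_list();
	obj_put(toput);
}

/* Helper: add a node holding one "cache" reference. */
static struct obj *obj_add(int nrpages, int freeing)
{
	struct obj *o = calloc(1, sizeof(*o));

	assert(o != NULL);
	o->refcount = 1;
	o->nrpages = nrpages;
	o->freeing = freeing;
	lock_list();
	o->next = obj_list;
	obj_list = o;
	unlock_list();
	return o;
}
```

The deferred put is the point of the design: putting `o` right before re-taking the lock could free it, and the loop would then read `o->next` from freed memory. Keeping `o` pinned until the next iteration (or the final put after the loop) avoids that, at the cost of one extra reference held across the unlocked section.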

  2. Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()

    On Tue, 25 Mar 2008 19:12:27 +0100
    Jan Kara wrote:

    > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
    > before calling __invalidate_mapping_pages(). We just have to make sure
    > inode won't go away from under us by keeping reference to it and putting
    > the reference only after we have safely resumed the scan of the inode
    > list. A bit tricky but not too bad...
    >
    > Signed-off-by: Jan Kara
    > CC: Fengguang Wu
    > CC: David Chinner
    >
    > ---
    > fs/drop_caches.c | 8 +++++++-
    > 1 files changed, 7 insertions(+), 1 deletions(-)
    >
    > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
    > index 59375ef..f5aae26 100644
    > --- a/fs/drop_caches.c
    > +++ b/fs/drop_caches.c
    > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
    >
    > static void drop_pagecache_sb(struct super_block *sb)
    > {
    > - struct inode *inode;
    > + struct inode *inode, *toput_inode = NULL;
    >
    > spin_lock(&inode_lock);
    > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
    > if (inode->i_state & (I_FREEING|I_WILL_FREE))
    > continue;


    OT: it might be worth having an `if (mapping->nrpages==0) continue' here.

    > + __iget(inode);
    > + spin_unlock(&inode_lock);
    > __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
    > + iput(toput_inode);
    > + toput_inode = inode;
    > + spin_lock(&inode_lock);
    > }
    > spin_unlock(&inode_lock);
    > + iput(toput_inode);
    > }
    >
    > void drop_pagecache(void)


    hrm. So we have a random ref on an inode without holding inode_lock. If
    we race with invalidate_list() we end up with an inode stuck on s_inodes
    and "Self-destruct in 5 seconds. Have a nice day...", don't we?
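Andrew's worry can be illustrated with a simplified model. Assume, as a sketch, that unmount-time invalidation evicts only entries whose reference count is zero and leaves referenced ones on the list, reporting them as busy; that busy case is what leads to the "Self-destruct in 5 seconds" message he quotes. A reference held by the scan across its unlocked section then shows up as a busy entry. `struct cached` and `invalidate_unused()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an in-core inode: only its reference count. */
struct cached {
	int refcount;
	int evicted;
};

/* Simplified model of unmount-time invalidation: unreferenced entries are
 * evicted, referenced ones are left on the list. Returns how many were left
 * behind; nonzero corresponds to the "busy inodes" case. */
static size_t invalidate_unused(struct cached *c, size_t n)
{
	size_t busy = 0;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (c[i].evicted)
			continue;
		if (c[i].refcount == 0)
			c[i].evicted = 1;
		else
			busy++;
	}
	return busy;
}
```

The model only shows why a stray reference matters at unmount time; whether the interleaving can actually happen depends on what serializes the scan against unmount, which the rest of the thread discusses.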

  3. Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()


    On Tue, 2008-03-25 at 12:53 -0700, Andrew Morton wrote:
    > hrm. So we have a random ref on an inode without holding inode_lock. If
    > we race with invalidate_list() we end up with an inode stuck on s_inodes
    > and "Self-destruct in 5 seconds. Have a nice day...", don't we?


    Calling drop_pagecache_sb() without having a reference to 'sb'? Surely
    not...

    Trond


  4. Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()

    On Tue, Mar 25, 2008 at 07:12:27PM +0100, Jan Kara wrote:
    > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
    > before calling __invalidate_mapping_pages(). We just have to make sure
    > inode won't go away from under us by keeping reference to it and putting
    > the reference only after we have safely resumed the scan of the inode
    > list. A bit tricky but not too bad...


    Reviewed-by: Fengguang Wu

    It's a handy trick for iterating through the list_head :-)
    I have used it in my filecache code, and it works nicely.

    Fengguang




  5. Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()

    On Tue, Mar 25, 2008 at 12:53:54PM -0700, Andrew Morton wrote:
    > OT: it might be worth having an `if (mapping->nrpages==0) continue' here.


    Good catch!

    There are about 25k in-core inodes on my desktop, and merely 10% of them have cached pages:

    % cat /proc/sys/fs/inode-state
    25395 129 0 0 0 0 0
    # wc -l /proc/filecache
    2542 /proc/filecache

    +		if (!inode->i_mapping || !inode->i_mapping->nrpages)
    +			continue;


  6. [PATCH] vfs: Skip inodes without pages to free in drop_pagecache_sb()


    Signed-off-by: Jan Kara
    CC: Fengguang Wu

    ---
    fs/drop_caches.c | 2 ++
    1 files changed, 2 insertions(+), 0 deletions(-)

    diff --git a/fs/drop_caches.c b/fs/drop_caches.c
    index f5aae26..7327a42 100644
    --- a/fs/drop_caches.c
    +++ b/fs/drop_caches.c
    @@ -20,6 +20,8 @@ static void drop_pagecache_sb(struct super_block *sb)
     	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
     		if (inode->i_state & (I_FREEING|I_WILL_FREE))
     			continue;
    +		if (inode->i_mapping->nrpages == 0)
    +			continue;
     		__iget(inode);
     		spin_unlock(&inode_lock);
     		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
    --
    1.5.2.4

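The effect of this follow-up patch can be sketched by counting how often the scan would pin an inode and drop inode_lock. This is a hypothetical model (`struct obj` and `lock_round_trips()` are illustrative names, not kernel code): each entry that passes the filters costs one __iget()/unlock/invalidate/iput()/relock round trip, and the `nrpages == 0` test lets empty entries be skipped while the lock is still held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: only the fields the scan's filters look at. */
struct obj {
	int freeing;    /* models I_FREEING|I_WILL_FREE */
	long nrpages;   /* models inode->i_mapping->nrpages */
};

/* Counts entries for which the scan would pin the object and drop the list
 * lock. With skip_empty set, entries without cached pages are passed over
 * while the lock is still held, as in the follow-up patch. */
static size_t lock_round_trips(const struct obj *objs, size_t n, int skip_empty)
{
	size_t trips = 0;

	assert(objs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (objs[i].freeing)
			continue;
		if (skip_empty && objs[i].nrpages == 0)
			continue;
		trips++;  /* __iget, unlock, invalidate, iput, relock */
	}
	return trips;
}
```

With Fengguang's numbers (about 2.5k of roughly 25k in-core inodes having cached pages), a filter like this would skip roughly 90% of those round trips.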

  7. Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()

    On Tue 25-03-08 12:53:54, Andrew Morton wrote:
    > OT: it might be worth having an `if (mapping->nrpages==0) continue' here.

    Good idea. I'll send a patch in a minute.

    > hrm. So we have a random ref on an inode without holding inode_lock. If
    > we race with invalidate_list() we end up with an inode stuck on s_inodes
    > and "Self-destruct in 5 seconds. Have a nice day...", don't we?

    We hold s_umount for reading, so we should be safe against someone trying
    to umount. We could still race with invalidate_list() called from
    check_disk_change(), but removing media without unmounting is bad behavior
    anyway. So I think we are fine.

    Honza
    --
    Jan Kara
    SUSE Labs, CR
