ext3 lockdep warning in 2.6.25-rc6 - Kernel


  1. ext3 lockdep warning in 2.6.25-rc6

    I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    got the lockdep warning below. Note that unionfs doesn't appear to be
    involved in this lockdep warning at all, so I suspect this is probably an
    issue between jbd and ext3 directly.

    Let me know if I can be of more help.

    Cheers,
    Erez.


    [31270.749254] =======================================================
    [31270.749473] [ INFO: possible circular locking dependency detected ]
    [31270.749720] 2.6.25-rc6-unionfs2 #69
    [31270.749831] -------------------------------------------------------
    [31270.749958] flush-caches.sh/9702 is trying to acquire lock:
    [31270.750171] (&journal->j_list_lock){--..}, at: [] journal_try_to_free_buffers+0xa5/0x158
    [31270.751607]
    [31270.751607] but task is already holding lock:
    [31270.751607] (inode_lock){--..}, at: [] drop_pagecache+0x45/0xd3
    [31270.751607]
    [31270.751607] which lock already depends on the new lock.
    [31270.751607]
    [31270.751607]
    [31270.751607] the existing dependency chain (in reverse order) is:
    [31270.751607]
    [31270.751607] -> #1 (inode_lock){--..}:
    [31270.751607] [] __lock_acquire+0x934/0xab8
    [31270.751607] [] lock_acquire+0x4e/0x6a
    [31270.751607] [] _spin_lock+0x23/0x32
    [31270.751607] [] __mark_inode_dirty+0xbe/0x134
    [31270.751607] [] __set_page_dirty+0xdd/0xec
    [31270.751607] [] mark_buffer_dirty+0x60/0x63
    [31270.751607] [] __journal_temp_unlink_buffer+0xc9/0xce
    [31270.751607] [] __journal_unfile_buffer+0xb/0x15
    [31270.751607] [] __journal_refile_buffer+0x42/0x8c
    [31270.751607] [] journal_commit_transaction+0xc59/0xe64
    [31270.751607] [] kjournald+0xae/0x1b0
    [31270.751607] [] kthread+0x3b/0x64
    [31270.751607] [] kernel_thread_helper+0x7/0x10
    [31270.751607] [] 0xffffffff
    [31270.751607]
    [31270.751607] -> #0 (&journal->j_list_lock){--..}:
    [31270.751607] [] __lock_acquire+0x85b/0xab8
    [31270.751607] [] lock_acquire+0x4e/0x6a
    [31270.751607] [] _spin_lock+0x23/0x32
    [31270.751607] [] journal_try_to_free_buffers+0xa5/0x158
    [31270.751607] [] ext3_releasepage+0x53/0x5c
    [31270.751607] [] try_to_release_page+0x32/0x3f
    [31270.751607] [] __invalidate_mapping_pages+0x69/0xc6
    [31270.751607] [] drop_pagecache+0x68/0xd3
    [31270.751607] [] drop_caches_sysctl_handler+0x29/0x3f
    [31270.751607] [] proc_sys_write+0x61/0x7e
    [31270.751607] [] vfs_write+0x8c/0x108
    [31270.751607] [] sys_write+0x3b/0x60
    [31270.751607] [] sysenter_past_esp+0x5f/0xa5
    [31270.751607] [] 0xffffffff
    [31270.751607]
    [31270.751607] other info that might help us debug this:
    [31270.751607]
    [31270.751607] 2 locks held by flush-caches.sh/9702:
    [31270.751607] #0: (&type->s_umount_key#12){----}, at: [] drop_pagecache+0x35/0xd3
    [31270.751607] #1: (inode_lock){--..}, at: [] drop_pagecache+0x45/0xd3
    [31270.751608]
    [31270.751608] stack backtrace:
    [31270.751608] Pid: 9702, comm: flush-caches.sh Not tainted 2.6.25-rc6-unionfs2 #69
    [31270.751608] [] print_circular_bug_tail+0x5b/0x66
    [31270.751608] [] __lock_acquire+0x85b/0xab8
    [31270.751608] [] lock_acquire+0x4e/0x6a
    [31270.751608] [] ? journal_try_to_free_buffers+0xa5/0x158
    [31270.751608] [] _spin_lock+0x23/0x32
    [31270.751608] [] ? journal_try_to_free_buffers+0xa5/0x158
    [31270.751608] [] journal_try_to_free_buffers+0xa5/0x158
    [31270.751608] [] ext3_releasepage+0x53/0x5c
    [31270.751608] [] try_to_release_page+0x32/0x3f
    [31270.751608] [] __invalidate_mapping_pages+0x69/0xc6
    [31270.751608] [] drop_pagecache+0x68/0xd3
    [31270.751608] [] drop_caches_sysctl_handler+0x29/0x3f
    [31270.751608] [] proc_sys_write+0x61/0x7e
    [31270.751608] [] ? proc_sys_write+0x0/0x7e
    [31270.751608] [] vfs_write+0x8c/0x108
    [31270.751608] [] sys_write+0x3b/0x60
    [31270.751608] [] sysenter_past_esp+0x5f/0xa5
    [31270.751608] =======================
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: ext3 lockdep warning in 2.6.25-rc6

    On Sat, 2008-03-22 at 10:37 -0400, Erez Zadok wrote:
    > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    > got the lockdep warning below. Note that unionfs doesn't appear to be
    > involved in this lockdep warning at all, so I suspect this is probably an
    > issue between jbd and ext3 directly.


    Known issue, drop_caches isn't production quality.


  3. Re: ext3 lockdep warning in 2.6.25-rc6

    In message <1206207527.6437.83.camel@lappy>, Peter Zijlstra writes:
    > On Sat, 2008-03-22 at 10:37 -0400, Erez Zadok wrote:
    > > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    > > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    > > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    > > got the lockdep warning below. Note that unionfs doesn't appear to be
    > > involved in this lockdep warning at all, so I suspect this is probably an
    > > issue between jbd and ext3 directly.

    >
    > Known issue, drop_caches isn't production quality.


    Is this a general problem with drop_caches, or its interaction with
    ext3/jbd? If I get a similar lockdep warning with other file systems,
    should I bother to report this to the respective f/s maintainers?

    Thanks,
    Erez.

  4. Re: ext3 lockdep warning in 2.6.25-rc6

    On Sat, 2008-03-22 at 18:34 -0400, Erez Zadok wrote:
    > In message <1206207527.6437.83.camel@lappy>, Peter Zijlstra writes:
    > > On Sat, 2008-03-22 at 10:37 -0400, Erez Zadok wrote:
    > > > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    > > > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    > > > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    > > > got the lockdep warning below. Note that unionfs doesn't appear to be
    > > > involved in this lockdep warning at all, so I suspect this is probably an
    > > > issue between jbd and ext3 directly.

    > >
    > > Known issue, drop_caches isn't production quality.

    >
    > Is this a general problem with drop_caches, or its interaction with
    > ext3/jbd? If I get a similar lockdep warning with other file systems,
    > should I bother to report this to the respective f/s maintainers?


    It's a generic problem with drop_caches. If you look in the lkml archives
    you'll find similar reports against XFS and others.

    Some time ago Andrew outlined a fix for it, but so far that hasn't
    happened: http://lkml.org/lkml/2007/2/23/46




  5. Re: ext3 lockdep warning in 2.6.25-rc6

    > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    > got the lockdep warning below. Note that unionfs doesn't appear to be
    > involved in this lockdep warning at all, so I suspect this is probably an
    > issue between jbd and ext3 directly.
    >
    > Let me know if I can be of more help.

    Actually, I have a fix for that - attached. Can you try it? Thanks.
    Honza

    > [...]

    --
    Jan Kara
    SuSE CR Labs


  6. Re: ext3 lockdep warning in 2.6.25-rc6

    In message <20080325182909.GD21732@atrey.karlin.mff.cuni.cz>, Jan Kara writes:
    > > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    > > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    > > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    > > got the lockdep warning below. Note that unionfs doesn't appear to be
    > > involved in this lockdep warning at all, so I suspect this is probably an
    > > issue between jbd and ext3 directly.
    > >
    > > Let me know if I can be of more help.

    > Actually, I have a fix for that - attached. Can you try it? Thanks.
    > Honza
    >
    > > [...]

    > --
    > Jan Kara
    > SuSE CR Labs
    >
    > [attachment: vfs-2.6.25-drop_pagecache_sb_fix.diff]
    >
    > From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
    > From: Jan Kara
    > Date: Tue, 18 Mar 2008 14:38:06 +0100
    > Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
    > inode_lock.
    >
    > Signed-off-by: Jan Kara
    > ---
    > fs/drop_caches.c | 8 +++++++-
    > 1 files changed, 7 insertions(+), 1 deletions(-)
    >
    > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
    > index 59375ef..f5aae26 100644
    > --- a/fs/drop_caches.c
    > +++ b/fs/drop_caches.c
    > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
    >
    >  static void drop_pagecache_sb(struct super_block *sb)
    >  {
    > -	struct inode *inode;
    > +	struct inode *inode, *toput_inode = NULL;
    >
    >  	spin_lock(&inode_lock);
    >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
    >  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
    >  			continue;
    > +		__iget(inode);
    > +		spin_unlock(&inode_lock);
    >  		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
    > +		iput(toput_inode);
    > +		toput_inode = inode;
    > +		spin_lock(&inode_lock);
    >  	}
    >  	spin_unlock(&inode_lock);
    > +	iput(toput_inode);
    >  }


    Jan, I'll be happy to test this, but I don't understand two things about
    this patch:

    1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?

    2. What's the motivation behind having the second toput_inode pointer? It
    appears that on the first iteration through the loop, toput_inode will be
    NULL, so we'll be iput'ing a NULL pointer (which is OK). So you're
    trying to iput the previous inode pointer that the list iterated over,
    right? Is that intended?

    Peter's post:

    http://lkml.org/lkml/2008/3/23/202

    includes a reference to a mail by Andrew which implies that the fix may be
    much more involved than what you outlined above, no?

    Thanks,
    Erez.

  7. Re: ext3 lockdep warning in 2.6.25-rc6

    On Tue 01-04-08 17:23:34, Erez Zadok wrote:

    > > [...]

    >
    > Jan, I'll be happy to test this, but I don't understand two things about
    > this patch:
    >
    > 1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?
    >
    > 2. What's the motivation behind having the second toput_inode pointer? It
    > appears that on the first iteration through the loop, toput_inode will be
    > NULL, so we'll be iput'ing a NULL pointer (which is OK). So you're
    > trying to iput the previous inode pointer that the list iterated over,
    > right? Is that intended?

    I'll try to explain the locking here:
    1) We are not allowed to call into __invalidate_mapping_pages() with
    inode_lock held - that is the bug lockdep is complaining about. Moreover, it
    leads to rather long waiting times for inode_lock (quite possibly several
    seconds).
    2) When we release inode_lock, we need to protect against the inode going
    away, so we hold a reference to it - that guarantees the inode stays in the
    list.
    3) When we continue scanning the list, we must take inode_lock before we
    put the inode reference, to avoid races. But we cannot do iput() while
    holding inode_lock, so we save a pointer to the inode and do iput() the
    next time inode_lock has been released...

    > Peter's post:
    >
    > http://lkml.org/lkml/2008/3/23/202
    >
    > includes a reference to a mail by Andrew which implies that the fix may be
    > much more involved than what you outlined above, no?

    We can definitely do a more involved fix; what Andrew proposes would have
    some other benefits as well. But until somebody gets to that, this slight
    hack should work fine (Andrew has already merged my patch into -mm, so I
    guess he agrees).

    Honza
    --
    Jan Kara
    SUSE Labs, CR

  8. Re: ext3 lockdep warning in 2.6.25-rc6

    In message <20080402081215.GA2716@duck.suse.cz>, Jan Kara writes:
    > On Tue 01-04-08 17:23:34, Erez Zadok wrote:
    >

    [...]
    > I'll try to explain the locking here:
    > 1) We are not allowed to call into __invalidate_mapping_pages() with
    > inode_lock held - that is the bug lockdep is complaining about. Moreover, it
    > leads to rather long waiting times for inode_lock (quite possibly several
    > seconds).
    > 2) When we release inode_lock, we need to protect against the inode going
    > away, so we hold a reference to it - that guarantees the inode stays in the
    > list.
    > 3) When we continue scanning the list, we must take inode_lock before we
    > put the inode reference, to avoid races. But we cannot do iput() while
    > holding inode_lock, so we save a pointer to the inode and do iput() the
    > next time inode_lock has been released...
    >
    > > Peter's post:
    > >
    > > http://lkml.org/lkml/2008/3/23/202
    > >
    > > includes a reference to a mail by Andrew which implies that the fix may be
    > > much more involved than what you outlined above, no?

    > We can definitely do a more involved fix; what Andrew proposes would have
    > some other benefits as well. But until somebody gets to that, this slight
    > hack should work fine (Andrew has already merged my patch into -mm, so I
    > guess he agrees).


    Thanks for the explanation. And good to know it's in -mm. I'll give it a
    try.

    Cheers,
    Erez.

  9. Re: ext3 lockdep warning in 2.6.25-rc6

    On Wed, Apr 02, 2008 at 10:12:15AM +0200, Jan Kara wrote:
    > On Tue 01-04-08 17:23:34, Erez Zadok wrote:
    >


    > > Jan, I'll be happy to test this, but I don't understand two things about
    > > this patch:
    > >
    > > 1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?
    > >
    > > 2. What's the motivation behind having the second toput_inode pointer? It
    > > appears that on the first iteration through the loop, toput_inode will be
    > > NULL, so we'll be iput'ing a NULL pointer (which is OK). So you're
    > > trying to iput the previous inode pointer that the list iterated over,
    > > right? Is that intended?

    > I'll try to explain the locking here:
    > 1) We are not allowed to call into __invalidate_mapping_pages() with
    > inode_lock held - that is the bug lockdep is complaining about. Moreover, it
    > leads to rather long waiting times for inode_lock (quite possibly several
    > seconds).


    When you have tens of millions of cached inodes, it can take somewhat
    longer than a few seconds.... :/

    Cheers,

    Dave.
    --
    Dave Chinner
    Principal Engineer
    SGI Australian Software Group

  10. Re: ext3 lockdep warning in 2.6.25-rc6

    In message <20080325182909.GD21732@atrey.karlin.mff.cuni.cz>, Jan Kara writes:

    > > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    > > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    > > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    > > got the lockdep warning below. Note that unionfs doesn't appear to be
    > > involved in this lockdep warning at all, so I suspect this is probably an
    > > issue between jbd and ext3 directly.
    > >
    > > Let me know if I can be of more help.

    > Actually, I have a fix for that - attached. Can you try it? Thanks.
    > Honza

    [...]

    > [...]

    Jan, I tried the above patch on top of v2.6.25-rc8-82-g49115b7, and using
    the same setup and workloads that produced the warning before: running "make
    -j 20" of the linux kernel 100 times, dual-CPU, SMP, PREEMPT, while running
    flush_cache every few seconds. The entire run took over 12 hours. I'm
    happy to report that so far I've not gotten the same lockdep warning as
    before.

    Maybe this can now go to -mm for more testing?

    Cheers,
    Erez.

  11. Re: ext3 lockdep warning in 2.6.25-rc6

    On Fri 04-04-08 11:37:58, Erez Zadok wrote:
    > In message <20080325182909.GD21732@atrey.karlin.mff.cuni.cz>, Jan Kara writes:
    >
    > > > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
    > > > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
    > > > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
    > > > got the lockdep warning below. Note that unionfs doesn't appear to be
    > > > involved in this lockdep warning at all, so I suspect this is probably an
    > > > issue between jbd and ext3 directly.
    > > >
    > > > Let me know if I can be of more help.

    > > Actually, I have a fix for that - attached. Can you try it? Thanks.
    > > Honza

    > [...]

    >
    > Jan, I tried the above patch on top of v2.6.25-rc8-82-g49115b7, and using
    > the same setup and workloads that produced the warning before: running "make
    > -j 20" of the linux kernel 100 times, dual-CPU, SMP, PREEMPT, while running
    > flush_cache every few seconds. The entire run took over 12 hours. I'm
    > happy to report that so far I've not gotten the same lockdep warning as
    > before.

    Great news, thanks.

    > Maybe this can now go to -mm for more testing?

    Andrew has already merged it.

    Honza
    --
    Jan Kara
    SUSE Labs, CR
