latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink) - Kernel

This is a discussion on latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink) - Kernel ; Hi, I got this on latest -git with an intentionally corrupted filesystem image: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink) ------------[ cut here ]------------ kernel BUG at fs/jfs/namei.c:512! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Pid: 4408, comm: rm Not tainted (2.6.26-03414-g33af79d #43) ...

+ Reply to Thread
Results 1 to 12 of 12

Thread: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

  1. latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    Hi,

    I got this on latest -git with an intentionally corrupted filesystem image:

    BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)
    ------------[ cut here ]------------
    kernel BUG at fs/jfs/namei.c:512!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 4408, comm: rm Not tainted (2.6.26-03414-g33af79d #43)
    EIP: 0060:[] EFLAGS: 00210286 CPU: 1
    EIP is at jfs_unlink+0x418/0x430
    EAX: 00000031 EBX: 00000000 ECX: f2d16000 EDX: 00000000
    ESI: 0000004c EDI: f43b12dc EBP: f2d17f20 ESP: f2d17ed0
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process rm (pid: 4408, ti=f2d16000 task=f4b73fc0 task.ti=f2d16000)
    Stack: c086269e c08626ce 00000200 c0862725 00000000 00000002 f43ba83c f43b10c8
    f43ba628 f43b108c f43b1008 f43ba568 f43ba83c f43b12dc 0000000a f6d33500
    0000005b 00000000 f43ba83c f4281c30 f2d17f3c c01ad04b c1fb0480 f2d17f3c
    Call Trace:
    [] ? vfs_unlink+0xab/0x100
    [] ? do_unlinkat+0x15b/0x170
    [] ? vfs_readdir+0x6b/0xa0
    [] ? filldir64+0x0/0xf0
    [] ? trace_hardirqs_on_thunk+0xc/0x10
    [] ? trace_hardirqs_on_caller+0x116/0x170
    [] ? sys_unlinkat+0x23/0x50
    [] ? sysenter_past_esp+0x78/0xc5
    =======================
    Code: 5f 48 00 0f 0b eb fe c7 44 24 0c 25 27 86 c0 c7 44 24 08 00 02
    00 00 c7 44 24 04 ce 26 86 c0 c7 04 24 9e 26 86 c0 e8 dc 5e 48 00 <0f>
    0b eb fe ba 01 00 00 00 89 f0 e8 08 d5 01 00 e9 f2 fc ff ff
    EIP: [] jfs_unlink+0x418/0x430 SS:ESP 0068:f2d17ed0
    Kernel panic - not syncing: Fatal exception

    Full log at http://folk.uio.no/vegardno/linux/log-1216318656.txt, but
    I think the preceding messages are just left-overs from previous
    mount/remount attempts. I can test patches.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Thu, 2008-07-17 at 20:35 +0200, Vegard Nossum wrote:
    > Hi,
    >
    > I got this on latest -git with an intentionally corrupted filesystem image:
    >
    > BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)


    This assert shouldn't be here. It would be better to handle this with
    jfs_error(), which will mark the superblock dirty, and take appropriate
    action.

    > Full log at http://folk.uio.no/vegardno/linux/log-1216318656.txt, but
    > I think the preceding messages are just left-overs from previous
    > mount/remount attempts. I can test patches.


    How about this one? So far, compile tested only.

    JFS: The kernel shouldn't trap when deleting a file with nlink == 0.

    Signed-off-by: Dave Kleikamp

    diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
    index 2aba823..10340be 100644
    --- a/fs/jfs/namei.c
    +++ b/fs/jfs/namei.c
    @@ -508,33 +508,35 @@ static int jfs_unlink(struct inode *dip, struct dentry *dentry)
    IWRITE_UNLOCK(ip);
    goto out1;
    }
    -
    - ASSERT(ip->i_nlink);
    -
    ip->i_ctime = dip->i_ctime = dip->i_mtime = CURRENT_TIME;
    mark_inode_dirty(dip);

    - /* update target's inode */
    - inode_dec_link_count(ip);
    + if (ip->i_nlink == 0)
    + jfs_error(dip->i_sb, "jfs_unlink: i_nlink already zero");
    + else {
    + /* update target's inode */
    + inode_dec_link_count(ip);

    - /*
    - * commit zero link count object
    - */
    - if (ip->i_nlink == 0) {
    - assert(!test_cflag(COMMIT_Nolink, ip));
    - /* free block resources */
    - if ((new_size = commitZeroLink(tid, ip)) < 0) {
    - txAbort(tid, 1); /* Marks FS Dirty */
    - txEnd(tid);
    - mutex_unlock(&JFS_IP(ip)->commit_mutex);
    - mutex_unlock(&JFS_IP(dip)->commit_mutex);
    - IWRITE_UNLOCK(ip);
    - rc = new_size;
    - goto out1;
    + /*
    + * commit zero link count object
    + */
    + if (ip->i_nlink == 0) {
    + assert(!test_cflag(COMMIT_Nolink, ip));
    + /* free block resources */
    + new_size = commitZeroLink(tid, ip);
    + if (new_size < 0) {
    + txAbort(tid, 1); /* Marks FS Dirty */
    + txEnd(tid);
    + mutex_unlock(&JFS_IP(ip)->commit_mutex);
    + mutex_unlock(&JFS_IP(dip)->commit_mutex);
    + IWRITE_UNLOCK(ip);
    + rc = new_size;
    + goto out1;
    + }
    + tblk = tid_to_tblock(tid);
    + tblk->xflag |= COMMIT_DELETE;
    + tblk->u.ip = ip;
    }
    - tblk = tid_to_tblock(tid);
    - tblk->xflag |= COMMIT_DELETE;
    - tblk->u.ip = ip;
    }

    /*

    --
    David Kleikamp
    IBM Linux Technology Center

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, Jul 18, 2008 at 1:09 AM, Dave Kleikamp
    wrote:
    > On Thu, 2008-07-17 at 20:35 +0200, Vegard Nossum wrote:
    >> Hi,
    >>
    >> I got this on latest -git with an intentionally corrupted filesystem image:
    >>
    >> BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    >
    > This assert shouldn't be here. It would be better to handle this with
    > jfs_error(), which will mark the superblock dirty, and take appropriate
    > action.
    >
    >> Full log at http://folk.uio.no/vegardno/linux/log-1216318656.txt, but
    >> I think the preceding messages are just left-overs from previous
    >> mount/remount attempts. I can test patches.

    >
    > How about this one? So far, compile tested only.
    >
    > JFS: The kernel shouldn't trap when deleting a file with nlink == 0.
    >
    > Signed-off-by: Dave Kleikamp


    Thanks!

    But whether it truly helped or not, I don't know. I got something different now:

    jfs_unlink: dtDelete returned -116
    jfs_unlink: dtDelete returned -116
    ------------[ cut here ]------------
    kernel BUG at fs/inode.c:262!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 387, comm: jfsCommit Not tainted (2.6.26-03415-gdf3030b #45)
    EIP: 0060:[] EFLAGS: 00010206 CPU: 0
    EIP is at clear_inode+0x149/0x160
    EAX: 00000005 EBX: f56e283c ECX: f7242fd0 EDX: 00000000
    ESI: f56e2568 EDI: 00000000 EBP: f79f9f0c ESP: f79f9f04
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process jfsCommit (pid: 387, ti=f79f8000 task=f7242fd0 task.ti=f79f8000)
    Stack: 00000001 f56e283c f79f9f24 c02de258 c08f7ba0 f79f9f24 c02de220 f56e283c
    f79f9f34 c01b9ba1 f56e283c 00000000 f79f9f44 c01b9d67 f56e283c 00000000
    f79f9f50 c01b8be7 f56e283c f79f9f9c c02fd5b5 f88261bc f79f9f70 c0159173
    Call Trace:
    [] ? jfs_delete_inode+0x38/0x130
    [] ? jfs_delete_inode+0x0/0x130
    [] ? generic_delete_inode+0x81/0x120
    [] ? generic_drop_inode+0x127/0x180
    [] ? iput+0x47/0x50
    [] ? txUpdateMap+0x1d5/0x2a0
    [] ? lock_release_holdtime+0x83/0x120
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? jfs_lazycommit+0xd1/0x2e0
    [] ? default_wake_function+0x0/0x10
    [] ? jfs_lazycommit+0x0/0x2e0
    [] ? kthread+0x3c/0x70
    [] ? kthread+0x0/0x70
    [] ? kernel_thread_helper+0x7/0x1c
    =======================
    Code: 00 f0 00 00 e9 65 ff ff ff 90 8d 83 44 02 00 00 b9 90 89 1b c0
    ba 08 00 00 00 c7 04 24 02 00 00 00 e8 cc ed 58 00 e9 16 ff ff ff <0f>
    0b eb fe 8d 76 00 0f 0b eb fe 8d 74 26 00 0f 0b eb fe 8d 74
    EIP: [] clear_inode+0x149/0x160 SS:ESP 0068:f79f9f04
    Kernel panic - not syncing: Fatal exception

    ....but it might just be a different error altogether.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, Jul 18, 2008 at 10:10 AM, Vegard Nossum wrote:
    > On Fri, Jul 18, 2008 at 1:09 AM, Dave Kleikamp
    > wrote:
    >> On Thu, 2008-07-17 at 20:35 +0200, Vegard Nossum wrote:
    >>> Hi,
    >>>
    >>> I got this on latest -git with an intentionally corrupted filesystem image:
    >>>
    >>> BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    >>
    >> This assert shouldn't be here. It would be better to handle this with
    >> jfs_error(), which will mark the superblock dirty, and take appropriate
    >> action.
    >>
    >>> Full log at http://folk.uio.no/vegardno/linux/log-1216318656.txt, but
    >>> I think the preceding messages are just left-overs from previous
    >>> mount/remount attempts. I can test patches.

    >>
    >> How about this one? So far, compile tested only.
    >>
    >> JFS: The kernel shouldn't trap when deleting a file with nlink == 0.
    >>
    >> Signed-off-by: Dave Kleikamp

    >
    > Thanks!
    >
    > But whether it truly helped or not, I don't know. I got something different now:
    >
    > jfs_unlink: dtDelete returned -116
    > jfs_unlink: dtDelete returned -116
    > ------------[ cut here ]------------
    > kernel BUG at fs/inode.c:262!


    Running the test for some more on a freshly booted system also gives
    me this after a while (I don't know if it's related, but it might give
    some additional clues?):

    BUG: unable to handle kernel paging request at c08845af
    IP: [] release_metapage+0x32/0x1c0
    *pde = 37ab2163 *pte = 00884161
    Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 387, comm: jfsCommit Not tainted (2.6.26-03415-gdf3030b #45)
    EIP: 0060:[] EFLAGS: 00010202 CPU: 1
    EIP is at release_metapage+0x32/0x1c0
    EAX: 00000246 EBX: f470e5d8 ECX: f7a2afd0 EDX: 00000000
    ESI: c08845af EDI: 00000000 EBP: f735fd98 ESP: f735fd7c
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process jfsCommit (pid: 387, ti=f735e000 task=f7a2afd0 task.ti=f735e000)
    Stack: c015b6e9 00000001 00000286 00000246 00000002 00000000 00000000 f735fed0
    c02e4202 00000020 f7a2afd0 c015addb 00000000 f470e9a4 f735fe34 00000000
    00000000 f470e85c c015addb 00000000 00000000 f470e5d8 f470e6d8 00000002
    Call Trace:
    [] ? __lock_acquire+0x2c9/0x1110
    [] ? xtTruncate+0xd42/0x1060
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? _read_unlock_irq+0x36/0x60
    [] ? find_get_pages+0x68/0x80
    [] ? pagevec_lookup+0x22/0x30
    [] ? truncate_inode_pages_range+0x189/0x360
    [] ? lock_release_holdtime+0x83/0x120
    [] ? jfs_free_zero_link+0x98/0x1b0
    [] ? put_lock_stats+0xd/0x30
    [] ? jfs_delete_inode+0xdd/0x130
    [] ? jfs_delete_inode+0x0/0x130
    [] ? generic_delete_inode+0x81/0x120
    [] ? generic_drop_inode+0x127/0x180
    [] ? iput+0x47/0x50
    [] ? txUpdateMap+0x1d5/0x2a0
    [] ? lock_release_holdtime+0x83/0x120
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? jfs_lazycommit+0xd1/0x2e0
    [] ? default_wake_function+0x0/0x10
    [] ? jfs_lazycommit+0x0/0x2e0
    [] ? kthread+0x3c/0x70
    [] ? kthread+0x0/0x70
    [] ? kernel_thread_helper+0x7/0x1c
    =======================
    Code: c3 83 ec 10 8b 70 50 83 3d 40 d8 8f c0 03 0f 8f 85 01 00 00 85
    f6 0f 84 75 01 00 00 ba a9 00 00 00 b8 91 05 84 c0 e8 ee 52 e3 ff
    0f ba 2e 00 19 c0 85 c0 75 7b 8d 7b 14 f0 80 63 14 fe b9 01
    EIP: [] release_metapage+0x32/0x1c0 SS:ESP 0068:f735fd7c
    Kernel panic - not syncing: Fatal exception
    ------------[ cut here ]------------
    WARNING: at kernel/smp.c:288 smp_call_function_mask+0x154/0x160()
    Pid: 387, comm: jfsCommit Tainted: G D 2.6.26-03415-gdf3030b #45
    [] warn_on_slowpath+0x4f/0xa0
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? __rcu_read_unlock+0xb2/0xc0
    [] ? __atomic_notifier_call_chain+0x3c/0x50
    [] ? _spin_unlock+0x27/0x50
    [] ? vt_console_print+0x230/0x320
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? _spin_unlock_irqrestore+0x43/0x80
    [] ? release_console_sem+0x1e9/0x200
    [] smp_call_function_mask+0x154/0x160
    [] ? stop_this_cpu+0x0/0x50
    [] ? show_registers+0x7f/0x240
    [] ? stop_this_cpu+0x0/0x50
    [] smp_call_function+0x30/0x60
    [] native_smp_send_stop+0x1e/0x80
    [] ? die+0x110/0x180
    [] ? do_page_fault+0x14d/0x880
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? mark_held_locks+0x40/0x80
    [] ? trace_hardirqs_on_caller+0x116/0x170
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? native_sched_clock+0xb2/0xe0
    [] ? get_lock_stats+0x1e/0x50
    [] ? put_lock_stats+0xd/0x30
    [] ? lock_release_holdtime+0x83/0x120
    [] ? up_read+0x16/0x30
    [] ? dbFree+0x194/0x270
    [] ? do_page_fault+0x0/0x880
    [] ? error_code+0x72/0x78
    [] ? dbSync+0x68/0x140
    [] ? release_metapage+0x32/0x1c0
    [] ? __lock_acquire+0x2c9/0x1110
    [] ? xtTruncate+0xd42/0x1060
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? _read_unlock_irq+0x36/0x60
    [] ? find_get_pages+0x68/0x80
    [] ? pagevec_lookup+0x22/0x30
    [] ? truncate_inode_pages_range+0x189/0x360
    [] ? lock_release_holdtime+0x83/0x120
    [] ? jfs_free_zero_link+0x98/0x1b0
    [] ? put_lock_stats+0xd/0x30
    [] ? jfs_delete_inode+0xdd/0x130
    [] ? jfs_delete_inode+0x0/0x130
    [] ? generic_delete_inode+0x81/0x120
    [] ? generic_drop_inode+0x127/0x180
    [] ? iput+0x47/0x50
    [] ? txUpdateMap+0x1d5/0x2a0
    [] ? lock_release_holdtime+0x83/0x120
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? jfs_lazycommit+0xd1/0x2e0
    [] ? default_wake_function+0x0/0x10
    [] ? jfs_lazycommit+0x0/0x2e0
    [] ? kthread+0x3c/0x70
    [] ? kthread+0x0/0x70
    [] ? kernel_thread_helper+0x7/0x1c
    =======================
    ---[ end trace 3a81a57639faabca ]---
    ------------[ cut here ]------------
    WARNING: at kernel/smp.c:214 smp_call_function_single+0x11f/0x130()
    Pid: 387, comm: jfsCommit Tainted: G D W 2.6.26-03415-gdf3030b #45
    [] warn_on_slowpath+0x4f/0xa0
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? __rcu_read_unlock+0xb2/0xc0
    [] ? __atomic_notifier_call_chain+0x3c/0x50
    [] ? _spin_unlock+0x27/0x50
    [] ? vt_console_print+0x230/0x320
    [] smp_call_function_single+0x11f/0x130
    [] ? stop_this_cpu+0x0/0x50
    [] ? _spin_unlock_irqrestore+0x43/0x80
    [] smp_call_function_mask+0x11e/0x160
    [] ? stop_this_cpu+0x0/0x50
    [] ? show_registers+0x7f/0x240
    [] ? stop_this_cpu+0x0/0x50
    [] smp_call_function+0x30/0x60
    [] native_smp_send_stop+0x1e/0x80
    [] ? die+0x110/0x180
    [] ? do_page_fault+0x14d/0x880
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? mark_held_locks+0x40/0x80
    [] ? trace_hardirqs_on_caller+0x116/0x170
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? native_sched_clock+0xb2/0xe0
    [] ? get_lock_stats+0x1e/0x50
    [] ? put_lock_stats+0xd/0x30
    [] ? lock_release_holdtime+0x83/0x120
    [] ? up_read+0x16/0x30
    [] ? dbFree+0x194/0x270
    [] ? do_page_fault+0x0/0x880
    [] ? error_code+0x72/0x78
    [] ? dbSync+0x68/0x140
    [] ? release_metapage+0x32/0x1c0
    [] ? __lock_acquire+0x2c9/0x1110
    [] ? xtTruncate+0xd42/0x1060
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? _read_unlock_irq+0x36/0x60
    [] ? find_get_pages+0x68/0x80
    [] ? pagevec_lookup+0x22/0x30
    [] ? truncate_inode_pages_range+0x189/0x360
    [] ? lock_release_holdtime+0x83/0x120
    [] ? jfs_free_zero_link+0x98/0x1b0
    [] ? put_lock_stats+0xd/0x30
    [] ? jfs_delete_inode+0xdd/0x130
    [] ? jfs_delete_inode+0x0/0x130
    [] ? generic_delete_inode+0x81/0x120
    [] ? generic_drop_inode+0x127/0x180
    [] ? iput+0x47/0x50
    [] ? txUpdateMap+0x1d5/0x2a0
    [] ? lock_release_holdtime+0x83/0x120
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? jfs_lazycommit+0xd1/0x2e0
    [] ? default_wake_function+0x0/0x10
    [] ? jfs_lazycommit+0x0/0x2e0
    [] ? kthread+0x3c/0x70
    [] ? kthread+0x0/0x70
    [] ? kernel_thread_helper+0x7/0x1c
    =======================
    ---[ end trace 3a81a57639faabca ]---

    The page fault had this address:

    $ addr2line -e vmlinux -i c02f9122
    include/asm/bitops.h:186
    include/linux/page-flags.h:151
    include/linux/pagemap.h:170
    fs/jfs/jfs_metapage.c:741

    The current tree is 33af79d12e0fa25545d49e86afc67ea8ad5f2f40 + your
    previous patch.

    Thanks,


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, Jul 18, 2008 at 10:29 AM, Vegard Nossum wrote:
    > BUG: unable to handle kernel paging request at c08845af
    > IP: [] release_metapage+0x32/0x1c0


    Hi,

    I am still unable to see the "already zero" message. I'll post this as
    well because it looks completely different:

    ------------[ cut here ]------------
    WARNING: at kernel/mutex.c:134 mutex_lock_nested+0x2d4/0x320()
    Pid: 5335, comm: cp Not tainted 2.6.26-03415-gdf3030b #45
    [] warn_on_slowpath+0x4f/0xa0
    [] ? __lock_acquire+0x2c9/0x1110
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? mark_held_locks+0x40/0x80
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_on_caller+0x116/0x170
    [] mutex_lock_nested+0x2d4/0x320
    [] ? diAlloc+0x28b/0x680
    [] diAlloc+0x28b/0x680
    [] ? _spin_unlock+0x27/0x50
    [] ialloc+0x49/0x320
    [] jfs_mkdir+0x77/0x3c0
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? jfs_permission+0x0/0x10
    [] ? jfs_mkdir+0x0/0x3c0
    [] vfs_mkdir+0xb7/0x130
    [] ? _spin_unlock+0x27/0x50
    [] sys_mkdirat+0xe8/0x100
    [] ? trace_hardirqs_on_thunk+0xc/0x10
    [] ? do_page_fault+0x0/0x880
    [] sys_mkdir+0x20/0x30
    [] sysenter_past_esp+0x78/0xc5
    =======================
    ---[ end trace 010115e08457f5d6 ]---
    BUG: unable to handle kernel paging request at fff7f7f7
    IP: [] __list_add+0x10/0x70
    *pde = 00007067 *pte = 00000000
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 5335, comm: cp Tainted: G W (2.6.26-03415-gdf3030b #45)
    EIP: 0060:[] EFLAGS: 00210046 CPU: 1
    EIP is at __list_add+0x10/0x70
    EAX: fff7f7f7 EBX: e5b77d88 ECX: e5f68a70 EDX: fff7f7f7
    ESI: e5f68a50 EDI: 00200246 EBP: e5b77d60 ESP: e5b77d4c
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process cp (pid: 5335, ti=e5b76000 task=f69acfb0 task.ti=e5b76000)
    Stack: c02eb38b e5f68a84 e5f68a4c e5f68a50 e5f68a4c e5b77da8 c0748baa 00000000
    00000002 c02eb38b c02eb38b 00000000 f69acfb0 e5f68a70 e5f68a84 e5b77d88
    e5b77d88 11111111 e5f68a4c e5b77d88 de0a9d8c de0a82d4 de0a9d8c e5b77e0c
    Call Trace:
    [] ? diAlloc+0x28b/0x680
    [] ? mutex_lock_nested+0xda/0x320
    [] ? diAlloc+0x28b/0x680
    [] ? diAlloc+0x28b/0x680
    [] ? diAlloc+0x28b/0x680
    [] ? _spin_unlock+0x27/0x50
    [] ? ialloc+0x49/0x320
    [] ? jfs_mkdir+0x77/0x3c0
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? trace_hardirqs_off+0xb/0x10
    [] ? jfs_permission+0x0/0x10
    [] ? jfs_mkdir+0x0/0x3c0
    [] ? vfs_mkdir+0xb7/0x130
    [] ? _spin_unlock+0x27/0x50
    [] ? sys_mkdirat+0xe8/0x100
    [] ? trace_hardirqs_on_thunk+0xc/0x10
    [] ? do_page_fault+0x0/0x880
    [] ? sys_mkdir+0x20/0x30
    [] ? sysenter_past_esp+0x78/0xc5
    =======================
    Code: 54 24 04 c7 04 24 80 59 88 c0 e8 20 bf 30 00 0f 0b eb fe 90 8d
    b4 26 00 00 00 00 55 89 e5 53 89 c3 83 ec 10 8b 41 04 39 d0 75 16 <8b>
    10 39 ca 75 32 89 5a 04 89 13 89 43 04 89 18 83 c4 10 5b 5d
    EIP: [] __list_add+0x10/0x70 SS:ESP 0068:e5b77d4c
    Kernel panic - not syncing: Fatal exception

    (Followed by similar smp_call_function warnings as before.)

    But I think I'll stop testing now until we have another patch.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, Jul 18, 2008 at 7:15 PM, Dave Kleikamp
    wrote:
    > On Fri, 2008-07-18 at 10:10 +0200, Vegard Nossum wrote:
    >
    >> But whether it truly helped or not, I don't know. I got something different now:
    >>
    >> jfs_unlink: dtDelete returned -116
    >> jfs_unlink: dtDelete returned -116
    >> ------------[ cut here ]------------
    >> kernel BUG at fs/inode.c:262!

    >
    > BUG_ON(inode->i_data.nrpages);
    >
    >> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    >> Pid: 387, comm: jfsCommit Not tainted (2.6.26-03415-gdf3030b #45)
    >> EIP: 0060:[] EFLAGS: 00010206 CPU: 0
    >> EIP is at clear_inode+0x149/0x160
    >> EAX: 00000005 EBX: f56e283c ECX: f7242fd0 EDX: 00000000
    >> ESI: f56e2568 EDI: 00000000 EBP: f79f9f0c ESP: f79f9f04
    >> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    >> Process jfsCommit (pid: 387, ti=f79f8000 task=f7242fd0 task.ti=f79f8000)
    >> Stack: 00000001 f56e283c f79f9f24 c02de258 c08f7ba0 f79f9f24 c02de220 f56e283c
    >> f79f9f34 c01b9ba1 f56e283c 00000000 f79f9f44 c01b9d67 f56e283c 00000000
    >> f79f9f50 c01b8be7 f56e283c f79f9f9c c02fd5b5 f88261bc f79f9f70 c0159173

    >
    >>
    >> ...but it might just be a different error altogether.

    >
    > I think it is. Your test corrupt more than just di_nlink, right? There
    > is a test in jfs_delete_inode() for JFS_IP(inode)->fileset ==
    > FILESYSTEM_I. If the test fails, we don't call truncate_inode_pages().
    > Nowhere do we validate that fileset = FILESYSTEM_I when we read the
    > inode from disk.
    >
    > I'm pondering whether it would be okay to remove the silly test in
    > jfs_delete_inode(), or whether I need to validate the field when we read
    > in the inode.
    >
    > Hmm. The fileset is written to the journal, so it really should be
    > validated.
    >
    > Here's another patch
    >
    > JFS: validate di_fileset when reading inode from disk.


    Thanks. On top of the previous one, I presume?


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, 2008-07-18 at 10:10 +0200, Vegard Nossum wrote:

    > But whether it truly helped or not, I don't know. I got something different now:
    >
    > jfs_unlink: dtDelete returned -116
    > jfs_unlink: dtDelete returned -116
    > ------------[ cut here ]------------
    > kernel BUG at fs/inode.c:262!


    BUG_ON(inode->i_data.nrpages);

    > invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    > Pid: 387, comm: jfsCommit Not tainted (2.6.26-03415-gdf3030b #45)
    > EIP: 0060:[] EFLAGS: 00010206 CPU: 0
    > EIP is at clear_inode+0x149/0x160
    > EAX: 00000005 EBX: f56e283c ECX: f7242fd0 EDX: 00000000
    > ESI: f56e2568 EDI: 00000000 EBP: f79f9f0c ESP: f79f9f04
    > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    > Process jfsCommit (pid: 387, ti=f79f8000 task=f7242fd0 task.ti=f79f8000)
    > Stack: 00000001 f56e283c f79f9f24 c02de258 c08f7ba0 f79f9f24 c02de220 f56e283c
    > f79f9f34 c01b9ba1 f56e283c 00000000 f79f9f44 c01b9d67 f56e283c 00000000
    > f79f9f50 c01b8be7 f56e283c f79f9f9c c02fd5b5 f88261bc f79f9f70 c0159173


    >
    > ...but it might just be a different error altogether.


    I think it is. Your test corrupt more than just di_nlink, right? There
    is a test in jfs_delete_inode() for JFS_IP(inode)->fileset ==
    FILESYSTEM_I. If the test fails, we don't call truncate_inode_pages().
    Nowhere do we validate that fileset = FILESYSTEM_I when we read the
    inode from disk.

    I'm pondering whether it would be okay to remove the silly test in
    jfs_delete_inode(), or whether I need to validate the field when we read
    in the inode.

    Hmm. The fileset is written to the journal, so it really should be
    validated.

    Here's another patch

    JFS: validate di_fileset when reading inode from disk.

    Signed-off-by: Dave Kleikamp

    diff --git a/fs/jfs/jfs_imap.c b/fs/jfs/jfs_imap.c
    index d6363d8..19df0dd 100644
    --- a/fs/jfs/jfs_imap.c
    +++ b/fs/jfs/jfs_imap.c
    @@ -391,7 +391,10 @@ int diRead(struct inode *ip)
    dp += rel_inode;

    if (ip->i_ino != le32_to_cpu(dp->di_number)) {
    - jfs_error(ip->i_sb, "diRead: i_ino != di_number");
    + jfs_error(ip->i_sb, "JFS diRead: i_ino != di_number");
    + rc = -EIO;
    + } else if (dp->di_fileset != cpu_to_le32(FILESYSTEM_I)) {
    + jfs_error(ip->i_sb, "JFS diRead: inode is corrupt");
    rc = -EIO;
    } else if (le32_to_cpu(dp->di_nlink) == 0)
    rc = -ESTALE;

    --
    David Kleikamp
    IBM Linux Technology Center

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, 2008-07-18 at 10:29 +0200, Vegard Nossum wrote:
    > On Fri, Jul 18, 2008 at 10:10 AM, Vegard Nossum wrote:
    > > On Fri, Jul 18, 2008 at 1:09 AM, Dave Kleikamp
    > > wrote:
    > >> On Thu, 2008-07-17 at 20:35 +0200, Vegard Nossum wrote:
    > >>> Hi,
    > >>>
    > >>> I got this on latest -git with an intentionally corrupted filesystem image:
    > >>>
    > >>> BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)
    > >>
    > >> This assert shouldn't be here. It would be better to handle this with
    > >> jfs_error(), which will mark the superblock dirty, and take appropriate
    > >> action.
    > >>
    > >>> Full log at http://folk.uio.no/vegardno/linux/log-1216318656.txt, but
    > >>> I think the preceding messages are just left-overs from previous
    > >>> mount/remount attempts. I can test patches.
    > >>
    > >> How about this one? So far, compile tested only.
    > >>
    > >> JFS: The kernel shouldn't trap when deleting a file with nlink == 0.
    > >>
    > >> Signed-off-by: Dave Kleikamp

    > >
    > > Thanks!
    > >
    > > But whether it truly helped or not, I don't know. I got something different now:
    > >
    > > jfs_unlink: dtDelete returned -116
    > > jfs_unlink: dtDelete returned -116
    > > ------------[ cut here ]------------
    > > kernel BUG at fs/inode.c:262!

    >
    > Running the test for some more on a freshly booted system also gives
    > me this after a while (I don't know if it's related, but it might give
    > some additional clues?):
    >
    > BUG: unable to handle kernel paging request at c08845af
    > IP: [] release_metapage+0x32/0x1c0
    > *pde = 37ab2163 *pte = 00884161
    > Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    > Pid: 387, comm: jfsCommit Not tainted (2.6.26-03415-gdf3030b #45)
    > EIP: 0060:[] EFLAGS: 00010202 CPU: 1
    > EIP is at release_metapage+0x32/0x1c0
    > EAX: 00000246 EBX: f470e5d8 ECX: f7a2afd0 EDX: 00000000
    > ESI: c08845af EDI: 00000000 EBP: f735fd98 ESP: f735fd7c
    > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    > Process jfsCommit (pid: 387, ti=f735e000 task=f7a2afd0 task.ti=f735e000)
    > Stack: c015b6e9 00000001 00000286 00000246 00000002 00000000 00000000 f735fed0
    > c02e4202 00000020 f7a2afd0 c015addb 00000000 f470e9a4 f735fe34 00000000
    > 00000000 f470e85c c015addb 00000000 00000000 f470e5d8 f470e6d8 00000002
    > Call Trace:
    > [] ? __lock_acquire+0x2c9/0x1110
    > [] ? xtTruncate+0xd42/0x1060


    xtTruncate can call release_metapage() in several places. If you still
    have this kernel, could you please run addr2line against c02e4202? If
    you've rebuilt your kernel since then, I understand.

    Thanks,
    Shaggy
    --
    David Kleikamp
    IBM Linux Technology Center

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [Jfs-discussion] latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, 2008-07-18 at 19:17 +0200, Vegard Nossum wrote:
    > On Fri, Jul 18, 2008 at 7:15 PM, Dave Kleikamp
    > wrote:


    > > Here's another patch
    > >
    > > JFS: validate di_fileset when reading inode from disk.

    >
    > Thanks. On top of the previous one, I presume?


    Yes. I'm still looking at the others.

    --
    David Kleikamp
    IBM Linux Technology Center

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, Jul 18, 2008 at 7:29 PM, Dave Kleikamp
    wrote:
    >> BUG: unable to handle kernel paging request at c08845af
    >> IP: [] release_metapage+0x32/0x1c0
    >> *pde = 37ab2163 *pte = 00884161
    >> Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    >> Pid: 387, comm: jfsCommit Not tainted (2.6.26-03415-gdf3030b #45)
    >> EIP: 0060:[] EFLAGS: 00010202 CPU: 1
    >> EIP is at release_metapage+0x32/0x1c0
    >> EAX: 00000246 EBX: f470e5d8 ECX: f7a2afd0 EDX: 00000000
    >> ESI: c08845af EDI: 00000000 EBP: f735fd98 ESP: f735fd7c
    >> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    >> Process jfsCommit (pid: 387, ti=f735e000 task=f7a2afd0 task.ti=f735e000)
    >> Stack: c015b6e9 00000001 00000286 00000246 00000002 00000000 00000000 f735fed0
    >> c02e4202 00000020 f7a2afd0 c015addb 00000000 f470e9a4 f735fe34 00000000
    >> 00000000 f470e85c c015addb 00000000 00000000 f470e5d8 f470e6d8 00000002
    >> Call Trace:
    >> [] ? __lock_acquire+0x2c9/0x1110
    >> [] ? xtTruncate+0xd42/0x1060

    >
    > xtTruncate can call release_metapage() in several places. If you still
    > have this kernel, could you please run addr2line against c02e4202? If
    > you've rebuilt your kernel since then, I understand.


    Sorry, it's gone. However, I just recompiled it and the size of the
    function is now 0x105f, and the offset 0xd42 matches with a call to
    release_metapage. So with this new address, it's this call:

    $ addr2line -e vmlinux -i c02e428d
    fs/jfs/jfs_metapage.h:101
    fs/jfs/jfs_xtree.c:3723

    ....which is this one:

    /* invalidate empty leaf page */
    discard_metapage(mp);


    With your two patches applied, I now got this:

    ERROR: (device loop0): remounting filesystem as read-only
    blkno = 200000687, nblocks = f
    ERROR: (device loop0): dbFree: block to be freed is outside the map
    BUG: unable to handle kernel paging request at 338f8038
    IP: [] dbAllocBits+0x252/0x2b0
    *pde = 00000000
    Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 5034, comm: dmesg Not tainted (2.6.26-03421-g7f4c453 #48)
    EIP: 0060:[] EFLAGS: 00210246 CPU: 1
    EIP is at dbAllocBits+0x252/0x2b0
    EAX: 00000001 EBX: f38f8448 ECX: 08000000 EDX: f38f8000
    ESI: f38f8000 EDI: 00000000 EBP: f62c9b14 ESP: f62c9ad8
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process dmesg (pid: 5034, ti=f62c8000 task=f62e1fe0 task.ti=f62c8000)
    Stack: 00000004 c0179d5e 00000000 f62c9b38 f393f000 f38f8000 f393f800 00000022
    00000000 00000001 f393f010 f393f076 0000000a 00000000 f393f000 f62c9b38
    c02ee3e8 00000022 00000100 00000001 f38f8000 f393f000 00000001 00000005
    Call Trace:
    [] ? unlock_page+0x4e/0x60
    [] ? dbAllocDmap+0x38/0xa0
    [] ? dbAlloc+0x54a/0x5d0
    [] ? extAlloc+0x200/0x620
    [] ? jfs_get_block+0x41/0x2a0
    [] ? down_write_nested+0x77/0x90
    [] ? jfs_get_block+0x41/0x2a0
    [] ? jfs_get_block+0x23f/0x2a0
    [] ? alloc_buffer_head+0x11/0x70
    [] ? set_bh_page+0x3f/0x70
    [] ? alloc_page_buffers+0x70/0xd0
    [] ? nobh_write_begin+0x155/0x400
    [] ? __lock_acquire+0x2c9/0x1110
    [] ? jfs_write_begin+0x3d/0x50
    [] ? jfs_get_block+0x0/0x2a0
    [] ? generic_file_buffered_write+0x165/0x5f0
    [] ? _spin_unlock+0x27/0x50
    [] ? __generic_file_aio_write_nolock+0x236/0x530
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? generic_file_aio_write+0x63/0xd0
    [] ? do_sync_write+0xd1/0x110
    [] ? get_lock_stats+0x1e/0x50
    [] ? autoremove_wake_function+0x0/0x50
    [] ? _spin_unlock+0x27/0x50
    [] ? vfs_write+0x9c/0x140
    [] ? do_sync_write+0x0/0x110
    [] ? sys_write+0x3d/0x70
    [] ? sysenter_past_esp+0x78/0xc5
    =======================
    Code: 8b 55 0c 8b 45 08 8b 4e 34 0f ad d0 d3 fa f6 c1 20 0f 45 c2 3b
    46 1c 89 c1 7e 03 89 46 1c 8b 45 10 89 c2 c1 fa 1f 89 d7 8b 55 d8 <29>
    44 ca 38 19 7c ca 3c 29 42 08 89 d8 19 7a 0c e8 89 b0 45 00
    EIP: [] dbAllocBits+0x252/0x2b0 SS:ESP 0068:f62c9ad8
    Kernel panic - not syncing: Fatal exception

    $ addr2line -e vmlinux -i c02ee352
    fs/jfs/jfs_dmap.c:2188

    You can also have a copy of my scripts if you want to try to reproduce
    it locally. But I don't mind testing either :-)


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, 2008-07-18 at 20:08 +0200, Vegard Nossum wrote:
    > On Fri, Jul 18, 2008 at 7:29 PM, Dave Kleikamp
    > wrote:
    > >
    > > xtTruncate can call release_metapage() in several places. If you still
    > > have this kernel, could you please run addr2line against c02e4202? If
    > > you've rebuilt your kernel since then, I understand.

    >
    > Sorry, it's gone. However, I just recompiled it and the size of the
    > function is now 0x105f, and the offset 0xd42 matches with a call to
    > release_metapage. So with this new address, it's this call:
    >
    > $ addr2line -e vmlinux -i c02e428d
    > fs/jfs/jfs_metapage.h:101
    > fs/jfs/jfs_xtree.c:3723
    >
    > ...which is this one:
    >
    > /* invalidate empty leaf page */
    > discard_metapage(mp);


    Thanks. This helps some, but I'm not sure why it should fail there.
    Possibly a memory corruption problem caused by a bug in an earlier error
    path.

    > With your two patches applied, I now got this:
    >
    > ERROR: (device loop0): remounting filesystem as read-only
    > blkno = 200000687, nblocks = f
    > ERROR: (device loop0): dbFree: block to be freed is outside the map
    > BUG: unable to handle kernel paging request at 338f8038


    I see I've got a few of these to track down. I'm sure there are more
    than a couple more bugs you will uncover.

    > $ addr2line -e vmlinux -i c02ee352
    > fs/jfs/jfs_dmap.c:2188
    >
    > You can also have a copy of my scripts if you want to try to reproduce
    > it locally. But I don't mind testing either :-)


    Can you please send me a copy. For some of these, I'd like to get a
    repeatable test case, so I'm not debugging several problems at once.

    Thanks,
    Shaggy
    --
    David Kleikamp
    IBM Linux Technology Center

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: latest -git: BUG at fs/jfs/namei.c:512 assert(ip->i_nlink)

    On Fri, Jul 18, 2008 at 11:58 PM, Dave Kleikamp
    wrote:
    >> You can also have a copy of my scripts if you want to try to reproduce
    >> it locally. But I don't mind testing either :-)

    >
    > Can you please send me a copy. For some of these, I'd like to get a
    > repeatable test case, so I'm not debugging several problems at once.


    Hi,

    I've updated my scripts to save the bad disk images too, and now I've
    hit a repeatable case:

    mount -o loop disk.1 /mnt
    rm -rf /mnt/*

    BUG: unable to handle kernel paging request at f4568036
    IP: [] dbAdjTree+0x62/0xd0
    *pde = 36b9a163 *pte = 34568160
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 396, comm: jfsCommit Not tainted (2.6.26-05752-g93ded9b #93)
    EIP: 0060:[] EFLAGS: 00010246 CPU: 1
    EIP is at dbAdjTree+0x62/0xd0
    EAX: 00000000 EBX: 00020015 ECX: 00000004 EDX: 00080054
    ESI: f4548010 EDI: 00000000 EBP: f7a37c58 ESP: f7a37c48
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process jfsCommit (pid: 396, ti=f7a36000 task=f78a9fe0 task.ti=f7a36000)
    Stack: 00020015 f4548804 00000001 00000004 f7a37c78 c0304a0d 00000001 f4548010
    f7a37c88 f4548804 00000001 00000022 f7a37cc0 c03065c1 00000246 f569893c
    c1c44ce0 f7a37c98 c017c77e f4548000 f639b950 f4548800 00000000 00000022
    Call Trace:
    [] ? dbJoin+0xbd/0xe0
    [] ? dbFreeBits+0xc1/0x320
    [] ? unlock_page+0x4e/0x60
    [] ? dbFreeDmap+0x3d/0xd0
    [] ? dbFree+0x153/0x270
    [] ? txFreeMap+0x1d1/0x3e0
    [] ? __mark_inode_dirty+0x2c/0x180
    [] ? xtTruncate+0xe7b/0x1030
    [] ? __lock_acquire+0x2c8/0x1140
    [] ? __lock_acquire+0x2c8/0x1140
    [] ? _raw_spin_unlock+0x68/0x80
    [] ? __mark_inode_dirty+0xa2/0x180
    [] ? jfs_free_zero_link+0x98/0x1b0
    [] ? put_lock_stats+0xd/0x30
    [] ? jfs_delete_inode+0x77/0x90
    [] ? jfs_delete_inode+0x0/0x90
    [] ? generic_delete_inode+0x62/0xe0
    [] ? generic_drop_inode+0x11d/0x170
    [] ? iput+0x47/0x50
    [] ? txUpdateMap+0x1d5/0x2a0
    [] ? lock_release_holdtime+0x83/0x120
    [] ? trace_hardirqs_on+0xb/0x10
    [] ? jfs_lazycommit+0xd1/0x2e0
    [] ? default_wake_function+0x0/0x10
    [] ? jfs_lazycommit+0x0/0x2e0
    [] ? kthread+0x3c/0x70
    [] ? kthread+0x0/0x70
    [] ? kernel_thread_helper+0x7/0x1c
    =======================
    Code: f0 8d 44 16 12 8b 5d f0 0f b6 48 02 0f b6 78 03 38 48 03 0f b6
    40 01 0f 4d cf 38 c1 0f 4c c8 0f b6 44 16 12 38 c1 0f 4c c8 31 ff <38>
    4c 1e 11 75 47 eb 54 8d b6 00 00 00 00 8b 55 f0 83 ea 01 83
    EIP: [] dbAdjTree+0x62/0xd0 SS:ESP 0068:f7a37c48
    ---[ end trace 2d3ec5b787d965f7 ]---

    This decodes to fs/jfs/jfs_dmap.c:2831:

    if (tp->dmt_stree[pp] == max)

    or

    0: 38 4c 1e 11 cmp %cl,0x11(%esi,%ebx,1)

    It looks like %esi is the "tp" and %ebx is "pp", which would make pp
    == 0x00020015.

    Earlier, I got a crash in the same function, but a bit higher (with
    the same disk image, and with the same commands):

    BUG: unable to handle kernel paging request at daf76077
    IP: [] dbAdjTree+0xe/0xd0
    *pde = 34e11163 *pte = 1af76160
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

    Pid: 396, comm: jfsCommit Not tainted (2.6.26-05752-g93ded9b #93)
    EIP: 0060:[] EFLAGS: 00010206 CPU: 1
    EIP is at dbAdjTree+0xe/0xd0
    EAX: daef6010 EBX: daef6804 ECX: 00000004 EDX: 00080056
    ESI: daef6010 EDI: 00000004 EBP: f79dfc58 ESP: f79dfc48
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process jfsCommit (pid: 396, ti=f79de000 task=f7336f90 task.ti=f79de000)
    Stack: c0185b72 daef6804 00000001 00000004 f79dfc78 c0304a0d 00000001 daef6010
    f79dfc88 daef6804 00000001 00000022 f79dfcc0 c03065c1 00000246 d7731e5c
    c16511a8 f79dfc98 c017c77e daef6000 f386c180 daef6800 00000000 00000022
    Call Trace:
    [] ? activate_page+0x42/0xb0
    [] ? dbJoin+0xbd/0xe0

    This corresponds to fs/jfs/jfs_dmap.c:2805:

    if (tp->dmt_stree[lp] == newval)

    or

    0: 0f be 44 10 11 movsbl 0x11(%eax,%edx,1),%eax

    ...and here it looks like %eax is the base pointer and %edx the index,
    which would make lp == 0x00080056.

    The bad disk image can be found at
    http://userweb.kernel.org/~vegard/20080722-jfs/ (no private/sensitive
    data)

    Thanks,


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread