ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c - Kernel


  1. ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    Hi,

    I get this with both clean v2.6.26 and latest -git
    (33af79d12e0fa25545d49e86afc67ea8ad5f2f40):

    BUG: unable to handle kernel NULL pointer dereference at 0000000c
    IP: [] journal_dirty_metadata+0xa0/0x160
    *pde = 00000000
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 4935, comm: rm Not tainted (2.6.26-03414-g33af79d #39)
    EIP: 0060:[] EFLAGS: 00210246 CPU: 1
    EIP is at journal_dirty_metadata+0xa0/0x160
    EAX: 00000000 EBX: cca59160 ECX: 00000001 EDX: f5114000
    ESI: 00000000 EDI: f3d27750 EBP: f5115d58 ESP: f5115d40
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process rm (pid: 4935, ti=f5114000 task=f6a04fb0 task.ti=f5114000)
    Stack: 00000001 f77d0050 cca00c90 f3d27750 f77d0050 f3d27750 f5115d78 c01f9eff
    00000001 00000001 c05c2a53 f3d27750 00000000 f60da560 f5115da8 c01ef9ef
    00000001 00000001 f60da560 f60da800 f3d27750 f3cc5944 f77d0050 f3d27750
    Call Trace:
    [] ? __ext3_journal_dirty_metadata+0x1f/0x50
    [] ? ext3_free_data+0x9f/0x100
    [] ? ext3_free_branches+0x23b/0x250
    [] ? sync_buffer+0x0/0x40
    [] ? ext3_free_branches+0xae/0x250
    [] ? ext3_free_branches+0xae/0x250
    [] ? ext3_truncate+0x5c8/0x940
    [] ? trace_hardirqs_on_caller+0x116/0x170
    [] ? journal_start+0xb0/0x110
    [] ? journal_start+0xd3/0x110
    [] ? journal_start+0xb0/0x110
    [] ? ext3_journal_start_sb+0x29/0x50
    [] ? ext3_delete_inode+0xd7/0xe0
    [] ? ext3_delete_inode+0x0/0xe0
    [] ? generic_delete_inode+0x62/0xe0
    [] ? generic_drop_inode+0x11d/0x170
    [] ? iput+0x47/0x50
    [] ? do_unlinkat+0xec/0x170
    [] ? trace_hardirqs_on_thunk+0xc/0x10
    [] ? do_page_fault+0x0/0x880
    [] ? trace_hardirqs_on_caller+0x116/0x170
    [] ? sys_unlinkat+0x23/0x50
    [] ? sysenter_past_esp+0x78/0xc5
    =======================
    Code: b8 01 00 00 00 e8 f1 57 f3 ff 89 e0 25 00 e0 ff ff f6 40 08 08
    74 05 e8 2f e6 3a 00 83 c4 0c 31 c0 5b 5e 5f 5d c3 90 8d 74 26 00 <8b>
    46 0c 85 c0 0f 84 8c 00 00 00 39 5e 18 74 68 8d 47 02 89 45
    EIP: [] journal_dirty_metadata+0xa0/0x160 SS:ESP 0068:f5115d40
    ---[ end trace ad9c7bca1cad9e55 ]---

    This corresponds to "jh" being NULL in journal_dirty_metadata():

    if (jh->b_modified == 0) {
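The fault address fits that reading. If jh is NULL, the load of jh->b_modified goes to address 0 plus the offset of b_modified. The Code: line above shows the faulting instruction as `8b 46 0c`, i.e. mov 0xc(%esi),%eax, and the register dump has ESI = 00000000. The sketch below is a minimal illustration. It assumes a 32-bit layout in which struct journal_head begins with b_bh, b_jcount, b_jlist and b_modified, each 4 bytes. The names mock_journal_head_i386 and b_modified_addr() are hypothetical and model only those fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the first fields of struct journal_head as laid
 * out on i386. uint32_t stands in for the 4-byte b_bh pointer so the
 * offsets come out the same when compiled on a 64-bit host. */
struct mock_journal_head_i386 {
	uint32_t b_bh;       /* struct buffer_head *b_bh */
	int32_t  b_jcount;
	uint32_t b_jlist;
	uint32_t b_modified; /* read by "if (jh->b_modified == 0)" */
};

/* Compile-time check of the assumed layout. */
static_assert(offsetof(struct mock_journal_head_i386, b_modified) == 0xc,
	      "b_modified should sit at byte offset 0xc");

/* Address that reading jh->b_modified touches for a given jh value. */
static uintptr_t b_modified_addr(uintptr_t jh)
{
	return jh + offsetof(struct mock_journal_head_i386, b_modified);
}
```

With jh == 0, b_modified_addr(0) is 0xc. That matches "NULL pointer dereference at 0000000c".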

    I also tried with this patch, but without success:

    http://folk.uio.no/vegardno/linux/jbd-transaction.patch

    The problem seems quite reproducible by intentionally corrupting a
    disk image.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 8:51 AM, Vegard Nossum wrote:
    > Hi,
    >
    > I get this with both clean v2.6.26 and latest -git
    > (33af79d12e0fa25545d49e86afc67ea8ad5f2f40):
    >


    Thanks for the report, do you happen to have any messages above the
    panic message that would indicate if there was some sort of fs error
    that was hit before the panic? That would help me figure out what
    exactly happened. Thanks,

    Josef

  3. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 3:13 PM, Josef Bacik wrote:
    > On Thu, Jul 17, 2008 at 8:51 AM, Vegard Nossum wrote:
    >> Hi,
    >>
    >> I get this with both clean v2.6.26 and latest -git
    >> (33af79d12e0fa25545d49e86afc67ea8ad5f2f40):
    >>

    >
    > Thanks for the report, do you happen to have any messages above the
    > panic message that would indicate if there was some sort of fs error
    > that was hit before the panic? That would help me figure out what
    > exactly happened. Thanks,


    Yeah, the full log exists at

    http://folk.uio.no/vegardno/linux/log-1216293934.txt

    I think this is the interesting part:

    kjournald starting. Commit interval 5 seconds
    EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    EXT3 FS on loop0, internal journal
    EXT3-fs: mounted filesystem with ordered data mode.
    EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared
    for block 1507
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1050159, count = 1
    EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 2048, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 102, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 496, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 245, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 8192, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 8192, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 256, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1216293753, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1216293753, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1703965, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 257875, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1216293738, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 15552000, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 11, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 128, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 52, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 6, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 3246399477, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 860559364, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1659021227, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 2723221558, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 458752, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 8, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 11, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1092049505, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 2823687499, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 2276435116, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 2703362334, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 258, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1216293738, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 58, count = 13
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 327, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1048576, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 1, count = 1
    BUG: unable to handle kernel NULL pointer dereference at 0000000c
    IP: [] journal_dirty_metadata+0xa0/0x160
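Each "Freeing blocks not in datazone" message comes from a range check that runs on the block number before anything is freed. The sketch below follows the shape of that check in ext3_free_blocks_sb(). It assumes the check compares the range against s_first_data_block and s_blocks_count and rejects 32-bit wrap-around. struct mock_geometry, in_datazone() and the geometry in the usage example are hypothetical, because the real values from the corrupted image are not known here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two superblock fields the check reads. */
struct mock_geometry {
	uint32_t first_data_block; /* s_first_data_block */
	uint32_t blocks_count;     /* s_blocks_count */
};

/* True if [block, block + count) lies inside the data zone. */
static bool in_datazone(const struct mock_geometry *g, uint32_t block,
			uint32_t count)
{
	if (block < g->first_data_block)
		return false;
	if ((uint32_t)(block + count) < block) /* range wrapped past 2^32 */
		return false;
	if (block + count > g->blocks_count)
		return false;
	return true;
}
```

For example, take a hypothetical 8192-block filesystem whose first data block is 1. For that geometry, in_datazone(&g, 100, 1) is true and in_datazone(&g, 1216293753u, 1) is false.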

    Would it also help to reproduce with jbd/ext3 debug enabled? (If it
    isn't already.)


    Vegard


  4. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 9:20 AM, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 3:13 PM, Josef Bacik wrote:
    >> On Thu, Jul 17, 2008 at 8:51 AM, Vegard Nossum wrote:
    >>> Hi,
    >>>
    >>> I get this with both clean v2.6.26 and latest -git
    >>> (33af79d12e0fa25545d49e86afc67ea8ad5f2f40):
    >>>

    >>
    >> Thanks for the report, do you happen to have any messages above the
    >> panic message that would indicate if there was some sort of fs error
    >> that was hit before the panic? That would help me figure out what
    >> exactly happened. Thanks,

    >
    > Yeah, the full log exists at
    >
    > http://folk.uio.no/vegardno/linux/log-1216293934.txt
    >
    > I think this is the interesting part:


    Hmm, well, the journal should have aborted, but it looks like it didn't.
    Are you mounting with errors=continue by any chance? Thanks much,

    Josef

  5. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 3:34 PM, Josef Bacik wrote:
    >> Yeah, the full log exists at
    >>
    >> http://folk.uio.no/vegardno/linux/log-1216293934.txt
    >>
    >> I think this is the interesting part:

    >
    > Hmm well the journal should have aborted, but it looks like it didn't,
    > are you mounting with errors=continue by any chance? Thanks much,


    No, this is the command I used:

    mount -o loop disk mnt

    I think this looks interesting:

    EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure

    The code in ext3_reserve_inode_write() is here:

    err = ext3_journal_get_write_access(handle, iloc->bh);
    if (err) {
    	brelse(iloc->bh);
    	iloc->bh = NULL;
    }

    Maybe it should do something different here?

    But I don't know :-)

    Thanks for helping out!


    Vegard


  6. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 03:39:24PM +0200, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 3:34 PM, Josef Bacik wrote:
    > >> Yeah, the full log exists at
    > >>
    > >> http://folk.uio.no/vegardno/linux/log-1216293934.txt
    > >>
    > >> I think this is the interesting part:

    > >
    > > Hmm well the journal should have aborted, but it looks like it didn't,
    > > are you mounting with errors=continue by any chance? Thanks much,

    >
    > No, this is the command I used:
    >
    > mount -o loop disk mnt
    >
    > I think this looks interesting:
    >
    > EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure
    >
    > The code in ext3_reserve_inode_write() is here:
    >
    > err = ext3_journal_get_write_access(handle, iloc->bh);
    > if (err) {
    > brelse(iloc->bh);
    > iloc->bh = NULL;
    > }
    >
    > Maybe it should do something different here?
    >
    > But I don't know :-)
    >
    > Thanks for helping out!
    >


    Well, this is really odd. After that we call ext3_std_error, which calls
    journal_abort, so when we come into journal_dirty_metadata,
    is_handle_aborted() should have returned 1 and we should have just
    exited. I'm going to have to think on this for a bit.
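Josef's reasoning can be reduced to a few lines. The sketch assumes that jh comes from bh2jh(bh), i.e. bh->b_private. That pointer stays NULL for a buffer whose journal write access was never obtained. The mock_* types are hypothetical simplifications. The explicit NULL test is not in the real code; it exists only so the model can report the oops instead of crashing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the jbd structures. */
struct mock_jh     { unsigned b_modified; };
struct mock_bh     { struct mock_jh *b_private; }; /* what bh2jh() returns */
struct mock_handle { bool aborted; };              /* is_handle_aborted() */

/* Models the start of journal_dirty_metadata(). Returns 0 when the
 * aborted-handle check bails out before jh is used, 1 when the buffer is
 * processed, and -1 where the real code would dereference a NULL jh. */
static int mock_dirty_metadata(struct mock_handle *handle,
			       struct mock_bh *bh)
{
	struct mock_jh *jh = bh->b_private;

	if (handle->aborted)
		return 0;              /* "goto out": jh is never touched */
	if (jh == NULL)
		return -1;             /* real code: fault at &jh->b_modified */
	if (jh->b_modified == 0)
		jh->b_modified = 1;    /* first modification in this transaction */
	return 1;
}
```

An aborted handle never reaches jh. A live handle combined with a buffer that has no journal_head is the path that oopses.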

    Josef

  7. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 03:39:24PM +0200, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 3:34 PM, Josef Bacik wrote:
    > >> Yeah, the full log exists at
    > >>
    > >> http://folk.uio.no/vegardno/linux/log-1216293934.txt
    > >>
    > >> I think this is the interesting part:

    > >
    > > Hmm well the journal should have aborted, but it looks like it didn't,
    > > are you mounting with errors=continue by any chance? Thanks much,

    >
    > No, this is the command I used:
    >
    > mount -o loop disk mnt
    >
    > I think this looks interesting:
    >
    > EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure
    >
    > The code in ext3_reserve_inode_write() is here:
    >
    > err = ext3_journal_get_write_access(handle, iloc->bh);
    > if (err) {
    > brelse(iloc->bh);
    > iloc->bh = NULL;
    > }
    >
    > Maybe it should do something different here?
    >
    > But I don't know :-)
    >
    > Thanks for helping out!
    >
    >


    Can you try this patch out and see if it fixes the problem? I didn't
    compile-test it, so you may need to tweak some things, but it should
    work. Thanks,

    Signed-off-by: Josef Bacik


    Index: linux-2.6/fs/ext3/inode.c
    ===================================================================
    --- linux-2.6.orig/fs/ext3/inode.c
    +++ linux-2.6/fs/ext3/inode.c
    @@ -2023,13 +2023,27 @@ static void ext3_clear_blocks(handle_t *
     		unsigned long count, __le32 *first, __le32 *last)
     {
     	__le32 *p;
    +	int ret;
    +
     	if (try_to_extend_transaction(handle, inode)) {
     		if (bh) {
     			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
    -			ext3_journal_dirty_metadata(handle, bh);
    +			ret = ext3_journal_dirty_metadata(handle, bh);
    +			if (ret) {
    +				ext3_std_error(inode->i_sb, ret);
    +				return;
    +			}
     		}
    -		ext3_mark_inode_dirty(handle, inode);
    -		ext3_journal_test_restart(handle, inode);
    +		ret = ext3_mark_inode_dirty(handle, inode);
    +		if (ret)
    +			return;
    +
    +		ret = ext3_journal_test_restart(handle, inode);
    +		if (ret) {
    +			ext3_std_error(inode->i_sb, ret);
    +			return;
    +		}
    +
     		if (bh) {
     			BUFFER_TRACE(bh, "retaking write access");
     			ext3_journal_get_write_access(handle, bh);
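The patch repeats one pattern: check the return value of each journalling call, and stop rather than carry on with a buffer that may no longer be journalled. The sketch below is a reduced model of that control flow. journal_step, run_steps(), mock_std_error() and the two sample steps are hypothetical stand-ins. The patch differs in one detail: an ext3_mark_inode_dirty() failure returns without calling ext3_std_error(). The sketch reports every failure the same way.

```c
#include <errno.h>

/* Hypothetical stand-in for one journalling call in ext3_clear_blocks(). */
typedef int (*journal_step)(void);

static int reported_err; /* last error passed to the reporting hook */

/* Stand-in for ext3_std_error(): remember what would be reported. */
static void mock_std_error(int err)
{
	reported_err = err;
}

/* Sample steps: one that succeeds and one that fails with -EIO. */
static int step_ok(void)  { return 0; }
static int step_eio(void) { return -EIO; }

/* Run steps in order; on the first non-zero return, report it and stop
 * instead of carrying on. Returns how many steps were attempted. */
static int run_steps(const journal_step *steps, int n)
{
	for (int i = 0; i < n; i++) {
		int ret = steps[i]();

		if (ret) {
			mock_std_error(ret);
			return i + 1;
		}
	}
	return n;
}
```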

  8. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
    > Can you try this patch out and see if it fixes the problem? I didn't compile
    > test it, so you may need to tweak somethings, but it should work. Thanks,
    >
    > Signed-off-by: Josef Bacik


    Nope, seems to be the same problem:

    kjournald starting. Commit interval 5 seconds
    EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    EXT3 FS on loop0, internal journal
    EXT3-fs: mounted filesystem with ordered data mode.
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    system zones - Block = 16, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    system zones - Block = 32, count = 1
    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 262144, count = 1
    BUG: unable to handle kernel NULL pointer dereference at 0000000c
    IP: [] journal_dirty_metadata+0xa0/0x160

    Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt


    Vegard


  9. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 04:35:16PM +0200, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
    > > On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
    > >> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
    > >> > Can you try this patch out and see if it fixes the problem? I didn't compile
    > >> > test it, so you may need to tweak somethings, but it should work. Thanks,
    > >> >
    > >> > Signed-off-by: Josef Bacik
    > >>
    > >> Nope, seems to be the same problem:
    > >>
    > >> kjournald starting. Commit interval 5 seconds
    > >> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    > >> EXT3 FS on loop0, internal journal
    > >> EXT3-fs: mounted filesystem with ordered data mode.
    > >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    > >> system zones - Block = 16, count = 1
    > >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    > >> system zones - Block = 32, count = 1
    > >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    > >> datazone - block = 262144, count = 1
    > >> BUG: unable to handle kernel NULL pointer dereference at 0000000c
    > >> IP: [] journal_dirty_metadata+0xa0/0x160
    > >>
    > >> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
    > >>
    > >>

    > >
    > > Ok, this should do it then. Thanks,

    >
    > Hm, it doesn't apply. Should I revert the previous patch?
    >


    Yeah, sorry about that.

    Josef

  10. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
    > > Can you try this patch out and see if it fixes the problem? I didn't compile
    > > test it, so you may need to tweak somethings, but it should work. Thanks,
    > >
    > > Signed-off-by: Josef Bacik

    >
    > Nope, seems to be the same problem:
    >
    > kjournald starting. Commit interval 5 seconds
    > EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    > EXT3 FS on loop0, internal journal
    > EXT3-fs: mounted filesystem with ordered data mode.
    > EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    > system zones - Block = 16, count = 1
    > EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    > system zones - Block = 32, count = 1
    > EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    > datazone - block = 262144, count = 1
    > BUG: unable to handle kernel NULL pointer dereference at 0000000c
    > IP: [] journal_dirty_metadata+0xa0/0x160
    >
    > Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
    >
    >


    Ok, this should do it then. Thanks,

    Josef


    Index: linux-2.6/fs/ext3/inode.c
    ===================================================================
    --- linux-2.6.orig/fs/ext3/inode.c
    +++ linux-2.6/fs/ext3/inode.c
    @@ -2023,13 +2023,27 @@ static void ext3_clear_blocks(handle_t *
     		unsigned long count, __le32 *first, __le32 *last)
     {
     	__le32 *p;
    +	int ret;
    +
     	if (try_to_extend_transaction(handle, inode)) {
     		if (bh) {
     			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
    -			ext3_journal_dirty_metadata(handle, bh);
    +			ret = ext3_journal_dirty_metadata(handle, bh);
    +			if (ret) {
    +				ext3_std_error(inode->i_sb, ret);
    +				return;
    +			}
     		}
    -		ext3_mark_inode_dirty(handle, inode);
    -		ext3_journal_test_restart(handle, inode);
    +		ret = ext3_mark_inode_dirty(handle, inode);
    +		if (ret)
    +			return;
    +
    +		ret = ext3_journal_test_restart(handle, inode);
    +		if (ret) {
    +			ext3_std_error(inode->i_sb, ret);
    +			return;
    +		}
    +
     		if (bh) {
     			BUFFER_TRACE(bh, "retaking write access");
     			ext3_journal_get_write_access(handle, bh);
    Index: linux-2.6/fs/ext3/balloc.c
    ===================================================================
    --- linux-2.6.orig/fs/ext3/balloc.c
    +++ linux-2.6/fs/ext3/balloc.c
    @@ -498,6 +498,7 @@ void ext3_free_blocks_sb(handle_t *handl
     		ext3_error (sb, "ext3_free_blocks",
     			    "Freeing blocks not in datazone - "
     			    "block = "E3FSBLK", count = %lu", block, count);
    +		err = -EIO;
     		goto error_return;
     	}
     
    @@ -535,6 +536,7 @@ do_more:
     			    "Freeing blocks in system zones - "
     			    "Block = "E3FSBLK", count = %lu",
     			    block, count);
    +		err = -EIO;
     		goto error_return;
     	}
     
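The balloc.c hunk matters because of what happens at error_return. The explanation below assumes, as in ext3 of this era, that error_return calls ext3_std_error(sb, err), and that ext3_std_error() is a macro that only acts when its errno argument is non-zero. Under that assumption, a bare goto error_return with err still 0 reports nothing beyond the ext3_error() message. Setting err = -EIO makes it fire. That fits the new "in ext3_free_blocks_sb: IO failure" line in the next log. mock_free_blocks(), mock_std_error() and the counter in the sketch are hypothetical.

```c
#include <errno.h>

static int std_error_calls; /* reports that would reach __ext3_std_error() */

/* Mirrors the shape of the ext3_std_error() macro: zero is ignored. */
static void mock_std_error(int err)
{
	if (err)
		std_error_calls++;
}

/* Hypothetical reduction of ext3_free_blocks_sb()'s range-check path.
 * set_eio selects patched (err = -EIO) or unpatched behaviour. */
static void mock_free_blocks(int out_of_range, int set_eio)
{
	int err = 0;

	if (out_of_range) {
		/* ext3_error(...) would log the message here */
		if (set_eio)
			err = -EIO;
		goto error_return;
	}
	/* ... normal freeing path elided ... */
	return;

error_return:
	mock_std_error(err);
}
```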


  11. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
    > On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
    >> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
    >> > Can you try this patch out and see if it fixes the problem? I didn't compile
    >> > test it, so you may need to tweak somethings, but it should work. Thanks,
    >> >
    >> > Signed-off-by: Josef Bacik

    >>
    >> Nope, seems to be the same problem:
    >>
    >> kjournald starting. Commit interval 5 seconds
    >> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    >> EXT3 FS on loop0, internal journal
    >> EXT3-fs: mounted filesystem with ordered data mode.
    >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    >> system zones - Block = 16, count = 1
    >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    >> system zones - Block = 32, count = 1
    >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    >> datazone - block = 262144, count = 1
    >> BUG: unable to handle kernel NULL pointer dereference at 0000000c
    >> IP: [] journal_dirty_metadata+0xa0/0x160
    >>
    >> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
    >>
    >>

    >
    > Ok, this should do it then. Thanks,


    Hm, it doesn't apply. Should I revert the previous patch?


    Vegard


  12. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 4:16 PM, Josef Bacik wrote:
    > On Thu, Jul 17, 2008 at 04:35:16PM +0200, Vegard Nossum wrote:
    >> On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
    >> > On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
    >> >> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
    >> >> > Can you try this patch out and see if it fixes the problem? I didn't compile
    >> >> > test it, so you may need to tweak somethings, but it should work. Thanks,
    >> >> >
    >> >> > Signed-off-by: Josef Bacik
    >> >>
    >> >> Nope, seems to be the same problem:
    >> >>
    >> >> kjournald starting. Commit interval 5 seconds
    >> >> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    >> >> EXT3 FS on loop0, internal journal
    >> >> EXT3-fs: mounted filesystem with ordered data mode.
    >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    >> >> system zones - Block = 16, count = 1
    >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    >> >> system zones - Block = 32, count = 1
    >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    >> >> datazone - block = 262144, count = 1
    >> >> BUG: unable to handle kernel NULL pointer dereference at 0000000c
    >> >> IP: [] journal_dirty_metadata+0xa0/0x160
    >> >>
    >> >> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
    >> >>
    >> >>
    >> >
    >> > Ok, this should do it then. Thanks,


    -ENOLUCK

    EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    datazone - block = 524288, count = 1
    EXT3-fs error (device loop0) in ext3_free_blocks_sb: IO failure
    BUG: unable to handle kernel NULL pointer dereference at 0000000c
    IP: [] journal_dirty_metadata+0xa0/0x160

    It did seem to get further, though.

    http://folk.uio.no/vegardno/linux/log-1216306142.txt


    Vegard


  13. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 04:44:47PM +0200, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 4:16 PM, Josef Bacik wrote:
    > > On Thu, Jul 17, 2008 at 04:35:16PM +0200, Vegard Nossum wrote:
    > >> On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
    > >> > On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
    > >> >> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
    > >> >> > Can you try this patch out and see if it fixes the problem? I didn't compile
    > >> >> > test it, so you may need to tweak somethings, but it should work. Thanks,
    > >> >> >
    > >> >> > Signed-off-by: Josef Bacik
    > >> >>
    > >> >> Nope, seems to be the same problem:
    > >> >>
    > >> >> kjournald starting. Commit interval 5 seconds
    > >> >> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
    > >> >> EXT3 FS on loop0, internal journal
    > >> >> EXT3-fs: mounted filesystem with ordered data mode.
    > >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    > >> >> system zones - Block = 16, count = 1
    > >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
    > >> >> system zones - Block = 32, count = 1
    > >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    > >> >> datazone - block = 262144, count = 1
    > >> >> BUG: unable to handle kernel NULL pointer dereference at 0000000c
    > >> >> IP: [] journal_dirty_metadata+0xa0/0x160
    > >> >>
    > >> >> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
    > >> >>
    > >> >>
    > >> >
    > >> > Ok, this should do it then. Thanks,

    >
    > -ENOLUCK
    >
    > EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
    > datazone - block = 524288, count = 1
    > EXT3-fs error (device loop0) in ext3_free_blocks_sb: IO failure
    > BUG: unable to handle kernel NULL pointer dereference at 0000000c
    > IP: [] journal_dirty_metadata+0xa0/0x160
    >
    > It did seem to get further, though.
    >
    > http://folk.uio.no/vegardno/linux/log-1216306142.txt
    >


    OK, run dumpe2fs -h on your image and see if you have a line that says

    Errors behavior: Continue

    If you do, run tune2fs -e remount-ro on the image and then do the mount.
    That would explain why you are still having panics even though we should
    be aborting the journal. Thanks,
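dumpe2fs reads that setting from the on-disk superblock, and the sketch below does the same lookup. It assumes the standard ext2/ext3 layout:

- The primary superblock starts 1024 bytes into the image.
- Within it, s_magic (0xEF53) is a little-endian 16-bit field at offset 56, s_state is at 58 and s_errors is at 60.
- s_errors is 1 for continue, 2 for remount-ro and 3 for panic.
- Bit 0x0002 of s_state marks a filesystem with errors.

A wrong magic number is the condition dumpe2fs reports as "Bad magic number in super-block". The function names are hypothetical, and loading the image file into memory is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET      1024    /* primary superblock: 1 KiB into the image */
#define SB_MAGIC_OFF   56      /* s_magic  (__le16) */
#define SB_STATE_OFF   58      /* s_state  (__le16) */
#define SB_ERRORS_OFF  60      /* s_errors (__le16) */
#define EXT2_SB_MAGIC  0xEF53
#define STATE_ERROR_FS 0x0002  /* "filesystem has errors" bit in s_state */

/* Read a little-endian 16-bit superblock field, or -1 if out of bounds. */
static int sb_le16(const unsigned char *img, size_t len, size_t field)
{
	size_t off = SB_OFFSET + field;

	if (len < off + 2)
		return -1;
	return img[off] | (img[off + 1] << 8);
}

/* s_errors value (1 continue, 2 remount-ro, 3 panic), or -1 when the
 * superblock is missing or its magic number is wrong. */
static int sb_errors_behavior(const unsigned char *img, size_t len)
{
	if (sb_le16(img, len, SB_MAGIC_OFF) != EXT2_SB_MAGIC)
		return -1;
	return sb_le16(img, len, SB_ERRORS_OFF);
}

/* 1 if s_state has the error bit set, 0 if not, -1 on a bad superblock. */
static int sb_marked_with_errors(const unsigned char *img, size_t len)
{
	int state;

	if (sb_le16(img, len, SB_MAGIC_OFF) != EXT2_SB_MAGIC)
		return -1;
	state = sb_le16(img, len, SB_STATE_OFF);
	if (state < 0)
		return -1;
	return (state & STATE_ERROR_FS) != 0;
}
```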

    Josef

  14. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 4:33 PM, Josef Bacik wrote:
    > Ok run dumpe2fs -h on your image and see if you have a line that says
    >
    > Errors behavior: Continue
    >
    > if you do run tune2fs -e remount-ro and then do the mount. That would explain
    > why you are still having panics even though we should be aborting the journal.
    > Thanks,


    Ahh, that probably explains it. I didn't realize there was such a thing.

    I am doing random-corruption tests, so it is quite possible that this
    bit gets set anywhere along the road...

    But even so, is it correct that the kernel should crash? It seems
    quite possible that error behaviour can change (like this) even with
    "normal" corruption, e.g. outside my test scripts.

    But I cannot even run dumpe2fs on my image (even with the -f switch):

    dumpe2fs: Bad magic number in super-block while trying to open disk



    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036

  15. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    Vegard,

    How big is the filesystem? Is there any chance you can make a
    compressed e2image of the file? (This will not include file contents,
    but does reveal the names of the files.) Given the nature of the bug
    which you are reporting, it should be safe to scramble the filenames
    using the -s option if that would make you feel more comfortable.

    The quick version is:

    e2image -r /dev/loop0 - | bzip2 > badfs.e2i.bz2
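    If this step is part of a test script, the same pipeline can be driven
    from Python. This is a sketch that assumes e2image is on $PATH and that
    "-" as the image-file argument sends the raw image to standard output,
    which is how I read the e2image man page. save_e2image and bzip2_stream
    are hypothetical helper names.

```python
import bz2
import subprocess


def bzip2_stream(src, dst, chunk_size=1 << 20):
    """Compress everything read from binary file object src into dst."""
    comp = bz2.BZ2Compressor(9)
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        dst.write(comp.compress(data))
    dst.write(comp.flush())


def save_e2image(device, out_path, scramble=False):
    """Roughly `e2image -r [-s] DEVICE - | bzip2 > OUT_PATH`."""
    cmd = ["e2image", "-r"]
    if scramble:
        cmd.append("-s")           # scramble directory entry names
    cmd += [device, "-"]           # "-": write the raw image to stdout
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        try:
            bzip2_stream(proc.stdout, out)
        finally:
            proc.stdout.close()
            returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

    For example, save_e2image("/dev/loop0", "badfs.e2i.bz2", scramble=True).
    Streaming in chunks keeps memory use flat regardless of image size.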

    Then folks like Josef would be able to test your filesystem right
    away, instead of asking you to test it.

    - Ted

  16. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 5:08 PM, Theodore Tso wrote:
    > Vegard,
    >
    > How big is the filesystem? Is there any chance you can make a
    > compressed e2image of the file? (This will not include file contents,
    > but does reveal the names of the file.) Given the nature of the bug
    > which you are reporting, it should be safe to scramble the names of
    > the filenames using the -s option if that would make you feel more
    > comfortable.


    Oh, just 2M. It doesn't contain anything but copies of /bin/bash.

    I basically just made a crash-tester script that corrupts a dummy
    filesystem on purpose. But it seems that it might be partly my own
    fault for not protecting the bits in the filesystem image that say
    "oh, proceed on error". But I do have a feeling that the filesystem
    should not be able to say this in the first place, because those bits
    can be corrupted legitimately in other ways too!
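    A minimal sketch of the "protect those bits" idea for a corruption
    tester: overwrite random bytes of an image but skip a list of protected
    byte ranges. The default range assumes the ext2/3 layout in which the
    s_errors field sits at byte 1024 + 60 of the image. corrupt_image and
    DEFAULT_PROTECTED are hypothetical names, not part of Vegard's script.

```python
import random

# [start, end) byte ranges that are never corrupted. The default protects the
# primary superblock's s_errors field (assumed ext2/3 layout: 1024 + 60, 2 bytes).
DEFAULT_PROTECTED = [(1024 + 60, 1024 + 62)]


def corrupt_image(data, nbytes, protected=DEFAULT_PROTECTED, rng=None):
    """Return a copy of data with up to nbytes randomly chosen bytes replaced
    by random values, leaving every protected range untouched."""
    rng = rng or random.Random()
    out = bytearray(data)
    blocked = bytearray(len(out))          # 1 marks a protected offset
    for start, end in protected:
        span = len(blocked[start:end])
        blocked[start:end] = b"\x01" * span
    candidates = [i for i, b in enumerate(blocked) if not b]
    for i in rng.sample(candidates, min(nbytes, len(candidates))):
        out[i] = rng.randrange(256)        # may coincide with the old value
    return bytes(out)
```

    Adding (1080, 1082), the s_magic field under the same layout
    assumption, would also keep dumpe2fs from stopping at the "Bad magic
    number" error quoted above. Passing a seeded random.Random makes a
    crashing run reproducible.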

    http://folk.uio.no/vegardno/linux/corrupt.tar.bz2

    Is there a way to override the

    "Errors behavior: Continue"

    information which is present in the filesystem?


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036

  17. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 05:16:39PM +0200, Vegard Nossum wrote:
    > Is there a way to override the
    >
    > "Errors behavior: Continue"
    >
    > information which is present in the filesystem?


    Yep:

    tune2fs -e remount-ro /dev/XXXX
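    On disk, the setting tune2fs -e changes is the s_errors superblock field
    (1 = continue, 2 = remount-ro, 3 = panic). For a test harness that wants
    to reset it after every corruption pass without shelling out, a sketch
    of the same change is below. It assumes the ext2/3 layout (superblock at
    byte 1024, s_magic at offset 56, s_errors at offset 60), touches only
    the primary superblock and ignores metadata checksums, so prefer tune2fs
    whenever it works. set_errors_behavior is a hypothetical name. For a
    single mount, the ext3 mount option errors=remount-ro should also
    override the on-disk setting.

```python
import struct

SB_OFFSET = 1024                  # primary superblock (assumed ext2/3 layout)
MAGIC_OFFSET = SB_OFFSET + 56     # s_magic, little-endian u16
ERRORS_OFFSET = SB_OFFSET + 60    # s_errors, little-endian u16
EXT2_SUPER_MAGIC = 0xEF53
ERRORS = {"continue": 1, "remount-ro": 2, "panic": 3}


def set_errors_behavior(image_path, behavior="remount-ro"):
    """Overwrite s_errors in the primary superblock; return the old raw value."""
    value = ERRORS[behavior]
    with open(image_path, "r+b") as f:
        f.seek(MAGIC_OFFSET)
        (magic,) = struct.unpack("<H", f.read(2))
        if magic != EXT2_SUPER_MAGIC:
            raise ValueError("no ext2/3 superblock (magic 0x%04x)" % magic)
        f.seek(ERRORS_OFFSET)
        (old,) = struct.unpack("<H", f.read(2))
        f.seek(ERRORS_OFFSET)
        f.write(struct.pack("<H", value))
    return old
```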

    I should probably make the default configurable, and not "continue"....

    - Ted

  18. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Thu, Jul 17, 2008 at 05:00:07PM +0200, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 4:33 PM, Josef Bacik wrote:
    > > Ok run dumpe2fs -h on your image and see if you have a line that says
    > >
    > > Errors behavior: Continue
    > >
    > > if you do run tune2fs -e remount-ro and then do the mount. That would explain
    > > why you are still having panics even though we should be aborting the journal.
    > > Thanks,

    >
    > Ahh, that probably explains it. I didn't realize there was such a thing.
    >
    > I am doing random-corruption tests, so it is quite possible that this
    > bit gets set anywhere along the road...
    >
    > But even so, is it correct that the kernel should crash? It seems
    > quite possible that error behaviour can change (like this) even with
    > "normal" corruption, e.g. outside my test scripts.
    >


    Yeah, that's a hard question to answer, one that I will leave up to others
    who have been doing this much longer than I. My thought is remount-ro is
    there to keep you from crashing, so if you have errors=continue then you
    expect to live with the consequences. Of course, if that bit gets flipped
    via corruption, that's not good either. Thanks,
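    To make the difference concrete, here is a simplified Python model of
    what ext3 does after it reports an error, based on my reading of
    ext3_handle_error() in fs/ext3/super.c from around this time. It is a
    model, not kernel code: the SuperBlock class, KernelPanic and
    handle_error are hypothetical. The point it illustrates is that with
    errors=continue the journal is never aborted, so later code keeps
    operating on the corrupted metadata.

```python
from dataclasses import dataclass


class KernelPanic(Exception):
    """Stands in for panic() in this model."""


@dataclass
class SuperBlock:
    errors: str = "continue"      # "continue", "remount-ro" or "panic"
    read_only: bool = False
    journal_aborted: bool = False
    error_fs: bool = False        # models the EXT3_ERROR_FS bit in s_state


def handle_error(sb):
    """Simplified model of ext3's reaction to a detected filesystem error."""
    sb.error_fs = True            # next mount warns "mounting fs with errors"
    if sb.read_only:
        return
    if sb.errors != "continue":
        # journal abort: later journal operations fail instead of proceeding
        sb.journal_aborted = True
    if sb.errors == "remount-ro":
        sb.read_only = True
    if sb.errors == "panic":
        raise KernelPanic("panic forced after error")
```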

    Josef

  19. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Jul 17, 2008 11:40 -0400, Theodore Ts'o wrote:
    > On Thu, Jul 17, 2008 at 05:16:39PM +0200, Vegard Nossum wrote:
    > > Is there a way to override the
    > >
    > > "Errors behavior: Continue"
    > >
    > > information which is present in the filesystem?

    >
    > tune2fs -e remount-ro /dev/XXXX
    >
    > I should probably make the default configurable, and not "continue"....


    Yes, it has been that way on Debian for many years... I was going to
    say the same thing, but you beat me to it.

    Cheers, Andreas
    --
    Andreas Dilger
    Sr. Staff Engineer, Lustre Group
    Sun Microsystems of Canada, Inc.


  20. Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c

    On Jul 17, 2008 10:43 -0400, Josef Bacik wrote:
    > Yeah thats a hard to answer question, one that I will leave up to others
    > who have been doing this much longer than I. My thought is remount-ro
    > is there to keep you from crashing, so if you have errors=continue then
    > you expect to live with the consequences. Course if that bit gets flipped
    > via corruption thats not good either.


    It shouldn't cause the kernel to crash, but it should definitely return
    an error to the application. This is probably one of the code paths
    that the Coverity folks were reporting on in FAST this year where on-disk
    errors are not propagated to the application.
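    From the application side, propagating the error means that calls such
    as unlink() fail with an errno like EIO, or EROFS once the filesystem
    has been remounted read-only. A corruption tester could use something
    like the sketch below instead of rm -rf, so that such errors are
    recorded rather than lost. remove_tree is a hypothetical helper;
    readdir failures are collected through the onerror hook of os.walk.

```python
import errno
import os


def remove_tree(path):
    """Remove a directory tree bottom-up like `rm -rf`, but keep going after
    failures and return them as (path, errno name) pairs."""
    failures = []

    def record(exc, p):
        failures.append((p, errno.errorcode.get(exc.errno, str(exc.errno))))

    def attempt(func, p):
        try:
            func(p)
        except OSError as exc:
            record(exc, p)

    def on_walk_error(exc):
        # readdir failures (for example EIO on a corrupted directory)
        record(exc, exc.filename)

    for root, dirs, files in os.walk(path, topdown=False, onerror=on_walk_error):
        for name in files:
            attempt(os.unlink, os.path.join(root, name))
        for name in dirs:
            p = os.path.join(root, name)
            # symlinks to directories are listed in dirs but must be unlinked
            attempt(os.unlink if os.path.islink(p) else os.rmdir, p)
    attempt(os.rmdir, path)
    return failures
```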

    Cheers, Andreas
    --
    Andreas Dilger
    Sr. Staff Engineer, Lustre Group
    Sun Microsystems of Canada, Inc.
