ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c - Kernel
-
ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
Hi,
I get this with both clean v2.6.26 and latest -git
(33af79d12e0fa25545d49e86afc67ea8ad5f2f40):
BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [] journal_dirty_metadata+0xa0/0x160
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 4935, comm: rm Not tainted (2.6.26-03414-g33af79d #39)
EIP: 0060:[] EFLAGS: 00210246 CPU: 1
EIP is at journal_dirty_metadata+0xa0/0x160
EAX: 00000000 EBX: cca59160 ECX: 00000001 EDX: f5114000
ESI: 00000000 EDI: f3d27750 EBP: f5115d58 ESP: f5115d40
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 4935, ti=f5114000 task=f6a04fb0 task.ti=f5114000)
Stack: 00000001 f77d0050 cca00c90 f3d27750 f77d0050 f3d27750 f5115d78 c01f9eff
00000001 00000001 c05c2a53 f3d27750 00000000 f60da560 f5115da8 c01ef9ef
00000001 00000001 f60da560 f60da800 f3d27750 f3cc5944 f77d0050 f3d27750
Call Trace:
[] ? __ext3_journal_dirty_metadata+0x1f/0x50
[] ? ext3_free_data+0x9f/0x100
[] ? ext3_free_branches+0x23b/0x250
[] ? sync_buffer+0x0/0x40
[] ? ext3_free_branches+0xae/0x250
[] ? ext3_free_branches+0xae/0x250
[] ? ext3_truncate+0x5c8/0x940
[] ? trace_hardirqs_on_caller+0x116/0x170
[] ? journal_start+0xb0/0x110
[] ? journal_start+0xd3/0x110
[] ? journal_start+0xb0/0x110
[] ? ext3_journal_start_sb+0x29/0x50
[] ? ext3_delete_inode+0xd7/0xe0
[] ? ext3_delete_inode+0x0/0xe0
[] ? generic_delete_inode+0x62/0xe0
[] ? generic_drop_inode+0x11d/0x170
[] ? iput+0x47/0x50
[] ? do_unlinkat+0xec/0x170
[] ? trace_hardirqs_on_thunk+0xc/0x10
[] ? do_page_fault+0x0/0x880
[] ? trace_hardirqs_on_caller+0x116/0x170
[] ? sys_unlinkat+0x23/0x50
[] ? sysenter_past_esp+0x78/0xc5
=======================
Code: b8 01 00 00 00 e8 f1 57 f3 ff 89 e0 25 00 e0 ff ff f6 40 08 08
74 05 e8 2f e6 3a 00 83 c4 0c 31 c0 5b 5e 5f 5d c3 90 8d 74 26 00 <8b>
46 0c 85 c0 0f 84 8c 00 00 00 39 5e 18 74 68 8d 47 02 89 45
EIP: [] journal_dirty_metadata+0xa0/0x160 SS:ESP 0068:f5115d40
---[ end trace ad9c7bca1cad9e55 ]---
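A note on reading the Code: line above: the byte in angle brackets is where the faulting instruction starts. 8b is the x86 opcode for a 32-bit MOV from memory into a register, 46 is its ModRM byte and 0c is an 8-bit displacement. The sketch below splits a ModRM byte into its fields. The helper names are made up for illustration, and only the [base + disp8] form is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

/* Split a ModRM byte into its three fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;

    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm = byte & 0x7;
    return m;
}

/* Effective address for the mod == 1 form: base register plus signed disp8. */
uint32_t effective_address(uint32_t base, int8_t disp8)
{
    return base + (uint32_t)(int32_t)disp8;
}
```

For 0x46 this gives mod = 1, reg = 0 (EAX) and rm = 6 (ESI), i.e. a load of EAX from ESI + 0x0c. With ESI = 00000000 in the register dump, that is the reported address 0000000c.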
This corresponds to "jh" being NULL in journal_dirty_metadata():
if (jh->b_modified == 0) {
I also tried with this patch, but without success:
http://folk.uio.no/vegardno/linux/jbd-transaction.patch
so the problem seems quite reproducible by intentionally corrupting a
disk image.
Vegard
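The link between the fault address 0000000c and a NULL "jh" can be checked with offsetof. This is only a sketch: the struct below is a hypothetical stand-in for the leading members of struct journal_head on a 32-bit kernel, the member order is an assumption, and the pointer is modelled as a uint32_t so the layout is the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first members of struct journal_head (32-bit). */
struct jh_layout_sketch {
    uint32_t b_bh;        /* stand-in for a 32-bit pointer */
    int32_t  b_jcount;
    uint32_t b_jlist;
    uint32_t b_modified;  /* the member read by the faulting line */
};

/* Address the CPU touches when reading a member through a given base pointer. */
uint32_t fault_address(uint32_t base, size_t member_offset)
{
    return base + (uint32_t)member_offset;
}
```

Under these assumptions b_modified sits 12 bytes into the structure, so reading it through a NULL base touches address 0x0c.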
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 8:51 AM, Vegard Nossum wrote:
> Hi,
>
> I get this with both clean v2.6.26 and latest -git
> (33af79d12e0fa25545d49e86afc67ea8ad5f2f40):
>
Thanks for the report, do you happen to have any messages above the
panic message that would indicate if there was some sort of fs error
that was hit before the panic? That would help me figure out what
exactly happened. Thanks,
Josef
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 3:13 PM, Josef Bacik wrote:
> On Thu, Jul 17, 2008 at 8:51 AM, Vegard Nossum wrote:
>> Hi,
>>
>> I get this with both clean v2.6.26 and latest -git
>> (33af79d12e0fa25545d49e86afc67ea8ad5f2f40):
>>
>
> Thanks for the report, do you happen to have any messages above the
> panic message that would indicate if there was some sort of fs error
> that was hit before the panic? That would help me figure out what
> exactly happened. Thanks,
Yeah, the full log exists at
http://folk.uio.no/vegardno/linux/log-1216293934.txt
I think this is the interesting part:
kjournald starting. Commit interval 5 seconds
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on loop0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared
for block 1507
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1050159, count = 1
EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 2048, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 102, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 496, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 245, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 8192, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 8192, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 256, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1216293753, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1216293753, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1703965, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 257875, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1216293738, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 15552000, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 11, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 128, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 52, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 6, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 3246399477, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 860559364, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1659021227, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 2723221558, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 458752, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 8, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 11, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1092049505, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 2823687499, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 2276435116, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 2703362334, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 258, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1216293738, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 58, count = 13
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 327, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1048576, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 1, count = 1
BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [] journal_dirty_metadata+0xa0/0x160
Would it also help to reproduce with jbd/ext3 debug enabled? (If it
isn't already.)
Vegard
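For readers unfamiliar with the "Freeing blocks not in datazone" message: it comes from a range check on the block number and count before anything is freed. Below is a userspace sketch of such a check. The struct and function names are hypothetical, and the exact conditions in ext3 may differ from this reconstruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two superblock limits the check needs. */
struct sb_limits_sketch {
    uint32_t first_data_block;
    uint32_t blocks_count;
};

/* Returns true when [block, block + count) lies inside the data zone. */
bool range_in_datazone(const struct sb_limits_sketch *sb,
                       uint32_t block, uint32_t count)
{
    uint32_t end = block + count; /* may wrap around; checked below */

    if (block < sb->first_data_block)
        return false;
    if (end < block)              /* block + count overflowed */
        return false;
    if (end > sb->blocks_count)
        return false;
    return true;
}
```

The limits used to exercise this are illustrative only and are not taken from the corrupted image.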
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 9:20 AM, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 3:13 PM, Josef Bacik wrote:
>> On Thu, Jul 17, 2008 at 8:51 AM, Vegard Nossum wrote:
>>> Hi,
>>>
>>> I get this with both clean v2.6.26 and latest -git
>>> (33af79d12e0fa25545d49e86afc67ea8ad5f2f40):
>>>
>>
>> Thanks for the report, do you happen to have any messages above the
>> panic message that would indicate if there was some sort of fs error
>> that was hit before the panic? That would help me figure out what
>> exactly happened. Thanks,
>
> Yeah, the full log exists at
>
> http://folk.uio.no/vegardno/linux/log-1216293934.txt
>
> I think this is the interesting part:
Hmm well the journal should have aborted, but it looks like it didn't,
are you mounting with errors=continue by any chance? Thanks much,
Josef
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 3:34 PM, Josef Bacik wrote:
>> Yeah, the full log exists at
>>
>> http://folk.uio.no/vegardno/linux/log-1216293934.txt
>>
>> I think this is the interesting part:
>
> Hmm well the journal should have aborted, but it looks like it didn't,
> are you mounting with errors=continue by any chance? Thanks much,
No, this is the command I used:
mount -o loop disk mnt
I think this looks interesting:
EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure
The code in ext3_reserve_inode_write() is here:
	err = ext3_journal_get_write_access(handle, iloc->bh);
	if (err) {
		brelse(iloc->bh);
		iloc->bh = NULL;
	}
Maybe it should do something different here?
But I don't know :-)
Thanks for helping out!
Vegard
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 03:39:24PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 3:34 PM, Josef Bacik wrote:
> >> Yeah, the full log exists at
> >>
> >> http://folk.uio.no/vegardno/linux/log-1216293934.txt
> >>
> >> I think this is the interesting part:
> >
> > Hmm well the journal should have aborted, but it looks like it didn't,
> > are you mounting with errors=continue by any chance? Thanks much,
>
> No, this is the command I used:
>
> mount -o loop disk mnt
>
> I think this looks interesting:
>
> EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure
>
> The code in ext3_reserve_inode_write() is here:
>
> 	err = ext3_journal_get_write_access(handle, iloc->bh);
> 	if (err) {
> 		brelse(iloc->bh);
> 		iloc->bh = NULL;
> 	}
>
> Maybe it should do something different here?
>
> But I don't know :-)
>
> Thanks for helping out!
>
Well this is really odd, after that we call ext3_std_error which calls
journal_abort, so when we come into journal_dirty_metadata is_handle_aborted()
should have returned 1 and we should have just exited. I'm going to have to
think on this for a bit.
Josef
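The behaviour Josef expects can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: every name is a hypothetical stand-in, and the explicit NULL test exists only so the model can report the case that oopses in the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct journal_sketch { bool aborted; };
struct handle_sketch { struct journal_sketch *journal; };
struct jh_sketch { unsigned modified; };

/* Stand-in for aborting the journal after a filesystem error. */
void sketch_journal_abort(struct journal_sketch *j)
{
    j->aborted = true;
}

/* Stand-in for the "is this handle aborted?" test. */
bool sketch_handle_aborted(const struct handle_sketch *h)
{
    return h->journal->aborted;
}

/*
 * Returns 0 when skipped because the handle is aborted, 1 when the buffer
 * was marked dirty, and -1 when jh is NULL on a live handle (the case
 * that shows up as a NULL dereference in the report).
 */
int sketch_dirty_metadata(struct handle_sketch *h, struct jh_sketch *jh)
{
    if (sketch_handle_aborted(h))
        return 0;
    if (jh == NULL)
        return -1;
    jh->modified = 1;
    return 1;
}
```

If the abort really happened, a NULL jh is never looked at. Reaching the NULL read therefore suggests that the journal was still live when the call was made.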
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 03:39:24PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 3:34 PM, Josef Bacik wrote:
> >> Yeah, the full log exists at
> >>
> >> http://folk.uio.no/vegardno/linux/log-1216293934.txt
> >>
> >> I think this is the interesting part:
> >
> > Hmm well the journal should have aborted, but it looks like it didn't,
> > are you mounting with errors=continue by any chance? Thanks much,
>
> No, this is the command I used:
>
> mount -o loop disk mnt
>
> I think this looks interesting:
>
> EXT3-fs error (device loop0) in ext3_reserve_inode_write: IO failure
>
> The code in ext3_reserve_inode_write() is here:
>
> 	err = ext3_journal_get_write_access(handle, iloc->bh);
> 	if (err) {
> 		brelse(iloc->bh);
> 		iloc->bh = NULL;
> 	}
>
> Maybe it should do something different here?
>
> But I don't know :-)
>
> Thanks for helping out!
>
>
Can you try this patch out and see if it fixes the problem? I didn't compile-test
it, so you may need to tweak some things, but it should work. Thanks,
Signed-off-by: Josef Bacik
Index: linux-2.6/fs/ext3/inode.c
===================================================================
--- linux-2.6.orig/fs/ext3/inode.c
+++ linux-2.6/fs/ext3/inode.c
@@ -2023,13 +2023,27 @@ static void ext3_clear_blocks(handle_t *
 		unsigned long count, __le32 *first, __le32 *last)
 {
 	__le32 *p;
+	int ret;
+
 	if (try_to_extend_transaction(handle, inode)) {
 		if (bh) {
 			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-			ext3_journal_dirty_metadata(handle, bh);
+			ret = ext3_journal_dirty_metadata(handle, bh);
+			if (ret) {
+				ext3_std_error(inode->i_sb, ret);
+				return;
+			}
 		}
-		ext3_mark_inode_dirty(handle, inode);
-		ext3_journal_test_restart(handle, inode);
+		ret = ext3_mark_inode_dirty(handle, inode);
+		if (ret)
+			return;
+
+		ret = ext3_journal_test_restart(handle, inode);
+		if (ret) {
+			ext3_std_error(inode->i_sb, ret);
+			return;
+		}
+
 		if (bh) {
 			BUFFER_TRACE(bh, "retaking write access");
 			ext3_journal_get_write_access(handle, bh);
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
> Can you try this patch out and see if it fixes the problem? I didn't compile-test
> it, so you may need to tweak some things, but it should work. Thanks,
>
> Signed-off-by: Josef Bacik
Nope, seems to be the same problem:
kjournald starting. Commit interval 5 seconds
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on loop0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
system zones - Block = 16, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
system zones - Block = 32, count = 1
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 262144, count = 1
BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [] journal_dirty_metadata+0xa0/0x160
Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
Vegard
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 04:35:16PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
> > On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
> >> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
> >> > Can you try this patch out and see if it fixes the problem? I didn't compile-test
> >> > it, so you may need to tweak some things, but it should work. Thanks,
> >> >
> >> > Signed-off-by: Josef Bacik
> >>
> >> Nope, seems to be the same problem:
> >>
> >> kjournald starting. Commit interval 5 seconds
> >> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
> >> EXT3 FS on loop0, internal journal
> >> EXT3-fs: mounted filesystem with ordered data mode.
> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
> >> system zones - Block = 16, count = 1
> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
> >> system zones - Block = 32, count = 1
> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
> >> datazone - block = 262144, count = 1
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000c
> >> IP: [] journal_dirty_metadata+0xa0/0x160
> >>
> >> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
> >>
> >>
> >
> > Ok, this should do it then. Thanks,
>
> Hm, it doesn't apply. Should I revert the previous patch?
>
Yeah sorry about that.
Josef
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
> > Can you try this patch out and see if it fixes the problem? I didn't compile-test
> > it, so you may need to tweak some things, but it should work. Thanks,
> >
> > Signed-off-by: Josef Bacik
>
> Nope, seems to be the same problem:
>
> kjournald starting. Commit interval 5 seconds
> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
> EXT3 FS on loop0, internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
> system zones - Block = 16, count = 1
> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
> system zones - Block = 32, count = 1
> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
> datazone - block = 262144, count = 1
> BUG: unable to handle kernel NULL pointer dereference at 0000000c
> IP: [] journal_dirty_metadata+0xa0/0x160
>
> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
>
>
Ok, this should do it then. Thanks,
Josef
Index: linux-2.6/fs/ext3/inode.c
===================================================================
--- linux-2.6.orig/fs/ext3/inode.c
+++ linux-2.6/fs/ext3/inode.c
@@ -2023,13 +2023,27 @@ static void ext3_clear_blocks(handle_t *
 		unsigned long count, __le32 *first, __le32 *last)
 {
 	__le32 *p;
+	int ret;
+
 	if (try_to_extend_transaction(handle, inode)) {
 		if (bh) {
 			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-			ext3_journal_dirty_metadata(handle, bh);
+			ret = ext3_journal_dirty_metadata(handle, bh);
+			if (ret) {
+				ext3_std_error(inode->i_sb, ret);
+				return;
+			}
 		}
-		ext3_mark_inode_dirty(handle, inode);
-		ext3_journal_test_restart(handle, inode);
+		ret = ext3_mark_inode_dirty(handle, inode);
+		if (ret)
+			return;
+
+		ret = ext3_journal_test_restart(handle, inode);
+		if (ret) {
+			ext3_std_error(inode->i_sb, ret);
+			return;
+		}
+
 		if (bh) {
 			BUFFER_TRACE(bh, "retaking write access");
 			ext3_journal_get_write_access(handle, bh);
Index: linux-2.6/fs/ext3/balloc.c
===================================================================
--- linux-2.6.orig/fs/ext3/balloc.c
+++ linux-2.6/fs/ext3/balloc.c
@@ -498,6 +498,7 @@ void ext3_free_blocks_sb(handle_t *handl
 		ext3_error (sb, "ext3_free_blocks",
 			    "Freeing blocks not in datazone - "
 			    "block = "E3FSBLK", count = %lu", block, count);
+		err = -EIO;
 		goto error_return;
 	}
@@ -535,6 +536,7 @@ do_more:
 			    "Freeing blocks in system zones - "
 			    "Block = "E3FSBLK", count = %lu",
 			    block, count);
+		err = -EIO;
 		goto error_return;
 	}
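The two balloc.c hunks only add an assignment before an existing goto. The sketch below shows why that can matter, assuming the code at the error_return label passes err to a reporting hook that ignores a zero value. All names here are hypothetical stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Records the last error passed to the hypothetical reporting hook. */
int sketch_last_reported;

/* Stand-in for a "standard error" hook: does nothing when err is 0. */
void sketch_std_error(int err)
{
    if (err)
        sketch_last_reported = err;
}

/*
 * Rejects a block at or beyond `limit`. With set_err == 0 this mimics the
 * code before the hunks: the bad block is detected but err stays 0, so the
 * hook at error_return has nothing to report.
 */
int sketch_free_block(unsigned long block, unsigned long limit, int set_err)
{
    int err = 0;

    if (block >= limit) {
        if (set_err)
            err = -EIO;
        goto error_return;
    }
    /* ... the block would be released here ... */
error_return:
    sketch_std_error(err);
    return err;
}
```

In this model the out-of-range case is reported only once err is set before the jump.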
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
> On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
>> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
>> > Can you try this patch out and see if it fixes the problem? I didn't compile-test
>> > it, so you may need to tweak some things, but it should work. Thanks,
>> >
>> > Signed-off-by: Josef Bacik
>>
>> Nope, seems to be the same problem:
>>
>> kjournald starting. Commit interval 5 seconds
>> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
>> EXT3 FS on loop0, internal journal
>> EXT3-fs: mounted filesystem with ordered data mode.
>> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
>> system zones - Block = 16, count = 1
>> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
>> system zones - Block = 32, count = 1
>> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
>> datazone - block = 262144, count = 1
>> BUG: unable to handle kernel NULL pointer dereference at 0000000c
>> IP: [] journal_dirty_metadata+0xa0/0x160
>>
>> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
>>
>>
>
> Ok, this should do it then. Thanks,
Hm, it doesn't apply. Should I revert the previous patch?
Vegard
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 4:16 PM, Josef Bacik wrote:
> On Thu, Jul 17, 2008 at 04:35:16PM +0200, Vegard Nossum wrote:
>> On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
>> > On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
>> >> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
>> >> > Can you try this patch out and see if it fixes the problem? I didn't compile-test
>> >> > it, so you may need to tweak some things, but it should work. Thanks,
>> >> >
>> >> > Signed-off-by: Josef Bacik
>> >>
>> >> Nope, seems to be the same problem:
>> >>
>> >> kjournald starting. Commit interval 5 seconds
>> >> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
>> >> EXT3 FS on loop0, internal journal
>> >> EXT3-fs: mounted filesystem with ordered data mode.
>> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
>> >> system zones - Block = 16, count = 1
>> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
>> >> system zones - Block = 32, count = 1
>> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
>> >> datazone - block = 262144, count = 1
>> >> BUG: unable to handle kernel NULL pointer dereference at 0000000c
>> >> IP: [] journal_dirty_metadata+0xa0/0x160
>> >>
>> >> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
>> >>
>> >>
>> >
>> > Ok, this should do it then. Thanks,
-ENOLUCK
EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
datazone - block = 524288, count = 1
EXT3-fs error (device loop0) in ext3_free_blocks_sb: IO failure
BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [] journal_dirty_metadata+0xa0/0x160
It did seem to get further, though.
http://folk.uio.no/vegardno/linux/log-1216306142.txt
Vegard
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 04:44:47PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 4:16 PM, Josef Bacik wrote:
> > On Thu, Jul 17, 2008 at 04:35:16PM +0200, Vegard Nossum wrote:
> >> On Thu, Jul 17, 2008 at 4:13 PM, Josef Bacik wrote:
> >> > On Thu, Jul 17, 2008 at 04:25:49PM +0200, Vegard Nossum wrote:
> >> >> On Thu, Jul 17, 2008 at 3:57 PM, Josef Bacik wrote:
> >> >> > Can you try this patch out and see if it fixes the problem? I didn't compile-test
> >> >> > it, so you may need to tweak some things, but it should work. Thanks,
> >> >> >
> >> >> > Signed-off-by: Josef Bacik
> >> >>
> >> >> Nope, seems to be the same problem:
> >> >>
> >> >> kjournald starting. Commit interval 5 seconds
> >> >> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
> >> >> EXT3 FS on loop0, internal journal
> >> >> EXT3-fs: mounted filesystem with ordered data mode.
> >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
> >> >> system zones - Block = 16, count = 1
> >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in
> >> >> system zones - Block = 32, count = 1
> >> >> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
> >> >> datazone - block = 262144, count = 1
> >> >> BUG: unable to handle kernel NULL pointer dereference at 0000000c
> >> >> IP: [] journal_dirty_metadata+0xa0/0x160
> >> >>
> >> >> Full log at http://folk.uio.no/vegardno/linux/log-1216304953.txt
> >> >>
> >> >>
> >> >
> >> > Ok, this should do it then. Thanks,
>
> -ENOLUCK
>
> EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in
> datazone - block = 524288, count = 1
> EXT3-fs error (device loop0) in ext3_free_blocks_sb: IO failure
> BUG: unable to handle kernel NULL pointer dereference at 0000000c
> IP: [] journal_dirty_metadata+0xa0/0x160
>
> It did seem to get further, though.
>
> http://folk.uio.no/vegardno/linux/log-1216306142.txt
>
OK, run dumpe2fs -h on your image and see if you have a line that says
Errors behavior: Continue
If you do, run tune2fs -e remount-ro on the image and then do the mount. That would
explain why you are still having panics even though we should be aborting the journal.
Thanks,
Josef
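Josef's reasoning here is that the error policy decides whether the journal is aborted at all. A minimal model of that idea follows. The enum, struct and function are hypothetical stand-ins, and the real ext3 error path has more cases than this.

```c
#include <assert.h>
#include <stdbool.h>

enum errors_policy_sketch {
    POLICY_CONTINUE,
    POLICY_REMOUNT_RO,
    POLICY_PANIC
};

struct fs_sketch {
    enum errors_policy_sketch policy;
    bool journal_aborted;
    bool read_only;
};

/* Called whenever the filesystem detects an inconsistency. */
void sketch_handle_error(struct fs_sketch *fs)
{
    if (fs->policy == POLICY_CONTINUE)
        return; /* keep going: the journal stays live */

    fs->journal_aborted = true;
    if (fs->policy == POLICY_REMOUNT_RO)
        fs->read_only = true;
    /* POLICY_PANIC would halt the machine; not modelled here */
}
```

Under "continue" the journal is never marked aborted in this model, so later journal calls still run against the damaged metadata.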
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 4:33 PM, Josef Bacik wrote:
> OK, run dumpe2fs -h on your image and see if you have a line that says
>
> Errors behavior: Continue
>
> If you do, run tune2fs -e remount-ro on the image and then do the mount. That would
> explain why you are still having panics even though we should be aborting the journal.
> Thanks,
Ahh, that probably explains it. I didn't realize there was such a thing.
I am doing random-corruption tests, so it is quite possible that this
bit gets set anywhere along the road...
But even so, is it correct that the kernel should crash? It seems
quite possible that error behaviour can change (like this) even with
"normal" corruption, e.g. outside my test scripts.
But I cannot even run dumpe2fs on my image (even with -f switch):
dumpe2fs: Bad magic number in super-block while trying to open disk
Vegard
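The "Bad magic number" message refers to the 16-bit magic value in the superblock. In the ext2/ext3 on-disk format the primary superblock starts 1024 bytes into the device, and the little-endian magic 0xEF53 sits 56 bytes into the superblock. The check can be sketched as below. The function names are made up, and this looks at the primary superblock only, which is not necessarily everything dumpe2fs does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET     1024u   /* primary superblock: 1024 bytes into the image */
#define SB_MAGIC_OFF  56u     /* s_magic: 16-bit little-endian field */
#define EXT_MAGIC     0xEF53u

/* Read a 16-bit little-endian value from a byte buffer. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns true if the in-memory image carries the ext2/ext3 magic. */
bool image_has_ext_magic(const uint8_t *image, size_t len)
{
    if (len < SB_OFFSET + SB_MAGIC_OFF + 2)
        return false;
    return read_le16(image + SB_OFFSET + SB_MAGIC_OFF) == EXT_MAGIC;
}
```

A random-corruption script that touches either of those two bytes is enough to make this test fail.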
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
Vegard,
How big is the filesystem? Is there any chance you can make a
compressed e2image of the file? (This will not include file contents,
but does reveal the names of the files.) Given the nature of the bug
you are reporting, it should be safe to scramble the filenames using
the -s option if that would make you feel more comfortable.
The quick version is:
e2image -r /dev/loop0 - | bzip2 > badfs.e2i.bz2
Then folks like Josef would be able to test your filesystem right
away, instead of asking you to test it.
- Ted
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 5:08 PM, Theodore Tso wrote:
> Vegard,
>
> How big is the filesystem? Is there any chance you can make a
> compressed e2image of the file? (This will not include file contents,
> but does reveal the names of the files.) Given the nature of the bug
> you are reporting, it should be safe to scramble the filenames using
> the -s option if that would make you feel more comfortable.
Oh, just 2M. It doesn't contain anything but copies of /bin/bash.
I basically just made a crash-tester script that corrupts a dummy
filesystem on purpose. But it seems that it might be partly my own
fault for not protecting the bits in the filesystem image that say
"oh, proceed on error". But I do have a feeling that the filesystem
should not be able to say this in the first place. Because those bits
can be corrupted legitimately in other ways too!
http://folk.uio.no/vegardno/linux/corrupt.tar.bz2
Is there a way to override the
"Errors behavior: Continue"
information which is present in the filesystem?
Vegard
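For readers who want to keep a crash-tester from flipping the "proceed on error" bits, here is a minimal Python sketch of random bit-flipping that skips a protected byte range. It is not the script from the tarball above. The function name corrupt() and its parameters are hypothetical, and the offsets assume the usual ext2/ext3 layout (primary superblock at byte 1024, 16-bit s_errors field at offset 60 within it).

```python
import random

# Assumed ext2/ext3 layout: primary superblock at byte 1024,
# 16-bit s_errors field at offset 60 inside the superblock.
SUPERBLOCK_OFFSET = 1024
S_ERRORS_OFFSET = 60
PROTECTED = range(SUPERBLOCK_OFFSET + S_ERRORS_OFFSET,
                  SUPERBLOCK_OFFSET + S_ERRORS_OFFSET + 2)


def corrupt(image: bytes, flips: int, seed: int = 0,
            protected: range = PROTECTED) -> bytes:
    """Flip `flips` random bits in `image`, never touching `protected` offsets.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)          # seeded so a crashing run can be replayed
    data = bytearray(image)
    # Every byte offset outside the protected range may be corrupted.
    candidates = [i for i in range(len(data)) if i not in protected]
    for _ in range(flips):
        pos = rng.choice(candidates)
        data[pos] ^= 1 << rng.randrange(8)   # flip one bit of that byte
    return bytes(data)
```

Calling corrupt(open("image", "rb").read(), flips=100, seed=42) returns a damaged copy whose s_errors bytes are unchanged. Keeping the seed makes a particular oops reproducible.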
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 05:16:39PM +0200, Vegard Nossum wrote:
> Is there a way to override the
>
> "Errors behavior: Continue"
>
> information which is present in the filesystem?
Yep:
tune2fs -e remount-ro /dev/XXXX
I should probably make the default configurable, and not "continue"....
- Ted
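As a rough illustration of where the "Errors behavior" setting lives on disk, the following Python sketch reads it straight from an image without going through dumpe2fs. It assumes the standard ext2/ext3 superblock layout (superblock at byte 1024, s_magic at offset 56, s_errors at offset 60, both little-endian 16-bit). The function name errors_behavior() is hypothetical and the labels only approximate what dumpe2fs -h prints.

```python
import struct

# Assumed ext2/ext3 on-disk layout (little-endian fields).
SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes into the device
S_MAGIC_OFFSET = 56        # s_magic, 16 bits
S_ERRORS_OFFSET = 60       # s_errors, 16 bits
EXT_MAGIC = 0xEF53

# Labels chosen to resemble dumpe2fs output; treat them as approximate.
ERRORS_BEHAVIOR = {1: "Continue", 2: "Remount read-only", 3: "Panic"}


def errors_behavior(image: bytes) -> str:
    """Return the errors behavior stored in an ext2/ext3 image (hypothetical helper)."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < S_ERRORS_OFFSET + 2:
        raise ValueError("image too small to contain a superblock")
    (magic,) = struct.unpack_from("<H", sb, S_MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError("bad superblock magic 0x%04x" % magic)
    (errors,) = struct.unpack_from("<H", sb, S_ERRORS_OFFSET)
    # Corruption can leave any value here, so unknown codes are reported as such.
    return ERRORS_BEHAVIOR.get(errors, "Unknown (%d)" % errors)
```

A value other than 1, 2 or 3 is reported as unknown rather than raising, since a corrupted image can contain anything in that field.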
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Thu, Jul 17, 2008 at 05:00:07PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 4:33 PM, Josef Bacik wrote:
> > Ok run dumpe2fs -h on your image and see if you have a line that says
> >
> > Errors behavior: Continue
> >
> > if you do, run tune2fs -e remount-ro and then do the mount. That would explain
> > why you are still having panics even though we should be aborting the journal.
> > Thanks,
>
> Ahh, that probably explains it. I didn't realize there was such a thing.
>
> I am doing random-corruption tests, so it is quite possible that this
> bit gets set anywhere along the road...
>
> But even so, is it correct that the kernel should crash? It seems
> quite possible that error behaviour can change (like this) even with
> "normal" corruption, e.g. outside my test scripts.
>
Yeah, that's a hard question to answer, one that I will leave up to others who
have been doing this much longer than I. My thought is remount-ro is there to
keep you from crashing, so if you have errors=continue then you expect to live
with the consequences. Of course, if that bit gets flipped via corruption, that's not
good either. Thanks,
Josef
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Jul 17, 2008 11:40 -0400, Theodore Ts'o wrote:
> On Thu, Jul 17, 2008 at 05:16:39PM +0200, Vegard Nossum wrote:
> > Is there a way to override the
> >
> > "Errors behavior: Continue"
> >
> > information which is present in the filesystem?
>
> tune2fs -e remount-ro /dev/XXXX
>
> I should probably make the default configurable, and not "continue"....
Yes, it has been that way on Debian for many years... I was going to
say the same thing, but you beat me to it.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
-
Re: ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference at 0000000c
On Jul 17, 2008 10:43 -0400, Josef Bacik wrote:
> Yeah, that's a hard question to answer, one that I will leave up to others
> who have been doing this much longer than I. My thought is remount-ro
> is there to keep you from crashing, so if you have errors=continue then
> you expect to live with the consequences. Of course, if that bit gets flipped
> via corruption, that's not good either.
It shouldn't cause the kernel to crash, but it should definitely return
an error to the application. This is probably one of the code paths
that the Coverity folks were reporting on in FAST this year where on-disk
errors are not propagated to the application.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.