BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers - Kernel

Thread: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

  1. BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

    On x86_64, during testing using "stress" package:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    IP: [] drop_buffers+0x2f/0xfb
    PGD 1ee8ad067 PUD 26f19a067 PMD 0
    Oops: 0000 [1] SMP
    CPU 3
    Modules linked in: parport_pc lp parport tg3 cciss ehci_hcd ohci_hcd uhci_hcd
    Pid: 16860, comm: stress Not tainted 2.6.26-rc1-git8 #1
    RIP: 0010:[] [] drop_buffers+0x2f/0xfb
    RSP: 0000:ffff81026bc03a08 EFLAGS: 00010203
    RAX: 0000000000000000 RBX: ffffe20008bae680 RCX: ffff81027f490f00
    RDX: 0000000000000000 RSI: ffff81026bc03a58 RDI: ffffe20008bae680
    RBP: ffff81026bc03a38 R08: ffff81026bc03b78 R09: ffff810001103780
    R10: ffff81026bc03a08 R11: ffff81026bc03c88 R12: ffffe20008bae680
    R13: ffff81027c412850 R14: ffff81026bc03d58 R15: ffff81026bc03a58
    FS: 00007fa9e7e416f0(0000) GS:ffff81027f806980(0000) knlGS:00000000f7f856c0
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 000000027f973000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process stress (pid: 16860, threadinfo ffff81026bc02000, task ffff81027e424c50)
    Stack: ffffe20008bafa68 ffffe20008bae680 ffff81027f490f00 ffff81026bc03a58
    ffff81026bc03d58 ffff81026bc03c88 ffff81026bc03a78 ffffffff802ad39f
    ffff81027f490f00 ffffe20008b14060 0000000000000000 ffff81027f490f00
    Call Trace:
    [] try_to_free_buffers+0x60/0xa2
    [] try_to_release_page+0x3b/0x41
    [] shrink_page_list+0x457/0x562
    [] shrink_inactive_list+0x126/0x361
    [] shrink_zone+0xe5/0x10a
    [] try_to_free_pages+0x1ef/0x326
    [] ? isolate_pages_global+0x0/0x34
    [] __alloc_pages_internal+0x25a/0x3ad
    [] __alloc_pages+0xb/0xd
    [] handle_mm_fault+0x238/0x6d0
    [] do_page_fault+0x438/0x7de
    [] error_exit+0x0/0x51


    Code: 41 57 49 89 f7 41 56 41 55 41 54 49 89 fc 53 48 83 ec 08 48 8b 07 25 00 08 00 00 48 85 c0 75 04 0f 0b eb fe 4c 8b 6f 10 4c 89 ea <48> 8b 02 25 00 08 00 00 48 85 c0 74 10 49 8b 44 24 18 48 85 c0
    RIP [] drop_buffers+0x2f/0xfb
    RSP
    CR2: 0000000000000000
    Kernel panic - not syncing: Fatal exception
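
    A note on reading the trace above: each symbol+0xOFF/0xSIZE token gives the byte offset into the function and the function's total size, so drop_buffers+0x2f/0xfb means 0x2f bytes into a 0xfb-byte drop_buffers(). Below is a minimal userspace sketch of splitting such a token into its parts. The names oops_sym and parse_oops_sym are hypothetical and exist only for this illustration; they are not kernel APIs.

```c
#include <stdio.h>

/* One "symbol+0xOFFSET/0xSIZE" token from an oops line (illustrative type). */
struct oops_sym {
	char name[64];
	unsigned long offset; /* bytes from the start of the function */
	unsigned long size;   /* total size of the function in bytes */
};

/* Returns 0 on success, -1 if the token does not have the expected shape. */
static int parse_oops_sym(const char *token, struct oops_sym *out)
{
	/* %63[^+] copies everything before the '+' into name; %lx accepts "0x". */
	if (sscanf(token, "%63[^+]+%lx/%lx",
		   out->name, &out->offset, &out->size) != 3)
		return -1;

	/* The faulting offset must lie inside the function. */
	if (out->offset >= out->size)
		return -1;

	return 0;
}
```

    With the faulting token from this report, parse_oops_sym("drop_buffers+0x2f/0xfb", &s) fills in name "drop_buffers", offset 0x2f, and size 0xfb.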

    ---
    ~Randy
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

    On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap wrote:

    > On x86_64, during testing using "stress" package:
    >
    > [oops trace snipped; see the first post]


    Seems that local variable `bh' is NULL.

    I wonder what the heck we did to cause that. Which filesystems were in
    use?


  3. Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

    Andrew Morton wrote:
    > On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap wrote:
    >
    >> On x86_64, during testing using "stress" package:
    >>
    >> [oops trace snipped; see the first post]

    >
    > Seems that local variable `bh' is NULL.
    >
    > I wonder what the heck we did to cause that. Which filesystems were in
    > use?


    ext3, nfs, and the usual procfs, sysfs, and tmpfs.

    Also in the kernel: debugfs, usbfs, inotifyfs, configfs, ramfs,
    hugetlbfs, msdos, vfat, iso9660, and rootfs.

    --
    ~Randy

  4. Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

    > Andrew Morton wrote:
    > >On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap
    > >wrote:
    > >
    > >>On x86_64, during testing using "stress" package:
    > >>
    > >>[oops trace snipped; see the first post]

    > >
    > >Seems that local variable `bh' is NULL.
    > >
    > >I wonder what the heck we did to cause that. Which filesystems were in
    > >use?

    >
    > ext3, nfs, and the usual procfs, sysfs, and tmpfs.

    Hmm, the page doesn't look like one from ext3 or nfs, because they have
    their own releasepage() calls... In theory it could also be a
    single-bit error setting the PagePrivate bit, but that's just a
    last-resort hope. I don't have a better explanation, though.
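
    To make the single-bit hypothesis concrete: the Code: bytes in the oops contain 25 00 08 00 00 (an AND with 0x800, i.e. bit 11) shortly before 0f 0b (ud2), which looks like the PagePrivate check that guards the buffer walk. The following is only a userspace model, not kernel code. It assumes bit 11 is the private flag, and the names model_page, model_page_private_set, and model_walk_would_fault are hypothetical.

```c
#include <stddef.h>

/* Assumption: bit 11, matching the 0x800 mask visible in the Code: bytes. */
#define MODEL_PG_PRIVATE_BIT 11

/* Stand-in for the two struct page fields that matter here. */
struct model_page {
	unsigned long flags;
	unsigned long private; /* would hold the first buffer_head pointer */
};

/* Models the PagePrivate() test. */
static int model_page_private_set(const struct model_page *p)
{
	return (int)((p->flags >> MODEL_PG_PRIVATE_BIT) & 1UL);
}

/*
 * Returns 1 if a buffer walk would pass the flag check and then start
 * from a NULL pointer, which is the state the thread is speculating about.
 */
static int model_walk_would_fault(const struct model_page *p)
{
	return model_page_private_set(p) && p->private == 0;
}
```

    Starting from a page with no flags and no private data, flipping just bit 11 of flags is enough for model_walk_would_fault() to report a fault, which is why a single bit error cannot be ruled out from the trace alone.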

    > Also in the kernel: debugfs, usbfs, inotifyfs, configfs, ramfs,
    > hugetlbfs, msdos, vfat, iso9660, and rootfs.


    Honza
    --
    Jan Kara
    SuSE CR Labs

  5. Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

    On Monday 12 May 2008, Randy Dunlap wrote:
    > Andrew Morton wrote:
    > > On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap wrote:
    > >> On x86_64, during testing using "stress" package:
    > >>
    > >> [oops trace snipped; see the first post]

    > >
    > > Seems that local variable `bh' is NULL.
    > >
    > > I wonder what the heck we did to cause that. Which filesystems were in
    > > use?

    >
    > ext3, nfs, and the usual procfs, sysfs, and tmpfs.
    >
    > Also in the kernel: debugfs, usbfs, inotifyfs, configfs, ramfs,
    > hugetlbfs, msdos, vfat, iso9660, and rootfs.


    If you stand on your head, and race really really hard,
    nfs_inode_remove_request() does this without locking the page:

    set_page_private(req->wb_page, 0);
    ClearPagePrivate(req->wb_page);

    That code has been around for a long time though.
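
    The window being described can be shown with a small deterministic userspace model (not kernel code). It assumes the two stores happen in the order quoted above and that a reclaim-side reader may observe the page between them without the page lock. The names race_page, writer_step, reader_sees_null_private, and run_interleaving are hypothetical, and the 0x800 flag value is an assumption taken from the mask in the oops Code: bytes.

```c
#include <stddef.h>

/* Assumption: private flag mask, taken from the 0x800 in the oops Code: bytes. */
#define RACE_PG_PRIVATE 0x800UL

struct race_page {
	unsigned long flags;
	void *private;
};

/* The two unlocked stores from the quoted NFS code, one per step. */
static void writer_step(struct race_page *p, int step)
{
	if (step == 0)
		p->private = NULL;            /* set_page_private(page, 0) */
	else
		p->flags &= ~RACE_PG_PRIVATE; /* ClearPagePrivate(page) */
}

/* What a reader sees: flag still set, but nothing behind it. */
static int reader_sees_null_private(const struct race_page *p)
{
	return (p->flags & RACE_PG_PRIVATE) && p->private == NULL;
}

/*
 * Run `observe_after` writer steps (0, 1 or 2), then let the reader look.
 * Returns 1 only for the interleaving that lands between the two stores.
 */
static int run_interleaving(int observe_after)
{
	int payload = 0;
	struct race_page p = { RACE_PG_PRIVATE, &payload };
	int i;

	for (i = 0; i < observe_after; i++)
		writer_step(&p, i);

	return reader_sees_null_private(&p);
}
```

    Only run_interleaving(1) returns 1: before the first store the pointer is still valid, and after the second the flag is gone, so the bad state exists only between the two stores. That matches the "race really really hard" description of how narrow the window is.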

    Probably not the droids we're looking for, but it was the only one that jumped
    out at me during a quick search of set_page_private(foo, 0) callers.

    It seems more likely that we got there by an invalidatepage call that left
    PagePrivate set but didn't allow the page to be freed.

    The page would turn into the funky anonymous zombie thing meant for buffers
    that had to be written before the page could be freed (PagePrivate set but
    page->mapping == NULL), and eventually find its way to try_to_free_buffers().

    The problem with that theory is that I would expect page->private to be
    non-null in such a case. Randy, any chance this can be reproduced?

    -chris

  6. Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers


    --- Original Message ---
    > On Monday 12 May 2008, Randy Dunlap wrote:
    > > Andrew Morton wrote:
    > > > On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap wrote:
    > > >> On x86_64, during testing using "stress" package:
    > > >>
    > > >> [oops trace snipped; see the first post]
    > > >
    > > > Seems that local variable `bh' is NULL.
    > > >
    > > > I wonder what the heck we did to cause that. Which filesystems were in
    > > > use?

    > >
    > > ext3, nfs, and the usual procfs, sysfs, and tmpfs.
    > >
    > > Also in the kernel: debugfs, usbfs, inotifyfs, configfs, ramfs,
    > > hugetlbfs, msdos, vfat, iso9660, and rootfs.

    >
    > If you stand on your head, and race really really hard,
    > nfs_inode_remove_request() does this without locking the page:
    >
    > set_page_private(req->wb_page, 0);
    > ClearPagePrivate(req->wb_page);
    >
    > That code has been around for a long time though.
    >
    > Probably not the droids we're looking for, but it was the only one that jumped
    > out at me during a quick search of set_page_private(foo, 0) callers.
    >
    > It seems more likely that we got there by an invalidatepage call that left
    > PagePrivate set but didn't allow the page to be freed.
    >
    > The page would turn into the funky anonymous zombie thing meant for buffers
    > that had to be written before the page could be freed (PagePrivate set but
    > page->mapping == NULL), and eventually find its way to try_to_free_buffers().
    >
    > The problem with that theory is that I would expect page->private to be
    > non-null in such a case. Randy, any chance this can be reproduced?


    No idea. I'm rerunning the test now.

    ~Randy


  7. Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

    --- Original Message ---
    > On Monday 12 May 2008, Randy Dunlap wrote:
    > > Andrew Morton wrote:
    > > > On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap wrote:
    > > >> On x86_64, during testing using "stress" package:
    > > >>
    > > >> [oops trace snipped; see the first post]
    > > >
    > > > Seems that local variable `bh' is NULL.
    > > >
    > > > I wonder what the heck we did to cause that. Which filesystems were in
    > > > use?

    > >
    > > ext3, nfs, and the usual procfs, sysfs, and tmpfs.
    > >
    > > Also in the kernel: debugfs, usbfs, inotifyfs, configfs, ramfs,
    > > hugetlbfs, msdos, vfat, iso9660, and rootfs.

    >
    > If you stand on your head, and race really really hard,
    > nfs_inode_remove_request() does this without locking the page:
    >
    > set_page_private(req->wb_page, 0);
    > ClearPagePrivate(req->wb_page);
    >
    > That code has been around for a long time though.
    >
    > Probably not the droids we're looking for, but it was the only one that jumped
    > out at me during a quick search of set_page_private(foo, 0) callers.
    >
    > It seems more likely that we got there by an invalidatepage call that left
    > PagePrivate set but didn't allow the page to be freed.
    >
    > The page would turn into the funky anonymous zombie thing meant for buffers
    > that had to be written before the page could be freed (PagePrivate set but
    > page->mapping == NULL), and eventually find its way to try_to_free_buffers().
    >
    > The problem with that theory is that I would expect page->private to be
    > non-null in such a case. Randy, any chance this can be reproduced?


    It didn't fail when I re-ran the test.

    ~Randy


  8. Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers

    On Monday 12 May 2008, Randy Dunlap wrote:
    > --- Original Message ---
    >
    > > On Monday 12 May 2008, Randy Dunlap wrote:
    > > > Andrew Morton wrote:
    > > > > On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap wrote:
    > > > >> On x86_64, during testing using "stress" package:
    > > > >>
    > > > >> [oops trace snipped; see the first post]
    > > > >
    > > > > Seems that local variable `bh' is NULL.
    > > > >
    > > > > I wonder what the heck we did to cause that. Which filesystems were
    > > > > in use?
    > > >
    > > > ext3, nfs, and the usual procfs, sysfs, and tmpfs.
    > > >
    > > > Also in the kernel: debugfs, usbfs, inotifyfs, configfs, ramfs,
    > > > hugetlbfs, msdos, vfat, iso9660, and rootfs.

    > >
    > > If you stand on your head, and race really really hard,
    > > nfs_inode_remove_request() does this without locking the page:
    > >
    > > set_page_private(req->wb_page, 0);
    > > ClearPagePrivate(req->wb_page);
    > >
    > > That code has been around for a long time though.
    > >
    > > Probably not the droids we're looking for, but it was the only one that jumped
    > > out at me during a quick search of set_page_private(foo, 0) callers.
    > >
    > > It seems more likely that we got there by an invalidatepage call that left
    > > PagePrivate set but didn't allow the page to be freed.
    > >
    > > The page would turn into the funky anonymous zombie thing meant for buffers
    > > that had to be written before the page could be freed (PagePrivate set but
    > > page->mapping == NULL), and eventually find its way to try_to_free_buffers().
    > >
    > > The problem with that theory is that I would expect page->private to be
    > > non-null in such a case. Randy, any chance this can be reproduced?

    >
    > It didn't fail when I re-ran the test.


    So, either the teeny tiny NFS race I saw or a really unfortunate single-bit
    flip. We can put a busy loop into the NFS code to make it easier to trigger,
    but I don't think that'll prove it's the bug you hit.

    -chris
