[bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten - Kernel

Thread: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

  1. [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    A regression to v2.6.26:

    I started getting this skb-head corruption message today, on a T60
    laptop with e1000:

    PM: Removing info for No Bus:vcs11
    device: 'vcs11': device_create_release
    =============================================================================
    BUG skbuff_head_cache: Poison overwritten
    -----------------------------------------------------------------------------

    INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b
    INFO: Allocated in __alloc_skb+0x2c/0x110 age=0 cpu=0 pid=5098
    INFO: Freed in __kfree_skb+0x31/0x80 age=0 cpu=1 pid=4440
    INFO: Slab 0xc16cc140 objects=16 used=1 fp=0xf658ae00 flags=0x400000c3
    INFO: Object 0xf658ae00 @offset=3584 fp=0xf658af00

    Bytes b4 0xf658adf0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Object 0xf658ae00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf658ae10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf658ae20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf658ae30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf658ae40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf658ae50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf658ae60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf658ae70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Redzone 0xf658aea0: bb bb bb bb »»»»
    Padding 0xf658aec8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Padding 0xf658aed8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Padding 0xf658aee8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Padding 0xf658aef8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
    Pid: 5098, comm: gdm-binary Not tainted 2.6.26-tip #3094
    [] print_trailer+0xa9/0xf0
    [] check_bytes_and_report+0x9b/0xc0
    [] check_object+0x19e/0x1e0
    [] __slab_alloc+0x371/0x4e0
    [] kmem_cache_alloc+0xb2/0xc0
    [] ? __alloc_skb+0x2c/0x110
    [] ? __alloc_skb+0x2c/0x110
    [] __alloc_skb+0x2c/0x110
    [] find_skb+0x3c/0x80
    [] netpoll_send_udp+0x2b/0x1f0
    [] ? notify_update+0x22/0x30
    [] write_msg+0x95/0xe0
    [] ? write_msg+0x0/0xe0
    [] __call_console_drivers+0x60/0x70
    [] _call_console_drivers+0x79/0x90
    [] release_console_sem+0xc4/0x1f0
    [] vprintk+0x15e/0x3b0
    [] ? release_sysfs_dirent+0x43/0xa0
    [] ? release_sysfs_dirent+0x43/0xa0
    [] ? release_sysfs_dirent+0x43/0xa0
    [] printk+0x1b/0x20
    [] device_create_release+0x27/0x40
    [] device_release+0x15/0x70
    [] kobject_release+0x39/0x80
    [] ? kobject_release+0x0/0x80
    [] kref_put+0x2d/0x70
    [] kobject_put+0x20/0x50
    [] ? kobject_del+0x22/0x30
    [] ? device_del+0x123/0x140
    [] put_device+0xf/0x20
    [] device_unregister+0x35/0x40
    [] device_destroy+0x29/0x30
    [] vcs_remove_sysfs+0x1c/0x40
    [] con_close+0x5e/0x70
    [] release_dev+0x139/0x600
    [] ? __slab_free+0x1c2/0x240
    [] ? destroy_inode+0x39/0x40
    [] ? __d_free+0x23/0x30
    [] ? __d_free+0x23/0x30
    [] ? __d_free+0x23/0x30
    [] tty_release+0x12/0x20
    [] __fput+0xb2/0x1d0
    [] fput+0x19/0x20
    [] filp_close+0x49/0x70
    [] sys_close+0x66/0xb0
    [] sysenter_past_esp+0x6a/0x99
    =======================
    FIX skbuff_head_cache: Restoring 0xf658ae9c-0xf658ae9c=0x6b

    FIX skbuff_head_cache: Marking all objects used
    device: 'vcsa11': device_unregister
    PM: Removing info for No Bus:vcsa11
    device: 'vcsa11': device_create_release

    With this config:

    http://redhat.com/~mingo/misc/config..._CEST_2008.bad

    The box uses netconsole.

    Suspected range of breakage is v2.6.26..a3cf859, or around 3000 commits.
    But a fair portion of those commits were tested on this box before.

    Perhaps SLUB debugging got smarter?
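
    For readers unfamiliar with the report format: with poisoning enabled,
    SLUB fills a freed object with 0x6b (POISON_FREE), ends it with 0xa5
    (POISON_END), and verifies the pattern the next time the object is
    handed out. The user-space sketch below only illustrates that idea. The
    helper names are hypothetical and this is not the kernel's
    check_object() code; the 160-byte object size mentioned afterwards is
    read off the dump above (redzone at 0xf658aea0, object at 0xf658ae00).

```c
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b  /* fill byte for a freed object */
#define POISON_END  0xa5  /* marker stored in the object's last byte */

/* Fill a freed object with the poison pattern (size must be >= 1). */
static void poison_object(unsigned char *obj, size_t size)
{
    memset(obj, POISON_FREE, size - 1);
    obj[size - 1] = POISON_END;
}

/*
 * Return the offset of the first byte that no longer matches the
 * pattern, or -1 if the object is intact.
 */
static long find_poison_damage(const unsigned char *obj, size_t size)
{
    size_t i;

    for (i = 0; i + 1 < size; i++)
        if (obj[i] != POISON_FREE)
            return (long)i;

    return obj[size - 1] == POISON_END ? -1 : (long)(size - 1);
}
```

    Poisoning a 160-byte buffer, storing 0x6a at offset 0x9c and calling
    find_poison_damage() would point at that single byte, which is the
    shape of the "First byte 0x6a instead of 0x6b" line in the report.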

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    * David Miller wrote:

    > From: Ingo Molnar
    > Date: Thu, 17 Jul 2008 23:42:22 +0200
    >
    > >
    > > A regression to v2.6.26:
    > >
    > > I started getting this skb-head corruption message today, on a T60
    > > laptop with e1000:

    >
    > This is very unlikely to be added by us networking folks, no
    > networking merges have happened for the 2.6.27 merge window yet :-)


    yeah. That's why i observed:

    > > Perhaps SLUB debugging got smarter?


    and Cc:-ed SLUB folks. Could be a sleeper cell of bugs gone active ;-)

    Or could be SLUB (-debugging) breakage. Netconsole is pretty reliable on
    this box. (and the bootup continued just fine after this report)

    Just re-tried it, the bug is reliably repeatable. Will try a bisection
    run.

    Ingo

  3. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    From: Ingo Molnar
    Date: Fri, 18 Jul 2008 00:06:00 +0200

    > > > Perhaps SLUB debugging got smarter?

    >
    > and Cc:-ed SLUB folks. Could be a sleeper cell of bugs gone active ;-)


    This bug would be a quite positive result then

  4. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    * Ingo Molnar wrote:

    > Just re-tried it, the bug is reliably repeatable. Will try a bisection
    > run.


    hm, but it was not reproducible on the third and fourth attempt :-( I
    tried hard to provoke it by generating artificial parallel network and
    netconsole output - but it didn't want to trigger. Heisenbug ...

    Maybe the debug output gives someone an idea about the nature of the
    bug?

    Ingo

  5. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar wrote:
    >
    > A regression to v2.6.26:
    >
    > I started getting this skb-head corruption message today, on a T60
    > laptop with e1000:
    >
    > PM: Removing info for No Bus:vcs11
    > device: 'vcs11': device_create_release
    > =============================================================================
    > BUG skbuff_head_cache: Poison overwritten
    > -----------------------------------------------------------------------------
    >
    > INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b


    1. Notice the range. It's just a single byte.
    2. Notice the value. It's just a ++.

    Probably a stray increment of a uint8_t somewhere on a freed object?

    The offset from the beginning of the object is 0xf658ae9c - 0xf658ae00 = 0x9c.

    How big is a struct sk_buff? Hm.. it is in fact quite big. Now what
    member has offset 0x9c? Seems to depend on your config. Is there any
    way you can figure it out, Ingo? I'll try it with your config too.
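
    The arithmetic above, and the follow-up question of which member lives
    at a given offset, can be sketched in plain C with offsetof. Note that
    `struct demo_buf` is a made-up stand-in: the real struct sk_buff layout
    depends on the kernel config, so its offsets will differ. The helper
    names are likewise hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in; NOT the real struct sk_buff layout. */
struct demo_buf {
    uint32_t len;
    uint32_t data_len;
    uint8_t  flags;
    uint32_t truesize;
};

struct member_info {
    const char *name;
    size_t offset;
    size_t size;
};

#define MEMBER(type, m) { #m, offsetof(type, m), sizeof(((type *)0)->m) }

static const struct member_info demo_members[] = {
    MEMBER(struct demo_buf, len),
    MEMBER(struct demo_buf, data_len),
    MEMBER(struct demo_buf, flags),
    MEMBER(struct demo_buf, truesize),
};

/* Offset of a corrupted byte relative to the start of its object. */
static size_t offset_in_object(uintptr_t byte_addr, uintptr_t object_addr)
{
    return (size_t)(byte_addr - object_addr);
}

/* Name of the member covering `offset`, or NULL for padding/out of range. */
static const char *member_at(size_t offset)
{
    size_t i;

    for (i = 0; i < sizeof(demo_members) / sizeof(demo_members[0]); i++) {
        const struct member_info *m = &demo_members[i];

        if (offset >= m->offset && offset < m->offset + m->size)
            return m->name;
    }
    return NULL;
}
```

    offset_in_object(0xf658ae9c, 0xf658ae00) gives 0x9c, matching the
    subtraction above. For the real structure the member table would have
    to come from a build with the exact config, for example via debug info.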


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036

  6. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar wrote:
    > With this config:
    >
    > http://redhat.com/~mingo/misc/config..._CEST_2008.bad
    >


    It doesn't actually work. The config says

    # head: 088fcf34

    and I checked out this from the tip tree. But kernel-config still
    complains about unknown config options... What went wrong?


    Vegard


  7. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    On Fri, Jul 18, 2008 at 1:15 AM, Vegard Nossum wrote:
    > On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar wrote:
    >>
    >> A regression to v2.6.26:
    >>
    >> I started getting this skb-head corruption message today, on a T60
    >> laptop with e1000:
    >>
    >> PM: Removing info for No Bus:vcs11
    >> device: 'vcs11': device_create_release
    >> =============================================================================
    >> BUG skbuff_head_cache: Poison overwritten
    >> -----------------------------------------------------------------------------
    >>
    >> INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b

    >
    > 1. Notice the range. It's just a single byte.
    > 2. Notice the value. It's just a ++.
    >
    > Probably a stray increment of a uint8_t somewhere on a freed object?
    >
    > The offset from the beginning of the object is 0xf658ae9c - 0xf658ae00 = 0x9c.
    >
    > How big is a struct sk_buff? Hm.. it is in fact quite big. Now what
    > member has offset 0x9c? Seems to depend on your config. Is there any
    > way you can figure it out, Ingo? I'll try it with your config too.


    With your config:

    (gdb) p ((struct sk_buff *) 0)->truesize
    Cannot access memory at address 0x9c

    Now just audit users of ->truesize... There are quite a few.

    Which one would only += 1?


    Vegard

    PS: I might be on the completely wrong track. So far I only have bad
    experiences with this sk_buff...


  8. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    * Vegard Nossum wrote:

    > On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar wrote:
    > > With this config:
    > >
    > > http://redhat.com/~mingo/misc/config..._CEST_2008.bad
    > >

    >
    > It doesn't actually work. The config says
    >
    > # head: 088fcf34
    >
    > and I checked out this from the tip tree. But kernel-config still
    > complains about unknown config options... What went wrong?


    that's ok - i've got some local qa helpers that have config options.

    Things like making SMP bootups more likely in randconfig, adding various
    boot parameters to the bootup via .config methods (so that boot
    parameters can be randomized via make randconfig), etc.

    these:

    CONFIG_BOOTPARAM_SUPPORT=y
    CONFIG_BOOTPARAM_NO_HZ_OFF=y
    CONFIG_BOOTPARAM_NMI_WATCHDOG_BIT_0=y
    CONFIG_BOOTPARAM_LAPIC=y
    CONFIG_BOOTPARAM_IDLE_MWAIT=y
    CONFIG_BOOTPARAM_NOPAT=y
    CONFIG_BOOTPARAM_NOTSC=y

    are equivalent to adding this to the boot line:

    nohz=off nmi_watchdog=1 lapic idle=mwait nopat notsc

    although i don't think they are normally material to netconsole workings.

    Ingo

  9. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    * Vegard Nossum wrote:

    > On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar wrote:
    > >
    > > A regression to v2.6.26:
    > >
    > > I started getting this skb-head corruption message today, on a T60
    > > laptop with e1000:
    > >
    > > PM: Removing info for No Bus:vcs11
    > > device: 'vcs11': device_create_release
    > > =============================================================================
    > > BUG skbuff_head_cache: Poison overwritten
    > > -----------------------------------------------------------------------------
    > >
    > > INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b

    >
    > 1. Notice the range. It's just a single byte.
    > 2. Notice the value. It's just a ++.
    >
    > Probably a stray increment of a uint8_t somewhere on a freed object?
    >
    > The offset from the beginning of the object is 0xf658ae9c - 0xf658ae00
    > = 0x9c.
    >
    > How big is a struct sk_buff? Hm.. it is in fact quite big. Now what
    > member has offset 0x9c? Seems to depend on your config. Is there any
    > way you can figure it out, Ingo? I'll try it with your config too.


    hmm ... your analysis gave me a wonderful albeit admittedly remote idea:

    If only we had some kernel technology that could track and validate
    memory accesses, and point out the cases where we access uninitialized
    memory, just like Valgrind?

    .... something like kmemcheck? ;-)

    So i booted that box with tip/master and kmemcheck enabled. (plus a few
    fixlets to make networking allocations be properly tracked by
    kmemcheck.)

    It was a slow bootup and long wait, but it gave a few hits here:

    kmemcheck: Caught 8-bit read from uninitialized memory (f653ad24)
    iiiiiiiiiiiiiiiiuuuuuuuuuuuuuuuuuuuuuiuuuuuuuuuuuuuuuuuuuuuuuuuu
    ^

    Pid: 2484, comm: arping Not tainted (2.6.26-tip #20187)
    EIP: 0060:[] EFLAGS: 00010282 CPU: 0
    EIP is at __copy_skb_header+0x7c/0x100
    EAX: 00000000 EBX: f653acc0 ECX: f653ac00 EDX: f653ac00
    ESI: f653ac50 EDI: f653ad10 EBP: c09b9e84 ESP: c09ddaa8
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    CR0: 8005003b CR2: f71c2700 CR3: 36513000 CR4: 000006d0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff4ff0 DR7: 00000400
    [] __skb_clone+0x27/0xe0
    [] skb_clone+0x41/0x60
    [] packet_rcv+0xc1/0x290
    [] netif_receive_skb+0x20d/0x400
    [] e1000_receive_skb+0x47/0x180
    [] e1000_clean_rx_irq+0x223/0x2e0
    [] e1000_clean+0x5b/0x200
    [] net_rx_action+0xfb/0x160
    [] __do_softirq+0x82/0xf0
    [] call_on_stack+0x1a/0x30

    false positive? Find below the quick hacks i did to pre-initialize skb
    allocations that have RX DMA into them.

    another one is:

    kmemcheck: Caught 8-bit read from uninitialized memory (f653a902)
    iiuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu uuuuuuuuuuuuuu
    ^

    Pid: 2575, comm: hcid Not tainted (2.6.26-tip #20187)
    EIP: 0060:[] EFLAGS: 00010293 CPU: 0
    EIP is at __copy_to_user_ll+0x46/0x70
    EAX: 00000004 EBX: b7f3c478 ECX: 00000002 EDX: f653a900
    ESI: f653a902 EDI: b7f3c47a EBP: f668ceec ESP: c09ddbc8
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    CR0: 8005003b CR2: f71c2700 CR3: 3668d000 CR4: 000006d0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff4ff0 DR7: 00000400
    [] copy_to_user+0x3a/0x50
    [] hci_get_dev_list+0x100/0x120
    [] hci_sock_ioctl+0x143/0x2c0
    [] sock_ioctl+0xc1/0x1d0
    [] vfs_ioctl+0x2d/0x90
    [] do_vfs_ioctl+0x26b/0x2d0
    [] sys_ioctl+0x57/0x70
    [] sysenter_past_esp+0x6a/0x91
    [] 0xffffffff

    this might actually be genuine use of uninitialized memory, hm? Or
    perhaps gcc optimizing out bitmasks and kmemcheck not coping with it?
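
    On the bitmask theory: storing to one bitfield member makes the
    compiler read the containing storage unit, change a few bits and write
    it back, so neighbouring bits that were never initialized get read
    along the way. A byte-granular checker can flag that read even though
    the C source never looks at those members. A minimal sketch, using a
    hypothetical `struct demo_flags` (not a real kernel structure):

```c
#include <string.h>

/* Hypothetical flag word; member names are for illustration only. */
struct demo_flags {
    unsigned int tstamp_ok : 1,
                 sack_ok   : 1,
                 ecn_ok    : 1,
                 acked     : 1;
};

/* Simulate an allocation whose contents are whatever was there before. */
static void fill_with_garbage(struct demo_flags *f)
{
    memset(f, 0xff, sizeof(*f));
}

/* Touch only one member; the other bits keep their old (garbage) values. */
static void set_only_acked(struct demo_flags *f, unsigned int v)
{
    f->acked = v & 1;
}

/* Zeroing the whole struct up front leaves no uninitialized bits behind. */
static void init_all(struct demo_flags *f)
{
    memset(f, 0, sizeof(*f));
}
```

    After fill_with_garbage() and set_only_acked(&f, 0), the other members
    still read back as 1: the store preserved bits it was never asked to
    touch, which it can only do by reading them first.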

    a third type was this:

    kmemcheck: Caught 8-bit read from uninitialized memory (f653a2a4)
    iiiiiiiiiiiiiiiiuuuuuuuuuuuuuuuuuuuuuiuuuuuuuuuuuuuuuuuuuuuuuuuu
    ^

    Pid: 2771, comm: ssh Not tainted (2.6.26-tip #20187)
    EIP: 0060:[] EFLAGS: 00010282 CPU: 0
    EIP is at __copy_skb_header+0x7c/0x100
    EAX: 00000000 EBX: f653a240 ECX: f6762000 EDX: f6762000
    ESI: f6762050 EDI: f653a290 EBP: f675cd28 ESP: c09ddce8
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    CR0: 8005003b CR2: f71c2700 CR3: 367e3000 CR4: 000006d0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff4ff0 DR7: 00000400
    [] __skb_clone+0x27/0xe0
    [] skb_clone+0x41/0x60
    [] tcp_transmit_skb+0x41/0x800
    [] tcp_connect+0x293/0x330
    [] tcp_v4_connect+0x3d6/0x550
    [] inet_stream_connect+0x1b9/0x240
    [] sys_connect+0x86/0xa0
    [] sys_socketcall+0x220/0x260
    [] sysenter_past_esp+0x6a/0x91
    [] 0xffffffff

    this too is likely a false positive related to RX packets?

    none of this looks netconsole related.

    I'll keep the box running under kmemcheck - maybe something pops up.

    Ingo

    ------------------->
    Subject: kmemcheck/net hacks
    From: Ingo Molnar

    ---
    include/asm-generic/siginfo.h | 8 ++++++++
    include/linux/fs.h | 4 ++--
    include/linux/netdevice.h | 4 ++--
    include/linux/skbuff.h | 6 +++++-
    include/net/inet_sock.h | 3 ++-
    include/net/tcp.h | 11 +++++++++++
    kernel/signal.c | 12 ++++++++++++
    net/core/skbuff.c | 6 ++++++
    net/ipv4/tcp_output.c | 4 ++++
    9 files changed, 52 insertions(+), 6 deletions(-)

    Index: linux/include/asm-generic/siginfo.h
    ===================================================================
    --- linux.orig/include/asm-generic/siginfo.h
    +++ linux/include/asm-generic/siginfo.h
    @@ -278,11 +278,19 @@ void do_schedule_next_timer(struct sigin

    static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
    {
    +#ifdef CONFIG_KMEMCHECK
    + memcpy(to, from, sizeof(*to));
    +#else
    + /*
    + * Optimization, only copy up to the size of the largest known
    + * union member:
    + */
    if (from->si_code < 0)
    memcpy(to, from, sizeof(*to));
    else
    /* _sigchld is currently the largest know union member */
    memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
    +#endif
    }

    #endif
    Index: linux/include/linux/fs.h
    ===================================================================
    --- linux.orig/include/linux/fs.h
    +++ linux/include/linux/fs.h
    @@ -922,8 +922,8 @@ struct file_lock {
    struct pid *fl_nspid;
    wait_queue_head_t fl_wait;
    struct file *fl_file;
    - unsigned char fl_flags;
    - unsigned char fl_type;
    + unsigned int fl_flags;
    + unsigned int fl_type;
    loff_t fl_start;
    loff_t fl_end;

    Index: linux/include/linux/netdevice.h
    ===================================================================
    --- linux.orig/include/linux/netdevice.h
    +++ linux/include/linux/netdevice.h
    @@ -199,8 +199,8 @@ struct dev_addr_list
    {
    struct dev_addr_list *next;
    u8 da_addr[MAX_ADDR_LEN];
    - u8 da_addrlen;
    - u8 da_synced;
    + unsigned int da_addrlen;
    + unsigned int da_synced;
    int da_users;
    int da_gusers;
    };
    Index: linux/include/linux/skbuff.h
    ===================================================================
    --- linux.orig/include/linux/skbuff.h
    +++ linux/include/linux/skbuff.h
    @@ -1208,7 +1208,11 @@ static inline void __skb_queue_purge(str
    static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
    gfp_t gfp_mask)
    {
    - struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
    + struct sk_buff *skb;
    +#ifdef CONFIG_KMEMCHECK
    + gfp_mask |= __GFP_ZERO;
    +#endif
    + skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
    if (likely(skb))
    skb_reserve(skb, NET_SKB_PAD);
    return skb;
    Index: linux/include/net/inet_sock.h
    ===================================================================
    --- linux.orig/include/net/inet_sock.h
    +++ linux/include/net/inet_sock.h
    @@ -72,7 +72,8 @@ struct inet_request_sock {
    sack_ok : 1,
    wscale_ok : 1,
    ecn_ok : 1,
    - acked : 1;
    + acked : 1,
    + __filler : 3;
    struct ip_options *opt;
    };

    Index: linux/include/net/tcp.h
    ===================================================================
    --- linux.orig/include/net/tcp.h
    +++ linux/include/net/tcp.h
    @@ -966,6 +966,17 @@ static inline void tcp_openreq_init(stru
    tcp_rsk(req)->rcv_isn = TCP_SKB_CB(skb)->seq;
    req->mss = rx_opt->mss_clamp;
    req->ts_recent = rx_opt->saw_tstamp ? rx_opt->rcv_tsval : 0;
    +#ifdef CONFIG_KMEMCHECK
    + /* bitfield init */
    + ireq->snd_wscale =
    + ireq->rcv_wscale =
    + ireq->tstamp_ok =
    + ireq->sack_ok =
    + ireq->wscale_ok =
    + ireq->ecn_ok =
    + ireq->acked =
    + ireq->__filler = 0;
    +#endif
    ireq->tstamp_ok = rx_opt->tstamp_ok;
    ireq->sack_ok = rx_opt->sack_ok;
    ireq->snd_wscale = rx_opt->snd_wscale;
    Index: linux/kernel/signal.c
    ===================================================================
    --- linux.orig/kernel/signal.c
    +++ linux/kernel/signal.c
    @@ -841,6 +841,12 @@ static int send_signal(int sig, struct s
    list_add_tail(&q->list, &pending->list);
    switch ((unsigned long) info) {
    case (unsigned long) SEND_SIG_NOINFO:
    + /*
    + * Make sure we always have a fully initialized
    + * siginfo struct:
    + */
    + memset(&q->info, 0, sizeof(q->info));
    +
    q->info.si_signo = sig;
    q->info.si_errno = 0;
    q->info.si_code = SI_USER;
    @@ -848,6 +854,12 @@ static int send_signal(int sig, struct s
    q->info.si_uid = current->uid;
    break;
    case (unsigned long) SEND_SIG_PRIV:
    + /*
    + * Make sure we always have a fully initialized
    + * siginfo struct:
    + */
    + memset(&q->info, 0, sizeof(q->info));
    +
    q->info.si_signo = sig;
    q->info.si_errno = 0;
    q->info.si_code = SI_KERNEL;
    Index: linux/net/core/skbuff.c
    ===================================================================
    --- linux.orig/net/core/skbuff.c
    +++ linux/net/core/skbuff.c
    @@ -225,6 +225,9 @@ struct sk_buff *__alloc_skb(unsigned int
    struct sk_buff *child = skb + 1;
    atomic_t *fclone_ref = (atomic_t *) (child + 1);

    +#ifdef CONFIG_KMEMCHECK
    + memset(child, 0, offsetof(struct sk_buff, tail));
    +#endif
    skb->fclone = SKB_FCLONE_ORIG;
    atomic_set(fclone_ref, 1);

    @@ -257,6 +260,9 @@ struct sk_buff *__netdev_alloc_skb(struc
    int node = dev_to_node(&dev->dev);
    struct sk_buff *skb;

    +#ifdef CONFIG_KMEMCHECK
    + gfp_mask |= __GFP_ZERO;
    +#endif
    skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
    if (likely(skb)) {
    skb_reserve(skb, NET_SKB_PAD);
    Index: linux/net/ipv4/tcp_output.c
    ===================================================================
    --- linux.orig/net/ipv4/tcp_output.c
    +++ linux/net/ipv4/tcp_output.c
    @@ -333,6 +333,10 @@ static inline void TCP_ECN_send(struct s
    static void tcp_init_nondata_skb(struct sk_buff *skb, u32 seq, u8 flags)
    {
    skb->csum = 0;
    + skb->local_df = skb->cloned = skb->ip_summed = skb->nohdr =
    + skb->nfctinfo = 0;
    + skb->pkt_type = skb->fclone = skb->ipvs_property = skb->peeked =
    + skb->nf_trace = 0;

    TCP_SKB_CB(skb)->flags = flags;
    TCP_SKB_CB(skb)->sacked = 0;

  10. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    On Fri, Jul 18, 2008 at 1:52 AM, Ingo Molnar wrote:
    > If only we had some kernel technology that could track and validate
    > memory accesses, and point out the cases where we access uninitialized
    > memory, just like Valgrind?
    >
    > ... something like kmemcheck? ;-)


    Cool

    > So i booted that box with tip/master and kmemcheck enabled. (plus a few
    > fixlets to make networking allocations be properly tracked by
    > kmemcheck.)
    >
    > It was a slow bootup and long wait, but it gave a few hits here:


    Hm, if you think it was that slow, I am suspecting you were also using
    SLUB debugging.

    This can actually be negative, since now SLUB will access the objects
    (+redzone +padding) and possibly trick kmemcheck into thinking they
    were initialized in the first place.
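
    The concern above can be illustrated with a toy shadow map that keeps
    one state per data byte, which is the general idea behind kmemcheck.
    Everything here (the `toy_*` names, the three states, the 64-byte
    arena) is hypothetical and far simpler than the real implementation.
    The point is only that any tracked write, including an allocator's
    debug poisoning, flips the shadow to "initialized" and so hides a later
    read of data the caller never set.

```c
#include <stddef.h>

enum toy_shadow { TOY_UNINITIALIZED = 0, TOY_INITIALIZED, TOY_FREED };

#define TOY_SIZE 64

static unsigned char   toy_data[TOY_SIZE];
static enum toy_shadow toy_state[TOY_SIZE];

/* A fresh allocation: contents are undefined until someone writes them. */
static void toy_alloc(void)
{
    size_t i;

    for (i = 0; i < TOY_SIZE; i++)
        toy_state[i] = TOY_UNINITIALIZED;
}

/* Freeing marks every byte so that later reads can be told apart. */
static void toy_free(void)
{
    size_t i;

    for (i = 0; i < TOY_SIZE; i++)
        toy_state[i] = TOY_FREED;
}

/* Every tracked write marks the byte as initialized (off < TOY_SIZE). */
static void toy_write(size_t off, unsigned char v)
{
    toy_data[off] = v;
    toy_state[off] = TOY_INITIALIZED;
}

/* Shadow state a read would see; anything but TOY_INITIALIZED is a report. */
static enum toy_shadow toy_read_state(size_t off)
{
    return toy_state[off];
}

/* What a debugging allocator does: poison the object via tracked writes. */
static void toy_debug_poison(void)
{
    size_t i;

    for (i = 0; i < TOY_SIZE; i++)
        toy_write(i, 0x6b);
}
```

    In this model, toy_alloc() followed by a read reports
    TOY_UNINITIALIZED, while toy_alloc(), toy_debug_poison() and then a
    read reports TOY_INITIALIZED even though the caller stored nothing.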

    But what we are really looking for is "read from freed memory"
    messages. So I would actually recommend this: Disable kmemcheck's
    reporting of uninitialized memory, simply to make it easier to spot
    the "freed" messages more easily.

    Maybe something like this (warning: whitespace-munged):

    diff --git a/arch/x86/mm/kmemcheck/error.c b/arch/x86/mm/kmemcheck/error.c
    index 56410c6..6944cb7 100644
    --- a/arch/x86/mm/kmemcheck/error.c
    +++ b/arch/x86/mm/kmemcheck/error.c
    @@ -98,6 +98,9 @@ void kmemcheck_error_save(enum kmemcheck_shadow state,
    return;
    prev_ip = regs->ip;

    + if (state == KMEMCHECK_SHADOW_UNINITIALIZED)
    + return;
    +
    e = error_next_wr();
    if (!e)
    return;


    If this only happens during boot, it would also be a good idea to
    simply reboot the machine a lot...


    Vegard


  11. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    two other ones:

    kmemcheck: Caught 32-bit read from uninitialized memory (f459ac00)
    uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu uuuuuuuuuuuuuu
    ^

    Pid: 1614, comm: kjournald Not tainted (2.6.26-tip #20187)
    EIP: 0060:[] EFLAGS: 00010216 CPU: 0
    EIP is at skb_copy_bits+0x54/0x220
    EAX: 00000110 EBX: 00000110 ECX: 00000044 EDX: f6c256c0
    ESI: f459ac00 EDI: f459a000 EBP: c09b9f60 ESP: c09de048
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    CR0: 8005003b CR2: f71c2700 CR3: 2e849000 CR4: 000006d0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff4ff0 DR7: 00000400
    [] skb_copy+0x78/0x90
    [] neigh_timer_handler+0x156/0x2c0
    [] run_timer_softirq+0x142/0x180
    [] __do_softirq+0x82/0xf0
    [] call_on_stack+0x1a/0x30
    [] 0xffffffff


    plus:

    kmemcheck: Caught 32-bit read from uninitialized memory (f654b590)
    iiiiiiiiiiiiiiiiuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu uuuuuuuuuuuuuu
    ^

    Pid: 3130, comm: git-read-tree Not tainted (2.6.26-tip #20187)
    EIP: 0060:[] EFLAGS: 00010286 CPU: 0
    EIP is at pskb_expand_head+0x86/0x150
    EAX: 00000140 EBX: f6763124 ECX: 00000027 EDX: f6549400
    ESI: f654b590 EDI: f6549650 EBP: c09b9a80 ESP: c09dfa28
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    CR0: 8005003b CR2: f71c2700 CR3: 2e849000 CR4: 000006d0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff4ff0 DR7: 00000400
    [] __pskb_pull_tail+0x21d/0x300
    [] e1000_xmit_frame+0x1fd/0xa60
    [] dev_hard_start_xmit+0x254/0x2e0
    [] __qdisc_run+0x61/0x1b0
    [] dev_queue_xmit+0x25e/0x340
    [] ip_finish_output+0x105/0x260
    [] ip_output+0x49/0x50
    [] ip_local_out+0x1d/0x30
    [] ip_queue_xmit+0x1a5/0x340
    [] tcp_transmit_skb+0x443/0x800
    [] __tcp_push_pending_frames+0xfa/0x730
    [] tcp_rcv_established+0x3b8/0x6f0
    [] tcp_v4_do_rcv+0x2d0/0x510
    [] tcp_v4_rcv+0x58e/0x660
    [] ip_local_deliver+0x4c/0x180
    [] ip_rcv+0x2be/0x570
    [] netif_receive_skb+0x333/0x400
    [] e1000_receive_skb+0x47/0x180
    [] e1000_clean_rx_irq+0x223/0x2e0
    [] e1000_clean+0x5b/0x200
    [] net_rx_action+0xfb/0x160
    [] __do_softirq+0x82/0xf0
    [] call_on_stack+0x1a/0x30
    [] 0xffffffff

    again, from the RX path, likely false positives. We've got to fix these
    false positives to make automated debugging easier.

    Ingo

  12. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    * Vegard Nossum wrote:

    > On Fri, Jul 18, 2008 at 1:52 AM, Ingo Molnar wrote:
    > > If only we had some kernel technology that could track and validate
    > > memory accesses, and point out the cases where we access uninitialized
    > > memory, just like Valgrind?
    > >
    > > ... something like kmemcheck? ;-)

    >
    > Cool
    >
    > > So i booted that box with tip/master and kmemcheck enabled. (plus a few
    > > fixlets to make networking allocations be properly tracked by
    > > kmemcheck.)
    > >
    > > It was a slow bootup and long wait, but it gave a few hits here:

    >
    > Hm, if you think it was that slow, I am suspecting you were also using
    > SLUB debugging.


    nope:

    # CONFIG_SLUB_DEBUG is not set
    CONFIG_SLUB=y

    > This can actually be negative, since now SLUB will access the objects
    > (+redzone +padding) and possibly trick kmemcheck into thinking they
    > were initialized in the first place.
    >
    > But what we are really looking for is "read from freed memory"
    > messages. So I would actually recommend this: Disable kmemcheck's
    > reporting of uninitialized memory, simply to make it easier to spot
    > the "freed" messages more easily.
    >
    > Maybe something like this (warning: whitespace-munged):


    ok, applied this too.

    > If this only happens during boot, it would also be a good idea to
    > simply reboot the machine a lot...


    yeah, i've got a script for that. Will try it overnight.

    Ingo

  13. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    From: "Vegard Nossum"
    Date: Fri, 18 Jul 2008 01:15:47 +0200

    > On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar wrote:
    > >
    > > A regression to v2.6.26:
    > >
    > > I started getting this skb-head corruption message today, on a T60
    > > laptop with e1000:
    > >
    > > PM: Removing info for No Bus:vcs11
    > > device: 'vcs11': device_create_release
    > > =============================================================================
    > > BUG skbuff_head_cache: Poison overwritten
    > > -----------------------------------------------------------------------------
    > >
    > > INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b

    >
    > 1. Notice the range. It's just a single byte.
    > 2. Notice the value. It's just a ++.


    It's supposed to be 0x6b, this would be a "--"

    Also it (more likely IMHO) could be clearing a flag with the value 0x01.
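
    Both readings produce the same byte, which is why the dump alone
    cannot tell them apart. A minimal sketch, assuming nothing beyond the
    0x6b poison value shown in the report; the helper names are invented
    for illustration:

```c
#include <stdint.h>

#define POISON_FREE 0x6b /* fill byte for freed objects, per the report */

/* A byte-wide "--" applied to a poisoned byte. */
static uint8_t poison_after_decrement(void)
{
	return (uint8_t)(POISON_FREE - 1);
}

/* Clearing a (hypothetical) flag bit of value 0x01 in a poisoned byte. */
static uint8_t poison_after_flag_clear(void)
{
	return (uint8_t)(POISON_FREE & ~0x01);
}

/* Bits that differ between the poison byte and the byte that was observed. */
static uint8_t changed_bits(uint8_t observed)
{
	return (uint8_t)(POISON_FREE ^ observed);
}
```

    Both helpers return 0x6a, and changed_bits(0x6a) is 0x01, i.e. only
    bit 0 differs from the poison pattern.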


  14. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    From: Ingo Molnar
    Date: Fri, 18 Jul 2008 01:52:54 +0200

    > kmemcheck: Caught 8-bit read from uninitialized memory (f653ad24)
    > iiiiiiiiiiiiiiiiuuuuuuuuuuuuuuuuuuuuuiuuuuuuuuuuuuuuuuuuuuuuuuuu
    > ^
    >
    > Pid: 2484, comm: arping Not tainted (2.6.26-tip #20187)
    > EIP: 0060:[] EFLAGS: 00010282 CPU: 0
    > EIP is at __copy_skb_header+0x7c/0x100
    > EAX: 00000000 EBX: f653acc0 ECX: f653ac00 EDX: f653ac00
    > ESI: f653ac50 EDI: f653ad10 EBP: c09b9e84 ESP: c09ddaa8
    > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    > CR0: 8005003b CR2: f71c2700 CR3: 36513000 CR4: 000006d0
    > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    > DR6: ffff4ff0 DR7: 00000400
    > [] __skb_clone+0x27/0xe0
    > [] skb_clone+0x41/0x60
    > [] packet_rcv+0xc1/0x290
    > [] netif_receive_skb+0x20d/0x400
    > [] e1000_receive_skb+0x47/0x180
    > [] e1000_clean_rx_irq+0x223/0x2e0
    > [] e1000_clean+0x5b/0x200
    > [] net_rx_action+0xfb/0x160
    > [] __do_softirq+0x82/0xf0
    > [] call_on_stack+0x1a/0x30
    >
    > false positive? Find below the quick hacks i did to pre-initialize skb
    > allocations that have RX DMA into them.


    Maybe. Every SKB object allocated is fully initialized
    in __alloc_skb():

    /*
     * Only clear those fields we need to clear, not those that we will
     * actually initialise below. Hence, don't put any more fields after
     * the tail pointer in struct sk_buff!
     */
    memset(skb, 0, offsetof(struct sk_buff, tail));

    That leaves the following trailing members of struct sk_buff:

    /* These elements must be at the end, see alloc_skb() for details. */
    sk_buff_data_t tail;
    sk_buff_data_t end;
    unsigned char *head,
                  *data;
    unsigned int truesize;
    atomic_t users;

    which are then explicitly initialized right after the quoted memset():

    skb->truesize = size + sizeof(struct sk_buff);
    atomic_set(&skb->users, 1);
    skb->head = data;
    skb->data = data;
    skb_reset_tail_pointer(skb);
    skb->end = skb->tail + size;

    When we clone, there are probably some fields we don't copy over
    explicitly. And we usually do that because they don't matter or
    if they do the caller will take care of it.
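
    The pattern can be tried out in userspace. This is only a sketch:
    struct mock_skb and mock_skb_init() are hypothetical stand-ins, not
    the kernel's struct sk_buff layout or __alloc_skb() itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff; NOT the kernel layout. */
struct mock_skb {
	void *next;
	unsigned int len;
	unsigned char cb[8];
	/* Members from 'tail' onwards are set by hand, so they must stay last. */
	unsigned char *tail;
	unsigned char *end;
	unsigned char *head;
	unsigned char *data;
	unsigned int truesize;
	int users;
};

static void mock_skb_init(struct mock_skb *skb, unsigned char *buf,
			  unsigned int size)
{
	/* Zero everything that precedes 'tail' in one go ... */
	memset(skb, 0, offsetof(struct mock_skb, tail));

	/* ... then assign each trailing member explicitly. */
	skb->truesize = (unsigned int)(size + sizeof(struct mock_skb));
	skb->users = 1;
	skb->head = buf;
	skb->data = buf;
	skb->tail = buf;
	skb->end = skb->tail + size;
}
```

    Filling a struct mock_skb with 0x6b before calling mock_skb_init()
    shows that no named member keeps the old pattern afterwards.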

  15. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    Hi Ingo.

    On Thu, Jul 17, 2008 at 11:42:22PM +0200, Ingo Molnar (mingo@elte.hu) wrote:
    > Pid: 5098, comm: gdm-binary Not tainted 2.6.26-tip #3094
    > [] print_trailer+0xa9/0xf0
    > [] check_bytes_and_report+0x9b/0xc0
    > [] check_object+0x19e/0x1e0
    > [] __slab_alloc+0x371/0x4e0
    > [] kmem_cache_alloc+0xb2/0xc0
    > [] ? __alloc_skb+0x2c/0x110


    Out of curiosity, why does it scream at allocation time?
    Does SLUB have a debug check at freeing time? If so, how does it work
    and why didn't it catch use after free there?

    --
    Evgeniy Polyakov

  16. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    On Fri, Jul 18, 2008 at 4:03 AM, David Miller wrote:
    >> On Thu, Jul 17, 2008 at 11:42 PM, Ingo Molnar wrote:
    >> >
    >> > A regression to v2.6.26:
    >> >
    >> > I started getting this skb-head corruption message today, on a T60
    >> > laptop with e1000:
    >> >
    >> > PM: Removing info for No Bus:vcs11
    >> > device: 'vcs11': device_create_release
    >> > =============================================================================
    >> > BUG skbuff_head_cache: Poison overwritten
    >> > -----------------------------------------------------------------------------
    >> >
    >> > INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b

    >>
    >> 1. Notice the range. It's just a single byte.
    >> 2. Notice the value. It's just a ++.

    >
    > It's supposed to be 0x6b, this would be a "--"


    You're right! Oops. In my defence, I wrote that at 2 AM last night ;-)

    > Also it (more likely IMHO) could be clearing a flag with the value 0x01.


    It could be. But like I said in a later e-mail, the thing is likely
    sk_buff->truesize. Which is not a flags variable. It _is_ however, a
    counter, which is frequently -= and atomic_sub()ed.

    That field is also an int, not a byte like I suggested above. This is
    fine, though. "--" on an int can of course legitimately update/change
    just the lower byte of an int.
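
    To illustrate that last point, here is a small userspace sketch. It
    assumes a 32-bit field sitting in poisoned memory; the function names
    and the example deltas are made up.

```c
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b

/* A 32-bit field whose four bytes all hold the poison value, minus 'delta'. */
static uint32_t poisoned_word_minus(uint32_t delta)
{
	uint32_t field;

	memset(&field, POISON_FREE, sizeof(field));	/* 0x6b6b6b6b */
	return field - delta;
}

/* How many of the field's four bytes no longer hold the poison value. */
static int count_changed_bytes(uint32_t delta)
{
	uint32_t field = poisoned_word_minus(delta);
	unsigned char bytes[sizeof(field)];
	int changed = 0;

	memcpy(bytes, &field, sizeof(field));
	for (size_t i = 0; i < sizeof(bytes); i++)
		if (bytes[i] != POISON_FREE)
			changed++;
	return changed;
}
```

    poisoned_word_minus(1) is 0x6b6b6b6a, so exactly one byte changes. A
    larger delta such as 200 borrows from the next byte and gives
    0x6b6b6aa3, which would show up as two corrupted bytes.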

    But.. it could also be some random corruption coming from elsewhere.
    Maybe even bad RAM (it's just a single bit anyway). But that's less
    likely.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036

  17. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    From: "Vegard Nossum"
    Date: Fri, 18 Jul 2008 09:03:50 +0200

    > > It's supposed to be 0x6b, this would be a "--"

    >
    > You're right! Oops. In my defence, I wrote that at 2 AM last night ;-)
    >
    > > Also it (more likely IMHO) could be clearing a flag with the value 0x01.

    >
    > It could be. But like I said in a later e-mail, the thing is likely
    > sk_buff->truesize. Which is not a flags variable. It _is_ however, a
    > counter, which is frequently -= and atomic_sub()ed.


    skb->truesize is never incremented or decremented by only one.

    Usually it is changed by the entire packet size, or at least one MSS's
    worth.

    On packet free, it will be decremented by at least sizeof(struct sk_buff)

  18. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten


    * Vegard Nossum wrote:

    > But.. it could also be some random corruption coming from elsewhere.
    > Maybe even bad RAM (it's just a single bit anyway). But that's less
    > likely.


    ok, i looked at the logs once more and while i thought that it occurred
    twice it only occurred once:

    Jul 17 20:22:14 europe kernel: BUG skbuff_head_cache: Poison overwritten

    .... this would explain why my attempts to bisect and reproduce it
    failed. I got too excited about it being seemingly reproducible and
    possibly bisectable (memory corruption bugs rarely are). My overnight
    reboot-the-same-kernel tests didn't show anything either.

    It's a known-reliable system with thousands of bootups:

    Jul 17 20:20:54 europe kernel: Linux version 2.6.26-tip (mingo@europe)
    (gcc version 4.2.2) #3094 SMP Thu Jul 17 20:19:27 CEST 2008

    .... but a hw fluke is never out of the question. (It wasn't a particularly
    hot day but the evening was unusually humid, maybe that made the
    difference.)

    So let's close this for now, it's not a reproducible regression that we
    can act upon. I'll update this thread if anything new happens.
    Netconsole is reliable on this system in any case.

    Ingo

  19. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    Hi Ingo,

    On Thu, 17 Jul 2008, Ingo Molnar wrote:
    > A regression to v2.6.26:
    >
    > I started getting this skb-head corruption message today, on a T60
    > laptop with e1000:


    [snip]

    On Thu, 17 Jul 2008, Ingo Molnar wrote:
    > Perhaps SLUB debugging got smarter?


    Nope.

    On Thu, 17 Jul 2008, Ingo Molnar wrote:
    > PM: Removing info for No Bus:vcs11
    > device: 'vcs11': device_create_release
    > =============================================================================
    > BUG skbuff_head_cache: Poison overwritten
    > -----------------------------------------------------------------------------
    >
    > INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b


    0x6b is POISON_FREE, so 0x6a is a one-bit corruption.

    > INFO: Allocated in __alloc_skb+0x2c/0x110 age=0 cpu=0 pid=5098
    > INFO: Freed in __kfree_skb+0x31/0x80 age=0 cpu=1 pid=4440
    > INFO: Slab 0xc16cc140 objects=16 used=1 fp=0xf658ae00 flags=0x400000c3
    > INFO: Object 0xf658ae00 @offset=3584 fp=0xf658af00
    >
    > Bytes b4 0xf658adf0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    > Object 0xf658ae00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf658ae10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf658ae20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf658ae30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf658ae40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf658ae50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf658ae60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf658ae70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk


    It's a bit unfortunate that we don't see the full dump of the corruption here
    because SLUB limits the output to 128 bytes. Ingo, you might want to try
    this patch so that we can see all of it:

    diff --git a/mm/slub.c b/mm/slub.c
    index 5f6e2c4..f69d181 100644
    --- a/mm/slub.c
    +++ b/mm/slub.c
    @@ -492,7 +492,7 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
     	if (p > addr + 16)
     		print_section("Bytes b4", p - 16, 16);

    -	print_section("Object", p, min(s->objsize, 128));
    +	print_section("Object", p, s->objsize);

     	if (s->flags & SLAB_RED_ZONE)
     		print_section("Redzone", p + s->objsize,
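
    The limit matters for this report because the corrupted byte lies past
    it. A quick sketch of the arithmetic, using the object and corruption
    addresses from the INFO lines; the function names and DUMP_LIMIT are
    invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define DUMP_LIMIT 128	/* bytes of the object shown without the patch */

/* Offset of the corrupted byte from the start of the object. */
static uint32_t corruption_offset(uint32_t object, uint32_t bad_byte)
{
	return bad_byte - object;
}

/* Would the corrupted byte fall inside the truncated object dump? */
static bool visible_in_dump(uint32_t object, uint32_t bad_byte)
{
	return corruption_offset(object, bad_byte) < DUMP_LIMIT;
}
```

    With object 0xf658ae00 and bad byte 0xf658ae9c the offset is 0x9c
    (156), so the byte is not part of the 128 bytes printed above.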

  20. Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten

    Hi Evgeniy,

    On Fri, Jul 18, 2008 at 8:46 AM, Evgeniy Polyakov wrote:
    > Hi Ingo.
    >
    > On Thu, Jul 17, 2008 at 11:42:22PM +0200, Ingo Molnar (mingo@elte.hu) wrote:
    >> Pid: 5098, comm: gdm-binary Not tainted 2.6.26-tip #3094
    >> [] print_trailer+0xa9/0xf0
    >> [] check_bytes_and_report+0x9b/0xc0
    >> [] check_object+0x19e/0x1e0
    >> [] __slab_alloc+0x371/0x4e0
    >> [] kmem_cache_alloc+0xb2/0xc0
    >> [] ? __alloc_skb+0x2c/0x110

    >
    > Out of curiosity, why does it scream at allocation time?


    Because it's checking for use-after-free errors. The object is
    poisoned with POISON_FREE when it's free'd and we verify the poison
    values at allocation time.
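
    As a rough userspace illustration of that scheme (a toy, not the SLUB
    code: OBJ_SIZE and both function names are hypothetical, and the real
    allocator has more to it than this):

```c
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b
#define OBJ_SIZE    64	/* hypothetical object size */

/* On free: fill the whole object with the poison byte. */
static void toy_poison_on_free(unsigned char *obj)
{
	memset(obj, POISON_FREE, OBJ_SIZE);
}

/*
 * On allocation: verify the poison is intact. Returns the offset of the
 * first overwritten byte, or -1 if the object was left untouched.
 */
static long toy_check_on_alloc(const unsigned char *obj)
{
	for (size_t i = 0; i < OBJ_SIZE; i++)
		if (obj[i] != POISON_FREE)
			return (long)i;
	return -1;
}
```

    A stray write between the two calls, for example obj[20]--, is only
    noticed by the next toy_check_on_alloc(), which returns 20. That is
    why the report fires from the allocation path.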

    On Fri, Jul 18, 2008 at 8:46 AM, Evgeniy Polyakov wrote:
    > Does SLUB have a debug check at freeing time? If so, how does it work
    > and why didn't it catch use after free there?


    You can't detect use after free before the object is actually free'd ;-)

    Pekka
