v2.6.26-rc7: BUG task_struct: Poison overwritten - Kernel

This is a discussion on v2.6.26-rc7: BUG task_struct: Poison overwritten - Kernel ; Hi, I don't really know what to do about this. The stacktrace is utterly uninteresting, too. Maybe you can extract some information from it? I was running lmbench and hotplug/-unplugging CPUs, some network- exercising code, and a grep on /sys. ...

+ Reply to Thread
Results 1 to 7 of 7

Thread: v2.6.26-rc7: BUG task_struct: Poison overwritten

  1. v2.6.26-rc7: BUG task_struct: Poison overwritten

    Hi,

    I don't really know what to do about this. The stacktrace is utterly
    uninteresting, too. Maybe you can extract some information from it?
    I was running lmbench and hotplug/-unplugging CPUs, some network-
    exercising code, and a grep on /sys.

    I have the config and vmlinux with debugging symbols if this is needed.

    Thanks.


    Vegard


    ================================================== ===========================
    BUG task_struct: Poison overwritten
    -----------------------------------------------------------------------------

    INFO: 0xeb2f3018-0xeb2f302b. First byte 0x35 instead of 0x6b
    INFO: Allocated in copy_process+0x70/0x1090 age=326 cpu=1 pid=2109
    INFO: Freed in free_task+0x2c/0x30 age=283 cpu=0 pid=19516
    INFO: Slab 0xc1974480 objects=8 used=7 fp=0xeb2f2fd0 flags=0x400020c3
    INFO: Object 0xeb2f2fd0 @offset=12240 fp=0x00000000

    Bytes b4 0xeb2f2fc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Object 0xeb2f2fd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xeb2f2fe0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xeb2f2ff0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xeb2f3000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xeb2f3010: 6b 6b 6b 6b 6b 6b 6b 6b 35 cc 4a 6a 1f 08 00 00 kkkkkkkk5Jj....
    Object 0xeb2f3020: 35 cc 4a 6a 6c 6b 6b 6b 6a 6b 6b eb 6b 6b 6b 6b 5Jjlkkkjkkkkkk
    Object 0xeb2f3030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xeb2f3040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Redzone 0xeb2f3f88: bb bb bb bb
    Padding 0xeb2f3fb0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Pid: 1326, comm: udevd Not tainted 2.6.26-rc7 #25
    [] print_trailer+0xaa/0xe0
    [] check_bytes_and_report+0x9b/0xc0
    [] check_object+0x193/0x1f0
    [] __slab_alloc+0x44b/0x5d0
    [] kmem_cache_alloc+0xb3/0xe0
    [] ? copy_process+0x70/0x1090
    [] ? copy_process+0x70/0x1090
    [] copy_process+0x70/0x1090
    [] ? remove_vma+0x49/0x60
    [] do_fork+0x57/0x280
    [] ? do_munmap+0x18a/0x1f0
    [] ? sys_munmap+0x46/0x60
    [] sys_clone+0x2f/0x40
    [] sysenter_past_esp+0x78/0xd1
    =======================
    FIX task_struct: Restoring 0xeb2f3018-0xeb2f302b=0x6b

    FIX task_struct: Marking all objects used

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: v2.6.26-rc7: BUG task_struct: Poison overwritten

    Oops, seems there was another one a bit earlier (about 5 minutes) that I
    didn't notice. I don't think it helps that much, but here it is:


    ================================================== ===========================
    BUG task_struct: Poison overwritten
    -----------------------------------------------------------------------------

    INFO: 0xf53ab018-0xf53ab02b. First byte 0x71 instead of 0x6b
    INFO: Allocated in copy_process+0x70/0x1090 age=110 cpu=1 pid=28664
    INFO: Freed in free_task+0x2c/0x30 age=68 cpu=0 pid=28667
    INFO: Slab 0xc1ba6cc0 objects=8 used=5 fp=0xf53aafd0 flags=0x400020c3
    INFO: Object 0xf53aafd0 @offset=12240 fp=0xf53acfb0

    Bytes b4 0xf53aafc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Object 0xf53aafd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf53aafe0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf53aaff0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf53ab000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf53ab010: 6b 6b 6b 6b 6b 6b 6b 6b 71 19 6f be dd 07 00 00 kkkkkkkkq.o
    ...
    Object 0xf53ab020: 71 19 6f be 6b 6b 6b 6b 6a 6b 6b eb 6b 6b 6b 6b q.okkkkjkkkkkk
    Object 0xf53ab030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xf53ab040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Redzone 0xf53abf88: bb bb bb bb

    Padding 0xf53abfb0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Pid: 1326, comm: udevd Not tainted 2.6.26-rc7 #25
    [] print_trailer+0xaa/0xe0
    [] check_bytes_and_report+0x9b/0xc0
    [] check_object+0x193/0x1f0
    [] __slab_alloc+0x44b/0x5d0
    [] kmem_cache_alloc+0xb3/0xe0
    [] ? copy_process+0x70/0x1090
    [] ? copy_process+0x70/0x1090
    [] copy_process+0x70/0x1090
    [] ? remove_vma+0x49/0x60
    [] do_fork+0x57/0x280
    [] ? do_munmap+0x18a/0x1f0
    [] ? sys_munmap+0x46/0x60
    [] sys_clone+0x2f/0x40
    [] sysenter_past_esp+0x78/0xd1
    =======================
    FIX task_struct: Restoring 0xf53ab018-0xf53ab02b=0x6b

    FIX task_struct: Marking all objects used

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: v2.6.26-rc7: BUG task_struct: Poison overwritten

    On Sat, Jun 21, 2008 at 9:28 PM, Vegard Nossum wrote:
    > Oops, seems there was another one a bit earlier (about 5 minutes) that I
    > didn't notice. I don't think it helps that much, but here it is:


    I actually got a third one too, but it's similar to the first two.

    >
    >
    > ================================================== ===========================
    > BUG task_struct: Poison overwritten
    > -----------------------------------------------------------------------------
    >
    > INFO: 0xf53ab018-0xf53ab02b. First byte 0x71 instead of 0x6b
    > INFO: Allocated in copy_process+0x70/0x1090 age=110 cpu=1 pid=28664
    > INFO: Freed in free_task+0x2c/0x30 age=68 cpu=0 pid=28667
    > INFO: Slab 0xc1ba6cc0 objects=8 used=5 fp=0xf53aafd0 flags=0x400020c3
    > INFO: Object 0xf53aafd0 @offset=12240 fp=0xf53acfb0
    >
    > Bytes b4 0xf53aafc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    > Object 0xf53aafd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf53aafe0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf53aaff0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf53ab000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > Object 0xf53ab010: 6b 6b 6b 6b 6b 6b 6b 6b 71 19 6f be dd 07 00 00 kkkkkkkkq.o
    ...
    > Object 0xf53ab020: 71 19 6f be 6b 6b 6b 6b 6a 6b 6b eb 6b 6b 6b 6b q.okkkkjkkkkkk


    So what to notice is that this is offset hex(0xf53ab018-0xf53aafc0) =
    '0x58L' from the beginning of the object (would be nice to have SLUB
    print that too, btw), which corresponds to (struct
    task_struct).se.vruntime (the "se" is a struct sched_entity). I'm
    putting Ingo and Peter on the Cc.

    What I find odd is that only some of the bytes in there are wrong,
    take the stray "eb" in the last line (above), for example. And these
    variables around offset 0x58 from the struct task struct are all u64s.
    Is it possible that the corruption comes from somewhere else?

    (Does the number look like a valid vruntime, for example?)

    For the record,

    $ grep SCHED .config
    CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
    CONFIG_GROUP_SCHED=y
    CONFIG_FAIR_GROUP_SCHED=y
    CONFIG_RT_GROUP_SCHED=y
    CONFIG_USER_SCHED=y
    # CONFIG_CGROUP_SCHED is not set
    CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
    CONFIG_SCHED_SMT=y
    CONFIG_SCHED_MC=y
    CONFIG_SCHED_HRTICK=y
    CONFIG_SCHED_DEBUG=y
    CONFIG_SCHEDSTATS=y


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: v2.6.26-rc7: BUG task_struct: Poison overwritten

    On Sat, 2008-06-21 at 22:41 +0200, Vegard Nossum wrote:
    > On Sat, Jun 21, 2008 at 9:28 PM, Vegard Nossum wrote:
    > > Oops, seems there was another one a bit earlier (about 5 minutes) that I
    > > didn't notice. I don't think it helps that much, but here it is:

    >
    > I actually got a third one too, but it's similar to the first two.
    >
    > >
    > >
    > > ================================================== ===========================
    > > BUG task_struct: Poison overwritten
    > > -----------------------------------------------------------------------------
    > >
    > > INFO: 0xf53ab018-0xf53ab02b. First byte 0x71 instead of 0x6b
    > > INFO: Allocated in copy_process+0x70/0x1090 age=110 cpu=1 pid=28664
    > > INFO: Freed in free_task+0x2c/0x30 age=68 cpu=0 pid=28667
    > > INFO: Slab 0xc1ba6cc0 objects=8 used=5 fp=0xf53aafd0 flags=0x400020c3
    > > INFO: Object 0xf53aafd0 @offset=12240 fp=0xf53acfb0
    > >
    > > Bytes b4 0xf53aafc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    > > Object 0xf53aafd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > > Object 0xf53aafe0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > > Object 0xf53aaff0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > > Object 0xf53ab000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    > > Object 0xf53ab010: 6b 6b 6b 6b 6b 6b 6b 6b 71 19 6f be dd 07 00 00 kkkkkkkkq.o
    ...
    > > Object 0xf53ab020: 71 19 6f be 6b 6b 6b 6b 6a 6b 6b eb 6b 6b 6b 6b q.okkkkjkkkkkk

    >
    > So what to notice is that this is offset hex(0xf53ab018-0xf53aafc0) =
    > '0x58L' from the beginning of the object (would be nice to have SLUB
    > print that too, btw), which corresponds to (struct
    > task_struct).se.vruntime (the "se" is a struct sched_entity). I'm
    > putting Ingo and Peter on the Cc.
    >
    > What I find odd is that only some of the bytes in there are wrong,
    > take the stray "eb" in the last line (above), for example. And these
    > variables around offset 0x58 from the struct task struct are all u64s.
    > Is it possible that the corruption comes from somewhere else?
    >
    > (Does the number look like a valid vruntime, for example?)


    Hard to tell, its a virtual timeline so pretty much any value is a valid
    value - you'd need a contemporary vruntime from the same cfs_rq to
    compare against.

    But it looks like there might be some cpu hotplug race with group
    scheduling - heiko (s390) and avi (x86_64) reported some cpu hotplug
    crashes. We're still looking into those.

    > For the record,
    >
    > $ grep SCHED .config
    > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
    > CONFIG_GROUP_SCHED=y
    > CONFIG_FAIR_GROUP_SCHED=y
    > CONFIG_RT_GROUP_SCHED=y
    > CONFIG_USER_SCHED=y
    > # CONFIG_CGROUP_SCHED is not set
    > CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
    > CONFIG_SCHED_SMT=y
    > CONFIG_SCHED_MC=y
    > CONFIG_SCHED_HRTICK=y
    > CONFIG_SCHED_DEBUG=y
    > CONFIG_SCHEDSTATS=y





    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: v2.6.26-rc7: BUG task_struct: Poison overwritten

    On Sat, Jun 21, 2008 at 11:21 PM, Peter Zijlstra wrote:
    > But it looks like there might be some cpu hotplug race with group
    > scheduling - heiko (s390) and avi (x86_64) reported some cpu hotplug
    > crashes. We're still looking into those.


    Thanks.

    I was poking around in kernel/sched.c and noticed something odd: In
    migrate_dead(), we have this code:

    /*
    * Drop lock around migration; if someone else moves it,
    * that's OK. No task can be added to this CPU, so iteration is
    * fine.
    */
    spin_unlock_irq(&rq->lock);
    move_task_off_dead_cpu(dead_cpu, p);
    spin_lock_irq(&rq->lock);

    which is fine in itself, I guess. But spin_unlock_irq() will enable
    interrupts. And move_task_off_dead_cpu() has this comment:

    /*
    * Figure out where task on dead CPU should go, use force if necessary.
    * NOTE: interrupts should be disabled by the caller
    */
    static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
    {

    ....but here, interrupts will not be disabled. On the other hand
    __migrate_task_irq() (called by move_task_off_dead_cpu()) calls
    local_irq_disable() itself... What do you think of this? Is the
    comment wrong? Or is there a difference between "interrupts" and
    "local_irq"?


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: v2.6.26-rc7: BUG task_struct: Poison overwritten

    On Sat, Jun 21, 2008 at 10:41 PM, Vegard Nossum wrote:
    >> INFO: 0xf53ab018-0xf53ab02b. First byte 0x71 instead of 0x6b
    >> INFO: Allocated in copy_process+0x70/0x1090 age=110 cpu=1 pid=28664
    >> INFO: Freed in free_task+0x2c/0x30 age=68 cpu=0 pid=28667
    >> INFO: Slab 0xc1ba6cc0 objects=8 used=5 fp=0xf53aafd0 flags=0x400020c3
    >> INFO: Object 0xf53aafd0 @offset=12240 fp=0xf53acfb0
    >>
    >> Bytes b4 0xf53aafc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    >> Object 0xf53aafd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    >> Object 0xf53aafe0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    >> Object 0xf53aaff0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    >> Object 0xf53ab000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    >> Object 0xf53ab010: 6b 6b 6b 6b 6b 6b 6b 6b 71 19 6f be dd 07 00 00 kkkkkkkkq.o
    ...
    >> Object 0xf53ab020: 71 19 6f be 6b 6b 6b 6b 6a 6b 6b eb 6b 6b 6b 6b q.okkkkjkkkkkk

    >
    > So what to notice is that this is offset hex(0xf53ab018-0xf53aafc0) =
    > '0x58L' from the beginning of the object (would be nice to have SLUB
    > print that too, btw), which corresponds to (struct
    > task_struct).se.vruntime (the "se" is a struct sched_entity). I'm
    > putting Ingo and Peter on the Cc.


    Oops, this calculation is bogus. The start of the object is
    0xf53aafd0. This corresponds to 0x48L or .se.exec_start. So it seems
    that exec_start, sum_exec_runtime, and vruntime have all been written
    after the object was freed.

    I guess the high number of 6bs in there is simply because the poison
    was read, used for some calculation, then written back with not all
    bits modified by the calculation.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: v2.6.26-rc7: BUG task_struct: Poison overwritten

    On 06/22, Vegard Nossum wrote:
    >
    > I was poking around in kernel/sched.c and noticed something odd: In
    > migrate_dead(), we have this code:
    >
    > /*
    > * Drop lock around migration; if someone else moves it,
    > * that's OK. No task can be added to this CPU, so iteration is
    > * fine.
    > */
    > spin_unlock_irq(&rq->lock);
    > move_task_off_dead_cpu(dead_cpu, p);
    > spin_lock_irq(&rq->lock);
    >
    > which is fine in itself, I guess. But spin_unlock_irq() will enable
    > interrupts. And move_task_off_dead_cpu() has this comment:
    >
    > /*
    > * Figure out where task on dead CPU should go, use force if necessary.
    > * NOTE: interrupts should be disabled by the caller
    > */
    > static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
    > {
    >
    > ...but here, interrupts will not be disabled. On the other hand
    > __migrate_task_irq() (called by move_task_off_dead_cpu()) calls
    > local_irq_disable() itself... What do you think of this? Is the
    > comment wrong? Or is there a difference between "interrupts" and
    > "local_irq"?


    Yes, the comment is wrong, thanks.

    It wasn't updated by

    "do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)"
    http://git.kernel.org/?p=linux/kerne...9424dfba5056c2

    Previously move_task_off_dead_cpu() was called under write_lock_irq(tasklist),
    and we can't take tasklist for writing without disabling irqs.

    If you don't see other problems, could you make a patch to fix the comment?

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread