
Thread: 2.6.23-rc7-mm1: panic in scheduler

  1. 2.6.23-rc7-mm1: panic in scheduler

    I looked around on the MLs for mention of this, but didn't find anything
    that appeared to match.

    Platform: HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]

    2.6.23-rc7-mm1 broken out -- panic occurs when git-sched.patch pushed:

    Unable to handle kernel NULL pointer dereference (address 0000000000000000)
    swapper[0]: Oops 8813272891392 [1]
    Modules linked in: scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore

    Pid: 0, CPU 14, comm: swapper
    psr : 0000101008522030 ifs : 8000000000000002 ip : [] Not tainted
    ip is at rb_next+0x0/0x140
    unat: 0000000000000000 pfs : 0000000000000308 rsc : 0000000000000003
    rnat: 8000000000000012 bsps: 000000000001003e pr : 6609a840599519a5
    ldrs: 0000000000000000 ccv : 0000000000000002 fpsr: 0009804c8a70433f
    csd : 0000000000000000 ssd : 0000000000000000
    b0 : a000000100078dc0 b6 : a000000100074a40 b7 : a000000100078e00
    f6 : 1003e0000000000000000 f7 : 1003e0000000000400000
    f8 : 1003e000000002aaaaaab f9 : 1003e0000000d43798a2b
    f10 : 1003e35e9970b967dd8b9 f11 : 1003e0000000000000002
    r1 : a000000100bc0920 r2 : e0000760000577f0 r3 : e000076000057f10
    r8 : fffffffffffffff0 r9 : 0000000000000002 r10 : e000076000057780
    r11 : 0000000000000000 r12 : e00007004160fe10 r13 : e000070041608000
    r14 : 0000000000000000 r15 : 000000000000000e r16 : 00000007f6c30a22
    r17 : e000070041608040 r18 : a0000001008383a8 r19 : a000000100078e00
    r20 : e000076000055bb8 r21 : e000076000055bb0 r22 : e000076000057ed0
    r23 : 00000000000f4240 r24 : a0000001009e0440 r25 : e000070041608bb4
    r26 : 0000000000000000 r27 : 0000000000000000 r28 : e000076000057f80
    r29 : 00000000000002e7 r30 : 0000000000000000 r31 : e000076000057780

    Call Trace:
    [] show_stack+0x80/0xa0
    sp=e00007004160f9e0 bsp=e000070041609008
    [] show_regs+0x870/0x8a0
    sp=e00007004160fbb0 bsp=e000070041608fa8
    [] die+0x190/0x300
    sp=e00007004160fbb0 bsp=e000070041608f60
    [] ia64_do_page_fault+0x780/0xa80
    sp=e00007004160fbb0 bsp=e000070041608f08
    [] ia64_leave_kernel+0x0/0x270
    sp=e00007004160fc40 bsp=e000070041608f08
    [] rb_next+0x0/0x140
    sp=e00007004160fe10 bsp=e000070041608ef8
    [] __dequeue_entity+0x80/0xc0
    sp=e00007004160fe10 bsp=e000070041608ec8
    [] pick_next_task_fair+0x60/0x180
    sp=e00007004160fe10 bsp=e000070041608e98
    [] schedule+0x340/0x19c0
    sp=e00007004160fe10 bsp=e000070041608cc0
    [] cpu_idle+0x290/0x3e0
    sp=e00007004160fe30 bsp=e000070041608c50
    [] start_secondary+0x380/0x5a0
    sp=e00007004160fe30 bsp=e000070041608c00
    [] __kprobes_text_end+0x6c0/0x6f0
    sp=e00007004160fe30 bsp=e000070041608c00


    Taking a quick look at [__]{en|de}queue_entity() and the functions they call,
    I see something suspicious in set_leftmost() in sched_fair.c:

    static inline void
    set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
    {
    	struct sched_entity *se;

    	cfs_rq->rb_leftmost = leftmost;
    	if (leftmost)
    		se = rb_entry(leftmost, struct sched_entity, run_node);
    }

    Missing code? Corrupt patch?
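
    For reference, rb_entry() is just container_of(), so the 'se' local above is
    computed and then thrown away; the only real work in the function is the
    cfs_rq->rb_leftmost assignment. Below is a minimal userspace sketch of that
    computation - purely illustrative, with a made-up struct layout, not kernel
    code:

    #include <stddef.h>
    #include <stdio.h>

    struct rb_node { struct rb_node *left, *right; };	/* stand-in node */

    struct sched_entity {
    	long vruntime;
    	struct rb_node run_node;	/* node embedded in the entity, as in CFS */
    };

    /* simplified rb_entry()/container_of(): embedded-node pointer -> enclosing struct */
    #define rb_entry(ptr, type, member) \
    	((type *)((char *)(ptr) - offsetof(type, member)))

    int main(void)
    {
    	struct sched_entity e = { .vruntime = 42 };
    	struct rb_node *leftmost = &e.run_node;

    	/* the computation set_leftmost() performs and then discards */
    	struct sched_entity *se = rb_entry(leftmost, struct sched_entity, run_node);

    	printf("recovered vruntime = %ld\n", se->vruntime);
    	return 0;
    }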

    Config available on request, but there doesn't seem to be much in the way
    of scheduler config options. A few that might apply:

    SCHED_SMT is not set
    SCHED_DEBUG=y
    SCHEDSTATS=y


    Regards,
    Lee Schermerhorn



  2. Re: 2.6.23-rc7-mm1: panic in scheduler

    Lee Schermerhorn wrote:
    > I looked around on the MLs for mention of this, but didn't find anything
    > that appeared to match.
    >
    > Platform: HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]
    >
    > 2.6.23-rc7-mm1 broken out -- panic occurs when git-sched.patch pushed:
    >
    > Unable to handle kernel NULL pointer dereference (address 0000000000000000)


    Exactly the same call trace is produced on IA64 Madison (up to 9M cache) with 8 CPUs.
    --
    Thanks & Regards,
    Kamalesh Babulal,
    Linux Technology Center,
    IBM, ISTL.

  3. Re: 2.6.23-rc7-mm1: panic in scheduler


    * Lee Schermerhorn wrote:

    > Taking a quick look at [__]{en|de}queue_entity() and the functions
    > they call, I see something suspicious in set_leftmost() in
    > sched_fair.c:
    >
    > static inline void
    > set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
    > {
    > 	struct sched_entity *se;
    >
    > 	cfs_rq->rb_leftmost = leftmost;
    > 	if (leftmost)
    > 		se = rb_entry(leftmost, struct sched_entity, run_node);
    > }
    >
    > Missing code? Corrupt patch?


    Could you pull this git tree on top of a -rc7 (or later) upstream tree:

    git-pull git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

    Does that solve the crash?
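
    Roughly, the test sequence would be something like the following (a sketch
    only; the branch name and config path are placeholders):

    # start from a plain upstream tree at v2.6.23-rc7 (or later)
    $ cd linux-2.6
    $ git checkout -b sched-test v2.6.23-rc7

    # merge the sched-devel tree on top of it
    $ git pull git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

    # rebuild with the same .config, install, reboot and retest
    $ cp /path/to/old/.config .config
    $ make oldconfig && make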

    The above set_leftmost() code used to be larger, and the bits that remain
    are now indeed mostly dead code. I've queued up a clean-up patch for that -
    see the patch below. It should not impact correctness, though, so if you
    can still trigger the crash with the latest sched-devel.git tree we'd
    like to know about it.

    Ingo

    ------------------->
    Subject: sched: remove set_leftmost()
    From: Ingo Molnar

    Lee Schermerhorn noticed that set_leftmost() contains dead code;
    remove it.

    Reported-by: Lee Schermerhorn
    Signed-off-by: Ingo Molnar
    ---
    kernel/sched_fair.c | 14 ++------------
    1 file changed, 2 insertions(+), 12 deletions(-)

    Index: linux/kernel/sched_fair.c
    ===================================================================
    --- linux.orig/kernel/sched_fair.c
    +++ linux/kernel/sched_fair.c
    @@ -124,16 +124,6 @@ max_vruntime(u64 min_vruntime, u64 vrunt
     	return min_vruntime;
     }
     
    -static inline void
    -set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
    -{
    -	struct sched_entity *se;
    -
    -	cfs_rq->rb_leftmost = leftmost;
    -	if (leftmost)
    -		se = rb_entry(leftmost, struct sched_entity, run_node);
    -}
    -
     static inline s64
     entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
     {
    @@ -175,7 +165,7 @@ __enqueue_entity(struct cfs_rq *cfs_rq,
     	 * used):
     	 */
     	if (leftmost)
    -		set_leftmost(cfs_rq, &se->run_node);
    +		cfs_rq->rb_leftmost = &se->run_node;
     
     	rb_link_node(&se->run_node, parent, link);
     	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
    @@ -185,7 +175,7 @@ static void
     __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
     {
     	if (cfs_rq->rb_leftmost == &se->run_node)
    -		set_leftmost(cfs_rq, rb_next(&se->run_node));
    +		cfs_rq->rb_leftmost = rb_next(&se->run_node);
     
     	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
     }

  4. Re: 2.6.23-rc7-mm1: panic in scheduler

    On 9/25/07, Kamalesh Babulal wrote:
    > Exactly the same call trace is produced on IA64 Madison (up to 9M cache) with 8 CPUs.
    > --


    Hi, Kamalesh,

    Could you please reproduce the problem or share the steps to reproduce
    the problem?

    Thanks,
    Balbir

  5. Re: 2.6.23-rc7-mm1: panic in scheduler

    Balbir Singh wrote:
    > On 9/25/07, Kamalesh Babulal wrote:
    >> Exactly the same call trace is produced on IA64 Madison (up to 9M cache) with 8 CPUs.
    >> --

    >
    > Hi, Kamalesh,
    >
    > Could you please reproduce the problem or share the steps to reproduce
    > the problem?
    >
    > Thanks,
    > Balbir
    > -


    Hi Balbir,

    Yes, I am able to reproduce the problem. It can be reproduced using
    ltprunall.

    --
    Thanks & Regards,
    Kamalesh Babulal,
    Linux Technology Center,
    IBM, ISTL.

  6. Re: 2.6.23-rc7-mm1: panic in scheduler

    On Tue, 2007-09-25 at 13:32 +0530, Kamalesh Babulal wrote:
    > Yes, I am able to reproduce the problem. It can be reproduced using
    > ltprunall.


    I see the problem just trying to boot; I have yet to successfully boot
    2.6.23-rc7-mm1 on my platform. [But I'll try Ingo's dev tree real soon
    now...]

    Lee

