2.6.24-rc3-mm2 - Kernel

Thread: 2.6.24-rc3-mm2

  1. Re: 2.6.24-rc3-mm2

    Andrew Morton wrote:
    > On Thu, 29 Nov 2007 21:58:16 +0100
    > "Torsten Kaiser" wrote:
    >> First crash:
    >>
    >> [ 1116.083651] Unable to handle kernel NULL pointer dereference at
    >> 0000000000000378 RIP:
    >> [ 1116.089216] [] ether1394_dg_complete+0x28/0xa0

    ....
    > Yep, looks like a genuine 1394 bug.


    I can't make head or tail of it.

    FWIW, eth1394 and the entire rest of the 1394 stack beneath eth1394 are
    identical between -mm and Linus' tree.
    --
    Stefan Richter
    -=====-=-=== =-== ===-=
    http://arcgraph.de/sr/
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: named + capset = EPERM [Was: 2.6.24-rc3-mm2]

    On 11/29/2007 01:17 AM, Serge E. Hallyn wrote:
    > From 70d5da610fdbd66a36886c01e27b7fb11d2de044 Mon Sep 17 00:00:00 2001
    > From: sergeh@us.ibm.com
    > Date: Wed, 28 Nov 2007 16:16:23 -0800
    > Subject: [PATCH 1/1] capabilities: correct logic at capset_check
    >
    > Fix typo at capset_check introduced with capability bounding set
    > patch.
    >
    > Signed-off-by: sergeh@us.ibm.com


    Tested-by: Jiri Slaby

    > ---
    > security/commoncap.c | 2 +-
    > 1 files changed, 1 insertions(+), 1 deletions(-)
    >
    > diff --git a/security/commoncap.c b/security/commoncap.c
    > index c25ad09..503e958 100644
    > --- a/security/commoncap.c
    > +++ b/security/commoncap.c
    > @@ -119,7 +119,7 @@ int cap_capset_check (struct task_struct *target, kernel_cap_t *effective,
    > /* incapable of using this inheritable set */
    > return -EPERM;
    > }
    > - if (!!cap_issubset(*inheritable,
    > + if (!cap_issubset(*inheritable,
    > cap_combine(target->cap_inheritable,
    > current->cap_bset))) {
    > /* no new pI capabilities outside bounding set */


    Thanks.
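
To see why the doubled negation in the patch above mattered, here is a minimal user-space sketch. It assumes a capability set fits in a single 32-bit mask; `cap_mask_t`, `mask_issubset`, `mask_combine`, `SKETCH_EPERM` and the two `check_*` helpers are hypothetical stand-ins for illustration, not the kernel's `kernel_cap_t` API.

```c
#include <stdint.h>

/* Hypothetical stand-in for kernel_cap_t: one bit per capability. */
typedef uint32_t cap_mask_t;

/* Hypothetical error value; the kernel uses EPERM from its own headers. */
#define SKETCH_EPERM 1

/* Nonzero when every bit set in a is also set in set. */
static int mask_issubset(cap_mask_t a, cap_mask_t set)
{
    return (a & ~set) == 0;
}

/* Union of two capability masks. */
static cap_mask_t mask_combine(cap_mask_t a, cap_mask_t b)
{
    return a | b;
}

/*
 * Buggy form: "!!" only normalizes the result to 0 or 1, it does not
 * invert it, so a request that IS within bounds gets rejected.
 */
static int check_buggy(cap_mask_t requested, cap_mask_t inheritable,
                       cap_mask_t bset)
{
    if (!!mask_issubset(requested, mask_combine(inheritable, bset)))
        return -SKETCH_EPERM;
    return 0;
}

/* Fixed form: reject only when the request falls outside the bounds. */
static int check_fixed(cap_mask_t requested, cap_mask_t inheritable,
                       cap_mask_t bset)
{
    if (!mask_issubset(requested, mask_combine(inheritable, bset)))
        return -SKETCH_EPERM;
    return 0;
}
```

With the buggy check, a caller asking for a set it is entitled to gets an error, which matches the EPERM from capset mentioned in the subject line.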

  3. Re: 2.6.24-rc3-mm2 (bugfix for memory cgroup per-zone-struct allocation.)

    On Thu, 29 Nov 2007 16:25:33 -0500
    Lee Schermerhorn wrote:
    > > - pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
    > > + /*
    > > + * This routine is called against possible nodes.
    > > + * But it's BUG to call kmalloc() against offline node.
    > > + *
    > > + * TODO: this routine can waste much memory for nodes which will
    > > + * never be onlined. It's better to use memory hotplug callback
    > > + * function.
    > > + */
    > > + if (node_state(node, N_HIGH_MEMORY))
    > > + pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
    > > + else
    > > + pn = kmalloc(sizeof(*pn), GFP_KERNEL);
    > > if (!pn)
    > > return 1;
    > >
    > >

    >
    > This worked for me. Can boot 24-rc3-mm2 [if I turn off async scsi scan,
    > that is--not related to mem controller].
    >

    Thank you !

    > Just FYI, on my ia64 platform, with NODES_SHIFT == 8 [RHEL & SLES ship
    > with 10, I believe], the size of the mem_cgroup structure is ~10KB.
    >

    Yes. But...
    I'll ask Goto-san how memory hotplug callback works and try it.

    Thanks,
    -Kame
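
The fallback in the quoted hunk can be illustrated in user space. This is only a sketch under stated assumptions: the `sketch_*` names, the fixed four-node table, and the use of `malloc` are all hypothetical stand-ins for `node_state()`, `kmalloc_node()` and `kmalloc()`, chosen so the branch logic can be run outside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 4

/* Hypothetical stand-in for node_state(node, N_HIGH_MEMORY). */
static bool sketch_node_has_memory[SKETCH_MAX_NODES] = {
    true, false, true, false
};

/* Records which path the last allocation took (illustration only). */
static int sketch_last_node = -2;

/* Stand-in for kmalloc_node(): pretend to allocate on a specific node. */
static void *sketch_alloc_on_node(size_t size, int node)
{
    sketch_last_node = node;
    return malloc(size);
}

/* Stand-in for kmalloc(): no node preference, recorded as -1. */
static void *sketch_alloc_any(size_t size)
{
    sketch_last_node = -1;
    return malloc(size);
}

/*
 * Allocate per-node data: use the node-local allocator only when the
 * node actually has memory, otherwise fall back to a plain allocation.
 */
static void *sketch_alloc_per_node_info(size_t size, int node)
{
    if (node >= 0 && node < SKETCH_MAX_NODES && sketch_node_has_memory[node])
        return sketch_alloc_on_node(size, node);
    return sketch_alloc_any(size);
}
```

As the TODO in the patch notes, this still allocates for nodes that may never come online; the sketch only shows how the crash on offline nodes is avoided.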



  4. Re: [BUG] 2.6.24-rc3-mm2 soft lockup while running tbench

    Andrew Morton wrote:
    > On Wed, 28 Nov 2007 20:03:22 +0530
    > Kamalesh Babulal wrote:
    >
    >> Hi Andrew,
    >>
    >> while running tbench on powerpc with 2.6.24-rc3-mm2, a soft lockup occurs
    >>
    >> BUG: soft lockup - CPU#0 stuck for 11s! [tbench:12183]
    >> NIP: c0000000000ac978 LR: c0000000000acff0 CTR: c00000000005c648
    >> REGS: C00000076F0F3200 TRAP: 0901 Not tainted (2.6.24-rc3-mm2-autotest)
    >> MSR: 8000000000009032 CR: 44000482 XER: 00000000
    >> TASK = C00000076F4BC000[12183] 'tbench' THREAD: C00000076F0F0000 CPU: 0
    >> NIP [c0000000000ac978] .get_page_from_freelist+0x1cc/0x754
    >> LR [c0000000000acff0] .__alloc_pages+0xb0/0x3a8
    >> Call Trace:
    >> [c00000076f0f3480] [c00000076f0f3560] 0xc00000076f0f3560 (unreliable)
    >> [c00000076f0f3590] [c0000000000acff0] .__alloc_pages+0xb0/0x3a8
    >> [c00000076f0f3680] [c0000000000ce2e4] .alloc_pages_current+0xa8/0xc8
    >> [c00000076f0f3710] [c0000000000ac6ec] .__get_free_pages+0x20/0x70
    >> [c00000076f0f3790] [c0000000000d75c8] .__kmalloc_node_track_caller+0x60/0x148
    >> [c00000076f0f3840] [c0000000002c22b0] .__alloc_skb+0x98/0x184
    >> [c00000076f0f38f0] [c000000000306cd8] .tcp_sendmsg+0x1fc/0xe24
    >> [c00000076f0f3a10] [c0000000002b963c] .sock_sendmsg+0xe4/0x128
    >> [c00000076f0f3c10] [c0000000002ba4ec] .sys_sendto+0xd4/0x120
    >> [c00000076f0f3d90] [c0000000002df2f8] .compat_sys_socketcall+0x148/0x214
    >> [c00000076f0f3e30] [c00000000000872c] syscall_exit+0x0/0x40
    >> Instruction dump:
    >> 720b0001 eb970000 40820070 72000002 4182000c e8bc0000 48000018 72080004
    >> 4182000c e8bc0008 48000008 e8bc0010 7f83e378 7de407b4 7e078378
    >>

    >
    > hm. Beats me. Does the machine recover OK?
    > -

    Hi Andrew,

    In the set of test cases run serially, the soft lockup is seen in tbench;
    the remaining test cases then run successfully after the soft lockup.

    --
    Thanks & Regards,
    Kamalesh Babulal,
    Linux Technology Center,
    IBM, ISTL.

  5. Re: [PATCH] 2.6.24-rc3-mm2 build failure pasemi-rng driver

    On Wed, Nov 28, 2007 at 07:52:01PM +0530, Kamalesh Babulal wrote:
    > Hi Andrew,
    >
    > The kernel build fails with the message
    >
    > CC drivers/char/hw_random/pasemi-rng.o
    > drivers/char/hw_random/pasemi-rng.c: In function 'pasemi_rng_data_present':
    > drivers/char/hw_random/pasemi-rng.c:53: error: 'wait' undeclared (first use in this function)
    > drivers/char/hw_random/pasemi-rng.c:53: error: (Each undeclared identifier is reported only once
    > drivers/char/hw_random/pasemi-rng.c:53: error: for each function it appears in.)
    > drivers/char/hw_random/pasemi-rng.c: At top level:
    > drivers/char/hw_random/pasemi-rng.c:93: warning: initialization from incompatible pointer type
    > make[3]: *** [drivers/char/hw_random/pasemi-rng.o] Error 1
    > make[2]: *** [drivers/char/hw_random] Error 2
    > make[1]: *** [drivers/char] Error 2
    > make: *** [drivers] Error 2
    >
    > Tested for build failure, only.


    Fix works. Sorry for the delay, it's been a crazy week with other stuff.

    > Signed-off-by: Kamalesh Babulal


    Acked-by: Olof Johansson

  6. Re: [BUG] 2.6.24-rc3-mm2 kernel bug on nfs & cifs mounted partitions

    Jan Kara wrote:
    > On Thu 29-11-07 17:27:08, Kamalesh Babulal wrote:
    >> Andrew Morton wrote:
    >>> On Thu, 29 Nov 2007 14:30:14 +0530 Kamalesh Babulal wrote:
    >>>
    >>>> Hi Andrew,
    >>>>
    >>>> While running file system stress on nfs and cifs mounted partitions, the machine
    >>>> drops to xmon
    >>>>
    >>>> 1:mon> e
    >>>> cpu 0x1: Vector: 300 (Data Access) at [c000000080a9f880]
    >>>> pc: c0000000001392c8: .inotify_inode_queue_event+0x50/0x158
    >>>> lr: c0000000001074d0: .vfs_link+0x204/0x298
    >>>> sp: c000000080a9fb00
    >>>> msr: 8000000000009032
    >>>> dar: 280
    >>>> dsisr: 40010000
    >>>> current = 0xc0000000c8e6f670
    >>>> paca = 0xc000000000512c00
    >>>> pid = 2848, comm = fsstress
    >>>> 1:mon> t
    >>>> [c000000080a9fbd0] c0000000001074d0 .vfs_link+0x204/0x298
    >>>> [c000000080a9fc70] c00000000010b6e0 .sys_linkat+0x134/0x1b4
    >>>> [c000000080a9fe30] c00000000000872c syscall_exit+0x0/0x40
    >>>> --- Exception: c00 (System Call) at 000000000ff1bdfc
    >>>> SP (ffeaed10) is in userspace
    >>>> 1:mon> r
    >>>> R00 = c0000000001074d0 R16 = 0000000000000000
    >>>> R01 = c000000080a9fb00 R17 = 0000000000000000
    >>>> R02 = c00000000060c380 R18 = 0000000000000000
    >>>> R03 = 0000000000000000 R19 = 0000000000000000
    >>>> R04 = 0000000000000004 R20 = 0000000000000000
    >>>> R05 = 0000000000000000 R21 = 0000000000000000
    >>>> R06 = 0000000000000000 R22 = 0000000000000000
    >>>> R07 = 0000000000000000 R23 = 0000000000000004
    >>>> R08 = 0000000000000000 R24 = 0000000000000280
    >>>> R09 = 0000000000000000 R25 = fffffffffffff000
    >>>> R10 = 0000000000000001 R26 = c000000082827790
    >>>> R11 = c0000000003963e8 R27 = c0000000828275a0
    >>>> R12 = d000000000deec78 R28 = 0000000000000000
    >>>> R13 = c000000000512c00 R29 = c00000007b18fcf0
    >>>> R14 = 0000000000000000 R30 = c0000000005bc088
    >>>> R15 = 0000000000000000 R31 = 0000000000000000
    >>>> pc = c0000000001392c8 .inotify_inode_queue_event+0x50/0x158
    >>>> lr = c0000000001074d0 .vfs_link+0x204/0x298
    >>>> msr = 8000000000009032 cr = 24000882
    >>>> ctr = c0000000003963e8 xer = 0000000000000000 trap = 300
    >>>> dar = 0000000000000280 dsisr = 40010000
    >>>>
    >>>>
    >>>> The gdb output shows
    >>>>
    >>>> 0xc0000000001076d4 is in vfs_symlink (include/linux/fsnotify.h:108).
    >>>> 103 * fsnotify_create - 'name' was linked in
    >>>> 104 */
    >>>> 105 static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
    >>>> 106 {
    >>>> 107 inode_dir_notify(inode, DN_CREATE);
    >>>> 108 inotify_inode_queue_event(inode, IN_CREATE, 0, dentry->d_name.name,
    >>>> 109 dentry->d_inode);
    >>>> 110 audit_inode_child(dentry->d_name.name, dentry, inode);
    >>>> 111 }
    >>>> 112
    >>>>
    >>> If it is reproducible can you please try reverting
    >>> inotify-send-in_attrib-events-when-link-count-changes.patch?

    >> Hi Andrew,
    >>
    >> reverting the patch inotify-send-in_attrib-events-when-link-count-changes.patch, the
    >> bug is not reproduced.

    > OK, it's a problem with CIFS. Its cifs_hardlink() function doesn't call
    > d_instantiate() and thus returns a dentry with d_inode set to NULL. I'm not
    > sure if such behavior is really correct but anyway, attached is a new
    > version of the patch which should handle it gracefully. Kamalesh, can you
    > please give it a try? Thanks.
    >
    > Honza

    Hi Jan,

    Thanks, the patch fixes the bug.
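
The failure mode Jan describes, a dentry handed back with d_inode still NULL and then dereferenced by the notification code, can be sketched in user space. The patch itself is not quoted in this thread, so the following is not Jan's fix; it is one plausible shape of such a guard, and every `sketch_*` name and field is hypothetical.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for struct inode / struct dentry. */
struct sketch_inode {
    unsigned int nlink;
    int attrib_events;   /* counts queued attribute-change events */
};

struct sketch_dentry {
    struct sketch_inode *d_inode;   /* NULL if never instantiated */
};

/*
 * Queue an attribute-change event for the inode behind new_dentry.
 * Returns 0 if an event was queued, -1 if the dentry has no inode and
 * the event was skipped instead of dereferencing a NULL pointer.
 */
static int sketch_notify_link(struct sketch_dentry *new_dentry)
{
    struct sketch_inode *inode = new_dentry->d_inode;

    if (inode == NULL)
        return -1;

    inode->attrib_events++;
    return 0;
}
```

Without the NULL check, the access would fault at a small offset from address zero, which is consistent with the low dar value (0x280) in the xmon output above.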

    --
    Thanks & Regards,
    Kamalesh Babulal,
    Linux Technology Center,
    IBM, ISTL.

  7. Re: 2.6.24-rc3-mm2

    On Nov 29, 2007 10:07 PM, Andrew Morton wrote:
    > On Thu, 29 Nov 2007 21:58:16 +0100
    > "Torsten Kaiser" wrote:
    >
    > > But after ~1h of usage I got two different crashes on my x86_64 box.

    >
    > Nice, thanks. By finding these now you (hopefully) saved a whole lot of
    > people a whole lot of grief a couple months from now.


    That's part of why I use/test the mm-kernels. :-)

    > > I hope, the CC's are correct...

    >
    > Bruce works on NFS things too.
    >
    >
    > > First crash:
    > >
    > > [ 1116.083651] Unable to handle kernel NULL pointer dereference at
    > > 0000000000000378 RIP:
    > > [ 1116.089216] [] ether1394_dg_complete+0x28/0xa0
    > > [ 1116.097883] PGD 51880067 PUD 4a08b067 PMD 0
    > > [ 1116.102232] Oops: 0000 [1] SMP
    > > [ 1116.105423] last sysfs file:
    > > /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map


    [snip]

    > Yep, looks like a genuine 1394 bug.
    > > I then changed the network from ether1394 to a real network card, but
    > > this also crashed:
    > > [ 602.464580] ------------[ cut here ]------------
    > > [ 602.469250] kernel BUG at lib/list_debug.c:33!
    > > [ 602.473731] invalid opcode: 0000 [1] SMP
    > > [ 602.477828] last sysfs file:
    > > /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map

    [snip]
    > > [ 602.515102] Pid: 7452, comm: nfsv4-svc Not tainted 2.6.24-rc3-mm2 #1

    [snip]
    > > Both times the system hung with Caps Lock and Scroll Lock blinking.

    >
    > And one in NFS.


    I'm starting to think I'm seeing "random" memory corruption.
    (But I do not think that this is hardware related; I would have
    expected a warning of some kind if my ECC RAM had really gone bad...)

    Yesterday the system worked perfectly for a whole day; today it crashed again.
    Again Caps Lock and Scroll Lock were blinking, but the crash was in
    yet another subsystem.

    Today's stack trace:
    [ 1397.050713] Unable to handle kernel NULL pointer dereference at
    0000000000000000 RIP:
    [ 1397.052918] [] kmem_cache_alloc_node+0x63/0x90
    [ 1397.056357] PGD 115dd2067 PUD 115c1e067 PMD 0
    [ 1397.058153] Oops: 0000 [1] SMP
    [ 1397.059424] last sysfs file:
    /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
    [ 1397.062560] CPU 3
    [ 1397.063372] Modules linked in: radeon drm nfsd exportfs w83792d
    ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
    tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
    videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
    v4l1_compat hid i2c_nforce2 pata_amd sg
    [ 1397.074283] Pid: 0, comm: swapper Not tainted 2.6.24-rc3-mm2 #2
    [ 1397.076646] RIP: 0010:[] []
    kmem_cache_alloc_node+0x63/0x90
    [ 1397.080179] RSP: 0018:ffff81011ff7fb10 EFLAGS: 00010246
    [ 1397.082301] RAX: 0000000000000000 RBX: ffff81008005e980 RCX: ffffffff8052e159
    [ 1397.085164] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff807e7e80
    [ 1397.088022] RBP: ffff81011ff7fb30 R08: 000000000029d8f0 R09: 000000000014ec78
    [ 1397.090879] R10: 00000000000005a8 R11: 0000000000000001 R12: 00000000ffffffff
    [ 1397.093732] R13: 0000000000000020 R14: 0000000000000020 R15: ffffffff807e7e80
    [ 1397.096583] FS: 00007f064c8b9700(0000) GS:ffff81011ff23d00(0000)
    knlGS:0000000000000000
    [ 1397.099839] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    [ 1397.102121] CR2: 0000000000000000 CR3: 0000000115dd0000 CR4: 00000000000006e0
    [ 1397.104982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1397.107835] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 1397.110697] Process swapper (pid: 0, threadinfo FFFF81007FFAC000,
    task FFFF81011FF72000)
    [ 1397.113949] Stack: 0000000000000008 ffff810108c1e000
    00000000ffffffff 00000000000000d0
    [ 1397.117206] ffff81011ff7fb70 ffffffff8052e159 000000001ff7fbd0
    ffff810108c1e000
    [ 1397.120185] 0000000000000000 ffff8100d61f2400 ffff8100d61f2438
    0000000000000000
    [ 1397.123116] Call Trace:
    [ 1397.124171] [] __alloc_skb+0x49/0x150
    [ 1397.126557] [] tcp_send_ack+0x2e/0x120
    [ 1397.128725] [] __tcp_ack_snd_check+0x5c/0xa0
    [ 1397.131093] [] tcp_rcv_established+0x3b3/0x800
    [ 1397.133515] [] tcp_v4_do_rcv+0x2da/0x6a0
    [ 1397.135763] [] tcp_v4_rcv+0x978/0xac0
    [ 1397.137904] [] ip_local_deliver_finish+0xd3/0x250
    [ 1397.140440] [] ip_local_deliver+0x3b/0x90
    [ 1397.142708] [] ip_rcv_finish+0x119/0x410
    [ 1397.144920] [] __lock_acquire+0x725/0x1130
    [ 1397.147229] [] ip_rcv+0x22a/0x300
    [ 1397.149192] [] netif_receive_skb+0x1d6/0x280
    [ 1397.151556] [] process_backlog+0x7c/0xf0
    [ 1397.153785] [] process_backlog+0x8a/0xf0
    [ 1397.155997] [] net_rx_action+0xb6/0x130
    [ 1397.158209] [] __do_softirq+0x84/0x110
    [ 1397.160369] [] call_softirq+0x1c/0x30
    [ 1397.162489] [] do_softirq+0x65/0xc0
    [ 1397.164545] [] irq_exit+0x95/0xa0
    [ 1397.166527] [] do_IRQ+0x8f/0x100
    [ 1397.168470] [] default_idle+0x0/0x60
    [ 1397.170568] [] default_idle+0x0/0x60
    [ 1397.172650] [] ret_from_intr+0x0/0xf
    [ 1397.174741] [] default_idle+0x37/0x60
    [ 1397.177131] [] default_idle+0x35/0x60
    [ 1397.179266] [] cpu_idle+0x6b/0xa0
    [ 1397.181236] [] start_secondary+0x2f8/0x430
    [ 1397.183523]
    [ 1397.184115] INFO: lockdep is turned off.
    [ 1397.185691]
    [ 1397.185691] Code: 4c 8b 04 c6 48 89 f0 4c 0f b1 03 48 39 f0 49 89
    c4 75 b0 eb
    [ 1397.189307] RIP [] kmem_cache_alloc_node+0x63/0x90
    [ 1397.191891] RSP
    [ 1397.193305] CR2: 0000000000000000
    [ 1397.194638] Kernel panic - not syncing: Aiee, killing interrupt handler!

    I put some WARN_ON's into ether1394_dg_complete() to see what happened
    there, but these never triggered.
    Is "last sysfs file:
    /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map" relevant, or
    just glibc checking for NUMA?

    I don't know in what direction I should look to find the cause of this.
    Using slub_debug=FZP?

    I have:
    CONFIG_DEBUG_LIST=y
    CONFIG_DEBUG_SG=y
    Would an additional CONFIG_IOMMU_DEBUG (or something else) make sense?

    Torsten
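
For readers wondering what CONFIG_DEBUG_LIST buys in a case like this: it adds consistency checks to list insertion, so a corrupted neighbour pointer is caught at the point of the next list operation rather than much later. The following user-space sketch shows the idea only; the `sketch_*` names are hypothetical, and it returns false where the kernel would report the corruption (as in the lib/list_debug.c BUG quoted above).

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct list_head. */
struct sketch_list_head {
    struct sketch_list_head *next;
    struct sketch_list_head *prev;
};

/* Make h an empty circular list pointing at itself. */
static void sketch_list_init(struct sketch_list_head *h)
{
    h->next = h;
    h->prev = h;
}

/*
 * Insert entry between prev and next, but first verify that the two
 * neighbours still point at each other. Returns false (and leaves the
 * list untouched) when the links are inconsistent.
 */
static bool sketch_list_add_checked(struct sketch_list_head *entry,
                                    struct sketch_list_head *prev,
                                    struct sketch_list_head *next)
{
    if (next->prev != prev || prev->next != next)
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}
```

A check like this detects the damage but does not say who did the overwriting, which is why a slab-level option such as slub_debug is the natural next step when crashes move between subsystems.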
