[bug] block subsystem related crash with latest -git - Kernel


  1. [bug] block subsystem related crash with latest -git


    Jens, just got this crash on a testbox:

    [ 37.701628] warning: process `kmodule' used the removed sysctl system call with 1.23.
    [ 39.409390] BUG: unable to handle kernel paging request at virtual address 7ca76000
    [ 39.416892] printing eip: 78406669 *pde = 00dda027 *pte = 04a76000
    [ 39.423132] Oops: 0000 [#1] DEBUG_PAGEALLOC
    [ 39.427292]
    [ 39.428766] Pid: 431, comm: fsck.ext3 Not tainted (2.6.23 #45)
    [ 39.434571] EIP: 0060:[<78406669>] EFLAGS: 00010006 CPU: 0
    [ 39.440035] EIP is at blk_rq_map_sg+0xb9/0x170
    [ 39.444450] EAX: 04aed000 EBX: 7ca1f380 ECX: 04aee000 EDX: 00095da0
    [ 39.450689] ESI: 7ca75ff0 EDI: 79180da0 EBP: 00001000 ESP: 7caa1cac
    [ 39.456929] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
    [ 39.462303] Process fsck.ext3 (pid: 431, ti=7caa0000 task=7ca80000 task.ti=7caa0000)
    [ 39.469841] Stack: 00000020 7b520000 00002000 04aef000 7ca75fe0 0000001f 00000001 00000000
    [ 39.478161] 7ca1f300 01000002 7ca44980 7b522b94 7ca0de00 7b520000 784c89b5 7b520000
    [ 39.486480] 784c843a 7ca0de00 7b520f20 7b522b94 7ca0de00 784e54a0 784f394b 00000000
    [ 39.494799] Call Trace:
    [ 39.497400] [<784c89b5>] scsi_init_io+0x55/0xe0
    [ 39.501992] [<784c843a>] scsi_get_cmd_from_req+0x2a/0x40
    [ 39.507366] [<784e54a0>] sd_prep_fn+0x80/0x940
    [ 39.511872] [<784f394b>] ata_bmdma_start+0xb/0x20
    [ 39.516638] [<784ef344>] ata_qc_issue_prot+0x164/0x1e0
    [ 39.521839] [<78405c63>] elv_dispatch_sort+0x23/0xe0
    [ 39.526865] [<784057d0>] elv_next_request+0xa0/0x130
    [ 39.531891] [<787715b8>] _spin_lock_irq+0x38/0x50
    [ 39.536657] [<784c9af4>] scsi_request_fn+0x1e4/0x370
    [ 39.541684] [<78120e72>] del_timer+0x62/0x70
    [ 39.546016] [<78408a45>] __generic_unplug_device+0x25/0x30
    [ 39.551563] [<78408d15>] generic_unplug_device+0x15/0x30
    [ 39.556936] [<78406430>] blk_backing_dev_unplug+0x0/0x10
    [ 39.562309] [<7840643c>] blk_backing_dev_unplug+0xc/0x10
    [ 39.567682] [<7818080d>] block_sync_page+0x2d/0x40
    [ 39.572535] [<78144ee9>] sync_page+0x29/0x40
    [ 39.576868] [<7876fdec>] __wait_on_bit_lock+0x3c/0x70
    [ 39.581981] [<78144ec0>] sync_page+0x0/0x40
    [ 39.586227] [<78144ea2>] __lock_page+0x52/0x60
    [ 39.590734] [<7812ae90>] wake_bit_function+0x0/0x60
    [ 39.595674] [<7814566c>] do_generic_mapping_read+0x21c/0x450
    [ 39.601393] [<78144c30>] file_read_actor+0x0/0x130
    [ 39.606246] [<78147227>] generic_file_aio_read+0x137/0x180
    [ 39.611793] [<78144c30>] file_read_actor+0x0/0x130
    [ 39.616645] [<78160d95>] do_sync_read+0xd5/0x120
    [ 39.621326] [<78135cd8>] mark_held_locks+0x38/0x70
    [ 39.626179] [<7812ae50>] autoremove_wake_function+0x0/0x40
    [ 39.631724] [<787705b4>] mutex_lock_nested+0x1a4/0x200
    [ 39.636924] [<78185611>] block_llseek+0x31/0xc0
    [ 39.641517] [<787703a5>] __mutex_unlock_slowpath+0xb5/0x110
    [ 39.647150] [<78135e1c>] trace_hardirqs_on+0x9c/0xb0
    [ 39.652177] [<78160cc0>] do_sync_read+0x0/0x120
    [ 39.656770] [<7816165b>] vfs_read+0xbb/0x120
    [ 39.661103] [<78161a41>] sys_read+0x41/0x70
    [ 39.665349] [<781028a2>] syscall_call+0x7/0xb
    [ 39.669769] =======================
    [ 39.673322] Code: 05 c1 e1 0c 03 4a 08 8b 52 04 01 ca 89 54 24 0c 8b 3b 89 fa 29 c2 89 d0 c1 f8 05 c1 e0 0c 03 43 08 39 44 24 0c 0f 84 7e 00 00 00 <8b> 46 10 8d 56 10 a8 01 75 4c 89 3e 89 6e 0c 8b 43 08 89 46 04

    config and full bootlog attached.

    i tried another bootup with the same kernel and the crash did not
    reoccur, so it seems to be spurious. This crash could be related to the
    scsi or block merges done in the past few days - never saw this before.

    Ingo


  2. Re: [bug] block subsystem related crash with latest -git


    * Ingo Molnar wrote:

    > i tried another bootup with the same kernel and the crash did not
    > reoccur, so it seems to be spurious. This crash could be related to
    > the scsi or block merges done in the past few days - never saw this
    > before.


    managed to trigger it a second time, so it seems rather reproducible:

    [ 328.771333] kjournald starting. Commit interval 5 seconds
    [ 328.776963] EXT3 FS on sda5, internal journal
    [ 328.781172] EXT3-fs: mounted filesystem with ordered data mode.
    [ 329.689493] BUG: unable to handle kernel paging request at virtual address 7d516000
    [ 329.696990] printing eip: 78406669 *pde = 00ddd027 *pte = 05516000
    [ 329.703230] Oops: 0000 [#1] DEBUG_PAGEALLOC
    [ 329.707390]
    [ 329.708863] Pid: 0, comm: swapper Not tainted (2.6.23 #45)
    [ 329.714321] EIP: 0060:[<78406669>] EFLAGS: 00010006 CPU: 0
    [ 329.719787] EIP is at blk_rq_map_sg+0xb9/0x170
    [ 329.724202] EAX: 2f6df000 EBX: 7d6af880 ECX: 34d55000 EDX: 005edbe0
    [ 329.730441] ESI: 7d515ff0 EDI: 796d8be0 EBP: 00001000 ESP: 78a13db0
    [ 329.736680] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
    [ 329.742054] Process swapper (pid: 0, ti=78a12000 task=7893c400 task.ti=78a12000)
    [ 329.749246] Stack: 00000020 7b520000 00002000 34d56000 7d515fe0 00000007 00000001 00000000
    [ 329.757565] 7d6af800 01000000 7d515e00 7b522b94 7d84f58c 7b520000 784c89b5 7b520000
    [ 329.765885] 784c843a 7d84f58c 7b520f20 7b522b94 7d84f58c 784e54a0 784f394b 00000000
    [ 329.774204] Call Trace:
    [ 329.776804] [<784c89b5>] scsi_init_io+0x55/0xe0
    [ 329.781398] [<784c843a>] scsi_get_cmd_from_req+0x2a/0x40
    [ 329.786770] [<784e54a0>] sd_prep_fn+0x80/0x940
    [ 329.791277] [<784f394b>] ata_bmdma_start+0xb/0x20
    [ 329.796043] [<784ef344>] ata_qc_issue_prot+0x164/0x1e0
    [ 329.801243] [<78405c63>] elv_dispatch_sort+0x23/0xe0
    [ 329.806268] [<784057d0>] elv_next_request+0xa0/0x130
    [ 329.811295] [<787715b8>] _spin_lock_irq+0x38/0x50
    [ 329.816062] [<784c9af4>] scsi_request_fn+0x1e4/0x370
    [ 329.821088] [<784097f6>] blk_run_queue+0x36/0x80
    [ 329.825768] [<784c8370>] scsi_next_command+0x30/0x50
    [ 329.830794] [<784c84fb>] scsi_end_request+0xab/0xe0
    [ 329.835733] [<784c9249>] scsi_io_completion+0xa9/0x3d0
    [ 329.840933] [<78135de7>] trace_hardirqs_on+0x67/0xb0
    [ 329.845960] [<784068c5>] blk_done_softirq+0x45/0x80
    [ 329.850900] [<784068f3>] blk_done_softirq+0x73/0x80
    [ 329.855839] [<7811d3f3>] __do_softirq+0x53/0xb0
    [ 329.860432] [<7811d4b8>] do_softirq+0x68/0x70
    [ 329.864852] [<78105351>] do_IRQ+0x51/0x90
    [ 329.868925] [<78135e1c>] trace_hardirqs_on+0x9c/0xb0
    [ 329.873951] [<7810388e>] common_interrupt+0x2e/0x40
    [ 329.878891] [<78100c55>] cpu_idle+0x35/0x60
    [ 329.883138] [<78a14b35>] start_kernel+0x265/0x300
    [ 329.887904] [<78a14380>] unknown_bootoption+0x0/0x1e0
    [ 329.893016] =======================
    [ 329.896570] Code: 05 c1 e1 0c 03 4a 08 8b 52 04 01 ca 89 54 24 0c 8b 3b 89 fa 29 c2 89 d0 c1 f8 05 c1 e0 0c 03 43 08 39 44 24 0c 0f 84 7e 00 00 00 <8b> 46 10 8d 56 10 a8 01 75 4c 89 3e 89 6e 0c 8b 43 08 89 46 04
    [ 329.915375] EIP: [<78406669>] blk_rq_map_sg+0xb9/0x170 SS:ESP 0068:78a13db0
    [ 329.922309] Kernel panic - not syncing: Fatal exception in interrupt

    Ingo
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Ingo Molnar wrote:
    >
    > * Ingo Molnar wrote:
    >
    > > i tried another bootup with the same kernel and the crash did not
    > > reoccur, so it seems to be spurious. This crash could be related to
    > > the scsi or block merges done in the past few days - never saw this
    > > before.

    >
    > managed to trigger it a second time, so it seems rather reproducible:


    Can you pull

    git://git.kernel.dk/linux-2.6-block.git for-linus

    and see if it still reproduces?

    --
    Jens Axboe


  4. Re: [bug] block subsystem related crash with latest -git



    On Wed, 17 Oct 2007, Ingo Molnar wrote:
    >
    > Jens, just got this crash on a testbox:


    The code in question is:

    mov %edx,0xc(%esp)
    mov (%ebx),%edi
    mov %edi,%edx
    sub %eax,%edx
    mov %edx,%eax
    sar $0x5,%eax
    shl $0xc,%eax
    add 0x8(%ebx),%eax
    cmp %eax,0xc(%esp)
    je +126
    mov 0x10(%esi),%eax <----- Oops
    lea 0x10(%esi),%edx
    test $0x1,%al
    jne +76
    mov %edi,(%esi)
    mov %ebp,0xc(%esi)
    mov 0x8(%ebx),%eax
    mov %eax,0x4(%esi)


    and it looks like %esi is overflowing from one page to the next one, ie:

    BUG: unable to handle kernel paging request at virtual address 7ca76000
    ESI: 7ca75ff0

    and you caught this thanks to page-alloc debugging again.

    I think I can match that up with the source code: that's "sg_next()". It's
    doing:

    sg++;

    if (unlikely(sg_is_chain(sg)))
    sg = sg_chain_ptr(sg);

    return sg;

    and the oopsing instruction is that load of "sg->page" in the assembly
    code:

    mov 0x10(%esi),%eax # %eax = sg->page
    lea 0x10(%esi),%edx # %edx = sg+1;
    test $0x1,%al # if (unlikely(sg_is_chain()))
    jne +76

    Jens?

    Linus
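
The page-overflow observation above can be checked with a little address arithmetic. What follows is only an illustrative sketch, not kernel code: it assumes 4 KiB pages and a 16-byte scatterlist entry (the 0x10 displacement in the faulting instruction), and the helper names `lookahead_addr` and `crosses_page` are made up for this example.

```c
#include <stdint.h>

#define PAGE_SIZE_4K  0x1000u  /* assumption: 4 KiB pages on 32-bit x86 */
#define SG_ENTRY_SIZE 0x10u    /* displacement in "mov 0x10(%esi),%eax" */

/* Address read by the faulting load, given the current sg pointer (%esi). */
static uint32_t lookahead_addr(uint32_t sg)
{
    return sg + SG_ENTRY_SIZE;
}

/* Nonzero when a and b lie in different 4 KiB pages. */
static int crosses_page(uint32_t a, uint32_t b)
{
    return (a / PAGE_SIZE_4K) != (b / PAGE_SIZE_4K);
}
```

Feeding in the ESI values from the two oops reports (7ca75ff0 and 7d515ff0) gives 7ca76000 and 7d516000, which are exactly the faulting addresses and each the first byte of the following page.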

  5. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Linus Torvalds wrote:
    >
    >
    > On Wed, 17 Oct 2007, Ingo Molnar wrote:
    > >
    > > Jens, just got this crash on a testbox:

    >
    > The code in question is:
    >
    > mov %edx,0xc(%esp)
    > mov (%ebx),%edi
    > mov %edi,%edx
    > sub %eax,%edx
    > mov %edx,%eax
    > sar $0x5,%eax
    > shl $0xc,%eax
    > add 0x8(%ebx),%eax
    > cmp %eax,0xc(%esp)
    > je +126
    > mov 0x10(%esi),%eax <----- Oops
    > lea 0x10(%esi),%edx
    > test $0x1,%al
    > jne +76
    > mov %edi,(%esi)
    > mov %ebp,0xc(%esi)
    > mov 0x8(%ebx),%eax
    > mov %eax,0x4(%esi)
    >
    >
    > and it looks like %esi is overflowing from one page to the next one, ie:
    >
    > BUG: unable to handle kernel paging request at virtual address 7ca76000
    > ESI: 7ca75ff0
    >
    > and you caught this thanks to page-alloc debugging again.
    >
    > I think I can match that up with the source code: that's "sg_next()". It's
    > doing:
    >
    > sg++;
    >
    > if (unlikely(sg_is_chain(sg)))
    > sg = sg_chain_ptr(sg);
    >
    > return sg;
    >
    > and the oopsing instruction is that load of "sg->page" in the assembly
    > code:
    >
    > mov 0x10(%esi),%eax # %eax = sg->page
    > lea 0x10(%esi),%edx # %edx = sg+1;
    > test $0x1,%al # if (unlikely(sg_is_chain()))
    > jne +76
    >
    > Jens?


    Yep, that's what I came up with as well - I asked Ingo for a dump in
    private, but ended up just using ksymoops to decode the line.

    The way blk_rq_map_sg() operates is that it ends up doing a

    next_sg = sg_next(sg);

    even though sg may be the last entry. Perhaps this is crapping out,
    although if sg is a valid address, then sg + 1 should be as well.
    next_sg may end up being crap, in fact it will, but we'll never use that
    unless there are more entries to fill. And if there is, then both sg and
    next_sg were valid.

    So nothing in for-linus should fix it, I'll try and come up with an
    alternate way to assign next_sg so it's always valid.

    --
    Jens Axboe
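
The eager "next_sg = sg_next(sg)" pattern described above can be modelled in user space to see which entries get inspected. This is a rough sketch under stated assumptions: `struct sg_model`, `sg_next_model` and `fill_eager` are hypothetical stand-ins, not the kernel's definitions, and the caller passes one spare slot so the demo itself never reads out of bounds. In the oops, that spare slot is the unmapped page that DEBUG_PAGEALLOC turned into a fault.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct scatterlist (hypothetical). */
struct sg_model {
    unsigned long page;   /* low bit doubles as the "chain" marker */
    unsigned int offset;
    unsigned int length;
};

/* Highest array index whose ->page field sg_next_model() has inspected. */
static size_t highest_index_read;

/* Model of sg_next(): step to the next entry and look at its page field. */
static struct sg_model *sg_next_model(struct sg_model *base, struct sg_model *sg)
{
    sg++;
    size_t idx = (size_t)(sg - base);
    if (idx > highest_index_read)
        highest_index_read = idx;
    if (sg->page & 1UL)                 /* the kind of read that faulted */
        sg = (struct sg_model *)(sg->page & ~1UL);
    return sg;
}

/*
 * Eager pattern: after taking an entry, immediately compute its successor.
 * `list` must provide nsegs + 1 slots so this demo stays within bounds.
 * Returns the highest index that was inspected.
 */
static size_t fill_eager(struct sg_model *list, size_t nsegs)
{
    struct sg_model *sg, *next_sg = list;

    highest_index_read = 0;
    for (size_t i = 0; i < nsegs; i++) {
        sg = next_sg;
        next_sg = sg_next_model(list, sg);
        sg->length = 4096;              /* made-up segment length */
    }
    return highest_index_read;
}
```

Filling 4 segments this way inspects index 4, one past the last entry that is actually used. The pointer computed there is never dereferenced again, but the read needed to check the chain bit has already happened.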


  6. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Jens Axboe wrote:
    > On Wed, Oct 17 2007, Linus Torvalds wrote:
    > >
    > >
    > > On Wed, 17 Oct 2007, Ingo Molnar wrote:
    > > >
    > > > Jens, just got this crash on a testbox:

    > >
    > > The code in question is:
    > >
    > > mov %edx,0xc(%esp)
    > > mov (%ebx),%edi
    > > mov %edi,%edx
    > > sub %eax,%edx
    > > mov %edx,%eax
    > > sar $0x5,%eax
    > > shl $0xc,%eax
    > > add 0x8(%ebx),%eax
    > > cmp %eax,0xc(%esp)
    > > je +126
    > > mov 0x10(%esi),%eax <----- Oops
    > > lea 0x10(%esi),%edx
    > > test $0x1,%al
    > > jne +76
    > > mov %edi,(%esi)
    > > mov %ebp,0xc(%esi)
    > > mov 0x8(%ebx),%eax
    > > mov %eax,0x4(%esi)
    > >
    > >
    > > and it looks like %esi is overflowing from one page to the next one, ie:
    > >
    > > BUG: unable to handle kernel paging request at virtual address 7ca76000
    > > ESI: 7ca75ff0
    > >
    > > and you caught this thanks to page-alloc debugging again.
    > >
    > > I think I can match that up with the source code: that's "sg_next()". It's
    > > doing:
    > >
    > > sg++;
    > >
    > > if (unlikely(sg_is_chain(sg)))
    > > sg = sg_chain_ptr(sg);
    > >
    > > return sg;
    > >
    > > and the oopsing instruction is that load of "sg->page" in the assembly
    > > code:
    > >
    > > mov 0x10(%esi),%eax # %eax = sg->page
    > > lea 0x10(%esi),%edx # %edx = sg+1;
    > > test $0x1,%al # if (unlikely(sg_is_chain()))
    > > jne +76
    > >
    > > Jens?

    >
    > Yep, that's what I came up with as well - I asked Ingo for a dump in
    > private, but ended up just using ksymoops to decode the line.
    >
    > The way blk_rq_map_sg() operates is that it ends up doing a
    >
    > next_sg = sg_next(sg);
    >
    > even though sg may be the last entry. Perhaps this is crapping out,
    > although if sg is a valid address, then sg + 1 should be as well.
    > next_sg may end up being crap, in fact it will, but we'll never use that
    > unless there are more entries to fill. And if there is, then both sg and
    > next_sg were valid.
    >
    > So nothing in for-linus should fix it, I'll try and come up with an
    > alternate way to assign next_sg so it's always valid.


    OK, the below should actually be safe, I don't know why I talked myself
    into the next_sg stuff in the beginning. It's always safe to zero sg,
    since it's a valid entry - nothing to save in ->page. Ingo, does this
    work for you?

    diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
    index 9e3f3cc..3935469 100644
    --- a/block/ll_rw_blk.c
    +++ b/block/ll_rw_blk.c
    @@ -1322,8 +1322,8 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
     		  struct scatterlist *sglist)
     {
     	struct bio_vec *bvec, *bvprv;
    -	struct scatterlist *next_sg, *sg;
     	struct req_iterator iter;
    +	struct scatterlist *sg;
     	int nsegs, cluster;

     	nsegs = 0;
    @@ -1333,7 +1333,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
     	 * for each bio in rq
     	 */
     	bvprv = NULL;
    -	sg = next_sg = &sglist[0];
    +	sg = NULL;
     	rq_for_each_segment(bvec, rq, iter) {
     		int nbytes = bvec->bv_len;

    @@ -1349,8 +1349,10 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
     			sg->length += nbytes;
     		} else {
     new_segment:
    -			sg = next_sg;
    -			next_sg = sg_next(sg);
    +			if (!sg)
    +				sg = sglist;
    +			else
    +				sg = sg_next(sg);

     			memset(sg, 0, sizeof(*sg));
     			sg->page = bvec->bv_page;

    --
    Jens Axboe
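
A user-space sketch of the lazy advance in the patch above may help show what it changes. It is an approximation, not the kernel code: `struct sg_model`, `sg_is_chain_model` and `fill_lazy` are hypothetical names, and the page values are invented. The model steps to the next entry only when another segment has to be stored, and it stops if that entry's low bit looks like a chain marker. That second part is one plausible reading of the later remark in this thread that the approach is fine as long as the sglist is cleared initially.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct scatterlist (hypothetical). */
struct sg_model {
    unsigned long page;   /* low bit doubles as the "chain" marker */
    unsigned int offset;
    unsigned int length;
};

static int sg_is_chain_model(const struct sg_model *sg)
{
    return (int)(sg->page & 1UL);
}

/*
 * Lazy advance: move to the next entry only when a new segment is needed.
 * Returns the number of entries filled. *stale_chain is set to 1 if an
 * entry about to be used carried a set low bit, which a real sg_next()
 * would have treated as a chain link to follow.
 */
static size_t fill_lazy(struct sg_model *list, size_t nsegs, int *stale_chain)
{
    struct sg_model *sg = NULL;
    size_t filled = 0;

    *stale_chain = 0;
    for (size_t i = 0; i < nsegs; i++) {
        if (!sg) {
            sg = list;
        } else {
            sg++;
            if (sg_is_chain_model(sg)) {
                *stale_chain = 1;       /* leftover garbage in ->page */
                return filled;
            }
        }
        memset(sg, 0, sizeof(*sg));
        sg->page = 0x1000UL * (i + 1);  /* made-up page value, low bit clear */
        sg->length = 4096;              /* made-up segment length */
        filled++;
    }
    return filled;
}
```

With a zeroed 4-entry list all 4 entries are filled and nothing beyond index 3 is touched. If entry 2 still holds an odd leftover value such as 0xdeadbeef, the model stops after 2 entries and reports the stale chain bit.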


  7. Re: [bug] block subsystem related crash with latest -git


    * Jens Axboe wrote:

    > > and the oopsing instruction is that load of "sg->page" in the assembly
    > > code:
    > >
    > > mov 0x10(%esi),%eax # %eax = sg->page
    > > lea 0x10(%esi),%edx # %edx = sg+1;
    > > test $0x1,%al # if (unlikely(sg_is_chain()))
    > > jne +76
    > >
    > > Jens?

    >
    > Yep, that's what I came up with as well - I asked Ingo for a dump in
    > private, but ended up just using ksymoops to decode the line.
    >
    > The way blk_rq_map_sg() operates is that it ends up doing a
    >
    > next_sg = sg_next(sg);
    >
    > even though sg may be the last entry. Perhaps this is crapping out,
    > although if sg is a valid address, then sg + 1 should be as well.
    > next_sg may end up being crap, in fact it will, but we'll never use
    > that unless there are more entries to fill. And if there is, then both
    > sg and next_sg were valid.


    find below the disassembled code. Here's the faulting source line
    according to gdb:

    (gdb) list *0x78406669
    0x78406669 is in blk_rq_map_sg (include/linux/scatterlist.h:48).
    43	 */
    44	static inline struct scatterlist *sg_next(struct scatterlist *sg)
    45	{
    46		sg++;
    47
    48		if (unlikely(sg_is_chain(sg)))
    49			sg = sg_chain_ptr(sg);
    50
    51		return sg;
    52	}

    (gdb) list *0x78406673
    0x78406673 is in blk_rq_map_sg (block/ll_rw_blk.c:1355).
    1350			} else {
    1351	new_segment:
    1352				sg = next_sg;
    1353				next_sg = sg_next(sg);
    1354
    1355				sg->page = bvec->bv_page;
    1356				sg->length = nbytes;
    1357				sg->offset = bvec->bv_offset;
    1358				nsegs++;
    1359			}
    (gdb)

    the compiler is gcc-4.2.2. (vanilla, built from sources)

    Ingo

    784065b0 <blk_rq_map_sg>:
    784065b0: 55 push %ebp
    784065b1: 57 push %edi
    784065b2: 56 push %esi
    784065b3: 53 push %ebx
    784065b4: 83 ec 28 sub $0x28,%esp
    784065b7: 89 44 24 04 mov %eax,0x4(%esp)
    784065bb: 8b 98 08 01 00 00 mov 0x108(%eax),%ebx
    784065c1: 83 e3 01 and $0x1,%ebx
    784065c4: 89 5c 24 18 mov %ebx,0x18(%esp)
    784065c8: 8b 52 3c mov 0x3c(%edx),%edx
    784065cb: c7 44 24 14 00 00 00 movl $0x0,0x14(%esp)
    784065d2: 00
    784065d3: 85 d2 test %edx,%edx
    784065d5: 89 54 24 20 mov %edx,0x20(%esp)
    784065d9: 0f 84 fc 00 00 00 je 784066db
    784065df: 89 ce mov %ecx,%esi
    784065e1: 31 d2 xor %edx,%edx
    784065e3: 89 4c 24 10 mov %ecx,0x10(%esp)
    784065e7: 8b 44 24 20 mov 0x20(%esp),%eax
    784065eb: 0f b7 58 1a movzwl 0x1a(%eax),%ebx
    784065ef: 8b 48 30 mov 0x30(%eax),%ecx
    784065f2: 89 5c 24 1c mov %ebx,0x1c(%esp)
    784065f6: 0f b7 40 18 movzwl 0x18(%eax),%eax
    784065fa: 39 d8 cmp %ebx,%eax
    784065fc: 0f 8e c6 00 00 00 jle 784066c8
    78406602: 8d 04 5b lea (%ebx,%ebx,2),%eax
    78406605: 8d 1c 81 lea (%ecx,%eax,4),%ebx
    78406608: 0f b6 44 24 18 movzbl 0x18(%esp),%eax
    7840660d: 88 44 24 27 mov %al,0x27(%esp)
    78406611: e9 8b 00 00 00 jmp 784066a1
    78406616: 8b 4c 24 10 mov 0x10(%esp),%ecx
    7840661a: 8b 41 0c mov 0xc(%ecx),%eax
    7840661d: 8b 4c 24 04 mov 0x4(%esp),%ecx
    78406621: 01 e8 add %ebp,%eax
    78406623: 89 44 24 08 mov %eax,0x8(%esp)
    78406627: 3b 81 6c 01 00 00 cmp 0x16c(%ecx),%eax
    7840662d: 0f 87 80 00 00 00 ja 784066b3
    78406633: a1 18 ec d7 78 mov 0x78d7ec18,%eax
    78406638: 8b 0a mov (%edx),%ecx
    7840663a: 29 c1 sub %eax,%ecx
    7840663c: c1 f9 05 sar $0x5,%ecx
    7840663f: c1 e1 0c shl $0xc,%ecx
    78406642: 03 4a 08 add 0x8(%edx),%ecx
    78406645: 8b 52 04 mov 0x4(%edx),%edx
    78406648: 01 ca add %ecx,%edx
    7840664a: 89 54 24 0c mov %edx,0xc(%esp)
    7840664e: 8b 3b mov (%ebx),%edi
    78406650: 89 fa mov %edi,%edx
    78406652: 29 c2 sub %eax,%edx
    78406654: 89 d0 mov %edx,%eax
    78406656: c1 f8 05 sar $0x5,%eax
    78406659: c1 e0 0c shl $0xc,%eax
    7840665c: 03 43 08 add 0x8(%ebx),%eax
    7840665f: 39 44 24 0c cmp %eax,0xc(%esp)
    78406663: 0f 84 7e 00 00 00 je 784066e7
    78406669: 8b 46 10 mov 0x10(%esi),%eax
    7840666c: 8d 56 10 lea 0x10(%esi),%edx
    7840666f: a8 01 test $0x1,%al
    78406671: 75 4c jne 784066bf
    78406673: 89 3e mov %edi,(%esi)
    78406675: 89 6e 0c mov %ebp,0xc(%esi)
    78406678: 8b 43 08 mov 0x8(%ebx),%eax
    7840667b: 89 46 04 mov %eax,0x4(%esi)
    7840667e: 83 44 24 14 01 addl $0x1,0x14(%esp)
    78406683: 89 74 24 10 mov %esi,0x10(%esp)
    78406687: 89 d6 mov %edx,%esi
    78406689: 8b 54 24 20 mov 0x20(%esp),%edx
    7840668d: 83 44 24 1c 01 addl $0x1,0x1c(%esp)
    78406692: 0f b7 42 18 movzwl 0x18(%edx),%eax
    78406696: 3b 44 24 1c cmp 0x1c(%esp),%eax
    7840669a: 7e 2a jle 784066c6
    7840669c: 89 da mov %ebx,%edx
    7840669e: 83 c3 0c add $0xc,%ebx
    784066a1: 85 d2 test %edx,%edx
    784066a3: 8b 6b 04 mov 0x4(%ebx),%ebp
    784066a6: 74 0b je 784066b3
    784066a8: 80 7c 24 27 00 cmpb $0x0,0x27(%esp)
    784066ad: 0f 85 63 ff ff ff jne 78406616
    784066b3: 8b 46 10 mov 0x10(%esi),%eax
    784066b6: 8d 56 10 lea 0x10(%esi),%edx
    784066b9: 8b 3b mov (%ebx),%edi
    784066bb: a8 01 test $0x1,%al
    784066bd: 74 b4 je 78406673
    784066bf: 89 c2 mov %eax,%edx
    784066c1: 83 e2 fe and $0xfffffffe,%edx
    784066c4: eb ad jmp 78406673
    784066c6: 89 da mov %ebx,%edx
    784066c8: 8b 4c 24 20 mov 0x20(%esp),%ecx
    784066cc: 8b 49 08 mov 0x8(%ecx),%ecx
    784066cf: 85 c9 test %ecx,%ecx
    784066d1: 89 4c 24 20 mov %ecx,0x20(%esp)
    784066d5: 0f 85 0c ff ff ff jne 784065e7
    784066db: 8b 44 24 14 mov 0x14(%esp),%eax
    784066df: 83 c4 28 add $0x28,%esp
    784066e2: 5b pop %ebx
    784066e3: 5e pop %esi
    784066e4: 5f pop %edi
    784066e5: 5d pop %ebp
    784066e6: c3 ret
    784066e7: 8b 44 24 04 mov 0x4(%esp),%eax
    784066eb: 8b 54 24 0c mov 0xc(%esp),%edx
    784066ef: 8b 80 70 01 00 00 mov 0x170(%eax),%eax
    784066f5: 89 04 24 mov %eax,(%esp)
    784066f8: 09 c1 or %eax,%ecx
    784066fa: 8d 44 2a ff lea 0xffffffff(%edx,%ebp,1),%eax
    784066fe: 0b 04 24 or (%esp),%eax
    78406701: 39 c1 cmp %eax,%ecx
    78406703: 0f 85 60 ff ff ff jne 78406669
    78406709: 8b 44 24 08 mov 0x8(%esp),%eax
    7840670d: 8b 4c 24 10 mov 0x10(%esp),%ecx
    78406711: 89 41 0c mov %eax,0xc(%ecx)
    78406714: e9 70 ff ff ff jmp 78406689
    78406719: 8d b4 26 00 00 00 00 lea 0x0(%esi),%esi


  8. Re: [bug] block subsystem related crash with latest -git


    * Jens Axboe wrote:

    > [...] It's always safe to zero sg, since it's a valid entry - nothing
    > to save in ->page. Ingo, does this work for you?


    with that patch it now crashes on a NULL dereference - see crashlog below.
    Compiler bug perhaps?

    Ingo

    ---------------->
    [ 34.605614] EXT3-fs: INFO: recovery required on readonly filesystem.
    [ 34.611842] EXT3-fs: write access will be enabled during recovery.
    [ 34.635861] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
    [ 34.644227] printing eip: 7840840e *pde = 00000000
    [ 34.649081] Oops: 0002 [#1] DEBUG_PAGEALLOC
    [ 34.653239]
    [ 34.654713] Pid: 1, comm: swapper Not tainted (2.6.23 #3)
    [ 34.660086] EIP: 0060:[<7840840e>] EFLAGS: 00010046 CPU: 0
    [ 34.665548] EIP is at blk_rq_map_sg+0x8e/0x190
    [ 34.669965] EAX: 00000000 EBX: 7c885180 ECX: 00000004 EDX: 033b6000
    [ 34.676205] ESI: 00001000 EDI: 00000008 EBP: 00000000 ESP: 7b4219b8
    [ 34.682444] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
    [ 34.687818] Process swapper (pid: 1, ti=7b420000 task=7b416000 task.ti=7b420000)
    [ 34.695010] Stack: 7b521d38 00000008 7c884000 7b520000 00002000 033b6000 7c885080 00000001
    [ 34.703329] 00000001 7c885880 01000002 7c886e00 7b521d1c 7c8857c4 7b520000 784c7ae5
    [ 34.711649] 7b520000 784c75ea 7c8857c4 7b524ce4 7b521d1c 7c8857c4 784e4590 7b416000
    [ 34.719968] Call Trace:
    [ 34.722568] [<784c7ae5>] scsi_init_io+0x55/0xe0
    [ 34.727161] [<784c75ea>] scsi_get_cmd_from_req+0x2a/0x40
    [ 34.732534] [<784e4590>] sd_prep_fn+0x80/0x940
    [ 34.737041] [<7813661b>] __lock_acquire+0x4ab/0xe20
    [ 34.741981] [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
    [ 34.747007] [<78770e30>] _spin_unlock_irq+0x20/0x30
    [ 34.751946] [<78404633>] elv_dispatch_sort+0x23/0xe0
    [ 34.756973] [<784041a0>] elv_next_request+0xa0/0x130
    [ 34.761999] [<784c8c24>] scsi_request_fn+0x1e4/0x370
    [ 34.767025] [<78120f02>] del_timer+0x62/0x70
    [ 34.771358] [<784072d5>] __generic_unplug_device+0x25/0x30
    [ 34.776905] [<784075a5>] generic_unplug_device+0x15/0x30
    [ 34.782278] [<78404e0c>] blk_backing_dev_unplug+0xc/0x10
    [ 34.787650] [<78181e76>] sync_buffer+0x26/0x40
    [ 34.792157] [<7876f6c2>] __wait_on_bit+0x42/0x70
    [ 34.796836] [<78181e50>] sync_buffer+0x0/0x40
    [ 34.801256] [<78181e50>] sync_buffer+0x0/0x40
    [ 34.805676] [<7876f74a>] out_of_line_wait_on_bit+0x5a/0x70
    [ 34.811223] [<7812adb0>] wake_bit_function+0x0/0x60
    [ 34.816163] [<78181db4>] __wait_on_buffer+0x24/0x30
    [ 34.821102] [<78207a27>] jread+0x1b7/0x250
    [ 34.825262] [<78207bcd>] do_one_pass+0x10d/0x600
    [ 34.829942] [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
    [ 34.834968] [<7820823b>] journal_recover+0x9b/0x1a0
    [ 34.839908] [<7820b811>] journal_load+0x51/0xf0
    [ 34.844501] [<781e6914>] ext3_fill_super+0xdd4/0x1850
    [ 34.849613] [<78411cef>] snprintf+0x1f/0x30
    [ 34.853860] [<7819ce70>] disk_name+0xb0/0xc0
    [ 34.858193] [<78163809>] get_sb_bdev+0x109/0x130
    [ 34.862873] [<7815f300>] __kmalloc_node+0x80/0x90
    [ 34.867639] [<781e3a80>] ext3_get_sb+0x20/0x30
    [ 34.872146] [<781e5b40>] ext3_fill_super+0x0/0x1850
    [ 34.877085] [<78163315>] vfs_kern_mount+0xb5/0x130
    [ 34.881938] [<781633ed>] do_kern_mount+0x3d/0xe0
    [ 34.886618] [<78177c57>] do_mount+0x5e7/0x710
    [ 34.891038] [<78770fe5>] _spin_unlock_irqrestore+0x55/0x70
    [ 34.896584] [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
    [ 34.901610] [<78110935>] change_page_attr+0x3d5/0x400
    [ 34.906724] [<7811099a>] kernel_map_pages+0x3a/0x90
    [ 34.911663] [<7814972f>] get_page_from_freelist+0x1ff/0x3e0
    [ 34.917296] [<7814a2cf>] __alloc_pages+0x5f/0x380
    [ 34.922063] [<7814a61e>] __get_free_pages+0x2e/0x50
    [ 34.927002] [<78176660>] copy_mount_options+0x40/0x140
    [ 34.932202] [<78177df2>] sys_mount+0x72/0xb0
    [ 34.936535] [<78a14d39>] mount_block_root+0x89/0x260
    [ 34.941561] [<7816b4c7>] sys_mknod+0x27/0x30
    [ 34.945894] [<78a14f56>] mount_root+0x46/0x60
    [ 34.950314] [<78a1501c>] prepare_namespace+0xac/0x170
    [ 34.955427] [<7816000f>] sys_access+0x1f/0x30
    [ 34.959846] [<78a147ae>] kernel_init+0x15e/0x280
    [ 34.964526] [<78a14650>] kernel_init+0x0/0x280
    [ 34.969033] [<78103a97>] kernel_thread_helper+0x7/0x10
    [ 34.974233] =======================
    [ 34.977785] Code: 44 24 20 88 44 24 2b eb 54 8d 74 26 00 83 44 24 04 10 8b 7c 24 04 8b 07 a8 01 0f 85 fb 00 00 00 31 c0 b9 04 00 00 00 8b 7c 24 04 ab 8b 03 8b 54 24 04 89 02 89 72 0c 8b 43 08 89 42 04 83 44
    [ 34.996590] EIP: [<7840840e>] blk_rq_map_sg+0x8e/0x190 SS:ESP 0068:7b4219b8
    [ 35.003525] Kernel panic - not syncing: Attempted to kill init!

  9. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Jens Axboe wrote:
    > On Wed, Oct 17 2007, Jens Axboe wrote:
    > > On Wed, Oct 17 2007, Linus Torvalds wrote:
    > > >
    > > >
    > > > On Wed, 17 Oct 2007, Ingo Molnar wrote:
    > > > >
    > > > > Jens, just got this crash on a testbox:
    > > >
    > > > The code in question is:
    > > >
    > > > mov %edx,0xc(%esp)
    > > > mov (%ebx),%edi
    > > > mov %edi,%edx
    > > > sub %eax,%edx
    > > > mov %edx,%eax
    > > > sar $0x5,%eax
    > > > shl $0xc,%eax
    > > > add 0x8(%ebx),%eax
    > > > cmp %eax,0xc(%esp)
    > > > je +126
    > > > mov 0x10(%esi),%eax <----- Oops
    > > > lea 0x10(%esi),%edx
    > > > test $0x1,%al
    > > > jne +76
    > > > mov %edi,(%esi)
    > > > mov %ebp,0xc(%esi)
    > > > mov 0x8(%ebx),%eax
    > > > mov %eax,0x4(%esi)
    > > >
    > > >
    > > > and it looks like %esi is overflowing from one page to the next one, ie:
    > > >
    > > > BUG: unable to handle kernel paging request at virtual address 7ca76000
    > > > ESI: 7ca75ff0
    > > >
    > > > and you caught this thanks to page-alloc debugging again.
    > > >
    > > > I think I can match that up with the source code: that's "sg_next()". It's
    > > > doing:
    > > >
    > > > sg++;
    > > >
    > > > if (unlikely(sg_is_chain(sg)))
    > > > sg = sg_chain_ptr(sg);
    > > >
    > > > return sg;
    > > >
    > > > and the oopsing instruction is that load of "sg->page" in the assembly
    > > > code:
    > > >
    > > > mov 0x10(%esi),%eax # %eax = sg->page
    > > > lea 0x10(%esi),%edx # %edx = sg+1;
    > > > test $0x1,%al # if (unlikely(sg_is_chain()))
    > > > jne +76
    > > >
    > > > Jens?

    > >
    > > Yep, that's what I came up with as well - I asked Ingo for a dump in
    > > private, but ended up just using ksymoops to decode the line.
    > >
    > > The way blk_rq_map_sg() operates is that it ends up doing a
    > >
    > > next_sg = sg_next(sg);
    > >
    > > even though sg may be the last entry. Perhaps this is crapping out,
    > > although if sg is a valid address, then sg + 1 should be as well.
    > > next_sg may end up being crap, in fact it will, but we'll never use that
    > > unless there are more entries to fill. And if there is, then both sg and
    > > next_sg were valid.
    > >
    > > So nothing in for-linus should fix it, I'll try and come up with an
    > > alternate way to assign next_sg so it's always valid.

    >
    > OK, the below should actually be safe, I don't know why I talked myself
    > into the next_sg stuff in the beginning. It's always safe to zero sg,
    > since it's a valid entry - nothing to save in ->page. Ingo, does this
    > work for you?
    >
    > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
    > index 9e3f3cc..3935469 100644
    > --- a/block/ll_rw_blk.c
    > +++ b/block/ll_rw_blk.c
    > @@ -1322,8 +1322,8 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    > struct scatterlist *sglist)
    > {
    > struct bio_vec *bvec, *bvprv;
    > - struct scatterlist *next_sg, *sg;
    > struct req_iterator iter;
    > + struct scatterlist *sg;
    > int nsegs, cluster;
    >
    > nsegs = 0;
    > @@ -1333,7 +1333,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    > * for each bio in rq
    > */
    > bvprv = NULL;
    > - sg = next_sg = &sglist[0];
    > + sg = NULL;
    > rq_for_each_segment(bvec, rq, iter) {
    > int nbytes = bvec->bv_len;
    >
    > @@ -1349,8 +1349,10 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    > sg->length += nbytes;
    > } else {
    > new_segment:
    > - sg = next_sg;
    > - next_sg = sg_next(sg);
    > + if (!sg)
    > + sg = sglist;
    > + else
    > + sg = sg_next(sg);
    >
    > memset(sg, 0, sizeof(*sg));
    > sg->page = bvec->bv_page;
    >


    Scratch that, it cannot work... I'll think up a different approach.

    --
    Jens Axboe

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [bug] block subsystem related crash with latest -git


    * Jens Axboe wrote:

    > OK, it is fine, as long as the sglist is cleared initially. And I
    > don't think there's any way around that, clearly I didn't think long
    > enough before including the memset() removal from Tomo.
    >
    > Ingo, please try this rolled up version.
    >
    > Linus, this should work. It would probably be best if you first did a
    > git revert on f5c0dde4c66421a3a2d7d6fa604a712c9b0744e5 and then
    > applied the ll_rw_blk.c bit alone. Do you want me to stuff that
    > (revert + patch) into a branch for you to pull?


    yep, this one did the trick, it booted up fine twice in a row already!

    Tested-by: Ingo Molnar

    thanks!

    Ingo

  11. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Ingo Molnar wrote:
    >
    > * Jens Axboe wrote:
    >
    > > > - sg = next_sg;
    > > > - next_sg = sg_next(sg);
    > > > + if (!sg)
    > > > + sg = sglist;
    > > > + else
    > > > + sg = sg_next(sg);
    > > >
    > > > memset(sg, 0, sizeof(*sg));
    > > > sg->page = bvec->bv_page;
    > > >

    > >
    > > Scratch that, it cannot work... I'll think up a different approach.

    >
    > too late, crashed my box with it already :-)


    Sorry about that, please try the next one that includes the scsi_lib.c
    one liner to clear the sg table on alloc :-)

    --
    Jens Axboe


  12. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Ingo Molnar wrote:
    >
    > * Jens Axboe wrote:
    >
    > > OK, it is fine, as long as the sglist is cleared initially. And I
    > > don't think there's any way around that, clearly I didn't think long
    > > enough before including the memset() removal from Tomo.
    > >
    > > Ingo, please try this rolled up version.
    > >
    > > Linus, this should work. It would probably be best if you first did a
    > > git revert on f5c0dde4c66421a3a2d7d6fa604a712c9b0744e5 and then
    > > applied the ll_rw_blk.c bit alone. Do you want me to stuff that
    > > (revert + patch) into a branch for you to pull?

    >
    > yep, this one did the trick, it booted up fine twice in a row already!
    >
    > Tested-by: Ingo Molnar
    >
    > thanks!


    Great! Thanks a lot for reporting and testing... Linus, care to pull

    git://git.kernel.dk/data/git/linux-2.6-block.git for-linus

    Jens Axboe (2):
    Revert "[SCSI] Remove full sg table memset()"
    [BLOCK] blk_rq_map_sg() next_sg fixup

    block/ll_rw_blk.c | 10 ++++++----
    drivers/scsi/scsi_lib.c | 2 ++
    2 files changed, 8 insertions(+), 4 deletions(-)

    --
    Jens Axboe


  13. Re: [bug] block subsystem related crash with latest -git


    * Jens Axboe wrote:

    > > - sg = next_sg;
    > > - next_sg = sg_next(sg);
    > > + if (!sg)
    > > + sg = sglist;
    > > + else
    > > + sg = sg_next(sg);
    > >
    > > memset(sg, 0, sizeof(*sg));
    > > sg->page = bvec->bv_page;
    > >

    >
    > Scratch that, it cannot work... I'll think up a different approach.


    too late, crashed my box with it already :-)

    Ingo

  14. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Jens Axboe wrote:
    > On Wed, Oct 17 2007, Jens Axboe wrote:
    > > On Wed, Oct 17 2007, Jens Axboe wrote:
    > > > On Wed, Oct 17 2007, Linus Torvalds wrote:
    > > > >
    > > > >
    > > > > On Wed, 17 Oct 2007, Ingo Molnar wrote:
    > > > > >
    > > > > > Jens, just got this crash on a testbox:
    > > > >
    > > > > The code in question is:
    > > > >
    > > > > mov %edx,0xc(%esp)
    > > > > mov (%ebx),%edi
    > > > > mov %edi,%edx
    > > > > sub %eax,%edx
    > > > > mov %edx,%eax
    > > > > sar $0x5,%eax
    > > > > shl $0xc,%eax
    > > > > add 0x8(%ebx),%eax
    > > > > cmp %eax,0xc(%esp)
    > > > > je +126
    > > > > mov 0x10(%esi),%eax <----- Oops
    > > > > lea 0x10(%esi),%edx
    > > > > test $0x1,%al
    > > > > jne +76
    > > > > mov %edi,(%esi)
    > > > > mov %ebp,0xc(%esi)
    > > > > mov 0x8(%ebx),%eax
    > > > > mov %eax,0x4(%esi)
    > > > >
    > > > >
    > > > > and it looks like %esi is overflowing from one page to the next one, ie:
    > > > >
    > > > > BUG: unable to handle kernel paging request at virtual address 7ca76000
    > > > > ESI: 7ca75ff0
    > > > >
    > > > > and you caught this thanks to page-alloc debugging again.
    > > > >
    > > > > I think I can match that up with the source code: that's "sg_next()". It's
    > > > > doing:
    > > > >
    > > > > sg++;
    > > > >
    > > > > if (unlikely(sg_is_chain(sg)))
    > > > > sg = sg_chain_ptr(sg);
    > > > >
    > > > > return sg;
    > > > >
    > > > > and the oopsing instruction is that load of "sg->page" in the assembly
    > > > > code:
    > > > >
    > > > > mov 0x10(%esi),%eax # %eax = sg->page
    > > > > lea 0x10(%esi),%edx # %edx = sg+1;
    > > > > test $0x1,%al # if (unlikely(sg_is_chain()))
    > > > > jne +76
    > > > >
    > > > > Jens?
    > > >
    > > > Yep, that's what I came up with as well - I asked Ingo for a dump in
    > > > private, but ended up just using ksymoops to decode the line.
    > > >
    > > > The way blk_rq_map_sg() operates is that it ends up doing a
    > > >
    > > > next_sg = sg_next(sg);
    > > >
    > > > even though sg may be the last entry. Perhaps this is crapping out,
    > > > although if sg is a valid address, then sg + 1 should be as well.
    > > > next_sg may end up being crap, in fact it will, but we'll never use that
    > > > unless there are more entries to fill. And if there is, then both sg and
    > > > next_sg were valid.
    > > >
    > > > So nothing in for-linus should fix it, I'll try and come up with an
    > > > alternate way to assign next_sg so it's always valid.

    > >
    > > OK, the below should actually be safe, I don't know why I talked myself
    > > into the next_sg stuff in the beginning. It's always safe to zero sg,
    > > since it's a valid entry - nothing to save in ->page. Ingo, does this
    > > work for you?
    > >
    > > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
    > > index 9e3f3cc..3935469 100644
    > > --- a/block/ll_rw_blk.c
    > > +++ b/block/ll_rw_blk.c
    > > @@ -1322,8 +1322,8 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    > > struct scatterlist *sglist)
    > > {
    > > struct bio_vec *bvec, *bvprv;
    > > - struct scatterlist *next_sg, *sg;
    > > struct req_iterator iter;
    > > + struct scatterlist *sg;
    > > int nsegs, cluster;
    > >
    > > nsegs = 0;
    > > @@ -1333,7 +1333,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    > > * for each bio in rq
    > > */
    > > bvprv = NULL;
    > > - sg = next_sg = &sglist[0];
    > > + sg = NULL;
    > > rq_for_each_segment(bvec, rq, iter) {
    > > int nbytes = bvec->bv_len;
    > >
    > > @@ -1349,8 +1349,10 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    > > sg->length += nbytes;
    > > } else {
    > > new_segment:
    > > - sg = next_sg;
    > > - next_sg = sg_next(sg);
    > > + if (!sg)
    > > + sg = sglist;
    > > + else
    > > + sg = sg_next(sg);
    > >
    > > memset(sg, 0, sizeof(*sg));
    > > sg->page = bvec->bv_page;
    > >

    >
    > Scratch that, it cannot work... I'll think up a different approach.


    OK, it is fine, as long as the sglist is cleared initially. And I don't
    think there's any way around that, clearly I didn't think long enough
    before including the memset() removal from Tomo.

    Ingo, please try this rolled up version.

    Linus, this should work. It would probably be best if you first did a
    git revert on f5c0dde4c66421a3a2d7d6fa604a712c9b0744e5 and then applied
    the ll_rw_blk.c bit alone. Do you want me to stuff that (revert + patch)
    into a branch for you to pull?

    diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
    index 9e3f3cc..3935469 100644
    --- a/block/ll_rw_blk.c
    +++ b/block/ll_rw_blk.c
    @@ -1322,8 +1322,8 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    struct scatterlist *sglist)
    {
    struct bio_vec *bvec, *bvprv;
    - struct scatterlist *next_sg, *sg;
    struct req_iterator iter;
    + struct scatterlist *sg;
    int nsegs, cluster;

    nsegs = 0;
    @@ -1333,7 +1333,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    * for each bio in rq
    */
    bvprv = NULL;
    - sg = next_sg = &sglist[0];
    + sg = NULL;
    rq_for_each_segment(bvec, rq, iter) {
    int nbytes = bvec->bv_len;

    @@ -1349,8 +1349,10 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    sg->length += nbytes;
    } else {
    new_segment:
    - sg = next_sg;
    - next_sg = sg_next(sg);
    + if (!sg)
    + sg = sglist;
    + else
    + sg = sg_next(sg);

    memset(sg, 0, sizeof(*sg));
    sg->page = bvec->bv_page;
    diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
    index 0c86be7..aac8a02 100644
    --- a/drivers/scsi/scsi_lib.c
    +++ b/drivers/scsi/scsi_lib.c
    @@ -764,6 +764,8 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
    if (unlikely(!sgl))
    goto enomem;

    + memset(sgl, 0, sizeof(*sgl) * sgp->size);
    +
    /*
    * first loop through, set initial index and return value
    */

    --
    Jens Axboe
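
    For readers comparing the two loop shapes in the ll_rw_blk.c hunk above, a small userspace model may help. It is a sketch, not kernel code, and it makes several assumptions: struct sg_model is a simplified stand-in for struct scatterlist, sg_next_model() is a hypothetical helper that only records which table index it steps to (it never reads the entry), and every segment starts a new entry (no clustering).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct scatterlist (assumption, not the kernel layout). */
struct sg_model {
    unsigned long page;
    unsigned int offset;
    unsigned int length;
};

static struct sg_model *table_base;  /* start of the table being walked */
static long max_index_stepped;       /* highest index sg_next_model() moved to */

/*
 * Hypothetical model of sg_next(): the real helper reads ->page of the
 * entry it steps to (to test the chain bit). Here we only record the
 * index, so the model never touches memory past the array.
 */
static struct sg_model *sg_next_model(struct sg_model *sg)
{
    long idx;

    sg++;
    idx = (long)(sg - table_base);
    if (idx > max_index_stepped)
        max_index_stepped = idx;
    return sg;
}

/* Old shape: take an entry, then look ahead immediately. */
long fill_eager(struct sg_model *table, int nsegs)
{
    struct sg_model *sg, *next_sg;
    int i;

    table_base = table;
    max_index_stepped = -1;
    next_sg = &table[0];
    for (i = 0; i < nsegs; i++) {
        sg = next_sg;
        next_sg = sg_next_model(sg);
        sg->length = 1;
    }
    return max_index_stepped;
}

/* Shape from the patch: advance only when another entry is needed. */
long fill_lazy(struct sg_model *table, int nsegs)
{
    struct sg_model *sg = NULL;
    int i;

    table_base = table;
    max_index_stepped = -1;
    for (i = 0; i < nsegs; i++) {
        if (!sg)
            sg = table;
        else
            sg = sg_next_model(sg);
        sg->length = 1;
    }
    return max_index_stepped;
}
```

    With a 4-entry table and 4 segments, fill_eager() steps to index 4, one past the end, while fill_lazy() stops at index 3. In the real helper the lazy shape would still inspect each entry before it is written, which is consistent with the requirement above that the table be cleared first.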


  15. [bug] ata subsystem related crash with latest -git


    ok, here's a different but similar crash that triggers on the testbox:

    [ 233.438890] BUG: unable to handle kernel paging request at virtual address 7d93e000
    [ 233.446390] printing eip: 784e9480 *pde = 01000067 *pte = 0593e000
    [ 233.452630] Oops: 0000 [#1] DEBUG_PAGEALLOC
    [ 233.456790]
    [ 233.458264] Pid: 0, comm: swapper Not tainted (2.6.23 #5)
    [ 233.463637] EIP: 0060:[<784e9480>] EFLAGS: 00010087 CPU: 0
    [ 233.469101] EIP is at ata_qc_issue+0x90/0x380
    [ 233.473429] EAX: 7d93dff0 EBX: 0000001f ECX: 7d93dff0 EDX: 798daf80
    [ 233.479668] ESI: 00000020 EDI: 7d93de00 EBP: 7b54007c ESP: 78a13e14
    [ 233.485908] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
    [ 233.491282] Process swapper (pid: 0, ti=78a12000 task=789753e0 task.ti=78a12000)
    [ 233.498473] Stack: 7d93de00 7b540000 7b540000 00000000 7d93dfe0 7b54007c 7d93db00 7b5417a4
    [ 233.506793] 784c2490 784ef69e 784f21f3 7b52de98 7d93db00 7b540000 7b5417a4 7d93db00
    [ 233.515112] 7b540000 7b524004 784f22e0 784ef380 784c2490 7d93db00 00000202 7b524004
    [ 233.523432] Call Trace:
    [ 233.526033] [<784c2490>] scsi_done+0x0/0x20
    [ 233.530279] [<784ef69e>] ata_scsi_translate+0xbe/0x140
    [ 233.535478] [<784f21f3>] ata_scsi_queuecmd+0x33/0x200
    [ 233.540591] [<784f22e0>] ata_scsi_queuecmd+0x120/0x200
    [ 233.545791] [<784ef380>] ata_scsi_rw_xlat+0x0/0x220
    [ 233.550730] [<784c2490>] scsi_done+0x0/0x20
    [ 233.554976] [<784c2d12>] scsi_dispatch_cmd+0x152/0x290
    [ 233.560177] [<78135c67>] trace_hardirqs_on+0x67/0xb0
    [ 233.565202] [<784c8c7e>] scsi_request_fn+0x1be/0x370
    [ 233.570229] [<78408086>] blk_run_queue+0x36/0x80
    [ 233.574909] [<784c7520>] scsi_next_command+0x30/0x50
    [ 233.579935] [<784c76ab>] scsi_end_request+0xab/0xe0
    [ 233.584875] [<784c83f9>] scsi_io_completion+0xa9/0x3d0
    [ 233.590075] [<78135c67>] trace_hardirqs_on+0x67/0xb0
    [ 233.595100] [<78405125>] blk_done_softirq+0x45/0x80
    [ 233.600040] [<78405153>] blk_done_softirq+0x73/0x80
    [ 233.604981] [<7811d4c3>] __do_softirq+0x53/0xb0
    [ 233.609573] [<7811d588>] do_softirq+0x68/0x70
    [ 233.613993] [<78105351>] do_IRQ+0x51/0x90
    [ 233.618066] [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
    [ 233.623092] [<7810f2d0>] pgd_dtor+0x0/0x50
    [ 233.627252] [<7810388e>] common_interrupt+0x2e/0x40
    [ 233.632192] [<7810f2d0>] pgd_dtor+0x0/0x50
    [ 233.636352] [<7815f3be>] quicklist_trim+0x5e/0x90
    [ 233.641118] [<7810f2cb>] check_pgt_cache+0x1b/0x20
    [ 233.645971] [<78100c52>] cpu_idle+0x32/0x60
    [ 233.650217] [<78a14b35>] start_kernel+0x265/0x300
    [ 233.654983] [<78a14380>] unknown_bootoption+0x0/0x1e0
    [ 233.660097] =======================
    [ 233.663649] Code: 00 00 00 8b 45 34 a8 02 0f 84 ed 00 00 00 8b bd 88 00 00 00 31 db 89 3c 24 8b 75 3c 89 f8 c7 44 24 10 00 00 00 00 eb 1b 8d 76 00 <8b> 50 10 8d 48 10 f6 c2 01 0f 85 be 02 00 00 89 44 24 10 83 c3
    [ 233.682455] EIP: [<784e9480>] ata_qc_issue+0x90/0x380 SS:ESP 0068:78a13e14
    [ 233.689302] Kernel panic - not syncing: Fatal exception in interrupt

    (gdb) list *0x784e9480
    0x784e9480 is in ata_qc_issue (include/linux/scatterlist.h:48).
    43 */
    44 static inline struct scatterlist *sg_next(struct scatterlist *sg)
    45 {
    46 sg++;
    47
    48 if (unlikely(sg_is_chain(sg)))
    49 sg = sg_chain_ptr(sg);
    50
    51 return sg;
    52 }
    (gdb)

    so there's sg_next() involvement too. Below is the disassembly.

    Ingo

    ------------------------->
    784e93f0 <ata_qc_issue>:
    784e93f0: 55 push %ebp
    784e93f1: 89 c5 mov %eax,%ebp
    784e93f3: 57 push %edi
    784e93f4: 56 push %esi
    784e93f5: 53 push %ebx
    784e93f6: 83 ec 14 sub $0x14,%esp
    784e93f9: 8b 00 mov (%eax),%eax
    784e93fb: 89 44 24 04 mov %eax,0x4(%esp)
    784e93ff: 8b 45 04 mov 0x4(%ebp),%eax
    784e9402: 80 7d 14 04 cmpb $0x4,0x14(%ebp)
    784e9406: 8b 10 mov (%eax),%edx
    784e9408: 0f 84 d2 01 00 00 je 784e95e0
    784e940e: 8b 5c 24 04 mov 0x4(%esp),%ebx
    784e9412: 83 83 84 16 00 00 01 addl $0x1,0x1684(%ebx)
    784e9419: 8b 45 38 mov 0x38(%ebp),%eax
    784e941c: 89 42 08 mov %eax,0x8(%edx)
    784e941f: 83 4d 34 01 orl $0x1,0x34(%ebp)
    784e9423: b8 01 00 00 00 mov $0x1,%eax
    784e9428: 8b 4d 38 mov 0x38(%ebp),%ecx
    784e942b: 89 c7 mov %eax,%edi
    784e942d: 8b 54 24 04 mov 0x4(%esp),%edx
    784e9431: d3 e7 shl %cl,%edi
    784e9433: 09 ba 80 16 00 00 or %edi,0x1680(%edx)
    784e9439: 8b 4d 00 mov 0x0(%ebp),%ecx
    784e943c: 89 4c 24 08 mov %ecx,0x8(%esp)
    784e9440: 80 7d 14 07 cmpb $0x7,0x14(%ebp)
    784e9444: 0f 87 c6 00 00 00 ja 784e9510
    784e944a: 0f be 4d 14 movsbl 0x14(%ebp),%ecx
    784e944e: d3 e0 shl %cl,%eax
    784e9450: a8 98 test $0x98,%al
    784e9452: 0f 84 ab 00 00 00 je 784e9503
    784e9458: 8b 45 34 mov 0x34(%ebp),%eax
    784e945b: a8 02 test $0x2,%al
    784e945d: 0f 84 ed 00 00 00 je 784e9550
    784e9463: 8b bd 88 00 00 00 mov 0x88(%ebp),%edi
    784e9469: 31 db xor %ebx,%ebx
    784e946b: 89 3c 24 mov %edi,(%esp)
    784e946e: 8b 75 3c mov 0x3c(%ebp),%esi
    784e9471: 89 f8 mov %edi,%eax
    784e9473: c7 44 24 10 00 00 00 movl $0x0,0x10(%esp)
    784e947a: 00
    784e947b: eb 1b jmp 784e9498
    784e947d: 8d 76 00 lea 0x0(%esi),%esi
    784e9480: 8b 50 10 mov 0x10(%eax),%edx
    784e9483: 8d 48 10 lea 0x10(%eax),%ecx
    784e9486: f6 c2 01 test $0x1,%dl
    784e9489: 0f 85 be 02 00 00 jne 784e974d
    784e948f: 89 44 24 10 mov %eax,0x10(%esp)
    784e9493: 83 c3 01 add $0x1,%ebx
    784e9496: 89 c8 mov %ecx,%eax
    784e9498: 39 f3 cmp %esi,%ebx
    784e949a: 75 e4 jne 784e9480
    784e949c: 8b 54 24 10 mov 0x10(%esp),%edx
    784e94a0: 8b 42 0c mov 0xc(%edx),%eax
    784e94a3: 83 e0 03 and $0x3,%eax
    784e94a6: 85 c0 test %eax,%eax
    784e94a8: 89 45 4c mov %eax,0x4c(%ebp)
    784e94ab: 0f 85 53 01 00 00 jne 784e9604
    784e94b1: 89 f1 mov %esi,%ecx
    784e94b3: 83 f9 00 cmp $0x0,%ecx
    784e94b6: 0f 84 df 01 00 00 je 784e969b
    784e94bc: 7e 30 jle 784e94ee
    784e94be: 31 d2 xor %edx,%edx
    784e94c0: 8b 1c 24 mov (%esp),%ebx
    784e94c3: 83 c2 01 add $0x1,%edx
    784e94c6: 8b 03 mov (%ebx),%eax
    784e94c8: 2b 05 f8 ec d7 78 sub 0x78d7ecf8,%eax
    784e94ce: c1 f8 05 sar $0x5,%eax
    784e94d1: c1 e0 0c shl $0xc,%eax
    784e94d4: 03 43 04 add 0x4(%ebx),%eax
    784e94d7: 89 43 08 mov %eax,0x8(%ebx)
    784e94da: 83 c3 10 add $0x10,%ebx
    784e94dd: 89 1c 24 mov %ebx,(%esp)
    784e94e0: 8b 03 mov (%ebx),%eax
    784e94e2: a8 01 test $0x1,%al
    784e94e4: 0f 85 6d 02 00 00 jne 784e9757
    784e94ea: 39 ca cmp %ecx,%edx
    784e94ec: 75 d2 jne 784e94c0
    784e94ee: f0 83 44 24 00 00 lock addl $0x0,0x0(%esp)
    784e94f4: 85 c9 test %ecx,%ecx
    784e94f6: 89 c8 mov %ecx,%eax
    784e94f8: 0f 8e 2c 02 00 00 jle 784e972a
    784e94fe: 89 45 3c mov %eax,0x3c(%ebp)
    784e9501: eb 11 jmp 784e9514
    784e9503: a8 24 test $0x24,%al
    784e9505: 0f 85 b5 00 00 00 jne 784e95c0
    784e950b: 90 nop
    784e950c: 8d 74 26 00 lea 0x0(%esi),%esi
    784e9510: 83 65 34 f9 andl $0xfffffff9,0x34(%ebp)
    784e9514: 8b 5c 24 04 mov 0x4(%esp),%ebx
    784e9518: 89 e8 mov %ebp,%eax
    784e951a: 8b 53 04 mov 0x4(%ebx),%edx
    784e951d: ff 52 48 call *0x48(%edx)
    784e9520: 8b 9d 8c 00 00 00 mov 0x8c(%ebp),%ebx
    784e9526: 89 e8 mov %ebp,%eax
    784e9528: 8b 7c 24 04 mov 0x4(%esp),%edi
    784e952c: 8b 57 04 mov 0x4(%edi),%edx
    784e952f: ff 52 4c call *0x4c(%edx)
    784e9532: 09 d8 or %ebx,%eax
    784e9534: 85 c0 test %eax,%eax
    784e9536: 89 85 8c 00 00 00 mov %eax,0x8c(%ebp)
    784e953c: 0f 85 fd 01 00 00 jne 784e973f
    784e9542: 83 c4 14 add $0x14,%esp
    784e9545: 5b pop %ebx
    784e9546: 5e pop %esi
    784e9547: 5f pop %edi
    784e9548: 5d pop %ebp
    784e9549: c3 ret
    784e954a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
    784e9550: a8 04 test $0x4,%al
    784e9552: 74 c0 je 784e9514
    784e9554: 8b 95 88 00 00 00 mov 0x88(%ebp),%edx
    784e955a: 8b 42 0c mov 0xc(%edx),%eax
    784e955d: 83 e0 03 and $0x3,%eax
    784e9560: 85 c0 test %eax,%eax
    784e9562: 89 45 4c mov %eax,0x4c(%ebp)
    784e9565: 0f 84 3a 01 00 00 je 784e96a5
    784e956b: 8b 45 38 mov 0x38(%ebp),%eax
    784e956e: 8d 3c 85 00 00 00 00 lea 0x0(,%eax,4),%edi
    784e9575: 8b 44 24 08 mov 0x8(%esp),%eax
    784e9579: 03 78 24 add 0x24(%eax),%edi
    784e957c: c7 07 00 00 00 00 movl $0x0,(%edi)
    784e9582: f6 45 10 08 testb $0x8,0x10(%ebp)
    784e9586: 0f 85 3a 01 00 00 jne 784e96c6
    784e958c: 8b 4c 24 08 mov 0x8(%esp),%ecx
    784e9590: 8b 45 38 mov 0x38(%ebp),%eax
    784e9593: c1 e0 02 shl $0x2,%eax
    784e9596: 03 41 28 add 0x28(%ecx),%eax
    784e9599: c7 85 80 00 00 00 04 movl $0x4,0x80(%ebp)
    784e95a0: 00 00 00
    784e95a3: 89 45 7c mov %eax,0x7c(%ebp)
    784e95a6: 8b 42 0c mov 0xc(%edx),%eax
    784e95a9: 2b 45 4c sub 0x4c(%ebp),%eax
    784e95ac: 85 c0 test %eax,%eax
    784e95ae: 89 42 0c mov %eax,0xc(%edx)
    784e95b1: 0f 85 ee 00 00 00 jne 784e96a5
    784e95b7: 83 6d 3c 01 subl $0x1,0x3c(%ebp)
    784e95bb: e9 54 ff ff ff jmp 784e9514
    784e95c0: 8b 5c 24 08 mov 0x8(%esp),%ebx
    784e95c4: 80 7b 0c 00 cmpb $0x0,0xc(%ebx)
    784e95c8: 0f 89 42 ff ff ff jns 784e9510
    784e95ce: 8b 45 34 mov 0x34(%ebp),%eax
    784e95d1: a8 02 test $0x2,%al
    784e95d3: 0f 84 77 ff ff ff je 784e9550
    784e95d9: e9 85 fe ff ff jmp 784e9463
    784e95de: 89 f6 mov %esi,%esi
    784e95e0: 8b 42 0c mov 0xc(%edx),%eax
    784e95e3: 85 c0 test %eax,%eax
    784e95e5: 75 0b jne 784e95f2
    784e95e7: 8b 4c 24 04 mov 0x4(%esp),%ecx
    784e95eb: 83 81 84 16 00 00 01 addl $0x1,0x1684(%ecx)
    784e95f2: 8b 4d 38 mov 0x38(%ebp),%ecx
    784e95f5: b8 01 00 00 00 mov $0x1,%eax
    784e95fa: d3 e0 shl %cl,%eax
    784e95fc: 09 42 0c or %eax,0xc(%edx)
    784e95ff: e9 1b fe ff ff jmp 784e941f
    784e9604: 8b 45 38 mov 0x38(%ebp),%eax
    784e9607: 8b 4c 24 08 mov 0x8(%esp),%ecx
    784e960b: c1 e0 02 shl $0x2,%eax
    784e960e: 89 44 24 0c mov %eax,0xc(%esp)
    784e9612: 8b 49 24 mov 0x24(%ecx),%ecx
    784e9615: 01 c8 add %ecx,%eax
    784e9617: 89 44 24 0c mov %eax,0xc(%esp)
    784e961b: c7 00 00 00 00 00 movl $0x0,(%eax)
    784e9621: 8b 5c 24 10 mov 0x10(%esp),%ebx
    784e9625: 8b 7c 24 10 mov 0x10(%esp),%edi
    784e9629: 8b 53 04 mov 0x4(%ebx),%edx
    784e962c: 03 53 0c add 0xc(%ebx),%edx
    784e962f: 8b 07 mov (%edi),%eax
    784e9631: 8b 1d f8 ec d7 78 mov 0x78d7ecf8,%ebx
    784e9637: 2b 55 4c sub 0x4c(%ebp),%edx
    784e963a: 29 d8 sub %ebx,%eax
    784e963c: 89 d1 mov %edx,%ecx
    784e963e: c1 f8 05 sar $0x5,%eax
    784e9641: 81 e2 ff 0f 00 00 and $0xfff,%edx
    784e9647: c1 e9 0c shr $0xc,%ecx
    784e964a: 01 c8 add %ecx,%eax
    784e964c: c1 e0 05 shl $0x5,%eax
    784e964f: 01 c3 add %eax,%ebx
    784e9651: f6 45 10 08 testb $0x8,0x10(%ebp)
    784e9655: 89 5d 74 mov %ebx,0x74(%ebp)
    784e9658: 89 55 78 mov %edx,0x78(%ebp)
    784e965b: 0f 85 88 00 00 00 jne 784e96e9
    784e9661: 8b 54 24 08 mov 0x8(%esp),%edx
    784e9665: 8b 45 38 mov 0x38(%ebp),%eax
    784e9668: c1 e0 02 shl $0x2,%eax
    784e966b: 03 42 28 add 0x28(%edx),%eax
    784e966e: c7 85 80 00 00 00 04 movl $0x4,0x80(%ebp)
    784e9675: 00 00 00
    784e9678: 89 45 7c mov %eax,0x7c(%ebp)
    784e967b: 8b 4c 24 10 mov 0x10(%esp),%ecx
    784e967f: 8b 41 0c mov 0xc(%ecx),%eax
    784e9682: 2b 45 4c sub 0x4c(%ebp),%eax
    784e9685: 85 c0 test %eax,%eax
    784e9687: 89 41 0c mov %eax,0xc(%ecx)
    784e968a: 75 32 jne 784e96be
    784e968c: 8b 4d 3c mov 0x3c(%ebp),%ecx
    784e968f: 85 c9 test %ecx,%ecx
    784e9691: 74 08 je 784e969b
    784e9693: 83 e9 01 sub $0x1,%ecx
    784e9696: e9 18 fe ff ff jmp 784e94b3
    784e969b: 31 c0 xor %eax,%eax
    784e969d: 8d 76 00 lea 0x0(%esi),%esi
    784e96a0: e9 59 fe ff ff jmp 784e94fe
    784e96a5: 8b 85 84 00 00 00 mov 0x84(%ebp),%eax
    784e96ab: f0 83 44 24 00 00 lock addl $0x0,0x0(%esp)
    784e96b1: 2d 00 00 00 78 sub $0x78000000,%eax
    784e96b6: 89 42 08 mov %eax,0x8(%edx)
    784e96b9: e9 56 fe ff ff jmp 784e9514
    784e96be: 8b 75 3c mov 0x3c(%ebp),%esi
    784e96c1: e9 eb fd ff ff jmp 784e94b1
    784e96c6: 8b 45 4c mov 0x4c(%ebp),%eax
    784e96c9: 8b 72 0c mov 0xc(%edx),%esi
    784e96cc: 03 b5 84 00 00 00 add 0x84(%ebp),%esi
    784e96d2: 89 c1 mov %eax,%ecx
    784e96d4: c1 e9 02 shr $0x2,%ecx
    784e96d7: 29 c6 sub %eax,%esi
    784e96d9: f3 a5 rep movsl %ds:(%esi),%es:(%edi)
    784e96db: 89 c1 mov %eax,%ecx
    784e96dd: 83 e1 03 and $0x3,%ecx
    784e96e0: 74 02 je 784e96e4
    784e96e2: f3 a4 rep movsb %ds:(%esi),%es:(%edi)
    784e96e4: e9 a3 fe ff ff jmp 784e958c
    784e96e9: 89 e2 mov %esp,%edx
    784e96eb: 81 e2 00 e0 ff ff and $0xffffe000,%edx
    784e96f1: 83 42 14 01 addl $0x1,0x14(%edx)
    784e96f5: 8b 45 4c mov 0x4c(%ebp),%eax
    784e96f8: 2b 1d f8 ec d7 78 sub 0x78d7ecf8,%ebx
    784e96fe: 8b 7c 24 0c mov 0xc(%esp),%edi
    784e9702: c1 fb 05 sar $0x5,%ebx
    784e9705: 89 c1 mov %eax,%ecx
    784e9707: c1 e3 0c shl $0xc,%ebx
    784e970a: 8d b3 00 00 00 78 lea 0x78000000(%ebx),%esi
    784e9710: c1 e9 02 shr $0x2,%ecx
    784e9713: 03 75 78 add 0x78(%ebp),%esi
    784e9716: f3 a5 rep movsl %ds:(%esi),%es:(%edi)
    784e9718: 89 c1 mov %eax,%ecx
    784e971a: 83 e1 03 and $0x3,%ecx
    784e971d: 74 02 je 784e9721
    784e971f: f3 a4 rep movsb %ds:(%esi),%es:(%edi)
    784e9721: 83 6a 14 01 subl $0x1,0x14(%edx)
    784e9725: e9 37 ff ff ff jmp 784e9661
    784e972a: 8b 54 24 10 mov 0x10(%esp),%edx
    784e972e: 8b 45 4c mov 0x4c(%ebp),%eax
    784e9731: 01 42 0c add %eax,0xc(%edx)
    784e9734: 83 65 34 f9 andl $0xfffffff9,0x34(%ebp)
    784e9738: 83 8d 8c 00 00 00 40 orl $0x40,0x8c(%ebp)
    784e973f: 83 c4 14 add $0x14,%esp
    784e9742: 89 e8 mov %ebp,%eax
    784e9744: 5b pop %ebx
    784e9745: 5e pop %esi
    784e9746: 5f pop %edi
    784e9747: 5d pop %ebp
    784e9748: e9 23 fc ff ff jmp 784e9370
    784e974d: 89 d1 mov %edx,%ecx
    784e974f: 83 e1 fe and $0xfffffffe,%ecx
    784e9752: e9 38 fd ff ff jmp 784e948f
    784e9757: 83 e0 fe and $0xfffffffe,%eax
    784e975a: 89 04 24 mov %eax,(%esp)
    784e975d: e9 88 fd ff ff jmp 784e94ea
    784e9762: 8d b4 26 00 00 00 00 lea 0x0(%esi),%esi
    784e9769: 8d bc 27 00 00 00 00 lea 0x0(%edi),%edi
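
    The numbers in the oops line up with a walk off the end of a 32-entry table. What follows is one reading of the register dump against the disassembly, not something stated in the thread: EDI (7d93de00) as the table base, ESI (0x20) as the entry count, EBX (0x1f) as the loop index, EAX (7d93dff0) as the current entry, and 0x10 as the entry size (from the lea 0x10(%eax),%ecx at 784e9483). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000u  /* 4 KiB pages, as on i386 */
#define SG_ENTRY_SIZE   0x10u    /* stride assumed from the disassembly */

/* Address of table entry idx, given the table base. */
uint32_t sg_entry_addr(uint32_t base, uint32_t idx)
{
    return base + idx * SG_ENTRY_SIZE;
}

/* Nonzero when addr is the first byte of a page. */
int is_page_start(uint32_t addr)
{
    return (addr & (MODEL_PAGE_SIZE - 1)) == 0;
}
```

    Under those assumptions entry 31 sits at 7d93dff0, so the load from 0x10(%eax) touches 7d93e000, the first byte of the next page, which matches the faulting address. The earlier blk_rq_map_sg oops has the same shape: ESI 7ca75ff0 plus 0x10 gives 7ca76000.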


  16. Re: [bug] ata subsystem related crash with latest -git

    On Wed, Oct 17 2007, Ingo Molnar wrote:
    >
    > ok, here's a different but similar crash that triggers on the testbox:
    >
    > [ 233.438890] BUG: unable to handle kernel paging request at virtual address 7d93e000
    > [ 233.446390] printing eip: 784e9480 *pde = 01000067 *pte = 0593e000
    > [ 233.452630] Oops: 0000 [#1] DEBUG_PAGEALLOC
    > [ 233.456790]
    > [ 233.458264] Pid: 0, comm: swapper Not tainted (2.6.23 #5)
    > [ 233.463637] EIP: 0060:[<784e9480>] EFLAGS: 00010087 CPU: 0
    > [ 233.469101] EIP is at ata_qc_issue+0x90/0x380
    > [ 233.473429] EAX: 7d93dff0 EBX: 0000001f ECX: 7d93dff0 EDX: 798daf80
    > [ 233.479668] ESI: 00000020 EDI: 7d93de00 EBP: 7b54007c ESP: 78a13e14
    > [ 233.485908] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
    > [ 233.491282] Process swapper (pid: 0, ti=78a12000 task=789753e0 task.ti=78a12000)
    > [ 233.498473] Stack: 7d93de00 7b540000 7b540000 00000000 7d93dfe0 7b54007c 7d93db00 7b5417a4
    > [ 233.506793] 784c2490 784ef69e 784f21f3 7b52de98 7d93db00 7b540000 7b5417a4 7d93db00
    > [ 233.515112] 7b540000 7b524004 784f22e0 784ef380 784c2490 7d93db00 00000202 7b524004
    > [ 233.523432] Call Trace:
    > [ 233.526033] [<784c2490>] scsi_done+0x0/0x20
    > [ 233.530279] [<784ef69e>] ata_scsi_translate+0xbe/0x140
    > [ 233.535478] [<784f21f3>] ata_scsi_queuecmd+0x33/0x200
    > [ 233.540591] [<784f22e0>] ata_scsi_queuecmd+0x120/0x200
    > [ 233.545791] [<784ef380>] ata_scsi_rw_xlat+0x0/0x220
    > [ 233.550730] [<784c2490>] scsi_done+0x0/0x20
    > [ 233.554976] [<784c2d12>] scsi_dispatch_cmd+0x152/0x290
    > [ 233.560177] [<78135c67>] trace_hardirqs_on+0x67/0xb0
    > [ 233.565202] [<784c8c7e>] scsi_request_fn+0x1be/0x370
    > [ 233.570229] [<78408086>] blk_run_queue+0x36/0x80
    > [ 233.574909] [<784c7520>] scsi_next_command+0x30/0x50
    > [ 233.579935] [<784c76ab>] scsi_end_request+0xab/0xe0
    > [ 233.584875] [<784c83f9>] scsi_io_completion+0xa9/0x3d0
    > [ 233.590075] [<78135c67>] trace_hardirqs_on+0x67/0xb0
    > [ 233.595100] [<78405125>] blk_done_softirq+0x45/0x80
    > [ 233.600040] [<78405153>] blk_done_softirq+0x73/0x80
    > [ 233.604981] [<7811d4c3>] __do_softirq+0x53/0xb0
    > [ 233.609573] [<7811d588>] do_softirq+0x68/0x70
    > [ 233.613993] [<78105351>] do_IRQ+0x51/0x90
    > [ 233.618066] [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
    > [ 233.623092] [<7810f2d0>] pgd_dtor+0x0/0x50
    > [ 233.627252] [<7810388e>] common_interrupt+0x2e/0x40
    > [ 233.632192] [<7810f2d0>] pgd_dtor+0x0/0x50
    > [ 233.636352] [<7815f3be>] quicklist_trim+0x5e/0x90
    > [ 233.641118] [<7810f2cb>] check_pgt_cache+0x1b/0x20
    > [ 233.645971] [<78100c52>] cpu_idle+0x32/0x60
    > [ 233.650217] [<78a14b35>] start_kernel+0x265/0x300
    > [ 233.654983] [<78a14380>] unknown_bootoption+0x0/0x1e0
    > [ 233.660097] =======================
    > [ 233.663649] Code: 00 00 00 8b 45 34 a8 02 0f 84 ed 00 00 00 8b bd 88 00 00 00 31 db 89 3c 24 8b 75 3c 89 f8 c7 44 24 10 00 00 00 00 eb 1b 8d 76 00 <8b> 50 10 8d 48 10 f6 c2 01 0f 85 be 02 00 00 89 44 24 10 83 c3
    > [ 233.682455] EIP: [<784e9480>] ata_qc_issue+0x90/0x380 SS:ESP 0068:78a13e14
    > [ 233.689302] Kernel panic - not syncing: Fatal exception in interrupt
    >
    > (gdb) list *0x784e9480
    > 0x784e9480 is in ata_qc_issue (include/linux/scatterlist.h:48).
    > 43 */
    > 44 static inline struct scatterlist *sg_next(struct scatterlist *sg)
    > 45 {
    > 46 sg++;
    > 47
    > 48 if (unlikely(sg_is_chain(sg)))
    > 49 sg = sg_chain_ptr(sg);
    > 50
    > 51 return sg;
    > 52 }
    > (gdb)
    >
    > so there's sg_next() involvement too. Below is the disassembly.


    You must have a magic test box :-)

    Will investigate... libata doesn't actually enable chaining, but since
    i386 supports it, it ends up using the chain helpers anyway.

    There seems to be some automatic inlining involved here, it must be
    dying inside ata_sg_setup().

    --
    Jens Axboe


  17. Re: [bug] block subsystem related crash with latest -git



    On Wed, 17 Oct 2007, Jens Axboe wrote:
    >
    > OK, it is fine, as long as the sglist is cleared initially. And I don't
    > think there's any way around that, clearly I didn't think long enough
    > before including the memset() removal from Tomo.


    Ok, I think that one-liner fixes the real bug.

    But I think the rest of your changes are simply bad.

    The fix to block/ll_rw_blk.c should likely be something like the
    appended instead:

    - remove the "memset()" you had added earlier. It's bogus. It cannot be
    the right thing. If the sg list wasn't initialized correctly much
    earlier, trying to initialize it late is pointless - it contains crap.

    - the old code was fine, but let's initialize "sg" to NULL to make it
    clear that the initial value of sg is pointless, and only "next_sg"
    matters (since sg had better be assigned from that).

    Hmm?

    Linus

    ---
    diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
    index 9e3f3cc..54d974e 100644
    --- a/block/ll_rw_blk.c
    +++ b/block/ll_rw_blk.c
    @@ -1333,7 +1333,8 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
    * for each bio in rq
    */
    bvprv = NULL;
    - sg = next_sg = &sglist[0];
    + sg = NULL;
    + next_sg = &sglist[0];
    rq_for_each_segment(bvec, rq, iter) {
    int nbytes = bvec->bv_len;

    @@ -1352,7 +1353,6 @@ new_segment:
    sg = next_sg;
    next_sg = sg_next(sg);

    - memset(sg, 0, sizeof(*sg));
    sg->page = bvec->bv_page;
    sg->length = nbytes;
    sg->offset = bvec->bv_offset;
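
    Both variants of the loop go through sg_next(), which tests the low bit of ->page in the entry it steps to (the test $0x1 in the disassembly quoted earlier) before the caller has written that entry. The userspace sketch below illustrates why a table full of stale data is a problem and why clearing it once at allocation avoids that. struct sg_model and the function names are hypothetical stand-ins, and the 0xff fill is just an example of stale contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct scatterlist (assumption, not the kernel layout). */
struct sg_model {
    unsigned long page;   /* low bit doubles as the "chain link" flag */
    unsigned int offset;
    unsigned int length;
};

/* Model of sg_is_chain(): an entry is a chain link if bit 0 of ->page is set. */
static int sg_is_chain_model(const struct sg_model *sg)
{
    return (sg->page & 1ul) != 0;
}

/*
 * Count the entries that a walk starting at table[0] would step to
 * (indices 1..n-1) and misread as chain links.
 */
int count_stale_chain_links(const struct sg_model *table, size_t n)
{
    size_t i;
    int hits = 0;

    for (i = 1; i < n; i++)
        if (sg_is_chain_model(&table[i]))
            hits++;
    return hits;
}

/* Model of clearing the whole table once, right after allocation. */
void clear_table(struct sg_model *table, size_t n)
{
    memset(table, 0, sizeof(*table) * n);
}
```

    On a table filled with 0xff bytes every stepped-to entry looks like a chain link. After clear_table() none do, which is the effect the memset() in scsi_alloc_sgtable() is meant to have in this model.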

  18. Re: [bug] ata subsystem related crash with latest -git

    On Wed, Oct 17 2007, Jens Axboe wrote:
    > On Wed, Oct 17 2007, Ingo Molnar wrote:
    > >
    > > ok, here's a different but similar crash that triggers on the testbox:
    > >
    > > [ 233.438890] BUG: unable to handle kernel paging request at virtual address 7d93e000
    > > [ 233.446390] printing eip: 784e9480 *pde = 01000067 *pte = 0593e000
    > > [ 233.452630] Oops: 0000 [#1] DEBUG_PAGEALLOC
    > > [ 233.456790]
    > > [ 233.458264] Pid: 0, comm: swapper Not tainted (2.6.23 #5)
    > > [ 233.463637] EIP: 0060:[<784e9480>] EFLAGS: 00010087 CPU: 0
    > > [ 233.469101] EIP is at ata_qc_issue+0x90/0x380
    > > [ 233.473429] EAX: 7d93dff0 EBX: 0000001f ECX: 7d93dff0 EDX: 798daf80
    > > [ 233.479668] ESI: 00000020 EDI: 7d93de00 EBP: 7b54007c ESP: 78a13e14
    > > [ 233.485908] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
    > > [ 233.491282] Process swapper (pid: 0, ti=78a12000 task=789753e0 task.ti=78a12000)
    > > [ 233.498473] Stack: 7d93de00 7b540000 7b540000 00000000 7d93dfe0 7b54007c 7d93db00 7b5417a4
    > > [ 233.506793] 784c2490 784ef69e 784f21f3 7b52de98 7d93db00 7b540000 7b5417a4 7d93db00
    > > [ 233.515112] 7b540000 7b524004 784f22e0 784ef380 784c2490 7d93db00 00000202 7b524004
    > > [ 233.523432] Call Trace:
    > > [ 233.526033] [<784c2490>] scsi_done+0x0/0x20
    > > [ 233.530279] [<784ef69e>] ata_scsi_translate+0xbe/0x140
    > > [ 233.535478] [<784f21f3>] ata_scsi_queuecmd+0x33/0x200
    > > [ 233.540591] [<784f22e0>] ata_scsi_queuecmd+0x120/0x200
    > > [ 233.545791] [<784ef380>] ata_scsi_rw_xlat+0x0/0x220
    > > [ 233.550730] [<784c2490>] scsi_done+0x0/0x20
    > > [ 233.554976] [<784c2d12>] scsi_dispatch_cmd+0x152/0x290
    > > [ 233.560177] [<78135c67>] trace_hardirqs_on+0x67/0xb0
    > > [ 233.565202] [<784c8c7e>] scsi_request_fn+0x1be/0x370
    > > [ 233.570229] [<78408086>] blk_run_queue+0x36/0x80
    > > [ 233.574909] [<784c7520>] scsi_next_command+0x30/0x50
    > > [ 233.579935] [<784c76ab>] scsi_end_request+0xab/0xe0
    > > [ 233.584875] [<784c83f9>] scsi_io_completion+0xa9/0x3d0
    > > [ 233.590075] [<78135c67>] trace_hardirqs_on+0x67/0xb0
    > > [ 233.595100] [<78405125>] blk_done_softirq+0x45/0x80
    > > [ 233.600040] [<78405153>] blk_done_softirq+0x73/0x80
    > > [ 233.604981] [<7811d4c3>] __do_softirq+0x53/0xb0
    > > [ 233.609573] [<7811d588>] do_softirq+0x68/0x70
    > > [ 233.613993] [<78105351>] do_IRQ+0x51/0x90
    > > [ 233.618066] [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
    > > [ 233.623092] [<7810f2d0>] pgd_dtor+0x0/0x50
    > > [ 233.627252] [<7810388e>] common_interrupt+0x2e/0x40
    > > [ 233.632192] [<7810f2d0>] pgd_dtor+0x0/0x50
    > > [ 233.636352] [<7815f3be>] quicklist_trim+0x5e/0x90
    > > [ 233.641118] [<7810f2cb>] check_pgt_cache+0x1b/0x20
    > > [ 233.645971] [<78100c52>] cpu_idle+0x32/0x60
    > > [ 233.650217] [<78a14b35>] start_kernel+0x265/0x300
    > > [ 233.654983] [<78a14380>] unknown_bootoption+0x0/0x1e0
    > > [ 233.660097] =======================
    > > [ 233.663649] Code: 00 00 00 8b 45 34 a8 02 0f 84 ed 00 00 00 8b bd 88 00 00 00 31 db 89 3c 24 8b 75 3c 89 f8 c7 44 24 10 00 00 00 00 eb 1b 8d 76 00 <8b> 50 10 8d 48 10 f6 c2 01 0f 85 be 02 00 00 89 44 24 10 83 c3
    > > [ 233.682455] EIP: [<784e9480>] ata_qc_issue+0x90/0x380 SS:ESP 0068:78a13e14
    > > [ 233.689302] Kernel panic - not syncing: Fatal exception in interrupt
    > >
    > > (gdb) list *0x784e9480
    > > 0x784e9480 is in ata_qc_issue (include/linux/scatterlist.h:48).
    > > 43 */
    > > 44 static inline struct scatterlist *sg_next(struct scatterlist *sg)
    > > 45 {
    > > 46 sg++;
    > > 47
    > > 48 if (unlikely(sg_is_chain(sg)))
    > > 49 sg = sg_chain_ptr(sg);
    > > 50
    > > 51 return sg;
    > > 52 }
    > > (gdb)
    > >
    > > so there's sg_next() involvement too. Below is the disassembly.

    >
    > You must have a magic test box :-)
    >
    > Will investigate... libata doesn't actually enable chaining, but since
    > i386 supports it, it ends up using the chain helpers anyway.
    >
    > There seems to be some automatic inlining involved here, it must be
    > dying inside ata_sg_setup().


    Also, can you send a dmesg from that system so I can see which libata
    drivers are involved?

    --
    Jens Axboe
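
    The sg_next() helper from the gdb listing above can be modeled in
    userspace to see what "using the chain helpers" involves. This is a
    minimal sketch, not the kernel header: struct model_sg and the model_*
    functions are hypothetical names, and marking a chain link with bit 0
    of the page field is an assumption made for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct scatterlist. */
struct model_sg {
    uintptr_t page;         /* page pointer, or (next table | 1) */
    unsigned int offset;
    unsigned int length;
};

/* Assumption: bit 0 of ->page marks a chain link. */
static int model_sg_is_chain(const struct model_sg *sg)
{
    return (sg->page & 1u) != 0;
}

/* Strip the marker bit to recover the address of the next table. */
static struct model_sg *model_sg_chain_ptr(const struct model_sg *sg)
{
    return (struct model_sg *)(sg->page & ~(uintptr_t)1);
}

/* Same shape as the listing: step first, then inspect the new entry. */
static struct model_sg *model_sg_next(struct model_sg *sg)
{
    sg++;
    if (model_sg_is_chain(sg))
        sg = model_sg_chain_ptr(sg);
    return sg;
}

/* Turn the last entry of tbl[] into a link to the table at next. */
static void model_sg_chain(struct model_sg *tbl, size_t nents,
                           struct model_sg *next)
{
    assert(nents > 0);
    assert(((uintptr_t)next & 1u) == 0);    /* low bit must be free */
    tbl[nents - 1].page = (uintptr_t)next | 1u;
}
```

    With two zeroed tables a[3] and b[2], model_sg_chain(a, 3, b) makes
    model_sg_next(&a[1]) return &b[0]. Note that the helper always reads
    the entry after sg, so that entry has to be readable memory with sane
    contents.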


  19. Re: [bug] block subsystem related crash with latest -git



    On Wed, 17 Oct 2007, Jens Axboe wrote:
    >
    > OK, the below should actually be safe, I don't know why I talked myself
    > into the next_sg stuff in the beginning. It's always safe to zero sg,
    > since it's a valid entry - nothing to save in ->page. Ingo, does this
    > work for you?


    I really don't think this should work.

    Doing "sg_next()" on a valid sg is *always* ok. So if the old code didn't
    work, then "sg" wasn't valid to start with (and the code *after* the
    sg_next() would have oopsed even if you try to avoid using sg_next).

    So avoiding the "sg_next()" on the last entry is pointless.

    Also, your patch makes the code almost totally unreadable, with that
    subtle issue of the "if (bvprv && cluster)" case not triggering on the
    first case, so the NULL initial sg is "safe".

    So at a guess, I think the *real* problem is simply that the passed-in
    sglist was just too small. What guarantees that the sg list allocation
    (apparently done by scsi_alloc_sgtable()) is big enough?

    If I read things right, scsi_alloc_sgtable() will allocate "cmd->use_sg"
    SG entries, no? But I also notice that it does not seem to initialize the
    SG allocation, so those SG entries contain random crap - including,
    perhaps, a random - and bogus - chain pointer in sg->page..

    Yes, we set sg->page *if* we create a chain, but if we don't chain, we
    leave the old random contents around which in turn may include old and
    stale chain pointers. Or am I missing something?

    So when you added that "memset(sg, 0, sizeof(*sg))" into blk_rq_map_sg(),
    you did it way too late - it needs to be done when the sg chain is
    allocated, and for every entry (and then the "link" entry needs to be
    linked in separately)

    I think.

    Linus
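
    The ordering argued for above (clear every entry when the table is
    allocated, then install the link entry as a separate step) can be
    sketched as follows. This is a userspace model under stated
    assumptions: model_alloc_sgtable() and model_link_sgtable() are
    hypothetical stand-ins and not scsi_alloc_sgtable() itself, and bit 0
    of the page field is assumed to be the chain marker.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct scatterlist. */
struct model_sg {
    uintptr_t page;         /* page pointer, or (next table | 1) */
    unsigned int offset;
    unsigned int length;
};

/* Allocate nents entries and clear all of them before any use. */
static struct model_sg *model_alloc_sgtable(size_t nents)
{
    struct model_sg *tbl;

    assert(nents > 0);
    tbl = malloc(nents * sizeof(*tbl));
    if (tbl == NULL)
        return NULL;
    /* Without this, recycled memory could hold a stale marker bit. */
    memset(tbl, 0, nents * sizeof(*tbl));
    return tbl;
}

/* Link step, done only after the whole table has been cleared. */
static void model_link_sgtable(struct model_sg *tbl, size_t nents,
                               struct model_sg *next)
{
    assert(nents > 0);
    assert(((uintptr_t)next & 1u) == 0);
    tbl[nents - 1].page = (uintptr_t)next | 1u;
}

/* Count entries that a chain-aware walker would treat as links. */
static size_t model_count_chain_marks(const struct model_sg *tbl,
                                      size_t nents)
{
    size_t i, n = 0;

    for (i = 0; i < nents; i++)
        if (tbl[i].page & 1u)
            n++;
    return n;
}
```

    A freshly allocated table then reports zero chain marks, and exactly
    one after model_link_sgtable() has been called on it.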


  20. Re: [bug] block subsystem related crash with latest -git

    On Wed, Oct 17 2007, Linus Torvalds wrote:
    >
    >
    > On Wed, 17 Oct 2007, Jens Axboe wrote:
    > >
    > > OK, the below should actually be safe, I don't know why I talked myself
    > > into the next_sg stuff in the beginning. It's always safe to zero sg,
    > > since it's a valid entry - nothing to save in ->page. Ingo, does this
    > > work for you?

    >
    > I really don't think this should work.
    >
    > Doing "sg_next()" on a valid sg is *always* ok. So if the old code didn't
    > work, then "sg" wasn't valid to start with (and the code *after* the
    > sg_next() would have oopsed even if you try to avoid using sg_next).
    >
    > So avoiding the "sg_next()" on the last entry is pointless.


    Yeah, I didn't quite understand why, if sg was valid, dereferencing
    *(sg + 1)->page would crap out :/

    > Also, your patch makes the code almost totally unreadable, with that
    > subtle issue of the "if (bvprv && cluster)" case not triggering on the
    > first case, so the NULL initial sg is "safe".


    Hmm I think it's quite readable, but perhaps that's just me :-). The
    first is much cleaner, and the last part just reads 'if sg is not set
    yet, set to list. otherwise, goto next entry'.

    > So at a guess, I think the *real* problem is simply that the passed-in
    > sglist was just too small. What guarantees that the sg list allocation
    > (apparently done by scsi_alloc_sgtable()) is big enough?
    >
    > If I read things right, scsi_alloc_sgtable() will allocate "cmd->use_sg"
    > SG entries, no? But I also notice that it does not seem to initialize the
    > SG allocation, so those SG entries contain random crap - including,
    > perhaps, a random - and bogus - chain pointer in sg->page..


    Right, we allocate an sgtable that will hold ->use_sg entries, and
    ->use_sg is set from request->nr_phys_segments. So that should
    definitely fit.

    Regarding the init of the sglist, that was the revert I was talking
    about. We do need that memset() in there, so all those sg entries will
    be properly zeroed.

    > Yes, we set sg->page *if* we create a chain, but if we don't chain, we
    > leave the old random contents around which in turn may include old and
    > stale chain pointers. Or am I missing something?
    >
    > So when you added that "memset(sg, 0, sizeof(*sg))" into blk_rq_map_sg(),
    > you did it way too late - it needs to be done when the sg chain is
    > allocated, and for every entry (and then the "link" entry needs to be
    > linked in separately)
    >
    > I think.


    Yep, and that is what Ingo did test as well and it worked. For that
    case, now libata is crapping out elsewhere in sg_next().

    --
    Jens Axboe

