[PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE - Kernel

This is a discussion on [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE - Kernel ; blk_recalc_rq_segments assumes that any segments can be merged in the case of BIOVEC_VIRT_MERGEABLE && !BIOVEC_VIRT_OVERSIZE. However, an IOMMU can't merge segments if the total length of the segments is larger than max_segment_size (the LLD restriction). Due to this bug, a ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 39

Thread: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

  1. [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    blk_recalc_rq_segments assumes that any segments can be merged in the
    case of BIOVEC_VIRT_MERGEABLE && !BIOVEC_VIRT_OVERSIZE. However, an
    IOMMU can't merge segments if the total length of the segments is
    larger than max_segment_size (the LLD restriction).

    Due to this bug, a LLD may get the larger number of segments than
    nr_hw_segments because the block layer puts more segments in a request
    than it should do.

    This bug could happen on alpha, parisc, and sparc, which use VMERGE.

    Like blk_hw_contig_segment() does, this patch uses hw_seg_size for
    simplicity, which is a bit larger than an exact value (we don't need
    BIOVEC_VIRT_START_SIZE here).

    Signed-off-by: FUJITA Tomonori
    Cc: Jens Axboe
    Cc: Mikulas Patocka
    Cc: David Miller
    ---
    block/blk-merge.c | 3 ++-
    1 files changed, 2 insertions(+), 1 deletions(-)

    diff --git a/block/blk-merge.c b/block/blk-merge.c
    index 5efc9e7..39a22f8 100644
    --- a/block/blk-merge.c
    +++ b/block/blk-merge.c
    @@ -83,7 +83,8 @@ void blk_recalc_rq_segments(struct request *rq)
    continue;
    }
    new_segment:
    - if (BIOVEC_VIRT_MERGEABLE(bvprv, bv) &&
    + if (hw_seg_size + bv->bv_len <= q->max_segment_size &&
    + BIOVEC_VIRT_MERGEABLE(bvprv, bv) &&
    !BIOVEC_VIRT_OVERSIZE(hw_seg_size + bv->bv_len))
    hw_seg_size += bv->bv_len;
    else {
    --
    1.5.5.GIT

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 15 Jul 2008, FUJITA Tomonori wrote:

    > blk_recalc_rq_segments assumes that any segments can be merged in the
    > case of BIOVEC_VIRT_MERGEABLE && !BIOVEC_VIRT_OVERSIZE. However, an
    > IOMMU can't merge segments if the total length of the segments is
    > larger than max_segment_size (the LLD restriction).
    >
    > Due to this bug, a LLD may get the larger number of segments than
    > nr_hw_segments because the block layer puts more segments in a request
    > than it should do.
    >
    > This bug could happen on alpha, parisc, and sparc, which use VMERGE.


    Parisc doesn't use virtual merge accounting (there is variable for it but
    it's always 0). On sparc64 it is broken anyway with or without your patch.
    And alpha alone doesn't justify substantial code bloat in generic block
    layer. So I propose this patch to drop it at all.

    A further patch could be made to drop bi_hw_segments, nr_hw_segments and
    similar --- they have no use for anything except alpha and because there
    are few alpha computers and alpha is discontinued platform, the logic for
    attempting to predict number of hardware segments can't be well tested and
    maintained.

    > Like blk_hw_contig_segment() does, this patch uses hw_seg_size for
    > simplicity, which is a bit larger than an exact value (we don't need
    > BIOVEC_VIRT_START_SIZE here).
    >
    > Signed-off-by: FUJITA Tomonori
    > Cc: Jens Axboe
    > Cc: Mikulas Patocka
    > Cc: David Miller
    > ---
    > block/blk-merge.c | 3 ++-
    > 1 files changed, 2 insertions(+), 1 deletions(-)
    >
    > diff --git a/block/blk-merge.c b/block/blk-merge.c
    > index 5efc9e7..39a22f8 100644
    > --- a/block/blk-merge.c
    > +++ b/block/blk-merge.c
    > @@ -83,7 +83,8 @@ void blk_recalc_rq_segments(struct request *rq)
    > continue;
    > }
    > new_segment:
    > - if (BIOVEC_VIRT_MERGEABLE(bvprv, bv) &&
    > + if (hw_seg_size + bv->bv_len <= q->max_segment_size &&
    > + BIOVEC_VIRT_MERGEABLE(bvprv, bv) &&
    > !BIOVEC_VIRT_OVERSIZE(hw_seg_size + bv->bv_len))
    > hw_seg_size += bv->bv_len;
    > else {
    >


    Drop virtual merge accounting in bio layer (not the virtual merging
    itself). It is used only on Sparc64 (where it is broken and needs to be
    disabled) and on Alpha. This logic is very hard to maintain (the generic
    code in block/blk-merge.c tries to attempt to predict how
    architecture-specific IOMMU will merge the segments) and Alpha
    architecture alone doesn't justify the maintenance and code-bloat cost.

    Signed-off-by: Mikulas Patocka
    ---
    arch/parisc/kernel/setup.c | 5 ---
    arch/x86/kernel/pci-dma.c | 6 ---
    block/blk-merge.c | 72 +++------------------------------------------
    fs/bio.c | 6 +--
    include/asm-alpha/io.h | 3 -
    include/asm-ia64/io.h | 26 +---------------
    include/asm-parisc/io.h | 6 ---
    include/asm-powerpc/io.h | 7 ----
    include/asm-sparc64/io.h | 1
    include/asm-x86/io_64.h | 3 -
    include/linux/bio.h | 15 ---------
    11 files changed, 10 insertions(+), 140 deletions(-)

    Index: linux-2.6.26-fast/arch/parisc/kernel/setup.c
    ================================================== =================
    --- linux-2.6.26-fast.orig/arch/parisc/kernel/setup.c 2008-07-15 14:19:01.000000000 +0200
    +++ linux-2.6.26-fast/arch/parisc/kernel/setup.c 2008-07-15 14:19:18.000000000 +0200
    @@ -57,11 +57,6 @@ int parisc_bus_is_phys __read_mostly = 1
    EXPORT_SYMBOL(parisc_bus_is_phys);
    #endif

    -/* This sets the vmerge boundary and size, it's here because it has to
    - * be available on all platforms (zero means no-virtual merging) */
    -unsigned long parisc_vmerge_boundary = 0;
    -unsigned long parisc_vmerge_max_size = 0;
    -
    void __init setup_cmdline(char **cmdline_p)
    {
    extern unsigned int boot_args[];
    Index: linux-2.6.26-fast/arch/x86/kernel/pci-dma.c
    ================================================== =================
    --- linux-2.6.26-fast.orig/arch/x86/kernel/pci-dma.c 2008-07-15 14:20:15.000000000 +0200
    +++ linux-2.6.26-fast/arch/x86/kernel/pci-dma.c 2008-07-15 14:20:37.000000000 +0200
    @@ -30,11 +30,6 @@ int no_iommu __read_mostly;
    /* Set this to 1 if there is a HW IOMMU in the system */
    int iommu_detected __read_mostly = 0;

    -/* This tells the BIO block layer to assume merging. Default to off
    - because we cannot guarantee merging later. */
    -int iommu_bio_merge __read_mostly = 0;
    -EXPORT_SYMBOL(iommu_bio_merge);
    -
    dma_addr_t bad_dma_address __read_mostly = 0;
    EXPORT_SYMBOL(bad_dma_address);

    @@ -151,7 +146,6 @@ static __init int iommu_setup(char *p)
    }

    if (!strncmp(p, "biomerge", 8)) {
    - iommu_bio_merge = 4096;
    iommu_merge = 1;
    force_iommu = 1;
    }
    Index: linux-2.6.26-fast/block/blk-merge.c
    ================================================== =================
    --- linux-2.6.26-fast.orig/block/blk-merge.c 2008-07-15 14:22:25.000000000 +0200
    +++ linux-2.6.26-fast/block/blk-merge.c 2008-07-15 14:36:06.000000000 +0200
    @@ -66,7 +66,7 @@ void blk_recalc_rq_segments(struct reque
    */
    high = page_to_pfn(bv->bv_page) > q->bounce_pfn;
    if (high || highprv)
    - goto new_hw_segment;
    + goto new_segment;
    if (cluster) {
    if (seg_size + bv->bv_len > q->max_segment_size)
    goto new_segment;
    @@ -74,8 +74,6 @@ void blk_recalc_rq_segments(struct reque
    goto new_segment;
    if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bv))
    goto new_segment;
    - if (BIOVEC_VIRT_OVERSIZE(hw_seg_size + bv->bv_len))
    - goto new_hw_segment;

    seg_size += bv->bv_len;
    hw_seg_size += bv->bv_len;
    @@ -83,17 +81,11 @@ void blk_recalc_rq_segments(struct reque
    continue;
    }
    new_segment:
    - if (BIOVEC_VIRT_MERGEABLE(bvprv, bv) &&
    - !BIOVEC_VIRT_OVERSIZE(hw_seg_size + bv->bv_len))
    - hw_seg_size += bv->bv_len;
    - else {
    -new_hw_segment:
    - if (nr_hw_segs == 1 &&
    - hw_seg_size > rq->bio->bi_hw_front_size)
    - rq->bio->bi_hw_front_size = hw_seg_size;
    - hw_seg_size = BIOVEC_VIRT_START_SIZE(bv) + bv->bv_len;
    - nr_hw_segs++;
    - }
    + if (nr_hw_segs == 1 &&
    + hw_seg_size > rq->bio->bi_hw_front_size)
    + rq->bio->bi_hw_front_size = hw_seg_size;
    + hw_seg_size = bv->bv_len;
    + nr_hw_segs++;

    nr_phys_segs++;
    bvprv = bv;
    @@ -146,22 +138,6 @@ static int blk_phys_contig_segment(struc
    return 0;
    }

    -static int blk_hw_contig_segment(struct request_queue *q, struct bio *bio,
    - struct bio *nxt)
    -{
    - if (!bio_flagged(bio, BIO_SEG_VALID))
    - blk_recount_segments(q, bio);
    - if (!bio_flagged(nxt, BIO_SEG_VALID))
    - blk_recount_segments(q, nxt);
    - if (!BIOVEC_VIRT_MERGEABLE(__BVEC_END(bio), __BVEC_START(nxt)) ||
    - BIOVEC_VIRT_OVERSIZE(bio->bi_hw_back_size + nxt->bi_hw_front_size))
    - return 0;
    - if (bio->bi_hw_back_size + nxt->bi_hw_front_size > q->max_segment_size)
    - return 0;
    -
    - return 1;
    -}
    -
    /*
    * map a request to scatterlist, return number of sg entries setup. Caller
    * must make sure sg can hold rq->nr_phys_segments entries
    @@ -317,18 +293,6 @@ int ll_back_merge_fn(struct request_queu
    if (!bio_flagged(bio, BIO_SEG_VALID))
    blk_recount_segments(q, bio);
    len = req->biotail->bi_hw_back_size + bio->bi_hw_front_size;
    - if (BIOVEC_VIRT_MERGEABLE(__BVEC_END(req->biotail), __BVEC_START(bio))
    - && !BIOVEC_VIRT_OVERSIZE(len)) {
    - int mergeable = ll_new_mergeable(q, req, bio);
    -
    - if (mergeable) {
    - if (req->nr_hw_segments == 1)
    - req->bio->bi_hw_front_size = len;
    - if (bio->bi_hw_segments == 1)
    - bio->bi_hw_back_size = len;
    - }
    - return mergeable;
    - }

    return ll_new_hw_segment(q, req, bio);
    }
    @@ -356,18 +320,6 @@ int ll_front_merge_fn(struct request_que
    blk_recount_segments(q, bio);
    if (!bio_flagged(req->bio, BIO_SEG_VALID))
    blk_recount_segments(q, req->bio);
    - if (BIOVEC_VIRT_MERGEABLE(__BVEC_END(bio), __BVEC_START(req->bio)) &&
    - !BIOVEC_VIRT_OVERSIZE(len)) {
    - int mergeable = ll_new_mergeable(q, req, bio);
    -
    - if (mergeable) {
    - if (bio->bi_hw_segments == 1)
    - bio->bi_hw_front_size = len;
    - if (req->nr_hw_segments == 1)
    - req->biotail->bi_hw_back_size = len;
    - }
    - return mergeable;
    - }

    return ll_new_hw_segment(q, req, bio);
    }
    @@ -399,18 +351,6 @@ static int ll_merge_requests_fn(struct r
    return 0;

    total_hw_segments = req->nr_hw_segments + next->nr_hw_segments;
    - if (blk_hw_contig_segment(q, req->biotail, next->bio)) {
    - int len = req->biotail->bi_hw_back_size +
    - next->bio->bi_hw_front_size;
    - /*
    - * propagate the combined length to the end of the requests
    - */
    - if (req->nr_hw_segments == 1)
    - req->bio->bi_hw_front_size = len;
    - if (next->nr_hw_segments == 1)
    - next->biotail->bi_hw_back_size = len;
    - total_hw_segments--;
    - }

    if (total_hw_segments > q->max_hw_segments)
    return 0;
    Index: linux-2.6.26-fast/include/asm-alpha/io.h
    ================================================== =================
    --- linux-2.6.26-fast.orig/include/asm-alpha/io.h 2008-07-15 14:14:39.000000000 +0200
    +++ linux-2.6.26-fast/include/asm-alpha/io.h 2008-07-15 14:16:15.000000000 +0200
    @@ -96,9 +96,6 @@ static inline dma_addr_t __deprecated is
    return page_to_phys(page);
    }

    -/* This depends on working iommu. */
    -#define BIO_VMERGE_BOUNDARY (alpha_mv.mv_pci_tbi ? PAGE_SIZE : 0)
    -
    /* Maximum PIO space address supported? */
    #define IO_SPACE_LIMIT 0xffff

    Index: linux-2.6.26-fast/include/asm-ia64/io.h
    ================================================== =================
    --- linux-2.6.26-fast.orig/include/asm-ia64/io.h 2008-07-15 14:14:45.000000000 +0200
    +++ linux-2.6.26-fast/include/asm-ia64/io.h 2008-07-15 14:18:24.000000000 +0200
    @@ -430,30 +430,8 @@ extern void memcpy_fromio(void *dst, con
    extern void memcpy_toio(volatile void __iomem *dst, const void *src, long n);
    extern void memset_io(volatile void __iomem *s, int c, long n);

    -# endif /* __KERNEL__ */
    -
    -/*
    - * Enabling BIO_VMERGE_BOUNDARY forces us to turn off I/O MMU bypassing. It is said that
    - * BIO-level virtual merging can give up to 4% performance boost (not verified for ia64).
    - * On the other hand, we know that I/O MMU bypassing gives ~8% performance improvement on
    - * SPECweb-like workloads on zx1-based machines. Thus, for now we favor I/O MMU bypassing
    - * over BIO-level virtual merging.
    - */
    extern unsigned long ia64_max_iommu_merge_mask;
    -#if 1
    -#define BIO_VMERGE_BOUNDARY 0
    -#else
    -/*
    - * It makes no sense at all to have this BIO_VMERGE_BOUNDARY macro here. Should be
    - * replaced by dma_merge_mask() or something of that sort. Note: the only way
    - * BIO_VMERGE_BOUNDARY is used is to mask off bits. Effectively, our definition gets
    - * expanded into:
    - *
    - * addr & ((ia64_max_iommu_merge_mask + 1) - 1) == (addr & ia64_max_iommu_vmerge_mask)
    - *
    - * which is precisely what we want.
    - */
    -#define BIO_VMERGE_BOUNDARY (ia64_max_iommu_merge_mask + 1)
    -#endif
    +
    +# endif /* __KERNEL__ */

    #endif /* _ASM_IA64_IO_H */
    Index: linux-2.6.26-fast/include/asm-parisc/io.h
    ================================================== =================
    --- linux-2.6.26-fast.orig/include/asm-parisc/io.h 2008-07-15 14:14:52.000000000 +0200
    +++ linux-2.6.26-fast/include/asm-parisc/io.h 2008-07-15 14:18:58.000000000 +0200
    @@ -4,12 +4,6 @@
    #include
    #include

    -extern unsigned long parisc_vmerge_boundary;
    -extern unsigned long parisc_vmerge_max_size;
    -
    -#define BIO_VMERGE_BOUNDARY parisc_vmerge_boundary
    -#define BIO_VMERGE_MAX_SIZE parisc_vmerge_max_size
    -
    #define virt_to_phys(a) ((unsigned long)__pa(a))
    #define phys_to_virt(a) __va(a)
    #define virt_to_bus virt_to_phys
    Index: linux-2.6.26-fast/include/asm-powerpc/io.h
    ================================================== =================
    --- linux-2.6.26-fast.orig/include/asm-powerpc/io.h 2008-07-15 14:14:56.000000000 +0200
    +++ linux-2.6.26-fast/include/asm-powerpc/io.h 2008-07-15 14:19:37.000000000 +0200
    @@ -683,13 +683,6 @@ static inline void * phys_to_virt(unsign
    */
    #define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)

    -/* We do NOT want virtual merging, it would put too much pressure on
    - * our iommu allocator. Instead, we want drivers to be smart enough
    - * to coalesce sglists that happen to have been mapped in a contiguous
    - * way by the iommu
    - */
    -#define BIO_VMERGE_BOUNDARY 0
    -
    /*
    * 32 bits still uses virt_to_bus() for it's implementation of DMA
    * mappings se we have to keep it defined here. We also have some old
    Index: linux-2.6.26-fast/include/asm-sparc64/io.h
    ================================================== =================
    --- linux-2.6.26-fast.orig/include/asm-sparc64/io.h 2008-07-15 14:15:04.000000000 +0200
    +++ linux-2.6.26-fast/include/asm-sparc64/io.h 2008-07-15 14:19:52.000000000 +0200
    @@ -16,7 +16,6 @@
    /* BIO layer definitions. */
    extern unsigned long kern_base, kern_size;
    #define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
    -#define BIO_VMERGE_BOUNDARY 8192

    static inline u8 _inb(unsigned long addr)
    {
    Index: linux-2.6.26-fast/include/asm-x86/io_64.h
    ================================================== =================
    --- linux-2.6.26-fast.orig/include/asm-x86/io_64.h 2008-07-15 14:15:08.000000000 +0200
    +++ linux-2.6.26-fast/include/asm-x86/io_64.h 2008-07-15 14:20:11.000000000 +0200
    @@ -304,9 +304,6 @@ void memset_io(volatile void __iomem *a,

    #define flush_write_buffers()

    -extern int iommu_bio_merge;
    -#define BIO_VMERGE_BOUNDARY iommu_bio_merge
    -
    /*
    * Convert a virtual cached pointer to an uncached pointer
    */
    Index: linux-2.6.26-fast/include/linux/bio.h
    ================================================== =================
    --- linux-2.6.26-fast.orig/include/linux/bio.h 2008-07-15 14:15:13.000000000 +0200
    +++ linux-2.6.26-fast/include/linux/bio.h 2008-07-15 14:44:58.000000000 +0200
    @@ -26,21 +26,8 @@

    #ifdef CONFIG_BLOCK

    -/* Platforms may set this to teach the BIO layer about IOMMU hardware. */
    #include

    -#if defined(BIO_VMERGE_MAX_SIZE) && defined(BIO_VMERGE_BOUNDARY)
    -#define BIOVEC_VIRT_START_SIZE(x) (bvec_to_phys(x) & (BIO_VMERGE_BOUNDARY - 1))
    -#define BIOVEC_VIRT_OVERSIZE(x) ((x) > BIO_VMERGE_MAX_SIZE)
    -#else
    -#define BIOVEC_VIRT_START_SIZE(x) 0
    -#define BIOVEC_VIRT_OVERSIZE(x) 0
    -#endif
    -
    -#ifndef BIO_VMERGE_BOUNDARY
    -#define BIO_VMERGE_BOUNDARY 0
    -#endif
    -
    #define BIO_DEBUG

    #ifdef BIO_DEBUG
    @@ -235,8 +222,6 @@ static inline void *bio_data(struct bio
    ((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))
    #endif

    -#define BIOVEC_VIRT_MERGEABLE(vec1, vec2) \
    - ((((bvec_to_phys((vec1)) + (vec1)->bv_len) | bvec_to_phys((vec2))) & (BIO_VMERGE_BOUNDARY - 1)) == 0)
    #define __BIO_SEG_BOUNDARY(addr1, addr2, mask) \
    (((addr1) | (mask)) == (((addr2) - 1) | (mask)))
    #define BIOVEC_SEG_BOUNDARY(q, b1, b2) \
    Index: linux-2.6.26-fast/fs/bio.c
    ================================================== =================
    --- linux-2.6.26-fast.orig/fs/bio.c 2008-07-15 14:43:03.000000000 +0200
    +++ linux-2.6.26-fast/fs/bio.c 2008-07-15 14:43:58.000000000 +0200
    @@ -352,8 +352,7 @@ static int __bio_add_page(struct request
    */

    while (bio->bi_phys_segments >= q->max_phys_segments
    - || bio->bi_hw_segments >= q->max_hw_segments
    - || BIOVEC_VIRT_OVERSIZE(bio->bi_size)) {
    + || bio->bi_hw_segments >= q->max_hw_segments) {

    if (retried_segments)
    return 0;
    @@ -390,8 +389,7 @@ static int __bio_add_page(struct request
    }

    /* If we may be able to merge these biovecs, force a recount */
    - if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec) ||
    - BIOVEC_VIRT_MERGEABLE(bvec-1, bvec)))
    + if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
    bio->bi_flags &= ~(1 << BIO_SEG_VALID);

    bio->bi_vcnt++;
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 15 Jul 2008 09:37:05 -0400 (EDT)
    Mikulas Patocka wrote:

    > On Tue, 15 Jul 2008, FUJITA Tomonori wrote:
    >
    > > blk_recalc_rq_segments assumes that any segments can be merged in the
    > > case of BIOVEC_VIRT_MERGEABLE && !BIOVEC_VIRT_OVERSIZE. However, an
    > > IOMMU can't merge segments if the total length of the segments is
    > > larger than max_segment_size (the LLD restriction).
    > >
    > > Due to this bug, a LLD may get the larger number of segments than
    > > nr_hw_segments because the block layer puts more segments in a request
    > > than it should do.
    > >
    > > This bug could happen on alpha, parisc, and sparc, which use VMERGE.

    >
    > Parisc doesn't use virtual merge accounting (there is variable for it but
    > it's always 0).


    Hmm, really? Looks like PARISC IOMMUs (ccio-dma.c and sba_iomm.c) set
    parisc_vmerge_boundary (CC'ed PARISC mailing list).


    > On sparc64 it is broken anyway with or without your patch.


    Yeah, we need to modify SPARC64 IOMMU code (I'm not sure that it's
    worth). Right now, the best fix is setting BIO_VMERGE_BOUNDARY to 0.


    > And alpha alone doesn't justify substantial code bloat in generic block
    > layer. So I propose this patch to drop it at all.


    Jens, what do you think about removing VMERGE code?
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    >>> This bug could happen on alpha, parisc, and sparc, which use VMERGE.
    >>
    >> Parisc doesn't use virtual merge accounting (there is variable for it but
    >> it's always 0).

    >
    > Hmm, really? Looks like PARISC IOMMUs (ccio-dma.c and sba_iomm.c) set
    > parisc_vmerge_boundary (CC'ed PARISC mailing list).


    That's right, I looked only at arch and include.

    >> On sparc64 it is broken anyway with or without your patch.

    >
    > Yeah, we need to modify SPARC64 IOMMU code (I'm not sure that it's
    > worth). Right now, the best fix is setting BIO_VMERGE_BOUNDARY to 0.


    Even if we fix it now, the question is: how long it will stay fixed? Until
    someone makes another change to struct device that restricts boundaries on
    some wacky hardware.

    Mikulas

    >> And alpha alone doesn't justify substantial code bloat in generic block
    >> layer. So I propose this patch to drop it at all.

    >
    > Jens, what do you think about removing VMERGE code?
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 2008-07-15 at 23:20 +0900, FUJITA Tomonori wrote:
    > On Tue, 15 Jul 2008 09:37:05 -0400 (EDT)
    > Mikulas Patocka wrote:
    >
    > > On Tue, 15 Jul 2008, FUJITA Tomonori wrote:
    > >
    > > > blk_recalc_rq_segments assumes that any segments can be merged in the
    > > > case of BIOVEC_VIRT_MERGEABLE && !BIOVEC_VIRT_OVERSIZE. However, an
    > > > IOMMU can't merge segments if the total length of the segments is
    > > > larger than max_segment_size (the LLD restriction).
    > > >
    > > > Due to this bug, a LLD may get the larger number of segments than
    > > > nr_hw_segments because the block layer puts more segments in a request
    > > > than it should do.
    > > >
    > > > This bug could happen on alpha, parisc, and sparc, which use VMERGE.

    > >
    > > Parisc doesn't use virtual merge accounting (there is variable for it but
    > > it's always 0).

    >
    > Hmm, really? Looks like PARISC IOMMUs (ccio-dma.c and sba_iomm.c) set
    > parisc_vmerge_boundary (CC'ed PARISC mailing list).


    That's correct. The size and boundary depend on the type of IOMMU (ccio
    or sba) so the vmerge boundary parameters are set up in the iommu driver
    code.

    > > On sparc64 it is broken anyway with or without your patch.

    >
    > Yeah, we need to modify SPARC64 IOMMU code (I'm not sure that it's
    > worth). Right now, the best fix is setting BIO_VMERGE_BOUNDARY to 0.
    >
    >
    > > And alpha alone doesn't justify substantial code bloat in generic block
    > > layer. So I propose this patch to drop it at all.

    >
    > Jens, what do you think about removing VMERGE code?


    Actually, it's code I did.

    There are plusses and minusses to all of this. The original vmerge code
    was done for sparc ... mainly because the benefits of virtual merging
    can offset the cost of having to use the iommu. However, most
    architectures didn't use it. When I fixed it up to work for parisc (and
    introduced the parameters) we were trying to demonstrate that using it
    was feasible.

    The idea behind vmerging is that assembling and programming sg lists is
    expensive, so you want to do it once. Either in the iommu or in the
    driver sg list, but not in both. There is evidence that it saves around
    7% or so on drivers. However, for architectures that can do it, better
    savings are made simply by lifting the iommu out of the I/O path (so
    called bypass mode).

    I suspect with IOMMUs coming back (and being unable to be bypassed) with
    virtualisation, virtual merging might once more become a significant
    value.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    >>> On sparc64 it is broken anyway with or without your patch.
    >>
    >> Yeah, we need to modify SPARC64 IOMMU code (I'm not sure that it's
    >> worth). Right now, the best fix is setting BIO_VMERGE_BOUNDARY to 0.
    >>
    >>
    >>> And alpha alone doesn't justify substantial code bloat in generic block
    >>> layer. So I propose this patch to drop it at all.

    >>
    >> Jens, what do you think about removing VMERGE code?

    >
    > Actually, it's code I did.
    >
    > There are plusses and minusses to all of this. The original vmerge code
    > was done for sparc ... mainly because the benefits of virtual merging
    > can offset the cost of having to use the iommu. However, most
    > architectures didn't use it. When I fixed it up to work for parisc (and
    > introduced the parameters) we were trying to demonstrate that using it
    > was feasible.
    >
    > The idea behind vmerging is that assembling and programming sg lists is
    > expensive, so you want to do it once. Either in the iommu or in the
    > driver sg list, but not in both. There is evidence that it saves around
    > 7% or so on drivers. However, for architectures that can do it, better
    > savings are made simply by lifting the iommu out of the I/O path (so
    > called bypass mode).


    The problem is with vmerge accounting in block layer (that is what I'm
    proposing to remove), not with vmerge itself.

    Vmerge accounting has advantages only if you have device with small amount
    of sg slots --- it allows the block layer to create request that has
    higher number of segments then the device.

    If you have device with for example 1024 slots, the virtual merge
    accounting has no effect, because the any request will fit into that size.
    Even without virtual merge accounting, the virtual merging will happen, so
    there will be no performance penalty for the controller --- the controller
    will be programmed with exactly the same number of segments as if virtual
    merge accounting was present. (there could be even slight positive
    performance effect if you remove accounting, because you burn less CPU
    cycles per request)

    If you have device will small number of sg slots (16 or so), vmerge
    accounting can improve performance by creating requests with more than 16
    segments --- the question is: is there any such device? And is the device
    performance-sensitive? (i.e. isn't it such an old hardware where no one
    cares about performance anyway?)

    > I suspect with IOMMUs coming back (and being unable to be bypassed) with
    > virtualisation, virtual merging might once more become a significant
    > value.


    I suppose that no one would manufacture new SCSI card with 16 or 32 sg
    slots these days, so the accounting of hardware segments has no effect on
    modern hardware.

    Mikulas

    > James
    >
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 15 Jul 2008 10:37:58 -0400 (EDT)
    Mikulas Patocka wrote:

    > >>> This bug could happen on alpha, parisc, and sparc, which use VMERGE.
    > >>
    > >> Parisc doesn't use virtual merge accounting (there is variable for it but
    > >> it's always 0).

    > >
    > > Hmm, really? Looks like PARISC IOMMUs (ccio-dma.c and sba_iomm.c) set
    > > parisc_vmerge_boundary (CC'ed PARISC mailing list).

    >
    > That's right, I looked only at arch and include.
    >
    > >> On sparc64 it is broken anyway with or without your patch.

    > >
    > > Yeah, we need to modify SPARC64 IOMMU code (I'm not sure that it's
    > > worth). Right now, the best fix is setting BIO_VMERGE_BOUNDARY to 0.

    >
    > Even if we fix it now, the question is: how long it will stay fixed? Until
    > someone makes another change to struct device that restricts boundaries on
    > some wacky hardware.


    I'm not sure how the boundary restriction of a device can break
    the VMERGE accounting.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    >> Even if we fix it now, the question is: how long it will stay fixed? Until
    >> someone makes another change to struct device that restricts boundaries on
    >> some wacky hardware.

    >
    > I'm not sure how the boundary restriction of a device can break
    > the VMERGE accounting.


    Because block layer code doesn't know anything about the device, pci
    access restrictions and so on.

    Someone already broken DaveM's Sparc64 merging by adding boundaries (it
    was broken even before, but these boundary checks made it worse).

    Mikulas
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 2008-07-15 at 11:24 -0400, Mikulas Patocka wrote:
    > >>> On sparc64 it is broken anyway with or without your patch.
    > >>
    > >> Yeah, we need to modify SPARC64 IOMMU code (I'm not sure that it's
    > >> worth). Right now, the best fix is setting BIO_VMERGE_BOUNDARY to 0.
    > >>
    > >>
    > >>> And alpha alone doesn't justify substantial code bloat in generic block
    > >>> layer. So I propose this patch to drop it at all.
    > >>
    > >> Jens, what do you think about removing VMERGE code?

    > >
    > > Actually, it's code I did.
    > >
    > > There are plusses and minusses to all of this. The original vmerge code
    > > was done for sparc ... mainly because the benefits of virtual merging
    > > can offset the cost of having to use the iommu. However, most
    > > architectures didn't use it. When I fixed it up to work for parisc (and
    > > introduced the parameters) we were trying to demonstrate that using it
    > > was feasible.
    > >
    > > The idea behind vmerging is that assembling and programming sg lists is
    > > expensive, so you want to do it once. Either in the iommu or in the
    > > driver sg list, but not in both. There is evidence that it saves around
    > > 7% or so on drivers. However, for architectures that can do it, better
    > > savings are made simply by lifting the iommu out of the I/O path (so
    > > called bypass mode).

    >
    > The problem is with vmerge accounting in block layer (that is what I'm
    > proposing to remove), not with vmerge itself.


    I don't think that's true ... otherwise parisc would be falling over
    left right and centre.

    > Vmerge accounting has advantages only if you have device with small amount
    > of sg slots --- it allows the block layer to create request that has
    > higher number of segments then the device.


    This isn't really true either. A lot of devices with a high sg slot
    count are still less efficient than an iommu for programming.

    Even if they're not, on parisc we have to program the iommu, we can't
    bypass, so it still makes sense to only have one large sg list (in the
    iommu) and one small one (in the device). Having two large ones reduces
    our I/O throughput because of the extra overhead.

    > If you have device with for example 1024 slots, the virtual merge
    > accounting has no effect, because the any request will fit into that size.


    It's not about fitting a request, it's about efficient processing.

    > Even without virtual merge accounting, the virtual merging will happen, so
    > there will be no performance penalty for the controller --- the controller
    > will be programmed with exactly the same number of segments as if virtual
    > merge accounting was present. (there could be even slight positive
    > performance effect if you remove accounting, because you burn less CPU
    > cycles per request)


    Yes there is. Both the iommu and the device have to traverse large SG
    lists. This is where the inefficiency lies. On PA, we use exactly the
    same number of iotlb slots whether virtual merging is in effect or not,
    but the device has an internal loop to go over the list. It's that loop
    that virtual merging reduces.

    Since the virtual merge computation is in line when the request is built
    (by design) it doesn't really detract from the throughput and the cost
    is pretty small.

    > If you have device will small number of sg slots (16 or so), vmerge
    > accounting can improve performance by creating requests with more than 16
    > segments --- the question is: is there any such device? And is the device
    > performance-sensitive? (i.e. isn't it such an old hardware where no one
    > cares about performance anyway?)
    >
    > > I suspect with IOMMUs coming back (and being unable to be bypassed) with
    > > virtualisation, virtual merging might once more become a significant
    > > value.

    >
    > I suppose that no one would manufacture new SCSI card with 16 or 32 sg
    > slots these days, so the accounting of hardware segments has no effect on
    > modern hardware.


    It's not about accounting, it's about performance. There's a cost in
    every device to traversing large count sg lists. If you have to bear it
    in the iommu (which is usually more efficient because the iotlb tends to
    follow mmtlb optimisations) you can reduce the cost by eliminating it
    from the device.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    You are mixing two ideas here:

    (1) virtual merging --- IOMMU maps discontinuous segments into continuous
    area that it presents to the device.

    (2) virtual merge accounting --- block layer tries to guess how many
    segments will be created by (1) and merges small requests into big ones.
    The resulting requests are as big that they can't be processed by the
    device if (1) weren't in effect.

    >> The problem is with vmerge accounting in block layer (that is what I'm
    >> proposing to remove), not with vmerge itself.

    >
    > I don't think that's true ... otherwise parisc would be falling over
    > left right and centre.
    >
    >> Vmerge accounting has advantages only if you have device with small amount
    >> of sg slots --- it allows the block layer to create request that has
    >> higher number of segments then the device.

    >
    > This isn't really true either. A lot of devices with a high sg slot
    > count are still less efficient than an iommu for programming.


    --- for these devices virtual merging (1) improves performance, but
    virtual merge accounting (2) doesn't.

    > Even if they're not, on parisc we have to program the iommu, we can't
    > bypass, so it still makes sense to only have one large sg list (in the
    > iommu) and one small one (in the device). Having two large ones reduces
    > our I/O throughput because of the extra overhead.
    >
    >> If you have device with for example 1024 slots, the virtual merge
    >> accounting has no effect, because the any request will fit into that size.

    >
    > It's not about fitting a request, it's about efficient processing.


    Virtual merge accounting (2) is about fitting a request. It is block layer
    technique.

    >> Even without virtual merge accounting, the virtual merging will happen, so
    >> there will be no performance penalty for the controller --- the controller
    >> will be programmed with exactly the same number of segments as if virtual
    >> merge accounting was present. (there could be even slight positive
    >> performance effect if you remove accounting, because you burn less CPU
    >> cycles per request)

    >
    > Yes there is. Both the iommu and the device have to traverse large SG
    > lists. This is where the inefficiency lies. On PA, we use exactly the
    > same number of iotlb slots whether virtual merging is in effect or not,
    > but the device has an internal loop to go over the list. It's that loop
    > that virtual merging reduces.
    >
    > Since the virtual merge computation is in line when the request is built
    > (by design) it doesn't really detract from the throughput and the cost
    > is pretty small.


    The purpose of (1) virtual merging is to save device's sg slots. The
    purpose of (2) virtual merge accounting is to allow block layer to build
    larger requests. If you remove virtual merge accounting, it will cause no
    increase in number of sg slots used.

    >>> I suspect with IOMMUs coming back (and being unable to be bypassed) with
    >>> virtualisation, virtual merging might once more become a significant
    >>> value.

    >>
    >> I suppose that no one would manufacture new SCSI card with 16 or 32 sg
    >> slots these days, so the accounting of hardware segments has no effect on
    >> modern hardware.

    >
    > It's not about accounting, it's about performance. There's a cost in
    > every device to traversing large count sg lists. If you have to bear it
    > in the iommu (which is usually more efficient because the iotlb tends to
    > follow mmtlb optimisations) you can reduce the cost by eliminating it
    > from the device.


    That's why I'm proposing to remove virtual merge accounting (2), but leave
    virtual merging (1) itself. The accounting doesn't reduce number of sg
    slots.

    Mikulas

    > James

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 2008-07-15 at 11:58 -0400, Mikulas Patocka wrote:
    > You are mixing two ideas here:
    >
    > (1) virtual merging --- IOMMU maps discontinuous segments into continuous
    > area that it presents to the device.
    >
    > (2) virtual merge accounting --- block layer tries to guess how many
    > segments will be created by (1) and merges small requests into big ones.
    > The resulting requests are as big that they can't be processed by the
    > device if (1) weren't in effect.


    No ... I'm not ... the virtual merge implementation requires the block
    layer to get this accounting right, otherwise the iommu code can end up
    doing the wrong thing.

    You're proposing to eliminate the difference between max_phys_segments
    and max_hw_segments without actually removing them.

    > >> The problem is with vmerge accounting in block layer (that is what I'm
    > >> proposing to remove), not with vmerge itself.

    > >
    > > I don't think that's true ... otherwise parisc would be falling over
    > > left right and centre.
    > >
    > >> Vmerge accounting has advantages only if you have device with small amount
    > >> of sg slots --- it allows the block layer to create request that has
    > >> higher number of segments then the device.

    > >
    > > This isn't really true either. A lot of devices with a high sg slot
    > > count are still less efficient than an iommu for programming.

    >
    > --- for these devices virtual merging (1) improves performance, but
    > virtual merge accounting (2) doesn't.
    >
    > > Even if they're not, on parisc we have to program the iommu, we can't
    > > bypass, so it still makes sense to only have one large sg list (in the
    > > iommu) and one small one (in the device). Having two large ones reduces
    > > our I/O throughput because of the extra overhead.
    > >
    > >> If you have device with for example 1024 slots, the virtual merge
    > >> accounting has no effect, because the any request will fit into that size.

    > >
    > > It's not about fitting a request, it's about efficient processing.

    >
    > Virtual merge accounting (2) is about fitting a request. It is block layer
    > technique.
    >
    > >> Even without virtual merge accounting, the virtual merging will happen, so
    > >> there will be no performance penalty for the controller --- the controller
    > >> will be programmed with exactly the same number of segments as if virtual
    > >> merge accounting was present. (there could be even slight positive
    > >> performance effect if you remove accounting, because you burn less CPU
    > >> cycles per request)

    > >
    > > Yes there is. Both the iommu and the device have to traverse large SG
    > > lists. This is where the inefficiency lies. On PA, we use exactly the
    > > same number of iotlb slots whether virtual merging is in effect or not,
    > > but the device has an internal loop to go over the list. It's that loop
    > > that virtual merging reduces.
    > >
    > > Since the virtual merge computation is in line when the request is built
    > > (by design) it doesn't really detract from the throughput and the cost
    > > is pretty small.

    >
    > The purpose of (1) virtual merging is to save device's sg slots. The
    > purpose of (2) virtual merge accounting is to allow block layer to build
    > larger requests. If you remove virtual merge accounting, it will cause no
    > increase in number of sg slots used.
    >
    > >>> I suspect with IOMMUs coming back (and being unable to be bypassed) with
    > >>> virtualisation, virtual merging might once more become a significant
    > >>> value.
    > >>
    > >> I suppose that no one would manufacture new SCSI card with 16 or 32 sg
    > >> slots these days, so the accounting of hardware segments has no effect on
    > >> modern hardware.

    > >
    > > It's not about accounting, it's about performance. There's a cost in
    > > every device to traversing large count sg lists. If you have to bear it
    > > in the iommu (which is usually more efficient because the iotlb tends to
    > > follow mmtlb optimisations) you can reduce the cost by eliminating it
    > > from the device.

    >
    > That's why I'm proposing to remove virtual merge accounting (2), but leave
    > virtual merging (1) itself. The accounting doesn't reduce number of sg
    > slots.


    Yes, but it's gains very little ... architectures that don't want it can
    already turn it off, and it's useful for those, like parisc, who do.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 15 Jul 2008, James Bottomley wrote:

    > On Tue, 2008-07-15 at 11:58 -0400, Mikulas Patocka wrote:
    >> You are mixing two ideas here:
    >>
    >> (1) virtual merging --- IOMMU maps discontinuous segments into continuous
    >> area that it presents to the device.
    >>
    >> (2) virtual merge accounting --- block layer tries to guess how many
    >> segments will be created by (1) and merges small requests into big ones.
    >> The resulting requests are as big that they can't be processed by the
    >> device if (1) weren't in effect.

    >
    > No ... I'm not ... the virtual merge implementation requires the block
    > layer to get this accounting right, otherwise the iommu code can end up
    > doing the wrong thing.


    The virtual merge (1) can work even without accounting (2). IOMMU can
    always create less sg entries then the block layer expects.

    > You're proposing to eliminate the difference between max_phys_segments
    > and max_hw_segments without actually removing them.


    Yes. Only for alpha and pa-risc, there is difference between these values.
    And both of these architectures are being discontinued.

    >> That's why I'm proposing to remove virtual merge accounting (2), but leave
    >> virtual merging (1) itself. The accounting doesn't reduce number of sg
    >> slots.

    >
    > Yes, but it's gains very little ... architectures that don't want it can
    > already turn it off, and it's useful for those, like parisc, who do.
    >
    > James


    It increases maintainability of the code, reduces bloat and bugs.

    Mikulas
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 2008-07-15 at 12:20 -0400, Mikulas Patocka wrote:
    > On Tue, 15 Jul 2008, James Bottomley wrote:
    >
    > > On Tue, 2008-07-15 at 11:58 -0400, Mikulas Patocka wrote:
    > >> You are mixing two ideas here:
    > >>
    > >> (1) virtual merging --- IOMMU maps discontinuous segments into continuous
    > >> area that it presents to the device.
    > >>
    > >> (2) virtual merge accounting --- block layer tries to guess how many
    > >> segments will be created by (1) and merges small requests into big ones.
    > >> The resulting requests are as big that they can't be processed by the
    > >> device if (1) weren't in effect.

    > >
    > > No ... I'm not ... the virtual merge implementation requires the block
    > > layer to get this accounting right, otherwise the iommu code can end up
    > > doing the wrong thing.

    >
    > The virtual merge (1) can work even without accounting (2). IOMMU can
    > always create less sg entries then the block layer expects.


    It can, but it's not optimal ... and depends on max_phys_segments ==
    max_hw_segments.

    > > You're proposing to eliminate the difference between max_phys_segments
    > > and max_hw_segments without actually removing them.

    >
    > Yes. Only for alpha and pa-risc, there is difference between these values.
    > And both of these architectures are being discontinued.
    >
    > >> That's why I'm proposing to remove virtual merge accounting (2), but leave
    > >> virtual merging (1) itself. The accounting doesn't reduce number of sg
    > >> slots.

    > >
    > > Yes, but it's gains very little ... architectures that don't want it can
    > > already turn it off, and it's useful for those, like parisc, who do.

    >
    > It increases maintainability of the code, reduces bloat and bugs.


    That's not really a good reason. You can eliminate code because it's
    unused and unikely to be used or you redo it to better or fix it to be
    less buggy. You don't simply eliminate useful functionality that
    currently has in-tree users, however marginal you might opine those
    users to be.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    > > > Yes, but it's gains very little ... architectures that don't want it can
    > > > already turn it off, and it's useful for those, like parisc, who do.

    > >
    > > It increases maintainability of the code, reduces bloat and bugs.

    >
    > That's not really a good reason. You can eliminate code because it's
    > unused and unikely to be used or you redo it to better or fix it to be
    > less buggy. You don't simply eliminate useful functionality that
    > currently has in-tree users, however marginal you might opine those
    > users to be.
    >
    > James


    So show a specific device where the virtual merge accounting is useful.

    (1) The device that is often used in alpha or pa-risc environments ---
    because the accounting is not used on other archs.

    (2) The device that is performance-sensitive --- not something outdated or
    unusual.

    (3) And the device that has limited sg-list size, so that generic I/O
    requests made by the kernel hit this limit. (if the sg-list is so big that
    nr_phys_segments of most requests fits into it, you don't need to count
    nr_hw_segments --- because nr_hw_segments < nr_phys_segments and
    nr_phys_segments already fits).

    [ the device that traverses its sg-list slowly doesn't fall into category
    (3), beacuse virtual merging would happen with or without nr_hw_segments
    accounting ]

    Mikulas
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Tue, 15 Jul 2008 11:46:46 -0400 (EDT)
    Mikulas Patocka wrote:

    > >> Even if we fix it now, the question is: how long it will stay fixed? Until
    > >> someone makes another change to struct device that restricts boundaries on
    > >> some wacky hardware.

    > >
    > > I'm not sure how the boundary restriction of a device can break
    > > the VMERGE accounting.

    >
    > Because block layer code doesn't know anything about the device, pci
    > access restrictions and so on.


    Not true, the block layer knows about the device restrictions like DMA
    boundary.

    But it's not the point here because the boundary restriction doesn't
    matter for the VMERGE accounting. An IOMMU just returns an error if it
    can't allocate an I/O space fit for the device restrictions.


    Please give me an example how the boundary restriction of a device can
    break the VMERGE accounting and an IOMMU if you aren't still sure.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Wed, 16 Jul 2008, FUJITA Tomonori wrote:

    > On Tue, 15 Jul 2008 11:46:46 -0400 (EDT)
    > Mikulas Patocka wrote:
    >
    > > >> Even if we fix it now, the question is: how long it will stay fixed? Until
    > > >> someone makes another change to struct device that restricts boundaries on
    > > >> some wacky hardware.
    > > >
    > > > I'm not sure how the boundary restriction of a device can break
    > > > the VMERGE accounting.

    > >
    > > Because block layer code doesn't know anything about the device, pci
    > > access restrictions and so on.

    >
    > Not true, the block layer knows about the device restrictions like DMA
    > boundary.
    >
    > But it's not the point here because the boundary restriction doesn't
    > matter for the VMERGE accounting. An IOMMU just returns an error if it
    > can't allocate an I/O space fit for the device restrictions.
    >
    >
    > Please give me an example how the boundary restriction of a device can
    > break the VMERGE accounting and an IOMMU if you aren't still sure.


    You have dma_get_seg_boundary and dma_get_max_seg_size. On sparc64, adding
    one of these broken VMERGE accounting (the VMERGE didn't happen past 64-kb
    boundary and bio layer thought that VMERGE would be possible).

    And if you fix this case, someone will break it again, sooner or later, by
    adding new restriction.

    Mikulas
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Wed, 16 Jul 2008 14:02:27 -0400 (EDT)
    Mikulas Patocka wrote:

    > On Wed, 16 Jul 2008, FUJITA Tomonori wrote:
    >
    > > On Tue, 15 Jul 2008 11:46:46 -0400 (EDT)
    > > Mikulas Patocka wrote:
    > >
    > > > >> Even if we fix it now, the question is: how long it will stay fixed? Until
    > > > >> someone makes another change to struct device that restricts boundaries on
    > > > >> some wacky hardware.
    > > > >
    > > > > I'm not sure how the boundary restriction of a device can break
    > > > > the VMERGE accounting.
    > > >
    > > > Because block layer code doesn't know anything about the device, pci
    > > > access restrictions and so on.

    > >
    > > Not true, the block layer knows about the device restrictions like DMA
    > > boundary.
    > >
    > > But it's not the point here because the boundary restriction doesn't
    > > matter for the VMERGE accounting. An IOMMU just returns an error if it
    > > can't allocate an I/O space fit for the device restrictions.
    > >
    > >
    > > Please give me an example how the boundary restriction of a device can
    > > break the VMERGE accounting and an IOMMU if you aren't still sure.

    >
    > You have dma_get_seg_boundary and dma_get_max_seg_size. On sparc64, adding
    > one of these broken VMERGE accounting (the VMERGE didn't happen past 64-kb
    > boundary and bio layer thought that VMERGE would be possible).


    If the device has 64KB boundary restriction, the device also has
    max_seg_size restriction of 64KB or under. So the vmerge acounting
    works (though we need to fix it to handle max_seg_size, as discussed).


    > And if you fix this case, someone will break it again, sooner or later, by
    > adding new restriction.


    What is your new restriction?

    All restrictions that IOMMUs need to know are dma_get_seg_boundary and
    dma_get_max_seg_size.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    > > > Please give me an example how the boundary restriction of a device can
    > > > break the VMERGE accounting and an IOMMU if you aren't still sure.

    > >
    > > You have dma_get_seg_boundary and dma_get_max_seg_size. On sparc64, adding
    > > one of these broken VMERGE accounting (the VMERGE didn't happen past 64-kb
    > > boundary and bio layer thought that VMERGE would be possible).

    >
    > If the device has 64KB boundary restriction, the device also has
    > max_seg_size restriction of 64KB or under. So the vmerge acounting
    > works (though we need to fix it to handle max_seg_size, as discussed).
    >
    > > And if you fix this case, someone will break it again, sooner or later, by
    > > adding new restriction.

    >
    > All restrictions that IOMMUs need to know are dma_get_seg_boundary and
    > dma_get_max_seg_size.
    >
    > What is your new restriction?


    We don't know what happens in the future. And that is the problem that we
    don't know --- but we have two pieces of code (blk-merge and iommu) that
    try to calculate the same number (number of hw segments) and if they get
    different result, it will crash. If the calculations were done at one
    place, there would be no problem with that.

    Mikulas
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    FUJITA Tomonori wrote:
    > On Thu, 17 Jul 2008 07:50:24 -0400 (EDT)
    > Mikulas Patocka wrote:
    >
    >>>>> Please give me an example how the boundary restriction of a device can
    >>>>> break the VMERGE accounting and an IOMMU if you aren't still sure.
    >>>> You have dma_get_seg_boundary and dma_get_max_seg_size. On sparc64, adding
    >>>> one of these broken VMERGE accounting (the VMERGE didn't happen past 64-kb
    >>>> boundary and bio layer thought that VMERGE would be possible).
    >>> If the device has 64KB boundary restriction, the device also has
    >>> max_seg_size restriction of 64KB or under. So the vmerge acounting
    >>> works (though we need to fix it to handle max_seg_size, as discussed).
    >>>
    >>>> And if you fix this case, someone will break it again, sooner or later, by
    >>>> adding new restriction.
    >>> All restrictions that IOMMUs need to know are dma_get_seg_boundary and
    >>> dma_get_max_seg_size.
    >>>
    >>> What is your new restriction?

    >> We don't know what happens in the future.

    >
    > It's very unlikely to add new restrictions.
    >
    >
    >> And that is the problem that we
    >> don't know --- but we have two pieces of code (blk-merge and iommu) that
    >> try to calculate the same number (number of hw segments) and if they get
    >> different result, it will crash. If the calculations were done at one
    >> place, there would be no problem with that.

    >
    > I don't think that your argument, 'the problem that we don't know', is
    > true.
    >
    > With the vmerge accounting, we calculate at two places. So if we add
    > a new restriction, we need to handle it at two places. It's a logical
    > result.
    >
    > Of course, it's easier to calculate at one place rather than two
    > places. But 'we don't know what restriction we will need' isn't a
    > problem.
    >
    >
    > BTW, as I've already said, I'm not against removing the vmerge
    > accounting from the block layer.


    I have a question. Does the block layer know of the IOMMU in use
    for the device? can it call into the IOMMU to calculate the
    restriction?

    Thanks Boaz

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

    On Thu, 17 Jul 2008 07:50:24 -0400 (EDT)
    Mikulas Patocka wrote:

    > > > > Please give me an example how the boundary restriction of a device can
    > > > > break the VMERGE accounting and an IOMMU if you aren't still sure.
    > > >
    > > > You have dma_get_seg_boundary and dma_get_max_seg_size. On sparc64, adding
    > > > one of these broken VMERGE accounting (the VMERGE didn't happen past 64-kb
    > > > boundary and bio layer thought that VMERGE would be possible).

    > >
    > > If the device has 64KB boundary restriction, the device also has
    > > max_seg_size restriction of 64KB or under. So the vmerge acounting
    > > works (though we need to fix it to handle max_seg_size, as discussed).
    > >
    > > > And if you fix this case, someone will break it again, sooner or later, by
    > > > adding new restriction.

    > >
    > > All restrictions that IOMMUs need to know are dma_get_seg_boundary and
    > > dma_get_max_seg_size.
    > >
    > > What is your new restriction?

    >
    > We don't know what happens in the future.


    It's very unlikely to add new restrictions.


    > And that is the problem that we
    > don't know --- but we have two pieces of code (blk-merge and iommu) that
    > try to calculate the same number (number of hw segments) and if they get
    > different result, it will crash. If the calculations were done at one
    > place, there would be no problem with that.


    I don't think that your argument, 'the problem that we don't know', is
    true.

    With the vmerge accounting, we calculate at two places. So if we add
    a new restriction, we need to handle it at two places. It's a logical
    result.

    Of course, it's easier to calculate at one place rather than two
    places. But 'we don't know what restriction we will need' isn't a
    problem.


    BTW, as I've already said, I'm not against removing the vmerge
    accounting from the block layer.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 2 1 2 LastLast