Slow file transfer speeds with CFQ IO scheduler in some cases - Kernel


Thread: Slow file transfer speeds with CFQ IO scheduler in some cases

  1. Re: Slow file transfer speeds with CFQ IO scheduler in some cases

    Jens Axboe writes:

    > OK, that looks better. Can I talk you into just trying this little
    > patch, just to see what kind of performance that yields? Remove the cfq
    > patch first. I would have patched nfsd only, but this is just a quick'n
    > dirty.


    I went ahead and gave it a shot. The updated CFQ patch with no I/O
    context sharing does about 40MB/s reading a 1GB file. Backing that
    patch out, and then adding the patch to share io_context's between
    kthreads yields 45MB/s.

    By the way, in looking at the copy_io function, I noticed what appears
    to be a (minor) bug:

    if (clone_flags & CLONE_IO) {
            tsk->io_context = ioc_task_link(ioc);
            if (unlikely(!tsk->io_context))
                    return -ENOMEM;

    According to comments in ioc_task_link, tsk->io_context == NULL means:

        /*
         * if ref count is zero, don't allow sharing (ioc is going away, it's
         * a race).
         */
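
    For reference, ioc_task_link() itself looks roughly like this in the
    2.6.28-era linux/iocontext.h (quoting from memory, so the exact code in
    your tree may differ slightly):

        static inline struct io_context *ioc_task_link(struct io_context *ioc)
        {
                /*
                 * if ref count is zero, don't allow sharing (ioc is going
                 * away, it's a race).
                 */
                if (ioc && atomic_inc_not_zero(&ioc->refcount)) {
                        atomic_inc(&ioc->nr_tasks);
                        return ioc;
                }

                return NULL;
        }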

    It seems more appropriate to just create a new I/O context at this
    point, don't you think? (Sorry, I know it's off-topic!)

    Cheers,

    Jeff

    diff --git a/kernel/fork.c b/kernel/fork.c
    index f608356..483d95c 100644
    --- a/kernel/fork.c
    +++ b/kernel/fork.c
    @@ -723,10 +723,17 @@ static int copy_io(unsigned long clone_flags, struct task_struct *tsk)
      	 * Share io context with parent, if CLONE_IO is set
      	 */
      	if (clone_flags & CLONE_IO) {
    +		/*
    +		 * If ioc_task_link fails, it just means that we raced
    +		 * with io context cleanup. Continue on to allocate
    +		 * a new context in this case.
    +		 */
      		tsk->io_context = ioc_task_link(ioc);
    -		if (unlikely(!tsk->io_context))
    -			return -ENOMEM;
    -	} else if (ioprio_valid(ioc->ioprio)) {
    +		if (likely(tsk->io_context))
    +			return 0;
    +	}
    +
    +	if (ioprio_valid(ioc->ioprio)) {
      		tsk->io_context = alloc_io_context(GFP_KERNEL, -1);
      		if (unlikely(!tsk->io_context))
      			return -ENOMEM;
    --
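
    For clarity, with the above applied copy_io() would read roughly like this
    (reconstructed against a 2.6.28-era kernel/fork.c; treat it as a sketch
    rather than the exact resulting code):

        static int copy_io(unsigned long clone_flags, struct task_struct *tsk)
        {
        #ifdef CONFIG_BLOCK
                struct io_context *ioc = current->io_context;

                if (!ioc)
                        return 0;
                /*
                 * Share io context with parent, if CLONE_IO is set
                 */
                if (clone_flags & CLONE_IO) {
                        /*
                         * If ioc_task_link fails, it just means that we raced
                         * with io context cleanup. Continue on to allocate
                         * a new context in this case.
                         */
                        tsk->io_context = ioc_task_link(ioc);
                        if (likely(tsk->io_context))
                                return 0;
                }

                if (ioprio_valid(ioc->ioprio)) {
                        tsk->io_context = alloc_io_context(GFP_KERNEL, -1);
                        if (unlikely(!tsk->io_context))
                                return -ENOMEM;

                        tsk->io_context->ioprio = ioc->ioprio;
                }
        #endif
                return 0;
        }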

  2. Re: Slow file transfer speeds with CFQ IO scheduler in some cases

    Jens Axboe wrote:
    > On Tue, Nov 11 2008, Vitaly V. Bursov wrote:
    >> Jens Axboe wrote:
    >>> On Tue, Nov 11 2008, Jens Axboe wrote:
    >>>> On Tue, Nov 11 2008, Jens Axboe wrote:
    >>>>> On Mon, Nov 10 2008, Jeff Moyer wrote:
    >>>>>> "Vitaly V. Bursov" writes:
    >>>>>>
    >>>>>>> Jens Axboe wrote:
    >>>>>>>> On Mon, Nov 10 2008, Vitaly V. Bursov wrote:
    >>>>>>>>> Jens Axboe wrote:
    >>>>>>>>>> On Mon, Nov 10 2008, Jeff Moyer wrote:
    >>>>>>>>>>> Jens Axboe writes:
    >>>>>>>>>>>
    >>>>>>>>>>>> http://bugzilla.kernel.org/attachmen...73&action=view
    >>>>>>>>>>> Funny, I was going to ask the same question. The reason Jens wants
    >>>>>>>>>>> you to try this patch is that nfsd may be farming off the I/O requests
    >>>>>>>>>>> to different threads which are then performing interleaved I/O. The
    >>>>>>>>>>> above patch tries to detect this and allow cooperating processes to get
    >>>>>>>>>>> disk time instead of waiting for the idle timeout.
    >>>>>>>>>> Precisely :-)
    >>>>>>>>>>
    >>>>>>>>>> The only reason I haven't merged it yet is because of worry of extra
    >>>>>>>>>> cost, but I'll throw some SSD love at it and see how it turns out.
    >>>>>>>>>>
    >>>>>>>>> Sorry, but I get an "oops" the moment an NFS read transfer starts.
    >>>>>>>>> I can get a directory listing via NFS and read files locally (not
    >>>>>>>>> carefully tested, though).
    >>>>>>>>>
    >>>>>>>>> Dumps captured via netconsole, so these may not be completely accurate
    >>>>>>>>> but hopefully will give a hint.
    >>>>>>>> Interesting, strange how that hasn't triggered here. Or perhaps the
    >>>>>>>> version that Jeff posted isn't the one I tried. Anyway, search for:
    >>>>>>>>
    >>>>>>>> RB_CLEAR_NODE(&cfqq->rb_node);
    >>>>>>>>
    >>>>>>>> and add a
    >>>>>>>>
    >>>>>>>> RB_CLEAR_NODE(&cfqq->prio_node);
    >>>>>>>>
    >>>>>>>> just below that. It's in cfq_find_alloc_queue(). I think that should fix
    >>>>>>>> it.
    >>>>>>>>
    >>>>>>> Same problem.
    >>>>>>>
    >>>>>>> I did make clean; make -j3; sync; twice on the patched kernel and it went OK,
    >>>>>>> but it won't boot anymore with cfq, failing with the same error...
    >>>>>>>
    >>>>>>> Switching to the cfq I/O scheduler at runtime (after booting with "as") appears
    >>>>>>> to work with two parallel local dd reads.
    >>>>>> Strange, I can't reproduce a failure. I'll keep trying. For now, these
    >>>>>> are the results I see:
    >>>>>>
    >>>>>> [root@maiden ~]# mount megadeth:/export/cciss /mnt/megadeth/
    >>>>>> [root@maiden ~]# dd if=/mnt/megadeth/file1 of=/dev/null bs=1M
    >>>>>> 1024+0 records in
    >>>>>> 1024+0 records out
    >>>>>> 1073741824 bytes (1.1 GB) copied, 26.8128 s, 40.0 MB/s
    >>>>>> [root@maiden ~]# umount /mnt/megadeth/
    >>>>>> [root@maiden ~]# mount megadeth:/export/cciss /mnt/megadeth/
    >>>>>> [root@maiden ~]# dd if=/mnt/megadeth/file1 of=/dev/null bs=1M
    >>>>>> 1024+0 records in
    >>>>>> 1024+0 records out
    >>>>>> 1073741824 bytes (1.1 GB) copied, 23.7025 s, 45.3 MB/s
    >>>>>> [root@maiden ~]# umount /mnt/megadeth/
    >>>>>>
    >>>>>> Here is the patch, with the suggestion from Jens to switch the cfqq to
    >>>>>> the right priority tree when the priority is changed.
    >>>>> I don't see the issue here either. Vitaly, are you using any openvz
    >>>>> kernel patches? IIRC, they patch cfq so it could just be that your cfq
    >>>>> version is incompatible with Jeff's patch.
    >>>> Heh, got it to trigger about 3 seconds after sending that email! I'll
    >>>> look more into it.
    >>> OK, found the issue. A few bugs there... cfq_prio_tree_lookup() doesn't
    >>> even return a hit, since it just breaks and returns NULL always. That
    >>> can cause cfq_prio_tree_add() to screw up the rbtree. The code to
    >>> correct on ioprio change wasn't correct either, I changed that as well.
    >>> New patch below. Vitaly, can you give it a spin?
    >>>

    >> No crashes so far. Transfer speed is quite good as well.
    >>
    >>
    >> NFS+deadline, file not cached:
    >>
    >> avg-cpu: %user %nice %system %iowait %steal %idle
    >> 0,00 0,00 25,50 19,40 0,00 55,10
    >>
    >> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    >> sda 6648,80 0,00 1281,70 0,00 115179,20 0,00 89,86 5,35 4,18 0,35 45,20
    >> sdb 6672,30 0,00 1257,00 0,00 115292,80 0,00 91,72 5,09 4,06 0,35 44,60
    >>
    >>
    >>
    >> NFS+cfq, file not cached:
    >>
    >> avg-cpu: %user %nice %system %iowait %steal %idle
    >> 0,05 0,00 25,30 23,95 0,00 50,70
    >>
    >> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    >> sda 6403,00 0,00 1089,90 0,00 108655,20 0,00 99,69 4,50 4,13 0,41 44,50
    >> sdb 6394,90 0,00 1099,60 0,00 108639,20 0,00 98,80 4,53 4,12 0,39 42,50
    >>
    >>
    >> Just for reference: these are 10-second interval averages over a gigabit network;
    >> the lack of TCP/UDP hardware checksumming may lead to high system CPU load.
    >>
    >>
    >> Also, a few more tests (the server has 4 GB of RAM):
    >>
    >> NFS+cfq, file not cached:
    >> $ dd if=test of=/dev/null bs=1M count=2000
    >> 2000+0 records in
    >> 2000+0 records out
    >> 2097152000 bytes (2.1 GB) copied, 24.9147 s, 84.2 MB/s
    >>
    >> NFS+deadline, file not cached:
    >> 2000+0 records in
    >> 2000+0 records out
    >> 2097152000 bytes (2.1 GB) copied, 23.2999 s, 90.0 MB/s
    >>
    >> file cached on server:
    >> 2000+0 records in
    >> 2000+0 records out
    >> 2097152000 bytes (2.1 GB) copied, 21.9784 s, 95.4 MB/s
    >>
    >>
    >> A single local dd read yields 193 MB/s with deadline and
    >> 167 MB/s with cfq.

    >
    > OK, that looks better. Can I talk you into just trying this little
    > patch, just to see what kind of performance that yields? Remove the cfq
    > patch first. I would have patched nfsd only, but this is just a quick'n
    > dirty.
    >
    > diff --git a/kernel/kthread.c b/kernel/kthread.c
    > index 8e7a7ce..3aacf48 100644
    > --- a/kernel/kthread.c
    > +++ b/kernel/kthread.c
    > @@ -92,7 +92,7 @@ static void create_kthread(struct kthread_create_info *create)
    >  	int pid;
    >
    >  	/* We want our own signal handler (we take no signals by default). */
    > -	pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
    > +	pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | CLONE_IO | SIGCHLD);
    >  	if (pid < 0) {
    >  		create->result = ERR_PTR(pid);
    >  	} else {
    >


    No patches:

    iostat for nfs+cfq read
    avg-cpu: %user %nice %system %iowait %steal %idle
    0,00 0,00 3,25 52,20 0,00 44,55

    Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    sda 2820,10 0,00 452,40 0,00 47648,80 0,00 105,32 7,54 16,70 1,96 88,60
    sdb 2818,60 0,00 453,90 0,00 47391,20 0,00 104,41 4,13 9,02 1,33 60,30

    NFS+cfq, file not cached:
    2000+0 records in
    2000+0 records out
    2097152000 bytes (2.1 GB) copied, 57.5762 s, 36.4 MB/s

    NFS+deadline, file not cached:
    2000+0 records in
    2000+0 records out
    2097152000 bytes (2.1 GB) copied, 23.6672 s, 88.6 MB/s

    ======================
    Above patch applied:

    avg-cpu: %user %nice %system %iowait %steal %idle
    0,00 0,00 3,60 51,10 0,00 45,30

    Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    sda 2805,80 0,00 446,20 0,00 47267,20 0,00 105,93 5,61 12,62 1,71 76,50
    sdb 2803,90 0,00 448,50 0,00 47246,40 0,00 105,34 5,56 12,46 1,68 75,40


    NFS+cfq, file not cached:
    2000+0 records in
    2000+0 records out
    2097152000 bytes (2.1 GB) copied, 57.5903 s, 36.4 MB/s

    NFS+deadline, file not cached:
    2000+0 records in
    2000+0 records out
    2097152000 bytes (2.1 GB) copied, 23.46 s, 89.4 MB/s

    ======================
    Both patches applied:

    avg-cpu: %user %nice %system %iowait %steal %idle
    0,00 0,00 22,95 24,65 0,00 52,40

    Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    sda 6504,60 0,00 1089,80 0,00 110359,20 0,00 101,27 4,67 4,29 0,40 43,50
    sdb 6495,50 0,00 1097,50 0,00 110312,80 0,00 100,51 4,57 4,17 0,39 43,10


    NFS+cfq, file not cached:
    2000+0 records in
    2000+0 records out
    2097152000 bytes (2.1 GB) copied, 25.4477 s, 82.4 MB/s

    NFS+deadline, file not cached:
    2000+0 records in
    2000+0 records out
    2097152000 bytes (2.1 GB) copied, 23.1639 s, 90.5 MB/s

    --
    Thanks,
    Vitaly

  3. Re: Slow file transfer speeds with CFQ IO scheduler in some cases

    On Tue, 11 Nov 2008 14:36:07 -0500
    Jeff Moyer wrote:

    > Jens Axboe writes:
    >
    > > OK, that looks better. Can I talk you into just trying this little
    > > patch, just to see what kind of performance that yields? Remove the cfq
    > > patch first. I would have patched nfsd only, but this is just a quick'n
    > > dirty.

    >
    > I went ahead and gave it a shot. The updated CFQ patch with no I/O
    > context sharing does about 40MB/s reading a 1GB file. Backing that
    > patch out, and then adding the patch to share io_context's between
    > kthreads yields 45MB/s.
    >


    Here's a quick and dirty patch to make all of the nfsd's have the same
    io_context. Comments appreciated -- I'm not that familiar with the IO
    scheduling code. If this looks good, I'll clean it up, add some
    comments and formally send it to Bruce.

    ----------------[snip]-------------------

    From dd15b19a0eab3e181a6f76f1421b97950e255b4b Mon Sep 17 00:00:00 2001
    From: Jeff Layton
    Date: Tue, 11 Nov 2008 15:43:15 -0500
    Subject: [PATCH] knfsd: make all nfsd threads share an io_context

    This apparently makes the I/O scheduler treat the threads as a group
    which helps throughput when sequential I/O is multiplexed over several
    nfsd's.

    Signed-off-by: Jeff Layton
    ---
    fs/nfsd/nfssvc.c | 27 +++++++++++++++++++++++++++
    1 files changed, 27 insertions(+), 0 deletions(-)

    diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
    index 07e4f5d..6d87f74 100644
    --- a/fs/nfsd/nfssvc.c
    +++ b/fs/nfsd/nfssvc.c
    @@ -22,6 +22,7 @@
     #include
     #include
     #include
    +#include

     #include
     #include
    @@ -42,6 +43,7 @@ static int nfsd(void *vrqstp);
     struct timeval nfssvc_boot;
     static atomic_t nfsd_busy;
     static unsigned long nfsd_last_call;
    +static struct io_context *nfsd_io_context;
     static DEFINE_SPINLOCK(nfsd_call_lock);

     /*
    @@ -173,6 +175,7 @@ static void nfsd_last_thread(struct svc_serv *serv)
     	nfsd_serv = NULL;
     	nfsd_racache_shutdown();
     	nfs4_state_shutdown();
    +	nfsd_io_context = NULL;

     	printk(KERN_WARNING "nfsd: last server has exited, flushing export "
     		"cache\n");
    @@ -398,6 +401,28 @@ update_thread_usage(int busy_threads)
     }

     /*
    + * should be called while holding nfsd_mutex
    + */
    +static void
    +nfsd_set_io_context(void)
    +{
    +	int cpu, node;
    +
    +	if (!nfsd_io_context) {
    +		cpu = get_cpu();
    +		node = cpu_to_node(cpu);
    +		put_cpu();
    +
    +		/*
    +		 * get_io_context can return NULL if the alloc_context fails.
    +		 * That's not technically fatal here, so we don't bother to
    +		 * check for it.
    +		 */
    +		nfsd_io_context = get_io_context(GFP_KERNEL, node);
    +	} else
    +		copy_io_context(&current->io_context, &nfsd_io_context);
    +}
    +/*
      * This is the NFS server kernel thread
      */
     static int
    @@ -410,6 +435,8 @@ nfsd(void *vrqstp)
     	/* Lock module and set up kernel thread */
     	mutex_lock(&nfsd_mutex);

    +	nfsd_set_io_context();
    +
     	/* At this point, the thread shares current->fs
     	 * with the init process. We need to create files with a
     	 * umask of 0 instead of init's umask. */
    --
    1.5.5.1


  4. Re: Slow file transfer speeds with CFQ IO scheduler in some cases

    On Tue, 11 Nov 2008 16:41:04 -0500
    Jeff Layton wrote:

    > On Tue, 11 Nov 2008 14:36:07 -0500
    > Jeff Moyer wrote:
    >
    > > Jens Axboe writes:
    > >
    > > > OK, that looks better. Can I talk you into just trying this little
    > > > patch, just to see what kind of performance that yields? Remove the cfq
    > > > patch first. I would have patched nfsd only, but this is just a quick'n
    > > > dirty.

    > >
    > > I went ahead and gave it a shot. The updated CFQ patch with no I/O
    > > context sharing does about 40MB/s reading a 1GB file. Backing that
    > > patch out, and then adding the patch to share io_context's between
    > > kthreads yields 45MB/s.
    > >

    >
    > Here's a quick and dirty patch to make all of the nfsd's have the same
    > io_context. Comments appreciated -- I'm not that familiar with the IO
    > scheduling code. If this looks good, I'll clean it up, add some
    > comments and formally send it to Bruce.
    >


    No sooner do I send it out than I find a bug. We need to eventually
    put the io_context reference we get. This should be more correct:

    ----------------[snip]-------------------

    From d0ee67045a12c677883f77791c6f260588c7b41f Mon Sep 17 00:00:00 2001
    From: Jeff Layton
    Date: Tue, 11 Nov 2008 16:54:16 -0500
    Subject: [PATCH] knfsd: make all nfsd threads share an io_context

    This apparently makes the I/O scheduler treat the threads as a group
    which helps throughput when sequential I/O is multiplexed over several
    nfsd's.

    Signed-off-by: Jeff Layton
    ---
    fs/nfsd/nfssvc.c | 30 ++++++++++++++++++++++++++++++
    1 files changed, 30 insertions(+), 0 deletions(-)

    diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
    index 07e4f5d..5cd99f9 100644
    --- a/fs/nfsd/nfssvc.c
    +++ b/fs/nfsd/nfssvc.c
    @@ -22,6 +22,7 @@
     #include
     #include
     #include
    +#include

     #include
     #include
    @@ -42,6 +43,7 @@ static int nfsd(void *vrqstp);
     struct timeval nfssvc_boot;
     static atomic_t nfsd_busy;
     static unsigned long nfsd_last_call;
    +static struct io_context *nfsd_io_context;
     static DEFINE_SPINLOCK(nfsd_call_lock);

     /*
    @@ -173,6 +175,10 @@ static void nfsd_last_thread(struct svc_serv *serv)
     	nfsd_serv = NULL;
     	nfsd_racache_shutdown();
     	nfs4_state_shutdown();
    +	if (nfsd_io_context) {
    +		put_io_context(nfsd_io_context);
    +		nfsd_io_context = NULL;
    +	}

     	printk(KERN_WARNING "nfsd: last server has exited, flushing export "
     		"cache\n");
    @@ -398,6 +404,28 @@ update_thread_usage(int busy_threads)
     }

     /*
    + * should be called while holding nfsd_mutex
    + */
    +static void
    +nfsd_set_io_context(void)
    +{
    +	int cpu, node;
    +
    +	if (!nfsd_io_context) {
    +		cpu = get_cpu();
    +		node = cpu_to_node(cpu);
    +		put_cpu();
    +
    +		/*
    +		 * get_io_context can return NULL if the alloc_context fails.
    +		 * That's not technically fatal here, so we don't bother to
    +		 * check for it.
    +		 */
    +		nfsd_io_context = get_io_context(GFP_KERNEL, node);
    +	} else
    +		copy_io_context(&current->io_context, &nfsd_io_context);
    +}
    +/*
      * This is the NFS server kernel thread
      */
     static int
    @@ -410,6 +438,8 @@ nfsd(void *vrqstp)
     	/* Lock module and set up kernel thread */
     	mutex_lock(&nfsd_mutex);

    +	nfsd_set_io_context();
    +
     	/* At this point, the thread shares current->fs
     	 * with the init process. We need to create files with a
     	 * umask of 0 instead of init's umask. */
    --
    1.5.5.1

