[PATCH 1/3] accounting: task counters for disk/network


Thread: [PATCH 1/3] accounting: task counters for disk/network

  1. [PATCH 1/3] accounting: task counters for disk/network


    From: Gerlof Langeveld

    Proper performance analysis requires the availability of system-level
    and process-level counters for CPU, memory, disk and network utilization.
    The current kernel offers the system-level counters; however, process-level
    counters are only (sufficiently) available for CPU and memory utilization.

    The kernel feature "task I/O accounting" currently maintains
    per process counters for the number of bytes transferred to/from disk.
    These counters are available via /proc/pid/io. It is still not possible
    to find out which process issues the physical disk transfer. Besides,
    not *all* disk transfers are accounted to processes (e.g. swap-transfers
    by kswapd, journaling transfers).

    This patch extends "task I/O accounting" by counting real *physical*
    disk transfers per process and by counting IPv4/IPv6 socket transfers
    per process.
    The modified output generated for /proc/pid/io will be as follows:

    $ cat /proc/3179/io
    rchar: 49934
    wchar: 4
    syscr: 27
    syscw: 1
    read_bytes: 200704
    write_bytes: 4096
    cancelled_write_bytes: 0
    disk_read: 8 <---- this line is added
    disk_read_sect: 392 <---- this line is added
    disk_write: 0 <---- this line is added
    disk_write_sect: 0 <---- this line is added
    tcp_send: 0 <---- this line is added
    tcp_send_bytes: 0 <---- this line is added
    tcp_recv: 0 <---- this line is added
    tcp_recv_bytes: 0 <---- this line is added
    udp_send: 27 <---- this line is added
    udp_send_bytes: 1296 <---- this line is added
    udp_recv: 27 <---- this line is added
    udp_recv_bytes: 29484 <---- this line is added
    raw_send: 0 <---- this line is added
    raw_recv: 0 <---- this line is added

    The performance monitor 'atop' has used a similar kernel patch for
    several years to be able to show these per-process statistics.

    Modified source files:
    include/linux/task_io_accounting.h: addition of new counters to the
    struct task_io_accounting

    fs/proc/base.c: generate output via /proc/pid/io

    block/ll_rw_blk.c: per process counting of physical
    disk access

    net/socket.c: per process counting of socket
    transfers

    kernel/acct.c: add number of disk reads/writes to
    standard accounting record

    Since "task I/O accounting" is currently optional (CONFIG_TASK_IO_ACCOUNTING),
    all modifications are guarded by the same macro.
    The patch applies to kernel version 2.6.24.4.

    Signed-off-by: Gerlof Langeveld
    ---

    diff -uprN -X linux-2.6.24.4-vanilla/Documentation/dontdiff linux-2.6.24.4-vanilla/block/ll_rw_blk.c linux-2.6.24.4-modified/block/ll_rw_blk.c
    --- linux-2.6.24.4-vanilla/block/ll_rw_blk.c 2008-03-24 19:49:18.000000000 +0100
    +++ linux-2.6.24.4-modified/block/ll_rw_blk.c 2008-03-25 13:52:14.000000000 +0100
    @@ -2739,6 +2739,19 @@ static void drive_stat_acct(struct reque
    disk_round_stats(rq->rq_disk);
    rq->rq_disk->in_flight++;
    }
    +
    +#ifdef CONFIG_TASK_IO_ACCOUNTING
    + switch (rw) {
    + case READ:
    + current->group_leader->ioac.dsk_rio += new_io;
    + current->group_leader->ioac.dsk_rsz += rq->nr_sectors;
    + break;
    + case WRITE:
    + current->group_leader->ioac.dsk_wio += new_io;
    + current->group_leader->ioac.dsk_wsz += rq->nr_sectors;
    + break;
    + }
    +#endif
    }

    /*
    diff -uprN -X linux-2.6.24.4-vanilla/Documentation/dontdiff linux-2.6.24.4-vanilla/fs/proc/base.c linux-2.6.24.4-modified/fs/proc/base.c
    --- linux-2.6.24.4-vanilla/fs/proc/base.c 2008-03-24 19:49:18.000000000 +0100
    +++ linux-2.6.24.4-modified/fs/proc/base.c 2008-03-25 13:52:14.000000000 +0100
    @@ -2174,7 +2174,21 @@ static int proc_pid_io_accounting(struct
    #endif
    "read_bytes: %llu\n"
    "write_bytes: %llu\n"
    - "cancelled_write_bytes: %llu\n",
    + "cancelled_write_bytes: %llu\n"
    + "disk_read: %llu\n"
    + "disk_read_sect: %llu\n"
    + "disk_write: %llu\n"
    + "disk_write_sect: %llu\n"
    + "tcp_send: %llu\n"
    + "tcp_send_bytes: %llu\n"
    + "tcp_recv: %llu\n"
    + "tcp_recv_bytes: %llu\n"
    + "udp_send: %llu\n"
    + "udp_send_bytes: %llu\n"
    + "udp_recv: %llu\n"
    + "udp_recv_bytes: %llu\n"
    + "raw_send: %llu\n"
    + "raw_recv: %llu\n",
    #ifdef CONFIG_TASK_XACCT
    (unsigned long long)task->rchar,
    (unsigned long long)task->wchar,
    @@ -2183,7 +2197,21 @@ static int proc_pid_io_accounting(struct
    #endif
    (unsigned long long)task->ioac.read_bytes,
    (unsigned long long)task->ioac.write_bytes,
    - (unsigned long long)task->ioac.cancelled_write_bytes);
    + (unsigned long long)task->ioac.cancelled_write_bytes,
    + (unsigned long long)task->ioac.dsk_rio,
    + (unsigned long long)task->ioac.dsk_rsz,
    + (unsigned long long)task->ioac.dsk_wio,
    + (unsigned long long)task->ioac.dsk_wsz,
    + (unsigned long long)task->ioac.tcp_snd,
    + (unsigned long long)task->ioac.tcp_ssz,
    + (unsigned long long)task->ioac.tcp_rcv,
    + (unsigned long long)task->ioac.tcp_rsz,
    + (unsigned long long)task->ioac.udp_snd,
    + (unsigned long long)task->ioac.udp_ssz,
    + (unsigned long long)task->ioac.udp_rcv,
    + (unsigned long long)task->ioac.udp_rsz,
    + (unsigned long long)task->ioac.raw_snd,
    + (unsigned long long)task->ioac.raw_rcv);
    }
    #endif

    diff -uprN -X linux-2.6.24.4-vanilla/Documentation/dontdiff linux-2.6.24.4-vanilla/include/linux/task_io_accounting.h linux-2.6.24.4-modified/include/linux/task_io_accounting.h
    --- linux-2.6.24.4-vanilla/include/linux/task_io_accounting.h 2008-03-24 19:49:18.000000000 +0100
    +++ linux-2.6.24.4-modified/include/linux/task_io_accounting.h 2008-03-25 13:52:14.000000000 +0100
    @@ -30,6 +30,23 @@ struct task_io_accounting {
    * information loss in doing that.
    */
    u64 cancelled_write_bytes;
    +
    + /*
    + * Number of physical reads and writes to disk by this task
    + * and the accumulated size of these physical transfers.
    + */
    + u64 dsk_rio, dsk_wio;
    + u64 dsk_rsz, dsk_wsz;
    +
    + /*
    + * Number of sends and receives issued for IPv4/IPv6 by
    + * this task on TCP, UDP and raw sockets with their accumulated size.
    + */
    + u64 tcp_snd, tcp_rcv;
    + u64 tcp_ssz, tcp_rsz;
    + u64 udp_snd, udp_rcv;
    + u64 udp_ssz, udp_rsz;
    + u64 raw_snd, raw_rcv;
    };
    #else
    struct task_io_accounting {
    diff -uprN -X linux-2.6.24.4-vanilla/Documentation/dontdiff linux-2.6.24.4-vanilla/kernel/acct.c linux-2.6.24.4-modified/kernel/acct.c
    --- linux-2.6.24.4-vanilla/kernel/acct.c 2008-03-24 19:49:18.000000000 +0100
    +++ linux-2.6.24.4-modified/kernel/acct.c 2008-03-25 13:55:07.000000000 +0100
    @@ -497,7 +497,11 @@ static void do_acct_process(struct file
    ac.ac_exitcode = pacct->ac_exitcode;
    spin_unlock_irq(&current->sighand->siglock);
    ac.ac_io = encode_comp_t(0 /* current->io_usage */); /* %% */
    +#ifdef CONFIG_TASK_IO_ACCOUNTING
    + ac.ac_rw = encode_comp_t(current->ioac.dsk_rio + current->ioac.dsk_wio);
    +#else
    ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
    +#endif
    ac.ac_swaps = encode_comp_t(0);

    /*
    diff -uprN -X linux-2.6.24.4-vanilla/Documentation/dontdiff linux-2.6.24.4-vanilla/net/socket.c linux-2.6.24.4-modified/net/socket.c
    --- linux-2.6.24.4-vanilla/net/socket.c 2008-03-24 19:49:18.000000000 +0100
    +++ linux-2.6.24.4-modified/net/socket.c 2008-03-25 13:52:14.000000000 +0100
    @@ -551,10 +551,30 @@ static inline int __sock_sendmsg(struct
    si->size = size;

    err = security_socket_sendmsg(sock, msg, size);
    - if (err)
    - return err;
    + if (!err)
    + err = sock->ops->sendmsg(iocb, sock, msg, size);

    - return sock->ops->sendmsg(iocb, sock, msg, size);
    +#ifdef CONFIG_TASK_IO_ACCOUNTING
    + if (err >= 0 && sock->sk) {
    + switch (sock->sk->sk_family) {
    + case PF_INET:
    + case PF_INET6:
    + switch (sock->sk->sk_type) {
    + case SOCK_STREAM:
    + current->group_leader->ioac.tcp_snd++;
    + current->group_leader->ioac.tcp_ssz += size;
    + break;
    + case SOCK_DGRAM:
    + current->group_leader->ioac.udp_snd++;
    + current->group_leader->ioac.udp_ssz += size;
    + break;
    + case SOCK_RAW:
    + current->group_leader->ioac.raw_snd++;
    + }
    + }
    + }
    +#endif
    + return err;
    }

    int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
    @@ -633,10 +653,31 @@ static inline int __sock_recvmsg(struct
    si->flags = flags;

    err = security_socket_recvmsg(sock, msg, size, flags);
    - if (err)
    - return err;
    + if (!err)
    + err = sock->ops->recvmsg(iocb, sock, msg, size, flags);

    - return sock->ops->recvmsg(iocb, sock, msg, size, flags);
    +#ifdef CONFIG_TASK_IO_ACCOUNTING
    + if (err >= 0 && sock->sk) {
    + switch (sock->sk->sk_family) {
    + case PF_INET:
    + case PF_INET6:
    + switch (sock->sk->sk_type) {
    + case SOCK_STREAM:
    + current->group_leader->ioac.tcp_rcv++;
    + current->group_leader->ioac.tcp_rsz += size;
    + break;
    + case SOCK_DGRAM:
    + current->group_leader->ioac.udp_rcv++;
    + current->group_leader->ioac.udp_rsz += size;
    + break;
    + case SOCK_RAW:
    + current->group_leader->ioac.raw_rcv++;
    + break;
    + }
    + }
    + }
    +#endif
    + return err;
    }

    int sock_recvmsg(struct socket *sock, struct msghdr *msg,
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH 1/3] accounting: task counters for disk/network

    On Wed, 2 Apr 2008 09:30:37 +0200
    Gerlof Langeveld wrote:

    >
    > From: Gerlof Langeveld


    You sent three different patches, all with the same title. Please don't do
    that - choose unique, suitable and meaningful titles for each patch.

    > Proper performance analysis requires the availability of system-level
    > and process-level counters for CPU, memory, disk and network utilization.
    > The current kernel offers the system-level counters; however, process-level
    > counters are only (sufficiently) available for CPU and memory utilization.
    >
    > The kernel feature "task I/O accounting" currently maintains
    > per process counters for the number of bytes transferred to/from disk.
    > These counters are available via /proc/pid/io. It is still not possible
    > to find out which process issues the physical disk transfer. Besides,
    > not *all* disk transfers are accounted to processes (e.g. swap-transfers
    > by kswapd, journaling transfers).
    >
    > This patch extends "task I/O accounting" by counting real *physical*
    > disk transfers per process and by counting IPv4/IPv6 socket transfers
    > per process.
    > The modified output generated for /proc/pid/io will be as follows:
    >
    > $ cat /proc/3179/io


    /proc/pid/io is not the primary interface for this sort of accounting - it
    was just tossed in there as an afterthought because it was easy.

    This sort of accounting should be delivered via taskstats, and
    Documentation/accounting/getdelays.c should be suitably updated.

    > --- linux-2.6.24.4-vanilla/block/ll_rw_blk.c 2008-03-24 19:49:18.000000000 +0100
    > +++ linux-2.6.24.4-modified/block/ll_rw_blk.c 2008-03-25 13:52:14.000000000 +0100
    > @@ -2739,6 +2739,19 @@ static void drive_stat_acct(struct reque
    > disk_round_stats(rq->rq_disk);
    > rq->rq_disk->in_flight++;
    > }
    > +
    > +#ifdef CONFIG_TASK_IO_ACCOUNTING
    > + switch (rw) {
    > + case READ:
    > + current->group_leader->ioac.dsk_rio += new_io;
    > + current->group_leader->ioac.dsk_rsz += rq->nr_sectors;
    > + break;
    > + case WRITE:
    > + current->group_leader->ioac.dsk_wio += new_io;
    > + current->group_leader->ioac.dsk_wsz += rq->nr_sectors;
    > + break;
    > + }
    > +#endif


    For many workloads, this will cause almost all writeout to be accounted to
    pdflush and perhaps kswapd. This makes the per-task write accounting
    largely useless.


  3. Re: [PATCH 1/3] accounting: task counters for disk/network

    On 03-04-2008 12:54, Andrew Morton wrote:
    > On Wed, 2 Apr 2008 09:30:37 +0200
    > Gerlof Langeveld wrote:
    >
    > >
    > > From: Gerlof Langeveld

    >
    > You sent three different patches, all with the same title. Please don't do
    > that - choose unique, suitable and meaningful titles for each patch.


    Sorry for that (I assumed that using the same title would tie the three patches together).

    > > Proper performance analysis requires the availability of system-level
    > > and process-level counters for CPU, memory, disk and network utilization.
    > > The current kernel offers the system-level counters; however, process-level
    > > counters are only (sufficiently) available for CPU and memory utilization.
    > >
    > > The kernel feature "task I/O accounting" currently maintains
    > > per process counters for the number of bytes transferred to/from disk.
    > > These counters are available via /proc/pid/io. It is still not possible
    > > to find out which process issues the physical disk transfer. Besides,
    > > not *all* disk transfers are accounted to processes (e.g. swap-transfers
    > > by kswapd, journaling transfers).
    > >
    > > This patch extends "task I/O accounting" by counting real *physical*
    > > disk transfers per process and by counting IPv4/IPv6 socket transfers
    > > per process.
    > > The modified output generated for /proc/pid/io will be as follows:
    > >
    > > $ cat /proc/3179/io

    >
    > /proc/pid/io is not the primary interface for this sort of accounting - it
    > was just tossed in there as an afterthought because it was easy.
    >
    > This sort of accounting should be delivered via taskstats, and
    > Documentation/accounting/getdelays.c should be suitably updated.


    I must dive into the taskstats feature first, so I will deliver
    a new patch later on.

    > > --- linux-2.6.24.4-vanilla/block/ll_rw_blk.c 2008-03-24 19:49:18.000000000 +0100
    > > +++ linux-2.6.24.4-modified/block/ll_rw_blk.c 2008-03-25 13:52:14.000000000 +0100
    > > @@ -2739,6 +2739,19 @@ static void drive_stat_acct(struct reque
    > > disk_round_stats(rq->rq_disk);
    > > rq->rq_disk->in_flight++;
    > > }
    > > +
    > > +#ifdef CONFIG_TASK_IO_ACCOUNTING
    > > + switch (rw) {
    > > + case READ:
    > > + current->group_leader->ioac.dsk_rio += new_io;
    > > + current->group_leader->ioac.dsk_rsz += rq->nr_sectors;
    > > + break;
    > > + case WRITE:
    > > + current->group_leader->ioac.dsk_wio += new_io;
    > > + current->group_leader->ioac.dsk_wsz += rq->nr_sectors;
    > > + break;
    > > + }
    > > +#endif

    >
    > For many workloads, this will cause almost all writeout to be accounted to
    > pdflush and perhaps kswapd. This makes the per-task write accounting
    > largely useless.


    There are several situations in which writeouts are accounted to the user
    process itself, e.g. when issuing direct writes (open flag O_DIRECT) or
    synchronous writes (open flag O_SYNC, the sync/fsync system calls, the
    synchronous file attribute, a synchronously mounted filesystem).

    Apart from that, swapping out of process pages by kswapd is currently not
    accounted at all as shown by the following snapshot of 'atop' on a heavily
    swapping system:

    ATOP - atdts 2008/04/07 19:01:24 10 seconds elapsed
    .......
    MEM | tot 1.9G | free 14.1M | cache 11.0M | buff 0.6M | slab 22.4M |
    SWP | tot 1.0G | free 513.6M | | vmcom 2.3G | vmlim 2.0G |
    PAG | scan 9865 | stall 0 | | swin 4337 | swout 4718 |
    DSK | sda | busy 100% | read 1499 | write 1949 | avio 2 ms |

    PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S DSK CMD 1/1
    13795 0.04s 0.01s 0K -3504K 12200K 0K -- - D 71% memeater
    27823 0.04s 0.00s 0K -360K 5080K 0K -- - D 29% appl
    13791 0.00s 0.24s 0K 0K 0K 0K -- - S 0% memeater
    13793 0.00s 0.24s 0K 0K 0K 0K -- - S 0% memeater
    13792 0.00s 0.23s 0K -4K 0K 0K -- - S 0% memeater
    13851 0.03s 0.00s 0K 0K 0K 0K -- - S 0% atop
    236 0.03s 0.00s 0K 0K 0K 0K -- - D 0% kswapd0

    The process counters RDDSK and WRDSK are retrieved from the
    standard /proc/pid/io.
    No write requests are accounted to any of the processes, while
    1949 write requests have been issued on disk (the line marked DSK).
    These writes should have been accounted to kswapd (writing to the swap
    device).

    With the additional counters maintained by this patch, every physical
    I/O request is accounted to one of the processes, which is a useful
    addition to the I/O accounting that is already implemented.
    A snapshot of 'atop' on a patched system that is swapping heavily:

    ATOP - atdts 2008/04/07 19:01:17 10 seconds elapsed
    .......
    MEM | tot 1.9G | free 13.8M | cache 11.0M | buff 0.6M | slab 22.4M |
    SWP | tot 1.0G | free 513.4M | | vmcom 2.3G | vmlim 2.0G |
    PAG | scan 8021 | stall 0 | | swin 3923 | swout 3367 |
    DSK | sda | busy 100% | read 1578 | write 1304 | avio 3 ms |

    PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK RNET SNET S DSK CMD 1/1
    27823 0.05s 0.00s 0K 1796K 1072 55 0 0 D 39% appl
    236 0.02s 0.00s 0K 0K 0 988 0 0 D 34% kswapd0
    13795 0.04s 0.00s 0K -3824K 491 258 0 0 D 26% memeater
    2017 0.01s 0.00s 0K 0K 0 28 0 0 S 1% kjournald
    3218 0.00s 0.00s 0K 4K 6 0 0 0 S 0% sendmail

    The process counters RDDSK and WRDSK now show the number of read and write
    requests issued on disk for each process. The accumulated counters per process
    correspond to the total number of requests measured on disk level (line marked
    with DSK).

    For read accounting it is also useful to see the number of I/O requests
    issued by a process (currently only the total number of Kbytes is accounted
    per process). After all, 64 I/O requests of 4 Kbytes each cause a heavier
    disk load than 1 I/O request of 256 Kbytes.

    So the extra counters can be considered as a useful addition to the I/O
    counters that are currently maintained.

  4. Re: [PATCH 1/3] accounting: task counters for disk/network

    On Mon, Apr 7, 2008 at 11:10 PM, Andrew Morton wrote:
    >
    > This sort of accounting will presumably be needed by a disk bandwidth
    > cgroup controller. Perhaps the containers/cgroup people have plans of code
    > already?
    >


    Yes, there have been various cgroup block-I/O subsystems proposed. I
    know that at least the one posted by Hirokazu Takahashi
    uses the same per-page cgroup pointer as the
    memory controller.

    Paul

  5. Re: [PATCH 1/3] accounting: task counters for disk/network

    On Tue, 8 Apr 2008 07:48:37 +0200 Gerlof Langeveld wrote:

    > > > --- linux-2.6.24.4-vanilla/block/ll_rw_blk.c 2008-03-24 19:49:18.000000000 +0100
    > > > +++ linux-2.6.24.4-modified/block/ll_rw_blk.c 2008-03-25 13:52:14.000000000 +0100
    > > > @@ -2739,6 +2739,19 @@ static void drive_stat_acct(struct reque
    > > > disk_round_stats(rq->rq_disk);
    > > > rq->rq_disk->in_flight++;
    > > > }
    > > > +
    > > > +#ifdef CONFIG_TASK_IO_ACCOUNTING
    > > > + switch (rw) {
    > > > + case READ:
    > > > + current->group_leader->ioac.dsk_rio += new_io;
    > > > + current->group_leader->ioac.dsk_rsz += rq->nr_sectors;
    > > > + break;
    > > > + case WRITE:
    > > > + current->group_leader->ioac.dsk_wio += new_io;
    > > > + current->group_leader->ioac.dsk_wsz += rq->nr_sectors;
    > > > + break;
    > > > + }
    > > > +#endif

    > >
    > > For many workloads, this will cause almost all writeout to be accounted to
    > > pdflush and perhaps kswapd. This makes the per-task write accounting
    > > largely useless.

    >
    > There are several situations in which writeouts are accounted to the user
    > process itself, e.g. when issuing direct writes (open flag O_DIRECT) or
    > synchronous writes (open flag O_SYNC, the sync/fsync system calls, the
    > synchronous file attribute, a synchronously mounted filesystem).


    yup.

    > Apart from that, swapping out of process pages by kswapd is currently not
    > accounted at all as shown by the following snapshot of 'atop' on a heavily
    > swapping system:


    Under heavy load, callers into alloc_pages() will themselves perform disk
    writeout. So under the proposed scheme, process A will be accounted for
    writeout which was in fact caused by process B.

    > So the extra counters can be considered as a useful addition to the I/O
    > counters that are currently maintained.


    mmm, maybe. But if we implement a partial solution like this we really
    should have a plan to finish it off.

    There have been numerous attempts at this, which tend to involve adding
    backpointers to the pageframe structure and such.

    This sort of accounting will presumably be needed by a disk bandwidth
    cgroup controller. Perhaps the containers/cgroup people have plans of code
    already?


  6. Re: [PATCH 1/3] accounting: task counters for disk/network

    Hi Gerlof,

    > > This sort of accounting will presumably be needed by a disk bandwidth
    > > cgroup controller. Perhaps the containers/cgroup people have plans of code
    > > already?
    > >

    >
    > Yes, there have been various cgroup block-I/O subsystems proposed. I
    > know that at least the one posted by Hirokazu Takahashi
    > uses the same per-page cgroup pointer as the
    > memory controller.


    Yes, I'm working on this.

    The linux-2.6.25-rc series now includes the memory controller, which
    makes it possible to get the cgroup from any pageframe. I have made an
    experimental block-I/O controller on top of it.

    If you use this controller with dm-ioband, which is an engine implemented
    as a device mapper module to control the bandwidth of block I/Os,
    you can also get "cgroup block I/O accounting" from it.

    I guess you can enhance this controller to track the I/Os per process
    if you really want to. But it will cause extra overhead which I think
    cannot be ignored.

    Thanks,
    Hirokazu Takahashi.



