Scatter-gather I/O and copies - Networking

  1. Scatter-gather I/O and copies

    Hello,

    (As far as I can tell, the answer to my question is OS-dependent, and
    even driver-dependent.)

    Consider two buffers A and B.

    char A[16];
    char B[1316];

    I want to send A+B (A and B concatenated) in a single UDP datagram.
    (I have to do it 500-5000 times per second.)

    One obvious(?) solution is to do:

    char C[sizeof A + sizeof B];
    memcpy(C, A, sizeof A);
    memcpy(C+sizeof A, B, sizeof B);
    send(sock, C, sizeof C, 0);

    But the copies feel unnecessary.
    (I could save one copy by making A as big as C, but still.)

    I could also use scatter-gather I/O with sendmsg().

    I'm afraid the kernel (or Ethernet driver) will just end up copying the
    two buffers to concatenate them anyway. Are some Ethernet drivers
    "smart" enough that they don't perform any copy?

    (I use Linux 2.6.20 and Fast Ethernet or GigE devices.)

    Regards.

  2. Re: Scatter-gather I/O and copies

    On 07/31/2007 03:45 PM, Spoon wrote:
    > Hello,
    >
    > (As far as I can tell, the answer to my question is OS-dependent, and
    > even driver-dependent.)
    >
    > Consider two buffers A and B.
    >
    > char A[16];
    > char B[1316];
    >
    > I want to send A+B (A and B concatenated) in a single UDP datagram.
    > (I have to do it 500-5000 times per second.)
    >
    > One obvious(?) solution is to do:
    >
    > char C[sizeof A + sizeof B];
    > memcpy(C, A, sizeof A);
    > memcpy(C+sizeof A, B, sizeof B);
    > send(sock, C, sizeof C, 0);
    >
    > But the copies feel unnecessary.
    > (I could save one copy by making A as big as C, but still.)
    >
    > I could also use scatter-gather I/O with sendmsg().
    >
    > I'm afraid the kernel (or Ethernet driver) will just end up copying the
    > two buffers to concatenate them anyway. Are some Ethernet drivers
    > "smart" enough that they don't perform any copy?
    >
    > (I use Linux 2.6.20 and Fast Ethernet or GigE devices.)
    >
    > Regards.


    Hmm, using a struct comes to mind:

    struct buff {
        char a[16];
        char b[1316];
    };

    Hope that helps!

    --
    Dr Balwinder S "bsd" Dheeman Registered Linux User: #229709
    Anu'z Linux@HOME Machines: #168573, 170593, 259192
    Chandigarh, UT, 160062, India Gentoo, Fedora, Debian/FreeBSD/XP
    Home: http://cto.homelinux.net/~bsd/ Visit: http://counter.li.org/

  3. Re: Scatter-gather I/O and copies

    Spoon wrote:
    > Hello,
    >
    > (As far as I can tell, the answer to my question is OS-dependent, and
    > even driver-dependent.)
    >
    > Consider two buffers A and B.
    >
    > char A[16];
    > char B[1316];
    >
    > I want to send A+B (A and B concatenated) in a single UDP datagram.
    > (I have to do it 500-5000 times per second.)
    >
    > One obvious(?) solution is to do:
    >
    > char C[sizeof A + sizeof B];
    > memcpy(C, A, sizeof A);
    > memcpy(C+sizeof A, B, sizeof B);
    > send(sock, C, sizeof C, 0);
    >
    > But the copies feel unnecessary.
    > (I could save one copy by making A as big as C, but still.)
    >
    > I could also use scatter-gather I/O with sendmsg().
    >
    > I'm afraid the kernel (or Ethernet driver) will just end up copying the
    > two buffers to concatenate them anyway. Are some Ethernet drivers
    > "smart" enough that they don't perform any copy?
    >
    > (I use Linux 2.6.20 and Fast Ethernet or GigE devices.)
    >
    > Regards.


    You will save copying the buffer by using writev(). For that to work
    with UDP, you will have to specify the destination in advance with a
    connect().

    Robert
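
As a rough illustration of the writev() suggestion above, here is a minimal sketch. The function name send_gathered is hypothetical, and the code assumes the datagram socket has already been given a default destination with connect(). Whether the kernel avoids an internal copy is not something this code can show; it only avoids the user-space memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send hdr followed by payload as ONE datagram on a
 * datagram socket that was previously connect()ed to its destination.
 * Returns the number of bytes sent, or -1 with errno set. */
ssize_t send_gathered(int sock, const void *hdr, size_t hdr_len,
                      const void *payload, size_t payload_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && payload != NULL);

    /* iov_base is not const-qualified, but writev() only reads from it. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)payload;
    iov[1].iov_len  = payload_len;

    /* On a datagram socket, the whole vector goes out as a single datagram. */
    return writev(sock, iov, 2);
}
```

With the buffers from the original post, the call would look like send_gathered(sock, A, sizeof A, B, sizeof B). If calling connect() per destination is inconvenient, sendmsg() accepts the destination address on each call.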

  4. Re: Scatter-gather I/O and copies

    Spoon wrote On 07/31/07 06:15,:
    > [...]
    > Consider two buffers A and B.
    >
    > char A[16];
    > char B[1316];
    >
    > I want to send A+B (A and B concatenated) in a single UDP datagram.
    > (I have to do it 500-5000 times per second.)
    > [...]
    > I could also use scatter-gather I/O with sendmsg().
    >
    > I'm afraid the kernel (or Ethernet driver) will just end up copying the
    > two buffers to concatenate them anyway. Are some Ethernet drivers
    > "smart" enough that they don't perform any copy?


    Some network stacks may be smart enough to avoid
    a copy, some may not. But you can be certain of one
    thing: if you copy the data yourself you *will* incur
    a copy no matter how smart the rest of the system is.
    By using scatter/gather, you at least give the stack
    an opportunity to be clever -- it may or may not exploit
    the opportunity you give it, but you haven't already
    closed the door.

    Even if the network stack does copy your buffers
    you may win with scatter/gather by copying the data
    not zero times, but one less than if you'd copied it
    yourself. (For example, a stack that always copies
    outbound data from user space to kernel space will
    copy even a single contiguous buffer; if its gather
    mode just makes those copies from several buffers,
    most of the copying you've done is overhead.)

    Have you measured the throughput you're actually
    getting to see whether it meets your needs? Are you
    off by just a few percent, or by a big factor?

    --
    Eric.Sosman@sun.com

  5. Re: Scatter-gather I/O and copies

    Spoon writes:

    > Hello,
    >
    > (As far as I can tell, the answer to my question is OS-dependent, and
    > even driver-dependent.)
    >
    > Consider two buffers A and B.
    >
    > char A[16];
    > char B[1316];
    >
    > I want to send A+B (A and B concatenated) in a single UDP datagram.
    > (I have to do it 500-5000 times per second.)
    >
    > One obvious(?) solution is to do:


    Are you sure sizeof A + sizeof B is the same as the size of the entire
    memory space encompassed by A and B? Alignment issues? I don't
    know. Sounds risky to me. Ask on comp.lang.c.

    Make it a union, but add another 16 bytes to B and offset all B accesses
    by 16?

    If you use a struct, are the elements guaranteed to be laid out side by
    side, with no padding between the char arrays?

    >
    > char C[sizeof A + sizeof B];
    > memcpy(C, A, sizeof A);
    > memcpy(C+sizeof A, B, sizeof B);
    > send(sock, C, sizeof C, 0);
    >
    > But the copies feel unnecessary.
    > (I could save one copy by making A as big as C, but still.)
    >
    > I could also use scatter-gather I/O with sendmsg().
    >
    > I'm afraid the kernel (or Ethernet driver) will just end up copying
    > the two buffers to concatenate them anyway. Are some Ethernet drivers
    > "smart" enough that they don't perform any copy?
    >
    > (I use Linux 2.6.20 and Fast Ethernet or GigE devices.)
    >
    > Regards.


    --
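
On the struct layout question raised above: arrays of char have an alignment requirement of 1, so mainstream compilers do not pad such a struct, but the C standard does not strictly promise that. A small sketch of how one might check it at compile time instead of guessing (the struct name packet is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: header bytes immediately followed by payload bytes. */
struct packet {
    char a[16];
    char b[1316];
};

/* Fail the build if the compiler inserted padding anywhere.
 * _Static_assert is C11; older compilers would need a different trick. */
_Static_assert(offsetof(struct packet, b) == 16,
               "unexpected padding between a and b");
_Static_assert(sizeof(struct packet) == 16 + 1316,
               "unexpected trailing padding");
```

If both checks pass, a single send() of the whole struct transmits a followed directly by b. Note that this only helps when the payload can be produced inside the struct in the first place; otherwise filling b is itself a copy.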

  6. Re: Scatter-gather I/O and copies

    Eric Sosman wrote:

    > Spoon wrote:
    >> [...]
    >> Consider two buffers A and B.
    >>
    >> char A[16];
    >> char B[1316];
    >>
    >> I want to send A+B (A and B concatenated) in a single UDP datagram.
    >> (I have to do it 500-5000 times per second.)
    >> [...]
    >> I could also use scatter-gather I/O with sendmsg().
    >>
    >> I'm afraid the kernel (or Ethernet driver) will just end up copying the
    >> two buffers to concatenate them anyway. Are some Ethernet drivers
    >> "smart" enough that they don't perform any copy?

    >
    > Some network stacks may be smart enough to avoid
    > a copy, some may not. But you can be certain of one
    > thing: if you copy the data yourself you *will* incur
    > a copy no matter how smart the rest of the system is.
    > By using scatter/gather, you at least give the stack
    > an opportunity to be clever -- it may or may not exploit
    > the opportunity you give it, but you haven't already
    > closed the door.
    >
    > Even if the network stack does copy your buffers
    > you may win with scatter/gather by copying the data
    > not zero times, but one less than if you'd copied it
    > yourself. (For example, a stack that always copies
    > outbound data from user space to kernel space will
    > copy even a single contiguous buffer; if its gather
    > mode just makes those copies from several buffers,
    > most of the copying you've done is overhead.)
    >
    > Have you measured the throughput you're actually
    > getting to see whether it meets your needs? Are you
    > off by just a few percent, or by a big factor?


    I'm still in the design phase, I haven't written much code yet.

    I left an important part out:

    Each "A+B" datagram might be sent to different destinations.
    (4-8 destinations seems like a reasonable upper bound.)
    Some destinations only want the first 12 bytes of A, while
    other destinations want all 16 bytes of A.

    This is why I started considering sendmsg() as an option.

    Regards.
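
The per-destination requirement described above could be sketched with sendmsg() roughly as follows. The struct dest record and the function name send_to_dest are assumptions made for this example, not anything from the thread; the socket is assumed to be an unconnected IPv4 UDP socket.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical per-destination record. */
struct dest {
    struct sockaddr_in addr;  /* where to send */
    size_t hdr_len;           /* how much of A this peer wants: 12 or 16 */
};

/* Send the first d->hdr_len bytes of A, followed by all of B, as one
 * datagram to d->addr. Neither buffer is copied in user space.
 * Returns the number of bytes sent, or -1 with errno set. */
ssize_t send_to_dest(int sock, const struct dest *d,
                     const char *A, const char *B, size_t b_len)
{
    struct iovec iov[2];
    struct msghdr msg;

    assert(d->hdr_len <= 16);

    /* sendmsg() only reads from the buffers, so dropping const is safe. */
    iov[0].iov_base = (void *)A;
    iov[0].iov_len  = d->hdr_len;
    iov[1].iov_base = (void *)B;
    iov[1].iov_len  = b_len;

    memset(&msg, 0, sizeof msg);
    msg.msg_name    = (void *)&d->addr;  /* destination for this call only */
    msg.msg_namelen = sizeof d->addr;
    msg.msg_iov     = iov;
    msg.msg_iovlen  = 2;

    return sendmsg(sock, &msg, 0);
}
```

Looping over 4-8 struct dest entries and calling send_to_dest() for each reuses the same A and B buffers every time; only the length of the first iovec element and the address change.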

  7. Re: Scatter-gather I/O and copies

    On Tue, 31 Jul 2007 16:57:52 +0200, Spoon wrote:

    > Eric Sosman wrote:
    >
    >> Spoon wrote:
    >>> [...]
    >>> Consider two buffers A and B.
    >>>
    >>> char A[16];
    >>> char B[1316];
    >>>
    >>> I want to send A+B (A and B concatenated) in a single UDP datagram. (I
    >>> have to do it 500-5000 times per second.) [...]
    >>> I could also use scatter-gather I/O with sendmsg().


    Note that if you _really_ care, you _might_ be able to get the best
    performance (on Linux) by using a single buffer and vmsplice() with the
    flags to do the page flipping ... but that would mean using at least a
    page per message, AIUI.
    But, to be frank, I would be shocked if you got a well-performing and
    stable application if your main data structure is a char array.

    > I'm still in the design phase, I haven't written much code yet.
    >
    > I left an important part out:
    >
    > Each "A+B" datagram might be sent to different destinations. (4-8
    > destinations seems like a reasonable upper bound.) Some destinations
    > only want the first 12 bytes of A, while other destinations want all 16
    > bytes of A.
    >
    > This is why I started considering sendmsg() as an option.


    This doesn't alter much: doing a memcpy() of exactly 12 or 16 bytes
    will be so fast it's almost guaranteed not to be the problem in a real
    application.

    --
    James Antill -- james@and.org
    C String APIs use too much memory? ustr: length, ref count, size and
    read-only/fixed. Ave. 44% overhead over strdup(), for 0-20B strings
    http://www.and.org/ustr/
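
One possible reading of the "just memcpy() 12 or 16 bytes" advice, combined with the earlier idea of making the buffer large enough for both parts, is to keep the payload in a single buffer with 16 bytes of headroom in front and copy only the header before each send. This is a sketch under that assumption; struct frame, frame_prepare, and the enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HDR_MAX = 16, PAYLOAD_LEN = 1316 };

/* Hypothetical frame: headroom for up to HDR_MAX header bytes, then the
 * payload B, which is assumed to be produced directly at bytes + HDR_MAX. */
struct frame {
    char bytes[HDR_MAX + PAYLOAD_LEN];
};

/* Copy the first hdr_len bytes of A so that they end exactly where the
 * payload begins. Returns a pointer to the start of the contiguous
 * datagram and stores its total length in *out_len. */
const char *frame_prepare(struct frame *f, const char *A,
                          size_t hdr_len, size_t *out_len)
{
    assert(hdr_len <= HDR_MAX);

    /* A 12-byte header starts at offset 4, a 16-byte header at offset 0. */
    char *start = f->bytes + (HDR_MAX - hdr_len);
    memcpy(start, A, hdr_len);          /* the only copy: 12 or 16 bytes */
    *out_len = hdr_len + PAYLOAD_LEN;
    return start;
}
```

The returned pointer and length can be handed to a plain send() or sendto(), so the 1316-byte payload is never copied in user space and only the small header is rewritten per destination.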
