TCP interactive data flow - TCP-IP

Thread: TCP interactive data flow

  1. Re: TCP interactive data flow

    On Jul 15, 1:18 am, Rick Jones wrote:
    > Sru...@gmail.com wrote:
    > > Aha, so Nagle's algorithm only kicks in when the packet to be sent is very
    > > small

    >
    > For some definition of very small, usually < the TCP MSS (Maximum
    > Segment Size). In broad handwaving terms, Nagle (should) work like
    > this:
    >


    I'm not sure I understand what you were trying to say here.





    > 1) Is this send() by the user, plus any queued, untransmitted data, >=
    > MSS? If yes, then transmit the data now, modulo constraints like
    > congestion or receiver window. If no, go to question 2.
    >
    > 2) Is the connection otherwise "idle?" That is, is there no
    > transmitted but not yet ACKnowledged data outstanding on the
    > connection? If yes, transmit the data now, modulo constraints like
    > congestion or receiver window. If no, go to 3.
    >
    > 3) Queue the data until:
    > a) The application provides enough data to get >= the MSS
    > b) The remote ACK's the currently unACKed data
    > c) The retransmission timer for currently unACKed data (if any)
    > expires and there is room for (some of) the queued data in the
    > segment to be retransmitted.
    >

    So basically the size limit is MSS, where anything smaller is buffered
    until there are no unacknowledged data? But why is MSS the limit? MSS
    can be greater than 1000 bytes, which in my opinion is not a tinygram.
    So why handle packets with 1 byte of data the same as packets of say
    400 bytes of data?


  2. Re: TCP interactive data flow

    On Jul 14, 6:11 pm, Sru...@gmail.com wrote:

    > So basically the size limit is MSS, where anything smaller is buffered
    > until there are no unacknowledged data? But why is MSS the limit? MSS
    > can be greater than 1000 bytes, which in my opinion is not a tinygram.
    > So why handle packets with 1 byte of data the same as packets of say
    > 400 bytes of data?


    Because in either case waiting is more efficient than sending. If you
    have at least one MSS, you are going to send the same packet no matter
    what.

    Also, if you send at even one byte less than the MSS, you can get
    repeatable degenerate behavior. For example, suppose the MSS is 768
    bytes. Suppose an application has a huge amount of data to send, but
    chooses to send it in 3,838 byte chunks (it has to use some chunk
    size, right?). You can send 4 768-byte chunks immediately, and you
    have 766 bytes left over. The application is about to call 'send'
    again. Which is better? To wait a split second and send a full segment?
    Or to repeatedly and inefficiently send unfull segments with no
    possible application workaround? (Since the app doesn't know the
    MSS.)
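
    To make the arithmetic concrete, here is a toy C sketch of that
    example (the 768-byte MSS and 3,838-byte writes are just the numbers
    used above, and it assumes a bulk transfer where there is always
    unACKed data in flight, so leftovers really do get coalesced):

    #include <stdio.h>

    /* Toy model: MSS of 768 bytes, application writes of 3,838 bytes.
     * Without coalescing, every write would put a 766-byte "runt" on
     * the wire; with Nagle-style coalescing the leftovers are merged
     * into later full segments. */
    int main(void)
    {
        const unsigned mss = 768, chunk = 3838, writes = 10;
        unsigned full = 0, queued = 0;

        for (unsigned i = 0; i < writes; i++) {
            unsigned avail = queued + chunk;
            full  += avail / mss;   /* full segments sent             */
            queued = avail % mss;   /* leftover carried to next write */
        }

        printf("coalescing:    %u full segments, %u bytes still queued\n",
               full, queued);
        printf("no coalescing: %u full segments plus %u sub-MSS runts\n",
               (chunk / mss) * writes, writes);
        return 0;
    }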

    DS

  3. Re: TCP interactive data flow

    On Jul 14, 8:11 pm, Sru...@gmail.com wrote:

    > So basically the size limit is MSS, where anything smaller is buffered
    > until there are no unacknowledged data? But why is MSS the limit? MSS
    > can be greater than 1000 bytes, which in my opinion is not a tinygram.
    > So why handle packets with 1 byte of data the same as packets of say
    > 400 bytes of data?



    You keep overanalyzing this. With Nagle, data is buffered if there's
    less than a packet's worth (MSS) to send, and there is already sent
    data waiting for an acknowledgement from the other end. The idea is
    to transmit as few packets as possible by making them as large as
    possible (which results in the most efficient utilization of the
    network), while still keeping interactive traffic prompt. Thus the
    maximum amount of buffering time is about one round trip plus the
    delayed ACK timeout (at most half a second; 200ms for most stacks).

    But the question is really the opposite of yours - why *not* a full
    packet (MSS)? It's obvious why you'd not want Nagle to buffer more
    than a packet's (MSS) worth of data (because a full packet is actually
    the goal of Nagle, there's nothing left to accomplish but to transmit
    the thing). But what would you gain from capping the buffering at
    some lower limit? Remembering that it's for a rather limited time
    interval. And just how many applications would actually be better off
    because only (say) 200 bytes got buffered, rather than 1500?

  4. Re: TCP interactive data flow

    Srubys@gmail.com wrote:
    > So basically the size limit is MSS, where anything smaller is
    > buffered until there are no unacknowledged data? But why is MSS the
    > limit? MSS can be greater than 1000 bytes, which in my opinion is
    > not a tinygram. So why handle packets with 1 byte of data the same
    > as packets of say 400 bytes of data?


    If you go back in time to the initial Nagle paper/RFC, MSSes were
    "typically" in the 536-byte range.

    The MSS is the "best" TCP can do for the ratio of data to data+headers.

    I'm not sure about your question wrt 1 byte vs 400 bytes. Are you
    asking why the Nagle limit isn't based on a constant rather than the
    MSS? In some stacks, IIRC, one can configure the value against which
    the user's send is compared. It generally defaults to the MSS for the
    connection. And yes, as MTUs and thus MSSes increase in size that
    does start to look a little, well, odd...

    rick jones
    --
    firebug n, the idiot who tosses a lit cigarette out his car window
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  5. Re: TCP interactive data flow

    robertwessel2@yahoo.com wrote:
    > But the question is really the opposite of yours - why *not* a full
    > packet (MSS)?


    Interestingly enough, TCP stacks trying to make use of TSO in the NIC
    (Transport/TCP Segmentation Offload) have just that issue - when/if to
    wait until there is even more than one MSS-worth of data before
    shipping data down the stack.

    rick jones
    --
    The computing industry isn't as much a game of "Follow The Leader" as
    it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
    - Rick Jones
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  6. Re: TCP interactive data flow

    > But what would you gain from capping the buffering at
    > some lower limit? Remembering that it's for a rather limited time
    > interval. And just how many applications would actually be better off
    > because only (say) 200 bytes got buffered, rather than 1500?


    not much I presume, but still, communication between two apps where at
    least one of them buffered 200 bytes instead of MSS-1 would be just a
    wee faster, provided this app ( one that buffers only up to 200
    bytes ) would send lots of data of size greater than 200 but smaller
    than MSS.


    thank you all for your help

    kind regards



  7. Re: TCP interactive data flow

    Srubys@gmail.com wrote:
    > > But what would you gain from capping the buffering at some lower
    > > limit? Remembering that it's for a rather limited time interval.
    > > And just how many applications would actually be better off
    > > because only (say) 200 bytes got buffered, rather than 1500?


    > not much I presume, but still, communication between two apps where at
    > least one of them buffered 200 bytes instead of MSS-1 would be just a
    > wee faster, provided this app ( one that buffers only up to 200
    > bytes ) would send lots of data of size greater than 200 but smaller
    > than MSS.


    Not necessarily. Here we have an example of something sending 200
    bytes at a time, leaving Nagle enabled, and then that same 200 byte
    send, with Nagle disabled (the nodelay case)

    manny:~# netperf -H moe -c -C -- -m 200
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

     87380  16384    200    10.02       941.38   19.62    12.84    6.828   4.469
    manny:~# netperf -H moe -c -C -- -m 200 -D
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET : nodelay
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

     87380  16384    200    10.00       318.76   24.99    22.68   25.687  23.311
    manny:~#

    Notice that in the nodelay (Nagle off) case there is a significantly
    higher demand placed on the CPU of the system - in this case a
    four-core system, which is why the CPU util caps at 25%, since a
    single TCP connection will not (generally) make use of the services
    of more than one core. The increase is between 4x and 6x CPU consumed
    per KB transferred.

    rick jones
    --
    portable adj, code that compiles under more than one compiler
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  8. Re: TCP interactive data flow

    Rick Jones wrote:
    > Not necessarily. Here we have an example of something sending 200
    > bytes at a time, leaving Nagle enabled, and then that same 200 byte
    > send, with Nagle disabled (the nodelay case)


    Those numbers were for a unidirectional test over a GbE LAN. If the
    test is request/response then we start having "races" between
    standalone ACK timers, RTT's and how many requests or responses will
    be put into the connection at one time by the application. What
    follows is the ./configure --enable-burst mode of netperf with a
    TCP_RR test and a 200 byte request/response size. Again first is with
    defaults, second is with nagle disabled. I've stripped the socket
    buffer, request/response size and time columns to better fit in 80
    columns:

    manny:~# for i in 0 1 2 3 4 5 6 7 8 9 10; do netperf $HDR -t TCP_RR -H moe -c -C -B "burst $i" -- -r 200 -b $i; HDR="-P 0"; done
    TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET : first burst 0

    Trans.    CPU     CPU      S.dem    S.dem
    Rate      local   remote   local    remote
    per sec   % S     % S      us/Tr    us/Tr

    8657.38 3.61 3.47 16.681 16.044
    9247.23 3.21 3.57 13.882 15.463 burst 1
    10324.20 4.69 4.27 18.152 16.550 burst 2
    11371.37 4.08 4.31 14.340 15.150 burst 3
    13726.78 2.51 3.03 7.305 8.823 burst 4
    16007.27 4.82 8.12 12.052 20.283 burst 5
    18231.57 3.30 3.43 7.230 7.529 burst 6
    20235.90 2.98 3.01 5.893 5.950 burst 7
    22214.26 3.99 3.24 7.184 5.837 burst 8
    24002.79 4.00 2.99 6.663 4.984 burst 9
    25778.28 4.46 3.58 6.918 5.562 burst 10
    ....
    67198.41 7.11 6.29 4.229 3.745 burst 20
    98375.44 9.76 9.00 3.967 3.659 burst 30
    132360.98 11.86 12.00 3.583 3.627 burst 40
    173646.81 15.43 14.87 3.554 3.424 burst 50
    204709.83 18.38 17.15 3.591 3.351 burst 60
    235860.77 20.81 19.94 3.529 3.382 burst 70

    manny:~# HDR="-P 1";for i in 0 1 2 3 4 5 6 7 8 9 10; do netperf $HDR -t TCP_RR -H moe -c -C -B "burst $i" -- -r 200 -b $i -D; HDR="-P 0"; done
    TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET : nodelay : first burst 0

    Trans.    CPU     CPU      S.dem    S.dem
    Rate      local   remote   local    remote
    per sec   % S     % S      us/Tr    us/Tr

    8523.55 3.78 3.52 17.720 16.509
    17714.38 5.16 4.94 11.652 11.161 burst 1
    18660.94 5.92 5.59 12.697 11.978 burst 2
    27373.66 8.78 8.53 12.828 12.462 burst 3
    34303.27 10.22 10.67 11.914 12.436 burst 4
    41652.40 11.34 10.39 10.891 9.973 burst 5
    42222.80 12.43 12.81 11.778 12.135 burst 6
    45601.75 13.03 12.76 11.430 11.196 burst 7
    48737.80 13.58 13.47 11.142 11.052 burst 8
    52505.19 14.43 14.25 10.994 10.858 burst 9
    56406.20 14.95 14.40 10.602 10.209 burst 10
    ....
    101401.90 24.74 24.35 9.761 9.605 burst 20
    102946.48 24.99 24.75 9.711 9.619 burst 30
    104170.04 24.99 24.72 9.595 9.493 burst 40

    I stopped at 40 in the Nagle disabled case because it was pretty clear
    things had maxed-out - again one of the four cores was saturated.

    So, for smaller numbers of transactions outstanding at one time, the
    transaction rate for the RR test is higher with Nagle disabled, but as
    you increase the concurrent transactions, having Nagle enabled allows
    a higher transaction rate, because it lets several transactions be
    carried in a single TCP segment. As before, this is reflected in the
    lower service demand figures for the Nagle-enabled case.

    rick jones
    --
    The computing industry isn't as much a game of "Follow The Leader" as
    it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
    - Rick Jones
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  9. Re: TCP interactive data flow

    On Jul 15, 4:16 pm, Sru...@gmail.com wrote:

    > not much I presume, but still, communication between two apps where at
    > least one of them buffered 200 bytes instead of MSS-1 would be just a
    > wee faster, provided this app ( one that buffers only up to 200
    > bytes ) would send lots of data of size greater than 200 but smaller
    > than MSS.


    No. Because such an application would be sending 200 bytes when no
    unacknowledged data is pending and so wouldn't trigger Nagle. The data
    replies from the other side would have ACKs piggy-backed on them, so
    Nagle would never delay any transmissions.

    Most people who criticize Nagle don't understand it.

    DS

  10. Re: TCP interactive data flow

    On Jul 16, 5:01 am, David Schwartz wrote:
    > On Jul 15, 4:16 pm, Sru...@gmail.com wrote:
    >
    > > not much I presume, but still, communication between two apps where at
    > > least one of them buffered 200 bytes instead of MSS-1 would be just a
    > > wee faster, provided this app ( one that buffers only up to 200
    > > bytes ) would send lots of data of size greater than 200 but smaller
    > > than MSS.

    >
    > No. Because such an application would be sending 200 bytes when no
    > unacknowledged data is pending and so wouldn't trigger Nagle. The data
    > replies from the other side would have ACKs piggy-backed on them, so
    > Nagle would never delay any transmissions.
    >
    > Most people who criticize Nagle don't understand it.


    I was going to start a new thread on this, but after searching for
    information on nagling and delayed acks, found this thread and decided
    to hang my questions on here.

    I work with some application architecture where timely notification of
    data is far more important than max throughput (or at least, we do not
    have any issues with max throughput).

    The apps in question generally establish a connection with third party
    servers over whose software we have no control of course. However,
    it's not helpful to think in terms of client / server because
    communication flows both ways. IOW either can initiate flow of
    information at any time. That's irrelevant anyway. Transmission is
    usually in small chunks of data that are likely to be sporadic and
    almost always below the MSS. A typical flow (real data, not tcp/ip
    acks) may look something like:

    OurApp -> submit request 112 bytes -> 3rdParty
    3rdParty -> acknowledge request 89 bytes -> OurApp
    3rdParty -> dataX 140 bytes -> OurApp

    The problem came in because 3rdParty was implementing nagling, and
    OurApp was delaying tcp/ip acks, so we were getting delays between
    "acknowledge request" and "dataX" of approximately 200ms. I have
    WireShark captures if anyone really cares, but basically with tcp/ip
    (simplified) included it looked something like this.

    OurApp -> PSH "submit request" 112 bytes -> 3rdParty
    3rdParty -> immediate ACK
    3rdParty -> 40ms later PSH "acknowledge request" 89 bytes -> OurApp
    (at this point 3rdParty has dataX available virtually instantly after
    the above is sent, and I assume it tries to send, however due to
    nagling, it is waiting for the ACK to the previous PSH).
    OurApp -> delays ack: 200ms later ACK
    3rdParty -> immediate PSH "dataX" 140 bytes -> OurApp

    From our point of view, that delay of 200ms to receive "dataX" was
    unacceptable.

    I apologise in advance for the great simplification of what is
    happening, or if I misunderstood the situation, but with a recent
    upgrade to the 3rdParty server software, nagling has been disabled and
    we see immediate resolution of this problem with so far no negative
    consequences.

    We are now seeing similar latency from a different 3rd party, and are
    about to start investigations. In the previous example, disabling
    delayed acks on the boxes OurApp ran on pretty much resolved the
    problem. We saw a slight round trip latency because 3rdParty was still
    waiting for the tcp/ip ACK, but it was acceptable. The better solution
    was disabling nagling on their side since "dataX" is sent immediately,
    incurring no round trip.

    So if the case with the latest problem turns out to be the same, and
    assuming we can't rely on 3rd party2 to disable nagling, what negative
    effects will disabling delayed acks on our boxes have?

    If it were just my application on the box I'd have no concerns, but
    they are production machines we share server real estate with any
    number of other applications. I wouldn't want to degrade their
    performance in some way, say consuming more CPU for example.

    Any corrections to my (mis)understanding are welcome.

  11. Re: TCP interactive data flow

    Mark (newsgroups) wrote:
    > [previous post quoted in full - snipped]

    Gah. I forgot to add, the connection and communication between OurApp
    and 3rdParty is done via an API provided by 3rdParty (lib/dll). We have
    no control over disabling delayed ACKs at the socket level. It would
    have to be at machine level, unless someone knows ways around this.
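
    For reference, if one did have access to the socket descriptor - which
    is exactly what the 3rdParty API prevents here - both behaviours are
    per-socket options. A minimal C sketch (TCP_NODELAY is what the
    3rdParty side would set to disable Nagle; TCP_QUICKACK is Linux-specific
    and is not sticky, so it is typically re-armed after each receive):

    #include <netinet/in.h>
    #include <netinet/tcp.h>   /* TCP_NODELAY, and on Linux TCP_QUICKACK */
    #include <sys/socket.h>
    #include <stdio.h>

    /* 'fd' is assumed to be an already-connected TCP socket. */
    int tune_for_latency(int fd)
    {
        int one = 1;

        /* Disable Nagle: small sends go out immediately even while
         * earlier data is still unACKed. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
            perror("TCP_NODELAY");
            return -1;
        }

    #ifdef TCP_QUICKACK
        /* Linux only: ask for immediate rather than delayed ACKs. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one)) < 0)
            perror("TCP_QUICKACK");
    #endif
        return 0;
    }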

  12. Re: TCP interactive data flow

    On Jul 16, 11:55 am, "Mark (newsgroups)"
    wrote:
    > So if the case with the latest problem turns out to be the same, and
    > assuming we can't rely on 3rd party2 to disable nagling, what negative
    > effects will disabling delayed acks on our boxes have?
    >
    > If it were just my application on the box I'd have no concerns, but
    > they are production machines we share server real estate with any
    > number of other applications. I wouldn't want to degrade their
    > performance in some way, say consuming more CPU for example.



    Turning off delayed acks will (usually slightly) increase the CPU load
    on both ends of the conversation(s), and will result in more send
    traffic from your host. It will usually help response time, at some
    cost in bandwidth.

  13. Re: TCP interactive data flow

    Here is my "boilerplate" Nagle discussion, performance discussion at
    the end:

    In broad terms, whenever an application does a send() call, the logic
    of the Nagle algorithm is supposed to go something like this:

    1) Is the quantity of data in this send, plus any queued, unsent data,
    greater than the MSS (Maximum Segment Size) for this connection? If
    yes, send the data in the user's send now (modulo any other
    constraints such as receiver's advertised window and the TCP
    congestion window). If no, go to 2.

    2) Is the connection to the remote otherwise idle? That is, is there
    no unACKed data outstanding on the network. If yes, send the data in
    the user's send now. If no, queue the data and wait. Either the
    application will continue to call send() with enough data to get to a
    full MSS-worth of data, or the remote will ACK all the currently sent,
    unACKed data, or our retransmission timer will expire.
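
    The two questions above amount to a small decision function. A minimal
    C sketch of just that logic (illustrative only - the names are invented
    and the window/congestion checks are left out):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* 'queued_unsent' is data buffered but not yet transmitted;
     * 'unacked' is data transmitted but not yet ACKed. */
    bool nagle_send_now(size_t send_len, size_t queued_unsent,
                        size_t unacked, size_t mss)
    {
        if (send_len + queued_unsent >= mss)
            return true;   /* 1) a full segment's worth - send now    */
        if (unacked == 0)
            return true;   /* 2) connection otherwise idle - send now */
        return false;      /*    otherwise queue the data and wait    */
    }

    int main(void)
    {
        const size_t mss = 1460;
        /* First small write on an idle connection: goes out now. */
        printf("%d\n", nagle_send_now(100, 0, 0, mss));   /* prints 1 */
        /* Second small write while the first is still unACKed: queued. */
        printf("%d\n", nagle_send_now(100, 0, 100, mss)); /* prints 0 */
        return 0;
    }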

    Now, where applications run into trouble is when they have what might
    be described as "write, write, read" behaviour, where they present
    logically associated data to the transport in separate 'send' calls
    and those sends are typically less than the MSS for the connection.
    It isn't so much that they run afoul of Nagle as they run into issues
    with the interaction of Nagle and the other heuristics operating on
    the remote. In particular, the delayed ACK heuristics.

    When a receiving TCP is deciding whether or not to send an ACK back to
    the sender, in broad handwaving terms it goes through logic similar to
    this:

    a) is there data being sent back to the sender? if yes, piggy-back the
    ACK on the data segment.

    b) is there a window update being sent back to the sender? if yes,
    piggy-back the ACK on the window update.

    c) has the standalone ACK timer expired?

    Window updates are generally triggered by the following heuristics:

    i) would the window update be for a non-trivial fraction of the window
    - typically somewhere at or above 1/4 the window, that is, has the
    application "consumed" at least that much data? if yes, send a
    window update. if no, check ii.

    ii) would the window update be for at least 2*MSS worth of data - that
    is, has the application "consumed" at least 2*MSS? if yes, send a
    window update; if no, wait.

    Now, going back to that write, write, read application, on the sending
    side, the first write will be transmitted by TCP via logic rule 2 -
    the connection is otherwise idle. However, the second small send will
    be delayed as there is at that point unACKnowledged data outstanding
    on the connection.

    At the receiver, that small TCP segment will arrive and will be passed
    to the application. The application does not have the entire app-level
    message, so it will not send a reply (data to TCP) back. The typical
    TCP window is much much larger than the MSS, so no window update would
    be triggered by heuristic i. The data just arrived is < 2*MSS, so no
    window update from heuristic ii. Since there is no window update, no
    ACK is sent by heuristic b.

    So, that leaves heuristic c - the standalone ACK timer. That ranges
    anywhere between 50 and 200 milliseconds depending on the TCP stack in
    use.
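
    Again purely as an illustration of the heuristics above (the names are
    invented, and real stacks have more cases), the receiver side boils
    down to something like:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Returns true if an ACK can go out now (piggy-backed on data or on a
     * window update); false means waiting on the standalone ACK timer. */
    bool ack_without_timer(bool sending_data_back,
                           size_t consumed_since_update,
                           size_t window, size_t mss)
    {
        if (sending_data_back)
            return true;                           /* a) ride on data     */
        if (consumed_since_update >= window / 4)
            return true;                           /* b+i) window update  */
        if (consumed_since_update >= 2 * mss)
            return true;                           /* b+ii) window update */
        return false;                              /* c) wait for timer   */
    }

    int main(void)
    {
        /* The write, write, read case: a small segment arrives, the app
         * has nothing to send back yet, the window is 64KB and the MSS
         * 1460 - so nothing but the 50-200ms timer will generate an ACK. */
        printf("%d\n", ack_without_timer(false, 100, 65536, 1460)); /* 0 */
        return 0;
    }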

    If you've read this far, now we can take a look at the effect of
    various things touted as "fixes" to applications experiencing this
    interaction. We take as our example a client-server application where
    both the client and the server are implemented with a write of a small
    application header, followed by application data. First, the
    "default" case which is with Nagle enabled (TCP_NODELAY _NOT_ set) and
    with standard ACK behaviour:

    Client                        Server
    Req Header               ->
                             <-   Standalone ACK after N ms
    Req Data                 ->
                             <-   Possible standalone ACK
                             <-   Rsp Header
    Standalone ACK           ->
                             <-   Rsp Data
    Possible standalone ACK  ->


    For two "messages" we end-up with at least six segments on the wire.
    The possible standalone ACKs will depend on whether the server's
    response time, or client's think time is longer than the standalone
    ACK interval on their respective sides. Now, if TCP_NODELAY is set we
    see:


    Client                        Server
    Req Header               ->
    Req Data                 ->
                             <-   Possible Standalone ACK after N ms
                             <-   Rsp Header
                             <-   Rsp Data
    Possible Standalone ACK  ->

    In theory, we are down to four segments on the wire, which seems good,
    but frankly we can do better. First though, consider what happens
    when someone disables delayed ACKs:

    Client                        Server
    Req Header               ->
                             <-   Immediate Standalone ACK
    Req Data                 ->
                             <-   Immediate Standalone ACK
                             <-   Rsp Header
    Immediate Standalone ACK ->
                             <-   Rsp Data
    Immediate Standalone ACK ->

    Now we definitely see 8 segments on the wire. It will also be that way
    if both TCP_NODELAY is set and delayed ACKs are disabled.

    How about if the application did the "right" thing in the first place?
    That is, sent the logically associated data at the same time:


    Client                        Server
    Request                  ->
                             <-   Possible Standalone ACK
                             <-   Response
    Possible Standalone ACK  ->

    We are down to two segments on the wire.

    For "small" packets, the CPU cost is about the same regardless of data
    or ACK. This means that the application which is making the propper
    gathering send call will spend far fewer CPU cycles in the networking
    stack.
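
    One way for an application to make that "proper" gathering send without
    first copying everything into one buffer is writev() (sendmsg(), or
    TCP_CORK on Linux, are alternatives). A minimal sketch - the header and
    payload layout here are made up, and a real caller would still need to
    handle short writes:

    #include <sys/types.h>
    #include <sys/uio.h>      /* writev */

    /* Hand the logically-associated header and payload to TCP in ONE
     * call, so they can travel in one segment instead of triggering the
     * write, write, read interaction described above.  'fd' is assumed
     * to be a connected TCP socket. */
    ssize_t send_request(int fd, const void *hdr, size_t hdr_len,
                         const void *payload, size_t payload_len)
    {
        struct iovec iov[2];

        iov[0].iov_base = (void *)hdr;
        iov[0].iov_len  = hdr_len;
        iov[1].iov_base = (void *)payload;
        iov[1].iov_len  = payload_len;

        /* One system call, one chunk of data as far as Nagle is concerned. */
        return writev(fd, iov, 2);
    }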


    --
    denial, anger, bargaining, depression, acceptance, rebirth...
    where do you want to be today?
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  14. Re: TCP interactive data flow

    Rick Jones wrote:
    > [boilerplate Nagle / delayed-ACK discussion quoted in full - snipped]

    A pretty complete description of the problem, and seems to be exactly as
    I understood it. Thanks for that.

  15. Re: TCP interactive data flow

    robertwessel2@yahoo.com wrote:
    > On Jul 16, 11:55 am, "Mark (newsgroups)"
    > wrote:
    >> So if the case with the latest problem turns out to be the same, and
    >> assuming we can't rely on 3rd party2 to disable nagling, what negative
    >> effects will disabling delayed acks on our boxes have?
    >>
    >> If it were just my application on the box I'd have no concerns, but
    >> they are production machines we share server real estate with any
    >> number of other applications. I wouldn't want to degrade their
    >> performance in some way, say consuming more CPU for example.

    >
    >
    > Turning off delayed acks will (usually slightly) increase the CPU load
    > on both ends of the conversation(s), and will result in more send
    > traffic from your host. It will usually help response time, at some
    > cost in bandwidth.


    Thank you. I guess the real answer is to look at the performance of the
    boxes in question and see how much give we have. It's not trivial since
    as I said, other applications share server real estate. But it seems
    that I have understood the problem correctly.

  16. Re: TCP interactive data flow

    "Mark (newsgroups)" wrote:
    > A pretty complete description of the problem, and seems to be
    > exactly as I understood it. Thanks for that.


    My pleasure. You should be able to plug in your message sizes and the
    sizes for a standalone TCP ACK segment (plus IP header and link-layer
    header) and arrive at an estimate for the differences in maximum
    network bandwidth achievable.

    rick jones
    --
    Wisdom Teeth are impacted, people are affected by the effects of events.
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  17. Re: TCP interactive data flow

    Rick Jones wrote:
    > "Mark (newsgroups)" wrote:
    >> A pretty complete description of the problem, and seems to be
    >> exactly as I understood it. Thanks for that.

    >
    > My pleasure. You should be able to plug-in your message sizes and the
    > sizes for a standalone TCP ACK segment (plus IP header and link-layer
    > header) and arrive at an estimate for the differences in maximum
    > network bandwidth achievable.


    Interesting in theory, but not applicable to me in practice. I have a
    few options:

    1) Leave things as they are - not really acceptable, we have clients
    complaining about these 100-200ms latencies.

    2) Try turning off delayed ack on a production machine - possible but
    I'm very worried about the negative impacts on other applications

    3) Hope the 3rd party comes out with a solution with nagling disabled on
    their side.

    As I mentioned, I have no control at a socket level on the tcp/ip
    communication since this is done through their own provided API.

  18. Re: TCP interactive data flow

    > Interesting in theory, but not applicable to me in practice. I have a
    > few options


    > 1) Leave things as they are - not really acceptable, we have clients
    > complaining about these 100-200ms latencies.


    > 2) Try turning off delayed ack on a production machine - possible
    > but I'm very worried about the negative impacts on other
    > applications


    That was one of the reasons for doing the packet size overhead
    calculation. If we are talking about an "ethernet like" thing, there
    is 14 bytes worth of link-layer header, 20 bytes of IPv4 header and
    then, assuming timestamps are on in TCP, 32 bytes of TCP header. So,
    headers for any packet on the wire will be at least 14+20+32 or 66
    bytes. You can then use your known application-level message and ACK
    sizes. That could tell you the effect at the network bandwidth level.
    Effect at the CPU util level would require gathering some fundamental
    performance figures for your system(s) and stack(s) with something
    like netperf. Perhaps using a test system if you have one.
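
    A back-of-the-envelope version of that calculation, using the 66-byte
    header figure above and the message sizes from the earlier example
    flow (the packet counts are only illustrative - the exact number of
    standalone ACKs depends on timing):

    #include <stdio.h>

    enum { HDR = 14 + 20 + 32 };   /* link + IPv4 + TCP w/ timestamps */

    static unsigned data_pkt(unsigned payload) { return HDR + payload; }
    static unsigned bare_ack(void)             { return HDR; }

    int main(void)
    {
        /* request (112B), app-level acknowledge (89B), dataX (140B) */
        unsigned msgs = data_pkt(112) + data_pkt(89) + data_pkt(140);

        /* delayed ACKs on: assume roughly one standalone ACK survives   */
        unsigned delayed   = msgs + 1 * bare_ack();
        /* delayed ACKs off: roughly one standalone ACK per data segment */
        unsigned immediate = msgs + 3 * bare_ack();

        printf("with delayed ACKs:    ~%u bytes on the wire\n", delayed);
        printf("without delayed ACKs: ~%u bytes on the wire\n", immediate);
        return 0;
    }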

    > 3) Hope the 3rd party comes out with a solution with nagling
    > disabled on their side.


    An application-layer ACK implies an application-layer retransmission
    mechanism. Is there one? Any idea what those timers happen to be and
    whether the application can implement an application-layer delayed ACK?
    Then it could piggy-back its ACKs on replies and avoid the Nagle bit.

    rick jones
    --
    firebug n, the idiot who tosses a lit cigarette out his car window
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  19. Re: TCP interactive data flow

    On Jul 16, 2:29 pm, "Mark (newsgroups)"
    wrote:

    > 3) Hope the 3rd party comes out with a solution with nagling disabled on
    > their side.


    You left out:

    4) Hope the 3rd party comes out with a *proper* solution on their
    side, sending all the data in a single write call like they're
    supposed to.

    5) Disable Nagle on your side and dribble data in the delay
    interval to give your ACKs something to piggyback on.

    DS

  20. Re: TCP interactive data flow

    On Jul 17, 2:50 am, David Schwartz wrote:
    > On Jul 16, 2:29 pm, "Mark (newsgroups)"
    > wrote:
    >
    > > 3) Hope the 3rd party comes out with a solution with nagling disabled on
    > > their side.

    >
    > You left out:
    >
    > 4) Hope the 3rd party comes out with a *proper* solution on their
    > side, sending all the data in a single write call like they're
    > supposed to.


    Thanks, but you lack understanding of the problem space. I'm reluctant
    to actually say what it is we're doing due to sensitivities, but
    needless to say you should accept that the situation I described is as
    it is for a reason (I don't mean the nagling, I mean sending the
    "acknowledge" and "dataX" in separate write calls).

    > 5) Disabling Nagle on your side and dribbling data in the delay
    > interval to give your ACKs something to piggyback on.


    Firstly, I don't think this would solve the problem, since data can be
    sporadic and therefore we'd still see the delay in many cases, which is
    not a solution. Secondly, as I mentioned, I do not have control over
    the underlying tcp/ip communication, which is done via the API they
    provide.
