Linux TCP - unexpected retransmissions - Networking

  1. Linux TCP - unexpected retransmissions

    This may not be the proper newsgroup, but any help would be greatly
    appreciated.

    We are working on an embedded system that has a number of PowerQUICC
    processors running Linux. During normal operation, the processors
    exchange small messages (< 100 bytes) using TCP. We have a
    response-time requirement of about 100 milliseconds. Sometimes,
    messages between nodes of the system take a long time to arrive
    (e.g., > 200 milliseconds across an Ethernet link), and the response
    time then exceeds our requirement. This latency occurs randomly, at
    different places and on different interface types. We set the
    TCP_NODELAY socket option, tried different settings (the ipv4 options
    under /proc) and wrote test programs to isolate the root cause of the
    latency, with no success.
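    For reference, this is how TCP_NODELAY is normally set on a Linux
    socket. The sketch below is in Python 3, and the helper names are
    hypothetical. TCP_NODELAY only disables Nagle's algorithm on the
    sending side. It does not change how quickly the peer acknowledges
    data, and it does not change when the retransmission timer fires.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle so small writes (< 100 bytes here) go out immediately
    # instead of being held back while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Read the option back to confirm the kernel accepted it."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_nodelay_socket()
    print("TCP_NODELAY enabled:", nodelay_enabled(s))
    s.close()
```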

    We can reproduce the latency with a small application in which two
    PowerQUICC cards randomly send each other bursts of messages across an
    Ethernet link. For this test, we are using the 2.6.16 kernel. We
    captured the traffic on the Ethernet link with a sniffer and found a
    pattern. Sometimes both TCPs send each other messages at about the
    same time (segments 5 and 6 below). For unknown reasons, the second
    TCP then does not ack the message from the first TCP, and a
    retransmission occurs (segment 8). We also observed retransmissions
    when one TCP is busy transmitting many messages (segment 38 contains
    many application messages) while a message is being sent to it.
    Again, for unknown reasons, that TCP does not ack the message, which
    forces a retransmission (segment 40).

    netstat reports TCP segments being retransmitted but no errors at the
    interface level. We have no reason to believe that segments are
    dropped at the physical layer. We suspect that segments are dropped at
    the TCP layer, but we don't know why or where. Any ideas?
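    One way to check which counters move during a burst test is to take a
    snapshot of the kernel's SNMP counters before and after the run. The
    sketch below is in Python 3. It assumes a Linux host where
    /proc/net/snmp has its usual layout, in which each section is a line
    of counter names followed by a line of values. The helper names are
    hypothetical. The same parser should also accept /proc/net/netstat,
    which holds the TcpExt counters.

```python
import os
from typing import Dict, List


def parse_snmp(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/snmp-style text into {section: {counter: value}}.

    Each section appears as two lines with the same prefix: a header line
    of counter names followed by a line of integer values.
    """
    result: Dict[str, Dict[str, int]] = {}
    pending: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        prefix, rest = line.split(":", 1)
        fields = rest.split()
        if prefix not in pending:
            pending[prefix] = fields          # header line: counter names
        else:
            names = pending.pop(prefix)       # value line: pair with names
            result[prefix] = {n: int(v) for n, v in zip(names, fields)}
    return result


def read_snmp(path: str = "/proc/net/snmp") -> Dict[str, Dict[str, int]]:
    with open(path) as f:
        return parse_snmp(f.read())


def diff_counters(before: Dict[str, Dict[str, int]],
                  after: Dict[str, Dict[str, int]],
                  section: str = "Tcp") -> Dict[str, int]:
    """Return the counters in `section` that changed between two snapshots."""
    b, a = before[section], after[section]
    return {k: a[k] - b[k] for k in a if k in b and a[k] != b[k]}


if __name__ == "__main__" and os.path.exists("/proc/net/snmp"):
    tcp = read_snmp()["Tcp"]
    # RtoMin is reported in milliseconds.
    print("RetransSegs:", tcp["RetransSegs"], "RtoMin (ms):", tcp["RtoMin"])
```

    To use it, call read_snmp() before and after a burst test and pass
    both snapshots to diff_counters(). The result shows whether
    RetransSegs grew while InErrs stayed flat, which is the pattern
    described above.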

    Thanks
    Francois

    Here is the trace, with relative sequence numbers. It captures three
    instances of a retransmission.
    1 0.000000 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=0 Ack=0 Win=9902 Len=84 TSV=15025917 TSER=16502810
    2 0.039817 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [ACK] Seq=0 Ack=84 Win=2896 Len=0 TSV=16502926 TSER=15025917
    3 0.080062 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=0 Ack=84 Win=2896 Len=8 TSV=16502936 TSER=15025917
    4 0.080103 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=84 Ack=8 Win=9902 Len=0 TSV=15025937 TSER=16502936
    5 0.583935 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=84 Ack=8 Win=9902 Len=8 TSV=15026063 TSER=16502936
    6 0.583940 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=8 Ack=84 Win=2896 Len=8 TSV=16503062 TSER=15025937
    7 0.583985 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=92 Ack=16 Win=9902 Len=0 TSV=15026063 TSER=16503062
    8 0.795861 172.118.100.102 172.118.100.101 TCP [TCP Retransmission] 4124 > 9000 [PSH, ACK] Seq=84 Ack=16 Win=9902 Len=8 TSV=15026116 TSER=16503062
    9 0.796059 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [ACK] Seq=16 Ack=92 Win=2896 Len=0 TSV=16503115 TSER=15026116
    10 0.797151 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=16 Ack=92 Win=2896 Len=8 TSV=16503115 TSER=15026116
    11 0.797194 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=92 Ack=24 Win=9902 Len=0 TSV=15026116 TSER=16503115
    12 1.088260 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=92 Ack=24 Win=9902 Len=8 TSV=15026189 TSER=16503115

    16 6.127280 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=324 Ack=2656 Win=9902 Len=8 TSV=15027449 TSER=16504322
    17 6.127289 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2656 Ack=324 Win=2896 Len=8 TSV=16504448 TSER=15027323
    18 6.127334 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=332 Ack=2664 Win=9902 Len=0 TSV=15027449 TSER=16504448
    19 6.127865 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2664 Ack=332 Win=2896 Len=8 TSV=16504448 TSER=15027449
    20 6.127907 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=332 Ack=2672 Win=9902 Len=0 TSV=15027449 TSER=16504448
    21 6.631221 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=332 Ack=2672 Win=9902 Len=8 TSV=15027575 TSER=16504448
    22 6.631226 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2672 Ack=332 Win=2896 Len=8 TSV=16504574 TSER=15027449
    23 6.631260 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=340 Ack=2680 Win=9902 Len=0 TSV=15027575 TSER=16504574
    24 6.839618 172.118.100.102 172.118.100.101 TCP [TCP Retransmission] 4124 > 9000 [PSH, ACK] Seq=332 Ack=2680 Win=9902 Len=8 TSV=15027627 TSER=16504574
    25 6.840379 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2680 Ack=340 Win=2896 Len=8 TSV=16504626 TSER=15027627
    26 6.840433 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=340 Ack=2688 Win=9902 Len=0 TSV=15027627 TSER=16504626
    27 7.136158 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=340 Ack=2688 Win=9902 Len=8 TSV=15027701 TSER=16504626
    28 7.136163 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2688 Ack=348 Win=2896 Len=8 TSV=16504700 TSER=15027701
    29 7.136164 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=348 Ack=2696 Win=9902 Len=0 TSV=15027701 TSER=16504700

    31 1106.230079 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470416 Ack=58388 Win=2896 Len=84 TSV=16779507 TSER=15302381
    32 1106.230121 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58388 Ack=470500 Win=14942 Len=0 TSV=15302506 TSER=16779507
    33 1106.230402 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470500 Ack=58388 Win=2896 Len=84 TSV=16779507 TSER=15302381
    34 1106.230445 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58388 Ack=470584 Win=14942 Len=0 TSV=15302506 TSER=16779507
    35 1106.230716 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470584 Ack=58388 Win=2896 Len=84 TSV=16779507 TSER=15302381
    36 1106.230759 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58388 Ack=470668 Win=14942 Len=0 TSV=15302506 TSER=16779507
    37 1106.232746 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=58388 Ack=470668 Win=14942 Len=8 TSV=15302507 TSER=16779507
    38 1106.232809 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470668 Ack=58388 Win=2896 Len=588 TSV=16779507 TSER=15302506
    39 1106.272712 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58396 Ack=471256 Win=14942 Len=0 TSV=15302517 TSER=16779507
    40 1106.440704 172.118.100.102 172.118.100.101 TCP [TCP Retransmission] 4124 > 9000 [PSH, ACK] Seq=58388 Ack=471256 Win=14942 Len=8 TSV=15302559 TSER=16779507
    41 1106.443387 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=471256 Ack=58396 Win=2896 Len=8 TSV=16779560 TSER=15302559
    42 1106.443391 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58396 Ack=471264 Win=14942 Len=0 TSV=15302559 TSER=16779560
    43 1106.736707 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=58396 Ack=471264 Win=14942 Len=8 TSV=15302633 TSER=16779560
    44 1106.737143 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=471264 Ack=58404 Win=2896 Len=8 TSV=16779633 TSER=15302633
    45 1106.737196 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58404 Ack=471272 Win=14942 Len=0 TSV=15302633 TSER=16779633
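    You can measure how long each retransmitted segment waited by pairing
    it with the original transmission of the same sequence number from the
    same sender. The sketch below is in Python 3. It assumes one Wireshark
    summary line per segment, in the format shown in the trace. The regex
    and helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# One Wireshark summary line per segment, in the format of the trace above.
LINE_RE = re.compile(
    r"^(?P<no>\d+)\s+(?P<t>\d+\.\d+)\s+(?P<src>\S+)\s+(?P<dst>\S+)\s+TCP\s+"
    r"(?P<retx>\[TCP Retransmission\]\s+)?"
    r"(?P<sport>\d+)\s*>\s*(?P<dport>\d+)\s+\[[^\]]*\]\s+"
    r"Seq=(?P<seq>\d+)\s+Ack=\d+\s+Win=\d+\s+Len=(?P<len>\d+)"
)


def retransmission_delays(lines: List[str]) -> List[Tuple[int, int, float]]:
    """Return (original_no, retx_no, delay_seconds) for each retransmission."""
    first_sent: Dict[Tuple[str, str, int], Tuple[int, float]] = {}
    out: List[Tuple[int, int, float]] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or int(m["len"]) == 0:
            continue  # skip non-matching lines and pure ACKs
        key = (m["src"], m["sport"], int(m["seq"]))
        no, t = int(m["no"]), float(m["t"])
        if m["retx"]:
            if key in first_sent:
                orig_no, orig_t = first_sent[key]
                out.append((orig_no, no, t - orig_t))
        else:
            first_sent[key] = (no, t)  # remember the first transmission
    return out


if __name__ == "__main__":
    sample = [
        "5 0.583935 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=84 Ack=8 Win=9902 Len=8",
        "8 0.795861 172.118.100.102 172.118.100.101 TCP [TCP Retransmission] 4124 > 9000 [PSH, ACK] Seq=84 Ack=16 Win=9902 Len=8",
    ]
    for orig, retx, delay in retransmission_delays(sample):
        print(f"segment {retx} retransmits {orig} after {delay * 1000:.1f} ms")
```

    Applied to the three instances in the trace, the gaps are about 212 ms
    (5 to 8), 208 ms (21 to 24) and 208 ms (37 to 40). These values match
    the 200 ms minimum retransmission timeout that Linux uses by default,
    which suggests the retransmissions were driven by timer expiry.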


  2. Re: Linux TCP - unexpected retransmissions

    Francois wrote:
    > We are working on an embedded system that has a number of PowerQUICC
    > processors running Linux. During normal operation, processors exchange
    > small messages (< 100 bytes) using TCP. We have a response time
    > requirement of about 100 milliseconds and we observed that sometimes
    > we have a long latency in transporting (e.g., > 200 milliseconds across
    > Ethernet link) messages between nodes of the system resulting in
    > response time exceeding our requirement. This latency occurs randomly
    > at different places and on different interface types. We set the
    > socket NO_DELAY option, tried different setting (proc file ipv4
    > options) and test programs to isolate the root cause of the latency
    > with no success.
    >
    > We can reproduce the latency using a small application where two
    > PowerQuicc cards randomly send each other burst of messages across an
    > Ethernet link. For this test, we are using the 2.6.16 kernel. We use a
    > sniffer to capture data across the Ethernet link to realize that
    > sometimes when both TCPs send each other messages at about the same
    > time (segment 5 and 6 below), for unknown reasons, the second TCP does
    > not ack the message from the first TCP and a transmission occurs

    re?
    > (segment 8). We also observed that retransmissions sometimes occur
    > when one TCP is busy transmitting many messages (segment 38 contains
    > many application messages) while a message is being sent to it, again,
    > for unknown reasons, that TCP does not ack the message thus forcing a
    > retransmission (segment 40).
    >
    > Netstats reports TCP segments being retransmitted but no error at the
    > interface level. We have no reason to believe that segments are
    > dropped at the physical layer. We suspect that segments are dropped at
    > the TCP layer but we don't know why/where. Any ideas?


    Did you try replacing whatever was in the middle (hub/switch/crossover
    cable/...)? I know you said you don't suspect the link layer, but a
    little paranoia never hurts.

    Did you try using well-tested network cards? The machine I'm using to
    write this has a built-in NIC that started mysteriously dropping packets
    when I installed FC5. Switching to a well-debugged card/driver made the
    problem go away.

  3. Re: Linux TCP - unexpected retransmissions

    On May 28, 10:54 pm, Allen McIntosh wrote:
    > Francois wrote:
    > > We are working on an embedded system that has a number of PowerQUICC
    > > processors running Linux. During normal operation, processors exchange
    > > small messages (< 100 bytes) using TCP. We have a response time
    > > requirement of about 100 milliseconds and we observed that sometimes
    > > we have a long latency in transporting (e.g., > 200 milliseconds across
    > > Ethernet link) messages between nodes of the system resulting in
    > > response time exceeding our requirement. This latency occurs randomly
    > > at different places and on different interface types. We set the
    > > socket NO_DELAY option, tried different setting (proc file ipv4
    > > options) and test programs to isolate the root cause of the latency
    > > with no success.

    >
    > > We can reproduce the latency using a small application where two
    > > PowerQuicc cards randomly send each other burst of messages across an
    > > Ethernet link. For this test, we are using the 2.6.16 kernel. We use a
    > > sniffer to capture data across the Ethernet link to realize that
    > > sometimes when both TCPs send each other messages at about the same
    > > time (segment 5 and 6 below), for unknown reasons, the second TCP does
    > > not ack the message from the first TCP and a transmission occurs

    >
    > re?
    >
    > > (segment 8). We also observed that retransmissions sometimes occur
    > > when one TCP is busy transmitting many messages (segment 38 contains
    > > many application messages) while a message is being sent to it, again,
    > > for unknown reasons, that TCP does not ack the message thus forcing a
    > > retransmission (segment 40).

    >
    > > Netstats reports TCP segments being retransmitted but no error at the
    > > interface level. We have no reason to believe that segments are
    > > dropped at the physical layer. We suspect that segments are dropped at
    > > the TCP layer but we don't know why/where. Any ideas?

    >
    > Did you try replacing whatever was in the middle (hub/switch/crossover
    > cable/...)? I know you said you don't suspect the link layer, but a
    > little paranoia never hurts.
    >
    > Did you try using well-tested network cards? The machine I'm using to
    > write this has a built-in NIC that started mysteriously dropping packets
    > when I installed FC5. Switching to a well-debugged card/driver made the
    > problem go away.


    Our system is composed of a number of embedded PowerQUICC processors
    (VME) located within a number of shelves. Processors communicate using
    point-to-point Ethernet links, or through the VME backplane. There is
    no hub or switch between them (except when we use a sniffer for
    testing purposes). We tried different cables, cards, shelves, etc, to
    isolate the root cause of this latency with no success.

    After browsing the Linux code for a while (I wish I understood it
    better), we realized that the TCP stack optimizes performance by
    splitting the processing of events between user and kernel space. We
    suspect that under certain conditions (heavy bursts of messages, or
    messages arriving at the same time), the stack drops or postpones
    processing of events (holding locks, buffering), causing timers to
    trigger retransmissions.

    Thanks
    Francois


  4. Re: Linux TCP - unexpected retransmissions

    Francois wrote:
    > After browsing the Linux code for a while (I wish I understood it
    > better), we realized that the TCP stack optimizes performance by
    > separating the processing of events between user and kernel
    > space. We suspect that under certain conditions (heavy burst of
    > messages, or messages arriving at the same time), the stack drops or
    > postpones processing of events (holding locks, buffering) causing
    > timers to trigger retransmissions.


    ISTR there is a sysctl which controls some of that decision making:
    net.ipv4.tcp_low_latency. Maybe that will help, maybe not.
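    If you want to try that sysctl, you can read and set it through
    procfs. The sketch below is in Python 3. It assumes Linux, and writing
    requires root. The helper names are hypothetical. On newer kernels,
    the ip-sysctl documentation describes tcp_low_latency as a legacy
    option that no longer has any effect. It is therefore only worth
    trying on older kernels such as the 2.6.16 kernel in this thread.

```python
from pathlib import Path


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its /proc/sys path (simple names only)."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(path: Path) -> str:
    return path.read_text().strip()


def write_sysctl(path: Path, value: str) -> None:
    # Needs root for real /proc/sys entries; same effect as `sysctl -w name=value`.
    path.write_text(value + "\n")


if __name__ == "__main__":
    p = sysctl_path("net.ipv4.tcp_low_latency")
    if p.exists():
        print("net.ipv4.tcp_low_latency =", read_sysctl(p))
```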

    Quite frankly, TCP isn't exactly the right protocol for firm/hard
    realtime requirements, as you have learned from experience with lost
    traffic and retransmissions. There isn't really a "perfect" protocol
    for such things though (IMO).

    rick jones
    --
    a wide gulf separates "what if" from "if only"
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  5. Re: Linux TCP - unexpected retransmissions

    Rick Jones wrote:

    > Francois wrote:
    >> After browsing the Linux code for a while (I wish I understood it
    >> better), we realized that the TCP stack optimizes performance by
    >> separating the processing of events between user and kernel
    >> space. We suspect that under certain conditions (heavy burst of
    >> messages, or messages arriving at the same time), the stack drops or
    >> postpones processing of events (holding locks, buffering) causing
    >> timers to trigger retransmissions.

    >
    > ISTR there is a sysctl which controls some of that decision making:
    > net.ipv4.tcp_low_latency. Maybe that will help, maybe not.
    >
    > Quite frankly, TCP isn't exactly the right protocol for firm/hard
    > realtime requirements, as you have learned from experience with lost
    > traffic and retransmissions. There isn't really a "perfect" protocol
    > for such things though (IMO).
    >
    > rick jones


    There's InfiniBand (which I know little about, apart from the fact that
    it exists). I dare say it would be an expensive option and totally OTT
    for the OP's application.

    However, I do wonder if the OP has considered dumping IP and just
    throwing raw Ethernet frames around. Hard to say whether it would be
    better or not; it depends on the hardware setup, but it's worth a
    thought.

    Cheers

    Tim

  6. Re: Linux TCP - unexpected retransmissions

    Tim S wrote:
    > However, I do wonder if the OP has considered dumping IP and just
    > throwing raw ethernet frames around? Hard to say whether it would be
    > better or not - depends on the hardware setup, but it's worth a
    > thought.


    One of those damned-if-you-do, damned-if-you-don't things, I suspect.
    One could go with direct Ethernet, but then one has to do one's own
    segmentation, as well as deal with lost traffic. One does have the
    advantage of being able to use one's own retransmission timeouts.
    Having done that, though, some months later someone will want to run
    the application between two sites without any bridging available,
    and then the lack of routing (since we've ditched IP) will come back
    to haunt us.

    Also, with direct Ethernet, there are only so many Ethertypes/SAPs one
    can use, which may make multiple "connections" a bit difficult. The
    author might have to write her own connection multiplexing and
    demultiplexing.
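    Once IP is gone, the application has to handle segmentation and
    connection demultiplexing itself. The sketch below, in Python 3, shows
    what that involves. It builds Ethernet II frames with the IEEE 802
    local experimental EtherType 0x88B5. Each frame carries a hypothetical
    8-byte header (connection id, sequence number, payload length), so
    that many "connections" can share that single EtherType. The sketch
    leaves out loss detection and retransmission, which a real design
    would still need. On Linux, sending these frames would also require a
    raw AF_PACKET socket and root privileges.

```python
import struct
from typing import List, Tuple

ETH_P_EXPERIMENTAL = 0x88B5   # IEEE 802 local experimental EtherType
ETH_MTU = 1500                # maximum Ethernet payload in bytes
ETH_MIN_PAYLOAD = 46          # shorter payloads must be padded
# Hypothetical application header: connection id, sequence number, payload length.
APP_HDR = struct.Struct("!HIH")
MAX_CHUNK = ETH_MTU - APP_HDR.size


def build_frames(dst: bytes, src: bytes, conn_id: int, seq: int,
                 msg: bytes) -> List[bytes]:
    """Segment msg into Ethernet II frames, one sequence number per frame."""
    frames = []
    for off in range(0, max(len(msg), 1), MAX_CHUNK):
        chunk = msg[off:off + MAX_CHUNK]
        payload = APP_HDR.pack(conn_id, seq, len(chunk)) + chunk
        payload = payload.ljust(ETH_MIN_PAYLOAD, b"\x00")  # pad short frames
        frames.append(dst + src + struct.pack("!H", ETH_P_EXPERIMENTAL) + payload)
        seq = (seq + 1) & 0xFFFFFFFF
    return frames


def parse_frame(frame: bytes) -> Tuple[int, int, bytes]:
    """Demultiplex a frame back to (conn_id, seq, chunk), dropping padding."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_EXPERIMENTAL:
        raise ValueError("not our EtherType")
    conn_id, seq, length = APP_HDR.unpack_from(frame, 14)
    start = 14 + APP_HDR.size
    return conn_id, seq, frame[start:start + length]


if __name__ == "__main__":
    dst = bytes.fromhex("ffffffffffff")   # broadcast, for illustration only
    src = bytes.fromhex("020000000001")   # locally administered example MAC
    for f in build_frames(dst, src, conn_id=1, seq=0, msg=b"x" * 3000):
        print(len(f), parse_frame(f)[:2])
```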

    rick jones

  7. Re: Linux TCP - unexpected retransmissions

    Rick Jones wrote:

    > Tim S wrote:
    >> However, I do wonder if the OP has considered dumping IP and just
    >> throwing raw ethernet frames around? Hard to say whether it would be
    >> better or not - depends on the hardware setup, but it's worth a
    >> thought.

    >
    > One of those damned if you do, damned if you don't things I suspect.
    > One could go with direct Ethernet, but then one has to segment
    > oneself, as well as deal with lost traffic. One does have the
    > advantage of being able to use one's own retransmission timeouts.
    > Having done that, though, some months later someone will want to be able
    > to run the application between two sites, without any bridging
    > available and then the lack of routing (since we've ditched IP) will
    > come back to haunt.
    >
    > Also, with direct Ethernet, there are only so many Ethertypes/SAPs one
    > can use which may make multiple "connections" a bit difficult. The
    > author might have to write her own connection multiplex/demultiplex.
    >
    > rick jones


    Yes, I should clarify. I've seriously considered using plain Ethernet for a
    point-to-point link where one half of the link is hosted by a fairly dumb
    embedded system (too dumb to run a "proper" OS, but highly specialised for
    its task) and where the link's purpose is to feed data to a more
    intelligent but less specialised embedded board.

    Cheers

    Tim

  8. Re: Linux TCP - unexpected retransmissions

    On Tue, 29 May 2007 06:15:00 -0700, Francois wrote:

    > We
    > suspect that under certain conditions (heavy burst of messages, or
    > messages arriving at the same time), the stack drops or postpones
    > processing of events (holding locks, buffering) causing timers to
    > trigger retransmissions.


    That sounds like a reasonable explanation to me. Or the link layer drops
    data because of timing constraints and/or limited resources, so the TCP
    stack never sees it.
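    To test whether the link layer is dropping frames, take a snapshot of
    the generic per-interface counters that the kernel exports under
    /sys/class/net/<iface>/statistics, before and after a test run. The
    sketch below is in Python 3. It assumes a Linux kernel with sysfs, and
    the helper names are hypothetical. It does not read driver-specific
    counters such as those shown by `ethtool -S`. If a driver drops frames
    without updating these counters, this check will not show anything.

```python
import sys
from pathlib import Path
from typing import Dict

# Generic receive-side counters; files that are absent are simply skipped.
COUNTERS = ("rx_packets", "rx_errors", "rx_dropped", "rx_fifo_errors",
            "rx_missed_errors", "rx_over_errors", "rx_crc_errors")


def read_if_stats(iface: str, root: Path = Path("/sys/class/net")) -> Dict[str, int]:
    stats_dir = root / iface / "statistics"
    return {name: int((stats_dir / name).read_text())
            for name in COUNTERS if (stats_dir / name).exists()}


def nonzero_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Counters that changed between two snapshots."""
    return {k: after[k] - before[k]
            for k in after if k in before and after[k] != before[k]}


if __name__ == "__main__":
    iface = sys.argv[1] if len(sys.argv) > 1 else "lo"
    print(iface, read_if_stats(iface))
```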

    Others have suggested using only a link-layer protocol, but what about
    using UDP?

    Dan

  9. Re: Linux TCP - unexpected retransmissions

    On May 29, 9:15 pm, Dan N wrote:
    > On Tue, 29 May 2007 06:15:00 -0700, Francois wrote:
    > > We
    > > suspect that under certain conditions (heavy burst of messages, or
    > > messages arriving at the same time), the stack drops or postpones
    > > processing of events (holding locks, buffering) causing timers to
    > > trigger retransmissions.

    >
    > That sounds like a reasonable explanation to me. Or the link layer drops
    > data because of timing constraints and/or limited resources, so the TCP
    > stack never sees it.
    >
    > Others have suggested using only a link-layer protocol, but what about
    > using UDP?
    >
    > Dan


    We have considered using UDP. Although feasible, it would be a
    significant amount of work, not so much to implement as to prove
    correct. Rightly or wrongly, we made a number of assumptions early on
    in the design because we were using TCP. With UDP, we would need to
    implement additional services on top of it and prove their
    correctness.
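    The additional services mentioned here are the ones a UDP design has
    to supply itself: sequence numbers, acknowledgements, a retransmission
    timer and duplicate suppression. The sketch below is a minimal
    stop-and-wait version in Python 3. It assumes a hypothetical 4-byte
    sequence header and an application-chosen 20 ms timeout; neither comes
    from the thread. It keeps only one message in flight at a time, so it
    is far from a proven-correct protocol.

```python
import socket
import struct
import threading
from typing import List, Tuple

HDR = struct.Struct("!I")  # hypothetical 4-byte sequence-number header


def send_reliable(sock: socket.socket, addr: Tuple[str, int], seq: int,
                  payload: bytes, rto: float = 0.02, max_tries: int = 5) -> int:
    """Stop-and-wait send: retransmit every `rto` seconds until acked.

    Returns the number of transmissions used; raises TimeoutError otherwise.
    """
    sock.settimeout(rto)
    for attempt in range(1, max_tries + 1):
        sock.sendto(HDR.pack(seq) + payload, addr)
        try:
            while True:
                data, _ = sock.recvfrom(64)
                if len(data) == HDR.size and HDR.unpack(data)[0] == seq:
                    return attempt
                # Stale ACK for an earlier seq: ignore it and keep waiting
                # (each recvfrom call gets its own `rto` timeout).
        except socket.timeout:
            continue  # timer fired: retransmit
    raise TimeoutError(f"seq {seq} not acknowledged after {max_tries} tries")


def serve_acks(sock: socket.socket, expected: int, delivered: List[bytes]) -> None:
    """Receive `expected` distinct messages, ACKing every copy received."""
    seen = set()
    while len(seen) < expected:
        data, addr = sock.recvfrom(2048)
        (seq,) = HDR.unpack_from(data)
        sock.sendto(HDR.pack(seq), addr)  # always ACK, so a lost ACK is repaired
        if seq not in seen:               # deliver each message only once
            seen.add(seq)
            delivered.append(data[HDR.size:])


if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    got: List[bytes] = []
    t = threading.Thread(target=serve_acks, args=(rx, 3, got), daemon=True)
    t.start()
    for i in range(3):
        n = send_reliable(tx, rx.getsockname(), i, f"msg{i}".encode())
        print(f"seq {i} delivered after {n} transmission(s)")
    t.join(timeout=5)
    print(got)
```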

    We first wanted to isolate the root cause of this latency. As
    described above, we suspect the problem is related to the TCP stack,
    but we have not proven this yet. We were hoping someone on the net
    would confirm one of two things. Either the current design of the
    Linux TCP stack could result in such behaviour, or this is a bug, in
    which case, even better, they could point us towards a fix.

    Thanks
    Francois


  10. Re: Linux TCP - unexpected retransmissions

    Suppose you have already checked all the stats available in Linux
    (netstat -s and ethtool) and they are indeed clean. Suppose you have
    also checked the stats on the switches (in those situations where
    switches were used). If a tcpdump trace, or perhaps better still some
    external packet sniffing with a sufficiently powerful third system
    (and perhaps a hub), still shows actual symptoms of packet loss, then
    you have apparently hit points in the stack that can drop packets
    without incrementing a stat.

    That would be a bug.

    You may need to start perusing the source of the entire path, looking
    for places where this might be the case. You would then need to
    kludge in some counters of your own (perhaps just simple printk's as
    a start) to see what might be going on. If you get your Linux bits
    from a commercial source, you could fire up your support contract and
    get them to do some of that: the source code perusal and perhaps the
    quick and dirty counters, at least.

    rick jones
