Bug#478062: This has a work around - Debian


  1. Bug#478062: This has a work around

    I received an email from Ilpo Järvinen asking me to set F-RTO to off in
    sysctl. I could not get the syntax right in that file to boot with that
    option off, so I ran

    echo 0 >/proc/sys/net/ipv4/tcp_frto

    and then ran a test page through the CUPS web front end. It printed.

    I don't know who Ilpo Järvinen is, but a
    quick google shows he appears to be a kernel developer. Below is a copy
    of his request to me:

    Hi,

    I've been trying to track reports about a TCP problem that is possibly
    F-RTO related (there aren't that many); F-RTO was enabled for 2.6.24. I
    noticed that http://mwh.geek.nz/ links to your mail as a similar problem
    (the link wasn't working for me except from Google's archives); that
    problem was solved by turning F-RTO off (the tcp_frto sysctl). Could you
    possibly check whether that applies to your CUPS case as well? If so, I'd
    be interested to see some tcpdumps of the problem to see what's wrong.

    Ilpo,

    What exactly do you want me to do to assist you in pinning this down? I
    know the word "tcpdump" but I have never run it; it looks like you want
    me to run tcpdump -some switches. After perusing the man page, I thought
    it best to just ask you which options would be best. Keep in mind I have
    one box on my net that is running a VPN to a business that I do NOT want
    sniffed, so I would think that "host printerIP" would be best?



    --
    Damon L. Chesser
    damon@damtek.com
    http://www.linkedin.com/in/dchesser
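
For anyone else stuck on the sysctl file syntax mentioned above: the key used in /etc/sysctl.conf is the path under /proc/sys with the slashes replaced by dots. The sketch below derives that line from the /proc path used in the post. The helper names are made up for illustration, and the commands in the comments assume the standard sysctl tool is installed.

```shell
# Convert a /proc/sys path into the dotted key used by sysctl and
# /etc/sysctl.conf (hypothetical helper, not part of any package).
proc_to_sysctl_key() {
    printf '%s\n' "$1" | sed -e 's|^/proc/sys/||' -e 's|/|.|g'
}

# Print the line one would add to /etc/sysctl.conf for a path and a value.
sysctl_conf_line() {
    printf '%s = %s\n' "$(proc_to_sysctl_key "$1")" "$2"
}

# Prints: net.ipv4.tcp_frto = 0
sysctl_conf_line /proc/sys/net/ipv4/tcp_frto 0

# Runtime equivalent of the echo used in the post (as root):
#   sysctl -w net.ipv4.tcp_frto=0
# Reload /etc/sysctl.conf after editing it (as root):
#   sysctl -p
```

Once the printed line is in /etc/sysctl.conf, the setting should be applied at boot without the manual echo.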





  2. Bug#478062: This has a work around

    Ah, there were some other bug reports as well to discover...

    On Tue, 6 May 2008, Damon L. Chesser wrote:

    > I received an email from Ilpo Järvinen asking me to set F-RTO to off in
    > sysctl. I could not get the syntax right in that file to boot with that
    > option off so I ran
    > echo 0 >/proc/sys/net/ipv4/tcp_frto and I then ran a test page through the
    > cups web front end. It printed.


    The command you used was exactly what I was asking for (I don't know what
    else you expected to work). :-)

    So the problem was 100% reproducible for you with FRTO (I'm still catching
    up on the details here :-))?

    > I don't know who Ilpo Järvinen is, but a quick
    > google shows he appears to be a kernel developer.


    Yeah, you found the right one. :-)

    I suspect there's some corner case which might be buggy in the kernel, but
    there are other possibilities currently as well. I'd like to get this
    fixed if it's a kernel bug, and think about work-arounds if it seems to be
    elsewhere (end-point or middlebox). It might be that those devices just
    blatantly discard any out-of-order segments, which could explain this kind
    of phenomenon, but we'll soon see what's going on.

    > What exactly do you want me to do to assist you pinning this down? I
    > know the words "tcpdump" but I never ran it, it looks like you want me
    > to run tcpdump -some switches. After perusing the man file, I thought it
    > best to just ask you what options would be best to be run. Keep in mind I
    > have one box on my net that is running a vpn to a business that I do NOT
    > want sniffed, so I would think that host printerIP would be best?


    Thanks a lot for the help. Yes, it's possible to get basically all
    unrelated things filtered out, and I'm ok even with a log that has
    IP-addresses anonymized (e.g., with sed afterwards) if you feel a need
    to do so.

    Turn FRTO back on (set the tcp_frto to 2). Then run this command (as
    superuser):

    tcpdump -i <interface> -w frtoprob.log host <printerip> and host <yourip>

    ...That will exclude everything else but stuff between the relevant hosts,
    and you will get a file called frtoprob.log. Please change those <>-marked
    parts so that they match your setup. I suppose that printerip should be
    192.168.200.150 for your case (based on the Debian bug). ...Then just
    reproduce the problem.

    Please wait at least minutes before ctrl-c:ing the tcpdump! If you really
    want to capture the whole flow, you can keep the tcpdump running until the
    connection between <printerip> and <yourip> has disappeared
    from netstat's output (this might take a considerable amount of time, 20
    minutes or so if it slowly makes progress the whole time), though I think
    I can determine the problem pattern with somewhat less than that; e.g.,
    5 minutes during the problem should probably be enough.

    You can evaluate the generated log afterwards with tcpdump -r frtoprob.log
    (superuser rights are no longer mandatory, as long as the generated log
    file is accessible to an ordinary user). I'd like a verbose log, so use
    this command:

    tcpdump -tt -vvv -n -r frtoprob.log > frtoprob.txt

    (You could add
    | sed -e 's|<printerip>|prnt|g' -e 's|<yourip>|myip|g'
    there before the redirection if you want to hide the IP addresses too,
    but since you have one in the Debian bug too, I suspect you don't feel that
    necessary).

    Send what is in frtoprob.txt, preferably in such a format that there
    won't be extra newlines (yes, the lines are long) :-). The generated
    output doesn't contain any payload between the printer and your
    machine, just protocol headers printed out (the binary dump would contain
    part of the payload), not to speak of anything between any other hosts.

    ...I hope I wasn't overly verbose here. :-)


    I suppose you're willing to test some kernel patch as well if that becomes
    necessary?

    --
    i.
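
The capture and anonymizing steps described above can be put together roughly as follows. This is only a sketch: the interface name eth0 is an assumption, the printer address comes from the bug report, and the workstation address is the one that shows up in the capture posted later in this thread. The function names are hypothetical. Note that the printer address has to be replaced first, because 192.168.200.15 is a prefix of 192.168.200.150.

```shell
# Assumed values; adjust to the local setup.
IFACE="eth0"                   # assumption: name of the capturing interface
PRINTER_IP="192.168.200.150"   # printer address from the bug report
MY_IP="192.168.200.15"         # workstation address seen in the capture

# Print (rather than run) the capture command, since capturing needs root.
capture_cmd() {
    printf 'tcpdump -i %s -w frtoprob.log host %s and host %s\n' \
        "$IFACE" "$PRINTER_IP" "$MY_IP"
}

# Replace both addresses with labels in tcpdump text output (stdin to stdout).
anonymize() {
    # Escape the dots so they match literally in the sed patterns.
    p=$(printf '%s' "$PRINTER_IP" | sed 's/\./\\./g')
    m=$(printf '%s' "$MY_IP" | sed 's/\./\\./g')
    # Printer first: its address contains the workstation address as a prefix.
    sed -e "s|$p|prnt|g" -e "s|$m|myip|g"
}

# Typical use once frtoprob.log exists:
#   tcpdump -tt -vvv -n -r frtoprob.log | anonymize > frtoprob.txt
```

Printing the command instead of running it keeps the sketch safe to paste into a shell as an ordinary user.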


  3. Bug#478062: This has a work around

    Ilpo Järvinen wrote:
    > Ah, there were some other bug reports as well to discover...
    >
    > snip
    >
    > The command you used was exactly what I was asking for (I don't know what
    > else you expected to work). :-)
    >
    > So the problem was 100% reproducible for you with FRTO (I'm still catching
    > up on the details here :-))?
    >


    Yes, 100% reproducible if you turn FRTO = 2. Printing works as it is
    supposed to if FRTO = 0
    > snip
    > Yeah, you found the right one. :-)
    >
    > I suspect there's some corner case which might be buggy in the kernel, but
    > there are other possibilities currently as well. I'd like to get this
    > fixed if it's a kernel bug, and think about work-arounds if it seems to be
    > elsewhere (end-point or middlebox). It might be that those devices just
    > blatantly discard any out-of-order segments, which could explain this kind
    > of phenomenon, but we'll soon see what's going on.


    I also suspect a corner case. I suspect the hardware used has a "narrow
    window" in which to allow the network traffic and the *new* kernels for
    some reason are outside this window. However, just because FC8, Ubuntu
    8.04 and Debian all see the same thing does not rule out an issue with
    cupsys.
    > .
    >
    >
    >> snip
    >>

    >
    > Thanks a lot for the help. Yes, it's possible to get basically all
    > unrelated things filtered out, and I'm ok even with a log that has
    > IP-addresses anonymized (e.g., with sed afterwards) if you feel a need
    > to do so.
    >
    > Turn FRTO back on (set the tcp_frto to 2). Then run this command (as
    > superuser):
    >
    > tcpdump -i <interface> -w frtoprob.log host <printerip> and host <yourip>
    >
    > ...That will exclude everything else but stuff between the relevant hosts,
    > you will get a file called frtoprob.log. Please change those <>-marked
    > parts so that they match your setup. I suppose that printerip should be
    > 192.168.200.150 for your case (based on debian bug). ...Then just
    > reproduce the problem.
    >
    > Please wait at least minutes before ctrl-c:ing the tcpdump! If you really
    > want to capture the whole flow, you can keep the tcpdump running until the
    > connection between <printerip> and <yourip> has disappeared
    > from netstat's output (might take a considerable amount of time, 20 mins or
    > so if it slowly makes progress the whole time), though I think I can
    > determine the problem pattern with somewhat less than that, e.g., 5 mins
    > during the problem should probably be enough.
    >
    > You can evaluate the generated log afterwards with tcpdump -r frtoprob.log
    > (superuser rights are no longer mandatory, as long as the generated log
    > file is accessible to an ordinary user). I'd like a verbose log, use this
    > cmd:
    >
    > tcpdump -tt -vvv -n -r frtoprob.log > frtoprob.txt
    >
    > (You could add
    > | sed -e 's|<printerip>|prnt|g' -e 's|<yourip>|myip|g'
    > there before redirection if you would want to hide the ip addresses too,
    > but since you have one in debian bug too, I suspect you don't feel that
    > necessary).
    >
    > Send what is in the frtoprob.txt, preferably in such format that there
    > won't be extra newlines (yes, the lines are long) :-). The generated
    > output doesn't contain even any payload between the printer and your
    > machine, just protocol headers printed out (the binary dump would contain
    > part of the payload), not to speak of anything between any other hosts.
    >
    > ...I hope I wasn't overly verbose here. :-)
    >


    Just long enough
    >
    > I suppose you're willing to test some kernel patch as well if that becomes
    > necessary?
    >


    Yup. I found it. I *own* it. This impacts a very small number of
    people, as evidenced by the resounding silence of complaints about it
    (which is why I think the hardware has a "small window" to accept
    packets). I will have to dust off my kernel patching skills; it has been
    a long, long time since I had to patch a kernel.

    Interesting: I have stopped the cups print job, I have reset the
    printer, powered off the print server device, and re-seated the network
    cable (required to clear the stuck TCP packets so that the other packets
    can get to the printer), I have stopped cupsysd, and netstat still shows
    a TCP connection with 192.168.200.150 (the printer). I include this info
    as it might be needed by you. I suspect that this is not *normal*
    behavior, but I have never run netstat --tcp -c on a print job that
    failed, so I don't know:

    Active Internet connections (w/o servers)
    Proto Recv-Q Send-Q Local Address          Foreign Address        State
    tcp        0   7168 dam-main.local:35770   192.168.200.150:9100   ESTABLISHED

    Ran the tcpdump for a solid 20 minutes, attached.

    Let me know if you need anything else.

    Thanks!

    --
    Damon L. Chesser
    damon@damtek.com
    http://www.linkedin.com/in/dchesser


    1210168073.605885 arp who-has 192.168.200.150 tell 192.168.200.15
    1210168073.609015 arp reply 192.168.200.150 is-at 00:0e:3b:00:3a:75
    1210168073.609061 IP (tos 0x0, ttl 64, id 63223, offset 0, flags [DF], proto TCP (6), length 60) 192.168.200.15.35770 > 192.168.200.150.9100: S, cksum 0xc0b9 (correct), 2773183952:2773183952(0) win 5840
    1210168073.611728 IP (tos 0x0, ttl 100, id 21, offset 0, flags [none], proto TCP (6), length 44) 192.168.200.150.9100 > 192.168.200.15.35770: S, cksum 0x94a7 (correct), 214441984:214441984(0) ack 2773183953 win 1024
    1210168073.611810 IP (tos 0x0, ttl 64, id 63224, offset 0, flags [DF], proto TCP (6), length 40) 192.168.200.15.35770 > 192.168.200.150.9100: ., cksum 0x97e0 (correct), 1:1(0) ack 1 win 5840
    1210168075.424729 IP (tos 0x0, ttl 64, id 63225, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 1:513(512) ack 1 win 5840
    1210168075.424750 IP (tos 0x0, ttl 64, id 63226, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 513:1025(512) ack 1 win 5840
    1210168075.433410 IP (tos 0x0, ttl 100, id 22, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa8b0 (correct), 1:1(0) ack 513 win 1024
    1210168075.433465 IP (tos 0x0, ttl 64, id 63227, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 1025:1537(512) ack 1 win 5840
    1210168075.637881 IP (tos 0x0, ttl 64, id 63228, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 513:1025(512) ack 1 win 5840
    1210168075.643234 IP (tos 0x0, ttl 100, id 23, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa6b0 (correct), 1:1(0) ack 1025 win 1024
    1210168075.643293 IP (tos 0x0, ttl 64, id 63229, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 1537:2049(512) ack 1 win 5840
    1210168076.058858 IP (tos 0x0, ttl 64, id 63230, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 1025:1537(512) ack 1 win 5840
    1210168076.065195 IP (tos 0x0, ttl 100, id 24, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa4b0 (correct), 1:1(0) ack 1537 win 1024
    1210168076.065218 IP (tos 0x0, ttl 64, id 63231, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 2049:2561(512) ack 1 win 5840
    1210168076.764849 IP (tos 0x0, ttl 100, id 25, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa4b0 (correct), 1:1(0) ack 1537 win 1024
    1210168076.894877 IP (tos 0x0, ttl 64, id 63232, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 1537:2049(512) ack 1 win 5840
    1210168076.900138 IP (tos 0x0, ttl 100, id 26, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa2b0 (correct), 1:1(0) ack 2049 win 1024
    1210168076.900171 IP (tos 0x0, ttl 64, id 63233, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 2561:3073(512) ack 1 win 5840
    1210168077.600238 IP (tos 0x0, ttl 100, id 27, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa2b0 (correct), 1:1(0) ack 2049 win 1024
    1210168078.562871 IP (tos 0x0, ttl 64, id 63234, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 2049:2561(512) ack 1 win 5840
    1210168078.568982 IP (tos 0x0, ttl 100, id 28, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa0b0 (correct), 1:1(0) ack 2561 win 1024
    1210168078.569027 IP (tos 0x0, ttl 64, id 63235, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 3073:3585(512) ack 1 win 5840
    1210168079.263714 IP (tos 0x0, ttl 100, id 29, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0xa0b0 (correct), 1:1(0) ack 2561 win 1024
    1210168081.893871 IP (tos 0x0, ttl 64, id 63236, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 2561:3073(512) ack 1 win 5840
    1210168081.897744 IP (tos 0x0, ttl 100, id 30, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x9eb0 (correct), 1:1(0) ack 3073 win 1024
    1210168081.897775 IP (tos 0x0, ttl 64, id 63237, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 3585:4097(512) ack 1 win 5840
    1210168082.597179 IP (tos 0x0, ttl 100, id 31, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x9eb0 (correct), 1:1(0) ack 3073 win 1024
    1210168088.549863 IP (tos 0x0, ttl 64, id 63238, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 3073:3585(512) ack 1 win 5840
    1210168088.555052 IP (tos 0x0, ttl 100, id 32, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x9cb0 (correct), 1:1(0) ack 3585 win 1024
    1210168088.555071 IP (tos 0x0, ttl 64, id 63239, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 4097:4609(512) ack 1 win 5840
    1210168089.254224 IP (tos 0x0, ttl 100, id 33, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x9cb0 (correct), 1:1(0) ack 3585 win 1024
    1210168101.865871 IP (tos 0x0, ttl 64, id 63240, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 3585:4097(512) ack 1 win 5840
    1210168101.872129 IP (tos 0x0, ttl 100, id 34, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x9ab0 (correct), 1:1(0) ack 4097 win 1024
    1210168101.872159 IP (tos 0x0, ttl 64, id 63241, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 4609:5121(512) ack 1 win 5840
    1210168102.568296 IP (tos 0x0, ttl 100, id 35, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x9ab0 (correct), 1:1(0) ack 4097 win 1024
    1210168128.494871 IP (tos 0x0, ttl 64, id 63242, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 4097:4609(512) ack 1 win 5840
    1210168128.501748 IP (tos 0x0, ttl 100, id 36, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x98b0 (correct), 1:1(0) ack 4609 win 1024
    1210168128.501779 IP (tos 0x0, ttl 64, id 63243, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 5121:5633(512) ack 1 win 5840
    1210168129.196598 IP (tos 0x0, ttl 100, id 37, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x98b0 (correct), 1:1(0) ack 4609 win 1024
    1210168181.746872 IP (tos 0x0, ttl 64, id 63244, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 4609:5121(512) ack 1 win 5840
    1210168181.753119 IP (tos 0x0, ttl 100, id 41, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x96b0 (correct), 1:1(0) ack 5121 win 1024
    1210168181.753151 IP (tos 0x0, ttl 64, id 63245, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 5633:6145(512) ack 1 win 5840
    1210168182.452827 IP (tos 0x0, ttl 100, id 42, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x96b0 (correct), 1:1(0) ack 5121 win 1024
    1210168288.246871 IP (tos 0x0, ttl 64, id 63246, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 5121:5633(512) ack 1 win 5840
    1210168288.251178 IP (tos 0x0, ttl 100, id 46, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x94b0 (correct), 1:1(0) ack 5633 win 1024
    1210168288.251213 IP (tos 0x0, ttl 64, id 63247, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 6145:6657(512) ack 1 win 5840
    1210168288.950675 IP (tos 0x0, ttl 100, id 47, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x94b0 (correct), 1:1(0) ack 5633 win 1024
    1210168408.249876 IP (tos 0x0, ttl 64, id 63248, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 5633:6145(512) ack 1 win 5840
    1210168408.255523 IP (tos 0x0, ttl 100, id 54, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x92b0 (correct), 1:1(0) ack 6145 win 1024
    1210168408.255556 IP (tos 0x0, ttl 64, id 63249, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 6657:7169(512) ack 1 win 5840
    1210168408.950279 IP (tos 0x0, ttl 100, id 55, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x92b0 (correct), 1:1(0) ack 6145 win 1024
    1210168528.253909 IP (tos 0x0, ttl 64, id 63250, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 6145:6657(512) ack 1 win 5840
    1210168528.260088 IP (tos 0x0, ttl 100, id 59, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x90b0 (correct), 1:1(0) ack 6657 win 1024
    1210168528.260126 IP (tos 0x0, ttl 64, id 63251, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 7169:7681(512) ack 1 win 5840
    1210168528.954815 IP (tos 0x0, ttl 100, id 60, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x90b0 (correct), 1:1(0) ack 6657 win 1024
    1210168648.257875 IP (tos 0x0, ttl 64, id 63252, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 6657:7169(512) ack 1 win 5840
    1210168648.264753 IP (tos 0x0, ttl 100, id 67, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x8eb0 (correct), 1:1(0) ack 7169 win 1024
    1210168648.264804 IP (tos 0x0, ttl 64, id 63253, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 7681:8193(512) ack 1 win 5840
    1210168648.959350 IP (tos 0x0, ttl 100, id 68, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x8eb0 (correct), 1:1(0) ack 7169 win 1024
    1210168768.261878 IP (tos 0x0, ttl 64, id 63254, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 7169:7681(512) ack 1 win 5840
    1210168768.265497 IP (tos 0x0, ttl 100, id 72, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x8cb0 (correct), 1:1(0) ack 7681 win 1024
    1210168768.265528 IP (tos 0x0, ttl 64, id 63255, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 8193:8705(512) ack 1 win 5840
    1210168768.964044 IP (tos 0x0, ttl 100, id 73, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x8cb0 (correct), 1:1(0) ack 7681 win 1024
    1210168888.262884 IP (tos 0x0, ttl 64, id 63256, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 7681:8193(512) ack 1 win 5840
    1210168888.268781 IP (tos 0x0, ttl 100, id 77, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x8ab0 (correct), 1:1(0) ack 8193 win 1024
    1210168888.268847 IP (tos 0x0, ttl 64, id 63257, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 8705:9217(512) ack 1 win 5840
    1210168888.963510 IP (tos 0x0, ttl 100, id 78, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x8ab0 (correct), 1:1(0) ack 8193 win 1024
    1210169008.265876 IP (tos 0x0, ttl 64, id 63258, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 8193:8705(512) ack 1 win 5840
    1210169008.270775 IP (tos 0x0, ttl 100, id 85, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x88b0 (correct), 1:1(0) ack 8705 win 1024
    1210169008.270808 IP (tos 0x0, ttl 64, id 63259, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 9217:9729(512) ack 1 win 5840
    1210169008.968045 IP (tos 0x0, ttl 100, id 86, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x88b0 (correct), 1:1(0) ack 8705 win 1024
    1210169128.269900 IP (tos 0x0, ttl 64, id 63260, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 8705:9217(512) ack 1 win 5840
    1210169128.273712 IP (tos 0x0, ttl 100, id 90, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x86b0 (correct), 1:1(0) ack 9217 win 1024
    1210169128.273744 IP (tos 0x0, ttl 64, id 63261, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 9729:10241(512) ack 1 win 5840
    1210169128.972707 IP (tos 0x0, ttl 100, id 91, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x86b0 (correct), 1:1(0) ack 9217 win 1024
    1210169248.269882 IP (tos 0x0, ttl 64, id 63262, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 9217:9729(512) ack 1 win 5840
    1210169248.273988 IP (tos 0x0, ttl 100, id 98, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x84b0 (correct), 1:1(0) ack 9729 win 1024
    1210169248.274020 IP (tos 0x0, ttl 64, id 63263, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: . 10241:10753(512) ack 1 win 5840
    1210169248.972183 IP (tos 0x0, ttl 100, id 99, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x84b0 (correct), 1:1(0) ack 9729 win 1024
    1210169368.274878 IP (tos 0x0, ttl 64, id 63264, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 9729:10241(512) ack 1 win 5840
    1210169368.282107 IP (tos 0x0, ttl 100, id 103, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x82b0 (correct), 1:1(0) ack 10241 win 1024
    1210169368.282164 IP (tos 0x0, ttl 64, id 63265, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.35770 > 192.168.200.150.9100: P 10753:11265(512) ack 1 win 5840
    1210169368.976736 IP (tos 0x0, ttl 100, id 104, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.35770: ., cksum 0x82b0 (correct), 1:1(0) ack 10241 win 1024
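
A quick way to see the stall in a text log like the one above is to measure how long each data segment waits between its first transmission and its retransmission. The sketch below assumes the tcpdump -tt text format shown above (timestamp in the first field, sequence ranges written as start:end(length)). The function name is hypothetical.

```shell
# Read tcpdump -tt text output on stdin and report, for every data segment
# that is seen more than once, the delay since its first transmission.
retransmit_delays() {
    awk '
    {
        for (i = 2; i <= NF; i++) {
            # Match start:end(len) with a non-zero length, i.e. skip pure ACKs.
            if ($i ~ /^[0-9]+:[0-9]+\([1-9][0-9]*\)$/) {
                if ($i in first)
                    printf "%s retransmitted after %.1f s\n", $i, $1 - first[$i]
                else
                    first[$i] = $1
            }
        }
    }'
}

# Typical use:
#   retransmit_delays < frtoprob.txt
```

On the capture above, the delays start at fractions of a second for the first segments and grow to minutes for the later ones, which matches the printing that never seemed to finish.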


  4. Bug#478062: Fix FRTO+NewReno problem (Was: Re: This has a work around)

    Added Cc netdev (=linux network developers list).

    On Wed, 7 May 2008, Damon L. Chesser wrote:

    > Ilpo Järvinen wrote:
    > >
    > > So the problem was 100% reproducible for you with FRTO (I'm still catching
    > > up on the details here :-))?
    > >

    >
    > Yes, 100% reproducible if you turn FRTO = 2. Printing works as it is
    > supposed to if FRTO = 0
    >
    > > I suspect there's some corner case which might be buggy in kernel, but there
    > > are other possibilities currently as well. I'd like to get this fixed if
    > > it's a kernel bug, and think about work-arounds if it seems to be elsewhere
    > > (end-point or middlebox). It might be that those devices just blatantly
    > > discard any out-of-order segments, which could explain this kind of
    > > phenomenon, but we'll soon see what's going on.

    >
    > I also suspect a corner case. I suspect the hardware used has a "narrow
    > window" in which to allow the network traffic and the *new* kernels for some
    > reason are outside this window. However, just because FC8, Ubuntu 8.04 and
    > Debian all see the same thing does not rule out an issue with cupsys.


    No we're not sending past the window this time, that bug got resolved
    before 2.6.25 got released (and it wasn't in 2.6.24 at all)... :-)

    There were a couple of interesting aspects in the tcpdump log:

    - The printer is using an advertised window of 1024 the whole time => mss
    is set to 512 to allow two segments per rtt; we won't get a larger window
    at all.
    - The printer wasn't exactly blatantly discarding all received
    out-of-order segments; it correctly sent dupacks (most of the time),
    though the rate at which it's able to receive segments seems quite low,
    thus there were some losses as well (and therefore "missing" dupACKs too).
    - FRTOs occur during a previous recovery.
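
The first observation can be checked directly against a text log by listing every window value each endpoint advertised. This is only a sketch that assumes the log format shown earlier in the thread (the source address is the field before ">" and the window is the field after the word "win"). The function name is made up.

```shell
# Read tcpdump text output on stdin and print each distinct
# "source window" pair once, sorted bytewise.
advertised_windows() {
    awk '
    {
        src = ""; win = ""
        for (i = 2; i <= NF; i++) {
            if ($i == ">")   src = $(i - 1)   # sender is the field before ">"
            if ($i == "win") win = $(i + 1)   # window follows the word "win"
        }
        if (src != "" && win != "") seen[src " " win] = 1
    }
    END { for (k in seen) print k }' | LC_ALL=C sort
}

# Typical use:
#   advertised_windows < frtoprob.txt
```

For the capture in this thread it should list only 1024 for the printer and 5840 for the workstation, and 1024 / 512 gives the two segments per round trip mentioned above.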

    After some code reading, I found the bug in the kernel that causes this:
    the printer won't negotiate SACK, yet F-RTO uses in a couple of places
    a condition which lacks a check for this, making F-RTO wrongly select the
    SACK-enabled behavior. ...There are actually two bugs with the same
    symptoms, one for the non-SACK case and the other for the SACK-enabled
    case. ...The fix below is for the non-SACK case.

    > > I suppose you're willing to test some kernel patch as well if that becomes
    > > necessary?

    >
    > Yup. I found it. I *own* it. This impacts a very small amount of people as
    > evidenced by the resounding silence of complaints about it (which is why I
    > think the hardware has a "small window" to accept packets) I will have to
    > dust off my kernel patching skills, been a long long time since I had to patch
    > a kernel.
    >
    > Interesting: I have stopped the cups print job, I have reset the printer,
    > powered off the printerserver device, and re-seated the network cable
    > (required to clear the stuck tcp packets so that the other packets can get to
    > the printer), I have stopped cupsysd and netstat still shows a tcp connection
    > with 192.168.200.150 (printer). I include this info as this might be needed
    > by you. I suspect that that is not *normal* behavior, but I have never run
    > netstat --tcp -c on a print job that failed, so I don't know:


    This is expected: once you had made the printer unreachable, TCP will try
    a number of retransmissions without getting any response from the printer
    before giving up; it would have just taken even longer to get it
    cleaned up. The printer, on the other hand, loses its TCP state when you
    reset it and is therefore able to receive more stuff right away.

    > Ran the tcpdump for a solid 20 minutes, attached.


    Thanks, tcpdump was helpful.

    > Let me know if you need anything else.


    Could you next try with tcp_frto set to 1? If my theory proves to be
    correct, it too should be "enough" to fix the problem (in this
    particular case). Of course you can verify the patch below too if you want
    to; the patch should allow cups<->printer to work with tcp_frto = 2 too.
    In case you have a problem applying the patch to the particular version
    you want to try with, just send me a note about the version number
    so I can adapt the patch for you (space etc. formatting issues may show up
    because I recently ran a code style cleanup on the tcp code).

    --
    i.
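
To keep the three tcp_frto settings discussed in this thread apart, a small lookup like the one below may help. That the value 2 selects the SACK variant matches the IsSackFrto() macro in the patch that follows; treating 1 as the basic (non-SACK) variant is an assumption based on the kernel documentation of that era, and newer kernels may interpret the values differently. The function name is hypothetical.

```shell
# Map a tcp_frto value to a short description (2.6.24/2.6.25-era meaning,
# partly assumed; see the note above).
frto_mode() {
    case "$1" in
        0) echo "F-RTO off" ;;
        1) echo "basic F-RTO" ;;
        2) echo "SACK-enhanced F-RTO" ;;
        *) echo "unknown" ;;
    esac
}

# Switch modes between test prints (as root), for example:
#   echo 1 >/proc/sys/net/ipv4/tcp_frto
# Show the mode the running kernel is using:
#   frto_mode "$(cat /proc/sys/net/ipv4/tcp_frto)"
```

With this naming, the test requested above is a print job under "basic F-RTO", compared with the failing case under "SACK-enhanced F-RTO" and the working case with "F-RTO off".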


    --
    [PATCH] [TCP] FRTO: SACK variant is errorneously used with NewReno

    Note: there's actually another bug in FRTO's SACK variant, which
    is the causing failure in NewReno case because of the error
    that's fixed here. I'll fix the SACK case separately (it's
    a separate bug really, though related, but in order to fix that
    I need to audit tp->snd_nxt usage a bit).

    There were two places where SACK variant of FRTO is getting
    incorrectly used even if SACK wasn't negotiated by the TCP flow.
    This leads to incorrect setting of frto_highmark with NewReno
    if a previous recovery was interrupted by another RTO.

    An eventual fallback to conventional recovery then incorrectly
    considers one or a couple of segments as forward transmissions
    though they weren't, which then are not LOST marked during
    fallback, making them "non-retransmittable" until the next RTO.
    In a bad case, those segments are really lost and are the only
    ones left in the window. Thus TCP needs another RTO to continue.
    The next FRTO, however, could again repeat the same events,
    making the progress of the TCP flow extremely slow.

    In order for these events to occur at all, FRTO must occur
    again in FRTO's step 3 while the key segments must be lost as
    well, which is not too likely in practice. It seems to occur most
    frequently with some small devices such as network printers
    that *seem* to accept TCP segments only in-order. In cases
    where key segments weren't lost, things get automatically
    resolved because those wrongly marked segments don't need to be
    retransmitted in order to continue.

    I found a reproducer after digging up relevant reports (few
    reports in total, none at netdev or lkml that I know of). Some
    cases seemed to indicate middlebox issues, which now seems
    to be a false assumption some people had made. Bugzilla
    #10063 _might_ be related. Damon L. Chesser
    had a reproducible case and was kind enough to tcpdump it
    for me. With the tcpdump log it was quite trivial to figure
    out.

    Signed-off-by: Ilpo Järvinen
    ---
    net/ipv4/tcp_input.c | 12 +++++++-----
    1 files changed, 7 insertions(+), 5 deletions(-)

    diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
    index 0298f80..5c503e0 100644
    --- a/net/ipv4/tcp_input.c
    +++ b/net/ipv4/tcp_input.c
    @@ -113,8 +113,6 @@ int sysctl_tcp_abc __read_mostly;
    #define FLAG_FORWARD_PROGRESS (FLAG_ACKED|FLAG_DATA_SACKED)
    #define FLAG_ANY_PROGRESS (FLAG_FORWARD_PROGRESS|FLAG_SND_UNA_ADVANCED)

    -#define IsSackFrto() (sysctl_tcp_frto == 0x2)
    -
    #define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH)
    #define TCP_HP_BITS (~(TCP_RESERVED_BITS|TCP_FLAG_PSH))

    @@ -1685,6 +1683,10 @@ static inline void tcp_reset_reno_sack(struct tcp_sock *tp)
    tp->sacked_out = 0;
    }

    +static int tcp_is_sackfrto(const struct tcp_sock *tp) {
    + return (sysctl_tcp_frto == 0x2) && !tcp_is_reno(tp);
    +}
    +
    /* F-RTO can only be used if TCP has never retransmitted anything other than
    * head (SACK enhanced variant from Appendix B of RFC4138 is more robust here)
    */
    @@ -1701,7 +1703,7 @@ int tcp_use_frto(struct sock *sk)
    if (icsk->icsk_mtup.probe_size)
    return 0;

    - if (IsSackFrto())
    + if (tcp_is_sackfrto(tp))
    return 1;

    /* Avoid expensive walking of rexmit queue if possible */
    @@ -1791,7 +1793,7 @@ void tcp_enter_frto(struct sock *sk)
    /* Earlier loss recovery underway (see RFC4138; Appendix B).
    * The last condition is necessary at least in tp->frto_counter case.
    */
    - if (IsSackFrto() && (tp->frto_counter ||
    + if (tcp_is_sackfrto(tp) && (tp->frto_counter ||
    ((1 << icsk->icsk_ca_state) & (TCPF_CA_Recovery|TCPF_CA_Loss))) &&
    after(tp->high_seq, tp->snd_una)) {
    tp->frto_highmark = tp->high_seq;
    @@ -3123,7 +3125,7 @@ static int tcp_process_frto(struct sock *sk, int flag)
    return 1;
    }

    - if (!IsSackFrto() || tcp_is_reno(tp)) {
    + if (!tcp_is_sackfrto(tp)) {
    /* RFC4138 shortcoming in step 2; should also have case c):
    * ACK isn't duplicate nor advances window, e.g., opposite dir
    * data, winupdate
    --
    1.5.2.2
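
The effect of replacing IsSackFrto() with tcp_is_sackfrto(tp) can be checked with a small userspace model. This is only a sketch under stated assumptions: struct tcp_sock_model, sysctl_tcp_frto_model, is_reno() and count_disagreements() are hypothetical stand-ins and not kernel code, and "NewReno" is modeled simply as "SACK was not negotiated", following the wording of the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tcp_sock: only the one property needed. */
struct tcp_sock_model {
	bool sack_ok;	/* true if SACK was negotiated for this flow */
};

/* Stand-in for the tcp_frto sysctl; 0x2 selects the SACK variant. */
int sysctl_tcp_frto_model = 2;

/* Model of tcp_is_reno(): assumed to mean "SACK was not negotiated". */
bool is_reno(const struct tcp_sock_model *tp)
{
	return !tp->sack_ok;
}

/* Old macro: consults the sysctl only, never the flow. */
bool old_is_sack_frto(void)
{
	return sysctl_tcp_frto_model == 0x2;
}

/* New helper from the patch: the flow must also not be NewReno. */
bool is_sackfrto(const struct tcp_sock_model *tp)
{
	return (sysctl_tcp_frto_model == 0x2) && !is_reno(tp);
}

/*
 * Walk every sysctl value and both flow types. The rewritten condition
 * in tcp_process_frto() must be equivalent to the old one; the other two
 * call sites are where old and new checks may disagree. Returns how many
 * combinations disagree, and leaves the sysctl model set to 2.
 */
int count_disagreements(void)
{
	int n = 0;

	for (int frto = 0; frto <= 2; frto++) {
		for (int sack = 0; sack <= 1; sack++) {
			struct tcp_sock_model tp = { .sack_ok = (sack != 0) };

			sysctl_tcp_frto_model = frto;
			assert((!old_is_sack_frto() || is_reno(&tp)) ==
			       !is_sackfrto(&tp));
			if (old_is_sack_frto() != is_sackfrto(&tp))
				n++;
		}
	}
	return n;
}
```

In this model the only disagreeing combination is tcp_frto = 2 on a flow without SACK, which is the case the commit message describes: the old check selected the SACK variant in tcp_use_frto() and tcp_enter_frto() even though SACK was never negotiated.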


  5. Bug#478062: Fix FRTO+NewReno problem

    From: "Ilpo_Järvinen"
    Date: Thu, 8 May 2008 01:26:59 +0300 (EEST)

    > [PATCH] [TCP] FRTO: SACK variant is errorneously used with NewReno


    I applied this with a minor coding style fixup.

    From: "Ilpo_Järvinen"
    Date: Thu, 8 May 2008 01:26:59 +0300 (EEST)

    > +static int tcp_is_sackfrto(const struct tcp_sock *tp) {
    > + return (sysctl_tcp_frto == 0x2) && !tcp_is_reno(tp);
    > +}
    > +


    Should be:

    static int tcp_is_sackfrto(const struct tcp_sock *tp)
    {
            return (sysctl_tcp_frto == 0x2) && !tcp_is_reno(tp);
    }

    I will also queue this up to -stable, thanks so much for
    this bug fix!




  6. Bug#478062: Fix FRTO+NewReno problem (Was: Re: This has a work around)

    Ilpo Järvinen wrote:

    Snip
    >
    > Could you next try with tcp_frto set to 1, if my theory proves to be
    > correct, it too should be "enough" to fix the problem (in this
    > particular case). Of course you can verify the patch below too if you
    > want to, the patch should allow cups<->printer to work with tcp_frto =
    > 2 too. In case you have problem to apply the patch to the particular
    > version you're want to try with, just send a note about the version
    > number to me so I can adapt the patch for you (space etc. formatting
    > issues may show up because I recently run a code style cleanup on the
    > tcp code).
    >

    Ilpo,

    I tried the tcp_frto = 1 and got the same results as = 2. Attached is
    the output of the tcpdump for frto=1. I might not get to the patch
    today as I am feeling a bit slow.

    Thanks for the work!

    --
    Damon L. Chesser
    damon@damtek.com
    http://www.linkedin.com/in/dchesser


    1210263858.691448 IP (tos 0x0, ttl 64, id 59590, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 390384065:390384577(512) ack 27168769 win 5840
    1210263858.695038 IP (tos 0x0, ttl 100, id 75, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x7df5 (correct), 1:1(0) ack 512 win 1024
    1210263858.695231 IP (tos 0x0, ttl 64, id 59591, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 1024:1536(512) ack 1 win 5840
    1210263859.394452 IP (tos 0x0, ttl 100, id 76, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x7df5 (correct), 1:1(0) ack 512 win 1024
    1210263859.394517 IP (tos 0x0, ttl 64, id 59592, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 512:1024(512) ack 1 win 5840
    1210263859.400855 IP (tos 0x0, ttl 100, id 77, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x7bf5 (correct), 1:1(0) ack 1024 win 1024
    1210263859.400880 IP (tos 0x0, ttl 64, id 59593, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 1536:2048(512) ack 1 win 5840
    1210263860.096485 IP (tos 0x0, ttl 100, id 78, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x7bf5 (correct), 1:1(0) ack 1024 win 1024
    1210263979.417094 IP (tos 0x0, ttl 64, id 59594, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 1024:1536(512) ack 1 win 5840
    1210263979.421796 IP (tos 0x0, ttl 100, id 82, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x79f5 (correct), 1:1(0) ack 1536 win 1024
    1210263979.421850 IP (tos 0x0, ttl 64, id 59595, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 2048:2560(512) ack 1 win 5840
    1210263980.121475 IP (tos 0x0, ttl 100, id 83, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x79f5 (correct), 1:1(0) ack 1536 win 1024
    1210263980.121525 IP (tos 0x0, ttl 64, id 59596, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 1536:2048(512) ack 1 win 5840
    1210263980.127898 IP (tos 0x0, ttl 100, id 84, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x77f5 (correct), 1:1(0) ack 2048 win 1024
    1210263980.127915 IP (tos 0x0, ttl 64, id 59597, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 2560:3072(512) ack 1 win 5840
    1210263980.822754 IP (tos 0x0, ttl 100, id 85, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x77f5 (correct), 1:1(0) ack 2048 win 1024
    1210264100.142724 IP (tos 0x0, ttl 64, id 59598, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 2048:2560(512) ack 1 win 5840
    1210264100.148733 IP (tos 0x0, ttl 100, id 92, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x75f5 (correct), 1:1(0) ack 2560 win 1024
    1210264100.148766 IP (tos 0x0, ttl 64, id 59599, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 3072:3584(512) ack 1 win 5840
    1210264100.843579 IP (tos 0x0, ttl 100, id 93, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x75f5 (correct), 1:1(0) ack 2560 win 1024
    1210264100.843629 IP (tos 0x0, ttl 64, id 59600, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 2560:3072(512) ack 1 win 5840
    1210264100.850144 IP (tos 0x0, ttl 100, id 94, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x73f5 (correct), 1:1(0) ack 3072 win 1024
    1210264100.850161 IP (tos 0x0, ttl 64, id 59601, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 3584:4096(512) ack 1 win 5840
    1210264101.549783 IP (tos 0x0, ttl 100, id 95, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x73f5 (correct), 1:1(0) ack 3072 win 1024
    1210264220.864362 IP (tos 0x0, ttl 64, id 59602, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 3072:3584(512) ack 1 win 5840
    1210264220.870832 IP (tos 0x0, ttl 100, id 99, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x71f5 (correct), 1:1(0) ack 3584 win 1024
    1210264220.870874 IP (tos 0x0, ttl 64, id 59603, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 4096:4608(512) ack 1 win 5840
    1210264221.570637 IP (tos 0x0, ttl 100, id 100, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x71f5 (correct), 1:1(0) ack 3584 win 1024
    1210264221.570684 IP (tos 0x0, ttl 64, id 59604, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 3584:4096(512) ack 1 win 5840
    1210264221.577194 IP (tos 0x0, ttl 100, id 101, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x6ff5 (correct), 1:1(0) ack 4096 win 1024
    1210264221.577211 IP (tos 0x0, ttl 64, id 59605, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 4608:5120(512) ack 1 win 5840
    1210264222.271894 IP (tos 0x0, ttl 100, id 102, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x6ff5 (correct), 1:1(0) ack 4096 win 1024
    1210264341.590019 IP (tos 0x0, ttl 64, id 59606, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 4096:4608(512) ack 1 win 5840
    1210264341.593578 IP (tos 0x0, ttl 100, id 106, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x6df5 (correct), 1:1(0) ack 4608 win 1024
    1210264341.593627 IP (tos 0x0, ttl 64, id 59607, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 5120:5632(512) ack 1 win 5840
    1210264342.292717 IP (tos 0x0, ttl 100, id 110, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x6df5 (correct), 1:1(0) ack 4608 win 1024
    1210264342.292767 IP (tos 0x0, ttl 64, id 59608, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 4608:5120(512) ack 1 win 5840
    1210264342.299168 IP (tos 0x0, ttl 100, id 111, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x6bf5 (correct), 1:1(0) ack 5120 win 1024
    1210264342.299188 IP (tos 0x0, ttl 64, id 59609, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 5632:6144(512) ack 1 win 5840
    1210264342.999071 IP (tos 0x0, ttl 100, id 112, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x6bf5 (correct), 1:1(0) ack 5120 win 1024
    1210264462.315638 IP (tos 0x0, ttl 64, id 59610, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 5120:5632(512) ack 1 win 5840
    1210264462.320063 IP (tos 0x0, ttl 100, id 116, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x69f5 (correct), 1:1(0) ack 5632 win 1024
    1210264462.320099 IP (tos 0x0, ttl 64, id 59611, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 6144:6656(512) ack 1 win 5840
    1210264463.019784 IP (tos 0x0, ttl 100, id 117, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x69f5 (correct), 1:1(0) ack 5632 win 1024
    1210264463.019848 IP (tos 0x0, ttl 64, id 59612, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 5632:6144(512) ack 1 win 5840
    1210264463.026210 IP (tos 0x0, ttl 100, id 118, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x67f5 (correct), 1:1(0) ack 6144 win 1024
    1210264463.026256 IP (tos 0x0, ttl 64, id 59613, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 6656:7168(512) ack 1 win 5840
    1210264463.721025 IP (tos 0x0, ttl 100, id 119, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x67f5 (correct), 1:1(0) ack 6144 win 1024
    1210264583.041283 IP (tos 0x0, ttl 64, id 59614, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 6144:6656(512) ack 1 win 5840
    1210264583.047127 IP (tos 0x0, ttl 100, id 123, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x65f5 (correct), 1:1(0) ack 6656 win 1024
    1210264583.047166 IP (tos 0x0, ttl 64, id 59615, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 7168:7680(512) ack 1 win 5840
    1210264583.741856 IP (tos 0x0, ttl 100, id 124, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x65f5 (correct), 1:1(0) ack 6656 win 1024
    1210264583.741920 IP (tos 0x0, ttl 64, id 59616, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 6656:7168(512) ack 1 win 5840
    1210264583.748296 IP (tos 0x0, ttl 100, id 125, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x63f5 (correct), 1:1(0) ack 7168 win 1024
    1210264583.748351 IP (tos 0x0, ttl 64, id 59617, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 7680:8192(512) ack 1 win 5840
    1210264584.443149 IP (tos 0x0, ttl 100, id 126, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x63f5 (correct), 1:1(0) ack 7168 win 1024
    1210264703.766913 IP (tos 0x0, ttl 64, id 59618, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 7168:7680(512) ack 1 win 5840
    1210264703.770428 IP (tos 0x0, ttl 100, id 133, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x61f5 (correct), 1:1(0) ack 7680 win 1024
    1210264703.770462 IP (tos 0x0, ttl 64, id 59619, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 8192:8704(512) ack 1 win 5840
    1210264704.469029 IP (tos 0x0, ttl 100, id 134, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x61f5 (correct), 1:1(0) ack 7680 win 1024
    1210264704.469076 IP (tos 0x0, ttl 64, id 59620, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 7680:8192(512) ack 1 win 5840
    1210264704.475326 IP (tos 0x0, ttl 100, id 135, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x5ff5 (correct), 1:1(0) ack 8192 win 1024
    1210264704.475346 IP (tos 0x0, ttl 64, id 59621, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 8704:9216(512) ack 1 win 5840
    1210264705.170173 IP (tos 0x0, ttl 100, id 136, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x5ff5 (correct), 1:1(0) ack 8192 win 1024
    1210264824.492552 IP (tos 0x0, ttl 64, id 59622, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 8192:8704(512) ack 1 win 5840
    1210264824.496167 IP (tos 0x0, ttl 100, id 140, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x5df5 (correct), 1:1(0) ack 8704 win 1024
    1210264824.496197 IP (tos 0x0, ttl 64, id 59623, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: . 9216:9728(512) ack 1 win 5840
    1210264825.191007 IP (tos 0x0, ttl 100, id 141, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x5df5 (correct), 1:1(0) ack 8704 win 1024
    1210264825.191069 IP (tos 0x0, ttl 64, id 59624, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 8704:9216(512) ack 1 win 5840
    1210264825.197574 IP (tos 0x0, ttl 100, id 142, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x5bf5 (correct), 1:1(0) ack 9216 win 1024
    1210264825.197623 IP (tos 0x0, ttl 64, id 59625, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.33718 > 192.168.200.150.9100: P 9728:10240(512) ack 1 win 5840
    1210264825.897222 IP (tos 0x0, ttl 100, id 143, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.33718: ., cksum 0x5bf5 (correct), 1:1(0) ack 9216 win 1024
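
The stall visible in this trace can be put into numbers with a short sketch. The five samples below are copied from the printer's ACK lines above (timestamp and acknowledged relative sequence number, taken each time the ACK value first advances). The struct and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* One printer ACK from the trace: capture time and acknowledged offset. */
struct ack_sample {
	long long sec;	/* seconds part of the tcpdump timestamp */
	long long usec;	/* microseconds part of the timestamp */
	long long ack;	/* relative sequence number being acknowledged */
};

/* Values copied from the trace; each entry is the first ACK of that value. */
const struct ack_sample printer_acks[] = {
	{ 1210263859LL, 400855LL, 1024 },
	{ 1210263979LL, 421796LL, 1536 },
	{ 1210263980LL, 127898LL, 2048 },
	{ 1210264100LL, 148733LL, 2560 },
	{ 1210264100LL, 850144LL, 3072 },
};

/* Elapsed microseconds from sample a to sample b. */
long long usec_between(const struct ack_sample *a, const struct ack_sample *b)
{
	return (b->sec - a->sec) * 1000000LL + (b->usec - a->usec);
}

/* Longest pause between two consecutive ACK advances, in microseconds. */
long long longest_gap_usec(const struct ack_sample *s, size_t n)
{
	long long best = 0;

	for (size_t i = 1; i < n; i++) {
		long long gap = usec_between(&s[i - 1], &s[i]);

		if (gap > best)
			best = gap;
	}
	return best;
}

/* Average acknowledged bytes per second over the whole sample window. */
double bytes_per_second(const struct ack_sample *s, size_t n)
{
	long long span;

	if (n < 2)
		return 0.0;
	span = usec_between(&s[0], &s[n - 1]);
	if (span <= 0)
		return 0.0;
	return (double)(s[n - 1].ack - s[0].ack) * 1e6 / (double)span;
}
```

For these samples the longest pause is about 120 seconds and the average comes to roughly 8.5 bytes per second: two 512-byte segments get through per cycle, followed by a two-minute wait for the next retransmission timeout.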


  7. Bug#478062: Fix FRTO+NewReno problem (Was: Re: This has a work around)

    Ilpo Järvinen wrote:

    Snip
    >
    > Could you next try with tcp_frto set to 1, if my theory proves to be
    > correct, it too should be "enough" to fix the problem (in this
    > particular case). Of course you can verify the patch below too if you
    > want to, the patch should allow cups<->printer to work with tcp_frto =
    > 2 too. In case you have problem to apply the patch to the particular
    > version you're want to try with, just send a note about the version
    > number to me so I can adapt the patch for you (space etc. formatting
    > issues may show up because I recently run a code style cleanup on the
    > tcp code).
    >

    Ilpo,

    and all others, please ignore frtoprob1.txt. I ran the test on the
    wrong kernel. The tested kernel was 2.6.23.17-amd64 and should have
    been 2.6.24-1-amd64. I am re-running the test and will forward the new
    dump. No idea why 2.6.23.17 did not print, it always did before. I am
    sure I did something stupid, feeling a bit under the weather. Sorry for
    the confusion. New test coming shortly.



    --
    Damon L. Chesser
    damon@damtek.com
    http://www.linkedin.com/in/dchesser






  8. Bug#478062: Fix FRTO+NewReno problem (Was: Re: This has a work around)

    Ilpo Järvinen wrote:
    > SNIP


    >
    > Could you next try with tcp_frto set to 1, if my theory proves to be
    > correct, it too should be "enough" to fix the problem (in this
    > particular case). Of course you can verify the patch below too if you
    > want to, the patch should allow cups<->printer to work with tcp_frto =
    > 2 too. In case you have problem to apply the patch to the particular
    > version you're want to try with, just send a note about the version
    > number to me so I can adapt the patch for you (space etc. formatting
    > issues may show up because I recently run a code style cleanup on the
    > tcp code).


    Ilpo,

    I reran the print job with the correct kernel (for control reasons) and
    received the same results: with tcp_frto=1, no print; with tcp_frto=0, I can
    print. Attached is the output of tcpdump.

    uname -r = 2.6.24-1-amd64

    Thanks for the work!


    --
    Damon L. Chesser
    damon@damtek.com
    http://www.linkedin.com/in/dchesser


    1210269685.442292 IP (tos 0x0, ttl 64, id 65247, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: P 1730585386:1730585898(512) ack 129179649 win 5840
    1210269685.446179 IP (tos 0x0, ttl 100, id 22, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x7134 (correct), 1:1(0) ack 512 win 1024
    1210269685.446221 IP (tos 0x0, ttl 64, id 65248, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: P 1024:1536(512) ack 1 win 5840
    1210269686.140866 IP (tos 0x0, ttl 100, id 23, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x7134 (correct), 1:1(0) ack 512 win 1024
    1210269714.626282 IP (tos 0x0, ttl 64, id 65249, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: . 512:1024(512) ack 1 win 5840
    1210269714.632392 IP (tos 0x0, ttl 100, id 24, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6f34 (correct), 1:1(0) ack 1024 win 1024
    1210269714.632425 IP (tos 0x0, ttl 64, id 65250, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: . 1536:2048(512) ack 1 win 5840
    1210269715.327141 IP (tos 0x0, ttl 100, id 25, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6f34 (correct), 1:1(0) ack 1024 win 1024
    1210269772.998283 IP (tos 0x0, ttl 64, id 65251, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: P 1024:1536(512) ack 1 win 5840
    1210269773.005111 IP (tos 0x0, ttl 100, id 29, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6d34 (correct), 1:1(0) ack 1536 win 1024
    1210269773.005144 IP (tos 0x0, ttl 64, id 65252, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: P 2048:2560(512) ack 1 win 5840
    1210269773.704682 IP (tos 0x0, ttl 100, id 30, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6d34 (correct), 1:1(0) ack 1536 win 1024
    1210269889.738282 IP (tos 0x0, ttl 64, id 65253, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: . 1536:2048(512) ack 1 win 5840
    1210269889.745981 IP (tos 0x0, ttl 100, id 37, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6b34 (correct), 1:1(0) ack 2048 win 1024
    1210269889.746012 IP (tos 0x0, ttl 64, id 65254, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: . 2560:3072(512) ack 1 win 5840
    1210269890.444886 IP (tos 0x0, ttl 100, id 38, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6b34 (correct), 1:1(0) ack 2048 win 1024
    1210270009.746305 IP (tos 0x0, ttl 64, id 65255, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: P 2048:2560(512) ack 1 win 5840
    1210270009.752297 IP (tos 0x0, ttl 100, id 42, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6934 (correct), 1:1(0) ack 2560 win 1024
    1210270009.752335 IP (tos 0x0, ttl 64, id 65256, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: P 3072:3584(512) ack 1 win 5840
    1210270010.449445 IP (tos 0x0, ttl 100, id 43, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6934 (correct), 1:1(0) ack 2560 win 1024
    1210270129.749294 IP (tos 0x0, ttl 64, id 65257, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: . 2560:3072(512) ack 1 win 5840
    1210270129.755196 IP (tos 0x0, ttl 100, id 47, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6734 (correct), 1:1(0) ack 3072 win 1024
    1210270129.755227 IP (tos 0x0, ttl 64, id 65258, offset 0, flags [DF], proto TCP (6), length 552) 192.168.200.15.53527 > 192.168.200.150.9100: . 3584:4096(512) ack 1 win 5840
    1210270130.453997 IP (tos 0x0, ttl 100, id 48, offset 0, flags [none], proto TCP (6), length 40) 192.168.200.150.9100 > 192.168.200.15.53527: ., cksum 0x6734 (correct), 1:1(0) ack 3072 win 1024


  9. Bug#478062: Fix FRTO+NewReno problem (Was: Re: This has a work around)

    On Thu, 8 May 2008, Damon L. Chesser wrote:

    > Ilpo Järvinen wrote:
    > > SNIP

    >
    > >
    > > Could you next try with tcp_frto set to 1, if my theory proves to be
    > > correct, it too should be "enough" to fix the problem (in this particular
    > > case). Of course you can verify the patch below too if you want to, the
    > > patch should allow cups<->printer to work with tcp_frto = 2 too. In case you
    > > have problem to apply the patch to the particular version you're want to try
    > > with, just send a note about the version number to me so I can adapt the
    > > patch for you (space etc. formatting issues may show up because I recently
    > > run a code style cleanup on the tcp code).

    >
    > reran the print job with the correct kernel (for control reasons) and received
    > the same results: tcp_frto=1 no print. tcp_frto=0 I can print. Attached is
    > the output of tcpdump
    >
    > uname -r = 2.6.24-1-amd64


    Well, that was a surprise; there must be something else too that I haven't
    noticed yet. I don't think it's necessary for you to test the patch I
    sent earlier (basically, the code paths it would have fixed were already in
    use with tcp_frto=1). That patch was "obviously correct" anyway, though
    it wasn't enough to fix this issue.

    ...I too can probably reproduce this locally with a small amount of work
    because the receiver pattern is dead obvious from the logs.

    --
    i.

  10. Bug#478062: Fix FRTO+NewReno problem (Was: Re: This has a work around)

    On Thu, 8 May 2008, Ilpo Järvinen wrote:

    > On Thu, 8 May 2008, Damon L. Chesser wrote:
    >
    > > reran the print job with the correct kernel (for control reasons) and
    > > received
    > > the same results: tcp_frto=1 no print. tcp_frto=0 I can print. Attached
    > > is
    > > the output of tcpdump
    > >
    > > uname -r = 2.6.24-1-amd64

    >
    > Well, that was a surprise, there must be something else too I didn't yet
    > notice. I don't think it's that necessary for you to test that patch I sent
    > earlier (basically the code paths it would have fixed were already in use with
    > tcp_frto=1). And that patch was "obviously correct" anyway though it wasn't
    > enough to fix this issue.
    >
    > ...I too can probably reproduce this locally with small amount of work because
    > the receiver pattern is dead obvious from the logs.


    Yes indeed, some hping3 tcl acting as a clone of that network printer did
    it :-). Below is the 2nd patch (both are necessary). Besides them, there's
    still the SACKFRTO snd_nxt != frto_highmark problem remaining, but it is a lot
    less severe and rarer than this problem was, and I'm still trying to find a
    simple way to fix it w/o adding another u32 to tcp_sock. I may need to
    think more about this retrans_stamp usage in the rest of the TCP code too, as
    it seems to be somewhat suspicious here and there.

    --
    i.

    ps. ...you could have at least considered reporting upstream a bit
    earlier if some problem goes away/appears by changing kernel version
    (especially since you already tried some non-distro kernels and found
    them non-working); it might help to catch the attention of devs, who hardly
    hang around distro bug trackers :-).

    --
    [PATCH] [TCP] FRTO: Fix fallback to conventional recovery

    It seems that commit 009a2e3e4ec ("[TCP] FRTO: Improve
    interoperability with other undo_marker users") ran into
    another land-mine which caused fallback to conventional
    recovery to break:

    1. Cumulative ACK arrives after FRTO retransmission
    2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp
       which should be kept like in CA_Loss state it would be
    3. undo_marker change allowed tcp_packet_delayed to return
       true because of the cleared retrans_stamp once FRTO is
       terminated causing LossUndo to occur, which means all loss
       markings FRTO made are reverted.

    This means that the conventional recovery basically recovered
    one loss per RTO, which is not that efficient. It becomes a
    serious problem to progress of the flow if many segments were
    lost or when losses will persist to the FRTO RTTs as well.
    Retrans_stamp was incorrectly cleared even before that
    particular change (though its effect is not often significant).

    It was quite unobvious that the undo_marker change broke
    something like this; I had quite a long session tracking it
    down because of the non-intuitiveness of the bug (luckily I
    had a trivial reproducer at hand and I was also able to learn
    to use kprobes in the process as well :-)).

    This together with the NewReno+FRTO fix (62ab22278308a)
    should finally fix Damon's problems.

    Compile tested (but I did experiment with a similar fix on
    a live kernel with systemtap+kprobes).

    Signed-off-by: Ilpo Järvinen
    Reported-by: Damon L. Chesser
    ---
    net/ipv4/tcp_input.c | 2 +-
    1 files changed, 1 insertions(+), 1 deletions(-)

    diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
    index 81ece1f..4c2255c 100644
    --- a/net/ipv4/tcp_input.c
    +++ b/net/ipv4/tcp_input.c
    @@ -2481,7 +2481,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)

    tcp_verify_left_out(tp);

    - if (tp->retrans_out == 0)
    + if (!tp->frto_counter && tp->retrans_out == 0)
    tp->retrans_stamp = 0;

    if (flag & FLAG_ECE)
    --
    1.5.2.2
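
The one-line change above can be illustrated with a small userspace model. This is a sketch under stated assumptions: struct tcp_sock_model and the three functions are hypothetical and not kernel code, and undo_allowed() reduces tcp_packet_delayed() to the single condition named in the commit message (a cleared retrans_stamp), ignoring anything else the real function may check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three tcp_sock fields the patch touches. */
struct tcp_sock_model {
	unsigned int frto_counter;	/* non-zero while F-RTO is in progress */
	unsigned int retrans_out;	/* retransmitted segments still in flight */
	unsigned int retrans_stamp;	/* time of first retransmission, 0 = none */
};

/* Pre-patch logic in tcp_try_to_open(): clear whenever nothing is in flight. */
void clear_stamp_old(struct tcp_sock_model *tp)
{
	if (tp->retrans_out == 0)
		tp->retrans_stamp = 0;
}

/* Patched logic: leave the stamp alone while F-RTO is still running. */
void clear_stamp_new(struct tcp_sock_model *tp)
{
	if (!tp->frto_counter && tp->retrans_out == 0)
		tp->retrans_stamp = 0;
}

/*
 * Simplified assumption for this sketch: once retrans_stamp is cleared,
 * the undo path treats the loss markings as spurious and reverts them.
 */
bool undo_allowed(const struct tcp_sock_model *tp)
{
	return tp->retrans_stamp == 0;
}
```

With frto_counter non-zero and retrans_out at zero (step 2 of the list in the commit message), clear_stamp_old() zeroes the stamp and the model permits the undo, while clear_stamp_new() keeps the stamp so the loss markings survive the fallback.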
