Re: Connections hung TIME_WAIT state - TCP-IP



Thread: Re: Connections hung TIME_WAIT state

  1. Re: Connections hung TIME_WAIT state

    Kami wrote:
    > here goes the problem....


    > there are two servers, server A and server B.... Server A is running
    > apache, and server B is running memcached (database query result
    > caching)...


    > now server A connects to server B on a specified port...
    > I'm using ab to generate requests on server A locally, so a high-load
    > situation can be simulated for the caching server...
    > Now the problem is... server B cleans up its connections
    > properly... but on server A, the connections keep hanging in TIME_WAIT
    > state, their number keeps increasing, and eventually I get connection
    > timeouts to server B...


    How do you know the connections are "hung" in TIME_WAIT?

    I believe the "problem" is that the application software is trying to
    establish and tear-down TCP connections "too fast" where too fast is

    >= sizeof(clientportspace)/lengthof(TIME_WAIT)



    > here is a list of variables, along with the values I changed, in an
    > attempt to forcefully kill the TIME_WAIT connections...


    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait=1
    > net.ipv4.tcp_fin_timeout=1


    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_close=1
    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait=1
    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack=1
    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait=1
    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait=1


    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv=2
    > net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent=2


    > net.ipv4.tcp_fin_timeout=1


    > these variables were changed using sysctl -w, but to no avail...


    Indeed, because none of them are directly related to the problem your
    applications have. I hope you set those things back.

    TIME_WAIT is an integral part of TCP's correctness algorithms. It is
    there to protect new connections by the same "name" from inadvertently
    accepting segments from old connections and thus corrupting data.

    Strictly speaking, TIME_WAIT is supposed to last as long as four
    minutes, so the connection rate that could result in attempts to reuse
    a TCP connection name (local/remote IP, local/remote port) that is
    still in TIME_WAIT would be:

    sizeof(portspace)/240

    If your client application is allowing the stack to pick the local
    port number (e.g. is not calling bind() to pick a port number itself),
    then likely as not, the range of ports it gets will be 49152 to 65535,
    or ~16384 port numbers:

    16384/240

    or 68 connections per second.
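    That arithmetic is simple enough to check in a few lines; a quick
    sketch (the helper name is illustrative), using the figures above: a
    16384-port ephemeral range and the full 240-second TIME_WAIT:

    ```python
    # Rough ceiling on connections/sec from one client to one (remote IP,
    # remote port) before the client starts trying to reuse a connection
    # name that is still sitting in TIME_WAIT.

    def max_conn_rate(port_lo, port_hi, time_wait_secs):
        ports = port_hi - port_lo + 1      # size of the local port space
        return ports / time_wait_secs      # connections per second

    rate = max_conn_rate(49152, 65535, 240)
    print(int(rate))   # prints 68
    ```

    Push past that rate and connect() attempts start colliding with
    four-tuples the stack is still holding in TIME_WAIT.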

    The best "fix" is to get your applications to use long-lived TCP
    connections. The next best fix after that is to broaden the number of
    ports (and perhaps IP addresses) involved. One way to do that is to
    have the application attempt to bind() to port numbers in the range of
    say 5000 to 65535. That would increase the rate before attempted
    TIME_WAIT reuse to

    65000/240

    or ~270 connections per second.
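    As a sketch of that bind()-before-connect approach (in Python for
    brevity; the helper name, port range, and retry count are illustrative,
    not from any real client):

    ```python
    import random
    import socket

    # Pick an explicit local port from a wide range before connecting, so
    # the client draws from ~60k port numbers instead of the default
    # ephemeral range. A port still in TIME_WAIT (or otherwise taken)
    # fails the bind() with EADDRINUSE and is simply skipped.

    def connect_from_wide_range(host, port, lo=5000, hi=65535, attempts=100):
        for _ in range(attempts):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                s.bind(("", random.randint(lo, hi)))  # explicit local port
                s.connect((host, port))
                return s
            except OSError:
                s.close()                             # port busy; try another
        raise OSError("no usable local port found")
    ```

    Skipping busy ports instead of setting SO_REUSEADDR is deliberate:
    the point is to spread load across the port space, not to barge into
    TIME_WAIT.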

    You could achieve similar results by spreading the traffic across a
    larger number of IP addresses - on the client, the server, or both.

    A much more distant fourth option is to decrease the length of TIME_WAIT
    to, say, 60 seconds (math left as an exercise for the reader).

    However, you should "never" take steps to make there be no TIME_WAIT
    state at all, such as using an "abortive" close.

    rick jones
    --
    firebug n, the idiot who tosses a lit cigarette out his car window
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  2. Re: Connections hung TIME_WAIT state


    In article , Rick Jones writes:
    > Kami wrote:
    >
    > > now server A connects to server B on a specified port...
    > > I'm using ab to generate requests on server A locally, so a high-load
    > > situation can be simulated for the caching server...
    > > Now the problem is... server B cleans up its connections
    > > properly... but on server A, the connections keep hanging in TIME_WAIT
    > > state, their number keeps increasing, and eventually I get connection
    > > timeouts to server B...

    >
    > How do you know the connections are "hung" in TIME_WAIT?


    It's a pity that the TCP designers didn't give the state a more
    explanatory name, like I_HAVE_TO_WAIT_FOR_A_CERTAIN_LENGTH_OF_TIME.

    > I believe the "problem" is that the application software is trying to
    > establish and tear-down TCP connections "too fast" where too fast is
    >
    > >= sizeof(clientportspace)/lengthof(TIME_WAIT)


    Y'know, I just went through this with one of our testers, who had
    a similar "load test" that fired SOAP requests at our server as
    fast as it could using hundreds of client threads. Fortunately,
    he took an Ethereal trace before reporting a problem, so I could
    show him right where he started reusing client ports that hadn't
    finished TIME_WAIT yet.

    > If your client application is allowing the stack to pick the local
    > port number (eg is not calling bind() to pick a port number itself),
    > then likely as not, the range of ports it gets will be 49152 to 65535
    > or ~16384 port numbers:
    >
    > 16384/240
    >
    > or 68 connections per second.
    >
    > The best "fix" is to get your applications to use long-lived TCP
    > connections. The next best fix after that is to broaden the number of
    > ports (and perhaps IP addresses) involved. One way to do that is to
    > have the application attempt to bind() to port numbers in the range of
    > > say 5000 to 65535. That would increase the rate before attempted
    > TIME_WAIT reuse to
    >
    > 65000/240
    >
    > or ~270 connections per second.


    Er, 60536/240, or ~252 connections per second. Though of course
    the principle is correct.

    On some platforms you can change the range of ports the system
    will assign for ephemeral use, rather than making the application
    bind explicitly on the client side, though of course at some loss
    of portability.
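    On Linux, for instance, the system-wide knob is the ephemeral port
    range sysctl (a config sketch; the knob name and defaults differ on
    other platforms):

    ```shell
    # Widen the range of local ports the stack assigns automatically,
    # instead of bind()ing explicitly in the application (Linux-specific).
    sysctl -w net.ipv4.ip_local_port_range="5000 65535"
    ```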

    Also, when testing over a loopback connection, if the server could
    use a port in the client's port space, watch out for self-connect,
    if the platform's stack supports it. *That* can produce some odd
    errors in testing.

    See e.g.
    http://groups.google.com/group/comp....0ce279fd1e2db0
    or
    http://tinyurl.com/owkhd
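    The self-connect quirk mentioned above is easy to reproduce on a stack
    that supports TCP simultaneous open (Linux does); a minimal sketch:

    ```python
    import socket

    # Bind a TCP socket to a loopback port, then connect it to its own
    # address: on stacks supporting simultaneous open the connect succeeds
    # with no listener anywhere, and the socket talks to itself. A load
    # generator that trips over this "connects" to a server that was
    # never there, producing exactly the odd test failures described.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))          # kernel picks a free port
    s.connect(s.getsockname())        # self-connect: no listen() involved
    s.sendall(b"ping")
    print(s.recv(4))                  # the socket receives its own bytes
    s.close()
    ```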

    --
    Michael Wojcik michael.wojcik@microfocus.com

    Most people believe that anything that is true is true for a reason.
    These theorems show that some things are true for no reason at all,
    i.e., accidentally, or at random. -- G J Chaitin

  3. Re: Connections hung TIME_WAIT state


    >> How do you know the connections are "hung" in TIME_WAIT?


    > It's a pity that the TCP designers didn't give the state a more
    > explanatory name, like I_HAVE_TO_WAIT_FOR_A_CERTAIN_LENGTH_OF_TIME.


    Then people would have been complaining about how much space that
    consumed in netstat output...

    FWIW, the purpose of TIME_WAIT is discussed in the TCP RFCs and just
    about any decent book on TCP out there.


    > On some platforms you can change the range of ports the system will
    > assign for ephemeral use, rather than making the application bind
    > explicitly on the client side, though of course at some loss of
    > portability.


    Indeed. My preference is to have ways to configure the application to
    do the right thing without relying on the system administrator.

    rick jones
    --
    firebug n, the idiot who tosses a lit cigarette out his car window
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  4. Re: Connections hung TIME_WAIT state

    Thank you all for your replies.
    I wasn't able to come online for the past few days, and will be going
    through your replies in detail now...

