After a while all outbound connections get stuck in SYN_SENT - Networking


  1. After a while all outbound connections get stuck in SYN_SENT

    I have a Java application that makes a large number of outbound
    web service calls over HTTP/TCP. The hosts contacted are a fixed set
    of about 2000 hosts, and a pool of 200 Java threads makes a web
    service call to each of them approximately every 5 minutes. At any
    given time some fraction of these hosts is unreachable for one
    reason or another, so there is a persistent count of about 60-80
    sockets in the SYN_SENT state. This is fine, as these failed
    connection attempts eventually time out.
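
One way to track the SYN_SENT baseline described above is to tally socket states from /proc/net/tcp. The following is a minimal sketch, assuming a Linux box and IPv4 sockets only; the function name count_tcp_states is hypothetical, and the state codes are the hex values the kernel prints in the "st" column.

```python
from collections import Counter

# Hex codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Return a Counter of TCP state names from the text of /proc/net/tcp.

    Hypothetical helper: it takes the file contents as a string so it can
    be exercised without access to /proc.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On the affected box this could be called as count_tcp_states(open("/proc/net/tcp").read())["SYN_SENT"] from a cron job or a loop, so that the jump from 60-80 to 200 gets a timestamp.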

    However, after approximately 38 hours of operation, all outbound
    connection attempts get stuck in the SYN_SENT state. It happens
    instantaneously: we go from the baseline of about 60-80 sockets
    in SYN_SENT to a count of 200 (corresponding to the number of Java
    threads that make these calls). I've tried several things to clear
    this problem up, including:

    1) Restarting the Java application
    2) ip route flush cache
    3) Start/stop networking
    4) rmmod/insmod the kernel driver for the NIC
    5) Tuning of /proc/sys/net/ipv4/tcp_syn_retries
    6) Disabling /proc/sys/net/ipv4/tcp_syncookies

    However, after each of these countermeasures, the outbound connections
    still get stuck in SYN_SENT. During this time, I am still able to SSH
    to the box and run wget www.google.com, etc, so the problem appears to
    be specific to the hosts that I'm accessing via the webservices. The
    only thing that makes this problem go away is to restart the entire
    Linux box. Once I do this and restart my application it works
    perfectly fine... for 38 hours until it occurs again.
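
To check whether the failure really is specific to the web service hosts, one option is to probe a few of them and a known-good host from outside the Java application. This is only a sketch: can_connect is a hypothetical helper, and the 5-second timeout is an assumed value, not something taken from the application.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) completes in time.

    Hypothetical helper. A connect that times out, is refused, or fails
    name resolution is reported as False; all of those raise OSError
    subclasses in Python 3.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running this against one of the stuck hosts and against www.google.com on port 80 while the problem is occurring would show whether a plain connect() outside the JVM also hangs in SYN_SENT.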

    I'm running kernel 2.6.18 on RedHat, but have had this problem occur
    on other kernel versions. I've also had this problem occur on
    different boxes, NICs, routers, co-location facilities, and several
    other variables. The only thing in common is my application and the
    fact that it is Linux, so I have to believe that my application is
    causing something weird in the kernel, since an application restart
    doesn't help.

    Any ideas?

  2. Re: After a while all outbound connections get stuck in SYN_SENT

    On Mon, 10 Dec 2007 07:22:50 -0800 (PST), I waved a wand and this
    message magically appears in front of JamesNichols3@gmail.com:

    > Any ideas?


    In my experience Java on Linux is very poor at this sort of thing.
    Which JVM are you using with Linux?

    Native networking apps work perfectly well though. If you rewrite your
    Java app as a pure C app, it will solve all your problems.
    --
    http://www.munted.org.uk

    Fearsome grindings.

  3. Re: After a while all outbound connections get stuck in SYN_SENT

    I figured out a countermeasure.

    Bear in mind that the application works flawlessly for 38 hours before
    all outbound connections get stuck in SYN_SENT state.

    When the problem occurred after 38 hours, I disabled tcp_sack and the
    problem went away, i.e. connectivity was restored. I'm not sure, but
    I suspect there is a bug in the Linux tcp_sack implementation that is
    causing a buffer or memory structure to fill up after 38 hours. Any
    thoughts on how to pin this issue down to a specific problem in the
    kernel?
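
When narrowing this down, it may help to record the TCP settings mentioned in this thread before and after each change. A minimal sketch, assuming the standard /proc/sys layout; read_sysctl and snapshot are hypothetical helper names, and the root parameter exists only so the code can be pointed at a test directory.

```python
import os


def read_sysctl(name, root="/proc/sys"):
    """Read one sysctl value, e.g. "net.ipv4.tcp_sack", as a string."""
    # Dotted sysctl names map to path components under /proc/sys.
    path = os.path.join(root, *name.split("."))
    with open(path) as f:
        return f.read().strip()


def snapshot(names, root="/proc/sys"):
    """Return a dict mapping each sysctl name to its current value."""
    return {name: read_sysctl(name, root) for name in names}
```

For example, snapshot(["net.ipv4.tcp_sack", "net.ipv4.tcp_syn_retries", "net.ipv4.tcp_syncookies"]) captures the three settings touched above. Disabling SACK itself is done by writing 0 to /proc/sys/net/ipv4/tcp_sack as root, which this sketch deliberately does not do.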
