TCP socket, shutdown, eats last few bytes in buffer (rarely) - Unix


  1. TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Linux 2.6.19

    Open input and output TCP sockets.
    Copy from input to output (a lot).
    The last write() comes back without any errors.
    Shutdown the socket

    Usually this works. However, every so often the shutdown() seems to eat
    whatever the last write() sent to the transmission buffer, so that the
    next node never receives it. The shutdown is like this:

    if (0 != shutdown(fd, SHUT_RDWR)) {
        fprintf(stderr, "Error on shutdown of network stream\n");
    }

    and it never emits the error message so I assume it closed correctly.
    However sometimes the next node gets stuck waiting for the data from the
    last write(), and that node eventually times out.

    Shouldn't shutdown() used this way on a TCP socket wait until the
    data has been received at the other end before tearing down the connection?

    Here is a specific example of a failure. The 18th node in the chain
    (the error hops around and does not occur on every run), with heavy
    debugging turned on, shows that the output buffer emptied down to just
    5016 bytes when the input socket closed:

    monkey19.cluster: Flushing buffer... 2008-11-03 15:52:22.391
    DEBUG PREIO RB 3 type a holds 5016 misses 1 count 204794984
    DEBUG POSTIO RB 3 bytes moved 5016 count 204800000

    (at this point write() has returned and all 204800000 bytes have been
    written)

    DEBUG closing fd 7 type 6
    DEBUG closing fd 8 type A <-- this is where the shutdown() is called

    So far, so good, and it does not look different from any other
    successful run, but on the next node downstream the log file holds:

    DEBUG - select - fdmax 7
    DEBUG PREIO RB 2 type 9 holds 204793536 misses 2 count 204793536
    DEBUG POSTIO RB 2 bytes moved 1448 count 204794984
    DEBUG - select - fdmax 7
    monkey20.cluster: No input data received within Timeout period of 10 seconds.

    Translation: the downstream node read up to byte 204794984 of the TCP
    stream, whereupon it hangs, receives no more input, and eventually
    times out. (Specifically, it does NOT see the connection closed, as
    that should have triggered the select(), not fallen into the signal
    handler.) But 204794984 + 5016 = 204800000. So it is waiting for
    exactly the amount of the last write() on the preceding node. I have
    seen 4 of these, and the values vary, but it is always like this:
    last write() amount == missing amount on next node.
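
For reference, the receiver-side distinction drawn above (data arrived, connection closed, or nothing at all within the timeout) could be sketched roughly as follows. This is a hypothetical helper, not the code from the program under discussion: the name read_with_timeout, the enum, and the millisecond timeout are all assumptions made for illustration.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum wait_result { WAIT_DATA, WAIT_EOF, WAIT_TIMEOUT, WAIT_ERROR };

/* Hypothetical helper: wait up to timeout_ms for input on fd, then do one read.
 * WAIT_DATA    - data arrived; the byte count is stored in *nread
 * WAIT_EOF     - read() returned 0, i.e. the peer shut down its sending side
 * WAIT_TIMEOUT - neither data nor end-of-stream was seen in time
 * WAIT_ERROR   - select() or read() failed (including EINTR) */
static enum wait_result read_with_timeout(int fd, void *buf, size_t len,
                                          int timeout_ms, ssize_t *nread)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0)
        return WAIT_ERROR;
    if (ready == 0)
        return WAIT_TIMEOUT;

    *nread = read(fd, buf, len);
    if (*nread < 0)
        return WAIT_ERROR;
    return *nread == 0 ? WAIT_EOF : WAIT_DATA;
}
```

A timeout with no WAIT_EOF beforehand matches the symptom in the log: the receiver saw neither the missing bytes nor the end of the stream.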

    After the last input data has been read (which it knows because the
    input socket was closed at the other end), but before that data has been
    sent to the output socket, this code is executed.

    fprintf(stderr,"Flushing buffer... %s",show_time());
    signal(SIGALRM, SIG_IGN);

    The show_time() function calls time(), localtime(), and gettimeofday().
    It uses them to construct this format: "2008-11-03 15:52:14.590"

    The SIGALRM is canceled because up to this point it has been attached to
    a signal handler which goes off if more than N seconds pass with no input
    received. Once the input socket closes that timer must not run any more
    (since definitely no more input will be found), yet the IO loop may not
    complete for a while as it unspools the data to the output.

    I'm wondering if every so often one of these time/signal related calls
    interferes with the subsequent shutdown(). Or maybe there is something
    else one is supposed to do to be sure that the transmission buffer is
    empty before shutdown() is called? Some sort of "netflush"???

    Thanks,

    David Mathog

  2. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Nov 4, 2:10 am, David Mathog wrote:
    > On Linux 2.6.19
    >
    > Open input and output TCP sockets.
    > Copy from input to output (a lot).
    > The last write() comes back without any errors.
    > Shutdown the socket
    >
    > Usually this works. However every so often the shutdown() seems to eat
    > whatever the last write() sent to the transmission buffer, such that the
    > next node never receives it. The shutdown is like this:
    >
    > if(0!=shutdown(fd, SHUT_RDWR)){
    > fprintf(stderr, "Error on shutdown of network stream\n");
    > }
    >
    > and it never emits the error message so I assume it closed correctly.
    > However sometimes the next node gets stuck waiting for the data from the
    > last write(), and that node eventually times out.
    >
    > Shouldn't shutdown() used this way on a TCP socket wait until the
    > data has been received at the other end before tearing down the connection?


    And what happens if it's never received?
    I think you can use fsync(2) with a socket.

    I.e.:

    last_write(fd);
    fsync(fd);
    shutdown(fd, SHUT_RDWR);

  3. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Nov 4, 12:10 am, David Mathog wrote:
    > On Linux 2.6.19
    >
    >   Open input and output TCP sockets.
    >   Copy from input to output (a lot).
    >   The last write() comes back without any errors.
    >   Shutdown the socket
    >
    > Usually this works. However every so often the shutdown() seems to eat
    > whatever the last write() sent to the transmission buffer, such that the
    > next node never receives it. The shutdown is like this:
    >
    >   if(0!=shutdown(fd, SHUT_RDWR)){
    >     fprintf(stderr, "Error on shutdown of network stream\n");
    >   }
    >
    > and it never emits the error message so I assume it closed correctly.
    > However sometimes the next node gets stuck waiting for the data from the
    > last write(), and that node eventually times out.
    >
    > Shouldn't shutdown() used this way on a TCP socket wait until the
    > data has been received at the other end before tearing down the connection?


    Only if you set the SO_LINGER socket option. Otherwise the call returns
    immediately and the waiting is done in the background (by the kernel).

    The close() call does the same thing as shutdown(fd, SHUT_RDWR) does, so
    in this case shutdown() is unnecessary; just close() the socket.
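
As a rough illustration of the SO_LINGER option mentioned above, here is a minimal sketch. The helper name enable_linger and the choice of timeout are assumptions for the example, not anything taken from the original program. With l_onoff set and a positive l_linger, close() is documented in socket(7) to wait until queued data has been sent or the timeout expires; note that l_linger of 0 requests an abortive close instead, and that neither setting says anything about whether the receiving application has actually read the data.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable SO_LINGER on fd with a timeout in seconds.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int enable_linger(int fd, int seconds)
{
    struct linger lg;

    assert(fd >= 0 && seconds > 0);   /* 0 would mean "abort on close" */
    lg.l_onoff = 1;                   /* turn lingering on */
    lg.l_linger = seconds;            /* upper bound on the wait */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

It would be called once after the socket is created or connected, for example enable_linger(fd, 10), before the final write and close().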

    [...application log snipped...]

    Your application log is pretty much useless for debugging this sort of
    problem.

    You need to look at tcpdump (or wireshark) output on the sender to see
    what was the last TCP segment sent (it should be a TCP data segment
    with FIN flag set) when you called shutdown() or close().

    > Translation, the downstream node read up to byte 204794984 of the TCP
    > stream, whereupon it hangs and receives no more input and eventually
    > times out. (Specifically, it does NOT see the connection closed, as
    > that should have triggered the select(), not fallen into the signal
    > handler.) But 204794984 + 5016 = 204800000. So it is waiting for
    > exactly the amount of the last write() on the preceding node. I have
    > seen 4 of these, and the values vary, but it is always like this,
    > last write() amount == missing amount on next node.


    Again, you should be looking at tcpdump on the receiver when it times
    out. If you see that a TCP segment with the FIN flag set has arrived by
    the time it times out, it must be a problem with the receiver.

    > After the last input data has been read (which it knows because the
    > input socket was closed at the other end), but before that data has been
    > sent to the output socket, this code is executed.
    >
    >   fprintf(stderr,"Flushing buffer... %s",show_time());
    >   signal(SIGALRM, SIG_IGN);
    >
    > The show_time() function calls time(), localtime(), and gettimeofday().
    > It uses them to construct this format: "2008-11-03 15:52:14.590"
    >
    > The SIGALRM is canceled because it has been up to this point attached to
    > a signal handler which goes off if more than N seconds pass with no input
    > received. Once the input socket closes that timer must not run any more
    > (since definitely no more input will be found), yet the IO loop may not
    > complete for a while as it unspools the data to the output.
    >
    > I'm wondering if every so often one of these time/signal related calls
    > interferes with the subsequent shutdown(). Or maybe there is something
    > else one is supposed to do to be sure that the transmission buffer is
    > empty before shutdown() is called? Some sort of "netflush"???


    It would be quite helpful if you posted your send()ing and recv()ing
    code.

    --
    Max

  4. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    David Mathog wrote:
    > On Linux 2.6.19


    > Open input and output TCP sockets.
    > Copy from input to output (a lot).
    > The last write() comes back without any errors.
    > Shutdown the socket


    > Usually this works. However every so often the shutdown() seems to eat
    > whatever the last write() sent to the transmission buffer, such that the
    > next node never receives it. The shutdown is like this:


    > if(0!=shutdown(fd, SHUT_RDWR)){
    > fprintf(stderr, "Error on shutdown of network stream\n");
    > }


    In netperf, to be "certain" the data was received by the remote on the
    "test" connection and to avoid a race between the last bytes of data
    hitting the wire and close() happening, I use shutdown(fd, SHUT_WR)
    (note the lack of RD) and then await the remote's shutdown by waiting
    for a zero byte return from recv(). In a proper "robust" application
    there would probably be a select there or somesuch and some timeouts.
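
The sequence described above (shut down the write side only, then wait for a zero-byte read with a timeout) could look roughly like the sketch below. It is not netperf's actual code; the function name finish_and_close, the per-wait timeout in milliseconds, and the decision to discard any late data from the peer are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: finish sending on fd, then wait for the peer's EOF.
 * Returns 0 if the peer closed its side before a wait timed out,
 * -1 on error or timeout. The descriptor is closed in every case. */
static int finish_and_close(int fd, int timeout_ms)
{
    char scratch[4096];
    int result = -1;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) == 0) {       /* FIN follows the queued data */
        for (;;) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int ready = poll(&pfd, 1, timeout_ms);
            ssize_t n;

            if (ready < 0 && errno == EINTR)
                continue;                   /* interrupted, wait again */
            if (ready <= 0)
                break;                      /* poll error or timeout */
            n = recv(fd, scratch, sizeof scratch, 0);
            if (n < 0 && errno == EINTR)
                continue;
            if (n < 0)
                break;                      /* receive error */
            if (n == 0) {                   /* peer has closed its side */
                result = 0;
                break;
            }
            /* n > 0: discard whatever the peer still had to say */
        }
    }
    close(fd);
    return result;
}
```

A return of 0 only shows that the peer read to end-of-stream and closed; a return of -1 leaves the outcome unknown, which is the case the timeout exists for.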

    rick jones
    --
    a wide gulf separates "what if" from "if only"
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  5. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    Rick Jones wrote:
    > David Mathog wrote:
    >> if(0!=shutdown(fd, SHUT_RDWR)){
    >> fprintf(stderr, "Error on shutdown of network stream\n");
    >> }

    >
    > In netperf, to be "certain" the data was received by the remote on the
    > "test" connection and to avoid a race between the last bytes of data
    > hitting the wire and close() happening, I use shutdown(fd, SHUT_WR)
    > (note the lack of RD)


    That appears to be it. If shutdown with SHUT_RDWR is used when the
    transmit buffer has not yet been emptied, in at least some instances,
    that data never will be sent. SHUT_RDWR was in that code for years
    without causing even a hiccup - it took an odd combination of run time
    parameters to expose the race condition.

    In any case, so far changing it to SHUT_WR has done the trick.

    Thanks,

    David Mathog

  6. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Nov 4, 7:32 pm, Rick Jones wrote:
    > David Mathog wrote:
    > > On Linux 2.6.19
    > >   Open input and output TCP sockets.
    > >   Copy from input to output (a lot).
    > >   The last write() comes back without any errors.
    > >   Shutdown the socket
    > > Usually this works. However every so often the shutdown() seems to eat
    > > whatever the last write() sent to the transmission buffer, such that the
    > > next node never receives it. The shutdown is like this:
    > >   if(0!=shutdown(fd, SHUT_RDWR)){
    > >     fprintf(stderr, "Error on shutdown of network stream\n");
    > >   }

    >
    > In netperf, to be "certain" the data was received by the remote on the
    > "test" connection and to avoid a race between the last bytes of data
    > hitting the wire and close() happening, I use shutdown(fd, SHUT_WR)
    > (note the lack of RD) and then await the remote's shutdown by waiting
    > for a zero byte return from recv(). In a proper "robust" application
    > there would probably be a select there or somesuch and some timeouts.


    Why is shutdown(fd, SHUT_WR) necessary? Does not close() send out all
    the data followed by FIN and then wait for FIN/ACK?

    --
    Max

  7. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    Maxim Yegorushkin writes:
    > On Nov 4, 7:32 pm, Rick Jones wrote:
    >> David Mathog wrote:
    >> > On Linux 2.6.19
    >> >   Open input and output TCP sockets.
    >> >   Copy from input to output (a lot).
    >> >   The last write() comes back without any errors.
    >> >   Shutdown the socket
    >> > Usually this works. However every so often the shutdown() seems to eat
    >> > whatever the last write() sent to the transmission buffer, such that the
    >> > next node never receives it. The shutdown is like this:
    >> >   if(0!=shutdown(fd, SHUT_RDWR)){
    >> >     fprintf(stderr, "Error on shutdown of network stream\n");
    >> >   }

    >>
    >> In netperf, to be "certain" the data was received by the remote on the
    >> "test" connection and to avoid a race between the last bytes of data
    >> hitting the wire and close() happening, I use shutdown(fd, SHUT_WR)
    >> (note the lack of RD) and then await the remote's shutdown by waiting
    >> for a zero byte return from recv(). In a proper "robust" application
    >> there would probably be a select there or somesuch and some timeouts.

    >
    > Why is shutdown(fd, SHUT_WR) necessary? Does not close() send out all
    > the data followed by FIN and then wait for FIN/ACK?


    The problem is that there may still be unread data in the input
    buffer. At least Linux (both 2.4 and 2.6) sends a RST when the
    descriptor is destroyed while there is.

  8. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Nov 5, 2:14 am, Rainer Weikusat wrote:

    > > Why is shutdown(fd, SHUT_WR) necessary? Does not close() send out all
    > > the data followed by FIN and then wait for FIN/ACK?

    >
    > The problem is that there may still be unread data in the input
    > buffer. At least Linux (both 2.4 and 2.6) send a RST when the
    > descriptor is destroyed while there is.


    Which is fine so long as either there being unread data is a breach of
    the higher-level protocol or if there is unread data, you want the
    other end to know it wasn't read. However, if you want to be able to
    read any additional data the other end sends or allow a normal
    shutdown even if the other side sends more data, you need to close in
    two steps.
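
To see whether a close() is about to hit the unread-data case discussed above, one option on Linux is to ask the kernel how many received bytes are still waiting. The sketch below is only an illustration: the helper name unread_bytes is made up, and the check is inherently racy because more data can arrive between the query and the close().

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: number of received-but-unread bytes queued on fd,
 * or -1 if the ioctl fails. */
static int unread_bytes(int fd)
{
    int pending = 0;

    assert(fd >= 0);
    if (ioctl(fd, FIONREAD, &pending) < 0)
        return -1;
    return pending;
}
```

A nonzero result just before close() suggests the two-step close is needed if a reset is not what you want the peer to see.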

    DS

  9. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    Rick Jones wrote:
    > In netperf, to be "certain" the data was received by the remote on the
    > "test" connection and to avoid a race between the last bytes of data
    > hitting the wire and close() happening, I use shutdown(fd, SHUT_WR)
    > (note the lack of RD) and then await the remote's shutdown by waiting
    > for a zero byte return from recv(). In a proper "robust" application
    > there would probably be a select there or somesuch and some timeouts.


    TCP gives me a headache.

    Since there can always be a packet which was sent and might or might not
    have been delivered (yet), it seems to lead to an endless "I know that
    you know that I know that you know..." series of confirmation messages
    like the following, where the last sender is never 100% sure that the
    last message reached its destination, and so cannot act on it reliably:

    A                                  B
    snd "A Done"
                                       rcv "A Done"
                                       snd "B knows A Done"
    rcv "B knows A done"
    snd "A knows B knows A done"
                                       rcv "A knows B knows A done"
                                       snd "B knows A knows B knows A done"
    etc.

    Does that make sense?

    shutdown() is, I think, supposed to get us out of this. Is
    this the proper method (pseudocode, it is missing parameters
    and error checking), so that both sides know the socket
    is well and truly done, that all data is delivered, and that
    it is safe to close the fd on each end and then to exit?
    It is pretty much what you said above, but with an extra shutdown
    on A. Do you close() on A and B after the shutdowns, or is there
    no point in that if both RD and WR have been shutdown on each end?

    A                              B
    write(fd) (sends last data)
    shutdown(fd,SHUT_WR)
                                   read(fd) (picks up last data)
                                   read(fd) (ret 0, knows A shutdown)
                                   shutdown(fd,SHUT_RDWR)
                                   close(fd)
                                   exit
    read(fd) (ret 0, knows B shutdown)
    shutdown(fd,SHUT_RD)
    close(fd)
    exit

    On a blocking socket I can see how this might work reliably (if B blocked
    at its shutdown until the TCP layer completed the handshake with the
    second shutdown on A), but not so on a nonblocking socket. At B's
    shutdown() it cannot know that A has seen the handshake, so cannot know
    that it can close(), since that would blow up the socket and conceivably
    the handshake state needed by A. Or does shutdown() block whatever the
    socket is set to, so that a graceful termination is possible?

    The shutdown man page is not very clear, at least to me. This
    is all it says:

    The shutdown() call causes all or part of a full-duplex
    connection on the socket associated with s to be shut down.
    If how is SHUT_RD, further receptions will be disallowed. If
    how is SHUT_WR, further transmissions will be disallowed.
    If how is SHUT_RDWR, further receptions and transmissions will
    be disallowed.

    "Further" is plenty ambiguous: does "further transmissions" mean
    "no new data will be accepted from write()", or that plus "and the
    remaining data in the transmit buffer is not sent either"?

    In a nutshell, I want to know the "this will always work method"
    because anything less seems to leave potential race conditions or
    deadlocks which can be rare and unpredictable, yet show up
    at inopportune moments.

    Thanks,

    David Mathog

  10. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Nov 5, 3:31 pm, David Mathog wrote:

    > TCP gives me a headache.
    >
    > Since there can always be a packet which was sent and might or might not
    > have been delivered (yet), it seems to lead to an endless "I know that
    > you know that I know that you know..." series of confirmation messages
    > like the following, where the last sender is never 100% sure that the
    > last message reached its destination, and so cannot act on it reliably:


    This is a fundamental problem that every network protocol has. It's
    often called the "two generals" problem. It is impossible to design a
    protocol such that both ends will always agree on whether a connection
    terminated normally or not.

    > shutdown() is, I think supposed to get us out of this. Is
    > this the proper method (pseudocode, it is missing parameters
    > and error checking), so that both sides know the socket
    > is well and truly done, that all data is delivered, and that
    > it is safe to close the fd on each end and then to exit?
    > It is pretty much what you said above, but with an extra shutdown
    > on A. Do you close() on A and B after the shutdowns, or is there
    > no point in that if both RD and WR have been shutdown on each end?


    The only reason to 'close' is to release the socket. If you don't
    'close', the socket stays around forever. The 'close' function is the
    one and only way (other than process termination) to release the
    socket. As a side-effect, if you 'close' the last reference to a
    connection, that connection is shutdown.

    > A                              B
    > write(fd) (sends last data)
    > shutdown(fd,SHUT_WR)
    >                                read(fd) (picks up last data)
    >                                read(fd) (ret 0, knows A shutdown)
    >                                shutdown(fd,SHUT_RDWR)


    No need to do this. You already did a SHUT_WR, and you already know
    there is no more data to read. So the last shutdown is redundant.

    >                                close(fd)
    >                                exit
    > read(fd) (ret 0, knows B shutdown)
    > shutdown(fd,SHUT_RD)
    > close(fd)
    > exit


    > On a blocking socket I can see how this might work reliably (if B blocked
    > at its shutdown until the TCP layer completed the handshake with the
    > second shutdown on A), but not so on a nonblocking socket. At B's
    > shutdown() it cannot know that A has seen the handshake, so cannot know
    > that it can close(), since that would blow up the socket and conceivably
    > the handshake state needed by A.


    You misunderstand what 'close' does. The 'close' function releases the
    socket. If it also releases the last handle to a connection, it
    initiates a 'shutdown' of that connection. The handshake state needed
    by A is not associated with the socket but with the connection.

    > Or does shutdown() block whatever the
    > socket is set to, so that a graceful termination is possible?


    No, 'shutdown' does not block a non-blocking socket. That's why you
    call 'shutdown' and then use 'read' to tell when the shutdown has
    completed. Once 'read' returns zero, you're done. You already asked to
    'shutdown' your side, and the zero read means the other side has asked
    to 'shutdown' its side. All data coming your way has been read, and
    you cannot send any more. So you, the application, do not need to do
    anything else.

    > The shutdown man page is not very clear, at least to me. This
    > is all it says:
    >
    >   The shutdown() call causes all or part of a full-duplex
    >   connection on the socket associated with s to be shut down.
    >   If how is SHUT_RD, further receptions will be disallowed. If
    >   how is SHUT_WR, further transmissions will be disallowed.
    >   If how is SHUT_RDWR, further receptions and transmissions will
    >   be disallowed.
    >
    > "Further" is plenty ambiguous, does "further transmissions" mean
    > "no new data will be accepted from write()" or that plus "and the
    > remaining data in the transmit buffer is not sent either".


    The 'shutdown' function initiates a graceful shutdown. Once you call it
    with SHUT_WR, you cannot 'write' any more data. This fact will be
    communicated to the other side, and its 'read' will return zero.

    > In a nutshell, I want to know the "this will always work method"
    > because anything less seems to leave potential race conditions or
    > deadlocks which can be rare and unpredictable, yet show up
    > at inopportune moments.


    Read and write until you need not write any more. Shutdown write. Wait
    for read to return zero or error. You are now done, and presumably the
    other side will see a graceful shutdown, but you cannot be sure.

    DS

  11. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    David Mathog wrote:
    > Rick Jones wrote:
    > > In netperf, to be "certain" the data was received by the remote on the
    > > "test" connection and to avoid a race between the last bytes of data
    > > hitting the wire and close() happening, I use shutdown(fd, SHUT_WR)
    > > (note the lack of RD) and then await the remote's shutdown by waiting
    > > for a zero byte return from recv(). In a proper "robust" application
    > > there would probably be a select there or somesuch and some timeouts.


    > TCP gives me a headache.


    > Since there can always be a packet which was sent and might or might
    > not have been delivered (yet), it seems to lead to an endless "I
    > know that you know that I know that you know..." series of
    > confirmation messages like the following, where the last sender is
    > never 100% sure that the last message reached its destination, and
    > so cannot act on it reliably:


    > A                                  B
    > snd "A Done"
    >                                    rcv "A Done"
    >                                    snd "B knows A Done"
    > rcv "B knows A done"
    > snd "A knows B knows A done"
    >                                    rcv "A knows B knows A done"
    >                                    snd "B knows A knows B knows A done"
    > etc.


    > Does that make sense?


    I think there is something called two phase commit for that but that
    is just an off the cuff remark.

    > shutdown() is, I think supposed to get us out of this.


    At least for closing a TCP connection, yes.

    > Is this the proper method (pseudocode, it is missing parameters and
    > error checking), so that both sides know the socket is well and
    > truly done, that all data is delivered, and that it is safe to close
    > the fd on each end and then to exit? It is pretty much what you
    > said above, but with an extra shutdown on A. Do you close() on A and
    > B after the shutdowns, or is there no point in that if both RD and
    > WR have been shutdown on each end?


    > A                              B
    > write(fd) (sends last data)
    > shutdown(fd,SHUT_WR)
    >                                read(fd) (picks up last data)
    >                                read(fd) (ret 0, knows A shutdown)
    >                                shutdown(fd,SHUT_RDWR)
    >                                close(fd)
    >                                exit
    > read(fd) (ret 0, knows B shutdown)
    > shutdown(fd,SHUT_RD)
    > close(fd)
    > exit


    On B, the shutdown() and the close() are IIRC redundant.

    > On a blocking socket I can see how this might work reliably (if B blocked
    > at its shutdown until the TCP layer completed the handshake with the
    > second shutdown on A), but not so on a nonblocking socket. At B's
    > shutdown() it cannot know that A has seen the handshake, so cannot know
    > that it can close(), since that would blow up the socket and conceivably
    > the handshake state needed by A. Or does shutdown() block whatever the
    > socket is set to, so that a graceful termination is possible?


    In my netperf case, I am using blocking sockets. When A calls
    shutdown it "knows" that it has received all data it is expecting from
    B and will not be asking B for any more data, so when B does his
    close(), there is (or should be) no data in A's socket buffers.

    B's shutdown/close and the resulting FIN is a "don't care" case on the
    part of B as to whether or not A ever sees the FINished segment. A
    has already said it will not be requesting anything further of B. The TCP
    endpoint in B will be in an active retransmission state and so that
    endpoint will go away in time no matter what. A's endpoint of course
    is not in an active retransmission state, which is why A needs some
    sort of select/poll/timer while waiting for the read return of zero
    from B's shutdown/close.

    > The shutdown man page is not very clear, at least to me. This
    > is all it says:


    > The shutdown() call causes all or part of a full-duplex
    > connection on the socket associated with s to be shut down.
    > If how is SHUT_RD, further receptions will be disallowed. If
    > how is SHUT_WR, further transmissions will be disallowed.
    > If how is SHUT_RDWR, further receptions and transmissions will
    > be disallowed.


    > "Further" is plenty ambiguous, does "further transmissions" mean
    > "no new data will be accepted from write()" or that plus "and the
    > remaining data in the transmit buffer is not sent either".


    SHUT_WR means that no new data will be presented to the socket by the
    caller. A FINished segment will follow in the sequence space (perhaps
    piggy-backed) after the last of the data the caller of shutdown() has
    already given to the socket. If the application tries to call write
    on the socket after a shutdown(SHUT_WR) an error will be returned.
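
The behaviour described above can be observed on a local socket pair. The following is a small demonstration sketch, not code from netperf: the function name demo_shut_wr is made up, and an AF_UNIX stream pair stands in for a TCP connection purely so the example needs no network. MSG_NOSIGNAL is used so the refused send reports EPIPE instead of raising SIGPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if all three observations hold, -1 otherwise.
 * An AF_UNIX stream pair is an assumption standing in for TCP. */
static int demo_shut_wr(void)
{
    int sv[2];
    char buf[8];
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (send(sv[0], "tail", 4, 0) == 4 &&          /* queued before shutdown */
        shutdown(sv[0], SHUT_WR) == 0 &&
        /* 1. a later send is refused with EPIPE */
        send(sv[0], "x", 1, MSG_NOSIGNAL) == -1 && errno == EPIPE &&
        /* 2. the data given to the socket earlier still arrives */
        recv(sv[1], buf, sizeof buf, 0) == 4 &&
        /* 3. after that, the peer sees end-of-stream */
        recv(sv[1], buf, sizeof buf, 0) == 0)
        ok = 1;

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

A caller could check it with assert(demo_shut_wr() == 0); the point is that "further transmissions" refers to new writes, not to data already handed to the socket.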

    rick jones
    --
    The glass is neither half-empty nor half-full. The glass has a leak.
    The real question is "Can it be patched?"
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  12. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)


    Rick Jones wrote:

    > I think there is something called two phase commit for that but that
    > is just an off the cuff remark.


    Sadly, it doesn't work. The two-phase commit protocol is not
    guaranteed to terminate in a finite amount of time.

    Consider:

    1) You send commit request.

    2) You get an acknowledge.

    3) You send an agreement message.

    4) You don't get an acknowledge.

    At this point, you are stuck. You have no idea whether the peer got
    the agreement message or not. If you send a rollback, you have no idea
    whether the peer will receive it or not. At this point, whatever you
    do, the peer is in an indeterminate state. If you send no further
    messages, you will not know whether the peer got the acknowledge and
    will consider the transaction a success or did not and will consider
    it a failure. If no further packets go through either way, this will
    not change.

    What do you do now?

    DS

  13. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    David Schwartz wrote:
    > What do you do now?


    Walk to the bank like I should have done in the first place

    rick jones
    --
    Wisdom Teeth are impacted, people are affected by the effects of events.
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  14. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    David Schwartz writes:
    > On Nov 5, 2:14 am, Rainer Weikusat wrote:
    >
    >> > Why is shutdown(fd, SHUT_WR) necessary? Does not close() send out all
    >> > the data followed by FIN and then wait for FIN/ACK?

    >>
    >> The problem is that there may still be unread data in the input
    >> buffer. At least Linux (both 2.4 and 2.6) send a RST when the
    >> descriptor is destroyed while there is.

    >
    > Which is fine so long as either there being unread data is a breach of
    > the higher-level protocol


    The real-world counterexample is that at least some versions of IE
    (one of them having been used by my boss during what should have been
    a demonstration) transmit an additional CR-LF sequence after the
    entity body of an HTTP POST request which is not accounted for in the
    associated Content-Length header. Destroying the socket after the
    reply has been sent, but with this CR-LF still sitting in the input buffer,
    leads to the kernel transmitting a RST to the client, and this causes
    IE to display a synthetic error document instead of the already received
    data.

  15. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Nov 6, 5:39 am, Rainer Weikusat wrote:

    > > Which is fine so long as either there being unread data is a breach of
    > > the higher-level protocol


    > The real-world counter example is that at least some versions of IE
    > (one of them having been used by my boss during what should have been
    > a demonstration) transmit an additional CR-LF sequence after the
    > entity body of a HTTP POST-request which is not accounted for in the
    > associated Content-Length-header. Destroying the socket after the
    > reply has been sent but with this CR-LF still sitting in the input buffer,
    > leads to the kernel transmitting a RST to the client, and this causes
    > IE to display synthetic error document instead of the already received
    > data.


    This is correct, but perhaps not useful, behavior. There was an error
    -- the client breached the protocol. You will have to break your
    server to fix the client. But make no mistake, code you added
    specifically to work around this is breakage.

    Great example.

    Here even though there being unread data is a breach of the protocol,
    you still have to deal with the unread data because the client is
    breaching the protocol.

    I would argue, however, that reading the data just in case a client is
    broken in this way, without knowing that any existing client actually
    is broken, is a mistake. You have no way to know a priori whether
    sending a RST or eating the data is better. It is only by seeing the
    specifics of the client bug that you know what the right thing to do
    in this case is.

    DS

  16. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    On Nov 5, 10:14 am, Rainer Weikusat wrote:
    > Maxim Yegorushkin writes:
    > > On Nov 4, 7:32 pm, Rick Jones wrote:
    > >> David Mathog wrote:
    > >> > On Linux 2.6.19
    > >> >   Open input and output TCP sockets.
    > >> >   Copy from input to output (a lot).
    > >> >   The last write() comes back without any errors.
    > >> >   Shutdown the socket
    > >> > Usually this works. However every so often the shutdown() seems to eat
    > >> > whatever the last write() sent to the transmission buffer, such that the
    > >> > next node never receives it. The shutdown is like this:
    > >> >   if(0!=shutdown(fd, SHUT_RDWR)){
    > >> >     fprintf(stderr, "Error on shutdown of network stream\n");
    > >> >   }

    >
    > >> In netperf, to be "certain" the data was received by the remote on the
    > >> "test" connection and to avoid a race between the last bytes of data
    > >> hitting the wire and close() happening, I use shutdown(fd, SHUT_WR)
    > >> (note the lack of RD) and then await the remote's shutdown by waiting
    > >> for a zero byte return from recv(). In a proper "robust" application
    > >> there would probably be a select there or somesuch and some timeouts.

    >
    > > Why is shutdown(fd, SHUT_WR) necessary? Does not close() send out all
    > > the data followed by FIN and then wait for FIN/ACK?

    >
    > The problem is that there may still be unread data in the input
    > buffer. At least Linux (both 2.4 and 2.6) send a RST when the
    > descriptor is destroyed while there is.


    Indeed. Thank you.

    http://lxr.linux.no/linux+v2.6.27.4/...v4/tcp.c#L1772 points to the
    RFC 2525 section 2.17 describing why it should send RST when closing a
    TCP socket with unread data.

    --
    Max

  17. Re: TCP socket, shutdown, eats last few bytes in buffer (rarely)

    Reading through this thread raised another question for me, perhaps not closely related to the original one:
    Is shutdown() blocking on a blocking socket in Linux? I cannot find any documentation that clarifies this.
    I had assumed that shutdown() triggers the FIN/FIN-ACK sequence and blocks until the TCP shutdown has actually finished. Now I suspect that understanding is wrong.
    Can anyone clarify?
    Thanks
