AIX 5.1 and 5.3 Socket Program Issue - Aix


  1. AIX 5.1 and 5.3 Socket Program Issue

    Hi,

    Our product uses a file transfer module that transfers data over TCP
    sockets. The code was developed on 32-bit SUSE 9 and ported to 64-bit
    AIX (5.1 and 5.3), 64-bit HP-UX and 64-bit SUSE 10.

    The file transfer module works on SUSE 9, SUSE 10 and HP-UX without any
    issues. Only on AIX 5.1 and 5.3 are we facing problems.

    The modules work as follows:

    Server (receiver)
    s1) Open a socket and set the following options:
    - SO_REUSEADDR
    - SO_RCVBUF & SO_SNDBUF - 1 MB
    - SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds
    s2) Call bind() to a port.
    s3) Call listen() to check for incoming connections.
    s4) If listen() succeeds, call accept() to accept the connection.
    s5) Receive file chunks by calling the recv() system call.
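    A minimal C sketch of steps s1 to s3, assuming an IPv4 listener. The helper name `open_listener()` is hypothetical and the option values simply mirror the description above. Every setsockopt() result is checked, since a rejected buffer size would otherwise go unnoticed. Passing port 0 lets the kernel choose a free port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper covering steps s1-s3.
   Returns a listening socket, or -1 on error. */
static int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int bufsize = 1024 * 1024;          /* 1 MB, as described above */
    struct timeval tmo = { 5, 0 };      /* 5 seconds */
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    /* s1: set options and check each result. Buffer sizes are set before
       listen() because TCP window scaling is negotiated in the handshake. */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof bufsize) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof bufsize) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof tmo) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tmo, sizeof tmo) < 0)
        goto fail;

    /* s2: bind to the given port on all local addresses */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto fail;

    /* s3: enable the queue of incoming connections; returns immediately */
    if (listen(sock, SOMAXCONN) < 0)
        goto fail;
    return sock;

fail:
    close(sock);
    return -1;
}
```

    Step s4 is then a plain accept() on the returned socket. If in doubt about whether a platform copies the timeout options to the socket returned by accept(), setting them again on that socket is harmless.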

    Client (sender)
    c1) Open a socket and set the following options:
    - SO_REUSEADDR
    - SO_RCVBUF & SO_SNDBUF - 1 MB
    - SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds
    c2) Call connect().
    c3) If connect() succeeds, start sending file chunks (each chunk is
    1024 bytes) using the send() system call.
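    A matching sketch of steps c1 and c2. The helper name `open_sender()` is hypothetical. SO_REUSEADDR is left out because it governs address reuse when binding, and this client never calls bind(). The buffer sizes are set before connect() for the same window-scaling reason as on the server.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper covering steps c1-c2.
   Returns a connected socket, or -1 on error. */
static int open_sender(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int bufsize = 1024 * 1024;          /* 1 MB, as described above */
    struct timeval tmo = { 5, 0 };      /* 5 seconds */
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    /* c1: options must be in place before connect() */
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof bufsize) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof bufsize) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof tmo) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tmo, sizeof tmo) < 0) {
        close(sock);
        return -1;
    }

    /* c2: connect to the receiver at ip:port */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(sock, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```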

    The issues encountered are:
    1. When the server is in step s5 and the client in step c3, the recv()
    at s5 used to fail with many "Resource unavailable" (errno 11) errors.
    - To work around this, we added a retry loop for recv() (up to 5000
    attempts, with a 2 millisecond sleep between them).
    2. With the retry mechanism, data was received at the server, but
    after 8-10 chunks (of 1024 bytes each) the received data was
    corrupted.
    - To work around this, we added a 500 ms delay at both sender and
    receiver. After this, file chunks were received correctly, but the
    delay hurts performance. We want to remove the delay and still make
    sure the data is not corrupted.
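    Rather than retrying recv() up to 5000 times with a 2 ms sleep, the receiver can block in poll() until data arrives or a deadline passes. A minimal sketch; the helper name `recv_timed()` and reporting a timeout as ETIMEDOUT are assumptions for illustration. It replaces only the sleep-and-retry loop and does not by itself address the corruption in issue 2.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for data, then recv() it.
   Returns bytes read (> 0), 0 on orderly EOF, or -1 with errno set
   (errno == ETIMEDOUT if nothing arrived in time). */
static ssize_t recv_timed(int sock, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = sock, .events = POLLIN, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);   /* restarts the full wait after EINTR */
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;                        /* poll() itself failed */
    if (rc == 0) {
        errno = ETIMEDOUT;                /* deadline passed with no data */
        return -1;
    }
    return recv(sock, buf, len, 0);       /* may return fewer than len bytes */
}
```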

    This packet corruption/loss issue is observed only on AIX. All our
    AIX, HP and SUSE machines are on the same local LAN, and no such
    issues have been observed on HP-UX or SUSE.

    Please let us know if we have missed some AIX-specific socket option
    that causes this behaviour, and please suggest how to resolve this as
    soon as possible.

    I also found a flag for AIX on the net:
    #define _BSD 43 (or 44)

    Does this have any impact?

    Regards,

  2. Re: AIX 5.1 and 5.3 Socket Program Issue

    In article <35bbcd7d-2bdb-4ec1-a25e-3c245352271c@t54g2000hsg.googlegroups.com>,
    "Kiran Kumar.M.R" writes:

    > But after 8-10 chunks (each of size 1024). The received packets were
    > corrupted.


    Could it be the usual pitfall that send() and recv() do not guarantee
    that the requested number of bytes is transferred in a single call?
    IIRC one has to code something like

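    pc = pBuf; nBuf = MBUF;            /* MBUF bytes to send from pBuf */
    while (nBuf > 0) {
        if ((iRC = send(ioSocket, pc, nBuf, 0)) < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            perror(" TCPsend(send)");
            goto gError;
        }
        if (iRC == 0)
            break;                     /* nothing accepted: avoid spinning */
        nBuf -= iRC;                   /* count only the bytes actually sent */
        pc += iRC;
    }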

    to cope with that, and likewise on the recv() side.
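    The same loop as a self-contained pair of C helpers, one for each direction. A sketch only: the names `send_all()` and `recv_all()` are hypothetical, and treating an expired SO_SNDTIMEO/SO_RCVTIMEO (EAGAIN or EWOULDBLOCK) as a hard error is one possible policy, not the only one.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes, looping over partial send()s.
   Returns 0 on success, -1 on error (errno set). */
static int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;             /* includes EAGAIN if SO_SNDTIMEO expired */
        }
        p += n;                    /* advance only by what was actually sent */
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes, looping over partial recv()s.
   Returns 0 on success, 1 if the peer closed first, -1 on error (errno set). */
static int recv_all(int sock, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;             /* includes EAGAIN if SO_RCVTIMEO expired */
        }
        if (n == 0)
            return 1;              /* orderly shutdown before len bytes arrived */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

    On the receiving side, each 1024-byte chunk would then be read with recv_all(sock, chunk, 1024) instead of a single recv() call.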

    > This packet corruption/loss issue is being observed only in AIX.
    > All our AIX, HP and SUSE machines are in local LAN. In HP and SUSE no
    > such issues have been observed.


    If the above explanation is correct, it worked on HP-UX and SUSE just by chance.


  3. Re: AIX 5.1 and 5.3 Socket Program Issue

    Kiran Kumar.M.R wrote:
    > Our product uses a file transfer module which uses TCP sockets to
    > transfer data.
    > This code was developed on SUSE9 32 bit and ported to AIX-64bit (5.1
    > and 5.3), HP-UX 64 bit and SUSE 10 64bit.


    > The file transfer module is working on SUSE9,10 and HP-UX without
    > any issues.


    > Only on AIX 5.1 and 5.3 we are facing issues.


    > The overview of the modules is as follows:


    > Server (Receiver)
    > s1) Opens a socket, sets following options
    > - SO_REUSEADDR
    > - SO_RCVBUF & SO_SNDBUF - 1MB
    > - SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds


    IIRC the *TIMEO options are no-ops under HP-UX...

    > s2) call bind() to a port
    > s3) call listen() to check for incoming connections


    listen() doesn't check for incoming connections, it enables the socket
    to allow incoming connections.
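    A small C sketch of that distinction, assuming a loopback listener on a kernel-chosen port: the client's connect() completes while no accept() is pending, and accept() then just takes the finished connection off the queue. The functions `accept_one()` and `demo_listen_then_accept()` are hypothetical names.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Step s4: take one completed connection off the listen queue, retrying on EINTR. */
static int accept_one(int listener)
{
    for (;;) {
        int conn = accept(listener, NULL, NULL);
        if (conn >= 0 || errno != EINTR)
            return conn;                  /* new socket, or -1 on a real error */
    }
}

/* Demo: listen() only enables the queue. Returns 0 if a client can connect
   before accept() is called and accept() then dequeues it, else -1. */
static int demo_listen_then_accept(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int listener, client = -1, conn = -1, result = -1;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                  /* kernel picks a free port */
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 5) < 0 ||                      /* returns at once, no client yet */
        getsockname(listener, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 || connect(client, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;                                       /* completes with no accept() pending */

    conn = accept_one(listener);                        /* dequeues the finished connection */
    if (conn >= 0)
        result = 0;
out:
    if (conn >= 0)
        close(conn);
    if (client >= 0)
        close(client);
    close(listener);
    return result;
}
```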

    > s4) if listen() is success call accept() to accept the connection
    > s5) Receive file chunks by calling recv() system call


    > Client (sender)
    > c1) Opens a socket, sets following options
    > - SO_REUSEADDR
    > - SO_RCVBUF & SO_SNDBUF - 1MB
    > - SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds
    > c2) call connect()
    > c3) If connect is successful, start sending file chunks (each chunk
    > is of size 1024) by using send() system call.


    Why such small send() calls into the transport? I hope your receiver
    isn't doing the same thing at its end, only pulling 1024 bytes at a
    time from the socket.
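    A sketch of the larger-buffer alternative, assuming the file is available as a file descriptor: read() big blocks and push each one out with a loop that copes with partial send()s. The 64 KiB size and the name `send_file_fd()` are illustrative choices. Because TCP is a byte stream, the receiver cannot rely on chunk boundaries anyway, so it can read into a large buffer as well.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define XFER_BUF_SIZE (64 * 1024)   /* illustrative; anything well above 1024 helps */

/* Hypothetical helper: stream everything readable from in_fd to sock using
   large read()/send() calls. Returns 0 at end of input, -1 on error. */
static int send_file_fd(int in_fd, int sock)
{
    char buf[XFER_BUF_SIZE];

    for (;;) {
        ssize_t got = read(in_fd, buf, sizeof buf);
        if (got < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (got == 0)
            return 0;                       /* end of file */

        /* send() may accept fewer bytes than asked: loop until the block is out */
        for (ssize_t off = 0; off < got; ) {
            ssize_t n = send(sock, buf + off, (size_t)(got - off), 0);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += n;
        }
    }
}
```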

    > The issues encountered are:
    > 1. When server is in step s5 and client in step c3, receive at s5
    > used to get many "Resource unavailable" (errno:11) errors.


    What does your code do at that point, particularly with respect to
    manipulating buffer pointers or whatnot? Also, just to be paranoid,
    the recv() call at s5 is returning -1 and _then_ you are checking
    errno, right?
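    The check being asked about, sketched in C: look at errno only when recv() has returned -1, and treat a return of 0 as end of stream rather than an error. The enum and the name `classify_recv()` are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result categories for one recv() call. */
enum recv_status { RECV_DATA, RECV_EOF, RECV_TIMEOUT, RECV_ERROR };

/* errno is meaningful only when recv() returns -1; a successful call may
   leave an old, unrelated errno value in place. */
static enum recv_status classify_recv(int sock, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n;
    do {
        n = recv(sock, buf, len, 0);
    } while (n < 0 && errno == EINTR);    /* interrupted by a signal: retry */

    *nread = n;
    if (n > 0)
        return RECV_DATA;                 /* n bytes, possibly fewer than len */
    if (n == 0)
        return RECV_EOF;                  /* peer closed; errno not consulted */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return RECV_TIMEOUT;              /* SO_RCVTIMEO expired, or no data on a non-blocking socket */
    return RECV_ERROR;
}
```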

    Are you also setting non-blocking on the socket or are you relying on
    the *TIMEO settings?

    What does your sending side do if it ever hits an SO_SNDTIMEO?
    Particularly wrt updating buffer pointers and whatnot.

    > - To solve this we added a retry(5000 times) for receive with 2
    > millisec sleep.
    > 2. After the retry mechanism, packets were being received at Server.
    > But after 8-10 chunks (each of size 1024). The received packets were
    > corrupted.
    > - To solve this, we added a delay at sender and receiver. The
    > delay was 500ms.
    > After this fix file chunks were received correctly.
    > But this delay affects the performance. We want to avoid this
    > delay and still make sure packet corruption does not happen.


    It is difficult to believe that the AIX transport would be that fubar.
    That you have to add such sleeps/delays suggests some application
    timing problems you simply hadn't seen before (i.e., had gotten lucky).

    You might consider taking some packet traces of the transfers at the
    sending and receiving side and compare the data.
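    Alongside the packet traces, an application-level check can show where the two byte streams diverge. On both ends, log the byte offset and a checksum of each fixed-size block, reading with a loop like the one in the second post so that block boundaries match, and then compare the logs. FNV-1a is used below only because it is short; any checksum would do, and `fnv1a32()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a buffer. */
static uint32_t fnv1a32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                     /* mix in one byte */
        h *= 16777619u;                /* FNV prime, arithmetic mod 2^32 */
    }
    return h;
}
```

    The first offset whose hash differs between the sender's and receiver's logs shows where the corruption starts.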

    Also, consider removing the setsockopt() calls setting the SO_SNDTIMEO
    and SO_RCVTIMEO and see if that changes things.
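    The simplest form of that experiment is to delete the two setsockopt() calls. If a runtime switch is more convenient, setting both options back to a zero timeval restores the default of no timeout, which is what POSIX specifies for a zero value. `clear_socket_timeouts()` is a hypothetical helper.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: set SO_RCVTIMEO and SO_SNDTIMEO back to zero,
   meaning "never time out", the default for a new socket.
   Returns 0 on success, -1 on error. */
static int clear_socket_timeouts(int sock)
{
    struct timeval none = { 0, 0 };
    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &none, sizeof none) < 0)
        return -1;
    return setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &none, sizeof none);
}
```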

    rick jones
    --
    a wide gulf separates "what if" from "if only"
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
