AIX 5.1 and 5.3 Socket Program Issue - Aix
-
AIX 5.1 and 5.3 Socket Program Issue
Hi,
Our product uses a file transfer module which uses TCP sockets to
transfer data.
This code was developed on SUSE9 32 bit and ported to AIX-64bit (5.1
and 5.3), HP-UX 64 bit and SUSE 10 64bit.
The file transfer module is working on SUSE9,10 and HP-UX without any
issues.
Only on AIX 5.1 and 5.3 we are facing issues.
The overview of the modules is as follows:
Server (Receiver)
s1) Opens a socket, sets following options
- SO_REUSEADDR
- SO_RCVBUF & SO_SNDBUF - 1MB
- SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds
s2) call bind() to a port
s3) call listen() to check for incoming connections
s4) if listen() is success call accept() to accept the connection
s5) Receive file chunks by calling the recv() system call
Client (sender)
c1) Opens a socket, sets following options
- SO_REUSEADDR
- SO_RCVBUF & SO_SNDBUF - 1MB
- SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds
c2) call connect()
c3) If connect is successful, start sending file chunks (each chunk
is of size 1024)
by using the send() system call.
The issues encountered are:
1. When server is in step s5 and client in step c3, receive at s5
used to get many "Resource unavailable" (errno:11) errors.
- To solve this we added a retry (5000 times) for recv() with a 2
millisecond sleep.
2. After the retry mechanism, packets were being received at Server.
But after 8-10 chunks (each of size 1024), the received packets were
corrupted.
- To solve this, we added a delay at sender and receiver. The
delay was 500ms.
After this fix file chunks were received correctly.
But this delay affects the performance. We want to avoid this
delay and still make sure packet corruption does not happen.
This packet corruption/ loss issue is being observed only in AIX.
All our AIX,HP and SUSE machines are in local LAN. In HP and SUSE no
such issues have been observed.
Please let us know if we have missed some AIX specific socket option,
which is resulting in this behaviour.
Please suggest on how to resolve this as soon as possible.
Also on net I found a flag on AIX i.e
#define _BSD 43 or 44
Does this have any impact?
Regards,
-
Re: AIX 5.1 and 5.3 Socket Program Issue
In article <35bbcd7d-2bdb-4ec1-a25e-3c245352271c@t54g2000hsg.googlegroups.com>,
"Kiran Kumar.M.R" writes:
> But after 8-10 chunks (each of size 1024), the received packets were
> corrupted.
Could it be the usual pitfall that send() and recv() do not guarantee
that the specified number of bytes is transmitted at once?
IIRC one has to code something like

    pc = pBuf; nBuf = MBUF;
    while (nBuf > 0) {
        /* send() may accept fewer bytes than requested */
        if ((iRC = send(ioSocket, pc, nBuf, 0)) < 0) {
            perror(" TCPsend(send)"); goto gError;
        }
        /* advance past the bytes that were actually sent */
        nBuf -= iRC; pc += iRC;
        if (iRC == 0) break;
    }

to cope with that. Likewise on the recv()'ing side.
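For the recv()'ing side, a minimal sketch along the same lines as the send() loop above, assuming a blocking stream socket. The name recv_all is hypothetical and not taken from the original poster's code; a timeout or other failure is simply reported as -1 here, and a real caller would need to remember how many bytes had already arrived.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling recv() until len bytes have arrived.
 * Returns the number of bytes read (less than len only if the peer
 * closed the connection early), or -1 on error with errno set. */
ssize_t recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        /* recv() may return fewer bytes than asked for */
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* error or timeout: caller inspects errno */
        }
        if (n == 0)
            break;          /* peer closed the connection */
        got += (size_t)n;   /* advance by what actually arrived */
    }
    return (ssize_t)got;
}
```

The point is that the offset into the buffer only ever moves by the value recv() returned, never by the size that was requested.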
> This packet corruption/ loss issue is being observed only in AIX.
> All our AIX,HP and SUSE machines are in local LAN. In HP and SUSE no
> such issues have been observed.
If the above explanation is true, it worked on HP-UX and SUSE just by chance.
-
Re: AIX 5.1 and 5.3 Socket Program Issue
Kiran Kumar.M.R wrote:
> Our product uses a file transfer module which uses TCP sockets to
> transfer data.
> This code was developed on SUSE9 32 bit and ported to AIX-64bit (5.1
> and 5.3), HP-UX 64 bit and SUSE 10 64bit.
> The file transfer module is working on SUSE9,10 and HP-UX without
> any issues.
> Only on AIX 5.1 and 5.3 we are facing issues.
> The overview of the modules is as follows:
> Server (Receiver)
> s1) Opens a socket, sets following options
> - SO_REUSEADDR
> - SO_RCVBUF & SO_SNDBUF - 1MB
> - SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds
IIRC *TIMEO are noops under HP-UX...
> s2) call bind() to a port
> s3) call listen() to check for incoming connections
listen() doesn't check for incoming connections, it enables the socket
to allow incoming connections.
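As a rough illustration of that distinction, here is a minimal sketch of steps s2 and s3, assuming IPv4 and a wildcard bind. The helper name make_listener and its parameters are made up for this example. listen() returns immediately once the socket is marked passive; it is the later accept() call that actually waits for a client.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a TCP socket bound to `port` on all
 * interfaces and mark it as a listening (passive) socket.
 * Returns the descriptor, or -1 on failure. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd;

    assert(backlog > 0);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* allow quick restarts of the server on the same port */
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* listen() does not wait for anyone; it only enables the socket
     * to queue incoming connections for a later accept() */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would then call accept() on the returned descriptor, typically in a loop, and do its recv() calls on the descriptor that accept() hands back.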
> s4) if listen() is success call accept() to accept the connection
> s5) Receive file chunks by calling the recv() system call
> Client (sender)
> c1) Opens a socket, sets following options
> - SO_REUSEADDR
> - SO_RCVBUF & SO_SNDBUF - 1MB
> - SO_SNDTIMEO & SO_RCVTIMEO - 5 seconds
> c2) call connect()
> c3) If connect is successful, start sending file chunks (each chunk
> is of size 1024) by using the send() system call.
Why such small send() calls into the transport? I hope your receiver
isn't doing the same thing at its end, only pulling 1024 bytes at a
time from the socket.
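To illustrate pulling more than 1024 bytes at a time, here is a minimal sketch of a receiver that treats the connection as a byte stream and hands whatever arrived to an output descriptor. The helper name drain_to_fd and the 64 KB buffer size are assumptions for this example, not values from the original code.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

#define RX_BUF_SIZE (64 * 1024)   /* assumed size; tune for the workload */

/* Hypothetical helper: copy everything arriving on socket `sock` to
 * descriptor `out` until the peer closes the connection.
 * Returns the total number of bytes copied, or -1 on error. */
long drain_to_fd(int sock, int out)
{
    char buf[RX_BUF_SIZE];
    long total = 0;

    assert(sock >= 0 && out >= 0);
    for (;;) {
        /* ask for as much as the buffer holds; TCP has no notion of
         * the sender's 1024-byte chunks, so n can be any size */
        ssize_t n = recv(sock, buf, sizeof buf, 0);
        ssize_t off = 0;

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                /* sender closed: end of stream */

        /* write() can be partial too, so loop until all n bytes are out */
        while (off < n) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

Nothing in this loop depends on the number of send() calls the other side made, which is the property a stream receiver needs.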
> The issues encountered are:
> 1. When server is in step s5 and client in step c3, receive at s5
> used to get many "Resource unavailable" (errno:11) errors.
What does your code do at that point, particularly with respect to
manipulating buffer pointers or whatnot? Also, just to be paranoid,
the recv() call at s5 is returning -1 and _then_ you are checking
errno right?
Are you also setting non-blocking on the socket or are you relying on
the *TIMEO settings?
What does your sending side do if it ever hits an SO_SNDTIMEO?
Particularly wrt updating buffer pointers and whatnot.
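To make the questions above concrete, here is a minimal sketch of a single receive step that looks at errno only after recv() has returned -1, and that keeps "timed out", "peer closed" and "real error" apart. The names rx_status, recv_once and set_recv_timeout are hypothetical. It assumes the socket is in blocking mode with SO_RCVTIMEO set, in which case an expired timeout is usually reported as EAGAIN or EWOULDBLOCK (which would match the errno 11 in the report, if that is what it maps to on the system in question).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>

enum rx_status { RX_DATA, RX_EOF, RX_TIMEOUT, RX_ERROR };

/* Hypothetical helper: set the receive timeout in milliseconds. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: one recv() call with its result classified.
 * *got is the number of bytes placed in buf (0 unless RX_DATA). */
enum rx_status recv_once(int fd, void *buf, size_t len, size_t *got)
{
    ssize_t n;

    assert(got != NULL);
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted */

    if (n > 0) {
        *got = (size_t)n;                /* may be less than len */
        return RX_DATA;
    }
    *got = 0;
    if (n == 0)
        return RX_EOF;                   /* peer closed the connection */
    /* only now is errno meaningful */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return RX_TIMEOUT;               /* nothing arrived in time */
    return RX_ERROR;
}
```

On RX_TIMEOUT nothing was received, so the caller's buffer pointer and remaining count must stay exactly where they were before retrying.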
> - To solve this we added a retry (5000 times) for recv() with a 2
> millisecond sleep.
> 2. After the retry mechanism, packets were being received at Server.
> But after 8-10 chunks (each of size 1024), the received packets were
> corrupted.
> - To solve this, we added a delay at sender and receiver. The
> delay was 500ms.
> After this fix file chunks were received correctly.
> But this delay affects the performance. We want to avoid this
> delay and still make sure packet corruption does not happen.
It is difficult to believe that the AIX transport would be that fubar.
That you have to add such sleeps/delays suggests some application
timing problems you simply hadn't seen before (i.e. had gotten lucky).
You might consider taking some packet traces of the transfers at the
sending and receiving side and compare the data.
Also, consider removing the setsockopt() calls setting the SO_SNDTIMEO
and SO_RCVTIMEO and see if that changes things.
rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway... 
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...