On Mon, 04 Sep 2006 11:06:51 -0400 Sten wrote:
OS> This all happens because the first event (that runs run_alarms() and
OS> agentx ping) calls send(), select(), and recv() all in the same function
OS> call (agentx_synch_response()).
OS> It seems to me that it shouldn't be calling select() or recv(). I
OS> realize it was designed to do this, but it seems wrong to do this. The
OS> send() would be OK, but then the code should wait until the FD
OS> manager (which may be the user's application) select()s "read" to
OS> process the response to the send().

I agree. I'm sure the use of synch_response is just because that's how it was
originally written.

I can envision two possible fixes for this. 'The right way' would be to make
the changes you described for asynchronous operation. 'The easy way' would be
to add a non-blocking select on the agentx fd before the synchronous send,
sidestepping the issue.
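
To make 'the easy way' concrete, here is a rough sketch of a non-blocking
select on a single fd. The name fd_has_pending_input() is made up for
illustration and is not an existing net-snmp function; how the agentx fd would
be obtained and where this check would be called from are assumptions on my
part.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not part of net-snmp): poll one fd for readability.
 * Returns 1 if data is waiting, 0 if not, -1 on error.
 * A zeroed timeval makes select() return immediately instead of blocking.
 */
static int fd_has_pending_input(int fd)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    /* select() can only watch descriptors below FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = 0;
    tv.tv_usec = 0;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;

    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

The idea would be that if this returns 1, the pending packet gets read and
processed first, and only then does the ping go through the synchronous
send path.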

Another idea would be to change the way the ping interval works. I think it is
currently a fixed interval; resetting the timer whenever there is any agentx
traffic would narrow the window of opportunity for this bug to occur.
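
A sketch of the timer-reset idea, just to show the bookkeeping. The struct and
function names are hypothetical and do not correspond to anything in the
current code; the caller is assumed to pass in the current time and to call
ping_timer_note_traffic() whenever an agentx packet is sent or received.

```c
#include <time.h>

/* Hypothetical bookkeeping for an idle-based ping (not net-snmp code). */
struct ping_timer {
    time_t interval;      /* seconds of silence allowed before a ping */
    time_t last_traffic;  /* time of the most recent agentx packet */
};

static void ping_timer_init(struct ping_timer *t, time_t interval, time_t now)
{
    t->interval = interval;
    t->last_traffic = now;
}

/* Call on every agentx send or receive: any traffic restarts the countdown. */
static void ping_timer_note_traffic(struct ping_timer *t, time_t now)
{
    t->last_traffic = now;
}

/* Returns 1 if the session has been silent for at least one full interval. */
static int ping_timer_due(const struct ping_timer *t, time_t now)
{
    return (now - t->last_traffic) >= t->interval;
}
```

With this, a ping only goes out after the session has been quiet for a whole
interval, so it is much less likely to collide with a packet already in
flight. It narrows the window but does not close it.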

