Thread: Overhead, UDP: Many packets on 1 socket vs. one packet and many sockets.

  1. Overhead, UDP: Many packets on 1 socket vs. one packet and many sockets.

    I have an application that communicates with a series of devices on a
    gigabit LAN network (64 or more devices) using UDP. It is constantly
    streaming data to these devices. The data is organized into "frames"
    and each device frame is a single UDP packet. The current
    implementation creates a separate SOCK_DGRAM socket for each device,
    then calls connect() and sends packets with send() (rather than using
    sendto() with no connect()).
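    The per-device arrangement described above can be sketched in a few
    lines. This is a rough Python rendering of the same socket(),
    connect(), send() sequence -- the loopback "device" socket and the
    payload are invented stand-ins, not the application's actual code:

```python
import socket

# Stand-in for one device's UDP endpoint (real code would use the
# device's LAN address; this just binds an ephemeral loopback port).
device = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
device.bind(("127.0.0.1", 0))

# One SOCK_DGRAM socket per device, connect()ed once up front so that
# each frame can be transmitted with send() instead of sendto().
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.connect(device.getsockname())  # fixes the peer; no packets exchanged

sender.send(b"frame data")  # one device frame == one UDP datagram

sender.close()
device.close()
```

    connect() on a UDP socket is purely local bookkeeping: it records
    the peer address so the kernel can skip the per-call address lookup
    that sendto() would need.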

    I am now at a point where I have to modify this application to send
    additional data to the devices, with some slightly different
    information in the data packets. The application must now send more
    than one UDP packet per frame to each device.

    I basically have two options:

    1. Create a new socket for each of these new pieces of data. This
    means more than one UDP socket is "connected" to the same network
    device. If I have to send 3 UDP packets per frame to a device, I have
    3 sockets "connected" to that device, and I serially send a packet
    over each of those sockets.

    2. Share one socket per device. Only one UDP socket is "connected"
    to a given device. If I have to send 3 UDP packets per frame to a
    device, I have a single socket "connected" to it and I send 3 packets
    using that one socket.
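    The two layouts can be contrasted in a short sketch. This hedged
    Python illustration uses a loopback socket as a stand-in for one
    device and 3 packets per frame as in the example above; names like
    slot_socks are invented for illustration:

```python
import socket

PACKETS_PER_FRAME = 3

# Loopback stand-in for one device's UDP endpoint.
device = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
device.bind(("127.0.0.1", 0))
addr = device.getsockname()

# Option 1: one connected socket per packet slot -- 3 sockets aimed
# at the same device, one send() on each per frame.
slot_socks = []
for _ in range(PACKETS_PER_FRAME):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(addr)
    slot_socks.append(s)
for i, s in enumerate(slot_socks):
    s.send(b"packet %d" % i)

# Option 2: one connected socket per device, 3 send()s on it per frame.
dev_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
dev_sock.connect(addr)
for i in range(PACKETS_PER_FRAME):
    dev_sock.send(b"packet %d" % i)

for s in slot_socks + [dev_sock, device]:
    s.close()
```

    Either way the same 3 datagrams reach the device per frame; the
    difference is only in how many descriptors and source ports are used.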

    Without going into too much detail, because of the way the application
    is designed it is *far* easier to implement number 1 than number 2
    (option 1 requires little additional code, option 2 requires nearly a
    complete rewrite of the hardware interface portion of the
    application).

    In the current situation, there are 60 devices, each of which
    requires 16 UDP packets per frame of data. Therefore option 1
    requires 60*16 = 960 sockets, while option 2 requires only 60. This
    machine and LAN are dedicated to this application, and no other
    applications are competing for network resources.

    My question is: is it worth the trouble to rewrite a lot of the
    application to go with option 2? Low network latencies are critical,
    and bandwidth usage is fairly high (370-kilobit bursts every 1/60th
    of a second -- about 22 Mbps on average -- and we'd like to keep
    each burst under 0.5 ms). Is there a significant amount of overhead
    associated with sending fewer packets over more sockets to the same
    destination, rather than more packets over fewer sockets? Or will
    option 1 have similar performance?
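    As a rough sanity check on the 0.5 ms budget, a 16-datagram burst
    can be timed on a connected socket. In this Python sketch the
    loopback socket is a stand-in for one device, and the ~2,890-byte
    payload approximates 370 kilobits split across 16 datagrams (actual
    numbers will depend on the real NIC and driver, not loopback):

```python
import socket
import time

PACKETS_PER_FRAME = 16
PAYLOAD = b"\x00" * 2890  # ~370 kilobits / 16 datagrams

# Loopback stand-in for one device.
device = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
device.bind(("127.0.0.1", 0))

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.connect(device.getsockname())

start = time.perf_counter()
for _ in range(PACKETS_PER_FRAME):
    sender.send(PAYLOAD)
elapsed_ms = (time.perf_counter() - start) * 1000.0
print("burst of %d datagrams: %.3f ms" % (PACKETS_PER_FRAME, elapsed_ms))

sender.close()
device.close()
```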

    Thanks,
    Jason

  2. Re: Overhead, UDP: Many packets on 1 socket vs. one packet and many sockets.

    Jason C writes:
    > I have an application that communicates with a series of devices on a
    > gigabit LAN network (64 or more devices) using UDP. It is constantly
    > streaming data to these devices. The data is organized into "frames"
    > and each device frame is a single UDP packet. The current
    > implementation creates a separate SOCK_DGRAM socket for each device,
    > then calls connect() and sends packets with send() (rather than using
    > sendto() with no connect()).


    [...]

    > The application must now send more than one UDP packet per frame to
    > each device.
    >
    > I basically have two options:
    >
    > 1. Create a new socket for each of these new pieces of data.


    [...]

    > 2. Share one socket per device.


    [...]

    > because of the way the application is designed it is *far* easier to
    > implement number 1 than number 2 (option 1 requires little
    > additional code, option 2 requires nearly a complete rewrite of the
    > hardware interface portion of the application).


    There could be a third option: create an additional file descriptor
    referring to the already existing socket (e.g., using dup(2)). This
    would still cause the process descriptor table to grow, but without
    an increase in the actual number of sockets.
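    That third option might look something like this in Python, using
    os.dup() on the socket's descriptor (the loopback "device" is again
    an invented stand-in):

```python
import os
import socket

# Loopback stand-in for a device.
device = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
device.bind(("127.0.0.1", 0))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(device.getsockname())

# dup(2) yields a second descriptor for the SAME socket: same local
# port, same connected peer, shared send buffer. The descriptor table
# grows, but the socket count does not.
fd2 = os.dup(sock.fileno())

sock.send(b"via the original descriptor")
os.write(fd2, b"via the duplicate")  # works because the socket is connected

os.close(fd2)
sock.close()
device.close()
```

    The socket itself is released only when the last descriptor
    referring to it is closed.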

    [...]

    > Is there a significant amount of overhead associated with sending
    > less packets over more sockets to the same destination, rather than
    > more packets over less sockets? Or will option 1 have similar
    > performance?


    These are two different questions. AFAIK, there should be no drastic
    overhead when using multiple sockets to send data from the same
    process, at least with blocking I/O, insofar as the protocol
    implementation has any built-in local flow control at all, which
    would be atypical for UDP. But using 16 times the amount of kernel
    resources could have a performance impact, e.g., by reducing the
    amount of RAM available for application use.

  3. Re: Overhead, UDP: Many packets on 1 socket vs. one packet and many sockets.

    Jason C wrote:
    > I have an application that communicates with a series of devices on
    > a gigabit LAN network (64 or more devices) using UDP. It is
    > constantly streaming data to these devices. The data is organized
    > into "frames" and each device frame is a single UDP packet. The


    Nit - UDP has "datagrams"

    > current implementation creates a separate SOCK_DGRAM socket for each
    > device, then calls connect() and sends packets with send() (rather
    > than using sendto() with no connect()).


    Does it call send() just the one time, or over and over again over
    time?

    > I am now at a point where I have to modify this application to send
    > additional data to the devices, with some slightly different
    > information in the data packets. The application must now send more
    > than one UDP packet per frame to each device.


    > I basically have two options:


    > 1. Create a new socket for each of these new pieces of data. This
    > means more than one UDP socket is "connected" to the same network
    > device. If I have to send 3 UDP packets per frame to a device, I have
    > 3 sockets "connected" to that device, and I serially send a packet
    > over each of those sockets.


    > 2. Share one socket per device. Only one UDP socket is "connected"
    > to a given device. If I have to send 3 UDP packets per frame to a
    > device, I have a single socket "connected" to it and I send 3 packets
    > using that one socket.


    > Without going into too much detail, because of the way the
    > application is designed it is *far* easier to implement number 1
    > than number 2 (option 1 requires little additional code, option 2
    > requires nearly a complete rewrite of the hardware interface portion
    > of the application).


    Do your devices care that the additional data would be coming from
    additional port numbers?

    > In the current situation, there are 60 devices each which require 16
    > UDP packets per frame of data. Therefore option 1 requires 60*16 =
    > 960 sockets and option 2 requires only 60. This machine and LAN is
    > dedicated to this application and no other applications are
    > competing for network resources.


    > My question is, is it worth the trouble to rewrite a lot of the
    > application to go with option 2? Low network latencies are critical,
    > and bandwidth usage is fairly high (370 kilobit bursts every 1/60th
    > of a second -- so about 22 mbps on average, and we'd like to keep
    > each burst under 0.5 ms). Is there a significant amount of overhead
    > associated with sending less packets over more sockets to the same
    > destination, rather than more packets over less sockets? Or will
    > option 1 have similar performance?


    If you are not creating the sockets over and over again, nor using
    anything like select() or poll() as far as basic code path goes, there
    probably isn't that big a difference. There may be a slightly longer
    local to global descriptor lookup with the 960 sockets.

    It is, though, somewhat "messy" (IMO), and will rapidly become
    untenable when you have to send even more data to the devices. IMO
    it is best to bite the bullet now rather than later.

    Since you mention low latencies being critical, what is your
    definition of "low?" Many GbE drivers will set interrupt coalescing
    parameters to favor CPU utilization and throughput over latency.

    rick jones
    --
    denial, anger, bargaining, depression, acceptance, rebirth...
    where do you want to be today?
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
