Sending Duplicate Data to Many TCP Clients - TCP-IP


  1. Sending Duplicate Data to Many TCP Clients

    I need to send the same data to many (e.g. 1000) clients over TCP.

    I need to use TCP (UDP, RUDP, PGM, etc. are not options).

    Using standard methods, I would open 1000 connections, loop and send
    the data to each connection. The stack would provide each connection
    with its own TCP Window (send buffer).

    There would be a significant memory & cpu advantage if I could send
    the data once to the stack and have it share the data (and send
    buffer) once between the 1000 clients.

    Anyone know of a way to do this on Windows and/or Linux?

    thanks,

    wr


  2. Re: Sending Duplicate Data to Many TCP Clients

    In article <1189076023.145605.45050@g4g2000hsf.googlegroups.com>,
    reuven wrote:
    >I need to send the same data to many (e.g. 1000) clients over TCP.


    Are these clients all across the Internet or are they on a local network?

    >I need to use TCP (UDP, RUDP, PGM, etc. are not options).


    Why can't you use UDP?

    >Using standard methods, I would open 1000 connections, loop and send
    >the data to each connection. The stack would provide each connection
    >with its own TCP Window (send buffer).


    Right.

    >There would be a significant memory & cpu advantage if I could send
    >the data once to the stack and have it share the data (and send
    >buffer) once between the 1000 clients.


    Yes, it would, but no OS I know of supports such a concept. What OS are
    you using for this project?

    >Anyone know of a way to do this on Windows and/or Linux?


    How much data are you sending out? Do all the machines have to receive the
    data at the same time? Could you create a distribution tree such that your
    machine sends the data to (for example) 10 machines, then those 10 machines
    each send the data to 10 machines, then those 100 machines each send the
    data to 10 machines, etc.?

    If you can share any other details about the data and the environment, maybe
    we could give you more answers than questions, but in general, TCP won't do
    what you apparently need except as described in your "standard method".

    Patrick
    ========= For LAN/WAN Protocol Analysis, check out PacketView Pro! =========
    Patrick Klos Email: patrick@klos.com
    Klos Technologies, Inc. Web: http://www.klos.com/
    ============================================================================

  3. Re: Sending Duplicate Data to Many TCP Clients

    On Sep 6, 3:56 pm, pk...@osmium.mv.net (Patrick Klos) wrote:
    > In article <1189076023.145605.45...@g4g2000hsf.googlegroups.co m>,
    >
    > reuven wrote:
    > >I need to send the same data to many (e.g. 1000) clients over TCP.

    >
    > Are these clients all across the Internet or are they on a local network?
    >
    > >I need to use TCP (UDP, RUDP, PGM, etc. are not options).

    >
    > Why can't you use UDP?


    A customer requirement. The customer currently has a solution in place
    that uses TCP and is afraid of NAT/Firewall issues. As he put it, "I
    don't want to get a single call about firewall problems." I am trying
    to support more clients without changing hardware or protocol.

    >
    > >Using standard methods, I would open 1000 connections, loop and send
    > >the data to each connection. The stack would provide each connection
    > >with its own TCP Window (send buffer).

    >
    > Right.
    >
    > >There would be a significant memory & cpu advantage if I could send
    > >the data once to the stack and have it share the data (and send
    > >buffer) once between the 1000 clients.

    >
    > Yes, it would, but no OS I know of supports such a concept. What OS are
    > you using for this project.


    Linux & Windows.

    >
    > >Anyone know of a way to do this on Windows and/or Linux?

    >
    > How much data are you sending out? Do all the machines have to receive the
    > data at the same time? Could you create a distribution tree such that you
    > machine sends the data to (for example) 10 machines, then those 10 machines
    > eash send the data to 10 machines, then those 100 machines each send the
    > data to 10 machines, etc?
    >


    Again, not really an option. Customer requirement is to do it on his
    existing machines.

    > If you can share any other details about the data and the environment, maybe
    > we could give you more answers than questions, but in general, TCP won't do
    > what you apparently need except for as described in your "standard method".


    Based on your answer, I think you pretty much got the gist of my
    question. When you say that TCP won't do what I need I assume you mean
    that the standard stacks won't do it. So I guess my next challenge is
    how to go about getting a non-standard stack.

    > Patrick
    > ========= For LAN/WAN Protocol Analysis, check out PacketView Pro! =========
    > Patrick Klos Email: patr...@klos.com
    > Klos Technologies, Inc. Web: http://www.klos.com/
    > ================================================== =========================*=




  4. Re: Sending Duplicate Data to Many TCP Clients

    In article <1189086966.378094.255460@w3g2000hsg.googlegroups.com>,
    reuven wrote:

    >> >I need to send the same data to many (e.g. 1000) clients over TCP.

    >>
    >> Are these clients all across the Internet or are they on a local network?


    That is an important question. Unless a very, very high-speed wide area
    network or very, very low-speed CPUs are involved, concerns about
    optimizing the "TCP Window (send buffer)" are probably misplaced. At low
    speeds, say less than 100 Mbit/sec, and with a sending CPU that is not too
    ancient, say at least 500 MHz, the sending CPU will have no trouble
    keeping up and the bottleneck will be the slow network.


    >A customer requirement. The customer currently has a solution in place
    >that uses TCP and is afraid of NAT/Firewall issues. As he put it, "I
    >don't want to get a single call about firewall problems." I am trying
    >to support more clients without changing hardware or protocol.



    >> >Using standard methods, I would open 1000 connections, loop and send
    >> >the data to each connection. The stack would provide each connection
    >> >with its own TCP Window (send buffer).

    >>
    >> Right.


    "TCP Window (send buffer)" suggests a model of TCP/IP code that is
    almost certainly false. "Send buffers" are not necessarily simplistic
    buffers into which user data is copied from the application and then
    given to the Ethernet machinery. They can be blocks of RAM that are
    passed around by various kinds of pointer, such as a page frame number.
    The result is that the effective copying is cheap. Look for the
    notion of "cluster mbuf."

    Besides, the costs of TCP output are often in computing the TCP
    checksums, or more specifically, in waiting for the CPU data
    cache misses forced by computing the checksums. A single block of
    data sent to 1000 different systems will involve 1000 different TCP
    checksums because of the pseudo-header.
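
    As a rough illustration of the pseudo-header point, here is a minimal
    Python sketch. The addresses, ports, and header fields are made up for the
    example, and a real stack (or a NIC with checksum offload) does this work
    itself; the sketch only shows that identical payload bytes produce a
    different TCP checksum for each destination address.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum used by IP, TCP, and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum a TCP segment (header + payload) plus the IPv4 pseudo-header."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return internet_checksum(pseudo_header + segment)


# Hypothetical segment: a 20-byte header with a zeroed checksum field,
# followed by the payload that every client receives.
header = struct.pack("!HHIIBBHHH", 5000, 6000, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
segment = header + b"same payload for every client"

checksum_a = tcp_checksum("192.0.2.1", "198.51.100.10", segment)
checksum_b = tcp_checksum("192.0.2.1", "198.51.100.11", segment)
print(hex(checksum_a), hex(checksum_b))  # two different values
```

    In practice the ports and sequence numbers also differ per connection, so
    the per-connection checksums would diverge even further.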

    >> >There would be a significant memory & cpu advantage if I could send
    >> >the data once to the stack and have it share the data (and send
    >> >buffer) once between the 1000 clients.

    >>
    >> Yes, it would, but no OS I know of supports such a concept. What OS are
    >> you using for this project.


    Again, depending on the details of what is meant, that is not exactly
    true. More than one UNIX-like system does "page flipping" or, equivalently,
    has "zero copy sockets."


    >> How much data are you sending out? Do all the machines have to receive the
    >> data at the same time? Could you create a distribution tree such that you
    >> machine sends the data to (for example) 10 machines, then those 10 machines
    >> eash send the data to 10 machines, then those 100 machines each send the
    >> data to 10 machines, etc?

    >
    >Again, not really an option. Customer requirement is to do it on his
    >existing machines.


    That response suggests that the idea was not understood. I think
    the idea was not to use any new machines, but to re-arrange the
    communications among them. Instead of having a single sender and
    1000 receivers, have some of the receivers also send. If the original
    source sends to 6 of the 1000 receivers, each of those 6 contacts another
    6 of the receivers, and so on, you would need only a tree 4 layers deep
    to reach all 1000 systems.
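
    The arithmetic behind that tree depth can be checked with a short Python
    sketch. The function name is made up, and it assumes that only the newest
    layer of receivers forwards the data to the next layer.

```python
def tree_depth(receivers: int, fan_out: int) -> int:
    """Return how many relay layers are needed to reach `receivers` hosts."""
    reached = 0      # receivers that have the data so far
    layer_size = 1   # hosts sending in the current layer (the source at first)
    depth = 0
    while reached < receivers:
        layer_size *= fan_out   # each sender feeds `fan_out` new receivers
        reached += layer_size
        depth += 1
    return depth


depth_fan_out_6 = tree_depth(1000, 6)    # 6 + 36 + 216 + 1296 covers 1000
depth_fan_out_10 = tree_depth(1000, 10)  # 10 + 100 + 1000 covers 1000
print(depth_fan_out_6, depth_fan_out_10)
```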


    >Based on your answer, I think you pretty much got the gist of my
    >question. When you say that TCP won't do what I need I assume you mean
    >that the standard stacks won't do it. So I guess my next challenge is
    >how to go about getting a non-standard stack.


    On the contrary, the next challenge should be making measurements to
    validate the assumption that "TCP Window (send buffer)" is a valid
    concern. Except on a private, high speed network that can move at least
    100 Mbit/sec to all 1000 targets, "TCP Window (send buffer)" is unlikely
    to be a relevant notion. Even if it is relevant, you are likely to
    encounter problems with 1000 simultaneously open sockets before you
    notice any buffer problems.


    Vernon Schryver vjs@rhyolite.com

  5. Re: Sending Duplicate Data to Many TCP Clients

    reuven writes:

    > Based on your answer, I think you pretty much got the gist of my
    > question. When you say that TCP won't do what I need I assume you mean
    > that the standard stacks won't do it. So I guess my next challenge is
    > how to go about getting a non-standard stack.


    No, he means that TCP the protocol isn't designed to do what you need.
    Don't expect to find a non-standard stack that does what you need,
    either, since anyone going down that route would go all the way and
    use a connection-oriented multicast protocol instead of trying to wedge
    TCP into that implementation.

    You don't say how much data is involved, but if you're really
    constrained the way you say you are, I suggest you just implement it
    in the straightforward way and collect your fee. Then you can look at
    the actual performance and decide if you need to improve it as a
    follow-on project. If you really have enough data flowing for the
    buffer space and CPU overhead to be a problem, you're likely to be
    running into bandwidth problems, too, and they won't change if you
    only change the number of application-to-protocol-stack interactions.
    Buffer space and CPU issues are local to the machine, and can be
    solved by buying a bigger machine; bandwidth issues affect the entire
    network. Once the solution is implemented as spec'd using TCP, it
    might be easier for the client to see the advantages of using a
    different protocol.

    -don

  6. Re: Sending Duplicate Data to Many TCP Clients

    reuven wrote:
    > So I guess my next challenge is how to go about getting a
    > non-standard stack.


    I would think your customer would (certainly should, at least
    initially) balk at putting a non-standard stack on his systems. I
    think that unless you can implement the aforementioned distribution
    tree you will just have to go ahead and establish 1000 distinct TCP
    connections (probably not all at once) to distribute the data.

    rick jones
    --
    Process shall set you free from the need for rational thought.
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  7. Re: Sending Duplicate Data to Many TCP Clients

    On Sep 6, 3:53 am, reuven wrote:
    > I need to send the same data to many (e.g. 1000) clients over TCP.
    >
    > I need to use TCP (UDP, RUDP, PGM, etc. are not options).
    >
    > Using standard methods, I would open 1000 connections, loop and send
    > the data to each connection. The stack would provide each connection
    > with its own TCP Window (send buffer).
    >
    > There would be a significant memory & cpu advantage if I could send
    > the data once to the stack and have it share the data (and send
    > buffer) once between the 1000 clients.
    >
    > Anyone know of a way to do this on Windows and/or Linux?


    You can (sort of / mostly) do this on Windows using IOCP. Windows
    permits you to lock an application buffer in memory and then post
    multiple requests using that same buffer.

    However, the tradeoffs with doing this generally outweigh the
    advantages, and I would not recommend bothering. Odds are any
    performance issues you have are due to something completely unrelated
    to the cost of actually sending the data.

    Even ultra-high-performance code for Windows seldom does this. It
    requires setting the kernel send buffer to zero and keeping the
    application buffer around until the data is acknowledged by the other
    end.
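
    For reference, the "kernel send buffer" mentioned here is the SO_SNDBUF
    socket option. The Python sketch below shows only how to request a zero
    size and read back what the stack actually applied. It does not show IOCP
    or overlapped sends, and the resulting value is platform-dependent (Linux,
    for example, enforces a minimum rather than honoring zero).

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

# Request a zero-byte kernel send buffer. The stack is free to adjust the
# value, so always read it back instead of assuming the request was honored.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 0)
effective_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

print(f"default SO_SNDBUF: {default_size} bytes, after request: {effective_size} bytes")
sock.close()
```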

    The only case in which I could see it being worth the effort is where
    the cost to keep 1,000 copies of the data is significant. I maintain
    a code base that basically sends the same data to subsets of 10,000 or
    more TCP clients. We don't even set the kernel's send buffers to zero
    -- ironically because of throughput and resource consumption issues.

    DS


  8. Re: Sending Duplicate Data to Many TCP Clients

    First, thanks to all the responders.

    As I mentioned, my problems are both in CPU and MEMORY and in the fact
    that the customer has put significant restraints on changing the
    hardware & protocol. Cascading the send is not an option because of
    the network topology.

    I suspect that you are correct and the CPU issue is mostly a red
    herring (the hardware does support checksum offload), but the memory
    issue is very real. Also, I was a bit misleading in mentioning 1000
    connections. The number is actually closer to 2000 with a total output
    of around 1.2 gigabit per second (using 2 x 1 gigabit NICs) per server
    (this is the number I need to improve on).
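
    To put rough numbers on this, here is a back-of-the-envelope sketch in
    Python. The connection count and total output are the figures above; the
    64 KiB per-connection send buffer is purely an assumption for
    illustration, not a measured value, so substitute the real SO_SNDBUF of
    the target system.

```python
CONNECTIONS = 2000
TOTAL_OUTPUT_BITS_PER_S = 1.2e9          # 1.2 gigabit per second per server
ASSUMED_SEND_BUFFER_BYTES = 64 * 1024    # assumption, not a figure from the thread

# Average rate each client receives if the output is shared evenly.
per_client_kbit_s = TOTAL_OUTPUT_BITS_PER_S / CONNECTIONS / 1000

# Memory consumed if every connection keeps a full private send buffer.
buffer_memory_mib = CONNECTIONS * ASSUMED_SEND_BUFFER_BYTES / 2**20

print(f"{per_client_kbit_s:.0f} kbit/s per client")
print(f"{buffer_memory_mib:.0f} MiB of send buffers")
```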

    IOCP sounds like an interesting direction. I'll look into it.

    Thanks again,

    rw


  9. Re: Sending Duplicate Data to Many TCP Clients

    In article <1189278847.226956.237620@19g2000hsx.googlegroups.com>,
    reuven wrote:

    > First, thanks to all the responders.
    >
    > As I mentioned, my problems are both in CPU and MEMORY and in the fact
    > that the customer has put significant restraints on changing the
    > hardware & protocol. Cascading the send is not an option because of
    > the network topology.


    If he's going to put so many constraints on, maybe HE should design it
    and just let you code what he's designed.

    If he doesn't want to do that, he should specify the overall goal and
    let you design the best way to do it.

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***
    *** PLEASE don't copy me on replies, I'll read them in the group ***

  10. Re: Sending Duplicate Data to Many TCP Clients

    "reuven" wrote in message
    news:1189278847.226956.237620@19g2000hsx.googlegroups.com...
    > First, thanks to all the responders.
    >
    > As I mentioned, my problems are both in CPU and MEMORY and in the fact
    > that the customer has put significant restraints on changing the
    > hardware & protocol. Cascading the send is not an option because of
    > the network topology.
    >
    > I suspect that you are correct and the CPU issue is mostly a red-
    > herring. (the hardware does support offloaded checksum) but the memory
    > issue is very real. Also, I was a bit misleading in mentioning 1000
    > connections. The number is actually closer to 2000 with a total output
    > of around 1.2 gigabit per second (using 2 x 1 gigabit NICs) per server
    > (this is the number I need to improve on).


    Some problems with IP/TCP:

    1. The IP header checksum needs to be recalculated for every client
    (because the destination address changes).

    2. The TCP checksum needs to be recalculated for every client (because
    the address and port change).

    3. Any extra data integrity checks like crc32 or hashing might have to be
    done again.

    However, 1 and 2 could maybe be sped up by doing:

    For 1. Subtract the old IP address and port from the checksum, then add the
    new IP address and port to the checksum. Might work, might save some cycles.

    For 2. Do the same.

    For 3. Probably needs to be completely re-calculated.
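
    The subtract-and-re-add idea for 1 and 2 corresponds to the incremental
    update of the Internet checksum described in RFC 1624. Below is a minimal
    Python sketch with made-up 16-bit words standing in for an address field;
    it is not code from any real stack, and it only checks that the
    incremental result matches a full recomputation.

```python
import struct


def ones_complement_sum(words) -> int:
    """Add 16-bit words with end-around carry."""
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(data: bytes) -> int:
    """Full Internet checksum over an even-length byte string."""
    words = struct.unpack(f"!{len(data) // 2}H", data)
    return ~ones_complement_sum(words) & 0xFFFF


def update_checksum(old_checksum: int, old_word: int, new_word: int) -> int:
    """Adjust a checksum after one 16-bit word changes (RFC 1624, equation 3)."""
    total = ones_complement_sum(
        [~old_checksum & 0xFFFF, ~old_word & 0xFFFF, new_word]
    )
    return ~total & 0xFFFF


# Hypothetical data: two address words followed by four payload words.
original = struct.pack("!6H", 0xC633, 0x640A, 0x1234, 0x5678, 0x9ABC, 0xDEF0)
modified = struct.pack("!6H", 0xC633, 0x640B, 0x1234, 0x5678, 0x9ABC, 0xDEF0)

incremental = update_checksum(checksum(original), 0x640A, 0x640B)
full = checksum(modified)
print(incremental == full)
```

    The update touches only the changed words, so its cost does not depend on
    the payload size.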

    4. Then there is of course the extra call overhead and probably extra copies
    inside winsock.

    5. Extra memory requirements for the windows which could use up lots of
    memory.

    I like your envisioned solution kinda:

    A. Specify a group of IP addresses and possibly ports.

    B. Specify the data to be transferred.

    C. Give each IP address/port its own ack system etc.

    D. Make sure to share the data.

    E. Make sure not to add any additional checksumming or so.

    F. Would work best if clients get updated at the same speed, so they are all
    in sync.

    Gains to be expected:

    Less cpu usage for the sender.

    Less memory usage for the sender.

    Coding speed might be higher, might prove easier to use.

    Especially this last gain sounds interesting.

    But then again, what if an individual client fails?

    It might be kicked from the group, so some extra routines might be
    necessary to indicate these kinds of failures, or maybe individual
    completions.

    And/or alternatively there might be a single completion indication for the
    whole group.

    Each member should of course have its own speed limits and such.

    I kinda like this envisioned solution.

    More of a "working with groups" concept, methinks.

    Bye,
    Skybuck.



  11. Re: Sending Duplicate Data to Many TCP Clients

    On Sep 8, 12:14 pm, reuven wrote:

    > I suspect that you are correct and the CPU issue is mostly a red-
    > herring. (the hardware does support offloaded checksum) but the memory
    > issue is very real. Also, I was a bit misleading in mentioning 1000
    > connections. The number is actually closer to 2000 with a total output
    > of around 1.2 gigabit per second (using 2 x 1 gigabit NICs) per server
    > (this is the number I need to improve on).
    >
    > IOCP sounds line an interesting direction. I'll look into it.


    If you're not using IOCP, you're definitely going to lose. I would
    suggest setting the send buffer to be as low as possible (to conserve
    memory), and keeping a reference counted linked list of messages,
    removing each message as soon as it has been sent to the last client.
    This way, the duplication is minimized.
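
    A minimal Python sketch of that reference-counting idea follows. The
    class and method names are invented for the example, it uses a deque
    rather than a hand-built linked list, and it assumes a fixed set of
    clients that each consume messages in order.

```python
from collections import deque


class SharedMessageQueue:
    """Keep one copy of each message until every client has sent it."""

    def __init__(self, client_ids):
        self._messages = deque()  # entries are [payload, clients_still_pending]
        self._base = 0            # sequence number of the oldest queued message
        self._next = {client: 0 for client in client_ids}

    def publish(self, payload: bytes) -> None:
        self._messages.append([payload, len(self._next)])

    def next_for(self, client):
        """Return the next unsent payload for `client`, or None if caught up."""
        index = self._next[client] - self._base
        if index >= len(self._messages):
            return None
        return self._messages[index][0]

    def mark_sent(self, client) -> None:
        """Drop one reference; free messages no client still needs."""
        entry = self._messages[self._next[client] - self._base]
        entry[1] -= 1
        self._next[client] += 1
        while self._messages and self._messages[0][1] == 0:
            self._messages.popleft()
            self._base += 1

    def pending(self) -> int:
        return len(self._messages)


queue = SharedMessageQueue(["a", "b"])
queue.publish(b"tick 1")
payload_a = queue.next_for("a")
payload_b = queue.next_for("b")   # the same bytes object, not a copy
queue.mark_sent("a")
pending_after_first = queue.pending()
queue.mark_sent("b")
pending_after_last = queue.pending()
```

    A slow client pins every message it has not yet sent, so a real server
    would also need a policy for dropping or disconnecting clients that fall
    too far behind.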

    DS


  12. Re: Sending Duplicate Data to Many TCP Clients

    In article <1189278847.226956.237620@19g2000hsx.googlegroups.com>,
    reuven wrote:
    >First, thanks to all the responders.
    >
    >As I mentioned, my problems are both in CPU and MEMORY and in the fact
    >that the customer has put significant restraints on changing the
    >hardware & protocol. Cascading the send is not an option because of
    >the network topology.
    >
    >I suspect that you are correct and the CPU issue is mostly a red-
    >herring. (the hardware does support offloaded checksum) but the memory
    >issue is very real. Also, I was a bit misleading in mentioning 1000
    >connections. The number is actually closer to 2000 with a total output
    >of around 1.2 gigabit per second (using 2 x 1 gigabit NICs) per server
    >(this is the number I need to improve on).


    CPU usage, especially when sending data, is rarely a bottleneck. If you
    have real-time constraints, it might be, but look elsewhere first. You
    will want to collect some statistics on memory usage and correlate that
    with the number of active connections. The biggest win is generally in
    a good round-robin algorithm. Keep a table of all open connections and
    order your sends with the least-recently sent connection, limited by
    how many connections can concurrently be sending data without seriously
    impacting overall performance.
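
    One way to sketch that round-robin table in Python is shown below. The
    class name and the concurrency cap are hypothetical; the right cap would
    have to come from the measurements mentioned above.

```python
from collections import OrderedDict
from itertools import islice


class SendScheduler:
    """Pick the least-recently-served connections, up to a concurrency cap."""

    def __init__(self, max_concurrent: int):
        self._order = OrderedDict()  # connection -> None, least recent first
        self._max_concurrent = max_concurrent

    def add(self, connection) -> None:
        self._order[connection] = None

    def next_batch(self):
        """Return up to max_concurrent connections and move them to the back."""
        batch = list(islice(self._order, self._max_concurrent))
        for connection in batch:
            self._order.move_to_end(connection)
        return batch


scheduler = SendScheduler(max_concurrent=2)
for name in ["c1", "c2", "c3", "c4", "c5"]:
    scheduler.add(name)

first = scheduler.next_batch()
second = scheduler.next_batch()
third = scheduler.next_batch()
print(first, second, third)
```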

    Your bandwidth numbers and NIC configuration are vital pieces of
    information. See if you can find a way to bond the two NICs into a
    single virtual port. Or, if you can find a natural division in your
    topology, attach this server to two subnets and attempt to balance the
    load between them.

    >IOCP sounds line an interesting direction. I'll look into it.


    IO completion ports are a big win because you don't require a thread for
    each open connection. This often alleviates CPU-related issues.

  13. Re: Sending Duplicate Data to Many TCP Clients

    On Tue, 11 Sep 2007 18:11:23 +0000 (UTC), Howard Johnson wrote:
    > In article <1189278847.226956.237620@19g2000hsx.googlegroups.c om>,
    > reuven wrote:


    >>IOCP sounds line an interesting direction. I'll look into it.

    >
    > IO completion ports are a big win because you don't require a thread for
    > each open connection. This often alleviates CPU-related issues.


    You mean you cannot handle more than one TCP connection in a Windows
    thread without IOCP? Just curious -- I'm on Unix, so I typically use
    select(), but I assumed there was /some/ simple non-thread way in
    Win32, too.

    /Jorgen

    --
    // Jorgen Grahn \X/ snipabacken.dyndns.org> R'lyeh wgah'nagl fhtagn!

  14. Re: Sending Duplicate Data to Many TCP Clients

    On Sep 23, 9:07 am, Jorgen Grahn
    wrote:
    > On Tue, 11 Sep 2007 18:11:23 +0000 (UTC), Howard Johnson wrote:
    > > In article <1189278847.226956.237...@19g2000hsx.googlegroups.c om>,
    > > reuven wrote:
    > >>IOCP sounds line an interesting direction. I'll look into it.

    >
    > > IO completion ports are a big win because you don't require a thread for
    > > each open connection. This often alleviates CPU-related issues.

    >
    > You mean you cannot handle more than one TCP connection in a Windows
    > thread without IOCP? Just curious -- I'm on Unix, so I typically use
    > select(), but I assumed there was /some/ simple non-thread way in
    > Win32, too.
    >
    > /Jorgen
    >
    > --
    > // Jorgen Grahn > \X/ snipabacken.dyndns.org> R'lyeh wgah'nagl fhtagn!


    You can use the standard select mechanism on windows also.

    see http://msdn.microsoft.com/msdnmag/issues/1000/Winsock for an
    explanation of why IOCP is superior to select for large scale servers.
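
    For comparison, a select()-style loop that serves many connections from
    one thread can be sketched with Python's selectors module. This is only
    an illustration of the single-threaded approach, not of IOCP; the
    function name is made up, error handling for dropped connections is
    omitted, and local socket pairs stand in for real clients.

```python
import selectors
import socket


def broadcast(payload: bytes, clients) -> None:
    """Send one shared payload to every socket in `clients` from one thread."""
    view = memoryview(payload)            # slicing a memoryview does not copy
    offsets = {sock: 0 for sock in clients}
    selector = selectors.DefaultSelector()
    for sock in clients:
        sock.setblocking(False)
        selector.register(sock, selectors.EVENT_WRITE)
    pending = len(offsets)
    while pending:
        for key, _events in selector.select():
            sock = key.fileobj
            try:
                offsets[sock] += sock.send(view[offsets[sock]:])
            except BlockingIOError:
                continue                  # not writable after all; retry later
            if offsets[sock] == len(view):
                selector.unregister(sock)  # this client now has everything
                pending -= 1
    selector.close()


# Demo with local socket pairs standing in for real client connections.
pairs = [socket.socketpair() for _ in range(3)]
payload = b"same data for everyone\n" * 10
broadcast(payload, [ours for ours, _theirs in pairs])
received = [theirs.recv(len(payload)) for _ours, theirs in pairs]
for ours, theirs in pairs:
    ours.close()
    theirs.close()
```

    The application keeps a single copy of the payload, but each kernel send
    buffer still receives its own copy, which is the cost the original
    question is about.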


  15. Re: Sending Duplicate Data to Many TCP Clients

    In article <1190548553.490083.290770@50g2000hsm.googlegroups.com>,
    reuven wrote:

    >> > IO completion ports are a big win because you don't require a thread for
    >> > each open connection. This often alleviates CPU-related issues.

    >>
    >> You mean you cannot handle more than one TCP connection in a Windows
    >> thread without IOCP? Just curious -- I'm on Unix, so I typically use
    >> select(), but I assumed there was /some/ simple non-thread way in
    >> Win32, too.


    >You can use the standard select mechanism on windows also.


    "Standard select() mechanism on Window" is not entirely accurate, because
    Winsock sockets are pointers of some kind instead of small integers
    like UNIX file descriptors. Naive, extremely simple UNIX select() style
    code might not notice the difference, but you'll have reason to grumble
    while trying to keep other code portable between Windows and UNIX-like
    systems. As far as I can recall, the issues are minor, but they involve
    more than merely a typedef or two used for your sockets.

    Those problems are distinct from the sometimes substantial problems in
    other parts of Winsock such as with Microsoft's reading of POSIX
    and odd choices in support of get/setsockopt(), ioctl(), etc.


    >see http://msdn.microsoft.com/msdnmag/issues/1000/Winsock for an
    >explanation of why IOCP is superior to select for large scale servers.


    I've read reasonable people saying good things about Microsoft's Completion
    Ports, but that page starts with so much sales nonsense and so many
    assumptions that application code must be garbage written by idiots that it
    might convince people with technical clues otherwise. Later on it becomes
    more sensible.
    --


    Vernon Schryver vjs@rhyolite.com
