Degradation of TCP connection - VxWorks



Thread: Degradation of TCP connection

  1. Degradation of TCP connection

    Hi everyone,

    I've got a TCP communications problem in VxWorks (6.5) that has me
    stumped.

    I've got a Windows XP machine running a LabView program that sends
    500-byte TCP/IP packets to the VxWorks app (running on a single-board
    computer w/ a Motorola PowerPC 7447 processor, elsewhere on the
    network) at 10Hz. The VxWorks app reads some temperature sensors from
    its A/D boards, packages up that data (about 1kB), and sends it back
    to the LabView app, also at 10Hz. There is another task on the VxWorks
    app that performs some simple calculations on the data, packages up
    the results of those calculations (again, into packets about 1kB in
    size) and sends them to the LabView app, also at about 10Hz. The
    socket for the TCP connection is a global variable that is used by
    both VxWorks tasks, and protected by a mutex.

    Here is the problem. After about 70 hours of communication, the
    connection fails. A packet sniffer (Wireshark) revealed that the
    VxWorks app suddenly stops hearing from the LabView app. Both parties
    continue to send packets until (1) the LabView app gets nervous that
    VxWorks hasn't been incrementing its ACKs and so begins retransmitting
    old packets, and (2) the VxWorks app gets nervous that it hasn't
    received anything from the LabView app and so begins retransmitting
    old packets. Soon the apps fill their respective send buffers, the
    connection times out, and it's game over. The VxWorks program seems to
    lock up and the app must be restarted.

    This problem happened after 70 hours of flawless communication. After
    rebooting VxWorks, we ran 22 hours until the next failure. After
    rebooting again, 28 hours until the next.

    At first, it seemed like this behavior might be caused by an
    intermittently-broken receive wire on the VxWorks side, but an
    inspection suggests that the hardware (at least outside the
    single-board computer) is fine. We are begrudgingly confident that the
    problem is not on the Windows side, because the packet data
    demonstrates that the problem starts when the VxWorks app first stops
    listening. I am hesitant to blame the application-level software,
    because it appears from the packet data that the VxWorks TCP stack
    simply does not receive (or ignores) the LabView packets, and so the
    poor application simply sees the stream of packets stop. Lastly, we
    attempted a workaround where the LabView app closes and reopens the
    connection every two hours. After implementing this code, the
    connection failed 28 hours later. In an attempt to simplify
    communications, we enabled TCP_NODELAY on both sides, to no avail.

    I have no more tricks up my sleeve. I am not familiar with the VxWorks
    TCP stack; perhaps there are buffers (aside from the TCP send and
    receive buffers) that can fill up or degrade over time? Is there any
    way I can probe into the TCP stack and determine its health? Do you
    think I'm barking up the wrong tree? A friend suggested setting the
    TCP window size to something very small in order to minimize the
    number of packets "in the air" on the network, but I'm not sure how to
    set this in VxWorks. Does anyone know?

    In short: Any suggestions? Has anyone encountered this behavior
    before?

    Thank you all in advance for your suggestions and time,
    Justin

  2. Re: Degradation of TCP connection

    On Jul 22, 3:12 pm, justin.pear...@gmail.com wrote:
    > [full quote of the original post trimmed]


    Oh, and I failed to mention that the VxWorks app is the listening side
    of the TCP connection. The LabView app connects to it.

  3. Re: Degradation of TCP connection

    Yes, VxWorks does have its own network data buffers that can
    eventually cause problems. I am having a similar issue with VxWorks
    TCP and can share what I've been doing to try and diagnose it, in
    case it gives you some ideas.

    My application requires sending tons of data at very fast rates over
    Gigabit ethernet. Essentially I am trying to send 4,800,000 bytes per
    second (this may prove unfeasible). Anyway, what I do is read voltage
    samples off an A/D card connected to my single-board computer via the
    PC104-Plus interface, at a rate of 400k samples per second for 6
    channels. Each sample is 16 bits, or 2 bytes of data. However, the
    A/D card's data buffer can only hold about 64K samples worth of data
    before overflowing, so I need to make sure I am taking data off the
    A/D card at a constant rate and then sending that data out in large
    packets over the ethernet.

    After 6000 samples accumulate in the data buffer, I have a PCI
    interrupt trip which copies the sample data off the PCI buffer and
    into another buffer, then sends a message to my ethernet application
    telling it to send the data on the second buffer out over the TCP
    socket.

    My connection keeps up fine for the first 10 or so interrupt cycles,
    then the network write task begins falling behind. After some
    troubleshooting, I determined that my problems start after sending
    about 128 kB out over the TCP socket. The same held true when I did a
    test with UDP (just to verify that it wasn't the receive side causing
    the issue). If I shrink the packets, or send less data at a time, I
    still eventually run into the same problem at 128 kB every time.

    The issue seems to be (just a theory right now) that the VxWorks
    network data buffer has filled up at this point and needs to free the
    memory and re-initialize, so tNetTask is pending and my network send
    call has to wait. The 128 kB total seems to match up with the default
    network memory block total. I am trying a new method using something
    called the zbuf socket library. This library basically allows you to
    send data over a socket without copying it into the VxWorks network
    data buffers. However, Wind River took zbuf support out of VxWorks
    6.5, so I am wondering whether this is a dead end.

    Try this link http://slac.stanford.edu/exp/glast/f...e/c-tcpip.html
    and go to section "4.3.3 Network Memory Pool Configuration". This
    covers how the network memory is set up. I am still trying to digest
    some of it myself to get a better idea. There does appear to be some
    way to diagnose network memory usage.


    If I come across something, I'll post a better update.

  4. Re: Degradation of TCP connection

    On Jul 23, 2:05 pm, gtd...@gmail.com wrote:
    > [full quote of the previous reply trimmed]



    There are two important points here:

    1) You're using VxWorks 6.5, which has a new TCP/IP stack (the IPNET
    stack from Interpeak, which Wind River acquired). The documentation
    link you posted is for an older version of VxWorks that used a
    BSD-derived stack. Some of the information in that documentation is no
    longer valid for the new stack, in particular that which pertains to
    the stack's internal buffer management: there is no "system pool" and
    "data pool" any longer. If you're looking at netPoolShow(), I wouldn't
    bother. You can use that to check the ethernet driver's netpool, but
    internally IPNET uses a totally different buffer management scheme, so
    the netBufLib debugging stuff won't help you.

    2) You failed to specify what ethernet controller you're using. I know
    that the MPC7447 doesn't include built-in networking hardware, so your
    board must have some other ethernet chip on it (standalone, or part of
    some combined I/O controller device). Please explain what controller
    it is, exactly. This is important, because your problem might not be a
    general networking issue, but a bug in the ethernet driver. If it
    helps, tell us what BSP or single board computer your design is based
    on.

    And no, you can't use zbufs with the IPNET stack. The major design
    difference between IPNET and the BSD-derived stack is that the BSD
    code allows internally stored packet data to be fragmented across
    multiple buffers (mbufs), while the IPNET stack requires all packets
    to fit into a single contiguous buffer. zbufs are a VxWorks-specific
    extension to the BSD-derived stack that allow an application-supplied
    buffer (or buffers) to be directly mated to an mbuf (or mbuf chain)
    instead of having to allocate a whole mbuf tuple and copy the data
    from the application buffers into the mbuf cluster buffers.

    The problem is, sometimes an application may do a large write which
    has to be broken up into smaller packets (if an app write()s 64K of
    data to a socket, that data eventually has to be split up into
    1500-byte chunks for transmission over ethernet). With the BSD mbuf
    scheme, it's not that hard to just allocate a bunch of mbufs, set
    them to point to different sub-buffers within the bounds of the
    single large buffer provided by the application, and then chain them
    together. But because IPNET requires all packets to fit into a single
    contiguous buffer, it doesn't support the ability to chain multiple
    fragments together. This means you can't help but copy the data in
    order to get it formatted correctly for transmission over the wire,
    which sort of defeats the purpose of zero-copy buffers. There is
    talk, however, of finally implementing scatter/gather within IPNET so
    that zbufs can be brought back.

    There are arguments for and against both designs. The BSD mbuf-based
    design is more flexible and can be more frugal with memory, but the
    code is more complex. The IPNET ipnet_packet design is not as
    flexible, but the code is simpler, and using its own internal buffer
    handling scheme makes it more OS-agnostic, which was one of the IPNET
    design requirements. (Personally I prefer the BSD design. Critics
    complain that the days when you had to run BSD on a PDP-11 with
    minimal memory -- which is what necessitated the more frugal buffer
    management scheme in the first place -- are long over, and that I
    should learn to stop worrying and love large amounts of RAM. I contend
    that just because you have a lot of RAM doesn't mean you shouldn't
    make frugal use of it, and besides, VxWorks does sometimes have to run
    on hardware with minimal memory.)

    -Bill

  5. Re: Degradation of TCP connection

    On Jul 24, 9:30 am, noiset...@gmail.com wrote:
    > [nested quoted text trimmed]


    Hi Bill,

    Thanks for your thoughtful post.

    Here are some stats on the system we're using:
    - Curtiss-Wright 124 single-board computer
    - The 124 uses an END Ethernet driver, and the physical device is on a
    chip called the MV64460 (or Discovery III). I couldn't find any info
    on the driver versions. I figure it's linked to our BSP version
    number.

    You were very helpful to another poster* regarding a problem that you
    determined to be a buggy ethernet driver, made by DY-4 Systems. The
    original poster from that thread also seemed to be using a
    Curtiss-Wright (then DY-4 Systems) single-board computer. From our packet
    logs, we found that the MAC address of the single-board computer
    resolves to something that starts with DY-4... coincidence? Do you
    think that we're running into the same problem as the poster from the
    other thread because we're using the same crummy DY-4 ethernet driver?

    *
    http://groups.google.com/group/comp....a618f4306f23ad

    Thanks for your continued help,
    Justin

  6. Re: Degradation of TCP connection

    On Jul 24, 2:10 pm, justin.pear...@gmail.com wrote:
    > On Jul 24, 9:30 am, noiset...@gmail.com wrote:
    >
    >
    >
    > > On Jul 23, 2:05 pm, gtd...@gmail.com wrote:

    >
    > > > Yes vxworks does have it's own network data buffers that eventually
    > > > can cause problems. I am having a similar issue with vxWorksTCPand
    > > > can share what I've been doing to try and diagnose it in the event it
    > > > gives you some ideas.

    >
    > > > My application requires sending tons of data at very fast rates over a
    > > > Gigabit ethernet. Essentially I am trying to send 4,800,000 bytes per
    > > > second (this may prove unfeasible). Anyways, what I do is read voltage
    > > > samples off an A/D card connected to my Single Board computer via the
    > > > PC104-Plus interface, at a rate of 400k samples per second for 6
    > > > channels. Each sample has 16 bits, or 2 bytes of data. However, the A/
    > > > D card's data buffer can only hold about 64K samples worth of data
    > > > before overflowing, so I need to make sure I am taking data off the A/
    > > > D card at a constant rate and then sending that data out in large
    > > > packets over the ethernet.

    >
    > > > After 6000 samples accumulate in the data buffer, I have a PCI
    > > > interrupt trip which copies the sample data off the PCI buffer and
    > > > into another buffer, then sends a message to my ethernet application
    > > > telling it to send the data on the second buffer out over theTCP
    > > > socket.

    >
    > > > My connection keeps up fine for the first 10 or so interrupt cycles,
    > > > then the network write task begins falling behind. After some
    > > > troubleshooting, I determined that after sending about 128k bytes out
    > > > over theTCPsocket, my problems start. The same holds true when I did
    > > > a test with UDP (just to verify that it wasn't the receive side
    > > > causing the issue). If I shrink the packets, or send less data at a
    > > > time, I still eventually run into the same problem at 128kbytes every
    > > > time.

    >
    > > > The issue seems to be (just a theory right now) that the vxworks
    > > > network data buffer has filled up at this point and needs to free the
    > > > memory and re-initialize, so the tNetTask is pending and my network
    > > > send call has to wait. The 128k-bytes total seems to match up with the
    > > > default network memory block total. I am trying a new method using
    > > > something called the zbufSocket Library. This library basically allows
    > > > you to send data over a socket without using vxwork's network data
    > > > buffer. However, WindRiver took zBuf support out of vxworks version
    > > > 6.5, so I am wondering whether this is a dead end.

    >
    > > > Try this linkhttp://slac.stanford.edu/exp/glast/flight/sw/vxdocs/vxworks/netguide/...
    > > > and go to section "4.3.3 Network Memory Pool Configuration". This
    > > > covers how the network memory is set up. I am still trying to digest
    > > > some of it myself to get a better idea. There does appear to be some
    > > > way to diagnose network memory usage.

    >
    > > > If I come across something, I'll post a better update.

    >
    > > There are two important points here:

    >
    > > 1) You're using VxWorks 6.5, which has a newTCP/IP stack (the IPNET
    > > stack from Interpeak, which Wind River acquired). The documentation
    > > link you posted is for an older version of VxWorks that used a BSD-
    > > derived stack. Some of the information in that documentation is no
    > > longer valid for the new stack, in particular that which pertains to
    > > the stack's internal buffer management: there is no "system pool" and
    > > "data pool" any longer. If you're looking at netPoolShow(), I wouldn't
    > > bother. You can use that to check the ethernet driver's netpool, but
    > > internally IPNET uses a totally different buffer management scheme, so
    > > the netBufLib debugging stuff won't help you.

    >
    > > 2) You failed to specify what ethernet controller you're using. I know
    > > that the MPC7447 doesn't include built in networking hardware, so your
    > > board must have some other ethernet chip on it (standalone, or part of
    > > some combined I/O controller device). Please explain what controller
    > > it is, exactly. This is important, because your problem might not be a
    > > general networking issue, but a bug in the ethernet driver. If it
    > > helps, tell us what BSP or single board computer your design is based
    > > on.

    >
    > > And no, you can't use zbufs with the IPNET stack. The major design
    > > difference between IPNET and the BSD-derived stack is that the BSD
    > > code allows internally stored packet data to be fragmented across
    > > multiple buffers (mbufs) while the IPNET stack requires all packets to
    > > fit into a single contiguous buffer. zBufs are a VxWorks-specific
    > > extension to the BSD-derived stack that allow an application supplied
    > > buffer (or buffers) to be directly mated to an mbuf (or mbuf chain)
    > > instead of having to allocate a whole mbuf tuple and copying the data
    > > from the application buffers into the mbuf cluster buffers. The
    > > problem is, sometimes an application may do a large write which has to
    > > be broken up into smaller packets (if an app write()s 64K of data to a
    > > socket, that has to eventually be split up into 1500 byte chunks for
    > > transmission over ethernet). With the BSD mbuf scheme, it's not that
    > > hard to just allocate a bunch of mbufs and set them to point to
    > > different sub-buffers within the bounds of the single large buffer
    > > provided by the application and then chain them together. But because
    > > IPNET requires all packets to fit into a single contiguous buffer, it
    > > doesn't support the ability to chain multiple fragments together. This
    > > means you can't help but copy the data in order to get it formatted
    > > correctly for transmission over the wire, which sort of defeats the
    > > purpose of zero copy buffers. There is talk, however, of finally
    > > implementing scatter/gather within IPNET so that zBufs can be brought
    > > back.

    >
    > > There are arguments for and against both designs. The BSD mbuf-based
    > > design is more flexible and can be more frugal with memory, but the
    > > code is more complex. The IPNET ipnet_packet design is not as
    > > flexible, but the code is simpler, and using its own internal buffer
    > > handling scheme makes it more OS-agnostic, which was one of the IPNET
    > > design requirements. (Personally I prefer the BSD design. Critics
    > > complain that the days when you had to run BSD on a PDP-11 with
    > > minimal memory -- which is what necessitated the more frugal buffer
    > > management scheme in the first place -- are long over, and that I
    > > should learn to stop worrying and love large amounts of RAM. I contend
    > > that just because you have a lot of RAM doesn't mean you shouldn't
    > > make frugal use of it, and besides, VxWorks does sometimes have to run
    > > on hardware with minimal memory.)

    >
    > > -Bill

    >
    > Hi Bill,
    >
    > Thanks for your thoughtful post.
    >
    > Here are some stats on the system we're using:
    > - Curtiss-Wright 124 single-board computer
    > - The 124 uses an END Ethernet driver, and the physical device is on a
    > chip called the MV64460 (or Discovery III). I couldn't find any info
    > on the driver versions. I figure it's linked to our BSP version
    > number.
    >
    > You were very helpful to another poster* regarding a problem that you
    > determined to be a buggy ethernet driver, made by DY-4 Systems. The
    > original poster from that thread also seemed to be using a Curtiss-
    > Wright (then DY-4 Systems) single-board computer. From our packet
    > logs, we found that the MAC address of the single-board computer
    > resolves to something that starts with DY-4... coincidence? Do you
    > think that we're running into the same problem as the poster from the
    > other thread because we're using the same crummy DY-4 ethernet driver?
    >
    > *http://groups.google.com/group/comp....hread/thread/5...
    >
    > Thanks for your continued help,
    > Justin



    No, it's not the same driver.

    Drivers usually come from one of two places: either Wind River, or a
    3rd party BSP supplier. Wind River supplies drivers for some commonly
    available NICs (e.g. the Intel PRO/100 or PRO/1000 PCI cards) and for
    some controllers in system-on-chip processors for which they provide
    BSPs (e.g. the TSEC ethernet on the Freescale MPC8560). Some board
    vendors supply their own BSPs and include their own ethernet drivers
    if VxWorks doesn't provide driver support already. (If no driver
    exists at all, you can either write one yourself, or pay Wind River
    Professional Services to write one for you.)

    In the other poster's case, his board had a 10Mbps NatSemi SONIC chip
    and VxWorks did include a driver for it (if_sn). Unfortunately, that
    driver turned out to be kinda crummy, and didn't hold up well under
    load.

    In your case, you're probably using the on-board gigabit MACs in the
    Discovery III system controller. According to the documentation, it
    has 3 gigabit ports. I'm pretty sure the network driver you have was
    not written by Wind River, though I don't know if it was done by
    Curtiss-Wright or Marvell. If I had to bet, I'd say that at least some
    of the code was provided by Marvell.

    Looking over your original post, I see that the time that elapses
    before the failure is not consistent -- in one case it was 70 hours,
    in others 20 and 28 hours. This could be a buffer exhaustion issue,
    but to me the variation in reproducibility suggests a race condition
    instead. It could be a driver bug, but you'll need to run some more
    tests in order to know for sure.

    Here are a couple of thoughts:

    - When your application and LabView stop communicating, can you still
    ping the target from the Windows XP machine? If yes, the ethernet
    driver and the IP layer of the stack are still working, at least to
    some extent, and it's something at the TCP layer that's gone wrong. If
    no, then it could be the driver, or a serious problem in the stack.

    - You say that it looks like the VxWorks target stops receiving
    traffic from LabView. How did you determine this? I usually check for
    receive operation by adding the target shell and the INCLUDE_IFCONFIG
    component. At the shell, you can do:

    -> ifconfig "motfcc0"
    motfcc0    Link type:Ethernet  HWaddr 00:04:9f:07:08:09  Queue:none
               inet 147.11.46.192  mask 255.255.255.0  broadcast 147.11.46.255
               UP RUNNING SIMPLEX BROADCAST MULTICAST
               MTU:1500  metric:1  VR:0  ifindex:2
               RX packets:24 mcast:7 errors:0 dropped:1
               TX packets:6 mcast:0 errors:0
               collisions:0 unsupported proto:0
               RX bytes:2198  TX bytes:438

    value = 0 = 0x0

    If you run ifconfig a couple of times and you see the "RX packets" and
    "RX bytes" incrementing, then this means the driver is still receiving
    frames and passing them into the stack. (The stack maintains the
    counters shown by ifconfig, not the driver, so you know the data is
    making the transition across the driver/stack boundary.) If this is
    the case, it means the receive path is working, but the LabView app isn't
    getting any response back from the target because the transmit path is
    stalled. If you don't see the RX counters incrementing, then the
    receiver is actually stuck.

    Unfortunately, the TX counters are less useful: they indicate that the
    stack sourced packets to the underlying driver, but that doesn't tell
    you whether or not the outgoing frames were successfully transmitted.
    You can test if the driver is still able to send traffic by including
    the INCLUDE_PING component in your image. When the target hangs, try
    to do:

    -> ping "xxx.xxx.xxx.255", 5

    from the target shell, where xxx.xxx.xxx is your IP network. This
    should (assuming your netmask is 255.255.255.0) cause the target to
    send some broadcast packets onto the wire, which you can observe with
    Wireshark.

    If you see the broadcast packets sent, and the RX counters don't
    increment, then the transmit path is working and the receiver is
    stalled.

    If the broadcast packets don't make it onto the wire, and the RX
    counters do increment, then the receive path is working and the
    transmitter is stalled.

    If the RX counters don't increment and you don't see any broadcast
    packets on the wire either, the interface is completely jammed (maybe
    it's stopped getting interrupts).
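
    The four outcomes above boil down to a small decision table. Just to
    make it concrete, here's a minimal C sketch of that logic (the
    function and its arguments are invented for illustration; the two
    inputs come from your manual ifconfig and broadcast-ping checks, not
    from any real VxWorks API):

```c
#include <assert.h>
#include <string.h>

/* Classify the hang from the two observations described above:
 * whether the ifconfig RX counters advanced between two samples, and
 * whether the broadcast pings showed up in Wireshark. Illustrative
 * only -- these names are made up, not part of any VxWorks API. */
const char *classify_hang(int rx_counters_advanced, int broadcasts_seen)
{
    if (broadcasts_seen && !rx_counters_advanced)
        return "TX path OK, receiver stalled";
    if (!broadcasts_seen && rx_counters_advanced)
        return "RX path OK, transmitter stalled";
    if (!broadcasts_seen && !rx_counters_advanced)
        return "interface completely jammed (interrupts?)";
    return "both paths look alive -- suspect the TCP layer instead";
}
```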

    - The board should have at least two ethernet ports (three, if Curtiss-
    Wright wired up all 3 MACs). As a test, I would enable a second port
    on the target and cable it to another machine. For example, add a
    second NIC to your Windows XP host and connect it to the other port on
    the target via crossover cable. (Don't cheat by plugging everything
    into the same hub/switch/whatever -- ideally, you want the two ports
    to be isolated.) Give the second link some dummy IP addresses, like
    10.0.0.2 (target) and 10.0.0.1 (Windows XP host). You should be able
    to ping the target from your Windows host over both the spare link
    (ping 10.0.0.2) and the primary link. Now start your LabView app going
    and wait until it fails again. Once it fails, try to ping the target
    over both links. If pinging the primary IP address fails, but pinging
    the spare IP succeeds, then the problem is almost certainly a driver
    bug which has caused the primary interface to become stalled somehow.
    (That is, the heavily loaded interface encountered a race condition or
    some other error condition from which it did not recover and is now
    wedged, while the unloaded spare link is still functional.) If both
    links fail to respond to ping, then the problem is more likely a stack
    issue. (It could still be a driver issue: each time you send a packet,
    the driver takes temporary ownership of it until the TX DMA operation
    completes -- if the driver sets up a large TX DMA ring and the
    transmitter stalls while it still has ownership of a lot of the
    stack's TX buffers, this could prevent the stack from being able to
    transmit packets entirely.)
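
    To make the TX DMA ring point concrete, here's a toy C model (all
    names invented; a real END driver is far more involved). The driver
    hands a descriptor to the hardware by setting an ownership flag and
    normally reclaims it on the transmit-complete interrupt; if the
    transmitter wedges and never completes, every slot stays HW-owned
    and the enqueue path can only fail:

```c
#include <assert.h>

#define RING_SIZE 4             /* tiny ring for illustration */

struct tx_slot { int owned_by_hw; };

struct tx_ring {
    struct tx_slot slot[RING_SIZE];
    int head;                   /* next slot the driver will use */
};

/* Try to queue one packet; returns 0 on success, -1 if no descriptor
 * is free. With a wedged transmitter nothing ever clears owned_by_hw,
 * so after RING_SIZE successes every further call fails and the stack
 * can no longer transmit at all. */
int tx_enqueue(struct tx_ring *r)
{
    struct tx_slot *s = &r->slot[r->head];
    if (s->owned_by_hw)
        return -1;              /* ring full: stack's send path stalls */
    s->owned_by_hw = 1;         /* hand the buffer to the (dead) HW */
    r->head = (r->head + 1) % RING_SIZE;
    return 0;
}
```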

    I have found that many vendor supplied drivers follow a pattern. The
    companies that make the networking silicon also create an OS-
    independent hardware abstraction library for managing the controller.
    To make a driver, they port the HAL to the target OS API, and then add
    a driver shim over the top of it. This is considered to be more
    effective from their perspective since it means that they can support
    several OSes with the same piece of core library code. If they find a
    bug specifically related to their ethernet hardware, they can then
    just patch the HAL code once and fix the bug in all their drivers at
    the same time (once they've tested on one OS, they can just recompile
    the drivers for all the others to pick up the fix).

    This sounds like a great idea, but there are drawbacks. Writing truly
    portable code is hard: often the HAL ends up polluted with many
    spaghetti #ifdefs to deal with platform differences, which complicates
    maintenance. Also, the object code can end up being very large (and
    some of it might be dead code that doesn't even apply to your
    platform). If you're targeting Windows or UNIX, you might not care,
    but with VxWorks, small footprint is key. And VxWorks has some special
    requirements compared to other OSes: sometimes "portable" designs fail
    to take those differences into account.

    From an OS developer's perspective, the best driver is one that's
    small, easy to read, easy to maintain and which makes the best use of
    available OS (and network stack) facilities.

    Anyway, the point is that I wouldn't be surprised if the driver for
    the Discovery III ethernet has a bug lurking in it somewhere. I also
    bet a quarter that they didn't provide the source code for it either. :(

    -Bill

  7. Re: Degradation of TCP connection

    On Jul 25, 4:40 pm, noiset...@gmail.com wrote:
    > [full quote of the previous posts snipped]


    Bill,

    I continue to appreciate your knowledgeable assistance on this
    problem! You have been so helpful. To address the points you brought
    up:

    > I'm pretty sure the network driver you have was
    > not written by Wind River, though I don't know if it was done by
    > Curtiss-Wright or Marvell. If I had to bet, I'd say that at least some
    > of the code was provided by Marvell.


    You're absolutely right. To quote one of our contacts at Curtiss-
    Wright,

    "Each of our boards will have an ethernet driver specific for the
    chip and the board. For the 124 board, the driver would have been
    based on code from Marvell (the manufacturer of the Discovery III
    bridge), and as updated by Curtiss Wright Controls Embedded
    Computing / DY-4 Systems."

    So that answers that.

    > - When your application and LabView stop communicating, can you still
    > ping the target from the Windows XP machine? If yes, the ethernet
    > driver and the IP layer of the stack are still working, at least to
    > some extent, and it's something at the TCP layer that's gone wrong. If
    > no, then it could be the driver, or a serious problem in the stack.


    We have not tried this yet, but it's on our (now much longer) list of
    things to try once the problem comes up again.

    > - You say that it looks like the VxWorks target stops receiving
    > traffic from LabView. How did you determine this? I usually check for
    > receive operation by adding the target shell and the INCLUDE_IFCONFIG
    > component.


    We had Wireshark running on a separate machine that was watching all
    the traffic on the network. Each time this anomaly happens, it starts
    when the VxWorks box stops ACKing packets sent from the Windows box.
    To be precise, the ACK number on the "VxWorks --> Windows" packets
    stops increasing. Soon, the Windows box, noticing that the VxWorks box
    is reporting the same ACK number, begins retransmitting packets.
    However, the VxWorks box still does not increment its ACK number. At
    the same time, the VxWorks box begins retransmitting data to the
    Windows box, as though it didn't hear the incrementing ACKs that the
    Windows box was sending to VxWorks.

    In short, we deduced it from a bunch of Wireshark data. Do you think
    this is a valid conclusion?
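
    To spell out the check we effectively did by eye on the capture,
    here's a C sketch (the function is our own invention; the ACK
    numbers would be extracted from the Wireshark capture by hand or by
    a script, not from any real API):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Length of the trailing run of identical ACK values in a capture.
 * A long run while the peer keeps sending fresh data is exactly the
 * "stopped hearing from LabView" signature: the target keeps ACKing
 * the same sequence number over and over. Illustrative sketch only. */
size_t trailing_stuck_acks(const uint32_t *acks, size_t n)
{
    if (n == 0)
        return 0;
    size_t run = 1;
    while (run < n && acks[n - 1 - run] == acks[n - 1])
        run++;
    return run;
}
```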

    We feel much more prepared for the next anomaly, though. We have
    enabled ifconfig() in our kernel and are running on the bench with our
    fingers crossed.

    Per your several suggestions, we'll try
    1. Pinging the VxWorks box from another machine on the network
    2. Pinging xxx.xxx.xxx.255 from the VxWorks box and seeing what
    happens in Wireshark
    3. Calling ifconfig() to see what the "Rx packets" and "Tx packets"
    counters are doing.

    > - The board should have at least two ethernet ports (three, if Curtiss-
    > Wright wired up all 3 MACs). As a test, I would enable a second port
    > on the target and cable it to another machine. For example,

    [snip]

    You are correct, Curtiss-Wright did wire up another NIC, but
    unfortunately we've had a problem enabling it. Another group of my
    coworkers is tackling that problem. If they get it working we'll try
    setting up another machine on a two-node network, the way you
    suggested.

    Thanks again for all your help. We've been in contact with Curtiss-
    Wright support and Wind River support, but this thread has provided us
    with the most help so far. In fact, our Wind River support contact
    provided a cornucopia of advice, almost all of which he copied from
    your last post. He omitted the phrases that would point to a fault of
    Wind River's, like your "a serious problem in the stack." Blah.

    Warm regards,
    Justin

  8. Re: Degradation of TCP connection

    Bill,

    I continue to appreciate your knowledgeable assistance on this
    problem! You have been so helpful. To address the points you brought
    up:

    > I'm pretty sure the network driver you have was
    > not written by Wind River, though I don't know if it was done by
    > Curtiss-Wright or Marvell. If I had to bet, I'd say that at least some
    > of the code was provided by Marvell.


    You're absolutely right. To quote one of our contacts at Curtiss-
    Wright,

    "Each of our boards will have an ethernet driver specific
    for the chip and the board. For the 124 board, the driver would have
    been based on code from Marvell (the manufacturer of the Discovery III
    bridge), and as updated by Curtiss Wright Controls Embedded Computing/
    Dy
    4 Systems."

    So that answers that.

    > - When your application and LabView stop communicating, can you still
    > ping the target from the Windows XP machine? If yes, the ethernet
    > driver and the IP layer of the stack are still working, at least to
    > some extent, and it's something at the TCP layer that's gone wrong. If
    > no, then it could be the driver, or a serious problem in the stack.


    We have not tried this yet, but it's on our (now much longer) list of
    things to try once the problem comes up again.

    > - You say that it looks like the VxWorks target stops receiving
    > traffic from LabView. How did you determine this? I usually check for
    > receive operation by adding the target shell and the INCLUDE_IFCONFIG
    > component.


    We had Wireshark running on a separate machine that was watching all
    the traffic on the network. Each time this anomaly happens, it starts
    when the VxWorks box stops ACKing packets sent from the Windows box.
    To be precise, the ACK number on the "VxWorks --> Windows" packets
    stops increasing. Soon, the Windows box, noticing that the VxWorks box
    is reporting the same ACK number, begins retransmitting packets.
    However, the VxWorks box still does not increment its ACK number. At
    the same time, the VxWorks box begins retransmitting data to the
    Windows box, as though it didn't hear the incrementing ACKs that the
    Windows box was sending to VxWorks.

    In short, we deduced it from a bunch of Wireshark data. Do you think
    this is a valid conclusion?

    We feel much more prepared for the next anomaly, though. We have
    enabled ifconfig() in our kernel and are running on the bench with our
    fingers crossed.

    Per your several suggestions, we'll try
    1. Pinging the VxWorks box from another machine on the network
    2. Pinging xxx.xxx.xxx.255 from the VxWorks box and seeing what
    happens in Wireshark
    3. Calling ifconfig() to see what the "Rx packets" and "Tx packets"
    counters are doing.

    > - The board should have at least two ethernet ports (three, if Curtiss-
    > Wright wired up all 3 MACs). As a test, I would enable a second port
    > on the target and cable it to another machine. For example,


    [snip]

    You are correct, Curtiss-Wright did wire up another NIC, but
    unfortunately we've had a problem enabling it. Another group of my
    coworkers is tackling that problem. If they get it working we'll try
    setting up another machine on a two-node network, the way you
    suggested.

    Thanks again for all your help. We've been in contact with Curtiss-
    Wright support and Wind River support, but this thread has provided us
    with the most help so far. In fact, our Wind River support contact
    provided a cornucopia of advice, almost all of which he copied from
    your last post. He omitted to include phrases which would point to a
    fault of Wind River's, like your phrase "a serious problem with the
    stack." Blah.

    Warm regards,
    Justin

  9. Re: Degradation of TCP connection

    Oh, and I just remembered another piece of the puzzle: The VxWorks
    machine is also exchanging data with another box on the network over
    UDP. We have timers in the VxWorks app that make it panic if it stops
    receiving UDP packets. It appears that during each of these anomalies,
    the VxWorks box continues to receive UDP packets just fine. That is,
    it appears as though it stops hearing from the TCP stream, but
    continues to receive UDP packets as normal.

    The two main suspects in this case are the VxWorks network stack and
    the ethernet driver on our single-board computer. Does this new data
    point to one over another?

    Thanks,
    Justin

  10. Re: Degradation of TCP connection

    On Aug 5, 4:07 pm, justin.pear...@gmail.com wrote:
    > Oh, and I just remembered another piece of the puzzle: The VxWorks
    > machine is also exchanging data with another box on the network over
    > UDP. We have timers in the VxWorks app that make it panic if it stops
    > receiving UDP packets. It appears that during each of these anomalies,
    > the VxWorks box continues to receive UDP packets just fine. That is,
    > it appears as though it stops hearing from the TCP stream, but
    > continues to receive UDP packets as normal.
    >
    > The two main suspects in this case are the VxWorks network stack and
    > the ethernet driver on our single-board computer. Does this new data
    > point to one over another?


    (You don't explain how you implement this "panic if the traffic stops"
    behavior. I'm assuming you're using a VxWorks watchdog timer for this,
    and that when the watchdog fires, it triggers another task to reset
    the target.)

    In my opinion, it tends to point to some problem with the TX code in
    the driver.

    You say the target is exchanging data with another box. This implies
    that there should be both TX and RX activity. However, you also say
    that the app is designed to panic only if it stops receiving data. In
    your previous post you said you were using Wireshark to monitor the
    traffic from the target and noted that the TCP transmissions from the
    target had ceased, but you didn't say if the UDP traffic sent by this
    application had stopped as well.

    Again, the right way to check for continued RX activity is with the
    ifconfig() utility. However, assuming that the app is actually still
    receiving UDP traffic (as opposed to having gotten blocked in a call
    into the stack), then it means the RX path in the ethernet driver and
    the stack are still nominally functional, and the TX path in the
    driver has gotten wedged.

    By the way, I have a simple and cheap stress test diagnostic I use to
    test drivers in VxWorks. It's not as good as using a dedicated traffic
    tester, but it can sometimes reveal interesting problems.

    There's a very simple utility called TTCP. The UNIX version can be
    downloaded from here:

    ftp://ftp.sgi.com/src/sgi/ttcp

    The Windows version can be downloaded from here:

    http://www.pcausa.com/Utilities/pcattcp.htm

    (I typically use the UNIX version, but it sounds like you're a Windows
    shop.)

    What I like to do is use this utility to bombard the target with small
    UDP packets as follows:

    % ttcp -s -u -l22 -n1000000 -t

    The options are:

    -s: source a bunch of garbage data
    -u: use UDP instead of TCP
    -l22: use a UDP payload length of 22 bytes (should result in a
    60-byte ethernet frame)
    -n1000000: the number of UDP datagrams to send (a lot)
    -t: transmit (as opposed to -r, receive)

    I'm pretty sure the Windows version supports the same options as the
    UNIX one.

    This exercises both the RX and TX path of the network driver, and
    some parts of the stack. By default, ttcp uses port 5000 to send
    traffic. There normally isn't any application running on the VxWorks
    target that's listening on this port, so when it receives a UDP
    datagram for this port number, the stack responds by sending an ICMP
    port unreachable message. If you send it a lot of datagrams, it will
    respond with a lot of messages. This will force traffic through the
    UDP receive path and the ICMP output path in the stack.

    I find that this generates a bit more traffic than using a flood ping,
    and it's helped me expose bugs in several VxWorks ethernet drivers in
    the past. Using the FreeBSD host in my office, I can generate
    something on the order of 200,000 frames/second on my gigabit ethernet
    interface. The types of failure modes you might see are:

    - exception in tNetTask (or possibly ipnetd using the new IPNET stack
    in 6.5)
    - exception in interrupt context (buggy interrupt service routine?)
    - RX stall (possibly due to mishandled RX overrun in the driver)
    - TX stall (possibly due to incorrectly implemented TX cleanup
    handling, or mishandled TX underrun)
    - RX and TX stall (possibly due to interrupts getting masked off and
    not re-enabled, or driver state getting hosed due to a race
    condition)
    - sluggish response on target shell (possibly due to driver doing too
    much work in interrupt context, or making excessive use of intLock()/
    intUnlock())

    If you notice any of these (especially an exception) then you've found
    a driver bug.

    -Bill

    > Thanks,
    > Justin



  11. Re: Degradation of TCP connection

    On Aug 5, 3:01 pm, justin.pear...@gmail.com wrote:
    > Bill,
    >
    > I continue to appreciate your knowledgeable assistance on this
    > problem! You have been so helpful. To address the points you brought
    > up:
    >
    > > I'm pretty sure the network driver you have was
    > > not written by Wind River, though I don't know if it was done by
    > > Curtiss-Wright or Marvell. If I had to bet, I'd say that at least some
    > > of the code was provided by Marvell.

    >
    > You're absolutely right. To quote one of our contacts at Curtiss-
    > Wright,
    >
    > "Each of our boards will have an ethernet driver specific
    > for the chip and the board. For the 124 board, the driver would have
    > been based on code from Marvell (the manufacturer of the Discovery III
    > bridge), and as updated by Curtiss Wright Controls Embedded Computing/
    > Dy
    > 4 Systems."
    >
    > So that answers that.
    >
    > > - When your application and LabView stop communicating, can you still
    > > ping the target from the Windows XP machine? If yes, the ethernet
    > > driver and the IP layer of the stack are still working, at least to
    > > some extent, and it's something at the TCP layer that's gone wrong. If
    > > no, then it could be the driver, or a serious problem in the stack.

    >
    > We have not tried this yet, but it's on our (now much longer) list of
    > things to try once the problem comes up again.


    My philosophy is: once you get a target into a failed state, gather as
    much data as you can from it before you reset it. Sometimes that's not
    a lot, but simple things like ping can often provide helpful clues.

    > > - You say that it looks like the VxWorks target stops receiving
    > > traffic from LabView. How did you determine this? I usually check for
    > > receive operation by adding the target shell and the INCLUDE_IFCONFIG
    > > component.

    >
    > We had Wireshark running on a separate machine that was watching all
    > the traffic on the network. Each time this anomaly happens, it starts
    > when the VxWorks box stops ACKing packets sent from the Windows box.
    > To be precise, the ACK number on the "VxWorks --> Windows" packets
    > stops increasing. Soon, the Windows box, noticing that the VxWorks box
    > is reporting the same ACK number, begins retransmitting packets.
    > However, the VxWorks box still does not increment its ACK number. At
    > the same time, the VxWorks box begins retransmitting data to the
    > Windows box, as though it didn't hear the incrementing ACKs that the
    > Windows box was sending to VxWorks.
    >
    > In short, we deduced it from a bunch of Wireshark data. Do you think
    > this is a valid conclusion?


    Oh. That's interesting. Okay, so if I understand you correctly, the
    target _is_ able to transmit packets when the problem occurs. This
    definitely points to a failure in the RX path somewhere. The fact that
    it's continuing to send TCP segments occasionally means that the
    stack's TCP timers are still firing, and that the driver can transmit
    frames onto the wire. If it stops receiving packets, the stack will
    think the peer hasn't acknowledged the current segment yet and will
    keep retransmitting it. (There are sometimes oddball cases where the
    stack on one side or the other becomes desynchronized, which just
    botches a single TCP stream while other traffic continues to flow
    normally. These are rare though, and it doesn't sound like you're
    doing anything that would trigger such a condition. In any case, this
    is why I asked if you could ping the target once the anomaly occurred.
    My suspicion at this point is that you won't be able to.) Not being
    able to receive packets could mean a couple of things:

    - The RX state in the driver may have fallen out of sync with the chip
    - The receiver encountered an error from which the driver couldn't
    recover
    - RX interrupts have stopped firing
    - _all_ interrupts for that port have stopped firing (if TX interrupts
    have also stopped, the driver may still be able to send packets onto
    the wire for a short time)

    > We feel much more prepared for the next anomaly, though. We have
    > enabled ifconfig() in our kernel and are running on the bench with our
    > fingers crossed.
    >
    > Per your several suggestions, we'll try
    > 1. Pinging the VxWorks box from another machine on the network
    > 2. Pinging xxx.xxx.xxx.255 from the VxWorks box and seeing what
    > happens in Wireshark
    > 3. Calling ifconfig() to see what the "Rx packets" and "Tx packets"
    > counters are doing.


    Good. I'm curious to see the result. (And hopefully adding the
    additional components won't just make the problem disappear.)

    > > - The board should have at least two ethernet ports (three, if Curtiss-
    > > Wright wired up all 3 MACs). As a test, I would enable a second port
    > > on the target and cable it to another machine. For example,

    >
    > [snip]
    >
    > You are correct, Curtiss-Wright did wire up another NIC, but
    > unfortunately we've had a problem enabling it. Another group of my
    > coworkers is tackling that problem. If they get it working we'll try
    > setting up another machine on a two-node network, the way you
    > suggested.


    If the second interface shows up when you do "muxShow()" then you
    might just be able to do this:

    -> ipcom_drv_eth_init "nameofdriver", 1, 0
    -> ifconfig "nameofdriver1 10.0.0.1 netmask 255.255.255.0 up"

    If it doesn't show up in muxShow(), then it probably needs to be
    enabled in the BSP somewhere.

    > Thanks again for all your help. We've been in contact with Curtiss-
    > Wright support and Wind River support, but this thread has provided us
    > with the most help so far. In fact, our Wind River support contact
    > provided a cornucopia of advice, almost all of which he copied from
    > your last post. He omitted to include phrases which would point to a
    > fault of Wind River's, like your phrase "a serious problem with the
    > stack." Blah.


    That's politics for you.

    -Bill

    >
    > Warm regards,
    > Justin



  12. Re: Degradation of TCP connection

    Bill,

    Thanks again for your swift and knowledgeable responses. I've
    downloaded pcattcp from here

    http://www.pcausa.com/Utilities/ttcpdown1.htm

    and tried it out. It appears as though I can blast my target with
    zillions of packets and it continues, on the surface, to chug along
    happily. The Windows machine receives all the frames it expects in a
    timely manner, and running ifconfig() on the target does indeed show
    the huge number of dropped Rx packets, as we expected. Does this mean
    that none of the exceptions you mentioned, e.g.

    - exception in tNetTask (or possibly ipnetd using the new IPNET stack
    in 6.5)
    - exception in interrupt context (buggy interrupt service routine?)
    - RX stall (possibly due to mishandled RX overrun in the driver)
    - TX stall (possibly due to incorrectly implemented TX cleanup
    handling, or mishandled TX underrun)
    - RX and TX stall (possibly due to interrupts getting masked off and
    not re-enabled, or driver state getting hosed due to a race
    condition)
    - sluggish response on target shell (possibly due to driver doing too
    much work in interrupt context, or making excessive use of intLock()/
    intUnlock())

    are occurring?

    Also, I've got a tool called Colasoft Capsa Packet Builder, which lets
    you construct and edit packets, then send them out. If we're thinking
    that the driver or network stack might barf if it gets crummy packets,
    I could maybe construct some malformed packets with this program and
    ship them out on the network. Do you think this would be a fruitful
    approach?

    You mentioned some of the causes of being unable to receive packets:

    - The RX state in the driver may have fallen out of sync with the chip
    - The receiver encountered an error from which the driver couldn't
    recover
    - RX interrupts have stopped firing
    - _all_ interrupts for that port have stopped firing (if TX interrupts
    have also stopped, the driver may still be able to send packets onto
    the wire for a short time)

    Can you please suggest some ways I can test for these cases? My
    knowledge of this level of detail of driver/OS architecture is pretty
    sparse...!

    Lastly, thanks for the heads up with muxShow(). The second NIC didn't
    show up, and I remember one of my coworkers mentioning having to
    enable it in the BSP.

    Thanks again for your time and attention,
    Justin

  13. Re: Degradation of TCP connection

    On Aug 6, 6:49 pm, justin.pear...@gmail.com wrote:
    > Bill,
    >
    > Thanks again for your swift and knowledgeable responses. I've
    > downloaded pcattcp from here
    >
    > http://www.pcausa.com/Utilities/ttcpdown1.htm
    >
    > and tried it out. It appears as though I can blast my target with
    > zillions of packets and it continues, on the surface, to chug along
    > happily. The Windows machine receives all the frames it expects in a
    > timely manner, and running ifconfig() on the target does indeed show
    > the huge number of dropped Rx packets, as we expected. Does this mean
    > that none of the exceptions you mentioned, e.g.
    >
    > - exception in tNetTask (or possibly ipnetd using the new IPNET stack
    > in 6.5)
    > - exception in interrupt context (buggy interrupt service routine?)
    > - RX stall (possibly due to mishandled RX overrun in the driver)
    > - TX stall (possibly due to incorrectly implemented TX cleanup
    > handling, or mishandled TX underrun)
    > - RX and TX stall (possibly due to interrupts getting masked off and
    > not re-enabled, or driver state getting hosed due to a race
    > condition)
    > - sluggish response on target shell (possibly due to driver doing too
    > much work in interrupt context, or making excessive use of intLock()/
    > intUnlock())
    >
    > are occurring?


    All but the last one: you didn't say if the shell became slow to
    respond while ttcp was blasting the target with traffic.

    > Also, I've got a tool called Colasoft Capsa Packet Builder, which lets
    > you construct and edit packets, then send them out. If we're thinking
    > that the driver or network stack might barf if it gets crummy packets,
    > I could maybe construct some malformed packets with this program and
    > ship them out on the network. Do you think this would be a fruitful
    > approach?


    I would hold off on this for now. I always tell people I can only
    handle one catastrophe at a time. You have one failure scenario
    involving your LabView app: focus on that problem rather than trying
    too hard to provoke others.

    > You mentioned some of the causes of being unable to receive packets:
    >
    > - The RX state in the driver may have fallen out of sync with the chip
    > - The receiver encountered an error from which the driver couldn't
    > recover
    > - RX interrupts have stopped firing
    > - _all_ interrupts for that port have stopped firing (if TX interrupts
    > have also stopped, the driver may still be able to send packets onto
    > the wire for a short time)
    >
    > Can you please suggest some ways I can test for these cases? My
    > knowledge of this level of detail of driver/OS architecture is pretty
    > sparse...!


    I wouldn't worry about this just yet. What you really want to do is to
    wait for the LabView app to fail again and collect some more data like
    I suggested previously. Once you have that data, _then_ you can decide
    what to look at next. (This is why I asked if you'd tried to ping the
    target once the LabView app stopped working; by using Wireshark you
    analyzed the network, but you didn't really do anything to analyze the
    target. All you really know about the target is "it stops working."
    You need to know more.)

    -Bill

    > Lastly, thanks for the heads up with muxShow(). The second NIC didn't
    > show up, and I remember one of my coworkers mentioning having to
    > enable it in the BSP.
    >
    > Thanks again for your time and attention,
    > Justin



  14. Re: Degradation of TCP connection

    On Tue, 5 Aug 2008 16:07:34 -0700 (PDT), justin.pearson@gmail.com
    wrote:

    >Oh, and I just remembered another piece of the puzzle: The VxWorks
    >machine is also exchanging data with another box on the network over
    >UDP. We have timers in the VxWorks app that make it panic if it stops
    >receiving UDP packets. It appears that during each of these anomalies,
    >the VxWorks box continues to receive UDP packets just fine. That is,
    >it appears as though it stops hearing from the TCP stream, but
    >continues to receive UDP packets as normal.


    Perhaps your ARP cache has become corrupt. I had a system which after
    about 26 days of continuous connection would respond to ping but not
    to telnet; it turned out that the ARP cache had become corrupted by a
    nanosecond timer overflow. The mechanism of corruption is probably
    not timer-related in your case but the end result seems similar. Can
    you devise ARP diagnostics that can run periodically on the sending
    device, both before and after the TCP fail?

    Regards

    James Cunnane

  15. Re: Degradation of TCP connection

    > All but the last one: you didn't say if the shell became slow to
    > respond while ttcp was blasting the target with traffic.


    Thanks for reminding me. The shell continued at its normal pace and
    showed no signs of slowing.

    > What you really want to do is to
    > wait for the LabView app to fail again and collect some more data like
    > I suggested previously. Once you have that data, _then_ you can decide
    > what to look at next. (This is why I asked if you'd tried to ping the
    > target once the LabView app stopped working; by using Wireshark you
    > analyzed the network, but you didn't really do anything to analyze the
    > target. All you really know about the target is "it stops working."
    > You need to know more.)


    Understood. Thanks again for your help. I'll make sure to keep you
    posted as we learn more.

    Regards,

    -Justin

  16. Re: Degradation of TCP connection

    On Aug 7, 3:13 am, James Cunnane
    wrote:
    > On Tue, 5 Aug 2008 16:07:34 -0700 (PDT), justin.pear...@gmail.com
    > wrote:
    >
    > >Oh, and I just remembered another piece of the puzzle: The VxWorks
    > >machine is also exchanging data with another box on the network over
    > >UDP. We have timers in the VxWorks app that make it panic if it stops
    > >receiving UDP packets. It appears that during each of these anomalies,
    > >the VxWorks box continues to receive UDP packets just fine. That is,
    > >it appears as though it stops hearing from the TCP stream, but
    > >continues to receive UDP packets as normal.

    >
    > Perhaps your ARP cache has become corrupt. I had a system which after
    > about 26 days of continuous connection would respond to ping but not
    > to telnet; it turned out that the ARP cache had become corrupted by a
    > nanosecond timer overflow. The mechanism of corruption is probably
    > not timer-related in your case but the end result seems similar. Can
    > you devise ARP diagnostics that can run periodically on the sending
    > device, both before and after the TCP fail?


    Hmm... In your case you said the system would respond to ping, but
    not telnet. It's hard to classify that as a problem with the ARP
    cache, _if_ you tried to ping the target from the same host that you
    also tried to telnet to it from. If you can ping target A from host
    B, then ARP resolution between A and B is working (or at least, the
    ARP entries haven't timed out yet). Ping (ICMP over IP) and telnet
    (TCP over IP) both rely on ARP, so if it worked for one, it should
    have worked for the other.

    However, if you tried to ping target A from host B, and that worked,
    but trying to telnet to target A from host C did not work, that could
    be an ARP problem. (The target still had an unexpired ARP entry for
    host B, but was unable to perform ARP resolution for the previously
    unknown host C.)

    In Justin's case, he said once his app got into its error state, he
    could see the target still sending TCP segments to his Windows host
    using Wireshark (but not responding to ACKs from the Windows host).
    This implies the target's ARP entry for the Windows host was still
    valid (otherwise it would have started sending ARP "who has" requests
    instead).

    -Bill

    > Regards
    >
    > James Cunnane


