tNetTask suspends - VxWorks

  1. tNetTask suspends

    Hi,

    I'm getting the following message on a DY4 card:

    "sn: Fatal error. Receive structure invalid."

    Once it appears, the netTask is suspended and the only way out is a
    reboot.

    Can anyone help remedy this?

    Thanks,

    -TomM

  2. Re: tNetTask suspends

    On Mar 18, 1:06 pm, tom.mccaffe...@boeing.com wrote:
    > Hi,
    >
    > I'm getting the following message on a DY4 card:
    >
    > "sn: Fatal error. Receive structure invalid."
    >
    > Once it appears, the netTask is suspended and the only way out is a
    > reboot.
    >
    > Can anyone help remedy this?


    Well, uhm, no, because you provided almost no details. For example,
    what version of VxWorks are you using? What is a DY4 card? What kind
    of processor does it have on it? (ARM? PPC? Coldfire? MIPS? x86?)
    What driver are you using? ("sn" is not much to go on.) Is it one
    that's shipped by Wind River? Is it one that you wrote? Is it from a
    3rd party? Is it ethernet? Some kind of serial line interface? Shared
    memory?

    One assumes the problem is that the driver (whatever it is) is
    encountering some sort of error condition while receiving data, and
    the error handling is inadequate. That is, rather than just
    discarding the bad data and continuing, it just crashes or hangs. And
    since the driver's receive handler runs in the context of tNetTask,
    that means it crashes or hangs too.

    As to how to remedy it, there's no way to know without more
    information.

    -Bill




  3. Re: tNetTask suspends

    Hi Tom,

    I believe this error comes from the SONIC driver, and the problem
    could be a malformed packet header.

    Best Regards
    VKG | Ritsoft Technologies


  4. Re: tNetTask suspends

    On Mar 18, 4:13 pm, noiset...@gmail.com wrote:
    > As to how to remedy it, there's no way to know without more
    > information.

    Hi Bill,

    Sorry for the abbreviated information; here is information that will
    hopefully help:

    The card is a SVME/DMV-177 single board computer manufactured by DY 4
    Systems in Ontario, Canada (acquired by Curtiss-Wright Controls in
    2004). The processor is a PPC 603e, running at 80 MHz. The error
    appears with increased Ethernet activity (TCP/IP). The Ethernet
    interface is controlled by a National Semiconductor DP83932B SONIC
    chip.

    The error text was found in libPPC603gnuvx.a. sysLib.c was developed
    by DY 4.

    VxWorks (for DY 4 VME-176/177) version 5.3.1; Kernel: WIND version
    2.5.

    Thanks,

    -TomM



  5. Re: tNetTask suspends

    On Mar 19, 7:33 am, tom.mccaffe...@boeing.com wrote:
    > The error text was found in libPPC603gnuvx.a. sysLib.c was developed
    > by DY 4.
    >
    > VxWorks (for DY 4 VME-176/177) version 5.3.1; Kernel: WIND version
    > 2.5.



    Ah, ok.

    Something told me I should have recognized the "sn" driver name, but
    now I realize why I didn't: VxWorks 5.3.1 is pretty old, and the if_sn
    driver is one of the BSD-style netif drivers, which have been
    deprecated since 5.5. There doesn't appear to be an END driver to
    replace it.

    Anyway, I dug around a bit, and it looks like I was right: the problem
    is in fact that the driver's error handling is very poor, and it calls
    taskSuspend(0) when it encounters a problem receiving a packet. (Oddly
    enough, it also tries to do a 'return' immediately following the
    taskSuspend().) It looks like the error can be triggered by a number
    of things, including the frame being too small or too large, or if the
    PRX bit is not set in the RX DMA descriptor (which indicates there was
    an error on receive, such as a CRC error or frame alignment error).
    What the driver really should be doing is discarding the bad frame and
    moving on to the next one (incrementing the RX error count along the
    way), not suspending tNetTask.
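
    To make the "discard and move on" idea concrete, here is a minimal
    sketch in plain C. It is not the if_sn source: the struct layout, the
    RX_STATUS_OK bit (standing in for PRX), and the length limits are all
    assumptions made up for illustration, and a real driver would hand
    good frames to the stack where this just counts them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from the if_sn
 * source or the SONIC datasheet. */
#define RX_STATUS_OK   0x0001u  /* stand-in for the "received OK" (PRX) bit */
#define FRAME_MIN_LEN  64u      /* assumed minimum frame length, FCS included */
#define FRAME_MAX_LEN  1518u    /* assumed maximum frame length, FCS included */

/* Simplified RX descriptor: just the fields the check needs. */
struct rx_desc {
    uint16_t status;
    uint16_t byte_count;
};

/* Counters a driver would normally keep in its interface statistics. */
struct rx_stats {
    unsigned long good;
    unsigned long errors;
};

/* Return 1 if the frame looks usable, 0 if it should be dropped. */
int rx_frame_ok(const struct rx_desc *d)
{
    if (!(d->status & RX_STATUS_OK))
        return 0;                       /* CRC, alignment or other RX error */
    if (d->byte_count < FRAME_MIN_LEN || d->byte_count > FRAME_MAX_LEN)
        return 0;                       /* runt or oversized frame */
    return 1;
}

/* Walk n descriptors. Bad frames are counted and skipped; nothing in
 * here ever suspends the calling task. */
void rx_process(const struct rx_desc *ring, size_t n, struct rx_stats *st)
{
    assert(ring != NULL && st != NULL);
    for (size_t i = 0; i < n; i++) {
        if (rx_frame_ok(&ring[i]))
            st->good++;     /* a real driver would pass the buffer up here */
        else
            st->errors++;   /* drop the frame, bump the error count, go on */
    }
}
```

    The point of the sketch is only the control flow: a bad descriptor
    costs one increment of the error counter, and the loop continues with
    the next frame.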

    The short answer here is that this driver looks buggy, probably in a
    number of ways. It was probably not tested under heavy load (which is
    a chronic problem). During periods of heavy network activity, a couple
    of things can happen: you run out of RX descriptors (or overrun the RX
    FIFO), or you end up with bad packets (runts, CRC errors, etc...). The
    driver should be written to handle these conditions and recover from
    them, but it isn't.

    There's another problem though, which is that judging by the
    documentation for the SONIC controller (which is still available on
    National Semiconductor's site), you have to be very careful how you
    handle the RX descriptor buffer management. The SONIC looks like it
    uses what I call the 'single synchronization point' model, where the
    RX DMA engine expects a linked list of descriptors, where the last one
    has an 'end of list' bit set to mark where the list terminates. The
    end of list bit is used by the chip to figure out when it's reached
    the end of the list: when it reaches the end of list, the RX DMA
    engine pauses. The driver is supposed to process pending RX
    descriptors and when -- and only when -- it has also reached the end
    of list, it can resume the DMA channel. The problem with this of
    course is that you don't normally want the DMA channel to pause if you
    can avoid it. In this case though, you really can't avoid it, but some
    driver developers think they can, so they try to cheat their way
    around the problem: each time they receive a packet, they move the EOL
    bit to the next descriptor. (This is a little like tying a stick with
    a carrot on the end of it to the head of a donkey.) You can't do this
    though, because it creates a race condition: there's no way to be
    certain that the ethernet chip won't attempt to consume the next
    descriptor while you're trying to update it (unless of course you stall
    the RX DMA channel yourself first).
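
    Here is a toy model of the scheme described above, in plain C. It
    does not touch real SONIC registers, and every name in it (rx_ring,
    hw_receive, drv_poll and so on) is hypothetical. The "hardware" half
    fills descriptors and pauses when it hits the end-of-list marker; the
    "driver" half only rebuilds the list once it has consumed the EOL
    descriptor itself, which is the one point where the engine is known
    to be stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4

/* Simplified descriptor: real ones also carry buffer pointers etc. */
struct rx_desc {
    bool done;  /* set by the simulated hardware when a frame lands */
    bool eol;   /* end-of-list marker */
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    size_t hw_idx;      /* next descriptor the hardware will fill */
    size_t drv_idx;     /* next descriptor the driver will examine */
    bool dma_paused;    /* true once the hardware has reached EOL */
};

/* Build a fresh list with EOL on the last descriptor and un-pause DMA. */
void ring_init(struct rx_ring *r)
{
    assert(r != NULL);
    for (size_t i = 0; i < RING_SIZE; i++) {
        r->desc[i].done = false;
        r->desc[i].eol = (i == RING_SIZE - 1);
    }
    r->hw_idx = 0;
    r->drv_idx = 0;
    r->dma_paused = false;
}

/* Simulated hardware: receive one frame. Returns false if the frame is
 * lost because the RX DMA engine is paused. */
bool hw_receive(struct rx_ring *r)
{
    if (r->dma_paused)
        return false;
    struct rx_desc *d = &r->desc[r->hw_idx];
    d->done = true;
    if (d->eol)
        r->dma_paused = true;   /* reached end of list: engine pauses */
    else
        r->hw_idx++;
    return true;
}

/* Driver: consume completed descriptors and return how many were
 * handled. The list is rebuilt only after the EOL descriptor has been
 * consumed; EOL is never moved while the engine may still be running. */
size_t drv_poll(struct rx_ring *r)
{
    size_t n = 0;
    while (r->drv_idx < RING_SIZE && r->desc[r->drv_idx].done) {
        bool was_eol = r->desc[r->drv_idx].eol;
        n++;
        r->drv_idx++;
        if (was_eol) {
            ring_init(r);       /* the single synchronization point */
            break;
        }
    }
    return n;
}
```

    Note what the model gives up: between the hardware hitting EOL and
    the driver catching up, hw_receive() drops frames. That is the pause
    you would like to avoid but can't without risking the race.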

    Anyway, I don't think the if_sn driver implements the right logic, so
    it's very possible it's susceptible to the same race condition.

    As to how to fix this... well, that's hard to say, mainly because of
    how old VxWorks 5.3.1 is. The problem is definitely a bug in the if_sn
    driver, which is Wind River code. If you have a valid support contract,
    then you can open up a support request for this issue. Given how old
    VxWorks 5.3.1 is though, I somehow doubt you can still get support
    for it. At the very least, you might be able to search for possible
    patches for the if_sn driver.

    Another alternative is to just write your own replacement for the
    SONIC driver. The datasheet for the device is still available
    (http://www.national.com/ds/DP/DP83932C.pdf) so this is not entirely
    out of the question, but it's probably more work than you care to do.

    -Bill

  6. Re: tNetTask suspends

    On Mar 19, 3:23 pm, noiset...@gmail.com wrote:
    > The short answer here is that this driver looks buggy, probably in a
    > number of ways.


    Many thanks, Bill

    -TomM
