TCP Checksum Errors When Checksum is Calculated in Hardware? - TCP-IP

  1. TCP Checksum Errors When Checksum is Calculated in Hardware?

    On systems that use HP gigabit ethernet cards, my sniffer traces show lots
    of TCP checksum errors. I've read various threads claiming that this is
    typical behavior when checksum calculation is offloaded to the hardware.
    Can someone explain in detail what sequence is likely happening here?

    Is the idea that the software layer that sends out the packet is not
    calculating the checksum, but some software layer just prior to the packet
    going to hardware is calculating the checksum and comparing anyway? I
    need to know if this is a symptom of a bad driver implementation, or if the
    OS itself is to blame. I have seen the behavior on both Windows 2003 and
    Windows 2000.

    I'm concerned by this behavior for these reasons:

    1) If checksums are "failing" on receipt, isn't TCP going to ask for
    resends? That would be very expensive for performance, far beyond
    anything saved by keeping the CRC calculation off the CPU.

    2) If my sniffer trace is littered with not very meaningful checksum errors,
    I lose the ability to quickly see checksum errors that are important (the
    ones due to bad wire or bad hardware).

    3) The sniffer trace becomes ugly and more difficult to navigate through.

    In the big picture, on a *lightly loaded* server (and certainly on a
    client), is turning hardware error checking off going to affect performance
    by more than about 5%? Most Windows or Linux boxes that are 3 GHz or
    faster have plenty of CPU to spare for such things, so I can't imagine
    bottlenecking on CPU.

    --
    Will



  2. Re: TCP Checksum Errors When Checksum is Calculated in Hardware?

    In article <7cudnUyJDaXojuzYnZ2dnUVZ_rSdnZ2d@giganews.com>,
    Will wrote:
    >On systems that use HP gigabit ethernet cards, my sniffer traces show lots
    >of TCP Checksum errors in the trace. I've read various threads claiming
    >that this is typical behavior when hardware error checking is turned on in
    >the hardware. Can someone explain in details what sequence is likely
    >happening here?


    >Is the idea that the software layer that sends out the packet is not
    >calculating the checksum, but some software layer just prior to the packet
    >going to hardware is calculating the checksum and comparing anyway?


    No, the hardware itself does a CRC fixup in this case.


    >2) If my sniffer trace is littered with not very meaningful checksum errors,
    >I lose the ability to quickly see checksum errors that are important (the
    >ones due to bad wire or bad hardware).


    It is not clear where your sniffing is taking place. If your sniffer
    is software running on a system that has checksum offloads turned
    on, then yes, you have this problem. If it presents undue difficulties,
    turn off the feature in the driver, or do your sniffing externally,
    where the packet has already been fixed up.

  3. Re: TCP Checksum Errors When Checksum is Calculated in Hardware?

    "Walter Roberson" wrote in message
    news:2Dich.409304$R63.191328@pd7urf1no...
    > In article <7cudnUyJDaXojuzYnZ2dnUVZ_rSdnZ2d@giganews.com>,
    > Will wrote:
    > >On systems that use HP gigabit ethernet cards, my sniffer traces show lots
    > >of TCP Checksum errors in the trace. I've read various threads claiming
    > >that this is typical behavior when hardware error checking is turned on in
    > >the hardware. Can someone explain in details what sequence is likely
    > >happening here?
    >
    > >Is the idea that the software layer that sends out the packet is not
    > >calculating the checksum, but some software layer just prior to the packet
    > >going to hardware is calculating the checksum and comparing anyway?

    >
    > No, the hardware itself does a CRC fixup in this case.


    Sorry to just not get this. Why is the software (sniffer) seeing a
    checksum error? Did something put a fake value into the CRC field
    knowing that the hardware would patch that up later? I don't understand
    why they would not just set a flag somewhere to indicate hardware CRC
    checking, so that the software could check that and not bother with
    reporting CRC errors for such cases because the CRC it inspects won't mean
    anything.

    --
    Will





  4. Re: TCP Checksum Errors When Checksum is Calculated in Hardware?

    So there's not quite enough information to diagnose here.

    It sounds as if what you are seeing is data packets that pass the CRC
    but have TCP checksums that the sniffer software determines are bad, even
    though the inbound hardware (which does checksum tests) determined the
    packets were good. The discussion that follows assumes this is what's
    going on. If that's not what's going on, we need to do a different analysis.

    The simple version of the analysis is the following:

    * if the inbound CRC checked correctly, then you almost certainly
    have received the packet as it was originally transmitted (CRC
    errors are rare)

    * if the inbound hardware checksum test succeeded, then the packet,
    at the time it passed the hardware checksummer, is highly likely
    to have been correct

    * the fact that your sniffer says the checksum is bad, says that somewhere
    between the hardware checksum test and the sniffer software, the
    data was corrupted. This could be a hardware or software error.

    Reading that may help: J. Stone and C. Partridge, "When the CRC and
    Checksum Disagree," Proc. ACM SIGCOMM 2000, Stockholm, Sweden, August 2000
    (http://www.sigcomm.org/sigcomm2000/c...mm2000-9-1.pdf).
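
    To make the difference between the two checks concrete, here is a
    minimal Python sketch. It assumes zlib.crc32 as a stand-in for the
    Ethernet frame CRC (same polynomial, framing details ignored) and a
    plain RFC 1071 one's-complement sum as the TCP-style checksum. The
    sample bytes are made up for illustration.

```python
import struct
import zlib


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Made-up data: two 16-bit words followed by some payload bytes.
original = b"\x12\x34\xab\xcd" + b"payload!"
# Corrupt it by swapping the first two 16-bit words.
swapped = original[2:4] + original[0:2] + original[4:]

# Addition is order-independent, so the checksum cannot see the swap.
same_checksum = internet_checksum(original) == internet_checksum(swapped)
# The CRC does depend on position, so it notices.
same_crc = zlib.crc32(original) == zlib.crc32(swapped)

print("checksum unchanged by swap:", same_checksum)
print("CRC unchanged by swap:", same_crc)
```

    The point is only that the two checks are computed differently and at
    different layers, so one passing says little about what the other will
    report once the data has moved on.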

    Craig




  5. Re: TCP Checksum Errors When Checksum is Calculated in Hardware?

    On Sat, 02 Dec 2006 17:24:41 -0800, Will wrote:

    >> No, the hardware itself does a CRC fixup in this case.

    >
    > Sorry to just not get this. Why is the software (sniffer) seeing a
    > checksum error? Did something put a fake value into the CRC field
    > knowing that the hardware would patch that up later? I don't understand


    Yes. There is no point in calculating the checksum in software when the
    hardware is going to calculate it anyway.

    > why they would not just set a flag somewhere to indicate hardware CRC
    > checking, so that the software could check that and not bother with
    > reporting CRC errors for such cases because the CRC it inspects won't mean
    > anything.


    That flag probably exists. Somewhere in the preferences of your sniffer :-)

    Seriously, the low level interfaces used by the sniffer were probably not
    designed with this in mind. Either hardware checksumming did not exist
    when the interfaces were written, or the designers felt that this problem
    wasn't big enough to tackle. Or nobody thought about it at that time.

    HTH,
    M4
    --
    Redundancy is a great way to introduce more single points of failure.


  6. Re: TCP Checksum Errors When Checksum is Calculated in Hardware?

    On Sun, 3 Dec 2006 16:03:51 +0000 (UTC), Craig Partridge
    wrote:

    > * the fact that your sniffer says the checksum is bad, says that somewhere
    > between the hardware checksum test and the sniffer software, the
    > data was corrupted. This could be a hardware or software error.


    I assume that he sees the checksum errors on outgoing packets only.

    Emil

  7. Re: TCP Checksum Errors When Checksum is Calculated in Hardware?

    Will wrote:
    > On systems that use HP gigabit ethernet cards, my sniffer traces


    _Which_ HP gigabit ethernet cards? There are many of them.

    If you are using a software sniffer, and it is on one of the systems
    involved in the transfer (ie it is not grabbing packets from the wire),
    then it is perfectly normal to see it report checksum errors. The
    sending stack (ie the stack on the same system as the packet sniffing
    software) is letting the checksum be calculated by the NIC, and that
    happens _after_ the packet has passed the sniffing point: packets sent
    by the system on which the sniffer is running are seen by the sniffing
    software _before_ they go to the NIC, not after.
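
    The ordering described above can be sketched in a few lines of Python.
    This is only an illustration under stated assumptions: the addresses,
    ports and helper names are hypothetical, and the placeholder value that
    a real stack leaves in the checksum field depends on the OS and driver
    (zero is assumed here).

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header that the TCP checksum also covers (protocol 6)."""
    return socket.inet_aton(src) + socket.inet_aton(dst) + struct.pack("!BBH", 0, 6, tcp_len)


def build_segment(payload: bytes, checksum: int = 0) -> bytes:
    """Minimal 20-byte TCP header plus payload; the checksum sits at bytes 16-17."""
    header = struct.pack("!HHIIBBHHH", 49152, 80, 1, 0, 5 << 4, 0x18, 8192, checksum, 0)
    return header + payload


def checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """What a sniffer does: a segment carrying a correct checksum sums to zero."""
    return internet_checksum(pseudo_header(src, dst, len(segment)) + segment) == 0


def nic_fill_checksum(src: str, dst: str, segment: bytes) -> bytes:
    """Stand-in for the NIC: compute the checksum and write it into the header."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    csum = internet_checksum(pseudo_header(src, dst, len(zeroed)) + zeroed)
    return zeroed[:16] + struct.pack("!H", csum) + zeroed[18:]


SRC, DST = "192.0.2.1", "192.0.2.2"  # documentation addresses, hypothetical

# What a local sniffer captures: the stack left the checksum as a placeholder.
seen_by_local_sniffer = build_segment(b"hello")
# What actually goes on the wire after the NIC has done its job.
seen_on_wire = nic_fill_checksum(SRC, DST, seen_by_local_sniffer)

print("local capture verifies:", checksum_ok(SRC, DST, seen_by_local_sniffer))
print("wire capture verifies:", checksum_ok(SRC, DST, seen_on_wire))
```

    The local capture fails verification and the wire capture passes, even
    though nothing is wrong with the packet that was actually transmitted.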

    When NICs with checksum offload (CKO) are involved, you are better off
    (unless you are looking to diagnose a broken NIC I guess) doing the
    packet sniffing with a "disinterested third-party" system, one that is
    not involved in the communications and simply takes packets off the
    network. Of course, with switches involved this can be a bit more
    difficult, requiring the assistance of those folks who rule the network
    infrastructure with an iron fist.

    > In the big picture, on a *lightly loaded* server (and certainly on a
    > client), is turning hardware error checking off going to affect
    > performance by more than about 5%? Most Windows or Linux boxes that
    > are 3 GHz or faster have plenty of CPU to spare for such things, so
    > I can't imagine bottlenecking on CPU.


    CKO is a prerequisite for things like zero-copy or segmentation
    offload. So, feel free to try disabling it, but be prepared for the
    possibility of a large increase in CPU utilization, which then may or
    may not result in a large change in performance.

    Some _ancient_ data showing how CKO combines synergistically with
    copy-avoidance can be found at:

    ftp://ftp.cup.hp.com/dist/networking.../copyavoid.pdf

    and that doesn't begin to cover segmentation offload (large send or
    "TSO").

    rick jones
    --
    Process shall set you free from the need for rational thought.
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  8. Re: TCP Checksum Errors When Checksum is Calculated in Hardware?

    Rick Jones wrote:
    > Will wrote:
    >> On systems that use HP gigabit ethernet cards, my sniffer traces


    > _Which_ HP gigabit ethernet cards? There are many of them.


    > If you are using a software sniffer, and it is on one of the systems
    > involved in the transfer (ie it is not grabbing packets from the wire)


    That should be "ie not _just_ grabbing packets from the wire"

    And you may see other sorts of "strangeness" when the local NIC offers
    segmentation offload. Typically the IP length field is left as zero
    in that case, as it will be the NIC which segments things to appropriate
    sizes for the actual link/connection. However, there have been some
    patches to tcpdump at least which apply a heuristic in this case to
    avoid simply reporting "IP bad len..."
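
    A heuristic of that general kind might look like the Python sketch
    below. This is not tcpdump's code; the function names are hypothetical
    and the only assumption taken from the discussion is that a zero IP
    total-length field on a locally captured packet probably means
    segmentation offload, so the captured size is used instead.

```python
import struct


def ip_total_length(packet: bytes) -> int:
    """Return the IPv4 total length, guessing when the field was left at zero."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    version_ihl, _tos, total_len = struct.unpack("!BBH", packet[:4])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4
    if total_len == 0:
        # Assumption: the stack left the field blank for the NIC to fill in
        # per segment, so fall back to the number of bytes captured.
        return len(packet)
    if total_len < header_len:
        raise ValueError("IP bad len %d" % total_len)
    return total_len


def make_ipv4(total_len: int, payload: bytes) -> bytes:
    """Build a bare 20-byte IPv4 header (checksum left zero) plus payload."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len, 0, 0, 64, 6, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )
    return header + payload


offloaded = make_ipv4(0, b"x" * 3000)   # length field left at zero
ordinary = make_ipv4(40, b"y" * 20)     # length field filled in normally

print("offloaded packet length:", ip_total_length(offloaded))
print("ordinary packet length:", ip_total_length(ordinary))
```

    A nonzero length that is smaller than the header is still reported as
    bad, so the guess only applies to the one case that offload explains.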

    The joys of sniffing on systems with helpful NICs. Just imagine what
    it could be if the NIC did full TCP offload.

    rick jones
    --
    No need to believe in either side, or any side. There is no cause.
    There's only yourself. The belief is in your own precision. - Jobert
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
