Silent data corruption despite TCP - TCP-IP


  1. Silent data corruption despite TCP

    Hello everyone,

    Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
    wireless channel, and suppose the link layer does not compute any CRC.

    Then, I imagine that there is a very high probability that TCP's
    checksum will not detect every instance of data corruption, and the
    receiver's copy of the file will differ from the original file.

    Even when the link layer does compute a CRC, it has been shown (*)
    that corrupted packets do reach the receiver. Therefore, I imagine it
    is possible for silent data corruption to occur?

    (*) http://citeseer.ist.psu.edu/stone00when.html

    Have there been other studies of silent data corruption despite CRCs
    and TCP's checksum?

    I suppose I need to use a (cryptographic?) hash function if I want to
    be certain, beyond any reasonable doubt, that the receiver's copy is
    the same as the original file?

    SHA-512 produces a 512-bit hash.
    One chance in 2^512 seems small enough :-)

    Regards.
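
A minimal sketch of the end-to-end check proposed above, assuming Python's standard hashlib module. The function name sha512_of_file and the 1 MiB chunk size are illustrative choices, not anything from the thread. Sender and receiver would each run it on their copy and compare the two hex digests.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a 20-50 GB file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

A mismatch between the two digests says that the copies differ, but not where; hashing fixed-size pieces separately would narrow that down at the cost of more digests to exchange.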

  2. Re: Silent data corruption despite TCP

    In article <4815f164$0$21072$426a34cc@news.free.fr>,
    Noob wrote:
    >Hello everyone,
    >
    >Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
    >wireless channel, and suppose the link layer does not compute any CRC.


    That's gotta be some link layer... :-P

    >Even when the link layer does compute a CRC, it has been shown (*)
    >that corrupted packets do reach the receiver. Therefore, I imagine it
    >is possible for silent data corruption to occur?
    >
    >(*) http://citeseer.ist.psu.edu/stone00when.html


    I've seen it happen due to pre-release NFS bugs internally. It's not
    pretty.

    >I suppose I need to use a (cryptographic?) hash function if I want to
    >be certain, beyond any reasonable doubt, that the receiver's copy is
    >the same as the original file?
    >
    >SHA-512 produces a 512-bit hash.
    >One chance in 2^512 seems small enough :-)


    We detected said NFS bug only because a few of our NFS clients were using
    IPsec to protect the packets. IPsec's data-integrity/packet-authentication
    (i.e. its use of HMAC-{MD5,SHA1,SHA2}) helps immensely here. Combine that
    with a TCP that retransmits, and the use of IPsec can make up for your very
    flaky link-layer.

    You could also hash the file after transmission. This is a cat that you can
    skin any number of ways.
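
The integrity-check-then-retransmit pattern can also be sketched per message at the application layer. This is not IPsec; it only illustrates the idea with Python's standard hmac module. The key, the seal/open_sealed names, and the framing with the tag appended to the payload are assumptions made for the example.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256

def seal(key, payload):
    """Append an HMAC-SHA256 tag to the payload."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()

def open_sealed(key, message):
    """Return the payload if the tag verifies, else None (ask for a resend)."""
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None

key = b"shared secret for the example"
msg = seal(key, b"block 17 of the file")
damaged = bytes([msg[0] ^ 0x01]) + msg[1:]   # one bit flipped in transit

print(open_sealed(key, msg))       # the original payload
print(open_sealed(key, damaged))   # None, so the receiver would request a resend
```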


    --
    Daniel L. McDonald - Solaris Security & Networking Engineering
    Mail: danmcd@sun.com | * MY OPINIONS ARE NOT NECESSARILY SUN'S! *
    35 Network Drive Burlington, MA |"rising falling at force ten
    http://blogs.sun.com/danmcd/ | we twist the world and ride the wind" - Rush

  3. Re: Silent data corruption despite TCP

    In article <4815f164$0$21072$426a34cc@news.free.fr>,
    Noob wrote:

    >Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
    >wireless channel, and suppose the link layer does not compute any CRC.
    >
    >Then, I imagine that there is a very high probability that TCP's
    >checksum will not detect every instance of data corruption, and the
    >receiver's copy of the file will differ from the original file.


    What is a "very high probability"?
    It should depend on your application.

    >Even when the link layer does compute a CRC, it has been shown (*)
    >that corrupted packets do reach the receiver. Therefore, I imagine it
    >is possible for silent data corruption to occur?


    Silent data corruption is always possible, even if the CRC is twice
    or even 100 times as long as the data itself. It is all a matter of
    what you consider a "very high probability".


    >(*) http://citeseer.ist.psu.edu/stone00when.html
    >
    >Have there been other studies of silent data corruption despite CRCs
    >and TCP's checksum?


    I think that's the best published study.


    >I suppose I need to use a (cryptographic?) hash function if I want to
    >be certain, beyond any reasonable doubt, that the receiver's copy is
    >the same as the original file?


    You need to quantify "reasonable doubt" and decide what kind of errors
    you are worried about. Are the errors you care about isolated single
    bit changes, drop-outs (a block of N bits all changed to 0 or 1), bursts
    of static (a block of M bits changed randomly), or something else? How
    many errors occur in a packet? Are the errors uniformly distributed?
    Do you only want to detect errors and rely on TCP to recover by
    retransmitting or are the errors frequent enough that the costs of
    forward error correction are worthwhile?


    >SHA-512 produces a 512-bit hash.
    >One chance in 2^512 seems small enough :-)


    That fundamental misunderstanding of cryptographic hash functions is
    one of my pet peeves. Cryptographic hash functions are not necessarily
    better at detecting changes than other hash functions, CRCs, FCSs, etc.
    Cryptographic hash functions are mostly designed to be very
    hard to analyze so that adversaries cannot reverse them; considerations
    of how many and what kinds of changes they detect are secondary.
    You can say things about error detection functions like "CRC-X detects
    any single burst of errors of N or fewer bits in a block of Y bits,"
    but you cannot say anything similar about cryptographic hash functions
    (except for trivial cases of N and Y). You cannot even say, for example,
    that "the detection failure rate of SHA-512 is one in 2^512 changes"
    (of course with suitable definitions for "changes" including type, size,
    and distribution).
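
The kind of guarantee described above can be spot-checked for CRC-32, which detects every single burst of 32 or fewer bits. The sketch below uses zlib.crc32 from the Python standard library; the helper names, block size, trial count, and seed are illustrative assumptions. It is a random spot check of the guarantee, not a proof of it.

```python
import random
import zlib

def flip_burst(data, start_bit, burst):
    """XOR a bit pattern into data, starting at bit position start_bit.

    zlib's CRC-32 consumes each byte least-significant bit first, so bit
    positions are numbered that way to keep the burst contiguous from the
    CRC's point of view.
    """
    out = bytearray(data)
    for i, bit in enumerate(burst):
        if bit:
            pos = start_bit + i
            out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)

def undetected_bursts(trials=500, size=256, max_len=32, seed=1):
    """Count random bursts of at most max_len bits that CRC-32 fails to notice."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        data = rng.getrandbits(size * 8).to_bytes(size, "big")
        length = rng.randint(1, max_len)
        if length == 1:
            burst = [1]
        else:
            # First and last bits are set, so the burst spans exactly `length` bits.
            burst = [1] + [rng.randint(0, 1) for _ in range(length - 2)] + [1]
        start = rng.randrange(size * 8 - length + 1)
        if zlib.crc32(flip_burst(data, start, burst)) == zlib.crc32(data):
            missed += 1
    return missed

print(undetected_bursts())
```

Because the guarantee is algebraic, the count should be zero for any seed. No comparable statement is available for SHA-512, which is the point being made above.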

    It is almost (but not quite) true that if you could say that
    "Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
    on the grounds that you know it doesn't detect all N+1 bit errors,
    and so some of those undetected N+1 bit changes could be used for evil.

    Never mind that most people who use "broken" in that context are wrong,
    as they blather authoritative-sounding nonsense about MD5 being
    "broken." MD5 and some other cryptographic hashes are "broken" only
    for some uses and not others. The big problem is that there are only vague
    hopes that SHA-512 or any other hash function you might name are not
    just as "breakable." That "hard to analyze" requirement on every
    crypto-hash function is at least so far and perhaps forever a fundamental
    weakness.


    Vernon Schryver vjs@rhyolite.com

  4. Re: Silent data corruption despite TCP

    On Apr 28, 8:46 am, Noob wrote:

    > Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
    > wireless channel, and suppose the link layer does not compute any CRC.


    I think that's a completely unrealistic hypothetical. Typical TCP-over-
    wireless implementations have a 32-bit CRC at the wireless layer and a
    16-bit checksum at the TCP layer. No sane person would implement a "noisy
    wireless channel" with a link layer that "does not compute any CRC".
    If you did, file transfer over TCP would be only one of your many
    problems.
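
For reference, TCP's 16-bit check is a ones' complement sum of 16-bit words (RFC 1071), not a CRC, and that gives it structural blind spots. The sketch below is a plain-Python rendering of that sum over a made-up payload; the payload strings are invented for illustration. Because addition is commutative, swapping two aligned 16-bit words leaves the checksum unchanged.

```python
def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum, as in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Both amounts start at even byte offsets, so the swap exchanges whole words.
original  = b"debit=0100 credit=9000"
reordered = b"debit=9000 credit=0100"

print(hex(internet_checksum(original)), hex(internet_checksum(reordered)))
```

The two printed values are identical even though the payloads differ, which is one reason a 32-bit CRC underneath matters.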

    DS

  5. Re: Silent data corruption despite TCP

    In comp.protocols.tcp-ip Noob wrote:
    > I suppose I need to use a (cryptographic?) hash function if I want
    > to be certain, beyond any reasonable doubt, that the receiver's copy
    > is the same as the original file?


    It depends entirely on your definition of a reasonable doubt. It
    would/could certainly help considerably.

    > SHA-512 produces a 512-bit hash.
    > One chance in 2^512 seems small enough :-)


    I'm not sure the math works _exactly_ that way but it would be better
    than just relying on TCP's checksum alone. Might be belts, suspenders
    and duct-tape, but some data calls for that.

    IIRC the emerging SCTP uses a rather stronger 32 bit checksum of some
    sort.

    rick jones
    --
    The computing industry isn't as much a game of "Follow The Leader" as
    it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
    - Rick Jones
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  6. Re: Silent data corruption despite TCP

    On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:

    > That fundamental misunderstanding of cryptographic hash functions is
    > one of my pet peeves. Cryptographic hash functions are not necessarily
    > better at detecting changes than other hash functions, CRCs, FCSs, etc.
    > Cryptographic hash functions are mostly designed to be very
    > hard to analyze so that adversaries cannot reverse them; considerations
    > of how many and what kinds of changes they detect are secondary.
    > You can say things about error detection functions like "CRC-X detects
    > any single burst of errors of N or fewer bits in a block of Y bits,"
    > but you cannot say anything similar about cryptographic hash functions
    > (except for trivial cases of N and Y). You cannot even say, for example,
    > that "the detection failure rate of SHA-512 is one in 2^512 changes"
    > (of course with suitable definitions for "changes" including type, size,
    > and distribution).


    If you have a block of data with a 512-bit cryptographic hash, the
    probability that random changes to the data and/or the hash will leave
    things such that the hash is still the correct hash of the data is
    fairly close to 1 in 2^512 for practical purposes. This is one of the
    design criteria for cryptographic hashes and is definitely true of
    commonly-used hashes such as SHA-512.

    This can be true of a cryptographic hash, and if it's not, then the
    hash is at least somewhat broken. Commonly-used cryptographic hashes
    are not broken.

    Again, this is specifically one of the design criteria for
    cryptographic hashes. The hashes are supposed to be randomly
    distributed over the available hash space and any change in the input
    is supposed to avalanche over the output.
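
The avalanche property mentioned above is easy to observe, though observing it for one input proves nothing in general. A small sketch using Python's hashlib; the sample message and the helper name bit_difference are arbitrary choices for the example.

```python
import hashlib

def bit_difference(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

data = bytearray(b"The quick brown fox jumps over the lazy dog")
before = hashlib.sha512(data).digest()
data[0] ^= 0x01                      # flip a single input bit
after = hashlib.sha512(data).digest()

changed = bit_difference(before, after)
print(f"{changed} of 512 output bits changed")
```

For a well-behaved hash the count is typically close to half of the output bits, here about 256.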

    DS

  7. Re: Silent data corruption despite TCP

    Noob writes:

    >Hello everyone,


    >Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
    >wireless channel, and suppose the link layer does not compute any CRC.


    >Then, I imagine that there is a very high probability that TCP's
    >checksum will not detect every instance of data corruption, and the
    >receiver's copy of the file will differ from the original file.



    >Even when the link layer does compute a CRC, it has been shown (*)
    >that corrupted packets do reach the receiver. Therefore, I imagine it
    >is possible for silent data corruption to occur?


    >(*) http://citeseer.ist.psu.edu/stone00when.html


    >Have there been other studies of silent data corruption despite CRCs
    >and TCP's checksum?


    >I suppose I need to use a (cryptographic?) hash function if I want to
    >be certain, beyond any reasonable doubt, that the receiver's copy is
    >the same as the original file?


    >SHA-512 produces a 512-bit hash.
    >One chance in 2^512 seems small enough :-)


    I would say 1 in 2^128 is good enough.
    You do not need a cryptographic checksum, just one that is sufficiently
    mixing and that depends equally on each bit of the text. Nature is not
    malicious; it is not trying to mess you up. I.e., the chance that nature
    will happen to hit on a noise structure that vastly increases the miss
    rate from 1/2^128 to something much larger is even smaller than 1/2^128.
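
To make the 1 in 2^n figure concrete, one can shrink the check value until misses become observable. The sketch below truncates SHA-256 to 8 bits purely so that the effect can be measured; with a well-mixing function, roughly 1 in 2^8 random corruptions should slip through, and the same reasoning scales to 128 bits. The function names, sizes, and seed are assumptions for illustration.

```python
import hashlib
import random

def truncated_check(data, nbytes=1):
    """Illustrative check value: the first nbytes of SHA-256(data)."""
    return hashlib.sha256(data).digest()[:nbytes]

def missed_corruptions(trials=1 << 16, size=64, seed=2008):
    """Count random corruptions that leave an 8-bit check value unchanged."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        data = rng.getrandbits(size * 8).to_bytes(size, "big")
        damaged = bytearray(data)
        # Corrupt between one and eight randomly chosen bits.
        for _ in range(rng.randint(1, 8)):
            pos = rng.randrange(size * 8)
            damaged[pos // 8] ^= 1 << (pos % 8)
        damaged = bytes(damaged)
        # Skip the rare case where the flips cancelled each other out.
        if damaged != data and truncated_check(damaged) == truncated_check(data):
            missed += 1
    return missed

trials = 1 << 16
missed = missed_corruptions(trials)
print(f"missed {missed} of {trials}; 1/2^8 would predict about {trials // 256}")
```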



    >Regards.


  8. Re: Silent data corruption despite TCP

    On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:

    > It is almost (but not quite) true that if you could say that
    > "Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
    > on the grounds that you know it doesn't detect all N+1 bit errors,
    > and so some of those undetected N+1 bit changes could be used for evil.


    If that were true, the crypto hash would be broken. The whole point of
    a crypto hash is that even if you know such changes exist, they cannot
    be used for evil because they cannot be *found*. The possible
    advantage of a crypto hash over another hash would be that collisions
    cannot be found for a proper crypto hash. (Although in this case, it's
    not clear why that would matter. If you want to maliciously corrupt
    the data, you can just put in the correct hash anyway.)

    > Never mind that most people who use "broken" in that context are wrong,
    > as they blather authoritative-sounding nonsense about MD5 being
    > "broken." MD5 and some other cryptographic hashes are "broken" only
    > for some uses and not others.


    Right, but this is dangerously close to one of those uses. All you'd
    have to do is sign the hash, and you'd have a use case for which MD5
    is broken.

    > The big problem is that there are only vague
    > hopes that SHA-512 or any other hash function you might name are not
    > just as "breakable." That "hard to analyze" requirement on every
    > crypto-hash function is at least so far and perhaps forever a fundamental
    > weakness.


    Ideally, you adjust your use of a hash so that even if it is "broken"
    in the ways it's most likely to be broken in the future, that has no
    effect on your use. That requires a deep understanding of the
    strengths and weaknesses of cryptographic hashes.

    For example, it's quite likely that someone will find two chunks of
    data that hash to the same value long before they can find data of the
    same length that hashes to the same value as a given chunk.
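
The gap between those two problems can be seen on a toy scale. For an n-bit hash, a birthday search for any colliding pair takes on the order of 2^(n/2) attempts, while matching one given chunk takes on the order of 2^n. The sketch below uses the first 16 bits of SHA-256 as a deliberately weak stand-in; the names h16, find_collision, and find_second_preimage, the message formats, and the search limit are all invented for the example.

```python
import hashlib
from itertools import count

def h16(data):
    """Toy hash: the first 16 bits of SHA-256, small enough to brute-force."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Birthday search: any two distinct inputs with the same toy hash."""
    seen = {}
    for i in count():
        msg = b"collision-%d" % i
        tag = h16(msg)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg

def find_second_preimage(target_msg, limit=2_000_000):
    """Brute-force search for a different input matching one given message."""
    target = h16(target_msg)
    for i in range(limit):
        msg = b"preimage-%d" % i
        if msg != target_msg and h16(msg) == target:
            return msg, i + 1
    return None, limit

a, b, collision_tries = find_collision()
m, preimage_tries = find_second_preimage(b"the original chunk")
print(f"collision after {collision_tries} tries, "
      f"second preimage after {preimage_tries} tries")
```

On a 16-bit toy hash the first number is typically in the hundreds and the second in the tens of thousands; neither attack is something random line noise carries out.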

    DS

  9. Re: Silent data corruption despite TCP

    In comp.protocols.tcp-ip David Schwartz wrote:
    > This can be true of a cryptographic hash, and if it's not, then the
    > hash is at least somewhat broken. Commonly-used cryptographic hashes
    > are not broken.


    Are not known to be broken.

    rick jones
    --
    web2.0 n, the dot.com reunion tour...
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  10. Re: Silent data corruption despite TCP

    In article ,
    Rick Jones wrote:

    >> This can be true of a cryptographic hash, and if it's not, then the
    >> hash is at least somewhat broken. Commonly-used cryptographic hashes
    >> are not broken.

    >
    >Are not known to be broken.


    Even that is gross optimism. All cryptographic hashes are merely
    hoped to not be secretly broken by too many adversaries. The nature
    of all current cryptographic hashes is that no one has proven anything
    useful about how well they work for simple error detection. I'd believe
    SHA-512 detects all single bit errors in all blocks of 512 bits, but
    I'd like to see a proof of all double bit errors in 512, all single
    bits in 1024 (or even 513) bits, not to mention blocks not so tiny that
    you would do better by transmitting second copies of the 64 bytes run
    through a bijection ("scrambler"). Anyone who doesn't "know" a bunch
    of stuff that is false would choose CRC-512 instead of SHA-512 to detect
    natural errors. Unless you are battling adversaries who would use the
    obvious ways to outwit CRC-512 as a signature, you are better off with
    something other than a cryptographic hash.


    Vernon Schryver vjs@rhyolite.com

  11. Re: Silent data corruption despite TCP

    On Apr 28, 12:20 pm, Rick Jones wrote:

    > In comp.protocols.tcp-ip David Schwartz wrote:


    > > This can be true of a cryptographic hash, and if it's not, then the
    > > hash is at least somewhat broken. Commonly-used cryptographic hashes
    > > are not broken.


    > Are not known to be broken.


    For the type of breakage we are talking about here, our knowledge that
    they are not broken is almost as certain as such knowledge can be.

    In this case, we are talking about random corruption of the hash or
    the data causing the hash to match the data without the data matching
    the original. There is no "adversary" here recomputing the hash -- the
    adversary is any mutilation process with no knowledge of the
    internals of our hash, but otherwise any kind of mechanism. (Such as
    flipping all bits, flipping random bits, flipping alternate bits,
    turning 1/3 of all zeroes to ones, and so on. Any process that does
    not know about the hash's internals.)

    Cryptographic weaknesses (which, of course, hashes like SHA-512 almost
    certainly have and which will ultimately be discovered) cannot be exploited
    by anything but an adversary. Random corruption will not get smarter.

    There's always a very slim chance that I will be proven wrong, but I
    feel about as sure as anything in computer science that SHA-512 will
    always be safe for this use. For that matter, MD5 will likely always
    be safe for this use.

    DS

  12. Re: Silent data corruption despite TCP

    On Apr 28, 12:48 pm, v...@calcite.rhyolite.com (Vernon Schryver)
    wrote:

    > Even that is gross optimism. All cryptographic hashes are merely
    > hoped to not be secretly broken by too many adversaries.


    That's true, but cryptographic weaknesses (such as the ability of an
    adversary to infer something about the data from the hash or to create
    two blocks that hash to the same value) have no effect on use as a
    checksum.

    > The nature
    > of all current cryptographic hashes is that no one has proven anything
    > useful about how well they work for simple error detection.


    Right, but that's totally irrelevant. For example, which would you
    prefer:

    1) A hash that's been proven to catch 95% of single bit errors, or

    2) A hash that hasn't been proven to catch anything, but for which
    there is strong evidence that the probability of a single bit error
    passing the hash is close to 1 in 2^128.

    > I'd believe
    > SHA-512 detects all single bit errors in all blocks of 512 bits, but
    > I'd like to see a proof of all double bit errors in 512, all single
    > bits in 1024 (or even 513) bits, not to mention blocks not so tiny that
    > you would do better by transmitting second copies of the 64 bytes run
    > through a bijection ("scrambler").


    If you know for a fact that all you'll have are errors of this type,
    then SHA-512 is a poor choice. But SHA-512 has a probability of not
    detecting any given random error that's vanishingly small, close to
    one in 2^512. This is not an accident, it's a consequence of its
    design criteria (see my other post).

    > Anyone who doesn't "know" a bunch
    > of stuff that is false would choose CRC-512 instead of SHA-512 to detect
    > natural errors. Unless you are battling adversaries who would use the
    > obvious ways to outwit CRC-512 as a signature, you are better off with
    > something other than a cryptographic hash.


    If you know that short errors are more likely than long errors, then I
    agree with you. If the corruption is expected to be random or larger,
    there is no detection ability advantage of CRC-512 over SHA-512.
    CRC-512 might be preferable because it's computationally cheaper, but
    this is an absolutely suitable application for a cryptographic hash
    function.

    DS

  13. Re: Silent data corruption despite TCP

    On Apr 28, 11:43 am, David Schwartz wrote:
    > On Apr 28, 8:46 am, Noob wrote:
    >
    > > Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
    > > wireless channel, and suppose the link layer does not compute any CRC.

    >
    > I think that's a completely unrealistic hypothetical. Typical TCP-over-
    > wireless implementations have a 32-bit CRC at the wireless layer and a
    > 16-bit checksum at the TCP layer. No sane person would implement a "noisy
    > wireless channel" with a link layer that "does not compute any CRC".
    > If you did, file transfer over TCP would be only one of your many
    > problems.



    While not wireless, SLIP used to suffer from just that problem - no
    CRC over mediocre quality modem connections, and relied on the IP
    checksums for all error detection. And yes, files were corrupted at a
    not insignificant rate.

    While I don't have any of the raw data handy, or in a publishable
    form, so this should be considered anecdotal at best, back in the
    early/mid-nineties we internally demonstrated that modem connections
    with only 16 bit checksums (dial up BBS's were common at the time too
    - so the 16-bit CRC version of ZMODEM was in the same mix as SLIP),
    led to two or three percent of 1MB file transfers being corrupted. We
    did the tests under quite real-world conditions too: we had actual
    (willing) customers who had downloaded files download a group of test
    files, zip them up and send them back to us. We also found that
    badly implemented restart schemes* caused several times more errors
    than that. Between those two, we finally managed to nail a 100% firm
    internal policy of always requiring patches and updates to be
    distributed in some form of archive file with at least a 32 bit CRC.

    And mind you, this is with most of the modems running some sort of
    error correction protocol.

    But in short, noisy links and 16 bit checksums do lead to unacceptable
    real world performance.
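
A back-of-the-envelope way to see how this scales to the 20-50 GB transfer in the original question. The sketch assumes independent packets and a 2^-16 chance that a damaged packet passes a 16-bit check; both are simplifications, and the damage rates in the loop are made-up inputs rather than measurements. The function and parameter names are likewise illustrative.

```python
import math

def p_silent_corruption(file_bytes, payload_bytes, p_damaged, p_escape=2.0 ** -16):
    """Probability that at least one damaged packet passes a 16-bit check.

    p_damaged: chance that a packet reaches the checksum stage corrupted.
    p_escape:  chance that a corrupted packet still passes the check
               (2**-16 assumes errors are spread uniformly, which may be
               optimistic for real traffic).
    """
    packets = math.ceil(file_bytes / payload_bytes)
    per_packet = p_damaged * p_escape
    # 1 - (1 - p)**n, computed stably for very small p.
    return -math.expm1(packets * math.log1p(-per_packet))

for p_damaged in (1e-2, 1e-4, 1e-6):   # assumed rates, not measurements
    p = p_silent_corruption(20 * 10**9, 1460, p_damaged)
    print(f"p_damaged={p_damaged:g}: P(silent corruption) = {p:.3g}")
```

With no link-layer CRC to filter damaged packets first, p_damaged is whatever the raw channel delivers, which is why the 16-bit check alone falls short on large transfers.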


    *The most common error was that the restart scheme on the user's PC would
    see that a shorter abc.exe file already existed in the target
    directory, and would assume that this was really a restart of the
    download of the newer (and larger) updated abc.exe.

  14. Re: Silent data corruption despite TCP

    David Schwartz writes:

    >On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:


    >> It is almost (but not quite) true that if you could say that
    >> "Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
    >> on the grounds that you know it doesn't detect all N+1 bit errors,
    >> and so some of those undetected N+1 bit changes could be used for evil.


    >If that were true, the crypto hash would be broken. The whole point of
    >a crypto hash is that even if you know such changes exist, they cannot
    >be used for evil because they cannot be *found*. The possible
    >advantage of a crypto hash over another hash would be that collisions
    >cannot be found for a proper crypto hash. (Although in this case, it's
    >not clear why that would matter. If you want to maliciously corrupt
    >the data, you can just put in the correct hash anyway.)


    Also, a CH probably does not detect all N bit errors either for any N. I.e.,
    it is quite possible that there exist two plain texts which differ by a
    single bit, but have the same hash. Those may be 10^100 bits long.

    A crypto hash is as close as one can come to a random map from any text
    into say 128 bits. In general any 1 bit change will produce 64 bits of
    change in the hash, but since it is a random map, it may produce no change
    for specific strings.
    The important thing is that given the hash value, it is essentially
    impossible to find any string which produces it.


    >> Never mind that most people who use "broken" in that context are wrong,
    >> as they blather authoritative-sounding nonsense about MD5 being
    >> "broken." MD5 and some other cryptographic hashes are "broken" only
    >> for some uses and not others.


    >Right, but this is dangerously close to one of those uses. All you'd
    >have to do is sign the hash, and you'd have a use case for which MD5
    >is broken.


    >> The big problem is that there are only vague
    >> hopes that SHA-512 or any other hash function you might name are not
    >> just as "breakable." That "hard to analyze" requirement on every
    >> crypto-hash function is at least so far and perhaps forever a fundamental
    >> weakness.


    >Ideally, you adjust your use of a hash so that even if it is "broken"
    >in the ways it's most likely to be broken in the future, that has no
    >effect on your use. That requires a deep understanding of the
    >strengths and weaknesses of cryptographic hashes.


    >For example, it's quite likely that someone will find two chunks of
    >data that hash to the same value long before they can find data of the
    >same length that hashes to the same value as a given chunk.


    >DS


  15. Re: Silent data corruption despite TCP

    Noob wrote:

    > Even when the link layer does compute a CRC, it has been shown (*) that
    > corrupted packets do reach the receiver. Therefore, I imagine it is
    > possible for silent data corruption to occur?
    >
    > (*) http://citeseer.ist.psu.edu/stone00when.html


    Thanks everyone for your comments.

    For my own record, I'll add a few links that I find somewhat relevant.

    http://www.ietf.org/mail-archive/web.../msg00890.html
    http://kerneltrap.org/Linux/Data_Err..._Communication

  16. Re: Silent data corruption despite TCP

    Noob wrote:

    > For my own record, I'll add a few links that I find somewhat relevant.
    >
    > http://www.ietf.org/mail-archive/web.../msg00890.html
    > http://kerneltrap.org/Linux/Data_Err..._Communication


    http://citeseer.ist.psu.edu/34744.html
    Performance of Checksums and CRCs over Real Data (1998)

    http://citeseer.ist.psu.edu/stone00when.html
    When The CRC and TCP Checksum Disagree (2000)

    http://www.pdl.cmu.edu/mailinglists/.../msg04095.html
    TCP checksum escapes and iSCSI error recovery design
