Silent data corruption despite TCP - TCP-IP
-
Silent data corruption despite TCP
Hello everyone,
Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
wireless channel, and suppose the link layer does not compute any CRC.
Then, I imagine that there is a very high probability that TCP's
checksum will not detect every instance of data corruption, and the
receiver's copy of the file will differ from the original file.
Even when the link layer does compute a CRC, it has been shown (*)
that corrupted packets do reach the receiver. Therefore, I imagine it
is possible for silent data corruption to occur?
(*) http://citeseer.ist.psu.edu/stone00when.html
Have there been other studies of silent data corruption despite CRCs
and TCP's checksum?
I suppose I need to use a (cryptographic?) hash function if I want to
be certain, beyond any reasonable doubt, that the receiver's copy is
the same as the original file?
SHA-512 produces a 512-bit hash.
One chance in 2^512 seems small enough :-)
Regards.
-
Re: Silent data corruption despite TCP
In article <4815f164$0$21072$426a34cc@news.free.fr>,
Noob wrote:
>Hello everyone,
>
>Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
>wireless channel, and suppose the link layer does not compute any CRC.
That's gotta be some link layer... :-P
>Even when the link layer does compute a CRC, it has been shown (*)
>that corrupted packets do reach the receiver. Therefore, I imagine it
>is possible for silent data corruption to occur?
>
>(*) http://citeseer.ist.psu.edu/stone00when.html
I've seen it happen due to pre-release NFS bugs internally. It's not
pretty.
>I suppose I need to use a (cryptographic?) hash function if I want to
>be certain, beyond any reasonable doubt, that the receiver's copy is
>the same as the original file?
>
>SHA-512 produces a 512-bit hash.
>One chance in 2^512 seems small enough :-)
We detected said NFS bug only because a few of our NFS clients were using
IPsec to protect the packets. IPsec's data-integrity/packet-authentication
(i.e. its use of HMAC-{MD5,SHA1,SHA2}) helps immensely here. Combine that
with a TCP that retransmits, and the use of IPsec can make up for your very
flaky link-layer.
You could also hash the file after transmission. This is a cat that you can
skin any number of ways.
--
Daniel L. McDonald - Solaris Security & Networking Engineering
Mail: danmcd@sun.com | * MY OPINIONS ARE NOT NECESSARILY SUN'S! *
35 Network Drive Burlington, MA |"rising falling at force ten
http://blogs.sun.com/danmcd/ | we twist the world and ride the wind" - Rush
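A minimal sketch of the "hash the file after transmission" option, assuming Python's standard hashlib is available on both ends. The function names and the chunk size are illustrative only; the sender would run the same routine and pass its digest to the receiver by some other means.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a 20-50 GB file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copies_match(sender_digest, received_path):
    """Compare the sender's digest with one computed over the received file."""
    return sha512_of_file(received_path) == sender_digest
```

A mismatch only says that the copies differ somewhere; hashing fixed-size pieces separately would be one way to narrow down which part to fetch again.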
-
Re: Silent data corruption despite TCP
In article <4815f164$0$21072$426a34cc@news.free.fr>,
Noob wrote:
>Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
>wireless channel, and suppose the link layer does not compute any CRC.
>
>Then, I imagine that there is a very high probability that TCP's
>checksum will not detect every instance of data corruption, and the
>receiver's copy of the file will differ from the original file.
What is a "very high probability"?
It should depend on your application.
>Even when the link layer does compute a CRC, it has been shown (*)
>that corrupted packets do reach the receiver. Therefore, I imagine it
>is possible for silent data corruption to occur?
Silent data corruption is always possible, even if the CRC is twice
or even 100 times as long as the data itself. It is all a matter of
what you consider a "very high probability".
>(*) http://citeseer.ist.psu.edu/stone00when.html
>
>Have there been other studies of silent data corruption despite CRCs
>and TCP's checksum?
I think that's the best published study.
>I suppose I need to use a (cryptographic?) hash function if I want to
>be certain, beyond any reasonable doubt, that the receiver's copy is
>the same as the original file?
You need to quantify "reasonable doubt" and decide what kind of errors
you are worried about. Are the errors you care about isolated single-bit
changes, drop-outs (a block of N bits all changed to 0 or 1), bursts
of static (a block of M bits changed randomly), or something else? How
many errors occur in a packet? Are the errors uniformly distributed?
Do you only want to detect errors and rely on TCP to recover by
retransmitting or are the errors frequent enough that the costs of
forward error correction are worthwhile?
>SHA-512 produces a 512-bit hash.
>One chance in 2^512 seems small enough :-)
That fundamental misunderstanding of cryptographic hash functions is
one of my pet peeves. Cryptographic hash functions are not necessarily
better at detecting changes than other hash functions, CRCs, FCSs, etc.
Cryptographic hash functions are mostly designed to be very
hard to analyze so that adversaries cannot reverse them; considerations
of how many and what kinds of changes they detect are secondary.
You can say things about error detection functions like "CRC-X detects
any single burst of errors of N or fewer bits in a block of Y bits,"
but you cannot say anything similar about cryptographic hash functions
(except for trivial cases of N and Y). You cannot even say, for example,
that "the detection failure rate of SHA-512 is one in 2^512 changes"
(of course with suitable definitions for "changes" including type, size,
and distribution).
It is almost (but not quite) true that if you could say that
"Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
on the grounds that you know it doesn't detect all N+1 bit errors,
and so some of those undetected N+1 bit changes could be used for evil.
Never mind that most people who use "broken" in that context are wrong,
as they blather authoritative-sounding nonsense about MD5 being
"broken." MD5 and some other cryptographic hashes are "broken" only
for some uses and not others. The big problem is that there are only vague
hopes that SHA-512 or any other hash function you might name is not
just as "breakable." That "hard to analyze" requirement on every
crypto-hash function is at least so far and perhaps forever a fundamental
weakness.
Vernon Schryver vjs@rhyolite.com
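The kind of guarantee described above ("CRC-X detects any single burst of errors of N or fewer bits") can be spot-checked for CRC-32. This is a sketch, assuming zlib's CRC-32 and a bit numbering in which bit i sits in byte i // 8 at position i % 8, least significant first, which is the order that reflected CRC processes. The helper names are made up for the illustration, and random sampling is not a proof; the proof is the usual polynomial argument.

```python
import random
import zlib

def flip_burst(data, start_bit, pattern):
    """XOR a 32-bit error `pattern` into `data`, beginning at `start_bit`.

    Bit i lives in byte i // 8 at position i % 8 (least significant first),
    matching the bit order of the reflected CRC-32 computed by zlib.
    """
    buf = bytearray(data)
    for j in range(32):
        if (pattern >> j) & 1:
            i = start_bit + j
            buf[i // 8] ^= 1 << (i % 8)
    return bytes(buf)

def count_undetected_bursts(block, trials, seed=0):
    """Apply random bursts of at most 32 bits; count those CRC-32 misses."""
    rng = random.Random(seed)
    good = zlib.crc32(block)
    undetected = 0
    for _ in range(trials):
        start = rng.randrange(len(block) * 8 - 31)   # burst stays inside block
        pattern = rng.randrange(1, 1 << 32)          # nonzero: something flips
        if zlib.crc32(flip_burst(block, start, pattern)) == good:
            undetected += 1
    return undetected

print(count_undetected_bursts(bytes(range(256)) * 4, 2000))
```

The count should be zero for any block and any number of trials, because a single burst no longer than the CRC can never be a multiple of the generator polynomial. No comparable statement is available for SHA-512, which is the point being made above.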
-
Re: Silent data corruption despite TCP
On Apr 28, 8:46 am, Noob wrote:
> Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
> wireless channel, and suppose the link layer does not compute any CRC.
I think that's a completely unrealistic hypothetical. Typical
TCP-over-wireless implementations have a 32-bit CRC at the wireless layer
and a 16-bit checksum at the TCP layer. No sane person would implement a "noisy
wireless channel" with a link layer that "does not compute any CRC".
If you did, file transfer over TCP would be only one of your many
problems.
DS
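For what it's worth, the 16-bit field in TCP is a one's-complement sum rather than a CRC, and that matters for which errors slip through. A small sketch, written in the style of RFC 1071; the sample data and the swap_words helper are invented for the illustration. Because addition is commutative, two aligned 16-bit words trading places leaves the sum unchanged, while CRC-32 notices.

```python
import zlib

def internet_checksum(data):
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def swap_words(data, i, j):
    """Return a copy of `data` with 16-bit words number i and j exchanged."""
    buf = bytearray(data)
    buf[2*i:2*i+2], buf[2*j:2*j+2] = buf[2*j:2*j+2], buf[2*i:2*i+2]
    return bytes(buf)

original = b"0123456789abcdefghijklmnopqrstuv"
damaged = swap_words(original, 1, 9)         # hypothetical reordering error

print(internet_checksum(original) == internet_checksum(damaged))
print(zlib.crc32(original) == zlib.crc32(damaged))
```

Reordered words are only one blind spot; a pair of changes that cancel in the sum (one word up by k, another down by k) passes in the same way.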
-
Re: Silent data corruption despite TCP
In comp.protocols.tcp-ip Noob wrote:
> I suppose I need to use a (cryptographic?) hash function if I want
> to be certain, beyond any reasonable doubt, that the receiver's copy
> is the same as the original file?
It depends entirely on your definition of a reasonable doubt. It
would/could certainly help considerably.
> SHA-512 produces a 512-bit hash.
> One chance in 2^512 seems small enough :-)
I'm not sure the math works _exactly_ that way but it would be better
than just relying on TCP's checksum alone. Might be belts, suspenders
and duct-tape, but some data calls for that.
IIRC the emerging SCTP uses a rather stronger 32 bit checksum of some
sort.
rick jones
--
The computing industry isn't as much a game of "Follow The Leader" as
it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
- Rick Jones
these opinions are mine, all mine; HP might not want them anyway... 
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
-
Re: Silent data corruption despite TCP
On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:
> That fundamental misunderstanding of cryptographic hash functions is
> one of my pet peeves. Cryptographic hash functions are not necessarily
> better at detecting changes than other hash functions, CRCs, FCSs, etc.
> Cryptographic hash functions are mostly designed to be very
> hard to analyze so that adversaries cannot reverse them; considerations
> of how many and what kinds of changes they detect are secondary.
> You can say things about error detection functions like "CRC-X detects
> any single burst of errors of N or fewer bits in a block of Y bits,"
> but you cannot say anything similar about cryptographic hash functions
> (except for trivial cases of N and Y). You cannot even say, for example,
> that "the detection failure rate of SHA-512 is one in 2^512 changes"
> (of course with suitable definitions for "changes" including type, size,
> and distribution).
If you have a block of data with a 512-bit cryptographic hash, the
probability that random changes to the data and/or the hash will leave
things such that the hash is still the correct hash of the data is
fairly close to 1 in 2^512 for practical purposes. This is one of the
design criteria for cryptographic hashes and is definitely true of
commonly-used hashes such as SHA-512.
This can be true of a cryptographic hash, and if it's not, then the
hash is at least somewhat broken. Commonly-used cryptographic hashes
are not broken.
Again, this is specifically one of the design criteria for
cryptographic hashes. The hashes are supposed to be randomly
distributed over the available hash space and any change in the input
is supposed to avalanche over the output.
DS
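The avalanche behaviour mentioned above is easy to observe, though observing it proves nothing about worst cases. A sketch using Python's hashlib; the test message and the function names are arbitrary choices for the illustration.

```python
import hashlib

def bit_difference(a, b):
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def mean_avalanche(message, flips):
    """Flip each of the first `flips` bits of `message` in turn and return
    the average number of SHA-512 output bits that change."""
    base = hashlib.sha512(message).digest()
    total = 0
    for i in range(flips):
        damaged = bytearray(message)
        damaged[i // 8] ^= 1 << (i % 8)      # single-bit error at position i
        total += bit_difference(base, hashlib.sha512(bytes(damaged)).digest())
    return total / flips

print(mean_avalanche(b"The quick brown fox jumps over the lazy dog", 256))
```

If the output behaves like a random 512-bit string, the average should land near 256, i.e. about half of the output bits change for each single-bit error in the input.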
-
Re: Silent data corruption despite TCP
Noob writes:
>Hello everyone,
>Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
>wireless channel, and suppose the link layer does not compute any CRC.
>Then, I imagine that there is a very high probability that TCP's
>checksum will not detect every instance of data corruption, and the
>receiver's copy of the file will differ from the original file.
>Even when the link layer does compute a CRC, it has been shown (*)
>that corrupted packets do reach the receiver. Therefore, I imagine it
>is possible for silent data corruption to occur?
>(*) http://citeseer.ist.psu.edu/stone00when.html
>Have there been other studies of silent data corruption despite CRCs
>and TCP's checksum?
>I suppose I need to use a (cryptographic?) hash function if I want to
>be certain, beyond any reasonable doubt, that the receiver's copy is
>the same as the original file?
>SHA-512 produces a 512-bit hash.
>One chance in 2^512 seems small enough :-)
I would say 1 in 2^128 is good enough.
You do not need a cryptographic checksum, just one that is sufficiently
mixing and that depends equally on each bit of the text. Nature is not
malicious; it is not trying to mess things up. I.e., the chance that nature
will happen to hit on a noise structure that vastly increases the failure
rate above 1/2^128 is itself even smaller than 1/2^128.
>Regards.
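To put rough numbers on "good enough", here is a back-of-the-envelope sketch. Everything except the 50 GB file size is an assumption made for the illustration: the 1460-byte segment payload, the rate of one damaged segment per thousand reaching the check, and above all the model that a damaged segment passes a b-bit check with probability 2^-b, which is the very assumption under debate in this thread.

```python
from fractions import Fraction

def expected_undetected(file_bytes, segment_bytes, corrupt_rate, check_bits):
    """Expected number of corrupted segments that slip past the check.

    Assumes each corrupted segment passes a `check_bits`-bit check with
    probability 2**-check_bits, i.e. errors behave like uniform random noise.
    """
    segments = -(-file_bytes // segment_bytes)           # ceiling division
    return segments * corrupt_rate * Fraction(1, 2 ** check_bits)

FILE_BYTES = 50 * 10**9            # the 50 GB upper bound from the question
SEGMENT_BYTES = 1460               # assumed payload per TCP segment
CORRUPT_RATE = Fraction(1, 1000)   # assumed: 1 segment in 1000 arrives damaged

for bits in (16, 32, 128):
    e = expected_undetected(FILE_BYTES, SEGMENT_BYTES, CORRUPT_RATE, bits)
    print(f"{bits:3d}-bit check: about {float(e):.3g} undetected segments expected")
```

Under those assumed numbers the 16-bit case works out to roughly half an undetected segment per transfer, which is why the corruption rate that actually reaches TCP matters so much more than the last few bits of hash length.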
-
Re: Silent data corruption despite TCP
On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:
> It is almost (but not quite) true that if you could say that
> "Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
> on the grounds that you know it doesn't detect all N+1 bit errors,
> and so some of those undetected N+1 bit changes could be used for evil.
If that were true, the crypto hash would be broken. The whole point of
a crypto hash is that even if you know such changes exist, they cannot
be used for evil because they cannot be *found*. The possible
advantage of a crypto hash over another hash would be that collisions
cannot be found for a proper crypto hash. (Although in this case, it's
not clear why that would matter. If you want to maliciously corrupt
the data, you can just put in the correct hash anyway.)
> Never mind that most people who use "broken" in that context are wrong,
> as they blather authoritative-sounding nonsense about MD5 being
> "broken." MD5 and some other cryptographic hashes are "broken" only
> for some uses and not others.
Right, but this is dangerously close to one of those uses. All you'd
have to do is sign the hash, and you'd have a use case for which MD5
is broken.
> The big problem is that there are only vague
> hopes that SHA-512 or any other hash function you might name is not
> just as "breakable." That "hard to analyze" requirement on every
> crypto-hash function is at least so far and perhaps forever a fundamental
> weakness.
Ideally, you adjust your use of a hash so that even if it is "broken"
in the ways it's most likely to be broken in the future, that has no
effect on your use. That requires a deep understanding of the
strengths and weaknesses of cryptographic hashes.
For example, it's quite likely that someone will find two chunks of
data that hash to the same value long before they can find data of the
same length that hashes to the same value as a given chunk.
DS
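The gap between "any two chunks that collide" and "a chunk that matches a given one" can be seen on a toy scale. This sketch truncates SHA-256 to 16 bits so that both searches finish quickly; tiny_hash and the message formats are invented for the demonstration, it ignores the same-length condition, and it says nothing about the real strength of any full-size hash.

```python
import hashlib
from itertools import count

def tiny_hash(data, bits=16):
    """SHA-256 truncated to `bits` bits: a deliberately weak toy hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits=16):
    """Hash distinct inputs until any two agree; return (a, b, tries)."""
    seen = {}
    for n in count():
        msg = b"msg-%d" % n
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, n + 1
        seen[h] = msg

def find_second_preimage(target, bits=16):
    """Hash distinct inputs until one matches the hash of `target`."""
    want = tiny_hash(target, bits)
    for n in count():
        msg = b"try-%d" % n
        if msg != target and tiny_hash(msg, bits) == want:
            return msg, n + 1

a, b, collision_tries = find_collision()
m, preimage_tries = find_second_preimage(b"a given chunk")
print(collision_tries, preimage_tries)
```

For an n-bit hash the collision search is expected to need on the order of 2^(n/2) tries (the birthday bound) and the second-preimage search on the order of 2^n, so the first number printed will usually be far smaller than the second.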
-
Re: Silent data corruption despite TCP
In comp.protocols.tcp-ip David Schwartz wrote:
> This can be true of a cryptographic hash, and if it's not, then the
> hash is at least somewhat broken. Commonly-used cryptographic hashes
> are not broken.
Are not known to be broken.
rick jones
--
web2.0 n, the dot.com reunion tour...
these opinions are mine, all mine; HP might not want them anyway... 
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
-
Re: Silent data corruption despite TCP
In article ,
Rick Jones wrote:
>> This can be true of a cryptographic hash, and if it's not, then the
>> hash is at least somewhat broken. Commonly-used cryptographic hashes
>> are not broken.
>
>Are not known to be broken.
Even that is gross optimism. All cryptographic hashes are merely
hoped to not be secretly broken by too many adversaries. The nature
of all current cryptographic hashes is that no one has proven anything
useful about how well they work for simple error detection. I'd believe
SHA-512 detects all single bit errors in all blocks of 512 bits, but
I'd like to see a proof of all double bit errors in 512, all single
bits in 1024 (or even 513) bits, not to mention blocks not so tiny that
you would do better by transmitting second copies of the 64 bytes run
through a bijection ("scrambler"). Anyone who doesn't "know" a bunch
of stuff that is false would choose CRC-512 instead of SHA-512 to detect
natural errors. Unless you are battling adversaries who would use the
obvious ways to outwit CRC-512 as a signature, you are better off with
something other than a cryptographic hash.
Vernon Schryver vjs@rhyolite.com
-
Re: Silent data corruption despite TCP
On Apr 28, 12:20 pm, Rick Jones wrote:
> In comp.protocols.tcp-ip David Schwartz wrote:
> > This can be true of a cryptographic hash, and if it's not, then the
> > hash is at least somewhat broken. Commonly-used cryptographic hashes
> > are not broken.
> Are not known to be broken.
For the type of breakage we are talking about here, our knowledge that
they are not broken is almost as certain as such knowledge can be.
In this case, we are talking about random corruption of the hash or
the data causing the hash to match the data without the data matching
the original. There is no "adversary" here recomputing the hash -- the
"adversary" is any mutilation process, of whatever kind, that has no
knowledge of the internals of our hash (such as flipping all bits,
flipping random bits, flipping alternate bits, turning 1/3 of all
zeroes to ones, and so on).
Cryptographic weaknesses (which, of course, hashes like SHA-512 almost
certainly have and will ultimately be discovered) cannot be exploited
by anything but an adversary. Random corruption will not get smarter.
There's always a very slim chance that I will be proven wrong, but I
feel about as sure as anything in computer science that SHA-512 will
always be safe for this use. For that matter, MD5 will likely always
be safe for this use.
DS
-
Re: Silent data corruption despite TCP
On Apr 28, 12:48 pm, v...@calcite.rhyolite.com (Vernon Schryver)
wrote:
> Even that is gross optimism. All cryptographic hashes are merely
> hoped to not be secretly broken by too many adversaries.
That's true, but cryptographic weaknesses (such as the ability of an
adversary to infer something about the data from the hash or to create
two blocks that hash to the same value) have no effect on use as a
checksum.
> The nature
> of all current cryptographic hashes is that no one has proven anything
> useful about how well they work for simple error detection.
Right, but that's totally irrelevant. For example, which would you
prefer:
1) A hash that's been proven to catch 95% of single bit errors, or
2) A hash that hasn't been proven to catch anything, but for which there
is strong evidence that the probability of a single bit error passing
the hash is close to 1 in 2^128?
> I'd believe
> SHA-512 detects all single bit errors in all blocks of 512 bits, but
> I'd like to see a proof of all double bit errors in 512, all single
> bits in 1024 (or even 513) bits, not to mention blocks not so tiny that
> you would do better by transmitting second copies of the 64 bytes run
> through a bijection ("scrambler").
If you know for a fact that all you'll have are errors of this type,
then SHA-512 is a poor choice. But SHA-512 has a probability of not
detecting any given random error that's vanishingly small, close to
one in 2^512. This is not an accident, it's a consequence of its
design criteria (see my other post).
> Anyone who doesn't "know" a bunch
> of stuff that is false would choose CRC-512 instead of SHA-512 to detect
> natural errors. Unless you are battling adversaries who would use the
> obvious ways to outwit CRC-512 as a signature, you are better off with
> something other than a cryptographic hash.
If you know that short errors are more likely than long errors, then I
agree with you. If the corruption is expected to be random or larger,
there is no detection ability advantage of CRC-512 over SHA-512.
CRC-512 might be preferable because it's computationally cheaper, but
this is an absolutely suitable application for a cryptographic hash
function.
DS
-
Re: Silent data corruption despite TCP
On Apr 28, 11:43 am, David Schwartz wrote:
> On Apr 28, 8:46 am, Noob wrote:
>
> > Suppose I transfer a large file, say 20-50 GB, using TCP, over a noisy
> > wireless channel, and suppose the link layer does not compute any CRC.
>
> I think that's a completely unrealistic hypothetical. Typical
> TCP-over-wireless implementations have a 32-bit CRC at the wireless layer
> and a 16-bit checksum at the TCP layer. No sane person would implement a "noisy
> wireless channel" with a link layer that "does not compute any CRC".
> If you did, file transfer over TCP would be only one of your many
> problems.
While not wireless, SLIP used to suffer from just that problem - no
CRC over mediocre quality modem connections, and relied on the IP
checksums for all error detection. And yes, files were corrupted at a
not insignificant rate.
While I don't have any of the raw data handy, or in a publishable
form, so this should be considered anecdotal at best, back in the
early/mid-nineties we internally demonstrated that modem connections
with only 16 bit checksums (dial up BBS's were common at the time too
- so the 16-bit CRC version of ZMODEM was in the same mix as SLIP),
led to two or three percent of 1MB file transfers being corrupted. We
did the tests with quite real world conditions too – we had actual
(willing) customers who had downloaded files download a group of test
files, zip them up and send them back to us. We also found that
badly implemented restart schemes* caused several times more errors
than that. Between those two, we finally managed to nail a 100% firm
internal policy of always requiring patches and updates to be
distributed in some form of archive file with at least a 32 bit CRC.
And mind you, this is with most of the modems running some sort of
error correction protocol.
But in short, noisy links and 16 bit checksums do lead to unacceptable
real world performance.
*The most common error was that the restart scheme on the user's PC would
see that a shorter abc.exe file already existed in the target
directory, and would assume that this was really a restart of the
download of the newer (and larger) updated abc.exe.
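As an illustration of the "archive file with at least a 32 bit CRC" policy, here is a sketch using Python's zipfile module, whose testzip() call re-reads every member and checks its stored CRC-32. The archive is built in memory, and the payload, helper names, and simulated bit flip are all made up for the example.

```python
import io
import zipfile

def make_archive(name, payload):
    """Build an uncompressed zip in memory holding one file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as z:
        z.writestr(name, payload)
    return buf.getvalue()

def first_bad_member(archive_bytes):
    """Return the name of the first member whose CRC-32 fails, else None."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as z:
        return z.testzip()

payload = b"pretend this is the updated abc.exe " * 100
good = make_archive("abc.exe", payload)

# Simulate line noise: flip one bit somewhere inside the stored payload.
damaged = bytearray(good)
damaged[good.find(payload) + 10] ^= 0x01

print(first_bad_member(good))
print(first_bad_member(bytes(damaged)))
```

A check like this catches damage to the file contents after the fact; it would not by itself have prevented the restart mix-up described in the footnote, only exposed the resulting bad file.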
-
Re: Silent data corruption despite TCP
David Schwartz writes:
>On Apr 28, 9:34 am, v...@calcite.rhyolite.com (Vernon Schryver) wrote:
>> It is almost (but not quite) true that if you could say that
>> "Crypto-Hash CH() detects all N bit errors" then CH would be "broken"
>> on the grounds that you know it doesn't detect all N+1 bit errors,
>> and so some of those undetected N+1 bit changes could be used for evil.
>If that were true, the crypto hash would be broken. The whole point of
>a crypto hash is that even if you know such changes exist, they cannot
>be used for evil because they cannot be *found*. The possible
>advantage of a crypto hash over another hash would be that collisions
>cannot be found for a proper crypto hash. (Although in this case, it's
>not clear why that would matter. If you want to maliciously corrupt
>the data, you can just put in the correct hash anyway.)
Also, a CH probably does not detect all N bit errors either for any N. I.e.,
it is quite possible that there exist two plain texts which differ by a
single bit, but have the same hash. Those may be 10^100 bits long.
A crypto hash is as close as one can come to a random map from any text
into say 128 bits. In general any 1 bit change will produce 64 bits of
change in the hash, but since it is a random map, it may produce no change
for specific strings.
The important thing is that given the hash value, it is essentially
impossible to find any string which produces it.
>> Never mind that most people who use "broken" in that context are wrong,
>> as they blather authoritative-sounding nonsense about MD5 being
>> "broken." MD5 and some other cryptographic hashes are "broken" only
>> for some uses and not others.
>Right, but this is dangerously close to one of those uses. All you'd
>have to do is sign the hash, and you'd have a use case for which MD5
>is broken.
>> The big problem is that there are only vague
>> hopes that SHA-512 or any other hash function you might name is not
>> just as "breakable." That "hard to analyze" requirement on every
>> crypto-hash function is at least so far and perhaps forever a fundamental
>> weakness.
>Ideally, you adjust your use of a hash so that even if it is "broken"
>in the ways it's most likely to be broken in the future, that has no
>effect on your use. That requires a deep understanding of the
>strengths and weaknesses of cryptographic hashes.
>For example, it's quite likely that someone will find two chunks of
>data that hash to the same value long before they can find data of the
>same length that hashes to the same value as a given chunk.
>DS
-
Re: Silent data corruption despite TCP
Noob wrote:
> Even when the link layer does compute a CRC, it has been shown (*) that
> corrupted packets do reach the receiver. Therefore, I imagine it is
> possible for silent data corruption to occur?
>
> (*) http://citeseer.ist.psu.edu/stone00when.html
Thanks everyone for your comments.
For my own record, I'll add a few links that I find somewhat relevant.
http://www.ietf.org/mail-archive/web.../msg00890.html
http://kerneltrap.org/Linux/Data_Err..._Communication
-
Re: Silent data corruption despite TCP
Noob wrote:
> For my own record, I'll add a few links that I find somewhat relevant.
>
> http://www.ietf.org/mail-archive/web.../msg00890.html
> http://kerneltrap.org/Linux/Data_Err..._Communication
http://citeseer.ist.psu.edu/34744.html
Performance of Checksums and CRCs over Real Data (1998)
http://citeseer.ist.psu.edu/stone00when.html
When The CRC and TCP Checksum Disagree (2000)
http://www.pdl.cmu.edu/mailinglists/.../msg04095.html
TCP checksum escapes and iSCSI error recovery design