CHECKSUM oddity? - VMS



Thread: CHECKSUM oddity?

  1. CHECKSUM oddity?

    Create a backup saveset. Create an "identical" saveset (i.e. issue the
    command again). Get the checksums of the savesets. They are the same.
    $ DIFFERENCES and $ BACKUP/COMPARE say they are different.

    There is an obvious difference: the Date: field shown by BACKUP/LIST.
    However, this is a date internal to the backup saveset, not the creation
    or modification date of the saveset. Thus, I don't think it is a date in
    the file header (which CHECKSUM should and does ignore, since a primary
    use is to make sure that files were transferred properly).

    Who can explain this puzzle?

    In other words, here is a case where CHECKSUM reports no difference, but
    a) there is an obvious difference which b) other utilities report. (A
    different issue, of course, is that BACKUP/COMPARE might react to stuff
    after the EOF. But that's not the issue here.)


  2. Re: CHECKSUM oddity?

    In article , helbig@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply) writes:
    > Create a backup saveset. Create an "identical" saveset (i.e. issue the
    > command again). Get the checksums of the savesets. They are the same.
    > $ DIFFERENCES and $ BACKUP/COMPARE say they are different.
    >
    > There is an obvious difference: the Date: field shown by BACKUP/LIST.
    > However, this is a date internal to the backup saveset, not the creation
    > or modification date of the saveset. Thus, I don't think it is a date in
    > the file header (which CHECKSUM should and does ignore, since a primary
    > use is to make sure that files were transferred properly).
    >
    > Who can explain this puzzle?


    Repeat your test with $ BACKUP /GROUP=0

    VMS CHECKSUM is not a cryptographically secure hash function. In
    particular, you're pretty well guaranteed to get a collision when
    the application creating your data file is actively trying to
    make the XOR of a batch of records equal to zero.

  3. Re: CHECKSUM oddity?

    In article ,
    briggs@encompasserve.org writes:

    > In article , helbig@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply) writes:
    > > Create a backup saveset. Create an "identical" saveset (i.e. issue the
    > > command again). Get the checksums of the savesets. They are the same.
    > > $ DIFFERENCES and $ BACKUP/COMPARE say they are different.
    > >
    > > There is an obvious difference: the Date: field shown by BACKUP/LIST.
    > > However, this is a date internal to the backup saveset, not the creation
    > > or modification date of the saveset. Thus, I don't think it is a date in
    > > the file header (which CHECKSUM should and does ignore, since a primary
    > > use is to make sure that files were transferred properly).
    > >
    > > Who can explain this puzzle?

    >
    > Repeat your test with $ BACKUP /GROUP=0


    That was it!


  4. Re: CHECKSUM oddity?

    On Oct 15, 3:18 pm, hel...@astro.multiCLOTHESvax.de (Phillip Helbig---
    remove CLOTHES to reply) wrote:
    > In article ,
    >
    > bri...@encompasserve.org writes:
    > > In article , hel...@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply) writes:
    > > > Create a backup saveset. Create an "identical" saveset (i.e. issue the
    > > > command again). Get the checksums of the savesets. They are the same.
    > > > $ DIFFERENCES and $ BACKUP/COMPARE say they are different.

    :
    > > Repeat your test with $ BACKUP /GROUP=0

    >
    > That was it!


    Cute. CHECKSUM uses a simple XOR of the longwords in the file's data
    records, adding in any odd remaining bytes if present. The BACKUP
    redundancy group uses that same XOR formula, resulting in the same
    checksum for files that are different (albeit only in the header area).
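    A minimal Python sketch of the mechanism, under loudly stated
    assumptions: longwords are little-endian, leftover bytes in a record
    are zero-padded into one last longword, blocks are 32 bytes, and a
    redundancy group is modelled as data blocks followed by one block
    holding their byte-wise XOR. This is not BACKUP's real saveset format;
    it only shows why an XOR parity block makes an XOR checksum blind to
    the data it protects, and why /GROUP=0 exposes the difference.

```python
import struct

BLOCK = 32  # hypothetical block size for the toy saveset


def xor_checksum(records):
    """XOR of all little-endian longwords in each record.

    Assumption: leftover bytes at the end of a record (length not a multiple
    of 4) are zero-padded into one final longword. The real CHECKSUM may fold
    them in differently.
    """
    acc = 0
    for rec in records:
        padded = rec.ljust((len(rec) + 3) // 4 * 4, b"\0")
        for (lw,) in struct.iter_unpack("<I", padded):
            acc ^= lw
    return acc


def xor_block(blocks):
    """Byte-wise XOR of equal-length blocks (a parity/redundancy block)."""
    out = bytearray(BLOCK)
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def saveset(blocks, group):
    """Toy layout: after every `group` data blocks, append their XOR block.

    group=0 means no redundancy blocks at all (named after /GROUP=0, but this
    is not BACKUP's real on-tape format).
    """
    blocks = [b.ljust(BLOCK, b" ") for b in blocks]
    if group == 0:
        return blocks
    out = []
    for i in range(0, len(blocks), group):
        chunk = blocks[i:i + group]
        out.extend(chunk)
        out.append(xor_block(chunk))
    return out


# Two runs of the "same" BACKUP command: only a header date differs.
run1 = [b"DATE=15-OCT-2007 15:18", b"file data, part 1", b"file data, part 2"]
run2 = [b"DATE=15-OCT-2007 15:19", b"file data, part 1", b"file data, part 2"]

for group in (3, 0):
    c1 = xor_checksum(saveset(run1, group))
    c2 = xor_checksum(saveset(run2, group))
    print(f"group={group}: {c1:08X} {c2:08X} equal={c1 == c2}")
# With group=3 both checksums are 00000000; with group=0 they differ.
```

    Because each parity block is the XOR of its group, every group XORs
    to zero, so in this toy model any saveset with complete groups gets
    the same checksum regardless of its contents.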

    A similar confusion can arise from the fact that CHECKSUM (as well as
    DIFFERENCES) uses record I/O to get the data. So you can CONVERT an
    indexed file to sequential, get no differences reported and get an
    equal checksum, even though you and I know that the on-disk bits for
    those files are rather different. BACKUP/COMPARE finds those
    differences (very verbosely so).
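    To illustrate the record I/O point with a toy model (not the real RMS
    indexed or sequential formats): the Python sketch below stores the same
    records in two hypothetical byte layouts and checksums the records
    rather than the bytes. Both layouts decode to identical records, so a
    record-oriented XOR checksum cannot tell them apart even though the raw
    bytes differ. The XOR details (little-endian longwords, zero-padded
    tail) are assumptions.

```python
import struct


def record_xor(records):
    """XOR of little-endian longwords per record, tail zero-padded (assumed)."""
    acc = 0
    for r in records:
        padded = r.ljust((len(r) + 3) // 4 * 4, b"\0")
        for (lw,) in struct.iter_unpack("<I", padded):
            acc ^= lw
    return acc


def encode_counted(records):
    """Hypothetical layout A: 2-byte length word, data, pad byte to even length."""
    out = bytearray()
    for r in records:
        out += struct.pack("<H", len(r)) + r
        if len(r) % 2:
            out += b"\0"
    return bytes(out)


def decode_counted(raw):
    records, i = [], 0
    while i < len(raw):
        (n,) = struct.unpack_from("<H", raw, i)
        records.append(raw[i + 2:i + 2 + n])
        i += 2 + n + n % 2
    return records


def encode_lf(records):
    """Hypothetical layout B: each record terminated by a line feed."""
    return b"".join(r + b"\n" for r in records)


def decode_lf(raw):
    return raw.split(b"\n")[:-1]  # assumes no record contains a line feed


recs = [b"ALPHA", b"BRAVO-2", b"CHARLIE"]
raw_a, raw_b = encode_counted(recs), encode_lf(recs)

print(raw_a == raw_b)  # False: the bytes on "disk" differ
print(record_xor(decode_counted(raw_a)) == record_xor(decode_lf(raw_b)))  # True
```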

    Cheers,
    Hein.




  5. Re: CHECKSUM oddity?

    In article , briggs@encompasserve.org writes:
    > In article , helbig@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply) writes:
    >> Create a backup saveset. Create an "identical" saveset (i.e. issue the
    >> command again). Get the checksums of the savesets. They are the same.
    >> $ DIFFERENCES and $ BACKUP/COMPARE say they are different.
    >>
    >> There is an obvious difference: the Date: field shown by BACKUP/LIST.
    >> However, this is a date internal to the backup saveset, not the creation
    >> or modification date of the saveset. Thus, I don't think it is a date in
    >> the file header (which CHECKSUM should and does ignore, since a primary
    >> use is to make sure that files were transferred properly).
    >>
    >> Who can explain this puzzle?

    >
    > Repeat your test with $ BACKUP /GROUP=0
    >
    > VMS CHECKSUM is not a cryptographically secure hash function. In
    > particular, you're pretty well guaranteed to get a collision when
    > the application creating your data file is actively trying to
    > make the XOR of a batch of records equal to zero.


    This can be true of CRCs as well. The CRC16 calculation on 9-track
    tapes was _appended_ to the data as it was written to tape. The
    result was that when the CRC algorithm was applied to the full string
    of bits read from the tape later on, the outcome was zero.

    [Simple enough to explain if you understand that while CRCs are thought
    of as remainders from polynomial divisions with coefficients being modulo
    2 integers, the reality is that they are essentially ordinary binary
    division simplified by discarding carries/borrows. Since such a
    polynomial modulo two is its own negative (-RMDR(X)=RMDR(X)), we are
    appending the negative of the remainder to the original input, and if
    the division is redone, the remainder from the initial portion cancels
    the RMDR(X) = -RMDR(X) we appended. [To make this all work, I think
    the original calculation needed to have 16 zero bits appended during
    the division phase]]
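    The argument can be checked numerically. The Python sketch below uses
    the CRC-CCITT polynomial (0x1021) purely as an example, with a
    zero-initialized register and no final XOR; it is not a model of any
    particular tape drive's CRC. It appends the remainder to the data and
    re-runs the same calculation over data plus CRC.

```python
def crc16(bits, poly=0x1021, init=0):
    """MSB-first shift-register CRC-16.

    `poly` is G(x) without its x^16 term. 0x1021 (CRC-CCITT) is only an
    example; it is not claimed to be the 9-track polynomial. Feeding each
    input bit into the top of the register computes the remainder of
    M(x) * x^16 mod G(x), i.e. the division with 16 zero bits appended.
    """
    reg = init
    for b in bits:
        feedback = (reg >> 15) ^ b
        reg = (reg << 1) & 0xFFFF
        if feedback:
            reg ^= poly
    return reg


def to_bits(data):
    """Bytes -> list of bits, most significant bit of each byte first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


data = to_bits(b"block written to tape")
crc = crc16(data)
on_tape = data + [(crc >> (15 - i)) & 1 for i in range(16)]  # append CRC, MSB first

print(crc16(on_tape))  # 0: the appended remainder cancels on re-division

damaged = on_tape.copy()
damaged[5] ^= 1        # flip one data bit
print(crc16(damaged) != 0)  # True: a single-bit error leaves a nonzero remainder
```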

  6. Re: CHECKSUM oddity?

    On Oct 15, 3:58 pm, Hein RMS van den Heuvel
    wrote:
    :
    > Cute. Checksum uses a simple XOR on the longword in the data records
    > for the file, adding in any odd remaining bytes if present.
    > The backup redundancy group used that same formula resulting in the
    > same output for files being different (albeit in the header area).
    >
    > A similar confusion can arise from the fact that CHECKSUM (as well as
    > DIFFERENCE) use RECORD IO to get the data. So you can CONVERT an
    > indexed file to sequential and get no differences reported and get an
    > equal checksum where you and I know that the on disk bits for those
    > files are rather different.
    > Backup/compare finds those differences (very verbosely so)


    BTW, OpenVMS 8.3 CHECKSUM addresses all of the above.
    See the help text below.
    (Hartmut, thanks for pointing that out!)

    Cheers,
    Hein.

    CHECKSUM
    /ALGORITHM
    /ALGORITHM=option
    /ALGORITHM=XOR (default)

    Selects the algorithm used for file checksums. The default is the
    XOR algorithm for data within records, as used by the previous
    Checksum utilities on OpenVMS Alpha and VAX systems. Options
    include:

    o CRC - A CRC-32 algorithm for all bytes within the file
    (possible record structures are ignored); this algorithm is
    also known as AUTODIN II, Ethernet, or FDDI CRC.

    o MD5 - The MD5 digest, as published by Ronald L. Rivest
    (RFC 1321), for all bytes within the file (possible record
    structures are ignored).

    o XOR - An XOR algorithm for all data, according to the record
    structure of the file.
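    The practical difference between the default and the new options can
    be sketched with Python's standard library: `zlib.crc32` implements the
    same CRC-32 polynomial the help text calls AUTODIN II/Ethernet, and
    `hashlib.md5` implements RFC 1321. Whether CHECKSUM /ALGORITHM=CRC uses
    exactly zlib's initial value, final XOR and output format is an
    assumption, and `xor_longwords` is a simplified, hypothetical model of
    the XOR option for a single record.

```python
import hashlib
import struct
import zlib


def xor_longwords(data):
    """XOR of little-endian longwords, tail zero-padded (an assumed model of
    the XOR option for one record; not the VMS code)."""
    data = data.ljust((len(data) + 3) // 4 * 4, b"\0")
    acc = 0
    for (lw,) in struct.iter_unpack("<I", data):
        acc ^= lw
    return acc


a = b"ABCDEFGH"
b = b"ABCEEFGI"  # bytes 3 and 7 changed by the same bit, one longword apart

print(xor_longwords(a) == xor_longwords(b))                       # True
print(zlib.crc32(a) == zlib.crc32(b))                             # False
print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())   # False
```

    The two changes cancel in the XOR, while CRC-32 and MD5, which cover
    every byte in sequence, both see them.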

    Hein.


  7. Re: CHECKSUM oddity?

    In article <1192542873.238000.42110@e34g2000pro.googlegroups.com>, Hein RMS van den Heuvel writes:
    >CHECKSUM
    > /ALGORITHM
    > /ALGORITHM=option
    > /ALGORITHM=XOR (default)
    >
    > Selects the algorithm used for file checksums. The default is the
    > XOR algorithm for data within records, as used by the previous
    > Checksum utilities on OpenVMS Alpha and VAX systems. Options
    > include:
    >
    > o CRC - A CRC-32 algorithm for all bytes within the file
    > (possible record structures are ignored); this algorithm is
    > also known as AUTODIN II, Ethernet, or FDDI CRC.
    >
    > o MD5 - The MD5 digest, as published by Ronald L. Rivest
    > (RFC 1321), for all bytes within the file (possible record
    > structures are ignored).
    >
    > o XOR - An XOR algorithm for all data, according to the record
    > structure of the file.


    When is SHA1 expected?

    --
    Peter "EPLAN" LANGSTOEGER
    Network and OpenVMS system specialist
    E-mail peter@langstoeger.at
    A-1030 VIENNA AUSTRIA I'm not a pessimist, I'm a realist

  8. Re: CHECKSUM oddity?

    George Cornelius wrote:
    (snip)

    > This can be true of CRC's as well. The CRC16 calculation on 9-track
    > tapes was _appended_ to the data as it was written to tape. The
    > result was that when the CRC algorithm was applied to the full string
    > of bits read from the tape later on, the outcome was zero.


    > [Simple enough to explain if you understand that while CRC's are thought
    > of as remainders from polynomial divisions with coefficients being modulo
    > 2 integers, the reality is that they are essentially ordinary binary
    > division simplified by discarding carries/borrows. Since such a
    > polynomial modulo two is its own negative (-RMDR(X)=RMDR(X)), we are
    > appending the negative of the remainder to the original input, and if
    > the division is redone, the remainder from the initial portion cancels
    > the RMDR(X) = -RMDR(X) we appended. [To make this all work, I think
    > the original calculation needed to have 16 zero bits appended during
    > the division phase]]


    I have known that this works, but never saw the detailed description
    of why it works. The common serial implementation using an LFSR
    (linear feedback shift register) makes it easy, and the required
    hardware very simple. As to your last statement, appending
    zero bits is done by feeding zeros into the shift register and
    sending as data what comes out the other end.

    In some cases the shift register is initialized with ones, otherwise
    initial zero bits in the input are not checked.
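    Both remarks can be illustrated with a small Python sketch, again using
    the CRC-CCITT polynomial purely as an example. The register form, in
    which each input bit enters at the top, gives the same remainder as
    schoolbook mod-2 division with 16 zero bits appended, so no explicit
    augmentation is needed. With the register preset to zero, leading zero
    bits leave it unchanged; presetting it to ones makes them count.

```python
POLY = 0x1021             # example CRC-16 polynomial (CRC-CCITT), an assumption
FULL = (1 << 16) | POLY   # G(x) including its x^16 term


def lfsr(bits, init=0):
    """Serial shift-register CRC: each input bit is XORed with the bit
    leaving the top of the register, and that feedback selects the taps."""
    reg = init
    for b in bits:
        fb = (reg >> 15) ^ b
        reg = (reg << 1) & 0xFFFF
        if fb:
            reg ^= POLY
    return reg


def long_division(bits):
    """Schoolbook mod-2 division of M(x) * x^16 by G(x): append 16 zero
    bits, then XOR G in wherever the leading bit is 1."""
    work = list(bits) + [0] * 16
    g = [(FULL >> (16 - j)) & 1 for j in range(17)]
    for i in range(len(bits)):
        if work[i]:
            for j in range(17):
                work[i + j] ^= g[j]
    rem = 0
    for b in work[-16:]:
        rem = (rem << 1) | b
    return rem


msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1]
print(lfsr(msg) == long_division(msg))      # True: same remainder

zeros = [0] * 8
print(lfsr(zeros + msg) == lfsr(msg))        # True: leading zeros go unnoticed
print(lfsr(zeros + msg, init=0xFFFF) == lfsr(msg, init=0xFFFF))  # False
```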

    Another interesting case is the ability to read tapes backwards.
    While many newer drives can't do that, most of the older ones did.
    There are sorting algorithms specifically optimized for tape drives
    that can read backwards. (The data appears in the buffer in the
    right order, filling from the end toward the start.) It must
    also be possible to verify the CRC backwards.

    -- glen


  9. Re: CHECKSUM oddity?

    On 10/16/07 13:35, Peter 'EPLAN' LANGSTOeGER wrote:
    > In article , Hein RMS van den Heuvel writes:
    >> CHECKSUM
    >> /ALGORITHM
    >> /ALGORITHM=option
    >> /ALGORITHM=XOR (default)
    >>
    >> Selects the algorithm used for file checksums. The default is the
    >> XOR algorithm for data within records, as used by the previous
    >> Checksum utilities on OpenVMS Alpha and VAX systems. Options
    >> include:
    >>
    >> o CRC - A CRC-32 algorithm for all bytes within the file
    >> (possible record structures are ignored); this algorithm is
    >> also known as AUTODIN II, Ethernet, or FDDI CRC.
    >>
    >> o MD5 - The MD5 digest, as published by Ronald L. Rivest
    >> (RFC 1321), for all bytes within the file (possible record
    >> structures are ignored).
    >>
    >> o XOR - An XOR algorithm for all data, according to the record
    >> structure of the file.

    >
    > When is SHA1 expected?


    Snicker.

    VMS ENCRYPT only got AES approx 2 years ago. CHECKSUM will get SHA1
    *maybe* 5 years after trivial message forgery software has been
    published.

    --
    Ron Johnson, Jr.
    Jefferson LA USA

    Give a man a fish, and he eats for a day.
    Hit him with a fish, and he goes away for good!

  10. Re: CHECKSUM oddity?

    In article , glen herrmannsfeldt writes:
    > I have known that this works, but never saw the detailed description
    > of why it works. The common serial implementation using a LFSR
    > (linear feedback shift register) makes it easy, and the required
    > hardware very simple. As to your last statement, appending
    > zero bits is done by feeding zeros into the shift register and
    > sending as data what comes out the other end.
    >
    > In some cases the shift register is initialized with ones, otherwise
    > initial zero bits in the input are not checked.
    >
    > Another interesting case is the ability to read tapes backwards.
    > While many newer drives can't do that, most of the older ones did.


    IBM certainly did it. I used the 2400 series drives, which had that
    capability, at least on mainframe channels.

    > There are sorting algorithms specifically optimized for tape drives
    > that can read backwards. (The data appears in the buffer in the
    > right order, filling from the end toward the start.)


    The SynchSort algorithm was known for being optimized to make use
    of both directions of tape motion, and is notable for being the
    first software to be patented.

    [It was unclear at the time whether software should be patentable,
    given that algorithms were considered as being, roughly, mathematical
    equations, which were well established as being non-patentable. The
    key distinction with the sort algorithm was that the patent was granted
    for a _process_, one that clearly involved mechanical motion.]

    > It must
    > also be possible to verify the CRC backwards.


    From a polynomial division point of view it seems a bit tricky but I
    guess I can see how it works to run it backwards. A shift register
    implementation, though, is so simple that operating backwards is easily
    imagined. I suppose you feed the bits into the register from the other
    end. Could actually be a CRC as well, using a bit-order-reversed
    version of the original polynomial.
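    A serial Python sketch supports that guess: if the data plus appended
    CRC is a multiple of G(x), the same bits taken in reverse order form a
    multiple of the reciprocal (bit-order-reversed) polynomial, so a CRC
    with that polynomial also checks to zero when run backwards.
    Assumptions: CRC-CCITT as the example polynomial, a zero-initialized
    register with no final XOR, and the block treated as one serial bit
    stream. A real 9-track drive reads whole frames, so actual hardware
    would be organized differently.

```python
def crc16(bits, poly, init=0):
    """MSB-first shift-register CRC-16; `poly` omits the x^16 term."""
    reg = init
    for b in bits:
        fb = (reg >> 15) ^ b
        reg = (reg << 1) & 0xFFFF
        if fb:
            reg ^= poly
    return reg


def reciprocal(poly, width=16):
    """Bit-order-reversed polynomial x^width * G(1/x), without its top term."""
    full = (1 << width) | poly
    rev = int(f"{full:0{width + 1}b}"[::-1], 2)
    return rev & ((1 << width) - 1)


def to_bits(data):
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


POLY = 0x1021  # CRC-CCITT, used only as an example polynomial
data = to_bits(b"VOL1 label")
crc = crc16(data, POLY)
codeword = data + [(crc >> (15 - i)) & 1 for i in range(16)]

print(crc16(codeword, POLY))                          # 0 reading forwards
print(crc16(codeword[::-1], reciprocal(POLY)))        # 0 reading backwards
print(hex(reciprocal(POLY)))                          # 0x811
```

    The zero initial register matters here: a preset register acts on the
    start of the forward stream, which becomes the end of the backward one.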

    --
    George Cornelius cornelius(at)eisner.decus.org
    cornelius(at)mayo.edu

  11. Re: CHECKSUM oddity?

    In article , cornelius@encompasserve.org (George Cornelius) writes:
    > In article , glen herrmannsfeldt writes:
    >> Another interesting case is the ability to read tapes backwards.
    >> While many newer drives can't do that, most of the older ones did.

    >
    > IBM certainly did it. I used the 2400 series drives, which had that
    > capability, at least on mainframe channels.
    >


    As of the December 2000 doc CD, the TK50 was the only supported tape
    drive that didn't support reading in reverse. I suspect that's changed,
    but when I tried to bring up the latest I/O User's Guide I didn't wait
    to see how long the doc site would take to respond to the PDF file
    request.

    I wouldn't expect 8mm to read in reverse, either, but those never made
    it to full support.


  12. Re: CHECKSUM oddity?

    In article , koehler@eisner.nospam.encompasserve.org (Bob Koehler) writes:
    > In article , cornelius@encompasserve.org (George Cornelius) writes:
    >> In article , glen herrmannsfeldt writes:
    >>> Another interesting case is the ability to read tapes backwards.
    >>> While many newer drives can't do that, most of the older ones did.

    >>
    >> IBM certainly did it. I used the 2400 series drives, which had that
    >> capability, at least on mainframe channels.
    >>

    >
    > As of the December 2000 doc CD, only TK50 was a supported tape that
    > didn't support reading in reverse. I suspect that's changed, but when
    > I tried to bring up the latest I/O User's Guide I didn't wait
    > to see how long the doc site would take to respond to the PDF file
    > request.


    Actually, I was thinking pre-history - about 10 years before VMS.

    And, yes, the VMS I/O User's Guide seems to list only TK50s, of the various
    obsolete drives it claims are supported, as being unable to read backwards. I
    just spit an IO$_READPBLK!IO$M_REVERSE at a STK 9840 that had advanced one record
    from load point and it happily fetched the VOL1 label into my input buffer, so
    VMS read reverse support is alive and well.

    --
    George Cornelius cornelius(at)eisner.decus.org
    cornelius(at)mayo.edu


    >
    > I wouldn't expect 8mm to read in reverse, either, but those never made
    > it to full support.
    >


  13. Re: CHECKSUM oddity?

    Read reverse support is done in MKdriver by skipping backwards, reading
    forwards and skipping backwards again. Very slow and inefficient but
    it works.
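    The emulation described above can be modelled with a toy tape. The
    class below is hypothetical, not MKdriver code; it only counts tape
    operations to show why each emulated reverse read costs three motions
    (skip back, read forward, skip back). The usage mirrors George's test:
    position one record past load point, then read the VOL1 label in
    reverse.

```python
class SimulatedTape:
    """Toy tape: a list of records and a head position between records."""

    def __init__(self, records):
        self.records = list(records)
        self.pos = 0       # number of records before the head
        self.motions = 0   # tape operations issued so far

    def skip(self, n):
        """Skip n records (negative = backwards), stopping at either end."""
        self.pos = max(0, min(len(self.records), self.pos + n))
        self.motions += 1

    def read_forward(self):
        rec = self.records[self.pos]
        self.pos += 1
        self.motions += 1
        return rec

    def read_reverse_emulated(self):
        """Emulated read-reverse: back up one record, read it, back up again."""
        self.skip(-1)
        rec = self.read_forward()
        self.skip(-1)
        return rec


tape = SimulatedTape([b"VOL1", b"HDR1", b"HDR2"])
tape.skip(1)                          # one record past load point
print(tape.read_reverse_emulated())   # b'VOL1'
print(tape.pos, tape.motions)         # 0 4
```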

    Jur.

    George Cornelius wrote:
    > In article , koehler@eisner.nospam.encompasserve.org (Bob Koehler) writes:
    >> In article , cornelius@encompasserve.org (George Cornelius) writes:
    >>> In article , glen herrmannsfeldt writes:
    >>>> Another interesting case is the ability to read tapes backwards.
    >>>> While many newer drives can't do that, most of the older ones did.
    >>> IBM certainly did it. I used the 2400 series drives, which had that
    >>> capability, at least on mainframe channels.
    >>>

    >> As of the December 2000 doc CD, only TK50 was a supported tape that
    >> didn't support reading in reverse. I suspect that's changed, but when
    >> I tried to bring up the latest I/O User's Guide I didn't wait
    >> to see how long the doc site would take to respond to the PDF file
    >> request.

    >
    > Actually, I was thinking pre-history - about 10 years before VMS.
    >
    > And, yes, the VMS I/O User's Guide seems to list only TK50s, of the various
    > obsolete drives it claims are supported, as being unable to read backwards. I
    > just spit an IO$_READPBLK!IO$M_REVERSE at a STK 9840 that had advanced one record
    > from load point and it happily fetched the VOL1 label into my input buffer, so
    > VMS read reverse support is alive and well.
    >
    > --
    > George Cornelius cornelius(at)eisner.decus.org
    > cornelius(at)mayo.edu
    >
    >
    >> I wouldn't expect 8mm to read in reverse, either, but those never made
    >> it to full support.
    >>


  14. Re: CHECKSUM oddity?

    In article <4717476e$0$243$e4fe514c@news.xs4all.nl>, Jur van der Burg <"vdburg at hotmail dot com"> writes:
    > Read reverse support is done in MKdriver by skipping backwards, reading
    > forwards and skipping backwards again. Very slow and inefficient but
    > it works.


    Ugh. I wondered about that when I tried the test, but had no way of
    looking inside the drive given it's in a storage library in the data
    center.

    Since I know of no reasonable way to attach a non-MKdriver tape (presumably
    the fibrechannel driver is a variant of the same code), I suppose that means
    VMS read reverse is no more.

    George

  15. Re: CHECKSUM oddity?

    George Cornelius wrote:

    (snip)

    >>Another interesting case is the ability to read tapes backwards.
    >>While many newer drives can't do that, most of the older ones did.


    > IBM certainly did it. I used the 2400 series drives, which had that
    > capability, at least on mainframe channels.


    Yes, I think all the IBM 9-track drives, as well as 18-track and
    more (3480, 3490) could.

    Helical scan drives, such as DDS and Exabyte, can't easily read
    backwards. DLT might be able to do it. I don't remember that
    QIC drives offered the ability, but it might have been physically
    possible to do it.

    -- glen


  16. Re: CHECKSUM oddity?

    On Nov 7, 3:14 am, glen herrmannsfeldt wrote:
    > George Cornelius wrote:
    >
    > (snip)
    >
    > >>Another interesting case is the ability to read tapes backwards.
    > >>While many newer drives can't do that, most of the older ones did.

    > > IBM certainly did it. I used the 2400 series drives, which had that
    > > capability, at least on mainframe channels.

    >
    > Yes, I think all the IBM 9-track drives, as well as 18-track and
    > more (3480, 3490) could.
    >
    > Helical scan drives, such as DDS and Exabyte, can't easily read
    > backwards. DLT might be able to do it. I don't remember that
    > QIC drives offered the ability, but it might have been physically
    > possible to do it.
    >
    > -- glen


    DDS drives often can't read forwards, much less backwards! (Well, at
    least the ones I've been using! They do seem to be working fine with
    DDS-1 tapes but they are DDS-2 drives.)

    AEF

