interfilesystem copies: large du diffs - Tools

  1. interfilesystem copies: large du diffs

    I recently rsync'd around 2.8TB between a RHE server (jfs fs) and a
    Netapps system. Did a 'du -sk' against each to verify the transfers:

    2894932960 sources total, KB
    2751664496 destination total, KB

    That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up
    nothing that was not originally transferred.

    I often note similar behaviour with smaller transfers between servers
    with similar OS/fs combos and have always seen it to some extent with
    transfers between systems of any type. It's just that the usual
    discrepancies in this case are magnified greatly by the sheer volume of
    data. Needless to say, 140GB going missing would be a bit of a problem
    and it's not much fun picking through 2.8TB for MIA data.

    Can anyone shed some light on why this happens?

    tia
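
    For reference, the gap between the two du totals above can be worked
    out with shell arithmetic. This is only a sketch: the function name
    du_gap is made up for illustration, and it assumes a shell with 64-bit
    integer arithmetic plus any POSIX awk.

```shell
# Print the difference between two "du -sk" totals (both in KB),
# in KB, in GiB, and as a percentage of the first (source) total.
# du_gap is a hypothetical helper name, not a standard tool.
du_gap() {
    diff_kb=$(( $1 - $2 ))
    awk -v d="$diff_kb" -v s="$1" \
        'BEGIN { printf "%d KB, %.1f GiB, %.2f%% of source\n", d, d / 1048576, 100 * d / s }'
}
```

    Called as du_gap 2894932960 2751664496 it prints
    "143268464 KB, 136.6 GiB, 4.95% of source", so the destination
    reports roughly 5% less than the source.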


  2. Re: interfilesystem copies: large du diffs

    orgone wrote:
    >
    > I recently rsync'd around 2.8TB between a RHE server (jfs fs) and a
    > Netapps system. Did a 'du -sk' against each to verify the transfers:
    >
    > 2894932960 sources total, KB
    > 2751664496 destination total, KB


    "df" reports the blocks actually allocated on the filesystem as a
    whole. "du" adds up per-file usage, and what it counts for each file
    depends on what the filesystem or server reports for it.

    > That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up
    > nothing that was not originally transferred.
    >
    > I often note similar behaviour with smaller transfers between servers
    > with similar OS/fs combos and have always seen it to some extent with
    > transfers between systems of any type. It's just that the usual
    > discrepancies in this case are magnified greatly by the sheer volume of
    > data. Needless to say, 140GB going missing would be a bit of a problem
    > and it's not much fun picking through 2.8TB for MIA data.
    >
    > Can anyone shed some light on why this happens?


    My best guess is that the NetApp handles sparsely allocated
    files differently, so that "du" sees the blocks actually
    allocated rather than the file size implied by the address of
    the last byte.
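
    One way to test the sparse-file theory is to ask du for both figures
    on each side and compare them. The sketch below assumes GNU du, since
    --apparent-size is a GNU coreutils extension; size_report is a
    hypothetical helper name.

```shell
# Print the apparent size (sum of file lengths) and the allocated size
# (blocks actually charged to the files) of a tree, both in KB.
# Assumes GNU du; --apparent-size is not available in every du.
size_report() {
    dir=$1
    apparent_kb=$(du -sk --apparent-size "$dir" | cut -f1)
    allocated_kb=$(du -sk "$dir" | cut -f1)
    echo "apparent_kb=$apparent_kb allocated_kb=$allocated_kb"
}
```

    If the apparent figures match on source and destination while the
    allocated figures do not, the data is probably all there and the gap
    comes from how each side allocates space. Note that rsync only
    recreates holes on the destination when run with -S (--sparse).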

    Alternate theory that is far less likely: On your source tree
    you have a history of making hundreds of thousands of files
    and then deleting nearly all of them, leaving a lot of very
    large directories. On your target tree the directories are
    much smaller.

    Yet another alternate theory: Smaller block/fragment/extent
    size on the target. So on the source any file has a fairly
    large minimum block count but on the target smaller files
    take fewer blocks. You would need very many small files to
    account for a 5% difference, but a few 100K files under 512
    bytes should cause this.
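
    To judge whether the small-file theory is plausible, it helps to
    count how many files fall under a given size. A minimal sketch,
    assuming a find that supports -print0 (GNU and BSD find both do);
    count_small_files is a hypothetical helper name and the 512-byte
    default simply mirrors the figure mentioned above.

```shell
# Count regular files strictly smaller than LIMIT bytes (default 512).
# NUL-separated output keeps the count right even for odd file names.
count_small_files() {
    dir=$1
    limit=${2:-512}
    find "$dir" -type f -size -"${limit}c" -print0 | tr -cd '\0' | wc -c | tr -d ' '
}
```

    Multiplying that count by the difference in minimum allocation size
    between the two filesystems gives a rough upper bound on how much of
    the gap small files could explain.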


  3. Re: interfilesystem copies: large du diffs

    On 24 Aug 2005 02:08:46 -0700, orgone said something similar to:
    : I recently rsync'd around 2.8TB between a RHE server (jfs fs) and a
    : Netapps system. Did a 'du -sk' against each to verify the transfers:
    :
    : 2894932960 sources total, KB
    : 2751664496 destination total, KB
    :
    : That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up
    : nothing that was not originally transferred.

    What are the native block sizes of the two filesystems? If you've got
    a large enough number of files and directories there, a smaller block size
    on the destination could account for the discrepancy in terms of less unused
    space at the end of the last block of each file.
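
    The block size each side reports can be queried from the shell. This
    sketch assumes GNU stat, where -f selects filesystem status and %S is
    the fundamental block size; fs_block_size is a hypothetical helper
    name. Over NFS the value comes from what the server advertises, so it
    may not match the filer's internal allocation unit.

```shell
# Print the fundamental block size, in bytes, of the filesystem
# that holds the given path. Assumes GNU stat (-f, -c, %S).
fs_block_size() {
    stat -f -c '%S' "$1"
}
```

    Running it against a path on the source and one on the mounted
    destination shows whether the two sides allocate in different units.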

    Another thing that I've seen cause discrepancies like this on occasion is
    when the source directories once had many more files in them than they
    currently do. Once more blocks have been allocated to a directory, they
    don't get deallocated when the number of files drops.
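
    Oversized directories of this kind can be looked for directly, since
    find's -size test applies to the directory entry itself. A minimal
    sketch; big_dirs is a hypothetical helper name and the 1024 KB
    default threshold is an arbitrary starting point.

```shell
# List directories whose own size (not their contents) exceeds MIN_KB.
# A directory that once held many files often keeps its large size.
big_dirs() {
    dir=$1
    min_kb=${2:-1024}
    find "$dir" -type d -size +"${min_kb}k" -exec ls -ld {} +
}
```

    Directories that show up on the source but are small on the
    destination account for space that du counts on one side only.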


  4. Re: interfilesystem copies: large du diffs

    orgone wrote:
    > I recently rsync'd around 2.8TB between a RHE server (jfs fs) and a
    > Netapps system. Did a 'du -sk' against each to verify the transfers:
    >
    > 2894932960 sources total, KB
    > 2751664496 destination total, KB
    >
    > That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up
    > nothing that was not originally transferred.
    >
    > I often note similar behaviour with smaller transfers between servers
    > with similar OS/fs combos and have always seen it to some extent with
    > transfers between systems of any type. It's just that the usual
    > discrepancies in this case are magnified greatly by the sheer volume of
    > data. Needless to say, 140GB going missing would be a bit of a problem
    > and it's not much fun picking through 2.8TB for MIA data.


    Rsync has a "-c" option that compares files by checksum instead of by
    size and timestamp; I imagine that would give me some reassurance that
    the transfer occurred correctly. There is also the "-v" verbose option,
    as you noted.
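
    A checksum pass can be done as a dry run, so nothing is copied and
    only differing files are listed. This is a sketch, not a tested
    procedure for this setup: rsync_verify is a hypothetical helper name,
    and both arguments are assumed to be directories reachable from the
    machine running it.

```shell
# List files whose contents differ between two trees without copying.
# -r recurse, -c compare by checksum, -n dry run, -i itemize differences.
rsync_verify() {
    src=$1
    dst=$2
    rsync -rcni "$src"/ "$dst"/
}
```

    With -c every file is read in full on both sides, so on 2.8TB this
    will be slow.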

    To be certain I'd consider checksumming all the files on each system
    (e.g. something like find mydirectory -type f -exec sum {} \; > sysname.sums),
    sort both lists, and use diff to compare the results. If really paranoid
    I'd use md5sum instead of sum. I imagine this will take considerable
    time on 2.8TB so I'd try it on small subsets first :-)
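
    The checksum comparison could be sketched as below. It assumes
    md5sum is available (GNU coreutils), and make_manifest is a
    hypothetical helper name. Running find from inside the tree keeps the
    paths relative, and sorting by path makes the two lists line up for
    diff regardless of the order each filesystem returns entries in.

```shell
# Print "checksum  ./relative/path" for every regular file under DIR,
# sorted by path so manifests from two systems can be diffed directly.
make_manifest() {
    dir=$1
    ( cd "$dir" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}
```

    For example, make_manifest /source/tree > src.sums on one system and
    make_manifest /dest/tree > dst.sums on the other (both paths are
    placeholders), then diff src.sums dst.sums.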
