File corruptions with rsync version 2.6.9 on 64-bit openSUSE 10.3 - Tools


  1. File corruptions with rsync version 2.6.9 on 64-bit openSUSE 10.3

    Hi,

    I'm part of the team that runs the Bioconductor project

    http://bioconductor.org/

    and we've used rsync successfully so far for a lot of different
    things, in particular for moving the hundreds of packages that we build
    and check every day through our build system pipe (which is made of
    several build nodes running different OSes, see our daily build report
    here: http://bioconductor.org/checkResults/2.2/bioc-LATEST/).

    At the very end of the build pipe, rsync is used again to sync our
    public package repository (http://bioconductor.org/packages/2.2/bioc/)
    with an internal repository that is behind a firewall.

    Until recently, the internal repository was hosted on lamb1, a 64-bit
    SUSE LINUX 10.1 system:

    biocadmin@lamb1:~> rsync --version
    rsync version 2.6.6 protocol version 29
    Copyright (C) 1996-2005 by Andrew Tridgell and others

    Capabilities: 64-bit files, socketpairs, hard links, ACLs, symlinks, batchfiles,
    inplace, IPv6, 64-bit system inums, 64-bit internal inums, SLP

    rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
    are welcome to redistribute it under certain conditions. See the GNU
    General Public Licence for details.

    and AFAICT we've never observed any file corruption when rsync'ing
    between lamb1 and bioconductor.org. rsync was run every day on lamb1
    with the following options:

    rsync --delete -ave ssh SRC USER@HOSTEST

    Recently we've set up a new machine, wilson1, for hosting the internal
    package repository. wilson1 is a 64-bit openSUSE 10.3 system:

    biocadmin@wilson1:~> rsync --version
    rsync version 2.6.9 protocol version 29
    Copyright (C) 1996-2006 by Andrew Tridgell, Wayne Davison, and others.

    Capabilities: 64-bit files, socketpairs, hard links, symlinks,
    batchfiles, inplace, IPv6, ACLs, xattrs, SLP
    64-bit system inums, 64-bit internal inums

    rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
    are welcome to redistribute it under certain conditions. See the GNU
    General Public Licence for details.

    Now when we use rsync on wilson1 to synchronize the internal and public
    package repositories, we end up with corrupted files on the public
    repository (their md5sums differ between the local and remote files, but
    their sizes and timestamps are exactly the same). On wilson1, we use rsync
    exactly the same way as on lamb1, i.e. we do:

    rsync --delete -ave ssh SRC USER@HOSTEST

    The destination machine (bioconductor.org) is a 64-bit SUSE LINUX
    Enterprise Server 9 system. It has not changed during our switch
    from lamb1 to wilson1 for the source machine.

    It seems that the frequency of the corruptions is low but since
    the total volume of packages that we produce is high (> 30G,
    a few packages are several hundred MB), we end up having a few
    corrupted packages on bioconductor.org (9 in total today, most of
    them are among the biggest packages we produce i.e. they are >
    700MB).

    Of course, if I rerun

    rsync --delete -ave ssh SRC USER@HOSTEST

    again, the corrupted files are not detected so nothing happens.

    But strangely enough, if I delete the corrupted file by hand and
    rerun the above command, then this time the transfer seems to be
    OK. But maybe that's just luck (given that the corruptions seem to
    happen randomly). I've only done this manual deletion once and for
    1 file only because I want to give some time to our IT guys to look
    into this problem.
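
    Why the rerun sees nothing: by default rsync decides what to transfer
    with a "quick check" on size and modification time, so a file whose
    content differs but whose size and timestamp match is skipped (the
    --checksum option compares checksums instead). A minimal sketch of the
    two kinds of comparison, assuming Python is available; the function
    names md5_of, quick_check_same and content_same are made up for this
    illustration and are not part of rsync:

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def quick_check_same(path_a, path_b):
    """Hypothetical stand-in for a size + mtime comparison (no content read)."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_size == stat_b.st_size
            and int(stat_a.st_mtime) == int(stat_b.st_mtime))


def content_same(path_a, path_b):
    """Compare two files by MD5 digest, which does read every byte."""
    return md5_of(path_a) == md5_of(path_b)
```

    For a corrupted copy like the ones described above (same size, same
    timestamp, different md5sum), quick_check_same would return True while
    content_same would return False. This only compares two local paths;
    for the remote side the digests would have to be computed there.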

    Any idea what could be going wrong? What kind of extra information
    would you need?

    Thanks in advance for your help,

    H.
    --
    Please use reply-all for most replies to avoid omitting the mailing list.
    To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
    Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


  2. Re: File corruptions with rsync version 2.6.9 on 64-bit openSUSE 10.3

    Hi,

    An update on this: we might have a hardware problem.

    After moving our internal package repository to another machine with the
    same OS, same patch level, same rsync version and same hardware, we don't
    observe file corruptions anymore.
    We've tried different versions of rsync on the broken machine (2.6.9, 3.0.2
    and 2.6.6) with different options (--whole-file and --ignore-times) and we
    always ended up with a few corrupted files on the remote machine (the
    destination).
    Then we discovered that running md5sum on the local files at different times
    produced different results, even though no process/job was supposed to
    modify those files in the meantime (and the timestamps confirmed this).
    Some files would have an abnormal md5sum and look corrupted for a few minutes
    and then be back to their normal md5sum and look fine again.
    All the files are on a hardware RAID10 made of 4 disks of 230GB each and our
    IT guys are starting to suspect it.
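
    The md5sum observation above can be turned into a repeatable check. A
    minimal sketch, assuming Python is available on the machine; md5_of and
    distinct_digests are hypothetical helper names, and the round count and
    delay are arbitrary defaults:

```python
import hashlib
import time


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, rounds=5, delay_seconds=0.0):
    """Hash the same file several times and return the set of digests seen.

    For a file that nothing modifies, more than one digest in the result
    means the reads themselves are returning different bytes.
    """
    seen = set()
    for round_index in range(rounds):
        seen.add(md5_of(path))
        if round_index < rounds - 1:
            time.sleep(delay_seconds)
    return seen
```

    A result with a single digest is what a healthy machine should give.
    Note that reads in quick succession may be served from the OS page
    cache rather than the disks, so a delay of a few minutes between
    rounds is probably closer to what we did by hand.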

    I'll post here again when we know more...

    Cheers,
    H.


    Kyle Lanclos wrote:
    > Are you experiencing hardware problems? While disk problems usually show
    > up in a log somewhere, something like memory or CPU problems usually do
    > not on Linux systems.
    >
    > I've had at least two systems manifest CPU problems in the form of random
    > I/O corruption.
    >
    > --Kyle



