I/O performance for backup - tar, rsync, cp on FreeBSD - BSD

  1. I/O performance for backup - tar, rsync, cp on FreeBSD

    I recently tried to back up one drive to another on a live production
    server with cp -Rp /path /path, and the copy took almost 11 hours.
    My guess is that this was due to the hundreds of thousands of tiny files
    in this copy, and that cp is a very inefficient way (due to I/O) to
    do this backup.

    I was wondering if rsync is just as efficient as tar in that it will
    not require the I/O tasks that are being done with a full cp.

    Thanks for your input!

  2. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    My personal experience is that rsync is even less efficient than cp when
    it has to perform full copies of files. rsync is efficient if you make
    delta copies (that's what it was developed for). But I have to add that
    I've observed this when using rsync to copy files over an Ethernet
    connection to an NFS share. In this setup cp is usually 2-3 times faster.

    Cheers,
    Bruno

  3. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    On Jul 9, 1:26 pm, Bruno Flueckiger wrote:
    > My personal experience is that rsync is even less efficient than cp when
    > it has to perform full copies of files.

    So what is the most efficient way to copy so many files from one drive
    to another? Is there a way to tar it without needing twice the
    space on the destination for an untar afterwards?

  4. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    On Wed, 9 Jul 2008 18:44:39 UTC, David wrote:

    > So what is the most efficient way to copy so many files from one drive
    > to another? Is there a way to tar it without needing twice the
    > space on the destination for an untar afterwards?


    The way I learned years ago was to use back-to-back tar, feeding the
    standard output of one into the standard input of the other. However,
    that still leaves the overhead of handling lots of small files.
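A minimal sketch of the back-to-back tar approach, assuming a POSIX shell and a tar that accepts these common flags (both FreeBSD's bsdtar and GNU tar do). The function name `tar_copy` and its two arguments are hypothetical placeholders, and the destination directory must already exist:

```shell
#!/bin/sh
# Hypothetical helper: copy a directory tree by piping one tar into another,
# so no intermediate archive file is ever written to disk.
#   $1 = source directory, $2 = existing destination directory
tar_copy() {
    src=$1
    dst=$2
    # First tar: archive everything under $src and write it to stdout ("-f -").
    # Second tar: read that archive from stdin and extract it under $dst,
    # preserving permissions (-p).
    tar -cf - -C "$src" . | tar -xpf - -C "$dst"
}
```

Usage: `tar_copy /path/to/source /path/to/destination`. Note that the pipeline's exit status is that of the second tar only, so check for error messages from the first one as well.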

    You might try pax or cpio and see if they work better; also make sure
    the file systems are tuned correctly. Setting noatime on the source
    mount might help, too.
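For the noatime suggestion, a sketch of what this could look like on FreeBSD. The device name /dev/ad0s1f and the /usr mount point are hypothetical examples; substitute the filesystem you are copying from. With noatime set, reading a file during the copy no longer triggers an access-time update on its inode, which saves one metadata write per file:

```shell
# Hypothetical example (FreeBSD syntax, run as root): update the options of an
# already-mounted source filesystem so reads stop updating access times.
mount -u -o noatime /usr

# To make it permanent, add noatime to the Options column in /etc/fstab:
# Device        Mountpoint  FStype  Options      Dump  Pass#
# /dev/ad0s1f   /usr        ufs     rw,noatime   2     2
```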
    --
    Bob Eager
    UNIX since v6..
    http://tinyurl.com/2xqr6h


  5. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    cd TO_TOP_LEVEL_DIRECTORY_YOU_WANT_TO_COPY
    find . -print | cpio -pdm /TARGET_DIRECTORY

    This is reasonably fast, but all the cpio versions seem to have
    a bug copying files larger than 3.9999GB in size. They will report
    that a large file changed size during the copy (which it didn't),
    and copy rubbish for that file. Still, if all the files being
    copied are under 4GB, cpio is pretty effective.

    The fastest method for making real offline backups that I have
    seen is still "dump" but that doesn't sound quite like what
    you are wanting.


    Frank Durda IV
    http://nemesis.lonestar.org
    Copyright 2008, ask before reprinting.


  6. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    Hello,

    Bob Eager wrote:
    > You might try pax or cpio and see if they work better; also make sure
    > the file systems are tuned correctly. Setting noatime on the source
    > mount might help, too.

    Since I grew up on System V before switching to FreeBSD
    for most servers, I used to use find | cpio -p for tasks like
    that. Stock System V tar could not copy device nodes and
    a lot of other stuff, while cpio could.

    Recently I tried dump | restore and, surprisingly, it seemed
    _a_lot_ faster. I did not back that impression up with benchmarks,
    but you might want to try it. Downside: it only works on entire
    filesystems.

    Kind regards,
    Patrick

  7. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    David writes:

    > So what is the most efficient way to copy so many files from one drive
    > to another? Is there a way to tar it without needing twice the
    > space on the destination for an untar afterwards?

    You can just put a pipeline in between the two tar invocations,
    avoiding intermediate storage.

    If the source is a whole filesystem (as I suspect it is from the
    original description), then dump/restore is the strategy of choice.
    You can also use a pipeline for this, feeding the standard output of
    dump directly into the standard input of restore.
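A sketch of the dump-to-restore pipeline on FreeBSD, assuming a UFS source filesystem and an empty, freshly newfs'd destination filesystem that is already mounted. The function name `dump_restore_copy` and its arguments are hypothetical. In dump, `-L` makes it work from a snapshot, which matters on a live server (UFS2 only), `-a` bypasses tape-length calculations, and `-f -` writes the dump to standard output; `restore -r -f -` rebuilds the tree from standard input into the current directory:

```shell
#!/bin/sh
# Hypothetical helper: copy a whole UFS filesystem by piping dump into restore.
#   $1 = source filesystem (mount point or device), e.g. /usr
#   $2 = mount point of an EMPTY, freshly newfs'd destination filesystem
# Must be run as root on FreeBSD.
dump_restore_copy() {
    src_fs=$1
    dst_mnt=$2
    # dump:    level 0 (-0), from a snapshot of the live filesystem (-L),
    #          no tape-size calculations (-a), write to stdout (-f -).
    # restore: rebuild (-r) from stdin (-f -) into the current directory,
    #          so the subshell cds into the destination first.
    dump -0 -L -a -f - "$src_fs" | (cd "$dst_mnt" && restore -r -f -)
}
```

Usage: `dump_restore_copy /usr /mnt/backup`. Afterwards restore leaves a `restoresymtable` file in the destination; it is only needed if you later apply incremental restores on top.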

    --
    Lowell Gilbert, embedded/networking software engineer
    http://be-well.ilk.org/~lowell/

  8. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    On Wed, 9 Jul 2008 18:58:29 GMT, Frank Durda IV wrote:
    >
    > This is reasonably fast, but all the cpio versions seem to have
    > a bug copying files larger than 3.9999GB in size. They will report
    > that a large file changed size during the copy (which it didn't),
    > and copy rubbish for that file. Still, if all the files being
    > copied are under 4GB, cpio is pretty effective.


    I have a FreeBSD 6.1 system where cpio has trouble with a 2.3GB file.
    /hjj

  9. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    Hans Jürgen Jakobsen wrote:
    : I have a 6.1 where cpio has trouble with a 2.3GB file.

    The problem in cpio may start at 2GB rather than 4GB.
    (It's been a while since I first figured out what was going on and
    I've slept since then.) The 2GB boundary makes more sense for
    something still using signed 32-bit values with fseek() and related calls.
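The 2GB figure matches the largest signed 32-bit value: 2^31 - 1 = 2,147,483,647 bytes, whereas an unsigned 32-bit value tops out at 2^32 - 1 = 4,294,967,295 bytes (just under 4GB). fseek() takes a `long` offset, which is 32 bits wide on 32-bit platforms. Before relying on cpio for a copy, a quick pre-flight check could list any regular files above the signed 32-bit limit. This is a sketch assuming a POSIX find; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical pre-flight check: list regular files too large for a signed
# 32-bit file offset, i.e. larger than 2^31 - 1 = 2147483647 bytes.
#   $1 = directory tree you plan to copy with cpio
list_files_over_2gb() {
    limit=$(( (1 << 31) - 1 ))   # 2147483647
    # The "c" suffix makes -size count bytes instead of 512-byte blocks,
    # and a leading "+" means strictly greater than the given size.
    find "$1" -type f -size +"${limit}"c
}
```

If this prints anything, those files are candidates for the corruption described above and are better copied with another tool.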

    With so many people dealing with DVD images and similarly large files
    these days, I'm amazed this bug has survived more than one release,
    but it was in at least FreeBSD 5.4, 5.5, 6.0, 6.1, 6.2 and 6.3 per my
    own testing, and I assume it is in 7 too. (I had to back off from
    running 7 because the fairly basic things I need to use Firefox don't
    work in 7 yet, and using that program is now about 33% of my daily
    tasks, so...)

    Maybe the GNU guys aren't that interested in cpio anymore, or just slow
    to deal with bugs. I've never been sure if they are also the ones who
    broke "strings", or if FreeBSD decided to switch to the GNU version that
    had been broken forever. Does "strings /dev/cd0" or similar device
    work? If not, it's broken. Running "strings" and some other common
    programs on devices was legal and quite useful for 20 years.
    Annoying work-around: "cat /dev/cd0 | strings"


    Frank Durda IV
    http://nemesis.lonestar.org
    Copyright 2008, ask before reprinting.


  10. Re: I/O performance for backup - tar, rsync, cp on FreeBSD

    This is an old thread, but I thought I should add this method, since it is by far the fastest.
    If you are only interested in backing up the entire partition, and not in accessing individual files, use "dd".

    For example, my partition is /dev/sda1 (a Linux-style device name; on FreeBSD it would be something like /dev/ad0s1a):

    dd if=/dev/sda1 of=backup.img bs=8192000

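    This copies everything in partition /dev/sda1 into a file called backup.img in your current directory, using a ~8MB block size (bs=8192000), which makes it much faster than dd's default 512-byte blocks. (You can try adjusting the block size, e.g. to the cache size of your hard drive, for maximum efficiency.) The current directory must not be on /dev/sda1 itself, and the partition should ideally be unmounted while it is copied so that the image is consistent.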

    To restore the above:

    dd if=backup.img of=/dev/sda1 bs=8192000

    This overwrites partition /dev/sda1 with the previously saved image. Make sure the partition is at least as large as the image, and that you have used fdisk to set the correct size and type of the partition.

    NOTE: The "dd" command works on device files, so any type of filesystem will work. If you want to back up all partitions on a hard drive (i.e. the whole disk), use something like "/dev/sda" instead of "/dev/sda1". If you do that, everything, including the partition table, will be saved in the output file.
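To check that an image made this way is a faithful copy, it can be compared byte for byte with the source using cmp. A minimal sketch, assuming the source is unmounted (or otherwise not changing) while it is read; the function name is hypothetical, and /dev/sda1 and backup.img are just the example names from the post:

```shell
#!/bin/sh
# Hypothetical helper: image a partition (or any file) with dd, then verify it.
#   $1 = source device or file, e.g. /dev/sda1
#   $2 = image file to create, e.g. backup.img (must NOT live on $1 itself)
image_and_verify() {
    src=$1
    img=$2
    # Copy in large blocks; dd's default block size is only 512 bytes.
    dd if="$src" of="$img" bs=8192000 || return 1
    # cmp exits with status 0 only when both inputs are byte-for-byte identical.
    cmp "$src" "$img"
}
```

Usage: `image_and_verify /dev/sda1 backup.img`. The verification reads the whole partition a second time, so it roughly doubles the time taken.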
