tape backup block size - Unix



Thread: tape backup block size

  1. tape backup block size

    I'm curious how to select a reasonable block size for tape backups.
    I have discovered that the default of 10K is incredibly slow (1M/s)
    and that increasing it to 1024K goes much faster (8M/s). My
    question is: why don't they make the default larger? I can imagine
    two possibilities:

    - compatibility with older tape drives or operating systems
    - each file takes a minimum of 1 block

    But as I understand it, dump(1) saves each filesystem as a single
    tape "file", so the larger blocksize shouldn't hurt. I guess I can
    test this by creating a filesystem with a million 1-byte files and
    seeing how many copies of it fit on a tape.

    If it matters, I'm using Linux with a 2.6 kernel (RHEL4) and a
    TAIT-2 tape drive on a remote machine (connecting via ssh).

    Damian Menscher
    --
    -=#| www.uiuc.edu/~menscher/ Ofc: (217) 253-2757 |#=-
    -=#| The above opinions are not necessarily those of my employers. |#=-

  2. Re: tape backup block size

    Damian Menscher wrote:
    > I'm curious how to select a reasonable block size for tape backups.
    > I have discovered that the default of 10K is incredibly slow (1M/s)
    > and that increasing it to 1024K goes much faster (8M/s). My
    > question is why they don't make the default larger? I can imagine
    > two possibilities:
    > - compatibility with older tape drives or operating systems
    > - each file takes a minimum of 1 block
    > But as I understand it, dump(1) saves each filesystem as a single
    > tape "file", so the larger blocksize shouldn't hurt. I guess I can
    > test this by creating a filesystem with a million 1-byte files, and
    > see how many copies of it fit on a tape.
    > If it matters, I'm using Linux with a 2.6 kernel (RHEL4) and a
    > TAIT-2 tape drive on a remote machine (connecting via ssh).


    Historically, tape/archive block sizes were rather small - e.g. as
    small as 512 bytes, and sometimes the maximum was only 10 KiB
    (twenty 512-byte "blocks"). In some cases that may still matter
    for backwards compatibility.

    With modern (current, non-ancient) hardware, larger tape block
    sizes are generally faster and more efficient. Using the largest
    block size the hardware (or driver, etc.) supports usually works
    out fastest - at least provided one can stream the data to the
    drive fast enough to avoid underruns. Larger block sizes on most
    modern drives also don't end up wasting more space, even when the
    data to be written is quite small. Most notably, nearly all modern
    tape drives include hardware compression, and backup utilities
    typically pad any remaining space to a block boundary with nulls -
    which compress exceedingly well. And where multiple blocks are
    written, less tape is required, since there is less need for
    start/end-of-block markers (typically tape record marks).
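    The null-padding point is easy to check from the shell - compress a
    worst-case padded record (all zeros) and compare sizes. This uses
    gzip as a stand-in for the drive's hardware compression:

```shell
# 1 MiB of nulls - the worst case for padding a record out to a
# large block boundary - versus its size after compression.
raw=$(dd if=/dev/zero bs=1024k count=1 2>/dev/null | wc -c)
gz=$(dd if=/dev/zero bs=1024k count=1 2>/dev/null | gzip -c | wc -c)
echo "raw=${raw} bytes, compressed=${gz} bytes"
```

    The compressed size comes out to a kilobyte or so, so padding a
    partial block costs essentially nothing on a compressing drive.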


  3. Re: tape backup block size

    Damian Menscher wrote:
    >
    > I'm curious how to select a reasonable block size for tape backups.
    > I have discovered that the default of 10K is incredibly slow (1M/s)
    > and that increasing it to 1024K goes much faster (8M/s). My
    > question is why they don't make the default larger? I can imagine
    > two possibilities:
    >
    > - compatibility with older tape drives or operating systems
    > - each file takes a minimum of 1 block


    Possibility 3 that you didn't consider -

    Streaming tape technologies no longer have fixed block formats,
    so the blocking is virtual anyway.

    The blocks in question were seekable. Streaming tape drives
    have not supported that for quite some time. Are there once again
    fixed block format technologies available like there were in the
    days of reel to reel?

    The timing difference is from buffering and any delays caused by
    delivering blocks too slowly to keep the drive streaming. Figure
    out the biggest buffer that fits, and use it.

    Years ago I timed how long it took to write end to end on a
    tape, and I doubled the blocksize each time. Early in the test
    each write took half as long as the previous. Later in the test
    all larger blocksizes took the same time. Classic roll-off
    curve. I picked a nice big blocksize well along the curve and
    never looked back. Since I did that with an old Exabyte 8mm
    helical scan drive the blocksize I picked no longer matters.
    What does matter: as long as a bigger blocksize helps, go for it -
    and feel free to benchmark your own performance curve.
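    That doubling test is easy to reproduce. A minimal sketch, writing
    the same payload at a few block sizes to a scratch file (substitute
    your tape device, e.g. /dev/nst0, to test the real thing):

```shell
# Time writes of the same 64 MiB payload at doubling block sizes.
# A scratch file stands in for the tape device here.
out=$(mktemp)
total=$((64 * 1024 * 1024))
for bs in 65536 262144 1048576; do
    count=$((total / bs))
    t0=$(date +%s)
    dd if=/dev/zero of="$out" bs="$bs" count="$count" conv=fsync 2>/dev/null
    t1=$(date +%s)
    echo "bs=${bs}: $((t1 - t0))s"
done
rm -f "$out"
```

    Once the times flatten out, you're past the knee of the roll-off
    curve and bigger blocks stop helping.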


  4. Re: tape backup block size

    Doug Freyburger wrote:
    > Damian Menscher wrote:
    >>
    >> I'm curious how to select a reasonable block size for tape backups.
    >> I have discovered that the default of 10K is incredibly slow (1M/s)
    >> and that increasing it to 1024K goes much faster (8M/s). My
    >> question is why they don't make the default larger? I can imagine
    >> two possibilities:
    >>
    >> - compatibility with older tape drives or operating systems
    >> - each file takes a minimum of 1 block


    > Streaming tape technologies no longer have fixed block formats
    > so the blocking is virtual anyways.


    > The blocks in question were seekable. Streaming tape drives
    > have not supported that for quite some time. Are there once again
    > fixed block format technologies available like there were in the
    > days of reel to reel?


    My previous drive used a fixed blocksize (Certance STT3401A), but
    that was crap produced by Seagate/Certance. My current drive (a
    Sony TAIT-2) uses variable block sizes. I see what you mean
    about being seekable, though, as the -Q option to dump(1) doesn't
    work with this drive. (Presumably it would have with my older
    drive?)

    > The timing difference is from buffering and any delays caused by
    > delivering blocks too slowly to keep the drive streaming. Figure
    > out the biggest buffer that fits, and use it.


    > Years ago I timed how long it took to write end to end on a
    > tape, and I doubled the blocksize each time. Early in the test
    > each write took half as long as the previous. Later in the test
    > all larger blocksizes took the same time. Classic roll-off
    > curve. I picked a nice big blocksize well along the curve and
    > never looked back. Since I did that with an old Exabyte 8mm
    > helical scan drive the blocksize I picked no longer matters.
    > What does matter: as long as a bigger blocksize helps, go for it -
    > and feel free to benchmark your own performance curve.


    For the curious:

    blksize           speed (K/s)
    (K)       local    remote   /dev/null
    10         7284      1172       17076
    32        12426      5141
    64        15054      5771
    128        same      6344
    256                  6622
    512                  6765
    1024      14389      6827       17764

    So local dumps max out at 15M/s, which is the limit of the tape
    drive. Dumps to /dev/null max out at 17M/s, so that's the limit
    of the hard drives (crappy 3ware raid5...). Remote dumps speed up
    significantly as I increase the blocksize. Given that restores
    with a 1024K blocksize work, and Michael convinced me that it's
    not wasting tape space, I'm just running with that.

    I think the remote backup must be hitting a network limitation
    (100mbit networking), though I'd expect it to get 10M/s, not max
    out at 7M/s. If anyone has ideas, I'd love to hear them.
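    One way to isolate the bottleneck is to time the pipe and the ssh
    hop separately (the remote line is illustrative - `backuphost` is
    a placeholder - and commented out so the snippet runs anywhere):

```shell
# Baseline: raw local pipe throughput, no network or cipher involved.
bytes=$(dd if=/dev/zero bs=1024k count=64 2>/dev/null | cat | wc -c)
echo "piped ${bytes} bytes locally"
# Then time the same transfer over ssh (placeholder host, illustrative):
#   dd if=/dev/zero bs=1024k count=64 | ssh backuphost 'cat > /dev/null'
# If that also tops out near 7M/s, the cipher or the 100mbit link is
# the cap rather than anything in the tape path.
```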

    Damian Menscher
    --
    -=#| www.uiuc.edu/~menscher/ Ofc: (650) 253-2757 |#=-
    -=#| The above opinions are not necessarily those of my employers. |#=-
