Backup Software - Embedded

  1. Backup Software

    Please accept my apology for this not being exactly an embedded
    question, but as this is my favorite Linux group, this is the first
    place I turn to for advice. (And in special cases this _can_ be
    useful for embedded devices.)

    I am planning to write a highly automated backup program, to be
    released under the GPL.

    It should use an external hard disk (or flash disk) and a DVD (or CD)
    burner as backup media.

    Depending on how long a version of a file has existed in the source
    tree, files should be distributed across both media, using a suitable
    algorithm: the external disk should hold several versions of any file
    that changes between the backup cycles (daily versions for some days,
    fewer versions for less recent changes), while long-lived versions are
    eventually stored permanently on DVD instead. Thus both media together
    hold an image of the source tree, while the external hard disk is not
    likely to overflow. (Of course, manual actions should be provided that
    move everything to the DVD chain, etc.) I plan to use a database
    (e.g. MySQL) to hold a directory of all versions of all files on the
    backup media.

    Some questions:

    - Does such a project already exist, so that I don't need to start
    from scratch?

    - Does it make sense to use a compressing file system for the external
    disk? Which one is known to be safe?

    - How is it possible to write to a DVD? I suppose the easiest way is
    to prepare the content on the hard disk and use a command-line tool to
    move it onto the DVD. Any pointers?

    Thanks a lot,

    - Michael

  2. Re: Backup Software

    Mikey Quick wrote:
    > Please accept my apology for this not being exactly an embedded
    > question, but as this is my favorite Linux group, this is the first
    > place I turn to for advice. (And in special cases this _can_ be
    > useful for embedded devices.)
    >
    > I am planning to write a highly automated backup program, to be
    > released under the GPL.
    >
    > It should use an external hard disk (or flash disk) and a DVD (or CD)
    > burner as backup media.
    >
    > Depending on how long a version of a file has existed in the source
    > tree, files should be distributed across both media, using a suitable
    > algorithm: the external disk should hold several versions of any file
    > that changes between the backup cycles (daily versions for some days,
    > fewer versions for less recent changes), while long-lived versions are
    > eventually stored permanently on DVD instead. Thus both media together
    > hold an image of the source tree, while the external hard disk is not
    > likely to overflow. (Of course, manual actions should be provided that
    > move everything to the DVD chain, etc.) I plan to use a database
    > (e.g. MySQL) to hold a directory of all versions of all files on the
    > backup media.
    >
    > Some questions:
    >
    > - Does such a project already exist, so that I don't need to start
    > from scratch?
    >
    > - Does it make sense to use a compressing file system for the external
    > disk? Which one is known to be safe?
    >
    > - How is it possible to write to a DVD? I suppose the easiest way is
    > to prepare the content on the hard disk and use a command-line tool to
    > move it onto the DVD. Any pointers?
    >
    > Thanks a lot,
    >
    > - Michael


    If you are backing up to DVD or CD, you have to be aware that these
    media have limited lifetimes (although no one is quite sure what that
    lifetime might be...).

    Before thinking of a backup strategy, think about your restore strategy.
    People often forget that, and come up with a system that requires
    searching through a dozen "incremental" backup CDs to get the files they
    are really looking for. The other key point for your backups is that
    they must be kept off site - your backups are of little value if a thief
    or a fire destroys the backups as well as the originals.

    The backup system I use is "dirvish" ( http://www.dirvish.com/ ). It is
    basically a wrapper around rsync - each backup is a direct full copy of
    the source tree. There are a few smart things about it - using rsync,
    it only copies over the differences between the current source and the
    latest backup. It also snapshots the source, so that in your backup
    directory you have a series of directories labelled by date, each
    containing an image of the original source tree on that date. It uses
    hard links during the backup procedure, so that unchanged files are
    multiply linked and thus do not take up extra space. Finally, it has a
    flexible system for giving expiry dates to backups, such as daily
    backups being kept for a month, weekly backups kept for three months,
    and monthly backups kept for a year.

    The rsync protocol and the use of hard links mean that only changes are
    ever backed up, yet the backup always appears as a full backup. I use
    it for doing backups of the office servers (something like 200 GB of
    data) to a server at my home over an ADSL link - it seldom takes more
    than an hour, since there is not a huge amount of change to the data set
    each day, and a couple of ordinary hard disks on the backup machine are
    sufficient for years (LVM is your friend - it's important that all the
    disks are part of the same file system).

    And if I need to restore anything, I've got online snapshots of all the
    data, for any given date.
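
    For anyone who wants to see the hard-link idea in isolation, here is a
    minimal Python sketch. It is not how dirvish or rsync are implemented;
    the function name `snapshot` and its parameters are made up for
    illustration, and it ignores symlinks, ownership, and network transfer.
    It only shows that unchanged files can be linked to the previous
    snapshot while changed files get their own copy.

```python
import filecmp
import os
import shutil


def snapshot(source, vault, name, previous=None):
    """Create vault/name as a full image of source (hypothetical helper).

    Files identical to the copy in vault/previous are hard linked
    instead of copied, so they take no extra data space.
    """
    dest_root = os.path.join(vault, name)
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dest_dir = os.path.normpath(os.path.join(dest_root, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for fname in filenames:
            src = os.path.join(dirpath, fname)
            dst = os.path.join(dest_dir, fname)
            old = None
            if previous is not None:
                old = os.path.normpath(os.path.join(vault, previous, rel, fname))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share data with the old snapshot
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return dest_root
```

    Calling `snapshot(src, vault, "20060717", previous="20060716")` would
    then produce a directory that looks like a full copy, although only
    the changed files occupy new space. The vault must be on a single file
    system for the links to work.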

  3. Re: Backup Software

    David, Thanks a lot for answering
    > If you are backing up to DVD or CD, you have to be aware that these
    > media have limited lifetimes (although no one is quite sure what that
    > lifetime might be...).


    Of course you are right. That is true for tapes as well. I have even
    read a recommendation to use DVDs instead of tapes, because the largest
    danger to consider is that in a few decades functional drives for
    reading the tapes may no longer be available.

    >
    > Before thinking of a backup strategy, think about your restore strategy.
    > People often forget that, and come up with a system that requires
    > searching through a dozen "incremental" backup CDs to get the files they
    > are really looking for.


    Again you are right. This concept is primarily meant for recovering
    versions lost by human actions and only secondarily meant for disaster
    recovery. So fully automatic backup and the possibility to restore
    single files is the _primary_ intention.

    There are lots of backup solutions for disaster recovery that might be
    used in addition. In case of disaster you would need to set up a working
    system before you can use the backup media, so this is not a quick
    solution anyway. To be fast, a mirrored system is the way to go. Of
    course, a restore strategy also suitable for disaster recovery and other
    mass restores might be a good enhancement once the system serves its
    primary purpose well.

    > The other key point for your backups is that
    > they must be kept off site - your backups are of little value if a thief
    > or a fire destroys the backups as well as the originals.


    Right again. Same issue with any backup solution. And very often
    forgotten. Doing the backup remotely over a network helps, regardless
    of the software used. An incremental ("intelligent") strategy
    reduces the network traffic greatly. IMHO the strategy I am considering
    is very useful for "over the wire" backups.

    >
    > The backup system I use is "dirvish" ( http://www.dirvish.com/ ). It is
    > basically a wrapper around rsync


    Thanks a lot for the pointer. I'll take a look at dirvish/rsync. (If I
    can find something usable for me, I'm not going to do it myself)

    How does rsync handle versions of modified files? Any chance to
    get back a previous version after some days?

    How does rsync handle deleted files? Any chance to get one back
    after some days? Don't they clutter the backup disk?

    Thanks again,

    -Michael


  5. Re: Backup Software

    Michael Schnell wrote:
    > David, Thanks a lot for answering
    >> If you are backing up to DVD or CD, you have to be aware that these
    >> media have limited lifetimes (although no one is quite sure what that
    >> lifetime might be...).

    >
    > Of course you are right. That is true for tapes as well. I have even
    > read a recommendation to use DVDs instead of tapes, because the largest
    > danger to consider is that in a few decades functional drives for
    > reading the tapes may no longer be available.
    >
    >>
    >> Before thinking of a backup strategy, think about your restore
    >> strategy. People often forget that, and come up with a system that
    >> requires searching through a dozen "incremental" backup CDs to get the
    >> files they are really looking for.

    >
    > Again you are right. This concept is primarily meant for recovering
    > versions lost by human actions and only secondarily meant for disaster
    > recovery. So fully automatic backup and the possibility to restore
    > single files is the _primary_ intention.
    >


    Most disasters are caused by human actions (hopefully the reverse is not
    true...)

    > There are lots of backup solutions for disaster recovery that might be
    > used in addition. In case of disaster you would need to set up a working
    > system before you can use the backup media, so this is not a quick
    > solution anyway. To be fast, a mirrored system is the way to go. Of
    > course, a restore strategy also suitable for disaster recovery and other
    > mass restores might be a good enhancement once the system serves its
    > primary purpose well.
    >


    One of the great advantages of dirvish is that you have a mirror, and
    thus quick and easy restores of either single files or whole trees.

    >> The other key point for your backups is that they must be kept off
    >> site - your backups are of little value if a thief or a fire destroys
    >> the backups as well as the originals.

    >
    > Right again. Same issue with any backup solution. And very often
    > forgotten. Doing the backup remotely over a network helps, regardless
    > of the software used. An incremental ("intelligent") strategy
    > reduces the network traffic greatly. IMHO the strategy I am considering
    > is very useful for "over the wire" backups.
    >


    Dirvish lets you use incremental backups, with the underlying rsync even
    supporting incremental backups of changes to large files, so that things
    like mailboxes are handled efficiently. An incremental backup of our
    office data normally takes under an hour - the single original full
    backup would have taken weeks over the same wire, so I cheated and
    plugged the backup server into our local network for the first run.

    >>
    >> The backup system I use is "dirvish" ( http://www.dirvish.com/ ). It
    >> is basically a wrapper around rsync

    >
    > Thanks a lot for the pointer. I'll take a look at dirvish/rsync. (If I
    > can find something usable for me, I'm not going to do it myself)
    >


    Dirvish gives you a higher level viewpoint - it uses the goodies from
    rsync, without you having to read the details in the man page, or write
    your own scripts. But it's also worth reading about rsync, to see
    what's happening underneath.

    > How does rsync handle versions of modified files? Any chance to
    > get back a previous version after some days?
    >


    Using dirvish (or your own rsync scripts), yes, no problem. All the old
    backup snapshots are easily available (depending on your "expire"
    policies for backup snapshots). When a file has not changed between
    backups, extra copies are hard linked to the old copies, avoiding wasted
    disk space. Changed versions get their own files - but the network copy
    may only send the changed parts, reducing network traffic.

    > How does rsync handle deleted files? Any chance to get one back
    > after some days? Don't they clutter the backup disk?
    >


    If I delete a file today, the original remains in yesterday's backup
    snapshot, and is gone from this evening's backup run. Files eventually
    disappear off the disk if all their links are removed during "expire"
    runs to clear out old backups.
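
    The link-count behaviour described here can be tried out with a few
    lines of Python. This is only a sketch of the file system mechanism,
    assuming a file system that supports hard links; the function name
    `link_demo` and the file name are invented for the example.

```python
import os
import tempfile


def link_demo():
    """Show that data survives until the last hard link is removed."""
    with tempfile.TemporaryDirectory() as vault:
        old = os.path.join(vault, "20060716", "report.txt")
        new = os.path.join(vault, "20060717", "report.txt")
        os.makedirs(os.path.dirname(old))
        os.makedirs(os.path.dirname(new))
        with open(old, "w") as f:
            f.write("important data")
        os.link(old, new)                      # unchanged file: second name, same data
        links_before = os.stat(new).st_nlink   # two names point at the data
        os.remove(old)                         # "expire" the older snapshot
        links_after = os.stat(new).st_nlink    # one name is left
        with open(new) as f:
            survived = f.read()                # the data is still readable
        return links_before, links_after, survived
```

    Only when the last remaining name is removed as well does the file
    system release the space.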


    One thing to remember about dirvish (or other similar rsync based backup
    systems, as there are several around), is that you end up with a lot of
    files on the disk, and a huge number of links. It's also much more
    efficient if you have a single file system rather than spread the
    backups (for any given "vault") over separate file systems, since you
    can then use hard links. You can also expect the backup system to grow
    larger than you thought when you first set up the system.

    The best way to set up the system (IMHO), therefore, is to use LVM and
    reiserfs. Reiserfs copes better with large numbers of files and links
    than most file systems, and has no (realistic) limits. It also supports
    on the fly size changes, with no issues about limited inode tables. So
    when you start getting low on disk space, you simply connect up a new
    hard disk, make it an LVM physical volume, link it to your volume group,
    and then add as much space as you need to your backup logical volume. I
    use LVM for most of my data volumes - any time I need more space,
    assuming there is free space left on the disk, I can grow the file
    systems in 10 seconds with a couple of commands without even unmounting
    the filesystems.


    > Thanks again,
    >
    > -Michael
    >


  6. Re: Backup Software

    All this sounds really promising. (The unsolved backup problem is the
    reason I have not yet migrated the server to Linux, so I now have hope
    that it can be done soon.)

    Just the question of deleted files is not clear to me yet. My hope is
    that they are handled like versions: if I set the expire time to, say,
    10 days (or whatever a suitable setting is), a deleted file should stay
    accessible on the backup for 10 days, too, and get deleted on day 11.

    Thanks again,
    -Michael

  7. Re: Backup Software

    Mikey Quick wrote:
    > All this sounds really promising. (The unsolved backup problem is the
    > reason I have not yet migrated the server to Linux, so I now have hope
    > that it can be done soon.)
    >
    > Just the question of deleted files is not clear to me yet. My hope is
    > that they are handled like versions: if I set the expire time to, say,
    > 10 days (or whatever a suitable setting is), a deleted file should stay
    > accessible on the backup for 10 days, too, and get deleted on day 11.
    >


    That is correct.

    > Thanks again,
    > -Michael



    A more usual arrangement for your expiry times is to have a variety -
    for example, this is an excerpt from my dirvish "master.conf" file:

    # Most backups kept for 30 days
    # Monday backups kept for 6 months
    # First Monday in month kept for 3 years
    # Quad-yearly backups kept forever

    expire-default: +30 days
    expire-rule:
    #   Min Hr DOM Mon      DOW Expire
        *   *  *   *        2   +6 months
        *   *  1-7 *        2   +3 year
        *   *  1-7 1,4,7,11 2   never
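
    To make the effect of these rules easier to follow, here is a rough
    Python sketch that works out which expiry a backup taken at a given
    time would get. It is not dirvish's own code. It assumes that the
    day-of-week field counts Sunday as 1 (so Monday is 2, matching the
    comments above) and that the last matching rule wins; both are
    assumptions to check against the dirvish documentation. The names
    `RULES`, `field_matches` and `expiry_for` are invented for the example.

```python
from datetime import datetime

DEFAULT = "+30 days"
RULES = [
    # (minute, hour, day-of-month, month, day-of-week, expire)
    ("*", "*", "*", "*", "2", "+6 months"),
    ("*", "*", "1-7", "*", "2", "+3 year"),
    ("*", "*", "1-7", "1,4,7,11", "2", "never"),
]


def field_matches(spec, value):
    """True if value matches a field such as '*', '2', '1-7' or '1,4,7,11'."""
    if spec == "*":
        return True
    for part in spec.split(","):
        low, _, high = part.partition("-")
        if int(low) <= value <= int(high or low):
            return True
    return False


def expiry_for(when, rules=RULES, default=DEFAULT):
    """Return the expire string for a backup taken at datetime `when`."""
    # Assumption: Sunday is day 1, Monday is day 2, ..., Saturday is day 7.
    dow = when.isoweekday() % 7 + 1
    values = (when.minute, when.hour, when.day, when.month, dow)
    result = default
    for *specs, expire in rules:
        if all(field_matches(s, v) for s, v in zip(specs, values)):
            result = expire  # assumption: a later matching rule overrides
    return result
```

    Under those assumptions, a backup taken on Tuesday 18 July 2006 falls
    back to the 30-day default, one taken on Monday 17 July 2006 is kept
    for 6 months, and one taken on Monday 3 July 2006 matches all three
    rules and is never expired.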


  8. Re: Backup Software

    >
    >
    > A more usual arrangement for your expiry times is to have a variety -
    > for example, this is an excerpt from my dirvish "master.conf" file:
    >
    > # Most backups kept for 30 days
    > # Monday backups kept for 6 months
    > # First Monday in month kept for 3 years
    > # Quad-yearly backups kept forever
    >
    > expire-default: +30 days
    > expire-rule:
    > # Min Hr DOM Mon DOW Expire
    > * * * * 2 +6 months
    > * * 1-7 * 2 +3 year
    > * * 1-7 1,4,7,11 2 never
    >


    Thanks a lot !

    I definitely will do the server migration soon and use dirvish.

    -Michael

  9. Re: Backup Software

    One more question.

    To move the backup drive away from the server, would it be possible
    to use a NAS device (instead of a USB or FireWire disk, as I intended
    before) as the backup medium? I understand that this NAS would need to
    use a network protocol that supports hard links (NFS? SSH?) and should
    use reiserfs as its file system.

    -Michael

  10. Re: Backup Software

    Michael Schnell wrote:
    > One more question.
    >
    > To move the backup drive away from the server, would it be possible
    > to use a NAS device (instead of a USB or FireWire disk, as I intended
    > before) as the backup medium? I understand that this NAS would need to
    > use a network protocol that supports hard links (NFS? SSH?) and should
    > use reiserfs as its file system.
    >
    > -Michael


    I have not used NAS (or NFS, for that matter), so I couldn't tell you.
    I also don't know how LVM would work with removable drives. But there
    is no requirement to use LVM or reiserfs - these are just my personal
    favourites for such systems (I've set up two such backup arrangements).
    I find the backup over ADSL to be far more convenient than using
    removable media - quite simply, it is far less effort and therefore far
    more reliable. But of course, that requires you having two locations
    linked by broadband.

  11. Re: Backup Software

    > I have not used NAS (or NFS, for that matter), so I couldn't tell you. I
    > also don't know how LVM would work with removable drives. But there is
    > no requirement to use LVM or reiserfs - these are just my personal
    > favourites for such systems (I've set up two such backup arrangements).


    I do follow your argument for preferring reiserfs, of course.

    > I find the backup over ADSL to be far more convenient than using
    > removable media - quite simply, it is far less effort and therefore far
    > more reliable. But of course, that requires you having two locations
    > linked by broadband.


    For this small site, I would find it secure enough to install the backup
    drive in the basement, which is very secure against fire. But it's a
    little damp so I don't want to install the server there.

    -Michael

  12. Re: Backup Software

    On Mon, 10 Jul 2006 17:10:56 +0200, Mikey Quick wrote:

    > All this sounds really promising. (The unsolved backup problem is the
    > reason I have not yet migrated the server to Linux, so I now have hope
    > that it can be done soon.)
    >


    Also check out Amanda... It's an enterprise backup solution. It backs up
    to tapes primarily. Its biggest advantage is that restores can be handled
    via ordinary tools - tar and friends - so a bare metal restore is easier.

    Although dirvish sounds really nice for a small, single server system.

    --Yan

    --
    o__
    ,>/'_ o__
    (_)\(_) ,>/'_ o__
    Yan Seiner, PE (_)\(_) ,>/'_ o__
    Certified Personal Trainer (_)\(_) ,>/'_ o__
    Licensed Professional Engineer (_)\(_) ,>/'_
    Who says engineers have to be pencil necked geeks? (_)\(_)


  13. Re: Backup Software

    Thanks for the pointer

    -Michael

  14. Re: Backup Software


    > A more usual arrangement for your expiry times is to have a variety -
    > for example, this is an excerpt from my dirvish "master.conf" file:
    >
    > # Most backups kept for 30 days
    > # Monday backups kept for 6 months
    > # First Monday in month kept for 3 years
    > # Quad-yearly backups kept forever
    >
    > expire-default: +30 days
    > expire-rule:
    > # Min Hr DOM Mon DOW Expire
    > * * * * 2 +6 months
    > * * 1-7 * 2 +3 year
    > * * 1-7 1,4,7,11 2 never
    >


    Of course all files will need to stay on the backup unless they are
    overwritten on the working disk.

    (How) Does this setting provide for this ?

    Thanks,
    -Michael

  15. Re: Backup Software

    Michael Schnell wrote:
    >
    >> A more usual arrangement for your expiry times is to have a variety -
    >> for example, this is an excerpt from my dirvish "master.conf" file:
    >>
    >> # Most backups kept for 30 days
    >> # Monday backups kept for 6 months
    >> # First Monday in month kept for 3 years
    >> # Quad-yearly backups kept forever
    >>
    >> expire-default: +30 days
    >> expire-rule:
    >> # Min Hr DOM Mon DOW Expire
    >> * * * * 2 +6 months
    >> * * 1-7 * 2 +3 year
    >> * * 1-7 1,4,7,11 2 never
    >>

    >
    > Of course all files will need to stay on the backup unless they are
    > overwritten on the working disk.


    All files stay on the backup even if they are overwritten on the
    original disk. That's the point of a backup. When dirvish runs and
    makes a directory called "20060717" for the 17th July, 2006 backup, then
    that directory contains a snapshot copy of the original disk when the
    backup was run. The contents of the 20060717 backup directory never
    change after that, unless it is "expired" (using the dirvish-expire
    script) according to the expire rules used when the backup was taken.
    When it is "expired", the directory is completely removed.

    Remember, all this stuff about hard links and rsync differential backups
    is just to make the process more efficient (both in terms of disk space
    and network bandwidth). The backups appear as simple full copies of the
    original source.
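
    As a rough illustration of the expire step, the following Python
    sketch picks out dated snapshot directories that are past their keep
    time. It is a simplification, not the dirvish-expire script: it
    assumes directory names of the form YYYYMMDD and a single fixed
    retention period, whereas dirvish fixes the expiry of each snapshot
    from the rules when the backup is taken. The function name
    `expired_snapshots` and the `keep_days` parameter are invented here.

```python
from datetime import date, datetime, timedelta


def expired_snapshots(names, today, keep_days=30):
    """Return the snapshot directory names (YYYYMMDD) older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in sorted(names):
        try:
            taken = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # not a dated snapshot directory, leave it alone
        if taken < cutoff:
            expired.append(name)
    return expired
```

    Each returned directory could then be removed as a whole, for example
    with `shutil.rmtree`; files that are still hard linked from newer
    snapshots keep their data.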

    >
    > (How) Does this setting provide for this ?
    >
    > Thanks,
    > -Michael


  16. Re: Backup Software

    Really great !

    Thanks a lot for taking the time to explain dirvish to me.

    -Michael

  17. Re: Backup Software

    Michael Schnell wrote:
    > Really great !
    >
    > Thanks a lot for taking the time to explain dirvish to me.
    >
    > -Michael


    I hope it works out for you (and anyone else who was following this
    thread - I know it is off-topic in an embedded group, but backups are
    useful for everyone).
