rsync not hardlinking anymore? - Tools

  1. rsync not hardlinking anymore?

    or... du and df show discrepancies?

    After adding an extra hard drive to our RAID, all of a sudden the du
    and df readings for the array no longer agree, and the drive
    (according to df) is filling up far too quickly. The RAID partition
    (/backup) is used exclusively for rsyncing backups from other servers.
    Each server has its own subdirectory in /backup. We were running out
    of storage space, so I added another identical 160G drive to the
    array. Before adding the new drive, the array could hold more than a
    month's worth of backups. Now, after adding the drive, it can only
    hold 4 days' worth! df shows the array nearly full:

    backup_server:~# df -h
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 440G 341G 77G 82% /backup

    while the sum of du -s /backup/* shows only half that amount:

    backup_server:/backup# du -hs *
    5.2G server1
    6.1G server2
    14G server3
    23G server4
    824M server5
    12K server6
    9.1M server7
    20K server8
    20G server9
    2.0G server10
    1.1G server11
    7.0G server12
    1.4G server13
    20K server14
    16K lost+found
    111G server15
    64G server16
    14G server17
    25G server18
    8.0K server19
    49G server20
    686M server21

    With the added capacity of the array, the du sum makes sense, but I've
    read that df is the most accurate and will always tell you exactly
    what's on the drive. I've checked for hidden files and for temp files
    that may still be open, run fsck -pfv /dev/sda1, and even rebuilt the
    array (hardware RAID on a Dell PowerEdge 1800 with an Adaptec RAID
    card), verifying the integrity of each drive before doing so and then
    reformatting the array.

    The only thing that has changed, besides adding the drive, is the
    filesystem. When it was working, the fs was XFS. Since the problem
    started, it has been ext3. Our customized rsync script creates hard
    links between dated directories for each backup. I'm guessing that
    rather than hard links being created, actual files are being
    duplicated from day to day. Or else the fs is no longer recognizing
    hardlinks (I'm just taking pot shots here). Could the difference
    between ext3 and XFS make this happen? I'm stumped. Can anyone shed
    light?
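
    One way to test the guess about duplicated files is to compare inode
    numbers: two hard links to the same file share one inode, while
    separate copies do not. The helper below is a minimal sketch;
    same_inode and the dated paths in the example are made up for
    illustration, and the comparison only means something when both paths
    live on the same filesystem.

```sh
# same_inode (hypothetical helper): succeed if both paths have the same inode
# number. Only meaningful when both paths are on the same filesystem.
same_inode() {
    a=$(ls -di -- "$1" 2>/dev/null | awk '{ print $1 }')
    b=$(ls -di -- "$2" 2>/dev/null | awk '{ print $1 }')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example with made-up dates and file names:
#   if same_inode /backup/server1/2008-01-01/etc/hosts \
#                 /backup/server1/2008-01-02/etc/hosts; then
#       echo "hard-linked"
#   else
#       echo "separate copies"
#   fi
```

    An unchanged file that shows up as "separate copies" on consecutive
    days would suggest that --link-dest is not taking effect.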

    Here is an example script, run daily from cron on each server:

    #!/bin/sh

    LIST='/etc /home'

    SERVER='backup_server.domain.edu'
    PASSWORD='xxxxx'
    RSYNC='/usr/bin/rsync'
    SSH='/usr/bin/ssh'
    KEY='/root/.ssh/id_rsa'
    RUSER='root'

    TEMP=$( tempfile )
    exec > "$TEMP" 2>&1   # capture all output for the error mail (&> is a bashism, not POSIX sh)

    MODULE=$( hostname -s )-$PASSWORD

    LAST=$( rsync -e "$SSH -i $KEY" $RUSER@$SERVER::$MODULE/ | sort -nr -k5 \
    | awk '$5 ~ /^[0-9-]+$/ {print $5; exit}' )
    CURRENT=$( date +%Y-%m-%d )

    $RSYNC -av -e "$SSH -i $KEY" --force --delete --ignore-errors \
    --numeric-ids --link-dest=/$LAST \
    $LIST $RUSER@$SERVER::$MODULE/$CURRENT/
    STATUS=$?   # save rsync's exit status before the echo commands overwrite $?

    echo "LAST was $LAST"
    echo "CURRENT was $CURRENT"

    if [ "$STATUS" -ne 0 ]; then
        mail -s "$( hostname -s ) backup error" sysalert@domain.edu < "$TEMP"
    fi
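
    If the listing step fails for any reason, LAST ends up empty and
    --link-dest points at /, so nothing can be hard-linked. A possible
    safeguard, sketched here under the assumption that backup directories
    are always named YYYY-MM-DD as in the script above, is to check LAST
    before running the backup. The name is_backup_date is hypothetical.

```sh
# is_backup_date (hypothetical helper): succeed only if $1 looks like
# YYYY-MM-DD, the directory naming used by the script above.
is_backup_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use right after LAST is computed:
#   if ! is_backup_date "$LAST"; then
#       echo "warning: no previous backup found, nothing to hard-link against"
#   fi
```

    The very first backup of a server legitimately has no previous
    directory, so a warning in the mailed log may be more appropriate
    than aborting.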


  2. Re: rsync not hardlinking anymore?

    Solved:

    Before posting the script above, I noticed an error I had made when I
    modified it to use rsync over SSH, and I corrected it yesterday before
    posting the problem. The first call to SSH (line 17) is for file
    comparison before the differential backup. In the original script, I
    had inadvertently left out the double quotes (") around $SSH -i $KEY.
    rsync therefore tried to interpret the unquoted words as its own
    switches, which (understandably) made no sense to it, so that phase of
    the rsync process failed. As a result, each new day's backup saw no
    pre-existing backup and therefore created no hardlinks: a complete
    duplication of data each day. Last night's backups, run with the
    corrected script, were normal, as expected.
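
    The effect of the missing quotes can be reproduced without rsync.
    The sketch below uses a made-up helper, count_args, that only reports
    how many arguments the shell hands over; SSH and KEY have the same
    values as in the script.

```sh
# count_args (hypothetical helper): print how many arguments it received.
count_args() {
    echo "$#"
}

SSH='/usr/bin/ssh'
KEY='/root/.ssh/id_rsa'

count_args -e $SSH -i $KEY      # prints 4: -i and the key path are separate words
count_args -e "$SSH -i $KEY"    # prints 2: the whole ssh command is one argument to -e
```

    Without the quotes, only /usr/bin/ssh is passed as the value of -e,
    and the remaining words are left for rsync to parse as its own
    arguments.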

    So, if anyone is reading this out of pure geeky interest, or pure
    boredom, it's just another lesson in the importance of paying attention
    to details.

