I recently needed to set up incremental backups of my server to a remote site in case of a failure. Since my VPS provider didn’t offer this as a service (paid or free), I had to come up with my own solution. This solution assumes that you are using Ubuntu (in my case, Karmic Koala), that you have root access, and that you have an Amazon S3 account. It also assumes that you are willing to pay for storage on S3. The pricing structure is here, but in my experience, my initial backup cost $3.78 and since then my average monthly bill has been under $0.25. You can estimate your own bill with this handy Amazon S3/EC2 calculator.

I know that FUSE is not the fastest way to back up, so your mileage may vary depending on your tolerance and needs. The actual download site for FuseOverAmazon (s3fs) is here. I am using rsync because I believe incremental backups, which transfer only what has changed, are far more efficient in cost and time than a full backup every week.

1.  The first step is to install all the dependencies we’ll need for FUSE:

sudo apt-get install build-essential libcurl4-openssl-dev libxml2-dev libfuse-dev

Next, install the most recent version of s3fs. As of this writing the most recent is r191, but here is a link to the downloads section so that you can check which version is the most up-to-date. I chose to put my source download in /usr/local/src.

cd /usr/local/src
sudo wget http://s3fs.googlecode.com/files/s3fs-r191-source.tar.gz
sudo tar -xzf s3fs-r191-source.tar.gz
cd s3fs
sudo make
sudo make install
sudo mkdir -p /mnt/s3
sudo chown yourusername:yourusername /mnt/s3

2. Scripting your backup plan:

You’ll need to create a bucket on S3. If you haven’t done this already, you can use a tool like JetS3t (my favorite). I recommend creating a separate bucket for each logical site you are going to back up. For example, I back up each of my Unfuddle repositories to a different bucket, which makes restoring easier. If you don’t trust Amazon to keep your data safe, you might also consider replicating to multiple locations, or even using a separate service provider such as JungleDisk, Mozy or Backblaze.

Using gvim or TextMate (or any other text editor), we are going to write a script that mounts the volume, performs a sync and then unmounts the volume. I unmount for safety: if the hard disk somehow becomes corrupted, I have some time to stop the script from running and replicating the bad data, which would not be the case if the volume were always mounted. An always-mounted volume is also easy to wipe out by accident if you aren’t careful.
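A backup script that ends with an umount only unmounts if it actually gets that far. A minimal sketch of one way to make the unmount unconditional is bash’s trap on EXIT. The helper name run_then_unmount and the UNMOUNT_CMD override are hypothetical (not part of s3fs or rsync); the override exists only so the helper can be tried out without a real mount.

```bash
#!/bin/bash
# Hypothetical helper: run a command against an already-mounted volume, then
# unmount it whether the command succeeds or fails.
# UNMOUNT_CMD is overridable only so the helper can be tried without a real mount.
UNMOUNT_CMD=${UNMOUNT_CMD:-/bin/umount}

run_then_unmount() {
  local mnt=$1
  shift
  (
    # The EXIT trap fires when this subshell finishes, whatever its exit status
    trap '$UNMOUNT_CMD "$mnt"' EXIT
    "$@"
  )
}
```

For example, run_then_unmount /mnt/s3 /usr/bin/rsync -avz --delete /home/username/dir/you/want/to/backup /mnt/s3 would unmount /mnt/s3 after the sync even if rsync reported an error.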

Save the following as your backup script, s3fs-backup.sh (or whatever you name yours):

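#!/bin/bash

# Mount the bucket; stop here if the mount fails, so rsync never writes into the bare local directory
/usr/bin/s3fs yourbucket -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey /mnt/s3 || exit 1
# Use ONE of the two rsync lines below: with --delete, the second would remove what the first copied
#/usr/bin/rsync -avz --log-file=log.file --delete /home/username/dir/you/want/to/backup /mnt/s3 # back up a single directory
/usr/bin/rsync -avz --log-file=log.file --delete --exclude /sys --exclude /mnt --exclude /proc --exclude /tmp / /mnt/s3 # back up everything, excluding some directories
mail -s "backup complete with log" user@host.org < log.file # email yourself the log
mv log.file log.file.`date +"%Y%m%d%H%M%S"` # move the log to a file with a datetime stamp
/bin/umount /mnt/s3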

There are some directories I don’t want to back up. /proc and /sys are virtual filesystems that the kernel manages while the system is running, so there is nothing useful to restore from them. We also don’t want to back up /mnt/s3 itself, or rsync would try to copy the backup into itself. We exclude those directories here. Note the --delete option: it deletes files from the backup that have been removed on the ‘source’. Also note that the -v flag controls rsync’s verbosity, and that we email ourselves a transcript of the backup session so we know that it actually took place, which is not a bad way to keep tabs. After emailing ourselves the (potentially massive) log file, we rename it with a timestamp to keep a record of our backups on the server as well. rsync has many more options, so check out its man page to customize your script.
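The timestamp format used when renaming the log (year first, down to seconds, every field zero-padded) has a handy property: sorting the file names alphabetically also sorts them chronologically. A small sketch that relies on this to find the most recent backup log; the function name latest_backup_log is hypothetical, and it assumes the renamed logs all sit in one directory with the log.file.<timestamp> names the script produces.

```bash
#!/bin/bash
# Hypothetical helper: print the newest log.file.<timestamp> in a directory.
# With date +"%Y%m%d%H%M%S" the largest unit comes first and every field is
# zero-padded, so alphabetical order of the names is also chronological order.
latest_backup_log() {
  local dir=${1:-.}
  local f newest=""
  # Pathname expansion returns matches sorted, so the last existing match is the newest
  for f in "$dir"/log.file.*; do
    [ -e "$f" ] && newest=$f
  done
  # Print it, or return non-zero when there are no logs yet
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, less "$(latest_backup_log ~)" opens the most recent log, assuming cron ran the script from your home directory.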

chmod 700 s3fs-backup.sh

Before running the entire script, use the line above to make it executable. Because the script contains your S3 secret key, keep it readable only by its owner (700 rather than 755). You can verify the script by running each command individually, which is a good idea after editing it for your own situation, because mistakes do happen. Once the S3 volume is mounted, a quick df -h will show 256T available for your own personal use.
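Two of those manual checks can be scripted as well. This sketch uses bash -n, which parses a script without running any of it, and reads /proc/mounts (Linux-specific) to see whether a directory such as /mnt/s3 is currently a mount point. The function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical pre-flight checks for the backup script.

# bash -n reads and parses the script but executes none of its commands
script_syntax_ok() {
  bash -n "$1"
}

# Succeeds if the given path is listed as a mount point (second field) in
# /proc/mounts, which is Linux-specific
is_mounted() {
  awk -v m="$1" '$2 == m { found = 1 } END { exit (found ? 0 : 1) }' /proc/mounts
}
```

For example, script_syntax_ok s3fs-backup.sh && echo "syntax OK" catches typos before cron ever runs the script, and is_mounted /mnt/s3 && df -h /mnt/s3 confirms the bucket is mounted after you mount it by hand.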

The most important part is automating the backup process. If you forget to run it and lose your most recent data, then what was the point!? We are going to use the good ol’ fashioned *nix cron daemon to handle this for us. There are two ways to schedule the script: you can put it (or a symlink to it) in /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly or /etc/cron.monthly, or you can edit a crontab directly for more control over when the script runs. I run mine every hour and also weekly on Sunday. Here is a nice cron reference to help you customize your schedule.

sudo crontab -e
0 * * * * /path/to/s3fs-backup.sh # runs at the top of every hour
0 0 * * 0 /path/to/s3fs-backup.sh # runs every Sunday at midnight
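If you choose the /etc/cron.daily (or hourly, weekly, monthly) route instead, be aware that on Debian and Ubuntu those directories are executed by run-parts, which by default silently skips file names containing anything other than ASCII letters, digits, underscores and hyphens. A link named s3fs-backup.sh would therefore never run. The hypothetical check below mirrors that default naming rule.

```bash
#!/bin/bash
# Hypothetical check mirroring run-parts' default naming rule on Debian/Ubuntu
# (no --lsbsysinit or --regex): only ASCII letters, digits, underscores and hyphens.
runparts_name_ok() {
  local name=$1
  local re='^[A-Za-z0-9_-]+$'
  [[ $name =~ $re ]]
}
```

In practice this means linking the script under a dot-free name, e.g. sudo ln -s /path/to/s3fs-backup.sh /etc/cron.daily/s3fs-backup.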

A note about speed: the initial backup could take a long time, and your server’s upstream bandwidth is the limiting factor. While rsync is a great program, going through FUSE is not the speediest option in the world. There is another solution out there called ‘s3sync’.
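Because the initial backup can take far longer than an hour, an hourly cron job could start a second copy while the first is still running. One way to guard against that is flock from util-linux. The sketch wraps a command in a non-blocking lock; the helper name run_locked and the lock file path are assumptions.

```bash
#!/bin/bash
# Hypothetical helper: run a command only if no other run holds the lock.
# Requires flock(1) from util-linux.
run_locked() {
  local lockfile=$1
  shift
  (
    # Try to take an exclusive lock on file descriptor 9 without waiting
    if ! flock -n 9; then
      echo "another backup is still running; skipping" >&2
      exit 1
    fi
    "$@"
  ) 9>"$lockfile"
}

# Example: run_locked /tmp/s3fs-backup.lock /path/to/s3fs-backup.sh
```

The flock command can also do this directly in the crontab, e.g. 0 * * * * flock -n /tmp/s3fs-backup.lock /path/to/s3fs-backup.sh, which simply skips an hour if the previous run has not finished.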

To create your first backup right away (if you can’t wait), simply run: sudo ./s3fs-backup.sh

One last nice thing is that this can be adapted to run almost anywhere: other servers, your home computers, etc. If you can build s3fs and install the dependencies above, you can have ultra-cheap backups without a lot of hassle.

That’s it!