The Use of S3 and EC2 for Remote Backup

Even before the introduction of the Amazon S3 storage service, I was intrigued by the possibilities of secure backup over the Internet. Over the years I’ve evaluated a number of options, such as using rsync or Unison to copy data either to my own remote servers or to a hosted service. I’m really not too interested in the commercial vendors, as most of their software runs only on Windows or perhaps Mac, while my files reside on a Linux fileserver. It only makes sense that my backup solution should run on the Linux server as well.

None of these solutions quite fit the bill for me because of expense, concerns about data security, or speed. Since the introduction of S3 I have been playing with some of the scripts and software developed to take advantage of it. I was still disappointed, though, mostly because of data encryption concerns (on the storage system, not in transit) and the potential charges for backing up data to the S3 service. Ideally I would want something rsync-like that transfers only the changed parts of files instead of recopying the entire file or directory. Unfortunately there is no built-in support for anything like this in the low-level S3 system: objects are written whole, so even a small change means re-uploading the full file. After playing with many scripts that suggested they could do something along these lines, and remaining unimpressed, I decided to put things on hold for a while longer.
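
A minimal illustration of that limitation, using the third-party s3cmd tool (the bucket and file names here are just placeholders):

    # Initial upload of a large archive
    s3cmd put backup.tar s3://my-backup-bucket/backup.tar

    # After any change, however small, the whole object must be PUT again;
    # there is no way to send just the modified blocks
    s3cmd put backup.tar s3://my-backup-bucket/backup.tar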

Eventually Amazon released the EC2 cloud computing platform, but that still didn’t seem particularly useful for my purposes because of the lack of persistent storage between sessions. Once Elastic Block Store (EBS) volumes became available, things got more interesting. Now that I could retain data between sessions, I had visions of a backup script that would launch an EC2 instance, mount an EBS volume, and run rsync or Unison to back up directories on my local server to the remote site. I started playing around with EC2 and soon discovered that although it is very powerful, it is a monster to control unless you are writing your own application from the ground up. A simple job like this, which should be easily accomplished by a script, can be a nightmare, with several shell variables to set and paths to keep straight, never mind the several encryption keys and the changing SSH host key to deal with. Eventually, with some help from two fantastic blog entries (Ereblog and Free Wisdom Online), I was able to get something working…mostly.
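
In rough outline, the workflow looked something like the sketch below. This is a reconstruction rather than my exact script: the AMI, volume, and key names are placeholders, and the instance details are scraped from the ec2-api-tools output with awk, so the field positions may differ between tool versions.

    #!/bin/bash
    # Hypothetical reconstruction of the EC2/EBS backup workflow.
    # All IDs, paths, and key names below are placeholders.
    export EC2_HOME=/opt/ec2-api-tools
    export PATH=$PATH:$EC2_HOME/bin
    export EC2_PRIVATE_KEY=$HOME/.ec2/pk-XXXXXXXX.pem
    export EC2_CERT=$HOME/.ec2/cert-XXXXXXXX.pem

    AMI=ami-xxxxxxxx
    VOLUME=vol-xxxxxxxx
    KEYPAIR=backup-key
    # The host key changes with every launch, so skip strict checking
    SSHOPTS="-i $HOME/.ec2/backup-key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"

    # Launch an instance and scrape its ID from the command output
    INSTANCE=$(ec2-run-instances $AMI -k $KEYPAIR | awk '/^INSTANCE/ {print $2}')

    # Wait for the instance to reach the running state
    # (field positions in the INSTANCE line may vary between tool versions)
    while :; do
        STATE=$(ec2-describe-instances $INSTANCE | awk -F '\t' '/^INSTANCE/ {print $6}')
        [ "$STATE" = "running" ] && break
        sleep 15
    done
    HOST=$(ec2-describe-instances $INSTANCE | awk -F '\t' '/^INSTANCE/ {print $4}')

    # Attach the EBS volume and mount it on the instance
    ec2-attach-volume $VOLUME -i $INSTANCE -d /dev/sdh
    sleep 30
    ssh $SSHOPTS root@$HOST "mount /dev/sdh /mnt/backup"

    # Only changed data crosses the wire, since rsync runs over ssh
    rsync -az -e "ssh $SSHOPTS" /home/data/ root@$HOST:/mnt/backup/

    # Tear everything down again
    ssh $SSHOPTS root@$HOST "umount /mnt/backup"
    ec2-detach-volume $VOLUME
    ec2-terminate-instances $INSTANCE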

It’s quite a fragile thing: you have to make sure everything is executed in the correct paths and with the correct environment variables set. In addition, the data returned by the control commands is simply awked from their output, so it could easily break if the control package were updated. The final nails in the coffin for me were my increased backup storage requirements for photos, audio, and video, which are huge and can quickly change the economics of remote backup. Even for a slimmed-down set of documents I found the process too slow and fragile for my needs. In the end I have gone back to hauling hard drives with data backups off site and using rsync locally to sync these periodically with my live storage.
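
For reference, the local half of that arrangement is a one-liner; the paths here are placeholders for my live storage and the backup drive when it’s attached:

    # Mirror live storage onto the backup drive; --delete removes files
    # from the copy that no longer exist on the source
    rsync -av --delete /srv/storage/ /mnt/backup-drive/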

*Edited 2/2/09 to fix the several places where I mistakenly called EC2 “EC3”, although I knew better. Thanks to the commenter for pointing this out!

3 Comments.

  1. You keep saying EC3 instead of EC2…

  2. Take a look at rsync.com; it’s a service that lets you rsync to Amazon S3 without having to deal with an EC2 machine.

  3. correction:

    Take a look at s3rsync.com; it’s a service that lets you rsync to Amazon S3 without having to deal with an EC2 machine.