Making the cloud yours.

So I have my dedicated server set up with all my cloud replacement apps and services: email, pictures, status.net, Jabber and a blog. That's great, but how do I ensure that my data is safe from failure? One of the great advantages of using web services in the cloud is that they promise to keep your data safe from corruption or loss. When you move away from these apps, you can also lose this safety net. Until now, I was performing nightly rsyncs to a remote server over SSH. It worked OK, but had some disadvantages:

  • Costly in terms of hardware
  • Costly in terms of bandwidth
  • Is that storage medium really that safe?
  • Requires a lot of effort to make snapshots
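
For reference, a nightly job like the one I was running boils down to a single rsync call over SSH. The sketch below is only an illustration: the directories, the host `backup.example.com` and the optional key path are hypothetical placeholders, and the helper names are made up for this example. The `--delete` flag shows why snapshots take extra effort: the remote copy is a mirror of today's state, so yesterday's version of a file is simply overwritten.

```python
import shlex
import subprocess

# Hypothetical example values; substitute your own directories and host.
SOURCE_DIRS = ["/home", "/etc", "/var/mail"]
REMOTE = "backup@backup.example.com:/srv/backups/myserver/"


def build_rsync_command(sources, remote, ssh_key=None):
    """Return an argv list for an archive-mode rsync pushed over SSH."""
    ssh = "ssh"
    if ssh_key is not None:
        ssh += " -i " + shlex.quote(ssh_key)
    # -a: archive mode (recursive; keeps permissions, times, symlinks)
    # -z: compress data in transit
    # --delete: drop remote files that were deleted locally, so the remote
    #           copy is a plain mirror of today's state, not a snapshot
    return ["rsync", "-az", "--delete", "-e", ssh, *sources, remote]


def run_backup(cmd):
    """Run the command, raising CalledProcessError if rsync fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it; call run_backup(cmd)
    # from a nightly cron job once the paths are right.
    cmd = build_rsync_command(SOURCE_DIRS, REMOTE)
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping snapshots on top of this means rotating dated copies on the backup server yourself, which is exactly the effort mentioned in the list above.
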
Having a second server sitting there just for backups isn’t really the cheapest solution, or the most robust. I went looking for a “service” on the internet to play with. FTP hosts seemed to be too expensive, and backup providers were either geared towards home users or fairly expensive. I then read up on Amazon S3, a cheap, fast cloud storage service.

Amazon lets you store as much as you want, and only charges you for the resources you use, at ludicrously cheap rates. Now, you could rush off, sign up for Amazon S3 and use a tool called s3cmd to sync your data with the cloud; however, there are some drawbacks.

s3cmd supports encryption, but only with the “put” command, not with the “sync” command. So if you want encryption (through GPG), you have to do a full backup each time. The other issue is that it provides no snapshotting features.
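
To make the "full backup each time" problem concrete, here is a rough sketch of what an encrypted s3cmd run ends up looking like: pack the whole tree into a tarball, then `put` it with `-e`. It assumes GPG encryption has already been configured via `s3cmd --configure`; the function names and bucket name are hypothetical. Every run tars and uploads everything, whether or not it changed.

```python
import tarfile
import time
from pathlib import Path


def make_tarball(source_dir, out_dir):
    """Pack all of source_dir into a timestamped .tar.gz and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = Path(source_dir).name or "root"
    archive = Path(out_dir) / f"{name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # The whole tree goes in every time: there is no incremental step.
        tar.add(source_dir, arcname=name)
    return archive


def build_put_command(archive, bucket):
    """argv for an encrypted upload; -e makes s3cmd GPG-encrypt before sending.

    Assumes encryption was already set up with `s3cmd --configure`.
    """
    archive = Path(archive)
    return ["s3cmd", "put", "-e", str(archive), f"s3://{bucket}/{archive.name}"]
```

Usage would be something like `subprocess.run(build_put_command(make_tarball("/home", "/var/tmp"), "my-backup-bucket"), check=True)`, where `my-backup-bucket` is a placeholder.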

The other option is s3fs, a FUSE module that mounts an S3 bucket as if it were a real disk. It works great, but offers no encryption.
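
As a sketch of the s3fs route, mounting takes a credentials file of the form `ACCESS_KEY:SECRET_KEY` (which s3fs expects to be readable only by its owner) plus a mount command. The helper names, bucket and paths here are hypothetical. Anything you then copy onto the mount point lands in the bucket as-is, unencrypted.

```python
import os


def write_passwd_file(path, access_key, secret_key):
    """Write s3fs credentials as ACCESS_KEY:SECRET_KEY, readable only by the owner."""
    # Create with mode 0600 so the secret is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"{access_key}:{secret_key}\n")
    # Tighten permissions in case the file already existed with looser ones.
    os.chmod(path, 0o600)
    return path


def build_mount_command(bucket, mountpoint, passwd_file):
    """argv that mounts the bucket at mountpoint via FUSE."""
    return ["s3fs", bucket, str(mountpoint), "-o", f"passwd_file={passwd_file}"]
```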

Eventually I came across some well-hidden software called Duplicity, which can back up to multiple backends, including S3. It uses GPG for encryption and librsync to provide incremental updates. Duplicity backs up everything into 25MB GPG-encrypted, compressed tarballs and fires them off to S3. The advantage of this is low bandwidth usage and a low number of requests (every 1000 requests to S3 cost $0.01). Take care to keep a copy of the GPG key used for the encryption somewhere else, though. Duplicity allows you to restore data from a certain time or date, and only ever uploads data that has changed.
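
Here is a back-of-the-envelope sketch of why the volume approach keeps request counts down, together with the shape of the backup and restore commands. The 25MB volume size and $0.01 per 1000 requests are the figures quoted above (prices may well have changed since); the cost estimate assumes one PUT per volume and ignores the small manifest and signature files. The GPG key ID and the S3 URL are hypothetical, the exact S3 URL scheme depends on your duplicity version (check its man page), and setting up AWS credentials is not shown.

```python
import math

VOLUME_MB = 25             # volume size quoted in this post
PRICE_PER_1000_REQ = 0.01  # S3 request price quoted in this post (USD)


def estimate_upload(changed_mb, volume_mb=VOLUME_MB):
    """Rough volume count and request cost for one backup run.

    Assumes one PUT per volume and ignores the small manifest and
    signature files duplicity also uploads.
    """
    volumes = math.ceil(changed_mb / volume_mb)
    cost = volumes / 1000 * PRICE_PER_1000_REQ
    return volumes, cost


def backup_command(source, target_url, key_id, volsize=VOLUME_MB):
    """argv for an encrypted backup; duplicity does a full run the first
    time and incrementals afterwards."""
    return ["duplicity", "--encrypt-key", key_id, "--volsize", str(volsize),
            source, target_url]


def restore_command(target_url, restore_dir, when):
    """argv to restore the state as of `when`, e.g. "3D" (three days ago)."""
    return ["duplicity", "restore", "--time", when, target_url, restore_dir]


if __name__ == "__main__":
    volumes, cost = estimate_upload(10 * 1024)  # a 10 GB first full backup
    print(f"{volumes} volumes, about ${cost:.4f} in PUT requests")
```

Run as a script, the estimate prints `410 volumes, about $0.0041 in PUT requests`, and later incremental runs only pay for the volumes holding changed data.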

The disadvantages are that you cannot easily browse the backup's tree structure, and that you need to back up a GPG private key. However, I believe this is a small price to pay for such good backups.
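
Backing up the key itself is a one-off job. A minimal sketch, with a hypothetical key ID and file name: export an ASCII-armored copy of the secret key and keep it somewhere that is neither the server nor the S3 bucket, since losing the key means losing the ability to decrypt every backup.

```python
def export_secret_key_command(key_id, out_file):
    """argv that writes an ASCII-armored copy of the secret key to out_file."""
    return ["gpg", "--armor", "--output", out_file, "--export-secret-keys", key_id]


def import_key_command(key_file):
    """argv that loads the exported key back into a keyring on a new machine."""
    return ["gpg", "--import", key_file]
```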