Getting Started With Boto and Glacier

Amazon recently released Glacier, a new web service designed to store rarely accessed data. Thanks to boto, a Python interface to Amazon Web Services, it’s very easy to store archives in Glacier and retrieve them.

If you have never heard of Amazon Glacier, you should read the Amazon Glacier FAQ and the Amazon Glacier developer guide first.

The basics

With Glacier, a backed-up file is an archive stored in a vault. To make an analogy with Amazon S3, an archive is like a key and a vault is like a bucket.

To download an archive, or even to get the vault inventory, you must first initiate a job, which completes within 3-5 hours. You can optionally be notified via the Amazon Simple Notification Service when the job finishes; only then can you download the result.

Also, Amazon specifies that you should maintain your own inventory.
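
To give an idea of what an inventory looks like once the job has completed, here is a minimal sketch that turns the inventory JSON into a local lookup table. It assumes the field names from the Glacier developer guide (ArchiveList, ArchiveId, ArchiveDescription, CreationDate, Size); the function name index_inventory and the sample values are made up for illustration.

```python
import json


def index_inventory(inventory_json):
    """Map each archive description to its id, size and creation date.

    `inventory_json` is the JSON text returned by a completed
    inventory job (field names assumed from the developer guide).
    """
    inventory = json.loads(inventory_json)
    index = {}
    for archive in inventory.get("ArchiveList", []):
        index[archive["ArchiveDescription"]] = {
            "archive_id": archive["ArchiveId"],
            "size": archive["Size"],
            "created": archive["CreationDate"],
        }
    return index


# Made-up sample, shaped like an inventory job output
sample = json.dumps({
    "VaultARN": "arn:aws:glacier:us-east-1:000000000000:vaults/myvault",
    "InventoryDate": "2012-09-01T00:00:00Z",
    "ArchiveList": [
        {
            "ArchiveId": "EXAMPLE-ARCHIVE-ID",
            "ArchiveDescription": "mybackup.tgz",
            "CreationDate": "2012-08-31T12:00:00Z",
            "Size": 1024,
            "SHA256TreeHash": "example-tree-hash",
        }
    ],
})

index = index_inventory(sample)
```

Since the description is the only human-readable handle in the inventory, it is worth setting it to something meaningful (the file name, for example) when uploading.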

Getting started with boto

Here is the bare minimum needed to store and retrieve an archive. You should also check the API Reference.

import boto

ACCESS_KEY_ID = "XXXXX"
SECRET_ACCESS_KEY = "XXXXX"

# boto.connect_glacier is a shortcut that returns a Layer2 instance
glacier_connection = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID,
                                    aws_secret_access_key=SECRET_ACCESS_KEY)

vault = glacier_connection.create_vault("myvault")

# Uploading an archive
# ====================

# You must keep track of the archive_id
archive_id = vault.upload_archive("mybackup.tgz")

# Retrieving an archive
# =====================

# You must initiate a job to retrieve the archive
retrieve_job = vault.retrieve_archive(archive_id)

# or if the job is pending (with job_id = retrieve_job.id)
# retrieve_job = vault.get_job(job_id)

# You can check if the job is completed either manually, or via Amazon SNS
if retrieve_job.completed:
    retrieve_job.download_to_file("mybackup.tgz")
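
If you don't use Amazon SNS, the manual check can be wrapped in a small polling loop. This is only a sketch: wait_for_job and its parameters are hypothetical names, and it relies solely on vault.get_job and the job's completed attribute used in the snippet above.

```python
import time


def wait_for_job(vault, job_id, poll_seconds=900, max_polls=32, sleep=time.sleep):
    """Poll vault.get_job(job_id) until the job is completed.

    Returns the completed job, or None if it is still pending
    after `max_polls` checks. `sleep` is injectable for testing.
    """
    for attempt in range(max_polls):
        job = vault.get_job(job_id)
        if job.completed:
            return job
        # Don't sleep after the last check, we are giving up anyway
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return None
```

With the defaults (a check every 15 minutes, 32 checks) the loop gives up after a little under 8 hours, which leaves some margin over the 3-5 hours a job usually takes. Once it returns a job, you can call download_to_file on it as above.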

That’s it!

Keeping track of the inventory

I chose to use shelve to store both the inventory and the waiting jobs.

Here is a simple class that can help you get started:
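
A minimal sketch of such a class, assuming one shelve file that holds two dictionaries: filename to archive id, and filename to pending job id. The class name GlacierShelve and its method names are hypothetical, and it deliberately leaves out the boto calls so it only handles the bookkeeping.

```python
import shelve


class GlacierShelve(object):
    """Local record of uploaded archives and waiting retrieval jobs."""

    def __init__(self, path):
        self.path = path

    def _read(self, section):
        db = shelve.open(self.path)
        try:
            return dict(db.get(section, {}))
        finally:
            db.close()

    def _write(self, section, entries):
        db = shelve.open(self.path)
        try:
            db[section] = entries
        finally:
            db.close()

    def _set(self, section, key, value):
        entries = self._read(section)
        entries[key] = value
        self._write(section, entries)

    def _delete(self, section, key):
        entries = self._read(section)
        entries.pop(key, None)
        self._write(section, entries)

    # Inventory: filename -> archive_id
    def add_archive(self, filename, archive_id):
        self._set("archives", filename, archive_id)

    def get_archive_id(self, filename):
        return self._read("archives").get(filename)

    def remove_archive(self, filename):
        self._delete("archives", filename)

    def list_archives(self):
        return self._read("archives")

    # Waiting jobs: filename -> job_id
    def add_job(self, filename, job_id):
        self._set("jobs", filename, job_id)

    def get_job_id(self, filename):
        return self._read("jobs").get(filename)

    def remove_job(self, filename):
        self._delete("jobs", filename)
```

The idea is to call add_archive right after vault.upload_archive, add_job right after vault.retrieve_archive (with retrieve_job.id), and remove_job once the archive has been downloaded. The shelve file is opened and closed on every call, which is slow but keeps the data safe if the script is interrupted.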

Bakthat

You may also want to check out bakthat, a Python tool I wrote that lets you compress, encrypt (symmetric encryption) and upload files directly to Amazon S3/Glacier. You can use it either from the command line or as a Python module. There is also BakManager, an app that monitors your backups and notifies you when a backup doesn’t happen (it works well with bakthat).

Your feedback

Don’t hesitate to ask if you have any questions!
