Amazon recently released Glacier, a new web service designed to store rarely accessed data. Thanks to boto, a Python interface to Amazon Web Services, it’s very easy to store/retrieve archives from Glacier.
With Glacier, a backed-up file is an archive stored in a vault.
To make an analogy with Amazon S3, an archive is like a key and a vault is like a bucket.
To download an archive, or even just to get the inventory, you must first initiate a job, which typically completes within 3-5 hours; you can optionally be notified via the Amazon Simple Notification Service (SNS) when it is done, and then you can download the result.
Here is the bare minimum needed to store and retrieve an archive. You should also check the API Reference.
```python
import boto

ACCESS_KEY_ID = "XXXXX"
SECRET_ACCESS_KEY = "XXXXX"

# boto.connect_glacier is a shortcut that returns a Layer2 instance
glacier_connection = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID,
                                          aws_secret_access_key=SECRET_ACCESS_KEY)

vault = glacier_connection.create_vault("myvault")

# Uploading an archive
# ====================

# You must keep track of the archive_id
archive_id = vault.upload_archive("mybackup.tgz")

# Retrieving an archive
# =====================

# You must initiate a job to retrieve the archive
retrieve_job = vault.retrieve_archive(archive_id)

# or if the job is pending (with job_id = retrieve_job.id)
# retrieve_job = vault.get_job(job_id)

# You can check whether the job is completed either manually or via Amazon SNS
if retrieve_job.completed:
    retrieve_job.download_to_file("mybackup.tgz")
```
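When an inventory-retrieval job completes, its output is a JSON document listing every archive in the vault. Here is a minimal sketch of parsing it with the standard library; the payload below is a made-up sample that follows Glacier's documented inventory format, not real output:

```python
import json

# Trimmed, made-up example of the JSON an inventory-retrieval job returns
inventory_json = """
{
  "VaultARN": "arn:aws:glacier:us-east-1:123456789012:vaults/myvault",
  "InventoryDate": "2012-08-30T12:00:00Z",
  "ArchiveList": [
    {
      "ArchiveId": "EXAMPLE-ARCHIVE-ID",
      "ArchiveDescription": "mybackup.tgz",
      "CreationDate": "2012-08-29T22:00:00Z",
      "Size": 1024,
      "SHA256TreeHash": "deadbeef"
    }
  ]
}
"""

inventory = json.loads(inventory_json)

# Collect the archive ids you will need for later retrieve_archive calls
archive_ids = [a["ArchiveId"] for a in inventory["ArchiveList"]]
print(archive_ids)  # ['EXAMPLE-ARCHIVE-ID']
```

This is where keeping your own local record of archive ids pays off: without one, listing what you have stored means waiting hours for an inventory job.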
That’s it!
Keeping track of the inventory
I chose to use shelve to store both the inventory and waiting jobs.
Here is a simple class that can help you get started: