DEV Community

Nakul Kurane

Posted on • Originally published at Medium

Backing Up My Logic Pro Projects to AWS S3 (Part 1)

On a whim, just to play with Python and AWS, I decided to build a script to back up my Logic Pro projects to AWS S3. The premise: the script checks a directory on my MacBook and uploads any files that have been modified since they were last uploaded to S3. It runs via a cron job (which, yes, only fires if my MacBook is awake). Anyway, let’s get started.

As for how I built it, some prerequisites:

  • PyCharm or any IDE
  • Python 3.0+
  • AWS account (free tier should suffice for testing)
  • Basic understanding of Python/programming

This post won’t go through every code snippet (you’ll be able to find the full code on my GitHub, coming soon), but it will walk through the steps I took to assemble the components.


The goal: compare files in a local folder with the corresponding files in S3, and zip and upload a local file if it is newer (by last modified date time) than the copy on S3.

Part 1

In Part 1 of implementing this, we will simply go over how to connect to S3 via Python and how to get the last modified date time for a later comparison.

So, we’ll go through the snippets for these steps, or you can skip all of this and just check out the code on GitHub (coming soon).

Connecting to S3

How do you connect to AWS S3? Fortunately, AWS offers an SDK for Python called boto3, so you simply need to install this module and then import it into your script.

I am using Python 3.7 and pip3, so my command was pip3 install boto3.

Import the following:



```python
# import AWS modules
import boto3
import logging
from botocore.exceptions import ClientError
```


You also need the AWS CLI (AWS command line interface), so go ahead and install it with pip3 install awscli.

Now, you will need to configure the AWS CLI to connect to your AWS account, so you should have an IAM user and group for the CLI. Go to the IAM service in your AWS console to create them.

Creating IAM User and Group for AWS CLI

First, create a Group and attach a policy (I used AdministratorAccess for simplicity; a policy scoped to S3, such as AmazonS3FullAccess, would be safer). Then, create a User and assign it to the Group you just created. Continue to the end of the wizard, and eventually you will see a button to download a CSV containing your credentials. These are the credentials you will use to configure the AWS CLI.

Have the Access Key ID and Secret Access Key ready, and in your Terminal, type aws configure. You will be prompted for the Access Key ID and Secret Access Key, so simply paste the values from the CSV. You will also be asked for a region name (enter the region where your S3 buckets live). The last prompt asks for an output format (you can just press Enter).
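The prompt sequence looks roughly like this (the key values below are placeholders, not real credentials):

```shell
$ aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: us-east-1
Default output format [None]:
```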

Great! Now, your aws cli should be configured to connect to your AWS S3 bucket(s).

Create S3 Client and Resource

So, next we need to create a client and a resource for accessing S3 methods (the client and resource offer different methods based on usage, you can read more here).




```python
s3Client = boto3.client('s3')
s3Resource = boto3.resource('s3')

# handle to a specific S3 bucket (set bucket_name to your bucket's name)
bucket = s3Resource.Bucket(bucket_name)
```


Retrieve object's last modified date time (upload time)

Ok, so what needs to be done to retrieve an object’s last modified date time?

I’ve written a method to loop through the objects in a bucket and call an AWS method to get the last modified time, shown below (yes, this can be written other ways as well).

The bucket.objects.all() returns a Collection which you can iterate through (read more on Collections here).

So you can see I begin looping through that Collection, only reading last_modified if the S3 object contains Logic_Projects and .zip in the key name.

If you’re not familiar, the key is simply how S3 identifies an Object. I am looking for Logic_Projects because I made a folder with that name and am also checking for .zip just in case I upload something else in the folder by accident (but this script will only upload zips to that folder, so it’s just a safety check).

If it passes those checks, then I proceed to the date time conversions.

You’ll notice I am calling two other methods in this loop, called utc_to_est and stamp_to_epoch before I finally return a value. That is to make datetime comparisons easier later on. The last_modified method returns a date in UTC format so it could look like this: 2019-12-07 19:47:36+00:00.
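Since the original snippet appeared as an image, here is a minimal sketch of that loop. The function name and the bucket/file_name parameters are my own illustrative choices, and the EST/epoch conversion steps are inlined rather than split into the helper functions, so this is an approximation of the approach, not the original code:

```python
from datetime import timedelta, timezone

# fixed EST offset (UTC-5); this sketch ignores daylight saving time
EST = timezone(timedelta(hours=-5), 'EST')

def get_last_modified_epoch(bucket, file_name):
    # loop through every object in the bucket (a boto3 Collection)
    for s3_object in bucket.objects.all():
        # the key is how S3 identifies the object, e.g. 'Logic_Projects/song.zip'
        if 'Logic_Projects' in s3_object.key and '.zip' in s3_object.key:
            if file_name in s3_object.key:
                # last_modified is an aware UTC datetime; convert to EST,
                # then to seconds since the Epoch for easy numeric comparison
                return int(s3_object.last_modified.astimezone(EST).timestamp())
    return None  # no matching object found
```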

When I am comparing modification times, I’d rather just compare numbers. The date AWS returns is in UTC, also referred to as GMT (Greenwich Mean Time).

So I added a function to convert this datetime to Eastern Standard Time, shown below:
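The original function was shown as an image; here is a minimal stand-in, assuming a fixed UTC-5 offset (true EST). Note this ignores daylight saving time, which a timezone library like pytz or zoneinfo would handle properly:

```python
from datetime import timedelta, timezone

# fixed EST offset (UTC-5); ignores daylight saving time
EST = timezone(timedelta(hours=-5), 'EST')

def utc_to_est(utc_dt):
    # shift an aware UTC datetime (e.g. 2019-12-07 19:47:36+00:00) into EST
    return utc_dt.astimezone(EST)
```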

Now that we’re in the right timezone, I want to convert it to a number, so I converted it to an Epoch time, which is the number of seconds that have elapsed since 00:00:00 UTC on January 1, 1970 (Why Epoch time is this date is not in the scope of this post).
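The name stamp_to_epoch comes from the post; the body below is my guess at an equivalent, since Python's aware datetimes can produce epoch seconds directly:

```python
def stamp_to_epoch(dt):
    # timestamp() on an aware datetime returns seconds since
    # 00:00:00 UTC on January 1, 1970; truncate to a whole number
    return int(dt.timestamp())
```

Because epoch time is an absolute count of seconds, the result is the same whether you pass the UTC or the EST version of the same instant; the EST conversion mainly helps when printing human-readable times.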

OK, that's all for Part 1! In this post, we've connected to AWS via Python (boto3) and we've created a method to get the last modified time (in seconds since Epoch and in EST) of an object in S3.

In Part 2, we will go over comparing the last modified time of a file in S3 with the last modified time of that file locally.

