Kyle Mistele

Botocore is awful, so I wrote a better Python client for AWS S3

If you've ever been unfortunate enough to have had to work with botocore, Amazon Web Services' Python API, you know that it's awful. There are dozens of ways to accomplish any given task, and the differences between each are unclear at best. I recently found myself working with botocore while trying to build some S3 functionality into codelighthouse, and I got really frustrated with it, really quickly.

AWS S3 (Simple Storage Service) is not complicated - it's object storage. You can GET, PUT, DELETE, and COPY objects, with a few other functionalities. Simple, right? Yet for some reason, if you were to print botocore's documentation for the S3 service, you'd come out to about 525 printed pages.

I chose to use the Object API, which is the highest-level API provided by the S3 resource in botocore, and it was still a headache. For example, the Object API doesn't throw different types of exceptions - it throws one type of exception, which has numerous properties that you have to programmatically analyze to determine what actually went wrong.
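
To give a sense of what that looks like, here's a rough sketch of the kind of error handling you end up writing against boto3/botocore's resource API - the bucket and key names are placeholders, and everything gets funneled through a single ClientError that you have to pick apart yourself:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')
obj = s3.Object('some-bucket', 'some-key')  # placeholder bucket and key

try:
    body = obj.get()['Body'].read()
except ClientError as e:
    # one exception type for everything - you have to inspect the error code yourself
    error_code = e.response['Error']['Code']
    if error_code == 'NoSuchKey':
        print('the object does not exist')
    elif error_code == 'NoSuchBucket':
        print('the bucket does not exist')
    else:
        raise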

There are a few open-source packages out there already, but I found that most of them left a lot to be desired - some had you writing XML, and others were just more complicated than they needed to be.

To save myself from going mad trying to decipher the docs, I wrote a custom high-level driver that consumes the low-level botocore API to perform the most common S3 operations.

To save other developers from the same fate I narrowly avoided, I open-sourced the code and published it on PyPI so you can easily use it in all of your projects.

Let's Get Started

Installing my custom AWS S3 Client

Since my client code is hosted on PyPI, it's super easy to install:

pip install s3-bucket

Configuring the S3 Client

To access your S3 buckets, you're going to need an AWS access key ID and an AWS secret access key. I wrote a method that you can pass these to in order to configure the client so that you can use your buckets. I strongly suggest not hard-coding these values in your code, since doing so can create security vulnerabilities and is bad practice. Instead, I recommend storing them in environment variables and using the os module to fetch them:

import s3_bucket as S3
import os

# get your key data from environment variables
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')

# initialize the package
S3.Bucket.prepare(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

Using the S3 Client

I designed the S3 Client's API to be logically similar to how AWS structures S3 buckets. Instead of messing around with botocore's Client, Resource, Session, and Object APIs, there is one simple API: the Bucket API.

The Bucket API

The Bucket API is simple and provides most of the basic methods you'd want to use for an S3 bucket. Once you've initialized the S3 client with the keys as described in the previous section, you can initialize a Bucket object by passing it a bucket name:

bucket = S3.Bucket('your bucket name')

#example
bucket = S3.Bucket('my-website-data')

Once you've done that, it's smooth sailing - you can use any of the following methods:

Method - Description
bucket.get(key) - returns a two-tuple containing the bytes of the object and a dict containing the object's metadata.
bucket.put(key, data, metadata=metadata) - uploads data as an object with key as the object's key. data can be either a str or a bytes type. metadata is an optional argument that should be a dict containing metadata to store with the object.
bucket.delete(key) - deletes the object in the bucket specified by key.
bucket.upload_file(local_filepath, key) - uploads the file specified by local_filepath to the bucket with key as the object's key.
bucket.download_file(key, local_filepath) - downloads the object specified by key from the bucket and stores it in the local file local_filepath.
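
Putting a few of those together, a typical round trip looks something like this - the bucket name, keys, and metadata values here are just placeholders:

# store an object with some optional metadata
bucket = S3.Bucket('my-website-data')
bucket.put('users/42/profile', 'some string or bytes', metadata={'author': 'kyle'})

# read it back - data is bytes, metadata is a dict
data, metadata = bucket.get('users/42/profile')

# and delete it when you're done
bucket.delete('users/42/profile')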

Custom Exceptions

As I mentioned earlier, the way that botocore raises exceptions is somewhat arcane. Instead of raising different types of exceptions to indicate different types of problems, it throws one type of exception that contains properties that you must use to determine what went wrong. It's really obtuse, and a bad design pattern.

Instead of relying on your client code to decipher botocore's exceptions, I wrote custom exception classes that you can use to handle most common types of S3 errors.

Exception - Cause (and properties)
BucketException - the super class for all other Bucket exceptions; can be used to generically catch exceptions raised by the API (properties: bucket, message)
NoSuchBucket - raised if you try to access a bucket that does not exist (properties: bucket, key, message)
NoSuchKey - raised if you try to access an object that does not exist within an existing bucket (properties: bucket, key, message)
BucketAccessDenied - AWS denied access to the bucket you tried to access; it may not exist, or you may not have permission to access it (properties: bucket, message)
UnknownBucketException - botocore threw an exception which this client was not programmed to handle (properties: bucket, error_code, error_message)

To use these exceptions, you can do the following:

try:
    bucket = S3.Bucket('my-bucket-name') 
    data, metadata = bucket.get('some key')
except S3.Exceptions.NoSuchBucket as e:
    # some error handling here
    pass
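
Because BucketException is the parent class of the others, you can also use it as a catch-all after handling the specific cases you care about - something like this, where the print statements are just illustrative:

try:
    bucket = S3.Bucket('my-bucket-name')
    data, metadata = bucket.get('some key')
except S3.Exceptions.NoSuchKey as e:
    # the bucket exists, but the object doesn't
    print(f'no object with key {e.key} in bucket {e.bucket}')
except S3.Exceptions.BucketException as e:
    # anything else the client raised
    print(f'S3 error for bucket {e.bucket}: {e.message}')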

Examples

Below I've provided examples of a couple of use cases for the S3 client.

Uploading and downloading files

This example shows how to upload and download files to and from your S3 bucket.

import s3_bucket as S3
import os

# get your key data from environment variables
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')

# initialize the package
S3.Bucket.prepare(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# initialize a bucket
my_bucket = S3.Bucket('my-bucket')

# UPLOAD A FILE
my_bucket.upload_file('/tmp/file_to_upload.txt', 'myfile.txt')

# DOWNLOAD A FILE
my_bucket.download_file('myfile.txt', '/tmp/destination_filename.txt')

Storing and retrieving large blobs of text

The reason that I originally built this client was to handle storing and retrieving large blobs of JSON data that were way too big to store in my database. The below example shows you how to do that.

import s3_bucket as S3
import os

# get your key data from environment variables
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')

# initialize the package
S3.Bucket.prepare(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# initialize a bucket
my_bucket = S3.Bucket('my-bucket')

# some json string
my_json_str = '{"a": 1, "b": 2}'  # an example JSON string

my_bucket.put('json_data_1', my_json_str)

data, metadata = my_bucket.get('json_data_1')

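If you're storing real JSON rather than a hand-written string, you'd typically serialize with the json module before the put and decode the bytes on the way back - roughly like this (the key name is arbitrary):

import json

payload = {'a': 1, 'b': 2}

# serialize to a JSON string before storing it
my_bucket.put('json_data_2', json.dumps(payload))

# get() returns bytes, so decode before parsing
data, metadata = my_bucket.get('json_data_2')
restored = json.loads(data.decode('utf-8'))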

Conclusion

I hope that you find this as useful as I did! Let me know what you think in the comments below.

If you're writing code for cloud applications, you need to know when things go wrong. I built CodeLighthouse to send real-time application error notifications straight to developers so that you can find and fix errors faster. Get started for free at codelighthouse.io today!

Top comments (6)

Arhire Ionut

It looks very nice.

Although I'm not familiar with the technology, I'm still going to ask some questions in the hopes that I learn something new and that maybe it helps you too.

For the NoSuchBucket exception, why does the exception contain the 'key' property?

Shouldn't UnknownBucketException be instead called UnknownBucket just to be consistent with the naming style?

Does the bucket constructor have an overload where you can give it the credentials so that you can use a single line instead of two for initialisation?

Is there serialization going on for the 'put' method? Or is the user expected to handle that?

Kyle Mistele

Hey, these are great questions.

The NoSuchBucket exception doesn't actually contain a key; that appears to be a typo I'll have to fix.

The UnknownBucketException doesn't mean the Bucket is unknown, it means that an unknown exception occurred, so the naming makes sense.

The bucket constructor doesn't have an overload where you can provide credentials. The intent is that you can configure the module with your credentials once, and then create as many bucket objects as you want that correspond to different S3 buckets without having to re-specify them. I'll definitely consider that in a future version though!

The put method can take either a bytes type or a str type. The user is expected to provide serialization. Since there are so many possible different use cases, it didn't make sense to try and anticipate all of them and write serialization routines for every possible type of data.

Thanks for the feedback, Arhire!

Arhire Ionut

Thanks for the replies. Highly appreciated.

I got confused and didn't realise what UnknownBucketException meant. Maybe a name like BucketUnknownException would be more clear? Or even, UnknownException?

Also, I now see there's also BucketException which I understand why it would have the Exception suffix (obviously) but it still doesn't follow the naming style. Now that I'm revisiting the exception names of the python standard library, I see they make no sense. It's beyond me why most of them end with 'Error' (error is something very different than exception in my book) and others don't even have this termination (like KeyboardInterrupt. Well, it kinda makes sense for it not to be labeled as an exception but still...) Also you've got Exception. I'm really confused right now :)).

The overload for the bucket constructor is as important as the probability of occurrence for the use case of needing only a single bucket. If most users only use a single bucket (which I don't know because I'm not familiar with this technology) then the overload is important because it makes working with the library easier and more intuitive (plus they don't need to learn an additional method). If that use case is rare, then the overload can be easily postponed.

I don't have much experience with serialization in Python but from the little experience I have, it can be quite unintuitive. You know better when you say that there are a lot of different use cases but I'm thinking, as long as you provide a default mechanism that will work for most cases, most users will be guarded from having to think about serialization altogether. I'm thinking of trying to serialize, and if the library can't do it, just raise an exception and the user can then provide her own serialization.

Also, is upload_file overwriting the file in case it already exists?

Kyle Mistele

Worth noting, the Bucket.upload_file and Bucket.download_file methods will handle serialization and deserialization for you

pomfrit123

Great, but is it async?

Kyle Mistele

It's not, but you could easily wrap it with asyncio to achieve async functionality.
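
For example, something along these lines with asyncio.to_thread (Python 3.9+) would push the blocking call onto a worker thread - just a sketch, not part of the library:

import asyncio
import s3_bucket as S3

async def fetch_object(bucket, key):
    # run the blocking get() in a thread so the event loop stays free
    return await asyncio.to_thread(bucket.get, key)

data, metadata = asyncio.run(fetch_object(S3.Bucket('my-bucket'), 'some key'))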