Originally published on my personal blog georgeoffley.com
A hybrid infrastructure comes with plenty of exciting challenges. Although we host a great deal of our software in AWS at my company, we cannot do everything in the cloud, so we run a lot of physical infrastructure as well. Making the two work together is something we strive to get right on the software team. One of the things we are working towards is capturing images and using software to detect our yields from them. This post covers one piece of that puzzle: storage for our images.
We decided to use a combination of AWS services: Amazon Simple Storage Service (S3) for image storage and DynamoDB for holding the metadata of said images. Given that we are getting information straight from hardware, many things might go wrong between capturing the pictures and pushing them to AWS. This brings us to this evening’s question: How can I be sure these services are available for me to send stuff to?
Well, as it turns out, there are a few ways this can be done. For example, there are libraries out there that will scan health check websites to see if AWS has any service outages. This would not be a great way to do health checks for a production application. So, I decided to spike this problem and make something myself. I am not worried about AWS services being out as they have high availability using their different availability zones. I am more concerned about our endpoints failing, internet issues, or Cloverfield monsters. So, this needs to be explored.
A simple solution for checking the health of my resources was needed. Luckily, I quickly put something together using the Boto3 library, which is the AWS SDK for Python. This library gives us easy access to the AWS API for configuring and managing services. The first thing I did was create an object class to utilize the Client class in Boto3.
We only need to pass in our access credentials and the name of the service we want a client for, and we get our client object back. Each AWS service in Boto3 can be reached through the Client class. The docs define the Client class as “a low-level client representing whatever service”. In most cases, you would use it to access the various functions for interacting with the service.
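A minimal sketch of such a wrapper might look like the following. It is an illustration, not the exact class from this spike: the name `AwsClients` and the `client_builder` parameter are assumptions made here, and `client_builder` exists only so the class can be exercised without real credentials. By default it falls back to `boto3.client`.

```python
class AwsClients:
    """Hypothetical wrapper that hands out low-level Boto3 clients."""

    def __init__(self, access_key_id, secret_access_key, region_name,
                 client_builder=None):
        if client_builder is None:
            # Imported here so a stand-in builder can be supplied in tests
            # on machines where Boto3 is not installed.
            import boto3
            client_builder = boto3.client
        self._build = client_builder
        self._settings = {
            "aws_access_key_id": access_key_id,
            "aws_secret_access_key": secret_access_key,
            "region_name": region_name,
        }

    def get(self, service_name):
        """Return a client for a service name such as "s3" or "dynamodb"."""
        return self._build(service_name, **self._settings)
```

Calling `AwsClients(key, secret, "us-east-1").get("s3")` would then return an S3 client. Boto3 can also pick up credentials from environment variables or an instance role, which avoids passing keys around in code.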
After that, I put together some simple logic to return some information on the resource we are looking for. In our case, we were trying to get access to a bucket where we will store images. This solution is enough to satisfy me that the resource exists, and I can communicate with it. Below is the code I used for S3.
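A minimal sketch of this kind of check, written as an illustration rather than the exact code from the spike, could look like this. The function name `check_bucket` and the `(healthy, detail)` return shape are assumptions. It catches the client's general `ClientError`, which service exceptions such as `NoSuchBucket` derive from, and then anything else, since for a health check a network failure means "unhealthy" just as much as a missing bucket does.

```python
def check_bucket(s3_client, bucket_name):
    """Return (healthy, detail) for an S3 bucket using head_bucket()."""
    try:
        response = s3_client.head_bucket(Bucket=bucket_name)
    except s3_client.exceptions.ClientError as err:
        # A missing bucket (404) or missing permission (403) lands here.
        code = err.response.get("Error", {}).get("Code", "unknown")
        return False, f"S3 returned error code {code}"
    except Exception as err:
        # DNS failures, timeouts, and similar problems also mean "unhealthy".
        return False, f"could not reach S3: {err}"
    status = response["ResponseMetadata"]["HTTPStatusCode"]
    return status == 200, f"HTTP {status}"
```

With a real client this would be called as `check_bucket(boto3.client("s3"), "my-image-bucket")`, where the bucket name is a placeholder.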
The code above sets up a new client instance and uses the head_bucket() function. This is great for seeing if a bucket exists and if the entity polling it has permission to access it. In my case, I only need to see if I get a response back. So, I pass in the bucket name, and I receive a 200 response from the server if the resource is there and I have access to it. I like this approach because it is dead simple, and I also get to use the NoSuchBucket exception that the client object exposes, which keeps the exception handling concise. One caveat: as far as I can tell, head_bucket() often reports a missing bucket as a generic ClientError with a 404 code, so it is worth handling that case too.
There were some questions about the limitations of using something like this. We expect to use this frequently to poll S3 and make sure that we can talk to our bucket. If AWS is not reachable, we need to turn off the spigot and stop our software from sending stuff to AWS, rather than lose messages in the void of space. That said, we will be polling a few times a second at least; luckily for us, S3 upped its request rates to 3,500 requests per second for adding data and 5,500 per second for retrieving data, per prefix. This gives us plenty of room to poll what we need.
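Turning off the spigot could be sketched as a small gate that re-runs a health check at most once per interval and reports whether it is safe to send. The class name `HealthGate`, the half-second default interval, and the injectable `clock` are assumptions for illustration; `check` is any zero-argument function that returns True when the resource is reachable.

```python
import time


class HealthGate:
    """Hypothetical gate that caches a health check between polls."""

    def __init__(self, check, min_interval_seconds=0.5, clock=time.monotonic):
        self._check = check
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._last_checked = None
        self._healthy = False

    def is_open(self):
        """Return True if the most recent health check passed."""
        now = self._clock()
        due = (self._last_checked is None
               or now - self._last_checked >= self._min_interval)
        if due:
            try:
                self._healthy = bool(self._check())
            except Exception:
                # A check that blows up counts as a failed check.
                self._healthy = False
            self._last_checked = now
        return self._healthy
```

A sender would call `gate.is_open()` before each upload and hold the message locally when it returns False. Caching the result keeps the number of requests to AWS bounded no matter how often the sender asks.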
We can also use the client object that we created above to access DynamoDB. As such, the code is below:
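As an illustration of the idea rather than the exact code from the spike, a DynamoDB check along these lines might look like the sketch below. The function name `check_table` and the `(healthy, detail)` return shape are assumptions, as is the choice to treat only an `ACTIVE` table as healthy.

```python
def check_table(dynamodb_client, table_name):
    """Return (healthy, detail) for a DynamoDB table using describe_table()."""
    try:
        response = dynamodb_client.describe_table(TableName=table_name)
    except dynamodb_client.exceptions.ResourceNotFoundException:
        # The table does not exist, or its metadata is not visible yet.
        return False, "table not found"
    except Exception as err:
        # DNS failures, timeouts, and similar problems also mean "unhealthy".
        return False, f"could not reach DynamoDB: {err}"
    status = response["Table"]["TableStatus"]
    # Assumption: only ACTIVE counts as healthy; CREATING or DELETING do not.
    return status == "ACTIVE", f"table status {status}"
```

With a real client this would be called as `check_table(boto3.client("dynamodb"), "image-metadata")`, where the table name is a placeholder.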
The above code snippet does the same thing as the S3 code does. We create a new instance, and we use the describe_table() function while passing in the table name. This function returns information about the table, including the status. Also, note that the ResourceNotFoundException is another custom exception provided by the Dynamo Client object. This bit of code satisfies what I need to be able to check the status of a table. Yay!
Using this method comes with similar challenges. The describe_table() function uses an eventually consistent query against the table's metadata, so you can get out-of-date data (or a ResourceNotFoundException) if you poll a table you just created; give it a second. Frequent polling is also not free. On a provisioned table I am budgeting for it as if it takes up one read per second, although describe_table() is a control-plane call and, as far as I know, is throttled separately from the table's read capacity. Either way, we will need to make sure this is accounted for when we start designing our database.
The above simple bit of code was a brief spike for a solution we needed to explore. This write-up was inspired by a lot of the help I received from my fellow AWS Community Builders. Checking the health and status of services is one of many things that we will build out using AWS. I am excited to keep up my learning and building. If you have seen or made other stuff to accomplish this type of work, let me know! I would love to learn more.