<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Field</title>
    <description>The latest articles on DEV Community by Michael Field (@kaijuking).</description>
    <link>https://dev.to/kaijuking</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F484854%2F8ba82c74-d191-41cb-bddd-25dc8a314150.jpeg</url>
      <title>DEV Community: Michael Field</title>
      <link>https://dev.to/kaijuking</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kaijuking"/>
    <language>en</language>
    <item>
      <title>#CloudGuruChallenge – Cloud Resume on AWS</title>
      <dc:creator>Michael Field</dc:creator>
      <pubDate>Tue, 11 May 2021 02:32:39 +0000</pubDate>
      <link>https://dev.to/kaijuking/cloudguruchallenge-cloud-resume-on-aws-wip-4hen</link>
      <guid>https://dev.to/kaijuking/cloudguruchallenge-cloud-resume-on-aws-wip-4hen</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This post is about the &lt;a href="https://cloudresumechallenge.dev/"&gt;Cloud Resume&lt;/a&gt; challenge posted by &lt;a href="https://www.linkedin.com/in/forrestbrazeal/"&gt;Forrest Brazeal&lt;/a&gt; from "A Cloud Guru". #CloudGuruChallenge&lt;/p&gt;

&lt;p&gt;The challenge was to create a static website for your "Cloud Resume". As I journeyed through the cloud I wound up registering and creating four static websites since, as the saying goes, practice makes perfect. I wasn't worried about the extra cost associated with the three other domains as I plan to use them for other personal projects and/or sandboxing. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Finished Project&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mfieldcloudresumechallenge.com/"&gt;MField Cloud Resume&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;

&lt;p&gt;Other test sites&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://asinglepicture.com/"&gt;A Single Picture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.justasinglepicture.com/"&gt;Just A Single Picture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://linksthatareuseful.com/"&gt;Links That Are Useful&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now onto the details of the project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3o6Mb2IGbQhix0dZLO/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3o6Mb2IGbQhix0dZLO/giphy.gif" alt="Marge's Resume"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tools&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I used &lt;a href="https://code.visualstudio.com/"&gt;VSCode&lt;/a&gt; as my IDE, and I'm loving it more and more each time I use it. Check it out! It's free!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Requirements&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I organized the requirements into three parts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Front-end Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Include the AWS Cloud Practitioner certificate (&lt;em&gt;WIP&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Be built with HTML, CSS and stored in an S3 bucket&lt;/li&gt;
&lt;li&gt;Use Route 53 to host the domain and configure DNS&lt;/li&gt;
&lt;li&gt;Use HTTPS for security via CloudFront&lt;/li&gt;
&lt;li&gt;Use JavaScript to send a request to the backend to get the visitor count&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Back-end Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Store the visitor count in DynamoDB&lt;/li&gt;
&lt;li&gt;An API Gateway endpoint which invokes a Lambda Function&lt;/li&gt;
&lt;li&gt;A Lambda function that can access the database to retrieve/update the visitor count and send a response back to the front-end&lt;/li&gt;
&lt;li&gt;Infrastructure as Code (IaC) using the SAM CLI to create, build and deploy the template.yaml file that generates the resources in AWS&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GitHub CI/CD
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Two private GitHub repos, one for each component&lt;/li&gt;
&lt;li&gt;GitHub Actions invoked whenever new changes are merged

&lt;ul&gt;
&lt;li&gt;Front-end: need to invalidate the CloudFront cache&lt;/li&gt;
&lt;li&gt;Back-end: run python tests, build and deploy SAM 'template.yaml' file&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Front-end Components&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/xT5LMTZqXSxIOVagGk/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/xT5LMTZqXSxIOVagGk/giphy.gif" alt="Resume Needs Work"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  HTML &amp;amp; CSS
&lt;/h3&gt;

&lt;p&gt;I used a free, open sourced &lt;a href="https://github.com/StartBootstrap/startbootstrap-bare"&gt;starter bootstrap template&lt;/a&gt; and modified the content as needed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;index.html: personal and resume related background info&lt;/li&gt;
&lt;li&gt;projects.html: highlights other AWS projects &amp;amp; Cloud Challenges&lt;/li&gt;
&lt;li&gt;404.html: custom error page returned by CloudFront distribution whenever user navigates to a non-existent page&lt;/li&gt;
&lt;li&gt;default.js: contains the JavaScript which requests the visitor count&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  JavaScript
&lt;/h4&gt;

&lt;p&gt;I attempted two ways to send a request to the Visitor Count API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch"&gt;Fetch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.w3schools.com/xml/xml_http.asp"&gt;XMLHttpRequest&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although both methods worked, I eventually went with XMLHttpRequest since it made non-200 status codes easier to handle; a Fetch promise doesn't reject on an HTTP error status, only on network failure. &lt;/p&gt;

&lt;p&gt;When testing the site locally I used a VSCode plugin called &lt;a href="https://marketplace.visualstudio.com/items?itemName=ritwickdey.LiveServer"&gt;Live Server&lt;/a&gt;, which serves your site from a local development server. It was during this testing that I noticed my visitor count kept incrementing each time I refreshed and/or navigated back to the main index.html page. &lt;/p&gt;

&lt;p&gt;To help prevent this I decided to use the browser's Session Storage, which is checked when the page loads.&lt;/p&gt;

&lt;p&gt;If the Session Storage item doesn't exist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send a request to the API endpoint to update and return the updated Visitor Count. The count value is then parsed from the response, stored in the Session Storage and displayed on the page.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the Session Storage item does exist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve the current value from the Session Storage and display the value on the page.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Per the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/sessionStorage"&gt;MDN&lt;/a&gt; documentation, Session Storage persists as long as the browser tab is open. Once the user closes the tab the session ends and the stored data is cleared.&lt;/p&gt;

&lt;h4&gt;
  
  
  AWS Resources
&lt;/h4&gt;

&lt;h5&gt;
  
  
  S3 Bucket
&lt;/h5&gt;

&lt;p&gt;AWS documentation found &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/website-hosting-custom-domain-walkthrough.html"&gt;here&lt;/a&gt; was my main source of information when setting up and configuring the S3 bucket. I found it very interesting how the "Access" level of the bucket changed throughout the entire project. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After creating the bucket

&lt;ul&gt;
&lt;li&gt;Access = Bucket and objects not public&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;After disabling the "Block all public access" setting

&lt;ul&gt;
&lt;li&gt;Access = Objects can be public&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;After editing the bucket policy

&lt;ul&gt;
&lt;li&gt;Access = Public&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;After creating and configuring CloudFront

&lt;ul&gt;
&lt;li&gt;Access = Objects can be public&lt;/li&gt;
&lt;li&gt;Details = &lt;em&gt;The bucket is not public but anyone with appropriate permissions can grant public access to objects.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  AWS Certificate Manager
&lt;/h5&gt;

&lt;p&gt;For the most part things went smoothly. However, there were some hiccups along the way.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For one of my test sites I figured I'd try creating a site with all the resources in the US West region. When configuring the CloudFront distribution I was surprised to see the cert was not listed. Attempting to manually enter the cert failed, but luckily the AWS error message explained why: when using CloudFront, the certificate must be requested/imported in the US East (N. Virginia) region (us-east-1), as stated &lt;a href="https://docs.aws.amazon.com/acm/latest/userguide/acm-regions.html"&gt;here&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;Remember to click the "Create record in Route 53" button. Otherwise one could spend a while wondering why the request was taking so long. (true story! lol)&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  CloudFront
&lt;/h5&gt;

&lt;p&gt;Configuring a CloudFront distribution was (and still is) a rather daunting task considering all the available settings and options.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;General

&lt;ul&gt;
&lt;li&gt;Make sure to fill in the CNAME(s) properly. A few times I skipped over or forgot this part and then wondered why I wasn't able to view the site.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Origin Access Identity

&lt;ul&gt;
&lt;li&gt;I found that creating a new identity worked better than reusing an existing one.&lt;/li&gt;
&lt;li&gt;Select the "Yes, Update Bucket Policy" option. Doing so automatically updates the S3 bucket's policy, which is how I noticed the bucket's access changed from "Public" to "Objects can be public". Otherwise you will have to manually update the bucket policy to grant the distribution access.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Custom Error Pages

&lt;ul&gt;
&lt;li&gt;When navigating to a non-existent page, the response returned by CloudFront is a very user-unfriendly "403 Access Denied" page. To mitigate this I created a custom error response which returns the 404.html page from the S3 bucket.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Back-end Components&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3orieZsDE1JMM0k9RC/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3orieZsDE1JMM0k9RC/giphy.gif" alt="Lisa Work's On Resume"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure As Code
&lt;/h3&gt;

&lt;p&gt;Initial setup was rather simple. To start I used the "Hello World Example" template which one can initialize through the SAM CLI. In the command line run the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step 1: Type in &lt;em&gt;sam init&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Step 2: Select "AWS Quick Start Templates"&lt;/li&gt;
&lt;li&gt;Step 3: Select "Zip (artifact is a zip uploaded to S3)"&lt;/li&gt;
&lt;li&gt;Step 4: Select your runtime of choice (ex: Python3.8)&lt;/li&gt;
&lt;li&gt;Step 5: Give your sam-app a name&lt;/li&gt;
&lt;li&gt;Step 6: Select template #1 (Hello World Example)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Following the above steps will auto-generate a template which you can then modify to suit your needs. For this project, I modified the following three pieces.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;template.yaml file&lt;/li&gt;
&lt;li&gt;Lambda Function&lt;/li&gt;
&lt;li&gt;python tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The template.yaml file defines a serverless Lambda function configured with events, which generates an &lt;a href="https://github.com/awslabs/serverless-application-model/blob/master/docs/internals/generated_resources.rst#api"&gt;implicit API&lt;/a&gt;. After my modifications, the final template consisted of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DynamoDB to store the visitor count&lt;/li&gt;
&lt;li&gt;Lambda Function with policy allowing access to

&lt;ul&gt;
&lt;li&gt;DynamoDB&lt;/li&gt;
&lt;li&gt;Parameter Store to retrieve the database name&lt;/li&gt;
&lt;li&gt;API Events to request the data from the backend&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Lambda Function has two "API Events":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Count

&lt;ul&gt;
&lt;li&gt;This path retrieves, updates and returns the new visitor count to the front-end (a minimal handler sketch follows this list).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Health

&lt;ul&gt;
&lt;li&gt;I wanted the CI/CD workflow to fail, and not even get to the SAM build/deploy portion, if the API itself was down. To accomplish this I wrote a test which asserts the health response's status. If the returned response code is not the expected value, then the test fails, which in turn stops the CI/CD workflow.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;
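
&lt;p&gt;To make that concrete, here is a minimal sketch of how such a handler might look. This is not my exact code: the table name, key schema and paths are illustrative, and the Parameter Store name is a placeholder.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import boto3

# Hypothetical names; the real table name comes from Parameter Store.
ssm = boto3.client("ssm")
TABLE_NAME = ssm.get_parameter(Name="visitor-count-table-name")["Parameter"]["Value"]
table = boto3.resource("dynamodb").Table(TABLE_NAME)


def lambda_handler(event, context):
    # The Health event simply reports that the API is reachable.
    if event.get("path", "").endswith("/health"):
        return {"statusCode": 200, "body": json.dumps({"status": "healthy"})}

    # The Count event atomically increments the counter and reads it back.
    response = table.update_item(
        Key={"id": "visitor_count"},
        UpdateExpression="ADD visit_count :inc",
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="UPDATED_NEW",
    )
    count = int(response["Attributes"]["visit_count"])
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*"},
        "body": json.dumps({"count": count}),
    }
&lt;/code&gt;&lt;/pre&gt;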

&lt;p&gt;I also used &lt;a href="https://learning.postman.com/docs/writing-scripts/script-references/test-examples/#testing-response-body"&gt;Postman&lt;/a&gt; to locally verify the endpoints. Postman is an absolutely awesome tool for testing APIs and I highly recommend it to anyone!&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yL6q-qRC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qbr88q2sak37qtdjxjdv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yL6q-qRC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qbr88q2sak37qtdjxjdv.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;GitHub CI/CD&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3orieMNuGBoS4aq4h2/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3orieMNuGBoS4aq4h2/giphy.gif" alt="Marge At Work"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final portion of the project was to create a CI/CD pipeline using GitHub Actions workflows. I was nervous at first since I had never used this feature before, but I was able to put together a workflow for both the front-end and back-end code. The initial step was to add a workflow to the private repository. This can be done by going to the "Actions" tab of the repository and creating the suggested "Simple Workflow". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KZptrMhU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7elbskvh1bluq553oc05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KZptrMhU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7elbskvh1bluq553oc05.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From there it was a matter of using other resources (i.e. Google) as well as trial and error to fine-tune the workflow. Part of the process was using an AWS user with only the permissions necessary for the job at hand. To accomplish this I created a new Group with a custom policy consisting of only the necessary permissions, then created a new User and assigned it to that Group.&lt;/p&gt;

&lt;p&gt;Back-End&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bDa0wYjW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9qak7297fs9xn8729l69.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bDa0wYjW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9qak7297fs9xn8729l69.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Front-End&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OA1jw-9O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8qz65wwq53yfm0mx8lr2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OA1jw-9O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8qz65wwq53yfm0mx8lr2.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Closing Thoughts&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/rOaQLY67fj6sE/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/rOaQLY67fj6sE/giphy.gif" alt="Upload Resume"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This challenge is a really great starting point for anyone who is interested in learning what the cloud is and the resources AWS provides. It's a highly recommended stepping stone for anyone who wants to start learning by doing. Once you start you'll get bitten by the 'cloud bug' and want to learn and do more. And what better way to showcase that knowledge than building and deploying your resume using the technologies inside the (AWS) cloud itself?&lt;/p&gt;

&lt;p&gt;Where to go from here? Well, as of this writing I am scheduled to take the AWS Cloud Practitioner exam in early June 2021 and will spend the bulk of my time preparing for it. Other areas of interest and projects include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the AWS &lt;a href="https://aws.amazon.com/cdk/"&gt;Cloud Development Kit&lt;/a&gt; to redo my &lt;a href="https://github.com/kaijuking/japanese-word-of-the-day"&gt;Twitter Bot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hosting a &lt;a href="https://aws.amazon.com/getting-started/hands-on/build-react-app-amplify-graphql/"&gt;React application&lt;/a&gt; on AWS&lt;/li&gt;
&lt;li&gt;The #100DaysOfCode challenge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thank you again Forrest! This was a lot of fun! And thank you to everyone who took the time to read this post. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;[Update - June 12, 2021]&lt;/strong&gt;&lt;br&gt;
I took and passed the AWS Cloud Practitioner certification exam on June 11th! Click &lt;a href="https://www.credly.com/badges/9b17d979-5d64-4e06-8aae-3e49fcce804e/public_url"&gt;here&lt;/a&gt; to view my certificate. &lt;/p&gt;

</description>
    </item>
    <item>
      <title>#CloudGuruChallenge – Event-Driven Python on AWS</title>
      <dc:creator>Michael Field</dc:creator>
      <pubDate>Thu, 08 Oct 2020 05:02:51 +0000</pubDate>
      <link>https://dev.to/kaijuking/cloudguruchallenge-event-driven-python-on-aws-3gnp</link>
      <guid>https://dev.to/kaijuking/cloudguruchallenge-event-driven-python-on-aws-3gnp</guid>
      <description>&lt;h1&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Back in September 2020 &lt;a href="https://www.linkedin.com/in/forrestbrazeal/"&gt;Forrest Brazeal&lt;/a&gt; from A Cloud Guru posted a new #CloudGuruChallenge, which was to build an event-driven ETL (Extract, Transform, Load) application.&lt;/p&gt;

&lt;p&gt;Challenge Details can be found &lt;a href="https://acloudguru.com/blog/engineering/cloudguruchallenge-python-aws-etl"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Background&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Before going into the details of the #CloudGuruChallenge, I would like to mention that this was only my second attempt at creating an application using AWS. At the time the new challenge was announced I had just started a two-week staycation where my main goal was to create a &lt;a href="https://twitter.com/maikuonline"&gt;"Japanese Word of the Day"&lt;/a&gt; Twitter bot using Python. &lt;/p&gt;

&lt;p&gt;I had a very basic POC which worked locally on my machine, but I wanted to move it to the cloud. Doing so would give me an opportunity to learn something new, get more practice with Python, and free me from leaving the application running on my local machine. However, moving the app to the cloud also presented some new challenges.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What services should I use?&lt;/li&gt;
&lt;li&gt;Should I run the app on an EC2 instance? &lt;/li&gt;
&lt;li&gt;How would I store and update the data?&lt;/li&gt;
&lt;li&gt;How should I store sensitive data like the auth-keys/secrets?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3o6MbbwX2g2GA4MUus/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3o6MbbwX2g2GA4MUus/giphy.gif" alt="Homer Doesn't Understand"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In all, the application included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda Function #1: posts a new word to Twitter account&lt;/li&gt;
&lt;li&gt;Lambda Function #2: adds new words to the database &lt;/li&gt;
&lt;li&gt;CloudWatch Event #1: invokes Lambda Function #1 once a day&lt;/li&gt;
&lt;li&gt;CloudWatch Event #2: invokes Lambda Function #2 when a JSON file is uploaded to S3&lt;/li&gt;
&lt;li&gt;S3 Bucket: holds a JSON file consisting of new Japanese words&lt;/li&gt;
&lt;li&gt;DynamoDB: stores a list of Japanese words which Lambda Function #1 queries when posting a new word to Twitter&lt;/li&gt;
&lt;li&gt;AWS Systems Manager Parameter Store to hold the auth-keys/secrets&lt;/li&gt;
&lt;li&gt;AWS SAM CLI for building and deploying the app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted to mention this first because it gave me the confidence to believe that I could actually complete the new challenge. &lt;/p&gt;

&lt;p&gt;Repository for this project can be found &lt;a href="https://github.com/kaijuking/japanese-word-of-the-day"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Cloud Guru Challenge – Details&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The challenge was to create a Python compute job which would (once a day) extract a set of COVID-19 data and display the information on a dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;AWS Resources&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;All resources (Lambda, Events, S3, DynamoDB, AWS Glue, SNS) were created via the template.yaml file, which sounded easy at first but was actually very difficult for me to fully understand. Just when I thought I had it right (and with &lt;em&gt;sam build&lt;/em&gt; giving the green light) something would go wrong when attempting to deploy the app. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/d3QGYTziFiDL2/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/d3QGYTziFiDL2/giphy.gif" alt="Yoinks, and Away!"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sometimes the issue had to do with improper syntax, while other times I wasn't referencing the correct name or resource. Overall it was really cool and exciting being able to spin up AWS resources without having to use the console.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Don't forget to set the &lt;em&gt;PublicAccessBlockConfiguration&lt;/em&gt; settings to TRUE to block public access to your S3 resources.&lt;/p&gt;

&lt;p&gt;Resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/awsdocs/aws-lambda-developer-guide/tree/master/sample-apps/blank-python"&gt;aws lambda developer guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket-publicaccessblockconfiguration.html"&gt;AWS::S3::Bucket PublicAccessBlockConfiguration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Data Extraction/Filtering/Validation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In another first for me, I decided to use Python Pandas as the means of extracting, filtering and transforming the raw csv data. I'd heard of Pandas before but had never used it myself, nor did I have any real insight into how powerful the module is.&lt;/p&gt;

&lt;p&gt;I used Pandas to extract the raw data from the csv files and to perform some preliminary validation. I checked whether the raw data included the expected column names as well as the expected datatypes (&lt;em&gt;isnull()&lt;/em&gt;, &lt;em&gt;(map(type) == str).all()&lt;/em&gt;, etc.). My thinking was that the app should stop (and not proceed) if the raw data itself didn't contain the expected data. &lt;/p&gt;
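
&lt;p&gt;A minimal sketch of that validation step, assuming a hypothetical schema (the column names are examples, not the challenge's actual data):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

# Hypothetical schema for illustration.
EXPECTED_COLUMNS = {"date", "cases", "deaths"}


def validate_raw_data(df: pd.DataFrame) -&gt; pd.DataFrame:
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"raw data is missing columns: {missing}")
    if df["date"].isnull().any():
        raise ValueError("raw data contains null dates")
    # Dates arrive from the csv as strings; verify every value really is one.
    if not (df["date"].map(type) == str).all():
        raise ValueError("date column contains non-string values")
    if not pd.api.types.is_numeric_dtype(df["cases"]):
        raise ValueError("cases column is not numeric")
    return df
&lt;/code&gt;&lt;/pre&gt;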

&lt;p&gt;My biggest concern about using the Pandas module was the sheer size of the module itself, as it made up the bulk of the application's size. AWS provides Lambda Layers, and I attempted to use one to decrease the size/deploy time but wasn't able to figure it out. Taking the application's size into account (as well as any changes made by either &lt;a href="https://aws.amazon.com/blogs/compute/upcoming-changes-to-the-python-sdk-in-aws-lambda/"&gt;AWS&lt;/a&gt; or Python), I'm thinking it might be good to create one's own Lambda Layer library of modules. My thinking being that if there are any breaking changes/updates (AWS SDK, the module or Python), the application should still be able to run using the layer.&lt;/p&gt;

&lt;p&gt;Resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://realpython.com/pandas-python-explore-dataset/"&gt;Using Pandas and Python to Explore Your Dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/docs/user_guide/index.html"&gt;Pandas User Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Data Transformation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The final data transformation logic is contained in a separate Python module (&lt;em&gt;transformdata.py&lt;/em&gt;) where the two datasets are merged into one 'final' dataset.&lt;/p&gt;
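
&lt;p&gt;A minimal sketch of that merge, assuming both frames share a 'date' column (the function and column names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd


def merge_datasets(cases_df: pd.DataFrame, recoveries_df: pd.DataFrame) -&gt; pd.DataFrame:
    # An inner join keeps only the dates present in both sources.
    return pd.merge(cases_df, recoveries_df, on="date", how="inner")
&lt;/code&gt;&lt;/pre&gt;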

&lt;h3&gt;
  
  
  &lt;strong&gt;Uploading The Data&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Because I had "some" experience using DynamoDB with my Twitter bot, I thought it was going to be a piece of cake using the same database resource for this challenge. However, I should have investigated AWS QuickSight more, because if I had I would have learned that QuickSight doesn't yet support DynamoDB as a data source. With that said, my application basically has two data stores: DynamoDB and S3. &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;DynamoDB&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If the DynamoDB table has 0 items then the whole final dataset is uploaded. Otherwise I query the database (with date sorted descending) and return only one record (which should be the most recent date). I then use Pandas to filter the final dataset by that date so that only items with a date greater than the filtered date are uploaded to the DB. This avoids having to upload everything to the database each day.&lt;/p&gt;
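
&lt;p&gt;Sketched out, with an illustrative table name and key schema (not my actual one), the incremental-load logic looks something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import boto3
import pandas as pd
from boto3.dynamodb.conditions import Key

# Hypothetical table and keys.
table = boto3.resource("dynamodb").Table("covid-data")


def rows_to_upload(final_df: pd.DataFrame) -&gt; pd.DataFrame:
    # Newest item first, limited to a single record.
    result = table.query(
        KeyConditionExpression=Key("pk").eq("covid"),
        ScanIndexForward=False,
        Limit=1,
    )
    if not result["Items"]:
        return final_df  # empty table: upload the whole final dataset
    most_recent_date = result["Items"][0]["date"]
    # Keep only rows newer than what is already stored.
    return final_df[final_df["date"] &gt; most_recent_date]
&lt;/code&gt;&lt;/pre&gt;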

&lt;h4&gt;
  
  
  &lt;strong&gt;S3&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Since I use S3 to feed the QuickSight dashboard, I simply upload the entire dataset to S3 as a csv file. This also avoids having to worry about older dates receiving updates to any of the counts (cases, deaths, recovered). &lt;/p&gt;
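
&lt;p&gt;The upload itself is short; a minimal version in the spirit of the gist linked below (bucket and key names are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import io
import boto3


def upload_csv(final_df, bucket="my-dashboard-bucket", key="final_data.csv"):
    buffer = io.StringIO()
    final_df.to_csv(buffer, index=False)
    # QuickSight reads this csv from S3 as its dataset.
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8")
    )
&lt;/code&gt;&lt;/pre&gt;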

&lt;p&gt;Resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/uhho/a1490ae2abd112b556dcd539750aa151"&gt;Uploading CSV to S3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Notification&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The notifications I send aren't the most user-friendly nor the prettiest to look at, but they do tell the user how many items were added to the database.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bIZ5ZkJG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/2cm15dy6can8fln6crne.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bIZ5ZkJG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/2cm15dy6can8fln6crne.PNG" alt="AWS SNS Email Notification"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Tests&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I have six tests (each containing several scenarios) which verify the 'raw' data and the transformed data, making sure the expected outcomes occur. &lt;/p&gt;
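
&lt;p&gt;As an example of the shape these take (reusing the hypothetical &lt;em&gt;merge_datasets&lt;/em&gt; sketch from above), one scenario might look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import unittest

import pandas as pd

from transformdata import merge_datasets  # hypothetical import


class TestTransformData(unittest.TestCase):
    def test_merge_keeps_matching_dates(self):
        cases = pd.DataFrame({"date": ["2020-10-01"], "cases": [5], "deaths": [1]})
        recovered = pd.DataFrame({"date": ["2020-10-01"], "recovered": [2]})
        merged = merge_datasets(cases, recovered)
        self.assertEqual(len(merged), 1)
        self.assertIn("recovered", merged.columns)


if __name__ == "__main__":
    unittest.main()
&lt;/code&gt;&lt;/pre&gt;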

&lt;p&gt;Resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://realpython.com/python-testing/"&gt;Getting Started With Testing in Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.python.org/3/library/unittest.html"&gt;unittest — Unit testing framework&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Dashboard&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When I found out QuickSight doesn't support DynamoDB I had to quickly create an S3 resource. Once that was done I was able to connect QuickSight to the S3 dataset and put together a chart showing the number of cases vs recovered vs deaths over time. I have to say (despite the name) QuickSight wasn't very quick for me to catch on to and use. The user interface didn't seem very intuitive, though I take that with a grain of salt since this was my first experience using any sort of Business Intelligence (BI) service. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H_XqUglZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zn0rlf3eoyvgawj3ctmy.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H_XqUglZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zn0rlf3eoyvgawj3ctmy.PNG" alt="Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Repository&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Code can be found &lt;a href="https://github.com/kaijuking/cgc-eventdriven-etl"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Looking Back&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;First and foremost I want to thank Forrest and everyone at A Cloud Guru for creating these challenges, as it was (without a doubt) an amazing learning adventure. I'd also like to thank my current job for 1) paying for a 3-day AWS training camp back in December 2019 and 2) giving us access to the resources on A Cloud Guru itself. &lt;/p&gt;

&lt;p&gt;In my current job position, the extent of my direct interaction with AWS itself is via the console, and mostly for troubleshooting. To be able to actually create resources and deploy an application using those resources is an amazing feeling. I can't stress enough what a great learning opportunity these #CloudGuruChallenges are.&lt;/p&gt;

&lt;p&gt;Moving forward I want to continue looking into/pursuing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda layers&lt;/li&gt;
&lt;li&gt;CI/CD&lt;/li&gt;
&lt;li&gt;Further usage and insight into AWS QuickSight&lt;/li&gt;
&lt;li&gt;More usage of Python Pandas and Unit Tests&lt;/li&gt;
&lt;li&gt;Completing the previous Cloud Resume challenge&lt;/li&gt;
&lt;li&gt;Studying for the AWS Certified Cloud Practitioner exam&lt;/li&gt;
&lt;li&gt;Planning better before jumping into code&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
  </channel>
</rss>
