I was doing my usual daily Twitter scrolling when I saw this:
"Introducing the #CloudResumeChallenge. I'm volunteering my network to help you get your first job in the cloud. But I can only share a certain kind of resume. Thread -> forrestbrazeal.com/2020/04/23/the…" (17:19 PM - 27 Apr 2020)
I love the idea of motivating people to move into the Cloud, and after reading the challenge I was immediately hooked (you know, WCGW with another side project?).
I’ve been working in software development for the last 5 years and have previous experience with the Cloud. I’m currently working as a Site Reliability Engineer, and the company I work for recently decided to go “all-in” on the Cloud. The challenge would help me refresh my memory and learn new things (after about a year away from the Cloud, everything seemed new).
The idea is to have a static website (your resume) deployed online using a serverless API to count the number of visitors. You can read the full instructions.
Enough talk, let’s dive into my journey of doing the challenge.
Frontend
So far, almost all my programming experience has been on the backend side. I always wanted to have my own blog/website (you know, like a cool kid). Hopefully, this will be the kick-start I need.
I have been following @kentcdodds' blog and getting more hooked on React. I tried to follow all the examples but never got to finish a full project (I tried multiple times with side projects and failed constantly). Maybe this time's the charm.
Therefore, I decided to build the website with React using Next.js so I can forget about things like module bundling and JavaScript compilation for the time being (yes, I’m looking at you, Webpack and Babel). I also avoided component libraries and things like Bootstrap or tailwindcss so I could learn the basics of React and CSS (hence why my resume looks so ugly 😄).
Tip: if you want to increase the chances of your resume grabbing recruiters' and companies' attention, I recommend buying The Tech Interview Inside-Out book by @gergelyorosz. I've already applied some of its suggestions to my resume.
Once I got the static site done, all that was left was to deploy the HTML and JavaScript files online. Here is where the Cloud part starts. First, I used S3 to serve the files. Later on, I added a CloudFront web distribution as a content delivery network (CDN) in front, together with Certificate Manager to handle the SSL/TLS certificate needed to serve traffic over HTTPS. All of this sits under a custom domain name managed by Route 53.
Here is a brief overview of the architecture:
Backend
Recently a friend of mine asked for help debugging some issues with his company website, which runs on a serverless architecture. So I had already had my fair share of reading documentation and understanding the concepts behind API Gateway, Lambda proxy integration, and SAM. Now that I know more about what a serverless architecture entails, things should not be that hard, right?
The main logic lives in a simple Python Lambda function that talks to a DynamoDB table, increments the visits counter for the domain, and returns its current value. I used Amazon API Gateway to expose the Lambda function as the backend for the static site. By default, API Gateway uses an edge-optimized API endpoint, which puts an AWS-managed CloudFront distribution in front of it to improve client connection time. Since we already have a CloudFront distribution, we can choose a regional API endpoint instead and reuse the distribution that serves the static site. Here you can read a detailed explanation of how to do this.
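To make the counter logic more concrete, here is a minimal sketch of what such a handler could look like. It is not the code from my repo: the table name, the `domain` key, the `visits` attribute, and the `SITE_DOMAIN` constant are all assumptions made for illustration. It relies on DynamoDB's `ADD` update action, which treats a missing numeric attribute as zero, so the first visit does not need special handling.

```python
import json

SITE_DOMAIN = "example.com"  # hypothetical partition key value
TABLE_NAME = "visits"        # hypothetical table name


def increment_visits(table, domain):
    """Atomically add 1 to the counter stored for `domain` and return the new total."""
    response = table.update_item(
        Key={"domain": domain},
        # ADD creates the attribute when the item or attribute does not exist yet,
        # so no separate "initialise to zero" step is needed.
        UpdateExpression="ADD #visits :inc",
        ExpressionAttributeNames={"#visits": "visits"},
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="UPDATED_NEW",
    )
    # The boto3 resource API returns numbers as Decimal; convert for JSON.
    return int(response["Attributes"]["visits"])


def handler(event, context, table=None):
    """Lambda proxy integration entry point; `table` can be injected in tests."""
    if table is None:
        import boto3  # included by default in the AWS Python Lambda runtime

        table = boto3.resource("dynamodb").Table(TABLE_NAME)
    visits = increment_visits(table, SITE_DOMAIN)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"visits": visits}),
    }
```

Passing the table in as an argument keeps the function easy to test with a stub or with a local DynamoDB instance, without touching a real AWS account.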
Here is a brief overview of the architecture:
Infrastructure as Code (IaC)
I'm used to working with Terraform (and a bit of CloudFormation) to deploy and maintain infrastructure in the Cloud. But recently a new cool kid came to play, CDK, and of course now I want to give it a try.
CDK is a developer-friendly abstraction for managing cloud infrastructure as code. It is a higher-level abstraction than things like Terraform and CloudFormation: it lets you use a language you are already comfortable with (yes, even Terraform has its own DSL that you need to learn) and apply logic that might not be possible or pleasant to express with lower-level tools. I will not state which approach is better, since I don’t want to start a declarative vs. imperative discussion. On the positive side, CDK offers both low-level and high-level abstractions, so you can choose how to deploy things. Also, Terraform recently released its own CDK, so you can now apply the same IaC principles across different Cloud vendors.
I recommend having a look at aws-cdk-examples, which contains some good examples with best practices. I started my project from two of them, static-site and api-cors-lambda-crud-dynamodb, which I later modified and tuned to my needs.
The outcome:
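import * as cdk from '@aws-cdk/core';
import { Fn } from '@aws-cdk/core';
// Api and StaticSite are this project's own constructs; the import paths are placeholders.
import { Api } from './api';
import { StaticSite } from './static-site';

// Backend: API Gateway + Lambda + DynamoDB.
class MyAPIStack extends cdk.Stack {
  constructor(parent: cdk.App, name: string, props: cdk.StackProps) {
    super(parent, name, props);
    new Api(this, 'API', {
      siteDomain: this.node.tryGetContext('domain'),
    });
  }
}

// Frontend: S3 + CloudFront, wired to the API through the values exported by the API stack.
class MyStaticSiteStack extends cdk.Stack {
  constructor(parent: cdk.App, name: string, props: cdk.StackProps) {
    super(parent, name, props);
    new StaticSite(this, 'StaticSite', {
      siteDomain: this.node.tryGetContext('domain'),
      api: {
        id: Fn.importValue('ApiId'),
        originPath: Fn.importValue('ApiOriginPath'),
      },
    });
  }
}

const app = new cdk.App();
const env = {
  account: process.env.CDK_DEFAULT_ACCOUNT,
  region: process.env.CDK_DEFAULT_REGION,
};
new MyAPIStack(app, 'MyApi', { env });
new MyStaticSiteStack(app, 'MyStaticSite', { env });
app.synth();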
Two CDK stacks, StaticSite and Api (their architecture was explained above), which can be deployed separately as long as dependencies are respected: the Api stack exports the URL endpoint and origin path needed by the StaticSite stack, so CloudFront can route the traffic intended for the API to API Gateway and later on to the Lambda function (or CORS settings).
CI/CD
Well, here you should already know what’s going to happen: we are going to use the newest cool kid, GitHub Actions.
My experience with GitHub Actions has been a bit of a roller coaster. I was expecting more features out of the box: sharing artifacts and dependencies between workflows, and better/easier container integrations, e.g. services in jobs. On the other hand, I like the Marketplace and how easy it is to use or share Actions within the community.
Again, things are simple here: two workflows, one for the Static Site and one for the API:
- Static Site:
  - Build the website (Next.js export).
  - Deploy resources using CDK (the distributionPaths option invalidates the cache in the CloudFront distribution).
- API:
  - Test the Python code.
  - Build the asset code: install dependencies and zip all files.
  - Deploy resources using CDK.
Code Review
After finishing the challenge I added @forrestbrazeal (the real cool kid) as a collaborator, and he gave me a personalized code review. There was a bug with DynamoDB incrementing the visits counter when it started from an initial zero value, plus suggestions to use on-demand billing for DynamoDB and structured logging for the Lambda function.
The bug was an easy fix, and I added some tests to cover the interactions with DynamoDB. For this, I used the dynamodb-local container and the AWS CLI to initialize the table. Turning on on-demand billing was just a matter of setting some config on the AWS DynamoDB construct. And finally, adding structured logging was also easy thanks to aws-lambda-powertools-python.
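To give an idea of what structured logging means in practice, here is a minimal sketch that uses only the Python standard library. It is not the aws-lambda-powertools-python API (that library does much more, such as injecting Lambda context); the `JsonFormatter` class, the `build_logger` helper, and the `fields` key are names made up for this illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object instead of free text."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra key/value pairs passed via `extra={"fields": {...}}` (our own convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers when called repeatedly
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    demo_stream = io.StringIO()
    log = build_logger("visits-api", demo_stream)
    log.info("visit counted", extra={"fields": {"visits": 3}})
    print(demo_stream.getvalue(), end="")
```

Because every entry is one JSON object per line, log tooling can filter on fields such as `visits` or `level` instead of parsing free-form strings.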
Now the fun part begins.
When I added structured logging, I introduced an external dependency into the Lambda code (boto3 is also an external dependency, but it is included by default in the AWS Python Lambda runtime). Having just the Lambda Python file was no longer enough; now you need to bundle the dependencies: pip install -r requirements.txt --target=<output_dir> and zip all files.
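As a rough illustration of those two steps (install into a target directory, then zip it), here is a small sketch using only the Python standard library. The function names and directory layout are hypothetical, and this is not what my pipeline actually ran; it simply mirrors the pip and zip commands above.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def install_dependencies(requirements, target_dir):
    """Equivalent of: pip install -r requirements.txt --target=<output_dir>."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "-r", str(requirements), "--target", str(target_dir)],
        check=True,
    )


def zip_directory(source_dir, zip_path):
    """Zip every file under `source_dir`, keeping paths relative to it.

    `zip_path` should live outside `source_dir`, otherwise the archive
    would try to include itself.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(source_dir).as_posix())
    return zip_path
```

A typical sequence would be to copy the handler file into a build directory, call `install_dependencies("requirements.txt", build_dir)`, and then `zip_directory(build_dir, "function.zip")`, so the handler and its dependencies end up at the root of the archive.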
This is where SAM comes into play. It helps you build serverless applications and abstracts things like bundling asset code for you. Here is an easy-to-follow blog post about how to start working with SAM (written by a person also doing this challenge). You can use SAM together with CDK to test, build, and deploy serverless resources. Here is another blog post with a detailed explanation.
TL;DR:
cdk synth --no-staging > template.yaml
sam local invoke <function>
At first I was bundling assets manually inside the GitHub pipeline, then I was using SAM, and later I learned that the CDK Lambda construct already has built-in support for this. To keep things in sync, I decided to use CDK for both bundling and deploying the Lambda function.
code: lambda.Code.fromAsset('../api/app', {
  bundling: {
    image: lambda.Runtime.PYTHON_3_8.bundlingDockerImage,
    command: [
      'bash', '-c', 'cp -R /asset-input/* /asset-output && pip install -r requirements.txt -t /asset-output',
    ],
  },
}),
Downfall
After fixing the bug and applying the suggestions from the code review, I informed @forrestbrazeal that it was done. Now I’m happy: I have finished the challenge (only the blog post is left), it’s almost midnight, and it’s time to go to bed.
Wait, what? I had tested it before, and I had been playing with it for so long already (my counter was at 150 and I was the only one who knew the domain at that time).
I tried to see what the differences were between the deployed code and the latest committed changes with cdk diff, and nothing came up. I even reverted the latest commit locally and tried again, and the same thing happened.
Deploying things with multiple tools at the same time is not a good idea. I was doing manual deployments of the Lambda function through the AWS console and with SAM to test things before committing changes, which caused drift in CloudFormation (CDK delegates the deployment of resources to CloudFormation). CloudFormation's recorded state of the managed resources diverged from what was actually deployed, due to the manual actions on the Lambda function (this reminds me of the old times working with Terraform without storing the state on S3 with versioning enabled). Fortunately, CloudFormation now has drift detection, which tells you what diverged and on which resources.
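For reference, drift detection can also be triggered from code. Below is a minimal sketch against the boto3 CloudFormation client, assuming a stack name such as the hypothetical "MyApi"; the helper name is made up, and pagination of results is ignored for brevity. It is not something I ran that night, just an illustration of the flow: start detection, poll until it finishes, then list the resources that were modified or deleted.

```python
import time


def find_drifted_resources(cfn, stack_name, poll_seconds=2.0):
    """Run drift detection on a stack and return (logical_id, status) pairs.

    `cfn` is expected to behave like boto3.client("cloudformation").
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Detection is asynchronous, so poll until it leaves the in-progress state.
    while True:
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status["DetectionStatus"] != "DETECTION_COMPLETE":
        raise RuntimeError(f"Drift detection ended with {status['DetectionStatus']}")

    # Only the first page of results is read here.
    drifts = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )
    return [
        (d["LogicalResourceId"], d["StackResourceDriftStatus"])
        for d in drifts["StackResourceDrifts"]
    ]
```

With real credentials this would be called as `find_drifted_resources(boto3.client("cloudformation"), "MyApi")`. Not every resource property is covered by drift detection, so an empty result does not strictly prove that nothing changed.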
It's past midnight, you are tired, and there is no urgency in fixing this right away. What do you do?
Obviously, you try to fix it now so you can sleep in peace. Here is where I tried to be clever and decided to manually delete the Lambda function so it could be re-deployed by CDK.
Remember that I said there was drift in my CloudFormation stack, so how is manually deleting the Lambda function going to help? Now the state has diverged even more. To fix this you need to bring the state back in sync, which means manually editing resources until they match the CloudFormation state or editing the CloudFormation stack to match the current state.
As I said before, it is late and I want to go to bed. I don’t want to keep manually editing things until the drift is gone. Luckily we have used IaC, so it should be simple to delete everything and bring it back by just running cdk destroy && cdk deploy again, right? YOLO (please don’t do this with a production application).
This went surprisingly well, and after a few minutes all resources were destroyed and re-deployed. Now everything is working. I refreshed the page multiple times to make sure the counter works, so now I can report that it is ready and quickly go to sleep before I break something else 😄.
Conclusion
It was a fun journey; I learned and struggled more than I thought I would. If you are new to the Cloud or even to IT, don't worry if you are struggling a lot. To be honest, this “simple” challenge covers most of the best practices in software deployment that a production application should have.
Kudos again to @forrestbrazeal for this great initiative; I hope more people can enjoy this challenge. I’m open to suggestions or questions about how I did things. Also, feel free to reach out if you need help. Hopefully, in the coming days, I’ll add better explanations to the GitHub repo and write a follow-up post with a deeper focus on the technical side.
I almost forgot, this is the final outcome: https://luisyonaldo.com