I've been working with AWS for the past 6 years or so. I've run into many highs and lows along the way, with emotions ranging from feeling like a mythic god bending the cloud to my will to debating whether to seek a job doing something with my hands, far away from a computer.
I introspected over the past years and asked a few co-workers and friends what they wish they had known before they got started with AWS. I dug for more than "I wish I knew technology X better," because that is circumstantial and not all that helpful in many cases. Instead, view this as a compilation of observations from folks who have been in the cloud for a while: the things we think would have been nice to know.
AWS is a massive ecosystem. You can't understand it all, let alone be an expert in most of it.
In the early days when we were building Trek10 as a company (an AWS consulting shop), we dabbled in helping folks with Azure and other clouds. We quickly realized these projects drained resources and mental capacity, and simply put, distracted us from our ultimate goals of being recognized serverless and AWS experts. We decided to hyper-focus on our goals and reject all projects that didn't get us closer to that work.
This holds true for individuals as well. I make an effort to read at least one AWS service's documentation from cover to cover once a week. Rough calculations suggest I have spent 500+ hours just reading AWS service documentation (there are 221 doc repos on the official GitHub). I regularly reach out to AWS folks and colleagues for help understanding interactions between services, implications of a design choice, or even just an overview or refresher of a feature.
So don't take it personally on day 1 when nothing makes sense. This stuff is hard. My general suggestion is to have an end goal of what you want to build and break it into pieces you think are solvable by various services. Pick one service, read the docs cover to cover, and create that part to get your hit of endorphins. Then, wire that thing to a second service in the same manner.
Consider your knowledge base of AWS a living source. As features are added and best practices evolve, even what you knew a year ago may soon be obsolete. That's not bad, just a natural consequence of the cloud platform. Saying "I don't know, but I can find out" isn't a shameful thing, even to a client or boss.
Understanding a service's use-cases is very different than understanding the service.
I can tell you what most of the services on AWS do at a decent level of detail. Given a use case, I could probably rough out a couple of options for architectures pretty quickly. What I can't tell you is, based on exact needs, what problems you may run into with all of the service options that may fit. To do that, I would need to research service limits and other tidbits that are scattered around AWS documentation (notice a theme here yet?).
AWS services have limits, both service limits and other functional restrictions. Usually they are quite reasonable, but other times they will make your life genuinely miserable, especially if your system needs to scale or evolve and you didn't consider the limits ahead of time.
Here is a real-world example. Let's say you want to process a queue of transactions. We do some Googling and find that AWS has the Simple Queue Service (SQS). We look up the limits and see that there can be 120,000 in-flight messages by default and practically unlimited stored and incoming messages. Great, queue solved. To process the queue, we can hook things up to AWS Lambda. Lambda will handle processing messages as they come in, so no messages means no cost, and we can scale up to a high volume of transactions per second without much consideration.
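As a rough illustration of the "hook the queue up to Lambda" step, here is a minimal sketch of a Python Lambda handler that receives a batch of SQS messages. The `process_transaction` function and the `amount` field are hypothetical stand-ins for real business logic, and returning `batchItemFailures` only has an effect if partial batch responses are enabled on the event source mapping, which this sketch assumes.

```python
import json


def process_transaction(transaction):
    """Hypothetical business logic: here it only parses the amount."""
    return float(transaction["amount"])


def handler(event, context):
    """Entry point that Lambda invokes with a batch of SQS records."""
    total = 0.0
    failures = []
    for record in event.get("Records", []):
        try:
            transaction = json.loads(record["body"])
            total += process_transaction(transaction)
        except (KeyError, TypeError, ValueError):
            # Flag only this message as failed so the rest of the
            # batch does not have to be retried with it.
            failures.append({"itemIdentifier": record["messageId"]})
    print(f"processed total={total}, failed={len(failures)}")
    return {"batchItemFailures": failures}
```

Nothing in this handler knows or cares what order the messages arrive in, which is exactly the property that matters later in this story.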
Our system is humming along, but then we realize we are getting some problems in our backend during higher load times. Some numbers aren't adding up quite right. Ah, a race condition! Lambda and standard SQS queues do not guarantee order. We frantically search and find we can guarantee order with SQS FIFO (first in, first out). We are saved! Except... FIFO queues don't support Lambda function triggers.
That's a heck of a limitation to run into this late in the project.
We are going to have to rearchitect or reconsider our whole set of service choices to address this problem (perhaps with Amazon Kinesis which can guarantee order and still works great with Lambda)!
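To make the "numbers aren't adding up" symptom concrete, here is a small, self-contained sketch of why delivery order matters. The operation names (`deposit`, `set`) and the amounts are invented for illustration; the point is only that the same three messages produce different results depending on the order in which they are processed.

```python
def apply_all(operations, balance=0):
    """Apply each (kind, amount) operation to the balance in the order given."""
    for kind, amount in operations:
        if kind == "deposit":
            balance += amount
        elif kind == "set":
            # A correction that overwrites whatever came before it.
            balance = amount
    return balance


# The order the producer sent the messages in.
sent_order = [("deposit", 100), ("set", 50), ("deposit", 25)]

# One possible delivery order when ordering is not guaranteed.
delivered_order = [("set", 50), ("deposit", 100), ("deposit", 25)]

print(apply_all(sent_order))       # 75
print(apply_all(delivered_order))  # 175
```

If every operation were a plain deposit, the order would not matter. It is the mix of order-sensitive operations that turns unordered delivery into a bug, which is why "do we need to guarantee the order of anything?" is worth asking up front.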
You are going to get caught on sharp edges when you are exploring and learning. Your best defense is to review and re-review the limitations of the services before you choose them and ask in-depth questions about future states. Do we need to guarantee the order of anything? What size of payloads might we need? How many transactions per second if we are wildly successful? Do we need searchability of the data?
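One way to turn "how many transactions per second if we are wildly successful?" into a number is a back-of-the-envelope estimate: messages in flight are roughly the arrival rate multiplied by how long each message stays in flight (Little's law). The sketch below compares that estimate with the 120,000 in-flight default mentioned earlier. The traffic figures are hypothetical, and the current limit should be confirmed in the SQS documentation before relying on it.

```python
# Default in-flight limit quoted earlier in this post; verify against current docs.
SQS_STANDARD_INFLIGHT_LIMIT = 120_000


def estimated_inflight(messages_per_second, seconds_in_flight):
    """Little's law: items in the system = arrival rate * time in the system."""
    return messages_per_second * seconds_in_flight


def fits_within_limit(messages_per_second, seconds_in_flight,
                      limit=SQS_STANDARD_INFLIGHT_LIMIT):
    """True if the estimated in-flight count stays at or under the limit."""
    return estimated_inflight(messages_per_second, seconds_in_flight) <= limit


# Hypothetical scenarios: each message takes 30 seconds to process and delete.
print(estimated_inflight(2_000, 30), fits_within_limit(2_000, 30))  # 60000 True
print(estimated_inflight(5_000, 30), fits_within_limit(5_000, 30))  # 150000 False
```

The arithmetic is trivial, but running it against your "wildly successful" numbers before choosing a service is a cheap way to find a sharp edge early.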
Inline bonus thing: AWS Improves around you
The AWS savvy among you may have been shouting at your screen, "AWS Lambda supports FIFO queues!" and you would be absolutely right. It does now. This wasn't the case a year ago. This brings me to the point that a proper solution today may not be the best solution tomorrow.
One of the unique properties of AWS (or the cloud in general) is that the services may improve around you. DynamoDB and S3 have addressed numerous pitfalls like hot keys, scaling, etc. You'll see performance increases as AWS iterates on the underlying architectures. I've had solutions out in AWS Land that got more performant or started costing less purely because of AWS changes, without me having to do anything.
Cost is really hard; leveraging the cloud is just as much about the business and people as it is the technology.
There are a lot of arguments for and against the cloud on cost models. Nearly every case I have read or heard toys around with raw pricing comparisons. On one end we have, "An EC2 instance on AWS costs $$$ per year, hardware in my data center amortized costs $$ per year. You are silly if you move to the cloud." On the other end we have, "Lambda functions cost $ per year, and I can serve a billion transactions without needing to order new hardware ever."
Most arguments don't dive into what it takes to build and maintain a data center, or staff folks in the data center or colo. What isn't talked about is the cost to retrain staff to leverage the cloud: the opportunity cost of pursuing cloud and retraining, or of building a data center and staffing developers to maintain internal systems just to keep the data center working.
While some of these conversations are starting to happen, I think the thing I wish I had realized was that cloud cost is less about optimizing raw infrastructure cost and more about optimizing opportunity, maintenance, and people costs. This frame of mind would have made a lot of conversations much smoother early in my career.
Corey Quinn, AWS billing maverick, explains it better than I can.
Speed round
There was a lot to think about for this post. Here are some other things to consider.
- Always, always practice good AWS account hygiene. Separate AWS accounts for environments and projects to get a built-in security blast radius and more effective costing practices.
- AWS services sometimes aren't that practical or useful when they release, but they get progressively better. Check back on them as cases arise.
- It's usually cheaper to try stuff out than to do a ton of research. Proof-of-concept early and often. Do it in a new account and with infrastructure as code so you can throw away the environment when you are done without losing any work.
- There will be regional outages of services. This doesn't mean AWS is less available or stable than your data center. Build with these outages in mind. Review the AWS Well-Architected Framework. It will help you make better decisions from day 1.
- Moving to the cloud from traditional data centers is usually easier for developers than for folks who previously worked in infrastructure and ops. Have a plan to turn your people into cloud natives, not just your services.
- "Lift and Shift" is never as smart or as short-term a strategy as you want to believe.
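For the proof-of-concept bullet above, "infrastructure as code" can be as small as a script that writes out a CloudFormation template. The sketch below builds a template containing a single FIFO queue. The function name, the logical ID `TransactionQueue`, and the queue name are all made up for illustration, and a real proof of concept would likely need more resources than this.

```python
import json


def build_poc_template(queue_name):
    """Return a minimal CloudFormation template (as a dict) with one FIFO queue."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Throwaway proof-of-concept stack",
        "Resources": {
            # "TransactionQueue" is a hypothetical logical ID.
            "TransactionQueue": {
                "Type": "AWS::SQS::Queue",
                "Properties": {
                    # FIFO queue names must end in ".fifo".
                    "QueueName": f"{queue_name}.fifo",
                    "FifoQueue": True,
                },
            }
        },
    }


# Print the template as JSON so it can be saved to a file and deployed.
print(json.dumps(build_poc_template("poc-transactions"), indent=2))
```

Saved to a file, a template like this can be deployed with `aws cloudformation deploy --template-file poc.json --stack-name my-poc` and removed with `aws cloudformation delete-stack --stack-name my-poc` (the file and stack names here are placeholders). The environment is disposable, but the work that defined it is not.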
I imagine there is a lot more someone could be told, more that I wish I had been told, but I can only hope to save someone a few troubles or moments of worry as they start their cloud journey. I'm always available on Twitter or in the comments to help!
Top comments (1)
As a traditional infrastructure/ops guy turned DevOps/cloud guy, I agree with your point that some traditional infrastructure/ops folks will struggle with the transition.
To those infrastructure/ops folks who are worried about cloud replacing their job, I would have them consider the implications of your main points.
1) AWS is a massive ecosystem
2) Understanding a service's use-cases.
3) Cost is really hard
Most Devs would love nothing more than to not concern themselves with the ins and outs of AWS so they can focus on building software (yes, I'm making a gross generalization here.) So, becoming someone who does understand the ins and outs of AWS almost assures a seat at the table.