Originally posted here.
A couple of words about me and the project. I’m a software developer from Ukraine, mainly working on JavaScript-based projects, but open to any other technology. For the last 4 months I’ve been working alone on a serverless project, using only AWS solutions. The goal was to develop an application with API Gateway, DynamoDB and Lambdas. The business needed an MVP, which every developer can interpret as “we need to have a good solution even sooner than usual”. Okay.
The stack was chosen before I joined the project. From what I understood, the technical person on the customer’s side decided that using the technologies listed above would speed up the development process a lot. Everyone has heard about serverless before. Headlines like “tired of infrastructure management? Join serverless!” or “forget about scaling, put all your effort into the code!” pop up here and there on Medium and Twitter. Some people even think that serverless is the future of backend development and that classic servers are literally dying. Some are right, some are wrong, but over these four months I accumulated a few thoughts about the practicality of the main buzzword of the last few years. I really want to share them, so that you have more to take into consideration before you decide whether to join serverless or not.
So let’s begin…
API Gateway, Lambda, DynamoDB: what are they?
I won’t spend much time explaining the AWS products themselves, but I’ll leave a couple of words about each.
API Gateway can be treated as a router for your application. It’s as simple as that: you create an endpoint (they call it a resource), associate an HTTP verb with it either via the CLI or the Console UI, and attach an AWS service, a Lambda, a static response built with VTL, or a proxy destination to it. In my case I only attached Lambdas or services like DynamoDB; VTL lets you transform an HTTP request body into a request readable by any AWS service. The AWS team did a great job here 👏.
Lambda is a tiny container (you can treat it as a tiny server) in which you put your code. With Node.js-based lambdas you upload a file containing a function that accepts the arguments specified in the AWS docs. You can read more about that here. Lambda is the place where your business logic gets implemented.
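For reference, a minimal Node.js handler looks roughly like this (a sketch assuming the API Gateway Lambda proxy integration, where the HTTP request arrives as `event`):

```js
// index.js: a minimal Node.js Lambda handler; the event/response shape
// assumes the API Gateway Lambda proxy integration
exports.handler = async (event, context) => {
  const body = event.body ? JSON.parse(event.body) : {};

  // ...your business logic goes here...

  return {
    statusCode: 200,
    body: JSON.stringify({ message: 'ok', received: body }),
  };
};
```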
DynamoDB is an infinitely scalable NoSQL database. At least that’s what they say. It is true that you’ll work with JSON, and it is true that it scales very well on its own. Infinite scalability is awesome, but it also leaves this database with some significant downsides. Everything has a price. I want to say that you really NEED to read the docs and understand how DynamoDB works BEFORE you start designing your schemas. DynamoDB’s pros and cons are actually a great topic for another article.
This is definitely not everything about these services, but I tried to cover the parts most essential for a developer.
What do they always say? “Forget about scaling and infrastructure, think about the code”, right? I want to ask you to keep that in mind for the next paragraphs.
Business logic problem
The backend was not that big. On my API Gateway setup I had approximately 50 unique endpoints, which meant that I had ~40 unique lambdas. Some of the logic was delegated to the VTL + DynamoDB combination. Some functions were used for several endpoints, but with if/else or switch/case statements inside.
Sounds OK, but problems began to arise with the very thing that serverless, as they say, lets you concentrate on: the code you write. The first downside of this function-per-endpoint approach came up very soon. Lambdas are tiny containers, as I mentioned before, so you can treat them as separate servers. Separate servers do not share random access memory. Because of that you cannot build a service-oriented architecture… easily.
Let’s say you want a service class that contains some important and very frequently used business logic, so that you follow the DRY principle. On a classic Node server you’d build a class or just a function and import/require it everywhere you need it. This is great, because if this logic changes, you don’t need to update it in every place it is used. Now imagine you want to use the same logic in two, four, ten lambdas. Since these functions are effectively separate servers, the only built-in way to achieve the goal is… to repeat yourself. Exactly: just copy and paste the code.
So what can you do? One option is to build a bash script that copies the shared code into each lambda that uses it. This is a decent option and it will work for you, but everything has its price. Now you have to be 100% sure that the script is correct, that it isn’t broken, and that each lambda receives the most recent version of whatever you want to be shared. Such a script can’t stay simple and easy to maintain, because it’s something you want to rely on, so you’ll have to spend a significant amount of time on it. I just didn’t have that time. And it’s not always possible to explain to your customer that you spent a day or two and nothing new was built from a business point of view.
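The same idea sketched in Node.js instead of bash (the folder layout and names here are a hypothetical example, not my actual project):

```js
// sync-shared.js: copy shared/ into every lambda folder before packaging.
// Hypothetical layout: ./shared holds common code, ./lambdas/<name> holds each function.
// Requires Node 16+ for fs.cpSync.
const fs = require('fs');
const path = require('path');

const SHARED_DIR = path.join(__dirname, 'shared');
const LAMBDAS_DIR = path.join(__dirname, 'lambdas');

for (const name of fs.readdirSync(LAMBDAS_DIR)) {
  const target = path.join(LAMBDAS_DIR, name, 'shared');
  // Remove the stale copy first, so deleted shared files don't linger
  fs.rmSync(target, { recursive: true, force: true });
  fs.cpSync(SHARED_DIR, target, { recursive: true });
}

console.log('shared/ synced into every lambda');
```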
There’s another option, which I actually like, but it’s not always possible either. You can set up a private npm registry and require the shared stuff from there. But for this you have to pay npm and, again, businesses (especially startups) are not always happy to pay for something that could be avoided or built by you.
Long story short, shared business logic is something that we rely on and use every day in our job, and unfortunately it becomes a problem.
But hey, you have infinite scalability!
Simultaneous deployment problem
Imagine you have 4 lambda functions attached to 4 different endpoints, and they share the piece of business logic I mentioned above. Let’s say this piece of business logic needs to change, and since 4 lambda functions use it, 4 lambda functions have to be redeployed. Obviously, they can’t be deployed simultaneously, at exactly the same time. This means that in a production environment you can end up with some lambdas running out-of-date logic, which can be really dangerous: you can get corrupted data in your database, and so on.
This problem is definitely not new to the world of software development, but still, even with the simplest CRUD backend built purely with API Gateway and Lambdas, you will definitely end up needing some solution for it.
But hey, you still have infinite scalability!
The cold start problem
With lambdas you pay only for the working time of the function. When the function is idle — you do not pay for it. That’s cool.
Lambdas are containers. Containers require some time to bootstrap and become available, and unfortunately it’s not fast at all: from my experience it’s at least a couple of seconds. So what does that mean? Let’s say your lambda function’s average working time is 500 ms. If your function has been idle for a while, its container will be shut down by AWS. The next time your function is invoked, AWS will spin the container up again, and that adds a couple of seconds to the response time of your endpoint. Your average response time can literally jump from 300 ms to 10 seconds sometimes! Not that user-friendly, right?
How do people solve this cold start problem? They keep lambdas warm! They set up a cron job somewhere that triggers their lambdas from time to time, to prevent AWS from shutting down the container.
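A warmer can be as small as this sketch (it assumes a CloudWatch Events schedule rule invokes it every few minutes; the function names are hypothetical, and the target handlers should return early when they see the warmup payload):

```js
// warmer.js: ping other lambdas so AWS keeps their containers around.
// Assumes a CloudWatch Events schedule rule triggers this function;
// the function names below are hypothetical.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

const FUNCTIONS_TO_WARM = ['getUser', 'createOrder'];

exports.handler = async () => {
  await Promise.all(
    FUNCTIONS_TO_WARM.map((name) =>
      lambda
        .invoke({
          FunctionName: name,
          InvocationType: 'Event', // fire-and-forget, don't wait for the result
          Payload: JSON.stringify({ warmup: true }), // let the target short-circuit
        })
        .promise()
    )
  );
};
```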
Hmm, doesn’t it sound like you actually end up thinking about the underlying infrastructure?
But hey, you still have … Ok, you get that.
So, having all these problems in mind, would you still choose infinite scalability? Well, there’s no right answer to this question, because it depends entirely on your needs.
Other stuff
Now these are not problems, but just some stuff that I have to mention.
You can’t do WebSockets with lambdas, because they have an execution timeout while the protocol requires a persistent connection.
One of the components of the application I built required real-time messaging. For this I set up an ECS cluster and deployed Docker containers there. It worked just fine.
Also, it was not obvious from the very beginning how to achieve staging and versioning with a serverless backend. How do you distinguish between prod, dev and test functions? Luckily API Gateway has the concept of stages, which I made great use of. Lambdas have versions, which are immutable snapshots of a function, and aliases, which are named pointers to those versions. I had to build a couple of bash scripts to automate things, and it also worked fine for me. Staging of lambdas might be a pretty good topic for another article too.
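For illustration, here’s roughly how publishing a version and moving an alias looks with the Node.js aws-sdk (the function and alias names are hypothetical, and the alias is assumed to already exist):

```js
// promote.js: publish an immutable version of a function and point
// the "prod" alias at it. Names are hypothetical; the alias must already exist.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

async function promote(functionName) {
  // Publish a snapshot of the current code and configuration
  const { Version } = await lambda
    .publishVersion({ FunctionName: functionName })
    .promise();

  // Repoint the alias; API Gateway stages can reference the alias
  // (e.g. via a stage variable) instead of $LATEST
  await lambda
    .updateAlias({
      FunctionName: functionName,
      Name: 'prod',
      FunctionVersion: Version,
    })
    .promise();
}

promote('createOrder').catch(console.error);
```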
To sum up…
Serverless may sound like a silver bullet for all your scalability problems. Like you literally just write code, and that’s all you’ll ever do! No, that’s not true at all. There’s still infrastructure, and there are still problems to handle and take care of; they’re just different ones.
In my opinion, the next time I have to build a serverless backend from scratch, I’ll think twice and also take into consideration products like Up by TJ Holowaychuk or the Serverless Framework. They make great use of API Gateway and Lambda functions; I would even call them the real game changers. The code you write with them actually looks like a classic monolithic backend, but at a lower cost, because it runs on demand.
I’m not saying serverless is bad. The goal of this article was just to give you some more stuff to marinate on before you join serverless too.
Because in theory, theory and practice are the same; in practice, there’s no magic in this world, and everything has its price.
P.S. You can also read this great article about the price and cost side of things.
P.P.S. If you like this post, please give it a couple of claps on Medium, I would appreciate that a lot <3
Thank you!
Top comments (11)
Rebuilding a CRUD application in AWS Lambda is a little bit… not using the right tool for the job :) Actually it's a good article, I liked the read, and IMO it represents a different point of view on serverless architecture. I can imagine everyone starting with serverless feels the same; I was on the same page as well, I understand your pain! But…
You have to understand, and I can't stress this enough, that building serverless applications requires a very different mindset! You can't just jump from Node.js or any other stack into serverless with the same approach to architecture and expect the same outcome. You have to think differently, have a slightly deeper understanding of the intention behind Lambda and DynamoDB, and read some whitepapers and examples of serverless applications from AWS. And I understand that the mindset change is awfully difficult without the right examples.
I guess this will be quite a long comment, maybe it'll become an article at some point, but let's start at the beginning.
You can always steer the solution in a different direction, because you are the professional here (I get that it can be hard, but you're paid for being a pro!). A client can choose a stack, sure, but they should also understand the underlying implications of choosing it, which you've actually listed quite clearly here (Dynamo structure, cold boot, coupled logic, etc.). This article is really a perfect example of when not to use serverless.
DynamoDB is a document store and you should treat it like one. I know it is called a Database, and you can do some querying there, but it's expensive and usually not needed. Sorry, should not be needed. Use another tool to satisfy those needs.
Now, the logic problem: AWS Lambdas are independent, stateless, throwaway, parallel functions. They can die at any point, and there can be many of them. The cases where separate lambdas have to communicate should be solved with intermediates for exchanging information, like AWS SQS and AWS SNS, with DynamoDB or S3 as a message store, for example. On the other hand, if you're talking about libraries, all of the functions can share them; in fact, all of the functions can be deployed from one repository without any hassle! Which leads to my next point…
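For example, instead of one lambda calling another directly, the first one can just publish an event and move on (a minimal sketch; the topic ARN is hypothetical):

```js
// A minimal sketch of decoupling lambdas with SNS: this function does its own
// work, then hands the rest off asynchronously. The topic ARN is hypothetical.
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

exports.handler = async (event) => {
  // ...this function's own business logic...

  await sns
    .publish({
      TopicArn: 'arn:aws:sns:us-east-1:123456789012:order-events',
      Message: JSON.stringify({ orderId: event.orderId, status: 'created' }),
    })
    .promise();

  return { statusCode: 202, body: JSON.stringify({ queued: true }) };
};
```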
You can't be deploying Lambdas one by one! That's why you might have out-of-date code running in Lambda. When developing serverless, API versioning becomes an important thing… But AWS provides a great tool for dealing with this problem: CloudFormation (with the serverless transform, aka SAM). You define your infrastructure (lambdas, Dynamo tables, etc.) and deploy it all at once, then AWS takes care of everything running properly and without interruptions.
Ah, the cold start. Oh, how many complaints and posts and tweets there are about that, people pinging their lambdas. But have they ever thought about how those containers run and why they get killed? AWS wouldn't be so elastic if they didn't do that. And if your lambda hits a cold start constantly, you have too few calls to use AWS Lambda at all! Yes, if you're not at scale, if you don't have at least hundreds of requests per minute, AWS Lambda is a waste of effort, because you have to deal with bigger response times that don't make any sense.
Use the right tool for the job! AWS serverless is awesome if used in the correct environment and for the right job. Infinite scaling is beautiful when you have thousands and thousands of lambda calls, each posting an SQS message, triggering more lambdas down the line. It works like an orchestra ;)
Sorry for the long post, but I wanted to say that it's not as bad as it sounds from this article. The author was in unfortunate circumstances and had to work with a chosen tool that wasn't right for the job. I hope my comment sheds some light.
This is awesome! Thanks for such a great comment. One of the reasons I decided to actually write a post is to find somebody who knows more stuff than I do.
This was my first experience with serverless and with AWS in general (I don't count S3 and simple lambdas), and unfortunately I didn't have anybody around to help me with this stuff, so I had only myself.
The goal of the article was to show that serverless might not be appropriate for every project in the world, but my mood probably influenced the text a lot, yeah… I did end up in less than ideal conditions, so after the project my mood was not the best.
I agree about DynamoDB. Now I always tell people that if you don't have the key to your document, you are screwed. Queries will be slow and expensive… But I could do nothing about it, so I designed the indexes to at least avoid table scans. Didn't really like that either.
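To illustrate what I mean (a sketch with the Node.js aws-sdk DocumentClient; the table and index names are made up):

```js
// Query vs Scan: a Query hits a key or index, a Scan reads (and bills for)
// the whole table. Table and index names here are hypothetical.
const AWS = require('aws-sdk');
const db = new AWS.DynamoDB.DocumentClient();

// Cheap: uses a global secondary index keyed on "email"
const userByEmail = (email) =>
  db
    .query({
      TableName: 'users',
      IndexName: 'email-index',
      KeyConditionExpression: 'email = :e',
      ExpressionAttributeValues: { ':e': email },
    })
    .promise();

// Expensive: filters AFTER reading every item in the table
const userByEmailScan = (email) =>
  db
    .scan({
      TableName: 'users',
      FilterExpression: 'email = :e',
      ExpressionAttributeValues: { ':e': email },
    })
    .promise();
```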
After all this and your comment, I think serverless might be great to migrate to when you already have a big enough client base that triggers your API a lot. You won't have to deal with cold starts, at least :)
Thanks for the CloudFormation advice too, I wish you had been around when I was working on that thing!
Have you considered locally running your own illegal underground NPM repository?
Most recently, something called Verdaccio?
No, I actually didn't know things like this exist. Thanks!
Me neither, but I got curious from your inquiry.
Please ping me if you try it out and write something about it.
I am actually using git repositories to get my private code.
Hey Vladyslav, great article with real value and useful advice. Yes, serverless is not a silver bullet, and often stakeholders only consider the visible tip of the iceberg. I built a fully serverless platform and it was a rough experience. Serverless requires you to reconsider every step at every stage of software development. I would be glad to have a chat about DynamoDB and contribute to your pros and cons list. Keep up the sharing!
Thanks!
Actually @donis left a great comment and mentioned DynamoDB. I even think that describing this DB as a key-value store covers all of its pros and cons :)
I encountered the cold start problem when working on Azure Functions: my function loads some data when it spins up, which takes minutes on the first run. I ended up pinging it every few minutes by creating a new Logic App! A real setback IMHO.
Divide it into a Node project with subfolders, write a CloudFormation template (google "lambda cloudformation") and put it at the root level of the main folder.
Write one index.js file with many handlers and declare a CloudFormation resource per handler, each as a different Lambda.
Then run the SAM CLI… that's it. DRY code achieved.
I did 12 lambdas with 1 index file…
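Something like this, I guess (a sketch; the handler names and the shared module are hypothetical):

```js
// index.js: one file exporting several handlers; the SAM/CloudFormation
// template declares a separate Lambda resource per handler.
// The ./services module and handler names are hypothetical.
const { getUser, createOrder } = require('./services');

exports.getUserHandler = async (event) => {
  const user = await getUser(event.pathParameters.id);
  return { statusCode: 200, body: JSON.stringify(user) };
};

exports.createOrderHandler = async (event) => {
  const order = await createOrder(JSON.parse(event.body));
  return { statusCode: 201, body: JSON.stringify(order) };
};
```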
I'm writing from a DevOps point of view. Yesterday I was at the AWS DevDays in Copenhagen and heard about a concept called Lambda Layers for dealing with libraries etc. I haven't had time to read through it yet, but maybe it can help. Thanks for starting this conversation, I'm definitely following!