Colin Chartier

Posted on Mar 7, 2021

Serverless is more expensive than you'd expect.

#serverless #lambda #systems #docker

After years running a CI company, you see lots of hidden costs in architectural decisions. In this article I'd like to talk a bit about the practical differences between serverless and its primary alternative, microservice (containerized) architecture.

Introduction to architectures: Serverless and Containers + CDN

The vast majority of back end architectures for new products we've seen built in the past couple of years have fallen into one of these two buckets.

Serverless

The core idea for serverless is to specify a strategy for creating new webservers, instead of starting them yourself. That way if your product is featured somewhere and gets a huge burst of traffic, your cloud provider can start many copies of the webserver, and turn them off as the traffic subsides.

Serverless still uses webservers eventually, of course. The point is just that you don't have to create them yourself. All you do is specify the recipe for building them, and your cloud provider will create more copies as the number of concurrent requests goes up or down. Even for small projects with few users, you'll save money by turning off every single webserver overnight while nobody is visiting your site.
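To make the "recipe" idea concrete, here is a minimal sketch of what one can look like, assuming AWS Lambda's Python runtime behind an API Gateway proxy integration. The function name `handler` and the greeting logic are made up for illustration and are not taken from any real product.

```python
import json

def handler(event, context):
    """Entry point the platform calls once per request.

    Assumption: `event` is an API Gateway proxy event (a dict describing
    the HTTP request). `context` carries runtime metadata and is unused.
    """
    # queryStringParameters may be absent or None when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the function is a plain Python callable, it can be invoked directly with a hand-built event dictionary, although doing so exercises none of the surrounding cloud configuration.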

Containers + CDN

Instead of specifying a recipe to create your webservers (Serverless style), you could offload the most computationally expensive parts of hosting your website to someone else. That's the idea for a CDN. When a webserver slows down, it's often from hundreds of requests asking for the same resource that hasn't changed. The CDN takes care of these common requests like static images for you.

In this architecture, visitors to your website request resources from the CDN, which responds to most (~90%) of the requests. Only the requests that can't obviously be cached are forwarded to your containers.
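The request flow described above can be sketched as a small cache sitting in front of an origin server. This is a toy model, not how any particular CDN is implemented: the `Cdn` class, the `origin` function, the rule that only paths under `/static/` are cacheable, and the request mix are all assumptions chosen for illustration.

```python
from typing import Callable, Dict

class Cdn:
    """Toy CDN: serves cacheable paths from memory, forwards the rest."""

    def __init__(self, origin: Callable[[str], str]) -> None:
        self.origin = origin
        self.cache: Dict[str, str] = {}
        self.origin_hits = 0

    def is_cacheable(self, path: str) -> bool:
        # Assumption: only static assets are safe to cache.
        return path.startswith("/static/")

    def get(self, path: str) -> str:
        if path in self.cache:
            return self.cache[path]
        # Cache miss or uncacheable path: forward to the containers.
        self.origin_hits += 1
        body = self.origin(path)
        if self.is_cacheable(path):
            self.cache[path] = body
        return body

def origin(path: str) -> str:
    """Stand-in for the containerized webserver."""
    return f"content of {path}"

cdn = Cdn(origin)
# Hypothetical traffic: 19 requests for one image, 1 dynamic API call.
requests = ["/static/logo.png"] * 19 + ["/api/cart"]
for path in requests:
    cdn.get(path)

hit_ratio = 1 - cdn.origin_hits / len(requests)
print(f"origin handled {cdn.origin_hits} of {len(requests)} requests")
print(f"served from cache: {hit_ratio:.0%}")
```

With this contrived mix, only the first image request and the API call reach the origin, so the containers see a small fraction of the total traffic.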

Recap: Serverless vs Containers

Serverless "webservers" are really recipes for how to create real webservers, and your cloud provider will start one when a visitor first requests something.
Containers are usually much slower to start, so you need to keep at least one running 24/7 in case someone requests something from your site.

Serverless looks inexpensive in theory

When AWS Lambda was launched in 2014, it sounded like an incredible deal: 1GB of memory for $0.0000000167/ms of computation. A typical API request might be 20ms, so you'd be paying $0.000000334/request.
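The arithmetic behind that per-request figure, as a short sketch. The price and the 20ms duration come from the paragraph above; the yearly request volume is a hypothetical number chosen only to show the scale, and per-request fees, free tiers, and data transfer are ignored.

```python
PRICE_PER_MS_1GB = 0.0000000167  # USD per ms at 1GB, figure quoted above
REQUEST_MS = 20                  # typical API request duration, from above

def cost_per_request(duration_ms: float,
                     price_per_ms: float = PRICE_PER_MS_1GB) -> float:
    """Compute cost of one request in USD (compute time only)."""
    return duration_ms * price_per_ms

def yearly_cost(requests_per_year: int,
                duration_ms: float = REQUEST_MS) -> float:
    """Compute cost of a year's worth of identical requests in USD."""
    return requests_per_year * cost_per_request(duration_ms)

per_request = cost_per_request(REQUEST_MS)
print(f"${per_request:.9f} per request")

# Hypothetical volume: 100 million requests in a year.
print(f"${yearly_cost(100_000_000):.2f} per year")
```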

Since most workloads are very "bursty", you'll likely be paying under $100/yr for even a popular service being hosted behind lambdas.

By contrast, an equivalent Containers + CDN architecture would likely cost over $500/yr for most products.

Of course, this is an extremely simplified comparison (it ignores elastic load balancing and CDN ingress/egress, and serverless can use a CDN too, among other things), but it will generally hold true for a majority of products online today. Serverless webservers will cost 10-20% as much as comparable containers, with all else held equal.

Developer salaries are the hidden cost of serverless

Infrastructure cost is a nice metric: it's somewhat easy to predict, and it's a convenient way to compare a "good architecture" with a "bad architecture."

However, infrastructure cost is completely dwarfed by developer salaries. That $500/yr container will be maintained by a developer being paid (on average) $75,000/yr.

As I mentioned in the title of this section, the human cost is something I've personally seen a lot of companies underestimate. If you have a bug which can't be reproduced locally (as most meaningful ones can't), developers relying on closed-source AWS services will have enormous difficulty reproducing and triaging it.

Three factors combine to make developing with serverless exceptionally difficult:

You can't emulate serverless instances locally, only in the cloud (*)
Very few companies can create copies of their production infrastructure. It often takes hours for developers to requisition the proper environment and ensure that nobody else is using it concurrently.
Developers rarely have access to production resources, so they can't debug in production without spending hours negotiating a pair programming session.

If a single developer fixes one bug per week, and takes an hour longer to fix each bug because of the factors above, your company will be paying $1,872 per year for a single developer's productivity loss. That number is already more than 4x the roughly $400/yr difference in hosting cost to begin with.
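The estimate above can be reproduced with a few lines of arithmetic. The salary, the one bug per week, the extra hour, and the two hosting prices are from the article; the 2,080 working hours per year (52 weeks of 40 hours) and rounding the hourly rate down to whole dollars are assumptions that happen to reproduce the $1,872 figure.

```python
SALARY_PER_YEAR = 75_000        # USD, from the article
WORK_HOURS_PER_YEAR = 52 * 40   # assumption: 40-hour weeks, no time off
WEEKS_PER_YEAR = 52
BUGS_PER_WEEK = 1               # from the article
EXTRA_HOURS_PER_BUG = 1         # from the article
CONTAINER_COST = 500            # USD/yr, from the article
SERVERLESS_COST = 100           # USD/yr, from the article

# Assumption: hourly rate rounded down to whole dollars.
hourly_rate = SALARY_PER_YEAR // WORK_HOURS_PER_YEAR
lost_per_year = (hourly_rate * BUGS_PER_WEEK
                 * EXTRA_HOURS_PER_BUG * WEEKS_PER_YEAR)
infra_savings = CONTAINER_COST - SERVERLESS_COST

print(f"hourly rate: ${hourly_rate}")
print(f"productivity lost per developer: ${lost_per_year}/yr")
print(f"ratio to hosting savings: {lost_per_year / infra_savings:.2f}x")
```

The ratio scales linearly with team size and with the number of bugs fixed per week, so the gap widens quickly for anything larger than a single developer.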

In contrast, a container architecture might entirely run on a developer's computer (which is the case for LayerCI's infrastructure) so that same developer would likely be able to reproduce a bug on their own laptop without having to use a staging environment in the first place.

Another cost: More bugs make it to customers with serverless

For similar reasons to the previous point, it's significantly harder to automatically test a serverless architecture with a CI/CD system.

A containerized architecture is significantly easier to test because you can run it within a single VM. A serverless architecture would require a new deployment within your cloud provider for every change in order to run CI. Since cloud providers don't optimize for speed, this can take 30 minutes per change, and it monotonically increases as your team grows.

The ultimate tests of a system are end-to-end (E2E) tests, which validate common workflows by creating a fake user and interacting with the entire app like a real user would. These tests can only be run automatically against a proposed change if it's possible to create copies of your architecture as needed.
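As a rough sketch of what such a test looks like when the whole stack fits on one machine, the example below starts a throwaway copy of an app, signs up a fake user, and presses "pay now". Everything here is hypothetical: the `App` handler, the `/signup` and `/pay` routes, and the payloads stand in for a real product, and Python's built-in HTTP server stands in for real containers.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class App(BaseHTTPRequestHandler):
    """Stand-in for the whole product: signup plus a "pay now" endpoint."""
    users = set()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/signup":
            App.users.add(data["email"])
            self._reply(201, {"ok": True})
        elif self.path == "/pay" and data.get("email") in App.users:
            self._reply(200, {"paid": True})
        else:
            self._reply(400, {"error": "bad request"})

    def _reply(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

# Ignore any proxy settings so requests go straight to the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def post(base_url, path, payload):
    """Send a JSON POST like a browser client would; return (status, json)."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener.open(req) as resp:
        return resp.status, json.loads(resp.read())

def run_e2e():
    """Start a fresh copy of the app, then walk the signup-and-pay flow."""
    server = HTTPServer(("127.0.0.1", 0), App)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_port}"
    try:
        signup = post(base_url, "/signup", {"email": "fake@example.com"})
        payment = post(base_url, "/pay", {"email": "fake@example.com"})
        return signup, payment
    finally:
        server.shutdown()
        server.server_close()

signup, payment = run_e2e()
assert signup == (201, {"ok": True})
assert payment == (200, {"paid": True})
```

The important property is that `run_e2e` creates and destroys its own copy of the app, which is what lets a CI system run it against every proposed change.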

In many companies, a single bug in a demo could mean losing a six- or seven-figure sale. These bugs are often the result of seemingly innocuous changes that affect the interactions of different components. It's very easy for a developer to break the "pay now" button or the onboarding if they aren't testing that those flows continue to work after every change.

If your entire stack can be run on a single machine, it can be run in off-the-shelf CI providers without too much trouble. In comparison, a serverless stack might require creating an entire environment within your cloud provider from scratch for every change, which would take 10x longer.

Solutions

The obvious solution for serverless would be for providers to adopt a common standard (AWS Kubeless or AWS OpenFaaS). As long as serverless is synonymous with closed-source cloud offerings, it will cost dramatically more because of lost developer productivity.

Really, the current best solution is to avoid platforms like AWS Lambda or Cloudflare Workers for core infrastructure altogether until it's possible to self-host them for reproduction and testing.

For almost every architectural decision, hosting cost is significantly less important than the cost of developers moving slower. Don't make architectural decisions because of price until you consider the cost of time for the developers that need to work with that infrastructure.

More discussions on Hacker News

Top comments (12)

KamranKhan-Dev • Mar 7 '21

You should really explain the point you're trying to make with the asterisk (*). Emulators are widely available to run functions locally, especially with Azure (via VSCode) and GCP/Firebase.
The other two points sound like they're based on personal experience. Setting up a good developer experience is paramount and the return on investment in doing so is even greater.

Colin Chartier • Mar 8 '21

Firebase emulators are a terrible experience - we've had multiple customers eventually migrate off of them because they are so different from production firebase.

Azure's emulators are good, but they still aren't very popular. In 95% of cases when someone is talking about serverless they will be talking about AWS Lambda.

The asterisk is that you can sometimes find products (like github.com/serverless-stack/server... or whatnot) that provide third party solutions, but they are generally pretty low fidelity in comparison to the production configuration, especially taking into consideration things like service discovery

KamranKhan-Dev • Mar 8 '21

I would have to go back 4 years and 6 companies to find an infrastructure that was deployed on AWS, so saying that serverless talks about AWS Lambda in 95% of cases is a pretty strong statement.

Given you point to developers' salaries being the hidden costs, I feel that your post lacks insights on the problems you've faced or your pain points.

Finally, you warn against the use of AWS Lambda and Cloudflare Workers but there aren't any recommendations for alternative solutions?

Nočnica Mellifera • Mar 8 '21

I love this article and just want to echo this point: most attempts to emulate Lambdas locally aren't usable. Notably: Lambda functions always rely on cloud resources that can't be locally replicated.

(Quick plug for Stackery.io which has a hybrid tool for better local emulation)

Johan Eriksson • Mar 9 '21

It's 2021, now it's more like 95% azure. At least in my small bubble. Your small bubble seems older though

DominiksCode • Mar 8 '21

Thank you for your article but I do not really follow what you are writing.

You use lambda on the one side and container + cdn on the other. Then you compare them. This comparison is flawed because you can use lambda + cdn in the exact same way. You can even have caching in api-gateway basically doing the same.
You compare the direct cost of lambdas to containers, not keeping in mind that you have to run 2-3 containers at all times to be available, compared to 0 lambda executions on the other side during the night, for example. Not even mentioning the heavy bursts of traffic that can occur in some apps, which would mean you need a lot of containers to be hot to tackle them, whereas lambda handles this for you.
Always keep the use case in mind.
Developer cost is higher for developers who can do serverless? I do not see this. A developer who can do a good container architecture can do serverless too. Developers are never cheap these days.
You can not debug serverless locally? Did you ever use serverless framework or AWS SAM? You run the function locally with node debugger and database access or other services like SQS and SNS like you do in the cloud. Have a look at the documentation under: AWS - Invoke Local
You can not run the whole stack locally? Yes, you can. Including Cloud Services and on Azure even with local mock services for databases which can be run in pipelines as well (we do this). If you use Cloud Services you need to access them. If you use, let's say, MySQL locally you can do this with serverless in the same way.

So thanks again for the post but I think some stuff needs correction.

Chad Woolley • Mar 8 '21

Re: "It's very easy for a developer to break the "pay now" button or the onboarding if they aren't testing that those flows continue work after every change."

If it's this easy to break the money path, it sounds like the problem isn't a lack of end-to-end testing, but a lack of unit testing, enforced API contracts, type/null safety in your language choice, and lack of developer discipline.

The tip of the testing pyramid should be your last defense against bugs, not the first.

Tomas Hayes • Mar 8 '21

When I used vercel last year, it had this pretty neat command to run your code as it was in their cloud, is not open source I think, but it worked great.

Nice points to think about in the article. Thanks.

Johan Eriksson • Mar 9 '21

I have the absolutely opposite experience to what you talk about 🤠

Emil • Mar 8 '21

You might be right. But I would not go straight into using serverless, or using docker. If you want to use a cloud service like azure or aws you will have to use them. But no one says you should use only one of them. It's totally fine to have some lambda functions as glue code or write decent apis with it. The tooling with aws lambda or azure is fine and if you write isolated code there is a way to unit test them. Sure debugging on prod is heavy, but imho who does that?!

Colin Chartier • Mar 7 '21

RDS is an excellent example of a cloud-hosted service that implements an open standard. Even if your production uses RDS, you could still reproduce bugs somewhat easily using a postgresql docker container seeded with something similar to production data.