After years running a CI company, you see lots of hidden costs in architectural decisions. In this article I'd like to talk a bit about the practical differences between serverless and its primary alternative, microservice (containerized) architecture.
The vast majority of back-end architectures for new products we've seen built in the past couple of years have fallen into one of these two buckets.
The core idea for serverless is to specify a strategy for creating new webservers, instead of starting them yourself. That way if your product is featured somewhere and gets a huge burst of traffic, your cloud provider can start many copies of the webserver, and turn them off as the traffic subsides.
Serverless still uses webservers eventually, of course. The point is just that you don't have to create them yourself. All you do is specify the recipe for building them, and your cloud provider will add or remove copies as the number of concurrent requests rises and falls. Even for small projects with few users, you'll save money because every single webserver can be shut off overnight while nobody is visiting your site.
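The "recipe" usually amounts to a single handler function. Here's a minimal sketch in the style of an AWS Lambda HTTP handler; the exact event shape and field names are illustrative assumptions, not a spec:

```python
import json

def handler(event, context):
    """Entry point the cloud provider invokes once per request.
    There is no server process for you to manage: the provider starts
    (and stops) copies of this function as concurrency rises and falls."""
    # Assumed event shape, roughly following API Gateway's proxy format.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Because the handler is just a function, it can be invoked directly, e.g. `handler({"queryStringParameters": {"name": "visitor"}}, None)`.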
Instead of specifying a recipe to create your webservers (serverless style), you could offload the most computationally expensive parts of hosting your website to someone else. That's the idea behind a CDN. When a webserver slows down, it's often from hundreds of requests asking for the same resource that hasn't changed. The CDN serves these common requests (static images and the like) for you.
In this architecture, visitors to your website will request the resource from the CDN, which will respond to most (~90%) of the requests. Only the requests that can't obviously be cached are forwarded to your containers.
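In practice, the origin tells the CDN what it may cache via response headers. A minimal origin-side sketch (the suffix list and max-age values are illustrative assumptions):

```python
# Static assets get long-lived cache headers so a CDN edge can answer
# repeat requests without ever touching your containers.
CACHEABLE_SUFFIXES = (".png", ".jpg", ".css", ".js", ".woff2")

def cache_headers(path: str) -> dict:
    """Return response headers telling the CDN what it may cache."""
    if path.endswith(CACHEABLE_SUFFIXES):
        # Cacheable for a day: the CDN serves this from its edge cache.
        return {"Cache-Control": "public, max-age=86400"}
    # Dynamic or personalized responses must always reach the origin.
    return {"Cache-Control": "no-store"}
```

With headers like these, the ~90% of requests that hit static resources never reach your containers at all.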
Serverless "webservers" are really recipes for how to create real webservers, and your cloud provider will start one when a visitor first requests something.
Containers are usually much slower to start, so you need to keep at least one running 24/7 in case someone requests something from your site.
When AWS Lambda was launched in 2014, it sounded like an incredible deal: 1GB of memory for $0.0000000167/ms of computation. A typical API request might be 20ms, so you'd be paying $0.000000334/request.
Since most workloads are very "bursty", you'll likely be paying under $100/yr for even a popular service being hosted behind lambdas.
For comparison, a comparable Containers + CDN architecture would likely cost over $500/yr for most products.
Of course, this is an extremely simplified comparison (elastic load balancing, CDN ingress/egress, serverless can use a CDN too, ...) but the conclusion generally holds for a majority of products online today: serverless webservers will cost 10-20% as much as comparable containers, with all else held equal.
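The arithmetic behind those figures is simple enough to sketch. The Lambda price and request duration come from the text; the annual request volume and the container baseline are illustrative assumptions:

```python
# Back-of-envelope cost comparison. Request volume and the container
# baseline are assumptions for illustration, not measurements.
LAMBDA_PRICE_PER_MS = 0.0000000167   # 1GB memory tier, per ms of compute
REQUEST_MS = 20                       # typical API request duration
cost_per_request = LAMBDA_PRICE_PER_MS * REQUEST_MS   # $0.000000334

requests_per_year = 100_000_000       # assume ~100M requests/yr
lambda_cost = cost_per_request * requests_per_year    # roughly $33/yr

container_cost = 500                  # assumed always-on containers + CDN
savings_ratio = lambda_cost / container_cost          # well under 20%
```

Even at a hundred million requests per year, the serverless bill stays under the $100/yr figure quoted above.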
Infrastructure cost is a nice metric - it's somewhat easy to predict, and it's a convenient way to compare a "good architecture" with a "bad architecture."
However, infrastructure cost is completely dwarfed by developer salaries. That $500/yr container will be maintained by a developer being paid (on average) $75,000/yr.
As I mentioned in the title of this post, the human cost is something I've personally seen a lot of companies underestimate. If you have a bug which can't be reproduced locally (most meaningful ones), developers working against a closed-source AWS stack will have enormous difficulty reproducing and triaging it.
You can't emulate serverless instances locally, only in the cloud (*)
Very few companies can create copies of their production infrastructure. It often takes hours for developers to requisition the proper environment and ensure that nobody else is using it concurrently.
Developers rarely have access to production resources, so they can't debug in production without spending hours negotiating a pair programming session.
If a single developer fixes one bug per week, and takes an hour longer to fix each bug because of the factors above, your company will be paying $1,872 per year for a single developer's productivity loss. That number is already 4x the difference in cost to begin with.
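Here's how the $1,872 figure falls out of the assumptions in the text (the hourly rate is rounded from the $75,000 salary over a 40-hour week):

```python
# Productivity-loss arithmetic from the assumptions above.
salary = 75_000                        # average annual developer salary
hourly = round(salary / (52 * 40))     # ~ $36/hr for a 40-hour week
bugs_per_year = 52                     # one bug fixed per week
extra_hours_per_bug = 1                # extra time lost to cloud-only debugging
loss = hourly * bugs_per_year * extra_hours_per_bug   # $1,872
```

And that's the loss for a single developer; it scales linearly with team size.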
In contrast, a container architecture might entirely run on a developer's computer (which is the case for LayerCI's infrastructure) so that same developer would likely be able to reproduce a bug on their own laptop without having to use a staging environment in the first place.
For similar reasons to the previous point, it's significantly harder to automatically test a serverless architecture with a CI/CD system.
A containerized architecture is significantly easier to test because you can run it within a single VM. A serverless architecture would require a new deployment within your cloud provider for every change in order to run CI. Since cloud providers don't optimize for speed, this can take 30 minutes per change, and it monotonically increases as your team grows.
The ultimate test of a system are end-to-end (E2E) tests which validate common workflows by creating a fake user and interacting with the entire app like a real user would. These tests can only be run automatically against a proposed change if it's possible to create copies of your architecture as needed.
In many companies, a single bug in a demo could mean losing a 6-7 digit sale. These bugs are often the result of seemingly innocuous changes that affect the interactions of different components. It's very easy for a developer to break the "pay now" button or the onboarding if they aren't testing that those flows continue to work after every change.
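When the whole stack runs on one machine, such an E2E check can be a few lines: boot the app locally and drive a critical flow like a fake user would. In this sketch a tiny stand-in server plays the role of your real stack (which in practice you'd boot via containers on the CI machine); the `/pay` route and its response body are hypothetical:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the real application under test.
class App(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"pay-now-ok" if self.path == "/pay" else b"welcome"
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep CI output quiet
        pass

def e2e_check(base_url: str) -> bool:
    """Walk a critical flow end to end, the way a fake user would."""
    with urllib.request.urlopen(f"{base_url}/pay") as resp:
        return resp.read() == b"pay-now-ok"

# Boot the stack on a free local port, run the check, tear it down.
server = HTTPServer(("127.0.0.1", 0), App)
threading.Thread(target=server.serve_forever, daemon=True).start()
ok = e2e_check(f"http://127.0.0.1:{server.server_address[1]}")
server.shutdown()
```

The same check against a serverless stack would first require deploying the change to your cloud provider, which is exactly the slowdown described above.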
If your entire stack can be run on a single machine, it can be run in off-the-shelf CI providers without too much trouble. In comparison, a serverless stack might require creating an entire environment within your cloud provider from scratch for every change, which would take 10x longer.
The obvious solution for serverless would be for providers to adopt a common open standard (such as Kubeless or OpenFaaS). As long as serverless is synonymous with closed-source cloud offerings, it will cost dramatically more because of lost developer productivity.
Really, the current best solution is to avoid platforms like AWS Lambda or Cloudflare Workers for core infrastructure altogether until it's possible to self-host them for reproduction and testing.
For almost every architectural decision, hosting cost is significantly less important than the cost of developers moving slower. Don't make architectural decisions because of price until you consider the cost of time for the developers that need to work with that infrastructure.