The value of serverless: a real-life example

I often hear that managed services, especially serverless services, are expensive. So I wanted to tell this story.

Before I do, let me remind you that cloud providers very often offer many options to run a workload. AWS' motto is "we prefer AND rather than OR". Let's look at queues for instance:

  • You may set up EC2 instances and run RabbitMQ on top of them.
  • Or you could do the same with containers on ECS/EKS. No more OS to manage.
  • Or you can use Amazon MQ for RabbitMQ. No more software/runtime to manage.
  • Or you can use Amazon SQS. No more hourly fee, and no capacity planning either!

Now back to my story. A customer of mine needed a cache system to persist customer session data, so that all containers could share customer context.

Because we didn't want to manage servers, AWS ElastiCache for Redis was the obvious go-to solution. But since my customer needed very little cache memory (~50 MB), we used cache.t4g.micro instances: a single instance in dev, and a small cluster in staging and production.

For a few months, everything went fine and the system operated smoothly. Then one day, six hours after we released a new version to production, the system crashed. Let's just say an e-commerce website doesn't do well without its session cache.

What had happened was that the release had added just a tiny bit of load on the cache, leading to higher throughput. T4g instances have a baseline throughput plus a bucket of burst credits. If your workload consumes more than the baseline for a sustained period, the bucket empties and the instance becomes nearly unresponsive.
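To make the failure mode concrete, here is a minimal sketch in Python of how a burst bucket drains when sustained demand exceeds the baseline. The numbers are made up for illustration, not actual T4g specifications:

```python
# Toy model of a burstable instance's throughput credit bucket.
# BASELINE and BUCKET_MAX are illustrative, not real T4g figures.

BASELINE = 100       # sustainable throughput (units/sec)
BUCKET_MAX = 10_000  # burst credit capacity (units)

def simulate(demand_per_sec, seconds, bucket=BUCKET_MAX):
    """Return a per-second list of (delivered throughput, remaining credits)."""
    history = []
    for _ in range(seconds):
        if demand_per_sec <= BASELINE:
            # Under baseline: demand is fully served and credits refill.
            bucket = min(BUCKET_MAX, bucket + (BASELINE - demand_per_sec))
            delivered = demand_per_sec
        else:
            # Over baseline: the excess is paid from the credit bucket.
            excess = demand_per_sec - BASELINE
            paid = min(bucket, excess)
            bucket -= paid
            delivered = BASELINE + paid  # throttled once the bucket is empty
        history.append((delivered, bucket))
    return history

# A release bumps demand from below baseline to 120 units/sec: fine at
# first, then capped at the baseline once credits run out (~500 s here).
history = simulate(demand_per_sec=120, seconds=600)
print(history[0])    # (120, 9980): full throughput, bucket draining
print(history[-1])   # (100, 0): throttled to baseline
```

The shape is exactly what we saw in production: nothing visibly wrong for hours, then a sudden cliff once the bucket hit zero.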

[Figure: variation of throughput due to the release]

Diagnosing the issue and restoring the system to full operational capacity took a while. That time means:

  • loss of business for my client, plus reputation damage;
  • employee and consulting time to diagnose, restore, and then evolve the system.

All of this because the serverless version had been deemed too costly: its 1 GB minimum storage meant the cost would be at least ~$106/month. By then, AWS had released ElastiCache for Valkey, which has a lower minimum storage (100 MB), so we ended up moving all environments to it.

With the serverless version of ElastiCache, we only have to worry about how much cache memory we use and how many queries we run. And that's only for FinOps reasons, not reliability considerations. AWS removed all the heavy lifting of monitoring server metrics such as CPU, throughput, and IOPS.
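As a back-of-the-envelope sketch of the cost floors mentioned above: the per-GB-hour rate below is reverse-engineered from the ~$106/month figure in this story, not current AWS pricing, so check the pricing page before relying on it:

```python
# Rough monthly-cost floor for a serverless cache with a minimum billed
# storage size. GB_HOUR_RATE is illustrative (derived from the ~$106/month
# figure above), NOT actual AWS pricing.
GB_HOUR_RATE = 0.145    # $/GB-hour, illustrative
HOURS_PER_MONTH = 730

def monthly_floor(min_gb):
    """Minimum monthly storage cost given a minimum billed size in GB."""
    return min_gb * HOURS_PER_MONTH * GB_HOUR_RATE

redis_floor = monthly_floor(1.0)    # 1 GB minimum  -> ~$106/month
valkey_floor = monthly_floor(0.1)   # 100 MB minimum -> ~$10.6/month
print(f"Redis serverless floor:  ${redis_floor:.2f}/month")
print(f"Valkey serverless floor: ${valkey_floor:.2f}/month")
```

The order-of-magnitude drop in the minimum billed size is what made the serverless option viable for our ~50 MB workload.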

Key take-aways:

  1. Unless you have an Ops team whose job is to monitor servers, use serverless.
  2. Consider TCO, not just the service cost, when making these decisions. (Storage is cheap, compute less so, but human time is by all measures the most expensive resource you have.)
  3. The trade-off is almost always in favor of serverless, unless you put very high throughput/load on your service. Only then may it be worth deploying and monitoring servers yourself.
  4. Continuously watch AWS's What's New feed: AWS often releases new options that should lead you to reconsider past architecture decisions. (Log the reasons behind those decisions, so you can revisit them with less effort.)
  5. Any time you use T4g instances, monitor not only CPU credits but also the network allowance metrics!
  6. Monitor your systems around releases (I'll get back to that soon).
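For takeaway 5, here is a sketch of a CloudWatch alarm on one of the ElastiCache network allowance metrics. The cluster id is a placeholder, and the actual boto3 call is commented out so the snippet stands alone without AWS credentials:

```python
# Sketch: alert when an ElastiCache node starts exceeding its network
# bandwidth allowance, i.e. its burst headroom is being consumed.
# "my-session-cache" is a placeholder cluster id.
alarm = {
    "AlarmName": "session-cache-network-allowance",
    "Namespace": "AWS/ElastiCache",
    "MetricName": "NetworkBandwidthOutAllowanceExceeded",
    "Dimensions": [{"Name": "CacheClusterId", "Value": "my-session-cache"}],
    "Statistic": "Sum",
    "Period": 300,                # 5-minute windows
    "EvaluationPeriods": 3,       # sustained, not a one-off spike
    "Threshold": 0,
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData": "notBreaching",
}

# To actually create the alarm (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
print(alarm["MetricName"])
```

Any non-zero value of this metric means the node is already being throttled, which is exactly the signal we were missing before the incident.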

I hope this saves you from many production incidents!

