
Scalable architecture without magic (and how to build it if you’re not Google)

Miloslav Voloskov ・ 7 min read

TL;DR: Use the right tools. Don’t mix up write-first and read-first databases. Multiply everything. To multiply things, you have to keep them stateless. Don’t let the backend do a database’s job – it will always be slower.

Scalability is considered a hard problem to tackle. It’s always presented as something magical, done with secret, special tools that only the big, million-dollar players are allowed to use.

This is, of course, not true. There’s nothing magical about it – after all, it’s just regular code written in regular programming languages available to everyone.

First of all, it’s about choosing the right tools for the job. You’ve seen the benchmarks; you know that some languages perform better at certain tasks. Some databases are faster to read from, and some are faster at writing data.

Even if you’ve chosen the right tech stack for your task, one server might not be enough. And this is where things get interesting.

Of course you can just go and pick from AWS tiers. But if you want to know how things really work, you should know how to make a scalable setup on bare metal.

Let’s dive in.


Choosing the right tools

Different programming languages are for different tasks.

For example, Python has a very rich, sugary syntax that’s great for working with data while keeping your code small and expressive. But to achieve this, it runs on an interpreter, which is by default slower than Go or C, which are compiled to native machine code.

NodeJS has probably the richest ecosystem of external tools available, but it’s single-threaded. To run NodeJS on multiple cores, you have to use something like PM2, and for that you have to keep your code stateless.

Databases are just the same. SQL databases offer a whole Turing-complete language for querying and working with data, but it comes at a cost – without caching, SQL is often slower than NoSQL for simple reads and writes.

Apart from that, databases are often read-first or write-first. Some of them are quicker at writing data, and some perform better under large amounts of reads.

For example, for analytics and other tasks that require a lot of writing but only occasional reading, you might want to choose something write-first, such as Cassandra.

For read-first tasks like displaying news, it’s better to stick with something like MongoDB.

If you need both, just install two databases! This isn’t forbidden. Nothing will break. This is how things should be done.

Multiple servers

When one computer isn’t enough, you just double them. When two aren’t enough, you go for three, and so on.

But there’s a pitfall: leaping from one to two may be way harder than going from two to three or from ten to twenty.

To use multiple computers, your backend has to be stateless. That means storing all of your data in your databases and keeping nothing on the backend itself. This is part of why functional languages are so popular on the backend, and why Scala was invented – functional code is stateless by default.

Different servers should behave exactly the same no matter what. If you have a multitude of stateful servers, they are by definition prone to returning different data for the same input, because there are two sources of truth: the database and each server’s own state. You don’t want to let this happen, trust me.

Go stateless as early as possible – ideally from the very beginning. If you’re on NodeJS and PM2, you have to keep your code stateless if you want PM2 to multiply your runtime for load balancing.
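To make the difference concrete, here’s a minimal sketch. The names are illustrative, not a real framework API, and the plain dict stands in for a shared store like Redis:

```python
# Minimal sketch of stateful vs. stateless request handling.
# "external_store" stands in for a shared database like Redis.

external_store = {}  # in production this lives in Redis/a DB, not in-process

class StatefulHandler:
    """Keeps a counter in server memory - a second source of truth.
    Two instances of this handler (two servers) will disagree."""
    def __init__(self):
        self.visits = 0

    def handle(self, user_id):
        self.visits += 1
        return self.visits

def stateless_handle(store, user_id):
    """All state lives in the shared store, so any number of
    identical server processes can run this function."""
    store[user_id] = store.get(user_id, 0) + 1
    return store[user_id]

# Two "servers" sharing one store agree; two stateful servers don't.
a, b = StatefulHandler(), StatefulHandler()
a.handle("u1"); a.handle("u1")
print(b.handle("u1"))                          # 1 - server b disagrees with a
print(stateless_handle(external_store, "u1"))  # 1
print(stateless_handle(external_store, "u1"))  # 2 - any server sees the same count
```

With the stateless version, the load balancer can send each request to any server and the answer stays consistent.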

A load balancer is the thing that reroutes each request to the least busy server. You obviously need your servers to deliver exactly the same response for the same request – this is why we go stateless. As mentioned, for NodeJS, PM2 is a great load-balancing option. If you’re not using Node, go for Nginx.
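A minimal Nginx setup for this might look like the sketch below. The addresses and port are hypothetical – substitute your own stateless app servers:

```nginx
# Hypothetical upstream of three identical stateless app servers.
# least_conn routes each request to the server with the fewest
# active connections, i.e. the least busy one.
upstream app_servers {
    least_conn;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_servers;
    }
}
```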

Sessions? Store them in Redis and let all of your servers access it.
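The idea, sketched in Python – a plain dict stands in for Redis here; real code would use a Redis client with a TTL per session key:

```python
# Sketch of shared session storage. The dict stands in for Redis,
# which every app server would connect to over the network.
import uuid

session_store = {}  # shared across all app servers in production

def create_session(store, user_id):
    sid = str(uuid.uuid4())        # opaque session id handed to the client
    store[sid] = {"user_id": user_id}
    return sid

def get_session(store, sid):
    return store.get(sid)          # any server can resolve any session id

sid = create_session(session_store, "alice")
print(get_session(session_store, sid)["user_id"])  # alice
```

Because the session lives outside any single server, the user can hit a different server on every request and stay logged in.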

Caching and rate limiting

Imagine calculating the same thing every 100 milliseconds for each and every user. This makes your server vulnerable to the Slashdot effect – it will basically be DDoSed just by users accessing data.

Put a caching middleware in. Only the first user triggers a data query; everyone else receives exactly the same data straight from RAM.

It comes with a downside – the data will be slightly stale by default. Caching middleware usually lets you set a TTL: the time that has to pass before the cache is invalidated and the data is eventually refreshed.

Think about your users and their needs, and configure the cache accordingly. And never, never cache user input. Only server output should be cached.

Varnish is a great caching option that works with HTTP responses, so it can sit in front of any backend.

Even with caching, different requests blasting in every 10 ms can bring your server down, because the server has to compute a different response for each one. This is why you need a rate limiter – if not enough time has passed since the last request, the incoming request is denied. This keeps your server alive.
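The simplest form of that rule – deny a request if it arrives too soon after the previous one from the same client – can be sketched like this (a toy; production limiters such as token buckets are smarter):

```python
# Minimal fixed-interval rate limiter sketch: deny a request if not
# enough time has passed since the previous one from the same client.
def make_rate_limiter(min_interval, clock):
    last_seen = {}  # client_id -> timestamp of last allowed request
    def allow(client_id):
        now = clock()
        previous = last_seen.get(client_id)
        if previous is not None and now - previous < min_interval:
            return False  # too soon - deny
        last_seen[client_id] = now
        return True
    return allow

# A fake clock keeps the example deterministic.
t = {"now": 0.0}
allow = make_rate_limiter(min_interval=0.1, clock=lambda: t["now"])

print(allow("u1"))  # True  - first request
t["now"] = 0.05
print(allow("u1"))  # False - only 50 ms passed
t["now"] = 0.2
print(allow("u1"))  # True  - interval elapsed
```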

Dividing responsibilities

If you’re using a SQL database and still resolving foreign keys on your backend, you’re not using the abilities of your database. Just set up table relations and let your database resolve foreign keys for you – the query planner will always be faster than your backend.

The backend should have different responsibilities: hashing, building web pages from data and templates, managing sessions, and so on.

For anything related to data management or data models, move it to your database as procedures or queries.
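Here’s the point in miniature, using Python’s built-in sqlite3 (the schema is illustrative): one JOIN replaces a per-row lookup loop on the backend.

```python
# Sketch of letting the database resolve relations.
# One JOIN replaces N backend lookups - the planner does the work.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO posts VALUES (1, 1, 'On engines'), (2, 2, 'On kernels');
""")

# A single query; the database resolves the relation for us.
rows = db.execute("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON posts.author_id = authors.id
    ORDER BY posts.id
""").fetchall()
print(rows)  # [('On engines', 'Ada'), ('On kernels', 'Linus')]
```

The anti-pattern would be fetching all posts, then issuing one extra query per post to fetch its author on the backend.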

Big Data

Even with a database cluster, your maximum capacity is limited by your servers’ motherboards – you can’t fit an infinite number of hard drives in them. If you want to grow indefinitely, you have no option but a distributed database. It stores your data across different servers, with a maximum capacity close to the sum of all your servers’ capacities. If you’re running out of storage, you just add another server to the mix.

With just master-slave replication, you’ll be duplicating and load-balancing your DB, but its capacity won’t grow indefinitely.
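The core idea of spreading data across servers can be sketched as naive hash sharding. This is deliberately simplistic – real distributed databases use consistent hashing precisely because, as the example shows, mod-N sharding moves most keys whenever a node is added:

```python
# Naive hash-sharding sketch: keys spread across nodes, so total
# capacity grows with the node count.
import hashlib

def node_for(key, node_count):
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % node_count

keys = [f"user:{i}" for i in range(1000)]

# With 3 nodes, every node holds a share of the data.
three = {node_for(k, 3) for k in keys}
print(sorted(three))  # [0, 1, 2]

# Downside of mod-N: adding a 4th node relocates most keys,
# which is why real systems use consistent hashing instead.
moved = sum(node_for(k, 3) != node_for(k, 4) for k in keys)
print(moved > 500)  # True
```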

Possible bottlenecks

  1. Single-threaded, stateful, unscalable server. For load balancing and running many servers, your code has to be stateless.

  2. Server does database’s job. Move anything related to the data to the database.

  3. Single database instance. For load-balancing a database, go for a cluster.

  4. Read-first vs write-first mixed up. Analyse your common workload and choose the database type accordingly.

  5. You’re far away from your clients geographically. Go for CDN.

Setup examples




This is the basic setup you can build on a LAMP stack in one evening. It’s stateful – it stores sessions and god knows what else right in memory. You guessed it – it doesn’t scale at all. But it’s perfect for small weekend projects.

  • Data: gigabytes
  • Users: thousands
  • Bottleneck: Availability. Single server only, easily vulnerable to Slashdot effect
  • Tools: your regular LAMP stack.




We added a cache to the mix. The setup gained speed, but it’s still unscalable because of its stateful architecture. This is what you do when your weekend project gets users.

  • Data: gigabytes
  • Users: tens of thousands
  • Bottleneck: Stateful server. Even with the cache doing its job, the server is still unscalable
  • Tools: Express with rate limiter and in-memory cache, MongoDB




It scales! You can have as many servers as you want. Now you can deal with all the requests that might have brought a Cat down, but your database still runs as a single instance and has to handle every request itself. Despite that, it’s perfect for small projects, e-shops and the like.

  • Data: terabytes
  • Users: hundreds of thousands
  • Bottleneck: Single DB. Functional languages kick in, the server is scalable. But a single DB can fail to keep up with many requests
  • Tools: Go, Redis cache, MongoDB




It’s fast. It’s scalable. Look how beautiful it is. The DB is balancing requests, as well as your backend. The bottleneck: if you run a single server or data center, overseas users may face high latencies just because they’re far away. Still, this setup can handle a lot of users and is perfect for a news website.

  • Data: hundreds of terabytes
  • Users: millions
  • Bottleneck: Distance. The server is fast, but if your user is far away, it can be slow
  • Tools: Go, Redis + Cassandra + MongoDB




This is a CDN – an entirely different thing. You have multiple servers spread across the world, and they can serve requests just like the master. It’s not like having a cache – they are fully functional on their own.

Users from different continents are routed to different servers via DNS.

Even though the servers are fast, you’re still limited by the capacity of a single server: your DBs are copies of the master DB, so you’re limited to the master’s capacity.

This is perfect for pretty much anything like hosting providers, large e-commerce and things like that.

  • Data: hundreds of terabytes
  • Users: tens of millions
  • Bottleneck: Big Data. With master-slave replication, you can’t go big on data – you’re limited to the capacity of one DB server.
  • Tools: Same but MongoDB is clustered





This is the final form. With a distributed key-value database like Riak, your capacity is no longer limited. When you’re running low, you just buy a new storage server and add it to the mix.

Perfect for recreating Google or Facebook.

  • Data: unlimited
  • Users: all of them
  • Bottleneck: Price. It costs like a space program.
  • Tools: Go, Riak

Wrapping up

We’ve reviewed some of the most common setups for pretty much every kind of project. You don’t have to stick with them – if the task requires it, go and design your own. Just remember that every tool has its use, and make sure you’re using the tools that are right for your job.

Keep it scalable, keep it stateless.



Thanks for the really interesting article @uyouthe !

When the database is distributed, can there be situations where a user on one server will get different results from a user accessing a different server?

I'm thinking that there are basically two scenarios:

In one scenario all data is local to, say, each user. In that case we can use sharding to separate the databases, and there are no consistency problems: User A goes to database A, user B goes to database B and everything is fine.

However, in the second scenario, if I need to get data that is not "native" to my server, then I either have to access the other server directly for that data, which presumably lowers scalability, or I need to use some cached version of that data which may not be up to date. Is that a reasonable assessment or is there more to it?


First of all, thanks!

No, data inconsistencies are not possible in distributed databases. As soon as we go distributed, a CRDT algorithm steps in to ensure data consistency. You basically can’t go distributed without CRDTs, and Riak has you covered. This is why you can access the data through any node – conflict resolution and syncing happen under the hood.


You either go fully distributed or not distributed at all. With just master-slave replication, slaves copy the whole dataset, so inconsistencies aren’t possible.


Thank you for your reply! Your answer prompted me to do some reading. If I understand correctly, it seems that this kind of approach relies on the idea of "eventual consistency." However, if that is the case, it does seem that different nodes can potentially return different versions of the same information. That is, a node can potentially answer a query with data that is not up-to-date (over some finite interval of time) even when there are other nodes that do have the up-to-date information. This is something I am interested in, but do not have experience with, so do let me know if I've misunderstood, or if Riak works differently from "basic" eventual consistency...

Yes, you’re right. In a distributed system, you can go for ACID, but it will be slower. However, Riak seems to use eventual consistency and vector clocks:

With CRDTs, the CAP theorem still applies.


It’s also extremely important to look at the use case of the setup.

It’s quite possible that only a small handful of data operations need that level of database consistency – you can have a large-scale distributed DB for almost everything but one API endpoint.

Facebook, for example, is known to still use MySQL extensively throughout their system in exactly such situations (where exactly, however, is unknown).

The majority of data operations would then typically use "eventual consistency", which makes a lot more sense at their scale.

With some rather careful planning of APIs and data flows, along with DB sharding – and of course time to invest in the relevant dev work – it can be quite surprising how few data operations need true, complete consistency.


Once you're using replicated MySQL with nontrivial lag among replicas (think US to Europe or Asia), it's eventually consistent anyway unless you force your reads to go to the master.

Facebook's backend is still pretty much all MySQL, but it backs another system called TAO.


Of course yes. Going that big, it's important to have a well-defined operations workflow.


I felt genuinely excited reading this. I’ve geeked out about scale for a long time and have only had a couple of opportunities to put it into practice. Sabertooth is next level!


Wow, you had a chance to deploy a Lion? Amazing!


I wish! A couple of cheetahs and I saw a tiger up close.

You can actually build a Tiger in one evening at your regular laptop :)

There’s probably a docket image for it!

How would you build a Tiger just in one evening? I'm trying to learn more about deployment and server architecture.


Hey Miloslav, congrats on this great write-up and thanks for simplifying this topic. I am interested in knowing which of these architectures you've had the chance of creating or working with directly. Thanks.


Hey, thanks! I've deployed everything up to a Lion.


Great to know you've experienced these. Any practical tutorial which can be followed for hands on practice about these architectures?

My article, basically :)

I also created a Cheetah implementation in Node, Mongo, Redis and PM2, but it still lacks a cache and load balancer. It's easily deployed to Heroku.

Cache and load balancer are easy things to implement tho.


Excellent article @uyouthe !

I was wondering if you had some sources for further reading on these topics. I liked your concise summary. Is it based on personal experience, or are there texts on the subject that we could refer to?

Also are these names you invented for the architectures or are they more commonly known?


First of all, thanks!

Yes, I invented those names as metaphors.

Well, this is all my personal experience plus a kind of compilation of both my CS degree and my own solutions. I've tried to read books on scalable architecture, and pretty much all of them seem to say exactly the same things, just spread across hundreds of pages.

Feel free to use these patterns :)


I'd appreciate any book recommendations if you have them. Thanks.

Unfortunately I don’t have any book suggestions on scalability but check DESTROYALLSOFTWARE podcast. They tend to elaborate on hard topics.


One more question: you suggest rate limiters.

What is the best way to implement these. Clearly writing your own could have some pitfalls. Does Apache or Tomcat or Rails (or whatever) have tools we can just turn on to achieve this? Is this something we can configure in AWS?


I suggest rate-limiter-flexible for NodeJS servers. I never worked with Apache specifically – I've just treated it as the go-to LAMP webserver – but I googled it and here's what I found.

Here's the AWS default approach.


The best way (IMHO) is to treat these issues at infrastructure level rather than application-level. Check projects such as Ambassador, Kong, Istio, Linkerd. And it's not just rate limiting, it's all kinds of policies, security, circuit-breakers, canary deployments, etc.


I love this article – your explanation and language are very easy to follow. It brushed up my knowledge to another level, and all credit goes to you. Keep up this great work; I'd love to see your next article.


Oh, thanks! Nice to hear that. Here's the next article about product-making: dev.to/uyouthe/sustainable-archite...


This is an amazing writeup and solid description of how to scale with growth. I love it!


With a graph database like Riak, your capacity is no longer limited. When you’re running low, you just buy a new storage server and add it to the mix.

Oh, if only that were true...

When you add a new server to a storage backend, you have to migrate data to it. That's a major load on the servers and the network. Do you have enough capacity? Also, will your partitioning/sharding scheme scale? You can only migrate data to a new server if your partitioning scheme is granular enough. It's easy to find scenarios where you have no granularity left to partition.

Any database product that says, "You just add servers and it takes care of itself!" is engaging in false advertising. Certain parts of the scaling operation may be built in, but the operational headaches and risks are still there.

It's easy to get caught up talking about horizontally scaling stateless programs, but it's not actually that interesting, and a team that effortlessly scaled a stateless program did so because a specialized team was taking care of the scaling problems in the stateful systems they were using and trying to abstract as much as possible away. You can only get so far, though. For example, many developers who work in stateless environments will see a deadlock exception from MySQL and immediately say, "Oh, something's wrong with the database" and forward it to the infrastructure team. But that's not infrastructure. That's concurrency in the application logic, that is, state.


When using a distributed system, of course you should go for an atomic data model. If you just store the data as-is, it won't distribute.


I'm curious, for Cheetah, you mention a stack using Go and MongoDB. Do you think these choices are important in terms of the horizontal scaling? Say we replace Go with Python (or Ruby, Perl, Php), and MongoDB with Postgresql (or maybe MySQL), would that significantly impact the situation? Also, would using cooperative concurrency be very important here, or would using threads to serve the (stateless) application code be okay? I really like how you included a sense of how many concurrent connections are possible with each scenario, but it made me wonder about what makes the difference. Is it the particular technologies more, or is it more the overall architectural choices (like not storing any state at the application level) and having multiple servers?


Of course it’s about architecture. You can use whatever tools you want – I just mentioned the ones that seem most popular and suitable to me.


This is by far the most easily understandable article on scaling on the internet. I can see a lot of effort went into writing this piece and helping us understand things better. I'm sharing this on social media.


Wow, nice to hear that! Thank you!

I actually wrote this in one take in maybe 40 minutes, so there wasn't much refinement and polishing :)


WOW, I read it while listening to The Quarantine Zone, and the entire article sounds incredibly epic! A freaking sabertooth!!!!


What's interesting about the "Sabertooth" approach is how the DB and app servers scale independently.

I wonder why "regular" microservice architecture didn't get a diagram here?

If you have independent server nodes, each with their own database, this gives you similar scalability (since each node will spend its time on app or DB load as needed) without the network overhead between app/DB nodes. And it doesn't have to "cost like a space program" in terms of infrastructure – and it's probably comparable in terms of cost of development/deployment.

I'm not a microservice architecture fanboy by any means, but the article seems incomplete without considering this type of architecture. 🙂


“For anything related to data management or data models, move it to your database as procedures or queries.”

Don’t know if I agree here. It’s difficult to source control SQL procedures/functions when they live in the database.


Your SQL schemas should be declarative. Set them to reapply on every deploy and store them as .sql files.
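To illustrate what "reapply on every deploy" means in practice, here's a small sketch using Python's sqlite3 – the schema is hypothetical, and the point is that every statement is idempotent, so the same .sql file can run on every deploy without errors:

```python
# Sketch of a declarative, re-applicable schema: each statement is
# idempotent (IF NOT EXISTS), so re-running the file is a no-op.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE
);
CREATE INDEX IF NOT EXISTS idx_users_email ON users (email);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executescript(SCHEMA)  # simulating the next deploy: no error, no change
print("ok")
```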


I'm shocked that Kubernetes isn't even mentioned. :)


I tried to mention infrastructure tools as little as possible. Docker isn't mentioned either.


It's a nice summary of web development! I really enjoyed it


Oh, thanks! Glad to hear that!


Great article!

"Bottleneck: Price. It costs like a space program." - I laughed out loud when I read this :)


Thanks! Whoa, I wrote something funny and someone said it was FUNNY, finally!


I'm just a newcomer. I hardly comment on any post, but I had to on yours. I just wanna say "Amazingggggg". I'm looking forward to more articles like this from you.


You're truly an architect. Thank you for this great article, Miloslav.


You know what, I'm gonna save that testimonial. Thank you a lot :)


Thank you for this! A really interesting read.


A very solid description.
More than best practices, you write interesting approaches.

Definitively a must-read.

Bravo ✍️


Great article! One thing I don't have so clear is how the ORM cache works in these architectures.


I'm afraid I don't have sufficient experience with ORM to give advice here.


Some claims are wrong, e.g.:

  • python can be FASTER than alternatives because its most popular performance-sensitive libraries are written in C, with python just executing calls. You should use python-only code to distribute data, not to do performance-intensive calls.
  • Node.js execution is single-threaded, but it's async event-driven. For I/O (disk, memory requests, ...), your OS already handles concurrent I/O perfectly and doesn't need any execution until the I/O has finished. The single thread now maximizes what it can get out of a single core, and the first step of scaling up is to use multiple cores (e.g. pm2). This is a design decision, not a downside, not something that makes it slower (unless you write bad code). Use it for the right use cases.
  • If you have relational data, NoSQL will always be slower than SQL, because it doesn't scale. Per relation, your NoSQL will have to execute an additional query, while SQL will only have one. NoSQL is for documents, SQL is for relations. They are different tools for different jobs.

And your diagrams include bottlenecks, like your load balancer and what you call "main server". In scalability, there is no such thing as a "main" server (you should have scalable workers, and scalable entry points).

You're missing DNS load balancing, which balances requests not just per region but also to different load balancers in that region. Load balanced load balancers.

Structure is also independent of technology.

  • Redis is memory storage (which you can implement in a much faster and simpler way if you have specific knowledge about your data, e.g. you have a data set that is guaranteed to be limited to a fixed, small, number of items with a specific structure). (But for generic cases, agreed, don't implement it yourself.)
  • Caching can be done at several layers (any piece of the output, or the complete pre-generated output like HTML), and needs more than only caching the final output unless you're working with simple documents that don't change often (like HTML for blogs).
  • Database choice: MongoDB, MySQL, PostgreSQL, ... Implementation and feature details will differ but if they use the same methodology, they will have similar scalability characteristics.

And so on.

A perfectly scalable solution hosted on a worldwide distributed network also doesn't cost that much these days. For solutions that don't process millions of dollars, you can go for serverless solutions and pay per execution.

Well done on contributing to a community, it can take a lot of effort. But try to include the limits of your advice in your article if you write about sensitive subjects. There will be people who find your article on Google and will make sensitive production decisions based on your advice, and they should be informed.


I can see nothing wrong. Python calls are written in C, but only some of them. Still can't go faster than optimised raw C or Go. Node is still single-threaded. On raw read and write queries, NoSQL will outperform SQL all the way.

No, there IS a main server if we're talking anything except distributed systems. Master-slave replication? Master is here. Regular CDN? Edge servers are copying main server.

I don't miss DNS load balancing. Lion consists of Cheetah and Tiger, each of them are load balanced.

No Redis? Write it from scratch ad-hoc and specific for your data? Oh, and maybe add some Radix sort to the mix.

Doesn't cost that much? Have you seen Heroku or AWS tiers for data that large and servers that performant? Bear in mind that you'll have to pay monthly.


I can see nothing wrong.

Alright, well then don't let me disrupt your flow. Go on doing whatever it is you want to do.