Scalable architecture without magic (and how to build it if you’re not Google)

Miloslav Voloskov on February 21, 2019

TL;DR: Use the right tools. Don’t mess up write-first and read-first databases. Multiply everything. To multiply things, you have to keep them sta...
 

Thanks for the really interesting article @uyouthe !

When the database is distributed, can there be situations where a user on one server will get different results from a user accessing a different server?

I'm thinking that there are basically two scenarios:

In one scenario all data is local to, say, each user. In that case we can use sharding to separate the databases, and there are no consistency problems: User A goes to database A, user B goes to database B and everything is fine.

However, in the second scenario, if I need to get data that is not "native" to my server, then I either have to access the other server directly for that data, which presumably lowers scalability, or I need to use some cached version of that data which may not be up to date. Is that a reasonable assessment or is there more to it?

 

First of all, thanks!

No, data inconsistencies are not possible in distributed databases. As soon as we go distributed, a CRDT algorithm steps in to ensure data consistency. You basically can’t go distributed without CRDTs, and Riak has you covered. This is why you can access the data through any node – conflict resolution and syncing happen under the hood.

en.m.wikipedia.org/wiki/Conflict-f...

You either go fully distributed or not distributed at all. With plain master-slave replication, slaves just copy the whole dataset, so inconsistencies aren’t possible there either.
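
If it helps to see the "conflict resolving under the hood" part concretely, here's a minimal sketch of one of the simplest CRDTs, a grow-only counter (G-Counter), in plain Node. It illustrates the general merge idea only, not Riak's actual implementation:

```js
// Minimal G-Counter CRDT sketch: each node only increments its own slot,
// and merging two replicas takes the element-wise maximum, so merging is
// commutative, associative and idempotent – there is no conflict to resolve.
class GCounter {
  constructor(nodeId, counts = {}) {
    this.nodeId = nodeId;
    this.counts = { ...counts };
  }

  increment(by = 1) {
    this.counts[this.nodeId] = (this.counts[this.nodeId] || 0) + by;
  }

  value() {
    return Object.values(this.counts).reduce((sum, n) => sum + n, 0);
  }

  merge(other) {
    const merged = { ...this.counts };
    for (const [node, count] of Object.entries(other.counts)) {
      merged[node] = Math.max(merged[node] || 0, count);
    }
    return new GCounter(this.nodeId, merged);
  }
}

// Two replicas diverge, then sync in either order and agree on the total.
const a = new GCounter('node-a');
const b = new GCounter('node-b');
a.increment(3);
b.increment(2);
console.log(a.merge(b).value()); // 5
console.log(b.merge(a).value()); // 5
```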

 

Thank you for your reply! Your answer prompted me to do some reading. If I understand correctly, it seems that this kind of approach relies on the idea of "eventual consistency." However, if that is the case, it does seem that different nodes can potentially return different versions of the same information. That is, a node can potentially answer a query with data that is not up-to-date (over some finite interval of time) even when there are other nodes that do have the up-to-date information. This is something I am interested in, but do not have experience with, so do let me know if I've misunderstood, or if Riak works differently from "basic" eventual consistency...

Yes, you’re right. In a distributed system you can go for ACID, but it will be slower. However, Riak seems to use eventual consistency and vector clocks:
docs.basho.com/riak/kv/2.0.1/learn...

With CRDTs, the CAP theorem always applies.
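
To make the vector-clock part concrete, here's a rough sketch of how two versions of the same key can be compared, assuming each write carries a simple { nodeId: counter } map. This is the general technique, simplified compared to what Riak actually stores:

```js
// Compare two vector clocks: 'a<b' (a is an ancestor of b), 'b<a',
// 'equal', or 'concurrent' (a genuine conflict that needs resolving).
function compareClocks(a, b) {
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aBehind = false;
  let bBehind = false;
  for (const node of nodes) {
    const av = a[node] || 0;
    const bv = b[node] || 0;
    if (av < bv) aBehind = true;
    if (bv < av) bBehind = true;
  }
  if (aBehind && bBehind) return 'concurrent';
  if (aBehind) return 'a<b';
  if (bBehind) return 'b<a';
  return 'equal';
}

console.log(compareClocks({ a: 2, b: 1 }, { a: 2, b: 2 })); // 'a<b'
console.log(compareClocks({ a: 3 }, { b: 1 }));             // 'concurrent'
```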

 

It's also extremely important to look at the use case of the setup.

It is quite possible that only a small handful of data operations needs that level of database consistency, so you can have a large-scale distributed DB for almost everything but one API endpoint.

Facebook, for example, is known to still use MySQL extensively throughout their system for exactly these situations (exactly where, however, is unknown).

The majority of data operations would then typically use "eventual consistency", which makes a lot more sense at their scale.

With some rather careful planning of APIs and data flows, along with DB sharding (and, of course, time to invest in the relevant dev work), it can be quite surprising how few data operations need true, complete consistency.
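
As a sketch of what the sharding part can look like in code, here's a minimal shard router that hashes a key onto one of a few hypothetical connection pools (the pools are stubs; any real driver would work the same way):

```js
const crypto = require('crypto');

// Hypothetical connection pools, one per shard. Stubs here; in practice
// these would be real MySQL/Mongo pools created at startup.
const shards = [
  { name: 'shard-0', query: async (sql, params) => [] },
  { name: 'shard-1', query: async (sql, params) => [] },
  { name: 'shard-2', query: async (sql, params) => [] },
];

// Deterministically map a shard key (e.g. a user id) to one shard.
function shardFor(userId) {
  const hash = crypto.createHash('md5').update(String(userId)).digest();
  return shards[hash.readUInt32BE(0) % shards.length];
}

// All data for one user always lives on (and is read from) the same shard.
async function getOrders(userId) {
  const shard = shardFor(userId);
  return shard.query('SELECT * FROM orders WHERE user_id = ?', [userId]);
}

console.log(shardFor(42).name); // always the same shard for user 42
```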

 

Once you're using replicated MySQL with nontrivial lag among replicas (think US to Europe or Asia), it's eventually consistent anyway unless you force your reads to go to the master.

Facebook's backend is still pretty much all MySQL, but these days it mostly backs another system called TAO.
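
A minimal sketch of the "force reads to go to the master" idea, assuming two mysql2 pools; the hostnames and table are made up:

```js
const mysql = require('mysql2/promise');

// Two pools: writes and read-your-own-writes go to the primary,
// everything else can tolerate replication lag and hits a replica.
const primary = mysql.createPool({ host: 'db-primary.example.com', user: 'app', database: 'app' });
const replica = mysql.createPool({ host: 'db-replica-eu.example.com', user: 'app', database: 'app' });

async function getProfile(userId, { fresh = false } = {}) {
  const pool = fresh ? primary : replica;
  const [rows] = await pool.query('SELECT * FROM profiles WHERE user_id = ?', [userId]);
  return rows[0];
}

// Right after an update, read back from the primary to avoid stale data.
async function updateBio(userId, bio) {
  await primary.query('UPDATE profiles SET bio = ? WHERE user_id = ?', [bio, userId]);
  return getProfile(userId, { fresh: true });
}
```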

 

Yes, of course. Going that big, it's important to have a well-defined operations workflow.

 

Excellent article @uyouthe !

I was wondering if you had some sources for further reading on these topics. I liked your concise summary. Is it based on personal experience or are there texts on the subject that we could refer to?

Also are these names you invented for the architectures or are they more commonly known?

 

First of all, thanks!

Yes, I invented those names as metaphors.

Well, this is all my personal experience plus a kind of compilation of both my CS degree and my own solutions. I've tried to read books on scalable architecture, and pretty much all of them seem to say exactly the same things, just spread across hundreds of pages.

Feel free to use these patterns :)

 

I'd appreciate any book recommendations if you have them. Thanks.

Unfortunately I don’t have any book suggestions on scalability, but check out the DESTROYALLSOFTWARE podcast. They tend to elaborate on hard topics.

 

I felt genuinely excited reading this. I’ve geeked out about scale for a long time and have only had a couple of opportunities to put it into practice. Sabertooth is next level!

 

Wow, you had a chance to deploy a Lion? Amazing!

 

I wish! A couple of cheetahs and I saw a tiger up close.

You can actually build a Tiger in one evening on your regular laptop :)

 

Hey Miloslav, congrats on this great write-up and thanks for simplifying this topic. I am interested in knowing which of these architectures you've had the chance of creating or working with directly. Thanks.

 

Hey, thanks! I've deployed everything up to a Lion.

 

Great to know you've experienced these. Is there any practical tutorial that can be followed for hands-on practice with these architectures?

My article, basically :)

I also created a Cheetah implementation with Node, Mongo, Redis and PM2, but it still lacks a cache and a load balancer. It's easily deployed to Heroku.

A cache and a load balancer are easy things to implement, though.
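
For what it's worth, the cache part really is small. Here's a minimal sketch of a read-through Redis cache for an Express route using node-redis v4; the route, key names, TTL and the Mongo lookup stub are just placeholders:

```js
const express = require('express');
const { createClient } = require('redis'); // node-redis v4

const app = express();
const redis = createClient(); // assumes Redis on localhost:6379

// Read-through cache: try Redis first, fall back to the slow data source,
// then cache the result for 60 seconds.
app.get('/products/:id', async (req, res) => {
  const key = `product:${req.params.id}`;
  const cached = await redis.get(key);
  if (cached) return res.json(JSON.parse(cached));

  const product = await loadProductFromMongo(req.params.id);
  await redis.set(key, JSON.stringify(product), { EX: 60 });
  res.json(product);
});

// Stand-in for the actual MongoDB query.
async function loadProductFromMongo(id) {
  return { id, name: 'example' };
}

redis.connect().then(() => app.listen(3000));
```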

 

I'm curious, for Cheetah, you mention a stack using Go and MongoDB. Do you think these choices are important in terms of the horizontal scaling? Say we replace Go with Python (or Ruby, Perl, PHP), and MongoDB with PostgreSQL (or maybe MySQL), would that significantly impact the situation? Also, would using cooperative concurrency be very important here, or would using threads to serve the (stateless) application code be okay? I really like how you included a sense of how many concurrent connections are possible with each scenario, but it made me wonder about what makes the difference. Is it the particular technologies more, or is it more the overall architectural choices (like not storing any state at the application level) and having multiple servers?

 

Of course it’s about the architecture. You can use whatever tools you want; I just mention the ones that seem the most popular and suitable to me.

 

This is by far the most easily understandable article on scaling on the internet. I can see a lot of effort has gone into writing this piece and helping us understand things better. I am sharing this on social media.

 

Wow, nice to hear that! Thank you!

I actually wrote this in one take in maybe 40 minutes, so there wasn't much refinement and polishing :)

 

“With a graph database like Riak, your capacity is no longer limited. When you’re running low, you just buy a new storage server and add it to the mix.”

Oh, if only that were true...

When you add a new server to a storage backend, you have to migrate data to it. That's a major load on the servers and the network. Do you have enough capacity? Also, will your partitioning/sharding scheme scale? You can only migrate data to a new server if your partitioning scheme is granular enough. It's easy to find scenarios where you have no granularity left to partition.

Any database product that says, "You just add servers and it takes care of itself!" is engaging in false advertising. Certain parts of the scaling operation may be built in, but the operational headaches and risks are still there.

It's easy to get caught up talking about horizontally scaling stateless programs, but that's not actually the interesting part. A team that effortlessly scaled a stateless program did so because a specialized team was taking care of the scaling problems in the stateful systems underneath and trying to abstract as much of that away as possible. You can only get so far, though. For example, many developers who work in stateless environments will see a deadlock exception from MySQL and immediately say, "Oh, something's wrong with the database," and forward it to the infrastructure team. But that's not infrastructure. That's concurrency in the application logic, that is, state.
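
To illustrate that last point: the usual application-side fix for a MySQL deadlock is to retry the whole transaction, because the lock ordering comes from the application's own logic. A rough sketch against a mysql2 promise pool (errno 1213 is MySQL's ER_LOCK_DEADLOCK):

```js
// Retry a unit of work a few times when MySQL reports a deadlock (errno 1213).
// The deadlock comes from how the application orders its locks, so the
// application is the right place to handle it.
async function withDeadlockRetry(pool, work, attempts = 3) {
  for (let i = 1; i <= attempts; i++) {
    const conn = await pool.getConnection();
    try {
      await conn.beginTransaction();
      const result = await work(conn); // the caller's queries run here
      await conn.commit();
      return result;
    } catch (err) {
      await conn.rollback();
      if (err.errno !== 1213 || i === attempts) throw err;
      // brief backoff before retrying the whole transaction
      await new Promise((resolve) => setTimeout(resolve, 50 * i));
    } finally {
      conn.release();
    }
  }
}
```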

 

When using a distributed system, of course you should go for an atomic data model. If you just store the data as is, it won't distribute.

 

This is an amazing writeup and solid description of how to scale with growth. I love it!

 
 

One more question: you suggest rate limiters.

What is the best way to implement these? Clearly writing your own could have some pitfalls. Does Apache or Tomcat or Rails (or whatever) have tools we can just turn on to achieve this? Is this something we can configure in AWS?

 

I suggest rate-limiter-flexible for Node.js servers. I've never worked with Apache specifically, I've just treated it as the go-to LAMP web server, but I googled it and here's what I found.

Here's the AWS default approach.
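
For the Node option above, a minimal sketch of rate-limiter-flexible as Express middleware; the limits are arbitrary:

```js
const express = require('express');
const { RateLimiterMemory } = require('rate-limiter-flexible');

const app = express();
// 10 requests per second per client IP (numbers picked arbitrarily).
const limiter = new RateLimiterMemory({ points: 10, duration: 1 });

app.use(async (req, res, next) => {
  try {
    await limiter.consume(req.ip); // spends one point, throws when exhausted
    next();
  } catch {
    res.status(429).send('Too Many Requests');
  }
});

app.get('/', (req, res) => res.send('ok'));
app.listen(3000);
```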

 

The best way (IMHO) is to handle these issues at the infrastructure level rather than the application level. Check out projects such as Ambassador, Kong, Istio and Linkerd. And it's not just rate limiting: it's all kinds of policies, security, circuit breakers, canary deployments, etc.

 

“For anything related to data management or data models, move it to your database as procedures or queries.”

Don’t know if I agree here. It’s difficult to source control SQL procedures/functions when they live in the database.

 

Your SQL schemas should be declarative. Set them to reapply on every deploy and store them as .sql files.
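
A sketch of that workflow: keep idempotent .sql files in the repo and reapply them as a deploy step. Shown here with node-postgres; the file layout and connection settings are just an example:

```js
// deploy/apply-schema.js – run as part of the deploy pipeline.
// Schema files must be idempotent (CREATE TABLE IF NOT EXISTS,
// CREATE OR REPLACE FUNCTION, ...) so reapplying is always safe.
const fs = require('fs');
const path = require('path');
const { Client } = require('pg');

async function applySchemas(dir = path.join(__dirname, 'schema')) {
  const client = new Client(); // connection details come from PG* env vars
  await client.connect();
  try {
    const files = fs.readdirSync(dir).filter((f) => f.endsWith('.sql')).sort();
    for (const file of files) {
      const sql = fs.readFileSync(path.join(dir, file), 'utf8');
      await client.query(sql);
      console.log(`applied ${file}`);
    }
  } finally {
    await client.end();
  }
}

applySchemas().catch((err) => {
  console.error(err);
  process.exit(1);
});
```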

 

WOW, I read it while listening to The Quarantine Zone and the entire article sounds incredibly epic! A freaking sabertooth!!!!

 
 

I'm shocked that Kubernetes isn't even mentioned. :)

 

I tried to mention infrastructure tools as little as possible. Docker isn't mentioned either.

 

You're truly an architect. Thank you for this great article, Miloslav.

 

You know what, I'm gonna save that testimonial. Thanks a lot :)

 

Great article!

"Bottleneck: Price. It costs like a space program." - I laughed out loud when I read this :)

 

Thanks! Whoa, I wrote something funny and someone said it was FUNNY, finally!

 
 
 
 
 

Great article! One thing I don't have so clear is how the ORM cache works in these architectures.

 

I'm afraid I don't have sufficient experience with ORM to give advice here.

 
 
 

A very solid description.
More than best practices, you describe interesting approaches.

Definitively a must-read.

Bravo ✍️

 
 

Some claims are wrong, e.g.:

  • Python can be FASTER than the alternatives because its most popular performance-sensitive libraries are written in C, with Python just dispatching the calls. Use Python-only code to move data around, not to do performance-intensive work.
  • Node.js execution is single-threaded, but it's async and event-driven. For I/O (disk, network requests, ...), your OS already handles concurrent I/O perfectly and doesn't need your code to run until the I/O has finished. The single thread then maximizes what it can get out of a single core, and the first step of scaling up is to use multiple cores (e.g. with pm2). This is a design decision, not a downside, and not something that makes it slower (unless you write bad code). Use it for the right use cases.
  • If you have relational data, NoSQL will always be slower than SQL, because it doesn't scale there: per relation, NoSQL has to execute an additional query, while SQL needs only one (see the sketch just after this list). NoSQL is for documents, SQL is for relations. They are different tools for different jobs.
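
A sketch of the "one extra query per relation" point, with a hypothetical users/orders pair, comparing a document-store lookup to the equivalent single SQL join:

```js
// Document-store style: the relation costs a second round trip.
// `db` is assumed to be a connected MongoDB Db instance.
async function orderWithUserMongo(db, orderId) {
  const order = await db.collection('orders').findOne({ _id: orderId });
  const user = await db.collection('users').findOne({ _id: order.userId });
  return { ...order, user };
}

// Relational style: one query, the join happens inside the database.
const orderWithUserSql = `
  SELECT o.*, u.name, u.email
  FROM orders o
  JOIN users u ON u.id = o.user_id
  WHERE o.id = ?;
`;
```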

And your diagrams include bottlenecks, like your load balancer and what you call the "main server". In scalability, there is no such thing as a "main" server (you should have scalable workers and scalable entry points).

You're missing DNS load balancing, which balances requests not just per region but also to different load balancers in that region. Load balanced load balancers.

Structure is also independent of technology.

  • Redis is memory storage (which you can implement in a much faster and simpler way if you have specific knowledge about your data, e.g. a data set that is guaranteed to be limited to a fixed, small number of items with a specific structure; for generic cases, agreed, don't implement it yourself).
  • Caching can be done at several layers (any piece of the output, or the complete pre-generated output like HTML), and it takes more than only caching the final output unless you're working with simple documents that don't change often (like HTML for blogs).
  • Database choice: MongoDB, MySQL, PostgreSQL, ... Implementation and feature details will differ, but if they use the same methodology, they will have similar scalability characteristics.

And so on.

A perfectly scalable solution hosted on a worldwide distributed network also doesn't cost that much these days. For solutions that don't process millions of dollars, you can go serverless and pay per execution.

Well done on contributing to a community, it can take a lot of effort. But try to include the limits of your advice in your article if you write about sensitive subjects. There will be people who find your article on Google and will make sensitive production decisions based on your advice, and they should be informed.

 

I can see nothing wrong. Some Python calls are backed by C, but only some of them; you still can't go faster than optimized raw C or Go. Node is still single-threaded. On raw read and write queries, NoSQL will outperform SQL all the way.

No, there IS a main server if we're talking about anything except distributed systems. Master-slave replication? The master is right there. A regular CDN? Edge servers copy the main server.

I don't miss DNS load balancing. A Lion consists of a Cheetah and a Tiger, each of which is load balanced.

No Redis? Write it from scratch, ad hoc and specific to your data? Oh, and maybe add some radix sort to the mix.

Doesn't cost that much? Have you seen Heroku or AWS tiers for data that large and servers that performant? Bear in mind that you'll have to pay monthly.

 

“I can see nothing wrong.”

Alright, well then don't let me disrupt your flow. Go on doing whatever it is you want to do.
