Let's say you want to design a highly available, large-scale system. What will you read?
You might start by reading about how Google, Facebook, Netflix, and other big companies build their systems. That makes perfect sense.
Along the way, you will see a lot of design decisions, such as Facebook using GraphQL in their backend, Stripe using Ruby as their main language, etc.
You might start to think that those are all proven ways to build a large-scale, highly available system.
And that is where the reasoning goes wrong.
I recently saw this type of reasoning and started to worry about it.
Let's say you design a system for one of those companies; what kinds of concerns do you need to think about?
- What are the usage characteristics?
- How will the requirements grow?
- How can we ensure system availability?
- How can we trace an issue from user tickets?
- How will the engineering teams work together?
To address all these concerns, you make many, many design decisions to accommodate and balance the needs from every constraint as best you can.
Let's say that to meet GDPR requirements, you need to do X. To ensure high availability, you need to do Y. And to ensure performance, you need to do Z.
You will do many other things, but let's just keep it simple.
But from an outsider's perspective, they see that your system implements X, Y, and Z, and that your system is very performant. As they are objective and data-driven, they think:
"Based on the data and the actual result, it is clear that if I do X and Y, then I will get a performant system."
The truth is that both X and Y are design decisions that are not related to performance at all...
Just because a system is performant, it does not mean every part of the system is designed to be performant.
Just because a system is highly available, it does not mean every technique implemented in that system contributes to high availability.
Just because a system produces a result Z, it does not mean that design decisions A, B, and C are all related to Z.
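To make the gap between the two views concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `DECISIONS` table, the function names, and the motivations simply mirror the X, Y, Z example above. It only illustrates the reasoning; it is not a real analysis tool.

```python
# Hypothetical record of design decisions and the concern that motivated each.
# The letters and motivations mirror the X, Y, Z example in the text.
DECISIONS = {
    "X": "GDPR compliance",
    "Y": "high availability",
    "Z": "performance",
}


def naive_inference(observed_decisions, observed_outcome):
    """The outsider's view: every visible decision is credited with the outcome."""
    return {decision: observed_outcome for decision in observed_decisions}


def decisions_motivated_by(outcome, decisions=DECISIONS):
    """The designer's view: only decisions whose recorded motivation matches."""
    return sorted(d for d, why in decisions.items() if why == outcome)


# The outsider sees X and Y in a performant system and credits both.
print(naive_inference(["X", "Y"], "performance"))
# The recorded motivations say only Z was about performance.
print(decisions_motivated_by("performance"))
```

The outsider's function never looks at the motivation at all, which is exactly the problem: without the "why" behind each decision, every decision looks like a cause of whatever result you happen to observe.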
Some real-life examples
- While many large-scale, complex systems implement microservices, it does not mean that microservices are a requirement for large-scale systems. In fact, "you must be this tall to use microservices". Many systems out there handle scale very well without them.
- Facebook uses GraphQL on their backend, and their backend possibly handles a billion requests per day. But the design of GraphQL enables even large teams to make changes with a high degree of isolation and confidence. It is meant to solve the coordination problem in a large team and has nothing to do with performance and availability (well, technically a little bit, but let's leave that discussion out).
I was worried about this because I recently saw some posts claiming that one big company achieved X, and they do Y, so if we want to achieve X, we need to do Y.
And Y was meant to solve a totally different, unrelated problem.
In order to understand the real correlation between a design decision and an outcome, you need to go way further than just looking at the result. Or at least read through the motivation behind each decision.
And if you write a design document, please also write down the motivation and trade-offs. Otherwise, many followers will interpret it the wrong way.
Thanks for reading!!