re: Scalable architecture without magic (and how to build it if you’re not Google)

 

Some claims are wrong, e.g.:

  • Python can be FASTER than alternatives because its most popular performance-sensitive libraries are written in C, with Python just making the calls. Use pure-Python code to distribute data, not to do the performance-intensive work itself.
  • Node.js execution is single-threaded, but it's async and event-driven. For I/O (disk, memory requests, ...), your OS already handles concurrent I/O efficiently and doesn't need any of your code to run until the I/O has finished. The single thread can then get the most out of a single core, and the first step of scaling up is to use multiple cores (e.g. with pm2). This is a design decision, not a downside, and not something that makes it slower (unless you write bad code). Use it for the right use cases.
  • If you have relational data, NoSQL will always be slower than SQL, because it doesn't scale with the number of relations: per relation, your NoSQL store has to execute an additional query, while SQL needs only one. NoSQL is for documents, SQL is for relations. They are different tools for different jobs.
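
A quick way to see the first point, assuming CPython: the built-in `sum()` stands in here for C-backed libraries like NumPy (which may not be installed), and the function names are just for illustration. Both functions compute the same result; the only difference is whether the loop runs in the interpreter or in C.

```python
import timeit


def python_loop_sum(n: int) -> int:
    # Pure-Python loop: every iteration is executed by the interpreter.
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n: int) -> int:
    # CPython's built-in sum() runs its loop in C; Python makes a single call.
    return sum(range(n))


if __name__ == "__main__":
    n = 200_000
    assert python_loop_sum(n) == builtin_sum(n)
    loop_s = timeit.timeit(lambda: python_loop_sum(n), number=5)
    c_s = timeit.timeit(lambda: builtin_sum(n), number=5)
    print(f"pure-Python loop: {loop_s:.3f}s, C-backed sum(): {c_s:.3f}s")
```

On a typical CPython build the second timing is noticeably smaller, though exact numbers depend on the machine.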
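
For the Node.js point, the event-loop idea can be sketched with Python's `asyncio`, which is also a single-threaded event loop (an analogy, not Node's actual runtime). `asyncio.sleep` is an assumption standing in for real disk or network waits, and `fake_io`/`handle_requests` are hypothetical names.

```python
from __future__ import annotations

import asyncio
import threading
import time


async def fake_io(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on disk or network I/O: the coroutine
    # yields to the event loop, which runs other tasks in the meantime.
    await asyncio.sleep(delay)
    return f"{name} handled on {threading.current_thread().name}"


async def handle_requests(count: int, delay: float) -> list[str]:
    # Start all "requests" at once; their waits overlap on one thread.
    return await asyncio.gather(*(fake_io(f"req{i}", delay) for i in range(count)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_requests(10, 0.2))
    elapsed = time.perf_counter() - start
    # Ten 0.2 s waits overlap, so this takes roughly 0.2 s rather than 2 s.
    print(f"{len(results)} requests in {elapsed:.2f}s")
    print(results[0])
```

Scaling past one core is then a matter of running several such processes, which is what pm2's cluster mode does for Node.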
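
The round-trip argument in the third point can be counted directly. This sketch uses SQLite from the standard library for both paths so it runs anywhere; the second function emulates a store without joins by issuing one extra lookup per related record. The `authors`/`posts` schema and the function names are hypothetical.

```python
from __future__ import annotations

import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'ada'), (2, 'linus');
        INSERT INTO posts VALUES (1, 1, 'engines'), (2, 2, 'kernels'), (3, 1, 'notes');
        """
    )
    return conn


def with_join(conn: sqlite3.Connection) -> tuple[list[tuple[str, str]], int]:
    # Relational style: the database resolves the relation in one query.
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1


def per_relation_lookups(conn: sqlite3.Connection) -> tuple[list[tuple[str, str]], int]:
    # Join-less style: fetch the posts, then one extra query per related author.
    queries = 1
    posts = conn.execute("SELECT title, author_id FROM posts ORDER BY id").fetchall()
    rows = []
    for title, author_id in posts:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        rows.append((title, name))
    return rows, queries


if __name__ == "__main__":
    db = make_db()
    print(with_join(db))
    print(per_relation_lookups(db))
```

Both return the same three rows; the join needs 1 query, while the lookup version needs 1 + N (4 here), and that count grows with every related record.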

And your diagrams include bottlenecks, like your load balancer and what you call the "main server". In a scalable design there is no such thing as a "main" server (you should have scalable workers and scalable entry points).
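
A minimal sketch of "no main server", assuming stateless, interchangeable workers: whichever entry point receives a request can send it to any healthy worker, so losing one worker reduces capacity instead of taking the service down. `WorkerPool` and the worker names are hypothetical, and a real setup would use health checks rather than a manual `mark_down`. Several entry points can each run their own pool over the same workers.

```python
from __future__ import annotations

import itertools


class WorkerPool:
    """Round-robin over interchangeable workers, skipping any marked down."""

    def __init__(self, workers: list[str]) -> None:
        self.workers = list(workers)
        self.down: set[str] = set()
        self._cycle = itertools.cycle(self.workers)

    def mark_down(self, worker: str) -> None:
        self.down.add(worker)

    def mark_up(self, worker: str) -> None:
        self.down.discard(worker)

    def pick(self) -> str:
        # Check each worker at most once per call; no worker is special.
        for _ in range(len(self.workers)):
            worker = next(self._cycle)
            if worker not in self.down:
                return worker
        raise RuntimeError("no healthy workers")


if __name__ == "__main__":
    pool = WorkerPool(["w1", "w2", "w3"])
    pool.mark_down("w2")
    print([pool.pick() for _ in range(4)])  # ['w1', 'w3', 'w1', 'w3']
```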

You're missing DNS load balancing, which balances requests not just per region but also across different load balancers within that region. Load-balanced load balancers.
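
To make the two levels concrete, here is a toy model, not a real DNS server: the resolver first picks the client's region, then rotates that region's load balancer addresses so successive lookups lead with a different balancer. Each balancer then spreads traffic over its own workers. `RoundRobinDNS` and the region names are made up, and the addresses come from the reserved documentation ranges.

```python
from __future__ import annotations

from collections import deque


class RoundRobinDNS:
    """Toy geo + round-robin resolver returning several A records per region."""

    def __init__(self, records: dict[str, list[str]]) -> None:
        self._records = {region: deque(addrs) for region, addrs in records.items()}

    def resolve(self, region: str) -> list[str]:
        addrs = self._records[region]
        answer = list(addrs)
        # Rotate so the next lookup in this region leads with the next balancer.
        addrs.rotate(-1)
        return answer


if __name__ == "__main__":
    dns = RoundRobinDNS({
        "eu": ["192.0.2.10", "192.0.2.11"],
        "us": ["198.51.100.10", "198.51.100.11", "198.51.100.12"],
    })
    for _ in range(3):
        # In this toy model, assume the client connects to the first address.
        print(dns.resolve("us")[0])
```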

Structure is also independent of technology.

  • Redis is an in-memory data store (which you can implement in a much faster and simpler way if you have specific knowledge about your data, e.g. a data set that is guaranteed to be limited to a fixed, small number of items with a specific structure). (But for generic cases, agreed, don't implement it yourself.)
  • Caching can be done at several layers (any piece of the output, or the complete pre-generated output like HTML), and you need more than just caching the final output unless you're serving simple documents that don't change often (like blog HTML).
  • Database choice: MongoDB, MySQL, PostgreSQL, ... Implementation and feature details will differ, but if they use the same methodology, they will have similar scalability characteristics.
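
For the Redis point, a sketch of what "specific knowledge about your data" can buy, assuming a hypothetical data set of exactly one counter per weekday: since the key set is fixed and tiny, a preallocated list indexed by an enum replaces hashing, eviction and a network round trip. The trade-off is that it lives inside one process, so it only fits data that doesn't need to be shared across workers (or can be kept per worker). `Day` and `WeekdayCounters` are illustrative names.

```python
from __future__ import annotations

from enum import IntEnum


class Day(IntEnum):
    MON = 0
    TUE = 1
    WED = 2
    THU = 3
    FRI = 4
    SAT = 5
    SUN = 6


class WeekdayCounters:
    """Hypothetical specialised store: exactly one integer counter per Day."""

    __slots__ = ("_counts",)

    def __init__(self) -> None:
        # The key set is known up front, so storage is allocated once.
        self._counts = [0] * len(Day)

    def incr(self, day: Day, by: int = 1) -> int:
        # IntEnum members are ints, so they index the list directly.
        self._counts[day] += by
        return self._counts[day]

    def get(self, day: Day) -> int:
        return self._counts[day]


if __name__ == "__main__":
    counters = WeekdayCounters()
    counters.incr(Day.MON)
    print(counters.incr(Day.MON, 2), counters.get(Day.TUE))  # 3 0
```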

And so on.

A perfectly scalable solution hosted on a worldwide distributed network also doesn't cost that much these days. For projects that aren't processing millions of dollars, you can go serverless and pay per execution.
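
To put numbers on "pay per execution", here is a generic cost model, assuming the common billing shape of a per-request fee plus compute billed in GB-seconds. Every price is an input you'd take from your provider's current pricing page; the figures in the example are placeholders, not real quotes, and the function names are illustrative.

```python
def per_execution_cost(invocations: int, price_per_million: float,
                       avg_duration_s: float, memory_gb: float,
                       price_per_gb_s: float) -> float:
    """Monthly bill: request fee plus compute billed in GB-seconds."""
    request_fee = invocations / 1_000_000 * price_per_million
    compute_fee = invocations * avg_duration_s * memory_gb * price_per_gb_s
    return request_fee + compute_fee


def break_even_invocations(fixed_monthly: float, price_per_million: float,
                           avg_duration_s: float, memory_gb: float,
                           price_per_gb_s: float) -> float:
    """Monthly invocations at which pay-per-execution equals a fixed server bill."""
    per_call = (price_per_million / 1_000_000
                + avg_duration_s * memory_gb * price_per_gb_s)
    return fixed_monthly / per_call


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    print(per_execution_cost(2_000_000, 0.5, 0.1, 0.5, 0.0001))
    print(break_even_invocations(55.0, 0.5, 0.1, 0.5, 0.0001))
```

Below the break-even volume, paying per execution is cheaper than the fixed monthly bill; above it, the fixed server wins. The model ignores storage and data transfer, which can dominate for large data sets.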

Well done on contributing to the community; it can take a lot of effort. But if you write about sensitive subjects, try to include the limits of your advice in your article. There will be people who find your article on Google and make critical production decisions based on your advice, and they should be informed.

 

I can see nothing wrong. Some Python libraries are written in C, but only some of them, and Python still can't go faster than optimised raw C or Go. Node is still single-threaded. On raw read and write queries, NoSQL will outperform SQL all the way.

No, there IS a main server if we're talking about anything other than distributed systems. Master-slave replication? There's the master. A regular CDN? The edge servers copy the main server.
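
A minimal sketch of the master-slave point, assuming a hypothetical router in front of one primary and its replicas: every write has to go to the single primary, while reads can be spread across replicas. `ReplicatedRouter`, the host names and the prefix-based write detection are simplifications for illustration.

```python
from __future__ import annotations

import itertools


class ReplicatedRouter:
    """Sends writes to the primary and rotates reads across replicas."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # With no replicas, everything falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_write = sql.lstrip().upper().startswith(self.WRITE_PREFIXES)
        if is_write or self._replicas is None:
            return self.primary
        return next(self._replicas)


if __name__ == "__main__":
    router = ReplicatedRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for q in ["SELECT 1", "update t set x = 1", "SELECT 2", "SELECT 3"]:
        print(q, "->", router.route(q))
```

Because replicas lag behind the primary, reads that must see a just-committed write are usually routed to the primary as well.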

I'm not missing DNS load balancing. Lion consists of Cheetah and Tiger, each of which is load balanced.

No Redis? Write it from scratch, ad hoc and specific to your data? Oh, and maybe add some radix sort to the mix.

Doesn't cost that much? Have you seen Heroku or AWS pricing tiers for data that large and servers that performant? Bear in mind that you'll be paying monthly.

 

I can see nothing wrong.

Alright, well then don't let me disrupt your flow. Go on doing whatever it is you want to do.
