
Dang Hoang Nhu Nguyen


[BTY] Day 5: Notes for "Introduction to architecting systems for scale" article

This post is mainly based on this article: https://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer. It was written over ten years ago, but it's still relevant today. I've copied the parts I found most interesting.

I really like this quote:

Few computer science or software development programs attempt to teach the building blocks of scalable systems. Instead, system architecture is usually picked up on the job by working through the pain of a growing product or by working with engineers who have already learned through that suffering process.

Load Balancing

Both horizontal scalability and redundancy are usually achieved via load balancing.

Load balancing is the process of spreading requests across multiple resources according to some metric (random, round-robin, random with weighting for machine capacity, etc) and their current status (available for requests, not responding, elevated error rate, etc).
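
As a rough sketch of one such strategy, here is random selection weighted for machine capacity in Python; the host addresses, weights, and health flags below are all hypothetical:

import random

# Hypothetical host pool: (address, capacity weight, passing health checks?).
HOSTS = [
    ("10.0.0.1:8080", 3, True),
    ("10.0.0.2:8080", 1, True),
    ("10.0.0.3:8080", 2, False),  # not responding; excluded from selection
]

def pick_host(hosts):
    # Keep only hosts whose current status allows requests.
    available = [(addr, weight) for addr, weight, healthy in hosts if healthy]
    addrs, weights = zip(*available)
    # Random selection, weighted for machine capacity.
    return random.choices(addrs, weights=weights, k=1)[0]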

Load needs to be balanced between user requests and your web servers, but must also be balanced at every stage to achieve full scalability and redundancy for your system. A moderately large system may balance load at three layers:

user to your web servers,
web servers to an internal platform layer,
internal platform layer to your database.

Smart Clients

A smart client is a client that takes a pool of service hosts and balances load across them, detects downed hosts, and avoids sending requests their way (it also has to detect recovered hosts, deal with adding new hosts, etc., making smart clients fun to get working decently and a terror to set up).
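
A minimal sketch of the idea, assuming a caller-supplied `send` function that performs the actual request and raises ConnectionError on failure:

import random
import time

class SmartClient:
    # Balances load across a host pool and temporarily avoids hosts
    # that recently failed; they re-enter rotation after `retry_after`
    # seconds, which doubles as a crude recovery check.
    def __init__(self, hosts, retry_after=30):
        self.hosts = list(hosts)
        self.downed = {}  # host -> time it was marked down
        self.retry_after = retry_after

    def request(self, send):
        now = time.time()
        candidates = [h for h in self.hosts
                      if now - self.downed.get(h, 0) > self.retry_after]
        for host in random.sample(candidates, len(candidates)):
            try:
                return send(host)
            except ConnectionError:
                self.downed[host] = time.time()  # avoid this host for a while
        raise RuntimeError("no healthy hosts responded")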

Software load balancers

An example is HAProxy. It manages health checks, removing machines from your pools and returning them according to your configuration, and balances load across all the machines in those pools.
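
As a sketch of what such a configuration might look like (the server names, addresses, and /health endpoint are hypothetical):

frontend www
    bind *:80
    default_backend web_pool

backend web_pool
    balance roundrobin
    # A server failing health checks is removed from the pool;
    # once it passes again, it is returned to rotation.
    option httpchk GET /health
    server web1 10.0.0.1:8080 check
    server web2 10.0.0.2:8080 check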

Caching

Load balancing helps you scale horizontally across an ever-increasing number of servers, but caching will enable you to make vastly better use of the resources you already have, as well as making otherwise unattainable product requirements feasible.

Caching consists of: precalculating results (e.g. the number of visits from each referring domain for the previous day), pre-generating expensive indexes (e.g. suggested stories based on a user’s click history), and storing copies of frequently accessed data in a faster backend (e.g. Memcache instead of PostgreSQL).

In practice, caching tends to become important earlier in the development process than load balancing, and starting with a consistent caching strategy will save you time later on.

Application vs. database caching

Application caching requires explicit integration in the application code itself. Usually it will check if a value is in the cache; if not, retrieve the value from the database; and then write that value into the cache (this last step is especially common if you are using a cache which observes the least recently used caching algorithm).

import json

# `memcache` and `mysql` are assumed pre-configured client objects.
def get_user(user_id):
    key = "user.%s" % user_id
    user_blob = memcache.get(key)
    if user_blob is None:
        # Cache miss: fall back to the database ...
        user = mysql.query("SELECT * FROM users WHERE user_id = %s", (user_id,))
        if user:
            # ... and populate the cache for subsequent reads.
            memcache.set(key, json.dumps(user))
        return user
    else:
        return json.loads(user_blob)

Database caching

When you flip your database on, you’re going to get some level of default configuration which will provide some degree of caching and performance. Those initial settings will be optimized for a generic use case, and by tweaking them to your system’s access patterns you can generally squeeze out a great deal of performance improvement.
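
For example, with PostgreSQL you might adjust settings like these in postgresql.conf; the values below are purely illustrative, and the right numbers depend on your hardware and access patterns:

shared_buffers = 4GB            # memory PostgreSQL uses for its own page cache
effective_cache_size = 12GB     # planner's estimate of total (OS + DB) cache available
work_mem = 64MB                 # per-sort/hash-table memory before spilling to disk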

The beauty of database caching is that your application code gets faster “for free” [?].

In-memory caches

Memcached and Redis are both examples of in-memory caches. They are so fast because accesses to RAM are orders of magnitude faster than accesses to disk.

On the other hand, you’ll generally have far less RAM available than disk space, so you’ll need a strategy for only keeping the hot subset of your data in your memory cache. The most straightforward strategy is least recently used.
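
A minimal sketch of a least-recently-used cache in Python, built on the standard library's OrderedDict (real in-memory caches implement eviction far more efficiently):

from collections import OrderedDict

class LRUCache:
    # Keeps at most `capacity` entries, evicting the least recently
    # used one when full, so only the hot subset stays in memory.
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the coldest entry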

Content distribution networks

A particular kind of cache which comes into play for sites serving large amounts of static media is the content distribution network.

Cache invalidation

While caching is fantastic, it does require you to maintain consistency between your caches and the source of truth (i.e. your database), at risk of truly bizarre application behavior.

Solving this problem is known as cache invalidation.

At a high level, the solution is: each time a value changes, write the new value into the cache (this is called a write-through cache) or simply delete the current value from the cache and allow a read-through cache to populate it later (choosing between read and write through caches depends on your application’s details, but generally the author prefers write-through caches as they reduce likelihood of a stampede on your backend database).

Curious about "write-through cache" or "read-through cache"? You can find it in this article: https://codeahoy.com/2017/08/11/caching-strategies-and-how-to-choose-the-right-one/. I found it easy to understand. ☺️
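
To make the two strategies concrete, here is a rough sketch; `cache` and `db` are assumed, pre-configured client objects, and the method names on them are hypothetical:

def save_user_write_through(user_id, user):
    # Write-through: update the source of truth and the cache together,
    # so readers rarely stampede the database for this key.
    db.update("users", user_id, user)
    cache.set("user.%s" % user_id, user)

def get_user_read_through(user_id):
    # Read-through: on a cache miss, load from the database and
    # populate the cache for the next reader.
    user = cache.get("user.%s" % user_id)
    if user is None:
        user = db.fetch("users", user_id)
        cache.set("user.%s" % user_id, user)
    return user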

Off-line processing

As a system grows more complex, it is almost always necessary to perform processing which can’t be performed in-line with a client’s request, either because it creates unacceptable latency (e.g. you want to propagate a user’s action across a social graph) or because it needs to occur periodically (e.g. you want to create daily rollups of analytics).

Message queues

Message queues allow your web applications to quickly publish messages to the queue, and have other consumer processes perform the processing outside the scope and timeline of the client request.

Generally you’ll either:

  • perform almost no work in the consumer (merely scheduling a task) and inform your user that the task will occur offline, usually with a polling mechanism to update the interface once the task is complete, or

  • perform enough work in-line to make it appear to the user that the task has completed, and tie up hanging ends afterwards (posting a message on Twitter or Facebook likely follow this pattern by updating the tweet/message in your timeline but updating your followers' timelines out of band).

Message queues have another benefit, which is that they allow you to create a separate machine pool for performing off-line processing rather than burdening your web application servers. This allows you to target increases in resources to your current performance or throughput bottleneck rather than uniformly increasing resources across the bottleneck and non-bottleneck systems.
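
A minimal sketch of the pattern, using Python's standard-library queue and a worker thread as a stand-in for a real broker (RabbitMQ, SQS, etc.) and a separate consumer pool:

import queue
import threading

tasks = queue.Queue()

def handle_request(user_id):
    # The web tier publishes quickly and returns; the client can poll
    # for completion later.
    tasks.put(("propagate_action", user_id))
    return "accepted"

def worker():
    while True:
        task, payload = tasks.get()
        # ... the expensive off-line processing happens here ...
        tasks.task_done()

# In a real system this worker pool would run on separate machines.
threading.Thread(target=worker, daemon=True).start()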

Scheduling periodic tasks

The most common approach is a cronjob. You can store the cronjobs in a Puppet config for a machine, which makes recovering from losing that machine easy, but it would still require a manual recovery, which is likely acceptable but not perfect.
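
For example, a crontab entry for a daily analytics rollup might look like this (the script path is hypothetical):

# Run the daily rollup at 02:00 every day.
0 2 * * * /usr/local/bin/daily_rollup.sh >> /var/log/rollup.log 2>&1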

Map-reduce

Platform layer

First, separating the platform and web application allows you to scale the pieces independently. If you add a new API, you can add platform servers without adding unnecessary capacity for your web application tier.

Second, adding a platform layer can be a way to reuse your infrastructure for multiple products or interfaces (a web application, an API, an iPhone app, etc) without writing too much redundant boilerplate code for dealing with caches, databases, etc.

Third, a sometimes underappreciated aspect of platform layers is that they make it easier to scale an organization.
