Designing resilient systems using rate limiting

#todayilearned #programming #webperf

TL;DR notes from articles I read today.

Designing resilient systems beyond retries: rate limiting

In distributed systems, a circuit-breaker pattern and retries are commonly used to improve resiliency. A retry ‘storm’ is a common risk if the server cannot handle the increased number of requests, and a circuit-breaker can prevent it.
In a large organization with hundreds of microservices, coordinating and maintaining all the circuit-breakers is difficult and rate-limiting or throttling can be a second line of defense.
You can limit requests by client or user account (say, 1000 requests per hour each and then reject requests till the time window resets) or by endpoints (benchmarked to server capabilities so that the limit applies across all clients). These can be combined to use different levels of thresholds together, in a specific order, culminating in a server-wide threshold possibly.
Consider global versus local rate-limiting. The former is especially useful in microservices architecture because bottlenecks may not be tied to individual servers but to exhausted downstream resources such as a database, third-party service, or another microservice.
Take care to ensure the rate-limiting service does not become a single point of failure, nor should it add significant latency. The system must function even if the rate-limiter experiences problems, and perhaps fall back to its local limit strategy.

Full post here, 11 mins read

An overview of caching methods

The most common caching methods are browser caching, application caching and key-value caching.
Browser caching is a collaboration between the browser and the web server and you don’t have to write any extra code. For example - in Chrome when you reload a page you have visited before, the date specified under ‘expires’ in the 'Responses' header determines whether the browser loads resources directly from cache (from your first visit) or requests the resources again from the server. The server uses the headers passed by the browser (headers like If-modified-since or If-none-match or Etag) to determine whether to send the resources afresh or ask the browser to load from its cache.
Application-level caching is also called memoization and it is useful if your program is very slow. Think of cases where you are reading a file and extracting data from it or requesting data from an API. The results of any slow method are placed in an instance variable and returned on subsequent calls to the method. This speeds up the method. Though the downsides to this kind of caching are that you lose the cache on restarting the application and you cannot share the cache between multiple servers.
Key-value data caching takes memoization a step further with dedicated databases like memcache or Redis. This allows cached data to persist through user requests (allowing data sharing) and application reboots, but does introduce a dependency to your application and adds another object to monitor.
To determine the best method for you, start with browser caching as the baseline. Then identify your hotspots with an application profiling tool before choosing which method to grow with to add a second layer of caching.

Full post here, 7 mins read

Get these notes directly in your inbox every weekday by signing up for my newsletter, in.snippets().