Once upon a time, companies thought they could do everything in-house, like being the chef, the waiter, and the dishwasher at a restaurant. But soon, they realized their secret recipe for success was drowning in unimportant tasks like setting up email servers, building payment systems, and reinventing the wheel (which, spoiler alert, no one wants to do). So, they started outsourcing. They realized buying pre-made pizza dough (an API) is way more efficient than grinding the flour, kneading it, and praying it rises correctly every time. With APIs and SaaS, companies can focus on what they do best: serving their core business, while someone else worries about the sauce.
While focusing on the core business is essential, external services come with their own challenges and considerations. Those services introduce quotas, limits, or constraints that vary by business plan, architecture, or infrastructure. Whatever the reason, any consumer must put enough guardrails in place to isolate internal processes from external services, and those guardrails differ based on the layer of the internal system that lives with that coupling.
Coupling Level
Decoupling every external dependency is not feasible; sometimes a business needs to integrate an external service to gain complex expertise in domains such as payment, financing, or insurance. At the same time, there is no way to persist all the data for systems such as payment, and even for domains like insurance and financing, there is no way to predict every end-user interaction. Coupling can be established in two ways: Direct or Indirect.
Direct
Direct here does not mean Request/Response; it is about how the end-user product is coupled to the external systems. Coming back to the insurance example, the scenario can be described as:
A Product is available for a user.
The Product is eligible for N years of insurance.
The User can modify the insurance default details.
The insurance price is recomputed based on the modified details.
The user makes the decision.
When a user changes the insurance duration, the price changes, and the system needs to interact with the insurance partner's service, which brings the risk of introducing a new point of failure into the system.
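As a rough sketch of that direct coupling (the endpoint URL, payload shape, and `recompute_price` helper below are assumptions, not the actual partner API), the price recomputation blocks on the partner call, so a timeout is the minimum guardrail:

```python
import requests

# Hypothetical partner endpoint; the URL and payload shape are assumptions.
PARTNER_QUOTE_URL = "https://partner.example.com/insurance/quote"

def recompute_price(product_id, duration_years):
    """Direct coupling: the user-facing flow blocks on the partner call,
    so a timeout guardrail is the minimum isolation we can apply."""
    response = requests.post(
        PARTNER_QUOTE_URL,
        json={"productId": product_id, "durationYears": duration_years},
        timeout=2,  # fail fast instead of inheriting the partner's latency
    )
    response.raise_for_status()  # a partner failure becomes our failure
    return response.json()["price"]
```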
Indirect
The above scenario mentions default values for any single product. While these defaults are essential for the end user, they are close to static for a while. However, those values don't relate to the product itself but to the product category and characteristics. As an example, a combined refrigerator (fridge/freezer) with an E-level energy class can have a higher insurance price than an A-class one (just an assumption), and maybe that price can be lower for well-manufactured and popular brands like Siemens, AEG, and Bosch.
The above diagram illustrates a backend service interacting with the partner on product creation/modification. Persisting the defaults for each product category, projected to all available products, reduces the process's coupling to the partner for product detail page hits.
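A minimal sketch of this indirect path, assuming a hypothetical `fetch_category_defaults` partner call and an in-memory dictionary standing in for the product store:

```python
# In-memory stand-in for the product store; in practice this would be
# persistent storage updated on product creation/modification.
category_defaults = {}

def fetch_category_defaults(category, energy_class, brand):
    """Placeholder for the partner call returning default insurance
    details for a product category profile."""
    ...

def on_product_upsert(product):
    """Runs on product creation/modification, not on page hits: the
    partner is called once per category profile, and the result is
    projected to every product that shares it."""
    key = (product["category"], product["energy_class"], product["brand"])
    if key not in category_defaults:
        category_defaults[key] = fetch_category_defaults(*key)
    product["insurance_defaults"] = category_defaults[key]
```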
Coupling Layer
Focusing again on the article's topic, the rate-controlling strategy depends on which layer interacts with the external system, and that position drives the corresponding design and implementation patterns. Each layer offers distinct possibilities for solving the problem.
Client Side
When a client-side application, such as a Web or Mobile application, interacts with an external system directly or via a gateway proxy, any external system behavior will be replicated in the client app. This can be any latency, throttling, or internal failure problem. In case of rate issues, the client app will receive a 429 HTTP status code, indicating Too Many Requests. The client can retry the call and hope for a successful response. The problem with the client side is that there is no visibility into the overall rate of errors at a given time, and each client app runs with a local state.
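A minimal client-side sketch of that retry loop (the URL and retry budget are placeholders): back off exponentially with jitter and honor a `Retry-After` header when present, keeping in mind that each client only sees its own local state:

```python
import random
import time
import requests

def call_with_backoff(url, max_retries=5):
    """Client-side handling of 429: retry with exponential backoff and
    jitter. This client has no view of the overall load or of how many
    other clients are retrying at the same time."""
    for attempt in range(max_retries):
        response = requests.get(url, timeout=2)
        if response.status_code != 429:
            return response
        # Honor Retry-After when the service provides it; otherwise back
        # off exponentially with jitter to avoid synchronized retry storms.
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
        time.sleep(delay)
    raise RuntimeError("still throttled after retries")
```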
Having millions of users means many of them can land on a single product detail page, so they all need to interact with the external insurance partner, directly or via a gateway proxy. The important takeaway is that no single user knows the overall load on the system or how many retries are in flight to the external system.
Edge Side
If the layer interacting with the external system is an edge layer, an edge-level state can help keep track of the rate. This makes it possible to steer the client apps better and to avoid external system calls by applying some level of caching (sketched below). But this layer adds a challenge: how can zonally distributed edge locations answer a regional rate challenge? While the rate applies to a single regional communication boundary, many edge locations can send requests. However, this problem is many times less severe than the client-side one.
The previous paragraph mentioned that edge locations still face the same distributed-state problems. While this is true, in the real world, insurance offers and pricing genuinely differ by geography, which makes this an unrealistic challenge except for some rare use cases.
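Coming back to the caching mentioned above, a minimal TTL-cache sketch of how an edge layer could avoid repeated external calls (the `fetch` callable and TTL value are assumptions):

```python
import time

# A minimal TTL cache sketch for edge-level state: serve cached
# responses and only hit the external system when an entry expires.
_cache = {}  # key -> (expires_at, value)

def cached_fetch(key, fetch, ttl_seconds=60):
    now = time.monotonic()
    entry = _cache.get(key)
    if entry and entry[0] > now:
        return entry[1]           # fresh entry: no external call at all
    value = fetch(key)            # expired or missing: one origin call
    _cache[key] = (now + ttl_seconds, value)
    return value
```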
Another option when using content delivery networks (CDNs) is the request collapsing feature: it serves the same origin response to many in-flight requests. The first request reaches the origin, while identical requests arriving at the same time are paused, waiting for the response to the first one. This reduces the origin load and, consequently, the external service load. (The topic of CDN request collapsing is discussed in depth in Chapter 9 of Mastering Serverless Computing with AWS Lambda.)
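As an illustration, here is a minimal, single-process sketch of request collapsing (a real CDN does this across edge nodes; `fetch_origin` is a placeholder for the origin call):

```python
import threading

class RequestCollapser:
    """Collapse concurrent identical requests: only the first caller hits
    the origin; the others wait for and reuse its response. A minimal
    in-process sketch of what a CDN does across edge locations."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin   # function: key -> response
        self._inflight = {}                 # key -> threading.Event
        self._results = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            event = self._inflight.get(key)
            owner = event is None
            if owner:
                # First request for this key: we own the origin fetch.
                event = threading.Event()
                self._inflight[key] = event

        if owner:
            try:
                self._results[key] = self._fetch_origin(key)
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                event.set()                 # release the paused requests
            return self._results[key]

        event.wait()                        # paused until the first request finishes
        return self._results.get(key)       # None if the owner's fetch failed
```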
Server Side
The server layer is where all the autonomy and power reside. The server-side layer has access to computing power and is closer to the external system, so it is best placed to keep track of the rate limit's actual state and to monitor the consumed threshold. Using the server layer reduces the statelessness challenge. However, it may be a harder layer for applying retries and exponential backoff to throttled requests, as these increase the amount of suspended processing resources. The server layer allows synchronous or asynchronous interactions with the external systems. The choice of async or sync varies based on different constraints and on the coupling level, which can be Direct or Indirect.
With direct coupling, the interaction will be synchronous, as the end-user product waits for the response; this means the interaction with the external service must complete before the response is sent back.
All these assumptions can hold when a single container serves the backend service. However, it will suffer from the concurrency challenge, explored later in this article, and from the distributed-state challenge if the service is designed for high availability and scalability.
In the case of indirect coupling, an asynchronous backend process is a more flexible way to interact with an external system. Often, async processes are built on top of managed services such as SQS, EventBridge, etc., and react to changes via messaging. This can be an ideal design, but the scaling capacity of managed services becomes a point of reflection when interacting with external systems. This is a real concern when the internal system is built on top of Function as a Service, where the risk of high concurrency becomes real.
Although the invocations are driven by pollers, each execution environment can receive a batch of records, giving the consumer more control at the container level. However, the default batching behavior depends on many factors, such as speed, time window, and the number of changes.
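A minimal sketch of such a batch consumer, written as an AWS Lambda handler for an SQS event source (the `call_partner` helper and the per-second budget are assumptions, and the event source mapping is assumed to have `ReportBatchItemFailures` enabled):

```python
import time

MAX_CALLS_PER_SECOND = 5   # assumed partner constraint

def call_partner(payload):
    """Placeholder for the real partner API call."""
    ...

def handler(event, context):
    """Each invocation receives a batch of SQS records, so the consumer
    can pace outbound partner calls inside the batch instead of firing
    them all at once."""
    failures = []
    for i, record in enumerate(event["Records"]):
        if i and i % MAX_CALLS_PER_SECOND == 0:
            time.sleep(1)  # naive pacing within the batch
        try:
            call_partner(record["body"])
        except Exception:
            # Report only the failed record back to SQS for a retry
            # instead of failing the whole batch.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```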
Processing Main Attributes
A variety of attributes can play a role in the rate of communication toward external systems, such as:
Time Window
Concurrency
Speed
Count
Time Window
A time window represents the duration during which a constraint applies. It is often a per-second metric, but some services have per-minute or per-hour constraints. Whenever using a service with time-window constraints, the base challenge is: how do we adapt the internal rate within a dedicated time window to the external service's constraints?
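A minimal fixed-window sketch of that adaptation (the limit and window values are assumptions, and the state here is purely local; sharing it across nodes is the Trustable State topic below):

```python
import time

class FixedWindowLimiter:
    """A minimal fixed-window rate limiter: allow at most `limit` calls
    per `window_seconds`. Local state only; across nodes, a shared store
    is required (see Trustable State)."""

    def __init__(self, limit, window_seconds=1.0):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.window_start = now   # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False                  # caller should wait or shed load
```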
Concurrency
Concurrency is the ability to run many tasks simultaneously to improve quality attributes such as throughput and scalability. A higher concurrency level puts more pressure on downstream services, and so risks crossing their rate constraints and thresholds. Controlling concurrency is a real challenge in distributed systems, and achieving that control requires a decrease in processing speed or throughput. Another challenge in highly concurrent processes is state consistency: state can be shared (not enforced) consistently only through context switching, a complex and costly process that, even where achievable in real time, only shares the state without actually controlling the processing.
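A minimal sketch of capping concurrency with a semaphore, trading throughput for reduced downstream pressure (the worker counts and `call_partner` helper are assumptions):

```python
import threading
import concurrent.futures

# Assumed constraint: the partner tolerates at most 3 concurrent calls.
MAX_CONCURRENT_CALLS = 3
_slots = threading.Semaphore(MAX_CONCURRENT_CALLS)

def call_partner(item):
    """Placeholder for the real partner API call."""
    ...

def guarded_call(item):
    # Block until a slot is free: the pool may run 10 workers, but at
    # most 3 partner calls are ever in flight.
    with _slots:
        return call_partner(item)

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    pool.map(guarded_call, range(100))
```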
Speed
Speed represents how fast a system handles demands. Higher speed implies a higher level of statelessness, and maintaining state means reducing processing speed. The system's speed must be aligned with the overall business value; there is no point in driving fast while putting everyone in danger.
Count
The number of demands significantly impacts downstream services. Responding to a high number of demands pushes toward responsiveness, but how downstream services can handle that rate brings more design discussions. Dealing with external systems is trickier still, because an external system has many more customers than our internal system. Sometimes a subscription change, and sometimes a move to another vendor, can be the solution; aligning two roadmaps is not achievable. Controlling the number of demands depends on the coupling layer.
Trustable State
Rate controlling in distributed systems relies on consistent, performant, and highly available persistent storage. Why are those quality attributes important?
Achieving a consistently shared state depends on many factors when dealing with distributed storage, and it solves the most important challenge: controlling the state.
Availability: matters because the unavailability of one node can lead to a stale state, so it is important to fetch from the most consistent node, or to fetch and cache the state whenever a node is replaced. Still, with all these efforts, a real-time trusted and available state is not attainable if we are talking about nano- or milliseconds.
Performance: leads to lower latency, and lower latency means less overhead on the storage; high latency, on the other hand, results in shared-state synchronization problems.
Consistency: the state must be consistent across nodes in near real time; an update to the persisted state must be visible to all subsequent reads.
DynamoDB, S3, and ElastiCache are some of the available options on AWS. Many other interesting options, such as Momento, can help achieve a distributed shared state.
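As an illustration of such a shared state, here is a minimal sketch of a windowed counter on DynamoDB (the table name, key schema, and limit are assumptions); the conditional update makes the check-and-increment atomic across all nodes sharing the same window item:

```python
import time
import boto3

# Assumed table "rate-state" with a string partition key "pk".
table = boto3.resource("dynamodb").Table("rate-state")

def try_acquire(limit, window_seconds=1):
    """Atomically consume one call slot in the current time window;
    every node reads and writes the same shared counter item."""
    window = int(time.time() // window_seconds)
    try:
        table.update_item(
            Key={"pk": f"partner-api#{window}"},
            UpdateExpression="ADD #c :one",
            # Only increment while the counter is under the limit.
            ConditionExpression="attribute_not_exists(#c) OR #c < :limit",
            ExpressionAttributeNames={"#c": "calls"},
            ExpressionAttributeValues={":one": 1, ":limit": limit},
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        return False  # window budget exhausted: back off or shed load
```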
Conclusion
Designing distributed systems and business processes at scale leads to more inter-service communication. While in the best cases the value is evident, the worst cases lead to frustration because of the pressure one service puts on downstream services.
Some services add guardrails such as rate limiting to safeguard their infrastructure's health, but this impacts consumers and, depending on the coupling presented above, can impact the final products and users.
In this part of Rate Controlling in Distributed Systems, some concepts related to rate controlling, such as Coupling Level, Coupling Layer, Processing Main Attributes, and Trustable State, were explored to give an overview of how the rate is measured.
The next part of the series will explore patterns and technical details to reduce the risk and control the rate in both sync and async designs.