Problem
To prevent an unnecessary load, API requests should be throttled when made maliciously or due to an error.
Context
- The API Gateway of a distributed system runs in multiple instances, each with an in-memory state that does not allow tracking the current quota usage.
- Precise quota enforcement (per request, per second) is not critical; the goal is to prevent significant overuse.
- Quota configuration is static.
Forces
- No per request IO is allowed (centralized solutions do not fit).
- In a distributed system, there is no concept of a global current time.
- Failure to retrieve the quota state should not result in Gateway failure.
Solution
Implement in-memory quotas in each process, periodically synchronizing them asynchronously using Redis.
Consider a basic throttling rule:
- No more than
MAX_REQUESTS
withinINTERVAL
time for any API route (KEY
). - If the limit is exceeded, block requests to
KEY
forCOOLDOWN
seconds.
Concept
- Divide
INTERVAL
intoN
spans (N >= 2
). - At the end of each span, atomically increment the value in Redis (INCRBY) by the number of requests received for each
KEY
, using a key composed of theKEY
and the current ordinal number ofINTERVAL
in Unix epoch. - If the returned value exceeds
MAX_REQUESTS
, block access toKEY
, remove afterCOOLDOWN
. - If the write operation fails and
REQUESTS > MAX_REQUESTS / N
(i.e., the quota is exceeded locally), block access toKEY
, remove afterCOOLDOWN
.
Each API Gateway instance will generate the KEY
based on its own local time, which, in general, will lead to simultaneous writes (from Redis’s local time perspective) to different KEY
s. However, the total contribution of each node to each KEY
will correspond to the actual request rate experienced by that node.
Dividing the INTERVAL
into spans smooths the desynchronization effect.
Extension
Introduce nodes
, the approximate number of active API Gateway instances, to improve the algorithm.
- Initially,
nodes
equals to1
. - During each
INTERVAL
, sum the total request count for eachKEY
, keep current and previous values. - At the end of each interval:
- Read the value from the
KEY
for the previous interval. At this point, it is assumed that all nodes have switched to the next interval. - Update the
nodes
value by dividing response form Redis by the number ofREQUESTS
counted for the previous interval. - Set the previous value to the current value.
- Set the current value to
0
- Update point 4 of the Concept:
REQUESTS * nodes > MAX_REQUESTS / N
. This will increase the precision of the local quota enforcement. - Add a new step: if
REQUESTS * nodes > MAX_REQUESTS
, block access toKEY
, remove afterCOOLDOWN
.
When nodes are added or removed, the algorithm will adapt in the upcoming intervals.
Caveats
- In the worst case scenario (really, really unlikely), the quota is exceeded by
MAX_REQUESTS / N
in a span on each node. - Time desynchronization between nodes should be insignificant for the selected
INTERVAL
(i.e.,INTERVAL
>> desync). See Extension point 3.
Top comments (0)