
Throttling Musings with API Gateway

Musing n' brewing special coffees

It's been a while since my last writing. I've been working n' walking the tightrope with some of the craziest foreign folks, with ease on the surface but deep in the gears like machines, trying to arrest Eroom's Law by delivering industry-wide platform economies of scale via the industrialisation of AI-native scientific data and AI-enabled use cases across the value chain. You know, developing new vaccines on top of petabyte-sized data, just ordinary stuff 🤷
And at the same time, trying to explain myself without failing to entertain; trying to attain a way to explain myself without losing myself while trying to entertain, as a non-native English speaker, n' since contractors produce untrustworthy code (in their own eyes), have you got the drift?
swimming through the void, we hear the word, we lose ourselves, but we find it all.... nvm….musings…

All burned, coffee brewed. Shall we get to the tech stuff?

Throttling & Rate Limiting Using API Gateway

I was in charge of coming up with guardrails against attackers, and against some of the malicious AI/ML scientists who spin up EC2 P5e/P5en instances as if pennies could be cherry-picked off trees. Can you imagine? How many dolphins died just to test near-real-time batch inference, just because someone likes to pet GPU beasts, yeah…
We had developed this AI Platform (an ECS cluster on Fargate, always believe in SLS, baby), and they were overusing it. As an MLOps Engineer (fancy name for Platform Engineering for people who treat production like a science experiment and expect the infrastructure to fix the results, just kidding xD), I was asked to think about throttling & rate limiting. My proposal was a two-layer treatment, but it's hard to sell when they don't want to buy, do you catch my flow? Anyway, at least Layer 1 was implemented 😅. I'm going to share the whole idea with you, for those still following my thoughts:

Layer 1: API Gateway & The Token Bucket


The "I Promise I'm Customer Obsessed" Layer

First up is the Token Bucket in API Gateway. AWS loves this because it’s built-in, which means they can bill you for it without you having to provision a single EC2 instance.

Think of the Token Bucket as a vending machine that only accepts "Permission Slips":

  • The Rate: Every second, AWS drops a set number of slips into the bucket.
  • The Burst: The bucket has a fixed size. If you don't use your slips, they pile up until the bucket is full.
  • The Reality: When a "passionate" user (read: a botnet from a country you can’t find on a map) hammers your API, they can chug all the slips in the bucket instantly. Once the bucket is empty, API Gateway starts handing out 429 Too Many Requests errors like they’re flyers for a timeshare.

It’s great for protecting your account-level limits, but it’s about as surgical as a chainsaw. It doesn’t care who is taking the tokens; it just cares that the bucket is empty and it’s time to go home.
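
Before the real thing, here's a toy: a minimal in-memory sketch of the textbook token bucket algorithm (my own illustration, not AWS internals), just so you can see the refill-and-drain mechanics without deploying anything:

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly ratePerSec: number, // the 'Refiller'
    private readonly burst: number,      // the bucket size
  ) {
    this.tokens = burst;
    this.lastRefill = Date.now();
  }

  tryConsume(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at the bucket size
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;

    // One slip per request; no slip, no service
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // 429 territory
  }
}

// Same numbers as the CDK stack below: 100 rps refill, burst of 200
const bucket = new TokenBucket(100, 200);
console.log(bucket.tryConsume() ? '200 OK' : '429 Too Many Requests');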
And now here is your awaited real code example in CDK (sorry, other IaC tools, I have my preferred one):

import * as cdk from 'aws-cdk-lib';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import { Construct } from 'constructs';

export class LayerOneStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. The API Gateway: The fancy front door
    const api = new apigateway.RestApi(this, 'MyOverEngineeredApi', {
      restApiName: 'Layer1-Service',
      deployOptions: {
        stageName: 'prod',

        // --- THE TOKEN BUCKET (Throttling) ---
        // Rate: The 'Refiller' (Requests per second)
        // Burst: The 'Bucket Size' (The bank of tokens for spikes)
        throttlingRateLimit: 100, 
        throttlingBurstLimit: 200,

        // --- THE MARGARITA BOWL (Caching) ---
        cachingEnabled: true,
        cacheClusterEnabled: true,
        cacheClusterSize: '0.5', // '0.5' is the smallest and therefore least painful for your bill
        cacheTtl: cdk.Duration.minutes(5),
      },
    });

    // 2. A Resource (The "VIP Section")
    const vips = api.root.addResource('vips');

    // 3. Adding a Method with Integration
    vips.addMethod('GET', new apigateway.MockIntegration({
      // Mock integrations need a request template that resolves to a statusCode
      requestTemplates: {
        'application/json': '{ "statusCode": 200 }'
      },
      integrationResponses: [{
        statusCode: '200',
        responseTemplates: {
          'application/json': '{"message": "Welcome to the club. AWS has already billed you for this sentence."}'
        }
      }],
      passthroughBehavior: apigateway.PassthroughBehavior.NEVER,
      // This is the key: tell API Gateway WHAT to cache on
      cacheKeyParameters: ['method.request.querystring.user_id'],
    }), {
      methodResponses: [{ statusCode: '200' }],
      // The querystring must be declared here before it can be a cache key
      requestParameters: {
        'method.request.querystring.user_id': true
      },
    });
  }
}
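Once that stack is deployed, you can watch the bucket drain yourself. A quick-and-dirty sketch (assuming Node 18+ for the global fetch; the URL is a placeholder for your own stage):

// Hammer the endpoint until the bucket runs dry
const url = 'https://<api-id>.execute-api.<region>.amazonaws.com/prod/vips?user_id=42';

async function drainTheBucket() {
  const statuses = await Promise.all(
    Array.from({ length: 300 }, () => fetch(url).then((r) => r.status)),
  );
  // Tally the status codes: expect a pile of 200s (the burst) and then 429s
  const tally: Record<number, number> = {};
  for (const status of statuses) {
    tally[status] = (tally[status] ?? 0) + 1;
  }
  console.log(tally);
}

drainTheBucket();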

Layer 2: Redis/Valkey & The Sliding Window

The "I Don't Trust My Users" Layer

Now, if you actually want to know which of your customers is trying to scrape your entire database at 3:00 AM, you need Layer 2: the Sliding Window, usually implemented in ElastiCache for Valkey (a service whose pricing model I can only describe as "aggressively optimistic").

Unlike the Token Bucket, which is just a simple counter, the Sliding Window is for people with trust issues.

I know you were thinking: where is the code? There it is, mate:

import { Valkey } from 'iovalkey'; // The fork that won't ask for your credit card

const valkey = new Valkey({
  host: 'localhost',
  port: 6379,
});

/**
 * THE SLIDING WINDOW LUA SCRIPT
 * Keys: [ratelimit_key]
 * Args: [window_ms, max_requests, current_timestamp]
 */
const slidingWindowScript = `
  local key = KEYS[1]
  local window = tonumber(ARGV[1])
  local limit = tonumber(ARGV[2])
  local now = tonumber(ARGV[3])
  local oldest = now - window

  -- 1. Remove timestamps outside the sliding window
  redis.call('ZREMRANGEBYSCORE', key, 0, oldest)

  -- 2. Check the current count
  local current_count = redis.call('ZCARD', key)

  if current_count < limit then
    -- 3. Add current request and update expiry
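    -- NOTE: the timestamp doubles as the member, so two requests in the
    -- same millisecond collapse into one entry; acceptable for a sketch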
    redis.call('ZADD', key, now, now)
    redis.call('PEXPIRE', key, window)
    return {1, current_count + 1} -- [Allowed, New Count]
  else
    return {0, current_count} -- [Denied, Current Count]
  end
`;

async function checkRateLimit(userId: string) {
  const key = `ratelimit:${userId}`;
  const windowMs = 60000; // 1 minute
  const limit = 100;      // 100 requests
  const now = Date.now();

  // Execute the script on Valkey
  const [allowed, count] = await valkey.eval(
    slidingWindowScript,
    1, // Number of keys
    key,
    windowMs,
    limit,
    now
  ) as [number, number];

  if (allowed === 1) {
    console.log(`✅ Request allowed for ${userId}. Count: ${count}/${limit}`);
    return true;
  } else {
    console.log(`❌ 429: Rate limit exceeded for ${userId}. Stop it.`);
    return false;
  }
}
  • How it works: Instead of a bucket of tokens, you keep a timestamped log of every request a specific user has made.
  • The Logic: When a request hits, you look at the last 60 seconds relative to right now. You count the logs. If they’ve made 101 requests and the limit is 100, you kick them out.
  • The Twist: We use Redis/Valkey Sorted Sets for this. We use [ZREMRANGEBYSCORE](https://valkey.io/commands/zremrangebyscore/) to evict "old news" (timestamps older than a minute) and ZCARD to check whether the user is currently being a nuisance (wiring sketch below).
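
To actually wire that into a service, here's a hypothetical sketch using Express middleware (the framework choice and route names are mine, reusing the checkRateLimit function from above):

import express from 'express';

const app = express();

// Reject the request before it ever reaches your handler
app.use(async (req, res, next) => {
  const userId = (req.query.user_id as string) ?? 'anonymous';
  if (await checkRateLimit(userId)) {
    next();
  } else {
    res.status(429).json({ message: 'Slow down. The dolphins thank you.' });
  }
});

app.get('/vips', (_req, res) => {
  res.json({ message: 'Welcome to the club, again.' });
});

app.listen(3000);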

Remember:

Don’t give up and remember that you are writing not only for the community but also for yourself, to organise your thoughts :) (Sodkiewicz, Marcin)
thanks for these words, man 🤝
