DEV Community

Chidera Humphrey

How to Use API Key Metadata to Enforce Dynamic Rate Limits and Quotas in an AI API

Introduction

The moment your API needs more than one plan, the cracks start to show.

You add a plan column to your users table. Then a quota check in a middleware. Then a rate limiter that reads from that middleware. Then a special case for the enterprise customer who negotiated something different. Before long, governance logic is scattered across handlers, middleware, and database queries. And changing a plan limit means a code change, a review, and a deployment.

There’s a cleaner model. When an API key carries its own metadata (plan tier, feature flags, custom limits), the gateway can read that data on every request and enforce the right behaviour per consumer, without a single line of application code involved.

This guide walks you through building a lightweight AI API with Node.js, Express, and OpenRouter, then using Zuplo’s API key metadata to drive dynamic rate limiting and quota enforcement per plan. One metadata field, plan, drives everything from here.

Section 1: The Architecture

Client
  ↓
Zuplo
  ├─ API key auth
  ├─ Rate limiting
  ├─ Quotas
  └─ Backend secret header
  ↓
Express AI API
  ↓
OpenRouter

Every request from a client hits Zuplo first. Express never sees a request until Zuplo has authenticated it, checked quotas, applied rate limits, and injected the backend secret. The two layers have exactly one job each: Zuplo owns governance, Express owns AI generation.

That separation matters more than it might seem. Governance logic that lives in the gateway is centrally managed, instantly changeable, and completely decoupled from your application code. If you want to change the Free plan limit from 10 requests per hour to 5, you can change the number in the Zuplo dashboard – no code review, no manual deployment, and no risk of accidentally breaking something in the handler.

Before building either layer, it’s worth being precise about two terms that often get conflated:

  • Rate limits protect infrastructure — they prevent burst abuse, concurrency spikes, and accidental request floods. They reset in short windows (usually minutes).
  • Quotas enforce business constraints — they represent how much of your API a consumer is allowed to use per billing period. They reset on longer cycles (hourly, daily, monthly).

Both are driven by the same field in this tutorial: plan on the consumer’s metadata.

Section 2: Why API Key Metadata Matters

Most developers treat API keys as opaque authentication strings. You have one, or you don't, and the backend figures out the rest by looking up the user in a database.

Zuplo consumers work differently. Each consumer can carry arbitrary JSON metadata, set at creation time and available as request.user.data in every policy that runs after authentication. There’s no database lookup; the data travels with the key.

You can store whatever makes sense for your API: plan tier, organization ID, feature flags, custom limits. Anything that should influence how the gateway treats a consumer’s requests can live here.

The conceptual unlock: metadata turns every API key into a runtime configuration object. Policies read it and enforce different behaviour per consumer. You don’t write configuration logic in your application; you write it once in the policy, and the metadata drives the outcome.

Here’s what that enables in practice:

  • Free vs. Pro plans with different limits
  • Organization-level quota keys
  • Tenant-specific governance
  • Adaptive usage tiers
  • Monetization models

In this tutorial, a single plan field on the consumer metadata drives everything. Every policy from this point forward reads request.user.data.plan.
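
One way to picture "metadata as runtime configuration" is a lookup from the plan field to a set of limits. The sketch below is illustrative only: ConsumerMetadata, PlanLimits, and limitsForConsumer are hypothetical names, not part of Zuplo's API, and the numbers are the ones this guide applies later.

```typescript
// Hypothetical shape of the consumer metadata used in this tutorial.
interface ConsumerMetadata {
  plan?: string;
}

// Limits attached to a plan (numbers taken from later sections of this guide).
interface PlanLimits {
  requestsPerMinute: number;
  creditsPerHour: number;
}

const PLAN_LIMITS = new Map<string, PlanLimits>([
  ["Free", { requestsPerMinute: 10, creditsPerHour: 10 }],
  ["Pro", { requestsPerMinute: 60, creditsPerHour: 50 }],
]);

// Fallback for consumers with a missing or unrecognized plan.
const DEFAULT_LIMITS: PlanLimits = { requestsPerMinute: 5, creditsPerHour: 0 };

function limitsForConsumer(metadata: ConsumerMetadata): PlanLimits {
  if (metadata.plan === undefined) {
    return DEFAULT_LIMITS;
  }
  return PLAN_LIMITS.get(metadata.plan) ?? DEFAULT_LIMITS;
}

const freeLimits = limitsForConsumer({ plan: "Free" });
const unknownLimits = limitsForConsumer({ plan: "Trial" });
```

The point of the design is that the policy code stays the same for every consumer; only the metadata value passed in changes the outcome.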

Prerequisites

  • Node.js v18+ installed (v18 adds native fetch support, which avoids an extra dependency)
  • Basic familiarity with Node.js and Express. If you can build a REST API in your preferred stack, you’ll follow fine
  • An understanding of what an API gateway does
  • A Zuplo account – the free tier covers everything in this guide
  • An OpenRouter account and API key – a free account is enough to get started

Section 3: Building the AI API

The API is a simple LLM wrapper: it takes a user’s text and rewrites it in a tone they specify. It’s thin by design; all it does is accept a request, call OpenRouter, and stream the response back.

OpenRouter is a unified API that routes requests to multiple LLM providers. One API key, multiple models: you pick the model in the request, and OpenRouter handles the routing. It’s OpenAI-compatible, so the integration is a drop-in: swap the base URL and key, and the OpenAI SDK works as-is.

Create a Node.js app:

npm init -y

This project uses TypeScript, but you don’t have to; it's a simple enough app to work in plain JavaScript. If you want TypeScript, here’s how to set it up:

Install the TypeScript dependencies:

npm install -D typescript ts-node @types/node tsx nodemon

The -D flag installs these as development dependencies.

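  • typescript: The TypeScript package itself
  • tsx: Runs TypeScript files directly without a separate compile step; the dev script below uses it
  • ts-node: Another way to run TypeScript directly in Node.js without compiling to JavaScript first
  • @types/node: Adds TypeScript type definitions for Node.js core modules like fs, path, and http
  • nodemon: Restarts the dev server whenever files in the watched folder change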

Generate a tsconfig.json:

npx tsc --init

Open the generated tsconfig.json and add "node" to the types array:

"types": ["node"]

Then update your package.json:

"main": "server.ts",
  "scripts": {
    "dev": "nodemon --watch src --exec \"tsx src/server.ts\"",
    "test": "echo \"Error: no test specified\" && exit 1",
    "build": "tsc --project tsconfig.json",
    "start": "node dist/server.js"
  },
  "type": "module"

This sets the entry point to a TypeScript file and enables import syntax throughout the project.

The code in the following sections imports express, cors, dotenv, and openai, so install them as runtime dependencies with npm install express cors dotenv openai. If you’re using TypeScript, also add @types/express and @types/cors as development dependencies.

The Endpoints

The API exposes one endpoint: POST /rewrite. It accepts JSON, calls OpenRouter, and streams the response token by token back to the client. No auth, no rate limiting, no quota logic–that all lives in Zuplo. The backend’s only job is AI generation.

One thing worth knowing before you test: streaming works correctly in browser and frontend contexts but doesn’t render visibly in Zuplo’s test interface or Postman. You’ll see the response arrive as a single blob rather than token by token. That’s not a bug; it’s just how those tools handle streamed responses.

OpenRouter

Create an openrouter.ts file inside a src/services folder and add the following code:

// src/services/openrouter.ts

import OpenAI from "openai";
import type { Response as ExpressResponse } from "express";

let client: OpenAI | null = null;

function getClient(): OpenAI {
  if (!client) {
    const apiKey = process.env.OPENROUTER_API_KEY;
    if (!apiKey) {
      throw new Error("Missing OPENROUTER_API_KEY environment variable.");
    }

    client = new OpenAI({
      apiKey,
      baseURL: "https://openrouter.ai/api/v1",
    });
  }
  return client;
}

export async function createStreamResponse(
  prompt: string,
  res: ExpressResponse,
) {
  const apiClient = getClient();
  // OpenRouter model IDs are prefixed with the provider name
  const model = process.env.OPENROUTER_MODEL || "openai/gpt-4o-mini";

  const stream = await apiClient.chat.completions.create({
    model,
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: prompt,
      },
    ],
    stream: true,
  });

  try {
    for await (const chunk of stream) {
      if (chunk.choices?.[0]?.delta?.content) {
        res.write(chunk.choices[0].delta.content);
      }
    }
  } catch (error) {
    console.error(error);
  } finally {
    res.end();
  }
}

Let’s break down what’s happening in the code:

  1. You implemented a singleton pattern for the OpenAI client, where getClient() creates and caches a single client instance configured to use OpenRouter’s API endpoint rather than OpenAI’s native endpoint. This allows you to access multiple AI models through a unified interface while reusing the same client instance across requests.
  2. The createStreamResponse function accepts a user prompt and an Express response object, then initiates a streaming chat completion using the model named in the OPENROUTER_MODEL environment variable, falling back to a default model ID when it isn’t set. Setting stream: true enables token-by-token responses rather than waiting for the complete generation.
  3. The function iterates over the asynchronous stream chunks and writes each token directly to the Express response as it arrives, keeping the connection open until the stream finishes. The try/finally around the loop ensures the response is closed even if streaming fails midway, preventing hanging connections.

This gives you a streaming endpoint that delivers tokens to the client as they are generated. It is a plain-text chunked response rather than formal server-sent events (SSE), and it feels more responsive than waiting for the entire completion.
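
To see the streaming loop in isolation, here is a small sketch that swaps the OpenRouter stream for a simulated one. StreamChunk, WritableLike, fakeStream, and pipeStream are stand-ins invented for illustration; only the chunk shape (choices[0].delta.content) follows the code above.

```typescript
// Minimal stand-ins for the chunk shape and the response object (assumptions for illustration).
interface StreamChunk {
  choices: { delta: { content?: string } }[];
}

interface WritableLike {
  write(text: string): void;
  end(): void;
}

// Simulated stream: yields one chunk per token, then a final chunk with no content.
async function* fakeStream(tokens: string[]): AsyncGenerator<StreamChunk> {
  for (const token of tokens) {
    yield { choices: [{ delta: { content: token } }] };
  }
  yield { choices: [{ delta: {} }] };
}

async function pipeStream(
  stream: AsyncIterable<StreamChunk>,
  res: WritableLike,
): Promise<void> {
  try {
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta.content;
      if (content) {
        res.write(content); // forward each token as soon as it arrives
      }
    }
  } finally {
    res.end(); // always close the response, even if the stream throws
  }
}

// Usage: collect what would have been written to the client.
const written: string[] = [];
let ended = false;
const fakeRes: WritableLike = {
  write: (text) => {
    written.push(text);
  },
  end: () => {
    ended = true;
  },
};
const streamDone = pipeStream(fakeStream(["Hello", ", ", "world"]), fakeRes);
```

Chunks without content are skipped rather than written, which is why the handler checks the delta before calling res.write.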

Server file

Create a server.ts file inside the src folder and add the following code:

import express, { json } from "express";
import cors from "cors";
import dotenv from "dotenv";
import type { Request, Response, NextFunction } from "express";
import { createStreamResponse } from "./services/openrouter.js";

dotenv.config();

interface RewriteBody {
  text: string;
  tone?: string;
}

const app = express();
const port = process.env.PORT ? Number(process.env.PORT) : 3000;

app.use(cors());
app.use(json());

app.post(
  `/rewrite`,
  async (
    req: Request<unknown, unknown, RewriteBody>,
    res: Response,
    next: NextFunction,
  ) => {
    try {
      const text = req.body?.text;
      let tone = req.body?.tone;

      if (!text) {
        return res.status(400).json({
          error: "Text is required",
        });
      }
      if (!tone) {
        tone = "professional";
      }

      // Validate the text field
      if (typeof text !== "string" || !text.trim()) {
        return res.status(400).json({
          error: "The request body must include a non-empty text field.",
        });
      }

      // Validate the tone field
      if (typeof tone !== "string" || !tone.trim()) {
        return res.status(400).json({
          error: "The tone field must be a non-empty string when provided.",
        });
      }

      res.setHeader("Content-Type", "text/plain; charset=utf-8");
      res.setHeader("Cache-Control", "no-transform");
      res.flushHeaders();

      await createStreamResponse(
        `Rewrite the following text with a ${tone} tone, preserving the original meaning:\n\n${text}`,
        res,
      );
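    } catch (error) {
      console.error(error);
      // Once streaming has started the headers are already sent,
      // so a JSON error body is only possible before that point
      if (!res.headersSent) {
        res.status(500).json({ error: "Internal server error" });
      } else {
        res.end();
      }
    }
  },
);

// Start the server; without this call the app never accepts requests
app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});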

Let’s break down what’s happening in this Express server code:

  1. You set up an Express server with CORS and JSON middleware, then defined a POST endpoint at /rewrite that accepts a text field and an optional tone field in the request body. The endpoint validates that text exists and is a non-empty string, defaulting the tone to “professional” if none is provided.
  2. Before initiating the AI stream, the endpoint configures response headers with Content-Type: text/plain and calls res.flushHeaders() to ensure headers are sent immediately. This is critical for streaming responses because it prevents the connection from buffering the entire response before sending data to the client.
  3. The endpoint calls the createStreamResponse function from your OpenRouter service, passing a prompt that instructs the AI to rewrite the user’s text with the specified tone while preserving meaning.

This creates a complete AI text rewriting API endpoint that accepts user input, validates it, and streams AI-generated text back token by token as a plain-text chunked response.
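
The validation rules can also be read as a single pure function, which is easier to test than the handler itself. This is a sketch that roughly mirrors the checks above, not a drop-in replacement; validateRewriteBody and ValidationResult are hypothetical names.

```typescript
type ValidationResult =
  | { ok: true; text: string; tone: string }
  | { ok: false; error: string };

function validateRewriteBody(body: unknown): ValidationResult {
  const { text, tone } = (body ?? {}) as { text?: unknown; tone?: unknown };

  // text is required and must be a non-empty string
  if (typeof text !== "string" || !text.trim()) {
    return {
      ok: false,
      error: "The request body must include a non-empty text field.",
    };
  }

  // Like the handler, fall back to "professional" when no tone is given
  if (tone === undefined || tone === null || tone === "") {
    return { ok: true, text, tone: "professional" };
  }

  // A tone that is provided must be a non-empty string
  if (typeof tone !== "string" || !tone.trim()) {
    return {
      ok: false,
      error: "The tone field must be a non-empty string when provided.",
    };
  }

  return { ok: true, text, tone };
}

const defaulted = validateRewriteBody({ text: "Fix this sentence." });
const rejected = validateRewriteBody({ text: "   " });
```

In a handler, an ok: false result would map to a 400 response and an ok: true result would feed the prompt.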

That’s the core app. You can test it with a cURL command or Postman before moving on.

The Express backend is ready. Next, you’ll put Zuplo in front of it.

Setting Up Zuplo

Zuplo is a programmable API gateway. It sits in front of your backend and intercepts every request before it hits your handlers and every response before it reaches the client.

After you’ve created an account, you’ll see a screen like this:

Zuplo getting started screen

Click “Start Building”.

You’ll need to create a route; this is what your API consumers and frontend will call. The Zuplo route is the public endpoint; your Express app is never called directly.

Navigate to the “Code” tab, select the route.oas.json file, and click “Add Route”.

Zuplo dashboard

Create a /rewrite route and set the method to POST.

In the “Request Handler” section, set the “Handler” to “URL Forward”. The URL Forward handler proxies requests to a different API without requiring custom code. It appends the incoming path to the specified baseUrl, which makes it ideal for gateway and backend proxying patterns.

In the “Forward to” field, enter:

${env.BACKEND_URL}

env.BACKEND_URL is an environment variable you’ll set in the next step. Using it here is good practice when you have more than one route to secure; you can define the base URL once and reference it across every route definition.
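
As a rough mental model of what "appends the incoming path to the base URL" means, the sketch below joins the two strings. buildForwardUrl is a hypothetical helper for illustration, not Zuplo's implementation, and the example hostname is made up; it only covers the path, not headers or query strings.

```typescript
// Hypothetical helper: joins a base URL and an incoming path the way a forwarding proxy might.
function buildForwardUrl(baseUrl: string, incomingPath: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes from the base
  const path = incomingPath.startsWith("/") ? incomingPath : `/${incomingPath}`;
  return `${base}${path}`;
}

// With BACKEND_URL set to the (made-up) value below, a call to /rewrite on the gateway
// would be forwarded to the same path on the backend.
const forwardTarget = buildForwardUrl("https://my-backend.example.com/", "/rewrite");
```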

Adding the BACKEND_URL Environment Variable

Navigate to the “Environments” tab → “Environment Variables”. Click “Add Variable” and add:

  • Name: BACKEND_URL
  • Value: your deployed backend URL

Adding BACKEND_URL environment variable in Zuplo

You’ll need to deploy your backend so Zuplo can reach it. Two options:

  • Use Ngrok to create a public tunnel to your localhost (note: you’ll need to restart the tunnel whenever your machine shuts down or goes offline)
  • Deploy it to a managed service (Render works well for this)

Section 4: API Key Authentication and Consumer Metadata Setup

With a route in place, the next step is making sure only authenticated consumers can use it.

Policies are how Zuplo adds behaviour to routes; they intercept incoming requests or outgoing responses, similar to Express middleware but scoped to individual endpoints. You can mix and match different policies per route, which is what makes the gateway composable. Inbound policies run before the request reaches your handler.

The API key inbound policy will reject any request missing a valid key before it ever reaches your Node.js app.

In the “Policies” section of the /rewrite route, click “Add Policy” for “Request” and select “API Key Authentication”.

selecting API key auth policy

Leave the configuration file as-is.

Click “Test Route” and send a request without an authorization header. You’ll get a 401: Unauthorized error:

{
  "type": "https://httpproblems.com/http-status/401",
  "title": "Unauthorized",
  "status": 401,
  "detail": "No Authorization Header",
  "instance": "/rewrite",
  "trace": {
    "timestamp": "2026-05-25T10:06:12.221Z",
    "requestId": "eb68959d-e0c9-4a78-8842-6028f8ccedac",
    "buildId": "d7ec368a-201a-4304-b531-e4c2ef697449",
    "rayId": "a013b9de23f7af03"
  }
}

That’s the API key authentication working. No code needed.

Creating the API Key Consumers

You’ll need two test API key consumers: one for Free and one for Pro. You can create consumers via the Zuplo UI or programmatically on user signup. (I wrote a guide on doing it with Supabase that works with any auth method Zuplo supports.)

Go to the “Services” tab in your dashboard and configure an API key service for the environment you’re working in.

Configuring API key consumers

Click “Create Consumer” and create two consumers:

Create a new API key consumer in Zuplo

  • Subject: the consumer’s name
  • Key managers: email addresses associated with the consumer — use a valid email; you’ll need it to access the developer portal
  • Metadata: the JSON data attached to the consumer. For each consumer, add a plan field:
// Free consumer
{ "plan": "Free" }

// Pro consumer
{ "plan": "Pro" }

After saving, Zuplo generates API keys for each consumer.

This plan field is what every policy you add from here reads. Change it later and the behaviour changes immediately – no code touched, no deployment.

Section 5: Dynamic Quota Enforcement

Quotas are how you differentiate your Free plan from your Pro plan at the usage level. The quota policy here reads request.user.data.plan—the same field you just set on each consumer in Section 4. That’s what makes the whole system work.

The Credit-Based Model

This API uses a credit-based model: 1 successful request = 5 credits consumed. It’s a simplified version of how real AI APIs meter token usage. In a production system you’d map credits to actual token counts per model, but keeping it conceptual here makes the policy logic easier to follow.

Credits are also a more flexible unit than raw requests. A heavy summarization and a lightweight rewrite can consume different amounts without changing the API surface; you just adjust the meter value.

Adding the Dynamic Quota Policy

Add a “Quota” inbound policy to your route.

Quota Inbound policy

Replace the configuration file with this:

{
  "export": "QuotaInboundPolicy",
  "module": "$import(@zuplo/runtime)",
  "options": {
    "period": "hourly",
    "identifier": {
      "getQuotaDetailExport": "getQuotaDetail",
      "module": "$import(./modules/custom-detail)"
    },
    "quotaAnchorMode": "first-api-call",
    "quotaBy": "function",
    "quotaOnStatusCodes": "200-399"
  }
}

Let’s break down what’s happening in this Zuplo quota configuration:

  1. You configured a quota policy that limits total usage over time, similar to a cell phone data plan. The period set to “hourly” means the quota resets every hour, and quotaAnchorMode set to “first-api-call” means the hour starts counting from the user’s first request rather than on a fixed clock.
  2. The policy looks at a custom module ./modules/custom-detail that exports a getQuotaDetail function, which determines how many “credits” each request consumes and what the user’s total hourly allowance should be.
  3. The quotaOnStatusCodes set to “200-399” ensures that only responses with status codes 200 through 399 (successes and redirects) count against the quota; failed requests don’t consume the user’s allowance.

This configuration creates a flexible usage-tracking system where successful API calls deplete a user’s hourly allowance based on custom logic, perfect for implementing usage-based billing or tiered access plans.

Creating the custom-detail.ts File

In the “modules” tab on the left panel, create a new file (select “Inbound Policy”) named custom-detail.ts—this must match the filename in the configuration.

Replace the content with:

import { GetQuotaDetailFunction, QuotaInboundPolicy, ZuploRequest, ZuploContext } from "@zuplo/runtime";

export const getQuotaDetail: GetQuotaDetailFunction = async (
  request: ZuploRequest,
  context: ZuploContext,
  policyName,
) => {
  // sets how many credits to increment per request
  QuotaInboundPolicy.setMeters(context, { credits: 5 });
  const credits =
  request.user.data.plan === "Free"
    ? 10
    : request.user.data.plan === "Pro"
    ? 50
    : 0;
  return {
    key: request.user.sub,
    allowances: {
      credits
    },
  };
};

Let’s break down what’s happening in this custom quota module code:

  1. You defined a getQuotaDetail function that dynamically assigns quota allowances based on the user’s subscription plan. The function reads request.user.data.plan from the authenticated user’s metadata and sets credits allowances: 10 credits per hour for Free users, 50 for Pro users, and 0 for others.
  2. The function calls QuotaInboundPolicy.setMeters(context, { credits: 5 }) to specify that each API request consumes 5 credits from the user’s allowance. This creates a clear cost-per-request model where different plans get different total request capacities.
  3. The returned object uses request.user.sub as the quota tracking key—the consumer's unique identifier. This gives each consumer their own quota bucket, so one Free user exhausting their credits doesn't affect another. The allowance is still plan-driven; only the key changes. The allowances object specifies the user’s hourly limit for the “credits” meter.

This creates a usage-based quota system where Free users get 2 requests per hour (10 credits ÷ 5 per request) and Pro users get 10 requests per hour (50 ÷ 5). The quota resets each hour automatically, and all of it is driven entirely by user metadata.
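
The credit arithmetic is simple enough to check in a few lines. This sketch assumes a request is allowed as long as it does not push usage past the allowance, which matches the 2 and 10 requests per hour stated above; requestsPerHour is a hypothetical helper.

```typescript
// Credits consumed by each successful request, as set via setMeters in custom-detail.ts.
const CREDITS_PER_REQUEST = 5;

// How many whole requests fit into an hourly credit allowance.
function requestsPerHour(
  creditAllowance: number,
  creditsPerRequest: number = CREDITS_PER_REQUEST,
): number {
  return Math.floor(creditAllowance / creditsPerRequest);
}

const freeRequestsPerHour = requestsPerHour(10); // Free: 10 credits per hour
const proRequestsPerHour = requestsPerHour(50); // Pro: 50 credits per hour
```

Changing the per-request cost changes the effective request count without touching the allowances, which is what makes credits a convenient unit.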

Section 6: Quota Anchoring and Usage Tracking

There’s a runtime behaviour here that the documentation doesn’t make obvious, and it’s worth understanding before you hit it in production.

When getUsage is called on a newly created consumer, the first response looks like this:

{
  "anchorDate": "",
  "nextResetDate": "",
  "meters": {}
}

After the first successful API call, it populates:

{"anchorDate":"2026-06-02T23:28:49Z","nextResetDate":"2026-06-03T03:28:49Z","meters":{"credits":5,"requests":1}}

This is because, with quotaAnchorMode set to "first-api-call", the quota window doesn’t initialize until the first request hits. The anchor date then becomes the start of the quota window. It’s expected behaviour, not a bug. Don’t treat an empty meters response as an error in your monitoring or dashboards.

As requests come in, credits and requests increment together. Each request adds 5 to credits (per the setMeters call in custom-detail.ts) and 1 to requests.
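
For dashboards or monitoring, one option is to normalize the usage payload so that an uninitialized window reads as zero usage instead of an error. The sketch below assumes the payload shape shown above; QuotaUsage, UsageSummary, and summarizeUsage are hypothetical names.

```typescript
// Shape of the usage payloads shown above.
interface QuotaUsage {
  anchorDate: string;
  nextResetDate: string;
  meters: Record<string, number>;
}

interface UsageSummary {
  initialized: boolean; // false until the first request anchors the window
  creditsUsed: number;
  creditsRemaining: number;
}

function summarizeUsage(usage: QuotaUsage, creditAllowance: number): UsageSummary {
  // A missing meter means nothing has been counted yet, not that something failed
  const creditsUsed = usage.meters["credits"] ?? 0;
  return {
    initialized: usage.anchorDate !== "",
    creditsUsed,
    creditsRemaining: Math.max(0, creditAllowance - creditsUsed),
  };
}

const freshConsumer = summarizeUsage(
  { anchorDate: "", nextResetDate: "", meters: {} },
  10,
);
const afterFirstCall = summarizeUsage(
  {
    anchorDate: "2026-06-02T23:28:49Z",
    nextResetDate: "2026-06-03T03:28:49Z",
    meters: { credits: 5, requests: 1 },
  },
  10,
);
```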

Section 7: Dynamic Rate Limiting

Quotas handle business usage enforcement. They answer “how much of this API does this consumer get per period?” Rate limiting answers a different question: “How fast can they hit it right now?” The two work together: a consumer can be within their quota but still get throttled if they’re sending requests too aggressively.

Think of it like an ATM: rate limiting is the number of times you can try to withdraw per day; quotas are how much money you actually have to withdraw. You need both.

The rate limiter reads the same request.user.data.plan metadata to enforce limits per consumer—same field, different enforcement layer.
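
To make the distinction concrete, here is a minimal in-memory sketch of the two checks side by side. FixedWindowCounter and checkRequest are hypothetical names, the fixed-window algorithm is a simplification chosen for illustration, and none of this is how Zuplo implements its policies. The limits are the ones used in this guide, and the clock is passed in so the behaviour is easy to follow.

```typescript
interface UsageWindow {
  start: number; // window start, in milliseconds
  used: number; // units consumed in this window
}

class FixedWindowCounter {
  private readonly windowMs: number;
  private readonly windows = new Map<string, UsageWindow>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Consumes `amount` units for `key`; returns false if that would exceed `limit`.
  tryConsume(key: string, amount: number, limit: number, now: number): boolean {
    let current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      current = { start: now, used: 0 }; // start a fresh window
      this.windows.set(key, current);
    }
    if (current.used + amount > limit) {
      return false;
    }
    current.used += amount;
    return true;
  }
}

const ONE_MINUTE = 60 * 1000;
const ONE_HOUR = 60 * ONE_MINUTE;

const rateCounter = new FixedWindowCounter(ONE_MINUTE); // how fast
const quotaCounter = new FixedWindowCounter(ONE_HOUR); // how much

type Decision = "ok" | "rate-limited" | "quota-exceeded";

function checkRequest(consumer: string, plan: "Free" | "Pro", now: number): Decision {
  const requestsPerMinute = plan === "Free" ? 10 : 60;
  const creditsPerHour = plan === "Free" ? 10 : 50;

  // Rate limit first, so throttled requests never spend credits in this sketch
  if (!rateCounter.tryConsume(consumer, 1, requestsPerMinute, now)) {
    return "rate-limited";
  }
  // Each request costs 5 credits against the hourly quota
  if (!quotaCounter.tryConsume(consumer, 5, creditsPerHour, now)) {
    return "quota-exceeded";
  }
  return "ok";
}

const decisions = [0, 1000, 2000].map((t) => checkRequest("free-consumer", "Free", t));
```

Both counters are keyed by consumer, but they answer different questions: the one-minute window bounds burst speed, while the one-hour window bounds total spend.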

Adding the Rate Limiting Inbound Policy

Add a “Rate Limiting” inbound policy to your route (not “Complex Rate Limiting”; the simple version is enough here).

Modify the configuration file with the following:

{
  "export": "RateLimitInboundPolicy",
  "module": "$import(@zuplo/runtime)",
  "options": {
    "rateLimitBy": "function",
    "requestsAllowed": 2,
    "timeWindowMinutes": 1,
    "identifier": {
      "module": "$import(./modules/rate-limiter)",
      "export": "rateLimitKey"
    }
  }
}

Let’s break down what’s happening in the Zuplo rate limit configuration:

  1. You configured a rate limit policy with rateLimitBy set to “function”, so a custom function decides how each consumer is identified and limited. The requestsAllowed: 2 in the config is only a fallback that the function’s return value overrides.
  2. The identifier field points to a custom module ./modules/rate-limiter that exports a function called rateLimitKey. This function returns an object whose key (here, the consumer’s ID) is what Zuplo uses to track request counts.
  3. The export is “RateLimitInboundPolicy”, imported from @zuplo/runtime, which is the standard way to reference built-in Zuplo policies.

This configuration creates a flexible rate-limiting rule where you control how users are identified.

Creating the rate-limiter.ts File

In the “modules” tab on the left panel, click the + icon and select “Inbound Policy”. Name the file rate-limiter.ts (matching the filename in the configuration).

Add the following code:

import {
  CustomRateLimitDetails,
  ZuploRequest,
  ZuploContext,
} from "@zuplo/runtime";

export function rateLimitKey(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails | undefined {
  // Dynamically set requestsAllowed based on the consumer's plan
  const plan = request.user.data.plan;
  const requestsAllowed = plan === "Free" ? 10 : plan === "Pro" ? 60 : 5;
  return {
    key: request.user.sub,
    requestsAllowed,
    timeWindowMinutes: 1,
  };
}

Let's break down what's happening in this custom rate limiter code:

  1. You defined a rateLimitKey function that reads the user's subscription plan from request.user.data.plan metadata. Based on this value, it dynamically sets requestsAllowed: 10 requests per minute for Free users, 60 for Pro users, and a default of 5 for users without a plan or on other tiers.
  2. The function uses request.user.sub (the user's unique ID) as the tracking key, which means each individual user gets their own separate rate limit counter. This prevents one user's activity from affecting another user's limits.
  3. The function returns a CustomRateLimitDetails object containing the user-specific key, their plan-based allowance, and a fixed 1-minute time window. All this logic is driven entirely by the user's metadata attached during authentication.

This creates a personalised rate-limiting system where Pro users get 6x more requests per minute than Free users, all controlled by a simple metadata field on the authenticated user object.

The requestsAllowed: 2 in the config is a fallback; when rateLimitBy is set to "function", the custom function's return value overrides it. The function returns 10 for Free and 60 for Pro, so those are the limits that actually apply.

Authentication, quotas, and rate limits are all in place. But there’s still a gap worth closing: your Express app is publicly accessible. Anyone who finds the backend URL bypasses all of this. The next section closes that gap.

Section 8: Protecting the Backend

Without backend protection, a caller who discovers your Express URL skips API key auth, rate limits, and quota enforcement entirely. The metadata model becomes worthless if someone can route around it.

The fix is a shared secret between Zuplo and your Express app.

Here’s how it works:

You generate a random token and add it to both systems as an environment variable. Whenever Zuplo forwards a request downstream, it includes the token in a header. Your route reads that header and rejects anything that didn’t come through Zuplo.

Generate a suitable secret with:

node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"

Adding a Headers Policy

Store the generated token as a secret environment variable in Zuplo.

Add a BACKEND_SECRET environment variable to your Zuplo project (same process as BACKEND_URL from the “Setting Up Zuplo” section). The value is the token string you generated.

Adding BACKEND_SECRET variable in Zuplo

Zuplo has a built-in policy for setting request headers: Add or Set Request Headers.

Add the “Add or Set Request Headers” policy to your route.

Set Headers Secret

Add your BACKEND_SECRET variable to the headers array of the configuration file:

Adding the backend secret header to the Set Headers policy

Updating Your Express API Endpoint

Add the same secret as an environment variable in your Node.js app (a .env file for local development, which is what dotenv.config() loads by default, or your hosting platform’s project settings for production). Then add this check at the top of your API handler:

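const expectedSecret = process.env.BACKEND_SECRET;
// Reject when the secret is not configured, so a missing env var never lets requests through
if (!expectedSecret || req.headers["backend-secret"] !== expectedSecret) {
  return res.status(403).json({
    error: "Forbidden",
  });
}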

Your API endpoint now rejects any request that doesn’t arrive through Zuplo. This is Zuplo's recommended pattern for backend protection. You can read more about it in their Securing your Backend with a Shared Secret docs.
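
If you want to harden the comparison, a common refinement is to compare the secret in constant time and to refuse requests outright when no secret is configured. The sketch below is one way to do that under those assumptions; constantTimeEqual and isFromGateway are hypothetical helpers, and in a Node.js app the built-in crypto.timingSafeEqual is the more usual choice for the comparison itself.

```typescript
// Compares two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` turns that into 0
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

function isFromGateway(
  headerValue: string | string[] | undefined,
  expectedSecret: string | undefined,
): boolean {
  // A missing secret or a missing/repeated header must never authorize a request
  if (!expectedSecret || typeof headerValue !== "string") {
    return false;
  }
  return constantTimeEqual(headerValue, expectedSecret);
}

const accepted = isFromGateway("s3cret-token", "s3cret-token");
const unconfigured = isFromGateway(undefined, undefined);
```

In the handler, the first argument would be req.headers["backend-secret"] and the second process.env.BACKEND_SECRET.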

With this in place, Zuplo is the only trusted entry point. API key auth, quotas, rate limiting, and backend protection are all enforced end to end.

Section 9: Testing the Full Flow

Two consumers, two plans. Here’s what the behavioural difference looks like in practice:

                        Free          Pro
Rate limit              10 req/min    60 req/min
Quota (credits/hour)    10 credits    50 credits
Requests per hour       2             10

Optional: Debugging quota usage

During development, you can add a Custom Code Inbound policy to log quota consumption on every request. In the configuration file, change "YOUR_MODULE_NAME" to "get-usage", then create a get-usage.ts file in the modules tab with the following:

import { ZuploContext, ZuploRequest, QuotaInboundPolicy } from "@zuplo/runtime";

export default async function policy(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string
) {
  // "quota-inbound" must match the name of the quota policy on the route
  const usage = await QuotaInboundPolicy.getUsage(context, "quota-inbound");
  context.log.info(usage);
  return request;
}

Check the Logs tab after each request to see credits and requests incrementing. Remove this policy before going to production; it adds overhead on every request.

Here's what to expect when you run each test:

In the “Code” tab, click “Test Route”. This opens a dialog where you can test your route.

Testing the API gateway without Authorization header

Test 1: No API key

Send the request without an Authorization header. You’ll get a 401: Unauthorized error:

{
  "type": "https://httpproblems.com/http-status/401",
  "title": "Unauthorized",
  "status": 401,
  "detail": "No Authorization Header",
  "instance": "/rewrite",
  "trace": {
    "timestamp": "2026-05-29T14:41:52.810Z",
    "requestId": "e83fcffa-01be-47d6-bbea-98498d8e071c",
    "buildId": "d7ec368a-201a-4304-b531-e4c2ef697449",
    "rayId": "a036432ff5fc1b60"
  }
}

Test 2: Free plan consumer

Add an Authorization header set to Bearer <your-free-plan-api-key> and send the request. You’ll get the rewritten text:

Electric scooters, bikes, and ride‑sharing services have completely reshaped how people get around today's cities. More and more commuters are ditching their own cars in favor of flexible, on‑demand options. In places like Paris, Berlin and San Francisco, you'll see shared e‑scooters on almost every corner, filling the "last‑mile" gap that public transit often misses. This shift isn't just convenient—it also cuts traffic jams and lowers carbon emissions.

That said, the rapid rollout of these services brings new headaches. Cities are wrestling with sidewalk clutter, safety worries, and data‑privacy concerns tied to the apps. Some municipalities have responded by limiting the number of vehicles; others use geofencing to slow speeds in pedestrian‑heavy zones.

Even with these challenges, the momentum is clear: a more integrated, multimodal transport network is emerging. Experts estimate that by 2030, on‑demand mobility could make up close to 30 % of all urban trips. Achieving that vision will require solid collaboration among tech firms, local governments, and the public. Without clear rules and responsible use, the very tools meant to simplify travel could create new urban headaches. As cities keep growing, striking the right balance between innovation and regulation will be one of the defining challenges of our century

If you've added the optional debug policy, check the Logs tab. On the first successful request, you'll see:

{"anchorDate":"2026-06-02T23:28:49Z","nextResetDate":"2026-06-03T02:28:49Z","meters":{}}

The meters object is empty because the debug policy calls getUsage during request processing, before this first request’s usage has been committed. With quotaAnchorMode set to "first-api-call", the anchor is set on the first request, not before it. This is expected.

Run the request a second time and check the logs:

{"anchorDate":"2026-06-02T23:28:49Z","nextResetDate":"2026-06-03T03:28:49Z","meters":{"credits":5,"requests":1}}

Now the meters are populated. The anchor date is fixed from the first request, and nextResetDate marks the end of the quota window. Each subsequent request adds 5 to credits and 1 to requests.

Test 3: Pro plan consumer

Switch to the Pro plan API key. The first successful request will show an empty meters object for the same anchoring reason. On subsequent requests, you’ll see quota consumed at the same rate of 5 credits per request, but the allowance is 50 credits (10 requests) rather than 10 credits (2 requests).
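The request counts quoted for each plan follow from dividing the credit allowance by the flat per-request cost. Here is a minimal sketch of that arithmetic, assuming the values used in this walkthrough (5 credits per request, 10 credits for Free, 50 for Pro). The function and constant names are illustrative and are not part of Zuplo.

```javascript
// Values taken from this walkthrough; adjust them to match your own policy config.
const CREDITS_PER_REQUEST = 5;
const PLAN_ALLOWANCE = { Free: 10, Pro: 50 };

// Number of whole requests that fit inside a plan's credit allowance.
function requestsPerWindow(plan) {
  if (!Object.prototype.hasOwnProperty.call(PLAN_ALLOWANCE, plan)) {
    throw new Error(`Unknown plan: ${plan}`);
  }
  return Math.floor(PLAN_ALLOWANCE[plan] / CREDITS_PER_REQUEST);
}

// Meter values you would expect after n successful requests in one window.
function metersAfter(n) {
  return { credits: n * CREDITS_PER_REQUEST, requests: n };
}

console.log(requestsPerWindow("Free"), requestsPerWindow("Pro"));
console.log(metersAfter(1));
```

With these numbers a Free key is exhausted after 2 requests and a Pro key after 10.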

Test 4: Quota exhaustion

The Free plan allows 10 credits per hour (2 requests). After hitting the limit, the next request returns a 429 Too Many Requests error:

{
  "type": "https://httpproblems.com/http-status/429",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Quota exceeded for meters 'credits'",
  "instance": "/rewrite",
  "trace": {
    "timestamp": "2026-05-29T15:30:07.202Z",
    "requestId": "da20c3a5-c927-4883-bd9c-4d7a633f23cc",
    "buildId": "ef0357df-5f30-4c76-bb45-9be3969ee772",
    "rayId": "a03689d1a7941b60"
  }
}

Hitting the rate limit also returns a 429, but its body carries "detail": "Too many requests", which points at the rate limit policy rather than the quota.
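Because both rejections share the 429 status, a client has to look at the response body to tell them apart. The sketch below does that by inspecting the detail field. It assumes the detail strings match the ones shown in this walkthrough, which may differ across gateway versions or custom configurations. classify429 is a hypothetical helper name.

```javascript
// Classify a 429 problem-details body as a quota or a rate-limit rejection.
// Assumption: quota rejections start with "Quota exceeded", as in the example above.
function classify429(problem) {
  if (!problem || problem.status !== 429) return "not-throttled";
  const detail = String(problem.detail ?? "").toLowerCase();
  if (detail.startsWith("quota exceeded")) return "quota";
  return "rate-limit";
}

console.log(classify429({ status: 429, detail: "Quota exceeded for meters 'credits'" }));
console.log(classify429({ status: 429, detail: "Too many requests" }));
```

A client could use the result to pick a strategy: wait for the quota window to reset on "quota", or retry after a short backoff on "rate-limit".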

Test 5: Direct backend access

Try hitting your Express URL directly, bypassing Zuplo. You’ll get a 403 Forbidden: the backend secret check rejects the request because it never passed through the gateway and so lacks the secret header Zuplo injects.
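For reference, a backend secret check of this kind can be as small as the Express-style middleware sketched below. The header name x-backend-secret and the function name requireBackendSecret are placeholders chosen for illustration. Use whatever header your set-headers-inbound policy actually sends, and load the secret from your environment.

```javascript
// Hypothetical header name; match it to the header your gateway injects.
const SECRET_HEADER = "x-backend-secret";

// Express-style middleware: respond 403 unless the shared secret is present and correct.
function requireBackendSecret(expectedSecret) {
  return (req, res, next) => {
    const provided = req.headers[SECRET_HEADER];
    if (typeof provided !== "string" || provided !== expectedSecret) {
      return res.status(403).json({ error: "Forbidden" });
    }
    next();
  };
}
```

In an Express app this would be registered ahead of the routes, for example app.use(requireBackendSecret(process.env.BACKEND_SECRET)), where BACKEND_SECRET is again an assumed variable name.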

The metadata change test

Update the Free consumer’s metadata in the Zuplo dashboard from { "plan": "Free" } to { "plan": "Pro" }. Make the same request. The behaviour changes immediately: higher quota, higher rate limit, same code, no deployment.

That’s the payoff of the metadata model in its most concrete form.
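The core of the model can be sketched in a few lines: limits are looked up from the consumer's metadata at request time, so editing the metadata changes the result with no code change. The plan table and limitsFor function below are illustrative and are not Zuplo's API. The credit values mirror the Free and Pro allowances used in this guide, and the Pro window is assumed to be hourly like Free's.

```javascript
// Hypothetical plan table mirroring this guide's allowances.
const PLAN_LIMITS = {
  Free: { creditsPerHour: 10 },
  Pro: { creditsPerHour: 50 },
};

// Resolve limits from consumer metadata, falling back to the most restrictive plan.
function limitsFor(metadata) {
  const plan = metadata ? metadata.plan : undefined;
  const known = Object.prototype.hasOwnProperty.call(PLAN_LIMITS, plan);
  return known ? PLAN_LIMITS[plan] : PLAN_LIMITS.Free;
}

const consumer = { metadata: { plan: "Free" } };
console.log(limitsFor(consumer.metadata).creditsPerHour);

// Simulate the dashboard edit: same code path, different outcome on the next request.
consumer.metadata = { plan: "Pro" };
console.log(limitsFor(consumer.metadata).creditsPerHour);
```

Falling back to the Free limits for missing or unrecognized plans is a design choice made for this sketch. It keeps a malformed consumer from receiving an unlimited allowance.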

Note: The policies on your route should be in this order:

  1. api-key-inbound
  2. quota-inbound
  3. rate-limit-inbound
  4. set-headers-inbound

Section 10: Why this Architecture Works

The separation here is deliberate, not incidental.

Express does one thing: talk to OpenRouter and stream a response. Zuplo does everything else: authenticate the request, enforce quotas, apply rate limits, inject the backend secret, and make all of it configurable without touching application code. Neither layer knows more about the other than it needs to.

What makes this programmable, not just configurable, is the metadata model.

Most API governance approaches are static: you deploy a config file, it controls the limits, and changing anything requires a deployment.

Here, the gateway reads runtime state from the consumer and enforces different behaviour per key. Change the plan field on a consumer, and the behaviour changes on the next request. No code review, no deployment pipeline, no risk of a bad deploy rolling back a limit change.

That has real implications beyond this tutorial.

The same pattern that powers Free vs. Pro plans here also powers multi-tenant APIs with organization-level governance, adaptive limits that change without deployments, differentiated API products on a single backend, and usage governance that scales with your consumer base.

Section 11: Using this for API Monetization

The infrastructure you just built is also the foundation of a monetizable API product. Free tiers, paid plans, usage-based billing, AI credit systems – all of them reduce to the same mechanics you’ve already configured: a plan field on the consumer, a quota policy reading it, and a rate limiter enforcing it.

Zuplo’s monetization feature builds on top of the exact same consumer and metadata model you’ve configured here. It layers on Stripe-powered subscriptions, a self-serve pricing page, plan management, and usage dashboards – all integrated with the gateway you already have.

If you need the full billing system on top of what you’ve built, the monetization quickstart is the natural next step.

Conclusion

API keys shouldn’t just authenticate users. When they carry metadata, they become the control plane for your entire API product, determining quotas, rate limits, and behavioural differences per consumer without a line of governance code in your application.

The model in one sentence: store plan data on the consumer, read it in your policies, enforce different behaviour per plan. One field drives everything.

By moving governance into Zuplo, the backend stayed lightweight, limits became instantly changeable, infrastructure protection became centralized, and the API became productizable without a billing system attached to it.

API gateways aren’t just reverse proxies anymore; they’re programmable control planes, and metadata is what makes them programmable.

If you’re building this into a SaaS, the natural next problem is per-user metering at the token level — mapping actual LLM token counts to credits rather than using a flat 5-credit-per-request model. Pair that with a webhook to your billing system (Stripe, for example), and you have usage-based billing without any quota logic in your application code.
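As a rough starting point for that token-level model, the sketch below converts a response's token usage into credits. The conversion rate (1 credit per 1,000 tokens, rounded up, with a 1-credit minimum) is an invented example. The usage field names prompt_tokens and completion_tokens are assumed to follow the OpenAI-compatible response shape, so verify them against the responses your provider actually returns.

```javascript
// Hypothetical pricing: 1 credit per 1,000 tokens, rounded up, minimum 1 credit.
const TOKENS_PER_CREDIT = 1000;

// Convert an OpenAI-style usage object into a credit charge.
function creditsForUsage(usage) {
  const prompt = (usage && usage.prompt_tokens) || 0;
  const completion = (usage && usage.completion_tokens) || 0;
  return Math.max(1, Math.ceil((prompt + completion) / TOKENS_PER_CREDIT));
}

console.log(creditsForUsage({ prompt_tokens: 400, completion_tokens: 700 }));
```

Rounding up and charging at least one credit means every successful request consumes quota, which preserves the behaviour of the flat model for very short completions.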
