DEV Community

Debby McKinney

Best AWS Gateway for Tracking LLM Costs and Rate Limits

TL;DR: If you are running LLM workloads on AWS (Bedrock, SageMaker, or calling external APIs from EC2/Lambda), you probably do not have great visibility into per-team costs or rate limit management. Here is a look at gateway options that solve this, with a focus on what actually works for AWS-heavy setups.

The Problem with LLM Cost Tracking on AWS

If you are using AWS Bedrock, your cost tracking options are limited. CloudWatch gives you invocation counts and latency. AWS Cost Explorer shows aggregate Bedrock spend. But neither gives you:

  • Per-team or per-application cost breakdowns
  • Real-time budget enforcement (not after-the-fact alerts)
  • Rate limiting per user or per service
  • Unified view when you also use OpenAI or Anthropic directly

Most teams figure this out after the first surprise bill.

What to Look for in an LLM Gateway for AWS

A good gateway for AWS LLM workloads should handle:

  1. Cost tracking per team/service: Not just total spend, but who is spending what
  2. Budget enforcement: Hard caps that stop requests when limits are hit
  3. Rate limiting: Per-user, per-team, and per-provider throttling
  4. Multi-provider support: Because most teams use Bedrock AND direct API calls
  5. Low overhead: Your gateway should not become the bottleneck

Option 1: AWS API Gateway + Custom Lambda

You can build cost tracking yourself using API Gateway as a proxy, Lambda for request processing, and DynamoDB for tracking.
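A minimal sketch of the tracking logic such a Lambda might run. The pricing table, model name, and DynamoDB table name are illustrative assumptions, not real AWS pricing:

```python
# Illustrative per-1K-token prices -- real Bedrock pricing varies by model and region.
PRICING = {
    "anthropic.claude-3-sonnet": {"input": 0.003, "output": 0.015},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the dollar cost of one invocation from token counts."""
    p = PRICING[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

def record_usage(team: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """Accumulate spend per team. A real Lambda would do an atomic
    DynamoDB update here rather than just returning the cost."""
    cost = request_cost(model, input_tokens, output_tokens)
    # import boto3  # in a real Lambda:
    # boto3.resource("dynamodb").Table("llm-usage").update_item(
    #     Key={"team": team},
    #     UpdateExpression="ADD spend :c",
    #     ExpressionAttributeValues={":c": Decimal(str(cost))},
    # )
    return cost
```

Even this toy version shows the catch: you also have to parse token counts out of every provider's response format yourself, which is exactly the LLM-aware plumbing a gateway gives you for free.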

Pros:

  • Fully within AWS ecosystem
  • You control everything

Cons:

  • You have to build and maintain everything
  • Lambda cold starts add latency
  • No built-in LLM-aware features (token counting, model pricing)
  • Cost tracking logic is your responsibility

This works for teams with dedicated platform engineering resources. For most teams, it is more effort than the problem is worth.

Option 2: Bifrost (Open Source, Self-Hosted)

Bifrost is an open-source LLM gateway written in Go. It supports Bedrock natively alongside 20+ other providers.

What it does for AWS cost tracking:

The four-tier budget hierarchy is where Bifrost stands out:

  • Customer level: Total organization budget
  • Team level: Per-team spending caps (e.g., engineering gets $500/month, marketing gets $200/month)
  • Virtual Key level: Per-application or per-service budgets with configurable reset durations
  • Provider Config level: Per-provider rate limits

When a budget is hit at any level, the gateway enforces it. If your Bedrock budget runs out, requests can automatically fall back to a cheaper provider or stop entirely. This is real-time enforcement, not an alert you see the next day.
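Conceptually, the enforcement is a walk up the hierarchy before each request is forwarded. This is a generic sketch of that logic, not Bifrost's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Budget:
    limit: float      # dollar cap for the period
    spent: float = 0.0

def allow_request(chain: list[Budget], estimated_cost: float) -> bool:
    """Check every level (customer -> team -> virtual key) before forwarding.
    One exhausted level is enough to block or reroute the request."""
    return all(b.spent + estimated_cost <= b.limit for b in chain)

def record_spend(chain: list[Budget], cost: float) -> None:
    """After a successful request, charge the cost at every level."""
    for b in chain:
        b.spent += cost

# Example: org $1000/month, engineering team $500, one service key $100
customer, team, vkey = Budget(1000), Budget(500), Budget(100)
chain = [customer, team, vkey]
assert allow_request(chain, 5.0)       # all levels have headroom
vkey.spent = 99.0
assert not allow_request(chain, 5.0)   # the key budget is exhausted first
```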

Rate limiting:

Bifrost handles rate limiting at the Virtual Key level:

  • Token-based limits (max tokens per period)
  • Request-based limits (max requests per period)
  • Configurable reset durations (per minute, hour, day, week, month)

If a provider config exceeds its rate limits, that provider is excluded from routing. Other providers stay available.
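The mechanics behind combined request and token limits can be sketched as a fixed-window counter. This is an illustration of the concept, not Bifrost's internals:

```python
class WindowLimiter:
    """Fixed-window limiter with both a request cap and a token cap,
    resetting every period_s seconds (minute/hour/day/week/month in practice)."""

    def __init__(self, max_requests: int, max_tokens: int, period_s: float, start: float = 0.0):
        self.max_requests, self.max_tokens = max_requests, max_tokens
        self.period_s = period_s
        self.window_start = start
        self.requests = self.tokens = 0

    def allow(self, tokens: int, now: float) -> bool:
        """Pass time.monotonic() as `now` in real use; explicit here for clarity."""
        if now - self.window_start >= self.period_s:  # window rolled over: reset counters
            self.window_start, self.requests, self.tokens = now, 0, 0
        if self.requests + 1 > self.max_requests or self.tokens + tokens > self.max_tokens:
            return False                              # over either cap: exclude from routing
        self.requests += 1
        self.tokens += tokens
        return True

# 2 requests / 100 tokens per 60-second window
lim = WindowLimiter(max_requests=2, max_tokens=100, period_s=60)
assert lim.allow(40, now=0.0)
assert lim.allow(40, now=1.0)
assert not lim.allow(40, now=2.0)   # third request exceeds the request cap
assert lim.allow(40, now=61.0)      # new window, counters reset
```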

AWS Bedrock setup:

{
  "providers": {
    "bedrock": {
      "keys": [{
        "name": "bedrock-1",
        "value": "env.AWS_ACCESS_KEY",
        "models": ["anthropic.claude-3-sonnet"],
        "weight": 0.7
      }]
    },
    "openai": {
      "keys": [{
        "name": "openai-1",
        "value": "env.OPENAI_API_KEY",
        "models": ["gpt-4o-mini"],
        "weight": 0.3
      }]
    }
  }
}


Use provider-prefixed model names in your requests:

bedrock/anthropic.claude-3-sonnet
openai/gpt-4o-mini

Bifrost handles the authentication, request format translation, and cost logging for each provider.
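Calling the gateway from Python then looks like an ordinary OpenAI-style chat request. The base URL, port, and Authorization header below are assumptions for illustration (check the Bifrost docs for the actual endpoint and auth scheme); the part that matters is the provider-prefixed model name:

```python
import json
import urllib.request  # used only by the commented-out send below

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"  # assumed gateway address

def build_chat_request(model: str, prompt: str) -> dict:
    """OpenAI-style payload; the provider prefix tells the gateway where to route."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("bedrock/anthropic.claude-3-sonnet", "Summarize our Q3 numbers.")

# Uncomment to actually send the request through the gateway:
# req = urllib.request.Request(
#     GATEWAY_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json",
#              "Authorization": "Bearer YOUR_VIRTUAL_KEY"},  # hypothetical virtual key auth
# )
# print(urllib.request.urlopen(req).read().decode())
```

Swapping the model string to openai/gpt-4o-mini reroutes the same request to OpenAI with no other code changes.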

Performance: 11µs overhead per request, 5,000 RPS sustained throughput. Self-hosted, so your data stays within your AWS VPC. That matters for compliance.

Cost tracking:

The Model Catalog tracks pricing across all providers automatically. Every request is logged with token counts and calculated cost. You get one dashboard for Bedrock, OpenAI, Anthropic, and any other provider you configure.

Semantic caching:

The cache layer (Weaviate-backed) can reduce costs further by serving cached responses for similar queries. It is dual-layer: an exact hash match is tried first, then semantic similarity.

Option 3: Build on CloudWatch + Cost Explorer

If you just want visibility (not enforcement), you can set up CloudWatch dashboards for Bedrock metrics and use AWS Cost Explorer with tags.
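For example, the AWS/Bedrock namespace publishes InputTokenCount and OutputTokenCount metrics per model, which you can pull with boto3 (credentials and region assumed configured; the summing helper is just for convenience):

```python
from datetime import datetime, timedelta, timezone

def bedrock_input_tokens(model_id: str, days: int = 7) -> list[dict]:
    """Fetch daily InputTokenCount datapoints for one Bedrock model."""
    import boto3  # lazy import so the module loads even without boto3 installed
    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName="InputTokenCount",
        Dimensions=[{"Name": "ModelId", "Value": model_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,             # one datapoint per day
        Statistics=["Sum"],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

def total_tokens(datapoints: list[dict]) -> float:
    """Sum the daily totals into one number."""
    return sum(d["Sum"] for d in datapoints)
```

Note what this gives you: a count of tokens already consumed. Turning that into dollars, attributing it to a team, or stopping the next request is all still on you.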

Pros:

  • No additional infrastructure
  • Native AWS tooling

Cons:

  • No real-time budget enforcement
  • No per-user or per-team granularity without custom tagging
  • Does not cover non-AWS providers
  • No rate limiting beyond Bedrock's built-in throttling

Comparison

| Feature                | API Gateway + Lambda | Bifrost             | CloudWatch     |
|------------------------|----------------------|---------------------|----------------|
| Per-team cost tracking | Build yourself       | Built-in            | Manual tagging |
| Real-time budget caps  | Build yourself       | Built-in            | No             |
| Rate limiting          | Build yourself       | Built-in            | Bedrock only   |
| Multi-provider         | Build yourself       | 20+ providers       | AWS only       |
| Overhead               | Lambda cold starts   | 11µs                | N/A            |
| Maintenance            | High                 | Low (single binary) | Low            |
| Self-hosted            | Yes                  | Yes                 | N/A            |
| Open source            | Your code            | Yes                 | No             |

Recommendation

If you need actual cost enforcement and rate limiting (not just monitoring), Bifrost is the most practical option for AWS-heavy teams. It is self-hosted, so it runs inside your VPC. The budget hierarchy maps well to how engineering organizations are structured. And it covers both AWS and non-AWS providers.

If you only need visibility and are fine with after-the-fact cost analysis, CloudWatch and Cost Explorer work without additional infrastructure.
