TL;DR: If you are running LLM workloads on AWS (Bedrock, SageMaker, or calling external APIs from EC2/Lambda), you probably do not have great visibility into per-team costs or rate limit management. Here is a look at gateway options that solve this, with a focus on what actually works for AWS-heavy setups.
The Problem with LLM Cost Tracking on AWS
If you are using AWS Bedrock, your cost tracking options are limited. CloudWatch gives you invocation counts and latency. AWS Cost Explorer shows aggregate Bedrock spend. But neither gives you:
- Per-team or per-application cost breakdowns
- Real-time budget enforcement (not after-the-fact alerts)
- Rate limiting per user or per service
- Unified view when you also use OpenAI or Anthropic directly
Most teams figure this out after the first surprise bill.
What to Look for in an LLM Gateway for AWS
A good gateway for AWS LLM workloads should handle:
- Cost tracking per team/service: Not just total spend, but who is spending what
- Budget enforcement: Hard caps that stop requests when limits are hit
- Rate limiting: Per-user, per-team, and per-provider throttling
- Multi-provider support: Because most teams use Bedrock AND direct API calls
- Low overhead: Your gateway should not become the bottleneck
Option 1: AWS API Gateway + Custom Lambda
You can build cost tracking yourself using API Gateway as a proxy, Lambda for request processing, and DynamoDB for tracking.
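To make the DIY option concrete, here is a minimal sketch of the cost-tracking core such a Lambda proxy would need. The pricing figures and the DynamoDB key layout are illustrative placeholders, not current AWS rates or a prescribed schema.

```python
from decimal import Decimal

# Illustrative (input, output) USD prices per 1,000 tokens -- NOT current rates.
PRICE_PER_1K_TOKENS = {
    "anthropic.claude-3-sonnet": (0.003, 0.015),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the dollar cost of one LLM call from its token counts."""
    in_price, out_price = PRICE_PER_1K_TOKENS[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

def record_usage(table, team: str, model: str, input_tokens: int, output_tokens: int):
    """Accumulate per-team spend with an atomic DynamoDB counter update.

    `table` is a boto3 DynamoDB Table resource; the Lambda handler would call
    this after each upstream response. DynamoDB numbers must be Decimal.
    """
    cost = Decimal(str(round(request_cost(model, input_tokens, output_tokens), 6)))
    table.update_item(
        Key={"pk": f"TEAM#{team}"},
        UpdateExpression="ADD spend :c, requests :one",
        ExpressionAttributeValues={":c": cost, ":one": 1},
    )
```

This covers only per-request accounting; token counting, budget checks, and provider-specific pricing updates all still have to be built and maintained on top of it.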
Pros:
- Fully within AWS ecosystem
- You control everything
Cons:
- You have to build and maintain everything
- Lambda cold starts add latency
- No built-in LLM-aware features (token counting, model pricing)
- Cost tracking logic is your responsibility
This works for teams with dedicated platform engineering resources. For most teams, it is more effort than the problem is worth.
Option 2: Bifrost (Open Source, Self-Hosted)
Bifrost is an open-source LLM gateway written in Go. It supports Bedrock natively alongside 20+ other providers.
What it does for AWS cost tracking:
The four-tier budget hierarchy is where Bifrost stands out:
- Customer level: Total organization budget
- Team level: Per-team spending caps (e.g., engineering gets $500/month, marketing gets $200/month)
- Virtual Key level: Per-application or per-service budgets with configurable reset durations
- Provider Config level: Per-provider rate limits
When a budget is hit at any level, the gateway enforces it. If your Bedrock budget runs out, requests can automatically fall back to a cheaper provider or stop entirely. This is real-time enforcement, not an alert you see the next day.
Rate limiting:
Bifrost handles rate limiting at the Virtual Key level:
- Token-based limits (max tokens per period)
- Request-based limits (max requests per period)
- Configurable reset durations (per minute, hour, day, week, month)
If a provider config exceeds its rate limits, that provider is excluded from routing. Other providers stay available.
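A fixed-window limiter that tracks both request counts and token counts per reset period captures the behavior described above. This is a sketch; the field names are illustrative, not Bifrost's configuration schema.

```python
import time

class VirtualKeyLimiter:
    """Fixed-window rate limiter with both request and token budgets."""

    def __init__(self, max_requests: int, max_tokens: int, reset_seconds: float):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.reset_seconds = reset_seconds
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def allow(self, tokens: int) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.reset_seconds:
            # Counters zero out at each reset boundary (minute, hour, day, ...).
            self.window_start = now
            self.requests = self.tokens = 0
        if self.requests + 1 > self.max_requests or self.tokens + tokens > self.max_tokens:
            return False
        self.requests += 1
        self.tokens += tokens
        return True

limiter = VirtualKeyLimiter(max_requests=2, max_tokens=1000, reset_seconds=60)
assert limiter.allow(400)
assert limiter.allow(400)
assert not limiter.allow(100)  # third request exceeds the request cap
```

In a gateway, a rejection at this layer maps to excluding that provider from routing for the rest of the window rather than failing the request outright.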
AWS Bedrock setup:
```json
{
  "providers": {
    "bedrock": {
      "keys": [{
        "name": "bedrock-1",
        "value": "env.AWS_ACCESS_KEY",
        "models": ["anthropic.claude-3-sonnet"],
        "weight": 0.7
      }]
    },
    "openai": {
      "keys": [{
        "name": "openai-1",
        "value": "env.OPENAI_API_KEY",
        "models": ["gpt-4o-mini"],
        "weight": 0.3
      }]
    }
  }
}
```
Use provider-prefixed model names in your requests:
```
bedrock/anthropic.claude-3-sonnet
openai/gpt-4o-mini
```
Bifrost handles the authentication, request format translation, and cost logging for each provider.
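From the client side, a request might look like the following sketch, assuming the gateway exposes an OpenAI-compatible chat endpoint; the endpoint path and port here are assumptions, not documented values.

```python
import json

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"  # hypothetical address

def chat_payload(provider: str, model: str, prompt: str) -> dict:
    """Build a chat request routed by provider-prefixed model name."""
    return {
        "model": f"{provider}/{model}",  # e.g. bedrock/anthropic.claude-3-sonnet
        "messages": [{"role": "user", "content": prompt}],
    }

payload = chat_payload("bedrock", "anthropic.claude-3-sonnet", "Summarize this doc.")
body = json.dumps(payload)
# A real client would POST `body` to GATEWAY_URL with any HTTP library;
# the gateway translates it into Bedrock's native request format.
assert payload["model"] == "bedrock/anthropic.claude-3-sonnet"
```

The point is that application code never changes when you swap providers; only the prefix in the model name does.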
Performance: 11µs overhead per request and 5,000 RPS sustained throughput. Because it is self-hosted, request data stays inside your AWS VPC, which matters for compliance.
Cost tracking:
The Model Catalog tracks pricing across all providers automatically. Every request is logged with token counts and calculated cost. You get one dashboard for Bedrock, OpenAI, Anthropic, and any other provider you configure.
Semantic caching:
The cache layer (Weaviate-backed) can reduce costs further by serving cached responses for similar queries. Dual-layer: exact hash matching plus semantic similarity.
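The dual-layer idea is straightforward: check an exact hash of the prompt first, then fall back to a similarity search over embeddings. Here is a toy sketch; the embedding function is a stand-in for a real model, and the in-memory list stands in for the Weaviate-backed store.

```python
import hashlib
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class DualLayerCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed       # callable: str -> embedding vector
        self.threshold = threshold
        self.exact = {}          # sha256(prompt) -> response
        self.semantic = []       # (vector, response) pairs

    def get(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:                       # layer 1: exact hash hit
            return self.exact[key]
        vec = self.embed(prompt)
        for stored_vec, response in self.semantic:  # layer 2: similarity scan
            if cosine(vec, stored_vec) >= self.threshold:
                return response
        return None

    def put(self, prompt, response):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        self.exact[key] = response
        self.semantic.append((self.embed(prompt), response))
```

With a real embedding model, "what's your refund policy" and "tell me the refund policy" land near each other in vector space, so the second query is served from cache and never hits a paid API.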
Option 3: Build on CloudWatch + Cost Explorer
If you just want visibility (not enforcement), you can set up CloudWatch dashboards for Bedrock metrics and use AWS Cost Explorer with tags.
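For the tag-based breakdown, the Cost Explorer query looks roughly like this. The dates and the "team" tag key are examples; this only works if your Bedrock-calling resources actually carry that cost allocation tag.

```python
# Parameters for a per-team Bedrock cost breakdown via Cost Explorer.
# The time period and "team" tag key are illustrative examples.
cost_query = {
    "TimePeriod": {"Start": "2024-06-01", "End": "2024-07-01"},
    "Granularity": "MONTHLY",
    "Metrics": ["UnblendedCost"],
    "Filter": {"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
    "GroupBy": [{"Type": "TAG", "Key": "team"}],
}
# With boto3: boto3.client("ce").get_cost_and_usage(**cost_query)
assert cost_query["GroupBy"][0]["Type"] == "TAG"
```

Note this is still after-the-fact reporting: Cost Explorer data lags by up to a day, so nothing here stops a runaway workload in flight.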
Pros:
- No additional infrastructure
- Native AWS tooling
Cons:
- No real-time budget enforcement
- No per-user or per-team granularity without custom tagging
- Does not cover non-AWS providers
- No rate limiting beyond Bedrock's built-in throttling
Comparison
| Feature | API Gateway + Lambda | Bifrost | CloudWatch |
|---|---|---|---|
| Per-team cost tracking | Build yourself | Built-in | Manual tagging |
| Real-time budget caps | Build yourself | Built-in | No |
| Rate limiting | Build yourself | Built-in | Bedrock only |
| Multi-provider | Build yourself | 20+ providers | AWS only |
| Overhead | Lambda cold starts | 11µs | N/A |
| Maintenance | High | Low (single binary) | Low |
| Self-hosted | Yes | Yes | N/A |
| Open source | Your code | Yes | No |
Recommendation
If you need actual cost enforcement and rate limiting (not just monitoring), Bifrost is the most practical option for AWS-heavy teams. It is self-hosted, so it runs inside your VPC. The budget hierarchy maps well to how engineering organizations are structured. And it covers both AWS and non-AWS providers.
If you only need visibility and are fine with after-the-fact cost analysis, CloudWatch and Cost Explorer work without additional infrastructure.