This is the pitfall that cost me 3 hours at 2 AM. If you're running LiteLLM Proxy in production, it will hit you too — usually at the worst possible time.
What Happened
I run LiteLLM Proxy + New API in front of 18 provider channels. One night, I rotated an API key for a provider that had been flagged for unusual spending.
Standard procedure:
- Generate a new key in the provider dashboard
- Update config.yaml with the new key
- Run litellm --config config.yaml --reload
The reload succeeded. No errors. The config showed the new key. I went to sleep.
The next morning, the old key was still being used. Every single request was still authenticating with the rotated-out key. The provider's dashboard showed traffic from both keys — the new one (from config validation) and the old one (from actual API calls).
Why It Happens
LiteLLM caches API keys in memory for performance. When you run --reload, the config file is re-read, but the key store is not purged: the worker process holds the old keys in a dictionary that persists across config reloads.
This means:
- config.yaml shows the new key ✅
- litellm --model_cost_map shows the new key ✅
- The actual HTTP requests use the old key ❌
You won't notice until the old key expires or is revoked — at which point every request to that provider starts returning 401, and your fallback chain kicks in, routing traffic to your most expensive model.
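To make the failure mode concrete, here is a toy Python model of it. This is not LiteLLM's actual code: the ToyProxy class and its method names are made up for illustration, and the only thing it assumes is the behavior described above (a lazily filled key dictionary that a reload does not clear).

```python
class ToyProxy:
    """Toy model of the stale-key behavior; NOT LiteLLM's real implementation."""

    def __init__(self, config):
        self.config = dict(config)   # provider -> api key, as read from config
        self._key_cache = {}         # provider -> api key, filled on first use

    def reload(self, new_config):
        # Replaces the config but leaves the key cache untouched (the pitfall).
        self.config = dict(new_config)

    def purge(self):
        # Drops every cached key so the next request re-reads the config.
        self._key_cache.clear()

    def key_for_request(self, provider):
        # Cache hit wins over the config, even if the config has changed since.
        if provider not in self._key_cache:
            self._key_cache[provider] = self.config[provider]
        return self._key_cache[provider]


proxy = ToyProxy({"openai": "sk-old"})
proxy.key_for_request("openai")            # first request caches the old key
proxy.reload({"openai": "sk-new"})         # config now shows the new key
stale = proxy.key_for_request("openai")    # the cached old key is still served
proxy.purge()
fresh = proxy.key_for_request("openai")    # after a purge, the new key is used
```

The config and the cache disagree between reload() and purge(), which is exactly the window where the dashboard shows traffic on a key you thought was gone.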
The Fix
Option 1: Purge the cache manually (no downtime)
curl -X POST http://localhost:4000/cache/purge \
-H "Authorization: Bearer $LITELLM_MASTER_KEY"
This clears the in-memory key cache. The next request will pull the key from the freshly reloaded config.
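If your key rotation already runs from a Python script, the same call can be made with the standard library. This is a minimal sketch that mirrors the curl command above; the /cache/purge path and the master-key header come from that command, while the function names are my own and the base URL is an assumption about your deployment.

```python
import urllib.request


def build_purge_request(base_url, master_key):
    """Build the POST request for the purge endpoint used in the curl command."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/cache/purge",
        method="POST",
        headers={"Authorization": f"Bearer {master_key}"},
    )


def purge_cache(base_url, master_key, timeout=10.0):
    """Send the purge request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a normal
    return means the proxy accepted the request.
    """
    request = build_purge_request(base_url, master_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a rotation script you would call purge_cache("http://localhost:4000", os.environ["LITELLM_MASTER_KEY"]) right after the reload step, so the purge can never be forgotten at 2 AM.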
Option 2: Use Redis for shared key state (recommended for multi-worker)
Set REDIS_HOST in your environment:
# docker-compose.yml
environment:
  - REDIS_HOST=redis
  - REDIS_PORT=6379
  - REDIS_CONNECTION_POOL_SIZE=5
With Redis, keys are stored externally. A config reload triggers a Redis key update, and all workers pick it up immediately. No stale keys.
Option 3: Restart the worker (downtime: 2-5 seconds)
docker restart litellm-proxy
Brute force, but guaranteed to work. Use this if you're in a hurry and can afford a brief blip.
How to Detect It Before Users Do
Add this to your monitoring — a simple script that checks whether the key in config matches the key actually being used:
# Check which key is being used for a specific model
curl -s http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "test"}], "max_tokens": 1}' \
| jq '.usage'
# Compare with the key in config
grep "api_key:" config.yaml | head -1
If the provider's response includes an x-api-key-id header (OpenAI does), you can verify which key was used without guessing.
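The comparison itself can be automated once you have some identifier for the key the provider actually saw. Here is a rough Python sketch, assuming you can collect the last four characters of each key that shows traffic (for example from the provider dashboard). The helper names and the suffix-matching approach are my own illustration, not a LiteLLM feature, and entries that use the os.environ/ reference syntax are skipped because the config file does not contain the literal key.

```python
import re

# Matches lines like `api_key: "sk-..."` and captures the value without quotes.
API_KEY_LINE = re.compile(r"^[ \t]*api_key:[ \t]*[\"']?([^\"'\s#]+)", re.MULTILINE)


def configured_key_suffixes(config_text, length=4):
    """Return the last `length` characters of each literal api_key in the config."""
    suffixes = []
    for key in API_KEY_LINE.findall(config_text):
        if key.startswith("os.environ/"):
            continue  # environment reference, the literal key is not in the file
        suffixes.append(key[-length:])
    return suffixes


def stale_suffixes(config_text, observed_suffixes, length=4):
    """Return suffixes seen in provider traffic that are no longer in the config."""
    configured = set(configured_key_suffixes(config_text, length))
    return sorted(set(observed_suffixes) - configured)
```

A non-empty result from stale_suffixes means some key that is no longer in config.yaml is still generating traffic, which is the signal to purge the cache or restart before the old key is revoked.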
The Bigger Picture
Key cache invalidation is Pitfall #3 in my production survival map. There are 4 more deployment pitfalls and 3 hidden cost traps that I documented after 6 months of running this stack:
- 503 on every request after adding a provider — model name mismatch
- Costs 3× higher than expected — fallback chain hits expensive models by default
- Keys rotated but old ones still work ← this one
- Streaming responses cut off mid-token — Nginx/Cloudflare buffering
- New API channels show "insufficient quota" with balance > 0 — weight = 0 by default
Each of these took me 1-2 hours to diagnose in production. The full one-page reference card with all 5 pitfalls, 3 cost traps, a failure decision tree, and a pre-launch security checklist is available here:
👉 AI API Gateway Pitfall Map — $9
It's the page you print and pin next to your monitor — because when your gateway goes down at 2 AM, you won't be reading a 40-page guide.