DEV Community

Paul Twist

Posted on • Originally published at docs.litellm.ai

We Let 40 Engineers Loose With Coding Agents. The Bill Was Brutal.

Your engineering team adopted Claude Code last month. Productivity went up. Then the bill came in.

340% increase.

Nobody budgeted for this. Nobody even knew who spent what.

The math that kills budgets

A single coding agent session makes 50-200 API calls, and with Claude Sonnet 4 each call can send 100K+ tokens of context. One developer running sessions all day burns $50-100.

Scale that to 40 engineers and you hit $20K/month in unexpected AI spend.
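The per-session figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming only input tokens are billed (output ignored), no prompt caching, and Claude Sonnet 4's $3 per million input tokens quoted later in the post:

```python
# Input-token cost of one coding-agent session, using the figures in the text:
# 50-200 calls per session, ~100K tokens of context per call, and Claude
# Sonnet 4 at $3 per million input tokens.
# Assumptions: output tokens are ignored and there is no prompt caching.

INPUT_PRICE_PER_M = 3.0  # USD per million input tokens (Claude Sonnet 4)


def session_cost(calls: int, context_tokens: int = 100_000,
                 price_per_m: float = INPUT_PRICE_PER_M) -> float:
    """USD cost of a session that re-sends its full context on every call."""
    total_tokens = calls * context_tokens
    return total_tokens / 1_000_000 * price_per_m


if __name__ == "__main__":
    for calls in (50, 200):
        print(f"{calls} calls -> ${session_cost(calls):.2f} per session")
```

At the low end that works out to $15 per session, the same average used in "The numbers" below, and four of those a day already lands inside the $50-100 range.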

The root cause: raw API keys. No per-developer budgets. No team caps. No visibility. You find out about the problem when the invoice arrives.

What we did

We put LiteLLM between our coding agents and the LLM providers. Every call flows through the proxy, gets tracked, gets budget-checked. Took maybe 15 minutes to set up.

Per-developer budget keys

Each engineer gets a virtual key with a hard budget cap:

curl -X POST 'http://litellm-proxy:4000/key/generate' \
  -H 'Authorization: Bearer sk-master' \
  -H 'Content-Type: application/json' \
  -d '{
    "key_alias": "alice-claude-code",
    "max_budget": 100,
    "budget_duration": "1mo",
    "models": ["claude-sonnet-4-20250514", "gpt-4.1-mini"],
    "tpm_limit": 1000000,
    "rpm_limit": 100
  }'

$100/month cap. Auto-resets. Rate-limited so a runaway loop can't burn through it in 10 minutes.
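How fast a runaway loop can drain the key depends on the token mix, because output tokens cost more than input. A quick check, assuming LiteLLM's tpm_limit counts input and output tokens together and using Claude Sonnet 4's list prices ($3/M input, $15/M output); the output_share parameter is an assumption you'd tune to your own traffic:

```python
# Worst-case time for a runaway agent to exhaust one key's budget when the
# tokens-per-minute limit is the binding constraint.
# Assumptions (not from LiteLLM docs): tpm_limit counts input + output tokens
# together, and prices are USD per million tokens.

def minutes_to_exhaust(budget_usd: float, tpm_limit: int,
                       input_price_per_m: float, output_price_per_m: float,
                       output_share: float = 0.0) -> float:
    """Minutes until budget_usd is spent if the agent runs at tpm_limit nonstop."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    # Blended price per million tokens, weighted by the assumed input/output mix.
    blended_per_m = ((1 - output_share) * input_price_per_m
                     + output_share * output_price_per_m)
    cost_per_minute = tpm_limit / 1_000_000 * blended_per_m
    return budget_usd / cost_per_minute


if __name__ == "__main__":
    # Key from above: $100 budget, 1M TPM, Claude Sonnet 4 at $3/M in, $15/M out.
    for share in (0.0, 0.05, 0.2):
        mins = minutes_to_exhaust(100, 1_000_000, 3.0, 15.0, share)
        print(f"output share {share:.0%}: ~{mins:.0f} min to burn $100")
```

Under these assumptions, even a 20% output share takes around 19 minutes to exhaust the $100, and coding agents tend to be input-heavy.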

The developer just sets two env vars:

export ANTHROPIC_BASE_URL=http://litellm-proxy:4000
export ANTHROPIC_API_KEY=sk-alice-generated-key

Claude Code doesn't know it's going through a gateway. No SDK changes, no config files, nothing.
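The catch with a zero-config switch is that nothing stops a shell from still holding a raw provider key. Here's a hypothetical preflight check (not part of Claude Code or LiteLLM) you could run before starting a session. It assumes the proxy lives at the litellm-proxy hostname used in these examples, and that raw Anthropic keys start with "sk-ant-" while gateway keys don't:

```python
# Hypothetical preflight check (not part of Claude Code or LiteLLM): confirm
# the environment routes the agent through the gateway with a virtual key.
# Assumptions: the proxy hostname below, and that raw Anthropic keys start
# with "sk-ant-" while LiteLLM virtual keys do not.
import os
from urllib.parse import urlparse

GATEWAY_HOST = "litellm-proxy"  # assumed internal hostname of the proxy


def check_agent_env(env=os.environ) -> list[str]:
    """Return a list of problems; an empty list means the env looks correct."""
    problems = []
    base_url = env.get("ANTHROPIC_BASE_URL", "")
    if urlparse(base_url).hostname != GATEWAY_HOST:
        problems.append(f"ANTHROPIC_BASE_URL does not point at {GATEWAY_HOST}: {base_url!r}")
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key:
        problems.append("ANTHROPIC_API_KEY is not set")
    elif key.startswith("sk-ant-"):
        problems.append("ANTHROPIC_API_KEY looks like a raw Anthropic key, not a gateway key")
    return problems


if __name__ == "__main__":
    issues = check_agent_env()
    for issue in issues:
        print("WARN:", issue)
    if not issues:
        print("OK: agent traffic will go through the gateway")
```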

Team budgets as the second wall

Individual caps are good, but they don't bound the total. 20 developers each spending $90 stay under their own caps and still add up to $1,800, so a team budget sets the ceiling for the whole group:

curl -X POST 'http://litellm-proxy:4000/team/new' \
  -H 'Authorization: Bearer sk-master' \
  -H 'Content-Type: application/json' \
  -d '{
    "team_alias": "backend-eng",
    "max_budget": 2000,
    "budget_duration": "1mo",
    "models": ["claude-sonnet-4-20250514", "gpt-4.1-mini", "gpt-4.1"]
  }'

Budget checks happen at every level: key, team, org. If any limit is hit, the request gets rejected with a clear error. No silent failures.
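The layering is easy to picture as code. This is an illustrative sketch of the idea, not LiteLLM's implementation; the Budget class, BudgetExceeded exception and check_budgets function are hypothetical names:

```python
# Illustrative sketch of layered budget checks (not LiteLLM's implementation):
# a request goes through only if every level it belongs to (key, team, org)
# still has budget left. The class and function names are hypothetical.
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    max_budget: float   # USD cap for the current budget period
    spend: float = 0.0  # USD already spent in this period


class BudgetExceeded(Exception):
    """Raised instead of forwarding the request, so the caller sees a clear error."""


def check_budgets(levels: list[Budget]) -> None:
    """Raise BudgetExceeded for the first level whose spend has reached its cap."""
    for level in levels:
        if level.spend >= level.max_budget:
            raise BudgetExceeded(
                f"{level.name} budget exceeded: "
                f"${level.spend:,.2f} spent of ${level.max_budget:,.2f}"
            )


if __name__ == "__main__":
    alice = Budget("key:alice-claude-code", max_budget=100, spend=90)
    team = Budget("team:backend-eng", max_budget=2000, spend=2000)
    try:
        check_budgets([alice, team])
    except BudgetExceeded as err:
        # Alice is under her own cap, but the team cap still blocks the request.
        print("rejected:", err)
```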

Model access controls

Not every task needs Claude Opus ($15/M input tokens). Most coding agent work (autocomplete, test generation, docs) is Sonnet 4 ($3/M) or GPT-4.1-mini ($0.40/M) territory.

We give junior devs access to cost-effective models only. Senior engineers get the full menu. If an intern's agent tries to call Opus, the request is rejected before any tokens are consumed.
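One way to keep the role split consistent is to generate the /key/generate request bodies from a single mapping. A minimal sketch: the role names, the mapping itself, and the Opus model ID are assumptions, and the model names have to match the model_name entries in your proxy config:

```python
# Hypothetical role-to-model mapping used to build /key/generate payloads.
# Model names must match the model_name entries in your proxy config; the
# Opus ID below is an assumption, as is the role split itself.
ROLE_MODELS = {
    "junior": ["claude-sonnet-4-20250514", "gpt-4.1-mini"],
    "senior": ["claude-sonnet-4-20250514", "gpt-4.1-mini", "gpt-4.1",
               "claude-opus-4-20250514"],
}


def key_request(alias: str, role: str, max_budget: float = 100,
                budget_duration: str = "1mo") -> dict:
    """Build the JSON body for POST /key/generate, restricted to the role's models."""
    if role not in ROLE_MODELS:
        raise ValueError(f"unknown role: {role!r}")
    return {
        "key_alias": alias,
        "max_budget": max_budget,
        "budget_duration": budget_duration,
        "models": list(ROLE_MODELS[role]),  # copy so callers can't mutate the mapping
    }


if __name__ == "__main__":
    import json
    print(json.dumps(key_request("intern-bob-claude-code", "junior"), indent=2))
```

Keeping the allowlist in one place means promoting someone is a one-line change followed by issuing a new key, rather than hand-editing JSON.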

Cost attribution via tags

This is the part that actually made our CFO happy:

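from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy using a developer's virtual key
client = OpenAI(
    base_url="http://litellm-proxy:4000",
    api_key="sk-alice-generated-key",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",  # must be in the key's allowed models
    messages=[{"role": "user", "content": "Refactor this function..."}],
    extra_body={
        # LiteLLM reads tags from request metadata and attributes spend to them
        "metadata": {
            "tags": [
                "project:payments-refactor",
                "team:backend",
                "agent:claude-code"
            ]
        }
    }
)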

Now instead of "AI costs $50K/month" the conversation becomes "the payments team spent $12K on Claude Sonnet for their Q3 refactor, saving 3 weeks of engineering time."
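Once every request carries tags, the per-project view is a simple roll-up. A sketch of that aggregation over made-up records; the record shape (a tags list plus a spend field) is an assumption, since the exact export format of the proxy's spend logs isn't shown here:

```python
# Roll per-request spend up by tag prefix, e.g. spend per "project:" tag.
# The records are made-up placeholders; in practice they would come from the
# proxy's spend logs, whose export format is not shown in this post.
from collections import defaultdict


def spend_by_tag(records: list[dict], prefix: str) -> dict[str, float]:
    """Sum each record's 'spend' under every tag that starts with prefix."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        for tag in rec.get("tags", []):
            if tag.startswith(prefix):
                totals[tag.removeprefix(prefix)] += rec["spend"]
    return dict(totals)


if __name__ == "__main__":
    records = [
        {"tags": ["project:payments-refactor", "team:backend", "agent:claude-code"], "spend": 1.25},
        {"tags": ["project:payments-refactor", "team:backend"], "spend": 0.75},
        {"tags": ["project:search", "team:platform"], "spend": 0.50},
    ]
    print(spend_by_tag(records, "project:"))
    print(spend_by_tag(records, "team:"))
```

Using prefixed tags ("project:", "team:", "agent:") lets one set of records answer several questions without a separate field per dimension.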

The numbers

Without controls, Month 3 of org-wide agent rollout looks like this:

  • 80 developers, 4 sessions/day, $15 avg session cost
  • $105,600/month in AI spend nobody planned for

With per-developer caps ($100/mo) and team budgets ($2,000/mo), you cap exposure at a number you actually chose. Alerts fire at 50% consumption, which for spend spread evenly across the month leaves about 2 weeks to adjust.
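The arithmetic behind those numbers, as a small sketch. It assumes 22 working days a month (that's what makes 80 × 4 × $15 come out to $105,600), and the four-teams-of-20 split in the example is hypothetical:

```python
# Arithmetic behind the rollout numbers above.
# Assumption: 22 working days per month, which is what makes
# 80 devs x 4 sessions x $15 come out to $105,600.
WORKING_DAYS = 22


def uncapped_monthly(devs: int, sessions_per_day: int,
                     avg_session_cost: float, days: int = WORKING_DAYS) -> float:
    """Spend with no limits: every session runs and gets billed."""
    return devs * sessions_per_day * avg_session_cost * days


def capped_monthly(teams: list[tuple[int, float]], per_dev_cap: float) -> float:
    """Worst-case spend with caps: each (dev_count, team_cap) team pays at most
    the smaller of its team cap and its developers' summed individual caps."""
    return sum(min(team_cap, devs * per_dev_cap) for devs, team_cap in teams)


if __name__ == "__main__":
    print(f"uncapped: ${uncapped_monthly(80, 4, 15):,.0f}/month")
    # Hypothetical split: four teams of 20 developers, each with a $2,000 team budget.
    teams = [(20, 2000.0)] * 4
    print(f"capped:   ${capped_monthly(teams, per_dev_cap=100):,.0f}/month")
```

Under that hypothetical split the worst case drops from $105,600 to $8,000 a month. A team larger than 20 developers would hit its $2,000 team cap before the individual caps add up.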

Why not LangSmith Gateway?

LangSmith launched their LLM Gateway recently. Fair comparison:

  • Still in beta. LiteLLM has been in production since 2023.
  • 7 providers vs 100+.
  • Managed only. LiteLLM is self-hosted, your code stays in your VPC.
  • Locked to LangSmith ecosystem. LiteLLM works with any observability stack.

For coding agents processing proprietary source code, the self-hosted part matters a lot.

Getting started

# Start proxy
litellm --config config.yaml

# Create team
curl -X POST 'http://localhost:4000/team/new' \
  -H 'Authorization: Bearer sk-master' \
  -H 'Content-Type: application/json' \
  -d '{"team_alias": "engineering", "max_budget": 5000, "budget_duration": "1mo"}'

# Generate developer key (replace TEAM_ID with the team_id returned by /team/new)
curl -X POST 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-master' \
  -H 'Content-Type: application/json' \
  -d '{"team_id": "TEAM_ID", "key_alias": "dev-key", "max_budget": 100, "budget_duration": "1mo"}'

# Developer sets env var
export ANTHROPIC_BASE_URL=http://litellm-proxy:4000
export ANTHROPIC_API_KEY=sk-generated-key

15 minutes. Every coding agent call gets tracked, budget-checked, and attributed.
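For more than a handful of developers, the TEAM_ID copy-paste step in the curl sequence gets tedious. A stdlib-only sketch that chains the two calls: it assumes the /team/new response carries a "team_id" field and the /key/generate response a "key" field, and the proxy URL and master key are placeholders:

```python
# Onboarding sketch: create a team, read team_id from the /team/new response,
# then issue one budgeted key per developer with /key/generate.
# Assumptions: PROXY and MASTER_KEY are placeholders, and the responses carry
# "team_id" and "key" fields.
import json
import urllib.request

PROXY = "http://localhost:4000"
MASTER_KEY = "sk-master"  # placeholder; load from a secret store in practice


def post(path: str, payload: dict) -> dict:
    """POST JSON to the proxy with the master key and return the parsed response."""
    req = urllib.request.Request(
        PROXY + path,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {MASTER_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def onboard_team(team_alias: str, devs: list[str], team_budget: float = 5000,
                 dev_budget: float = 100, send=post) -> dict[str, str]:
    """Return {developer: generated key}. `send` is injectable for testing."""
    team = send("/team/new", {"team_alias": team_alias, "max_budget": team_budget,
                              "budget_duration": "1mo"})
    keys = {}
    for dev in devs:
        resp = send("/key/generate", {"team_id": team["team_id"],
                                      "key_alias": f"{dev}-key",
                                      "max_budget": dev_budget,
                                      "budget_duration": "1mo"})
        keys[dev] = resp["key"]
    return keys
```

Against a running proxy you'd call something like onboard_team("engineering", ["alice", "bob"]). Passing the HTTP call in as send keeps the chaining logic testable without a live proxy.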

Full walkthrough with screenshots: docs.litellm.ai/blog/coding-agent-spend-control


Ran into similar agent cost problems? Curious what approaches other teams are using.
