DEV Community

Kuldeep Paul

Best Ways to Monitor Claude Code Token Usage and Costs in 2026

Claude Code is quickly becoming a core development tool for engineering teams building AI-powered products. Developers now rely on it daily to write code, debug issues, review pull requests, and automate large parts of the development workflow.

According to Anthropic, the average Claude Code usage cost is around $6 per developer per day, with 90% of users spending under $12 daily when using API pricing. Over a month, this typically translates to $100 to $200 per developer when using Claude Sonnet, although heavy usage with Claude Opus or multi-agent workflows can increase costs significantly.
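The monthly range quoted above follows from simple arithmetic on the daily average; assuming roughly 21 working days per month:

```shell
# Back-of-envelope monthly estimate from the ~$6/day average
# (21 working days per month is an assumption)
daily_cost=6
workdays=21
echo "$((daily_cost * workdays))"  # prints 126, within the $100-200 range
```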

While these costs are manageable for individuals, they become harder to track at the team and organization level. Engineering leaders often lack visibility into questions such as:

  • Which developers are consuming the most tokens?
  • Which models are responsible for the majority of costs?
  • How quickly is the team approaching budget limits?
  • Are inefficient prompts or workflows increasing spending?

Claude Code provides some basic cost visibility, but it is not designed for centralized monitoring across teams. This is where LLM gateways become essential.

An LLM gateway sits between Claude Code and the model provider, capturing every request and response. This allows organizations to track token usage, costs, latency, model selection, and request metadata in real time.

In this guide, we review the best solutions for monitoring Claude Code token usage and spend, focusing on observability depth, cost tracking granularity, and governance capabilities.


Why Teams Need Better Monitoring for Claude Code

Claude Code includes built-in commands such as /cost that show session-level spending. Some developers also use tools like ccusage, which parses local log files to estimate historical token usage.
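For example, ccusage can be run directly with npx against the local logs (invocation based on the tool's published CLI; output depends on your local usage history, so treat this as a usage sketch):

```shell
# Summarize historical Claude Code token usage and estimated cost
# from the logs stored on the local machine
npx ccusage@latest
```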

However, these approaches have important limitations when teams scale.

1. Lack of Centralized Visibility

Claude Code stores usage data locally on each developer's machine inside:

~/.claude/projects/

This means each developer has their own isolated logs. There is no native way to aggregate usage across:

  • multiple developers
  • multiple repositories
  • multiple environments

Engineering managers therefore cannot easily view team-wide Claude usage in a single dashboard.

2. No Real-Time Budget Alerts

Claude Code's cost tools operate after tokens are already consumed. By the time someone runs the /cost command, the spend has already occurred.

Teams cannot:

  • set proactive spending limits
  • receive alerts when thresholds are reached
  • automatically block requests that exceed a budget

Without guardrails, unexpected usage spikes can appear only when the monthly invoice arrives.

3. Missing Cost Attribution

Organizations often want to understand how AI usage maps to internal teams and projects.

Typical questions include:

  • Which team is driving the majority of LLM costs?
  • Which project consumes the most tokens?
  • Which workflows are inefficient?

Claude Code does not support tagging requests with metadata such as team, project, or environment, making cost attribution difficult.

This is why many organizations deploy an LLM gateway layer to monitor and control AI usage.


What an LLM Gateway Does for Claude Code

An LLM gateway acts as an intermediary between Claude Code and the upstream model provider. Instead of sending requests directly to Anthropic, Claude Code sends them through the gateway.

This architecture enables several important capabilities:

  • centralized logging of every request
  • token and cost tracking
  • model-level analytics
  • budget enforcement
  • request tagging with custom metadata
  • integration with observability platforms

The gateway effectively becomes the control plane for AI usage across the organization.

Below are the most effective platforms for monitoring Claude Code usage.


1. Bifrost

Bifrost is a high-performance, open-source AI gateway written in Go that provides extensive observability and governance for LLM usage.

It integrates with Claude Code through a fully compatible Anthropic API endpoint, which allows teams to route traffic through the gateway without changing developer workflows.

Setup requires only two environment variables:

export ANTHROPIC_API_KEY=your-bifrost-virtual-key
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic

Once configured, all Claude Code requests automatically pass through Bifrost.
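A quick smoke test can confirm that traffic flows through the gateway. The request below is a sketch: the /v1/messages path mirrors Anthropic's Messages API behind the base URL above, and <MODEL_ID> is a placeholder for a model your virtual key can access:

```shell
# Send a minimal Messages API request through the local Bifrost endpoint
# (assumes the gateway is running; replace <MODEL_ID> before running)
curl -s http://localhost:8080/anthropic/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "<MODEL_ID>", "max_tokens": 32,
       "messages": [{"role": "user", "content": "ping"}]}'
```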

Detailed Token and Cost Metrics

Bifrost exposes detailed Prometheus metrics for tracking token usage and spending across models and providers.

Examples include:

  • bifrost_input_tokens_total
  • bifrost_output_tokens_total
  • bifrost_cost_total

These metrics allow teams to build dashboards showing:

  • cost per provider
  • cost per model
  • cost per developer
  • cost trends over time

Because metrics are collected asynchronously, they add no noticeable latency to requests.
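Assuming the gateway exposes a standard Prometheus scrape endpoint (the /metrics path and port here are assumptions based on the setup above; check your deployment's configuration), the counters can be inspected directly:

```shell
# List the token and cost counters from the gateway's metrics endpoint
curl -s http://localhost:8080/metrics \
  | grep -E '^bifrost_(input_tokens_total|output_tokens_total|cost_total)'
```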

Real-Time Request Logging

Bifrost logs every request with full metadata, including:

  • model parameters
  • prompt content
  • token counts
  • latency
  • cost per request

Logs can be accessed through:

  • a real-time web dashboard
  • REST APIs
  • WebSocket streams for monitoring tools

Teams can filter logs by model, token usage, cost range, time window, or request status.

Team-Level Cost Attribution

Bifrost introduces Virtual Keys, which allow organizations to issue unique API keys for developers, teams, or projects.

Because every request contains the associated key, usage metrics automatically include identifiers such as:

  • developer
  • team
  • project

This enables accurate cost attribution across the organization.
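For illustration, if per-request logs are exported with a team identifier attached (the CSV schema below is hypothetical), attributing spend becomes a one-liner:

```shell
# Hypothetical per-request export: virtual_key,team,cost_usd
cat > usage.csv <<'EOF'
vk-alice,platform,0.12
vk-bob,platform,0.30
vk-carol,ml,0.25
EOF

# Sum spend per team from the export
awk -F, '{ spend[$2] += $3 } END { for (t in spend) printf "%s %.2f\n", t, spend[t] }' usage.csv | sort
# prints:
#   ml 0.25
#   platform 0.42
```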

Budget Controls and Rate Limits

Bifrost supports hierarchical budget controls that allow organizations to define limits for:

  • developers
  • teams
  • projects

If a spending threshold is reached, the gateway can automatically block additional requests before more cost is incurred.

Rate limits on tokens and requests add another layer of protection.

Observability Integrations

Bifrost integrates with standard monitoring platforms using OpenTelemetry and native connectors.

Supported tools include:

  • Grafana
  • Datadog
  • New Relic
  • Honeycomb

This allows LLM usage metrics to be combined with existing application monitoring systems.

Bifrost is open source and designed for high-throughput deployments.


2. LiteLLM

LiteLLM is a Python-based proxy that routes requests across more than 100 model providers, including Anthropic.

It includes built-in cost tracking capabilities through a virtual key system backed by PostgreSQL.

Key monitoring features include:

  • per-key spend tracking
  • model-level cost analytics
  • OpenTelemetry integrations
  • budget limits for API keys

When a key exceeds its configured budget, LiteLLM can block further requests.

However, teams running high-traffic Claude Code workflows may encounter some limitations.

Because LiteLLM runs on a Python runtime and relies on a database for usage tracking, the architecture introduces additional infrastructure complexity compared to compiled gateway implementations.

Real-time streaming metrics, such as time to first token and inter-token latency, are also not instrumented with the same granularity as in some specialized gateways.


3. Cloudflare AI Gateway

Cloudflare AI Gateway provides a fully managed gateway for routing LLM requests through Cloudflare's network.

The platform includes built-in analytics showing:

  • request counts
  • token usage
  • provider-level cost estimates

To monitor Claude Code traffic, teams configure ANTHROPIC_BASE_URL to point to their Cloudflare gateway endpoint.

This allows requests to be logged and analyzed through Cloudflare's dashboard.
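A minimal configuration sketch, with account and gateway IDs as placeholders (the URL pattern follows Cloudflare's documented gateway endpoint format; verify the exact URL in your Cloudflare dashboard):

```shell
# Point Claude Code at the Cloudflare AI Gateway's Anthropic route
export ANTHROPIC_BASE_URL=https://gateway.ai.cloudflare.com/v1/<ACCOUNT_ID>/<GATEWAY_ID>/anthropic
```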

Cloudflare also supports exact match caching, which can reduce repeated API calls.

However, there are several limitations for organizations that need deeper monitoring capabilities:

  • no per-developer cost attribution
  • no hierarchical budget controls
  • limited custom metric dimensions
  • no self-hosted deployment

Log retention is also capped on lower pricing tiers.

For small teams that prefer a fully managed solution, Cloudflare AI Gateway can provide basic visibility without infrastructure management.


4. Anthropic Console

Teams using Claude via API keys can also monitor usage through the Anthropic Console.

Anthropic provides a Usage and Cost API that reports token consumption and cost data grouped by:

  • model
  • workspace
  • service tier

Reports can be generated at different time intervals, including:

  • one minute
  • one hour
  • one day

This provides a basic level of organizational cost tracking.
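As a sketch, usage reports can be pulled with an organization Admin API key (the endpoint path and query parameters here are based on Anthropic's Admin API documentation and may change; treat this as illustrative and check the current reference):

```shell
# Fetch daily usage buckets grouped by model (requires an Admin API key)
curl -s "https://api.anthropic.com/v1/organizations/usage_report/messages?starting_at=2026-01-01T00:00:00Z&bucket_width=1d&group_by[]=model" \
  -H "x-api-key: $ANTHROPIC_ADMIN_KEY" \
  -H "anthropic-version: 2023-06-01"
```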

However, the Console only captures requests sent directly to Anthropic's API.

If an organization routes traffic through a gateway for features like fallbacks, load balancing, or multi-provider routing, the Console cannot observe those requests.

Additionally, the platform does not provide:

  • custom request metadata
  • Prometheus metrics
  • real-time alerting

For teams on Pro or Max subscriptions, detailed cost data may not be available, since billing is subscription-based rather than usage-based.


Choosing the Right Claude Code Monitoring Solution

The best monitoring approach depends on the size of your team and how Claude Code is used.

For individual developers, the built-in /cost command combined with tools like ccusage provides enough visibility into session-level spending.

For teams using API pricing, the Anthropic Console adds organizational reporting capabilities.

However, organizations that require:

  • centralized monitoring
  • real-time cost tracking
  • team-level cost attribution
  • budget enforcement
  • integration with observability platforms

will typically benefit from deploying a dedicated LLM gateway.

Among the available solutions, Bifrost provides one of the most complete monitoring stacks for Claude Code usage, combining detailed token metrics, cost tracking, governance controls, and integration with standard observability systems.

As AI-assisted development becomes a standard part of engineering workflows, having full visibility into LLM usage and cost drivers will become increasingly important for managing budgets and optimizing developer productivity.
