DEV Community

Kuldeep Paul

Best Ways to Monitor Claude Code Token Usage and Costs in 2026

Claude Code is quickly becoming a core development tool for engineering teams building AI-powered products. Developers now rely on it daily to write code, debug issues, review pull requests, and automate large parts of the development workflow.

According to Anthropic, the average Claude Code usage cost is around $6 per developer per day, with 90% of users spending under $12 daily when using API pricing. Over a month, this typically translates to $100 to $200 per developer when using Claude Sonnet, although heavy usage with Claude Opus or multi-agent workflows can increase costs significantly.
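The monthly range quoted above follows from simple arithmetic on the daily average; assuming roughly 21 working days per month:

```shell
# Back-of-envelope monthly estimate from the ~$6/day average
# (21 working days per month is an assumption)
daily_cost=6
workdays=21
echo "$((daily_cost * workdays))"  # prints 126, within the $100-200 range
```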

While these costs are manageable for individuals, they become harder to track at the team and organization level. Engineering leaders often lack visibility into questions such as:

  • Which developers are consuming the most tokens?
  • Which models are responsible for the majority of costs?
  • How quickly is the team approaching budget limits?
  • Are inefficient prompts or workflows increasing spending?

Claude Code provides some basic cost visibility, but it is not designed for centralized monitoring across teams. This is where LLM gateways become essential.

An LLM gateway sits between Claude Code and the model provider, capturing every request and response. This allows organizations to track token usage, costs, latency, model selection, and request metadata in real time.

In this guide, we review the best solutions for monitoring Claude Code token usage and spend, focusing on observability depth, cost tracking granularity, and governance capabilities.


Why Teams Need Better Monitoring for Claude Code

Claude Code includes built-in commands such as /cost that show session-level spending. Some developers also use tools like ccusage, which parses local log files to estimate historical token usage.
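For example, ccusage can be run directly with npx against the local logs (invocation based on the tool's published CLI; output depends on your local usage history, so treat this as a usage sketch):

```shell
# Summarize historical Claude Code token usage and estimated cost
# from the logs stored on the local machine
npx ccusage@latest
```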

However, these approaches have important limitations when teams scale.

1. Lack of Centralized Visibility

Claude Code stores usage data locally on each developer's machine inside:

~/.claude/projects/

This means each developer has their own isolated logs. There is no native way to aggregate usage across:

  • multiple developers
  • multiple repositories
  • multiple environments

Engineering managers therefore cannot easily view team-wide Claude usage in a single dashboard.

2. No Real-Time Budget Alerts

Claude Code's cost tools operate after tokens are already consumed. By the time someone runs the /cost command, the spend has already occurred.

Teams cannot:

  • set proactive spending limits
  • receive alerts when thresholds are reached
  • automatically block requests that exceed a budget

Without guardrails, unexpected usage spikes can appear only when the monthly invoice arrives.

3. Missing Cost Attribution

Organizations often want to understand how AI usage maps to internal teams and projects.

Typical questions include:

  • Which team is driving the majority of LLM costs?
  • Which project consumes the most tokens?
  • Which workflows are inefficient?

Claude Code does not support tagging requests with metadata such as team, project, or environment, making cost attribution difficult.

This is why many organizations deploy an LLM gateway layer to monitor and control AI usage.


What an LLM Gateway Does for Claude Code

An LLM gateway acts as an intermediary between Claude Code and the upstream model provider. Instead of sending requests directly to Anthropic, Claude Code sends them through the gateway.

This architecture enables several important capabilities:

  • centralized logging of every request
  • token and cost tracking
  • model-level analytics
  • budget enforcement
  • request tagging with custom metadata
  • integration with observability platforms

The gateway effectively becomes the control plane for AI usage across the organization.

Below are the most effective platforms for monitoring Claude Code usage.


1. Bifrost

Bifrost is a high-performance, open-source AI gateway written in Go that provides extensive observability and governance for LLM usage.

It integrates with Claude Code through a fully compatible Anthropic API endpoint, which allows teams to route traffic through the gateway without changing developer workflows.

Setup requires only two environment variables:

export ANTHROPIC_API_KEY=your-bifrost-virtual-key
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic

Once configured, all Claude Code requests automatically pass through Bifrost.
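A quick smoke test can confirm that traffic flows through the gateway. The request below is a sketch: the /v1/messages path mirrors Anthropic's Messages API behind the base URL above, and <MODEL_ID> is a placeholder for a model your virtual key can access:

```shell
# Send a minimal Messages API request through the local Bifrost endpoint
# (assumes the gateway is running; replace <MODEL_ID> before running)
curl -s http://localhost:8080/anthropic/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "<MODEL_ID>", "max_tokens": 32,
       "messages": [{"role": "user", "content": "ping"}]}'
```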

Detailed Token and Cost Metrics

Bifrost exposes detailed Prometheus metrics for tracking token usage and spending across models and providers.

Examples include:

  • bifrost_input_tokens_total
  • bifrost_output_tokens_total
  • bifrost_cost_total

These metrics allow teams to build dashboards showing:

  • cost per provider
  • cost per model
  • cost per developer
  • cost trends over time

Because metrics are collected asynchronously, they add no noticeable latency to requests.
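Assuming the gateway exposes a standard Prometheus scrape endpoint (the /metrics path and port here are assumptions based on the setup above; check your deployment's configuration), the counters can be inspected directly:

```shell
# List the token and cost counters from the gateway's metrics endpoint
curl -s http://localhost:8080/metrics \
  | grep -E '^bifrost_(input_tokens_total|output_tokens_total|cost_total)'
```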

Real-Time Request Logging

Bifrost logs every request with full metadata, including:

  • model parameters
  • prompt content
  • token counts
  • latency
  • cost per request

Logs can be accessed through:

  • a real-time web dashboard
  • REST APIs
  • WebSocket streams for monitoring tools

Teams can filter logs by model, token usage, cost range, time window, or request status.

Team-Level Cost Attribution

Bifrost introduces Virtual Keys, which allow organizations to issue unique API keys for developers, teams, or projects.

Because every request contains the associated key, usage metrics automatically include identifiers such as:

  • developer
  • team
  • project

This enables accurate cost attribution across the organization.
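For illustration, if per-request logs are exported with a team identifier attached (the CSV schema below is hypothetical), attributing spend becomes a one-liner:

```shell
# Hypothetical per-request export: virtual_key,team,cost_usd
cat > usage.csv <<'EOF'
vk-alice,platform,0.12
vk-bob,platform,0.30
vk-carol,ml,0.25
EOF

# Sum spend per team from the export
awk -F, '{ spend[$2] += $3 } END { for (t in spend) printf "%s %.2f\n", t, spend[t] }' usage.csv | sort
# prints:
#   ml 0.25
#   platform 0.42
```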

Budget Controls and Rate Limits

Bifrost supports hierarchical budget controls that allow organizations to define limits for:

  • developers
  • teams
  • projects

If a spending threshold is reached, the gateway can automatically block additional requests before more cost is incurred.

Rate limits on tokens and requests add another layer of protection.

Observability Integrations

Bifrost integrates with standard monitoring platforms using OpenTelemetry and native connectors.

Supported tools include:

  • Grafana
  • Datadog
  • New Relic
  • Honeycomb

This allows LLM usage metrics to be combined with existing application monitoring systems.

Bifrost is open source and designed for high-throughput deployments.


2. LiteLLM

LiteLLM is a Python-based proxy that routes requests across more than 100 model providers, including Anthropic.

It includes built-in cost tracking capabilities through a virtual key system backed by PostgreSQL.

Key monitoring features include:

  • per-key spend tracking
  • model-level cost analytics
  • OpenTelemetry integrations
  • budget limits for API keys

When a key exceeds its configured budget, LiteLLM can block further requests.

However, teams running high-traffic Claude Code workflows may encounter some limitations.

Because LiteLLM runs on a Python runtime and relies on a database for usage tracking, the architecture introduces additional infrastructure complexity compared to compiled gateway implementations.

Real-time streaming metrics, such as time to first token and inter-token latency, are also not instrumented with the same granularity as in some specialized gateways.


3. Cloudflare AI Gateway

Cloudflare AI Gateway provides a fully managed gateway for routing LLM requests through Cloudflare's network.

The platform includes built-in analytics showing:

  • request counts
  • token usage
  • provider-level cost estimates

To monitor Claude Code traffic, teams configure ANTHROPIC_BASE_URL to point to their Cloudflare gateway endpoint.

This allows requests to be logged and analyzed through Cloudflare's dashboard.
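A minimal configuration sketch, with account and gateway IDs as placeholders (the URL pattern follows Cloudflare's documented gateway endpoint format; verify the exact URL in your Cloudflare dashboard):

```shell
# Point Claude Code at the Cloudflare AI Gateway's Anthropic route
export ANTHROPIC_BASE_URL=https://gateway.ai.cloudflare.com/v1/<ACCOUNT_ID>/<GATEWAY_ID>/anthropic
```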

Cloudflare also supports exact match caching, which can reduce repeated API calls.

However, there are several limitations for organizations that need deeper monitoring capabilities:

  • no per-developer cost attribution
  • no hierarchical budget controls
  • limited custom metric dimensions
  • no self-hosted deployment

Log retention is also capped on lower pricing tiers.

For small teams that prefer a fully managed solution, Cloudflare AI Gateway can provide basic visibility without infrastructure management.


4. Anthropic Console

Teams using Claude via API keys can also monitor usage through the Anthropic Console.

Anthropic provides a Usage and Cost API that reports token consumption and cost data grouped by:

  • model
  • workspace
  • service tier

Reports can be generated at different time intervals, including:

  • one minute
  • one hour
  • one day

This provides a basic level of organizational cost tracking.
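As a sketch, usage reports can be pulled with an organization Admin API key (the endpoint path and query parameters here are based on Anthropic's Admin API documentation and may change; treat this as illustrative and check the current reference):

```shell
# Fetch daily usage buckets grouped by model (requires an Admin API key)
curl -s "https://api.anthropic.com/v1/organizations/usage_report/messages?starting_at=2026-01-01T00:00:00Z&bucket_width=1d&group_by[]=model" \
  -H "x-api-key: $ANTHROPIC_ADMIN_KEY" \
  -H "anthropic-version: 2023-06-01"
```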

However, the Console only captures requests sent directly to Anthropic's API.

If an organization routes traffic through a gateway for features like fallbacks, load balancing, or multi-provider routing, the Console cannot observe those requests.

Additionally, the platform does not provide:

  • custom request metadata
  • Prometheus metrics
  • real-time alerting

For teams on Pro or Max subscriptions, detailed cost data may not be available, since billing is subscription-based rather than usage-based.


Choosing the Right Claude Code Monitoring Solution

The best monitoring approach depends on the size of your team and how Claude Code is used.

For individual developers, the built-in /cost command combined with tools like ccusage provides enough visibility into session-level spending.

For teams using API pricing, the Anthropic Console adds organizational reporting capabilities.

However, organizations that require:

  • centralized monitoring
  • real-time cost tracking
  • team-level cost attribution
  • budget enforcement
  • integration with observability platforms

will typically benefit from deploying a dedicated LLM gateway.

Among the available solutions, Bifrost provides one of the most complete monitoring stacks for Claude Code usage, combining detailed token metrics, cost tracking, governance controls, and integration with standard observability systems.

As AI-assisted development becomes a standard part of engineering workflows, having full visibility into LLM usage and cost drivers will become increasingly important for managing budgets and optimizing developer productivity.
