Jensen Huang just told GTC 2026 that every NVIDIA engineer will get a token budget worth half their base salary — $100K-$150K in compute credits. His argument: in the agentic era, your output is capped by your token access, not your working hours.
Which means somebody has to track those tokens. Here's a decorator that does it in three lines of logic.
## The Decorator

```python
import functools
from collections import defaultdict

_token_log = defaultdict(lambda: {"calls": 0, "input": 0, "output": 0})
_session_total = {"input": 0, "output": 0}

ALERT_THRESHOLD = 500_000  # tokens — adjust to your budget

def track_tokens(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)

        # Extract token counts from the response.
        # Anthropic field names shown here — see "Adapt It" below for OpenAI.
        usage = result.usage
        inp, out = usage.input_tokens, usage.output_tokens

        # Log per-function and per-session
        _token_log[fn.__name__]["calls"] += 1
        _token_log[fn.__name__]["input"] += inp
        _token_log[fn.__name__]["output"] += out
        _session_total["input"] += inp
        _session_total["output"] += out

        total = _session_total["input"] + _session_total["output"]
        if total > ALERT_THRESHOLD:
            print(f"⚠️ TOKEN ALERT: {total:,} tokens used — over {ALERT_THRESHOLD:,} limit")
        return result
    return wrapper
```
## Usage

Wrap any function that returns an LLM response:

```python
import anthropic

client = anthropic.Anthropic()

@track_tokens
def summarize(text: str):
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )

@track_tokens
def classify(text: str):
    return client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": f"Classify sentiment: {text}"}],
    )

# Run your agent workflow
summarize("Long document here...")
classify("I love this product")
classify("Terrible experience")
```
Check where your tokens are going:

```python
def token_report():
    print(f"\n{'Function':<20} {'Calls':>6} {'Input':>10} {'Output':>10} {'Cost':>10}")
    print("-" * 60)
    for fn_name, stats in _token_log.items():
        # Claude Sonnet 4.5 pricing: $3/M input, $15/M output
        cost = (stats["input"] / 1e6 * 3) + (stats["output"] / 1e6 * 15)
        print(f"{fn_name:<20} {stats['calls']:>6} {stats['input']:>10,} {stats['output']:>10,} ${cost:>8.4f}")
    total_cost = (_session_total["input"] / 1e6 * 3) + (_session_total["output"] / 1e6 * 15)
    print(f"\n{'TOTAL':<20} {'':>6} {_session_total['input']:>10,} {_session_total['output']:>10,} ${total_cost:>8.4f}")

token_report()
```
## Sample Output

```
Function              Calls      Input     Output       Cost
------------------------------------------------------------
summarize                 1      1,247        312 $  0.0084
classify                  2        418        124 $  0.0031

TOTAL                            1,665        436 $  0.0115
```
And when your agent goes into a loop (they always do):

```
⚠️ TOKEN ALERT: 502,101 tokens used — over 500,000 limit
```
## Why It Works

Both the Anthropic and OpenAI Python SDKs attach a `usage` object to every response. Anthropic (and OpenAI's newer Responses API) name the fields `input_tokens` and `output_tokens`; OpenAI's Chat Completions API calls them `prompt_tokens` and `completion_tokens`. The decorator intercepts the response before passing it through, so your calling code never changes. The `defaultdict` keeps a running tally per function name — no setup, no database, no third-party library.
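If you mix providers in one workflow, one way to make the extraction line provider-agnostic is to fall back across the two naming schemes. This `extract_usage` helper is a sketch, not part of any SDK (the `SimpleNamespace` objects below just mimic response shapes):

```python
from types import SimpleNamespace

def extract_usage(response):
    """Return (input_tokens, output_tokens) from an Anthropic- or OpenAI-style response."""
    usage = response.usage
    inp = getattr(usage, "input_tokens", None)
    out = getattr(usage, "output_tokens", None)
    if inp is None or out is None:
        # Fall back to OpenAI Chat Completions field names
        inp = getattr(usage, "prompt_tokens", 0)
        out = getattr(usage, "completion_tokens", 0)
    return inp, out

# Works with either field naming:
anthropic_style = SimpleNamespace(usage=SimpleNamespace(input_tokens=10, output_tokens=5))
openai_style = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=8, completion_tokens=3))
print(extract_usage(anthropic_style))  # (10, 5)
print(extract_usage(openai_style))    # (8, 3)
```

Swap `usage = result.usage` in the decorator for `inp, out = extract_usage(result)` and the same wrapper covers both providers.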
## Adapt It
Swap the pricing constants for your model. Current rates per million tokens:
| Model | Input | Output |
|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
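If you call several models, the table above can live in code instead of in a comment. The `PRICING` dict and `cost_usd` helper below are hypothetical names for illustration, with rates copied from the table (verify against current price lists before relying on them):

```python
# $ per million tokens: (input_rate, output_rate) — rates from the table above
PRICING = {
    "claude-opus-4-5":   (5.00, 25.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-haiku-4-5":  (1.00, 5.00),
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICING[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(round(cost_usd("claude-sonnet-4-5", 1_247, 312), 4))  # 0.0084
```

To price each function by the model it actually used, you would also need to record the model name in `_token_log`, e.g. keyed on `(fn.__name__, result.model)`.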
For OpenAI Chat Completions responses, change `usage.input_tokens` to `usage.prompt_tokens` and `usage.output_tokens` to `usage.completion_tokens`. That's it.
Gotcha: Streaming responses need extra handling. For Anthropic, usage arrives in the stream events themselves: input tokens on the initial `message_start` event, cumulative output tokens on the final `message_delta` event (or call `get_final_message()` on the SDK's stream helper). For OpenAI Chat Completions, pass `stream_options={"include_usage": True}` to `create()` and read `usage` off the final chunk.
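The Anthropic event-folding logic can be sketched as a small accumulator. This runs against hand-built mock events rather than a live stream, and the event shapes (`message_start` carrying `message.usage.input_tokens`, `message_delta` carrying `usage.output_tokens`) follow my reading of the Anthropic streaming format — check the SDK docs before depending on them:

```python
from types import SimpleNamespace

def usage_from_stream(events):
    """Accumulate (input_tokens, output_tokens) from Anthropic-style stream events."""
    inp, out = 0, 0
    for event in events:
        if event.type == "message_start":
            inp = event.message.usage.input_tokens
        elif event.type == "message_delta":
            out = event.usage.output_tokens  # cumulative count; keep the latest
    return inp, out

# Mock events standing in for a real stream:
mock_events = [
    SimpleNamespace(type="message_start",
                    message=SimpleNamespace(usage=SimpleNamespace(input_tokens=42))),
    SimpleNamespace(type="content_block_delta"),
    SimpleNamespace(type="message_delta", usage=SimpleNamespace(output_tokens=17)),
]
print(usage_from_stream(mock_events))  # (42, 17)
```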