Jensen Huang just told GTC 2026 that every NVIDIA engineer will get a token budget worth half their base salary — $100K-$150K in compute credits. His argument: in the agentic era, your output is capped by your token access, not your working hours.
Which means somebody has to track those tokens. Here's a decorator that does it in three lines of logic.
## The Decorator

```python
import functools
from collections import defaultdict

_token_log = defaultdict(lambda: {"calls": 0, "input": 0, "output": 0})
_session_total = {"input": 0, "output": 0}

ALERT_THRESHOLD = 500_000  # tokens — adjust to your budget

def track_tokens(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)

        # Extract token counts from the response.
        # Anthropic field names shown here — see "Adapt It" below for OpenAI.
        usage = result.usage
        inp, out = usage.input_tokens, usage.output_tokens

        # Log per-function and per-session
        _token_log[fn.__name__]["calls"] += 1
        _token_log[fn.__name__]["input"] += inp
        _token_log[fn.__name__]["output"] += out
        _session_total["input"] += inp
        _session_total["output"] += out

        total = _session_total["input"] + _session_total["output"]
        if total > ALERT_THRESHOLD:
            print(f"⚠️ TOKEN ALERT: {total:,} tokens used — over {ALERT_THRESHOLD:,} limit")
        return result
    return wrapper
```
## Usage

Wrap any function that returns an LLM response:

```python
import anthropic

client = anthropic.Anthropic()

@track_tokens
def summarize(text: str):
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )

@track_tokens
def classify(text: str):
    return client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": f"Classify sentiment: {text}"}],
    )

# Run your agent workflow
summarize("Long document here...")
classify("I love this product")
classify("Terrible experience")
```
Check where your tokens are going:

```python
def token_report():
    print(f"\n{'Function':<20} {'Calls':>6} {'Input':>10} {'Output':>10} {'Cost':>10}")
    print("-" * 60)
    for fn_name, stats in _token_log.items():
        # Claude Sonnet 4.5 pricing: $3/M input, $15/M output
        cost = (stats["input"] / 1e6 * 3) + (stats["output"] / 1e6 * 15)
        print(f"{fn_name:<20} {stats['calls']:>6} {stats['input']:>10,} {stats['output']:>10,} ${cost:>8.4f}")
    total_cost = (_session_total["input"] / 1e6 * 3) + (_session_total["output"] / 1e6 * 15)
    print(f"\n{'TOTAL':<20} {'':>6} {_session_total['input']:>10,} {_session_total['output']:>10,} ${total_cost:>8.4f}")

token_report()
```
## Sample Output

```
Function              Calls      Input     Output       Cost
------------------------------------------------------------
summarize                 1      1,247        312 $  0.0084
classify                  2        418        124 $  0.0031

TOTAL                            1,665        436 $  0.0115
```
And when your agent goes into a loop (they always do):

```
⚠️ TOKEN ALERT: 502,101 tokens used — over 500,000 limit
```
## Why It Works

Both the Anthropic and OpenAI Python SDKs attach a `usage` object to every response. Anthropic (and OpenAI's newer Responses API) name the fields `input_tokens` and `output_tokens`; OpenAI's Chat Completions API calls them `prompt_tokens` and `completion_tokens`. The decorator intercepts the response before passing it through, so your calling code never changes. The `defaultdict` keeps a running tally per function name — no setup, no database, no third-party library.
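If you mix providers in one workflow, one way to make the extraction line provider-agnostic is to fall back across the two naming schemes. This `extract_usage` helper is a sketch, not part of any SDK (the `SimpleNamespace` objects below just mimic response shapes):

```python
from types import SimpleNamespace

def extract_usage(response):
    """Return (input_tokens, output_tokens) from an Anthropic- or OpenAI-style response."""
    usage = response.usage
    inp = getattr(usage, "input_tokens", None)
    out = getattr(usage, "output_tokens", None)
    if inp is None or out is None:
        # Fall back to OpenAI Chat Completions field names
        inp = getattr(usage, "prompt_tokens", 0)
        out = getattr(usage, "completion_tokens", 0)
    return inp, out

# Works with either field naming:
anthropic_style = SimpleNamespace(usage=SimpleNamespace(input_tokens=10, output_tokens=5))
openai_style = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=8, completion_tokens=3))
print(extract_usage(anthropic_style))  # (10, 5)
print(extract_usage(openai_style))    # (8, 3)
```

Swap `usage = result.usage` in the decorator for `inp, out = extract_usage(result)` and the same wrapper covers both providers.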
## Adapt It
Swap the pricing constants for your model. Current rates per million tokens:
| Model | Input | Output |
|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
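If you call several models, the table above can live in code instead of in a comment. The `PRICING` dict and `cost_usd` helper below are hypothetical names for illustration, with rates copied from the table (verify against current price lists before relying on them):

```python
# $ per million tokens: (input_rate, output_rate) — rates from the table above
PRICING = {
    "claude-opus-4-5":   (5.00, 25.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-haiku-4-5":  (1.00, 5.00),
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICING[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(round(cost_usd("claude-sonnet-4-5", 1_247, 312), 4))  # 0.0084
```

To price each function by the model it actually used, you would also need to record the model name in `_token_log`, e.g. keyed on `(fn.__name__, result.model)`.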
For OpenAI Chat Completions responses, change `usage.input_tokens` to `usage.prompt_tokens` and `usage.output_tokens` to `usage.completion_tokens`. That's it.
Gotcha: Streaming responses need extra handling. For Anthropic, usage arrives in the stream events themselves: input tokens on the initial `message_start` event, cumulative output tokens on the final `message_delta` event (or call `get_final_message()` on the SDK's stream helper). For OpenAI Chat Completions, pass `stream_options={"include_usage": True}` to `create()` and read `usage` off the final chunk.
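The Anthropic event-folding logic can be sketched as a small accumulator. This runs against hand-built mock events rather than a live stream, and the event shapes (`message_start` carrying `message.usage.input_tokens`, `message_delta` carrying `usage.output_tokens`) follow my reading of the Anthropic streaming format — check the SDK docs before depending on them:

```python
from types import SimpleNamespace

def usage_from_stream(events):
    """Accumulate (input_tokens, output_tokens) from Anthropic-style stream events."""
    inp, out = 0, 0
    for event in events:
        if event.type == "message_start":
            inp = event.message.usage.input_tokens
        elif event.type == "message_delta":
            out = event.usage.output_tokens  # cumulative count; keep the latest
    return inp, out

# Mock events standing in for a real stream:
mock_events = [
    SimpleNamespace(type="message_start",
                    message=SimpleNamespace(usage=SimpleNamespace(input_tokens=42))),
    SimpleNamespace(type="content_block_delta"),
    SimpleNamespace(type="message_delta", usage=SimpleNamespace(output_tokens=17)),
]
print(usage_from_stream(mock_events))  # (42, 17)
```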