Strands Agents provides native telemetry and cost tracking out of the box. Stop writing custom token counters.
Building AI agents is easy. Deploying them to production is where most teams hit a wall.
One of the first questions from finance: "How much will this cost per request?"
Most agent frameworks make you build your own token counter. Strands Agents gives you one.
## The Problem with Custom Token Counting
Every AI application needs cost monitoring. But tracking tokens across:
- Multiple model calls
- Tool invocations
- Prompt caching
- Multi-agent workflows
...requires custom infrastructure most teams rebuild from scratch.
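To make the contrast concrete, here's a minimal sketch of the kind of hand-rolled counter teams end up maintaining. Everything in it (`TokenLedger`, the response shape it parses) is hypothetical:

```python
from collections import defaultdict

# Hypothetical DIY tracker: glue code you maintain yourself when the
# framework doesn't report usage.
class TokenLedger:
    def __init__(self):
        self.usage = defaultdict(int)

    def record(self, provider_response: dict):
        # Every provider shapes its usage payload differently, so each
        # integration needs its own parsing branch like this one.
        usage = provider_response.get("usage", {})
        self.usage["inputTokens"] += usage.get("input_tokens", 0)
        self.usage["outputTokens"] += usage.get("output_tokens", 0)
        self.usage["totalTokens"] = self.usage["inputTokens"] + self.usage["outputTokens"]
```

Multiply that by every provider, every cache path, and every agent in a workflow, and it becomes real infrastructure.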
## Native Telemetry in Strands Agents
Strands Agents includes production-grade telemetry by default:
```python
from strands import Agent
from strands_tools import calculator

# Create an agent with tools
agent = Agent(tools=[calculator])

# Invoke the agent with a prompt and get an AgentResult
result = agent("What is the square root of 144?")

# Access metrics through the AgentResult
print(f"Total tokens: {result.metrics.accumulated_usage['totalTokens']}")
print(f"Execution time: {sum(result.metrics.cycle_durations):.2f} seconds")
print(f"Tools used: {list(result.metrics.tool_metrics.keys())}")

# Cache metrics (when available)
if 'cacheReadInputTokens' in result.metrics.accumulated_usage:
    print(f"Cache read tokens: {result.metrics.accumulated_usage['cacheReadInputTokens']}")
if 'cacheWriteInputTokens' in result.metrics.accumulated_usage:
    print(f"Cache write tokens: {result.metrics.accumulated_usage['cacheWriteInputTokens']}")
```
No configuration. No custom code. It just works.
## What You Get
Every AgentResult includes:
| Metric | Description |
|---|---|
| `inputTokens` | Tokens sent to the model |
| `outputTokens` | Tokens generated by the model |
| `totalTokens` | Total tokens (input + output) |
| `cacheReadInputTokens` | Tokens read from cache (Bedrock prompt caching) |
| `cacheWriteInputTokens` | Tokens written to cache |
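Token counts convert directly to dollars once you know your model's rates. A quick sketch; the per-1K prices below are placeholders, not real pricing:

```python
# Placeholder rates (USD per 1K tokens): substitute your model's actual pricing
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015

usage = result.metrics.accumulated_usage
cost = (usage['inputTokens'] / 1000) * INPUT_PRICE_PER_1K \
     + (usage['outputTokens'] / 1000) * OUTPUT_PRICE_PER_1K
print(f"Estimated cost: ${cost:.4f}")
```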
## Multi-Agent Token Tracking
For multi-agent systems (executor → validator → critic), aggregate metrics across all agents:
```python
from strands.multiagent import Swarm

# executor, validator, and critic are Agent instances defined elsewhere
swarm = Swarm([executor, validator, critic])
result = swarm("Query")

# Sum token usage across every agent in the swarm
total_tokens = 0
for node_result in result.results.values():
    usage = node_result.result.metrics.accumulated_usage
    total_tokens += usage['totalTokens']

print(f"Total usage across all agents: {total_tokens} tokens")
```
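The same loop also gives a per-agent breakdown, useful for spotting which role dominates spend (assuming the keys of the `results` mapping above are the node names):

```python
# Per-agent breakdown: one entry per node in the swarm
for name, node_result in result.results.items():
    usage = node_result.result.metrics.accumulated_usage
    print(f"{name}: {usage['totalTokens']} tokens")
```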
## Per-Cycle Tracking
For agents that run multiple reasoning cycles, track tokens per cycle:
```python
from strands import Agent
from strands_tools import calculator

agent = Agent(tools=[calculator])

# First invocation
result1 = agent("What is 5 + 3?")

# Second invocation
result2 = agent("What is the square root of 144?")

# Access metrics for the latest invocation
latest_invocation = result2.metrics.latest_agent_invocation
cycles = latest_invocation.cycles
usage = latest_invocation.usage

# Or access all invocations
for invocation in result2.metrics.agent_invocations:
    print(f"Invocation usage: {invocation.usage}")
    for cycle in invocation.cycles:
        print(f"  Cycle {cycle.event_loop_cycle_id}: {cycle.usage}")

# Or print the summary (includes all invocations)
print(result2.metrics.get_summary())
```
For a complete list of attributes and their types, see the EventLoopMetrics API reference.
## Why This Matters
Cost visibility is the difference between a prototype and production AI.
With Strands telemetry:
- ✅ Budget AI workloads before deployment
- ✅ Identify expensive queries in production (see the sketch below)
- ✅ Optimize prompts with real token data
- ✅ Track prompt caching savings
All without writing a single line of telemetry code.
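And when you do want to act on the numbers, say to flag expensive queries, it takes a few lines on top of the built-in metrics rather than a telemetry stack. The 10,000-token threshold here is an arbitrary example:

```python
TOKEN_BUDGET = 10_000  # arbitrary per-request threshold, tune to your workload

result = agent("Summarize the quarterly report in detail.")
total = result.metrics.accumulated_usage['totalTokens']
if total > TOKEN_BUDGET:
    print(f"WARNING: request used {total} tokens (budget: {TOKEN_BUDGET})")
```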
## Works with All Model Providers
Token tracking works regardless of your model provider:
- Amazon Bedrock (Claude, Llama, Mistral)
- OpenAI (GPT-4, GPT-3.5)
- Anthropic API
- Ollama (local models)
Same API, same metrics, zero config changes.
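As a sketch of what that looks like in practice: swap the model object and leave the metrics code alone. The provider class name and import path below are my reading of the Strands provider docs; verify them against the docs for your version:

```python
from strands import Agent
from strands.models import BedrockModel  # assumed import path; may vary by version

# Pick a provider; nothing downstream changes.
agent = Agent(model=BedrockModel(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0"))

result = agent("What is the square root of 144?")
# Same keys regardless of provider
print(result.metrics.accumulated_usage['totalTokens'])
```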
## Try It
```bash
pip install strands-agents
```
Full documentation: strandsagents.com/docs/user-guide/concepts/agents/
Thanks!
