Strands Agents provides native telemetry and cost tracking out of the box. Stop writing custom token counters.
Building AI agents is easy. Deploying them to production is where most teams hit a wall.
One of the first questions from finance: "How much will this cost per request?"
Most agent frameworks make you build your own token counter. Strands Agents gives you one.
## The Problem with Custom Token Counting
Every AI application needs cost monitoring. But tracking tokens across:
- Multiple model calls
- Tool invocations
- Prompt caching
- Multi-agent workflows
...requires custom infrastructure most teams rebuild from scratch.
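To make the contrast concrete, here's a minimal sketch of the kind of hand-rolled counter teams end up maintaining. Everything in it (`TokenLedger`, the response shape it parses) is hypothetical:

```python
from collections import defaultdict

# Hypothetical DIY tracker: glue code you maintain yourself when the
# framework doesn't report usage.
class TokenLedger:
    def __init__(self):
        self.usage = defaultdict(int)

    def record(self, provider_response: dict):
        # Every provider shapes its usage payload differently, so each
        # integration needs its own parsing branch like this one.
        usage = provider_response.get("usage", {})
        self.usage["inputTokens"] += usage.get("input_tokens", 0)
        self.usage["outputTokens"] += usage.get("output_tokens", 0)
        self.usage["totalTokens"] = self.usage["inputTokens"] + self.usage["outputTokens"]
```

Multiply that by every provider, every cache path, and every agent in a workflow, and it becomes real infrastructure.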
## Native Telemetry in Strands Agents
Strands Agents includes production-grade telemetry by default:
```python
from strands import Agent
from strands_tools import calculator

# Create an agent with tools
agent = Agent(tools=[calculator])

# Invoke the agent with a prompt and get an AgentResult
result = agent("What is the square root of 144?")

# Access metrics through the AgentResult
print(f"Total tokens: {result.metrics.accumulated_usage['totalTokens']}")
print(f"Execution time: {sum(result.metrics.cycle_durations):.2f} seconds")
print(f"Tools used: {list(result.metrics.tool_metrics.keys())}")

# Cache metrics (when available)
if 'cacheReadInputTokens' in result.metrics.accumulated_usage:
    print(f"Cache read tokens: {result.metrics.accumulated_usage['cacheReadInputTokens']}")
if 'cacheWriteInputTokens' in result.metrics.accumulated_usage:
    print(f"Cache write tokens: {result.metrics.accumulated_usage['cacheWriteInputTokens']}")
```
No configuration. No custom code. It just works.
## What You Get
Every AgentResult includes:
| Metric | Description |
|---|---|
| `inputTokens` | Tokens sent to the model |
| `outputTokens` | Tokens generated by the model |
| `totalTokens` | Total tokens (input + output) |
| `cacheReadInputTokens` | Tokens read from cache (Bedrock prompt caching) |
| `cacheWriteInputTokens` | Tokens written to cache |
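Token counts convert directly to dollars once you know your model's rates. A quick sketch; the per-1K prices below are placeholders, not real pricing:

```python
# Placeholder rates (USD per 1K tokens): substitute your model's actual pricing
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015

usage = result.metrics.accumulated_usage
cost = (usage['inputTokens'] / 1000) * INPUT_PRICE_PER_1K \
     + (usage['outputTokens'] / 1000) * OUTPUT_PRICE_PER_1K
print(f"Estimated cost: ${cost:.4f}")
```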
## Multi-Agent Token Tracking
For multi-agent systems (executor → validator → critic), aggregate metrics across all agents:
```python
from strands.multiagent import Swarm

# executor, validator, and critic are Agent instances defined elsewhere
swarm = Swarm([executor, validator, critic])
result = swarm("Query")

# Sum token usage across every agent in the swarm
total_tokens = 0
for node_result in result.results.values():
    usage = node_result.result.metrics.accumulated_usage
    total_tokens += usage['totalTokens']

print(f"Total usage across all agents: {total_tokens} tokens")
```
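The same loop also gives a per-agent breakdown, useful for spotting which role dominates spend (assuming the keys of the `results` mapping above are the node names):

```python
# Per-agent breakdown: one entry per node in the swarm
for name, node_result in result.results.items():
    usage = node_result.result.metrics.accumulated_usage
    print(f"{name}: {usage['totalTokens']} tokens")
```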
## Per-Cycle Tracking
For agents that run multiple reasoning cycles, track tokens per cycle:
```python
from strands import Agent
from strands_tools import calculator

agent = Agent(tools=[calculator])

# First invocation
result1 = agent("What is 5 + 3?")

# Second invocation
result2 = agent("What is the square root of 144?")

# Access metrics for the latest invocation
latest_invocation = result2.metrics.latest_agent_invocation
cycles = latest_invocation.cycles
usage = latest_invocation.usage

# Or access all invocations
for invocation in result2.metrics.agent_invocations:
    print(f"Invocation usage: {invocation.usage}")
    for cycle in invocation.cycles:
        print(f"  Cycle {cycle.event_loop_cycle_id}: {cycle.usage}")

# Or print the summary (includes all invocations)
print(result2.metrics.get_summary())
```
For a complete list of attributes and their types, see the EventLoopMetrics API reference.
## Why This Matters
Cost visibility is the difference between a prototype and production AI.
With Strands telemetry:
- ✅ Budget AI workloads before deployment
- ✅ Identify expensive queries in production (see the sketch below)
- ✅ Optimize prompts with real token data
- ✅ Track prompt caching savings
All without writing a single line of telemetry code.
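And when you do want to act on the numbers, say to flag expensive queries, it takes a few lines on top of the built-in metrics rather than a telemetry stack. The 10,000-token threshold here is an arbitrary example:

```python
TOKEN_BUDGET = 10_000  # arbitrary per-request threshold, tune to your workload

result = agent("Summarize the quarterly report in detail.")
total = result.metrics.accumulated_usage['totalTokens']
if total > TOKEN_BUDGET:
    print(f"WARNING: request used {total} tokens (budget: {TOKEN_BUDGET})")
```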
## Works with All Model Providers
Token tracking works regardless of your model provider:
- Amazon Bedrock (Claude, Llama, Mistral)
- OpenAI (GPT-4, GPT-3.5)
- Anthropic API
- Ollama (local models)
Same API, same metrics, zero config changes.
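As a sketch of what that looks like in practice: swap the model object and leave the metrics code alone. The provider class name and import path below are my reading of the Strands provider docs; verify them against the docs for your version:

```python
from strands import Agent
from strands.models import BedrockModel  # assumed import path; may vary by version

# Pick a provider; nothing downstream changes.
agent = Agent(model=BedrockModel(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0"))

result = agent("What is the square root of 144?")
# Same keys regardless of provider
print(result.metrics.accumulated_usage['totalTokens'])
```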
## Try It
```bash
pip install strands-agents
```
Full documentation: strandsagents.com/docs/user-guide/concepts/agents/
Thanks!
