I’ve been working on production AI agents recently, and something became obvious very quickly:
We have great tools for tracing execution…
But almost nothing for tracing cost.
- You can see prompt logs.
- You can see model outputs.
- You can debug chains.
But when your LLM bill spikes, you’re left guessing.
- Which agent caused it?
- Which model was overused?
- Where should you optimize?
That gap pushed me to start building a cost observability layer specifically for agent systems.
The goal isn’t just analytics - it’s attribution:
- Cost per agent
- Cost per workflow
- Cost per tool invocation
- Real-time token burn rates
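To make the attribution idea concrete, here's a minimal sketch of what such a layer might track internally. All names (`CostTracker`, `PRICE_PER_1K`, the sample models and prices) are hypothetical illustrations, not the actual system or real vendor pricing:

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical per-1K-token prices; real pricing varies by model and vendor.
PRICE_PER_1K = {"gpt-4o": 0.005, "gpt-4o-mini": 0.0003}

@dataclass
class CostTracker:
    # (agent, workflow, tool) -> accumulated USD cost, for attribution queries
    costs: dict = field(default_factory=lambda: defaultdict(float))
    # (timestamp, tokens) events, kept for burn-rate calculation
    events: list = field(default_factory=list)

    def record(self, agent, workflow, tool, model, tokens, now=None):
        """Attribute one LLM call's cost to an (agent, workflow, tool) triple."""
        cost = tokens / 1000 * PRICE_PER_1K[model]
        self.costs[(agent, workflow, tool)] += cost
        self.events.append((now if now is not None else time.time(), tokens))
        return cost

    def cost_per_agent(self, agent):
        """Sum cost across all workflows/tools run by one agent."""
        return sum(c for (a, _, _), c in self.costs.items() if a == agent)

    def burn_rate(self, window_s=60, now=None):
        """Tokens consumed per second over the trailing window."""
        now = now if now is not None else time.time()
        recent = [t for ts, t in self.events if now - ts <= window_s]
        return sum(recent) / window_s
```

The interesting part isn't the bookkeeping — it's choosing the attribution key. Keying on `(agent, workflow, tool)` lets the same ledger answer "which agent spiked the bill?" and "which tool call is the expensive one?" without separate pipelines.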
Still early, but architecturally it’s been one of the more interesting infra problems I’ve worked on in AI so far.