Three months ago I was debugging a LangGraph agent that kept failing in production. Token costs were spiking, I had no idea which tool call was causing it, and every time I tried to reproduce the bug the agent did something completely different. That's when I realized: AI agents need their own observability tooling. So I built it.
The Problem with Current Tools
Web observability tools like Datadog and New Relic are great — for web services. But AI agents are fundamentally different. They run for minutes or hours, not milliseconds. Their costs are variable and per-token. They're non-deterministic, so you can't reproduce failures the same way. And they process sensitive data — full conversation text — that needs to be handled carefully.
What AgentStack Does
AgentStack is a free, self-hostable observability platform built specifically for AI agents. The entire SDK surface is one decorator — @observe — and that's intentional. I didn't want developers to rewrite their agents or learn a new paradigm. You instrument what you already have, and AgentStack does the rest.
The moment you add @observe to your agent function, every execution starts producing full structured traces — arguments, return values, timing, exceptions, token counts, and cost. It captures sync functions, async functions, and deeply nested call chains automatically. Agent calls a tool, tool calls an LLM, LLM calls another tool — AgentStack links all of it into a clean parent-child span tree.
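To make the span tree concrete, here is a minimal sketch of how a decorator like this could link nested calls. It is not AgentStack's actual implementation: the `SPANS` list, the span field names, and the example `search_tool` and `agent` functions are all assumptions made for illustration. The idea is to keep the current span ID in a `contextvars.ContextVar`, so each call can record its parent whether it is sync or async.

```python
import asyncio
import contextvars
import functools
import inspect
import time
import uuid

# Hypothetical in-memory sink; a real SDK would send spans to a collector.
SPANS = []
_current_span_id = contextvars.ContextVar("current_span_id", default=None)


def observe(fn):
    """Record one span per call and link it to the enclosing span, if any."""

    def _start(args, kwargs):
        span = {
            "id": uuid.uuid4().hex,
            "parent_id": _current_span_id.get(),  # None for a root span
            "name": fn.__name__,
            "args": repr(args),
            "kwargs": repr(kwargs),
            "start": time.perf_counter(),
        }
        # Make this span the parent of anything called inside it.
        token = _current_span_id.set(span["id"])
        return span, token

    def _finish(span, token, result=None, error=None):
        span["duration_s"] = time.perf_counter() - span.pop("start")
        span["result"] = repr(result)
        span["error"] = repr(error) if error is not None else None
        _current_span_id.reset(token)  # restore the previous parent
        SPANS.append(span)

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            span, token = _start(args, kwargs)
            try:
                result = await fn(*args, **kwargs)
            except Exception as exc:
                _finish(span, token, error=exc)
                raise
            _finish(span, token, result=result)
            return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        span, token = _start(args, kwargs)
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            _finish(span, token, error=exc)
            raise
        _finish(span, token, result=result)
        return result
    return sync_wrapper


@observe
def search_tool(query):
    return f"results for {query}"


@observe
async def agent(question):
    # A sync tool called from an async agent still gets the right parent.
    return search_tool(question)


asyncio.run(agent("weather"))
```

After the run, `SPANS` holds two entries, and the `search_tool` span's `parent_id` equals the `agent` span's `id`. Token counts and cost are left out here because they depend on the LLM client's response format.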
The Time Machine is the feature I'm most proud of. AI agents are non-deterministic — you can't just run the same input again and expect the same bug to appear. Time Machine lets you step through any past execution, span by span, and see exactly what every LLM returned, what every tool did, and which decision path the agent took. It's like a debugger, but for agent runs that already happened.
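As a rough illustration of stepping through a past run, the sketch below walks a list of recorded spans in execution order, parents before children. The span fields (`id`, `parent_id`, `start`, `output`) and the sample data are hypothetical, and this is only one plausible way to order a replay, not a description of how Time Machine is built.

```python
from collections import defaultdict

# Hypothetical spans recorded from one past run (field names are assumptions).
recorded = [
    {"id": "a", "parent_id": None, "name": "agent", "start": 0.0, "output": "final answer"},
    {"id": "c", "parent_id": "a", "name": "search_tool", "start": 0.9, "output": "3 hits"},
    {"id": "b", "parent_id": "a", "name": "llm_plan", "start": 0.1, "output": "call search"},
    {"id": "d", "parent_id": "c", "name": "llm_summarize", "start": 1.2, "output": "summary"},
]


def replay(spans):
    """Yield (depth, span) depth-first, with siblings ordered by start time."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start"])

    def walk(parent_id, depth):
        for span in children[parent_id]:
            yield depth, span
            yield from walk(span["id"], depth + 1)

    yield from walk(None, 0)


steps = [(depth, span["name"]) for depth, span in replay(recorded)]
for depth, span in replay(recorded):
    # Indent by depth so the decision path reads like a call tree.
    print("  " * depth + f"{span['name']} -> {span['output']}")
```

Because the replay reads stored outputs instead of calling the model again, it shows what actually happened in that run, which is the point when the agent is non-deterministic.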
The Security Engine runs in real time. Every span is analyzed for prompt injection patterns, PII leakage, token explosions, and anomalous latency. If something suspicious happens, an alert surfaces in the dashboard within seconds. And before any span ever hits storage, AgentStack automatically scrubs SSNs, credit card numbers, emails, phone numbers, and API keys — no configuration required.
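For a sense of what scrubbing before storage can look like, here is a simplified regex-based sketch. The patterns, the placeholder labels, and the `sk-` key prefix are assumptions for illustration, not AgentStack's real rules. Production-grade detection usually needs more than this, such as Luhn validation for card numbers and international phone formats.

```python
import re

# Illustrative patterns only; order matters, more specific patterns go first.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # 13 to 16 digits, optionally separated by single spaces or dashes.
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    # US-style phone numbers such as 555-867-5309.
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    # Assumed key format: "sk-" followed by at least 16 alphanumerics.
    ("API_KEY", re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")),
]


def scrub(text):
    """Replace each detected value with a bracketed placeholder label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


raw = (
    "Contact jane.doe@example.com, SSN 123-45-6789, "
    "card 4111 1111 1111 1111, phone 555-867-5309, "
    "key sk-abcdefghijklmnop1234"
)
clean = scrub(raw)
print(clean)
```

Running the scrubber on span payloads before they are written means the raw values never reach the database, which is easier to reason about than redacting at query time.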
Cost Analytics is the feature I needed most myself. When you're running GPT-4 agents in production, costs can spiral quickly and silently. AgentStack tracks per-model token usage and calculates the USD cost of every span, with time-series charts broken down by hour, day, or week across GPT-4, Claude, Gemini, and any other provider.
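The arithmetic behind per-span cost is simple: multiply input and output token counts by a per-model price and sum. The sketch below shows that calculation under stated assumptions. The model names and prices are placeholders, not real provider pricing, and the `Usage` type and function names are hypothetical.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1,000,000 tokens. NOT real provider pricing.
PRICES_PER_MILLION = {
    "model-a": {"input": 10.0, "output": 30.0},
    "model-b": {"input": 3.0, "output": 15.0},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def span_cost_usd(usage):
    """Cost of one span: tokens times the per-token price for its model."""
    price = PRICES_PER_MILLION[usage.model]
    return (
        usage.input_tokens * price["input"]
        + usage.output_tokens * price["output"]
    ) / 1_000_000


def cost_by_model(usages):
    """Sum span costs per model, the basis for a per-model breakdown."""
    totals = {}
    for usage in usages:
        totals[usage.model] = totals.get(usage.model, 0.0) + span_cost_usd(usage)
    return totals


usages = [
    Usage("model-a", input_tokens=1000, output_tokens=500),
    Usage("model-b", input_tokens=2000, output_tokens=1000),
]
totals = cost_by_model(usages)
```

Input and output tokens are priced separately because providers typically charge different rates for each. Grouping the same per-span costs by timestamp instead of by model would give the hourly, daily, or weekly view.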
The whole thing is fully self-hostable with one Docker Compose command. Redis, ClickHouse, the collector, the API, the workers, and the React dashboard all spin up together. Your traces never leave your infrastructure. No SaaS subscription. No data sharing. Just your agents, fully visible, on your own servers.
If this resonates with you, I'd love a star on GitHub — it's my first open-source launch and every star genuinely helps others find the project.
👉 https://github.com/Ramakrishna1967/AgentStack