I spent the last three months building a production LLM app.
I tried every major observability tool. None of them fit perfectly —
so I built my own.
Here's my honest take on each one.
## LangSmith
**What it gets right:** Deep LangChain integration. If you're all-in
on LangGraph, it's seamless.
**What it gets wrong:** Everything else.
- **Pricing is punitive.** $39/seat/month before you log a single trace. A team of 5 = $195/month just to get started.
- **14-day retention by default.** Want 400 days? That's $5.00 per 1,000 traces, 10x the base price, and there's no middle tier.
- **US data only** unless you're on an Enterprise plan. For EU teams: good luck with GDPR.
- **Vendor lock-in.** It's built for LangChain. Use anything else and you're fighting the tool.
## Langfuse
**What it gets right:** Open source, self-hostable, framework-agnostic.
The pricing is transparent. The community is solid (25k+ GitHub stars).
**What it gets wrong:**
- **No MCP support.** If you're building with Claude and MCP tools, you're blind.
- **Alerting is weak.** For production monitoring, most teams end up piping to Datadog or Grafana anyway.
- **15% latency overhead** in benchmarks. Not a dealbreaker, but noticeable for latency-sensitive apps.
Langfuse is genuinely good. It's the one I'd recommend to most teams —
except for the MCP gap.
## Helicone
**What it gets right:** Incredibly simple setup. It's literally a proxy:
one line change and you're logging.
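That "one line change" is the proxy pattern: you point your existing client at the proxy's URL instead of the provider's. A sketch of what that looks like with the OpenAI SDK — the gateway URL and header name here are drawn from Helicone's docs as I remember them, so treat them as assumptions and verify before copying:

```python
# Proxy-style logging: no instrumentation code, just reroute the traffic.
# The base_url and "Helicone-Auth" header below are assumed values;
# check Helicone's current docs before use.
from openai import OpenAI  # requires the openai package

client = OpenAI(
    base_url="https://oai.helicone.ai/v1",  # route calls through the proxy
    default_headers={"Helicone-Auth": "Bearer <your-helicone-key>"},
)
# From here, every client.chat.completions.create(...) call passes
# through the proxy and gets logged, with no other code changes.
```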
**What it gets wrong:**
- It's a **proxy, not an instrumentation layer**. That means it only sees HTTP traffic. No agent tracing, no span-level visibility.
- If you want to understand why your agent made a decision, Helicone can't help you.
- Limited self-hosting story.
Great for quick cost tracking on simple apps. Not for complex agents.
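To make the proxy-vs-instrumentation distinction concrete, here's a minimal pure-Python sketch (a toy, not any vendor's API) of span-level tracing: each agent step is wrapped in a span, so nesting and timing are captured inside the process — exactly the view a proxy sitting outside the HTTP boundary can never reconstruct.

```python
import time
from contextlib import contextmanager

class MiniTracer:
    """Toy span recorder; real tools use OpenTelemetry for this."""
    def __init__(self):
        self.spans = []   # (name, depth, duration_s), in completion order
        self._depth = 0

    @contextmanager
    def span(self, name):
        depth = self._depth
        self._depth += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            self._depth -= 1
            self.spans.append((name, depth, time.perf_counter() - start))

tracer = MiniTracer()
with tracer.span("agent_turn"):
    with tracer.span("tool_call:web_search"):
        pass  # the actual tool call would run here

# Inner spans finish first, so they appear before their parent.
print([(name, depth) for name, depth, _ in tracer.spans])
# [('tool_call:web_search', 1), ('agent_turn', 0)]
```

The nesting (`depth`) is the part that matters: it records *why* a call happened — which agent step triggered which tool — and that structure only exists inside your process.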
## Phoenix (Arize)
**What it gets right:** Strong ML observability roots.
OpenTelemetry-native. Good for teams with existing ML infrastructure.
**What it gets wrong:**
- Complexity. It's built for ML teams with existing Arize infrastructure, not solo devs or small teams.
- Setup is non-trivial compared to the others.
- The UI feels like it was designed for data scientists, not backend developers.
## What I actually needed
After using all four, I realized my requirements were simpler
than any of them assumed:
- See exactly what my agent did — every tool call, every decision, in order
- Keep my data on my own server — I have EU customers
- Not pay per seat — I'm a solo dev
- Work with Claude and MCP — that's my stack
None of them checked all four boxes. So I built AgentLens.
## AgentLens
It's MIT licensed, self-hosted, and has native MCP support - the only observability tool here that does.
Setup:
```python
import agentlens

agentlens.init()
agentlens.patch_anthropic()  # every Claude call tracked automatically
```