I spent the last three months building a production LLM app.
I tried every major observability tool. None of them fit perfectly —
so I built my own.
Here's my honest take on each one.
## LangSmith
**What it gets right:** Deep LangChain integration. If you're all-in
on LangGraph, it's seamless.
**What it gets wrong:** Everything else.
- **Pricing is punitive.** $39/seat/month before you log a single trace. A team of 5 = $195/month just to get started.
- **14-day retention by default.** Want 400 days? That's $5.00 per 1,000 traces, 10x the base price, and there's no middle tier.
- **US data only** unless you're on an Enterprise plan. For EU teams: good luck with GDPR.
- **Vendor lock-in.** It's built for LangChain. Use anything else and you're fighting the tool.
## Langfuse
**What it gets right:** Open source, self-hostable, framework-agnostic.
The pricing is transparent. The community is solid (25k+ GitHub stars).
**What it gets wrong:**
- **No MCP support.** If you're building with Claude and MCP tools, you're blind.
- **Alerting is weak.** For production monitoring, most teams end up piping to Datadog or Grafana anyway.
- **15% latency overhead** in benchmarks. Not a dealbreaker, but noticeable for latency-sensitive apps.
Langfuse is genuinely good. It's the one I'd recommend to most teams —
except for the MCP gap.
## Helicone
**What it gets right:** Incredibly simple setup. It's literally a proxy:
one line change and you're logging.
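That "one line change" is the proxy pattern: you point your existing client at the proxy's URL instead of the provider's. A sketch of what that looks like with the OpenAI SDK — the gateway URL and header name here are drawn from Helicone's docs as I remember them, so treat them as assumptions and verify before copying:

```python
# Proxy-style logging: no instrumentation code, just reroute the traffic.
# The base_url and "Helicone-Auth" header below are assumed values;
# check Helicone's current docs before use.
from openai import OpenAI  # requires the openai package

client = OpenAI(
    base_url="https://oai.helicone.ai/v1",  # route calls through the proxy
    default_headers={"Helicone-Auth": "Bearer <your-helicone-key>"},
)
# From here, every client.chat.completions.create(...) call passes
# through the proxy and gets logged, with no other code changes.
```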
**What it gets wrong:**
- It's a **proxy, not an instrumentation layer**. That means it only sees HTTP traffic. No agent tracing, no span-level visibility.
- If you want to understand why your agent made a decision, Helicone can't help you.
- Limited self-hosting story.
Great for quick cost tracking on simple apps. Not for complex agents.
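To make the proxy-vs-instrumentation distinction concrete, here's a minimal pure-Python sketch (a toy, not any vendor's API) of span-level tracing: each agent step is wrapped in a span, so nesting and timing are captured inside the process — exactly the view a proxy sitting outside the HTTP boundary can never reconstruct.

```python
import time
from contextlib import contextmanager

class MiniTracer:
    """Toy span recorder; real tools use OpenTelemetry for this."""
    def __init__(self):
        self.spans = []   # (name, depth, duration_s), in completion order
        self._depth = 0

    @contextmanager
    def span(self, name):
        depth = self._depth
        self._depth += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            self._depth -= 1
            self.spans.append((name, depth, time.perf_counter() - start))

tracer = MiniTracer()
with tracer.span("agent_turn"):
    with tracer.span("tool_call:web_search"):
        pass  # the actual tool call would run here

# Inner spans finish first, so they appear before their parent.
print([(name, depth) for name, depth, _ in tracer.spans])
# [('tool_call:web_search', 1), ('agent_turn', 0)]
```

The nesting (`depth`) is the part that matters: it records *why* a call happened — which agent step triggered which tool — and that structure only exists inside your process.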
## Phoenix (Arize)
**What it gets right:** Strong ML observability roots.
OpenTelemetry-native. Good for teams with existing ML infrastructure.
**What it gets wrong:**
- Complexity. It's built for ML teams with existing Arize infrastructure, not solo devs or small teams.
- Setup is non-trivial compared to the others.
- The UI feels like it was designed for data scientists, not backend developers.
## What I actually needed
After using all four, I realized my requirements were simpler
than any of them assumed:
- See exactly what my agent did — every tool call, every decision, in order
- Keep my data on my own server — I have EU customers
- Not pay per seat — I'm a solo dev
- Work with Claude and MCP — that's my stack
None of them checked all four boxes. So I built AgentLens.
## AgentLens
It's MIT licensed, self-hosted, and has native MCP support - the only observability tool here that does.
Setup:
```python
import agentlens

agentlens.init()
agentlens.patch_anthropic()  # every Claude call tracked automatically
```