Soufian Azzaoui
I tried LangSmith, Langfuse, Helicone, and Phoenix — here's what each gets wrong

I spent the last three months building a production LLM app.
I tried every major observability tool. None of them fit perfectly —
so I built my own.

Here's my honest take on each one.

LangSmith

What it gets right: Deep LangChain integration. If you're all-in
on LangGraph, it's seamless.

What it gets wrong: Everything else.

  • Pricing is punitive. $39/seat/month before you log a single trace. A team of 5 = $195/month just to get started.
  • 14-day retention by default. Want 400-day retention? That's $5.00 per 1,000 traces, 10x the base price. No middle tier.
  • US data only unless you're on an Enterprise plan. For EU teams: good luck with GDPR.
  • Vendor lock-in. It's built for LangChain. Use anything else and you're fighting the tool.

Langfuse

What it gets right: Open source, self-hostable, framework-agnostic.
The pricing is transparent. The community is solid (25k+ GitHub stars).

What it gets wrong:

  • No MCP support. If you're building with Claude and MCP tools, you're blind.
  • Alerting is weak. For production monitoring, most teams end up piping to Datadog or Grafana anyway.
  • 15% latency overhead in benchmarks. Not a dealbreaker, but noticeable for latency-sensitive apps.

Langfuse is genuinely good. It's the one I'd recommend to most teams —
except for the MCP gap.

Helicone

What it gets right: Incredibly simple setup. Literally a proxy:
one line change and you're logging.
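For context, that "one line change" looks roughly like this with the OpenAI Python SDK (a sketch; the key placeholder is illustrative, not a real credential):

```python
from openai import OpenAI

# Route requests through Helicone's proxy instead of api.openai.com.
# Swapping base_url is the one-line change; the auth header tells
# Helicone which account to log the traffic under.
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": "Bearer <your-helicone-api-key>"},
)
# From here on, client.chat.completions.create(...) works as usual,
# and every request shows up in the Helicone dashboard.
```

Because it sits at the HTTP layer, this is all Helicone ever sees, which is exactly the limitation described below.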

What it gets wrong:

  • It's a proxy, not an instrumentation layer. That means it only sees HTTP traffic. No agent tracing, no span-level visibility.
  • If you want to understand why your agent made a decision, Helicone can't help you.
  • Limited self-hosting story.

Great for quick cost tracking on simple apps. Not for complex agents.

Phoenix (Arize)

What it gets right: Strong on ML observability roots.
OpenTelemetry-native. Good for teams with existing ML infrastructure.

What it gets wrong:

  • Complexity. It's built for ML teams with existing Arize infrastructure, not solo devs or small teams.
  • Setup is non-trivial compared to the others.
  • The UI feels like it was designed for data scientists, not backend developers.

What I actually needed

After using all four, I realized my requirements were simpler
than any of them assumed:

  1. See exactly what my agent did — every tool call, every decision, in order
  2. Keep my data on my own server — I have EU customers
  3. Not pay per seat — I'm a solo dev
  4. Work with Claude and MCP — that's my stack

None of them checked all four boxes. So I built AgentLens.

AgentLens

It's MIT licensed, self-hosted, and has native MCP support, making it the only observability tool here that does.

Setup:

```python
import agentlens

agentlens.init()
agentlens.patch_anthropic()  # every Claude call tracked automatically
```
