<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Soufian Azzaoui</title>
    <description>The latest articles on DEV Community by Soufian Azzaoui (@soufian_azzaoui_85ea1c030).</description>
    <link>https://dev.to/soufian_azzaoui_85ea1c030</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3889595%2Fd2d51747-60ce-4146-8b33-d256e2b4a050.png</url>
      <title>DEV Community: Soufian Azzaoui</title>
      <link>https://dev.to/soufian_azzaoui_85ea1c030</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/soufian_azzaoui_85ea1c030"/>
    <language>en</language>
    <item>
      <title>I tried LangSmith, Langfuse, Helicone, and Phoenix — here's what each gets wrong</title>
      <dc:creator>Soufian Azzaoui</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:52:48 +0000</pubDate>
      <link>https://dev.to/soufian_azzaoui_85ea1c030/i-tried-langsmith-langfuse-helicone-and-phoenix-heres-what-each-gets-wrong-2cjk</link>
      <guid>https://dev.to/soufian_azzaoui_85ea1c030/i-tried-langsmith-langfuse-helicone-and-phoenix-heres-what-each-gets-wrong-2cjk</guid>
      <description>&lt;p&gt;I spent the last three months building a production LLM app. &lt;br&gt;
I tried every major observability tool. None of them fit perfectly — &lt;br&gt;
so I built my own.&lt;/p&gt;

&lt;p&gt;Here's my honest take on each one.&lt;/p&gt;

&lt;h2&gt;LangSmith&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it gets right:&lt;/strong&gt; Deep LangChain integration. If you're all-in &lt;br&gt;
on LangGraph, it's seamless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it gets wrong:&lt;/strong&gt; Everything else.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing is punitive. $39/seat/month — before you log a single 
trace. A team of 5 = $195/month just to get started.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;14-day retention by default.&lt;/strong&gt; Want 400 days? That's $5.00 per 
1,000 traces — 10x the base price. No middle tier.&lt;/li&gt;
&lt;li&gt;US data only unless you're on an Enterprise plan. 
For EU teams: good luck with GDPR.&lt;/li&gt;
&lt;li&gt;Vendor lock-in. It's built for LangChain. Use anything else 
and you're fighting the tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Langfuse&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it gets right:&lt;/strong&gt; Open source, self-hostable, framework-agnostic. &lt;br&gt;
The pricing is transparent. The community is solid (25k+ GitHub stars).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it gets wrong:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No MCP support. If you're building with Claude and MCP tools, 
you're blind.&lt;/li&gt;
&lt;li&gt;Alerting is weak. For production monitoring, most teams end up 
piping to Datadog or Grafana anyway.&lt;/li&gt;
&lt;li&gt;15% latency overhead in benchmarks. Not a dealbreaker, but 
noticeable for latency-sensitive apps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Langfuse is genuinely good. It's the one I'd recommend to most teams — &lt;br&gt;
except for the MCP gap.&lt;/p&gt;

&lt;h2&gt;Helicone&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it gets right:&lt;/strong&gt; Incredibly simple setup. Literally a proxy — &lt;br&gt;
one line change and you're logging.&lt;/p&gt;
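&lt;p&gt;A generic sketch of what that proxy pattern looks like; the URL and function names below are illustrative stand-ins, not Helicone's actual API. The whole integration is swapping the client's base URL and adding an auth header:&lt;/p&gt;

```python
# Generic sketch of the proxy pattern; names here are illustrative stand-ins.
PROXY_BASE_URL = "https://proxy.example.com/v1"  # stand-in for the vendor's proxy endpoint

def make_client_config(api_key, proxy_key=None):
    """Build LLM-client kwargs; if proxy_key is set, route traffic through the proxy."""
    config = {"api_key": api_key, "base_url": "https://api.example.com/v1"}
    if proxy_key is not None:
        config["base_url"] = PROXY_BASE_URL  # the "one line change"
        config["default_headers"] = {"Proxy-Auth": f"Bearer {proxy_key}"}
    return config
```

&lt;p&gt;Because only the URL changes, setup is trivial; it also means the proxy observes nothing beyond the HTTP request and response, which matters below.&lt;/p&gt;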

&lt;p&gt;&lt;strong&gt;What it gets wrong:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's a &lt;strong&gt;proxy&lt;/strong&gt;, not an instrumentation layer. That means it only 
sees HTTP traffic. No agent tracing, no span-level visibility.&lt;/li&gt;
&lt;li&gt;If you want to understand &lt;em&gt;why&lt;/em&gt; your agent made a decision, 
Helicone can't help you.&lt;/li&gt;
&lt;li&gt;Limited self-hosting story.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Great for quick cost tracking on simple apps. Not for complex agents.&lt;/p&gt;

&lt;h2&gt;Phoenix (Arize)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it gets right:&lt;/strong&gt; Deep roots in ML observability. &lt;br&gt;
OpenTelemetry-native. Good for teams with existing ML infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it gets wrong:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complexity. It's built for ML teams with existing Arize 
infrastructure, not solo devs or small teams.&lt;/li&gt;
&lt;li&gt;Setup is non-trivial compared to the others.&lt;/li&gt;
&lt;li&gt;The UI feels like it was designed for data scientists, 
not backend developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I actually needed&lt;/h2&gt;

&lt;p&gt;After using all four, I realized my requirements were simpler &lt;br&gt;
than any of them assumed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;See exactly what my agent did — every tool call, every 
decision, in order&lt;/li&gt;
&lt;li&gt;Keep my data on my own server — I have EU customers&lt;/li&gt;
&lt;li&gt;Not pay per seat — I'm a solo dev&lt;/li&gt;
&lt;li&gt;Work with Claude and MCP — that's my stack&lt;/li&gt;
&lt;/ol&gt;
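
&lt;p&gt;Requirement 1 boils down to an ordered, span-level event log. A minimal sketch of the idea (the names here are hypothetical, not AgentLens's actual API):&lt;/p&gt;

```python
# Minimal sketch of an ordered agent trace; all names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Trace:
    events: list = field(default_factory=list)

    def record(self, kind, name, **details):
        """Append one event, stamped with its position in the run."""
        self.events.append({"seq": len(self.events), "kind": kind, "name": name, **details})

trace = Trace()
trace.record("llm_call", "claude", prompt="plan the task")
trace.record("tool_call", "search_docs", query="GDPR retention")
trace.record("decision", "use_cached_result")

# Events replay in exactly the order the agent acted:
ordered = [e["name"] for e in trace.events]
```

&lt;p&gt;An HTTP-level proxy can't produce this ordering, because tool calls and decisions never leave the process as separate requests.&lt;/p&gt;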

&lt;p&gt;None of them checked all four boxes. So I built AgentLens.&lt;/p&gt;

&lt;h2&gt;AgentLens&lt;/h2&gt;

&lt;p&gt;It's MIT licensed, self-hosted, and has native MCP support — the only observability tool that does.&lt;/p&gt;

&lt;p&gt;Setup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import agentlens
agentlens.init()
agentlens.patch_anthropic()  # every Claude call tracked automatically
&lt;/code&gt;&lt;/pre&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>monitoring</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
