<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ben Higgins</title>
    <description>The latest articles on DEV Community by Ben Higgins (@b3n_higgins).</description>
    <link>https://dev.to/b3n_higgins</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872023%2Fab146744-e112-407e-a033-26f134e9580c.png</url>
      <title>DEV Community: Ben Higgins</title>
      <link>https://dev.to/b3n_higgins</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/b3n_higgins"/>
    <language>en</language>
    <item>
      <title>How I built an open-source observability layer for AI agents (and why it’s needed)</title>
      <dc:creator>Ben Higgins</dc:creator>
      <pubDate>Sat, 11 Apr 2026 11:25:40 +0000</pubDate>
      <link>https://dev.to/b3n_higgins/how-i-built-an-open-source-observability-layer-for-ai-agents-and-why-its-needed-2km8</link>
      <guid>https://dev.to/b3n_higgins/how-i-built-an-open-source-observability-layer-for-ai-agents-and-why-its-needed-2km8</guid>
      <description>&lt;p&gt;Last week I shipped layr-sdk — open source observability for AI agents. Here's the honest story of how it got built, the pivots along the way, and why I think the agentic AI ecosystem is missing something fundamental.&lt;/p&gt;

&lt;h2&gt;The problem I kept running into&lt;/h2&gt;

&lt;p&gt;I work in data integration. Over the past year, I've watched enterprises get increasingly excited about AI agents — and increasingly stuck when they try to actually deploy them.&lt;/p&gt;

&lt;p&gt;The technology works. The frameworks are mature. LangChain, CrewAI, AutoGen — you can build a capable agent in an afternoon.&lt;/p&gt;

&lt;p&gt;But when teams try to put agents into production, something breaks down. Not the agent itself. The infrastructure around it. Specifically, nobody can see what the agent is actually doing.&lt;/p&gt;

&lt;p&gt;Not just logs. Real observability. When your agent sends an email, makes a database query, or calls an external API — can you tell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What action was taken, and whether it succeeded&lt;/li&gt;
&lt;li&gt;Why it decided to take that action&lt;/li&gt;
&lt;li&gt;Which tools were considered before choosing&lt;/li&gt;
&lt;li&gt;What it cost in tokens and dollars&lt;/li&gt;
&lt;li&gt;How long it took&lt;/li&gt;
&lt;li&gt;Whether that behaviour is normal or anomalous&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most teams, the answer is no. And that gap, between deploying an agent and understanding what it's doing, is what I built Layr to close.&lt;/p&gt;

&lt;h2&gt;The first idea was wrong&lt;/h2&gt;

&lt;p&gt;My original instinct was to build a governance and compliance platform. Okta for agentic AI, if you will. A dashboard where enterprises could define policies, enforce boundaries, and generate audit trails for regulators. I built it. It looked good. And then I asked myself an uncomfortable question. What compliance standard are we actually driving towards?&lt;/p&gt;

&lt;p&gt;There isn't one. Not yet. Trying to sell compliance without a defined standard is selling fear, not value. The buyer can't articulate what they need, and you can't articulate what you're delivering.&lt;/p&gt;

&lt;p&gt;So I pivoted.&lt;/p&gt;

&lt;h2&gt;The real insight&lt;/h2&gt;

&lt;p&gt;The right framing isn't compliance. It's observability. Every layer of the modern software stack has an observability standard.&lt;/p&gt;

&lt;p&gt;Infrastructure — OpenTelemetry&lt;br&gt;
Applications — OpenTelemetry&lt;br&gt;
Databases — OpenTelemetry&lt;/p&gt;

&lt;p&gt;AI agents — nothing.&lt;/p&gt;

&lt;p&gt;There is no standard for capturing agent actions. No standard for how reasoning chains are expressed. No standard for how token consumption is measured across frameworks and platforms.&lt;/p&gt;

&lt;p&gt;That's the gap. And it's the same gap that existed for infrastructure before OpenTelemetry emerged.&lt;/p&gt;

&lt;h2&gt;What I built&lt;/h2&gt;

&lt;p&gt;Layr is an open source Python SDK that instruments AI agents and emits structured telemetry data. Three lines of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;layr&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer requested update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;450&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;210&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every tracked action produces a structured event containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent identity — name, framework, model, environment&lt;/li&gt;
&lt;li&gt;Action details — what it did, what it acted on, whether it succeeded&lt;/li&gt;
&lt;li&gt;Reasoning chain — intent, confidence score, tools considered vs tools used&lt;/li&gt;
&lt;li&gt;LLM metrics — input tokens, output tokens, estimated cost, latency&lt;/li&gt;
&lt;li&gt;Session context — total actions, total cost, what triggered the session&lt;/li&gt;
&lt;li&gt;Multi-agent metadata — parent agent, handoff count, delegation depth&lt;/li&gt;
&lt;li&gt;Anomaly signals — deviation from baseline cost, error rate, actions per hour&lt;/li&gt;
&lt;/ul&gt;
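&lt;p&gt;To make that concrete, here's a rough sketch of what one such event could look like as plain Python. The field names and values are illustrative, not the SDK's actual schema:&lt;/p&gt;

```python
# Hypothetical shape of a single tracked-action event.
# Field names are illustrative, not Layr's real schema.
event = {
    "agent": {"name": "customer-support-agent", "framework": "langchain",
              "model": "gpt-4o", "environment": "production"},
    "action": {"name": "send_email", "target": "user@example.com",
               "success": True},
    "reasoning": {"intent": "Customer requested update", "confidence": 0.92,
                  "tools_considered": ["send_email", "create_ticket"],
                  "tools_used": ["send_email"]},
    "llm": {"input_tokens": 450, "output_tokens": 210,
            "estimated_cost_usd": 0.003225, "latency_ms": 1200},
    "session": {"total_actions": 7, "total_cost_usd": 0.021,
                "trigger": "inbound_email"},
    "multi_agent": {"parent_agent": None, "handoff_count": 0,
                    "delegation_depth": 0},
    "anomaly": {"cost_deviation_pct": 4.2, "error_rate": 0.0,
                "actions_per_hour": 38},
}

# Derived metrics fall out of the structure directly.
total_tokens = event["llm"]["input_tokens"] + event["llm"]["output_tokens"]
```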

&lt;h2&gt;The OpenTelemetry decision&lt;/h2&gt;

&lt;p&gt;The most important technical decision I made was to emit native OpenTelemetry spans by default.&lt;/p&gt;

&lt;p&gt;This means Layr doesn't require you to adopt a new platform. Your agent telemetry flows into whatever observability backend you already use: Grafana, Datadog, Honeycomb, or any OTEL-compatible system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;exporter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;otlp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;      &lt;span class="c1"&gt;# OpenTelemetry
&lt;/span&gt;    &lt;span class="c1"&gt;# exporter="datadog" # Datadog  
&lt;/span&gt;    &lt;span class="c1"&gt;# exporter="grafana" # Grafana
&lt;/span&gt;    &lt;span class="c1"&gt;# exporter="layr"    # Layr Cloud
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the difference between building a point solution and building infrastructure. LangSmith is great if you're building on LangChain and happy to send data to their platform. Layr is for teams who want framework-agnostic, stack-agnostic instrumentation that they fully control.&lt;/p&gt;

&lt;h2&gt;Framework integrations&lt;/h2&gt;

&lt;p&gt;The LangChain integration was the most important to get right. No manual instrumentation required; just add a callback handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;layr.integrations.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LayrCallbackHandler&lt;/span&gt;

&lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LayrCallbackHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every LLM call, tool use, and agent action is now automatically tracked.&lt;/p&gt;

&lt;p&gt;CrewAI and AutoGen integrations work the same way. The goal is that whatever framework you're building on, Layr should feel native.&lt;/p&gt;
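&lt;p&gt;Conceptually, an integration like that just translates framework callbacks into structured events. A duck-typed sketch of the pattern (the hook names follow LangChain's callback interface, but the body is illustrative, not Layr's actual implementation):&lt;/p&gt;

```python
import time

# Sketch of a callback-style integration: the framework invokes these
# hooks, and the handler turns them into structured telemetry events.
class TrackingHandler:
    def __init__(self, sink):
        self.sink = sink        # anything with an .append(), e.g. a list
        self._started = {}      # run_id -> start time

    def on_llm_start(self, serialized, prompts, *, run_id, **kwargs):
        self._started[run_id] = time.monotonic()

    def on_llm_end(self, response, *, run_id, **kwargs):
        start = self._started.pop(run_id, time.monotonic())
        latency_ms = int((time.monotonic() - start) * 1000)
        usage = getattr(response, "llm_output", None) or {}
        self.sink.append({
            "event": "llm_call",
            "latency_ms": latency_ms,
            "usage": usage.get("token_usage", {}),
        })

# Exercise the handler with a stand-in response object.
events = []
handler = TrackingHandler(events)
handler.on_llm_start({}, ["hi"], run_id=1)

class FakeResponse:
    llm_output = {"token_usage": {"input_tokens": 450, "output_tokens": 210}}

handler.on_llm_end(FakeResponse(), run_id=1)
```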

&lt;h2&gt;Local development mode&lt;/h2&gt;

&lt;p&gt;One thing I was deliberate about — Layr should work completely offline during development. No API key, no data sent anywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LAYR_MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;local &lt;/span&gt;python my_agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output goes straight to your console:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[LAYR] agent=customer-support-agent
       action=send_email
       target=user@example.com
       tokens=660 cost=$0.003225
       latency=1200ms
       success=True

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
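&lt;p&gt;The cost figure in that output is just token counts multiplied by per-token rates. A minimal sketch of the arithmetic (the per-million-token rates below are illustrative, chosen to reproduce the number above; a real SDK would keep a per-model pricing table):&lt;/p&gt;

```python
# Estimate LLM cost from token counts and per-million-token rates.
# The default rates are illustrative, not an authoritative price list.
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      in_rate_per_m: float = 2.50,
                      out_rate_per_m: float = 10.00) -> float:
    return (input_tokens * in_rate_per_m
            + output_tokens * out_rate_per_m) / 1_000_000

# The example above: 450 input + 210 output = 660 tokens.
cost = estimate_cost_usd(450, 210)
print(f"cost=${cost:.6f}")  # prints cost=$0.003225
```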



&lt;p&gt;This matters because trust is everything for an observability tool. If developers aren't confident about what data you're collecting and where it goes, they won't instrument anything sensitive. &lt;/p&gt;
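&lt;p&gt;Mode selection like this usually comes down to an environment-variable check when the client is constructed. A minimal sketch of the pattern (not Layr's actual internals):&lt;/p&gt;

```python
import os

# Pick a telemetry sink based on LAYR_MODE. In "local" mode nothing
# leaves the process; events are simply printed to the console.
def make_emitter():
    if os.environ.get("LAYR_MODE", "cloud").lower() == "local":
        def emit(event: dict) -> None:
            print(f"[LAYR] agent={event.get('agent')} "
                  f"action={event.get('action')}")
        return emit

    def emit(event: dict) -> None:
        # Network export deliberately elided in this sketch.
        raise NotImplementedError("cloud export not shown here")
    return emit

os.environ["LAYR_MODE"] = "local"
emit = make_emitter()
emit({"agent": "customer-support-agent", "action": "send_email"})
```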

&lt;h2&gt;The building-in-public numbers&lt;/h2&gt;

&lt;p&gt;I shipped v0.1.0 last Thursday. Here's the honest week one data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Day 1 — 84 real installs&lt;/li&gt;
&lt;li&gt;Day 2 — 101 real installs&lt;/li&gt;
&lt;li&gt;Week total — 200+ installs&lt;/li&gt;
&lt;li&gt;Marketing spend — $0&lt;/li&gt;
&lt;li&gt;Customers — 0&lt;/li&gt;
&lt;li&gt;GitHub stars — 1&lt;/li&gt;
&lt;li&gt;X followers — 1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;200 installs with two X posts and a GitHub repo. I'll take that as an early signal that the problem is real.&lt;/p&gt;

&lt;h2&gt;The bigger vision&lt;/h2&gt;

&lt;p&gt;I want Layr to become what OpenTelemetry is for infrastructure: the standard for how AI agent telemetry is emitted, regardless of framework, platform, or vendor.&lt;/p&gt;

&lt;p&gt;Not a platform that locks you in. Not a tool for one ecosystem. The instrumentation layer the entire agentic AI stack is built on.&lt;/p&gt;

&lt;p&gt;That's a long road. But the technical foundation is right (OTEL-native, framework-agnostic, fully open source), and the timing feels right. The frameworks are mature. The production deployments are starting. The observability gap is becoming painful.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;layr-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: github.com/getlayr/layr-sdk&lt;br&gt;
Website: getlayr.co&lt;/p&gt;

&lt;p&gt;I'm building this entirely in public. If you're running agents in production or staging, I'd love to hear what your observability setup looks like today and what Layr is missing.&lt;/p&gt;

&lt;p&gt;What metrics matter most to you? What integrations would make this immediately useful?&lt;/p&gt;

&lt;p&gt;Reply here or open an issue on GitHub. &lt;br&gt;
Always up for a chat.&lt;/p&gt;

&lt;p&gt;Thanks for reading. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>langchain</category>
    </item>
  </channel>
</rss>
