Last week I shipped layr-sdk — open source observability for AI agents. Here's the honest story of how it got built, the pivots along the way, and why I think the agentic AI ecosystem is missing something fundamental.
The problem I kept running into:
I work in data integration. Over the past year, I've watched enterprises get increasingly excited about AI agents — and increasingly stuck when they try to actually deploy them.
The technology works. The frameworks are mature. LangChain, CrewAI, AutoGen — you can build a capable agent in an afternoon.
But when teams try to put agents into production, something breaks down. Not the agent itself. The infrastructure around it. Specifically, nobody can see what the agent is actually doing.
Not just logs. Real observability. When your agent sends an email, makes a database query, or calls an external API, can you tell:
- What action was taken, and whether it succeeded
- Why the agent decided to take that action
- Which tools were considered before choosing
- What the action cost in tokens and dollars
- How long it took
- Whether that behaviour is normal or anomalous
For most teams, the answer is no. And that gap, between deploying an agent and understanding what it's doing, is what I built Layr to close.
## The first idea was wrong
My original instinct was to build a governance and compliance platform. Okta for agentic AI, if you will. A dashboard where enterprises could define policies, enforce boundaries, and generate audit trails for regulators. I built it. It looked good. And then I asked myself an uncomfortable question. What compliance standard are we actually driving towards?
There isn't one. Not yet. Trying to sell compliance without a defined standard is selling fear, not value. The buyer can't articulate what they need, and you can't articulate what you're delivering.
So I pivoted.
## The real insight
The right framing isn't compliance. It's observability. Every layer of the modern software stack has an observability standard.
- Infrastructure — OpenTelemetry
- Applications — OpenTelemetry
- Databases — OpenTelemetry
- AI agents — nothing
There is no standard for capturing agent actions. No standard for how reasoning
chains are expressed. No standard for how token consumption is measured across frameworks and platforms.
That's the gap. And it's the same gap that existed for infrastructure before OpenTelemetry emerged.
## What I built
Layr is an open source Python SDK that instruments AI agents and emits structured telemetry data. Three lines of code:
```python
from layr import Agent

agent = Agent(api_key="your-key")

agent.track(
    action="send_email",
    reasoning="Customer requested update",
    input_tokens=450,
    output_tokens=210,
    latency_ms=1200,
)
```
Every tracked action produces a structured event containing:
- Agent identity — name, framework, model, environment
- Action details — what it did, what it acted on, whether it succeeded
- Reasoning chain — intent, confidence score, tools considered vs tools used
- LLM metrics — input tokens, output tokens, estimated cost, latency
- Session context — total actions, total cost, what triggered the session
- Multi-agent metadata — parent agent, handoff count, delegation depth
- Anomaly signals — deviation from baseline cost, error rate, actions per hour
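Conceptually, each tracked action becomes one flat, structured record. Here's a simplified sketch of that shape — the field names and the dataclass itself are illustrative, not the SDK's exact wire format:

```python
from dataclasses import dataclass, field, asdict

# Illustrative event shape. Field names here are a sketch of the
# categories above, not Layr's actual schema.
@dataclass
class AgentEvent:
    # Agent identity
    agent_name: str
    framework: str
    model: str
    # Action details
    action: str
    target: str
    success: bool
    # Reasoning chain
    reasoning: str
    tools_considered: list = field(default_factory=list)
    tools_used: list = field(default_factory=list)
    # LLM metrics
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    latency_ms: int = 0

event = AgentEvent(
    agent_name="customer-support-agent",
    framework="langchain",
    model="gpt-4o",
    action="send_email",
    target="user@example.com",
    success=True,
    reasoning="Customer requested update",
    tools_considered=["send_email", "create_ticket"],
    tools_used=["send_email"],
    input_tokens=450,
    output_tokens=210,
    cost_usd=0.003225,
    latency_ms=1200,
)
print(asdict(event)["action"])  # send_email
```

A flat record like this is deliberately boring: it serializes to JSON, maps cleanly onto span attributes, and any backend can index it.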
## The OpenTelemetry decision
The most important technical decision I made was to emit native OpenTelemetry spans by
default.
This means Layr doesn't require you to adopt a new platform. Your agent telemetry flows
into whatever observability backend you already use: Grafana, Datadog, Honeycomb,
or any OTEL compatible system.
```python
agent = Agent(
    api_key="your-key",
    exporter="otlp",       # OpenTelemetry (default)
    # exporter="datadog",  # Datadog
    # exporter="grafana",  # Grafana
    # exporter="layr",     # Layr Cloud
)
```
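Because the spans are standard OTLP, pointing them at your existing backend should come down to the usual OpenTelemetry environment variables — these are defined by the OTel spec itself, though whether Layr reads them directly is something to check against the docs:

```shell
# Standard OTel env vars; any OTLP-speaking collector or backend works.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_SERVICE_NAME="customer-support-agent"
python my_agent.py
```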
This is the difference between building a point solution and building infrastructure. LangSmith is great if you're building on LangChain and happy to send data to their platform. Layr is for teams who want framework-agnostic, stack-agnostic instrumentation that they fully control.
## Framework integrations
The LangChain integration was the most important to get right. Zero manual instrumentation; you just add a callback handler:
```python
from langchain_openai import ChatOpenAI
from layr.integrations.langchain import LayrCallbackHandler

handler = LayrCallbackHandler(api_key="your-key")
llm = ChatOpenAI(callbacks=[handler])
```
Every LLM call, tool use, and agent action is now automatically tracked.
CrewAI and AutoGen integrations work the same way. The goal is that whatever framework you're building on, Layr should feel native.
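The pattern is the same in every framework: the framework exposes lifecycle hooks, and the handler turns each hook into a tracked event. A framework-agnostic sketch — hook names and signatures here are simplified stand-ins, since each real framework defines its own:

```python
class TelemetryHandler:
    """Simplified sketch of a callback-style integration.
    Real handlers implement each framework's actual hook
    signatures (e.g. LangChain's BaseCallbackHandler methods)."""

    def __init__(self):
        self.events = []

    def on_llm_end(self, input_tokens, output_tokens):
        # Invoked by the framework after each LLM call completes.
        self.events.append({
            "type": "llm_call",
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
        })

    def on_tool_end(self, tool_name, success):
        # Invoked after each tool invocation finishes.
        self.events.append({
            "type": "tool_call",
            "tool": tool_name,
            "success": success,
        })

handler = TelemetryHandler()
handler.on_llm_end(input_tokens=450, output_tokens=210)
handler.on_tool_end("send_email", success=True)
print(len(handler.events))  # 2
```

The win of this design is that the agent code never changes: instrumentation lives entirely in the handler the framework already knows how to call.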
## Local development mode
One thing I was deliberate about — Layr should work completely offline during development. No API key, no data sent anywhere:
```shell
LAYR_MODE=local python my_agent.py
```
Output goes straight to your console:
```
[LAYR] agent=customer-support-agent
       action=send_email
       target=user@example.com
       tokens=660 cost=$0.003225
       latency=1200ms
       success=True
```
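The cost figure in that output is just tokens multiplied by per-token rates. For the numbers above it works out exactly if you assume GPT-4o's list pricing — $2.50 per million input tokens, $10 per million output tokens — which is an assumption on my part, not something the SDK pins you to:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 2.50,
                  output_rate: float = 10.00) -> float:
    """Rates are USD per million tokens.
    Defaults assume GPT-4o list pricing (an assumption here)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

print(round(estimate_cost(450, 210), 6))  # 0.003225
```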
This matters because trust is everything for an observability tool. If developers aren't confident about what data you're collecting and where it goes, they won't instrument anything sensitive.
## The build-in public numbers
I shipped v0.1.0 last Thursday. Here's the honest week one data:
- Day 1 — 84 real installs
- Day 2 — 101 real installs
- Week total — 200+ installs
- Marketing spend — $0
- Customers — 0
- GitHub stars — 1
- X followers — 1
200 installs with two X posts and a GitHub repo. I'll take that as an early signal that the problem is real.
## The bigger vision
I want Layr to become what OpenTelemetry is for infrastructure: the standard for how AI agent telemetry is emitted, regardless of framework, platform, or vendor.
Not a platform that locks you in. Not a tool for one ecosystem. The instrumentation layer on which the entire agentic AI stack is built.
That's a long road. But the technical foundation is right: OTEL native, framework agnostic, fully open source. And the timing feels right. The frameworks are mature. The production deployments are starting. The observability gap is becoming painful.
## Try it
```shell
pip install layr-sdk
```
GitHub: github.com/getlayr/layr-sdk
Website: getlayr.co
I'm building this entirely in public. If you're running agents in production or staging I'd love to hear what your observability setup looks like today and what Layr is missing.
What metrics matter most to you? What integrations would make this immediately useful?
Reply here or open an issue on GitHub.
Always up for a chat.
Thanks for reading.