Last week I shipped layr-sdk — open source observability for AI agents. Here's the honest story of how it got built, the pivots along the way, and why I think the agentic AI ecosystem is missing something fundamental.
The problem I kept running into:
I work in data integration. Over the past year, I've watched enterprises get increasingly excited about AI agents — and increasingly stuck when they try to actually deploy them.
The technology works. The frameworks are mature. LangChain, CrewAI, AutoGen — you can build a capable agent in an afternoon.
But when teams try to put agents into production, something breaks down. Not the agent itself. The infrastructure around it. Specifically, nobody can see what the agent is actually doing.
Not just logs. Real observability. When your agent sends an email, makes a database query, or calls an external API, can you tell:
- What action was taken, and whether it succeeded
- Why the agent decided to take that action
- Which tools were considered before choosing
- What the action cost in tokens and dollars
- How long it took
- Whether that behaviour is normal or anomalous
For most teams, the answer is no. And that gap, between deploying an agent and understanding what it's doing, is what I built Layr to close.
## The first idea was wrong
My original instinct was to build a governance and compliance platform. Okta for agentic AI, if you will. A dashboard where enterprises could define policies, enforce boundaries, and generate audit trails for regulators. I built it. It looked good. And then I asked myself an uncomfortable question. What compliance standard are we actually driving towards?
There isn't one. Not yet. Trying to sell compliance without a defined standard is selling fear, not value. The buyer can't articulate what they need, and you can't articulate what you're delivering.
So I pivoted.
## The real insight
The right framing isn't compliance. It's observability. Every layer of the modern software stack has an observability standard.
- Infrastructure — OpenTelemetry
- Applications — OpenTelemetry
- Databases — OpenTelemetry
- AI agents — nothing
There is no standard for capturing agent actions. No standard for how reasoning
chains are expressed. No standard for how token consumption is measured across frameworks and platforms.
That's the gap. And it's the same gap that existed for infrastructure before OpenTelemetry emerged.
## What I built
Layr is an open source Python SDK that instruments AI agents and emits structured telemetry data. Three lines of code:
```python
from layr import Agent

agent = Agent(api_key="your-key")

agent.track(
    action="send_email",
    reasoning="Customer requested update",
    input_tokens=450,
    output_tokens=210,
    latency_ms=1200,
)
```
Every tracked action produces a structured event containing:
- Agent identity — name, framework, model, environment
- Action details — what it did, what it acted on, whether it succeeded
- Reasoning chain — intent, confidence score, tools considered vs tools used
- LLM metrics — input tokens, output tokens, estimated cost, latency
- Session context — total actions, total cost, what triggered the session
- Multi-agent metadata — parent agent, handoff count, delegation depth
- Anomaly signals — deviation from baseline cost, error rate, actions per hour
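Conceptually, each tracked action becomes one flat, structured record. Here's a simplified sketch of that shape — the field names and the dataclass itself are illustrative, not the SDK's exact wire format:

```python
from dataclasses import dataclass, field, asdict

# Illustrative event shape. Field names here are a sketch of the
# categories above, not Layr's actual schema.
@dataclass
class AgentEvent:
    # Agent identity
    agent_name: str
    framework: str
    model: str
    # Action details
    action: str
    target: str
    success: bool
    # Reasoning chain
    reasoning: str
    tools_considered: list = field(default_factory=list)
    tools_used: list = field(default_factory=list)
    # LLM metrics
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    latency_ms: int = 0

event = AgentEvent(
    agent_name="customer-support-agent",
    framework="langchain",
    model="gpt-4o",
    action="send_email",
    target="user@example.com",
    success=True,
    reasoning="Customer requested update",
    tools_considered=["send_email", "create_ticket"],
    tools_used=["send_email"],
    input_tokens=450,
    output_tokens=210,
    cost_usd=0.003225,
    latency_ms=1200,
)
print(asdict(event)["action"])  # send_email
```

A flat record like this is deliberately boring: it serializes to JSON, maps cleanly onto span attributes, and any backend can index it.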
## The OpenTelemetry decision
The most important technical decision I made was to emit native OpenTelemetry spans by
default.
This means Layr doesn't require you to adopt a new platform. Your agent telemetry flows
into whatever observability backend you already use: Grafana, Datadog, Honeycomb,
or any OTEL compatible system.
```python
agent = Agent(
    api_key="your-key",
    exporter="otlp",       # OpenTelemetry (default)
    # exporter="datadog",  # Datadog
    # exporter="grafana",  # Grafana
    # exporter="layr",     # Layr Cloud
)
```
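Because the spans are standard OTLP, pointing them at your existing backend should come down to the usual OpenTelemetry environment variables — these are defined by the OTel spec itself, though whether Layr reads them directly is something to check against the docs:

```shell
# Standard OTel env vars; any OTLP-speaking collector or backend works.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_SERVICE_NAME="customer-support-agent"
python my_agent.py
```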
This is the difference between building a point solution and building infrastructure. LangSmith is great if you're building on LangChain and happy to send data to their platform. Layr is for teams who want framework-agnostic, stack-agnostic instrumentation that they fully control.
## Framework integrations
The LangChain integration was the most important to get right. Zero manual instrumentation; you just add a callback handler:
```python
from langchain_openai import ChatOpenAI
from layr.integrations.langchain import LayrCallbackHandler

handler = LayrCallbackHandler(api_key="your-key")
llm = ChatOpenAI(callbacks=[handler])
```
Every LLM call, tool use, and agent action is now automatically tracked.
CrewAI and AutoGen integrations work the same way. The goal is that whatever framework you're building on, Layr should feel native.
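The pattern is the same in every framework: the framework exposes lifecycle hooks, and the handler turns each hook into a tracked event. A framework-agnostic sketch — hook names and signatures here are simplified stand-ins, since each real framework defines its own:

```python
class TelemetryHandler:
    """Simplified sketch of a callback-style integration.
    Real handlers implement each framework's actual hook
    signatures (e.g. LangChain's BaseCallbackHandler methods)."""

    def __init__(self):
        self.events = []

    def on_llm_end(self, input_tokens, output_tokens):
        # Invoked by the framework after each LLM call completes.
        self.events.append({
            "type": "llm_call",
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
        })

    def on_tool_end(self, tool_name, success):
        # Invoked after each tool invocation finishes.
        self.events.append({
            "type": "tool_call",
            "tool": tool_name,
            "success": success,
        })

handler = TelemetryHandler()
handler.on_llm_end(input_tokens=450, output_tokens=210)
handler.on_tool_end("send_email", success=True)
print(len(handler.events))  # 2
```

The win of this design is that the agent code never changes: instrumentation lives entirely in the handler the framework already knows how to call.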
## Local development mode
One thing I was deliberate about — Layr should work completely offline during development. No API key, no data sent anywhere:
```shell
LAYR_MODE=local python my_agent.py
```
Output goes straight to your console:
```
[LAYR] agent=customer-support-agent
       action=send_email
       target=user@example.com
       tokens=660 cost=$0.003225
       latency=1200ms
       success=True
```
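The cost figure in that output is just tokens multiplied by per-token rates. For the numbers above it works out exactly if you assume GPT-4o's list pricing — $2.50 per million input tokens, $10 per million output tokens — which is an assumption on my part, not something the SDK pins you to:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 2.50,
                  output_rate: float = 10.00) -> float:
    """Rates are USD per million tokens.
    Defaults assume GPT-4o list pricing (an assumption here)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

print(round(estimate_cost(450, 210), 6))  # 0.003225
```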
This matters because trust is everything for an observability tool. If developers aren't confident about what data you're collecting and where it goes, they won't instrument anything sensitive.
## The build-in public numbers
I shipped v0.1.0 last Thursday. Here's the honest week one data:
- Day 1 — 84 real installs
- Day 2 — 101 real installs
- Week total — 200+ installs
- Marketing spend — $0
- Customers — 0
- GitHub stars — 1
- X followers — 1
200 installs with two X posts and a GitHub repo. I'll take that as an early signal that the problem is real.
## The bigger vision
I want Layr to become what OpenTelemetry is for infrastructure: the standard for how AI agent telemetry is emitted, regardless of framework, platform, or vendor.
Not a platform that locks you in. Not a tool for one ecosystem. The instrumentation layer on which the entire agentic AI stack is built.
That's a long road. But the technical foundation is right: OTEL native, framework agnostic, fully open source. And the timing feels right. The frameworks are mature. The production deployments are starting. The observability gap is becoming painful.
## Try it
```shell
pip install layr-sdk
```
GitHub: github.com/getlayr/layr-sdk
Website: getlayr.co
I'm building this entirely in public. If you're running agents in production or staging I'd love to hear what your observability setup looks like today and what Layr is missing.
What metrics matter most to you? What integrations would make this immediately useful?
Reply here or open an issue on GitHub.
Always up for a chat.
Thanks for reading.