Introduction
"AI did something. But I have no idea what it actually did." Before that happens to you, observability is worth thinking about upfront.
When you develop using AI coding tools like Claude Code or Cursor, you can see "it works." But "why it worked," "where and how it made decisions," and "whether the implementation matches my intent" become much harder to see.
The same goes when you use AI in your product itself — whether that's calling LLM APIs, building AI agents, or integrating MCP tools. If you can't see what the AI returned or what decisions it made, you can't improve it.
There's a principle in the SRE world: "If you can't see it, you can't fix it."
This principle applies directly to AI-powered development and to products that run AI-driven features.
This article captures how I've come to think about observability in LLM-based development — and what I'm actually doing about it.
What Is Observability? (In the SRE Context)
Observability is one of the core concepts in SRE.
It refers to the degree to which you can understand a system's state from the outside. Generally, it's described as the "three pillars":
- Metrics: State expressed in numbers (latency, error rate, CPU usage, etc.)
- Logs: Records of what happened
- Tracing: Records of how requests flowed through the system
When all three are present, you can answer: "What's happening?" "Why is it happening?" and "Where is it happening?"
In AI-involved development, you need to think about this in two contexts:
- Observability of the development process: Observing what happens when you use AI tools to develop
- Observability of AI in production: Observing AI behavior within your product
1. When You Use AI in Your Development Process, Visibility Decreases
When you develop with Claude Code or similar tools, changes happen fast.
But "why did this happen?" becomes much harder to see.
For example:
- When you accept code the AI proposed as-is, you can't trace the reasoning behind that decision
- Subtle differences between what you intended and what got implemented are easy to miss
- Later, you can't remember "why was this changed that way?"
This isn't really a code quality issue — it's a process observability problem.
How to Handle It
- Write commit messages carefully: Even if you have the AI write them, always include the intent
- Document your PRs properly: Make the intent and context behind changes explicit
- Review AI-generated code as a habit: Don't stop at "it works"
- Write tests first: Declaring intent upfront lets you catch gaps between the AI's implementation and what you wanted
In short, the faster your AI moves, the more you need to deliberately create places where humans can observe what's happening.
2. Think About AI Observability in Production as Three Layers
When you embed AI in your product, observability is essential.
Layer 1: Input/Output Logging
At minimum, log what you send to the AI and what it returns.
- The prompt you provided
- The output it generated
- The model used
- Tokens consumed
- Cost
Even this alone lets you investigate later: "Why was that output wrong that one time?"
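The list above maps naturally onto a small wrapper. A minimal TypeScript sketch, where `callModel`, `LlmCallLog`, and the field names are illustrative assumptions rather than any vendor's actual API:

```typescript
// Minimal input/output log record for one LLM call.
// Field names are illustrative, not tied to a particular vendor's schema.
interface LlmCallLog {
  timestamp: string;
  model: string;
  prompt: string;
  output: string;
  tokensUsed: number;
  costUsd: number;
}

// Wrap any model call so every invocation leaves a record behind.
// `callModel` is a hypothetical client function; swap in your real SDK call.
async function loggedCall(
  callModel: (prompt: string) => Promise<{ output: string; tokens: number }>,
  model: string,
  prompt: string,
  costPerToken: number,
  sink: (entry: LlmCallLog) => void
): Promise<string> {
  const { output, tokens } = await callModel(prompt);
  sink({
    timestamp: new Date().toISOString(),
    model,
    prompt,
    output,
    tokensUsed: tokens,
    costUsd: tokens * costPerToken,
  });
  return output;
}
```

Point `sink` at whatever store you already have: stdout, a database table, or an analytics pipeline.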
Layer 2: Action Logging
Record what the AI did. This is especially important if you're using it in an agentic way.
- Which tools it called
- What each tool invocation returned
- Where errors occurred
- Whether any retries happened
Without this, you can't track down problems like "the AI called these tools in the wrong order."
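One way to capture all four of these points is a small per-run action logger. This is a sketch under assumed shapes (`ToolCallLog`, `ActionLogger` are my names, not a framework's); real agent frameworks usually have their own hooks for this:

```typescript
// One entry per tool invocation in an agent run.
// Shape is illustrative; adapt to your agent framework.
interface ToolCallLog {
  step: number;
  tool: string;
  args: unknown;
  result?: unknown;
  error?: string;
  retried: boolean;
}

// Records every tool call in order, with its result or error and
// whether it was a retry, so "the AI called these tools in the
// wrong order" becomes an answerable question.
class ActionLogger {
  private entries: ToolCallLog[] = [];
  private step = 0;

  async run<T>(
    tool: string,
    args: unknown,
    fn: () => Promise<T>,
    maxRetries = 1
  ): Promise<T> {
    for (let attempt = 0; ; attempt++) {
      const entry: ToolCallLog = {
        step: this.step++,
        tool,
        args,
        retried: attempt > 0,
      };
      try {
        const result = await fn();
        entry.result = result;
        this.entries.push(entry);
        return result;
      } catch (e) {
        entry.error = String(e);
        this.entries.push(entry);
        if (attempt >= maxRetries) throw e;
      }
    }
  }

  trace(): ToolCallLog[] {
    return this.entries;
  }
}
```

Reading back `trace()` gives you the ordered history of tool calls, errors, and retries for one agent run.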
Layer 3: Connecting to Product Behavior
Link what the AI does to what your users do.
- Did users actually use the content the AI suggested?
- Which input patterns tend to produce good outputs?
- Where do users drop off?
To see this, you need to connect AI logs with your product's user behavior logs.
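The simplest version of that connection is a join on a shared request id. A minimal sketch with invented shapes (`Suggestion`, `UserEvent` are not a real product schema) that answers the first question above:

```typescript
// AI-side log and user-side event, joined by a shared request id.
// These shapes are illustrative, not a real product schema.
interface Suggestion {
  requestId: string;
  content: string;
}
interface UserEvent {
  requestId: string;
  action: "accepted" | "dismissed";
}

// Fraction of AI suggestions that users actually accepted.
function acceptanceRate(
  suggestions: Suggestion[],
  events: UserEvent[]
): number {
  const accepted = new Set(
    events.filter((e) => e.action === "accepted").map((e) => e.requestId)
  );
  const hits = suggestions.filter((s) => accepted.has(s.requestId)).length;
  return suggestions.length === 0 ? 0 : hits / suggestions.length;
}
```

In practice this join would run in your database, but the key design choice is the same: emit a shared id on both the AI log and the user event so they can be correlated later.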
What I'm Actually Doing
Here's what I'm implementing across kaizen-lab and related projects.
Storing User Behavior in the Database with Vercel Analytics Drains
By storing page views and custom events in the database, we can connect AI outputs with user behavior.
I wrote about this in detail here: "Connecting Vercel Analytics Drains to an internal database to have AI evaluate product behavior"
Exposing APIs via MCP
By making the product state accessible to external AI tools via MCP, we expand what the AI can observe. This is a form of observability in itself.
Catching Errors with Sentry on AI Workflows
When AI processes things automatically, errors can happen without anyone noticing immediately. By integrating Sentry, we catch unexpected exceptions.
I covered this in: "Automating Error Detection → Root Cause Analysis → Fix PRs with Sentry × AI Agents"
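The pattern is small: wrap each automated AI workflow step so exceptions are reported rather than silently swallowed. A sketch with an injectable `report` hook; in production that hook would be `Sentry.captureException`, but everything else here (`monitored`, the signature) is my illustration, not Sentry's API:

```typescript
// Wrap an AI workflow step so unexpected exceptions are reported
// instead of disappearing in an unattended background job.
// `report` is injectable; in production it would be
// Sentry.captureException (from @sentry/node).
async function monitored<T>(
  step: () => Promise<T>,
  report: (err: unknown) => void
): Promise<T | undefined> {
  try {
    return await step();
  } catch (err) {
    report(err); // e.g. Sentry.captureException(err)
    return undefined;
  }
}
```

Whether a failed step should return `undefined` or rethrow depends on the workflow; the part that matters for observability is that `report` always fires.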
Without Observability, You Can't Get to "Improvement"
In AI-powered development, it's common to stop at "it works, so we're done."
But "it works" and "it works correctly" are different things.
Only with observability can you reach the state where:
- You understand what the problem is
- You know how to improve it
- You can verify that your improvements are working
This is true whether you're using AI in development or running AI in production.
I think "if you can't see it, you can't fix it" is actually becoming even more important in the LLM era of development.
Summary
The observability mindset from SRE applies directly to LLM-based development.
What I try to keep in mind:
- Observability of the development process: Build a process where you can track what the AI did
- Observability of AI in production: Input/output logs, action logs, and connections to user behavior
- You can't improve what you can't observe: This principle doesn't change before or after AI
Observability isn't just "something enterprises do in their SRE teams."
The moment you start developing with AI, it becomes something indie developers need to think about too.