Introduction
"AI did something. But I have no idea what it actually did." Before that happens to you, observability is worth thinking about upfront.
When you develop using AI coding tools like Claude Code or Cursor, you can see "it works." But "why it worked," "where and how it made decisions," and "whether the implementation matches my intent" become much harder to see.
The same goes when you use AI in your product itself — whether that's calling LLM APIs, building AI agents, or integrating MCP tools. If you can't see what the AI returned or what decisions it made, you can't improve it.
There's a principle in the SRE world: "If you can't see it, you can't fix it."
This principle applies directly to AI-powered development and to products that run AI-driven features.
This article captures how I've come to think about observability in LLM-based development — and what I'm actually doing about it.
What Is Observability? (In the SRE Context)
Observability is one of the core concepts in SRE.
It refers to the degree to which you can understand a system's state from the outside. Generally, it's described as the "three pillars":
- Metrics: State expressed in numbers (latency, error rate, CPU usage, etc.)
- Logs: Records of what happened
- Tracing: Records of how requests flowed through the system
When all three are present, you can answer: "What's happening?" "Why is it happening?" and "Where is it happening?"
In AI-involved development, you need to think about this in two contexts:
- Observability of the development process: Observing what happens when you use AI tools to develop
- Observability of AI in production: Observing AI behavior within your product
1. When You Use AI in Your Development Process, Visibility Decreases
When you develop with Claude Code or similar tools, changes happen fast.
But "why did this happen?" becomes much harder to see.
For example:
- When you accept code the AI proposed as-is, you can't trace the reasoning behind that decision
- Subtle differences between what you intended and what got implemented are easy to miss
- Later, you can't remember "why was this changed that way?"
This isn't really a code quality issue — it's a process observability problem.
How to Handle It
- Write commit messages carefully: Even if you have the AI write them, always include the intent
- Document your PRs properly: Make the intent and context behind changes explicit
- Review AI-generated code as a habit: Don't stop at "it works"
- Write tests first: Declaring intent upfront lets you catch gaps between the AI's implementation and what you wanted
In short, the faster your AI moves, the more you need to deliberately create places where humans can observe what's happening.
2. Think About AI Observability in Production as Three Layers
When you embed AI in your product, observability is essential.
Layer 1: Input/Output Logging
At minimum, log what you send to the AI and what it returns.
- The prompt you provided
- The output it generated
- The model used
- Tokens consumed
- Cost
Even this alone lets you investigate later: "Why was that output wrong that one time?"
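The list above maps naturally onto a small wrapper. A minimal TypeScript sketch, where `callModel`, `LlmCallLog`, and the field names are illustrative assumptions rather than any vendor's actual API:

```typescript
// Minimal input/output log record for one LLM call.
// Field names are illustrative, not tied to a particular vendor's schema.
interface LlmCallLog {
  timestamp: string;
  model: string;
  prompt: string;
  output: string;
  tokensUsed: number;
  costUsd: number;
}

// Wrap any model call so every invocation leaves a record behind.
// `callModel` is a hypothetical client function; swap in your real SDK call.
async function loggedCall(
  callModel: (prompt: string) => Promise<{ output: string; tokens: number }>,
  model: string,
  prompt: string,
  costPerToken: number,
  sink: (entry: LlmCallLog) => void
): Promise<string> {
  const { output, tokens } = await callModel(prompt);
  sink({
    timestamp: new Date().toISOString(),
    model,
    prompt,
    output,
    tokensUsed: tokens,
    costUsd: tokens * costPerToken,
  });
  return output;
}
```

Point `sink` at whatever store you already have: stdout, a database table, or an analytics pipeline.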
Layer 2: Action Logging
Record what the AI did. This is especially important if you're using it in an agentic way.
- Which tools it called
- What each tool invocation returned
- Where errors occurred
- Whether any retries happened
Without this, you can't track down problems like "the AI called these tools in the wrong order."
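One way to capture all four of these points is a small per-run action logger. This is a sketch under assumed shapes (`ToolCallLog`, `ActionLogger` are my names, not a framework's); real agent frameworks usually have their own hooks for this:

```typescript
// One entry per tool invocation in an agent run.
// Shape is illustrative; adapt to your agent framework.
interface ToolCallLog {
  step: number;
  tool: string;
  args: unknown;
  result?: unknown;
  error?: string;
  retried: boolean;
}

// Records every tool call in order, with its result or error and
// whether it was a retry, so "the AI called these tools in the
// wrong order" becomes an answerable question.
class ActionLogger {
  private entries: ToolCallLog[] = [];
  private step = 0;

  async run<T>(
    tool: string,
    args: unknown,
    fn: () => Promise<T>,
    maxRetries = 1
  ): Promise<T> {
    for (let attempt = 0; ; attempt++) {
      const entry: ToolCallLog = {
        step: this.step++,
        tool,
        args,
        retried: attempt > 0,
      };
      try {
        const result = await fn();
        entry.result = result;
        this.entries.push(entry);
        return result;
      } catch (e) {
        entry.error = String(e);
        this.entries.push(entry);
        if (attempt >= maxRetries) throw e;
      }
    }
  }

  trace(): ToolCallLog[] {
    return this.entries;
  }
}
```

Reading back `trace()` gives you the ordered history of tool calls, errors, and retries for one agent run.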
Layer 3: Connecting to Product Behavior
Link what the AI does to what your users do.
- Did users actually use the content the AI suggested?
- Which input patterns tend to produce good outputs?
- Where do users drop off?
To see this, you need to connect AI logs with your product's user behavior logs.
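The simplest version of that connection is a join on a shared request id. A minimal sketch with invented shapes (`Suggestion`, `UserEvent` are not a real product schema) that answers the first question above:

```typescript
// AI-side log and user-side event, joined by a shared request id.
// These shapes are illustrative, not a real product schema.
interface Suggestion {
  requestId: string;
  content: string;
}
interface UserEvent {
  requestId: string;
  action: "accepted" | "dismissed";
}

// Fraction of AI suggestions that users actually accepted.
function acceptanceRate(
  suggestions: Suggestion[],
  events: UserEvent[]
): number {
  const accepted = new Set(
    events.filter((e) => e.action === "accepted").map((e) => e.requestId)
  );
  const hits = suggestions.filter((s) => accepted.has(s.requestId)).length;
  return suggestions.length === 0 ? 0 : hits / suggestions.length;
}
```

In practice this join would run in your database, but the key design choice is the same: emit a shared id on both the AI log and the user event so they can be correlated later.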
What I'm Actually Doing
Here's what I'm implementing across kaizen-lab and related projects.
Storing User Behavior in the Database with Vercel Analytics Drains
By storing page views and custom events in the database, we can connect AI outputs with user behavior.
I wrote about this in detail here: "Connecting Vercel Analytics Drains to an internal database to have AI evaluate product behavior"
Exposing APIs via MCP
By making the product state accessible to external AI tools via MCP, we expand what the AI can observe. This is a form of observability in itself.
Catching Errors with Sentry on AI Workflows
When AI processes things automatically, errors can happen without anyone noticing immediately. By integrating Sentry, we catch unexpected exceptions.
I covered this in: "Automating Error Detection → Root Cause Analysis → Fix PRs with Sentry × AI Agents"
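The pattern is small: wrap each automated AI workflow step so exceptions are reported rather than silently swallowed. A sketch with an injectable `report` hook; in production that hook would be `Sentry.captureException`, but everything else here (`monitored`, the signature) is my illustration, not Sentry's API:

```typescript
// Wrap an AI workflow step so unexpected exceptions are reported
// instead of disappearing in an unattended background job.
// `report` is injectable; in production it would be
// Sentry.captureException (from @sentry/node).
async function monitored<T>(
  step: () => Promise<T>,
  report: (err: unknown) => void
): Promise<T | undefined> {
  try {
    return await step();
  } catch (err) {
    report(err); // e.g. Sentry.captureException(err)
    return undefined;
  }
}
```

Whether a failed step should return `undefined` or rethrow depends on the workflow; the part that matters for observability is that `report` always fires.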
Without Observability, You Can't Get to "Improvement"
In AI-powered development, it's common to stop at "it works, so we're done."
But "it works" and "it works correctly" are different things.
Only with observability can you reach the state where:
- You understand what the problem is
- You know how to improve it
- You can verify that your improvements are working
This is true whether you're using AI in development or running AI in production.
I think "if you can't see it, you can't fix it" is actually becoming even more important in the LLM era of development.
Summary
The observability mindset from SRE applies directly to LLM-based development.
What I try to keep in mind:
- Observability of the development process: Build a process where you can track what the AI did
- Observability of AI in production: Input/output logs, action logs, and connections to user behavior
- You can't improve what you can't observe: This principle doesn't change before or after AI
Observability isn't just "something enterprises do in their SRE teams."
The moment you start developing with AI, it becomes something indie developers need to think about too.