DEV Community

Lakshmi Sravya Vedantham


I Built a Flight Recorder for AI Agents — Now I Can Replay Every Decision They Made

By some estimates, 90% of AI agents fail in production. When they do, you get... nothing: no trace, no replay, no step-by-step view of what went wrong. Debugging an agent is like debugging a black box.

I built llm-lens to fix this.

What is llm-lens?

A single Rust binary that sits between your code and any LLM API, records every call, and lets you replay sessions step-by-step in your terminal.

Your code / agent framework
        |
   http://localhost:4001
        |
    ┌──────────┐
    │ llm-lens │  ← records everything, forwards unchanged
    └────┬─────┘
         |
    LLM API (OpenAI, Anthropic, etc.)

Zero code changes. Swap one environment variable:

export OPENAI_BASE_URL=http://localhost:4001/v1

Every LLM call now gets recorded. Your code works exactly the same.
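
On the client side, the redirect is invisible. As a minimal sketch (assuming the standard OpenAI Python SDK, which reads `OPENAI_BASE_URL` from the environment; the helper below just mirrors that lookup so the redirect is easy to verify):

```python
import os

# The OpenAI SDK resolves its base URL from OPENAI_BASE_URL when set;
# this helper reproduces that resolution logic for illustration.
def resolve_base_url(default: str = "https://api.openai.com/v1") -> str:
    return os.environ.get("OPENAI_BASE_URL", default)

os.environ["OPENAI_BASE_URL"] = "http://localhost:4001/v1"
print(resolve_base_url())  # → http://localhost:4001/v1 (the llm-lens proxy)
```

Unset the variable and traffic goes straight to the real endpoint again; there is no code edit in either direction.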

Quick Start

git clone https://github.com/LakshmiSravyaVedantham/llm-lens.git
cd llm-lens
cargo build --release
cp config.example.toml config.toml
./target/release/llm-lens start

That is it. Every LLM call through port 4001 is now recorded.

The Killer Feature: Session Replay

Run llm-lens replay --last to step through your most recent agent session:

┌─ Session a3f8 -- 7 calls -- 12.4s total ────────────┐
│                                                     │
│  Step 3/7  [gpt-4]  tokens: 340>120  latency: 1.2s  │
│                                                     │
│  --- REQUEST ---                                    │
│  system: You are a coding assistant                 │
│  user: Fix the bug in auth.py line 42               │
│                                                     │
│  --- RESPONSE ---                                   │
│  The issue is in the token validation logic.        │
│  Here is the fix: ...                               │
│                                                     │
│  h: prev | l: next | q: quit | e: export            │
└─────────────────────────────────────────────────────┘

Navigate with h/l (prev/next), q (quit), e (export). See exactly what the agent sent, what it got back, and where it went wrong.

What You Get

Feature             What it does
----------------------------------------------------------------------
Session recording   Groups related calls by header or time window
Full trace capture  Stores request, response, tokens, latency, model
TUI replay          Step through any session call-by-call
Failure detection   Auto-flags 5xx errors, empty responses, error fields
JSON export         llm-lens export <id> for programmatic analysis
Markdown export     llm-lens export <id> --md for sharing in PRs/docs
llmux chaining      Chain behind llmux for caching + tracing together

Session Grouping

llm-lens automatically groups related calls into sessions:

  1. Explicit: Set X-Session-Id header in your agent code
  2. Time-based: Calls within 60 seconds of each other = same session
  3. Fallback: Each call is its own session

Most agent frameworks make multiple LLM calls per task. llm-lens groups them so you can see the full chain of reasoning.
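
The 60-second rule is easy to picture. Here is a hypothetical re-implementation of the time-window grouping in Python — the real logic lives in the Rust binary, and the exact cutoff and tie-breaking are assumptions based on the description above:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)  # calls closer than this share a session

def group_by_window(timestamps: list[datetime]) -> list[list[datetime]]:
    """Split call timestamps into sessions: a gap of 60s or more
    between consecutive calls starts a new session."""
    sessions: list[list[datetime]] = []
    current: list[datetime] = []
    for ts in sorted(timestamps):
        if current and ts - current[-1] >= WINDOW:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

calls = [datetime(2026, 3, 6, 14, 23, s) for s in (1, 5, 30)]
calls.append(datetime(2026, 3, 6, 14, 25, 0))  # 90s gap: new session
print(len(group_by_window(calls)))  # → 2
```

If your agent sets X-Session-Id explicitly, this heuristic never has to guess.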

Browse All Sessions

$ llm-lens sessions

Session      Calls    Latency      Tokens     Errors   Last Activity
----------------------------------------------------------------------
a3f8         7        12400ms      2840>710   -        2026-03-06 14:23:01
b2c1         3        4200ms       890>120    YES      2026-03-06 14:15:42
c9d4         12       28100ms      5200>1800  -        2026-03-06 13:50:18

Spot the session with errors. Replay it. Find the exact call that failed.
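
If you would rather triage from a script, the listing can be scanned for the Errors column. The row format below is copied from the sample output above — it is not a stable machine interface, so prefer the JSON export for anything serious:

```python
# Rows copied verbatim from the sample `llm-lens sessions` output.
listing = """a3f8         7        12400ms      2840>710   -        2026-03-06 14:23:01
b2c1         3        4200ms       890>120    YES      2026-03-06 14:15:42
c9d4         12       28100ms      5200>1800  -        2026-03-06 13:50:18"""

# Field 5 is the Errors flag; collect the session ids marked YES.
failed = [row.split()[0] for row in listing.splitlines()
          if row.split()[4] == "YES"]
print(failed)  # → ['b2c1'], the session worth replaying first
```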

Export for Sharing

# JSON for programmatic analysis
llm-lens export a3f8 > session.json

# Markdown for PRs and docs
llm-lens export a3f8 --md > session.md

The Markdown export produces a clean document with every step, request, and response — ready to paste into a GitHub issue or PR.
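
Once exported, the JSON is easy to mine. The schema below is an assumption for illustration — check the actual output of `llm-lens export <id>` for the real field names:

```python
import json

# Hypothetical export shape: a session id plus one record per call.
raw = """{
  "session": "a3f8",
  "calls": [
    {"model": "gpt-4", "latency_ms": 1200, "tokens_in": 340, "tokens_out": 120},
    {"model": "gpt-4", "latency_ms": 900, "tokens_in": 280, "tokens_out": 95}
  ]
}"""

data = json.loads(raw)
slowest = max(data["calls"], key=lambda c: c["latency_ms"])
total = sum(c["tokens_in"] + c["tokens_out"] for c in data["calls"])
print(slowest["latency_ms"], total)  # → 1200 835
```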

Chain with llmux

llm-lens pairs with llmux (the LLM gateway I built last week):

Your code -> llm-lens:4001 -> llmux:4000 -> OpenAI/Anthropic
                 |                  |
            records traces    caches + failover

Caching, failover, cost tracking, AND full session recording. Two binaries, zero code changes.

Why Rust?

  • Sub-millisecond proxy overhead (your agent does not slow down)
  • Single binary, no runtime dependencies
  • Thread-safe concurrent request handling
  • SQLite storage — everything stays on your machine

What is Next

This is the second tool in a trilogy:

  1. llmux — LLM gateway with failover, caching, cost tracking
  2. llm-lens (this project) — session recording and trace replay
  3. llm-guard (coming next) — runtime safety monitor for AI agents

Each tool is standalone. Together they form a complete AI agent infrastructure stack.

Try It

git clone https://github.com/LakshmiSravyaVedantham/llm-lens.git
cd llm-lens && cargo build --release

Star it if useful: github.com/LakshmiSravyaVedantham/llm-lens


llm-lens is MIT licensed and open source. Built with Rust, axum, tokio, ratatui, and rusqlite.

Top comments (1)

Hamza KONTE

The "zero code changes, swap one env var" DX is exactly right — the best observability tools are transparent proxies. Seeing the full session grouped by time window is especially valuable because most agent failures aren't in a single call; they're in the chain of reasoning across multiple calls.

One thing that compounds the debugging problem: even if you can replay what the agent sent, the prompt itself is often the culprit and it's hard to audit. A monolithic string prompt gives you no visibility into which section (the role definition? the constraints? the output format?) drove a bad decision. Structured prompts help here — if you define your prompt as typed blocks that compile to XML, you can correlate a bad agent output back to a specific block.

I've been working on something adjacent — flompt (flompt.dev) is a free open-source visual prompt builder that decomposes prompts into typed blocks and compiles to XML for Claude-style structured prompting. The combination of a tool like llm-lens (to see what was sent) and a structured prompt builder (to control what gets sent) seems like a natural pair for agent debugging workflows.

The trilogy concept is smart — llmux + llm-lens + llm-guard as composable infrastructure. Looking forward to llm-guard.