DEV Community

Jonathan Martin Paez


Building Lookspan: local-first observability & replay for LLM apps (v0.4.0)

I've been building Lookspan — a local-first observability and replay tool for apps that use LLMs — and wanted to share where it's at after the latest release.

The problem

When your app calls an LLM, what actually happened is mostly a black box: which prompt went out, what came back, which tools fired, and why the output changed between runs. Most observability stacks were built for plain HTTP services, not for the non-deterministic world of LLM calls.

What Lookspan does

  • Capture spans/traces of your LLM calls: prompts, responses, tool calls. It's MCP-native (Model Context Protocol), so it plugs into the existing MCP ecosystem instead of locking you in.
  • Replay & diff — re-run a captured trace and compare outputs side by side. Perfect for catching regressions when you tweak a prompt or swap a model.
  • LLM-as-judge — score outputs automatically instead of eyeballing them.
  • Local-first — your traces stay on your machine. No vendor, nothing leaves your laptop.
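To make "capture" and "replay & diff" concrete, here is a minimal sketch of what a captured span and a replay diff could look like. It is not Lookspan's code or schema: `LlmSpan`, `SpanDiff`, and `diffTraces` are hypothetical names, and spans are paired by position, which is a simplifying assumption.

```typescript
// Hypothetical record for one captured step of an LLM app. The field names are
// illustrative only; they are not Lookspan's actual trace schema.
interface LlmSpan {
  traceId: string;       // groups every span from one request or agent run
  spanId: string;
  parentSpanId?: string; // e.g. a tool call nested under the model call that requested it
  kind: "llm" | "tool";
  model?: string;
  input: string;         // prompt text or tool arguments
  output: string;        // completion text or tool result
  startMs: number;
  endMs: number;
}

// One position in the trace where the recorded and replayed outputs disagree.
interface SpanDiff {
  index: number;
  kind: LlmSpan["kind"];
  before: string;
  after: string;
}

// Pair spans by position and keep only those whose output changed.
// A span present in only one run is reported with "<missing>" on the other side.
function diffTraces(recorded: LlmSpan[], replayed: LlmSpan[]): SpanDiff[] {
  const diffs: SpanDiff[] = [];
  const n = Math.max(recorded.length, replayed.length);
  for (let i = 0; i < n; i++) {
    const a: LlmSpan | undefined = recorded[i];
    const b: LlmSpan | undefined = replayed[i];
    const before = a !== undefined ? a.output : "<missing>";
    const after = b !== undefined ? b.output : "<missing>";
    if (before !== after) {
      // If a is missing here, b must exist, otherwise both sides would be "<missing>".
      const kind = a !== undefined ? a.kind : b!.kind;
      diffs.push({ index: i, kind, before, after });
    }
  }
  return diffs;
}

// Usage: the replay changes the model's summary but the tool result stays the same.
const llmCall: LlmSpan = {
  traceId: "t1", spanId: "s1", kind: "llm", model: "model-a",
  input: "Summarize the ticket.", output: "Short summary.", startMs: 0, endMs: 420,
};
const toolCall: LlmSpan = {
  traceId: "t1", spanId: "s2", parentSpanId: "s1", kind: "tool",
  input: '{"query":"weather"}', output: "sunny", startMs: 430, endMs: 480,
};
const recorded: LlmSpan[] = [llmCall, toolCall];
const replayed: LlmSpan[] = [
  { ...llmCall, traceId: "t2", output: "A slightly longer summary." },
  { ...toolCall, traceId: "t2" },
];
console.log(diffTraces(recorded, replayed));
```

Pairing by position keeps the sketch short. A real replay tool would more likely match spans by their place in the parent/child tree, since an agent run can take a different number of steps on each run.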

New in v0.4.0: datasets & experiments

The headline addition is a real evaluation loop:

  1. Define a test set of inputs.
  2. Run a batch through your app.
  3. Judge the results (LLM-as-judge).
  4. See the aggregates — pass rates, diffs, trends.

It turns "I think the new prompt is better" into a number you can actually compare.
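The four steps above can be sketched as a small loop: dataset in, batch run, per-case verdicts, aggregate numbers out. This is an illustrative sketch, not Lookspan's API. `TestCase`, `runExperiment`, `containsJudge`, and the two prompt variants are hypothetical, and the judge is a deterministic substring check standing in for a real LLM-as-judge call so the example runs offline.

```typescript
// A dataset entry: an input plus a reference answer the judge can use.
interface TestCase {
  id: string;
  input: string;
  expected: string;
}

interface Verdict {
  caseId: string;
  output: string;
  pass: boolean;
  score: number; // 0..1
}

interface ExperimentResult {
  verdicts: Verdict[];
  passRate: number;  // fraction of cases judged as passing
  meanScore: number; // average judge score
}

// The app under test and the judge are injected, so the same loop works whether
// they call a real model or, as below, deterministic stand-ins.
type App = (input: string) => Promise<string>;
type Judge = (tc: TestCase, output: string) => Promise<{ pass: boolean; score: number }>;

async function runExperiment(dataset: TestCase[], app: App, judge: Judge): Promise<ExperimentResult> {
  const verdicts: Verdict[] = [];
  for (const tc of dataset) {
    const output = await app(tc.input);              // run the batch, one case at a time
    const { pass, score } = await judge(tc, output); // judge each result
    verdicts.push({ caseId: tc.id, output, pass, score });
  }
  // Aggregate into numbers that can be compared across runs.
  const n = verdicts.length;
  const passRate = n === 0 ? 0 : verdicts.filter((v) => v.pass).length / n;
  const meanScore = n === 0 ? 0 : verdicts.reduce((sum, v) => sum + v.score, 0) / n;
  return { verdicts, passRate, meanScore };
}

const dataset: TestCase[] = [
  { id: "c1", input: "capital of France", expected: "Paris" },
  { id: "c2", input: "capital of Japan", expected: "Tokyo" },
  { id: "c3", input: "capital of Italy", expected: "Rome" },
];

// Stand-in for an LLM judge: a case passes if the reference answer appears in the output.
// A real LLM-as-judge would send the input, output, and a rubric to a model instead.
const containsJudge: Judge = async (tc, output) => {
  const pass = output.toLowerCase().includes(tc.expected.toLowerCase());
  return { pass, score: pass ? 1 : 0 };
};

// Two fake "prompt variants" of the same app; the old one fumbles one case.
const answers: Record<string, string> = {
  "capital of France": "Paris",
  "capital of Japan": "Tokyo",
  "capital of Italy": "Rome",
};
const oldPrompt: App = async (input) =>
  input.endsWith("Italy") ? "I'm not sure." : `It is ${answers[input] ?? "unknown"}.`;
const newPrompt: App = async (input) => `The answer is ${answers[input] ?? "unknown"}.`;

async function main(): Promise<void> {
  const before = await runExperiment(dataset, oldPrompt, containsJudge);
  const after = await runExperiment(dataset, newPrompt, containsJudge);
  console.log(`old prompt pass rate: ${before.passRate.toFixed(2)}`); // 0.67
  console.log(`new prompt pass rate: ${after.passRate.toFixed(2)}`); // 1.00
}

void main();
```

Injecting the app and the judge as plain async functions keeps the loop agnostic to providers. The comparison then reduces to two numbers (0.67 vs 1.00 here) instead of a gut feeling.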

The road here

  • 0.2 — multi-agent capture
  • 0.3 — replay/diff + LLM-as-judge
  • 0.4 — datasets & experiments

Try it

npx lookspan

It's on npm: lookspan.

It's still early and I'd love feedback — what would you want from an LLM observability tool you can run entirely locally?
