DEV Community

HAESEONG JEON
HAESEONG JEON

Posted on

Spanlens

Spanlens is an open-source (MIT) LLM observability platform that lets developers monitor every call their application makes to OpenAI, Anthropic, Gemini, Mistral, OpenRouter, Azure OpenAI, or a local Ollama model. Integration takes one line: swap your client's baseURL to the Spanlens proxy, or run "npx @spanlens/cli init" and the wizard rewrites your code automatically. From that moment, every request is recorded with its model, token counts, latency, cost, and full prompt and response body, with streaming responses reconstructed automatically.

The dashboard turns that raw log into operational insight. Cost tracking breaks spend down per request, per model, and per end user, and parses prompt-cache tokens separately so you see real cache savings rather than sticker price. Agent tracing visualizes multi-step workflows as Gantt waterfalls and node-and-edge graphs, highlighting the critical path so you can find the slowest dependency chain in a fan-out. Anomaly detection flags 3-sigma deviations in latency, cost, or error rate against a rolling 7-day baseline with root-cause hints. Alerts on budget, error rate, and p95 latency are delivered to Email, Slack, or Discord.

Spanlens goes beyond passive logging. A regex-based PII and prompt-injection scanner inspects request and response bodies and can block injections at the proxy. The savings engine spots calls that match a cheaper model's profile (for example, a gpt-4o call that looks like a classification task) and estimates the monthly saving from switching. Prompt versioning with A/B experiments compares versions on latency, cost, and error rate using Welch's t-test for statistical significance, and an LLM-as-judge evaluation framework (judge with OpenAI, Anthropic, or Gemini) scores outputs against rubric anchors, with human agreement measured by Pearson r or Cohen's kappa. Reusable datasets power offline evals and regression checks.

Top comments (0)