The Langfuse migration that cost us a sprint: how I now budget LLM observability

#llmops #langfuse #opentelemetry #observability

We moved off our first tracer in month eight. The migration took one engineer the better part of a sprint, because the trace data lived in a schema we did not own. Nobody costed that line item on day one. I am writing this so you can.

I run reliability for a small team shipping LLM features. When the pager goes off at 2am, I do not care which dashboard is prettiest. I care about two numbers: what this tool costs me per month, and what it costs me to leave. Those two numbers are the whole story, and they are almost never on the comparison page.

So here are six Langfuse alternatives. For each I tracked both numbers: the monthly bill on the invoice, and the exit bill that only shows up the day you migrate. I compared Helicone, Arize Phoenix, LangSmith, Braintrust, Laminar, and Future AGI traceAI. They all trace LLM calls (prompts, tokens, retrieval spans, latency). The axis that decides your exit cost is whether the trace format is OpenTelemetry-native or a vendor schema. Get that wrong and the migration bill lands later, with interest.

The cost nobody puts on the pricing page

Your monthly invoice is the visible cost. The exit cost is the invisible one: re-instrumenting the app, rebuilding integrations, and losing historical traces when the schema does not travel. If your spans are OTel, the exit cost trends toward zero because the data is portable by construction. If they are proprietary, you are paying a deferred bill every month you stay. Sort on that first.
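
To make "portable by construction" concrete, here is a minimal sketch, assuming your app exports over OTLP and is configured through the standard OpenTelemetry environment variables. The variable names come from the OpenTelemetry specification; the service name, both backend URLs, the header key, and the helper function are hypothetical placeholders of mine, not real endpoints or anyone's API.

```python
import os

# Standard OpenTelemetry exporter settings (names from the OTel spec).
# Service name, URLs, and header values are hypothetical placeholders.
CURRENT_BACKEND = {
    "OTEL_SERVICE_NAME": "llm-api",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.old-vendor.example:4318",
    "OTEL_EXPORTER_OTLP_HEADERS": "x-api-key=old-key",
}
NEW_BACKEND = {
    "OTEL_SERVICE_NAME": "llm-api",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.new-vendor.example:4318",
    "OTEL_EXPORTER_OTLP_HEADERS": "x-api-key=new-key",
}


def migration_diff(old, new):
    """Return only the settings whose values differ between two backends."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


changed = migration_diff(CURRENT_BACKEND, NEW_BACKEND)
print(sorted(changed))

# Switching backends: export the new settings before the OTel SDK starts.
os.environ.update(NEW_BACKEND)
```

In this sketch the diff is two settings and no application code. Historical traces still have to be exported or replayed separately; the point is only that the instrumentation itself does not change.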

Helicone. The gateway-first option. You proxy model calls through it and get logging, cost tracking, and analytics with almost no code change. Apache-2.0, self-hostable, roughly 5,800 GitHub stars as of June 2026. On pure observability ergonomics this is one of the strongest picks, and the proxy model means low setup cost. The thing to watch at scale: a gateway in the request path is one more hop to reason about when latency spikes.

Arize Phoenix. The open-source OTel option. Tracing plus evals, self-hostable, around 10,000 stars as of June 2026. Because it is OTel-native, your exit cost stays low. The commercial Arize AX tier adds ML monitoring and enterprise features. If portability is your top line, this, Laminar, and traceAI are the three that keep the invisible bill near zero.

LangSmith. The LangChain-native option. If you live in LangChain or LangGraph, instrumentation is automatic and the developer experience is strong. Proprietary and closed-source, tightly coupled to the LangChain ecosystem. This one carries the most lock-in of the group: the day-one cost is the lowest, the day-200 cost is the highest. Worth it only if you are certain you are never leaving LangChain.

Braintrust. The polished SaaS option. One of the better eval and observability experiences, and the people who do not page (PMs, leads) tend to like the UI. Proprietary trace schema, closed-source, managed by default. Even on enterprise deployments you operate inside their format, so the exit cost stays on the books.

Laminar. The newer open-source entrant. OTel-based tracing with evals, smaller and younger than Phoenix, in the low thousands of stars as of June 2026. Lower lock-in on the same OTel logic. The cost to weigh here is maturity, not portability: a smaller project means fewer battle-tested edges, which matters more for an on-call rotation than for a demo.

Future AGI traceAI. The instrumentation-layer option. Worth being precise here, because it is not the same kind of thing as the others. traceAI is not an observability dashboard. It is an Apache-2.0, OpenTelemetry-native instrumentation SDK (pip install fi-instrumentation-otel) that emits portable OTel spans for 50-plus frameworks as of June 2026. The spans go wherever you point your collector. Future AGI's broader platform adds evals on top (50-plus metrics under one evaluate() call as of June 2026), but on raw observability ergonomics Helicone and Phoenix are more mature dashboards. Where traceAI earns its place on this list is the exit-cost column: because it speaks OTel, the cost of leaving is roughly the cost of changing a collector endpoint. Code: github.com/future-agi/traceAI.

The two numbers, side by side

Visible cost is easy: read the pricing page, multiply by your span volume, done. Invisible cost is the one that bit me. The open-source OTel tools (Phoenix, Laminar, traceAI as the instrumentation layer) keep your exit near free. The proprietary ones (LangSmith, Braintrust) front-load convenience and back-load the migration. Helicone sits in between: open-source and self-hostable, with a proxy hop to account for. Pick the lock-in profile you can afford in month eight, then argue about features.
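
The arithmetic is simple enough to sketch. This is a rough model, assuming per-span pricing and an exit cost measured in engineer-days; every number below is invented for illustration and is not any vendor's actual price. The class and function names are mine.

```python
from dataclasses import dataclass


@dataclass
class ToolCost:
    name: str
    price_per_million_spans: float  # visible: read off the pricing page
    exit_engineer_days: float       # invisible: your own migration estimate


def total_cost(tool, spans_per_month, months, engineer_day_rate):
    """Invoice total over the stay, plus the one-time bill for leaving."""
    visible = tool.price_per_million_spans * spans_per_month / 1_000_000 * months
    invisible = tool.exit_engineer_days * engineer_day_rate
    return visible + invisible


# Hypothetical inputs: a cheaper tool with a vendor schema versus a
# pricier OTel-native one, 20M spans a month, leaving in month eight.
vendor_schema = ToolCost("vendor-schema", price_per_million_spans=40.0, exit_engineer_days=8)
otel_native = ToolCost("otel-native", price_per_million_spans=50.0, exit_engineer_days=1)

for tool in (vendor_schema, otel_native):
    cost = total_cost(tool, spans_per_month=20_000_000, months=8, engineer_day_rate=800.0)
    print(f"{tool.name}: {cost:.0f}")
```

With these made-up inputs the tool with the cheaper invoice ends up the more expensive one once the exit bill is added. Swap in your own span volume and day rate before drawing any conclusion.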

What I'd page on

If I were standing this up again, here is the dashboard and alert set I would build before I cared about anything else:

Trace export success rate below 99 percent over 5 minutes. A silent collector drop is invisible until you need the trace you do not have.

Span ingestion cost per day trending above your budget line. Token spend gets watched; span volume does not, and it scales with traffic too.

P99 added latency from the tracing path above your SLO budget. If the tracer (or proxy) adds tail latency, that is a reliability cost masquerading as observability.

Percent of spans in a portable (OTel) format. This is your exit-cost gauge. If it drifts down because someone added a proprietary integration, you just took on migration debt. Page on it before it compounds.

Dropped-trace rate during incidents specifically. Tracing tends to fail exactly when load is highest, which is exactly when you need it. Alert on the correlation between drops and load, not just the absolute drop rate.

Build those five first. The dashboard you actually page on is cheaper than the migration you did not plan for.