Alex Spinov
Langfuse Has a Free LLM Observability Platform That Shows You Where AI Costs Go

Your AI app burns $500/month on OpenAI. But which prompts cost the most? Which ones fail? Langfuse gives you full observability for LLM applications — traces, costs, quality scores, and prompt management.

The Problem

LLM applications are expensive and unpredictable:

  • Which prompts cost the most?
  • Which responses are low quality?
  • How much does each feature cost in API calls?
  • Are there regressions after prompt changes?

Langfuse answers all of these.

Setup

npm install langfuse
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: "pk-...",
  secretKey: "sk-...",
  baseUrl: "https://cloud.langfuse.com", // or self-hosted
});

Tracing LLM Calls

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Create a trace for one user interaction
const trace = langfuse.trace({ name: "chat-response", userId: "user-123" });

// Track a generation (LLM call)
const generation = trace.generation({
  name: "gpt-4-response",
  model: "gpt-4",
  input: [{ role: "user", content: "Explain quantum computing" }],
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Explain quantum computing" }],
});

generation.end({
  output: response.choices[0].message.content,
  usage: {
    promptTokens: response.usage.prompt_tokens,
    completionTokens: response.usage.completion_tokens,
  },
});

// Events are batched in the background; flush before a short-lived process exits
await langfuse.flushAsync();
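
When instrumenting calls by hand like this, a small timing wrapper keeps latency reporting consistent across call sites. A minimal sketch — the `withTiming` helper and its shape are my own, not part of the Langfuse SDK:

```typescript
// Wrap any async call and measure its wall-clock duration, so the timing
// can be attached to the observation alongside the output.
async function withTiming<T>(
  fn: () => Promise<T>
): Promise<{ result: T; durationMs: number }> {
  const start = Date.now();
  const result = await fn();
  return { result, durationMs: Date.now() - start };
}

// Hypothetical usage with the generation from above:
// const { result, durationMs } = await withTiming(() =>
//   openai.chat.completions.create({ model: "gpt-4", messages })
// );
// generation.end({
//   output: result.choices[0].message.content,
//   metadata: { durationMs },
// });
```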

Automatic Integration (OpenAI)

import { observeOpenAI } from "langfuse";
import OpenAI from "openai";

const openai = observeOpenAI(new OpenAI(), { generationName: "chat" });

// All OpenAI calls are automatically traced!
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
});

Zero manual instrumentation — all calls, costs, and latencies are tracked.

What You See in Langfuse Dashboard

  • Cost per trace: Exact $ cost of each user interaction
  • Latency: How long each LLM call takes
  • Token usage: Input vs output tokens
  • Quality scores: Rate responses manually or automatically
  • Prompt versions: Compare different prompt versions
  • User analytics: Cost and usage per user
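
The per-trace dollar figure is just token counts multiplied by per-token prices. A back-of-envelope sketch of the arithmetic — the prices below are illustrative placeholders, not current OpenAI rates, so check the pricing page before relying on them:

```typescript
// Estimate the USD cost of one call from token usage and per-1K-token prices.
// PRICING_PER_1K holds placeholder numbers, not live rates.
const PRICING_PER_1K = {
  "gpt-4": { input: 0.03, output: 0.06 },
};

function estimateCostUsd(
  model: keyof typeof PRICING_PER_1K,
  promptTokens: number,
  completionTokens: number
): number {
  const p = PRICING_PER_1K[model];
  return (promptTokens / 1000) * p.input + (completionTokens / 1000) * p.output;
}

// e.g. 1,000 prompt tokens + 500 completion tokens on gpt-4:
// 1.0 * 0.03 + 0.5 * 0.06 = 0.06 USD
```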

Prompt Management

// Fetch prompt from Langfuse (versioned, A/B testable)
const prompt = await langfuse.getPrompt("summarize-article");

const response = await openai.chat.completions.create({
  model: prompt.config.model,
  messages: [{ role: "system", content: prompt.compile({ topic: "AI" }) }],
});

Change prompts without code deployments.
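
Under the hood, `prompt.compile()` is essentially variable substitution into the stored template, which uses `{{variable}}` placeholders. A toy illustration of the idea — `compileTemplate` is my own sketch, not the SDK's implementation:

```typescript
// Toy template compilation: replace {{name}} placeholders with values.
function compileTemplate(
  template: string,
  vars: Record<string, string>
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in vars ? vars[key] : match // leave unknown placeholders untouched
  );
}

// compileTemplate("Summarize this article about {{topic}}.", { topic: "AI" })
// → "Summarize this article about AI."
```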

Evaluations

// Attach quality scores to a trace: manually, from user feedback, or via an LLM-as-judge pipeline
trace.score({ name: "relevance", value: 0.95 });
trace.score({ name: "helpfulness", value: 0.88 });

// Or use Langfuse's built-in evaluators
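
For LLM-as-judge, the usual pattern is to ask a judge model for a numeric rating, normalize it, and pass the result to `trace.score()`. A sketch of the parsing step — the `parseJudgeScore` helper and the "N/10" reply format are my assumptions, not Langfuse APIs:

```typescript
// Parse a judge model's reply like "Rating: 8/10" into a 0–1 score.
// Returns null when no valid rating is found.
function parseJudgeScore(reply: string): number | null {
  const m = reply.match(/(\d+(?:\.\d+)?)\s*\/\s*10/);
  if (!m) return null;
  const raw = parseFloat(m[1]);
  if (raw < 0 || raw > 10) return null;
  return raw / 10;
}

// const value = parseJudgeScore(judgeReply);
// if (value !== null) trace.score({ name: "relevance", value });
```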

Self-Host

git clone https://github.com/langfuse/langfuse.git
cd langfuse && docker compose up -d

Free Tier (Cloud)

  • 50K observations/month
  • Unlimited team members
  • 30-day data retention
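
Whether 50K observations/month is enough depends on how many observations each request emits (a trace plus its spans and generations all count). A quick budgeting sketch — the per-request observation count is an assumption about your app, not a Langfuse figure:

```typescript
// Estimate monthly observation volume and compare it to the free-tier cap.
function monthlyObservations(
  requestsPerDay: number,
  observationsPerRequest: number
): number {
  return requestsPerDay * observationsPerRequest * 30;
}

const FREE_TIER_CAP = 50_000;

// e.g. 500 requests/day × 3 observations each → 45,000/month, under the cap.
```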

Building AI applications? I create developer tools and data solutions. Email spinov001@gmail.com or check my Apify tools.
