Alex Spinov
Langfuse Has a Free LLM Observability Platform That Shows You Where AI Costs Go

Your AI app burns $500/month on OpenAI. But which prompts cost the most? Which ones fail? Langfuse gives you full observability for LLM applications — traces, costs, quality scores, and prompt management.

The Problem

LLM applications are expensive and unpredictable:

  • Which prompts cost the most?
  • Which responses are low quality?
  • How much does each feature cost in API calls?
  • Are there regressions after prompt changes?

Langfuse answers all of these.

Setup

npm install langfuse
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: "pk-...",
  secretKey: "sk-...",
  baseUrl: "https://cloud.langfuse.com", // or self-hosted
});

Tracing LLM Calls

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Create a trace for one user interaction
const trace = langfuse.trace({ name: "chat-response", userId: "user-123" });

// Track a generation (LLM call)
const generation = trace.generation({
  name: "gpt-4-response",
  model: "gpt-4",
  input: [{ role: "user", content: "Explain quantum computing" }],
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Explain quantum computing" }],
});

generation.end({
  output: response.choices[0].message.content,
  usage: {
    promptTokens: response.usage.prompt_tokens,
    completionTokens: response.usage.completion_tokens,
  },
});

// Events are batched in the background; flush before a short-lived process exits
await langfuse.flushAsync();
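
When instrumenting calls by hand like this, a small timing wrapper keeps latency reporting consistent across call sites. A minimal sketch — the `withTiming` helper and its shape are my own, not part of the Langfuse SDK:

```typescript
// Wrap any async call and measure its wall-clock duration, so the timing
// can be attached to the observation alongside the output.
async function withTiming<T>(
  fn: () => Promise<T>
): Promise<{ result: T; durationMs: number }> {
  const start = Date.now();
  const result = await fn();
  return { result, durationMs: Date.now() - start };
}

// Hypothetical usage with the generation from above:
// const { result, durationMs } = await withTiming(() =>
//   openai.chat.completions.create({ model: "gpt-4", messages })
// );
// generation.end({
//   output: result.choices[0].message.content,
//   metadata: { durationMs },
// });
```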

Automatic Integration (OpenAI)

import { observeOpenAI } from "langfuse";
import OpenAI from "openai";

const openai = observeOpenAI(new OpenAI(), { generationName: "chat" });

// All OpenAI calls are automatically traced!
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
});

Zero manual instrumentation — all calls, costs, and latencies are tracked.

What You See in Langfuse Dashboard

  • Cost per trace: Exact $ cost of each user interaction
  • Latency: How long each LLM call takes
  • Token usage: Input vs output tokens
  • Quality scores: Rate responses manually or automatically
  • Prompt versions: Compare different prompt versions
  • User analytics: Cost and usage per user
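
The per-trace dollar figure is just token counts multiplied by per-token prices. A back-of-envelope sketch of the arithmetic — the prices below are illustrative placeholders, not current OpenAI rates, so check the pricing page before relying on them:

```typescript
// Estimate the USD cost of one call from token usage and per-1K-token prices.
// PRICING_PER_1K holds placeholder numbers, not live rates.
const PRICING_PER_1K = {
  "gpt-4": { input: 0.03, output: 0.06 },
};

function estimateCostUsd(
  model: keyof typeof PRICING_PER_1K,
  promptTokens: number,
  completionTokens: number
): number {
  const p = PRICING_PER_1K[model];
  return (promptTokens / 1000) * p.input + (completionTokens / 1000) * p.output;
}

// e.g. 1,000 prompt tokens + 500 completion tokens on gpt-4:
// 1.0 * 0.03 + 0.5 * 0.06 = 0.06 USD
```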

Prompt Management

// Fetch prompt from Langfuse (versioned, A/B testable)
const prompt = await langfuse.getPrompt("summarize-article");

const response = await openai.chat.completions.create({
  model: prompt.config.model,
  messages: [{ role: "system", content: prompt.compile({ topic: "AI" }) }],
});

Change prompts without code deployments.
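
Under the hood, `prompt.compile()` is essentially variable substitution into the stored template, which uses `{{variable}}` placeholders. A toy illustration of the idea — `compileTemplate` is my own sketch, not the SDK's implementation:

```typescript
// Toy template compilation: replace {{name}} placeholders with values.
function compileTemplate(
  template: string,
  vars: Record<string, string>
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in vars ? vars[key] : match // leave unknown placeholders untouched
  );
}

// compileTemplate("Summarize this article about {{topic}}.", { topic: "AI" })
// → "Summarize this article about AI."
```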

Evaluations

// Attach quality scores to a trace: manually, from user feedback, or via an LLM-as-judge pipeline
trace.score({ name: "relevance", value: 0.95 });
trace.score({ name: "helpfulness", value: 0.88 });

// Or use Langfuse's built-in evaluators
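
For LLM-as-judge, the usual pattern is to ask a judge model for a numeric rating, normalize it, and pass the result to `trace.score()`. A sketch of the parsing step — the `parseJudgeScore` helper and the "N/10" reply format are my assumptions, not Langfuse APIs:

```typescript
// Parse a judge model's reply like "Rating: 8/10" into a 0–1 score.
// Returns null when no valid rating is found.
function parseJudgeScore(reply: string): number | null {
  const m = reply.match(/(\d+(?:\.\d+)?)\s*\/\s*10/);
  if (!m) return null;
  const raw = parseFloat(m[1]);
  if (raw < 0 || raw > 10) return null;
  return raw / 10;
}

// const value = parseJudgeScore(judgeReply);
// if (value !== null) trace.score({ name: "relevance", value });
```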

Self-Host

git clone https://github.com/langfuse/langfuse.git
cd langfuse && docker compose up -d

Free Tier (Cloud)

  • 50K observations/month
  • Unlimited team members
  • 30-day data retention
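
Whether 50K observations/month is enough depends on how many observations each request emits (a trace plus its spans and generations all count). A quick budgeting sketch — the per-request observation count is an assumption about your app, not a Langfuse figure:

```typescript
// Estimate monthly observation volume and compare it to the free-tier cap.
function monthlyObservations(
  requestsPerDay: number,
  observationsPerRequest: number
): number {
  return requestsPerDay * observationsPerRequest * 30;
}

const FREE_TIER_CAP = 50_000;

// e.g. 500 requests/day × 3 observations each → 45,000/month, under the cap.
```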

Building AI applications? I create developer tools and data solutions. Email spinov001@gmail.com or check my Apify tools.
