Your AI app burns $500/month on OpenAI. But which prompts cost the most? Which ones fail? Langfuse gives you full observability for LLM applications — traces, costs, quality scores, and prompt management.
The Problem
LLM applications are expensive and unpredictable:
- Which prompts cost the most?
- Which responses are low quality?
- How much does each feature cost in API calls?
- Are there regressions after prompt changes?
Langfuse answers all of these.
Setup
```bash
npm install langfuse
```
```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: "pk-...",
  secretKey: "sk-...",
  baseUrl: "https://cloud.langfuse.com", // or self-hosted
});
```
Tracing LLM Calls
```typescript
// Create a trace
const trace = langfuse.trace({ name: "chat-response", userId: "user-123" });

// Track a generation (LLM call)
const generation = trace.generation({
  name: "gpt-4-response",
  model: "gpt-4",
  input: [{ role: "user", content: "Explain quantum computing" }],
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Explain quantum computing" }],
});

generation.end({
  output: response.choices[0].message.content,
  usage: {
    // usage can be undefined in the OpenAI Node SDK, so access it safely
    promptTokens: response.usage?.prompt_tokens,
    completionTokens: response.usage?.completion_tokens,
  },
});
```
Automatic Integration (OpenAI)
```typescript
import { observeOpenAI } from "langfuse";
import OpenAI from "openai";

const openai = observeOpenAI(new OpenAI(), { generationName: "chat" });

// All OpenAI calls are automatically traced!
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
});
```
Zero manual instrumentation — all calls, costs, and latencies are tracked.
What You See in Langfuse Dashboard
- Cost per trace: Exact $ cost of each user interaction
- Latency: How long each LLM call takes
- Token usage: Input vs output tokens
- Quality scores: Rate responses manually or automatically
- Prompt versions: Compare different prompt versions
- User analytics: Cost and usage per user
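The cost-per-trace number is simple arithmetic over token counts and per-model prices. Here's a minimal sketch of that calculation in TypeScript — the `estimateCost` helper and the `PRICES` table are illustrative (the figures below are example per-1K-token prices, not authoritative; check your provider's current pricing):

```typescript
// Illustrative per-1K-token prices in USD (assumption -- verify with your provider).
const PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4": { input: 0.03, output: 0.06 },
  "gpt-3.5-turbo": { input: 0.0005, output: 0.0015 },
};

// Estimate the dollar cost of one generation from its token counts --
// the same kind of computation Langfuse surfaces per trace.
function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No pricing configured for model: ${model}`);
  return (promptTokens / 1000) * p.input + (completionTokens / 1000) * p.output;
}
```

With the prices above, a gpt-4 call using 1,000 prompt tokens and 500 completion tokens would cost about $0.06 — multiply that by thousands of daily traces and you see why per-prompt visibility matters.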
Prompt Management
```typescript
// Fetch prompt from Langfuse (versioned, A/B testable)
const prompt = await langfuse.getPrompt("summarize-article");

const response = await openai.chat.completions.create({
  model: prompt.config.model,
  messages: [{ role: "system", content: prompt.compile({ topic: "AI" }) }],
});
```
Change prompts without code deployments.
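Conceptually, `compile()` is variable substitution over a versioned template. A minimal sketch of the idea, assuming a `{{variable}}` placeholder syntax (the `compilePrompt` helper below is hypothetical, not the Langfuse implementation):

```typescript
// Substitute {{name}} placeholders in a prompt template with provided values.
// Unknown placeholders are left intact so missing variables are easy to spot.
function compilePrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => vars[key] ?? `{{${key}}}`);
}

const rendered = compilePrompt("Summarize this {{topic}} article in 3 bullets.", {
  topic: "AI",
});
// rendered: "Summarize this AI article in 3 bullets."
```

Because the template lives in Langfuse rather than in your codebase, editing it (or rolling back a bad version) requires no deployment.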
Evaluations
```typescript
// Attach quality scores to a trace (e.g. from an LLM-as-judge evaluation)
trace.score({ name: "relevance", value: 0.95 });
trace.score({ name: "helpfulness", value: 0.88 });

// Or use Langfuse's built-in evaluators
```
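In an LLM-as-judge setup, you ask a judge model to grade a response on a rubric and convert its verdict into the 0–1 scores that `trace.score()` expects. A sketch of that conversion step, assuming the judge is prompted to return JSON with 1–5 ratings (the `parseJudgeVerdict` helper and verdict shape are assumptions for illustration):

```typescript
// Shape we assume the judge model was prompted to return.
interface JudgeVerdict {
  relevance: number;   // 1-5 rating
  helpfulness: number; // 1-5 rating
}

// Parse the judge's JSON reply and normalize each 1-5 rating to a 0-1 score,
// clamping out-of-range values so bad judge output can't produce invalid scores.
function parseJudgeVerdict(raw: string): { name: string; value: number }[] {
  const verdict = JSON.parse(raw) as JudgeVerdict;
  const normalize = (n: number) => Math.min(Math.max((n - 1) / 4, 0), 1);
  return [
    { name: "relevance", value: normalize(verdict.relevance) },
    { name: "helpfulness", value: normalize(verdict.helpfulness) },
  ];
}
```

Each entry can then be passed straight to `trace.score(...)`, so every production response carries machine-generated quality signals.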
Self-Host
```bash
git clone https://github.com/langfuse/langfuse.git
cd langfuse && docker compose up -d
```
Free Tier (Cloud)
- 50K observations/month
- Unlimited team members
- 30-day data retention
Building AI applications? I create developer tools and data solutions. Email spinov001@gmail.com or check my Apify tools.