Your AI App Is a Black Box
Your LLM app works in testing. In production, users complain about hallucinations, slow responses, and wrong answers. But you cannot see what happened because LLM calls are opaque — input goes in, output comes out.
Langfuse: Observability for LLM Applications
Langfuse is an open-source LLM engineering platform. Trace every LLM call, measure quality, manage prompts, and debug issues — all in one dashboard.
Free Options
- Self-hosted: 100% free, unlimited traces
- Cloud: Free tier with 50K observations/month
What You See
For every LLM call, Langfuse captures:
- Input prompt (full)
- Output response (full)
- Token usage and cost
- Latency (time to first token, total)
- Model used
- User feedback scores
- Custom metadata
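Conceptually, the fields above amount to one structured record per call. A toy sketch of that shape — the field names here are illustrative, not Langfuse's actual schema:

```python
# Toy model of the data a single traced LLM call carries.
# Field names are illustrative, not Langfuse's actual schema.
from dataclasses import dataclass, field


@dataclass
class Observation:
    input: str              # full input prompt
    output: str             # full output response
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    metadata: dict = field(default_factory=dict)  # custom key-value pairs

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


obs = Observation(
    input="Explain quantum computing",
    output="Quantum computing uses qubits...",
    model="gpt-4",
    prompt_tokens=12,
    completion_tokens=150,
    latency_ms=840.0,
)
```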
Add Tracing in 3 Lines
Python (OpenAI)
```python
from langfuse.openai import openai  # drop-in replacement for the OpenAI SDK

# That is it. Every OpenAI call is now traced.
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
```
LangChain
```python
from langfuse.callback import CallbackHandler

handler = CallbackHandler()
chain.invoke({"input": "query"}, config={"callbacks": [handler]})
# Every chain step is now traced
```
Why Teams Need This
1. Cost Tracking
- Total spend this week: $147.23
- Most expensive endpoint: /api/summarize ($89)
- Average cost per request: $0.03
- GPT-4 calls: 2,341 ($120)
- GPT-3.5 calls: 15,000 ($27)
Know exactly where your AI budget goes.
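Roll-ups like the numbers above come from multiplying per-call token counts by per-model rates. A minimal sketch — the prices here are assumed for illustration, not current OpenAI rates:

```python
# Sketch of a per-model cost roll-up from token usage.
# Prices are illustrative placeholders, not current OpenAI rates.
PRICE_PER_1K = {  # model -> (input, output) USD per 1K tokens
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0005, 0.0015),
}


def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    p_in, p_out = PRICE_PER_1K[model]
    return prompt_tokens / 1000 * p_in + completion_tokens / 1000 * p_out


# Each tuple: (model, prompt_tokens, completion_tokens)
calls = [
    ("gpt-4", 1200, 400),
    ("gpt-3.5-turbo", 800, 300),
]
total = sum(call_cost(m, p, c) for m, p, c in calls)
```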
2. Quality Scores
Attach user feedback to traces:
```python
langfuse.score(
    trace_id=trace.id,
    name="user-feedback",
    value=1,  # thumbs up
)
```
Track quality over time. Find which prompts produce bad results.
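Langfuse aggregates these scores in its dashboard; as a toy offline version, assuming you exported (prompt name, score) pairs, finding weak prompts is just grouping by name:

```python
# Toy aggregation: approval rate per prompt from exported feedback scores.
# Assumes scores were exported as (prompt_name, value) pairs, 1 = up, 0 = down.
from collections import defaultdict

scores = [
    ("summarize-article", 1),
    ("summarize-article", 0),
    ("summarize-article", 1),
    ("answer-question", 0),
]


def approval_rate(scores):
    totals = defaultdict(lambda: [0, 0])  # name -> [thumbs up, total]
    for name, value in scores:
        totals[name][0] += value
        totals[name][1] += 1
    return {name: ups / count for name, (ups, count) in totals.items()}
```

Prompts with a low approval rate are the ones to inspect first.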
3. Prompt Management
Version your prompts in Langfuse instead of hardcoding them:
```python
prompt = langfuse.get_prompt("summarize-article", version=3)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": prompt.compile(max_words=200)}],
)
```
Change prompts without redeploying code.
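Conceptually, `compile` is template substitution: Langfuse prompts use `{{variable}}` placeholders that get filled at call time. A toy stand-in, not Langfuse's implementation:

```python
# Toy stand-in for prompt compilation: substitute {{variable}} placeholders.
# Not Langfuse's implementation -- just the concept.
template = "Summarize the article in at most {{max_words}} words."


def compile_prompt(template: str, **variables) -> str:
    out = template
    for key, value in variables.items():
        out = out.replace("{{" + key + "}}", str(value))
    return out


compiled = compile_prompt(template, max_words=200)
```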
4. Evaluation Pipelines
Run automated evals on your LLM outputs:
- Factuality checks
- Toxicity detection
- Relevance scoring
- Custom evaluators
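A custom evaluator is just a function from an output to a score. As a minimal sketch, here is a crude relevance check based on keyword overlap — real pipelines typically use an LLM judge or embedding similarity instead:

```python
# Minimal custom evaluator sketch: relevance as keyword overlap between
# question and answer. Crude by design; real evals use LLM judges or
# embedding similarity.
def relevance_score(question: str, answer: str) -> float:
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)
```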
Langfuse vs Alternatives
| Feature | Langfuse (Free) | LangSmith | Weights & Biases |
|---|---|---|---|
| Open source | Yes | No | No |
| Self-host | Yes | No | No |
| Tracing | Full | Full | Limited |
| Prompt mgmt | Yes | Yes | No |
| Cost tracking | Yes | Yes | No |
| Evaluations | Yes | Yes | Yes |
Get Started
```shell
# Self-hosted
docker compose up -d

# Or use cloud
pip install langfuse
```
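Either way, the SDK is configured through environment variables. A minimal sketch — the key values below are placeholders, and the host assumes a default self-hosted setup:

```shell
# Placeholder credentials -- copy real keys from your Langfuse project settings.
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
# Point the SDK at your self-hosted instance (omit to use Langfuse Cloud).
export LANGFUSE_HOST="http://localhost:3000"
```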
Building AI apps that need real data? 88+ scrapers on Apify for training data and RAG pipelines. Custom: spinov001@gmail.com