DEV Community

Alex Spinov

Langfuse Has a Free LLM Observability Platform — Debug Your AI Apps Like a Pro

Your AI App Is a Black Box

Your LLM app works in testing. In production, users complain about hallucinations, slow responses, and wrong answers. But you cannot see what happened because LLM calls are opaque — input goes in, output comes out.

Langfuse: Observability for LLM Applications

Langfuse is an open-source LLM engineering platform. Trace every LLM call, measure quality, manage prompts, and debug issues — all in one dashboard.

Free Options

  • Self-hosted: 100% free, unlimited traces
  • Cloud: Free tier with 50K observations/month

What You See

For every LLM call, Langfuse captures:

  • Input prompt (full)
  • Output response (full)
  • Token usage and cost
  • Latency (time to first token, total)
  • Model used
  • User feedback scores
  • Custom metadata

Add Tracing in 3 Lines

Python (OpenAI)

from langfuse.openai import openai

# That is it. Every OpenAI call is now traced.
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

LangChain

from langfuse.callback import CallbackHandler

handler = CallbackHandler()
chain.invoke({"input": "query"}, config={"callbacks": [handler]})
# Every chain step is now traced

Why Teams Need This

1. Cost Tracking

Total spend this week: $147.23
Most expensive endpoint: /api/summarize ($89)
Average cost per request: ~$0.0085
GPT-4 calls: 2,341 ($120)
GPT-3.5 calls: 15,000 ($27)

Know exactly where your AI budget goes.
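
The dashboard above is just arithmetic over per-call token counts. A minimal sketch of how per-model spend rolls up (the per-1K-token prices here are illustrative placeholders, not real OpenAI rates):

```python
# Sketch: aggregate per-call costs by model, the way a cost dashboard might.
# PRICE_PER_1K values are illustrative placeholders, not current rates.
PRICE_PER_1K = {"gpt-4": 0.04, "gpt-3.5-turbo": 0.0018}

calls = [
    {"model": "gpt-4", "tokens": 1200},
    {"model": "gpt-3.5-turbo", "tokens": 800},
    {"model": "gpt-4", "tokens": 500},
]

def cost_by_model(calls):
    totals = {}
    for c in calls:
        cost = c["tokens"] / 1000 * PRICE_PER_1K[c["model"]]
        totals[c["model"]] = totals.get(c["model"], 0.0) + cost
    return totals

print(cost_by_model(calls))
```

Summing these totals gives the weekly spend figure; dividing by the call count gives the average cost per request.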

2. Quality Scores

Attach user feedback to traces:

from langfuse import Langfuse

langfuse = Langfuse()  # reads keys from environment variables
langfuse.score(
    trace_id=trace.id,
    name="user-feedback",
    value=1  # thumbs up
)

Track quality over time. Find which prompts produce bad results.
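
Once scores are attached, "which prompt produces bad results" is an aggregation over traces. A minimal sketch, using hypothetical trace records rather than Langfuse's actual export format:

```python
# Sketch: average thumbs-up/down feedback per prompt version.
# The trace dicts below are hypothetical, not Langfuse's export schema.
from collections import defaultdict

traces = [
    {"prompt_version": 2, "score": 1},
    {"prompt_version": 2, "score": 0},
    {"prompt_version": 3, "score": 1},
    {"prompt_version": 3, "score": 1},
]

def avg_score_by_version(traces):
    sums, counts = defaultdict(float), defaultdict(int)
    for t in traces:
        sums[t["prompt_version"]] += t["score"]
        counts[t["prompt_version"]] += 1
    return {v: sums[v] / counts[v] for v in sums}

print(avg_score_by_version(traces))  # {2: 0.5, 3: 1.0}
```

Here version 3 clearly outperforms version 2, which is exactly the signal you want before rolling a prompt change out to everyone.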

3. Prompt Management

Version your prompts in Langfuse instead of hardcoding them:

prompt = langfuse.get_prompt("summarize-article", version=3)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": prompt.compile(max_words=200)}]
)

Change prompts without redeploying code.
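
Under the hood, `compile()` fills `{{variable}}` placeholders in the stored template. A simplified stand-in (not Langfuse's actual implementation) to show the idea:

```python
import re

# Sketch: the kind of {{variable}} substitution prompt.compile() performs.
# A simplified stand-in, not Langfuse's actual implementation.
def compile_prompt(template: str, **variables) -> str:
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

template = "Summarize the article in at most {{max_words}} words."
print(compile_prompt(template, max_words=200))
# Summarize the article in at most 200 words.
```

Because the template lives in Langfuse rather than in your repo, tweaking the wording or the `max_words` default never touches application code.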

4. Evaluation Pipelines

Run automated evals on your LLM outputs:

  • Factuality checks
  • Toxicity detection
  • Relevance scoring
  • Custom evaluators
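
A custom evaluator is just a function that turns an input/output pair into a score. A deliberately naive sketch of a relevance check by word overlap (real pipelines typically use an LLM judge, and the function name here is my own, not a Langfuse API):

```python
# Sketch: a custom evaluator of the kind you might wire into an eval
# pipeline -- scores relevance by word overlap between question and answer.
# Deliberately naive; real evaluators usually use embeddings or an LLM judge.
def relevance_score(question: str, answer: str) -> float:
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)

score = relevance_score(
    "what is quantum computing",
    "Quantum computing uses qubits to perform computation.",
)
print(score)  # 0.5
```

Attaching scores like this to every trace turns "the answers feel worse lately" into a metric you can chart.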

Langfuse vs Alternatives

| Feature       | Langfuse (Free) | LangSmith | Weights & Biases |
| ------------- | --------------- | --------- | ---------------- |
| Open source   | Yes             | No        | No               |
| Self-host     | Yes             | No        | No               |
| Tracing       | Full            | Full      | Limited          |
| Prompt mgmt   | Yes             | Yes       | No               |
| Cost tracking | Yes             | Yes       | No               |
| Evaluations   | Yes             | Yes       | Yes              |

Get Started

# Self-hosted
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

# Or use cloud
pip install langfuse
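
Either way, the SDK picks up its credentials from environment variables (the key values below are placeholders; get real ones from your project settings):

```shell
# Langfuse SDK credentials -- values are placeholders
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```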

Building AI apps that need real data? 88+ scrapers on Apify for training data and RAG pipelines. Custom: spinov001@gmail.com
