
SIGNAL

Self-Host Langfuse With Docker — LLM Observability Without the Cloud Bill

Why You Need LLM Observability

You're running Ollama, maybe piping it through LangChain or LiteLLM, and everything seems fine — until it isn't. A prompt that worked yesterday now returns garbage. Costs on your OpenAI fallback are creeping up. A chain silently swallows an error and returns a confident-sounding hallucination.

This is the observability gap. In traditional backend services, we have Grafana, Prometheus, Jaeger. For LLM workloads, we need something purpose-built: tracing every call, measuring latency and token usage, scoring output quality, and catching regressions before users do.

Langfuse is the open-source answer. MIT-licensed, self-hostable, and as of v3, genuinely production-ready with ClickHouse-backed analytics. Let's set it up.

What Langfuse Actually Does

Think of it as Jaeger for LLM calls. Every interaction becomes a trace — a tree of spans showing:

  • Which model was called, with what parameters
  • Input/output tokens and estimated cost
  • Latency per step in a chain
  • Custom scores (human feedback, LLM-as-judge, regex checks)
  • Prompt versions and A/B test metadata

It integrates natively with LangChain, LlamaIndex, LiteLLM, the OpenAI SDK, and Vercel AI. If you're building anything with LLMs, it probably fits.
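Conceptually, a trace is just a tree with token, cost, and latency data hanging off each node. A minimal sketch of the shape — the names and fields here are illustrative, not Langfuse's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative only — NOT Langfuse's real data model
@dataclass
class Span:
    name: str
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    children: list = field(default_factory=list)

def total_tokens(span: Span) -> int:
    """Sum token usage across a span and all of its children."""
    return (span.input_tokens + span.output_tokens
            + sum(total_tokens(c) for c in span.children))

trace = Span("rag-pipeline", children=[
    Span("retrieve", latency_ms=40),
    Span("generate", input_tokens=900, output_tokens=250, latency_ms=1200),
])
```

Aggregating over that tree is exactly what the Langfuse dashboard does for you, per trace and per project.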

Prerequisites

  • Docker and Docker Compose (v2)
  • A machine with at least 4 GB RAM (ClickHouse is hungry)
  • 10 minutes

Step 1: Clone and Configure

```shell
git clone https://github.com/langfuse/langfuse.git
cd langfuse
```

Langfuse v3 ships a docker-compose.yml that bundles PostgreSQL, ClickHouse, Redis, MinIO, and the Langfuse web/worker services. Before starting, create a .env file:

```
# .env — Docker Compose reads this file literally, so $(...) is NOT expanded.
# Generate the secret values first (e.g. openssl rand -base64 32) and paste them in.
NEXTAUTH_SECRET=<paste output of: openssl rand -base64 32>
SALT=<paste output of: openssl rand -base64 32>
ENCRYPTION_KEY=<paste output of: openssl rand -hex 32>
NEXTAUTH_URL=http://localhost:3000
LANGFUSE_INIT_ORG_ID=my-org
LANGFUSE_INIT_ORG_NAME=MyOrg
LANGFUSE_INIT_PROJECT_ID=my-project
LANGFUSE_INIT_PROJECT_NAME=default
LANGFUSE_INIT_PROJECT_PUBLIC_KEY=pk-lf-local
LANGFUSE_INIT_PROJECT_SECRET_KEY=sk-lf-local
```

The INIT variables auto-create your first org and project — no manual setup in the UI.
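Since Compose reads .env literally rather than through a shell, generate the three secrets up front. A quick sketch — it prints the lines so you can inspect them before redirecting into .env with `>>`:

```shell
# Generate Langfuse's three secrets (ENCRYPTION_KEY must be 256-bit hex).
NEXTAUTH_SECRET="$(openssl rand -base64 32)"
SALT="$(openssl rand -base64 32)"
ENCRYPTION_KEY="$(openssl rand -hex 32)"

# Print export-ready lines; append to .env with: ... >> .env
printf 'NEXTAUTH_SECRET=%s\nSALT=%s\nENCRYPTION_KEY=%s\n' \
  "$NEXTAUTH_SECRET" "$SALT" "$ENCRYPTION_KEY"
```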

Step 2: Start It Up

```shell
docker compose up -d
```

Give it 30-60 seconds. ClickHouse takes a moment to initialize. Then open http://localhost:3000, create an admin account, and you're in.

Verify everything is healthy:

```shell
docker compose ps
```

You should see langfuse-web, langfuse-worker, postgres, clickhouse, redis, and minio all running.
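If you script the bring-up, a small retry helper beats guessing at the 30–60 seconds. The `/api/public/health` path is Langfuse's public health endpoint; verify it against your version:

```shell
# retry CMD up to TRIES times, sleeping DELAY seconds between attempts
wait_for() {
  cmd=$1; tries=$2; delay=$3
  while [ "$tries" -gt 0 ]; do
    if eval "$cmd"; then return 0; fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# e.g.: wait_for 'curl -fsS http://localhost:3000/api/public/health >/dev/null' 30 2
```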

Step 3: Instrument Your Code

Here's the payoff. Install the Python SDK:

```shell
pip install langfuse
```

Basic tracing with the OpenAI SDK

```python
import os

# Point the SDK at your self-hosted instance — set these BEFORE importing
# the wrapper so the client picks them up (keys from the INIT vars above)
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-local"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-local"
os.environ["LANGFUSE_HOST"] = "http://localhost:3000"

# Drop-in replacement for the OpenAI client — same API, automatic tracing
from langfuse.openai import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain Docker volumes in one paragraph"}],
    metadata={"environment": "dev", "feature": "docs-assistant"},
)

print(response.choices[0].message.content)
```

That's it. One import change. Every call now appears in your Langfuse dashboard with full token counts, latency, and cost.
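Langfuse derives cost from the token counts and its model price table; the arithmetic is simple enough to sanity-check yourself. The prices below are placeholders, not authoritative — check your provider's current rates:

```python
# Placeholder prices in USD per 1M tokens — NOT authoritative, check your provider
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def estimated_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from its token usage."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 1,200 prompt tokens + 300 completion tokens on gpt-4o
print(estimated_cost_usd("gpt-4o", 1200, 300))
```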

Tracing Ollama via LiteLLM

Running local models? Route through LiteLLM and Langfuse picks it up:

```python
# pip install litellm — LiteLLM ships a Langfuse callback that reads
# the same LANGFUSE_* environment variables set above
import litellm

litellm.success_callback = ["langfuse"]

response = litellm.completion(
    model="ollama/llama3.2",            # LiteLLM's provider/model naming
    messages=[{"role": "user", "content": "What is eBPF?"}],
    api_base="http://localhost:11434",  # your local Ollama server
)
```

Now you can compare latency and quality between your local Llama and a cloud fallback — with actual data, not vibes.
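"Actual data" can be as simple as percentiles over the latencies you export from traces. A sketch — the sample numbers are made up for illustration:

```python
import statistics

def latency_summary(latencies_ms):
    """p50/p95/mean over a list of per-trace latencies (milliseconds)."""
    xs = sorted(latencies_ms)
    p95_idx = min(len(xs) - 1, round(0.95 * (len(xs) - 1)))
    return {
        "p50": statistics.median(xs),
        "p95": xs[p95_idx],
        "mean": statistics.fmean(xs),
    }

local = latency_summary([120, 140, 135, 980, 150])   # hypothetical local Llama
cloud = latency_summary([310, 290, 305, 300, 320])   # hypothetical gpt-4o fallback
```

Tail latency is usually where local models lose: a p95 dominated by one cold-start outlier tells you more than any average.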

Scoring Outputs

The killer feature. You can attach scores to any trace:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# After getting user feedback or running an eval
# (SDK v2 API shown; in the v3 Python SDK this is langfuse.create_score)
langfuse.score(
    trace_id="your-trace-id",
    name="helpfulness",
    value=0.9,
    comment="Accurate and concise"
)
```

Combine this with an LLM-as-judge pattern (have GPT-4o score your Llama outputs) and you've got automated quality monitoring.
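The judge model replies in free text, so the fiddly part is parsing a score out of it robustly before calling `score()`. A hedged sketch — the prompt wording is illustrative, tune it for your own eval:

```python
import re

# Illustrative judge prompt — not a canonical template
JUDGE_PROMPT = (
    "Rate the answer below for helpfulness from 0.0 to 1.0. "
    "Reply with only the number.\n\nQuestion: {question}\n\nAnswer: {answer}"
)

def parse_score(judge_reply: str) -> float:
    """Pull the first number out of the judge's reply, clamped to [0, 1]."""
    m = re.search(r"\d+(?:\.\d+)?", judge_reply)
    if m is None:
        raise ValueError(f"no score found in judge reply: {judge_reply!r}")
    return max(0.0, min(1.0, float(m.group())))
```

Clamping matters: judges occasionally answer "2" or "10/10" despite the instructions, and a silently out-of-range score poisons your dashboards.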

Step 4: Set Up Alerts (Don't Skip This)

Langfuse doesn't have built-in alerting yet, but the data is in PostgreSQL and ClickHouse. A simple cron job covers the gap:

```shell
#!/bin/bash
# check-llm-errors.sh — alert on high error rate
# Note: the container name and the ClickHouse table/column names vary by
# Langfuse version — verify them against your deployment before relying on this.
ERROR_COUNT=$(docker exec langfuse-clickhouse-1 clickhouse-client \
  --query "SELECT count() FROM traces WHERE timestamp > now() - INTERVAL 1 HOUR AND level = 'ERROR'")

if [ "$ERROR_COUNT" -gt 10 ]; then
  curl -X POST "$WEBHOOK_URL" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"Warning: LLM error spike: $ERROR_COUNT errors in the last hour\"}"
fi
```
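Wire it into cron — the script path and webhook URL here are examples, not defaults:

```shell
# crontab -e — run the error check every 15 minutes
*/15 * * * * WEBHOOK_URL=https://example.com/hook /opt/scripts/check-llm-errors.sh
```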

What You Get

After a day of running this, you'll have:

  • Cost tracking — know exactly what each feature costs in tokens
  • Latency baselines — spot regressions when you change prompts or models
  • Quality scores — track output quality over time, not just "it works"
  • Prompt versioning — Langfuse's prompt management lets you A/B test without redeploying
  • Full data ownership — everything stays on your hardware

Resource Usage

On my homelab (Ryzen 5600G, 32 GB RAM), the full stack idles at about 1.5 GB RAM and negligible CPU. ClickHouse is the biggest consumer. For a solo dev or small team, any machine that can run Ollama can run Langfuse alongside it.

When NOT to Self-Host

Be honest with yourself:

  • If you're a solo dev experimenting, Langfuse Cloud's free tier (50K events/month) is perfectly fine
  • If you need SSO, audit logs, or SLA guarantees, the managed plan is worth it
  • If you don't want to maintain PostgreSQL backups, don't pretend you will

Self-hosting is for people who either need data sovereignty or enjoy running infrastructure. If that's you, Langfuse is one of the best-maintained open-source projects in the LLM tooling space.

Wrapping Up

LLM observability isn't optional anymore — not when models are non-deterministic, costs are real, and quality can silently degrade. Langfuse gives you production-grade tracing with a single import change and a docker compose up. That's a good trade.

The repo: github.com/langfuse/langfuse
Docs: langfuse.com/docs


SIGNAL is a weekly series for builders who self-host, automate, and actually ship. Published Mon/Wed/Fri.
