Jamie Cole

The Exact LLM Monitoring Stack I Run in Production (2026)

After 18 months of running LLM applications in production, here is the monitoring stack I actually use. Not theoretical. Not a demo. Real infrastructure.

The Stack

1. Drift Detection: DriftWatch

My own tool, but I use it daily. It runs weekly comparisons against a stored baseline and alerts when the drift score exceeds 0.2.
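
The weekly comparison can be sketched like this. This is not DriftWatch's actual implementation (that isn't published here); it is a minimal illustration of the idea, assuming the drift score is a relative shift in some numeric output metric, such as tokens per response.

```python
# Hypothetical sketch of a weekly drift check, NOT DriftWatch's real
# scoring: compare this week's output metric against a stored baseline
# and alert past a threshold.
from statistics import mean

DRIFT_THRESHOLD = 0.2  # the alert level mentioned above

def drift_score(baseline: list[float], current: list[float]) -> float:
    """Relative shift in the mean of a numeric output metric."""
    base_mean = mean(baseline)
    return abs(mean(current) - base_mean) / base_mean

# Illustrative data: tokens per response, baseline week vs this week
baseline_lengths = [120, 135, 128, 140, 122]
current_lengths = [180, 175, 190, 168, 172]

score = drift_score(baseline_lengths, current_lengths)
if score > DRIFT_THRESHOLD:
    print(f"ALERT: drift score {score:.2f} exceeds {DRIFT_THRESHOLD}")
```

Any scalar you can baseline works here: response length, refusal rate, sentiment, whatever moves when the upstream model changes.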

2. Cost Tracking: Built-in logging

Every LLM call logged with:

  • Model
  • Token count (prompt + completion)
  • Cost per call
  • User/session ID
  • Feature name

Simple SQL table. Query in Grafana.
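
A minimal sketch of that table and the logging call, using SQLite for brevity. Column and function names here are my illustration, not a published schema; any SQL database Grafana can query works the same way.

```python
# Per-call cost log: one row per LLM call, queried later from Grafana.
# Field names are assumptions for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real database in production
conn.execute("""
    CREATE TABLE IF NOT EXISTS llm_calls (
        ts                TEXT DEFAULT CURRENT_TIMESTAMP,
        model             TEXT,
        prompt_tokens     INTEGER,
        completion_tokens INTEGER,
        cost_gbp          REAL,
        user_id           TEXT,
        session_id        TEXT,
        feature           TEXT
    )
""")

def log_call(model, prompt_tokens, completion_tokens, cost_gbp,
             user_id, session_id, feature):
    conn.execute(
        "INSERT INTO llm_calls (model, prompt_tokens, completion_tokens,"
        " cost_gbp, user_id, session_id, feature) VALUES (?,?,?,?,?,?,?)",
        (model, prompt_tokens, completion_tokens, cost_gbp,
         user_id, session_id, feature),
    )
    conn.commit()

# Example: log one call after the API response comes back
log_call("gpt-4o-mini", 812, 241, 0.0004, "u_123", "s_456", "summarise")
```

Cost per feature is then a one-line GROUP BY, which is exactly the query you want when the daily cost alert fires.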

3. Latency: Prometheus + Grafana

Track p50, p95, p99 latency per endpoint. Alert on p95 > 5s.
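
In production this is a Prometheus histogram per endpoint; the percentile and alert logic it drives looks roughly like this stdlib-only sketch (illustrative data and thresholds from the alerting table below).

```python
# Percentile/alert logic for endpoint latency. In the real stack,
# Prometheus computes these from a histogram; this sketch shows the
# same decision on a raw sample of request latencies.
from statistics import quantiles

latencies_s = [0.8, 1.2, 0.9, 4.1, 1.0, 6.2, 1.1, 0.7, 1.3, 0.95]

# quantiles(n=100) returns the 1st..99th percentile cut points
cuts = quantiles(latencies_s, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

if p95 > 5.0:    # critical threshold
    print(f"CRITICAL: p95 latency {p95:.2f}s")
elif p95 > 3.0:  # warning threshold
    print(f"WARNING: p95 latency {p95:.2f}s")
```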

4. Error Tracking: Sentry

LLM API errors, timeout errors, parse errors. All tracked.

5. Output Quality: Custom checks

JSON validation. Schema checks. Length validation. Flag anything that fails.
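
Those three checks fit in one small gate function. The required keys and length cap below are placeholders, not the real schema:

```python
# Output-quality gate: parse JSON, check required keys and types,
# bound the length. Field names and limits are illustrative.
import json

REQUIRED_KEYS = {"summary": str, "tags": list}
MAX_CHARS = 2000

def check_output(raw: str) -> list[str]:
    """Return a list of failure reasons; empty means the output passes."""
    failures = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid JSON"]
    for key, typ in REQUIRED_KEYS.items():
        if key not in data:
            failures.append(f"missing key: {key}")
        elif not isinstance(data[key], typ):
            failures.append(f"wrong type for {key}")
    if len(raw) > MAX_CHARS:
        failures.append("output too long")
    return failures

print(check_output('{"summary": "ok", "tags": ["a"]}'))  # → []
```

Anything that returns a non-empty list gets flagged and logged rather than served.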

The Alerting Rules

Metric        Warning   Critical
Drift score   > 0.15    > 0.30
Latency p95   > 3s      > 5s
Error rate    > 1%      > 5%
Cost/day      > £50     > £100
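
The table is just data plus a tiny evaluator. Threshold values are taken directly from the table; metric key names are my own:

```python
# Alerting rules as data: (warning, critical) per metric.
THRESHOLDS = {
    "drift_score":      (0.15, 0.30),
    "latency_p95_s":    (3.0, 5.0),
    "error_rate":       (0.01, 0.05),
    "cost_per_day_gbp": (50.0, 100.0),
}

def severity(metric: str, value: float) -> str:
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"

print(severity("drift_score", 0.22))  # → warning
```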

What I Alert On

Not everything. Only things that require human action:

  1. Drift detected — review new outputs
  2. Error rate spike — investigate
  3. Cost overrun — check which feature
  4. Latency degradation — check model status

Everything else: logged, not alerted.

The Cost

Total monitoring cost: ~£20/month for 50k LLM calls.

Cheaper than the incidents it prevents.


If you are not monitoring your LLM applications, you are flying blind. Here is the stack I actually use: DriftWatch from £9.90/mo