After 18 months of running LLM applications in production, here is the monitoring stack I actually use. Not theoretical. Not a demo. Real infrastructure.
The Stack
1. Drift Detection: DriftWatch
My own tool, but I use it daily. It runs weekly comparisons against a stored baseline and alerts when the drift score exceeds 0.2.
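To make "drift score" concrete, here is a minimal sketch of the idea, not DriftWatch's actual algorithm: compare this week's output-length distribution against the baseline using total variation distance (0 = identical, 1 = completely disjoint). The bucket size and threshold shown are assumptions.

```python
from collections import Counter

def length_histogram(outputs, bucket=50):
    """Bucket output lengths into a normalised histogram (bucket size is an assumption)."""
    counts = Counter(len(o) // bucket for o in outputs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def drift_score(baseline, current):
    """Total variation distance between two histograms: 0 = identical, 1 = disjoint."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0) - current.get(k, 0)) for k in keys)

# Weekly check: this week's outputs are much longer than the baseline week's.
baseline = length_histogram(["x" * 120, "x" * 130, "x" * 125])
current = length_histogram(["x" * 400, "x" * 420, "x" * 410])
if drift_score(baseline, current) > 0.2:
    print("drift alert: review new outputs")
```

In practice you would compare more than lengths (embeddings, refusal rates, format mix), but the shape is the same: snapshot a baseline, diff weekly, alert past a threshold.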
2. Cost Tracking: Built-in logging
Every LLM call logged with:
- Model
- Token count (prompt + completion)
- Cost per call
- User/session ID
- Feature name
Simple SQL table. Query in Grafana.
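The table above can be sketched like this; column names are my assumptions, not the exact production schema, and sqlite3 stands in for whatever database backs Grafana:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_calls (
        ts                TEXT DEFAULT CURRENT_TIMESTAMP,
        model             TEXT,
        prompt_tokens     INTEGER,
        completion_tokens INTEGER,
        cost_gbp          REAL,
        user_id           TEXT,
        feature           TEXT
    )
""")

def log_call(model, prompt_tokens, completion_tokens, cost_gbp, user_id, feature):
    """Record one LLM call; runs after every API response."""
    conn.execute(
        "INSERT INTO llm_calls (model, prompt_tokens, completion_tokens,"
        " cost_gbp, user_id, feature) VALUES (?, ?, ?, ?, ?, ?)",
        (model, prompt_tokens, completion_tokens, cost_gbp, user_id, feature),
    )

log_call("gpt-4o-mini", 812, 240, 0.0004, "u_123", "summarise")
# A Grafana panel then runs e.g.:
#   SELECT feature, SUM(cost_gbp) FROM llm_calls GROUP BY feature
```

One insert per call is cheap, and grouping by `feature` is what makes the "which feature caused the cost overrun?" question answerable later.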
3. Latency: Prometheus + Grafana
Track p50, p95, p99 latency per endpoint. Alert on p95 > 5s.
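Prometheus computes these quantiles server-side (via `histogram_quantile()` over latency buckets); for illustration, here is the same p95 readout as a pure-Python sketch over raw samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: p in [0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# One slow call out of ten is enough to trip the p95 > 5s critical alert.
latencies = [0.8, 1.1, 0.9, 1.4, 6.2, 1.0, 1.2, 0.7, 1.3, 0.95]
p95 = percentile(latencies, 95)
if p95 > 5.0:
    print(f"ALERT: p95 latency {p95:.1f}s exceeds 5s")
```

Alerting on p95 rather than the mean is the point: LLM latency is long-tailed, and the tail is what users feel.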
4. Error Tracking: Sentry
LLM API errors, timeout errors, parse errors. All tracked.
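Tagging errors by class is what makes the Sentry dashboard useful. A minimal sketch of the routing, where `classify()` is my own illustrative helper: in production each branch would feed `sentry_sdk.capture_exception(exc)` with the class as a tag.

```python
import json

def classify(exc):
    """Bucket an exception into the three categories we track (helper is an assumption)."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, json.JSONDecodeError):
        return "parse_error"
    return "llm_api_error"

try:
    json.loads("{not valid json")  # model returned malformed JSON
except json.JSONDecodeError as exc:
    tag = classify(exc)  # would become a Sentry tag, e.g. error_class=parse_error
    print(tag)
```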
5. Output Quality: Custom checks
JSON validation. Schema checks. Length validation. Flag anything that fails.
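A minimal sketch of that quality gate; the required keys and length limit here are assumptions for illustration:

```python
import json

REQUIRED_KEYS = {"summary", "sentiment"}  # assumed schema
MAX_CHARS = 2000                          # assumed limit

def check_output(raw):
    """Return a list of failed checks; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(data, dict):
        return ["not_an_object"]
    failures = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        failures.append(f"missing_keys:{sorted(missing)}")
    if len(raw) > MAX_CHARS:
        failures.append("too_long")
    return failures

print(check_output('{"summary": "ok", "sentiment": "pos"}'))  # passes: []
```

Failed outputs get flagged and logged with the failure reason, so drift in output quality shows up as a trend, not just a one-off error.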
The Alerting Rules
| Metric | Warning | Critical |
|---|---|---|
| Drift score | > 0.15 | > 0.30 |
| Latency p95 | > 3s | > 5s |
| Error rate | > 1% | > 5% |
| Cost/day | > £50 | > £100 |
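The table above maps directly to a two-tier threshold check. A sketch, with metric names chosen for this example:

```python
# (warning, critical) thresholds, straight from the table
THRESHOLDS = {
    "drift_score":      (0.15, 0.30),
    "latency_p95_s":    (3.0, 5.0),
    "error_rate":       (0.01, 0.05),
    "cost_per_day_gbp": (50.0, 100.0),
}

def severity(metric, value):
    """Classify a metric reading as ok, warning, or critical."""
    warn, crit = THRESHOLDS[metric]
    if value > crit:
        return "critical"
    if value > warn:
        return "warning"
    return "ok"

print(severity("drift_score", 0.2))  # warning
```

Keeping the thresholds in one table, whether in code or in Prometheus alert rules, means warning and critical levels stay consistent across dashboards.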
What I Alert On
Not everything. Only things that require human action:
- Drift detected — review new outputs
- Error rate spike — investigate
- Cost overrun — check which feature
- Latency degradation — check model status
Everything else: logged, not alerted.
The Cost
Total monitoring cost: ~£20/month for 50k LLM calls.
Cheaper than the incidents it prevents.
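The back-of-envelope arithmetic:

```python
monthly_cost_gbp = 20
monthly_calls = 50_000
per_call = monthly_cost_gbp / monthly_calls
print(f"£{per_call:.4f} of monitoring overhead per call")  # £0.0004 per call
```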
If you are not monitoring your LLM applications, you are flying blind. Most of this stack is free or near-free; the drift-detection piece, DriftWatch, starts at £9.90/mo.