After 18 months of running LLM applications in production, here is the monitoring stack I actually use. Not theoretical. Not a demo. Real infrastructure.
The Stack
1. Drift Detection: DriftWatch
My own tool, but I use it daily. It runs weekly comparisons against a stored baseline and alerts when the drift score exceeds 0.2.
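To make "drift score" concrete, here is a minimal sketch of the idea, not DriftWatch's actual algorithm: compare this week's output-length distribution against the baseline using total variation distance (0 = identical, 1 = completely disjoint). The bucket size and threshold shown are assumptions.

```python
from collections import Counter

def length_histogram(outputs, bucket=50):
    """Bucket output lengths into a normalised histogram (bucket size is an assumption)."""
    counts = Counter(len(o) // bucket for o in outputs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def drift_score(baseline, current):
    """Total variation distance between two histograms: 0 = identical, 1 = disjoint."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0) - current.get(k, 0)) for k in keys)

# Weekly check: this week's outputs are much longer than the baseline week's.
baseline = length_histogram(["x" * 120, "x" * 130, "x" * 125])
current = length_histogram(["x" * 400, "x" * 420, "x" * 410])
if drift_score(baseline, current) > 0.2:
    print("drift alert: review new outputs")
```

In practice you would compare more than lengths (embeddings, refusal rates, format mix), but the shape is the same: snapshot a baseline, diff weekly, alert past a threshold.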
2. Cost Tracking: Built-in logging
Every LLM call logged with:
- Model
- Token count (prompt + completion)
- Cost per call
- User/session ID
- Feature name
Simple SQL table. Query in Grafana.
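The table above can be sketched like this; column names are my assumptions, not the exact production schema, and sqlite3 stands in for whatever database backs Grafana:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_calls (
        ts                TEXT DEFAULT CURRENT_TIMESTAMP,
        model             TEXT,
        prompt_tokens     INTEGER,
        completion_tokens INTEGER,
        cost_gbp          REAL,
        user_id           TEXT,
        feature           TEXT
    )
""")

def log_call(model, prompt_tokens, completion_tokens, cost_gbp, user_id, feature):
    """Record one LLM call; runs after every API response."""
    conn.execute(
        "INSERT INTO llm_calls (model, prompt_tokens, completion_tokens,"
        " cost_gbp, user_id, feature) VALUES (?, ?, ?, ?, ?, ?)",
        (model, prompt_tokens, completion_tokens, cost_gbp, user_id, feature),
    )

log_call("gpt-4o-mini", 812, 240, 0.0004, "u_123", "summarise")
# A Grafana panel then runs e.g.:
#   SELECT feature, SUM(cost_gbp) FROM llm_calls GROUP BY feature
```

One insert per call is cheap, and grouping by `feature` is what makes the "which feature caused the cost overrun?" question answerable later.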
3. Latency: Prometheus + Grafana
Track p50, p95, p99 latency per endpoint. Alert on p95 > 5s.
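Prometheus computes these quantiles server-side (via `histogram_quantile()` over latency buckets); for illustration, here is the same p95 readout as a pure-Python sketch over raw samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: p in [0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# One slow call out of ten is enough to trip the p95 > 5s critical alert.
latencies = [0.8, 1.1, 0.9, 1.4, 6.2, 1.0, 1.2, 0.7, 1.3, 0.95]
p95 = percentile(latencies, 95)
if p95 > 5.0:
    print(f"ALERT: p95 latency {p95:.1f}s exceeds 5s")
```

Alerting on p95 rather than the mean is the point: LLM latency is long-tailed, and the tail is what users feel.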
4. Error Tracking: Sentry
LLM API errors, timeout errors, parse errors. All tracked.
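Tagging errors by class is what makes the Sentry dashboard useful. A minimal sketch of the routing, where `classify()` is my own illustrative helper: in production each branch would feed `sentry_sdk.capture_exception(exc)` with the class as a tag.

```python
import json

def classify(exc):
    """Bucket an exception into the three categories we track (helper is an assumption)."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, json.JSONDecodeError):
        return "parse_error"
    return "llm_api_error"

try:
    json.loads("{not valid json")  # model returned malformed JSON
except json.JSONDecodeError as exc:
    tag = classify(exc)  # would become a Sentry tag, e.g. error_class=parse_error
    print(tag)
```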
5. Output Quality: Custom checks
JSON validation. Schema checks. Length validation. Flag anything that fails.
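A minimal sketch of that quality gate; the required keys and length limit here are assumptions for illustration:

```python
import json

REQUIRED_KEYS = {"summary", "sentiment"}  # assumed schema
MAX_CHARS = 2000                          # assumed limit

def check_output(raw):
    """Return a list of failed checks; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(data, dict):
        return ["not_an_object"]
    failures = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        failures.append(f"missing_keys:{sorted(missing)}")
    if len(raw) > MAX_CHARS:
        failures.append("too_long")
    return failures

print(check_output('{"summary": "ok", "sentiment": "pos"}'))  # passes: []
```

Failed outputs get flagged and logged with the failure reason, so drift in output quality shows up as a trend, not just a one-off error.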
The Alerting Rules
| Metric | Warning | Critical |
|---|---|---|
| Drift score | > 0.15 | > 0.30 |
| Latency p95 | > 3s | > 5s |
| Error rate | > 1% | > 5% |
| Cost/day | > £50 | > £100 |
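The table above maps directly to a two-tier threshold check. A sketch, with metric names chosen for this example:

```python
# (warning, critical) thresholds, straight from the table
THRESHOLDS = {
    "drift_score":      (0.15, 0.30),
    "latency_p95_s":    (3.0, 5.0),
    "error_rate":       (0.01, 0.05),
    "cost_per_day_gbp": (50.0, 100.0),
}

def severity(metric, value):
    """Classify a metric reading as ok, warning, or critical."""
    warn, crit = THRESHOLDS[metric]
    if value > crit:
        return "critical"
    if value > warn:
        return "warning"
    return "ok"

print(severity("drift_score", 0.2))  # warning
```

Keeping the thresholds in one table, whether in code or in Prometheus alert rules, means warning and critical levels stay consistent across dashboards.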
What I Alert On
Not everything. Only things that require human action:
- Drift detected — review new outputs
- Error rate spike — investigate
- Cost overrun — check which feature
- Latency degradation — check model status
Everything else: logged, not alerted.
The Cost
Total monitoring cost: ~£20/month for 50k LLM calls.
Cheaper than the incidents it prevents.
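The back-of-envelope arithmetic:

```python
monthly_cost_gbp = 20
monthly_calls = 50_000
per_call = monthly_cost_gbp / monthly_calls
print(f"£{per_call:.4f} of monitoring overhead per call")  # £0.0004 per call
```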
If you are not monitoring your LLM applications, you are flying blind. Most of this stack is free or near-free; the drift-detection piece, DriftWatch, starts at £9.90/mo.