DEV Community

Jamie Cole

The LLM Monitoring Stack I Run in Production (It's 3 Tools, $50/mo)

I've spent 18 months building and running LLM-powered production systems. Here's exactly what I monitor and what it costs.

The Three Tools

1. DriftWatch — £9.90/mo

Catches when the model silently changes behaviour. Runs 20 standardized prompts against your model every hour. Alerts you when outputs drift from baseline.

This is the one nobody thinks about until they get their first incident from a model update.

What I use it for: Detecting GPT-4o or Claude updates before they break my prompts.
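The core idea is simple enough to sketch. Here's a hedged, minimal version of what an hourly drift check does; the probe prompts, threshold, and text-similarity metric are my illustration (a real tool would likely use semantic comparison), not DriftWatch's actual implementation:

```python
import difflib

# Fixed probe prompts, run against the model on a schedule.
PROBE_PROMPTS = [
    "Summarize in one sentence: The cat sat on the mat.",
    "Return only valid JSON with keys name and age.",
]

def drift_score(baseline: str, current: str) -> float:
    """0.0 = identical output, 1.0 = completely different."""
    return 1.0 - difflib.SequenceMatcher(None, baseline, current).ratio()

def check_for_drift(call_model, baselines, threshold=0.4):
    """Run each probe, compare against its stored baseline response,
    and collect (prompt, score) pairs that exceed the threshold.

    call_model is your own prompt -> response-text function, injected
    so the sketch stays provider-agnostic."""
    alerts = []
    for prompt in PROBE_PROMPTS:
        score = drift_score(baselines[prompt], call_model(prompt))
        if score > threshold:
            alerts.append((prompt, score))
    return alerts
```

Wire `alerts` to whatever notifies you and you have the shape of the whole product.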


2. Helicone — Free tier (or $50/mo)

LLM observability. Shows you token usage, latency, failure rates, and what prompts are actually going to your models.

What I use it for: Debugging why production is slow or expensive.
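Integration is light because Helicone works as a proxy: you point your OpenAI traffic at Helicone's gateway and add one auth header, and every call gets logged. A stdlib-only sketch of what that request looks like (env var names are my convention; check Helicone's docs for your exact setup):

```python
import json
import os
import urllib.request

def build_request(messages, model="gpt-4o"):
    """Build a chat completion request routed through Helicone's
    OpenAI proxy instead of api.openai.com directly."""
    return urllib.request.Request(
        "https://oai.helicone.ai/v1/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Content-Type": "application/json",
            # Normal OpenAI auth, passed through unchanged.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            # Helicone's own auth header -- this is what enables logging.
            "Helicone-Auth": f"Bearer {os.environ.get('HELICONE_API_KEY', '')}",
        },
    )
```

If you use the official OpenAI SDK instead, the same idea applies: swap the base URL and add the `Helicone-Auth` default header.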


3. PagerDuty or Slack — $0-20/mo

Alerting. DriftWatch and Helicone can both fire webhooks. Point them at Slack or PagerDuty and you're covered.

What I use it for: Waking me up at 3am if something goes wrong.
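For the Slack side, an incoming webhook is a plain HTTP POST with a `{"text": ...}` payload. A minimal sketch (the helper names are mine; the webhook URL comes from your Slack app settings):

```python
import json
import urllib.request

def build_alert(tool: str, detail: str) -> bytes:
    """Format an alert as Slack's standard incoming-webhook payload."""
    return json.dumps({"text": f":rotating_light: {tool}: {detail}"}).encode()

def send_alert(webhook_url: str, tool: str, detail: str) -> int:
    """POST the alert to a Slack incoming webhook; returns HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=build_alert(tool, detail),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Both monitoring tools just need this URL as their webhook target; no extra glue code required.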


The Total Cost

Tool                   Monthly cost
DriftWatch (Starter)   £9.90
Helicone (Free tier)   $0
Slack (if needed)      $0-20
Total                  £9.90 + $0-20

Compare that with what it protects against: a single production incident caused by undetected drift can cost more than a year of monitoring.


What This Actually Catches

Real Example

Last month, my JSON parser broke because GPT-4o started adding apologetic preambles to certain response types. DriftWatch caught it in 20 minutes. I fixed the prompt. Nobody noticed.

Without it, I'd have found out when users reported bugs. That's a 4-hour incident instead of a 20-minute fix.
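A cheap defence against exactly this failure mode is to parse model output tolerantly instead of assuming it's bare JSON. A sketch (this helper is my own, not from any library):

```python
import json

def extract_json(raw: str):
    """Parse JSON even when the model wraps it in prose.

    Skips ahead to the first '{' or '[' and decodes from there, so an
    apologetic preamble like "Sure, here's the JSON:" doesn't break
    the pipeline. Trailing text after the JSON is ignored too."""
    start = min((i for i in (raw.find("{"), raw.find("[")) if i != -1),
                default=-1)
    if start == -1:
        raise ValueError("no JSON object found in model output")
    obj, _ = json.JSONDecoder().raw_decode(raw[start:])
    return obj
```

It won't save you from every format change, which is why the monitoring still matters, but it absorbs the most common one.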


The Stack in Practice

# The monitoring pipeline (sketch -- the DriftWatch and Helicone
# wrapper objects here are illustrative, not real SDK class names)
import os

import schedule
from flask import Flask, jsonify, request

app = Flask(__name__)
driftwatch = DriftWatch(model="gpt-4o")
helicone = Helicone(api_key=os.environ["HELICONE_KEY"])

# Hourly: check for drift
schedule.every().hour.do(driftwatch.check_and_alert)

# Real-time: log all calls through Helicone
@app.route("/llm/call", methods=["POST"])
def llm_call():
    messages = request.json["messages"]
    response = helicone.track(llm.request(messages))  # llm: your model client
    return jsonify(response)

Is This Overkill?

If you're running a hobby project: yes, probably.
If you're running LLMs in a product that people pay for: no.

The gap between "works in dev" and "works in production" is where most LLM incidents live. Monitoring is how you cross that gap safely.


I'm the author of DriftWatch. If you want to try it: Starter plan £9.90/mo — cancel anytime.
