Jamie Cole
How to Get Notified the Moment OpenAI or Anthropic Changes Your Model

OpenAI doesn't email you when GPT-4o changes. Anthropic doesn't either.

You find out from users. Or from a failed parse. Or from a support ticket that says "the AI is acting weird."

This post explains exactly how to set up automatic notifications — so you know within 60 minutes instead of 72 hours.

Why There's No Official Notification

OpenAI's model update policy is clear: they reserve the right to update any model version — including dated versions like gpt-4o-2024-08-06 — for safety, performance, or policy reasons. No advance notice. No changelog entry most of the time.

The same applies to Anthropic (Claude), Google (Gemini), and essentially every major LLM provider.

This is a rational policy for the provider. For developers who depend on consistent output format and behaviour, it's a significant operational risk.

What You Can Do About It

Option 1: Poll the API and compare outputs (manual, painful)

Run your prompts manually every few days and eyeball whether outputs look different. This is what most teams do. It's slow, misses subtle changes, and doesn't scale beyond a few prompts.

Option 2: Write your own scheduled tests

Set up a GitHub Actions job (or cron task) that:

  1. Runs your test prompts against the model
  2. Compares outputs to a stored baseline
  3. Posts to Slack if something looks off

This works. It's also 4–8 hours of engineering time to set up properly, and ongoing maintenance as your prompt suite grows.

```yaml
# .github/workflows/drift-check.yml
on:
  schedule:
    - cron: '0 * * * *'  # Every hour

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run prompt tests
        run: python3 check_prompts.py
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
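The workflow calls a `check_prompts.py` that you'd have to write yourself. Here is a minimal sketch of what it might contain — the baseline file name, the similarity threshold, the `SLACK_WEBHOOK_URL` variable, and the character-level similarity metric are all assumptions, and the model call is stubbed out so the comparison logic is the focus:

```python
"""check_prompts.py -- sketch of the drift-check script the workflow runs."""
import difflib
import json
import os
import urllib.request

BASELINE_FILE = "baselines.json"   # {"prompt_id": "expected output", ...}
THRESHOLD = 0.85                   # flag anything less similar than this


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] via difflib's ratio."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def notify_slack(message: str) -> None:
    """Post an alert to the Slack webhook in SLACK_WEBHOOK_URL, if set."""
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        print(message)  # fall back to the Actions log
        return
    body = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)


def run_prompt(prompt_id: str) -> str:
    """Placeholder: call your model here (e.g. the OpenAI chat API)."""
    raise NotImplementedError


def main() -> int:
    with open(BASELINE_FILE) as f:
        baselines = json.load(f)
    drifted = []
    for prompt_id, expected in baselines.items():
        score = similarity(expected, run_prompt(prompt_id))
        if score < THRESHOLD:
            drifted.append((prompt_id, round(score, 3)))
    if drifted:
        notify_slack(f"Drift detected in {len(drifted)} prompt(s): {drifted}")
        return 1  # non-zero exit fails the Actions job
    return 0
```

Wire `main()` up under an `if __name__ == "__main__":` guard (e.g. `sys.exit(main())`) so a drift detection fails the scheduled job.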

Option 3: Use a monitoring service (5 minutes, no maintenance)

DriftWatch handles the scheduling, scoring, and alerting. You add your prompts through a dashboard, and we run them hourly against your chosen model.

When drift is detected (score > 0.3), you get an email or Slack notification within minutes — not days.

How the Detection Works

Scoring three dimensions gives you a reliable signal:

  1. Format compliance — did the output still match the expected structure?
  2. Semantic similarity — has the meaning shifted significantly?
  3. Instruction-following — are explicit constraints still respected?
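As a rough sketch of how scoring along those three dimensions could work — the specific checks, the `difflib` stand-in for embedding similarity, and the equal weighting are all assumptions, not DriftWatch's actual implementation:

```python
import difflib
import json


def format_score(output: str) -> float:
    """1.0 if the output parses as JSON, else 0.0 (a real check would
    also validate against the expected schema)."""
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0


def semantic_score(baseline: str, output: str) -> float:
    """Stand-in for embedding cosine similarity: difflib ratio in [0, 1]."""
    return difflib.SequenceMatcher(None, baseline, output).ratio()


def instruction_score(output: str, constraints) -> float:
    """Fraction of explicit constraints (predicates) the output satisfies."""
    if not constraints:
        return 1.0
    return sum(1 for check in constraints if check(output)) / len(constraints)


def drift_score(baseline: str, output: str, constraints=()) -> float:
    """Combine the three dimensions into one drift score in [0, 1],
    where 0.0 means no drift. Equal weights are an assumption."""
    scores = [
        format_score(output),
        semantic_score(baseline, output),
        instruction_score(output, constraints),
    ]
    return 1.0 - sum(scores) / len(scores)
```

With this weighting, an output that still matches the baseline exactly scores 0.0, while a formatting break (like a preamble before JSON) pushes the score well past 0.3.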

Threshold guidance:

  • 0.0–0.15: Normal LLM variance
  • 0.15–0.3: Elevated — watch
  • 0.3–0.5: Likely behaviour change — investigate
  • 0.5+: Clear regression
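The bands above can be expressed as a tiny lookup; the boundary handling here is an assumption (each band includes its lower bound):

```python
def drift_band(score: float) -> str:
    """Map a drift score to the guidance bands: normal, elevated,
    likely behaviour change, clear regression."""
    if score < 0.15:
        return "normal variance"
    if score < 0.3:
        return "elevated (watch)"
    if score < 0.5:
        return "likely behaviour change (investigate)"
    return "clear regression"
```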

Real Example: GPT-5.2 February 10, 2026

OpenAI updated GPT-5.2 Instant on February 10. A widely-reported regression: JSON extraction prompts started adding preamble text before the JSON — causing json.loads() to throw JSONDecodeError:

```text
# Before Feb 10
{"name": "Alex", "age": 32, "email": "alex@ex.com"}

# After Feb 10 (DriftWatch format compliance score: elevated)
Here is the extracted JSON:
{"name": "Alex", "age": 32, "email": "alex@ex.com"}
```
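Independent of monitoring, a defensive parser makes this particular failure survivable. A sketch that skips any preamble before the first brace, using the stdlib `json.JSONDecoder.raw_decode` (which tolerates trailing text after the JSON value):

```python
import json


def parse_json_loosely(text: str):
    """Try strict json.loads first; on failure, fall back to decoding
    from the first '{' or '[' onward. A defensive sketch, not a full
    JSON extractor -- a brace inside the preamble would still trip it."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON value found in model output")
    obj, _end = json.JSONDecoder().raw_decode(text[min(starts):])
    return obj
```

This doesn't replace monitoring — it just stops a formatting regression from taking the pipeline down while you investigate.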

With hourly monitoring: alert fires within 60 minutes.
Without monitoring: developers found out 2–7 days later from users.

Setting Up Notifications in 5 Minutes

  1. Sign up free at DriftWatch — no card, 3 prompts included
  2. Add a test prompt + API key
  3. Click "Set Baseline" — we record today's model behaviour
  4. Add email or Slack webhook for alerts

You'll be notified within 60 minutes of the next model change.


DriftWatch vs LangSmith vs Langfuse vs Helicone — which tool for which problem →
GPT-5.2 Feb 10 incident — full analysis →
