You write a prompt. It works great. You ship it. By Friday, same prompt, same model, garbage output.
You didn't change anything. So what happened?
## The 3 Hidden Variables

### 1. Context Drift
Your prompt depends on surrounding context — system messages, previous turns, injected files. As your codebase evolves, that context changes.
- Monday: the system prompt includes a 200-line API spec (v2.1)
- Friday: someone has updated the spec to v2.3, adding 80 lines
The prompt didn't change. The context did. The model now has different instructions competing for attention.
Fix: Pin your context. Version your system prompts the same way you version code.
```text
# Track prompt context in git
prompts/
├── code-review.md       # v1.2
├── code-review.ctx.md   # Pinned context snapshot
└── CHANGELOG.md
```
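You can catch drift automatically by fingerprinting the live context against the pinned snapshot. A minimal sketch, assuming the hypothetical layout above (the file paths and function names are illustrative, not a real tool):

```python
import hashlib
from pathlib import Path

def context_fingerprint(path):
    """Return a short SHA-256 fingerprint of a context file."""
    data = Path(path).read_bytes()
    return hashlib.sha256(data).hexdigest()[:12]

def check_drift(live_path, pinned_path):
    """Warn when the live context no longer matches the pinned snapshot."""
    live = context_fingerprint(live_path)
    pinned = context_fingerprint(pinned_path)
    if live != pinned:
        print(f"Context drift: {live_path} is {live}, snapshot is {pinned}")
        return True
    return False
```

Run it in CI: a failing check means your prompt is now operating against context nobody reviewed.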
### 2. Temperature Sampling
Even at temperature 0, most API providers don't guarantee deterministic output. Batch processing, load balancing, and quantization all introduce variance.
The same prompt can produce subtly different outputs on different runs. Most of the time you don't notice. But edge cases compound.
Fix: Add output validation. Don't trust the model to be consistent — verify it:
```python
def validate_output(result):
    required_fields = ["summary", "risk_level", "action_items"]
    for field in required_fields:
        if field not in result:
            raise ValueError(f"Missing required field: {field}")
    if result["risk_level"] not in ["low", "medium", "high"]:
        raise ValueError(f"Invalid risk_level: {result['risk_level']}")
```
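In practice you'd pair validation with a retry loop, so run-to-run variance gets caught instead of shipped. A self-contained sketch, where `call_model` is a stand-in for whatever API client you use:

```python
import json

REQUIRED = ["summary", "risk_level", "action_items"]

def get_validated_output(call_model, prompt, max_attempts=3):
    """Call the model, parse JSON, validate it, and retry on failure."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            result = json.loads(raw)
            missing = [f for f in REQUIRED if f not in result]
            if missing:
                raise ValueError(f"Missing fields: {missing}")
            if result["risk_level"] not in ("low", "medium", "high"):
                raise ValueError(f"Invalid risk_level: {result['risk_level']}")
            return result
        except (json.JSONDecodeError, ValueError) as e:
            last_error = e  # transient variance: try again
    raise RuntimeError(f"No valid output after {max_attempts} attempts: {last_error}")
```

Three attempts is arbitrary; the point is that a malformed response becomes a retry, not a silent downstream failure.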
### 3. Upstream Model Updates
Model providers ship silent updates. Fine-tuning adjustments, safety patches, routing changes. Your prompt was optimized for last week's model behavior.
Fix: Run regression tests. Keep 5-10 known-good input/output pairs and check them weekly:
```bash
#!/bin/bash
# prompt-regression.sh
for test in tests/prompt-cases/*.json; do
  expected=$(jq -r '.expected' "$test")
  input=$(jq -r '.input' "$test")
  actual=$(call_api "$input")
  # -F: match as a fixed string, not a regex
  if ! echo "$actual" | grep -qF -- "$expected"; then
    echo "REGRESSION: $test"
    echo "Expected: $expected"
    echo "Got: $actual"
  fi
done
```
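Each test case is just a JSON file pairing an input with a substring the output must contain, matching the `jq` fields the script reads. A hypothetical example:

```json
{
  "input": "Review this function for injection risks and return summary, risk_level, action_items as JSON.",
  "expected": "risk_level"
}
```

Keep the expected values loose: assert on structure and key phrases, not exact wording, or normal sampling variance will flood you with false alarms.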
## The Monday/Friday Pattern
Most prompt failures follow this pattern:
- Monday: Fresh context, recent testing, prompt works
- Wednesday: Context accumulates, minor drift, output slightly off
- Friday: Context is stale, model behavior shifted, output breaks
The solution isn't better prompts. It's prompt ops: treating prompts as production systems that need monitoring, versioning, and regression tests.
Start here: Pick your most important prompt. Write three test cases for it. Run them next Friday. You'll be surprised what you find.