Nova Elvaris
Stop Tweaking Prompts: Build a Feedback Loop Instead

Here's a pattern I see constantly: a developer writes a prompt, gets mediocre output, tweaks a word, runs it again, tweaks another word, runs it again. Thirty minutes later, the prompt is a mess of contradictions and the output is marginally better.

This is prompt tweaking. It feels productive. It isn't.

The alternative is a feedback loop — and it takes five minutes to set up.


What's Wrong With Tweaking

Tweaking is random walk optimization. You change one thing, observe the result, and decide if it's "better" based on gut feel. Problems:

  1. No baseline. You can't tell if version 12 is better than version 3 because you didn't save version 3's output.
  2. No criteria. "Better" is undefined. Is shorter better? More detailed? More accurate? You're optimizing for a moving target.
  3. No reproducibility. Models are stochastic. The same prompt can give different results on different runs. One good output doesn't mean the prompt is good.

The Feedback Loop (5-Minute Setup)

Step 1: Define "Good"

Before you touch the prompt, write down what a good output looks like. Be specific:

```markdown
## Acceptance Criteria
- Output is valid JSON
- Contains exactly 3 bullet points
- Each bullet is under 20 words
- No marketing language ("revolutionary", "game-changing")
- Captures the main complaint, not a summary of everything
```

This takes 60 seconds and saves you from chasing your tail.
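Written like that, the criteria are one step away from being a scorer. Here's a minimal sketch in Python. The JSON shape (a list of `{"point": ...}` objects) is an assumption based on the example output later in this post, and the "main complaint" criterion still needs a human eye:

```python
import json

MARKETING_WORDS = {"revolutionary", "game-changing"}

def check_output(raw: str) -> dict:
    """Return a dict of criterion name -> pass/fail for one model output."""
    try:
        bullets = json.loads(raw)
    except json.JSONDecodeError:
        return {"valid_json": False}
    if not isinstance(bullets, list):
        return {"valid_json": False}
    points = [b.get("point", "") for b in bullets]
    text = " ".join(points).lower()
    return {
        "valid_json": True,
        "three_bullets": len(points) == 3,
        "under_20_words": all(len(p.split()) < 20 for p in points),
        "no_marketing": not any(w in text for w in MARKETING_WORDS),
    }
```

Even if you never automate anything else, having the criteria as a function keeps them from drifting between runs.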

Step 2: Create 3 Test Inputs

Pick three real inputs that represent your actual use case. Not toy examples — real data:

```
inputs/
  input-1.txt   # Short feedback, one complaint
  input-2.txt   # Long feedback, multiple issues
  input-3.txt   # Edge case: positive feedback (no complaints)
```

Step 3: Run All Three, Score Against Criteria

Run your prompt against all three inputs. For each output, check it against your acceptance criteria:

```
input-1: ✅ JSON ✅ 3 bullets ✅ under 20 words ✅ no marketing ✅ main complaint
input-2: ✅ JSON ❌ 4 bullets ✅ under 20 words ✅ no marketing ✅ main complaint
input-3: ✅ JSON ✅ 3 bullets ✅ under 20 words ❌ "amazing" ❌ no complaint to find
```

Now you know exactly what's broken: bullet count enforcement and the positive-feedback edge case.

Step 4: Fix What Failed

Change the prompt to address the specific failures. Not "make it better" — fix the bullet count issue:

```
Return EXACTLY 3 bullet points. Not 2, not 4.
If the feedback is positive with no complaints, return:
[{"point": "No actionable complaints identified"}]
```

Step 5: Re-Run, Re-Score

Run all three inputs again. Check the criteria. If everything passes, you're done. If something new breaks, fix that.
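The whole loop fits in a short script. In this sketch, `call_model` is a stand-in that returns a canned response so the structure is runnable as-is; swap in your provider's real SDK call:

```python
import json

def call_model(prompt: str, text: str) -> str:
    """Stand-in for a real model call. Replace with your provider's SDK."""
    return json.dumps([
        {"point": "Login fails on mobile"},
        {"point": "Sync is slow"},
        {"point": "Settings are confusing"},
    ])

def score(raw: str) -> list:
    """Pass/fail per criterion: valid JSON, exactly 3 bullets, each under 20 words."""
    try:
        bullets = json.loads(raw)
    except json.JSONDecodeError:
        return [False, False, False]
    points = [b.get("point", "") for b in bullets]
    return [
        True,
        len(points) == 3,
        all(len(p.split()) < 20 for p in points),
    ]

def run_loop(prompt: str, inputs: dict) -> dict:
    """Run the prompt against every input and score each output."""
    return {name: score(call_model(prompt, text)) for name, text in inputs.items()}
```

Each iteration is then: edit the prompt, call `run_loop`, read the pass/fail grid.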


Why This Works

The feedback loop replaces intuition with information. Instead of "hmm, that looks better," you get "2 out of 3 inputs pass all criteria."

You also build an eval set as a side effect. Next time the model updates or you change the prompt, run the same three inputs and see if anything regressed. You just got regression testing for free.


The Time Math

| Approach | Time Spent | Confidence |
| --- | --- | --- |
| Tweaking for 30 min | 30 min | Low ("it seems better?") |
| Feedback loop | 10 min setup + 5 min per iteration | High (pass/fail per criterion) |

The feedback loop is faster and gives you reusable test infrastructure.


Practical Tips

  • Start with 3 inputs, not 30. You can always add more later. Three is enough to catch most issues.
  • Write criteria before the prompt. It forces you to think about what you actually want.
  • Save every prompt version. Just `prompt-v1.md`, `prompt-v2.md`. You'll want to diff them later.
  • Automate the loop when it matters. If this prompt runs in production, turn your test inputs into a script that runs on CI.
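For the CI version, one option is a script that exits nonzero on any failure, so the pipeline can gate on it. Everything here is a sketch: `call_model` is a placeholder, and the `prompt.md` / `inputs/` paths are assumptions matching the layout above:

```python
import json
import sys
from pathlib import Path

def call_model(prompt: str, text: str) -> str:
    """Placeholder: wire this to your real model API."""
    return json.dumps([{"point": "stub complaint"}] * 3)

def passes(raw: str) -> bool:
    """True if the output meets the machine-checkable criteria."""
    try:
        bullets = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(bullets, list)
            and len(bullets) == 3
            and all(len(b.get("point", "").split()) < 20 for b in bullets))

def main(prompt_file: str = "prompt.md", input_dir: str = "inputs") -> int:
    prompt = Path(prompt_file).read_text()
    failures = [p.name for p in sorted(Path(input_dir).glob("*.txt"))
                if not passes(call_model(prompt, p.read_text()))]
    for name in failures:
        print(f"FAIL {name}")
    return 1 if failures else 0
```

Call `sys.exit(main())` from your CI entry point and any failing input blocks the merge.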

The One-Liner

If you're spending more than 5 minutes tweaking a prompt by hand, you don't have a prompt problem — you have a process problem. Build the loop.


What's your prompt testing setup? I'm curious whether people run evals or mostly go by feel.
