Nova Elvaris
Stop Tweaking Prompts: Build a Feedback Loop Instead

Here's a pattern I see constantly: a developer writes a prompt, gets mediocre output, tweaks a word, runs it again, tweaks another word, runs it again. Thirty minutes later, the prompt is a mess of contradictions and the output is marginally better.

This is prompt tweaking. It feels productive. It isn't.

The alternative is a feedback loop — and it takes five minutes to set up.


What's Wrong With Tweaking

Tweaking is random walk optimization. You change one thing, observe the result, and decide if it's "better" based on gut feel. Problems:

  1. No baseline. You can't tell if version 12 is better than version 3 because you didn't save version 3's output.
  2. No criteria. "Better" is undefined. Is shorter better? More detailed? More accurate? You're optimizing for a moving target.
  3. No reproducibility. Models are stochastic. The same prompt can give different results on different runs. One good output doesn't mean the prompt is good.

The Feedback Loop (5-Minute Setup)

Step 1: Define "Good"

Before you touch the prompt, write down what a good output looks like. Be specific:

```markdown
## Acceptance Criteria
- Output is valid JSON
- Contains exactly 3 bullet points
- Each bullet is under 20 words
- No marketing language ("revolutionary", "game-changing")
- Captures the main complaint, not a summary of everything
```

This takes 60 seconds and saves you from chasing your tail.
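Written like that, the criteria are one step away from being a scorer. Here's a minimal sketch in Python. The JSON shape (a list of `{"point": ...}` objects) is an assumption based on the example output later in this post, and the "main complaint" criterion still needs a human eye:

```python
import json

MARKETING_WORDS = {"revolutionary", "game-changing"}

def check_output(raw: str) -> dict:
    """Return a dict of criterion name -> pass/fail for one model output."""
    try:
        bullets = json.loads(raw)
    except json.JSONDecodeError:
        return {"valid_json": False}
    if not isinstance(bullets, list):
        return {"valid_json": False}
    points = [b.get("point", "") for b in bullets]
    text = " ".join(points).lower()
    return {
        "valid_json": True,
        "three_bullets": len(points) == 3,
        "under_20_words": all(len(p.split()) < 20 for p in points),
        "no_marketing": not any(w in text for w in MARKETING_WORDS),
    }
```

Even if you never automate anything else, having the criteria as a function keeps them from drifting between runs.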

Step 2: Create 3 Test Inputs

Pick three real inputs that represent your actual use case. Not toy examples — real data:

```
inputs/
  input-1.txt   # Short feedback, one complaint
  input-2.txt   # Long feedback, multiple issues
  input-3.txt   # Edge case: positive feedback (no complaints)
```

Step 3: Run All Three, Score Against Criteria

Run your prompt against all three inputs. For each output, check it against your acceptance criteria:

```
input-1: ✅ JSON ✅ 3 bullets ✅ under 20 words ✅ no marketing ✅ main complaint
input-2: ✅ JSON ❌ 4 bullets ✅ under 20 words ✅ no marketing ✅ main complaint
input-3: ✅ JSON ✅ 3 bullets ✅ under 20 words ❌ "amazing" ❌ no complaint to find
```

Now you know exactly what's broken: bullet count enforcement and the positive-feedback edge case.

Step 4: Fix What Failed

Change the prompt to address the specific failures. Not "make it better" — fix the bullet count issue:

```
Return EXACTLY 3 bullet points. Not 2, not 4.
If the feedback is positive with no complaints, return:
[{"point": "No actionable complaints identified"}]
```

Step 5: Re-Run, Re-Score

Run all three inputs again. Check the criteria. If everything passes, you're done. If something new breaks, fix that.
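The whole loop fits in a short script. In this sketch, `call_model` is a stand-in that returns a canned response so the structure is runnable as-is; swap in your provider's real SDK call:

```python
import json

def call_model(prompt: str, text: str) -> str:
    """Stand-in for a real model call. Replace with your provider's SDK."""
    return json.dumps([
        {"point": "Login fails on mobile"},
        {"point": "Sync is slow"},
        {"point": "Settings are confusing"},
    ])

def score(raw: str) -> list:
    """Pass/fail per criterion: valid JSON, exactly 3 bullets, each under 20 words."""
    try:
        bullets = json.loads(raw)
    except json.JSONDecodeError:
        return [False, False, False]
    points = [b.get("point", "") for b in bullets]
    return [
        True,
        len(points) == 3,
        all(len(p.split()) < 20 for p in points),
    ]

def run_loop(prompt: str, inputs: dict) -> dict:
    """Run the prompt against every input and score each output."""
    return {name: score(call_model(prompt, text)) for name, text in inputs.items()}
```

Each iteration is then: edit the prompt, call `run_loop`, read the pass/fail grid.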


Why This Works

The feedback loop replaces intuition with information. Instead of "hmm, that looks better," you get "2 out of 3 inputs pass all criteria."

You also build an eval set as a side effect. Next time the model updates or you change the prompt, run the same three inputs and see if anything regressed. You just got regression testing for free.


The Time Math

| Approach | Time Spent | Confidence |
| --- | --- | --- |
| Tweaking for 30 min | 30 min | Low ("it seems better?") |
| Feedback loop | 10 min setup + 5 min per iteration | High (pass/fail per criterion) |

The feedback loop is faster and gives you reusable test infrastructure.


Practical Tips

  • Start with 3 inputs, not 30. You can always add more later. Three is enough to catch most issues.
  • Write criteria before the prompt. It forces you to think about what you actually want.
  • Save every prompt version. Just `prompt-v1.md`, `prompt-v2.md`. You'll want to diff them later.
  • Automate the loop when it matters. If this prompt runs in production, turn your test inputs into a script that runs on CI.
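For the CI version, one option is a script that exits nonzero on any failure, so the pipeline can gate on it. Everything here is a sketch: `call_model` is a placeholder, and the `prompt.md` / `inputs/` paths are assumptions matching the layout above:

```python
import json
import sys
from pathlib import Path

def call_model(prompt: str, text: str) -> str:
    """Placeholder: wire this to your real model API."""
    return json.dumps([{"point": "stub complaint"}] * 3)

def passes(raw: str) -> bool:
    """True if the output meets the machine-checkable criteria."""
    try:
        bullets = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(bullets, list)
            and len(bullets) == 3
            and all(len(b.get("point", "").split()) < 20 for b in bullets))

def main(prompt_file: str = "prompt.md", input_dir: str = "inputs") -> int:
    prompt = Path(prompt_file).read_text()
    failures = [p.name for p in sorted(Path(input_dir).glob("*.txt"))
                if not passes(call_model(prompt, p.read_text()))]
    for name in failures:
        print(f"FAIL {name}")
    return 1 if failures else 0
```

Call `sys.exit(main())` from your CI entry point and any failing input blocks the merge.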

The One-Liner

If you're spending more than 5 minutes tweaking a prompt by hand, you don't have a prompt problem — you have a process problem. Build the loop.


What's your prompt testing setup? I'm curious whether people run evals or mostly go by feel.
