If you use AI to help with coding, the most common failure mode is not that the model is lazy. It is that the target is fuzzy.
You ask for a fix, the assistant guesses what “correct” means, and you get something that looks plausible but is slightly off: wrong edge case, wrong file, wrong abstraction, wrong dependency, or the right idea implemented far too broadly.
A simple way to reduce that failure rate is to stop asking for the fix first.
Ask for the test first.
I call this Scaffolded Test-First Prompting: a small workflow where you ask the assistant to write a runnable failing test, then ask it for the smallest implementation that makes that test pass. It is not fancy, but it turns vague coding requests into executable contracts.
Why this works
Most prompting problems in code work are really specification problems.
When you say:
```
Fix the currency formatter for German locales.
```
there are a dozen hidden questions:
- What file owns the behavior?
- What format is expected exactly?
- Which runtime or test framework is in play?
- Are external libraries allowed?
- Should the solution be minimal or architectural?
- What counts as done?
A test answers those questions in a way prose usually does not. It gives the assistant an observable target instead of a vague intention.
That creates three immediate benefits:
- Correctness becomes executable. You can run the result instead of debating it.
- Scope stays smaller. The assistant is less likely to refactor half the repo when one assertion would do.
- Review gets easier. The change is justified by a failing case, not by hand-wavy explanations.
The workflow in 3 steps
1. Give minimal runnable context
Start with just enough information for the assistant to place the work.
Include:
- the file or module involved
- the language/runtime
- the test framework
- one concrete input/output example
- any relevant constraint like “no new dependencies”
For example:
```
I have formatCurrency(amount, locale) in src/format.js.
Runtime: Node 22.
Tests use Jest.
Expected behavior: formatCurrency(12.5, 'de-DE') should return '12,50 €'.
No new dependencies.
```
That is enough. You do not need a three-paragraph backstory.
2. Ask for a failing test only
This is the key move.
Do not ask for the implementation yet. Ask for one focused test file that uses the existing project conventions and only checks the behavior you care about.
Example prompt:
```
Create a Jest test for src/format.js that asserts
formatCurrency(12.5, 'de-DE') returns '12,50 €'.

Only return the test file contents and any minimal config change required.
Do not implement the function yet.
```
Why separate the steps?
Because it forces the assistant to reason about expected behavior before it starts inventing code. That alone removes a surprising amount of drift.
It also gives you an artifact you can run immediately. If the test is broken, ambiguous, or not aligned with your conventions, you catch that early before any implementation gets layered on top.
3. Ask for the smallest fix that makes the test pass
Now run the test. It should fail; that failure is the point.
Paste the failure output and ask for the smallest possible code change that satisfies this exact case.
Example:
```
This test fails with the following output:

[paste stack trace]

Implement the smallest change in src/format.js that makes this test pass.
Avoid unrelated refactors.
Return a unified diff.
```
That framing matters. “Smallest change” and “avoid unrelated refactors” are not decoration. They steer the assistant away from broad rewrites and toward reviewable patches.
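For the currency example, the patch you get back might be as small as delegating to `Intl.NumberFormat`. This is a sketch, not the author's actual `src/format.js`: hard-coding EUR is an assumption, and note that `Intl` separates the amount and the symbol with a non-breaking space (U+00A0), which a test comparing against a regular space will miss.

```javascript
// Hypothetical minimal formatCurrency for src/format.js.
// Assumption: the currency is always EUR; the real function might
// take it as a parameter instead.
function formatCurrency(amount, locale) {
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency: 'EUR',
  }).format(amount);
}

// Intl uses a non-breaking space (U+00A0) before the symbol, so
// normalize whitespace before comparing against a plain-text expectation.
const result = formatCurrency(12.5, 'de-DE').replace(/[\u00a0\u202f]/g, ' ');
console.log(result); // "12,50 €"
```

That whitespace detail is exactly the kind of thing a runnable failing test surfaces early and prose never does.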
A concrete example
Let’s make this real.
Say you have a CSV export bug: fields containing newlines break when opened in spreadsheet apps.
A vague prompt would be:
```
Fix CSV export escaping.
```
That is almost guaranteed to produce a mushy answer.
A scaffolded version looks like this instead.
Step 1: context
```
File: src/export/csv.ts
Runtime: Node 22
Tests: Vitest
Bug: values containing newlines are split into multiple rows in spreadsheet apps
Constraint: preserve the newline; do not replace it with spaces
```
Step 2: test first
```ts
import { describe, it, expect } from 'vitest';
import { toCsvRow } from '../../src/export/csv';

describe('toCsvRow', () => {
  it('quotes fields containing newlines', () => {
    const row = toCsvRow(['hello\nworld', 'ok']);
    expect(row).toBe('"hello\nworld",ok');
  });
});
```
Step 3: minimal implementation request
Now the assistant has a precise target:
- preserve the newline
- quote the field
- keep other fields unchanged
That is a much tighter problem than “fix CSV export.” You are far more likely to get a correct, local patch instead of a speculative rewrite.
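Under those constraints, the smallest reasonable patch is a quote-only-when-needed rule. Here is a sketch in plain JavaScript; the real `src/export/csv.ts` and its existing behavior may differ:

```javascript
// Hypothetical minimal toCsvRow: quote any field containing a newline,
// comma, or double quote, doubling embedded quotes (RFC 4180 style).
// Unremarkable fields pass through untouched.
function toCsvRow(fields) {
  return fields
    .map((field) =>
      /[",\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field
    )
    .join(',');
}

const row = toCsvRow(['hello\nworld', 'ok']);
// row === '"hello\nworld",ok' — the newline is preserved inside the quotes
```

Note what the test did not ask for, and the patch therefore does not do: no streaming rewrite, no new CSV library, no changes to unaffected fields.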
Why this often beats “write the code” prompts
Direct implementation prompts encourage the model to jump straight to a solution shape. Sometimes that works. Often it means the assistant commits too early.
Test-first prompting delays that commitment.
It makes the assistant define success in visible behavior before it chooses structure. That tends to improve outcomes in a few specific ways:
- fewer invented abstractions
- fewer unnecessary dependencies
- better preservation of existing conventions
- better debugging, because the failing assertion narrows the search space
It also helps you think better. Writing or reviewing the test forces clarity. You may notice that the expected output is wrong, the edge case is underspecified, or the real problem is slightly different than you thought.
Good prompts for this pattern
A few phrases consistently help:
- “Write one focused failing test first.”
- “Use existing project conventions.”
- “Do not implement yet.”
- “Return only the test file contents.”
- “Implement the smallest change that makes this test pass.”
- “Avoid unrelated refactors.”
- “Return a unified diff.”
These are small constraints, but together they make the workflow much more reliable.
Common mistakes
Asking for too much at once
One bug, one test, one patch. If you ask for three edge cases, a refactor, and updated docs in one shot, you lose the main benefit: tight feedback.
Giving environment-free prompts
“Write a test” without naming Jest, Vitest, pytest, or the file layout is how you end up with correct ideas in the wrong format.
Accepting a test you did not run
A generated test is still code. Run it. A broken test file is not a contract; it is just another hallucination with nicer formatting.
Letting the implementation grow
Once the test exists, hold the line. Ask for the smallest passing change, not the “cleanest long-term redesign.” You can always refactor after correctness is locked in.
When not to use it
This pattern is best when behavior is known and correctness matters.
It is less useful for:
- exploratory prototyping
- open-ended design work
- large refactors where the desired behavior is still moving
In those cases, a spec-first or plan-first prompt may fit better.
Closing
Scaffolded Test-First Prompting works because it replaces ambiguous intent with a runnable target.
Instead of asking AI to guess what “fixed” means, you define a failing example, make that example executable, and only then ask for the smallest code change that satisfies it.
That one habit usually buys you faster convergence, cleaner patches, and fewer weird detours.
If you use AI for coding every day, this is one of the easiest workflow upgrades to adopt: test first, patch second, run everything.