A lot of “bad AI output” is really “underspecified input.” When you ask a model to do the thing, it fills in missing details with whatever it thinks is statistically plausible.
That’s fine for brainstorming. It’s painful for shipping.
Over the last few months I’ve landed on a simple mental model that keeps my prompts from drifting into mush:
Build constraints in rungs, like a ladder.
Instead of one mega-prompt, I add structure in a predictable order. Each rung reduces ambiguity. If the output is still wrong, you can usually point to the rung that’s missing.
Below is the Constraint Ladder, a reusable template, and two concrete examples (one coding, one writing) you can steal.
The Constraint Ladder (6 rungs)
1) Outcome (what “done” means)
A model can’t hit a target you didn’t describe.
Write the outcome in one sentence, and include the “why” if it affects tradeoffs.
- Bad: “Optimize this function.”
- Better: “Reduce p95 latency of `parseInvoice()` by ~20% without changing behavior.”
2) Inputs (what it may use)
Models hallucinate when they don’t have the ingredients.
List the artifacts it should rely on:
- files/snippets
- API docs you pasted
- constraints like “no external deps”
If you can’t provide something, say that too: “You do not have access to production logs.”
3) Boundaries (what it must not do)
This is where you prevent “helpful” damage.
Examples:
- “Do not rename exported functions.”
- “Do not change database schema.”
- “Do not introduce new runtime dependencies.”
- “If unsure, ask one question instead of guessing.”
4) Form (how to respond)
The model’s default format is chatty prose. Your workflow probably isn’t.
Pick a form that drops straight into your process:
- a diff
- a checklist
- a table
- a step-by-step plan
- JSON you can parse
For code changes, I strongly prefer diff-first output:
- easier to review
- easier to apply
- makes unintended edits obvious
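The “JSON you can parse” form only pays off if you actually validate the shape before your pipeline consumes it. A minimal sketch in TypeScript — the fields (`summary`, `changes`, `risks`) are hypothetical, not a standard; define whatever shape your workflow needs:

```typescript
// Validate the shape of a JSON-formatted model response before using it.
// The fields (summary, changes, risks) are hypothetical — define your own.
interface ReviewResponse {
  summary: string;
  changes: string[];
  risks: string[];
}

function parseReviewResponse(raw: string): ReviewResponse {
  const data = JSON.parse(raw);
  for (const key of ["summary", "changes", "risks"] as const) {
    if (!(key in data)) {
      throw new Error(`model response missing field: ${key}`);
    }
  }
  return data as ReviewResponse;
}
```

Failing loudly on a missing field is the point: a malformed response should stop your pipeline, not silently flow downstream.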
5) Verification (how to prove it worked)
This rung is the difference between “looks right” and “is right.”
Ask for:
- tests to add / run
- commands to verify
- edge cases to consider
Even better: require the model to state what it checked.
6) Stop conditions (when to halt)
Give it permission to stop.
- “If there are multiple reasonable interpretations, stop and ask.”
- “If you would need external docs, stop and list what you’d look up.”
- “If you cannot meet constraints, propose the smallest relaxation and why.”
This rung prevents confident nonsense.
A reusable prompt template
Copy/paste and fill the blanks:
Task / Outcome:
- ...
Inputs:
- ...
Boundaries:
- Must ...
- Must not ...
Output format:
- ...
Verification:
- ...
Stop conditions:
- ...
Two tiny tricks that help a lot:
1) Keep each rung short. Brevity forces clarity.
2) Prefer “must/must not” language. It’s harder to misread than “try to.”
Example 1: AI-assisted refactor without surprise rewrites
Imagine you have a function that’s correct but slow, and you want safe improvements.
Here’s a Constraint Ladder prompt I’d actually use:
Task / Outcome:
- Improve performance of `parseInvoice()` by reducing allocations.
- Behavior must remain identical.
Inputs:
- The current function implementation is below.
- You may assume Node.js 22.
Boundaries:
- Must not change exported types or function signature.
- Must not introduce new dependencies.
- Must not rewrite unrelated parts of the file.
Output format:
- Provide a unified diff.
- After the diff, include a short explanation (max 8 bullets).
Verification:
- Add/adjust at least 3 tests that protect behavior (edge cases).
- Provide a small benchmark snippet I can run locally.
Stop conditions:
- If behavior depends on input invariants not stated, ask up to 2 questions.
Code:
<PASTE CURRENT FUNCTION>
Why this works:
- Outcome is measurable (allocations / perf).
- Boundaries prevent “helpful” refactors.
- Diff format makes review fast.
- Verification forces the model to think in terms of observable behavior.
If you skip rung 5 (verification), you’ll often get code that looks plausible but is subtly wrong.
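The “small benchmark snippet” the prompt asks for doesn’t need to be fancy. Here’s a sketch of what I’d accept back — the `parseInvoice` body is a stand-in stub, not real invoice logic; swap in your actual function and a representative input:

```typescript
// Rough p95 micro-benchmark. parseInvoice is a hypothetical stand-in —
// replace it with the real function and a representative input.
import { performance } from "node:perf_hooks";

function parseInvoice(raw: string): { total: number } {
  // stand-in: sums comma-separated amounts
  return { total: raw.split(",").reduce((sum, n) => sum + Number(n), 0) };
}

function benchP95(fn: () => void, runs = 1000): number {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(runs * 0.95)];
}

const input = Array.from({ length: 100 }, (_, i) => String(i)).join(",");
console.log(`p95: ${benchP95(() => parseInvoice(input)).toFixed(4)} ms`);
```

A micro-benchmark like this won’t match production, but it’s enough to catch a change that made things slower, which is the failure mode you care about during review.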
Example 2: Writing a technical doc that doesn’t drift into fluff
Let’s say you want a short internal doc: “How to rotate service credentials.”
A typical failure mode: the doc becomes generic (“security is important”) and misses your system’s real steps.
Try this:
Task / Outcome:
- Draft an internal runbook: rotating credentials for Service X.
- Goal: a new on-call can follow it at 03:00 without Slack.
Inputs:
- Environment: Kubernetes.
- Secrets live in 1Password + ExternalSecrets.
- Rotation requires updating GitOps repo + triggering ArgoCD.
- Include the exact commands I provide below.
Boundaries:
- Must be specific to our environment (no generic advice).
- Must not mention tools we don’t use.
- Must not assume console access beyond kubectl + Git.
Output format:
- Markdown with these sections:
1) Preconditions
2) Step-by-step
3) Validation
4) Rollback
5) Common failure modes
Verification:
- Include a validation checklist with observable signals.
- Include at least 5 failure modes and how to recognize them.
Stop conditions:
- If any critical command is missing, stop and ask for it.
Commands/snippets:
<PASTE YOUR REAL COMMANDS>
Notice how the ladder forces specificity:
- Inputs name your real systems.
- Form is a runbook, not an essay.
- Verification is “observable signals,” not vibes.
How to debug a prompt using the ladder
When output is wrong, don’t rewrite the whole prompt. Identify the missing rung:
- Too vague? → strengthen Outcome.
- Hallucinated details? → tighten Inputs and add “you do not have access to …”.
- Broke stuff? → add explicit Boundaries.
- Hard to use? → change the Form (diff/checklist/JSON).
- Seems plausible but fails? → raise Verification requirements.
- Confident guessing? → add Stop conditions.
This makes prompting feel less like magic and more like engineering.
One last refinement: the “smallest next constraint” rule
If you keep adding constraints until the prompt is a novel, you’ll slow yourself down.
A better rule:
Only add the next rung that would have prevented the last failure.
Prompting is iterative. The Constraint Ladder just makes the iteration structured.
If you try this pattern, I’d love to hear which rung fixed the biggest pain for you.