Odilon HUGONNOT

Posted on May 23 • Originally published at web-developpeur.com

AI does exactly what you ask — that's the problem

#ai #claude #promptengineering #workflow

I asked Claude to add pagination to a product list. Response in 30 seconds: clean, functional, complete. And completely disconnected from the rest of the app. Wrong pagination component (we already had one), invented styles, existing filters broken. Technically correct. Unusable as-is.

The problem wasn't the AI — it was my prompt. I wrote: "Add pagination to this list." That's exactly what it did. Nothing more, nothing less.

Current models (Claude 4.x, GPT-4o) are far less inclined to guess at unstated intent than their predecessors. They follow prompts much more literally. That's progress overall, but it fundamentally changes how you need to prompt for code. A good code prompt isn't about tricks; it's about giving the AI the same context you'd give a junior dev joining the project.

I tested and scored dozens of formulations across the four most common coding tasks. Here's what actually works.

The structure that applies to everything

Before getting into specific cases, there are four elements present in every good code prompt:

Stack/context — language, version, framework, relevant files
Precise task — what you want, phrased as an instruction, not a question
Constraints — what NOT to touch, scope limits
Expected output — diff, full code, explanation only, CVSS score…

Anthropic's golden rule sums it up: show your prompt to a colleague with no context. If they'd be confused, the AI will be too.

For larger prompts (multiple files, complex instructions), XML tags help separate sections. Tests show 30–39% improvement in response quality when prompts are structured with clear tags:


<context>PHP 8.1, Laravel 10, multi-tenant app</context>
<task>Add pagination to ProductList</task>
<constraints>Do not touch getFilteredProducts()</constraints>
<code>[paste files here]</code>

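Second rule: code first, question last. Anthropic's long-context guidance reports that placing documents at the top and the query at the end can improve response quality by up to 30% in their tests, especially on large, multi-document inputs. The model then has all the material in view before it reaches the instruction.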
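To make the two rules concrete, here's a minimal sketch of a helper that wraps each section in tags and always puts the instruction last. The function name `build_prompt`, the `file` attribute on the `<code>` tag, and the sample file name are my own assumptions for illustration, not part of any official format.

```python
def build_prompt(context: str, files: dict[str, str],
                 constraints: list[str], task: str) -> str:
    """Assemble a tagged prompt: context and code first, instruction last."""
    parts = [f"<context>{context}</context>"]

    # One <code> section per file so the model can tell them apart.
    for name, source in files.items():
        parts.append(f'<code file="{name}">\n{source}\n</code>')

    if constraints:
        bullets = "\n".join(f"- {c}" for c in constraints)
        parts.append(f"<constraints>\n{bullets}\n</constraints>")

    # The instruction goes at the very end, after all the material.
    parts.append(f"<task>{task}</task>")
    return "\n\n".join(parts)


prompt = build_prompt(
    context="PHP 8.1, Laravel 10, multi-tenant app",
    files={"ProductList.php": "<?php /* hypothetical file contents */"},
    constraints=["Do not touch getFilteredProducts()"],
    task="Add pagination to ProductList",
)
print(prompt)
```

The ordering is fixed inside the function on purpose: you can't accidentally put the question above a 2,000-line paste.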

Bug fix — describe the symptom, not the theory

The classic mistake: paste code and write "this doesn't work." The AI doesn't know if it's crashing, returning a wrong value, or just too slow. It will invent a plausible problem and "fix" it.

❌ Vague prompt — score: 3/10


This code doesn't work, help me.

function getUserById($id) {
    $result = $db->query("SELECT * FROM users WHERE id = $id");
    return $result->fetch();
}

Typical result: the AI fixes the SQL injection (fair, real issue), ignores the actual error, and rewrites with PDO prepared statements. Technically sound, but not what you needed.

✅ Precise prompt — score: 9/10


Stack: PHP 8.1 + PDO, Laravel 10.

Current behavior:
  Call to a member function fetch() on bool
  → only when the ID doesn't exist in the table

Expected behavior:
  Return null if user doesn't exist (no exception thrown)

Already tried:
  isset($result) before fetch() — same error

Code:
function getUserById($id) {
    $result = $db->query("SELECT * FROM users WHERE id = $id");
    return $result->fetch();
}

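The "already tried" line is critical. Without it, the AI will re-suggest exactly what you already attempted. With it, it looks elsewhere and finds that PDO::query() returns false on failure (when PDO is not in exception mode), which isset() cannot catch because false is still a set value. The fix is an explicit check: $result !== false ? ($result->fetch() ?: null) : null. The extra ?: null matters because fetch() itself returns false, not null, when no row matches.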

Bug fix template:


Stack: [language + version + framework]
Current behavior: [exact symptom — error message, wrong return value, observed behavior]
Expected behavior: [what the code should do]
Already tried: [previous attempts and why they didn't work]
Code: [minimal code that reproduces the bug]
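If you want the template to nag you, a small wrapper can refuse to render while a field is blank. This is a minimal sketch; the `BugReport` class and its method names are hypothetical, and the field names simply mirror the template above.

```python
from dataclasses import dataclass, fields


@dataclass
class BugReport:
    stack: str
    current_behavior: str
    expected_behavior: str
    already_tried: str
    code: str

    def missing(self) -> list[str]:
        """Return the names of fields that are empty or whitespace only."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Render the prompt, or raise if any field is still blank."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"Incomplete bug report, missing: {', '.join(gaps)}")
        return (
            f"Stack: {self.stack}\n"
            f"Current behavior: {self.current_behavior}\n"
            f"Expected behavior: {self.expected_behavior}\n"
            f"Already tried: {self.already_tried}\n"
            f"Code:\n{self.code}"
        )


draft = BugReport(
    stack="PHP 8.1 + PDO, Laravel 10",
    current_behavior="Call to a member function fetch() on bool",
    expected_behavior="Return null if user doesn't exist",
    already_tried="",  # the field people forget most often
    code="function getUserById($id) { /* ... */ }",
)
print(draft.missing())
```

Running it prints the one field I left empty, which is usually the one that saves a round trip.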

New feature — acceptance criteria and out-of-scope

"Add a comment system." Faced with that, the AI has to choose storage (a database, probably), a UI (modal, inline, or dedicated page?), and a validation approach (client-side or server-side?). Each of those choices may be incompatible with your project.

❌ Without integration context — score: 3/10


Add a comment system to this blog.

Likely result: MySQL solution, jQuery form, a design that matches nothing in the existing project.

✅ With context, criteria, and out-of-scope — score: 9/10


Stack: PHP 8 + Bootstrap 3, no database (JSON file storage).

Task: Add comments to blog articles.

Acceptance criteria:
- Form: name + message (no email, no account required)
- Server-side validation only (no JS)
- Storage: one JSON file per article at /blog/comments/{slug}.json
- Email notification to author on each new comment (PHPMailer already installed)

Out of scope:
- No moderation for now
- No nested replies
- No changes to existing CSS

Files involved: [blog_footer.php, template.php]

The out-of-scope section is the key difference. Telling the AI what not to implement is just as important as what to build. Without it, the AI will either over-engineer (moderation, auth, nested threads) or invent constraints that don't exist.

New feature template:


Stack: [technical context]
Integration context: [what it must integrate with — existing components, project patterns]
Task: [precise feature description]
Acceptance criteria: [list of expected behaviors]
Out of scope: [what we do NOT want implemented now]
Files involved: [files to modify or create]

Refactoring — define the problem, not the solution

"Refactor this code" is the worst possible refactoring prompt. The AI will choose its own quality criteria, likely change method signatures, add unsolicited abstractions, and break the existing public API.

❌ Without goal or constraints — score: 2/10


Refactor this code to make it cleaner.

[450 lines of OrderService]

Typical result: 8 extracted private methods, renames, interfaces added, an abstract class "for future flexibility." It compiles. Tests break.

✅ Concrete problem + goal + constraints — score: 9/10


Concrete problem:
  OrderService::processOrder() (450 lines) is called from 8 places
  with different parameter combinations.
  Impossible to know which cases are covered by existing tests.

Goal:
  Extract pricing rules into immutable Value Objects.
  One class = one rule (e.g., DiscountRule, TaxRule, ShippingRule).

Hard constraints:
  - Observable behavior unchanged — existing tests must pass without modification
  - Do not modify public method signatures
  - Do not change return types

Scope: pricing calculation only, not persistence or validation.

Code: [OrderService.php]

The key phrase: "observable behavior unchanged — existing tests must pass without modification." Without this constraint, the AI optimizes by its own criteria. With it, it stays on track.

Refactoring template:


Concrete problem: [why this code is problematic — not "it's messy", but the real impact]
Goal: [what we want after the refactoring]
Hard constraints:
  - [what must not change: signatures, tests, observable behavior]
Scope: [what's IN and what's OUT]
Code: [the code to refactor]
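The "tests must pass without modification" constraint is easy to verify after the fact. Here's a rough sketch that takes the list of changed paths (for example, the output of `git diff --name-only`) and flags anything under a test directory. The `tests` directory name and the example paths are assumptions about project layout, not something the template dictates.

```python
from pathlib import PurePosixPath


def touched_tests(changed_paths: list[str],
                  test_dirs: tuple[str, ...] = ("tests",)) -> list[str]:
    """Return the changed paths that live under one of the test directories."""
    flagged = []
    for raw in changed_paths:
        # Look only at directory components, not at the file name itself.
        directories = PurePosixPath(raw).parts[:-1]
        if any(d in directories for d in test_dirs):
            flagged.append(raw)
    return flagged


# Hypothetical result of a refactoring session.
changed = [
    "app/Services/OrderService.php",
    "app/Pricing/TaxRule.php",
    "tests/Unit/OrderServiceTest.php",
]
print(touched_tests(changed))
```

If the list comes back non-empty, the AI "fixed" the tests instead of the code, and the refactoring needs another look.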

Code review — choose your angle

Without a focus, the AI runs a generic style review. It will flag missing docstrings, suggest more descriptive variable names, and miss the SQL injection sitting at the bottom of the file. AI code review only pays off if you tell it where to focus.

❌ Generic review — score: 4/10


Review this code as a senior dev and give me improvement suggestions.

Typical result: 80% style suggestions, naming, comments. 20% real issues — buried in noise.

✅ Targeted angle + intentional decisions — score: 9/10


Context:
  Public endpoint POST /api/documents/upload
  Unfiltered incoming data, filesystem access, JWT auth verified upstream.

Review focus: security only
  - Injection (command, SQL, path)
  - Path traversal
  - Malicious file uploads (server-side execution)
  - IDOR

Intentional decisions — do not flag:
  - The (int) cast on ID is deliberate
  - Minimal error handling is intentional (errors are captured in centralized logs)
  - Variable name $tmp follows team convention

Output format:
  For each issue: severity (critical/high/medium) + description + fix with code.

Code: [upload-handler.php]

The "intentional decisions" section cuts noise in half. The AI doesn't flag what you already know about — it focuses on what you asked for.

Code review template:


Context: [endpoint type, who calls it, input data, auth in place]
Focus: [security / performance / business logic / architecture — one angle at a time]
Intentional decisions (do not flag): [deliberate choices in the code]
Output format: [severity + description + fix with code / simple list / …]
Code: [the code to review]
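One side benefit of fixing the output format: the findings become easy to sort. A minimal sketch, assuming you've already copied the AI's findings into records yourself; the `Finding` class and `triage` function are hypothetical names, and the three severity levels are the ones from the example prompt above.

```python
from dataclasses import dataclass

# Lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2}


@dataclass(frozen=True)
class Finding:
    severity: str
    description: str
    fix: str


def triage(findings: list[Finding]) -> list[Finding]:
    """Sort findings from most to least severe; reject unknown severities."""
    unknown = [f.severity for f in findings if f.severity not in SEVERITY_RANK]
    if unknown:
        raise ValueError(f"Unexpected severity values: {unknown}")
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])


# Hypothetical findings, in the order the review returned them.
review = [
    Finding("medium", "Upload size is not capped", "Reject files above a limit"),
    Finding("critical", "File name used in path unsanitized", "Use basename()"),
    Finding("high", "Document ID not checked against owner", "Add ownership check"),
]
for finding in triage(review):
    print(finding.severity, "-", finding.description)
```

The ValueError is deliberate: if the AI starts inventing a "low" or "info" level, the format you asked for has drifted.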

Two cross-cutting techniques that change everything

1. Two passes beat one

For complex tasks (large refactors, deep reviews), separating critique from implementation consistently outperforms asking for everything at once:

Pass 1: "What's wrong with this code? List the problems without fixing anything."

Pass 2: "Now fix problems 1, 2, and 4 you identified. Skip 3."

The separation forces the AI to analyze before acting. Second-pass results are significantly more precise.
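In script form, the two passes are just two prompt builders with a human decision in between. This is a sketch under my own naming (`critique_prompt`, `fix_prompt`); how you send the prompts is left to whatever client you use.

```python
def critique_prompt(code: str) -> str:
    """Pass 1: ask for a numbered list of problems, with no fixes."""
    return (
        f"<code>\n{code}\n</code>\n\n"
        "What's wrong with this code? "
        "List the problems as a numbered list without fixing anything."
    )


def fix_prompt(code: str, critique: str, keep: list[int]) -> str:
    """Pass 2: ask for fixes to the selected problem numbers only."""
    if not keep:
        raise ValueError("Select at least one problem to fix")
    selected = ", ".join(str(n) for n in sorted(keep))
    return (
        f"<code>\n{code}\n</code>\n\n"
        f"Problems you identified:\n{critique}\n\n"
        f"Now fix problems {selected}. Skip every other item."
    )


# Hypothetical critique text standing in for the first response.
first_answer = "1. No null check\n2. SQL injection\n3. Naming\n4. Missing return type"
print(fix_prompt("function getUserById($id) { /* ... */ }", first_answer, [4, 1, 2]))
```

Keeping the selection as an explicit argument is the point: you read the critique, you decide what's worth fixing, and only then does the AI touch the code.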

2. "Suggest" vs "Change" — it matters now

With Claude 4.x, this distinction became literal:

Can you suggest some changes? → AI lists suggestions, touches nothing
Change this function to improve its performance. → AI modifies

If you want the AI to act: use action verbs. "Modify", "Fix", "Extract", "Rename". Not "Could you maybe look at whether...".

Summary: the 4 templates

| Task | Essential elements | Most common mistake |
| --- | --- | --- |
| Bug fix | Exact symptom + expected behavior + "already tried" | Forgetting previous attempts |
| New feature | Integration context + criteria + out-of-scope | Not defining out-of-scope |
| Refactoring | Concrete problem + goal + behavioral constraints | No "tests must pass" constraint |
| Code review | Targeted angle + intentional decisions + output format | Generic review without focus |

Conclusion

The real value of these templates isn't about the AI — it's about you. Writing "current behavior vs expected behavior" forces you to precisely characterize the bug. Writing out-of-scope forces you to decide what isn't a priority. Writing refactoring constraints forces you to define what must not move.

Multiple times, while drafting a structured prompt, I realized the task itself was poorly defined from the start. The AI didn't need to run — the prompt had already done the work.

AI doesn't solve the problem for you. It solves the problem you described. You might as well make that description accurate.
