I handed a 40-line order processing function to Claude Code and said "refactor this nicely."
What came back: Decimal class, dataclasses, logging module, full type hints, and a Strategy pattern. 120 lines. I asked for none of it.
Does it work? Yes. Is it readable? Yes. Will the reviewer say "do I really have to review all of this?" Also yes. And the SQL injection fix I actually needed? Buried somewhere in the diff.
So I ran an experiment. Same code. Two prompts: vague vs. specific. Here's what happened.
The Experiment
Target Code
A 40-line function with 5 intentional problems:
def process_order(order_data):
total = 0
for item in order_data['items']:
price = item['price']
qty = item['quantity']
if item.get('discount'):
if item['discount']['type'] == 'percent':
price = price - (price * item['discount']['value'] / 100)
elif item['discount']['type'] == 'fixed':
price = price - item['discount']['value']
if price < 0:
price = 0
total = total + (price * qty)
tax = total * 0.1
shipping = 0
if total < 5000:
shipping = 500
elif total < 10000:
shipping = 300
final = total + tax + shipping
import smtplib # Problem 1: import inside function
try:
server = smtplib.SMTP('smtp.example.com', 587)
server.sendmail('shop@example.com', order_data['email'],
f'Total: {final}')
except: # Problem 2: bare except
pass
import sqlite3 # Problem 3: import inside function
conn = sqlite3.connect('orders.db')
c = conn.cursor()
c.execute(f"INSERT INTO orders VALUES ('{order_data['email']}', {final})")
# ^ Problem 4: SQL injection
conn.commit()
conn.close()
return {'total': total, 'tax': tax, 'shipping': shipping, 'final': final}
# Problem 5: calculation, email, DB save all in one function (SRP violation)
Two Prompts
Vague:
Refactor this Python code nicely.
Specific:
Refactor this Python code. Improvement points:
1. Split into 3 functions: calculation, email, DB save
2. Fix SQL injection (use parameterized query)
3. Replace bare except with specific exception classes
4. Move imports to file top
5. Extract discount calculation into a function
Results (Claude Sonnet 4)
| Aspect | Vague prompt | Specific prompt |
|---|---|---|
| Fixed all 5 problems? | ✅ Yes | ✅ Yes |
| Output tokens | 2,172 | 1,897 |
| Unrequested additions | Decimal, dataclass, logging, full type hints | None |
| Code lines (approx.) | ~120 | ~80 |
| Review-friendly? | ❌ Real changes buried in noise | ✅ Focused on the 5 points |
Both fixed every problem. Claude Sonnet spots code issues even with vague instructions. That's impressive.
The problem is output focus. With the vague prompt, AI decides what "good code" means: convert float to Decimal, replace dicts with dataclasses, swap print for logging.getLogger, add type hints everywhere. Each change is correct. None were requested.
Why Unrequested Changes Are a Problem
"If the extra improvements are harmless, just keep them?" Three scenarios where they're not:
1. PR diff explosion
The SQL injection fix is a 1-line change. But committing the vague refactor result creates an 80-line diff. Reviewers must distinguish "essential security fix" from "cosmetic improvement." The critical change gets buried.
2. Tests break
Changing float to Decimal breaks assert result['total'] == 1000.0. Changing to dataclass breaks result['total'] (now result.total). Unrequested changes breaking existing tests is the opposite of what refactoring should do.
3. Dependencies shift
from decimal import Decimal and from dataclasses import dataclass are standard library, but you now have to explain in the PR "why Decimal?" for a change you never asked for. Writing justifications for unrequested changes is wasted energy.
The Template That Works
Refactor this code.
Improvement points:
1. [specific change 1]
2. [specific change 2]
3. ...
Constraints:
- Do not make changes beyond what is specified
- Ensure existing tests continue to pass
"Do not make changes beyond what is specified" is the key line. Without it, AI's helpfulness kicks in and "improves" everything it can see.
When "Refactor This Nicely" Is Fine
Vague instructions work in the exploration phase:
- "List the problems in this code" → AI enumerates issues → you prioritize → specific instructions
- "Suggest 3 refactoring approaches" → AI proposes → you choose
Use vague prompts for reconnaissance. Use specific prompts for execution. That way, you won't get surprise Decimals.
For more patterns on controlling AI code generation — from Plan Mode workflows to CLAUDE.md constraints that keep agents focused:
📖 Practical Claude Code: Context Engineering for Modern Development
Top comments (0)