DEV Community

Ned C

I Gave 5 AI Models the Same Refactor Task. Here's What Each One Added Without Being Asked.

I got Cursor Pro last week and wanted to know which model to default to. So I wrote a messy Express.js file, gave all five models the same vague prompt, and compared the results.

The refactors were all fine. What I didn't expect was everything they added that I didn't ask for.

The Setup

It was one file. Hardcoded credentials, copy-pasted auth blocks, no structure:

const express = require('express');
const jwt = require('jsonwebtoken');
const { Pool } = require('pg');

const app = express();
app.use(express.json());

const pool = new Pool({
  host: 'localhost',
  database: 'myapp',
  user: 'admin',
  password: 'password123'  // hardcoded
});
const JWT_SECRET = 'super-secret-key-dont-share';  // hardcoded

// Same auth check copy-pasted into every route handler
app.post('/api/products', async (req, res) => {
  const authHeader = req.headers.authorization;
  if (!authHeader || !authHeader.startsWith('Bearer '))
    return res.status(401).json({ error: 'No token' });
  let decoded;
  try { decoded = jwt.verify(authHeader.split(' ')[1], JWT_SECRET); }
  catch { return res.status(401).json({ error: 'Invalid' }); }
  // ...
});

The prompt: "refactor this into a clean structure." Nothing about what to add, what patterns to use, what's missing. I wanted to see what each model thinks "clean" means beyond the literal request.

The Basics (Everyone Got These Right)

All five models:

  • Moved credentials to environment variables
  • Extracted auth into middleware
  • Split routes into separate files
  • Kept PostgreSQL (the original database)

There were no hallucinations, no database swaps, no weird rewrites. The core refactor was correct across the board.
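The "extract auth into middleware" step all five converged on looks roughly like this. A minimal sketch, not any model's exact output: the token verifier is injected as a parameter here (a stand-in for `jwt.verify` with the secret baked in) so the shape is clear without the `jsonwebtoken` dependency, and the name `makeRequireAuth` is mine.

```javascript
// Factory for the auth middleware every model extracted. verifyToken is a
// stand-in for a function that wraps jwt.verify(token, process.env.JWT_SECRET).
function makeRequireAuth(verifyToken) {
  return function requireAuth(req, res, next) {
    const authHeader = req.headers.authorization;
    if (!authHeader || !authHeader.startsWith('Bearer ')) {
      return res.status(401).json({ error: 'No token' });
    }
    let decoded;
    try {
      decoded = verifyToken(authHeader.split(' ')[1]);
    } catch {
      return res.status(401).json({ error: 'Invalid' });
    }
    // Attach the decoded payload so downstream handlers can use it
    req.user = decoded;
    next();
  };
}
```

Each route then declares it instead of repeating the block: `app.post('/api/products', requireAuth, handler)`.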

What They Added Without Being Asked

Here's what each one did.

Opus: The Security-Conscious One

Opus noticed the JWT tokens had no expiry. The original code creates tokens that live forever, which is a real security issue. It added expiresIn: '24h' without being prompted.

It also:

  • Created a .gitignore with node_modules/ and .env
  • Generated an actual .env file (not just an example)
  • Fixed a subtle HTTP status code issue: the original returned 400 for invalid tokens, and Opus corrected it to 401 (which is the right code for auth failures per the HTTP spec)

That 400-to-401 fix is the kind of thing a senior dev catches in code review.
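For context on what `expiresIn: '24h'` actually does: it sets the standard `exp` claim from RFC 7519, a timestamp in seconds since the Unix epoch, and verification rejects any token past it. A plain-JS sketch of that mechanism (illustrative, not jsonwebtoken's internals):

```javascript
// Build a JWT payload with the standard iat/exp claims that expiresIn sets.
function buildPayload(claims, ttlSeconds) {
  const nowSeconds = Math.floor(Date.now() / 1000);
  return { ...claims, iat: nowSeconds, exp: nowSeconds + ttlSeconds };
}

// The check verification performs. A payload with no exp claim never
// expires, which is exactly the bug in the original code.
function isExpired(payload, nowSeconds = Math.floor(Date.now() / 1000)) {
  return typeof payload.exp === 'number' && payload.exp <= nowSeconds;
}
```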

Sonnet: The Ops-Minded One

Sonnet also caught the JWT expiry problem, but made the value configurable: process.env.JWT_EXPIRES_IN || '24h'. It's a small difference, but it means you can change the token lifetime without touching code.

It also added:

  • A /health endpoint (useful if you're deploying behind a load balancer)
  • A README.md with setup instructions
  • express.urlencoded() middleware for form data

The health check is interesting. Nobody asked for it. But if you're refactoring a messy Express app, there's a decent chance you're about to deploy it properly, and health checks are the first thing you'll need.
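A sketch of what such an endpoint typically looks like; the field names are illustrative, not Sonnet's exact output:

```javascript
// Health check handler. A load balancer or orchestrator polls this and
// routes traffic away if it stops returning 200.
function healthCheck(req, res) {
  res.status(200).json({
    status: 'ok',
    uptime: process.uptime(),          // seconds since the process started
    timestamp: new Date().toISOString(),
  });
}

// Wired up as: app.get('/health', healthCheck);
```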

GPT: The Architecture Astronaut

GPT's output felt more architectural than the others'. It created:

  • A custom HttpError class for typed error handling
  • An asyncHandler wrapper so async route errors flow to middleware instead of crashing
  • A separate config.js module for centralized configuration
  • Split app.js from server.js (a testability pattern, so you can import the app without starting the server)

The asyncHandler is genuinely useful. Express doesn't catch async errors by default, so without it, an unhandled promise rejection in a route kills the process. Most production Express apps need this. GPT just assumed you'd want it.
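The pattern is small enough to show in full. This is the standard community version of both pieces; GPT's exact code may differ in names and details:

```javascript
// Typed error: middleware can read err.status instead of string-matching.
class HttpError extends Error {
  constructor(status, message) {
    super(message);
    this.status = status;
  }
}

// Wrap an async handler so a rejected promise is forwarded to next(),
// reaching the app's error middleware instead of crashing the process.
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);
```

Usage looks like `app.get('/api/products', asyncHandler(async (req, res) => { ... }))`, with a single error-handling middleware at the end of the chain turning `HttpError` instances into responses.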

The app.js/server.js split is a pattern you see in well-tested codebases. Whether you need it for a refactor of a messy file is debatable, but it's not wrong.

Gemini: Just the Refactor

Gemini did exactly what was asked and nothing more. It produced a clean folder structure, extracted middleware, and moved creds to env vars. It didn't add anything extra.

Depending on your perspective, that's either the most disciplined output or the least helpful. If you just want a clean refactor with no surprises, Gemini's your model. If you want it to catch things you missed, it won't.

Auto: Chose Opus

Auto mode picked Opus for this task (the output was nearly identical). If you're curious what Cursor thinks is the right model for a refactor, apparently it's the thorough one, not the fast one.

The Timing

| Model | Time |
| --- | --- |
| Sonnet | ~60s |
| GPT | ~1m 21s |
| Opus | ~2m |
| Gemini | ~4m 21s |
| Auto | ~2m (picked Opus) |

Sonnet is 4x faster than Gemini. For a simple refactor, that matters. Opus takes twice as long as Sonnet but it catches more issues. GPT lands in the middle.

So Which Model?

Depends on what you want:

  • If you want fast and clean: I'd go with Sonnet. It does the refactor, adds practical ops stuff (health check, README), and finishes in a minute.
  • If you want thorough: Opus catches security issues and fixes subtle bugs, but it takes longer. It's good for code you're about to ship.
  • If you want opinionated: GPT adds architectural patterns you might not need yet but will appreciate later. I'd expect to review its structural decisions though.
  • If you want minimal: Gemini does what you asked and nothing more. It's the slowest though, which makes the lack of extras harder to justify.

For most refactors, I'd pick Sonnet and let Opus handle anything security-sensitive. But honestly the real takeaway is: check what the model added. Some of it is genuinely useful stuff you forgot to ask for.


I tested this with Cursor CLI agent mode, February 2026. All models were run through Cursor Pro. I committed the input file and all outputs to git before and after each run.

📝 Previous articles:

💻 Free collection (33 rules): github.com/nedcodes-ok/cursorrules-collection

---

*Check your setup: npx cursor-doctor scan — finds broken rules, conflicts, and token waste. Free on npm.*


📋 I made a free Cursor Safety Checklist — a pre-flight checklist for AI-assisted coding sessions, based on actual experiments.

Get it free →
