The Vibe-Coding Cleanup Playbook: What Senior Engineers Actually Do When They Inherit an AI-Generated Codebase

Roughly 8,000 startups built production apps with Cursor, Replit Agent, Lovable, or Bolt in 2024 and 2025. Most of them now need cleanup work, and the engagements run $50K to $500K. Veracode's 2025 analysis found ~50% of AI-generated code contains security flaws and AI-co-authored code has 1.7x more major issues than human-written code.

This post is the technical playbook we run when a vibe-coded codebase lands in our lap. It is not theoretical. It is what works.

The Audit Phase (Days 1 to 3)

Before any code changes, you need to know what you're dealing with. The founder will tell you what they think is there. That is the starting hypothesis, not the answer.

Step 1: Inventory entry points and integrations.

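# Collect audit output in one place (the redirects below fail if the directory is missing)
mkdir -p _audit

# Find every HTTP entry point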
rg -t ts -t js "app\.(get|post|put|delete|patch)|router\.(get|post|put|delete|patch)" \
  --line-number > _audit/entry_points.txt

# Find every external integration (limit to source files so docs and the audit output itself are skipped)
rg -i -t ts -t js "axios|fetch\(|got\(|node-fetch|http\.request" \
  --line-number > _audit/external_calls.txt

# Find every database call
rg -t ts -t js "(prisma|knex|drizzle|sequelize|mongoose|pg\.query|supabase)" \
  --line-number > _audit/db_calls.txt

You are looking for the gap between the founder's mental model and the actual surface area of the app. The gap is always large.
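To compare that surface area against the founder's list, it helps to turn entry_points.txt into structured data. The sketch below assumes the `file:line:text` shape that `rg --line-number` writes when redirected to a file, and it only picks up a route path when it is a string literal on the same line as the call. `parseEntryPoints`, `mutatingRoutes`, and `RouteHit` are names made up for this example.

```typescript
// One route registration found by the rg command above.
interface RouteHit {
  file: string;
  line: number;
  verb: string;         // HTTP verb as written in the code, e.g. "delete"
  route: string | null; // null when the path is not a same-line string literal
}

// Matches app.get('/x' ... or router.post("/y" ...; the path group is optional.
const ROUTE_RE =
  /\b(?:app|router)\.(get|post|put|delete|patch)\(\s*(?:['"`]([^'"`]*)['"`])?/;

function parseEntryPoints(rgOutput: string): RouteHit[] {
  const hits: RouteHit[] = [];
  for (const raw of rgOutput.split(/\r?\n/)) {
    // rg prints "file:line:text"; the text itself may contain more colons.
    const parts = /^([^:]+):(\d+):(.*)$/.exec(raw);
    if (!parts) continue;
    const call = ROUTE_RE.exec(parts[3]);
    if (!call) continue;
    hits.push({
      file: parts[1],
      line: Number(parts[2]),
      verb: call[1],
      route: call[2] ?? null,
    });
  }
  return hits;
}

// Mutating routes are the ones to check first when mapping the auth model.
function mutatingRoutes(hits: RouteHit[]): RouteHit[] {
  return hits.filter((h) => h.verb !== "get");
}
```

Feeding it the contents of _audit/entry_points.txt gives a list you can sort, count, and diff against whatever the founder wrote down. Routes registered through other helpers or frameworks will not show up, so treat the result as a lower bound.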

Step 2: Find the secrets.

# Tools to run (gitleaks and trufflehog are standalone binaries, not npm packages; install them first)
gitleaks detect --source . --verbose
trufflehog filesystem .
git log --all -p | rg -i "(api[_-]?key|secret|password|token|bearer)" | head -100

In the last six engagements, every single codebase had at least one secret in git history. Half had secrets still in current files. One had the production AWS root credentials in a .env.example checked into the public repo.

Step 3: Map the auth model.

Look at three things in the route handlers. Where is the user identity established? Where is authorization checked? Is the check on the server, or is it a client-side hide-the-button trick?

The vibe-coded pattern looks like this:

// Found in 70% of cleanup engagements
function AdminPanel() {
  const { user } = useAuth();
  if (!user?.isAdmin) return <div>Not authorized</div>;
  return <SensitiveAdminStuff />;
}

// Meanwhile, the API:
app.delete('/api/users/:id', async (req, res) => {
  await db.user.delete({ where: { id: req.params.id } });
  res.json({ ok: true });
});

The frontend hides the button. The endpoint deletes any user, no auth, no audit. Anyone with the URL can call it.
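One way to confirm which endpoints are open is to send each mutating route a request with no credentials and see whether the server refuses. The sketch below is a minimal version of that check, with the HTTP call injected so it can be exercised without a server. `Probe`, `SendUnauthenticated`, and `findUnguarded` are hypothetical names, and it assumes a guarded endpoint answers 401 or 403.

```typescript
interface Probe {
  verb: string; // e.g. "DELETE"
  url: string;
}

// Sends one request with no credentials and resolves to the HTTP status code.
type SendUnauthenticated = (probe: Probe) => Promise<number>;

// Returns every probe the server did not refuse outright.
async function findUnguarded(
  probes: Probe[],
  send: SendUnauthenticated,
): Promise<Probe[]> {
  const unguarded: Probe[] = [];
  for (const probe of probes) {
    const statusCode = await send(probe);
    // 401 or 403 means the server said no; anything else means it at least tried.
    if (statusCode !== 401 && statusCode !== 403) unguarded.push(probe);
  }
  return unguarded;
}
```

In practice `send` could be as small as `(p) => fetch(p.url, { method: p.verb }).then((r) => r.status)`. Because an unguarded DELETE will really delete something, point this at a disposable copy of the environment, never at production.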

The Stabilization Phase (Days 4 to 10)

You do not refactor a burning building. You put out the fire first.

Patch 1: Pull secrets from history.

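# Move secrets to env, remove from current files
git rm --cached --ignore-unmatch .env .env.local
echo ".env*" >> .gitignore

# Pull from history (destructive, coordinate with team)
# List every leaked path; old clones still contain the secrets, so everyone re-clones afterwards
git filter-repo --invert-paths --path .env --path .env.local --force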

# Rotate everything that was exposed
# This part is manual: every key, every token, every credential

Patch 2: Server-side auth on every mutating endpoint.

Write a middleware. Apply it everywhere. No exceptions.

// auth.ts
export async function requireAuth(req, res, next) {
  const token = req.headers.authorization?.replace('Bearer ', '');
  if (!token) return res.status(401).json({ error: 'unauthorized' });

  try {
    const user = await verifyToken(token);
    req.user = user;
    next();
  } catch {
    return res.status(401).json({ error: 'invalid token' });
  }
}

export function requireRole(...roles: string[]) {
  return (req, res, next) => {
    if (!roles.includes(req.user?.role)) {
      return res.status(403).json({ error: 'forbidden' });
    }
    next();
  };
}

// Apply
app.delete('/api/users/:id', requireAuth, requireRole('admin'), handler);

Patch 3: Rate limit the abuse-prone endpoints.

import rateLimit from 'express-rate-limit';

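// Throttle state-changing requests only; reads pass through untouched
const writeLimit = rateLimit({
  windowMs: 60_000,
  max: 20,
  standardHeaders: true,
  skip: (req) => ['GET', 'HEAD', 'OPTIONS'].includes(req.method),
});

// Login: only failed attempts count toward the limit
const authLimit = rateLimit({
  windowMs: 15 * 60_000,
  max: 5,
  skipSuccessfulRequests: true,
});

// Registration: count every attempt, otherwise successful sign-ups are unlimited
const registerLimit = rateLimit({
  windowMs: 15 * 60_000,
  max: 5,
});

app.use('/api/auth/login', authLimit);
app.use('/api/auth/register', registerLimit);
app.use('/api', writeLimit);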

Patch 4: Parameterize every query.

The vibe-coded SQL pattern:

// DANGEROUS
const results = await db.query(
  `SELECT * FROM users WHERE email = '${email}'`
);

The fix:

// SAFE
const results = await db.query(
  `SELECT * FROM users WHERE email = $1`,
  [email]
);

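Run rg -U 'db\.query\(\s*`[^`]*\$\{' -t ts -t js to find the interpolated calls, including the ones where the template literal starts on the line after db.query(. Repeat the search for any other query helper the inventory turned up. There will be more than you expect.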

The Observability Phase (Days 11 to 17)

You cannot prioritize what you cannot see. Get instrumentation in before you refactor anything.

The minimum viable stack:

// Error tracking
import * as Sentry from '@sentry/node';
Sentry.init({ dsn: process.env.SENTRY_DSN, tracesSampleRate: 0.1 });

// Structured logging
import pino from 'pino';
const log = pino({ level: process.env.LOG_LEVEL || 'info' });

// HTTP request logging with timing
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    log.info({
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration_ms: Date.now() - start,
      user_id: req.user?.id,
    });
  });
  next();
});

// Database query logging (Prisma example)
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient({
  log: [
    { level: 'query', emit: 'event' },
    { level: 'error', emit: 'stdout' },
  ],
});

prisma.$on('query', (e) => {
  if (e.duration > 100) {
    log.warn({ query: e.query, duration: e.duration }, 'slow query');
  }
});

Run that for 48 hours. Look at the data. The pattern is always the same: 80% of the pain comes from 5 endpoints and 3 queries.
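A quick way to look at that data is to rank endpoints by total time spent. The sketch below assumes the newline-delimited JSON that the request-logging middleware above emits (`method`, `path`, `status`, `duration_ms`) and skips any line that does not have that shape. `rankEndpoints` and `EndpointStats` are illustrative names, not part of pino.

```typescript
interface RequestLog {
  method: string;
  path: string;
  status: number;
  duration_ms: number;
}

interface EndpointStats {
  endpoint: string; // "METHOD path"
  count: number;
  totalMs: number;
  errors: number;   // responses with status >= 500
}

// Ranks endpoints by cumulative request time, highest first.
function rankEndpoints(ndjson: string, top = 5): EndpointStats[] {
  const byEndpoint = new Map<string, EndpointStats>();
  for (const line of ndjson.split(/\r?\n/)) {
    let entry: Partial<RequestLog> | null;
    try {
      entry = JSON.parse(line) as Partial<RequestLog> | null;
    } catch {
      continue; // blank line, stack trace, or other non-JSON output
    }
    if (!entry || typeof entry.path !== "string" || typeof entry.duration_ms !== "number") {
      continue; // some other log record, e.g. the slow-query warning
    }
    const key = `${entry.method ?? "?"} ${entry.path}`;
    const stats = byEndpoint.get(key) ?? { endpoint: key, count: 0, totalMs: 0, errors: 0 };
    stats.count += 1;
    stats.totalMs += entry.duration_ms;
    if ((entry.status ?? 0) >= 500) stats.errors += 1;
    byEndpoint.set(key, stats);
  }
  return Array.from(byEndpoint.values())
    .sort((a, b) => b.totalMs - a.totalMs)
    .slice(0, top);
}
```

Because the middleware logs `req.path`, URLs with embedded IDs (`/api/users/123`) each land in their own bucket. Logging the matched route pattern instead, or normalizing IDs before grouping, gives a cleaner ranking.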

The Keep-or-Rebuild Decision

For each module, score it against three questions:

Is the business logic clear? Can you write a one-paragraph spec for what it does, and is the code clearly implementing that spec? Or are there branches that nobody can explain?
Is the data model correct? Are tables normalized appropriately? Are foreign keys actually constrained? Or did the AI invent denormalizations the founder accepted without understanding the implications?
Is it isolated enough to refactor incrementally? Can you replace it behind a feature flag, or is its logic spread across forty files?

Scoring:

3 yes answers: refactor in place
2 yes: refactor with caution
1 yes: rebuild behind a feature flag
0 yes: rebuild and burn the original
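Writing the rubric down as code keeps the scoring consistent when several engineers review different modules. This is a direct transcription of the table above; the field names on `ModuleAnswers` are shorthand invented here for the three questions.

```typescript
type Decision =
  | "refactor in place"
  | "refactor with caution"
  | "rebuild behind a feature flag"
  | "rebuild and burn the original";

interface ModuleAnswers {
  logicIsClear: boolean;        // can you write a one-paragraph spec the code matches?
  dataModelIsCorrect: boolean;  // sensible normalization, real foreign keys
  isIsolated: boolean;          // replaceable behind a flag, not spread over forty files
}

function decide(answers: ModuleAnswers): Decision {
  const yesCount = [
    answers.logicIsClear,
    answers.dataModelIsCorrect,
    answers.isIsolated,
  ].filter(Boolean).length;

  switch (yesCount) {
    case 3:
      return "refactor in place";
    case 2:
      return "refactor with caution";
    case 1:
      return "rebuild behind a feature flag";
    default:
      return "rebuild and burn the original";
  }
}
```

Recording the three answers per module, not just the verdict, also leaves a paper trail for the conversation with the founder about why a module is being rebuilt.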

In practice, the auth module survives most engagements. The billing module survives sometimes. The core business logic almost never survives.

The Rebuild Phase

The replacement code uses the same AI tools, but with explicit guardrails. The pattern that works:

Write the test first. Generate the test scaffolding with the AI if you want, but the assertions are human-written and reflect the actual business requirement.
Generate the implementation. Let the AI handle the boilerplate. Read every line before accepting it. Reject anything that touches modules outside the current concern.
Run the test. Iterate the prompt or the implementation until it passes.
Code review. Either a second engineer or yourself the next morning. Treat AI output the same way you'd treat a junior engineer's PR.
Merge behind a feature flag. The old code still serves production until the new code is proven.
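The last step can be as simple as a wrapper that picks an implementation per request. The sketch below is one minimal way to do it, assuming the flag is decided per customer; `behindFlag`, `cohortFlag`, and `Impl` are hypothetical names, and a real setup would likely read the cohort from a flag service or config rather than an in-memory set.

```typescript
type Impl<I, O> = (input: I) => Promise<O>;

// Routes each call to the rebuilt code only when the flag says so.
// The legacy implementation stays the default path.
function behindFlag<I, O>(
  isEnabled: (input: I) => boolean,
  legacy: Impl<I, O>,
  rebuilt: Impl<I, O>,
): Impl<I, O> {
  return (input) => (isEnabled(input) ? rebuilt(input) : legacy(input));
}

// Flag predicate for a cohort rollout: enabled only for listed customers.
function cohortFlag(enabledCustomers: ReadonlySet<string>) {
  return (input: { customerId: string }): boolean =>
    enabledCustomers.has(input.customerId);
}
```

Moving a cohort over then means adding customer IDs to the set, and rolling back means removing them. Neither requires a deploy if the set is loaded from configuration.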

A typical day on a rebuild engagement looks like 6 to 8 small PRs, all green CI, all reviewed, all merged behind flags. We measure success by what is migratable, not by what is committed.

What This Costs in Real Numbers

The engagement we ran last quarter on a 40-customer B2B SaaS:

Audit: 3 days, 1 senior engineer
Stabilization: 7 days, including 31 rotated keys, 1 SQL injection fix, server-side auth refactor, rate limiting
Observability: 5 days, full Sentry + Pino + Datadog APM rollout
Keep-or-rebuild decisions: 1 day of architecture review
Core rebuild: 3 weeks, 2 senior engineers, behind feature flags
Migration: 2 weeks, customer cohorts moved one tier at a time

Total: 10 weeks end to end, around $90K; the phases above account for about 8 working weeks of that. The founder told us afterward the engagement was the difference between selling the company and going under. That math has been roughly consistent across every engagement we've run.

What Not to Do

A few hard-won lessons:

Do not rewrite from scratch as the first move. You will reproduce the bugs the founder hasn't noticed yet, and you will miss the business logic that lives in awkward branches.
Do not skip the audit because the founder swears they know what's there. They don't.
Do not let the AI tools touch the code without supervision during the cleanup. The tools that made the mess are not the tools to clean it up unattended.
Do not refactor without tests. If there are no tests, your first job is to write characterization tests around the existing behavior.
Do not promise speed. Promise correctness. Speed comes back once the foundation is solid.
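For the characterization tests mentioned above, the idea is to record what the existing code does for a set of inputs, then fail the build whenever a refactor changes any of those outputs. Here is a minimal sketch with no test framework. `recordBehavior`, `findRegressions`, and the `legacyShippingCents` function are invented for illustration, and outputs are compared by their JSON form, which assumes they are plain data.

```typescript
interface Recorded<I, O> {
  input: I;
  output: O;
}

// Capture what the existing code does today, right or wrong.
function recordBehavior<I, O>(
  fn: (input: I) => O,
  inputs: readonly I[],
): Recorded<I, O>[] {
  return inputs.map((input) => ({ input, output: fn(input) }));
}

// Compare a candidate implementation against the recorded behavior.
function findRegressions<I, O>(
  fn: (input: I) => O,
  golden: readonly Recorded<I, O>[],
): string[] {
  const problems: string[] = [];
  for (const { input, output } of golden) {
    const actual = JSON.stringify(fn(input));
    const expected = JSON.stringify(output);
    if (actual !== expected) {
      problems.push(`input ${JSON.stringify(input)}: expected ${expected}, got ${actual}`);
    }
  }
  return problems;
}

// Hypothetical legacy function with a branch nobody can explain.
function legacyShippingCents(quantity: number): number {
  if (quantity === 13) return 0; // unexplained, but customers may depend on it
  return quantity > 10 ? 500 : 900;
}
```

Recording `legacyShippingCents` for inputs that include 13 means a "tidied" version that drops the odd branch shows up as a regression, which forces a decision about that branch before any behavior changes. In a real codebase the recorded pairs would be saved to a file and checked in next to the tests.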

If you're hitting this in production and want a second set of eyes, feel free to DM me. Happy to dig in.