DEV Community

manja316


I Automated My Entire Code Review with Claude Code Skills — Here's the Setup That Catches Real Bugs

After reviewing code manually for years, I finally built a system that catches the bugs I actually care about — not style nits, not formatting, but real logic errors, security holes, and performance traps. Here's exactly how I set it up using Claude Code skills.

The Problem with Standard Linters

ESLint, Pylint, Semgrep — they catch syntax problems and known patterns. But they completely miss:

  • Business logic that silently returns wrong results
  • API endpoints that accept but don't validate nested objects
  • Database queries that work in dev but timeout at scale
  • Security holes that look like normal code

I needed something that understands intent, not just syntax.

The Claude Code Skill Architecture

Claude Code skills are markdown files that teach Claude domain-specific expertise. Instead of one giant "review everything" prompt, I built specialized skills that each own one review dimension:

~/.claude/skills/
├── security-scanner.md      # OWASP Top 10, injection, auth bypass
├── typescript-expert.md     # Type holes, unsafe casts, generics abuse
├── react-expert.md          # Server/Client boundaries, hook violations
├── api-connector.md         # API integration patterns, error handling
└── dashboard-builder.md     # Data flow, state management, rendering

Each skill runs independently. Here's why that matters.
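For reference, here's a minimal sketch of what one of these skill files can look like. The frontmatter fields match the name/description convention Claude Code skills use, but the section headings and example content below are illustrative, not a fixed schema:

```markdown
---
name: security-scanner
description: Reviews diffs for injection, auth bypass, and missing input validation
---

# Security Scanner

When reviewing code, trace every string that reaches a database query,
shell command, or HTML response back to its source. Flag any path where
user-controlled input arrives unsanitized.

## Examples of bugs to flag
- String interpolation inside SQL: `SELECT ... WHERE x = '${input}'`
- Public endpoints with no signature check or rate limit

## Do NOT flag
- Parameterized queries (`?` placeholders or named bindings)
```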

Skill 1: Security Scanner — Catching What Semgrep Misses

My Security Scanner skill focuses on vulnerabilities that static analysis tools consistently miss. Here's a real example it caught in production code:

// Looks normal. Semgrep passes it. ESLint passes it.
app.post('/api/webhook', async (req, res) => {
  const { event, data } = req.body;
  const query = `SELECT * FROM events WHERE type = '${event}'`;
  await db.execute(query);

  // Process based on event type
  if (event === 'payment.completed') {
    await processPayment(data);
  }
  res.json({ ok: true });
});

The security scanner flagged three issues:

  1. SQL injection via string interpolation in the query (obvious once pointed out, invisible in a 500-line diff)
  2. No webhook signature verification — anyone can POST fake payment events
  3. No rate limiting on a public endpoint that triggers database writes

The fixed version:

app.post('/api/webhook',
  verifyWebhookSignature(process.env.WEBHOOK_SECRET),
  rateLimit({ windowMs: 60_000, max: 100 }),
  async (req, res) => {
    const { event, data } = req.body;
    const [rows] = await db.execute(
      'SELECT * FROM events WHERE type = ?',
      [event]
    );
    if (event === 'payment.completed') {
      await processPayment(data);
    }
    res.json({ ok: true });
  }
);
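For completeness, here's one way the `verifyWebhookSignature` middleware could work — a sketch assuming an HMAC-SHA256 scheme and an `x-webhook-signature` header, both of which are my illustrative choices; real providers (Stripe, GitHub, etc.) each define their own header name and encoding:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Core check: recompute the HMAC over the raw body and compare it to the
// signature header in constant time, so attackers can't probe byte-by-byte.
function isValidSignature(secret: string, rawBody: string, signature: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected, 'hex');
  const b = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on length mismatch, so guard first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Express-style middleware wrapper matching the fixed example above.
// The header name is an assumption for this sketch.
function verifyWebhookSignature(secret: string) {
  return (req: any, res: any, next: () => void) => {
    const sig = String(req.headers['x-webhook-signature'] ?? '');
    if (!isValidSignature(secret, JSON.stringify(req.body), sig)) {
      return res.status(401).json({ error: 'invalid signature' });
    }
    next();
  };
}
```

One caveat worth knowing: in production you should verify against the raw request body bytes, not a re-serialized `JSON.stringify(req.body)`, since key ordering can differ.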

Three bugs. One skill. Zero false positives on this particular review.

Skill 2: API Connector — Integration Pattern Review

My API Connector skill reviews how your code talks to external services. It caught this in a teammate's PR:

async function syncUserData(userId: string) {
  const profile = await fetch(`https://api.provider.com/users/${userId}`);
  const orders = await fetch(`https://api.provider.com/orders?user=${userId}`);
  const prefs = await fetch(`https://api.provider.com/preferences/${userId}`);

  return { profile: await profile.json(), orders: await orders.json(), prefs: await prefs.json() };
}

Issues flagged:

  1. Sequential fetches that should be parallel (Promise.all) — 3x latency for no reason
  2. No timeout — if the provider API hangs, your server hangs
  3. No error handling — a 404 on preferences crashes the entire sync
  4. No retry logic — transient 503s cause permanent failures

The skill suggested this pattern instead:

async function syncUserData(userId: string) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 5_000);

  try {
    const [profile, orders, prefs] = await Promise.allSettled([
      fetchWithRetry(`/users/${userId}`, { signal: controller.signal }),
      fetchWithRetry(`/orders?user=${userId}`, { signal: controller.signal }),
      fetchWithRetry(`/preferences/${userId}`, { signal: controller.signal }),
    ]);

    return {
      profile: profile.status === 'fulfilled' ? profile.value : null,
      orders: orders.status === 'fulfilled' ? orders.value : [],
      prefs: prefs.status === 'fulfilled' ? prefs.value : defaults,
    };
  } finally {
    clearTimeout(timeout);
  }
}

This pattern — parallel fetches, timeouts, graceful degradation — is something linters can't enforce because it requires understanding the business context.
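The `fetchWithRetry` helper assumed in that snippet could look like this — a sketch where the retry count, exponential backoff base, and the "retry only on 5xx" rule are illustrative defaults, and the injectable `fetchFn` parameter exists purely to make it testable:

```typescript
// Retry transient server errors with exponential backoff; treat 4xx as
// permanent failures and give up immediately.
async function fetchWithRetry(
  url: string,
  init: { signal?: AbortSignal } = {},
  retries = 3,
  fetchFn: typeof fetch = fetch
): Promise<unknown> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url, init);
    if (res.ok) return res.json();
    if (attempt >= retries || res.status < 500) {
      throw new Error(`${url} failed with ${res.status}`);
    }
    // Backoff: 250ms, 500ms, 1s, ...
    await new Promise((r) => setTimeout(r, 2 ** attempt * 250));
  }
}
```

Passing `controller.signal` through `init` means the 5-second timeout in `syncUserData` aborts all in-flight retries at once.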

Skill 3: Dashboard Builder — Data Flow Review

The Dashboard Builder skill reviews data visualization and state management code. It caught a subtle bug in a monitoring dashboard:

function MetricsPanel({ timeRange }: { timeRange: string }) {
  const [data, setData] = useState([]);

  useEffect(() => {
    fetchMetrics(timeRange).then(setData);
  }, []); // <-- Missing dependency

  return <Chart data={data} />;
}

The empty dependency array means the dashboard loads once and never updates when the user changes the time range. The skill caught it and also flagged that fetchMetrics needed cancellation handling to prevent state updates on unmounted components.
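The cancellation fix can be sketched outside React as a small helper. The shape below mirrors what the corrected useEffect body should return; `fetchMetrics` and `setData` are passed in (and the name `startMetricsLoad` is mine) purely so the logic is testable in isolation:

```typescript
// Cancellation-safe effect body: start the fetch, ignore the result if the
// effect was cleaned up, and return the cleanup function — exactly what the
// corrected useEffect callback should return.
function startMetricsLoad(
  timeRange: string,
  fetchMetrics: (range: string, signal: AbortSignal) => Promise<number[]>,
  setData: (data: number[]) => void
): () => void {
  const controller = new AbortController();
  fetchMetrics(timeRange, controller.signal)
    .then((data) => {
      // Guard: the component may have unmounted, or timeRange changed.
      if (!controller.signal.aborted) setData(data);
    })
    .catch((err) => {
      if ((err as Error)?.name !== 'AbortError') throw err;
    });
  return () => controller.abort();
}
```

In the component this becomes `useEffect(() => startMetricsLoad(timeRange, fetchMetrics, setData), [timeRange])` — with `timeRange` in the dependency array so the chart refetches when it changes, and the abort on cleanup preventing stale responses from clobbering newer ones.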

The Review Pipeline

I chain these skills together in a pre-commit workflow:

# .claude/hooks/pre-commit.sh
claude --skill security-scanner "Review staged changes for security issues"
claude --skill api-connector "Review staged changes for API integration issues"
claude --skill dashboard-builder "Review staged changes for data flow issues"

Each skill runs in under 10 seconds. Total review time: ~30 seconds for a typical PR. Compare that to waiting hours for a human reviewer who might miss the SQL injection anyway.

Results After 30 Days

Running this on our team's codebase for a month:

| Metric | Before | After |
| --- | --- | --- |
| Security bugs reaching staging | 4/month | 0 |
| API integration failures in prod | 6/month | 1 |
| Review turnaround time | 4-8 hours | 30 seconds |
| False positive rate | N/A | ~12% (acceptable) |

The 12% false positive rate sounds high, but each "false positive" is still a code smell worth discussing. The skill flags it, the developer decides.

How to Build Your Own

  1. Start with one skill. Pick the bug category that costs you the most time.
  2. Use real examples. Paste actual bugs from your codebase into the skill as few-shot examples.
  3. Be specific. "Check for security issues" is useless. "Check for SQL injection in any string that touches a database query" is actionable.
  4. Iterate on false positives. When the skill flags something incorrectly, add a "do NOT flag this pattern" section.

If you want a head start, I've packaged my production skills.

Each one is a single markdown file you drop into ~/.claude/skills/. No setup beyond that.

What's Next

I'm building a dependency auditor skill that cross-references your package.json against known vulnerability databases in real time during review. Not a scanner that runs separately — a skill that catches vulnerable imports the moment they're added to code.

The gap between "code that works" and "code that's production-ready" is where AI code review actually delivers value. Linters handle formatting. Humans handle architecture. Skills handle the bug patterns in between.


What's the most expensive bug that slipped through your code review process? I'm collecting patterns to build more detection skills.
