DEV Community

manja316


I Automated My Entire Code Review with Claude Code Skills — Here's the Setup That Catches Real Bugs

After reviewing code manually for years, I finally built a system that catches the bugs I actually care about — not style nits, not formatting, but real logic errors, security holes, and performance traps. Here's exactly how I set it up using Claude Code skills.

The Problem with Standard Linters

ESLint, Pylint, Semgrep — they catch syntax problems and known patterns. But they completely miss:

  • Business logic that silently returns wrong results
  • API endpoints that accept but don't validate nested objects
  • Database queries that work in dev but timeout at scale
  • Security holes that look like normal code

I needed something that understands intent, not just syntax.

The Claude Code Skill Architecture

Claude Code skills are markdown files that teach Claude domain-specific expertise. Instead of one giant "review everything" prompt, I built specialized skills that each own one review dimension:

~/.claude/skills/
├── security-scanner.md      # OWASP Top 10, injection, auth bypass
├── typescript-expert.md     # Type holes, unsafe casts, generics abuse
├── react-expert.md          # Server/Client boundaries, hook violations
├── api-connector.md         # API integration patterns, error handling
└── dashboard-builder.md     # Data flow, state management, rendering

Each skill runs independently. Here's why that matters.
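For reference, here's a minimal sketch of what one of these skill files can look like. The frontmatter fields match the name/description convention Claude Code skills use, but the section headings and example content below are illustrative, not a fixed schema:

```markdown
---
name: security-scanner
description: Reviews diffs for injection, auth bypass, and missing input validation
---

# Security Scanner

When reviewing code, trace every string that reaches a database query,
shell command, or HTML response back to its source. Flag any path where
user-controlled input arrives unsanitized.

## Examples of bugs to flag
- String interpolation inside SQL: `SELECT ... WHERE x = '${input}'`
- Public endpoints with no signature check or rate limit

## Do NOT flag
- Parameterized queries (`?` placeholders or named bindings)
```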

Skill 1: Security Scanner — Catching What Semgrep Misses

My Security Scanner skill focuses on vulnerabilities that static analysis tools consistently miss. Here's a real example it caught in production code:

// Looks normal. Semgrep passes it. ESLint passes it.
app.post('/api/webhook', async (req, res) => {
  const { event, data } = req.body;
  const query = `SELECT * FROM events WHERE type = '${event}'`;
  await db.execute(query);

  // Process based on event type
  if (event === 'payment.completed') {
    await processPayment(data);
  }
  res.json({ ok: true });
});

The security scanner flagged three issues:

  1. SQL injection via string interpolation in the query (obvious once pointed out, invisible in a 500-line diff)
  2. No webhook signature verification — anyone can POST fake payment events
  3. No rate limiting on a public endpoint that triggers database writes

The fixed version:

app.post('/api/webhook',
  verifyWebhookSignature(process.env.WEBHOOK_SECRET),
  rateLimit({ windowMs: 60_000, max: 100 }),
  async (req, res) => {
    const { event, data } = req.body;
    const [rows] = await db.execute(
      'SELECT * FROM events WHERE type = ?',
      [event]
    );
    if (event === 'payment.completed') {
      await processPayment(data);
    }
    res.json({ ok: true });
  }
);
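For completeness, here's one way the `verifyWebhookSignature` middleware could work — a sketch assuming an HMAC-SHA256 scheme and an `x-webhook-signature` header, both of which are my illustrative choices; real providers (Stripe, GitHub, etc.) each define their own header name and encoding:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Core check: recompute the HMAC over the raw body and compare it to the
// signature header in constant time, so attackers can't probe byte-by-byte.
function isValidSignature(secret: string, rawBody: string, signature: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected, 'hex');
  const b = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on length mismatch, so guard first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Express-style middleware wrapper matching the fixed example above.
// The header name is an assumption for this sketch.
function verifyWebhookSignature(secret: string) {
  return (req: any, res: any, next: () => void) => {
    const sig = String(req.headers['x-webhook-signature'] ?? '');
    if (!isValidSignature(secret, JSON.stringify(req.body), sig)) {
      return res.status(401).json({ error: 'invalid signature' });
    }
    next();
  };
}
```

One caveat worth knowing: in production you should verify against the raw request body bytes, not a re-serialized `JSON.stringify(req.body)`, since key ordering can differ.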

Three bugs. One skill. Zero false positives on this particular review.

Skill 2: API Connector — Integration Pattern Review

My API Connector skill reviews how your code talks to external services. It caught this in a teammate's PR:

async function syncUserData(userId: string) {
  const profile = await fetch(`https://api.provider.com/users/${userId}`);
  const orders = await fetch(`https://api.provider.com/orders?user=${userId}`);
  const prefs = await fetch(`https://api.provider.com/preferences/${userId}`);

  return { profile: await profile.json(), orders: await orders.json(), prefs: await prefs.json() };
}

Issues flagged:

  1. Sequential fetches that should be parallel (Promise.all) — 3x latency for no reason
  2. No timeout — if the provider API hangs, your server hangs
  3. No error handling — a 404 on preferences crashes the entire sync
  4. No retry logic — transient 503s cause permanent failures

The skill suggested this pattern instead:

async function syncUserData(userId: string) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 5_000);

  try {
    const [profile, orders, prefs] = await Promise.allSettled([
      fetchWithRetry(`/users/${userId}`, { signal: controller.signal }),
      fetchWithRetry(`/orders?user=${userId}`, { signal: controller.signal }),
      fetchWithRetry(`/preferences/${userId}`, { signal: controller.signal }),
    ]);

    return {
      profile: profile.status === 'fulfilled' ? profile.value : null,
      orders: orders.status === 'fulfilled' ? orders.value : [],
      prefs: prefs.status === 'fulfilled' ? prefs.value : defaults,
    };
  } finally {
    clearTimeout(timeout);
  }
}

This pattern — parallel fetches, timeouts, graceful degradation — is something linters can't enforce because it requires understanding the business context.
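The `fetchWithRetry` helper assumed in that snippet could look like this — a sketch where the retry count, exponential backoff base, and the "retry only on 5xx" rule are illustrative defaults, and the injectable `fetchFn` parameter exists purely to make it testable:

```typescript
// Retry transient server errors with exponential backoff; treat 4xx as
// permanent failures and give up immediately.
async function fetchWithRetry(
  url: string,
  init: { signal?: AbortSignal } = {},
  retries = 3,
  fetchFn: typeof fetch = fetch
): Promise<unknown> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url, init);
    if (res.ok) return res.json();
    if (attempt >= retries || res.status < 500) {
      throw new Error(`${url} failed with ${res.status}`);
    }
    // Backoff: 250ms, 500ms, 1s, ...
    await new Promise((r) => setTimeout(r, 2 ** attempt * 250));
  }
}
```

Passing `controller.signal` through `init` means the 5-second timeout in `syncUserData` aborts all in-flight retries at once.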

Skill 3: Dashboard Builder — Data Flow Review

The Dashboard Builder skill reviews data visualization and state management code. It caught a subtle bug in a monitoring dashboard:

function MetricsPanel({ timeRange }: { timeRange: string }) {
  const [data, setData] = useState([]);

  useEffect(() => {
    fetchMetrics(timeRange).then(setData);
  }, []); // <-- Missing dependency

  return <Chart data={data} />;
}

The empty dependency array means the dashboard loads once and never updates when the user changes the time range. The skill caught it and also flagged that fetchMetrics needed cancellation handling to prevent state updates on unmounted components.
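The cancellation fix can be sketched outside React as a small helper. The shape below mirrors what the corrected useEffect body should return; `fetchMetrics` and `setData` are passed in (and the name `startMetricsLoad` is mine) purely so the logic is testable in isolation:

```typescript
// Cancellation-safe effect body: start the fetch, ignore the result if the
// effect was cleaned up, and return the cleanup function — exactly what the
// corrected useEffect callback should return.
function startMetricsLoad(
  timeRange: string,
  fetchMetrics: (range: string, signal: AbortSignal) => Promise<number[]>,
  setData: (data: number[]) => void
): () => void {
  const controller = new AbortController();
  fetchMetrics(timeRange, controller.signal)
    .then((data) => {
      // Guard: the component may have unmounted, or timeRange changed.
      if (!controller.signal.aborted) setData(data);
    })
    .catch((err) => {
      if ((err as Error)?.name !== 'AbortError') throw err;
    });
  return () => controller.abort();
}
```

In the component this becomes `useEffect(() => startMetricsLoad(timeRange, fetchMetrics, setData), [timeRange])` — with `timeRange` in the dependency array so the chart refetches when it changes, and the abort on cleanup preventing stale responses from clobbering newer ones.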

The Review Pipeline

I chain these skills together in a pre-commit workflow:

# .claude/hooks/pre-commit.sh
claude --skill security-scanner "Review staged changes for security issues"
claude --skill api-connector "Review staged changes for API integration issues"
claude --skill dashboard-builder "Review staged changes for data flow issues"

Each skill runs in under 10 seconds. Total review time: ~30 seconds for a typical PR. Compare that to waiting hours for a human reviewer who might miss the SQL injection anyway.

Results After 30 Days

Running this on our team's codebase for a month:

| Metric | Before | After |
| --- | --- | --- |
| Security bugs reaching staging | 4/month | 0 |
| API integration failures in prod | 6/month | 1 |
| Review turnaround time | 4-8 hours | 30 seconds |
| False positive rate | N/A | ~12% (acceptable) |

The 12% false positive rate sounds high, but each "false positive" is still a code smell worth discussing. The skill flags it, the developer decides.

How to Build Your Own

  1. Start with one skill. Pick the bug category that costs you the most time.
  2. Use real examples. Paste actual bugs from your codebase into the skill as few-shot examples.
  3. Be specific. "Check for security issues" is useless. "Check for SQL injection in any string that touches a database query" is actionable.
  4. Iterate on false positives. When the skill flags something incorrectly, add a "do NOT flag this pattern" section.

If you want a head start, I've packaged my production skills.

Each one is a single markdown file you drop into ~/.claude/skills/. No setup beyond that.

What's Next

I'm building a dependency auditor skill that cross-references your package.json against known vulnerability databases in real time during review. Not a scanner that runs separately — a skill that catches vulnerable imports the moment they're added to code.

The gap between "code that works" and "code that's production-ready" is where AI code review actually delivers value. Linters handle formatting. Humans handle architecture. Skills handle the bug patterns in between.


What's the most expensive bug that slipped through your code review process? I'm collecting patterns to build more detection skills.
