<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Стас Журавель</title>
    <description>The latest articles on DEV Community by Стас Журавель (@zhuravelstas).</description>
    <link>https://dev.to/zhuravelstas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3181276%2Fb6f79e55-dc06-44ca-95ee-28b11c66c926.jpg</url>
      <title>DEV Community: Стас Журавель</title>
      <link>https://dev.to/zhuravelstas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zhuravelstas"/>
    <language>en</language>
    <item>
      <title>Whole-laptop scanner for the Axios supply chain attack</title>
      <dc:creator>Стас Журавель</dc:creator>
      <pubDate>Wed, 01 Apr 2026 10:23:38 +0000</pubDate>
      <link>https://dev.to/zhuravelstas/whole-laptop-scanner-for-the-axios-supply-chain-attack-c25</link>
      <guid>https://dev.to/zhuravelstas/whole-laptop-scanner-for-the-axios-supply-chain-attack-c25</guid>
      <description>&lt;p&gt;On March 31, 2026, attackers hijacked the npm maintainer account for &lt;strong&gt;axios&lt;/strong&gt; (300M+ weekly downloads) and published poisoned versions that deploy a cross-platform Remote Access Trojan. The malicious versions were live for ~3 hours before being pulled.&lt;/p&gt;

&lt;p&gt;Every security vendor published analysis. None shipped a tool that scans your &lt;strong&gt;entire laptop&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So we built one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-second version
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sL&lt;/span&gt; https://raw.githubusercontent.com/booklib-ai/dispatch/main/dispatches/2026-04-01-axios-supply-chain-attack/scan.sh &lt;span class="nt"&gt;-o&lt;/span&gt; scan.sh
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x scan.sh
./scan.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This scans every npm project on your machine, checks for malware artifacts, verifies no C2 connections are active, and lists credentials that may have been exfiltrated.&lt;/p&gt;
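&lt;p&gt;The "every npm project" part is simpler than it sounds: a project is anywhere a lock file lives. A minimal bash sketch of that discovery step (illustrative only; the real &lt;code&gt;scan.sh&lt;/code&gt; does more):&lt;/p&gt;

```shell
# Hypothetical sketch of the project-discovery step; the actual scan.sh
# in the dispatch repo covers more cases than this.
find_lockfiles() {
  # Any of the four lock-file names marks the directory as an npm project.
  find "$1" -type f \( -name 'package-lock.json' -o -name 'yarn.lock' \
    -o -name 'pnpm-lock.yaml' -o -name 'bun.lock' \) 2>/dev/null
}

# Scan the whole home directory; each hit is a project root worth checking.
find_lockfiles "$HOME"
```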

&lt;h2&gt;
  
  
  What happened
&lt;/h2&gt;

&lt;p&gt;The attacker compromised the &lt;code&gt;jasonsaayman&lt;/code&gt; npm account and published:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;axios@1.14.1&lt;/code&gt; (targeting the 1.x user base)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;axios@0.30.4&lt;/code&gt; (targeting the legacy 0.x branch)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both versions inject &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt; — a package that runs a &lt;code&gt;postinstall&lt;/code&gt; script deploying platform-specific RATs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS&lt;/strong&gt;: Binary at &lt;code&gt;/Library/Caches/com.apple.act.mond&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows&lt;/strong&gt;: PowerShell copy at &lt;code&gt;%PROGRAMDATA%\wt.exe&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux&lt;/strong&gt;: Python script at &lt;code&gt;/tmp/ld.py&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After execution, the malware &lt;strong&gt;deletes itself&lt;/strong&gt; and replaces its &lt;code&gt;package.json&lt;/code&gt; with a clean version. If you inspect &lt;code&gt;node_modules&lt;/code&gt; after the fact, everything looks normal.&lt;/p&gt;
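&lt;p&gt;Because the payload cleans up after itself in &lt;code&gt;node_modules&lt;/code&gt;, the dropped binaries are the more reliable signal. A minimal sketch of the artifact check using the two Unix paths above (the Windows copy at &lt;code&gt;%PROGRAMDATA%\wt.exe&lt;/code&gt; needs a PowerShell check instead; this is illustrative, not the full &lt;code&gt;scan.sh&lt;/code&gt;):&lt;/p&gt;

```shell
check_path() {
  # Presence of a known dropper path means the postinstall payload ran here.
  if [ -e "$1" ]; then
    echo "INFECTED: $1"
  fi
}

# Campaign paths from the advisory: macOS RAT binary, Linux Python dropper.
check_path "/Library/Caches/com.apple.act.mond"
check_path "/tmp/ld.py"
```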

&lt;h2&gt;
  
  
  Why existing tools aren't enough
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;snyk test&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-project only — must &lt;code&gt;cd&lt;/code&gt; into each directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;StepSecurity Harden-Runner&lt;/td&gt;
&lt;td&gt;CI/CD only (GitHub Actions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;StepSecurity Dev Machine Guard&lt;/td&gt;
&lt;td&gt;Enterprise paid product&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm audit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Doesn't check for malware artifacts on disk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Our scanner does &lt;strong&gt;7 checks across your entire machine&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;All lock files&lt;/strong&gt; — recursively finds every &lt;code&gt;package-lock.json&lt;/code&gt;, &lt;code&gt;yarn.lock&lt;/code&gt;, &lt;code&gt;pnpm-lock.yaml&lt;/code&gt;, &lt;code&gt;bun.lock&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;node_modules&lt;/strong&gt; — checks for the &lt;code&gt;plain-crypto-js&lt;/code&gt; directory (its presence alone means compromise, even if the contents look clean)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Package caches&lt;/strong&gt; — npm, Yarn, pnpm, Bun&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Malware artifacts&lt;/strong&gt; — OS-specific trojan paths + campaign files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C2 connections&lt;/strong&gt; — &lt;code&gt;sfrclak.com&lt;/code&gt; / &lt;code&gt;142.11.206.73&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential files&lt;/strong&gt; — lists what may have been exfiltrated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardening&lt;/strong&gt; — checks &lt;code&gt;ignore-scripts&lt;/code&gt;, recommends &lt;code&gt;overrides&lt;/code&gt; block&lt;/li&gt;
&lt;/ol&gt;
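&lt;p&gt;Check 5 needs no special tooling: any live socket to the campaign IP shows up in &lt;code&gt;netstat&lt;/code&gt; (or &lt;code&gt;lsof -i&lt;/code&gt;). A sketch, assuming &lt;code&gt;netstat&lt;/code&gt; is on the PATH:&lt;/p&gt;

```shell
check_c2() {
  # Grep live sockets for the campaign C2 IP from the IOC list above.
  # netstat is an assumed dependency; lsof -i would work the same way.
  if netstat -an 2>/dev/null | grep -q '142\.11\.206\.73'; then
    echo "ACTIVE C2 CONNECTION"
  else
    echo "no C2 connection"
  fi
}

check_c2
```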

&lt;h2&gt;
  
  
  The false positive trap
&lt;/h2&gt;

&lt;p&gt;Our first version had a bug: the regex &lt;code&gt;"1.14.1"&lt;/code&gt; matched &lt;strong&gt;any&lt;/strong&gt; package at that version — &lt;code&gt;serve-static@1.14.1&lt;/code&gt;, &lt;code&gt;@webassemblyjs/ast@1.14.1&lt;/code&gt;, etc. One machine showed 48 "critical" hits that were all false positives.&lt;/p&gt;

&lt;p&gt;The fix: two-phase detection. Phase 1 searches for definitive markers (&lt;code&gt;plain-crypto-js&lt;/code&gt;, &lt;code&gt;openclaw-qbot&lt;/code&gt;). Phase 2 does contextual grep — only flags version &lt;code&gt;1.14.1&lt;/code&gt; when it appears within 2 lines of &lt;code&gt;"axios"&lt;/code&gt; in the lock file.&lt;/p&gt;
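&lt;p&gt;In bash terms, phase 2 is just &lt;code&gt;grep -A2&lt;/code&gt;. A condensed sketch of both phases (the function name is ours; the real script handles more lock-file formats):&lt;/p&gt;

```shell
check_lockfile() {
  # Phase 1: definitive markers -- any hit means compromise.
  if grep -q -e 'plain-crypto-js' -e 'openclaw-qbot' "$1"; then
    echo "COMPROMISED"
    return
  fi
  # Phase 2: contextual grep -- 1.14.1 only counts within 2 lines of "axios",
  # so serve-static@1.14.1 and friends no longer trigger false positives.
  if grep -A2 '"axios"' "$1" | grep -q '1\.14\.1'; then
    echo "SUSPECT"
  else
    echo "clean"
  fi
}
```

&lt;p&gt;Two fixtures make the difference visible: a lock file pinning &lt;code&gt;serve-static@1.14.1&lt;/code&gt; comes back clean, while one pinning &lt;code&gt;axios@1.14.1&lt;/code&gt; is flagged.&lt;/p&gt;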

&lt;h2&gt;
  
  
  Anti-forensic detection
&lt;/h2&gt;

&lt;p&gt;The coolest (scariest?) detail from &lt;a href="https://www.stepsecurity.io/blog/axios-compromised-on-npm-malicious-versions-drop-remote-access-trojan" rel="noopener noreferrer"&gt;StepSecurity's analysis&lt;/a&gt;: after the malware runs, it replaces its &lt;code&gt;package.json&lt;/code&gt; with a stub that reports version &lt;code&gt;4.2.0&lt;/code&gt; instead of &lt;code&gt;4.2.1&lt;/code&gt;. Running &lt;code&gt;npm list&lt;/code&gt; post-infection shows the wrong version.&lt;/p&gt;

&lt;p&gt;Our scanner catches this by checking for the &lt;strong&gt;directory existence&lt;/strong&gt; regardless of what &lt;code&gt;package.json&lt;/code&gt; says inside.&lt;/p&gt;
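&lt;p&gt;That check is essentially one line of shell: test for the directory itself and ignore whatever the &lt;code&gt;package.json&lt;/code&gt; inside claims. A sketch, with an assumed project-path argument:&lt;/p&gt;

```shell
check_module_dir() {
  # The directory's existence is the signal; the stub package.json inside
  # lies about its version, so we never read it.
  if [ -d "$1/node_modules/plain-crypto-js" ]; then
    echo "COMPROMISED: $1"
  fi
}
```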

&lt;h2&gt;
  
  
  Compatible with everything
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;macOS Catalina → Sequoia (Intel + Apple Silicon)&lt;/li&gt;
&lt;li&gt;Linux (any distro)&lt;/li&gt;
&lt;li&gt;Bash 3.2+ (stock macOS bash)&lt;/li&gt;
&lt;li&gt;Works with or without Node.js installed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get it
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/booklib-ai/dispatch" rel="noopener noreferrer"&gt;booklib-ai/dispatch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is dispatch #001 from &lt;strong&gt;booklib-ai&lt;/strong&gt; — we'll publish same-day analysis + tools for future supply chain incidents. Star the repo if you want to stay updated.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Also from booklib-ai: &lt;a href="https://github.com/booklib-ai/skills" rel="noopener noreferrer"&gt;skills&lt;/a&gt; — plug-and-play expertise for AI coding agents — structured engineering skills distributed via npm that integrate with Claude Code, Cursor, and any MCP-compatible tool.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>npm</category>
      <category>security</category>
      <category>tooling</category>
    </item>
    <item>
      <title>What's your AI code review setup in 2026?</title>
      <dc:creator>Стас Журавель</dc:creator>
      <pubDate>Fri, 27 Mar 2026 07:06:41 +0000</pubDate>
      <link>https://dev.to/zhuravelstas/whats-your-ai-code-review-setup-in-2026-1341</link>
      <guid>https://dev.to/zhuravelstas/whats-your-ai-code-review-setup-in-2026-1341</guid>
      <description>&lt;p&gt;AI code review has exploded in the last year. Everyone has a different setup — I'm curious what's actually working.&lt;/p&gt;

&lt;p&gt;A few specific questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Do you use a single reviewer or multiple specialized ones?&lt;/strong&gt;&lt;br&gt;
Claude's built-in &lt;code&gt;pr-review-toolkit&lt;/code&gt; runs 6 parallel sub-agents (tests, types, silent failures, etc.). I've been experimenting with the opposite — one agent, one deeply focused skill per review. Different trade-offs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Do you apply the same review to every file, or do you route by context?&lt;/strong&gt;&lt;br&gt;
A Clean Code reviewer on a domain model file gives you naming feedback when the real problem is your aggregate boundary. A DDD reviewer on a utility function talks about bounded contexts when you just need cleaner variable names. How do you handle this?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. What does your AI reviewer consistently miss?&lt;/strong&gt;&lt;br&gt;
In my own benchmark, the biggest miss was a PCI violation — card data logged to stdout. The architectural reviewer caught naming issues and design patterns but had no security lens at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Pre-merge gate or architectural review — or both?&lt;/strong&gt;&lt;br&gt;
I've landed on "both, at different moments" — fast confidence-filtered review before merge, deeper book-grounded review when planning a larger refactor.&lt;/p&gt;




&lt;p&gt;I wrote up a benchmark comparing Claude's native reviewer against a routed book-based approach — &lt;a href="https://dev.to/zhuravelstas/how-i-route-ai-agents-to-the-right-code-review-context-24hl"&gt;full comparison here&lt;/a&gt; if you want the details. Curious whether others have run similar experiments or landed on different conclusions.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>codereview</category>
      <category>programming</category>
    </item>
    <item>
      <title>How I Route AI Agents to the Right Code Review Context</title>
      <dc:creator>Стас Журавель</dc:creator>
      <pubDate>Fri, 27 Mar 2026 06:44:11 +0000</pubDate>
      <link>https://dev.to/zhuravelstas/how-i-route-ai-agents-to-the-right-code-review-context-24hl</link>
      <guid>https://dev.to/zhuravelstas/how-i-route-ai-agents-to-the-right-code-review-context-24hl</guid>
      <description>&lt;p&gt;You gave your AI agent a Clean Code checklist. It reviewed your You gave Claude Code a Clean Code checklist. It reviewed your order processing service and told you to rename &lt;code&gt;proc&lt;/code&gt; to &lt;code&gt;processOrder&lt;/code&gt; and split a 22-line function into three.&lt;/p&gt;

&lt;p&gt;Meanwhile, the actual problem — your aggregate boundary is wrong and you're leaking domain logic into the API layer — went completely unnoticed.&lt;/p&gt;

&lt;p&gt;This isn't an AI failure. It's a routing failure. The agent applied the wrong lens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Context Collapse
&lt;/h2&gt;

&lt;p&gt;If you give an AI agent a broad set of review instructions, two things happen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token waste&lt;/strong&gt; — the agent reads through hundreds of lines of principles that don't apply to the file at hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong focus&lt;/strong&gt; — a Clean Code reviewer will nitpick naming on a file where the real issue is a broken domain model. A DDD reviewer will talk about bounded contexts on a utility function that just needs cleaner variable names.&lt;/p&gt;

&lt;p&gt;This is what one Hacker News commenter called context collapse: "Clean Code was written for Java in 2008. DDIA is about distributed systems at scale. If you apply the Clean Code reviewer to a 50-line Python script, you'll get pedantic nonsense about function length when the actual problem might be that the data model is wrong."&lt;/p&gt;

&lt;p&gt;The criticism is valid. The fix isn't to abandon structured review — it's to pick the right structure for the file in front of you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Approach: A Router That Picks the Reviewer
&lt;/h2&gt;

&lt;p&gt;I've been building a collection of "skills" — structured instruction sets distilled from classic software engineering books (Clean Code, DDIA, Effective Java, DDD, etc.). Each one is a focused lens that an AI agent uses during code review or code generation.&lt;/p&gt;

&lt;p&gt;The key piece is a &lt;code&gt;skill-router&lt;/code&gt;: a meta-skill that runs before any review happens. It inspects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File type and language — Kotlin? Python? Infrastructure config?&lt;/li&gt;
&lt;li&gt;Domain signals — is this a service layer? A repository? A controller?&lt;/li&gt;
&lt;li&gt;Work type — code review, refactoring, greenfield design, or bug fix?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on that, it selects the 1–2 most relevant skills and explicitly skips the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example in Practice
&lt;/h2&gt;

&lt;p&gt;User: "Review my order processing service"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Router decision:
  ✅ Primary:   domain-driven-design    — domain model design (Aggregates, Value Objects)
  ✅ Secondary: microservices-patterns  — service boundaries and inter-service communication
  ⛔ Skip:      clean-code-reviewer     — premature at design stage; apply later on implementation code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The router doesn't just pick — it explains why it skipped alternatives. That rationale is important: it makes the selection auditable, and you can override it if you disagree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use One Giant Prompt?
&lt;/h2&gt;

&lt;p&gt;You could stuff everything into one system prompt. I tried. Here's what happens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attention dilution&lt;/strong&gt; — the model tries to apply everything at once and produces shallow, generic feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conflicting advice&lt;/strong&gt; — Clean Code says "extract small functions." Some microservices patterns say "prefer cohesive, slightly larger functions over deep call stacks." The model hedges between both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token budget&lt;/strong&gt; — if you're working in Claude Code or Cursor, every token of instructions competes with your actual code context.&lt;/p&gt;

&lt;p&gt;Routing means the agent reads ~200 focused lines of instructions instead of ~2000 unfocused ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Alternative Criticism: "LLMs Already Know These Books"
&lt;/h2&gt;

&lt;p&gt;This is the most common pushback I get. And it's partially true — LLMs have read Clean Code. But they apply that knowledge inconsistently and at low confidence.&lt;/p&gt;

&lt;p&gt;Giving the model an explicit lens — "review this against Clean Code heuristics C1–C36" — concentrates attention and dramatically reduces hallucinated or off-topic feedback. It's the difference between asking someone "what do you think?" vs. "evaluate this against these specific criteria."&lt;/p&gt;

&lt;p&gt;Think of it like unit tests: the runtime can execute your code correctly without them. But tests make correctness explicit, repeatable, and auditable. Skills do the same for AI review.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Routing Actually Works
&lt;/h2&gt;

&lt;p&gt;The router skill is a structured prompt with a decision tree:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse the request — what file(s), what task&lt;/li&gt;
&lt;li&gt;Match against skill metadata — each skill declares its applicable languages, domains, and work types&lt;/li&gt;
&lt;li&gt;Rank by relevance — primary (strongest match) and secondary (complementary perspective)&lt;/li&gt;
&lt;li&gt;Conflict resolution — if two skills would give contradictory advice, prefer the one matching the higher abstraction level of the task&lt;/li&gt;
&lt;li&gt;Return selection with rationale&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There's no ML model or embedding search involved. It's structured prompting — the LLM acts as the routing engine using routing rules baked into the router's own instructions. Language signals, domain signals, and conflict resolution are all declared explicitly inside the router skill, not inferred at runtime. The trade-off: it's fast and predictable, but adding a new skill requires updating the router manually.&lt;/p&gt;
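&lt;p&gt;The router itself is a prompt, not a program, but the decision tree it encodes behaves like a case analysis over file signals. A toy bash analogy (the signal patterns and skill names here are illustrative, not the router's actual rules):&lt;/p&gt;

```shell
route_skill() {
  # Toy analogy of the declared routing rules: match file-name and
  # language signals in priority order, fall back to a general reviewer.
  case "$1" in
    *Service*|*Repository*) echo "domain-driven-design" ;;
    *.kt)                   echo "effective-kotlin" ;;
    *)                      echo "clean-code-reviewer" ;;
  esac
}
```

&lt;p&gt;The real selection is fuzzier than a glob match, of course, which is exactly why the LLM does the routing rather than a script.&lt;/p&gt;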

&lt;h2&gt;
  
  
  Levels of Review (a Pattern Worth Stealing)
&lt;/h2&gt;

&lt;p&gt;One of the most useful ideas that came from community feedback: separate your review into levels of critique:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A fast "lint" pass — formatting, obvious bugs, missing tests&lt;/li&gt;
&lt;li&gt;A domain pass — does the code correctly model the business logic?&lt;/li&gt;
&lt;li&gt;A "counterexample" pass — propose at least one concrete failing scenario and how to reproduce it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The skill library maps roughly to these levels — Clean Code for level 1, DDD for level 2 — but you have to invoke them separately with the right framing. The router picks based on what the code &lt;em&gt;is&lt;/em&gt;, not which level you're at. Explicit level-based routing isn't built yet. The counterexample pass is harder and something I'm still figuring out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The skills and the router are open source: &lt;a href="https://github.com/booklib-ai/skills" rel="noopener noreferrer"&gt;github.com/booklib-ai/skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can use them with Claude Code, Cursor, or any agent that supports SKILL.md files. The quickest way to try it — install everything and let the router decide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @booklib/skills add &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or globally, so it's available in every project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @booklib/skills add &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--global&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then just ask your agent to review a file — the router picks the right skill automatically. You don't need to know the library upfront.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark: Routed Skills vs. Native Review
&lt;/h2&gt;

&lt;p&gt;Theory is nice. Does it actually find more issues?&lt;/p&gt;

&lt;p&gt;I took a deliberately terrible 157-line Node.js order processing module — god function, SQL injection on every query, global mutable state, &lt;code&gt;eval()&lt;/code&gt; for no reason — and ran it through two pipelines in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native:&lt;/strong&gt; Claude's built-in &lt;code&gt;pr-review-toolkit:code-reviewer&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;skill-router:&lt;/strong&gt; &lt;code&gt;skill-router&lt;/code&gt; → &lt;code&gt;clean-code-reviewer&lt;/code&gt; + &lt;code&gt;design-patterns&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What the router chose
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Primary:    clean-code-reviewer  — god function, cryptic names, magic numbers
Secondary:  design-patterns      — duplicated payment blocks → Strategy pattern
Skipped:    domain-driven-design — implementation level, not model design stage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue detection
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Native&lt;/th&gt;
&lt;th&gt;skill-router&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Critical/High issues&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Important/Improvement&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suggestions&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total unique issues&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;~89% of what Claude's native reviewer found, skill-router also found. But skill-router found ~9 additional issues that the native reviewer missed entirely.&lt;/p&gt;

&lt;p&gt;A few that stood out:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;formatMoney&lt;/code&gt; has a floating-point rounding bug&lt;/strong&gt; — &lt;code&gt;0.1 + 0.2&lt;/code&gt; arithmetic, not &lt;code&gt;Math.round&lt;/code&gt;. Native didn't flag it; clean-code-reviewer caught it via the G-series heuristics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stubs always return &lt;code&gt;true&lt;/code&gt;&lt;/strong&gt; — they're lying to callers. Native missed it; clean-code-reviewer flagged it as a lying comment / false contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;skill-router surfaced 7 pattern opportunities&lt;/strong&gt; — places where a known pattern could reduce complexity (Strategy for payments, State for order lifecycle, Singleton for the broken global state). It explains the problem each one solves and suggests a fix sequence, but leaves the decision to you. Native produced no architectural guidance at all.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Where each approach wins
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pre-merge PR review, security audit&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Native&lt;/strong&gt; — pre-merge gate: fast, confidence-filtered, adapts to your CLAUDE.md project conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Larger refactor, architecture planning&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;skill-router&lt;/strong&gt; — patterns, principles, refactor roadmap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Both together&lt;/td&gt;
&lt;td&gt;~95% total issue coverage vs. ~80% for either alone&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;One honest loss for skill-router:&lt;/strong&gt; Card data was being logged to stdout — a clear PCI violation. Claude's built-in reviewer flagged it at 92% confidence. skill-router didn't. Security compliance isn't in any book-based skill's scope, and the router has no way to know it should care. If compliance is the priority, the native reviewer is the right tool.&lt;/p&gt;

&lt;p&gt;After looking closely at how both tools are built, the difference in purpose becomes clear.&lt;/p&gt;

&lt;p&gt;The native reviewer runs &lt;strong&gt;6 parallel sub-agents&lt;/strong&gt;, each focused on one category: code quality, silent failures, type design, test coverage, comment accuracy, and security. It defaults to reviewing only the current &lt;code&gt;git diff&lt;/code&gt; — not the whole file. Before starting, it reads your &lt;code&gt;CLAUDE.md&lt;/code&gt; to pick up project conventions. And it discards any finding below 80% confidence, so output arrives pre-filtered. That's a purpose-built pre-merge gate: narrow scope, parallel specialists, high signal-to-noise.&lt;/p&gt;

&lt;p&gt;skill-router does the opposite: one agent, one deeply focused skill, applied to the whole module. It trades breadth and speed for depth and principle grounding.&lt;/p&gt;

&lt;p&gt;They target different moments in the development lifecycle, which is why using both gives ~95% coverage.&lt;/p&gt;

&lt;p&gt;One gap this benchmark exposed was the noise filtering: Claude's native reviewer discards anything below 80% confidence; skill-router had no equivalent. Since writing this, the router has been updated to instruct selected skills to classify every finding as HIGH / MEDIUM / LOW and skip LOW-tier findings on standard reviews — same idea, book-grounded framing instead of a confidence score.&lt;/p&gt;

&lt;p&gt;The full before/after code and comparison report are in the repo under &lt;a href="https://github.com/booklib-ai/skills/tree/main/benchmark" rel="noopener noreferrer"&gt;&lt;code&gt;/benchmark/&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;p&gt;I don't have everything figured out. A few things I'm still exploring:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sub-agent architecture&lt;/strong&gt; — the native pr-review-toolkit runs 6 parallel sub-agents (tests, types, silent failures, comments, etc.), each a focused specialist. skill-router takes the opposite approach: one agent, one focused skill, narrow scope. Both work, but for different reasons. The open question is whether a &lt;em&gt;generate-then-evaluate&lt;/em&gt; loop — one agent produces code using a skill's patterns, a second agent checks it against the same skill's rubric — would catch more issues than a single-pass review. My current answer is no for code review, maybe for code generation. If you've tried this pattern, I'd like to know what you found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feedback loops&lt;/strong&gt; — the benchmark above is one data point. How do you systematically measure whether routing improves review quality across different codebases and languages?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain-specific routing&lt;/strong&gt; — healthcare code, fintech code, and game code each have very different "what matters most" priorities. Should routing consider the project domain, not just the file?&lt;/p&gt;

&lt;p&gt;If you've been working on similar problems — structured AI review, skill selection, multi-agent evaluation — I'd love to hear what's working for you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Currently covering: Clean Code, Domain-Driven Design, Effective Java, Effective Kotlin, Microservices Patterns, System Design Interview, Storytelling with Data, and more. Skills are community-contributed and new books are welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>codereview</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
