<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aleksandr Ishchenko</title>
    <description>The latest articles on DEV Community by Aleksandr Ishchenko (@aquarvin).</description>
    <link>https://dev.to/aquarvin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1434415%2F992f07c4-3392-45b0-ae72-c309b3ad019a.jpeg</url>
      <title>DEV Community: Aleksandr Ishchenko</title>
      <link>https://dev.to/aquarvin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aquarvin"/>
    <language>en</language>
    <item>
      <title>Building a RAG-Powered Code Reviewer That Actually Understands Your Codebase</title>
      <dc:creator>Aleksandr Ishchenko</dc:creator>
      <pubDate>Mon, 15 Jun 2026 07:06:06 +0000</pubDate>
      <link>https://dev.to/aquarvin/building-a-rag-powered-code-reviewer-that-actually-understands-your-codebase-4c82</link>
      <guid>https://dev.to/aquarvin/building-a-rag-powered-code-reviewer-that-actually-understands-your-codebase-4c82</guid>
      <description>&lt;p&gt;Most AI code review tools give you generic advice. "Add type hints." "Handle exceptions." Useful, sure — but the same advice you'd get from any linter or a quick ChatGPT prompt.&lt;/p&gt;

&lt;p&gt;What if your AI reviewer could say: &lt;em&gt;"Add type hints to the constructor — consistent with how &lt;code&gt;OrderProcessor.php&lt;/code&gt; and &lt;code&gt;OrderPlaceAfter.php&lt;/code&gt; already do it in your project"&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;That's the difference between a generic AI tool and one that &lt;strong&gt;understands your codebase&lt;/strong&gt;. I built the latter. Here's how.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;I've been a Magento/PHP developer for 12+ years. Magento has complex architectural patterns: plugins, observers, dependency injection via XML, and area-scoped configuration. Tools like CodeRabbit and GitHub Copilot are getting better at repository-wide context (Copilot indexes your workspace, CodeRabbit reads related files). But when they review Magento code, they still treat Magento XML configurations as static text files rather than as active dependency injection and event routing maps.&lt;/p&gt;

&lt;p&gt;They don't inherently know that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A class registered as a &lt;code&gt;&amp;lt;plugin&amp;gt;&lt;/code&gt; in &lt;code&gt;di.xml&lt;/code&gt; must implement at least one interceptor method, named &lt;code&gt;before&lt;/code&gt;, &lt;code&gt;after&lt;/code&gt;, or &lt;code&gt;around&lt;/code&gt; followed by the name of a public method on the target class&lt;/li&gt;
&lt;li&gt;An observer for &lt;code&gt;sales_order_place_after&lt;/code&gt; must implement &lt;code&gt;Magento\Framework\Event\ObserverInterface&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Direct SQL queries bypass Magento's entire database abstraction layer — repositories, resource models, caching, plugins, observers, and business logic like MSI or event-driven extensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted a tool that understands these things. Not because it was trained on Magento docs, but because it &lt;strong&gt;read my actual project&lt;/strong&gt; and understands the patterns we follow.&lt;/p&gt;

&lt;h2&gt;The Approach: RAG for Code&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval Augmented Generation) is a pattern where you search for relevant context before sending a query to an LLM. For code review, this means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Index the codebase&lt;/strong&gt; — parse every PHP file into functions/methods/classes, generate embeddings, store in a vector database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enrich with framework knowledge&lt;/strong&gt; — parse &lt;code&gt;di.xml&lt;/code&gt;, &lt;code&gt;events.xml&lt;/code&gt;, &lt;code&gt;module.xml&lt;/code&gt; to understand how classes are registered (as plugins, observers, preferences)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve context&lt;/strong&gt; — when reviewing a file, search for similar code, related classes, and architectural patterns from the indexed codebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate review&lt;/strong&gt; — send the file + retrieved context to the LLM&lt;/li&gt;
&lt;/ol&gt;
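
&lt;p&gt;To make step 4 concrete, here is a minimal sketch of how a file and its retrieved context might be combined into one prompt. The function name &lt;code&gt;build_review_prompt&lt;/code&gt;, the chunk dictionary shape, the prompt wording, and the &lt;code&gt;OrderExporter&lt;/code&gt; sample are all assumptions made for illustration, not the project's actual code.&lt;/p&gt;

```python
def build_review_prompt(file_path: str, file_code: str, context_chunks: list[dict]) -> str:
    """Combine retrieved project context and the reviewed file into one prompt."""
    parts = [
        "You are reviewing a PHP file from a Magento project.",
        "Use the project context below to point out inconsistencies with existing code.",
        "",
    ]
    # Retrieved chunks go first so the model sees project conventions before the file.
    for number, chunk in enumerate(context_chunks, start=1):
        parts.append(f"--- Context {number}: {chunk['path']} ---")
        parts.append(chunk["code"])
        parts.append("")
    parts.append(f"--- File under review: {file_path} ---")
    parts.append(file_code)
    return "\n".join(parts)


# Hypothetical sample data: one retrieved chunk and one file to review.
prompt = build_review_prompt(
    "Model/OrderExporter.php",
    "class OrderExporter { public function __construct($repo) {} }",
    [
        {
            "path": "Model/OrderProcessor.php",
            "code": "class OrderProcessor { public function __construct(OrderRepositoryInterface $repo) {} }",
        }
    ],
)
print(prompt)
```

&lt;p&gt;Labelling each chunk with its file path is what lets the model cite concrete files in its review instead of giving generic advice.&lt;/p&gt;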

&lt;p&gt;The key insight: the LLM doesn't need to be trained on Magento. It just needs to &lt;strong&gt;see how your project does things&lt;/strong&gt;, and it can spot inconsistencies.&lt;/p&gt;

&lt;h3&gt;Why RAG and not just a huge context window?&lt;/h3&gt;

&lt;p&gt;Fair question. Gemini offers a 1-2M token context window. Why not dump the entire module in there?&lt;/p&gt;

&lt;p&gt;For a single small module (20-50 KB) — you absolutely could. But this approach is designed for a different scale: a real Magento project with &lt;code&gt;vendor/magento/&lt;/code&gt; (200+ core modules) plus &lt;code&gt;app/code/&lt;/code&gt; (dozens of custom modules). That's megabytes of code. Sending all of it into context for every review is slow, expensive, and — as my experiments showed — counterproductive.&lt;/p&gt;

&lt;p&gt;When I tested RAG with irrelevant context (a "customer hobby" module as context for an "order processing" review), the results were &lt;strong&gt;worse&lt;/strong&gt; than simple mode. The LLM got confused by unrelated code. RAG gives you &lt;strong&gt;surgical, relevant&lt;/strong&gt; context — 5 precisely selected code chunks instead of 500 files where 490 are noise.&lt;/p&gt;

&lt;p&gt;That said, for small projects, long-context is simpler and may be sufficient. RAG pays off at scale.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;The system has three layers beneath the CLI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────┐
│  CLI: review / index / search   │
├─────────────────────────────────┤
│  Agent Layer                    │
│  ├── RAG Review Service         │
│  ├── Search Service             │
│  └── Review Service (simple)    │
├─────────────────────────────────┤
│  Core Engine                    │
│  ├── LLM Providers (Gemini...)  │
│  ├── Embedding (BGE-small)      │
│  └── PHP Parser (tree-sitter)   │
├─────────────────────────────────┤
│  Framework Layer                │
│  ├── Magento Config Parser      │
│  └── Module Indexer             │
└─────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why layers matter:&lt;/strong&gt; The core engine knows nothing about PHP or Magento. The PHP parser knows nothing about Magento. Only the framework layer has Magento-specific knowledge. Adding Symfony or Laravel support should reuse roughly 80% of the codebase.&lt;/p&gt;

&lt;p&gt;LLM providers are behind an abstraction — switching from Gemini to Claude is one environment variable. Same for embeddings. This was a deliberate architectural choice: provider-agnostic from day one.&lt;/p&gt;
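
&lt;p&gt;One way such a switch could work is a small registry keyed by an environment variable. This is a sketch under assumptions: the variable name &lt;code&gt;LLM_PROVIDER&lt;/code&gt;, the class names, and the &lt;code&gt;complete&lt;/code&gt; method are hypothetical, and the provider bodies are stubs rather than real API calls.&lt;/p&gt;

```python
import os


class GeminiProvider:
    name = "gemini"

    def complete(self, prompt: str) -> str:
        # A real implementation would call the Gemini API here.
        raise NotImplementedError


class ClaudeProvider:
    name = "claude"

    def complete(self, prompt: str) -> str:
        # A real implementation would call the Claude API here.
        raise NotImplementedError


# Hypothetical registry: adding a provider means adding one entry.
PROVIDERS = {"gemini": GeminiProvider, "claude": ClaudeProvider}


def get_provider(env=None):
    """Pick a provider from the (assumed) LLM_PROVIDER variable, defaulting to Gemini."""
    env = os.environ if env is None else env
    key = env.get("LLM_PROVIDER", "gemini").lower()
    if key not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {key!r}")
    return PROVIDERS[key]()


print(get_provider({"LLM_PROVIDER": "claude"}).name)
```

&lt;p&gt;Because callers only depend on &lt;code&gt;complete&lt;/code&gt;, the review services never need to know which vendor is behind it.&lt;/p&gt;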

&lt;h2&gt;How Indexing Works&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;mage-audit index ./my-module&lt;/code&gt;, the system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Parses PHP files with tree-sitter&lt;/strong&gt; — not regex, not string splitting. Tree-sitter builds an AST (Abstract Syntax Tree), so we extract classes, methods, and functions at their logical boundaries. A 500-line file becomes 15-20 separate chunks, each a complete logical unit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Parses Magento XML configs as structured data&lt;/strong&gt; — &lt;code&gt;di.xml&lt;/code&gt; becomes a map of plugins (with target classes, sort orders, disabled flags) and preferences (interface-to-implementation mappings). &lt;code&gt;events.xml&lt;/code&gt; becomes a list of observers with event names, handler classes, and methods. This isn't text parsing — it's structured extraction that understands what these configs &lt;em&gt;mean&lt;/em&gt; in the Magento DI/event system.&lt;/p&gt;
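
&lt;p&gt;As an illustration of this kind of structured extraction, the sketch below turns a &lt;code&gt;di.xml&lt;/code&gt; tree into a plugin list and a preference map using Python's standard &lt;code&gt;xml.etree.ElementTree&lt;/code&gt;. The function name, the output shape, and the &lt;code&gt;Vendor\Audit&lt;/code&gt; classes are assumptions for the example; the sample tree is built in memory, whereas a real file would be loaded with &lt;code&gt;ET.parse(path).getroot()&lt;/code&gt;.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def extract_di_config(root: ET.Element) -> dict:
    """Collect plugins and preferences from a parsed di.xml root element."""
    plugins = []
    for type_node in root.iter("type"):
        target = type_node.get("name")
        for plugin in type_node.findall("plugin"):
            plugins.append({
                "target": target,
                "name": plugin.get("name"),
                "class": plugin.get("type"),
                # Defaulting a missing sortOrder to 0 is a choice made for this sketch.
                "sort_order": int(plugin.get("sortOrder", "0")),
                "disabled": plugin.get("disabled", "false") == "true",
            })
    # Map each interface to the implementation configured for it.
    preferences = {node.get("for"): node.get("type") for node in root.iter("preference")}
    return {"plugins": plugins, "preferences": preferences}


# Hypothetical di.xml content, built in memory for demonstration.
root = ET.Element("config")
type_node = ET.SubElement(root, "type", {"name": "Magento\\Sales\\Api\\OrderRepositoryInterface"})
ET.SubElement(type_node, "plugin", {
    "name": "audit_order_save",
    "type": "Vendor\\Audit\\Plugin\\OrderSave",
    "sortOrder": "10",
})
ET.SubElement(root, "preference", {
    "for": "Vendor\\Audit\\Api\\LoggerInterface",
    "type": "Vendor\\Audit\\Model\\Logger",
})

di_config = extract_di_config(root)
print(di_config["plugins"][0]["target"])
```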

&lt;p&gt;&lt;strong&gt;3. Enriches chunks with framework context&lt;/strong&gt; — if a class is registered as a plugin for &lt;code&gt;OrderRepositoryInterface&lt;/code&gt;, the chunk gets tagged: &lt;code&gt;[PLUGIN for Magento\Sales\Api\OrderRepositoryInterface]&lt;/code&gt;. This tag is included in the embedding text, so searching for "plugin for order save" directly matches it.&lt;/p&gt;
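
&lt;p&gt;A minimal sketch of the tagging idea, assuming plugin registrations are available as dictionaries with &lt;code&gt;target&lt;/code&gt;, &lt;code&gt;class&lt;/code&gt;, and &lt;code&gt;disabled&lt;/code&gt; keys. The function name &lt;code&gt;build_embedding_text&lt;/code&gt; and the &lt;code&gt;Vendor\Audit\Plugin\OrderSave&lt;/code&gt; class are hypothetical; only the tag format comes from the description above.&lt;/p&gt;

```python
def build_embedding_text(class_name: str, chunk_code: str, plugins: list[dict]) -> str:
    """Prefix a code chunk with framework tags before it is embedded."""
    tags = [
        f"[PLUGIN for {plugin['target']}]"
        for plugin in plugins
        if plugin["class"] == class_name and not plugin.get("disabled", False)
    ]
    # Tags come first so they become part of the text the embedding model sees.
    return "\n".join(tags + [chunk_code])


# Hypothetical registration and chunk.
plugins = [{
    "target": "Magento\\Sales\\Api\\OrderRepositoryInterface",
    "class": "Vendor\\Audit\\Plugin\\OrderSave",
    "disabled": False,
}]
embedding_text = build_embedding_text(
    "Vendor\\Audit\\Plugin\\OrderSave",
    "public function afterSave($subject, $result) { return $result; }",
    plugins,
)
print(embedding_text)
```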

&lt;p&gt;&lt;strong&gt;4. Generates embeddings and stores in pgvector&lt;/strong&gt; — each chunk becomes a 384-dimensional vector via BGE-small. This is an intentional trade-off: BGE-small is a general-purpose text embedding model, not a code-specific one. Models like &lt;code&gt;jina-embeddings-v2-base-code&lt;/code&gt; or &lt;code&gt;voyage-code-3&lt;/code&gt; would likely perform better on code search. But BGE-small runs locally on CPU with zero API cost — critical for a zero-budget MVP. The architecture supports swapping models with one config change, so upgrading is trivial when budget allows.&lt;/p&gt;

&lt;h2&gt;Multi-Strategy Retrieval&lt;/h2&gt;

&lt;p&gt;Simple RAG uses one search query. We use three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code similarity&lt;/strong&gt; — find chunks with similar code structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name matching&lt;/strong&gt; — extract class/method names from the reviewed file, search for them directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency matching&lt;/strong&gt; — extract &lt;code&gt;use&lt;/code&gt; statements, find code that uses the same interfaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results are deduplicated and sorted by similarity. This catches context that a single strategy would miss — a class name search finds the exact interface implementation, while a code similarity search finds structurally similar methods.&lt;/p&gt;
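
&lt;p&gt;The merge step can be sketched as follows. The &lt;code&gt;Hit&lt;/code&gt; type, the chunk identifiers, the scores, and the default limit of 5 are assumptions for illustration; the sketch keeps the best-scoring hit per chunk and then sorts by similarity.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    similarity: float
    strategy: str


def merge_hits(*result_lists, limit: int = 5) -> list:
    """Deduplicate hits from several strategies and rank them by similarity."""
    best = {}
    for hits in result_lists:
        for hit in hits:
            current = best.get(hit.chunk_id)
            # Keep only the highest-scoring occurrence of each chunk.
            if current is None or hit.similarity > current.similarity:
                best[hit.chunk_id] = hit
    ranked = sorted(best.values(), key=lambda hit: hit.similarity, reverse=True)
    return ranked[:limit]


# Hypothetical results from the three strategies.
code_hits = [Hit("OrderProcessor::process", 0.82, "code"), Hit("OrderPlaceAfter::execute", 0.74, "code")]
name_hits = [Hit("OrderProcessor::process", 0.91, "name")]
dependency_hits = [Hit("OrderSave::afterSave", 0.67, "dependency")]

merged = merge_hits(code_hits, name_hits, dependency_hits)
for hit in merged:
    print(hit.chunk_id, hit.similarity, hit.strategy)
```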

&lt;h2&gt;The Results&lt;/h2&gt;

&lt;p&gt;I tested on a PHP file with 13 known issues (SQL injection, architecture violations, missing error handling, etc.) — issues I identified manually as a senior Magento developer. This hand-curated list serves as ground truth for evaluation.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Simple Mode&lt;/th&gt;
&lt;th&gt;RAG Mode&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Issues detected (recall)&lt;/td&gt;
&lt;td&gt;54% (7/13)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;69% (9/13)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project-specific references&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input tokens&lt;/td&gt;
&lt;td&gt;637&lt;/td&gt;
&lt;td&gt;1,495&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;RAG mode's recall was 15 percentage points higher: 9 of 13 issues found versus 7.&lt;/strong&gt; But the real difference is qualitative.&lt;/p&gt;
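
&lt;p&gt;For reference, the recall figures follow directly from the counts in the table. A quick check of the arithmetic (the helper name &lt;code&gt;recall&lt;/code&gt; is just for this example):&lt;/p&gt;

```python
def recall(found: int, total: int) -> float:
    """Share of known issues that the reviewer reported."""
    return found / total


simple_recall = recall(7, 13)
rag_recall = recall(9, 13)
gain_pp = (rag_recall - simple_recall) * 100

print(f"simple: {simple_recall:.1%}")   # simple: 53.8%
print(f"rag:    {rag_recall:.1%}")      # rag:    69.2%
print(f"gain:   {gain_pp:.1f} pp")      # gain:   15.4 pp
```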

&lt;p&gt;Simple mode says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add type hints to constructor parameters."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;RAG mode says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add type hints to constructor parameters — &lt;strong&gt;consistent with &lt;code&gt;Model\OrderProcessor.php&lt;/code&gt; and &lt;code&gt;Observer\OrderPlaceAfter.php&lt;/code&gt;&lt;/strong&gt; from your project."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;RAG mode also found issues that Simple mode missed entirely — like the method always returning &lt;code&gt;true&lt;/code&gt; regardless of outcome, or the lack of error handling around repository calls. These are the kinds of issues that only become visible when you see how the &lt;em&gt;rest of the project&lt;/em&gt; handles similar cases.&lt;/p&gt;

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Context relevance is everything.&lt;/strong&gt; When I tested RAG with an unrelated module as context (a "customer hobby" module for an "order processing" review), the results were worse than simple mode. The LLM got confused by irrelevant code. When the context was from a related module (also about orders), results improved dramatically. This confirms: retrieval quality determines review quality. Garbage context in → garbage review out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. LLMs don't return clean JSON.&lt;/strong&gt; Even with explicit instructions to "return ONLY valid JSON", Gemini would add markdown fences, inconsistently escape backslashes in PHP namespaces (the &lt;code&gt;\M&lt;/code&gt; and &lt;code&gt;\S&lt;/code&gt; in &lt;code&gt;\Magento\Sales&lt;/code&gt; are not valid JSON escape sequences), and sometimes swap field values (putting a category in the severity field). I built a character-by-character JSON fixer and fallback field mapping for misplaced values.&lt;/p&gt;

&lt;p&gt;A note on this: Gemini does offer &lt;code&gt;response_schema&lt;/code&gt; (Structured Outputs) that enforces valid JSON at the token generation level. I chose prompt-based JSON instead for a specific reason — provider agnosticism. The same prompt works with Gemini, Claude, and OpenAI without changes. &lt;code&gt;response_schema&lt;/code&gt; is a Gemini-specific API. For a production system targeting one provider, Structured Outputs would reduce parsing issues. For a multi-provider architecture, defensive parsing is the more portable approach.&lt;/p&gt;
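
&lt;p&gt;To show what defensive parsing can look like, here is a simplified sketch. It is a regex-based stand-in, not the character-by-character fixer described above, and the function names are assumptions. It strips markdown fences, tries a normal parse, and only then doubles backslashes that do not start a valid JSON escape.&lt;/p&gt;

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
VALID_ESCAPES = set('"\\/bfnrtu')


def strip_markdown_fences(text: str) -> str:
    """Remove a leading and trailing markdown fence line, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def fix_invalid_escapes(text: str) -> str:
    """Double any backslash that does not begin a valid JSON escape sequence."""
    def repair(match):
        following = match.group(1)
        if following in VALID_ESCAPES:
            return match.group(0)
        return "\\\\" + following
    # Matching the backslash together with the next character keeps
    # already-escaped pairs intact.
    return re.sub(r"\\(.)", repair, text, flags=re.DOTALL)


def parse_llm_json(raw: str):
    cleaned = strip_markdown_fences(raw)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return json.loads(fix_invalid_escapes(cleaned))


# Hypothetical model output: fenced, with unescaped namespace backslashes.
raw_reply = FENCE + "json\n" + '{"class": "\\Magento\\Sales\\Model\\Order", "severity": "high"}' + "\n" + FENCE
review = parse_llm_json(raw_reply)
print(review["class"])
```

&lt;p&gt;This heuristic has a known blind spot: a namespace segment that starts with a letter such as &lt;code&gt;n&lt;/code&gt; or &lt;code&gt;t&lt;/code&gt; looks like a valid escape and is left alone, which is one reason a more careful fixer is worth building.&lt;/p&gt;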

&lt;p&gt;&lt;strong&gt;3. Embeddings find similar code, not bugs.&lt;/strong&gt; Searching for "SQL injection vulnerability" returned the actual vulnerable function as the 3rd result, not the 1st. Embeddings measure text similarity; they don't perform security analysis. That's why you need both retrieval (to find relevant code) and generation (to analyze it with an LLM). Each alone is weak; together they're strong. Using code-specific embedding models (Voyage Code, Jina Code) instead of the general-purpose BGE-small would likely improve retrieval quality; that's a planned upgrade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Free tiers are enough for building.&lt;/strong&gt; The entire project runs on Google Gemini free tier (1,500 requests/day), local embeddings (BGE-small, no API cost), and PostgreSQL + pgvector in Docker. Total spend: $0. You don't need an API budget to build serious AI applications — but you do need architecture that lets you upgrade when budget appears.&lt;/p&gt;

&lt;h2&gt;Tech Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.12, FastAPI, Typer + Rich (CLI)&lt;/li&gt;
&lt;li&gt;PostgreSQL 16 + pgvector (vector storage with HNSW index)&lt;/li&gt;
&lt;li&gt;tree-sitter (AST-based PHP parsing)&lt;/li&gt;
&lt;li&gt;sentence-transformers / BGE-small (local embeddings, swappable)&lt;/li&gt;
&lt;li&gt;Google Gemini (LLM, swappable via provider abstraction)&lt;/li&gt;
&lt;li&gt;GitHub Actions (CI)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code-specific embeddings&lt;/strong&gt; — replacing BGE-small with Voyage Code or Jina Code for better retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid search&lt;/strong&gt; — combining BM25 (keyword) with embedding (semantic) search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Outputs&lt;/strong&gt; — comparing Gemini's &lt;code&gt;response_schema&lt;/code&gt; vs prompt-based JSON&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub App integration&lt;/strong&gt; — auto-review on pull requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent architecture&lt;/strong&gt; — separate agents for security, performance, and architecture analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is open source: &lt;a href="https://github.com/Aquarvin/mage-audit" rel="noopener noreferrer"&gt;github.com/Aquarvin/mage-audit&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm a senior PHP/Magento developer transitioning into AI Engineering. This project is both a real tool and a learning vehicle — every architectural decision, experiment, and dead end is documented in the repo. If you're hiring AI Engineers or interested in discussing RAG for code analysis — reach out on &lt;a href="https://www.linkedin.com/in/aquar/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://t.me/aquarvin" rel="noopener noreferrer"&gt;Telegram&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>php</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
