<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhiram C Divakaran</title>
    <description>The latest articles on DEV Community by Abhiram C Divakaran (@abhiramcdivakaran).</description>
    <link>https://dev.to/abhiramcdivakaran</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874789%2F66d70fc6-42eb-4742-ae22-85c71c351a1d.jpeg</url>
      <title>DEV Community: Abhiram C Divakaran</title>
      <link>https://dev.to/abhiramcdivakaran</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhiramcdivakaran"/>
    <language>en</language>
    <item>
      <title>Building a Code Review Agent That Learns From Every Decision</title>
      <dc:creator>Abhiram C Divakaran</dc:creator>
      <pubDate>Sun, 12 Apr 2026 11:20:55 +0000</pubDate>
      <link>https://dev.to/abhiramcdivakaran/building-a-code-review-agent-that-learns-from-every-decision-5a4c</link>
      <guid>https://dev.to/abhiramcdivakaran/building-a-code-review-agent-that-learns-from-every-decision-5a4c</guid>
      <description>&lt;p&gt;Most AI-powered developer tools share a fundamental limitation: they reset to zero after every interaction. Close the tab, and the system forgets everything—your preferences, your team’s standards, and the context behind past decisions.&lt;br&gt;
I wanted the opposite.&lt;br&gt;
Instead of a stateless reviewer, I set out to build a code review agent that adapts over time—one that pays attention to which suggestions developers accept and which they reject, and gradually aligns itself with how a team actually works.&lt;br&gt;
The result is a review system that evolves. After a handful of pull requests, it stops behaving like a generic linter and starts resembling a teammate who understands your codebase and your norms.&lt;br&gt;
System Overview&lt;br&gt;
At a high level, the agent sits in front of pull requests and executes a tight feedback loop:&lt;br&gt;
Recall — Retrieve past review patterns and team conventions&lt;br&gt;
Review — Analyze the current diff and generate structured feedback&lt;br&gt;
Retain — Store developer decisions to refine future behavior&lt;br&gt;
A developer opens a PR, triggers the review, and receives annotated feedback. Each comment can be accepted or rejected, and that signal feeds directly back into the system.&lt;br&gt;
The interface is intentionally simple:&lt;br&gt;
Left: PR metadata and file list&lt;br&gt;
Center: syntax-highlighted diff&lt;br&gt;
Right: structured review comments with actions&lt;br&gt;
Each comment includes severity, location, category, and—when applicable—a suggested fix.&lt;br&gt;
The key is what happens after interaction: repeated rejection of a specific suggestion type (e.g., stylistic nitpicks) suppresses it in future reviews. The system adapts without explicit configuration.&lt;br&gt;
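That suppression loop can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the `suppressed_categories`/`filter_comments` names and the threshold of three rejections are assumptions.&lt;br&gt;

```python
from collections import Counter

REJECTION_THRESHOLD = 3  # assumed cutoff; tune per team

def suppressed_categories(feedback_log):
    """Given (category, action) pairs, return the categories rejected
    often enough that future comments of that type should be muted."""
    rejections = Counter(cat for cat, action in feedback_log if action == "rejected")
    return {cat for cat, n in rejections.items() if n >= REJECTION_THRESHOLD}

def filter_comments(comments, feedback_log):
    """Drop generated comments whose category the team keeps rejecting."""
    muted = suppressed_categories(feedback_log)
    return [c for c in comments if c["category"] not in muted]
```

A category rejected three or more times disappears from subsequent reviews; accepted categories are untouched, so the filter tightens only where the team pushes back.&lt;br&gt;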
Memory as a First-Class Primitive&lt;br&gt;
The most interesting part of the system isn’t the model—it’s the memory layer.&lt;br&gt;
Instead of treating each review as an isolated task, the agent uses two primitives:&lt;br&gt;
retain() — persist feedback decisions&lt;br&gt;
recall() — retrieve relevant historical patterns&lt;br&gt;
Retaining Feedback&lt;br&gt;
Each developer action is stored as a simple, human-readable record:&lt;br&gt;
&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;async def retain_feedback(repo: str, pr_id: str, comment: str, file: str, action: str):
    payload = {
        "collection": f"reviews:{repo}",
        "content": f"PR #{pr_id} | File: {file} | Comment: {comment} | Developer {action} this suggestion.",
        "metadata": {"pr_id": pr_id, "file": file, "action": action}
    }
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;
Notably, the system avoids rigid schemas. Instead of structured JSON objects, it stores plain language summaries.&lt;br&gt;
Recalling Context&lt;br&gt;
When a new review starts, the system retrieves patterns:&lt;br&gt;
&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;async def recall_context(repo: str) -&amp;gt; dict:
    ...
    return {"past_patterns": past_patterns or "No past patterns yet."}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;
These patterns are injected directly into the model prompt.&lt;br&gt;
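Injection can be as simple as prefixing the recalled history onto the review instructions. The `build_review_prompt` name and the template wording below are illustrative, not the system's actual prompt.&lt;br&gt;

```python
def build_review_prompt(diff_chunk, past_patterns):
    """Prepend recalled team history to the review instructions so the
    model can weigh prior Accept/Reject decisions. Wording is illustrative."""
    return (
        "You are a code reviewer. Past team decisions:\n"
        f"{past_patterns}\n\n"
        "Review the following diff and return structured JSON comments:\n"
        f"{diff_chunk}"
    )
```

Because the recalled patterns are already plain sentences, they drop straight into the prompt with no serialization step in between.&lt;br&gt;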
Why Plain Text Wins&lt;br&gt;
This design choice turned out to be critical.&lt;br&gt;
LLMs don’t need structured records—they need interpretable context. A sentence like “Developer rejected this suggestion” is immediately useful without parsing overhead. It aligns naturally with how the model reasons.&lt;br&gt;
The Review Pipeline&lt;br&gt;
The backend is a lightweight service built around three endpoints:&lt;br&gt;
GET /prs — fetch PR data&lt;br&gt;
POST /review — execute the full review pipeline&lt;br&gt;
POST /feedback — record Accept/Reject decisions&lt;br&gt;
The core flow lives inside the review endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@app.post("/review")
async def review_pr(request: ReviewRequest):
    memory = await recall_context(request.repo)
    chunks = parse_diff(request.diff)
    comments = await generate_review(...)
    return {"comments": comments, "memory_used": memory}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;
Diff Parsing&lt;br&gt;
Diffs are split into file-level chunks, each annotated with additions and deletions. This improves the model’s ability to anchor feedback to specific lines.&lt;br&gt;
Edge cases are unavoidable—malformed diffs, missing headers, unusual filenames—so a fallback treats the entire diff as a single block when needed. Not elegant, but robust.&lt;br&gt;
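The splitting plus fallback can be sketched as follows. The body of `parse_diff` is assumed (the article does not show it), and only git-style `diff --git` headers are handled here.&lt;br&gt;

```python
import re

def parse_diff(diff_text):
    """Split a unified diff into per-file chunks; fall back to a single
    block when no file headers are found (malformed or unusual diffs)."""
    # "diff --git" marks the start of each file's section in git-style diffs.
    starts = [m.start() for m in re.finditer(r"^diff --git ", diff_text, re.M)]
    if not starts:
        # Fallback: treat the whole diff as one chunk. Not elegant, but robust.
        return [{"file": "(whole diff)", "patch": diff_text}]
    bounds = starts + [len(diff_text)]
    chunks = []
    for start, end in zip(bounds, bounds[1:]):
        patch = diff_text[start:end]
        header = patch.splitlines()[0]       # e.g. "diff --git a/x.py b/x.py"
        fname = header.split(" b/")[-1]      # take the post-change filename
        chunks.append({"file": fname, "patch": patch})
    return chunks
```

Filenames containing " b/" would confuse the naive header split, which is exactly the kind of edge case that pushes real parsers toward the defensive fallback.&lt;br&gt;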
Model Output&lt;br&gt;
The model is instructed to return strictly structured JSON:&lt;br&gt;
file&lt;br&gt;
line number&lt;br&gt;
severity&lt;br&gt;
category&lt;br&gt;
comment&lt;br&gt;
optional suggestion&lt;br&gt;
A defensive fallback wraps malformed responses into a valid structure when parsing fails—a necessity during early iterations.&lt;br&gt;
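A defensive parse along those lines might look like this; `parse_review_output` and the exact shape of the wrapped fallback comment are assumptions.&lt;br&gt;

```python
import json

def parse_review_output(raw):
    """Parse the model's JSON; wrap anything malformed into a single
    valid comment so the UI never receives an unusable response."""
    try:
        comments = json.loads(raw)
        if isinstance(comments, list):
            return comments
    except json.JSONDecodeError:
        pass
    # Fallback: surface the raw text as one low-severity comment.
    return [{
        "file": None,
        "line": None,
        "severity": "info",
        "category": "general",
        "comment": raw.strip(),
    }]
```

The caller always receives a list of valid comment objects, so a single bad model response degrades to one odd-looking comment instead of a broken review.&lt;br&gt;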
Example Output&lt;br&gt;
On a PR introducing an authentication endpoint, the agent produced:&lt;br&gt;
Critical (security) — direct SQL string interpolation → injection risk&lt;br&gt;
Critical (security) — MD5 used for hashing → insecure&lt;br&gt;
Warning (bug) — database connection not closed&lt;br&gt;
Praise (documentation) — clear and helpful docstring&lt;br&gt;
The inclusion of positive feedback is intentional. Purely negative reviews are easy to ignore; balanced feedback increases engagement and trust.&lt;br&gt;
What Actually Matters&lt;br&gt;
Several lessons emerged during development:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Memory Is the Differentiator
The first review is average. The tenth is meaningfully better.
The Accept/Reject loop isn’t a feature—it’s the mechanism that makes the system improve. Without it, you’re just building another static reviewer.&lt;/li&gt;
&lt;li&gt;Human-Readable Context Outperforms Structured Data
For LLM-driven systems, readability beats schema design.
Storing feedback as natural language eliminates translation layers and lets the model reason directly over prior decisions.&lt;/li&gt;
&lt;li&gt;Diff Handling Is Non-Trivial
Unified diffs contain numerous edge cases. Any production system needs defensive parsing and sensible fallbacks.&lt;/li&gt;
&lt;li&gt;Latency Shapes UX
End-to-end response time sits around 2–3 seconds. That’s fast enough to feel interactive, which is essential for developer adoption.&lt;/li&gt;
&lt;li&gt;Build for Offline and Demo Scenarios
Both the memory layer and model calls include fallbacks:
Default team standards when memory is unavailable
Mock review responses when APIs are not configured
This made development smoother and ensured the system works even without external dependencies.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Where This Goes Next&lt;br&gt;
Two extensions stand out:&lt;br&gt;
GitHub Integration&lt;br&gt;
Replacing static PR data with live pull requests is straightforward. GitHub’s diff format is directly compatible, requiring only API integration.&lt;br&gt;
Team-Aware Memory&lt;br&gt;
Currently, all feedback is stored per repository. A more refined approach would segment memory by team, allowing different groups within the same repo to maintain distinct review preferences.&lt;br&gt;
The Core Insight&lt;br&gt;
Most AI tools operate as one-shot systems. They respond, then forget.&lt;br&gt;
Adding memory changes the trajectory entirely.&lt;br&gt;
Each Accept or Reject is a small signal. Individually, they’re trivial. At scale, they compound into a system that reflects how a team actually writes and reviews code.&lt;br&gt;
That compounding effect is what transforms a generic assistant into something genuinely useful.&lt;br&gt;
And that’s the part worth building.&lt;/p&gt;

&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcm9bxlqqbxn7wuo2cwa.jpg" alt=" " width="800" height="518"&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0py943dh7y3eq7atmjmc.jpg" alt=" " width="800" height="518"&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgwdo1ytdru5j2r353uq.jpg" alt=" " width="800" height="518"&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbnn99imr41xejcnlbx1.jpg" alt=" " width="800" height="518"&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbfjh3fy02s71ndy2c8a.jpg" alt=" " width="800" height="518"&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>codequality</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
