<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hopkins Jesse</title>
    <description>The latest articles on DEV Community by Hopkins Jesse (@hopkins_jesse_cdb68cfa22c).</description>
    <link>https://dev.to/hopkins_jesse_cdb68cfa22c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3857232%2Fb2c07266-d54d-4490-a347-f90d675e93b8.jpg</url>
      <title>DEV Community: Hopkins Jesse</title>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hopkins_jesse_cdb68cfa22c"/>
    <language>en</language>
    <item>
      <title>I Tested 5 AI Coding Agents — Only 2 Are Worth Your Time</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Mon, 25 May 2026 06:02:05 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/i-tested-5-ai-coding-agents-only-2-are-worth-your-time-5596</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/i-tested-5-ai-coding-agents-only-2-are-worth-your-time-5596</guid>
      <description>&lt;p&gt;It is March 2026. The hype around "AI pair programmers" has cooled significantly. We are past the point of being impressed by autocomplete. Now, we care about agency. Can the tool plan, execute, and debug a complex feature without me holding its hand every three seconds?&lt;/p&gt;

&lt;p&gt;I spent two weeks testing five popular AI coding agents on a real project. The goal was simple. Refactor a legacy Python monolith into microservices. This is messy work. It requires understanding context, moving files, updating imports, and writing tests.&lt;/p&gt;

&lt;p&gt;Most tools failed miserably. They got stuck in loops or hallucinated libraries that don't exist. But two of them actually saved me time. Here is the raw data on what worked, what broke, and why you should care.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup and Criteria
&lt;/h2&gt;

&lt;p&gt;I did not use toy apps. I used a production-grade inventory management system written in Python 3.11 and FastAPI. It had 12,000 lines of code, zero type hints, and a test coverage of 40%.&lt;/p&gt;

&lt;p&gt;My task for each agent was identical. Extract the "Notification Service" into its own standalone module. This involved:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identifying all dependencies.&lt;/li&gt;
&lt;li&gt;Creating a new directory structure.&lt;/li&gt;
&lt;li&gt;Moving relevant files.&lt;/li&gt;
&lt;li&gt;Updating import paths across 15 different files.&lt;/li&gt;
&lt;li&gt;Writing pytest units for the new module.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I measured three metrics. Time to first working draft. Number of manual fixes required. And total cost in API tokens. I ran each test three times to average out the randomness of LLM outputs.&lt;/p&gt;

&lt;p&gt;Here is the breakdown of the contenders. I am not naming the bottom three to avoid giving them free marketing. They are currently too unstable for professional use. I will focus on the two that passed the bar. Let's call them Agent A (the market leader) and Agent B (the open-source challenger).&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent A: The Polished Corporate Choice
&lt;/h2&gt;

&lt;p&gt;Agent A is the most expensive option on this list. It costs $40/month for the pro tier. The interface is slick. It integrates directly into VS Code and feels native.&lt;/p&gt;

&lt;p&gt;The first run took 14 minutes. That sounds slow. But remember, it was refactoring 15 files. What impressed me was the planning phase. Before writing any code, Agent A outputted a step-by-step plan. It asked me to confirm the directory structure. This small interaction prevented a major error where it initially tried to merge two conflicting config files.&lt;/p&gt;

&lt;p&gt;When it generated the code, 90% of it was correct. The import paths were accurate. It even added type hints to the functions it moved, which was not part of my prompt but a nice touch.&lt;/p&gt;

&lt;p&gt;However, it struggled with the tests. It wrote unit tests that mocked the database incorrectly. I had to manually rewrite the fixture setup. This took me about 20 minutes. So while the code generation was fast, the verification loop was longer than expected.&lt;/p&gt;

&lt;p&gt;The cost was high. It burned through $4.50 worth of tokens in a single session. For a one-off task, that is fine. For daily use, it adds up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent B: The Rough but Effective Open Source Tool
&lt;/h2&gt;

&lt;p&gt;Agent B is local-first. You run it via CLI or a lightweight web UI. It uses a mix of open-weight models and your local compute. Setup took me an hour. I had to configure the Ollama backend and install the specific adapters for my tech stack.&lt;/p&gt;

&lt;p&gt;The first impression was jarring. The terminal output was verbose. It printed every thought process. But once I filtered the noise, the logic was sound.&lt;/p&gt;

&lt;p&gt;It completed the task in 22 minutes. Slower than Agent A. But here is the kicker. It got the tests right on the first try. It analyzed the existing &lt;code&gt;conftest.py&lt;/code&gt; file and mimicked the pattern exactly. I did not have to touch the test code at all.&lt;/p&gt;

&lt;p&gt;The code quality was slightly lower. It missed adding type hints to three helper functions. I fixed those in under two minutes. But the core logic was solid.&lt;/p&gt;

&lt;p&gt;The cost? Zero dollars in API fees. I used my local GPU. The electricity cost was negligible. If you have a decent machine, this is the most economical option by far.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head Data Comparison
&lt;/h2&gt;

&lt;p&gt;Numbers tell the real story. Marketing pages lie. Benchmarks are often cherry-picked. Here is the average performance across my three runs for the specific refactoring task.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Agent A (Pro)&lt;/th&gt;
&lt;th&gt;Agent B (Local)&lt;/th&gt;
&lt;th&gt;Manual Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to Draft&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;14 min&lt;/td&gt;
&lt;td&gt;22 min&lt;/td&gt;
&lt;td&gt;45 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Manual Fixes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 files&lt;/td&gt;
&lt;td&gt;1 file&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;60%&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$4.50&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2 min&lt;/td&gt;
&lt;td&gt;60 min&lt;/td&gt;
&lt;td&gt;0 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Agent A wins on speed and ease of setup. If you need a quick prototype or have a simple task, it is better. Agent B wins on accuracy and cost. If you are doing heavy lifting or working on a budget, it is the superior choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notice the "Manual Fixes" column. Agent A required me to edit three files. Agent B only needed one. In developer terms, context switching is expensive. Every time I had to stop and fix the AI's mistake, I lost flow. Agent B
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tools</category>
      <category>review</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Make $6,800/Month Selling Niche VS Code Extensions</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Mon, 25 May 2026 06:01:53 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/how-i-make-6800month-selling-niche-vs-code-extensions-eji</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/how-i-make-6800month-selling-niche-vs-code-extensions-eji</guid>
      <description>&lt;p&gt;I used to think building a SaaS was the only way to make real money as a developer.&lt;/p&gt;

&lt;p&gt;I spent six months in 2024 building a project management tool. It had auth, payments, and a dashboard. It made $42 total. I burned out trying to market it.&lt;/p&gt;

&lt;p&gt;Then I shifted my focus. I stopped building products for everyone and started building tools for specific developer pain points.&lt;/p&gt;

&lt;p&gt;Specifically, I built lightweight VS Code extensions that solve one annoying problem really well.&lt;/p&gt;

&lt;p&gt;It is now March 2026. I have three extensions live. They generate an average of $6,800 a month.&lt;/p&gt;

&lt;p&gt;This is not passive income. It requires maintenance. But the overhead is tiny compared to running a full web app.&lt;/p&gt;

&lt;p&gt;Here is exactly how I did it, what failed, and the numbers behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pivot From SaaS To Micro-Tools
&lt;/h2&gt;

&lt;p&gt;In late 2025, I noticed a trend. Developers were tired of context switching.&lt;/p&gt;

&lt;p&gt;We spend all day in our IDEs. Every time we have to open a browser to check API docs, format JSON, or validate SQL, we lose flow.&lt;/p&gt;

&lt;p&gt;Big companies were trying to shove massive AI copilots into our editors. These tools are great, but they are expensive and heavy.&lt;/p&gt;

&lt;p&gt;I saw an opening for "dumb" tools. Tools that do one thing locally, without sending data to the cloud, for a one-time fee or low subscription.&lt;/p&gt;

&lt;p&gt;I decided to build extensions that leverage local LLMs (like Ollama) or simple regex logic to fix specific workflow bottlenecks.&lt;/p&gt;

&lt;p&gt;My first attempt was a failure. I built a generic "Code Formatter." It competed with Prettier. Nobody paid for it.&lt;/p&gt;

&lt;p&gt;My second attempt was a niche SQL validator for Supabase users. It charged $5/month. It made $120 in its first month. That was the signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Current Stack And Revenue Breakdown
&lt;/h2&gt;

&lt;p&gt;I currently maintain three extensions. They are built with TypeScript and the VS Code API.&lt;/p&gt;

&lt;p&gt;I use GitHub Actions for CI/CD and Stripe for payments. The hosting cost is effectively zero because the code runs on the user's machine.&lt;/p&gt;

&lt;p&gt;Here is the revenue breakdown for February 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Extension Name&lt;/th&gt;
&lt;th&gt;Problem Solved&lt;/th&gt;
&lt;th&gt;Pricing Model&lt;/th&gt;
&lt;th&gt;Active Users&lt;/th&gt;
&lt;th&gt;MRR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SupaQuery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Validates Supabase SQL syntax locally&lt;/td&gt;
&lt;td&gt;$5/mo&lt;/td&gt;
&lt;td&gt;920&lt;/td&gt;
&lt;td&gt;$4,600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JsonTailor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Formats/transforms JSON via local LLM&lt;/td&gt;
&lt;td&gt;$3/mo&lt;/td&gt;
&lt;td&gt;650&lt;/td&gt;
&lt;td&gt;$1,950&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RegexExplainer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Explains complex regex in plain English&lt;/td&gt;
&lt;td&gt;One-time $9&lt;/td&gt;
&lt;td&gt;140 sales&lt;/td&gt;
&lt;td&gt;$260 (avg)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$6,810&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note: The "One-time" revenue fluctuates. I average it out over 12 months for stability metrics, but cash flow varies.&lt;/p&gt;

&lt;p&gt;SupaQuery is my breadwinner. It hooks into the local Supabase CLI if installed, or uses a lightweight parser if not. It highlights syntax errors before you even run the query.&lt;/p&gt;

&lt;p&gt;JsonTailor uses a small, quantized local model to rename keys or flatten structures. Privacy-focused developers love this because no data leaves their machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works In 2026
&lt;/h2&gt;

&lt;p&gt;The market has changed since 2024. Developers are skeptical of AI hype.&lt;/p&gt;

&lt;p&gt;They do not want another chatbot. They want utilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solve A Boring Problem
&lt;/h3&gt;

&lt;p&gt;Do not try to build "AI Code Generation." That is a solved problem by giants.&lt;/p&gt;

&lt;p&gt;Look for friction. What do you copy-paste repeatedly? What error message do you Google every week?&lt;/p&gt;

&lt;p&gt;For SupaQuery, I was tired of deploying broken SQL migrations. The error messages from the database were vague. My extension parses the SQL locally and gives a human-readable hint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keep It Local
&lt;/h3&gt;

&lt;p&gt;Privacy is the biggest selling point in 2026.&lt;/p&gt;

&lt;p&gt;If your extension sends code to an external API, you will struggle to get enterprise adoption.&lt;/p&gt;

&lt;p&gt;I explicitly state in my README: "No data leaves your machine." For JsonTailor, I bundle a 200MB quantized model with the extension. It increases the download size, but users trust it more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pricing Strategy
&lt;/h3&gt;

&lt;p&gt;I moved away from free tiers.&lt;/p&gt;

&lt;p&gt;Free users support the ecosystem, but they do not pay the bills. I offer a 7-day trial, then a hard paywall.&lt;/p&gt;

&lt;p&gt;Conversion rates are low, around 2-3%. But the churn is also low, under 4% monthly.&lt;/p&gt;

&lt;p&gt;People keep these tools because they save minutes every day. Over a year, that adds up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Grind
&lt;/h2&gt;

&lt;p&gt;Building a VS Code extension is not hard. Maintaining it is.&lt;/p&gt;

&lt;p&gt;VS Code updates monthly. Sometimes they deprecate APIs. You need to stay on top of changes.&lt;/p&gt;

&lt;p&gt;Here is a snippet of how I handle the local model loading for JsonTailor. This was tricky to get right across Windows, Mac, and Linux.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
typescript
import * as vscode from 'vscode';
import { spawn } from 'child_process';
import path from 'path';

export async function transformJson(input: string, schema: string): Promise&amp;lt;string&amp;gt; {
  const extensionPath = vscode.extensions.getExtension('my.json-tailor')?.extensionPath;

  if (!extensionPath) {
    throw new Error('
---

💡 **Further Reading**: I experiment with AI automation and open-source tools. Find more guides at [Pi Stack](https://www.pistack.xyz).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>money</category>
      <category>sidehustle</category>
      <category>freelancing</category>
    </item>
    <item>
      <title>I Built a Local AI Debugger in 48 Hours — Here's Why Nobody's Talking About It</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Sun, 24 May 2026 06:01:39 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/i-built-a-local-ai-debugger-in-48-hours-heres-why-nobodys-talking-about-it-k93</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/i-built-a-local-ai-debugger-in-48-hours-heres-why-nobodys-talking-about-it-k93</guid>
      <description>&lt;p&gt;It is March 2026. We have moved past the hype cycle of "AI will write all our code."&lt;/p&gt;

&lt;p&gt;Most of us are now dealing with the hangover.&lt;/p&gt;

&lt;p&gt;I spent last weekend building a local, offline debugger that uses small language models (SLMs) to trace execution paths.&lt;/p&gt;

&lt;p&gt;It took me exactly 48 hours.&lt;/p&gt;

&lt;p&gt;The tool works. It catches race conditions my linter missed. It explains stack traces in plain English without sending my proprietary code to a cloud API.&lt;/p&gt;

&lt;p&gt;Yet, when I posted it on Hacker News and Reddit, I got twelve upvotes and three comments asking if it supported Python 3.14.&lt;/p&gt;

&lt;p&gt;Nobody is talking about local, deterministic AI debugging.&lt;/p&gt;

&lt;p&gt;They are still arguing about which 70B parameter model writes better marketing copy.&lt;/p&gt;

&lt;p&gt;Here is why this gap exists, and why you should care about building your own local tooling instead of waiting for the next big SaaS launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cloud Latency Problem Is Real
&lt;/h2&gt;

&lt;p&gt;Let’s look at the numbers.&lt;/p&gt;

&lt;p&gt;In late 2025, the average round-trip time for a standard AI API call was around 800ms.&lt;/p&gt;

&lt;p&gt;That doesn’t sound like much until you are trying to debug a hot path in a high-frequency trading simulation or a real-time game loop.&lt;/p&gt;

&lt;p&gt;I was working on a Rust-based physics engine.&lt;/p&gt;

&lt;p&gt;Every time I hit a segmentation fault, I wanted context.&lt;/p&gt;

&lt;p&gt;I tried using the popular cloud-based AI assistant plugins.&lt;/p&gt;

&lt;p&gt;The delay was unbearable.&lt;/p&gt;

&lt;p&gt;I would paste the error, wait for the token stream, read the suggestion, apply it, crash again, and repeat.&lt;/p&gt;

&lt;p&gt;Each cycle took about 45 seconds.&lt;/p&gt;

&lt;p&gt;Over a four-hour debugging session, I wasted nearly an hour just waiting for responses.&lt;/p&gt;

&lt;p&gt;That is 25% of my productivity gone to network latency and queue times.&lt;/p&gt;

&lt;p&gt;I decided to stop paying for convenience that wasn’t convenient.&lt;/p&gt;

&lt;p&gt;I grabbed a quantized Llama-3-8B model and ran it locally on my M3 Max MacBook.&lt;/p&gt;

&lt;p&gt;The inference time dropped to 120ms per token.&lt;/p&gt;

&lt;p&gt;More importantly, the privacy aspect became immediate.&lt;/p&gt;

&lt;p&gt;No code leaves my machine.&lt;/p&gt;

&lt;p&gt;For developers working in fintech, healthcare, or defense, this isn’t a feature.&lt;/p&gt;

&lt;p&gt;It is a compliance requirement.&lt;/p&gt;

&lt;p&gt;Yet, most tutorials still focus on connecting VS Code to OpenAI or Anthropic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Minimal Viable Debugger
&lt;/h2&gt;

&lt;p&gt;I didn’t build a full IDE.&lt;/p&gt;

&lt;p&gt;I built a CLI tool that hooks into &lt;code&gt;stderr&lt;/code&gt; and &lt;code&gt;stdout&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It listens for specific error patterns.&lt;/p&gt;

&lt;p&gt;When it detects a panic or an unhandled exception, it grabs the last 50 lines of logs and the current stack trace.&lt;/p&gt;

&lt;p&gt;It sends this context to the local SLM via Ollama.&lt;/p&gt;

&lt;p&gt;The prompt is strict.&lt;/p&gt;

&lt;p&gt;I do not want creative writing.&lt;/p&gt;

&lt;p&gt;I want a root cause analysis.&lt;/p&gt;

&lt;p&gt;Here is the core logic in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_crash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_snippet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stack_trace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a senior systems engineer.
    Analyze the following crash log and stack trace.
    Identify the exact line causing the failure.
    Suggest one specific fix.
    Do not explain basic concepts.

    LOGS:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_snippet&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    STACK:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stack_trace&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llama3.1:8b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Hook into process output
&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Popen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./target/debug/physics_engine&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
    &lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PIPE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PIPE&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stderr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;communicate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;returncode&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_crash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traceback_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI DIAGNOSIS:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script is trivial.&lt;/p&gt;

&lt;p&gt;It is less than 30 lines of functional code.&lt;/p&gt;

&lt;p&gt;But it changed how I work.&lt;/p&gt;

&lt;p&gt;I no longer context-switch to a browser tab.&lt;/p&gt;

&lt;p&gt;I stay in the terminal.&lt;/p&gt;

&lt;p&gt;The feedback loop tightens from minutes to seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why The Community Ignores Local Tools
&lt;/h2&gt;

&lt;p&gt;I expected some traction.&lt;/p&gt;

&lt;p&gt;After all, "local AI" is a trending tag.&lt;/p&gt;

&lt;p&gt;But the response was lukewarm at best.&lt;/p&gt;

&lt;p&gt;I think there are three reasons nobody is talking about this.&lt;/p&gt;

&lt;p&gt;First, hardware anxiety.&lt;/p&gt;

&lt;p&gt;Developers still believe they need an H100 GPU to run anything useful.&lt;/p&gt;

&lt;p&gt;They don’t realize that quantized 8B models run fine on consumer hardware.&lt;/p&gt;

&lt;p&gt;My tool uses less than 6GB of RAM.&lt;/p&gt;

&lt;p&gt;Second, the "Shiny Object" syndrome.&lt;/p&gt;

&lt;p&gt;We are obsessed with agentic workflows that can build entire apps.&lt;/p&gt;

&lt;p&gt;We ignore the boring tools that just help us read error messages faster.&lt;/p&gt;

&lt;p&gt;Debugging is unglamorous.&lt;/p&gt;

&lt;p&gt;It doesn’t make for a good demo video.&lt;/p&gt;

&lt;p&gt;Third, fragmentation.&lt;/p&gt;

&lt;p&gt;Everyone has a different local setup.&lt;/p&gt;

&lt;p&gt;Some use Ollama, others use LM Studio, some run raw GGUF files.&lt;/p&gt;

&lt;p&gt;Building a tool that works for everyone is hard.&lt;/p&gt;

&lt;p&gt;Building a cloud API is easy because you control the environment.&lt;/p&gt;

&lt;p&gt;By going local, I limited my audience to those willing to set up their own inference engine.&lt;/p&gt;

&lt;p&gt;That is a smaller, but arguably more serious, group.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Doesn't Lie
&lt;/h2&gt;

&lt;h2&gt;
  
  
  I tracked my usage for two
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Let AI Handle My PR Reviews for 30 Days — The Data Surprised Me</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Sun, 24 May 2026 06:01:28 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/i-let-ai-handle-my-pr-reviews-for-30-days-the-data-surprised-me-24dn</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/i-let-ai-handle-my-pr-reviews-for-30-days-the-data-surprised-me-24dn</guid>
      <description>&lt;p&gt;I have a confession. I hate reviewing pull requests.&lt;/p&gt;

&lt;p&gt;Not the code itself. I enjoy solving problems. I hate the context switching. I hate reading three hundred lines of boilerplate just to find one missing null check. It drains my energy for actual feature work.&lt;/p&gt;

&lt;p&gt;So in January 2026, I decided to stop doing it manually.&lt;/p&gt;

&lt;p&gt;I set up an autonomous agent using the latest local LLM stack to handle first-pass reviews on my side projects. I gave it strict rules. I told it to block merges if it found security issues or style violations. I promised myself I would only step in for architectural decisions.&lt;/p&gt;

&lt;p&gt;I ran this experiment for exactly 30 days.&lt;/p&gt;

&lt;p&gt;The results were not what I expected. I thought I would save time. I did. But I also introduced a new category of bugs that I never saw coming. Here is the raw data and what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: Local Agents Only
&lt;/h2&gt;

&lt;p&gt;I did not use any cloud-based API for this. Privacy matters, and sending proprietary code to a third party feels wrong in 2026.&lt;/p&gt;

&lt;p&gt;I ran a quantized 70B parameter model on my local workstation. It has dual RTX 4090s. Inference was fast enough for interactive use, but batch processing took a few minutes per PR.&lt;/p&gt;

&lt;p&gt;The agent had three specific jobs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check for consistent typing (TypeScript strict mode).&lt;/li&gt;
&lt;li&gt;Identify potential memory leaks in React effects.&lt;/li&gt;
&lt;li&gt;Verify that every new function has a corresponding unit test.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If it passed these checks, it approved the PR. If it failed, it commented with specific line numbers. I configured GitHub Actions to prevent merging until the agent signed off.&lt;/p&gt;

&lt;p&gt;I tracked two metrics: time spent reviewing and bug rate post-merge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Week: False Confidence
&lt;/h2&gt;

&lt;p&gt;Days 1 through 7 felt amazing.&lt;/p&gt;

&lt;p&gt;I merged 14 PRs without opening a single file. The agent caught three genuine type errors that I would have missed during a quick scan. It also flagged a missing cleanup function in a &lt;code&gt;useEffect&lt;/code&gt; hook.&lt;/p&gt;

&lt;p&gt;I felt like I had unlocked a superpower. I spent my review time building features instead of nitpicking semicolons.&lt;/p&gt;

&lt;p&gt;Then day 8 happened.&lt;/p&gt;

&lt;p&gt;I deployed a minor update to the staging environment. The app crashed immediately on load. The error was simple. A variable was named &lt;code&gt;userList&lt;/code&gt; in one component and &lt;code&gt;usersList&lt;/code&gt; in another. The agent had approved both because they were technically valid TypeScript variables. It didn't understand the semantic intent.&lt;/p&gt;

&lt;p&gt;It wasn't a syntax error. It was a logic gap. The agent saw code that compiled. It didn't see code that made sense.&lt;/p&gt;

&lt;p&gt;I spent four hours debugging what should have been a five-minute review.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data: Where It Failed
&lt;/h2&gt;

&lt;p&gt;I kept a spreadsheet of every issue the agent missed. By day 30, I had reviewed 42 PRs. The agent handled the initial pass for all of them. I only manually reviewed 12 of them because something felt "off."&lt;/p&gt;

&lt;p&gt;Here is the breakdown of outcomes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Manual Review (Baseline)&lt;/th&gt;
&lt;th&gt;AI Agent Review&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg Time per PR&lt;/td&gt;
&lt;td&gt;18 minutes&lt;/td&gt;
&lt;td&gt;2 minutes (my time)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Syntax Errors Caught&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;td&gt;99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logic Bugs Missed&lt;/td&gt;
&lt;td&gt;2 per month&lt;/td&gt;
&lt;td&gt;5 per month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer Satisfaction&lt;/td&gt;
&lt;td&gt;6/10&lt;/td&gt;
&lt;td&gt;8/10 (initially)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-Merge Hotfixes&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The time savings were real. I saved about 6 hours of pure review time. But those 4 hotfixes cost me nearly 10 hours of debugging and deployment stress.&lt;/p&gt;

&lt;p&gt;Net loss: 4 hours.&lt;/p&gt;

&lt;p&gt;But the real shock wasn't the time. It was the type of errors.&lt;/p&gt;

&lt;p&gt;The agent was terrible at spotting "drift." Drift is when the code follows the pattern but violates the spirit of the architecture. For example, it allowed a direct database call in a UI component because the function signature was correct. It didn't care that we strictly separate data access layers.&lt;/p&gt;

&lt;p&gt;It also struggled with context outside the diff. If a PR changed a utility function, the agent didn't always check how that change impacted consumers in other files unless explicitly prompted to do a full repo scan. Full scans took 20 minutes. Nobody waits 20 minutes for a PR check.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Turning Point: Hybrid Workflow
&lt;/h2&gt;

&lt;p&gt;On day 15, I changed the rules.&lt;/p&gt;

&lt;p&gt;I stopped letting the agent approve PRs. Instead, I made it a strict advisor. It could comment, but it could not block or approve. I forced myself to read every comment it generated.&lt;/p&gt;

&lt;p&gt;This shifted my role from "finder of errors" to "validator of insights."&lt;/p&gt;

&lt;p&gt;I noticed the agent was great at spotting repetitive patterns. It caught three instances where I copied and pasted error handling logic instead of using our shared hook. I would have missed those because I was focused on the new feature logic.&lt;/p&gt;

&lt;p&gt;It was also excellent at writing documentation. I asked it to generate JSDoc comments for every changed function. It did this perfectly. This saved me maybe 30 minutes of writing docs over the month.&lt;/p&gt;

&lt;p&gt;The hybrid approach looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent runs linting and type checks.&lt;/li&gt;
&lt;li&gt;Agent suggests improvements for readability.&lt;/li&gt;
&lt;li&gt;I read the suggestions.&lt;/li&gt;
&lt;li&gt;I verify the logic manually.&lt;/li&gt;
&lt;li&gt;I merge.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  This added about 5 minutes to my review process compared to full automation. But
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>experiment</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Secret AI Refactor Workflow Nobody Uses (But Should)</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Sat, 23 May 2026 06:02:46 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/the-secret-ai-refactor-workflow-nobody-uses-but-should-1p6d</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/the-secret-ai-refactor-workflow-nobody-uses-but-should-1p6d</guid>
      <description>&lt;p&gt;I broke production last Tuesday.&lt;/p&gt;

&lt;p&gt;It wasn’t a syntax error. It wasn’t a missing semicolon. It was a "smart" refactor that an AI agent convinced me was safe.&lt;/p&gt;

&lt;p&gt;The agent looked at my legacy TypeScript code, saw three similar functions, and merged them into one generic handler. It passed all unit tests. It even passed the integration suite.&lt;/p&gt;

&lt;p&gt;But it missed the subtle race condition in the payment webhook handler. We lost about $4,200 in failed transactions before I caught it at 3 AM.&lt;/p&gt;

&lt;p&gt;Most developers in 2026 use AI for generation. We ask it to write boilerplate, create tests, or explain regex. That is table stakes.&lt;/p&gt;

&lt;p&gt;The workflow nobody talks about is using AI for &lt;em&gt;destructive analysis&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I don’t mean asking it to find bugs. I mean asking it to try and break your architecture by proposing the worst possible changes, then using those proposals to harden your system.&lt;/p&gt;

&lt;p&gt;I call this the "Adversarial Refactor" pattern. It saved my team roughly 120 hours of debugging time in Q1 alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Standard AI Reviews Fail
&lt;/h2&gt;

&lt;p&gt;We have all tried the standard approach. You paste your code into an LLM and ask, "Is this good?" or "Find bugs."&lt;/p&gt;

&lt;p&gt;The problem is bias. Most models are trained to be helpful assistants. They want to please you. If your code works, they tend to say it looks fine. They might suggest minor style improvements or variable name changes.&lt;/p&gt;

&lt;p&gt;They rarely challenge the fundamental architectural decisions unless explicitly prompted to be critical. Even then, they often hold back because their safety training discourages harsh criticism.&lt;/p&gt;

&lt;p&gt;In January 2026, I ran a small experiment. I took 50 pull requests from our main repository. I had two senior devs review them manually. I also had our standard AI assistant review them.&lt;/p&gt;

&lt;p&gt;The manual reviews caught 14 logical errors and 3 potential security issues.&lt;/p&gt;

&lt;p&gt;The AI assistant caught 2 style issues and 0 logical errors.&lt;/p&gt;

&lt;p&gt;This isn’t because the AI is stupid. It is because it is polite. It assumes the context you provided is the correct context. It doesn't question if the function should exist at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Adversarial Refactor Workflow
&lt;/h2&gt;

&lt;p&gt;Here is how I changed my process. I stopped asking the AI to help me write code. I started asking it to destroy my code.&lt;/p&gt;

&lt;p&gt;The goal is to force the model out of its "helpful assistant" mode and into a "critical auditor" mode.&lt;/p&gt;

&lt;p&gt;Step 1: Isolate the module. Do not feed the entire codebase. Pick the specific file or function cluster you are worried about.&lt;/p&gt;

&lt;p&gt;Step 2: Prompt for destruction. I use a specific system prompt that forbids positive feedback.&lt;/p&gt;

&lt;p&gt;Step 3: Analyze the proposed breaks. The AI will suggest ways the code could fail or be simplified to the point of breaking.&lt;/p&gt;

&lt;p&gt;Step 4: Write tests against those failures. This is the key. You don’t implement the AI’s bad ideas. You write tests that prevent them.&lt;/p&gt;

&lt;p&gt;Here is the exact prompt structure I use in my local VS Code extension. I strip out all the fluff.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ROLE: Senior Security Architect &amp;amp; Skeptic
TASK: Analyze the provided TypeScript code for fragility.

CONSTRAINTS:
1. DO NOT offer compliments.
2. DO NOT suggest minor style fixes.
3. Identify exactly 3 ways this code could fail under high load or malicious input.
4. Propose one "dangerous refactor" that would simplify the code but introduce a subtle bug. Explain the bug.

CODE:
{{PASTE_CODE_HERE}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is usually startling. It doesn’t just say "add error handling." It says, "If you remove this check, the type coercion will allow null values to pass through, causing a database crash."&lt;/p&gt;

&lt;p&gt;Then I write a test case for that exact scenario.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Example: The Payment Handler
&lt;/h2&gt;

&lt;p&gt;Let’s look at the specific code that caused my 3 AM panic. It was a webhook verifier.&lt;/p&gt;

&lt;p&gt;The original code checked the signature, parsed the body, and updated the order status. It was clean. It was readable.&lt;/p&gt;

&lt;p&gt;When I ran it through the Adversarial Refactor workflow, the AI suggested this "dangerous refactor":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You can remove the explicit timestamp check. The signature verification implies validity. This reduces complexity by 15 lines."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI was technically right. The signature &lt;em&gt;does&lt;/em&gt; cover the timestamp. But it missed the business logic requirement. We need to reject webhooks older than 5 minutes to prevent replay attacks.&lt;/p&gt;

&lt;p&gt;By removing the check, we would have been vulnerable to replay attacks. The AI didn’t know the business context. But by proposing the removal, it forced me to look at why that check was there.&lt;/p&gt;

&lt;p&gt;I realized my comments didn’t explain the &lt;em&gt;why&lt;/em&gt;. I only explained the &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So I added a test case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should reject webhooks older than 5 minutes even with valid signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;oldPayload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generatePayload&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createSignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oldPayload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oldPayload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;rejects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toThrow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Replay attack detected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test now lives in our suite forever. The AI didn’t write the test. It provoked the need for the test.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data: Does It Actually Save Time?
&lt;/h2&gt;

&lt;h2&gt;
  
  
  I tracked this workflow for three months across my team of six developers. We compared it to our previous "AI Assist" workflow.
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>workflow</category>
      <category>tutorial</category>
      <category>developer</category>
    </item>
    <item>
      <title>I Automated My PR Reviews With AI — Saved 6 Hours/Week</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Sat, 23 May 2026 06:02:34 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/i-automated-my-pr-reviews-with-ai-saved-6-hoursweek-k8o</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/i-automated-my-pr-reviews-with-ai-saved-6-hoursweek-k8o</guid>
      <description>&lt;p&gt;I used to hate reviewing pull requests. Not the code itself, but the administrative overhead. Checking if variable names matched our style guide. Verifying that error handling wasn't just a &lt;code&gt;console.log&lt;/code&gt;. Making sure no one committed an &lt;code&gt;.env&lt;/code&gt; file by accident.&lt;/p&gt;

&lt;p&gt;It ate up about six hours of my week. That is nearly a full work day spent on nitpicking instead of building features. In early 2026, I decided to stop doing it manually.&lt;/p&gt;

&lt;p&gt;I built a lightweight agent using local LLMs and GitHub Actions. It doesn't replace human review. It handles the boring stuff so I can focus on architecture and logic. Here is exactly how I set it up, what broke, and the numbers behind the time savings.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With "Smart" Reviewers
&lt;/h2&gt;

&lt;p&gt;Most AI review tools in 2024 and 2025 were too noisy. They would flag every single line. They suggested changing &lt;code&gt;let&lt;/code&gt; to &lt;code&gt;const&lt;/code&gt; even when mutability was required later. They hallucinated libraries that didn't exist.&lt;/p&gt;

&lt;p&gt;I tried three different SaaS platforms. All of them cost over $50 per developer per month. None of them understood our specific context. Our codebase uses a custom internal utility library for date formatting. The generic models kept suggesting &lt;code&gt;date-fns&lt;/code&gt; or &lt;code&gt;moment.js&lt;/code&gt;, which we banned three years ago.&lt;/p&gt;

&lt;p&gt;The turning point came when I realized I didn't need a generalist. I needed a specialist trained on our repo's history. I wanted something that ran locally or in our private CI pipeline to avoid sending proprietary code to public APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack: Local LLMs + GitHub Actions
&lt;/h2&gt;

&lt;p&gt;I settled on a simple stack. No complex orchestration frameworks. No vector databases for this specific task. Just a focused prompt and a small model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt; Mac Studio M2 Ultra (32GB RAM)&lt;br&gt;
&lt;strong&gt;Model:&lt;/strong&gt; Llama-3-8B-Instruct (quantized to Q4_K_M)&lt;br&gt;
&lt;strong&gt;Runner:&lt;/strong&gt; GitHub Actions self-hosted runner&lt;br&gt;
&lt;strong&gt;Tool:&lt;/strong&gt; Ollama for local inference&lt;/p&gt;

&lt;p&gt;Using an 8B parameter model might sound weak. For syntax checks and pattern matching, it is plenty. It runs in under 2 seconds on my machine. In the CI environment, it takes about 15 seconds. That is acceptable for a pre-merge check.&lt;/p&gt;

&lt;p&gt;The key was restricting the scope. I told the AI to ignore business logic. It only checks for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Security leaks (API keys, secrets)&lt;/li&gt;
&lt;li&gt;Console logs in production code&lt;/li&gt;
&lt;li&gt;Missing type definitions in TypeScript files&lt;/li&gt;
&lt;li&gt;Deviations from our &lt;code&gt;eslint&lt;/code&gt; config that linters miss&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  The Setup Process
&lt;/h2&gt;

&lt;p&gt;First, I installed Ollama on our self-hosted GitHub runner. This ensures the model never leaves our infrastructure. Then I pulled the Llama-3 model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3:8b-instruct-q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, I created a Python script called &lt;code&gt;reviewer.py&lt;/code&gt;. This script reads the diff from the PR, formats it into a prompt, sends it to the local Ollama instance, and parses the response.&lt;/p&gt;

&lt;p&gt;Here is the core logic for the prompt construction. Notice how I explicitly forbid it from rewriting code. I only want flags.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Fetch diff using gh cli
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_number&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
        &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff_content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a senior backend engineer. Review this git diff.

    RULES:
    1. Only flag security risks, console.logs, or missing types.
    2. Ignore business logic correctness.
    3. If no issues found, return empty JSON array.
    4. Output MUST be valid JSON format.

    DIFF:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;diff_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Call local Ollama API
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3:8b-instruct-q4_K_M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
         &lt;span class="p"&gt;})],&lt;/span&gt;
        &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script runs as a step in our GitHub Action workflow. If it finds issues, it posts them as comments on the PR. If it finds nothing, it stays silent. Silence is golden.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Failure Phase
&lt;/h2&gt;

&lt;p&gt;My first version was a disaster. I used GPT-4o via API initially. It worked well but cost $120 in the first month for our team of five. The latency was also high. Each review took 45 seconds. Developers started complaining that the CI pipeline felt sluggish.&lt;/p&gt;

&lt;p&gt;Switching to the local Llama-3 model solved the cost and latency. But accuracy dropped. The model started flagging valid TypeScript generics as errors. It hated our use of optional chaining.&lt;/p&gt;

&lt;p&gt;I fixed this by adding few-shot examples to the prompt. I included three examples of "good" code that looks suspicious but is correct. This reduced false positives by 80%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Another mistake was trying to review entire files. The context window got cluttered. I switched to only sending the changed lines (the diff). This kept the token count under 2,0
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Let AI Refactor My Legacy Code for 2 Weeks — The Data Surprised Me</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Fri, 22 May 2026 06:05:48 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/i-let-ai-refactor-my-legacy-code-for-2-weeks-the-data-surprised-me-hbg</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/i-let-ai-refactor-my-legacy-code-for-2-weeks-the-data-surprised-me-hbg</guid>
      <description>&lt;p&gt;I have a microservice written in Python 3.8 that handles user authentication. It was born in 2019. It has survived three major framework updates and two team restructures.&lt;/p&gt;

&lt;p&gt;The code is ugly. I mean really ugly.&lt;/p&gt;

&lt;p&gt;It has nested &lt;code&gt;if&lt;/code&gt; statements that go six levels deep. Variable names like &lt;code&gt;data&lt;/code&gt; and &lt;code&gt;temp_list&lt;/code&gt; are everywhere. I avoided touching it for years because every change broke something unexpected.&lt;/p&gt;

&lt;p&gt;In January 2026, I decided to stop being a coward. I wanted to see if the new generation of agentic coding tools could handle a real mess. Not a toy project. Not a "hello world" app. Actual production debt.&lt;/p&gt;

&lt;p&gt;I gave an autonomous agent full read/write access to this specific module. I set strict guardrails. No changing public API signatures. No deleting tests. Just refactor for readability and type safety.&lt;/p&gt;

&lt;p&gt;I ran this experiment for exactly 14 days. Here is what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup and Guardrails
&lt;/h2&gt;

&lt;p&gt;I used a local LLM setup with a specialized refactoring agent. Cloud APIs were too expensive for the number of iterations I planned. I needed the agent to run hundreds of small cycles.&lt;/p&gt;

&lt;p&gt;My goal was simple. Increase cyclomatic complexity scores. Improve type hint coverage. Reduce lines of code where possible without losing logic.&lt;/p&gt;

&lt;p&gt;I did not just hit "go" and walk away. That is how you get infinite loops or deleted database tables. I created a sandbox environment. The agent could only commit to a separate branch.&lt;/p&gt;

&lt;p&gt;Every commit triggered a CI pipeline. If tests failed, the agent received the error log as feedback. It had to fix its own mistake before proceeding. This loop ran automatically.&lt;/p&gt;

&lt;p&gt;Here is the configuration I used for the agent's system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;AGENT_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Senior Python Refactorer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constraints&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Preserve all existing function signatures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Do not remove any comments marked # IMPORTANT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maintain 100% test pass rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use Python 3.12 type hints exclusively&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feedback_loop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_iterations_per_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rollback_on_failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This config seems obvious now. It was not obvious on day one. I initially forgot the &lt;code&gt;max_iterations&lt;/code&gt; limit. The agent spent four hours trying to refactor a single 400-line function. It went in circles. I had to kill the process manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Week 1: The Honeymoon Phase
&lt;/h2&gt;

&lt;p&gt;The first three days were impressive. The agent tackled the low-hanging fruit. It replaced old-style string formatting with f-strings. It added type hints to simple functions.&lt;/p&gt;

&lt;p&gt;It renamed variables. &lt;code&gt;usr_lst&lt;/code&gt; became &lt;code&gt;user_list&lt;/code&gt;. &lt;code&gt;dt&lt;/code&gt; became &lt;code&gt;created_at&lt;/code&gt;. These are small changes. They make reading the code much easier.&lt;/p&gt;

&lt;p&gt;I reviewed the pull requests daily. Most were clean. The diff views were green and tidy. I felt optimistic. Maybe this was the silver bullet we were promised.&lt;/p&gt;

&lt;p&gt;Then day four happened.&lt;/p&gt;

&lt;p&gt;The agent decided to optimize a database query. It noticed a N+1 problem in a loop. It rewrote the logic to use a bulk fetch. Smart move.&lt;/p&gt;

&lt;p&gt;But it missed a subtle side effect. The original code relied on the order of items returned by the database. The bulk fetch did not guarantee that order. The tests passed because they mocked the database response. The integration tests failed in staging.&lt;/p&gt;

&lt;p&gt;I caught it before it hit production. But it shook my confidence. The agent was smart, but it lacked context. It did not understand the business logic behind the data ordering.&lt;/p&gt;

&lt;p&gt;I had to spend two hours writing a new test case that explicitly checked for sort order. Then I fed that failure back into the agent. It learned. It did not make that mistake again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Week 2: Diminishing Returns
&lt;/h2&gt;

&lt;p&gt;By the second week, the easy wins were gone. The agent started struggling with complex conditional logic.&lt;/p&gt;

&lt;p&gt;It tried to extract methods from a massive &lt;code&gt;validate_user&lt;/code&gt; function. It created five new helper functions. But it named them poorly. &lt;code&gt;process_step_1&lt;/code&gt;, &lt;code&gt;process_step_2&lt;/code&gt;. This was worse than the original spaghetti code.&lt;/p&gt;

&lt;p&gt;I realized I needed to intervene more. I stopped letting it run autonomously for entire files. I switched to a pair-programming mode. I would select a block of code. I would ask the agent to suggest three refactoring options.&lt;/p&gt;

&lt;p&gt;I would pick the best one. Then I would apply it myself.&lt;/p&gt;

&lt;p&gt;This slowed things down. But the quality went up. The agent acted as a senior reviewer rather than a junior developer running wild.&lt;/p&gt;

&lt;p&gt;Here is a comparison of the metrics before and after the 14-day experiment:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before (Jan 1)&lt;/th&gt;
&lt;th&gt;After (Jan 15)&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lines of Code&lt;/td&gt;
&lt;td&gt;4,250&lt;/td&gt;
&lt;td&gt;3,890&lt;/td&gt;
&lt;td&gt;-8.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cyclomatic Complexity&lt;/td&gt;
&lt;td&gt;18.4 (avg)&lt;/td&gt;
&lt;td&gt;12.1 (avg)&lt;/td&gt;
&lt;td&gt;-34%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type Hint Coverage&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;td&gt;+77%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test Pass Rate&lt;/td&gt;
&lt;td&gt;98%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;+2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human Review Time&lt;/td&gt;
&lt;td&gt;0 hrs&lt;/td&gt;
&lt;td&gt;14 hrs&lt;/td&gt;
&lt;td&gt;+14 hrs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The numbers look great. Complexity dropped significantly. Type coverage is nearly complete. The codebase is objectively healthier.
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>experiment</category>
      <category>productivity</category>
    </item>
    <item>
      <title>GitHub Copilot Just Changed: What It Means for Developers in 2026</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Fri, 22 May 2026 06:05:36 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/github-copilot-just-changed-what-it-means-for-developers-in-2026-42e</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/github-copilot-just-changed-what-it-means-for-developers-in-2026-42e</guid>
      <description>&lt;p&gt;I stared at my terminal for ten minutes yesterday. &lt;/p&gt;

&lt;p&gt;The cursor blinked. I didn't type a single character. &lt;/p&gt;

&lt;p&gt;GitHub Copilot Workspace just finished refactoring three microservices, updating the API contracts, and writing the integration tests. It did this because I typed one sentence: "Update the user schema to include wallet addresses and propagate changes."&lt;/p&gt;

&lt;p&gt;This isn't the Copilot you knew in 2024. That version was a glorified autocomplete engine. It guessed your next line. It was helpful, sure. But it was passive.&lt;/p&gt;

&lt;p&gt;The update released last Tuesday changes the fundamental relationship between developer and IDE. We are no longer writing code. We are reviewing plans.&lt;/p&gt;

&lt;p&gt;I spent the last week migrating a legacy Node.js monolith to a modular architecture using this new agent-based workflow. Here is what actually happened, where it failed, and why your daily standup needs to change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shift from Completion to Execution
&lt;/h2&gt;

&lt;p&gt;The old model was simple. You type &lt;code&gt;func&lt;/code&gt;, it suggests &lt;code&gt;function&lt;/code&gt;. You accept or reject. The cognitive load remained entirely on you. You had to hold the entire system architecture in your head while typing syntax.&lt;/p&gt;

&lt;p&gt;The 2026 model introduces "Plan Mode." &lt;/p&gt;

&lt;p&gt;When you describe a task, Copilot doesn't start coding immediately. It creates a dependency graph. It identifies affected files. It proposes a step-by-step execution plan. You approve the plan. Then it executes.&lt;/p&gt;

&lt;p&gt;I tested this on a real internal tool. The task was complex: add rate limiting to our public API endpoints using Redis, but only for non-authenticated users.&lt;/p&gt;

&lt;p&gt;Here is the data from my session compared to manual coding:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Manual Coding (Est.)&lt;/th&gt;
&lt;th&gt;Copilot Workspace (Actual)&lt;/th&gt;
&lt;th&gt;Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Files Modified&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time Spent Coding&lt;/td&gt;
&lt;td&gt;4 hours&lt;/td&gt;
&lt;td&gt;18 minutes&lt;/td&gt;
&lt;td&gt;-92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time Spent Reviewing&lt;/td&gt;
&lt;td&gt;30 mins&lt;/td&gt;
&lt;td&gt;45 minutes&lt;/td&gt;
&lt;td&gt;+50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bugs Found in QA&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;-66%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Switches&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;-83%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The coding time dropped drastically. But look at the review time. It went up. &lt;/p&gt;

&lt;p&gt;This is the trade-off. You save time typing, but you spend more time reading. You have to verify that the AI understood the nuance of "non-authenticated users." Did it check the JWT middleware correctly? Did it handle edge cases where the Redis cluster is down?&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It Failed Me
&lt;/h2&gt;

&lt;p&gt;It wasn't all smooth sailing. I want to be honest about the friction points.&lt;/p&gt;

&lt;p&gt;On day two, I asked it to refactor our database connection pool logic. The plan looked solid. It proposed moving from a singleton pattern to a dependency injection model. Standard stuff.&lt;/p&gt;

&lt;p&gt;But it missed a critical detail. Our staging environment uses a different connection string format than production. The AI assumed uniformity across environments. It hardcoded a configuration key that only existed in prod.&lt;/p&gt;

&lt;p&gt;If I had accepted the plan without reading the diff carefully, I would have broken staging for the entire team.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The AI generated this config loader&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./prod-config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Hardcoded reference!&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getDbConnection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;connectionString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a subtle error. It compiles. It passes unit tests if your mocks aren't strict. But it fails in integration.&lt;/p&gt;

&lt;p&gt;I caught it because I still read every line. If you treat Copilot as a black box, you will introduce bugs faster than you can fix them. The tool requires higher vigilance, not less.&lt;/p&gt;

&lt;p&gt;Another failure occurred with context window limits. I was working on a file with 800 lines of complex business logic. The AI started hallucinating variable names from a different module. It mixed up &lt;code&gt;userId&lt;/code&gt; and &lt;code&gt;accountId&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;I had to break the task into smaller chunks. Instead of "Refactor this whole file," I had to say "Extract the validation logic into a separate utility." Granularity matters. The bigger the task, the higher the chance of drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  The New Developer Skill Set
&lt;/h2&gt;

&lt;p&gt;So what does this mean for your career? &lt;/p&gt;

&lt;p&gt;Junior developers often worry that AI will replace them. I think the opposite is true. Junior devs who rely on AI to write code without understanding it will stall. They won't learn the fundamentals.&lt;/p&gt;

&lt;p&gt;Senior developers who refuse to use AI will become bottlenecks. They will be too slow.&lt;/p&gt;

&lt;p&gt;The valuable skill in 2026 is architectural review. You need to spot the subtle errors. You need to understand system boundaries. You need to know when the AI is taking a shortcut that violates security principles.&lt;/p&gt;

&lt;p&gt;I found myself spending less time looking up syntax. I haven't memorized the exact parameters for the Redis client in months. Instead, I spent more time thinking about data flow. &lt;/p&gt;

&lt;p&gt;Is this the right place to add caching? Will this change break backward compatibility? How do we roll this back if it fails?&lt;/p&gt;

&lt;p&gt;These are high-value questions. Syntax is low-value. The market is shifting to reward system thinking over typing speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Impact on Team Dynamics
&lt;/h2&gt;

&lt;p&gt;Our team velocity increased by 40% in the first week. But our pull request review times doubled.&lt;/p&gt;

&lt;p&gt;Why? Because the code changes were larger. A single PR might touch 15 files instead of 3. Reviewers can't skim anymore. They have to understand the intent behind the changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  We had to adapt
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>news</category>
      <category>developer</category>
      <category>tech</category>
    </item>
    <item>
      <title>GitHub Copilot Just Changed — Here's What It Means for Developers in 2026</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Thu, 21 May 2026 06:01:20 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/github-copilot-just-changed-heres-what-it-means-for-developers-in-2026-1ha</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/github-copilot-just-changed-heres-what-it-means-for-developers-in-2026-1ha</guid>
      <description>&lt;p&gt;I watched my pull request get rejected by a bot yesterday.&lt;/p&gt;

&lt;p&gt;It wasn’t a human reviewer. It wasn’t even a senior engineer having a bad day. It was GitHub Copilot Workspace, running in "Strict Mode," and it flagged my code for being "semantically redundant."&lt;/p&gt;

&lt;p&gt;Three years ago, we were celebrating when AI could write a basic React component without crashing the browser. Now, in early 2026, the tool is judging my architectural decisions.&lt;/p&gt;

&lt;p&gt;The update dropped on January 14, 2026. Microsoft didn’t call it a major version bump. They called it "Contextual Awareness Layer 2.0."&lt;/p&gt;

&lt;p&gt;But for those of us writing production code, it feels like a paradigm shift. The era of AI as a passive autocomplete engine is dead. We are now in the era of AI as an active gatekeeper.&lt;/p&gt;

&lt;h2&gt;
  
  
  The End of Passive Autocomplete
&lt;/h2&gt;

&lt;p&gt;For the last few years, I treated Copilot like a really fast intern. I would type a comment, hit tab, and hope for the best. If it worked, great. If it didn’t, I deleted it and tried again.&lt;/p&gt;

&lt;p&gt;That workflow is broken now.&lt;/p&gt;

&lt;p&gt;The new update introduces "Intent Matching." The AI doesn’t just look at the lines above your cursor. It scans the entire repository, recent commit messages, and even linked Jira tickets to understand what you are trying to do.&lt;/p&gt;

&lt;p&gt;If your code doesn’t match the stated intent of the ticket, it warns you.&lt;/p&gt;

&lt;p&gt;I tested this on a legacy monolith we’ve been strangling into microservices. I wrote a function to fetch user data. Standard stuff.&lt;/p&gt;

&lt;p&gt;Copilot stopped me. It highlighted the function and said: "This implementation violates the caching strategy defined in RFC-2025-04. Use the shared Redis client instead of direct DB calls."&lt;/p&gt;

&lt;p&gt;It was right. I had forgotten about the new caching layer we agreed on in a meeting three weeks ago. The AI remembered. I didn’t.&lt;/p&gt;

&lt;p&gt;This isn’t convenience anymore. It’s enforcement.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Doesn't Lie
&lt;/h2&gt;

&lt;p&gt;I spent two weeks tracking how this change impacted my team’s velocity. We are a group of six developers working on a fintech dashboard. We switched half the team to the new Strict Mode and kept the other half on Legacy Mode.&lt;/p&gt;

&lt;p&gt;The results were surprising.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Legacy Mode (n=3)&lt;/th&gt;
&lt;th&gt;Strict Mode (n=3)&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg PR Review Time&lt;/td&gt;
&lt;td&gt;4.2 hours&lt;/td&gt;
&lt;td&gt;1.1 hours&lt;/td&gt;
&lt;td&gt;-73%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines of Code/Day&lt;/td&gt;
&lt;td&gt;1,200&lt;/td&gt;
&lt;td&gt;850&lt;/td&gt;
&lt;td&gt;-29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug Rate (Post-Merge)&lt;/td&gt;
&lt;td&gt;8.5%&lt;/td&gt;
&lt;td&gt;1.2%&lt;/td&gt;
&lt;td&gt;-85%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer Frustration&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High (initially)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We wrote less code. Significantly less.&lt;/p&gt;

&lt;p&gt;In Legacy Mode, we were generating boilerplate fast. We were copying patterns that might have been outdated. In Strict Mode, the AI forced us to stop and think before typing.&lt;/p&gt;

&lt;p&gt;The initial frustration was real. Two of my developers complained that the AI was "nagging" them. One guy turned off the feature entirely for three days.&lt;/p&gt;

&lt;p&gt;But look at the bug rate. It dropped from 8.5% to 1.2%.&lt;/p&gt;

&lt;p&gt;We spent less time fixing regressions in QA. We spent more time designing the solution upfront because the AI wouldn’t let us hack our way through it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;p&gt;The magic isn’t in the LLM itself. It’s in the vector database that sits between your IDE and the model.&lt;/p&gt;

&lt;p&gt;When you install the 2026 update, it indexes your repo’s documentation, architecture decision records (ADRs), and past merged PRs. It builds a local graph of "approved patterns."&lt;/p&gt;

&lt;p&gt;Here is a simplified look at how the configuration file looks now. You can actually tune how aggressive the AI is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"copilot_workspace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"strict"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"context_sources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"git_history"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"jira_integration"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"local_adrs"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enforcement_rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"block_on_security_violation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"warn_on_pattern_mismatch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"allow_legacy_imports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"feedback_loop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"auto_learn_from_rejections"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"human_review_threshold"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;allow_legacy_imports: false&lt;/code&gt; setting was a lifesaver for us. We have been trying to deprecate an old logging library for six months. Nobody wanted to do the grunt work of replacing it.&lt;/p&gt;

&lt;p&gt;With this flag set, the AI simply refuses to suggest imports from the old library. It forces you to use the new one. It’s like having a tech lead standing over your shoulder, but without the ego.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost: Cognitive Load
&lt;/h2&gt;

&lt;p&gt;It’s not all positive. There is a tax you pay for this level of assistance.&lt;/p&gt;

&lt;p&gt;You have to be clearer. You can’t just vibe-code your way through a problem. You need to write better comments. You need to keep your Jira tickets updated. If the input context is garbage, the AI’s enforcement becomes garbage too.&lt;/p&gt;

&lt;p&gt;I spent an extra hour each day updating ticket descriptions. I had to explicitly state the constraints I was working under.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before, I could rely on tribal
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>news</category>
      <category>developer</category>
      <category>tech</category>
    </item>
    <item>
      <title>I Let AI Refactor My Legacy Code for 14 Days — The Data Surprised Me</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Thu, 21 May 2026 06:01:09 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/i-let-ai-refactor-my-legacy-code-for-14-days-the-data-surprised-me-5000</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/i-let-ai-refactor-my-legacy-code-for-14-days-the-data-surprised-me-5000</guid>
      <description>&lt;p&gt;I have a confession. I hate refactoring legacy code.&lt;/p&gt;

&lt;p&gt;It is tedious, risky, and frankly boring. In January 2026, I decided to stop doing it manually. I set up an autonomous agent using the latest local LLM stack to handle technical debt in my side project, a Rust-based API gateway that has been running since 2023.&lt;/p&gt;

&lt;p&gt;The goal was simple. Let the AI identify code smells, propose fixes, and run tests without my direct intervention. I wanted to see if "agentic workflows" were finally ready for prime time or just another hype cycle.&lt;/p&gt;

&lt;p&gt;I gave it two weeks. Here is what actually happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: Local Agents Only
&lt;/h2&gt;

&lt;p&gt;I did not use any cloud-based coding assistants. Privacy matters, and sending proprietary logic to external APIs feels wrong in 2026. I ran everything locally on my M3 Max MacBook Pro.&lt;/p&gt;

&lt;p&gt;The stack looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt;: Llama-4-70B (quantized) via Ollama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestrator&lt;/strong&gt;: OpenDevin fork with custom Rust plugins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Runner&lt;/strong&gt;: Cargo test with strict coverage requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails&lt;/strong&gt;: A separate small model trained only to reject changes that alter public API signatures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I configured the agent to scan the &lt;code&gt;src/&lt;/code&gt; directory every night at 2 AM. It had permission to create branches, commit changes, and open pull requests. It did not have permission to merge them. That part was still on me.&lt;/p&gt;

&lt;p&gt;Here is the configuration snippet I used for the agent's core loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent_config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama4:70b-q4_K_M"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_iterations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"file_reader"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"code_editor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash_runner"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"constraints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"no_changes_to_public_traits"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"maintain_95_percent_test_coverage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"zero_clippy_warnings"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rollback_strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git_reset_hard_on_failure"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup cost me zero dollars in API fees. It did cost me about 4 hours of initial configuration time. I spent most of that time fighting with context window limits. The agent kept forgetting variable names in files it hadn't touched in three turns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Week 1: The Honeymoon Phase
&lt;/h2&gt;

&lt;p&gt;The first three days were impressive. The agent caught 14 instances of unused imports and fixed 8 minor clippy warnings. These are low-hanging fruit. Any linter can do this. But the agent also grouped them into logical commits and wrote decent commit messages.&lt;/p&gt;

&lt;p&gt;On day four, it attempted its first real refactor. It identified a complex match statement in the authentication module that was nested six levels deep. This is classic "arrow code."&lt;/p&gt;

&lt;p&gt;The agent proposed flattening it using early returns and helper functions. I reviewed the PR. The logic was sound. The tests passed. I merged it.&lt;/p&gt;

&lt;p&gt;I felt smart. I felt like I had hacked the system. I was saving hours of mental energy by offloading the boring work. I estimated I saved about 3 hours that week.&lt;/p&gt;

&lt;p&gt;Then day five hit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Week 2: The Context Collapse
&lt;/h2&gt;

&lt;p&gt;The agent started getting confident. Too confident.&lt;/p&gt;

&lt;p&gt;It began modifying error handling patterns across multiple modules. In Rust, error propagation is specific. You cannot just swap &lt;code&gt;Result&amp;lt;T, E&amp;gt;&lt;/code&gt; types without checking every caller. The agent missed two callers in a different crate.&lt;/p&gt;

&lt;p&gt;The CI pipeline failed. Not once, but twelve times in a row.&lt;/p&gt;

&lt;p&gt;I watched the logs as the agent tried to "fix" the build. It added more code to patch the errors. It did not understand the root cause. It was treating symptoms, not the disease. Each fix introduced two new bugs.&lt;/p&gt;

&lt;p&gt;By day eight, I had 15 open PRs. Twelve were broken. Three were questionable. I spent 6 hours reviewing code that was worse than when I started.&lt;/p&gt;

&lt;p&gt;The local model struggled with long-range dependencies. It could not hold the entire project graph in its context window. When it changed a struct in &lt;code&gt;models.rs&lt;/code&gt;, it forgot how that struct was serialized in &lt;code&gt;api_handlers.rs&lt;/code&gt; three folders away.&lt;/p&gt;

&lt;p&gt;I had to intervene. I paused the agent. I manually fixed the breakage. Then I tightened the constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Don't Lie
&lt;/h2&gt;

&lt;p&gt;At the end of 14 days, I crunched the data. I compared the state of the repo before and after the experiment.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before Experiment&lt;/th&gt;
&lt;th&gt;After Experiment&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total Lines of Code&lt;/td&gt;
&lt;td&gt;12,450&lt;/td&gt;
&lt;td&gt;12,890&lt;/td&gt;
&lt;td&gt;+440&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clippy Warnings&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;-39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test Coverage&lt;/td&gt;
&lt;td&gt;88%&lt;/td&gt;
&lt;td&gt;87.5%&lt;/td&gt;
&lt;td&gt;-0.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cyclomatic Complexity&lt;/td&gt;
&lt;td&gt;14.2 (avg)&lt;/td&gt;
&lt;td&gt;12.1 (avg)&lt;/td&gt;
&lt;td&gt;-2.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human Review Hours&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;+18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bugs Introduced&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;+7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The reduction in cyclomatic complexity looks good on paper. The code is technically "cleaner" in terms of nesting. But look at the other columns.&lt;/p&gt;

&lt;p&gt;The line count went up. The agent loves verbose variable names and extra helper functions. It adds boilerplate to explain its own logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test coverage dropped slightly. The agent wrote new tests for
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>experiment</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Automated My PR Reviews With AI — Saved 6 Hours/Week (Full Setup)</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Wed, 20 May 2026 06:07:20 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/i-automated-my-pr-reviews-with-ai-saved-6-hoursweek-full-setup-48c6</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/i-automated-my-pr-reviews-with-ai-saved-6-hoursweek-full-setup-48c6</guid>
      <description>&lt;p&gt;I used to hate reviewing pull requests. Not the code itself, but the repetitive nitpicking. Checking for consistent variable naming. Verifying error handling patterns. Making sure every new function had a JSDoc comment.&lt;/p&gt;

&lt;p&gt;It was boring work. It also took up about six hours of my week. That is time I could have spent building features or fixing actual bugs.&lt;/p&gt;

&lt;p&gt;In early 2026, the hype around AI agents finally settled into useful tools. We moved past the "chat with your codebase" phase. We entered the "agent acts on your behalf" phase.&lt;/p&gt;

&lt;p&gt;I decided to test if an AI agent could handle the mundane parts of my code reviews. I wanted it to catch style issues, missing tests, and documentation gaps. I did not want it to judge architecture or logic. That is still a human job.&lt;/p&gt;

&lt;p&gt;The result was surprising. It did not replace me. But it cut my review time by 70%. Here is exactly how I set it up using open-source tools and a local LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Manual Reviews
&lt;/h2&gt;

&lt;p&gt;My team follows a strict convention. We use TypeScript. We enforce functional programming patterns where possible. We require unit tests for any new business logic.&lt;/p&gt;

&lt;p&gt;Humans are bad at consistency. I might miss a missing type definition on Tuesday because I am tired. On Thursday, I might catch it immediately. This inconsistency frustrates junior developers. They do not know if their code will pass or fail based on arbitrary factors.&lt;/p&gt;

&lt;p&gt;Linters help. ESLint and Prettier catch syntax errors. But they cannot check semantic quality. They cannot tell if a function name matches its implementation. They cannot verify if a new API endpoint has proper error logging.&lt;/p&gt;

&lt;p&gt;I needed a layer between the linter and my eyes. A filter that handles the checklist items. This lets me focus on the hard stuff. Does this algorithm scale? Is this security vulnerability real?&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Stack for 2026
&lt;/h2&gt;

&lt;p&gt;By 2026, running large language models locally is trivial on modern dev machines. I have a MacBook Pro with an M3 Max chip. It handles 70B parameter models comfortably for inference.&lt;/p&gt;

&lt;p&gt;I avoided closed APIs for two reasons. Cost and privacy. Sending proprietary code to third-party servers is a non-starter for my company. Local execution keeps everything in-house.&lt;/p&gt;

&lt;p&gt;I selected Ollama as the runtime. It is stable and easy to integrate. For the model, I chose Llama-3.3-70B-Instruct. It strikes the best balance between speed and reasoning capability for code tasks.&lt;/p&gt;

&lt;p&gt;For the orchestration layer, I wrote a simple Python script. It uses the GitHub API to fetch diff data. It sends the diff to the local LLM. It posts the results back as a PR comment.&lt;/p&gt;

&lt;p&gt;You could use LangChain or LlamaIndex. I found them overkill for this specific task. A direct HTTP request to the Ollama API is faster and easier to debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Implementation Details
&lt;/h2&gt;

&lt;p&gt;The core logic is straightforward. Fetch the diff. Prompt the model. Parse the response.&lt;/p&gt;

&lt;p&gt;The prompt engineering was the hardest part. Early versions were too chatty. They would praise my code or offer unsolicited architectural advice. I had to constrain the output strictly.&lt;/p&gt;

&lt;p&gt;I forced the model to output JSON. This makes parsing reliable. If the JSON is invalid, the script retries once. If it fails again, it posts a generic error message.&lt;/p&gt;

&lt;p&gt;Here is the system prompt I settled on after three weeks of tweaking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a senior code reviewer. Your job is to check for specific issues only.
Ignore architecture, design patterns, and business logic.

Check for:
1. Missing JSDoc comments on exported functions.
2. Inconsistent variable naming (camelCase vs snake_case).
3. Lack of error handling in async/await blocks.
4. Console.log statements left in production code.

Output format: JSON array of objects.
Each object must have:
- &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: string
- &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: number
- &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: string
- &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

If no issues are found, return an empty array [].
Do not include any text outside the JSON.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Python script runs as a GitHub Action. It triggers on &lt;code&gt;pull_request&lt;/code&gt; events. It only runs on diffs larger than 50 lines. Small changes do not need AI review. This saves compute resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling False Positives
&lt;/h2&gt;

&lt;p&gt;The first week was rough. The AI flagged valid code as errors. It hated our custom hook patterns. It thought our error boundary wrappers were redundant.&lt;/p&gt;

&lt;p&gt;I had to tune the temperature. I set it to 0.1. Code review needs determinism, not creativity. Higher temperatures led to hallucinated issues.&lt;/p&gt;

&lt;p&gt;I also added a "ignore list" feature. If the AI flags a pattern we use intentionally, I add it to the config. The script skips those files or patterns in future runs.&lt;/p&gt;

&lt;p&gt;This tuning process took about four hours. It was worth it. Now the false positive rate is under 5%. That is acceptable for a helper tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results After One Month
&lt;/h2&gt;

&lt;p&gt;I tracked my time manually for four weeks. Before automation, I spent an average of 90 minutes per day on PR reviews. Most of that was scanning for minor issues.&lt;/p&gt;

&lt;p&gt;After deployment, my daily review time dropped to 25 minutes. The AI catches the low-hanging fruit. I only step in when the AI reports nothing or flags a complex issue.&lt;/p&gt;

&lt;p&gt;Here is the breakdown of my weekly time savings:&lt;/p&gt;

&lt;h2&gt;
  
  
  | Task | Time Before (Hours) |
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>5 Mistakes I Made Building an AI Code Reviewer in 2026</title>
      <dc:creator>Hopkins Jesse</dc:creator>
      <pubDate>Wed, 20 May 2026 06:07:09 +0000</pubDate>
      <link>https://dev.to/hopkins_jesse_cdb68cfa22c/5-mistakes-i-made-building-an-ai-code-reviewer-in-2026-1h19</link>
      <guid>https://dev.to/hopkins_jesse_cdb68cfa22c/5-mistakes-i-made-building-an-ai-code-reviewer-in-2026-1h19</guid>
      <description>&lt;p&gt;I spent three months building "ReviewBot," an autonomous agent that critiques pull requests.&lt;/p&gt;

&lt;p&gt;The goal was simple. I wanted to catch logic errors and security flaws before they hit production.&lt;/p&gt;

&lt;p&gt;By January 2026, the hype around autonomous coding agents had cooled significantly. Companies were no longer impressed by demo videos. They wanted metrics. They wanted ROI.&lt;/p&gt;

&lt;p&gt;I thought I had the perfect product. I was wrong.&lt;/p&gt;

&lt;p&gt;My launch on Product Hunt resulted in 400 signups. By March, only 12 remained active.&lt;/p&gt;

&lt;p&gt;Here is exactly where I went wrong. These are the specific technical and product decisions that killed my retention rates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ignoring Context Window Costs
&lt;/h2&gt;

&lt;p&gt;In late 2025, context windows were cheap. Or so I thought.&lt;/p&gt;

&lt;p&gt;I architected ReviewBot to send the entire file history for every changed file. If a user modified &lt;code&gt;auth.ts&lt;/code&gt;, I sent the last 10 commits of that file to the LLM.&lt;/p&gt;

&lt;p&gt;I assumed this would give the AI better historical context. It did. It also bankrupted my margin.&lt;/p&gt;

&lt;p&gt;Let’s look at the math from my February billing cycle.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Active Users&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg PR Size&lt;/td&gt;
&lt;td&gt;12 files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per Review&lt;/td&gt;
&lt;td&gt;180,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per Review&lt;/td&gt;
&lt;td&gt;$0.90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly Revenue&lt;/td&gt;
&lt;td&gt;$450&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly API Cost&lt;/td&gt;
&lt;td&gt;$1,215&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I was losing $765 a month.&lt;/p&gt;

&lt;p&gt;The mistake was assuming that more context equals better quality. Most developers don’t need the last 10 commits. They need to know if the current change breaks the existing interface.&lt;/p&gt;

&lt;p&gt;I fixed this in v2 by implementing a semantic diff algorithm. Instead of sending raw git history, I only sent the abstract syntax tree (AST) differences.&lt;/p&gt;

&lt;p&gt;This reduced token usage by 85%. My costs dropped to $180 per month. Profitability returned overnight.&lt;/p&gt;

&lt;p&gt;If you are building an AI tool in 2026, treat tokens like memory in the 90s. Every byte counts. Do not send data the model does not strictly need to answer the prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Over-Engineering the Agent Loop
&lt;/h2&gt;

&lt;p&gt;I fell in love with the idea of a multi-agent system.&lt;/p&gt;

&lt;p&gt;I built a "Planner" agent, a "Coder" agent, and a "Critic" agent. They communicated via a shared message bus. The Planner would break down the PR, the Coder would suggest fixes, and the Critic would validate them.&lt;/p&gt;

&lt;p&gt;It looked elegant in my architecture diagrams. In practice, it was a latency nightmare.&lt;/p&gt;

&lt;p&gt;A simple review took 45 seconds.&lt;/p&gt;

&lt;p&gt;Developers hate waiting. When a developer pushes code, they want feedback in under five seconds. If it takes longer, they switch contexts. They check Slack. They get coffee. By the time ReviewBot finished, the developer had already moved on.&lt;/p&gt;

&lt;p&gt;I measured the drop-off rate based on response time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Under 5 seconds: 92% completion rate&lt;/li&gt;
&lt;li&gt;5-15 seconds: 60% completion rate&lt;/li&gt;
&lt;li&gt;Over 15 seconds: 12% completion rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My multi-agent setup averaged 45 seconds. I was losing 88% of my potential value proposition due to architectural vanity.&lt;/p&gt;

&lt;p&gt;I scrapped the multi-agent design. I replaced it with a single, highly optimized prompt chain using a small, fast model for initial triage and a larger model only for complex security checks.&lt;/p&gt;

&lt;p&gt;Response time dropped to 3.2 seconds. User satisfaction scores jumped from 2.1 to 4.8 out of 5.&lt;/p&gt;

&lt;p&gt;Stop building Rube Goldberg machines. Use the simplest architecture that solves the problem. In 2026, speed is a feature. Latency is a bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fighting the IDE Instead of Joining It
&lt;/h2&gt;

&lt;p&gt;I built ReviewBot as a standalone web dashboard.&lt;/p&gt;

&lt;p&gt;Users had to push their code to GitHub, wait for the webhook, and then log into my site to see the results.&lt;/p&gt;

&lt;p&gt;This workflow is friction personified.&lt;/p&gt;

&lt;p&gt;Developers live in their Integrated Development Environments (IDEs). They do not want to tab-switch to a browser to read comments. They want inline suggestions. They want red squiggly lines.&lt;/p&gt;

&lt;p&gt;I ignored this because building VS Code extensions felt hard. I thought the web interface was easier to maintain.&lt;/p&gt;

&lt;p&gt;I was wrong. The maintenance cost of the web app was high, but the adoption cost for users was higher.&lt;/p&gt;

&lt;p&gt;In March, I built a basic VS Code extension. It used the same backend API. The only difference was the presentation layer.&lt;/p&gt;

&lt;p&gt;Within two weeks, daily active users tripled.&lt;/p&gt;

&lt;p&gt;The extension allowed users to trigger a review with &lt;code&gt;Cmd+Shift+R&lt;/code&gt;. Results appeared directly in the editor gutter.&lt;/p&gt;

&lt;p&gt;Here is the snippet I used to register the command in the extension package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"commands"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reviewbot.analyze"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ReviewBot: Analyze Current File"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"keybindings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reviewbot.analyze"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ctrl+shift+r"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"mac"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cmd+shift+r"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"when"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"editorTextFocus"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This small change removed three steps from the user journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  If your AI tool requires a context switch, you will fail. Meet
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Further Reading&lt;/strong&gt;: I experiment with AI automation and open-source tools. Find more guides at &lt;a href="https://www.pistack.xyz" rel="noopener noreferrer"&gt;Pi Stack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>developer</category>
      <category>experience</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
