<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hefty</title>
    <description>The latest articles on DEV Community by hefty (@hefty_69a4c2d631c9dd70724).</description>
    <link>https://dev.to/hefty_69a4c2d631c9dd70724</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3686846%2Fd23c7b90-6e5c-4c63-a220-85df4d0e14fa.png</url>
      <title>DEV Community: hefty</title>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hefty_69a4c2d631c9dd70724"/>
    <language>en</language>
    <item>
      <title>The Coding Agent Wrapper Is the Product Now</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Fri, 12 Jun 2026 04:07:20 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/the-coding-agent-wrapper-is-the-product-now-1l43</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/the-coding-agent-wrapper-is-the-product-now-1l43</guid>
      <description>&lt;p&gt;The model is no longer the most interesting part of a coding agent setup.&lt;/p&gt;

&lt;p&gt;That sounds wrong if you only watch demos. The demo is always about the model. It reads the issue, writes the code, explains the diff, maybe even runs the tests. Clean screen recording. Nice ending. Everyone claps.&lt;/p&gt;

&lt;p&gt;Real projects are messier. The hard part is not getting an agent to produce code once. The hard part is making that work repeatable, inspectable, recoverable, and boring enough that a team can trust it on a Tuesday when nobody has patience for another magical workflow.&lt;/p&gt;

&lt;p&gt;That is why the wrapper around the agent is starting to matter more than the agent itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chat was the first interface. It is not the final one.
&lt;/h2&gt;

&lt;p&gt;The first wave of coding-agent adoption trained people to think in prompts. Ask better questions. Paste better context. Keep the thread alive. Remind the model what matters.&lt;/p&gt;

&lt;p&gt;That works for one-off work. It breaks down the moment the work becomes a loop.&lt;/p&gt;

&lt;p&gt;A real development loop has state. It has constraints. It has review. It has failure modes. It has permissions. It has awkward handoffs between issue trackers, repos, test runners, deployment gates, and humans who are already overloaded.&lt;/p&gt;

&lt;p&gt;Recent DEV.to discussions around agent orchestration and on-commit AI review point in that direction. The conversation is moving away from "can the model write code?" and toward "what path does the work travel through before anyone trusts it?"&lt;/p&gt;

&lt;p&gt;That is the right question.&lt;/p&gt;

&lt;p&gt;If the only interface is a chat box, every workflow becomes a memory game. The human has to remember which context was provided, which assumptions were made, which checks were real, and which parts were just fluent confidence.&lt;/p&gt;

&lt;p&gt;That is not automation. That is a faster way to create review debt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow layer has a few different shapes
&lt;/h2&gt;

&lt;p&gt;The interesting agent tools right now do not all look the same. Some are orchestration systems. Some are local runtimes. Some are reusable skill packages. Some are cloud queues where agents work in isolated environments.&lt;/p&gt;

&lt;p&gt;The common thread is that they move value out of the prompt and into the operating layer around the model.&lt;/p&gt;

&lt;p&gt;A project like &lt;code&gt;last30days-skill&lt;/code&gt; is a good example. The useful thing is not that an agent can summarize recent discussion. The useful thing is that the research procedure is packaged: sources, search surfaces, scoring habits, and repeatable steps. That turns a messy browser habit into something closer to a dependency.&lt;/p&gt;

&lt;p&gt;Goose points at a different piece of the same problem. A local/open agent runner gives teams a place to think about provider choice, extensions, CLI usage, desktop workflows, and where the agent actually runs. That matters because agent workflows touch real files, real credentials, and real repos. Runtime control is not a philosophical preference once the workflow becomes part of how work ships.&lt;/p&gt;

&lt;p&gt;Then there are cloud work-queue products like Replicas, and gate-focused wrappers like Stagent. I would not treat Product Hunt pages as proof that a category has won. But they are useful signals. Builders are trying to solve the same thing from different angles: how do you give agents long-running tasks without losing track of what happened?&lt;/p&gt;

&lt;p&gt;That is the product surface now.&lt;/p&gt;

&lt;p&gt;Not "the model can code."&lt;/p&gt;

&lt;p&gt;"The loop can survive contact with the repo."&lt;/p&gt;

&lt;h2&gt;
  
  
  More output is not leverage by default
&lt;/h2&gt;

&lt;p&gt;There is a quiet trap in coding-agent adoption: people assume output speed converts directly into productivity.&lt;/p&gt;

&lt;p&gt;It does not.&lt;/p&gt;

&lt;p&gt;More output can make a team slower if the review surface is bad. It can bury maintainers in plausible diffs. It can produce patches that pass the shallow check and miss the actual cause. It can turn every task into a forensic exercise: what did the agent see, why did it choose this, did it run the right tests, what did it ignore?&lt;/p&gt;

&lt;p&gt;HN and Reddit discussions around agent productivity keep circling this problem. The sentiment is not just "agents are good" or "agents are bad." It is more annoying than that. Agents can be useful, but the coordination cost is real. The human still has to absorb the work.&lt;/p&gt;

&lt;p&gt;That is why gates matter.&lt;/p&gt;

&lt;p&gt;Not fake gates. Not "the agent says it reviewed itself." Real gates.&lt;/p&gt;

&lt;p&gt;Tests that actually cover the changed behavior. Diffs a human can scan. Logs that show what ran. Scope limits. Permission boundaries. A way to resume or retry without starting from scratch. A changelog when the workflow expects one. A place where the agent's assumptions are written down instead of hidden inside a chat transcript.&lt;/p&gt;

&lt;p&gt;The wrapper is where those gates live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills are workflow dependencies, not prompt decorations
&lt;/h2&gt;

&lt;p&gt;Reusable skills are especially easy to underestimate because they look harmless.&lt;/p&gt;

&lt;p&gt;Instructions in a folder. Maybe a script. Maybe examples. Maybe a reference doc. Nothing dramatic.&lt;/p&gt;

&lt;p&gt;But once an agent starts loading those files during real work, the skill becomes part of the build process in the broadest sense. It shapes what the agent reads, what it ignores, what commands it prefers, what it considers done, and how it explains failure.&lt;/p&gt;

&lt;p&gt;That deserves the same seriousness we already apply to code dependencies.&lt;/p&gt;

&lt;p&gt;I would ask boring questions before trusting a skill in a production-adjacent workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who wrote it?&lt;/li&gt;
&lt;li&gt;What files can it read or write?&lt;/li&gt;
&lt;li&gt;Does it call external tools?&lt;/li&gt;
&lt;li&gt;Does it encode old assumptions about the repo?&lt;/li&gt;
&lt;li&gt;Does it make review easier, or just make the agent sound more confident?&lt;/li&gt;
&lt;li&gt;Can another developer inspect it without reverse-engineering a whole chat history?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The boring questions are the good ones. They are how you keep "agent productivity" from becoming "mysterious process that sometimes edits our repo."&lt;/p&gt;

&lt;h2&gt;
  
  
  Local runners and cloud queues solve different problems
&lt;/h2&gt;

&lt;p&gt;There is no single correct wrapper shape.&lt;/p&gt;

&lt;p&gt;A local runner can be the right answer when control matters most. You get closer to the repo. You can reason about local files, local commands, provider choice, and extension surfaces. That is appealing for teams that do not want every workflow trapped inside one vendor's memory system.&lt;/p&gt;

&lt;p&gt;A cloud queue can be the right answer when delegation and review matter more. Trigger from GitHub, Linear, or Slack. Run the agent somewhere isolated. Come back to a branch, a diff, or a task artifact. That can be cleaner than asking every developer to nurse a terminal session all afternoon.&lt;/p&gt;

&lt;p&gt;The mistake is treating these as aesthetic choices.&lt;/p&gt;

&lt;p&gt;They are architecture choices.&lt;/p&gt;

&lt;p&gt;Where the agent runs changes what it can see. What it can see changes what it can break. What it can break changes what you need to log, gate, and review.&lt;/p&gt;

&lt;p&gt;If the wrapper does not make those tradeoffs visible, it is not doing enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical checklist for choosing an agent workflow
&lt;/h2&gt;

&lt;p&gt;The best way to evaluate an agent tool is to ignore the demo for a minute.&lt;/p&gt;

&lt;p&gt;Ask what loop it creates.&lt;/p&gt;

&lt;p&gt;Can the workflow explain where its context came from? If the agent used docs, issues, previous runs, or repo-specific rules, can a reviewer see that trail?&lt;/p&gt;

&lt;p&gt;Does state survive between steps in a controlled way? Persistent memory is useful when it is inspectable. It is dangerous when nobody knows what the agent thinks it remembers.&lt;/p&gt;

&lt;p&gt;Are the gates hard or decorative? A passing test suite is useful. A self-written claim that "all tests pass" is not the same thing.&lt;/p&gt;

&lt;p&gt;Can the workflow switch models or providers? Maybe you do not need that today. You will care the first time pricing, limits, policy, or quality changes under you.&lt;/p&gt;

&lt;p&gt;What happens when the agent gets stuck? A good workflow should fail visibly. It should leave enough context for a human to resume. Silent failure and confident partial work are the expensive cases.&lt;/p&gt;

&lt;p&gt;Who owns the final merge? This is the line teams should be honest about. If a human owns it, design the workflow around human review. If the agent owns it, the gates need to be much stricter than most teams are ready for.&lt;/p&gt;

&lt;p&gt;None of this is as flashy as a model generating a feature from a vague prompt.&lt;/p&gt;

&lt;p&gt;It is much closer to the work that decides whether agents become part of normal engineering practice or stay trapped in impressive demos.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust the loop, not the demo
&lt;/h2&gt;

&lt;p&gt;The next phase of coding agents will not be won by the cleanest chat transcript.&lt;/p&gt;

&lt;p&gt;It will be won by the systems that make agent work legible: runtimes, skills, queues, gates, permissions, source trails, and review surfaces. Some of those systems will look boring. Good. Boring is underrated when software has to ship.&lt;/p&gt;

&lt;p&gt;I am still skeptical of wrapper hype. A bad wrapper can hide the same old model problems behind a prettier dashboard.&lt;/p&gt;

&lt;p&gt;But the direction is correct. The model is only one part of the work now. The real question is whether the workflow around it can carry responsibility.&lt;/p&gt;

&lt;p&gt;Trust the loop, not the demo.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/soytuber/agent-orchestration-workflow-automation-dynamic-workflows-robust-agent-patterns-and-on-commit-2ceb"&gt;Agent Orchestration &amp;amp; Workflow Automation: Dynamic Workflows, Robust Agent Patterns, and On-Commit AI Code Review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/mvanhorn/last30days-skill" rel="noopener noreferrer"&gt;mvanhorn/last30days-skill&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aaif-goose/goose" rel="noopener noreferrer"&gt;aaif-goose/goose&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/item?id=47467922" rel="noopener noreferrer"&gt;Claude Code and the Great Productivity Panic of 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/ClaudeWorkflows/comments/1u115k1/workflow_autonomous_claude_code_loop_for_full/" rel="noopener noreferrer"&gt;Workflow: Autonomous Claude Code Loop for Full Software Development Lifecycle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/products/replicas" rel="noopener noreferrer"&gt;Replicas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/products/stagent" rel="noopener noreferrer"&gt;Stagent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Parallel Coding Agents Only Work When the Handoffs Live in Files</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Thu, 11 Jun 2026 04:22:37 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/parallel-coding-agents-only-work-when-the-handoffs-live-in-files-3oa9</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/parallel-coding-agents-only-work-when-the-handoffs-live-in-files-3oa9</guid>
      <description>&lt;h2&gt;
  
  
  Most multi-agent demos optimize the wrong metric
&lt;/h2&gt;

&lt;p&gt;More agents is not a flex. It is a coordination bill.&lt;/p&gt;

&lt;p&gt;A lot of multi-agent demos still lead with the same number: how many workers ran at once. Four. Eight. A swarm. That is mostly theater if nobody can say what each worker owned, what it changed, and what still needs verification before merge.&lt;/p&gt;

&lt;p&gt;Parallelism only helps when intent survives the handoff. If the assignment evaporates when the chat window closes, you do not have a workflow. You have several agents improvising in parallel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chat history is not a coordination layer
&lt;/h2&gt;

&lt;p&gt;This is the first thing people get wrong.&lt;/p&gt;

&lt;p&gt;A big transcript can drag one session through one task. The moment work splits, chat memory stops being a system and starts being a liability. Missing assumptions multiply. Scope drifts. Two agents solve different versions of the same problem and both think they were clear.&lt;/p&gt;

&lt;p&gt;The fix is boring and effective: write the contract down.&lt;/p&gt;

&lt;p&gt;That contract does not need to be huge. It just needs to be real.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the worker is building&lt;/li&gt;
&lt;li&gt;what is out of scope&lt;/li&gt;
&lt;li&gt;which files or surfaces it owns&lt;/li&gt;
&lt;li&gt;what "done" means&lt;/li&gt;
&lt;li&gt;how the result will be checked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Put that in a spec, a task file, &lt;code&gt;AGENTS.md&lt;/code&gt;, a ticket brief, whatever fits your repo. Just do not pretend a long prompt is the same thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real speedup comes from separating roles
&lt;/h2&gt;

&lt;p&gt;Parallel workflows get better the moment planning, implementation, and verification stop sharing the same muddy context.&lt;/p&gt;

&lt;p&gt;One layer figures out the task and the boundaries. Another worker executes a narrow assignment. A later pass verifies. That separation is not process theater. It is how you stop every session from re-deciding the whole project from scratch.&lt;/p&gt;

&lt;p&gt;Files are the right handoff format because files survive session boundaries. They can be reviewed. They can be updated mid-run. They do not depend on someone remembering what paragraph 34 of a transcript said two hours ago.&lt;/p&gt;

&lt;p&gt;That is the actual leverage. Not more chatter. Cleaner state transfer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolation matters more than swarm size
&lt;/h2&gt;

&lt;p&gt;Most coordination failures are not model failures. They are boundary failures.&lt;/p&gt;

&lt;p&gt;Parallel workers need narrow ownership, smaller tool surfaces, fresh context, and isolated places to operate when possible. Sandboxes help. Separate worktrees help. Curated tools help. Smaller ownership slices definitely help.&lt;/p&gt;

&lt;p&gt;Skip that part and "more parallelism" usually means "larger blast radius."&lt;/p&gt;

&lt;p&gt;This is why so many multi-agent setups feel impressive in a demo and exhausting in a real repo. Coordination cost rises faster than people expect. Past a certain point, extra workers mostly generate extra merge risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Messaging is part of the system
&lt;/h2&gt;

&lt;p&gt;Once agents can keep working asynchronously, messaging stops being cleanup. It becomes infrastructure.&lt;/p&gt;

&lt;p&gt;Priorities change. A reviewer spots a bad assumption. Another task finishes early and frees up capacity. Someone needs to redirect a running worker without tearing the whole flow down.&lt;/p&gt;

&lt;p&gt;That only works if the communication lane has rules.&lt;/p&gt;

&lt;p&gt;Who can send the message? Which sessions accept outside input? What kinds of interruption are allowed? When is it worth paying the cost of context switching a worker mid-run?&lt;/p&gt;

&lt;p&gt;If you do not answer those questions, mid-run steering becomes random interference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification is where fake parallelism gets exposed
&lt;/h2&gt;

&lt;p&gt;This is the step people keep trying to compress into vibes.&lt;/p&gt;

&lt;p&gt;"The agents finished" is not a quality signal. It means output exists. That is all.&lt;/p&gt;

&lt;p&gt;Real parallel workflows make verification explicit. Somebody checks the result. Somebody confirms the contract was met. Somebody makes sure the changes still belong together and did not quietly widen scope on the way to the branch.&lt;/p&gt;

&lt;p&gt;I would take fewer workers and one honest verification lane over a bigger swarm with no real review model.&lt;/p&gt;

&lt;p&gt;Because once implementation and verification collapse into the same vague gesture, the workflow starts lying to you. Everything looks fast. Nobody can say what is actually safe to merge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The coordination ceiling shows up early
&lt;/h2&gt;

&lt;p&gt;People like to imagine the ceiling is model intelligence or context length. Usually it is human synthesis.&lt;/p&gt;

&lt;p&gt;More workers mean more review load, more handoffs, more context switching, more chances for conflicting edits, and more places for intent to degrade. At some point the bottleneck is simple: can a human still recover the plot?&lt;/p&gt;

&lt;p&gt;That is the number worth optimizing for. Not the maximum agent count. The maximum number of parallel changes a team can still explain, review, and merge cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Parallel coding is a workflow design problem before it is a model problem.&lt;/p&gt;

&lt;p&gt;Specs. &lt;code&gt;AGENTS.md&lt;/code&gt;-style instructions. Checkpoints. Isolated execution. Mid-run messaging. Dedicated verification.&lt;/p&gt;

&lt;p&gt;Those are not side quests around the real system. They are the real system.&lt;/p&gt;

&lt;p&gt;If the handoff is fuzzy, the parallelism is fake.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/googleai/vibe-coding-in-google-ai-studio-my-tips-to-prompt-better-and-create-amazing-apps-3kcp"&gt;Vibe-coding in Google AI Studio: my tips to prompt better and create amazing apps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/langchain-ai/open-swe/main/README.md" rel="noopener noreferrer"&gt;Open SWE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://schipper.ai/posts/parallel-coding-agents/" rel="noopener noreferrer"&gt;How I run 4-8 parallel coding agents with tmux and Markdown specs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/channels" rel="noopener noreferrer"&gt;Push events into a running session with channels&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/vibecoding/comments/1ryr12i/i_no_longer_know_more_than_47_of_my_apps_code/" rel="noopener noreferrer"&gt;I no longer know more than 47% of my app's code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Why Throwing More Agents At Your Code Won't Make You Faster</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Mon, 08 Jun 2026 10:16:59 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/why-throwing-more-agents-at-your-code-wont-make-you-faster-3m2o</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/why-throwing-more-agents-at-your-code-wont-make-you-faster-3m2o</guid>
      <description>&lt;p&gt;Everyone is trying to scale up their AI coding setups right now. The pitch is simple: if one coding agent makes you faster, why not run four or eight of them in parallel?&lt;/p&gt;

&lt;p&gt;The mistake people make is treating agent count as the main optimization metric. When you spin up a bunch of concurrent agents without a plan, you don't get faster feature delivery. You just multiply the blast radius and drown yourself in context switching.&lt;/p&gt;

&lt;p&gt;Multi-agent speed doesn't come from raw concurrency. It comes from explicit artifacts, strict boundaries, and moving the handoff out of the chat window.&lt;/p&gt;

&lt;h2&gt;
  
  
  The handoff has to live in files
&lt;/h2&gt;

&lt;p&gt;If your agents are coordinating by reading each other's chat memory, your system is fragile. &lt;/p&gt;

&lt;p&gt;Operators who actually run 4-8 parallel agents successfully do not rely on implicit context. They use explicit markdown specs, &lt;code&gt;AGENTS.md&lt;/code&gt; instructions, and milestone commits. The workflow is simple: a planner agent or human writes the specification to a file, and the worker agent reads that file in a fresh session. &lt;/p&gt;

&lt;p&gt;When the implementation plan lives in files, new worker sessions perform significantly better. You preserve intent, and you stop the agents from drifting off into unrelated refactors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolation is not optional
&lt;/h2&gt;

&lt;p&gt;Parallel agents need isolated execution environments. Frameworks like Open SWE are pushing this heavily for a reason. &lt;/p&gt;

&lt;p&gt;If four agents have full read/write access to the same worktree, they will step on each other. You need sandboxes, separate branches, or entirely fresh environments. Curating tools and enforcing permission boundaries matter far more than how many tools your agent has access to. &lt;/p&gt;

&lt;h2&gt;
  
  
  Verification is its own stage
&lt;/h2&gt;

&lt;p&gt;The interesting part of parallel agent workflows is what happens after the code is written. &lt;/p&gt;

&lt;p&gt;You cannot treat verification as an afterthought. It deserves its own explicit stage in the pipeline. Google AI Studio docs emphasize using checkpoints and structured stops to keep output from drifting. Independent operators are doing the same thing: separating the worker role from the verification role.&lt;/p&gt;

&lt;p&gt;When an agent finishes a task, it needs a way to signal it is done, and you need a way to review the work. This is why mid-run messaging and reactive channels are becoming critical. You need to push events into a session and pull status out without breaking the flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Adding more agents quickly increases your review cost. The coordination ceiling hits you long before you run out of compute.&lt;/p&gt;

&lt;p&gt;Stop trying to orchestrate complex internal agent logic through massive prompts. Separate your planner work from your implementation work. Force your agents to read from and write to explicit specs. If the handoff isn't tangible, you aren't actually running parallel agents - you are just managing chaos.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/googleai/vibe-coding-in-google-ai-studio-my-tips-to-prompt-better-and-create-amazing-apps-3kcp"&gt;Vibe-coding in Google AI Studio: my tips to prompt better and create amazing apps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/langchain-ai/open-swe/main/README.md" rel="noopener noreferrer"&gt;Open SWE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://schipper.ai/posts/parallel-coding-agents/" rel="noopener noreferrer"&gt;How I run 4-8 parallel coding agents with tmux and Markdown specs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/channels" rel="noopener noreferrer"&gt;Push events into a running session with channels&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>MCP Servers Are Not the Hard Part</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Fri, 05 Jun 2026 03:00:50 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/mcp-servers-are-not-the-hard-part-1e7d</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/mcp-servers-are-not-the-hard-part-1e7d</guid>
      <description>&lt;p&gt;The MCP demo is easy now.&lt;/p&gt;

&lt;p&gt;That is the part people keep underestimating. Once a protocol gets enough examples, SDKs, docs, and copy-pasteable server templates, the first win becomes cheap. Your agent calls a tool. It reads a resource. It pulls context from somewhere that used to require glue code. The screenshot looks great.&lt;/p&gt;

&lt;p&gt;Then the second server shows up.&lt;/p&gt;

&lt;p&gt;Then the fifth.&lt;/p&gt;

&lt;p&gt;Then someone gives an agent access to browser debugging, issue trackers, a database console, internal docs, shell commands, deployment scripts, or all of the above because "it needs context."&lt;/p&gt;

&lt;p&gt;That is where MCP stops being a connector story and becomes an operating model story. The hard question is not "can the agent call this?" The hard question is "who decided it was allowed to call this, with which credentials, under which conditions, and what receipt do we have afterward?"&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP made integration easier. That also made governance unavoidable.
&lt;/h2&gt;

&lt;p&gt;The official Model Context Protocol framing is clean: MCP standardizes how applications provide context and tools to models. Instead of every AI application inventing its own custom integration layer, you get a common client/server mental model. Hosts run clients. Servers expose capabilities. Models can work with tools, resources, prompts, and other structured context through a more predictable interface.&lt;/p&gt;

&lt;p&gt;That is a real improvement. Integration sprawl was already becoming gross.&lt;/p&gt;

&lt;p&gt;But standardization does not remove risk. It normalizes the surface area. A shared protocol makes it easier to add more tool access, and more tool access means more decisions that need to be explicit.&lt;/p&gt;

&lt;p&gt;The mistake is treating MCP servers like editor plugins.&lt;/p&gt;

&lt;p&gt;An editor plugin mostly extends a human's workspace. An MCP server can extend an agent's authority. Those are not the same thing. A tool call can mutate state. A resource can expose sensitive context. A prompt template can shape behavior in ways reviewers never see. A task flow can let work continue after the initial user intent has become fuzzy.&lt;/p&gt;

&lt;p&gt;If you collapse all of that into "the agent has MCP access," you have already lost the plot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Direct server sprawl breaks in boring ways first
&lt;/h2&gt;

&lt;p&gt;The early version of MCP adoption is usually direct wiring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;this agent connects to that server&lt;/li&gt;
&lt;li&gt;this local config points at that credential&lt;/li&gt;
&lt;li&gt;this prompt explains when to use the tool&lt;/li&gt;
&lt;li&gt;this developer remembers which server is safe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That works for a demo. It might even work for a solo workflow.&lt;/p&gt;

&lt;p&gt;It gets weird as soon as more people, more agents, or more environments are involved.&lt;/p&gt;

&lt;p&gt;Credential handling is the first smell. If secrets live in scattered local configs, copied setup docs, or prompt-adjacent instructions, you are depending on vibes and memory. OAuth refresh, secret rotation, and environment-specific access should not be reinvented per agent.&lt;/p&gt;

&lt;p&gt;Inventory is the next one. Most teams can tell you which production services they run. Fewer can immediately answer which agent can call which MCP server, which tools are write-capable, and which resources expose internal data.&lt;/p&gt;

&lt;p&gt;Then comes logging. A normal app integration leaves traces in service logs, API logs, database logs, or deploy logs. Agent tool calls need the same level of reviewability, but tuned for developer workflow: what did it read, what did it call, what changed, what failed, what did it skip, and which approval gate did it pass?&lt;/p&gt;

&lt;p&gt;Without that, MCP access becomes invisible infrastructure. Invisible infrastructure is where expensive mistakes go to hide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools, resources, prompts, and tasks are different permission surfaces
&lt;/h2&gt;

&lt;p&gt;One practical starting point: stop using "MCP access" as a single permission.&lt;/p&gt;

&lt;p&gt;Break it down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt; are active capability. They can search, inspect, create, update, delete, deploy, comment, file tickets, trigger workflows, or run commands. A read-only tool and a write-capable tool should not share the same approval path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt; are context. They might be harmless docs. They might also be customer data, internal strategy, private code, credentials by accident, logs, or sensitive operational state. "The model only read it" is not a serious security model when context can steer later actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts&lt;/strong&gt; are policy-shaped inputs. If a server exposes prompts, those prompts can become part of how the agent decides what to do. They deserve review like any other behavior-affecting artifact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task flows&lt;/strong&gt; add lifecycle. Once an agent can start, track, resume, or coordinate work, you need boundaries around duration, scope drift, and handoff state.&lt;/p&gt;

&lt;p&gt;This is why the gateway/control-plane discussion is more interesting than it first looks. The useful part is not the enterprise label. It is the recognition that auth, policy, observability, and tool discovery want a central place to live once the system grows past a few direct connections.&lt;/p&gt;

&lt;p&gt;Small teams do not need a giant governance program. They do need a boring list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which MCP servers exist&lt;/li&gt;
&lt;li&gt;which agents can reach them&lt;/li&gt;
&lt;li&gt;which capabilities are read-only&lt;/li&gt;
&lt;li&gt;which capabilities can mutate state&lt;/li&gt;
&lt;li&gt;which credentials are used&lt;/li&gt;
&lt;li&gt;which calls require approval&lt;/li&gt;
&lt;li&gt;where the receipts land&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That list is not bureaucracy. It is the minimum viable map.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Chrome DevTools example makes the tradeoff obvious
&lt;/h2&gt;

&lt;p&gt;Browser and DevTools access is one of the clearest examples because the value is not theoretical.&lt;/p&gt;

&lt;p&gt;A frontend agent that can inspect the page, read console output, watch network requests, test layout, and profile runtime behavior is much more useful than one staring at static files. Static code tells you what the app claims it should do. The browser tells you what it actually did.&lt;/p&gt;

&lt;p&gt;That is exactly why the boundary matters.&lt;/p&gt;

&lt;p&gt;Runtime visibility can expose auth state, API responses, cookies, tokens, user data, unpublished UI, feature flags, and debugging surfaces. It can also tempt teams into giving an agent broad "just inspect everything" authority because the workflow feels magical when it works.&lt;/p&gt;

&lt;p&gt;The right lesson is not "do not connect agents to DevTools." That would be throwing away the good part.&lt;/p&gt;

&lt;p&gt;The right lesson is: powerful runtime access needs scoped authority.&lt;/p&gt;

&lt;p&gt;For a frontend workflow, that might mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;allow console and network inspection by default&lt;/li&gt;
&lt;li&gt;require approval before replaying mutating requests&lt;/li&gt;
&lt;li&gt;separate local development targets from authenticated production sessions&lt;/li&gt;
&lt;li&gt;log inspected URLs and tool calls&lt;/li&gt;
&lt;li&gt;keep screenshots, traces, or summaries attached to the agent's final work&lt;/li&gt;
&lt;li&gt;make skipped checks visible instead of letting the agent sound confident by default&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a much healthier mental model. The agent gets enough access to be useful, but not so much that "debug the page" quietly turns into "operate the product."&lt;/p&gt;

&lt;h2&gt;
  
  
  Context has a budget, and so does authority
&lt;/h2&gt;

&lt;p&gt;The community arguments around AI coding-agent cost are easy to reduce to model pricing, but that misses part of the point. Context is not free just because the model window got bigger. Tool exposure is not free just because the call succeeded.&lt;/p&gt;

&lt;p&gt;Every extra server gives the agent another path to spend tokens, time, credentials, and trust.&lt;/p&gt;

&lt;p&gt;This is where governance and developer ergonomics meet. A good MCP setup should make the cheap path obvious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use the narrowest resource that answers the question&lt;/li&gt;
&lt;li&gt;prefer read-only inspection before mutation&lt;/li&gt;
&lt;li&gt;ask for approval at the point of risk, not at the start of the whole session&lt;/li&gt;
&lt;li&gt;summarize tool calls in a form a reviewer can scan&lt;/li&gt;
&lt;li&gt;preserve enough detail to debug a bad result later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last part matters. Developers do not want a compliance archive they will never read. They want receipts that help them review work.&lt;/p&gt;

&lt;p&gt;For an agent change, the useful receipt is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task it believed it was doing&lt;/li&gt;
&lt;li&gt;files or resources it inspected&lt;/li&gt;
&lt;li&gt;tools it called&lt;/li&gt;
&lt;li&gt;mutations it made&lt;/li&gt;
&lt;li&gt;checks it ran&lt;/li&gt;
&lt;li&gt;checks it skipped&lt;/li&gt;
&lt;li&gt;approvals it requested&lt;/li&gt;
&lt;li&gt;anything it considered out of scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apply the same idea to MCP calls. If an agent used a server to inspect browser state, query a repo, read internal docs, or trigger a workflow, the reviewer should not have to reconstruct that from a vague final message.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical control plane can start small
&lt;/h2&gt;

&lt;p&gt;You do not need to build the perfect MCP gateway before using MCP.&lt;/p&gt;

&lt;p&gt;You do need to avoid the trap where every new server becomes a permanent exception.&lt;/p&gt;

&lt;p&gt;Start with five rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Keep an inventory before you need one
&lt;/h2&gt;

&lt;p&gt;Write down every MCP server, what it exposes, who uses it, and what environment it touches. Keep it close to the repo or platform config, not buried in someone's notes.&lt;/p&gt;

&lt;p&gt;The inventory should distinguish tools, resources, prompts, and task flows. A server that only exposes public docs is not the same as a server that can operate a browser or mutate tickets.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Split read and write access early
&lt;/h2&gt;

&lt;p&gt;Read access is still access, but write access is where blast radius spikes. Separate them before the system becomes hard to change.&lt;/p&gt;

&lt;p&gt;For example, an agent may be allowed to inspect issues, logs, docs, and local app state freely. Creating issues, changing labels, posting comments, editing files, triggering deploys, or calling production APIs should go through a narrower path.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Move credentials out of prompts
&lt;/h2&gt;

&lt;p&gt;Prompts are a terrible place to manage secrets. So are random local snippets copied between machines.&lt;/p&gt;

&lt;p&gt;If more than one person or agent uses the workflow, credentials need lifecycle: scoped access, rotation, revocation, and environment separation. Even a simple wrapper that centralizes credential lookup is better than sprinkling secrets across agent configs.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Put approval gates where the blast radius changes
&lt;/h2&gt;

&lt;p&gt;Asking for approval once at the beginning of a session is lazy design.&lt;/p&gt;

&lt;p&gt;Approval should happen when the action changes category: read to write, local to remote, draft to publish, inspect to mutate, test to deploy. That makes the interruption meaningful. It also gives the reviewer a better question to answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Make receipts part of done
&lt;/h2&gt;

&lt;p&gt;If the agent cannot explain what it touched, the work is not done.&lt;/p&gt;

&lt;p&gt;This does not need to be fancy. A markdown note, JSONL log, PR comment, or final run summary can work. The format matters less than the habit: every meaningful tool-using agent run should leave behind enough evidence for another developer to review it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real MCP maturity test
&lt;/h2&gt;

&lt;p&gt;The mature MCP question is not "how many servers can we connect?"&lt;/p&gt;

&lt;p&gt;It is "how quickly can we tell what authority this agent has, what it used, and whether that authority matched the task?"&lt;/p&gt;

&lt;p&gt;That is the shift teams need to make. MCP is a good answer to integration chaos, but it also makes agent tool access easier to spread. If you treat that spread as plugin installation, you will eventually get a mess of invisible permissions, drifting credentials, and unreviewable actions.&lt;/p&gt;

&lt;p&gt;Treat it like production integration design instead.&lt;/p&gt;

&lt;p&gt;Inventory the surface. Separate read from write. Scope credentials. Gate risky actions. Leave receipts.&lt;/p&gt;

&lt;p&gt;The server is the easy part. The operating model is where the real engineering starts.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/monuminu/model-context-protocol-mcp-the-complete-developer-guide-to-building-production-grade-ai-agents-ah3"&gt;Model Context Protocol (MCP): The Complete Developer Guide to Building Production-Grade AI Agents in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/composiodev/what-is-an-mcp-gateway-and-why-do-enterprise-ai-teams-need-one-in-2026-1lie"&gt;What Is an MCP Gateway, and Why Do Enterprise AI Teams Need One in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;Introduction - Model Context Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/_46ea277e677b888e0cd13/chrome-devtools-mcp-googles-official-mcp-server-that-lets-ai-agents-drive-chrome-devtools-1m16"&gt;chrome-devtools-mcp: Google's Official MCP Server That Lets AI Agents Drive Chrome DevTools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/item?id=47976415" rel="noopener noreferrer"&gt;Uber torches 2026 AI budget on Claude Code in four months&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/AI_Agents/comments/1tkse0w/ai_coding_agent_output_verification_in_2026_read/" rel="noopener noreferrer"&gt;AI coding agent output verification in 2026: read the diff, vibe check it, merge&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
    </item>
    <item>
      <title>AI code review is a routing problem now</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Thu, 04 Jun 2026 04:12:57 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/ai-code-review-is-a-routing-problem-now-33g5</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/ai-code-review-is-a-routing-problem-now-33g5</guid>
      <description>&lt;p&gt;The weakest version of AI code review is also the easiest one to demo.&lt;/p&gt;

&lt;p&gt;Take a diff. Paste it into a model. Ask for a review. Get back a wall of comments that sound reasonable, half of which are too vague to act on and a few of which are just wrong enough to waste everyone's time.&lt;/p&gt;

&lt;p&gt;That is not a review system. That is a comment generator.&lt;/p&gt;

&lt;p&gt;The useful version looks much less magical. It looks like routing. It decides which parts of a change deserve attention, which reviewer should inspect them, what severity means, when a human needs to approve the result, and when the bot should stay quiet.&lt;/p&gt;

&lt;p&gt;That last part matters more than people admit. Developers do not ignore review bots because they hate automation. They ignore review bots because the bots train them to ignore noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  One prompt cannot be the review process
&lt;/h2&gt;

&lt;p&gt;The naive prompt fails because it has no operating contract.&lt;/p&gt;

&lt;p&gt;"Review this PR" sounds clear until you ask what kind of review you mean. Security? Data migration risk? Frontend accessibility? Dependency policy? Performance footguns? API compatibility? Test coverage? Abuse cases? Dead code? Naming?&lt;/p&gt;

&lt;p&gt;A senior engineer does not review every diff the same way. A small CSS cleanup and an auth change do not deserve the same review path. A generated test update and a billing migration should not get the same amount of model attention.&lt;/p&gt;

&lt;p&gt;So why do we keep asking one general-purpose model to behave like every reviewer at once?&lt;/p&gt;

&lt;p&gt;The better pattern is boring and obvious once you see it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;split review concerns into scoped reviewers&lt;/li&gt;
&lt;li&gt;give each reviewer explicit responsibilities and non-goals&lt;/li&gt;
&lt;li&gt;normalize findings into structured output&lt;/li&gt;
&lt;li&gt;classify severity before bothering a human&lt;/li&gt;
&lt;li&gt;route risky changes differently from cheap ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between "the model had thoughts" and "the system made a review decision."&lt;/p&gt;

&lt;h2&gt;
  
  
  Specialist reviewers need boundaries, not vibes
&lt;/h2&gt;

&lt;p&gt;Specialist agents are only useful when the specialization is real.&lt;/p&gt;

&lt;p&gt;An accessibility reviewer should know what it is allowed to flag and what it should leave alone. A security reviewer should not waste time bike-shedding component names. A migration reviewer should care about rollback paths, data shape, and blast radius. A dependency reviewer should understand lockfile churn, license policy, and transitive risk.&lt;/p&gt;

&lt;p&gt;The point is not to create a cute little panel of AI coworkers. The point is to reduce ambiguity.&lt;/p&gt;

&lt;p&gt;Every reviewer needs a job description tight enough that the coordinator can judge its output. If two agents can produce the same comment in different words, the system is probably not specialized enough. If a reviewer cannot say "no issue found" without apologizing for it, the system will drown the PR in filler.&lt;/p&gt;

&lt;p&gt;This is where AI review starts to feel like normal engineering again. You are defining interfaces. Inputs, outputs, ownership, failure behavior. The agent is just the worker behind one interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  The coordinator is where trust lives or dies
&lt;/h2&gt;

&lt;p&gt;Running five reviewers in parallel sounds powerful until all five return mediocre findings.&lt;/p&gt;

&lt;p&gt;Now the problem is worse. The developer does not have one noisy bot. They have a noisy bot committee.&lt;/p&gt;

&lt;p&gt;The coordinator layer is what makes the system survivable. It should dedupe findings, downgrade weak claims, merge overlapping concerns, and decide what deserves to block a PR. It should also preserve uncertainty instead of laundering every guess into confident review language.&lt;/p&gt;

&lt;p&gt;This is the part teams underestimate. The value of AI review is not the number of comments generated. It is the number of comments that a developer can act on without doing a forensic audit of the bot.&lt;/p&gt;

&lt;p&gt;Good review systems need structured findings. File path. Line range when possible. Category. Severity. Confidence. Suggested fix. Reasoning short enough to read. A clear distinction between "must fix" and "worth considering."&lt;/p&gt;

&lt;p&gt;Without that structure, the review becomes theater. The bot speaks. The human squints. Nobody knows whether the finding is policy, preference, or panic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Risk tiers beat blanket automation
&lt;/h2&gt;

&lt;p&gt;Not every diff deserves the same machinery.&lt;/p&gt;

&lt;p&gt;The practical move is to define risk tiers before the agent gets involved.&lt;/p&gt;

&lt;p&gt;Low-risk changes might get cheap lint-style checks and a quick scan. Medium-risk changes might trigger targeted reviewers. High-risk changes should bring in stricter gates, human approval, audit logs, and maybe a rule that the bot can suggest but not approve.&lt;/p&gt;

&lt;p&gt;This is especially true once agents touch production-adjacent tools. The failure mode is no longer just "bad code landed." It can be "the agent called the wrong internal API," "the agent had write access it did not need," or "the agent confidently touched a path with a bigger blast radius than the prompt implied."&lt;/p&gt;

&lt;p&gt;Human-in-the-loop is not a step backward. It is the control plane.&lt;/p&gt;

&lt;p&gt;The right question is not "can AI approve this by itself?" The right question is "what type of change is this, and what kind of approval makes sense for that risk?"&lt;/p&gt;

&lt;p&gt;That framing keeps small changes fast without pretending every change is small.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool filtering is part of code review
&lt;/h2&gt;

&lt;p&gt;There is another unglamorous piece here: what the agent can see and call.&lt;/p&gt;

&lt;p&gt;Tool sprawl quietly wrecks agent systems. Give a reviewer every MCP server, every repo tool, every internal command, and every document surface, and you have not made it smarter. You have made its decision space messier. You also made the permission story harder to explain.&lt;/p&gt;

&lt;p&gt;Review agents should get the smallest useful tool surface.&lt;/p&gt;

&lt;p&gt;If the task is frontend accessibility review, maybe it needs the diff, relevant component files, rendered output, and an accessibility checklist. It probably does not need deploy credentials. If the task is dependency review, it needs package metadata and policy context. It does not need broad write access to the repo.&lt;/p&gt;

&lt;p&gt;This also helps with cost. Context is not free. Parallel agents make that painfully obvious. A system that filters diffs, shares common context, and routes only the useful slice to each reviewer will be cheaper and easier to debug than one that dumps everything into every call.&lt;/p&gt;

&lt;p&gt;Filtering is not an optimization pass at the end. It is part of the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What small teams should steal
&lt;/h2&gt;

&lt;p&gt;Most teams do not need a Cloudflare-scale review platform. Copying the whole shape would be silly.&lt;/p&gt;

&lt;p&gt;But small teams can steal the important moves.&lt;/p&gt;

&lt;p&gt;Start with two or three review lanes instead of one generic bot. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;security and secrets&lt;/li&gt;
&lt;li&gt;risky migrations or data changes&lt;/li&gt;
&lt;li&gt;frontend behavior and accessibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Write down what each lane is allowed to comment on. Write down what should block a PR. Write down what should be informational. Then make the output structured enough that a human can scan it quickly.&lt;/p&gt;

&lt;p&gt;Add a simple risk router. File paths alone can get you surprisingly far at first. Changes under auth, billing, migrations, infrastructure, or permission-sensitive code can trigger stricter review. Docs and test-only changes can take a cheaper path.&lt;/p&gt;

&lt;p&gt;Keep an audit trail. Which reviewers ran? Which tools did they use? Which findings were dismissed? Which ones blocked? If the system cannot explain its own behavior, people will not trust it when the stakes go up.&lt;/p&gt;

&lt;p&gt;And please add an escape hatch. A broken AI review gate should be visible, logged, and bypassable by the right human. Otherwise you have not built quality infrastructure. You have built a new way for CI to hold a team hostage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic workflows make this urgent
&lt;/h2&gt;

&lt;p&gt;Parallel subagents are becoming normal product behavior. That is the direction the tools are moving: one agent plans, several agents investigate, another verifies, and the user sees the final result.&lt;/p&gt;

&lt;p&gt;That can be genuinely useful for review. Bug hunts, security checks, migration audits, and regression verification all benefit from bounded parallel work.&lt;/p&gt;

&lt;p&gt;But parallelism does not remove the need for judgment. It multiplies the need for it.&lt;/p&gt;

&lt;p&gt;More agents means more outputs to reconcile, more token spend to justify, more permission boundaries to define, and more failure modes to observe. If the orchestration layer is weak, parallel review just produces wrong answers faster and in more places.&lt;/p&gt;

&lt;p&gt;The winning setup is not "run as many agents as possible."&lt;/p&gt;

&lt;p&gt;The winning setup is "route the work so the right agents run, the wrong agents stay idle, and the human only sees findings that survived a real filter."&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;AI code review is not becoming a better comment box. It is becoming infrastructure.&lt;/p&gt;

&lt;p&gt;The teams that get value from it will treat review like a routed system with narrow workers, severity rules, permission boundaries, telemetry, and human approval where the blast radius is real.&lt;/p&gt;

&lt;p&gt;The teams that do not will keep pasting diffs into a model and wondering why developers stopped reading the output.&lt;/p&gt;

&lt;p&gt;The future of AI review is not more comments.&lt;/p&gt;

&lt;p&gt;It is knowing which comments are worth making.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.cloudflare.com/ai-code-review/" rel="noopener noreferrer"&gt;Orchestrating AI Code Review at scale&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://claude.com/blog/introducing-dynamic-workflows-in-claude-code" rel="noopener noreferrer"&gt;Introducing dynamic workflows in Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/AI_Agents/comments/1tid6nk/feels_like_people_are_giving_ai_agents_production/" rel="noopener noreferrer"&gt;feels like people are giving AI agents production access way too casually.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/topics/human-in-the-loop" rel="noopener noreferrer"&gt;human-in-the-loop GitHub topic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/products/shutup-mcp" rel="noopener noreferrer"&gt;shutup-mcp&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>codequality</category>
      <category>devops</category>
    </item>
    <item>
      <title>The next AI coding bottleneck is repo understanding</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Wed, 03 Jun 2026 03:36:55 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/the-next-ai-coding-bottleneck-is-repo-understanding-4ph3</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/the-next-ai-coding-bottleneck-is-repo-understanding-4ph3</guid>
      <description>&lt;p&gt;The least interesting thing an AI coding agent can do now is generate code.&lt;/p&gt;

&lt;p&gt;That sounds harsher than I mean it. Generation still matters. Better models still matter. Faster edits still matter. But if you have used these tools on a real codebase, not a demo repo with three files and no history, you already know where the pain moved.&lt;/p&gt;

&lt;p&gt;The bottleneck is not "can the model write a React component?"&lt;/p&gt;

&lt;p&gt;The bottleneck is "does the agent understand why this repo is weird?"&lt;/p&gt;

&lt;p&gt;Real repos are full of weirdness. Naming conventions nobody wrote down. Migration leftovers. Feature flags with political history. Tests that exist because of one brutal production incident. API boundaries that look accidental until you remove them and break billing. A hundred tiny facts that separate a useful change from a confident mess.&lt;/p&gt;

&lt;p&gt;Coding agents are getting much better at editing files. The next stack has to get better at making the system legible before the edit starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bigger context windows are not the same as understanding
&lt;/h2&gt;

&lt;p&gt;The lazy answer is to throw more context at the model.&lt;/p&gt;

&lt;p&gt;Give it the whole repo. Add the README. Add the docs. Add the last five tickets. Add the architecture decision records. Add the transcript from the previous session. Add the test output. Add the package lock, because why not.&lt;/p&gt;

&lt;p&gt;That works until it does not.&lt;/p&gt;

&lt;p&gt;A larger context window can hold more text. It does not automatically turn that text into a map. It does not know which files are architectural boundaries and which are incidental wrappers. It does not know that one directory is deprecated unless the repo says so clearly. It does not know that a scary-looking validation branch is protecting a partner integration from 2021.&lt;/p&gt;

&lt;p&gt;More context can even make the problem worse. You get the pleasant illusion that the agent has seen everything, while the useful signal is buried under raw file dumps and old notes.&lt;/p&gt;

&lt;p&gt;Repo understanding needs structure.&lt;/p&gt;

&lt;p&gt;That is why tools that turn codebases into graphs, domain maps, guided tours, semantic search surfaces, and diff-impact views feel like the right direction. The specific product does not matter as much as the pattern: parse the repo deterministically, summarize it deliberately, and create an artifact that both humans and agents can inspect.&lt;/p&gt;

&lt;p&gt;That last part matters. If the repo map is just hidden prompt fuel, it is another magic box. If it is a file, graph, guide, or generated artifact the team can review, refresh, and correct, it becomes part of the engineering system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The agent stack is becoming less chat-shaped
&lt;/h2&gt;

&lt;p&gt;The early coding-agent story was mostly about the model.&lt;/p&gt;

&lt;p&gt;Which one writes better code? Which one follows instructions? Which one can make a larger change without wandering off?&lt;/p&gt;

&lt;p&gt;That is still useful, but the center of gravity is moving. The serious work is now around the harness: skills, plugins, commands, connectors, permissions, model switching, quota visibility, tool execution, and workspace state.&lt;/p&gt;

&lt;p&gt;You can see this in newer terminal-agent workflows. The CLI is no longer just a textbox with a shell nearby. It is becoming an operating surface. It tracks context. It exposes commands. It switches models. It authenticates to services. It makes the developer think about the environment around the model instead of pretending the model is the whole product.&lt;/p&gt;

&lt;p&gt;The most useful agent behavior should not live in a perfect prompt someone has to remember to paste. It should live in durable team infrastructure.&lt;/p&gt;

&lt;p&gt;If your team has a migration rule, write it down where the agent can use it. If your repo has a testing ritual, make that ritual executable or at least explicit. If your frontend has design rules, stop hoping the model infers taste from screenshots. If your security review has non-negotiables, package them as instructions that can be inspected.&lt;/p&gt;

&lt;p&gt;Prompts are cheap. Installed behavior is where the leverage is.&lt;/p&gt;

&lt;p&gt;That is also why it needs review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills and plugins are repo context with consequences
&lt;/h2&gt;

&lt;p&gt;I like the direction of skill and plugin systems because they admit something developers already know: every team has local operating procedure.&lt;/p&gt;

&lt;p&gt;The model is generic. The work is not.&lt;/p&gt;

&lt;p&gt;One repo wants conservative dependency upgrades. Another wants aggressive refactors. One team prefers tiny PRs. Another wants complete vertical slices. One product treats accessibility as a release blocker. Another keeps it as a best-effort checklist, which is a separate problem, but still a real team behavior.&lt;/p&gt;

&lt;p&gt;When those preferences stay in chat, they disappear. When they become skills, plugins, commands, or repo-local guidance, they compound.&lt;/p&gt;

&lt;p&gt;That is the useful part.&lt;/p&gt;

&lt;p&gt;The risky part is the same sentence.&lt;/p&gt;

&lt;p&gt;They compound.&lt;/p&gt;

&lt;p&gt;A bad skill can turn into a bad habit that runs every time. A stale convention can keep steering new work months after the codebase changed. A plugin that wires in the wrong assumption can quietly shape dozens of sessions before anyone notices.&lt;/p&gt;

&lt;p&gt;So the review surface changes. We are not only reviewing generated code anymore. We are reviewing the installed behavior that produced the code.&lt;/p&gt;

&lt;p&gt;That means the boring questions become important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who owns this repo guide?&lt;/li&gt;
&lt;li&gt;When was this architecture map refreshed?&lt;/li&gt;
&lt;li&gt;Can the agent explain which rule it followed?&lt;/li&gt;
&lt;li&gt;Can the team diff changes to skills and workflows?&lt;/li&gt;
&lt;li&gt;Can stale context expire?&lt;/li&gt;
&lt;li&gt;Can a human correct the map without reverse-engineering a vector store?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where AI coding stops looking like autocomplete and starts looking like operations work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel agents make understanding an operations problem
&lt;/h2&gt;

&lt;p&gt;One agent misunderstanding a repo is annoying.&lt;/p&gt;

&lt;p&gt;Five agents misunderstanding the repo in parallel is a workflow incident.&lt;/p&gt;

&lt;p&gt;Parallel agent products are interesting because they expose the next layer of pain. Once agents can run at the same time, in separate workspaces, touching different branches, the human needs a control plane. What is running? What changed? Which session is still burning tokens? Which diff is ready? Which agent hit a permission boundary? Which local server is this thing using?&lt;/p&gt;

&lt;p&gt;The funny part: this problem is not really about AI.&lt;/p&gt;

&lt;p&gt;It is the same old software truth: concurrency creates coordination cost.&lt;/p&gt;

&lt;p&gt;Agents do not remove that cost. They move it. Sometimes they multiply it.&lt;/p&gt;

&lt;p&gt;Git isolation helps. Session dashboards help. Diff review helps. Notifications help. Passive visibility helps. But none of those replace understanding. They only become useful when the work units are grounded in a shared view of the repo.&lt;/p&gt;

&lt;p&gt;Otherwise the control plane becomes a prettier way to watch several agents produce plausible nonsense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The HN skepticism is the useful warning label
&lt;/h2&gt;

&lt;p&gt;There is a recurring argument in developer discussions that coding agents can replace large chunks of the framework stack. I understand the appeal. If an agent can generate the glue code, maybe you need fewer abstractions. Maybe you can write closer to the product. Maybe scaffolding becomes disposable.&lt;/p&gt;

&lt;p&gt;Maybe.&lt;/p&gt;

&lt;p&gt;But the skeptical side of that discussion is the part teams should keep pinned to the wall.&lt;/p&gt;

&lt;p&gt;Fast scaffolding is not the same as production engineering. Production systems have hidden constraints: data integrity, permissions, migrations, abuse cases, audit logs, rate limits, weird customers, broken integrations, and old decisions that still matter because money flows through them.&lt;/p&gt;

&lt;p&gt;An agent that does not understand those constraints is not freeing you from frameworks. It is just generating around the guardrails.&lt;/p&gt;

&lt;p&gt;That can feel amazing for the first 80 percent of a feature. Then the last 20 percent arrives with interest.&lt;/p&gt;

&lt;p&gt;This is why repo understanding is the multiplier. It helps the agent see the shape of the system before it starts optimizing for local plausibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would improve before adding another agent
&lt;/h2&gt;

&lt;p&gt;If a team asked me how to make coding agents more useful tomorrow, I would not start with a new model subscription.&lt;/p&gt;

&lt;p&gt;I would start with the repo surface.&lt;/p&gt;

&lt;p&gt;Write the missing map. Document the boundaries that keep getting violated. Turn tribal knowledge into plain files. Add a real "how to verify this area" note. Keep the commands current. Make the test strategy boring and visible. Put the dangerous directories and dead paths somewhere the agent can see them.&lt;/p&gt;

&lt;p&gt;Then I would look at the agent harness.&lt;/p&gt;

&lt;p&gt;Can it run in an isolated workspace? Can it show its plan before touching broad areas? Can it report what it changed without theatrical summaries? Can it surface token use and tool calls? Can it attach source context to claims? Can it stop when the repo map says an area is risky?&lt;/p&gt;

&lt;p&gt;None of this feels magical.&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;The impressive part of AI coding is already here. The missing part is the dull infrastructure that lets teams trust it for more than demos.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The next leap in AI coding will not come from agents typing faster.&lt;/p&gt;

&lt;p&gt;It will come from agents entering a repo with a usable map, a clear operating procedure, and a human who can supervise the work without reading every token of the conversation.&lt;/p&gt;

&lt;p&gt;That is less glamorous than "build the whole app from one prompt."&lt;/p&gt;

&lt;p&gt;It is also much closer to how real software gets changed.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Lum1104/Understand-Anything" rel="noopener noreferrer"&gt;Lum1104/Understand-Anything&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/knowledge-work-plugins" rel="noopener noreferrer"&gt;anthropics/knowledge-work-plugins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/gde/getting-started-with-antigravity-cli-183g"&gt;Getting Started with Antigravity CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/products/baton-2" rel="noopener noreferrer"&gt;Baton&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/products/agentpeek" rel="noopener noreferrer"&gt;AgentPeek&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/item?id=46923543" rel="noopener noreferrer"&gt;Coding agents have replaced every framework I used&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Token spend is the new architecture smell</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Mon, 01 Jun 2026 08:24:24 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/token-spend-is-the-new-architecture-smell-53f</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/token-spend-is-the-new-architecture-smell-53f</guid>
      <description>&lt;p&gt;The AI coding conversation is finally moving past the fun part.&lt;/p&gt;

&lt;p&gt;For a while, the question was simple: can the agent write the code? Now the better question is nastier: why did it need that many tokens to get there?&lt;/p&gt;

&lt;p&gt;That sounds like finance. It is not. It is architecture.&lt;/p&gt;

&lt;p&gt;When a coding agent burns through a mountain of context, retries, tool calls, and half-correct edits, the bill is only the easiest symptom to measure. The real problem is usually that the system gave the agent too much room to wander and too little structure to finish.&lt;/p&gt;

&lt;h2&gt;
  
  
  The budget is telling you where the workflow leaks
&lt;/h2&gt;

&lt;p&gt;The interesting part of the recent cost panic around AI coding agents is not the exact number on the invoice. Big companies spend absurd money on all kinds of engineering tools. Sometimes that is rational.&lt;/p&gt;

&lt;p&gt;The useful signal is the shape of the spend.&lt;/p&gt;

&lt;p&gt;If token usage rises because an agent is doing valuable work against a clear spec, fine. That is a cost center you can reason about. If token usage rises because agents keep rereading the repo, guessing at intent, patching around their own mistakes, and asking for more context every time they get stuck, that is not "AI is expensive."&lt;/p&gt;

&lt;p&gt;That is an unbounded loop.&lt;/p&gt;

&lt;p&gt;Developers already know this smell in other forms. A slow build tells you something about dependency structure. A flaky test suite tells you something about isolation. A runaway cloud bill tells you something about lifecycle control.&lt;/p&gt;

&lt;p&gt;Runaway token spend belongs in the same bucket.&lt;/p&gt;

&lt;p&gt;It means the agent workflow has no useful stopping rule, no tight task boundary, or no cheap way to decide whether the work is actually getting better.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 80% problem is where the tokens go to die
&lt;/h2&gt;

&lt;p&gt;The optimistic version of agentic coding is that the model handles the boring 80% and humans handle the interesting 20%.&lt;/p&gt;

&lt;p&gt;That is close enough to be dangerous.&lt;/p&gt;

&lt;p&gt;The first 80% is often cheap because it is mostly production. The agent can scaffold files, follow patterns, fill in obvious glue, and produce a plausible diff. The last 20% is expensive because it is comprehension. Does this match the product intent? Did it preserve the invariant? Did it quietly break the boring path nobody mentioned? Is the patch smaller than the problem, or did it just create a new surface area for review?&lt;/p&gt;

&lt;p&gt;This is where unguided agents start spending like a badly written query.&lt;/p&gt;

&lt;p&gt;They search again. They widen the context. They rewrite nearby code. They add abstractions because the current shape feels confusing. They run tests, fail, patch the failure, and keep going without ever proving that the original design was right.&lt;/p&gt;

&lt;p&gt;The model is not being malicious. It is doing what you asked, or what your workflow accidentally allowed.&lt;/p&gt;

&lt;p&gt;That is why token spend is a better diagnostic than people want to admit. It shows you where your instructions are vague, your repo boundaries are mushy, and your review process depends on vibes.&lt;/p&gt;

&lt;h2&gt;
  
  
  "More agents" makes the smell worse unless review scales too
&lt;/h2&gt;

&lt;p&gt;Parallel agents are useful. I use the pattern. One agent explores docs, another patches a narrow module, another checks a failure mode. Done well, it feels less like pair programming and more like running a small build system made of judgment.&lt;/p&gt;

&lt;p&gt;But parallelism does not magically create leverage. It multiplies whatever workflow you already have.&lt;/p&gt;

&lt;p&gt;If the task is underspecified, you now have three underspecified tasks. If the repo context is too broad, you now have three agents dragging different chunks of it into their windows. If the review surface is weak, you now have three diffs competing for human attention.&lt;/p&gt;

&lt;p&gt;This is why the tooling trend around diff reviewers, model comparison, skills, and operator stacks matters more than the demo videos. The useful layer is not "agent, but more autonomous." The useful layer is routing.&lt;/p&gt;

&lt;p&gt;What kind of task is this?&lt;/p&gt;

&lt;p&gt;Which context is allowed?&lt;/p&gt;

&lt;p&gt;What file boundary is owned?&lt;/p&gt;

&lt;p&gt;What evidence is required before the work is considered done?&lt;/p&gt;

&lt;p&gt;Which failures should stop the run instead of inviting another 40,000 tokens of improvisation?&lt;/p&gt;

&lt;p&gt;That is architecture. Boring, necessary architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context is an attack surface on your budget
&lt;/h2&gt;

&lt;p&gt;People talk about repo context as if more is always better. That makes sense for a chat demo. It is a bad default for production agent work.&lt;/p&gt;

&lt;p&gt;Every extra file can become a distraction. Every broad instruction can become a permission slip. Every fuzzy requirement can turn into another loop where the agent tries to infer the missing product decision from code shape, naming conventions, old tests, stale comments, and whatever happens to fit in the context window.&lt;/p&gt;

&lt;p&gt;The answer is not to starve the model. The answer is to treat context like a dependency.&lt;/p&gt;

&lt;p&gt;Give the agent the files it needs. Give it the contract it must preserve. Give it examples when examples are cheaper than prose. Put durable workflow rules in versioned skill files or repo docs instead of rewriting the same prompt every time. Make the happy path obvious and the off-ramp explicit.&lt;/p&gt;

&lt;p&gt;In other words: stop treating context as a giant bucket and start treating it as an interface.&lt;/p&gt;

&lt;p&gt;That applies outside code too. If an AI image workflow keeps burning generations just to fix a tiny artifact, the architectural move is often to split generation from cleanup. For Gemini image outputs, a narrow browser-side cleanup step like &lt;a href="https://geminiwatermarkcleaner.com/" rel="noopener noreferrer"&gt;Gemini Watermark Cleaner&lt;/a&gt; is the kind of bounded tool that keeps the model from being used as a hammer for every downstream problem.&lt;/p&gt;

&lt;p&gt;The same principle holds for coding agents. Use the expensive generative system where generation is actually the work. Push repeatable cleanup, validation, formatting, and review into smaller tools with clearer contracts.&lt;/p&gt;

&lt;h2&gt;
  
  
  A better agent workflow has budgets in the design
&lt;/h2&gt;

&lt;p&gt;The practical fix is not "use fewer tokens." That is like telling someone with a slow database to "query less."&lt;/p&gt;

&lt;p&gt;Better constraints beat guilt.&lt;/p&gt;

&lt;p&gt;Start with task shape. A good agent task should have a small ownership boundary, a known success condition, and a clear reason to stop. "Improve the auth flow" is a fog machine. "Update the password reset form to use the existing validation helper, keep API contracts unchanged, and add one regression test for expired tokens" is work.&lt;/p&gt;

&lt;p&gt;Then control context. If the agent needs five files, do not hand it the whole repo and hope it discovers wisdom. If it needs the design system, point to the actual component pattern. If it needs product intent, put that intent in a durable file that humans can review.&lt;/p&gt;

&lt;p&gt;Then make review native. A diff is not enough when agents can produce a lot of plausible code quickly. You want a short summary of what changed, what was intentionally not changed, which tests ran, and where the agent is uncertain. You also want the workflow to stop when the evidence is missing.&lt;/p&gt;

&lt;p&gt;Finally, track spend per task type. Token usage in isolation is noisy. Token usage by category is useful. Bug fix. Refactor. test repair. exploratory spike. dependency upgrade. If one category keeps getting expensive, it probably needs better instructions, smaller task slices, or a different tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  High token spend is not automatically bad
&lt;/h2&gt;

&lt;p&gt;There is an annoying caveat here: some expensive runs are worth it.&lt;/p&gt;

&lt;p&gt;A deep migration across a messy legacy codebase will cost more than a small bug fix. A security-sensitive change should spend more time reading and verifying. A long-running agent that produces a clean, reviewed, high-value patch may be cheap compared with the human time it saved.&lt;/p&gt;

&lt;p&gt;So no, the goal is not to worship a tiny token bill.&lt;/p&gt;

&lt;p&gt;The goal is to notice when spend is buying motion instead of progress.&lt;/p&gt;

&lt;p&gt;That distinction matters. Motion is the agent editing, retrying, summarizing, and expanding context. Progress is the system getting closer to a correct, reviewable, owned change.&lt;/p&gt;

&lt;p&gt;If you cannot tell the difference, your architecture is missing instrumentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real operator skill is saying no earlier
&lt;/h2&gt;

&lt;p&gt;The best agent operators are not the people who let models run forever. They are the people who know when to constrain the run before it starts.&lt;/p&gt;

&lt;p&gt;They write sharper specs. They split work into smaller units. They keep reusable instructions in files. They demand evidence. They do not let a model compensate for unclear product thinking by spending more tokens.&lt;/p&gt;

&lt;p&gt;That is the real shift behind all the "operator stack" chatter. The stack is not just Codex, Claude Code, routers, skills, diff viewers, and whatever launches next week. The stack is a way to make agent work reviewable.&lt;/p&gt;

&lt;p&gt;Once you see token spend that way, the invoice stops being a surprise and starts being a profiler.&lt;/p&gt;

&lt;p&gt;And if the profiler says your agent spent half the run wandering around your repo trying to understand what you meant, the fix is probably not a cheaper model.&lt;/p&gt;

&lt;p&gt;The fix is a better boundary.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/item?id=47976415" rel="noopener noreferrer"&gt;Uber torches 2026 AI budget on Claude Code in four months&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/max_quimby/tokenmaxxing-codex-claude-code-operator-stack-2026-318"&gt;Tokenmaxxing: Codex + Claude Code Operator Stack 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/vibecoding/comments/1t13g9h/the_80_problem_in_agentic_coding/" rel="noopener noreferrer"&gt;The 80% Problem in Agentic Coding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/products/kilocode" rel="noopener noreferrer"&gt;Kilo Code v7 for VS Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Your Repo Context Is an Attack Surface Now</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Wed, 27 May 2026 02:51:22 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/your-repo-context-is-an-attack-surface-now-4e3m</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/your-repo-context-is-an-attack-surface-now-4e3m</guid>
      <description>&lt;p&gt;The lazy version of AI coding security is "make sure the model does not write insecure code."&lt;/p&gt;

&lt;p&gt;That is not wrong. It is just too small.&lt;/p&gt;

&lt;p&gt;The more interesting problem is everything the agent reads before it writes code, plus everything it is allowed to run after it decides what to do. Your repo is no longer just a place where code lives. For an agentic coding tool, it is part of the input stream.&lt;/p&gt;

&lt;p&gt;That changes the security model.&lt;/p&gt;

&lt;p&gt;Old docs, stale examples, local instruction files, hidden project conventions, dependency scripts, shell hooks, webhooks, memories, delegated workers, and previous diffs can all become steering material. Some of that context is useful. Some of it is garbage. Some of it might be hostile.&lt;/p&gt;

&lt;p&gt;This is where the agent hype gets painfully normal. The risk is not magic. It is the same automation risk developers already know, moved closer to the editor and wrapped in a model that is very good at sounding confident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context is not background anymore
&lt;/h2&gt;

&lt;p&gt;Developers tend to treat repo context as neutral.&lt;/p&gt;

&lt;p&gt;The README is just the README. The old migration notes are just old migration notes. The examples in &lt;code&gt;docs/&lt;/code&gt; are just examples. The hook config is just a convenience thing someone added last quarter.&lt;/p&gt;

&lt;p&gt;An agent does not necessarily see that social context. It sees text, tools, paths, commands, and patterns. If a coding assistant uses project context to decide what "normal" looks like, then all of that material can affect the output.&lt;/p&gt;

&lt;p&gt;That does not mean every agent reads every git object, hidden file, or forgotten note. Overstating this makes the whole discussion worse. The real point is narrower and more useful: once a tool can use local context to shape behavior, local context becomes part of the trust boundary.&lt;/p&gt;

&lt;p&gt;That is a big shift for teams that have spent years treating docs and examples as low-risk clutter.&lt;/p&gt;

&lt;p&gt;Bad context can be boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;outdated setup instructions&lt;/li&gt;
&lt;li&gt;examples that use deprecated APIs&lt;/li&gt;
&lt;li&gt;old architecture notes that no longer match production&lt;/li&gt;
&lt;li&gt;test fixtures that encode unsafe assumptions&lt;/li&gt;
&lt;li&gt;copied snippets with weak security defaults&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad context can also be adversarial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt-injection-style instructions inside files the agent may read&lt;/li&gt;
&lt;li&gt;dependency scripts that run more than expected&lt;/li&gt;
&lt;li&gt;hook configuration that turns a local command into a larger execution path&lt;/li&gt;
&lt;li&gt;poisoned examples that nudge future changes toward unsafe patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Either way, the failure mode is the same. The agent builds on a premise you did not mean to endorse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks deserve the same suspicion as build scripts
&lt;/h2&gt;

&lt;p&gt;The fastest way to make an agent useful is to let it do things.&lt;/p&gt;

&lt;p&gt;Run the formatter. Execute tests. Search files. Open pull requests. Call project scripts. Trigger webhooks. Hand work to another agent. That is the good stuff. It is also where the blast radius starts.&lt;/p&gt;

&lt;p&gt;A hook system is not "just a productivity feature" once it can run commands in a real developer environment. It is automation. Treat it like automation.&lt;/p&gt;

&lt;p&gt;That means asking basic, unfashionable questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who can edit this hook?&lt;/li&gt;
&lt;li&gt;What command does it run?&lt;/li&gt;
&lt;li&gt;What environment variables can it see?&lt;/li&gt;
&lt;li&gt;Does it inherit developer credentials?&lt;/li&gt;
&lt;li&gt;Can a package install script affect it?&lt;/li&gt;
&lt;li&gt;Does it write outside the repo?&lt;/li&gt;
&lt;li&gt;Is there a log when it fires?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is new security wisdom. CI pipelines, build scripts, and dependency installers have lived in this world for years. The difference is that coding agents make local automation feel conversational and lightweight. That feeling is dangerous.&lt;/p&gt;

&lt;p&gt;If a compromised package, sloppy hook, or over-permissive token can turn a small agent action into a machine-level event, the model is not the only thing you need to audit. The surrounding workflow matters more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory and delegation widen the surface
&lt;/h2&gt;

&lt;p&gt;Agent platforms are moving away from single-turn chat. That is the right direction.&lt;/p&gt;

&lt;p&gt;Memory makes agents less repetitive. Outcome tracking makes them easier to steer. Delegation lets work split across specialized workers. Webhooks and visibility features make the system feel less like a magic text box and more like developer infrastructure.&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;But useful state is still state. Delegation is still delegation. A webhook is still an integration point.&lt;/p&gt;

&lt;p&gt;The mistake is treating these features as pure capability upgrades. They are also governance upgrades, whether the product UI says that out loud or not.&lt;/p&gt;

&lt;p&gt;Once an agent can remember project preferences, assign work, trigger external systems, and operate across a longer task loop, you need to care about the shape of that loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what memory is stored&lt;/li&gt;
&lt;li&gt;who can change it&lt;/li&gt;
&lt;li&gt;when it is used&lt;/li&gt;
&lt;li&gt;which agents can inherit it&lt;/li&gt;
&lt;li&gt;what tools delegated workers can call&lt;/li&gt;
&lt;li&gt;how results are reviewed before they land&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a call to panic. It is a call to stop pretending the agent is only a smarter autocomplete.&lt;/p&gt;

&lt;p&gt;Autocomplete suggests text. Agent workflows can accumulate assumptions, call tools, execute commands, and leave behind changes that future runs may trust. That is a different class of system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical controls are boring, which is good
&lt;/h2&gt;

&lt;p&gt;The right response is not "never use agents." That is unserious.&lt;/p&gt;

&lt;p&gt;The right response is to make the workflow less squishy. You want fewer ambient permissions, smaller scopes, cleaner context, and better records of what happened.&lt;/p&gt;

&lt;p&gt;Start with scope.&lt;/p&gt;

&lt;p&gt;Do not point an agent at the whole world when it only needs three files. Use narrow tasks. Use disposable worktrees when the change is risky. Keep unrelated diffs out of the working tree so the agent does not have to infer which mess is intentional.&lt;/p&gt;

&lt;p&gt;Then audit instructions.&lt;/p&gt;

&lt;p&gt;Read the files your agents are likely to treat as guidance: root docs, agent instruction files, coding standards, examples, old migration notes, and internal checklists. If they are stale, delete or fix them. If they are important, make them explicit. If they contain commands, treat those commands as part of the system.&lt;/p&gt;

&lt;p&gt;Then harden execution.&lt;/p&gt;

&lt;p&gt;Run risky work in a sandbox where possible. Keep credentials scoped. Avoid ambient secrets in the shell. Review hook configuration like you would review CI. Pin or at least inspect dependency behavior for workflows that agents can trigger. Make sure "run tests" does not secretly mean "run every script with every token available."&lt;/p&gt;

&lt;p&gt;Finally, demand visibility.&lt;/p&gt;

&lt;p&gt;You should be able to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the agent read?&lt;/li&gt;
&lt;li&gt;What tools did it call?&lt;/li&gt;
&lt;li&gt;What files did it change?&lt;/li&gt;
&lt;li&gt;What commands ran?&lt;/li&gt;
&lt;li&gt;What assumptions did it make?&lt;/li&gt;
&lt;li&gt;What still needs human review?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer is "I think it was fine," the workflow is not mature enough for high-blast-radius work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production use still needs human ownership
&lt;/h2&gt;

&lt;p&gt;Developer discussions around production AI coding keep circling the same point: people are using these tools for serious work, but they do not get to outsource judgment.&lt;/p&gt;

&lt;p&gt;That feels right.&lt;/p&gt;

&lt;p&gt;Agents can move fast through known terrain. They can scaffold, refactor, inspect, summarize, and wire things together. They can also follow the wrong context with perfect confidence. The person operating the system still owns architecture, credentials, review, test quality, and release decisions.&lt;/p&gt;

&lt;p&gt;This is the part teams should make explicit.&lt;/p&gt;

&lt;p&gt;If an agent opens a pull request, the review standard should not drop because the author is non-human. If an agent changes auth code, the security review should get stricter, not softer. If an agent edits scripts or hooks, treat that as infrastructure work. If an agent claims the tests pass, check which tests ran and what they prove.&lt;/p&gt;

&lt;p&gt;The best teams will not be the ones that ban agentic coding. They also will not be the ones that give every agent a permanent token and a heroic prompt.&lt;/p&gt;

&lt;p&gt;They will be the ones that turn agent work into a boring, inspectable development process.&lt;/p&gt;

&lt;h2&gt;
  
  
  A useful mental model
&lt;/h2&gt;

&lt;p&gt;Think about an AI coding agent as a junior developer with shell access, excellent typing speed, strange reading habits, and no social memory of why your repo looks the way it does.&lt;/p&gt;

&lt;p&gt;You would not hand that person production credentials on day one. You would not ask them to rewrite the deployment pipeline without review. You would not let them treat every old note in the repo as current policy. You would give them a small task, a clean context, limited permissions, and a review path.&lt;/p&gt;

&lt;p&gt;That model is not perfect, but it gets the posture right.&lt;/p&gt;

&lt;p&gt;The agent is not evil. The repo is not cursed. The problem is trust.&lt;/p&gt;

&lt;p&gt;Once repo context becomes model input and local automation becomes agent action, the boundary moves. Security has to move with it.&lt;/p&gt;

&lt;p&gt;The practical takeaway is simple: clean the context, narrow the scope, sandbox the execution, log the actions, and review the diff like it matters.&lt;/p&gt;

&lt;p&gt;Because it does.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/piiiico/git-history-as-an-attack-surface-22dh"&gt;Git History as an Attack Surface&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lord.technology/2026/05/02/claude-codes-hook-system-just-got-weaponised.html" rel="noopener noreferrer"&gt;Claude Code's hook system just got weaponised&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://claude.com/blog/new-in-claude-managed-agents" rel="noopener noreferrer"&gt;New in Claude Managed Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/alexmercedcoder/ai-weekly-free-web-tools-mcp-production-wins-trusted-compute-models-april-30-may-6-2026-325h"&gt;AI Weekly: Free Web Tools, MCP Production Wins, Trusted-Compute Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/ClaudeCode/comments/1t818g9/production_level_software_by_ai/" rel="noopener noreferrer"&gt;Production Level Software by AI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Agent memory is a review problem now</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Mon, 25 May 2026 04:15:42 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/agent-memory-is-a-review-problem-now-4c9h</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/agent-memory-is-a-review-problem-now-4c9h</guid>
      <description>&lt;p&gt;The boring take on agent memory is that coding agents forget too much.&lt;/p&gt;

&lt;p&gt;That is true, but it is also the least interesting part of the problem.&lt;/p&gt;

&lt;p&gt;The real issue starts when an agent stops treating context as temporary and starts turning it into durable state. A bad answer in chat is annoying. A bad memory is worse because it can quietly steer the next task, the next branch, and the next review. It becomes part of the engineering system without going through the engineering process.&lt;/p&gt;

&lt;p&gt;That is the part people keep underestimating.&lt;/p&gt;

&lt;p&gt;Persistent agent memory is useful. I want agents to remember project conventions, old decisions, failed approaches, deployment weirdness, naming rules, and the sharp edges that never make it into the README. Nobody wants to re-explain the same repo every morning.&lt;/p&gt;

&lt;p&gt;But "remember more" is not a strategy. It is a storage policy with good branding.&lt;/p&gt;

&lt;p&gt;If memory can change future behavior, it needs the same boring controls we already expect around code: ownership, review, provenance, correction, deletion, and a way to tell whether it still applies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory is state, not vibes
&lt;/h2&gt;

&lt;p&gt;A lot of agent-memory discussion still sounds like a context-window problem.&lt;/p&gt;

&lt;p&gt;The agent forgot what we decided last week. The agent lost the thread between sessions. The agent burned tokens rediscovering the same repo structure. Fine. Those are real annoyances.&lt;/p&gt;

&lt;p&gt;But once you give an agent durable memory, you are not only improving recall. You are creating state.&lt;/p&gt;

&lt;p&gt;That state can be wrong. It can be stale. It can be copied from a one-off workaround. It can encode a temporary migration rule as if it were a permanent architecture principle. It can remember a human's half-formed comment as a requirement. It can also leak across projects or tools if the memory layer is too eager.&lt;/p&gt;

&lt;p&gt;That is why I do not buy the idea that the winning memory system is simply the one with the biggest recall surface.&lt;/p&gt;

&lt;p&gt;The winner is probably the one teams can govern.&lt;/p&gt;

&lt;h2&gt;
  
  
  The recent agent stack is moving this way anyway
&lt;/h2&gt;

&lt;p&gt;The Hermes/OpenClaw discussion on DEV is interesting because the exact product fight is not the main point. The useful signal is the stack shape: persistent runtime, memory, skills, background work, sandboxed execution, and scoped tools.&lt;/p&gt;

&lt;p&gt;That is where coding agents are clearly heading. The chat box is becoming less important than the operating surface around it.&lt;/p&gt;

&lt;p&gt;The same pattern shows up in &lt;code&gt;agent-skills&lt;/code&gt;. That repo is not trying to be an agent memory database. It is more useful as a model for what durable agent knowledge should feel like: plain files, clear procedures, lifecycle steps, verification gates, and workflows that can travel across tools.&lt;/p&gt;

&lt;p&gt;That matters.&lt;/p&gt;

&lt;p&gt;An opaque memory blob says, "trust me, I remembered something." A skill or runbook says, "here is the procedure I am going to follow, and here is the file you can diff."&lt;/p&gt;

&lt;p&gt;I know which one I would rather review before letting an agent touch production code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval is not governance
&lt;/h2&gt;

&lt;p&gt;There is a tempting engineering answer here: better search.&lt;/p&gt;

&lt;p&gt;Use vector search. Add BM25. Fuse rankings. Store transcripts. Filter by time. Summarize sessions. Pull the most relevant memories into the prompt.&lt;/p&gt;

&lt;p&gt;All of that can help. The Reddit builder thread around local agent memory is a good example of the practical direction: local storage, hybrid retrieval, session management, tool-call history, and cross-agent compatibility. These are useful pieces.&lt;/p&gt;

&lt;p&gt;They still do not answer the important question.&lt;/p&gt;

&lt;p&gt;Search can find an old memory. It cannot decide whether that memory is true.&lt;/p&gt;

&lt;p&gt;It cannot tell you whether a previous workaround was temporary. It cannot know whether a team changed its deployment path last month. It cannot prove that a remembered instruction came from the maintainer instead of a random debugging session. It cannot decide that a rule should expire.&lt;/p&gt;

&lt;p&gt;That decision needs a lifecycle.&lt;/p&gt;

&lt;p&gt;Without one, memory becomes another form of prompt injection, except now the attacker might be your own past self at 2 a.m.&lt;/p&gt;

&lt;h2&gt;
  
  
  The useful checklist is not complicated
&lt;/h2&gt;

&lt;p&gt;If an agent memory layer wants to be trusted in real software work, I would start with six boring requirements.&lt;/p&gt;

&lt;p&gt;First, memory should be inspectable. A developer should be able to see what the agent believes about the project without spelunking through a vector database.&lt;/p&gt;

&lt;p&gt;Second, it should be correctable. If a memory is wrong, fixing it should be a normal edit, not a ritual.&lt;/p&gt;

&lt;p&gt;Third, it should have provenance. A memory that came from a merged architecture doc is different from one inferred from a temporary branch.&lt;/p&gt;

&lt;p&gt;Fourth, it should be portable enough that the team is not locked into one agent runtime forever.&lt;/p&gt;

&lt;p&gt;Fifth, it should be permissioned. Some memories belong to a project. Some belong to a user. Some should never be stored.&lt;/p&gt;

&lt;p&gt;Sixth, it should be pruneable. Old context has a half-life. Pretending it does not is how agents keep resurrecting dead decisions.&lt;/p&gt;

&lt;p&gt;None of this is glamorous. That is the point.&lt;/p&gt;

&lt;p&gt;The agent ecosystem keeps rediscovering that the valuable parts of software engineering are usually the parts that look least magical: diffs, reviews, tests, ownership, changelogs, and boring text files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Treat memory writes like changes
&lt;/h2&gt;

&lt;p&gt;The simplest model I can think of is this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The agent finishes a session.&lt;/li&gt;
&lt;li&gt;It proposes memory entries or skill updates.&lt;/li&gt;
&lt;li&gt;The human sees the proposed changes in plain language.&lt;/li&gt;
&lt;li&gt;The team accepts, edits, rejects, or scopes them.&lt;/li&gt;
&lt;li&gt;Periodic cleanup removes stale or dangerous entries.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is it.&lt;/p&gt;

&lt;p&gt;Not every memory needs a pull request ceremony. A solo developer can use a lighter flow. A regulated team might need stricter controls. The point is not bureaucracy. The point is that durable memory should not silently mutate the way future work gets done.&lt;/p&gt;

&lt;p&gt;This is especially important for skills.&lt;/p&gt;

&lt;p&gt;A skill is procedural memory. It says, "when this kind of task appears, do this." That is incredibly powerful. It is also a great way to institutionalize a bad habit if nobody reviews the procedure.&lt;/p&gt;

&lt;p&gt;The same thing that makes skills valuable makes them dangerous: they compound.&lt;/p&gt;

&lt;h2&gt;
  
  
  The category is real, but the bar should be higher
&lt;/h2&gt;

&lt;p&gt;AgentMemory and similar products are a sign that this category is no longer theoretical. Developers want cross-session continuity. They want fewer repeated explanations. They want local control, less token waste, and agents that can pick up where they left off.&lt;/p&gt;

&lt;p&gt;Good. That is the right demand.&lt;/p&gt;

&lt;p&gt;The pushback in the comments and community threads is also right. People are asking about stale memories, exports, secrets, conflicts, correction flows, and whether memory is just another hidden prompt layer with a nicer name.&lt;/p&gt;

&lt;p&gt;That skepticism is healthy.&lt;/p&gt;

&lt;p&gt;Agent memory will be one of those features that feels incredible in demos and painful in teams if the product surface is wrong. The demo version remembers the right thing at the right time. The production version needs to explain why it remembered that thing, where it came from, who can change it, and when it should stop being trusted.&lt;/p&gt;

&lt;p&gt;That is less exciting than "infinite memory."&lt;/p&gt;

&lt;p&gt;It is also the difference between a clever assistant and a tool you can safely build around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The next agent-memory fight should not be about who stores the most context.&lt;/p&gt;

&lt;p&gt;It should be about who makes memory reviewable.&lt;/p&gt;

&lt;p&gt;Because the moment memory becomes durable, it stops being a convenience feature. It becomes part of your engineering process. Treat it that way, or it will become the invisible teammate that nobody reviews and everyone slowly depends on.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/tahosin/hermes-just-killed-openclaw-heres-why-4c23"&gt;Hermes Just Killed OpenClaw (Here's Why)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/addyosmani/agent-skills" rel="noopener noreferrer"&gt;addyosmani/agent-skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/products/agent-memory-dev" rel="noopener noreferrer"&gt;AgentMemory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/aiagents/comments/1tcw5jd/ai_agents_dont_really_have_a_memory_problem_they/" rel="noopener noreferrer"&gt;AI agents do not really have a memory problem, they have a memory ownership problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/opencode/comments/1te9tzi/i_got_tired_of_my_ai_agent_forgetting_everything/" rel="noopener noreferrer"&gt;I got tired of my AI agent forgetting everything. So I built a memory system for it.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Stop Calling It Vibe Coding When You Need Engineering</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Mon, 18 May 2026 03:04:36 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/stop-calling-it-vibe-coding-when-you-need-engineering-1o80</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/stop-calling-it-vibe-coding-when-you-need-engineering-1o80</guid>
      <description>&lt;p&gt;The most useful thing about "vibe coding" is also the thing that makes it dangerous: it feels like progress before the system has earned your trust.&lt;/p&gt;

&lt;p&gt;You describe the app. The model writes a lot of code. The demo starts to move. For prototypes, that is magic. For production software, it is where the bill starts.&lt;/p&gt;

&lt;p&gt;The mistake is treating the first 70% as proof that the last 30% will be easy. It usually is not. The last 30% is where the vague requirements become edge cases, the generated architecture starts pushing back, and the missing tests stop being a detail.&lt;/p&gt;

&lt;p&gt;That is why the more interesting shift is not "AI writes code now." We already know that. The real shift is from vibe coding to agentic engineering: structured work where agents operate inside specs, tests, memory, review loops, and clear human control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe coding is great until the code matters
&lt;/h2&gt;

&lt;p&gt;Vibe coding works best when the cost of being wrong is low.&lt;/p&gt;

&lt;p&gt;Want to explore a UI idea? Fine. Want to build a throwaway internal script? Great. Want to generate a starter app so you can see the shape of a product? That is a legitimate use case.&lt;/p&gt;

&lt;p&gt;The problem starts when the prototype quietly becomes the foundation.&lt;/p&gt;

&lt;p&gt;AI-generated code can look more finished than it is. It may compile. It may even pass the happy-path click test. But production work has a different standard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can another developer understand the structure?&lt;/li&gt;
&lt;li&gt;Are the failure cases explicit?&lt;/li&gt;
&lt;li&gt;Do tests cover the behavior that matters?&lt;/li&gt;
&lt;li&gt;Is the architecture still sane after the fifth change request?&lt;/li&gt;
&lt;li&gt;Can you safely modify it next month?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the part vibe coding tends to hide. The model can generate volume faster than you can inspect intent. If the workflow is just "prompt, accept, prompt again," you are not removing engineering work. You are moving it downstream, where it is harder to see.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 70% problem is really a trust problem
&lt;/h2&gt;

&lt;p&gt;The "70% problem" is a good way to frame this: AI gets you impressively far, then the remaining work becomes weirdly expensive.&lt;/p&gt;

&lt;p&gt;That does not mean AI coding is bad. It means code generation is not the same as software delivery.&lt;/p&gt;

&lt;p&gt;The first 70% rewards speed. The last 30% rewards judgment. Those are different muscles.&lt;/p&gt;

&lt;p&gt;Early on, the agent can make broad moves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scaffold the app&lt;/li&gt;
&lt;li&gt;wire up common patterns&lt;/li&gt;
&lt;li&gt;generate boilerplate&lt;/li&gt;
&lt;li&gt;suggest APIs&lt;/li&gt;
&lt;li&gt;implement obvious flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Later, the work becomes less about typing and more about control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deciding what should not be abstracted&lt;/li&gt;
&lt;li&gt;catching incorrect assumptions&lt;/li&gt;
&lt;li&gt;tightening data boundaries&lt;/li&gt;
&lt;li&gt;deleting clever-but-useless code&lt;/li&gt;
&lt;li&gt;proving behavior with tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why serious AI coding workflows start to look less like chat and more like engineering operations. You need constraints. You need feedback. You need durable context. You need a way to say, "This is the contract. This is the test. This is the part you are allowed to change."&lt;/p&gt;

&lt;p&gt;Without that, the agent is just producing plausible text in a code-shaped format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic engineering changes the unit of work
&lt;/h2&gt;

&lt;p&gt;The useful unit is no longer a prompt. It is a task with context, acceptance criteria, tools, and review.&lt;/p&gt;

&lt;p&gt;That sounds less exciting than "build me an app," but it is the difference between a demo and a workflow you can keep using.&lt;/p&gt;

&lt;p&gt;Agentic engineering is the practice of making AI agents operate inside an engineering system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;specs before implementation&lt;/li&gt;
&lt;li&gt;tests before trust&lt;/li&gt;
&lt;li&gt;small scopes instead of giant rewrites&lt;/li&gt;
&lt;li&gt;file-based handoffs instead of chat memory guesses&lt;/li&gt;
&lt;li&gt;human review at the points where judgment matters&lt;/li&gt;
&lt;li&gt;repeatable skills for common work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where tools like Hermes Agent are worth watching. The interesting part is not that it is another chatbot interface. The project points toward a more operational model: agents with memory, custom skills, subagents, and deployment options that let them run as part of a workflow instead of sitting off to the side as a text box.&lt;/p&gt;

&lt;p&gt;That is a different posture. A coding assistant answers. An engineering agent should remember, delegate, run tools, adapt to local patterns, and leave artifacts that humans can audit.&lt;/p&gt;

&lt;p&gt;It still needs supervision. Maybe more supervision, not less. But the supervision moves from babysitting every line to designing the system the agent works inside.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel agents only help if the work is shaped correctly
&lt;/h2&gt;

&lt;p&gt;Once people see agents as workers instead of autocomplete, the next temptation is obvious: run more of them.&lt;/p&gt;

&lt;p&gt;That can help. It can also create a beautiful mess.&lt;/p&gt;

&lt;p&gt;Parallel agents are only useful when the work can be split cleanly. If three agents all edit the same files, disagree about architecture, and invent their own assumptions, you did not gain throughput. You created a merge conflict with confidence.&lt;/p&gt;

&lt;p&gt;The better pattern is boring and effective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one agent explores a specific question&lt;/li&gt;
&lt;li&gt;one agent owns a narrow implementation area&lt;/li&gt;
&lt;li&gt;one agent verifies behavior or checks risks&lt;/li&gt;
&lt;li&gt;all of them write results back to files&lt;/li&gt;
&lt;li&gt;a human or orchestrator integrates the output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why memory and custom skills matter. If every agent starts cold, you spend half the run re-explaining the codebase. If the agent can carry durable project knowledge and reusable workflows, it has a better shot at producing work that fits.&lt;/p&gt;

&lt;p&gt;The goal is not autonomy for its own sake. The goal is less repeated context loading, fewer sloppy handoffs, and faster movement through well-defined tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Programming for agents may need stricter boundaries
&lt;/h2&gt;

&lt;p&gt;The Weft project is another signal in the same direction: developers are starting to think about languages and runtimes where humans, LLMs, and infrastructure are all first-class parts of the system.&lt;/p&gt;

&lt;p&gt;That framing matters because agent work is not just "call an LLM and hope." Durable execution, explicit state, recoverable tasks, and clear boundaries become much more important when a model is allowed to act over time.&lt;/p&gt;

&lt;p&gt;This is where the hype often gets ahead of the engineering.&lt;/p&gt;

&lt;p&gt;Agents are not magically reliable because they can call tools. Tool access gives them more surface area to fail. The workflow has to make failure visible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what did the agent read?&lt;/li&gt;
&lt;li&gt;what did it change?&lt;/li&gt;
&lt;li&gt;what assumptions did it make?&lt;/li&gt;
&lt;li&gt;what tests did it run?&lt;/li&gt;
&lt;li&gt;what still needs human review?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer those questions, you do not have an agentic workflow. You have a longer prompt chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical upgrade path
&lt;/h2&gt;

&lt;p&gt;You do not need to throw away vibe coding. You need to stop using it as the whole process.&lt;/p&gt;

&lt;p&gt;A more production-friendly workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use vibe coding for exploration.&lt;/li&gt;
&lt;li&gt;Freeze the useful direction into a short spec.&lt;/li&gt;
&lt;li&gt;Break the work into small tasks with file ownership.&lt;/li&gt;
&lt;li&gt;Ask the agent to implement against the spec, not the vibe.&lt;/li&gt;
&lt;li&gt;Require tests, logs, or screenshots depending on the change.&lt;/li&gt;
&lt;li&gt;Review the diff like you would review a human teammate's work.&lt;/li&gt;
&lt;li&gt;Capture repeatable patterns as reusable project instructions or skills.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step is underrated. If you keep prompting the same rule every day, it belongs in the system, not in your short-term memory. Agents get more useful when the workflow teaches them how your project actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this still breaks down
&lt;/h2&gt;

&lt;p&gt;Agentic engineering is not a magic maturity badge.&lt;/p&gt;

&lt;p&gt;It can add overhead. It can produce too much process around simple work. It can create false confidence if the agent writes tests that merely confirm its own misunderstanding. It can also make teams lazy about architecture if they assume "the agent will fix it later."&lt;/p&gt;

&lt;p&gt;The rule of thumb is simple: match the process to the blast radius.&lt;/p&gt;

&lt;p&gt;For a prototype, vibe coding is fine. For a user-facing system, you need constraints. For critical paths, you need human review, meaningful tests, and boring operational discipline.&lt;/p&gt;

&lt;p&gt;The future of AI coding is probably not one giant prompt that builds the perfect app. It is smaller, sharper loops where agents do real work inside systems that make their output inspectable.&lt;/p&gt;

&lt;p&gt;That is less magical.&lt;/p&gt;

&lt;p&gt;It is also much closer to engineering.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Generative UI Is the New Responsive Design</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Mon, 20 Apr 2026 01:51:48 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/generative-ui-is-the-new-responsive-design-4pad</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/generative-ui-is-the-new-responsive-design-4pad</guid>
      <description>&lt;p&gt;Frontend developers spent the last decade getting very good at adapting layouts to different screens. That problem still matters. It just is not the interesting one anymore.&lt;/p&gt;

&lt;p&gt;The new problem is adapting the interface to intent.&lt;/p&gt;

&lt;p&gt;Generative UI is pushing a different shift into the open. Once an AI product does more than stream text, the old model of "one prompt box, one response area, maybe a sidebar" starts to look tiny. You need interfaces that can change shape based on structured output, tool results, and what the system thinks the user is actually trying to do.&lt;/p&gt;

&lt;p&gt;Responsive design taught us to stop hardcoding for one viewport. Generative UI is forcing us to stop hardcoding for one interaction path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Text streaming was a good first step. It is also the ceiling of a lot of AI apps.
&lt;/h2&gt;

&lt;p&gt;The easiest AI UI to build is still a chat box that streams tokens into a message bubble. That is why so many products stop there.&lt;/p&gt;

&lt;p&gt;The problem is that text is a weak container for action.&lt;/p&gt;

&lt;p&gt;If a model finds flights, summarizes tickets, chooses a chart type, or decides that the next useful step is a file picker instead of another paragraph, dumping all of that back into prose makes the interface worse. The model may have produced something structured, but the user receives it as a wall of words and a vague suggestion to click somewhere else.&lt;/p&gt;

&lt;p&gt;That is why generative UI matters. It lets the model return data that the frontend can turn into actual interface decisions instead of treating every answer like a markdown blob.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structured outputs are what make the idea real
&lt;/h2&gt;

&lt;p&gt;The Vercel AI SDK is a good example of where the stack is moving. The important part is not the branding. The important part is the model contract.&lt;/p&gt;

&lt;p&gt;When AI output is described with schemas instead of left as raw text, the UI gets something it can trust enough to render with purpose. A recommendation card can be a card. A comparison can be a table. A loading state can stream into a concrete component instead of leaving the user staring at animated dots and hoping the paragraph eventually gets to the point.&lt;/p&gt;

&lt;p&gt;That sounds obvious once you say it out loud, but it changes the job.&lt;/p&gt;

&lt;p&gt;You are no longer building a static component tree with a chatbot bolted on top. You are building a system that can map model decisions, tool responses, and partial results into interface states that were not fully predetermined at compile time.&lt;/p&gt;

&lt;p&gt;This is much closer to responsive design than most people admit. The trigger is different, but the mindset is the same: define constraints, build flexible primitives, and let runtime conditions shape the final UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI UX is becoming a primitives game
&lt;/h2&gt;

&lt;p&gt;The second signal is what libraries like &lt;code&gt;assistant-ui&lt;/code&gt; are standardizing.&lt;/p&gt;

&lt;p&gt;A lot of teams learned the hard way that "just build a chat interface" is fake-simple. Streaming, scroll behavior, message composition, markdown rendering, accessibility, code blocks, attachments, and tool result presentation turn into a real frontend system very quickly. The toy demo becomes production software the moment users expect it to be reliable.&lt;/p&gt;

&lt;p&gt;That is why composable AI UI primitives matter so much. They do for AI interfaces what headless component libraries did for design systems: pull repeated complexity into reusable building blocks without forcing every team into one giant widget.&lt;/p&gt;

&lt;p&gt;This is also why generative UI is bigger than chat. Chat happened to be the first shell. The deeper shift is that frontends now need reusable parts for model-driven interaction, not just reusable parts for buttons and modals.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hard part is not rendering. It is control.
&lt;/h2&gt;

&lt;p&gt;This is where the hype usually gets sloppy.&lt;/p&gt;

&lt;p&gt;People talk about generative UI like the main challenge is getting the model to render a clever component on the fly. That is demo thinking. The real bottleneck is control.&lt;/p&gt;

&lt;p&gt;Who owns state when the model proposes the next UI shape?&lt;/p&gt;

&lt;p&gt;What happens when the tool result is incomplete, wrong, or slow?&lt;/p&gt;

&lt;p&gt;How do you test a flow where the interface is partly determined at runtime?&lt;/p&gt;

&lt;p&gt;How much freedom should the model have before the product starts feeling unpredictable?&lt;/p&gt;

&lt;p&gt;Those are frontend architecture questions, not prompt-engineering questions. Once you move from text output to adaptive interface output, you inherit the usual complexity around state machines, validation, fallback paths, and debugging. Probably more, because now another probabilistic system is feeding the UI layer.&lt;/p&gt;

&lt;p&gt;Generative UI is exciting for the same reason it is annoying: it pushes AI products back into real software engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  React is ahead here, but the pattern matters more than the framework
&lt;/h2&gt;

&lt;p&gt;Right now, a lot of this work is happening in React-heavy tooling. That is not surprising. React already has strong conventions around component composition, state flow, and streaming-oriented app architecture.&lt;/p&gt;

&lt;p&gt;Still, the trend is larger than one ecosystem.&lt;/p&gt;

&lt;p&gt;The durable idea is that AI products need a translation layer between model output and interface behavior. Some teams will express that with React components and schema-driven rendering. Others will build their own equivalent abstractions. Either way, the old "LLM response equals text on screen" model is starting to break.&lt;/p&gt;

&lt;p&gt;That break is healthy. It forces product teams to ask what the best interface actually is instead of defaulting to chat because chat is easy to ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should care
&lt;/h2&gt;

&lt;p&gt;If you are building AI products with a single linear conversation and very little tool use, plain text streaming is still fine. It is cheap, simple, and good enough more often than people want to admit.&lt;/p&gt;

&lt;p&gt;If your product needs structured answers, multi-step workflows, tool invocation, or UI states that change with user intent, generative UI is not a fancy extra. It is the next layer of frontend discipline.&lt;/p&gt;

&lt;p&gt;Responsive design taught us that one layout could not serve every screen. Generative UI is teaching the same lesson about interaction.&lt;/p&gt;

&lt;p&gt;One fixed interface is not going to serve every intent either.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/vercel/ai" rel="noopener noreferrer"&gt;Vercel AI SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/assistant-ui/assistant-ui" rel="noopener noreferrer"&gt;assistant-ui&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Stateless Chat Is Losing to Persistent CLI Agents</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Fri, 17 Apr 2026 02:44:16 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/stateless-chat-is-losing-to-persistent-cli-agents-25j4</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/stateless-chat-is-losing-to-persistent-cli-agents-25j4</guid>
      <description>&lt;p&gt;Most people are still treating AI like a better search box with a chat window attached. That made sense when the whole workflow was "open a tab, paste some code, ask a question, close the tab." It makes a lot less sense once the work stops being one prompt long.&lt;/p&gt;

&lt;p&gt;The real bottleneck now is not model intelligence. It's context reset.&lt;/p&gt;

&lt;p&gt;If you do serious work in the terminal, the browser-chat loop starts to feel weirdly primitive. You keep re-explaining your stack. You keep pasting the same paths. You lose the thread between yesterday's bug, today's refactor, and tomorrow's follow-up. The model might be strong, but the workflow is forgetful.&lt;/p&gt;

&lt;p&gt;That is why persistent local agents are getting attention. The interesting shift is not "AI got smarter again." It's that the agent now has somewhere to live.&lt;/p&gt;

&lt;h2&gt;
  
  
  The old workflow breaks as soon as work spans sessions
&lt;/h2&gt;

&lt;p&gt;Stateless chat is fine for isolated questions. It falls apart when the job has continuity.&lt;/p&gt;

&lt;p&gt;Software work usually has continuity. Your project has conventions. Your machine has quirks. Your team has rules about tests, branch flow, deployment, and what not to touch. Repeating that every session is bad enough. Repeating it while the agent is also expected to operate tools, run commands, and pick up unfinished work is worse.&lt;/p&gt;

&lt;p&gt;Persistent agents attack that exact problem. Hermes Agent is a good example of the pattern because it is built around memory, session search, and multi-surface access instead of treating those as optional extras. The point is not just "remember my preferences." The point is that the agent can carry project context forward across sessions, search prior work, and keep the same identity whether you talk to it in a terminal or through a gateway like Telegram or Slack.&lt;/p&gt;

&lt;p&gt;That changes the unit of work. You stop thinking in prompts and start thinking in ongoing threads.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLI is the real center of gravity
&lt;/h2&gt;

&lt;p&gt;Another mistake people make is assuming the important battle is web UI versus terminal UI. It isn't.&lt;/p&gt;

&lt;p&gt;The important question is where the agent can actually do useful work. For developers, that is still the CLI.&lt;/p&gt;

&lt;p&gt;The terminal is where files, git, build tools, test runners, logs, package managers, and remote shells already meet. A persistent CLI agent fits that environment much better than a browser tab does. Hermes leans into that with an interactive CLI, gateway access from messaging platforms, multiple execution backends, and recent release work around long-running tasks, completion notifications, smarter inactivity timeouts, and better model switching mid-session.&lt;/p&gt;

&lt;p&gt;That combination matters. A lot of AI tooling still assumes the session itself is the product. Persistent agents treat the session as just one interface into a longer-running system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory only matters if retrieval is practical
&lt;/h2&gt;

&lt;p&gt;"Has memory" is turning into one of those AI feature claims that means almost nothing on its own.&lt;/p&gt;

&lt;p&gt;What matters is whether the memory model is usable under real pressure.&lt;/p&gt;

&lt;p&gt;Hermes splits memory into a few layers: compact persistent files for stable context, searchable session history, and optional external memory providers when people want to go further. The practical part is the retrieval path. If the agent can search prior sessions and recover the piece that matters, continuity becomes real. If memory is just a bloated prompt appendix, it quickly becomes expensive decoration.&lt;/p&gt;

&lt;p&gt;This is also where persistent agents feel more honest than browser chat. They admit that context is infrastructure. It has storage, boundaries, search behavior, and tradeoffs. That's a much better framing than pretending each new conversation is magically "aware" of your work.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP is what keeps this from becoming another closed stack
&lt;/h2&gt;

&lt;p&gt;Persistence is only half the story. The other half is extensibility.&lt;/p&gt;

&lt;p&gt;If your agent remembers everything but can only use the tools shipped by one vendor, you still have a lock-in problem. MCP is important because it gives these agents a cleaner way to attach external tools and data sources without rewriting the whole product every time a new integration shows up.&lt;/p&gt;

&lt;p&gt;This is where the local-agent model gets much more compelling for developers. You can keep one long-lived agent setup and swap models, add MCP servers, change providers, or route work differently without throwing away the whole workflow. Hermes explicitly pushes that "bring your own model" path, including mid-session switching and support for multiple providers.&lt;/p&gt;

&lt;p&gt;That flexibility is a bigger deal than the demo-friendly "look, it can use Slack" angle. The long-term win is having an agent architecture that can absorb new tools without making you start over.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tradeoff is setup, security, and cost discipline
&lt;/h2&gt;

&lt;p&gt;None of this is free.&lt;/p&gt;

&lt;p&gt;Persistent agents ask more from you than opening ChatGPT in a tab. You need to think about where the agent runs, what it can access, how commands are approved, how memory is stored, and whether your model choices are going to burn tokens for no good reason. Community discussions around Hermes already show both sides: people like the continuity and remote access, but they also push on token usage, setup friction, and operational rough edges.&lt;/p&gt;

&lt;p&gt;That is normal. In fact, it is a good sign. It means these tools are being judged as infrastructure now, not as toys.&lt;/p&gt;

&lt;p&gt;The security side matters even more. If you are giving an agent terminal access, file access, browser access, cron, and external integrations, sandboxing and approval boundaries are not optional polish. They are the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should care
&lt;/h2&gt;

&lt;p&gt;If you mostly ask one-off questions, stateless chat is still fine. It is cheap, immediate, and easy.&lt;/p&gt;

&lt;p&gt;If your AI workflow already involves recurring project context, repeated setup, remote execution, or handoffs between terminal work and message-based monitoring, persistent CLI agents are a better fit. Not because they feel futuristic, but because they match how real systems work: stateful, messy, and spread over time.&lt;/p&gt;

&lt;p&gt;That is the part people are finally starting to get. The future is probably not one magical chat box. It is an agent that can keep context, live close to your tools, and survive long enough to become operationally useful.&lt;/p&gt;

&lt;p&gt;Browser chat is not dead. But for serious developer workflows, it is starting to look like the temporary layer.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
