<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hefty</title>
    <description>The latest articles on DEV Community by hefty (@hefty_69a4c2d631c9dd70724).</description>
    <link>https://dev.to/hefty_69a4c2d631c9dd70724</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3686846%2Fd23c7b90-6e5c-4c63-a220-85df4d0e14fa.png</url>
      <title>DEV Community: hefty</title>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hefty_69a4c2d631c9dd70724"/>
    <language>en</language>
    <item>
      <title>Coding agents need boring trust boundaries, not hidden cleverness</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Thu, 02 Jul 2026 02:56:46 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/coding-agents-need-boring-trust-boundaries-not-hidden-cleverness-2l81</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/coding-agents-need-boring-trust-boundaries-not-hidden-cleverness-2l81</guid>
      <description>&lt;p&gt;The worst kind of coding-agent feature is the clever one nobody can see.&lt;/p&gt;

&lt;p&gt;That sounds harsh, but I mean it pretty literally. A tool that can read files, shape prompts, call shell commands, touch git state, drive a browser, and route traffic through model providers does not get the same trust budget as a normal CLI.&lt;/p&gt;

&lt;p&gt;If a formatter does something surprising, you revert the diff.&lt;/p&gt;

&lt;p&gt;If a coding agent does something surprising, you may not even know which local context, prompt mutation, gateway decision, or review shortcut shaped the result.&lt;/p&gt;

&lt;p&gt;That is why the agent stack needs less hidden cleverness and more boring, inspectable boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  A coding-agent client is not just another wrapper
&lt;/h2&gt;

&lt;p&gt;The easy mistake is treating an agent client like a nicer terminal interface for a model.&lt;/p&gt;

&lt;p&gt;It is not.&lt;/p&gt;

&lt;p&gt;A serious coding-agent client sits near too many important edges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local files&lt;/li&gt;
&lt;li&gt;shell commands&lt;/li&gt;
&lt;li&gt;git history and pending changes&lt;/li&gt;
&lt;li&gt;repo instructions&lt;/li&gt;
&lt;li&gt;browser sessions&lt;/li&gt;
&lt;li&gt;prompt context&lt;/li&gt;
&lt;li&gt;provider routing&lt;/li&gt;
&lt;li&gt;API gateways&lt;/li&gt;
&lt;li&gt;generated code review&lt;/li&gt;
&lt;li&gt;maintainer policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once a tool lives there, "trust us" stops being enough. Even "the model is good" stops being enough. The model can be good while the client behavior is confusing. The client can be useful while the gateway behavior is undocumented. The patch can look fine while nobody really owns the generated work.&lt;/p&gt;

&lt;p&gt;This is the part developers keep underestimating. Agent trust is not a vibe. It is a system property.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hidden markers are the wrong shape for this job
&lt;/h2&gt;

&lt;p&gt;A recent technical post by Thereallo argues that Claude Code can mark some requests by subtly changing a date sentence in the system prompt under certain custom endpoint conditions. The post frames this as a steganographic request marker: not a big visible telemetry field, not an explicit warning, but a tiny text-level difference inside prompt context.&lt;/p&gt;

&lt;p&gt;I am not going to pretend that one reverse-engineering post is a complete vendor record. It is not. The post also says ordinary official-endpoint usage likely does not hit the same path.&lt;/p&gt;

&lt;p&gt;But the design question is still useful.&lt;/p&gt;

&lt;p&gt;If a coding-agent client wants to classify custom gateways, detect abuse patterns, distinguish proxy traffic, or handle unusual provider setups differently, that behavior should be boring and explicit.&lt;/p&gt;

&lt;p&gt;Put it in a documented field.&lt;/p&gt;

&lt;p&gt;Put it in logs.&lt;/p&gt;

&lt;p&gt;Put it behind a visible config value.&lt;/p&gt;

&lt;p&gt;Put it somewhere an operator can reason about it without reverse engineering prompt text.&lt;/p&gt;

&lt;p&gt;The issue is not that abuse prevention is illegitimate. The issue is that hidden-ish prompt behavior is a bad trade for a tool asking developers for local authority.&lt;/p&gt;

&lt;p&gt;When the tool is close to files, commands, and source control, subtlety becomes a liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Routers and custom gateways are normal now
&lt;/h2&gt;

&lt;p&gt;This would matter less if custom API paths were rare edge cases. They are not.&lt;/p&gt;

&lt;p&gt;Developers are wiring coding tools through routers, provider fallbacks, quota managers, local gateways, and policy layers because the agent workflow is getting expensive and operationally messy. Projects like OmniRoute are a signal of where the market is going: people want one place to route different coding tools across different model providers, with fallback behavior and local control.&lt;/p&gt;

&lt;p&gt;You do not have to buy every claim in a router README to see the pattern.&lt;/p&gt;

&lt;p&gt;Teams are no longer just choosing "which model?" They are choosing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which provider gets which task&lt;/li&gt;
&lt;li&gt;where logs live&lt;/li&gt;
&lt;li&gt;how fallback works&lt;/li&gt;
&lt;li&gt;how cost is capped&lt;/li&gt;
&lt;li&gt;which tools can call which backend&lt;/li&gt;
&lt;li&gt;what policy lives locally versus with the vendor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes client transparency more important, not less.&lt;/p&gt;

&lt;p&gt;If a client treats official endpoints, custom base URLs, proxies, or local routers differently, the operator should be able to see that. A team should not need a packet capture and a prompt diff to understand which path their agent is taking.&lt;/p&gt;

&lt;p&gt;The boring version is better: explicit gateway handling, documented routing assumptions, auditable config, and failure modes that say what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Maintainers are drawing the same boundary from the other side
&lt;/h2&gt;

&lt;p&gt;Godot's 2026 contribution-policy update is the maintainer-side version of this problem.&lt;/p&gt;

&lt;p&gt;The post is not just a generic "AI bad" statement. The more interesting argument is about review cost and ownership. AI-generated work can reduce the effort needed to submit code, but it does not reduce the effort needed to review it. In some cases it increases that effort, because maintainers now have to work out whether the contributor understands the patch well enough to fix it.&lt;/p&gt;

&lt;p&gt;That is a brutal but fair standard.&lt;/p&gt;

&lt;p&gt;Open source review depends on a human feedback loop. A maintainer points out a design problem, a missed edge case, or a style issue. The contributor learns, revises, and eventually becomes more useful to the project.&lt;/p&gt;

&lt;p&gt;If the contributor cannot explain the code because an agent produced the substance of it, the loop breaks. The maintainer is no longer reviewing a peer's work. They are debugging output owned by nobody.&lt;/p&gt;

&lt;p&gt;Godot's policy draws a hard line around autonomous agents, substantial AI-authored code, undisclosed AI use, and AI-generated human communication. It still leaves room for limited menial assistance with disclosure and human review.&lt;/p&gt;

&lt;p&gt;That distinction matters. The point is not "never use tools." The point is "somebody has to own the work."&lt;/p&gt;

&lt;p&gt;Agent trust boundaries are the same idea applied earlier in the workflow.&lt;/p&gt;

&lt;p&gt;Who owns the prompt context?&lt;/p&gt;

&lt;p&gt;Who owns the gateway decision?&lt;/p&gt;

&lt;p&gt;Who owns the generated patch?&lt;/p&gt;

&lt;p&gt;Who owns the review burden when the output is wrong?&lt;/p&gt;

&lt;p&gt;If the answer is fuzzy, the system is not ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent-ready should mean inspectable, not magical
&lt;/h2&gt;

&lt;p&gt;There is a good version of agent readiness, and it is much less flashy.&lt;/p&gt;

&lt;p&gt;Facebook's Astryx project is useful as a contrast. It presents itself as a design system built for both people and AI assistants, with documented APIs, conventions, CLI usage, and component patterns. The interesting part is not "AI can use it." The interesting part is that the assistant-facing surface is also human-readable.&lt;/p&gt;

&lt;p&gt;That is the pattern I want more teams to copy.&lt;/p&gt;

&lt;p&gt;Do not hide the magic in the client. Move behavior into shared surfaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;docs humans can review&lt;/li&gt;
&lt;li&gt;commands humans can run&lt;/li&gt;
&lt;li&gt;configs humans can diff&lt;/li&gt;
&lt;li&gt;conventions humans can teach&lt;/li&gt;
&lt;li&gt;policy files humans can enforce&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agent-friendly infrastructure should make the repo easier to operate, not harder to audit.&lt;/p&gt;

&lt;p&gt;The best agent support often looks embarrassingly ordinary: stable commands, clear names, reliable docs, small examples, strict boundaries, and logs that do not require mythology to interpret.&lt;/p&gt;

&lt;p&gt;That is not less advanced. That is what advanced systems look like after you remove the theater.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical checklist for teams using agents this week
&lt;/h2&gt;

&lt;p&gt;If your team is adding coding agents, routers, or AI-assisted contribution flows, start with the boring questions before arguing about model quality.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make endpoint behavior explicit.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the client handles official APIs, custom base URLs, local gateways, or proxy-like hosts differently, document the difference. Do not bury it in prompt text.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Treat prompt context as an audit surface.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;System prompts, repo instructions, hidden context, tool metadata, and generated summaries can all shape output. Teams need a way to inspect the meaningful pieces.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Put routing policy in config.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Provider selection, fallback behavior, cost caps, and model routing rules should be visible enough for a reviewer to understand.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Separate telemetry from prompt behavior.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the product needs telemetry, abuse detection, or gateway classification, expose it as telemetry. Do not make developers wonder whether ordinary prompt content is carrying hidden control signals.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Require human ownership for generated code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"The agent wrote it" is not an answer to a review comment. The submitter should understand the patch, explain the tradeoffs, and fix it when it breaks.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make review gates fail loudly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Silent policy decisions are poison. If a read is blocked, a gateway is rejected, a model is swapped, or a generated contribution violates policy, say so plainly.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Keep agent-facing docs boring.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A good agent instruction file should be useful to a new human contributor too. If only the tool understands it, that is a smell.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Review client upgrades like infrastructure changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A coding-agent client update can change prompt handling, tool permissions, routing behavior, or telemetry. That deserves the same suspicion you would give a dependency with local execution rights.&lt;/p&gt;

&lt;p&gt;None of this requires a giant platform team. It requires admitting that agent behavior is now part of your engineering system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trust feature is boredom
&lt;/h2&gt;

&lt;p&gt;The trustworthy agent stack is not the one with the cleverest hidden controls.&lt;/p&gt;

&lt;p&gt;It is the one boring enough to inspect.&lt;/p&gt;

&lt;p&gt;Boring config. Boring logs. Boring endpoint handling. Boring contribution rules. Boring review gates. Boring docs that humans and assistants can both follow.&lt;/p&gt;

&lt;p&gt;That does not mean the underlying work is simple. It means the important behavior is visible where operators can reason about it.&lt;/p&gt;

&lt;p&gt;The model can be brilliant. The workflow can be fast. The tooling can keep improving.&lt;/p&gt;

&lt;p&gt;But if developers cannot tell what the client did, what the gateway changed, what context shaped the output, or who owns the resulting patch, the trust story is already broken.&lt;/p&gt;

&lt;p&gt;Coding agents do not need more hidden cleverness right now.&lt;/p&gt;

&lt;p&gt;They need fewer places for important behavior to hide.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thereallo.dev/blog/claude-code-prompt-steganography" rel="noopener noreferrer"&gt;Claude Code Is Steganographically Marking Requests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://godotengine.org/article/contribution-policy-2026/" rel="noopener noreferrer"&gt;Changes to our Contribution Policies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/facebook/astryx" rel="noopener noreferrer"&gt;facebook/astryx&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/diegosouzapw/OmniRoute" rel="noopener noreferrer"&gt;diegosouzapw/OmniRoute&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/item?id=48734373" rel="noopener noreferrer"&gt;Hacker News discussion of the Claude Code marker post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>security</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Coding agents need file boundaries, not better manners</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Mon, 29 Jun 2026 08:18:20 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/coding-agents-need-file-boundaries-not-better-manners-kg2</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/coding-agents-need-file-boundaries-not-better-manners-kg2</guid>
      <description>&lt;p&gt;The next serious coding-agent feature is not a warmer tone or a smarter autocomplete.&lt;/p&gt;

&lt;p&gt;It is an auditable denylist.&lt;/p&gt;

&lt;p&gt;That sounds boring, which is exactly why it matters. Once an agent can inspect your repo, open local files, summarize context, run commands, or prepare a patch, the trust question stops being "does the model seem careful?" The useful question is much more mechanical:&lt;/p&gt;

&lt;p&gt;What can it read?&lt;/p&gt;

&lt;p&gt;What can it send to the model?&lt;/p&gt;

&lt;p&gt;What can it change?&lt;/p&gt;

&lt;p&gt;And when it says the work is verified, what actually failed loudly enough for a human to notice?&lt;/p&gt;

&lt;p&gt;Developers keep trying to solve agent trust with softer language. "Be careful with secrets." "Do not touch credentials." "Ask before using sensitive files." That is fine as guidance. It is not a boundary.&lt;/p&gt;

&lt;p&gt;A boundary is something the agent cannot talk its way around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sensitive files are not normal context
&lt;/h2&gt;

&lt;p&gt;There is a current open Codex issue asking for a way to exclude sensitive files and directories from agent access. The examples are exactly the ones you would expect: &lt;code&gt;.env&lt;/code&gt;, private keys, cloud credentials, local config, &lt;code&gt;.aws/&lt;/code&gt;, &lt;code&gt;.ssh/&lt;/code&gt;, and other files that live close to real authority.&lt;/p&gt;

&lt;p&gt;That issue is useful because it cuts through the usual agent hype. This is not an abstract "AI safety" argument. It is a repo hygiene problem that any team can understand.&lt;/p&gt;

&lt;p&gt;Source code is context. Docs are context. Test files are context. Build scripts are context.&lt;/p&gt;

&lt;p&gt;Secrets are different.&lt;/p&gt;

&lt;p&gt;Local credentials are different.&lt;/p&gt;

&lt;p&gt;Customer exports sitting in a working directory are different.&lt;/p&gt;

&lt;p&gt;The mistake is treating all nearby files as equally valid input for a helpful model. They are not. Some files are operational boundaries. Some files exist because the developer's machine is where messy real work happens. Some files were never meant to become model context, even if they happen to be one &lt;code&gt;read_file&lt;/code&gt; away.&lt;/p&gt;

&lt;p&gt;Prompting the agent to "avoid sensitive files" is weaker than a rule the runtime enforces before the agent ever sees the path.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt policy is not access control
&lt;/h2&gt;

&lt;p&gt;I do not want to pretend prompts are useless. Repo instructions, agent guidelines, and project policies are real parts of the workflow now. They tell the agent how the project works. They help keep edits consistent. They can prevent a lot of dumb mistakes.&lt;/p&gt;

&lt;p&gt;But they are still prose.&lt;/p&gt;

&lt;p&gt;Prose is reviewable. Prose is useful. Prose is also easy to misread, override, conflict with, or forget when the agent is juggling a long task.&lt;/p&gt;

&lt;p&gt;Access control should not depend on the agent remembering your preference. If a path is off limits, the system should make it off limits.&lt;/p&gt;

&lt;p&gt;That means teams need boring controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repo-level deny rules for files the agent should never read&lt;/li&gt;
&lt;li&gt;global deny rules for machine-level credential paths&lt;/li&gt;
&lt;li&gt;visible config that code reviewers can inspect&lt;/li&gt;
&lt;li&gt;a clear difference between readable context and forbidden context&lt;/li&gt;
&lt;li&gt;logs that show denied access attempts without leaking the contents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not enterprise theater. Solo developers need this too. The smallest possible version is still useful: a checked-in agent config plus a local global ignore list for secrets and machine-specific state.&lt;/p&gt;

&lt;p&gt;The point is simple. If the agent should not read a file, do not make that a personality test.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generated work moves the burden into review
&lt;/h2&gt;

&lt;p&gt;The research around AI coding agents is starting to make one thing clearer: agents do not remove the need for review. They move more pressure into it.&lt;/p&gt;

&lt;p&gt;One recent paper studies thousands of repositories after AI coding-agent adoption and argues that the effects show up in the human contributor ecosystem, not just in code volume. That is the part teams should pay attention to. More generated work can mean more review depth, more governance work, and more pressure on maintainers to catch problems after the fact.&lt;/p&gt;

&lt;p&gt;That matches how these workflows feel in practice.&lt;/p&gt;

&lt;p&gt;The agent can produce a patch quickly. Great.&lt;/p&gt;

&lt;p&gt;Now somebody has to decide whether the patch touched the right files, used the right assumptions, exposed the wrong context, skipped the wrong tests, or hid a risky change behind a clean summary.&lt;/p&gt;

&lt;p&gt;Weak boundaries make that review worse. If the agent had broad file access, the reviewer has to wonder what it saw. If the agent can read local secrets, the reviewer has to wonder whether any of that state influenced the output. If the agent can sweep through generated assets, design exports, local data, and config blobs, the diff is only part of the story.&lt;/p&gt;

&lt;p&gt;This is where file boundaries become a productivity feature.&lt;/p&gt;

&lt;p&gt;A smaller operating surface is easier to review. A visible denylist is easier to explain. A config file in the repo is easier to discuss than a vague assurance that the agent "probably would not do that."&lt;/p&gt;

&lt;p&gt;Good boundaries do not slow the team down. They reduce the amount of detective work after the agent has already acted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tests are not proof if the oracle is weak
&lt;/h2&gt;

&lt;p&gt;There is a similar trap with agent-written tests.&lt;/p&gt;

&lt;p&gt;Another recent paper looks at oracle signals in agent-authored test code. The useful takeaway is not "agent tests are bad." The useful takeaway is that test-shaped output can still fail to check the thing that matters.&lt;/p&gt;

&lt;p&gt;A test file can exist.&lt;/p&gt;

&lt;p&gt;The suite can run.&lt;/p&gt;

&lt;p&gt;The summary can look green.&lt;/p&gt;

&lt;p&gt;And the actual behavioral claim can still be under-tested, over-mocked, or asserted in a way that would never catch the bug.&lt;/p&gt;

&lt;p&gt;That matters because teams often talk about agent safety as if "run the tests" closes the loop. It does not. Running tests is a step. Meaningful verification is the loop.&lt;/p&gt;

&lt;p&gt;The same principle applies to file access. "The agent did not mention any secrets" is not proof that it never touched sensitive context. "The agent says it verified the change" is not proof that the verification had a useful oracle.&lt;/p&gt;

&lt;p&gt;Agent workflows need failure modes that are visible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;blocked file reads should be explicit&lt;/li&gt;
&lt;li&gt;skipped tests should be explicit&lt;/li&gt;
&lt;li&gt;flaky verifier output should be explicit&lt;/li&gt;
&lt;li&gt;generated tests should say what behavior they assert&lt;/li&gt;
&lt;li&gt;summaries should separate "I changed this" from "I proved this"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The dangerous state is not failure. Failure is fine. Failure is information.&lt;/p&gt;

&lt;p&gt;The dangerous state is a fake green check.&lt;/p&gt;

&lt;h2&gt;
  
  
  The frontend example is the same pattern
&lt;/h2&gt;

&lt;p&gt;This is not limited to backend repos or secret files.&lt;/p&gt;

&lt;p&gt;Frontend and AI UI work has the same boundary problem, just with different artifacts. A repo may contain design screenshots, generated images, social preview assets, customer mockups, exported UI states, and half-finished experiments that should not automatically become agent context.&lt;/p&gt;

&lt;p&gt;If the task is "prepare a social preview image," the agent probably does not need a folder full of unrelated raw assets. Keep that work outside the agent context when you can. A browser-local utility such as &lt;a href="https://resizeimagefor.com" rel="noopener noreferrer"&gt;Resize Image For&lt;/a&gt; is a better fit for resizing platform assets than handing extra image files to an agent just because they are nearby.&lt;/p&gt;

&lt;p&gt;The same applies when evaluating generated interface patterns. You do not need the agent to ingest every old experiment in the repo to learn what the field looks like. A curated reference surface such as &lt;a href="https://awesomegenerativeui.com/" rel="noopener noreferrer"&gt;Awesome Generative UI&lt;/a&gt; can be enough context for comparing patterns, papers, examples, and tools without widening the agent's access to your local project.&lt;/p&gt;

&lt;p&gt;That is the broader rule: give the agent the context it needs, not every artifact you happen to have.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical checklist for teams adopting agents this week
&lt;/h2&gt;

&lt;p&gt;If your team is adding coding agents to real work, I would start with this checklist before arguing about model choice.&lt;/p&gt;

&lt;p&gt;First, define forbidden paths. Include secrets, credentials, private keys, local environment files, cloud config, customer data, and machine-specific directories. Make the list visible.&lt;/p&gt;

&lt;p&gt;Second, split repo rules from machine rules. The repo can define project boundaries. The developer's machine still needs a global denylist for things that should never be agent-readable anywhere.&lt;/p&gt;

&lt;p&gt;Third, review agent config like build config. If a change gives the agent more context, more write access, or more authority, it deserves real review.&lt;/p&gt;

&lt;p&gt;Fourth, keep generated assets out of context unless the task needs them. Images, previews, exports, logs, snapshots, and local data can carry more information than the agent needs.&lt;/p&gt;

&lt;p&gt;Fifth, make denied reads observable. A silent block is better than a leak, but a visible block is better than mystery. The reviewer should know when the boundary did its job.&lt;/p&gt;

&lt;p&gt;Sixth, separate patch success from verification success. "The diff was produced" is not "the behavior was verified." Make the agent say which checks ran, which checks failed, and which claims are still unproven.&lt;/p&gt;

&lt;p&gt;Seventh, inspect agent-written tests for real oracles. A test that only proves the mock returned the mock value is not doing much for you.&lt;/p&gt;

&lt;p&gt;Eighth, keep source notes for risky changes. If the agent changed auth, file handling, config loading, tool access, data export, or test policy, the review should know which source or rule justified the change.&lt;/p&gt;

&lt;p&gt;None of this requires a giant platform team. It requires deciding that agent access is part of the system design, not an afterthought.&lt;/p&gt;

&lt;h2&gt;
  
  
  The boring boundary is the product
&lt;/h2&gt;

&lt;p&gt;Better models will help. Better IDE integrations will help. Better summaries will help.&lt;/p&gt;

&lt;p&gt;They will not remove the need for hard boundaries.&lt;/p&gt;

&lt;p&gt;A coding agent can be brilliant and still have too much access. It can be careful and still see a file it should never have seen. It can write tests and still fail to prove the behavior. It can produce a beautiful summary and still leave the reviewer guessing about what context shaped the patch.&lt;/p&gt;

&lt;p&gt;The serious version of agent adoption is not "trust the model more."&lt;/p&gt;

&lt;p&gt;It is "make the model operate inside a smaller, inspectable space."&lt;/p&gt;

&lt;p&gt;That is why the denylist matters. It is not a minor settings-panel feature. It is the shape of the trust boundary.&lt;/p&gt;

&lt;p&gt;A coding agent becomes easier to trust when its access rules are boring enough for the whole team to audit.&lt;/p&gt;

&lt;p&gt;That is the bar I would want before letting one work near real repos every day.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/codex/issues/2847" rel="noopener noreferrer"&gt;A way to exclude sensitive files - openai/codex issue #2847&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2606.26289" rel="noopener noreferrer"&gt;Augmentation with Dilution: Human Contributor Ecosystems After AI Coding Agent Adoption&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2606.18168" rel="noopener noreferrer"&gt;All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/news" rel="noopener noreferrer"&gt;Hacker News front page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/danmercede/when-your-verifier-goes-quiet-why-a-crashed-reviewer-is-not-a-refutation-5bnm"&gt;When Your Verifier Goes Quiet: Why a Crashed Reviewer Is Not a Refutation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/zwiserfit/constitution-prompts-how-we-govern-9-autonomous-agents-without-a-central-orchestrator-2d6i"&gt;Constitution &amp;gt; Prompts: How We Govern 9 Autonomous Agents Without a Central Orchestrator&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
    </item>
    <item>
      <title>Agent tools need supply-chain controls now</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Sun, 28 Jun 2026 08:04:21 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/agent-tools-need-supply-chain-controls-now-1co2</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/agent-tools-need-supply-chain-controls-now-1co2</guid>
      <description>&lt;p&gt;Better prompts will not save a repo with ungoverned agent tools.&lt;/p&gt;

&lt;p&gt;That sounds dramatic until you look at what coding agents are actually becoming. They have moved past chat boxes that suggest code. They read repo instructions. They call tools. They connect to marketplaces. They run inside developer workflows that can touch files, issues, pull requests, package managers, CI, docs, internal APIs, and whatever else the team wires in because "it saves time."&lt;/p&gt;

&lt;p&gt;At that point, the interesting question stops being "is the model smart enough?"&lt;/p&gt;

&lt;p&gt;The better question is: who allowed this tool into the workflow, what can it reach, and how would anyone notice if that changed?&lt;/p&gt;

&lt;p&gt;That is not prompt engineering. That is supply-chain control.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tool layer is where the risk moved
&lt;/h2&gt;

&lt;p&gt;The current agent conversation still spends too much time on model output. Hallucinated code matters. Bad refactors matter. A confident but wrong explanation can waste an afternoon.&lt;/p&gt;

&lt;p&gt;But once an agent can act through tools, the failure mode gets less cute.&lt;/p&gt;

&lt;p&gt;A bad suggestion is one thing. A bad suggestion with access to a shell, a repo token, a package installer, a browser session, or a writable project directory is a different class of problem. The model is no longer only producing text for a human to inspect. It is sitting in front of capability.&lt;/p&gt;

&lt;p&gt;That is why the recent DEV.to framing around plugin marketplaces as endpoint policy feels right. Teams do not want every developer hand-auditing random endpoints, plugin manifests, MCP servers, and agent integrations from scratch. They need a control plane. They need known sources, scoped permissions, reviewable installation paths, and boring rules about what is allowed.&lt;/p&gt;

&lt;p&gt;Developers already learned this lesson with packages.&lt;/p&gt;

&lt;p&gt;We do not install dependencies by vibes, or at least we should not. We care about the registry, the maintainer, the version, the lockfile, the transitive graph, the install script, the update path, and the review diff.&lt;/p&gt;

&lt;p&gt;Agent tools deserve the same suspicion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo instructions are now infrastructure
&lt;/h2&gt;

&lt;p&gt;GitHub's same-week support for &lt;code&gt;AGENTS.md&lt;/code&gt; in Copilot coding agent is a useful signal because it makes something explicit that was already happening informally.&lt;/p&gt;

&lt;p&gt;Agent instructions are becoming project artifacts.&lt;/p&gt;

&lt;p&gt;That is a good thing. A repo should be able to tell an agent how tests run, where generated files live, which commands are safe, what style the project uses, and which workflows should be avoided. Keeping that in version control is much better than hiding it in one person's chat history.&lt;/p&gt;

&lt;p&gt;But putting agent behavior into the repo also changes the review burden.&lt;/p&gt;

&lt;p&gt;If a pull request edits &lt;code&gt;AGENTS.md&lt;/code&gt;, that is not "just docs." It may change how future agents modify code, run commands, interpret ownership boundaries, or decide which tests count. In practice, it can behave more like a CI config change than a README tweak.&lt;/p&gt;

&lt;p&gt;So review it that way.&lt;/p&gt;

&lt;p&gt;Ask the same uncomfortable questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this instruction grant the agent more freedom than the project expects?&lt;/li&gt;
&lt;li&gt;Does it skip tests, approvals, or verification steps?&lt;/li&gt;
&lt;li&gt;Does it route work through a tool nobody owns?&lt;/li&gt;
&lt;li&gt;Does it tell the agent to trust generated output too easily?&lt;/li&gt;
&lt;li&gt;Does it conflict with the security model in CI, deployment, or local development?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point is not to make every instruction file scary. The point is to stop treating it as disposable text. A repo-level agent file is operational policy written in prose.&lt;/p&gt;

&lt;p&gt;Prose can ship bugs too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Marketplace policy is a real security feature
&lt;/h2&gt;

&lt;p&gt;GitHub's &lt;code&gt;strictKnownMarketplaces&lt;/code&gt; support points at the other half of the problem: tool source control.&lt;/p&gt;

&lt;p&gt;The useful question is not "can the agent install tools?" The useful question is "which tool sources are known enough to be allowed?"&lt;/p&gt;

&lt;p&gt;That sounds like a small enterprise setting. It is not. It is the same pattern developers already use everywhere else. Approved package registries. Container base image policies. Browser extension allowlists. Internal Terraform modules. CI actions pinned to trusted publishers.&lt;/p&gt;

&lt;p&gt;Agent marketplaces are heading toward that world because they have to.&lt;/p&gt;

&lt;p&gt;If an agent can discover and attach tools from arbitrary places, your workflow has a new dependency channel. Maybe the tool is fine. Maybe the marketplace has real review. Maybe the manifest is honest. Maybe the tool does exactly what the name suggests.&lt;/p&gt;

&lt;p&gt;Maybe.&lt;/p&gt;

&lt;p&gt;I would rather not build a team process on "maybe."&lt;/p&gt;

&lt;p&gt;A known-marketplace policy does not solve every agent security problem. It will not magically prevent prompt injection, data leakage, overbroad permissions, misleading tool descriptions, or a human approving the wrong action. It does give teams one concrete lever: tools should come from approved sources, not random convenience paths.&lt;/p&gt;

&lt;p&gt;That lever matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Treat agent tools like dependencies
&lt;/h2&gt;

&lt;p&gt;The mental model I would use is simple: if an agent tool can affect the repo, the filesystem, an account, a network request, a deployment, or a user-visible artifact, treat it like a dependency.&lt;/p&gt;

&lt;p&gt;That means the tool needs an owner.&lt;/p&gt;

&lt;p&gt;It needs a source.&lt;/p&gt;

&lt;p&gt;It needs a permission story.&lt;/p&gt;

&lt;p&gt;It needs an update path.&lt;/p&gt;

&lt;p&gt;It needs a way to be removed without archaeology.&lt;/p&gt;

&lt;p&gt;This is where a lot of agent adoption gets sloppy. A team adds a local helper, an MCP server, a marketplace plugin, a browser connector, or a repo-specific script because one workflow becomes faster. The demo works. Everyone likes the speed. Then six weeks later nobody remembers why the tool can read the whole workspace or why the agent is allowed to call it during review.&lt;/p&gt;

&lt;p&gt;That is not an AI problem. That is a normal engineering problem with a model-shaped interface on top.&lt;/p&gt;

&lt;p&gt;The fix is not mystical.&lt;/p&gt;

&lt;p&gt;Keep an inventory of agent tools. Write down where each one comes from, what it can do, and who owns it.&lt;/p&gt;

&lt;p&gt;Version repo-level agent instructions. Review changes like you would review CI, dependency, or build-system changes.&lt;/p&gt;

&lt;p&gt;Allowlist tool sources. If your platform supports known marketplace policy, use it. If it does not, document the manual equivalent before people start installing whatever makes a demo look good.&lt;/p&gt;

&lt;p&gt;Separate read tools from write tools. A documentation search tool and a tool that mutates issues, files, or deployment state should not feel like the same kind of permission.&lt;/p&gt;

&lt;p&gt;Log tool calls in a form humans can read. If the audit trail is technically present but practically useless, you do not have an audit trail. You have a JSON landfill.&lt;/p&gt;

&lt;p&gt;Make risky capabilities obvious. Shell access, filesystem writes, credential access, browser state, external network calls, and package installation should stand out during review.&lt;/p&gt;

&lt;p&gt;Have a disable path. If a tool turns out to be wrong, stale, compromised, or just too broad, the team should know how to remove it quickly.&lt;/p&gt;

&lt;p&gt;None of this is glamorous. Good. Glamour is how people talk themselves into skipping the boring controls.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is not enterprise paranoia
&lt;/h2&gt;

&lt;p&gt;It is tempting to file this under "big company governance" and move on.&lt;/p&gt;

&lt;p&gt;That is a mistake.&lt;/p&gt;

&lt;p&gt;Small teams are often the ones most exposed to messy agent workflows because they move fastest. One developer wires in a tool. Another copies the setup. A third adds repo instructions. Someone adds a marketplace plugin because it solved a specific task. Nobody writes the policy because the team is small and "we all know what is going on."&lt;/p&gt;

&lt;p&gt;Until they do not.&lt;/p&gt;

&lt;p&gt;The same is true for solo builders. If an agent can act on your machine, inside your repo, against your accounts, the boundary still matters. You may not need a formal approval board. You still need to know what you installed and what it can touch.&lt;/p&gt;

&lt;p&gt;The arXiv work on autonomous-agent security and privacy is useful background here because it keeps pulling the conversation back to actions and permissions. A wrong answer is annoying. A system with delegated capability doing the wrong thing in a place that matters is worse.&lt;/p&gt;

&lt;p&gt;That is the part developers should internalize.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical adoption checklist
&lt;/h2&gt;

&lt;p&gt;If your team is adding coding agents this week, I would start with a blunt checklist.&lt;/p&gt;

&lt;p&gt;First, list the surfaces the agent can touch. Repos, local files, terminals, browsers, SaaS accounts, package managers, CI systems, issue trackers, docs, databases, cloud consoles, internal APIs. Be honest. The weird edge cases are usually where the risk lives.&lt;/p&gt;

&lt;p&gt;Second, put agent instructions in version control and review them as behavior changes. If the instruction changes what the agent is expected to do, it deserves real review.&lt;/p&gt;

&lt;p&gt;Third, define approved tool sources. Use marketplace policy where your platform gives it to you. If you are using local tools or MCP servers, write down the source and owner.&lt;/p&gt;

&lt;p&gt;Fourth, split capabilities by blast radius. Read-only context tools should not be reviewed the same way as write-capable tools. A tool that can search docs is not the same as a tool that can edit files, publish content, rotate config, or open pull requests.&lt;/p&gt;

&lt;p&gt;Fifth, make permissions visible before execution. A human should not have to infer from a friendly tool name that the agent is about to mutate a real system.&lt;/p&gt;

&lt;p&gt;Sixth, log what happened. "Tool call succeeded" is too thin. Log the tool, target, visible parameters, authority used, and result. The future reviewer should not need a ritual to reconstruct the incident.&lt;/p&gt;

&lt;p&gt;Seventh, rehearse removal. If you cannot disable a tool quickly, you do not control it. You are just hoping it behaves.&lt;/p&gt;

&lt;p&gt;This checklist will not make agent workflows perfectly safe. Perfect safety is not the point. The point is to move from accidental trust to intentional trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The boring teams will win
&lt;/h2&gt;

&lt;p&gt;The next serious coding-agent advantage will not come from the team with the flashiest prompt file.&lt;/p&gt;

&lt;p&gt;It will come from the team that can let agents do useful work without turning every tool into an unreviewed side door. The team with boring inventories. Boring allowlists. Boring repo instructions. Boring logs. Boring rollback paths.&lt;/p&gt;

&lt;p&gt;That sounds less exciting than "the agent can use any tool."&lt;/p&gt;

&lt;p&gt;It is also the version that survives contact with real projects.&lt;/p&gt;

&lt;p&gt;Agent tools should be reviewed like dependencies because operationally, that is what they are. They bring code, authority, configuration, network paths, and failure modes into the workflow.&lt;/p&gt;

&lt;p&gt;Treat them that way now, while the stack is still small enough to understand.&lt;/p&gt;

&lt;p&gt;Waiting until the tool layer becomes invisible is how teams end up debugging their own trust model at the worst possible time.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/pvgomes/plugin-marketplaces-are-the-new-endpoint-policy-for-coding-agents-19p6"&gt;Plugin Marketplaces Are the New Endpoint Policy for Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/changelog/2026-06-26-copilot-coding-agent-now-respects-strictknownmarketplaces-policy/" rel="noopener noreferrer"&gt;GitHub Copilot coding agent now respects &lt;code&gt;strictKnownMarketplaces&lt;/code&gt; policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/changelog/2026-06-23-copilot-coding-agent-now-supports-agents-md-custom-instructions/" rel="noopener noreferrer"&gt;GitHub Copilot coding agent now supports AGENTS.md custom instructions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2606.13449" rel="noopener noreferrer"&gt;Agent Security and Privacy: A Risk Taxonomy for Autonomous AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/ChatGPTCoding/" rel="noopener noreferrer"&gt;r/ChatGPTCoding current coding-agent discussion feed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/p/general/builder-ai-shuts-down" rel="noopener noreferrer"&gt;Builder.ai shuts down&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>AI-built apps don't get a privacy discount</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Mon, 22 Jun 2026 03:37:56 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/ai-built-apps-dont-get-a-privacy-discount-2ek2</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/ai-built-apps-dont-get-a-privacy-discount-2ek2</guid>
      <description>&lt;p&gt;The AI-built app era needs less demo energy and more permission discipline.&lt;/p&gt;

&lt;p&gt;Shipping got weirdly cheap. A small team, or one stubborn developer, can now push something that looks like a real app much faster than they could a few years ago. The interface can be polished. The README can be clean. The build can work. The whole thing can feel more finished than it has any right to feel.&lt;/p&gt;

&lt;p&gt;None of that reduces the privacy bill.&lt;/p&gt;

&lt;p&gt;If your app can read device signals, touch user files, inspect local state, send network requests, keep logs, export data, or process user assets, "built mostly with AI" is not a disclaimer. It is trivia. The user still has the same question:&lt;/p&gt;

&lt;p&gt;What can this thing see?&lt;/p&gt;

&lt;p&gt;That question is part of the UI whether you design for it or not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loupe is a useful warning shot
&lt;/h2&gt;

&lt;p&gt;Loupe is an iOS and iPadOS app from Mysk Research that shows what native apps can read through public APIs. Its README groups signals into categories like passive, permission-gated, and advanced. It also says the app keeps values on device unless the user exports them.&lt;/p&gt;

&lt;p&gt;That is already interesting. Most users do not have a clean mental model for what an app can see without asking, what needs a prompt, and what only becomes visible through more advanced inspection.&lt;/p&gt;

&lt;p&gt;The more interesting detail, at least for developers, is that the project says Loupe was written almost entirely with AI coding tools.&lt;/p&gt;

&lt;p&gt;That does not make Loupe bad. It makes the point sharper.&lt;/p&gt;

&lt;p&gt;AI can help produce the app. It cannot absorb the responsibility for the app's capability boundary. The moment a tool starts explaining what apps can read, it has to be clear about what it reads, what stays local, what can be exported, and what the user is supposed to trust.&lt;/p&gt;

&lt;p&gt;That obligation does not care whether the implementation came from a senior engineer, a weekend prototype, or a model-assisted sprint.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Built with AI" is not a privacy model
&lt;/h2&gt;

&lt;p&gt;There is a lazy version of AI product thinking that treats generated code as a category of its own. The app is experimental, therefore rough edges are expected. The builder moved fast, therefore the responsibility is lighter. The README says AI helped, therefore the reader should grade on a curve.&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;Users do not experience your app as a prompt transcript. They experience it as software running on their machine, phone, browser, or account. It either asks for permissions clearly or it does not. It either sends data somewhere or it does not. It either explains export paths or it leaves people guessing.&lt;/p&gt;

&lt;p&gt;For developer tools and small utilities, the tempting shortcut is to ship the feature first and explain the boundary later. That is how you end up with vague privacy copy around behavior that should have been designed as product behavior from day one.&lt;/p&gt;

&lt;p&gt;"We value privacy" is not a boundary.&lt;/p&gt;

&lt;p&gt;"Images are processed locally in your browser and never uploaded for resizing" is a boundary.&lt;/p&gt;

&lt;p&gt;"Network access is only used to fetch metadata from this endpoint" is a boundary.&lt;/p&gt;

&lt;p&gt;"Export happens only when you click this button" is a boundary.&lt;/p&gt;

&lt;p&gt;Those sentences are not legal magic. They are engineering commitments the product has to keep.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspectability beats vibes
&lt;/h2&gt;

&lt;p&gt;The community reaction around app privacy tools keeps circling the same practical need: people want behavior they can inspect.&lt;/p&gt;

&lt;p&gt;Apple's App Privacy Report pushed that idea into the user interface by showing things like data and sensor access, network activity, and contacted domains. Research around that style of privacy reporting points to the next problem too: raw visibility is not enough if users cannot understand the purpose behind what they are seeing.&lt;/p&gt;

&lt;p&gt;That is the part developers should steal.&lt;/p&gt;

&lt;p&gt;The strongest privacy posture is boring: the visible behavior matches the explanation.&lt;/p&gt;

&lt;p&gt;If the app says it works locally, a network log should not look suspicious.&lt;/p&gt;

&lt;p&gt;If the app says export is user controlled, there should be an obvious export action.&lt;/p&gt;

&lt;p&gt;If the app needs permissions, the product should explain why before the OS prompt makes everything feel abrupt.&lt;/p&gt;

&lt;p&gt;If the tool processes sensitive assets, the processing path should be boring enough that a skeptical user can understand it.&lt;/p&gt;

&lt;p&gt;Privacy copy should be the receipt, not the substitute.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local-first only helps when the boundary is concrete
&lt;/h2&gt;

&lt;p&gt;Local processing is one of the easiest boundaries to understand when it is real.&lt;/p&gt;

&lt;p&gt;A narrow browser utility is a good example. If someone uploads an image, resizes it, previews the output, and downloads the result without sending the image pixels to a server, the privacy story is not complicated. It is just constrained.&lt;/p&gt;

&lt;p&gt;That is why tools like &lt;a href="https://resizeimagefor.com" rel="noopener noreferrer"&gt;Resize Image For&lt;/a&gt; are useful examples in this conversation. The point is not that every app should be an image resizer. The point is that the workflow has a small, explainable boundary: upload in the browser, process locally, preview the result, download the file.&lt;/p&gt;

&lt;p&gt;That kind of design does not need dramatic privacy language. It needs the implementation to stay inside the box it describes.&lt;/p&gt;

&lt;p&gt;The same idea applies to AI-built apps.&lt;/p&gt;

&lt;p&gt;If the app can avoid a permission, avoid it.&lt;/p&gt;

&lt;p&gt;If it can process locally, process locally.&lt;/p&gt;

&lt;p&gt;If it needs the network, make the network behavior legible.&lt;/p&gt;

&lt;p&gt;If it exports data, make export explicit.&lt;/p&gt;

&lt;p&gt;If telemetry is not essential, do not add it just because every product analytics template assumes it.&lt;/p&gt;

&lt;p&gt;The boring boundary is the feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  The checklist I would use before shipping
&lt;/h2&gt;

&lt;p&gt;If an AI coding tool helped build your app, the privacy review should get more explicit, not less. Generated code can be fine. It can also include defaults you did not notice, dependencies you did not inspect, and flows that feel harmless until someone asks where the data goes.&lt;/p&gt;

&lt;p&gt;I would start with a blunt checklist.&lt;/p&gt;

&lt;p&gt;List the data the app can see. Not the data you think of as "private." All of it. Device signals, files, clipboard access, location, camera, contacts, account identifiers, logs, generated outputs, uploaded assets, and metadata.&lt;/p&gt;

&lt;p&gt;Separate passive visibility from permission-gated access. If the app can see something without a prompt, say so internally. That is exactly the kind of thing users do not expect.&lt;/p&gt;

&lt;p&gt;Write down every network path. Domains, endpoints, analytics, error reporting, update checks, model calls, storage, payment flows, whatever applies. If you cannot explain why a request exists, it probably should not survive review.&lt;/p&gt;

&lt;p&gt;Make export a user action. Silent movement of data is where trust starts leaking. If users are creating a report, saving a file, sharing an asset, or sending something to another service, make the moment obvious.&lt;/p&gt;

&lt;p&gt;Prefer narrow permissions. Ask for the thing you need, when you need it. Broad permissions feel convenient right up until they become the whole risk profile.&lt;/p&gt;

&lt;p&gt;Test the privacy story like a feature. Open the app with network inspection. Trigger the main flows. Check what leaves the machine. Check what persists after refresh or restart. Check what happens when permissions are denied. The README should match what the app actually does.&lt;/p&gt;

&lt;p&gt;Then make the app say what it does in plain language.&lt;/p&gt;

&lt;p&gt;Not a wall of policy text. Not "military-grade privacy." Just the operational truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trust boundary is still yours
&lt;/h2&gt;

&lt;p&gt;AI-assisted development changes the cost of building software. It does not change the accountability model.&lt;/p&gt;

&lt;p&gt;That is the part I think a lot of builders are going to learn the awkward way. The app may have been cheap to produce, but the user's trust is not cheaper. The permissions still count. The network calls still count. The data paths still count. The unclear export flow still counts.&lt;/p&gt;

&lt;p&gt;The best AI-built tools will not be the ones that apologize for being AI-built.&lt;/p&gt;

&lt;p&gt;They will be the ones where the implementation, the interface, and the privacy explanation all point in the same direction.&lt;/p&gt;

&lt;p&gt;AI can help ship the interface.&lt;/p&gt;

&lt;p&gt;The trust boundary is still yours.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/mysk-research/loupe" rel="noopener noreferrer"&gt;mysk-research/loupe&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/news" rel="noopener noreferrer"&gt;Hacker News front-page Loupe signal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wired.com/story/ios-15-app-privacy-report/" rel="noopener noreferrer"&gt;How to Read Your iOS 15 App Privacy Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2511.00467" rel="noopener noreferrer"&gt;A Big Step Forward? A User-Centric Examination of iOS App Privacy Report and Enhancements&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
      <category>privacy</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>MCP's real production problem is the trust boundary</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Sun, 21 Jun 2026 08:03:37 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/mcps-real-production-problem-is-the-trust-boundary-gg9</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/mcps-real-production-problem-is-the-trust-boundary-gg9</guid>
      <description>&lt;p&gt;"It connected" is not production readiness.&lt;/p&gt;

&lt;p&gt;That is the demo milestone. It is useful, sure. The first time an agent calls a real tool, pulls data from a real service, or edits something outside its own chat box, the whole thing suddenly feels less like autocomplete and more like infrastructure.&lt;/p&gt;

&lt;p&gt;But production is where the easy excitement gets boring.&lt;/p&gt;

&lt;p&gt;The hard question is not "can the agent call a tool?" The hard question is "can I understand exactly what authority crossed that boundary, what resource it touched, what the model was shown, what the user approved, and how I undo it when something feels wrong?"&lt;/p&gt;

&lt;p&gt;That is the part MCP teams need to take seriously.&lt;/p&gt;

&lt;p&gt;MCP makes tool attachment feel clean. That is the whole appeal. A host can talk to servers. Servers expose tools and resources. Agents get a common way to reach local workflows, SaaS APIs, docs, databases, browser state, repo context, and all the other messy places where work actually lives.&lt;/p&gt;

&lt;p&gt;Great.&lt;/p&gt;

&lt;p&gt;Now every one of those connections is a trust boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  The demo boundary is too small
&lt;/h2&gt;

&lt;p&gt;Most MCP demos focus on the happy path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;connect the server&lt;/li&gt;
&lt;li&gt;list the tools&lt;/li&gt;
&lt;li&gt;ask the agent to do something&lt;/li&gt;
&lt;li&gt;watch the tool call happen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a reasonable demo. It is also nowhere near enough for a production workflow.&lt;/p&gt;

&lt;p&gt;The production boundary is bigger. It includes the host, the MCP client, the server, the authorization server, the resource being accessed, the model context, the tool metadata, the approval UI, the logs, and the human who has to review the result later.&lt;/p&gt;

&lt;p&gt;If that sounds like too much surface area, that is the point. The moment an agent can call tools, your security model is no longer just "does the API endpoint require auth?" It becomes "what did the agent believe this tool was, who gave it authority, and what could it do with that authority?"&lt;/p&gt;

&lt;p&gt;That is a much more annoying question. It is also the useful one.&lt;/p&gt;

&lt;p&gt;I do not think this means MCP is broken. The opposite, really. MCP is getting real enough that the boring boundary questions matter now. Standards only become interesting when people start depending on them.&lt;/p&gt;

&lt;h2&gt;
  
  
  OAuth is more than a login screen
&lt;/h2&gt;

&lt;p&gt;The MCP authorization spec is a good reminder that remote tool use changes the shape of auth.&lt;/p&gt;

&lt;p&gt;When an MCP server runs over HTTP and touches user-linked services, it is not enough to wave at OAuth and call it done. The spec frames protected MCP servers as OAuth resource servers and MCP clients as OAuth clients. That means the boring details matter: protected resource metadata, authorization server metadata, resource indicators, bearer tokens, token audience, PKCE, and scope boundaries.&lt;/p&gt;

&lt;p&gt;This is where a lot of "agent tool" thinking gets sloppy.&lt;/p&gt;

&lt;p&gt;A token is not a magic permission blob that should be passed around until something works. A token is authority. If the wrong service can accept it, or the wrong layer can replay it, or the client cannot tell which resource it was meant for, you have not built a helpful shortcut. You have built confusion into the system.&lt;/p&gt;

&lt;p&gt;The official guidance is direct about token passthrough. Treating a token issued for one service as a convenient credential for another service is a boundary failure. It may make a prototype easier. It also makes the trust model harder to explain, harder to audit, and harder to recover from.&lt;/p&gt;

&lt;p&gt;This is the part developers should bring back into everyday review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What resource is this token actually for?&lt;/li&gt;
&lt;li&gt;Which client is allowed to use it?&lt;/li&gt;
&lt;li&gt;What scopes were granted?&lt;/li&gt;
&lt;li&gt;Can the server validate the token audience?&lt;/li&gt;
&lt;li&gt;Can a user revoke this path without tearing down everything else?&lt;/li&gt;
&lt;li&gt;Are local and remote MCP servers being treated differently where they should be?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is glamorous. Good. Permission should be boring.&lt;/p&gt;

&lt;p&gt;The worst version of an agent workflow is one where the auth path works, but nobody can explain it after the fact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool descriptions are not harmless docs
&lt;/h2&gt;

&lt;p&gt;The part that still feels under-discussed is tool metadata.&lt;/p&gt;

&lt;p&gt;In normal software, a description field is usually just documentation. Maybe it shows up in a UI. Maybe someone reads it. Maybe nobody does.&lt;/p&gt;

&lt;p&gt;In an MCP client, tool descriptions and schemas can end up inside model context. That changes their role. They are labels for humans, but they also influence how the model decides what to call, when to call it, and what parameters to send.&lt;/p&gt;

&lt;p&gt;That is why the tool-poisoning research around MCP is worth paying attention to. "A malicious server runs bad code" is the obvious fear. The more subtle failure is a server providing metadata that steers the model toward the wrong behavior.&lt;/p&gt;

&lt;p&gt;That should make every approval dialog feel a little more serious.&lt;/p&gt;

&lt;p&gt;If the client says "approve this tool call," what is the user actually seeing? A friendly tool name? A sanitized summary? The real parameters? The server-provided description? The resource being touched? The authority being used?&lt;/p&gt;

&lt;p&gt;If the answer is "mostly vibes," that is not enough.&lt;/p&gt;

&lt;p&gt;A tool description is part of the attack surface once it influences the model. A schema is part of the attack surface once it shapes the call. An approval UI is part of the security surface once a human is expected to catch mistakes there.&lt;/p&gt;

&lt;p&gt;This is where product design and security stop being separate conversations. The user cannot approve what the interface hides.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local does not automatically mean safe
&lt;/h2&gt;

&lt;p&gt;There is a tempting shortcut in developer tooling: local equals trusted.&lt;/p&gt;

&lt;p&gt;That is sometimes true enough. It is not a rule.&lt;/p&gt;

&lt;p&gt;A local MCP server can still expose too much filesystem access. It can still bridge into credentials. It can still make network calls. It can still pass unreviewed context into the model. It can still become the thing an agent uses because the description sounded convenient.&lt;/p&gt;

&lt;p&gt;Local reduces some risks and increases others. You may avoid a remote auth flow, but you also put the server close to sensitive repo state, shell commands, browser profiles, env files, local databases, and all the half-finished work developers keep on their machines.&lt;/p&gt;

&lt;p&gt;That does not mean "do not run local MCP servers." It means do not skip the boundary review just because the process is on your laptop.&lt;/p&gt;

&lt;p&gt;For a local server, I would still want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which directories it can read&lt;/li&gt;
&lt;li&gt;whether it can write or execute&lt;/li&gt;
&lt;li&gt;what secrets it can see&lt;/li&gt;
&lt;li&gt;what network access it has&lt;/li&gt;
&lt;li&gt;how tools are named and described&lt;/li&gt;
&lt;li&gt;whether calls are logged&lt;/li&gt;
&lt;li&gt;how to disable it quickly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Again, boring. Again, exactly the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The approval screen is developer experience
&lt;/h2&gt;

&lt;p&gt;Security advice often gets written like paperwork. That is unfortunate, because the best MCP safety features are also developer experience features.&lt;/p&gt;

&lt;p&gt;Visible parameters are DX.&lt;/p&gt;

&lt;p&gt;Readable tool descriptions are DX.&lt;/p&gt;

&lt;p&gt;Small scopes are DX.&lt;/p&gt;

&lt;p&gt;Revocation is DX.&lt;/p&gt;

&lt;p&gt;Session handling is DX.&lt;/p&gt;

&lt;p&gt;Audit logs are DX.&lt;/p&gt;

&lt;p&gt;The developer trying to ship with an agent does not want a lecture about confused deputies or token audience validation. They want to know whether this tool call is about to touch the wrong account, write to the wrong repo, post to the wrong workspace, or send private context somewhere it does not belong.&lt;/p&gt;

&lt;p&gt;The UI should make that obvious.&lt;/p&gt;

&lt;p&gt;If the approval step is just a speed bump, people will click through it. If it shows the real resource, the real operation, the real parameters, and the real authority, it becomes part of the workflow.&lt;/p&gt;

&lt;p&gt;That is what production readiness looks like to me. Not an impressive number of connected tools. A system where the next action is legible before it happens and reviewable after it happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical MCP checklist
&lt;/h2&gt;

&lt;p&gt;If I were evaluating an MCP-backed agent workflow before letting it near real work, I would keep the checklist blunt.&lt;/p&gt;

&lt;p&gt;Start with fewer tools than you think you need. Tool sprawl is review sprawl. Every new server adds metadata, permissions, sessions, and failure modes.&lt;/p&gt;

&lt;p&gt;Prefer explicit scopes. If a tool only needs read access, do not give it write access because write access is convenient later. Convenience is how prototypes become weird production incidents.&lt;/p&gt;

&lt;p&gt;Do not pass tokens through layers just to make integration easier. Bind tokens to the right resource and audience. If that sounds annoying, that is probably the boundary doing its job.&lt;/p&gt;

&lt;p&gt;Show parameters before execution. A human should not have to infer what the agent is about to do from a cute tool name.&lt;/p&gt;

&lt;p&gt;Treat tool descriptions as inputs, not decoration. Review them. Keep them short. Make them accurate. Do not let a server smuggle policy into prose that the model will treat as instruction.&lt;/p&gt;

&lt;p&gt;Log calls in a way a developer can actually read. A giant blob of JSON nobody opens is not an audit trail. The useful record says what tool ran, against which resource, with which visible parameters, under which authority, and what happened next.&lt;/p&gt;

&lt;p&gt;Separate local and remote assumptions. A local server may not need OAuth. It still needs a permission story. A remote server may have OAuth. It still needs audience validation, session discipline, and revocation.&lt;/p&gt;

&lt;p&gt;Make rollback obvious. If a tool can mutate state, the workflow needs a way to stop, revoke, revert, or at least explain the damage without detective work.&lt;/p&gt;

&lt;p&gt;Force the agent to say what it did not verify. That one sounds small, but it changes the tone of the whole system. "I called the tool and got a success response" is not the same as "I verified the target resource changed correctly and logged the call."&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust is the product surface now
&lt;/h2&gt;

&lt;p&gt;MCP's value is obvious: common plumbing for agents and tools. I want that world. I do not want every agent platform inventing its own one-off plugin format forever.&lt;/p&gt;

&lt;p&gt;But the useful version of MCP is not the one with the longest tool list.&lt;/p&gt;

&lt;p&gt;The useful version is the one where permission is visible, authority is scoped, metadata is treated as a real input, and a human can reconstruct what happened without reading a detective novel made of logs.&lt;/p&gt;

&lt;p&gt;"The agent called the tool" is a nice demo.&lt;/p&gt;

&lt;p&gt;"The right agent used the right authority against the right resource, showed the parameters, left an audit trail, and can be revoked cleanly" is the production bar.&lt;/p&gt;

&lt;p&gt;That is less flashy. It is also the only version I would trust near real work.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization" rel="noopener noreferrer"&gt;MCP Authorization specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices" rel="noopener noreferrer"&gt;MCP Security Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2603.22489" rel="noopener noreferrer"&gt;Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2605.22333" rel="noopener noreferrer"&gt;A First Measurement Study on Authentication Security in Real-World Remote MCP Servers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/mathenemy/-why-most-production-ready-mcp-servers-actually-arent-1pm2"&gt;Why Most "Production-Ready" MCP Servers Actually Aren't&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol" rel="noopener noreferrer"&gt;modelcontextprotocol/modelcontextprotocol&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>oauth</category>
    </item>
    <item>
      <title>Your AI frontend workflow needs proof, not screenshots</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Sat, 20 Jun 2026 08:12:01 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/your-ai-frontend-workflow-needs-proof-not-screenshots-48mn</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/your-ai-frontend-workflow-needs-proof-not-screenshots-48mn</guid>
      <description>&lt;p&gt;A screenshot is not proof.&lt;/p&gt;

&lt;p&gt;It is an artifact. Sometimes a useful one. Sometimes the fastest way to show that something rendered at least once on at least one machine under at least one pile of hidden state.&lt;/p&gt;

&lt;p&gt;But if an AI agent just changed your frontend and the only evidence is a screenshot, you still do not know enough.&lt;/p&gt;

&lt;p&gt;You do not know which selector failed before the screenshot was taken. You do not know whether the console was clean. You do not know whether the network request returned real data or a mocked happy path. You do not know whether the layout works after refresh, on mobile, behind a feature flag, or with the next bit of state a user is likely to hit.&lt;/p&gt;

&lt;p&gt;The agent can still explain the work beautifully. That is the dangerous part.&lt;/p&gt;

&lt;p&gt;The real bottleneck in AI-assisted frontend work is not whether the model can produce UI code. It can. The bottleneck is whether the workflow can prove what happened when that code reached a browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frontend failure got harder to trust
&lt;/h2&gt;

&lt;p&gt;Hand-written frontend bugs are annoying, but they usually arrive with a trail you understand. You changed the component. You ran the app. You saw the failure. You probably remember the assumption you made.&lt;/p&gt;

&lt;p&gt;Agent-written frontend bugs feel different.&lt;/p&gt;

&lt;p&gt;The agent may touch a component, a hook, a route, a style file, a fixture, and a test in one pass. It may say the implementation is complete. It may say it ran checks. It may even include a neat summary with bullet points that look like a changelog.&lt;/p&gt;

&lt;p&gt;That summary is not evidence.&lt;/p&gt;

&lt;p&gt;Frontend work lives in the browser, which means correctness is spread across DOM state, CSS behavior, event handling, API timing, accessibility, viewport size, persisted state, and the boring little details that never fit into a diff summary. The agent does not get credit for describing success. It gets credit when the workflow leaves enough evidence for a human to inspect failure.&lt;/p&gt;

&lt;p&gt;This is why browser testing discussions around AI work keep feeling more urgent. The question is no longer just "did the test pass?" It is "can you prove why it failed, and can the next run recover without guessing?"&lt;/p&gt;

&lt;p&gt;That is a much better question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local agents moved validation into the environment
&lt;/h2&gt;

&lt;p&gt;The practical shift with local coding agents is that the agent is no longer just a text box. It sits near the repo. It may run shell commands. It may inspect files. It may start a dev server. It may open a browser. It may use editor state, terminal output, local tools, and project-specific rules.&lt;/p&gt;

&lt;p&gt;That makes the surrounding environment part of the product.&lt;/p&gt;

&lt;p&gt;A local setup guide for coding agents is interesting for exactly that reason. The setup details are not just installation trivia. They decide what the agent can observe, mutate, and verify. A weak environment produces weak evidence. A strong environment makes the work legible.&lt;/p&gt;

&lt;p&gt;If the agent can edit UI files but cannot open the page, you have a code generator with extra steps. If it can open the page but does not capture console errors, you have a screenshot machine. If it can run tests but the results disappear into a chat summary, you have theater.&lt;/p&gt;

&lt;p&gt;The useful setup is the one that answers boring questions clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What changed in the diff?&lt;/li&gt;
&lt;li&gt;What command ran?&lt;/li&gt;
&lt;li&gt;What browser state was observed?&lt;/li&gt;
&lt;li&gt;What failed first?&lt;/li&gt;
&lt;li&gt;What evidence survived after the agent finished?&lt;/li&gt;
&lt;li&gt;What should a human review next?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds less exciting than "autonomous frontend engineer." Good. It is also closer to how reliable software gets built.&lt;/p&gt;

&lt;h2&gt;
  
  
  The proof loop matters more than the wrapper
&lt;/h2&gt;

&lt;p&gt;The AI tooling market keeps producing new wrappers for coding agents: local shells, cloud workspaces, async task queues, stage-gated agents, headless engines, and review dashboards.&lt;/p&gt;

&lt;p&gt;Some of that is useful. Some of it is just another place to talk to a model.&lt;/p&gt;

&lt;p&gt;The wrapper only matters if it improves the proof loop.&lt;/p&gt;

&lt;p&gt;A good proof loop ties the browser back to the repo. It does not stop at "the page looks right." It connects the rendered state to the command, the diff, the logs, and the failure mode.&lt;/p&gt;

&lt;p&gt;For frontend work, I want an agent workflow that can leave artifacts like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the exact route or story it opened&lt;/li&gt;
&lt;li&gt;the viewport it used&lt;/li&gt;
&lt;li&gt;the visible state it inspected&lt;/li&gt;
&lt;li&gt;console errors and warnings&lt;/li&gt;
&lt;li&gt;failed selectors or assertions&lt;/li&gt;
&lt;li&gt;network responses that explain missing UI&lt;/li&gt;
&lt;li&gt;screenshots tied to a reproducible step&lt;/li&gt;
&lt;li&gt;the diff that caused the observed behavior&lt;/li&gt;
&lt;li&gt;the command output that proves checks ran&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between a screenshot and proof.&lt;/p&gt;

&lt;p&gt;A screenshot says, "look, it rendered."&lt;/p&gt;

&lt;p&gt;A proof loop says, "this was the state, this is what changed, this is where it failed, and this is how to reproduce it."&lt;/p&gt;

&lt;p&gt;The second one is what lets a developer make a decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terminal and editor surfaces still matter
&lt;/h2&gt;

&lt;p&gt;One funny side effect of the agent era is that boring developer tools feel more important, not less.&lt;/p&gt;

&lt;p&gt;Small terminal-native tools, fast editors, text interfaces, and inspectable command output are still where a lot of recovery happens. A lightweight editor project like Microsoft's &lt;code&gt;edit&lt;/code&gt; is not an AI-agent product, and it does not need to be. Its relevance is simpler: when workflows get more automated, developers need surfaces they can understand quickly when automation gets weird.&lt;/p&gt;

&lt;p&gt;The same applies to terminal UI experiments and CLI-heavy tools. The agent may be doing the work, but the human still needs a place to inspect, interrupt, retry, narrow the scope, and decide whether the output is worth keeping.&lt;/p&gt;

&lt;p&gt;This is where some agent products get the emphasis wrong. They optimize for delegation before they optimize for inspection.&lt;/p&gt;

&lt;p&gt;Delegation without inspection creates review debt.&lt;/p&gt;

&lt;p&gt;Inspection is not glamorous. It is logs, diffs, terminal panes, browser traces, local screenshots, and state that does not vanish when the chat scrolls away. But that is exactly what frontend agents need. The moment the UI fails, the question is not "can the model explain frontend testing?" The question is "what can I inspect right now?"&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical browser-proof workflow
&lt;/h2&gt;

&lt;p&gt;If I were setting up an AI-assisted frontend workflow, I would start with the proof loop before worrying about the agent personality.&lt;/p&gt;

&lt;p&gt;First, make the target explicit. The agent should know the route, story, component, or user flow it is supposed to verify. "Check the UI" is too vague. "Open &lt;code&gt;/settings/billing&lt;/code&gt;, switch to mobile width, submit the empty form, and inspect the validation state" is much better.&lt;/p&gt;

&lt;p&gt;Second, capture the browser state. A useful run should preserve screenshots, but it should also capture console output, failed selectors, network errors, and the current URL. Screenshots are easier to skim, but logs explain why the screenshot happened.&lt;/p&gt;

&lt;p&gt;Third, tie browser evidence to commands. If the agent ran a test, keep the command. If it started a dev server, keep the URL and port. If it changed fixtures, make that visible. A frontend failure is often a bad interaction between app state and test setup, not a single broken component.&lt;/p&gt;

&lt;p&gt;Fourth, keep visual asset prep out of the critical path, but do not ignore it. Frontend teams often need platform-ready screenshots, thumbnails, or social preview images after the UI work is done. For that narrow job, a browser-local tool such as &lt;a href="https://resizeimagefor.com" rel="noopener noreferrer"&gt;Resize Image For&lt;/a&gt; can prepare social-ready image sizes without uploading the source pixels. That belongs as a small workflow step, not as a substitute for browser validation.&lt;/p&gt;

&lt;p&gt;Fifth, make the agent say what it could not prove. This is the part I care about most. A good agent run should be comfortable ending with "I changed the component and verified the desktop route, but I did not verify mobile Safari or the logged-out state." That is not failure. That is useful honesty.&lt;/p&gt;

&lt;h2&gt;
  
  
  Async agents need gates, not vibes
&lt;/h2&gt;

&lt;p&gt;Cloud and async coding-agent products are moving in a predictable direction: isolated execution, task queues, review surfaces, and stage gates.&lt;/p&gt;

&lt;p&gt;That direction makes sense. If an agent is going to work away from your main machine, the environment needs stronger boundaries, not weaker ones. The agent should not just disappear for twenty minutes and come back with a confident paragraph. It should come back with a trail.&lt;/p&gt;

&lt;p&gt;The valuable feature is not "the agent kept working while I was gone."&lt;/p&gt;

&lt;p&gt;The valuable feature is "the agent worked in an isolated place, left reviewable evidence, and stopped before pretending uncertain work was done."&lt;/p&gt;

&lt;p&gt;That distinction matters for frontend work because UI bugs love hidden state. A cloud agent can generate a plausible patch without ever seeing the same browser reality your users see. An async agent can pass a narrow check while missing the interaction that actually breaks. A stage gate is only useful when it forces evidence into the open.&lt;/p&gt;

&lt;p&gt;Otherwise, async just means you receive the uncertainty later.&lt;/p&gt;

&lt;h2&gt;
  
  
  The checklist I would actually use
&lt;/h2&gt;

&lt;p&gt;For a real team, I would keep the evaluation criteria blunt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can the agent open the actual app surface, not just edit files?&lt;/li&gt;
&lt;li&gt;Can it preserve browser evidence without relying on a prose summary?&lt;/li&gt;
&lt;li&gt;Can a reviewer replay the failure?&lt;/li&gt;
&lt;li&gt;Are screenshots paired with logs, selectors, network state, or traces?&lt;/li&gt;
&lt;li&gt;Are diffs small enough to inspect?&lt;/li&gt;
&lt;li&gt;Does the workflow separate "implemented" from "verified"?&lt;/li&gt;
&lt;li&gt;Does the agent clearly say what it did not test?&lt;/li&gt;
&lt;li&gt;Can the same check run again tomorrow?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is underrated. Reproducibility is where a lot of AI workflow demos fall apart. A good demo can be lucky. A good workflow can survive a second run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust the workflow that can explain failure
&lt;/h2&gt;

&lt;p&gt;AI agents are going to write more frontend code. That part is not interesting anymore.&lt;/p&gt;

&lt;p&gt;The interesting part is whether teams build workflows that make the work reviewable. Browser proof, terminal output, editor ergonomics, isolated execution, and stage gates are not side quests. They are the control surface.&lt;/p&gt;

&lt;p&gt;I am skeptical of any agent workflow that can describe success but cannot explain failure.&lt;/p&gt;

&lt;p&gt;Give me the route, the diff, the console output, the screenshot, the failing selector, the command, and the thing the agent did not verify. Then we can talk about trust.&lt;/p&gt;

&lt;p&gt;Until then, a screenshot is just a screenshot.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>frontend</category>
      <category>testing</category>
    </item>
    <item>
      <title>Local Coding Agents Are an Environment Problem</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Fri, 19 Jun 2026 15:07:26 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/local-coding-agents-are-an-environment-problem-1o4p</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/local-coding-agents-are-an-environment-problem-1o4p</guid>
      <description>&lt;p&gt;The prompt is no longer the center of the coding-agent setup.&lt;/p&gt;

&lt;p&gt;That feels strange because most demos still make the prompt look like the whole product. You ask for a feature. The agent reads some files. It edits code. Maybe it runs tests. The clean version fits nicely in a screen recording.&lt;/p&gt;

&lt;p&gt;Real local agents are messier than that. Once the agent can sit near your repo, run commands, inspect files, and use tools, the important question changes.&lt;/p&gt;

&lt;p&gt;It is not "did I write the perfect prompt?"&lt;/p&gt;

&lt;p&gt;It is "what environment did I just give this thing?"&lt;/p&gt;

&lt;p&gt;That is the part developers should be more opinionated about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local changes the trust boundary
&lt;/h2&gt;

&lt;p&gt;A chat assistant is easy to underestimate because the boundary is obvious. You paste context into a box. It gives you text back. The workflow can still go wrong, but at least the shape of the interaction is visible.&lt;/p&gt;

&lt;p&gt;A local coding agent is different. The agent is closer to the machine where work happens. It may touch a shell, local tools, project files, package managers, test runners, credentials, editor state, or MCP servers. Even if every individual permission is reasonable, the combined environment becomes the real product surface.&lt;/p&gt;

&lt;p&gt;That is why a practical macOS setup guide for local coding agents is more interesting than it first looks. The useful signal is not "here is another way to install an AI tool." The useful signal is that agent setup now looks like developer infrastructure.&lt;/p&gt;

&lt;p&gt;You have prerequisites. You have local runtime decisions. You have shell access. You have tool configuration. You have repo proximity. You have the awkward question of what you are comfortable letting an agent see and do.&lt;/p&gt;

&lt;p&gt;A better prompt can improve one answer. A better environment improves the whole loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup is part of the product
&lt;/h2&gt;

&lt;p&gt;Developers already know how much environment design matters. We do not treat CI, local dev containers, lint rules, permissions, or deployment gates as vibes. We treat them as part of the system because they decide what work can happen safely and repeatedly.&lt;/p&gt;

&lt;p&gt;Local agents deserve the same treatment.&lt;/p&gt;

&lt;p&gt;If an agent can edit files but cannot run the right checks, it is a code generator with a blindfold. If it can run commands but nobody can see which commands ran, it is a review problem waiting to happen. If it can connect to every available tool because "more integrations" sounds impressive, the team has created a permission model without admitting it.&lt;/p&gt;

&lt;p&gt;That is the mistake I see people drifting toward: treating local agent setup like a personal productivity preference.&lt;/p&gt;

&lt;p&gt;It is closer to choosing development infrastructure.&lt;/p&gt;

&lt;p&gt;The practical questions are boring, which is a good sign:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What can the agent read?&lt;/li&gt;
&lt;li&gt;What can it edit?&lt;/li&gt;
&lt;li&gt;What commands can it run?&lt;/li&gt;
&lt;li&gt;Which tools are available by default?&lt;/li&gt;
&lt;li&gt;Where does state live?&lt;/li&gt;
&lt;li&gt;Can another developer reproduce the setup?&lt;/li&gt;
&lt;li&gt;What evidence does the agent leave behind after it acts?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those answers are fuzzy, the prompt will not save you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Small capabilities beat vague autonomy
&lt;/h2&gt;

&lt;p&gt;One of the healthier patterns showing up around agent tooling is the move toward small, inspectable capabilities.&lt;/p&gt;

&lt;p&gt;Projects like Superpowers point at that direction. Even with limited readable material, the signal is clear enough: developers want reusable affordances that can be understood, composed, and reused. That is much better than stuffing every expectation into a giant prompt and hoping the agent remembers the important parts.&lt;/p&gt;

&lt;p&gt;A capability can be reviewed. A prompt blob usually cannot.&lt;/p&gt;

&lt;p&gt;This matters because agent behavior becomes less mysterious when the workflow is broken into named pieces. A skill for gathering sources. A rule for editing a specific project. A script that validates output. A checklist that defines "done" for a platform. None of these is glamorous, but they turn agent work into something a teammate can inspect.&lt;/p&gt;

&lt;p&gt;The same idea applies to local coding work. A scoped capability that says "run this test command and summarize failures" is easier to trust than an open-ended instruction like "make sure everything works." The first one leaves a trail. The second one invites theater.&lt;/p&gt;

&lt;p&gt;This is where agent systems start to look less like magic and more like software.&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;Software has boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP needs governance, not connector collecting
&lt;/h2&gt;

&lt;p&gt;MCP-style tooling makes this more obvious.&lt;/p&gt;

&lt;p&gt;The interesting part of MCP is not that an agent can connect to more things. Connection count is a bad metric. A local agent with access to ten tools is not automatically better than one with access to three. It may just have a larger blast radius.&lt;/p&gt;

&lt;p&gt;The useful question is what each tool lets the agent do.&lt;/p&gt;

&lt;p&gt;Can it read only, or can it mutate state? Can it reach production systems? Can it write files? Can it call external services? Does it expose secrets by accident? Does the human reviewer know when the agent used it?&lt;/p&gt;

&lt;p&gt;Projects like Paca are useful signals because they show tool access becoming infrastructure. Once agent tools are infrastructure, teams need the same instincts they use everywhere else: least privilege, auditability, clear ownership, and boring defaults.&lt;/p&gt;

&lt;p&gt;This does not mean every local agent needs enterprise ceremony. A solo developer hacking on a side project can accept different risks than a team working near customer data.&lt;/p&gt;

&lt;p&gt;But the distinction should be explicit. "It is local" does not automatically mean "it is safe." Local control gives you more visibility and more responsibility at the same time.&lt;/p&gt;

&lt;h2&gt;
  
  
  More output still needs review
&lt;/h2&gt;

&lt;p&gt;The community debate around AI coding tools keeps circling one painful point: output is not the same as leverage.&lt;/p&gt;

&lt;p&gt;Agents can create more code, more branches, more suggestions, more summaries, and more things for a human to look at. That can help. It can also turn into review debt if the environment does not make the work legible.&lt;/p&gt;

&lt;p&gt;HN discussions around AI coding tools often land in that messy middle. The argument is less "good or bad" than "where did the cost move?" Did the agent remove work, or did it move work into review? Did it solve the task, or did it produce a plausible diff that now needs forensic reading?&lt;/p&gt;

&lt;p&gt;That is why local-agent environments need review surfaces as much as execution surfaces.&lt;/p&gt;

&lt;p&gt;Show what files were read. Show what commands ran. Keep diffs small enough to scan. Make assumptions visible. Preserve logs. Prefer workflows that can fail clearly over workflows that half-succeed with confidence.&lt;/p&gt;

&lt;p&gt;The local setup should make the human's job easier after the agent acts. If it only makes the agent faster, the team may not be faster at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical checklist for local-agent environments
&lt;/h2&gt;

&lt;p&gt;If I were evaluating a local coding-agent setup, I would mostly ignore the impressive demo for the first few minutes.&lt;/p&gt;

&lt;p&gt;I would ask about the loop.&lt;/p&gt;

&lt;p&gt;Can the agent explain where its context came from? Repo files, docs, previous runs, issue text, skills, and local rules all shape the answer. A reviewer should not have to guess which ones mattered.&lt;/p&gt;

&lt;p&gt;Can permissions be scoped without heroics? Read access, write access, shell access, network access, and tool access are separate concerns. A setup that treats them as one big yes/no switch is asking for trouble.&lt;/p&gt;

&lt;p&gt;Are reusable capabilities inspectable? If a skill changes how the agent behaves, it should be easy to read. If a tool can mutate state, that should be obvious before the agent uses it.&lt;/p&gt;

&lt;p&gt;Does the workflow leave evidence? A local agent that runs tests should leave the command and result somewhere visible. A local agent that edits code should make the diff easy to review. A local agent that gets blocked should write down the blocker instead of pretending the task is basically done.&lt;/p&gt;

&lt;p&gt;Can the setup be shared? A personal pile of shell aliases and hidden assumptions might work for one developer. It becomes fragile the moment a team tries to rely on it.&lt;/p&gt;

&lt;p&gt;Where does human ownership enter the loop? This is the question teams tend to dodge. If a human owns the final merge, optimize for review. If the agent owns more of the path, the gates need to be much stricter.&lt;/p&gt;

&lt;p&gt;None of this requires fear. It requires taste.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust the environment before the output
&lt;/h2&gt;

&lt;p&gt;Local coding agents are compelling because they move AI work closer to the place where software is actually built.&lt;/p&gt;

&lt;p&gt;That is also what makes them risky.&lt;/p&gt;

&lt;p&gt;The model matters. The prompt matters. But the environment carries more of the risk than people want to admit: runtime, permissions, tools, capabilities, logs, review gates, and the habits a team builds around them.&lt;/p&gt;

&lt;p&gt;I am skeptical of any agent setup that cannot explain its own work. I am much more interested in setups that make boring things visible: what the agent saw, what it changed, what it ran, what failed, and where the human is expected to take over.&lt;/p&gt;

&lt;p&gt;That is the real local-agent test.&lt;/p&gt;

&lt;p&gt;Trust the environment before you trust the output.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ikyle.me/blog/2026/how-to-setup-a-local-coding-agent-on-macos" rel="noopener noreferrer"&gt;How to setup a local coding agent on macOS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;Superpowers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/xmcp/paca" rel="noopener noreferrer"&gt;Paca&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.ycombinator.com/item?id=44226761" rel="noopener noreferrer"&gt;1,600 Software Engineers Were Asked About AI Coding Tools. Here Is What They Said&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/LocalLLaMA/" rel="noopener noreferrer"&gt;What are you using LLMs for?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>mcp</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AI Coding Speed Is Cheap. Control Debt Is the Real Cost</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Fri, 19 Jun 2026 13:55:05 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/ai-coding-speed-is-cheap-control-debt-is-the-real-cost-3gh9</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/ai-coding-speed-is-cheap-control-debt-is-the-real-cost-3gh9</guid>
      <description>&lt;h2&gt;
  
  
  The code is cheap now. Staying in control is not
&lt;/h2&gt;

&lt;p&gt;Teams keep measuring the wrong thing.&lt;/p&gt;

&lt;p&gt;Yes, AI makes code cheaper. That part is obvious. The non-obvious part is that faster generation does not make understanding, review, or safe change management any cheaper. If anything, it makes the gap worse.&lt;/p&gt;

&lt;p&gt;That gap is where control debt shows up.&lt;/p&gt;

&lt;p&gt;Control debt is what happens when a team can keep shipping changes but can no longer explain them cleanly, verify them fast enough, or steer the system without guessing. The codebase keeps moving. Human control lags behind. People call that "productivity" right up until a bug report, a rollback, or a scary refactor reveals the bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Control debt shows up in three different ways
&lt;/h2&gt;

&lt;p&gt;The first kind is cognitive debt.&lt;/p&gt;

&lt;p&gt;You merge the feature. Two days later you can still point at the files, but you cannot give a confident explanation of how the behavior actually works. Parts of the codebase already feel like someone else's project.&lt;/p&gt;

&lt;p&gt;The second kind is verification debt.&lt;/p&gt;

&lt;p&gt;The agent can produce another diff before the reviewer finishes reading the last one. Tests help, but green tests only tell you something passed. They do not prove the team understands the change, the assumptions behind it, or the blast radius of the next edit.&lt;/p&gt;

&lt;p&gt;The third kind is architectural debt.&lt;/p&gt;

&lt;p&gt;This one is slower and nastier. Local choices keep working just well enough to merge, while the shape of the system gets worse: duplicated patterns, awkward seams, brittle abstractions, and code that technically functions but fits the codebase less every week.&lt;/p&gt;

&lt;p&gt;Those are different problems. They compound fast. Once understanding drops, review quality drops. Once review quality drops, architecture starts drifting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Invisible agent work is where trust dies
&lt;/h2&gt;

&lt;p&gt;A lot of people think the problem is code volume. Not quite. The more immediate problem is invisible work.&lt;/p&gt;

&lt;p&gt;The useful pattern in emerging agent tooling is not "look, cool terminal UI." It is visibility. Context pressure. Active tools. Running workers. Todo state. Transcript access. The whole point is to make agent behavior inspectable before the operator loses the plot.&lt;/p&gt;

&lt;p&gt;That is the real control surface.&lt;/p&gt;

&lt;p&gt;If an agent can read files, call tools, spawn workers, and continue asynchronously, observability stops being a nice extra. It becomes part of the review system. You do not need perfect omniscience. You do need enough visibility to answer a simple question at any moment: what is this thing doing on my behalf right now?&lt;/p&gt;

&lt;h2&gt;
  
  
  Async control needs hard edges
&lt;/h2&gt;

&lt;p&gt;This gets more serious once sessions can accept outside events while they are still running.&lt;/p&gt;

&lt;p&gt;That sounds powerful because it is powerful. A human can redirect work mid-run instead of restarting everything from zero. But that only helps when the workflow has explicit edges.&lt;/p&gt;

&lt;p&gt;Which sessions are allowed to accept outside input? Who is allowed to send it? What kinds of interruption are safe? When does a mid-run redirect help, and when does it just scramble state?&lt;/p&gt;

&lt;p&gt;If the answers are fuzzy, "autonomy" becomes a polite word for unattended drift.&lt;/p&gt;

&lt;p&gt;The rule is simple: if a system supports async steering, it also needs opt-in sessions, clear sender limits, and known interruption rules. Otherwise the control plane is just another source of chaos.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical control stack
&lt;/h2&gt;

&lt;p&gt;Most teams do not need a grand theory here. They need operating discipline.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep diffs review-sized. If a human cannot explain the change honestly, the change is too large to merge casually.&lt;/li&gt;
&lt;li&gt;Separate generation from ownership. "The model produced this" and "the team now owns this" should be treated as different workflow stages.&lt;/li&gt;
&lt;li&gt;Ask for explainability, not just green tests. Teams should be able to answer why the code exists, what assumptions it makes, and what breaks when inputs change.&lt;/li&gt;
&lt;li&gt;Make agent activity visible. Tool activity, context pressure, active tasks, and pending work help humans recover the plot before drift gets expensive.&lt;/li&gt;
&lt;li&gt;Put hard limits around async steering. If the system allows event injection or mid-run redirection, it also needs explicit rules for who can intervene and how.&lt;/li&gt;
&lt;li&gt;Slow down before merge when the system is moving faster than the reviewer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of that is glamorous. That is the point. Good control usually looks boring right up until it saves you from a mess.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mistake people make
&lt;/h2&gt;

&lt;p&gt;The mistake is thinking AI coding creates a pure speed game.&lt;/p&gt;

&lt;p&gt;It does create a speed game, but only for output. Everything else stays stubbornly physical. Humans still need to recover intent. Teams still need to verify behavior. Systems still rot when nobody owns the shape of the code.&lt;/p&gt;

&lt;p&gt;So the real bottleneck is not generation anymore. It is recoverability.&lt;/p&gt;

&lt;p&gt;If you cannot tell what changed, why it changed, and whether the next person can change it safely, you are not moving fast. You are borrowing confidence from the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;AI tools are making it cheaper to produce code. They are not making it cheaper to stay in control of a codebase.&lt;/p&gt;

&lt;p&gt;That is the debt worth naming.&lt;/p&gt;

&lt;p&gt;If teams do not design for visibility, review, and bounded intervention, they will keep celebrating output while quietly losing ownership. And once ownership goes, the speed win stops being real.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/harsh2644/ai-is-creating-a-new-kind-of-tech-debt-and-nobody-is-talking-about-it-3pm6"&gt;AI Is Creating a New Kind of Tech Debt - And Nobody Is Talking About It&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/jarrodwatts/claude-hud/main/README.md" rel="noopener noreferrer"&gt;Claude HUD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/channels" rel="noopener noreferrer"&gt;Push events into a running session with channels&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/vibecoding/comments/1ryr12i/i_no_longer_know_more_than_47_of_my_apps_code/" rel="noopener noreferrer"&gt;I no longer know more than 47% of my app's code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>productivity</category>
    </item>
    <item>
      <title>Your Repo Context Is an Attack Surface Now</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Thu, 18 Jun 2026 02:16:48 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/your-repo-context-is-an-attack-surface-now-5dhj</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/your-repo-context-is-an-attack-surface-now-5dhj</guid>
      <description>&lt;p&gt;The lazy version of AI coding security is "make sure the model does not write insecure code."&lt;/p&gt;

&lt;p&gt;That is not wrong. It is just too small.&lt;/p&gt;

&lt;p&gt;The more interesting problem is everything the agent reads before it writes code, plus everything it is allowed to run after it decides what to do. Your repo is no longer just a place where code lives. For an agentic coding tool, it is part of the input stream.&lt;/p&gt;

&lt;p&gt;That changes the security model.&lt;/p&gt;

&lt;p&gt;Old docs, stale examples, local instruction files, hidden project conventions, dependency scripts, shell hooks, webhooks, memories, delegated workers, and previous diffs can all become steering material. Some of that context is useful. Some of it is garbage. Some of it might be hostile.&lt;/p&gt;

&lt;p&gt;This is where the agent hype gets painfully normal. The risk is not magic. It is the same automation risk developers already know, moved closer to the editor and wrapped in a model that is very good at sounding confident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context is not background anymore
&lt;/h2&gt;

&lt;p&gt;Developers tend to treat repo context as neutral.&lt;/p&gt;

&lt;p&gt;The README is just the README. The old migration notes are just old migration notes. The examples in &lt;code&gt;docs/&lt;/code&gt; are just examples. The hook config is just a convenience thing someone added last quarter.&lt;/p&gt;

&lt;p&gt;An agent does not necessarily see that social context. It sees text, tools, paths, commands, and patterns. If a coding assistant uses project context to decide what "normal" looks like, then all of that material can affect the output.&lt;/p&gt;

&lt;p&gt;That does not mean every agent reads every git object, hidden file, or forgotten note. Overstating this makes the whole discussion worse. The real point is narrower and more useful: once a tool can use local context to shape behavior, local context becomes part of the trust boundary.&lt;/p&gt;

&lt;p&gt;That is a big shift for teams that have spent years treating docs and examples as low-risk clutter.&lt;/p&gt;

&lt;p&gt;Bad context can be boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;outdated setup instructions&lt;/li&gt;
&lt;li&gt;examples that use deprecated APIs&lt;/li&gt;
&lt;li&gt;old architecture notes that no longer match production&lt;/li&gt;
&lt;li&gt;test fixtures that encode unsafe assumptions&lt;/li&gt;
&lt;li&gt;copied snippets with weak security defaults&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad context can also be adversarial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt-injection-style instructions inside files the agent may read&lt;/li&gt;
&lt;li&gt;dependency scripts that run more than expected&lt;/li&gt;
&lt;li&gt;hook configuration that turns a local command into a larger execution path&lt;/li&gt;
&lt;li&gt;poisoned examples that nudge future changes toward unsafe patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Either way, the failure mode is the same. The agent builds on a premise you did not mean to endorse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks deserve the same suspicion as build scripts
&lt;/h2&gt;

&lt;p&gt;The fastest way to make an agent useful is to let it do things.&lt;/p&gt;

&lt;p&gt;Run the formatter. Execute tests. Search files. Open pull requests. Call project scripts. Trigger webhooks. Hand work to another agent. That is the good stuff. It is also where the blast radius starts.&lt;/p&gt;

&lt;p&gt;A hook system is not "just a productivity feature" once it can run commands in a real developer environment. It is automation. Treat it like automation.&lt;/p&gt;

&lt;p&gt;That means asking basic, unfashionable questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who can edit this hook?&lt;/li&gt;
&lt;li&gt;What command does it run?&lt;/li&gt;
&lt;li&gt;What environment variables can it see?&lt;/li&gt;
&lt;li&gt;Does it inherit developer credentials?&lt;/li&gt;
&lt;li&gt;Can a package install script affect it?&lt;/li&gt;
&lt;li&gt;Does it write outside the repo?&lt;/li&gt;
&lt;li&gt;Is there a log when it fires?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is new security wisdom. CI pipelines, build scripts, and dependency installers have lived in this world for years. The difference is that coding agents make local automation feel conversational and lightweight. That feeling is dangerous.&lt;/p&gt;

&lt;p&gt;If a compromised package, sloppy hook, or over-permissive token can turn a small agent action into a machine-level event, the model is not the only thing you need to audit. The surrounding workflow matters more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory and delegation widen the surface
&lt;/h2&gt;

&lt;p&gt;Agent platforms are moving away from single-turn chat. That is the right direction.&lt;/p&gt;

&lt;p&gt;Memory makes agents less repetitive. Outcome tracking makes them easier to steer. Delegation lets work split across specialized workers. Webhooks and visibility features make the system feel less like a magic text box and more like developer infrastructure.&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;But useful state is still state. Delegation is still delegation. A webhook is still an integration point.&lt;/p&gt;

&lt;p&gt;The mistake is treating these features as pure capability upgrades. They are also governance upgrades, whether the product UI says that out loud or not.&lt;/p&gt;

&lt;p&gt;Once an agent can remember project preferences, assign work, trigger external systems, and operate across a longer task loop, you need to care about the shape of that loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what memory is stored&lt;/li&gt;
&lt;li&gt;who can change it&lt;/li&gt;
&lt;li&gt;when it is used&lt;/li&gt;
&lt;li&gt;which agents can inherit it&lt;/li&gt;
&lt;li&gt;what tools delegated workers can call&lt;/li&gt;
&lt;li&gt;how results are reviewed before they land&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a call to panic. It is a call to stop pretending the agent is only a smarter autocomplete.&lt;/p&gt;

&lt;p&gt;Autocomplete suggests text. Agent workflows can accumulate assumptions, call tools, execute commands, and leave behind changes that future runs may trust. That is a different class of system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical controls are boring, which is good
&lt;/h2&gt;

&lt;p&gt;The right response is not "never use agents." That is unserious.&lt;/p&gt;

&lt;p&gt;The right response is to make the workflow less squishy. You want fewer ambient permissions, smaller scopes, cleaner context, and better records of what happened.&lt;/p&gt;

&lt;p&gt;Start with scope.&lt;/p&gt;

&lt;p&gt;Do not point an agent at the whole world when it only needs three files. Use narrow tasks. Use disposable worktrees when the change is risky. Keep unrelated diffs out of the working tree so the agent does not have to infer which mess is intentional.&lt;/p&gt;

&lt;p&gt;Then audit instructions.&lt;/p&gt;

&lt;p&gt;Read the files your agents are likely to treat as guidance: root docs, agent instruction files, coding standards, examples, old migration notes, and internal checklists. If they are stale, delete or fix them. If they are important, make them explicit. If they contain commands, treat those commands as part of the system.&lt;/p&gt;

&lt;p&gt;Then harden execution.&lt;/p&gt;

&lt;p&gt;Run risky work in a sandbox where possible. Keep credentials scoped. Avoid ambient secrets in the shell. Review hook configuration like you would review CI. Pin or at least inspect dependency behavior for workflows that agents can trigger. Make sure "run tests" does not secretly mean "run every script with every token available."&lt;/p&gt;

&lt;p&gt;Finally, demand visibility.&lt;/p&gt;

&lt;p&gt;You should be able to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the agent read?&lt;/li&gt;
&lt;li&gt;What tools did it call?&lt;/li&gt;
&lt;li&gt;What files did it change?&lt;/li&gt;
&lt;li&gt;What commands ran?&lt;/li&gt;
&lt;li&gt;What assumptions did it make?&lt;/li&gt;
&lt;li&gt;What still needs human review?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer is "I think it was fine," the workflow is not mature enough for high-blast-radius work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production use still needs human ownership
&lt;/h2&gt;

&lt;p&gt;Developer discussions around production AI coding keep circling the same point: people are using these tools for serious work, but they do not get to outsource judgment.&lt;/p&gt;

&lt;p&gt;That feels right.&lt;/p&gt;

&lt;p&gt;Agents can move fast through known terrain. They can scaffold, refactor, inspect, summarize, and wire things together. They can also follow the wrong context with perfect confidence. The person operating the system still owns architecture, credentials, review, test quality, and release decisions.&lt;/p&gt;

&lt;p&gt;This is the part teams should make explicit.&lt;/p&gt;

&lt;p&gt;If an agent opens a pull request, the review standard should not drop because the author is non-human. If an agent changes auth code, the security review should get stricter, not softer. If an agent edits scripts or hooks, treat that as infrastructure work. If an agent claims the tests pass, check which tests ran and what they prove.&lt;/p&gt;

&lt;p&gt;The best teams will not be the ones that ban agentic coding. They also will not be the ones that give every agent a permanent token and a heroic prompt.&lt;/p&gt;

&lt;p&gt;They will be the ones that turn agent work into a boring, inspectable development process.&lt;/p&gt;

&lt;h2&gt;
  
  
  A useful mental model
&lt;/h2&gt;

&lt;p&gt;Think about an AI coding agent as a junior developer with shell access, excellent typing speed, strange reading habits, and no social memory of why your repo looks the way it does.&lt;/p&gt;

&lt;p&gt;You would not hand that person production credentials on day one. You would not ask them to rewrite the deployment pipeline without review. You would not let them treat every old note in the repo as current policy. You would give them a small task, a clean context, limited permissions, and a review path.&lt;/p&gt;

&lt;p&gt;That model is not perfect, but it gets the posture right.&lt;/p&gt;

&lt;p&gt;The agent is not evil. The repo is not cursed. The problem is trust.&lt;/p&gt;

&lt;p&gt;Once repo context becomes model input and local automation becomes agent action, the boundary moves. Security has to move with it.&lt;/p&gt;

&lt;p&gt;The practical takeaway is simple: clean the context, narrow the scope, sandbox the execution, log the actions, and review the diff like it matters.&lt;/p&gt;

&lt;p&gt;Because it does.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/piiiico/git-history-as-an-attack-surface-22dh"&gt;Git History as an Attack Surface&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lord.technology/2026/05/02/claude-codes-hook-system-just-got-weaponised.html" rel="noopener noreferrer"&gt;Claude Code's hook system just got weaponised&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://claude.com/blog/new-in-claude-managed-agents" rel="noopener noreferrer"&gt;New in Claude Managed Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/alexmercedcoder/ai-weekly-free-web-tools-mcp-production-wins-trusted-compute-models-april-30-may-6-2026-325h"&gt;AI Weekly: Free Web Tools, MCP Production Wins, Trusted-Compute Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/ClaudeCode/comments/1t818g9/production_level_software_by_ai/" rel="noopener noreferrer"&gt;Production Level Software by AI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>devops</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Stop Calling It Vibe Coding When You Need Engineering</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Wed, 17 Jun 2026 03:15:48 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/stop-calling-it-vibe-coding-when-you-need-engineering-48o6</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/stop-calling-it-vibe-coding-when-you-need-engineering-48o6</guid>
      <description>&lt;p&gt;The most useful thing about "vibe coding" is also the thing that makes it dangerous: it feels like progress before the system has earned your trust.&lt;/p&gt;

&lt;p&gt;You describe the app. The model writes a lot of code. The demo starts to move. For prototypes, that is magic. For production software, it is where the bill starts.&lt;/p&gt;

&lt;p&gt;The mistake is treating the first 70% as proof that the last 30% will be easy. It usually is not. The last 30% is where the vague requirements become edge cases, the generated architecture starts pushing back, and the missing tests stop being a detail.&lt;/p&gt;

&lt;p&gt;That is why the more interesting shift is not "AI writes code now." We already know that. The real shift is from vibe coding to agentic engineering: structured work where agents operate inside specs, tests, memory, review loops, and clear human control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe coding is great until the code matters
&lt;/h2&gt;

&lt;p&gt;Vibe coding works best when the cost of being wrong is low.&lt;/p&gt;

&lt;p&gt;Want to explore a UI idea? Fine. Want to build a throwaway internal script? Great. Want to generate a starter app so you can see the shape of a product? That is a legitimate use case.&lt;/p&gt;

&lt;p&gt;The problem starts when the prototype quietly becomes the foundation.&lt;/p&gt;

&lt;p&gt;AI-generated code can look more finished than it is. It may compile. It may even pass the happy-path click test. But production work has a different standard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can another developer understand the structure?&lt;/li&gt;
&lt;li&gt;Are the failure cases explicit?&lt;/li&gt;
&lt;li&gt;Do tests cover the behavior that matters?&lt;/li&gt;
&lt;li&gt;Is the architecture still sane after the fifth change request?&lt;/li&gt;
&lt;li&gt;Can you safely modify it next month?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the part vibe coding tends to hide. The model can generate volume faster than you can inspect intent. If the workflow is just "prompt, accept, prompt again," you are not removing engineering work. You are moving it downstream, where it is harder to see.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 70% problem is really a trust problem
&lt;/h2&gt;

&lt;p&gt;The "70% problem" is a good way to frame this: AI gets you impressively far, then the remaining work becomes weirdly expensive.&lt;/p&gt;

&lt;p&gt;That does not mean AI coding is bad. It means code generation is not the same as software delivery.&lt;/p&gt;

&lt;p&gt;The first 70% rewards speed. The last 30% rewards judgment. Those are different muscles.&lt;/p&gt;

&lt;p&gt;Early on, the agent can make broad moves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scaffold the app&lt;/li&gt;
&lt;li&gt;wire up common patterns&lt;/li&gt;
&lt;li&gt;generate boilerplate&lt;/li&gt;
&lt;li&gt;suggest APIs&lt;/li&gt;
&lt;li&gt;implement obvious flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Later, the work becomes less about typing and more about control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deciding what should not be abstracted&lt;/li&gt;
&lt;li&gt;catching incorrect assumptions&lt;/li&gt;
&lt;li&gt;tightening data boundaries&lt;/li&gt;
&lt;li&gt;deleting clever-but-useless code&lt;/li&gt;
&lt;li&gt;proving behavior with tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why serious AI coding workflows start to look less like chat and more like engineering operations. You need constraints. You need feedback. You need durable context. You need a way to say, "This is the contract. This is the test. This is the part you are allowed to change."&lt;/p&gt;

&lt;p&gt;Without that, the agent is just producing plausible text in a code-shaped format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic engineering changes the unit of work
&lt;/h2&gt;

&lt;p&gt;The useful unit is no longer a prompt. It is a task with context, acceptance criteria, tools, and review.&lt;/p&gt;

&lt;p&gt;That sounds less exciting than "build me an app," but it is the difference between a demo and a workflow you can keep using.&lt;/p&gt;

&lt;p&gt;Agentic engineering is the practice of making AI agents operate inside an engineering system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;specs before implementation&lt;/li&gt;
&lt;li&gt;tests before trust&lt;/li&gt;
&lt;li&gt;small scopes instead of giant rewrites&lt;/li&gt;
&lt;li&gt;file-based handoffs instead of chat memory guesses&lt;/li&gt;
&lt;li&gt;human review at the points where judgment matters&lt;/li&gt;
&lt;li&gt;repeatable skills for common work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where tools like Hermes Agent are worth watching. The interesting part is not that it is another chatbot interface. The project points toward a more operational model: agents with memory, custom skills, subagents, and deployment options that let them run as part of a workflow instead of sitting off to the side as a text box.&lt;/p&gt;

&lt;p&gt;That is a different posture. A coding assistant answers. An engineering agent should remember, delegate, run tools, adapt to local patterns, and leave artifacts that humans can audit.&lt;/p&gt;

&lt;p&gt;It still needs supervision. Maybe more supervision, not less. But the supervision moves from babysitting every line to designing the system the agent works inside.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel agents only help if the work is shaped correctly
&lt;/h2&gt;

&lt;p&gt;Once people see agents as workers instead of autocomplete, the next temptation is obvious: run more of them.&lt;/p&gt;

&lt;p&gt;That can help. It can also create a beautiful mess.&lt;/p&gt;

&lt;p&gt;Parallel agents are only useful when the work can be split cleanly. If three agents all edit the same files, disagree about architecture, and invent their own assumptions, you did not gain throughput. You created a merge conflict with confidence.&lt;/p&gt;

&lt;p&gt;The better pattern is boring and effective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one agent explores a specific question&lt;/li&gt;
&lt;li&gt;one agent owns a narrow implementation area&lt;/li&gt;
&lt;li&gt;one agent verifies behavior or checks risks&lt;/li&gt;
&lt;li&gt;all of them write results back to files&lt;/li&gt;
&lt;li&gt;a human or orchestrator integrates the output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why memory and custom skills matter. If every agent starts cold, you spend half the run re-explaining the codebase. If the agent can carry durable project knowledge and reusable workflows, it has a better shot at producing work that fits.&lt;/p&gt;

&lt;p&gt;The goal is not autonomy for its own sake. The goal is less repeated context loading, fewer sloppy handoffs, and faster movement through well-defined tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Programming for agents may need stricter boundaries
&lt;/h2&gt;

&lt;p&gt;The Weft project is another signal in the same direction: developers are starting to think about languages and runtimes where humans, LLMs, and infrastructure are all first-class parts of the system.&lt;/p&gt;

&lt;p&gt;That framing matters because agent work is not just "call an LLM and hope." Durable execution, explicit state, recoverable tasks, and clear boundaries become much more important when a model is allowed to act over time.&lt;/p&gt;

&lt;p&gt;This is where the hype often gets ahead of the engineering.&lt;/p&gt;

&lt;p&gt;Agents are not magically reliable because they can call tools. Tool access gives them more surface area to fail. The workflow has to make failure visible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what did the agent read?&lt;/li&gt;
&lt;li&gt;what did it change?&lt;/li&gt;
&lt;li&gt;what assumptions did it make?&lt;/li&gt;
&lt;li&gt;what tests did it run?&lt;/li&gt;
&lt;li&gt;what still needs human review?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer those questions, you do not have an agentic workflow. You have a longer prompt chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical upgrade path
&lt;/h2&gt;

&lt;p&gt;You do not need to throw away vibe coding. You need to stop using it as the whole process.&lt;/p&gt;

&lt;p&gt;A more production-friendly workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use vibe coding for exploration.&lt;/li&gt;
&lt;li&gt;Freeze the useful direction into a short spec.&lt;/li&gt;
&lt;li&gt;Break the work into small tasks with file ownership.&lt;/li&gt;
&lt;li&gt;Ask the agent to implement against the spec, not the vibe.&lt;/li&gt;
&lt;li&gt;Require tests, logs, or screenshots depending on the change.&lt;/li&gt;
&lt;li&gt;Review the diff like you would review a human teammate's work.&lt;/li&gt;
&lt;li&gt;Capture repeatable patterns as reusable project instructions or skills.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step is underrated. If you keep prompting the same rule every day, it belongs in the system, not in your short-term memory. Agents get more useful when the workflow teaches them how your project actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this still breaks down
&lt;/h2&gt;

&lt;p&gt;Agentic engineering is not a magic maturity badge.&lt;/p&gt;

&lt;p&gt;It can add overhead. It can produce too much process around simple work. It can create false confidence if the agent writes tests that merely confirm its own misunderstanding. It can also make teams lazy about architecture if they assume "the agent will fix it later."&lt;/p&gt;

&lt;p&gt;The rule of thumb is simple: match the process to the blast radius.&lt;/p&gt;

&lt;p&gt;For a prototype, vibe coding is fine. For a user-facing system, you need constraints. For critical paths, you need human review, meaningful tests, and boring operational discipline.&lt;/p&gt;

&lt;p&gt;The future of AI coding is probably not one giant prompt that builds the perfect app. It is smaller, sharper loops where agents do real work inside systems that make their output inspectable.&lt;/p&gt;

&lt;p&gt;That is less magical.&lt;/p&gt;

&lt;p&gt;It is also much closer to engineering.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://addyosmani.com/vibecoding" rel="noopener noreferrer"&gt;Vibe Coding by Addy Osmani&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/Hermes-Agent" rel="noopener noreferrer"&gt;Hermes Agent by Nous Research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/weft-lang" rel="noopener noreferrer"&gt;Weft Programming Language for AI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>tooling</category>
    </item>
    <item>
      <title>Your RAG App Is Broken Because You're Still Parsing PDFs Like It's 2023</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Sun, 14 Jun 2026 03:05:27 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/your-rag-app-is-broken-because-youre-still-parsing-pdfs-like-its-2023-5mn</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/your-rag-app-is-broken-because-youre-still-parsing-pdfs-like-its-2023-5mn</guid>
      <description>&lt;p&gt;Most developers building "chat with your data" apps hit the exact same wall. You chunk the text, embed it, dump it in a vector database, and the retrieval is still terrible. The model hallucinates or completely scrambles tables. &lt;/p&gt;

&lt;p&gt;People think data ingestion is just text extraction. It isn't. In 2026, text extraction is a solved, boring problem. The actual hard part is layout. If your ingestion layer doesn't know that a bold header implies hierarchy, or that a two-column page isn't just one long string of text read left-to-right, your LLM is reading garbage. &lt;/p&gt;

&lt;h2&gt;
  
  
  Markdown won the ingestion war
&lt;/h2&gt;

&lt;p&gt;We've mostly stopped treating PDFs as plain text. Markdown is now the default format for document ingestion, simply because it preserves structure. &lt;/p&gt;

&lt;p&gt;Modern ingestion tools don't just dump strings. They output Markdown where headers, lists, and tables actually mean something. This gives the LLM the context it needs to figure out where a piece of information lived in the original document, which makes citations and retrieval significantly more accurate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local engines vs. Vision models
&lt;/h2&gt;

&lt;p&gt;Right now, there are basically two ways to handle this layout problem.&lt;/p&gt;

&lt;p&gt;First, you have local deterministic engines like IBM's Docling or OpenDataLoader PDF. Docling has quietly become a standard for enterprise RAG because it natively handles the whole Office suite and spits out clean Markdown. It runs locally without a GPU. OpenDataLoader does something similar. If you have a massive volume of private documents, this is the realistic path.&lt;/p&gt;

&lt;p&gt;Then you have the Vision-Language Model (VLM) approach. Instead of trying to parse messy PDF code, tools like Mistral OCR and LlamaParse just look at the document as an image. They see it the way we do. This completely bypasses the nightmare of multi-column layouts and nested tables that broke older parsers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tradeoff
&lt;/h2&gt;

&lt;p&gt;VLM parsing feels like magic, but it's expensive. If you process millions of pages, running everything through a cloud vision API will destroy your budget. &lt;/p&gt;

&lt;p&gt;If I'm building a RAG pipeline today, my default is a robust local engine like Docling for the bulk of the documents. I only reach for the expensive VLM calls when a PDF is too visually complex for the local parser to figure out.&lt;/p&gt;

&lt;p&gt;Whatever you do, don't use legacy libraries like PyPDF or pdfminer for RAG anymore. If your ingestion layer isn't outputting structured Markdown or using vision to understand layout, your app is broken before the prompt even starts.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/opendataloader-project/opendataloader-pdf" rel="noopener noreferrer"&gt;OpenDataLoader PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/DS4SD/docling" rel="noopener noreferrer"&gt;Docling (IBM Research)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://schipper.ai/posts/the-vlm-parsing-shift/" rel="noopener noreferrer"&gt;The VLM Parsing Shift&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Why AI Coding Speed Is Creating Control Debt</title>
      <dc:creator>hefty</dc:creator>
      <pubDate>Sat, 13 Jun 2026 08:45:37 +0000</pubDate>
      <link>https://dev.to/hefty_69a4c2d631c9dd70724/why-ai-coding-speed-is-creating-control-debt-50lc</link>
      <guid>https://dev.to/hefty_69a4c2d631c9dd70724/why-ai-coding-speed-is-creating-control-debt-50lc</guid>
      <description>&lt;p&gt;I keep seeing people brag about how much code their AI agents wrote for them overnight. But when you look closer at the community discussions, the hangover is starting to set in. &lt;/p&gt;

&lt;p&gt;One developer on Reddit recently admitted they no longer understand more than 47% of their own app's codebase. They shipped features incredibly fast, but the cost was losing their mental model of the system. This is the mistake people make when they treat AI as a pure velocity multiplier: speed without control is just legacy code arriving faster.&lt;/p&gt;

&lt;p&gt;The real bottleneck isn't getting agents to write code. It is maintaining visibility, review discipline, and system understanding. &lt;/p&gt;

&lt;h2&gt;
  
  
  The difference between cognitive debt and verification debt
&lt;/h2&gt;

&lt;p&gt;We talk a lot about technical debt, but AI coding tools introduce two specific variants that are much harder to track. &lt;/p&gt;

&lt;p&gt;First is cognitive debt. When an agent writes 500 lines of boilerplate, it might be technically correct, but you didn't have to think through the architectural constraints to write it. When that code breaks three months later, you have to pay the cognitive cost all at once.&lt;/p&gt;

&lt;p&gt;Second is verification debt. Generation speed has completely outpaced review capacity. The code compiles, and the tests pass, but your merge gates are asking the wrong question. They ask if the code works today. They should ask if the reviewer can actually explain and debug the code tomorrow. &lt;/p&gt;

&lt;h2&gt;
  
  
  You need observability for your agents
&lt;/h2&gt;

&lt;p&gt;If you run a background worker in production without logging, you are asking for trouble. Why are we letting autonomous coding agents mutate our codebases with zero visibility?&lt;/p&gt;

&lt;p&gt;Blind trust in long unattended runs is a massive failure mode. We are finally starting to see tools treat agent runs like systems that need monitoring. Things like Claude HUD are bringing context usage, tool activity, and agent state right into the terminal statusline. &lt;/p&gt;

&lt;p&gt;Observability layers catch hidden work before reviewers completely lose the thread. Context health isn't cosmetic telemetry. It is the control surface you need to know when an agent is hallucinating or looping.&lt;/p&gt;

&lt;h2&gt;
  
  
  Async agents need strict boundaries
&lt;/h2&gt;

&lt;p&gt;If you let an agent run while you sleep, you still need bounded feedback loops.&lt;/p&gt;

&lt;p&gt;We are moving away from pull-based chat loops toward event-driven workflows. The recent docs on Claude channels show how developers are pushing external events directly into live coding sessions. But this only works if you enforce strict approval boundaries. Sender allowlists and per-session constraints are not optional. You cannot just give an agent a Jira ticket and root access and hope for the best.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;The solution isn't to stop using AI. The solution is to separate the generation step from the understanding step.&lt;/p&gt;

&lt;p&gt;Keep your diffs small. Force agents to explain their work before they execute it. If you can't debug what the agent just wrote, you have not actually saved time. You just borrowed it from your future self.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/harsh2644/ai-is-creating-a-new-kind-of-tech-debt-and-nobody-is-talking-about-it-3pm6"&gt;AI Is Creating a New Kind of Tech Debt - And Nobody Is Talking About It&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/jarrodwatts/claude-hud/main/README.md" rel="noopener noreferrer"&gt;Claude HUD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/channels" rel="noopener noreferrer"&gt;Push events into a running session with channels&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/vibecoding/comments/1ryr12i/i_no_longer_know_more_than_47_of_my_apps_code/" rel="noopener noreferrer"&gt;I no longer know more than 47% of my app's code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
  </channel>
</rss>
