<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Randy Cylonix</title>
    <description>The latest articles on DEV Community by Randy Cylonix (@dryricenoodle).</description>
    <link>https://dev.to/dryricenoodle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3984449%2Fe6a52391-c40a-42da-aef5-b882ba3e3456.jpg</url>
      <title>DEV Community: Randy Cylonix</title>
      <link>https://dev.to/dryricenoodle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dryricenoodle"/>
    <language>en</language>
    <item>
      <title>I measured what my AI coding agent did to production. It SSH'd in as root 1,508 times.</title>
      <dc:creator>Randy Cylonix</dc:creator>
      <pubDate>Sun, 14 Jun 2026 23:13:03 +0000</pubDate>
      <link>https://dev.to/dryricenoodle/i-measured-what-my-ai-coding-agent-did-to-production-it-sshd-in-as-root-1508-times-bd3</link>
      <guid>https://dev.to/dryricenoodle/i-measured-what-my-ai-coding-agent-did-to-production-it-sshd-in-as-root-1508-times-bd3</guid>
      <description>&lt;p&gt;I have been running Claude Code on auto mode for about a month. It is genuinely great. It also made me uneasy in a way I could not name, so I did the boring thing and looked at what it had actually done.&lt;/p&gt;

&lt;p&gt;I grepped roughly 30 days of my own Claude Code session transcripts (the JSONL files the CLI keeps) for privileged operations: anything that shelled out to sudo, ssh, scp, or rsync against a non-lab host. This is not rigorous telemetry. It is a regex sweep over my own history, so treat the numbers as the right order of magnitude, not an audited figure. But the shape was clear enough to change how I work.&lt;/p&gt;
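
&lt;p&gt;A minimal sketch of that kind of sweep, under stated assumptions: the transcript schema is not described here, so the code walks every string in each JSONL record instead of reading a specific field, and the regex is illustrative, not the one actually used. It does not separate lab hosts from production hosts, and a command recorded twice in a transcript is counted twice, which is one more reason to read the totals as an order of magnitude.&lt;/p&gt;

```python
import json
import re
from collections import Counter
from pathlib import Path

# Command names counted as privileged. The pattern is an illustrative
# assumption: a command name at the start of a string or after whitespace,
# ';', '|' or '(', and followed by whitespace.
PRIVILEGED = re.compile(r"(?:^|[\s;|(])(sudo|ssh|scp|rsync)(?=\s)")


def iter_strings(value):
    """Yield every string nested anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_strings(child)


def count_privileged(lines):
    """Count privileged command names across an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        for text in iter_strings(record):
            for match in PRIVILEGED.finditer(text):
                counts[match.group(1)] += 1
    return counts


def sweep(directory):
    """Sum the counts over every .jsonl file under a directory."""
    total = Counter()
    for path in Path(directory).rglob("*.jsonl"):
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += count_privileged(handle)
    return total
```

&lt;p&gt;Calling sweep on the directory that holds the transcripts returns a Counter keyed by command name. Telling reads apart from writes, or one host from another, would need a second pass over the matched command lines.&lt;/p&gt;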

&lt;p&gt;In that window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;174 local sudo escalations. Roughly half were just reading root-owned files.&lt;/li&gt;
&lt;li&gt;1,508 raw root SSH/scp/rsync sessions into production hosts, 1,288 of them to a single box.&lt;/li&gt;
&lt;li&gt;Of those 1,508, about 6 in 10 were read-only inspection: docker ps, tailing logs, reading a config. They never needed root at all.&lt;/li&gt;
&lt;li&gt;A tail I did not enjoy finding: docker cp hot-patches into running prod containers, a couple of UPDATE/DELETE statements on a production database, live nginx edits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The part that bothered me most is not the number. It is that the number coexists with this: the same agent, on full auto, still stops and asks me for sudo to install a test build on my Mac. It will not run a local package install without asking, but it SSHes into production as root on its own. That is backwards.&lt;/p&gt;

&lt;h2&gt;Why it is backwards&lt;/h2&gt;

&lt;p&gt;The agent decides "is this safe?" with a classifier, and a classifier is a probabilistic judgment that mis-fires in both directions. It over-blocks safe work: installing a signed package, rebuilding a test binary, the mundane local things that happen to need sudo. And it under-bounds genuinely dangerous work: root on a production host gets a probabilistic "this is probably fine," not a hard wall. I cannot tune that classifier, I cannot audit it, and I cannot prove what it will decide tomorrow when the prompt is slightly different. One fuzzy dial controls everything from apt install to a write on prod.&lt;/p&gt;

&lt;p&gt;The two failure modes feed two different pains. The over-blocking means I babysit the safe stuff, which is the thing auto mode was supposed to free me from. The under-bounding means I do not trust it near anything with real users, so I do production by hand, which is the thing an agent was supposed to help with. Auto mode handed me the worst of both.&lt;/p&gt;

&lt;p&gt;The deeper issue is where the decision lives. The "should I do this?" check runs inside the agent, against config the agent can read and, in principle, change. Turning the dial down to stop the nagging also turns it down on the dangerous side. You cannot say "auto-approve the safe class, hard-bound the dangerous class," because there is only one dial and it is a guess.&lt;/p&gt;

&lt;h2&gt;What I did about it&lt;/h2&gt;

&lt;p&gt;So I moved the decision out of the agent. The result is a small open-source tool, OpenScope. It is an action broker: a separate process between the agent and the system. The agent never gets sudo or an SSH key. It gets named, scoped actions like restart_service, install_pkg, tail_logs, read_note. The real credential stays inside a root-owned daemon, and the policy that says which actions the agent may call lives in root-owned files the agent cannot edit. Even a fully compromised agent can only invoke the verbs it was granted, with the parameters it was scoped to.&lt;/p&gt;
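
&lt;p&gt;To make the broker idea concrete, here is a minimal sketch of a dispatch step that only runs named verbs with in-scope parameters. It is not OpenScope's actual code or API: the policy layout, the Denied exception, and the invoke function are hypothetical names chosen for illustration, and the verb bodies only return strings where a real broker would use the credential it holds.&lt;/p&gt;

```python
class Denied(Exception):
    """Raised when a request falls outside the agent's grants."""


# Hypothetical policy: agent name -> verb -> allowed values per parameter.
# In the design described above, this data lives in root-owned files.
POLICY = {
    "coding-agent": {
        "restart_service": {"service": {"nginx", "web"}},
        "tail_logs": {"service": {"nginx", "web", "db"}},
    },
}


def restart_service(service):
    # A real broker would act with its own credential here; this only reports.
    return f"restarted {service}"


def tail_logs(service):
    return f"last lines of {service} log"


# Catalog of verbs. The agent can name a verb but never supplies code.
CATALOG = {"restart_service": restart_service, "tail_logs": tail_logs}


def invoke(agent, verb, params, policy=POLICY, catalog=CATALOG):
    """Run a verb for an agent only if the verb and every parameter are granted."""
    grants = policy.get(agent, {})
    if verb not in grants or verb not in catalog:
        raise Denied(f"{agent} may not call {verb}")
    scope = grants[verb]
    if set(params) != set(scope):
        raise Denied(f"{verb} expects parameters {sorted(scope)}")
    for name, value in params.items():
        if not isinstance(value, str) or value not in scope[name]:
            raise Denied(f"{name}={value!r} is outside the granted scope")
    return catalog[verb](**params)
```

&lt;p&gt;With this policy, invoke("coding-agent", "restart_service", {"service": "nginx"}) runs, while the same verb with "db", or any verb missing from the grants, raises Denied before anything executes. The check is a plain lookup, so the outcome for a given request is the same every time.&lt;/p&gt;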

&lt;p&gt;New capabilities are not ambient. When the agent needs a new power it writes a proposal, a plan step reviews it (read-only, no sudo), and a human applies. The plan is the interesting part, because it makes the consequences legible before anything happens. Here is the real output for granting the agent the ability to install a signed package:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg2fg13we7gfxdq9p641.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg2fg13we7gfxdq9p641.png" alt="The plan to grant install_pkg: one high finding that requires every package be signed by my Developer ID AND located under a pinned path." width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The high finding is not a failure, it is the point. Installing a pkg runs scripts as root, so the grant is gated to packages that are both signed by my Developer ID and under one specific path. After this, Claude Code installs my new builds on its own. The sudo nag is gone, and it did not become a blanket "agent can sudo."&lt;/p&gt;

&lt;p&gt;Here is the one that lets the agent deploy this project's own marketing site. The command is pinned at apply time, so the agent can trigger it but cannot change a character of it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu43eej5mdkvcf3u9lrar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu43eej5mdkvcf3u9lrar.png" alt="The plan to grant the web deploy: a pinned root command, and SSH-NO-BYPASS verified live that no key reaches the host." width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It logs in as root, which the plan flags loudly. But the command is fixed, it touches only the marketing container, and the broker verifies live that no SSH key on my machine can reach that host outside the broker. The line I care about is "no ~/.ssh key reaches the target; verified live." That is the difference between a classifier saying "probably fine" and a system proving the agent has no side path to the key.&lt;/p&gt;
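
&lt;p&gt;One way to picture a command pinned at apply time is the sketch below. It is an assumption about the general technique, not OpenScope's implementation: apply_grant, trigger, and the registry dict are hypothetical names. The human-side step stores the exact argv and returns a digest that a plan could display, and the agent-side step accepts nothing but the action name.&lt;/p&gt;

```python
import hashlib
import shlex


def apply_grant(registry, name, command):
    """Human-side step: pin the command and return a digest for review."""
    argv = tuple(shlex.split(command))
    registry[name] = argv
    return hashlib.sha256("\0".join(argv).encode("utf-8")).hexdigest()


def trigger(registry, name, runner):
    """Agent-side step: the only input is the action name.

    The runner receives the pinned argv as a list. Nothing from the caller
    is spliced into the command, so there is no argument to tamper with.
    """
    if name not in registry:
        raise KeyError(f"no pinned action named {name!r}")
    return runner(list(registry[name]))
```

&lt;p&gt;In a real broker the runner would execute the argv without a shell, for example with subprocess.run, and the registry would sit in storage the agent cannot write. Passing the runner in keeps the sketch testable without executing anything.&lt;/p&gt;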

&lt;h2&gt;Not only infrastructure&lt;/h2&gt;

&lt;p&gt;The same broker gives my personal-assistant agent scoped access to Apple Notes and Mail instead of my whole mailbox. It can list or read a specific note, or draft a mail, with each call gated and logged. The signed app holds the macOS Automation grant; the agent never does. On their AI + a16z podcast, a16z recently described this exact gap: &lt;strong&gt;"There's a huge opportunity for startups to create these proxies ... if someone would give me a scoped Gmail, I'd adopt it today."&lt;/strong&gt; That is the whole idea. An agent should get a scoped proxy to whatever it touches, whether that is a production host, your package manager, or your inbox, and the credential plus the full surface should stay behind the proxy.&lt;/p&gt;

&lt;p&gt;The obvious objection at team scale is verb explosion: a custom action per person. That is not how it plays out. The verbs are a shared catalog, authored once and shipped to every broker, and most are generic and parameterized: one restart_service or tail_logs covers everyone. What varies per agent is policy (which verbs it may call and with what scope), and policy is just data, the same model as IAM grants. A root-owned bounds file enforces invariants across the whole fleet (no agent may hold a readable key, a cap on how many hosts one agent can reach) regardless of any individual grant.&lt;/p&gt;
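
&lt;p&gt;A rough sketch of how fleet-wide bounds could be checked against one agent's grants. Everything named here is hypothetical: the field names, the cap of three hosts, and modeling "no readable key" as a forbidden verb are assumptions made for illustration, not the format of OpenScope's bounds file.&lt;/p&gt;

```python
# Hypothetical fleet-wide bounds. The two invariants come from the text
# above; the field names and the cap value are assumptions.
BOUNDS = {
    "max_hosts_per_agent": 3,
    "forbidden_verbs": {"read_private_key"},
}


def check_grants(agent, grants, bounds=BOUNDS):
    """Return a list of violations for one agent; an empty list means acceptable."""
    violations = []
    hosts = set()
    for grant in grants:
        hosts.update(grant.get("hosts", ()))
        if grant["verb"] in bounds["forbidden_verbs"]:
            violations.append(f"{agent}: verb {grant['verb']} is forbidden fleet-wide")
    if len(hosts) > bounds["max_hosts_per_agent"]:
        violations.append(
            f"{agent}: reaches {len(hosts)} hosts, cap is {bounds['max_hosts_per_agent']}"
        )
    return violations
```

&lt;p&gt;Because the grants are plain data, a check like this can run in the plan step, before a human applies anything, and it evaluates the same way for every agent in the fleet.&lt;/p&gt;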

&lt;h2&gt;The point&lt;/h2&gt;

&lt;p&gt;I do not think auto mode is the problem. I think a single probabilistic dial controlling both "install a package" and "drop a production table" is the problem. The fix is not a better classifier. It is to stop asking the agent to police itself, and instead give it a small set of scoped powers bounded by something it cannot rewrite. Then the safe stuff actually runs unattended, and the dangerous stuff is bounded by a rule you can read.&lt;/p&gt;

&lt;p&gt;OpenScope is open source and runs fully local, no server: &lt;a href="https://github.com/cylonix/openscope" rel="noopener noreferrer"&gt;github.com/cylonix/openscope&lt;/a&gt;. The honest caveat from the top still stands, so if you run coding agents, grep your own transcripts and tell me whether 1,508 in a month is high or low. And if you read the design, I most want to hear where you think the broker boundary leaks.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
