<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rani</title>
    <description>The latest articles on DEV Community by Rani (@rani_ea731c5e9b512).</description>
    <link>https://dev.to/rani_ea731c5e9b512</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850228%2Fc7b23426-a206-4e35-8a73-ae55193657ee.jpg</url>
      <title>DEV Community: Rani</title>
      <link>https://dev.to/rani_ea731c5e9b512</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rani_ea731c5e9b512"/>
    <language>en</language>
    <item>
      <title>Letting an AI agent run shell commands is RCE on your machine. I fixed it with the kernel, not Docker.</title>
      <dc:creator>Rani</dc:creator>
      <pubDate>Wed, 24 Jun 2026 15:25:12 +0000</pubDate>
      <link>https://dev.to/rani_ea731c5e9b512/letting-an-ai-agent-run-shell-commands-is-rce-on-your-machine-i-fixed-it-with-the-kernel-not-31db</link>
      <guid>https://dev.to/rani_ea731c5e9b512/letting-an-ai-agent-run-shell-commands-is-rce-on-your-machine-i-fixed-it-with-the-kernel-not-31db</guid>
      <description>&lt;p&gt;A few weeks ago I gave my coding agent permission to run shell commands, watched it run &lt;code&gt;cargo test&lt;/code&gt;, and felt good about myself. Then it hit me what I had actually done. "Let the model run shell commands" is just a friendly way of saying "let a program I do not fully control execute arbitrary code on my laptop." That is the textbook definition of remote code execution. I had built myself an RCE machine and handed it the keys.&lt;/p&gt;

&lt;p&gt;So I went looking for a way to box it in. This is what I tried, why Docker was the wrong tool, and what I ended up building instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The obvious answer, and why it is wrong
&lt;/h2&gt;

&lt;p&gt;"Put it in a container" is everyone's first instinct, and it is not crazy. But Docker is the wrong shape for this specific job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cold start.&lt;/strong&gt; An agent does not run one command, it runs hundreds of short ones. A 200ms+ spin-up per command turns a snappy session into a slideshow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It needs a daemon, usually running as root&lt;/strong&gt;, and on macOS a whole Linux VM. That is a lot of moving parts to babysit just to run &lt;code&gt;ls&lt;/code&gt; safely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It is the wrong granularity.&lt;/strong&gt; A container isolates a whole environment. What I actually wanted was to confine a single process, per command, for almost no cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thing is, every major OS already ships exactly that primitive. We just rarely reach for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The kernel already does this
&lt;/h2&gt;

&lt;p&gt;Each platform has a built-in way to confine a single process at the kernel level, no daemon required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS: Seatbelt.&lt;/strong&gt; The same &lt;code&gt;sandbox_init&lt;/code&gt; mechanism Chrome and friends use. You hand it a profile describing what the process may touch, and the kernel enforces it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux: Landlock + seccomp.&lt;/strong&gt; Landlock (an LSM in mainline since 5.13) restricts filesystem access; seccomp-bpf filters which syscalls the process can even make.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows: AppContainer + a Job Object.&lt;/strong&gt; Capability-based confinement plus resource limits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The catch is that these are three completely different APIs with three different mental models, and two of them are barely documented. Hiding that behind one interface ("confine this command to this directory, deny the network") was most of the work. The payoff is that the confinement is enforced by the kernel rather than by asking the model nicely, and cold start stays under 5ms because there is no container to build.&lt;/p&gt;
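
&lt;p&gt;To make the "one interface" idea concrete, here is a minimal sketch of what such an abstraction could look like. It is not Skarn's code: &lt;code&gt;SandboxPolicy&lt;/code&gt;, &lt;code&gt;pick_backend&lt;/code&gt;, and &lt;code&gt;describe&lt;/code&gt; are hypothetical names, and the sketch only picks a backend and summarizes the policy. It does not call any kernel API.&lt;/p&gt;

```python
import sys
from dataclasses import dataclass


# Hypothetical policy type: one description of the confinement,
# independent of which kernel API ends up enforcing it.
@dataclass(frozen=True)
class SandboxPolicy:
    root: str            # the directory the command is confined to
    allow_network: bool  # False means deny all network egress


def pick_backend(platform):
    """Map a sys.platform string to the kernel mechanism named in the post."""
    if platform == "darwin":
        return "seatbelt"
    if platform.startswith("linux"):
        return "landlock+seccomp"
    if platform.startswith("win"):
        return "appcontainer+job-object"
    raise NotImplementedError("no sandbox backend for " + platform)


def describe(policy, platform=sys.platform):
    """Summarize what would be enforced, and by which backend (no enforcement here)."""
    net = "allow" if policy.allow_network else "deny"
    return "{}: dir={} net={}".format(pick_backend(platform), policy.root, net)
```

&lt;p&gt;The point of the shape is that callers only ever build a policy; everything platform-specific lives behind the backend choice.&lt;/p&gt;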

&lt;p&gt;In the tool I built (Skarn), it looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;skarn run --net deny -- cargo test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That runs the command locked to the project directory with network egress denied. If the model decides to &lt;code&gt;curl&lt;/code&gt; your secrets somewhere or &lt;code&gt;rm -rf&lt;/code&gt; a path outside the repo, the syscall fails. Not because of a policy prompt, but because the kernel said no.&lt;/p&gt;

&lt;h2&gt;
  
  
  The harder problem: running code the model wrote
&lt;/h2&gt;

&lt;p&gt;Sandboxing shell commands is the easy half. I also wanted the agent to orchestrate tools by writing a short script, which keeps huge tool schemas out of the context window (that is another post). But running model-generated code is the same RCE problem wearing a nicer hat.&lt;/p&gt;

&lt;p&gt;A JavaScript isolate alone is not a security boundary. People escape them. So I did not rely on it being one. The script runs in a QuickJS isolate, and that isolate runs inside a worker process that sandboxes &lt;em&gt;itself&lt;/em&gt; (deny network, no workspace writes) before it ever loads the model's code.&lt;/p&gt;

&lt;p&gt;That gives two independent walls:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The isolate.&lt;/strong&gt; Static validation rejects &lt;code&gt;eval&lt;/code&gt;, &lt;code&gt;Function&lt;/code&gt;, &lt;code&gt;require&lt;/code&gt;, &lt;code&gt;import&lt;/code&gt;, and &lt;code&gt;process&lt;/code&gt;, and execution is bounded by memory, stack, wall-clock, and output-size limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The kernel sandbox underneath it.&lt;/strong&gt; Even a full isolate escape lands in a process that still cannot reach the network or write outside the workspace.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You have to get through both, and the outer one is enforced by the OS. The inner layer is for ergonomics; the outer layer is for actually stopping you.&lt;/p&gt;
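
&lt;p&gt;As a rough illustration of the inner wall's static validation, the sketch below scans a script for the rejected identifiers listed above. It is an assumption-laden simplification, not Skarn's validator: a plain token scan like this also flags those words inside strings and comments, and a real implementation would more likely inspect the parsed syntax tree. &lt;code&gt;find_forbidden&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import re

# Identifiers the post says static validation rejects.
FORBIDDEN = ("eval", "Function", "require", "import", "process")

# Word boundaries keep names like "processed" or "evaluate" from matching.
_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b")


def find_forbidden(source):
    """Return the sorted, de-duplicated forbidden identifiers found in a script."""
    return sorted(set(_PATTERN.findall(source)))


def validate(source):
    """Raise ValueError if the script mentions any forbidden identifier."""
    hits = find_forbidden(source)
    if hits:
        raise ValueError("rejected: " + ", ".join(hits))
```

&lt;p&gt;Erring on the side of rejecting too much is acceptable here, because this layer is not the one the security argument rests on.&lt;/p&gt;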

&lt;h2&gt;
  
  
  Being honest about the threat model
&lt;/h2&gt;

&lt;p&gt;A security post that only lists wins is marketing. So: this runs untrusted, model-generated code on purpose, and the most useful thing anyone can do is try to break it. The hand-written &lt;code&gt;unsafe&lt;/code&gt; FFI into those kernel APIs is where I am least confident, because the surfaces are sparsely documented. There are things it does not defend against, which is why the repo has a SECURITY.md that says so plainly. If you find a hole, I would rather hear about that than hear that it is cool.&lt;/p&gt;

&lt;h2&gt;
  
  
  The other half, briefly
&lt;/h2&gt;

&lt;p&gt;The same gateway also cuts the agent's token usage by compressing noisy shell output (70-90% fewer tokens, errors and warnings always kept) and by the schema-avoidance trick above. That is the part that saves money rather than saving your filesystem, and it is a separate story.&lt;/p&gt;
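
&lt;p&gt;The post does not describe how the compression works, so the following is only a guess at the general idea: keep every line that mentions an error or warning, and collapse each run of other lines into a one-line count. The pattern, the placeholder text, and the &lt;code&gt;compress&lt;/code&gt; function are all assumptions made for illustration.&lt;/p&gt;

```python
import re

# Assumption: a line is "important" if it mentions an error or a warning.
IMPORTANT = re.compile(r"error|warning", re.IGNORECASE)


def compress(output):
    """Keep important lines; replace each run of other lines with a count."""
    kept = []
    skipped = 0
    for line in output.splitlines():
        if IMPORTANT.search(line):
            if skipped:
                kept.append("[{} lines omitted]".format(skipped))
                skipped = 0
            kept.append(line)
        else:
            skipped += 1
    if skipped:
        kept.append("[{} lines omitted]".format(skipped))
    return "\n".join(kept)
```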

&lt;p&gt;If you want to read the code, kick the tires, or attack the sandbox, it is one Rust binary here: &lt;a href="https://github.com/Rani367/Skarn" rel="noopener noreferrer"&gt;https://github.com/Rani367/Skarn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is early and dual-licensed under MIT or Apache-2.0, and review of the sandbox crate is the most welcome thing you could send.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>security</category>
      <category>programming</category>
    </item>
    <item>
      <title>I built a CLI that stops your CI from running tests it doesn't need to</title>
      <dc:creator>Rani</dc:creator>
      <pubDate>Mon, 30 Mar 2026 00:25:19 +0000</pubDate>
      <link>https://dev.to/rani_ea731c5e9b512/i-built-a-cli-that-stops-your-ci-from-running-tests-it-doesnt-need-to-4gbm</link>
      <guid>https://dev.to/rani_ea731c5e9b512/i-built-a-cli-that-stops-your-ci-from-running-tests-it-doesnt-need-to-4gbm</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;You change one file in your monorepo. CI runs all 200 tests. 35 minutes later, you get a green checkmark for tests that had nothing to do with your change.&lt;/p&gt;

&lt;p&gt;Every team I've seen deals with this in one of three ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ignore it&lt;/strong&gt; — waste CI minutes and developer time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hack together bash scripts&lt;/strong&gt; — &lt;code&gt;git diff --name-only | grep&lt;/code&gt; piped into whatever test runner you use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adopt Nx/Bazel/Turborepo&lt;/strong&gt; — great tools, but they require buying into an entire build framework&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I wanted option 4: &lt;strong&gt;a standalone CLI that just works&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Rani367/affected" rel="noopener noreferrer"&gt;&lt;code&gt;affected&lt;/code&gt;&lt;/a&gt; is a Rust CLI that detects which packages in your monorepo are affected by git changes and runs only their tests.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;affected list &lt;span class="nt"&gt;--base&lt;/span&gt; main &lt;span class="nt"&gt;--explain&lt;/span&gt;

3 affected package&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;base: main, 2 files changed&lt;span class="o"&gt;)&lt;/span&gt;:

  ● core       &lt;span class="o"&gt;(&lt;/span&gt;directly changed: src/lib.rs&lt;span class="o"&gt;)&lt;/span&gt;
  ● api        &lt;span class="o"&gt;(&lt;/span&gt;depends on: core&lt;span class="o"&gt;)&lt;/span&gt;
  ● cli        &lt;span class="o"&gt;(&lt;/span&gt;depends on: api → core&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Detect&lt;/strong&gt; — scans for marker files (Cargo.toml, package.json, go.mod, pom.xml, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resolve&lt;/strong&gt; — builds a dependency graph from project manifests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff&lt;/strong&gt; — computes changed files using libgit2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Map&lt;/strong&gt; — maps each changed file to its owning package&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traverse&lt;/strong&gt; — runs reverse BFS on the dependency graph to find all transitively affected packages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute&lt;/strong&gt; — runs test commands for affected packages only&lt;/li&gt;
&lt;/ol&gt;
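
&lt;p&gt;The Map and Traverse steps above can be sketched in a few lines. This is an illustration of the technique, not the tool's Rust implementation: &lt;code&gt;owning_package&lt;/code&gt; and &lt;code&gt;affected&lt;/code&gt; are hypothetical helpers, and the package roots and dependency map are assumed to come from the earlier Detect and Resolve steps.&lt;/p&gt;

```python
from collections import deque


def owning_package(path, package_roots):
    """Return the package whose root is the longest prefix of path, or None."""
    best = None
    for name, root in package_roots.items():
        prefix = root.rstrip("/") + "/"
        if path.startswith(prefix):
            if best is None or len(root) > len(package_roots[best]):
                best = name
    return best


def affected(changed_files, package_roots, deps):
    """deps maps each package to the packages it depends on."""
    # Invert the edges so we can walk from a package to its dependents.
    dependents = {}
    for pkg, needs in deps.items():
        for dep in needs:
            dependents.setdefault(dep, set()).add(pkg)

    # Map: every changed file marks its owning package as directly changed.
    start = set()
    for changed in changed_files:
        owner = owning_package(changed, package_roots)
        if owner is not None:
            start.add(owner)

    # Traverse: breadth-first search over the reversed graph.
    seen = set(start)
    queue = deque(sorted(start))
    while queue:
        current = queue.popleft()
        for parent in sorted(dependents.get(current, ())):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

&lt;p&gt;Choosing the longest matching root matters for nested packages: a file under a nested package should be attributed to the inner package rather than the one that contains it.&lt;/p&gt;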

&lt;h3&gt;
  
  
  What it supports
&lt;/h3&gt;

&lt;p&gt;7 ecosystems out of the box, zero config:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ecosystem&lt;/th&gt;
&lt;th&gt;Detection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cargo&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Cargo.toml&lt;/code&gt; workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;npm/pnpm&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;package.json&lt;/code&gt; workspaces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yarn Berry&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.yarnrc.yml&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;go.work&lt;/code&gt; / &lt;code&gt;go.mod&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pyproject.toml&lt;/code&gt; (Poetry, uv, generic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maven&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pom.xml&lt;/code&gt; with &lt;code&gt;&amp;lt;modules&amp;gt;&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle&lt;/td&gt;
&lt;td&gt;&lt;code&gt;settings.gradle(.kts)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
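
&lt;p&gt;The table above amounts to a lookup from marker file to ecosystem. A minimal sketch of that lookup follows, with the caveat that it only checks file names: the real detection presumably also reads the files (for example, to confirm a &lt;code&gt;package.json&lt;/code&gt; actually declares workspaces). The labels and the &lt;code&gt;detect&lt;/code&gt; function are hypothetical.&lt;/p&gt;

```python
# Marker file name -> ecosystem label, following the table in the post.
MARKERS = {
    "Cargo.toml": "cargo",
    "package.json": "npm",
    ".yarnrc.yml": "yarn",
    "go.work": "go",
    "go.mod": "go",
    "pyproject.toml": "python",
    "pom.xml": "maven",
    "settings.gradle": "gradle",
    "settings.gradle.kts": "gradle",
}


def detect(filenames):
    """Return the sorted ecosystem labels implied by a directory's file names."""
    return sorted({MARKERS[name] for name in filenames if name in MARKERS})
```

&lt;p&gt;In practice the input would be something like &lt;code&gt;os.listdir(repo_root)&lt;/code&gt;.&lt;/p&gt;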

&lt;h2&gt;
  
  
  CI integration
&lt;/h2&gt;

&lt;p&gt;This was designed for CI from day one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GitHub Actions&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Detect affected&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;affected&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;affected ci --merge-base main&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;steps.affected.outputs.has_affected == 'true'&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;affected test --merge-base main --jobs 4 --junit results.xml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--json&lt;/code&gt; for structured output&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--junit results.xml&lt;/code&gt; for JUnit XML (Jenkins, GitLab, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--filter "lib-*"&lt;/code&gt; / &lt;code&gt;--skip "e2e-*"&lt;/code&gt; for targeting specific packages&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--explain&lt;/code&gt; to show the dependency chain for each affected package&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--jobs 4&lt;/code&gt; for parallel test execution&lt;/li&gt;
&lt;/ul&gt;
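
&lt;p&gt;For a sense of how glob-style &lt;code&gt;--filter&lt;/code&gt; and &lt;code&gt;--skip&lt;/code&gt; patterns could narrow the package list, here is a small sketch. How the tool combines the two flags is not stated in the post, so the rule used here (keep what matches any filter, then drop what matches any skip) is an assumption, and &lt;code&gt;select&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from fnmatch import fnmatchcase


def select(packages, filters=(), skips=()):
    """Keep packages matching any filter (all, if none given), minus skips."""
    chosen = []
    for name in packages:
        if filters and not any(fnmatchcase(name, pat) for pat in filters):
            continue
        if any(fnmatchcase(name, pat) for pat in skips):
            continue
        chosen.append(name)
    return chosen
```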

&lt;h2&gt;
  
  
  The &lt;code&gt;--explain&lt;/code&gt; flag
&lt;/h2&gt;

&lt;p&gt;This is my favorite feature. Instead of just listing affected packages, it tells you &lt;em&gt;why&lt;/em&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;affected list &lt;span class="nt"&gt;--base&lt;/span&gt; main &lt;span class="nt"&gt;--explain&lt;/span&gt;

  ● core       &lt;span class="o"&gt;(&lt;/span&gt;directly changed: src/lib.rs, src/utils.rs&lt;span class="o"&gt;)&lt;/span&gt;
  ● api        &lt;span class="o"&gt;(&lt;/span&gt;depends on: core&lt;span class="o"&gt;)&lt;/span&gt;
  ● cli        &lt;span class="o"&gt;(&lt;/span&gt;depends on: api → core&lt;span class="o"&gt;)&lt;/span&gt;
  ● docs-gen   &lt;span class="o"&gt;(&lt;/span&gt;depends on: api → core&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you know exactly which change caused which packages to be retested.&lt;/p&gt;
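
&lt;p&gt;One way to produce a chain like &lt;code&gt;api → core&lt;/code&gt; is a breadth-first search that remembers how each package was reached and then walks those parent pointers back. The sketch below shows that technique under the assumption that the dependency map is already available; &lt;code&gt;explain&lt;/code&gt; is a hypothetical function, not the tool's code.&lt;/p&gt;

```python
from collections import deque


def explain(package, deps, changed):
    """Return the shortest dependency chain from package to a changed package."""
    if package in changed:
        return [package]
    parent = {package: None}
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dep in deps.get(current, ()):
            if dep in parent:
                continue
            parent[dep] = current
            if dep in changed:
                # Walk parent pointers back to the start, then reverse.
                chain = []
                node = dep
                while node is not None:
                    chain.append(node)
                    node = parent[node]
                return chain[::-1]
            queue.append(dep)
    return None
```

&lt;p&gt;Joining everything after the first element with an arrow gives the text shown in parentheses above; a result of &lt;code&gt;None&lt;/code&gt; means the package is not affected by the change.&lt;/p&gt;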

&lt;h2&gt;
  
  
  Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;~5,000 lines of Rust&lt;/li&gt;
&lt;li&gt;160+ tests (unit + integration + CLI)&lt;/li&gt;
&lt;li&gt;CI passes on Linux, macOS, and Windows&lt;/li&gt;
&lt;li&gt;MIT licensed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;affected-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in any monorepo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;affected detect          &lt;span class="c"&gt;# see what it found&lt;/span&gt;
affected list &lt;span class="nt"&gt;--base&lt;/span&gt; main  &lt;span class="c"&gt;# see what's affected&lt;/span&gt;
affected &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--base&lt;/span&gt; main  &lt;span class="c"&gt;# run only affected tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/Rani367/affected" rel="noopener noreferrer"&gt;github.com/Rani367/affected&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;I'd love to hear what edge cases you hit, what ecosystems you'd want added, or whether this actually saves you CI time. Star the repo if it's useful; it helps others find it.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>rust</category>
      <category>testing</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
