DEV Community: Rani

Letting an AI agent run shell commands is RCE on your machine. I fixed it with the kernel, not Docker.

Rani — Wed, 24 Jun 2026 15:25:12 +0000

A few weeks ago I gave my coding agent permission to run shell commands, watched it run cargo test, and felt good about myself. Then it hit me what I had actually done. "Let the model run shell commands" is just a friendly way of saying "let a program I do not fully control execute arbitrary code on my laptop." That is the textbook definition of remote code execution. I had built myself an RCE machine and handed it the keys.

So I went looking for a way to box it in. This is what I tried, why Docker was the wrong tool, and what I ended up building instead.

The obvious answer, and why it is wrong

"Put it in a container" is everyone's first instinct, and it is not crazy. But Docker is the wrong shape for this specific job:

Cold start. An agent does not run one command; it runs hundreds of short ones. A 200ms+ spin-up per command turns a snappy session into a slideshow.
It needs a daemon (typically running as root), and on macOS a whole Linux VM. That is a lot of moving parts to babysit just to run ls safely.
It is the wrong granularity. A container isolates a whole environment. What I actually wanted was to confine a single process, per command, for almost no cost.

The thing is, every major OS already ships exactly that primitive. We just rarely reach for it.

The kernel already does this

Each platform has a built-in way to confine a single process at the kernel level, no daemon required:

macOS: Seatbelt. The same sandbox_init mechanism Chrome and friends use. You hand it a profile describing what the process may touch, and the kernel enforces it.
Linux: Landlock + seccomp. Landlock (an LSM in mainline since 5.13) restricts filesystem access; seccomp-bpf filters which syscalls the process can even make.
Windows: AppContainer + a Job Object. Capability-based confinement plus resource limits.

The catch is that these are three completely different APIs with three different mental models, and two of them are barely documented. Hiding that behind one interface ("confine this command to this directory, deny the network") was most of the work. The payoff is that the confinement is enforced by the kernel rather than by asking the model nicely, and cold start stays under 5ms because there is no container to build.
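To make the "one interface over three kernels" idea concrete, here is a minimal sketch of what such a platform-neutral policy type could look like. `Policy`, `NetAccess`, and `backend_for` are hypothetical names, not Skarn's actual API. The code only models the caller's intent and which kernel primitive each OS backend would translate it into; it does not call Seatbelt, Landlock, or AppContainer.

```rust
use std::path::PathBuf;

/// Network egress setting, mirroring the `--net deny` idea (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetAccess {
    Allow,
    Deny,
}

/// One platform-neutral description of what a confined command may do.
/// Each OS backend would translate this into its own kernel primitive.
#[derive(Debug, Clone)]
pub struct Policy {
    /// Directory the command is confined to.
    pub workspace: PathBuf,
    /// Whether outbound network access is permitted.
    pub net: NetAccess,
}

impl Policy {
    pub fn new(workspace: impl Into<PathBuf>, net: NetAccess) -> Self {
        Policy { workspace: workspace.into(), net }
    }

    pub fn allows_network(&self) -> bool {
        self.net == NetAccess::Allow
    }
}

/// Which kernel mechanism a backend would use on a given OS
/// (pass `std::env::consts::OS` to get the current one).
pub fn backend_for(os: &str) -> Option<&'static str> {
    match os {
        "macos" => Some("Seatbelt (sandbox_init profile)"),
        "linux" => Some("Landlock + seccomp-bpf"),
        "windows" => Some("AppContainer + Job Object"),
        _ => None,
    }
}
```

The design point is that callers describe intent once ("this directory, no network"), and only the backend knows whether that becomes a Seatbelt profile, Landlock rules plus a seccomp filter, or an AppContainer with a Job Object.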

In the tool I built (Skarn), it looks like this:

skarn run --net deny -- cargo test

That runs the command locked to the project directory with network egress denied. If the model decides to curl your secrets somewhere or rm -rf a path outside the repo, the syscall fails. Not because of a policy prompt, but because the kernel said no.
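Part of why "locked to the project directory" belongs in the kernel is that getting it right in userspace is surprisingly fiddly. The hypothetical helpers below (not Skarn code) contrast a naive string-prefix check with a component-wise check after lexical `..` normalization. Even the better version ignores symlinks; kernel mechanisms such as Landlock apply their rules to the file actually being accessed, after path resolution, so they do not depend on comparisons like these.

```rust
use std::path::{Component, Path, PathBuf};

/// Naive check: plain string prefix. Wrong, because "/work/repo-evil"
/// starts with the string "/work/repo".
pub fn naive_inside(root: &str, path: &str) -> bool {
    path.starts_with(root)
}

/// Lexically normalize a path: drop `.` and resolve `..` against earlier
/// components. Does NOT follow symlinks.
pub fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Component-wise containment check on normalized paths.
pub fn inside(root: &Path, path: &Path) -> bool {
    normalize(path).starts_with(normalize(root))
}
```

For example, `/work/repo/../../etc/passwd` normalizes to `/etc/passwd` and is rejected, and `/work/repo-evil/x` fails because `repo-evil` and `repo` are different components.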

The harder problem: running code the model wrote

Sandboxing shell commands is the easy half. I also wanted the agent to orchestrate tools by writing a short script, which keeps huge tool schemas out of the context window (that is another post). But running model-generated code is the same RCE problem wearing a nicer hat.

A JavaScript isolate alone is not a security boundary. People escape them. So I did not rely on it being one. The script runs in a QuickJS isolate, and that isolate runs inside a worker process that sandboxes itself (deny network, no workspace writes) before it ever loads the model's code.
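The ordering is the important part: the worker locks itself down first and only then loads the script. One way to make that ordering impossible to get wrong in Rust is a typestate, sketched below. `Worker`, `Unconfined`, `Confined`, and `run_script` are hypothetical names, and `confine` here only records a log entry where a real worker would call into the kernel sandbox.

```rust
use std::marker::PhantomData;

/// Marker types for the worker's state (hypothetical).
pub struct Unconfined;
pub struct Confined;

pub struct Worker<State> {
    log: Vec<String>,
    _state: PhantomData<State>,
}

impl Worker<Unconfined> {
    pub fn new() -> Self {
        Worker { log: Vec::new(), _state: PhantomData }
    }

    /// Consumes the unconfined worker. A real worker would apply the kernel
    /// sandbox (deny network, no workspace writes) at this point.
    pub fn confine(mut self) -> Worker<Confined> {
        self.log.push("sandbox applied".to_string());
        Worker { log: self.log, _state: PhantomData }
    }
}

impl Worker<Confined> {
    /// Only exists on a confined worker, so model code cannot be loaded
    /// before the sandbox is in place. Here it just records the script size.
    pub fn run_script(&mut self, source: &str) -> usize {
        self.log.push(format!("ran {} bytes", source.len()));
        source.len()
    }

    pub fn log(&self) -> &[String] {
        &self.log
    }
}
```

Calling `run_script` on `Worker::new()` directly is a compile error, because that method is not defined for `Worker<Unconfined>`.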

That gives two independent walls:

The isolate. Static validation rejects eval, Function, require, import, and process, and execution is bounded by memory, stack, wall-clock, and output-size limits.
The kernel sandbox underneath it. Even a full isolate escape lands in a process that still cannot reach the network or write outside the workspace.
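Here is a deliberately naive sketch of the static-validation wall: scan the script for identifiers and reject the banned names. `first_banned` and `validate` are hypothetical and are not Skarn's validator. The function splits on non-identifier characters instead of parsing JavaScript, so computed access such as `globalThis["ev" + "al"]` slips straight past it. That weakness is exactly why the kernel sandbox underneath has to exist.

```rust
/// Identifiers the isolate refuses to run (taken from the list above).
const BANNED: [&str; 5] = ["eval", "Function", "require", "import", "process"];

/// Returns the first banned identifier in `src`, if any.
/// Naive: does not distinguish strings or comments from code.
pub fn first_banned(src: &str) -> Option<&'static str> {
    src.split(|c: char| !(c.is_alphanumeric() || c == '_' || c == '$'))
        .filter(|tok| !tok.is_empty())
        .find_map(|tok| BANNED.iter().copied().find(|b| *b == tok))
}

pub fn validate(src: &str) -> Result<(), String> {
    match first_banned(src) {
        Some(name) => Err(format!("rejected: script uses `{name}`")),
        None => Ok(()),
    }
}
```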

You have to get through both, and the outer one is enforced by the OS. The inner layer is for ergonomics; the outer layer is for actually stopping you.

Being honest about the threat model

A security post that only lists wins is marketing. So: this runs untrusted, model-generated code on purpose, and the most useful thing anyone can do is try to break it. The hand-written unsafe FFI into those kernel APIs is where I am least confident, because the surfaces are sparsely documented. There are things it does not defend against, which is why the repo has a SECURITY.md that says so plainly. If you find a hole, I would rather hear about that than hear that it is cool.

The other half, briefly

The same gateway also cuts the agent's token usage by compressing noisy shell output (70-90% fewer tokens, errors and warnings always kept) and by the schema-avoidance trick above. That is the part that saves money rather than saving your filesystem, and it is a separate story.
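The "errors and warnings always kept" rule is easy to sketch. The hypothetical `compress` function below keeps every line that mentions an error or a warning and collapses each run of other lines into a single placeholder. Skarn's real compressor is not described in this post, so the matching rule (a case-insensitive substring check) is an assumption.

```rust
/// Keep lines mentioning errors or warnings; collapse runs of other lines.
pub fn compress(output: &str) -> String {
    let mut kept: Vec<String> = Vec::new();
    let mut skipped = 0usize;

    for line in output.lines() {
        let lower = line.to_lowercase();
        if lower.contains("error") || lower.contains("warning") {
            // Flush the pending run of noise before the important line.
            if skipped > 0 {
                kept.push(format!("[... {skipped} line(s) omitted ...]"));
                skipped = 0;
            }
            kept.push(line.to_string());
        } else {
            skipped += 1;
        }
    }
    if skipped > 0 {
        kept.push(format!("[... {skipped} line(s) omitted ...]"));
    }
    kept.join("\n")
}
```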

If you want to read the code, kick the tires, or attack the sandbox, it is one Rust binary here: https://github.com/Rani367/Skarn

It is early-stage and dual-licensed under MIT or Apache-2.0. A review of the sandbox crate is the most welcome thing you could send.

I built a CLI that stops your CI from running tests it doesn't need to

Rani — Mon, 30 Mar 2026 00:25:19 +0000

The problem

You change one file in your monorepo. CI runs all 200 tests. 35 minutes later, you get a green checkmark for tests that had nothing to do with your change.

Every team I've seen deals with this in one of three ways:

Ignore it: waste CI minutes and developer time
Hack together bash scripts: git diff --name-only | grep, piped into whatever test runner you use
Adopt Nx/Bazel/Turborepo: great tools, but they require buying into an entire build framework

I wanted option 4: a standalone CLI that just works.

What I built

affected is a Rust CLI that detects which packages in your monorepo are affected by git changes and runs only their tests.

$ affected list --base main --explain

3 affected package(s) (base: main, 2 files changed):

  ● core       (directly changed: src/lib.rs)
  ● api        (depends on: core)
  ● cli        (depends on: api → core)

How it works

Detect — scans for marker files (Cargo.toml, package.json, go.mod, pom.xml, etc.)
Resolve — builds a dependency graph from project manifests
Diff — computes changed files using libgit2
Map — maps each changed file to its owning package
Traverse — runs reverse BFS on the dependency graph to find all transitively affected packages
Execute — runs test commands for affected packages only
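The Map and Traverse steps can be sketched using only the standard library. This is a hypothetical illustration, not affected's code. `owner_of` assigns a changed file to the package whose directory is the longest matching path prefix, and `affected_packages` runs a breadth-first search over the reversed "depends on" edges, starting from the directly changed packages.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};
use std::path::Path;

/// Map a changed file to the package (name, dir) whose directory contains it.
/// Longest match wins, so `crates/api/x.rs` goes to `crates/api`, not `crates`.
pub fn owner_of<'a>(file: &str, packages: &[(&'a str, &'a str)]) -> Option<&'a str> {
    packages
        .iter()
        .filter(|(_, dir)| Path::new(file).starts_with(dir))
        .max_by_key(|(_, dir)| Path::new(dir).components().count())
        .map(|(name, _)| *name)
}

/// `deps[p]` lists what `p` depends on. Returns the changed packages plus
/// everything that transitively depends on them.
pub fn affected_packages<'a>(
    changed: &[&'a str],
    deps: &HashMap<&'a str, Vec<&'a str>>,
) -> BTreeSet<&'a str> {
    // Reverse the edges: dependency -> packages that depend on it.
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (&pkg, ds) in deps {
        for &d in ds {
            dependents.entry(d).or_default().push(pkg);
        }
    }

    // Standard BFS from all changed packages at once.
    let mut seen: BTreeSet<&str> = changed.iter().copied().collect();
    let mut queue: VecDeque<&str> = changed.iter().copied().collect();
    while let Some(pkg) = queue.pop_front() {
        for &dep in dependents.get(pkg).into_iter().flatten() {
            if seen.insert(dep) {
                queue.push_back(dep);
            }
        }
    }
    seen
}
```

To get the Traverse result, feed the owners of the changed files (deduplicated) into `affected_packages`. The `seen` set makes the search terminate even if the graph contains cycles.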

What it supports

7 ecosystems out of the box, zero config:

| Ecosystem | Detection |
|---|---|
| Cargo | `Cargo.toml` workspace |
| npm/pnpm | `package.json` workspaces |
| Yarn Berry | `.yarnrc.yml` |
| Go | `go.work` / `go.mod` |
| Python | `pyproject.toml` (Poetry, uv, generic) |
| Maven | `pom.xml` with `<modules>` |
| Gradle | `settings.gradle(.kts)` |
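Marker-file detection is essentially a lookup table. The hypothetical sketch below copies the markers from the table above. Two details are my assumptions, not documented behavior of affected: the precedence (checking `.yarnrc.yml` before `package.json`, since a Yarn Berry repo has both) and returning a single ecosystem per root. It also skips the finer checks the table mentions, such as whether a `pom.xml` actually declares `<modules>`.

```rust
/// (marker file, ecosystem), checked in order; the first hit wins.
const MARKERS: &[(&str, &str)] = &[
    ("Cargo.toml", "Cargo"),
    (".yarnrc.yml", "Yarn Berry"),
    ("package.json", "npm/pnpm"),
    ("go.work", "Go"),
    ("go.mod", "Go"),
    ("pyproject.toml", "Python"),
    ("pom.xml", "Maven"),
    ("settings.gradle", "Gradle"),
    ("settings.gradle.kts", "Gradle"),
];

/// Given the file names at a repository root, return the detected ecosystem.
pub fn detect(root_files: &[&str]) -> Option<&'static str> {
    MARKERS
        .iter()
        .find(|(marker, _)| root_files.contains(marker))
        .map(|(_, eco)| *eco)
}
```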

CI integration

This was designed for CI from day one:

# GitHub Actions
- name: Detect affected
  id: affected
  run: affected ci --merge-base main

- name: Run tests
  if: steps.affected.outputs.has_affected == 'true'
  run: affected test --merge-base main --jobs 4 --junit results.xml
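The `if: steps.affected.outputs.has_affected == 'true'` guard depends on the first step publishing an output. In GitHub Actions, a step publishes outputs by appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable. This post doesn't show how `affected ci` writes its outputs, so the following is only a hypothetical sketch of the mechanism. `write_output` and `publish` are illustrative names.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append `name=value` to a GitHub Actions output file (the path in $GITHUB_OUTPUT).
pub fn write_output(file: &Path, name: &str, value: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(file)?;
    writeln!(f, "{name}={value}")
}

/// Publish `has_affected` so a later step can test it.
pub fn publish(affected: &[&str]) -> io::Result<()> {
    match std::env::var_os("GITHUB_OUTPUT") {
        Some(path) => {
            let has = if affected.is_empty() { "false" } else { "true" };
            write_output(Path::new(&path), "has_affected", has)
        }
        // Not running under GitHub Actions: nothing to publish.
        None => Ok(()),
    }
}
```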

It also supports:

--json for structured output
--junit results.xml for JUnit XML (Jenkins, GitLab, etc.)
--filter "lib-*" / --skip "e2e-*" for targeting specific packages
--explain to show the dependency chain for each affected package
--jobs 4 for parallel test execution
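The `--filter` and `--skip` patterns look like shell-style globs. Below is a minimal sketch of how such patterns could be applied. It assumes `*` is the only wildcard and that `--skip` wins over `--filter`. Both are assumptions about affected's semantics, and `glob_match` and `select` are hypothetical names.

```rust
/// Match `name` against a pattern where `*` matches any (possibly empty) run of characters.
pub fn glob_match(pattern: &str, name: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == name; // no wildcard: exact match
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if !name.starts_with(first) || !name.ends_with(last) || name.len() < first.len() + last.len() {
        return false;
    }
    // Middle pieces must appear in order between the fixed prefix and suffix;
    // taking the leftmost occurrence each time is sufficient for `*`-only globs.
    let mut rest = &name[first.len()..name.len() - last.len()];
    for mid in &parts[1..parts.len() - 1] {
        match rest.find(*mid) {
            Some(i) => rest = &rest[i + mid.len()..],
            None => return false,
        }
    }
    true
}

/// Keep packages matching any filter (all if no filters), then drop any that match a skip.
pub fn select<'a>(pkgs: &[&'a str], filters: &[&str], skips: &[&str]) -> Vec<&'a str> {
    pkgs.iter()
        .copied()
        .filter(|p| filters.is_empty() || filters.iter().any(|f| glob_match(f, p)))
        .filter(|p| !skips.iter().any(|s| glob_match(s, p)))
        .collect()
}
```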

The `--explain` flag

This is my favorite feature. Instead of just listing affected packages, it tells you why:

$ affected list --base main --explain

  ● core       (directly changed: src/lib.rs, src/utils.rs)
  ● api        (depends on: core)
  ● cli        (depends on: api → core)
  ● docs-gen   (depends on: api → core)

Now you know exactly which change caused which packages to be retested.
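A chain like `api → core` can come out of the same breadth-first search if each package records which package it was first reached from. The sketch below is hypothetical; `explain` and its output shape are illustrative and are not affected's implementation. `dependents` maps each package to the packages that depend on it, which is the reversed graph the traversal walks.

```rust
use std::collections::{HashMap, VecDeque};

/// For every package reachable from `changed`, return its chain back to a
/// directly changed package, e.g. "cli" -> ["api", "core"]. Changed packages map to [].
pub fn explain<'a>(
    changed: &[&'a str],
    dependents: &HashMap<&'a str, Vec<&'a str>>,
) -> HashMap<&'a str, Vec<&'a str>> {
    // parent[p] = the package that reached p first; BFS makes this a shortest chain.
    let mut parent: HashMap<&str, Option<&str>> = HashMap::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    for &c in changed {
        parent.insert(c, None);
        queue.push_back(c);
    }
    while let Some(pkg) = queue.pop_front() {
        for &dep in dependents.get(pkg).into_iter().flatten() {
            if !parent.contains_key(dep) {
                parent.insert(dep, Some(pkg));
                queue.push_back(dep);
            }
        }
    }

    // Follow parent links back to a changed package to build each chain.
    let mut chains = HashMap::new();
    for &pkg in parent.keys() {
        let mut chain = Vec::new();
        let mut cur = parent[pkg];
        while let Some(p) = cur {
            chain.push(p);
            cur = parent[p];
        }
        chains.insert(pkg, chain);
    }
    chains
}
```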

Numbers

~5,000 lines of Rust
160+ tests (unit + integration + CLI)
CI passes on Linux, macOS, and Windows
MIT licensed

Try it

cargo install affected-cli

Then in any monorepo:

affected detect            # see what it found
affected list --base main  # see what's affected
affected test --base main  # run only affected tests

GitHub: github.com/Rani367/affected

I'd love to hear what edge cases you hit, what ecosystems you'd want added, or whether this actually saves you CI time. Star the repo if it's useful; it helps others find it.