DEV Community

Dmytro Levchenko

Posted on • Originally published at levchenkod.com

Unleashing Agentic Coding Tools


Intro

Over the last few years, we have seen a boom in agentic coding tools. What they are good for is usually clear, but there are many ways and flavours of fitting them into a workflow. At a high level, it comes down to a trade-off among efficiency, effectiveness, autonomy vs. interactivity, and, of course, security.

In this article I’ll focus on how to securely improve the efficiency of autonomous coding tools like Codex*. The approach works for individuals as well as for small-to-medium teams.

*The examples use Codex, but the same approach works for Claude Code, Gemini, OpenCode and other interchangeable agentic CLI tools. The only important detail is that you might need to change tool-specific flags and params.

Problem

While CLI tools lean towards the autonomous side of the spectrum, by default they still require a lot of short-lived interactions from you during a generative session: approving script runs, file reads, env reads (…yeah), you name it.

One way to solve it is through the tool’s own settings: updated permissions, yolo mode (danger-full-access), a sandbox, or remote execution. If you are on an enterprise plan, most of that is likely already defined for you by the admin.

The compromises here are:

A) it’s less convenient to transfer and maintain permissions across vendors. With the industry moving this fast, it’s a good strategy to stay open to new tooling

B) you have to trust that the tool will respect the boundaries and permissions

Solution

Another, more flexible way is to constrain agentic CLI tools at the OS level. By running Codex or Claude in an isolated Docker container or microVM (micro virtual machine), you get

  • a more contained environment to run the tool in full access mode
  • fewer hiccups with permission requests
  • reproducibility across machines
  • flexibility to swap the tool without affecting existing workflows that much

Based on your goals, there are different levels of how you can adopt this approach. I’ll use sbx https://docs.docker.com/ai/sandboxes/ as it is specifically designed for such use cases.

Docker Sandboxes run AI coding agents in isolated microVM sandboxes

To set it up, simply run

brew install docker/tap/sbx
sbx login

Docker Templates

Docker offers a list of maintained sandbox templates https://docs.docker.com/ai/sandboxes/customize/templates/, which are good enough for basic tasks

Here's an example for running Codex

sbx run codex --template docker.io/docker/sandbox-templates:codex

For alternative tools, the idea is the same, but the template must match the tool.

sbx run claude --template docker.io/docker/sandbox-templates:claude-code

That command creates a workspace sandbox and starts an interactive CLI session. To run the agent autonomously, add the exec command after --

sbx run codex --template docker.io/docker/sandbox-templates:codex -- exec "create google clone, no mistakes"
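
If you script these runs, the mapping between agent and template can live in one place. Below is a minimal sketch, not part of sbx itself: `buildSbxRunArgs` and `TEMPLATES` are hypothetical names, and the only flags used are the ones shown in the commands above.

```typescript
// Hypothetical helper: maps an agent name to the template tag used in this
// article and builds the argument list for `sbx run`.
type Agent = "codex" | "claude";

const TEMPLATES: Record<Agent, string> = {
  codex: "docker.io/docker/sandbox-templates:codex",
  claude: "docker.io/docker/sandbox-templates:claude-code",
};

function buildSbxRunArgs(agent: Agent, agentArgs: string[] = []): string[] {
  const args = ["run", agent, "--template", TEMPLATES[agent]];
  // Everything after "--" is meant for the agent CLI, as in the exec example.
  return agentArgs.length > 0 ? [...args, "--", ...agentArgs] : args;
}

// Interactive Claude session vs. autonomous Codex run.
const interactive = buildSbxRunArgs("claude");
const autonomous = buildSbxRunArgs("codex", ["exec", "run the test suite"]);
```

The resulting arrays can be handed to a process launcher such as Node's `child_process.spawn("sbx", args)`, which avoids shell-quoting issues with the prompt string.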

Custom Templates

Docker templates are basically container images used as sandbox templates. If a task needs additional libraries or tools, the agent needs access to them, and in yolo mode it will most likely just go and install them. That’s effective, since it doesn’t bother you, but not efficient: the token burn rate may skyrocket.

That can be avoided with custom template images that already contain all the libs and tools. Extra perk: you can bake a reusable system prompt/config into the image itself, or preinstall tools that you expect the agent to use often.

One way to do it, assuming the agent has already installed everything itself, is to call the sbx template save command right after the sbx session ends

sbx template save workspace-sandbox-name new-template-name:v1

Important: do not save/publish templates from sandboxes where the agent could have handled secrets, logged tokens, cloned private repos with credentials, or written auth config into the filesystem. Saving the template captures the filesystem state.
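
A quick check before saving can catch the most obvious leftovers. The sketch below is only an illustration: `findLikelySecrets` is a hypothetical helper, the patterns are a small, non-exhaustive sample (private key headers, GitHub-style tokens, AWS-style access key IDs, credentials embedded in URLs), and a clean result does not prove the filesystem is free of secrets.

```typescript
// Illustrative patterns only; this is not an exhaustive secret detector.
const SECRET_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "private key block", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { label: "GitHub token", pattern: /\bgh[pousr]_[A-Za-z0-9]{36,}\b/ },
  { label: "AWS access key id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { label: "credentials in URL", pattern: /https?:\/\/[^\/\s:@]+:[^\/\s:@]+@/ },
];

// Returns the labels of all patterns that match the given text
// (for example the contents of a git config, shell history, or log file).
function findLikelySecrets(text: string): string[] {
  return SECRET_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ label }) => label);
}

// Example: a remote URL with embedded credentials is flagged.
const findings = findLikelySecrets("url = https://user:s3cret@github.com/org/repo.git");
```

In practice you would run something like this over files copied out of the sandbox (git config, shell history, agent logs) and skip the save if anything is flagged.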

But to make it reusable, we’ll have to create a new Dockerfile. Here’s an example Dockerfile for a FastAPI + React monorepo template (pnpm, Vite, Node.js, Python, and Playwright; it reserves POETRY_HOME for Poetry but does not include the Poetry install step):

FROM docker.io/docker/sandbox-templates:codex

LABEL maintainer="levchenkod.com" \
    description="Sandbox template for Codex and Playwright, with pinned Node.js, Python, Playwright, pnpm, and Poetry"

ENV POETRY_HOME=/opt/poetry \
    PLAYWRIGHT_BROWSERS_PATH=/ms-playwright 

USER root

ENV PNPM_STORE_PATH=/home/agent/.local/share/pnpm/store
ENV DEBIAN_FRONTEND=noninteractive
ENV NPM_CONFIG_PREFIX=
ENV npm_config_prefix=
ENV PNPM_HOME=/home/agent/.local/share/pnpm
ENV PATH=/home/agent/.local/bin:/home/agent/.local/share/pnpm:${PATH}

ARG NODEJS_APT_VERSION=
ARG NPM_APT_VERSION=
ARG PYTHON3_APT_VERSION=
ARG PYTHON3_PIP_APT_VERSION=
ARG PNPM_VERSION=10.24.0
ARG TYPESCRIPT_VERSION=5.4.5
ARG VITE_VERSION=5.2.11

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        nodejs${NODEJS_APT_VERSION:+=${NODEJS_APT_VERSION}} \
        npm${NPM_APT_VERSION:+=${NPM_APT_VERSION}} \
        python-is-python3 \
        python3${PYTHON3_APT_VERSION:+=${PYTHON3_APT_VERSION}} \
        python3-pip${PYTHON3_PIP_APT_VERSION:+=${PYTHON3_PIP_APT_VERSION}} \
        sudo \
        tini \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /ms-playwright /home/agent/.local/bin /home/agent/.local/share/pnpm/store \
    && chown -R agent:agent /ms-playwright /home/agent/.local

USER agent
SHELL ["/bin/bash", "-lc"]

# pnpm, Vite, and TypeScript as pinned global CLIs.
RUN unset NPM_CONFIG_PREFIX npm_config_prefix \
    && npm --prefix /home/agent/.local install -g "pnpm@${PNPM_VERSION}" "vite@${VITE_VERSION}" "typescript@${TYPESCRIPT_VERSION}" \
    && /home/agent/.local/bin/pnpm config set global-bin-dir "${PNPM_HOME}" \
    && node --version \
    && npm --version \
    && pnpm --version \
    && vite --version \
    && tsc --version \
    && python --version

COPY --chown=agent:agent web/package.json web/pnpm-lock.yaml /tmp/codex-playwright-web/

RUN cd /tmp/codex-playwright-web \
    && pnpm fetch --frozen-lockfile --store-dir "${PNPM_STORE_PATH}" \
    && rm -rf /tmp/codex-playwright-web

ARG PLAYWRIGHT_VERSION=1.60.0
ARG PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=
ARG TARGETARCH

# Playwright package plus its matching Chromium.
RUN python -m pip install --user --break-system-packages "playwright==${PLAYWRIGHT_VERSION}" \
    && if [[ -z "${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" ]]; then \
        case "${TARGETARCH}" in \
            amd64) PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64 ;; \
            arm64) PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-arm64 ;; \
            *) echo "Unsupported TARGETARCH for Playwright: ${TARGETARCH}" >&2; exit 1 ;; \
        esac; \
    fi \
    && PLAYWRIGHT_HOST_PLATFORM_OVERRIDE="${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" python -m playwright install-deps chromium \
    && PLAYWRIGHT_HOST_PLATFORM_OVERRIDE="${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" python -m playwright install chromium \
    && touch /ms-playwright/.system-deps-installed

WORKDIR /workspace

ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["sleep", "infinity"]
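
A note on the `${VAR:+=${VAR}}` expansions in the apt-get step: they append `=<version>` only when the corresponding build argument is non-empty, so leaving `NODEJS_APT_VERSION` blank installs whatever version the base image's repositories offer. The same rule, sketched in TypeScript for clarity (`aptPackageSpec` is a hypothetical name, not something the Dockerfile uses):

```typescript
// Mirrors the shell expansion name${VERSION:+=${VERSION}}:
// an empty or missing version yields the bare package name.
function aptPackageSpec(name: string, version: string = ""): string {
  return version.length > 0 ? `${name}=${version}` : name;
}

const unpinned = aptPackageSpec("nodejs");            // bare package name
const pinned = aptPackageSpec("python3", "<version>"); // name=<version>
```

To pin a package at build time, pass the matching argument, for example `--build-arg NODEJS_APT_VERSION=<version>`, with a version string that exists in the base image's apt repositories.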

Build and publish the image

docker buildx build \
  --platform linux/arm64 \
  --push \
  --provenance=false \
  -t lapps/codex-playwright:0.1.0 \
  -f ./Dockerfile.codex-playwright .
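
The command above builds for linux/arm64 only. The Dockerfile's TARGETARCH switch also handles amd64, so on an x86-64 host you would likely pass linux/amd64 instead. A minimal sketch of picking the value in a build script, assuming it runs under Node.js and receives `process.arch` as input (`dockerPlatformFor` is a hypothetical helper):

```typescript
// Maps Node's process.arch values to the two platforms the Dockerfile supports.
function dockerPlatformFor(nodeArch: string): string {
  switch (nodeArch) {
    case "arm64":
      return "linux/arm64";
    case "x64":
      return "linux/amd64";
    default:
      // Same behaviour as the Dockerfile: fail loudly on anything else.
      throw new Error(`Unsupported architecture: ${nodeArch}`);
  }
}

const platformOnAppleSilicon = dockerPlatformFor("arm64");
```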

Or save it locally as tar

docker image save lapps/codex-playwright:0.1.0 -o codex-playwright.tar

If you use a local tar, load it into sbx

sbx template load codex-playwright.tar

Create a new workspace using your template

sbx create --name codex-playwright --template docker.io/lapps/codex-playwright:0.1.0 codex .

For context: in my system prompts I like to require that, once a task is completed, the agent provides an e2e video as proof, so I can validate the behaviour even before reviewing the code. Playwright does the heavy lifting here.

To test Playwright, I created a smoke test:

import { expect, test } from "@playwright/test";

test("records video for a trivial browser page", async ({ page }) => {
  await page.setContent("<main><h1>Playwright video smoke</h1></main>");

  await expect(
    page.getByRole("heading", { name: "Playwright video smoke" }),
  ).toBeVisible();
});
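
The spec itself does not switch recording on; in Playwright that is usually done in the config. The article does not show the config that was used, so the following is an assumed minimal `playwright.config.ts` that would record a video for every test and use a project named chromium:

```typescript
// playwright.config.ts (assumed file name and location: next to the web package.json)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  // Playwright's default output directory, written out for clarity.
  outputDir: "test-results",
  use: {
    // Record a video for every test, passing or failing.
    video: "on",
  },
  projects: [
    // The project name becomes part of the per-test output folder name.
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
  ],
});
```

Setting `video: "retain-on-failure"` instead keeps only the videos of failed tests, which may be preferable once the proof-of-work requirement is relaxed.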

And then run

sbx run codex-playwright -- exec "run playwright video smoke spec"

Which will result in a new video file

./test-results/playwright-video-smoke-rec-6c08a--for-a-trivial-browser-page-chromium/video.webm

Outcome

With a few simple steps, we get a reliable, reproducible and more contained way to let generative models do what they do best: generate code changes without stopping to ask their human for permission. We can also give the tool more freedom within the sandbox while keeping the host machine, credentials, and network access strictly constrained.


The original article also has example use cases
