<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yuji Suzuki</title>
    <description>The latest articles on DEV Community by Yuji Suzuki (@yujisuzuki).</description>
    <link>https://dev.to/yujisuzuki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3750795%2Ffc147a96-d3ca-4873-a567-f27aa1edabe7.png</url>
      <title>DEV Community: Yuji Suzuki</title>
      <link>https://dev.to/yujisuzuki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yujisuzuki"/>
    <language>en</language>
    <item>
      <title>My AI Escaped Its Container and Did Everything — Except Review Its Own Code</title>
      <dc:creator>Yuji Suzuki</dc:creator>
      <pubDate>Tue, 17 Feb 2026 15:23:49 +0000</pubDate>
      <link>https://dev.to/yujisuzuki/my-ai-escaped-its-container-and-did-everything-except-review-its-own-code-388o</link>
      <guid>https://dev.to/yujisuzuki/my-ai-escaped-its-container-and-did-everything-except-review-its-own-code-388o</guid>
      <description>&lt;h2&gt;
  
  
  Previously: The Complete Dev Cycle
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/yujisuzuki/my-ai-broke-out-of-its-container-and-i-let-it-4dkh"&gt;Part 4 of this series&lt;/a&gt;, my AI assistant achieved something remarkable. Running inside a secure Docker container, it could now execute the entire development cycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Code → Test → Build → Deploy → Commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I called it the finale. The trilogy was complete. The AI could write code, run tests, build artifacts, deploy to containers, and commit changes — all while keeping secrets safely hidden.&lt;/p&gt;

&lt;p&gt;I was wrong. Something was missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Piece
&lt;/h2&gt;

&lt;p&gt;Look at that cycle again. Now think about how a real development team works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Code → Test → Build → Deploy → Commit → PR → ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where's the review?&lt;/p&gt;

&lt;p&gt;In any professional team, code doesn't just flow from writing to deployment. Someone reads it. Someone checks for bugs, security issues, architectural problems. Someone asks "did you consider this edge case?"&lt;/p&gt;

&lt;p&gt;My AI could do everything — except check its own work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Official Plugin
&lt;/h2&gt;

&lt;p&gt;Claude Code has an official &lt;a href="https://github.com/anthropics/claude-code/blob/main/plugins/code-review/commands/code-review.md" rel="noopener noreferrer"&gt;&lt;code&gt;/code-review&lt;/code&gt; plugin&lt;/a&gt;. When I discovered it, I was impressed by its design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallel agents&lt;/strong&gt;: Multiple AI agents analyze code simultaneously from different angles — bug scanning, CLAUDE.md compliance checking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence scoring&lt;/strong&gt;: Each finding gets a score, filtering out noise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verification step&lt;/strong&gt;: A separate agent re-checks findings to eliminate false positives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is serious engineering. Not "ask AI to review code" but a structured, multi-stage pipeline designed to produce high-signal results.&lt;/p&gt;

&lt;p&gt;I installed it immediately.&lt;/p&gt;

&lt;p&gt;And it didn't work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Couldn't Reach
&lt;/h2&gt;

&lt;p&gt;The official plugin is designed for a standard GitHub workflow. It expects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gh&lt;/code&gt; CLI&lt;/strong&gt; — to fetch PR details from GitHub&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A GitHub PR&lt;/strong&gt; — the review target is a pull request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A single repository&lt;/strong&gt; — it operates within one project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My AI Sandbox environment has none of that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No &lt;code&gt;gh&lt;/code&gt; CLI (the container has no GitHub authentication)&lt;/li&gt;
&lt;li&gt;No PR yet (I want review &lt;em&gt;before&lt;/em&gt; pushing, not after)&lt;/li&gt;
&lt;li&gt;Multiple independent repositories in one workspace (API, Web, iOS — each with their own Git history)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The plugin couldn't reach my code. Not because it was poorly designed — it's excellent at what it does. But it was built for a different moment in the development cycle: &lt;strong&gt;after&lt;/strong&gt; you push. I needed something &lt;strong&gt;before&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning From the Design
&lt;/h2&gt;

&lt;p&gt;I couldn't use the plugin directly, but I could learn from it.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/anthropics/claude-code/blob/main/plugins/README.md" rel="noopener noreferrer"&gt;plugins documentation&lt;/a&gt; showed me that Claude Code's custom commands are just Markdown files — structured instructions that become slash commands. The official &lt;code&gt;/code-review&lt;/code&gt; demonstrated what a well-designed review pipeline looks like: parallel analysis, scoring, verification.&lt;/p&gt;
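&lt;p&gt;To make that concrete, here is a deliberately minimal sketch of such a command file. The filename, front matter, and wording are my own hypothetical example, not the actual command I use:&lt;/p&gt;

```markdown
---
description: Review uncommitted local changes, without GitHub access
---

Ask the user which project in the workspace to review and which branch to
diff against. Run `git diff BRANCH...HEAD` in that project, then review the
changes for bugs, CLAUDE.md violations, and comments that no longer match
the code. Report only high-confidence, verified findings.
```

&lt;p&gt;Saved under &lt;code&gt;.claude/commands/&lt;/code&gt;, a file like this becomes a slash command named after the file.&lt;/p&gt;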

&lt;p&gt;So I did what my AI Sandbox was built for. I asked the AI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Analyze the code-review plugin and create a custom command that works locally. Allow selecting which project to review. Confirm the target branch with the user. Run the same kind of review, but without GitHub access.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI read the official plugin, understood its structure, and produced a local version. No &lt;code&gt;gh&lt;/code&gt; dependency. Multi-project support. Git and non-Git modes.&lt;/p&gt;

&lt;p&gt;It worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  From One to Nine
&lt;/h2&gt;

&lt;p&gt;Once the local review command was running, the next thought was obvious.&lt;/p&gt;

&lt;p&gt;If I can have a general code reviewer, why not a security reviewer? A performance reviewer? An architecture reviewer?&lt;/p&gt;

&lt;p&gt;Each review type needs different expertise. A security review looks for injection vulnerabilities, authentication gaps, and data exposure. A performance review looks for N+1 queries, unnecessary allocations, and missing caching. A general review catches bugs and checks CLAUDE.md compliance.&lt;/p&gt;

&lt;p&gt;One command became nine:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-local-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;General code review (bugs, CLAUDE.md)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-local-security-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Security vulnerabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-local-performance-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Performance bottlenecks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-local-architecture-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structural concerns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-local-test-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Test quality assessment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-local-doc-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Documentation accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-local-prompt-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AI prompt/command quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-refactor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Concrete refactoring suggestions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ais-test-gen&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Automated test generation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All nine share the same pipeline architecture inspired by the official plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Parallel Analysis → Scoring → Verification → Report
(4-5 Sonnet agents)  (Haiku)   (Sonnet)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each specialized command sends parallel agents with different review perspectives. A scoring agent evaluates confidence. A verification agent eliminates false positives. Only high-confidence, verified findings make it to the final report.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pipeline in Action
&lt;/h2&gt;

&lt;p&gt;Here's what happens when you run &lt;code&gt;/ais-local-review&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt;: Select a project and branch (or files, if no Git)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt;: Four Sonnet agents launch in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent #1: CLAUDE.md compliance — does the code follow project conventions?&lt;/li&gt;
&lt;li&gt;Agent #2: Bug scan — obvious logic errors, edge cases&lt;/li&gt;
&lt;li&gt;Agent #3: History analysis — are we reintroducing a previously fixed bug?&lt;/li&gt;
&lt;li&gt;Agent #4: Comment check — does the code match its own documentation?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3&lt;/strong&gt;: A Haiku agent scores every finding (0-100)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4&lt;/strong&gt;: A Sonnet verification agent re-checks anything scoring 75+&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5&lt;/strong&gt;: Only confirmed, high-confidence issues appear in the report&lt;/p&gt;

&lt;p&gt;The result is a focused report. Not a wall of nitpicks — a short list of things that actually matter.&lt;/p&gt;
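&lt;p&gt;The pipeline itself is implemented as prompt instructions, not code, but the gate in Steps 3-5 reduces to a simple rule: keep a finding only if it scored at or above the threshold and then survived verification. A sketch of that rule in Go (the &lt;code&gt;Finding&lt;/code&gt; struct and field names are my own illustration, not the command's internals):&lt;/p&gt;

```go
package main

import "fmt"

// Finding is one issue reported by an analysis agent.
type Finding struct {
	Agent   string
	Issue   string
	Score   int  // confidence from the scoring pass, 0-100
	Checked bool // true once the verification pass confirms it
}

// filterFindings keeps only findings that scored at or above the
// threshold and were then confirmed by the verification agent.
func filterFindings(all []Finding, threshold int) []Finding {
	var report []Finding
	for _, f := range all {
		if f.Score >= threshold {
			if f.Checked {
				report = append(report, f)
			}
		}
	}
	return report
}

func main() {
	findings := []Finding{
		{Agent: "bug-scan", Issue: "nil map write on first insert", Score: 90, Checked: true},
		{Agent: "comment-check", Issue: "stale comment", Score: 40, Checked: true},
		{Agent: "bug-scan", Issue: "possible race", Score: 80, Checked: false},
	}
	for _, f := range filterFindings(findings, 75) {
		fmt.Printf("[%s] %s (confidence %d)\n", f.Agent, f.Issue, f.Score)
	}
}
```

&lt;p&gt;Only the first finding survives: it clears the 75-point bar and was verified. The other two are exactly the noise the pipeline is designed to drop.&lt;/p&gt;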

&lt;h2&gt;
  
  
  Two Reviews, Two Moments
&lt;/h2&gt;

&lt;p&gt;Here's what's interesting: the official plugin and my local commands aren't competing. They serve different moments in the development cycle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Code → Review → Test → Build → Deploy → Commit → PR → Review
         ↑                                                ↑
    ais-* commands                              Official /code-review
    Before you push                              After you push
    Quality gate                                 Team review
    Local, private                               GitHub, collaborative
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The official &lt;code&gt;/code-review&lt;/code&gt; is for when your code is ready for team eyes. It posts comments on PRs, suggests changes, integrates with GitHub's collaboration features.&lt;/p&gt;

&lt;p&gt;My &lt;code&gt;ais-*&lt;/code&gt; commands are for before that moment. While you're still developing. Before you've committed, sometimes before you've even finished writing tests. A private quality gate that catches issues early, when they're cheapest to fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Completed Cycle
&lt;/h2&gt;

&lt;p&gt;Remember the development cycle from Part 4?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Code → Test → Build → Deploy → Commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what it looks like now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Code → Review → Test → Build → Deploy → Commit
         ↑
    The missing piece
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI can write code, review its own work (from multiple perspectives), run tests, build, deploy, and commit. The quality gate that was missing is now in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;This project started because the official plugin couldn't reach my code. But that limitation led somewhere unexpected.&lt;/p&gt;

&lt;p&gt;The official plugin's design — parallel agents, confidence scoring, false positive elimination — was the blueprint. Open source at its best: you read how something works, understand the principles, and adapt them to your environment.&lt;/p&gt;

&lt;p&gt;I didn't just get a code reviewer. I got nine specialized review tools, a refactoring assistant, and an automated test generator. All because the official plugin showed me what a well-designed review pipeline looks like, and my AI Sandbox gave me a place to build one that works locally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Series So Far
&lt;/h2&gt;

&lt;p&gt;What started as "my AI can see my API keys" has become something larger:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;Secrets&lt;/a&gt;&lt;/strong&gt;: Hide sensitive files from AI using Docker volume mounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://dev.to/yujisuzuki/i-gave-my-ai-a-toolbox-it-surprised-me-by-using-it-4on2"&gt;Toolbox&lt;/a&gt;&lt;/strong&gt;: AI discovers and uses tools autonomously via SandboxMCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://dev.to/yujisuzuki/my-ai-broke-out-of-its-container-and-i-let-it-4dkh"&gt;Host Access&lt;/a&gt;&lt;/strong&gt;: AI breaks out of its container with controlled host OS access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review&lt;/strong&gt; (this article): AI reviews its own code, completing the dev cycle&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The trilogy became a tetralogy. I'll stop promising it's complete.&lt;/p&gt;




&lt;p&gt;The AI Sandbox with DockMCP is open source: &lt;a href="https://github.com/yujiosaka/ai-sandbox-with-dockmcp" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you've built custom review commands for your AI workflow, I'd love to hear about them in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>docker</category>
      <category>security</category>
    </item>
    <item>
      <title>My AI Broke Out of Its Container — And I Let It</title>
      <dc:creator>Yuji Suzuki</dc:creator>
      <pubDate>Sun, 15 Feb 2026 15:56:36 +0000</pubDate>
      <link>https://dev.to/yujisuzuki/my-ai-broke-out-of-its-container-and-i-let-it-4dkh</link>
      <guid>https://dev.to/yujisuzuki/my-ai-broke-out-of-its-container-and-i-let-it-4dkh</guid>
      <description>&lt;h2&gt;
  
  
  Previously, on AI Sandbox...
&lt;/h2&gt;

&lt;p&gt;If you've been following along, you know the story:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;Part 1&lt;/a&gt;:&lt;/strong&gt; I discovered my AI assistant was reading my API keys. I built a Docker sandbox that hides secrets via volume mounts — files appear empty to AI, but application containers still have full access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/yujisuzuki/i-gave-my-ai-a-toolbox-it-surprised-me-by-using-it-4on2"&gt;Part 2&lt;/a&gt;:&lt;/strong&gt; I gave the sandboxed AI a toolbox (SandboxMCP). It surprised me by autonomously discovering a forgotten tool and repurposing it to solve a problem I hadn't anticipated.&lt;/p&gt;

&lt;p&gt;Now for the final chapter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Last Wall
&lt;/h2&gt;

&lt;p&gt;After Parts 1 and 2, my AI assistant could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Read and edit all source code&lt;/li&gt;
&lt;li&gt;✅ Check container logs via DockMCP&lt;/li&gt;
&lt;li&gt;✅ Run tests inside containers&lt;/li&gt;
&lt;li&gt;✅ Discover and use tools autonomously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But there was one thing it still couldn't do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anything that required the host OS.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Start the demo app" → Sorry, I can't run docker-compose.
"Build the containers" → I don't have access to Docker.
"Commit this change" → I can commit, but the message style...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every time I needed to build, deploy, or manage containers, I had to switch to my terminal and do it myself. The AI was powerful inside its box, but the box was still a box.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Question
&lt;/h2&gt;

&lt;p&gt;Here's the thing about containers: they're designed to isolate. That's their entire purpose. Giving a sandboxed AI access to the host OS sounds like... undoing the sandbox.&lt;/p&gt;

&lt;p&gt;I sat with this tension for a while. The whole point of AI Sandbox was &lt;strong&gt;security&lt;/strong&gt; — hiding secrets, controlling access, preventing accidents. How do you extend reach without breaking trust?&lt;/p&gt;

&lt;p&gt;The answer turned out to be the same pattern I'd used twice before: &lt;strong&gt;don't give access, give a controlled interface.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: Don't give AI your secrets → Give it DockMCP to read logs and run tests&lt;/li&gt;
&lt;li&gt;Part 2: Don't install tools for AI → Let it discover what's available&lt;/li&gt;
&lt;li&gt;Part 3: Don't give AI the host OS → Give it &lt;strong&gt;approved scripts&lt;/strong&gt; it can run through a gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Host Access Works
&lt;/h2&gt;

&lt;p&gt;The architecture is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Sandbox (container)
  │
  │  MCP / HTTP
  ▼
DockMCP Server (host OS)
  ├── Container access     ← existing (logs, exec, stats)
  ├── Host Tools           ← NEW: run approved scripts
  ├── Container Lifecycle  ← NEW: start/stop/restart
  └── Host Commands        ← NEW: whitelisted CLI commands
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the interesting part isn't the architecture — it's the approval workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Two-Stage Approval
&lt;/h3&gt;

&lt;p&gt;AI (or developers) can propose a new host tool by placing a script in &lt;code&gt;.sandbox/host-tools/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# demo-up.sh&lt;/span&gt;
&lt;span class="c"&gt;# Start demo application containers&lt;/span&gt;
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; demo-apps/docker-compose.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But this script &lt;strong&gt;cannot run yet&lt;/strong&gt;. It's in staging — inside the workspace where AI can write files.&lt;/p&gt;

&lt;p&gt;To activate it, the human runs &lt;code&gt;dkmcp tools sync&lt;/code&gt; on the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ dkmcp tools sync

📋 Reviewing tools from .sandbox/host-tools/:

  demo-up.sh (NEW)
    Start demo application containers
    ─────────────────────────────
    #!/bin/bash
    docker compose -f demo-apps/docker-compose.yml up -d
    ─────────────────────────────
    Approve? [y/N]: y
    ✅ Approved → ~/.dkmcp/host-tools/myproject/demo-up.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the &lt;strong&gt;approved copy&lt;/strong&gt; runs. If AI modifies the staging version later, &lt;code&gt;dkmcp tools sync&lt;/code&gt; detects the SHA256 mismatch and prompts for re-approval.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI can &lt;strong&gt;propose&lt;/strong&gt; tools (creative freedom)&lt;/li&gt;
&lt;li&gt;Humans &lt;strong&gt;approve&lt;/strong&gt; tools (security control)&lt;/li&gt;
&lt;li&gt;Only approved versions &lt;strong&gt;execute&lt;/strong&gt; (integrity guarantee)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Changed in Practice
&lt;/h2&gt;

&lt;p&gt;Before v0.5.x, a typical conversation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; Start the demo app.&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; I can't run docker-compose from inside the container. Please run this on your terminal:&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker compose -f demo-apps/docker-compose.yml up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;After v0.5.x:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; Start the demo app.&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; &lt;em&gt;[calls &lt;code&gt;run_host_tool&lt;/code&gt; with &lt;code&gt;demo-up.sh&lt;/code&gt;]&lt;/em&gt;&lt;br&gt;
Demo containers started. API is at :3000, Web at :3001.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference seems small. But multiply it across every build, every restart, every deployment — and the friction adds up. The AI went from being a helpful advisor that tells you &lt;em&gt;what&lt;/em&gt; to type, to a capable agent that &lt;em&gt;does&lt;/em&gt; the thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Commit Workflow (A Small Thing That Matters)
&lt;/h2&gt;

&lt;p&gt;One more thing that came together in this release. I built a commit message script (&lt;code&gt;commit-msg.sh&lt;/code&gt;) that creates a collaborative workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate draft from staged changes → Check previous commit style → Refine together → Commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what it looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;.sandbox/scripts/commit-msg.sh              &lt;span class="c"&gt;# Generate draft from git diff&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;.sandbox/scripts/commit-msg.sh &lt;span class="nt"&gt;--log&lt;/span&gt;        &lt;span class="c"&gt;# Check how recent commits read&lt;/span&gt;
&lt;span class="c"&gt;# ... AI and human refine CommitMsg-draft.md together ...&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;.sandbox/scripts/commit-msg.sh &lt;span class="nt"&gt;--msg-file&lt;/span&gt; CommitMsg-draft.md  &lt;span class="c"&gt;# Commit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The interesting discovery: the script itself always had a &lt;code&gt;--log&lt;/code&gt; option to check previous style, but AI wasn't using it. The fix wasn't changing the script — it was writing the full three-step workflow in the AI instruction file (CLAUDE.md). Once AI could &lt;em&gt;see&lt;/em&gt; the steps laid out explicitly, it followed them perfectly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you want AI to follow a workflow, don't just give it a tool — spell out the steps.&lt;/strong&gt; The tool can be perfectly designed, but AI won't discover optional flags on its own. This applies to any AI coding assistant, not just this project.&lt;/p&gt;
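&lt;p&gt;What that looks like in practice: instead of only documenting the script, the instruction file walks through the steps one by one. A hypothetical CLAUDE.md excerpt in the spirit of mine:&lt;/p&gt;

```markdown
## Commit workflow

When asked to commit, always follow these three steps in order:

1. Run `.sandbox/scripts/commit-msg.sh` to generate a draft message
   from the staged diff.
2. Run `.sandbox/scripts/commit-msg.sh --log` and adjust the draft to
   match the style of recent commits.
3. Only after the user approves the draft, run
   `.sandbox/scripts/commit-msg.sh --msg-file CommitMsg-draft.md` to commit.
```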

&lt;h2&gt;
  
  
  The Trilogy Arc
&lt;/h2&gt;

&lt;p&gt;Looking back, there's a clear progression:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;What AI Could Do&lt;/th&gt;
&lt;th&gt;What It Couldn't&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v0.1: Protect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read code&lt;/td&gt;
&lt;td&gt;See secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v0.3: Equip&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discover and use tools&lt;/td&gt;
&lt;td&gt;Touch the host OS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v0.5: Unleash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Run host scripts, manage containers&lt;/td&gt;
&lt;td&gt;&lt;em&gt;(nothing that matters for daily dev)&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The sandbox started as a cage. Then it became a workshop. Now it's a full development environment — with the security model still intact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Secrets? Still hidden (volume mounts haven't changed).
Container access? Still controlled (whitelist, output masking).
Host access? Controlled too (approval workflow, SHA256 verification).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every layer of capability was added &lt;strong&gt;on top of&lt;/strong&gt; the security foundation, never at the expense of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is This the End?
&lt;/h2&gt;

&lt;p&gt;For the core functionality — yes. My personal development workflow is now complete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Code → Test → Build → Deploy → Commit → Code → ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full loop. All within the sandbox.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI reads and writes code (sandbox)&lt;/li&gt;
&lt;li&gt;AI checks logs and runs tests (DockMCP container access)&lt;/li&gt;
&lt;li&gt;AI discovers and uses tools (SandboxMCP)&lt;/li&gt;
&lt;li&gt;AI builds, deploys, and manages containers (DockMCP host access)&lt;/li&gt;
&lt;li&gt;AI drafts commit messages collaboratively (commit-msg.sh)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There's nothing left in my daily workflow that requires me to switch to a terminal and do things manually.&lt;/p&gt;

&lt;p&gt;Well, except &lt;code&gt;dkmcp tools sync&lt;/code&gt;. That one stays manual — by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The template is open source. You can set up the entire environment in about 10 minutes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;ai-sandbox-dkmcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works with Claude Code, Gemini CLI, and any MCP-compatible AI tool. If you're running AI coding assistants and haven't thought about where your secrets go — now's a good time.&lt;/p&gt;

&lt;p&gt;If you find it useful, a star on GitHub would mean a lot.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 3 of the AI Sandbox series. &lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;Part 1: Secrets&lt;/a&gt; | &lt;a href="https://dev.to/yujisuzuki/i-gave-my-ai-a-toolbox-it-surprised-me-by-using-it-4on2"&gt;Part 2: Tools&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dock</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Gave My AI a Toolbox — It Surprised Me by Using It</title>
      <dc:creator>Yuji Suzuki</dc:creator>
      <pubDate>Sat, 14 Feb 2026 07:41:31 +0000</pubDate>
      <link>https://dev.to/yujisuzuki/i-gave-my-ai-a-toolbox-it-surprised-me-by-using-it-4on2</link>
      <guid>https://dev.to/yujisuzuki/i-gave-my-ai-a-toolbox-it-surprised-me-by-using-it-4on2</guid>
      <description>&lt;p&gt;A 3.5MB Go binary was sitting in my source tree, staged for commit. &lt;code&gt;git log&lt;/code&gt; showed nothing — it had never been committed. I had no idea how it got there.&lt;/p&gt;

&lt;p&gt;I asked Claude Code to investigate. I didn't point it to any specific tool. I didn't tell it how to search. I just said: &lt;em&gt;"There's a mystery binary in the repo. Where did it come from?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It solved the mystery in under a minute — using a tool I'd forgotten I built.&lt;/p&gt;

&lt;p&gt;But that's getting ahead of the story.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is Part 3 of the AI Sandbox series. &lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;Part 1&lt;/a&gt; introduced the AI Sandbox and secret hiding. &lt;a href="https://dev.to/yujisuzuki/official-ai-sandboxes-arrived-why-i-published-mine-anyway-50n7"&gt;Part 2&lt;/a&gt; positioned it against official Docker and Claude Code sandboxes. This article focuses on what happens &lt;em&gt;inside&lt;/em&gt; the sandbox — specifically, what AI does when you give it the ability to discover tools on its own.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Drop a File, Get a Tool
&lt;/h2&gt;

&lt;p&gt;Inside the &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;AI Sandbox&lt;/a&gt; — a Docker-based environment where AI coding agents run in isolation — I wanted a simple way to extend what AI can do. Not by editing config files or registering MCP servers for every new capability, but through something closer to a plugin folder: &lt;strong&gt;drop a file in, and AI can use it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's what &lt;strong&gt;SandboxMCP&lt;/strong&gt; does. It's an MCP server that runs inside the container alongside Claude Code (or Gemini CLI), scanning two directories on demand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.sandbox/tools/&lt;/code&gt; — Go source files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.sandbox/scripts/&lt;/code&gt; — Shell scripts (which can call Python, Node.js, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When AI asks "what tools are available?", SandboxMCP scans these directories, parses the header comments for descriptions and usage info, and exposes them as MCP tools. No build step, no npm install, no manual registration.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;Each file just needs a &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp/blob/main/docs/architecture.md#adding-custom-tools" rel="noopener noreferrer"&gt;header comment&lt;/a&gt; describing what it does. SandboxMCP parses these on demand and exposes them as MCP tools. AI discovers a tool through &lt;code&gt;list_tools&lt;/code&gt;, reads its documentation with &lt;code&gt;get_tool_info&lt;/code&gt;, and executes it with &lt;code&gt;run_tool&lt;/code&gt; — all without you telling it to.&lt;/p&gt;

&lt;p&gt;Here's the header from &lt;code&gt;search-history.go&lt;/code&gt; — the tool that ended up solving the mystery:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// search-history.go - Claude Code conversation history search tool&lt;/span&gt;
&lt;span class="c"&gt;//&lt;/span&gt;
&lt;span class="c"&gt;// Usage:&lt;/span&gt;
&lt;span class="c"&gt;//   go run .sandbox/tools/search-history.go [options] &amp;lt;pattern&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;//&lt;/span&gt;
&lt;span class="c"&gt;// Examples:&lt;/span&gt;
&lt;span class="c"&gt;//   go run .sandbox/tools/search-history.go "DockMCP"&lt;/span&gt;
&lt;span class="c"&gt;//   go run .sandbox/tools/search-history.go -role tool -tool Bash "go build"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few lines of comments. That's all AI needs to decide whether a tool is relevant.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────────────────────────────────────────────────┐
│ AI Sandbox (inside container)                      │
│                                                    │
│  ┌─────────────────┐      ┌─────────────────────┐  │
│  │ Claude Code     │ ←──→ │ SandboxMCP (stdio)  │  │
│  │                 │      │                     │  │
│  └─────────────────┘      │ • list_tools        │  │
│                           │ • get_tool_info     │  │
│  ┌─────────────────────┐  │ • run_tool          │  │
│  │ .sandbox/tools/     │  │ • list_scripts      │  │
│  │ • search-history.go │←─│ • get_script_info   │  │
│  │ • usage-report.go   │  │ • run_script        │  │
│  └─────────────────────┘  └─────────────────────┘  │
│  ┌─────────────────────┐                           │
│  │ .sandbox/scripts/   │                           │
│  │ • validate-secrets  │                           │
│  │ • sync-secrets      │                           │
│  └─────────────────────┘                           │
└────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Back to the Mystery Binary
&lt;/h2&gt;

&lt;p&gt;So — a 3.5MB Go binary, staged but never committed. Where did it come from?&lt;/p&gt;

&lt;p&gt;Claude Code started with the obvious: &lt;code&gt;git log&lt;/code&gt;, file timestamps, &lt;code&gt;git diff --cached&lt;/code&gt;. Dead ends. The binary was there, but nothing in the commit history explained it.&lt;/p&gt;

&lt;p&gt;Then something unexpected showed up in the tool calls. Instead of asking me for more context, Claude Code called &lt;code&gt;list_tools&lt;/code&gt; on SandboxMCP — browsing the tool catalog on its own initiative. It found &lt;code&gt;search-history&lt;/code&gt;, read the documentation with &lt;code&gt;get_tool_info&lt;/code&gt;, and decided the &lt;code&gt;-role tool -tool Bash&lt;/code&gt; filter was what it needed. It ran the search across all past Claude Code sessions — not looking for conversations, but for &lt;em&gt;Bash commands&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Within seconds, the culprit appeared: an earlier session where AI had run &lt;code&gt;go build&lt;/code&gt; without the &lt;code&gt;-o&lt;/code&gt; flag to verify that the code compiled. The build succeeded, but &lt;code&gt;go build&lt;/code&gt; without &lt;code&gt;-o&lt;/code&gt; drops the binary in the current directory. That's how a 3.5MB executable ended up in the source tree.&lt;/p&gt;

&lt;p&gt;I didn't build &lt;code&gt;search-history&lt;/code&gt; for this scenario. I wrote it for finding past conversations about specific topics. But because it was sitting in &lt;code&gt;.sandbox/tools/&lt;/code&gt;, SandboxMCP exposed it, and Claude Code decided — on its own — that it was the right tool for the job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The auto-discovery mechanism turned a tool I'd forgotten about into the solution for a problem I hadn't anticipated.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery vs. Installation
&lt;/h2&gt;

&lt;p&gt;By now, AI using tools is nothing remarkable. What caught my attention was the &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;There are already several MCP servers for searching Claude Code conversation history — &lt;a href="https://github.com/Vvkmnn/claude-historian-mcp" rel="noopener noreferrer"&gt;claude-historian-mcp&lt;/a&gt;, &lt;a href="https://github.com/TonySimonovsky/claude-code-conversation-search-mcp" rel="noopener noreferrer"&gt;claude-code-conversation-search-mcp&lt;/a&gt;, and others. They work well. If I'd installed one, Claude Code could have searched past sessions too.&lt;/p&gt;

&lt;p&gt;But here's the thing: &lt;strong&gt;I would never have installed a conversation-search MCP server to debug a mystery binary.&lt;/strong&gt; That connection — "unknown file in the repo" → "search past AI sessions for Bash commands" — only became obvious &lt;em&gt;after&lt;/em&gt; the answer was found. Before that, it was a lateral leap that required recognizing a tool built for one purpose could solve a completely different problem.&lt;/p&gt;

&lt;p&gt;That's the gap between installation and discovery. When you install a tool, it exists because you anticipated the need. When AI discovers tools from a catalog, it can make connections you didn't plan for — repurposing tools across problem boundaries you never drew.&lt;/p&gt;

&lt;p&gt;You don't have to predict every need upfront. You build a toolbox, and AI reaches for the right tool when the moment comes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your Own Toolbox
&lt;/h2&gt;

&lt;p&gt;The pattern scales with the tools you add. Shell scripts can call Python, Node.js, or anything else installed in the container. The only requirement is a header comment that describes what the tool does.&lt;/p&gt;

&lt;p&gt;Some examples from this project:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;How AI Uses It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search-history.go&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search past Claude Code sessions&lt;/td&gt;
&lt;td&gt;"What did we work on last week?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;usage-report.go&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Token usage by model and period&lt;/td&gt;
&lt;td&gt;"How much did I spend this month?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate-secrets.sh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Verify secret hiding works&lt;/td&gt;
&lt;td&gt;"Are my secrets properly hidden?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sync-secrets.sh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sync secret config across environments&lt;/td&gt;
&lt;td&gt;"Check if my secret config is consistent"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can start with one tool. The cost of an unused tool sitting in a directory is zero. The value of having the right tool already there when an unexpected problem arises — that's what this article is about.&lt;/p&gt;
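&lt;p&gt;To make "start with one tool" concrete, here is what a minimal drop-in tool could look like. The file name and logic are hypothetical; the header comment is the only required convention:&lt;/p&gt;

```go
// todo-count.go - counts TODO/FIXME markers in a file
//
// Usage:
//   go run .sandbox/tools/todo-count.go FILE
package main

import (
	"fmt"
	"os"
	"strings"
)

// countMarkers tallies TODO and FIXME occurrences in the given source text.
func countMarkers(src string) int {
	return strings.Count(src, "TODO") + strings.Count(src, "FIXME")
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: todo-count FILE")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(countMarkers(string(data)))
}
```

&lt;p&gt;Drop it in &lt;code&gt;.sandbox/tools/&lt;/code&gt; and it shows up in &lt;code&gt;list_tools&lt;/code&gt; on the next call, no registration step required.&lt;/p&gt;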

&lt;h2&gt;
  
  
  What This Doesn't Solve
&lt;/h2&gt;

&lt;p&gt;To be consistent with the honesty in &lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;Part 1&lt;/a&gt; and &lt;a href="https://dev.to/yujisuzuki/official-ai-sandboxes-arrived-why-i-published-mine-anyway-50n7"&gt;Part 2&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools must already exist.&lt;/strong&gt; AI discovers tools — it doesn't create them from scratch in the moment. Someone has to write the tool and place it in the directory first. (Though in practice, AI can write a tool in one session that another session later discovers and uses.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery depends on good descriptions.&lt;/strong&gt; If your header comment is vague, AI may not recognize the tool as relevant. The quality of the metadata matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scoped to the sandbox.&lt;/strong&gt; SandboxMCP only sees tools inside the container. It doesn't discover tools on the host OS or in other containers — that's &lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;DockMCP's&lt;/a&gt; job.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No sandboxing within the sandbox.&lt;/strong&gt; Tools run with the same permissions as AI itself. SandboxMCP doesn't add an extra isolation layer for tool execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Fits Together
&lt;/h2&gt;

&lt;p&gt;SandboxMCP completes the tooling stack of the &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;AI Sandbox&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Sandbox&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container&lt;/td&gt;
&lt;td&gt;Secure environment where AI runs. Secrets hidden via volume mounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SandboxMCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container (stdio)&lt;/td&gt;
&lt;td&gt;Tool auto-discovery and execution inside the sandbox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DockMCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Host OS (SSE)&lt;/td&gt;
&lt;td&gt;Cross-container access — AI can check logs and run tests in other containers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SandboxMCP auto-registers on container startup. No manual setup required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;VS Code + DevContainer:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/YujiSuzuki/ai-sandbox-dkmcp.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ai-sandbox-dkmcp
code &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="c"&gt;# Cmd+Shift+P → "Dev Containers: Reopen in Container"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CLI (no VS Code needed):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/YujiSuzuki/ai-sandbox-dkmcp.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ai-sandbox-dkmcp
./cli_sandbox/claude.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once inside, try dropping your own tool in &lt;code&gt;.sandbox/tools/&lt;/code&gt; and asking AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What tools are available?"&lt;/li&gt;
&lt;li&gt;"What did we work on this week?"&lt;/li&gt;
&lt;li&gt;"Show me token usage this month"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't have to tell AI which tool to use. It figures that out on its own.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Took Away
&lt;/h2&gt;

&lt;p&gt;The binary mystery was a small thing. But it changed how I think about AI tooling.&lt;/p&gt;

&lt;p&gt;I'm not claiming drop-in tools replace dedicated MCP servers — those are more capable and better integrated for specific use cases. But for the long tail of small, project-specific utilities that don't justify their own server, a plugin folder works remarkably well.&lt;/p&gt;

&lt;p&gt;I built a search tool to find old conversations. AI used it to track down a mystery I didn't know I had. That's what happens when you stop installing tools for specific problems and start building a toolbox.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: The Actual Conversation
&lt;/h2&gt;

&lt;p&gt;Everything above describes what happened. Here's how it actually looked — the unedited exchange between me and Claude Code that inspired this article.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This conversation was originally in Japanese. The following is a translated version.&lt;br&gt;
Original: &lt;a href="https://zenn.dev/yujisuzuki/articles/9728a8a3f7905a#%E3%81%8A%E3%81%BE%E3%81%91-%EF%BC%88%E3%81%9D%E3%81%AE%E6%99%82%E3%81%AE%E3%83%AA%E3%82%A2%E3%83%AB%E3%81%AAai%E3%81%A8%E3%81%AE%E4%BC%9A%E8%A9%B1%EF%BC%89" rel="noopener noreferrer"&gt;Zenn article (Japanese)&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;● ---
  Code Review Results

  Project: /workspace
  Mode: Git mode (staged changes)
  Review target: Staged changes (12 files, +1264 -46)
  Summary: Addition of usage-report.go, improvements to search-history.go
  (date filters, timezone support), toolparser separator support,
  documentation updates

  Issues Found

  Issue 1: Compiled binary is staged
  - File: .sandbox/sandbox-mcp/search-history
  - Line: N/A (binary, 3.5MB)
  - Basis: Best practice violation
  - Confidence: 85/100

  Binary files /dev/null and b/.sandbox/sandbox-mcp/search-history differ

  Go projects should not commit compiled binaries. It can be regenerated
  with make install, bloats the repository, and causes platform-dependent
  issues.

❯ new file:   .sandbox/sandbox-mcp/search-history
  This one?

● Yes, that's the one. .sandbox/sandbox-mcp/search-history is a compiled
  Go binary (3.5MB) included in staging.

  Since it can be regenerated with make install or go build, there's no
  need to commit it. To unstage:

  git reset HEAD .sandbox/sandbox-mcp/search-history

  You may also want to add it to .gitignore.

❯ When did I build this binary?

● Read 2 files (ctrl+o to expand)

● Bash(git log --all --diff-filter=A -- .sandbox/sandbox-mcp/search-history)
  ⎿  (No content)

● Found the cause in past conversations.

  In session 16f4a485-b7d (today, 18:53), the AI ran
  go build /workspace/.sandbox/tools/search-history.go to verify
  compilation — but without the -o flag, so the binary was written to
  the current directory (.sandbox/sandbox-mcp/).

  In another session (e33d09a7), it correctly used go build -o /dev/null,
  but session 16f4a485 forgot the -o flag, leaving the binary behind.

  It wasn't intentional — safe to unstage.

❯ Wait... you searched past conversation history? With search-history?

● Yes, exactly. Here's what I ran via SandboxMCP's run_tool:

  search-history.go -role tool -tool Bash -after 2026-02-09 "search-history" -context 100

  I filtered Bash tool executions containing "search-history" from today's
  sessions to pinpoint when and how go build was run.

  Review flagged the issue → investigation completed with its own tools.
  That's the beauty of .sandbox/tools/.

❯ That's incredible. Without the search, we'd never have known the cause, right?

● Exactly. Without search-history, "why is this binary here?" would have
  remained unanswered:

  - git log had nothing — the file was never committed
  - The file timestamp (18:53) tells you *when*, not *what happened*
  - The previous session's AI is gone

  The conversation history was the only record of "go build was run
  without -o", and search-history is what pulled that out.

  AI sessions are ephemeral. Without a way to search past sessions,
  questions like "who created this and when?" become unanswerable.
  It's unglamorous, but genuinely useful.

❯ This is amazing — can we write about this somehow?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last question led to this article.&lt;/p&gt;

&lt;p&gt;The part I find most interesting isn't that the AI solved the problem. It's that &lt;code&gt;git log&lt;/code&gt; returned nothing, and instead of asking me for more context, it reached for a tool I'd forgotten was there — and used it in a way I hadn't anticipated when I wrote it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 3 of the AI Sandbox series. &lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;Part 1: Your AI Assistant Can See Your Secrets — Here's How I Fixed It&lt;/a&gt; / &lt;a href="https://dev.to/yujisuzuki/official-ai-sandboxes-arrived-why-i-published-mine-anyway-50n7"&gt;Part 2: Official AI Sandboxes Arrived — Why I Published Mine Anyway&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;AI Sandbox + DockMCP&lt;/a&gt; is open source (MIT License). Feedback and contributions welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>docker</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Official AI Sandboxes Arrived — Why I Published Mine Anyway</title>
      <dc:creator>Yuji Suzuki</dc:creator>
      <pubDate>Wed, 11 Feb 2026 02:54:24 +0000</pubDate>
      <link>https://dev.to/yujisuzuki/official-ai-sandboxes-arrived-why-i-published-mine-anyway-50n7</link>
      <guid>https://dev.to/yujisuzuki/official-ai-sandboxes-arrived-why-i-published-mine-anyway-50n7</guid>
      <description>&lt;h2&gt;
  
  
  Previously
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj"&gt;previous article&lt;/a&gt;, I wrote about catching Claude Code silently reading API keys from an iOS project — not even in the current directory, but in a &lt;em&gt;parent&lt;/em&gt; directory I never pointed it to. No prompt. No permission. It just looked.&lt;/p&gt;

&lt;p&gt;That discovery led me down a rabbit hole — and I ended up building &lt;strong&gt;AI Sandbox Environment + DockMCP&lt;/strong&gt;: a system that isolates AI inside a Docker container, hides secrets via volume mounts, and provides controlled access to other containers through MCP (Model Context Protocol).&lt;/p&gt;

&lt;p&gt;All that was left was to clean up the repo and publish it.&lt;/p&gt;

&lt;p&gt;Or so I thought.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Officials Showed Up
&lt;/h2&gt;

&lt;p&gt;Right around that time, I found official solutions in the same space.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.docker.com/ai/sandboxes" rel="noopener noreferrer"&gt;Docker AI Sandboxes&lt;/a&gt; — Docker's official AI sandbox&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://code.claude.com/docs/en/sandboxing" rel="noopener noreferrer"&gt;Claude Code Sandboxing&lt;/a&gt; — Anthropic's official sandboxing feature&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My honest first reaction: &lt;em&gt;"There's no point publishing this anymore."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the officials have it covered, why bother releasing a personal project? But before giving up, I decided to read what they actually offered — carefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Docker AI Sandboxes Solve
&lt;/h2&gt;

&lt;p&gt;Docker AI Sandboxes provide microVM-based isolation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run AI agents inside lightweight VMs&lt;/li&gt;
&lt;li&gt;Complete isolation from the host's Docker daemon, containers, and files&lt;/li&gt;
&lt;li&gt;Sync workspace directories into the VM for autonomous work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a polished approach. VM-level isolation is robust, and you can manage sandboxes with &lt;code&gt;docker sandbox ls&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But as I read further, a few things stood out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The entire workspace gets synced.&lt;/strong&gt; As of this writing, it's directory-level sync with no mechanism to exclude specific files. If your &lt;code&gt;.env&lt;/code&gt; or &lt;code&gt;secrets/&lt;/code&gt; directory lives inside the workspace, the AI sees it. (This could change — the feature is still evolving.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No access to host-side containers.&lt;/strong&gt; Each sandbox runs its own Docker daemon in a completely isolated VM. You can spin up test containers inside it, but it cannot reach containers already running on the host. When you ask the AI "check the API container logs," it simply can't see them.&lt;/p&gt;

&lt;p&gt;In real multi-container development — frontend, API, and database each running in separate containers — this limitation matters a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Claude Code Sandboxing Solves
&lt;/h2&gt;

&lt;p&gt;Claude Code's sandboxing takes a different approach. Instead of containers or VMs, it uses OS-level security primitives (Seatbelt on macOS, bubblewrap on Linux) for process-level restrictions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filesystem read/write control (blocks writes outside the working directory)&lt;/li&gt;
&lt;li&gt;Network access restricted by domain (proxy-based)&lt;/li&gt;
&lt;li&gt;Approved commands auto-execute; everything else requires user confirmation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;File access is controlled through deny rules in &lt;code&gt;settings.json&lt;/code&gt;. You can block reads to &lt;code&gt;~/.ssh/&lt;/code&gt;, &lt;code&gt;/etc/&lt;/code&gt;, or specific file paths.&lt;/p&gt;
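&lt;p&gt;A deny list in &lt;code&gt;settings.json&lt;/code&gt; looks roughly like this (the paths are illustrative examples, not a recommended set):&lt;/p&gt;

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./secrets/**)",
      "Read(~/.ssh/**)"
    ]
  }
}
```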

&lt;p&gt;The network isolation is especially strong — a proxy enforces domain-level access control, preventing data exfiltration to unauthorized servers. The sandbox runtime is even &lt;a href="https://github.com/anthropic-experimental/sandbox-runtime" rel="noopener noreferrer"&gt;open-sourced&lt;/a&gt;, which is great for the ecosystem.&lt;/p&gt;

&lt;p&gt;But here too, some things caught my attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No safety net for configuration mistakes.&lt;/strong&gt; The sandbox itself is solid at the OS level, but which files are hidden depends entirely on getting the deny rules right. Add a new secret file, forget to update the deny rules, and nothing warns you. This isn't a flaw in the design — it's an inherent challenge of rule-based approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-container access is possible but uncontrolled.&lt;/strong&gt; Since &lt;code&gt;docker&lt;/code&gt; commands are incompatible with the sandbox, they run via &lt;code&gt;excludedCommands&lt;/code&gt; — outside sandbox protection. This means &lt;code&gt;docker exec&lt;/code&gt; and &lt;code&gt;docker logs&lt;/code&gt; work, but they bypass the sandbox entirely. There's no control over which containers get accessed, which commands are allowed, or whether secrets in log output are visible to the AI. (Anthropic may well add finer-grained controls in the future.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Where They Overlap — and Where They Don't
&lt;/h2&gt;

&lt;p&gt;At this point, things started to crystallize.&lt;/p&gt;

&lt;p&gt;Here's the three-way comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Docker AI Sandboxes&lt;/th&gt;
&lt;th&gt;Claude Code Sandbox&lt;/th&gt;
&lt;th&gt;AI Sandbox + DockMCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;microVM&lt;/td&gt;
&lt;td&gt;OS primitives&lt;/td&gt;
&lt;td&gt;Docker container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secret handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full sync (no exclusion)&lt;/td&gt;
&lt;td&gt;Deny rules (config-based)&lt;/td&gt;
&lt;td&gt;Volume mounts (physically absent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-container&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not possible (isolated VM)&lt;/td&gt;
&lt;td&gt;Possible but uncontrolled (docker outside sandbox)&lt;/td&gt;
&lt;td&gt;Controlled access via DockMCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VM-level&lt;/td&gt;
&lt;td&gt;Domain-level (proxy-based)&lt;/td&gt;
&lt;td&gt;Docker network (no AI-specific control)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output masking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Automatic (regex-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Config drift detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Validated on startup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On &lt;strong&gt;isolation&lt;/strong&gt;, all three have answers. Docker AI Sandboxes is the most robust with VM-level separation. Claude Code Sandbox wins on ease of use. AI Sandbox is container-based — the weakest of the three, since containers share the host kernel and can't match VM-level isolation.&lt;/p&gt;

&lt;p&gt;But on &lt;strong&gt;what happens after isolation&lt;/strong&gt;, the existing two don't say much.&lt;/p&gt;

&lt;p&gt;An isolated AI is safe, but it's also powerless. It can't see API logs. It can't run tests. It can't trace errors. If "safe but unusable" is the result, people will eventually turn the sandbox off.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gap My Project Fills
&lt;/h2&gt;

&lt;p&gt;AI Sandbox + DockMCP addresses a more specific problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Hide only the secrets — reliably — and let AI access everything else."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mount &lt;code&gt;/dev/null&lt;/code&gt; over a file with Docker volume mounts, and the file physically doesn't exist. Mount a directory with &lt;code&gt;tmpfs&lt;/code&gt;, and it's empty. Unlike deny rules, there's no ambiguity — no "I wrote the rule but the path resolution didn't match so it was still readable." What you mount is what disappears.&lt;/p&gt;

&lt;p&gt;Of course, if you forget to add a file to &lt;code&gt;docker-compose.yml&lt;/code&gt;, it stays visible. The same is true for deny rules. That's why I built automatic validation that runs on every startup, cross-checking docker-compose volume mounts against AI tool deny configurations (Claude Code, Gemini Code Assist, Gemini CLI). If something is in the deny list but missing from docker-compose, you get a warning.&lt;/p&gt;

&lt;p&gt;The one thing it can't catch: secrets that aren't listed anywhere. The initial inventory — "what needs to be hidden" — is still a human responsibility. But once that list exists, the tooling catches the rest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/dev/null:/workspace/api/.env:ro&lt;/span&gt;      &lt;span class="c1"&gt;# .env physically absent&lt;/span&gt;
&lt;span class="na"&gt;tmpfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/workspace/api/secrets:ro&lt;/span&gt;              &lt;span class="c1"&gt;# secrets/ is empty&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
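&lt;p&gt;The startup cross-check is conceptually a set difference between the deny lists and the compose mounts. A minimal sketch, not the project's actual validator:&lt;/p&gt;

```go
package main

import "fmt"

// missingMounts returns deny-listed paths that have no corresponding
// docker-compose mount: the configuration drift the startup check warns about.
// Illustrative sketch only.
func missingMounts(denyList, mounted []string) []string {
	seen := make(map[string]bool)
	for _, m := range mounted {
		seen[m] = true
	}
	var missing []string
	for _, d := range denyList {
		if !seen[d] {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	deny := []string{"/workspace/api/.env", "/workspace/api/secrets"}
	mounts := []string{"/workspace/api/.env"}
	fmt.Println(missingMounts(deny, mounts)) // [/workspace/api/secrets]
}
```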



&lt;p&gt;And the second problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Let AI access other containers — with guardrails."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DockMCP runs on the host OS as an MCP server, acting as a gateway to the Docker API. AI accesses logs, runs whitelisted commands, and inspects containers — all through DockMCP.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI (in container) ──MCP──▶ DockMCP (host) ──Docker API──▶ Other containers
   No Docker socket        Policy enforced            Full access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Controlled" here is specific. Each container has a whitelist of allowed commands. Everything else is rejected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;exec_whitelist&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;securenote-api"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lint"&lt;/span&gt;
&lt;span class="err"&gt;  &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;securenote-web"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;build"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the AI tries to run &lt;code&gt;rm -rf /&lt;/code&gt;, DockMCP blocks it — not on the whitelist. File access to specific paths inside containers (like &lt;code&gt;/etc/shadow&lt;/code&gt;) can also be blocked. Security policies come in three tiers: strict, moderate, and permissive.&lt;/p&gt;
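&lt;p&gt;The decision itself is deliberately simple: exact-match lookup, default deny. A hypothetical sketch of the check (the real server reads the whitelist from its YAML config):&lt;/p&gt;

```go
package main

import "fmt"

// execWhitelist maps container names to their allowed commands,
// mirroring the YAML configuration above. Values here are examples.
var execWhitelist = map[string][]string{
	"securenote-api": {"npm test", "npm run lint"},
	"securenote-web": {"npm run build"},
}

// isAllowed permits a command only if it appears verbatim in the
// container's whitelist; everything else is rejected by default.
func isAllowed(container, cmd string) bool {
	for _, allowed := range execWhitelist[container] {
		if cmd == allowed {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isAllowed("securenote-api", "npm test")) // true
	fmt.Println(isAllowed("securenote-api", "rm -rf /")) // false
}
```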

&lt;p&gt;Passwords and API keys in responses are automatically masked with regex patterns. The AI sees the logs, but secrets within them are replaced with &lt;code&gt;***&lt;/code&gt;.&lt;/p&gt;
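&lt;p&gt;Masking of this kind can be sketched with a regular expression or two. The pattern below is illustrative, not DockMCP's actual rule set:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
)

// secretPattern catches common "key: value" credential shapes in log text.
// A single example pattern; a real rule set would cover more formats.
var secretPattern = regexp.MustCompile(`(?i)(api[_-]?key|token|password)(["'\s:=]+)[^\s"']+`)

// maskSecrets replaces the value portion of matched credentials with ***.
func maskSecrets(s string) string {
	return secretPattern.ReplaceAllString(s, `$1$2***`)
}

func main() {
	fmt.Println(maskSecrets(`API_KEY=sk-abc123 password: hunter2`))
	// API_KEY=*** password: ***
}
```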

&lt;p&gt;On the other hand, &lt;strong&gt;network control is weak&lt;/strong&gt;. There's no AI-specific restriction on outbound traffic from the container. You can constrain it with Docker network settings, but domain-level granularity is where Claude Code Sandbox's proxy-based approach and Docker AI Sandboxes' VM-level isolation clearly do better. To address this, Anthropic's official &lt;a href="https://github.com/anthropics/claude-code/tree/main/.devcontainer" rel="noopener noreferrer"&gt;firewall scripts&lt;/a&gt; can be integrated into the DevContainer. I've documented the setup in a &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp/blob/main/docs/network-firewall.md" rel="noopener noreferrer"&gt;network restriction guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Decided to Publish Anyway
&lt;/h2&gt;

&lt;p&gt;Existing solutions focus on &lt;strong&gt;isolating AI safely&lt;/strong&gt;. That's the right problem to solve.&lt;/p&gt;

&lt;p&gt;But in real development, AI doesn't work in isolation alone. You need it to debug multi-container applications. Run tests. Read logs. &lt;strong&gt;The balance between isolation and usability&lt;/strong&gt; is what matters.&lt;/p&gt;

&lt;p&gt;AI Sandbox + DockMCP fills that gap.&lt;/p&gt;

&lt;p&gt;It's not competing with the official solutions — it's &lt;strong&gt;complementary&lt;/strong&gt;. If Docker AI Sandboxes had something like DockMCP built in, they'd be more practical. If Claude Code Sandbox combined its filesystem controls with volume-mount-level hiding, the defense would be deeper.&lt;/p&gt;

&lt;p&gt;And there's one more thing that mattered to me personally:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's a template.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The repo is published as a GitHub Template Repository. Click "Use this template," replace &lt;code&gt;demo-apps/&lt;/code&gt; with your own project, and it works. It's not tied to any specific product — it's Docker + MCP, so it works with Claude Code, Gemini CLI, or any MCP-compatible tool.&lt;/p&gt;

&lt;p&gt;It's reassuring when official solutions set the standard. But there are always gaps the officials don't cover. Sharing one answer to those gaps, in a usable form, felt worthwhile.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Answer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full overlap with official solutions?&lt;/td&gt;
&lt;td&gt;No — isolation approaches are similar, but secret hiding and cross-container access are different&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Are officials better in some areas?&lt;/td&gt;
&lt;td&gt;Yes — VM-level robustness (Docker), OS-primitive ease of use and network isolation (Claude Code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unique value beyond officials?&lt;/td&gt;
&lt;td&gt;Yes — filesystem-level secret hiding, controlled cross-container access, config validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Publish?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When I first thought "someone already did this," I almost stopped there. I'm glad I didn't. When I actually read what existed, I realized we were solving different problems.&lt;/p&gt;

&lt;p&gt;If you're working with AI in a multi-container setup and want secrets handled at the filesystem level, give it a try — and let me know what's missing.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI Sandbox Environment + DockMCP&lt;/strong&gt; is available on GitHub:&lt;br&gt;
&lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;https://github.com/YujiSuzuki/ai-sandbox-dkmcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click "Use this template" to start using it in your own project.&lt;br&gt;
Feedback, suggestions, and feature requests are welcome in &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp/discussions" rel="noopener noreferrer"&gt;GitHub Discussions&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>docker</category>
      <category>security</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Your AI Assistant Can See Your Secrets — Here's How I Fixed It</title>
      <dc:creator>Yuji Suzuki</dc:creator>
      <pubDate>Tue, 03 Feb 2026 14:58:07 +0000</pubDate>
      <link>https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj</link>
      <guid>https://dev.to/yujisuzuki/your-ai-assistant-can-see-your-secrets-heres-how-i-fixed-it-19lj</guid>
      <description>&lt;p&gt;AI coding agents are incredible. They refactor code, write tests, debug across files. But there's a problem nobody talks about enough: &lt;strong&gt;they read everything in your project directory&lt;/strong&gt;, including your &lt;code&gt;.env&lt;/code&gt; files, API keys, and private certificates.&lt;/p&gt;

&lt;p&gt;I found out the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Incident
&lt;/h2&gt;

&lt;p&gt;I was using Claude Code to investigate an issue that spanned both a server-side project and an iOS app. The server repo was my working directory, but the iOS source lived in a sibling directory. I asked Claude Code if it could see the iOS code too. It said yes — so I let it investigate.&lt;/p&gt;

&lt;p&gt;What I didn't realize: the iOS project had its own &lt;code&gt;.claude/settings.json&lt;/code&gt; with deny rules to protect API keys. But Claude Code only reads the settings from the directory it was launched in. The iOS project's deny rules were ignored. The AI read the API keys.&lt;/p&gt;

&lt;p&gt;Fortunately, they were ad network keys — not catastrophic. But the experience stuck with me. I had configured the protection correctly, and it still didn't work because of a scoping limitation I didn't know about.&lt;/p&gt;

&lt;p&gt;And the data backs this up. GitHub reported &lt;a href="https://www.darkreading.com/application-security/coders-adopt-ai-agents-security-pitfalls-lurk-2026" rel="noopener noreferrer"&gt;39 million leaked secrets&lt;/a&gt; in repositories in 2024 alone. With 82% of developers now using AI tools, the attack surface is only growing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Is Getting Worse, Not Better
&lt;/h2&gt;

&lt;p&gt;2025 brought a wave of new attack vectors targeting AI coding tools specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PromptPwnd&lt;/strong&gt; — Attackers inject malicious instructions into PR descriptions and commit messages. When an AI agent processes these in CI/CD, it leaks secrets or modifies workflows. &lt;a href="https://www.darkreading.com/application-security/coders-adopt-ai-agents-security-pitfalls-lurk-2026" rel="noopener noreferrer"&gt;At least five Fortune 500 companies were affected&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;IDEsaster&lt;/strong&gt; — Prompt injection through legitimate IDE features. CVEs were issued for &lt;a href="https://www.pillar.security/blog/new-vulnerability-in-github-copilot-and-cursor-how-hackers-can-weaponize-code-agents" rel="noopener noreferrer"&gt;Cursor, Roo Code, JetBrains Junie, GitHub Copilot, and Claude Code&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rules File Backdoor&lt;/strong&gt; — Hidden Unicode characters in configuration files silently instruct AI to insert malicious code. Virtually invisible to developers during code review.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't theoretical. They're happening now.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Existing Solutions Do — and Don't Do
&lt;/h2&gt;

&lt;p&gt;The industry is responding, and there are real options available now:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/sandboxing" rel="noopener noreferrer"&gt;Claude Code Sandboxing&lt;/a&gt;&lt;/strong&gt; uses OS-level primitives (Seatbelt on macOS, bubblewrap on Linux) to restrict filesystem writes and network access. You can also add &lt;code&gt;Read&lt;/code&gt; deny rules in permissions to block AI from reading specific files. This is solid for preventing command execution attacks and prompt injection damage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.docker.com/ai/sandboxes" rel="noopener noreferrer"&gt;Docker AI Sandboxes&lt;/a&gt;&lt;/strong&gt; run AI agents in isolated microVMs with their own Docker daemon. The agent can't touch your host system. Great for giving agents full autonomy without risking your machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.docker.com/blog/mcp-toolkit-mcp-servers-that-just-work/" rel="noopener noreferrer"&gt;Docker MCP Toolkit&lt;/a&gt;&lt;/strong&gt; provides 200+ containerized MCP servers with built-in isolation and secret management.&lt;/p&gt;

&lt;p&gt;These are good tools. But there are two gaps they don't fully address:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gap 1: Filesystem-level secret hiding.&lt;/strong&gt; Claude Code's deny rules are application-level — they depend on correct configuration and on the AI tool respecting them. There are also scope limitations: deny rules &lt;a href="https://github.com/anthropics/claude-code/issues/12962" rel="noopener noreferrer"&gt;don't traverse parent directories&lt;/a&gt;, so in a monorepo or multi-project workspace, settings in one project won't protect secrets in a sibling project. Docker AI Sandboxes sync your entire workspace directory into the microVM with no mechanism to exclude specific files. In both cases, a &lt;code&gt;.env&lt;/code&gt; file sitting in your project directory either remains visible or requires careful per-tool configuration to hide.&lt;/p&gt;
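&lt;p&gt;For concreteness, an application-level deny rule lives in the project's &lt;code&gt;.claude/settings.json&lt;/code&gt;. A minimal sketch (the paths here are hypothetical examples) looks like:&lt;/p&gt;

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./secrets/**)"
    ]
  }
}
```

&lt;p&gt;Because of the scoping limitation above, this file only protects sessions launched from this directory — not sessions launched from a parent or sibling project.&lt;/p&gt;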

&lt;p&gt;What if the secrets simply didn't exist in AI's filesystem? Not blocked by a rule, not filtered by a config — just not there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gap 2: Cross-container debugging.&lt;/strong&gt; Docker AI Sandboxes are fully isolated — each sandbox can't communicate with others. Claude Code's sandbox restricts outbound access. But in a real microservice setup, AI needs to check API logs, run tests in other containers, and inspect services to be useful. How do you give it that access safely?&lt;/p&gt;

&lt;h2&gt;
  
  
  My Solution: AI Sandbox + DockMCP
&lt;/h2&gt;

&lt;p&gt;The existing tools above are valuable — and you should use them. Claude Code's sandboxing reduces permission fatigue. Docker AI Sandboxes provide strong isolation for autonomous agents.&lt;/p&gt;

&lt;p&gt;But I needed something that works at a different layer: &lt;strong&gt;making secrets physically absent from AI's filesystem&lt;/strong&gt;, while still giving AI &lt;strong&gt;controlled access to other containers&lt;/strong&gt; for debugging.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;AI Sandbox&lt;/a&gt; — a Docker-based development environment that fills these two gaps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Secret hiding&lt;/strong&gt;: Specific files don't exist in AI's filesystem — not blocked, not filtered, just absent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-container access&lt;/strong&gt;: AI can check logs and run tests in other containers through a controlled gateway&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's complementary to existing tools. You can use Claude Code's sandbox &lt;em&gt;inside&lt;/em&gt; the AI Sandbox for defense in depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Secret Hiding Works
&lt;/h3&gt;

&lt;p&gt;The trick is surprisingly simple: Docker volume mounts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;
&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# AI sees an empty file instead of real .env&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/dev/null:/workspace/my-app/.env:ro&lt;/span&gt;

&lt;span class="na"&gt;tmpfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# AI sees an empty directory instead of real secrets&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/workspace/my-app/secrets:ro&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When AI runs inside the container, it literally cannot see the real files. They don't exist in its filesystem. But other containers — the ones actually running your app — mount the real files normally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Sandbox (where AI runs)
├── my-app/
│   ├── src/           ← AI can read and edit all code
│   ├── .env           ← Empty (mounted from /dev/null)
│   └── secrets/       ← Empty directory (tmpfs)

API Container (where your app runs)
├── my-app/
│   ├── src/           ← Same code
│   ├── .env           ← Real secrets
│   └── secrets/       ← Real private keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No runtime agents scanning for secrets. No AI configuration that could be bypassed. The secrets simply don't exist in AI's world.&lt;/p&gt;

&lt;p&gt;The risk with this approach is forgetting to hide a file. To catch that, the sandbox runs startup validation that automatically checks whether your AI tool's deny rules (e.g., &lt;code&gt;.claude/settings.json&lt;/code&gt;) and your &lt;code&gt;docker-compose.yml&lt;/code&gt; volume mounts are in sync. If a secret file is blocked in one but not the other, you get a warning on startup — before AI ever sees it.&lt;/p&gt;
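&lt;p&gt;The validation idea can be sketched in a few lines of Python (a minimal illustration, not the sandbox's actual implementation; the hidden paths and deny rules are example data that would, in practice, be parsed from &lt;code&gt;docker-compose.yml&lt;/code&gt; and &lt;code&gt;.claude/settings.json&lt;/code&gt;):&lt;/p&gt;

```python
import fnmatch
import json

# Paths hidden at the filesystem level (in practice, parsed from the
# volumes:/tmpfs: sections of docker-compose.yml)
compose_hidden = ["my-app/.env", "my-app/secrets"]

# Deny rules from the AI tool's settings (example data standing in for
# the "permissions" block of .claude/settings.json)
settings = json.loads('{"permissions": {"deny": ["Read(my-app/.env)"]}}')
deny_paths = [
    rule[5:-1]  # strip the surrounding "Read(" ... ")"
    for rule in settings["permissions"]["deny"]
    if rule.startswith("Read(")
]

# Warn when a path is protected in one layer but not the other
for path in compose_hidden:
    if not any(fnmatch.fnmatch(path, deny) for deny in deny_paths):
        print(f"WARNING: {path} is hidden by a mount but not denied in settings")
for deny in deny_paths:
    if deny not in compose_hidden:
        print(f"WARNING: {deny} is denied in settings but not hidden by a mount")

# → WARNING: my-app/secrets is hidden by a mount but not denied in settings
```

&lt;p&gt;The point of checking both directions is that a path protected in only one layer is exactly the kind of silent misconfiguration the startup validation exists to catch.&lt;/p&gt;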

&lt;h3&gt;
  
  
  How Cross-Container Access Works
&lt;/h3&gt;

&lt;p&gt;Hiding secrets creates a new problem: AI can't access other containers for debugging. That's where DockMCP comes in.&lt;/p&gt;

&lt;p&gt;DockMCP is an MCP (Model Context Protocol) server that runs on the host OS and acts as a controlled gateway between the AI sandbox and other Docker containers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Sandbox  →  DockMCP (host)  →  API Container
  "show me       checks            returns last
   the logs"     security policy   100 log lines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read logs from any allowed container&lt;/li&gt;
&lt;li&gt;Run whitelisted commands (like &lt;code&gt;npm test&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Check resource usage (CPU, memory)&lt;/li&gt;
&lt;li&gt;Inspect container configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI cannot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start or stop containers&lt;/li&gt;
&lt;li&gt;Access files outside allowed paths&lt;/li&gt;
&lt;li&gt;Run arbitrary commands&lt;/li&gt;
&lt;li&gt;Bypass the security policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The security policy is defined in a simple YAML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;security&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moderate"&lt;/span&gt;
  &lt;span class="na"&gt;allowed_containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-api-*"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-web-*"&lt;/span&gt;
  &lt;span class="na"&gt;exec_whitelist&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-api"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lint"&lt;/span&gt;
  &lt;span class="na"&gt;blocked_paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/etc/shadow"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/.env"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DockMCP also automatically masks sensitive data (passwords, API keys, tokens) in log output and command results, so even when AI reads logs from other containers, secrets don't leak through.&lt;/p&gt;
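&lt;p&gt;Masking of this kind can be approximated with a few regular expressions. A rough Python sketch (illustrative patterns, not DockMCP's actual rule set):&lt;/p&gt;

```python
import re

# Illustrative patterns for common secret shapes in log lines
PATTERNS = [
    # key=value / key: value pairs for sensitive-looking key names
    (re.compile(r'(?i)(api[_-]?key|password|secret|token)(["\s:=]+)\S+'), r'\1\2****'),
    # Bearer tokens in Authorization headers
    (re.compile(r'Bearer\s+[A-Za-z0-9._-]+'), 'Bearer ****'),
]

def mask(line):
    """Replace likely secret values with **** before returning logs to AI."""
    for pattern, replacement in PATTERNS:
        line = pattern.sub(replacement, line)
    return line

print(mask('auth failed: api_key=sk-live-12345'))
# → auth failed: api_key=****
```

&lt;p&gt;Regex-based masking is best-effort — it catches common shapes but can miss unusual formats, which is why hiding the secret files themselves remains the primary defense.&lt;/p&gt;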

&lt;p&gt;For the full configuration reference, see the &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp/blob/main/dkmcp/README.md" rel="noopener noreferrer"&gt;DockMCP documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│ Host OS                                 │
│                                         │
│  DockMCP Server (:8080)                 │
│    ├── Security policy enforcement      │
│    └── Container access gateway         │
│                                         │
│  Docker Engine                          │
│    ├── AI Sandbox                       │
│    │   ├── Claude Code / Gemini / etc.  │
│    │   └── secrets → hidden             │
│    │                                    │
│    ├── API Container                    │
│    │   └── secrets → real files         │
│    │                                    │
│    └── Web Container                    │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Here's a real debugging session. I ask AI to help investigate a failing API endpoint:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; The &lt;code&gt;/api/notes&lt;/code&gt; endpoint returns 500. Can you check the API logs?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI uses DockMCP to fetch logs from the API container:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI:&lt;/strong&gt; I can see the error in the logs. There's a JWT verification failure at line 42 of &lt;code&gt;auth.js&lt;/code&gt;. The token format changed after a recent update. Let me check the code...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI reads the auth middleware code (visible in the sandbox), identifies the bug, and proposes a fix. It never sees the JWT secret key (hidden via volume mount), but it doesn't need to — it can see the code and the error logs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; Looks good. Run the tests to make sure it passes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI runs &lt;code&gt;npm test&lt;/code&gt; via DockMCP in the API container and confirms all tests pass.&lt;/p&gt;

&lt;p&gt;The AI had everything it needed to debug and fix the issue, without ever accessing a single secret.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Try It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Minimal Setup (Sandbox Only)
&lt;/h3&gt;

&lt;p&gt;If you just want secret hiding without cross-container access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/YujiSuzuki/ai-sandbox-dkmcp.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ai-sandbox-dkmcp
code &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="c"&gt;# Cmd+Shift+P → "Dev Containers: Reopen in Container"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;docker-compose.yml&lt;/code&gt; to hide your secret files, and you're done.&lt;/p&gt;
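&lt;p&gt;Concretely, that means adding mount entries like these for each secret you want hidden (the paths below are placeholders for your own project layout, mirroring the earlier example):&lt;/p&gt;

```yaml
# docker-compose.yml (AI sandbox service)
volumes:
  - /dev/null:/workspace/your-app/.env:ro   # file appears empty to AI
tmpfs:
  - /workspace/your-app/secrets:ro          # directory appears empty to AI
```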

&lt;h3&gt;
  
  
  Full Setup (Sandbox + DockMCP)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone and enter&lt;/span&gt;
git clone https://github.com/YujiSuzuki/ai-sandbox-dkmcp.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ai-sandbox-dkmcp

&lt;span class="c"&gt;# 2. Start DockMCP on host OS&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;dkmcp
make &lt;span class="nb"&gt;install
&lt;/span&gt;dkmcp serve &lt;span class="nt"&gt;--config&lt;/span&gt; configs/dkmcp.example.yaml

&lt;span class="c"&gt;# 3. Open DevContainer in VS Code&lt;/span&gt;
code &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="c"&gt;# Cmd+Shift+P → "Dev Containers: Reopen in Container"&lt;/span&gt;
&lt;span class="c"&gt;# Or use the CLI sandbox (no VS Code needed):&lt;/span&gt;
&lt;span class="c"&gt;# ./cli_sandbox/claude.sh or ./cli_sandbox/gemini.sh&lt;/span&gt;

&lt;span class="c"&gt;# 4. Register DockMCP as an MCP server (inside the container)&lt;/span&gt;
claude mcp add &lt;span class="nt"&gt;--transport&lt;/span&gt; sse &lt;span class="nt"&gt;--scope&lt;/span&gt; user dkmcp http://host.docker.internal:8080/sse
&lt;span class="c"&gt;# Or for Gemini CLI:&lt;/span&gt;
gemini mcp add &lt;span class="nt"&gt;--transport&lt;/span&gt; sse dkmcp http://host.docker.internal:8080/sse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo includes a demo application (SecureNote) so you can see the full flow in action. For detailed setup instructions, troubleshooting, and connection verification, see the &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;I want to be upfront about what this does and doesn't do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local development only.&lt;/strong&gt; DockMCP has no authentication yet, so it's designed for local use only. I'd like to add authentication in the future, but no timeline has been set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No network restriction by default.&lt;/strong&gt; AI can still make outbound HTTP requests. You can add network restrictions using &lt;a href="https://github.com/anthropics/claude-code/blob/main/.devcontainer/init-firewall.sh" rel="noopener noreferrer"&gt;Anthropic's official firewall script&lt;/a&gt; or Docker network policies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker required.&lt;/strong&gt; The volume mount approach means you need a Docker-compatible runtime. No Docker, no secret hiding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only tested on macOS.&lt;/strong&gt; It should work on Linux and Windows, but I haven't verified it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a replacement for proper secrets management.&lt;/strong&gt; This is a development-time protection layer. Use HashiCorp Vault, AWS Secrets Manager, etc. for production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complementary, not competing.&lt;/strong&gt; This works alongside Claude Code's sandbox and Docker AI Sandboxes. Use them together for defense in depth.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;85% of developers now use AI coding tools (&lt;a href="https://www.javacodegeeks.com/2025/12/ai-assisted-coding-in-2026-how-github-copilot-cursor-and-amazon-q-are-reshaping-developer-workflows.html" rel="noopener noreferrer"&gt;JetBrains, 2025&lt;/a&gt;). AI writes &lt;a href="https://dev.to/alexmercedcoder/ai-coding-dominates-2026-week-of-january-20-27-7lc"&gt;29% of new code&lt;/a&gt; in the US. These tools aren't going away — they're becoming the default way we write software.&lt;/p&gt;

&lt;p&gt;But the security model hasn't caught up. We give AI full read access to our projects and hope for the best. When something leaks, we rotate keys and move on.&lt;/p&gt;

&lt;p&gt;We can do better. The approach doesn't have to be complicated:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Isolate AI in a container&lt;/strong&gt; so secrets can be hidden at the filesystem level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide controlled access&lt;/strong&gt; to other containers for debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define explicit security policies&lt;/strong&gt; for what AI can and cannot do&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;AI Sandbox + DockMCP&lt;/a&gt; template is one implementation of this idea. Hit "Use this template" on GitHub to start your own.&lt;/p&gt;

&lt;p&gt;The goal isn't to restrict AI — it's to let AI work freely in a space where secrets simply don't exist.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;AI Sandbox + DockMCP is open source (MIT License). Contributions and feedback are welcome on &lt;a href="https://github.com/YujiSuzuki/ai-sandbox-dkmcp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>security</category>
      <category>ai</category>
      <category>docker</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
