DEV Community

Yuji Suzuki

I Gave My AI a Toolbox — It Surprised Me by Using It

A 3.5MB Go binary was sitting in my source tree, staged for commit. git log showed nothing — it had never been committed. I had no idea how it got there.

I asked Claude Code to investigate. I didn't point it to any specific tool. I didn't tell it how to search. I just said: "There's a mystery binary in the repo. Where did it come from?"

It solved the mystery in under a minute — using a tool I'd forgotten I built.

But that's getting ahead of the story.

This is Part 3 of the AI Sandbox series. Part 1 introduced the AI Sandbox and secret hiding. Part 2 positioned it against official Docker and Claude Code sandboxes. This article focuses on what happens inside the sandbox — specifically, what AI does when you give it the ability to discover tools on its own.

Drop a File, Get a Tool

Inside the AI Sandbox — a Docker-based environment where AI coding agents run in isolation — I wanted a simple way to extend what AI can do. Not by editing config files or registering MCP servers for every new capability, but something closer to a plugin folder: drop a file in, and AI can use it.

That's what SandboxMCP does. It's an MCP server that runs inside the container alongside Claude Code (or Gemini CLI), scanning two directories on demand:

  • .sandbox/tools/ — Go source files
  • .sandbox/scripts/ — Shell scripts (which can call Python, Node.js, etc.)

When AI asks "what tools are available?", SandboxMCP scans these directories, parses the header comments for descriptions and usage info, and exposes them as MCP tools. No build step, no npm install, no manual registration.
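To make this concrete, here's what a drop-in tool file could look like. This `line-count.go` is a hypothetical example of mine, not one of the tools shipped with the project; it just follows the same header-comment convention:

```go
// line-count.go - (hypothetical example) count lines in a file
//
// Usage:
//   go run .sandbox/tools/line-count.go <file>
//
// Examples:
//   go run .sandbox/tools/line-count.go README.md
package main

import (
	"fmt"
	"os"
	"strings"
)

// countLines returns the number of newline-terminated lines in s.
func countLines(s string) int {
	return strings.Count(s, "\n")
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go run line-count.go <file>")
		os.Exit(1)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(countLines(string(data)))
}
```

Drop it into `.sandbox/tools/` and it appears in the next scan. No registration step.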

How it works

Each file just needs a header comment describing what it does. SandboxMCP parses these on demand and exposes them as MCP tools. AI discovers a tool through list_tools, reads its documentation with get_tool_info, and executes it with run_tool — all without you telling it to.

Here's the header from search-history.go — the tool that ended up solving the mystery:

// search-history.go - Claude Code conversation history search tool
//
// Usage:
//   go run .sandbox/tools/search-history.go [options] <pattern>
//
// Examples:
//   go run .sandbox/tools/search-history.go "DockMCP"
//   go run .sandbox/tools/search-history.go -role tool -tool Bash "go build"

A few lines of comments. That's all AI needs to decide whether a tool is relevant.

┌────────────────────────────────────────────────────┐
│ AI Sandbox (inside container)                      │
│                                                    │
│  ┌─────────────────┐      ┌─────────────────────┐  │
│  │ Claude Code     │ ←──→ │ SandboxMCP (stdio)  │  │
│  │                 │      │                     │  │
│  └─────────────────┘      │ • list_tools        │  │
│                           │ • get_tool_info     │  │
│  ┌─────────────────────┐  │ • run_tool          │  │
│  │ .sandbox/tools/     │  │ • list_scripts      │  │
│  │ • search-history.go │←─│ • get_script_info   │  │
│  │ • usage-report.go   │  │ • run_script        │  │
│  └─────────────────────┘  └─────────────────────┘  │
│  ┌─────────────────────┐                           │
│  │ .sandbox/scripts/   │                           │
│  │ • validate-secrets  │                           │
│  │ • sync-secrets      │                           │
│  └─────────────────────┘                           │
└────────────────────────────────────────────────────┘

Back to the Mystery Binary

So — a 3.5MB Go binary, staged but never committed. Where did it come from?

Claude Code started with the obvious: git log, file timestamps, git diff --cached. Dead ends. The binary was there, but nothing in the commit history explained it.

Then something unexpected showed up in the tool calls. Instead of asking me for more context, Claude Code called list_tools on SandboxMCP — browsing the tool catalog on its own initiative. It found search-history, read the documentation with get_tool_info, and decided the -role tool -tool Bash filter was what it needed. It ran the search across all past Claude Code sessions — not looking for conversations, but for Bash commands.

Within seconds, the culprit appeared: a session from a few days earlier where AI had run go build without the -o flag to verify that the code compiled. The build succeeded, but go build without -o drops the binary in the current directory. That's how a 3.5MB executable ended up in the source tree.

I didn't build search-history for this scenario. I wrote it for finding past conversations about specific topics. But because it was sitting in .sandbox/tools/, SandboxMCP exposed it, and Claude Code decided — on its own — that it was the right tool for the job.

The auto-discovery mechanism turned a tool I'd forgotten about into the solution for a problem I hadn't anticipated.

Discovery vs. Installation

By now, AI using tools is nothing remarkable. What caught my attention was the how.

There are already several MCP servers for searching Claude Code conversation history — claude-historian-mcp, claude-code-conversation-search-mcp, and others. They work well. If I'd installed one, Claude Code could have searched past sessions too.

But here's the thing: I would never have installed a conversation-search MCP server to debug a mystery binary. That connection — "unknown file in the repo" → "search past AI sessions for Bash commands" — only became obvious after the answer was found. Before that, it was a lateral leap that required recognizing that a tool built for one purpose could solve a completely different problem.

That's the gap between installation and discovery. When you install a tool, it exists because you anticipated the need. When AI discovers tools from a catalog, it can make connections you didn't plan for — repurposing tools across problem boundaries you never drew.

You don't have to predict every need upfront. You build a toolbox, and AI reaches for the right tool when the moment comes.

Building Your Own Toolbox

The pattern scales with the tools you add. Shell scripts can call Python, Node.js, or anything else installed in the container. The only requirement is a header comment that describes what the tool does.

Some examples from this project:

Tool                  What It Does                             How AI Uses It
search-history.go     Search past Claude Code sessions         "What did we work on last week?"
usage-report.go       Token usage by model and period          "How much did I spend this month?"
validate-secrets.sh   Verify secret hiding works               "Are my secrets properly hidden?"
sync-secrets.sh       Sync secret config across environments   "Check if my secret config is consistent"

You can start with one tool. The cost of an unused tool sitting in a directory is zero. The value of having the right tool already there when an unexpected problem arises — that's what this article is about.

What This Doesn't Solve

To be consistent with the honesty in Part 1 and Part 2:

  • Tools must already exist. AI discovers tools — it doesn't create them from scratch in the moment. Someone has to write the tool and place it in the directory first. (Though in practice, AI can write a tool in one session that another session later discovers and uses.)
  • Discovery depends on good descriptions. If your header comment is vague, AI may not recognize the tool as relevant. The quality of the metadata matters.
  • Scoped to the sandbox. SandboxMCP only sees tools inside the container. It doesn't discover tools on the host OS or in other containers — that's DockMCP's job.
  • No sandboxing within the sandbox. Tools run with the same permissions as AI itself. SandboxMCP doesn't add an extra isolation layer for tool execution.

How It Fits Together

SandboxMCP completes the tooling stack of the AI Sandbox:

Component    Location            Purpose
AI Sandbox   Container           Secure environment where AI runs. Secrets hidden via volume mounts
SandboxMCP   Container (stdio)   Tool auto-discovery and execution inside the sandbox
DockMCP      Host OS (SSE)       Cross-container access — AI can check logs and run tests in other containers

SandboxMCP auto-registers on container startup. No manual setup required.
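For reference, a stdio MCP server registered with Claude Code typically appears in `.mcp.json` as an entry like the one below. This is a generic illustration of the format; the server name and binary path here are placeholders, and the actual entry generated by the AI Sandbox may differ:

```json
{
  "mcpServers": {
    "sandbox-mcp": {
      "command": "/workspace/.sandbox/sandbox-mcp/sandbox-mcp",
      "args": []
    }
  }
}
```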

Try It

VS Code + DevContainer:

git clone https://github.com/YujiSuzuki/ai-sandbox-dkmcp.git
cd ai-sandbox-dkmcp
code .
# Cmd+Shift+P → "Dev Containers: Reopen in Container"

CLI (no VS Code needed):

git clone https://github.com/YujiSuzuki/ai-sandbox-dkmcp.git
cd ai-sandbox-dkmcp
./cli_sandbox/claude.sh

Once inside, try dropping your own tool in .sandbox/tools/ and asking AI:

  • "What tools are available?"
  • "What did we work on this week?"
  • "Show me token usage this month"

You don't have to tell AI which tool to use. It figures that out on its own.

What I Took Away

The binary mystery was a small thing. But it changed how I think about AI tooling.

I'm not claiming drop-in tools replace dedicated MCP servers — those are more capable and better integrated for specific use cases. But for the long tail of small, project-specific utilities that don't justify their own server, a plugin folder works remarkably well. The cost of an unused tool sitting in a directory is zero.

I built a search tool to find old conversations. AI used it to track down a mystery I didn't know I had. That's what happens when you stop installing tools for specific problems and start building a toolbox.


Appendix: The Actual Conversation

Everything above describes what happened. Here's how it actually looked — the unedited exchange between me and Claude Code that inspired this article.

This conversation was originally in Japanese. The following is a translated version.
Original: Zenn article (Japanese)

● ---
  Code Review Results

  Project: /workspace
  Mode: Git mode (staged changes)
  Review target: Staged changes (12 files, +1264 -46)
  Summary: Addition of usage-report.go, improvements to search-history.go
  (date filters, timezone support), toolparser separator support,
  documentation updates

  Issues Found

  Issue 1: Compiled binary is staged
  - File: .sandbox/sandbox-mcp/search-history
  - Line: N/A (binary, 3.5MB)
  - Basis: Best practice violation
  - Confidence: 85/100

  Binary files /dev/null and b/.sandbox/sandbox-mcp/search-history differ

  Go projects should not commit compiled binaries. It can be regenerated
  with make install, bloats the repository, and causes platform-dependent
  issues.

❯ new file:   .sandbox/sandbox-mcp/search-history
  This one?

● Yes, that's the one. .sandbox/sandbox-mcp/search-history is a compiled
  Go binary (3.5MB) included in staging.

  Since it can be regenerated with make install or go build, there's no
  need to commit it. To unstage:

  git reset HEAD .sandbox/sandbox-mcp/search-history

  You may also want to add it to .gitignore.

❯ When did I build this binary?

● Read 2 files (ctrl+o to expand)

● Bash(git log --all --diff-filter=A -- .sandbox/sandbox-mcp/search-history)
  ⎿  (No content)

● Found the cause in past conversations.

  In session 16f4a485-b7d (today, 18:53), the AI ran
  go build /workspace/.sandbox/tools/search-history.go to verify
  compilation — but without the -o flag, so the binary was written to
  the current directory (.sandbox/sandbox-mcp/).

  In another session (e33d09a7), it correctly used go build -o /dev/null,
  but session 16f4a485 forgot the -o flag, leaving the binary behind.

  It wasn't intentional — safe to unstage.

❯ Wait... you searched past conversation history? With search-history?

● Yes, exactly. Here's what I ran via SandboxMCP's run_tool:

  search-history.go -role tool -tool Bash -after 2026-02-09 "search-history" -context 100

  I filtered Bash tool executions containing "search-history" from today's
  sessions to pinpoint when and how go build was run.

  Review flagged the issue → investigation completed with its own tools.
  That's the beauty of .sandbox/tools/.

❯ That's incredible. Without the search, we'd never have known the cause, right?

● Exactly. Without search-history, "why is this binary here?" would have
  remained unanswered:

  - git log had nothing — the file was never committed
  - The file timestamp (18:53) tells you *when*, not *what happened*
  - The previous session's AI is gone

  The conversation history was the only record of "go build was run
  without -o", and search-history is what pulled that out.

  AI sessions are ephemeral. Without a way to search past sessions,
  questions like "who created this and when?" become unanswerable.
  It's unglamorous, but genuinely useful.

❯ This is amazing — can we write about this somehow?

That last question led to this article.

The part I find most interesting isn't that the AI solved the problem. It's that git log returned nothing, and instead of asking me for more context, it reached for a tool I'd forgotten was there — and used it in a way I hadn't anticipated when I wrote it.


This is Part 3 of the AI Sandbox series. Part 1: Your AI Assistant Can See Your Secrets — Here's How I Fixed It / Part 2: Official AI Sandboxes Arrived — Why I Published Mine Anyway

AI Sandbox + DockMCP is open source (MIT License). Feedback and contributions welcome.

Top comments (1)

chovy

This is a great breakdown of the discovery vs. installation distinction. The plugin folder pattern is underrated — it's basically what Unix philosophy looks like when applied to AI tooling.

The mystery binary story is a perfect example of emergent tool use. You didn't anticipate the connection between conversation search and debugging stray build artifacts, but the agent made that lateral leap because the tool was available, not because it was prescribed.

I've been curating a list of AI agent platforms that take different approaches to this kind of tool/capability discovery — some go the MCP route, others do custom plugin systems, and a few let agents discover and compose tools dynamically like your SandboxMCP approach. If you're interested in seeing how different platforms tackle it: awesome-agent-platforms

Curious whether you've seen agents chain multiple SandboxMCP tools together in a single investigation — like using search-history to find a past session, then usage-report to correlate the cost of that session?