By AIO Sandbox team
This is the second part of a series of posts diving into AIO Sandbox that provides an isolated, programmable environment where agents can safely execute.
In the world of AI agents, we’ve moved past simple text-in, text-out. Today’s agents need to act: browse the web, interact with GUIs, and communicate through standardized protocols. However, running these agents locally is risky. You wouldn't want an experimental script to have unfettered access to your main browser or local file system. This is where the AIO Sandbox comes in.
In our previous post we focused on the Filesystem and Shell capabilities of AIO Sandbox. In this post, we’ll explore how AIO Sandbox supercharges AI agents, from web automation to MCP protocol communication.
1. Seamless Web Automation with browser-use
General-purpose GUI agents (computer-use agents, or CUAs) integrate advanced reasoning, perception, and action modules, enabling them to perform tasks with human-like adaptability and versatility.
The first challenge for any agent is navigating the web. Using the browser-use library alongside a remote CDP (Chrome DevTools Protocol) session provided by the AIO Sandbox, we can create an agent that feels like it’s "seeing" the web.
A browser-use agent built on the Chrome DevTools Protocol (CDP) is essentially an LLM-powered controller that treats the browser like a programmable environment. Think of CDP as the browser’s brain exposed as an API. Instead of sending clicks to pixel X, Y coordinates, commands are sent directly to the DOM (Document Object Model).
High-level architecture of a browser-use agent
User Task
↓
LLM (Planner + Reasoner)
↓
Agent Runtime
↓
CDP Client (WebSocket)
↓
Browser (Chrome/Chromium)
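Under the hood, every CDP interaction is a JSON message sent over a WebSocket. As a minimal sketch of what travels over that connection (Page.navigate is a real CDP method; the session plumbing is simplified away here), a single navigation command looks like this:

```python
import json

def cdp_command(msg_id: int, method: str, params: dict) -> str:
    """Serialize a CDP command as the JSON message sent over the WebSocket."""
    return json.dumps({"id": msg_id, "method": method, "params": params})

# Navigate the page to a URL via the standard CDP method Page.navigate
msg = cdp_command(1, "Page.navigate", {"url": "https://example.com"})
print(msg)
```

Libraries like browser-use hide this layer entirely; the agent runtime translates high-level intents into streams of messages like the one above.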
The code below demonstrates how to connect a browser-use agent to a sandboxed browser to perform a specific task: extracting Cisco’s stock price.
import asyncio

from agent_sandbox import Sandbox
from browser_use import Agent, BrowserProfile, BrowserSession
from browser_use.llm import ChatOpenAI


async def main():
    # 1. Connect to the AIO Sandbox
    sandbox = Sandbox(base_url="http://localhost:8080")
    cdp_url = sandbox.browser.get_info().data.cdp_url

    # 2. Attach the browser session to the remote Sandbox
    browser_session = BrowserSession(
        browser_profile=BrowserProfile(cdp_url=cdp_url, is_local=True)
    )

    task = 'Go to Google, look up "Cisco stock price" (CSCO), and extract the current price.'
    agent = Agent(
        task=task,
        llm=ChatOpenAI(model="gpt-4o"),
        browser_session=browser_session,
    )

    await agent.run()
    await browser_session.kill()


if __name__ == "__main__":
    asyncio.run(main())
Why this matters: By offloading the browser to a sandbox, your agent runs in an isolated environment. If the agent accidentally clicks a malicious link or tries to download a file, it only affects the disposable container, not your host machine.
2. Standardizing Discovery: The Agentic MCP (Model Context Protocol) Loop
Next, we have the Model Context Protocol (MCP). MCP is a standardized way for agents to discover and use tools across different environments without custom integration code for every new feature; it is the "native language" of modern AI agents. AIO Sandbox exports its services (such as browser_navigate) as tools that an LLM can call directly. For a human developer, the REST API is the interface. For an AI agent, the MCP Tool Manifest is the interface. By exposing both, the sandbox ensures that:
A Python script can use the sandbox via REST.
An LLM (like Claude) can use the sandbox via MCP.
The AIO Sandbox acts as an MCP host, allowing you to list and execute tools (like browser_navigate or shell_execute) through a unified interface.
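The tool manifest entries that an MCP host exposes are essentially JSON Schema descriptions. As a shape illustration only (the field values below are hypothetical, not the sandbox's actual schema), a manifest entry for browser_navigate might look like this:

```python
# A hypothetical MCP tool manifest entry; the exact schema the sandbox
# returns may differ. This illustrates the general shape.
browser_navigate_tool = {
    "name": "browser_navigate",
    "description": "Navigate the sandboxed browser to a URL",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Destination URL"},
        },
        "required": ["url"],
    },
}

# Because parameters are plain JSON Schema, an agent can sanity-check
# a call's arguments before executing the tool.
def has_required_args(tool: dict, args: dict) -> bool:
    required = tool["input_schema"].get("required", [])
    return all(key in args for key in required)

print(has_required_args(browser_navigate_tool, {"url": "https://google.com"}))  # True
```

This self-describing quality is what lets an LLM discover and call a tool it has never seen before.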
To see this MCP usage in practice, let’s look at how we can bridge the gap between an LLM and the sandbox using MCP. The following example demonstrates a dynamic 'handshake' where the agent queries the AIO Sandbox for its available capabilities and automatically converts them into a format the LLM understands.
import asyncio
import json

from openai import AsyncOpenAI
from agent_sandbox import AsyncSandbox

# Initialize the LLM client and the MCP client
ai_client = AsyncOpenAI(api_key="API-KEY")  # replace with your key
mcp_client = AsyncSandbox(base_url="http://localhost:8080")


async def run_agent(user_prompt):
    # 1. Fetch available tools from the MCP "browser" server
    tools_resp = await mcp_client.mcp.list_mcp_tools(server_name="browser")

    # Format MCP tools into OpenAI's function-calling JSON schema
    openai_tools = [
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                # MCP describes parameters in JSON Schema, the same
                # format OpenAI's tool-calling API expects
                "parameters": tool.input_schema,
            },
        }
        for tool in tools_resp.data.tools
    ]

    # 2. Ask the model what to do
    response = await ai_client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": user_prompt}],
        tools=openai_tools,
    )
    message = response.choices[0].message

    # 3. If the model wants to call a tool, execute it in the sandbox
    if message.tool_calls:
        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            print(f"Agent is executing {tool_call.function.name}...")

            # Call the MCP sandbox
            result = await mcp_client.mcp.execute_mcp_tool(
                server_name="browser",
                tool_name=tool_call.function.name,
                request=args,
            )

            # 4. Give the tool's output back to the model
            final_resp = await ai_client.chat.completions.create(
                model="gpt-5.4",
                messages=[
                    {"role": "user", "content": user_prompt},
                    message,
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": str(result.data.content[0].text),
                    },
                ],
            )
            print(f"Final Answer: {final_resp.choices[0].message.content}")


if __name__ == "__main__":
    asyncio.run(run_agent("Go to google.com and tell me what the search button says and the button next to it."))
Why MCP is the future: Instead of writing custom logic for every new sandbox capability, your agent can "handshake" with the server, see what it's capable of, and start working immediately.
The Shift: This decouples your code from your tools. If you add a "SQL" or "FileSystem" server to the sandbox tomorrow, your agent automatically knows how to use them without a single line of new code.
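To make that decoupling concrete, the schema-conversion loop from the example above doesn't care which server the tools came from. Here is a sketch of a server-agnostic helper (mcp_tools_to_openai is a hypothetical name, and plain dicts stand in for the SDK's tool objects for illustration):

```python
def mcp_tools_to_openai(tools: list) -> list:
    """Convert MCP tool descriptors (name / description / input_schema)
    into OpenAI's function-calling format, regardless of which MCP
    server they came from."""
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t["description"],
                "parameters": t["input_schema"],
            },
        }
        for t in tools
    ]

# The same helper works unchanged for tools from a hypothetical "shell" server:
shell_tools = [
    {
        "name": "shell_execute",
        "description": "Run a shell command in the sandbox",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
]
print(mcp_tools_to_openai(shell_tools)[0]["function"]["name"])  # shell_execute
```

Swap the server name in list_mcp_tools and the rest of the agent loop stays identical; that is the decoupling MCP buys you.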
3. Conclusion
Building AI agents is no longer just about the prompt. It’s about the infrastructure that supports them. By combining AIO Sandbox for security, browser-use for high-level navigation, and MCP for standardized tool access, you can build agents that are powerful, safe, and interoperable.
The future of autonomous work isn't just a chatbot—it's a sandboxed agent with a browser in one hand and MCP in the other.
