
Inside AIO Sandbox (Part 2): Bridging the Gap — Mastering AI Agents with Browser-Use, and MCP

By AIO Sandbox team

This is the second part of a series diving into AIO Sandbox, which provides an isolated, programmable environment where agents can execute safely.
In the world of AI agents, we’ve moved past simple text-in, text-out. Today’s agents need to act: browse the web, interact with GUIs, and communicate through standardized protocols. However, running these agents locally is risky. You wouldn’t want an experimental script to have unfettered access to your main browser or local file system. This is where AIO Sandbox comes in.
In our previous post, we focused on the Filesystem and Shell capabilities of AIO Sandbox. In this post, we’ll explore ways to supercharge AI agents with AIO Sandbox, from web automation to MCP-based tool communication.

1. Seamless Web Automation with browser-use

General-purpose GUI agents (computer-use agents, or CUAs) integrate advanced reasoning, perception, and action modules, enabling them to perform tasks with human-like adaptability and versatility.
The first challenge for any agent is navigating the web. Using the browser-use library with a remote CDP (Chrome DevTools Protocol) session provided by AIO Sandbox, we can create an agent that feels like it’s "seeing" the web.
A browser-use agent built on CDP is essentially an LLM-powered controller that treats the browser as a programmable environment. Think of CDP as the browser’s brain exposed as an API, a "backdoor" into the browser. Instead of sending clicks to pixel X, Y coordinates, commands are sent directly to the DOM (Document Object Model).
High-level architecture of a browser-use agent

User Task
   ↓
LLM (Planner + Reasoner)
   ↓
Agent Runtime
   ↓
CDP Client (WebSocket)
   ↓
Browser (Chrome/Chromium)
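Each layer in that stack ultimately bottoms out in small JSON messages sent over a WebSocket. You rarely touch the raw protocol yourself (browser-use handles the plumbing), but a minimal sketch of the message framing makes the "API into the browser's brain" idea concrete. The helper name below is ours; the message shape follows the DevTools protocol:

```python
import json

def cdp_command(cmd_id: int, method: str, **params) -> str:
    """Frame a CDP command as the JSON message sent over the WebSocket.
    (Helper name is ours; the {id, method, params} shape is the protocol's.)"""
    return json.dumps({"id": cmd_id, "method": method, "params": params})

# The kind of message the agent runtime sends instead of blind pixel clicks
print(cdp_command(1, "Page.navigate", url="https://example.com"))
```

Every command carries an `id` so the client can match the browser's asynchronous response back to the request that caused it.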

The code below demonstrates how to connect a browser-use agent to a sandboxed browser to perform a specific task: extracting Cisco’s stock price.

import asyncio
from agent_sandbox import Sandbox
from browser_use import Agent, Tools, BrowserProfile, BrowserSession
from browser_use.llm import ChatOpenAI

async def main():
    # 1. Connect to the AIO Sandbox
    sandbox = Sandbox(base_url="http://localhost:8080")
    cdp_url = sandbox.browser.get_info().data.cdp_url

    # 2. Attach the browser session to the remote Sandbox
    browser_session = BrowserSession(
        browser_profile=BrowserProfile(cdp_url=cdp_url, is_local=True)
    )

    task = 'Go to Google, look up "Cisco stock price" (CSCO), and extract the current price.'

    agent = Agent(
        task=task,
        llm=ChatOpenAI(model="gpt-4o"),
        browser_session=browser_session,
    )

    await agent.run()
    await browser_session.kill()

if __name__ == "__main__":
    asyncio.run(main())

Why this matters: By offloading the browser to a sandbox, your agent runs in an isolated environment. If the agent accidentally clicks a malicious link or tries to download a file, it only affects the disposable container, not your host machine.

2. Standardizing Discovery: The Agentic MCP (Model Context Protocol) Loop

Next, we have the Model Context Protocol (MCP): a standardized way for agents to discover and use tools across different environments, without custom integration code for every new feature. MCP is the "native language" of modern AI agents, and AIO Sandbox exports its services (such as browser_navigate) as tools that an LLM can call directly. For a human developer, the REST API is the interface; for an AI agent, the MCP tool manifest is the interface. By exposing both, the sandbox ensures that:

  • A Python script can use the sandbox via REST.

  • An LLM (like Claude) can use the sandbox via MCP.

AIO Sandbox acts as an MCP host, allowing you to list and execute tools (like browser_navigate or shell_execute) through a unified interface.
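To make the "manifest as interface" idea concrete, here is roughly what one manifest entry looks like. The field names follow MCP's tool schema (`name`, `description`, `inputSchema`); the exact parameters shown for browser_navigate are illustrative, not AIO Sandbox's actual schema:

```python
# One entry from an MCP tool manifest (parameter details are illustrative)
browser_navigate_tool = {
    "name": "browser_navigate",
    "description": "Navigate the sandboxed browser to a URL",
    "inputSchema": {
        "type": "object",
        "properties": {"url": {"type": "string", "description": "Target URL"}},
        "required": ["url"],
    },
}

# An agent reads this the way a developer reads REST docs: it learns the
# tool's name, purpose, and the arguments it must supply.
print(browser_navigate_tool["name"])
```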

To see this MCP usage in practice, let’s look at how we can bridge the gap between an LLM and the sandbox using MCP. The following example demonstrates a dynamic 'handshake' where the agent queries the AIO Sandbox for its available capabilities and automatically converts them into a format the LLM understands.

import asyncio
import json
from openai import AsyncOpenAI
from agent_sandbox import AsyncSandbox

# Initialize the LLM client and the MCP client
# (replace "API-KEY" with your actual OpenAI key)
ai_client = AsyncOpenAI(api_key="API-KEY")
mcp_client = AsyncSandbox(base_url="http://localhost:8080")

async def run_agent(user_prompt):
    # 1. Fetch available tools from MCP
    tools_resp = await mcp_client.mcp.list_mcp_tools(server_name="browser")

    # Format MCP tools into OpenAI's required JSON schema
    openai_tools = [
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.input_schema # MCP uses JSON Schema, which OpenAI likes
            }
        } for tool in tools_resp.data.tools
    ]

    # 2. Ask the model what to do
    response = await ai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_prompt}],
        tools=openai_tools
    )

    message = response.choices[0].message

    # 3. If the model wants to call tools, execute each one in the sandbox
    if message.tool_calls:
        tool_messages = []
        for tool_call in message.tool_calls:
            # Parse the JSON-encoded arguments the model produced
            args = json.loads(tool_call.function.arguments)

            print(f"Agent is executing {tool_call.function.name}...")

            # CALL THE MCP SANDBOX
            result = await mcp_client.mcp.execute_mcp_tool(
                server_name="browser",
                tool_name=tool_call.function.name,
                request=args
            )

            # Collect one "tool" message per call; OpenAI requires a response
            # for every tool_call_id before the model's next turn
            tool_messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result.data.content[0].text)
            })

        # 4. Give the tools' output back to the model in a single follow-up
        final_resp = await ai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": user_prompt},
                message,
                *tool_messages,
            ]
        )
        print(f"Final Answer: {final_resp.choices[0].message.content}")

if __name__ == "__main__":
    asyncio.run(run_agent("Go to google.com and tell me what the search button says and the button next to it."))

Why MCP is the future: Instead of writing custom logic for every new sandbox capability, your agent can "handshake" with the server, see what it's capable of, and start working immediately.
The Shift: This decouples your code from your tools. If you add a "SQL" or "FileSystem" server to the sandbox tomorrow, your agent automatically knows how to use them without a single line of new code.
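That decoupling is visible in code: the manifest-to-schema conversion from the example above can be written once, with no server-specific logic. A sketch, assuming tool objects expose `name`, `description`, and `input_schema` as in the SDK example (the stand-in tool below is hypothetical):

```python
from types import SimpleNamespace

def mcp_tools_to_openai(tools):
    """Convert an MCP tool list into OpenAI's function-calling schema.
    Nothing here is specific to the "browser" server: the same helper
    works for a "shell", "SQL", or any future MCP server."""
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description,
                "parameters": t.input_schema,
            },
        }
        for t in tools
    ]

# Stand-in for a tool object returned by list_mcp_tools (illustrative)
fake_tool = SimpleNamespace(
    name="shell_execute",
    description="Run a shell command in the sandbox",
    input_schema={"type": "object", "properties": {"command": {"type": "string"}}},
)
print(mcp_tools_to_openai([fake_tool])[0]["function"]["name"])
```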

3. Conclusion

Building AI agents is no longer just about the prompt. It’s about the infrastructure that supports them. By combining AIO Sandbox for security, browser-use for high-level navigation, and MCP for standardized tool access, you can build agents that are powerful, safe, and interoperable.
The future of autonomous work isn't just a chatbot—it's a sandboxed agent with a browser in one hand and MCP in the other.
