The previous three articles each dissected one layer of BoxAgnts' three-tier architecture:
- Outer layer: Out-of-the-box user experience — Vue 3 Dashboard, REST API, WebSocket real-time communication
- Middle layer: Agent Toolbox — multi-model dispatch (LlmProvider trait), Agent reasoning loop, Tool trait unified abstraction, Cron scheduling
- Bottom layer: WASM security sandbox — Wasmtime engine, 11-dimensional RunOption, three-layer defense-in-depth
But the true value of architecture lies not in what each layer does individually, but in how they collaborate. This article examines this collaboration through five perspectives.
Three-Layer Panorama
┌──────────────────────────────────────────────────────────────────┐
│ OUTER LAYER │
│ │
│ CLI (clap 6 params) │ Dashboard (Vue 3 × 10 pages) │ REST API │
│ ───────────────────────────────────────────────────────────── │
│ User Interface · Visualization · Request Ingestion · │
│ Streaming Response · Authentication · Site Hosting │
│ │
│ Pre-installed Resources: 7 WASM Tools · 5 Skills · │
│ AGENTS.md · Static File Service │
│ │
└────────────────────────────┬─────────────────────────────────────┘
│ HTTP REST / WebSocket
│ (Axum Router)
┌────────────────────────────┴─────────────────────────────────────┐
│ MIDDLE LAYER │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ boxagnts-api │ │boxagnts-query│ │ boxagnts-tools │ │
│ │ LlmProvider │ │ run_query │ │ Tool trait │ │
│ │ 20+ Provider │ │ _loop() │ │ tools-manager registry│ │
│ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │ │
│ ┌──────┴─────────────────┴──────────────────────┴───────────┐ │
│ │ boxagnts-gateway · Cron · Site │ │
│ │ boxagnts-workspace · SQLite · JSON │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ Core Abstractions: Tool trait · QueryConfig (20 fields) · │
│ QueryOutcome │
│ Core Mechanisms: Token Recovery · Context Compaction · │
│ Fallback · Budget · Manager │
└────────────────────────────┬─────────────────────────────────────┘
│ Tool trait → execute()
┌────────────────────────────┴─────────────────────────────────────┐
│ BOTTOM LAYER │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ WasmTool.execute() → Wasmtime Engine → .wasm Components │ │
│ │ RunOption (11-dim) · WASI Interception · 3-Layer Defense │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Security Dimensions: Permission · Memory · Stack · Time · │
│ Fuel · File · Network │
│ Component Interface: JSON stdin → WASM Execution → JSON stdout │
│ Extension System: Skill (SKILL.md) · Service (.wasm) │
└──────────────────────────────────────────────────────────────────┘
Three Key Interfaces: The "Joints" of the Architecture
The quality of a layered architecture depends largely on its inter-layer interfaces. BoxAgnts defines three clear interfaces that connect the layers like joints in the human body:
Interface 1: Outer Layer ↔ Middle Layer — HTTP + WebSocket
This is the boundary between user interaction and business logic:
Dashboard (Vue 3)
│
├── REST API (Axum) ──→ gateway/api/chat.rs, config.rs, cron.rs...
│ Unified response format { success, data, error }
│
└── WebSocket ──→ server/dashboard/ws.rs
Bidirectional real-time communication
command_type: "chat_execute" | "chat_execute_cancel"
Key design decision: Chat command sending and response receiving use two different channels:
- Command sending: WebSocket (low latency, bidirectional)
- Streaming responses: WebSocket push of QueryEvent (real-time, unidirectional)
- History loading: REST API (idempotent, cacheable)
This separation avoids REST's long-polling problem and also avoids making WebSocket carry all data.
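The two message shapes on this boundary can be sketched in a few lines of Rust. This is a minimal illustration, not the project's code: the `command_type` strings and the `{ success, data, error }` envelope come from the text above, while the `ChatCommand` enum, the `ApiResponse` struct and their field types are assumptions.

```rust
/// Commands a client may send over the WebSocket channel.
/// The two wire strings come from the article; the enum itself is hypothetical.
#[derive(Debug, PartialEq)]
pub enum ChatCommand {
    Execute,
    Cancel,
}

impl ChatCommand {
    /// Maps the wire-level `command_type` string to a typed command.
    pub fn parse(command_type: &str) -> Option<ChatCommand> {
        match command_type {
            "chat_execute" => Some(ChatCommand::Execute),
            "chat_execute_cancel" => Some(ChatCommand::Cancel),
            _ => None,
        }
    }
}

/// Unified REST envelope `{ success, data, error }` (field types are assumptions).
#[derive(Debug, PartialEq)]
pub struct ApiResponse<T> {
    pub success: bool,
    pub data: Option<T>,
    pub error: Option<String>,
}

impl<T> ApiResponse<T> {
    /// Successful response carrying a payload.
    pub fn ok(data: T) -> Self {
        ApiResponse { success: true, data: Some(data), error: None }
    }

    /// Failed response carrying only an error message.
    pub fn fail(message: &str) -> Self {
        ApiResponse { success: false, data: None, error: Some(message.to_string()) }
    }
}
```

Parsing the command string into an enum at the edge means unknown commands are rejected before they reach the middle layer, and a single envelope type keeps every REST handler's success and failure paths uniform.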
Interface 2: Middle Layer ↔ Bottom Layer — Tool Trait
This is the boundary between business logic and secure execution. It is also the most critical interface:
pub trait Tool: Send + Sync {
    fn name(&self) -> &'static str;          // Visible to AI
    fn description(&self) -> &'static str;   // Visible to AI
    fn input_schema(&self) -> Value;         // Visible to AI
    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult;
}
The elegance of this interface lies in its asymmetric information design:
| Interface Method | Used By | Purpose |
|---|---|---|
| name() | AI Model | List tool names in system prompt |
| description() | AI Model | Help AI decide which tool to use |
| input_schema() | AI Model | Let AI construct correct JSON parameters |
| execute() | System | Actually execute the tool |
The AI model only needs to see the first three; it does not need to know where the tool executes (native Rust or WASM). This is the principle of information hiding applied at the interface level.
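To make the asymmetry concrete, here is a minimal sketch of how the model-facing tool listing could be built from the three metadata methods alone. It assumes a simplified, synchronous version of the trait that uses plain strings instead of `Value` and `ToolResult`; `EchoTool` and `describe_tools` are hypothetical names introduced for illustration.

```rust
/// Simplified, synchronous stand-in for the article's Tool trait.
/// Assumption: input and output are plain strings instead of `Value` / `ToolResult`.
pub trait Tool {
    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;
    fn input_schema(&self) -> String;
    fn execute(&self, input: &str) -> Result<String, String>;
}

/// Hypothetical native tool used only for illustration.
pub struct EchoTool;

impl Tool for EchoTool {
    fn name(&self) -> &'static str {
        "echo"
    }
    fn description(&self) -> &'static str {
        "Returns its input unchanged"
    }
    fn input_schema(&self) -> String {
        r#"{"type":"object","properties":{"text":{"type":"string"}}}"#.to_string()
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

/// Builds the model-facing tool listing from the three metadata methods only.
/// `execute()` is never called here, so the model never learns how a tool runs.
pub fn describe_tools(tools: &[Box<dyn Tool>]) -> String {
    tools
        .iter()
        .map(|t| format!("- {}: {} (schema: {})", t.name(), t.description(), t.input_schema()))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because the listing is derived only from metadata, swapping a native implementation for a sandboxed one changes nothing in the prompt the model sees.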
Interface 3: Bottom Layer ↔ System Resources — WASI
This is the boundary between the security sandbox and the host operating system:
WASM Component
│
├── fopen("/src/main.rs") ──→ WASI preopen ──→ Host /workspace/src/main.rs
│ Path mapping auto-conversion, unmapped paths fail directly
│
├── socket connect ──→ socket_addr_check() ──→ Whitelist + Blacklist ──→ Allow/Deny
│
├── memory.grow ──→ wasm_max_memory_size check ──→ Allow/Return -1
│
└── Instruction execution ──→ wasm_fuel metering ──→ Trap on fuel exhaustion
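WASI is a system interface specification developed by the WASI Subgroup of the W3C WebAssembly Community Group and implemented by several mainstream runtimes, including Wasmtime, WasmEdge, and Wasmer. Choosing WASI over a custom interface gives BoxAgnts' sandbox components a degree of cross-runtime portability, limited by which WASI version and proposals each runtime supports.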
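The path-mapping rule in the diagram above (mapped paths are translated, unmapped paths fail) can be modelled in plain Rust. This is a sketch of the behaviour only, not of how Wasmtime implements preopens internally; `Preopen` and `resolve_guest_path` are hypothetical names, and rejecting every `..` component is a deliberately conservative assumption.

```rust
use std::path::{Component, Path, PathBuf};

/// One preopened directory: a guest-visible prefix backed by a host directory.
pub struct Preopen {
    pub guest: PathBuf,
    pub host: PathBuf,
}

/// Resolves a guest path to a host path, or returns None when no preopen covers it.
/// Paths containing `..` are rejected outright so they cannot escape the mapping.
pub fn resolve_guest_path(preopens: &[Preopen], guest_path: &str) -> Option<PathBuf> {
    let path = Path::new(guest_path);
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return None;
    }
    for p in preopens {
        // strip_prefix compares whole components, so "/data" does not match "/dataset".
        if let Ok(rest) = path.strip_prefix(&p.guest) {
            return Some(p.host.join(rest));
        }
    }
    None
}
```

With a single preopen mapping `/` to `/workspace`, the guest path `/src/main.rs` resolves to `/workspace/src/main.rs`, matching the first branch of the diagram, while anything outside a preopen simply has no host path to resolve to.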
Complete Request Trace: From User Input to Sandbox Execution
Let's trace a complete user request through all three layers:
⌨️ User types in ChatPage:
"Help me create a REST API endpoint in the project with full CRUD for users"
══════════════════════ Step 1: Outer Layer Receives ═══════════════════════
→ ws.rs receives WebSocket message { command_type: "chat_execute", prompt: "...", session_id: "..." }
→ Generates instance_id, records in ws_instances HashMap
→ Spawns async task execute_command()
→ Builds QueryConfig: model=current selection, max_turns=default, thinking_budget=default
══════════════════════ Step 2: Middle Layer Orchestrates ═══════════════════════
→ boxagnts-query::run_query_loop() starts
Turn 1:
→ Build system prompt (with AGENTS.md + tool list + skill list)
→ boxagnts-api selects Provider, formats messages
→ AI returns tool_use: "read" (read project structure)
→ boxagnts-tools matches WasmTool("read")
→ Enters Step 3...
══════════════════════ Step 3: Bottom Layer Executes ═══════════════════════
→ WasmTool.execute()
→ RunOption build:
work_dir = /workspace/root
wasm_timeout = default
wasm_fuel = default
→ Wasmtime loads file-read-component.wasm (or uses .cwasm cache)
→ stdin input: {"file_path":"/src/main.rs","limit":100}
→ WASI preopen maps: / → /workspace/root
→ Executes in sandbox, stdout returns: {"content":"use axum::...","is_error":false}
→ Returns ToolResult { content: "...", is_error: false }
══════════════════════ Step 2 Continues: Middle Layer Orchestrates ═══════════════════════
→ Tool result injected into conversation history
Turn 2:
→ AI: "I can see the project uses axum, let me look at the existing route structure"
→ tool_use: "glob" (match **/*.rs)
→ Enters Step 3 again...
Turn 3:
→ AI understands project structure
→ AI returns tool_use: "write" (write src/api/users.rs)
→ Sandbox file write (path within work_dir scope)
→ Write successful
Turn 4:
→ AI returns tool_use: "edit" (register new route in src/main.rs)
→ Precise string replacement, completed in sandbox
Turn 5:
→ AI: "A complete user CRUD API has been created"
→ stop_reason: "end_turn"
→ Returns QueryOutcome::EndTurn
══════════════════════ Step 4: Outer Layer Presents ═══════════════════════
→ WebSocket pushes final completion message
→ ChatPage renders Markdown result
→ useChatSession updates message list
→ useChatScroll auto-scrolls to bottom
══════════════════════ Result ═══════════════════════
Total time: ~8-15 seconds
AI calls: 5 turns
Tool executions: 4 (read × 1, glob × 1, write × 1, edit × 1)
Sandbox executions: 4 (all in WASM sandbox)
User perception: Ask one question, watch AI analyze and complete coding step by step
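The turn structure in this trace reduces to a small loop: ask the model, run the requested tool, append the result to the history, and repeat until the model ends the turn. The sketch below is a simplified, synchronous model of that loop under stated assumptions: `ModelReply`, the closure-based model and tool runner, and the `MaxTurns` variant are hypothetical; only `run_query_loop` and `QueryOutcome::EndTurn` are names taken from the article.

```rust
/// One model reply: either a tool request or a final answer.
pub enum ModelReply {
    ToolUse { tool: String, input: String },
    EndTurn(String),
}

/// Loop result. `EndTurn` is named in the article; `MaxTurns` is an assumed variant.
#[derive(Debug, PartialEq)]
pub enum QueryOutcome {
    EndTurn(String),
    MaxTurns,
}

/// Drives the reasoning loop: ask the model, run the requested tool,
/// append the result to the history, repeat until the model ends the turn
/// or the turn budget is exhausted.
pub fn run_query_loop(
    mut model: impl FnMut(&[String]) -> ModelReply,
    mut run_tool: impl FnMut(&str, &str) -> String,
    prompt: &str,
    max_turns: usize,
) -> (QueryOutcome, Vec<String>) {
    let mut history = vec![format!("user: {}", prompt)];
    for _ in 0..max_turns {
        match model(&history) {
            ModelReply::ToolUse { tool, input } => {
                // The tool runner is the only place where execution happens;
                // in BoxAgnts this is where the sandbox would be entered.
                let result = run_tool(&tool, &input);
                history.push(format!("tool_use: {}", tool));
                history.push(format!("tool_result: {}", result));
            }
            ModelReply::EndTurn(text) => {
                history.push(format!("assistant: {}", text));
                return (QueryOutcome::EndTurn(text), history);
            }
        }
    }
    (QueryOutcome::MaxTurns, history)
}
```

Replaying the trace above with a scripted model that requests read, glob, write and edit before ending the turn yields four tool executions and five model calls, the same counts as in the result summary.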
Design Patterns and Architectural Trade-offs
Pattern 1: Separation of Concerns
| Concern | Layer | Independent Evolution |
|---|---|---|
| User experience, visualization, interaction | Outer | Frontend can switch to React / Tauri / Flutter |
| AI orchestration, tool dispatch, business logic | Middle | Models replaceable, tools addable/removable |
| Security isolation, system operations, resource control | Bottom | Engine upgradeable (Wasmtime → WasmEdge) |
Pattern 2: Interface-Oriented Programming
The Tool trait is the cornerstone of the entire edifice. It gives Rust native tools and WASM sandbox tools exactly the same interface shape. The benefits of this design:
- AI models don't need to know how tools are implemented
- Middle layer doesn't need to care whether bottom is Rust or WASM
- Bottom layer can be optimized independently without affecting the middle
Pattern 3: Embedded Security
BoxAgnts' security mechanisms are not "bolt-on" — they are deeply embedded in the architecture:
- Outer layer authentication checks (local vs remote)
- Bottom layer three-tier defense (resource + WASI + network)
- Interface layer parameter validation (JSON Schema defined in input_schema)
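As one illustration of the network part of the bottom-layer defense, a whitelist-plus-blacklist check can be sketched as follows. The article names `socket_addr_check()` but does not show it, so the `NetPolicy` struct, the exact-address matching and the precedence rule (deny wins, empty allowlist denies everything) are all assumptions.

```rust
use std::net::IpAddr;

/// Hypothetical network policy: a denylist that always wins, then an allowlist.
pub struct NetPolicy {
    pub allow: Vec<IpAddr>,
    pub deny: Vec<IpAddr>,
}

impl NetPolicy {
    /// Returns true only when the address is explicitly allowed and not denied.
    /// An empty allowlist therefore denies everything (default-deny).
    pub fn socket_addr_check(&self, addr: IpAddr) -> bool {
        if self.deny.contains(&addr) {
            return false;
        }
        self.allow.contains(&addr)
    }
}
```

Default-deny is the safer choice for a sandbox: a tool that was never granted network access cannot reach anything, and a denylist entry overrides an accidental allowlist entry.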
Trade-off: The Cost of Choosing WASM
| Advantage | Cost | BoxAgnts' Handling |
|---|---|---|
| Instruction-level security isolation | WASM component dev/debug is complex | Core ops use WASM, meta ops use Rust native |
| Cross-platform single compile | Some syscalls unavailable | Standardized WASI interfaces, no OS dependency |
| Hot-loading new tools | .wasm files need pre-compiled distribution | Cranelift compile cache (.cwasm), millisecond load |
This pragmatic trade-off reflects mature engineering judgment — sacrificing neither flexibility for "purity" nor security for "convenience."
Core Architectural Insights
Insight 1: Good Architecture Makes "Simple Things Simple, Complex Things Possible"
Simple scenario: Just want to chat?
→ boxagnts → browser open → type question → chat
→ Three layers invisible internally, zero config
Medium scenario: Need custom tools?
→ Implement Tool trait → register in all_tools()
→ Only need to understand one trait, clear middle-layer interface
Complex scenario: Need enterprise-grade security?
→ RunOption 11-dimensional config → three-layer defense-in-depth
→ Bottom layer is full-featured, defaults cover 90% scenarios
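The "defaults cover most scenarios" idea maps naturally onto Rust's `Default` trait and struct update syntax. The sketch below uses four of the RunOption dimensions named in this article (`work_dir`, `wasm_timeout`, `wasm_fuel`, `wasm_max_memory_size`); the field types, the default values and the `strict_profile` helper are assumptions made for illustration, not the project's actual settings.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Four of the eleven RunOption dimensions named in the article.
/// Field types and default values are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct RunOption {
    pub work_dir: PathBuf,
    pub wasm_timeout: Duration,
    pub wasm_fuel: u64,
    pub wasm_max_memory_size: usize,
}

impl Default for RunOption {
    fn default() -> Self {
        RunOption {
            work_dir: PathBuf::from("/workspace/root"),
            wasm_timeout: Duration::from_secs(30),
            wasm_fuel: 1_000_000_000,
            wasm_max_memory_size: 256 * 1024 * 1024,
        }
    }
}

/// Tightened profile: override two dimensions, inherit the rest from the defaults.
pub fn strict_profile() -> RunOption {
    RunOption {
        wasm_timeout: Duration::from_secs(5),
        wasm_max_memory_size: 64 * 1024 * 1024,
        ..RunOption::default()
    }
}
```

A user in the simple scenario never touches this struct, while a security-sensitive deployment overrides only the dimensions it cares about.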
Insight 2: Interfaces Are Architecture's Gateways
The three key interfaces (HTTP/WS, Tool trait, WASI) are three gateways. As long as the contracts of these interfaces are maintained, each layer can be freely refactored internally:
- Frontend from Vue 3 to React — doesn't affect backend (HTTP API unchanged)
- AI model from Anthropic to OpenAI — doesn't affect tools (Tool trait unchanged)
- Sandbox engine from Wasmtime to WasmEdge — doesn't affect business (WASI standard unchanged)
Insight 3: Security Is the Currency of Trust
In an era where AI can manipulate files, execute commands, and access networks:
- No security = No trust
- No trust = No users willing to rely on it deeply
- Placing security at the bottom of the architecture means it cannot be bypassed or overlooked
Insight 4: Unified Abstraction Reduces Complexity
BoxAgnts' architecture has a recurring pattern: using one core abstraction to unify diverse implementations:
Multi-model → LlmProvider trait → One interface, 20+ implementations
Multi-tool → Tool trait → One interface, Rust + WASM implementations
Multi-storage → workspace → SQLite (state) + JSON (config)
Multi-interaction → axios (REST) + WebSocket → Each with its own role
This "unified abstraction" design reduces cognitive load — developers only need to understand a few core concepts to master the entire system.
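One payoff of a single provider abstraction is that mechanisms such as fallback can be written once against the trait. The following sketch assumes a simplified, synchronous `LlmProvider` with a single `complete` method; the real trait's method set is not shown in this article, and `StubProvider` and `complete_with_fallback` are hypothetical.

```rust
/// Simplified stand-in for the article's LlmProvider trait (the method set is an assumption).
pub trait LlmProvider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Hypothetical provider that either always answers or always fails, for demonstration.
pub struct StubProvider {
    pub label: &'static str,
    pub healthy: bool,
}

impl LlmProvider for StubProvider {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.healthy {
            Ok(format!("[{}] reply to: {}", self.label, prompt))
        } else {
            Err(format!("{} unavailable", self.label))
        }
    }
}

/// Tries providers in order and returns the first successful reply,
/// or every collected error when all of them fail.
pub fn complete_with_fallback(
    providers: &[Box<dyn LlmProvider>],
    prompt: &str,
) -> Result<String, Vec<String>> {
    let mut errors = Vec::new();
    for p in providers {
        match p.complete(prompt) {
            Ok(reply) => return Ok(reply),
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```

The fallback function knows nothing about any specific vendor, so adding another provider means adding one implementation rather than touching the dispatch logic.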
Comparison with Other AI Agent Systems
| Dimension | LangChain/LangGraph | AutoGPT | BoxAgnts |
|---|---|---|---|
| Install Experience | pip install + config | Docker + config | Single file download and run |
| Model Compatibility | Via LangChain wrappers | Limited | 20+ Providers native support |
| Tool Execution | Direct host execution | Direct host execution | WASM sandbox isolation |
| Security Model | Relies on OS permissions | Relies on OS permissions | Three-layer defense (resource/file/network) |
| UI Interface | Optional (LangServe) | Web interface | Built-in full-featured Dashboard |
| Scheduled Tasks | Third-party integration | No native support | Built-in Cron engine |
| Extensibility | Python ecosystem | Python ecosystem | Skill system + WASM components |
| Deployment Mode | Python service | Docker container | Single Rust binary |
BoxAgnts' unique positioning: not a development framework for building AI Agents (that's LangChain's territory), but a ready-to-use AI Agent runtime platform.
Lessons for AI Tool Developers
If you're developing your own AI Agent system, BoxAgnts' three-tier design offers these lessons:
1. Define Interfaces First, Then Implement Features
The Tool trait is BoxAgnts' core skeleton. All extensions revolve around this trait — new tools just need to implement it, and new functionality is automatically perceived by the Agent loop. Time spent designing a good trait pays off in long-term extensibility.
2. Security from Day One
Don't wait for problems before adding a sandbox — by then the architecture is already set, and security can only exist as a "patch." BoxAgnts placed security at the bottom from the start, meaning any new functionality automatically gets sandbox protection.
3. Unified Abstraction Reduces Complexity
Don't design different interfaces for different types of functionality — one trait rules them all. Vec<Box<dyn Tool>> simultaneously holds simple tools (SleepTool) and complex tools (WasmTool), with no difference to the caller.
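A heterogeneous registry of this kind might look like the sketch below. It is an illustration under assumptions: the trait is a synchronous, string-based simplification, `WasmTool` is a stub that only marks where sandboxed execution would happen, and `dispatch` is a hypothetical helper; `SleepTool`, `WasmTool` and `all_tools()` are names taken from the article, but their bodies here are invented.

```rust
/// Minimal synchronous stand-in for the Tool trait (an assumption; the real one is async).
pub trait Tool {
    fn name(&self) -> &'static str;
    fn execute(&self, input: &str) -> Result<String, String>;
}

/// Native tool: runs directly in the host process.
pub struct SleepTool;

impl Tool for SleepTool {
    fn name(&self) -> &'static str {
        "sleep"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        let ms: u64 = input.trim().parse().map_err(|_| "expected milliseconds".to_string())?;
        std::thread::sleep(std::time::Duration::from_millis(ms));
        Ok(format!("slept {} ms", ms))
    }
}

/// Stub for a sandboxed tool: a real one would hand `input` to the WASM engine.
pub struct WasmTool {
    pub tool_name: &'static str,
}

impl Tool for WasmTool {
    fn name(&self) -> &'static str {
        self.tool_name
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(format!("(sandboxed {}) {}", self.tool_name, input))
    }
}

/// One list holds both kinds of tool behind the same trait object type.
pub fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![Box::new(SleepTool), Box::new(WasmTool { tool_name: "read" })]
}

/// Looks a tool up by name and runs it; the caller never branches on the tool kind.
pub fn dispatch(tools: &[Box<dyn Tool>], name: &str, input: &str) -> Result<String, String> {
    tools
        .iter()
        .find(|t| t.name() == name)
        .ok_or_else(|| format!("unknown tool: {}", name))?
        .execute(input)
}
```

The `dispatch` function contains no reference to either concrete type, which is what "no difference to the caller" means in practice.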
4. Minimize User Choices
6 CLI parameters > 60 lines of YAML config. Every parameter has a reasonable default. Users should spend time "using tools to solve problems," not "configuring the tools themselves."
5. Each Layer Focuses Only on Its Own Problem
- Outer layer doesn't care if bottom uses Wasmtime or Docker
- Middle layer doesn't care if outer uses Vue or React
- Bottom layer doesn't care which AI model the middle uses
- Each layer only needs to understand the interfaces with adjacent layers
6. Pragmatic Technical Trade-offs
Not all operations need to go through the WASM sandbox — AskUserQuestion is pure UI interaction, PlanMode is pure state management. BoxAgnts lets Rust native tools and WASM tools coexist in the same all_tools() list — this is a pragmatic choice, not dogmatic insistence.
The Value Formula of Three-Layer Synergy
Single Layer Value:
Outer = Good user experience
Middle = Powerful capabilities
Bottom = Reliable security
Two-Layer Combined Value:
Outer + Middle = Easy-to-use, powerful Agent platform
Middle + Bottom = Secure, flexible Agent engine
Bottom + Outer = Secure, intuitive user interface
Three-Layer Synergy Value:
Outer × Middle × Bottom = User Trust → Deep Usage → Continuous Feedback → System Iteration
This is the true meaning of 1+1+1 > 3
Conclusion
BoxAgnts' three-tier architecture is not a complex concept — it only does three things:
- Outer layer: Lets users easily interact with AI — 6 params, 10 pages, 1 Dashboard
- Middle layer: Lets AI flexibly dispatch tools and knowledge — 1 trait, 20+ models, 6 modules
- Bottom layer: Lets all operations run in a secure environment — 1 engine, 11-dim control, 3-layer defense
But it is the organic combination of these three that shapes a complete, trustworthy, extensible AI Agent platform.
Each of the three layers performs its own role while tightly collaborating through precisely defined interfaces — HTTP/WebSocket connects outer and middle, the Tool trait connects middle and bottom, WASI connects bottom and system resources. The three interfaces are three gateways — holding them means holding the architecture's stability and evolvability.
As the project's name suggests — Box (toolbox), Agent (intelligent agent), Sandbox (sandbox) — three words, three layers, one complete AI Agent runtime platform.
Related Resources
- Boxagnts: https://github.com/guyoung/boxagnts