Guyoung Studio

Posted on May 28

BoxAgnts Introduction (4) — Core Architecture

#ai #agents

The previous three articles each dissected one layer of BoxAgnts' three-tier architecture:

Outer layer: Out-of-the-box user experience — Vue 3 Dashboard, REST API, WebSocket real-time communication
Middle layer: Agent Toolbox — multi-model dispatch (LlmProvider trait), Agent reasoning loop, Tool trait unified abstraction, Cron scheduling
Bottom layer: WASM security sandbox — Wasmtime engine, 11-dimensional RunOption, three-layer defense-in-depth

But the real value of an architecture lies not in what each layer does on its own but in how the layers collaborate. This article examines that collaboration from five perspectives.

Three-Layer Panorama

┌──────────────────────────────────────────────────────────────────┐
│                        OUTER LAYER                               │
│                                                                  │
│  CLI (clap 6 params) │ Dashboard (Vue 3 × 10 pages) │ REST API   │
│  ──────────────────────────────────────────────────────────────  │
│  User Interface · Visualization · Request Ingestion ·            │
│  Streaming Response · Authentication · Site Hosting              │
│                                                                  │
│  Pre-installed Resources: 7 WASM Tools · 5 Skills ·              │
│  AGENTS.md · Static File Service                                 │
│                                                                  │
└────────────────────────────┬─────────────────────────────────────┘
                             │ HTTP REST / WebSocket
                             │ (Axum Router)
┌────────────────────────────┴─────────────────────────────────────┐
│                        MIDDLE LAYER                              │
│                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │ boxagnts-api │  │boxagnts-query│  │ boxagnts-tools         │  │
│  │ LlmProvider  │  │ run_query    │  │ Tool trait             │  │
│  │ 20+ Provider │  │ _loop()      │  │ tools-manager registry │  │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬────────────┘  │
│         │                 │                      │               │
│  ┌──────┴─────────────────┴──────────────────────┴────────────┐  │
│  │          boxagnts-gateway · Cron · Site                    │  │
│  │          boxagnts-workspace · SQLite · JSON                │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  Core Abstractions: Tool trait · QueryConfig (20 fields) ·       │
│  QueryOutcome                                                    │
│  Core Mechanisms: Token Recovery · Context Compaction ·          │
│  Fallback · Budget · Manager                                     │
└────────────────────────────┬─────────────────────────────────────┘
                             │ Tool trait → execute()
┌────────────────────────────┴─────────────────────────────────────┐
│                        BOTTOM LAYER                              │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  WasmTool.execute() → Wasmtime Engine → .wasm Components   │  │
│  │  RunOption (11-dim) · WASI Interception · 3-Layer Defense  │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  Security Dimensions: Permission · Memory · Stack · Time ·       │
│  Fuel · File · Network                                           │
│  Component Interface: JSON stdin → WASM Execution → JSON stdout  │
│  Extension System: Skill (SKILL.md) · Service (.wasm)            │
└──────────────────────────────────────────────────────────────────┘

Three Key Interfaces: The "Joints" of the Architecture

The quality of a layered architecture depends largely on its inter-layer interfaces. BoxAgnts defines three clear interfaces that connect the layers the way joints connect the parts of the human body:

Interface 1: Outer Layer ↔ Middle Layer — HTTP + WebSocket

This is the boundary between user interaction and business logic:

Dashboard (Vue 3)
    │
    ├── REST API (Axum) ──→ gateway/api/chat.rs, config.rs, cron.rs...
    │   Unified response format { success, data, error }
    │
    └── WebSocket ──→ server/dashboard/ws.rs
        Bidirectional real-time communication
        command_type: "chat_execute" | "chat_execute_cancel"
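The unified `{ success, data, error }` format above can be captured by one generic type that every REST handler returns. A minimal sketch, assuming a hypothetical `ApiResponse<T>` and a hypothetical `get_session_title` handler (neither is taken from the BoxAgnts source) and leaving JSON serialization out:

```rust
/// Hypothetical envelope mirroring the `{ success, data, error }` response shape.
#[derive(Debug, PartialEq)]
pub struct ApiResponse<T> {
    pub success: bool,
    pub data: Option<T>,
    pub error: Option<String>,
}

impl<T> ApiResponse<T> {
    /// Success: `data` is filled in and `error` is empty.
    pub fn ok(data: T) -> Self {
        ApiResponse { success: true, data: Some(data), error: None }
    }

    /// Failure: `error` carries a message and `data` is empty.
    pub fn err(message: impl Into<String>) -> Self {
        ApiResponse { success: false, data: None, error: Some(message.into()) }
    }
}

/// Hypothetical handler: every endpoint returns the same envelope type.
pub fn get_session_title(id: u32) -> ApiResponse<String> {
    match id {
        1 => ApiResponse::ok("First session".to_string()),
        _ => ApiResponse::err(format!("session {id} not found")),
    }
}
```

Because failures travel in the same shape as successes, a client such as the Dashboard can branch on `success` alone rather than handling each endpoint's errors differently.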

Key design decision: chat traffic is split across two channels according to what each kind of message needs:

Command sending: WebSocket (low latency, bidirectional)
Streaming responses: QueryEvent messages pushed over the same WebSocket (real-time, server to client)
History loading: REST API (idempotent, cacheable)

This split keeps streaming output off REST, which would otherwise require long polling, while sparing the WebSocket from carrying bulk, cacheable reads such as conversation history.
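On the WebSocket side, `chat_execute` starts a run and `chat_execute_cancel` stops it, with running instances tracked in the `ws_instances` map mentioned in the request trace below. The following sketch assumes that map holds a shared cancel flag per instance; the payload fields, the `WsInstances` type, and the instance_id format are hypothetical:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// The two command types named above; payload fields are assumptions.
pub enum WsCommand {
    ChatExecute { session_id: String, prompt: String },
    ChatExecuteCancel { instance_id: String },
}

/// Hypothetical stand-in for `ws_instances`: instance_id -> cancel flag.
#[derive(Default)]
pub struct WsInstances {
    next_id: u64,
    running: HashMap<String, Arc<AtomicBool>>,
}

impl WsInstances {
    /// Returns the new instance_id for chat_execute, or the cancelled id.
    pub fn handle(&mut self, cmd: WsCommand) -> Result<String, String> {
        match cmd {
            WsCommand::ChatExecute { session_id, prompt } => {
                self.next_id += 1;
                let instance_id = format!("{session_id}-{}", self.next_id);
                // A real server would spawn the query task here with `prompt`
                // and a clone of the cancel flag; this sketch only records the flag.
                let _ = prompt;
                self.running.insert(instance_id.clone(), Arc::new(AtomicBool::new(false)));
                Ok(instance_id)
            }
            WsCommand::ChatExecuteCancel { instance_id } => match self.running.remove(&instance_id) {
                Some(flag) => {
                    // The running task observes this on its next check and stops streaming.
                    flag.store(true, Ordering::SeqCst);
                    Ok(instance_id)
                }
                None => Err(format!("unknown instance {instance_id}")),
            },
        }
    }

    /// The cancel flag a spawned task would poll between streamed events.
    pub fn flag(&self, instance_id: &str) -> Option<Arc<AtomicBool>> {
        self.running.get(instance_id).cloned()
    }
}
```

A spawned query task would hold a clone of the flag and check it between streamed `QueryEvent`s, so cancellation needs no extra channel.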

Interface 2: Middle Layer ↔ Bottom Layer — Tool Trait

This is the boundary between business logic and secure execution. It is also the most critical interface:

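use async_trait::async_trait; // one common way to make async methods usable on `dyn Tool`
use serde_json::Value;

// ToolContext and ToolResult are BoxAgnts types defined elsewhere in the codebase.
#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &'static str;                          // Visible to AI
    fn description(&self) -> &'static str;                   // Visible to AI
    fn input_schema(&self) -> Value;                         // Visible to AI: JSON Schema of the input
    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult; // Called by the system
}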

The elegance of this interface lies in its asymmetric information design:

Interface Method	Used By	Purpose
`name()`	AI Model	List tool names in system prompt
`description()`	AI Model	Help AI decide which tool to use
`input_schema()`	AI Model	Let AI construct correct JSON parameters
`execute()`	System	Actually execute the tool

The AI model only needs to see the first three methods; it never needs to know where the tool executes (native Rust or the WASM sandbox). This is the principle of information hiding in practice.
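To make the asymmetry concrete, here is a deliberately simplified, synchronous version of the trait: the schema is plain JSON text and `execute` takes a string, so no crates are needed. `SleepTool`'s behavior, the `WasmTool` stub, and the `all_tools`, `tool_listing`, and `dispatch` helpers are illustrative assumptions; a real `WasmTool` would hand the input to a Wasmtime instance.

```rust
/// Simplified, synchronous stand-in for the Tool trait: schema as JSON text,
/// string input instead of serde_json::Value, and no ToolContext.
pub trait Tool: Send + Sync {
    fn name(&self) -> &'static str; // shown to the model
    fn description(&self) -> &'static str; // shown to the model
    fn input_schema(&self) -> &'static str; // shown to the model
    fn execute(&self, input: &str) -> Result<String, String>; // called by the system
}

/// Native tool running in the host process (this sketch only simulates sleeping).
struct SleepTool;

impl Tool for SleepTool {
    fn name(&self) -> &'static str { "sleep" }
    fn description(&self) -> &'static str { "Pause for a number of milliseconds" }
    fn input_schema(&self) -> &'static str { r#"{"type":"object","properties":{"ms":{"type":"integer"}}}"# }
    fn execute(&self, input: &str) -> Result<String, String> {
        let ms: u64 = input.trim().parse().map_err(|e| format!("bad input: {e}"))?;
        Ok(format!("slept {ms} ms (simulated)"))
    }
}

/// Hypothetical sandboxed tool stub; a real one would run a .wasm component.
struct WasmTool {
    name: &'static str,
    description: &'static str,
}

impl Tool for WasmTool {
    fn name(&self) -> &'static str { self.name }
    fn description(&self) -> &'static str { self.description }
    fn input_schema(&self) -> &'static str { r#"{"type":"object"}"# }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(format!("[sandboxed {}] {input}", self.name))
    }
}

/// Registry in the spirit of all_tools(): native and WASM tools share one Vec.
pub fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(SleepTool),
        Box::new(WasmTool { name: "read", description: "Read a file from the workspace" }),
    ]
}

/// Model-facing view: only name, description, and schema are exposed.
pub fn tool_listing(tools: &[Box<dyn Tool>]) -> String {
    tools
        .iter()
        .map(|t| format!("- {}: {} (input: {})", t.name(), t.description(), t.input_schema()))
        .collect::<Vec<_>>()
        .join("\n")
}

/// System-side dispatch by name; the caller never learns how the tool runs.
pub fn dispatch(tools: &[Box<dyn Tool>], name: &str, input: &str) -> Result<String, String> {
    let tool = tools.iter().find(|t| t.name() == name).ok_or_else(|| format!("unknown tool {name}"))?;
    tool.execute(input)
}
```

Only the `tool_listing()` text would reach the model's system prompt; `dispatch()` is the system side and behaves identically whether the matched tool is native or sandboxed.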

Interface 3: Bottom Layer ↔ System Resources — WASI

This is the boundary between the security sandbox and the host operating system:

WASM Component
    │
    ├── fopen("/src/main.rs") ──→ WASI preopen ──→ Host /workspace/src/main.rs
    │   Path mapping auto-conversion, unmapped paths fail directly
    │
    ├── socket connect ──→ socket_addr_check() ──→ Whitelist + Blacklist ──→ Allow/Deny
    │
    ├── memory.grow ──→ wasm_max_memory_size check ──→ Allow/Return -1
    │
    └── Instruction execution ──→ wasm_fuel metering ──→ Trap on fuel exhaustion
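The first branch of the tree, preopen path mapping, can be approximated lexically. This sketch assumes a single preopen that maps guest `/` to a host work directory (as in the request trace below, `/` → `/workspace/root`); `map_guest_path` is a hypothetical helper, and unlike a real WASI implementation it does not resolve symlinks:

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Maps an absolute guest path onto the preopened host directory, refusing
/// relative paths and any `..` that would climb above the sandbox root.
/// Lexical only: symlinks are not resolved here.
pub fn map_guest_path(host_root: &Path, guest: &str) -> Result<PathBuf, String> {
    let guest = Path::new(guest);
    if !guest.has_root() {
        return Err(format!("{}: only absolute guest paths are mapped", guest.display()));
    }
    let mut parts: Vec<&OsStr> = Vec::new();
    for comp in guest.components() {
        match comp {
            Component::RootDir | Component::CurDir => {}
            Component::Normal(name) => parts.push(name),
            Component::ParentDir => {
                // Popping past the root would escape the preopened directory.
                if parts.pop().is_none() {
                    return Err(format!("{}: escapes the sandbox root", guest.display()));
                }
            }
            Component::Prefix(_) => return Err("platform path prefixes are not allowed".to_string()),
        }
    }
    let mut host = host_root.to_path_buf();
    host.extend(parts);
    Ok(host)
}
```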

WASI is a system interface standard developed in the W3C WebAssembly Community Group and implemented by major runtimes such as Wasmtime and WasmEdge. Choosing WASI over a custom host interface gives BoxAgnts' sandbox components a path to cross-runtime portability, although support for the newer component-model APIs still differs between runtimes.
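The `socket_addr_check()` branch of the tree above reduces to a policy decision per outgoing address. Wasmtime's WASI context builder accepts a callback for this purpose, but its exact signature has changed across versions, so the policy is shown here as plain Rust. The `NetPolicy` type and the rule that the blacklist overrides the whitelist (with everything else denied by default) are assumptions:

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical whitelist/blacklist policy consulted for each outgoing connection.
pub struct NetPolicy {
    pub allow: Vec<IpAddr>, // whitelist
    pub deny: Vec<IpAddr>,  // blacklist
}

impl NetPolicy {
    /// Blacklist wins; otherwise the address must be whitelisted (default deny).
    pub fn check(&self, addr: SocketAddr) -> bool {
        let ip = addr.ip();
        if self.deny.contains(&ip) {
            return false;
        }
        self.allow.contains(&ip)
    }
}
```

Defaulting to deny means a component gets no network access unless an address is explicitly listed, which matches the sandbox's least-privilege intent.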

Complete Request Trace: From User Input to Sandbox Execution

Let's trace a complete user request through all three layers:

⌨️  User types in ChatPage:
    "Help me create a REST API endpoint in the project with full CRUD for users"

══════════════════════ Step 1: Outer Layer Receives ═══════════════════════

→ ws.rs receives WebSocket message { command_type: "chat_execute", prompt: "...", session_id: "..." }
→ Generates instance_id, records in ws_instances HashMap
→ Spawns async task execute_command()
→ Builds QueryConfig: model=current selection, max_turns=default, thinking_budget=default

══════════════════════ Step 2: Middle Layer Orchestrates ═══════════════════════

→ boxagnts-query::run_query_loop() starts

Turn 1:
  → Build system prompt (with AGENTS.md + tool list + skill list)
  → boxagnts-api selects Provider, formats messages
  → AI returns tool_use: "read" (read project structure)
  → boxagnts-tools matches WasmTool("read")
  → Enters Step 3...

══════════════════════ Step 3: Bottom Layer Executes ═══════════════════════

→ WasmTool.execute()
→ RunOption build:
    work_dir = /workspace/root
    wasm_timeout = default
    wasm_fuel = default
→ Wasmtime loads file-read-component.wasm (or uses .cwasm cache)
→ stdin input: {"file_path":"/src/main.rs","limit":100}
→ WASI preopen maps: / → /workspace/root
→ Executes in sandbox, stdout returns: {"content":"use axum::...","is_error":false}
→ Returns ToolResult { content: "...", is_error: false }

══════════════════════ Step 2 Continues: Middle Layer Orchestrates ═══════════════════════

→ Tool result injected into conversation history

Turn 2:
  → AI: "I can see the project uses axum, let me look at the existing route structure"
  → tool_use: "glob" (match **/*.rs)
  → Enters Step 3 again...

Turn 3:
  → AI understands project structure
  → AI returns tool_use: "write" (write src/api/users.rs)
  → Sandbox file write (path within work_dir scope)
  → Write successful

Turn 4:
  → AI returns tool_use: "edit" (register new route in src/main.rs)
  → Precise string replacement, completed in sandbox

Turn 5:
  → AI: "A complete user CRUD API has been created"
  → stop_reason: "end_turn"
  → Returns QueryOutcome::EndTurn

══════════════════════ Step 4: Outer Layer Presents ═══════════════════════

→ WebSocket pushes final completion message
→ ChatPage renders Markdown result
→ useChatSession updates message list
→ useChatScroll auto-scrolls to bottom

══════════════════════ Result ═══════════════════════

Total time: ~8-15 seconds
AI calls: 5 turns
Tool executions: 4 (read × 1, glob × 1, write × 1, edit × 1)
Sandbox executions: 4 (all in WASM sandbox)
User experience: ask one question, then watch the AI analyze the project and write the code step by step
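The turn-by-turn behavior traced above is essentially a loop around the model call. A minimal sketch of that loop, assuming the model and the tool runner are passed in as closures; the `ModelReply` type, the two `QueryOutcome` variants shown, and the history strings are hypothetical simplifications of `run_query_loop()`:

```rust
/// What the (mocked) model can answer on each turn.
pub enum ModelReply {
    ToolUse { tool: String, input: String },
    EndTurn(String),
}

/// Hypothetical two-variant subset of QueryOutcome.
#[derive(Debug, PartialEq)]
pub enum QueryOutcome {
    EndTurn { text: String, turns: usize },
    MaxTurns,
}

/// Sketch of the agent loop: ask the model, run any requested tool, inject the
/// result into the history, and repeat until end_turn or the turn limit.
pub fn run_query_loop(
    prompt: &str,
    mut model: impl FnMut(&[String]) -> ModelReply,
    mut run_tool: impl FnMut(&str, &str) -> String,
    max_turns: usize,
) -> (QueryOutcome, Vec<String>) {
    let mut history = vec![format!("user: {prompt}")];
    for turn in 1..=max_turns {
        match model(history.as_slice()) {
            ModelReply::EndTurn(text) => {
                history.push(format!("assistant: {text}"));
                return (QueryOutcome::EndTurn { text, turns: turn }, history);
            }
            ModelReply::ToolUse { tool, input } => {
                history.push(format!("tool_use: {tool}({input})"));
                let result = run_tool(tool.as_str(), input.as_str());
                history.push(format!("tool_result: {result}"));
            }
        }
    }
    (QueryOutcome::MaxTurns, history)
}
```

If the model is scripted to request read, glob, write, and edit and then end, this loop finishes after five turns with four tool executions, matching the counts in the trace.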

Design Patterns and Architectural Trade-offs

Pattern 1: Separation of Concerns

Concern	Layer	Independent Evolution
User experience, visualization, interaction	Outer	Frontend can switch to React / Tauri / Flutter
AI orchestration, tool dispatch, business logic	Middle	Models replaceable, tools addable/removable
Security isolation, system operations, resource control	Bottom	Engine swappable (e.g., Wasmtime → WasmEdge)

Pattern 2: Interface-Oriented Programming

The Tool trait is the cornerstone of the whole design. It gives native Rust tools and WASM sandbox tools exactly the same interface. The benefits of this design:

AI models don't need to know how tools are implemented
Middle layer doesn't need to care whether bottom is Rust or WASM
Bottom layer can be optimized independently without affecting the middle

Pattern 3: Embedded Security

BoxAgnts' security mechanisms are not "bolt-on" — they are deeply embedded in the architecture:

Outer layer authentication checks (local vs remote)
Bottom layer three-tier defense (resource + WASI + network)
Interface layer parameter validation (JSON Schema defined in input_schema)
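Validation against `input_schema` means malformed tool calls can be rejected before `execute()` runs. The sketch below models a tiny subset of JSON Schema (string and integer fields with required flags) using hypothetical `Arg`, `Kind`, and `Field` types, and it treats undeclared fields as errors, as if `additionalProperties` were `false`; a real implementation would use a full JSON Schema validator:

```rust
use std::collections::HashMap;

/// Hypothetical, tiny subset of JSON values.
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Str(String),
    Int(i64),
}

/// Hypothetical subset of JSON Schema types.
#[derive(Clone, Copy, PartialEq)]
pub enum Kind {
    Str,
    Int,
}

pub struct Field {
    pub name: &'static str,
    pub kind: Kind,
    pub required: bool,
}

/// Rejects input before it reaches execute(): missing required fields,
/// wrongly typed fields, and fields the schema does not declare.
pub fn validate(schema: &[Field], input: &HashMap<String, Arg>) -> Result<(), String> {
    for field in schema {
        match input.get(field.name) {
            None if field.required => return Err(format!("missing required field `{}`", field.name)),
            None => {}
            Some(value) => {
                let ok = matches!((field.kind, value), (Kind::Str, Arg::Str(_)) | (Kind::Int, Arg::Int(_)));
                if !ok {
                    return Err(format!("field `{}` has the wrong type", field.name));
                }
            }
        }
    }
    if let Some(extra) = input.keys().find(|k| !schema.iter().any(|f| f.name == k.as_str())) {
        return Err(format!("unexpected field `{extra}`"));
    }
    Ok(())
}
```

For the `read` call in the trace, a schema with a required string `file_path` and an optional integer `limit` accepts `{"file_path":"/src/main.rs","limit":100}` and rejects the same call if `limit` arrives as a string.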

Trade-off: The Cost of Choosing WASM

Advantage	Cost	BoxAgnts' Handling
Instruction-level security isolation	WASM component dev/debug is complex	Core ops use WASM, meta ops use Rust native
Cross-platform single compile	Some syscalls unavailable	Standardized WASI interfaces, no OS dependency
Hot-loading new tools	Tools must be compiled to .wasm before distribution	Cranelift compile cache (.cwasm), millisecond-level loading
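The `.cwasm` cache amounts to storing compiled artifacts keyed by content. Wasmtime can precompile components (for example with `Engine::precompile_component`) and load them back with the `unsafe` `Component::deserialize`; the sketch below shows only the cache logic, with a hypothetical `CompileCache` and a closure standing in for the compiler. It keys entries with `DefaultHasher`, which is not stable across Rust releases, so an on-disk cache would need a stable digest instead:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// In-memory stand-in for a .cwasm cache: compiled artifacts keyed by a hash
/// of the .wasm bytes, so an unchanged component is compiled only once.
#[derive(Default)]
pub struct CompileCache {
    artifacts: HashMap<u64, Vec<u8>>,
    pub compilations: usize,
}

impl CompileCache {
    pub fn get_or_compile(&mut self, wasm: &[u8], compile: impl FnOnce(&[u8]) -> Vec<u8>) -> &[u8] {
        let mut hasher = DefaultHasher::new();
        wasm.hash(&mut hasher);
        let key = hasher.finish();
        if !self.artifacts.contains_key(&key) {
            // Cache miss: pay the compile cost once and keep the artifact.
            self.compilations += 1;
            self.artifacts.insert(key, compile(wasm));
        }
        &self.artifacts[&key]
    }
}
```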

This pragmatic trade-off reflects mature engineering judgment — sacrificing neither flexibility for "purity" nor security for "convenience."

Core Architectural Insights

Insight 1: Good Architecture Makes "Simple Things Simple, Complex Things Possible"

Simple scenario: Just want to chat?
→ run boxagnts → open the browser → type a question → chat
→ The three layers stay invisible; zero configuration

Medium scenario: Need custom tools?
→ Implement Tool trait → register in all_tools()
→ Only need to understand one trait, clear middle-layer interface

Complex scenario: Need enterprise-grade security?
→ RunOption 11-dimensional config → three-layer defense-in-depth
→ The bottom layer is full-featured; its defaults cover 90% of scenarios
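As a sketch of how defaults can cover the common case, here is a hypothetical subset of RunOption with a `Default` impl, plus toy versions of two limits from the WASI tree earlier: `memory.grow` returning -1 past the memory cap, and fuel metering that traps when fuel runs out. Field names and default values are assumptions, not BoxAgnts' actual configuration; in Wasmtime the real mechanisms are configured through APIs such as `Config::consume_fuel` and store resource limits:

```rust
use std::time::Duration;

/// Hypothetical subset of RunOption; names and default values are illustrative.
#[derive(Debug, Clone)]
pub struct RunOption {
    pub max_memory_bytes: usize,
    pub fuel: u64,
    pub timeout: Duration,
    pub allow_network: bool,
}

impl Default for RunOption {
    fn default() -> Self {
        RunOption {
            max_memory_bytes: 64 * 1024 * 1024, // 64 MiB
            fuel: 1_000_000,
            timeout: Duration::from_secs(30),
            allow_network: false, // network denied unless explicitly enabled
        }
    }
}

/// WebAssembly linear memory grows in 64 KiB pages.
const PAGE_SIZE: usize = 64 * 1024;

/// memory.grow semantics: return the previous page count, or -1 if the new
/// size would exceed the configured maximum (memory is then left unchanged).
pub fn memory_grow(pages: &mut usize, delta: usize, opt: &RunOption) -> i32 {
    let Some(new_bytes) = pages.checked_add(delta).and_then(|p| p.checked_mul(PAGE_SIZE)) else {
        return -1;
    };
    if new_bytes > opt.max_memory_bytes {
        return -1;
    }
    let previous = *pages;
    *pages += delta;
    previous as i32
}

/// Fuel metering: each instruction costs one unit; running dry is a trap.
pub fn run_metered(instructions: u64, opt: &RunOption) -> Result<u64, String> {
    let mut fuel = opt.fuel;
    for executed in 0..instructions {
        if fuel == 0 {
            return Err(format!("trap: out of fuel after {executed} instructions"));
        }
        fuel -= 1;
    }
    Ok(opt.fuel - fuel) // fuel consumed
}
```

Struct-update syntax such as `RunOption { fuel: 10, ..RunOption::default() }` lets a caller override one dimension while inheriting every other default.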

Insight 2: Interfaces Are Architecture's Gateways

The three key interfaces (HTTP/WS, Tool trait, WASI) are three gateways. As long as the contracts of these interfaces are maintained, each layer can be freely refactored internally:

Frontend from Vue 3 to React — doesn't affect backend (HTTP API unchanged)
AI model from Anthropic to OpenAI — doesn't affect tools (Tool trait unchanged)
Sandbox engine from Wasmtime to WasmEdge — doesn't affect business (WASI standard unchanged)

Insight 3: Security Is the Currency of Trust

In an era where AI can manipulate files, execute commands, and access networks:

No security = No trust
No trust = No users willing to rely on it for serious work
Placing security at the bottom of the architecture means it cannot be bypassed or overlooked

Insight 4: Unified Abstraction Reduces Complexity

BoxAgnts' architecture has a recurring pattern: using one core abstraction to unify diverse implementations:

Multi-model → LlmProvider trait → One interface, 20+ implementations
Multi-tool → Tool trait → One interface, Rust + WASM implementations
Multi-storage → workspace → SQLite (state) + JSON (config)
Multi-interaction → axios (REST) + WebSocket → Each with its own role
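The first line of that list, one trait with many provider implementations, also makes the "Fallback" mechanism from the panorama diagram straightforward. A minimal sketch, assuming a synchronous, hypothetical `LlmProvider` shape (the article does not show the real trait's methods) and assuming fallback means trying the next provider when one fails:

```rust
/// Hypothetical synchronous provider interface (not BoxAgnts' actual trait).
pub trait LlmProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Toy provider that either answers or reports itself as unavailable.
struct EchoProvider {
    name: String,
    healthy: bool,
}

impl LlmProvider for EchoProvider {
    fn name(&self) -> &str {
        &self.name
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.healthy {
            Ok(format!("{} says: {prompt}", self.name))
        } else {
            Err("unavailable".to_string())
        }
    }
}

/// One call site for every provider: try them in order, fall back on error,
/// and return all collected failures if none succeeds.
pub fn complete_with_fallback(providers: &[Box<dyn LlmProvider>], prompt: &str) -> Result<String, Vec<String>> {
    let mut errors = Vec::new();
    for provider in providers {
        match provider.complete(prompt) {
            Ok(text) => return Ok(text),
            Err(e) => errors.push(format!("{}: {e}", provider.name())),
        }
    }
    Err(errors)
}
```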

This "unified abstraction" design reduces cognitive load — developers only need to understand a few core concepts to master the entire system.

Comparison with Other AI Agent Systems

Dimension	LangChain/LangGraph	AutoGPT	BoxAgnts
Install Experience	pip install + config	Docker + config	Single file download and run
Model Compatibility	Via LangChain wrappers	Limited	20+ Providers native support
Tool Execution	Direct host execution	Direct host execution	WASM sandbox isolation
Security Model	Relies on OS permissions	Relies on OS permissions	Three-layer defense (resource/file/network)
UI Interface	Optional (LangServe)	Web interface	Built-in full-featured Dashboard
Scheduled Tasks	Third-party integration	No native support	Built-in Cron engine
Extensibility	Python ecosystem	Python ecosystem	Skill system + WASM components
Deployment Mode	Python service	Docker container	Single Rust binary

BoxAgnts' unique positioning: not a development framework for building AI Agents (that's LangChain's territory), but a ready-to-use AI Agent runtime platform.

Lessons for AI Tool Developers

If you're developing your own AI Agent system, BoxAgnts' three-tier design offers these lessons:

1. Define Interfaces First, Then Implement Features

The Tool trait is BoxAgnts' core skeleton. All extensions revolve around it: a new tool only needs to implement the trait, and the Agent loop picks it up automatically. Time spent designing a good trait pays off in long-term extensibility.

2. Security from Day One

Don't wait for problems before adding a sandbox; by then the architecture is already set, and security can only exist as a "patch." BoxAgnts placed security at the bottom from the start, so any new capability built as a WASM tool automatically runs under sandbox protection.

3. Unified Abstraction Reduces Complexity

Don't design different interfaces for different types of functionality — one trait rules them all. Vec<Box<dyn Tool>> simultaneously holds simple tools (SleepTool) and complex tools (WasmTool), with no difference to the caller.

4. Minimize User Choices

6 CLI parameters > 60 lines of YAML config. Every parameter has a reasonable default. Users should spend time "using tools to solve problems," not "configuring the tools themselves."

5. Each Layer Focuses Only on Its Own Problem

Outer layer doesn't care if bottom uses Wasmtime or Docker
Middle layer doesn't care if outer uses Vue or React
Bottom layer doesn't care which AI model the middle uses
Each layer only needs to understand the interfaces with adjacent layers

6. Pragmatic Technical Trade-offs

Not all operations need to go through the WASM sandbox — AskUserQuestion is pure UI interaction, PlanMode is pure state management. BoxAgnts lets Rust native tools and WASM tools coexist in the same all_tools() list — this is a pragmatic choice, not dogmatic insistence.

The Value Formula of Three-Layer Synergy

Single Layer Value:
  Outer = Good user experience
  Middle = Powerful capabilities
  Bottom = Reliable security

Two-Layer Combined Value:
  Outer + Middle = Easy-to-use, powerful Agent platform
  Middle + Bottom = Secure, flexible Agent engine
  Bottom + Outer = Secure, intuitive user interface

Three-Layer Synergy Value:
  Outer × Middle × Bottom = User Trust → Deep Usage → Continuous Feedback → System Iteration
  This is the true meaning of 1+1+1 > 3

Conclusion

BoxAgnts' three-tier architecture is not a complex concept — it only does three things:

Outer layer: Lets users easily interact with AI — 6 params, 10 pages, 1 Dashboard
Middle layer: Lets AI flexibly dispatch tools and knowledge — 1 trait, 20+ models, 6 modules
Bottom layer: Lets all operations run in a secure environment — 1 engine, 11-dim control, 3-layer defense

But it is the organic combination of these three that shapes a complete, trustworthy, extensible AI Agent platform.

Each of the three layers performs its own role while tightly collaborating through precisely defined interfaces — HTTP/WebSocket connects outer and middle, the Tool trait connects middle and bottom, WASI connects bottom and system resources. The three interfaces are three gateways — holding them means holding the architecture's stability and evolvability.

As the project's name suggests — Box (toolbox), Agent (intelligent agent), Sandbox (sandbox) — three words, three layers, one complete AI Agent runtime platform.

Related Resources

Boxagnts: https://github.com/guyoung/boxagnts
