Guyoung Studio

Posted on May 30

BoxAgnts Introduction (6) — Agent Multi-Turn Conversation and Tool/Skill Invocation

#agentskills #agents #ai

If you've only chatted with ChatGPT, you might think an AI Agent is simply "send a prompt to the API, display the response."

The reality is far more complex. Here is a complete Agent interaction flow in BoxAgnts:

User input: "Help me read config.toml and change port to 9090"

1. User message added to conversation history
2. Build system prompt (tool list + skill list + AGENTS.md + Agent role definition)
3. Call LLM API → stream receive response
4. AI decides to call tool: tool_use("read", {path: "config.toml"})
5. Execute read tool (within WASM sandbox)
6. Tool result injected into conversation history
7. Call API again → AI analyzes config
8. AI decides to call tool: tool_use("edit", {path: "config.toml", old: "port = 8080", new: "port = 9090"})
9. Execute edit tool
10. Tool result injected into conversation
11. Call API again → AI responds: "Port has been changed from 8080 to 9090"
12. end_turn → Conversation ends

This process involves 3 API calls, 2 tool executions, streamed output, and context management. This article walks through the design and implementation of each step.
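To make those counts concrete, here is a minimal sketch of the same flow driven by a scripted stand-in for the model. Everything in it (`Reply`, `Stats`, `run_scripted`, `demo_script`) is hypothetical and is not BoxAgnts code; it only illustrates why one request that needs two tool calls costs three API round trips.

```rust
/// What a scripted model reply can ask for (hypothetical type for illustration).
#[derive(Clone, Debug, PartialEq)]
enum Reply {
    ToolUse { name: String, input: String },
    EndTurn(String),
}

/// Counters collected while driving the loop.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    api_calls: u32,
    tool_runs: u32,
}

/// Drives a scripted conversation: each "API call" consumes the next canned reply.
/// Tool results are appended to `history`, mirroring steps 6 and 10 of the flow.
fn run_scripted(script: &[Reply], history: &mut Vec<String>) -> (Stats, Option<String>) {
    let mut stats = Stats::default();
    for reply in script {
        stats.api_calls += 1; // one round trip per loop iteration
        match reply {
            Reply::ToolUse { name, input } => {
                stats.tool_runs += 1;
                // A real agent would execute the tool here; this sketch fakes the result.
                history.push(format!("tool_result({name}): ok for {input}"));
            }
            Reply::EndTurn(text) => {
                history.push(format!("assistant: {text}"));
                return (stats, Some(text.clone()));
            }
        }
    }
    (stats, None) // the script ran out before end_turn
}

/// The canned replies for the config.toml example.
fn demo_script() -> Vec<Reply> {
    vec![
        Reply::ToolUse { name: "read".into(), input: "config.toml".into() },
        Reply::ToolUse { name: "edit".into(), input: "port = 8080 -> port = 9090".into() },
        Reply::EndTurn("Port has been changed from 8080 to 9090".into()),
    ]
}
```

Running `run_scripted(&demo_script(), &mut history)` reports three API calls and two tool runs, the same numbers as the walkthrough.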

Agent Definition: Giving the Agent an "Identity"

Before starting the reasoning loop, the Agent's "role" needs to be defined. BoxAgnts comes with three pre-installed Agents, each described by an AgentDefinition:

// boxagnts-workspace/src/config.rs
pub struct AgentDefinition {
    pub description: Option<String>,    // Description
    pub model: Option<String>,          // Model override
    pub temperature: Option<f64>,       // Temperature override
    pub prompt: Option<String>,         // System prompt prefix
    pub access: String,                 // Permission: full / read-only / search-only
    pub visible: bool,                  // Whether visible in @agent autocomplete
    pub max_turns: Option<u32>,         // Max turns override
    pub color: Option<String>,          // Terminal display color
}

The three pre-installed Agent roles:

Agent	Permission	Prompt Characteristics	Use Cases
build	full	"You are the build agent. Focus on implementing..."	Coding, modifying files
plan	read-only	"You are the plan agent. You can read files and analyze..."	Code analysis, architecture design
explore	search-only	"Fast search-only agent for code exploration"	Quick search, file location

How Agent Prompts Are Injected

The prompt field in the Agent definition is injected at the very front of the system prompt when the query loop starts:

// boxagnts-query/src/query.rs
if let Some(ref agent) = config.agent_definition {
    if let Some(ref agent_prompt) = agent.prompt {
        patched.system_prompt = Some(match &config.system_prompt {
            Some(existing) => format!("{}\n\n{}", agent_prompt, existing),
            None => agent_prompt.clone(),
        });
    }
}

Additionally, the Agent can override the model and max turns:

let effective_model = if let Some(ref agent) = config.agent_definition {
    agent.model.clone().unwrap_or_else(|| config.model.clone())
} else {
    config.model.clone()
};

let effective_max_turns = config.agent_definition
    .as_ref()
    .and_then(|a| a.max_turns)
    .unwrap_or(config.max_turns);

This means users can use Agent definitions to implement "different models and roles at different stages of the same session" — for example, using a read-only slow-thinking model during the planning phase and a full-access fast model during the execution phase.
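The two excerpts above can be combined into one self-contained resolver. This is a sketch under assumptions: `AgentDef`, `BaseConfig`, `Effective`, and `resolve` are simplified stand-ins invented here, and only the precedence rules (agent value first, session default otherwise, agent prompt placed in front) come from the code shown.

```rust
/// Simplified stand-in for AgentDefinition (only the overridable fields).
#[derive(Clone, Debug, Default)]
struct AgentDef {
    model: Option<String>,
    prompt: Option<String>,
    max_turns: Option<u32>,
}

/// Simplified stand-in for the session-level query configuration.
#[derive(Clone, Debug)]
struct BaseConfig {
    model: String,
    system_prompt: Option<String>,
    max_turns: u32,
}

/// The values the loop would actually run with.
#[derive(Debug, PartialEq)]
struct Effective {
    model: String,
    system_prompt: Option<String>,
    max_turns: u32,
}

/// Agent overrides win; anything the agent leaves unset falls back to the base config.
fn resolve(base: &BaseConfig, agent: Option<&AgentDef>) -> Effective {
    let model = agent
        .and_then(|a| a.model.clone())
        .unwrap_or_else(|| base.model.clone());
    let max_turns = agent.and_then(|a| a.max_turns).unwrap_or(base.max_turns);
    // The agent prompt goes in front of any existing system prompt.
    let system_prompt = match (agent.and_then(|a| a.prompt.as_ref()), &base.system_prompt) {
        (Some(p), Some(existing)) => Some(format!("{p}\n\n{existing}")),
        (Some(p), None) => Some(p.clone()),
        (None, existing) => existing.clone(),
    };
    Effective { model, system_prompt, max_turns }
}
```

With a plan agent that sets only `model` and `prompt`, the resolved config keeps the session's `max_turns` while switching model and prepending the role prompt.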

run_query_loop: The Heart of the Agent

run_query_loop() is the central function in BoxAgnts and lives in the boxagnts-query crate:

pub async fn run_query_loop(
    client: &AnthropicClient,        // API client
    messages: &mut Vec<Message>,     // Conversation history (mutable reference)
    tools: &[Box<dyn Tool>],         // Tool collection
    tool_ctx: &ToolContext,          // Tool execution context
    config: &QueryConfig,            // Loop configuration
    cost_tracker: Arc<CostTracker>,  // Cost tracking
    event_tx: Option<mpsc::UnboundedSender<QueryEvent>>, // Event push
    cancel_token: CancellationToken, // Cancellation signal
    pending_messages: Option<&mut Vec<String>>, // Pending message queue
) -> QueryOutcome

This function signature is itself an architectural document. Each parameter is a design decision:

Parameter	Design Intent
`client`	Single entry point, but internally switches 20+ models via ProviderRegistry
`messages: &mut Vec<Message>`	Directly modifies conversation history, appends content each iteration
`tools: &[Box<dyn Tool>]`	Type-erased tool collection, AI calls by name
`tool_ctx`	Carries work_dir, allowed_hosts and other sandbox config
`event_tx`	Real-time push of per-turn status to Dashboard / TUI
`cancel_token`	User can interrupt loop at any time
`pending_messages`	Insert commands mid-execution (e.g., user sends new message during tool execution)

The Five-Step Rhythm of the Main Loop

┌─────────────────────────────────────────────┐
│                  loop {                       │
│                                               │
│  ① Check termination conditions               │
│     · turn > max_turns ? → EndTurn           │
│     · cancel_token ?    → Cancelled          │
│     · budget exceeded?  → BudgetExceeded     │
│                                               │
│  ② Preprocess messages                       │
│     · drain pending_messages queue           │
│     · apply_tool_result_budget (truncate old results) │
│     · auto_compact (context compression)      │
│                                               │
│  ③ Build system prompt + Call LLM API        │
│     · Inject Agent definition / AGENTS.md    │
│     · Build CreateMessageRequest             │
│     · Stream receive StreamEvent              │
│     · Accumulate text / thinking / tool_use blocks │
│                                               │
│  ④ Process response                          │
│     · end_turn → return                       │
│     · tool_use → parallel execute tools → inject results → continue │
│     · max_tokens → resume conversation → continue │
│                                               │
│  ⑤ Error recovery                            │
│     · overloaded → switch fallback model     │
│     · stream stall → retry (max 2 times)      │
│                                               │
│  }                                            │
└─────────────────────────────────────────────┘
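Step ① of the diagram can be sketched as a small pure function. The names `LoopState`, `Outcome`, and `check_termination` are hypothetical, and tracking the budget in dollars is an assumption; only the three conditions and their order come from the diagram.

```rust
/// Why the loop stopped before calling the model again.
#[derive(Debug, PartialEq)]
enum Outcome {
    EndTurn,
    Cancelled,
    BudgetExceeded,
}

/// The minimal state the checks need (hypothetical layout).
struct LoopState {
    turn: u32,
    max_turns: u32,
    cancelled: bool,
    spent_usd: f64,
    budget_usd: Option<f64>, // None means "no budget configured"
}

/// Returns Some(outcome) if the loop must stop, None if it may continue.
/// The checks run in the same order as in the diagram.
fn check_termination(state: &LoopState) -> Option<Outcome> {
    if state.turn > state.max_turns {
        return Some(Outcome::EndTurn);
    }
    if state.cancelled {
        return Some(Outcome::Cancelled);
    }
    if let Some(budget) = state.budget_usd {
        if state.spent_usd > budget {
            return Some(Outcome::BudgetExceeded);
        }
    }
    None
}
```

Keeping the checks in one function at the top of each iteration means a cancellation or an exhausted budget is noticed before another API call is paid for.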

System Prompt Construction: The Agent's "Worldview"

Before each API call, BoxAgnts builds a complete system prompt:

fn build_system_prompt(config: &QueryConfig) -> SystemPrompt {
    let opts = SystemPromptOptions {
        custom_system_prompt: config.system_prompt.clone(),     // User custom
        append_system_prompt: config.append_system_prompt.clone(), // Appended content
        output_style: config.output_style,                      // Output style
        custom_output_style_prompt: config.output_style_prompt.clone(),
        working_directory: config.working_directory.clone(),    // Current working directory
        ..Default::default()
    };

    let text = boxagnts_core::system_prompt::build_system_prompt(&opts);
    SystemPrompt::Text(text)
}

The system prompt structure is hierarchical:

┌──────────────────────────────────────┐
│ Agent Role Definition (build/plan/explore) │  ← AgentDefinition.prompt
├──────────────────────────────────────┤
│ Core Capability Declaration           │
│ · Available tool list (16+)           │  ← Dynamically generated from tools parameter
│ · Skill list                          │  ← Discovered by SkillTool
│ · Output format requirements          │
│ · Security boundaries                 │
├──────────────────────────────────────┤
│ AGENTS.md content                     │  ← User project-level behavior spec
├──────────────────────────────────────┤
│ Dynamic Boundary Marker               │
│ --- Above cached, below not cached ---│
├──────────────────────────────────────┤
│ Session-specific information          │  ← Current working directory, time, etc.
└──────────────────────────────────────┘

The --- Above cached, below not cached --- divider matters for cost. The Anthropic API supports prompt caching, so keeping the stable sections above the divider lets repeated calls reuse the cached prefix, which can significantly reduce token costs per API call.
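The idea behind the divider can be illustrated with a short sketch. It assumes the marker is a literal line in the assembled prompt, taken verbatim from the diagram; whether BoxAgnts uses that exact string, and how it hands the two halves to the API, is not shown in this article. `assemble` and `split_for_cache` are hypothetical helpers.

```rust
/// Marker text copied from the diagram; treating it as a literal line is an assumption.
const BOUNDARY: &str = "--- Above cached, below not cached ---";

/// Joins stable sections, the boundary marker, and per-session sections.
fn assemble(stable: &[&str], dynamic: &[&str]) -> String {
    format!("{}\n{}\n{}", stable.join("\n\n"), BOUNDARY, dynamic.join("\n\n"))
}

/// Splits a prompt into (cacheable prefix, uncached suffix).
/// If the marker is missing, the whole prompt is treated as the prefix.
fn split_for_cache(prompt: &str) -> (&str, &str) {
    match prompt.split_once(BOUNDARY) {
        Some((stable, dynamic)) => (stable.trim_end(), dynamic.trim_start()),
        None => (prompt, ""),
    }
}
```

Two sessions that differ only in working directory produce byte-identical prefixes, which is the property a prefix cache depends on: anything that changes per call has to stay below the marker.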

max_tokens Recovery: The Agent's "Resume from Breakpoint"

When the AI's response hits the max_tokens limit, the model cuts off output midway. A normal API call ends here — but the Agent cannot stop.

BoxAgnts' solution is clever:

// boxagnts-query/src/query.rs
const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;

const MAX_TOKENS_RECOVERY_MSG: &str =
    "Output token limit hit. Resume directly — no apology, no recap of what \
     you were doing. Pick up mid-thought if that is where the cut happened. \
     Break remaining work into smaller pieces.";

When stop_reason == "max_tokens" is detected:

1. Add the partial response as an assistant message to the conversation
2. Append a special user message (MAX_TOKENS_RECOVERY_MSG)
3. Continue the loop, so the model continues generating from the cutoff point

The details in the prompt are worth noting — "no apology, no recap" — because an LLM's instinctive reaction after being cut off is "Sorry, I was interrupted, let me start over..." This leads to useless output. This prompt directly forbids that pattern.
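The recovery steps can be sketched as one function. `Msg`, `Role`, `Next`, and `on_max_tokens` are hypothetical, the recovery message is shortened, and giving up once the limit of 3 is reached is an assumption about what the limit is for; the article only shows the constant.

```rust
const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;
// Shortened version of the message quoted above.
const MAX_TOKENS_RECOVERY_MSG: &str = "Output token limit hit. Resume directly.";

#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: Role,
    text: String,
}

/// What the loop should do after a truncated response.
#[derive(Debug, PartialEq)]
enum Next {
    Continue,
    GiveUp,
}

/// Handles stop_reason == "max_tokens": keep the partial output, then either
/// ask the model to resume or stop once the recovery limit has been used up.
fn on_max_tokens(history: &mut Vec<Msg>, partial: &str, recoveries: &mut u32) -> Next {
    // The partial response is always kept so no generated text is lost.
    history.push(Msg { role: Role::Assistant, text: partial.to_string() });
    if *recoveries >= MAX_TOKENS_RECOVERY_LIMIT {
        return Next::GiveUp;
    }
    *recoveries += 1;
    history.push(Msg { role: Role::User, text: MAX_TOKENS_RECOVERY_MSG.to_string() });
    Next::Continue
}
```

The counter matters: without it, a model that keeps hitting the limit would loop forever, each time paying for a longer context.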

auto_compact: When Context Gets Too Long

An LLM's context window is finite. As conversations grow longer and tool results pile up, there comes a moment when things no longer fit.

BoxAgnts' response is automatic compaction. It triggers when the estimated token count reaches 90% of the context window:

// boxagnts-query/src/compact.rs
const AUTOCOMPACT_TRIGGER_FRACTION: f64 = 0.90;
const WARNING_PCT: f64 = 0.80;   // Warning at 80%
const CRITICAL_PCT: f64 = 0.95;  // Critical warning at 95%

The core compaction strategy is calling another LLM to "summarize" the conversation history:

Original conversation (potentially thousands of messages)
      │
      ▼
Compaction Prompt (NO_TOOLS_PREAMBLE → force summary mode)
      │
      ▼
LLM generates structured summary:
  · Primary Request and Intent
  · Key Technical Concepts
  · Files and Code Sections
  · Errors and fixes
  · Pending Tasks
  · Current Work
      │
      ▼
Summary replaces early conversation history, last 10 messages kept in original form

The compaction prompt has a key design — NO_TOOLS_PREAMBLE:

CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn.

If the compacting LLM tries to call tools, the entire compaction is wasted. This preamble prevents such meta-recursion.
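The trigger and the "keep the last 10 messages" rule can be sketched as follows. This is a simplification under assumptions: the 4-characters-per-token estimate, the `[compact summary]` prefix, and the function names are invented here, the summarizer is passed in as a closure instead of being a real LLM call, and the sketch ignores any grouping of messages into API rounds.

```rust
const AUTOCOMPACT_TRIGGER_FRACTION: f64 = 0.90;
const KEEP_RECENT: usize = 10;

/// Very rough token estimate: about 4 characters per token (an assumption).
fn estimate_tokens(messages: &[String]) -> usize {
    messages.iter().map(|m| m.chars().count() / 4 + 1).sum()
}

/// True once the estimate reaches 90% of the context window.
fn should_compact(messages: &[String], context_window: usize) -> bool {
    estimate_tokens(messages) as f64 >= context_window as f64 * AUTOCOMPACT_TRIGGER_FRACTION
}

/// Replaces everything except the last KEEP_RECENT messages with one summary message.
/// Returns false (and changes nothing) if there is nothing old enough to summarize.
fn compact<F>(messages: &mut Vec<String>, summarize: F) -> bool
where
    F: FnOnce(&[String]) -> String,
{
    if messages.len() <= KEEP_RECENT {
        return false;
    }
    let split = messages.len() - KEEP_RECENT;
    let summary = summarize(&messages[..split]);
    let recent = messages.split_off(split);
    messages.clear();
    messages.push(format!("[compact summary]\n{summary}"));
    messages.extend(recent);
    true
}
```

Keeping the most recent messages verbatim means the model still sees the exact tool output it was just working with, while older turns survive only as the summary.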

Tool Execution: From AI Decision to Execution Result

When the LLM returns stop_reason == "tool_use", the conversation enters the tool execution phase:

┌──────────────────────────────────────────────┐
│  Phase 1: Sequential PreToolUse preprocessing │
│  (Each tool block processed sequentially,     │
│   can interrupt execution)                     │
├──────────────────────────────────────────────┤
│  Phase 2: Parallel execution of non-blocking   │
│  tools                                         │
│  join_all(futures) → all tools run concurrently │
│  (Blocking tools return pre-computed error      │
│   results)                                      │
└──────────────────────────────────────────────┘

Key design point: tool results are injected in user message format. This leverages LLM message role semantics — the Assistant initiated the tool call, and the User (i.e., the system acting on behalf of the user) returned the tool result. The model understands this as "the user answered your request" and naturally proceeds to the next round of reasoning.
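The two phases can be sketched with the standard library alone. Note the swap: this uses std::thread::scope in place of the async join_all named in the diagram, so that the example needs no external crates. `ToolCall`, `ToolOutput`, the blocking rule in `pre_tool_use`, and the text format in `as_user_message` are all hypothetical.

```rust
use std::thread;

#[derive(Debug, Clone)]
struct ToolCall {
    id: String,
    name: String,
    input: String,
}

#[derive(Debug, Clone, PartialEq)]
struct ToolOutput {
    id: String,
    content: String,
    is_error: bool,
}

/// Phase 1 hook (made-up policy): Some(reason) blocks the call.
fn pre_tool_use(call: &ToolCall) -> Option<String> {
    if call.name == "bash" && call.input.contains("rm -rf") {
        Some("blocked by PreToolUse hook".to_string())
    } else {
        None
    }
}

/// Stand-in for real tool execution.
fn run_tool(call: &ToolCall) -> ToolOutput {
    ToolOutput { id: call.id.clone(), content: format!("{} ok: {}", call.name, call.input), is_error: false }
}

fn execute_batch(calls: &[ToolCall]) -> Vec<ToolOutput> {
    // Phase 1: sequential checks, in the order the model requested the tools.
    let decisions: Vec<Option<String>> = calls.iter().map(pre_tool_use).collect();
    // Phase 2: run concurrently; blocked calls just return their prepared error.
    thread::scope(|s| {
        let handles: Vec<_> = calls
            .iter()
            .zip(decisions)
            .map(|(call, blocked)| {
                s.spawn(move || match blocked {
                    Some(reason) => ToolOutput { id: call.id.clone(), content: reason, is_error: true },
                    None => run_tool(call),
                })
            })
            .collect();
        // Joining in spawn order keeps results aligned with the original calls.
        handles
            .into_iter()
            .map(|h| h.join().expect("tool thread panicked"))
            .collect::<Vec<ToolOutput>>()
    })
}

/// Renders all results as the body of a single user-role message.
fn as_user_message(outputs: &[ToolOutput]) -> String {
    outputs
        .iter()
        .map(|o| format!("tool_result[{}]{}: {}", o.id, if o.is_error { " (error)" } else { "" }, o.content))
        .collect::<Vec<_>>()
        .join("\n")
}
```

A blocked call still produces a result entry, so every tool call the model made gets an answer and the conversation stays well-formed.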

execute_tool: The Core of Tool Dispatch

// boxagnts-query/src/lib.rs
async fn execute_tool(
    name: &str,
    input: &Value,
    tools: &[Box<dyn Tool>],
    ctx: &ToolContext,
) -> ToolResult {
    let tool = tools.iter().find(|t| t.name() == name);

    match tool {
        Some(tool) => {
            debug!(tool = name, "Executing tool");
            tool.execute(input.clone(), ctx).await
        }
        None => {
            warn!(tool = name, "Unknown tool requested");
            ToolResult::error(format!("Unknown tool: {}", name))
        }
    }
}

An extremely simple implementation — a linear search. The tools vector typically has only a dozen elements, so the linear search overhead is negligible. Simplicity is more reliable than complexity.
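For readers who want to try the dispatch pattern without the async runtime, here is a synchronous sketch. The `Tool` trait below is a reduced, hypothetical version (string input, `Result` output) and the `Echo` and `WordCount` tools are invented for the example; only the find-by-name lookup and the "Unknown tool" error mirror the excerpt.

```rust
/// Reduced, synchronous version of a tool interface (hypothetical).
trait Tool {
    fn name(&self) -> &str;
    fn execute(&self, input: &str) -> Result<String, String>;
}

struct Echo;
impl Tool for Echo {
    fn name(&self) -> &str { "echo" }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

struct WordCount;
impl Tool for WordCount {
    fn name(&self) -> &str { "word_count" }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.split_whitespace().count().to_string())
    }
}

/// Type-erased registry: the model only ever refers to tools by name.
fn registry() -> Vec<Box<dyn Tool>> {
    vec![Box::new(Echo), Box::new(WordCount)]
}

/// Linear search by name; an unknown name becomes an error result, not a panic.
fn execute_tool(name: &str, input: &str, tools: &[Box<dyn Tool>]) -> Result<String, String> {
    match tools.iter().find(|t| t.name() == name) {
        Some(tool) => tool.execute(input),
        None => Err(format!("Unknown tool: {name}")),
    }
}
```

Returning the unknown-tool case as an ordinary error result lets the model read the message and correct itself on the next turn.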

Managed Agent Mode: Manager-Executor Architecture

When task complexity exceeds a single Agent's capacity, BoxAgnts provides Managed Agent mode:

                    ┌──────────────────┐
                    │  Manager Agent   │
                    │  (Strong model    │
                    │   like Opus)      │
                    │  Plans and        │
                    │  assigns only     │
                    └────────┬─────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Executor │  │ Executor │  │ Executor │
        │ (Sonnet)  │  │ (Sonnet)  │  │ (Sonnet)  │
        │ Subtask A│  │ Subtask B│  │ Subtask C│
        └──────────┘  └──────────┘  └──────────┘
            Parallel execution, each with independent context

The Manager's system prompt is injected with managed mode instructions:

pub fn managed_agent_system_prompt(config: &ManagedAgentConfig) -> String {
    format!(r#"
## Managed Agent Mode

You are the MANAGER in a manager-executor architecture.

### Your Role
- You coordinate work but do NOT execute tasks directly.
- Delegate all implementation work to executor agents.
- Each executor uses model `{executor_model}` with up to {max_turns} turns.
- You may run up to {max_concurrent} executors in parallel.

### Workflow
1. Analyze the user's request and break into sub-tasks.
2. Spawn executors using the Agent tool.
3. Review results. If insufficient, spawn follow-up executors.
4. Synthesize all results into a coherent response.
"#, ...)
}

The Manager does not execute tools itself — it only plans, assigns, and synthesizes results. Executors are ordinary Agent instances with the full tool set. This pattern separates "thinking" from "execution," both avoiding single-Agent context bloat and enabling true parallel processing.
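The fan-out with a concurrency cap can be sketched like this. It is a deliberately simple model, not the BoxAgnts scheduler: `run_executor` is a stand-in for a full executor Agent, threads replace async tasks, and subtasks run in fixed waves of `max_concurrent`, whereas a real scheduler would more likely start the next subtask as soon as a slot frees up.

```rust
use std::thread;

/// Stand-in for an executor Agent working on one subtask in its own context.
fn run_executor(subtask: &str) -> String {
    format!("done: {subtask}")
}

/// Runs subtasks in waves of at most `max_concurrent`, preserving input order.
fn run_managed(subtasks: &[&str], max_concurrent: usize) -> Vec<String> {
    let mut results = Vec::with_capacity(subtasks.len());
    // chunks(0) would panic, so treat 0 as 1.
    for wave in subtasks.chunks(max_concurrent.max(1)) {
        let wave_results: Vec<String> = thread::scope(|s| {
            let handles: Vec<_> = wave
                .iter()
                .map(|&task| s.spawn(move || run_executor(task)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("executor panicked"))
                .collect()
        });
        results.extend(wave_results);
    }
    results
}
```

The manager would then synthesize the returned strings into its final answer; each executor only ever sees its own subtask, which is what keeps the contexts independent.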

Skill System: Teaching the Agent "Professional Skills"

Tools are the Agent's "hands" — reading files, writing files, executing commands. Skills are the Agent's "professional knowledge" — code review methodology, CSS refactoring guidelines, frontend component templates.

Skill File Format

A Skill is simply a SKILL.md file:

app/extensions/skills/
├── code-review/SKILL.md
├── css-refactor-advisor/SKILL.md
├── current-weather/SKILL.md
├── weather-forecast/SKILL.md
└── front-component-generator/SKILL.md

SkillTool Implementation

pub struct SkillTool;

#[async_trait]
impl Tool for SkillTool {
    fn name(&self) -> &str { "skill-tool" }

    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult {
        let params: SkillInput = serde_json::from_value(input)?;

        // Project-level directory first, then global (see skill_search_dirs below)
        let dirs = skill_search_dirs(ctx).await;

        // "skill": "list" → List all available skills
        if params.skill == "list" {
            return list_skills(&dirs).await;
        }

        // Find and read SKILL.md
        let (_skill_path, raw) = find_and_read_skill(&params.skill, &dirs).await?;

        // Strip YAML frontmatter
        let content = strip_frontmatter(&raw);

        // Replace $ARGUMENTS placeholder
        let prompt = if let Some(args) = &params.args {
            content.replace("$ARGUMENTS", args)
        } else {
            content.replace("$ARGUMENTS", "")
        };

        ToolResult::success(prompt)
    }
}

Dual-Layer Skill Search Paths

Skill search prioritizes the workspace directory, then the app extensions directory:

async fn skill_search_dirs(ctx: &ToolContext) -> Vec<PathBuf> {
    let mut dirs = vec![
        ctx.get_workspace_extensions_dir().await.join("skills")  // Project-level
    ];
    dirs.push(ctx.get_app_extensions_dir().await.join("skills")); // Global-level
    dirs
}

This means you can define project-specific Skills under your project directory (e.g., "Understand this project's build system") while also using global Skills (e.g., "Universal code review standards"). Project-level Skills take priority over global Skills.
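The precedence rule falls out of the directory order. A minimal sketch, assuming the `<dir>/<skill-name>/SKILL.md` layout from the tree shown earlier; `find_skill` is a hypothetical synchronous helper, not the `find_and_read_skill` function used by SkillTool.

```rust
use std::path::PathBuf;

/// Returns the first SKILL.md found for `name`, checking `dirs` in order.
/// Putting the project-level directory first is what gives it priority.
fn find_skill(name: &str, dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(name).join("SKILL.md"))
        .find(|path| path.is_file())
}
```

If both directories contain a `code-review` skill, the project-level file is returned and the global one is never read; a skill that exists only globally is still found on the second probe.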

$ARGUMENTS Placeholder

The most critical mechanism in Skill templates is $ARGUMENTS:

# Code Review Skill Template

Please review: $ARGUMENTS

Checklist:
1. Are functions too long (>50 lines)?
2. Are there unhandled Result/Option cases?
3. Are there unnecessary .clone() calls?
4. Does naming follow Rust conventions?

When the AI calls with args: "src/main.rs", $ARGUMENTS is replaced with src/main.rs. This turns Skills from "static knowledge" into "parameterized tools."
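The substitution itself is plain string handling. The sketch below is an assumption-laden reimplementation, not the project's code: this `strip_frontmatter` only recognizes a frontmatter block that opens with `---` on the first line and closes with `---` on its own line, and `render_skill` is a hypothetical wrapper.

```rust
/// Removes a leading YAML frontmatter block delimited by `---` lines.
/// Input without a complete frontmatter block is returned unchanged.
fn strip_frontmatter(raw: &str) -> &str {
    const CLOSE: &str = "\n---\n";
    let rest = match raw.strip_prefix("---\n") {
        Some(rest) => rest,
        None => return raw,
    };
    match rest.find(CLOSE) {
        Some(end) => rest[end + CLOSE.len()..].trim_start_matches('\n'),
        None => raw,
    }
}

/// Strips frontmatter, then fills in $ARGUMENTS (empty string when no args are given).
fn render_skill(raw: &str, args: Option<&str>) -> String {
    strip_frontmatter(raw).replace("$ARGUMENTS", args.unwrap_or(""))
}
```

Because `replace` substitutes every occurrence, a template can mention $ARGUMENTS more than once, for example in the heading and again in the checklist.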

Streaming Push: Letting Users See the Agent "Think"

The entire query loop pushes status in real-time through the event_tx channel:

pub enum QueryEvent {
    Token { text: String },                    // Per-token push
    ToolStart { tool_name, tool_id, input },   // Tool start
    ToolEnd { tool_name, tool_id, result },    // Tool end
    Status(String),                            // Status message
}

These events are pushed to the Dashboard frontend in real-time via WebSocket, allowing users to see every decision the Agent makes — not facing a black box.
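The producer/consumer shape of this channel can be tried with the standard library. This sketch swaps in std::sync::mpsc for the async unbounded sender in the real signature, and the field types on its `QueryEvent` are assumptions, since the excerpt above omits them; `run_turn` and `render` are hypothetical.

```rust
use std::sync::mpsc;

/// Reduced event type; field types are assumed for the sake of the example.
#[derive(Debug, Clone, PartialEq)]
enum QueryEvent {
    Token { text: String },
    ToolStart { tool_name: String },
    ToolEnd { tool_name: String, ok: bool },
    Status(String),
}

/// Stand-in for one loop iteration: emits events as things happen.
fn run_turn(tx: mpsc::Sender<QueryEvent>) {
    let events = vec![
        QueryEvent::Status("turn 1".into()),
        QueryEvent::Token { text: "Reading ".into() },
        QueryEvent::ToolStart { tool_name: "read".into() },
        QueryEvent::ToolEnd { tool_name: "read".into(), ok: true },
        QueryEvent::Token { text: "Done.".into() },
    ];
    for event in events {
        // A closed receiver must not abort the agent, so send errors are ignored.
        let _ = tx.send(event);
    }
    // `tx` is dropped here, which ends the consumer's iteration.
}

/// Stand-in for a UI: turns each event into one display line.
fn render(events: mpsc::Receiver<QueryEvent>) -> Vec<String> {
    events
        .iter()
        .map(|event| match event {
            QueryEvent::Token { text } => format!("token: {text}"),
            QueryEvent::ToolStart { tool_name } => format!("start: {tool_name}"),
            QueryEvent::ToolEnd { tool_name, ok } => format!("end: {tool_name} ok={ok}"),
            QueryEvent::Status(s) => format!("status: {s}"),
        })
        .collect()
}
```

The loop never waits for the UI: it sends and moves on, and the consumer decides how to display or drop each event.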

Summary

An AI Agent's multi-turn conversation is a complex control system:

System Prompt → API Call → Stream Parse → Tool Detection → Tool Execution → Result Injection → Call Again
     ↑                                                                         │
     └───────────────── Loop until end_turn ───────────────────────────────────┘

The robustness of this loop depends on:

Mechanism	Problem Solved
Agent definition system	Multi-role, multi-model switching
System prompt construction	Agent worldview + prompt caching
max_tokens recovery	Long output truncation
auto_compact (structured summaries)	Context overflow beyond window
tool_result_budget	Tool result accumulation

Related Resources

BoxAgnts: https://github.com/guyoung/boxagnts

Top comments (2)

Harjot Singh • May 31

Multi-turn + tool/skill invocation is where agent frameworks earn their keep, because the single-turn demo hides all the hard state problems: keeping conversation context coherent across turns while tools mutate the world underneath you, and deciding when the agent should call a tool vs answer directly (over-eager tool-calling is its own failure mode). Getting that loop clean is most of the battle.

The part I'd dig into from my own build: tool/skill invocation gets expensive fast if every turn re-ships full context to decide which tool to call. In Moonshift (a multi-agent pipeline: prompt to a shipped SaaS on your own GitHub + Vercel) I scope context per turn and route the tool-selection decision to a cheap model, reserving the expensive model for the actual reasoning - that's a big part of why a full build stays ~$3 flat. First run's free, no card. Nice series - how does BoxAgnts decide tool-vs-respond, and does it keep full history per turn or compact it? Those two choices drive both quality and cost more than anything else I've found.

Guyoung Studio • May 31

Boxagnts does not make tool-use decisions itself. Instead, it delegates this choice entirely to the LLM. On every turn, the full conversation history, the system prompt, and the definitions of all available tools (including their names, descriptions, and JSON input schemas) are sent together in the API request. The LLM's streaming response contains content blocks of different types: Text blocks represent direct natural-language replies, ToolUse blocks indicate the model has chosen to invoke a specific tool, and Thinking blocks capture the model's reasoning process. If the response contains any ToolUse blocks, Boxagnts executes each tool sequentially, wraps the results as a User message appended to the conversation, decrements the turn counter (so tool invocations do not consume the turn budget), and continues the loop. The LLM then sees the tool results and decides whether to call another tool or produce a final answer. The loop ends only when the model emits an end_turn stop reason.

Regarding history, Boxagnts preserves the complete conversation by default but applies two protective mechanisms to prevent context window overflow. The first is the Tool Result Budget: when the cumulative character count of all tool outputs exceeds a configurable threshold (50,000 by default), the oldest results are replaced with short truncation placeholders while preserving the logical structure of the conversation. The second is Auto-Compact, which triggers when the context window reaches 90% usage. It groups messages by API round boundaries, uses a separate non-agentic API call to generate summaries of old groups while keeping the 10 most recent messages verbatim, and replaces the summarized portion with a synthetic compact-summary message. A circuit breaker disables compaction after three consecutive failures to prevent infinite retry loops.