Gotham64
The Foreman Protocol: How OpenPawz gives AI agents bidirectional access to community driven services

The hidden cost of AI tool execution

When an AI agent sends a Slack message, the Slack API itself is free. But the cloud model has to process the tool schema, reason about parameters, format structured JSON, wait for the result, and summarize it back to the user. Every one of those steps consumes tokens at your provider's rate.

Now multiply that across an automation that sends 50 messages, creates 10 tickets, and updates 5 spreadsheets. The cloud API costs dominate — not because the tools are expensive, but because the reasoning about how to call them is expensive.
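To make the multiplication concrete, here is a back-of-envelope sketch. The per-call token overhead and the $10-per-million-token rate are illustrative assumptions, not measured OpenPawz numbers:

```rust
/// Assumed average token overhead per cloud-routed tool call: schema in
/// context + parameter reasoning + JSON output + result summarization.
const TOKENS_PER_TOOL_CALL: u64 = 1_500;

/// Assumed blended frontier-model rate, in USD per million tokens.
const USD_PER_MILLION_TOKENS: f64 = 10.0;

/// Estimated cloud cost of routing `calls` tool calls through the
/// frontier model instead of delegating them to a cheap worker.
fn cloud_cost_usd(calls: u64) -> f64 {
    (calls * TOKENS_PER_TOOL_CALL) as f64 / 1_000_000.0 * USD_PER_MILLION_TOKENS
}
```

At these assumed rates, the 65-call automation above (50 messages + 10 tickets + 5 spreadsheet updates) burns roughly 97,500 tokens of reasoning overhead, about $0.98, before the underlying APIs, which are free, are even counted.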

And it gets worse. As integrations grow, the cloud LLM has to hold more tool schemas in its context window in order to call them:

  • OpenPawz tools — manageable context overhead
  • Built-in tools — significant context consumed by schemas alone
  • Community integrations — impossible to load into any context window

Formatting a JSON-RPC call is not a task that needs GPT-4 or Claude Opus. OpenPawz solves this with the Foreman Protocol — splitting the agent into two roles, each doing only what it's suited for.

Star the OpenPawz repo — it's open source.


The invention: Architect plans, Foreman executes

The Foreman Protocol splits the agent into two roles:

| Role | Model | Does | Costs |
| --- | --- | --- | --- |
| Architect | Cloud LLM (GPT, Claude, Gemini) | Plans, reasons, talks to the user — decides what needs to happen | Per-token (paid) |
| Foreman | Any cheap/free model | Interfaces with services — handles how it happens | Free (local models) or cheap (cloud models) |

The Architect never sees MCP schemas. The Foreman never reasons about user intent. Each model does only what it's suited for.
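The split can be sketched as a simple routing rule, using the mcp_ tool-name prefix mentioned later in this post. The Role enum and route helper are hypothetical, not OpenPawz's actual types:

```rust
/// The two roles of the Foreman Protocol (sketch only).
#[derive(Debug, PartialEq)]
enum Role {
    Architect, // cloud LLM: plans, reasons, talks to the user
    Foreman,   // cheap worker model: executes service calls
}

/// Route a step: anything touching an mcp_-prefixed tool goes to the
/// Foreman; planning and user-facing reasoning stay with the Architect.
fn route(tool_name: Option<&str>) -> Role {
    match tool_name {
        Some(name) if name.starts_with("mcp_") => Role::Foreman,
        _ => Role::Architect,
    }
}
```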


Bidirectional, not a pipeline

The Foreman is not a one-way executor in a predefined sequence. It is a bidirectional bridge between your agent and every connected service. It can:

  • Read — Query a database, list Slack channels, fetch open Jira tickets, check GitHub PR status
  • Write — Send a message, create a ticket, update a spreadsheet, post to a webhook
  • Both in one task — Read the open tickets, then post a summary to Slack

And it doesn't need to be part of a flow or automation chain. The agent can reach into any connected service at any point in a conversation, for any reason — to answer a question, check a fact, or pull context before making a decision. No predetermined sequence. No predefined trigger.

This is what makes it fundamentally different from automation platforms like Zapier, n8n, and Make, where you build flows — predefined sequences of steps. With the Foreman Protocol, the agent decides in real time what information it needs and what actions to take.

Examples

Reading (querying information):

"What are the open tickets assigned to me in Jira?"
→ Architect decides it needs Jira data → Foreman queries Jira via MCP → returns ticket list → Architect summarizes for user

Writing (taking action):

"Send 'hello' to #general on Slack"
→ Architect decides to post a message → Foreman calls Slack via MCP → message sent → Architect confirms

Both in one conversation:

"Summarize my open GitHub PRs and post the summary to #engineering on Slack"
→ Architect plans two steps → Foreman reads from GitHub, then writes to Slack → Architect presents the result

Ad-hoc access (no flow, no sequence):

"How many unread messages do I have in Slack?"
→ The agent just reaches into Slack, checks, and answers. No automation. No workflow. Just a question answered from a live data source.


Why self-describing MCP is the key

The Foreman Protocol would not work without self-describing tool schemas. Here's why:

Traditional tool execution: The LLM must have the tool's schema in its context to know how to call it. With thousands of potential integrations, you can't fit all their schemas into any context window.

With MCP: The Foreman connects to the MCP server and asks "What tools do you have?" The server responds with complete schemas — parameter names, types, descriptions, examples. The Foreman uses these to find and execute the right operation.

No pre-training. No static configuration. No context window overflow.

This means:

  • Any new integration is accessible immediately — install it, the Foreman can execute it
  • Zero configuration per service — no prompt engineering, no few-shot examples, no fine-tuning
  • Any model works — the Foreman just needs to follow JSON-RPC formatting, which any code-capable model can do
  • Reads are as natural as writes — querying a database and sending a Slack message go through the same execution path
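The discovery handshake itself is plain JSON-RPC 2.0: MCP defines a tools/list method whose response carries the full schemas. A minimal sketch of building that request by hand (the helper function is illustrative):

```rust
/// Build the MCP discovery request (JSON-RPC 2.0 `tools/list`).
/// The server replies with complete schemas: names, parameter types,
/// descriptions. The Foreman needs no pre-baked knowledge of the
/// integration to use them.
fn tools_list_request(id: u64) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{},"method":"tools/list"}}"#, id)
}
```

For example, `tools_list_request(1)` produces `{"jsonrpc":"2.0","id":1,"method":"tools/list"}`.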

The cost structure inverts

In a traditional agent architecture, the cloud model handles everything — intent, planning, tool formatting, execution, response. You pay cloud rates for all of it.

With the Foreman Protocol, the cloud model handles only intent and planning. All service interaction — every read and every write — is delegated to a worker model. The Architect pays premium rates only for the tokens that genuinely require frontier intelligence.

The savings scale with usage. The more your agents interact with connected services, the more you save — because every tool call that would have burned premium tokens is handled by the cheapest capable model in the stack.


vs. automation platforms

| Platform | AI-Driven? | Tool Execution Cost | Integrations | Bidirectional? |
| --- | --- | --- | --- | --- |
| OpenPawz (Foreman) | Yes — natural language | Free (local) or cheap (cloud) | Community services | Yes — read + write, any time |
| Zapier | Partial | Per-task pricing | 7,000 | No — predefined flows |
| Make | No | Per-operation pricing | 2,000 | No — predefined flows |
| n8n (standalone) | No — manual workflows | Free (self-hosted) | 400+ built-in | No — predefined flows |

OpenPawz is the only platform where you can say "Summarize my open PRs on GitHub and post the summary to #engineering on Slack" and have it work — with natural language, AI-driven execution, bidirectional service access, and free local tool execution across 25,000+ integrations.


Key design decisions

1. Interception, not routing

The Foreman is wired into the main agent loop's execute_tool() path. Any mcp_* tool call is automatically intercepted — the Architect doesn't need to know the Foreman exists. Zero changes to agent prompts or system instructions. Works with any cloud provider. Transparent fallback if no worker model is configured.

2. Mini agent loop (8 rounds max)

The Foreman runs a constrained agent loop — up to 8 rounds of tool calls. This handles multi-step tasks (query a database → format results → post to Slack) and multi-read scenarios (check Jira + check GitHub + check Slack) without risking infinite loops.
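A round-capped loop under these constraints might look like the following sketch. The Step type and the elided tool execution are assumptions, not the actual worker_delegate.rs code:

```rust
/// Hard cap on worker rounds, mirroring the 8-round limit described above.
const MAX_ROUNDS: u32 = 8;

/// One round of the mini loop: the worker model either requests another
/// tool call or produces a final result.
enum Step {
    ToolCall(String),
    Done(String),
}

/// Run the constrained loop: multi-step tasks get up to MAX_ROUNDS tool
/// calls; if the cap is hit without a final answer, the loop gives up
/// instead of spinning forever.
fn run_worker_loop(mut next_step: impl FnMut(u32) -> Step) -> Option<String> {
    for round in 0..MAX_ROUNDS {
        match next_step(round) {
            // In the real loop, this is where the MCP tool would execute
            // and its result would be fed back to the worker model.
            Step::ToolCall(_call) => continue,
            Step::Done(result) => return Some(result),
        }
    }
    None // round cap reached without a final answer
}
```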

3. No recursion

The Foreman cannot spawn sub-workers or delegate to other agents. It receives a task, executes MCP tools, and returns a result. This prevents runaway delegation chains.

4. Direct MCP execution

The Foreman calls MCP servers directly via JSON-RPC — it doesn't go back through the engine's execute_tool() path. This prevents the worker's MCP calls from being intercepted again (infinite loop) and keeps the execution path simple.

5. Graceful fallback

If no worker_model is configured, MCP tool calls execute directly via JSON-RPC as before. The Foreman Protocol is additive — it improves cost efficiency but is never required.


Implementation

The core flow in simplified Rust:

// In execute_tool() — MCP path
if tool_name.starts_with("mcp_") {
    // Try Foreman delegation first
    if let Some(result) = delegate_to_worker(
        tool_name, tool_args, engine_state
    ).await? {
        return Ok(result); // Foreman handled it
    }
    // Fallback: direct JSON-RPC execution
    registry.execute_tool(tool_name, tool_args).await
}
| File | Purpose |
| --- | --- |
| engine/tools/worker_delegate.rs | Core — delegate_to_worker(), run_worker_loop(), execute_worker_tool() |
| engine/tools/mod.rs | MCP interception point in execute_tool() |
| engine/mcp/registry.rs | MCP tool schema discovery |
| engine/mcp/client.rs | JSON-RPC tool execution |
| commands/ollama.rs | Worker model management |

Model requirements

The Foreman can run any model from any provider:

Local Example (Ollama — free): The default qwen2.5-coder:7b requires ~5 GB disk and runs on 8+ GB RAM (CPU) or 5+ GB VRAM (GPU). On Apple Silicon (M1+), inference is fast enough that tool execution feels instant. Zero API cost.

Cloud (any provider — cheap): Use a cheap model from your existing provider — gemini-2.0-flash, gpt-4o-mini, claude-haiku-4-5, deepseek-chat. No local hardware needed. The worker model can use a different provider than the Architect.

The worker Modelfile for Ollama:

FROM qwen2.5-coder:7b
SYSTEM You are a precise tool executor. Given a task and available MCP tools,
execute the correct tool call and return the result. Be concise.
PARAMETER temperature 0.1
PARAMETER num_ctx 8192

Low temperature ensures structured, deterministic tool calls. The 7B model is large enough for reliable JSON-RPC formatting but small enough to run on consumer hardware.


Part of a trinity

The Foreman Protocol works with two complementary OpenPawz innovations:

| Protocol | Problem | Solution |
| --- | --- | --- |
| The Librarian Method | Which tool to use among many? | Intent-driven discovery via semantic embeddings |
| The Foreman Protocol | How to execute tools cheaply? | Worker model delegation via self-describing MCP |
| The Conductor Protocol | What's the optimal execution plan? | AI-compiled flow strategies |

Together: the Librarian finds the right tool, the Foreman executes it for free, and the Conductor orchestrates everything into minimal LLM calls.

In practice, an agent can discover and execute any of 25,000+ integrations at near-zero cost — something no other AI agent platform achieves.


Try it

Option A: Local worker (Ollama — free)

ollama pull qwen2.5-coder:7b
  1. Go to Settings → Advanced → Ollama and click Setup Worker Agent
  2. In Settings → Models → Model Routing, set Worker Model to worker-qwen

Option B: Cloud worker (any provider — cheap)

  1. Go to Settings → Models → Model Routing
  2. Set your Boss Model (e.g. gemini-3.1-pro-preview, gpt-4o, claude-opus-4-6)
  3. Set your Worker Model to a cheaper model from the same or different provider (e.g. gemini-2.0-flash, gpt-4o-mini, claude-haiku-4-5)

Use it

Just chat normally. When your agent calls any MCP tool, the Foreman handles execution automatically:

"Generate a QR code for https://openpawz.ai"

Architect identifies the task → Librarian finds n8n QR code node → Foreman executes via MCP → QR code returned — tool execution handled by the worker model, not the expensive Architect.


Read the full spec

The complete technical reference — including architecture diagrams, cost analysis, and implementation details:

Star the repo if you want to track progress. 🙏

OpenPawz — Your AI, Your Rules

A native desktop AI platform that runs fully offline, connects to any provider, and puts you in control. Private by default. Powerful by design.

