Gotham64
The Foreman Protocol: How OpenPawz gives AI agents bidirectional access to community driven services

The hidden cost of AI tool execution

When an AI agent sends a Slack message, the Slack API itself is free. But the cloud model has to process the tool schema, reason about parameters, format structured JSON, wait for the result, and summarize it back to the user. Every one of those steps consumes tokens at your provider's rate.

Now multiply that across an automation that sends 50 messages, creates 10 tickets, and updates 5 spreadsheets. The cloud API costs dominate — not because the tools are expensive, but because the reasoning about how to call them is expensive.
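To make the multiplication concrete, here is a back-of-envelope sketch. The per-call token overhead and the $10-per-million-token rate are illustrative assumptions, not measured OpenPawz numbers:

```rust
/// Assumed average token overhead per cloud-routed tool call: schema in
/// context + parameter reasoning + JSON output + result summarization.
const TOKENS_PER_TOOL_CALL: u64 = 1_500;

/// Assumed blended frontier-model rate, in USD per million tokens.
const USD_PER_MILLION_TOKENS: f64 = 10.0;

/// Estimated cloud cost of routing `calls` tool calls through the
/// frontier model instead of delegating them to a cheap worker.
fn cloud_cost_usd(calls: u64) -> f64 {
    (calls * TOKENS_PER_TOOL_CALL) as f64 / 1_000_000.0 * USD_PER_MILLION_TOKENS
}
```

At these assumed rates, the 65-call automation above (50 messages + 10 tickets + 5 spreadsheet updates) burns roughly 97,500 tokens of reasoning overhead, about $0.98, before the underlying APIs, which are free, are even counted.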

And it gets worse. As integrations grow, the cloud LLM has to hold more tool schemas in its context window in order to call them:

  • OpenPawz tools — manageable context overhead
  • Built-in tools — significant context consumed by schemas alone
  • Community integrations — impossible to load into any context window

Formatting a JSON-RPC call is not a task that needs GPT-4 or Claude Opus. OpenPawz solves this with the Foreman Protocol — splitting the agent into two roles, each doing only what it's suited for.

Star the OpenPawz repo — it's open source.


The invention: Architect plans, Foreman executes

The Foreman Protocol splits the agent into two roles:

| Role | Model | Does | Costs |
| --- | --- | --- | --- |
| Architect | Cloud LLM (GPT, Claude, Gemini) | Plans, reasons, talks to the user — decides what needs to happen | Per-token (paid) |
| Foreman | Any cheap/free model | Interfaces with services — handles how it happens | Free (local models) or cheap (cloud models) |

The Architect never sees MCP schemas. The Foreman never reasons about user intent. Each model does only what it's suited for.
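The split can be sketched as a simple routing rule, using the mcp_ tool-name prefix mentioned later in this post. The Role enum and route helper are hypothetical, not OpenPawz's actual types:

```rust
/// The two roles of the Foreman Protocol (sketch only).
#[derive(Debug, PartialEq)]
enum Role {
    Architect, // cloud LLM: plans, reasons, talks to the user
    Foreman,   // cheap worker model: executes service calls
}

/// Route a step: anything touching an mcp_-prefixed tool goes to the
/// Foreman; planning and user-facing reasoning stay with the Architect.
fn route(tool_name: Option<&str>) -> Role {
    match tool_name {
        Some(name) if name.starts_with("mcp_") => Role::Foreman,
        _ => Role::Architect,
    }
}
```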


Bidirectional, not a pipeline

The Foreman is not a one-way executor in a predefined sequence. It is a bidirectional bridge between your agent and every connected service. It can:

  • Read — Query a database, list Slack channels, fetch open Jira tickets, check GitHub PR status
  • Write — Send a message, create a ticket, update a spreadsheet, post to a webhook
  • Both in one task — Read the open tickets, then post a summary to Slack

And it doesn't need to be part of a flow or automation chain. The agent can reach into any connected service at any point in a conversation, for any reason — to answer a question, check a fact, or pull context before making a decision. No predetermined sequence. No predefined trigger.

This is what makes it fundamentally different from automation platforms like Zapier, n8n, and Make, where you build flows — predefined sequences of steps. With the Foreman Protocol, the agent decides in real time what information it needs and what actions to take.

Examples

Reading (querying information):

"What are the open tickets assigned to me in Jira?"
→ Architect decides it needs Jira data → Foreman queries Jira via MCP → returns ticket list → Architect summarizes for user

Writing (taking action):

"Send 'hello' to #general on Slack"
→ Architect decides to post a message → Foreman calls Slack via MCP → message sent → Architect confirms

Both in one conversation:

"Summarize my open GitHub PRs and post the summary to #engineering on Slack"
→ Architect plans two steps → Foreman reads from GitHub, then writes to Slack → Architect presents the result

Ad-hoc access (no flow, no sequence):

"How many unread messages do I have in Slack?"
→ The agent just reaches into Slack, checks, and answers. No automation. No workflow. Just a question answered from a live data source.


Why self-describing MCP is the key

The Foreman Protocol would not work without self-describing tool schemas. Here's why:

Traditional tool execution: The LLM must have the tool's schema in its context to know how to call it. With thousands of potential integrations, you can't fit all their schemas into any context window.

With MCP: The Foreman connects to the MCP server and asks "What tools do you have?" The server responds with complete schemas — parameter names, types, descriptions, examples. The Foreman uses these to find and execute the right operation.

No pre-training. No static configuration. No context window overflow.

This means:

  • Any new integration is accessible immediately — install it, the Foreman can execute it
  • Zero configuration per service — no prompt engineering, no few-shot examples, no fine-tuning
  • Any model works — the Foreman just needs to follow JSON-RPC formatting, which any code-capable model can do
  • Reads are as natural as writes — querying a database and sending a Slack message go through the same execution path
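The discovery handshake itself is plain JSON-RPC 2.0: MCP defines a tools/list method whose response carries the full schemas. A minimal sketch of building that request by hand (the helper function is illustrative):

```rust
/// Build the MCP discovery request (JSON-RPC 2.0 `tools/list`).
/// The server replies with complete schemas: names, parameter types,
/// descriptions. The Foreman needs no pre-baked knowledge of the
/// integration to use them.
fn tools_list_request(id: u64) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{},"method":"tools/list"}}"#, id)
}
```

For example, `tools_list_request(1)` produces `{"jsonrpc":"2.0","id":1,"method":"tools/list"}`.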

The cost structure inverts

In a traditional agent architecture, the cloud model handles everything — intent, planning, tool formatting, execution, response. You pay cloud rates for all of it.

With the Foreman Protocol, the cloud model handles only intent and planning. All service interaction — every read and every write — is delegated to a worker model. The Architect pays premium rates only for the tokens that genuinely require frontier intelligence.

The savings scale with usage. The more your agents interact with connected services, the more you save — because every tool call that would have burned premium tokens is handled by the cheapest capable model in the stack.


vs. automation platforms

| Platform | AI-Driven? | Tool Execution Cost | Integrations | Bidirectional? |
| --- | --- | --- | --- | --- |
| OpenPawz (Foreman) | Yes — natural language | Free (local) or cheap (cloud) | Community services | Yes — read + write, any time |
| Zapier | Partial | Per-task pricing | 7,000 | No — predefined flows |
| Make | No | Per-operation pricing | 2,000 | No — predefined flows |
| n8n (standalone) | No — manual workflows | Free (self-hosted) | 400+ built-in | No — predefined flows |

OpenPawz is the only platform where you can say "Summarize my open PRs on GitHub and post the summary to #engineering on Slack" and have it work — with natural language, AI-driven execution, bidirectional service access, and free local tool execution across 25,000+ integrations.


Key design decisions

1. Interception, not routing

The Foreman is wired into the main agent loop's execute_tool() path. Any mcp_* tool call is automatically intercepted — the Architect doesn't need to know the Foreman exists. Zero changes to agent prompts or system instructions. Works with any cloud provider. Transparent fallback if no worker model is configured.

2. Mini agent loop (8 rounds max)

The Foreman runs a constrained agent loop — up to 8 rounds of tool calls. This handles multi-step tasks (query a database → format results → post to Slack) and multi-read scenarios (check Jira + check GitHub + check Slack) without risking infinite loops.
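A round-capped loop under these constraints might look like the following sketch. The Step type and the elided tool execution are assumptions, not the actual worker_delegate.rs code:

```rust
/// Hard cap on worker rounds, mirroring the 8-round limit described above.
const MAX_ROUNDS: u32 = 8;

/// One round of the mini loop: the worker model either requests another
/// tool call or produces a final result.
enum Step {
    ToolCall(String),
    Done(String),
}

/// Run the constrained loop: multi-step tasks get up to MAX_ROUNDS tool
/// calls; if the cap is hit without a final answer, the loop gives up
/// instead of spinning forever.
fn run_worker_loop(mut next_step: impl FnMut(u32) -> Step) -> Option<String> {
    for round in 0..MAX_ROUNDS {
        match next_step(round) {
            // In the real loop, this is where the MCP tool would execute
            // and its result would be fed back to the worker model.
            Step::ToolCall(_call) => continue,
            Step::Done(result) => return Some(result),
        }
    }
    None // round cap reached without a final answer
}
```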

3. No recursion

The Foreman cannot spawn sub-workers or delegate to other agents. It receives a task, executes MCP tools, and returns a result. This prevents runaway delegation chains.

4. Direct MCP execution

The Foreman calls MCP servers directly via JSON-RPC — it doesn't go back through the engine's execute_tool() path. This prevents the worker's MCP calls from being intercepted again (infinite loop) and keeps the execution path simple.

5. Graceful fallback

If no worker_model is configured, MCP tool calls execute directly via JSON-RPC as before. The Foreman Protocol is additive — it improves cost efficiency but is never required.


Implementation

The core flow in simplified Rust:

// In execute_tool() — MCP path
if tool_name.starts_with("mcp_") {
    // Try Foreman delegation first
    if let Some(result) = delegate_to_worker(
        tool_name, tool_args, engine_state
    ).await? {
        return Ok(result); // Foreman handled it
    }
    // Fallback: direct JSON-RPC execution
    registry.execute_tool(tool_name, tool_args).await
}
| File | Purpose |
| --- | --- |
| engine/tools/worker_delegate.rs | Core — delegate_to_worker(), run_worker_loop(), execute_worker_tool() |
| engine/tools/mod.rs | MCP interception point in execute_tool() |
| engine/mcp/registry.rs | MCP tool schema discovery |
| engine/mcp/client.rs | JSON-RPC tool execution |
| commands/ollama.rs | Worker model management |

Model requirements

The Foreman can run any model from any provider:

Local Example (Ollama — free): The default qwen2.5-coder:7b requires ~5 GB disk and runs on 8+ GB RAM (CPU) or 5+ GB VRAM (GPU). On Apple Silicon (M1+), inference is fast enough that tool execution feels instant. Zero API cost.

Cloud (any provider — cheap): Use a cheap model from your existing provider — gemini-2.0-flash, gpt-4o-mini, claude-haiku-4-5, deepseek-chat. No local hardware needed. The worker model can use a different provider than the Architect.

The worker Modelfile for Ollama:

FROM qwen2.5-coder:7b
SYSTEM You are a precise tool executor. Given a task and available MCP tools,
execute the correct tool call and return the result. Be concise.
PARAMETER temperature 0.1
PARAMETER num_ctx 8192

Low temperature ensures structured, deterministic tool calls. The 7B model is large enough for reliable JSON-RPC formatting but small enough to run on consumer hardware.


Part of a trinity

The Foreman Protocol works with two complementary OpenPawz innovations:

| Protocol | Problem | Solution |
| --- | --- | --- |
| The Librarian Method | Which tool to use among many? | Intent-driven discovery via semantic embeddings |
| The Foreman Protocol | How to execute tools cheaply? | Worker model delegation via self-describing MCP |
| The Conductor Protocol | What's the optimal execution plan? | AI-compiled flow strategies |

Together: the Librarian finds the right tool, the Foreman executes it for free, and the Conductor orchestrates everything into minimal LLM calls.

In practice, an agent can discover and execute any of 25,000+ integrations at near-zero cost — something no other AI agent platform achieves.


Try it

Option A: Local worker (Ollama — free)

ollama pull qwen2.5-coder:7b
  1. Go to Settings → Advanced → Ollama and click Setup Worker Agent
  2. In Settings → Models → Model Routing, set Worker Model to worker-qwen

Option B: Cloud worker (any provider — cheap)

  1. Go to Settings → Models → Model Routing
  2. Set your Boss Model (e.g. gemini-3.1-pro-preview, gpt-4o, claude-opus-4-6)
  3. Set your Worker Model to a cheaper model from the same or different provider (e.g. gemini-2.0-flash, gpt-4o-mini, claude-haiku-4-5)

Use it

Just chat normally. When your agent calls any MCP tool, the Foreman handles execution automatically:

"Generate a QR code for https://openpawz.ai"

Architect identifies the task → Librarian finds n8n QR code node → Foreman executes via MCP → QR code returned — tool execution handled by the worker model, not the expensive Architect.


Read the full spec

The complete technical reference — including architecture diagrams, cost analysis, and implementation details:

Star the repo if you want to track progress. 🙏

OpenPawz — Your AI, Your Rules

A native desktop AI platform that runs fully offline, connects to any provider, and puts you in control. Private by default. Powerful by design.

