# MCP vs CAP: Why Your AI Agents Need Both Protocols


The AI agent ecosystem is exploding with protocols. Anthropic released MCP (Model Context Protocol). Google announced A2A (Agent2Agent). Every week there's a new "standard" for agent communication.

But here's the thing most people miss: these protocols solve different problems at different layers. Using MCP for distributed agent orchestration is like using HTTP as your job scheduler: wrong tool, wrong layer.

Let me break down the actual difference and why you probably need both.

## What MCP Actually Does

MCP (Model Context Protocol) is a tool-calling protocol scoped to a single model. It standardizes how one LLM application discovers and invokes external tools such as databases, APIs, and file systems.

┌─────────────────────────────────────┐
│            Your LLM                 │
│                                     │
│  "I need to query the database"     │
│              │                      │
│              ▼                      │
│     ┌─────────────┐                 │
│     │  MCP Client │                 │
│     └──────┬──────┘                 │
└────────────┼────────────────────────┘
             │
             ▼
     ┌───────────────┐
     │  MCP Server   │
     │  (tool host)  │
     └───────────────┘
             │
             ▼
        [Database]

MCP is great at this. It handles tool discovery, tool input schemas, and invocation within a single model context.
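To make the invocation step concrete: MCP messages are framed as JSON-RPC 2.0, and a tool call is a `tools/call` request that names the tool and passes its arguments. Here is a minimal sketch of building that request in Go. The tool name `query_database` and its `sql` argument are made up for illustration, and a real client would also handle initialization, transport, and responses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope, the framing MCP uses.
type rpcRequest struct {
	JSONRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Method  string      `json:"method"`
	Params  interface{} `json:"params,omitempty"`
}

// toolCallParams mirrors the params of an MCP "tools/call" request.
type toolCallParams struct {
	Name      string                 `json:"name"`
	Arguments map[string]interface{} `json:"arguments,omitempty"`
}

// newToolCall builds a tools/call request for the named tool.
func newToolCall(id int, tool string, args map[string]interface{}) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
}

func main() {
	// "query_database" and "sql" are hypothetical; real names come from tools/list.
	msg, err := newToolCall(1, "query_database", map[string]interface{}{"sql": "SELECT 1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```

Nothing in this message says which machine runs the call, what happens if it is retried, or who is allowed to send it. That is the gap the rest of this post is about.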

What MCP doesn't cover:

- How do you schedule work across multiple agents?
- How do you track job state across a cluster?
- How do you enforce safety policies before execution?
- How do you handle agent liveness and capacity?
- How do you fan out workflows with parent/child relationships?

MCP was never designed for this. It's a tool protocol, not an orchestration protocol.

## Enter CAP: The Missing Layer

CAP (Cordum Agent Protocol) is a cluster-native job protocol for AI agents. It standardizes the control plane that MCP doesn't touch:

- Job lifecycle: submit → schedule → dispatch → run → complete
- Distributed routing: pool-based dispatch with competing consumers
- Safety hooks: allow/deny/throttle decisions before any job runs
- Heartbeats: worker liveness, capacity, and pool membership
- Workflows: parent/child jobs with aggregation
- Pointer architecture: keeps payloads off the bus for security and performance
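The job lifecycle in that list is a small state machine. A rough sketch in Go follows, using the five stages named above. The type name, state constants, and transition table are my own illustration, not CAP's actual enum, and a real implementation would also need failure and cancellation states.

```go
package main

import "fmt"

// State is a hypothetical job state; CAP's real enum may differ.
type State string

const (
	Submitted  State = "submitted"
	Scheduled  State = "scheduled"
	Dispatched State = "dispatched"
	Running    State = "running"
	Completed  State = "completed"
)

// next lists the single forward transition allowed from each state.
var next = map[State]State{
	Submitted:  Scheduled,
	Scheduled:  Dispatched,
	Dispatched: Running,
	Running:    Completed,
}

// Advance moves a job one step, rejecting skipped or backward transitions.
func Advance(from, to State) (State, error) {
	if want, ok := next[from]; !ok || want != to {
		return from, fmt.Errorf("illegal transition %s -> %s", from, to)
	}
	return to, nil
}

func main() {
	s := Submitted
	for _, to := range []State{Scheduled, Dispatched, Running, Completed} {
		var err error
		if s, err = Advance(s, to); err != nil {
			panic(err)
		}
		fmt.Println("job is now", s)
	}
	// Skipping straight from submitted to running is rejected.
	if _, err := Advance(Submitted, Running); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Keeping transitions in one table means every component that touches job state agrees on what "legal" means, which is the point of standardizing the lifecycle.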

┌─────────────────────────────────────────────────────────────┐
│                     CAP Control Plane                       │
│                                                             │
│  Client ──▶ Gateway ──▶ Scheduler ──▶ Safety ──▶ Workers   │
│                              │                      │       │
│                              ▼                      ▼       │
│                         [Job State]           [Results]     │
└─────────────────────────────────────────────────────────────┘
                                                      │
                                                      ▼
                                              ┌──────────────┐
                                              │ MCP (tools)  │
                                              └──────────────┘

CAP handles:

- `BusPacket` envelopes for all messages
- `JobRequest` / `JobResult` with full state machine
- `context_ptr` / `result_ptr` to keep blobs off the wire
- Heartbeats for worker pools
- Safety Kernel integration (policy checks before dispatch)
- Workflow orchestration with `workflow_id`, `parent_job_id`, `step_index`
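The pointer idea is easiest to see in code: the payload goes into a store, and only a reference travels in the job message. The sketch below is a simplified stand-in, not the real CAP schema. The struct only borrows the field names mentioned above, and `BlobStore` with its `mem://` pointer scheme is a hypothetical in-memory substitute for whatever external store you would actually use.

```go
package main

import (
	"errors"
	"fmt"
)

// JobRequest is a simplified stand-in for CAP's message, not the real schema.
// Field names follow the identifiers mentioned in the text.
type JobRequest struct {
	JobID       string
	WorkflowID  string
	ParentJobID string
	StepIndex   int
	ContextPtr  string // reference to the payload, never the payload itself
}

// BlobStore is a hypothetical in-memory stand-in for an external store.
type BlobStore map[string][]byte

// Put stores a payload and returns the pointer that travels on the bus.
func (b BlobStore) Put(key string, data []byte) string {
	ptr := "mem://" + key
	b[ptr] = data
	return ptr
}

// Get resolves a pointer back into the payload on the worker side.
func (b BlobStore) Get(ptr string) ([]byte, error) {
	data, ok := b[ptr]
	if !ok {
		return nil, errors.New("unknown pointer: " + ptr)
	}
	return data, nil
}

func main() {
	store := BlobStore{}
	ptr := store.Put("job-1/context", []byte("large prompt and documents"))

	// Only IDs and the pointer go on the bus.
	req := JobRequest{JobID: "job-1", WorkflowID: "wf-1", StepIndex: 0, ContextPtr: ptr}
	fmt.Printf("on the bus: %+v\n", req)

	payload, err := store.Get(req.ContextPtr)
	if err != nil {
		panic(err)
	}
	fmt.Printf("worker resolved %d bytes\n", len(payload))
}
```

The message stays small no matter how large the context is, and access to the payload can be controlled at the store rather than at every bus subscriber.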

## The Key Insight: Different Layers

Think of it like the network stack:

| Layer | Protocol | What It Does |
| --- | --- | --- |
| Tool execution | MCP | Model ↔ Tool communication |
| Agent orchestration | CAP | Job scheduling, routing, safety, state |
| Transport | NATS/Kafka | Message delivery |

Loosely mapped onto the OSI model, MCP sits at layer 7 and CAP at layers 5-6.

You wouldn't ask HTTP itself to schedule Kubernetes jobs. Similarly, you shouldn't ask MCP to orchestrate distributed agent workloads.

## How They Work Together

Here's the beautiful part: MCP and CAP complement each other perfectly.

A CAP worker receives a job, executes it (potentially using MCP to call tools), and returns a result. MCP handles the tool-calling inside the worker. CAP handles everything outside.

┌─────────────────────────────────────────────────────────────────┐
│                         CAP Cluster                             │
│                                                                 │
│   ┌──────────┐    ┌───────────┐    ┌─────────────────────────┐ │
│   │  Client  │───▶│ Scheduler │───▶│      Worker Pool        │ │
│   └──────────┘    └───────────┘    │  ┌───────────────────┐  │ │
│                         │          │  │   CAP Worker      │  │ │
│                         ▼          │  │        │          │  │ │
│                   [Safety Kernel]  │  │        ▼          │  │ │
│                                    │  │   ┌─────────┐     │  │ │
│                                    │  │   │   MCP   │     │  │ │
│                                    │  │   │ Client  │     │  │ │
│                                    │  │   └────┬────┘     │  │ │
│                                    │  └────────┼──────────┘  │ │
│                                    └───────────┼─────────────┘ │
└────────────────────────────────────────────────┼───────────────┘
                                                 ▼
                                          [MCP Servers]
                                          (tools, DBs, APIs)

Example flow:

1. Client submits job via CAP (`JobRequest` to `sys.job.submit`)
2. Scheduler checks Safety Kernel → approved
3. Job dispatched to worker pool via CAP
4. Worker uses MCP to call tools (query DB, fetch API, etc.)
5. Worker returns result via CAP (`JobResult` to `sys.job.result`)
6. Scheduler updates state, notifies client

MCP never touches the bus. CAP never touches the tools. Clean separation.
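Here is a toy version of that separation in Go. Everything in it is a stand-in: `checkPolicy` plays the Safety Kernel with a made-up rule, `ToolCaller` is a hypothetical interface hiding the MCP client, and `fakeTools` replaces a real MCP server. The point is only the shape: the policy decision happens before the worker runs, and the worker reaches tools through one narrow interface.

```go
package main

import (
	"errors"
	"fmt"
)

// Decision is a stand-in for the allow/deny/throttle result of a safety check.
type Decision int

const (
	Allow Decision = iota
	Deny
	Throttle
)

// ToolCaller hides MCP behind an interface; nothing here knows about the bus.
type ToolCaller interface {
	Call(tool string, args map[string]string) (string, error)
}

// fakeTools is a hypothetical in-process tool host used only for this sketch.
type fakeTools struct{}

func (fakeTools) Call(tool string, args map[string]string) (string, error) {
	return tool + " ok: " + args["q"], nil
}

// checkPolicy is a toy policy: deny one topic, allow the rest.
func checkPolicy(topic string) Decision {
	if topic == "job.forbidden" {
		return Deny
	}
	return Allow
}

// RunJob mirrors the flow: safety check first, then the worker calls tools.
// Anything other than Allow (Deny or Throttle) stops the job in this sketch.
func RunJob(topic, query string, tools ToolCaller) (string, error) {
	if checkPolicy(topic) != Allow {
		return "", errors.New("job rejected by safety policy: " + topic)
	}
	return tools.Call("query_db", map[string]string{"q": query})
}

func main() {
	out, err := RunJob("job.echo", "SELECT 1", fakeTools{})
	fmt.Println(out, err)

	_, err = RunJob("job.forbidden", "DROP TABLE users", fakeTools{})
	fmt.Println(err)
}
```

Swapping `fakeTools` for a real MCP client would not change `RunJob`, and swapping the toy policy for a real one would not change the tool code.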

## Why This Matters for Production

If you're building a toy demo, you don't need CAP. With one model and a few tools, MCP is plenty.

But if you're building production multi-agent systems, you need:

| Requirement | MCP | CAP |
| --- | --- | --- |
| Tool discovery & invocation | ✅ | ❌ |
| Job scheduling | ❌ | ✅ |
| Distributed worker pools | ❌ | ✅ |
| Safety policies (allow/deny/throttle) | ❌ | ✅ |
| Job state machine | ❌ | ✅ |
| Worker heartbeats & capacity | ❌ | ✅ |
| Workflow orchestration | ❌ | ✅ |
| Payload security (pointer refs) | ❌ | ✅ |

CAP gives you the control plane. MCP gives you the tool plane.

## Getting Started with CAP

CAP is open source (Apache-2.0) with SDKs for Go, Python, Node/TS, and C++.

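Minimal Go worker. The `agentv1` import path is a placeholder, so substitute the generated package from the CAP Go SDK. The subject and queue group are both `job.echo`, so workers in the pool compete for each job:

```go
package main

import (
	"log"

	agentv1 "example.com/cap/agent/v1" // placeholder: use the CAP Go SDK's generated package path
	"github.com/nats-io/nats.go"
	"google.golang.org/protobuf/proto"
)

func main() {
	nc, err := nats.Connect("nats://127.0.0.1:4222")
	if err != nil {
		log.Fatal(err)
	}
	_, err = nc.QueueSubscribe("job.echo", "job.echo", func(msg *nats.Msg) {
		var pkt agentv1.BusPacket
		if err := proto.Unmarshal(msg.Data, &pkt); err != nil {
			return // skip malformed packets
		}
		res := &agentv1.JobResult{
			JobId:  pkt.GetJobRequest().GetJobId(),
			Status: agentv1.JobStatus_JOB_STATUS_SUCCEEDED,
		}
		out, _ := proto.Marshal(&agentv1.BusPacket{Payload: &agentv1.BusPacket_JobResult{JobResult: res}})
		if err := nc.Publish("sys.job.result", out); err != nil {
			log.Printf("publish result: %v", err)
		}
	})
	if err != nil {
		log.Fatal(err)
	}
	select {} // block so the subscription keeps receiving jobs
}
```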