The Problem Nobody Talks About
You can wire together a multi-agent system in an afternoon with LangChain, CrewAI, or AutoGen. You'll have agents calling tools, passing messages, and producing outputs. It works.
Then someone asks: "Which agent decided to access that patient record? Can you show us the full decision trail for the compliance audit?"
Or: "The orchestrator passed a context payload to the subagent — was PII scrubbed before that transfer?"
Or simply: "How do we know the right agent handled this request? Is there a capability check before invocation?"
Most frameworks don't answer these questions. They give you plumbing — not governance. For internal prototypes that's fine. For enterprise deployments in regulated industries (healthcare, education, financial services), it isn't.
This article introduces MACF — the Multi-Agent Collaborative Framework — a six-component reference architecture that adds the layer most frameworks skip: compliance-enforced, auditable, privacy-aware agent coordination.
Multi-Agent Systems in 90 Seconds
Before diving into MACF, a brief grounding for readers new to multi-agent architectures.
A multi-agent system is a collection of AI agents — each with a defined capability scope — that collaborate to complete tasks neither could do alone. A common pattern: an orchestrator agent receives a user request, decomposes it, delegates subtasks to specialist agents (a retrieval agent, a summarization agent, a compliance-check agent), and assembles the final response.
This is different from a single LLM with tool calls. Here, each agent may run a different model, hold its own context window, call its own tools, and run in parallel with other agents. Communication between agents is structured — each message has a sender, receiver, and payload.
The value is clear: parallelism, specialization, and modularity. The challenge is equally clear: when agents are coordinated across a pipeline, failures — and compliance violations — can cascade. A privacy leak in one agent propagates silently to every downstream agent unless something stops it.
MACF is that something.
What Current Frameworks Provide (and What They Skip)
LangChain, CrewAI, AutoGen, and Google ADK are all excellent at defining agent topology — which agents exist, how they connect, and what tools they can call.
What none of them include out of the box:
| Concern | LangChain | CrewAI | AutoGen | Google ADK |
|---|---|---|---|---|
| Capability-gated agent invocation | ❌ | ❌ | ❌ | Partial |
| Regulatory context propagation across agent hops | ❌ | ❌ | ❌ | ❌ |
| PII/PHI scrubbing before context transfer | ❌ | ❌ | ❌ | ❌ |
| Pre-response compliance enforcement (HIPAA, TCPA, GDPR) | ❌ | ❌ | ❌ | ❌ |
| Tamper-evident immutable audit trail | ❌ | ❌ | ❌ | ❌ |
MACF doesn't replace these frameworks. It runs as an infrastructure layer that wraps them — the same way an API gateway sits in front of microservices without replacing the services themselves.
The MACF Architecture
MACF defines six components. Each has a typed interface, a defined responsibility, and a compliance guarantee.
flowchart TB
User([User Request]) --> Orch
subgraph Agents ["Your Agent Framework (LangChain / CrewAI / AutoGen / ADK)"]
Orch[Orchestrator Agent]
SA1[Specialist Agent A]
SA2[Specialist Agent B]
end
subgraph MACF ["MACF Infrastructure Layer"]
    AR["AgentRegistry<br/>authorize() · capability check"]
    MB["MessageBus<br/>route() · regulatory_context propagation"]
    CS["ContextStore<br/>get() / set() · TTL · audit flag"]
    PF["PrivacyFilter<br/>scrub() · PHI/PII redaction"]
    CG["ComplianceGate<br/>enforce() · HIPAA · TCPA · GDPR · EU AI Act"]
    AT["AuditTrail<br/>append() · verify() · hash chain"]
end
Orch -->|1 authorize| AR
AR -->|✅ capability check| Orch
Orch -->|2 route + scrub| MB
MB --> PF
PF --> MB
MB -->|log| AT
MB --> SA1
MB --> SA2
SA1 -->|read context| CS
CS -->|log read| AT
SA1 -->|response| Orch
Orch -->|3 enforce| CG
CG -->|log gate result| AT
CG -->|✅ gated response| Response([Final Response])
Let's walk through each component.
1. AgentRegistry
The AgentRegistry is the single source of truth for what agents exist and what they're allowed to do. Before any agent is invoked, the registry performs a capability authorization check — validating that the requesting agent has permission to call the target agent in the current regulatory context.
from dataclasses import dataclass
from typing import Set, Optional, Callable
@dataclass
class AgentCapability:
agent_id: str
name: str
allowed_callers: Set[str] # which agents can invoke this one
required_context_keys: Set[str] # context fields that must be present
regulatory_scope: Set[str] # e.g. {"hipaa", "ferpa", "eu_ai_act"}
handler: Callable
class AgentRegistry:
def __init__(self):
self._agents: dict[str, AgentCapability] = {}
def register(self, capability: AgentCapability) -> None:
self._agents[capability.agent_id] = capability
def authorize(
self,
caller_id: str,
target_id: str,
context: dict,
) -> bool:
cap = self._agents.get(target_id)
if not cap:
return False
if caller_id not in cap.allowed_callers:
return False
missing = cap.required_context_keys - set(context.keys())
if missing:
raise ValueError(f"Missing required context keys: {missing}")
return True
def get(self, agent_id: str) -> Optional[AgentCapability]:
return self._agents.get(agent_id)
Why this matters: Without an authorization check, any agent can call any other agent. In a regulated deployment, an unauthenticated agent calling a records-access agent is an access-control violation — regardless of what the underlying model does.
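To make those failure modes concrete, here's a quick sketch using the registry above (the `billing_agent` caller and the lambda handler stub are hypothetical placeholders):

```python
registry = AgentRegistry()
registry.register(AgentCapability(
    agent_id="records_agent",
    name="Records Access Agent",
    allowed_callers={"orchestrator"},            # billing_agent is deliberately absent
    required_context_keys={"consent_verified"},
    regulatory_scope={"hipaa"},
    handler=lambda msg: msg,                     # hypothetical stub handler
))

# A caller outside allowed_callers is denied outright
assert registry.authorize("billing_agent", "records_agent",
                          {"consent_verified": True}) is False

# An authorized caller with incomplete context fails loudly, not silently
try:
    registry.authorize("orchestrator", "records_agent", context={})
except ValueError as err:
    print(err)  # Missing required context keys: {'consent_verified'}
```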
2. MessageBus
The MessageBus handles all inter-agent communication. Every message is typed and carries a regulatory_context field — a propagation envelope that tells downstream agents which regulatory constraints apply to this request.
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional
import datetime
@dataclass
class AgentMessage:
sender_id: str
receiver_id: str
payload: Any
message_id: str
timestamp: str = field(
default_factory=lambda: datetime.datetime.now(datetime.timezone.utc).isoformat()
)
regulatory_context: Dict[str, Any] = field(default_factory=dict)
# e.g. {"regulations": ["hipaa", "tcpa"], "consent_verified": True,
# "jurisdiction": "US", "session_id": "sess_abc123"}
class MessageBus:
def route(
self,
message: AgentMessage,
registry: AgentRegistry,
privacy_filter: "PrivacyFilter",
audit_trail: "AuditTrail",
) -> AgentMessage:
# 1. Authorization check
authorized = registry.authorize(
caller_id=message.sender_id,
target_id=message.receiver_id,
context=message.regulatory_context,
)
if not authorized:
raise PermissionError(
f"Agent {message.sender_id!r} is not authorized "
f"to invoke {message.receiver_id!r}"
)
        # 2. Scrub PII from payload before routing
        if isinstance(message.payload, str):
            scrub_result = privacy_filter.scrub(message.payload)
            message = replace(message, payload=scrub_result.clean_text)
# 3. Append routing decision to audit trail
audit_trail.append({
"event": "message_routed",
"from": message.sender_id,
"to": message.receiver_id,
"message_id": message.message_id,
"regulatory_context": message.regulatory_context,
})
return message
The key design decision: regulatory_context propagates with every message hop. An orchestrator that sets {"regulations": ["hipaa"], "consent_verified": True} will have that context available to every downstream subagent — without requiring each subagent to re-derive it.
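A minimal sketch of what that looks like in practice, assuming `registry`, `bus`, `pfilter`, and `audit` are wired as in the Getting Started section below, and that a hypothetical `summarization_agent` is registered with `"orchestrator"` in its `allowed_callers`:

```python
# The orchestrator sets the regulatory envelope once, at the entry point
outbound = AgentMessage(
    sender_id="orchestrator",
    receiver_id="summarization_agent",   # hypothetical registered agent
    payload="Summarize the visit notes",
    message_id="msg_042",
    regulatory_context={"regulations": ["hipaa"], "consent_verified": True},
)

routed = bus.route(outbound, registry, pfilter, audit)

# The delivered message carries the same envelope; the subagent
# reads it off the message instead of re-deriving it
assert routed.regulatory_context["regulations"] == ["hipaa"]
```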
3. ContextStore
The ContextStore holds cross-agent shared state with two MACF-specific additions: TTL enforcement (context expires, preventing stale data from leaking across sessions) and audit flagging (any read of a flagged key is automatically recorded in the AuditTrail).
import time
from typing import Any, Optional
class ContextStore:
def __init__(self, default_ttl_seconds: int = 3600):
self._store: dict = {}
self._ttl: dict = {}
self._audit_keys: set = set()
self.default_ttl = default_ttl_seconds
def set(
self,
key: str,
value: Any,
ttl_seconds: Optional[int] = None,
audit: bool = False,
) -> None:
self._store[key] = value
self._ttl[key] = time.time() + (ttl_seconds or self.default_ttl)
if audit:
self._audit_keys.add(key)
def get(self, key: str, audit_trail: Optional["AuditTrail"] = None) -> Optional[Any]:
if key not in self._store:
return None
        if time.time() > self._ttl.get(key, 0):
            # Expired: evict the value and its TTL entry together
            del self._store[key]
            self._ttl.pop(key, None)
            return None
if key in self._audit_keys and audit_trail:
audit_trail.append({"event": "context_read", "key": key})
return self._store[key]
Practical use: A student's enrollment record retrieved during session initialization can be stored with audit=True so every downstream read is traced — satisfying FERPA §99.32 (recordkeeping of education record disclosures).
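A sketch of that pattern (the key name and record payload are illustrative):

```python
store = ContextStore(default_ttl_seconds=1800)   # session-scoped: 30 minutes
audit = AuditTrail()

# Written once at session init; audit=True flags every subsequent read
store.set("enrollment_record:stu_001", {"status": "enrolled"}, audit=True)

# Any downstream agent read is recorded automatically
record = store.get("enrollment_record:stu_001", audit_trail=audit)
print(audit.export()[-1]["event"])  # context_read
```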
4. PrivacyFilter
The PrivacyFilter scrubs PHI and PII from any text payload before it crosses an agent boundary. It runs on every outbound message from the orchestrator to subagents, and on every context value written to the ContextStore.
import re
from dataclasses import dataclass
from typing import List, Tuple
@dataclass
class ScrubResult:
clean_text: str
replacements: List[Tuple[str, str]] # (pattern_name, matched_value)
@property
def was_modified(self) -> bool:
return bool(self.replacements)
class PrivacyFilter:
# Extend this list for your regulatory scope
_PATTERNS = [
("SSN", r"\b\d{3}-\d{2}-\d{4}\b"),
("MRN_PARAM", r"(?:mrn|patient_id|dob|ssn)=[^&\s]+"),
("EMAIL", r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"),
("PHONE", r"\b(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
("STUDENT_ID", r"\b(?:student_id|stu_id)=\S+"),
("DOB", r"\b(?:dob|date_of_birth)[=:\s]+\d{1,2}[/\-]\d{1,2}[/\-]\d{2,4}"),
]
def scrub(self, text: str) -> ScrubResult:
replacements = []
result = text
for name, pattern in self._PATTERNS:
matches = re.findall(pattern, result, re.IGNORECASE)
if matches:
result = re.sub(pattern, "[REDACTED]", result, flags=re.IGNORECASE)
replacements.extend([(name, m) for m in matches])
return ScrubResult(clean_text=result, replacements=replacements)
In a healthcare context, this is the difference between sending "Patient mrn=9876543 needs billing review" to a downstream agent and sending "Patient [REDACTED] needs billing review". The subagent can still process the intent without receiving the PHI.
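Running the filter above on that exact payload:

```python
pfilter = PrivacyFilter()
result = pfilter.scrub("Patient mrn=9876543 needs billing review")

print(result.clean_text)    # Patient [REDACTED] needs billing review
print(result.was_modified)  # True
print(result.replacements)  # [('MRN_PARAM', 'mrn=9876543')]
```

Note that replacements retains the matched values so the redaction itself is auditable; in a production deployment you may prefer to store hashes of the values rather than the raw PHI.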
The regulated-ai-governance and voice-ai-governance OSS packages ship production-grade implementations of this scrubbing layer, with 14+ PHI patterns validated at 100% recall on a test corpus.
5. ComplianceGate
The ComplianceGate is evaluated before every agent response is returned to the caller. It enforces applicable regulations — HIPAA, TCPA, GDPR, EU AI Act Article 13/14 — based on the regulatory_context carried by the incoming message.
This is where the confidence-escalation package integrates directly into MACF. The ComplianceGate doesn't just check rules — it also evaluates the agent's confidence in its response and can trigger a human-in-the-loop handoff if confidence falls below threshold.
pip install confidence-escalation regulated-ai-governance
from confidence_escalation import (
MultiSignalConfidenceScorer,
ThresholdPolicy,
EscalationAction,
HumanInLoopHandler,
ComplianceLoggingHandler,
ConfidenceEscalationMiddleware,
ScoringMethod,
)
# Build a multi-signal scorer: logprob + verbalized confidence + tool-call risk
scorer = MultiSignalConfidenceScorer(
weights={
ScoringMethod.LOGPROB: 0.5,
ScoringMethod.VERBALIZED: 0.3,
ScoringMethod.TOOL_CALL_RISK: 0.2,
}
)
# Two-tier policy: warn below 0.65, abort below 0.30
policy = ThresholdPolicy(
threshold=0.65,
action=EscalationAction.HUMAN_IN_LOOP,
critical_threshold=0.30,
critical_action=EscalationAction.ABORT,
)
# Handlers: route to human queue OR write to compliance log
handlers = [
HumanInLoopHandler(queue_callback=your_human_queue.enqueue),
ComplianceLoggingHandler(logger=your_compliance_logger),
]
class ComplianceGate:
def __init__(self, scorer, policy, handlers, audit_trail):
self._middleware = ConfidenceEscalationMiddleware(
scorer=scorer,
policy=policy,
handlers=handlers,
)
self._audit = audit_trail
def enforce(self, response: str, context: dict) -> dict:
# 1. Confidence check
escalation_event = self._middleware.evaluate(response, context)
self._audit.append(escalation_event.to_dict())
if escalation_event.action == EscalationAction.ABORT:
raise RuntimeError("Response aborted: confidence below critical threshold")
# 2. Regulatory policy check (from regulated-ai-governance)
# Pass response through applicable compliance policies
# (HIPAA, TCPA quiet hours, EU AI Act disclosure, etc.)
return {
"response": response,
"confidence": escalation_event.confidence_score,
"escalated": escalation_event.triggered,
"regulatory_context": context,
}
The confidence-escalation package provides four framework adapters (LangChain, CrewAI, AutoGen, Google ADK) so this composes cleanly with whichever agent framework you're already using:
from confidence_escalation import LangChainEscalationAdapter
# Drop-in middleware for an existing LangChain agent
adapter = LangChainEscalationAdapter(
scorer=scorer,
policy=policy,
handlers=handlers,
)
agent_with_gate = adapter.wrap(your_langchain_agent)
6. AuditTrail
Every MACF operation — agent authorization, message routing, context reads, compliance gate evaluations — appends a record to the AuditTrail. The trail uses a hash chain: each entry includes the SHA-256 hash of the previous entry, so any modification to a past entry is detectable without an external ledger.
import hashlib
import json
import datetime
from typing import Any, Dict, List
class AuditTrail:
def __init__(self):
self._entries: List[Dict[str, Any]] = []
self._last_hash = "genesis"
def append(self, record: Dict[str, Any]) -> str:
entry = {
**record,
"timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
"prev_hash": self._last_hash,
}
entry_hash = hashlib.sha256(
json.dumps(entry, sort_keys=True).encode()
).hexdigest()
entry["hash"] = entry_hash
self._entries.append(entry)
self._last_hash = entry_hash
return entry_hash
def verify(self) -> bool:
"""Returns True if the chain is intact (no entries have been modified)."""
prev = "genesis"
for entry in self._entries:
claimed_hash = entry.get("hash")
reconstructed = {k: v for k, v in entry.items() if k != "hash"}
reconstructed["prev_hash"] = prev
expected = hashlib.sha256(
json.dumps(reconstructed, sort_keys=True).encode()
).hexdigest()
if claimed_hash != expected:
return False
prev = claimed_hash
return True
def export(self) -> List[Dict[str, Any]]:
return list(self._entries)
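A quick tamper check against this implementation:

```python
audit = AuditTrail()
audit.append({"event": "message_routed", "from": "orchestrator", "to": "records_agent"})
audit.append({"event": "context_read", "key": "enrollment_record"})
assert audit.verify()   # chain intact

# Simulate post-hoc modification of the first entry
audit._entries[0]["event"] = "nothing_happened"
assert not audit.verify()   # hash mismatch detected
```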
A three-month deployment using this audit design produced zero unresolved TCPA compliance queries — every agent decision was traceable within seconds of a compliance request.
How a Request Flows Through MACF
Here's what a single user request looks like end-to-end:
sequenceDiagram
actor User
participant O as Orchestrator
participant AR as AgentRegistry
participant PF as PrivacyFilter
participant MB as MessageBus
participant SA as Specialist Agent
participant CS as ContextStore
participant CG as ComplianceGate
participant AT as AuditTrail
User->>O: request
O->>AR: authorize(caller="orchestrator", target="records_agent")
AR-->>O: ✅ authorized
O->>PF: scrub(payload)
PF-->>O: clean_payload [PHI removed]
O->>MB: route(message + regulatory_context)
MB->>AT: append(routing_event)
MB->>SA: deliver(message)
SA->>CS: get("session_context", audit_trail)
CS->>AT: append(context_read_event)
CS-->>SA: context_data
SA-->>O: raw_response
O->>CG: enforce(response, regulatory_context)
Note over CG: MultiSignalConfidenceScorer → ThresholdPolicy<br/>score ≥ 0.65 → pass / score < 0.65 → HumanInLoop
CG->>AT: append(gate_result)
CG-->>O: gated_response
O-->>User: final_response
Note over AT: AuditTrail.verify() → ✅ hash chain intact
The six components together add a median of under 10 ms of overhead per request, less than a single LLM token-generation step.
Getting Started
All compliance-enforcement components referenced in this article are available as open-source Python packages:
# Confidence scoring, threshold policies, framework adapters
pip install confidence-escalation
# Runtime compliance enforcement (HIPAA, FERPA, TCPA, EU AI Act)
pip install regulated-ai-governance
# Voice/SMS pipeline compliance (Pipecat, Twilio, A2P 10DLC)
pip install voice-ai-governance
A minimal MACF wiring for a two-agent system:
from confidence_escalation import (
MultiSignalConfidenceScorer,
ThresholdPolicy,
EscalationAction,
HumanInLoopHandler,
ConfidenceEscalationMiddleware,
ScoringMethod,
)
# Initialize the six components
registry = AgentRegistry()
bus = MessageBus()
store = ContextStore()
pfilter = PrivacyFilter()
audit = AuditTrail()
scorer = MultiSignalConfidenceScorer(
weights={ScoringMethod.LOGPROB: 0.5, ScoringMethod.VERBALIZED: 0.5}
)
policy = ThresholdPolicy(threshold=0.65, action=EscalationAction.HUMAN_IN_LOOP)
gate = ComplianceGate(scorer, policy, [HumanInLoopHandler(...)], audit)
# Register agents
registry.register(AgentCapability(
agent_id="records_agent",
name="Records Access Agent",
allowed_callers={"orchestrator"},
required_context_keys={"consent_verified", "regulations"},
regulatory_scope={"hipaa", "ferpa"},
handler=your_records_handler,
))
# Route a message
msg = AgentMessage(
sender_id="orchestrator",
receiver_id="records_agent",
payload="Retrieve enrollment history for student",
message_id="msg_001",
regulatory_context={"regulations": ["ferpa"], "consent_verified": True},
)
routed = bus.route(msg, registry, pfilter, audit)
# Execute and gate the response
raw_response = registry.get("records_agent").handler(routed)
result = gate.enforce(raw_response, msg.regulatory_context)
# Verify audit integrity
assert audit.verify(), "Audit trail tampered"
Why Six Components
Each component addresses a distinct failure mode:
| Component | Failure Mode Addressed |
|---|---|
| AgentRegistry | Unauthorized agent invocations in multi-hop chains |
| MessageBus | PII leaking across agent boundaries during routing |
| ContextStore | Stale session data persisting beyond TTL; untracked reads |
| PrivacyFilter | PHI reaching downstream agents without scrubbing |
| ComplianceGate | Low-confidence responses reaching users without human review |
| AuditTrail | Unverifiable decision history when compliance queries arrive |
You can adopt them incrementally. Start with the AuditTrail and ComplianceGate (the two with the highest compliance ROI), then add the others as your deployment matures.
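If you take that incremental route, a minimal sketch of the first step is a thin wrapper that adds only those two components around an existing agent callable (run_agent is a stand-in for whatever invocation your framework exposes; scorer, policy, and handlers are built as in the ComplianceGate section):

```python
audit = AuditTrail()
gate = ComplianceGate(scorer, policy, handlers, audit)

def governed_call(run_agent, prompt: str, regulatory_context: dict) -> dict:
    """Audit and gate an existing agent without adopting the other four components."""
    audit.append({"event": "agent_invoked", "prompt_chars": len(prompt)})
    raw_response = run_agent(prompt)   # your framework's agent invocation
    return gate.enforce(raw_response, regulatory_context)
```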
Further Reading
The complete framework specification — including formal component interface definitions, latency budget analysis, and a framework gap analysis across LangChain, CrewAI, AutoGen, and Google ADK — is documented in the accompanying research paper (preprint forthcoming on arXiv).
OSS packages:
- confidence-escalation — multi-signal confidence scoring, threshold policies, 4 framework adapters, 50 tests
- regulated-ai-governance — runtime compliance enforcement, HIPAA/FERPA/GDPR/EU AI Act, Google ADK adapter, 84 tests
If this was useful, a ⭐ on either repo helps others find it.
The AgentRegistry, MessageBus, ContextStore, PrivacyFilter, ComplianceGate, and AuditTrail patterns shown here are components of the Multi-Agent Collaborative Framework (MACF), a reference architecture for enterprise agentic AI deployments in regulated sectors.