The Problem Nobody Talks About
You can wire together a multi-agent system in an afternoon with LangChain, CrewAI, or AutoGen. You'll have agents calling tools, passing messages, and producing outputs. It works.
Then someone asks: "Which agent decided to access that patient record? Can you show us the full decision trail for the compliance audit?"
Or: "The orchestrator passed a context payload to the subagent — was PII scrubbed before that transfer?"
Or simply: "How do we know the right agent handled this request? Is there a capability check before invocation?"
Most frameworks don't answer these questions. They give you plumbing — not governance. For internal prototypes that's fine. For enterprise deployments in regulated industries (healthcare, education, financial services), it isn't.
This article introduces MACF — the Multi-Agent Collaborative Framework — a six-component reference architecture that adds the layer most frameworks skip: compliance-enforced, auditable, privacy-aware agent coordination.
Multi-Agent Systems in 90 Seconds
Before diving into MACF, a brief grounding for readers new to multi-agent architectures.
A multi-agent system is a collection of AI agents — each with a defined capability scope — that collaborate to complete tasks neither could do alone. A common pattern: an orchestrator agent receives a user request, decomposes it, delegates subtasks to specialist agents (a retrieval agent, a summarization agent, a compliance-check agent), and assembles the final response.
This is different from a single LLM with tool calls. Here, each agent may run a different model, hold its own context window, call its own tools, and run in parallel with other agents. Communication between agents is structured — each message has a sender, receiver, and payload.
The value is clear: parallelism, specialization, and modularity. The challenge is equally clear: when agents are coordinated across a pipeline, failures — and compliance violations — can cascade. A privacy leak in one agent propagates silently to every downstream agent unless something stops it.
MACF is that something.
What Current Frameworks Provide (and What They Skip)
LangChain, CrewAI, AutoGen, and Google ADK are all excellent at defining agent topology — which agents exist, how they connect, and what tools they can call.
What none of them include out of the box:
| Concern | LangChain | CrewAI | AutoGen | Google ADK |
|---|---|---|---|---|
| Capability-gated agent invocation | ❌ | ❌ | ❌ | Partial |
| Regulatory context propagation across agent hops | ❌ | ❌ | ❌ | ❌ |
| PII/PHI scrubbing before context transfer | ❌ | ❌ | ❌ | ❌ |
| Pre-response compliance enforcement (HIPAA, TCPA, GDPR) | ❌ | ❌ | ❌ | ❌ |
| Tamper-evident immutable audit trail | ❌ | ❌ | ❌ | ❌ |
MACF doesn't replace these frameworks. It runs as an infrastructure layer that wraps them — the same way an API gateway sits in front of microservices without replacing the services themselves.
The MACF Architecture
MACF defines six components. Each has a typed interface, a defined responsibility, and a compliance guarantee.
flowchart TB
User([User Request]) --> Orch
subgraph Agents ["Your Agent Framework (LangChain / CrewAI / AutoGen / ADK)"]
Orch[Orchestrator Agent]
SA1[Specialist Agent A]
SA2[Specialist Agent B]
end
subgraph MACF ["MACF Infrastructure Layer"]
    AR["AgentRegistry<br/>authorize() · capability check"]
    MB["MessageBus<br/>route() · regulatory_context propagation"]
    CS["ContextStore<br/>get() / set() · TTL · audit flag"]
    PF["PrivacyFilter<br/>scrub() · PHI/PII redaction"]
    CG["ComplianceGate<br/>enforce() · HIPAA · TCPA · GDPR · EU AI Act"]
    AT["AuditTrail<br/>append() · verify() · hash chain"]
end
Orch -->|1 authorize| AR
AR -->|✅ capability check| Orch
Orch -->|2 route + scrub| MB
MB --> PF
PF --> MB
MB -->|log| AT
MB --> SA1
MB --> SA2
SA1 -->|read context| CS
CS -->|log read| AT
SA1 -->|response| Orch
Orch -->|3 enforce| CG
CG -->|log gate result| AT
CG -->|✅ gated response| Response([Final Response])
Let's walk through each component.
1. AgentRegistry
The AgentRegistry is the single source of truth for what agents exist and what they're allowed to do. Before any agent is invoked, the registry performs a capability authorization check — validating that the requesting agent has permission to call the target agent in the current regulatory context.
from dataclasses import dataclass
from typing import Set, Optional, Callable
@dataclass
class AgentCapability:
agent_id: str
name: str
allowed_callers: Set[str] # which agents can invoke this one
required_context_keys: Set[str] # context fields that must be present
regulatory_scope: Set[str] # e.g. {"hipaa", "ferpa", "eu_ai_act"}
handler: Callable
class AgentRegistry:
def __init__(self):
self._agents: dict[str, AgentCapability] = {}
def register(self, capability: AgentCapability) -> None:
self._agents[capability.agent_id] = capability
def authorize(
self,
caller_id: str,
target_id: str,
context: dict,
) -> bool:
cap = self._agents.get(target_id)
if not cap:
return False
if caller_id not in cap.allowed_callers:
return False
missing = cap.required_context_keys - set(context.keys())
if missing:
raise ValueError(f"Missing required context keys: {missing}")
return True
def get(self, agent_id: str) -> Optional[AgentCapability]:
return self._agents.get(agent_id)
Why this matters: Without an authorization check, any agent can call any other agent. In a regulated deployment, an unauthenticated agent calling a records-access agent is an access-control violation — regardless of what the underlying model does.
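To make those failure modes concrete, here's a quick sketch using the registry above (the `billing_agent` caller and the lambda handler stub are hypothetical placeholders):

```python
registry = AgentRegistry()
registry.register(AgentCapability(
    agent_id="records_agent",
    name="Records Access Agent",
    allowed_callers={"orchestrator"},            # billing_agent is deliberately absent
    required_context_keys={"consent_verified"},
    regulatory_scope={"hipaa"},
    handler=lambda msg: msg,                     # hypothetical stub handler
))

# A caller outside allowed_callers is denied outright
assert registry.authorize("billing_agent", "records_agent",
                          {"consent_verified": True}) is False

# An authorized caller with incomplete context fails loudly, not silently
try:
    registry.authorize("orchestrator", "records_agent", context={})
except ValueError as err:
    print(err)  # Missing required context keys: {'consent_verified'}
```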
2. MessageBus
The MessageBus handles all inter-agent communication. Every message is typed and carries a regulatory_context field — a propagation envelope that tells downstream agents which regulatory constraints apply to this request.
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional
import datetime
@dataclass
class AgentMessage:
sender_id: str
receiver_id: str
payload: Any
message_id: str
timestamp: str = field(
default_factory=lambda: datetime.datetime.now(datetime.timezone.utc).isoformat()
)
regulatory_context: Dict[str, Any] = field(default_factory=dict)
# e.g. {"regulations": ["hipaa", "tcpa"], "consent_verified": True,
# "jurisdiction": "US", "session_id": "sess_abc123"}
class MessageBus:
def route(
self,
message: AgentMessage,
registry: AgentRegistry,
privacy_filter: "PrivacyFilter",
audit_trail: "AuditTrail",
) -> AgentMessage:
# 1. Authorization check
authorized = registry.authorize(
caller_id=message.sender_id,
target_id=message.receiver_id,
context=message.regulatory_context,
)
if not authorized:
raise PermissionError(
f"Agent {message.sender_id!r} is not authorized "
f"to invoke {message.receiver_id!r}"
)
        # 2. Scrub PII from payload before routing
        if isinstance(message.payload, str):
            scrub_result = privacy_filter.scrub(message.payload)
            message = replace(message, payload=scrub_result.clean_text)
# 3. Append routing decision to audit trail
audit_trail.append({
"event": "message_routed",
"from": message.sender_id,
"to": message.receiver_id,
"message_id": message.message_id,
"regulatory_context": message.regulatory_context,
})
return message
The key design decision: regulatory_context propagates with every message hop. An orchestrator that sets {"regulations": ["hipaa"], "consent_verified": True} will have that context available to every downstream subagent — without requiring each subagent to re-derive it.
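A minimal sketch of what that looks like in practice, assuming `registry`, `bus`, `pfilter`, and `audit` are wired as in the Getting Started section below, and that a hypothetical `summarization_agent` is registered with `"orchestrator"` in its `allowed_callers`:

```python
# The orchestrator sets the regulatory envelope once, at the entry point
outbound = AgentMessage(
    sender_id="orchestrator",
    receiver_id="summarization_agent",   # hypothetical registered agent
    payload="Summarize the visit notes",
    message_id="msg_042",
    regulatory_context={"regulations": ["hipaa"], "consent_verified": True},
)

routed = bus.route(outbound, registry, pfilter, audit)

# The delivered message carries the same envelope; the subagent
# reads it off the message instead of re-deriving it
assert routed.regulatory_context["regulations"] == ["hipaa"]
```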
3. ContextStore
The ContextStore holds cross-agent shared state with two MACF-specific additions: TTL enforcement (context expires, preventing stale data from leaking across sessions) and audit flagging (any read of a flagged key is automatically recorded in the AuditTrail).
import time
from typing import Any, Optional
class ContextStore:
def __init__(self, default_ttl_seconds: int = 3600):
self._store: dict = {}
self._ttl: dict = {}
self._audit_keys: set = set()
self.default_ttl = default_ttl_seconds
def set(
self,
key: str,
value: Any,
ttl_seconds: Optional[int] = None,
audit: bool = False,
) -> None:
self._store[key] = value
self._ttl[key] = time.time() + (ttl_seconds or self.default_ttl)
if audit:
self._audit_keys.add(key)
def get(self, key: str, audit_trail: Optional["AuditTrail"] = None) -> Optional[Any]:
if key not in self._store:
return None
        if time.time() > self._ttl.get(key, 0):
            # Expired: evict the value and its TTL entry together
            del self._store[key]
            self._ttl.pop(key, None)
            return None
if key in self._audit_keys and audit_trail:
audit_trail.append({"event": "context_read", "key": key})
return self._store[key]
Practical use: A student's enrollment record retrieved during session initialization can be stored with audit=True so every downstream read is traced — satisfying FERPA §99.32 (recordkeeping of education record disclosures).
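A sketch of that pattern (the key name and record payload are illustrative):

```python
store = ContextStore(default_ttl_seconds=1800)   # session-scoped: 30 minutes
audit = AuditTrail()

# Written once at session init; audit=True flags every subsequent read
store.set("enrollment_record:stu_001", {"status": "enrolled"}, audit=True)

# Any downstream agent read is recorded automatically
record = store.get("enrollment_record:stu_001", audit_trail=audit)
print(audit.export()[-1]["event"])  # context_read
```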
4. PrivacyFilter
The PrivacyFilter scrubs PHI and PII from any text payload before it crosses an agent boundary. It runs on every outbound message from the orchestrator to subagents, and on every context value written to the ContextStore.
import re
from dataclasses import dataclass
from typing import List, Tuple
@dataclass
class ScrubResult:
clean_text: str
replacements: List[Tuple[str, str]] # (pattern_name, matched_value)
@property
def was_modified(self) -> bool:
return bool(self.replacements)
class PrivacyFilter:
# Extend this list for your regulatory scope
_PATTERNS = [
("SSN", r"\b\d{3}-\d{2}-\d{4}\b"),
("MRN_PARAM", r"(?:mrn|patient_id|dob|ssn)=[^&\s]+"),
("EMAIL", r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"),
("PHONE", r"\b(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
("STUDENT_ID", r"\b(?:student_id|stu_id)=\S+"),
("DOB", r"\b(?:dob|date_of_birth)[=:\s]+\d{1,2}[/\-]\d{1,2}[/\-]\d{2,4}"),
]
def scrub(self, text: str) -> ScrubResult:
replacements = []
result = text
for name, pattern in self._PATTERNS:
matches = re.findall(pattern, result, re.IGNORECASE)
if matches:
result = re.sub(pattern, "[REDACTED]", result, flags=re.IGNORECASE)
replacements.extend([(name, m) for m in matches])
return ScrubResult(clean_text=result, replacements=replacements)
In a healthcare context, this is the difference between sending "Patient mrn=9876543 needs billing review" to a downstream agent and sending "Patient [REDACTED] needs billing review". The subagent can still process the intent without receiving the PHI.
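Running the filter above on that exact payload:

```python
pfilter = PrivacyFilter()
result = pfilter.scrub("Patient mrn=9876543 needs billing review")

print(result.clean_text)    # Patient [REDACTED] needs billing review
print(result.was_modified)  # True
print(result.replacements)  # [('MRN_PARAM', 'mrn=9876543')]
```

Note that replacements retains the matched values so the redaction itself is auditable; in a production deployment you may prefer to store hashes of the values rather than the raw PHI.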
The regulated-ai-governance and voice-ai-governance OSS packages ship production-grade implementations of this scrubbing layer, with 14+ PHI patterns validated at 100% recall on a test corpus.
5. ComplianceGate
The ComplianceGate is evaluated before every agent response is returned to the caller. It enforces applicable regulations — HIPAA, TCPA, GDPR, EU AI Act Article 13/14 — based on the regulatory_context carried by the incoming message.
This is where the confidence-escalation package integrates directly into MACF. The ComplianceGate doesn't just check rules — it also evaluates the agent's confidence in its response and can trigger a human-in-the-loop handoff if confidence falls below threshold.
pip install confidence-escalation regulated-ai-governance
from confidence_escalation import (
MultiSignalConfidenceScorer,
ThresholdPolicy,
EscalationAction,
HumanInLoopHandler,
ComplianceLoggingHandler,
ConfidenceEscalationMiddleware,
ScoringMethod,
)
# Build a multi-signal scorer: logprob + verbalized confidence + tool-call risk
scorer = MultiSignalConfidenceScorer(
weights={
ScoringMethod.LOGPROB: 0.5,
ScoringMethod.VERBALIZED: 0.3,
ScoringMethod.TOOL_CALL_RISK: 0.2,
}
)
# Two-tier policy: warn below 0.65, abort below 0.30
policy = ThresholdPolicy(
threshold=0.65,
action=EscalationAction.HUMAN_IN_LOOP,
critical_threshold=0.30,
critical_action=EscalationAction.ABORT,
)
# Handlers: route to human queue OR write to compliance log
handlers = [
HumanInLoopHandler(queue_callback=your_human_queue.enqueue),
ComplianceLoggingHandler(logger=your_compliance_logger),
]
class ComplianceGate:
def __init__(self, scorer, policy, handlers, audit_trail):
self._middleware = ConfidenceEscalationMiddleware(
scorer=scorer,
policy=policy,
handlers=handlers,
)
self._audit = audit_trail
def enforce(self, response: str, context: dict) -> dict:
# 1. Confidence check
escalation_event = self._middleware.evaluate(response, context)
self._audit.append(escalation_event.to_dict())
if escalation_event.action == EscalationAction.ABORT:
raise RuntimeError("Response aborted: confidence below critical threshold")
# 2. Regulatory policy check (from regulated-ai-governance)
# Pass response through applicable compliance policies
# (HIPAA, TCPA quiet hours, EU AI Act disclosure, etc.)
return {
"response": response,
"confidence": escalation_event.confidence_score,
"escalated": escalation_event.triggered,
"regulatory_context": context,
}
The confidence-escalation package provides four framework adapters (LangChain, CrewAI, AutoGen, Google ADK) so this composes cleanly with whichever agent framework you're already using:
from confidence_escalation import LangChainEscalationAdapter
# Drop-in middleware for an existing LangChain agent
adapter = LangChainEscalationAdapter(
scorer=scorer,
policy=policy,
handlers=handlers,
)
agent_with_gate = adapter.wrap(your_langchain_agent)
6. AuditTrail
Every MACF operation — agent authorization, message routing, context reads, compliance gate evaluations — appends a record to the AuditTrail. The trail uses a hash chain: each entry includes the SHA-256 hash of the previous entry, so any modification to a past entry is detectable without an external ledger.
import hashlib
import json
import datetime
from typing import Any, Dict, List
class AuditTrail:
def __init__(self):
self._entries: List[Dict[str, Any]] = []
self._last_hash = "genesis"
def append(self, record: Dict[str, Any]) -> str:
entry = {
**record,
"timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
"prev_hash": self._last_hash,
}
entry_hash = hashlib.sha256(
json.dumps(entry, sort_keys=True).encode()
).hexdigest()
entry["hash"] = entry_hash
self._entries.append(entry)
self._last_hash = entry_hash
return entry_hash
def verify(self) -> bool:
"""Returns True if the chain is intact (no entries have been modified)."""
prev = "genesis"
for entry in self._entries:
claimed_hash = entry.get("hash")
reconstructed = {k: v for k, v in entry.items() if k != "hash"}
reconstructed["prev_hash"] = prev
expected = hashlib.sha256(
json.dumps(reconstructed, sort_keys=True).encode()
).hexdigest()
if claimed_hash != expected:
return False
prev = claimed_hash
return True
def export(self) -> List[Dict[str, Any]]:
return list(self._entries)
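A quick tamper check against this implementation:

```python
audit = AuditTrail()
audit.append({"event": "message_routed", "from": "orchestrator", "to": "records_agent"})
audit.append({"event": "context_read", "key": "enrollment_record"})
assert audit.verify()   # chain intact

# Simulate post-hoc modification of the first entry
audit._entries[0]["event"] = "nothing_happened"
assert not audit.verify()   # hash mismatch detected
```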
A three-month deployment using this audit design produced zero unresolved TCPA compliance queries — every agent decision was traceable within seconds of a compliance request.
How a Request Flows Through MACF
Here's what a single user request looks like end-to-end:
sequenceDiagram
actor User
participant O as Orchestrator
participant AR as AgentRegistry
participant PF as PrivacyFilter
participant MB as MessageBus
participant SA as Specialist Agent
participant CS as ContextStore
participant CG as ComplianceGate
participant AT as AuditTrail
User->>O: request
O->>AR: authorize(caller="orchestrator", target="records_agent")
AR-->>O: ✅ authorized
O->>PF: scrub(payload)
PF-->>O: clean_payload [PHI removed]
O->>MB: route(message + regulatory_context)
MB->>AT: append(routing_event)
MB->>SA: deliver(message)
SA->>CS: get("session_context", audit_trail)
CS->>AT: append(context_read_event)
CS-->>SA: context_data
SA-->>O: raw_response
O->>CG: enforce(response, regulatory_context)
Note over CG: MultiSignalConfidenceScorer → ThresholdPolicy<br/>score ≥ 0.65 → pass / score < 0.65 → HumanInLoop
CG->>AT: append(gate_result)
CG-->>O: gated_response
O-->>User: final_response
Note over AT: AuditTrail.verify() → ✅ hash chain intact
The six components together add a median of under 10 ms of overhead per request, less than a single LLM token-generation step.
Getting Started
All compliance-enforcement components referenced in this article are available as open-source Python packages:
# Confidence scoring, threshold policies, framework adapters
pip install confidence-escalation
# Runtime compliance enforcement (HIPAA, FERPA, TCPA, EU AI Act)
pip install regulated-ai-governance
# Voice/SMS pipeline compliance (Pipecat, Twilio, A2P 10DLC)
pip install voice-ai-governance
A minimal MACF wiring for a two-agent system:
from confidence_escalation import (
MultiSignalConfidenceScorer,
ThresholdPolicy,
EscalationAction,
HumanInLoopHandler,
ConfidenceEscalationMiddleware,
ScoringMethod,
)
# Initialize the six components
registry = AgentRegistry()
bus = MessageBus()
store = ContextStore()
pfilter = PrivacyFilter()
audit = AuditTrail()
scorer = MultiSignalConfidenceScorer(
weights={ScoringMethod.LOGPROB: 0.5, ScoringMethod.VERBALIZED: 0.5}
)
policy = ThresholdPolicy(threshold=0.65, action=EscalationAction.HUMAN_IN_LOOP)
gate = ComplianceGate(scorer, policy, [HumanInLoopHandler(...)], audit)
# Register agents
registry.register(AgentCapability(
agent_id="records_agent",
name="Records Access Agent",
allowed_callers={"orchestrator"},
required_context_keys={"consent_verified", "regulations"},
regulatory_scope={"hipaa", "ferpa"},
handler=your_records_handler,
))
# Route a message
msg = AgentMessage(
sender_id="orchestrator",
receiver_id="records_agent",
payload="Retrieve enrollment history for student",
message_id="msg_001",
regulatory_context={"regulations": ["ferpa"], "consent_verified": True},
)
routed = bus.route(msg, registry, pfilter, audit)
# Execute and gate the response
raw_response = registry.get("records_agent").handler(routed)
result = gate.enforce(raw_response, msg.regulatory_context)
# Verify audit integrity
assert audit.verify(), "Audit trail tampered"
Why Six Components
Each component addresses a distinct failure mode:
| Component | Failure Mode Addressed |
|---|---|
| AgentRegistry | Unauthorized agent invocations in multi-hop chains |
| MessageBus | PII leaking across agent boundaries during routing |
| ContextStore | Stale session data persisting beyond TTL; untracked reads |
| PrivacyFilter | PHI reaching downstream agents without scrubbing |
| ComplianceGate | Low-confidence responses reaching users without human review |
| AuditTrail | Unverifiable decision history when compliance queries arrive |
You can adopt them incrementally. Start with the AuditTrail and ComplianceGate (the two with the highest compliance ROI), then add the others as your deployment matures.
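If you take that incremental route, a minimal sketch of the first step is a thin wrapper that adds only those two components around an existing agent callable (run_agent is a stand-in for whatever invocation your framework exposes; scorer, policy, and handlers are built as in the ComplianceGate section):

```python
audit = AuditTrail()
gate = ComplianceGate(scorer, policy, handlers, audit)

def governed_call(run_agent, prompt: str, regulatory_context: dict) -> dict:
    """Audit and gate an existing agent without adopting the other four components."""
    audit.append({"event": "agent_invoked", "prompt_chars": len(prompt)})
    raw_response = run_agent(prompt)   # your framework's agent invocation
    return gate.enforce(raw_response, regulatory_context)
```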
Further Reading
The complete framework specification — including formal component interface definitions, latency budget analysis, and a framework gap analysis across LangChain, CrewAI, AutoGen, and Google ADK — is documented in the accompanying research paper (preprint forthcoming on arXiv).
OSS packages:
- confidence-escalation — multi-signal confidence scoring, threshold policies, 4 framework adapters, 50 tests
- regulated-ai-governance — runtime compliance enforcement, HIPAA/FERPA/GDPR/EU AI Act, Google ADK adapter, 84 tests
If this was useful, a ⭐ on either repo helps others find it.
The AgentRegistry, MessageBus, ContextStore, PrivacyFilter, ComplianceGate, and AuditTrail patterns shown here are components of the Multi-Agent Collaborative Framework (MACF), a reference architecture for enterprise agentic AI deployments in regulated sectors.