<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prinston Palmer</title>
    <description>The latest articles on DEV Community by Prinston Palmer (@popvilla).</description>
    <link>https://dev.to/popvilla</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3577389%2F064e64b0-49b1-41e5-a79a-eb2d76f44411.jpeg</url>
      <title>DEV Community: Prinston Palmer</title>
      <link>https://dev.to/popvilla</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/popvilla"/>
    <language>en</language>
    <item>
      <title>Adaptive Agent Routing in Artemis City: An Exploratory Study of Hebbian Learning Architectures</title>
      <dc:creator>Prinston Palmer</dc:creator>
      <pubDate>Tue, 24 Mar 2026 03:57:53 +0000</pubDate>
      <link>https://dev.to/popvilla/adaptive-agent-routing-in-artemis-city-an-exploratory-study-of-hebbian-learning-architectures-68n</link>
      <guid>https://dev.to/popvilla/adaptive-agent-routing-in-artemis-city-an-exploratory-study-of-hebbian-learning-architectures-68n</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;This report documents an exploratory investigation into Hebbian learning as a mechanism for adaptive agent selection within the Artemis City multi-agent platform. Rather than having a fixed controller assign tasks to agents, the system learns from experience: strengthening connections between agents and task types that co-succeed, and weakening those that fail. Over a series of progressively sophisticated simulations, we examine how different variants of this approach (standard Hebbian, decay-based Hebbian, context-aware, and domain-locked architectures) compare against traditional inference methods (k-Nearest Neighbor lookup and monolithic neural networks) across static and dynamically shifting environments.&lt;/p&gt;

&lt;p&gt;The findings presented here are exploratory. No single architecture is declared the definitive winner. Instead, this report maps the trade-off space: where adaptive routing excels, where it struggles, and what architectural choices appear to matter most. The central tension running through all experiments is the &lt;strong&gt;plasticity-stability trade-off&lt;/strong&gt;: the tension between a system's ability to adapt to change and its ability to maintain reliable accuracy. Resolving this tension well turns out to be the core engineering challenge of the Hebbian marketplace.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Background and Motivation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhj04sett47qn7k8ubc8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhj04sett47qn7k8ubc8.png" alt="The Four Es of Cognition Form the Foundation of the Methodology"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 What Is Artemis City?
&lt;/h3&gt;

&lt;p&gt;Artemis City is a multi-agent orchestration platform in which a network of AI agents shares a persistent knowledge base, stored as an Obsidian vault, as collective memory. Agents read task specifications from structured notes, execute them, and write results back to the vault. This creates a human-readable, cumulative memory system that agents can draw on over time.&lt;/p&gt;

&lt;p&gt;The platform is governed by the &lt;strong&gt;Artemis Transmission Protocol (ATP)&lt;/strong&gt;, which structures all agent-to-agent and human-to-agent communication. ATP messages declare an &lt;code&gt;ActionType&lt;/code&gt; (one of Execute, Scaffold, Summarize, or Reflect), which categorizes the cognitive nature of each task. This classification turns out to be central to the routing architecture explored in this report.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 The Memory Bus
&lt;/h3&gt;

&lt;p&gt;All agent reads and writes flow through a &lt;strong&gt;Memory Bus&lt;/strong&gt;: a synchronization layer that mediates access to the Obsidian vault and the Supabase vector index. Every knowledge write is an atomic, write-through operation: the bus generates an embedding, upserts it into the vector index, writes the note to the vault, and only confirms the operation once both stores are updated. This guarantees that agents never see a partial write.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgkx3t38mmjgnps5oe3c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgkx3t38mmjgnps5oe3c.png" alt="Visual Graphical Cortex"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The bus implements a three-tiered read hierarchy that balances speed against comprehensiveness:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Exact Lookup (O(1)):&lt;/strong&gt; Hash-map lookup by unique ID or title; constant time, returns immediately if found.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Search (O(log n)):&lt;/strong&gt; Keyword search across sorted indices; fast for known topics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Similarity (O(n)):&lt;/strong&gt; Semantic similarity against the Supabase pgvector index; most expensive, used only as a fallback.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Performance targets: write latency under 200 ms at p95, read latency under 50 ms for cache hits, cross-system sync lag below 300 ms. These guarantees underpin the Hebbian learning engine's ability to propagate weight updates in near real-time across the full agent collective.&lt;/p&gt;
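&lt;p&gt;A minimal sketch of this read cascade (class and method names are illustrative, not the platform's actual API):&lt;/p&gt;

```python
import bisect
import math

class MemoryBus:
    """Toy model of the three-tiered read hierarchy (names assumed)."""

    def __init__(self):
        self.by_id = {}            # Tier 1: O(1) exact lookup
        self.sorted_titles = []    # Tier 2: O(log n) sorted index
        self.vectors = []          # Tier 3: O(n) similarity fallback

    def write(self, note_id, title, embedding, body):
        # Write-through: every store is updated before the write "confirms".
        note = {"id": note_id, "title": title, "body": body}
        self.by_id[note_id] = note
        bisect.insort(self.sorted_titles, (title, note_id))
        self.vectors.append((embedding, note))
        return note

    def read(self, key=None, keyword=None, embedding=None):
        # Tier 1: exact ID lookup, returns immediately if found.
        if key is not None and key in self.by_id:
            return self.by_id[key]
        # Tier 2: keyword prefix search over the sorted title index.
        if keyword is not None:
            i = bisect.bisect_left(self.sorted_titles, (keyword, ""))
            if i < len(self.sorted_titles) and self.sorted_titles[i][0].startswith(keyword):
                return self.by_id[self.sorted_titles[i][1]]
        # Tier 3: brute-force cosine similarity, most expensive.
        if embedding is not None and self.vectors:
            def cos(a, b):
                dot = sum(p * q for p, q in zip(a, b))
                return dot / ((math.hypot(*a) * math.hypot(*b)) or 1.0)
            return max(self.vectors, key=lambda v: cos(v[0], embedding))[1]
        return None
```

&lt;p&gt;The real bus adds embedding generation, Supabase upserts, and latency budgets; the sketch only captures the tier ordering.&lt;/p&gt;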

&lt;h3&gt;
  
  
  1.3 The Routing Problem
&lt;/h3&gt;

&lt;p&gt;As the agent population grows, so does the question: &lt;em&gt;which agent should handle a given task?&lt;/em&gt; Static assignment (always route task type X to agent Y) is brittle: it cannot adapt when an agent's performance changes or when the distribution of incoming tasks shifts over time. A naive random router wastes capability. The question this work explores is whether a system can &lt;strong&gt;learn&lt;/strong&gt; routing, building up routing intelligence from the accumulated history of which agents succeeded at which tasks.&lt;/p&gt;

&lt;p&gt;Hebbian learning offers a biologically inspired answer to this question.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.4 Hebbian Learning in Brief
&lt;/h3&gt;

&lt;p&gt;Hebbian learning is rooted in the neuroscience principle: &lt;em&gt;neurons that fire together, wire together&lt;/em&gt;. In the context of agent routing, the analogous principle is: agents that succeed together on task types should be more strongly associated with those task types. Formally, after each task completion, the connection weight between a task type and a handling agent is adjusted based on the outcome: stronger if successful, weaker if not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Whitebook v2&lt;/strong&gt; formalized this as a simple binary update rule:&lt;/p&gt;

&lt;p&gt;$$w_{t+1} = \max(0,\ w_t + \Delta w)$$&lt;/p&gt;

&lt;p&gt;where $\Delta w = +1$ on success and $\Delta w = -1$ on failure. Simple and interpretable, but potentially volatile: a single bad outcome has the same weight as a single good one, regardless of magnitude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Whitebook v3&lt;/strong&gt; replaces this with a bounded morphological formula:&lt;/p&gt;

&lt;p&gt;$$\Delta W = \tanh(a \cdot x \cdot y)$$&lt;/p&gt;

&lt;p&gt;where $a$ is a learning rate (default 0.1), $x$ is the task's input signal magnitude (capturing complexity or confidence), and $y$ is the outcome signal (+1 for success, −1 for failure). The hyperbolic tangent bounds all updates to the range [−1, +1], preventing runaway weight accumulation. A single failure cannot destroy accumulated trust; a single success cannot grant permanent dominance.&lt;/p&gt;

&lt;p&gt;On failure, an explicit anti-Hebbian update applies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$$\Delta W = -\eta \quad (\eta = 0.1)$$

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Weight Intelligence Signal.&lt;/strong&gt; The routing intelligence accumulated by an agent is measured as its deviation from the cold-start baseline:&lt;/p&gt;

&lt;p&gt;$$\text{Intelligence} = |W - 1.0|$$&lt;/p&gt;

&lt;p&gt;At cold start, all agents begin at $W = 1.0$, giving equiprobable selection. As the system learns, weights diverge. High-performing agents accumulate $W \gg 1.0$; poor performers decay toward 0. The magnitude of this deviation &lt;em&gt;is&lt;/em&gt; the learned routing signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weight Decay.&lt;/strong&gt; To prevent cementing outdated associations, weights are continuously pulled back toward the cold-start baseline:&lt;/p&gt;

&lt;p&gt;$$W \leftarrow 1.0 + (W - 1.0) \times \alpha \quad (\alpha = 0.995)$$&lt;/p&gt;

&lt;p&gt;This ensures agents must continuously prove their value: a connection unused for 30 days loses approximately 5% of its accumulated signal.&lt;/p&gt;
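&lt;p&gt;Both the decay pull and the intelligence signal are one-liners. A sketch (the cadence at which decay ticks are applied is an assumption, not specified by the formula):&lt;/p&gt;

```python
def decay_toward_baseline(w, alpha=0.995, baseline=1.0):
    """One decay tick: pull W back toward the cold-start baseline."""
    return baseline + (w - baseline) * alpha

def intelligence(w, baseline=1.0):
    """Weight Intelligence Signal: deviation from the cold-start baseline."""
    return abs(w - baseline)

# After n ticks, an unused connection retains alpha**n of its signal:
# 0.995**10 is roughly 0.95, i.e. about 5% of the signal is gone.
w = 3.0
for _ in range(10):
    w = decay_toward_baseline(w)
```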


&lt;h2&gt;
  
  
  2. Experimental Setup
&lt;/h2&gt;
&lt;h3&gt;
  
  
  2.1 Datasets
&lt;/h3&gt;

&lt;p&gt;All simulations use synthetic datasets designed to test specific aspects of learning and adaptation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static datasets&lt;/strong&gt; consist of fixed relationships between inputs and outputs, used to establish baseline performance comparisons without environmental change. For example, one experiment uses a non-linear function across 1,000 samples with three input features:&lt;/p&gt;

&lt;p&gt;$$y = 2x_0^2 - 3x_1 + \sin(x_2) + \varepsilon$$&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic datasets&lt;/strong&gt; simulate &lt;em&gt;concept drift&lt;/em&gt; environments where the underlying relationship between inputs and outputs changes over time. These are structured in three phases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1 (steps 0–333):&lt;/strong&gt; Linear relationship $f(x) = 2x_0 + 3x_1$&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 2 (steps 334–666):&lt;/strong&gt; Quadratic relationship $f(x) = -2x_0^2 + x_1$&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 3 (steps 667–1000):&lt;/strong&gt; Sinusoidal relationship $f(x) = 5\sin(x_2) + x_0$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mirrors the real-world dynamics of Artemis City, where the distribution of task types changes across project phases: more Execute tasks during deployment, more Scaffold tasks during planning, more Summarize tasks during review.&lt;/p&gt;
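&lt;p&gt;A minimal generator for the three-phase stream described above (the feature ranges and the seed are assumptions):&lt;/p&gt;

```python
import math
import random

def drift_target(x, step):
    """Piecewise ground truth for the three-phase concept-drift dataset."""
    if step <= 333:                       # Phase 1: linear
        return 2 * x[0] + 3 * x[1]
    if step <= 666:                       # Phase 2: quadratic
        return -2 * x[0] ** 2 + x[1]
    return 5 * math.sin(x[2]) + x[0]      # Phase 3: sinusoidal

def make_stream(n=1000, seed=42):
    """Yield (step, features, target) triples with drift at the boundaries."""
    rng = random.Random(seed)
    for step in range(n):
        x = [rng.uniform(-3, 3) for _ in range(3)]
        yield step, x, drift_target(x, step)
```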
&lt;h3&gt;
  
  
  2.2 Models Compared
&lt;/h3&gt;

&lt;p&gt;Experiments compare the following systems:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traditional Inference (k-NN)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Online k-Nearest Neighbor (k=5) retrieves the most similar past examples and averages their outcomes. Expensive but accurate on stable data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monolithic Learner (MLP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A single neural network that learns all task types simultaneously. Stable and well-understood, but not specialized.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Random Router&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-agent system with random agent selection. Serves as a lower-bound baseline.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Standard Hebbian&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Binary ±1 weight update rule (v2 formula). Simple adaptive routing, no decay.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Decay Hebbian&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standard Hebbian with weight decay (α = 0.995) and pruning. Explores the plasticity-stability trade-off.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adaptive Hebbian&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standard Hebbian adapted for dynamic environments with concept drift.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Domain-Locked Hebbian (DL)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agents hard-constrained to ATP ActionType domains. The v3 architectural advance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dynamic Penalty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hebbian with escalating penalties for consecutive failures, accelerating domain switching.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context-Aware Hebbian&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hebbian with decay rate modulated by observed error trends: adaptive plasticity.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  2.3 Primary Metric
&lt;/h3&gt;

&lt;p&gt;The primary performance metric throughout is &lt;strong&gt;Mean Absolute Error (MAE)&lt;/strong&gt;: the absolute difference between predicted and actual output values, accumulated over all time steps. Lower is better. Moving average error (MAE computed over rolling windows) is used in time-series plots to reveal adaptation dynamics.&lt;/p&gt;
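&lt;p&gt;Concretely, the two metric variants can be sketched as follows (the window size is an assumption; the results tables report the accumulated total):&lt;/p&gt;

```python
from collections import deque

def cumulative_mae(errors):
    """Absolute error accumulated over all time steps -- the
    'Cumulative MAE' / 'Total MAE' figure in the results tables."""
    return sum(abs(e) for e in errors)

class MovingAverageError:
    """Rolling-window mean absolute error, as plotted in the
    time-series figures to reveal adaptation dynamics."""

    def __init__(self, window=50):
        self.buf = deque(maxlen=window)

    def update(self, predicted, actual):
        self.buf.append(abs(predicted - actual))
        return sum(self.buf) / len(self.buf)
```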


&lt;h2&gt;
  
  
  3. Experiment 1: Baseline Comparisons on Static Data
&lt;/h2&gt;
&lt;h3&gt;
  
  
  3.1 Setup
&lt;/h3&gt;

&lt;p&gt;The first set of experiments establishes a performance baseline by comparing Traditional Inference (k-NN), Standard Hebbian, and Decay Hebbian on static synthetic data. The goal is to understand the fundamental performance characteristics of each approach before introducing environmental change.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.2 Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cumulative MAE&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Traditional Inference (k-NN)&lt;/td&gt;
&lt;td&gt;~4,492&lt;/td&gt;
&lt;td&gt;Best accuracy on stable data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard Hebbian&lt;/td&gt;
&lt;td&gt;~9,317&lt;/td&gt;
&lt;td&gt;Adaptive, but weaker on static data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decay Hebbian&lt;/td&gt;
&lt;td&gt;~9,820&lt;/td&gt;
&lt;td&gt;Worst performance overall&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  3.3 Analysis: The Plasticity-Stability Trade-Off
&lt;/h3&gt;

&lt;p&gt;The most striking finding here is that &lt;strong&gt;Decay Hebbian performed worse than Standard Hebbian&lt;/strong&gt;, despite being designed as an improvement. Understanding why illuminates the central challenge of this research.&lt;/p&gt;

&lt;p&gt;The Decay Hebbian model applies a weight decay rate of α = 0.995 with a pruning threshold: connections with low weights are periodically removed. On this static dataset, where the underlying relationship does not change, this decay is counterproductive. The model "forgets" useful associations too quickly (&lt;strong&gt;excessive plasticity&lt;/strong&gt;), preventing it from retaining a stable long-term model of the non-linear function.&lt;/p&gt;

&lt;p&gt;This reveals the core tension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High plasticity&lt;/strong&gt; (aggressive decay, fast forgetting) → good at adapting to change, poor at remembering what works&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High stability&lt;/strong&gt; (slow decay, strong memory) → good at retaining learned patterns, poor at unlearning outdated ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Standard Hebbian, with no decay, achieves greater stability on this static task and thus outperforms Decay Hebbian. However, k-NN still outperforms both Hebbian variants significantly on static data. The Hebbian models' advantage, if any, must come from dynamic environments.&lt;/p&gt;

&lt;p&gt;The key question is: &lt;em&gt;can the right Hebbian configuration outperform k-NN where it matters most when the environment changes?&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  4. Experiment 2: Concept Drift (Adaptive Hebbian vs. Baselines)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  4.1 Setup
&lt;/h3&gt;

&lt;p&gt;This experiment introduces concept drift through the three-phase dynamic dataset described above. Three systems are compared: Adaptive Hebbian, Random Router, and Monolithic Learner (MLP). Performance is visualized as a Moving Average Error curve across all 1,000 steps, making adaptation dynamics visible at phase transitions.&lt;/p&gt;
&lt;h3&gt;
  
  
  4.2 Results
&lt;/h3&gt;

&lt;p&gt;The Adaptive Hebbian model &lt;strong&gt;drastically outperformed the Random Router&lt;/strong&gt;, confirming that structured learning and routing provides meaningful value over chance selection. However, it generally exhibited &lt;strong&gt;higher error rates and slower reaction times than the Monolithic Learner&lt;/strong&gt;, which maintained greater stability during phase transitions.&lt;/p&gt;

&lt;p&gt;Why would a specialized, adaptive system perform worse than a single monolithic model?&lt;/p&gt;

&lt;p&gt;The Monolithic Learner (MLP) updates its parameters continuously via stochastic gradient descent: when the environment shifts, it adjusts immediately across all parameters. It does not have the routing overhead of identifying &lt;em&gt;which agent&lt;/em&gt; to call; it simply adjusts &lt;em&gt;itself&lt;/em&gt;. This makes it fast to react to concept drift, especially early in each new phase.&lt;/p&gt;

&lt;p&gt;The Adaptive Hebbian system, by contrast, must first experience failure with its current routing (the wrong agent for the new phase), then reallocate weight away from the previously favored agent, and finally allow a better-suited agent to accumulate weight. This introduces a &lt;strong&gt;switching cost&lt;/strong&gt;: a lag between when the environment changes and when the routing adapts.&lt;/p&gt;

&lt;p&gt;This experiment suggests that naive Hebbian routing does not automatically beat monolithic approaches on concept drift. The architecture matters enormously.&lt;/p&gt;


&lt;h2&gt;
  
  
  5. Experiment 3: Advanced Architectures for Reducing Switching Cost
&lt;/h2&gt;
&lt;h3&gt;
  
  
  5.1 Setup
&lt;/h3&gt;

&lt;p&gt;Having identified switching cost as a key limitation, this experiment tests two advanced Hebbian variants designed to reduce it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dynamic Penalty model:&lt;/strong&gt; Rather than a fixed penalty for each failure, penalties ramp up with consecutive failures. If an agent fails repeatedly, the penalty grows non-linearly, forcing a routing change much earlier than the standard model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context-Aware model:&lt;/strong&gt; Monitors the trend in recent error rates and adjusts the global decay rate dynamically. When error spikes (signaling a phase transition), the decay rate increases, making the system more plastic precisely when it needs to adapt. When errors stabilize, decay slows, preserving learned routing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;Baseline&lt;/strong&gt; (fixed decay) model is included for comparison. The simulation tracks both Moving Average Error and, for the Context-Aware model, the active decay rate over time.&lt;/p&gt;
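&lt;p&gt;Minimal sketches of the two mechanisms (the growth factor, spike ratio, and decay bounds here are illustrative assumptions, not the simulation's actual parameters):&lt;/p&gt;

```python
def dynamic_penalty(consecutive_failures, base=0.1, growth=2.0):
    """Penalty ramps non-linearly with consecutive failures, so a
    repeatedly failing agent is 'fired' far sooner than under a
    fixed per-failure penalty."""
    return base * growth ** consecutive_failures

class ContextAwareDecay:
    """Modulate the global decay rate from the recent error trend:
    an error spike (likely concept drift) -> stronger decay (more
    plastic); stable errors -> weaker decay (preserve routing)."""

    def __init__(self, calm=0.999, plastic=0.95, spike_ratio=1.5, window=30):
        self.calm, self.plastic = calm, plastic
        self.spike_ratio, self.window = spike_ratio, window
        self.errors = []

    def update(self, error):
        self.errors.append(abs(error))
        recent = self.errors[-self.window:]
        older = self.errors[-2 * self.window:-self.window] or recent
        spiking = (sum(recent) / len(recent)
                   > self.spike_ratio * (sum(older) / len(older)))
        return self.plastic if spiking else self.calm
```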
&lt;h3&gt;
  
  
  5.2 Results
&lt;/h3&gt;

&lt;p&gt;Both advanced mechanisms significantly reduced switching cost compared to the Baseline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Penalty:&lt;/strong&gt; By ramping up penalties for consecutive failures, the system quickly "fired" the failing expert during a phase transition (e.g., Linear → Quadratic), forcing a routing change much earlier than the linear penalty baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context-Aware:&lt;/strong&gt; By detecting the error spike associated with concept drift and temporarily increasing the decay rate, this model achieved faster exploration of alternative agents. Once the new best agent established itself, the decay rate settled back down, concentrating plasticity where and when it was needed.&lt;/p&gt;

&lt;p&gt;A key observable was the &lt;strong&gt;Active Decay Rate&lt;/strong&gt; plot for the Context-Aware model, which showed sharp increases precisely at phase boundaries (steps ~334 and ~667). This is emergent behavior: the model was not told when phases changed; it discovered the transitions through error signals.&lt;/p&gt;
&lt;h3&gt;
  
  
  5.3 Remaining Questions
&lt;/h3&gt;

&lt;p&gt;While both advanced architectures reduced switching cost, the experiments are exploratory and several questions remain open:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How sensitive are the Dynamic Penalty and Context-Aware models to their hyperparameters (penalty growth rate, error window size)?&lt;/li&gt;
&lt;li&gt;Do these improvements hold across datasets with different drift rates or more gradual transitions?&lt;/li&gt;
&lt;li&gt;Is the Context-Aware model's decay modulation mechanism stable under adversarial inputs that produce spurious error spikes?&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  6. The Domain-Locked Architecture (Whitebook v3)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  6.1 The Core Insight
&lt;/h3&gt;

&lt;p&gt;The experiments above treat all tasks as belonging to a single pool. The central architectural innovation of Whitebook v3 is the recognition that &lt;strong&gt;ATP ActionType is a domain boundary, not just metadata&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each ActionType corresponds to a structurally distinct class of computation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ActionType&lt;/th&gt;
&lt;th&gt;Domain Function&lt;/th&gt;
&lt;th&gt;Character&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Execute&lt;/td&gt;
&lt;td&gt;$f(x) = 2x_0 + 3x_1$&lt;/td&gt;
&lt;td&gt;Linear; direct computation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaffold&lt;/td&gt;
&lt;td&gt;$f(x) = -2x_0^2 + x_1$&lt;/td&gt;
&lt;td&gt;Quadratic; structural planning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarize&lt;/td&gt;
&lt;td&gt;$f(x) = 5\sin(x_2) + x_0$&lt;/td&gt;
&lt;td&gt;Sinusoidal; pattern extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reflect&lt;/td&gt;
&lt;td&gt;$f(x) = x_0^2 + \sin(x_1) + x_2$&lt;/td&gt;
&lt;td&gt;Mixed nonlinear; meta-cognitive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A summarizer does not research. A planner does not execute. In an unconstrained Hebbian marketplace, agents from one domain can pollute routing in another: a strong Execute agent might "steal" Scaffold tasks it cannot handle well. Domain-locking eliminates this cross-domain interference entirely.&lt;/p&gt;

&lt;p&gt;The domain-locked selection rule is:&lt;/p&gt;

&lt;p&gt;$$P(\text{select}_i \mid \text{task_type}_t) = 1 \quad \text{if}\ W_{i,t} = \max(W_{\text{domain}_t})$$&lt;/p&gt;

&lt;p&gt;Only agents within the correct domain compete. Among those, the highest-weight agent wins. The ActionType is &lt;strong&gt;declared in the ATP payload&lt;/strong&gt;, not inferred: the parser reads the ActionType field from the structured message and routes directly to the appropriate domain pool. This is O(1) routing: a hash table lookup followed by a max-weight selection within a small pool of typically 3 agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Default architecture:&lt;/strong&gt; 4 domains × 3 agents per domain = 12 total agents.&lt;/p&gt;
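&lt;p&gt;The selection rule reduces to a few lines (the agent IDs and pool layout are illustrative):&lt;/p&gt;

```python
# Minimal domain-locked router over the default 4 x 3 layout.
DOMAINS = ("Execute", "Scaffold", "Summarize", "Reflect")

# Every agent starts at the cold-start baseline W = 1.0.
weights = {d: {f"{d.lower()}_{i:02d}": 1.0 for i in (1, 2, 3)}
           for d in DOMAINS}

def route(atp_message):
    """O(1) dispatch: read the declared ActionType from the ATP payload
    (no inference), then take the max-weight agent in that domain pool."""
    pool = weights[atp_message["ActionType"]]   # hash-table lookup
    return max(pool, key=pool.get)              # max over a ~3-agent pool
```

&lt;p&gt;Cross-domain interference is impossible by construction: an Execute agent's weight is simply never consulted for a Scaffold task.&lt;/p&gt;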
&lt;h3&gt;
  
  
  6.2 Agent Registry
&lt;/h3&gt;

&lt;p&gt;Every agent in Artemis City is registered with an explicit domain assignment at initialization. A representative agent profile looks like:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"executor_01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Execute"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"capabilities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"linear_computation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"data_processing"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sandbox_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"strict"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trust_threshold"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hebbian_weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The registry maintains not just static capability declarations but also dynamic state: whether an agent is idle or busy, resource quotas, and task history. This allows the router to match tasks to agents who can handle them &lt;em&gt;and&lt;/em&gt; are available to do so. The design is analogous to an operating system process scheduler layered on top of a service directory.&lt;/p&gt;
&lt;h3&gt;
  
  
  6.3 Simulation Results
&lt;/h3&gt;

&lt;p&gt;The v4 simulation (1,000 tasks, three concept drift phases, random seed 42, 600-sample domain-specific pre-training per agent) produced the following comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Total MAE&lt;/th&gt;
&lt;th&gt;vs DL Trained&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DL Trained (3 agents/domain)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,938&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DL Cold (untrained weights, domain-lock only)&lt;/td&gt;
&lt;td&gt;1,967&lt;/td&gt;
&lt;td&gt;+1.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unconstrained Marketplace&lt;/td&gt;
&lt;td&gt;10,289&lt;/td&gt;
&lt;td&gt;+431%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single MLP (monolithic)&lt;/td&gt;
&lt;td&gt;9,617&lt;/td&gt;
&lt;td&gt;+396%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;k-NN Optimized&lt;/td&gt;
&lt;td&gt;10,087&lt;/td&gt;
&lt;td&gt;+420%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Domain-locked routing achieves &lt;strong&gt;81.2% lower MAE&lt;/strong&gt; than the unconstrained marketplace, &lt;strong&gt;79.8% lower&lt;/strong&gt; than a single monolithic MLP, and &lt;strong&gt;80.8% lower&lt;/strong&gt; than optimized k-NN, while operating at &lt;strong&gt;180× lower computational cost&lt;/strong&gt; than k-NN.&lt;/p&gt;
&lt;h3&gt;
  
  
  6.4 Architecture Is the Primary Driver
&lt;/h3&gt;

&lt;p&gt;A particularly important finding is the decomposition of gains:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Domain-locking alone&lt;/strong&gt; (cold start, no pre-training) reduces MAE from 10,289 → 1,967. The structural constraint provides the majority of the improvement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adding domain-specific pre-training&lt;/strong&gt; (600 samples/agent) reduces MAE further, from 1,967 → 1,938: a meaningful but incremental gain.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This implies that the &lt;strong&gt;architecture is the primary driver&lt;/strong&gt;, not the volume of training data. The domain boundaries prevent cross-domain interference that degrades unconstrained systems.&lt;/p&gt;
&lt;h3&gt;
  
  
  6.5 Robustness to Mislabeling
&lt;/h3&gt;

&lt;p&gt;A practical concern for any domain-locked system is: what happens when tasks are assigned to the wrong domain? The simulations tested progressively higher mislabel rates:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;MAE&lt;/th&gt;
&lt;th&gt;vs Trained Baseline&lt;/th&gt;
&lt;th&gt;Still Beats MLP (9,617)?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DL Trained (base)&lt;/td&gt;
&lt;td&gt;1,938&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20% Mislabel&lt;/td&gt;
&lt;td&gt;~1,980&lt;/td&gt;
&lt;td&gt;+2.2%&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40% Mislabel&lt;/td&gt;
&lt;td&gt;~2,100&lt;/td&gt;
&lt;td&gt;+8.4%&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;80% Skewed Distribution&lt;/td&gt;
&lt;td&gt;~2,180&lt;/td&gt;
&lt;td&gt;+12.5%&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even with 40% of tasks routed to the wrong domain, the architecture still substantially outperforms a monolithic MLP. The practical implication: the system needs only &amp;gt;60% ActionType classification accuracy to retain its advantage, a threshold achievable by any competent ATP parser.&lt;/p&gt;
&lt;h3&gt;
  
  
  6.6 Within-Domain Competition
&lt;/h3&gt;

&lt;p&gt;Within each domain, the Hebbian weight mechanism produces a natural competitive dynamic. Simulations tested varying numbers of agents per domain:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agents/Domain&lt;/th&gt;
&lt;th&gt;MAE&lt;/th&gt;
&lt;th&gt;vs 3/Domain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1/domain (monopoly)&lt;/td&gt;
&lt;td&gt;1,967&lt;/td&gt;
&lt;td&gt;+1.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/domain (default)&lt;/td&gt;
&lt;td&gt;1,938&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5/domain (competitive)&lt;/td&gt;
&lt;td&gt;1,906&lt;/td&gt;
&lt;td&gt;−1.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An important emergent property: within each domain, a &lt;strong&gt;100% monopoly&lt;/strong&gt; eventually forms, as one agent captures all routing weight through consistent performance and wins every selection. The 1→3 comparison shows that competition provides selection pressure; the 3→5 comparison shows diminishing returns. Three agents per domain is the practical sweet spot: enough competition to surface the best performer, without unnecessary overhead.&lt;/p&gt;


&lt;h2&gt;
  
  
  7. Active Sentinel &amp;amp; Immune System
&lt;/h2&gt;

&lt;p&gt;Whitebook v3 elevates the sentinel from a passive monitoring layer to an &lt;strong&gt;active immune system&lt;/strong&gt;: one that not only detects routing pathologies but intervenes in real time to correct them.&lt;/p&gt;
&lt;h3&gt;
  
  
  7.1 Oscillation Detection
&lt;/h3&gt;

&lt;p&gt;The sentinel monitors a rolling window of prediction errors within each domain. The key metric is the &lt;strong&gt;sign-change rate&lt;/strong&gt;: how often consecutive errors alternate direction:&lt;/p&gt;

&lt;p&gt;$$\text{oscillation_rate} = \frac{\text{count}(\text{sign}(e_t) \neq \text{sign}(e_{t-1}))}{\text{window_size}}$$&lt;/p&gt;

&lt;p&gt;Parameters: window size of 30 tasks, oscillation threshold of 0.35 (a 35% sign-change rate triggers intervention). High oscillation indicates that the currently selected agent is producing inconsistent results (sometimes good, sometimes bad), suggesting it may be near the boundary of its competence or receiving adversarial inputs.&lt;/p&gt;
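&lt;p&gt;A direct transcription of the formula, with the parameters above. One detail the text does not specify is how zero error is signed; treating it as positive is an assumption in this sketch:&lt;/p&gt;

```python
def oscillation_rate(errors, window=30):
    """Sign-change rate over the most recent `window` prediction errors.
    Zero error counts as positive sign (assumption; the text does not specify)."""
    recent = errors[-window:]
    if len(recent) == 0:
        return 0.0
    flips = sum(1 for a, b in zip(recent, recent[1:]) if (a >= 0) != (b >= 0))
    return flips / len(recent)   # divided by window size, matching the formula

# A perfectly alternating error stream produces (window - 1) / window flips:
errs = [1.0, -1.0] * 15          # 30 errors, 29 sign changes
rate = oscillation_rate(errs)
assert rate > 0.35               # well above threshold: sentinel would intervene
```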
&lt;h3&gt;
  
  
  7.2 Active Rerouting
&lt;/h3&gt;

&lt;p&gt;When the oscillation threshold is exceeded, the sentinel executes a reroute:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;oscillation_rate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;dominant_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;W_domain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dominant_agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="n"&gt;reroute_penalty&lt;/span&gt;    &lt;span class="c1"&gt;# penalty = 0.5
&lt;/span&gt;    &lt;span class="n"&gt;reroutes&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This halves the dominant agent's weight, temporarily equalizing the competitive landscape and forcing the router to explore alternatives. The penalty is not permanent: if the dominant agent truly is the best performer, it will re-accumulate weight through subsequent successes.&lt;/p&gt;
&lt;h3&gt;
  
  
  7.3 Simulation Results (v5 Test 1)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Passive (v4)&lt;/th&gt;
&lt;th&gt;Active (v5)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total MAE&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;−17 improvement (+0.9%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total reroutes&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reroute concentration&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;100% in Scaffold domain&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All 16 reroutes occurred in the Scaffold domain, which uses the quadratic generating function, the most volatile domain. The sentinel correctly identified where intervention was needed and left stable domains untouched. This is emergent behavior: the sentinel discovers &lt;em&gt;which&lt;/em&gt; domains are pathological through the oscillation signal rather than through any programmed domain knowledge.&lt;/p&gt;
&lt;h3&gt;
  
  
  7.4 The Immune System Analogy
&lt;/h3&gt;

&lt;p&gt;The sentinel embodies a feedback loop that mirrors biological immune response:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent fails → error oscillation increases&lt;/li&gt;
&lt;li&gt;Sentinel detects oscillation → reroutes to alternative&lt;/li&gt;
&lt;li&gt;Alternative succeeds → accumulates weight via Hebbian update&lt;/li&gt;
&lt;li&gt;Original agent's weight decays → system learns to avoid it&lt;/li&gt;
&lt;li&gt;If original agent improves → it can earn weight back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every failure teaches. The rerouting mechanism is not a punishment; it is an invitation to prove capability in a more competitive landscape.&lt;/p&gt;


&lt;h2&gt;
  
  
  8. System Resilience Properties
&lt;/h2&gt;

&lt;p&gt;Beyond performance accuracy, Whitebook v3 documents several resilience properties of the Hebbian marketplace that have no clear equivalent in monolithic approaches.&lt;/p&gt;
&lt;h3&gt;
  
  
  8.1 Corpus Corruption Resistance
&lt;/h3&gt;

&lt;p&gt;When a single Scaffold-domain agent's corpus is corrupted with 100 garbage samples at task #300, the Hebbian marketplace absorbs only &lt;strong&gt;−1.0% damage&lt;/strong&gt; (MAE actually improves slightly as the corrupted agent is deselected). The monolithic MLP experiences &lt;strong&gt;+0.8% permanent degradation&lt;/strong&gt; with no recovery mechanism.&lt;/p&gt;

&lt;p&gt;The corrupted agent's predictions immediately worsen → Hebbian anti-update penalizes its weight → weight falls below competitors → agent receives 0 further assignments. Damage is contained to a single node in the routing graph rather than propagating through the entire system.&lt;/p&gt;
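&lt;p&gt;The containment chain fits in a few lines. This is a minimal sketch, not the production router: the error threshold, learning rate, greedy tie-breaking, and error values are all assumed for illustration:&lt;/p&gt;

```python
def select(w):
    """Greedy routing: the highest-weight agent in the domain wins the task."""
    return max(range(len(w)), key=lambda i: w[i])

def hebbian_update(w, agent, error, ok_error=2.0, lr=0.1):
    """Reinforce low-error predictions; the anti-update penalizes high-error ones."""
    if error > ok_error:
        w[agent] = max(0.0, w[agent] - lr)   # anti-update on a bad prediction
    else:
        w[agent] = w[agent] + lr             # reinforcement on a good one

w = [1.0, 1.0, 1.0]                   # three same-domain agents
for _ in range(5):                    # agent 0's corpus has just been corrupted
    a = select(w)
    error = 10.0 if a == 0 else 1.0   # corrupted agent predicts garbage (assumed values)
    hebbian_update(w, a, error)

assert select(w) != 0                 # damage contained: agent 0 is no longer selected
```

&lt;p&gt;One bad prediction is enough to drop the corrupted agent below its competitors; from then on it receives no assignments and the damage stays local.&lt;/p&gt;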
&lt;h3&gt;
  
  
  8.2 Missing Agent Flow Detection
&lt;/h3&gt;

&lt;p&gt;When a new task type ("Optimize") begins appearing at 30% frequency at task #500 (a type no agent has been trained on), the failure rate spikes from 0.049 to 0.353: a &lt;strong&gt;7.2× increase&lt;/strong&gt;. This is an unmistakable signal that a new, unhandled capability gap has emerged, triggering an expansion workflow to register and train new agent flows.&lt;/p&gt;
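&lt;p&gt;A rolling failure-rate monitor is enough to surface this signal. The 0.049 baseline comes from the text; the window size and the 3× spike rule are assumptions for this sketch:&lt;/p&gt;

```python
from collections import deque

class GapDetector:
    """Flags a capability gap when the rolling failure rate spikes far above baseline."""
    def __init__(self, window=100, spike_factor=3.0, baseline=0.049):
        self.outcomes = deque(maxlen=window)   # True means the task failed
        self.spike_factor = spike_factor
        self.baseline = baseline

    def observe(self, failed):
        self.outcomes.append(failed)
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate > self.spike_factor * self.baseline   # expansion signal

det = GapDetector()
signal = False
for i in range(100):
    # after the shift, ~30% of tasks are an unhandled "Optimize" type (assumed pattern)
    signal = det.observe(failed=(i % 10) in (0, 1, 2))

assert signal   # a 30% failure rate far exceeds 3x the 0.049 baseline
```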

&lt;p&gt;This is how Artemis City grows organically: not through manual configuration, but through failure-driven expansion.&lt;/p&gt;
&lt;h3&gt;
  
  
  8.3 Domain Ceiling Detection
&lt;/h3&gt;

&lt;p&gt;A third resilience property is the system's ability to detect when a domain's agents have hit the limit of their capability. In a simulation where Execute-domain tasks progressively increased in complexity (nonlinearity factor growing by 0.003 per task after step #400), performance degraded predictably:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quartile&lt;/th&gt;
&lt;th&gt;Execute MAE&lt;/th&gt;
&lt;th&gt;Complexity Factor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q1 (simplest)&lt;/td&gt;
&lt;td&gt;1.086&lt;/td&gt;
&lt;td&gt;0.000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q2&lt;/td&gt;
&lt;td&gt;~3.5&lt;/td&gt;
&lt;td&gt;~0.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q3&lt;/td&gt;
&lt;td&gt;~6.5&lt;/td&gt;
&lt;td&gt;~0.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4 (hardest)&lt;/td&gt;
&lt;td&gt;9.624&lt;/td&gt;
&lt;td&gt;~1.8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A ceiling was detected at Execute task #67, the point where error exceeded 3× the baseline average. This triggers an &lt;strong&gt;expansion signal&lt;/strong&gt;: the domain needs more capable agents or a new sub-domain specialization. A domain ceiling triggers expansion, not failure: the architecture grows organically in response to capability gaps rather than degrading silently.&lt;/p&gt;
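&lt;p&gt;The 3× rule comes from the text; the baseline window and the illustrative error series below are assumptions. A minimal detector:&lt;/p&gt;

```python
def detect_ceiling(errors, baseline_n=50, factor=3.0):
    """Index of the first task whose error exceeds factor x the baseline mean, else None."""
    baseline = sum(errors[:baseline_n]) / baseline_n
    for i in range(baseline_n, len(errors)):
        if errors[i] > factor * baseline:
            return i             # emit an expansion signal for this domain
    return None

# Illustrative series: flat baseline, then linearly growing complexity error.
errs = [1.0] * 50 + [1.0 + 0.1 * k for k in range(100)]
assert detect_ceiling(errs) == 71    # first error strictly above 3.0 appears at k = 21
```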
&lt;h3&gt;
  
  
  8.4 Learning Velocity
&lt;/h3&gt;

&lt;p&gt;After a failure event, Hebbian agents recover (3 consecutive successes below threshold) in &lt;strong&gt;4.1–4.6 steps&lt;/strong&gt;. Monolithic MLPs require &lt;strong&gt;17–24 steps&lt;/strong&gt; for equivalent recovery, 4–5× slower.&lt;/p&gt;

&lt;p&gt;The Hebbian system's failure triggers an immediate routing response: the failing agent loses weight, competitors gain opportunity. The MLP must retrain its entire parameter space, a fundamentally slower operation.&lt;/p&gt;


&lt;h2&gt;
  
  
  9. The Hebbian + k-NN Reconciliation Layer
&lt;/h2&gt;

&lt;p&gt;One of the more practically significant findings of Whitebook v3 is that Hebbian and k-NN need not be treated as competitors. The reconciliation architecture positions Hebbian routing as a &lt;strong&gt;cheap elimination layer&lt;/strong&gt; that filters options before expensive k-NN verification.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1: Hebbian Domain-Locked Router (O(1))
  → Selects best agent in domain by weight
  → Produces prediction

Layer 2: k-NN Verification (O(W))
  → k=5 nearest neighbors in W=200 step window
  → Produces independent prediction

Reconciliation:
  if |heb_pred - knn_pred| &amp;lt; threshold (3.0):
      AGREE → use cheap Hebbian answer
  else:
      DISAGREE → weighted average based on Hebbian confidence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The key empirical finding: &lt;strong&gt;when Hebbian and k-NN disagree, Hebbian is correct 94% of the time&lt;/strong&gt;. This is because domain-locked agents accumulate specialized knowledge through their weight history that general-purpose nearest-neighbor lookup cannot replicate.&lt;/p&gt;

&lt;p&gt;The reconciled system operates at &lt;strong&gt;71.9% of pure k-NN cost&lt;/strong&gt; (28.1% savings) while achieving better accuracy than either system alone. Agreement rate is ~85%; only ~15% of decisions invoke the expensive k-NN path.&lt;/p&gt;
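&lt;p&gt;The reconciliation rule above reduces to a short function. The agree/disagree threshold is from the pseudocode; the confidence-weighted blend is a sketch of “weighted average based on Hebbian confidence”, since the exact blend rule is not spelled out:&lt;/p&gt;

```python
def reconcile(heb_pred, heb_conf, knn_pred, threshold=3.0):
    """Use the cheap Hebbian answer unless the two predictors disagree beyond threshold."""
    if abs(heb_pred - knn_pred) > threshold:
        # DISAGREE: blend, weighted by Hebbian confidence in [0, 1] (blend rule assumed)
        return heb_conf * heb_pred + (1.0 - heb_conf) * knn_pred
    return heb_pred              # AGREE: skip the expensive path's answer entirely

assert reconcile(10.0, 0.9, 11.0) == 10.0    # within 3.0: agree, use Hebbian
assert reconcile(10.0, 0.5, 20.0) == 15.0    # disagree: confidence-weighted average
```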


&lt;h2&gt;
  
  
  10. Open Questions and Next Steps
&lt;/h2&gt;

&lt;p&gt;This body of work is explicitly exploratory. The following questions are not yet resolved:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The experiments use synthetic data with known generating functions. How does domain-locked routing perform on real-world task distributions where domain boundaries are fuzzier?&lt;/li&gt;
&lt;li&gt;The 3-agents-per-domain configuration is identified as a practical sweet spot, but this was tested under specific drift conditions. Does optimal pool size vary with drift rate?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On the plasticity-stability trade-off:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Context-Aware and Dynamic Penalty models show promise in reducing switching cost, but have been validated on a limited class of concept drift patterns. More adversarial drift profiles (e.g. gradual drift, oscillating drift, multi-domain simultaneous drift) remain untested.&lt;/li&gt;
&lt;li&gt;The Decay Hebbian model's failure on static data raises a question for dynamic data: is there a decay rate that optimally balances plasticity and stability across varied task distributions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fema46056w84bmziqspb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fema46056w84bmziqspb8.png" alt="graphical results view"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  11. Summary
&lt;/h2&gt;

&lt;p&gt;This report documents a progression of simulation experiments exploring Hebbian learning as an adaptive routing mechanism for multi-agent systems. The key findings, stated without overreach, are:&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;plasticity-stability trade-off&lt;/strong&gt; is real and consequential. Aggressive decay (Decay Hebbian) hurt performance on static data. Context-aware and dynamic penalty mechanisms reduce this cost on dynamic data but have not been exhaustively validated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Naive Hebbian routing does not automatically beat monolithic baselines&lt;/strong&gt; on concept drift. The switching cost, the lag between environmental change and routing adaptation, is a genuine liability that requires architectural attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain-locking is the most impactful architectural intervention explored&lt;/strong&gt;. Constraining agents to ATP ActionType domains eliminates cross-domain interference and produces an 80%+ MAE improvement over unconstrained routing, with O(1) computational cost. The architecture itself, not training data volume, is the primary driver of this improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Active Sentinel adds a self-correcting immune layer&lt;/strong&gt;. By monitoring oscillation rates within each domain, the sentinel detects when a routing choice is pathological and intervenes, halving the dominant agent's weight to force exploration. In simulation, all 16 sentinel interventions targeted the most volatile domain (Scaffold) without requiring any manual configuration. The system learns where it is sick.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hebbian marketplace has emergent resilience properties&lt;/strong&gt;: automatic deselection of corrupted agents, failure-rate-based detection of capability gaps, domain ceiling signals that trigger organic expansion, and 4–5× faster recovery velocity than monolithic alternatives. None of these properties is achievable by design in single-model systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconciliation with k-NN&lt;/strong&gt; offers a cost-effective path to combining the cheap adaptability of Hebbian routing with the verified accuracy of nearest-neighbor inference, operating at 71.9% of pure k-NN cost. When they disagree, Hebbian is right 94% of the time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Memory Bus provides the infrastructure backbone&lt;/strong&gt; that makes all of the above possible at scale: atomic write-through synchronization, a tiered read hierarchy, and near-real-time weight propagation across the full agent collective.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This document was co-authored with AI assistance. All simulation data drawn from Collab docs available on GitHub  (February 2026).&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/AgenticGovernace" rel="noopener noreferrer"&gt;
        AgenticGovernace
      &lt;/a&gt; / &lt;a href="https://github.com/AgenticGovernace/AgenticGovernance-ArtemisCity" rel="noopener noreferrer"&gt;
        AgenticGovernance-ArtemisCity
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      This project establishes a governance framework for large-scale multi-agent deployments in which transparency is intrinsic rather than retrospective. 
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://app.eraser.io/workspace/9skbTVbh57gG3A6g4mQ4" id="user-content-edit-in-eraser-github-link" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Edit in Eraser" src="https://camo.githubusercontent.com/9953fb72c9d0f7d1bef76f297cc1f98d203918a70260b8ceb518b8c305639a10/68747470733a2f2f666972656261736573746f726167652e676f6f676c65617069732e636f6d2f76302f622f7365636f6e642d706574616c2d3239353832322e61707073706f742e636f6d2f6f2f696d616765732532466769746875622532464f70656e253230696e2532304572617365722e7376673f616c743d6d6564696126746f6b656e3d39363833383163382d613765372d343732612d386564362d346136363236646135353031"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Artemis City&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;Artemis City is an architectural framework designed to align agentic reasoning with transparent, accountable action across distributed intelligence systems—both human and machine. It establishes a governance framework for large-scale multi-agent deployments where transparency is intrinsic rather than retrospective.&lt;/p&gt;

&lt;p&gt;The platform is a &lt;strong&gt;Multi-Agent Coordination Platform (MCP)&lt;/strong&gt; built around an &lt;strong&gt;Obsidian vault as persistent memory&lt;/strong&gt;. Agents communicate via the &lt;strong&gt;Artemis Transmission Protocol (ATP)&lt;/strong&gt;, are ranked by &lt;strong&gt;Hebbian-weighted trust scores&lt;/strong&gt;, and route tasks through a central orchestrator.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;🚀 Overview&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Memory&lt;/strong&gt;: Uses an Obsidian vault as a write-through memory bus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol-Driven&lt;/strong&gt;: Agents communicate using structured ATP headers (Mode, Priority, Action, Context).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive Governance&lt;/strong&gt;: Trust scores (Hebbian weights) evolve based on agent performance and decay over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Stack&lt;/strong&gt;: Includes a Python orchestration engine, a TypeScript/Express API, and a React-based dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;🛠 Tech Stack&lt;/h2&gt;
&lt;/div&gt;


&lt;ul&gt;

&lt;li&gt;

&lt;strong&gt;Core Logic&lt;/strong&gt;: Python 3.10+ (FastAPI, SQLAlchemy, Pydantic, Pytest)&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Persistent&lt;/strong&gt;…&lt;/li&gt;

&lt;/ul&gt;
&lt;/div&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/AgenticGovernace/AgenticGovernance-ArtemisCity" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


</description>
      <category>ai</category>
      <category>mcp</category>
      <category>hebbian</category>
      <category>atp</category>
    </item>
    <item>
      <title>Why Every Agent Needs A Transmission Protocol</title>
      <dc:creator>Prinston Palmer</dc:creator>
      <pubDate>Sun, 15 Mar 2026 13:27:14 +0000</pubDate>
      <link>https://dev.to/popvilla/why-every-agent-needs-a-transmission-protocol-2cj8</link>
      <guid>https://dev.to/popvilla/why-every-agent-needs-a-transmission-protocol-2cj8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsautdql6k0osm15svzvr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsautdql6k0osm15svzvr.png" alt=" " width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overview of concept architecture&lt;/p&gt;

&lt;h3&gt;
  
  
  The Multi-Agent Systems Problem
&lt;/h3&gt;

&lt;p&gt;The most interesting question to ask of current agent ecosystems is whether your agents actually understand each other, or simply share a corpus. They look like they do. Two agents pass JSON back and forth, one generates a plan, the other executes it, and the output lands in your inbox looking polished and intentional. But under the hood? It’s barely-controlled chaos. The planner agent didn’t tell the executor &lt;em&gt;why&lt;/em&gt; it chose that approach. The executor didn’t confirm it understood the constraints. And when something breaks at 3 AM in production, there’s no record of the conversation that led to the failure. Writing a new prompt for each agent to understand its role, and maintaining agent cards, becomes tedious. How do you maintain prompt intent across context windows and agents without human overload?&lt;/p&gt;

&lt;p&gt;These are some of the problems we set out to solve when we built the Agentic Transmission Protocol (ATP) as the backbone of Artemis City’s multi-agent orchestration platform. The protocol was first applied to prompts directed at Artemis itself; through testing we discovered it works across arbitrary agents as well, which is why the acronym is dual-serving. And after months of building, breaking, and rebuilding agent communication systems, we’re ready to make our case: every serious multi-agent system needs a transmission protocol, and here’s how to approach building one.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Even Is a “Transmission Protocol” for Agents?
&lt;/h3&gt;

&lt;p&gt;If you’ve ever worked with HTTP, gRPC, or even MQTT, you already understand the concept. A transmission protocol defines how messages are structured, routed, and interpreted between communicating parties. For web servers, that’s straightforward request/response pairs with headers, status codes, and payloads. For AI agents, it’s dramatically more complex, hindered by conflicts between translation and transliteration, as well as rendering, reading, and printing. These may seem like unrelated concerns, but they are the main sources of lossy noise, and they open the door to complex attack vectors aided by AI. Agents aren’t stateless web servers. They carry context. They make judgment calls. They interpret ambiguity. And crucially, they operate in environments where the “right answer” depends on who’s asking, what they already know, and what they’re trying to accomplish. Without a structured communication layer, you get what we call semantic drift: the gradual divergence between what one agent &lt;em&gt;meant&lt;/em&gt; and what another agent &lt;em&gt;understood&lt;/em&gt;. ATP solves this by wrapping every agent-to-agent communication in a structured envelope that carries not just the message, but the intent, context, priority, and expected response type alongside it. The envelope is meant to grow along with the needs of the domain it is applied to.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Foundational Signals
&lt;/h3&gt;

&lt;p&gt;The core of ATP is deceptively simple: six signal tags that travel with every message between agents. These aren’t optional metadata; they’re mandatory headers that define the communication contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#Mode&lt;/strong&gt; defines the overall intent of the transmission:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Build&lt;/strong&gt;: code needs to be written&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Review&lt;/strong&gt;: existing work needs critique&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Organize&lt;/strong&gt;: the knowledge base is being restructured&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Capture&lt;/strong&gt;: raw thoughts are being logged&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Synthesize&lt;/strong&gt;: multiple inputs need to be merged&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Commit&lt;/strong&gt;: finalized work is being saved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mode tells the receiving agent how to think, not just what to do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#Context&lt;/strong&gt; anchors the transmission to a specific mission or goal. This isn’t a full project description; it’s a one-line compass heading. “Initial CLI Trigger Script.” “Q3 Compliance Audit Trail.” “User onboarding flow redesign.” It keeps every agent oriented toward the same north star, even when they’re operating asynchronously and independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#Priority&lt;/strong&gt; signals urgency. &lt;strong&gt;Critical&lt;/strong&gt; means drop everything. &lt;strong&gt;High&lt;/strong&gt; means prioritize over default work. &lt;strong&gt;Normal&lt;/strong&gt; is standard queue processing. &lt;strong&gt;Low&lt;/strong&gt; means handle when idle. This is essential for production systems where not every task has equal weight, and where an agent burning tokens on a low-priority research task while a critical deployment is stalled is a real failure mode. Priority also informs which agent is allowed to attempt a task: through domain specialization, agents come to dominate specific classes of task, and their information-retrieval needs vary accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#ActionType&lt;/strong&gt; specifies what kind of response the sender expects. &lt;strong&gt;Summarize&lt;/strong&gt; means compress and distill. &lt;strong&gt;Scaffold&lt;/strong&gt; means create a structural foundation. &lt;strong&gt;Execute&lt;/strong&gt; means build the thing. &lt;strong&gt;Reflect&lt;/strong&gt; means analyze what happened and provide insight. This tag prevents one of the most common failures in agent systems: the agent that was asked to “look into authentication options” and returns a 50-page implementation instead of a three-paragraph summary. A workflow can straddle multiple ActionTypes and agents, so this field is most valuable when paired with a central orchestrator. The Mode describes the overall output; the ActionType describes what is expected of the receiving agent. A Build step, for example, could involve extensive research that should be summarized, contextualized against the database, and condensed to match the Mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#TargetZone&lt;/strong&gt; maps the transmission to a physical location in the project architecture or workflow output, reducing the data-crawl scope. This does two things: it scopes the agent’s attention to the relevant part of the codebase, and it provides an auditable record of &lt;em&gt;where&lt;/em&gt; changes are being directed. When you’re running dozens of agents across a monorepo, this isn’t optional; it’s the difference between your compiled code running and three repos that now mix uv, pipenv, Yarn, and npm inside the same src/ tree.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#SpecialNotes&lt;/strong&gt; is the escape hatch. “Must be compatible with Git safe-commit checks.” “Do not modify the .env file.” “This is a dry run, no actual writes.” Every edge case, every exception, every “by the way” lands here. And critically, it’s a &lt;em&gt;formal field&lt;/em&gt;, not a casual aside buried in a prompt. Agents parse it. Governance systems log it. Nothing gets lost in the noise.&lt;/p&gt;
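&lt;p&gt;The six-tag contract is easy to enforce mechanically. Below is a minimal parser sketch for the “#Tag: value” syntax used throughout this article; the real ATPParser is more involved, and the function name here is illustrative:&lt;/p&gt;

```python
ATP_TAGS = ("Mode", "Context", "Priority", "ActionType", "TargetZone", "SpecialNotes")

def parse_atp(envelope):
    """Parse '#Tag: value' lines into a dict; all six tags are mandatory."""
    msg = {}
    for line in envelope.strip().splitlines():
        line = line.strip()
        if line.startswith("#") and ":" in line:
            tag, _, value = line[1:].partition(":")
            msg[tag.strip()] = value.strip()
    missing = [t for t in ATP_TAGS if t not in msg]
    if missing:
        raise ValueError(f"ATP envelope missing mandatory tags: {missing}")
    return msg

env = """
#Mode: Build
#Context: Initial Codex CLI Trigger Script
#Priority: High
#ActionType: Scaffold
#TargetZone: /Projects/Codex_Experiments/scripts/
#SpecialNotes: Must be compatible with Git safe-commit checks.
"""
msg = parse_atp(env)
assert msg["ActionType"] == "Scaffold"
```

&lt;p&gt;Because the tags are mandatory, a malformed envelope fails loudly at parse time instead of silently degrading downstream.&lt;/p&gt;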

&lt;h3&gt;
  
  
  How Artemis City Routes Tasks
&lt;/h3&gt;

&lt;p&gt;In Artemis City, ATP isn’t just a nice-to-have formatting standard; it’s the language the kernel speaks. When a task enters the system, the kernel’s &lt;strong&gt;ATPParser&lt;/strong&gt; module reads the signal tags and makes routing decisions in real time. Here’s what that looks like in practice. A user submits a task: “Build a Python trigger that allows Codex to repackage files after a push event.” The kernel wraps this in an ATP envelope:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulf1slzyvym5gcqyqam2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulf1slzyvym5gcqyqam2.png" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Simplified Kernel use&lt;/p&gt;




&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Mode: Build
#Context: Initial Codex CLI Trigger Script
#Priority: High
#ActionType: Scaffold
#TargetZone: /Projects/Codex_Experiments/scripts/
#SpecialNotes: Must be compatible with Git safe-commit checks.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;
  
  
  Now the kernel’s router has everything it needs.
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;#Mode: Build&lt;/strong&gt; combined with &lt;strong&gt;#ActionType: Scaffold&lt;/strong&gt; tells it this is a code-generation task that needs structural output first, not a finished product. The router queries the Agent Registry, finds agents with &lt;strong&gt;code_generation&lt;/strong&gt; and &lt;strong&gt;python&lt;/strong&gt; capabilities, selects the highest-scoring candidate (based on composite trust scores weighing alignment, accuracy, and efficiency), and dispatches the task.&lt;br&gt;&lt;br&gt;
 The selected agent receives the ATP envelope, and now &lt;em&gt;it&lt;/em&gt; knows exactly what to do: scaffold a Python trigger script, scope it to the Codex Experiments directory, ensure Git compatibility, and treat this as high-priority work. No guessing. No prompt engineering hacks. No “let me think about what you might have meant.”&lt;br&gt;&lt;br&gt;
 The entire routing decision, from ATP parsing to agent selection to dispatch, takes approximately 7 milliseconds. Compare that to the 800ms+ you’d spend asking an LLM to read agent profiles and pick the right one. That’s a 99% latency reduction, and it’s fully deterministic: same input, same routing, every single time.&lt;/p&gt;
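&lt;p&gt;The registry lookup itself is ordinary code, which is why it is fast and deterministic. A sketch of the idea; the registry shape, the score weights, and the agent entries below are assumptions, not the production schema:&lt;/p&gt;

```python
def route(task, registry):
    """Pick the highest-scoring registered agent whose capabilities cover the task."""
    candidates = [
        a for a in registry
        if task["required_caps"].issubset(a["capabilities"])
    ]
    if not candidates:
        raise LookupError("no capable agent registered")

    def score(a):
        # composite trust score over alignment, accuracy, efficiency (weights assumed)
        return 0.4 * a["alignment"] + 0.4 * a["accuracy"] + 0.2 * a["efficiency"]

    return max(candidates, key=score)

registry = [
    {"name": "py_builder", "capabilities": {"code_generation", "python"},
     "alignment": 0.9, "accuracy": 0.8, "efficiency": 0.7},
    {"name": "doc_writer", "capabilities": {"summarize"},
     "alignment": 0.9, "accuracy": 0.9, "efficiency": 0.9},
]
task = {"required_caps": {"code_generation", "python"}}
assert route(task, registry)["name"] == "py_builder"
```

&lt;p&gt;No model call appears anywhere on this path: same input, same routing, every time.&lt;/p&gt;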

&lt;h3&gt;
  
  
  What Current AI Discussions Are Missing
&lt;/h3&gt;

&lt;p&gt;Let’s be direct about what the current agent ecosystem looks like without something like ATP.&lt;/p&gt;

&lt;p&gt;In most frameworks agents communicate through one of two mechanisms: function call chaining (where outputs from one agent become inputs to another through code-level plumbing) or prompt injection (where one agent’s output is literally pasted into another agent’s context window).&lt;/p&gt;

&lt;p&gt;Both approaches have the same fundamental problem: they carry data without carrying intent.&lt;/p&gt;

&lt;p&gt;When Agent A passes a 2,000-token output to Agent B, Agent B has to &lt;em&gt;infer&lt;/em&gt; everything about that message. What was the goal? What constraints apply? How urgent is this? What kind of response is expected? Agent B has no structured way to know, so it guesses. And LLMs, as we all know, are confidently wrong guessers. This is why you see the classic multi-agent failure pattern: agents that spiral into infinite loops, agents that solve the wrong problem with beautiful precision, agents that ignore critical constraints because they weren’t formatted in a way the model could parse reliably.&lt;br&gt;&lt;br&gt;
 ATP eliminates inference at the communication layer. Every message arrives with its own instruction set. The receiving agent doesn’t guess; it reads the contract and responds accordingly.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Deeper Architecture: Symmetric Tags and Fault Awareness
&lt;/h3&gt;

&lt;p&gt;ATP doesn’t just handle the initial dispatch. It governs the entire conversation lifecycle.&lt;/p&gt;

&lt;p&gt;Every outbound ATP tag expects a corresponding acknowledgment. When an agent sends a &lt;strong&gt;#Mode: Build&lt;/strong&gt; message, the receiving agent must respond with a &lt;strong&gt;#Mode_Ack: Build&lt;/strong&gt; to confirm it understood the operating mode. When a &lt;strong&gt;#Context&lt;/strong&gt; tag is set, the response carries a &lt;strong&gt;#Context_Ref&lt;/strong&gt; back-link. This symmetric tagging creates a verifiable handshake both sides of the conversation are on record confirming alignment.&lt;/p&gt;

&lt;p&gt;But the real innovation is in fault awareness. If an agent receives a message with a tag it doesn’t recognize, or a &lt;strong&gt;#TargetZone&lt;/strong&gt; that doesn’t exist in the current project structure, it doesn’t guess or hallucinate an interpretation. Instead, it emits a structured warning:&lt;/p&gt;




&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Intersect_Warning: Tag not mapped in ATP.
Request human arbitration or memory recall.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;




&lt;p&gt;This is the difference between a communication protocol and a prayer. In traditional agent frameworks, an unrecognized instruction gets absorbed into the context window and the model does its best which often means doing something confidently incorrect. In ATP, ambiguity triggers an explicit interrupt. The system stops, flags the issue, and waits for resolution. No silent failures. No confident hallucinations.&lt;/p&gt;
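&lt;p&gt;The interrupt-on-ambiguity behavior can be sketched as an exception raised at the protocol layer, before anything reaches a model. The class name, the zone check, and the validation order here are illustrative, not the production implementation:&lt;/p&gt;

```python
KNOWN_TAGS = {"Mode", "Context", "Priority", "ActionType", "TargetZone", "SpecialNotes"}

class IntersectWarning(Exception):
    """Raised instead of guessing: the system stops and requests arbitration."""

def validate_envelope(msg, known_zones):
    """Reject unmapped tags and nonexistent TargetZones with an explicit interrupt."""
    for tag in msg:
        if tag not in KNOWN_TAGS:
            raise IntersectWarning(
                f"Tag '{tag}' not mapped in ATP. "
                "Request human arbitration or memory recall.")
    if msg.get("TargetZone") not in known_zones:
        raise IntersectWarning(
            f"TargetZone '{msg.get('TargetZone')}' does not exist. "
            "Request human arbitration or memory recall.")

zones = {"/Projects/Codex_Experiments/scripts/"}
ok = {"Mode": "Build", "TargetZone": "/Projects/Codex_Experiments/scripts/"}
validate_envelope(ok, zones)   # passes silently; ambiguous envelopes raise instead
```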

&lt;h3&gt;
  
  
  Hash-Based Context Linking: Memory Across Conversations
&lt;/h3&gt;

&lt;p&gt;One of ATP’s most powerful features is its hash-based context linking system. Every ATP message block receives a unique context hash, a short identifier like &lt;strong&gt;ctx_4df3a&lt;/strong&gt;, that tags the semantic content of that exchange. When another agent references the same context later (even in a different session, or days later), it uses &lt;strong&gt;reply_ctx_4df3a&lt;/strong&gt; to create an explicit link.&lt;/p&gt;

&lt;p&gt;This means agents can reference the &lt;em&gt;same context&lt;/em&gt; across disconnected threads, sessions, and even different model instances. It’s the difference between an agent that says “I remember building that feature” (and is hallucinating) and one that says “I’m referencing context &lt;strong&gt;ctx_4df3a&lt;/strong&gt; from the 2026–02–10 session” (and can prove it).&lt;/p&gt;

&lt;p&gt;In Artemis City, these context hashes are stored in the Memory Bus and indexed in both the Obsidian knowledge vault and the Supabase vector store. They’re searchable, auditable, and decay-aware, meaning the system knows not just &lt;em&gt;what&lt;/em&gt; was said, but &lt;em&gt;when&lt;/em&gt; it was said and how reliable it still is.&lt;/p&gt;
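&lt;p&gt;The mechanics can be sketched in a few lines: derive a short identifier from the content, store it, and require that a reply reference an identifier that actually resolves. The hashing scheme and the in-memory store below are assumptions standing in for the Memory Bus:&lt;/p&gt;

```python
# Hypothetical sketch of hash-based context linking. The sha256-prefix
# scheme and the dict "store" are assumptions, not the real Memory Bus.
import hashlib

def context_hash(content):
    """Derive a short ctx_ identifier from the semantic content."""
    return "ctx_" + hashlib.sha256(content.encode()).hexdigest()[:5]

store = {}  # stand-in for the Memory Bus index

def record(content):
    cid = context_hash(content)
    store[cid] = content
    return cid

def reply_to(cid):
    """A later session links back with reply_ctx_...; the link is
    verifiable because the referenced content must be retrievable."""
    if cid not in store:
        raise KeyError(f"unknown context {cid}")
    return "reply_" + cid

cid = record("built the routing feature in the 2026-02-10 session")
print(reply_to(cid))  # prints "reply_ctx_" plus five hex characters
```

&lt;p&gt;Because the identifier is derived from content rather than invented by the model, a reference either resolves against the store or fails loudly; there is no room for a hallucinated memory.&lt;/p&gt;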

&lt;h3&gt;
  
  
  Why This Matters Beyond Artemis City
&lt;/h3&gt;

&lt;p&gt;ATP was designed for Artemis City, but the problems it solves are universal.&lt;/p&gt;

&lt;p&gt;If you’re building any multi-agent system, whether it’s a coding assistant, an enterprise workflow engine, a research pipeline, or an AI-driven operations platform, you will eventually hit the wall of unstructured agent communication. Your agents will miscommunicate. They’ll lose context. They’ll make decisions without explaining why. And when you try to debug what happened, you’ll find a pile of JSON blobs and prompt logs that tell you &lt;em&gt;what&lt;/em&gt; each agent did but not &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;ATP provides the “why” layer. It’s the structured intent metadata that turns agent communication from a best-effort guess into a verifiable contract.&lt;/p&gt;

&lt;p&gt;The design principles transfer directly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Define modes, not just messages.&lt;/strong&gt; Don’t just tell agents what to do; tell them how to think about what they’re doing. A build task and a review task require fundamentally different reasoning approaches, even if the subject matter is identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Carry context explicitly.&lt;/strong&gt; Never rely on an LLM to infer the project goal from the content of the message. State it. Tag it. Make it mandatory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demand acknowledgment.&lt;/strong&gt; Symmetric tags aren’t bureaucracy; they’re verification. If the receiving agent can’t confirm it understood the instruction, you’ve caught a failure &lt;em&gt;before&lt;/em&gt; it becomes a production incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interrupt on ambiguity.&lt;/strong&gt; The most dangerous thing an agent can do is confidently proceed when it doesn’t fully understand the task. Build fault awareness into the protocol layer, not the model layer.&lt;/p&gt;
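&lt;p&gt;These four principles can be condensed into a single message validator: mode and context are mandatory, and anything unrecognized is surfaced as a problem rather than dispatched. The field names and mode list below are illustrative assumptions:&lt;/p&gt;

```python
# Hypothetical validator condensing the design principles: declare a mode,
# carry context explicitly, and interrupt rather than guess. The REQUIRED
# fields and VALID_MODES set are illustrative assumptions.

REQUIRED = ("#Mode", "#Context")
VALID_MODES = ("Build", "Review", "Reflect")

def validate(message):
    """Return a list of problems; only an empty list may be dispatched.
    Anything else triggers an interrupt, not a best-effort guess."""
    problems = []
    for field in REQUIRED:
        if field not in message:
            problems.append(f"missing mandatory field {field}")
    mode = message.get("#Mode")
    if mode is not None and mode not in VALID_MODES:
        problems.append(f"unrecognized mode {mode}")
    return problems

print(validate({"#Mode": "Build"}))
# ['missing mandatory field #Context']
```

&lt;p&gt;Making the check a flat list of problems, rather than a boolean, keeps the eventual human arbitration step cheap: every reason the message was rejected is already enumerated.&lt;/p&gt;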

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0tbebqhdu87nuy9dlxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0tbebqhdu87nuy9dlxp.png" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  The Road Ahead
&lt;/h3&gt;

&lt;p&gt;ATP v0.3 is live in Artemis City today, and we’re already working on the next evolution. Future versions will introduce specialized modes like &lt;strong&gt;#Mode: VoiceReflect&lt;/strong&gt; for speech-captured inputs that need different parsing. We’re exploring weighted priority systems where the kernel can dynamically adjust task urgency based on system load and deadline proximity. And we’re building cross-instance ATP: the ability for separate Artemis City deployments to communicate through the same protocol, creating a federation of governed agent systems.&lt;/p&gt;

&lt;p&gt;But the core philosophy won’t change: agents need structure to communicate reliably, and that structure needs to be explicit, mandatory, and verifiable.&lt;/p&gt;

&lt;p&gt;The era of agents passing unstructured prompts back and forth and hoping for the best is over. If you’re building production-grade multi-agent systems, you need a transmission protocol. ATP is ours. Build yours. Or better yet, help us build the standard that the entire ecosystem can share.&lt;/p&gt;

</description>
      <category>acp</category>
      <category>protocol</category>
      <category>design</category>
      <category>atproto</category>
    </item>
  </channel>
</rss>
