Alex Garden

Building Trust Systems for AI Agent Teams: Beyond Individual Credit Scores

Last week, we shipped Trust Ratings for individual AI agents — essentially credit scores for autonomous systems. The response was immediate: "What about teams?"

This isn't just feature creep. Nobody deploys one agent in production. The interesting deployments are three, five, twelve agents coordinating on complex tasks. And here's the thing: the risk profile of a team is not the sum of its parts.

The Team Risk Problem

We already had team risk assessment at Mnemom — you could pass a list of agent IDs to our API and get a three-pillar analysis with Shapley attribution and circuit breakers. But every assessment started cold.

No persistent identity. No accumulated history. No way to answer "is this team getting better or worse?" If you ran the same five agents together every day for six months, the system treated each assessment as if those agents had never met.

Individual agents get persistent identity, trend lines, public reputation pages, and CI enforcement. Teams had none of it — until today.

First-Class Team Identity

Teams are now entities with their own lifecycle:

POST /v1/teams
{
  "org_id": "org-abc123",
  "name": "Incident Response Alpha",
  "agent_ids": ["smolt-a4c12709", "smolt-b8f23e11", "smolt-c1d45a03"],
  "metadata": { "environment": "production", "domain": "infrastructure" }
}

Minimum two agents, maximum fifty. Agents can belong to multiple teams. When you add or remove an agent, the system records who made the change and triggers a score recomputation.
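Those roster constraints are simple enough to enforce client-side before you ever hit the API. A minimal sketch — the function name is mine, not part of any Mnemom SDK:

```python
def validate_roster(agent_ids: list[str]) -> None:
    """Enforce the documented team constraints: 2-50 unique agents."""
    unique = set(agent_ids)
    if len(unique) != len(agent_ids):
        raise ValueError("duplicate agent IDs in roster")
    if not (2 <= len(unique) <= 50):
        raise ValueError(f"team size must be 2-50, got {len(unique)}")
```

Catching a bad roster before the POST saves you a round trip; the server enforces the same limits either way.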

The Mathematics of Team Trust

Here's where it gets interesting. A team score isn't the average of individual scores. If it were, you wouldn't need one.

The thing that makes a team a team is coordination. Five AAA agents with terrible coherence should score worse than five A agents with excellent coordination.

Five Weighted Components

Team Coherence History (35%) — The dominant signal. How consistently well-aligned is this team over time? This measures the one thing that only exists at the team level.

Aggregate Member Quality (25%) — Tail-risk weighted aggregate of individual Trust Ratings. One weak member drags the team down more than one strong member lifts it up.

Operational Track Record (20%) — Historical hit rate across all team assessments. How often has this team been assessed as low-risk?

Structural Stability (10%) — Roster churn penalty. A team that swaps agents every week cannot build a reliable track record.

Assessment Density (10%) — Actively monitored teams with 200 data points get more credit than ones assessed twice six months ago.

Same 0-1000 range as individual scores. Same AAA-through-CCC grades. Teams need 10 assessments before a score publishes.
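To make the weighting concrete, here's a toy version of how those five components could combine into a 0-1000 score. The weights come from the post; everything else — the tail-risk blend factor, the component names — is my own illustration, not Mnemom's actual formula (which runs in their zkVM):

```python
# Weights as stated in the post. They sum to 1.0.
WEIGHTS = {
    "coherence_history": 0.35,
    "member_quality": 0.25,
    "track_record": 0.20,
    "structural_stability": 0.10,
    "assessment_density": 0.10,
}

def member_quality(scores: list[float]) -> float:
    """Tail-risk weighted aggregate of member scores (each 0.0-1.0).

    Blending the mean with the worst member means one weak agent drags
    the aggregate down disproportionately. The 50/50 blend is illustrative.
    """
    mean = sum(scores) / len(scores)
    return 0.5 * mean + 0.5 * min(scores)

def team_score(components: dict[str, float]) -> int:
    """Combine component scores (each 0.0-1.0) into a 0-1000 team score."""
    assert set(components) == set(WEIGHTS), "missing or extra components"
    raw = sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
    return round(raw * 1000)
```

The tail-risk blend is why five AAA agents plus one weak one can land below a uniformly-A roster: `min(scores)` punishes the outlier harder than the mean rewards the stars.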

Cryptographic Proof Chains

Every team assessment is cryptographically attested — Ed25519 signatures, hash chains, STARK zero-knowledge proofs. The team score computation itself runs in the zkVM.

This creates a proof chain: individual checkpoints → individual Trust Ratings → team assessments → Team Trust Rating. Each link is independently verifiable.

You don't have to trust that a team is rated A — you can verify every step yourself.
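The hash-chain part of that verification can be sketched with nothing but the standard library. The record layout below is invented for illustration — Mnemom's actual attestation format, signatures, and STARK proofs are a different (and much heavier) machine — but the tamper-evidence property is the same:

```python
import hashlib
import json

GENESIS = "0" * 64

def chain_hash(prev_hash: str, body: dict) -> str:
    """Hash a record body together with its predecessor's hash."""
    payload = prev_hash + json.dumps(body, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(bodies: list[dict]) -> list[dict]:
    """Each record carries the hash of everything that came before it."""
    chain, prev = [], GENESIS
    for body in bodies:
        chain.append(dict(body, prev_hash=prev))
        prev = chain_hash(prev, body)
    return chain

def verify_chain(records: list[dict]) -> bool:
    """Recompute the chain; tampering with any record breaks every later link."""
    prev = GENESIS
    for rec in records:
        if rec["prev_hash"] != prev:
            return False
        body = {k: v for k, v in rec.items() if k != "prev_hash"}
        prev = chain_hash(prev, body)
    return True
```

Editing an early assessment changes its hash, which invalidates the `prev_hash` stored in every subsequent record — that's the property that lets anyone replay the chain independently.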

Public Infrastructure

Everything that exists for individual agents now exists for teams:

  • Reputation pages with score breakdowns and trend charts
  • Team directory — searchable catalog of public scores
  • Badges via SVG API — [ Team Trust | 812 ] in your README
  • GitHub Actions — CI gating on team scores
- uses: mnemom/reputation-check@v1
  with:
    team-id: team-7f2a9c01
    min-score: 700
    min-grade: A

Team Alignment Cards

Teams get their own behavioral contracts. You can auto-derive from member cards:

POST /v1/teams/{team_id}/card/derive

Values are unioned by frequency. Forbidden actions from any member apply to the team. Highest audit retention policy wins. Every change is versioned.
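Those merge rules are compact enough to express directly. A sketch of the derivation logic as described above — the card field names and retention tiers are assumptions, not Mnemom's schema:

```python
from collections import Counter

# Retention tiers ordered weakest to strongest; "highest wins". Illustrative.
RETENTION_ORDER = ["30d", "90d", "1y", "7y"]

def derive_team_card(member_cards: list[dict]) -> dict:
    """Merge member alignment cards per the stated rules."""
    # Values are unioned, ranked by how many members declare each one.
    counts = Counter(v for card in member_cards for v in card["values"])
    values = [v for v, _ in counts.most_common()]
    # A forbidden action from *any* member applies to the whole team.
    forbidden = sorted({a for card in member_cards
                          for a in card["forbidden_actions"]})
    # The strictest (highest) audit retention policy wins.
    retention = max((card["audit_retention"] for card in member_cards),
                    key=RETENTION_ORDER.index)
    return {"values": values,
            "forbidden_actions": forbidden,
            "audit_retention": retention}
```

Note the asymmetry: values merge additively, but prohibitions and retention merge toward the strictest member — a conservative default for a shared behavioral contract.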

The team card is what coherence quality measures against — the declared behavioral contract for the group.

Integration Points

Containment: When a team member is paused via the containment engine, the team score reflects it immediately.

Predictive guardrails: Historical assessment data improves predictions. "This team historically struggles with speed-safety tradeoffs" beats cold-start analysis.

CI gating: Same GitHub Action that enforces individual scores now enforces team scores.

What This Enables

Individual Trust Ratings answered "can I trust this agent?" Team Trust Ratings answer:

  • "How has this team performed?" — persistent, trended, attested scores
  • "Is this team improving?" — weekly snapshots, not guesswork
  • "Which team should I deploy?" — side-by-side comparison, not gut feel

If individual ratings are FICO for agents, team scores are Moody's for agent portfolios. Same rigor, applied to the unit that actually matters.

Implementation Notes

Team reputation integrates with existing Mnemom infrastructure:

  • Uses the same cryptographic attestation as individual scores
  • Plugs into containment and guardrail systems
  • Supports the same CI/CD workflows
  • Maintains the same public directory structure

The scoring algorithm runs deterministically in the zkVM, ensuring reproducible results across different environments.


Team Trust Ratings ship today on Team and Enterprise plans. The infrastructure for multi-agent trust is here.

Originally published on mnemom.ai
