Your Agent Received a Message. Should It Trust the Sender? The IETF Just Published a Protocol for That.

#ai #trust #security #agents

Your agent receives 200 messages per hour from other agents. Some request data. Some propose collaborations. Some carry payment intent. Your agent processes all of them because it has no mechanism to evaluate sender trustworthiness before acting on the message.

That is how prompt injection attacks scale across agent networks. That is how a single compromised agent poisons an entire swarm. And that is why the IETF just published draft-sharif-attp-00: the Agent Trust Transport Protocol, a five-dimension trust scoring model with cryptographic identity verification, spend limit tiers, and anomaly detection.

Trust is no longer optional. It is a protocol layer.

Why "Claim" Trust Fails for Agent Messaging

The A2A protocol uses Agent Cards for peer discovery. An agent card is a self-declared claim: "I am Agent X, I can do Y, I am located at Z." The problem: any entity can publish an agent card claiming anything. A malicious agent can claim to be a financial analyst. A compromised agent can retain its original card while executing adversarial instructions.

The arxiv comparative study on trust models in agentic protocols identified six mechanisms and concluded: "purely reputational or claim-only approaches are brittle" due to LLM-specific vulnerabilities (prompt injection, sycophancy, hallucination, deception).

# Why claim-based trust fails in agent messaging

class ClaimOnlyTrust:
    """A2A default: trust based on self-declared agent cards."""

    def evaluate_peer(self, agent_card):
        # Agent card says: "I am a financial analyst, trust level: high"
        # But who verified this? Nobody.
        return {
            "trusted": True,  # Because the card says so
            "verification": "self_declared",
            "attack_surface": [
                "Any agent can claim any capability",
                "Compromised agent retains original card",
                "Prompt injection can alter agent behavior without changing card",
                "Sybil attack: create 100 agents with fake high-trust cards",
                "Whitewashing: new identity after reputation damage"
            ]
        }

# Real attack scenario:
# 1. Attacker registers agent with card: "MiCA-authorized financial analyst"
# 2. Agent card passes A2A discovery (it is syntactically valid)
# 3. Your agent routes a message with payment intent to this "analyst"
# 4. Attacker's agent responds with manipulated data
# 5. Your agent makes a payment decision based on bad data
# 6. No mechanism detected the deception because trust was claim-based

# The IETF ATTP draft addresses exactly this gap:
# Trust score derived from PROOF (cryptographic) + BEHAVIOR (historical)
# Not from CLAIMS (self-declared)

The Five-Dimension Trust Model (IETF ATTP)

The Agent Trust Transport Protocol defines trust as a composite score across five dimensions, not a single binary "trusted/untrusted" flag:

// Five-dimension trust scoring integrated with rosud-call messaging
import { RosudCall, TrustEngine } from 'rosud-call';

const channel = new RosudCall({
  agentId: 'orchestrator-agent-prod',
  network: 'base-mainnet',

  trust: {
    engine: 'attp-compatible',  // IETF draft-sharif-attp-00 aligned

    dimensions: {
      // Dimension 1: Identity (cryptographic proof)
      identity: {
        method: 'ecdsa-p256',        // Per ATTP spec
        verification: 'challenge-response',
        didResolution: true,          // Resolve DID to verify key ownership
        weight: 0.30                  // 30% of composite score
      },

      // Dimension 2: Reputation (historical behavior)
      reputation: {
        source: 'peer-feedback-graph',
        domainSpecific: true,         // Separate reputation per task type
        sybilResistance: 'stake-weighted',  // Prevent fake reputation
        decayRate: 0.05,              // Recent behavior weighted more
        weight: 0.25                  // 25% of composite score
      },

      // Dimension 3: Compliance (regulatory status)
      compliance: {
        micaVerification: true,       // MiCA authorization check
        euAiActStatus: true,          // High-risk AI system registered?
        jurisdictionAware: true,
        weight: 0.20                  // 20% of composite score
      },

      // Dimension 4: Behavioral consistency
      behavioral: {
        anomalyDetection: true,       // Deviation from historical pattern
        velocityChecks: true,         // Unusual request frequency
        capabilityDrift: true,        // Agent capabilities changed post-auth?
        weight: 0.15                  // 15% of composite score
      },

      // Dimension 5: Stake (economic commitment)
      stake: {
        bondedCollateral: true,       // Agent has something to lose
        slashingConditions: ['fraud', 'data_manipulation', 'service_denial'],
        insuranceCoverage: true,
        weight: 0.10                  // 10% of composite score
      }
    },

    // Trust-based routing decisions
    routing: {
      minimumTrustForDelivery: 0.6,    // Below 0.6 = message blocked
      minimumTrustForPayment: 0.8,     // Payment messages need higher trust
      unknownPeerDefault: 0.3,         // New peers start below threshold
      trustDecayOnFailure: 0.15        // Trust drops 15% on verified failure
    }
  }
});

// When a message arrives, trust is evaluated BEFORE processing:
channel.on('message-received', async (msg) => {
  const trustScore = await channel.evaluatePeerTrust(msg.from);

  console.log(`Peer ${msg.from}: trust = ${trustScore.composite}`);
  console.log(`  Identity: ${trustScore.dimensions.identity}`);
  console.log(`  Reputation: ${trustScore.dimensions.reputation}`);
  console.log(`  Compliance: ${trustScore.dimensions.compliance}`);
  console.log(`  Behavioral: ${trustScore.dimensions.behavioral}`);
  console.log(`  Stake: ${trustScore.dimensions.stake}`);

  if (trustScore.composite < 0.6) {
    // Message blocked. Peer does not meet trust threshold.
    channel.blockMessage(msg.id, {
      reason: 'insufficient_trust',
      score: trustScore.composite,
      lowestDimension: trustScore.weakestDimension,
      remediation: 'Peer must improve identity verification or build reputation'
    });
    return;
  }

  if (msg.hasPaymentIntent && trustScore.composite < 0.8) {
    // Payment-bearing message needs higher trust
    channel.escalateMessage(msg.id, {
      reason: 'payment_trust_threshold',
      currentScore: trustScore.composite,
      requiredScore: 0.8,
      action: 'request_additional_verification'
    });
    return;
  }

  // Trust sufficient. Process message.
  channel.acceptMessage(msg.id);
});

Why Trust Must Be At the Message Layer, Not the Application Layer

Most trust implementations check trust at the application level: after the message is delivered, the application decides whether to act on it. This is too late. By the time your application evaluates trust, the message has already consumed resources, entered your context window, and potentially influenced your agent's reasoning.

Trust at the message layer means untrusted messages never reach your agent's processing logic:

// Trust at message layer vs application layer

// APPLICATION LAYER (too late):
// Message delivered -> Agent processes -> Agent evaluates trust -> Agent decides
// Problem: By step 2, prompt injection has already entered context
// Problem: Processing consumed compute regardless of trust outcome

// MESSAGE LAYER (rosud-call approach):
// Message arrives -> Trust evaluated at routing layer -> 
//   If trusted: delivered to agent
//   If untrusted: blocked, never enters agent context

// The difference in practice:
const messageLayerTrust = {
  promptInjectionExposure: 'zero',     // Untrusted messages never reach LLM
  computeWasteOnUntrusted: 'zero',     // No processing of blocked messages
  contextPollution: 'impossible',       // Agent context stays clean
  auditTrail: 'complete',              // Every block decision recorded
  trustDecisionLatency: '<10ms'         // Evaluated before delivery, not after
};

const applicationLayerTrust = {
  promptInjectionExposure: 'full',     // Message processed before trust check
  computeWasteOnUntrusted: 'full',     // All messages processed equally
  contextPollution: 'possible',         // Malicious content in context
  auditTrail: 'partial',               // Trust decision after the fact
  trustDecisionLatency: '100-500ms'    // Full processing before decision
};

// For a payment-bearing message from an unknown peer:
// Application layer: agent reads "Transfer $5000 to account X", processes it,
//   THEN checks trust. The agent has already "seen" the instruction.
// Message layer: trust score = 0.3 (unknown peer), message blocked.
//   Agent never sees the instruction. Zero exposure.

Building Trust Over Time

New peers start with a default trust score below the delivery threshold (0.3). They build trust through verified interactions:

// Trust building lifecycle for a new peer
const trustLifecycle = {
  day0: {
    score: 0.30,
    status: 'unknown',
    actions: 'Messages blocked. Must complete identity verification.'
  },
  afterIdentityVerification: {
    score: 0.50,
    status: 'identified',
    actions: 'Non-financial messages delivered. Payment messages blocked.'
  },
  after10SuccessfulInteractions: {
    score: 0.65,
    status: 'established',
    actions: 'All messages delivered. Low-value payments allowed.'
  },
  after50InteractionsWithStake: {
    score: 0.82,
    status: 'trusted',
    actions: 'Full access including payment-bearing messages.'
  },
  afterAnomalyDetected: {
    score: 0.45,  // Drops from 0.82
    status: 'degraded',
    actions: 'Payment messages blocked. Under observation. Must re-verify.'
  }
};

rosud-call implements trust scoring at the message routing layer. Every peer has a composite trust score across five dimensions. Messages from peers below threshold are blocked before reaching your agent. Payment-bearing messages require higher trust. New peers build trust through verified interactions. Compromised peers lose trust automatically through anomaly detection. Your agent never processes a message from a peer it should not trust.

The Bottom Line

"Should I trust this message?" is the first question every agent must answer. The A2A protocol does not answer it (claim-based only). The IETF ATTP draft defines how to answer it (five-dimension scoring). But neither provides the messaging infrastructure that enforces trust decisions at the routing layer.

Trust scoring that happens after message delivery is security theater. Trust scoring at the message layer is actual defense.

Build trust-scored agent messaging: rosud.com/rosud-call