Dimitrios S. Sfyris

Posted on Jun 30 • Edited on Jul 15

Agent-Ready Commerce, Part 9: Evidence and Audit Are Part of the Product

#ai #architecture #ecommerce #systemdesign

An agent-ready commerce platform does not only need to answer, refuse, or act.

It needs evidence explaining why it answered, refused, or acted.

That evidence cannot be reconstructed reliably from vague logs after something goes wrong. It has to be captured at the moment the platform makes the decision.

When an agent says a product can be compared, the platform should know which product facts supported the comparison.

When an agent refuses to quote a return policy, the platform should know which policy coverage was missing.

When checkout is blocked, the platform should know which facts, rules, policies, state transitions, or authority checks blocked it.

When delegated payment is refused, the platform should know whether the blocker was checkout validation, checkout state, mandate scope, cart snapshot mismatch, amount limit, currency mismatch, expiry, revocation, confirmation, provider state, or order commitment.

When a generated claim is quoted, the platform should know which facts, decisions, review status, allowed use, and expiry window made it safe at that time. The envelope should preserve that basis, while the audience-facing explanation should show only the detail the audience needs.

This is not ordinary logging.

Logs describe activity.

Evidence explains the basis of a decision.

Audit makes that explanation durable.

Agent-ready commerce needs evidence that survives beyond the request.

Series context

This is the ninth article in the Agent-Ready Commerce series.

Part 1 introduced the broader architecture model:

Facts → Eligibility → Authority → State transition → Evidence → Audit

Part 2 focused on commercial truth. It argued that catalog data is not enough. A platform needs source-backed, freshness-aware product facts before agents or other systems can safely rely on product information.

Part 3 focused on action eligibility. It argued that “available” is too broad. A product may be discoverable but not checkout-ready, comparable but not policy-quotable, or checkout-ready for a human flow but not eligible for delegated payment.

Part 4 focused on policy structure. It argued that agents should not interpret free-text policy pages as executable rules. Policies need structured facts with applicability, evidence, lifecycle, conflicts, and quotability.

Part 5 focused on protocol adapters. It argued that ACP, UCP, MCP, AP2-related flows, feeds, tools, and future interfaces should translate domain decisions rather than becoming separate sources of commercial meaning.

Part 6 focused on checkout state. It argued that checkout is where agent-ready commerce crosses from answering into mutating commercial state. Checkout should therefore use explicit checkout state with controlled transitions, not a loose form endpoint.

Part 7 focused on delegated payment. It argued that a payment artifact is not enough. Delegated payment requires bounded authority over a specific checkout snapshot, amount, currency, merchant, actor, buyer, time window, evidence, and audit.

Part 8 focused on generated claims. It argued that generated commerce text should be treated as a derived claim with dependencies, scope, review status, allowed use, invalidation, expiry, and audit, with each direct parent projection bound into the child’s canonical provenance.

This article focuses on evidence and audit.

The central argument is that evidence and audit are not backend observability details. They are product capabilities. Agent-ready commerce needs decision envelopes, evidence references, replayable inputs, protocol-safe explanations, recorded projection evidence, operator timelines, and audit trails because agents turn platform decisions into buyer-facing answers and commercial actions.

A safe platform follows this chain:

Request
      ↓
Decision
      ↓
Evidence record
      ↓
Protocol-safe explanation
      ↓
Projection or state transition
      ↓
Audit event
      ↓
Operator and support timeline

Without that chain, the platform may know what happened, but it cannot explain why it happened.

Logs are not evidence

Logs are useful.

They help engineers debug systems. They capture service calls, timings, warnings, exceptions, queue events, request identifiers, and operational details.

But logs are not enough for agent-ready commerce.

A log line such as:

checkout blocked for BAG-TRAVEL-42

does not explain the decision.

It does not say which checkout command was requested. It does not say which actor requested it. It does not say whether inventory was stale, which policy facts were missing, or which eligibility rule applied. It also omits the payment-authority state, the checkout state before the command, the returned blocker codes, and the explanation shown to the agent.

A better record answers:

Which action was requested?
Which actor requested it?
Which product, cart, checkout, claim, or mandate was targeted?
Which facts were selected?
Were those facts fresh?
Which policy facts applied?
Which required facts were missing?
Which rule identity evaluated the action?
Which authority record was checked?
Which state transition was attempted?
Which result was produced?
Which explanation was projected?
Which audit event recorded it?

That is evidence.

A useful distinction is:

Log:
A technical record that something happened.

Evidence:
A structured reference to the facts, policies, rules, states, authority records, generated claims, and decisions that justified the result.

Audit:
A durable history of decisions, projections, transitions, explanations, evidence use, invalidations, and remediation.

Logs support debugging.

Evidence supports decision trust.

Audit supports accountability, replay, support, remediation, and dispute handling.

Agent-ready commerce needs all three, but they are not the same layer.

The Travel Backpack example

Continue with the running example from Parts 2–8:

Product: Travel Backpack
SKU: BAG-TRAVEL-42
Price: €129
Catalog status: active
Inventory status: in_stock
Category: Travel Bags

The commercial truth layer currently says:

Price: fresh
Inventory: stale
Return policy: missing for Travel Bags
Warranty policy: known
Shipping policy: known for EU, unknown for US
Generated description: pending review
Feed publication: last published yesterday

From Part 3, the action matrix was:

Action	Result
`discover`	allowed
`compare`	allowed
`quote_policy`	blocked
`add_to_cart`	requires_revalidation
`prepare_checkout`	blocked
`delegate_payment`	blocked

Now suppose an agent asks:

Can you buy the Travel Backpack for the buyer if the total stays under €150?

The platform should not produce only this:

{
  "allowed": false
}

It should produce a decision with evidence.

A useful response may say:

Delegated payment is blocked because checkout state is not valid for payment.

Checkout cannot be prepared because:
- inventory requires revalidation
- return-policy coverage is missing for Travel Bags

Internally, the platform should preserve the decision path:

delegate_payment
      ↓ blocked because checkout state is not valid for payment

prepare_checkout
      ↓ blocked because inventory is stale
      ↓ blocked because return-policy coverage is missing

quote_policy
      ↓ blocked because no return-policy fact applies to Travel Bags

That evidence path matters because different users need different projections.

The buyer may need a concise explanation.

The agent may need structured blocker codes.

The operator may need remediation tasks.

Support may need a customer-safe timeline.

The auditor may need source facts, decision envelopes, rule identities, protocol projections, and timestamps.

The same decision needs multiple views, but it should come from one evidence chain.

Evidence belongs to allowed decisions too

Evidence should not be attached only to failures.

Blocked decisions are important because they explain refusal. Allowed decisions are equally important because they explain reliance.

If an agent is allowed to compare two products, the platform should know which product facts supported the comparison.

If an agent is allowed to quote warranty coverage, the platform should know which warranty policy fact applied.

If checkout is prepared, the platform should know which price, inventory, policy, shipping, tax, and cart snapshot records were used.

If delegated payment is allowed, the platform should know which mandate, cart snapshot, checkout state, amount, currency, merchant binding, expiry check, revocation check, and confirmation record supported the authority decision.

The audit question is not only:

Why did the platform block this?

It is also:

Why did the platform allow this?

A useful rule is:

If the platform can expose an answer or perform an action, it should know what evidence allowed it.

Otherwise, the system may be correct at runtime but unexplainable later.
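One way to hold that rule is a small guard at the point where a decision is emitted. The sketch below is illustrative only: `MinimalDecision` and `assertEvidenceBacked` are hypothetical names, and the shape is a reduced version of the envelope model described later in this article.

```typescript
// Hypothetical reduced shapes, for illustration only.
type EvidenceRef = { type: string; id: string; hash: string; hashAlgorithm: "sha256" };

type MinimalDecision = {
  decisionId: string;
  allowed: boolean;
  reasonCodes: readonly string[];
  evidenceRefs: readonly EvidenceRef[];
};

// Rejects decisions the platform could not explain later:
// an allowed decision that cites no evidence, or a blocked one that gives no reason.
function assertEvidenceBacked(decision: MinimalDecision): MinimalDecision {
  if (decision.allowed && decision.evidenceRefs.length === 0) {
    throw new Error(`decision ${decision.decisionId} is allowed but cites no evidence`);
  }
  if (!decision.allowed && decision.reasonCodes.length === 0) {
    throw new Error(`decision ${decision.decisionId} is blocked but gives no reason`);
  }
  return decision;
}
```

A guard like this does not prove the evidence is sufficient. It only ensures that an answer never leaves the platform with an empty basis.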

Decision envelopes are the core artifact

A decision envelope connects the request, its context, the result, the supporting evidence, the explanation, and any meaningful event that followed.

It is canonical because every surface starts from this shared record instead of calculating a separate commercial answer.

A compact model:

type DecisionResult =
  | "allowed"
  | "blocked"
  | "requires_review"
  | "requires_revalidation"
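  | "requires_confirmation";

// Assumed for illustration from the Part 3 action matrix; the full contract may define more actions.
type AgentCommerceDecisionAction =
  | "discover"
  | "compare"
  | "quote_policy"
  | "add_to_cart"
  | "prepare_checkout"
  | "delegate_payment";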

type AgentCommerceDecisionEnvelopeAuthenticator =
  | {
      kind: "digital_signature";
      algorithm: "ed25519";
      format: "detached";
      keyId: string;
      verificationKeyRef: string;
      protectedHash: string;
      value: string;
      verifiable: true;
    }
  | {
      kind: "message_authentication_code";
      algorithm: "hmac-sha256";
      format: "detached";
      keyId: string;
      verificationKeyRef: string;
      protectedHash: string;
      value: string;
      verifiable: true;
    }
  | {
      kind: "unsigned";
      algorithm: "none";
      format: "none";
      protectedHash: string;
      verifiable: false;
      warning: "missing_platform_signing_key";
    };

type AgentCommerceDecisionEnvelope = {
  contractVersion: "agent-commerce-decision-envelope-v4";
  envelopeSchemaVersion: "agent-commerce-decision-envelope-schema-v4";
  decisionId: string;
  decisionHash: string;
  inputDependencyHash: string;
  resultHash: string;
  evaluatedAt: string;
  ruleSetVersion: string;
  ruleSetRef: string;
  ruleSetHash: string;
  authenticator: AgentCommerceDecisionEnvelopeAuthenticator;
  surface?: "feed" | "tool" | "checkout" | "admin" | "support" | "protocol";
  requestedAction: AgentCommerceDecisionAction;
  actor: {
    actorType: "agent" | "buyer" | "merchant" | "operator" | "system";
    agentId?: string;
    merchantId?: string;
  };
  subject: {
    productId?: string;
    variantId?: string;
    sku?: string;
    checkoutId?: string;
    mandateId?: string;
    orderId?: string;
  };
  basis: {
    status: DecisionResult;
    allowed: boolean;
    reasonCodes: readonly string[];
    components: readonly {
      code: string;
      source: "eligibility" | "authority" | "checkout" | "payment" | "generated_claim";
      field: string;
      value: string | number | boolean | null;
      contributesTo: "status" | "payment_dispatch" | "generated_claim_use";
    }[];
  };
  freshness: {
    evaluatedAt: string;
    validUntil: string | null;
    staleAfter: string | null;
    reasonCodes: readonly string[];
    dependencies: readonly {
      ref: string;
      kind: string;
      validUntil?: string | null;
      staleAfter?: string | null;
      hash?: string | null;
    }[];
  };
  evidenceRefs: readonly { type: string; id: string; hash: string; hashAlgorithm: "sha256" }[];
  nextSafeActions: readonly {
    action: string;
    owner: "system" | "operator" | "buyer" | "merchant";
    reasonCode: string;
  }[];
};

The complete envelope also carries input references, eligibility, authority, checkout, payment, and generated-claim projection state.

Its hashes divide responsibility clearly:

inputDependencyHash protects what was evaluated, including decisionId, surface, action, actor, subject, rules, evidence, and freshness dependencies.
resultHash protects what the platform concluded, including eligibility, authority, checkout, payment, generated claims, freshness, reasons, and next actions.
decisionHash binds those two hashes to the declared contract and schema.

The canonical serializer should produce the same bytes for the same normalized input. Object keys are stable, set-like collections are normalized, and valid date-time inputs are converted to one UTC representation before hashing. The canonical reason invariant is also exact: the set of basis.reasonCodes equals the set of basis.components[].code.
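A minimal sketch of those serializer rules, assuming a Node.js runtime and JSON-shaped input. The helper names (`canonicalize`, `normalizeSet`, `canonicalHash`, `reasonInvariantHolds`) are hypothetical, and the date pattern is a simplifying assumption about which strings count as date-times.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed input format: ISO 8601 date-time with seconds and an explicit "Z" or offset.
const DATE_TIME = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,3})?(Z|[+-]\d{2}:\d{2})$/;

// Canonical form: object keys in sorted order, valid date-times rewritten as UTC.
function canonicalize(value: Json): Json {
  if (typeof value === "string") {
    const time = Date.parse(value);
    return DATE_TIME.test(value) && !Number.isNaN(time) ? new Date(time).toISOString() : value;
  }
  if (Array.isArray(value)) return value.map((item) => canonicalize(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) out[key] = canonicalize(value[key]);
    return out;
  }
  return value;
}

// Set-like collections (for example reason codes) are deduplicated and sorted.
function normalizeSet(values: readonly string[]): string[] {
  return Array.from(new Set(values)).sort();
}

// SHA-256 over the canonical JSON text, hex encoded.
function canonicalHash(value: Json): string {
  return createHash("sha256").update(JSON.stringify(canonicalize(value)), "utf8").digest("hex");
}

// Reason invariant: set(basis.reasonCodes) equals set(basis.components[].code).
function reasonInvariantHolds(basis: {
  reasonCodes: readonly string[];
  components: readonly { code: string }[];
}): boolean {
  const declared = normalizeSet(basis.reasonCodes);
  const derived = normalizeSet(basis.components.map((component) => component.code));
  return declared.length === derived.length && declared.every((code, i) => code === derived[i]);
}
```

In this sketch the caller decides which arrays are set-like and passes them through `normalizeSet` before hashing, because that is schema knowledge the generic serializer does not have.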

The authenticator protects that decision context according to its declared trust model. Before returning an external projection, the boundary should capture any externally supplied or persisted envelope once into a detached, deeply immutable representation. Hash recomputation, reason/component reconciliation, authenticator verification, trusted-key validation, surface binding, live-request binding, freshness evaluation, and projection should all consume that same captured value. A stale envelope must not be projected as allowed. A blocked envelope may still be projected as a refusal so the recipient receives its explanation and remediation context.
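The capture-once rule and the two projection rules can be sketched as follows. This is an assumption-laden illustration: `CapturedEnvelope` is a reduced hypothetical shape, the helper names are invented, and a real boundary would also verify hashes, authenticator, and bindings against the same captured value.

```typescript
// Hypothetical reduced envelope shape, for illustration only.
type CapturedEnvelope = {
  readonly decisionId: string;
  readonly basis: { readonly allowed: boolean; readonly reasonCodes: readonly string[] };
  readonly freshness: { readonly staleAfter: string | null };
};

// Recursively freezes plain JSON-like data.
function deepFreeze(value: unknown): void {
  if (value !== null && typeof value === "object") {
    Object.freeze(value);
    for (const key of Object.keys(value)) deepFreeze((value as Record<string, unknown>)[key]);
  }
}

// Capture once: detach from the caller's object, then freeze the copy.
function captureEnvelope(raw: CapturedEnvelope): CapturedEnvelope {
  const copy = JSON.parse(JSON.stringify(raw)) as CapturedEnvelope;
  deepFreeze(copy);
  return copy;
}

type Projection =
  | { kind: "allowed"; decisionId: string }
  | { kind: "refusal"; decisionId: string; reasonCodes: readonly string[] }
  | { kind: "requires_fresh_decision"; decisionId: string };

function projectCaptured(env: CapturedEnvelope, now: Date): Projection {
  // A blocked envelope may still be projected as a refusal, even when stale.
  if (!env.basis.allowed) {
    return { kind: "refusal", decisionId: env.decisionId, reasonCodes: env.basis.reasonCodes };
  }
  // Assumption: a null staleAfter means no staleness bound was declared.
  // An unparsable bound is treated as stale so the check fails closed.
  let stale = false;
  if (env.freshness.staleAfter !== null) {
    const limit = Date.parse(env.freshness.staleAfter);
    stale = Number.isNaN(limit) || now.getTime() > limit;
  }
  // A stale envelope is never projected as allowed.
  return stale
    ? { kind: "requires_fresh_decision", decisionId: env.decisionId }
    : { kind: "allowed", decisionId: env.decisionId };
}
```

Because every later check reads the frozen copy, a caller that mutates the original object after capture cannot change what gets verified or projected.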

Before changing state, the platform should compare the decision's protected dependency hashes with current authoritative snapshots. A missing, changed, or expired dependency requires a fresh decision. An older envelope may still be signed and within its original freshness window even though it no longer describes the state that controls the operation.
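That pre-mutation check might look like the sketch below. `CurrentSnapshots` is a hypothetical lookup into whatever store holds the authoritative state. Treating a dependency without a hash as a failure ("unpinned") is an assumption of this sketch, chosen so the check fails closed.

```typescript
type FreshnessDependency = {
  ref: string;
  kind: string;
  validUntil?: string | null;
  hash?: string | null;
};

// Hypothetical lookup: returns the current authoritative hash, or undefined if the record is gone.
type CurrentSnapshots = (dep: FreshnessDependency) => string | undefined;

type DependencyCheck =
  | { ok: true }
  | { ok: false; ref: string; reason: "missing" | "changed" | "expired" | "unpinned" };

// Returns the first dependency that no longer matches current state, if any.
function checkDependencies(
  deps: readonly FreshnessDependency[],
  current: CurrentSnapshots,
  now: Date,
): DependencyCheck {
  for (const dep of deps) {
    // Assumption: a dependency without a hash cannot be compared, so it fails closed.
    if (!dep.hash) return { ok: false, ref: dep.ref, reason: "unpinned" };
    const liveHash = current(dep);
    if (liveHash === undefined) return { ok: false, ref: dep.ref, reason: "missing" };
    if (liveHash !== dep.hash) return { ok: false, ref: dep.ref, reason: "changed" };
    if (dep.validUntil) {
      const limit = Date.parse(dep.validUntil);
      // An unparsable bound counts as expired.
      if (Number.isNaN(limit) || now.getTime() > limit) {
        return { ok: false, ref: dep.ref, reason: "expired" };
      }
    }
  }
  return { ok: true };
}
```

Any result other than `ok: true` means the operation should stop and request a fresh decision, even if the old envelope still verifies.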

An abbreviated illustrative prepare_checkout envelope might look like this:

{
  "contractVersion": "agent-commerce-decision-envelope-v4",
  "envelopeSchemaVersion": "agent-commerce-decision-envelope-schema-v4",
  "decisionId": "decision:travel-backpack:prepare-checkout",
  "decisionHash": "1111111111111111111111111111111111111111111111111111111111111111",
  "inputDependencyHash": "2222222222222222222222222222222222222222222222222222222222222222",
  "resultHash": "3333333333333333333333333333333333333333333333333333333333333333",
  "evaluatedAt": "2026-07-06T10:12:00.000Z",
  "ruleSetVersion": "agent-commerce-decision-rules-v4",
  "ruleSetRef": "ruleset:sha256:5555555555555555555555555555555555555555555555555555555555555555",
  "ruleSetHash": "5555555555555555555555555555555555555555555555555555555555555555",
  "authenticator": {
    "kind": "digital_signature",
    "algorithm": "ed25519",
    "format": "detached",
    "keyId": "platform-key:decision-envelope",
    "verificationKeyRef": "public-key:decision-envelope",
    "protectedHash": "1111111111111111111111111111111111111111111111111111111111111111",
    "value": "<base64url-ed25519-signature>",
    "verifiable": true
  },
  "surface": "checkout",
  "subject": {
    "productId": "product:travel-backpack",
    "checkoutId": "checkout:travel-backpack:example"
  },
  "actor": {
    "actorType": "agent",
    "agentId": "agent:reference-shopper",
    "merchantId": "merchant:travel-demo"
  },
  "requestedAction": "prepare_checkout",
  "freshness": {
    "evaluatedAt": "2026-07-06T10:12:00.000Z",
    "validUntil": "2026-07-06T10:15:00.000Z",
    "staleAfter": "2026-07-06T10:15:00.000Z",
    "reasonCodes": ["stale_inventory"],
    "dependencies": [
      {
        "ref": "inventory:travel-backpack:15",
        "kind": "inventory",
        "hash": "7777777777777777777777777777777777777777777777777777777777777777",
        "staleAfter": "2026-07-06T10:15:00.000Z"
      },
      {
        "ref": "policy:returns:travel-bags:missing",
        "kind": "policy",
        "hash": "8888888888888888888888888888888888888888888888888888888888888888"
      }
    ]
  },
  "basis": {
    "status": "blocked",
    "allowed": false,
    "reasonCodes": [
      "eligibility_blocked",
      "invalid_checkout_state",
      "missing_return_policy",
      "stale_inventory"
    ],
    "components": [
      {
        "code": "missing_return_policy",
        "source": "eligibility",
        "field": "eligibility.blockerCodes",
        "value": "missing_return_policy",
        "contributesTo": "status"
      },
      {
        "code": "stale_inventory",
        "source": "eligibility",
        "field": "eligibility.blockerCodes",
        "value": "stale_inventory",
        "contributesTo": "status"
      },
      {
        "code": "eligibility_blocked",
        "source": "eligibility",
        "field": "eligibility.result",
        "value": "blocked",
        "contributesTo": "status"
      },
      {
        "code": "invalid_checkout_state",
        "source": "checkout",
        "field": "checkout.validForRequestedAction",
        "value": false,
        "contributesTo": "status"
      }
    ]
  },
  "eligibility": {
    "result": "blocked",
    "blockerCodes": ["missing_return_policy", "stale_inventory"],
    "source": "combined"
  },
  "authority": {
    "result": "allowed",
    "blockerCodes": []
  },
  "checkout": {
    "state": "requires_revalidation",
    "validForRequestedAction": false,
    "blockerCodes": ["invalid_checkout_state", "missing_return_policy", "stale_inventory"]
  },
  "evidenceRefs": [
    {
      "type": "inventory_fact",
      "id": "inventory:travel-backpack:15",
      "hash": "9999999999999999999999999999999999999999999999999999999999999999",
      "hashAlgorithm": "sha256"
    },
    {
      "type": "policy_fact",
      "id": "policy:returns:travel-bags:missing",
      "hash": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
      "hashAlgorithm": "sha256"
    }
  ],
  "nextSafeActions": [
    {
      "action": "refresh_inventory_facts",
      "owner": "operator",
      "reasonCode": "stale_inventory"
    },
    {
      "action": "attach_return_policy_fact",
      "owner": "operator",
      "reasonCode": "missing_return_policy"
    }
  ]
}

This is more useful than a log message because it can support:

agent responses
operator tasks
protocol projections
support explanations
meaningful checkout and mandate events
scenario tests
replay analysis

The decision envelope becomes the platform’s stable explanation boundary.

Evidence should be referenced, not copied everywhere

Evidence does not mean copying every product record, policy page, checkout snapshot, mandate, provider response, and generated claim into every decision envelope or creating a parallel evidence table for each transition.

That creates duplication, privacy risk, and consistency problems.

A better model stores compact evidence references instead. A canonical evidence reference contains the evidence type, an identifier, an explicit SHA-256 hash of the captured artifact or canonical snapshot, and the hash algorithm.

Authorized systems can use those references to retrieve richer records. This matters for privacy and security. Payment authority records, buyer identity, provider responses, risk decisions, and support notes may contain sensitive information. The buyer-safe explanation should not expose the same detail as the internal audit view.

Evidence used by a canonical decision should be:

addressable
versioned
hash-pinned
permissioned
visible at the right audience level
captured at decision time

Evidence references and freshness dependencies serve different purposes.

An evidence reference, identified by (type, id), points to an artifact that can substantiate or replay the decision. A freshness dependency, identified by (kind, ref), points to an input whose change or expiry may invalidate the decision. The same artifact may serve both roles, but an identifier-only dependency should not be presented as content-pinned evidence. Conflicting hashes for the same evidence identity should fail rather than coexist.

This lets the platform preserve the explanation without exposing everything everywhere.
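The identity and conflict rules for evidence references can be sketched in a few lines. `indexEvidence` and `isContentPinned` are hypothetical helper names used only for this illustration.

```typescript
type EvidenceRef = { type: string; id: string; hash: string; hashAlgorithm: "sha256" };

// Builds an index keyed by (type, id).
// The same identity with a different hash is rejected rather than stored twice.
function indexEvidence(refs: readonly EvidenceRef[]): Map<string, EvidenceRef> {
  const index = new Map<string, EvidenceRef>();
  for (const ref of refs) {
    const key = JSON.stringify([ref.type, ref.id]);
    const existing = index.get(key);
    if (existing !== undefined && existing.hash !== ref.hash) {
      throw new Error(`conflicting hashes for evidence ${ref.type}/${ref.id}`);
    }
    index.set(key, ref);
  }
  return index;
}

// A freshness dependency, identified by (kind, ref), counts as
// content-pinned only when it actually carries a hash.
function isContentPinned(dep: { kind: string; ref: string; hash?: string | null }): boolean {
  return typeof dep.hash === "string" && dep.hash.length > 0;
}
```

Repeating the same reference with the same hash is harmless in this sketch. Only a disagreement about content causes a failure.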

Evidence is a graph, not a flat list

Simple decisions may use a flat list of evidence references.

Complex commerce workflows need a dependency graph because one decision may rely on another decision, which in turn relies on several facts and policies.

For example:

delegate_payment decision
      depends on checkout_state
      depends on payment_mandate
      depends on cart_snapshot
      depends on authority_decision

checkout_state
      depends on prepare_checkout transition

prepare_checkout transition
      depends on product facts
      depends on inventory freshness
      depends on policy coverage
      depends on eligibility rules

policy coverage
      depends on policy facts
      depends on source documents
      depends on review decisions

A simplified evidence graph model:

type EvidenceEdgeType =
  | "derived_from"
  | "blocked_by"
  | "allowed_by"
  | "superseded_by"
  | "projected_as"
  | "invalidated_by"
  | "created_task"
  | "recorded_as";

type EvidenceRef = {
  type: string;
  id: string;
  hash: string;
  hashAlgorithm: "sha256";
};

type EvidenceEdge = {
  from: EvidenceRef;
  to: EvidenceRef;
  type: EvidenceEdgeType;
};

For the Travel Backpack:

delegate_payment decision
      blocked_by → checkout state requires_revalidation

checkout state requires_revalidation
      derived_from → prepare_checkout decision

prepare_checkout decision
      blocked_by → stale_inventory
      blocked_by → missing_return_policy

missing_return_policy
      created_task → approve return-policy coverage for Travel Bags

This graph lets the platform trace from a buyer-facing refusal back to the actual source of the refusal. It also prevents derived claims from laundering refusals: if one projected claim was refused for a surface or use, a later generated claim that depends on it should carry that upstream state.
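As a rough sketch of that trace, the function below walks `blocked_by` and `derived_from` edges from a refused decision down to the nodes that have no further cause. The traversal strategy and the name `rootCauses` are assumptions for illustration, not a prescribed algorithm.

```typescript
type EvidenceRef = { type: string; id: string; hash: string; hashAlgorithm: "sha256" };

type EvidenceEdgeType =
  | "derived_from" | "blocked_by" | "allowed_by" | "superseded_by"
  | "projected_as" | "invalidated_by" | "created_task" | "recorded_as";

type EvidenceEdge = { from: EvidenceRef; to: EvidenceRef; type: EvidenceEdgeType };

const keyOf = (ref: EvidenceRef): string => JSON.stringify([ref.type, ref.id]);

// Follows blocked_by and derived_from edges from a starting node and
// returns the reachable nodes that have no further cause of their own.
function rootCauses(start: EvidenceRef, edges: readonly EvidenceEdge[]): EvidenceRef[] {
  const roots: EvidenceRef[] = [];
  const seen = new Set<string>();
  const stack: EvidenceRef[] = [start];
  while (stack.length > 0) {
    const node = stack.pop() as EvidenceRef;
    const nodeKey = keyOf(node);
    if (seen.has(nodeKey)) continue; // guards against cycles and repeated paths
    seen.add(nodeKey);
    const causes = edges.filter(
      (edge) => keyOf(edge.from) === nodeKey && (edge.type === "blocked_by" || edge.type === "derived_from"),
    );
    if (causes.length === 0 && nodeKey !== keyOf(start)) roots.push(node);
    for (const edge of causes) stack.push(edge.to);
  }
  return roots;
}
```

For the Travel Backpack graph above, starting at the delegate_payment decision, the walk ends at stale_inventory and missing_return_policy.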

At the generated-claim boundary, each direct edge should also have a canonical dependency reference. The reference binds the parent projection hash, source-envelope and evidence-pin hashes, source record, request-context hash, status, requested surface and use, and refusal kind.

The child provenance hash should cover all of those references, including usable parents, in deterministic order. This creates a dependency and audit edge; it does not replace content-pinned evidence.

That is stronger than storing isolated events or a descriptive lineage label.

It also helps explain changes over time. If return-policy coverage is later approved, the platform can show which old blockers were superseded, which generated claims were invalidated, and which feed projections were refreshed.

Explanations are projections of evidence

An explanation is not the same thing as the evidence record.

The evidence record may contain internal rule IDs, policy fact references, source hashes, payment provider references, risk flags, operator notes, and sensitive buyer information.

The explanation should expose only what is safe and useful for the audience.

Different audiences need different projections:

Audience	Needed explanation
Buyer	Clear, concise, safe reason
Agent	Structured blocker codes and next actions
Operator	Remediation details and affected products
Support	Customer-safe timeline and decision summary
Auditor	Full decision path, evidence refs, rule identities, timestamps
Engineer	Diagnostic details, traces, service calls

For the Travel Backpack, the buyer-safe explanation might be:

Checkout cannot be prepared for this item right now because availability must be revalidated and return terms are not available for this product category.

The agent-facing explanation might be:

{
  "result": "blocked",
  "blockers": [
    {
      "code": "stale_inventory",
      "nextAction": "request_inventory_revalidation"
    },
    {
      "code": "missing_return_policy",
      "nextAction": "operator_policy_review_required"
    }
  ]
}

The operator explanation might be:

BAG-TRAVEL-42 is blocked for checkout preparation.

Required remediation:
- Revalidate inventory.
- Attach or approve return-policy coverage for Travel Bags.

Impact:
Agents may discover and compare the product, but checkout preparation and delegated payment are blocked.

The audit view may include evidence references, rule identities, cart snapshots, timestamps, and the projection evidence recorded by the event that exposed the decision.

These are different explanations from the same evidence chain.

The platform should not generate each explanation independently. It should project them from the decision envelope.
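A minimal sketch of that projection step follows. The `CATALOG` lookup, its fallback strings, and the function names are hypothetical; the phrases are taken from the Travel Backpack explanations above.

```typescript
// Hypothetical per-blocker wording catalog; one entry feeds every audience.
const CATALOG: Record<string, { buyer: string; agentNextAction: string; operator: string }> = {
  stale_inventory: {
    buyer: "availability must be revalidated",
    agentNextAction: "request_inventory_revalidation",
    operator: "Revalidate inventory.",
  },
  missing_return_policy: {
    buyer: "return terms are not available for this product category",
    agentNextAction: "operator_policy_review_required",
    operator: "Attach or approve return-policy coverage for Travel Bags.",
  },
};

// Reduced decision shape; the full envelope carries much more.
type Decision = { sku: string; result: "allowed" | "blocked"; reasonCodes: readonly string[] };

// Buyer view: one plain sentence, no codes or internal references.
function projectForBuyer(d: Decision): string {
  if (d.result === "allowed") return "Checkout can be prepared for this item.";
  const phrases = d.reasonCodes.map((code) => CATALOG[code]?.buyer ?? "a required check did not pass");
  return `Checkout cannot be prepared for this item right now because ${phrases.join(" and ")}.`;
}

// Agent view: structured blocker codes with a next action for each.
function projectForAgent(d: Decision): { result: string; blockers: { code: string; nextAction: string }[] } {
  return {
    result: d.result,
    blockers: d.reasonCodes.map((code) => ({
      code,
      nextAction: CATALOG[code]?.agentNextAction ?? "contact_operator",
    })),
  };
}

// Operator view: one remediation line per blocker.
function projectForOperator(d: Decision): string[] {
  return d.reasonCodes.map((code) => CATALOG[code]?.operator ?? `Investigate blocker ${code}.`);
}
```

All three functions read the same `reasonCodes`, so adding or removing a blocker changes every audience's explanation together.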

Protocol-safe does not mean vague

Some teams avoid exposing evidence because they do not want to leak internal details.

That is reasonable.

But the solution should not be vague errors.

Weak protocol response:

{
  "error": "not_allowed"
}

Better protocol-safe response:

{
  "result": "blocked",
  "reason": "missing_return_policy",
  "message": "Return-policy terms are not available for this product category.",
  "nextAction": "operator_review_required"
}

This response does not expose internal policy record IDs, source documents, rule implementation, or operator notes. It preserves the domain meaning.

That matters because agents need to know what to do next.

A generic refusal may cause the agent to retry, ask the buyer the wrong question, or proceed with another unsafe assumption.

A structured refusal helps the agent recover correctly.

stale_inventory:
request revalidation

missing_return_policy:
operator remediation required

invalid_checkout_state:
do not request payment

mandate_expired:
request new authority

cart_snapshot_mismatch:
revalidate checkout or request new mandate

Protocol-safe responses should preserve blocker semantics whenever possible.
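On the agent side, that mapping might be sketched as a lookup from blocker code to recovery step. The `Recovery` names and the `stop_and_escalate` fallback for unknown codes are assumptions of this sketch.

```typescript
type Recovery =
  | "request_revalidation"
  | "wait_for_operator_remediation"
  | "do_not_request_payment"
  | "request_new_authority"
  | "revalidate_checkout_or_request_new_mandate"
  | "stop_and_escalate";

// Blocker codes from the list above, mapped to one recovery step each.
const RECOVERY_BY_BLOCKER = new Map<string, Recovery>([
  ["stale_inventory", "request_revalidation"],
  ["missing_return_policy", "wait_for_operator_remediation"],
  ["invalid_checkout_state", "do_not_request_payment"],
  ["mandate_expired", "request_new_authority"],
  ["cart_snapshot_mismatch", "revalidate_checkout_or_request_new_mandate"],
]);

// Returns the distinct recovery steps for a refusal, in blocker order.
// Assumption: an unknown code stops the flow instead of triggering a blind retry.
function planRecovery(blockerCodes: readonly string[]): Recovery[] {
  const plan: Recovery[] = [];
  for (const code of blockerCodes) {
    const step = RECOVERY_BY_BLOCKER.get(code) ?? "stop_and_escalate";
    if (plan.indexOf(step) === -1) plan.push(step);
  }
  return plan;
}
```

A generic `not_allowed` error would leave this function with nothing to key on, which is the practical cost of a vague refusal.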

Recorded projections reveal adapter drift

Part 5 focused on adapter drift. Different adapters may start answering the same commercial question differently.

Evidence helps detect and prevent that drift.

If an MCP-style tool response, an ACP-style commerce response, an AP2-related payment response, an internal feed, and an admin UI all reference the same decision envelope, they can project different shapes while preserving the same meaning.

The response shape can differ.

The decision should not.

For the Travel Backpack:

Feed:
prepareCheckout = blocked

Policy tool:
return-policy quotation blocked

Commerce adapter:
checkout preparation blocked

Payment adapter:
delegated payment blocked because checkout state is not valid for payment

Admin UI:
inventory revalidation and return-policy coverage tasks required

All of these should trace back to compatible evidence. Before an external surface receives the projection, the boundary should verify envelope integrity, authenticator trust, target-surface binding, freshness, and the protected action, subject, and actor against the live request rather than trusting fields at face value. Independently verifiable projections should require a trusted Ed25519 signature. HMAC is appropriate only for an explicitly configured shared-secret boundary, and unsigned output should remain local-development only.
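The trust-model and binding parts of that check could be sketched as a policy gate. This covers only which authenticator kind a boundary accepts and whether the protected fields match the live request. It does not perform signature verification, hash recomputation, or freshness evaluation, and the `Boundary` and `RequestBinding` names are hypothetical.

```typescript
type AuthenticatorKind = "digital_signature" | "message_authentication_code" | "unsigned";

// Hypothetical boundary classification configured per surface.
type Boundary = "external" | "shared_secret" | "local_development";

// External surfaces need a signature; MACs need a configured shared secret;
// unsigned envelopes are accepted only in local development.
function authenticatorAcceptable(kind: AuthenticatorKind, boundary: Boundary): boolean {
  if (boundary === "external") return kind === "digital_signature";
  if (boundary === "shared_secret") return kind !== "unsigned";
  return true;
}

// Fields the envelope protects, compared with the request being served.
type RequestBinding = { surface: string; requestedAction: string; actorId: string; subjectId: string };

function bindingMatches(protectedFields: RequestBinding, live: RequestBinding): boolean {
  return (
    protectedFields.surface === live.surface &&
    protectedFields.requestedAction === live.requestedAction &&
    protectedFields.actorId === live.actorId &&
    protectedFields.subjectId === live.subjectId
  );
}

// Policy gate only: both checks must pass before projection continues.
function passesPolicyGate(
  kind: AuthenticatorKind,
  boundary: Boundary,
  protectedFields: RequestBinding,
  live: RequestBinding,
): boolean {
  return authenticatorAcceptable(kind, boundary) && bindingMatches(protectedFields, live);
}
```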

The meaningful event that exposes a decision can carry projection evidence showing how the canonical result was presented to a surface:

type ProtocolProjectionEvidence = {
  projectionId: string;
  decisionId: string;
  protocol:
    | "internal_feed"
    | "mcp"
    | "acp"
    | "ucp"
    | "ap2"
    | "marketplace_api"
    | "admin";

  audience:
    | "buyer"
    | "agent"
    | "operator"
    | "support"
    | "internal";

  projectedResult: string;
  projectedBlockerCodes: string[];
  redactedEvidenceRefs: EvidenceRef[];

  createdAt: string;
};

This supports a specific audit question:

The domain blocked checkout for two reasons. What did the agent actually receive?

That question cannot be answered from the domain decision alone.

A correct domain decision can become a misleading external answer if the adapter projects it badly.

Recorded projection evidence is also where scope survives the adapter. If a claim was grounded for discovery, the owning event should make that clear instead of letting the same language become checkout copy. If a claim exists but is refused for this surface, that refusal should be recorded differently from no claim being available.

Audit should therefore preserve projection context beside the decision, without requiring a separate projection table.
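With projection evidence recorded, drift becomes something a test or audit job can check. The sketch below assumes a particular definition of drift: a projection may omit blocker codes for its audience, but it must not change the result or introduce a code the canonical decision does not contain. The type and function names are hypothetical.

```typescript
type CanonicalDecision = { decisionId: string; status: string; reasonCodes: readonly string[] };

// Reduced form of the recorded projection evidence.
type ProjectionRecord = {
  projectionId: string;
  decisionId: string;
  protocol: string;
  projectedResult: string;
  projectedBlockerCodes: readonly string[];
};

type Drift = { projectionId: string; protocol: string; problem: string };

// Compares each recorded projection of a decision with the canonical result.
function detectDrift(decision: CanonicalDecision, projections: readonly ProjectionRecord[]): Drift[] {
  const drift: Drift[] = [];
  for (const p of projections) {
    if (p.decisionId !== decision.decisionId) continue; // belongs to another decision
    if (p.projectedResult !== decision.status) {
      drift.push({
        projectionId: p.projectionId,
        protocol: p.protocol,
        problem: `result ${p.projectedResult} differs from canonical ${decision.status}`,
      });
    }
    for (const code of p.projectedBlockerCodes) {
      if (decision.reasonCodes.indexOf(code) === -1) {
        drift.push({
          projectionId: p.projectionId,
          protocol: p.protocol,
          problem: `blocker ${code} is not part of the canonical decision`,
        });
      }
    }
  }
  return drift;
}
```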

Blocked decisions are product signals

Systems often record successful transactions more carefully than blocked ones.

Agent-ready commerce needs blocked decisions to be first-class audit events.

Blocked decisions explain why the platform refused to answer, quote, mutate, or pay. They also identify operational work.

A blocked prepare_checkout transition should record:

requested command
actor
current checkout state
decision result
blockers
evidence refs
protocol projection
operator tasks created
timestamp

A blocked delegate_payment decision should record:

requested payment action
checkout state
mandate reference
cart snapshot
authority decision
blockers
evidence refs
confirmation that payment was not attempted
timestamp

A blocked generated claim should record:

claim text
unsupported segment
missing or stale dependency
allowed use requested
review state
invalidation reason
timestamp

Blocked events are not noise.

They are the platform’s safety model working.

If the audit trail records only success, it misses the most important part of agent-readiness.

Replayability requires snapshots, hashes, and stable references

Audit is weak if the platform cannot reconstruct the inputs to a decision.

Reconstruction does not always mean storing every full payload forever. A platform can use snapshots, hashes, and stable references.

Important inputs include:

product fact references
policy fact references
eligibility rule identity
authority record reference
cart snapshot
checkout state
payment mandate reference
generated claim reference
protocol request metadata
actor context
buyer context

A simplified replay input model:

type ReplayableDecisionInput = {
  decisionId: string;
  inputDependencyHash: string;

  productFactRefs: EvidenceRef[];
  policyFactRefs: EvidenceRef[];
  ruleSetRef: string;
  ruleSetHash: string;
  envelopeSchemaVersion: string;

  actorContextHash?: string;
  buyerContextHash?: string;

  cartSnapshotId?: string;
  checkoutStateId?: string;
  paymentMandateId?: string;

  requestMetadata: {
    protocol?: string;
    correlationId: string;
    idempotencyKey?: string;
    requestedAt: string;
  };
};

Hashes help detect whether inputs changed.

Stable references identify which facts, rules, states, and authority records were used.

Snapshots preserve mutable commercial state.

References allow controlled access to sensitive records.

Replayability does not mean every decision must be re-executed exactly in production. It means the platform can explain what inputs were used and, when needed, reproduce or approximate the decision path without turning audit into a separate rule system.

Where a transaction coordinator keeps one current spine plus meaningful events, the current record should carry a schema version, canonical hash, revision, and previous hash. Updates should use compare-and-set semantics, and the corresponding event should be appended in the same persistence transaction so the indexed state and audit history cannot drift silently.
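The compare-and-set rule can be illustrated with an in-memory stand-in for such a coordinator. This is a sketch under stated assumptions: `InMemoryCoordinator` is hypothetical, hashes are supplied by the caller, and a real store would perform both writes inside one database transaction.

```typescript
type SpineRecord = {
  schemaVersion: string;
  revision: number;
  hash: string;
  previousHash: string | null;
  state: string;
};

type SpineEvent = { type: string; revision: number; hash: string; previousHash: string | null };

class InMemoryCoordinator {
  private record: SpineRecord;
  private readonly events: SpineEvent[] = [];

  constructor(initial: SpineRecord) {
    this.record = initial;
  }

  current(): SpineRecord {
    return this.record;
  }

  history(): readonly SpineEvent[] {
    return this.events;
  }

  // Compare-and-set: the update applies only if the caller saw the current revision and hash.
  update(
    expected: { revision: number; hash: string },
    next: { state: string; hash: string },
    eventType: string,
  ): boolean {
    if (expected.revision !== this.record.revision || expected.hash !== this.record.hash) {
      return false; // someone else changed the record; the caller must re-read and re-decide
    }
    const updated: SpineRecord = {
      schemaVersion: this.record.schemaVersion,
      revision: this.record.revision + 1,
      hash: next.hash,
      previousHash: this.record.hash,
      state: next.state,
    };
    // In a real store, these two writes belong to the same persistence transaction.
    this.record = updated;
    this.events.push({
      type: eventType,
      revision: updated.revision,
      hash: updated.hash,
      previousHash: updated.previousHash,
    });
    return true;
  }
}
```

The `previousHash` chain means a gap or reordering in the event history is detectable by walking the events, not only by trusting revision numbers.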

Evidence freshness matters

Evidence can become stale.

A decision made yesterday may have been correct yesterday and unsafe today.

That is why evidence records should carry source-reference and time information.

type FreshnessState =
  | "fresh"
  | "stale"
  | "expired"
  | "superseded"
  | "unknown";

type EvidenceFreshness = {
  evidenceRef: EvidenceRef;
  freshness: FreshnessState;
  evaluatedAt: string;
  expiresAt?: string;
  supersededBy?: string;
};

For the Travel Backpack:

Price fact:
fresh at decision time

Inventory fact:
stale at decision time

Return-policy fact:
missing at decision time

Generated description:
pending review at decision time

The audit record should preserve the state at decision time.

The platform also needs to know whether an old decision can still be reused.

A feed projection from yesterday should not be treated as current checkout readiness if inventory freshness expired today.
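One way to evaluate freshness at a given moment is sketched below. The `FreshnessInput` fields, including the `staleAfterMs` budget, are assumptions for illustration, and the order of the checks is one reasonable choice rather than a rule from this series.

```typescript
type FreshnessState = "fresh" | "stale" | "expired" | "superseded" | "unknown";

// Hypothetical timing fields used to derive a freshness state.
type FreshnessInput = {
  observedAt?: string;   // ISO 8601 timestamp of the last source check
  staleAfterMs?: number; // assumed freshness budget for this kind of fact
  expiresAt?: string;
  supersededBy?: string;
};

function evaluateFreshness(input: FreshnessInput, now: Date): FreshnessState {
  if (input.supersededBy) return "superseded";
  if (input.expiresAt && Date.parse(input.expiresAt) <= now.getTime()) return "expired";
  // Without an observation time or a budget, freshness cannot be judged.
  if (!input.observedAt || input.staleAfterMs === undefined) return "unknown";
  const ageMs = now.getTime() - Date.parse(input.observedAt);
  return ageMs > input.staleAfterMs ? "stale" : "fresh";
}

const oneHourMs = 60 * 60 * 1000;
const decisionTime = new Date("2025-01-02T12:00:00Z"); // illustrative timestamp

// Inventory last checked the previous day, with a one-hour freshness budget.
const inventoryState = evaluateFreshness(
  { observedAt: "2025-01-01T09:00:00Z", staleAfterMs: oneHourMs },
  decisionTime,
);
// Price checked thirty minutes before the decision.
const priceState = evaluateFreshness(
  { observedAt: "2025-01-02T11:30:00Z", staleAfterMs: oneHourMs },
  decisionTime,
);

console.log(inventoryState, priceState); // stale fresh
```

Passing `now` explicitly lets the same function answer both questions: what the state was at decision time, and what it is when reuse is being considered.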

Evidence freshness connects Part 2 commercial truth, Part 3 eligibility, Part 8 generated claims, and this article’s audit model.

For current projections, evidence references should include a durable source marker such as a source reference, checksum, or hash. Otherwise a changed policy can leave an old grounded status looking current.

Historical audit can still preserve what was true at the time. Current projection should check the source reference again before returning a surface-safe value.
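The re-check before returning a current projection might look like the sketch below. `SourcePin`, `GroundedProjection`, `ProjectionCheck`, and `checkProjection` are hypothetical names, and the lookup of current hashes is reduced to a map for illustration.

```typescript
// Hypothetical shapes: a projection pins the source hash it was grounded on.
type SourcePin = { sourceRef: string; sourceHash: string };
type GroundedProjection = { value: string; pin: SourcePin };

type ProjectionCheck =
  | { result: "current"; value: string }
  | { result: "needs_regrounding"; reason: "source_changed" | "source_missing" };

// Compare the pinned hash with the source's current hash before reusing the value.
function checkProjection(
  projection: GroundedProjection,
  currentHashes: ReadonlyMap<string, string>,
): ProjectionCheck {
  const currentHash = currentHashes.get(projection.pin.sourceRef);
  if (currentHash === undefined) {
    return { result: "needs_regrounding", reason: "source_missing" };
  }
  if (currentHash !== projection.pin.sourceHash) {
    return { result: "needs_regrounding", reason: "source_changed" };
  }
  return { result: "current", value: projection.value };
}

const inventoryProjection: GroundedProjection = {
  value: "in_stock",
  pin: { sourceRef: "product_fact:inventory:BAG-TRAVEL-42", sourceHash: "hash-v1" },
};

const unchanged = checkProjection(
  inventoryProjection,
  new Map([["product_fact:inventory:BAG-TRAVEL-42", "hash-v1"]]),
);
const changedSource = checkProjection(
  inventoryProjection,
  new Map([["product_fact:inventory:BAG-TRAVEL-42", "hash-v2"]]),
);

console.log(unchanged.result, changedSource.result); // current needs_regrounding
```

The historical record is not modified by this check. Only the decision to serve the old value as current depends on it.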

Audit should support invalidation

When a source fact changes, dependent decisions and claims may need invalidation.

For example:

Return-policy fact added for Travel Bags
      ↓
Previous missing_return_policy blocker may be superseded
      ↓
Generated explanation about missing return policy may be invalidated
      ↓
Operator task may be resolved
      ↓
Feed projection may be refreshed

Audit should record that lifecycle.

type InvalidationEvent = {
  invalidationId: string;

  targetRef: EvidenceRef;

  reason:
    | "source_fact_changed"
    | "policy_fact_changed"
    | "decision_superseded"
    | "claim_invalidated"
    | "checkout_state_changed"
    | "authority_revoked"
    | "manual_review";

  causedBy: EvidenceRef;
  occurredAt: string;
};

This lets the platform answer:

Why did the agent answer differently today?

A good answer might be:

Yesterday, return-policy coverage for Travel Bags was missing, so checkout preparation was blocked.
Today, policy coverage was approved, inventory was revalidated, and the eligibility decision changed.

Without invalidation and audit, changing answers look inconsistent.

With evidence, they become explainable.
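The lifecycle above can be sketched as a walk over a dependency graph. This is a simplified illustration: references are plain strings, `Invalidation` is a reduced stand-in for `InvalidationEvent`, and every downstream event carries the root reason. A fuller version could refine the reason at each hop.

```typescript
type InvalidationReason =
  | "source_fact_changed"
  | "policy_fact_changed"
  | "decision_superseded"
  | "claim_invalidated";

// Reduced stand-in for InvalidationEvent, with string references.
type Invalidation = {
  targetRef: string;
  reason: InvalidationReason;
  causedBy: string;
  occurredAt: string;
};

// dependents maps a reference to the records derived directly from it.
function propagateInvalidation(
  rootRef: string,
  reason: InvalidationReason,
  dependents: ReadonlyMap<string, string[]>,
  occurredAt: string,
): Invalidation[] {
  const events: Invalidation[] = [];
  const seen = new Set<string>([rootRef]);
  const queue: string[] = [rootRef];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const target of dependents.get(current) ?? []) {
      if (seen.has(target)) continue; // guard against cycles and duplicates
      seen.add(target);
      events.push({ targetRef: target, reason, causedBy: current, occurredAt });
      queue.push(target);
    }
  }
  return events;
}

// Hypothetical dependency graph for the Travel Backpack.
const dependencyGraph = new Map<string, string[]>([
  ["policy_fact:return:travel-bags", ["decision:prepare_checkout_001"]],
  ["decision:prepare_checkout_001", ["claim:missing_return_explanation", "projection:agent_feed"]],
]);

const invalidations = propagateInvalidation(
  "policy_fact:return:travel-bags",
  "policy_fact_changed",
  dependencyGraph,
  "2025-01-02T10:30:00Z", // illustrative timestamp
);

console.log(invalidations.map((e) => e.targetRef));
```

Each event records its immediate cause, so the question "why did the agent answer differently today?" can be answered by following `causedBy` back to the changed fact.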

Generated claims must not cite themselves

Part 8 argued that generated claims are derived projections, not commercial truth.

Evidence and audit should enforce that.

A generated policy summary should not use another generated summary as its primary evidence.

A generated checkout explanation should not cite a previous generated explanation instead of checkout state and its blockers.

A generated recommendation should not cite a generated product description instead of product facts and buyer criteria.

This is evidence laundering.

The platform starts with a weak or derived statement, generates a more polished statement from it, and then treats the result as supported.

A safe rule is:

Generated claims may depend on other projected claims for lineage,
but each direct projection should be hash-bound with its source and request context,
and high-risk claims must ultimately trace back to source facts, policy facts, or domain decisions.

For the Travel Backpack, a claim such as:

This backpack can be returned within 30 days.

must trace to a quotable return-policy fact for Travel Bags.

If the only support is another generated summary, the claim should be blocked.
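A check against evidence laundering might be sketched as follows. `ClaimSupport`, `ClaimRecord`, and `tracesToSource` are hypothetical names. The rule implemented here is that at least one chain of support must reach a non-generated source; a stricter policy could require every chain to do so.

```typescript
type ClaimSupport =
  | { kind: "source_fact" | "policy_fact" | "domain_decision"; ref: string }
  | { kind: "generated_claim"; ref: string };

type ClaimRecord = { claimId: string; support: ClaimSupport[] };

// True only if some chain of support reaches a source fact, policy fact, or domain decision.
function tracesToSource(
  claimId: string,
  claims: ReadonlyMap<string, ClaimRecord>,
  visiting: Set<string> = new Set<string>(),
): boolean {
  if (visiting.has(claimId)) return false; // claims citing each other in a cycle are not support
  const claim = claims.get(claimId);
  if (!claim) return false;
  visiting.add(claimId);
  const grounded = claim.support.some((s) =>
    s.kind === "generated_claim" ? tracesToSource(s.ref, claims, visiting) : true,
  );
  visiting.delete(claimId);
  return grounded;
}

// Hypothetical claim records for illustration.
const claimStore = new Map<string, ClaimRecord>([
  ["claim_return_30_days", { claimId: "claim_return_30_days", support: [{ kind: "generated_claim", ref: "summary_a" }] }],
  ["summary_a", { claimId: "summary_a", support: [{ kind: "generated_claim", ref: "summary_b" }] }],
  ["summary_b", { claimId: "summary_b", support: [{ kind: "generated_claim", ref: "summary_a" }] }],
  ["claim_discoverable", { claimId: "claim_discoverable", support: [{ kind: "source_fact", ref: "product_fact:title:BAG-TRAVEL-42" }] }],
]);

console.log(tracesToSource("claim_return_30_days", claimStore)); // false: only generated summaries
console.log(tracesToSource("claim_discoverable", claimStore));   // true: cites a source fact
```

A claim that fails this check would be blocked rather than published, regardless of how polished the generated text is.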

Operator timelines are product features

Evidence and audit are not only for engineers.

Operators need timelines.

For the Travel Backpack, an operator timeline might show:

09:00
Agent requested checkout preparation.

09:00
Checkout blocked.
Blockers:
- stale_inventory
- missing_return_policy

09:01
Operator task created:
Revalidate inventory for BAG-TRAVEL-42.

09:01
Operator task created:
Approve return-policy coverage for Travel Bags.

10:15
Inventory revalidated.

10:30
Return-policy coverage approved.

10:31
Eligibility refreshed.

10:32
Agent feed republished.

10:33
Checkout preparation allowed for EU consumer context.

That timeline is a product capability.

The underlying remediation record can use one compact lifecycle such as open, acknowledged, resolved, and reopened. Repeated observations of the same unresolved blocker should update the existing stream rather than create duplicate tasks, while a blocker that returns after resolution should reopen it.

The timeline helps operators understand what changed, why agents behaved differently, and which remediation resolved the issue.

Without this timeline, operators work from scattered logs, stale feed data, screenshots, and support tickets.

Agent-readiness requires operational visibility because many blockers are not code defects. They are missing facts, stale facts, missing policy coverage, invalidated claims, unreviewed generated text, expired mandates, or stale projections.
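The remediation lifecycle described in this section, with one task per blocker and reopening after resolution, can be sketched as below. `RemediationTask`, `observeBlocker`, and `resolveBlocker` are hypothetical names, and the blocker key format is an assumption.

```typescript
type RemediationStatus = "open" | "acknowledged" | "resolved" | "reopened";

type RemediationTask = {
  blockerKey: string;
  status: RemediationStatus;
  observations: number;
};

// One task per blocker key: repeated observations update it instead of creating duplicates.
function observeBlocker(tasks: Map<string, RemediationTask>, blockerKey: string): RemediationTask {
  const existing = tasks.get(blockerKey);
  if (!existing) {
    const created: RemediationTask = { blockerKey, status: "open", observations: 1 };
    tasks.set(blockerKey, created);
    return created;
  }
  // A blocker that returns after resolution reopens the same task.
  const nextStatus: RemediationStatus = existing.status === "resolved" ? "reopened" : existing.status;
  const updated: RemediationTask = {
    ...existing,
    status: nextStatus,
    observations: existing.observations + 1,
  };
  tasks.set(blockerKey, updated);
  return updated;
}

function resolveBlocker(tasks: Map<string, RemediationTask>, blockerKey: string): RemediationTask | undefined {
  const existing = tasks.get(blockerKey);
  if (!existing) return undefined;
  const updated: RemediationTask = { ...existing, status: "resolved" };
  tasks.set(blockerKey, updated);
  return updated;
}

const tasks = new Map<string, RemediationTask>();
const blockerKey = "stale_inventory:BAG-TRAVEL-42";

observeBlocker(tasks, blockerKey);                     // creates the task
const repeated = observeBlocker(tasks, blockerKey);    // same task, second observation
const resolved = resolveBlocker(tasks, blockerKey);    // inventory revalidated
const returned = observeBlocker(tasks, blockerKey);    // blocker seen again later

console.log(tasks.size, repeated.status, returned.status); // 1 open reopened
```

Keying tasks by blocker rather than by request is what prevents a busy agent from flooding operators with duplicate work items.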

Evidence should drive support responses

Support teams also need evidence.

A buyer may ask:

Why did the agent say it could not buy the backpack?

Support should not need to inspect internal logs manually.

The platform should provide a support-safe answer:

At the time of the request, checkout could not be prepared because inventory needed revalidation and return terms were not available for the Travel Bags category. Payment was not attempted.

That answer is derived from evidence, but redacted for support use.

Support may also need an internal view:

Decision ID:
decision_delegate_payment_001

Checkout state:
checkout_prepare_001

Blockers:
stale_inventory
missing_return_policy

Payment:
not attempted

Agent response:
blocked with invalid_checkout_state

Operator tasks:
inventory revalidation
return-policy coverage

This reduces ambiguity.

Evidence makes customer support part of the product, not an afterthought.

Privacy and retention must be designed

Evidence and audit can contain sensitive data.

Buyer identity, cart contents, payment authority, provider references, support notes, risk decisions, and protocol metadata may all require careful handling.

An evidence system should include:

visibility levels
access controls
redaction rules
retention policies
data minimization
hashing where full data is unnecessary
separation between public explanation and internal evidence

For example, a buyer-safe explanation should not expose internal rule IDs or payment provider references.

An operator view may not need full payment instrument details.

An audit view may need immutable references, not full sensitive payloads.

The goal is not to expose all evidence to everyone. The goal is to preserve the decision path while controlling who can see each part of it.
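A minimal sketch of audience-specific projection follows. The `Audience` values, the `EvidenceField` shape, and the example rule and provider identifiers are assumptions for illustration. Real visibility rules would likely be driven by policy rather than by a list on each field.

```typescript
type Audience = "buyer" | "support" | "operator" | "auditor";

// Hypothetical field-level visibility: each field lists who may see it.
type EvidenceField = { fieldName: string; value: string; visibleTo: Audience[] };

// Build the view for one audience by keeping only the fields it may see.
function projectForAudience(fields: EvidenceField[], audience: Audience): Record<string, string> {
  const view: Record<string, string> = {};
  for (const field of fields) {
    if (field.visibleTo.some((a) => a === audience)) {
      view[field.fieldName] = field.value;
    }
  }
  return view;
}

const evidenceFields: EvidenceField[] = [
  { fieldName: "blockerCode", value: "missing_return_policy", visibleTo: ["buyer", "support", "operator", "auditor"] },
  { fieldName: "ruleId", value: "rule_example_001", visibleTo: ["operator", "auditor"] },
  { fieldName: "providerReference", value: "provider_ref_example", visibleTo: ["auditor"] },
];

const buyerView = projectForAudience(evidenceFields, "buyer");
const auditorView = projectForAudience(evidenceFields, "auditor");

console.log(Object.keys(buyerView));   // the blocker code only
console.log(Object.keys(auditorView)); // all three fields
```

The buyer view keeps the blocker code, so the explanation stays actionable while the rule identity and provider reference stay internal.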

Evidence without access control creates risk.

Audit without retention rules becomes a liability.

Where this model can go wrong

Evidence and audit improve trust, but they introduce their own failure modes.

Treating logs as audit

Logs are useful, but they are not structured decision envelopes. They may be incomplete, inconsistent, redacted incorrectly, or difficult to replay.

Recording outcomes but not inputs

An audit record that says blocked is not enough. The platform needs the facts, policies, rules, state, authority, and context that produced the result.

Recording evidence but not projections

A correct domain decision can become a misleading agent response if the adapter projects it badly. Projections should record the surface, use, source pin, freshness state, scope result, payload integrity result, and inherited refusal state.

Exposing too much evidence externally

Evidence should be permissioned. Buyer-safe, protocol-safe, operator, support, and internal views should differ.

Hiding too much behind generic errors

Over-redaction can make agents and operators unable to recover. Structured blocker codes should survive projection when safe.

Letting evidence become stale without invalidation

Old decisions, feed projections, and generated claims can become unsafe when dependencies change.

Treating generated claims as evidence

Generated claims are derived. High-risk claims should trace back to source facts or domain decisions. If they depend on another generated claim, they should inherit that claim’s projection state and retain a verified direct dependency-projection reference rather than treating the dependency as fresh evidence.

Failing to audit blocked decisions

Blocked decisions are where the safety model is visible. They should be recorded, not discarded.

Making audit unreadable

Audit records that only engineers can interpret will not help operators, support, or compliance reviewers. Different audiences need different views.

Making evidence too expensive to produce

If evidence capture is slow, unreliable, or difficult to integrate, teams will bypass it. Evidence capture should be part of the platform path, not optional paperwork.

Practical tests for evidence and audit

Evidence and audit should be tested directly.

A useful evidence test includes:

Request
Decision result
Expected evidence refs
Expected dependency projection refs
Expected derived provenance hash
Expected blocker codes
Expected projection
Expected audit event
Expected operator work item
Expected invalidation behavior

For the Travel Backpack:

Scenario	Expected evidence/audit behavior
Agent asks to discover product	Decision references approved discoverable product facts
Agent asks to quote return policy	Blocked decision references missing return-policy coverage
Agent asks to prepare checkout	Blocked decision references stale inventory and missing return policy
Agent asks for delegated payment	Action blocked because checkout is invalid; payment authority `not_evaluated`; payment not attempted
Generated return claim is requested	Claim blocked because no quotable return-policy fact exists
Derived claim uses an allowed parent projection	Direct dependency ref and child provenance hash are recorded
Parent projection context or evidence pin changes	Child provenance changes and the prior derived claim is invalidated
Parent projection semantics are altered without a matching hash	Derivation is rejected before publication
Inventory is revalidated	Prior stale-inventory blocker is superseded
Return-policy coverage is approved	Operator work item resolved; policy evidence updated
Feed is republished	Projection references current decision envelopes
Adapter maps blocked decision to generic success	Scenario test fails
Buyer-safe explanation is requested	Internal evidence is redacted but blocker meaning is preserved
Audit reconstruction is requested	Decision input refs, rule identity, and snapshot hashes are available

A simplified expectation type:

type EvidenceAuditScenarioExpectation = {
  requestedAction: AgentCommerceDecisionAction;
  expectedResult: DecisionResult;
  expectedEvidenceKinds: EvidenceKind[];
  expectedBlockerCodes: string[];
  expectedProjectionRecorded: boolean;
  expectedAuditEvent: boolean;
  expectedOperatorWorkItem?: boolean;
};

The goal is not only to test the business result. The tests should also repeat the same normalized input to confirm stable hashes and exact reason/component reconciliation, reject conflicting evidence identities, and verify deterministic direct dependency-projection binding.

They should also prove that tampering with decision identity, surface, freshness, dependencies, hashes, or the authenticator is rejected at the appropriate boundary. Requested action, subject, or actor mismatches must also be rejected. An older envelope must not bypass the latest checkout, mandate, or product-decision state merely because it remains signed and within its freshness window. Audit persistence should separately prevent a stale record revision from silently overwriting newer state.

Together, these tests show whether the platform can explain and verify a decision from its recorded inputs, blockers, projections, dependencies, and audit events.
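A scenario check along these lines could be sketched as follows. `RecordedDecision` and `ScenarioExpectation` are simplified stand-ins for the types used in this series, and `checkScenario` is a hypothetical helper. It covers only a few of the expectations listed above, including the adapter-drift case where a blocked decision is projected as success.

```typescript
type DecisionResult = "allowed" | "blocked";

// Simplified stand-in for what the platform recorded for one request.
type RecordedDecision = {
  result: DecisionResult;
  blockerCodes: string[];
  evidenceKinds: string[];
  projectedStatus: "success" | "blocked";
  auditEventRecorded: boolean;
};

type ScenarioExpectation = {
  expectedResult: DecisionResult;
  expectedBlockerCodes: string[];
  expectedEvidenceKinds: string[];
  expectedAuditEvent: boolean;
};

// Order-insensitive comparison of two string lists.
function sameMembers(a: string[], b: string[]): boolean {
  return a.slice().sort().join("|") === b.slice().sort().join("|");
}

// Returns a list of failure codes; an empty list means the scenario passed.
function checkScenario(actual: RecordedDecision, expected: ScenarioExpectation): string[] {
  const failures: string[] = [];
  if (actual.result !== expected.expectedResult) failures.push("result_mismatch");
  if (!sameMembers(actual.blockerCodes, expected.expectedBlockerCodes)) failures.push("blocker_codes_mismatch");
  for (const kind of expected.expectedEvidenceKinds) {
    if (actual.evidenceKinds.indexOf(kind) === -1) failures.push(`missing_evidence:${kind}`);
  }
  if (actual.auditEventRecorded !== expected.expectedAuditEvent) failures.push("audit_event_mismatch");
  // Adapter drift: a blocked decision must never be projected as success.
  if (actual.result === "blocked" && actual.projectedStatus === "success") {
    failures.push("blocked_projected_as_success");
  }
  return failures;
}

const prepareCheckoutExpectation: ScenarioExpectation = {
  expectedResult: "blocked",
  expectedBlockerCodes: ["stale_inventory", "missing_return_policy"],
  expectedEvidenceKinds: ["product_fact", "policy_fact"],
  expectedAuditEvent: true,
};

const driftedAdapterRun: RecordedDecision = {
  result: "blocked",
  blockerCodes: ["missing_return_policy", "stale_inventory"],
  evidenceKinds: ["product_fact", "policy_fact"],
  projectedStatus: "success", // the adapter hid the block
  auditEventRecorded: true,
};

console.log(checkScenario(driftedAdapterRun, prepareCheckoutExpectation));
```

Here the business result matches, yet the scenario still fails, because the projection contradicts the decision.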

A practical implementation path

A platform can introduce evidence and audit incrementally.

1. Define decision envelopes.

Start with the shared actions: discover, compare, quote_policy, add_to_cart, prepare_checkout, delegate_payment, complete_checkout, show_generated_claim, and explain.

2. Define evidence references.

Use typed references for product facts, policy facts, eligibility rules, authority records, checkout state, cart snapshots, payment mandates, generated claims, and audit events.

3. Attach evidence to allowed and blocked decisions.

Do not record evidence only for failures.

4. Preserve blocker codes.

Use stable domain blocker codes such as stale_inventory, missing_return_policy, invalid_checkout_state, mandate_expired, and cart_snapshot_mismatch.

5. Preserve projection evidence.

Record or reconstruct, within the event or audit path that owns the decision, how the canonical envelope was presented to feeds, tools, commerce adapters, payment adapters, admin UI, and support views.

6. Add replayable inputs.

Store source references, hashes, snapshots, and rule-set identities where exact input reconstruction matters.

7. Add invalidation events.

When product facts, policy facts, eligibility rules, checkout state, payment authority, or generated claims change, record which decisions or projections are superseded.

8. Build audience-specific explanation views.

Separate buyer-safe, protocol-safe, operator, support, auditor, and engineer views.

9. Generate operator timelines.

Show blockers, remediation tasks, state changes, feed refreshes, and decision changes over time.

10. Test evidence paths.

Include evidence references, recorded projection evidence, meaningful audit events, and invalidation behavior in scenario tests.

This path makes evidence part of the platform rather than something reconstructed after incidents.

The tradeoff

Evidence and audit add structure.

They require decision envelopes, evidence references, snapshots, recorded projection evidence, redaction rules, retention rules, invalidation events, and audience-specific explanation views.

For a simple storefront, that may seem heavy.

Agent-ready commerce changes the threshold.

Once agents can answer buyer questions, quote policies, compare products, prepare checkout, operate under delegated authority, or explain payment refusal, the platform needs to explain the decision path from recorded inputs, blockers, and projections.

The tradeoff is between implementation simplicity and explainable trust.

Logs are easier to add.

Decision evidence is easier to rely on, audit, test, replay, project, and defend.

The goal is not to record everything forever. The goal is to record enough structured evidence to explain why the platform answered, refused, mutated state, or escalated at the time it happened.

The evidence layer

Evidence and audit are part of the product in agent-ready commerce.

They are not backend extras.

Agents turn platform decisions into buyer-facing answers and commercial actions. That makes the reasoning behind those decisions part of the user experience, the operator workflow, the support process, and the audit trail.

A safe platform keeps the chain explicit:

Request
      ↓
Decision
      ↓
Evidence record
      ↓
Protocol-safe explanation
      ↓
Projection or state transition
      ↓
Audit event
      ↓
Operator and support timeline

For the Travel Backpack, that means the platform can explain exactly why the product is discoverable and comparable but not policy-quotable, checkout-ready, or eligible for delegated payment.

It can point to stale inventory, missing return-policy coverage, current eligibility decisions, checkout state, payment authority decisions, generated-claim review status, and operator remediation tasks.

That is the difference between a system that merely responds and a system that can explain the basis of its response.

Next in the series

Part 10 will bring the architecture together:

Agent-Ready Commerce, Part 10: The Reference Architecture

That article will connect commercial truth, policy facts, eligibility, adapters, checkout state, payment authority, generated claims, evidence, audit, feeds, and operator workflows into a complete reference architecture for agent-ready commerce platforms.

About the author

Written by Dimitrios S. Sfyris, Founder & Software Architect at AspectSoft.

AspectSoft designs and develops custom software platforms, e-commerce systems, SaaS infrastructure, integrations, analytics tools, and practical digital products.

You can also follow the AspectSoft LinkedIn page for updates on software platforms, commerce systems, AI tooling, and developer-focused products.
