An agent-ready commerce platform needs to do more than answer, refuse, or act.
It needs to prove why it answered, refused, or acted.
That proof cannot be reconstructed reliably from vague logs after something goes wrong. It has to be captured at the moment the platform makes the decision.
When an agent says a product can be compared, the platform should know which product facts supported the comparison.
When an agent refuses to quote a return policy, the platform should know which policy coverage was missing.
When checkout is blocked, the platform should know which facts, rules, policies, state transitions, or authority checks blocked it.
When delegated payment is refused, the platform should know whether the blocker was checkout validity, mandate scope, cart snapshot mismatch, amount limit, currency mismatch, expiry, revocation, confirmation, provider state, or order commitment.
When a generated claim is quoted, the platform should know which source facts, policy facts, decisions, review status, allowed uses, and expiry window made that claim safe at the time it was used.
This is not ordinary logging.
Logs describe activity.
Evidence explains the basis of a decision.
Audit makes that explanation durable.
Agent-ready commerce needs a proof layer.
This is the ninth article in the Agent-Ready Commerce series.
Part 1 introduced the broader architecture model:
Facts → Eligibility → Authority → State transition → Evidence → Audit
Part 2 focused on commercial truth. It argued that catalog data is not enough. A platform needs source-backed, freshness-aware product facts before agents or other systems can safely rely on product information.
Part 3 focused on action eligibility. It argued that “available” is too broad. A product may be discoverable but not checkout-ready, comparable but not policy-quotable, or checkout-ready for a human flow but not eligible for delegated payment.
Part 4 focused on policy structure. It argued that agents should not interpret free-text policy pages as executable rules. Policies need structured facts with applicability, evidence, lifecycle, conflicts, and quoteability.
Part 5 focused on protocol adapters. It argued that ACP, MCP, AP2, feeds, tools, and future interfaces should translate domain decisions rather than becoming separate sources of commercial meaning.
Part 6 focused on checkout state. It argued that checkout is where agent-ready commerce crosses from answering into mutating commercial state. Checkout should therefore be modeled as a state machine, not a form endpoint.
Part 7 focused on delegated payment. It argued that a payment artifact is not enough. Delegated payment requires bounded authority over a specific checkout snapshot, amount, currency, merchant, actor, buyer, time window, evidence, and audit.
Part 8 focused on generated claims. It argued that generated commerce text should be treated as a derived claim with dependencies, scope, review status, allowed use, invalidation, expiry, and audit.
This article focuses on evidence and audit.
The central argument is that evidence and audit are not backend observability details. They are product capabilities. Agent-ready commerce needs decision records, evidence references, replayable inputs, protocol-safe explanations, projection records, operator timelines, and audit trails because agents turn platform decisions into buyer-facing answers and commercial actions.
A safe platform follows this chain:
Request
↓
Decision
↓
Evidence record
↓
Protocol-safe explanation
↓
Projection or state transition
↓
Audit event
↓
Operator and support timeline
Without that chain, the platform may know what happened, but it cannot prove why it happened.
Logs are not evidence
Logs are useful.
They help engineers debug systems. They capture service calls, timings, warnings, exceptions, queue events, request identifiers, and operational details.
But logs are not enough for agent-ready commerce.
A log line such as:
checkout blocked for BAG-TRAVEL-42
does not explain the decision.
It does not say which checkout command was requested. It does not say which actor requested it. It does not say whether inventory was stale, which policy facts were missing, which eligibility rule applied, whether payment authority was present, which checkout state existed before the command, which blocker codes were returned, or what explanation was shown to the agent.
A better record answers:
Which action was requested?
Which actor requested it?
Which product, cart, checkout, claim, or mandate was targeted?
Which facts were selected?
Were those facts fresh?
Which policy facts applied?
Which required facts were missing?
Which rule version evaluated the action?
Which authority record was checked?
Which state transition was attempted?
Which result was produced?
Which explanation was projected?
Which audit event recorded it?
That is evidence.
A useful distinction is:
Log:
A technical record that something happened.
Evidence:
A structured reference to the facts, policies, rules, states, authority records, generated claims, and decisions that justified the result.
Audit:
A durable history of decisions, projections, transitions, explanations, evidence use, invalidations, and remediation.
Logs support debugging.
Evidence supports decision trust.
Audit supports accountability, replay, support, remediation, and dispute handling.
Agent-ready commerce needs all three, but they are not the same layer.
The Travel Backpack example
Continue with the running example from Parts 2–8:
Product: Travel Backpack
SKU: BAG-TRAVEL-42
Price: €129
Catalog status: active
Inventory status: in_stock
Category: Travel Bags
The commercial truth layer currently says:
Price: fresh
Inventory: stale
Return policy: missing for Travel Bags
Warranty policy: known
Shipping policy: known for EU, unknown for US
Generated description: pending review
Feed publication: last published yesterday
From Part 3, the action matrix was:
| Action | Result |
|---|---|
| discover | allowed |
| compare | allowed |
| quote_policy | blocked |
| add_to_cart | requires_revalidation |
| prepare_checkout | blocked |
| delegate_payment | blocked |
Now suppose an agent asks:
Can you buy the Travel Backpack for the buyer if the total stays under €150?
The platform should not produce only this:
{
"allowed": false
}
It should produce a decision with evidence.
A useful response may say:
Delegated payment is blocked because checkout is not valid.
Checkout cannot be prepared because:
- inventory requires revalidation
- return-policy coverage is missing for Travel Bags
Internally, the platform should preserve the decision path:
delegate_payment
↓ blocked because checkout is not valid
prepare_checkout
↓ blocked because inventory is stale
↓ blocked because return-policy coverage is missing
quote_policy
↓ blocked because no return-policy fact applies to Travel Bags
That evidence path matters because different users need different projections.
The buyer may need a concise explanation.
The agent may need structured blocker codes.
The operator may need remediation tasks.
Support may need a customer-safe timeline.
The auditor may need source facts, decision records, rule versions, protocol projections, and timestamps.
The same decision needs multiple views, but it should come from one evidence chain.
Evidence belongs to allowed decisions too
Evidence should not be attached only to failures.
Blocked decisions are important because they explain refusal. Allowed decisions are equally important because they explain reliance.
If an agent is allowed to compare two products, the platform should know which product facts supported the comparison.
If an agent is allowed to quote warranty coverage, the platform should know which warranty policy fact applied.
If checkout is prepared, the platform should know which price, inventory, policy, shipping, tax, and cart snapshot records were used.
If delegated payment is allowed, the platform should know which mandate, cart snapshot, checkout state, amount, currency, merchant binding, expiry check, revocation check, and confirmation record supported the authority decision.
The audit question is not only:
Why did the platform block this?
It is also:
Why did the platform allow this?
A useful rule is:
If the platform can expose an answer or perform an action, it should know what evidence allowed it.
Otherwise, the system may be correct at runtime but unexplainable later.
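The rule above can be enforced at the point where a decision is about to be exposed. The following is a minimal sketch, assuming a trimmed-down decision shape; `GuardedDecision`, `findEvidenceGaps`, and the example ID are hypothetical names used only for illustration.

```typescript
// Trimmed-down, hypothetical shapes; a full decision record carries more fields.
type GuardedEvidenceRef = { kind: string; id: string; capturedAt: string };

type GuardedDecision = {
  decisionId: string;
  action: string;
  result: "allowed" | "blocked";
  evidenceRefs: GuardedEvidenceRef[];
};

// Returns a list of problems; an empty list means the decision may be exposed.
function findEvidenceGaps(decision: GuardedDecision): string[] {
  const problems: string[] = [];
  if (decision.evidenceRefs.length === 0) {
    problems.push(`${decision.decisionId}: ${decision.result} decision has no evidence refs`);
  }
  for (const ref of decision.evidenceRefs) {
    if (ref.capturedAt.trim() === "") {
      problems.push(`${decision.decisionId}: evidence ${ref.id} has no capture time`);
    }
  }
  return problems;
}

// An allowed comparison with nothing behind it is flagged rather than silently exposed.
const unbackedCompare: GuardedDecision = {
  decisionId: "decision_compare_001", // hypothetical ID
  action: "compare",
  result: "allowed",
  evidenceRefs: [],
};
const unbackedCompareGaps = findEvidenceGaps(unbackedCompare);
```

The check treats allowed and blocked results identically, which is the point: an allowed decision without evidence is as much a gap as an unexplained refusal.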
Decision records are the core artifact
A decision record is the object that connects request, context, result, evidence, explanation, and audit.
It is the platform’s proof artifact.
A simplified model:
type DecisionResult =
| "allowed"
| "blocked"
| "requires_review"
| "requires_revalidation"
| "requires_confirmation";
type CommerceAction =
| "discover"
| "compare"
| "quote_policy"
| "add_to_cart"
| "prepare_checkout"
| "delegate_payment"
| "generate_claim"
| "publish_claim";
type DecisionBlocker = {
code: string;
message: string;
nextAction?: string;
};
type EvidenceKind =
| "product_fact"
| "price_fact"
| "inventory_fact"
| "policy_fact"
| "policy_conflict"
| "eligibility_rule"
| "eligibility_decision"
| "authority_record"
| "checkout_state"
| "checkout_transition"
| "cart_snapshot"
| "payment_mandate"
| "payment_attempt"
| "generated_claim"
| "review_decision"
| "operator_task"
| "protocol_projection"
| "audit_event";
type EvidenceVisibility =
| "internal"
| "operator"
| "support"
| "protocol_safe"
| "buyer_safe";
type EvidenceRef = {
kind: EvidenceKind;
id: string;
version?: string;
hash?: string;
visibility: EvidenceVisibility;
capturedAt: string;
};
type CommerceDecisionRecord = {
decisionId: string;
action: CommerceAction;
target: {
productId?: string;
sku?: string;
cartId?: string;
checkoutId?: string;
claimId?: string;
mandateId?: string;
};
actor: {
actorId: string;
actorType: "human" | "agent" | "system";
};
context: {
region?: string;
buyerType?: "consumer" | "business";
channel: "storefront" | "agent" | "marketplace" | "support";
currency?: string;
};
result: DecisionResult;
blockers: DecisionBlocker[];
warnings: DecisionBlocker[];
evidenceRefs: EvidenceRef[];
ruleSetVersion?: string;
inputSnapshotHash?: string;
evaluatedAt: string;
auditEventIds: string[];
};
For the Travel Backpack, a prepare_checkout decision might look like this:
{
"decisionId": "decision_prepare_checkout_001",
"action": "prepare_checkout",
"target": {
"sku": "BAG-TRAVEL-42",
"cartId": "cart_123"
},
"actor": {
"actorId": "agent_abc",
"actorType": "agent"
},
"context": {
"region": "EU",
"buyerType": "consumer",
"channel": "agent",
"currency": "EUR"
},
"result": "blocked",
"blockers": [
{
"code": "INVENTORY_STALE",
"message": "Inventory must be revalidated before checkout can be prepared.",
"nextAction": "Revalidate inventory from the inventory source."
},
{
"code": "RETURN_POLICY_MISSING",
"message": "Return-policy coverage is missing for the Travel Bags category.",
"nextAction": "Attach or approve return-policy coverage for Travel Bags."
}
],
"warnings": [],
"evidenceRefs": [
{
"kind": "inventory_fact",
"id": "truth:BAG-TRAVEL-42:inventory",
"version": "2026-06-28T08:30:00Z",
"visibility": "operator",
"capturedAt": "2026-06-28T09:00:00Z"
},
{
"kind": "policy_fact",
"id": "policy:returns:travel_bags",
"visibility": "operator",
"capturedAt": "2026-06-28T09:00:00Z"
},
{
"kind": "eligibility_rule",
"id": "rule:prepare_checkout:policy_coverage_required",
"version": "2026-06-01",
"visibility": "internal",
"capturedAt": "2026-06-28T09:00:00Z"
}
],
"ruleSetVersion": "eligibility_rules_2026_06",
"inputSnapshotHash": "hash_prepare_checkout_inputs_001",
"evaluatedAt": "2026-06-28T09:00:00Z",
"auditEventIds": ["audit_evt_001"]
}
This is more useful than a log message because it can support:
agent responses
operator tasks
protocol projections
support explanations
audit review
scenario tests
replay analysis
The decision record becomes the platform’s stable explanation boundary.
Evidence should be referenced, not copied everywhere
Evidence does not mean copying every product record, policy page, checkout snapshot, mandate, provider response, and generated claim into every decision.
That creates duplication, privacy risk, and consistency problems.
A better model stores evidence references.
The decision records the identity, version, hash, visibility, and capture time of the evidence it used.
The referenced evidence can then be retrieved by authorized systems.
This matters for privacy and security. Payment authority records, buyer identity, provider responses, risk decisions, and support notes may contain sensitive information. The buyer-safe explanation should not expose the same detail as the internal audit view.
Evidence should be:
addressable
versioned
hashable where useful
permissioned
visible at the right audience level
captured at decision time
This lets the platform preserve proof without exposing everything everywhere.
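One way to apply "visible at the right audience level" is to filter references by the viewer's clearance. This sketch assumes the five visibility values form a strict ladder from least to most sensitive; the article does not define that ordering, so `SENSITIVITY` and `refsVisibleTo` are illustrative assumptions.

```typescript
type RefVisibility = "buyer_safe" | "protocol_safe" | "support" | "operator" | "internal";

type VisibleRef = { kind: string; id: string; visibility: RefVisibility };

// Assumption: visibility levels form a ladder from least to most sensitive.
const SENSITIVITY: Record<RefVisibility, number> = {
  buyer_safe: 0,
  protocol_safe: 1,
  support: 2,
  operator: 3,
  internal: 4,
};

// Keeps only the references a viewer with the given clearance may see.
function refsVisibleTo(refs: VisibleRef[], clearance: RefVisibility): VisibleRef[] {
  return refs.filter((ref) => SENSITIVITY[ref.visibility] <= SENSITIVITY[clearance]);
}

// The three references from the prepare_checkout example.
const checkoutRefs: VisibleRef[] = [
  { kind: "inventory_fact", id: "truth:BAG-TRAVEL-42:inventory", visibility: "operator" },
  { kind: "policy_fact", id: "policy:returns:travel_bags", visibility: "operator" },
  { kind: "eligibility_rule", id: "rule:prepare_checkout:policy_coverage_required", visibility: "internal" },
];

const operatorRefs = refsVisibleTo(checkoutRefs, "operator"); // two refs
const buyerRefs = refsVisibleTo(checkoutRefs, "buyer_safe"); // none
```

A real platform may need per-role permissions rather than a single ladder; the ladder is only the simplest model that keeps the buyer view narrower than the audit view.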
Evidence is a graph, not a flat list
Simple decisions may use a flat list of evidence references.
Complex commerce workflows need a graph.
For example:
delegate_payment decision
depends on checkout_state
depends on payment_mandate
depends on cart_snapshot
depends on authority_decision
checkout_state
depends on prepare_checkout decision
prepare_checkout decision
depends on product facts
depends on inventory freshness
depends on policy coverage
depends on eligibility rules
policy coverage
depends on policy facts
depends on source documents
depends on review decisions
A simplified evidence graph model:
type EvidenceEdgeType =
| "derived_from"
| "blocked_by"
| "allowed_by"
| "superseded_by"
| "projected_as"
| "invalidated_by"
| "created_task"
| "recorded_as";
type EvidenceEdge = {
from: EvidenceRef;
to: EvidenceRef;
type: EvidenceEdgeType;
};
For the Travel Backpack:
delegate_payment decision
blocked_by → checkout_not_valid decision
checkout_not_valid decision
blocked_by → prepare_checkout decision
prepare_checkout decision
blocked_by → INVENTORY_STALE
blocked_by → RETURN_POLICY_MISSING
RETURN_POLICY_MISSING
created_task → approve return-policy coverage for Travel Bags
This graph lets the platform trace from a buyer-facing refusal back to the actual source of the refusal.
That is stronger than storing isolated events.
It also helps explain changes over time. If return-policy coverage is later approved, the platform can show which old blockers were superseded, which generated claims were invalidated, and which feed projections were refreshed.
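Tracing a refusal back to its source can be sketched as a walk over blocked_by edges. The example below uses plain string IDs instead of full evidence references to stay short; the node IDs and the `rootBlockers` helper are hypothetical.

```typescript
type TraceEdgeType = "blocked_by" | "created_task";
type TraceEdge = { from: string; to: string; type: TraceEdgeType };

// The Travel Backpack graph, with illustrative node IDs.
const backpackEdges: TraceEdge[] = [
  { from: "decision:delegate_payment", to: "decision:checkout_not_valid", type: "blocked_by" },
  { from: "decision:checkout_not_valid", to: "decision:prepare_checkout", type: "blocked_by" },
  { from: "decision:prepare_checkout", to: "blocker:INVENTORY_STALE", type: "blocked_by" },
  { from: "decision:prepare_checkout", to: "blocker:RETURN_POLICY_MISSING", type: "blocked_by" },
  { from: "blocker:RETURN_POLICY_MISSING", to: "task:approve_return_policy_travel_bags", type: "created_task" },
];

// Follows blocked_by edges depth-first and returns the nodes with no further blocked_by edge.
function rootBlockers(start: string, edges: TraceEdge[]): string[] {
  const roots: string[] = [];
  const visited = new Set<string>();
  const stack: string[] = [start];
  while (stack.length > 0) {
    const node = stack.pop() as string;
    if (visited.has(node)) continue; // tolerate cycles and shared dependencies
    visited.add(node);
    const next = edges
      .filter((edge) => edge.type === "blocked_by" && edge.from === node)
      .map((edge) => edge.to);
    if (next.length === 0 && node !== start) roots.push(node);
    stack.push(...next);
  }
  return roots.sort();
}

const paymentRootCauses = rootBlockers("decision:delegate_payment", backpackEdges);
// → ["blocker:INVENTORY_STALE", "blocker:RETURN_POLICY_MISSING"]
```

The created_task edge is ignored by the walk, so remediation tasks hang off root blockers without being mistaken for causes.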
Explanations are projections of evidence
An explanation is not the same thing as the evidence record.
The evidence record may contain internal rule IDs, policy fact references, source hashes, payment provider references, risk flags, operator notes, and sensitive buyer information.
The explanation should expose only what is safe and useful for the audience.
Different audiences need different projections:
| Audience | Needed explanation |
|---|---|
| Buyer | Clear, concise, safe reason |
| Agent | Structured blocker codes and next actions |
| Operator | Remediation details and affected products |
| Support | Customer-safe timeline and decision summary |
| Auditor | Full decision path, evidence refs, rule versions, timestamps |
| Engineer | Diagnostic details, traces, service calls |
For the Travel Backpack, the buyer-safe explanation might be:
Checkout cannot be prepared for this item right now because availability must be revalidated and return terms are not available for this product category.
The agent-facing explanation might be:
{
"result": "blocked",
"blockers": [
{
"code": "INVENTORY_STALE",
"nextAction": "request_inventory_revalidation"
},
{
"code": "RETURN_POLICY_MISSING",
"nextAction": "operator_policy_review_required"
}
]
}
The operator explanation might be:
BAG-TRAVEL-42 is blocked for checkout preparation.
Required remediation:
- Revalidate inventory.
- Attach or approve return-policy coverage for Travel Bags.
Impact:
Agents may discover and compare the product, but checkout preparation and delegated payment are blocked.
The audit view may include evidence references, rule versions, cart snapshots, timestamps, and projection records.
These are different explanations from the same evidence chain.
The platform should not generate each explanation independently. It should project them from the decision record.
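Projecting from one record might look like the following sketch. It assumes a reduced decision shape and a hypothetical phrase table (`BUYER_PHRASES`) for buyer-safe wording; neither is part of the article's model.

```typescript
type ProjectableBlocker = { code: string; message: string; nextAction?: string };
type ProjectableDecision = { decisionId: string; result: string; blockers: ProjectableBlocker[] };

// Hypothetical buyer-safe wording per blocker code.
const BUYER_PHRASES: Record<string, string> = {
  INVENTORY_STALE: "availability must be revalidated",
  RETURN_POLICY_MISSING: "return terms are not available for this product category",
};

// Agent view: structured codes and next actions, no internal messages.
function projectForAgent(decision: ProjectableDecision) {
  return {
    result: decision.result,
    blockers: decision.blockers.map((b) => ({ code: b.code, nextAction: b.nextAction })),
  };
}

// Buyer view: one concise sentence derived from the same blockers.
function projectForBuyer(decision: ProjectableDecision): string {
  if (decision.blockers.length === 0) return "This request can proceed.";
  const reasons = decision.blockers.map(
    (b) => BUYER_PHRASES[b.code] ?? "a required check did not pass",
  );
  return `This request cannot proceed right now because ${reasons.join(" and ")}.`;
}

const backpackCheckoutDecision: ProjectableDecision = {
  decisionId: "decision_prepare_checkout_001",
  result: "blocked",
  blockers: [
    { code: "INVENTORY_STALE", message: "Inventory must be revalidated.", nextAction: "request_inventory_revalidation" },
    { code: "RETURN_POLICY_MISSING", message: "Return-policy coverage is missing.", nextAction: "operator_policy_review_required" },
  ],
};

const agentView = projectForAgent(backpackCheckoutDecision);
const buyerView = projectForBuyer(backpackCheckoutDecision);
```

Both views read the same blocker list, so adding or resolving a blocker changes every audience's explanation at once.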
Protocol-safe does not mean vague
Some teams avoid exposing evidence because they do not want to leak internal details.
That is reasonable.
But the solution should not be vague errors.
Weak protocol response:
{
"error": "not_allowed"
}
Better protocol-safe response:
{
"result": "blocked",
"reason": "RETURN_POLICY_MISSING",
"message": "Return-policy terms are not available for this product category.",
"nextAction": "operator_review_required"
}
This response does not expose internal policy record IDs, source documents, rule implementation, or operator notes. It preserves the domain meaning.
That matters because agents need to know what to do next.
A generic refusal may cause the agent to retry, ask the buyer the wrong question, or proceed with another unsafe assumption.
A structured refusal helps the agent recover correctly.
INVENTORY_STALE:
request revalidation
RETURN_POLICY_MISSING:
operator remediation required
CHECKOUT_NOT_VALID:
do not request payment
MANDATE_EXPIRED:
request new authority
CART_SNAPSHOT_MISMATCH:
revalidate checkout or request new mandate
Protocol-safe responses should preserve blocker semantics whenever possible.
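On the agent side, the mapping above can be consumed as a lookup table. This is a sketch only: the `RecoveryStep` names paraphrase the recovery hints listed above, and the fallback for unrecognized codes (`escalate_unknown_blocker`) is an assumption.

```typescript
type RecoveryStep =
  | "request_revalidation"
  | "wait_for_operator_remediation"
  | "do_not_request_payment"
  | "request_new_authority"
  | "revalidate_checkout_or_request_new_mandate"
  | "escalate_unknown_blocker";

const RECOVERY_BY_BLOCKER = new Map<string, RecoveryStep>([
  ["INVENTORY_STALE", "request_revalidation"],
  ["RETURN_POLICY_MISSING", "wait_for_operator_remediation"],
  ["CHECKOUT_NOT_VALID", "do_not_request_payment"],
  ["MANDATE_EXPIRED", "request_new_authority"],
  ["CART_SNAPSHOT_MISMATCH", "revalidate_checkout_or_request_new_mandate"],
]);

// Maps blocker codes to recovery steps, keeping order and dropping duplicates.
function planRecovery(blockerCodes: string[]): RecoveryStep[] {
  const steps: RecoveryStep[] = [];
  for (const code of blockerCodes) {
    // Assumption: an unrecognized code is escalated rather than retried.
    const step: RecoveryStep = RECOVERY_BY_BLOCKER.get(code) ?? "escalate_unknown_blocker";
    if (steps.indexOf(step) === -1) steps.push(step);
  }
  return steps;
}

const backpackRecovery = planRecovery(["INVENTORY_STALE", "RETURN_POLICY_MISSING"]);
// → ["request_revalidation", "wait_for_operator_remediation"]
```

A table like this only works if blocker codes stay stable across adapters, which is why a generic not_allowed error leaves the agent with nothing to look up.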
Projection records prevent adapter drift
Part 5 focused on adapter drift. Different adapters may start answering the same commercial question differently.
Evidence helps detect and prevent that drift.
If an MCP-style tool response, an ACP-style commerce response, an AP2-related payment response, an internal feed, and an admin UI all reference the same decision record, they can project different shapes while preserving the same meaning.
The response shape can differ.
The decision should not.
For the Travel Backpack:
Feed:
prepareCheckout = blocked
Policy tool:
return-policy quotation blocked
Commerce adapter:
checkout preparation blocked
Payment adapter:
delegated payment blocked because checkout is invalid
Admin UI:
inventory revalidation and return-policy coverage tasks required
All of these should trace back to compatible evidence.
A projection record captures how a decision was exposed to a surface:
type ProtocolProjectionRecord = {
projectionId: string;
decisionId: string;
protocol:
| "internal_feed"
| "mcp"
| "acp"
| "ap2"
| "marketplace_api"
| "admin";
audience:
| "buyer"
| "agent"
| "operator"
| "support"
| "internal";
projectedResult: string;
projectedBlockerCodes: string[];
redactedEvidenceRefs: EvidenceRef[];
createdAt: string;
};
This supports a specific audit question:
The domain blocked checkout for two reasons. What did the agent actually receive?
That question cannot be answered from the domain decision alone.
A correct domain decision can become a misleading external answer if the adapter projects it badly.
Audit should therefore record projections, not only decisions.
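Once projections are recorded, drift between the domain decision and what a surface received can be checked mechanically. The sketch below assumes adapters record the projected result in the domain vocabulary; `findProjectionDrift` and both reduced shapes are hypothetical.

```typescript
type DomainDecisionSummary = { decisionId: string; result: string; blockerCodes: string[] };

type SurfaceProjection = {
  projectionId: string;
  decisionId: string;
  protocol: string;
  projectedResult: string;
  projectedBlockerCodes: string[];
};

// Returns human-readable drift findings; an empty list means the projection is faithful.
function findProjectionDrift(decision: DomainDecisionSummary, projection: SurfaceProjection): string[] {
  const issues: string[] = [];
  if (projection.decisionId !== decision.decisionId) {
    return [`projection ${projection.projectionId} references a different decision`];
  }
  if (projection.projectedResult !== decision.result) {
    issues.push(`result drift: domain=${decision.result}, projected=${projection.projectedResult}`);
  }
  for (const code of projection.projectedBlockerCodes) {
    if (decision.blockerCodes.indexOf(code) === -1) issues.push(`invented blocker: ${code}`);
  }
  for (const code of decision.blockerCodes) {
    // Dropped codes may be intentional redaction; they are reported so a reviewer can decide.
    if (projection.projectedBlockerCodes.indexOf(code) === -1) issues.push(`dropped blocker: ${code}`);
  }
  return issues;
}

const domainCheckout: DomainDecisionSummary = {
  decisionId: "decision_prepare_checkout_001",
  result: "blocked",
  blockerCodes: ["INVENTORY_STALE", "RETURN_POLICY_MISSING"],
};

// A misbehaving adapter that flattened the refusal into a generic success.
const flattenedProjection: SurfaceProjection = {
  projectionId: "projection_001", // hypothetical ID
  decisionId: "decision_prepare_checkout_001",
  protocol: "mcp",
  projectedResult: "success",
  projectedBlockerCodes: [],
};

const driftFindings = findProjectionDrift(domainCheckout, flattenedProjection);
```

Here `driftFindings` contains one result-drift entry and two dropped-blocker entries, which is the kind of evidence needed to answer what the agent actually received.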
Blocked decisions are product signals
Systems often record successful transactions more carefully than blocked ones.
Agent-ready commerce needs blocked decisions to be first-class audit events.
Blocked decisions explain why the platform refused to answer, quote, mutate, or pay. They also identify operational work.
A blocked prepare_checkout transition should record:
requested command
actor
current checkout state
decision result
blockers
evidence refs
protocol projection
operator tasks created
timestamp
A blocked delegate_payment decision should record:
requested payment action
checkout state
mandate reference
cart snapshot
authority decision
blockers
evidence refs
payment not attempted
timestamp
A blocked generated claim should record:
claim text
unsupported segment
missing or stale dependency
allowed use requested
review state
invalidation reason
timestamp
Blocked events are not noise.
They are the platform’s safety model working.
If the audit trail records only success, it misses the most important part of agent-readiness.
Replayability requires snapshots, versions, and hashes
Audit is weak if the platform cannot reconstruct the inputs to a decision.
Reconstruction does not always mean storing every full payload forever. A platform can use snapshots, versions, hashes, and references.
Important inputs include:
product fact versions
policy fact versions
eligibility rule version
authority record version
cart snapshot
checkout state
payment mandate version
generated claim version
protocol request metadata
actor context
buyer context
A simplified replay input model:
type ReplayableDecisionInput = {
decisionId: string;
inputSnapshotHash: string;
productFactRefs: EvidenceRef[];
policyFactRefs: EvidenceRef[];
ruleSetVersion: string;
actorContextHash?: string;
buyerContextHash?: string;
cartSnapshotId?: string;
checkoutStateId?: string;
paymentMandateId?: string;
requestMetadata: {
protocol?: string;
correlationId: string;
idempotencyKey?: string;
requestedAt: string;
};
};
Hashes help detect whether inputs changed.
Versions identify which facts or rules were used.
Snapshots preserve mutable commercial state.
References allow controlled access to sensitive records.
Replayability does not mean every decision must be re-executed exactly in production. It means the platform can explain what inputs were used and, when needed, reproduce or approximate the decision path for audit, testing, and incident analysis.
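An input snapshot hash is only useful if the same inputs always serialize the same way. The sketch below canonicalizes inputs by sorting object keys before hashing. The hash is a small FNV-1a variant over UTF-16 code units, used here only to keep the example dependency-free; it is not collision-resistant, and a real platform would more likely use a cryptographic hash such as SHA-256. All function names are hypothetical.

```typescript
type SnapshotValue =
  | string
  | number
  | boolean
  | null
  | SnapshotValue[]
  | { [key: string]: SnapshotValue };

// Serializes with object keys sorted so that key order does not change the output.
function canonicalize(value: SnapshotValue): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  const keys = Object.keys(value).sort();
  const members = keys.map((key) => `${JSON.stringify(key)}:${canonicalize(value[key] as SnapshotValue)}`);
  return `{${members.join(",")}}`;
}

// Stand-in 32-bit hash (FNV-1a over UTF-16 code units). Not suitable for tamper evidence.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function inputSnapshotHash(inputs: SnapshotValue): string {
  return fnv1a32(canonicalize(inputs));
}

// Same inputs in a different key order produce the same hash.
const snapshotHashA = inputSnapshotHash({
  sku: "BAG-TRAVEL-42",
  ruleSetVersion: "eligibility_rules_2026_06",
  inventoryVersion: "2026-06-28T08:30:00Z",
});
const snapshotHashB = inputSnapshotHash({
  inventoryVersion: "2026-06-28T08:30:00Z",
  ruleSetVersion: "eligibility_rules_2026_06",
  sku: "BAG-TRAVEL-42",
});
```

Comparing a stored hash with one recomputed from the referenced versions is a cheap first check before a fuller replay: a mismatch means at least one input has changed since the decision.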
Evidence freshness matters
Evidence can become stale.
A decision made yesterday may have been correct yesterday and unsafe today.
That is why evidence records should carry version and time information.
type FreshnessState =
| "fresh"
| "stale"
| "expired"
| "superseded"
| "unknown";
type EvidenceFreshness = {
evidenceRef: EvidenceRef;
freshness: FreshnessState;
evaluatedAt: string;
expiresAt?: string;
supersededBy?: string;
};
For the Travel Backpack:
Price fact:
fresh at decision time
Inventory fact:
stale at decision time
Return-policy fact:
missing at decision time
Generated description:
pending review at decision time
The audit record should preserve the state at decision time.
The platform also needs to know whether an old decision can still be reused.
A feed projection from yesterday should not be treated as current checkout readiness if inventory freshness expired today.
Evidence freshness connects Part 2 commercial truth, Part 3 eligibility, Part 8 generated claims, and this article’s audit model.
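Freshness at decision time can be derived from timestamps rather than stored as a mutable flag. This is a sketch under assumptions: `staleAfter` and `supersededAt` are hypothetical fields, and the 08:45 staleness threshold in the example is invented for illustration.

```typescript
type FreshnessAtTime = "fresh" | "stale" | "expired" | "superseded" | "unknown";

type TimedEvidence = {
  id: string;
  capturedAt?: string; // ISO 8601 timestamps throughout
  staleAfter?: string; // hypothetical: when the fact needs revalidation
  expiresAt?: string;
  supersededAt?: string; // hypothetical: when a newer version replaced this one
};

// Evaluates freshness as of a given moment, so past decisions can be re-examined.
function freshnessAt(evidence: TimedEvidence, at: string): FreshnessAtTime {
  const moment = Date.parse(at);
  if (evidence.capturedAt === undefined || isNaN(moment)) return "unknown";
  const reached = (limit?: string): boolean =>
    limit !== undefined && moment >= Date.parse(limit);
  if (reached(evidence.supersededAt)) return "superseded";
  if (reached(evidence.expiresAt)) return "expired";
  if (reached(evidence.staleAfter)) return "stale";
  return "fresh";
}

const inventoryFact: TimedEvidence = {
  id: "truth:BAG-TRAVEL-42:inventory",
  capturedAt: "2026-06-28T08:30:00Z",
  staleAfter: "2026-06-28T08:45:00Z", // invented threshold for the example
};

const inventoryAtDecision = freshnessAt(inventoryFact, "2026-06-28T09:00:00Z"); // "stale"
const inventoryEarlier = freshnessAt(inventoryFact, "2026-06-28T08:40:00Z"); // "fresh"
```

Passing the evaluation time explicitly is the design choice that matters: the same evidence can be fresh for one decision and stale for a later one, and both answers remain reproducible.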
Audit should support invalidation
When a source fact changes, dependent decisions and claims may need invalidation.
For example:
Return-policy fact added for Travel Bags
↓
Previous RETURN_POLICY_MISSING blocker may be superseded
↓
Generated explanation about missing return policy may be invalidated
↓
Operator task may be resolved
↓
Feed projection may be refreshed
Audit should record that lifecycle.
type InvalidationEvent = {
invalidationId: string;
targetRef: EvidenceRef;
reason:
| "source_fact_changed"
| "policy_fact_changed"
| "decision_superseded"
| "claim_invalidated"
| "checkout_state_changed"
| "authority_revoked"
| "manual_review";
causedBy: EvidenceRef;
occurredAt: string;
};
This lets the platform answer:
Why did the agent answer differently today?
A good answer might be:
Yesterday, return-policy coverage for Travel Bags was missing, so checkout preparation was blocked.
Today, policy coverage was approved, inventory was revalidated, and the eligibility decision changed.
Without invalidation and audit, changing answers look inconsistent.
With evidence, they become explainable.
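The question of why the agent answered differently can be answered by comparing two recorded decisions. A minimal sketch, assuming a reduced decision shape; `diffDecisions`, the second decision ID, and its timestamp are hypothetical.

```typescript
type DecisionSnapshot = {
  decisionId: string;
  result: string;
  blockerCodes: string[];
  evaluatedAt: string;
};

type DecisionChange = {
  resultChanged: boolean;
  resolvedBlockers: string[];
  newBlockers: string[];
  unchangedBlockers: string[];
};

// Compares an earlier and a later decision for the same action and target.
function diffDecisions(before: DecisionSnapshot, after: DecisionSnapshot): DecisionChange {
  return {
    resultChanged: before.result !== after.result,
    resolvedBlockers: before.blockerCodes.filter((code) => after.blockerCodes.indexOf(code) === -1),
    newBlockers: after.blockerCodes.filter((code) => before.blockerCodes.indexOf(code) === -1),
    unchangedBlockers: before.blockerCodes.filter((code) => after.blockerCodes.indexOf(code) !== -1),
  };
}

const checkoutYesterday: DecisionSnapshot = {
  decisionId: "decision_prepare_checkout_001",
  result: "blocked",
  blockerCodes: ["INVENTORY_STALE", "RETURN_POLICY_MISSING"],
  evaluatedAt: "2026-06-28T09:00:00Z",
};

const checkoutToday: DecisionSnapshot = {
  decisionId: "decision_prepare_checkout_002", // hypothetical follow-up decision
  result: "allowed",
  blockerCodes: [],
  evaluatedAt: "2026-06-29T09:00:00Z", // hypothetical timestamp
};

const checkoutChange = diffDecisions(checkoutYesterday, checkoutToday);
```

The diff says which blockers disappeared; the invalidation events then say what caused each one to disappear, which together produce an answer like the one quoted above.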
Generated claims must not cite themselves
Part 8 argued that generated claims are derived projections, not commercial truth.
Evidence and audit should enforce that.
A generated policy summary should not use another generated summary as its primary evidence.
A generated checkout explanation should not cite a previous generated explanation instead of the checkout decision.
A generated recommendation should not cite a generated product description instead of product facts and buyer criteria.
This is evidence laundering.
The platform starts with a weak or derived statement, generates a more polished statement from it, and then treats the result as supported.
A safe rule is:
Generated claims may reference other generated claims for lineage,
but high-risk claims must ultimately trace back to source facts, policy facts, or domain decisions.
For the Travel Backpack, a claim such as:
This backpack can be returned within 30 days.
must trace to a quoteable return-policy fact for Travel Bags.
If the only support is another generated summary, the claim should be blocked.
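The laundering rule can be checked by walking a claim's support chain until something other than a generated claim is found. This sketch takes the minimal reading, where one grounded path is enough; a stricter policy might require every branch to be grounded. The node shape, IDs, and `tracesToGroundedSupport` are hypothetical.

```typescript
type SupportKind = "source_fact" | "policy_fact" | "domain_decision" | "generated_claim";
type SupportNode = { id: string; kind: SupportKind; supportedBy: string[] };

function indexSupport(nodes: SupportNode[]): Map<string, SupportNode> {
  const byId = new Map<string, SupportNode>();
  for (const node of nodes) byId.set(node.id, node);
  return byId;
}

// True if the claim reaches at least one fact or decision through its support chain.
function tracesToGroundedSupport(claimId: string, byId: Map<string, SupportNode>): boolean {
  const visited = new Set<string>();
  const stack: string[] = [claimId];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (visited.has(id)) continue; // guards against claims that cite each other in a loop
    visited.add(id);
    const node = byId.get(id);
    if (node === undefined) continue; // dangling reference counts as no support
    if (node.kind !== "generated_claim") return true;
    stack.push(...node.supportedBy);
  }
  return false;
}

// "This backpack can be returned within 30 days." backed only by another generated summary.
const launderedSupport = indexSupport([
  { id: "claim:return_30_days", kind: "generated_claim", supportedBy: ["claim:policy_summary"] },
  { id: "claim:policy_summary", kind: "generated_claim", supportedBy: [] },
]);

const returnClaimIsGrounded = tracesToGroundedSupport("claim:return_30_days", launderedSupport); // false
```

With no quoteable return-policy fact in the chain, the check fails and the claim stays blocked, however polished the intermediate summary is.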
Operator timelines are product features
Evidence and audit are not only for engineers.
Operators need timelines.
For the Travel Backpack, an operator timeline might show:
09:00
Agent requested checkout preparation.
09:00
Checkout blocked.
Blockers:
- INVENTORY_STALE
- RETURN_POLICY_MISSING
09:01
Operator task created:
Revalidate inventory for BAG-TRAVEL-42.
09:01
Operator task created:
Approve return-policy coverage for Travel Bags.
10:15
Inventory revalidated.
10:30
Return-policy coverage approved.
10:31
Eligibility refreshed.
10:32
Agent feed republished.
10:33
Checkout preparation allowed for EU consumer context.
That timeline is a product capability.
It helps operators understand what changed, why agents behaved differently, and what remediation resolved the issue.
Without this timeline, operators work from scattered logs, stale feed data, screenshots, and support tickets.
Agent-readiness requires operational visibility because many blockers are not code defects. They are missing facts, stale facts, missing policy coverage, invalidated claims, unreviewed generated text, expired mandates, or stale projections.
Evidence should drive support responses
Support teams also need evidence.
A buyer may ask:
Why did the agent say it could not buy the backpack?
Support should not need to inspect internal logs manually.
The platform should provide a support-safe answer:
At the time of the request, checkout could not be prepared because inventory needed revalidation and return terms were not available for the Travel Bags category. Payment was not attempted.
That answer is derived from evidence, but redacted for support use.
Support may also need an internal view:
Decision ID:
decision_delegate_payment_001
Checkout decision:
decision_prepare_checkout_001
Blockers:
INVENTORY_STALE
RETURN_POLICY_MISSING
Payment:
not attempted
Agent response:
blocked with CHECKOUT_NOT_VALID
Operator tasks:
inventory revalidation
return-policy coverage
This reduces ambiguity.
Evidence makes customer support part of the product, not an afterthought.
Privacy and retention must be designed
Evidence and audit can contain sensitive data.
Buyer identity, cart contents, payment authority, provider references, support notes, risk decisions, and protocol metadata may all require careful handling.
An evidence system should include:
visibility levels
access controls
redaction rules
retention policies
data minimization
hashing where full data is unnecessary
separation between public explanation and internal evidence
For example, a buyer-safe explanation should not expose internal rule IDs or payment provider references.
An operator view may not need full payment instrument details.
An audit view may need immutable references, not full sensitive payloads.
The goal is not to expose all evidence to everyone. The goal is to preserve the decision path while controlling who can see each part of it.
Evidence without access control creates risk.
Audit without retention rules becomes liability.
Where this model can go wrong
Evidence and audit improve trust, but they introduce their own failure modes.
Treating logs as audit
Logs are useful, but they are not structured decision records. They may be incomplete, inconsistent, redacted incorrectly, or difficult to replay.
Recording outcomes but not inputs
An audit record that says blocked is not enough. The platform needs the facts, policies, rules, state, authority, and context that produced the result.
Recording evidence but not projections
A correct domain decision can become a misleading agent response if the adapter projects it badly. Projections should be auditable.
Exposing too much evidence externally
Evidence should be permissioned. Buyer-safe, protocol-safe, operator, support, and internal views should differ.
Hiding too much behind generic errors
Over-redaction can make agents and operators unable to recover. Structured blocker codes should survive projection when safe.
Letting evidence become stale without invalidation
Old decisions, feed projections, and generated claims can become unsafe when dependencies change.
Treating generated claims as evidence
Generated claims are derived. High-risk claims should trace back to source facts or domain decisions, not other generated claims.
Failing to audit blocked decisions
Blocked decisions are where the safety model is visible. They should be recorded, not discarded.
Making audit unreadable
Audit records that only engineers can interpret will not help operators, support, or compliance reviewers. Different audiences need different views.
Making evidence too expensive to produce
If evidence capture is slow, unreliable, or difficult to integrate, teams will bypass it. Evidence capture should be part of the platform path, not optional paperwork.
Testing evidence and audit
Evidence and audit should be tested directly.
A useful test includes:
Request
Decision result
Expected evidence refs
Expected blocker codes
Expected projection
Expected audit event
Expected operator task
Expected invalidation behavior
For the Travel Backpack:
| Scenario | Expected evidence/audit behavior |
|---|---|
| Agent asks to discover product | Decision references approved discoverable product facts |
| Agent asks to quote return policy | Blocked decision references missing return-policy coverage |
| Agent asks to prepare checkout | Blocked decision references stale inventory and missing return policy |
| Agent requests delegated payment | Payment blocked because checkout is not valid; payment not attempted |
| Generated return claim is requested | Claim blocked because no quoteable return-policy fact exists |
| Inventory is revalidated | Prior stale-inventory blocker is superseded |
| Return-policy coverage is approved | Operator task resolved; policy evidence updated |
| Feed is republished | Projection references current decision records |
| Adapter maps blocked decision to generic success | Projection test fails |
| Buyer-safe explanation is requested | Internal evidence is redacted but blocker meaning is preserved |
| Audit replay is requested | Decision input refs, rule version, and snapshot hashes are available |
A simplified expectation type:
type EvidenceAuditScenarioExpectation = {
action: CommerceAction;
expectedResult: DecisionResult;
expectedEvidenceKinds: EvidenceKind[];
expectedBlockerCodes: string[];
expectedProjectionRecorded: boolean;
expectedAuditEvent: boolean;
expectedOperatorTask?: boolean;
};
The goal is not only to test the business decision.
The goal is to test whether the platform can prove the decision.
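A scenario check along these lines could compare what the platform recorded against the expectation. This is a sketch with reduced shapes: `ObservedOutcome` and `checkScenario` are hypothetical, and the observed values would come from the platform's own decision, projection, and audit stores.

```typescript
type ScenarioExpectation = {
  action: string;
  expectedResult: string;
  expectedEvidenceKinds: string[];
  expectedBlockerCodes: string[];
  expectedProjectionRecorded: boolean;
  expectedAuditEvent: boolean;
};

type ObservedOutcome = {
  action: string;
  result: string;
  evidenceKinds: string[];
  blockerCodes: string[];
  projectionIds: string[];
  auditEventIds: string[];
};

function sameMembers(a: string[], b: string[]): boolean {
  return a.length === b.length && a.slice().sort().join("|") === b.slice().sort().join("|");
}

// Returns the list of failed expectations; an empty list means the scenario passed.
function checkScenario(expected: ScenarioExpectation, observed: ObservedOutcome): string[] {
  const failures: string[] = [];
  if (observed.action !== expected.action) failures.push("action mismatch");
  if (observed.result !== expected.expectedResult) {
    failures.push(`result: expected ${expected.expectedResult}, got ${observed.result}`);
  }
  if (!sameMembers(expected.expectedBlockerCodes, observed.blockerCodes)) {
    failures.push("blocker codes differ");
  }
  for (const kind of expected.expectedEvidenceKinds) {
    if (observed.evidenceKinds.indexOf(kind) === -1) failures.push(`missing evidence kind: ${kind}`);
  }
  if (expected.expectedProjectionRecorded !== observed.projectionIds.length > 0) {
    failures.push("projection record expectation not met");
  }
  if (expected.expectedAuditEvent !== observed.auditEventIds.length > 0) {
    failures.push("audit event expectation not met");
  }
  return failures;
}

const prepareCheckoutExpectation: ScenarioExpectation = {
  action: "prepare_checkout",
  expectedResult: "blocked",
  expectedEvidenceKinds: ["inventory_fact", "policy_fact"],
  expectedBlockerCodes: ["INVENTORY_STALE", "RETURN_POLICY_MISSING"],
  expectedProjectionRecorded: true,
  expectedAuditEvent: true,
};
```

A decision that returns the right result but records no projection or audit event fails this check, which is the difference between testing the business decision and testing the proof.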
A practical implementation path
A platform can introduce evidence and audit incrementally.
Define decision records.
Start with eligibility decisions for discover, compare, quote_policy, add_to_cart, prepare_checkout, and delegate_payment.
Define evidence references.
Use typed references for product facts, policy facts, eligibility rules, authority records, checkout state, cart snapshots, payment mandates, generated claims, and audit events.
Attach evidence to allowed and blocked decisions.
Do not record evidence only for failures.
Preserve blocker codes.
Use stable domain blocker codes such as INVENTORY_STALE, RETURN_POLICY_MISSING, CHECKOUT_NOT_VALID, MANDATE_EXPIRED, and CART_SNAPSHOT_MISMATCH.
Record protocol projections.
Store how domain decisions were presented to feeds, tools, commerce adapters, payment adapters, admin UI, and support views.
Add replayable inputs.
Store versions, hashes, snapshots, and rule-set versions where exact input reconstruction matters.
Add invalidation events.
When product facts, policy facts, eligibility rules, checkout state, payment authority, or generated claims change, record which decisions or projections are superseded.
Build audience-specific explanation views.
Separate buyer-safe, protocol-safe, operator, support, auditor, and engineer views.
Generate operator timelines.
Show blockers, remediation tasks, state changes, feed refreshes, and decision changes over time.
Test evidence paths.
Include evidence references, projection records, audit events, and invalidation behavior in scenario tests.
This path makes evidence part of the platform rather than something reconstructed after incidents.
The tradeoff
Evidence and audit add structure.
They require decision records, evidence references, versioning, snapshots, projection records, redaction rules, retention rules, invalidation events, and audience-specific explanation views.
For a simple storefront, that may seem heavy.
Agent-ready commerce changes the threshold.
Once agents can answer buyer questions, quote policies, compare products, prepare checkout, operate under delegated authority, or explain payment refusal, the platform needs to prove the decision path.
The tradeoff is between implementation simplicity and explainable trust.
Logs are easier to add.
Decision evidence is easier to rely on, audit, test, replay, project, and defend.
The goal is not to record everything forever. The goal is to record enough structured evidence to explain why the platform answered, refused, mutated state, or escalated at the time it happened.
The proof layer
Evidence and audit are part of the product in agent-ready commerce.
They are not backend extras.
Agents turn platform decisions into buyer-facing answers and commercial actions. That makes the reasoning behind those decisions part of the user experience, the operator workflow, the support process, and the audit trail.
A safe platform keeps the chain explicit:
Request
↓
Decision
↓
Evidence record
↓
Protocol-safe explanation
↓
Projection or state transition
↓
Audit event
↓
Operator and support timeline
For the Travel Backpack, that means the platform can explain exactly why the product is discoverable and comparable but not policy-quotable, checkout-ready, or eligible for delegated payment.
It can point to stale inventory, missing return-policy coverage, current eligibility decisions, checkout state, payment authority decisions, generated-claim review status, and operator remediation tasks.
That is the difference between a system that merely responds and a system that can prove its response.
Part 10 will bring the architecture together:
Agent-Ready Commerce, Part 10: The Reference Architecture
That article will connect commercial truth, policy facts, eligibility, adapters, checkout state, payment authority, generated claims, evidence, audit, feeds, and operator workflows into a complete reference architecture for agent-ready commerce platforms.
About the author
Written by Dimitrios S. Sfyris, Founder & Software Architect at AspectSoft.
AspectSoft designs and develops custom software platforms, e-commerce systems, SaaS infrastructure, integrations, analytics tools, and practical digital products.
You can also follow the AspectSoft LinkedIn page for updates on software platforms, commerce systems, AI tooling, and developer-focused products.