suissAI

Posted on Feb 26

FullAgenticStack WhatsApp-first: RFC-WF-0017

#ai #agents #testing #architecture

RFC-WF-0017

Test Vectors & Reference Scenarios (TVRS)

Status: Draft Standard
Version: 1.0.0
Date: 20 Nov 2025
Category: Standards Track
Author: FullAgenticStack Initiative
Dependencies: RFC-WF-0003 (CCP), RFC-WF-0004 (ACSM), RFC-WF-0005 (CRCD), RFC-WF-0006 (EAS), RFC-WF-0007 (OoC), RFC-WF-0008 (RCP), RFC-WF-0010 (IDS), RFC-WF-0011 (CATS), RFC-WF-0016 (RCMC)
License: Open Specification (Public, Royalty-Free)

Abstract

This document specifies Test Vectors & Reference Scenarios (TVRS) for WhatsApp-first systems. TVRS defines a canonical set of reference scenarios, message sequences, expected evidence artifacts, and expected conversational outputs used to validate conformance with CCP/ACSM/IDS/EAS/OoC/RCP requirements and to power automated audits (CATS). TVRS acts as a portable “conformance suite” enabling implementations to demonstrate behavior under duplicates, ambiguity, privilege boundaries, and recovery flows.

Index Terms— conformance testing, test vectors, reference scenarios, message sequences, evidence expectations, idempotency tests, recovery tests, WhatsApp-first.

I. Introduction

Specifications become enforceable when accompanied by test vectors. TVRS provides concrete scenarios that an implementation can execute in a sandbox tenant to prove compliance—especially for behaviors that cannot be validated statically (duplicates, retries, step-up, recovery).

TVRS is not an application test suite; it is a protocol conformance suite.

II. Scope

TVRS specifies:

A standard scenario format (machine-readable)
Minimum required scenarios for WhatsApp-first conformance
Expected outcomes and evidence requirements (EAS) per scenario
Expected OoC responses and drill-down behavior
Guidance for safe sandbox execution

TVRS does not mandate a testing framework, only scenario descriptions and expected assertions.

III. Normative Language

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are normative.

IV. Scenario Format

A. Required Fields

Each scenario MUST contain:

scenario_id
title
preconditions
actors (roles/scopes/trust levels)
inputs (normalized message events or conversational utterances)
steps (ordered)
expected (assertions)
- command_envelope assertions
- evidence assertions (artifact types + fields)
- ooc assertions (response structure)
- state assertions (optional, sandbox-only)
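
As an illustration of how a harness might check the required fields before running a scenario, here is a minimal sketch. It assumes the scenario has already been parsed into a Python dict; the function name `validate_scenario` and the wording of the error strings are hypothetical and not defined by TVRS.

```python
from typing import Any

# Top-level fields listed as required in the section above.
REQUIRED_FIELDS = (
    "scenario_id", "title", "preconditions",
    "actors", "inputs", "steps", "expected",
)


def validate_scenario(scenario: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape looks acceptable."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in scenario
    ]
    # Steps are ordered, so this sketch insists on a list rather than a mapping.
    if "steps" in scenario and not isinstance(scenario["steps"], list):
        problems.append("steps must be an ordered list")
    # The assertion groups live under `expected`, so it has to be a mapping.
    if "expected" in scenario and not isinstance(scenario["expected"], dict):
        problems.append("expected must be a mapping of assertion groups")
    return problems
```

The check is deliberately shallow: it validates shape only and leaves the meaning of each assertion group to the harness that executes the scenario.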

B. Example Skeleton (YAML)

```yaml
scenario_id: "TVRS-IDS-001"
title: "Duplicate delivery does not duplicate effects"
preconditions:
  tenant: "sandbox"
actors:
  - actor_id: "userA"
    trust_level: "L2"
    scopes: ["orders.cancel"]
inputs:
  - type: "text"
    from: "userA"
    message: "cancel order 204"
steps:
  - action: "deliver_duplicate"
    times: 2
expected:
  evidence:
    must_include:
      - artifact_type: "execution.executed"
        fields:
          lifecycle.stage: "executed"
  assertions:
    - "exactly_once_effects"
```

V. Minimum Required Scenario Set (v1.0.0)

Implementations claiming conformance MUST pass the following scenarios (where applicable).

A. CCP Scenarios

TVRS-CCP-001 Mutation requires confirmation (S1)
TVRS-CCP-002 Destructive requires strengthened confirmation (S2)
TVRS-CCP-003 Ambiguous target rejected and disambiguated

B. ACSM Scenarios

TVRS-ACSM-001 Admin command denied without scope
TVRS-ACSM-002 Admin high-impact requires step-up (L3)
TVRS-ACSM-003 Step-up freshness expiry blocks execution
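
To make TVRS-ACSM-003 concrete, the sketch below shows one way a freshness check could gate a high-impact admin command. The `StepUp` type, the `authorize_high_impact` function, the denial strings, and the 300-second window are all assumptions for illustration; ACSM defines the actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepUp:
    actor_id: str
    verified_at: float  # seconds since epoch when the step-up challenge was passed


def authorize_high_impact(
    step_up: Optional[StepUp],
    now: float,
    max_age_s: float = 300.0,  # hypothetical freshness window
) -> str:
    """Return "allow" or a denial reason for a high-impact admin command."""
    if step_up is None:
        return "deny:step_up_required"
    if now - step_up.verified_at > max_age_s:
        # A stale step-up must block execution, not merely warn.
        return "deny:step_up_expired"
    return "allow"
```

Passing `now` explicitly, instead of reading the clock inside the function, lets a scenario replay the expiry case deterministically.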

C. IDS Scenarios

TVRS-IDS-001 Duplicate delivery → outcome replay, no double effect
TVRS-IDS-002 Concurrent duplicates converge to single execution
TVRS-IDS-003 Idempotency collision rejects with evidence
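
The behaviors behind TVRS-IDS-001 and TVRS-IDS-003 can be sketched with a small in-memory executor. This is an illustrative model only: the `IdempotentExecutor` class, its result fields, and the use of a payload hash to detect collisions are assumptions, not part of IDS. It is single-threaded, so it does not cover the concurrent case in TVRS-IDS-002.

```python
import hashlib
import json


class IdempotentExecutor:
    """Replays the stored outcome for a repeated key instead of re-executing."""

    def __init__(self) -> None:
        self._outcomes = {}  # idempotency key -> (payload fingerprint, outcome)
        self.effects = []    # side effects actually performed

    @staticmethod
    def _fingerprint(payload: dict) -> str:
        canonical = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def handle(self, key: str, payload: dict) -> dict:
        fingerprint = self._fingerprint(payload)
        if key in self._outcomes:
            stored_fingerprint, outcome = self._outcomes[key]
            if stored_fingerprint != fingerprint:
                # Same key, different payload: reject rather than guess.
                return {"status": "rejected", "reason": "idempotency_collision"}
            return {**outcome, "replayed": True}
        self.effects.append(payload)  # the one and only effect for this key
        outcome = {"status": "executed", "replayed": False}
        self._outcomes[key] = (fingerprint, outcome)
        return outcome
```

Delivering the same message twice through `handle` should leave exactly one entry in `effects`, which is the property the `exactly_once_effects` assertion is meant to capture.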

D. EAS Scenarios

TVRS-EAS-001 Evidence completeness for mutation lifecycle
TVRS-EAS-002 Evidence trace binds to conversation/message IDs
TVRS-EAS-003 Append-only semantics (no mutation; correction emits new artifact)
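
The append-only rule in TVRS-EAS-003 can be pictured with a small in-memory log. This is a sketch under stated assumptions: the `EvidenceLog` class and the `corrects` back-reference field are hypothetical names, and the read-only wrapper is shallow, so nested values would need their own protection in a real store.

```python
from types import MappingProxyType


class EvidenceLog:
    """Artifacts are only ever appended; a correction is a new artifact."""

    def __init__(self) -> None:
        self._artifacts = []

    def append(self, artifact: dict) -> int:
        # Store a read-only view of a copy so callers cannot mutate history.
        self._artifacts.append(MappingProxyType(dict(artifact)))
        return len(self._artifacts) - 1

    def correct(self, index: int, changes: dict) -> int:
        original = self._artifacts[index]
        # The original stays untouched; the new artifact points back to it.
        return self.append({**original, **changes, "corrects": index})

    def all(self) -> tuple:
        return tuple(self._artifacts)
```

A conformance check can then assert both that the original artifact is unchanged and that the correction exists as a separate entry.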

E. OoC Scenarios

TVRS-OOC-001 OoC status returns summary-first
TVRS-OOC-002 OoC explains denial reason (scopes/policy)
TVRS-OOC-003 OoC evidence drill-down works with redaction

F. RCP Scenarios

TVRS-RCP-001 Recovery options exist for failed command
TVRS-RCP-002 Retry emits evidence and converges
TVRS-RCP-003 Compensation requires step-up and emits compensation evidence

G. AIP Scenarios (If agents exist)

TVRS-AIP-001 Agent cannot execute S2/S3 without HITL
TVRS-AIP-002 Agent proposals link to executed evidence

VI. Expected Evidence Assertions

For each scenario, expected evidence MUST reference EAS artifact types and minimum field assertions, including:

trace.conversation_id, trace.message_ids
lifecycle.command_id, lifecycle.stage
security.authz.decision (+ scopes)
security.step_up / security.confirmation where applicable
payload.intent, payload.result.status

TVRS MAY define “field masks” to avoid leaking secrets in test fixtures.
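
One possible way to evaluate these field assertions is to resolve dotted paths against a nested artifact. The sketch below assumes artifacts are nested dicts and interprets a "field mask" as a path whose presence is checked but whose value is never compared; both the interpretation and the names `get_path` and `check_evidence` are assumptions.

```python
from typing import Any

_MISSING = object()  # sentinel so that None remains a legitimate field value


def get_path(artifact: dict, dotted: str) -> Any:
    """Resolve a dotted path such as "security.authz.decision"."""
    node: Any = artifact
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def check_evidence(artifact, required_paths, expected_values=None, masked_paths=()):
    """Return a list of failures; an empty list means the artifact passes."""
    expected_values = expected_values or {}
    failures = []
    # dict.fromkeys keeps order and removes duplicate paths.
    for path in dict.fromkeys([*required_paths, *expected_values]):
        got = get_path(artifact, path)
        if got is _MISSING:
            failures.append(f"missing field: {path}")
        elif (
            path in expected_values
            and path not in masked_paths  # masked: presence only
            and got != expected_values[path]
        ):
            failures.append(f"{path}: expected {expected_values[path]!r}, got {got!r}")
    return failures
```

With this shape, a fixture can list `lifecycle.command_id` as masked and still require that the field is present, without embedding a real identifier in the test data.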

VII. Expected OoC Assertions

OoC outputs MUST be validated structurally:

summary-first response
stable identifiers (command_id, correlation_id)
numbered drill-down options (where applicable)
redaction rules respected

TVRS does not require exact wording, only semantic content and structure.
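
A structural check along these lines might look like the following sketch. The response shape used here (a `sections` list, an `options` list with `number` fields, and top-level identifiers) is an assumption made for illustration; OoC defines the real response structure. Redaction is approximated by confirming that known secret values from the fixture never appear in any string of the response.

```python
def iter_strings(node):
    """Yield every string value found anywhere inside a nested response."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for value in node:
            yield from iter_strings(value)


def check_ooc(response: dict, secrets=()) -> list:
    """Return structural failures; wording of the response is never compared."""
    failures = []
    sections = response.get("sections") or []
    if not sections or sections[0].get("kind") != "summary":
        failures.append("first section is not a summary")
    for key in ("command_id", "correlation_id"):
        value = response.get(key)
        if not isinstance(value, str) or not value:
            failures.append(f"missing stable identifier: {key}")
    numbers = [option.get("number") for option in response.get("options", [])]
    if numbers != list(range(1, len(numbers) + 1)):
        failures.append("drill-down options are not numbered 1..n")
    texts = list(iter_strings(response))
    if any(secret in text for secret in secrets for text in texts):
        # The secret itself is deliberately left out of the failure message.
        failures.append("redaction violated")
    return failures
```

A response with no drill-down options passes the numbering check, which matches the "where applicable" qualifier above.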

VIII. Safe Sandbox Execution Requirements

Runtime conformance tests MUST:

run in a sandbox tenant/environment
avoid irreversible real-world effects
use mock connectors or test-mode payment/shipping providers
rate-limit scenario execution
clean up created test entities (where safe) or rely on soft-delete policies
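
Two of these requirements, the sandbox restriction and rate limiting, lend themselves to a simple guard in the runner. The sketch below assumes each scenario declares `preconditions.tenant` as in the example skeleton; the `run_in_sandbox` function, its parameters, and the fixed delay between scenarios are hypothetical.

```python
import time


def run_in_sandbox(scenarios, execute, min_interval_s=1.0, sleep=time.sleep):
    """Run scenarios one by one, refusing anything not targeting the sandbox."""
    # Check every scenario first so nothing executes if one is misconfigured.
    for scenario in scenarios:
        tenant = scenario.get("preconditions", {}).get("tenant")
        if tenant != "sandbox":
            raise RuntimeError(
                f"{scenario.get('scenario_id')}: refusing to run against tenant {tenant!r}"
            )
    results = {}
    for index, scenario in enumerate(scenarios):
        if index > 0:
            sleep(min_interval_s)  # crude rate limit between scenarios
        results[scenario["scenario_id"]] = execute(scenario)
    return results
```

The `sleep` parameter is injectable so that the runner itself can be tested without real delays.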

IX. Mapping to Controls (RCMC/CATS)

Each scenario MUST map to one or more control IDs (RCMC). Example:

TVRS-IDS-001 → WFS-I-002, WFS-I-003
TVRS-CCP-002 → WFS-P-003
TVRS-RCP-003 → WFS-R-003, WFS-S-002

Audit toolkits SHOULD report failures using control IDs and scenario IDs.
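
A report keyed by control ID could be assembled roughly as follows. The mapping reuses the three example rows above; the function name `failures_by_control` and the shape of the `results` input (scenario ID to pass/fail boolean) are assumptions.

```python
from collections import defaultdict

# Scenario -> control IDs, taken from the example mapping above.
SCENARIO_CONTROLS = {
    "TVRS-IDS-001": ["WFS-I-002", "WFS-I-003"],
    "TVRS-CCP-002": ["WFS-P-003"],
    "TVRS-RCP-003": ["WFS-R-003", "WFS-S-002"],
}


def failures_by_control(results: dict) -> dict:
    """Group failed scenario IDs under each control ID they map to."""
    report = defaultdict(list)
    for scenario_id, passed in results.items():
        controls = SCENARIO_CONTROLS.get(scenario_id)
        if not controls:
            # Every scenario must map to at least one control ID.
            raise KeyError(f"{scenario_id} is not mapped to any control ID")
        if not passed:
            for control_id in controls:
                report[control_id].append(scenario_id)
    return {control_id: sorted(ids) for control_id, ids in sorted(report.items())}
```

Because one scenario can map to several controls, a single failure may appear under more than one control ID in the report.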

X. Security Considerations

Test vectors can reveal operational internals. Implementations SHOULD:

keep full scenario fixtures private if they include sensitive shapes
publish only abstract scenario definitions publicly
ensure test accounts and data are isolated from production

XI. Conclusion

TVRS provides the missing “conformance suite” layer for WhatsApp-first systems: portable scenarios that validate the runtime behavior of command execution, security gating, idempotency under duplicates, evidence completeness, conversational observability, and governed recovery. This makes the specification measurable and certification-ready.

References

[1] RFC-WF-0003, Conversational Command Protocol (CCP).
[2] RFC-WF-0004, Administrative Command Security Model (ACSM).
[3] RFC-WF-0005, Command Registry & Capability Declaration (CRCD).
[4] RFC-WF-0006, Evidence Artifact Schema (EAS).
[5] RFC-WF-0007, Observability over Conversation (OoC).
[6] RFC-WF-0008, Recovery & Compensation Protocol (RCP).
[7] RFC-WF-0010, Idempotency & Delivery Semantics (IDS).
[8] RFC-WF-0011, Compliance Audit Toolkit Spec (CATS).
[9] RFC-WF-0016, Reference Compliance Matrix & Control IDs (RCMC).

Concepts and Technologies

Conformance suites, test vectors, scenario-driven validation, duplicate delivery tests, step-up freshness tests, evidence assertions, OoC structural validation, recovery/compensation scenarios, mapping scenarios to control IDs.
