DEV Community

suissAI


FullAgenticStack WhatsApp-first: RFC-WF-0017

RFC-WF-0017

Test Vectors & Reference Scenarios (TVRS)

Status: Draft Standard
Version: 1.0.0
Date: 20 Nov 2025
Category: Standards Track
Author: FullAgenticStack Initiative
Dependencies: RFC-WF-0003 (CCP), RFC-WF-0004 (ACSM), RFC-WF-0005 (CRCD), RFC-WF-0006 (EAS), RFC-WF-0007 (OoC), RFC-WF-0008 (RCP), RFC-WF-0010 (IDS), RFC-WF-0011 (CATS), RFC-WF-0016 (RCMC)
License: Open Specification (Public, Royalty-Free)


Abstract

This document specifies Test Vectors & Reference Scenarios (TVRS) for WhatsApp-first systems. TVRS defines a canonical set of reference scenarios, message sequences, expected evidence artifacts, and expected conversational outputs used to validate conformance with CCP/ACSM/IDS/EAS/OoC/RCP requirements and to power automated audits (CATS). TVRS acts as a portable “conformance suite” enabling implementations to demonstrate behavior under duplicates, ambiguity, privilege boundaries, and recovery flows.

Index Terms— conformance testing, test vectors, reference scenarios, message sequences, evidence expectations, idempotency tests, recovery tests, WhatsApp-first.


I. Introduction

Specifications become enforceable when accompanied by test vectors. TVRS provides concrete scenarios that an implementation can execute in a sandbox tenant to prove compliance, especially for behaviors that cannot be validated statically (duplicates, retries, step-up, recovery).

TVRS is not an application test suite; it is a protocol conformance suite.


II. Scope

TVRS specifies:

  • A standard scenario format (machine-readable)
  • Minimum required scenarios for WhatsApp-first conformance
  • Expected outcomes and evidence requirements (EAS) per scenario
  • Expected OoC responses and drill-down behavior
  • Guidance for safe sandbox execution

TVRS does not mandate a testing framework; it specifies only scenario descriptions and expected assertions.


III. Normative Language

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are normative.


IV. Scenario Format

A. Required Fields

Each scenario MUST contain:

  • scenario_id
  • title
  • preconditions
  • actors (roles/scopes/trust levels)
  • inputs (normalized message events or conversational utterances)
  • steps (ordered)
  • expected (assertions)

    • command_envelope assertions
    • evidence assertions (artifact types + fields)
    • ooc assertions (response structure)
    • state assertions (optional, sandbox-only)
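The required-field list above lends itself to a mechanical check before a scenario is executed. The following is a minimal sketch, assuming the scenario has already been parsed into a Python dict (for example from YAML). The function name `missing_required_fields` and the sample values are illustrative and not defined by TVRS.

```python
# Top-level fields that every TVRS scenario MUST contain (Section IV.A).
REQUIRED_FIELDS = (
    "scenario_id", "title", "preconditions", "actors",
    "inputs", "steps", "expected",
)

def missing_required_fields(scenario: dict) -> list[str]:
    """Return the required top-level fields absent from a parsed scenario."""
    return [name for name in REQUIRED_FIELDS if name not in scenario]

# Illustrative scenario that deliberately omits the "expected" block.
scenario = {
    "scenario_id": "TVRS-IDS-001",
    "title": "Duplicate delivery does not duplicate effects",
    "preconditions": {"tenant": "sandbox"},
    "actors": [{"actor_id": "userA", "trust_level": "L2", "scopes": ["orders.cancel"]}],
    "inputs": [{"type": "text", "from": "userA", "message": "cancel order 204"}],
    "steps": [{"action": "deliver_duplicate", "times": 2}],
}

print(missing_required_fields(scenario))  # ['expected']
```

A runner could reject any scenario for which this list is non-empty before attempting execution.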

B. Example Skeleton (YAML)

```yaml
scenario_id: "TVRS-IDS-001"
title: "Duplicate delivery does not duplicate effects"
preconditions:
  tenant: "sandbox"
actors:
  - actor_id: "userA"
    trust_level: "L2"
    scopes: ["orders.cancel"]
inputs:
  - type: "text"
    from: "userA"
    message: "cancel order 204"
steps:
  - action: "deliver_duplicate"
    times: 2
expected:
  evidence:
    must_include:
      - artifact_type: "execution.executed"
        fields:
          lifecycle.stage: "executed"
  assertions:
    - "exactly_once_effects"
```

V. Minimum Required Scenario Set (v1.0.0)

Implementations claiming conformance MUST pass every scenario below that applies to the capabilities they expose (for example, the AIP scenarios apply only if agents exist).

A. CCP Scenarios

  • TVRS-CCP-001 Mutation requires confirmation (S1)
  • TVRS-CCP-002 Destructive requires strengthened confirmation (S2)
  • TVRS-CCP-003 Ambiguous target rejected and disambiguated

B. ACSM Scenarios

  • TVRS-ACSM-001 Admin command denied without scope
  • TVRS-ACSM-002 Admin high-impact requires step-up (L3)
  • TVRS-ACSM-003 Step-up freshness expiry blocks execution

C. IDS Scenarios

  • TVRS-IDS-001 Duplicate delivery → outcome replay, no double effect
  • TVRS-IDS-002 Concurrent duplicates converge to single execution
  • TVRS-IDS-003 Idempotency collision rejects with evidence
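To make the IDS scenarios concrete, here is a minimal sketch of the behavior TVRS-IDS-001 expects: a second delivery of the same message replays the recorded outcome instead of executing again. The in-memory `SandboxHandler`, its `deliver` method, the `replayed` flag, and the use of the message ID as the idempotency key are all assumptions for illustration; RFC-WF-0010 defines the actual semantics.

```python
class SandboxHandler:
    """Hypothetical in-memory command handler, for illustration only."""

    def __init__(self) -> None:
        self.outcomes: dict[str, dict] = {}  # idempotency key -> recorded outcome
        self.effects: list[str] = []         # side effects actually performed

    def deliver(self, message_id: str, text: str) -> dict:
        # Assumption: the message ID serves as the idempotency key.
        if message_id in self.outcomes:
            # Duplicate delivery: replay the stored outcome, perform no new effect.
            return {**self.outcomes[message_id], "replayed": True}
        self.effects.append(text)
        outcome = {"status": "executed", "replayed": False}
        self.outcomes[message_id] = outcome
        return outcome

handler = SandboxHandler()
first = handler.deliver("msg-1", "cancel order 204")
second = handler.deliver("msg-1", "cancel order 204")

assert first["status"] == second["status"] == "executed"
assert second["replayed"] is True
assert len(handler.effects) == 1  # exactly-once effects despite two deliveries
```

This sketch does not cover concurrent duplicates (TVRS-IDS-002), which would additionally require atomic check-and-record on the idempotency key.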

D. EAS Scenarios

  • TVRS-EAS-001 Evidence completeness for mutation lifecycle
  • TVRS-EAS-002 Evidence trace binds to conversation/message IDs
  • TVRS-EAS-003 Append-only semantics (no mutation; correction emits new artifact)

E. OoC Scenarios

  • TVRS-OOC-001 OoC status returns summary-first
  • TVRS-OOC-002 OoC explains denial reason (scopes/policy)
  • TVRS-OOC-003 OoC evidence drill-down works with redaction

F. RCP Scenarios

  • TVRS-RCP-001 Recovery options exist for failed command
  • TVRS-RCP-002 Retry emits evidence and converges
  • TVRS-RCP-003 Compensation requires step-up and emits compensation evidence

G. AIP Scenarios (If agents exist)

  • TVRS-AIP-001 Agent cannot execute S2/S3 without HITL
  • TVRS-AIP-002 Agent proposals link to executed evidence

VI. Expected Evidence Assertions

For each scenario, expected evidence MUST reference EAS artifact types and minimum field assertions, including:

  • trace.conversation_id, trace.message_ids
  • lifecycle.command_id, lifecycle.stage
  • security.authz.decision (+ scopes)
  • security.step_up / security.confirmation where applicable
  • payload.intent, payload.result.status

TVRS MAY define “field masks” to avoid leaking secrets in test fixtures.
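The field assertions above can be checked by resolving dotted paths against an evidence artifact. The sketch below is one possible approach, not a TVRS-defined API: the helper names, the `MASK` sentinel, and the sample values (`"allow"`, `"ok"`, the IDs) are hypothetical. It also assumes one interpretation of a field mask, namely that a masked field only needs to be present so the fixture never stores its real value.

```python
MASK = "***"  # hypothetical sentinel for masked values in fixtures

def get_path(artifact: dict, dotted: str):
    """Resolve a dotted path such as 'lifecycle.stage' in a nested dict."""
    node = artifact
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(dotted)
        node = node[key]
    return node

def check_evidence(artifact: dict, expected: dict) -> list[str]:
    """Return failed field assertions; an empty list means the artifact passes."""
    failures = []
    for path, want in expected.items():
        try:
            got = get_path(artifact, path)
        except KeyError:
            failures.append(f"{path}: missing")
            continue
        # A masked expectation only requires the field to be present.
        if want != MASK and got != want:
            failures.append(f"{path}: expected {want!r}, got {got!r}")
    return failures

artifact = {
    "trace": {"conversation_id": "conv-1", "message_ids": ["msg-1"]},
    "lifecycle": {"command_id": "cmd-9", "stage": "executed"},
    "security": {"authz": {"decision": "allow"}},
    "payload": {"intent": "orders.cancel", "result": {"status": "ok"}},
}
expected = {
    "trace.conversation_id": MASK,       # present, value not pinned in the fixture
    "lifecycle.stage": "executed",
    "security.authz.decision": "allow",
    "payload.result.status": "ok",
}
print(check_evidence(artifact, expected))  # []
```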


VII. Expected OoC Assertions

OoC outputs MUST be validated structurally:

  • summary-first response
  • stable identifiers (command_id, correlation_id)
  • numbered drill-down options (where applicable)
  • redaction rules respected

TVRS does not require exact wording, only semantic content and structure.
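Structural validation of this kind can be sketched as a function that inspects the shape of a response and never its wording. The response layout below (`sections`, `kind`, `options`, `number`) is a hypothetical shape chosen for illustration; RFC-WF-0007 defines the real one. Redaction checks are omitted because they depend on policy details not given here.

```python
def validate_ooc(response: dict) -> list[str]:
    """Return structural violations for a hypothetical OoC response shape."""
    problems = []
    sections = response.get("sections", [])
    # Summary-first: the first section must be the summary.
    if not sections or sections[0].get("kind") != "summary":
        problems.append("summary must come first")
    # Stable identifiers must be present and non-empty.
    for ident in ("command_id", "correlation_id"):
        if not response.get(ident):
            problems.append(f"missing {ident}")
    # Drill-down options, when present, must be numbered 1..n in order.
    numbers = [opt.get("number") for opt in response.get("options", [])]
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append("drill-down options must be numbered 1..n")
    return problems

response = {
    "command_id": "cmd-9",
    "correlation_id": "corr-3",
    "sections": [
        {"kind": "summary", "text": "Order 204 cancelled."},
        {"kind": "details", "text": "Refund pending."},
    ],
    "options": [
        {"number": 1, "label": "Show evidence"},
        {"number": 2, "label": "Show timeline"},
    ],
}
print(validate_ooc(response))  # []
```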


VIII. Safe Sandbox Execution Requirements

Runtime conformance tests MUST:

  • run in a sandbox tenant/environment
  • avoid irreversible real-world effects
  • use mock connectors or test-mode payment/shipping providers
  • rate-limit scenario execution
  • clean up created test entities (where safe) or rely on soft-delete policies
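Two of these requirements, sandbox-only execution and rate limiting, can be enforced by the test runner itself. The following is a minimal sketch under stated assumptions: `SandboxRunner`, its parameters, and the tenant name `"sandbox"` are illustrative, and the `execute` callable stands in for whatever actually drives a scenario.

```python
import time

class SandboxRunner:
    """Hypothetical runner that enforces sandbox-only, rate-limited execution."""

    def __init__(self, tenant: str, min_interval_s: float = 1.0) -> None:
        # Refuse to construct a runner for anything but the sandbox tenant.
        if tenant != "sandbox":
            raise ValueError(f"refusing to run scenarios against tenant {tenant!r}")
        self.min_interval_s = min_interval_s
        self._last_run = None  # monotonic timestamp of the previous run

    def run(self, scenario_id: str, execute):
        # Simple rate limit: wait until min_interval_s has elapsed since the last run.
        if self._last_run is not None:
            wait = self.min_interval_s - (time.monotonic() - self._last_run)
            if wait > 0:
                time.sleep(wait)
        self._last_run = time.monotonic()
        return execute(scenario_id)

runner = SandboxRunner("sandbox", min_interval_s=0.1)
result = runner.run("TVRS-IDS-001", lambda sid: f"ran {sid}")
print(result)  # ran TVRS-IDS-001
```

Mock connectors and cleanup of test entities are outside the scope of this sketch.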

IX. Mapping to Controls (RCMC/CATS)

Each scenario MUST map to one or more control IDs (RCMC). Example:

  • TVRS-IDS-001 → WFS-I-002, WFS-I-003
  • TVRS-CCP-002 → WFS-P-003
  • TVRS-RCP-003 → WFS-R-003, WFS-S-002

Audit toolkits SHOULD report failures using control IDs and scenario IDs.
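As an illustration of reporting by control ID, the sketch below groups failed scenario IDs under the controls they map to. The mapping reuses the three examples listed above; the function name `failures_by_control` and the report shape are assumptions, not part of RCMC or CATS.

```python
# Scenario-to-control mapping taken from the examples in this section.
CONTROL_MAP = {
    "TVRS-IDS-001": ["WFS-I-002", "WFS-I-003"],
    "TVRS-CCP-002": ["WFS-P-003"],
    "TVRS-RCP-003": ["WFS-R-003", "WFS-S-002"],
}

def failures_by_control(failed_scenarios: list[str]) -> dict[str, list[str]]:
    """Group failed scenario IDs under each control ID they map to."""
    report: dict[str, list[str]] = {}
    for scenario_id in failed_scenarios:
        for control_id in CONTROL_MAP.get(scenario_id, []):
            report.setdefault(control_id, []).append(scenario_id)
    return report

for control_id, scenarios in failures_by_control(["TVRS-IDS-001", "TVRS-RCP-003"]).items():
    print(control_id, "failed via", ", ".join(scenarios))
```

Scenario IDs with no mapping are silently skipped here; a stricter toolkit might instead flag them, since every scenario is required to map to at least one control.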


X. Security Considerations

Test vectors can reveal operational internals. Implementations SHOULD:

  • keep full scenario fixtures private if they include sensitive shapes
  • publish only abstract scenario definitions publicly
  • ensure test accounts and data are isolated from production

XI. Conclusion

TVRS provides the missing “conformance suite” layer for WhatsApp-first systems: portable scenarios that validate the runtime behavior of command execution, security gating, idempotency under duplicates, evidence completeness, conversational observability, and governed recovery. This makes the specification measurable and certification-ready.


References

[1] RFC-WF-0003, Conversational Command Protocol (CCP).
[2] RFC-WF-0004, Administrative Command Security Model (ACSM).
[3] RFC-WF-0005, Command Registry & Capability Declaration (CRCD).
[4] RFC-WF-0006, Evidence Artifact Schema (EAS).
[5] RFC-WF-0007, Observability over Conversation (OoC).
[6] RFC-WF-0008, Recovery & Compensation Protocol (RCP).
[7] RFC-WF-0010, Idempotency & Delivery Semantics (IDS).
[8] RFC-WF-0011, Compliance Audit Toolkit Spec (CATS).
[9] RFC-WF-0016, Reference Compliance Matrix & Control IDs (RCMC).


Concepts and Technologies

Conformance suites, test vectors, scenario-driven validation, duplicate delivery tests, step-up freshness tests, evidence assertions, OoC structural validation, recovery/compensation scenarios, mapping scenarios to control IDs.
