DEV Community

suissAI


FullAgenticStack WhatsApp-first: RFC-WF-0003

RFC-WF-0003

Conversational Command Protocol (CCP)

Status: Draft Standard
Version: 1.0.0
Date: 20 Nov 2025
Category: Standards Track
Author: FullAgenticStack Initiative
Dependencies: RFC-WF-0001 (WFCS), RFC-WF-0002 (WWCS)
License: Open Specification (Public, Royalty-Free)


Abstract

This document specifies the Conversational Command Protocol (CCP) for WhatsApp-first systems. CCP defines normative requirements for representing, validating, confirming, and executing operational and administrative actions through WhatsApp while ensuring semantic clarity, accidental-execution resistance, auditability, idempotency, and multimodal compatibility (text + audio via speech-to-text). CCP provides a canonical Command Envelope that binds user interaction traces to deterministic execution semantics and replay-safe outcomes.

Index Terms— WhatsApp-first, conversational systems, command protocol, idempotency, auditability, multimodal interaction, administrative operations, safety confirmation.


I. Introduction

WhatsApp-first systems require that domain capabilities—operational, administrative, recovery, and observability—be executable through WhatsApp interactions, without mandatory dependence on external dashboards. CCP formalizes the minimum protocol needed to treat chat messages as executable operations rather than informal requests.

Two tensions motivate CCP:

  1. WhatsApp is human-centric and ambiguous; production systems require explicit, replay-safe execution semantics.
  2. Multimodal input (especially audio) increases ambiguity; destructive actions require safeguards against misrecognition and accidental triggers.

This specification defines a protocol that bridges these tensions by introducing a canonical representation (the Command Envelope), mandatory confirmation rules for state mutation, and idempotent execution constraints.


II. Scope

This standard specifies:

  • Minimum command surface forms (natural language, explicit tokens, action menus)
  • Canonicalization into a Command Envelope
  • Confirmation requirements and strengthened safeguards for destructive operations
  • Idempotency and replay-safety requirements
  • Multimodal requirements for text and audio (STT-based)
  • Required audit and trace linkage between WhatsApp messages and execution outcomes

This standard does not fully specify cryptographic identity, authentication protocols, or end-to-end encryption. It mandates integration points for authorization and audit but defers cryptographic and identity details to a dedicated security RFC.


III. Normative Language

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as normative requirements.


IV. Definitions

Command: A user intention to perform a domain or administrative action.
Command Envelope: Canonical, persisted representation of a command used for execution.
State Mutation: Any persistent change to system state (DB, cache, event log, configuration).
Confirmation Step: Explicit user affirmation required prior to executing a state mutation.
Idempotency Key: A stable key used to prevent duplicate execution of a logically identical command.
Destructive Action: An action with potential data loss or irreversible effect (e.g., delete, cancel, revoke, reset).


V. Design Goals

A CCP-compliant system MUST ensure:

  • G1. Conversational Executability: Commands can be initiated and completed through WhatsApp.
  • G2. Canonical Execution: Execution occurs only from the canonical envelope, not raw text/audio.
  • G3. Accidental-Execution Resistance: State mutations require explicit confirmation; destructive actions require strengthened confirmation.
  • G4. Auditability: Commands and outcomes are traceable to conversation artifacts (message IDs, conversation IDs).
  • G5. Replay Safety: Repeated delivery, retries, or duplicated messages do not cause double execution.
  • G6. Multimodal Parity: Audio commands are supported via STT, with a canonical preview shown prior to confirmation.

VI. Command Surface Forms

A. Human-Friendly Commands (HFC)

The system MUST accept natural language commands via text and STT-transcribed audio (e.g., “Cancel order 204”, “Generate monthly report”).

B. Explicit Command Tokens (ECT)

The system SHOULD support an explicit, structured command form for advanced users (e.g., CANCEL_ORDER id=204). This form MUST remain conversational and MUST canonicalize into the same envelope model.

C. Action Menus (AM)

The system MUST support menu-based selection (e.g., numbered options) as a first-class command initiation surface. Selection MUST canonicalize into the same envelope model as HFC/ECT.
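
Example (illustrative, non-normative): a minimal sketch of how an Explicit Command Token could be parsed into the intent and args portions of an envelope. The token grammar assumed here (first word is ACTION_ENTITY, followed by key=value pairs) and the function name parse_ect are assumptions made for illustration; this RFC does not define an ECT grammar.

```python
import shlex


def parse_ect(text: str) -> dict:
    """Parse a token such as 'CANCEL_ORDER id=204' into intent + args.

    Assumed convention (hypothetical, not defined by CCP): the first word
    is ACTION_ENTITY and the remaining words are key=value pairs.
    """
    parts = shlex.split(text.strip())
    if not parts or "_" not in parts[0]:
        raise ValueError("not an explicit command token")
    action, entity = parts[0].split("_", 1)
    fields = {}
    for pair in parts[1:]:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed argument: {pair!r}")
        fields[key.lower()] = value
    # The 'id' field is treated as the target; everything else becomes args.
    target = {"id": fields.pop("id")} if "id" in fields else {}
    return {
        "intent": {
            "entity": entity.capitalize(),
            "action": action.capitalize(),
            "target": target,
        },
        "args": fields,
    }
```

A natural-language or menu front end would be expected to return the same structure, so that all three surface forms converge on one envelope model.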


VII. Command Canonicalization

A. Canonicalization Requirement

For any command that results in a state mutation:

  1. The system MUST produce a Command Envelope prior to execution.
  2. The system MUST execute only from the envelope.
  3. The system MUST persist the envelope before execution.
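
Example (illustrative, non-normative): one possible way to honor the persist-before-execute ordering, sketched with an in-memory SQLite table. The table layout and the names open_store and persist_then_execute are assumptions; any durable store would serve.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a hypothetical envelope store backed by SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS envelopes ("
        "command_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def persist_then_execute(conn: sqlite3.Connection, envelope: dict, handler):
    # Persist first: the insert is committed before any side effect runs.
    with conn:
        conn.execute(
            "INSERT INTO envelopes (command_id, body) VALUES (?, ?)",
            (envelope["command_id"], json.dumps(envelope, sort_keys=True)),
        )
    # Execute only from the persisted envelope, never from raw text/audio.
    row = conn.execute(
        "SELECT body FROM envelopes WHERE command_id = ?",
        (envelope["command_id"],),
    ).fetchone()
    return handler(json.loads(row[0]))
```

Reloading the stored copy before calling the handler is a design choice in this sketch: it guarantees that what is executed is exactly what was recorded.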

B. Command Envelope: Minimum Fields

A command that may mutate state MUST be represented by an envelope with at least:

  • command_id (globally unique)
  • timestamp (ISO 8601)
  • actor (user/channel/auth context reference)
  • intent (entity/action/target)
  • args (normalized arguments)
  • confirmation (required, method, confirmed_at when applicable)
  • idempotency_key
  • trace (conversation_id + message identifiers)

Example (illustrative):

{
  "command_id": "uuid",
  "timestamp": "2026-02-22T00:00:00Z",
  "actor": {
    "user_id": "string",
    "channel": "whatsapp",
    "auth_context_id": "string"
  },
  "intent": {
    "entity": "Order",
    "action": "Cancel",
    "target": { "id": "204" }
  },
  "args": { "reason": "optional" },
  "confirmation": {
    "required": true,
    "method": "explicit_yes_or_token",
    "confirmed_at": "optional"
  },
  "idempotency_key": "string",
  "trace": {
    "conversation_id": "string",
    "message_ids": ["string"]
  }
}
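
Example (illustrative, non-normative): a small check that an envelope carries the minimum fields listed above before it is persisted. The function name and the dotted labels it returns are assumptions for illustration.

```python
REQUIRED_FIELDS = (
    "command_id",
    "timestamp",
    "actor",
    "intent",
    "args",
    "confirmation",
    "idempotency_key",
    "trace",
)


def missing_envelope_fields(envelope: dict) -> list:
    """Return the names of minimum fields that are absent (empty if none)."""
    missing = [name for name in REQUIRED_FIELDS if name not in envelope]
    intent = envelope.get("intent")
    if isinstance(intent, dict):
        # intent is described as entity/action/target.
        missing += [
            f"intent.{part}"
            for part in ("entity", "action", "target")
            if part not in intent
        ]
    trace = envelope.get("trace")
    if isinstance(trace, dict):
        # trace is described as conversation_id + message identifiers.
        if "conversation_id" not in trace:
            missing.append("trace.conversation_id")
        if not trace.get("message_ids"):
            missing.append("trace.message_ids")
    return missing
```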

C. Normalization Constraints

Arguments and targets MUST be normalized (e.g., trimming whitespace, canonical casing, stable numeric parsing) before envelope persistence and idempotency computation.
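
Example (illustrative, non-normative): a sketch of one normalization policy. The specific choices (collapsing whitespace, casefolding all strings, stripping leading zeros from digit-only strings) are assumptions; an implementation may need different rules, for instance to preserve case in free-text fields.

```python
def normalize_value(value):
    """Recursively normalize arguments and targets (hypothetical policy)."""
    if isinstance(value, str):
        collapsed = " ".join(value.split())  # trim and collapse whitespace
        if collapsed.isascii() and collapsed.isdigit():
            # Stable numeric parsing: "0204" and "204" become the same value.
            return str(int(collapsed))
        return collapsed.casefold()  # canonical casing
    if isinstance(value, dict):
        items = sorted(value.items(), key=lambda kv: str(kv[0]))
        return {str(k).strip().lower(): normalize_value(v) for k, v in items}
    if isinstance(value, (list, tuple)):
        return [normalize_value(v) for v in value]
    return value
```

Whatever policy is chosen, the same function should feed both envelope persistence and idempotency computation so the two never disagree.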


VIII. Confirmation Semantics

A. Mandatory Confirmation for State Mutation

Any command that mutates state MUST require a confirmation step prior to execution.

Commands that are strictly read-only MAY execute without confirmation.

B. Destructive Action Safeguards

Destructive actions MUST require strengthened confirmation, using at least one of:

  • A fixed explicit phrase (e.g., the user types “CONFIRM”)
  • A context-bound explicit phrase (e.g., “YES, CANCEL ORDER 204”)
  • A short system-generated token (e.g., “Confirm with C9F2”)

Destructive actions MUST NOT rely solely on a generic “yes” response captured via audio STT.
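
Example (illustrative, non-normative): a sketch of the system-generated token option. The helper names, the token alphabet, and the list of generic affirmatives are assumptions. This sketch is stricter than the rule above: it refuses a generic "yes" for destructive actions regardless of whether the reply arrived as text or audio.

```python
import hmac
import secrets
from typing import Optional

GENERIC_AFFIRMATIVES = {"yes", "y", "ok", "okay", "sure", "confirm"}


def new_confirmation_token(length: int = 4) -> str:
    """Generate a short token, avoiding look-alike characters (0/O, 1/I)."""
    alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
    return "".join(secrets.choice(alphabet) for _ in range(length))


def is_confirmed(
    reply: str, *, destructive: bool, expected_token: Optional[str] = None
) -> bool:
    normalized = " ".join(reply.split()).casefold()
    if not destructive:
        # Non-destructive mutations accept a plain affirmative.
        return normalized in GENERIC_AFFIRMATIVES
    if expected_token is None:
        return False
    # Destructive actions only pass when the exact token is echoed back.
    return hmac.compare_digest(
        normalized.encode("utf-8"), expected_token.casefold().encode("utf-8")
    )
```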

C. Preview Before Commit

Before collecting confirmation for any state mutation, the system MUST display a preview including:

  • Target entity and identifier
  • Action to be executed
  • Primary effect / impact
  • Reversibility or compensation availability (if any)
  • Actor identity (human-readable label)
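
Example (illustrative, non-normative): a sketch that renders the five preview elements above as a WhatsApp-friendly text message. The wording of each line and the parameter names are assumptions.

```python
def render_preview(
    envelope: dict, *, effect: str, reversible: bool, actor_label: str
) -> str:
    """Build a plain-text preview from an envelope (hypothetical layout)."""
    intent = envelope["intent"]
    lines = [
        "Please review before confirming:",
        f"Target: {intent['entity']} {intent['target']['id']}",
        f"Action: {intent['action']}",
        f"Effect: {effect}",
        f"Reversible: {'yes' if reversible else 'no'}",
        f"Requested by: {actor_label}",
    ]
    return "\n".join(lines)
```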

IX. Idempotency and Replay Safety

A. Idempotency Key Requirement

All state-mutating commands MUST include an idempotency_key that prevents duplicate execution under message retries, duplicated deliveries, or client resends.

The idempotency key SHOULD be derived from a stable hash of:

  • actor identity
  • intent (entity/action/target)
  • normalized args
  • a bounded time window or monotonic command issuance context
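
Example (illustrative, non-normative): a sketch of the derivation using SHA-256 over a canonical JSON serialization. The five-minute fixed window, the function name, and the serialization format are assumptions. Note that with fixed windows, two resends that straddle a window boundary produce different keys; a monotonic issuance context avoids that edge case.

```python
import hashlib
import json


def idempotency_key(
    actor_id: str,
    intent: dict,
    args: dict,
    issued_at_epoch: float,
    window_seconds: int = 300,
) -> str:
    """Hash actor, intent, normalized args, and a bounded time window."""
    window = int(issued_at_epoch // window_seconds)
    material = json.dumps(
        {"actor": actor_id, "intent": intent, "args": args, "window": window},
        sort_keys=True,  # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

The args passed in are assumed to be already normalized, since hashing un-normalized input would defeat de-duplication.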

B. Execution De-duplication

If a command is received whose idempotency key matches a previously executed command within the applicable window, the system MUST NOT re-execute the mutation and MUST return the previously recorded outcome (or a stable reference to it).
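
Example (illustrative, non-normative): an in-memory sketch of execution de-duplication. The class and method names are assumptions, and window expiry and durable storage are deliberately omitted; a production system would need both.

```python
import threading


class OutcomeStore:
    """Remember outcomes by idempotency key (hypothetical, in-memory only)."""

    def __init__(self):
        self._outcomes = {}
        self._lock = threading.Lock()

    def execute_once(self, key: str, mutation):
        """Run mutation() at most once per key.

        Returns (outcome, executed_now). If mutation() raises, nothing is
        recorded, so a later retry with the same key may run it again.
        """
        with self._lock:
            if key in self._outcomes:
                # Duplicate delivery: return the recorded outcome instead.
                return self._outcomes[key], False
            outcome = mutation()
            self._outcomes[key] = outcome
            return outcome, True
```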

C. Deterministic Outcome Logging

The system MUST record an outcome state for each command, at minimum:

  • accepted | confirmed | executed | rejected | failed | compensated

If failed, the system SHOULD record a reason code and a human-readable explanation suitable for WhatsApp display.
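
Example (illustrative, non-normative): a sketch of an outcome record covering the six states above. The record layout and field names (reason_code, explanation) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OutcomeState(str, Enum):
    ACCEPTED = "accepted"
    CONFIRMED = "confirmed"
    EXECUTED = "executed"
    REJECTED = "rejected"
    FAILED = "failed"
    COMPENSATED = "compensated"


@dataclass(frozen=True)
class OutcomeRecord:
    """Immutable outcome entry linked to a command (hypothetical layout)."""

    command_id: str
    state: OutcomeState
    reason_code: Optional[str] = None  # machine-readable, for failures
    explanation: Optional[str] = None  # short text suitable for WhatsApp


def failed(command_id: str, reason_code: str, explanation: str) -> OutcomeRecord:
    return OutcomeRecord(command_id, OutcomeState.FAILED, reason_code, explanation)
```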


X. Multimodal Requirements (Text and Audio)

A. Audio Acceptance and STT

The system MUST accept audio input and transcribe it via STT for command interpretation.

B. Canonical Preview for Audio

For any command initiated from audio that may mutate state, the system MUST show the interpreted canonical intent and target prior to confirmation.

C. Trace Retention

The system SHOULD retain: (1) the transcription, and (2) a pointer to the original audio message identifier, bound to the envelope trace.
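
Example (illustrative, non-normative): a sketch of binding the transcription and the audio message identifier to the envelope trace. The "audio" key inside trace is an assumption; the envelope model in this RFC only requires conversation_id and message identifiers.

```python
def bind_audio_trace(
    envelope: dict, *, audio_message_id: str, transcription: str
) -> dict:
    """Return a copy of the envelope with audio trace data attached."""
    trace = dict(envelope.get("trace", {}))
    message_ids = list(trace.get("message_ids", []))
    if audio_message_id not in message_ids:
        message_ids.append(audio_message_id)
    trace["message_ids"] = message_ids
    # Hypothetical extension field: pointer to the audio plus its transcript.
    trace["audio"] = {
        "message_id": audio_message_id,
        "transcription": transcription,
    }
    return {**envelope, "trace": trace}
```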


XI. Ambiguity Handling

The system MUST reject ambiguous commands that cannot be canonicalized safely, and MUST request disambiguation within WhatsApp by:

  • listing candidates as numbered options, and/or
  • requesting missing fields explicitly

The system MUST NOT “guess” a destructive target when multiple plausible targets exist.
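
Example (illustrative, non-normative): a sketch of target resolution that never guesses. The function name, the result shape, and the prompt wording are assumptions.

```python
def resolve_target(candidates: list) -> dict:
    """Resolve a target only when exactly one candidate matches."""
    if len(candidates) == 1:
        return {"status": "resolved", "target": candidates[0]}
    if not candidates:
        # Missing field: ask for it explicitly.
        return {
            "status": "needs_input",
            "prompt": "Which item do you mean? Please send its identifier.",
        }
    # Several plausible targets: list them as numbered options.
    options = "\n".join(
        f"{number}. {label}" for number, label in enumerate(candidates, start=1)
    )
    return {
        "status": "needs_input",
        "prompt": f"More than one match. Reply with a number:\n{options}",
    }
```

The reply to the numbered prompt can then be handled as an Action Menu selection, which canonicalizes into the same envelope model.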


XII. Compliance Profiles

A CCP implementation MAY advertise one of the following profiles:

  • CCP-Basic: Envelope + confirmation for mutations + minimal idempotency
  • CCP-Strong: Destructive safeguards + preview + robust replay-safe behavior
  • CCP-Auditable: Strong + full trace linkage + append-only outcome record semantics

XIII. Relationship to Other WhatsApp-first RFCs

  • RFC-WF-0001 (WFCS): defines classification constraints (parity, multimodal minimums, admin sovereignty).
  • RFC-WF-0002 (WWCS): defines UI-to-conversation conversion patterns.
  • This RFC (CCP): defines command execution semantics and envelope-based determinism.

XIV. Security Considerations

CCP mandates integration points for authorization and audit:

  1. The system MUST authorize each command against actor privileges and capability scope.
  2. The system MUST maintain an auditable record binding envelopes to conversation traces.
  3. The system MUST protect destructive actions with strengthened confirmation (Section VIII.B) and SHOULD choose confirmation methods that resist STT ambiguity.
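
Example (illustrative, non-normative): a sketch of the authorization integration point in item 1. The capability naming scheme ("entity.action" in lowercase) and the shape of the grants table are assumptions; this RFC does not define a privilege model.

```python
def authorize(envelope: dict, grants: dict) -> bool:
    """Check the envelope's intent against the actor's granted capabilities.

    grants maps a user_id to a set of capability strings (hypothetical).
    """
    intent = envelope["intent"]
    capability = f"{intent['entity']}.{intent['action']}".lower()
    allowed = grants.get(envelope["actor"]["user_id"], set())
    return capability in allowed
```

Running this check against the envelope, rather than against raw message text, keeps authorization aligned with what will actually be executed.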

Cryptographic identity, channel binding, and end-to-end security requirements are deferred to the Administrative Command Security Model RFC.


XV. Conclusion

The Conversational Command Protocol standardizes how WhatsApp interactions become safe, replay-resistant, and auditable operations. By enforcing envelope-based canonical execution, mandatory confirmation for mutations, and idempotent semantics, CCP provides the minimum protocol substrate required for truly operable WhatsApp-first systems.


References

[1] RFC-WF-0001, WhatsApp-First Compliance Core (WFCS), Draft/Proposed Standard.
[2] RFC-WF-0002, Web-to-WhatsApp Conversion Standard (WWCS), Draft/Proposed Standard.


Concepts and Technologies

Command envelope, idempotency key, replay-safe execution, destructive-action safeguards, STT-based multimodal input, conversational audit trails, capability authorization hooks.
