DEV Community

suissAI

FullAgenticStack WhatsApp-first: RFC-WF-0019

RFC-WF-0019

Data Retention, Redaction & Privacy over Conversation (DRPC)

Status: Draft Standard
Version: 1.0.0
Date: 20 Nov 2025
Category: Standards Track
Author: FullAgenticStack Initiative
Dependencies: RFC-WF-0004 (ACSM), RFC-WF-0006 (EAS), RFC-WF-0007 (OoC), RFC-WF-0014 (CAMN), RFC-WF-0015 (PPGP), RFC-WF-0016 (RCMC)
License: Open Specification (Public, Royalty-Free)


Abstract

This document specifies Data Retention, Redaction & Privacy over Conversation (DRPC) for WhatsApp-first systems. DRPC defines normative requirements for data minimization, retention schedules, redaction policies, and privacy-safe conversational observability across messages, transcripts, media, evidence artifacts, and operational telemetry. DRPC establishes a governance model that preserves auditability (EAS) while enabling privacy and regulatory compliance (e.g., GDPR/LGPD-aligned principles) in systems operated primarily through WhatsApp.

Index Terms— data retention, redaction, privacy, LGPD, GDPR, conversational systems, evidence artifacts, minimization, media retention.


I. Introduction

WhatsApp-first systems concentrate operational power and sensitive data into a conversational channel: message text, audio, documents, images, and the derived telemetry and evidence records created by execution. Without a clear retention and redaction standard, systems risk either:

  • retaining too much (privacy risk, breach blast radius), or
  • retaining too little (no auditability, unverifiable compliance).

DRPC defines how to keep systems both auditable and privacy-safe, using policy packs (PPGP) and scope-gated access (ACSM).


II. Scope

DRPC specifies:

  • Data categories and classification
  • Mandatory retention capabilities and default guidance
  • Redaction policies for OoC and evidence exposure
  • Media and transcript handling (STT)
  • Multi-tenant privacy constraints
  • Data export and deletion semantics compatible with append-only evidence
  • Evidence minimization patterns (store references/hashes vs raw payloads)

DRPC does not provide legal advice; it defines technical controls that support privacy principles.


III. Normative Language

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are normative in this document.


IV. Definitions

Retention Policy: Rules defining how long a data category is stored and under what conditions it is purged or archived.
Redaction: Removal or masking of sensitive fields from outputs.
Minimization: Collecting/storing only what is necessary for operation and audit.
Derived Data: Data produced from raw inputs, e.g., STT transcripts, embeddings, summaries.
Evidence Minimalism: Storing evidence references/hashes rather than raw sensitive payloads where feasible.


V. Data Categories (Normative Classification)

Implementations MUST classify stored data into at least:

  • D1 Conversation Content: message text, interactive replies
  • D2 Audio Content: audio blobs and STT transcripts
  • D3 Media Content: documents, images, attachments, OCR output (if used)
  • D4 Operational State: domain data (orders, inventory, CRM records)
  • D5 Evidence Artifacts: append-only EAS records and integrity chain
  • D6 Telemetry: logs/metrics/traces/events (including OoC query logs)
  • D7 Secrets: API keys, tokens, connector credentials (special handling)

Each category MUST have retention and redaction rules.
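The classification above can be sketched as an enumeration plus a completeness check. This is a minimal illustration, not part of the specification: the names DataCategory and missing_rules are hypothetical, and the rule values are treated as opaque.

```python
from enum import Enum


class DataCategory(Enum):
    """The seven DRPC data categories (member names are illustrative)."""
    D1_CONVERSATION = "D1"
    D2_AUDIO = "D2"
    D3_MEDIA = "D3"
    D4_OPERATIONAL_STATE = "D4"
    D5_EVIDENCE = "D5"
    D6_TELEMETRY = "D6"
    D7_SECRETS = "D7"


def missing_rules(retention: dict, redaction: dict) -> list[str]:
    """Return the codes of categories lacking a retention or a redaction rule."""
    return [
        c.value
        for c in DataCategory
        if c not in retention or c not in redaction
    ]
```

A deployment check could refuse to start while missing_rules returns a non-empty list, which makes the "each category MUST have rules" requirement mechanically testable.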


VI. Minimization Requirements

A. Store-by-Reference (Recommended Default)

Systems SHOULD avoid duplicating raw message/media payloads across multiple stores. Preferred approach:

  • store raw media in a controlled blob store
  • store hashes + storage refs in NMEs and evidence
  • store transcripts/summaries only if needed

B. Evidence Minimalism (Normative Constraint)

Evidence artifacts MUST NOT require storing full raw sensitive payloads to prove execution. Evidence SHOULD contain:

  • command intent and normalized args (redacted as needed)
  • affected resources summary
  • references to raw inputs via message_id and optional content hash
  • policy decisions and outcomes
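As a rough sketch of evidence minimalism, the record below references the raw input by message_id and a content hash instead of embedding it. The field names and the choice of SHA-256 are assumptions made for illustration; the actual evidence schema is defined by EAS, not here.

```python
import hashlib


def build_evidence(message_id: str, raw_payload: bytes, intent: str,
                   args: dict, outcome: str) -> dict:
    """Build an evidence record that references the input rather than storing it."""
    return {
        "intent": intent,
        "args": args,  # assumed to be normalized and redacted by the caller
        "input_ref": {
            "message_id": message_id,
            "content_sha256": hashlib.sha256(raw_payload).hexdigest(),
        },
        "outcome": outcome,
    }


def matches_input(evidence: dict, candidate: bytes) -> bool:
    """Check whether a candidate payload is the one the evidence refers to."""
    expected = evidence["input_ref"]["content_sha256"]
    return hashlib.sha256(candidate).hexdigest() == expected
```

With this shape, the raw payload can later be purged under its own retention policy while the evidence record still proves which input drove the command. An unsalted hash of short, guessable text can be recovered by guessing, so the optional content hash may itself need protection for low-entropy inputs.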

C. Derived Data Control

Derived data (STT transcripts, OCR, embeddings) MUST be:

  • labeled with provenance (source message id)
  • governed by retention policies
  • redacted in OoC outputs by default unless privileged
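One way to carry these three properties on a derived item is sketched below. DerivedData and render_for_ooc are hypothetical names, and the placeholder text shown to non-privileged callers is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedData:
    kind: str               # e.g. "stt_transcript", "ocr", "embedding"
    source_message_id: str  # provenance: the message this was derived from
    category: str           # data category whose retention policy governs it
    content: str


def render_for_ooc(item: DerivedData, privileged: bool = False) -> str:
    """Return the content only for privileged callers; otherwise a placeholder."""
    if privileged:
        return item.content
    return f"[{item.kind} redacted; source={item.source_message_id}]"
```

Because redaction is the default argument, a caller has to opt in explicitly to see derived content.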

VII. Retention Policy Model

A. Mandatory Support

Implementations MUST support policy configuration per data category:

  • retention duration
  • deletion method (purge vs archive)
  • legal hold / incident hold (optional but recommended)
  • tenant overrides (where allowed)

Retention policies SHOULD be distributed via PPGP.
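A per-category policy with the four configurable elements might look like the sketch below. The class and field names are assumptions, and the override logic is deliberately simple: it ignores tenant values unless the base policy permits overrides.

```python
from dataclasses import dataclass, replace
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    category: str                    # e.g. "D1"
    duration: timedelta              # how long the data is kept
    deletion_method: str = "purge"   # "purge" or "archive"
    hold: bool = False               # legal or incident hold
    tenant_overridable: bool = False


def effective_policy(base: RetentionPolicy,
                     tenant_duration: Optional[timedelta]) -> RetentionPolicy:
    """Apply a tenant's duration override only where the base policy allows it."""
    if tenant_duration is None or not base.tenant_overridable:
        return base
    return replace(base, duration=tenant_duration)
```

A stricter variant could also cap overrides so that tenants can only shorten retention; whether that is appropriate is a policy decision this sketch does not make.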

B. Default Guidance (Non-Normative)

Typical defaults (illustrative; policy-defined):

  • D1 conversation text: 90–180 days
  • D2 raw audio: 7–30 days; transcript 30–90 days
  • D3 documents/images: 7–90 days (depending on business need)
  • D5 evidence: longest retention (audit), possibly years
  • D6 telemetry: shorter hot retention + longer aggregated retention

C. Purge Semantics

When retention expires, the system MUST:

  • purge or archive per policy
  • preserve evidence integrity semantics (Section X)
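The expiry decision can be sketched as a pure function, which keeps it easy to test. The function name and the string return values are illustrative; passing the current time as a parameter is a design choice that avoids hidden clock dependencies.

```python
from datetime import datetime, timedelta, timezone


def retention_action(stored_at: datetime, duration: timedelta,
                     deletion_method: str, on_hold: bool,
                     now: datetime) -> str:
    """Return "retain", "purge", or "archive" for a single stored item."""
    if on_hold:
        return "retain"  # a legal or incident hold suspends expiry
    if now - stored_at < duration:
        return "retain"  # retention period has not elapsed yet
    return deletion_method  # expired: act as the policy dictates


# Usage: an item stored 31 days ago under a 30-day purge policy.
stored = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(retention_action(stored, timedelta(days=30), "purge", False,
                       stored + timedelta(days=31)))
```

This function only decides what to do with raw content; keeping the evidence chain intact while that content disappears is a separate concern handled on the evidence side.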

VIII. Redaction Rules for Conversational Outputs (OoC)

A. Default Redaction

OoC MUST apply redaction by default for:

  • PII fields (masking)
  • secrets (drop)
  • internal topology identifiers (drop)
  • full document contents (not displayed unless privileged)

B. Privileged Detail Levels

If higher-detail OoC views are supported:

  • they MUST be scope-gated (ACSM)
  • they SHOULD require step-up for sensitive data
  • they MUST log access as evidence/telemetry
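The three conditions above can be combined in a single access function, sketched here under several assumptions: the scope name "ooc:detail" is hypothetical, the access log is a plain list standing in for an evidence or telemetry sink, and step-up is required for every detailed view rather than only for sensitive data.

```python
from datetime import datetime, timezone

DETAIL_SCOPE = "ooc:detail"  # hypothetical scope name, not defined by ACSM


def view_record(record: dict, summary_fields: set, actor_id: str,
                scopes: set, stepped_up: bool, access_log: list) -> dict:
    """Return the full record only to scoped, stepped-up actors; log every call."""
    privileged = DETAIL_SCOPE in scopes and stepped_up
    access_log.append({
        "actor_id": actor_id,
        "record_id": record["id"],
        "granted_detail": privileged,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if privileged:
        return dict(record)
    # Fall back to the default summary view.
    return {k: v for k, v in record.items() if k in summary_fields}
```

Logging denied attempts as well as granted ones gives reviewers a record of who tried to reach detailed views, not just who succeeded.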

C. Structured Redaction

Redaction policies MUST be structured and testable (e.g., field-level rules), and SHOULD be bound via PPGP.
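A field-level rule table is one way to make redaction structured and testable. The sketch below is illustrative: the field names, the "mask" and "drop" action labels, and the choice to keep the last two characters when masking are all assumptions.

```python
# Field name -> action. Fields without a rule are passed through unchanged.
RULES = {
    "phone": "mask",
    "email": "mask",
    "api_key": "drop",
    "internal_host": "drop",
}


def mask(value: str, visible: int = 2) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact(record: dict, rules: dict = RULES) -> dict:
    """Apply field-level rules to a flat record and return a redacted copy."""
    out = {}
    for key, value in record.items():
        action = rules.get(key, "keep")
        if action == "drop":
            continue
        out[key] = mask(str(value)) if action == "mask" else value
    return out
```

Because the rules are plain data, they can be versioned, shipped in a policy pack, and exercised by unit tests independently of the code that applies them. A production rule set would likely default unknown fields to redaction rather than pass-through; the permissive default here keeps the sketch short.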


IX. Multi-Tenant Privacy Constraints

Implementations MUST guarantee:

  • strict tenant isolation for all categories
  • OoC queries cannot enumerate other tenants’ data
  • evidence queries enforce tenant scoping and authorization

Cross-tenant leakage MUST be treated as a critical compliance failure.
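Tenant scoping on queries can be sketched as follows. The in-memory list and the tenant_id field are stand-ins for a real store, and the function name is hypothetical. The point is that the tenant filter comes from the authenticated caller, never from user-supplied query text.

```python
def query_evidence(store: list[dict], caller_tenant: str,
                   requested_tenant: str) -> list[dict]:
    """Return evidence rows for the caller's own tenant only."""
    if requested_tenant != caller_tenant:
        # Reject outright instead of returning an empty result, so the
        # attempt surfaces as an error that can be logged and alerted on.
        raise PermissionError("cross-tenant query denied")
    return [row for row in store if row["tenant_id"] == caller_tenant]
```

A test suite can then assert that any mismatch between caller and requested tenant raises, which gives the isolation guarantee a concrete, repeatable check.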


X. Deletion, Export, and “Right-to-Delete” Compatibility with Evidence

A. Append-only Evidence Constraint

Evidence artifacts are append-only; therefore “deleting history” is not compatible with audit integrity.

To reconcile privacy deletion with evidence:

  • systems SHOULD delete or anonymize raw content (D1–D3) while retaining minimal evidence (D5)
  • evidence SHOULD store references/hashes, not raw content
  • when required, evidence MAY store pseudonymous identifiers instead of direct PII

B. Anonymization Events

If user data is deleted/anonymized, the system SHOULD emit an evidence artifact indicating:

  • what category was purged
  • when
  • under which policy/legal basis
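An anonymization event can be appended to a hash-chained log without touching earlier entries, as sketched below. The chain format (SHA-256 over a JSON body that includes the previous hash) is an assumption for illustration and is not the EAS integrity chain definition; the event field names are likewise hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps({"prev_hash": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev_hash": prev_hash, "event": event,
             "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every link; any edited or removed entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if _digest(entry["prev_hash"], entry["event"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A purge would then be recorded with something like append_event(chain, {"type": "anonymization", "category": "D1", "at": "2025-11-20T00:00:00Z", "basis": "retention-policy"}), leaving prior entries untouched and the chain verifiable.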

C. Export

Systems MAY support exporting user-visible operational history. Exports MUST respect redaction policies and MUST NOT expose secrets.


XI. Media and Transcript Handling (CAMN Alignment)

  • Media storage references MUST be access-controlled and tenant-scoped.
  • STT transcripts MUST be treated as sensitive derived data.
  • OoC MUST NOT reveal raw media content unless explicitly authorized.

XII. Compliance Controls (RCMC/CATS Alignment)

Implementations MUST be auditable for:

  • existence of retention policies per category
  • redaction applied to OoC outputs
  • evidence minimalism (no unnecessary raw payload replication)
  • access logs for privileged views
  • tenant isolation checks

These requirements map to control IDs (defined in RCMC) and are validated by CATS/TVRS where applicable.


XIII. Security Considerations

  • Over-retention increases breach impact; under-retention breaks auditability.
  • Derived data can leak more than raw input (summaries/embeddings).
  • Privileged OoC access must be monitored and rate-limited.
  • Secrets must never appear in evidence or OoC outputs.

XIV. Conclusion

DRPC defines the privacy spine of WhatsApp-first systems: retention and redaction policies that preserve conversational operability while limiting data exposure and supporting regulatory-aligned principles. By combining evidence minimalism, tenant isolation, and scope-gated observability, DRPC enables WhatsApp-first to scale responsibly.


References

[1] RFC-WF-0004, Administrative Command Security Model (ACSM).
[2] RFC-WF-0006, Evidence Artifact Schema (EAS).
[3] RFC-WF-0007, Observability over Conversation (OoC).
[4] RFC-WF-0014, Channel Adapter & Message Normalization (CAMN).
[5] RFC-WF-0015, Policy Packs & Governance Profiles (PPGP).
[6] RFC-WF-0016, Reference Compliance Matrix & Control IDs (RCMC).


Concepts and Technologies

Data minimization, retention schedules, field-level redaction, derived data governance (STT/OCR/embeddings), evidence minimalism, append-only audit compatibility, anonymization events, tenant isolation, privileged access logging.
