suissAI

Posted on Feb 26

FullAgenticStack WhatsApp-first: RFC-WF-0019

#data #privacy #security #systemdesign

RFC-WF-0019

Data Retention, Redaction & Privacy over Conversation (DRPC)

Status: Draft Standard
Version: 1.0.0
Date: 20 Nov 2025
Category: Standards Track
Author: FullAgenticStack Initiative
Dependencies: RFC-WF-0004 (ACSM), RFC-WF-0006 (EAS), RFC-WF-0007 (OoC), RFC-WF-0014 (CAMN), RFC-WF-0015 (PPGP), RFC-WF-0016 (RCMC)
License: Open Specification (Public, Royalty-Free)

Abstract

This document specifies Data Retention, Redaction & Privacy over Conversation (DRPC) for WhatsApp-first systems. DRPC defines normative requirements for data minimization, retention schedules, redaction policies, and privacy-safe conversational observability across messages, transcripts, media, evidence artifacts, and operational telemetry. DRPC establishes a governance model that preserves auditability (EAS) while enabling privacy and regulatory compliance (e.g., GDPR/LGPD-aligned principles) in systems operated primarily through WhatsApp.

Index Terms— data retention, redaction, privacy, LGPD, GDPR, conversational systems, evidence artifacts, minimization, media retention.

I. Introduction

WhatsApp-first systems concentrate operational power and sensitive data into a conversational channel: message text, audio, documents, images, and the derived telemetry and evidence records created by execution. Without a clear retention and redaction standard, systems risk either:

retaining too much (privacy risk, larger breach blast radius), or
retaining too little (no auditability, unverifiable compliance).

DRPC defines how to keep systems both auditable and privacy-safe, using policy packs (PPGP) and scope-gated access (ACSM).

II. Scope

DRPC specifies:

Data categories and classification
Mandatory retention capabilities and default guidance
Redaction policies for OoC and evidence exposure
Media and transcript handling (STT)
Multi-tenant privacy constraints
Data export and deletion semantics compatible with append-only evidence
Evidence minimization patterns (store references/hashes vs raw payloads)

DRPC does not provide legal advice; it defines technical controls that support privacy principles.

III. Normative Language

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are normative.

IV. Definitions

Retention Policy: Rules defining how long a data category is stored and under what conditions it is purged or archived.
Redaction: Removal or masking of sensitive fields from outputs.
Minimization: Collecting/storing only what is necessary for operation and audit.
Derived Data: Data produced from raw inputs, e.g., STT transcripts, embeddings, summaries.
Evidence Minimalism: Storing evidence references/hashes rather than raw sensitive payloads where feasible.

V. Data Categories (Normative Classification)

Implementations MUST classify stored data into at least:

D1 Conversation Content: message text, interactive replies
D2 Audio Content: audio blobs and STT transcripts
D3 Media Content: documents, images, attachments, OCR output (if used)
D4 Operational State: domain data (orders, inventory, CRM records)
D5 Evidence Artifacts: append-only EAS records and integrity chain
D6 Telemetry: logs/metrics/traces/events (including OoC query logs)
D7 Secrets: API keys, tokens, connector credentials (special handling)

Each category MUST have retention and redaction rules.
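
Illustrative sketch (non-normative): one way to encode the seven categories and to check that every category has both retention and redaction rules. The names DataCategory, CategoryRules, and missing_rules, and the rule labels such as "mask_pii", are assumptions made for this example and are not defined by DRPC.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataCategory(Enum):
    D1_CONVERSATION = "D1"
    D2_AUDIO = "D2"
    D3_MEDIA = "D3"
    D4_OPERATIONAL_STATE = "D4"
    D5_EVIDENCE = "D5"
    D6_TELEMETRY = "D6"
    D7_SECRETS = "D7"


@dataclass(frozen=True)
class CategoryRules:
    retention_days: Optional[int]  # None: duration is set elsewhere by policy
    redaction: str                 # hypothetical rule label, e.g. "mask_pii"


def missing_rules(rules: dict) -> list:
    """Return every category that has no rules configured."""
    return [category for category in DataCategory if category not in rules]


# Usage: a configuration that only covers D1 is reported as incomplete.
partial = {DataCategory.D1_CONVERSATION: CategoryRules(90, "mask_pii")}
gaps = missing_rules(partial)
```

A deployment could run such a check at startup and refuse to serve traffic while gaps is non-empty.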

VI. Minimization Requirements

A. Store-by-Reference (Recommended Default)

Systems SHOULD avoid duplicating raw message/media payloads across multiple stores. Preferred approach:

store raw media in a controlled blob store
store hashes + storage refs in NMEs and evidence
store transcripts/summaries only if needed
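
Illustrative sketch (non-normative): store-by-reference with an in-memory stand-in for the controlled blob store. The BlobStore class, the MediaRef record, and the "tenant/digest" key layout are assumptions for this example; a real deployment would use its own storage backend and key scheme.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaRef:
    message_id: str
    sha256: str
    storage_ref: str  # hypothetical blob-store key


class BlobStore:
    """In-memory stand-in for a controlled blob store."""

    def __init__(self):
        self._blobs = {}

    def put(self, tenant_id: str, payload: bytes):
        digest = hashlib.sha256(payload).hexdigest()
        key = f"{tenant_id}/{digest}"
        self._blobs[key] = payload  # identical payloads map to one key
        return key, digest

    def count(self) -> int:
        return len(self._blobs)


def store_by_reference(store: BlobStore, tenant_id: str,
                       message_id: str, payload: bytes) -> MediaRef:
    """Write the raw payload once and return only a hash and a reference."""
    key, digest = store.put(tenant_id, payload)
    return MediaRef(message_id=message_id, sha256=digest, storage_ref=key)


# Usage: two messages carrying the same bytes share a single stored blob.
store = BlobStore()
ref_a = store_by_reference(store, "tenant-1", "msg-1", b"invoice.pdf bytes")
ref_b = store_by_reference(store, "tenant-1", "msg-2", b"invoice.pdf bytes")
```

Downstream records then carry ref_a or ref_b instead of the payload itself, so the raw bytes exist in exactly one place.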

B. Evidence Minimalism (Normative Constraint)

Evidence artifacts MUST NOT require storing full raw sensitive payloads to prove execution. Evidence SHOULD contain:

command intent and normalized args (redacted as needed)
affected resources summary
references to raw inputs via message_id and optional content hash
policy decisions and outcomes
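
Illustrative sketch (non-normative): building an evidence record that proves execution without embedding the raw payload. The field names and the SENSITIVE_ARGS set are assumptions for this example; the actual evidence schema is defined by EAS, not here.

```python
import hashlib
from typing import Optional

# Hypothetical argument names that are redacted before being recorded.
SENSITIVE_ARGS = {"phone", "email", "document_number"}


def build_evidence(intent: str, args: dict, resources: list,
                   message_id: str, raw_input: Optional[bytes],
                   decision: str, outcome: str) -> dict:
    """Record what happened, referencing the input instead of copying it."""
    redacted_args = {
        key: "[REDACTED]" if key in SENSITIVE_ARGS else value
        for key, value in args.items()
    }
    record = {
        "intent": intent,
        "args": redacted_args,
        "affected_resources": resources,
        "input_ref": {"message_id": message_id},
        "policy_decision": decision,
        "outcome": outcome,
    }
    if raw_input is not None:
        # Optional content hash: lets an auditor match the input later
        # without the evidence record holding the text itself.
        digest = hashlib.sha256(raw_input).hexdigest()
        record["input_ref"]["content_sha256"] = digest
    return record


evidence = build_evidence(
    "refund_order",
    {"order_id": "o-1", "email": "customer@example.com"},
    ["order:o-1"],
    "msg-1",
    b"please refund order o-1",
    "allow",
    "ok",
)
```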

C. Derived Data Control

Derived data (STT transcripts, OCR, embeddings) MUST be:

labeled with provenance (source message id)
governed by retention policies
redacted in OoC outputs by default unless privileged
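
Illustrative sketch (non-normative): a derived-data record that carries its provenance and retention, plus an OoC rendering that hides content unless the caller is privileged. DerivedItem and render_for_ooc are hypothetical names, and the boolean privileged flag stands in for a real ACSM scope check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedItem:
    kind: str               # e.g. "stt_transcript", "ocr", "embedding"
    source_message_id: str  # provenance: the message this was derived from
    retention_days: int     # governed by the category's retention policy
    content: str


def render_for_ooc(item: DerivedItem, privileged: bool = False) -> dict:
    """Return an OoC view; content is redacted unless privileged."""
    return {
        "kind": item.kind,
        "source_message_id": item.source_message_id,
        "content": item.content if privileged else "[REDACTED]",
    }


transcript = DerivedItem("stt_transcript", "msg-7", 30, "my card number is ...")
default_view = render_for_ooc(transcript)
privileged_view_of_item = render_for_ooc(transcript, privileged=True)
```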

VII. Retention Policy Model

A. Mandatory Support

Implementations MUST support policy configuration per data category:

retention duration
deletion method (purge vs archive)
legal hold / incident hold (optional but recommended)
tenant overrides (where allowed)

Retention policies SHOULD be distributed via PPGP.
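
Illustrative sketch (non-normative): a per-category retention policy with an optional tenant override. The field names are assumptions, and so is the rule that a tenant may only shorten retention; DRPC itself only says overrides apply "where allowed".

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class DeletionMethod(Enum):
    PURGE = "purge"
    ARCHIVE = "archive"


@dataclass(frozen=True)
class RetentionPolicy:
    category: str
    retention_days: int
    deletion_method: DeletionMethod
    legal_hold: bool = False
    tenant_override_allowed: bool = False


def effective_policy(base: RetentionPolicy,
                     tenant_days: Optional[int]) -> RetentionPolicy:
    """Apply a tenant override if the base policy permits one."""
    if tenant_days is None or not base.tenant_override_allowed:
        return base
    # Assumption for this sketch: overrides can shorten, never extend.
    return replace(base, retention_days=min(base.retention_days, tenant_days))


d1 = RetentionPolicy("D1", 180, DeletionMethod.PURGE,
                     tenant_override_allowed=True)
d5 = RetentionPolicy("D5", 1825, DeletionMethod.ARCHIVE)
```

With these values, effective_policy(d1, 90) yields a 90-day policy, while any override requested for d5 is ignored because that policy does not allow one.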

B. Default Guidance (Non-Normative)

Typical defaults (illustrative; policy-defined):

D1 conversation text: 90–180 days
D2 raw audio: 7–30 days; transcript 30–90 days
D3 documents/images: 7–90 days (depending on business need)
D5 evidence: longest retention (audit), possibly years
D6 telemetry: shorter hot retention + longer aggregated retention

C. Purge Semantics

When retention expires, the system MUST:

purge or archive per policy
preserve evidence integrity semantics (Section X)
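
Illustrative sketch (non-normative): the decision a retention sweep might make for a single stored item. The function name, the string results, and the treatment of a hold as an unconditional "keep" are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def retention_action(stored_at: datetime, retention_days: int,
                     method: str, now: datetime,
                     on_hold: bool = False) -> str:
    """Return "keep", or the policy's method ("purge" or "archive")."""
    if on_hold:
        # A legal or incident hold suspends expiry in this sketch.
        return "keep"
    if now - stored_at < timedelta(days=retention_days):
        return "keep"
    return method


now = datetime(2025, 11, 20, tzinfo=timezone.utc)
expired = now - timedelta(days=31)
fresh = now - timedelta(days=5)
```

A sweep job would call retention_action for each item and then perform the purge or archive step, emitting evidence of the action rather than rewriting existing evidence.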

VIII. Redaction Rules for Conversational Outputs (OoC)

A. Default Redaction

OoC MUST apply redaction by default for:

PII fields (masking)
secrets (drop)
internal topology identifiers (drop)
full document contents (not displayed unless privileged)

B. Privileged Detail Levels

If higher-detail OoC views are supported:

they MUST be scope-gated (ACSM)
they SHOULD require step-up for sensitive data
they MUST log access as evidence/telemetry
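
Illustrative sketch (non-normative): a higher-detail view that is scope-gated, asks for step-up on sensitive data, and logs each granted access. The scope string "ooc:detail", the Caller type, and the in-memory ACCESS_LOG are hypothetical; real scopes and step-up mechanics come from ACSM. This sketch enforces step-up strictly, although the text above only makes it a SHOULD.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    pass


@dataclass(frozen=True)
class Caller:
    user_id: str
    scopes: frozenset
    stepped_up: bool = False


ACCESS_LOG = []  # stand-in for an evidence/telemetry sink


def privileged_view(caller: Caller, record: dict,
                    sensitive: bool = False) -> dict:
    """Return the detailed record only for authorized callers."""
    if "ooc:detail" not in caller.scopes:
        raise AccessDenied("missing scope ooc:detail")
    if sensitive and not caller.stepped_up:
        raise AccessDenied("step-up authentication required")
    ACCESS_LOG.append({
        "user_id": caller.user_id,
        "sensitive": sensitive,
        "fields": sorted(record),  # log field names, not values
    })
    return dict(record)
```

Logging field names rather than values keeps the access log itself from becoming a second copy of the sensitive data.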

C. Structured Redaction

Redaction policies MUST be structured and testable (e.g., field-level rules), and SHOULD be bound via PPGP.
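
Illustrative sketch (non-normative): field-level redaction rules expressed as data, so that they can be unit-tested. The field names in RULES are hypothetical examples of the default categories in Section VIII.A (PII masked, secrets and topology identifiers dropped, document contents withheld).

```python
DROP = object()  # sentinel: remove the field from the output entirely


def mask(value) -> str:
    """Mask all but the last four characters; mask short values fully."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def drop(_value):
    return DROP


def withhold(_value) -> str:
    return "[WITHHELD]"


RULES = {
    "phone": mask,              # PII: masked
    "email": mask,              # PII: masked
    "api_key": drop,            # secret: dropped
    "node_id": drop,            # internal topology identifier: dropped
    "document_body": withhold,  # full document contents: not displayed
}


def redact(record: dict, rules: dict = RULES) -> dict:
    """Apply field-level rules; fields without a rule pass through."""
    output = {}
    for key, value in record.items():
        rule = rules.get(key)
        result = value if rule is None else rule(value)
        if result is not DROP:
            output[key] = result
    return output


redacted = redact({
    "phone": "+5511999990000",
    "api_key": "not-a-real-key",
    "node_id": "worker-3",
    "document_body": "full contract text",
    "status": "delivered",
})
```

Because the rules are a plain mapping, a policy pack could ship the mapping and a test suite could assert on redact() output for known inputs.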

IX. Multi-Tenant Privacy Constraints

Implementations MUST guarantee:

strict tenant isolation for all categories
OoC queries cannot enumerate other tenants’ data
evidence queries enforce tenant scoping and authorization

Cross-tenant leakage MUST be treated as a critical compliance failure.
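
Illustrative sketch (non-normative): a store whose only query path requires the caller's tenant, so no query can enumerate another tenant's rows. TenantScopedStore is a hypothetical in-memory example; production systems would enforce the same constraint at the database or storage layer.

```python
class TenantScopedStore:
    """Every read is filtered by the caller's tenant before any predicate."""

    def __init__(self):
        self._rows = []  # list of (tenant_id, row) pairs

    def add(self, tenant_id: str, row: dict) -> None:
        self._rows.append((tenant_id, dict(row)))

    def query(self, caller_tenant_id: str, predicate=None) -> list:
        matches = []
        for tenant_id, row in self._rows:
            if tenant_id != caller_tenant_id:
                continue  # tenant filter is applied first, unconditionally
            if predicate is None or predicate(row):
                matches.append(dict(row))  # return copies, not internals
        return matches


orders = TenantScopedStore()
orders.add("tenant-a", {"order_id": "a-1", "total": 10})
orders.add("tenant-b", {"order_id": "b-1", "total": 99})
visible_to_a = orders.query("tenant-a")
```

There is deliberately no method that lists rows without a tenant argument, which makes an accidental cross-tenant query harder to write.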

X. Deletion, Export, and “Right-to-Delete” Compatibility with Evidence

A. Append-only Evidence Constraint

Evidence artifacts are append-only; therefore “deleting history” is not compatible with audit integrity.

To reconcile privacy deletion with evidence:

systems SHOULD delete or anonymize raw content (D1–D3) while retaining minimal evidence (D5)
evidence SHOULD store references/hashes, not raw content
when required, evidence MAY store pseudonymous identifiers instead of direct PII
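
Illustrative sketch (non-normative): why deleting raw content does not have to break an append-only evidence chain when evidence holds only references and hashes. The simple SHA-256 hash chain below is an assumption made for this example; the actual integrity chain format is defined by EAS.

```python
import hashlib
import json

GENESIS = "0" * 64


def chain_hash(prev_hash: str, body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_evidence(chain: list, body: dict) -> None:
    """Append an entry linked to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"body": body, "prev": prev, "hash": chain_hash(prev, body)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited entry makes this return False."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != chain_hash(prev, entry["body"]):
            return False
        prev = entry["hash"]
    return True


raw_store = {"msg-1": b"my address is 12 Example Street"}
chain = []
append_evidence(chain, {
    "message_id": "msg-1",
    "content_sha256": hashlib.sha256(raw_store["msg-1"]).hexdigest(),
})

# Privacy deletion removes the raw content (D1) only.
del raw_store["msg-1"]
still_valid = verify_chain(chain)
```

The chain still verifies after the deletion because no evidence entry was modified; only the content it pointed to is gone.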

B. Anonymization Events

If user data is deleted/anonymized, the system SHOULD emit an evidence artifact indicating:

what category was purged
when
under which policy/legal basis
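
Illustrative sketch (non-normative): an anonymization event carrying the three items listed above. The field names and the example policy identifier are hypothetical; the record would be appended to evidence like any other artifact.

```python
from datetime import datetime, timezone
from typing import Optional


def anonymization_event(category: str, policy_id: str, legal_basis: str,
                        when: Optional[datetime] = None) -> dict:
    """Describe a purge without including any of the purged content."""
    if when is None:
        when = datetime.now(timezone.utc)
    return {
        "type": "anonymization",
        "category": category,          # what category was purged
        "purged_at": when.isoformat(), # when
        "policy_id": policy_id,        # under which policy
        "legal_basis": legal_basis,    # under which legal basis
    }


event = anonymization_event(
    "D1",
    "retention-default-v1",            # hypothetical policy identifier
    "data subject erasure request",
    when=datetime(2025, 11, 20, tzinfo=timezone.utc),
)
```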

C. Export

Systems MAY support exporting user-visible operational history. Exports MUST respect redaction policies and MUST NOT expose secrets.

XI. Media and Transcript Handling (CAMN Alignment)

Media storage references MUST be access-controlled and tenant-scoped.
STT transcripts MUST be treated as sensitive derived data.
OoC MUST NOT reveal raw media content unless explicitly authorized.

XII. Compliance Controls (RCMC/CATS Alignment)

Implementations MUST be auditable for:

existence of retention policies per category
redaction applied to OoC outputs
evidence minimalism (no unnecessary raw payload replication)
access logs for privileged views
tenant isolation checks

These checks map to control IDs (defined in RCMC) and are validated by CATS/TVRS where applicable.

XIII. Security Considerations

Over-retention increases breach impact; under-retention breaks auditability.
Derived data can leak more than raw input (summaries/embeddings).
Privileged OoC access must be monitored and rate-limited.
Secrets must never appear in evidence or OoC outputs.

XIV. Conclusion

DRPC defines the privacy spine of WhatsApp-first systems: retention and redaction policies that preserve conversational operability while limiting data exposure and supporting regulatory-aligned principles. By combining evidence minimalism, tenant isolation, and scope-gated observability, DRPC enables WhatsApp-first systems to scale responsibly.

References

[1] RFC-WF-0004, Administrative Command Security Model (ACSM).
[2] RFC-WF-0006, Evidence Artifact Schema (EAS).
[3] RFC-WF-0007, Observability over Conversation (OoC).
[4] RFC-WF-0014, Channel Adapter & Message Normalization (CAMN).
[5] RFC-WF-0015, Policy Packs & Governance Profiles (PPGP).
[6] RFC-WF-0016, Reference Compliance Matrix & Control IDs (RCMC).

Concepts and Technologies

Data minimization, retention schedules, field-level redaction, derived data governance (STT/OCR/embeddings), evidence minimalism, append-only audit compatibility, anonymization events, tenant isolation, privileged access logging.

DEV Community