RFC-WF-0019
Data Retention, Redaction & Privacy over Conversation (DRPC)
Status: Draft Standard
Version: 1.0.0
Date: 20 Nov 2025
Category: Standards Track
Author: FullAgenticStack Initiative
Dependencies: RFC-WF-0004 (ACSM), RFC-WF-0006 (EAS), RFC-WF-0007 (OoC), RFC-WF-0014 (CAMN), RFC-WF-0015 (PPGP), RFC-WF-0016 (RCMC)
License: Open Specification (Public, Royalty-Free)
Abstract
This document specifies Data Retention, Redaction & Privacy over Conversation (DRPC) for WhatsApp-first systems. DRPC defines normative requirements for data minimization, retention schedules, redaction policies, and privacy-safe conversational observability across messages, transcripts, media, evidence artifacts, and operational telemetry. DRPC establishes a governance model that preserves auditability (EAS) while enabling privacy and regulatory compliance (e.g., GDPR/LGPD-aligned principles) in systems operated primarily through WhatsApp.
Index Terms— data retention, redaction, privacy, LGPD, GDPR, conversational systems, evidence artifacts, minimization, media retention.
I. Introduction
WhatsApp-first systems concentrate operational power and sensitive data into a conversational channel: message text, audio, documents, images, and the derived telemetry and evidence records created by execution. Without a clear retention and redaction standard, systems risk either:
- retaining too much (privacy risk, breach blast radius), or
- retaining too little (no auditability, unverifiable compliance).
DRPC defines how to keep systems both auditable and privacy-safe, using policy packs (PPGP) and scope-gated access (ACSM).
II. Scope
DRPC specifies:
- Data categories and classification
- Mandatory retention capabilities and default guidance
- Redaction policies for OoC and evidence exposure
- Media and transcript handling (STT)
- Multi-tenant privacy constraints
- Data export and deletion semantics compatible with append-only evidence
- Evidence minimization patterns (store references/hashes vs raw payloads)
DRPC does not provide legal advice; it defines technical controls that support privacy principles.
III. Normative Language
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY, when capitalized in this document, are normative.
IV. Definitions
Retention Policy: Rules defining how long a data category is stored and under what conditions it is purged or archived.
Redaction: Removal or masking of sensitive fields from outputs.
Minimization: Collecting/storing only what is necessary for operation and audit.
Derived Data: Data produced from raw inputs, e.g., STT transcripts, embeddings, summaries.
Evidence Minimalism: Storing evidence references/hashes rather than raw sensitive payloads where feasible.
V. Data Categories (Normative Classification)
Implementations MUST classify stored data into at least:
- D1 Conversation Content: message text, interactive replies
- D2 Audio Content: audio blobs and STT transcripts
- D3 Media Content: documents, images, attachments, OCR output (if used)
- D4 Operational State: domain data (orders, inventory, CRM records)
- D5 Evidence Artifacts: append-only EAS records and integrity chain
- D6 Telemetry: logs/metrics/traces/events (including OoC query logs)
- D7 Secrets: API keys, tokens, connector credentials (special handling)
Each category MUST have retention and redaction rules.
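The classification above can be checked mechanically. The following is a minimal sketch, assuming rules are kept in plain dictionaries keyed by category; the enum member names and the `categories_missing_rules` helper are illustrative and not defined by DRPC.

```python
from enum import Enum


class DataCategory(Enum):
    """The seven minimum categories (member names are illustrative)."""
    D1_CONVERSATION_CONTENT = "D1"
    D2_AUDIO_CONTENT = "D2"
    D3_MEDIA_CONTENT = "D3"
    D4_OPERATIONAL_STATE = "D4"
    D5_EVIDENCE_ARTIFACTS = "D5"
    D6_TELEMETRY = "D6"
    D7_SECRETS = "D7"


def categories_missing_rules(retention_rules: dict, redaction_rules: dict) -> list:
    """Return the codes of categories lacking a retention or a redaction rule."""
    return sorted(
        category.value
        for category in DataCategory
        if category not in retention_rules or category not in redaction_rules
    )
```

An empty result means every category has both kinds of rule; a deployment could run such a check at startup or in CI.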
VI. Minimization Requirements
A. Store-by-Reference (Recommended Default)
Systems SHOULD avoid duplicating raw message/media payloads across multiple stores. Preferred approach:
- store raw media in a controlled blob store
- store hashes + storage refs in NMEs and evidence
- store transcripts/summaries only if needed
B. Evidence Minimalism (Normative Constraint)
Evidence artifacts MUST NOT require storing full raw sensitive payloads to prove execution. Evidence SHOULD contain:
- command intent and normalized args (redacted as needed)
- affected resources summary
- references to raw inputs via message_id and optional content hash
- policy decisions and outcomes
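As an illustration of evidence minimalism, the sketch below builds an evidence record that points at the raw input by message id and SHA-256 hash instead of embedding it. The field names and the `make_input_ref` and `build_evidence` helpers are assumptions made for this example; the actual evidence schema is defined by EAS, not here.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class InputRef:
    """Reference to a raw input: message id plus an optional content hash."""
    message_id: str
    content_sha256: Optional[str] = None


def make_input_ref(message_id: str, raw_payload: Optional[bytes] = None) -> InputRef:
    """Hash the payload if one is given; the payload itself is never stored."""
    digest = hashlib.sha256(raw_payload).hexdigest() if raw_payload is not None else None
    return InputRef(message_id=message_id, content_sha256=digest)


def build_evidence(intent: str, args: dict, resources: list, input_ref: InputRef,
                   policy_decision: str, outcome: str) -> dict:
    """Assemble an evidence record (hypothetical field names) without raw content."""
    return {
        "intent": intent,
        "args": args,  # caller is expected to pass already-redacted args
        "affected_resources": resources,
        "input_ref": asdict(input_ref),
        "policy_decision": policy_decision,
        "outcome": outcome,
    }
```

Note that a plain hash of short or predictable content can be guessed by brute force, so the hash is best treated as an integrity aid rather than as an anonymization measure.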
C. Derived Data Control
Derived data (STT transcripts, OCR, embeddings) MUST be:
- labeled with provenance (source message id)
- governed by retention policies
- redacted in OoC outputs by default unless privileged
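One way to satisfy these three requirements is to carry provenance and the governing category on every derived record and to redact content at render time. This is a sketch under assumptions: `DerivedRecord`, `render_for_ooc`, and the `[REDACTED]` placeholder are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedRecord:
    kind: str               # e.g. "stt_transcript", "ocr", "embedding"
    source_message_id: str  # provenance: the message this was derived from
    category: str           # data category whose retention policy governs it
    content: str


def render_for_ooc(record: DerivedRecord, privileged: bool = False) -> dict:
    """Return an OoC view; content is redacted unless the caller is privileged."""
    return {
        "kind": record.kind,
        "source_message_id": record.source_message_id,
        "category": record.category,
        "content": record.content if privileged else "[REDACTED]",
    }
```

The `privileged` flag stands in for whatever scope check the deployment performs; it is not itself an authorization mechanism.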
VII. Retention Policy Model
A. Mandatory Support
Implementations MUST support policy configuration per data category:
- retention duration
- deletion method (purge vs archive)
- legal hold / incident hold (optional but recommended)
- tenant overrides (where allowed)
Retention policies SHOULD be distributed via PPGP.
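A per-category policy with the options listed above might be represented as follows. This is a minimal sketch: the `RetentionPolicy` fields, the `effective_policy` helper, and the choice to let a tenant override only the duration are assumptions for illustration, not part of PPGP.

```python
from dataclasses import dataclass, replace
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    category: str                  # e.g. "D1"
    duration: timedelta
    deletion_method: str           # "purge" or "archive"
    legal_hold_supported: bool = True
    tenant_override_allowed: bool = False

    def __post_init__(self):
        if self.deletion_method not in ("purge", "archive"):
            raise ValueError(f"unknown deletion method: {self.deletion_method}")


def effective_policy(base: RetentionPolicy,
                     tenant_duration: Optional[timedelta]) -> RetentionPolicy:
    """Apply a tenant's duration override only where the base policy allows it."""
    if tenant_duration is None or not base.tenant_override_allowed:
        return base
    return replace(base, duration=tenant_duration)
```

Keeping the policy immutable and deriving the effective policy per tenant makes it straightforward to log which policy version governed a given purge.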
B. Default Guidance (Non-Normative)
Typical defaults (illustrative; policy-defined):
- D1 conversation text: 90–180 days
- D2 raw audio: 7–30 days; transcript 30–90 days
- D3 documents/images: 7–90 days (depending on business need)
- D5 evidence: longest retention (audit), possibly years
- D6 telemetry: shorter hot retention + longer aggregated retention
C. Purge Semantics
When retention expires, the system MUST:
- purge or archive per policy
- preserve evidence integrity semantics (Section X)
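The expiry decision itself can be sketched as a pure function. The example assumes a hold flag that suspends purging and a clock passed in by the caller; `retention_action` is a hypothetical name, and the sketch only decides the action without touching evidence records.

```python
from datetime import datetime, timedelta


def retention_action(stored_at: datetime, duration: timedelta, deletion_method: str,
                     on_hold: bool, now: datetime) -> str:
    """Return "retain", "purge", or "archive" for one stored item."""
    if on_hold:
        return "retain"          # a legal or incident hold suspends expiry
    if now < stored_at + duration:
        return "retain"          # retention window has not elapsed yet
    return deletion_method       # expired: act according to the policy
```

Passing `now` explicitly keeps the function deterministic, which makes retention behavior easy to test.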
VIII. Redaction Rules for Conversational Outputs (OoC)
A. Default Redaction
OoC MUST apply redaction by default for:
- PII fields (masking)
- secrets (drop)
- internal topology identifiers (drop)
- full document contents (not displayed unless privileged)
B. Privileged Detail Levels
If higher-detail OoC views are supported:
- they MUST be scope-gated (ACSM)
- they SHOULD require step-up for sensitive data
- they MUST log access as evidence/telemetry
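A gate for higher-detail views could look like the sketch below. It assumes scopes are plain strings and that a step-up result is available as a boolean; the scope name used in the usage note, the `authorize_detail_view` helper, and the log entry fields are all hypothetical. The sketch treats step-up as required for sensitive data, which is stricter than the SHOULD above.

```python
from datetime import datetime, timezone


def authorize_detail_view(actor: str, scopes: set, required_scope: str,
                          sensitive: bool, stepped_up: bool, access_log: list) -> bool:
    """Decide whether a privileged OoC view is allowed and log the attempt."""
    has_scope = required_scope in scopes
    step_up_ok = stepped_up or not sensitive
    allowed = has_scope and step_up_ok
    # Every attempt is logged, including denied ones.
    access_log.append({
        "actor": actor,
        "required_scope": required_scope,
        "sensitive": sensitive,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return allowed
```

For example, an operator holding a scope such as `ooc.detail` would be allowed a non-sensitive detail view directly, but would be denied a sensitive one until step-up succeeds.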
C. Structured Redaction
Redaction policies MUST be structured and testable (e.g., field-level rules), and SHOULD be bound via PPGP.
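Field-level rules are easy to test when they are data rather than code. The following sketch assumes a flat record and a rule table mapping field names to `"mask"` or `"drop"`; the field names, the `mask` and `redact` helpers, and the four-character visible suffix are illustrative choices.

```python
def mask(value: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters; mask fully if too short."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Hypothetical rule table: field name -> action.
DEFAULT_RULES = {
    "phone": "mask",          # PII: masked
    "email": "mask",          # PII: masked
    "api_key": "drop",        # secret: dropped
    "node_id": "drop",        # internal topology identifier: dropped
    "document_text": "drop",  # full document contents: not displayed
}


def redact(record: dict, rules: dict) -> dict:
    """Apply field-level rules; fields without a rule pass through unchanged."""
    out = {}
    for key, value in record.items():
        action = rules.get(key, "keep")
        if action == "drop":
            continue
        out[key] = mask(str(value)) if action == "mask" else value
    return out
```

A stricter variant would drop unknown fields instead of passing them through, so that newly added fields are hidden until a rule is written for them.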
IX. Multi-Tenant Privacy Constraints
Implementations MUST guarantee:
- strict tenant isolation for all categories
- OoC queries cannot enumerate other tenants’ data
- evidence queries enforce tenant scoping and authorization
Cross-tenant leakage MUST be treated as a critical compliance failure.
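One pattern that supports these guarantees is to take the tenant from the authenticated caller context only, never from a query parameter, and to apply the tenant filter before any other predicate. The sketch below assumes an in-memory list of records with a `tenant_id` field; `query_evidence` is a hypothetical helper.

```python
from typing import Callable, Optional


def query_evidence(store: list, caller_tenant_id: str,
                   predicate: Optional[Callable[[dict], bool]] = None) -> list:
    """Return only the caller's records; the tenant filter always runs first."""
    scoped = [r for r in store if r.get("tenant_id") == caller_tenant_id]
    if predicate is None:
        return scoped
    return [r for r in scoped if predicate(r)]
```

Because the caller cannot name another tenant, a query has no way to enumerate other tenants' data, whatever predicate it supplies.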
X. Deletion, Export, and “Right-to-Delete” Compatibility with Evidence
A. Append-only Evidence Constraint
Evidence artifacts are append-only; therefore “deleting history” is not compatible with audit integrity.
To reconcile privacy deletion with evidence:
- systems SHOULD delete or anonymize raw content (D1–D3) while retaining minimal evidence (D5)
- evidence SHOULD store references/hashes, not raw content
- when required, evidence MAY store pseudonymous identifiers instead of direct PII
B. Anonymization Events
If user data is deleted/anonymized, the system SHOULD emit an evidence artifact indicating:
- what category was purged
- when
- under which policy/legal basis
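The sketch below shows one possible shape for such an artifact, together with a keyed pseudonym so the record does not carry direct PII. The field names, the `pseudonymize` and `anonymization_event` helpers, and the use of HMAC-SHA-256 for pseudonyms are assumptions for illustration; the real artifact layout is governed by EAS.

```python
import hashlib
import hmac
from datetime import datetime


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym with HMAC-SHA-256 (the key must stay secret)."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymization_event(category: str, subject_pseudonym: str, policy_id: str,
                        legal_basis: str, purged_at: datetime) -> dict:
    """Build an evidence record stating what was purged, when, and why."""
    return {
        "type": "anonymization",
        "category": category,         # what category was purged
        "subject": subject_pseudonym,  # pseudonym, not the raw identifier
        "purged_at": purged_at.isoformat(),
        "policy_id": policy_id,        # under which policy
        "legal_basis": legal_basis,    # under which legal basis
    }
```

A keyed pseudonym remains linkable for anyone who holds the key, so key custody and rotation would need to be covered by the same retention and access policies.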
C. Export
Systems MAY support exporting user-visible operational history. Exports MUST respect redaction policies and MUST NOT expose secrets.
XI. Media and Transcript Handling (CAMN Alignment)
- Media storage references MUST be access-controlled and tenant-scoped.
- STT transcripts MUST be treated as sensitive derived data.
- OoC MUST NOT reveal raw media content unless explicitly authorized.
XII. Compliance Controls (RCMC/CATS Alignment)
Implementations MUST be auditable for:
- existence of retention policies per category
- redaction applied to OoC outputs
- evidence minimalism (no unnecessary raw payload replication)
- access logs for privileged views
- tenant isolation checks
These map to control IDs (defined in RCMC) and are validated by CATS/TVRS where applicable.
XIII. Security Considerations
- Over-retention increases breach impact; under-retention breaks auditability.
- Derived data can leak more than raw input (summaries/embeddings).
- Privileged OoC access must be monitored and rate-limited.
- Secrets must never appear in evidence or OoC outputs.
XIV. Conclusion
DRPC defines the privacy spine of WhatsApp-first systems: retention and redaction policies that preserve conversational operability while limiting data exposure and supporting regulatory-aligned principles. By combining evidence minimalism, tenant isolation, and scope-gated observability, DRPC enables WhatsApp-first systems to scale responsibly.
References
[1] RFC-WF-0004, Administrative Command Security Model (ACSM).
[2] RFC-WF-0006, Evidence Artifact Schema (EAS).
[3] RFC-WF-0007, Observability over Conversation (OoC).
[4] RFC-WF-0014, Channel Adapter & Message Normalization (CAMN).
[5] RFC-WF-0015, Policy Packs & Governance Profiles (PPGP).
[6] RFC-WF-0016, Reference Compliance Matrix & Control IDs (RCMC).
Concepts and Technologies
Data minimization, retention schedules, field-level redaction, derived data governance (STT/OCR/embeddings), evidence minimalism, append-only audit compatibility, anonymization events, tenant isolation, privileged access logging.