DEV Community

Victor Okefie

The Illusion of Data Custody in Legal AI — and the Architecture I Built to Replace It

**There is a moment every legal AI founder eventually has to confront.

You have built a capable system. The retrieval is good. The citations hold up. The interface is clean. A lawyer uploads a sensitive client document and asks a question. The system answers correctly.

Then they ask: what happens to this document when I delete it?
And that is where most legal AI products fail quietly.

Not because the founders were careless. Because they treated data custody as a policy question rather than an architecture question. They added a delete button, wrote a privacy policy, and moved on.

This article is about what I built instead — and why the distinction between a deletion confirmation and a cryptographic Destruction Receipt matters enormously in legal contexts.**

SECTION 1: What actually happens when you click delete

Most AI SaaS platforms handle deletion at the application layer. The record is flagged as deleted. The UI stops showing it. The underlying data — the vector embeddings, the chunked source text, the inference logs — frequently persists on the server for operational or safety-monitoring reasons.
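To make the gap concrete, here is a toy in-memory model of application-layer deletion. It is a sketch only: the `Store` shape and the function names are hypothetical and are not taken from any particular product. It shows a record being flagged and hidden while the chunk and embedding layers stay exactly where they were.

```typescript
// Toy in-memory model of application-layer ("soft") deletion.
// All names here are hypothetical and exist only for illustration.
interface DocumentRecord {
  id: string;
  deletedAt: string | null; // set on "delete"; the row itself remains
}

interface Store {
  documents: DocumentRecord[];
  chunks: { documentId: string; text: string }[];
  embeddings: { documentId: string; vector: number[] }[];
}

// What a typical delete button does: flag the record and nothing else.
function softDelete(store: Store, documentId: string): void {
  const doc = store.documents.find((d) => d.id === documentId);
  if (doc) doc.deletedAt = new Date().toISOString();
}

// What the UI shows: only records that are not flagged.
function visibleDocuments(store: Store): DocumentRecord[] {
  return store.documents.filter((d) => d.deletedAt === null);
}

// What is still stored for a given document, counted per layer.
function residualData(store: Store, documentId: string): { chunks: number; embeddings: number } {
  return {
    chunks: store.chunks.filter((c) => c.documentId === documentId).length,
    embeddings: store.embeddings.filter((e) => e.documentId === documentId).length,
  };
}

const store: Store = {
  documents: [{ id: "doc-1", deletedAt: null }],
  chunks: [{ documentId: "doc-1", text: "privileged text" }],
  embeddings: [{ documentId: "doc-1", vector: [0.1, 0.2] }],
};

softDelete(store, "doc-1");
console.log(visibleDocuments(store).length); // 0: the user no longer sees it
console.log(residualData(store, "doc-1")); // { chunks: 1, embeddings: 1 }
```

The user's view is empty, yet every derived layer still holds the document's content.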

OpenAI's standard API retains inference logs for up to 30 days by default. This is not a secret. It is documented. It is reasonable for consumer applications. It is architecturally incompatible with a system holding M&A filings, privileged client documents, or regulatory correspondence.
The problem is not malicious intent. The problem is that "deletion" in these systems was never designed to mean what a lawyer means by deletion.

A lawyer means: gone. Provably gone. Gone in a way I can demonstrate to a regulator if asked.
A standard SaaS confirmation means: removed from your view.
These are not the same thing.

SECTION 2: RLS isolation — enforcing security where it cannot be bypassed

Row Level Security is a PostgreSQL feature that enforces access control at the database layer — below the application entirely.

Most applications enforce access control in the application layer. A user logs in, the application checks their permissions, and the query is run. The problem with this model is that if the application layer is compromised — a bug, a misconfiguration, a session handling error — the isolation fails. The underlying database is a single shared resource.
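With RLS, the isolation is enforced by the database itself. Every query is filtered automatically based on the authenticated user's identity. An application query that forgets its ownership filter still cannot return another user's rows, because the restriction does not live in the application. The one caveat is the connection role: superusers, roles with the BYPASSRLS attribute, and (unless FORCE ROW LEVEL SECURITY is set) the table owner are not subject to policies, so user-facing queries must not run under any of them.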

```sql
-- Enable RLS on the documents table
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Policy: users can only access their own documents.
-- auth.uid() is the Supabase helper that returns the authenticated user's ID.
CREATE POLICY "Users can only access their own documents"
ON documents
FOR ALL
USING (auth.uid() = user_id);
```
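The effect of a `USING` predicate can be illustrated with a toy model in TypeScript. This is a sketch of the idea, not of how PostgreSQL is implemented, and `createPolicyEnforcedTable` is a hypothetical helper invented for the example. The point it makes is that the only read path applies the policy, so the caller cannot forget or skip the ownership filter.

```typescript
// Toy model of a row-level policy: the "table" applies the predicate on every
// read, so callers cannot skip the ownership check. Illustration only.
interface Row {
  id: string;
  user_id: string;
  title: string;
}

type Policy = (row: Row, currentUserId: string) => boolean;

function createPolicyEnforcedTable(rows: Row[], policy: Policy) {
  return {
    // The only read path: the policy runs before any caller-supplied filter.
    select(currentUserId: string, where: (row: Row) => boolean = () => true): Row[] {
      return rows.filter((r) => policy(r, currentUserId) && where(r));
    },
  };
}

// Mirrors USING (auth.uid() = user_id).
const ownerOnly: Policy = (row, uid) => row.user_id === uid;

const documents = createPolicyEnforcedTable(
  [
    { id: "d1", user_id: "alice", title: "Merger draft" },
    { id: "d2", user_id: "bob", title: "Board minutes" },
  ],
  ownerOnly
);

// A query with no ownership filter at all still sees only Alice's row.
console.log(documents.select("alice").map((r) => r.id)); // [ 'd1' ]

// Asking for Bob's row by ID while authenticated as Alice returns nothing.
console.log(documents.select("alice", (r) => r.id === "d2").length); // 0
```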

SECTION 3: Zero Data Retention via Azure OpenAI enterprise infrastructure

The standard OpenAI API retains inference data for up to 30 days for abuse monitoring, even though API data is not used for model training by default. This is the infrastructure most legal AI tools are built on.

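Azure OpenAI, Microsoft's enterprise offering, operates under a different contractual model. Prompts and completions are not used for model training and are not shared with OpenAI. By default, Azure's abuse monitoring can still store prompts and completions for a limited period; approved customers can have it modified so that content is processed and discarded rather than stored. Zero data retention in the strict sense depends on that modified configuration being in place.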

This is not a policy distinction. It is a contractual and architectural distinction. Microsoft's enterprise terms make commitments that a privacy policy does not.

The migration in PRISM involved building an abstraction layer that routes inference through Azure while keeping the interface and API calls identical. The user experience is unchanged. What changed is what the infrastructure underneath actually guarantees.

```typescript
import { AzureOpenAI } from 'openai';

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  apiKey: process.env.AZURE_OPENAI_API_KEY!,
  apiVersion: '2024-08-01-preview',
  deployment: process.env.AZURE_OPENAI_DEPLOYMENT!,
});
```

The deployment name points to a model instance running under enterprise data handling terms. The rest of the codebase does not change.
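One way such an abstraction layer could look is sketched below. PRISM's actual layer is not shown in this article, so the `InferenceProvider` interface, `createStubProvider`, and `askAboutDocument` are all hypothetical names. A stub stands in for the Azure-backed implementation so the example runs without credentials; a real provider would call the configured client inside `complete`.

```typescript
// Hypothetical abstraction: application code depends on this interface only.
interface InferenceProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Stand-in for a provider backed by a real client. It echoes the prompt so
// the example is runnable offline; it performs no network call.
function createStubProvider(name: string): InferenceProvider {
  return {
    name,
    async complete(prompt: string): Promise<string> {
      return `[${name}] ${prompt}`;
    },
  };
}

// Application code: unaware of which backend is in use.
async function askAboutDocument(
  provider: InferenceProvider,
  question: string
): Promise<string> {
  return provider.complete(question);
}

// Swapping the backend is a one-line change where the provider is constructed.
const provider: InferenceProvider = createStubProvider("azure-enterprise");
askAboutDocument(provider, "Summarise clause 4.2").then((answer) =>
  console.log(answer) // [azure-enterprise] Summarise clause 4.2
);
```

Because callers only see `InferenceProvider`, the data-handling guarantees can change underneath without touching retrieval or UI code.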

SECTION 4: The Atomic Purge — destroying all layers simultaneously

A document in PRISM exists across multiple data layers: the original PDF, the extracted text chunks, the vector embeddings used for retrieval, and the associated chat history. Standard deletion in most systems touches one or two of these layers. The others linger.

The Atomic Purge executes a single database transaction that destroys all layers simultaneously. Either everything is deleted, or nothing is. There is no partial deletion state.

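```typescript
// `supabase` is assumed to be an already-initialised Supabase client.
async function atomicPurge(documentId: string, userId: string) {
  // A single RPC call, so every deletion runs inside one database transaction.
  const { error } = await supabase.rpc('atomic_document_purge', {
    p_document_id: documentId,
    p_user_id: userId
  });

  if (error) throw new Error(`Purge failed: ${error.message}`);
  return generateDestructionReceipt(documentId, userId);
}
```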

The stored procedure handles deletion across all tables in sequence within a single transaction. If any step fails, the entire operation rolls back. Nothing is half-deleted.
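The stored procedure itself is not reproduced here. As a rough illustration of the all-or-nothing semantics the database provides, the sketch below models the four layers as in-memory maps and restores a snapshot if any step fails. The `Layers` type, `purgeAllLayers`, and the `failAt` switch are hypothetical and exist only to simulate a mid-purge failure.

```typescript
// Toy all-or-nothing purge across in-memory "layers". Names are hypothetical.
type Layers = {
  files: Map<string, string>;
  chunks: Map<string, string[]>;
  embeddings: Map<string, number[][]>;
  chats: Map<string, string[]>;
};

function purgeAllLayers(layers: Layers, documentId: string, failAt?: keyof Layers): void {
  // Snapshot every layer so a failure can restore the exact prior state.
  const snapshot: Layers = {
    files: new Map(layers.files),
    chunks: new Map(layers.chunks),
    embeddings: new Map(layers.embeddings),
    chats: new Map(layers.chats),
  };
  try {
    for (const key of ["files", "chunks", "embeddings", "chats"] as const) {
      if (key === failAt) throw new Error(`simulated failure in ${key}`);
      layers[key].delete(documentId);
    }
  } catch (err) {
    // Roll back: put every layer back as it was, then surface the error.
    layers.files = snapshot.files;
    layers.chunks = snapshot.chunks;
    layers.embeddings = snapshot.embeddings;
    layers.chats = snapshot.chats;
    throw err;
  }
}

const layers: Layers = {
  files: new Map([["doc-1", "contract.pdf"]]),
  chunks: new Map([["doc-1", ["clause 1", "clause 2"]]]),
  embeddings: new Map([["doc-1", [[0.1, 0.2]]]]),
  chats: new Map([["doc-1", ["What does clause 2 say?"]]]),
};

try {
  purgeAllLayers(layers, "doc-1", "embeddings"); // fails partway through
} catch {
  // The failed purge was rolled back.
}
console.log(layers.files.has("doc-1")); // true: nothing was half-deleted

purgeAllLayers(layers, "doc-1"); // succeeds
console.log(layers.chats.has("doc-1")); // false: every layer is gone
```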

SECTION 5: The Destruction Receipt — generating a verifiable audit artifact

After the purge completes, PRISM generates a Destruction Receipt: a SHA-256 hash computed over the document ID, the user ID, and the deletion timestamp, packaged as a verifiable PDF artifact.

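```typescript
import * as crypto from 'node:crypto';

// Receipt shape inferred from the returned object; the type name is an assumption.
interface DestructionReceipt {
  documentId: string;
  deletionTimestamp: string;
  sha256Hash: string;
  verified: boolean;
  receiptId: string;
}

async function generateDestructionReceipt(
  documentId: string,
  userId: string
): Promise<DestructionReceipt> {
  const timestamp = new Date().toISOString();
  // Hash the identifying fields together with the deletion time.
  const hash = crypto
    .createHash('sha256')
    .update(`${documentId}:${userId}:${timestamp}`)
    .digest('hex');

  return {
    documentId,
    deletionTimestamp: timestamp,
    sha256Hash: hash,
    verified: true,
    receiptId: `DR-${hash.substring(0, 16).toUpperCase()}`
  };
}
```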

The receipt can be independently verified. Given the document ID, user ID, and timestamp, anyone can recompute the hash and confirm the receipt is authentic. This is not a confirmation email. It is an auditable artifact.
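A verifier along these lines might look like the sketch below. `ReceiptClaim`, `receiptHash`, and `verifyDestructionReceipt` are hypothetical names, and the user ID is passed in separately because the receipt object does not carry it. Since the hash is unkeyed, a match shows that the fields and the hash agree with each other; it does not by itself show who issued the receipt.

```typescript
import { createHash } from "node:crypto";

// The fields a verifier needs. Hypothetical shape for this example.
interface ReceiptClaim {
  documentId: string;
  userId: string;
  deletionTimestamp: string;
  sha256Hash: string;
}

// Recompute the hash over the same "documentId:userId:timestamp" string
// that the receipt generator hashes.
function receiptHash(documentId: string, userId: string, deletionTimestamp: string): string {
  return createHash("sha256")
    .update(`${documentId}:${userId}:${deletionTimestamp}`)
    .digest("hex");
}

// True when the claimed fields reproduce the recorded hash.
function verifyDestructionReceipt(claim: ReceiptClaim): boolean {
  const expected = receiptHash(claim.documentId, claim.userId, claim.deletionTimestamp);
  return expected === claim.sha256Hash;
}

// Example values are made up for illustration.
const timestamp = "2025-01-01T00:00:00.000Z";
const claim: ReceiptClaim = {
  documentId: "doc-1",
  userId: "user-1",
  deletionTimestamp: timestamp,
  sha256Hash: receiptHash("doc-1", "user-1", timestamp),
};

console.log(verifyDestructionReceipt(claim)); // true

// Changing any field without recomputing the hash breaks verification.
console.log(
  verifyDestructionReceipt({ ...claim, deletionTimestamp: "2025-01-02T00:00:00.000Z" })
); // false
```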

In a legal context, the difference matters. A confirmation email proves that a button was clicked. A cryptographic receipt ties a specific document, a specific user, and a specific moment of destruction into a single value that can be recomputed and checked, so a later change to any of those fields no longer matches the recorded hash.

**Data custody is not a layer you add to a legal AI product.

It is the foundation you build the product on.

The distinction between a deletion confirmation and a Destruction Receipt seems small in a demo. In a regulatory audit, in a client data incident, in a courtroom — it is not small at all.

Build the receipt before anyone asks for it.
That is what it means to build Left of Bang.**

PRISM v1.1 is live at prism-mu-one.vercel.app
