Ilya Sib

Why Pasting Client Data into ChatGPT is a GDPR Liability (and the Fix)

Published as: Ilya, Founder of PrivacyScrubber — privacyscrubber.com

Every week, I watch legal teams, HR professionals, and developers do something that makes compliance officers lose sleep: they paste client files — contracts, resumes, medical records, support tickets — straight into ChatGPT to summarize, draft, or analyze them.

I get it. ChatGPT is genuinely useful. The problem isn't the AI. The problem is the data that rides along with your prompt.


The Legal Reality Nobody Talks About

Let me be specific. Under GDPR Article 28, if you use an AI assistant to process personal data on behalf of clients or employees, you need a Data Processing Agreement (DPA) with that AI provider. OpenAI offers a DPA — but only for its API and business tiers, not ChatGPT Free — and you still bear the burden of proving lawful processing.

More critically: Article 5(1)(f) requires that personal data be processed with "appropriate security... and protection against unauthorised or unlawful processing." Pasting an unredacted client contract into a third-party AI system is hard to square with that requirement.

This isn't hypothetical. The Italian DPA (Garante) temporarily banned ChatGPT in 2023 specifically over data processing transparency concerns. The EU AI Act, which entered into force in 2024 and phases in obligations through 2026, adds enforcement teeth to AI data processing requirements.

Even if you have a DPA, you're still on the hook for:

  • Ensuring the data you send is appropriate to send at all
  • Logging and auditing what personal data left your environment
  • Honoring data subject rights (erasure, portability) for anything processed in the AI

The moment a client's name, email, or ID lands in someone else's inference pipeline, your compliance posture weakens.


The "Incognito Chat" Illusion

"But I turned on ChatGPT's privacy mode / temporary chat / incognito mode — my data isn't used for training."

True: OpenAI's temporary chat doesn't use your conversation for model training. But that's a different claim from "your data never touches their servers."

Every message you send — regardless of privacy settings — is processed server-side. It travels over the network, sits in RAM during inference, and is handled by OpenAI's infrastructure. For data that falls under GDPR, HIPAA, or SOC 2 requirements, "not used for training" is a much weaker guarantee than "never left the browser."

This distinction matters enormously when your clients ask: "How are you handling our data when you use AI?"


What Zero-Trust Data Sanitization Means in Practice

The approach I've been building toward — I call it Zero-Trust Data Sanitization (ZTDS) — treats every AI session as potentially hostile to your clients' privacy. The rule is simple:

No personal data should leave your device before you send it to an AI model.

This means scrubbing PII from your text before it becomes a prompt. Not after. Not with server-side filters you don't control. Before.

Here's how ZTDS works in practice:

  1. Input: You paste a client contract, support ticket, or HR document
  2. Detect: Every name, email, phone number, and ID is identified via regex (runs locally)
  3. Replace: Detected PII is swapped for tokens — [NAME_1], [EMAIL_1], [PHONE_1]
  4. Send: You paste the sanitized version into ChatGPT — no real data crosses the wire
  5. Reverse: When you get the AI's response, swap the tokens back to the originals (also locally)

The AI never sees the actual name — it sees [NAME_1]. It still understands the structure, intent, and context of your document perfectly. And the token map that decodes [NAME_1] back to the original? It lives only in your browser's session memory, wiped the moment you close the tab.
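The detect/replace/reverse loop above can be sketched in a few lines of JavaScript. This is a minimal illustration, not PrivacyScrubber's actual implementation — the patterns, function names, and token format are my assumptions, and real-world detection needs far broader coverage (names, IDs, addresses):

```javascript
// Illustrative PII patterns only -- production tools need many more.
const PATTERNS = {
  EMAIL: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  PHONE: /\+?\d[\d\s().-]{7,}\d/g,
};

// Replace each PII match with a numbered token and remember the mapping.
// The map stays client-side; only the sanitized text goes to the AI.
function scrub(text) {
  const map = {};       // token -> original value
  const counters = {};  // per-kind counter for [EMAIL_1], [EMAIL_2], ...
  let out = text;
  for (const [kind, re] of Object.entries(PATTERNS)) {
    out = out.replace(re, (match) => {
      counters[kind] = (counters[kind] || 0) + 1;
      const token = `[${kind}_${counters[kind]}]`;
      map[token] = match;
      return token;
    });
  }
  return { sanitized: out, map };
}

// Swap tokens in the AI's response back to the original values (locally).
function restore(text, map) {
  return text.replace(/\[[A-Z]+_\d+\]/g, (token) => map[token] ?? token);
}
```

So `scrub("Contact jane@acme.com or +1 555 010 9999")` yields `"Contact [EMAIL_1] or [PHONE_1]"` plus the token map, and `restore` applied to the AI's reply puts the real values back without them ever crossing the wire.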


The Airplane Mode Test (Your Compliance Proof)

Here's the clearest way to verify whether a privacy tool actually honors zero-trust principles:

Load the page → Disconnect from the internet → Try to use it.

If the tool still works offline: the processing is genuinely local. No data left your device. Your client data never touched a server.

If the tool breaks offline: it's making network calls. Which means your data is traveling somewhere, regardless of what the privacy policy says.

This is the test I require every privacy tool to pass before I recommend it. It's also the test I built PrivacyScrubber to satisfy — you can confirm it yourself, right now, by switching to airplane mode after the page loads.
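You can approximate the same check in code: stub out the network primitives so any call fails loudly, then run the local processing step and confirm it still completes. A sketch — `sanitize` here is a stand-in for whatever function the tool under test exposes, and in a real browser you'd also stub `XMLHttpRequest` and `WebSocket`:

```javascript
// Stand-in for the tool's local scrubbing step (illustrative only).
function sanitize(text) {
  return text.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]");
}

// Programmatic "airplane mode": make fetch() throw, run fn, restore.
// If fn returns normally, it made no fetch() call.
function assertOffline(fn, input) {
  const boom = () => { throw new Error("network call attempted"); };
  const savedFetch = globalThis.fetch;
  globalThis.fetch = boom;            // any fetch() now fails loudly
  try {
    return fn(input);                 // throws if fn touches the network
  } finally {
    globalThis.fetch = savedFetch;    // restore for the rest of the page
  }
}
```

If `assertOffline(sanitize, text)` returns a sanitized string instead of throwing, the scrubbing step genuinely ran without the network — the code-level equivalent of flipping on airplane mode.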


The GDPR Audit Checklist for AI Sessions

If you're using AI tools with client data today, here's a 5-step framework to reduce your exposure:

1. Classify before you paste
Does this document contain personal data (names, contacts, IDs, health info, financial data)? If yes, it needs scrubbing before it enters any AI prompt.

2. Check your DPA coverage
Do you have a valid DPA with every AI vendor whose models process your prompts? ChatGPT Free = no DPA. Claude API = yes if you signed it. Copilot Enterprise = covered under Microsoft's DPA.

3. Verify client-side processing claims
Apply the Airplane Mode test to any tool claiming to process data locally.

4. Log what you send
Even when using scrubbed data, keep a log of session types (not content). "Summarized HR onboarding document, no PII in prompt" is defensible. Mystery AI sessions are not.

5. Honor data subject rights
If a client asks "what data did you process about me using AI?" — can you answer? If you've been sending raw documents, you probably can't.
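The session log in step 4 doesn't need to be elaborate — recording what kind of document was processed, never its content, is enough to answer an auditor. A sketch with an assumed log shape (field names are mine, not from any standard):

```javascript
// Append-only AI session log: metadata only, never document content.
const sessionLog = [];

function logAiSession({ documentType, piiScrubbed, model }) {
  sessionLog.push({
    timestamp: new Date().toISOString(),
    documentType,   // e.g. "HR onboarding document" -- a category, not the text
    piiScrubbed,    // true if the prompt was sanitized before sending
    model,          // e.g. "chatgpt"
  });
}
```

A call like `logAiSession({ documentType: "HR onboarding document", piiScrubbed: true, model: "chatgpt" })` produces exactly the kind of defensible record described above.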


A Request from the Dev Community

This article focuses on ChatGPT because it's ubiquitous, but the same logic applies to Claude, Gemini, Copilot, and every other hosted AI model.

There are real solutions here — both technical (regex scrubbers, local models, differential privacy) and procedural (DPAs, audit logs, data classification workflows). I'm biased toward client-side tools because I've found no other approach that satisfies the Airplane Mode test, but the broader conversation matters.

If you're building in this space — privacy-preserving AI pipelines, local inference, differential privacy for LLMs — I'd genuinely like to hear from you in the comments. And if you're using AI with client data today without a sanitization step, I'd encourage a second look.


Ilya is the founder of PrivacyScrubber.com — a browser-based PII sanitizer for AI workflows that specifically passes the airplane mode test. A deeper technical breakdown for enterprise teams is available in the CISO AI Data Security Guide.
