Paperwork for Paperwork

Posted on Jul 5 • Originally published at paperwork.to

How to mask PII before using AI chat at work

#privacy #fintech #ai #security

People paste work text into AI chat every day. A support agent copies a customer email to draft a reply. A finance analyst pastes a payment instruction to check the wording. A manager pastes a bank-statement extract to reconcile a total before a meeting. Each of these prompts can carry names, card numbers, IBANs, Emirates IDs, and client references that the task never needed.

PII masking removes those values before the prompt is sent. It replaces each sensitive value with a stable placeholder, keeps a private mapping on the user's device, and leaves the amounts, dates, and structure the model needs to do the work. The assistant sees [PERSON_1] and [IBAN_1]; the user keeps the real values.

This guide explains where pasted text goes once it is sent, what to mask, what to leave visible, and how a browser masking workflow runs step by step. It also covers the mistakes that defeat masking, the data protection rules that apply, and the point where a browser extension stops being enough and a document workflow takes over.

Key Takeaways

Mask names, contact details, card and account numbers, IDs, and client references before using AI chat for work email, finance, HR, legal, support, or document review.
Keep the values the task depends on: amounts, dates, categories, document structure, and the question you want answered.
Use stable placeholders such as [PERSON_1], [EMAIL_1], and [CARD_1] so the model can keep track of who is who without knowing identities.
Under default settings, the providers of ChatGPT, Claude, and Gemini can use conversations on consumer plans for model training. Masking limits the exposure regardless of account settings.
Privacy Mask runs the masking step locally in the browser. For whole files, batches, and audit requirements, use a document anonymization workflow.

What happens to text pasted into AI chat

Once a prompt is sent, the text is stored and handled under the provider's terms, and those terms differ by plan. The details matter, because most employees reach for whichever account is already logged in, and the consumer defaults on the three most common assistants all allow the provider to use conversations for model training.

The current policies, from the providers' own documentation:

Provider	Consumer default	Business plans
OpenAI (ChatGPT)	Conversations on Free, Plus, and Pro plans may be used to train models unless the user turns the setting off, per the OpenAI Data Controls FAQ.	Business, Enterprise, and Education data is excluded from training by default.
Anthropic (Claude)	Since 28 September 2025, Free, Pro, and Max accounts default to allowing chats in training, with retention extended to five years while the setting is on, per the Anthropic consumer terms update.	Claude for Work, government, education, and API traffic are excluded.
Google (Gemini)	With Gemini Apps Activity on, which is the default, a subset of conversations may be read by human reviewers and retained for up to three years, per the Gemini Apps Privacy Hub.	Workspace business data is handled under separate terms.

Google's privacy notice states the practical rule plainly: users should not enter confidential information they would not want a reviewer to see. The same caution applies to every assistant in the table, and it describes exactly the text employees paste at work: customer emails, payment details, HR notes.

Each provider offers opt-outs, temporary modes, and enterprise contracts, and a well-run company can standardize on a business plan with training excluded. The gap is that policy cannot see which account an employee is logged into at 6pm on a deadline. Netskope's Cloud and Threat Report 2026 found that organizations detect an average of 223 attempts per month by employees to include sensitive data in genAI prompts or uploads, spanning regulated data, intellectual property, source code, and credentials. The same report found that genAI data-policy violations more than doubled year over year.

The consequences show up in breach data. IBM's Cost of a Data Breach Report 2025 reported that one in five organizations had a breach involving shadow AI, meaning AI tools used outside IT approval. Organizations with high levels of shadow AI saw an average of USD 670,000 in higher breach costs, and shadow AI incidents compromised customer PII in 65% of cases, against a 53% global average. The earliest well-known example predates all of these figures: in May 2023, Bloomberg reported that Samsung banned ChatGPT and similar tools for staff after an engineer uploaded internal source code.

Masking addresses the part of this problem that sits with the individual employee. When sensitive values are replaced before the prompt is sent, the retention policy, the training default, and the account tier all matter less, because the text that leaves the browser no longer contains the identities.

What PII masking means for AI chat

PII masking is a step between the source text and the AI prompt. It detects sensitive values, swaps each one for a placeholder, and keeps a private mapping so the user can restore or look up the original values later. A masked prompt keeps its meaning for the task while withholding the identities.

NIST SP 800-122 treats PII protection as a question of context. The same value can carry different privacy impact depending on where it appears, who can access it, and what sits next to it. A first name alone identifies nobody; a first name next to an employer, a salary figure, and a complaint does. This is why a work prompt should be reviewed as a whole rather than field by field.

For AI chat, masking covers these value types:

Original value type	Masked placeholder	Why it should be masked
Person name	[PERSON_1]	Identifies a customer, employee, applicant, or counterparty.
Work or personal email	[EMAIL_1]	Can identify a person and expose company routing.
Phone number	[PHONE_1]	Direct contact detail.
Payment card number	[CARD_1]	Payment account data should not be pasted into general prompts.
IBAN or bank account	[IBAN_1]	Links a person or company to a financial account.
Emirates ID, passport, tax ID	[ID_1]	Government identity data needs stricter handling.
Client or vendor name	[CLIENT_1]	Can reveal confidential commercial relationships.
Internal account reference	[ACCT_REF_1]	Often enough to identify a case in internal systems.
Address or location	[ADDRESS_1]	Can identify a person or sensitive site.

Placeholders should stay consistent within a prompt. When the same customer appears three times, the masked text should read [PERSON_1] in all three places. The model can then follow the relationships, answer questions about the customer, and produce a reply that maps cleanly back onto the original names.
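
A minimal Python sketch of stable numbering, for illustration only. The class name `PlaceholderMap` and its method are hypothetical and are not taken from Privacy Mask; the only behavior borrowed from the text is that the same value always receives the same placeholder.

```python
class PlaceholderMap:
    """Hypothetical helper: hands out one stable placeholder per distinct value."""

    def __init__(self):
        self._by_value = {}   # (entity_type, normalized value) -> placeholder
        self._counters = {}   # entity_type -> last number issued

    def placeholder_for(self, entity_type, value):
        # Normalize so "Rashid Al Marri" and "rashid al marri" share a placeholder.
        key = (entity_type, value.strip().casefold())
        if key not in self._by_value:
            number = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = number
            self._by_value[key] = f"[{entity_type}_{number}]"
        return self._by_value[key]


mapping = PlaceholderMap()
first = mapping.placeholder_for("PERSON", "Rashid Al Marri")
second = mapping.placeholder_for("PERSON", "Sara")
again = mapping.placeholder_for("PERSON", "rashid al marri")
print(first, second, again)
```

Case-insensitive matching is an assumption of this sketch; a real tool may need looser or stricter matching, for example to treat "R. Al Marri" as the same person.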

How detection works in the browser

Sensitive values fall into two groups, and they are detected differently.

Structured identifiers have formats that can be checked. A payment card number is typically 13 to 19 digits and carries a Luhn check digit, so a candidate number can be validated before it is flagged. An IBAN starts with a two-letter country code and validates under the mod-97 checksum defined in ISO 13616. An Emirates ID is a 15-digit sequence beginning with 784, the UAE country code, followed by the holder's birth year, a serial number, and a check digit. Email addresses and phone numbers follow recognizable patterns. Detection for this group is close to deterministic: the format either validates or it fails.
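
The format checks above can be sketched in a few lines of Python. This is an illustrative sketch, not the extension's detector: the function names are hypothetical, the IBAN pattern is a loose shape check that ignores per-country lengths, and the Emirates ID function checks only the 15-digit, 784-prefixed shape because the check-digit rule is not described here.

```python
import re


def luhn_valid(candidate):
    """Return True if the candidate is 13 to 19 digits and passes the Luhn check."""
    compact = re.sub(r"[\s-]", "", candidate)
    if not re.fullmatch(r"\d{13,19}", compact):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(compact)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(candidate):
    """Return True if the candidate has an IBAN shape and passes the mod-97 check."""
    iban = re.sub(r"\s+", "", candidate).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    # Move the first four characters to the end, then map A=10 ... Z=35.
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(char, 36)) for char in rearranged)
    return int(numeric) % 97 == 1


def looks_like_emirates_id(candidate):
    """Shape check only: 15 digits starting with 784, separators ignored."""
    digits = re.sub(r"[\s-]", "", candidate)
    return re.fullmatch(r"784\d{12}", digits) is not None
```

The sample values in a quick check would be well-known test numbers such as the card number 4111 1111 1111 1111 and the example IBAN GB82 WEST 1234 5698 7654 32, never real customer data.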

Unstructured entities are harder. Person names, company names, street addresses, and project names carry no checksum. Detection relies on capitalization, context words, and known-name lists, and it will sometimes miss a name spelled unusually or flag a product name as a person. This is the technical reason a masking workflow needs a review step. The tool does the mechanical work of finding candidates; the person confirms the judgment calls before anything is sent.
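
A deliberately naive Python sketch shows why this group needs review. It assumes a hypothetical list of cue words and flags any capitalized run that follows one; nothing here reflects how Privacy Mask actually detects names.

```python
import re

# Hypothetical cue words that often precede a person's name in office email.
CUE_PATTERN = re.compile(
    r"\b(?:Hi|Dear|for|from)\s+((?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*)"
)


def name_candidates(text):
    """Return capitalized runs that directly follow a cue word."""
    return CUE_PATTERN.findall(text)


hits = name_candidates("Hi Sara, please confirm the refund for Rashid Al Marri.")
false_positive = name_candidates("We bought a license for Privacy Mask.")
missed = name_candidates("please confirm the refund for rashid al marri")
print(hits, false_positive, missed)
```

The first call finds both names, the second flags a product name as a person, and the third misses a name typed in lowercase. Those are the two failure modes the review step is there to catch.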

What should stay visible

Masking too much causes its own failure. When every number and date is removed, the model has nothing to reason about and the answer comes back generic. The skill is keeping the values the task depends on while removing the ones that identify people and accounts.

Keep visible	Example	Why it helps
Amounts	AED 18,500 or USD 125,750	The model can reason about payment size, thresholds, and wording.
Dates	14 June or Q3 2026	The model can draft timelines and deadlines.
Generic role	customer, vendor, employee, applicant	Keeps the business context clear.
Issue category	refund request, renewal, missing invoice	Helps the model choose the right response.
Document structure	table rows, bullet points, clause order	Preserves the shape of the source material.
Non-sensitive policy rule	approval required above AED 50,000	Lets the model apply internal instructions without seeing identities.

A useful test for each value: could the AI assistant answer the work question without it? A refund amount usually needs to stay, because the reply depends on it. The customer's name usually can go, because [PERSON_1] plays the same role in the sentence. When a kept value would identify the person anyway, such as a salary paired with a job title in a small team, mask it and describe it generically instead.

How to mask a work prompt step by step

A browser masking workflow runs in five steps. The example below follows one fictional refund email through all of them.

Step 1: Copy the source text

The workflow starts with the real work text: an email, a note, a ticket, a contract extract, or a document snippet.

Hi Sara, please confirm the refund of AED 4,200 for Rashid Al Marri, card ending 4821, account REF-88213. You can reach him at rashid.m@example.com before 14 June.

The task here is about wording and next steps. The names, the card fragment, the account reference, and the email address are incidental; the model can help with the message without any of them.

Step 2: Detect sensitive entities locally

The extension scans the text and lists what it found: two person names, an email address, a card fragment, and an internal account reference. With Privacy Mask, detection runs in the browser, so at this point the raw text has not left the device.

Step 3: Review and adjust the mask

Automated detection handles the mechanical work; the review step handles the judgment. A project name may be public in one company and confidential in another. A date may be harmless in an invoice question and identifying in an HR case. The user confirms what stays masked and keeps the values the task needs. In this example, the amount and the deadline stay visible.

Step 4: Paste the masked prompt into AI chat

Hi [PERSON_2], please confirm the refund of AED 4,200 for [PERSON_1], card [CARD_1], account [ACCT_REF_1]. You can reach him at [EMAIL_1] before 14 June.

The assistant can rewrite, translate, or summarize the message as usual. Because the placeholders are stable, the model can tell [PERSON_1] from [PERSON_2] and keep the reply coherent.
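
Steps 2 to 4 can be sketched in Python under one simplifying assumption: the reviewed detections arrive as a list of (type, value) pairs in the order the user confirmed them. The function name `mask_prompt` and the detection list are hypothetical, and the replacement is a plain string substitution rather than the extension's own logic.

```python
def mask_prompt(text, detections):
    """Replace each confirmed value with a stable placeholder.

    Returns the masked text and a private placeholder -> original mapping.
    """
    counters = {}
    placeholder_of = {}
    for entity_type, value in detections:
        if value not in placeholder_of:
            counters[entity_type] = counters.get(entity_type, 0) + 1
            placeholder_of[value] = f"[{entity_type}_{counters[entity_type]}]"
    masked = text
    # Replace longer values first so a short value cannot split a longer one.
    for value in sorted(placeholder_of, key=len, reverse=True):
        masked = masked.replace(value, placeholder_of[value])
    mapping = {placeholder: value for value, placeholder in placeholder_of.items()}
    return masked, mapping


source = (
    "Hi Sara, please confirm the refund of AED 4,200 for Rashid Al Marri, "
    "card ending 4821, account REF-88213. "
    "You can reach him at rashid.m@example.com before 14 June."
)
confirmed = [
    ("PERSON", "Rashid Al Marri"),
    ("PERSON", "Sara"),
    ("CARD", "ending 4821"),
    ("ACCT_REF", "REF-88213"),
    ("EMAIL", "rashid.m@example.com"),
]
masked, mapping = mask_prompt(source, confirmed)
print(masked)
```

The amount and the deadline are absent from the confirmed list, so they pass through untouched. The mapping stays in local memory and is never part of the text that gets pasted.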

Step 5: Reveal the mapping privately

The reveal step connects the AI answer back to the original case. When the assistant returns a better version that still says [PERSON_1] and [EMAIL_1], the user opens the mapping in the browser and sees which customer and contact the reply belongs to. The original values stay on the device throughout; the chat window only ever receives placeholders.
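
A minimal sketch of the reveal step in Python, assuming the private mapping is a simple placeholder-to-value dictionary held locally. The function name `reveal` and the sample answer are hypothetical.

```python
import re

# Matches placeholders such as [PERSON_1] or [ACCT_REF_1].
PLACEHOLDER = re.compile(r"\[[A-Z_]+_\d+\]")


def reveal(answer, mapping):
    """Swap known placeholders back to their original values, locally."""
    # Unknown placeholders are left as they are rather than guessed.
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), answer)


private_mapping = {
    "[PERSON_1]": "Rashid Al Marri",
    "[EMAIL_1]": "rashid.m@example.com",
}
assistant_answer = (
    "Dear [PERSON_1], your refund of AED 4,200 is confirmed. "
    "We will write to [EMAIL_1]. Copy [PERSON_9] if needed."
)
restored = reveal(assistant_answer, private_mapping)
print(restored)
```

Leaving an unrecognized placeholder such as `[PERSON_9]` in place is a design choice of this sketch: it makes a placeholder the model invented visible to the user instead of silently dropping it.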

Office use cases

Customer emails and support tickets

Support and customer-success teams use AI chat to shorten replies, adjust tone, classify complaints, and produce summaries for CRM notes. The raw text carries names, emails, phone numbers, order IDs, addresses, and sometimes attachments.

A masked prompt can still say that [PERSON_1] complained about a delayed invoice, that the amount was AED 4,200, and that the promised response date was 14 June. The assistant drafts the reply without seeing the customer's contact details, and the agent restores the real values when the draft goes back into the CRM.

Finance and payment questions

Finance teams paste payment instructions, invoice notes, bank details, and approval threads into AI tools to check wording or summarize next steps. The masking rules here are strict for a reason: PCI DSS exists to protect payment account data, and a card number inside a general-purpose prompt sits outside every control the standard describes.

Card numbers, bank accounts, IBANs, beneficiary names, account references, and internal payment IDs should be masked. Amounts, currencies, due dates, payment status, and approval thresholds can usually stay, because the answer depends on them.

Bank statements and reports to cross-check

Finance and operations staff paste statement extracts and report tables into AI chat to cross-check totals, explain a discrepancy, or draft a summary for management. The useful signal in this text is numeric: amounts, dates, running balances, row order.

Mask the account holder, the IBAN, the account number, and counterparty names, and keep the figures. The assistant can check whether the rows reconcile without knowing whose account it is. For recurring statement review at team scale, structured bank statement analysis fits better than a chat window, because it extracts, validates, and logs instead of summarizing.
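
The reconciliation itself is plain arithmetic on the values that stay visible. The Python sketch below uses invented rows and an invented opening balance, with counterparties already masked; the function name `first_mismatch` is hypothetical.

```python
from decimal import Decimal


def first_mismatch(opening, rows):
    """Return the index of the first row whose stated balance is wrong, else None."""
    balance = opening
    for index, (_date, _counterparty, amount, stated_balance) in enumerate(rows):
        balance += amount
        if balance != stated_balance:
            return index
    return None


# Invented statement extract: date, masked counterparty, amount, running balance.
opening_balance = Decimal("50000.00")
statement_rows = [
    ("01 Jun", "[CLIENT_1]", Decimal("12000.00"), Decimal("62000.00")),
    ("03 Jun", "[CLIENT_2]", Decimal("-4200.00"), Decimal("57800.00")),
    ("07 Jun", "[CLIENT_1]", Decimal("-18500.00"), Decimal("39300.00")),
]
result = first_mismatch(opening_balance, statement_rows)
print(result)
```

Nothing in the check reads the counterparty column, which is why masking it costs the task nothing. `Decimal` is used instead of floats so that currency amounts compare exactly.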

HR, legal, and internal operations

HR teams ask AI chat to rewrite feedback, summarize interview notes, or draft policy messages. Legal and operations teams summarize contract clauses, dispute notes, and incident timelines.

Names, candidate emails, employee IDs, client names, medical details, complaint parties, matter names, and confidential project names should be masked. The model still helps with structure, tone, issue spotting, and wording, which is what these teams ask it for.

Document snippets and screenshots

Prompts also come from documents: contracts, invoices, identity documents, onboarding PDFs. A browser extension fits when the user handles a small extract and needs a fast prompt.

For whole files, repeated workflows, or regulated review, a document anonymization service is the right tool. It can enforce masking before files reach an LLM gateway, keep audit logs, and re-identify responses under controlled rules.

Common masking mistakes

Masking fails in predictable ways. The table below lists the patterns that show up most often, what each one costs, and the habit that avoids it.

Mistake	What happens	What to do instead
Masking every number and date	The assistant loses the task context and returns a generic answer.	Keep the amounts, dates, and categories the task needs.
Different placeholders for the same person	The model treats one customer as several people and drafts a confused reply.	Use stable numbering: [PERSON_1] every time that person appears.
Masking the text but pasting a raw screenshot	The image carries the original names and numbers past the mask.	Mask text before pasting, and do not attach unredacted screenshots.
Pasting original values back into the chat to check the answer	The leak happens anyway, one message later.	Reveal the mapping in the browser, and keep it out of the chat window.
Using browser masking for whole regulated documents	Coverage gaps, no audit trail, no policy enforcement.	Route files through document anonymization.
Treating masking as a one-time cleanup	Habits drift back to raw copy-paste within weeks.	Put the workflow into the team's AI usage policy.

What masking does not solve

Masking reduces what a prompt exposes. Several risks stay open, and it is worth being clear about them.

Some tasks need the identity. A due-diligence question about a specific company, or a conflict check on a named person, cannot be masked into anonymity and still make sense. Those tasks belong in approved systems with access controls, such as a business due diligence workflow, rather than an ad hoc chat.

Masked text can still be confidential. A contract clause with the parties removed may still reveal strategy or pricing. Masking handles identifiers; the decision about whether the underlying matter may leave the company at all belongs to policy.

Re-identification is possible when context is rich. A prompt describing the CFO of a named client who resigned in June identifies a person without spelling out a name. The review step exists to catch these cases, and no automated detector will catch them all.

The quality of the answer is a separate question. Masking changes what the assistant sees; it does nothing for hallucinated clauses, wrong sums, or bad advice. Normal work checks stay in place.

Data protection rules that apply

Masking work prompts lines up with obligations most GCC and international teams already carry.

In the UAE, Federal Decree-Law No. 45 of 2021 on personal data protection prohibits processing personal data without the owner's consent outside defined exceptions, and requires organizations that hold personal data to secure it and maintain its confidentiality. An employee pasting a customer's details into a consumer chat account is a use of that data nobody reviewed against those obligations.

Under the GDPR, which reaches many UAE firms serving EU residents, Article 5(1)(c) requires personal data to be adequate, relevant, and limited to what is necessary for the purpose. A masked prompt applies that principle directly: the purpose is the work question, and the identities are unnecessary for it.

For payment data, PCI DSS treats the primary account number as data to protect wherever it appears. A chat window is no exception.

None of these rules names AI chat directly. They regulate personal data handling in general terms, and pasted prompts fall inside those terms. A masking habit, combined with a business AI plan where training is excluded, keeps everyday AI use inside the lines the rules draw.

Local processing and business trust

For quick office use, the masking step should happen before the text leaves the browser. Privacy Mask is built around that boundary: it masks sensitive and personal information locally in the browser, before the user sends text to an AI chat assistant. The raw values do not need to travel to a Paperwork server for the masking to work.

Local processing also settles the question a masking service would otherwise raise: who protects the data you send to the tool that protects your data. When detection and replacement run on the device, the data stays where it already was.

Company governance still has decisions to make. Teams choose which AI tools employees may use, which work types are allowed, and what may leave the organization. OpenAI's enterprise privacy page separates business controls, retention, access, and training commitments; Anthropic and Google publish equivalent terms for their business tiers. Masking sits in front of all of this as the employee-side control. It reduces the sensitive material that reaches the chat regardless of which provider and plan the company lands on.

The Privacy Mask extension

Privacy Mask is a Chrome extension by Paperwork that runs this workflow where the copy-paste happens. It installs from the Chrome Web Store and opens as a side panel next to the AI chat tools teams already use: ChatGPT, Claude, Gemini, Grok, Copilot, DeepSeek, and Perplexity.

The panel follows the five steps described above. The user brings in the work text, the extension detects sensitive entities locally and lists them by type, and each type can be switched on or off: person, email, card number, IBAN, account reference. The masked version is then ready to paste into the chat. The private mapping stays in the browser, and the reveal step reads from it when the answer comes back.

The walkthrough below shows the full loop on a work email, from raw text to masked prompt to reveal:

Watch the Privacy Mask walkthrough.

Feature details, supported assistants, and privacy notes are on the Privacy Mask app page.

Rolling out masking across a team

A masking habit spreads when the rules are short and the tool is already in the browser. Teams that roll this out well define the policy before asking employees to change behavior.

Control	Practical decision
Allowed AI surfaces	Which chat tools are approved for work use.
Sensitive entity list	Names, emails, IDs, card data, accounts, addresses, client names, internal references.
Values to keep	Amounts, dates, non-sensitive categories, and policy thresholds needed for the task.
Review step	User confirms the masked prompt before paste.
Reveal rules	Reveal only inside the company environment or approved browser workflow.
Card data policy	Raw cardholder data stays out of AI chat entirely.
Document policy	Whole documents and batch workflows go through anonymization or an approved LLM gateway.
Training	Give employees examples from real workflows, with the company's own text.

Training lands best with the team's own material. A support team learns from a historical, anonymized ticket; a finance team from a payment thread. Ten minutes of walking through what gets masked in a familiar email does more than an abstract privacy deck. For regulated teams in banking, fintech, insurance, real estate, legal, healthcare, accounting, and HR, the checklist belongs inside the AI usage policy, so an employee never has to invent a privacy workflow on a deadline.

How Paperwork fits

Privacy Mask covers the individual browser moment: an employee wants to clean a prompt before sending it to ChatGPT, Claude, Gemini, Grok, Copilot, DeepSeek, or Perplexity. It suits people who handle customer messages, finance notes, HR text, legal snippets, and internal documents during normal work.

Document Anonymization covers the structured business workflow: files, API calls, LLM routing, audit logs, deterministic tokenization, and controlled re-identification. It fits when a team wants one enforced workflow across many users or many documents.

Paperwork also supports the adjacent document-risk work: bank statement analysis, document fraud detection, and document verification for fintech lenders. The common thread is the same across all of them: extract the useful signal from documents while exposing as little sensitive data as the workflow allows.

FAQ

Is it safe to paste customer data into ChatGPT?

Raw customer data does not belong in a general-purpose AI chat. Under default settings, providers can use conversations on consumer accounts for model training, and pasted text is retained under the provider's policies. Mask names, contact details, card data, and account references first, and keep only the task context visible.

Does ChatGPT use what employees paste for training?

On consumer plans, OpenAI states that conversations may be used to improve models unless the user disables the setting. Business, Enterprise, and Education plans are excluded by default. The safer habit is masking before paste, which holds regardless of plan and settings.

Do Claude and Gemini have the same defaults?

Broadly, yes. Anthropic's consumer plans have defaulted to allowing chats in model training since September 2025, with retention extended to five years while the setting is on. Google's Gemini Apps Activity, on by default, allows a subset of conversations to be read by human reviewers and retained for up to three years. Business tiers of both are excluded.

Does PII masking make AI chat safe for work?

It makes AI chat safer for many office tasks, and it works best alongside governance rather than instead of it. Teams still need approved tools, policy, retention rules, access controls, and review for sensitive workflows.

Does Privacy Mask send raw text to a Paperwork server?

No. Privacy Mask is designed to mask sensitive information locally in the browser before the user sends text to an AI assistant.

Which AI assistants does Privacy Mask work with?

The extension targets ChatGPT, Claude, Gemini, Grok, Copilot, DeepSeek, and Perplexity. It is scoped to supported AI chat pages rather than every website, and the same masked-prompt habit works manually on any assistant it does not cover yet.

What is the difference between masking and redaction?

Redaction removes or blacks out information. Masking replaces sensitive values with placeholders that stay consistent across the prompt, so the model can still refer to [PERSON_1] or [CARD_1] and the answer maps back onto the original case.

When should a team use document anonymization instead?

Use document anonymization when the workflow involves whole files, batches, APIs, audit logs, LLM gateways, regulated review, or team-wide enforcement. Browser masking fits fast, individual copy-paste moments.

Top comments (4)

Alex Shev • Jul 5

Masking before the prompt is a good default. The mistake I see is teams treating PII cleanup as a UI reminder instead of a pipeline step that can be logged, tested, and reviewed before any external model call.

Paperwork • Jul 6

Yes, agreed. This is not meant to replace a proper company-level AI governance layer.

For teams, this should eventually be handled as a pipeline step: LLM gateways, interceptors, anonymizers, audit logs, policy checks, and review before any external model call.

The point here is that masking is the first low-friction barrier for privacy-conscious individuals who still paste work text into AI chats every day.

Alex Shev • Jul 6

Yes, that individual-to-pipeline path is the right framing. The lightweight masking step solves the daily behavior problem, while gateways and audit logs solve the organizational control problem. They are different maturity levels of the same boundary.

Mudassir Khan • Jul 9

the 'policy cannot see which account an employee is logged into at 6pm on a deadline' line is where the real problem lives. no masking habit survives deadline pressure.

we tried user side masking with engineering teams and coverage dropped to near zero within a quarter. people mask in the approved tool and paste raw into the next one. the only fix was moving enforcement upstream: gateway tokenization before the payload leaves the internal network.

browser masking has a role for individual workers without a gateway. but for any team handling regulated data, the pipeline step is the actual control.

what's the detection accuracy on unstructured entities like project names and client references? that's the class where coverage degraded first for us.