Here is a question that stops most AI projects cold once someone in compliance asks it: "When a customer types their account number into your agent, where does it go?" In a lot of hastily built systems, the honest answer is "into a third-party model's logs, into our own debug traces, into a vector database, and into three other places nobody mapped." For a fintech or healthtech business, that is not a bug. It is a reportable incident waiting to happen.
Agents are unusually good at leaking data because they touch so much of it - user inputs, retrieved documents, tool outputs, conversation history - and they pass it all around in plain text. If you handle regulated data under DPDP, GDPR, HIPAA or your sector's equivalent, you cannot treat privacy as a feature to add later. Here is the checklist we run before any agent goes live.
1. Know exactly what leaves your boundary
The moment you call a hosted model API, data crosses from your environment into someone else's. Map that flow explicitly. What fields are in the prompt? Does the provider retain inputs? For how long, and for what purpose? Many providers offer zero-retention or no-training tiers for exactly this reason - use them, and get it in the contract. If the data is too sensitive to leave at all, that is a real architectural signal: you may need a model that runs inside your own environment. Decide this deliberately, not by accident.
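One way to make that mapping concrete is an explicit allowlist of the fields permitted to leave your environment. The sketch below is a minimal illustration, not a prescribed implementation: the field names and the `build_prompt_payload` helper are hypothetical, and a real system would record the withheld list in an audit trail rather than just returning it.

```python
# Hypothetical allowlist: the only fields permitted to cross the boundary.
ALLOWED_PROMPT_FIELDS = {"ticket_id", "product", "issue_summary"}


def build_prompt_payload(record: dict) -> tuple[dict, list[str]]:
    """Split a record into what may be sent and the names of what was withheld."""
    outbound = {k: v for k, v in record.items() if k in ALLOWED_PROMPT_FIELDS}
    withheld = sorted(k for k in record if k not in ALLOWED_PROMPT_FIELDS)
    return outbound, withheld


if __name__ == "__main__":
    record = {
        "ticket_id": "T-1001",
        "product": "savings",
        "issue_summary": "Cannot see latest statement",
        "account_number": "000123456789",
        "full_name": "Example Customer",
    }
    outbound, withheld = build_prompt_payload(record)
    print("sent to provider:", sorted(outbound))
    print("kept inside boundary:", withheld)
```

An allowlist fails closed: a field someone adds to the record next quarter stays inside the boundary until it is explicitly approved.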
2. Redact before you send, not after
The cheapest privacy win is to never send the sensitive value in the first place. Before a prompt leaves your system, strip or tokenize what the model does not actually need - account numbers, full names, national IDs, medical record numbers. The agent can reason about "the customer's checking account" without ever seeing the account number itself. Where it needs to act on the real value, your tool layer holds that value and substitutes a token in the text the model sees. The model orchestrates; it does not need to memorize the secrets.
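As a rough sketch of that token substitution, assuming account numbers appear as runs of 10 to 16 digits (a simplification; real formats vary by institution and country): the `TokenVault` class and the `<ACCT_...>` token format are hypothetical names chosen for illustration.

```python
import re
import secrets

# Assumption: account numbers look like 10-16 consecutive digits.
ACCOUNT_RE = re.compile(r"\b\d{10,16}\b")


class TokenVault:
    """Holds real values inside your boundary; the model only ever sees tokens."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, text: str) -> str:
        """Replace each account number with a stable opaque token."""

        def _swap(match):
            value = match.group(0)
            token = self._by_value.get(value)
            if token is None:
                token = f"<ACCT_{secrets.token_hex(4)}>"
                self._by_token[token] = value
                self._by_value[value] = token
            return token

        return ACCOUNT_RE.sub(_swap, text)

    def resolve(self, token: str) -> str:
        """Called by the tool layer only, never by the model."""
        return self._by_token[token]


if __name__ == "__main__":
    vault = TokenVault()
    safe = vault.tokenize("Transfer 50 from 1234567890123456 to savings")
    print(safe)  # the digits are replaced by an <ACCT_...> token
```

The same value always maps to the same token within a vault, so the model can still tell that two mentions refer to one account.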
3. Your logs are a data store - govern them like one
This is the leak almost everyone misses. To debug an agent you log prompts, responses, and tool calls. Those logs are now full of customer data, sitting in a system that often has looser access controls than your production database and a much longer retention period. We have seen more privacy exposure come from verbose debug logs than from the model calls themselves. Redact at the logging layer, restrict who can read agent logs, and set an aggressive retention policy. A log you deleted cannot be breached.
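Redacting at the logging layer can be sketched with a filter from Python's standard `logging` module. The two patterns below are illustrative assumptions, not a complete PII detector, and `make_agent_logger` is a hypothetical helper name.

```python
import io
import logging
import re

# Illustrative patterns only; a production list would be broader and tested.
PATTERNS = [
    (re.compile(r"\b\d{10,16}\b"), "[REDACTED_NUMBER]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before any handler writes it."""

    def filter(self, record):
        message = record.getMessage()  # applies %-style args first
        for pattern, replacement in PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg = message
        record.args = None  # args are already merged into the message
        return True


def make_agent_logger(stream):
    """Build a logger whose only handler redacts before writing."""
    logger = logging.getLogger("agent")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep raw records away from root handlers
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactingFilter())
    logger.handlers = [handler]
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_agent_logger(buf)
    log.debug("prompt for %s about account %s", "jane@example.com", "1234567890")
    print(buf.getvalue().strip())
```

Putting the filter on the handler means redaction happens once, at the point of writing, instead of relying on every call site to remember it.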
4. Treat the vector database as regulated data
If you embed customer documents for retrieval, those embeddings and the source text live in a vector store - which is, for compliance purposes, a copy of the original data. It needs the same encryption, the same access controls, and crucially the same deletion path. When a customer exercises their right to be forgotten, can you actually remove their data from the vector index, not just the primary database? If you cannot answer yes, you have a gap a regulator will find.
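A deletion path that covers the index might look like the sketch below. The `VectorStore` class is an in-memory stand-in, not the API of any real vector database, and `erase_customer` is a hypothetical function. The one design point it illustrates is storing a customer identifier alongside every chunk so that erasure can find them all.

```python
from dataclasses import dataclass, field


@dataclass
class VectorStore:
    """In-memory stand-in for a vector index (hypothetical, for illustration)."""

    # chunk_id -> (customer_id, embedding, source_text)
    rows: dict = field(default_factory=dict)

    def upsert(self, chunk_id, customer_id, embedding, text):
        self.rows[chunk_id] = (customer_id, embedding, text)

    def delete_by_customer(self, customer_id) -> int:
        doomed = [cid for cid, row in self.rows.items() if row[0] == customer_id]
        for cid in doomed:
            del self.rows[cid]
        return len(doomed)


def erase_customer(customer_id, primary_db: dict, index: VectorStore) -> dict:
    """Remove a customer from both stores and report what is left behind."""
    removed_primary = 1 if primary_db.pop(customer_id, None) is not None else 0
    removed_chunks = index.delete_by_customer(customer_id)
    remaining = sum(1 for row in index.rows.values() if row[0] == customer_id)
    return {"primary": removed_primary, "chunks": removed_chunks, "remaining": remaining}


if __name__ == "__main__":
    index = VectorStore()
    index.upsert("c1", "cust-1", [0.1, 0.2], "statement text")
    index.upsert("c2", "cust-2", [0.3, 0.4], "other text")
    primary = {"cust-1": {"name": "A"}, "cust-2": {"name": "B"}}
    print(erase_customer("cust-1", primary, index))
```

The returned counts give you something to show an auditor: a non-zero `remaining` is exactly the gap described above.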
5. Enforce permissions at the data layer, not in the prompt
A tempting shortcut is to tell the agent "only show this user their own records." Do not rely on that. Instructions in a prompt are guidance, not a security boundary, and they can be talked around. Real access control lives in your tools and your data layer: the query the agent triggers is scoped to that authenticated user's permissions, server-side, every time. The agent should be structurally incapable of fetching data the user is not entitled to - not merely instructed to avoid it.
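Here is a minimal sketch of that structural scoping, using SQLite from the standard library. The `records` table, its columns, and the `make_records_tool` factory are assumptions made for illustration. The user ID comes from your authenticated session and is bound when the tool is created; nothing the model passes in can change it.

```python
import sqlite3


def make_records_tool(conn, authenticated_user_id):
    """Build a tool bound to one authenticated user for the whole session."""

    def fetch_records(limit=10, **model_args):
        # model_args is deliberately ignored: even if the model supplies
        # owner_id="someone-else", the query stays scoped server-side.
        return conn.execute(
            "SELECT id, note FROM records WHERE owner_id = ? ORDER BY id LIMIT ?",
            (authenticated_user_id, int(limit)),
        ).fetchall()

    return fetch_records


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, owner_id TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [(1, "alice", "a1"), (2, "bob", "b1"), (3, "alice", "a2")],
    )
    tool = make_records_tool(conn, "alice")
    # A prompt-injected attempt to read bob's rows changes nothing.
    print(tool(owner_id="bob"))
```

The model is never given a parameter that selects the owner, so there is no instruction to talk around.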
6. Watch the inputs, too
Privacy is not only about what leaks out; it is also about what users put in. People paste things into chat boxes they should not - full card numbers, a colleague's health details, credentials. A regulated agent should detect and handle sensitive input: refuse it, mask it, or route it appropriately, and certainly never echo it back or store it raw. Assume your input box will be used in ways you did not intend, because it will be.
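As one example of input screening, the sketch below masks likely card numbers before a message is stored or forwarded. It assumes cards appear as 13 to 19 digits, optionally separated by spaces or hyphens, and uses the standard Luhn checksum to cut false positives. `mask_cards` is a hypothetical helper, and card numbers are only one of the input types you would screen for.

```python
import re

# Assumption: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text: str) -> tuple[str, bool]:
    """Return the text with likely card numbers masked, plus a found flag."""
    found = False

    def _mask(match):
        nonlocal found
        digits = re.sub(r"\D", "", match.group(0))
        if luhn_valid(digits):
            found = True
            return "[CARD REDACTED]"
        return match.group(0)  # digit run that fails the checksum: leave it

    return CARD_CANDIDATE.sub(_mask, text), found


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(mask_cards("my card is 4111 1111 1111 1111 thanks"))
```

The flag lets the caller choose between the options above: refuse the message, continue with the masked text, or route it elsewhere.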
Privacy is a design constraint, not a disclaimer
In regulated industries, the projects that ship are the ones that treated data protection as an architectural requirement from day one - mapped the flows, minimized what they sent, governed the logs, and enforced access where it actually counts. The projects that stall are the ones that built an impressive demo and then discovered, in a compliance review, that they could not say where the data went. You cannot retrofit trust. Build the agent so that the honest answer to "where does the customer's data go?" is short, complete, and comfortable to say out loud.
About Shanti Infosoft: Shanti Infosoft is a CMMI Level 5 AI development company that has delivered 700+ projects across 16+ industries. We help teams move from AI ideas to dependable, production-grade software - shantiinfosoft.com | AI integration services.
If you operate in fintech or healthtech, we can review what your agent touches and close the data-privacy gaps before they become a compliance problem. Talk to our team.
Related reading: Why AI Projects Die Faster in Fintech & HealthTech - It's Compliance, Not Capability
Rishabh Jain is a Director at Shanti Infosoft, where the team builds AI agents and automation for real business operations.