Privent v2.2: An Open-Source Firewall for Data Leaks in Agentic AI

#opensource #security #ai

Repo: github.com/privent-ai/n8n-nodes-privent. MIT licensed, 573 local detectors, and no API key required to try local mode. If you find a hole in it, good: that's exactly what the last section of this post asks you to do.

Every time an AI agent workflow sends a prompt to a model, whatever's in that prompt goes to a third party, full stop. Customer emails, phone numbers, card numbers, home addresses sitting in a support ticket: if it's in the input, it's in the request body of an API call that nobody reviews per execution. That's not hypothetical; it's just how the plumbing works, and almost nobody is checking what's actually flowing through it.

We hit this wall directly while building AI agent workflows that touch real user data. n8n's built-in Guardrails mask PII by replacing values with [REDACTED], but that's destructive. Once an email becomes [REDACTED], every downstream node that needs to send a confirmation, write to a CRM, or generate an audit log has nothing left to work with. You've traded a leak for a broken workflow instead of actually fixing anything.

So we built something that doesn't destroy the data; it just hides it from the one component that shouldn't see it raw: the model.

Try it in about a minute, no account needed:

npm install n8n-nodes-privent

Full source, issues, and the threat model doc are all in the repo: github.com/privent-ai/n8n-nodes-privent.

Reversible tokenization, not redaction

user@acme.com becomes [EMAIL_001] before the LLM ever sees the prompt. The agent reasons, plans, and calls tools using tokens the whole way through; it never holds the raw value. The original value is restored only at egress points you explicitly mark as trusted. Redaction throws information away. Tokenization just moves where the raw value is allowed to exist.
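To make the round trip concrete, here is a minimal in-memory sketch in TypeScript. It assumes one vault per workflow execution, an email-only regex, and a zero-padded counter per entity kind; the names SessionVault, tokenFor, tokenize, and detokenize are hypothetical, and this is not Privent's implementation (its vault lives server-side).

```typescript
// Hypothetical sketch of a per-execution token vault (not Privent's code).
// Assumption: tokens look like [KIND_NNN] with a zero-padded counter per kind.
class SessionVault {
  private valueToToken = new Map<string, string>();
  private tokenToValue = new Map<string, string>();
  private counters = new Map<string, number>();

  // The same (kind, value) pair always maps to the same token in this session.
  tokenFor(kind: string, value: string): string {
    const key = `${kind}\u0000${value}`;
    const existing = this.valueToToken.get(key);
    if (existing !== undefined) return existing;
    const n = (this.counters.get(kind) ?? 0) + 1;
    this.counters.set(kind, n);
    const token = `[${kind}_${String(n).padStart(3, "0")}]`;
    this.valueToToken.set(key, token);
    this.tokenToValue.set(token, value);
    return token;
  }

  // Replace every email-shaped substring with its token (email-only for brevity).
  tokenize(text: string): string {
    const email = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
    return text.replace(email, (m) => this.tokenFor("EMAIL", m));
  }

  // Restore only tokens this session issued; unknown tokens pass through unchanged.
  detokenize(text: string): string {
    return text.replace(/\[[A-Z0-9_]+_\d{3,}\]/g, (t) => this.tokenToValue.get(t) ?? t);
  }
}

const vault = new SessionVault();
const masked = vault.tokenize("Reply to user@acme.com, cc ops@acme.com, then user@acme.com again");
console.log(masked); // Reply to [EMAIL_001], cc [EMAIL_002], then [EMAIL_001] again
console.log(vault.detokenize(masked)); // the original sentence
```

Keying the map on (kind, value) is what makes a repeated address get the same token, so every node in one execution agrees on what [EMAIL_001] refers to.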

The primitives

We designed this as six composable pieces, not one big opaque node:

Session: a scoped token vault. Every node in a workflow run shares a session, so [EMAIL_001] means the same address everywhere in that execution, not a new random token per node.
Tokenize: scans the payload, detects sensitive entities using 573 local regex patterns, and replaces matches with deterministic tokens. Detection runs entirely inside n8n; nothing leaves the instance to find the entities (the value-to-token mapping is then kept in the session's vault, covered in the architecture section).
Risk Check: optional. Sends the payload to our ML scoring backend and gets back a risk level (LOW/MEDIUM/HIGH/CRITICAL) with an entity breakdown. Skip it and detection stays fully local.
Detokenize: resolves tokens back to real values, but only at points the workflow designer placed intentionally. It's not tool-callable by the agent (usableAsTool: false), and Strict Mode blocks resolution unless the destination URL is on an explicit Trusted Sinks list.
Audit: emits structured events (risk scores, session IDs) for every tokenize/detokenize action, sent to our backend or to a webhook you configure yourself.
Handoff: marks a trust boundary when a workflow delegates to a sub-agent or external system, so the audit trail stays coherent across agent hops instead of going dark at the handoff.
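The Strict Mode rule for Detokenize can be sketched as an allowlist check on the destination's origin that also emits an audit event for every decision. This is a hypothetical TypeScript sketch: TrustedSinkPolicy, gateDetokenize, and the AuditEvent shape are illustrative names rather than Privent's API, and matching on full URL origins is our assumption about how a trusted-sink comparison should work.

```typescript
// Hypothetical Strict Mode gate for detokenization (illustrative names, not Privent's API).
interface AuditEvent {
  action: "detokenize_allowed" | "detokenize_blocked";
  sessionId: string;
  destination: string;
  at: string; // ISO-8601 timestamp
}

class TrustedSinkPolicy {
  private readonly origins: Set<string>;
  private readonly strictMode: boolean;

  constructor(trustedSinks: string[], strictMode = true) {
    // Normalize every configured sink to its origin: scheme + host + port.
    this.origins = new Set(trustedSinks.map((u) => new URL(u).origin));
    this.strictMode = strictMode;
  }

  allows(destination: string): boolean {
    if (!this.strictMode) return true; // in this sketch, Strict Mode off means no sink check
    try {
      return this.origins.has(new URL(destination).origin);
    } catch {
      return false; // an unparseable destination is never trusted
    }
  }
}

// Decide, record the decision, and only then let the caller restore raw values.
function gateDetokenize(
  policy: TrustedSinkPolicy,
  sessionId: string,
  destination: string,
  emit: (event: AuditEvent) => void,
): boolean {
  const allowed = policy.allows(destination);
  emit({
    action: allowed ? "detokenize_allowed" : "detokenize_blocked",
    sessionId,
    destination,
    at: new Date().toISOString(),
  });
  return allowed;
}

const policy = new TrustedSinkPolicy(["https://crm.example.com/api"]);
const events: AuditEvent[] = [];
gateDetokenize(policy, "sess-42", "https://crm.example.com/v2/contacts", (e) => events.push(e)); // true
gateDetokenize(policy, "sess-42", "https://crm.example.com.attacker.net/x", (e) => events.push(e)); // false
```

Comparing parsed origins rather than string prefixes matters: a plain startsWith check against "https://crm.example.com" would also accept "https://crm.example.com.attacker.net".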

All six are implemented in the open, and nothing here is described differently from how it's actually coded. Privent.node.ts in the repo is the exact source: github.com/privent-ai/n8n-nodes-privent.

573 detectors, and why the number matters more than it sounds

We didn't stop at emails, phone numbers, cards, SSNs, and IBANs. That's maybe a day of regex work, and every DLP tool on earth already has it. The detectors that actually mattered came from building real workflows for real industries:

Regional government IDs: Dutch BSN, Indian Aadhaar, Polish PESEL, Brazilian CPF, Singapore NRIC, South Korean RRN, Turkish ID, Emirati ID
Cloud and platform credentials: AWS Access Key, AWS ARN, GitHub tokens, OpenAI keys, Stripe, Heroku, Firebase, Azure Resource ID, GCP service accounts
Crypto addresses: Bitcoin, Ethereum, Solana, Cardano, Monero, Tezos, Algorand
Shipping tracking numbers across 15+ carriers
Gig-economy identifiers: Uber trip IDs, DoorDash refs, Upwork job IDs, because fintech and HR automation workflows run into these constantly and nothing else detects them

573 entity types, spanning 90+ countries and industries, all evaluated locally inside n8n.
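A detector in this style is just data: a kind, a regex, a confidence, and a category. The sketch below assumes that shape and shows how a scanner might run several detectors over one payload, then resolve overlapping matches by preferring higher confidence and, on ties, the longer span. The three patterns are simplified illustrations, not Privent's actual regexes, and the overlap rule is our assumption.

```typescript
// Illustrative detector shape (kind, regex, confidence, category).
// Patterns are simplified examples, not Privent's real ones.
interface Detector {
  kind: string;
  regex: RegExp; // must carry the global (g) flag
  confidence: number; // 0..1
  category: string;
}

interface Finding {
  kind: string;
  category: string;
  value: string;
  start: number;
  end: number;
  confidence: number;
}

const DETECTORS: Detector[] = [
  { kind: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, confidence: 0.95, category: "contact" },
  { kind: "AWS_ACCESS_KEY", regex: /\bAKIA[0-9A-Z]{16}\b/g, confidence: 0.9, category: "credential" },
  { kind: "ETH_ADDRESS", regex: /\b0x[0-9a-fA-F]{40}\b/g, confidence: 0.85, category: "crypto" },
];

function scan(text: string, detectors: Detector[] = DETECTORS): Finding[] {
  const candidates: Finding[] = [];
  for (const d of detectors) {
    d.regex.lastIndex = 0; // global regexes are stateful; start each scan fresh
    let m: RegExpExecArray | null;
    while ((m = d.regex.exec(text)) !== null) {
      if (m[0].length === 0) {
        d.regex.lastIndex++; // guard against infinite loops on empty matches
        continue;
      }
      candidates.push({
        kind: d.kind, category: d.category, value: m[0],
        start: m.index, end: m.index + m[0].length, confidence: d.confidence,
      });
    }
  }
  // Assumed overlap rule: higher confidence wins, then the longer span.
  candidates.sort((a, b) => b.confidence - a.confidence || (b.end - b.start) - (a.end - a.start));
  const kept: Finding[] = [];
  for (const c of candidates) {
    if (kept.every((k) => c.end <= k.start || c.start >= k.end)) kept.push(c);
  }
  return kept.sort((a, b) => a.start - b.start); // back to reading order
}

console.log(scan("key AKIAIOSFODNN7EXAMPLE, owner a@b.io").map((f) => f.kind));
// [ 'AWS_ACCESS_KEY', 'EMAIL' ]
```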

The architecture, without the marketing gloss

We'd rather you know exactly where the line is than find out later:

Detection (regex, all 573 patterns): inside n8n. Nothing leaves your instance for this.
Token vault (the value to token mapping): server-side. Either Privent Cloud (api.privent.ai) or self-hosted; point the Base URL credential at your own deployment and it's yours end to end.
ML risk scoring: server-side, optional, same backend.
Audit ingestion: server-side, or a webhook pointed wherever you want.

The n8n nodes themselves are MIT licensed. The backend is what we run as a hosted service, and you can replace it with your own. Privent is a verified integration partner (n8n.io/integrations/privent), has 3,000+ npm installs, and is currently at v2.2.1.

What we can't solve alone

We'd rather list our actual open problems than pretend the repo is finished:

Detector coverage skews Western and major-Asian. We're missing Balkan national ID formats, Central Asian passport formats, MENA tax ID formats, and probably a dozen others we don't know we're missing because we haven't built for those regions yet. Adding a detector is genuinely one PR: kind, regex, confidence, category, and it ships to every user immediately.

Prompt injection against Detokenize is a real, undertested threat. The node is non-tool-callable and gated by Strict Mode and Trusted Sinks, but a sufficiently crafted payload instructing the agent to hit our detokenize API directly is a documented entry in our threat model that we haven't stress-tested nearly enough. If you do offensive security work, this is the single highest-value place to look, and we'd genuinely rather you find the hole than someone else does.
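One way to probe the Detokenize path from the outside is a canary check: plant a unique fake value in the input, record every outbound request the workflow makes during the run, and flag any request to an untrusted origin that carries the raw value. The TypeScript sketch below is a hypothetical harness helper, not part of the repo; how you capture outbound requests (a proxy, a mocked HTTP node) is left open, and it only checks the raw and URL-encoded forms, so a thorough test should add other encodings.

```typescript
// Hypothetical red-team helper: given the outbound requests captured during one test run,
// return those that carry the planted canary to an origin outside the trusted-sink list.
interface OutboundRequest {
  url: string;
  body: string;
}

function findCanaryLeaks(
  requests: OutboundRequest[],
  canary: string,
  trustedSinks: string[],
): OutboundRequest[] {
  const trusted = new Set(trustedSinks.map((u) => new URL(u).origin));
  // Only raw and URL-encoded forms are checked here; extend with base64, hex, etc.
  const forms = [canary, encodeURIComponent(canary)];
  return requests.filter((r) => {
    let origin = "";
    try {
      origin = new URL(r.url).origin;
    } catch {
      // Unparseable URL: leave origin empty so it is treated as untrusted.
    }
    const haystack = `${r.url}\n${r.body}`; // exfiltration can ride in the query string too
    return forms.some((f) => haystack.includes(f)) && !trusted.has(origin);
  });
}

const captured: OutboundRequest[] = [
  { url: "https://crm.example.com/contacts", body: '{"email":"canary-7f3a@example.org"}' },
  { url: "https://llm.example/v1/chat", body: '{"prompt":"Contact [EMAIL_001]"}' },
  { url: "https://collector.attacker.example/c?e=canary-7f3a%40example.org", body: "" },
];
console.log(findCanaryLeaks(captured, "canary-7f3a@example.org", ["https://crm.example.com"]).map((r) => r.url));
```

Here only the third request is reported: the CRM is a trusted sink, and the model call only ever saw the token.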

Context-free detection has a real cost. A 16-digit number gets flagged as a possible credit card even when it's actually a hardware order ID. We're working on a context layer, but there's no clean fix yet that doesn't trade away coverage to buy back precision. If you've solved this problem elsewhere, we want to hear how.
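For the 16-digit case specifically, one mitigation commonly used elsewhere is to require a valid Luhn checksum and then adjust confidence using nearby keywords. The sketch below is not Privent's context layer; the keyword lists, the 40-character window, and the score values are arbitrary assumptions for illustration. It also shows the trade-off described above: exactly 1 in 10 random 16-digit strings pass Luhn, so a Luhn-valid order ID with no keywords nearby still scores as a possible card.

```typescript
// One common mitigation elsewhere, not Privent's context layer: Luhn checksum + keyword context.
// Keyword lists, window size, and score values are arbitrary illustrative assumptions.
function luhnValid(digits: string): boolean {
  if (!/^\d+$/.test(digits)) return false;
  let sum = 0;
  let double = false; // double every second digit, counting from the rightmost (not doubled)
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

const CARD_HINTS = /\b(card|visa|mastercard|amex|payment)\b/i;
const NON_CARD_HINTS = /\b(order|sku|serial|part|invoice)\b/i;

// Score the candidate span text[start, end) between 0 and 1.
function cardScore(text: string, start: number, end: number, window = 40): number {
  const digits = text.slice(start, end).replace(/\D/g, "");
  if (!luhnValid(digits)) return 0; // fails the checksum, so not a valid card number
  const context = text.slice(Math.max(0, start - window), Math.min(text.length, end + window));
  let score = 0.5; // passes Luhn, but so do 1 in 10 random digit strings
  if (CARD_HINTS.test(context)) score += 0.4;
  if (NON_CARD_HINTS.test(context)) score -= 0.4;
  return Math.min(1, Math.max(0, score));
}

const paymentLine = "Visa card 4111111111111111 on file";
const orderLine = "hardware order 4111111111111111 shipped";
const at1 = paymentLine.indexOf("4111");
const at2 = orderLine.indexOf("4111");
console.log(cardScore(paymentLine, at1, at1 + 16), cardScore(orderLine, at2, at2 + 16));
```

Both lines contain the same Luhn-valid number; only the surrounding words separate them, which is exactly why context buys precision at the cost of missing cards that appear with no hints nearby.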

⭐ Where to look

GitHub (source, issues, threat model doc): github.com/privent-ai/n8n-nodes-privent
npm: https://www.npmjs.com/package/n8n-nodes-privent

If you're building AI agents that touch real user data, clone the repo and try to break the Detokenize path; that's the part we most want tested by people who aren't us. Open an issue if you find something. We'll fix the real ones and say so publicly in the issue thread, including anything embarrassing.

If you find a regional ID format we're missing, the PR is smaller than you think: kind, regex, confidence, category, done.

And if none of that applies to you right now but this is still useful to know about, starring the repo genuinely helps: it's what gets the project in front of the next person building agent workflows before their first data leak instead of after. That's not a formality; it's the actual distribution mechanism for a project like this.

n8n nodes: MIT licensed. Backend: hosted service or self-hosted, your choice.