Asil Ozyildirim

5 Ways to Stop Data from Leaking Out of Your n8n AI Workflows

If you're running AI workflows in n8n that touch real customer data (emails, phone numbers, account IDs, health records), that data is almost certainly reaching external LLM APIs unredacted. n8n execution history also stores every node's input and output by default, which means anyone with instance access can read raw PII from your logs.

This post covers five concrete approaches, from zero-dependency quick fixes to production-grade solutions, with real tools, install instructions, and honest tradeoffs for each.


Why this matters before we start

A typical n8n AI workflow looks like this:

Webhook → Pull customer record → Build prompt → OpenAI → Send response

By the time that prompt hits OpenAI, it might contain:

Summarize the support case for john.doe@company.com,
SSN 999-88-7777, account #48291, phone 555-304-8821.
Issue: {{ $json.description }}

Every field there is PII. It's going to OpenAI's infrastructure. It's sitting in your n8n execution logs. And unless you've taken specific steps to prevent it, it will keep doing that silently.


Option 1: Manual tokenization with a Code node

What it is: Write JavaScript in an n8n Code node to replace sensitive fields with tokens before the LLM node, then reverse it afterward.

Setup: No installation needed. Add a Code node before your LLM node.

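// "Tokenize" Code node (mode: Run Once for All Items)
const map = {};      // token -> original value, used later to restore
let counter = 1;

// Swap a sensitive value for a numbered placeholder such as [EMAIL_001].
function token(value, kind) {
  const t = `[${kind}_${String(counter++).padStart(3, '0')}]`;
  map[t] = value;
  return t;
}

const input = $input.first().json;

// Note: description is free text and is passed through as-is,
// so any PII typed into it still reaches the LLM.
return [{
  json: {
    safe_prompt: `Summarize the case for ${token(input.email, 'EMAIL')},
      account ${token(input.account_id, 'ACCT')},
      phone ${token(input.phone, 'PHONE')}.
      Issue: ${input.description}`,
    _pii_map: map
  }
}];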

Then after your LLM node, a second Code node to restore values:

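// "Detokenize" Code node (mode: Run Once for All Items)
// Assumes the LLM node returns its text at message.content; adjust the path for your node.
let response = $input.first().json.message.content;
const map = $('Tokenize').first().json._pii_map;

for (const [token, value] of Object.entries(map)) {
  // A replacer function keeps "$" sequences in the value from being read as patterns.
  response = response.replaceAll(token, () => value);
}

return [{ json: { response } }];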

What it actually covers:

  • Fields you explicitly list in the code
  • PII in the final prompt string

What it misses:

  • Anything you forget to include — every new workflow needs this written again from scratch
  • No detection of implicit sensitive content (proprietary project names, M&A context, etc.)
  • _pii_map still appears in execution logs if you're not careful
  • No audit trail per execution
  • Breaks immediately if your data schema changes

Best for: Prototyping. One-off workflows where you know exactly which 2–3 fields carry PII and you won't forget to update the code when the schema changes.
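
The hand-written approach only covers the fields it names, so free text such as the description passes through untouched. Below is a minimal sketch of one way to narrow that gap: scan free text with a few regexes and hand out stable tokens, so the same value always maps to the same placeholder. The PATTERNS table, createTokenizer, and tokenizeText are hypothetical names introduced for illustration, and the patterns are deliberately simple and far from exhaustive.

```javascript
// Sketch only: illustrative patterns, not a complete PII detector.
const PATTERNS = {
  EMAIL: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
  PHONE: /\b\d{3}-\d{3}-\d{4}\b/g,
};

function createTokenizer() {
  const tokenToValue = {}; // "[EMAIL_001]" -> original value
  const valueToToken = {}; // "EMAIL:<value>" -> token, keeps tokens stable
  let counter = 1;

  function tokenFor(kind, value) {
    const key = `${kind}:${value}`;
    if (!valueToToken[key]) {
      const t = `[${kind}_${String(counter++).padStart(3, '0')}]`;
      valueToToken[key] = t;
      tokenToValue[t] = value;
    }
    return valueToToken[key];
  }

  // Replace every match of every pattern with its (possibly reused) token.
  function tokenizeText(text) {
    let out = String(text ?? '');
    for (const [kind, pattern] of Object.entries(PATTERNS)) {
      out = out.replace(pattern, (match) => tokenFor(kind, match));
    }
    return out;
  }

  return { tokenizeText, map: tokenToValue };
}

const tokenizer = createTokenizer();
const safeDescription = tokenizer.tokenizeText(
  'Customer john.doe@company.com called from 555-304-8821 twice; reach john.doe@company.com.'
);
console.log(safeDescription);
console.log(tokenizer.map);
```

Inside a Code node, the same tokenizeText call could wrap input.description before it is interpolated into the prompt, with tokenizer.map merged into the map that the detokenize step reads. Regexes still miss names and anything unstructured, which is the gap the later options address.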


Option 2: n8n's built-in Guardrails node

What it is: A native n8n node (available since v1.113.3, November 2025) that sits between your data and your LLM node. No external services required for pattern-based checks.

Setup: Update n8n to at least v1.113.3. The Guardrails node appears in the node search — no installation needed.

The node has two modes:

Check Text for Violations — scans text against selected policies and routes to a Fail branch if anything triggers. You then decide what to do: halt the workflow, log the attempt, return a safe fallback.

Sanitize Text — redacts detected content in-place and replaces it with placeholders like [EMAIL_ADDRESS] or [PHONE_NUMBER]. The workflow keeps running with the cleaned text.

A typical pattern:

Webhook → Guardrails (Sanitize) → OpenAI → Response

Available guardrails include: PII detection (20+ entity types: emails, phones, credit cards, SSNs, IBANs, passports, driver's licenses, medical licenses, and country-specific formats), Secret Keys, Keywords, URLs, Custom Regex, Jailbreak detection (LLM-based), NSFW detection (LLM-based), and Topical Alignment.

For PII specifically, Sanitize mode catches structured entities via pattern matching, so it needs no API call and adds no network round-trip. Jailbreak and NSFW detection require a connected LLM node and add one API call per check.

What it actually covers:

  • Structured PII in a single text field you configure
  • API key patterns, common credential formats
  • Jailbreak/injection attempts on user-facing inputs
  • No external service dependency for pattern-based checks

What it misses:

  • No detokenization — once redacted, the value is gone. The response back to the user will contain [EMAIL_ADDRESS], not the original. For workflows where you need the real value restored after the LLM call, you'll need additional logic.
  • No cross-node visibility — it sees the text field you point it at, not what accumulated across multiple upstream nodes
  • No audit trail in an external system
  • One documented limitation: consistent tokenization (same entity → same placeholder across a long session) requires extra workflow logic; the node doesn't handle this automatically

Best for: Adding a first layer of protection to user-facing chatbots and intake workflows. Excellent for blocking jailbreaks and catching structured PII on input. Not enough on its own if your workflow composes prompts from data pulled across multiple nodes.
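
To make the difference between the two modes concrete, here is a small plain-JavaScript illustration. It is not the Guardrails node's implementation or rule set; RULES, checkText, and sanitizeText are hypothetical names, and the two regexes stand in for a much larger set of detectors.

```javascript
// Illustration only: NOT how the Guardrails node is implemented.
const RULES = [
  { label: 'EMAIL_ADDRESS', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'PHONE_NUMBER', pattern: /\b\d{3}-\d{3}-\d{4}\b/g },
];

// "Check" style: report which policies triggered and let the caller branch.
function checkText(text) {
  const violations = RULES
    .filter(({ pattern }) => text.match(pattern) !== null)
    .map(({ label }) => label);
  return { passed: violations.length === 0, violations };
}

// "Sanitize" style: redact in place with a type-only placeholder.
function sanitizeText(text) {
  return RULES.reduce(
    (out, { label, pattern }) => out.replace(pattern, `[${label}]`),
    text
  );
}

const input = 'Forward the invoice from a@example.com to b@example.com, then call 555-304-8821.';
console.log(checkText(input));
console.log(sanitizeText(input));
```

Both addresses collapse into the same [EMAIL_ADDRESS] placeholder, so the model can no longer tell sender from recipient and nothing downstream can restore either value. That is the practical meaning of redact-only.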


Option 3: n8n-nodes-rehydra (community node)

What it is: An open-source n8n community node (GitHub, npm) built on the Rehydra SDK. Handles both anonymization and rehydration — meaning it can restore original values after the LLM responds.

Setup:

Self-hosted n8n → Settings → Community Nodes → Install → enter n8n-nodes-rehydra.

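Or via CLI, from n8n's community nodes directory (~/.n8n/nodes on a default install), followed by an n8n restart: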

npm install n8n-nodes-rehydra

Three nodes in the package:

Rehydra: Anonymize — replaces detected PII with XML-style tags: <PII type="EMAIL" id="1"/>. Supports Pseudonymize mode (reversible, default) and Anonymize mode (irreversible, for when you never need the value back). Outputs: anonymizedText, piiMap (encrypted), entities.

Rehydra: Rehydrate — takes the piiMap from a prior Anonymize step and restores original values. Requires the same encryption key.

Rehydra: Inspect — dry run mode. Returns detected entities without modifying the text. Useful for testing what would be caught before going to production.
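
The anonymize and rehydrate round trip can be sketched in a few lines of plain JavaScript. This is not the Rehydra SDK: pseudonymizeEmails and rehydrate are hypothetical helpers, the map is left unencrypted, and only emails are detected. The sketch only shows how the tag format described above lets values be restored after the LLM responds.

```javascript
// Hypothetical helpers; the real nodes also encrypt the map and can run NER.
function pseudonymizeEmails(text) {
  const piiMap = {};            // id -> { type, value }
  const idByValue = new Map();  // value -> id, so repeats share one tag
  let nextId = 1;
  const anonymizedText = text.replace(
    /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
    (value) => {
      if (!idByValue.has(value)) {
        idByValue.set(value, nextId);
        piiMap[nextId] = { type: 'EMAIL', value };
        nextId += 1;
      }
      return `<PII type="EMAIL" id="${idByValue.get(value)}"/>`;
    }
  );
  return { anonymizedText, piiMap };
}

function rehydrate(text, piiMap) {
  // Tolerate small whitespace changes the LLM may introduce inside the tag.
  return text.replace(
    /<PII\s+type="([A-Z_]+)"\s+id="(\d+)"\s*\/>/g,
    (tag, type, id) => {
      const entry = piiMap[id];
      // Leave unknown or mismatched tags untouched rather than guessing.
      return entry && entry.type === type ? entry.value : tag;
    }
  );
}

const { anonymizedText, piiMap } = pseudonymizeEmails(
  'Reply to ana@example.org and cc ana@example.org.'
);
const llmReply = 'Drafted a reply to <PII type="EMAIL" id="1" />.';
console.log(anonymizedText);
console.log(rehydrate(llmReply, piiMap));
```

Because the placeholder carries an id, repeated values stay linked and the reply can be restored exactly, which type-only redaction cannot do.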

Configuration options:

  • NER Mode: Disabled (regex only, fast), Quantized (~280 MB ONNX model, auto-downloaded on first run), or Standard (~1.1 GB model). The ONNX model runs locally — no API calls, works offline, PII never leaves your machine.
  • PII Types: Email, phone, IBAN, names, organizations, and more — select which to detect.
  • Locale: affects detection patterns for country-specific formats.

A typical workflow:

Database → Rehydra: Anonymize → Claude → Rehydra: Rehydrate → Save result

What it actually covers:

  • Structured PII + soft PII (names, organizations) when NER mode is enabled
  • Reversible pseudonymization — the LLM works with stable placeholders, the response gets real values restored
  • Fully local when using NER mode — no external API calls for detection
  • Works on any text field you configure

What it misses:

  • NER mode requires a ~280 MB model download on first execution (quantized) or ~1.1 GB (standard) — adds startup latency to first run
  • No cross-node data movement visibility — you configure which field it processes
  • No audit trail or dashboard
  • Unverified community node — requires self-hosted n8n with N8N_COMMUNITY_PACKAGES_ENABLED=true; not available on n8n Cloud

Best for: Self-hosted teams who need reversible PII masking with no external service dependency, and specifically need name/organization detection beyond regex-only approaches.


Option 4: Microsoft Presidio via HTTP Request node

What it is: An open-source PII detection and anonymization engine from Microsoft, designed for production use. You deploy it as a local service and call it from n8n via HTTP Request nodes.

Setup:

Deploy with Docker:

docker pull mcr.microsoft.com/presidio-analyzer
docker pull mcr.microsoft.com/presidio-anonymizer

docker run -d -p 5001:3000 mcr.microsoft.com/presidio-analyzer
docker run -d -p 5002:3000 mcr.microsoft.com/presidio-anonymizer

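In n8n, add an HTTP Request node. Build the body with JSON.stringify so that quotes and newlines in the prompt don't produce invalid JSON. If n8n itself runs in a container, replace localhost with the Presidio service's hostname on the shared Docker network.

POST http://localhost:5001/analyze
Body: {
  "text": {{ JSON.stringify($json.prompt) }},
  "language": "en"
}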

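This returns a JSON array of detected entities, each with an entity type, character offsets, and a confidence score. Send the original text and that array to the anonymizer. The array must be embedded as JSON rather than as a quoted string, and this example assumes a preceding node has placed both prompt and analyzerResults on the same item:

POST http://localhost:5002/anonymize
Body: {
  "text": {{ JSON.stringify($json.prompt) }},
  "analyzer_results": {{ JSON.stringify($json.analyzerResults) }}
}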

Presidio supports 50+ entity types, custom recognizers, and multiple anonymization operators (replace, redact, hash, encrypt, mask). It's the basis for many enterprise PII pipelines and supports English and a growing list of other languages.

What it actually covers:

  • Broad entity detection (50+ types out of the box)
  • Custom recognizers for domain-specific entities
  • Multiple anonymization strategies per entity type
  • Fully local — nothing leaves your infrastructure
  • Encryption-based anonymization for reversible workflows

What it misses:

  • Requires running and maintaining a separate Docker service
  • No native n8n node — you're wiring HTTP Request nodes together, which means more workflow complexity and error handling to build yourself
  • Sees only the text field you send it — no visibility into cross-node data flow
  • No n8n-specific audit trail
  • You're responsible for the anonymization/deanonymization map storage if you need reversibility

Best for: Teams with existing DevOps capacity who want maximum control over entity detection, custom recognizers for industry-specific PII, and no dependency on third-party SaaS.
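
Since map storage is left to you, one option is to skip the anonymizer call and apply the analyzer's offsets yourself in a Code node, keeping the token map for later restoration. The sketch below assumes analyzer results shaped as { entity_type, start, end, score }; selectSpans, applyAnalyzerResults, and restore are hypothetical helper names, and the sample results are hand-written rather than real analyzer output.

```javascript
// Pick non-overlapping spans, preferring higher scores, then longer spans.
function selectSpans(results, minScore) {
  const chosen = [];
  const byPriority = results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score || (b.end - b.start) - (a.end - a.start));
  for (const r of byPriority) {
    const overlaps = chosen.some((c) => r.start < c.end && c.start < r.end);
    if (!overlaps) chosen.push(r);
  }
  // Replace from the end of the string so earlier offsets stay valid.
  return chosen.sort((a, b) => b.start - a.start);
}

function applyAnalyzerResults(text, analyzerResults, minScore = 0.5) {
  const map = {};      // token -> original value
  const counters = {}; // per-entity-type counter
  let out = text;
  for (const { entity_type: type, start, end } of selectSpans(analyzerResults, minScore)) {
    counters[type] = (counters[type] || 0) + 1;
    const token = `<${type}_${counters[type]}>`;
    map[token] = text.slice(start, end);
    out = out.slice(0, start) + token + out.slice(end);
  }
  return { text: out, map };
}

function restore(text, map) {
  return Object.entries(map).reduce(
    (out, [token, value]) => out.replaceAll(token, () => value),
    text
  );
}

// Hand-written sample in the analyzer's response shape (not real output).
const prompt = 'Contact Jane Doe at jane@example.com';
const sampleResults = [
  { entity_type: 'PERSON', start: 8, end: 16, score: 0.85 },
  { entity_type: 'EMAIL_ADDRESS', start: 20, end: 36, score: 1.0 },
  { entity_type: 'URL', start: 25, end: 36, score: 0.5 },
];
const masked = applyAnalyzerResults(prompt, sampleResults);
console.log(masked.text);
console.log(restore(masked.text, masked.map));
```

Overlapping detections (here a URL inside the email address) are resolved in favor of the higher score so that no fragment of the original value is left behind. The map ends up in the node's output, so it carries the same execution-log caveat as the manual approach in Option 1.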


Option 5: n8n-nodes-privent (community node)

What it is: A native n8n community package (npm, privent.ai) that runs inside your workflow graph — not as an external proxy. 2,000+ installs on npm.

The architectural difference from everything above: Privent nodes read node input/output JSON and cross-node data movement directly, the same way any other n8n node does. It sees what accumulated across your entire workflow before the prompt is composed, not just the final text field you point at.

Setup:

Self-hosted:

N8N_COMMUNITY_PACKAGES_ENABLED=true

Then Settings → Community Nodes → Install → n8n-nodes-privent.

n8n Cloud Pro/Enterprise: same UI path — no environment variable needed.

Create a Privent API credential with your pv_live_… key (vault backend is configured automatically based on your deployment type).

Six nodes in the package:

Privent Session — generates a sessionId and prewarms the in-memory vault. Keeps token mappings consistent when the same value appears across multiple nodes in one session.

Privent Tokenize — replaces detected sensitive data with deterministic [KIND_NNN] placeholders. Detects 10 categories: EMAIL, SSN, CREDIT_CARD, IBAN, AWS_KEY, JWT, API_KEY, and more. The detection engine (ACARS) evaluates six weighted signals simultaneously — entity sensitivity, semantic risk, contextual amplification, destination risk, behavioral velocity, and policy overrides — rather than pattern-matching alone.

Privent Detokenize — resolves placeholders back to real values, but only at sinks you declare as trusted. With strict: true, it hashes the downstream sink URL and checks it against your trustedSinks prefix list. An HTTP node targeting an unknown endpoint keeps the placeholder — the cleartext value stays in the vault regardless of what downstream logic does.

Privent Risk Check — scores the prompt before it reaches the model, with the full ACARS breakdown per execution.

Privent Handoff — emits agent_handoff audit events when one agent delegates to another. Flags unauthorized scope expansions.

Privent Audit Event — emits custom observability events into the Privent dashboard.
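
The idea behind trusted-sink gating can be illustrated without any vendor code. The sketch below is a hypothetical reading of "restore only for declared destinations" and is not Privent's implementation (which, per the description above, hashes the sink URL); isTrustedSink and detokenizeForSink are invented names. It compares parsed origins instead of raw string prefixes, because a plain startsWith check would also accept a lookalike host such as internal.yourcompany.com.evil.example.

```javascript
// Hypothetical sketch of egress gating; not the Privent node's actual logic.
function isTrustedSink(sinkUrl, trustedSinks) {
  let sink;
  try {
    sink = new URL(sinkUrl);
  } catch {
    return false; // unparseable URLs are never trusted
  }
  return trustedSinks.some((prefix) => {
    const trusted = new URL(prefix); // assumes the allow-list entries are valid URLs
    if (sink.origin !== trusted.origin) return false;
    // Compare whole path segments so "/api" does not match "/apix".
    const base = trusted.pathname.endsWith('/') ? trusted.pathname : `${trusted.pathname}/`;
    return `${sink.pathname}/`.startsWith(base);
  });
}

function detokenizeForSink(text, map, sinkUrl, trustedSinks) {
  // Untrusted destination: placeholders stay in place.
  if (!isTrustedSink(sinkUrl, trustedSinks)) return text;
  return Object.entries(map).reduce(
    (out, [token, value]) => out.replaceAll(token, () => value),
    text
  );
}

const trustedSinks = ['https://internal.yourcompany.com'];
const vault = { '[EMAIL_001]': 'john.doe@company.com' };
const reply = 'Sent a summary to [EMAIL_001].';

console.log(detokenizeForSink(reply, vault, 'https://internal.yourcompany.com/crm/update', trustedSinks));
console.log(detokenizeForSink(reply, vault, 'https://internal.yourcompany.com.evil.example/x', trustedSinks));
```

The first call restores the address; the second returns the text with the placeholder intact, which is the behavior the strict setting is described as enforcing.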

A typical workflow:

Webhook → [your nodes] → Session → Tokenize → OpenAI → Detokenize → Response

The workflow JSON, abbreviated (connections, node positions, and credentials are omitted):

{
  "nodes": [
    { "name": "Webhook", "type": "n8n-nodes-base.webhook" },
    { "name": "Session", "type": "n8n-nodes-privent.priventSession" },
    {
      "name": "Tokenize",
      "type": "n8n-nodes-privent.priventTokenize",
      "parameters": {
        "sessionId": "={{ $('Session').item.json.sessionId }}",
        "textField": "prompt"
      }
    },
    { "name": "OpenAI", "type": "n8n-nodes-base.openAi" },
    {
      "name": "Detokenize",
      "type": "n8n-nodes-privent.priventDetokenize",
      "parameters": {
        "strict": true,
        "trustedSinks": "https://internal.yourcompany.com"
      }
    }
  ]
}

What it actually covers:

  • Dynamic runtime PII + structured entities + implicit sensitive content (semantic risk scoring catches proprietary context, M&A language, etc.)
  • Cross-node data movement — because it runs inside the graph, it sees data as it flows between nodes, not just at the egress point
  • Egress gating at the vault level — strict mode prevents cleartext from reaching untrusted destinations even if downstream workflow logic tries to send it
  • Consistent token mapping across a session
  • Full audit trail per execution in the Privent dashboard
  • Credential and API key detection (AWS_KEY, JWT, API_KEY)
  • Multi-agent delegation auditing via the Handoff node

What it requires:

  • n8n Cloud Pro or Enterprise for n8n Cloud usage; self-hosted with community packages enabled for everything else
  • A Privent account and API key
  • Privent processes data ephemerally — raw prompt text is never written to disk, never stored, never used for training

Deployment options: Privent Cloud (managed, API key), Dedicated (isolated environment), or fully on-prem (detection engine, rules, and AI models all run inside your network — nothing leaves).

Best for: Production workflows handling real customer data across multiple nodes, multi-agent architectures where data moves between agents, healthcare (HIPAA) and other privacy-regulated (GDPR, CCPA) environments, or any setup where you need to know exactly what left your infrastructure and where it went.


Comparison

Code Node n8n Guardrails Rehydra Presidio Privent
Installation None None Community node Docker service Community node
Detokenization Manual ❌ redact-only ✅ (custom)
Detects names/orgs ⚠️ limited ✅ (NER mode)
Implicit/semantic PII
Cross-node visibility
Egress gating
Audit trail
Works on n8n Cloud ✅ (Pro+)
External service req. Docker API key
On-prem option

How to choose

Start with n8n Guardrails if you're on n8n Cloud or want zero configuration overhead and your main concern is protecting user-submitted input on a chatbot or intake form. It's already there, costs nothing to set up, and catches the most common cases.

Add Rehydra if you need reversible anonymization on self-hosted n8n and can't send detection to an external service. The local NER model handles names and organizations that regex-only approaches miss.

Use Presidio if you have DevOps capacity, need 50+ entity types or custom recognizers for industry-specific PII, and want maximum control over anonymization strategy.

Use Privent if your workflow composes prompts from data pulled across multiple nodes, you're running multi-agent flows, or you need an audit trail that shows you exactly what left your infrastructure. The graph-state visibility gap is real — other approaches protect the field you point them at; Privent watches the entire execution.


What approach are you using in production? Curious especially about edge cases with multi-node workflows — drop it in the comments.
