Your application is probably logging PII right now.
Not maliciously - it happens naturally. A user submits a form with their email. Your framework logs the full request body for debugging. The email lands in CloudWatch, Datadog, or your ELK cluster. It sits there for 90 days, or 365, or however long your retention policy says.
Under GDPR, that's a data breach waiting for a complaint. Under HIPAA, it's a violation. Under any audit, it's a finding.
The fix isn't "tell developers to be careful." Developers are already careful - until they're debugging a production incident at 2am and add a quick console.log(request.body). The fix is a masking layer that runs automatically, before any log hits storage.
This article is about building that layer in Node.js.
What PII Actually Looks Like in Logs
Before masking, you need to know what you're masking. PII in logs shows up in three forms:
Structured fields - JSON payloads where the key makes the value obvious:
{ "email": "alice@example.com", "password": "hunter2", "ssn": "123-45-6789" }
Embedded in strings - PII inside log messages:
User alice@example.com failed login from 192.168.1.1
Authorization: Bearer eyJhbGciOiJIUzI1NiJ9...
Nested or transformed - Base64-encoded, URL-encoded, or buried in stack traces:
Error processing request body: %7B%22email%22%3A%22alice%40example.com%22%7D
A good masking pipeline handles all three. Most tutorials only handle the first one.
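To see why the third form matters, decode the URL-encoded example from above. The email sits underneath, fully intact, which is why pattern matching on the raw string alone misses it:

```typescript
// The URL-encoded payload from the example above. Decoding reveals the
// email untouched; a regex run on the raw string never sees it.
const raw = 'Error processing request body: %7B%22email%22%3A%22alice%40example.com%22%7D'
const decoded = decodeURIComponent(raw)
console.log(decoded)
// prints: Error processing request body: {"email":"alice@example.com"}
```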
The Architecture: Mask at Ingestion, Not at Display
There are two schools of thought on when to mask:
- Mask at display - store everything, redact when showing logs in the UI
- Mask at ingestion - strip PII before it ever reaches storage
Mask at ingestion is the only defensible choice for compliance. If PII reaches your database, it's already a GDPR problem - even if you never display it. The data is there, it can be breached, and you own the liability.
The pipeline looks like this:
Application → Log event → [Masking layer] → Storage
                                ↑
                    This is where we operate
The masking layer runs synchronously, in-process, before any network call to your log storage. No PII leaves the machine.
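The shape is easy to sketch. This is a minimal, hypothetical wrapper (the `maskPII` stand-in here only redacts emails; the real pipeline is built step by step below) that demonstrates the invariant: the transport never receives an unmasked event.

```typescript
type Transport = (line: string) => void

// Stand-in masker for illustration only; the full pipeline replaces this.
const maskPII = (s: string): string =>
  s.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL]')

// Masking runs synchronously, in-process. Whatever the transport does
// (stdout, file, HTTP shipper), it only ever sees the masked line.
function createLogger(transport: Transport) {
  return {
    info(message: string): void {
      transport(maskPII(message))
    },
  }
}

const logger = createLogger((line) => console.log(line))
logger.info('User alice@example.com logged in')
// prints: User [EMAIL] logged in
```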
Building the Masking Layer
Step 1: Define your masking strategies
Before writing regex, decide what "masked" means for your use case. Three strategies cover most cases:
type MaskingStrategy = 'mask' | 'redact' | 'hash'
// mask: show partial value - useful for debugging (still recognizable, not storable)
// "alice@example.com" → "al***@***.com"
// redact: replace entirely - use when value has no debugging value
// "hunter2" → "[REDACTED]"
// hash: deterministic SHA-256 - use when you need to correlate without exposing
// "alice@example.com" → "sha256:2f3a4b..." (same input always produces same hash)
// ⚠️ Always set PII_HASH_SALT in your environment. Emails and SSNs have low entropy
// and are trivially reversible from unsalted hashes via rainbow tables.
Hashing is underused. It lets you answer "did this user appear in these logs?" without storing the actual email. Useful for audit trails and correlation.
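The correlation property is easy to demonstrate. A quick sketch (the helper name `hashPII` is made up here; it mirrors the hash strategy described above, including the salt):

```typescript
import { createHash } from 'crypto'

// Deterministic, salted hash of a PII value. Assumes PII_HASH_SALT is set
// in the environment; the empty-string fallback is for illustration only.
function hashPII(value: string): string {
  const salt = process.env.PII_HASH_SALT ?? ''
  return `sha256:${createHash('sha256').update(value + salt).digest('hex').slice(0, 16)}`
}

// Same input, same token: "did this user appear in these logs?" becomes a
// grep for the token, with no email ever stored.
console.log(hashPII('alice@example.com') === hashPII('alice@example.com')) // true
console.log(hashPII('alice@example.com') === hashPII('bob@example.com'))   // false
```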
Step 2: Pattern-based detection
import { createHash } from 'crypto'
const PII_PATTERNS: Array<{
name: string
pattern: RegExp
strategy: MaskingStrategy
}> = [
// Email addresses
{
name: 'email',
pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
strategy: 'mask',
},
// Credit card numbers (Format-valid patterns — prefix and length, not Luhn checksum)
{
name: 'credit_card',
pattern: /\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11})\b/g,
strategy: 'redact',
},
// US Social Security Numbers
{
name: 'ssn',
pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
strategy: 'redact',
},
// Bearer tokens / JWT
{
name: 'bearer_token',
pattern: /Bearer\s+[A-Za-z0-9\-_=]+\.[A-Za-z0-9\-_=]+\.?[A-Za-z0-9\-_.+/=]*/g,
strategy: 'redact',
},
// AWS access keys
{
name: 'aws_access_key',
pattern: /\b(AKIA|AIPA|ASIA)[A-Z0-9]{16}\b/g,
strategy: 'redact',
},
// IPv4 addresses (optional — some teams want these, some don't)
{
name: 'ipv4',
pattern: /\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/g,
strategy: 'mask',
},
// Phone numbers (US-style; adjust for your region. A looser class like
// [\d\s\-().]{10,15} also swallows dates and timestamps)
{
name: 'phone',
pattern: /\+?\d{0,3}[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b/g,
strategy: 'mask',
},
]
function applyStrategy(value: string, strategy: MaskingStrategy): string {
switch (strategy) {
case 'redact':
return '[REDACTED]'
case 'hash':
return `sha256:${createHash('sha256').update(value + (process.env.PII_HASH_SALT ?? '')).digest('hex').slice(0, 16)}`
case 'mask': {
if (value.includes('@')) {
// Email masking: show first 2 chars of local part and domain TLD
const [local, domain] = value.split('@')
const [, ...tlds] = domain.split('.')
return `${local.slice(0, 2)}***@***.${tlds.join('.')}`
}
// Generic masking: show first and last char, mask middle
if (value.length <= 4) return '****'
return `${value[0]}${'*'.repeat(value.length - 2)}${value[value.length - 1]}`
}
}
}
function maskString(input: string): string {
let result = input
for (const { pattern, strategy } of PII_PATTERNS) {
result = result.replace(pattern, (match) => applyStrategy(match, strategy))
}
return result
}
Step 3: Field-name detection
Pattern matching catches PII embedded in strings. But for structured JSON, matching on field names is faster and more reliable:
const SENSITIVE_FIELD_NAMES = new Set([
// Keys are compared after normalizing '-' and whitespace to '_' (see
// isFieldSensitive below), so only underscore spellings are needed here.
'password', 'passwd', 'secret', 'token', 'api_key', 'apikey',
'authorization', 'auth', 'credential', 'credentials',
'email', 'e_mail',
'ssn', 'social_security', 'national_id',
'credit_card', 'card_number', 'cvv', 'cvc',
'phone', 'phone_number', 'mobile',
'dob', 'date_of_birth', 'birthday',
'address', 'street_address', 'postal_code', 'zip_code',
'ip_address', 'ip', 'x_forwarded_for',
])
function isFieldSensitive(key: string): boolean {
const normalized = key.toLowerCase().replace(/[-_\s]/g, '_')
return SENSITIVE_FIELD_NAMES.has(normalized)
}
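The normalization step is what makes this robust against header-style spellings. A condensed, self-contained demo (three entries stand in for the full set):

```typescript
// Condensed copy of the check above: normalize separators, then look up.
const SENSITIVE = new Set(['api_key', 'x_forwarded_for', 'password'])

function isFieldSensitive(key: string): boolean {
  return SENSITIVE.has(key.toLowerCase().replace(/[-_\s]/g, '_'))
}

console.log(isFieldSensitive('API-Key'))          // true
console.log(isFieldSensitive('X-Forwarded-For'))  // true
console.log(isFieldSensitive('username'))         // false
```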
Step 4: Recursive object traversal
The masking function needs to traverse nested objects - request bodies aren't always flat:
export type LogValue = string | number | boolean | null | LogObject | LogValue[]
export type LogObject = { [key: string]: LogValue }
export function maskObject(obj: LogObject, depth = 0): LogObject {
// Prevent infinite recursion on circular references
if (depth > 10) return { '[max_depth_exceeded]': true }
const result: LogObject = {}
for (const [key, value] of Object.entries(obj)) {
if (isFieldSensitive(key)) {
// Field name match: redact or hash based on field type
// Note: this hardcodes the strategy per field type for brevity. In a production
// system, map field names to your central PII_PATTERNS configuration to keep
// strategies consistent across both field-name and pattern-based detection.
const strategy = key.toLowerCase().includes('email') ? 'hash' : 'redact'
result[key] = typeof value === 'string'
? applyStrategy(value, strategy)
: '[REDACTED]'
continue
}
if (typeof value === 'string') {
result[key] = maskString(value)
} else if (Array.isArray(value)) {
result[key] = value.map((item) =>
typeof item === 'object' && item !== null
? maskObject(item as LogObject, depth + 1)
: typeof item === 'string'
? maskString(item)
: item
)
} else if (typeof value === 'object' && value !== null) {
result[key] = maskObject(value as LogObject, depth + 1)
} else {
result[key] = value
}
}
return result
}
Step 5: The masking pipeline entry point
Wrap everything in a single function that handles both structured objects and raw strings:
export function maskPII(input: unknown): unknown {
if (typeof input === 'string') {
return maskString(input)
}
if (typeof input === 'object' && input !== null && !Array.isArray(input)) {
return maskObject(input as LogObject)
}
if (Array.isArray(input)) {
return input.map(maskPII)
}
return input
}
Integrating With Your Logger
With Pino (recommended for Node.js)
Pino supports redact paths natively, but it only handles known field paths. For dynamic detection, use a serializers hook:
import pino from 'pino'
import { maskPII } from './masking'
const logger = pino({
serializers: {
// Mask the entire request object
req: (req) => maskPII({
method: req.method,
url: req.url,
headers: req.headers,
body: req.body,
}),
// Mask arbitrary metadata
meta: (meta) => maskPII(meta),
},
})
// Usage
logger.info({ req, meta: { userId: user.email } }, 'Request received')
With Winston
import winston from 'winston'
import { maskPII } from './masking'
const maskingTransform = winston.format((info) => {
// maskPII returns a fresh object, and Object.entries drops the Symbol keys
// winston relies on (level/message). Merge the masked values back into
// `info` instead of returning the bare copy.
return Object.assign(info, maskPII(info) as object)
})
const logger = winston.createLogger({
format: winston.format.combine(
maskingTransform(),
winston.format.json()
),
transports: [new winston.transports.Console()],
})
With a raw HTTP ingest endpoint
If you're building an ingest endpoint that receives logs from external sources (SDKs, collectors), apply masking server-side before writing to storage:
import Fastify from 'fastify'
import { maskObject, type LogObject } from './masking'
const app = Fastify()
app.post('/api/v1/ingest', async (request, reply) => {
const { logs } = request.body as { logs: LogObject[] }
const maskedLogs = logs.map((log) => ({
...maskObject(log),
ingested_at: new Date().toISOString(),
}))
// `db` stands in for your query builder; the insertInto/values/execute
// chain here follows Kysely's API
await db.insertInto('logs').values(maskedLogs).execute()
return reply.send({ accepted: maskedLogs.length })
})
The Edge Cases Nobody Talks About
URL-encoded and Base64-encoded PII
Attackers (and frameworks) encode data. Your masking needs to handle it:
function maskStringWithDecoding(input: string): string {
let result = input
// Try URL decode and re-mask. Note: re-encoding afterwards also encodes
// characters that were never encoded (spaces, braces); acceptable for logs,
// where safety matters more than byte-for-byte fidelity.
try {
const decoded = decodeURIComponent(result)
if (decoded !== result) {
result = encodeURIComponent(maskString(decoded))
}
} catch {}
// Try Base64 decode and re-mask
const base64Pattern = /\b[A-Za-z0-9+/]{20,}={0,2}\b/g
result = result.replace(base64Pattern, (match) => {
try {
const decoded = Buffer.from(match, 'base64').toString('utf8')
// Only re-encode if it looks like it decoded to something meaningful
if (/^[\x20-\x7E]+$/.test(decoded)) {
const masked = maskString(decoded)
if (masked !== decoded) {
return Buffer.from(masked).toString('base64')
}
}
} catch {}
return match
})
return maskString(result)
}
Stack traces
Stack traces can contain PII in exception messages:
Error: User not found for email alice@example.com
at UserService.findByEmail (user.service.ts:42)
function maskStackTrace(stack: string): string {
return stack
.split('\n')
.map((line, index) => {
// Mask the error message line (first line), leave stack frames alone
if (index === 0) return maskString(line)
return line
})
.join('\n')
}
Performance considerations
The masking pipeline runs on every log event. Profile it:
// Simple benchmark
const iterations = 10_000
const sampleLog = {
message: 'User alice@example.com logged in from 192.168.1.1',
email: 'alice@example.com',
headers: { authorization: 'Bearer eyJhbGciOiJIUzI1NiJ9.test.test' },
}
const start = performance.now()
for (let i = 0; i < iterations; i++) {
maskObject(sampleLog)
}
const elapsed = performance.now() - start
console.log(`${iterations} iterations in ${elapsed.toFixed(2)}ms (${(elapsed / iterations).toFixed(3)}ms each)`)
On a modern machine, a well-implemented masking pipeline takes 0.05-0.2ms per log event. At 1,000 logs/second, that's 50-200ms of CPU per second — acceptable for most applications, but worth measuring for high-throughput services.
If performance is a concern, compile your regex patterns once outside the function — the compilation cost is paid only once, not on every log event:
// Bad: regex compiled on every call
function maskEmail(str: string) {
return str.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '***')
}
// Good: compiled once, reused on every call
// Note: String.prototype.replace() manages lastIndex internally — no manual reset needed
const EMAIL_PATTERN = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g
function maskEmail(str: string) {
return str.replace(EMAIL_PATTERN, '***')
}
Testing Your Masking Pipeline
A masking layer without tests is worse than no masking layer — it gives you false confidence.
import { describe, it, expect } from 'vitest'
import { maskPII, maskObject } from './masking'
describe('PII masking', () => {
it('masks email addresses in strings', () => {
const result = maskPII('User alice@example.com logged in') as string
expect(result).not.toContain('alice@example.com')
expect(result).toContain('@') // partial masking, not full redaction
})
it('redacts password fields', () => {
const result = maskObject({ password: 'hunter2', username: 'alice' })
expect(result.password).toBe('[REDACTED]')
expect(result.username).toBe('alice') // non-sensitive fields unchanged
})
it('handles nested objects', () => {
const result = maskObject({
user: { email: 'alice@example.com', preferences: { theme: 'dark' } }
})
expect((result.user as any).email).not.toBe('alice@example.com')
expect((result.user as any).preferences.theme).toBe('dark')
})
it('redacts bearer tokens', () => {
const result = maskPII('Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.test.sig') as string
expect(result).toContain('[REDACTED]')
expect(result).not.toContain('eyJhbGciOiJIUzI1NiJ9')
})
it('does not modify non-PII strings', () => {
const input = 'Server started on port 3000'
expect(maskPII(input)).toBe(input)
})
it('handles null and undefined gracefully', () => {
expect(() => maskPII(null)).not.toThrow()
expect(() => maskPII(undefined)).not.toThrow()
})
})
The Masking Preview Problem
One practical challenge: developers need to test whether their masking rules are working without shipping to production. Build a simple preview endpoint (dev/staging only) that runs the masking pipeline and returns the diff:
if (process.env.NODE_ENV !== 'production') {
app.post('/debug/mask-preview', async (request, reply) => {
const input = request.body
const masked = maskPII(input)
return reply.send({
original: input,
masked,
changed: JSON.stringify(input) !== JSON.stringify(masked),
})
})
}
Call it with a sample log payload and immediately see what gets masked. Faster than print-debugging your way through regex patterns.
Summary
PII masking in logs is not a nice-to-have. It's a compliance requirement, and more importantly, it's the right thing to do with your users' data.
The pattern is straightforward:
- Mask at ingestion, not at display
- Combine field-name detection (fast, reliable for structured data) with pattern matching (catches PII in strings)
- Choose the right strategy per field type: mask for emails, redact for passwords/tokens, hash for correlation keys
- Handle edge cases: URL encoding, Base64, stack traces
- Test it like production code, because it is production code
The implementation above is about 150 lines of TypeScript. There's no reason every Node.js application logging to CloudWatch, Datadog, or anywhere else shouldn't have something equivalent running before the first log event leaves the process.