DEV Community

Bijay Singh deo
Masking Sensitive Data in CloudWatch Logs for APIs (and keeping your secrets safe!)

😬 The Problem

So, picture this: you’ve built a shiny API that people love. Life is good, until one day you peek into your CloudWatch Logs and — BAM 💥 — staring right back at you are… user credentials. Passwords, tokens, maybe even PAN numbers. Not exactly the kind of surprise you want in your logs, especially when regulators (and your security team) are watching.

As much as I love CloudWatch for debugging and monitoring, nobody wants to see their production logs looking like an open diary of sensitive user data. In fintech, that’s a big no-no (think PCI, GDPR, SOC2 nightmares).

🎯 The Mission

My mission was clear: 👉 Mask sensitive data (like passwords, tokens, card details) in CloudWatch Logs — while still keeping logs useful for troubleshooting.
Bonus challenge: make the solution scalable, beginner-friendly, and fun.

🛠️ The Fix

1. First, a refresher

What are CloudWatch Logs?
CloudWatch Logs is like your app’s black box recorder — every API call, error, and debug line can land here.

  • You can stream logs from API Gateway, Lambda, ECS, EC2, you name it.
  • You can search them in CloudWatch Logs Insights or ship them to OpenSearch for SQL-style queries.
  • You can even set alarms when things go sideways.
But — and it’s a big BUT — by default, CloudWatch logs whatever you give it. If you send passwords, it happily stores passwords. 🙈

2. My “Aha!” moment

CloudWatch Data Protection Policies
Instead of writing messy regex scripts or re-engineering logging middleware, AWS now gives us Data Protection Policies for log groups.

Think of it like a secret filter on your logs:

  • You tell CloudWatch which sensitive data patterns to look for.
  • It uses managed data identifiers (e.g., credit cards, AWS keys, emails, etc.).
  • When a match is found → it’s automatically masked (like ****).
  • Only users with the special logs:Unmask permission can see raw values.
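By default, everyone sees the masked ****. To let a specific role view the original values, you attach an IAM policy granting logs:Unmask — a minimal sketch, where the account ID and log group name are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUnmaskOnApiLogs",
      "Effect": "Allow",
      "Action": "logs:Unmask",
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/apigateway/prod-api:*"
    }
  ]
}
```

With this attached, a user can reveal raw values (for example via the unmask option in Logs Insights); everyone else keeps seeing ****.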

When you create a data protection policy in the AWS Console, you’ll go through a few options:

Managed Data Identifiers

  • AWS provides a long list of preconfigured data types (AWS keys, financial numbers, PII, PHI, etc.) you can select via checkboxes.
  • You can also define custom data identifiers using regex if you need something AWS doesn’t provide.
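As a sketch of the custom route: the policy document accepts a Configuration block with your own regex-based identifiers, which Statement entries can then reference by name. The token format below is made up for illustration:

```json
{
  "Name": "custom-identifier-policy",
  "Version": "2021-06-01",
  "Configuration": {
    "CustomDataIdentifier": [
      { "Name": "InternalSessionToken", "Regex": "sess-[0-9a-f]{32}" }
    ]
  },
  "Statement": [
    {
      "Sid": "redact-policy",
      "DataIdentifier": ["InternalSessionToken"],
      "Operation": { "Deidentify": { "MaskConfig": {} } }
    }
  ]
}
```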

Audit vs Mask (Deidentify)

  • Audit: Detect and record sensitive data findings without altering the logs. Great for discovery — you can see where secrets show up before deciding to mask.
  • Deidentify (Mask): Actually redacts those values, so they show up as **** everywhere logs are consumed. This ensures secrets never persist unmasked in CloudWatch.

Findings Destination

  • If you choose Audit, you can direct audit findings to another log group, S3 bucket, or Kinesis Firehose. Useful for compliance reports.
  • If you don’t specify, it can be left empty ({}), which means no special destination.
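For example, to route audit findings to a dedicated log group instead of leaving the destination empty, the Audit operation would look like this (the log group name is a placeholder):

```json
"Operation": {
  "Audit": {
    "FindingsDestination": {
      "CloudWatchLogs": {
        "LogGroup": "data-protection-audit-findings"
      }
    }
  }
}
```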

Apply Policy

  • After you save the policy, only new logs ingested are scanned and masked. Old logs are not retroactively changed.
  • Masking only works on Standard log class groups.
  • Also, the DataIdentifier arrays in Audit and Deidentify statements must match exactly; otherwise AWS rejects the policy.

Here’s a sample policy combining both audit and mask:

```json
{
  "Name": "data-protection-policy",
  "Description": "",
  "Version": "2021-06-01",
  "Statement": [
    {
      "Sid": "audit-policy",
      "DataIdentifier": [
        "arn:aws:dataprotection::aws:data-identifier/AwsSecretKey",
        "arn:aws:dataprotection::aws:data-identifier/BankAccountNumber-FR"
      ],
      "Operation": {
        "Audit": {
          "FindingsDestination": {}
        }
      }
    },
    {
      "Sid": "redact-policy",
      "DataIdentifier": [
        "arn:aws:dataprotection::aws:data-identifier/AwsSecretKey",
        "arn:aws:dataprotection::aws:data-identifier/BankAccountNumber-FR"
      ],
      "Operation": {
        "Deidentify": {
          "MaskConfig": {}
        }
      }
    }
  ]
}
```

In practice, I like to start with Audit only so I can measure where sensitive data actually shows up. Once I’m confident, I enable Deidentify to make sure those values are masked everywhere.
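If you'd rather manage this from code than click through the Console, the same policy can be applied with boto3 — a sketch, where the log group name is a placeholder. It builds both statements from a single identifier list, which automatically satisfies the rule that the Audit and Deidentify DataIdentifier arrays must match exactly:

```python
import json

# The two managed identifiers from the sample policy; swap in whatever you need.
IDENTIFIERS = [
    "arn:aws:dataprotection::aws:data-identifier/AwsSecretKey",
    "arn:aws:dataprotection::aws:data-identifier/BankAccountNumber-FR",
]


def build_policy(identifiers):
    """Build an audit + mask data protection policy document.

    Both statements share one identifier list, so the Audit and Deidentify
    DataIdentifier arrays always match exactly, as AWS requires.
    """
    return {
        "Name": "data-protection-policy",
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit-policy",
                "DataIdentifier": list(identifiers),
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "redact-policy",
                "DataIdentifier": list(identifiers),
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }


def apply_policy(log_group_name, policy):
    # boto3 imported here so the policy-building logic stays testable offline.
    import boto3

    logs = boto3.client("logs")
    logs.put_data_protection_policy(
        logGroupIdentifier=log_group_name,
        policyDocument=json.dumps(policy),
    )


if __name__ == "__main__":
    apply_policy("/apigateway/prod-api", build_policy(IDENTIFIERS))
```

Dropping the redact statement from `build_policy` gives you the audit-only variant for the discovery phase.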

Boom 💥 — no more plaintext credentials in logs!

3. Using the logs after masking

Once logs are masked, you can still consume them as usual:

  • CloudWatch Logs Insights → run queries like:

```
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
```
  • OpenSearch SQL → even fancier! You can connect your log group to OpenSearch and query like:

```sql
SELECT
  requestId,
  httpMethod,
  status,
  message
FROM apigateway_logs
WHERE status >= 500
ORDER BY timestamp DESC
LIMIT 10;
```

Masked fields still show up as ****, which means you keep debugging power without exposing secrets.

✅ The Outcome

  • No more sensitive data (passwords, tokens, PAN) leaking into logs.
  • Security and compliance team = happy campers.
  • Developers still get useful logs for troubleshooting.
  • And me? I sleep better at night knowing I won’t wake up to a compliance ticket.

😂 Fun Takeaway

Think of CloudWatch Data Protection like an automatic censor beep on live TV. Your logs may still be dramatic, but at least they’re PG-13 instead of Rated R for Regulatory Nightmares.

So if you’re running a fintech API (or really any API), do yourself a favor — let CloudWatch keep the logs, but mask the secrets. After all, what happens in production logs should stay in production logs... safely masked. 🥺

What do you think? Have you tried CloudWatch Data Protection yet, or do you still rely on custom masking? I’d love to hear how you’re handling log hygiene in your stack!
