The Problem: Datadog Bills and GDPR Nightmares
If you are running applications in Kubernetes and shipping your logs to Datadog, you have probably faced two major headaches:
- Cost: Datadog charges you based on the volume of logs ingested and indexed. Every megabyte counts. Moreover, while Datadog offers a built-in Sensitive Data Scanner, it is a premium feature billed separately on top of your base log costs. By using a free, open-source sidecar, you can completely bypass expensive vendor-side premium scrubbers.
- Compliance: Sending Personally Identifiable Information (PII) such as emails or credit card numbers, or secrets such as API keys, to a third-party logging service can violate GDPR and other privacy laws.
The more comprehensive your logs are for debugging, the higher your Datadog bill gets, and the bigger your risk of a privacy breach becomes.
The Standard Approach (And Why It Hurts)
The standard way to solve this is to configure the Datadog Agent to mask or scrub PII before it leaves your cluster.
However, this approach has significant drawbacks:
- Complexity: Setting up custom parsing rules, regexes, and pipelines in the Datadog Agent configuration can be tedious and difficult to maintain.
- High CPU Usage: Running heavy regex operations over massive volumes of text inside your log shipper consumes a lot of CPU resources. This can slow down your node's performance or require larger, more expensive compute instances.
- Whack-a-Mole: You are constantly updating rules as your application output changes, which takes time and effort.
The Solution: PII-Shield as a Lightweight Sidecar
Instead of burdening your cluster-wide log shipper with heavy processing, you can mask PII before it even leaves the pod.
PII-Shield is a lightning-fast, zero-dependency tool written in Go. It runs as a sidecar container right next to your application, intercepts the logs in real time, scrubs sensitive data using entropy detection and deterministic hashing, and then passes the clean logs forward.
By the time the Datadog Agent picks up the logs from the Kubernetes node, they are already completely sanitized.
Ready-to-Use Pod Configuration
Here is a practical example of how to inject PII-Shield as a sidecar into your Kubernetes Pod. We use a shared volume so PII-Shield can read the application's output and write safe logs to its own standard output.
apiVersion: v1
kind: Pod
metadata:
  name: my-app-with-pii-shield
spec:
  containers:
    - name: my-app
      image: my-app-image:v1.0.0
      # Instead of writing directly to stdout, the app writes to a shared file or pipe
      command: ["/bin/sh", "-c"]
      args: ["./my-app-binary > /shared-logs/app.log"]
      volumeMounts:
        - name: shared-logs
          mountPath: /shared-logs
    - name: pii-shield-sidecar
      image: thelisdeep/pii-shield:v1.2.3
      env:
        - name: PII_SALT
          value: "your-secure-random-salt"
      # PII-Shield reads the file in real-time, scrubs the data, and outputs to stdout
      command: ["/bin/sh", "-c"]
      args: ["tail -n +1 -f /shared-logs/app.log | pii-shield"]
      volumeMounts:
        - name: shared-logs
          mountPath: /shared-logs
  volumes:
    - name: shared-logs
      emptyDir: {}
Note: The thelisdeep/pii-shield image is multi-arch (supporting both amd64 and arm64), which is perfect if you are saving costs by running on ARM processors like AWS Graviton.
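The manifest above sets PII_SALT as a literal value, which is fine for a demo but leaves the salt readable by anyone who can view the Pod spec. Below is a minimal sketch of sourcing it from a Kubernetes Secret instead. The Secret name pii-shield-salt and the key salt are hypothetical names chosen for illustration, not something PII-Shield requires, and the Secret is assumed to already exist in the same namespace as the Pod.

```yaml
# Fragment of the sidecar container spec; only the env section differs
# from the manifest above.
- name: pii-shield-sidecar
  image: thelisdeep/pii-shield:v1.2.3
  env:
    - name: PII_SALT
      valueFrom:
        secretKeyRef:
          name: pii-shield-salt   # hypothetical Secret name
          key: salt               # hypothetical key inside that Secret
```

Keeping the salt stable across deployments matters here: if it changes, the same input will presumably hash to a different token, and you lose the ability to correlate old and new log lines.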
How does Datadog know what to read?
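Because the main application now redirects its stdout to a file, that output no longer appears in its own container log stream. The Datadog Agent, which collects the stdout and stderr of containers it discovers via Autodiscovery, therefore picks up only the scrubbed stream from the pii-shield-sidecar, so there are no duplicate logs. One caveat: the > redirect in the example captures stdout only. Anything the application writes to stderr still reaches the Agent unscrubbed unless you redirect it as well, for example by appending 2>&1 to the command.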
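One side effect worth planning for: the logs now originate from the sidecar container rather than from my-app, so the source and service values Datadog assigns by default will typically reflect the sidecar's image rather than your application. A sketch of pinning them with Datadog's Autodiscovery log annotation follows. The values "my-app" are assumptions for illustration; use whatever source and service names your pipelines and dashboards expect.

```yaml
# Fragment of the Pod metadata; the annotation key follows the pattern
# ad.datadoghq.com/<container name>.logs
metadata:
  name: my-app-with-pii-shield
  annotations:
    ad.datadoghq.com/pii-shield-sidecar.logs: '[{"source": "my-app", "service": "my-app"}]'
```

The annotation targets the sidecar by container name, so it needs to be updated if you rename pii-shield-sidecar in the container spec.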
Why this is better:
- Zero Configuration for Log Shippers: Datadog just receives clean logs. There are no complex pipeline rules to manage.
- Bypass Premium Vendor Fees: Datadog's built-in Sensitive Data Scanner is a premium feature billed on top of your regular log volumes. By using a free, open-source sidecar, you completely eliminate the need for expensive vendor-side scrubbing.
- Predictable Performance: PII-Shield uses zero-allocation JSON parsing and consumes only about 30Mi of memory. For a sidecar that runs in every pod across your cluster, a footprint this small is critical.
- Easy Debugging: With deterministic hashing, user@email.com becomes something like [HIDDEN:a1b2c3]. You can still trace that same user across your Datadog logs for debugging without ever knowing their real email.
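Since the sidecar runs in every pod, it can help to state its footprint explicitly so the scheduler accounts for it. Here is a sketch of resource requests and limits sized around the ~30Mi figure mentioned above. The specific numbers are assumptions, not measured values, and should be tuned against your own log volume.

```yaml
# Fragment of the sidecar container spec with illustrative resource settings.
- name: pii-shield-sidecar
  image: thelisdeep/pii-shield:v1.2.3
  resources:
    requests:
      memory: "32Mi"   # assumed: slightly above the ~30Mi quoted footprint
      cpu: "25m"       # assumed: placeholder, measure under real load
    limits:
      memory: "64Mi"   # assumed: headroom for bursts in log volume
```

A memory limit set too close to the real footprint risks the sidecar being OOM-killed during log bursts, which would stall log delivery for that pod, so erring on the generous side is usually the safer starting point.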
By putting the shield right where the data is generated, you protect your users' privacy and keep your observability bills in check.
Ready to secure your Kubernetes logs?
Check out the PII-Shield repository on GitHub, try out the Helm chart, and if you find it useful, consider dropping a star!