Durrell Gemuh

Posted on Jun 8

Building LogSentry: A Serverless AWS Pipeline That Detects Secrets Leaked into Application Logs in Real-Time

#aws #security #serverless #devops

A serverless AWS pipeline that auto-monitors every CloudWatch log group, scans for leaked credentials in real-time, sends rate-limited alerts, and ships with a triage dashboard — zero config after deploy.

Every production system I've operated has had the same recurring incident: a developer accidentally logs a password, an API key ends up in CloudWatch, or a database connection string appears in a debug statement that was never removed.

The consequences are serious. A leaked AWS key in logs can be scraped by attackers in minutes. A database URL with credentials gives direct access. A Stripe key means money.

I built LogSentry to catch these leaks the moment they happen — before they become breaches. And critically, it requires zero configuration after the initial deploy. New services are monitored automatically.

The Problem

In a typical microservices environment running 20+ services, each generating thousands of log lines per minute, manual review is impossible. Teams discover leaked secrets in one of three ways:

A security audit (weeks or months later)
An incident (the secret was already exploited)
A colleague happens to notice during debugging

We need detection that's real-time, automated, comprehensive, and low-noise.

Architecture

Services → CloudWatch → [Auto-Subscribe] → Kinesis → Lambda Scanner → DynamoDB + SNS

The full flow:

Services generate logs → CloudWatch Log Groups
Auto-subscribe (EventBridge + 5-min scheduled scan) adds subscription filters to every log group — zero manual setup
Kinesis Data Stream buffers log events (backpressure, replay, scaling)
Lambda Scanner processes events: 12 regex patterns + Shannon entropy analysis
DynamoDB stores deduplicated findings (with TTL auto-expiry at 90 days)
SNS sends rate-limited alerts to Slack/email/PagerDuty
CloudWatch custom metrics track detection rates per severity
SQS Dead Letter Queue preserves failed events for 14 days
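
To make the Kinesis to Lambda step concrete, here is a minimal sketch of how a scanner could unpack incoming records. It relies on two documented behaviors: Lambda delivers Kinesis payloads base64-encoded, and CloudWatch Logs gzip-compresses subscription data. The function names and the returned dictionary shape are assumptions for illustration, not LogSentry's actual code.

```python
import base64
import gzip
import json


def decode_kinesis_record(record: dict) -> list[dict]:
    """Return the log events carried by one Kinesis record in a Lambda event."""
    # Lambda hands over the Kinesis payload base64-encoded.
    raw = base64.b64decode(record["kinesis"]["data"])
    # CloudWatch Logs gzip-compresses subscription data before delivery.
    payload = json.loads(gzip.decompress(raw))
    # CONTROL_MESSAGE records are connectivity checks with no application logs.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [
        {"log_group": payload["logGroup"], "message": event["message"]}
        for event in payload["logEvents"]
    ]


def handler(event: dict, context: object) -> dict:
    """Hypothetical entry point: flatten every record into a list of log events."""
    log_events = [e for r in event["Records"] for e in decode_kinesis_record(r)]
    # A real scanner would run its pattern matching over log_events here.
    return {"events_received": len(log_events)}
```

Each decoded event keeps its log group name, which is what lets a finding be attributed to a service later on.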

Zero-Config Auto-Subscribe

The key differentiator is automatic monitoring. Instead of manually attaching subscription filters to each log group, LogSentry handles it:

Trigger	Latency	How
New log group created	Instant	EventBridge rule (requires CloudTrail)
Scheduled fallback	≤ 5 minutes	Lambda scans for unsubscribed groups
First deploy	Immediate	Terraform local-exec subscribes existing groups

Excluded prefixes (configurable): /aws/lambda/logsentry, /aws/cloudtrail, /aws/rds — to avoid scanning its own logs or known-noisy sources.

After terraform apply, every existing and future log group is monitored. Deploy a new service tomorrow — it's covered automatically.
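
The scheduled fallback could look roughly like the sketch below. The boto3 calls (`describe_log_groups`, `describe_subscription_filters`, `put_subscription_filter`) are real CloudWatch Logs APIs, but the function names, the `logsentry` filter name, and the way the ARNs are passed in are assumptions made for this example.

```python
EXCLUDED_PREFIXES = ("/aws/lambda/logsentry", "/aws/cloudtrail", "/aws/rds")


def is_excluded(log_group: str, excluded: tuple = EXCLUDED_PREFIXES) -> bool:
    """True when the log group name starts with any excluded prefix."""
    return log_group.startswith(excluded)


def subscribe_missing(stream_arn: str, role_arn: str, filter_name: str = "logsentry") -> list:
    """Attach a subscription filter to every eligible, unsubscribed log group."""
    import boto3  # lazy import so is_excluded stays testable without AWS

    logs = boto3.client("logs")
    added = []
    for page in logs.get_paginator("describe_log_groups").paginate():
        for group in page["logGroups"]:
            name = group["logGroupName"]
            if is_excluded(name):
                continue
            existing = logs.describe_subscription_filters(logGroupName=name)
            if any(f["filterName"] == filter_name for f in existing["subscriptionFilters"]):
                continue  # already monitored
            logs.put_subscription_filter(
                logGroupName=name,
                filterName=filter_name,
                filterPattern="",  # an empty pattern forwards every log event
                destinationArn=stream_arn,
                roleArn=role_arn,
            )
            added.append(name)
    return added
```

Checking the prefix before calling AWS keeps the scanner's own log group out of the pipeline, which would otherwise feed its output back into itself.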

The Detection Engine

The scanner applies 12 regex patterns. The table below summarizes the secret types they cover:

Pattern	What It Catches	Severity
AWS Access Key	`AKIA` + 16 chars	Critical
Password in logs	`password=`, `pwd:` with value	Critical
Database URL	`postgres://user:pass@host`	Critical
Private Key	`-----BEGIN RSA PRIVATE KEY-----`	Critical
Stripe Key	`sk_live_`, `pk_live_` prefixes	Critical
GitHub Token	`ghp_`, `gho_`, `ghs_` prefixes	Critical
JWT Token	`eyJ` base64 pattern	High
Bearer Token	`Bearer` + 20+ chars	High
Slack Token	`xoxb-`, `xoxp-` patterns	High
Generic API Key	`api_key=`, `api_secret=`	High
Generic Secret	`secret=`, `token=`	Medium
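
As an illustration of how a pattern table like this can drive a scanner, here is a minimal sketch covering three of the rows. The regular expressions are assumptions written for this example and are not taken from LogSentry's source; real patterns would need tuning against your own logs.

```python
import re

# Illustrative subset only; each expression is an assumption for this sketch.
PATTERNS = [
    ("AWS Access Key", "critical", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("Database URL", "critical",
     re.compile(r"\bpostgres(?:ql)?://[^\s:/@]+:[^\s@]+@[^\s/]+")),
    ("Generic Secret", "medium",
     re.compile(r"\b(?:secret|token)\s*[=:]\s*\S+", re.IGNORECASE)),
]


def scan_line(line: str) -> list[dict]:
    """Return one finding per pattern match in a single log line."""
    findings = []
    for name, severity, regex in PATTERNS:
        for match in regex.finditer(line):
            findings.append(
                {"pattern": name, "severity": severity, "match": match.group(0)}
            )
    return findings


line = "retrying with AKIAIOSFODNN7EXAMPLE on postgres://app:hunter2@db.internal:5432/orders"
for finding in scan_line(line):
    print(finding["severity"], finding["pattern"])
```

Keeping name, severity, and compiled regex together in one list means adding a pattern is a one-line change, and a single line can yield several findings.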

Reducing False Positives with Entropy

Raw regex matching produces too much noise. The string password=loading matches the pattern but isn't a credential.

LogSentry uses Shannon entropy analysis as a secondary filter. Real secrets tend to look random, so a matched value is kept only if its entropy exceeds 3.5 bits per character. Dictionary words and common placeholder values score lower and are filtered out.

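import math

def calculate_entropy(data: str) -> float:
    """Return the Shannon entropy of data in bits per character."""
    if not data:
        return 0.0
    entropy = 0.0
    # Sum -p * log2(p) over the frequency of each distinct character.
    for x in set(data):
        p_x = data.count(x) / len(data)
        entropy -= p_x * math.log2(p_x)
    return entropy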

This noticeably cuts false positives on generic patterns like secret= and token=, where the value is what tells a real credential apart from a placeholder.
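
To show the filter in action, the sketch below applies the 3.5 bits per character threshold to two values. The `looks_like_secret` helper and the sample strings are assumptions for illustration; only the threshold comes from the text above.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # bits per character, the cutoff quoted above


def shannon_entropy(value: str) -> float:
    """Shannon entropy of value in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(value).values()
    )


def looks_like_secret(value: str, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Hypothetical secondary filter applied after a regex match."""
    return shannon_entropy(value) > threshold


print(round(shannon_entropy("loading"), 2))           # 2.81, filtered out
print(round(shannon_entropy("x9Kf2LmQ7vRt4ZpW"), 2))  # 4.0, kept
```

One property worth knowing: entropy per character can never exceed log2 of the string length, so any value shorter than 12 characters cannot pass a 3.5 threshold. Short secrets therefore depend on the specific, non-generic patterns to be caught.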

Rate-Limited Alerting

A burst of 1,000 identical leaked secrets shouldn't generate 1,000 alerts. LogSentry handles this at two levels:

Deduplication — DynamoDB conditional writes (attribute_not_exists(finding_id)). Same secret = same finding ID = one alert.
Rate limiting — Maximum 10 alerts per Lambda invocation (configurable). Prevents notification storms during mass-logging incidents.
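
The two levels can be sketched as follows. The `attribute_not_exists(finding_id)` condition and the limit of 10 come from the description above; how the finding ID is derived (a SHA-256 over log group, pattern, and secret) and all function names are assumptions for this example.

```python
import hashlib


def finding_id(log_group: str, pattern: str, secret: str) -> str:
    """Stable ID: the same secret in the same service always hashes identically."""
    key = "\x00".join((log_group, pattern, secret))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def save_if_new(table, item: dict) -> bool:
    """Conditional write; returns False when the finding already exists."""
    from botocore.exceptions import ClientError  # lazy import, needs boto3

    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(finding_id)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate: already stored and alerted on
        raise


def pick_alerts(new_findings: list, max_alerts: int = 10) -> tuple:
    """Return the findings to alert on and how many the cap suppressed."""
    suppressed = max(0, len(new_findings) - max_alerts)
    return new_findings[:max_alerts], suppressed
```

Because the conditional write happens first, only findings that are new reach the rate limiter, so the cap is spent on distinct secrets rather than repeats.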

When a new finding is detected:

🚨 LogSentry Alert — CRITICAL

Pattern: AWS Access Key ID detected
Service: /app/payment-service
Value: AKIA************MPLE
Time: 2026-06-08T14:23:01Z
Environment: production
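
The masked value in the alert keeps the first and last four characters visible. A minimal sketch of such a masking helper follows; the function name and the rule for short values are assumptions, chosen so an alert never repeats the full secret.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Hide the middle of a secret, keeping `visible` characters at each end."""
    if len(value) <= visible * 2:
        # Too short to show anything without revealing most of it.
        return "*" * len(value)
    hidden = len(value) - visible * 2
    return value[:visible] + "*" * hidden + value[-visible:]


print(mask_secret("AKIAIOSFODNN7EXAMPLE"))  # AKIA************MPLE
```

The example key is the placeholder AWS uses in its own documentation, not a real credential.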

The Dashboard

I built a Flask dashboard that works in two modes:

Demo mode (python app.py) — uses sample data, no AWS credentials needed. Perfect for showcasing to a team.
Live mode (LOGSENTRY_MODE=live python app.py) — reads directly from DynamoDB.
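
One way to implement the mode switch is to resolve the data source from the `LOGSENTRY_MODE` environment variable, as in the sketch below. The sample record, the `logsentry-findings` table name, and the function names are hypothetical; only the variable name and its `live` value come from the commands above.

```python
import os

# Hypothetical sample data for demo mode.
SAMPLE_FINDINGS = [
    {"pattern": "AWS Access Key", "severity": "critical",
     "service": "/app/payment-service", "status": "open"},
]


def current_mode(env=os.environ) -> str:
    """Default to demo unless LOGSENTRY_MODE is explicitly set to live."""
    return "live" if env.get("LOGSENTRY_MODE", "demo").lower() == "live" else "demo"


def load_findings(env=os.environ, table_name: str = "logsentry-findings") -> list:
    """Return sample data in demo mode, DynamoDB items in live mode."""
    if current_mode(env) == "demo":
        return list(SAMPLE_FINDINGS)
    import boto3  # only needed in live mode, so demo runs without AWS

    table = boto3.resource("dynamodb").Table(table_name)
    # A single scan call returns at most one page; pagination is omitted here.
    return table.scan()["Items"]
```

Importing boto3 inside the live branch is what lets demo mode run on a machine with no AWS credentials or SDK installed.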

Dashboard features:

Stats overview — total findings, broken down by severity and status
Findings list — filter by severity, status, service, pattern
Live scanner — paste any log line and instantly see what patterns match
Resolve workflow — one-click to mark findings as remediated
Service breakdown — which services are leaking the most

The live scanner is particularly useful for developers: paste a log line from your app and verify it doesn't contain detectable secrets before deploying.

Auto-Expiring Findings

Resolved findings don't need to live forever. Each resolved item carries an expiry timestamp set 90 days out (configurable), and DynamoDB TTL deletes it in the background once that time has passed. This keeps the table clean and costs low without manual maintenance.
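
DynamoDB TTL works off a Number attribute holding an expiry time in epoch seconds, so the resolve action has to write one. A minimal sketch, assuming a hypothetical `expires_at` attribute and a table keyed only on `finding_id`:

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(resolved_at: datetime, retention_days: int = 90) -> int:
    """Epoch seconds after which DynamoDB may delete the item."""
    return int((resolved_at + timedelta(days=retention_days)).timestamp())


def mark_resolved(table, finding_id: str, now: datetime) -> None:
    """Set the status and the TTL attribute in one update."""
    table.update_item(
        Key={"finding_id": finding_id},  # assumes finding_id is the full key
        UpdateExpression="SET #s = :s, expires_at = :t",
        # Alias the attribute name to avoid clashing with reserved words.
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":s": "resolved", ":t": ttl_epoch(now)},
    )


resolved = datetime(2026, 6, 8, 14, 23, 1, tzinfo=timezone.utc)
print(ttl_epoch(resolved) - int(resolved.timestamp()))  # 7776000 seconds = 90 days
```

TTL deletion runs as a background process, so an item can linger for a while after its expiry time. Queries that must exclude expired items should filter on the attribute as well.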

Infrastructure as Code

Everything is provisioned with a single terraform apply:

Kinesis Data Stream (1 shard dev, 2 shards prod)
Scanner Lambda (zip deploy, no Docker needed)
Auto-subscribe Lambda + EventBridge rules + scheduled trigger
DynamoDB table (GSI on severity, TTL, point-in-time recovery)
SNS topic with email subscription
SQS Dead Letter Queue
CloudWatch alarms (error rate, critical findings)
IAM roles (least-privilege, scoped per function)
Subscription filters on all existing log groups

No manual steps. No clicking in the console.

CI/CD Pipeline

Push-to-deploy via GitHub Actions (with GitLab CI equivalent):

Push → pytest (20 tests) → Trivy scan → Zip + deploy Lambda

The Terraform infrastructure has its own pipeline:

Push → terraform fmt → validate → tfsec → plan → apply (manual gate for prod)

Lambda is deployed as a zip file — no Docker build step, no ECR, no image management. Keeps CI fast (under 2 minutes).

Testing

20 unit tests covering detection of each pattern type, false positive rejection, multiple findings in one line, entropy calculation, secret masking, deduplication logic, and finding structure validation.

All tests run in under a second with zero AWS dependencies — the scanner logic is pure Python with lazy-loaded boto3.

Key Takeaways

Zero-config monitoring is the goal — if you require manual steps per service, coverage will always have gaps
Deduplication + rate limiting are essential — without them, alert fatigue kills the system
Entropy filtering works — Shannon entropy is a simple, effective way to cut false positives on generic patterns
Serverless fits perfectly — bursty workload, scales with log volume, near-zero cost at quiet times
A dashboard makes it real — having a live scanner and resolve workflow turns this from a notification tool into a security platform

The full source is on GitHub: github.com/durrello/logsentry

Security built into the platform, not bolted on after an incident. If your services are generating logs without automated secret detection, you're relying on luck.
