It started innocently enough—a spike in CloudWatch metrics that looked like success. More visitors than ever before? My portfolio was going viral!
Then I looked at the logs. The "visitors" weren't human. They were bots. Scrapers, crawlers, and automated scripts churning through my portfolio like it was an all-you-can-eat buffet. My API was getting hammered, costs were creeping up, and legitimate traffic was getting choked out.
Time to fight back. Here's how I used AWS WAF to reclaim my portfolio from the bots.
The Bot Problem
Bots aren't inherently evil—search engines and monitoring services are helpful. But scrapers and crawlers? They're digital parasites that:
Skew analytics (making you think your portfolio's a viral hit when it's just Python scripts)
Burn through resources (your AWS bill isn't as fun to look at as your portfolio)
Slow down real visitors (because your API's too busy serving bots)
I needed a way to separate the wheat from the chaff—legitimate visitors from automated pests.
Enter AWS WAF Bot Control
AWS WAF Bot Control is a managed rule group that detects and mitigates bot traffic with minimal configuration. The best part? The subscription is $10/month, plus per-request fees for the traffic it inspects, which was well worth the peace of mind and resource savings.
Level 1: The Common Protection
The Common inspection level catches self-identifying bots using traditional detection techniques. It:
Detects web scraping frameworks, search engines, and automated browsers
Labels traffic from these bots
Blocks unverified bots by default
Think of it as the bouncer who checks ID at the door. Most bots aren't smart enough to fake it.
Level 2: The Targeted Protection
For more sophisticated bots that don't self-identify, I enabled the Targeted inspection level. This uses advanced techniques like:
Behavior-based detection
Browser interrogation
Machine learning analysis of traffic patterns
The ML rules analyze timestamps, browser characteristics, and other behavioral signals to spot coordinated bot activity.
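As a rough sketch of how the two inspection levels show up in a web ACL, the snippet below builds the Bot Control rule as a Python dict and prints it as JSON. The rule name, priority, and metric name are placeholders chosen for illustration, and the `ManagedRuleGroupConfigs` shape follows the WAFv2 API as I understand it, so check it against the current docs before deploying.

```python
import json


def bot_control_rule(level="COMMON", enable_ml=False, priority=1):
    """Build a web ACL rule entry for the Bot Control managed rule group.

    level is "COMMON" or "TARGETED"; enable_ml only applies to TARGETED.
    The name, priority, and metric name are illustrative placeholders.
    """
    if level not in ("COMMON", "TARGETED"):
        raise ValueError("level must be 'COMMON' or 'TARGETED'")

    config = {"InspectionLevel": level}
    if level == "TARGETED":
        # The machine-learning rules are toggled with a flag at this level.
        config["EnableMachineLearning"] = enable_ml

    return {
        "Name": "bot-control",
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesBotControlRuleSet",
                "ManagedRuleGroupConfigs": [
                    {"AWSManagedRulesBotControlRuleSet": config}
                ],
            }
        },
        # "None" keeps the actions defined inside the rule group itself.
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "bot-control",
        },
    }


print(json.dumps(bot_control_rule("TARGETED", enable_ml=True), indent=2))
```

Calling `bot_control_rule()` with no arguments gives the Common-level variant, which is a reasonable place to start before paying for Targeted inspection.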
My Custom Rule Configuration
Blocking Verified Bots
Some bots identify themselves honestly but aren't welcome. I added a label-matching rule that runs after the Bot Control rule group, so the labels already exist when it is evaluated, to block verified bots:
json
{
  "Name": "match_rule",
  "Statement": {
    "LabelMatchStatement": {
      "Scope": "LABEL",
      "Key": "awswaf:managed:aws:bot-control:bot:verified"
    }
  },
  "Action": {
    "Block": {}
  }
}
This blocks all verified bots in one shot. Want to block only specific ones? Match on the bot name label instead:
json
"LabelMatchStatement": {
  "Scope": "LABEL",
  "Key": "awswaf:managed:aws:bot-control:bot:name:bingbot"
}
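If you want to block several named bots at once, one option is to generate the rule rather than hand-edit it. The sketch below is a hypothetical helper, not anything AWS ships: it ORs together one label match per bot name. `bingbot` comes from the example above; `examplebot` is a made-up placeholder, and the rule name, priority, and metric name are assumptions too.

```python
import json

# Label namespace Bot Control uses for bot names (as in the rule above).
LABEL_PREFIX = "awswaf:managed:aws:bot-control:bot:name:"


def block_named_bots(bot_names, rule_name="block-named-bots", priority=10):
    """Return a rule that blocks requests labeled with any of bot_names."""
    if not bot_names:
        raise ValueError("need at least one bot name")

    matches = [
        {"LabelMatchStatement": {"Scope": "LABEL", "Key": LABEL_PREFIX + name}}
        for name in bot_names
    ]
    # OrStatement expects more than one nested statement, so a single
    # name is emitted as a bare LabelMatchStatement instead.
    if len(matches) == 1:
        statement = matches[0]
    else:
        statement = {"OrStatement": {"Statements": matches}}

    return {
        "Name": rule_name,
        # Needs a priority that places it after the Bot Control rule,
        # otherwise the labels have not been attached yet.
        "Priority": priority,
        "Statement": statement,
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": rule_name,
        },
    }


# "examplebot" is a placeholder, not a real Bot Control label.
print(json.dumps(block_named_bots(["bingbot", "examplebot"]), indent=2))
```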
Rate-Based Rules: The Silent Guardian
Sometimes bots aren't malicious—they're just... too much. I added rate-based rules to handle this gracefully.
A blanket rule protects against HTTP floods:
text
Limit: 500 requests per 5 minutes
Action: BLOCK
Then more targeted rules with stricter limits:
text
Scope-down: if uri_path starts_with '/api'
Limit: 100 requests per 5 minutes
Aggregation key: IP
This means if some scraper decides to hammer my API endpoint 500 times in a minute, its IP blows past the 100-request limit and gets blocked until its request rate drops, while legitimate traffic flows freely.
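For reference, here is roughly how those two rules might look as web ACL JSON, generated from Python so the limits live in one place. This is a sketch under a few assumptions: the rule names, priorities, and metric names are placeholders, and I've assumed the blanket rule also aggregates by IP, since only the `/api` rule states its aggregation key.

```python
import json


def rate_rule(name, limit, priority, path_prefix=None, window_sec=300):
    """Build a rate-based BLOCK rule, optionally scoped to a URI path prefix."""
    statement = {
        "Limit": limit,
        "EvaluationWindowSec": window_sec,  # 300 seconds = 5 minutes
        "AggregateKeyType": "IP",
    }
    if path_prefix is not None:
        # Only requests whose path starts with the prefix are counted.
        statement["ScopeDownStatement"] = {
            "ByteMatchStatement": {
                "FieldToMatch": {"UriPath": {}},
                "SearchString": path_prefix,
                "PositionalConstraint": "STARTS_WITH",
                "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            }
        }
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {"RateBasedStatement": statement},
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


rules = [
    rate_rule("rate-limit-all", limit=500, priority=20),
    rate_rule("rate-limit-api", limit=100, priority=21, path_prefix="/api"),
]
print(json.dumps(rules, indent=2))
```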
Creating Exceptions for Good Bots
Not all bots are bad. I wanted search engines to index my portfolio. The rule below blocks every verified bot except Googlebot, so the crawler I care about still gets through (add more exceptions the same way for other search engines):
json
{
  "Name": "allow-search-bots",
  "Statement": {
    "AndStatement": {
      "Statements": [
        {
          "LabelMatchStatement": {
            "Scope": "LABEL",
            "Key": "awswaf:managed:aws:bot-control:bot:verified"
          }
        },
        {
          "NotStatement": {
            "Statement": {
              "ByteMatchStatement": {
                "FieldToMatch": {
                  "SingleHeader": {
                    "Name": "user-agent"
                  }
                },
                "SearchString": "googlebot",
                "PositionalConstraint": "CONTAINS",
                "TextTransformations": [
                  {"Priority": 0, "Type": "LOWERCASE"}
                ]
              }
            }
          }
        }
      ]
    }
  },
  "Action": {"Block": {}}
}
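Nested AND/NOT statements are easy to get backwards, so it can help to model the logic before deploying it. The snippet below is a simplified stand-in for how this rule evaluates, not WAF itself. The `lowercase` flag models a LOWERCASE text transformation: byte matching is case-sensitive, so without one the search string "googlebot" would not match the capitalised "Googlebot" in the header. The user-agent strings are sample values.

```python
VERIFIED_LABEL = "awswaf:managed:aws:bot-control:bot:verified"


def is_blocked(labels, user_agent, lowercase=True):
    """Model: block if (has verified label) AND NOT (UA contains 'googlebot')."""
    has_verified = VERIFIED_LABEL in labels
    ua = user_agent.lower() if lowercase else user_agent
    names_googlebot = "googlebot" in ua
    return has_verified and not names_googlebot


# Sample user-agent strings for illustration.
googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
bingbot_ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

cases = [
    ("verified Googlebot", {VERIFIED_LABEL}, googlebot_ua, True),
    ("verified bingbot", {VERIFIED_LABEL}, bingbot_ua, True),
    ("unlabeled client", set(), "curl/8.0", True),
    ("Googlebot, no lowercase", {VERIFIED_LABEL}, googlebot_ua, False),
]
for name, labels, ua, lowercase in cases:
    print(f"{name}: blocked={is_blocked(labels, ua, lowercase)}")
```

Note that the unlabeled client is not blocked by this rule; catching unverified bots is left to Bot Control and the rate-based rules.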
The Results
The impact was immediate:
API costs dropped by ~40% (fewer pointless requests)
CloudWatch metrics normalized (actual traffic! Humans!)
Response times improved (serving fewer bots = faster responses)
Analytics became useful again (I could actually see real user behavior)
Lessons Learned
Start with Common protection — It handles 80% of the bot problem with zero configuration
Layer your rules — Combine Bot Control with rate-based rules for defense in depth
Use scope-down statements — Only inspect what matters. Target API endpoints or dynamic content to save on inspection costs
Monitor the Bot Control dashboard — AWS provides pre-built dashboards showing bot activity levels. Use them to understand what you're dealing with
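To make the scope-down lesson concrete, here is a minimal sketch that attaches a scope-down statement to a Bot Control rule so only `/api` requests are inspected. The helper name `add_scope_down`, the base rule's name, and its priority are assumptions for illustration; the base rule is trimmed to the fields this example touches.

```python
import copy
import json


def add_scope_down(rule, path_prefix):
    """Return a copy of a managed-rule-group rule limited to a path prefix."""
    scoped = copy.deepcopy(rule)  # leave the caller's rule untouched
    scoped["Statement"]["ManagedRuleGroupStatement"]["ScopeDownStatement"] = {
        "ByteMatchStatement": {
            "FieldToMatch": {"UriPath": {}},
            "SearchString": path_prefix,
            "PositionalConstraint": "STARTS_WITH",
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
        }
    }
    return scoped


# Trimmed placeholder rule; a real one also needs VisibilityConfig.
base_rule = {
    "Name": "bot-control",
    "Priority": 1,
    "Statement": {
        "ManagedRuleGroupStatement": {
            "VendorName": "AWS",
            "Name": "AWSManagedRulesBotControlRuleSet",
        }
    },
    "OverrideAction": {"None": {}},
}

api_only = add_scope_down(base_rule, "/api")
print(json.dumps(api_only, indent=2))
```

Requests outside the prefix skip the rule group entirely, so they are neither labeled nor counted toward its inspection charges.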
The Bottom Line
AWS WAF made protecting my portfolio from bots surprisingly painless. A few JSON rules, a $10/month subscription plus per-request fees, and some dashboard monitoring, and my portfolio serves humans again.
The bots are still out there. They'll keep scraping. But now? They're hitting a wall. And I'm sleeping better knowing my AWS bill won't spike because some scraper decided to index my entire portfolio 50 times a day.
Your portfolio deserves better than being bot food. Time to give it the protection it deserves.