How I Reduced ELB Access Log Analysis Time by 80% Using AWS Data Processing MCP Server

#mcp #aws #devops #security

The other day, we had a sudden spike in requests to one of our services, which triggered a flood of ELB latency alerts.

Fortunately, I had already built a foundation for analyzing ELB access logs via Athena.
The problem? In an incident scenario, running SQL-based analysis is slow. Writing a query → fixing it → waiting for aggregation → fixing again… it all takes time.

I eventually tracked down a suspicious IP hammering us with requests and blocked it, but I was left thinking "That could have gone faster."

While browsing AWS’s official MCP servers again, I noticed one I’d previously ignored the AWS Data Processing MCP Server.
The name made me think it wasn’t relevant, so I had skipped it before—but it turns out it lets you query Athena using natural language.

This post is a quick walkthrough of how I set it up, the snags I hit along the way, and how it worked in practice.

Setup

I connected it from Claude Desktop by adding the following to claude_desktop_config.json.
work-mcp is an AWS_PROFILE I created just for this purpose.

{
  "mcpServers": {
    "aws.dp-mcp": {
      "command": "uvx",
      "args": ["awslabs.aws-dataprocessing-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_PROFILE": "work-mcp",
        "AWS_REGION": "ap-northeast-1"
      },
      "autoApprove": [],
      "disabled": false
    }
  }
}

IAM Policy (Where I Got Stuck)

The permissions listed in the official docs weren’t enough for my environment—I ran into missing permissions like athena:StartQueryExecution.
Here’s the policy I ended up using. You’ll need to replace the S3 bucket names with your own.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AthenaQueryAccess",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution", 
                "athena:GetQueryResults",
                "athena:StopQueryExecution"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AthenaMetadataAccess", 
            "Effect": "Allow",
            "Action": [
                "athena:GetDatabase",
                "athena:GetTableMetadata",
                "athena:ListDatabases",
                "athena:ListTableMetadata"
            ],
            "Resource": "*"
        },
        {
            "Sid": "GlueDataCatalogAccess",
            "Effect": "Allow", 
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables"
            ],
            "Resource": "*"
        },
        {
            "Sid": "S3LogsAccess",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-alb-logs-bucket",
                "arn:aws:s3:::your-alb-logs-bucket/*"
            ]
        },
        {
            "Sid": "AthenaResultsS3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject", 
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::your-athena-results-bucket",
                "arn:aws:s3:::your-athena-results-bucket/*"
            ]
        }
    ]
}

Example Usage

Question 1

I want to analyze `alb_access_logs` in Athena.
Between 2025-08-07 00:00:00 and 23:59:59 JST, are there any bot-like requests?

Answer 1 (Sensitive data masked)

# Bot Analysis of ALB Access Logs (Aug 7, 2025 JST / Some details masked)

Analyzed ALB access logs in Athena to extract confirmed and suspected bot traffic.
* External IPs are anonymized as `[IP-X]`
* Internal API paths are parameterized
* Counts are shown in approximate ranges

## 🤖 Bots Detected via User-Agent

1. **VirusTotalBot** — ~160 requests
   - Representative IPs: `[IP-V1]`, `[IP-V2]`, `[IP-V3]`

2. **Googlebot** — a few requests
   - Representative IP: `[IP-G1]`

3. **Google-Safety Bot** — a few dozen requests
   - Representative IPs: `[IP-GS1]`, `[IP-GS2]`

4. **Plesk screenshot bot** — ~50 requests
   - Representative IP: `[IP-P1]`

5. **Others**:
   - **AmazonProductDiscovery bot** (few)
   - **Brainfeedbot** (few)
   - **meta-externalads** (few, Facebook-related)
   - **MicroAdBot** (single instance)
   - **curl** (few)

6. **User-Agent not set (“-”)** — ~1.1k requests (most from `[IP-U1]`)

## 🚨 High-Volume Suspicious Activity (Top Examples)

1. **[IP-A]** — ~280k requests (most for the day)
   - Main UA: Generic desktop Chrome
   - Unique URLs: ~2.0k
   - Pattern: Very frequent polling for a single session (peak ~3k–5k/hour)

2. **[IP-B]** — ~180k requests
   - 2 distinct UAs
   - Unique URLs: ~18k

3. **[IP-C]** — ~120k requests
   - 3 distinct UAs
   - Unique URLs: ~10k

4. **[IP-D]** — ~56k requests
   - Concentrated over ~3 hours (peak ~4k/hour)

## 🔍 Notable Patterns (Excerpt)

**[IP-A]** strongly suspected to be a bot or misconfigured client due to:
- Continuous high-frequency polling of similar URLs
- Targeting endpoints like `/v1/log/{sessionId}` for session monitoring

## 📊 Daily Totals

- **Total requests**: ~15.2M
- **Clear bots**: ~1.4k
- **High-volume sources (≥1k/day)**: ~20 IPs totaling ~2.0M requests

**Conclusion:**  
`[IP-A]` is likely a bot or a misconfigured client, given the excessively short polling interval.  
Recommend rate-limiting, access control, or switching to a push/caching approach.

Question 2

If we enable AWS WAF Bot Control, will the block count be high?
And would `[IP-A]` or `[IP-B]` be caught?

Answer 2

# AWS WAF Bot Control Assessment (Details masked)

## 🛡️ Bots Likely to be Blocked (Clearly identifiable by UA)

1. **VirusTotalBot**: ~160 requests
2. **Googlebot**: Few (usually whitelisted)
3. **Google-Safety Bot**: Few dozen (usually whitelisted)
4. **MicroAdBot**: ~10
5. **Brainfeedbot**: ~10
6. **Plesk screenshot bot**: Few–dozen
7. **curl**: Few

**Total:** ~230–290 requests/day

## User-Agent Not Set (“-”): ~0.5k–1.1k/day
- Some would be detected by Bot Control
- Could include legitimate proxy traffic

### 📊 Block Volume Prediction
- Clear bots: ~300–400 requests/day
- Share of total: ~0.002–0.003% (on 15–18M daily requests)

## 🎯 Would `[IP-A]` and `[IP-B]` Be Blocked?

### `[IP-A]` (Global IP)
- Likely bypassed
- UA: Generic desktop Chrome (Bot Control focuses on UA-based detection)
- Could be caught only via behavioral analysis (e.g., consistent ~25k/day mechanical access)

### `[IP-B]` (Corporate network)
- Almost certainly bypassed
- UA: Common browser
- Many legitimate authenticated API requests
- Access patterns suggest possible misconfiguration rather than malicious intent

## 🔧 Suggested WAF Enhancements

1. **Rate Limiting Rules**
   - Per IP: ~100 req/min, ~1,000 req/hour

2. **Endpoint-Specific Limits**
   - `/v1/companies/*`: ~10 req/min per IP
   - `/v1/log/*`: ~5 req/min per IP

3. **Behavioral Rules**
   - Burst access to same URL
   - Unusual high-frequency overnight traffic
   - Evenly timed polling

4. **Whitelist Exceptions**
   - Fixed IP ranges for trusted companies
   - Certain UAs for internal monitoring tools

### Effectiveness Forecast
- Bot Control alone: ~300–500 blocked requests/day
- Problematic `[IP-A]/[IP-B]` traffic remains
- Combining with rate-limiting significantly improves impact with minimal false positives

How This Influenced WAF Design

Manually writing SQL for “How many bot-like requests?” is tedious.
This tool delivered a detailed breakdown instantly—helping me confirm that Bot Control wouldn’t address our main offenders.
That was valuable not only for effectiveness but also for cost analysis, since Bot Control is billed separately from other WAF rules.

The MCP server doesn’t make SQL obsolete, but it dramatically shortens the time from alert to first hypothesis.

DEV Community