The Challenge: Can You Spot the Attackers in Your Logs?
Every second, your web server is writing access logs. Most entries are legitimate users browsing your site. But buried in thousands of normal requests are attackers probing for vulnerabilities, attempting SQL injection, and scanning for exposed files.
Your mission: Build a log parser that extracts critical security information and detects attack patterns.
Part of a 48-week Intel → Remote AppSec Engineer curriculum
Follow along as I transition from Intel Security Engineering to a remote AppSec role by June 2026.
Real-World Relevance
When Equifax was breached in 2017 (compromising 147 million people), investigators found the attackers had been in their systems for 76 days. The evidence? Buried in access logs that nobody was parsing.
GitHub's security team processes millions of log entries daily to detect:
- Credential stuffing attacks
- API abuse patterns
- Automated vulnerability scanners
- Path traversal attempts
Every security engineer needs to parse logs. These are Day 1 skills.
What's in the Full Repository
This challenge is part of my AppSec learning series:
- Weekly Security Challenges (releasing through June 2026)
- LeetCode-style format with 60+ test cases per exercise
- Real interview prep for Trail of Bits, GitLab, Stripe, Coinbase
- Public accountability - tracking my Intel → Remote AppSec transition
Repository: https://github.com/fosres/AppSec-Exercises
What You're Preventing
Without proper log analysis:
- SQL injection attempts go undetected until your database is dumped on the dark web
- Directory traversal attacks (`../../etc/passwd`) succeed because you didn't notice the pattern
- Brute force login attempts from 100+ IPs look like normal traffic
- Reconnaissance scans map your entire infrastructure before the real attack
Companies that parse logs well detect breaches in hours, not months.
The Challenge
Difficulty: Week 2 (Python Workout Chapters 1-4 only)
Skills Required: String methods, lists, dictionaries, basic I/O
Time Estimate: 2-3 hours
Write a Python script log_parser.py that parses web access logs and extracts security-relevant information.
Input Format
Your script will process Apache/nginx Combined Log Format entries:
192.168.1.100 - - [15/Dec/2025:14:23:45 +0000] "GET /index.html HTTP/1.1" 200 1024 "https://google.com" "Mozilla/5.0"
10.0.0.50 - admin [15/Dec/2025:14:24:12 +0000] "POST /login.php HTTP/1.1" 401 512 "-" "curl/7.68.0"
203.0.113.42 - - [15/Dec/2025:14:25:03 +0000] "GET /admin' OR '1'='1 HTTP/1.1" 403 2048 "-" "sqlmap/1.5"
198.51.100.23 - - [15/Dec/2025:14:26:30 +0000] "GET /../../etc/passwd HTTP/1.1" 404 256 "-" "Mozilla/5.0"
Log Format Breakdown:
IP_ADDRESS - USERNAME [TIMESTAMP] "METHOD PATH PROTOCOL" STATUS_CODE BYTES "REFERRER" "USER_AGENT"
Sample Input File
You can test your parser with this sample log file (save as access.log):
192.168.1.100 - - [15/Dec/2025:14:23:45 +0000] "GET /index.html HTTP/1.1" 200 1024 "https://google.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.168.1.100 - - [15/Dec/2025:14:23:48 +0000] "GET /about.html HTTP/1.1" 200 2048 "https://example.com/index.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
10.0.0.50 - admin [15/Dec/2025:14:24:12 +0000] "POST /login.php HTTP/1.1" 401 512 "-" "curl/7.68.0"
10.0.0.50 - admin [15/Dec/2025:14:24:15 +0000] "POST /login.php HTTP/1.1" 401 512 "-" "curl/7.68.0"
10.0.0.50 - admin [15/Dec/2025:14:24:18 +0000] "POST /login.php HTTP/1.1" 200 1024 "-" "curl/7.68.0"
203.0.113.42 - - [15/Dec/2025:14:25:03 +0000] "GET /products.php?id=1' OR '1'='1 HTTP/1.1" 200 4096 "-" "sqlmap/1.5#dev"
203.0.113.42 - - [15/Dec/2025:14:25:08 +0000] "GET /products.php?id=1 UNION SELECT password FROM users-- HTTP/1.1" 403 256 "-" "sqlmap/1.5#dev"
198.51.100.23 - - [15/Dec/2025:14:26:30 +0000] "GET /../../etc/passwd HTTP/1.1" 404 256 "-" "Mozilla/5.0"
198.51.100.23 - - [15/Dec/2025:14:26:35 +0000] "GET /../../../windows/system32/config/sam HTTP/1.1" 404 256 "-" "Mozilla/5.0"
172.16.0.5 - - [15/Dec/2025:14:27:10 +0000] "GET /api/users HTTP/1.1" 200 8192 "-" "python-requests/2.28.0"
172.16.0.5 - - [15/Dec/2025:14:27:45 +0000] "POST /api/upload HTTP/1.1" 201 128 "https://example.com/dashboard" "Mozilla/5.0"
Required Output Format
Your script must output JSON with this exact structure:
{
"summary": {
"total_requests": 11,
"unique_ips": 5,
"failed_requests": 5,
"total_bytes_transferred": 18304,
"most_common_status_codes": {
"200": 5,
"201": 1,
"401": 2,
"403": 1,
"404": 2
}
},
"top_ips": [
{"ip": "10.0.0.50", "requests": 3},
{"ip": "192.168.1.100", "requests": 2},
{"ip": "203.0.113.42", "requests": 2},
{"ip": "198.51.100.23", "requests": 2},
{"ip": "172.16.0.5", "requests": 2}
],
"security_findings": [
{
"severity": "HIGH",
"finding_type": "SQL_INJECTION",
"ip": "203.0.113.42",
"path": "/products.php?id=1' OR '1'='1",
"timestamp": "15/Dec/2025:14:25:03 +0000",
"user_agent": "sqlmap/1.5#dev"
},
{
"severity": "HIGH",
"finding_type": "SQL_INJECTION",
"ip": "203.0.113.42",
"path": "/products.php?id=1 UNION SELECT password FROM users--",
"timestamp": "15/Dec/2025:14:25:08 +0000",
"user_agent": "sqlmap/1.5#dev"
},
{
"severity": "MEDIUM",
"finding_type": "PATH_TRAVERSAL",
"ip": "198.51.100.23",
"path": "/../../etc/passwd",
"timestamp": "15/Dec/2025:14:26:30 +0000",
"user_agent": "Mozilla/5.0"
},
{
"severity": "MEDIUM",
"finding_type": "PATH_TRAVERSAL",
"ip": "198.51.100.23",
"path": "/../../../windows/system32/config/sam",
"timestamp": "15/Dec/2025:14:26:35 +0000",
"user_agent": "Mozilla/5.0"
},
{
"severity": "LOW",
"finding_type": "BRUTE_FORCE",
"ip": "10.0.0.50",
"description": "3 failed login attempts detected",
"failed_request_count": 3,
"target_path": "/login.php"
}
],
"suspicious_user_agents": [
{"user_agent": "sqlmap/1.5#dev", "count": 2},
{"user_agent": "curl/7.68.0", "count": 3},
{"user_agent": "python-requests/2.28.0", "count": 1}
]
}
Important Notes:
- `top_ips` must be sorted by `requests` in descending order (highest first). In the example above, `10.0.0.50` with 3 requests is listed first.
- For IPs with the same request count, the order doesn't matter (any order is acceptable).
Detection Requirements
Your parser must detect these attack patterns:
1. SQL Injection (HIGH severity)
Detect these patterns in the URL path:
- `' OR '1'='1`
- `' OR 1=1--`
- `UNION SELECT`
- `; DROP TABLE`
- `--` (SQL comment)
- `'` followed by SQL keywords (OR, AND, UNION, SELECT, etc.)
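With only Chapters 1-4 tools, a substring check over the lowered path is enough. Here is a minimal sketch (the function name and pattern list are mine, and the list is deliberately incomplete):

```python
# Minimal Week 2-level SQLi check: lowercase the path, then look for
# known-bad substrings with the `in` operator (no regex needed).
SQLI_PATTERNS = ["' or '1'='1", "' or 1=1", "union select", "; drop table", "--"]

def looks_like_sqli(path):
    lowered = path.lower()
    for pattern in SQLI_PATTERNS:
        if pattern in lowered:
            return True
    return False

print(looks_like_sqli("/products.php?id=1' OR '1'='1"))  # True
print(looks_like_sqli("/index.html"))                    # False
```

Note that matching `"union select"` as a phrase (rather than `"union"` alone) avoids flagging innocent paths like `/reunion.html`.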
2. Path Traversal (MEDIUM severity)
Web Server Configuration:
The web server's document root is /var/www/html. All requested files must stay within this directory.
Your Task:
Detect when a requested path would escape the document root.
Examples:
- `GET /docs/index.html` → Safe (stays in `/var/www/html/docs/`)
- `GET /../../etc/passwd` → Attack (escapes to `/var/etc/passwd`)
- `GET /../../../windows/system32/config/sam` → Attack (escapes to `/windows/system32/`)
Detection (Week 2 Level):
Use pattern matching to detect dangerous path patterns:
- Look for `../` or `..\` in the path
- Check for sensitive file paths like `/etc/passwd`, `/windows/system32`
- Any method that correctly identifies the attacks in the sample log is acceptable
OPTIONAL/ADVANCED: URL encoding (covered in Week 13-14)
- Attackers may URL-encode paths: `..%2f..%2fetc%2fpasswd`
- Or double-encode: `..%252f..%252fetc%252fpasswd`
- For Week 2, you can ignore URL encoding
- You'll enhance this detection later when studying "Hacking APIs" Chapter 13
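Putting the Week 2 rules together, a traversal check might be sketched like this (function name and the exact list of sensitive targets are my choices, not part of the spec):

```python
# Week 2-level traversal check: flag ../ or ..\ sequences, plus a couple of
# well-known sensitive targets. No URL decoding yet (that's Week 13-14).
def looks_like_traversal(path):
    lowered = path.lower()
    if "../" in lowered or "..\\" in lowered:
        return True
    return "/etc/passwd" in lowered or "/windows/system32" in lowered

print(looks_like_traversal("/../../etc/passwd"))  # True
print(looks_like_traversal("/docs/index.html"))   # False
```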
3. Brute Force Attempts (LOW severity)
Detect when an attacker is trying to guess passwords by making multiple failed login attempts.
Exact Detection Rules:
Rule 1: What counts as a "failed login attempt"?
A request is a failed login attempt if ALL of these conditions are true:
- Status code is `401` (Unauthorized) OR `403` (Forbidden)
- Path contains the word `login` (case-insensitive, anywhere in the path)
Rule 2: What are valid login endpoints?
Any path containing login (case-insensitive):
- ✅ `/login`, `/login.php`, `/admin/login`, `/api/login`, `/LOGIN`
- ❌ `/admin` (no "login"), `/auth` (different word), `/signin` (different word)
Rule 3: Detection threshold
- Track failed login attempts per IP address
- If the same IP has 3 or more failed login attempts → Create a BRUTE_FORCE finding
- Check this after processing all log entries (not per line)
Rule 4: Output format
Create one finding per IP that meets the threshold:
{
"severity": "LOW",
"finding_type": "BRUTE_FORCE",
"ip": "10.0.0.50",
"description": "3 failed login attempts detected",
"failed_request_count": 3,
"target_path": "/login.php"
}
Examples:
✅ Brute Force Detected:
10.0.0.50 - - [15/Dec/2025:14:00:00 +0000] "POST /login.php HTTP/1.1" 401 512
10.0.0.50 - - [15/Dec/2025:14:00:02 +0000] "POST /login.php HTTP/1.1" 401 512
10.0.0.50 - - [15/Dec/2025:14:00:04 +0000] "POST /login.php HTTP/1.1" 401 512
Result: 3 failed attempts to /login.php from 10.0.0.50 = BRUTE_FORCE
❌ NOT Brute Force:
10.0.0.50 - - [15/Dec/2025:14:00:00 +0000] "POST /admin HTTP/1.1" 401 512
10.0.0.50 - - [15/Dec/2025:14:00:02 +0000] "POST /admin HTTP/1.1" 401 512
10.0.0.50 - - [15/Dec/2025:14:00:04 +0000] "POST /admin HTTP/1.1" 401 512
Result: Path /admin doesn't contain "login" = NOT brute force
Note: For Week 2, we're simplifying this - just count total failures per IP to login endpoints, ignore timing windows.
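The four rules above boil down to one dictionary and one loop. A sketch, assuming each parsed entry is a dict with `ip`, `path`, and `status` keys (those key names are my convention, matching the hints later in this post):

```python
# Count 401/403 responses to login endpoints per IP, then apply the threshold.
entries = [
    {"ip": "10.0.0.50", "path": "/login.php", "status": 401},
    {"ip": "10.0.0.50", "path": "/login.php", "status": 401},
    {"ip": "10.0.0.50", "path": "/login.php", "status": 401},
    {"ip": "10.0.0.50", "path": "/admin", "status": 401},  # no "login": ignored
]

failed_logins = {}
for entry in entries:
    if entry["status"] in (401, 403) and "login" in entry["path"].lower():
        failed_logins[entry["ip"]] = failed_logins.get(entry["ip"], 0) + 1

for ip, count in failed_logins.items():
    if count >= 3:  # threshold check AFTER processing all entries
        print(f"{ip}: {count} failed login attempts -> BRUTE_FORCE")
```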
4. Suspicious User Agents
Flag these patterns:
- `sqlmap` - SQL injection tool
- `nikto` - vulnerability scanner
- `nmap` - port scanner
- `curl` / `wget` - scripted access (not browsers)
- `python-requests` - scripted access
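Flagging and counting these can share one pass. A hedged sketch (the tool list mirrors the bullets above; the sample agent strings come from the sample log):

```python
# Count suspicious user agents by substring match against a tool list.
SUSPICIOUS_TOOLS = ["sqlmap", "nikto", "nmap", "curl", "wget", "python-requests"]

agents = ["sqlmap/1.5#dev", "curl/7.68.0", "curl/7.68.0", "Mozilla/5.0"]
agent_counts = {}
for agent in agents:
    lowered = agent.lower()
    for tool in SUSPICIOUS_TOOLS:
        if tool in lowered:
            agent_counts[agent] = agent_counts.get(agent, 0) + 1
            break  # count each request once even if multiple tools match

print(agent_counts)  # {'sqlmap/1.5#dev': 1, 'curl/7.68.0': 2}
```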
Grading Criteria
Your script will be graded on:
- Correct JSON Structure (30 points)
  - Exact field names as specified
  - Proper data types (numbers as numbers, not strings)
  - Valid JSON syntax
- Summary Statistics (20 points)
  - Accurate `total_requests` count
  - Correct `unique_ips` count
  - Accurate `failed_requests` (4xx/5xx status codes)
  - Correct `total_bytes_transferred` sum
  - Proper `most_common_status_codes` dictionary
  - `top_ips` sorted by request count (descending - highest first)
- SQL Injection Detection (20 points)
  - Detects the `' OR '1'='1` pattern
  - Detects the `UNION SELECT` pattern
  - Captures IP, path, timestamp, user_agent
  - Marks as HIGH severity
- Path Traversal Detection (15 points)
  - Correctly identifies both path traversal attempts in the sample log
  - Marks as MEDIUM severity
  - Captures IP, path, timestamp, user_agent
  - Note: Any detection method that correctly identifies the attacks is acceptable
- Brute Force Detection (10 points)
  - Groups failed attempts (401/403) by IP
  - Only counts failures to login endpoints (path contains "login")
  - Detects 3+ failed login attempts from the same IP
  - Marks as LOW severity
  - Includes the `failed_request_count` field
- User Agent Analysis (5 points)
  - Flags `sqlmap`, `curl`, `python-requests`
  - Counts occurrences of each suspicious user agent
Usage
Your script should be runnable like this:
python3 log_parser.py access.log
Output should print the JSON to stdout (or save to output.json).
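A minimal command-line scaffold for this interface might look like the following (the `build_report` helper is a hypothetical name; the real analysis logic replaces the placeholder body):

```python
import json
import sys

def build_report(log_path):
    """Read the log file and return the report dict. Placeholder logic:
    just counts non-empty lines; your parsing/detection goes here."""
    with open(log_path) as handle:
        lines = [line for line in handle if line.strip()]
    return {"summary": {"total_requests": len(lines)}}

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 log_parser.py <logfile>")
        sys.exit(1)
    print(json.dumps(build_report(sys.argv[1]), indent=2))
```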
What You'll Learn (Week 2 Skills)
By completing this challenge, you'll master:
- String Parsing with .split() - Extract structured data from text using basic string methods
- String Slicing - Use `[start:end]` to extract substrings from log lines
- String Methods - `.lower()`, `.find()`, `.strip()`, the `in` operator for pattern matching
- Dictionaries & Counting - Use the `.get()` method to count occurrences
- List Comprehensions - Filter and transform log entries efficiently
- Path Security Concepts - Understand how directory traversal attacks work and how to detect dangerous path patterns
- OWASP Top 10 - Real SQL injection and path traversal attack patterns
- JSON Output - Format data for security tools (SIEM, alerting)
Note: This exercise uses Python skills from Chapters 1-4 of Python Workout (no regex, no datetime, no OOP). URL encoding detection and advanced path validation are optional enhancements you can add later in Week 13-14 when studying evasion techniques in "Hacking APIs" Chapter 13.
Coming up in future weeks: SQLi detection (Week 4), PCAP analysis (Week 9), reverse engineering (Week 13), and full SIEM engines (Week 20).
Hints (Conceptual Guidance Only)
Approach to parsing log lines:
- Each log line has a consistent format with fields separated by spaces and quotes
- The IP address is always the first element
- The timestamp is wrapped in square brackets `[]`
- The HTTP request is wrapped in double quotes: `"GET /path HTTP/1.1"`
- The status code and bytes come after the request
- The user agent is the last quoted string
- Think about: How can you use `.split()`, `.find()`, and string slicing `[start:end]` to extract these fields?
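One possible answer to that question is to split on `"` first, so the quoted request and user agent survive intact even though they contain spaces. A sketch (the dict key names are my convention):

```python
# Quote-first parsing: line.split('"') separates the quoted fields from
# the space-delimited prefix and the "status bytes" middle section.
def parse_line(line):
    parts = line.split('"')
    request = parts[1].split()          # ['GET', '/path', ..., 'HTTP/1.1']
    status, bytes_sent = parts[2].split()
    return {
        "ip": parts[0].split()[0],
        "timestamp": parts[0][parts[0].find("[") + 1 : parts[0].find("]")],
        "method": request[0],
        "path": " ".join(request[1:-1]),  # attack paths may contain spaces
        "status": int(status),
        "bytes": int(bytes_sent) if bytes_sent != "-" else 0,
        "user_agent": parts[5],
    }

line = ('192.168.1.100 - - [15/Dec/2025:14:23:45 +0000] '
        '"GET /index.html HTTP/1.1" 200 1024 "https://google.com" "Mozilla/5.0"')
print(parse_line(line)["path"])  # /index.html
```

Joining `request[1:-1]` back together is what keeps SQL injection paths like `/products.php?id=1 UNION SELECT ...` in one piece.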
Detecting SQL injection patterns:
- SQL injection often includes SQL keywords like `OR`, `AND`, `UNION`, `SELECT`, `DROP`
- Look for single quotes `'` combined with SQL logic
- SQL comments like `--` or `/* */` are suspicious
- Remember to check case-insensitively (`.lower()` is your friend)
- Think about: How can you check if certain substrings exist in the path using the `in` operator?
Detecting path traversal:
- The web server's document root is `/var/www/html`
- Requests that escape this directory are path traversal attacks
- Think about: How do you check if a path like `/../../etc/passwd` escapes `/var/www/html`?
- You can use simple pattern matching OR figure out how to resolve and validate paths
- Optional (Week 13+): Python has built-in modules that can help with path operations and URL decoding
Counting and aggregating:
- You'll need to count requests per IP, count status codes, and group failed attempts
- Dictionaries are perfect for counting: use IP or status code as the key, count as the value
- The `.get(key, default)` method is useful: `counts.get(ip, 0) + 1`
- Think about: How do you iterate through all log entries and update your count dictionaries?
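The same `.get()` pattern covers every counter in this exercise. Shown here for status codes (the `str()` conversion matches the string keys in the required JSON):

```python
# Dictionary counting with .get(): the workhorse pattern of this exercise.
statuses = [200, 200, 401, 401, 404]
status_counts = {}
for status in statuses:
    key = str(status)                  # the required JSON uses string keys
    status_counts[key] = status_counts.get(key, 0) + 1

print(status_counts)  # {'200': 2, '401': 2, '404': 1}
```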
Detecting brute force:
- For this simplified version, just count how many failed attempts (401/403 status) each IP has
- Group failed attempts by IP address using a dictionary
- If an IP has 3 or more failed attempts total, it's a potential brute force
- Think about: How do you filter entries by status code and count them per IP?
General strategy:
- Read the log file line by line
- Parse each line to extract: IP, timestamp, method, path, status, user_agent
- Store parsed data in a list of dictionaries
- Analyze the list to generate summary statistics
- Check each entry for attack patterns (SQLi, path traversal, suspicious user agents)
- Group and count for brute force detection
- Format everything as JSON and output
Common Mistakes
- Not handling missing fields - Some logs have `-` for bytes/referrer; check before converting to int
- Case sensitivity - SQL keywords can be `UNION`, `Union`, or `union` - use `.lower()` first
- Partial matches - `union` in `reunion.html` is NOT SQL injection - be careful with `in` checks
- String splitting edge cases - User agents and referrers contain spaces, so you can't just split the whole line on spaces
- Quote handling - The request and user agent are wrapped in quotes and need to be extracted carefully
- JSON formatting - Use `json.dumps()` with `indent=2` for readable output
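The last two requirements (sorting `top_ips` and emitting readable JSON) fit in a few lines. A sketch; `sorted()` with a key function peeks slightly ahead of Chapters 1-4, but it keeps the example short:

```python
import json

# Build top_ips sorted by request count (descending), then print JSON.
ip_counts = {"192.168.1.100": 2, "10.0.0.50": 3, "172.16.0.5": 2}
pairs = sorted(ip_counts.items(), key=lambda item: item[1], reverse=True)
top_ips = [{"ip": ip, "requests": count} for ip, count in pairs]

print(json.dumps({"top_ips": top_ips}, indent=2))
```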
Success Criteria
Your parser successfully:
- ✅ Parses all 11 sample log entries without errors
- ✅ Outputs valid JSON matching the exact schema
- ✅ Detects both SQL injection attempts from `203.0.113.42`
- ✅ Detects both path traversal attempts from `198.51.100.23`
- ✅ Detects the brute force pattern from `10.0.0.50`
- ✅ Flags `sqlmap`, `curl`, and `python-requests` as suspicious
- ✅ Calculates summary statistics correctly
Want instant feedback? Use the automated grader in the next section to verify all criteria!
Test Your Solution
Ready to validate your parser? I've created a comprehensive test suite with 12 log files and an automated grader.
Complete Test Suite & Grader:
https://github.com/fosres/AppSec-Exercises/tree/main/cyberscripts/logs
What's Included
Test Files (sample_logs/ directory):
- `access.log` - The 11 entries from this challenge
- `01_normal_traffic_only.log` - Zero attacks (tests for false positives)
- `02_sql_injection_heavy.log` - 8 SQL injection patterns
- `03_path_traversal_heavy.log` - 8 directory traversal attacks
- `04_mixed_attacks_multi_ip.log` - Combined attack scenarios
- `05_brute_force_multi_ip.log` - Multiple brute force patterns
- `06_edge_cases.log` - Missing fields, large files, unusual data
- `07_successful_attacks.log` - Attacks with 200 status (dangerous!)
- `08_distributed_attack.log` - Attacks from many IPs
- `09_url_encoded_attacks.log` - URL-encoded payloads
- `10_mixed_http_methods.log` - Various HTTP methods
- `01_empty.log` - Empty file edge case
Automated Grader (grader.py):
- Tests your parser against all 12 log files
- Validates JSON structure and field types
- Checks detection accuracy (SQLi, path traversal, brute force)
- Awards points and letter grade (100 points total)
- Provides detailed feedback on what's wrong
Reference Solution (log_parser.py):
- Complete implementation
- Handles all edge cases
- Scores 100/100 on the grader
- Well-commented code following best practices
How to Use
# 1. Clone the repository
git clone https://github.com/fosres/AppSec-Exercises.git
cd AppSec-Exercises/cyberscripts/logs
# 2. Test your solution
python3 grader.py
# 3. Expected output:
# ✅ Test 1: access.log - 30/30 points
# ✅ Test 2: Normal Traffic - 10/10 points
# ...
# TOTAL SCORE: 100/100 (A+)
The grader tests:
- ✅ JSON structure validation (30 pts)
- ✅ Summary statistics accuracy (15 pts)
- ✅ SQL injection detection (15 pts)
- ✅ Path traversal detection (10 pts)
- ✅ Brute force detection (10 pts)
- ✅ Edge case handling (5 pts)
- ✅ User agent analysis (5 pts)
- ✅ No crashes bonus (10 pts)
Pro tip: Run the grader incrementally as you build features. Don't wait until you're "done"!
Resources (Week 2 Level)
- Apache Log Format Docs: https://httpd.apache.org/docs/current/logs.html
- OWASP Injection: https://owasp.org/www-community/Injection_Flaws
- Python String Methods: https://docs.python.org/3/library/stdtypes.html#string-methods
- Python Dictionaries: https://docs.python.org/3/tutorial/datastructures.html#dictionaries
- Grace Nolan's Security Notes: https://github.com/gracenolan/Notes
Note: This exercise uses only Python Workout Chapters 1-4 skills. You don't need regex, datetime parsing, or OOP yet!
Future Enhancements (Later in Curriculum)
This exercise focuses on Week 2 fundamentals, but you'll enhance it later:
Week 13-14: WAF Evasion & URL Encoding
- URL encoding detection: `..%2f` vs `../`
- Double URL encoding: `..%252f` (bypasses filters)
- Case switching bypasses
- String terminators (null bytes)
- Reading: "Hacking APIs" Chapter 13
Week 20-23: Production SIEM
- 1000+ events/second processing
- Multi-stage attack correlation
- Geographic IP analysis
- False positive reduction
For now, focus on the basics: parsing, pattern matching, JSON output!
Next Steps (After You Master Week 2)
Once you've completed the basic parser and progressed further in your curriculum:
- Week 5+: Add regex patterns - Use the `re` module for more sophisticated pattern matching
- Week 6+: Add time-based detection - Parse timestamps with `datetime` to detect rapid attacks
- Week 9+: Create a LogEntry class - Use OOP to organize your code better
- Week 12+: GeoIP lookup - Map IPs to countries using the `geoip2` library
- Week 15+: Export to SIEM - Format output for Splunk/ELK ingestion
For now: Focus on mastering string methods, dictionaries, and list comprehensions. These are your foundation!
Get Involved
⭐ Like This Challenge?
Star the repository to get more weekly security challenges:
https://github.com/fosres/AppSec-Exercises
New challenges every Monday covering: SQLi detection, PCAP analysis, reverse engineering, SIEM correlation, API security, and more.
Stay Connected
Submit Your Solution
Completed the challenge? Share your solution!
Submit here: https://github.com/fosres/AppSec-Exercises/tree/main/cyberscripts/logs
Fork the repository, add your solution, and submit a pull request. All solutions are welcome - beginners to experts!
Part of a 48-week journey from Intel Security Engineer → Remote AppSec Engineer. All challenges, solutions, and progress documented publicly at github.com/fosres/AppSec-Exercises