A hands-on walkthrough of email header analysis, SPF/DKIM/DMARC validation, and phishing detection using pure Python
Introduction
Phishing remains one of the most common initial access vectors in cyberattacks. A figure cited in many industry threat reports is that over 90% of successful breaches start with a phishing email. Yet most people, and even some security teams, still rely on gut feeling to decide whether an email is suspicious.
I wanted to change that. So I built a Python tool that analyzes raw email files, checks every technical indicator, and produces a scored risk verdict from 0 to 100. No gut feeling required.
This article walks through exactly how it works, what it checks, and why those checks matter.
What Does the Tool Actually Do?
The Phishing Email Analyzer takes a raw .eml email file as input and runs it through a series of checks across four categories:
Header analysis (From, Reply-To, Return-Path mismatches)
Authentication validation (SPF, DKIM, DMARC)
Content analysis (urgency keywords, credential requests, suspicious links)
Attachment analysis (double extensions, executable disguises)
Each indicator adds points to a risk score. At the end, the tool returns a color-coded verdict:
0–30: Likely Legitimate
31–60: Suspicious
61–80: High Risk
81–100: Critical — High Probability Phishing
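As a rough illustration of how the bands above could be applied, here is a minimal sketch. The function names `total_score` and `verdict_for` and the example point values are assumptions made for this illustration; the repository may structure its scoring differently.

```python
def total_score(indicator_points):
    """Sum the points from triggered indicators, capped at 100."""
    return min(100, sum(indicator_points))


def verdict_for(score):
    """Map a 0-100 risk score to one of the four verdict bands."""
    score = max(0, min(100, score))  # clamp out-of-range input
    if score <= 30:
        return "Likely Legitimate"
    if score <= 60:
        return "Suspicious"
    if score <= 80:
        return "High Risk"
    return "Critical - High Probability Phishing"


# Hypothetical point values for three triggered indicators.
example_points = [25, 20, 15]
print(total_score(example_points), verdict_for(total_score(example_points)))
```

Capping the sum keeps the score inside the 0 to 100 range even when many indicators fire at once.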
The Technical Foundation: What is an .eml File?
An .eml file is the raw format of an email. It contains everything (headers, body, attachments) as plain text. Most email clients can export messages in this format, which makes it well suited to forensic analysis.
Python's built-in email library can parse .eml files natively, which means the entire tool runs with no external dependencies beyond the standard library.
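For readers who have not used the standard library's email package, the sketch below shows one way to load a message. The helper names `load_eml` and `parse_eml_bytes` and the sample message are made up for this example and are not taken from the tool's source.

```python
from email import policy
from email.parser import BytesParser


def parse_eml_bytes(raw):
    """Parse raw message bytes into an EmailMessage object."""
    return BytesParser(policy=policy.default).parsebytes(raw)


def load_eml(path):
    """Read an .eml file from disk and parse it."""
    with open(path, "rb") as handle:
        return parse_eml_bytes(handle.read())


# A tiny hand-written message standing in for a real .eml file.
SAMPLE = (
    b"From: Alice <alice@example.com>\r\n"
    b"To: bob@example.org\r\n"
    b"Subject: Quarterly report\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hi Bob, the report is attached.\r\n"
)

msg = parse_eml_bytes(SAMPLE)
print(msg["From"], "|", msg["Subject"])
print(msg.get_body(preferencelist=("plain",)).get_content())
```

Using `policy.default` returns the modern `EmailMessage` API, which provides helpers such as `get_body()` and `iter_attachments()`.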
The Checks — And Why They Matter
- Header Mismatch Detection
Legitimate emails usually have consistent headers. The From address, Reply-To address, and Return-Path typically point to the same domain, although bulk mail sent through a third-party provider can legitimately use a different Return-Path. When the domains don't match, that's a red flag worth a closer look. Attackers frequently use a display name like "PayPal Security" while the actual sending address is noreply@paypa1-security.xyz. The tool extracts all three headers and flags any domain mismatch immediately.
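A domain comparison along these lines can be sketched with `email.utils.parseaddr`. The function names and the sample headers are illustrative assumptions, and the exact-match comparison is a simplification: it would also flag a legitimate subdomain such as bounce.example.com against example.com.

```python
from email import message_from_string
from email.utils import parseaddr


def header_domain(msg, name):
    """Return the lowercased domain of an address header, or None."""
    value = msg.get(name)
    if not value:
        return None
    _display_name, address = parseaddr(str(value))
    if "@" not in address:
        return None
    return address.rsplit("@", 1)[1].lower()


def domain_mismatches(msg):
    """Compare Reply-To and Return-Path domains against the From domain."""
    from_domain = header_domain(msg, "From")
    mismatches = []
    for name in ("Reply-To", "Return-Path"):
        domain = header_domain(msg, name)
        if from_domain and domain and domain != from_domain:
            mismatches.append((name, domain))
    return from_domain, mismatches


# Hand-written headers for illustration only.
RAW = (
    "From: PayPal Security <noreply@paypa1-security.xyz>\n"
    "Reply-To: support@collect-info.example\n"
    "Return-Path: <noreply@paypa1-security.xyz>\n"
    "Subject: Action required\n"
    "\n"
    "body\n"
)
print(domain_mismatches(message_from_string(RAW)))
```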
- SPF, DKIM, and DMARC Validation
These three protocols are the backbone of email authentication:
SPF (Sender Policy Framework): verifies that the sending server is authorized to send on behalf of the domain
DKIM (DomainKeys Identified Mail): uses a cryptographic signature to verify that the signed headers and body haven't been altered in transit
DMARC (Domain-based Message Authentication, Reporting and Conformance): ties SPF and DKIM to the domain in the From header and tells receiving servers what to do when neither check produces a pass aligned with that domain
The tool checks the Authentication-Results header — which is added by the receiving mail server — for pass/fail status on all three. A legitimate email from a major organization will almost always pass all three. A phishing email often fails one or more.
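One simple way to read those results is a regular expression over the header value, as in the sketch below. The function name `auth_results` and the sample header are assumptions for illustration; a production parser would follow the full header grammar. Note also that only an Authentication-Results header added by your own receiving server should be trusted, since a sender can include a forged one in the message.

```python
import re
from email import message_from_string

# Matches "spf=pass", "dkim=fail", "dmarc=none" and similar result tokens.
AUTH_RESULT = re.compile(r"\b(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)


def auth_results(msg):
    """Return the first reported outcome for SPF, DKIM, and DMARC."""
    results = {"spf": "missing", "dkim": "missing", "dmarc": "missing"}
    for header in msg.get_all("Authentication-Results", []):
        for method, outcome in AUTH_RESULT.findall(str(header)):
            key = method.lower()
            if results[key] == "missing":  # keep the first result seen
                results[key] = outcome.lower()
    return results


# Hand-written header for illustration only.
RAW = (
    "Authentication-Results: mx.example.org;\n"
    " spf=fail smtp.mailfrom=paypa1-security.xyz;\n"
    " dkim=none;\n"
    " dmarc=fail header.from=paypa1-security.xyz\n"
    "From: noreply@paypa1-security.xyz\n"
    "\n"
    "body\n"
)
print(auth_results(message_from_string(RAW)))
```

Treating an absent result as "missing" rather than as a pass lets the scoring step decide how much weight an unauthenticated message deserves.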
- URL and Link Analysis
The tool extracts every URL from the email body and checks for:
URL shorteners (bit.ly, tinyurl, etc.) — used to hide the real destination
Suspicious TLDs (.xyz, .tk, .ml) — popular with attackers because they are cheap or free
IP addresses used directly in links (legitimate organizations rarely send links like http://185.220.101.34/login)
Lookalike domains — domains that visually resemble trusted brands (paypa1.com, amaz0n.net)
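The four URL checks above could be sketched as follows. The shortener list, TLD list, brand list, and digit-substitution table are small illustrative assumptions rather than the tool's actual data, and the lookalike test only catches simple digit-for-letter swaps.

```python
import ipaddress
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
SHORTENERS = {"bit.ly", "tinyurl.com"}   # illustrative, not exhaustive
SUSPICIOUS_TLDS = {"xyz", "tk", "ml"}
BRANDS = ("paypal", "amazon")            # hypothetical watch list
DIGIT_SWAPS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def extract_urls(text):
    """Find http(s) URLs and trim trailing sentence punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


def classify_url(url):
    """Return a list of flags describing why a URL looks suspicious."""
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ["unparseable"]
    try:
        ipaddress.ip_address(host)
        return ["ip-literal"]
    except ValueError:
        pass  # not an IP address, continue with domain checks
    flags = []
    if host in SHORTENERS:
        flags.append("shortener")
    labels = host.split(".")
    if labels[-1] in SUSPICIOUS_TLDS:
        flags.append("suspicious-tld")
    for label in labels[:-1]:
        swapped = label.translate(DIGIT_SWAPS)
        # Flag only when undoing digit swaps reveals a watched brand name.
        if swapped != label and any(brand in swapped for brand in BRANDS):
            flags.append("lookalike")
            break
    return flags


body = "Log in at http://185.220.101.34/login or https://bit.ly/abc123."
for url in extract_urls(body):
    print(url, classify_url(url))
```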
- Urgency and Credential Request Keywords
Phishing emails almost universally use psychological pressure. The tool scans the email body for phrases like "your account will be suspended," "verify immediately," "click here to confirm your password," and similar patterns. Each match increases the risk score.
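A keyword scan of this kind might look like the sketch below. The pattern labels and regular expressions are assumptions written for this example, based only on the three phrases quoted above; the tool's real phrase list is likely longer.

```python
import re

# Illustrative patterns derived from the example phrases; not the tool's list.
PRESSURE_PATTERNS = {
    "account suspension threat": r"account\s+(?:will\s+be|has\s+been)\s+suspended",
    "demand for immediate verification": r"verify\s+(?:your\s+\w+\s+)?immediately",
    "password confirmation request": r"confirm\s+your\s+password",
}


def scan_body(text):
    """Return the labels of all pressure patterns found in the text."""
    normalized = " ".join(text.split())  # collapse line breaks and extra spaces
    return [
        label
        for label, pattern in PRESSURE_PATTERNS.items()
        if re.search(pattern, normalized, re.IGNORECASE)
    ]


body = (
    "Your account will be suspended within 24 hours.\n"
    "Verify immediately: click here to confirm your password."
)
print(scan_body(body))
```

Normalizing whitespace first means a phrase still matches when the email wraps it across two lines.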
- Attachment Analysis
Malicious attachments often disguise themselves using double extensions: invoice.pdf.exe looks like a PDF but executes as a program. The tool checks every attachment filename for this pattern and flags executable file types hidden behind document extensions.
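The double-extension check can be sketched with `pathlib` suffix handling. The extension sets and function names below are illustrative assumptions, and `iter_attachments()` requires a message parsed with `policy.default` (or built as an `EmailMessage`, as in this example).

```python
from email.message import EmailMessage
from pathlib import PurePosixPath

# Illustrative extension sets; a real tool would likely cover more types.
EXECUTABLE_EXTS = {".exe", ".scr", ".js", ".vbs", ".bat", ".cmd", ".com", ".pif"}
DOCUMENT_EXTS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".txt", ".jpg", ".png"}


def check_filename(filename):
    """Flag executable extensions, and document-then-executable pairs."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    flags = []
    if suffixes and suffixes[-1] in EXECUTABLE_EXTS:
        flags.append("executable")
        if len(suffixes) >= 2 and suffixes[-2] in DOCUMENT_EXTS:
            flags.append("double-extension")
    return flags


def attachment_flags(msg):
    """Return {filename: flags} for every attachment that raised a flag."""
    findings = {}
    for part in msg.iter_attachments():
        name = part.get_filename()
        if name and check_filename(name):
            findings[name] = check_filename(name)
    return findings


# Build a small test message with one disguised and one ordinary attachment.
demo = EmailMessage()
demo["From"] = "sender@example.com"
demo.set_content("See attached invoice.")
demo.add_attachment(b"MZ", maintype="application", subtype="octet-stream",
                    filename="invoice.pdf.exe")
demo.add_attachment(b"%PDF", maintype="application", subtype="pdf",
                    filename="report.pdf")
print(attachment_flags(demo))
```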
Running It
The tool is simple to use. Download a suspicious email as a .eml file and run:
python3 phishing_analyzer.py suspicious_email.eml
The output shows every check with a pass/fail result and explains why each indicator matters, followed by the final risk score and verdict.
Test Results
I included two sample emails in the repository to demonstrate the contrast:
The phishing sample — a fake PayPal security alert — scored 100/100. It triggered 13 indicators including a From/Reply-To domain mismatch, failed SPF, URL shorteners, urgency keywords, and a credential request.
The legitimate sample — a standard professional email — scored 0/100. Every header was consistent, no suspicious URLs, no urgency language.
The contrast makes the tool's value immediately clear.
What I Learned
Building this tool gave me a much deeper understanding of why email authentication protocols exist and how attackers work around them. Most phishing emails don't try to bypass SPF or DKIM; they just rely on the fact that most people never check those headers manually. Automating those checks takes much of the human error out of the process.
It also reinforced something important about SOC work: the difference between a trained analyst and a beginner is often just knowing what to look for. This tool encodes that knowledge into a repeatable process.
Try It Yourself
The full source code, sample emails, and README are available on GitHub:
GitHub: https://github.com/SankethSubhas/phishing-email-analyzer
Sanketh Subhas is a Cybersecurity Analyst with 3.5+ years of experience in SOC operations, GRC, and threat detection.
Portfolio: sankethsubhas.pages.dev | GitHub: github.com/SankethSubhas