Introduction
Phishing remains one of the most prevalent cybersecurity threats, leveraging deception to steal sensitive information such as login credentials and financial data. Detecting these patterns at scale requires innovative approaches combined with readily available open source tools. In this article, we explore how a security researcher can utilize SQL queries on existing datasets to identify common phishing indicators.
The Challenge
Phishing patterns often involve analyzing URLs, domain names, timestamps, and email attributes to find suspicious correlations. Traditional methods rely on machine learning models, but SQL offers a powerful and accessible alternative for pattern detection, especially when dealing with large, structured data in relational databases.
Setting Up the Environment
First, the researcher sets up an open source database, e.g., PostgreSQL, and ingests email logs, URL lists, and domain information. Here is an example schema:
CREATE TABLE email_logs (
id SERIAL PRIMARY KEY,
sender VARCHAR(255),
recipient VARCHAR(255),
timestamp TIMESTAMP,
url TEXT,
subject TEXT
);
CREATE TABLE domain_info (
domain VARCHAR(255) PRIMARY KEY,
registration_date DATE,
registrar VARCHAR(255),
trust_score INTEGER
);
Populate with relevant raw data; then, proceed with pattern detection queries.
Detecting Suspicious URLs and Domains
One common pattern is high similarity between URLs sent in short time frames or multiple emails from domains with recent registration dates, often indicative of malicious actors.
Query 1: Find domains registered within the last 30 days with high email activity
SELECT di.domain, di.registration_date, COUNT(el.id) AS email_count
FROM email_logs el
JOIN domain_info di ON el.url LIKE '%' || di.domain || '%'
WHERE di.registration_date > NOW() - INTERVAL '30 days'
GROUP BY di.domain, di.registration_date
HAVING COUNT(el.id) > 50;
This query helps flag newly registered domains that are actively sent emails, a typical characteristic of phishing campaigns.
Query 2: Identify URLs with suspicious substrings
Malicious actors often embed threatening words or misspellings.
SELECT id, url
FROM email_logs
WHERE url ILIKE '%secure-login%' OR url ILIKE '%update-account%' OR url ILIKE '%verify%'
AND timestamp > NOW() - INTERVAL '7 days';
This filters emails containing common phishing bait phrases.
Combining Patterns for Better Detection
A practical approach involves combining multiple signals in SQL, such as recent domain registration, URL patterns, and email sender reputation.
SELECT el.id, el.sender, el.url, di.registration_date, COUNT(*) OVER (PARTITION BY di.domain) AS domain_activity
FROM email_logs el
JOIN domain_info di ON el.url LIKE '%' || di.domain || '%'
WHERE di.registration_date > NOW() - INTERVAL '30 days'
AND (el.url ILIKE '%secure-login%' OR el.url ILIKE '%update-account%')
AND el.sender NOT IN (/* list of trusted senders */);
Conclusion
Utilizing SQL for phishing pattern detection offers a flexible, interpretable, and open-source approach to cybersecurity analysis. By constructing targeted queries based on known malicious behaviors—such as recent domain creation, suspicious URL patterns, and email activity—researchers and security teams can proactively identify and respond to threats.
This methodology can be extended and integrated into larger security workflows, leveraging existing open source tools like PostgreSQL, and can be enhanced with additional data sources like WHOIS information, DNS logs, and user behavioral data for more comprehensive threat detection.
References
- Open source tools: PostgreSQL, pgAdmin
- Data sources: Passive DNS, WHOIS, email logs
- Further reading: OWASP Phishing Detection
🛠️ QA Tip
To test this safely without using real user data, I use TempoMail USA.
Top comments (0)