<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rebeca</title>
    <description>The latest articles on DEV Community by Rebeca (@cryptogirl).</description>
    <link>https://dev.to/cryptogirl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3357048%2F1bda244b-d5d2-4233-ae46-cd748ebfe6da.jpeg</url>
      <title>DEV Community: Rebeca</title>
      <link>https://dev.to/cryptogirl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cryptogirl"/>
    <language>en</language>
    <item>
      <title>How I Built a Python Phishing Detector with 92% Accuracy</title>
      <dc:creator>Rebeca</dc:creator>
      <pubDate>Tue, 15 Jul 2025 13:50:02 +0000</pubDate>
      <link>https://dev.to/cryptogirl/how-i-built-a-python-phishing-detector-with-92-accuracy-mc6</link>
      <guid>https://dev.to/cryptogirl/how-i-built-a-python-phishing-detector-with-92-accuracy-mc6</guid>
      <description>&lt;p&gt;Phishing attacks account for 36% of data breaches (IBM Security 2023). As a cybersecurity enthusiast, I developed a Python-based tool that detects malicious URLs with 92% accuracy. Here’s how you can build one too!&lt;/p&gt;

&lt;p&gt;Why it matters:&lt;/p&gt;

&lt;p&gt;Real-world problem: Phishing scams cost businesses $4.9B annually (FBI IC3 2022).&lt;br&gt;
Accessible solution: No expensive tools—just Python and ML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools &amp;amp; Technologies&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Core libraries: pandas for data handling, scikit-learn for the model and metrics
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

print("Loading phishing dataset...")
data = pd.read_csv("phishing_dataset.csv")  # CSV of labeled URLs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 1: Building the Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data Sources:&lt;/p&gt;

&lt;p&gt;Malicious URLs: PhishTank, OpenPhish.&lt;br&gt;
Legitimate URLs: Common Crawl.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
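&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def extract_features(url):
    """Turn a URL string into a dict of numeric features."""
    return {
        "url_length": len(url),
        "num_special_chars": sum(1 for char in url if char in "@#!$%&amp;amp;"),
        # Check the scheme only: "https://" can also appear later inside a phishing URL
        "uses_https": 1 if url.lower().startswith("https://") else 0,  # Phishing sites often lack HTTPS
    }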
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key Insight: Phishing URLs are 3x more likely to contain special characters than legitimate ones (based on my dataset).&lt;/p&gt;
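
&lt;p&gt;A scikit-learn model expects rows of numbers in a consistent column order, while extract_features returns a dict per URL. The sketch below shows one way to bridge that gap. FEATURE_ORDER, to_row, and to_matrix are hypothetical helper names, and the sample values are made up for illustration.&lt;/p&gt;

```python
# Hypothetical helpers; the keys match the dict returned by extract_features().
FEATURE_ORDER = ["url_length", "num_special_chars", "uses_https"]

def to_row(features):
    """Return the feature values as a list in a fixed column order."""
    return [features[name] for name in FEATURE_ORDER]

def to_matrix(feature_dicts):
    """Stack one row per URL into a list of lists (a 2D feature matrix)."""
    return [to_row(features) for features in feature_dicts]

# Hand-written feature dicts (illustrative values, not real data).
# The second dict lists its keys in a different order on purpose.
sample = [
    {"url_length": 23, "num_special_chars": 0, "uses_https": 1},
    {"uses_https": 0, "url_length": 41, "num_special_chars": 2},
]
X = to_matrix(sample)
print(X)  # [[23, 0, 1], [41, 2, 0]]
```

&lt;p&gt;Fixing the column order in one place means training and prediction see the features in the same positions, regardless of how each dict was built.&lt;/p&gt;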

&lt;p&gt;&lt;strong&gt;Step 2: Training the Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why Random Forest?&lt;/p&gt;

&lt;p&gt;Handles imbalanced data well when combined with class weighting.&lt;/p&gt;

&lt;p&gt;More interpretable than "black box" models like neural networks, since it exposes per-feature importances.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
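&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# X_train, X_test, y_train, y_test are assumed to come from a train/test split
# of the extracted features (e.g. sklearn.model_selection.train_test_split)
model = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced"  # Critical for imbalanced datasets
)
model.fit(X_train, y_train)
# Per-class precision, recall, and F1 on the held-out test set
print(classification_report(y_test, model.predict(X_test)))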
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results:&lt;/p&gt;

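&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;0.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall&lt;/td&gt;
&lt;td&gt;0.89 (Minimizes false negatives!)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;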
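
&lt;p&gt;To make the two metrics concrete, here is a small sketch of how accuracy and recall are computed from confusion-matrix counts. The counts are hypothetical, chosen only so that the formulas reproduce the scores reported above. The post does not give the actual confusion matrix.&lt;/p&gt;

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Share of actual phishing URLs that the model caught."""
    return tp / (tp + fn)

# Hypothetical counts for a 200-URL test set (NOT from the post):
# 100 phishing URLs (89 caught, 11 missed) and
# 100 legitimate URLs (95 passed, 5 wrongly flagged).
tp, fn, tn, fp = 89, 11, 95, 5

print(round(accuracy(tp, tn, fp, fn), 2))  # 0.92
print(round(recall(tp, fn), 2))            # 0.89
```

&lt;p&gt;Recall only looks at the phishing URLs, so each missed phishing URL (a false negative) lowers it directly, which is why it matters more than raw accuracy for a detector like this.&lt;/p&gt;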

&lt;p&gt;&lt;strong&gt;Step 3: Deploying to Production&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Option 1: Flask API (for enterprise integration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
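&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask, request, jsonify
import pandas as pd

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Tolerate a missing or malformed JSON body instead of crashing
    payload = request.get_json(silent=True) or {}
    url = payload.get("url")
    if not url:
        return jsonify({"error": "missing 'url' field"}), 400
    # The model needs a 2D table of numbers, not a dict, so wrap the features
    # in a one-row DataFrame (assumes the model was trained on the same columns)
    features = pd.DataFrame([extract_features(url)])
    prediction = model.predict(features)[0]
    return jsonify({"is_phishing": bool(prediction)})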
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Option 2: CLI Tool (for SOC teams):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python detector.py --url "https://fake-paypal-login.com"
# Output: ✅ Legitimate  or  ⚠️ PHISHING ATTEMPT DETECTED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
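
&lt;p&gt;The post does not show detector.py itself, so the following is only a minimal sketch of how such a CLI could be structured with argparse. The function names (build_parser, format_verdict, run) and the predict callable are assumptions. In a real script, predict would wrap the trained model and return True for phishing.&lt;/p&gt;

```python
import argparse

def build_parser():
    """Command-line interface with a single required --url option."""
    parser = argparse.ArgumentParser(
        description="Classify a URL as phishing or legitimate."
    )
    parser.add_argument("--url", required=True, help="URL to check")
    return parser

def format_verdict(is_phishing):
    """Map the model's boolean prediction to the CLI message."""
    return "⚠️ PHISHING ATTEMPT DETECTED" if is_phishing else "✅ Legitimate"

def run(argv, predict):
    """Parse argv, call predict(url), and return the message to display.

    predict is a callable that returns True when the URL looks like phishing.
    """
    args = build_parser().parse_args(argv)
    return format_verdict(predict(args.url))

# In detector.py this could be wired up as follows, where my_predict is a
# hypothetical wrapper around the trained model:
#   import sys
#   print(run(sys.argv[1:], predict=my_predict))
```

&lt;p&gt;Passing the predictor in as an argument keeps the argument parsing testable without loading the model.&lt;/p&gt;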



&lt;p&gt;&lt;strong&gt;Lessons Learned &amp;amp; Next Steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Challenges:&lt;/p&gt;

&lt;p&gt;Shortened URLs: Solved by using the requests library to follow redirects, so the final destination URL is what gets scored.&lt;/p&gt;

&lt;p&gt;Data Imbalance: Used class_weight="balanced" and SMOTE oversampling.&lt;/p&gt;
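
&lt;p&gt;For the shortened-URL challenge, the post only says that requests was used to follow redirects. A minimal sketch of that idea could look like the following. The resolve_url name, the HEAD request, and the 5-second timeout are assumptions, not the author's exact code.&lt;/p&gt;

```python
def resolve_url(url, session=None, timeout=5.0):
    """Return the final URL after following redirects, or the input URL on failure."""
    if session is None:
        import requests  # third-party: pip install requests
        session = requests.Session()
    try:
        # HEAD avoids downloading the page body; allow_redirects follows the chain
        response = session.head(url, allow_redirects=True, timeout=timeout)
        return response.url
    except Exception:
        # Deliberately broad for a sketch: on any network error, keep the original URL
        return url

# Usage (needs network access, so it is left commented out):
#   final_url = resolve_url("https://example.com/short-link")
#   features = extract_features(final_url)
```

&lt;p&gt;Some shorteners may not answer HEAD requests, in which case a GET request would be needed instead. Resolving unknown links also means contacting potentially malicious hosts, so it is best done from an isolated environment.&lt;/p&gt;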

&lt;p&gt;Future Improvements:&lt;/p&gt;

&lt;p&gt;Add logo detection (OpenCV) to spot fake brand impersonations.&lt;/p&gt;

&lt;p&gt;Publish model on Hugging Face Spaces for community use.&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
