Karthik Vankayalapati

Posted on Apr 26

TrustShield AI: Multi-Layer Phishing Detection Framework Using Machine Learning

#python #machinelearning #cybersecurity #flask

TrustShield AI is a multi-layered, AI-driven phishing detection framework designed to identify and mitigate sophisticated email-based attacks in real time. Built on a three-tier architecture comprising a frontend dashboard, a Flask-based asynchronous backend, and a MongoDB persistence layer, the system fuses six independent intelligence signals to achieve detection accuracy of approximately 95-98%.

🎯 Key Features

Feature	Specification
Detection latency	`< 200 ms`
Detection accuracy	`≈ 95-98%`
Real-time processing	`Asynchronous Flask backend`
Living retraining	`Continuous model adaptation`
Chrome Extension	`Manifest V3 integration`
SOC Dashboard	`Real-time monitoring interface`

🏗️ System Architecture

TrustShield AI is structured into three logical tiers. This separation allows each tier to scale, fail and be replaced independently of the others.

🔧 Three-Tier Design

Frontend Dashboard 📊 - Web-based SOC interface for security analysts
Backend ⚙️ - Flask (Python) with asyncio for asynchronous processing
Database 💾 - MongoDB for persistence and real-time analytics

🛠️ Technology Stack

graph TB
    A[Frontend Dashboard] --> B[Flask Backend]
    B --> C[MongoDB Database]
    D[Chrome Extension] --> B
    E[ML Models] --> B
    F[URL Intelligence] --> B
    G[Rule Engine] --> B
    H[LLM Analysis] --> B

Layer	Technology	Purpose
Frontend	`HTML5, CSS3, JavaScript (Vanilla)`	Dashboard UI, real-time updates
Backend	`Flask (Python) · asyncio`	API server, async processing
Database	`MongoDB (PyMongo)`	Data persistence, analytics
ML Library	`scikit-learn · pandas · numpy`	Model training and inference
Models	`LogReg · RF · GBM · Linear SVM`	Classification algorithms
LLM Assist	`Ollama (phi model, local)`	Semantic analysis
Extension	`Chrome MV3`	Browser integration

🔍 Detection Engine

The detection engine is the analytic core of TrustShield AI. Each incoming email is normalized, vectorized and dispatched to a non-blocking executor that runs ML inference alongside five rule-driven intelligence modules.
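
The normalize-then-dispatch step can be sketched as follows. This is a minimal illustration, not the project's code: `normalize`, `blocking_predict` and `score_email` are hypothetical names, and the keyword-based predictor is only a stand-in for a synchronous model call. The point is the pattern: blocking inference runs in a worker thread via `run_in_executor`, so the event loop stays free for other requests.

```python
import asyncio
import re
from concurrent.futures import ThreadPoolExecutor


def normalize(text: str) -> str:
    """Collapse runs of whitespace, trim the ends and lowercase the body."""
    return re.sub(r"\s+", " ", text).strip().lower()


def blocking_predict(text: str) -> float:
    """Hypothetical stand-in for a synchronous model call (not the real model)."""
    suspicious = ("verify", "password", "urgent")
    hits = sum(word in text for word in suspicious)
    return hits / len(suspicious)


async def score_email(raw: str, executor: ThreadPoolExecutor) -> float:
    """Run the blocking predictor in a worker thread instead of on the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_predict, normalize(raw))


def demo() -> float:
    """Score one sample email using a small thread pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        return asyncio.run(score_email("URGENT:  verify your   password", pool))
```

A thread pool suits inference code that releases the GIL; for pure-Python CPU-bound work, a process pool may be the better fit.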

⚡ Aggressive Fusion Strategy

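TrustShield uses a strategy referred to internally as aggressive fusion. Every layer returns a numeric score in the range [0, 1], where higher values indicate greater phishing likelihood. The layer scores are combined as a weighted sum whose weights total 1.0, so the fused score also lies in [0, 1].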

# Aggressive Fusion Algorithm
final_score = (
    ml_prediction * 0.35 +      # Machine Learning
    url_intelligence * 0.25 +  # URL Analysis  
    rule_heuristics * 0.20 +   # Rule Engine
    emotional_analysis * 0.10 + # Emotion Detection
    behavioral_anomalies * 0.07 + # Behavior Analysis
    llm_semantic * 0.03         # LLM Understanding
)

# Higher scores mean greater phishing likelihood, so flag at or above the threshold
verdict = "phishing" if final_score >= 0.4 else "legitimate"
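
For readers who want to try the fusion on concrete numbers, here is a small self-contained sketch. The weights and the 0.4 cut-off are taken from the formula above; the function names `fuse` and `verdict` and the input validation are assumptions added for illustration. Because higher scores mean greater phishing likelihood, the sketch labels an email as phishing when the fused score reaches the threshold.

```python
import math

# Weights copied from the fusion formula; THRESHOLD is its 0.4 cut-off.
WEIGHTS = {
    "ml": 0.35,
    "url": 0.25,
    "rules": 0.20,
    "emotion": 0.10,
    "behavior": 0.07,
    "llm": 0.03,
}
THRESHOLD = 0.4


def fuse(scores: dict) -> float:
    """Return the weighted sum of per-layer scores, each expected in [0, 1]."""
    if not math.isclose(sum(WEIGHTS.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def verdict(final_score: float, threshold: float = THRESHOLD) -> str:
    """Higher fused scores mean greater phishing likelihood."""
    return "phishing" if final_score >= threshold else "legitimate"


# Made-up layer scores, purely to show the arithmetic.
sample = {"ml": 0.9, "url": 0.8, "rules": 0.5,
          "emotion": 0.2, "behavior": 0.1, "llm": 0.0}
```

With these made-up inputs the fused score is 0.315 + 0.2 + 0.1 + 0.02 + 0.007 + 0 = 0.642, which is above the threshold.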

📊 SOC Dashboard

The Security Operations Centre (SOC) dashboard is a web-based interface that allows security analysts to monitor the live behaviour of TrustShield AI.

🎯 Dashboard Features

The dashboard surfaces four primary views:

📈 Live activity feed - Every scan and every verdict, streamed in real time
📊 Risk levels and trends - Hourly and daily phishing pressure, segmented per tenant
🤖 Model information - Active model version, accuracy, calibration and drift indicators
🚨 Alerts and notifications - High-risk verdicts, drift alarms and pipeline failures
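
The "hourly phishing pressure per tenant" view boils down to a bucketed count. The sketch below shows one way to compute it in plain Python. The record layout is an assumption: `label` and `timestamp` mirror the dataset schema, while `tenant` is a hypothetical field added here for the example.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_phishing_counts(records):
    """Count phishing verdicts per (tenant, hour) bucket."""
    counts = Counter()
    for record in records:
        if record["label"] != "phishing":
            continue
        # Truncate the timestamp to the start of its hour
        hour = record["timestamp"].replace(minute=0, second=0, microsecond=0)
        counts[(record["tenant"], hour)] += 1
    return counts


# Made-up records, for illustration only.
base = datetime(2026, 4, 26, 10, 15, tzinfo=timezone.utc)
example_records = [
    {"tenant": "acme", "label": "phishing", "timestamp": base},
    {"tenant": "acme", "label": "phishing", "timestamp": base.replace(minute=59)},
    {"tenant": "acme", "label": "legitimate", "timestamp": base},
    {"tenant": "beta", "label": "phishing", "timestamp": base.replace(hour=11)},
]
```

In a deployment backed by MongoDB, the same grouping would more likely be pushed down into a database aggregation rather than done in application code.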

🔍 Real-time Threat Monitoring

The real-time threat monitoring interface displays live phishing detection results, risk scores, and automated threat intelligence feeds from the TrustShield AI system.

🔌 Chrome Extension Integration

TrustShield AI integrates with browser-based email clients through a Chrome Manifest V3 browser extension.

🔄 Extension Workflow

sequenceDiagram
    participant U as User
    participant E as Extension
    participant A as API
    participant D as Database

    U->>E: Opens email
    E->>E: Extract content & URLs
    E->>A: Send to /analyze endpoint
    A->>A: Process through detection layers
    A->>E: Return verdict & score
    E->>U: Display risk indicator
    A->>D: Store results for retraining

📋 Extension Process

📖 Reads the email content and extracts URLs from the active DOM
📤 Sends the payload to the Flask /analyze endpoint with a rotating API key
📱 Displays the risk score, classification and triggered rules to the user
🔄 Mirrors the verdict to the SOC dashboard through the same logging pipeline
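
On the server side, the steps above imply three checks before any scoring happens: authenticate the key, validate the payload, then run the detection layers. The framework-free sketch below shows that order. Everything in it is an assumption for illustration: the function names, the payload keys `content` and `urls`, and the idea of accepting both the current and the previous key during rotation. A Flask view for /analyze would wrap logic of this shape.

```python
import hmac


def is_authorized(presented_key: str, valid_keys) -> bool:
    """Constant-time comparison against every currently accepted key."""
    return any(
        hmac.compare_digest(presented_key.encode(), key.encode())
        for key in valid_keys
    )


def handle_analyze(payload, api_key, valid_keys, scorer, threshold=0.4):
    """Return (status_code, body) for one /analyze request."""
    if not is_authorized(api_key, valid_keys):
        return 401, {"error": "invalid API key"}
    if not isinstance(payload, dict):
        return 400, {"error": "payload must be a JSON object"}
    content = payload.get("content")
    urls = payload.get("urls", [])
    if not isinstance(content, str) or not content.strip():
        return 400, {"error": "content must be a non-empty string"}
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        return 400, {"error": "urls must be a list of strings"}
    score = scorer(content, urls)  # stand-in for the detection layers
    label = "phishing" if score >= threshold else "legitimate"
    return 200, {"verdict": label, "score": score}
```

Accepting a small set of keys rather than a single one lets a rotation roll out without rejecting requests that are already in flight.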

🧠 Living Retraining Dataset

A central design principle of TrustShield AI is that the model must learn from the traffic it sees. The system does not rely solely on static phishing corpora.

📚 Dataset Schema

Field	Type	Description
`email_id`	`ObjectId`	Unique identifier
`timestamp`	`ISODate`	Time of analysis (UTC)
`content`	`Text`	Email body content
`urls`	`Array`	Extracted URLs
`label`	`Enum`	`phishing` or `legitimate`
`confidence_score`	`Float [0,1]`	Model probability
`risk_level`	`Enum`	`low · medium · high · critical`
`source`	`Enum`	`dashboard` or `extension`
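
To make the schema concrete, here is a hedged sketch of a helper that builds one record and rejects values outside the listed enums. `build_record` is a hypothetical name, and a UUID hex string stands in for the `ObjectId` that MongoDB would normally assign, so the example runs without a database driver.

```python
import uuid
from datetime import datetime, timezone

LABELS = {"phishing", "legitimate"}
RISK_LEVELS = {"low", "medium", "high", "critical"}
SOURCES = {"dashboard", "extension"}


def build_record(content, urls, label, confidence_score, risk_level, source):
    """Validate the enum and range fields, then return a schema-shaped dict."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    if risk_level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {risk_level}")
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    if not 0.0 <= confidence_score <= 1.0:
        raise ValueError("confidence_score must be in [0, 1]")
    return {
        "email_id": uuid.uuid4().hex,  # stand-in for an ObjectId (assumption)
        "timestamp": datetime.now(timezone.utc),  # time of analysis, UTC
        "content": content,
        "urls": list(urls),
        "label": label,
        "confidence_score": float(confidence_score),
        "risk_level": risk_level,
        "source": source,
    }
```

Validating at write time keeps malformed rows out of the retraining set, where they would be much harder to spot later.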

🚀 Deployment & Performance

⚡ Core Engine Implementation

import asyncio
from typing import Dict, List

async def analyze_email(email_content: str, urls: List[str]) -> Dict:
    """Run all detection layers concurrently and fuse their scores."""

    # Each analyzer method is assumed to be a coroutine, since asyncio.gather expects awaitables
    tasks = [
        ml_predictor.predict(email_content),
        url_analyzer.check_urls(urls),
        rule_engine.evaluate(email_content),
        emotion_analyzer.analyze(email_content),
        behavior_detector.analyze(email_content),
        llm_analyzer.analyze(email_content)
    ]

    ml_score, url_score, rule_score, emotion_score, behavior_score, llm_score = await asyncio.gather(*tasks)

    # Aggressive fusion with configurable weights
    final_score = (
        ml_score * 0.35 +
        url_score * 0.25 +
        rule_score * 0.20 +
        emotion_score * 0.10 +
        behavior_score * 0.07 +
        llm_score * 0.03
    )

    return {
        'verdict': 'phishing' if final_score >= 0.4 else 'legitimate',
        'confidence': final_score,
        'risk_level': calculate_risk_level(final_score),
        'layer_scores': {
            'ml': ml_score,
            'url': url_score,
            'rules': rule_score,
            'emotion': emotion_score,
            'behavior': behavior_score,
            'llm': llm_score
        }
    }
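
The engine code calls `calculate_risk_level` without showing it. One possible shape is sketched below. The four labels come from the dataset schema, but the cut-offs at 0.4, 0.6 and 0.8 are assumptions chosen for this example, not values documented by the project.

```python
from bisect import bisect_right

# Assumed cut-offs; only the four labels come from the dataset schema.
RISK_CUTOFFS = (0.4, 0.6, 0.8)
RISK_LABELS = ("low", "medium", "high", "critical")


def calculate_risk_level(final_score: float) -> str:
    """Map a fused score in [0, 1] to one of the four risk labels."""
    if not 0.0 <= final_score <= 1.0:
        raise ValueError("final_score must be in [0, 1]")
    # bisect_right counts how many cut-offs the score has reached or passed
    return RISK_LABELS[bisect_right(RISK_CUTOFFS, final_score)]
```

Placing the lowest cut-off at the verdict threshold means every email labelled phishing is at least `medium` risk, which keeps the two fields consistent.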

📊 Performance Metrics

Metric	Value
Latency	`< 200 ms` per email
Throughput	`1000+` emails/minute
Accuracy	`95-98%`
False Positive Rate	`< 2%`
Coverage	`100%` of inbound emails
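
For clarity on how the accuracy and false positive rate figures are defined, the sketch below computes both from confusion-matrix counts. The counts are invented purely to show the arithmetic and are not TrustShield measurements. "Positive" here means an email flagged as phishing.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all emails that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of legitimate emails that were wrongly flagged as phishing."""
    return fp / (fp + tn)


# Hypothetical counts over 1000 emails (500 phishing, 500 legitimate).
TP, TN, FP, FN = 480, 492, 8, 20
```

With these illustrative counts, accuracy is 972 / 1000 = 0.972 and the false positive rate is 8 / 500 = 0.016.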

📈 Future Enhancements

🔮 Planned Features

⚡ Edge inference - Execution of the model inside the extension itself
🤖 Autonomous remediation - Automatic quarantine and sender blocking
🏢 Multi-tenant support - Isolated environments for different organizations
🧠 Advanced LLM integration - Fine-tuned models for specific phishing patterns
📱 Mobile app - Native applications for iOS and Android

🔬 Research Directions

🎯 Zero-day phishing detection - Using unsupervised learning for novel attack patterns
🔄 Cross-platform integration - Support for Outlook, Gmail, and other email clients
⛓️ Blockchain integration - Immutable audit trails for compliance
🤝 Federated learning - Collaborative model training across organizations

🛠️ Getting Started

📋 Prerequisites

✅ Python 3.8+
✅ MongoDB 4.4+
✅ Node.js 16+
✅ Chrome Browser (for extension)

🚀 Installation

# Clone the repository
git clone https://github.com/karthikeya1498/PFSD-BLOG.git
cd PFSD-BLOG

# Install backend dependencies
pip install -r requirements.txt

# Install frontend dependencies
npm install

# Start MongoDB
mongod

# Run the Flask backend
python app.py

# Run the frontend
npm run dev

⚙️ Configuration

Set up your MongoDB connection string in config.py
Configure your Ollama instance for LLM integration
Load the pre-trained ML models from models/
Install the Chrome extension from extension/
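
The contents of config.py are not shown in this post, so the following is only a guess at a reasonable layout. The setting names (`MONGO_URI`, `OLLAMA_URL`, `MODEL_DIR`) are assumptions. The defaults use the standard local ports for MongoDB (27017) and Ollama (11434) and the `models/` directory mentioned above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    ollama_url: str
    model_dir: str


def load_settings(env=None) -> Settings:
    """Read settings from environment variables, falling back to local defaults."""
    env = os.environ if env is None else env
    return Settings(
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        model_dir=env.get("MODEL_DIR", "models/"),
    )
```

Reading from the environment keeps connection strings out of version control, which matters once the MongoDB URI carries credentials.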

🤝 Contributing

We welcome contributions to TrustShield AI!

🔗 Source Code: GitHub Repository
🌐 Live Demo: TrustShield AI Blog
🐛 Issues: GitHub Issues

🛡️ TrustShield AI · 2026

Written by TrustShield AI Team

This blog was last edited on 26 April 2026, by TrustShield AI Team. Text is available under the open documentation license; the source code is published on github.com/Tejus468/pfsd_project.
