Kevin

Posted on Nov 10

Weekend Project: Building a Serverless Phishing Detector for Google's Cloud Run Hackathon

#cloud #python #api #cybersecurity

I created this blog post for the purposes of entering the Google Cloud Run hackathon.

When I set out to build ParsePhish for the Google Cloud Run hackathon, I thought I had a solid plan: create a dual-purpose API that could analyze both emails AND URLs for phishing indicators. What I ended up with was a much more focused, production-ready solution - and a valuable lesson about the power of simplicity in AI applications.

First, what is ParsePhish?

ParsePhish is a REST API that uses transformer embeddings and GPU-accelerated similarity search to analyze email content for phishing indicators. It runs entirely serverless on Google Cloud Run with NVIDIA L4 GPUs. It was designed to be privacy-respecting and inexpensive to run. The idea is that you can detect phishing messages without sending them to a service you don't control.

The Original Vision (And Its Problems)

My initial idea was ambitious: why not detect phishing in both email content AND suspicious URLs? I built a FastAPI service with two endpoints:

/analyze/email - for email content analysis
/analyze/url - for URL analysis

The email analysis worked beautifully. Using SentenceTransformers embeddings generated on the Cloud Run GPUs and FAISS (Facebook AI Similarity Search) to look up the nearest known examples, it could accurately identify phishing patterns in email text. But the URL analysis? That's where things got interesting.

curl -X POST "$API_URL/analyze/url" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://google.com"}'
# Response: {"phishy_score": 0.8, "verdict": "Potential phishing"}

Google.com with a phishing score of 0.8? Something was very wrong.

The problem wasn't with the AI model; it was with my approach. The URL analysis was fundamentally flawed and unlikely to ever be useful because:

Context Loss: A URL without user context is just a string. The real phishing happens in how the link is presented to the reader.
Domain Complexity: Modern web infrastructure (CDNs, subdomains, redirects) breaks simple pattern matching, so even a score for a URL pulled from a sketchy email would be unlikely to do the user any good.
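
To make the domain-complexity point concrete, here is a small sketch (not code from ParsePhish) of how a naive substring check gets fooled by a lookalike subdomain. The helper names and example URLs are hypothetical. Even the stricter hostname check only tells you which domain a link points to, not how it was presented to the reader.

```python
from urllib.parse import urlsplit


def naive_is_google(url: str) -> bool:
    """Naive pattern match: does the brand's domain appear anywhere in the URL?"""
    return "google.com" in url.lower()


def hostname_is_google(url: str) -> bool:
    """Stricter check: the parsed hostname must be google.com or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "google.com" or host.endswith(".google.com")


# Hypothetical URLs chosen for illustration only.
urls = [
    "https://google.com",
    "https://accounts.google.com/signin",
    "https://google.com.account-verify.example/login",  # lookalike subdomain
    "https://example.net/redirect?to=https://google.com",  # brand only in the query string
]

for url in urls:
    print(f"{url}: naive={naive_is_google(url)} hostname={hostname_is_google(url)}")
```

The last two URLs pass the naive check even though neither is hosted on google.com, which is the kind of ambiguity that made a standalone URL score hard to trust.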

What I actually built

What remained after I stripped out the URL analysis feature was a laser-focused email phishing detection API that works reliably.

Here's the breakdown:

Architecture:

FastAPI backend deployed on Cloud Run with NVIDIA L4 GPUs
SentenceTransformers (intfloat/e5-small-v2) for GPU-accelerated 384-dimensional text embeddings
FAISS CPU for high-speed similarity search against known phishing patterns
Python container

AI Pipeline:

Text normalization and preprocessing
GPU-accelerated transformer embedding generation
FAISS similarity search against training corpus
Intelligent scoring combining similarity votes with pattern matching
Real-time response with explanatory details
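
The pipeline steps above can be sketched in plain Python. This is a minimal illustration under loud assumptions, not the ParsePhish implementation: the three-dimensional vectors stand in for the 384-dimensional e5 embeddings, the brute-force cosine ranking stands in for the FAISS lookup, and the cue list, the 0.7 weight, and every function name are hypothetical.

```python
import math
import re

URL_RE = re.compile(r"https?://\S+")
# Hypothetical urgency cues; the real pattern list is not shown in this post.
CUE_RE = re.compile(r"\b(urgent|immediately|click here|verify your account|suspended|deleted)\b")


def normalize(text: str) -> str:
    """Lowercase, replace URLs with a placeholder token, and collapse whitespace."""
    text = URL_RE.sub(" <url> ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_vote(query_vec, corpus, k: int = 3) -> float:
    """Fraction of the k most similar corpus items that are labeled phishing (1)."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top) if top else 0.0


def pattern_score(text: str) -> float:
    """Share of distinct urgency cues found in the text, capped at 1.0."""
    hits = set(CUE_RE.findall(normalize(text)))
    return min(1.0, len(hits) / 3)


def phishy_score(vote: float, pattern: float, weight: float = 0.7) -> float:
    """Weighted blend of the similarity vote and the pattern score (assumed weighting)."""
    return weight * vote + (1 - weight) * pattern


# Toy (vector, label) corpus: label 1 = phishing, 0 = legitimate.
CORPUS = [
    ([0.9, 0.1, 0.0], 1),
    ([0.8, 0.2, 0.1], 1),
    ([0.1, 0.9, 0.2], 0),
    ([0.0, 0.8, 0.3], 0),
]

query = [0.85, 0.15, 0.05]  # pretend embedding of the incoming email
vote = knn_vote(query, CORPUS, k=3)
pattern = pattern_score("URGENT! Click here or your account will be deleted!")
print(round(phishy_score(vote, pattern), 3))
```

In a real deployment the toy vectors would come from the embedding model and the ranking from a FAISS index; the blending step is where an explanation ("2 of 3 nearest examples were phishing, 3 urgency cues matched") could be assembled for the response.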

Cloud Run Configuration:

16Gi memory + 4 vCPUs for GPU instances
Scale-to-zero for cost efficiency
HTTPS-only with automatic TLS

Check it out here:

kevinl95 / ParsePhish

ParsePhish uses NVIDIA L4 GPUs on Google Cloud Run to detect email phishing entirely serverless! Transformer embeddings + FAISS similarity search = real-time protection without storing your data.

The text ParsePhish displayed underneath a shield with a fish on it.

ParsePhish Email Analysis API

GPU-powered phishing detection for email content
Serverless AI-powered email analysis using transformer embeddings and similarity search.

ParsePhish is a REST API that uses transformer embeddings and GPU-accelerated similarity search to analyze email content for phishing indicators. Built for the Cloud Run GPU Category hackathon, it runs entirely serverless on Google Cloud Run with NVIDIA L4 GPUs.

Quick Start

Prerequisites

Before deploying, you'll need:

Google Cloud CLI: Install gcloud
Google Cloud Project: Create a project with billing enabled
Authentication: Run gcloud auth login and gcloud auth application-default login
GPU Quota: Request NVIDIA L4 GPU quota in the europe-west4 region
Required APIs: The deployment script will enable these automatically:
- Cloud Run API
- Cloud Build API
- Container Registry API

Deploy to Cloud Run

# Clone the repository
git clone https://github.com/kevinl95/ParsePhish.git
cd ParsePhish
# Deploy with GPU support to your Google Cloud project
./deploy.sh YOUR_PROJECT_ID

…

Once you deploy it, give it a shot:

# Obvious phishing email
curl -X POST "$API_URL/analyze/email" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "URGENT! Click here or your account will be deleted!",
    "subject": "Security Alert"
  }'

# Legitimate email
curl -X POST "$API_URL/analyze/email" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Your monthly statement is available in your account portal.",
    "subject": "Statement Ready"
  }'

Here, API_URL is the base URL of your deployed ParsePhish service.
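
If you would rather call the API from Python than from curl, a standard-library client might look like the sketch below. The `content` and `subject` fields and the `/analyze/email` path come from the curl examples; the function names and the 30-second timeout are my own assumptions, and since the response schema for this endpoint isn't shown here, the sketch simply returns whatever JSON comes back.

```python
import json
import os
import urllib.request


def build_email_request(api_url: str, content: str, subject: str) -> urllib.request.Request:
    """Build a JSON POST request for the /analyze/email endpoint."""
    payload = json.dumps({"content": content, "subject": subject}).encode("utf-8")
    return urllib.request.Request(
        api_url.rstrip("/") + "/analyze/email",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_email(api_url: str, content: str, subject: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    request = build_email_request(api_url, content, subject)
    # The timeout is an assumed value; a scale-to-zero service may need extra
    # time to answer the first request after a cold start.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    api_url = os.environ.get("API_URL")
    if api_url:  # only call out when a deployment URL is configured
        result = analyze_email(
            api_url,
            "URGENT! Click here or your account will be deleted!",
            "Security Alert",
        )
        print(result)
```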

Where next?

I would love to prototype extensions for Thunderbird and other email clients so people can take advantage of this tool right from their favorite email apps. For now, I'm excited to get this out in front of the community to see if it can be valuable for anyone building tools that can make use of an easy-to-deploy phishing detector.
