tags: ai, security, phishing, next.js, node.js, cybersecurity
Introduction
Every day, millions of people fall victim to phishing scams. Scammers don't just target people in English-speaking countries; they attack globally, using local languages to make their deceptive messages seem more trustworthy.
This is where ScamDetect comes in.
I built ScamDetect to solve this problem - a free, multilingual AI-powered platform that can detect phishing attempts, scams, and malicious URLs in multiple languages. Whether you speak English, Spanish, French, Arabic, or Chinese, ScamDetect can help you identify and avoid online scams before they harm you.
Why I Built This Project
The motivation came from two key realizations:
1. Phishing is a Global Problem
According to recent statistics, over 3.4 billion phishing emails are sent every day. What's worse, non-English speakers are often underserved by existing security tools. A scammer might convince someone in Indonesia by sending a message in Indonesian, but government and commercial security tools often focus on English-language threats.
2. A friend of mine was scammed
A friend of mine was duped through a phishing attack, and it really made me think about how easily these things can happen. Many times people don't recognize phishing messages until it's already too late.
I've always had a strong interest in cybersecurity and phishing awareness, and whenever I get the opportunity, I try to educate people about how these scams work.
When the lingo.dev hackathon came up, I saw it as an opportunity to build something that could actually help people beyond just awareness.
Key Features of ScamDetect
1. Multilingual Text Message Analysis
Users can paste a suspicious text message (SMS, WhatsApp, Telegram, etc.) and ScamDetect analyzes it for phishing indicators. The detection is language-independent, so a message in Spanish, Arabic, or Portuguese can be analyzed with the same accuracy.
2. URL Phishing Scanning
ScamDetect checks suspicious URLs against two powerful threat intelligence databases:
- VirusTotal API - checks against 80+ antivirus engines
- PhishTank - checks against known phishing databases
This multi-source approach ensures fewer false positives while catching more real threats.
3. Screenshot OCR Detection
Many scams come from screenshots of fake banking apps, fake payment screens, or fabricated messages. ScamDetect can:
- Extract text from screenshots using Google Vision OCR (I ran into problems enabling billing despite several attempts, and I could not log into Microsoft Azure to use its OCR tool, so I stayed with Google Vision OCR since its only blocker is the billing issue)
- Analyze the extracted text for phishing indicators
- All without needing the user to manually type anything
4. AI-Powered Scam Classification
Using Ollama with the gpt-oss:120b-cloud model, ScamDetect runs an advanced AI classifier to determine if a message is genuinely a phishing attempt or a legitimate message. This goes beyond simple keyword matching: the AI understands context and language patterns.
5. Results Translated into Your Language
All analysis results are automatically translated into the user's preferred language using the Lingo.dev translation API.
6. User Dashboard
Users can:
- View their scan history
- Track patterns in scams they've encountered
Architecture Overview
ScamDetect follows a modern full-stack architecture:
┌─────────────────────────────────────┐
│     Frontend (Next.js + React)      │
│  • User Interface                   │
│  • Language Selection               │
│  • Input Handling                   │
└────────────┬────────────────────────┘
             │
             │ HTTP/HTTPS
             │
┌────────────▼────────────────────────┐
│     Backend (Node.js + Express)     │
│  • API Endpoints                    │
│  • Detection Pipeline               │
│  • Service Orchestration            │
└────────────┬────────────────────────┘
             │
     ┌───────┴──┬───────────────┬──────────────────┐
     ▼          ▼               ▼                  ▼
┌──────────┐┌────────┐┌────────────────────┐┌─────────────┐
│ Database ││  OCR   ││ AI Model           ││ APIs (VT,   │
│          ││        ││                    ││ PhishTank)  │
│ Supabase ││ Google ││ Ollama             ││ Translation │
│          ││ Vision ││ gpt-oss:120b-cloud ││ Lingo.dev   │
└──────────┘└────────┘└────────────────────┘└─────────────┘
Data Flow:
- User Input → Frontend captures text, URL, or screenshot image
- Frontend sends request → Backend API with language preference
- Backend processes:
  - Extracts keywords and URLs
  - Checks domain similarity using the Levenshtein distance method
  - Submits URLs to VirusTotal/PhishTank
  - Runs AI classification if text is detected
- Results are compiled → Risk score calculated
- Translation layer → Results translated to user's language
- Response sent back → Frontend displays rich visualization
- Data persisted → Results stored in Supabase for user's dashboard
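The domain similarity step in the flow above can be sketched roughly as follows. This is a minimal illustration, not ScamDetect's actual code: the function names, the list of known domains, and the distance threshold of 2 are all assumptions made for the example.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed list of frequently impersonated domains (illustrative only).
const KNOWN_DOMAINS = ["paypal.com", "google.com", "microsoft.com"];

// Returns the known domain that `hostname` imitates, or null if none is close.
function findLookalike(hostname: string, maxDistance = 2): string | null {
  const host = hostname.toLowerCase().replace(/^www\./, "");
  for (const known of KNOWN_DOMAINS) {
    const distance = levenshtein(host, known);
    // Distance 0 is the genuine domain; a small non-zero distance suggests a lookalike.
    if (distance > 0 && distance <= maxDistance) return known;
  }
  return null;
}
```

With this sketch, a hostname such as paypa1.com is one substitution away from paypal.com and would be flagged, while the genuine paypal.com is not.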
Flow Chart:
Important Code Examples
Let's look at real code from ScamDetect and understand how it works.
1. Sending Text to Backend for Analysis
Frontend Code - how users send text messages for analysis:
// frontend/src/lib/api.ts
export const api = {
  /** POST /api/analyze-message */
  analyzeMessage: async (
    message: string,
    language = "en",
  ): Promise<DetectionResult> =>
    fetch(`${API_URL}/api/analyze-message`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        ...(await authHeaders()), // Include user auth token
      },
      body: JSON.stringify({ message, language }),
    }).then(handleResponse<DetectionResult>),
};
What this does:
- Takes a suspicious message and target language
- Sends it to the backend /api/analyze-message endpoint
- Includes authentication headers (if user is logged in)
- Returns structured analysis results
Why it's important:
- Simple, type-safe API using TypeScript
- Supports authentication without exposing tokens
- Error handling built-in via handleResponse
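The handleResponse helper itself is not shown in this article. A minimal sketch of what such a helper could look like is given here, assuming the backend returns errors as a JSON body of the form { error: "..." } as the controllers in this post do. The ResponseLike interface is a stand-in for the subset of the fetch Response that the sketch uses.

```typescript
// Minimal shape of a fetch Response that this sketch relies on.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Hypothetical implementation; the project's real helper may differ.
async function handleResponse<T>(res: ResponseLike): Promise<T> {
  if (!res.ok) {
    // Prefer the backend's { error: "..." } message, fall back to the status code.
    let message = `Request failed with status ${res.status}`;
    try {
      const body = (await res.json()) as { error?: string } | null;
      if (body && typeof body.error === "string") message = body.error;
    } catch {
      // Body was not valid JSON; keep the generic message.
    }
    throw new Error(message);
  }
  return (await res.json()) as T;
}
```

Throwing an Error with a readable message lets the calling page show it directly in its error state.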
Backend Code - how the backend processes the message:
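// backend/src/controllers/analyze.controller.ts
export async function analyzeMessage(
  req: Request,
  res: Response,
): Promise<void> {
  const { message, language } = req.body;
  // Reject empty or non-string input before running the pipeline
  if (typeof message !== "string" || !message.trim()) {
    res.status(400).json({ error: "message is required" });
    return;
  }
  try {
    // Run the full detection pipeline
    const result = await runDetectionPipeline(message, language ?? "en");
    // Save result to database asynchronously; log failures instead of hiding them
    saveDetectionResult(message, result, req.userId).catch((err) =>
      console.error("[DB] Failed to save detection result:", err),
    );
    // Return results immediately
    res.json(result);
  } catch (err) {
    res.status(500).json({ error: "Detection pipeline failed" });
  }
}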
What this does:
- Validates and extracts message and language from request
- Runs the detection pipeline (keyword matching, URL extraction, AI classification)
- Saves results to database for user history
- Returns response immediately (doesn't wait for database save)
Why it's important:
- Fast response time: doesn't block on database operations
- Persistent history for user analysis
- Graceful error handling
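The pipeline internals are not shown in this post, so here is a rough sketch of what the keyword matching and URL extraction steps might look like. The keyword list, the weights, the score thresholds, and the risk level names other than "SAFE" are assumptions made for illustration.

```typescript
interface Flag {
  type: string;
  detail: string;
  score: number;
}

// Illustrative phrases and weights; the project's real lists are not shown.
const PHISHING_KEYWORDS: Record<string, number> = {
  "verify your account": 30,
  urgent: 15,
  password: 20,
};

// Pull every http(s) URL out of free text.
function extractUrls(text: string): string[] {
  return text.match(/https?:\/\/[^\s<>"']+/gi) ?? [];
}

// Emit one flag per known phrase found in the message (case-insensitive).
function keywordFlags(text: string): Flag[] {
  const lower = text.toLowerCase();
  return Object.entries(PHISHING_KEYWORDS)
    .filter(([phrase]) => lower.includes(phrase))
    .map(([phrase, score]) => ({ type: "keyword", detail: phrase, score }));
}

// Map a total score to a coarse risk level (thresholds are assumed).
function riskLevel(score: number): "SAFE" | "SUSPICIOUS" | "HIGH_RISK" {
  if (score >= 50) return "HIGH_RISK";
  if (score >= 20) return "SUSPICIOUS";
  return "SAFE";
}
```

In a fuller pipeline, the extracted URLs would go on to the URL checks and the flags would be merged with the AI classification before the final score is computed.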
2. Running VirusTotal API for URL Scanning
Backend Service - how we check URLs against VirusTotal:
// backend/src/services/virustotalService.ts
import axios from "axios";
/** Submit a URL to VirusTotal; returns the analysis ID, or null on failure */
export async function submitUrl(url: string): Promise<string | null> {
  const key = apiKey();
  if (!key) return null; // No API key configured: skip the scan
  try {
    const body = new URLSearchParams();
    body.set("url", url);
    const res = await axios.post<{ data: { id: string } }>(
      `${VT_API_URL}/urls`,
      body,
      {
        headers: {
          "x-apikey": key,
          "Content-Type": "application/x-www-form-urlencoded",
        },
      },
    );
    return res.data.data.id;
  } catch (err) {
    return null;
  }
}
/**
 * Get detailed analysis of a VirusTotal scan
 */
export async function getAnalysis(
  analysisId: string,
): Promise<VTDetailedResult | null> {
  const key = apiKey();
  if (!key) return null;
  try {
    const res = await axios.get<VTAnalysisResponse>(
      `${VT_API_URL}/analyses/${analysisId}`,
      {
        headers: { "x-apikey": key },
      },
    );
    const attrs = res.data.data.attributes;
    return {
      vtAnalysisId: analysisId,
      status: attrs.status,
      maliciousCount: attrs.stats.malicious || 0,
      // Count engines whose verdict is "phishing"; every engine appears in
      // results whether or not it flagged the URL, so presence alone means nothing
      phishingCount: Object.values(attrs.results || {}).filter(
        (r) => r.result === "phishing",
      ).length,
      harmlessCount: attrs.stats.harmless || 0,
      suspiciousCount: attrs.stats.suspicious || 0,
      undetectedCount: attrs.stats.undetected || 0,
      engines: Object.entries(attrs.results || {}).map(
        ([name, result]) => ({
          name,
          category: result.category,
          result: result.result,
        }),
      ),
    };
  } catch (err) {
    return null;
  }
}
What this does:
- Submits a URL to VirusTotal's API for scanning
- Fetches the analysis results for a given analysis ID
- Returns detailed information including malicious engines and detection counts
- Gracefully handles API key missing or API failures
Why it's important:
- VirusTotal provides crowd-sourced threat intelligence from 80+ antivirus engines
- A single engine flagging a URL might be a false positive, but multiple engines agreeing is a strong signal
- Decoupling URL checking from our own detection logic keeps our system modular
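A freshly submitted URL is usually still queued when the analysis is first requested, so the caller has to ask again until the analysis reports that it is finished. The helper below is a generic sketch of that retry loop, not the project's code: pollUntil, its option names, and the default interval are assumptions. The 15 second default is chosen to stay within the free tier limit of 4 requests per minute mentioned later in this post.

```typescript
// Calls `fetchOnce` repeatedly until `isDone` accepts the result
// or the attempt budget runs out.
async function pollUntil<T>(
  fetchOnce: () => Promise<T | null>,
  isDone: (value: T) => boolean,
  { intervalMs = 15000, maxAttempts = 8 } = {},
): Promise<T | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await fetchOnce();
    if (value !== null && isDone(value)) return value;
    if (attempt < maxAttempts) {
      // Wait before the next attempt so we stay under the rate limit.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return null; // Still not finished; the caller can continue without this result.
}
```

With the service above, a call could look like pollUntil(() => getAnalysis(id), (r) => r.status === "completed"), where id is the value returned by submitUrl.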
3. Extracting Text from Screenshots Using OCR
Backend Service - how we extract text from screenshot images:
// backend/src/services/ocrService.ts
import { ImageAnnotatorClient } from "@google-cloud/vision";
export interface OCRResult {
  extractedText: string;
  confidence?: number;
}
/** Extract text from an image using Google Cloud Vision API */
export async function extractTextFromImage(
  imageBase64: string,
): Promise<OCRResult> {
  const client = createClient(); // Helper defined elsewhere in this file
  const request = {
    image: { content: imageBase64 },
  };
  const [response] = await client.documentTextDetection(request);
  const fullTextAnnotation = response.fullTextAnnotation;
  if (!fullTextAnnotation) {
    return { extractedText: "", confidence: 0 };
  }
  // Calculate average confidence from all symbols on the first page
  const blocks = fullTextAnnotation.pages?.[0]?.blocks ?? [];
  const confidences = blocks
    .flatMap((b) => b.paragraphs || [])
    .flatMap((p) => p.words || [])
    .flatMap((w) => w.symbols || [])
    .map((s) => s.confidence || 0);
  const avgConfidence =
    confidences.length > 0
      ? confidences.reduce((a, b) => a + b, 0) / confidences.length
      : 0;
  return {
    extractedText: fullTextAnnotation.text || "",
    confidence: avgConfidence,
  };
}
Backend Controller β how the OCR endpoint works:
// backend/src/controllers/screenshot.controller.ts
export async function scanScreenshot(
  req: Request,
  res: Response,
): Promise<void> {
  const { imageBase64, language } = req.body;
  try {
    // Extract text from image
    const { extractedText } = await extractTextFromImage(imageBase64);
    if (!extractedText.trim()) {
      // Send the response, then return nothing to match Promise<void>
      res.json({
        riskLevel: "SAFE",
        message: "No text found in image",
      });
      return;
    }
    // Analyze the extracted text
    const result = await runDetectionPipeline(extractedText, language ?? "en");
    // Save for history; log failures instead of hiding them
    saveDetectionResult(imageBase64, result, req.userId).catch((err) =>
      console.error("[DB] Failed to save detection result:", err),
    );
    res.json(result);
  } catch (err) {
    res.status(500).json({ error: "Screenshot scanning failed" });
  }
}
What this does:
- Takes a base64-encoded image from the frontend
- Uses Google Cloud Vision to extract text
- Calculates confidence score based on per-character confidence values
- Runs the normal detection pipeline on extracted text
- Returns phishing analysis results
Why it's important:
- OCR allows users to share screenshots instead of typing (better UX)
- Confidence score helps users understand if text extraction was reliable
- Image-based scams (fake banking screens, fake payment apps) are very common, so this feature is critical
4. Translating Results with Lingo.dev
Backend Service - translating analysis results:
// backend/src/services/translationService.ts
import { LingoDotDevEngine } from "lingo.dev/sdk";
const engine = new LingoDotDevEngine({
  apiKey: process.env.LINGODOTDEV_API_KEY,
});
export async function translateText(
  text: string,
  targetLanguage: string,
): Promise<string> {
  // Skip translation if no API key or target is English
  if (!process.env.LINGODOTDEV_API_KEY || targetLanguage === "en") {
    return text;
  }
  const result = await engine.localizeText(text, {
    sourceLocale: "en",
    targetLocale: targetLanguage,
  });
  return result ?? text;
}
Detection Pipeline β integrating translation into results:
export async function runDetectionPipeline(
  text: string,
  language = "en",
): Promise<DetectionResult> {
  // ... analysis code ...
  const riskLevel = calculateRiskLevel(totalScore);
  const recommendation = generateRecommendation(riskLevel);
  // Default to the English text so the field is never undefined
  let translatedRecommendation = recommendation;
  // Translate recommendation to user's language
  if (language !== "en") {
    translatedRecommendation = await translateText(recommendation, language);
  }
  return {
    riskLevel,
    score: totalScore,
    flags,
    recommendation: translatedRecommendation,
    extractedUrls,
    language,
  };
}
What this does:
- Uses Lingo.dev SDK to translate text between languages
- Skips translation if no API key (graceful degradation)
- Integrates into the detection pipeline to translate analysis recommendations
Why it's important:
- Makes the tool accessible globally
- Users aren't forced to read English analysis results
- Lingo.dev handles nuanced localization (not just word translation)
5. Frontend Component for User Interaction
Frontend Page - scanning URLs with real-time feedback:
// frontend/src/app/scan-url/page.tsx
"use client"; // Required because the page uses React state
import { useState } from "react";
export default function ScanUrlPage() {
  const [url, setUrl] = useState("");
  const [result, setResult] = useState<DetectionResult | null>(null);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const { language, setLanguage } = useLanguage();
  async function handleScan() {
    if (!url.trim()) return;
    setLoading(true);
    setError(null);
    setResult(null);
    try {
      // Call backend API
      const data = await api.checkUrl(url, language);
      setResult(data);
    } catch (err) {
      setError(err instanceof Error ? err.message : "Scan failed");
    } finally {
      setLoading(false);
    }
  }
  return (
    <div className="min-h-screen px-4 py-16">
      <div className="mx-auto max-w-2xl">
        {/* Header */}
        <h1 className="font-mono text-3xl font-bold text-[#e2e8ff]">
          Scan <span className="text-[#ff00ff]">URL</span>
        </h1>
        {/* Input Panel */}
        <input
          type="url"
          value={url}
          onChange={(e) => setUrl(e.target.value)}
          onKeyDown={(e) => e.key === "Enter" && handleScan()}
          placeholder="https://suspicious-site.com"
          className="w-full rounded border bg-[rgba(255,0,255,0.03)] py-3 px-4 text-[#e2e8ff]"
        />
        {/* Scan Button */}
        <button
          onClick={handleScan}
          disabled={loading}
          className="mt-4 px-6 py-2 bg-[#ff00ff] rounded text-white font-bold disabled:opacity-50"
        >
          {loading ? "Scanning..." : "Scan URL"}
        </button>
        {/* Results */}
        {result && <ResultPanel result={result} />}
        {error && <div className="text-red-500">{error}</div>}
      </div>
    </div>
  );
}
What this does:
- Provides user interface for URL scanning
- Handles loading states and error display
- Calls the backend API with user's language preference
- Uses React hooks to manage state
- Keyboard support (Enter to scan)
Challenges I Faced
Building ScamDetect involved solving several complex problems:
1. OCR Accuracy & Confidence Level
The Challenge:
Image quality varies widely. I first tried a common OCR library (Tesseract), but it missed characters and skipped parts of the text. That pushed me to look at alternatives that work better on screenshots, such as Google Vision OCR and Microsoft Azure's OCR tool. I chose Google Vision OCR because I have worked with other Google Cloud services.
After setting up Google Vision OCR, I hit an error because the service requires billing to be enabled, and Google could not verify any of the cards I tried. I then turned to Microsoft Azure: I signed up and filled in every detail, thinking I was close, but I could not log in afterwards. I kept receiving this error message: "interaction_required: AADSTS5000225: This tenant has been blocked due to inactivity. To learn more about tenant lifecycle policies, see https://aka.ms/TenantLifecycle". I kept the Google Vision OCR setup since its only blocker is the billing issue.
The Solution:
No solution yet, because I have not figured out how to resolve the billing issue. I have done everything Google asked of me, but the problem remains.
My plan is to subscribe to Hugging Face and use one of the models there, such as Zai, for OCR extraction. I could have used the DeepSeek OCR model in Ollama, but the model is large and I might not be able to run it in production; it is best used locally.
// Only flag as high-risk if confidence is good
if (confidence < 0.7) {
  console.warn("Low OCR confidence: reducing detection sensitivity");
  flags = flags.filter((f) => f.score > 30); // Keep only high-scoring flags
}
2. Integrating AI Models Reliably
The Challenge:
Ollama (local AI) and the VirusTotal API can both fail intermittently. We can't let one failing service break the entire detection pipeline.
The Solution:
- Made all external service calls optional
- If VirusTotal fails, we still return keyword-based detection
- If Ollama doesn't respond, we skip AI classification but complete the analysis
- Return partial results instead of errors
- This is called "graceful degradation"
// Ollama AI classification is optional
try {
  const { classifyWithOllama } = await import("./ollamaService");
  aiClassification = (await classifyWithOllama(text)) ?? undefined;
} catch (err) {
  console.error("[AI] Error:", err);
  // Continue without AI classification
}
// Return results even if external services failed
return {
  riskLevel,
  score: calculateRiskScore(flags), // Built from flags we do have
  flags,
  message: "Detection completed with available services",
};
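The same idea can be generalized so that every optional service is wrapped once instead of adding a try/catch per call. The sketch below is one way to do that under stated assumptions: runOptionalChecks and the check names in the usage note are hypothetical and do not come from the ScamDetect codebase.

```typescript
type CheckOutcome<T> =
  | { name: string; ok: true; value: T }
  | { name: string; ok: false };

// Runs all optional checks in parallel and keeps whatever succeeded.
async function runOptionalChecks<T>(
  checks: Record<string, () => Promise<T>>,
): Promise<{ results: Record<string, T>; failed: string[] }> {
  const outcomes = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]): Promise<CheckOutcome<T>> => {
        try {
          return { name, ok: true, value: await check() };
        } catch {
          // Swallow the failure but remember which check did not complete.
          return { name, ok: false };
        }
      },
    ),
  );
  const results: Record<string, T> = {};
  const failed: string[] = [];
  for (const outcome of outcomes) {
    if (outcome.ok) results[outcome.name] = outcome.value;
    else failed.push(outcome.name);
  }
  return { results, failed };
}
```

A caller could pass something like { virustotal: () => scanUrl(url), ai: () => classify(text) } (both function names are placeholders) and build the final result from results, while failed tells the user which services were unavailable.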
3. Managing API Call Costs and Rate Limits
The Challenge:
- VirusTotal has rate limits (free tier: 4 requests/min)
- Translation API has per-character costs
- Scanning screenshots with Google Cloud Vision costs money
- We needed to avoid wasting money on repeated scans of the same URLs
The Solution:
- Implemented result caching in Supabase
- If a URL was scanned recently, return cached result instead of making new API call
- Batch translation requests when possible
- Use express-rate-limit middleware to protect our backend
- Rate limit by user and IP address
// Rate limiting in express
const limiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // Max 100 requests per IP per 15 min
message: { error: "Too many requests, please try again later." },
});
app.use(limiter);
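The caching side of the solution is not shown above. As a rough illustration of the "return the cached result if the URL was scanned recently" rule, here is an in-memory sketch. It is a stand-in for the Supabase-backed cache, and TtlCache, cachedScan, and the injected clock are assumptions made for the example.

```typescript
// Small time-to-live cache; entries expire `ttlMs` after they are stored.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so the expiry logic can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // Expired: drop it so the caller rescans.
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Returns a cached result when available, otherwise scans and stores the result.
async function cachedScan<V>(
  cache: TtlCache<V>,
  url: string,
  scan: (url: string) => Promise<V>,
): Promise<V> {
  const key = url.trim();
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const fresh = await scan(key);
  cache.set(key, fresh);
  return fresh;
}
```

Replacing the Map with a database table keyed by URL and a scanned-at timestamp would give the same behavior across server restarts.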
Setup Guide for Developers
Want to run ScamDetect yourself? Here's how to get started.
Prerequisites
Before you begin, make sure you have:
- Node.js 20+
- Ollama installed
- Git
You'll also need API keys from:
- Supabase (free) - supabase.com
- VirusTotal (free) - virustotal.com
- Google Cloud Vision (free tier) - cloud.google.com
- Lingo.dev (for translation) - lingo.dev
Step 1: Clone the Repository
git clone https://github.com/blaycoder/ScamDetect-Multilingual
cd ScamDetect-Multilingual
Step 2: Setup Backend
# Navigate to backend directory
cd backend
# Install dependencies
npm install
# Copy environment file
cp .env.example .env
# Edit .env with your API keys
nano .env
# Fill in:
# SUPABASE_URL=your-supabase-url
# SUPABASE_KEY=your-supabase-key
# VIRUSTOTAL_API_KEY=your-virustotal-key
# GOOGLE_CLOUD_CREDENTIALS=your-google-vision-json
# LINGODOTDEV_API_KEY=your-lingo-key
Start Ollama first (in a separate terminal):
ollama serve
Run the backend server:
npm run dev
You should see:
[Express] Server running on port 4000
[Ollama] Connected to local model
Step 3: Setup Frontend
In a new terminal:
# Navigate to frontend directory
cd frontend
# Install dependencies
npm install
# Create environment file
cp .env.example .env.local
# Edit .env.local
nano .env.local
# Fill in:
# NEXT_PUBLIC_API_URL=http://localhost:4000
# NEXT_PUBLIC_SUPABASE_URL=your-supabase-url
# NEXT_PUBLIC_SUPABASE_KEY=your-supabase-key
Run the frontend:
npm run dev
Open your browser to http://localhost:3000
Step 4: Test the Application
- Test text scanning: Go to "Analyze Message" and paste a phishing message
- Test URL scanning: Go to "Scan URL" and paste a suspicious URL
- Test screenshot: Go to "Upload Screenshot" and upload a screenshot
If everything works, you should see detection results with risk scores!
Troubleshooting
Backend won't start:
- Make sure Ollama is running (ollama serve)
- Check that port 4000 isn't in use: lsof -i :4000
Frontend can't reach backend:
- Make sure NEXT_PUBLIC_API_URL points to your backend
- Check CORS settings in backend/src/app.ts
OCR not working:
- Verify Google Cloud Vision credentials are correct
- Check that you have the right JSON file format
- Check that billing is enabled on Google Cloud
- Confirm that Google Vision OCR is enabled in APIs & Services
Ollama responses are slow:
- First response can take 10-30 seconds
- Subsequent responses should be faster (model is cached in memory)
Real World Applications
ScamDetect isn't just a technical project; it has real impact on people's lives.
Use Case 1: Protecting Vulnerable Communities
A grandmother in rural India receives a WhatsApp message claiming to be from her bank. The message asks her to "verify her account" by clicking a link. She's not tech-savvy and the message looks official.
Instead of losing ₹50,000 to the scammer, she pastes the message into ScamDetect. The system detects phishing keywords and suspicious domain patterns. The result appears in Hindi, her native language. She learns not to click the link.
Impact: One person saved from financial loss.
Use Case 2: Supporting Small Business Owners
A small business owner in Mexico receives an email claiming to be from "PayPal Support" asking him to verify his account. He doesn't speak English well, but he can use ScamDetect to analyze the email in Spanish.
ScamDetect detects:
- Phishing keywords
- Domain impersonation (fake PayPal domain)
- VirusTotal flags from multiple antivirus engines
Impact: Business owner avoids losing business payment data.
Use Case 3: Educational Outreach
Schools and cybersecurity awareness programs can use ScamDetect to teach students about phishing in their native language. Instead of abstract lessons, students can:
- Scan real (anonymized) phishing attempts
- See how detection works
- Understand the techniques scammers use
Impact: Next generation grows up more phishing-aware.
Key Takeaways
Building ScamDetect taught me several important lessons:
Accessibility matters - A tool is only useful if people can actually use it. Multilingual support isn't a nice-to-have, it's essential.
AI + APIs = powerful combinations - Combining local AI (Ollama) with threat intelligence APIs (VirusTotal) creates something stronger than either alone.
Graceful degradation - Don't let one failing service break your whole system. Build systems that work with partial data.
Open source tools are powerful - Ollama, Mistral, Google Cloud Vision, and VirusTotal's free tier made this project possible without massive cloud budgets.
Real-world problems inspire good design - Building something that helps people avoid financial loss is more motivating than building for the sake of building.
Get to know me:
GitHub: github.com/blaycoder
LinkedIn: https://www.linkedin.com/in/ayomide-onatola-3180281a5
Conclusion
Phishing and scams are one of the biggest cybersecurity threats facing everyday people. Language barriers shouldn't make anyone more vulnerable.
ScamDetect is my attempt to make the internet a little bit safer, one multilingual detection at a time.
If you found this interesting, I'd love to hear your thoughts! Feel free to comment, ask questions, or share your own experiences building security tools.
Before I drop my pen, check out this post written by me to understand why your privacy is important: Understanding your privacy is very important
Stay safe out there! 🛡️
