Introduction
Every file upload is a potential Trojan Horse. What looks like a harmless .pdf might carry embedded scripts, malware, or invisible metadata leaks. In a world where attackers hide in plain sight, file validation isn't optional - it's your firewall in disguise.
What is Inside:
ClamAV antivirus scanning
PII detection via regex
Metadata extraction (EXIF, PDF)
AES-based Fernet encryption
MIME validation & filename sanitisation
The Upload Threat Surface
Letting users upload files without deep inspection is like letting strangers walk into a secure building without a metal detector. Here's what could go wrong:
Executables disguised as PDFs
Embedded scripts in image metadata
PII hidden inside resumes or logs
Suspicious links in documents redirecting to phishing sites
We didn't want to build a simple validator - we wanted a full-blown File Threat Intelligence Unit.
Tech Stack
FastAPI: High-performance Python web framework
ClamAV: Open-source antivirus engine
Cryptography: Encryption and secure key handling
PyPDF2 & Pillow: Metadata extraction
Selenium: For future dynamic phishing detection
API Features Breakdown
1. File Upload and Sanitisation (/scan endpoint)
Filename sanitisation (to prevent path traversal)
MIME type validation using mimetypes
File size limit enforcement (default: 10MB)
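The three checks above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the helper names, the allow-list of MIME types, and the "upload" fallback name are assumptions made for the example.

```python
import mimetypes
import re
from pathlib import PurePosixPath

MAX_SIZE_BYTES = 10 * 1024 * 1024  # 10MB default mentioned in the post

# Assumed allow-list for illustration; the real service may accept other types.
ALLOWED_MIME_TYPES = {"application/pdf", "image/jpeg", "image/png", "text/plain", "text/csv"}


def sanitise_filename(raw_name: str) -> str:
    """Reduce a client-supplied name to a safe, flat filename."""
    # Normalise Windows separators, then keep only the final path component.
    name = PurePosixPath(raw_name.replace("\\", "/")).name
    # Replace anything outside a conservative character set.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Strip leading dots so the result is never "..", "." or a hidden file.
    name = name.lstrip(".")
    return name or "upload"  # hypothetical fallback name


def validate_upload(raw_name: str, data: bytes) -> str:
    """Return the sanitised filename, or raise ValueError if a check fails."""
    safe_name = sanitise_filename(raw_name)
    if len(data) > MAX_SIZE_BYTES:
        raise ValueError("file exceeds size limit")
    # mimetypes guesses from the extension only; it does not inspect content.
    mime_type, _ = mimetypes.guess_type(safe_name)
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    return safe_name
```

Because mimetypes looks only at the extension, a renamed executable would still pass this check. That is why the later stages (antivirus scan, header inspection) matter.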
2. Metadata Forensics
- PDF: Author, creation tool, creation/modification timestamps
- Images: EXIF tags including camera, GPS, lens info
3. Sensitive Data Pattern Matching
Scans .txt, .log, .csv files
Regex patterns detect:
- SSNs (e.g., 123-45-6789)
- Emails (e.g., user@example.com)
- Phone numbers (e.g., +1 -555-1234)
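A rough sketch of how the three pattern families above could be matched follows. The regular expressions are illustrative assumptions, not the ones used by the API, and they are deliberately simple: they will miss many real-world formats and can produce false positives.

```python
import re

# Illustrative patterns only; the project's real expressions are not shown in the post.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # Optional "+", 1-3 digit prefix, then 3 and 4 digits with loose separators.
    "phone": re.compile(r"(?<!\w)\+?\d{1,3}[\s-]*\d{3}[\s-]*\d{4}(?!\d)"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match per category, omitting categories with no hits."""
    findings: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings


sample = "Contact user@example.com or +1 -555-1234; SSN 123-45-6789."
print(find_pii(sample))
```

Keeping the patterns in one dictionary makes it easy to add categories later, or to swap the regex pass for an NLP-based detector.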
4. ClamAV Antivirus Scanning
Integrates with clamscan
Flags known malware or suspicious payloads
If malware is found:
- File is rejected
- Response contains detailed scan logs
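One possible way to wrap clamscan from Python is sketched below. It relies on clamscan's documented exit codes (0 for clean, 1 for a detection, 2 for an error) and on its "path: SignatureName FOUND" output lines. The function names and the shape of the returned dictionary are assumptions for the example, not the API's real response format.

```python
import subprocess


def interpret_clamscan(returncode: int, output: str) -> dict:
    """Map a clamscan exit code and its output to a simple result dict."""
    # clamscan reports each detection on a line ending in "FOUND".
    detections = [
        line.strip() for line in output.splitlines() if line.rstrip().endswith("FOUND")
    ]
    if returncode == 0:
        return {"status": "clean", "detections": []}
    if returncode == 1:
        return {"status": "infected", "detections": detections}
    # Exit code 2 (or anything unexpected) means the scan itself failed.
    return {"status": "error", "detections": [], "detail": output.strip()}


def scan_file(path: str) -> dict:
    """Run clamscan on one file; requires ClamAV to be installed and on PATH."""
    proc = subprocess.run(
        ["clamscan", "--no-summary", path],
        capture_output=True,
        text=True,
        check=False,  # exit code 1 is a normal "infected" result, not a crash
    )
    return interpret_clamscan(proc.returncode, proc.stdout + proc.stderr)
```

Splitting the parsing from the subprocess call keeps the decision logic testable without ClamAV installed. An upload handler would reject the file whenever the status is anything other than "clean".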
5. Encryption + Download Flow
Files passing all checks are encrypted using Fernet (AES-128)
Stored with .enc extension
Decryption handled by /decrypt endpoint
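The encrypt-then-store flow might look roughly like this with the cryptography package. The helper names and the choice to append .enc to the full filename are assumptions for illustration, and the sketch leaves out key storage, which in practice needs a secrets manager or similar.

```python
from pathlib import Path

from cryptography.fernet import Fernet, InvalidToken


def encrypt_file(src: Path, key: bytes) -> Path:
    """Write an encrypted copy of src next to it, with .enc appended."""
    dest = src.with_name(src.name + ".enc")
    # Fernet tokens are authenticated: AES-128-CBC plus an HMAC-SHA256 tag.
    dest.write_bytes(Fernet(key).encrypt(src.read_bytes()))
    return dest


def decrypt_file(enc_path: Path, key: bytes) -> bytes:
    """Return the plaintext; raises InvalidToken on a wrong key or tampering."""
    return Fernet(key).decrypt(enc_path.read_bytes())


if __name__ == "__main__":
    import tempfile

    key = Fernet.generate_key()  # in a real service, load this from secure storage
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "report.pdf"
        original.write_bytes(b"%PDF-1.4 demo")
        encrypted = encrypt_file(original, key)
        print(encrypted.name)
        print(decrypt_file(encrypted, key) == original.read_bytes())
```

Since decryption fails loudly with InvalidToken when the ciphertext or key is wrong, a /decrypt handler can treat that exception as a client error rather than returning corrupted data.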
Example: Uploading a File:
Example: Decryption Flow
Sample Error Response (Virus Detected)
Architecture Diagram
Challenges & Learnings
ClamAV setup in Docker was tricky (cron needed for DB updates)
Regex PII detection is basic - needs NLP upgrade for production
MIME spoofing is still a risk; deeper header (magic-byte) analysis is recommended
Setup and Deployment
Dockerfile
requirements.txt

Pro Tip: Mount a volume to persist ClamAV DB or update frequently using cron
Final Thoughts: Security Starts at the Upload Button
This API isn't just a validator - it's a mini forensic unit in your DevSecOps pipeline. It combines antivirus scanning, encryption, metadata intelligence, and PII detection under one /scan endpoint, with dynamic phishing analysis planned as a future addition.
Whether you are building a job portal, CMS, or enterprise upload system - this API is your first line of defense.
Try it Yourself
Live Demo: /scan FastAPI




