DEV Community

RAHUL KUMAR THAKUR


Behind the Upload: Building a Zero-Trust File Inspection API with FastAPI

Introduction

Every file upload is a potential Trojan horse. What looks like a harmless .pdf might carry embedded scripts, malware, or hidden metadata leaks. In a world where attackers hide in plain sight, file validation isn't optional; it's your firewall in disguise.

What's Inside

  • ClamAV antivirus scanning

  • PII detection via regex

  • Metadata extraction (EXIF, PDF)

  • AES-based Fernet encryption

  • MIME validation & filename sanitisation

The Upload Threat Surface

Letting users upload files without deep inspection is like letting strangers walk into a secure building without passing through a metal detector. Here's what could go wrong:

  • Executables disguised as PDFs

  • Embedded scripts in image metadata

  • PII hidden inside résumés or logs

  • Suspicious links in documents redirecting to phishing sites

We didn't want to build a simple validator; we wanted a full-blown File Threat Intelligence Unit.

Tech Stack

  • FastAPI: High-performance Python web framework

  • ClamAV: Open-source antivirus engine

  • Cryptography: Encryption and secure key handling

  • PyPDF2 & Pillow: Metadata extraction

  • Selenium: For future dynamic phishing detection

API Features Breakdown

1. File Upload and Sanitisation (/scan endpoint)

  • Filename sanitisation (to prevent path traversal)

  • MIME type validation using Python's mimetypes module

  • File size limit enforcement (default: 10MB)
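The three checks above could look roughly like the following minimal sketch. It uses only the standard library; the allow-list of MIME types, the helper names (sanitize_filename, validate_upload) and the exact character whitelist are assumptions for illustration, not the project's actual code. Keep in mind that mimetypes.guess_type only looks at the file extension, so it says nothing about what the bytes really are.

```python
import mimetypes
import re
import unicodedata
from typing import Optional

# Assumed values for illustration; the post only states a 10MB default limit.
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB
ALLOWED_MIME_TYPES = {  # hypothetical allow-list
    "application/pdf",
    "image/jpeg",
    "image/png",
    "text/plain",
    "text/csv",
}


def sanitize_filename(filename: str) -> str:
    """Hypothetical helper: drop directory parts and unsafe characters to block path traversal."""
    name = unicodedata.normalize("NFKC", filename)
    # Treat both / and \ as separators and keep only the last path component.
    name = name.replace("\\", "/").split("/")[-1]
    # Replace anything outside a conservative character set.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Strip leading dots so names like ".." or ".htaccess" can't slip through.
    name = name.lstrip(".")
    return name or "upload"


def guess_mime(filename: str) -> Optional[str]:
    """Extension-based guess only; the file content is not inspected."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def validate_upload(filename: str, size: int) -> str:
    """Return the sanitised filename, or raise ValueError if a check fails."""
    safe = sanitize_filename(filename)
    if size > MAX_FILE_SIZE:
        raise ValueError(f"file exceeds {MAX_FILE_SIZE} bytes")
    mime = guess_mime(safe)
    if mime not in ALLOWED_MIME_TYPES:
        raise ValueError(f"MIME type {mime!r} is not allowed")
    return safe
```

In a FastAPI handler you would pass UploadFile.filename and the number of bytes actually read, and translate the ValueError into a 400 or 413 response.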

2. Metadata Forensics

  • PDF: Author, creation tool, creation and modification timestamps
  • Images: EXIF tags including camera, GPS, lens info
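Two post-processing steps tend to come up here. PDF document-info dates (the /CreationDate and /ModDate entries that a library such as PyPDF2 reads) use the PDF date format D:YYYYMMDDHHmmSS followed by an optional UTC offset such as +05'30', and EXIF GPS coordinates are stored as degree/minute/second values plus an N/S/E/W reference. The helpers below are a hypothetical, stdlib-only sketch for turning both into readable values; they assume the raw values have already been extracted from the file and are not part of PyPDF2 or Pillow.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

# PDF date format: D:YYYYMMDDHHmmSS, then optionally Z or +HH'mm' / -HH'mm'.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(\d{2})?'?(\d{2})?'?)?$"
)


def parse_pdf_date(raw: str) -> Optional[datetime]:
    """Hypothetical helper: convert a raw /CreationDate or /ModDate string to a datetime."""
    match = _PDF_DATE.match(raw.strip())
    if not match:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()
    try:
        tz = None  # no offset given: leave the datetime naive
        if sign in ("Z", "z"):
            tz = timezone.utc
        elif sign in ("+", "-"):
            offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
            tz = timezone(offset if sign == "+" else -offset)
        return datetime(
            int(year), int(month or 1), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
            tzinfo=tz,
        )
    except ValueError:  # e.g. month 13 or an absurd offset in a malformed file
        return None


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Hypothetical helper: EXIF GPS (degrees, minutes, seconds) + ref -> decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative in decimal notation.
    return -value if ref.strip().upper() in ("S", "W") else value
```

For example, a photo whose GPS latitude reads 51° 30' 0" N converts to 51.5, exactly the kind of location leak worth flagging in the scan report.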

3. Sensitive Data Pattern Matching

  • Scans .txt, .log, .csv files

  • Regex patterns detect:

    • SSNs (e.g., 123-45-6789)
    • Emails (e.g., user@example.com)
    • Phone numbers (e.g., +1-555-1234)
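A minimal sketch of the regex pass for the three categories above. The patterns are illustrative assumptions (US-style SSNs, a simplified email pattern, and a loose phone pattern that fits the +1-555-1234 shape), not the post's actual expressions; on real data they will both miss matches and produce false positives.

```python
import os
import re
from typing import Dict, List

# Illustrative patterns only; tune or replace them for real data.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "phone": re.compile(r"(?<!\w)\+?\d{1,3}[\s-]?\d{3}[\s-]?\d{4}\b"),
}

# The text formats the post says are scanned.
SCANNABLE_EXTENSIONS = {".txt", ".log", ".csv"}


def find_pii(text: str) -> Dict[str, List[str]]:
    """Return all matches per PII category, omitting categories with no hits."""
    hits: Dict[str, List[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits


def scan_text_file(path: str) -> Dict[str, List[str]]:
    """Hypothetical helper: run find_pii on .txt/.log/.csv files, skip everything else."""
    if os.path.splitext(path)[1].lower() not in SCANNABLE_EXTENSIONS:
        return {}
    # errors="replace" keeps a single bad byte from aborting the whole scan.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_pii(fh.read())
```

Returning the matched strings makes the report easy to read, but in production you would probably mask them (for example, keep only the last four digits) so the scan response doesn't itself leak PII.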

4. ClamAV Antivirus Scanning

  • Integrates with clamscan

  • Flags known malware or suspicious payloads

  • If malware is found:

    • File is rejected
    • Response contains detailed scan logs
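One way to wire in clamscan is to shell out with subprocess and rely on its documented exit codes: 0 means no virus was found, 1 means at least one virus was found, and 2 means an error occurred. This sketch assumes clamscan is on the PATH with an up-to-date signature database; the ScanResult type and the function names are hypothetical. Infected files show up as lines of the form "<path>: <signature name> FOUND", which is what the parser relies on.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanResult:
    clean: bool
    signatures: List[str] = field(default_factory=list)
    log: str = ""  # raw scanner output, returned to the client as the scan log


def interpret_clamscan(returncode: int, output: str) -> ScanResult:
    """Map clamscan exit codes (0 clean, 1 infected, 2 error) to a ScanResult."""
    if returncode == 0:
        return ScanResult(clean=True, log=output)
    if returncode == 1:
        signatures = [
            line.rsplit(": ", 1)[1][: -len(" FOUND")]
            for line in output.splitlines()
            if line.endswith(" FOUND") and ": " in line
        ]
        return ScanResult(clean=False, signatures=signatures, log=output)
    # Anything else is a scanner failure: fail closed instead of calling it clean.
    raise RuntimeError(f"clamscan failed (exit {returncode}): {output}")


def scan_with_clamav(path: str, timeout: int = 120) -> ScanResult:
    """Run clamscan on a single file and interpret the result."""
    proc = subprocess.run(
        ["clamscan", "--no-summary", path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return interpret_clamscan(proc.returncode, proc.stdout + proc.stderr)
```

Treating exit code 2 as an error keeps the pipeline fail-closed: if the scanner breaks, uploads are rejected rather than silently accepted. For a busy API, a long-running clamd daemon is usually much faster than clamscan, which reloads the whole signature database on every invocation.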

5. Encryption + Download Flow

  • Files passing all checks are encrypted using Fernet (AES-128 in CBC mode, authenticated with HMAC-SHA256)

  • Stored with an .enc extension

  • Decryption handled by /decrypt endpoint
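The encrypt/decrypt round trip can be sketched with the cryptography package's Fernet class, which provides generate_key, encrypt and decrypt. The function names and the "<name>.enc" naming scheme are assumptions. In a real deployment the key would come from a secret store or environment variable rather than being generated on each run.

```python
from pathlib import Path

from cryptography.fernet import Fernet, InvalidToken


def encrypt_file(src: Path, key: bytes) -> Path:
    """Hypothetical helper: encrypt src with Fernet and write <name>.enc next to it."""
    token = Fernet(key).encrypt(src.read_bytes())
    dst = src.with_name(src.name + ".enc")
    dst.write_bytes(token)
    return dst


def decrypt_file(enc_path: Path, key: bytes) -> bytes:
    """Return the plaintext bytes.

    Raises InvalidToken if the key is wrong or the ciphertext was modified.
    """
    return Fernet(key).decrypt(enc_path.read_bytes())


__all__ = ["encrypt_file", "decrypt_file", "InvalidToken"]
```

Because Fernet tokens are authenticated, decrypting with the wrong key or a tampered file raises InvalidToken instead of returning garbage, which a /decrypt endpoint can map to an error response. Fernet works on whole messages in memory, which is acceptable under a 10 MB upload cap.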

Example: Uploading a File
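A file reaches the /scan endpoint as a multipart/form-data POST. The client sketch below uses only the standard library. The URL (http://localhost:8000/scan) and the form field name "file" are hypothetical placeholders, and it assumes the endpoint answers with JSON, which is FastAPI's default. With the third-party requests library the same call shrinks to requests.post(url, files={"file": fh}).

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Tuple

# Hypothetical values: adjust to wherever the API is actually deployed.
SCAN_URL = "http://localhost:8000/scan"
FIELD_NAME = "file"


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(path: str, url: str = SCAN_URL) -> dict:
    """POST a local file to the scan endpoint and return the decoded JSON reply."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(FIELD_NAME, os.path.basename(path), fh.read())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Usage would be a single call such as upload("resume.pdf") against a running instance. Note that urlopen raises HTTPError for 4xx/5xx replies, which is where a rejected (for example, infected) file would surface.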


Response

Example: Decryption Flow

Sample Error Response (Virus Detected)

Architecture Diagram

Challenges & Learnings

  • ClamAV setup in Docker was tricky (a cron job was needed to keep the virus database updated)

  • Regex PII detection is basic; it needs an NLP upgrade for production

  • MIME spoofing is still a risk; deeper header analysis is recommended
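Because mimetypes only trusts the extension, one form of deeper analysis is to compare the first bytes of the file against known signatures ("magic bytes"). A minimal sketch, assuming a small hand-picked signature table; the table and function names are illustrative, and a libmagic binding such as python-magic covers far more formats. Plain-text formats like .txt, .csv and .log have no signature, so this check only helps for binary types.

```python
from typing import Optional

# A few well-known file signatures; extend as needed.
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"PK\x03\x04": "application/zip",
    b"MZ": "application/x-msdownload",  # Windows PE executable
    b"\x7fELF": "application/x-executable",  # Linux ELF binary
}


def sniff_mime(header: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for magic, mime in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return mime
    return None


def matches_declared_type(header: bytes, declared_mime: str) -> bool:
    """True only if the content signature agrees with the extension-derived type."""
    return sniff_mime(header) == declared_mime
```

Reading the first 16 bytes or so of the upload is enough for this table. An "invoice.pdf" that starts with MZ is an executable in disguise and should be rejected before it ever reaches ClamAV.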

Setup and Deployment

Dockerfile

requirements.txt


Pro Tip: Mount a volume to persist the ClamAV database, or update it frequently using cron.

Final Thoughts: Security Starts at the Upload Button

This API isn't just a validator; it's a mini forensic unit in your DevSecOps pipeline. It combines antivirus scanning, encryption, metadata intelligence, and phishing analysis, all behind a single /scan endpoint.

Whether you're building a job portal, a CMS, or an enterprise upload system, this API is your first line of defense.

Try it Yourself

Live Demo: /scan FastAPI
