<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AnubhavBharadwaaj</title>
    <description>The latest articles on DEV Community by AnubhavBharadwaaj (@anubhavbharadwaaj).</description>
    <link>https://dev.to/anubhavbharadwaaj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879939%2F72bc4e46-9d3a-4222-b5ab-1ed791acf5c3.png</url>
      <title>DEV Community: AnubhavBharadwaaj</title>
      <link>https://dev.to/anubhavbharadwaaj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anubhavbharadwaaj"/>
    <language>en</language>
    <item>
      <title>I tested a 4B model vs a 70B model on research papers. The 4B model won</title>
      <dc:creator>AnubhavBharadwaaj</dc:creator>
      <pubDate>Wed, 15 Apr 2026 07:37:27 +0000</pubDate>
      <link>https://dev.to/anubhavbharadwaaj/i-tested-a-4b-model-vs-a-70b-model-on-research-papers-the-4b-model-won-hln</link>
      <guid>https://dev.to/anubhavbharadwaaj/i-tested-a-4b-model-vs-a-70b-model-on-research-papers-the-4b-model-won-hln</guid>
      <description>&lt;p&gt;I've been competing in ML competitions (OpenAI Parameter Golf, &lt;br&gt;
WorldQuant IQC) and kept hitting the same wall: I'd read a paper, &lt;br&gt;
understand it conceptually, but lose hours hunting for the actual &lt;br&gt;
learning rate on page 14, the calibration procedure buried in a &lt;br&gt;
footnote, and the failure mode mentioned once in a table caption.&lt;/p&gt;

&lt;p&gt;So I built a CLI tool that extracts all of that into a structured &lt;br&gt;
file. One command, ~2 minutes per paper. That part isn't surprising.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What surprised me is what happened when I gave those files to &lt;br&gt;
small models.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;The experiment&lt;/h2&gt;

&lt;p&gt;I took a 33-page quantization survey paper and asked 10 specific &lt;br&gt;
implementation questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What is the exact inference speedup of InceptionV3 with INT8?"&lt;/li&gt;
&lt;li&gt;"What is the energy cost of INT4 vs FP32 at 45nm?"&lt;/li&gt;
&lt;li&gt;"In symmetric quantization, what happens to zero point Z?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tested two setups:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup A:&lt;/strong&gt; Give the raw PDF to a large model (70B parameters)&lt;br&gt;
&lt;strong&gt;Setup B:&lt;/strong&gt; Give the pre-extracted skill file to a tiny model &lt;br&gt;
(4B parameters — runs on a phone)&lt;/p&gt;
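
&lt;p&gt;For reference, Setup B is just prompt stuffing: read the skill file, &lt;br&gt;
prepend it to the question, and send both to a small local model. A &lt;br&gt;
minimal sketch against Ollama's local HTTP API is below; the file name &lt;br&gt;
and model tag are placeholders, not necessarily the exact ones I used.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of Setup B: skill file + question -&gt; small local model.
# Assumes Ollama is running locally; file name and model tag are placeholders.
import requests

SKILL = open("skills/quantization-for-efficient-neural-networks.md").read()

def ask(question):
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "qwen2.5:3b",  # any ~4B local model works here
            "stream": False,
            "messages": [
                {"role": "system",
                 "content": "Answer only from this skill file:\n\n" + SKILL},
                {"role": "user", "content": question},
            ],
        },
        timeout=120,
    )
    return resp.json()["message"]["content"]

print(ask("What is the exact inference speedup of InceptionV3 with INT8?"))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;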
&lt;h2&gt;The result&lt;/h2&gt;

&lt;p&gt;The 4B model with the skill file gave more precise answers.&lt;/p&gt;

&lt;p&gt;Not "roughly equivalent." More precise. The 70B model with the &lt;br&gt;
raw PDF would say "approximately 2-4x speedup on GPU hardware." &lt;br&gt;
The 4B model with the skill file said "5.02x speedup on NVIDIA &lt;br&gt;
GTX 1080, reference [157]."&lt;/p&gt;
&lt;h2&gt;Why this happens&lt;/h2&gt;

&lt;p&gt;It's not magic. It's structural:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context window.&lt;/strong&gt; A 33-page PDF is ~50K tokens. A 4B model &lt;br&gt;
has an 8K context window, so it literally can't fit the PDF. A &lt;br&gt;
500-line skill file is ~4K tokens and fits easily (see the rough &lt;br&gt;
arithmetic after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Table parsing.&lt;/strong&gt; Small models are terrible at finding numbers &lt;br&gt;
in dense academic prose. A skill file puts every number in a &lt;br&gt;
labeled markdown table row. The model just reads a row.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hallucination reduction.&lt;/strong&gt; When a small model can't find &lt;br&gt;
information, it guesses. With structured skill files, the &lt;br&gt;
information is either there (in a labeled field) or not. No &lt;br&gt;
ambiguous prose to misinterpret.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Variable definitions.&lt;/strong&gt; A PDF says "α" in one paragraph and &lt;br&gt;
"the weighting coefficient" three pages later. A skill file says &lt;br&gt;
&lt;code&gt;α = weighting coefficient for student loss&lt;/code&gt; right next to the &lt;br&gt;
equation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
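
&lt;p&gt;The context-window point is easy to sanity-check with rough &lt;br&gt;
arithmetic. The figures below are estimates (roughly 4 characters &lt;br&gt;
per token), not measurements from the paper.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-the-envelope token math for point 1 (rough ~4 chars/token estimate).
def approx_tokens(n_chars):
    return n_chars // 4

pdf_tokens = approx_tokens(33 * 6000)    # ~33 pages x ~6,000 chars/page -&gt; ~50K tokens
skill_tokens = approx_tokens(500 * 32)   # ~500 lines x ~32 chars/line   -&gt; ~4K tokens
context = 8_000                          # typical 4B-model context window

print(f"PDF:   ~{pdf_tokens:,} tokens, fits 8K window: {pdf_tokens &lt;= context}")
print(f"Skill: ~{skill_tokens:,} tokens, fits 8K window: {skill_tokens &lt;= context}")
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;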
&lt;h2&gt;What the skill file looks like&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: quantization-for-efficient-neural-networks
description: "Use this skill when implementing model quantization,
  post-training quantization (PTQ), quantization-aware training
  (QAT), or mixed-precision inference."
---

## Uniform Quantization
Q(r) = Int(r/S) - Z
where:
  r = real-valued input (activation or weight)
  S = real-valued scaling factor
  Z = integer zero point

## Inference Speedup Data
| Model       | Quant Type | Hardware        | Speedup |
|-------------|------------|-----------------|---------|
| ResNet50    | INT8       | NVIDIA GTX 1080 | 3.89x   |
| InceptionV3 | INT8       | NVIDIA GTX 1080 | 5.02x   |
| BERT        | INT8       | (unspecified)   | 4.0x    |

## Key Takeaways
1. Use symmetric quantization for weights, asymmetric for activations
2. lr=1e-5 for QAT fine-tuning (NOT 1e-3 — causes oscillation)
3. Channelwise quantization for kernels — one scaling factor per channel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Every skill file follows this exact structure, whether I generate &lt;br&gt;
it today or six months from now, and whether it's a quantization &lt;br&gt;
paper or a distillation paper.&lt;/p&gt;
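
&lt;p&gt;As a quick check that these entries are directly usable, the &lt;br&gt;
Uniform Quantization line above translates to a few lines of NumPy. &lt;br&gt;
This sketch is my illustration, not part of the generated file; the &lt;br&gt;
S and Z values are arbitrary examples.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Uniform quantization exactly as the skill file states it: Q(r) = Int(r/S) - Z.
import numpy as np

def quantize(r, S, Z):
    return np.round(r / S).astype(np.int32) - Z

weights = np.array([-0.42, 0.0, 0.37, 1.05])
print(quantize(weights, S=0.01, Z=0))  # symmetric case: zero point Z is fixed at 0
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;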
&lt;h2&gt;The real value isn't accuracy — it's workflow&lt;/h2&gt;

&lt;p&gt;Could you get the same answer by uploading the PDF to Claude Opus? &lt;br&gt;
Yes. Claude reads PDFs excellently.&lt;/p&gt;

&lt;p&gt;But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you do that for 30 papers in one command? No.&lt;/li&gt;
&lt;li&gt;Will the output format be identical across months? No.&lt;/li&gt;
&lt;li&gt;Can you load the results into a 4B local model running offline? No.&lt;/li&gt;
&lt;li&gt;Do those ChatPDF sessions still exist six months later? No.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skill files go in your git repo. They travel with your codebase. &lt;br&gt;
They work in Claude, Cursor, Windsurf, Ollama — any tool that &lt;br&gt;
reads files.&lt;/p&gt;
&lt;h2&gt;The tool&lt;/h2&gt;

&lt;p&gt;It's called SkillForge. Single Python file, ~2000 lines, open source.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Free (uses OpenRouter free models)&lt;/span&gt;
python skillforge.py &lt;span class="nt"&gt;--arxiv&lt;/span&gt; 2103.13630 &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter

&lt;span class="c"&gt;# Batch mode — process your weekly reading list&lt;/span&gt;
python skillforge.py batch &lt;span class="nt"&gt;--list&lt;/span&gt; sources.txt &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter &lt;span class="nt"&gt;--paid&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cost: $0 with free models, ~$0.03/paper with paid mode.&lt;/p&gt;

&lt;p&gt;If the quality isn't high enough, it auto-escalates through &lt;br&gt;
stronger models (gemini-flash → deepseek → gemini-pro → claude-sonnet &lt;br&gt;
→ claude-opus) until the target is met.&lt;/p&gt;
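
&lt;p&gt;Conceptually the escalation is just a loop over an ordered model &lt;br&gt;
list, roughly like the sketch below; &lt;code&gt;generate_skill()&lt;/code&gt; and &lt;br&gt;
&lt;code&gt;score_skill()&lt;/code&gt; are stand-ins for the real generation and quality &lt;br&gt;
checks, and the 0.8 threshold is a made-up example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Conceptual sketch of the escalation ladder; generate_skill() and score_skill()
# stand in for the actual generation and scoring steps.
LADDER = ["gemini-flash", "deepseek", "gemini-pro", "claude-sonnet", "claude-opus"]
TARGET = 0.8  # hypothetical quality threshold

def build_skill(paper_text):
    skill = None
    for model in LADDER:
        skill = generate_skill(paper_text, model=model)
        if score_skill(skill) &gt;= TARGET:
            break
    return skill  # best attempt from the strongest model tried
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;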

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/AnubhavBharadwaaj/skillforge" rel="noopener noreferrer"&gt;https://github.com/AnubhavBharadwaaj/skillforge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo video:&lt;/strong&gt; &lt;a href="https://www.youtube.com/watch?v=O0J55eRcwZw" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=O0J55eRcwZw&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;The finding that small models + structured context beats large &lt;br&gt;
models + raw documents feels generalizable beyond papers. &lt;br&gt;
Any domain where you're feeding unstructured reference material &lt;br&gt;
to an LLM probably benefits from pre-structuring it — even if &lt;br&gt;
the structuring itself costs a frontier model call. You pay once; &lt;br&gt;
every subsequent query is cheaper and more accurate.&lt;/p&gt;

&lt;p&gt;Curious if anyone has seen similar results in other domains.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
