Andrei P.
You Don't Need a Neural Network to Spot a Deepfake

Most detection pipelines today are black boxes — a neural network says "fake" and you just trust it. I wanted to see how far pure statistics could go. No deep learning. Just handcrafted image features and a logistic regression.

The results were better than I expected.


The setup

Dataset: CIFAKE — 120,000 images (60,000 real photos, 60,000 AI-generated)

Approach: Extract statistical features from each image, evaluate with two metrics:

  • Covariance difference (Frobenius norm) — how different are the real vs. fake distributions?
  • LDA accuracy — how well does a linear classifier separate the two classes?
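Both metrics are straightforward to compute with NumPy and scikit-learn. Here's a minimal sketch (the helper names and the 70/30 split are my assumptions, not the post's exact code):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

def covariance_gap(real_feats, fake_feats):
    """Frobenius norm of the difference between the two class covariances."""
    return np.linalg.norm(
        np.cov(real_feats, rowvar=False) - np.cov(fake_feats, rowvar=False),
        ord="fro",
    )

def lda_accuracy(real_feats, fake_feats, seed=0):
    """Held-out accuracy of a linear discriminant separating the classes."""
    X = np.vstack([real_feats, fake_feats])
    y = np.r_[np.zeros(len(real_feats)), np.ones(len(fake_feats))]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    return LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
```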

Results by feature family

| Feature | Cov. difference | LDA accuracy |
| --- | --- | --- |
| Noise residual | 2.05 × 10³ | 84.8% |
| FFT (frequency) | 6.23 × 10¹¹ | 79.9% |
| Texture (LBP + GLCM + Gabor) | 1.05 × 10⁵ | 76.2% |
| Color statistics | 5.23 × 10³ | 73.0% |
| DCT coefficients | 4.65 × 10³ | 68.2% |
| Intensity statistics | 2.61 × 10³ | 64.3% |
| Wavelet decomposition | 8.99 × 10³ | 63.1% |

Two things stand out:

1. Noise wins. At 84.8% LDA accuracy, noise residuals outperform every other feature family. Real cameras produce structured, spatially correlated sensor noise. Generative models don't have a camera — their noise patterns are statistically different, and easy to measure.
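A simple way to get at that residual is to subtract a denoised copy of the image and summarize what's left. This is a sketch of the idea, not the post's exact recipe: the median filter as denoiser and the four summary statistics (including a horizontal-shift correlation to capture that spatial structure) are my assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def noise_residual_features(gray):
    """gray: 2-D float array. Residual = image minus a denoised copy."""
    residual = gray - median_filter(gray, size=3)
    # Real sensor noise is spatially correlated; compare each pixel's
    # residual with its horizontal neighbor's.
    shifted = np.roll(residual, 1, axis=1)
    corr = np.corrcoef(residual.ravel(), shifted.ravel())[0, 1]
    return np.array(
        [residual.mean(), residual.std(), np.abs(residual).mean(), corr]
    )
```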

2. FFT is huge but nonlinear. The covariance gap for frequency features is 6.23 × 10¹¹ — orders of magnitude larger than anything else — yet LDA accuracy sits at only 79.9%. The differences are real but the decision boundary is nonlinear. FFT features likely need an SVM or neural network layer to be fully exploited.
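One common way to turn the 2-D spectrum into a fixed-size feature vector is a radially averaged log power spectrum. The banding scheme below is an illustrative assumption (the post doesn't specify its FFT featurization):

```python
import numpy as np

def fft_features(gray, n_bands=8):
    """Mean log power in n_bands concentric frequency bands."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = spec.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)  # distance from DC after shift
    r_max = r.max()
    bands = []
    for i in range(n_bands):
        mask = (r >= r_max * i / n_bands) & (r < r_max * (i + 1) / n_bands)
        bands.append(np.log1p(spec[mask].mean()))
    return np.array(bands)
```

Band 0 captures the DC-dominated low frequencies; the outer bands are where generator fingerprints (e.g. upsampling artifacts) tend to live.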


Full pipeline results

Combining all features into a 48-dimensional vector and training a logistic regression on 84,000 images (testing on 36,000):

| Metric | Score |
| --- | --- |
| Accuracy | 85.5% |
| Precision | 86.3% |
| Recall | 84.5% |
| F1 | 85.4% |
| ROC-AUC | 92.9% |
| Training time | 4.04 s |
| Inference time | 0.02 s |

A 92.9% ROC-AUC from a logistic regression, trained in 4 seconds, running inference in 20ms. No GPU needed.
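The classifier itself fits in a few lines of scikit-learn. The sketch below uses a synthetic stand-in for the 48-D feature matrix; in the real pipeline `X` comes from concatenating the handcrafted extractors, and standardizing before the logistic regression is my assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the real 48-D features: two mildly separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 48)),
               rng.normal(0.5, 1.0, (500, 48))])
y = np.r_[np.zeros(500), np.ones(500)]

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
```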


Why this matters

Statistical detectors give you three things deep learning often doesn't:

  • Interpretability — you can point to exactly which feature triggered the flag
  • Speed — 20ms inference on a laptop, no GPU cluster required
  • Generalization potential — features grounded in physical image properties are less tied to a specific generator than a CNN trained on one dataset

The best production systems will likely be hybrid: statistical features for fast first-pass screening, deep models for depth. Neither replaces the other.


The anomaly map

Beyond classification, I built a patch-level anomaly heatmap. Each patch gets a weighted score:

```
score = 0.45 × residual + 0.35 × frequency + 0.20 × gradient
```

Real images produce flat, uniform maps. Synthetic images show concentrated anomalies — usually at object boundaries or regions where the generator lost spatial coherence. Spatial explainability you don't get from a softmax output.
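The heatmap loop can be sketched like this. The weights are the post's; the per-patch statistics (mean absolute residual, mean log spectrum magnitude with the first row dropped as a rough high-frequency proxy, mean gradient magnitude) are illustrative stand-ins for the actual scoring functions:

```python
import numpy as np
from scipy.ndimage import median_filter

def anomaly_map(gray, patch=8):
    """Per-patch weighted anomaly score over a 2-D float image."""
    h, w = gray.shape
    heat = np.zeros((h // patch, w // patch))
    residual = gray - median_filter(gray, size=3)
    gy, gx = np.gradient(gray)
    grad = np.hypot(gx, gy)
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            sl = np.s_[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            res_score = np.abs(residual[sl]).mean()
            freq_score = np.log1p(np.abs(np.fft.fft2(gray[sl]))[1:].mean())
            grad_score = grad[sl].mean()
            heat[i, j] = 0.45 * res_score + 0.35 * freq_score + 0.20 * grad_score
    return heat
```

A flat map means no patch stands out; spikes localize where the statistics diverge from the rest of the image.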


Experiments run on CIFAKE using Python, scikit-learn, OpenCV, and scikit-image.
