I Built a Deepfake Detector with Explainable AI (And Here's What It Taught Me About Trust)

The Problem: Can You Trust What You See?
"Is this photo real?"

It's a question we're asking more and more. And honestly? Sometimes
I can't tell anymore.

Deepfakes have gone from Hollywood special effects to something anyone
can create on their laptop. Politicians saying things they never said.
Celebrities appearing in videos they never filmed. Your mate's face
swapped onto someone else entirely.

For my MSc dissertation at Middlesex University, I decided to tackle
this problem: Can we build a system that detects deepfakes? And more
importantly, can we understand HOW it makes decisions?

This is the story of building an explainable deepfake detection system,
and what I learned about trust in AI along the way.

Why Explainability Matters (More Than Accuracy)
Here's the thing about AI: anyone can throw data at a neural network
and get predictions. But when you're dealing with deepfakes—where
misinformation can influence elections, ruin reputations, or spread
false information—you need more than just a prediction.

You need to know WHY.

Imagine a journalist verifying a video of a politician. An AI system
says "This is fake." The journalist asks, "How do you know?"

If your answer is "Trust me, the neural network said so," that's not
good enough.

That's why I built explainability into the core of my system from
day one.

The Architecture: An Ensemble with a Twist
I didn't want to rely on a single model. Different architectures
notice different things. So I built an ensemble of three state-of-
the-art models:

  1. Xception: Developed by Google, excellent at detecting subtle
    artefacts in manipulated images

  2. EfficientNet: Balances accuracy and efficiency, good for spotting
    compression artefacts

  3. ResNet50: The robust baseline—reliable and well-understood

But here's what makes it different: each model doesn't just vote.
I integrated Grad-CAM (Gradient-weighted Class Activation Mapping)
to visualise exactly WHERE each model is looking when it makes a
decision.
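
If you want a rough feel for what that looks like in code, here's a minimal sketch of the three backbones with simple binary heads. It assumes a TensorFlow/Keras setup with ImageNet-pretrained weights; the actual repo's preprocessing, input sizes, and head design may differ.

import tensorflow as tf

def build_detector(backbone_fn, name, input_shape=(299, 299, 3)):
    # ImageNet-pretrained backbone with the original classifier removed
    base = backbone_fn(weights="imagenet", include_top=False, input_shape=input_shape)
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # P(fake)
    return tf.keras.Model(inputs, outputs, name=name)

xception_model = build_detector(tf.keras.applications.Xception, "xception")
efficientnet_model = build_detector(tf.keras.applications.EfficientNetB0, "efficientnet")
resnet_model = build_detector(tf.keras.applications.ResNet50, "resnet50")

One easy thing to get wrong here: each backbone family ships its own preprocess_input in tf.keras.applications, so the three branches generally can't share a single preprocessing step.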

What is Grad-CAM? (The X-Ray for Neural Networks)
Think of Grad-CAM as an X-ray for your neural network's brain.

When a model says "This image is fake," Grad-CAM shows you:

  • Which pixels influenced that decision
  • What regions the model found suspicious
  • Where it's focusing its "attention"

The result? A heatmap overlay showing:
🔴 Red/hot colours: "I'm very interested in this area"
🔵 Blue/cool colours: "This part doesn't matter much"

This is crucial because:
✅ You can verify the model is looking at sensible things (faces,
not backgrounds)
✅ You can identify when it's making decisions for wrong reasons
✅ You can explain results to non-technical users
✅ You can debug when predictions go wrong
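
Under the hood, the core of Grad-CAM fits in about a dozen lines. Here's a minimal sketch for a Keras model with a sigmoid "P(fake)" head, in the style of the standard TF2 GradientTape implementation. The layer name is a placeholder, the image is assumed to be an already-preprocessed float array of shape (H, W, 3), and it assumes the final conv layer is reachable by name on the model you pass in.

import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name):
    # Sub-model mapping the input to (last conv feature maps, final prediction)
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_out, pred = grad_model(image[np.newaxis, ...])
        score = pred[:, 0]                            # sigmoid output = P(fake)
    grads = tape.gradient(score, conv_out)            # how each feature map affects the score
    weights = tf.reduce_mean(grads, axis=(1, 2))      # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                          # keep only positive influence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # normalise to [0, 1]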

What I Learned: The Grad-CAM Reveals Everything

Discovery 1: Models Look at Different Things
When I started analysing the Grad-CAM heatmaps, something fascinating
emerged: each model focused on different facial regions.

Xception: Heavily weighted edges and boundaries

  • Face contours
  • Hairline transitions
  • Where face meets background
  • Why? GAN-generated images often have subtle boundary artefacts

EfficientNet: Focused on texture and details

  • Skin texture
  • Fine facial features
  • Compression artefacts
  • Why? Deepfakes often introduce unusual texture patterns

ResNet50: Broader facial structure

  • Overall face geometry
  • Symmetry
  • Facial landmarks (eyes, nose, mouth)
  • Why? Deepfakes can distort natural facial proportions

This explained why the ensemble worked better than individual models—
they were literally looking at different clues.
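
Spotting this meant putting the heatmaps side by side on top of the same face. A minimal overlay sketch, assuming OpenCV, an RGB uint8 image, and a heatmap in [0, 1] from a Grad-CAM helper like the one above:

import cv2
import numpy as np

def overlay_heatmap(image, cam, alpha=0.4):
    cam = cv2.resize(cam, (image.shape[1], image.shape[0]))           # match image size
    heat = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)   # red = high attention
    heat = cv2.cvtColor(heat, cv2.COLOR_BGR2RGB)                      # OpenCV colormaps are BGR
    return cv2.addWeighted(image, 1 - alpha, heat, alpha, 0)          # blend heatmap onto the face

# One overlay per model, plotted side by side, makes the different focus regions obvious.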

Discovery 2: Low Confidence = Model Uncertainty (Not Failure)
Early on, I got a result that puzzled me:

Prediction: REAL
Confidence: 19.20%

Wait, what? The model thinks it's real but is only 19% confident?

Looking at the individual predictions:

  • Xception: 96% FAKE
  • EfficientNet: 62% REAL
  • ResNet: 83% FAKE

The ensemble averaged these to barely cross the threshold for "REAL."

But here's what the Grad-CAM revealed:

The models were focusing on DIFFERENT regions entirely:

  • Xception spotted compression artefacts around the face edges
  • EfficientNet saw natural skin texture
  • ResNet detected unusual lighting patterns

This wasn't a failure—it was the system saying "I'm not sure, this
needs human review."

And that's EXACTLY what you want in a real-world system.
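
In practice, that means treating the middle of the probability range as a "defer to a human" zone instead of forcing a binary call. Here's an illustrative sketch; the review band and weights are placeholders rather than the values my system uses, and plain averaging is only one of the combination schemes I tried.

import numpy as np

def ensemble_decision(p_fake_per_model, weights=None, review_band=(0.35, 0.65)):
    p = np.average(p_fake_per_model, weights=weights)    # combined P(fake)
    low, high = review_band
    if low <= p <= high:
        return "NEEDS HUMAN REVIEW", p
    return ("FAKE" if p > high else "REAL"), p

# The disagreement example above, converted to per-model P(fake):
# Xception 96% fake, EfficientNet 62% real (= 38% fake), ResNet 83% fake
print(ensemble_decision([0.96, 0.38, 0.83]))
# Plain averaging calls this FAKE at ~0.72; a different weighting or calibration
# can push a disputed case like this into the review band instead.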

Discovery 3: Explainability Builds Trust
I showed my system to a journalist friend who covers misinformation.

Without Grad-CAM:
"Your AI says this is fake. But how do I know I can trust it?"

With Grad-CAM:
"Oh, I see—it's focusing on the edges around the face. That does
look weird when you point it out. And this model is looking at the
eyes, which do seem off. Okay, I can work with this."

The difference? She could verify the AI's reasoning matched her
own observations.

That's the power of explainability: it turns a black box into a
collaborative tool.

The Results (Honest Assessment)
Let me be transparent about performance:

Overall Accuracy: ~78% on test set
Precision: 0.75
Recall: 0.82
F1-Score: 0.78
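
For context, these are the standard scikit-learn metrics on held-out predictions. A quick sketch of how they're computed, with toy placeholder arrays and an assumed 0.5 decision threshold:

import numpy as np
from sklearn.metrics import accuracy_score, classification_report

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])                    # toy labels: 1 = fake, 0 = real
y_prob = np.array([0.9, 0.2, 0.6, 0.3, 0.1, 0.8, 0.7, 0.4])    # toy ensemble P(fake)
y_pred = (y_prob >= 0.5).astype(int)                           # assumed 0.5 threshold

print("accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["real", "fake"]))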

Is this state-of-the-art? No.
Current best systems achieve 90%+ accuracy.

But here's what I learned:

  1. Building a working deepfake detector is hard
    • Deepfakes are getting better constantly
    • No single model is perfect
    • Generalisation across different generation methods is challenging

  2. Explainability comes with tradeoffs
    • More complex models might be more accurate
    • But harder to explain
    • Finding the balance is an art

  3. Real-world deployment requires more than accuracy
    • Edge cases need human review
    • Confidence thresholds matter enormously
    • Users need to understand AND trust the system

Challenges I Faced (And How I Tackled Them)
Challenge 1: Disagreeing Models
Problem:

Xception: 96% FAKE
EfficientNet: 62% REAL
ResNet: 83% FAKE

How do you combine these into a single decision?
Solutions I tried:
import numpy as np

# Attempt 1: Simple averaging
ensemble_pred = np.mean([xception_pred, efficient_pred, resnet_pred])
# Problem: Treats all models equally even if some are better

# Attempt 2: Weighted voting based on validation performance
# (preds maps each model name to its predicted probability of "fake")
weights = {'xception': 0.4, 'efficientnet': 0.3, 'resnet': 0.3}
ensemble_pred = sum(weights[m] * preds[m] for m in models)
# Better, but still simplistic

# Attempt 3: Meta-learner (stacking) on the per-model predictions
from sklearn.linear_model import LogisticRegression

meta_model = LogisticRegression()
meta_features = np.column_stack([
    xception_preds,
    efficient_preds,
    resnet_preds
])
meta_model.fit(meta_features, labels)
# Best performance, but less interpretable

What I learned:
There's no perfect ensemble method. Each has tradeoffs between
accuracy, interpretability, and computational cost.


Challenge 2: Threshold Selection

Problem:



At threshold 0.74: Predicts FAKE
At threshold 0.81: Predicts REAL

Same image, different result. That's... not great.

Part of the root cause was my process: I built the models first, THEN figured out how to evaluate them.

Better approach:
- Define success metrics upfront
- Build evaluation pipeline first
- Test on diverse scenarios early
- Identify failure modes systematically

Lesson: You can't improve what you can't measure.
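
Concretely, "build the evaluation pipeline first" would have meant choosing the decision threshold from validation data and freezing it before touching the test set. A sketch of one reasonable way to do that with scikit-learn (maximising F1 is just one possible criterion):

import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_val, p_val_fake):
    # Sweep candidate thresholds on the validation set and keep the best-F1 one
    precision, recall, thresholds = precision_recall_curve(y_val, p_val_fake)
    f1 = 2 * precision * recall / (precision + recall + 1e-8)
    return thresholds[np.argmax(f1[:-1])]    # precision/recall have one extra entry

# threshold = pick_threshold(y_val, ensemble_p_fake)  # freeze this before evaluating on test data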

I've open-sourced the core components of this project:

GitHub: https://github.com/VivekLumbhani/deepfake-detection-using-machine-learning
What's included:
✅ Pre-trained model weights
✅ Grad-CAM implementation
✅ Example notebook
✅ Demo web interface (Streamlit)
✅ Evaluation scripts

If you're working on similar problems, here's what I'd emphasise:

✅ Explainability isn't optional
   Black box predictions aren't enough for high-stakes decisions

✅ Ensemble methods are powerful
   Different models capture different patterns

✅ Confidence matters as much as accuracy
   Knowing when to defer to humans is crucial

✅ Perfect is the enemy of done
   My 78% accurate explainable system is more useful than a 
   95% accurate black box I never finished

✅ Real-world deployment is hard
   Account for edge cases, failure modes, and user needs

✅ Trust is earned through transparency
   Show your working, admit limitations, enable verification

I'd love to hear from the community:

1. Have you worked on deepfake detection or explainable AI?
   What challenges did you face?

2. What other applications need explainable predictions?
   Where else is "show your working" crucial?

3. How do you balance accuracy vs interpretability?
   When is one more important than the other?

4. What deepfake detection methods interest you?
   Temporal analysis? Audio-visual consistency? Metadata forensics?

5. How should we communicate AI uncertainty to end users?
   Confidence scores? Visual indicators? Something else?

Drop your thoughts in the comments. Let's discuss how we can 
build AI systems that people can actually trust.
