DEV Community

Mustapha Tijani

How to Build an AI vs Human Image Detector Using Streamlit & Transformers

Image-generation models such as SDXL, Grok, and Gemini now produce images so realistic that people can’t always tell them apart from real photos. As these models improve, older detectors become less effective.

In this guide, I’ll show you how to build your own AI-vs-Human Image Detector using:

  • Streamlit for the UI
  • Hugging Face Transformers
  • PyTorch
  • A modern detection model: Organika/sdxl-detector

Overview of What We’re Building

This detector:

  • Accepts an uploaded image
  • Processes it using a pretrained deep-learning model
  • Predicts whether the image is AI-generated or Human-captured
  • Displays the model’s confidence score
  • Works on CPU, CUDA, or Apple Silicon (MPS)

The entire stack sits inside a simple Streamlit app that users can run locally or online.

Step-by-Step: Let's Get Started

Below, we’ll break down the important sections of the script so that you not only use it but also understand why it works.

Environment Setup and Package Installation

First, set up a virtual environment.

python -m venv env
source env/bin/activate  # On Linux/macOS
# env\Scripts\activate   # On Windows

Package Installation

# Core dependencies
pip install streamlit pillow torch transformers accelerate

# Optional: If you encounter errors with a newer NumPy version (e.g., NumPy 2.x),
# you may need to pin it below 2 for compatibility with older PyTorch builds:
# pip install "numpy<2"
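If you're not sure whether the NumPy pin applies to your environment, you can check the installed major version first. This is a minimal sketch using only the standard library; the helper name installed_major_version is hypothetical and not part of the original script.

```python
from importlib import metadata


def installed_major_version(package: str):
    """Return the installed major version of `package`, or None if it is not installed."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    # Take the leading numeric component, e.g. "2.1.3" -> 2
    return int(version.split(".")[0])


numpy_major = installed_major_version("numpy")
if numpy_major is None:
    print("NumPy is not installed yet")
elif numpy_major >= 2:
    print('NumPy 2.x detected; if PyTorch complains, run: pip install "numpy<2"')
else:
    print("NumPy 1.x detected; no pin needed")
```

The pin is only worth applying if you actually see an import or compatibility error; recent PyTorch builds generally work with NumPy 2.x.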

Import Dependencies & Set the Model

import streamlit as st
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification 

MODEL_ID = "Organika/sdxl-detector"

What’s happening here?

  • streamlit powers the web interface
  • Pillow loads and manipulates the uploaded images
  • torch handles model execution
  • transformers loads the Hugging Face model
  • MODEL_ID points to a detector that, as its name suggests, targets SDXL-generated images, so accuracy on images from other generators may vary

Selecting the Compute Device (CUDA / MPS / CPU)

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

This section ensures:

  • Machines with an NVIDIA GPU use CUDA
  • Mac users get fast inference via Apple Silicon (MPS)
  • Everyone else falls back to CPU
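One way to keep the device choice easy to test, and to cover the CUDA case mentioned in the overview, is to separate the priority order from the hardware checks. The sketch below is an illustration only; pick_device_name is a hypothetical helper, not part of the original script.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name in priority order: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Example: a Mac with Apple Silicon and no NVIDIA GPU
print(pick_device_name(cuda_available=False, mps_available=True))  # mps
```

In the app, the two flags would come from torch.cuda.is_available() and torch.backends.mps.is_available(), and the returned name would be passed to torch.device().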

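Loading the Model & Processor

processor = AutoImageProcessor.from_pretrained(MODEL_ID)

model = AutoModelForImageClassification.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,
)
model.to(device)  # keep the model on the same device the inputs are moved to
model.eval()

Here:

  • Image Processor converts PIL images into model tensors
  • The model is loaded in float32 and moved to the device selected earlier
  • eval() switches the model to inference mode, disabling training-only behavior such as dropout

Moving the model explicitly with model.to(device) guarantees it sits on the same device as the inputs. Passing device_map="auto" is an alternative that lets Accelerate choose the placement, but it is mainly useful for models too large for a single device, and the inputs then have to follow model.device rather than device.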

Image Classification

def predict_pil(img):
    inputs = processor(images=img, return_tensors="pt")
    inputs = {k: v.float().to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)[0]

    pred = torch.argmax(probs).item()

    label = model.config.id2label[pred]
    confidence = float(probs[pred])

    return label, confidence

This function:

  1. Converts the image into model-ready tensors
  2. Moves them to the correct device
  3. Runs a forward pass (no gradients)
  4. Applies softmax to get probabilities
  5. Returns the predicted class label and its confidence score

This is the core part of the detector.
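To make the softmax and argmax steps concrete, here is a small worked example in plain Python, without PyTorch. The logits and the id2label mapping are made-up values for illustration; the real mapping comes from model.config.id2label.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed values for illustration only
id2label = {0: "artificial", 1: "human"}
logits = [2.0, 0.5]

probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
label, confidence = id2label[pred], probs[pred]

print(label, round(confidence, 4))  # artificial 0.8176
```

The confidence is simply the probability of the winning class, so with two classes it is always at least 0.5.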

Adding the User Interface with Streamlit

st.set_page_config(page_title="AI Image Detector", layout="centered")

st.title("AI vs Human Image Detector")
st.write("Upload an image to detect whether it was generated by an AI model or captured by a human.")

Streamlit handles:

  • Page layout
  • Title + description
  • File uploader

Upload and Display

uploaded = st.file_uploader("Upload Image", type=["jpg", "jpeg", "png"])
MAX_SIZE = (200, 200)

if uploaded:
    img = Image.open(uploaded).convert("RGB")
    # thumbnail() resizes in place, so shrink a copy and keep img at full resolution
    preview = img.copy()
    preview.thumbnail(MAX_SIZE)
    st.image(preview, caption="Uploaded Image", width='stretch')

We shrink only a copy of the image for display, so the classifier still receives the original at full resolution.

Running Prediction & Displaying Results

# Still inside the `if uploaded:` block, because img only exists after an upload
with st.spinner("Analyzing image..."):
    label, confidence = predict_pil(img)

We then interpret the results, also inside the if uploaded: block:

if "real" in label.lower() or "human" in label.lower():
    result_style = "Likely Human Captured"
elif "artificial" in label.lower() or "ai" in label.lower():
    result_style = "Likely AI-Generated"
else:
    result_style = label

st.markdown(f"**Prediction:** **{result_style}**")
st.write(f"**Confidence:** **{confidence * 100:.2f}%**")

We normalize the model’s labels into human-readable categories.
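One caveat with substring checks: "ai" in label.lower() matches any label that merely contains those two letters, such as "painting". That doesn't matter for a model whose labels are plain words like "artificial" and "human", but if you swap in another model, matching whole words is safer. Here is a minimal sketch; normalize_label and the keyword sets are hypothetical names that reuse the keywords from the code above.

```python
import re

HUMAN_KEYWORDS = {"real", "human"}
AI_KEYWORDS = {"artificial", "ai"}


def normalize_label(label: str) -> str:
    """Map a raw model label to a readable category using whole-word matching."""
    # Split the label into lowercase alphanumeric words, e.g. "AI-generated" -> {"ai", "generated"}
    words = set(re.findall(r"[a-z0-9]+", label.lower()))
    if words & HUMAN_KEYWORDS:
        return "Likely Human Captured"
    if words & AI_KEYWORDS:
        return "Likely AI-Generated"
    return label  # unknown labels are shown unchanged


for raw in ["human", "artificial", "AI-generated", "painting"]:
    print(raw, "->", normalize_label(raw))
```

Unknown labels still fall through unchanged, which matches the else branch in the original code.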

Final Thoughts

This project is a great example of how:

  • Streamlit can turn any ML model into a usable app within minutes
  • Transformers makes loading advanced models extremely simple
  • Device-aware code ensures reliability across different hardware

You now have everything needed to build, modify, or extend your own AI detection tools.

Try the live app
https://tj-ai-image-detector.streamlit.app/

Get the source code
https://github.com/tijanidevit/ai-image-detector

Watch the YouTube Demo
https://youtu.be/4aLgpu5sirA?si=S6B3kXkfRqBl1-P8
