How to Build an AI vs Human Image Detector Using Streamlit & Transformers

#ai #tutorial #huggingface #image

Artificial Intelligence models like SDXL, Grok, Gemini, and others are producing images so realistic that even humans can’t always tell them apart from real photos. As these models get better, traditional detectors become less effective.

In this guide, I’ll show you how to build your own AI-vs-Human Image Detector using:

Streamlit for the UI
Hugging Face Transformers
PyTorch
A modern detection model: Organika/sdxl-detector

Overview of What We’re Building

This detector:

Accepts an uploaded image
Processes it using a pretrained deep-learning model
Predicts whether the image is AI-generated or Human-captured
Displays the model’s confidence score
Works on CPU, CUDA, or Apple Silicon (MPS)

The entire stack sits inside a simple Streamlit app that users can run locally or online.

Step-by-Step: Let's Get Started

Below, we’ll break down the important sections of the script so you not only use it—but understand why it works.

Environment Setup and Packages Installation

We need to setup a virtual environment.

python -m venv env
source env/bin/activate  # On Linux/macOS
# env\Scripts\activate   # On Windows

Packages Installation

# Core dependencies
pip install streamlit pillow torch transformers accelerate

# Optional: If you encounter errors with an old NumPy version (e.g., NumPy 2.x),
# you may need to downgrade it for PyTorch compatibility:
# pip install "numpy<2"

We Will Import Dependencies & Set the Model

import streamlit as st
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification 

MODEL_ID = "Organika/sdxl-detector"

What’s happening here?

streamlit powers the web interface
Pillow loads and manipulates the uploaded images
torch handles model execution
transformers loads the Hugging Face model
MODEL_ID points to a model optimized for SDXL-level imagery

Selecting the Compute Device (CPU / MPS)

if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

This section ensures:

Mac users get fast inference via Apple Silicon (MPS)
Everyone else falls back to CPU

Loading the Model & Processor

processor = AutoImageProcessor.from_pretrained(MODEL_ID)

model = AutoModelForImageClassification.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,
    device_map="auto"
)
model.eval()

Here:

Image Processor converts PIL images into model tensors
Model is loaded with smart device placement
eval() ensures the model runs in inference mode

device_map="auto" makes Transformers automatically handle multi-device setups.

Image Classification Next

def predict_pil(img):
    inputs = processor(images=img, return_tensors="pt")
    inputs = {k: v.float().to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)[0]

    pred = torch.argmax(probs).item()

    label = model.config.id2label[pred]
    confidence = float(probs[pred])

    return label, confidence

This function:

Converts the image into model-ready tensors
Moves them to the correct device
Runs a forward pass (no gradients)
Applies softmax to get probabilities
Returns

Predicted class label
Confidence score

This is the core part of the detector.

Let's add the User-Interface Using Streamlit

st.set_page_config(page_title="AI Image Detector", layout="centered")

st.title("AI vs Human Image Detector")
st.write("Upload an image to detect whether it was generated by an AI model or captured by a human.")

Streamlit handles:

Page layout
Title + description
File uploader

Upload and Display

uploaded = st.file_uploader("Upload Image", type=["jpg", "jpeg", "png"])
MAX_SIZE = (200, 200)

if uploaded:
    img = Image.open(uploaded).convert("RGB")
    img.thumbnail(MAX_SIZE)
    st.image(img, caption="Uploaded Image", width='stretch')

We will resize the image for display but still preserve enough detail for classification.

Running Prediction & Displaying Results

with st.spinner("Analyzing image..."):
    label, confidence = predict_pil(img)

We then interpret results:

if "real" in label.lower() or "human" in label.lower():
    result_style = "Likely Human Captured"
elif "artificial" in label.lower() or "ai" in label.lower():
    result_style = "Likely AI-Generated"
else:
    result_style = label

st.markdown(f"**Prediction:** **{result_style}**")
st.write(f"**Confidence:** **{confidence * 100:.2f}%**")