Artificial intelligence models like SDXL, Grok, and Gemini are producing images so realistic that even humans can't always tell them apart from real photos. As these models improve, traditional detectors become less effective.
In this guide, I’ll show you how to build your own AI-vs-Human Image Detector using:
- Streamlit for the UI
- Hugging Face Transformers
- PyTorch
- A modern detection model: Organika/sdxl-detector
Overview of What We’re Building
This detector:
- Accepts an uploaded image
- Processes it using a pretrained deep-learning model
- Predicts whether the image is AI-generated or Human-captured
- Displays the model’s confidence score
- Works on CPU, CUDA, or Apple Silicon (MPS)
The entire stack sits inside a simple Streamlit app that users can run locally or online.
Step-by-Step: Let's Get Started
Below, we'll break down the important sections of the script so that you not only use it, but also understand why it works.
Environment Setup and Package Installation
First, set up a virtual environment:
```bash
python -m venv env
source env/bin/activate   # On Linux/macOS
# env\Scripts\activate    # On Windows
```
Package Installation
```bash
# Core dependencies
pip install streamlit pillow torch transformers accelerate

# Optional: if you encounter errors with NumPy 2.x (older PyTorch builds
# may be incompatible with it), downgrade NumPy:
# pip install "numpy<2"
```
Importing Dependencies & Setting the Model
```python
import streamlit as st
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

MODEL_ID = "Organika/sdxl-detector"
```
What’s happening here?
- `streamlit` powers the web interface
- `Pillow` loads and manipulates the uploaded images
- `torch` handles model execution
- `transformers` loads the Hugging Face model
- `MODEL_ID` points to a model optimized for SDXL-level imagery
Selecting the Compute Device (CPU / MPS)
if torch.backends.mps.is_available():
device = torch.device("mps")
else:
device = torch.device("cpu")
This section ensures:
- NVIDIA GPU owners get fast inference via CUDA
- Mac users get fast inference via Apple Silicon (MPS)
- Everyone else falls back to CPU
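If you're not sure what your machine supports, a quick check from a Python shell settles it:

```python
import torch

print(torch.cuda.is_available())          # True on machines with a usable NVIDIA GPU
print(torch.backends.mps.is_available())  # True on Apple Silicon Macs
```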
Loading the Model & Processor
```python
processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageClassification.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,
    device_map="auto"
)
model.eval()
```
Here:
- Image Processor converts PIL images into model tensors
- Model is loaded with smart device placement
- `eval()` ensures the model runs in inference mode

`device_map="auto"` makes Transformers automatically handle multi-device setups.
Next: Image Classification
```python
def predict_pil(img):
    inputs = processor(images=img, return_tensors="pt")
    inputs = {k: v.float().to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)[0]
    pred = torch.argmax(probs).item()
    label = model.config.id2label[pred]
    confidence = float(probs[pred])
    return label, confidence
```
This function:
- Converts the image into model-ready tensors
- Moves them to the correct device
- Runs a forward pass (no gradients)
- Applies softmax to get probabilities
- Returns the predicted class label and its confidence score
This is the core part of the detector.
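Before wiring up the UI, you can sanity-check the function from a plain Python script. A quick sketch, assuming some local test image (sample.jpg is a placeholder name):

```python
# Classify a local file and print the result
img = Image.open("sample.jpg").convert("RGB")
label, confidence = predict_pil(img)
print(f"{label}: {confidence:.2%}")
```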
Let's Add the User Interface Using Streamlit
```python
st.set_page_config(page_title="AI Image Detector", layout="centered")
st.title("AI vs Human Image Detector")
st.write("Upload an image to detect whether it was generated by an AI model or captured by a human.")
```
Streamlit handles:
- Page layout
- Title + description
- File uploader
Upload and Display
```python
uploaded = st.file_uploader("Upload Image", type=["jpg", "jpeg", "png"])
MAX_SIZE = (200, 200)

if uploaded:
    img = Image.open(uploaded).convert("RGB")
    img.thumbnail(MAX_SIZE)
    st.image(img, caption="Uploaded Image", width='stretch')
```
We resize the image for a compact display. Note that this downscaled copy is also what gets passed to `predict_pil`, so classification runs on the thumbnail rather than the full-resolution upload.
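If you're concerned the 200×200 thumbnail discards detail the detector could use, one alternative (a sketch, not what the original script does) is to keep the full-resolution image for prediction and shrink only a copy for display:

```python
if uploaded:
    img = Image.open(uploaded).convert("RGB")  # full-resolution image for prediction
    display_img = img.copy()
    display_img.thumbnail(MAX_SIZE)            # small copy for the UI only
    st.image(display_img, caption="Uploaded Image", width='stretch')
```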
Running Prediction & Displaying Results
```python
with st.spinner("Analyzing image..."):
    label, confidence = predict_pil(img)
```
We then interpret results:
if "real" in label.lower() or "human" in label.lower():
result_style = "Likely Human Captured"
elif "artificial" in label.lower() or "ai" in label.lower():
result_style = "Likely AI-Generated"
else:
result_style = label
st.markdown(f"**Prediction:** **{result_style}**")
st.write(f"**Confidence:** **{confidence * 100:.2f}%**")
We normalize the model’s labels into human-readable categories.
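To see exactly which raw labels those string checks match against, print the model's label map. For Organika/sdxl-detector the labels are expected to be along the lines of "artificial" and "human", but confirm against your loaded config:

```python
# Inspect the raw labels the model emits; verify against your loaded config
print(model.config.id2label)
```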
Final Thoughts
This project is a great example of how:
- Streamlit can turn any ML model into a usable app within minutes
- Transformers makes loading advanced models extremely simple
- Device-aware code ensures reliability across different hardware
You now have everything needed to build, modify, or extend your own AI detection tools.
Try the live app
https://tj-ai-image-detector.streamlit.app/
Get the source code
https://github.com/tijanidevit/ai-image-detector
Watch the YouTube Demo
https://youtu.be/4aLgpu5sirA?si=S6B3kXkfRqBl1-P8