Simple DETR Object Detection with Python

#ai #machinelearning #python #objectdetection

DETR (DEtection TRansformer) is a deep learning model for object detection. It is built around the Transformer architecture, originally created for natural language processing (NLP) tasks, and treats object detection as a direct set prediction problem instead of relying on hand-designed components such as anchor boxes and non-maximum suppression.

Prerequisites

This tutorial assumes you have some experience programming in Python and that Python is installed on your computer.

If you need to download Python, you can visit the official Python downloads page at https://www.python.org/downloads/.

Create your virtual environment

Create a virtual environment in Python so your packages are installed separately from your system-wide Python installation.

python -m venv myenv

Activate virtual environment

Windows

myenv\Scripts\activate

macOS / Linux

source myenv/bin/activate
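
If you are not sure whether the activation worked, a quick check from Python can help. This is a minimal sketch: the function name in_virtual_env is my own, and it relies on the fact that inside a venv sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtual_env():
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_env())
```

If this prints False, activate the environment again before installing anything.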

Install packages

We will need to install a few packages before we get started.

pip install transformers torch Pillow requests
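
To confirm the install succeeded before writing any model code, you could run a small check like the one below. It is only a sketch: the helper name missing_modules and the REQUIRED list are my own, and note that Pillow is imported under the name PIL.

```python
import importlib.util

# Import names for the packages installed above (Pillow is imported as PIL).
REQUIRED = ["transformers", "torch", "PIL", "requests"]


def missing_modules(names):
    # find_spec returns None when a top-level module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```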

Next, create an /images folder in the root of your project. This is where you will save the images used to test the model. I'm using .jpg files from www.unsplash.com.
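
As a quick sanity check that the folder exists and contains images, something like the following sketch could be used. The helper name list_images and the list of extensions are assumptions of mine, and the relative path "images" assumes you run it from the project root.

```python
from pathlib import Path


def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    # Return the sorted names of image files in the folder,
    # or an empty list if the folder does not exist.
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


print("Images found:", list_images("images"))
```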

After saving an image into the /images directory, we can start writing the code that locates the image and passes it to the Image.open() method. Save the code in a file named server.py in the root of your project.

import os

import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# Build the path to the images folder that sits next to this script
current_dir = os.path.dirname(os.path.abspath(__file__))
images_dir = os.path.join(current_dir, 'images')

# Change the file name to match the image you saved
image_path = os.path.join(images_dir, 'airplane.jpg')

print("Image path:", image_path)
print("Reading images from /images")

# Raises FileNotFoundError if the image is missing;
# convert("RGB") guards against grayscale or RGBA inputs
image = Image.open(image_path).convert("RGB")

print("Processing image...")

Once this runs without errors, we can add the rest of the solution, which runs the model on the image and prints the detection results.

# Load the pretrained processor and model (downloaded on first run)
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50", revision="no_timm")

# Preprocess the image and run inference; gradients are not needed here
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# PIL reports (width, height); post-processing expects (height, width)
target_sizes = torch.tensor([image.size[::-1]])

# Keep only detections with a confidence score above 0.9
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
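
To get a feel for what the threshold does, here is a simplified, pure-Python sketch of the idea. It is not the library's implementation, which works on tensors and also rescales the boxes. The function names softmax and keep_confident and the toy logits are my own. It assumes, as DETR does, that each prediction slot carries one logit per class plus a final "no object" entry.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keep_confident(query_logits, threshold=0.9):
    # query_logits: one list of class logits per prediction slot,
    # where the last entry stands for "no object".
    kept = []
    for logits in query_logits:
        probs = softmax(logits)[:-1]  # ignore the "no object" class
        score = max(probs)
        label = probs.index(score)
        if score > threshold:
            kept.append((label, score))
    return kept


# Toy example: only the first slot is confidently a real class.
toy_logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 10.0], [1.0, 1.0, 1.0]]
print(keep_confident(toy_logits))
```

Lowering the threshold keeps more detections at the cost of more false positives, so it is worth experimenting with values other than 0.9 on your own images.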

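After running the script (server.py in this example), you should get output similar to the following. The decimal numbers after location are the pixel coordinates of the bounding box, in the order (xmin, ymin, xmax, ymax), for the area of the image where the model detected the object. The label, confidence, and coordinates depend on the image you use.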

Reading images from /images
Processing image...
Detected bird with confidence 0.992 at location [55.82, 32.17, 225.04, 225.28]
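
The location values are easier to work with once you turn them into a width, height, and center point. The sketch below does this for the box from the sample output above. The helper name describe_box is my own, and it assumes the box is in (xmin, ymin, xmax, ymax) pixel order.

```python
def describe_box(box):
    # Unpack the corner coordinates of the bounding box.
    xmin, ymin, xmax, ymax = box
    width = xmax - xmin
    height = ymax - ymin
    return {
        "width": round(width, 2),
        "height": round(height, 2),
        "center": (round(xmin + width / 2, 2), round(ymin + height / 2, 2)),
    }


info = describe_box([55.82, 32.17, 225.04, 225.28])
print(info)
```

The same four values, rounded to integers, follow the (left, upper, right, lower) order that Pillow's Image.crop() expects, so you could use them to cut the detected object out of the image.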

Potential business value

Models like this can provide a lot of value to software services and products people interact with daily.

Object detection models can help flag signs of cancer in medical images, assist autonomous vehicles with identifying red lights and emergency signals, or even help prevent unauthorized access to systems and physical resources by identifying a user.

The possibilities are endless.

Connect with me on LinkedIn
