The Python file, main.py, is an object detection application that uses Facebook's DEtection TRansformer (DETR) model through the Hugging Face transformers library. It lets users detect objects in images, crop each detected object, and store the cropped images in a specified output directory.
The application provides a Graphical User Interface (GUI) built with the tkinter library, where users can specify the input directory of images, the output directory for the cropped images, and a confidence threshold for the model to use during object detection.
Now, let's break down the script into sections and explain each part in detail.
Importing Required Libraries
The script begins by importing the necessary libraries. These include:
- tkinter: the standard Python interface to the Tk GUI toolkit, used for building desktop applications.
- filedialog: a tkinter module for displaying dialog boxes that let the user select files or directories.
- PIL (Pillow): a library for opening, manipulating, and saving many different image file formats.
- transformers: Hugging Face's library of state-of-the-art pre-trained models, originally focused on Natural Language Processing (NLP) but also covering vision models such as the DETR model used here for object detection.
- torch: a library for scientific computing, especially deep learning, providing tensors that can run on either a CPU or a GPU.
- requests and os: standard Python libraries for handling HTTP requests and interacting with the operating system, respectively.
Initializing the Model and Processor
The DetrImageProcessor and DetrForObjectDetection classes are imported from the transformers library. Both are initialized with the pretrained DETR model from Facebook, "facebook/detr-resnet-50".
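In code, that initialization looks like this (the variable names processor and model match the ones used in the image_crops function below):

# Load the pretrained DETR checkpoint for both the processor and the model
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")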
Defining the Image Crop Function
def image_crops(input_directory, output_directory, confidence):
    # Track how many crops of each label have been saved, to give each file a unique name
    counts = {}
    # Loop through every file in the input directory
    for filename in os.listdir(input_directory):
        # Get the path to the current file
        curr_path = os.path.join(input_directory, filename)
        # Open the current file as an image
        image = Image.open(curr_path)
        # Pass the image to the model to detect objects
        inputs = processor(images=image, return_tensors="pt")
        outputs = model(**inputs)
        # Get the dimensions of the image (width, height reversed to height, width)
        target_sizes = torch.tensor([image.size[::-1]])
        # Post-process the model outputs to get the detections above the confidence threshold
        results = processor.post_process_object_detection(
            outputs, target_sizes=target_sizes, threshold=confidence)[0]
        # Loop through the detected objects
        with Image.open(curr_path) as im:
            for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
                # Round the coordinates of the detected object
                box = [round(i, 2) for i in box.tolist()]
                # Get the label of the detected object
                label_text = model.config.id2label[label.item()]
                print(
                    f"Detected {label_text} with confidence "
                    f"{round(score.item(), 3)} at location {box}"
                )
                # Create a directory for the label if it does not exist
                if not os.path.exists(f"{output_directory}/{label_text}"):
                    os.mkdir(f"{output_directory}/{label_text}")
                # Crop the detected region and save it under the label's directory
                counts[label_text] = counts.get(label_text, 0) + 1
                remote_region = im.crop(box)
                remote_region.save(
                    f"{output_directory}/{label_text}/{label_text}_{counts[label_text]}.jpg")
The image_crops function takes three arguments: input_directory, output_directory, and confidence. It iterates through all the images in the input_directory, performs object detection on each image, and saves the cropped images in the corresponding output_directory. The confidence parameter is the threshold the model uses to decide whether an object is present or not.
The function performs the following steps (a standalone usage example follows the list):
- Loops through each image file in the input directory.
- Opens the image file and processes it using the DetrImageProcessor.
- Passes the processed image to the DetrForObjectDetection model.
- Gets the dimensions of the image and post-processes the model outputs to obtain the detected objects and their bounding boxes.
- Loops through each detected object and crops the image based on the bounding box coordinates.
- Saves the cropped image to the output directory, creating a new subdirectory for each detected object type if necessary.
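Called directly (without the GUI), the function can be used like this; the directory paths and threshold below are placeholder values, not ones taken from the script:

# Hypothetical example: crop every object detected with at least 90% confidence
image_crops(
    input_directory="photos/raw",
    output_directory="photos/crops",
    confidence=0.9,
)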
GUI Functions
Several functions are defined to interact with the GUI (a sketch of these callbacks follows the list):
- select_input_dir: lets the user choose the input directory.
- select_output_dir: lets the user choose the output directory.
- submit: reads the selected directories and confidence level, calls the image_crops function with these parameters, and closes the application after processing.
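A minimal sketch of how these callbacks could be written with tkinter and filedialog is shown below. The names input_dir, output_dir, confidence_scale, and root are assumptions for illustration (the slider and window are created in the next section), not necessarily the names used in main.py:

input_dir = ""
output_dir = ""

def select_input_dir():
    # Ask the user for the folder containing the source images
    global input_dir
    input_dir = filedialog.askdirectory(title="Select input directory")

def select_output_dir():
    # Ask the user for the folder where the cropped images will be written
    global output_dir
    output_dir = filedialog.askdirectory(title="Select output directory")

def submit():
    # Read the confidence slider, run the cropping routine, then close the window
    image_crops(input_dir, output_dir, confidence_scale.get())
    root.destroy()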
Building the GUI
The tkinter library is used to create the GUI. The application window is created with tk.Tk(). The GUI contains buttons for selecting the input and output directories, a slider for setting the confidence level, and a submit button to start the processing. The grid geometry manager positions these elements in the application window.
The mainloop function is called to start the tkinter event loop, which waits for user interaction and responds accordingly.
The final script is a complete application that allows users to perform object detection and image cropping tasks easily. It is a great example of how powerful machine learning models can be combined with user-friendly interfaces to create practical tools.