DEV Community

Doyin Elugbadebo


Flask-Powered Object Detection: Integrating YOLOv3 and YOLOv12 for Real-Time Analysis

Computer vision is revolutionizing industries, from autonomous driving to real-time surveillance and healthcare diagnostics. One of the most powerful techniques in this field is YOLO (You Only Look Once), a state-of-the-art object detection framework known for its speed and accuracy in detecting multiple objects in a single image.

In this article, I’ll guide you through building a Flask-based real-time object detection application using both YOLOv3 and YOLOv12. By implementing both versions, you'll gain a broader understanding of the YOLO framework and be able to compare performance, accuracy, and efficiency across different YOLO architectures.

By the end of this tutorial, you'll have a fully functional Flask application that not only runs YOLO-based object detection on a single image, but also lets you upload multiple images, process them in one request, and download the results as a ZIP file.

You can download the full code from the GitHub repo

Let’s get started!

Prerequisites

Before getting started, ensure you have the following:

  • Python 3.8+ installed and properly set up
  • Basic understanding of Flask and HTTP requests
  • A basic understanding of what object detection means

Note: While Flask experience will be helpful, don’t worry if you're new to it. This guide is designed to be beginner-friendly, breaking down complex concepts step by step. Whether you're new to object detection and Flask or have some experience, you'll be able to implement YOLO-based object detection in Flask with minimal complexity.

Understanding the YOLO Framework

YOLO (You Only Look Once) is a state-of-the-art object detection algorithm known for its speed and accuracy. Unlike traditional object detection methods that rely on region proposals and multiple passes over an image (e.g. R-CNN, Faster R-CNN), YOLO treats detection as a single regression problem, predicting bounding boxes and class probabilities in one forward pass of the neural network.

We’ll be using both YOLOv3 and YOLOv12 to gain a broader understanding of the YOLO framework and its evolution over time. Here's why:

  • YOLOv3 has been a time-tested model known for its balance between computational efficiency and detection accuracy, thanks to its Darknet-53 backbone. It remains widely used for real-time object detection tasks.
  • YOLOv12, the latest iteration in the YOLO family at the time of writing, was released on February 18th, 2025. It introduces modern enhancements, improved accuracy, and better optimization for various hardware. According to its authors, the model achieves both lower latency and higher mAP than earlier YOLO versions when benchmarked on the Microsoft COCO dataset. By working with both versions, you'll not only practice object detection but also learn how to switch between models for different use cases.

Introduction to Flask

If you're new to Flask, it's a lightweight and flexible micro web framework for Python, designed to make web development quick and easy. According to the Flask website, it provides the essentials for building web applications without enforcing too many restrictions, making it ideal for small to medium-sized projects.

That being said, being a minimalistic framework doesn't mean Flask lacks power. On the contrary, it offers extensive flexibility—you can extend it with various extensions and third-party libraries to add features like authentication, database integration, and more, making it suitable for production-ready applications.

Key Features of Flask:

  • Lightweight & Minimalistic: Flask comes with only the essentials, such as routing, request handling, and templating, allowing developers to add extensions as needed for databases, authentication, and more.
  • Built-in Development Server & Debugger: Flask comes with a built-in development server, making it easy to test your application locally during development.
  • Jinja2 Templating: Flask uses the Jinja2 templating engine, which allows you to create dynamic HTML pages by embedding Python-like expressions and control structures.
  • Extensibility: Flask’s modular design allows you to add functionality through extensions. For example: Flask-SQLAlchemy for database integration, Flask-WTF for form handling and validation or Flask-Login for user authentication.
  • RESTful Request Handling: Flask supports RESTful request handling, making it a great choice for building APIs and web services.
  • WSGI Compatibility: Flask is fully compatible with the Web Server Gateway Interface (WSGI), ensuring it works seamlessly with various web servers and deployment options.

With these capabilities, Flask allows developers to quickly build web applications while keeping full control over the project's structure and dependencies.

We'll make use of a few of Flask's features as we proceed and see how easy their implementation is.

Now, let’s dive into setting up Flask and building our first application!

Step 1: Setting Up Your Virtual Environment

When working on machine learning projects, it's always a good practice to use a virtual environment to isolate dependencies and avoid conflicts.

First, create a virtual environment called flask_objectDetection_env

python -m venv flask_objectDetection_env

Activate it:

On Windows:

flask_objectDetection_env\Scripts\activate

On Linux/macOS:

source flask_objectDetection_env/bin/activate

Installing Required Libraries

With the virtual environment activated, let's install Flask.

pip install flask

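We'll also install other necessary libraries such as OpenCV (essential for image processing) and NumPy. Install only one OpenCV package: opencv-python and opencv-python-headless both provide the same cv2 module and can conflict when installed together.

pip install opencv-python numpy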

Setting Up the Flask Application

As you may recall, Flask prioritizes simplicity. Its minimalist design allows you to run an entire Flask app with just a single app.py file.

That's exactly what we'll do.

To proceed, make sure your virtual environment is activated. Next, create a directory for your project and navigate into it:

mkdir flask_object_detection  
cd flask_object_detection  

Now, create a new app.py file and add the following basic Flask code:

from flask import Flask

# Create an instance of the Flask class
app = Flask(__name__)

# Define a route and a view function
@app.route('/')
def home():
    return "Hello, Flask!"

# Run the app
if __name__ == '__main__':
    app.run(debug=True)

This script initializes a Flask web application by importing the Flask class from the flask module. It creates an instance of the Flask application, named app, and defines a route ('/') that maps to the home function. When accessed, this route returns a simple "Hello, Flask!" message. Additionally, the script includes a condition to ensure that the Flask application starts in debug mode when the script is run directly.

Debug mode enables features like real-time code changes (auto-reloading) and detailed error tracking, which are particularly useful during development.

Start the Flask server:

python app.py

By default, Flask runs on http://127.0.0.1:5000/.

Open this URL in your browser, and you should see "Hello, Flask!" displayed.

Now that our basic Flask app is running, let's move forward with setting up the image upload functionality, which is essential for object detection.

With this in mind, we need to configure an upload folder where images will be stored and set up a route to handle image uploads. Additionally, I'll import the necessary libraries that we'll use as we progress through the tutorial.

Update your app.py file with the following code:

from flask import Flask, request, render_template, send_file, flash, redirect, url_for, send_from_directory
from werkzeug.utils import secure_filename
import cv2
import numpy as np
import random
import os
import uuid
import zipfile
from io import BytesIO

app = Flask(__name__)
app.secret_key = os.urandom(24)

# Configure upload folders
app.config['UPLOAD_FOLDER'] = 'static/uploads'
app.config['ALLOWED_EXTENSIONS'] = {'png', 'jpg', 'jpeg', 'bmp', 'tiff'}
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']


# ---------- Paste other codes here -----------


if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)

In this code, the UPLOAD_FOLDER setting specifies the directory where uploaded images will be stored, and the line os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True) makes sure the folder is created if it doesn’t already exist. The ALLOWED_EXTENSIONS set defines the file types that can be uploaded (PNG, JPG, JPEG, BMP, TIFF). Lastly, the allowed_file function checks whether the uploaded file has an allowed extension.
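To see how the extension check behaves, here is a small standalone sketch of the same logic. The module-level ALLOWED_EXTENSIONS set stands in for app.config['ALLOWED_EXTENSIONS'] so that it runs without Flask, and the file names are made up for illustration.

```python
# Standalone copy of the extension check; ALLOWED_EXTENSIONS stands in
# for app.config['ALLOWED_EXTENSIONS'] so the snippet runs without Flask.
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg', 'bmp', 'tiff'}

def allowed_file(filename):
    # Require a dot, then compare the text after the last dot, lowercased.
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

# Hypothetical file names, chosen to exercise each branch.
for name in ['cat.JPG', 'report.pdf', 'archive.tar.png', 'no_extension']:
    print(name, allowed_file(name))
```

Only cat.JPG and archive.tar.png pass. Keep in mind that this check looks at the name only, not at the file's contents, so a renamed non-image file would still get through and fail later when OpenCV tries to decode it.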

Note: It is also advisable to set a secret key in your Flask app. Flask uses the secret key to cryptographically sign the session cookie (the data is signed, not encrypted), so that users cannot tamper with it. Without a secret key, Flask will not let you use features that depend on the session, such as flash messages. We'll need flash messages in our app, hence the secret key.

To generate a secret key, I imported the os library and used os.urandom(24) to create a random key. Note that this produces a new key every time the app starts, which invalidates existing sessions. That is acceptable for this tutorial, but a deployed app should load a fixed key from its configuration.
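Since os.urandom(24) yields a different key on each start, you may eventually want a key that survives restarts. Here is a minimal sketch, assuming the key is supplied through an environment variable. The variable name FLASK_SECRET_KEY and the helper load_secret_key are illustrative choices of mine, not something Flask defines.

```python
import os
import secrets

def load_secret_key(env_var="FLASK_SECRET_KEY"):
    """Return a stable key from the environment, or a random fallback.

    The env_var name is a hypothetical convention for this sketch.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    # Fallback for local development only: this value changes on every
    # start, which invalidates existing sessions and pending flash messages.
    return secrets.token_hex(24)

# In app.py this could be used as: app.secret_key = load_secret_key()
print(len(load_secret_key()) > 0)
```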

Step 2: Download and Set Up YOLO Pre-trained Files

To begin with, download the YOLO pre-trained weights and configuration files from this link. Once downloaded, create a new folder named "models" at the root folder of the application and extract the downloaded files there.

Ensure that your folder contains the following essential files:

  1. yolov3.weights: This file contains the pre-trained weights for the YOLOv3 model. These weights enable the model to perform object detection based on the features it has learned during training.
  2. yolov3.cfg: The configuration file that defines the model's architecture, including layer configurations, filter sizes, and other parameters necessary for building the YOLOv3 network.
  3. coco.names: A text file that lists the class names from the COCO dataset. The YOLOv3 weights used here were trained on COCO, which covers 80 common object categories such as people, animals, and everyday objects.
  4. yolov12n.pt: The pre-trained weights for YOLOv12 Nano, the smallest YOLOv12 variant, designed for fast inference with low computational requirements.

The first three files are required by YOLOv3 while YOLOv12 makes use of the last file.

Modelling Using YOLOv3:

  • YOLOv3 is a significant iteration in the YOLO series, known for its remarkable object detection capabilities. Developed by Joseph Redmon and Ali Farhadi, YOLOv3 improves upon its predecessors (YOLOv1 and YOLOv2) by leveraging a deep neural network with multiple detection scales. This approach utilizes feature maps of varying resolutions, enabling efficient detection of both small and large objects.
  • A key enhancement in YOLOv3 is its use of Darknet-53, a deeper and more efficient convolutional neural network (CNN) backbone compared to YOLOv2’s Darknet-19. Darknet-53 incorporates residual connections (inspired by ResNet), enhancing feature extraction and improving overall detection accuracy. Additionally, YOLOv3 employs anchor boxes (predefined bounding box shapes) to refine object localization.
  • Unlike previous versions, which used softmax for class prediction, YOLOv3 adopts independent logistic classifiers, allowing for multi-label classification - enabling an object to belong to multiple classes simultaneously.
  • YOLOv3 also strikes an effective balance between accuracy and speed, making it faster than most two-stage object detectors while maintaining competitive precision. Due to its efficiency, it is widely used in real-world applications such as video surveillance, autonomous driving, and image recognition.
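To make the multi-scale idea concrete, the arithmetic below counts how many candidate boxes the standard COCO configuration of YOLOv3 produces for a 416x416 input (the size used later in this tutorial). It assumes the usual setup of three detection scales with strides 32, 16, and 8, three anchor boxes per grid cell, and 80 classes. The function name is just for this sketch.

```python
# YOLOv3 predicts on three grids whose cells cover 32, 16 and 8 pixels.
INPUT_SIZE = 416          # network input size used later in this tutorial
STRIDES = (32, 16, 8)
ANCHORS_PER_SCALE = 3
NUM_CLASSES = 80          # COCO

def yolov3_output_shape(input_size=INPUT_SIZE):
    # One grid per stride: coarse grids catch large objects, fine grids small ones.
    grid_sizes = [input_size // s for s in STRIDES]
    total_boxes = sum(g * g * ANCHORS_PER_SCALE for g in grid_sizes)
    # Each candidate box carries 4 coordinates, 1 objectness score,
    # and one score per class.
    values_per_box = 4 + 1 + NUM_CLASSES
    return grid_sizes, total_boxes, values_per_box

print(yolov3_output_shape())   # ([13, 26, 52], 10647, 85)
```

This is why the detection code later in the tutorial reads class scores from index 5 onward: the first five values of each row are the box coordinates and the objectness score.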

To work with YOLOv3, we'll be using OpenCV's DNN module.

Now, make sure you have the extracted files in the models folder. Once that's settled, paste the following code after the configuration code in app.py:

# Load the YOLOv3 model configuration and weights
model = cv2.dnn.readNet('models/yolov3.weights', 'models/yolov3.cfg')

# Get all the layer names from the YOLO model
layer_names = model.getLayerNames()

# Identify the output layers (layers with no connections going forward)
unconnected_layers = model.getUnconnectedOutLayers()
output_layers = [layer_names[i[0] - 1] if isinstance(i, np.ndarray) else layer_names[i - 1] 
                 for i in unconnected_layers]

# Load COCO class names from file
with open('models/coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]

Explanation:

The cv2.dnn.readNet() function loads the model by reading both the weights and the configuration file. The model’s layer names are then retrieved using getLayerNames(), and getUnconnectedOutLayers() identifies the output layers. This step is crucial because YOLO emits its detection results from these layers. getUnconnectedOutLayers() returns 1-based indices, which is why the code subtracts 1 before indexing layer_names. Lastly, the COCO class labels are loaded from the coco.names file. Each line in the file represents one class name, which is stored in the classes list after stripping any extra whitespace.
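The output_layers line looks a little dense because it copes with two return shapes: some older OpenCV versions wrap each index in its own array, while newer ones return a flat sequence. The sketch below reproduces that logic with plain Python lists standing in for NumPy arrays. The layer names and the helper resolve_output_layers are made up for illustration.

```python
def resolve_output_layers(layer_names, unconnected):
    """Map 1-based layer indices to names, accepting flat or nested indices."""
    names = []
    for idx in unconnected:
        # Nested form, e.g. [[3], [5]]: unwrap the single element first.
        if isinstance(idx, (list, tuple)):
            idx = idx[0]
        # Indices are 1-based, Python lists are 0-based.
        names.append(layer_names[idx - 1])
    return names

# Toy layer list (not the real YOLOv3 layer names).
layer_names = ['conv_0', 'conv_1', 'yolo_2', 'conv_3', 'yolo_4']
print(resolve_output_layers(layer_names, [3, 5]))      # flat form
print(resolve_output_layers(layer_names, [[3], [5]]))  # nested form
```

Both calls print ['yolo_2', 'yolo_4'], so the rest of the code does not need to care which OpenCV version is installed.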

With these steps completed, you have successfully set up the YOLO model along with the COCO class labels. We can now proceed with object detection tasks.

Step 3: Define Routes and Templates for Image Processing

Next, we need to define routes and create template files to handle image uploads and display the processed image(s).

Go ahead and create a templates folder in the project's root directory.

Inside the templates folder, create a file named index.html. Paste the following code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Object Detection</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body class="bg-light">
    <nav class="navbar navbar-dark bg-dark">
        <div class="container">
            <a class="navbar-brand" href="/">Object Detection</a>
        </div>
    </nav>

    <div class="container py-5">
        <div class="text-center mb-4">
            <h1 class="display-4 text-primary">Object Detection</h1>
            <p class="lead">Upload images to detect objects using YOLOv3 or YOLOv12</p>
        </div>

        <div class="row justify-content-center">
            <div class="col-md-8">
                <div class="card shadow">
                    <div class="card-body">
                        <form action="/process" method="post" enctype="multipart/form-data" id="uploadForm">
                            <div class="mb-3">
                                <input type="file" name="files" 
                                       class="form-control" accept="image/*" multiple>
                                <div class="form-text">
                                    Select one or multiple images (PNG, JPG, JPEG, BMP, TIFF)
                                </div>
                            </div>
                            <button type="submit" class="btn btn-primary w-100">
                                Process Image
                            </button>
                        </form>
                    </div>
                </div>
            </div>
        </div> 

        <!-- Loading Spinner -->
        <div id="loadingSpinner" class="d-none mt-4 text-center">
            <div class="spinner-border text-primary" role="status" style="width: 3rem; height: 3rem;">
                <span class="visually-hidden">Loading...</span>
            </div>
            <p class="mt-3 text-muted">Processing images, please wait...</p>
        </div>  
    </div>

    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
    <script>
        document.getElementById('uploadForm').addEventListener('submit', function() {
            document.getElementById('loadingSpinner').classList.remove('d-none');
            this.querySelector('button').disabled = true;
        });
    </script>
</body>
</html>

Also create a result.html file inside the templates folder. This page will display the image after processing:

result.html – Displaying the Processed Image

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Processing Result</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body class="bg-light">
    <nav class="navbar navbar-dark bg-dark">
        <div class="container">
            <a class="navbar-brand" href="/">Object Detection</a>
        </div>
    </nav>

    <div class="container py-5">
        <div class="card shadow">
            <div class="card-body text-center">
                <h3 class="mb-4">✅ Processed Successfully</h3>
                <div class="alert alert-success">
                    Image processed with ID: {{ process_id }}
                </div>
                <img src="{{ image_url }}" class="img-fluid rounded" alt="Processed Result">
                <div class="mt-4">
                    <a href="/" class="btn btn-primary">Process Another Image</a>
                </div>
            </div>
        </div>
    </div>
</body>
</html>

Your project folder should look like this:

flask_object_detection/
├── app.py
├── models/            # YOLO weights, config, and class names from Step 2
├── templates/
│   ├── index.html
│   └── result.html
└── static/
    └── uploads/       # This folder will store uploaded images

After this, replace the home route in app.py with this:

@app.route('/')
def index():
    return render_template('index.html')

Next, define routes to handle file uploads, image processing, and post-processing:

@app.route('/uploads/<filename>')
def serve_processed_image(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename)

@app.route('/process', methods=['POST'])
def process_files():
    if 'files' not in request.files:
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    files = request.files.getlist('files')
    if len(files) == 0 or files[0].filename == '':
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    process_id = uuid.uuid4().hex  # Unique ID for this processing session
    processed_files = []

    for file in files:
        if file and allowed_file(file.filename):
            try:
                # Process image
                img = cv2.imdecode(np.frombuffer(file.read(), np.uint8), cv2.IMREAD_COLOR)

                #YOLOv3 detection and drawing code:
                boxes, confidences, class_ids, indexes = detect_objects(img)
                if len(indexes) > 0:
                    for i in indexes.flatten():
                        x, y, w, h = boxes[i]
                        label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
                        color = [random.randint(0, 255) for _ in range(3)]
                        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
                        cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

                # Save processed image with unique name
                filename = f"{process_id}_{secure_filename(file.filename)}"
                save_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
                cv2.imwrite(save_path, img) #YOLOv3 Implementation
                processed_files.append(filename)

            except Exception as e:
                app.logger.error(f"Error processing {file.filename}: {str(e)}")
                flash(f'Error processing {file.filename}', 'error')

    if len(processed_files) == 0:
        flash('No files processed successfully', 'error')
        return redirect(url_for('index'))

    # Handle single file response
    if len(processed_files) == 1:
        return render_template('result.html', 
                             image_url=url_for('serve_processed_image', 
                             filename=processed_files[0]),
                             process_id=process_id)

If you find yourself stuck or need clarification, feel free to check the accompanying GitHub repository for reference.

To make sure we're on the same page, here is the updated app.py at this point:

from flask import Flask, request, render_template, send_file, flash, redirect, url_for, send_from_directory
from werkzeug.utils import secure_filename
import cv2
import numpy as np
import random
import os
import uuid
import zipfile
from io import BytesIO

app = Flask(__name__)
app.secret_key = os.urandom(24)


# Configure upload folders
#-------------------------------------
app.config['UPLOAD_FOLDER'] = 'static/uploads'
app.config['ALLOWED_EXTENSIONS'] = {'png', 'jpg', 'jpeg', 'bmp', 'tiff'}
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']


#Yolov3 Implementation
#--------------------------------------
# Load the YOLO model configuration and weights
model = cv2.dnn.readNet('models/yolov3.weights', 'models/yolov3.cfg')
# Get all the layer names from the YOLO model
layer_names = model.getLayerNames()
# Identify the output layers (layers with no connections going forward)
unconnected_layers = model.getUnconnectedOutLayers()
output_layers = [layer_names[i[0] - 1] if isinstance(i, np.ndarray) else layer_names[i - 1] 
                 for i in unconnected_layers]

# Load COCO class names from file
with open('models/coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]


# Routes
#----------------------------------------
@app.route('/')
def index():
    return render_template('index.html')

@app.route('/uploads/<filename>')
def serve_processed_image(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename)

@app.route('/process', methods=['POST'])
def process_files():
    if 'files' not in request.files:
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    files = request.files.getlist('files')
    if len(files) == 0 or files[0].filename == '':
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    process_id = uuid.uuid4().hex  # Unique ID for this processing session
    processed_files = []

    for file in files:
        if file and allowed_file(file.filename):
            try:
                # Process image
                img = cv2.imdecode(np.frombuffer(file.read(), np.uint8), cv2.IMREAD_COLOR)

                #YOLOv3 detection and drawing code:
                boxes, confidences, class_ids, indexes = detect_objects(img)
                if len(indexes) > 0:
                    for i in indexes.flatten():
                        x, y, w, h = boxes[i]
                        label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
                        color = [random.randint(0, 255) for _ in range(3)]
                        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
                        cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

                # Save processed image with unique name
                filename = f"{process_id}_{secure_filename(file.filename)}"
                save_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
                cv2.imwrite(save_path, img) #YOLOv3 Implementation
                processed_files.append(filename)

            except Exception as e:
                app.logger.error(f"Error processing {file.filename}: {str(e)}")
                flash(f'Error processing {file.filename}', 'error')

    if len(processed_files) == 0:
        flash('No files processed successfully', 'error')
        return redirect(url_for('index'))

    # Handle single file response
    if len(processed_files) == 1:
        return render_template('result.html', 
                             image_url=url_for('serve_processed_image', 
                             filename=processed_files[0]),
                             process_id=process_id)

if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)

Run Your Application

Start your Flask application by running:

python app.py

Then, open your web browser and navigate to http://localhost:5000/.

You should see the index page with the upload form. When we upload an image, it should be processed, and the result displayed on the result page.


However, the app won’t process any images at this point because we've not defined the function that will handle the actual object detection process.

Here is the terminal output:

[2025-03-12 05:41:51,484] ERROR in app: Error processing 0_lCB37mwYtKFKJcrI.jpg: name 'detect_objects' is not defined

The error occurs because the detect_objects function is used in the '/process' route but hasn't been defined yet. Specifically, the issue arises from this line: boxes, confidences, class_ids, indexes = detect_objects(img), where the function is called but not implemented beforehand.
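If you're wondering why the server started without complaint and the error only appeared on upload: Python looks up a function name when the calling line runs, not when the caller is defined. The toy example below shows the same behaviour outside Flask. The names process and detect are placeholders for the route and detect_objects.

```python
def process():
    # 'detect' is looked up only when this line executes,
    # not when process() is defined.
    return detect("image")

try:
    process()
except NameError as exc:
    print("first call failed:", exc)

# Defining the missing function afterwards is enough; process() itself
# does not need to change.
def detect(img):
    return f"detected objects in {img}"

print(process())
```

The first call fails with "name 'detect' is not defined", and the second one succeeds once the function exists. In the app, the except block in the route catches this error and logs it, which is why you see it in the terminal rather than as a crash.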

So, to handle the image processing logic, go back to app.py and paste the following code immediately after the YOLOv3 implementation.

def detect_objects(img):
    height, width = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
    model.setInput(blob)
    outputs = model.forward(output_layers)

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                box = detection[0:4] * np.array([width, height, width, height])
                (center_x, center_y, w, h) = box.astype("int")
                x = int(center_x - (w / 2))
                y = int(center_y - (h / 2))
                boxes.append([x, y, int(w), int(h)])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    return boxes, confidences, class_ids, indexes

The detect_objects(img) function processes an input image by extracting its dimensions and converting it into a blob for a deep learning model, which then performs object detection. It filters detections with confidence above 0.5, calculates bounding box coordinates, and applies Non-Maximum Suppression (NMS) to remove redundant boxes before returning the final detections, confidence scores, class IDs, and retained indexes.
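The two less obvious steps here are the box conversion and Non-Maximum Suppression, so here is a pure-Python sketch of both. It is a simplified greedy version written for illustration, not OpenCV's actual NMSBoxes implementation, and the helper names and sample boxes are my own. The thresholds mirror the 0.5 and 0.4 passed to cv2.dnn.NMSBoxes above.

```python
def to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized centre-format box to [x, y, w, h] in pixels."""
    cx_px, cy_px = int(cx * img_w), int(cy * img_h)
    bw, bh = int(w * img_w), int(h * img_h)
    # Shift from the box centre to its top-left corner.
    return [int(cx_px - bw / 2), int(cy_px - bh / 2), bw, bh]

def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def greedy_nms(boxes, scores, score_thr=0.5, iou_thr=0.4):
    """Keep the best-scoring box, drop boxes that overlap it, repeat."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep

# Made-up detections on a 640x480 image.
boxes = [to_pixel_box(0.50, 0.5, 0.5, 0.5, 640, 480),   # main detection
         to_pixel_box(0.52, 0.5, 0.5, 0.5, 640, 480),   # near-duplicate of it
         to_pixel_box(0.10, 0.1, 0.1, 0.1, 640, 480)]   # separate object
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))   # [0, 2]
```

The second box overlaps the first almost completely, so it is suppressed, while the third box sits elsewhere in the image and is kept.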

Remember, if you find yourself stuck or need clarification, feel free to check the accompanying GitHub repository for reference.

Explanation:

The /process route handles the image upload, passes each image to detect_objects for object detection, and then renders the result page.

Restart the server and test again:

python app.py


Terminal output:

0: 448x640 1 person, 7506.0ms
Speed: 934.0ms preprocess, 7506.0ms inference, 1403.0ms postprocess per image at shape (1, 3, 448, 640)

0: 448x640 1 person, 831.0ms
Speed: 11.0ms preprocess, 831.0ms inference, 12.0ms postprocess per image at shape (1, 3, 448, 640)
127.0.0.1 - - [12/Mar/2025 05:33:36] "POST /process HTTP/1.1" 200 -
127.0.0.1 - - [12/Mar/2025 05:33:39] "GET /uploads/cb53a67b6db04b82910a1895d1a1887d_0_lCB37mwYtKFKJcrI_-_Copy_-_Copy.jpg HTTP/1.1" 200 -

Processing Multiple Images

Now, let’s process multiple image uploads. Go back to app.py and paste this after the single-file code snippet, inside the process_files function:

    # Handle multiple files as zip
    zip_filename = f"processed_images_{process_id}.zip"
    zip_buffer = BytesIO()

    with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
        for filename in processed_files:
            file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            zip_file.write(file_path, filename)

    zip_buffer.seek(0)

    response = send_file(zip_buffer,
                        mimetype='application/zip',
                        as_attachment=True,
                        download_name=zip_filename)

    # Add custom header to identify zip response
    response.headers['X-Content-Type'] = 'application/zip'
    return response

Start the server and ensure it's running, then upload multiple image files through the application. The app will begin processing the images and automatically zip all the processed files.

Once processing is complete, the zipped file will be downloaded to your default Downloads directory.
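If the BytesIO part is new to you, the sketch below isolates it: the archive is built entirely in memory and then rewound so it can be read from the start. Unlike the route, which adds files from disk with zip_file.write(), this version uses writestr() with placeholder byte strings so that it runs on its own. The helper name zip_in_memory is illustrative.

```python
import zipfile
from io import BytesIO

def zip_in_memory(files):
    """Pack a {name: bytes} mapping into a ZIP archive held in memory."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)   # write bytes directly, no file on disk
    buffer.seek(0)                    # rewind so the reader starts at byte 0
    return buffer

# Placeholder payloads standing in for processed images.
archive = zip_in_memory({'a.jpg': b'first', 'b.jpg': b'second'})
with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())   # ['a.jpg', 'b.jpg']
```

Forgetting the seek(0) call is a common slip: the buffer's position would be left at the end of the data, so whatever reads it next would see an empty file.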

With that, we can now switch from the YOLOv3 model to YOLOv12.

Switching to YOLOv12

YOLOv12 is an advanced object detection model that emphasizes attention mechanisms to enhance detection accuracy while maintaining real-time processing speeds. YOLOv12 was released on February 18, 2025 and was developed by Yunjie Tian, Qixiang Ye, and David Doermann (Read the paper here). According to its authors, the model surpasses previous iterations, such as YOLOv10 and YOLOv11, by achieving higher mean Average Precision (mAP) scores with comparable or faster inference times.

Its architecture supports various tasks, including object detection, segmentation, classification, keypoint detection, and oriented bounding box detection.

First things first, note that YOLOv3 uses OpenCV’s DNN module, while YOLOv12 is accessed via the Ultralytics library, which has a different API.

So, go ahead and install the Ultralytics library:

pip install ultralytics

One more thing: the original code for YOLOv3 involves loading the model with cv2.dnn.readNet, processing blobs, and handling outputs through specific layers. For YOLOv12, the Ultralytics API is more straightforward. The model is loaded directly with YOLO('model.pt'), and predictions are made with model.predict(), which returns a list of Results objects, one per input image.

Basically, the main changes are in the model loading and the detection function. The detect_objects function from the YOLOv3 version needs to be replaced. Instead of manually processing outputs, Ultralytics handles this internally. The bounding boxes, confidence scores, and class IDs can be extracted directly from the Results object. Also, each Results object has a plot() method that returns an annotated copy of the image. With these changes, you'll find that the detect_objects function is 5-10x shorter and less error-prone (no manual box calculations).
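The tutorial only needs plot(), but if you ever want the raw detections (for logging, or for returning JSON instead of an image), here is a sketch of how they could be pulled out of a Results object. It relies on the boxes.xyxy, boxes.conf, boxes.cls, and names attributes that Ultralytics exposes on its results. The helper summarize_detections and the dictionary layout are my own choices for illustration.

```python
def summarize_detections(result):
    """Turn one Ultralytics-style result into a list of plain dicts.

    Assumes result.boxes exposes xyxy, conf and cls tensors and that
    result.names maps class IDs to labels.
    """
    boxes = result.boxes
    detections = []
    for xyxy, conf, cls_id in zip(boxes.xyxy.tolist(),
                                  boxes.conf.tolist(),
                                  boxes.cls.tolist()):
        detections.append({
            "label": result.names[int(cls_id)],    # class IDs come back as floats
            "confidence": round(float(conf), 2),
            "box": [int(v) for v in xyxy],         # [x1, y1, x2, y2] in pixels
        })
    return detections
```

With a loaded model, you would call it as summarize_detections(model.predict(img)[0]). Note that these boxes are corner coordinates (x1, y1, x2, y2), not the x, y, width, height format used in the YOLOv3 code.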

So, let’s go ahead and replace the YOLOv3 model loading with the Ultralytics version and update the detect_objects function.

In app.py, replace this section:

# ========== YOLOv3 Implementation (Original) ==========
# model = cv2.dnn.readNet('models/yolov3.weights', 'models/yolov3.cfg')
# layer_names = model.getLayerNames()
# unconnected_layers = model.getUnconnectedOutLayers()
# output_layers = [layer_names[i[0] - 1] if isinstance(i, np.ndarray) else layer_names[i - 1] 
#                  for i in unconnected_layers]

# def detect_objects(img):
#     height, width = img.shape[:2]
#     blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
#     model.setInput(blob)
#     outputs = model.forward(output_layers)
#     # ... rest of detection logic ...

With this:

# ========== YOLOv12 Implementation (New) ==========
from ultralytics import YOLO

# Load YOLOv12 model
model = YOLO('models/yolov12n.pt')  

def detect_objects(img):
    results = model.predict(img)
    return results[0].plot()  # Returns the annotated image directly

Also, replace this section under @app.route('/process', methods=['POST']):

# ======== REPLACE THIS SECTION ========
                # Old YOLOv3 detection and drawing code:
                # boxes, confidences, class_ids, indexes = detect_objects(img)
                # if len(indexes) > 0:
                #     for i in indexes.flatten():
                #         x, y, w, h = boxes[i]
                #         label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
                #         color = [random.randint(0, 255) for _ in range(3)]
                #         cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
                #         cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)


With this:

                # New YOLOv12 detection and auto-annotation
                # (detect_objects already calls model.predict, so no separate call is needed):
                annotated_img = detect_objects(img)

Then immediately after that, replace:

cv2.imwrite(save_path, img) 

with this:

cv2.imwrite(save_path, annotated_img)

Now, go ahead and test both single and multiple file uploads again. Each image should be processed and returned with its detections drawn on it, just as with the YOLOv3 version.
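While testing, it can be useful to measure how long each backend takes so the two versions can be compared. The helper below is a minimal sketch, not part of the original app: time_detection is a hypothetical name, and it accepts any detection function that takes an image, such as detect_objects.

```python
import time


def time_detection(detect_fn, img, runs=3):
    """Call detect_fn(img) `runs` times; return (last_result, average_seconds)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    result = None
    start = time.perf_counter()
    for _ in range(runs):
        result = detect_fn(img)
    average = (time.perf_counter() - start) / runs
    return result, average
```

For example, annotated_img, seconds = time_detection(detect_objects, img) gives a rough per-image figure. The first call to a freshly loaded model often includes warm-up work, so averaging over several runs tends to be more representative than a single call.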

There's More!

Congrats on making it this far! Now that we've reached the end of the guide, there are still several ways to enhance the app:

  1. Improved File Handling: Currently, the app processes multiple images only as a zipped file and automatically downloads them. You can enhance this by displaying all processed images in a grid layout with an option to download them individually.
  2. Video Processing Support: Right now, object detection works only for images. Extend the app to support video processing, allowing users to upload videos directly, analyze YouTube videos, or even process surveillance feeds in real time.
  3. Authenticated API & User Dashboard: You can implement an authentication system where only registered users can access the app. Provide users with a dashboard where they can choose different object detection and segmentation methods for their needs.

If you need further customizations or guidance, feel free to reach out.

You can download the full code from the GitHub repo

Conclusion

In conclusion, this guide walked through building a real-time object detection app with Flask, first with YOLOv3 and then with YOLOv12. Having implemented both versions, you now have a clearer picture of how their APIs differ and how to weigh speed, accuracy, and scalability when choosing between them.

References

  1. https://pjreddie.com/darknet/yolo/
  2. https://blog.roboflow.com/train-yolov12-model/
  3. https://learnopencv.com/yolov12/
  4. https://github.com/sunsmarterjie/yolov12
  5. https://roboflow.com/model/yolov12
  6. https://www.arxiv.org/pdf/2502.12524
