Instance segmentation, a challenging task in computer vision that involves detecting and delineating individual objects within an image or video, has seen significant advancements in recent years. One such advancement is Detectron2, a flexible and efficient framework developed by Facebook AI Research. In this guide, we'll explore how to leverage the power of Detectron2 within the Google Colab environment to perform instance segmentation on videos.
Step 1: Check GPU availability
First, connect to a GPU runtime: open the Runtime tab, choose "Change runtime type", and select GPU as the hardware accelerator.
Then confirm that the GPU is actually accessible by running:
!nvidia-smi
If the output shows a table with your GPU's name, driver version, and memory usage, you are all set to go.
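As a quick cross-check from Python, you can also ask PyTorch (which comes preinstalled in Colab) whether it sees the GPU. This is just a sanity check; if it prints False, Detectron2 will fall back to the CPU and inference will be very slow.
import torch

# True if a CUDA-capable GPU is visible to PyTorch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the active GPU, e.g. "Tesla T4"
    print("GPU:", torch.cuda.get_device_name(0))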
Step 2: Install detectron2
Run this single command to install Detectron2 directly from its GitHub repository.
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
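After the install finishes (restart the runtime if Colab prompts you to), verify that the package imports cleanly:
import detectron2

# Print the installed version to confirm the build succeeded
print(detectron2.__version__)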
Step 3: Import libraries
Import the required libraries.
# COMMON LIBRARIES
import os
import cv2
from google.colab.patches import cv2_imshow
# VISUALIZATION
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer
from detectron2.utils.visualizer import ColorMode
# CONFIGURATION
from detectron2 import model_zoo
from detectron2.config import get_cfg
# EVALUATION
from detectron2.engine import DefaultPredictor
Step 4: Initialize the predictor
Choose a model as per your requirement from the model zoo. You can browse the list of available models in the Detectron2 Model Zoo (https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md).
cfg = get_cfg()
# Use the model_zoo helpers so the config and weights resolve correctly with a pip install
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # minimum confidence for a detection to be kept
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
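Before moving on to video, it is worth sanity-checking the predictor on a single image. The sketch below assumes a test image at /content/sample.jpg; the path is a placeholder, so swap in any image you have uploaded to the session.
im = cv2.imread("/content/sample.jpg")  # placeholder path; upload any test image
outputs = predictor(im)
# Visualizer expects RGB, while OpenCV reads images as BGR, hence the channel flips
v = Visualizer(im[:, :, ::-1], metadata=MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=0.8)
cv2_imshow(v.draw_instance_predictions(outputs["instances"].to("cpu")).get_image()[:, :, ::-1])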
Step 5: Inference on Video
Set the path to your video in the following code and run it. The output is a new video with the segmentation overlaid on every frame.
import imageio

# Load the input video
video_path = "path_to_your_video.mp4"
cap = cv2.VideoCapture(video_path)

# Initialize the video writer with the same frame rate as the input
fps = cap.get(cv2.CAP_PROP_FPS)
output_path = "/content/output.mp4"
writer = imageio.get_writer(output_path, fps=fps)

# Perform instance segmentation on each frame
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    outputs = predictor(frame)
    instances = outputs["instances"].to("cpu")
    # Inspect the predicted class IDs (optional)
    pred_classes = instances.pred_classes.numpy()
    # Inspect the binary segmentation masks (optional)
    pred_masks = instances.pred_masks.numpy()
    # Visualizer expects RGB, while OpenCV reads frames as BGR
    v = Visualizer(frame[:, :, ::-1], metadata=MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=0.8)
    vis_frame = v.draw_instance_predictions(instances).get_image()
    # imageio expects RGB, so the visualized frame can be written as-is
    writer.append_data(vis_frame)

# Release video resources
cap.release()
writer.close()
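Once the writer is closed, the result lives at /content/output.mp4. You can preview it from the Files sidebar, or pull it down to your machine with Colab's files helper:
from google.colab import files

# Trigger a browser download of the segmented video
files.download(output_path)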