Introduction
Social distancing is deliberately increasing the physical space between people to avoid spreading illness. Staying at least six feet away from other people lessens your chances of contracting COVID-19. We can use OpenCV and YOLO to monitor/analyze whether people are maintaining social distancing or not.
Techniques and tools used
I used Python for this project. Some other tools I used were OpenCV and NumPy.
Theory
A little theory won’t hurt :)
OpenCV
If you don’t know what OpenCV is: OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library, a collection of programming functions mainly aimed at real-time computer vision. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which includes a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms.
For more info Click Here.
YOLO
YOLO (You Only Look Once) is a clever convolutional neural network (CNN) for doing object detection in real-time. The algorithm applies a single neural network to the full image: it divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.
YOLO is popular because it achieves high accuracy while also being able to run in real-time. The algorithm “only looks once” at the image in the sense that it requires only one forward propagation pass through the neural network to make predictions. After non-max suppression (which makes sure the object detection algorithm only detects each object once), it then outputs recognized objects together with the bounding boxes.
We will use both of these extensively in our project.
Overview
We will use YOLO for object detection.
Once the objects(people) are detected, we will then draw a bounding box around them.
Using the centroid of the boxes we then measure the distances between them.
For the distance measure, Euclidean Distance was used.
A box is colored RED if unsafe and GREEN if safe.
We will also count the number of people who are unsafe because they are not maintaining social-distancing.
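To make that concrete, here is a rough sketch of what the per-frame loop will look like (pseudocode only; the real implementation comes later in the article):
# Rough per-frame pipeline (pseudocode sketch, not the final code):
# for each frame of the video:
#     run YOLO on the frame
#     keep only 'person' detections above a confidence threshold
#     apply non-max suppression and compute the centroid of every surviving box
#     for every pair of centroids:
#         if euclidean_distance(pair) <= SAFE_DISTANCE: mark both people as unsafe
#     draw RED boxes for unsafe people, GREEN boxes for safe ones, plus an unsafe counter
#     write the annotated frame to the output video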
Already interested? Let’s get started with the fun part…
Project
First, let’s see the project structure
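Roughly, the layout looks like this (a sketch based on the paths used below; the root folder name is just an example):
social-distancing-detector/        # (hypothetical root folder name)
├── constants.py                   # paths and the SAFE_DISTANCE threshold
├── main.py                        # detection and distance-checking logic
├── yolov3/
│   ├── coco.names
│   ├── yolov3.cfg
│   └── yolov3.weights
├── videos/
│   └── video.mp4                  # input video
└── output/
    └── output.avi                 # output video written by the script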
Now, for the video.mp4 file (input), Click here. You can also download the YOLOv3 weights, configuration, and COCO names from here:
YOLOv3 weights — Click here
YOLOv3 cfg — Click here
COCO names — Click here
Now, after that is done, open up constants.py and copy the following lines of code:
YOLOV3_LABELS_PATH = './yolov3/coco.names'       # COCO class names (80 labels, 'person' is index 0)
YOLOV3_CFG_PATH = './yolov3/yolov3.cfg'          # YOLOv3 network configuration
YOLOV3_WEIGHTS_PATH = './yolov3/yolov3.weights'  # pre-trained YOLOv3 weights
VIDEO_PATH = './videos/video.mp4'                # input video
OUTPUT_PATH = './output/output.avi'              # annotated output video
SAFE_DISTANCE = 60                               # minimum centroid distance (in pixels) considered safe
Wait… What did I just copy?
Don’t worry! This file just contains the relative paths of the YOLO weights, cfg file, COCO names, the input video, the output video, and the SAFE_DISTANCE to be maintained.
Now onto the main part. Open up the main.py file. First, let’s make the necessary imports. We also define 2 more constants, LABELS and COLORS, which we will be using later.
import numpy as np
import imutils
import time
import cv2
import os
import math
from itertools import chain
from constants import *
LABELS = open(YOLOV3_LABELS_PATH).read().strip().split('\n')
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype='uint8')
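As a quick sanity check (purely illustrative), the standard COCO names file contains 80 classes and its first entry is 'person', which is why we will later filter detections on classID == 0:
print(len(LABELS))  # 80 -- number of classes in the standard COCO names file
print(LABELS[0])    # 'person' -- class index 0, the only class we care about here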
Next, we load in the YOLO model using the configuration and weights we downloaded before. The readNetFromDarknet function helps us to do so.
print('Loading YOLO from disk...')
neural_net = cv2.dnn.readNetFromDarknet(YOLOV3_CFG_PATH, YOLOV3_WEIGHTS_PATH)
layer_names = neural_net.getLayerNames()
layer_names = [layer_names[i[0] - 1] for i in neural_net.getUnconnectedOutLayers()]
layer_names consists of all the output layer names we need from YOLO.
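One note in case you hit an IndexError on the line above: in newer OpenCV versions (roughly 4.5.4 and later), getUnconnectedOutLayers() returns a flat array of indices instead of single-element arrays, so i[0] fails. A version-tolerant sketch looks like this:
# Version-tolerant sketch: handles both the old (nested) and new (flat)
# return shapes of getUnconnectedOutLayers().
all_names = neural_net.getLayerNames()
out_indices = neural_net.getUnconnectedOutLayers()
try:
    layer_names = [all_names[i[0] - 1] for i in out_indices]  # older OpenCV builds
except (IndexError, TypeError):
    layer_names = [all_names[i - 1] for i in out_indices]     # newer OpenCV builds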
Now, we use OpenCV’s VideoCapture function to read the input video stream.
vs = cv2.VideoCapture(VIDEO_PATH)
writer = None
(W, H) = (None, None)
try:
    if imutils.is_cv2():
        prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT
    else:
        prop = cv2.CAP_PROP_FRAME_COUNT
    total = int(vs.get(prop))
    print('Total frames detected are: ', total)
except Exception as e:
    print(e)
    total = -1
We also set the dimensions of the video frame (W, H) to (None, None) initially. After this, we use OpenCV’s CAP_PROP_FRAME_COUNT property to count the number of frames in the given input video stream. We wrap this in a try/except block in order to catch any exceptions.
We then read each frame of the input video stream.
while True:
    (grabbed, frame) = vs.read()
    if not grabbed:
        break
    if W is None or H is None:
        H, W = (frame.shape[0], frame.shape[1])
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    neural_net.setInput(blob)
    start_time = time.time()
    layer_outputs = neural_net.forward(layer_names)
    end_time = time.time()
OpenCV’s read function helps us do that easily. What is a frame, you ask? It is simple! As the name suggests, a frame is basically one shot of the video; all these frames stitched together make up the video. The frame is an array consisting of 3 arrays, one per color channel, i.e. Blue, Green, Red (BGR). Each array consists of numbers between 0 and 255, which are called pixel values. Each image is made up of pixels, so a 4 x 4 image has 16 pixels.
We use a while loop to loop over all the frames of the input video. If a frame is not grabbed, we break out of the while loop, as it is likely the end of the video. We also update our H and W variables from (None, None) to (height_of_frame, width_of_frame). Next, we create a blob from the image frame. As OpenCV uses the ‘traditional’ representation of colors, the channels are in BGR (Blue, Green, Red) order, so we pass the argument swapRB=True to swap the R and B channels and get an RGB array. We also rescale the image by dividing the array elements by 255, so that each element lies between 0 and 1, which helps the model perform better.
BLOB stands for Binary Large OBject. Here, the blob produced by blobFromImage is simply the preprocessed frame packed into a 4-dimensional array of shape (batch, channels, height, width), which is the input format the network expects. We give that as input to the model and then perform a forward pass of YOLO.
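If you’re curious about the shapes involved, you can print them; the exact numbers below are just an example and depend on your input video:
print(frame.shape)  # e.g. (720, 1280, 3): height, width, and the 3 BGR channels
print(blob.shape)   # (1, 3, 416, 416): batch size, channels, height, width expected by the network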
The output from YOLO consists of a set of values. These values help us define which class the object is of and it also gives us the detected object’s bounding box values.
    boxes = []
    confidences = []
    classIDs = []
    lines = []
    box_centers = []

    for output in layer_outputs:
        for detection in output:
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            if confidence > 0.5 and classID == 0:
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype('int')
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                box_centers = [centerX, centerY]

                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
We loop over every output in layer_outputs and every detection in the output. We get the scores of each class (the 80 classes from the COCO names) from the detection array, along with the confidence of the most likely class. We keep a confidence threshold of 0.5, and as we are only interested in detecting people, we require the classID to be 0 (the ‘person’ class). From each detection we get a bounding box: the first 4 elements of the detection array give us [X_center_of_box, Y_center_of_box, Width_of_box, Height_of_box] as fractions of the frame size, which we then scale to our image frame dimensions.
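Here is a tiny, self-contained example (with made-up numbers) of that scaling step, just to show what the first 4 detection values turn into:
# Hypothetical detection for a 1280x720 frame: values are fractions of the frame size.
import numpy as np

W, H = 1280, 720
detection = np.array([0.5, 0.5, 0.1, 0.4])  # [centerX, centerY, width, height]
(centerX, centerY, width, height) = (detection[0:4] * np.array([W, H, W, H])).astype('int')
x = int(centerX - (width / 2))   # top-left corner x
y = int(centerY - (height / 2))  # top-left corner y
print(x, y, int(width), int(height))  # 576 216 128 288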
Then we start drawing the bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.3)

    if len(idxs) > 0:
        unsafe = []
        count = 0

        for i in idxs.flatten():
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            centeriX = boxes[i][0] + (boxes[i][2] // 2)
            centeriY = boxes[i][1] + (boxes[i][3] // 2)

            color = [int(c) for c in COLORS[classIDs[i]]]
            text = '{}: {:.4f}'.format(LABELS[classIDs[i]], confidences[i])

            idxs_copy = list(idxs.flatten())
            idxs_copy.remove(i)

            for j in np.array(idxs_copy):
                centerjX = boxes[j][0] + (boxes[j][2] // 2)
                centerjY = boxes[j][1] + (boxes[j][3] // 2)
                distance = math.sqrt(math.pow(centerjX - centeriX, 2) + math.pow(centerjY - centeriY, 2))

                if distance <= SAFE_DISTANCE:
                    cv2.line(frame, (boxes[i][0] + (boxes[i][2] // 2), boxes[i][1] + (boxes[i][3] // 2)), (boxes[j][0] + (boxes[j][2] // 2), boxes[j][1] + (boxes[j][3] // 2)), (0, 0, 255), 2)
                    unsafe.append([centerjX, centerjY])
                    unsafe.append([centeriX, centeriY])

            if centeriX in chain(*unsafe) and centeriY in chain(*unsafe):
                count += 1
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
            else:
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

            cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        cv2.rectangle(frame, (50, 50), (450, 90), (0, 0, 0), -1)
        cv2.putText(frame, 'No. of people unsafe: {}'.format(count), (70, 70), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 3)
We use Non-Max Suppression in order to avoid weak and overlapping bounding boxes. Then we calculate the distance between the centroid of the current box and all the other detected bounding box centroids. We use the Euclidean distance to measure the distances between the boxes. Below is the formula for Euclidean distance.
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
We compare each distance with the SAFE_DISTANCE constant we defined earlier in the constants.py file. Next, we use the rectangle function of OpenCV to create a rectangle with the box dimensions we received from the model. We check whether the box is safe or unsafe: if unsafe, the box is colored red; otherwise it is colored green. We also display a text showing the number of people who are unsafe using OpenCV’s putText function.
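As a quick worked example (with made-up centroids): for two people at (100, 200) and (130, 240), the distance is sqrt(30² + 40²) = 50 pixels, which is below SAFE_DISTANCE = 60, so both would be marked unsafe:
import math
distance = math.sqrt((130 - 100) ** 2 + (240 - 200) ** 2)
print(distance)  # 50.0 -> <= 60, so both boxes turn red and the unsafe counter goes up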
Now we create a video by joining the frames back together:
    if writer is None:
        fourcc = cv2.VideoWriter_fourcc(*'MJPG')
        writer = cv2.VideoWriter(OUTPUT_PATH, fourcc, 30, (frame.shape[1], frame.shape[0]), True)

        if total > 0:
            elap = (end_time - start_time)
            print('Single frame took {:.4f} seconds'.format(elap))
            print('Estimated total time to finish: {:.4f}'.format(elap * total))

    writer.write(frame)
print('Cleaning up...')
writer.release()
vs.release()
The VideoWriter function of OpenCV helps us to do that. It will store the output video at the location specified by OUTPUT_PATH which we have defined in the constants.py file earlier. The release function will then release the file pointers.
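One small caveat: the writer above hard-codes 30 frames per second, so if your input video was recorded at a different frame rate, the output will play slightly too fast or too slow. A possible tweak (assuming your OpenCV build reports the FPS correctly) is to read it from the capture instead:
# Optional: match the output FPS to the input video instead of hard-coding 30.
# CAP_PROP_FPS can return 0 for some sources, so fall back to 30 in that case.
fps = vs.get(cv2.CAP_PROP_FPS) or 30
writer = cv2.VideoWriter(OUTPUT_PATH, fourcc, fps, (frame.shape[1], frame.shape[0]), True)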
Output
Phew!… Now that the coding part is over, time to see the fruits of our effort.
Go ahead and run the main.py file as follows.
python main.py
Once the program has finished executing, check your output folder and open the output.avi file.
It should look something like this…
Impressive, right?
Limitations and Future Scope
Although this project is cool, it has a few limitations:
This project does not take into account the camera perspective.
It does not leverage a proper camera calibration (distances are not measured accurately).
I will work on these limitations in the future.
End notes
You can find the entire code for this article here.
Leave a ⭐ on the repo and a ❤️ on this article if you found it useful. Thank you:)
Top comments (13)
throwing error on google colab
'NoneType' object has no attribute 'shape'
kindly help
I'm getting the same error. Were you able to resolve it Hemant?
Hey! I ran the code on Google Colab and the code ran without any error. I think something went wrong while processing the frame.
hey plz send that code i need it
github.com/sherwyn11/Social-Distan...
can you share limitations and importance of the project
Limitations:
This project does not take into account the camera perspective.
It does not leverage a proper camera calibration (distances are not measured accurately).
Importance:
Social distancing is deliberately increasing the physical space between people to avoid spreading illness. Staying at least six feet away from other people lessens your chances of contracting COVID-19. We can use OpenCV and YOLO to monitor/analyze whether people are maintaining social distancing or not. It could also be used simply to keep track of whether people are maintaining social distancing.
hi may i know the 60 in safe distance is in what unit? and how you determine it?
Hi... As I have said in the limitations, this project does not take actual perspective into consideration. So, the 60 is just a rough estimate of a safe distance (in pixels) for this camera angle.
Awesome! :o
Thank you :)
where is the constant.py file bruh!!!
I've attached a file directory structure image above. So create a constants.py file accordingly and then add this: