When you start using YOLOv8 you lose a lot of time finding the basic code to get the bounding boxes, the confidences, the classes, or to run a YOLO model with ONNX Runtime!
We will work with the pre-trained COCO object detection models. These models detect 80 classes:
A class is just a type of object which will be detected, like a “cat” or a “person”.
# Class names:
0: person   1: bicycle   2: car   3: motorcycle   4: airplane
5: bus   6: train   7: truck   8: boat   9: traffic light
10: fire hydrant   11: stop sign   12: parking meter   13: bench   14: bird
15: cat   16: dog   17: horse   18: sheep   19: cow
20: elephant   21: bear   22: zebra   23: giraffe   24: backpack
25: umbrella   26: handbag   27: tie   28: suitcase   29: frisbee
30: skis   31: snowboard   32: sports ball   33: kite   34: baseball bat
35: baseball glove   36: skateboard   37: surfboard   38: tennis racket   39: bottle
40: wine glass   41: cup   42: fork   43: knife   44: spoon
45: bowl   46: banana   47: apple   48: sandwich   49: orange
50: broccoli   51: carrot   52: hot dog   53: pizza   54: donut
55: cake   56: chair   57: couch   58: potted plant   59: bed
60: dining table   61: toilet   62: tv   63: laptop   64: mouse
65: remote   66: keyboard   67: cell phone   68: microwave   69: oven
70: toaster   71: sink   72: refrigerator   73: book   74: clock
75: vase   76: scissors   77: teddy bear   78: hair drier   79: toothbrush
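By the way, you don't have to hard-code this list: once a model is loaded (as we will do below), its names attribute gives you this exact id-to-name dict. A minimal sketch:
from ultralytics import YOLO
#names maps class id -> class name for the loaded weights
model = YOLO('yolov8n.pt')
print(model.names[16]) #'dog'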
We then have various models to detect these classes:
Name         Size
yolov8n.pt   Nano
yolov8s.pt   Small
yolov8m.pt   Medium
yolov8l.pt   Large
yolov8x.pt   Extra Large
They all do the same job; in short: the bigger the model, the better the predictions (and the slower the inference).
All these models are trained with an image size of 640.
You don't have to resize the image to make a prediction; this will be done automatically!
In this story we will use the smallest one, yolov8n.pt.
Let's start with Python! I recommend ≥ 3.9; the console commands will be detailed at the end for the BAT lovers!
First, install Ultralytics:
python -m pip install ultralytics
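If you want to confirm the install worked, Ultralytics ships a checks() helper you can call in one line (the output depends on your setup):
python -c "import ultralytics; ultralytics.checks()"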
Create a folder Yolo8 and a file myYoloTests.py (or whatever you want) in it, and copy this code into it!
from ultralytics import YOLO
#Loading the nano model
model = YOLO('yolov8n.pt')
The model will be downloaded automatically on the first run!
> python myYoloTests.py
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...
0%| | 0.00/6.23M [00:00<?, ?B/s]
6%|6 | 384k/6.23M [00:00<00:01, 3.60MB/s]
18%|#7 | 1.09M/6.23M [00:00<00:00, 5.53MB/s]
26%|##6 | 1.62M/6.23M [00:00<00:00, 5.33MB/s]
35%|###5 | 2.19M/6.23M [00:00<00:00, 5.36MB/s]
46%|####5 | 2.86M/6.23M [00:00<00:00, 5.75MB/s]
55%|#####4 | 3.41M/6.23M [00:00<00:00, 5.34MB/s]
64%|######4 | 4.00M/6.23M [00:00<00:00, 5.43MB/s]
73%|#######2 | 4.55M/6.23M [00:00<00:00, 5.37MB/s]
81%|########1 | 5.06M/6.23M [00:01<00:00, 4.82MB/s]
89%|########9 | 5.56M/6.23M [00:01<00:00, 4.81MB/s]
97%|#########6| 6.03M/6.23M [00:01<00:00, 3.90MB/s]
100%|##########| 6.23M/6.23M [00:01<00:00, 4.75MB/s]
You now have the pre-trained model yolov8n.pt in your folder!
Now let’s start with the basic function to detect an object on this image:
You can download it as dog.jpg and modify your myYoloTests.py like this:
from ultralytics import YOLO
#Loading the nano model
model = YOLO('yolov8n.pt',task='detect')
#Defining the image to test
image = 'dog.jpg'
#Running an inference on the image
model(image)
Run it:
> python myYoloTests.py
You will get as a result:
image 1/1 C:\Users\Quasar\Desktop\yolo8medium\dog.jpg: 480x640 1 dog, 546.9ms
Speed: 15.6ms preprocess, 546.9ms inference, 31.2ms postprocess per image at shape (1, 3, 480, 640)
You get as a result that a dog was detected, plus the inference time!
That's a good start! But I want an image with bounding boxes on it displayed at run time!!!!
Modify your myYoloTests.py this way:
from ultralytics import YOLO
import cv2
#Loading the nano model
model = YOLO('yolov8n.pt',task='detect')
image = 'dog.jpg'
model(image,show=True)
cv2.waitKey(0)
And the image with the dog detected inside a bounding box will be displayed!!
If you want YOLO to save all that automatically for you, just add the save arguments:
from ultralytics import YOLO
import cv2
#Loading the nano model
model = YOLO('yolov8n.pt',task='detect')
image = 'dog.jpg'
model(image,show=True,save=True,save_crop=True,save_txt=True) #save_txt=True writes the label .txt files
cv2.waitKey(0)
image 1/1 C:\Users\Quasar\Desktop\yolo8medium\dog.jpg: 480x640 1 dog, 484.4ms
Speed: 15.6ms preprocess, 484.4ms inference, 31.2ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\detect\predict
YOLO has created the folder runs\detect\predict.
In the predict folder you will have a folder crops\dog (the detected class). In this dog folder you have the image dog.jpg cropped:
In the folder predict you have the original image dog.jpg with the bounding box:
And in predict\labels you have the file dog.txt containing:
16 0.493492 0.420779 0.828031 0.76239
16 is the class number for a dog (you can check it at the beginning of the story) and the four last float numbers are the bounding box coordinates in YOLO format: the normalized center x, center y, width and height, relative to the input image (for example, center x = 0.493492 × 640 ≈ 316 px)!
How do you transform this YOLO format into an understandable format? This is here just as information for whoever may need it; you will see later that YOLO gives other information, so you will normally never use this function!
import math
import cv2
#Convert the YOLO label (class, normalized cx, cy, w, h) to pixel corner coordinates
def yolo2xyxy(im):
    img = cv2.imread(im)
    y,x = img.shape[:2]
    yoloFile = open('.\\runs\\detect\\predict\\labels\\dog.txt','r')
    yoloArray = yoloFile.read().split()
    yoloFile.close()
    yoloArray = [float(z) for z in yoloArray]
    #x1,y1 = top-left corner, x2,y2 = bottom-right corner, in pixels
    x1 = math.ceil((yoloArray[1]-yoloArray[3]/2)*x)
    x2 = math.ceil((yoloArray[1]+yoloArray[3]/2)*x)
    y1 = math.ceil((yoloArray[2]-yoloArray[4]/2)*y)
    y2 = math.ceil((yoloArray[2]+yoloArray[4]/2)*y)
    return x1,y1,x2,y2
x1,y1,x2,y2 = yolo2xyxy('dog.jpg')
print("x1:",x1,"y1:",y1,"x2:",x2,"y2:",y2)
Which returns:
x1: 51 y1: 20 x2: 581 y2: 385
These are the top-left and bottom-right pixel corners; they match the xyxy tensor we will meet below!
. . .
OK, now we want full control: crop the image, get the class and the confidence, and optimize the processing speed!!!
YOLO will give you all the information of a prediction in a variable:
from ultralytics import YOLO
import cv2
#Loading the nano model
model = YOLO('yolov8n.pt',task='detect')
image = 'dog.jpg'
results = model(image)
print(results[0])
You will get a complete structure of your inference:
ultralytics.engine.results.Results object with attributes:
boxes: ultralytics.engine.results.Boxes object
keypoints: None
keys: ['boxes']
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
orig_img: array([[[196, 179, 182],
[196, 182, 184],
[197, 184, 186],
...,
[255, 255, 251],
[255, 255, 251],
[255, 255, 251]],
[[196, 179, 182],
[196, 182, 184],
[197, 184, 186],
...,
[255, 255, 251],
[255, 255, 252],
[255, 255, 251]],
[[194, 180, 181],
[195, 183, 183],
[195, 185, 185],
...,
[255, 255, 252],
[255, 255, 254],
[255, 255, 252]],
...,
[[188, 191, 196],
[188, 191, 196],
[187, 190, 195],
...,
[125, 138, 136],
[125, 138, 136],
[125, 138, 136]],
[[188, 191, 196],
[188, 191, 196],
[187, 190, 195],
...,
[124, 138, 137],
[124, 138, 137],
[125, 139, 138]],
[[188, 191, 196],
[188, 191, 196],
[187, 190, 195],
...,
[123, 139, 138],
[125, 138, 140],
[125, 138, 140]]], dtype=uint8)
orig_shape: (480, 640)
path: 'C:\\Users\\Quasar\\Desktop\\yolo8medium\\dog.jpg'
probs: None
save_dir: None
speed: {'preprocess': 15.631437301635742, 'inference': 609.3780994415283, 'postprocess': 31.238794326782227}
If you want the bounding box:
from ultralytics import YOLO
import cv2
model = YOLO('yolov8n.pt',task='detect')
image = 'dog.jpg'
results = model(image)[0]
box = results.boxes
print(box)
And the output:
ultralytics.engine.results.Boxes object with attributes:
boxes: tensor([[ 50.8651, 19.0002, 580.8049, 384.9475, 0.5911, 16.0000]])
cls: tensor([16.])
conf: tensor([0.5911])
data: tensor([[ 50.8651, 19.0002, 580.8049, 384.9475, 0.5911, 16.0000]])
id: None
is_track: False
orig_shape: (480, 640)
shape: torch.Size([1, 6])
xywh: tensor([[315.8350, 201.9738, 529.9398, 365.9473]])
xywhn: tensor([[0.4935, 0.4208, 0.8280, 0.7624]])
xyxy: tensor([[ 50.8651, 19.0002, 580.8049, 384.9475]])
xyxyn: tensor([[0.0795, 0.0396, 0.9075, 0.8020]])
You see that you have all the bounding box formats as tensors!!
I like to work with the xyxy format, so with the results list we can get everything, from the original image to bounding boxes, classes and confidences!!
So I made a function that returns the coords, the class, the confidence and the image already cropped. I make only 1 detection in this example; that's why I put max_det=1 in the model call!
from ultralytics import YOLO
import numpy as np
import cv2
model = YOLO('yolov8n.pt',task='detect')
image = 'dog.jpg'
Threshold=0.3
results = model(image,conf=Threshold,max_det=1)
def affRes(results):
    result = results[0]
    res = result.boxes[0]
    #xyxy pixel coordinates, rounded to integers
    cords = res.xyxy[0].tolist()
    cords = [round(x) for x in cords]
    #map the class id to its name and round the confidence
    class_id = result.names[int(res.cls[0].item())]
    conf = round(res.conf[0].item(), 2)
    #crop the detection out of the original image and resize it to 128x128
    img_cropped = cv2.resize(np.array(result.orig_img[cords[1]:cords[3],cords[0]:cords[2]]), (128, 128), interpolation=cv2.INTER_AREA)
    return cords,conf,class_id,img_cropped
box = results[0].boxes
if len(box)==0:
    print("Not detected!!")
    exit()
else:
    coords,conf,cl,img_cropped = affRes(results)
    print(coords,conf,cl)
    cv2.imshow('cropped',img_cropped)
    cv2.waitKey(0)
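If you want to draw the box yourself instead of relying on show=True, here is a minimal sketch with plain OpenCV, reusing the coords, conf and cl values returned above:
import cv2
#Reuses coords, conf and cl from the affRes() snippet above
img = cv2.imread('dog.jpg')
x1,y1,x2,y2 = coords
cv2.rectangle(img,(x1,y1),(x2,y2),(0,255,0),2)
cv2.putText(img,cl+" "+str(conf),(x1,max(y1-5,0)),cv2.FONT_HERSHEY_SIMPLEX,0.6,(0,255,0),2)
cv2.imshow('boxed',img)
cv2.waitKey(0)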
OK, that's the basics of what we do with a YOLO detection model!
What is important to know is that YOLOv8 accepts various sources.
Test it with your video camera:
from ultralytics import YOLO
model = YOLO('yolov8n.pt',task='detect')
Threshold=0.3
#0 for video camera source
model("0",conf=Threshold,show=True,max_det=1)
- - -
Speeding up the inference time!
The first thing we can do to speed up the inference time is to export our model to ONNX.
YOLO has a built-in export function! We are going to export 2 ONNX models, one for an image size of 640 and one for an image size of 416! These values are handled automatically by YOLOv8!
from ultralytics import YOLO
import os
image = 'dog.jpg'
#Export a 640 ONNX model, then rename it so we can keep both exports
model = YOLO('yolov8n.pt')
model(image,imgsz=640)
model.export(format="onnx",imgsz=640,opset=12)
os.rename('yolov8n.onnx','yolov8n640.onnx')
#Same export with an image size of 416
model = YOLO('yolov8n.pt')
model(image,imgsz=416)
model.export(format="onnx",imgsz=416,opset=12)
os.rename('yolov8n.onnx','yolov8n416.onnx')
You now have 2 new models, yolov8n640.onnx and yolov8n416.onnx!
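As a side note, since we talked about ONNX Runtime at the beginning: you can sanity-check an exported model directly with the onnxruntime package (assuming you installed it with pip install onnxruntime); this simply loads it on CPU and prints the fixed input shape it was exported with:
import onnxruntime as ort
#Load the exported model on CPU and inspect its input tensor
session = ort.InferenceSession('yolov8n416.onnx',providers=['CPUExecutionProvider'])
print(session.get_inputs()[0].shape) #[1, 3, 416, 416]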
YOLO will automatically resize the image to the defined size before the inference! With a smaller image we will get a better inference speed, but we can lose some accuracy in the detection!
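If you want to see the accuracy side of this trade-off on your own images, here is a quick sketch that compares the confidences returned at the two sizes (the exact numbers will vary from image to image):
from ultralytics import YOLO
#Run both exported models on the same image and compare the confidences
for name,size in [('yolov8n640.onnx',640),('yolov8n416.onnx',416)]:
    model = YOLO(name,task='detect')
    result = model('dog.jpg',imgsz=size,verbose=False)[0]
    print(name,result.boxes.conf.tolist())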
We can now test the inference time for each one. I always warm up each model first, so the models are loaded and ready to be used!
from ultralytics import YOLO
import time
#We define the models; the .pt model will accept the two inference sizes
#but not the onnx, that's why we exported 2 onnx models
modelPt = YOLO('yolov8n.pt',task='detect')
modelOnnx640 = YOLO('yolov8n640.onnx',task='detect')
modelOnnx416 = YOLO('yolov8n416.onnx',task='detect')
image = 'dog.jpg'
#We warm up the models to load them!
def warmUp():
    print("Warming Up Models!!")
    modelPt(image,verbose=False)
    modelOnnx640(image,imgsz=640,verbose=False)
    modelOnnx416(image,imgsz=416,verbose=False)
warmUp()
#####
#A function that measures the total time of 11 runs and the time of each run!
def runTimeTest(modelname,model,imgsz):
    start = time.perf_counter()
    i=0
    arr_inf = []
    while i <= 10:
        si = time.perf_counter()
        model(image,imgsz=imgsz)
        se = time.perf_counter() - si
        arr_inf.append("{:.2f}".format(se))
        i += 1
    tend = time.perf_counter() - start
    print("Model:",modelname,"Time:",tend)
    print("INFERENCE ARRAY:",arr_inf)
    print("###########")
###We run the function for each model with the 2 sizes
runTimeTest('PT 640',modelPt,640)
runTimeTest('ONNX 640',modelOnnx640,640)
runTimeTest('PT 416',modelPt,416)
runTimeTest('ONNX 416',modelOnnx416,416)
Results on my i3 laptop:
Model: PT 640 Time: 3.5733098000000005
INFERENCE ARRAY: ['0.42', '0.28', '0.32', '0.32', '0.29', '0.37', '0.36', '0.31', '0.34', '0.32', '0.26']
###########
Model: ONNX 640 Time: 2.593342299999996
INFERENCE ARRAY: ['0.23', '0.25', '0.22', '0.22', '0.22', '0.24', '0.24', '0.22', '0.22', '0.25', '0.28']
###########
Model: PT 416 Time: 1.5918490999999975
INFERENCE ARRAY: ['0.23', '0.12', '0.12', '0.12', '0.17', '0.16', '0.17', '0.15', '0.12', '0.13', '0.11']
###########
Model: ONNX 416 Time: 1.5584512999999944
INFERENCE ARRAY: ['0.16', '0.13', '0.15', '0.12', '0.15', '0.15', '0.14', '0.12', '0.13', '0.15', '0.16']
###########
We see that for the 640 image size the .onnx model is better than the .pt, but for the 416 image size there is no difference! And the 416 px image size cuts the inference time by about 50%! That's very good!
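And as promised for the BAT lovers, Ultralytics also ships a yolo console command, so the main steps of this story can be run without any Python file. A minimal sketch of the equivalent commands (same arguments as in the code above):
> yolo detect predict model=yolov8n.pt source=dog.jpg show=True save=True save_crop=True save_txt=True
> yolo export model=yolov8n.pt format=onnx imgsz=416 opset=12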
Thanks for reading