DEV Community

SachaDee

YOLOv8: The basic functions for beginners!

When you start using YOLOv8, you lose a lot of time hunting for the basic code to get the bounding boxes, the confidences and the classes, or to run a YOLO model with ONNX Runtime!

We will work with the pre-trained COCO object detection models. These models have 80 classes:

A class is just a type of object which will be detected, like a “cat” or a “person”.

# Class names:

0: person 1: bicycle 2: car 3: motorcycle 4: airplane 5: bus 
6: train 7: truck 8: boat 9: traffic light 10: fire hydrant 
11: stop sign 12: parking meter 13: bench 14: bird 15: cat 16: dog 17: horse 18: sheep 19: cow 20: elephant 21: bear 22: zebra 
23: giraffe 24: backpack 25: umbrella 26: handbag 27: tie 
28: suitcase 29: frisbee 30: skis 31: snowboard 32: sports ball 33: kite 34: baseball bat 35: baseball glove 36: skateboard 
37: surfboard 38: tennis racket 39: bottle 40: wine glass 41: cup 42: fork 43: knife 44: spoon 45: bowl 46: banana 47: apple 
48: sandwich 49: orange 50: broccoli 51: carrot 52: hot dog 
53: pizza 54: donut 55: cake 56: chair 57: couch 58: potted plant 59: bed 60: dining table 61: toilet 62: tv 63: laptop 64: mouse 65: remote 66: keyboard 67: cell phone 68: microwave 69: oven 
70: toaster 71: sink 72: refrigerator 73: book 74: clock 75: vase 
76: scissors 77: teddy bear 78: hair drier 79: toothbrush

There are several models to detect these classes:

 Name        Size

 yolov8n.pt  Nano
 yolov8s.pt  Small
 yolov8m.pt  Medium
 yolov8l.pt  Large
 yolov8x.pt  Extra large

They all do the same job. In short: the bigger the model, the better the prediction.

All these models are trained with an image size of 640.

You don’t have to resize the image to make a prediction: this is done automatically!
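To get a feeling for what that automatic resize does to the image dimensions, here is a minimal sketch. It assumes the longest side is scaled to the requested size with the aspect ratio preserved, and that each side is then padded up to a multiple of 32. The function name letterbox_shape and the stride of 32 are assumptions for illustration, and the library's real preprocessing may differ in its details.

```python
import math

def letterbox_shape(height, width, imgsz=640, stride=32):
    """Return (height, width) after an aspect-preserving resize to imgsz
    on the longest side, with each side padded up to a multiple of stride."""
    scale = imgsz / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # Pad each side up to the next multiple of the stride (assumed to be 32).
    pad_h = math.ceil(new_h / stride) * stride
    pad_w = math.ceil(new_w / stride) * stride
    return pad_h, pad_w

# A 480x640 photo at the default size, then at a smaller size.
print(letterbox_shape(480, 640, imgsz=640))
print(letterbox_shape(480, 640, imgsz=416))
```

The first call gives back the 480x640 shape that shows up in the console logs further down in this article.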

In this article we are going to use the smallest one, yolov8n.pt.

Let’s start with Python (I recommend ≥ 3.9). The console commands will be detailed at the end for the BAT lovers!

First, install Ultralytics:

python -m pip install ultralytics


Create a folder Yolo8 with a file myYoloTests.py (or whatever name you want) in it, and copy this code into it!

from ultralytics import YOLO

#Loading the nano model

model = YOLO('yolov8n.pt')

The model will be downloaded automatically on the first run!

> python myYoloTests.py

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...

  0%|          | 0.00/6.23M [00:00<?, ?B/s]
  6%|6         | 384k/6.23M [00:00<00:01, 3.60MB/s]
 18%|#7        | 1.09M/6.23M [00:00<00:00, 5.53MB/s]
 26%|##6       | 1.62M/6.23M [00:00<00:00, 5.33MB/s]
 35%|###5      | 2.19M/6.23M [00:00<00:00, 5.36MB/s]
 46%|####5     | 2.86M/6.23M [00:00<00:00, 5.75MB/s]
 55%|#####4    | 3.41M/6.23M [00:00<00:00, 5.34MB/s]
 64%|######4   | 4.00M/6.23M [00:00<00:00, 5.43MB/s]
 73%|#######2  | 4.55M/6.23M [00:00<00:00, 5.37MB/s]
 81%|########1 | 5.06M/6.23M [00:01<00:00, 4.82MB/s]
 89%|########9 | 5.56M/6.23M [00:01<00:00, 4.81MB/s]
 97%|#########6| 6.03M/6.23M [00:01<00:00, 3.90MB/s]
100%|##########| 6.23M/6.23M [00:01<00:00, 4.75MB/s]


You now have the pre-trained model yolov8n.pt in your folder!

Now let’s start with the basic function to detect an object on this image:

Dog.jpg

You can download it as dog.jpg and modify your myYoloTests.py like this:

from ultralytics import YOLO

#Loading the nano model
model = YOLO('yolov8n.pt',task='detect')

#Defining the image to test
image = 'dog.jpg'

#Running an inference on the image
model(image)

Run it:

> python myYoloTests.py

You will get this result:

image 1/1 C:\Users\Quasar\Desktop\yolo8medium\dog.jpg: 480x640 1 dog, 546.9ms
Speed: 15.6ms preprocess, 546.9ms inference, 31.2ms postprocess per image at shape (1, 3, 480, 640)

The output tells you that a dog was detected, and how long the inference took!

That’s a good start! But I want the image displayed with the bounding boxes on it when the script runs!

Modify your myYoloTests.py this way:

from ultralytics import YOLO
import cv2

#Loading the nano model

model = YOLO('yolov8n.pt',task='detect')

image = 'dog.jpg'

model(image,show=True)

cv2.waitKey(0)

And the image with the detected dog inside a bounding box will be displayed!

If you want YOLO to save all of that automatically for you, just add:

from ultralytics import YOLO
import cv2

#Loading the nano model

model = YOLO('yolov8n.pt',task='detect')

image = 'dog.jpg'

#save_txt=True also writes the labels file described below
model(image,show=True,save=True,save_crop=True,save_txt=True)

cv2.waitKey(0)

The console now also tells you where the results were saved:

image 1/1 C:\Users\Quasar\Desktop\yolo8medium\dog.jpg: 480x640 1 dog, 484.4ms
Speed: 15.6ms preprocess, 484.4ms inference, 31.2ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\detect\predict

YOLO has created the folders runs\detect\predict

In the predict folder you will have a folder crops\dog (the detected class). In this dog folder you have the image dog.jpg cropped:

Dog.jpg

In the predict folder you have the original image dog.jpg with the bounding box drawn on it:

dog.jpg
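One practical detail: Ultralytics typically writes each new run to a fresh folder (predict, predict2, and so on), so a hard-coded runs\detect\predict path can point at an older run. The sketch below picks the most recently modified folder instead. The helper name latest_run_dir is hypothetical, and the folder and file names are the ones from this example.

```python
from pathlib import Path

def latest_run_dir(root="runs/detect", prefix="predict"):
    """Return the most recently modified 'prefix*' folder under root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    candidates = [p for p in root.iterdir()
                  if p.is_dir() and p.name.startswith(prefix)]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)

run_dir = latest_run_dir()
if run_dir is None:
    print("No prediction folder found yet")
else:
    # Layout described in this article for the dog.jpg example.
    print("Annotated image:", run_dir / "dog.jpg")
    print("Crops:", run_dir / "crops" / "dog")
    print("Labels:", run_dir / "labels" / "dog.txt")
```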

And in predict\labels you have the file dog.txt containing:

16 0.493492 0.420779 0.828031 0.76239

16 is the class number for a dog (you can check it at the beginning of the article) and the last four floats are the bounding box in YOLO format: x center, y center, width and height, each normalized by the size of the input image!

How do you transform this YOLO format into something more readable? This is here just as information for anyone who might need it, but you will see later that YOLO gives you this information directly, so you will normally never need this function!

import math
import cv2

def yolo2xyxy(im):
 #Image size in pixels: shape is (height, width, channels)
 img = cv2.imread(im)
 y,x = img.shape[:2]
 #The label line is: class x_center y_center width height (all normalized)
 with open('.\\runs\\detect\\predict\\labels\\dog.txt','r') as yoloFile:
  yoloArray = [float(z) for z in yoloFile.read().split()]
 #Top-left and bottom-right corners in pixels
 x1 = math.ceil((yoloArray[1]-yoloArray[3]/2)*x)
 x2 = math.ceil((yoloArray[1]+yoloArray[3]/2)*x)
 y1 = math.ceil((yoloArray[2]-yoloArray[4]/2)*y)
 y2 = math.ceil((yoloArray[2]+yoloArray[4]/2)*y)
 return x1,y1,x2,y2

x1,y1,x2,y2 = yolo2xyxy('dog.jpg')
print("x1:",x1,"y1:",y1,"x2:",x2,"y2:",y2)

Which returns the top-left and bottom-right corners of the box:

x1: 51 y1: 20 x2: 581 y2: 385
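If you only have the label line and already know the image size, the same conversion works without OpenCV. This is a minimal sketch: the helper name parse_yolo_label is hypothetical, the 640x480 size is the one of dog.jpg in this example, and it rounds to the nearest pixel instead of rounding up, so a coordinate can differ by one pixel from the function above.

```python
def parse_yolo_label(line, img_w, img_h):
    """Convert one 'class cx cy w h' label line (normalized) to pixel values."""
    class_id, cx, cy, w, h = line.split()
    # Scale the normalized values back to pixels.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # Top-left corner from the box center.
    x1, y1 = cx - w / 2, cy - h / 2
    return {
        "class_id": int(class_id),
        "xyxy": (round(x1), round(y1), round(x1 + w), round(y1 + h)),
        "width": round(w),
        "height": round(h),
    }

box = parse_yolo_label("16 0.493492 0.420779 0.828031 0.76239", img_w=640, img_h=480)
print(box)
```

Here the real width and height of the box come out as separate values, next to the two corners.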

. . .

OK, now we want full control: crop the image, get the class and the confidence, and optimize the processing speed!

YOLO gives you all the information about a prediction in a variable:

from ultralytics import YOLO
import cv2

#Loading the nano model

model = YOLO('yolov8n.pt',task='detect')

image = 'dog.jpg'

results = model(image)

print(results[0])

You will get the complete structure of your inference:

ultralytics.engine.results.Results object with attributes:

boxes: ultralytics.engine.results.Boxes object
keypoints: None
keys: ['boxes']
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
orig_img: array([[[196, 179, 182],
        [196, 182, 184],
        [197, 184, 186],
        ...,
        [255, 255, 251],
        [255, 255, 251],
        [255, 255, 251]],

       [[196, 179, 182],
        [196, 182, 184],
        [197, 184, 186],
        ...,
        [255, 255, 251],
        [255, 255, 252],
        [255, 255, 251]],

       [[194, 180, 181],
        [195, 183, 183],
        [195, 185, 185],
        ...,
        [255, 255, 252],
        [255, 255, 254],
        [255, 255, 252]],

       ...,

       [[188, 191, 196],
        [188, 191, 196],
        [187, 190, 195],
        ...,
        [125, 138, 136],
        [125, 138, 136],
        [125, 138, 136]],

       [[188, 191, 196],
        [188, 191, 196],
        [187, 190, 195],
        ...,
        [124, 138, 137],
        [124, 138, 137],
        [125, 139, 138]],

       [[188, 191, 196],
        [188, 191, 196],
        [187, 190, 195],
        ...,
        [123, 139, 138],
        [125, 138, 140],
        [125, 138, 140]]], dtype=uint8)
orig_shape: (480, 640)
path: 'C:\\Users\\Quasar\\Desktop\\yolo8medium\\dog.jpg'
probs: None
save_dir: None
speed: {'preprocess': 15.631437301635742, 'inference': 609.3780994415283, 'postprocess': 31.238794326782227}
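The names dictionary maps a class id to its label. If you would rather start from a label, it can be inverted with a plain dictionary comprehension. The sketch below uses a handful of entries copied from the output above. What you do with the ids afterwards is up to you: the Ultralytics predict documentation lists a classes argument for filtering by id, but that call is not shown here.

```python
# A few entries copied from the names dictionary printed above.
names = {0: 'person', 15: 'cat', 16: 'dog', 17: 'horse'}

# Invert the mapping: label -> class id.
name_to_id = {name: class_id for class_id, name in names.items()}

# Ids for the classes we care about.
wanted = [name_to_id[n] for n in ('cat', 'dog')]
print(wanted)
```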

If you want the bounding box:

from ultralytics import YOLO
import cv2

model = YOLO('yolov8n.pt',task='detect')

image = 'dog.jpg'

results = model(image)[0]
box = results.boxes

print(box)

And the output:

ultralytics.engine.results.Boxes object with attributes:

boxes: tensor([[ 50.8651,  19.0002, 580.8049, 384.9475,   0.5911,  16.0000]])
cls: tensor([16.])
conf: tensor([0.5911])
data: tensor([[ 50.8651,  19.0002, 580.8049, 384.9475,   0.5911,  16.0000]])
id: None
is_track: False
orig_shape: (480, 640)
shape: torch.Size([1, 6])
xywh: tensor([[315.8350, 201.9738, 529.9398, 365.9473]])
xywhn: tensor([[0.4935, 0.4208, 0.8280, 0.7624]])
xyxy: tensor([[ 50.8651,  19.0002, 580.8049, 384.9475]])
xyxyn: tensor([[0.0795, 0.0396, 0.9075, 0.8020]])

You can see that every bounding box format is available as a tensor!
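These formats are all the same box written in different ways. As a quick check, the sketch below recomputes xywh, xywhn and xyxyn from the xyxy values and the orig_shape printed above, using plain Python. The helper names are hypothetical.

```python
def xyxy_to_xywh(box):
    """Corners (x1, y1, x2, y2) -> center x, center y, width, height."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def normalize(box, img_w, img_h):
    """Divide x-like values by the image width and y-like values by the height."""
    a, b, c, d = box
    return (a / img_w, b / img_h, c / img_w, d / img_h)

xyxy = (50.8651, 19.0002, 580.8049, 384.9475)
img_h, img_w = 480, 640  # orig_shape is (height, width)

xywh = xyxy_to_xywh(xyxy)
xywhn = normalize(xywh, img_w, img_h)
xyxyn = normalize(xyxy, img_w, img_h)

print("xywh :", [round(v, 4) for v in xywh])
print("xywhn:", [round(v, 4) for v in xywhn])
print("xyxyn:", [round(v, 4) for v in xyxyn])
```

The "n" variants are simply the pixel values divided by the image width (for x and width) or height (for y and height).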

I like to work with the xyxy format, so from the results list we can get everything, from the original image to the bounding boxes, classes and confidences!

So I wrote a function that returns the coords, the class, the confidence and the image already cropped. I only want 1 detection in this example, that's why I put max_det=1 in the model call!

from ultralytics import YOLO
import cv2

model = YOLO('yolov8n.pt',task='detect')

image = 'dog.jpg'
Threshold=0.3

#max_det=1 keeps only one detection
results = model(image,conf=Threshold,max_det=1)


def affRes(results):
   #Returns the xyxy coords, the confidence, the class name and a 128x128 crop of the first box
   result = results[0]
   res = result.boxes[0]
   cords = [round(x) for x in res.xyxy[0].tolist()]
   #cls is a float tensor, names is keyed by int
   class_name = result.names[int(res.cls[0].item())]
   conf = round(res.conf[0].item(), 2)
   #orig_img is a NumPy array indexed as [y1:y2, x1:x2]
   crop = result.orig_img[cords[1]:cords[3],cords[0]:cords[2]]
   img_cropped = cv2.resize(crop, (128, 128), interpolation=cv2.INTER_AREA)
   return cords,conf,class_name,img_cropped

#Stop here if nothing was detected
if len(results[0].boxes)==0:
 print("Not detected!!")
 exit()

coords,conf,cl,img_cropped = affRes(results)
print(coords,conf,cl)
cv2.imshow('cropped',img_cropped)
cv2.waitKey(0)
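To make the effect of conf and max_det more concrete, here is a small sketch that applies the same two ideas to plain lists laid out like the data tensor ([x1, y1, x2, y2, confidence, class id]). It only illustrates the filtering logic and is not how the library implements it internally. The function name select_detections is hypothetical, and the second and third rows are made-up values.

```python
def select_detections(rows, conf=0.3, max_det=1):
    """Keep rows whose confidence is at least conf, best first, at most max_det."""
    kept = [r for r in rows if r[4] >= conf]
    kept.sort(key=lambda r: r[4], reverse=True)
    return kept[:max_det]

rows = [
    [50.8651, 19.0002, 580.8049, 384.9475, 0.5911, 16.0],  # the dog from the output above
    [10.0, 10.0, 100.0, 100.0, 0.12, 15.0],                 # made-up low-confidence box
    [200.0, 50.0, 400.0, 300.0, 0.45, 16.0],                # made-up second candidate
]

best = select_detections(rows, conf=0.3, max_det=1)
print(best)
```

With conf=0.3 the 0.12 row is dropped, and max_det=1 then keeps only the highest-confidence row that is left.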

OK, those are the basics of what we do with a YOLO detection model.

What is important to know is that YOLOv8 accepts various sources:

Predict - Ultralytics YOLOv8 Docs (docs.ultralytics.com): how to use YOLOv8 predict mode for various tasks, including the different inference sources such as images, videos, and data formats.

Test it with your video camera:

from ultralytics import YOLO

model = YOLO('yolov8n.pt',task='detect')

Threshold=0.3

#0 for video camera source

model("0",conf=Threshold,show=True,max_det=1)

cam test.gif

. . .

Speeding up the inference time!

The first thing we can do to speed up the inference time is to export our model to ONNX.

YOLO has a built-in export function! We are going to export 2 ONNX models: one for an image size of 640 and one for an image size of 416. YOLOv8 handles both sizes automatically!

from ultralytics import YOLO
import os

image = 'dog.jpg'

model = YOLO('yolov8n.pt')
model(image,imgsz=640)
model.export(format="onnx",imgsz=640,opset=12)
os.rename('yolov8n.onnx','yolov8n640.onnx')

model = YOLO('yolov8n.pt')
model(image,imgsz=416)
model.export(format="onnx",imgsz=416,opset=12)
os.rename('yolov8n.onnx','yolov8n416.onnx')

You now have 2 new models: yolov8n640.onnx and yolov8n416.onnx!

YOLO will automatically resize the image to the defined size before the inference! With a smaller image we get a better inference speed, but we can lose some accuracy in the detection!
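As a rough back-of-the-envelope estimate of how much work a smaller size saves, you can compare the number of input pixels. This sketch assumes square inputs and that the cost grows roughly with the pixel count, which is only an approximation. The function name pixel_ratio is hypothetical.

```python
def pixel_ratio(small, large):
    """Fraction of pixels left when going from a large x large to a small x small input."""
    return (small * small) / (large * large)

ratio = pixel_ratio(416, 640)
print(f"A 416 px input has {ratio:.1%} of the pixels of a 640 px input")
```

So a 416 px input carries well under half of the pixels of a 640 px one.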

We can now test the inference time for each one. I always warm up each model first, so the models are loaded and ready to be used!

from ultralytics import YOLO
import time

#We define the models. The .pt model accepts both inference sizes,
#but an .onnx model does not, that's why we exported 2 ONNX models

modelPt = YOLO('yolov8n.pt',task='detect')
modelOnnx640  = YOLO('yolov8n640.onnx',task='detect')
modelOnnx416  = YOLO('yolov8n416.onnx',task='detect')

image = 'dog.jpg'

#We warm up the models to load them!

def warmUp():
 print("Warming Up Models!!")
 modelPt(image,verbose=False)
 modelOnnx640(image,imgsz=640,verbose=False)
 modelOnnx416(image,imgsz=416,verbose=False)

warmUp()

#####


#A function that times 11 consecutive runs and records each one!

def runTimeTest(modelname,model,imgsz):
 start = time.perf_counter()
 arr_inf = []
 for _ in range(11):
  si = time.perf_counter()
  model(image,imgsz=imgsz)
  se = time.perf_counter() - si
  arr_inf.append("{:.2f}".format(se))
 tend = time.perf_counter() - start
 print("Model:",modelname,"Time:",tend)
 print("INFERENCE ARRAY:",arr_inf)
 print("###########")

###We run the function for each model with the 2 sizes

runTimeTest('PT 640',modelPt,640)
runTimeTest('ONNX 640',modelOnnx640,640)
runTimeTest('PT 416',modelPt,416)
runTimeTest('ONNX 416',modelOnnx416,416)

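The same warm-up-then-measure pattern can be pulled into a small helper that works with any callable. This is a minimal sketch under the assumption that one warm-up call is enough. The helper name time_calls is hypothetical, and the workload below is a stand-in so the snippet runs without a model.

```python
import time
import statistics

def time_calls(fn, runs=10, warmup=1):
    """Call fn warmup times untimed, then return the duration of each timed run."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in workload so the sketch runs without loading a model.
timings = time_calls(lambda: sum(range(10_000)), runs=5)
print(f"mean {statistics.mean(timings):.6f}s, median {statistics.median(timings):.6f}s")
```

With the models defined above you would pass something like lambda: modelOnnx640(image, imgsz=640, verbose=False) as fn.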

Results on my i3 laptop:

Model: PT 640 Time: 3.5733098000000005
INFERENCE ARRAY: ['0.42', '0.28', '0.32', '0.32', '0.29', '0.37', '0.36', '0.31', '0.34', '0.32', '0.26']
###########
Model: ONNX 640 Time: 2.593342299999996
INFERENCE ARRAY: ['0.23', '0.25', '0.22', '0.22', '0.22', '0.24', '0.24', '0.22', '0.22', '0.25', '0.28']
###########
Model: PT 416 Time: 1.5918490999999975
INFERENCE ARRAY: ['0.23', '0.12', '0.12', '0.12', '0.17', '0.16', '0.17', '0.15', '0.12', '0.13', '0.11']
###########
Model: ONNX 416 Time: 1.5584512999999944
INFERENCE ARRAY: ['0.16', '0.13', '0.15', '0.12', '0.15', '0.15', '0.14', '0.12', '0.13', '0.15', '0.16']
###########

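We see that for the 640 image size the .onnx model is faster than the .pt model, while for the 416 image size the two rows show no difference! Treat that last comparison with caution: the 'PT 416' and 'ONNX 416' timings are nearly identical, which is also what you would see if both runs used the same model, so it is worth re-running on your machine. Either way, the 416 px image size cuts the inference time by roughly 50%! That’s very good!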
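To put numbers on that, the sketch below takes the per-run timings exactly as printed in the results above and computes the mean and median per model, plus two ratios. Nothing here is measured again, it is only arithmetic on the published values.

```python
from statistics import mean, median

# Per-run timings in seconds, copied from the results above.
results = {
    'PT 640':   [0.42, 0.28, 0.32, 0.32, 0.29, 0.37, 0.36, 0.31, 0.34, 0.32, 0.26],
    'ONNX 640': [0.23, 0.25, 0.22, 0.22, 0.22, 0.24, 0.24, 0.22, 0.22, 0.25, 0.28],
    'PT 416':   [0.23, 0.12, 0.12, 0.12, 0.17, 0.16, 0.17, 0.15, 0.12, 0.13, 0.11],
    'ONNX 416': [0.16, 0.13, 0.15, 0.12, 0.15, 0.15, 0.14, 0.12, 0.13, 0.15, 0.16],
}

summary = {name: (mean(values), median(values)) for name, values in results.items()}
for name, (avg, med) in summary.items():
    print(f"{name:9s} mean {avg:.3f}s  median {med:.2f}s")

# How much faster ONNX is at 640, and how much time 416 saves for the .pt model.
speedup_640 = summary['PT 640'][0] / summary['ONNX 640'][0]
saving_416 = 1 - summary['PT 416'][0] / summary['PT 640'][0]
print(f"ONNX vs PT at 640: x{speedup_640:.2f}")
print(f"PT 416 vs PT 640: {saving_416:.0%} less time per run")
```

The median is printed next to the mean because the first timed run of a series can still be a bit slower than the rest.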

Thanks for reading
