YOLOv8 classifier trained on a custom dataset

#yolo #deeplearning #computervision #train

YOLO (You Only Look Once) is an advanced deep learning model that lets ML developers solve computer vision problems easily and efficiently. YOLOv8 is the latest version, released in January 2023. It includes a number of pretrained models with different numbers of parameters (5 options from nano to xlarge).
YOLOv8 can solve three tasks related to computer vision: object detection, segmentation and classification. Each task has its own scope of application: detection finds objects and their bounding boxes, segmentation outlines the pixels that belong to each object, and classification assigns a label to the image as a whole.

Classification is needed when you want to understand what kind of object is shown in the image. It doesn't matter where in the image the object is located. Often the images are quite similar to each other, such as products on a store shelf or letters on a license plate. With a machine learning model, the object can be recognized quickly and accurately.

YOLOv8 has several model variants, which have been pretrained on known and common datasets. Detection and Segmentation models are pretrained on the COCO dataset, while Classification models are pretrained on the ImageNet dataset.
Unfortunately, these datasets and the models trained on them are not always well suited for a particular application. For example, if you need to track people in a video, the COCO dataset may not be a good fit, because in addition to people it will find chairs, cars, phones and other objects. So in many application tasks there is a need to train models on a custom dataset.

YOLOv8 allows developers to train the model on custom datasets. This can be done either from the command line or from Python code.
CLI:



yolo detect train data=coco128.yaml model=yolov8n.pt epochs=100

Python:



from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(data="coco128.yaml", epochs=5)

The key to accurate predictions is preparing the dataset in the format that the model requires.

Model training consists of 5 stages:

preparing images and assigning them to classes
splitting the data into training, validation, and test sets
preparing the configuration file
selecting the structure of the model
running the model’s training

1. Preparing images and assigning them to classes

A dataset prepared for training an object detection model usually consists of images plus a separate annotation file for each image, which lists the objects in the image and the coordinates of their locations. These annotation files are not needed for the classification task.
For the YOLOv8 classification model, the class of an image is determined by the folder it is placed in. It is sufficient to sort all the images into folders on disk; the folder names define the class names.
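
Before moving on, it can help to check how many images ended up in each class folder, since a badly unbalanced dataset is easier to fix at this stage. Here is a minimal sketch of such a check. The folder name raw_images, the helper name count_images_per_class, and the list of image extensions are my own assumptions for illustration, not something YOLOv8 requires.

```python
from pathlib import Path

# File extensions treated as images (an assumption; extend as needed)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Return {class_name: image_count} for every subfolder of `root`."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files lying next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    # "raw_images" is a hypothetical folder holding one subfolder per class
    root = Path("raw_images")
    if root.is_dir():
        for name, n in count_images_per_class(root).items():
            print(f"{name}: {n} images")
```

Each key in the returned dictionary is a folder name, which is exactly what will become a class name during training.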

2. Splitting data for training, validation, and testing

Traditionally in machine learning model training, a dataset is divided into three parts: the first part is used to train the model, the second part to validate the accuracy of the model, and the third part to objectively test the model on new data that the model has not seen. Usually the dataset is divided into these three parts in the proportion of 70-20-10, but it can be any ratio.

To divide the data for the YOLOv8 model, you need to create special folders within the dataset's directory. The "datasets" folder should reside in the folder where your project's work files are located and where model training is run. Within this "datasets" folder, create a folder with the name of your dataset, and inside it the train, val, and test folders. Each of the train, val, and test folders should contain one folder per class, named after the class and holding that class's images.
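
If your images are already sorted into class folders, the split can be scripted rather than done by hand. Below is a rough sketch using only the standard library. The function name split_dataset, the source folder raw_images, and the fixed random seed are assumptions of mine; the 70-20-10 ratio and the train, val, test folder names come from the description above.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source, target, ratios=(0.7, 0.2, 0.1), seed=0):
    """Copy images from source/<class>/ into target/{train,val,test}/<class>/."""
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    splits = ("train", "val", "test")
    for class_dir in sorted(p for p in Path(source).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        parts = (
            files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:],  # whatever remains goes to the test set
        )
        for split, part in zip(splits, parts):
            out_dir = Path(target) / split / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in part:
                shutil.copy2(f, out_dir / f.name)


if __name__ == "__main__":
    # Both paths are placeholders; adjust them to your project
    if Path("raw_images").is_dir():
        split_dataset("raw_images", "datasets/mydataset")
```

The split is done per class, so each class keeps roughly the same proportions in all three parts. The files are copied rather than moved, which leaves the original folders untouched in case you want to try a different ratio.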

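As a result, the folder structure looks like this (mydataset and class1 to class3 stand in for your own dataset and class names):

datasets/
    mydataset/
        train/
            class1/
            class2/
            class3/
        val/
            class1/
            class2/
            class3/
        test/
            class1/
            class2/
            class3/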

3. Preparing the configuration file

In the datasets directory, you need to prepare a configuration file that tells the model which classes it should recognize. Here is an example of a configuration file with three classes:



train: train/
val: val/
test: test/

# number of classes
nc: 3

# class names
names: ["class1","class2","class3"]

The filename should be the name of your dataset (the same as your dataset's folder name), with the ".yaml" extension. The structure of this configuration file is straightforward, and you can adjust it to fit your project.
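
With many classes, typing the names by hand is error-prone, so the file can also be generated from the folder names. The following is only a sketch of that idea: build_config and class_names_from_folders are hypothetical helpers I made up for this post, and the split keys simply follow the train, val, and test folder names from step 2.

```python
from pathlib import Path


def class_names_from_folders(train_dir):
    """Class names are the subfolder names of the train folder."""
    return sorted(p.name for p in Path(train_dir).iterdir() if p.is_dir())


def build_config(class_names):
    """Return the text of a dataset config in the format described above."""
    quoted = ", ".join(f'"{name}"' for name in class_names)
    lines = [
        "train: train/",
        "val: val/",
        "test: test/",
        "",
        "# number of classes",
        f"nc: {len(class_names)}",
        "",
        "# class names",
        f"names: [{quoted}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "datasets/mydataset" is a placeholder for your own dataset folder
    dataset = Path("datasets/mydataset")
    if (dataset / "train").is_dir():
        names = class_names_from_folders(dataset / "train")
        (dataset.parent / f"{dataset.name}.yaml").write_text(build_config(names))
```

Because the names are read from the train folder, the config cannot drift out of sync with the folders on disk.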

4. Selecting the structure of the model

A neural network model consists of multiple layers with varying numbers of parameters. You can define the structure of the layers manually, or you can use the ready-made structure of one of the pretrained models. For the classification problem, YOLOv8 has 5 ready-made models (yolov8n-cls, yolov8s-cls, yolov8m-cls, yolov8l-cls, and yolov8x-cls), which differ in the number of parameters, accuracy, and speed.

It is advisable to try different models and choose the one that will be optimal for your particular project with respect to speed and accuracy.
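
One way to try several models is to train them one after another with the same settings and compare the results. Here is a rough sketch of that loop from Python, using the same YOLO(...).train(...) call shown earlier. The helper names, the dataset name mydataset, and the small epoch count are placeholders of mine, and the timing only covers training, not inference speed.

```python
import time

# The five classification checkpoints, from nano to xlarge
CLS_WEIGHTS = [f"yolov8{size}-cls.pt" for size in ("n", "s", "m", "l", "x")]


def train_variant(weights, data="mydataset", epochs=10):
    """Train one variant and return the wall-clock training time in seconds."""
    # Imported here so the list above can be inspected without the package
    from ultralytics import YOLO

    model = YOLO(weights)
    start = time.perf_counter()
    model.train(data=data, epochs=epochs)
    return time.perf_counter() - start


def compare_variants(weights_list, **train_kwargs):
    """Train each variant in turn and collect the elapsed times."""
    return {w: train_variant(w, **train_kwargs) for w in weights_list}


if __name__ == "__main__":
    # Start with the two smallest models; the larger ones take much longer
    for name, seconds in compare_variants(CLS_WEIGHTS[:2]).items():
        print(f"{name}: trained in {seconds:.0f} s")
```

The accuracy of each run is printed by the training process itself, so you can put it next to these timings when choosing a model.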

5. Running the model’s training (and waiting for a long time)

After you have placed all the images in folders and prepared the configuration file, the final step is to run the model training. The easiest way to do this is from the command line or terminal, if you are using a server.



yolo task=classify mode=train data=mydataset model=yolov8n-cls.pt epochs=50

As you can see, the data parameter sets the name of your dataset (with its corresponding folder and configuration file), and the model parameter sets the selected model structure (yolov8n-cls.pt in this example). You should train for at least 10 epochs (full passes over the training data); depending on the model and your dataset, 50-100 epochs may be needed.
You will see the whole training process and the results of each epoch. For training, it is desirable to use a computer with a powerful GPU, plenty of memory, and a PyTorch build with CUDA support.
When training finishes, the trained model is saved to runs/classify/trainX/weights/best.pt, where X is the sequential number of the training run.
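
After several experiments it is easy to lose track of which run folder holds the newest weights. As a small convenience, here is a sketch that picks the most recently modified best.pt. The function name find_latest_weights is my own, and it assumes the runs/classify layout described above.

```python
from pathlib import Path


def find_latest_weights(runs_dir="runs/classify"):
    """Return the most recently modified best.pt under runs_dir, or None."""
    candidates = list(Path(runs_dir).glob("*/weights/best.pt"))
    if not candidates:
        return None  # nothing has been trained yet
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(find_latest_weights())
```

The returned path can be passed straight to YOLO(...) when loading the trained model.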

To use the new model in your program, you can use the following Python code:



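from ultralytics import YOLO

# Load the trained weights (adjust the run folder to match your training output)
model = YOLO("runs/classify/train1/weights/best.pt")
file_path = "img.jpg"

# Run inference; the first element is the result for our single image
result = model(file_path)[0]

# Convert the class probabilities to a plain Python list
probs = result.probs.data.tolist()
best = max(probs)
print("Maximum probability:", best)
# Adding 1 turns the zero-based index into a class number that starts at 1
print("Class with maximum probability:", probs.index(best) + 1)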

This code will show the class number that the neural network has detected as the best match for the selected image in the img.jpg file.
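
A class number is not always convenient, so you may prefer to see class names and more than one candidate. Here is a small sketch of that idea in plain Python. The function top_predictions is a hypothetical helper, and the numbers are made up; in a real program the names dictionary could be taken from model.names, which maps class indices to class names.

```python
def top_predictions(probs, names, k=3):
    """Return the k most likely (class_name, probability) pairs, best first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(names[i], p) for i, p in ranked[:k]]


# Example with made-up numbers; `names` mirrors an index-to-name mapping
example_probs = [0.1, 0.7, 0.2]
example_names = {0: "class1", 1: "class2", 2: "class3"}
print(top_predictions(example_probs, example_names, k=2))
# prints [('class2', 0.7), ('class3', 0.2)]
```

Note that the indices here start at 0, unlike the class number printed by the code above.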
Thanks for reading and good luck!
If you have any questions, write in the comments, I will do my best to help you.
Regards, Ilya.