How to prepare a custom image dataset, split it into train & test sets, and build a CNN model using Keras?

Imagine you have two classes of images, Class_A & Class_B.

Now you need a custom dataset with a train set and a test set for training and validating your image data.

We are going to use Keras for our dataset generation.

(Image: Keras logo, keras.io)

Steps in creating the directory for images:

  1. Create a folder named data.
  2. Create folders train and validation as subfolders inside the data folder.
  3. Create folders class_A and class_B as subfolders inside both the train and validation folders.
  4. Place 80% of the class_A images in the data/train/class_A folder path.
  5. Place 20% of the class_A images in the data/validation/class_A folder path.
  6. Place 80% of the class_B images in the data/train/class_B folder path.
  7. Place 20% of the class_B images in the data/validation/class_B folder path.

(If you prefer to script this split instead of copying files by hand, see the sketch just after the directory layout below.)

Directory structure.


data/
    train/
        class_A/
            class_A001.jpg
            class_A002.jpg
            .
            .
            .
        class_B/
            class_B001.jpg
            class_B002.jpg
             .
             .
             .
    validation/
        class_A/
            class_A001.jpg
            class_A002.jpg
             .
             .
             .
        class_B/
            class_B001.jpg
            class_B002.jpg
             .
             .
             .
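
If you would rather not copy the files by hand, the 80/20 split can be scripted. Below is a minimal sketch, assuming your original, unsplit images sit in hypothetical folders raw_images/class_A and raw_images/class_B; adjust the source paths, extensions and ratio to your own setup.

import os
import random
import shutil

def split_class(source_dir, class_name, train_ratio=0.8, seed=42):
    # collect the image files of one class and shuffle them reproducibly
    images = [f for f in os.listdir(source_dir)
              if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
    random.seed(seed)
    random.shuffle(images)

    # first ~80% go to train, the rest to validation
    split_index = int(len(images) * train_ratio)
    targets = {
        os.path.join('data', 'train', class_name): images[:split_index],
        os.path.join('data', 'validation', class_name): images[split_index:],
    }
    for target_dir, files in targets.items():
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            shutil.copy(os.path.join(source_dir, name),
                        os.path.join(target_dir, name))

split_class('raw_images/class_A', 'class_A')
split_class('raw_images/class_B', 'class_B')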

Steps to do in code:

1. Imports

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
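
Note: these imports assume the standalone keras package. If you are working with TensorFlow 2.x instead, the same classes are available under the tensorflow.keras namespace, so the equivalent imports would be:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense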

2. Initialize the variables as follows

# image dimensions, set as per your preference.
img_width, img_height = 150, 150

# input shape expected by the model (channels-last, RGB images)
input_shape = (img_width, img_height, 3)

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'

# set the following parameters as per your preference
batch_size = 10
nb_train_samples = 800
nb_validation_samples = 200
epochs = 40

3. Augmentation configuration for the train set

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
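
To see what these augmentations actually produce, you can run one image through the generator and save the results to disk. A quick sketch, using the class_A001.jpg file from the directory layout above and a preview folder created on the fly:

import os
from keras.preprocessing.image import load_img, img_to_array

# load one training image and add a batch dimension: (1, height, width, channels)
img = load_img('data/train/class_A/class_A001.jpg',
               target_size=(img_width, img_height))
x = img_to_array(img)
x = x.reshape((1,) + x.shape)

# write a handful of augmented variations into the preview folder
os.makedirs('preview', exist_ok=True)
i = 0
for batch in train_datagen.flow(x, batch_size=1,
                                save_to_dir='preview',
                                save_prefix='class_A',
                                save_format='jpeg'):
    i += 1
    if i >= 5:
        break  # the generator loops forever, so stop it manually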

4. Augmentation configuration for the validation (test) set: only rescaling

# rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)

5. Now use the flow_from_directory() method of the ImageDataGenerator class to generate batches of image data and their labels directly from the image files in the directories.

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
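
With class_mode='binary', each class folder is mapped to an integer label (0 or 1). If you want to confirm which folder got which label, the generator exposes the mapping through its class_indices attribute:

# maps folder names to the integer labels used by the generator,
# e.g. {'class_A': 0, 'class_B': 1}
print(train_generator.class_indices)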

6. Build the image classifier model: a sequential CNN architecture with relu as the activation function for the hidden layers and sigmoid as the activation function for the output neuron.

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
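
Before compiling, it can be handy to check the layer output shapes and parameter counts with model.summary():

# prints a layer-by-layer overview of the architecture
model.summary()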

7. Compile the model as follows

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

8. Now use the fit() method to train the model on the train set and validate it on the validation set, with steps_per_epoch and validation_steps calculated by floor division: steps_per_epoch = nb_train_samples // batch_size and validation_steps = nb_validation_samples // batch_size.

model.fit(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)
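
Once training finishes, you will usually want to keep the trained model around for inference. A minimal sketch, using a hypothetical file name of your choice:

from keras.models import load_model

# save the architecture together with the trained weights
model.save('class_A_vs_class_B.h5')

# later, reload the trained model without rebuilding it by hand
model = load_model('class_A_vs_class_B.h5')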

Reference:

Keras Image data preprocessing

Personal Blog @ danyson.github.io
