DEV Community

Cover image for Classify any Object using pre-trained CNN Model
Sparsh Gupta
Sparsh Gupta

Posted on • Originally published at towardsdatascience.com

Classify any Object using pre-trained CNN Model

Large Scale Image Classification using pre-trained Inception v3 Convolution Neural Network Model

Photo by [Lenin Estrada](https://unsplash.com/@lenin33?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)

Today we have the super-effective technique as Transfer Learning where we can use a pre-trained model by Google AI to classify any image of classified visual objects in the world of computer vision.

Transfer learning is a machine learning method which utilizes a pre-trained neural network. Here, the image recognition model called Inception-v3 consists of two parts:

  • Feature extraction part with a convolutional neural network.

  • Classification part with fully-connected and softmax layers.

Inception-v3 is a pre-trained convolutional neural network model that is 48 layers deep.

It is a version of the network already trained on more than a million images from the ImageNet database. It is the third edition of Inception CNN model by Google, originally instigated during the ImageNet Recognition Challenge.

This pre-trained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 299-by-299. The model extracts general features from input images in the first part and classifies them based on those features in the second part.

Schematic diagram of Inception v3 — By Google AISchematic diagram of Inception v3 — By Google AI

Inception v3 is a widely-used image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset and around 93.9% accuracy in top 5 results. The model is the culmination of many ideas introduced by multiple researchers over the past years. It is based on the original paper: “Rethinking the Inception Architecture for Computer Vision” by Szegedy, et. al.

More information about the Inception architecture can be found here.

In Transfer Learning, when you build a new model to classify your original dataset, you reuse the feature extraction part and re-train the classification part with your dataset. Since you don’t have to train the feature extraction part (which is the most complex part of the model), you can train the model with less computational resources and training time.

In this article, we will just use the Inception v3 model to predict some images and fetch the top 5 predicted classes for the same. Let’s begin.

We are using Tensorflow v2.x

Import Data

import os
import numpy as np
from PIL import Image
from imageio import imread
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import tf_slim as slim
from tf_slim.nets import inception
import tf_slim as slim
import cv2
import matplotlib.pyplot as plt

Data Loading

Setup all initial variables with default file locations and respective values.

ckpt_path = "/kaggle/input/inception_v3.ckpt"
images_path = "/kaggle/input/animals/*"
img_width = 299
img_height = 299
batch_size = 16
batch_shape = [batch_size, img_height, img_width, 3]
num_classes = 1001
predict_output = []
class_names_path = "/kaggle/input/imagenet_class_names.txt"
with open(class_names_path) as f:
    class_names = f.readlines()

Create Inception v3 model

X = tf.placeholder(tf.float32, shape=batch_shape)

with slim.arg_scope(inception.inception_v3_arg_scope()):
    logits, end_points = inception.inception_v3(
        X, num_classes=num_classes, is_training=False
    )

predictions = end_points["Predictions"]
saver = tf.train.Saver(slim.get_model_variables())

Define function for loading images and resizing for sending to model for evaluation in RGB mode.

def load_images(input_dir):
    global batch_shape
    images = np.zeros(batch_shape)
    filenames = []
    idx = 0
    batch_size = batch_shape[0]
    files = tf.gfile.Glob(input_dir)[:20]
    files.sort()
    for filepath in files:
        with tf.gfile.Open(filepath, "rb") as f:
            imgRaw = np.array(Image.fromarray(imread(f, as_gray=False, pilmode="RGB")).resize((299, 299))).astype(np.float) / 255.0
        images[idx, :, :, :] = imgRaw * 2.0 - 1.0
        filenames.append(os.path.basename(filepath))
        idx += 1
        if idx == batch_size:
            yield filenames, images
            filenames = []
            images = np.zeros(batch_shape)
            idx = 0
    if idx > 0:
        yield filenames, images

Load Pre-Trained Model

session_creator = tf.train.ChiefSessionCreator(
        scaffold=tf.train.Scaffold(saver=saver),
        checkpoint_filename_with_path=ckpt_path,
        master='')

Classify Images using Model

with tf.train.MonitoredSession(session_creator=session_creator) as sess:
    for filenames, images in load_images(images_path):
        labels = sess.run(predictions, feed_dict={X: images})
        for filename, label, image in zip(filenames, labels, images):
            predict_output.append([filename, label, image])

Predictions

We will use some images from the Animals-10 dataset from kaggle to declare the model predictions.

for x in predict_output:
    out_list = list(x[1])
    topPredict = sorted(range(len(out_list)), key=lambda i: out_list[i], reverse=True)[:5]
    plt.imshow((((x[2]+1)/2)*255).astype(int))
    plt.show()
    print("Filename:",x[0])
    print("Displaying the top 5 Predictions for above image:")
    for p in topPredict:
        print(class_names[p-1].strip())

At length, all of the classes are classified spot on, and we can also see that the top 5 similar classes as predicted by the model are pretty good and precise.

Top comments (0)