DEV Community

Michael Learns
Michael Learns

Posted on

Monkey Recognition with Tensorflow Keras

It’s amazing how easy machine learning has become. It’s no longer a hinderance if you don’t have any mathematical background knowledge or any academia background on machine learning. I certainly belong to the category.

One of my goals in using machine learning is when recognising something. I wanted to build a model that could do facial recognition, identify animal species, name of things, etc. Learning the math to this was horrendous and quite difficult for someone with little knowledge on math. I almost gave up on this seemingly unattainable goal until I discovered “pre-trained models”.

Pre-trained models are basically (based on my understanding), models that have been trained on different datasets to solve a similar problem. There are different kinds of pre-trained models, each one for their own specific purposes for different problems.

Today, I’ll be showing you how I built a monkey recognition model using the pre-trained model ResNet50. ResNet50 is great for specifying species (based on what I’ve seen). I’ve seen this being used for dogs as well. Sooo let’s go with monkeys 🙈


I'm doing this project in a jupyter notebook. You can check my setup post if you haven't yet setup your machine for machine learning. I'm using the 10-monkey-species from Kaggle.

We must also have a JSON file with all the indexes of the species. You can download it in this Github gist. It's gonna be important for later.

Lastly, download the pre-trained models of ResNet50. Here's a link from Kaggle.

Structuring our Data Directory

My directory folder looks like this:

Directory Structure

We have images for training and validation. *remember data splitting
Inside the training/validation folder we have the directories for each species and their images.

Transforming the Images

First hurdle is taking the images and changing it into something that our machine learning model can understand.

from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
from tensorflow.python.keras.applications.resnet50 import preprocess_input

image_size = 224
data_generator_with_aug = ImageDataGenerator(preprocessing_function=preprocess_input, 
train_generator = data_generator_with_aug.flow_from_directory(train_path, 
                                                     target_size=(image_size, image_size), 

data_generator_with_no_aug = ImageDataGenerator(preprocessing_function=preprocess_input)

validation_generator = data_generator_with_no_aug.flow_from_directory(validation_path,
                                                          target_size=(image_size, image_size),
Enter fullscreen mode Exit fullscreen mode

Let’s break this down. ImageDataGenerator is a function that takes the image and preprocess the image and spits out data that the model can understand. The preprocessing_function argument takes in another function, in this case the preprocess_input from ResNet50 which translates the image that RestNet50 can understand. The other arguments allows for data augmentation.

Data augmentation basically allows you to augment your data. Meaning that it allows you to increase your data samples without actually increasing the number of data you have. In this example flipping the image or slightly changing the width and height of the image allows us to have more samples of the image without actually using the same exact image.

flow_from_directory takes in all the images you’ve placed in the directory.

Building Our Model

Now, let’s build our model!!

from tensorflow.python.keras.applications import ResNet50
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, Flatten, Conv2D, Dropout

resnet_weights_path = 'resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg', weights=resnet_weights_path))

num_classes = 10
model.add(Dense(num_classes, activation='softmax'))

# Say not to train first layer (ResNet) model. It is already trained
model.layers[0].trainable = False

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])


Enter fullscreen mode Exit fullscreen mode

Break down:

  • So we used a Sequential convolutional network. Then we added a ResNet50 layer with the pre-trained weights.
  • Then we added a Dense layer with the number of classes we have which is 10. Number of classes means the number of species (in this case) that we're trying to identify.
  • We also said that for the first layer of our model we don’t want to re-train it.
  • Then we compiled everything that we did with model.compile() and finally fitted everything in the model.fit_generator()

Remember the generators we’ve made when we transformed our images? We placed them all in here.

  • Epochs is basically the number of times the model will cycle through the images.
  • steps_per_epoch is the number of images we would take in for each epoch.

Testing Our Model

import numpy as np
from os.path import join
from tensorflow.python.keras.preprocessing.image import load_img, img_to_array

image_dir = '10-monkey-species/validation'
img_paths = [join(image_dir, filename) for filename in 

image_size = 224
def read_prep_images(img_paths, img_height=image_size, img_width=image_size):
    imgs = [load_img(img_path, target_size=(img_height, img_width)) for img_path in img_paths]
    img_array = np.array([img_to_array(img) for img in imgs])
    output = preprocess_input(img_array)

    return output

test_data = read_prep_images(img_paths)
Enter fullscreen mode Exit fullscreen mode

Here we take in several test images from our validation dataset and changing them into a np.array using the read_prep function.

Then we predict them with our model.

preds = model.predict(test_data)
Enter fullscreen mode Exit fullscreen mode

Translating Our Predictions

We need to translate our predictions into indexes so that:

  1. we can use that index on our JSON file
  2. Return the appropriate data for that index

I copied this from Kaggle with a few slight adjustments.

import json
from IPython.display import Image, display

def decode_predictions(preds, top=5, class_list_path=None):
  if len(preds.shape) != 2 or preds.shape[1] != 10:
    raise ValueError('`decode_predictions` expects '
                     'a batch of predictions '
                     '(i.e. a 2D array of shape (samples, 1000)). '
                     'Found array with shape: ' + str(preds.shape))
  index_list = json.load(open(class_list_path))
  results = []
  for pred in preds:
    top_indices = pred.argsort()[-top:][::-1]
    result = [tuple(index_list[str(i)]) + (pred[i],) for i in top_indices]
    result.sort(key=lambda x: x[2], reverse=True)
  return results

most_likely_labels = decode_predictions(preds, top=5, class_list_path='10-monkey-species/class_monkey_species.json')

for i, img_path in enumerate(img_paths):
Enter fullscreen mode Exit fullscreen mode

So the decode_predictions process takes in the preds which is a 2D array and indexes the corresponding object from the json file. So for example if preds was [0, 1, 8] then it would index that from the json and return [[‘index_0’, ‘folder_name’], ['index_1', 'folder_name'], ['index_8', 'folder_name']].

The for loop basically takes in the images and displays them with the decoded predictions.

And this is the result:


Machine Learning has become incredibly simple. It’s flabbergasting. Certain terms here like ‘softmax’ and ‘relu’ are all mathematical terms which you don’t really need to dive into their mechanics but would be useful if you had a general idea of what they do and what they’re for. In my journey so far in machine learning, you only need a general understanding and concept of it to really begin.

Hope you learned something from this!
Would love to hear your feedback 😄
See more from me on twitter and instagram @heyimprax.

Thanks for reading! You've made it 🎉 Have a coffee break ☕️

Top comments (4)

evelin profile image

Hi. Thank you for the helpful tutorial. I'm trying to reproduce the monkey classification but at the line "model.add(ResNet50(include_top=False, pooling='avg', weights=resnet_weights_path))" I get an error message "Shapes (1, 1, 256, 512) and (512, 128, 1, 1) are incompatible". Do you have any idea what could be the problem? I didn't make any changes in the code, except when I imported the preprocess_input: "from tensorflow.keras.applications.resnet import preprocess_input". I use tensorflow 2.0.

imronlearning profile image
Michael Learns

Hi e-velin! Thanks for reading my post! It's been awhile since I did this post, but if I recall correctly I was using an earlier version of tensorflow. It was probably around v1. Try downgrading your tensorflow to an earlier version.

adrianodennanni_2 profile image
Adriano Dennanni • Edited

Great tutorial! This proves how easy it is to start using Machine Learning in your company.

imronlearning profile image
Michael Learns

Thanks! 😄