How to use the TensorFlow Object Detection API (inference, with Colab)

#python #tensorflow #machinelearning #computervision

This article shows how to run inference with the TensorFlow Object Detection API. Everything runs in Colab.

Colab sample: run the cells in order to try the TensorFlow Object Detection API. To detect objects in your own image, change image_path in the last cell.

The official TensorFlow Model Zoo offers many pretrained models to choose from.

(For training a model: Train an object detection model with the TensorFlow Object Detection API)
(For quick training with just a few images: Quick-train an object detection model with the TensorFlow Object Detection API)

Steps

0. Install TensorFlow 2

!pip install -U --pre tensorflow=="2.2.0"

1. Clone the official TensorFlow Models from GitHub

import os
import pathlib

# If we are somewhere inside a "models" directory, move up until we are out of it.
# Otherwise, clone the repo unless a "models" folder already exists here.
if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('models').exists():
  !git clone --depth 1 https://github.com/tensorflow/models
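The cell above relies on Colab's `!` shell syntax, so it does not run as plain Python. As a rough sketch of the directory logic only, the hypothetical helper `dir_above_models` below (not part of the original notebook) computes where the loop would end up: the nearest ancestor directory with no "models" component in its path.

```python
import pathlib


def dir_above_models(path):
    """Return the nearest ancestor of `path` with no "models" component.

    Mirrors the `while "models" in cwd.parts: chdir('..')` loop without
    changing the real working directory. Hypothetical helper for illustration.
    """
    path = pathlib.PurePosixPath(path)
    while "models" in path.parts:
        path = path.parent
    return path


print(dir_above_models("/content/models/research"))  # /content
print(dir_above_models("/content"))                  # /content
```

Because the loop climbs out of every "models" directory, re-running the cell after a later `cd models/research` still leaves you in the folder that contains the clone.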

2. Install the Object Detection API and required modules

%%bash
# Run this cell as a bash script.
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

3. Import modules

import matplotlib
import matplotlib.pyplot as plt

from io import BytesIO  # standard-library replacement for six.BytesIO

import numpy as np
from PIL import Image, ImageDraw, ImageFont

import tensorflow as tf

from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder

%matplotlib inline

4. Define helper functions (image loading and keypoint edges)

def load_image_into_numpy_array(path):
  """Load an image into a numpy array.

  Puts image into numpy array to feed into tensorflow graph.
  Note that by convention we put it into a numpy array with shape
  (height, width, channels), where channels=3 for RGB.

  Args:
    path: the file path to the image

  Returns:
    uint8 numpy array with shape (img_height, img_width, 3)
  """
  img_data = tf.io.gfile.GFile(path, 'rb').read()
  image = Image.open(BytesIO(img_data))
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def get_keypoint_tuples(eval_config):
  """Return a tuple list of keypoint edges from the eval config.

  Args:
    eval_config: an eval config containing the keypoint edges

  Returns:
    a list of edge tuples, each in the format (start, end)
  """
  tuple_list = []
  kp_list = eval_config.keypoint_edge
  for edge in kp_list:
    tuple_list.append((edge.start, edge.end))
  return tuple_list

5. Download a model

!wget http://download.tensorflow.org/models/object_detection/tf2/20200713/centernet_hg104_512x512_coco17_tpu-8.tar.gz
!tar -xf centernet_hg104_512x512_coco17_tpu-8.tar.gz

You can download any model from the official Model Zoo. Each model name in the table links to its archive, so hover over a name (or copy the link) to get the download URL.

The table also lists speed and COCO mAP for each model, which is fun to browse. After the archive is downloaded and extracted, you get a folder containing checkpoint, saved_model, and pipeline.config.
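To confirm that the archive extracted as described, a small check like the following can help. It is only a sketch: `missing_model_entries` is a hypothetical helper, and the expected names are simply the three entries mentioned above.

```python
import pathlib

# Entries the extracted model folder is expected to contain (per the text above).
EXPECTED_ENTRIES = ("checkpoint", "saved_model", "pipeline.config")


def missing_model_entries(model_root):
    """Return the expected entries that are absent from `model_root`."""
    root = pathlib.Path(model_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


missing = missing_model_entries("./centernet_hg104_512x512_coco17_tpu-8")
if missing:
    print("Missing from model folder:", missing)
else:
    print("Model folder looks complete.")
```

If anything is reported missing, re-running the wget and tar commands is usually the quickest fix.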

6. Read the pipeline config and build the model

# Path to the config file. The repo has a folder of config files, but the model
# names are slightly abbreviated, so the downloaded one is more reliable.
pipeline_config = "./centernet_hg104_512x512_coco17_tpu-8/pipeline.config"
# Path to the checkpoint
model_dir = "./centernet_hg104_512x512_coco17_tpu-8/checkpoint"

# Load the model config
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']

# Build the model from the loaded config
detection_model = model_builder.build(
      model_config=model_config, is_training=False)

# Restore weights from the checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(model_dir, 'ckpt-0')).expect_partial()
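The prefix ckpt-0 is hard-coded above, which matches this Model Zoo archive. For a folder whose checkpoint number you do not know (for example after fine-tuning), one option is to look for the .index file that TensorFlow writes next to each checkpoint. The sketch below uses only the standard library; `latest_ckpt_prefix` is a hypothetical helper, and it assumes the ckpt-<number> naming used here.

```python
import pathlib
import re


def latest_ckpt_prefix(checkpoint_dir):
    """Return the 'ckpt-N' prefix path with the largest N, or None if absent.

    Assumes index files are named ckpt-<number>.index, as in this archive.
    """
    best = None
    for index_file in pathlib.Path(checkpoint_dir).glob("ckpt-*.index"):
        match = re.fullmatch(r"ckpt-(\d+)\.index", index_file.name)
        if match:
            number = int(match.group(1))
            if best is None or number > best[0]:
                # Strip the ".index" suffix to get the checkpoint prefix.
                best = (number, index_file.with_suffix(""))
    return None if best is None else str(best[1])
```

The returned string could be passed to ckpt.restore(...) in place of os.path.join(model_dir, 'ckpt-0'). TensorFlow also provides tf.train.latest_checkpoint, which reads the checkpoint state file in the directory instead of scanning file names.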

7. Prepare the inference function

def get_model_detection_function(model):
  """Get a tf.function for detection."""

  @tf.function
  def detect_fn(image):
    """Detect objects in image."""

    image, shapes = model.preprocess(image)
    prediction_dict = model.predict(image, shapes)
    detections = model.postprocess(prediction_dict, shapes)

    return detections, prediction_dict, tf.reshape(shapes, [-1])

  return detect_fn

detect_fn = get_model_detection_function(detection_model)

8. Prepare labels

To turn predicted class IDs into names, inference needs the label map the model was trained with. Label maps ship with the official repo under models/research/object_detection/data/. This model was trained on COCO, so we use mscoco_label_map.pbtxt.

label_map_path = './models/research/object_detection/data/mscoco_label_map.pbtxt'
label_map = label_map_util.load_labelmap(label_map_path)
categories = label_map_util.convert_label_map_to_categories(
    label_map,
    max_num_classes=label_map_util.get_max_label_map_index(label_map),
    use_display_name=True)
category_index = label_map_util.create_category_index(categories)
label_map_dict = label_map_util.get_label_map_dict(label_map, use_display_name=True)
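For orientation, category_index is a dictionary keyed by class ID, where each value holds the ID and a display name. The sketch below imitates that result with a small regex-based parser, so the structure is visible without the API installed. It is not the library's implementation: `parse_label_map` is a hypothetical helper, the sample text is a two-item example in the same format as mscoco_label_map.pbtxt, and the parser assumes each item lists name, id, and display_name in that order.

```python
import re

# Two-item sample in the label map text format (illustrative input).
SAMPLE_PBTXT = '''
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/0199g"
  id: 2
  display_name: "bicycle"
}
'''

ITEM_PATTERN = re.compile(
    r'item\s*\{\s*'
    r'name:\s*"(?P<name>[^"]*)"\s*'
    r'id:\s*(?P<id>\d+)\s*'
    r'display_name:\s*"(?P<display_name>[^"]*)"\s*\}')


def parse_label_map(text):
    """Build a category_index-like dict: {id: {'id': id, 'name': display_name}}."""
    index = {}
    for match in ITEM_PATTERN.finditer(text):
        class_id = int(match.group("id"))
        index[class_id] = {"id": class_id, "name": match.group("display_name")}
    return index


sample_index = parse_label_map(SAMPLE_PBTXT)
print(sample_index[1]["name"])  # person
```

In the real notebook, label_map_util does this work; the sketch is only meant to show what a lookup such as category_index[1]['name'] returns.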

9. Run detection on your image

Upload any image to Colab and set image_path to its path. Note that load_image_into_numpy_array reshapes the pixel data to (height, width, 3), so an image with an alpha channel needs to be converted to 3 channels first.
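One way to do that conversion is on the NumPy side, after decoding the image yourself. The sketch below is an assumption about what "converting to 3 channels" can look like, not code from the original notebook: `to_rgb_array` is a hypothetical helper that drops the alpha channel and repeats a single grayscale channel three times.

```python
import numpy as np


def to_rgb_array(image_np):
    """Return a uint8 array of shape (height, width, 3).

    Accepts (H, W) grayscale, (H, W, 1), (H, W, 3) RGB, or (H, W, 4) RGBA.
    The alpha channel is simply dropped, not blended with a background.
    """
    image_np = np.asarray(image_np)
    if image_np.ndim == 2:                 # grayscale without a channel axis
        image_np = image_np[:, :, np.newaxis]
    channels = image_np.shape[2]
    if channels == 1:                      # repeat gray into R, G, B
        image_np = np.repeat(image_np, 3, axis=2)
    elif channels == 4:                    # drop alpha
        image_np = image_np[:, :, :3]
    elif channels != 3:
        raise ValueError(f"Unsupported channel count: {channels}")
    return image_np.astype(np.uint8)


rgba = np.zeros((4, 6, 4), dtype=np.uint8)   # dummy RGBA image
print(to_rgb_array(rgba).shape)  # (4, 6, 3)
```

For example, you could decode with np.array(Image.open(image_path)) and pass the result through to_rgb_array before building the input tensor. With Pillow, calling .convert('RGB') on the opened image before creating the array is an alternative.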

image_dir = 'models/research/object_detection/test_images/'
image_path = os.path.join(image_dir, 'image2.jpg')
image_np = load_image_into_numpy_array(image_path)

# Things to try:
# Flip horizontally
# image_np = np.fliplr(image_np).copy()

# Convert image to grayscale
# image_np = np.tile(
#     np.mean(image_np, 2, keepdims=True), (1, 1, 3)).astype(np.uint8)

input_tensor = tf.convert_to_tensor(
    np.expand_dims(image_np, 0), dtype=tf.float32)
detections, predictions_dict, shapes = detect_fn(input_tensor)

label_id_offset = 1
image_np_with_detections = image_np.copy()

# Use keypoints if available in detections
keypoints, keypoint_scores = None, None
if 'detection_keypoints' in detections:
  keypoints = detections['detection_keypoints'][0].numpy()
  keypoint_scores = detections['detection_keypoint_scores'][0].numpy()

viz_utils.visualize_boxes_and_labels_on_image_array(
      image_np_with_detections,
      detections['detection_boxes'][0].numpy(),
      (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
      detections['detection_scores'][0].numpy(),
      category_index,
      use_normalized_coordinates=True,
      max_boxes_to_draw=200,
      min_score_thresh=.30,
      agnostic_mode=False,
      keypoints=keypoints,
      keypoint_scores=keypoint_scores,
      keypoint_edges=get_keypoint_tuples(configs['eval_config']))

plt.figure(figsize=(12,16))
plt.imshow(image_np_with_detections)
plt.show()
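Beyond drawing, you may want the detections as plain data. The sketch below shows one way to filter by score and convert normalized boxes to pixel coordinates. It uses small dummy NumPy arrays in place of detections[...][0].numpy(), and `summarize_detections` and the tiny category_index are hypothetical. It assumes boxes are ordered [ymin, xmin, ymax, xmax] in normalized coordinates and that raw class IDs are zero-based, which is why label_id_offset is added as in the cell above.

```python
import numpy as np


def summarize_detections(boxes, classes, scores, category_index,
                         image_shape, min_score=0.30, label_id_offset=1):
    """Return [(name, score, (left, top, right, bottom)), ...] in pixels."""
    height, width = image_shape[:2]
    results = []
    for box, cls, score in zip(boxes, classes, scores):
        if score < min_score:
            continue  # skip low-confidence detections
        ymin, xmin, ymax, xmax = box
        class_id = int(cls) + label_id_offset
        name = category_index.get(class_id, {}).get("name", "unknown")
        pixel_box = (int(round(xmin * width)), int(round(ymin * height)),
                     int(round(xmax * width)), int(round(ymax * height)))
        results.append((name, float(score), pixel_box))
    return results


# Dummy values standing in for detections[...][0].numpy()
boxes = np.array([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]])
classes = np.array([0.0, 17.0])
scores = np.array([0.9, 0.1])
category_index = {1: {"id": 1, "name": "person"}}

for name, score, box in summarize_detections(
        boxes, classes, scores, category_index, image_shape=(100, 200, 3)):
    print(name, round(score, 2), box)
```

With the real outputs, you would pass detections['detection_boxes'][0].numpy() and the matching class and score arrays, together with image_np.shape.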