DEV Community



Object Detection Using TensorFlow

Object detection using TensorFlow is a computer vision technique. As the name suggests, it helps us detect, locate, and track objects in an image or a camera feed.

What is Object detection?
Object detection is a computer vision technique in which a software system detects, locates, and tracks objects in a given image or video. What distinguishes object detection is that it identifies both the class of each object (person, table, chair, etc.) and its coordinates in the image. The location is indicated by drawing a bounding box around the object. The bounding box may or may not locate the object accurately, and how well an algorithm localizes objects inside an image is a key measure of its detection performance. Face detection is one example of object detection.
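
One common way to put a number on how well a predicted bounding box matches the true location is intersection over union (IoU): the area shared by the two boxes divided by the area they cover together. The sketch below is only an illustration; the function name iou and the (ymin, xmin, ymax, xmax) box layout are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    # Corners of the overlapping rectangle (if there is any overlap).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)      # hypothetical detector output
ground_truth = (0, 5, 10, 15)   # hypothetical hand-labelled box
print(iou(predicted, ground_truth))
```

Here the two boxes share half of their area, so the IoU comes out at one third. A value of 1.0 means a perfect match and 0.0 means no overlap at all.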

How does Object detection work?
Generally, the object detection task is carried out in three steps:

  • Generate a large set of small segments (candidate bounding boxes) that together span the full input image.
  • Extract features from each segmented rectangular area to predict whether the rectangle contains a valid object.
  • Reduce overlapping boxes around the same object to a single bounding rectangle (Non-Maximum Suppression).
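
The third step can be sketched in a few lines. In the usual greedy form of Non-Maximum Suppression, the highest-scoring box is kept and lower-scoring boxes that overlap it heavily are discarded. This is a minimal pure-Python sketch, not the implementation TensorFlow uses; the names box_iou and non_max_suppression, the (ymin, xmin, ymax, xmax) layout, and the 0.5 threshold are assumptions for the example.

```python
def box_iou(a, b):
    """Overlap ratio (intersection over union) of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the surviving boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better box already kept.
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.8, 0.9, 0.7]
print(non_max_suppression(boxes, scores))
```

The first two boxes overlap heavily, so only the higher-scoring one (index 1) survives, together with the separate box at index 2.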

What is TensorFlow?
TensorFlow is an open-source library, created by the Google Brain team, for numerical computation and large-scale machine learning. It eases the process of acquiring data, training models, serving predictions, and refining future results.

TensorFlow bundles together machine learning and deep learning models and algorithms. It uses Python as a convenient front end, while the computation itself runs efficiently in optimized C++.

Object Detection Using TensorFlow
Knowledge of neural networks and machine learning is not mandatory for using the TensorFlow Object Detection API, as we are mostly going to use the files provided with the API. All we need is some knowledge of Python and the motivation to complete this project.
Follow the steps below:
Step 1: Create a folder named ObjectDetection and open it in VS Code.
Step 2: Download the TensorFlow API from its GitHub repository by typing the command below in the VS Code terminal

git clone

Step 3: Set up a virtual environment

python -m venv --system-site-packages .\venv

  • Activate the environment. With the venv created above on Windows, the activation script is .\venv\Scripts\activate


  • Upgrade pip to the latest version

python -m pip install --upgrade pip
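
A quick way to confirm that the virtual environment from Step 3 is the one in use is to ask the interpreter itself. This is a minimal sketch; the helper name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # points at the Python installation the venv was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtual_environment())
print("interpreter:", sys.executable)
```

If the first line prints False, the environment is probably not activated in that terminal.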

Step 4: Install the dependencies

  • Install TensorFlow and upgrade it to the latest version

pip install tensorflow

pip install --upgrade tensorflow

  • Install matplotlib and the other supporting packages

pip install pillow Cython lxml jupyter matplotlib

  • Navigate to the research subfolder of the models folder.

cd .\models\research\
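
Before moving on, it may be worth checking that the packages from Step 4 can actually be found by the interpreter. The sketch below is one hedged way to do that; missing_modules is a hypothetical helper, and the names in the list are import names (pillow, for example, is imported as PIL).

```python
import importlib.util


def missing_modules(names):
    """Return the import names from `names` that cannot be located."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["tensorflow", "PIL", "Cython", "lxml", "matplotlib"]
print("missing:", missing_modules(required))
```

An empty list means everything was found; any name that is printed can be installed again with pip.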

Step 5: Now we need to download Protocol Buffers (Protobuf), Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. Think of it as XML, but smaller, faster, and simpler.

  • Extract the contents of the downloaded zip into the research subfolder of the models folder, open the bin folder, and copy the path of that folder, which contains protoc.exe.
  • Then open "Edit the system environment variables" and click on "Environment Variables".

(i) Under System variables, select 'Path' and click Edit.
(ii) Click New and paste the copied path.
Step 6: Then run this command in the VS Code terminal

protoc object_detection/protos/*.proto --python_out=.
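
On some Windows setups the shell does not expand the *.proto wildcard, and protoc reports that it cannot find the file. One possible workaround is to call protoc once per file. The sketch below assumes it is run from the models\research folder with protoc on the Path; build_protoc_commands is a hypothetical helper, not part of the API.

```python
import glob
import os
import shutil
import subprocess


def build_protoc_commands(proto_dir="object_detection/protos", out_dir="."):
    """Return one protoc command (as an argument list) per .proto file."""
    pattern = os.path.join(proto_dir, "*.proto")
    return [
        # Forward slashes keep the paths in the same form as the original command.
        ["protoc", path.replace(os.sep, "/"), "--python_out=" + out_dir]
        for path in sorted(glob.glob(pattern))
    ]


if __name__ == "__main__":
    if shutil.which("protoc") is None:
        print("protoc was not found on the Path; revisit Step 5")
    else:
        for command in build_protoc_commands():
            subprocess.run(command, check=True)
```

Each generated command has the same shape as the one in Step 6, only with a single file name in place of the wildcard.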

Step 7: Create a new Python file named '' in the same folder and paste in the code given below:

import os
import pathlib

import numpy as np
import tensorflow as tf

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# Move up to the folder that contains "models" so the relative paths below resolve.
while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')

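def load_model(model_name):
  # Base URL of the server that hosts the model archives; set it before running.
  base_url = ''
  model_file = model_name + '.tar.gz'
  # Download the archive and extract it into the Keras cache directory.
  model_dir = tf.keras.utils.get_file(
    fname=model_name,
    origin=base_url + model_file,
    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"
  model = tf.saved_model.load(str(model_dir))
  return model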

PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

model_name = 'ssd_inception_v2_coco_2017_11_17'
detection_model = load_model(model_name)

def run_inference_for_single_image(model, image):
  image = np.asarray(image)
  # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
  input_tensor = tf.convert_to_tensor(image)
  # The model expects a batch of images, so add an axis with `tf.newaxis`.
  input_tensor = input_tensor[tf.newaxis,...]

  # Run inference
  model_fn = model.signatures['serving_default']
  output_dict = model_fn(input_tensor)

  # All outputs are batched tensors.
  # Convert to numpy arrays, and take index [0] to remove the batch dimension.
  # We're only interested in the first num_detections.
  num_detections = int(output_dict.pop('num_detections'))
  output_dict = {key:value[0, :num_detections].numpy() 
                 for key,value in output_dict.items()}
  output_dict['num_detections'] = num_detections

  # detection_classes should be ints.
  output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

  # Handle models with masks:
  if 'detection_masks' in output_dict:
    # Reframe the bbox masks to the image size.
    detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
              output_dict['detection_masks'], output_dict['detection_boxes'],
               image.shape[0], image.shape[1])
    # Threshold the masks and store them as uint8 values.
    detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                       tf.uint8)
    output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

  return output_dict

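def show_inference(model, frame):
  # Take the frame from the webcam feed and convert it to an array.
  image_np = np.array(frame)
  # Actual detection.
  output_dict = run_inference_for_single_image(model, image_np)
  # Visualization of the results of a detection: draw boxes and labels on the frame.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks_reframed', None),
      use_normalized_coordinates=True,
      line_thickness=5)

  return image_np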

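# Now we open the webcam and start detecting objects.
import cv2
video_capture = cv2.VideoCapture(0)
while True:
    # Capture frame-by-frame
    re, frame = video_capture.read()
    if not re:
        break  # no frame could be read from the camera
    Imagenp = show_inference(detection_model, frame)
    cv2.imshow('object detection', cv2.resize(Imagenp, (800, 600)))
    # Press "q" to quit.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close the display window.
video_capture.release()
cv2.destroyAllWindows()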
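
The dictionary returned by run_inference_for_single_image can also be read directly, without drawing anything. The sketch below assumes that detection_boxes holds normalized (ymin, xmin, ymax, xmax) values between 0 and 1, as the Object Detection API conventionally returns; filter_detections, the 0.5 score threshold, and the sample values are made up for illustration.

```python
def filter_detections(output_dict, image_height, image_width, min_score=0.5):
    """Keep detections scoring at least min_score and convert boxes to pixels."""
    results = []
    for box, class_id, score in zip(output_dict["detection_boxes"],
                                    output_dict["detection_classes"],
                                    output_dict["detection_scores"]):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = (float(v) for v in box)
        results.append({
            "class_id": int(class_id),
            "score": float(score),
            # Normalized coordinates are scaled by the image size.
            "box_pixels": (int(round(ymin * image_height)),
                           int(round(xmin * image_width)),
                           int(round(ymax * image_height)),
                           int(round(xmax * image_width))),
        })
    return results


# Hypothetical output for a 480x640 frame: one confident and one weak detection.
example = {
    "detection_boxes": [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.0, 1.0]],
    "detection_classes": [1, 62],
    "detection_scores": [0.92, 0.30],
}
print(filter_detections(example, image_height=480, image_width=640))
```

Only the first detection passes the threshold, and its box becomes (120, 320, 360, 640) in pixels. The class_id can then be looked up in category_index to get a readable label.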

Step 8: For real-time object detection we need one more dependency, OpenCV. Install it by running this command in the terminal.

pip install opencv-python

Step 9: Now everything is set up and ready to detect objects. Execute the '' file created in Step 7.

