PythicCoder for Microsoft Azure

Originally published at Medium

Visual Brand Detection with Azure Video Indexer

TL;DR: This post shows how to use Azure Video Indexer, the Computer Vision API, and the Custom Vision service to extract key frames and detect custom image tags in indexed videos.

All code for the tutorial can be found in the notebook below. This code can be extended to support almost any image classification or object detection task.

aribornstein/AzureVideoIndexerVisualBrandDetection

The tutorial requires an Azure subscription; however, everything can be achieved using the free tier. If you are new to Azure, you can get a free subscription here.

Create your Azure free account today | Microsoft Azure

What is Azure Video Indexer?

Azure Video Indexer automatically extracts metadata, such as spoken words, written text, faces, speakers, celebrities, emotions, topics, brands, and scenes, from video and audio files. Developers can then access the data within their application or infrastructure, make it more discoverable, and use it to create new over-the-top (OTT) experiences and monetization opportunities.

Use the Video Indexer API - Azure Media Services

Often, we wish to extract useful tags from video content. These tags are often the differentiating factor for successful engagement on social media services such as Instagram, Facebook, and YouTube.

This tutorial will show how to use Azure Video Indexer, Computer Vision API, and Custom Vision service to extract key frames and custom tags. We will use these Azure services to detect custom brand logos in indexed videos.

Step #1 Download A Sample Video with the pyTube API

The first step is to download a sample video to be indexed. We will download an episode of Azure Mythbusters on Azure Machine Learning by my incredible co-worker Amy Boyd, using the open source pyTube API!

Installation:

pyTube can be installed with pip

!pip install pytube3 --upgrade

Code:

from pytube import YouTube
from pathlib import Path

video2Index = YouTube('https://www.youtube.com/watch?v=ijtKxXiS4hE').streams[0].download()

video_name = Path(video2Index).stem

Step #2 Create An Azure Video Indexer Instance

Navigate to https://www.videoindexer.ai/ and follow the instructions to create an account.

For the next steps, you will need your Video Indexer:

  • Subscription Key
  • Location
  • Account Id

These can be found on the account settings page of the Video Indexer website. For more information, see the documentation below. Feel free to comment below if you get stuck.

Use the Video Indexer API - Azure Media Services
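Rather than pasting the three values directly into a notebook, you may prefer to read them from environment variables. The following is a minimal sketch of that idea, not part of the original notebook. The environment variable names are hypothetical, and the returned keys are chosen to match the keyword arguments of the client used in the next step.

```python
import os


def load_video_indexer_settings(env=os.environ):
    """Read the three Video Indexer settings from environment variables.

    The variable names below are hypothetical; use whatever names you export.
    """
    names = {
        "vi_subscription_key": "VIDEO_INDEXER_SUBSCRIPTION_KEY",
        "vi_location": "VIDEO_INDEXER_LOCATION",
        "vi_account_id": "VIDEO_INDEXER_ACCOUNT_ID",
    }
    # Fail early with a clear message if anything is missing or empty
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Keys match the keyword arguments used when initializing the client
    return {kwarg: env[var] for kwarg, var in names.items()}
```

With this in place, the client could be created with VideoIndexer(**load_video_indexer_settings()), which keeps keys out of shared notebooks.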

Step #3 Use the Unofficial Video Indexer Python Client to Process our Video and Extract Key Frames

To interact with the Video Indexer API, we will use the unofficial Python client.

Installation:

pip install video-indexer

Code:

  • Initialize Client:
from video_indexer import VideoIndexer

# Replace the placeholders with the values from your account settings page
vi = VideoIndexer(vi_subscription_key='SUBSCRIPTION_KEY',
                  vi_location='LOCATION',
                  vi_account_id='ACCOUNT_ID')
  • Upload Video:
video_id = vi.upload_to_video_indexer(
              input_filename=video2Index,
              video_name=video_name,  # must be unique
              video_language='English')
  • Get Video Info:
info = vi.get_video_info(video_id, video_language='English')
  • Extract Key Frame Ids:
keyframes = []
for shot in info["videos"][0]["insights"]["shots"]:
    for keyframe in shot["keyFrames"]:
        # Each key frame instance carries the id of its thumbnail image
        keyframes.append(keyframe["instances"][0]['thumbnailId'])
  • Get Keyframe Thumbnails:
for keyframe in keyframes:
    # Returns the thumbnail image as a byte string
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
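The key frame extraction above indexes directly into the insights dictionary, which raises an error if a shot has no key frames or a key is missing. Below is a more defensive sketch of the same traversal. It assumes the same JSON layout as the code above. The "start" field on each instance is an assumption and comes back as None when absent. Unlike the original loop, it walks every instance rather than only the first and skips duplicate thumbnail ids.

```python
def extract_keyframes(info):
    """Return (thumbnail_id, start_time) pairs for every key frame in the index.

    Assumes the insights layout used in this tutorial:
    info["videos"][i]["insights"]["shots"][j]["keyFrames"][k]["instances"][m].
    The "start" field is an assumption; it is returned as None when absent.
    """
    seen = set()
    keyframes = []
    for video in info.get("videos", []):
        for shot in video.get("insights", {}).get("shots", []):
            for keyframe in shot.get("keyFrames", []):
                for instance in keyframe.get("instances", []):
                    thumbnail_id = instance.get("thumbnailId")
                    # Skip instances without a thumbnail and repeated ids
                    if thumbnail_id and thumbnail_id not in seen:
                        seen.add(thumbnail_id)
                        keyframes.append((thumbnail_id, instance.get("start")))
    return keyframes
```

To get the same flat list of ids used in the rest of the tutorial, you could write keyframes = [thumbnail_id for thumbnail_id, _ in extract_keyframes(info)].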

Step #4 Use the Azure Computer Vision API to Extract Popular Brands from Key Frames

Out of the box, Azure Video Indexer uses optical character recognition and the audio transcript generated by speech-to-text to detect references to popular brands.

Now that we have extracted the key frames, we will use the Computer Vision API to extend this functionality and check whether any known brands appear in them.

Brand detection - Computer Vision - Azure Cognitive Services

  • First, we need to create a Computer Vision API key. A free tier is available for this demo, and a key can be generated by following the instructions in the documentation link below. Once done, you should have a Computer Vision subscription key and endpoint.

Create a Cognitive Services resource in the Azure portal - Azure Cognitive Services

After we have our Azure Computer Vision subscription key and endpoint, we can use the client SDK to evaluate our video’s keyframes:

Installation:

pip install --upgrade azure-cognitiveservices-vision-computervision

Code:

  • Initialize Computer Vision Client
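from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Placeholders: replace with your own Computer Vision key and endpoint
subscription_key = "Computer Vision Subscription Key"
endpoint = "Computer Vision Endpoint"

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))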
  • Send Keyframe To Azure Computer Vision Service to Detect Brands
import io
import time

timeout_interval, timeout_time = 5, 10.0
image_features = ["brands"]

for index, keyframe in enumerate(keyframes):

    # Pause before every batch of requests to stay under the request limit
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)

    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)

    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)

    # Analyze with Azure Computer Vision
    cv_results = computervision_client.analyze_image_in_stream(img_stream, image_features)

    print("Detecting brands in keyframe {}: ".format(keyframe))

    if len(cv_results.brands) == 0:
        print("No brands detected.")
    else:
        for brand in cv_results.brands:
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format(brand.name, brand.confidence * 100, brand.rectangle.x, brand.rectangle.x + brand.rectangle.w, brand.rectangle.y, brand.rectangle.y + brand.rectangle.h))

Azure Computer Vision API — General Brand Detection

Quickstart: Computer Vision client library - Azure Cognitive Services
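The loop above mixes request pacing with the detection logic. One way to separate the two is a small generator that pauses between batches. This is only a sketch: the name paced and its parameters are hypothetical, and the defaults simply mirror the values used above, so adjust them to the limits of your pricing tier. Unlike the loop above, it does not wait before the very first batch.

```python
import time


def paced(items, batch_size=5, pause_seconds=10.0, sleep=time.sleep):
    """Yield items one by one, pausing before every batch after the first.

    The sleep function is injectable so the pacing can be tested without waiting.
    """
    for index, item in enumerate(items):
        # Pause at the start of each new batch, but not before the first item
        if index > 0 and index % batch_size == 0:
            sleep(pause_seconds)
        yield item
```

The detection loop could then start with for keyframe in paced(keyframes): and drop the explicit time.sleep call.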

Step #5 Use the Azure Custom Vision Service to Extract Custom Logos from Keyframes

The Azure Computer Vision API can detect many of the world's most popular brands, but sometimes a brand is more obscure. In this last section, we will use the Custom Vision Service to train a custom logo detector that finds the Azure Developer Relations mascot, Bit, in the keyframes extracted by Video Indexer.

My training set for Custom Bit Detector

This tutorial assumes you know how to train a Custom Vision Service object detection model for brand detection. If not, check out the documentation below for a tutorial.

Tutorial: Use custom logo detector to recognize Azure services - Custom Vision - Azure Cognitive Services

Instead of deploying to mobile, however, we will use the Python client API for the Azure Custom Vision Service. All the information you’ll need can be found in the settings menu of your Custom Vision project.

Settings menu for Custom Vision Service

Installation:

pip install azure-cognitiveservices-vision-customvision

Code:

  • Initialize Custom Vision Service Client
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

prediction_threshold = .8
prediction_key = "Custom Vision Service Key"
custom_endpoint = "Custom Vision Service Endpoint"
project_id = "Custom Vision Service Model ProjectId"
published_name = "Custom Vision Service Model Iteration Name"

# Pass the service endpoint here; the iteration name is only used at prediction time.
# Note: newer SDK releases may expect CustomVisionPredictionClient(endpoint, credentials) instead.
predictor = CustomVisionPredictionClient(prediction_key, endpoint=custom_endpoint)
  • Use Custom Vision Service Model to Predict Key Frames
import io
import time

timeout_interval, timeout_time = 5, 10.0

for index, keyframe in enumerate(keyframes):
    # Pause before every batch of requests to stay under the request limit
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)

    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)

    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)

    # Analyze with the Azure Custom Vision Service and keep confident predictions
    cv_results = predictor.detect_image(project_id, published_name, img_stream)
    predictions = [pred for pred in cv_results.predictions if pred.probability > prediction_threshold]
    print("Detecting brands in keyframe {}: ".format(keyframe))

    if len(predictions) == 0:
        print("No custom brands detected.")
    else:
        for brand in predictions:
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format(brand.tag_name, brand.probability * 100, brand.bounding_box.left, brand.bounding_box.top, brand.bounding_box.width, brand.bounding_box.height))
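The bounding boxes printed above are expressed as fractions of the image width and height rather than in pixels. If you want to crop or draw the detected logo, you need pixel coordinates. Here is a minimal sketch of that conversion. The function name is hypothetical, and you must supply the image dimensions yourself, for example from an image library.

```python
def to_pixel_box(left, top, width, height, image_width, image_height):
    """Convert a normalized box (values in [0, 1]) to integer pixel corners.

    Returns (x_min, y_min, x_max, y_max), clamped to the image bounds.
    """
    x_min = max(0, round(left * image_width))
    y_min = max(0, round(top * image_height))
    x_max = min(image_width, round((left + width) * image_width))
    y_max = min(image_height, round((top + height) * image_height))
    return x_min, y_min, x_max, y_max
```

For a prediction named brand, the call would be to_pixel_box(brand.bounding_box.left, brand.bounding_box.top, brand.bounding_box.width, brand.bounding_box.height, image_width, image_height).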

Conclusion

And there we have it! I am able to find all the frames in my video that contain either the Microsoft brand or the Cloud Advocacy Bit logo.

Sample Key Frames with Bit
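To go from per-frame printouts to a list of frames per brand, the detections from both services can be collected and grouped. The sketch below assumes a hypothetical record format of (keyframe_id, brand_name, confidence) tuples that you would append to inside the loops above instead of printing. It is one possible way to summarize the results, not the approach used in the original notebook.

```python
from collections import defaultdict


def group_frames_by_brand(detections, min_confidence=0.8):
    """Map each brand name to the key frames where it was detected.

    `detections` is an iterable of (keyframe_id, brand_name, confidence)
    tuples, a hypothetical record format filled in by the caller.
    """
    frames_by_brand = defaultdict(list)
    for keyframe_id, brand_name, confidence in detections:
        # Keep confident detections and list each key frame once per brand
        if confidence >= min_confidence and keyframe_id not in frames_by_brand[brand_name]:
            frames_by_brand[brand_name].append(keyframe_id)
    return dict(frames_by_brand)
```

The resulting dictionary can be used to look up every key frame that contains a given brand or custom logo.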

Next Steps

You now have all you need to extend the Azure Video Indexer Service with your own custom computer vision models. Below is a list of additional resources that will help you take your integration with Video Indexer to the next level.

Offline Computer Vision

In a production system, you might see request throttling when sending a large number of requests. In this case, the Azure Computer Vision service can be run in an offline container.

How to install and run containers - Computer Vision - Azure Cognitive Services

The Custom Vision model can also be run locally.

Tutorial - Deploy Custom Vision classifier to a device using Azure IoT Edge

Video Indexer + Zoom Media

Azure-Samples/media-services-video-indexer

Creating an Automated Video Processing Flow in Azure

Creating an automated video processing flow in Azure

About the Author

Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli hi-tech community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.


Top comments (1)

Shaiju T

Hi, can you answer my question? I am planning to participate in an Azure hackathon, and my idea is to detect social distancing in real time using a phone camera. Is there a way to do this with Azure Custom Vision, or is there another Azure service for this purpose?