Nishiki Asumi Yapa

Posted on Apr 2, 2021

Azure Computer vision - Introduction

#azure

This is the first part of the 'Azure Cognitive Services - Computer Vision' article series. It describes the fundamentals of the Computer Vision service.

What are Cognitive Services?

Cognitive Services are a set of machine learning algorithms that Microsoft has developed to solve problems in the field of Artificial Intelligence (AI). Their main advantage is that they use pre-trained models, so you can, for example, analyse images without training a model yourself.

Cognitive Services are grouped into the following categories.

Vision
Speech
Language
Knowledge
Search

What is computer vision?

The Computer Vision service from Azure provides you with access to advanced algorithms that process images and return information based on the visual features you're interested in. The pixel values of an image can be used as 'features' to train machine learning models and make predictions.

There are three main steps in using the Computer Vision service.

Step 01 - Create a resource

Before you create a resource in the Microsoft Azure portal, you need an Azure subscription.

There are two types of resources.

Computer Vision
If you are using only one cognitive service (only for image analysis), you can use a Computer Vision resource.
Cognitive Services
If you are using more than one cognitive service, you can use a Cognitive Services resource.

Step 02 - Get the key and endpoint

After choosing the resource type that suits your needs, create that resource in your account.

You can then copy the key and the endpoint from the resource management section into the code that calls the service.

The key is used to authenticate the client application.
The endpoint provides the HTTP address at which your resource can be accessed.
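
As a rough sketch of how the two values fit together, assuming the REST interface is called directly: the key is sent in the `Ocp-Apim-Subscription-Key` request header, and the endpoint is the base of the request URL. The helper name `build_call`, the placeholder key and resource name, and the `vision/v3.2/analyze` path are assumptions for illustration; check the portal and the API version your resource supports.

```python
def build_call(endpoint, key, path):
    """Return the full URL and the auth headers for one request.

    endpoint: base address copied from the portal
    key:      subscription key copied from the portal
    path:     API path (assumed here; depends on the API version)
    """
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Ocp-Apim-Subscription-Key": key}
    return url, headers


# Placeholder values, not real credentials.
url, headers = build_call(
    "https://my-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "vision/v3.2/analyze",
)
print(url)
```

Keeping the key out of source control (for example by reading it from an environment variable) is usually a safer choice than pasting it into the script.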

Step 03 - Submit the image
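
A minimal sketch of this step, assuming the REST interface is called directly and the image is reachable at a public URL. The function name `make_analyze_request`, the placeholder resource name and key, the example image URL, and the `v3.2` path are all assumptions for illustration. The request is only built here, not sent, so the snippet runs without credentials.

```python
import json
import urllib.parse
import urllib.request


def make_analyze_request(endpoint, key, image_url, features=("Description",)):
    """Build (but do not send) a POST request that submits an image by URL."""
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    url = endpoint.rstrip("/") + "/vision/v3.2/analyze?" + query
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,   # authenticates the client
        "Content-Type": "application/json",  # body is a JSON document
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = make_analyze_request(
    "https://my-resource.cognitiveservices.azure.com",
    "<your-key>",
    "https://example.com/photo.jpg",
)
print(req.get_method(), req.full_url)

# To actually send it (needs a real key and endpoint):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```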

Read the text with the computer vision service

As discussed above, we can use either a Computer Vision resource or a Cognitive Services resource for this.

When reading text from images, the term OCR (Optical Character Recognition) is commonly used.

What is OCR?

OCR is the foundation of processing text in printed images. The technique is used for tasks such as taking notes, digitizing forms, and scanning printed or handwritten checks.

The Computer Vision service provides two APIs that can be used to read the text in images.

OCR API

The OCR API extracts a small amount of text from an image. It can recognize text in numerous languages. The API returns the text as a hierarchy of information, as follows.

Regions
Lines
Words

For each element, the OCR API also returns bounding box coordinates, which outline the text. They define a rectangle indicating where in the image the region, line or word appears.
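
To make the hierarchy concrete, here is a small sketch that walks regions, lines and words in a response and reads each word's bounding box. It assumes the JSON shape returned by the REST OCR operation, where `boundingBox` is a `left,top,width,height` string. The sample data is hand-written for illustration, not real service output, and the helper names are hypothetical.

```python
def parse_box(box):
    """Turn a 'left,top,width,height' string into a tuple of four ints."""
    left, top, width, height = (int(v) for v in box.split(","))
    return left, top, width, height


def iter_words(ocr_result):
    """Yield (text, box) for every word, walking regions -> lines -> words."""
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                yield word["text"], parse_box(word["boundingBox"])


# Hand-written sample shaped like an OCR response (assumed structure).
sample = {
    "language": "en",
    "regions": [
        {
            "boundingBox": "20,25,300,40",
            "lines": [
                {
                    "boundingBox": "20,25,300,40",
                    "words": [
                        {"boundingBox": "20,25,120,40", "text": "Hello"},
                        {"boundingBox": "150,25,170,40", "text": "world"},
                    ],
                }
            ],
        }
    ],
}

words = list(iter_words(sample))
print(" ".join(text for text, _ in words))
```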

Read API

The Read API is a better option for scanned documents with lots of text, and it can extract text from both handwritten and printed images.

There are three main steps in using the Read API.

Step 01 - Submit an image to the API and retrieve an operation ID in response.

Step 02 - Use the operation ID to check the state of the image analysis operation and wait until it has completed.

Step 03 - Retrieve the result of the operation.

The Read API also returns the text as a hierarchy of information.
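
The three steps can be sketched as a polling loop. This assumes the REST flow in which the submit call returns an `Operation-Location` header whose last path segment is the operation ID, and in which each status check returns JSON with a `status` field. The helper names, the example URL and the fake responses are assumptions for illustration; the real HTTP calls are replaced by a stand-in so the flow can be run offline.

```python
import time


def operation_id_from(operation_location):
    """Take the last path segment of the Operation-Location URL as the ID."""
    return operation_location.rstrip("/").rsplit("/", 1)[-1]


def wait_for_read(get_status, interval=1.0, max_tries=30):
    """Steps 02 and 03: poll until the operation finishes, then return it.

    get_status is any callable returning the parsed JSON of one status check.
    """
    for _ in range(max_tries):
        result = get_status()
        if result["status"] in ("succeeded", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("Read operation did not finish in time")


# Stand-in for the real status checks (made-up responses).
fake_responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"readResults": []}},
])

op_id = operation_id_from(
    "https://my-resource.cognitiveservices.azure.com"
    "/vision/v3.2/read/analyzeResults/abc-123"
)
final = wait_for_read(lambda: next(fake_responses), interval=0)
print(op_id, final["status"])
```

Passing the status check in as a callable keeps the waiting logic separate from the HTTP client, so the same loop works with whichever library makes the real request.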

Pages
Lines
Words

How do we extract text from handwritten and printed images?
Continued in part 02 of this article series... 😉
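
As an illustration of the pages, lines and words hierarchy, the sketch below collects the line text per page and lists words with a low confidence score. It assumes the JSON shape of a completed Read operation from the REST interface (`analyzeResult.readResults`, with eight-number bounding boxes). The sample data is hand-written, and the function names and the 0.8 threshold are hypothetical.

```python
def page_lines(read_result):
    """Return {page number: [line text, ...]} from a Read result."""
    pages = {}
    for page in read_result["analyzeResult"]["readResults"]:
        pages[page["page"]] = [line["text"] for line in page["lines"]]
    return pages


def low_confidence_words(read_result, threshold=0.8):
    """List the words whose confidence is below the threshold."""
    return [
        word["text"]
        for page in read_result["analyzeResult"]["readResults"]
        for line in page["lines"]
        for word in line["words"]
        if word["confidence"] < threshold
    ]


# Hand-written sample shaped like a Read result (assumed structure).
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {
                        "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40],
                        "text": "Dear customer",
                        "words": [
                            {"boundingBox": [10, 10, 80, 10, 80, 40, 10, 40],
                             "text": "Dear", "confidence": 0.98},
                            {"boundingBox": [90, 10, 200, 10, 200, 40, 90, 40],
                             "text": "customer", "confidence": 0.65},
                        ],
                    }
                ],
            }
        ]
    },
}

lines_by_page = page_lines(sample)
unsure = low_confidence_words(sample)
print(lines_by_page, unsure)
```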
