This is the first part of the 'Azure cognitive service - Computer vision' article series. It describes the fundamentals of the Computer Vision service.
What is cognitive service?
Cognitive Services are a set of machine learning algorithms that Microsoft has developed to solve problems in the field of Artificial Intelligence (AI). Their key benefit is that they rely on pre-trained models, for example to analyse images.
Cognitive Services are grouped into the following categories.
- Vision
- Speech
- Language
- Knowledge
- Search
What is computer vision?
The Computer Vision service from Azure provides you with access to advanced algorithms that process images and return information based on the visual features you're interested in. The pixel values of an image can be used as 'features' to train machine learning models and make predictions.
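To make the idea of pixel values as 'features' concrete, here is a minimal sketch. It is not part of the Azure service; the function name `image_to_features` and the tiny 2x3 grayscale image are made up for illustration, and real models use far larger inputs.

```python
def image_to_features(pixels):
    """Flatten a 2D grid of 0-255 grayscale values into one list of floats in [0, 1]."""
    return [value / 255 for row in pixels for value in row]


# Hypothetical 2x3 grayscale image (each number is one pixel's brightness).
image = [
    [0, 128, 255],
    [64, 32, 16],
]

features = image_to_features(image)
print(len(features))  # one feature per pixel
```

A feature list like this is what a machine learning model would receive as input during training and prediction.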
There are three main steps to using the Computer Vision service.
Step 01 - Create a resource
Before you create a resource in the Microsoft Azure portal, you need an Azure subscription.
There are two types of resources.
- Computer vision
If you are using only one cognitive service (only for image analysis), you can use a computer vision resource.
- Cognitive service
If you are using more than one cognitive service, you can use a cognitive service resource.
Step 02 - Get the key and endpoint
After choosing the resource type that suits your needs, create that resource in your account.
You can then copy the key and the endpoint from the resource management section into the code that calls the service.
- The key is used to authenticate the client application.
- The endpoint provides the HTTP address at which your resource can be accessed.
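As a rough sketch of how the key and endpoint end up in code, the example below builds (but does not send) an HTTP request using only the Python standard library. The function name `build_analyze_request`, the placeholder endpoint, key and image URL are all hypothetical. The `Ocp-Apim-Subscription-Key` header and the `/vision/v3.2/analyze` path are assumptions based on the v3.2 REST API, so check them against the API version your resource exposes.

```python
import json
import urllib.request


def build_analyze_request(endpoint, key, image_url):
    """Build a POST request for an image URL; nothing is sent over the network here."""
    # Assumed path for the v3.2 image analysis operation.
    url = endpoint.rstrip("/") + "/vision/v3.2/analyze?visualFeatures=Description"
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # the key authenticates the client
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values: replace them with the key and endpoint of your own resource.
request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "https://example.com/sample.jpg",
)
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually submit the image. Keeping the key out of source control (for example in an environment variable) is a sensible habit.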
Step 03 - Submit the image
Read the text with the computer vision service
As discussed in the previous section, we can use either a computer vision resource or a cognitive service resource for this.
When reading text from images, the term OCR (Optical Character Recognition) is commonly used.
What is OCR?
OCR is the foundation for processing printed text in images. The technique is used to take notes, digitize forms, and scan printed or handwritten checks, among other things.
The Computer Vision service provides two APIs that can be used to read the text in images.
- OCR API
The OCR API extracts small amounts of text from an image. It can recognize text in numerous languages. The API returns its results as a hierarchy of information, as follows.
Regions
Lines
Words
For each element, the OCR API also returns bounding box coordinates, which outline the text. They define a rectangle that indicates where in the image the region, line, or word appears.
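The region, line and word hierarchy can be walked with a few nested loops. The sketch below assumes the JSON shape of the v3.2 OCR response as I understand it: a `regions` list, each with `lines`, each with `words`, and a `boundingBox` given as an "x,y,width,height" string. The sample dictionary is hand-written for illustration and is not real service output, and `iter_ocr_words` is a hypothetical helper name.

```python
def iter_ocr_words(ocr_result):
    """Yield (text, (x, y, width, height)) for every word in an OCR-style result."""
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                # Assumed format: "x,y,width,height" as a comma-separated string.
                x, y, width, height = (int(n) for n in word["boundingBox"].split(","))
                yield word["text"], (x, y, width, height)


# Hand-written sample in the assumed response shape (not real service output).
sample = {
    "regions": [{
        "boundingBox": "20,30,200,40",
        "lines": [{
            "boundingBox": "20,30,200,40",
            "words": [
                {"boundingBox": "20,30,90,40", "text": "Hello"},
                {"boundingBox": "120,30,100,40", "text": "world"},
            ],
        }],
    }],
}

words = list(iter_ocr_words(sample))
for text, box in words:
    print(text, box)
```

The same loops can collect the bounding boxes of regions or lines instead of words if you need a coarser outline.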
- Read API
The Read API is a better option for scanned documents with a lot of text, and it can extract text from both handwritten and printed images.
There are three main steps to using the Read API.
Step 01 - Submit an image to the API and retrieve an operation ID in the response.
Step 02 - Use the operation ID to check the status of the image analysis operation and wait until it has completed.
Step 03 - Retrieve the result of the operation.
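The three steps above amount to a submit-then-poll loop. The sketch below shows that control flow only: `submit` and `get_status` are hypothetical callables standing in for the real HTTP calls, and the fake service exists purely so the example runs without a network connection. The status values `running`, `succeeded` and `failed` are assumptions based on the v3.2 Read API.

```python
import time


def run_read_operation(submit, get_status, image_url, poll_seconds=1.0, max_polls=30):
    """Submit an image, poll until the operation finishes, and return the final result."""
    operation_id = submit(image_url)              # Step 01: get an operation ID
    for _ in range(max_polls):
        result = get_status(operation_id)         # Step 02: check the status
        if result["status"] in ("succeeded", "failed"):
            return result                         # Step 03: the finished result
        time.sleep(poll_seconds)
    raise TimeoutError("operation did not finish after %d polls" % max_polls)


def make_fake_service():
    """Stand-in for the real API: reports 'running' twice, then 'succeeded'."""
    calls = {"count": 0}

    def submit(image_url):
        return "op-123"  # made-up operation ID

    def get_status(operation_id):
        calls["count"] += 1
        if calls["count"] < 3:
            return {"status": "running"}
        return {"status": "succeeded", "analyzeResult": {"readResults": []}}

    return submit, get_status


submit, get_status = make_fake_service()
result = run_read_operation(submit, get_status, "https://example.com/scan.png", poll_seconds=0)
print(result["status"])
```

In real code, `submit` would send the image and read the operation ID from the response, and `get_status` would request the result for that ID. Capping the number of polls avoids waiting forever if the operation never completes.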
The Read API also returns its results as a hierarchy of information.
Pages
Lines
Words
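Reading the page, line and word hierarchy might look like the sketch below. It assumes the v3.2 response shape as I understand it, with pages under `analyzeResult.readResults`, each page holding `lines`, and each line holding its `text` and `words`. The sample dictionary is hand-written for illustration, and `read_result_lines` is a hypothetical helper name.

```python
def read_result_lines(read_response):
    """Return (page number, line text, list of word texts) for every line in the result."""
    rows = []
    pages = read_response.get("analyzeResult", {}).get("readResults", [])
    for page in pages:
        for line in page.get("lines", []):
            word_texts = [word["text"] for word in line.get("words", [])]
            rows.append((page.get("page"), line["text"], word_texts))
    return rows


# Hand-written sample in the assumed response shape (not real service output).
sample_response = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [{
            "page": 1,
            "lines": [{
                "text": "Pay to the order of",
                "words": [
                    {"text": "Pay"}, {"text": "to"}, {"text": "the"},
                    {"text": "order"}, {"text": "of"},
                ],
            }],
        }],
    },
}

rows = read_result_lines(sample_response)
for page_number, line_text, word_texts in rows:
    print(page_number, line_text, word_texts)
```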
How do you extract text from handwritten and printed images?
To be continued in part 02 of this article series... 😉