Olalekan Oladiran

Posted on Jul 29

Extract Text Like Magic: Build an OCR App with Azure AI Vision in Python

#python #azure #ai #machinelearning

Introduction

Optical character recognition (OCR) is a subset of computer vision that deals with reading text in images and documents. The Azure AI Vision Image Analysis service provides an API for reading text, which you’ll explore in this exercise.

Provision an Azure AI Vision resource

Open the Azure portal at https://portal.azure.com, and sign in using your Azure credentials. Close any welcome messages or tips that are displayed.
Select Create a resource.
In the search bar, search for Computer Vision, select Computer Vision, and create the resource with the following settings:
- Subscription: Your Azure subscription
- Resource group: Create or select a resource group
- Region: Choose from East US, West US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, or East Asia*
- Name: A valid name for your Computer Vision resource
- Pricing tier: Free F0

*Azure AI Vision 4.0 full feature sets are currently only available in these regions.

Select the required checkboxes and create the resource.
Wait for deployment to complete, and then view the deployment details.
When the resource has been deployed, go to it and under the Resource management node in the navigation pane, view its Keys and Endpoint page. You will need the endpoint and one of the keys from this page in the next procedure.

Develop a text extraction app with the Azure AI Vision SDK

Open VS Code.
Enter the following command to clone the GitHub repo containing the code files for this exercise:

git clone https://github.com/MicrosoftLearning/mslearn-ai-vision

After the repo has been cloned, use the following command to navigate to the folder containing the application code files:

cd mslearn-ai-vision/Labfiles/ocr/python/read-text

The folder contains application configuration and code files for your app. It also contains an /images subfolder, which contains some image files for your app to analyze.

Install the Azure AI Vision SDK package and other required packages by running the following command:

pip install -r requirements.txt azure-ai-vision-imageanalysis==1.0.0

Open the .env file in VS Code and update the configuration values it contains to reflect the endpoint and an authentication key for your Computer Vision resource (copied from its Keys and Endpoint page in the Azure portal).
After you’ve replaced the placeholders, save your changes (CTRL+S).
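
The provided script reads these values when it starts. As a rough illustration of what that loading step involves, here is a standard-library-only sketch that parses KEY=VALUE lines. The names AI_SERVICE_ENDPOINT and AI_SERVICE_KEY and the sample values are assumptions for illustration, so check the file in the repo for the names it actually uses. The lab code may well rely on a helper package such as python-dotenv instead.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical file contents; both names and values are placeholders
sample = """
# Azure AI Vision settings
AI_SERVICE_ENDPOINT=https://example-resource.cognitiveservices.azure.com/
AI_SERVICE_KEY="0123456789abcdef"
"""

config = parse_env(sample)
print(config["AI_SERVICE_ENDPOINT"])
```

Keeping the key in a separate file like this means it stays out of the source code you might later commit or share.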

Add code to read text from an image

Open read-text.py in VS Code.
In the code file, find the comment Import namespaces, and add the following code to import the namespaces you will need to use the Azure AI Vision SDK:

# import namespaces
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

In the Main function, the code to load the configuration settings and determine the file to be analyzed has already been provided. Find the comment Authenticate Azure AI Vision client and add the following code to create and authenticate an Azure AI Vision Image Analysis client object:

# Authenticate Azure AI Vision client
cv_client = ImageAnalysisClient(
    endpoint=ai_endpoint,
    credential=AzureKeyCredential(ai_key))
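
A mistyped endpoint or a blank key tends to surface only as an error on the first request, so it can help to sanity-check both values before creating the client. The sketch below is not part of the lab code; check_settings is a hypothetical helper, and the sample values are placeholders rather than real credentials.

```python
from urllib.parse import urlparse


def check_settings(endpoint, key):
    """Return a list of problems found in the endpoint and key (empty if none)."""
    problems = []
    parsed = urlparse(endpoint or "")
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("endpoint should be a full https:// URL")
    if not key or not key.strip():
        problems.append("key is empty")
    return problems


# Placeholder values, not real credentials
ok = check_settings("https://example-resource.cognitiveservices.azure.com/", "abc123")
bad = check_settings("example-resource", "")
print(ok)
print(bad)
```

If the returned list is non-empty, you could print the problems and exit before any call to the service is attempted.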

In the Main function, under the code you just added, find the comment Read text in image and add the following code to use the Image Analysis client to read the text in the image:

# Read text in image
with open(image_file, "rb") as f:
    image_data = f.read()
print(f"\nReading text in {image_file}")

result = cv_client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.READ])

Find the comment Print the text and add the following code (including the final comment) to print the lines of text that were found and call a function to annotate them in the image (using the bounding_polygon returned for each line of text):

# Print the text
if result.read is not None:
    print("\nText:")

    for line in result.read.blocks[0].lines:
        print(f" {line.text}")
    # Annotate the text in the image
    annotate_lines(image_file, result.read)

    # Find individual words in each line
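
The snippet above reads only blocks[0], which would raise an IndexError if blocks were ever empty. A slightly more defensive variation is to loop over every block. The sketch below uses small stand-in classes that mimic only the attributes used in this article (blocks, lines, text), so it runs without the SDK; all_line_text is a hypothetical helper name.

```python
from dataclasses import dataclass, field


# Stand-in types that mimic only the attributes used in this article
@dataclass
class Line:
    text: str


@dataclass
class Block:
    lines: list = field(default_factory=list)


@dataclass
class ReadResult:
    blocks: list = field(default_factory=list)


def all_line_text(read_result):
    """Collect line text from every block; returns [] when there is nothing to read."""
    if read_result is None:
        return []
    return [line.text for block in read_result.blocks for line in block.lines]


sample = ReadResult(blocks=[Block(lines=[Line("Four score and"), Line("seven years ago")])])
print(all_line_text(sample))
print(all_line_text(ReadResult()))
```

The same nested loop should work on the real result object as long as it exposes blocks and lines the way the snippet above uses them.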

Save your changes (CTRL+S) but keep the code editor open in case you need to fix any typos.
Resize the panes so you can see more of the console, then enter the following command to run the program:

python read-text.py images/Lincoln.jpg

The program reads the text in the specified image file (images/Lincoln.jpg) and prints each line it finds.
Open lines.jpg to see the lines of text annotated in the image.
Run the program again, this time specifying the parameter images/Business-card.jpg:

python read-text.py images/Business-card.jpg

Run the program one more time, this time specifying the parameter images/Note.jpg:

python read-text.py images/Note.jpg
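
The annotated image is drawn from the bounding_polygon returned for each line of text. The annotate_lines helper comes with the repo and isn't shown here, but the core idea can be sketched as turning each polygon into coordinates a drawing library can use. This sketch assumes each polygon point exposes x and y attributes; Point is a stand-in class, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Point:  # stand-in for the points in a bounding_polygon
    x: int
    y: int


def polygon_to_box(polygon):
    """Return (left, top, right, bottom) enclosing all polygon points."""
    xs = [p.x for p in polygon]
    ys = [p.y for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_to_pairs(polygon):
    """Return [(x, y), ...], a shape many drawing libraries accept for polygons."""
    return [(p.x, p.y) for p in polygon]


# Hypothetical, slightly tilted quadrilateral around a line of text
poly = [Point(10, 20), Point(110, 24), Point(108, 50), Point(8, 46)]
print(polygon_to_box(poly))    # (8, 20, 110, 50)
print(polygon_to_pairs(poly))
```

Drawing the polygon itself follows tilted text more closely, while the enclosing box is handy when you want to crop the region.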

Add code to return the position of individual words

Resize the panes so you can see more of the code file. Then find the comment Find individual words in each line and add the following code (being careful to maintain the correct indentation level):

# Find individual words in each line
print("\nIndividual words:")
for line in result.read.blocks[0].lines:
    for word in line.words:
        # confidence is a fraction between 0 and 1; the % format scales it to a percentage
        print(f"  {word.text} (Confidence: {word.confidence:.2%})")
# Annotate the words in the image
annotate_words(image_file, result.read)

Save your changes (CTRL+S). Then, in the command line pane, rerun the program to extract text from images/Lincoln.jpg.
Observe the output, which should include each individual word in the image and the confidence associated with its prediction.
In the read-text folder, a words.jpg image has been created. Open words.jpg.
Rerun the program for images/Business-card.jpg and images/Note.jpg, viewing the words.jpg file generated for each image.
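
Once you have per-word confidence values, a natural next step is to flag the words the service was less sure about, for example to review handwriting by eye. The sketch below uses a stand-in Word class with the two attributes used in this article; the threshold of 0.9 and the sample values are made up for illustration, and it assumes confidence is a fraction between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Word:  # stand-in exposing the two attributes used in the article
    text: str
    confidence: float


def low_confidence(words, threshold=0.9):
    """Return words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


# Made-up values for illustration only
words = [Word("Four", 0.998), Word("sc0re", 0.612), Word("and", 0.975)]

for w in low_confidence(words):
    # The :.1% format multiplies by 100, assuming confidence is a 0-1 fraction
    print(f"{w.text}: {w.confidence:.1%}")
```

With the real result you would gather the words from each line first, then pass that list to the filter.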

You’ve just seen how to turn images into usable text data, whether that means digitizing documents, processing receipts, or extracting text from photos. With Azure AI Vision, what once required manual effort now takes just a few lines of Python code.

Project guide link: https://microsoftlearning.github.io/mslearn-ai-vision/Instructions/Labs/02-ocr.html