Olalekan Oladiran

Posted on Jul 29

Teaching Computers to Understand Images: Hands-On with Azure AI Vision

#ai #machinelearning #python #vscode

Introduction

Azure AI Vision is an artificial intelligence capability that enables software systems to interpret visual input by analyzing images. In Microsoft Azure, the Azure AI Vision service provides pre-built models for common computer vision tasks, including analyzing images to suggest captions and tags, and detecting common objects and people. You can also use the Azure AI Vision service to remove the background from an image or create a foreground matting of it.

Provision an Azure AI Vision resource

Open the Azure portal at https://portal.azure.com, and sign in using your Azure credentials. Close any welcome messages or tips that are displayed.
Select Create a resource.
In the search bar, search for Computer Vision, select Computer Vision, and create the resource with the following settings:
- Subscription: Your Azure subscription
- Resource group: Create or select a resource group
- Region: Choose from East US, West US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, or East Asia*
- Name: A valid name for your Computer Vision resource
- Pricing tier: Free F0
*The full Azure AI Vision 4.0 feature set is currently available only in these regions.
Select the required checkboxes and create the resource.
Wait for deployment to complete, and then view the deployment details.
When the resource has been deployed, go to it and under the Resource management node in the navigation pane, view its Keys and Endpoint page. You will need the endpoint and one of the keys from this page in the next procedure.

Develop an image analysis app with the Azure AI Vision SDK

Open VS Code, and then open a terminal (Terminal > New Terminal).
Enter the following command to clone the GitHub repo containing the code files for this exercise:

git clone https://github.com/MicrosoftLearning/mslearn-ai-vision

After the repo has been cloned, use the following command to navigate to the folder containing the application code files:

cd mslearn-ai-vision/Labfiles/analyze-images/python/image-analysis

The folder contains application configuration and code files for your app. It also contains a /images subfolder, which contains some image files for your app to analyze.

Install the Azure AI Vision SDK package and other required packages by running the following command:

pip install -r requirements.txt azure-ai-vision-imageanalysis==1.0.0

Open the .env file in VS Code and update the configuration values it contains to reflect the endpoint and an authentication key for your Computer Vision resource (copied from its Keys and Endpoint page in the Azure portal).
After you've replaced the placeholders, save your changes (CTRL+S) and close the file.
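The provided code takes care of loading these settings. As a rough illustration of what that step involves, here is a minimal standard-library sketch that reads simple KEY=VALUE lines. The variable names AI_SERVICE_ENDPOINT and AI_SERVICE_KEY, and the helper names parse_env_text and load_settings, are assumptions for illustration; check the names your .env file actually uses. Dedicated packages such as python-dotenv handle more of the format (escapes, multi-line values) than this sketch does.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped.

    A simplified sketch: escapes and multi-line values are not handled.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "="
        key, value = line.split("=", 1)
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def load_settings(path=".env"):
    """Return (endpoint, key) from the file; hypothetical variable names."""
    with open(path, encoding="utf-8") as f:
        settings = parse_env_text(f.read())
    endpoint = settings.get("AI_SERVICE_ENDPOINT")
    key = settings.get("AI_SERVICE_KEY")
    if not endpoint or not key:
        raise ValueError("Set AI_SERVICE_ENDPOINT and AI_SERVICE_KEY in " + path)
    return endpoint, key
```

Failing early with a clear message when a value is missing is usually easier to debug than an authentication error from the service later on.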

Add code to suggest a caption

Open image-analysis.py in VS Code.
In the code file, find the comment Import namespaces, and add the following code to import the namespaces you will need to use the Azure AI Vision SDK:

# import namespaces
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

In the Main function, note that the code to load the configuration settings and determine the image file to be analyzed has been provided. Then find the comment Authenticate Azure AI Vision client and add the following code to create and authenticate an Azure AI Vision client object (be sure to maintain the correct indentation levels):

# Authenticate Azure AI Vision client
cv_client = ImageAnalysisClient(
     endpoint=ai_endpoint,
     credential=AzureKeyCredential(ai_key))

In the Main function, under the code you just added, find the comment Analyze image and add the following code:

# Analyze image
with open(image_file, "rb") as f:
     image_data = f.read()
print(f'\nAnalyzing {image_file}\n')

result = cv_client.analyze(
     image_data=image_data,
     visual_features=[
         VisualFeatures.CAPTION,
         VisualFeatures.DENSE_CAPTIONS,
         VisualFeatures.TAGS,
         VisualFeatures.OBJECTS,
         VisualFeatures.PEOPLE],
)

Find the comment Get image captions and add the following code to display image captions and dense captions:

# Get image captions
if result.caption is not None:
     print("\nCaption:")
     print(" Caption: '{}' (confidence: {:.2f}%)".format(result.caption.text, result.caption.confidence * 100))

if result.dense_captions is not None:
     print("\nDense Captions:")
     for caption in result.dense_captions.list:
         print(" Caption: '{}' (confidence: {:.2f}%)".format(caption.text, caption.confidence * 100))
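Each caption exposes a text value and a confidence between 0 and 1, which the code above multiplies by 100 to show a percentage. If you want to reuse or test that formatting without calling the service, one option is to move it into a small function that returns strings. In this sketch, format_caption and summarize_captions are hypothetical helpers, and SimpleNamespace objects stand in for the SDK result, mimicking only the attribute names used above.

```python
from types import SimpleNamespace


def format_caption(caption):
    """Return a caption line in the same format as the print statements above."""
    return " Caption: '{}' (confidence: {:.2f}%)".format(caption.text, caption.confidence * 100)


def summarize_captions(result):
    """Collect the main caption and any dense captions as a list of strings."""
    lines = []
    if result.caption is not None:
        lines.append(format_caption(result.caption))
    if result.dense_captions is not None:
        lines.extend(format_caption(c) for c in result.dense_captions.list)
    return lines


# Stand-in objects with made-up values, not real service output
fake_result = SimpleNamespace(
    caption=SimpleNamespace(text="a city street", confidence=0.8123),
    dense_captions=SimpleNamespace(list=[SimpleNamespace(text="a red car", confidence=0.75)]),
)
for line in summarize_captions(fake_result):
    print(line)
```

With a real response you would call summarize_captions(result) instead of passing the stand-in.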

Save your changes (CTRL+S) and make sure the terminal pane is visible while keeping the code editor open. Then enter the following command to run the program with the argument images/street.jpg:

python3 image-analysis.py images/street.jpg

Observe the output, which should include a suggested caption for the street.jpg image.
Run the program again, this time with the argument images/building.jpg, to see the caption generated for the building.jpg image.
Repeat the previous step to generate a caption for the images/person.jpg file.
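The image path you pass on the command line is picked up by the provided code. A minimal sketch of how such an argument could be read, with a fallback when none is given, might look like this. The function name get_image_file and the default path images/street.jpg are assumptions for illustration, not necessarily what the provided code does.

```python
import sys


def get_image_file(argv, default="images/street.jpg"):
    """Return the first command-line argument, or a default path if none is given."""
    # argv[0] is the script name, so the image path (if any) is argv[1]
    return argv[1] if len(argv) > 1 else default


if __name__ == "__main__":
    print(get_image_file(sys.argv))
```

Taking argv as a parameter, rather than reading sys.argv inside the function, keeps the helper easy to test.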

Add code to generate suggested tags

It can sometimes be useful to identify relevant tags that provide clues about the contents of an image.

In VS Code, find the comment Get image tags and add the following code:

# Get image tags
if result.tags is not None:
     print("\nTags:")
     for tag in result.tags.list:
         print(" Tag: '{}' (confidence: {:.2f}%)".format(tag.name, tag.confidence * 100))
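Tag lists can be long and may include low-confidence guesses. This sketch keeps only tags at or above a threshold and sorts them by confidence. The helper top_tags, the 0.8 threshold, and the stand-in tag values are illustrative assumptions; the stand-ins mimic only the name and confidence attributes used above.

```python
from types import SimpleNamespace


def top_tags(tags, min_confidence=0.8):
    """Return (name, confidence) pairs at or above min_confidence, highest first."""
    kept = [(t.name, t.confidence) for t in tags if t.confidence >= min_confidence]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Stand-in tags with made-up values, shaped like entries of result.tags.list
sample_tags = [
    SimpleNamespace(name="outdoor", confidence=0.99),
    SimpleNamespace(name="street", confidence=0.85),
    SimpleNamespace(name="bicycle", confidence=0.42),
]
print(top_tags(sample_tags))  # [('outdoor', 0.99), ('street', 0.85)]
```

With a real response, top_tags(result.tags.list) would apply the same filter.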

Save your changes (CTRL+S) and run the program with the argument images/street.jpg, observing that in addition to the image caption, a list of suggested tags is displayed.
Rerun the program for the images/building.jpg and images/person.jpg files.

Add code to detect and locate objects

In the code editor, find the comment Get objects in the image and add the following code to list the objects detected in the image, and call the provided show_objects function to annotate the image with the detected objects:

# Get objects in the image
if result.objects is not None:
     print("\nObjects in image:")
     for detected_object in result.objects.list:
         # Print object tag and confidence
         print(" {} (confidence: {:.2f}%)".format(detected_object.tags[0].name, detected_object.tags[0].confidence * 100))
     # Annotate objects in the image
     show_objects(image_file, result.objects.list)
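The provided show_objects function does the drawing, and its implementation isn't shown here. Each detected object also carries a bounding box, which this sketch assumes is described by x, y, width, and height attributes (as the SDK's ImageBoundingBox model does). Drawing APIs such as Pillow's ImageDraw.rectangle take corner coordinates instead, so an annotation step typically converts between the two. The helpers box_to_corners and describe_objects, and the stand-in object, are hypothetical.

```python
from types import SimpleNamespace


def box_to_corners(box):
    """Convert an (x, y, width, height) box to (left, top, right, bottom) pixel corners."""
    return (box.x, box.y, box.x + box.width, box.y + box.height)


def describe_objects(objects):
    """Pair each object's top tag name with its corner coordinates."""
    return [(obj.tags[0].name, box_to_corners(obj.bounding_box)) for obj in objects]


# Stand-in detection with made-up values, shaped like an entry of result.objects.list
car = SimpleNamespace(
    tags=[SimpleNamespace(name="car", confidence=0.9)],
    bounding_box=SimpleNamespace(x=120, y=200, width=80, height=40),
)
print(describe_objects([car]))  # [('car', (120, 200, 200, 240))]
```

With Pillow installed, the corners could be passed straight to draw.rectangle(box_to_corners(obj.bounding_box), outline="cyan", width=3).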

Save your changes (CTRL+S) and run the program with the argument images/street.jpg, observing that in addition to the image caption and suggested tags, a file named objects.jpg is generated.
Open the objects.jpg file to view the annotated objects.
Rerun the program for the images/building.jpg and images/person.jpg files, opening the generated objects.jpg file after each run.

Add code to detect and locate people

In the code editor, find the comment Get people in the image and add the following code to list any detected people with a confidence level above 20%, and call the provided show_people function to annotate them in an image:

# Get people in the image
if result.people is not None:
     print("\nPeople in image:")

     for detected_person in result.people.list:
         if detected_person.confidence > 0.2:
             # Print location and confidence of each person detected
             print(" {} (confidence: {:.2f}%)".format(detected_person.bounding_box, detected_person.confidence * 100))
     # Annotate people in the image
     show_people(image_file, result.people.list)
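The loop above prints only people whose confidence is strictly greater than 0.2, but it passes the full result.people.list to show_people. If the provided function does not apply the same threshold itself, you could filter once and reuse the result for both printing and annotation. The helper confident_people and the stand-in detections are hypothetical, and the sketch assumes show_people accepts any list of detected people; check the provided code to confirm.

```python
from types import SimpleNamespace


def confident_people(people, threshold=0.2):
    """Return only the detected people whose confidence is strictly above threshold."""
    return [p for p in people if p.confidence > threshold]


# Stand-in detections with made-up values, shaped like entries of result.people.list
people = [
    SimpleNamespace(confidence=0.93, bounding_box=SimpleNamespace(x=10, y=20, width=50, height=150)),
    SimpleNamespace(confidence=0.2, bounding_box=SimpleNamespace(x=300, y=40, width=30, height=90)),
    SimpleNamespace(confidence=0.05, bounding_box=SimpleNamespace(x=0, y=0, width=5, height=5)),
]
print(len(confident_people(people)))  # 1: a confidence of exactly 0.2 is not above the threshold
```

In the app, this might become show_people(image_file, confident_people(result.people.list)).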

Save your changes (CTRL+S) and run the program with the argument images/street.jpg, observing that in addition to the image caption, suggested tags, and objects.jpg file, a list of person locations and a file named people.jpg are generated.
Open the people.jpg file to view the annotated people.
Rerun the program for the images/building.jpg and images/person.jpg files, opening the generated people.jpg file after each run.

You've just unlocked the power of computer vision, turning raw pixels into useful insights with Azure AI. From generating captions and tags to detecting objects and people, you now have the tools to build applications that can interpret visual data.

Project guide link: https://microsoftlearning.github.io/mslearn-ai-vision/Instructions/Labs/01-analyze-images.html