DEV Community

Kaushlendu Choudhary

Convert Image Text in your Preferred Language using Azure Cognitive Services

These days, if you want to create an interesting app, some part of it should have intelligence in it, and by intelligence I mean Machine Learning and Artificial Intelligence. A few years ago, working on Machine Learning was a complex, time-consuming task: you had to clean data, choose an ML algorithm, then train, score, and evaluate your model. The underlying work has not changed. However, with the advent of cloud technologies and the extensive development of managed Machine Learning services, getting ML into an application is just a few clicks away with a cloud provider. All you have to worry about is how you want your functionality to work, and the rest is taken care of by the Machine Learning services. Azure Cognitive Services is one such example!

I have been a backend developer for most of my career and recently got trained in Azure. My focus area is mostly developing solutions on Azure, but Machine Learning is hard to ignore these days. This is evident from Gartner's 2021 report on Cloud AI Developer Services (CAIDS), which allow development teams and business users to leverage artificial intelligence models via APIs, SDKs, or applications without requiring deep data science expertise (see the References section). Developers like us will soon have to understand the Machine Learning capabilities that the cloud provides.

[Image: Machine Learning]

Having good development experience on Azure, I started exploring Azure Cognitive Services. The intent was to learn and build a solution that blends my coding expertise with Azure's ML and AI services.

Which Use Case to Pick?

This was the most critical choice to make, as I was biased towards building a solution where my coding expertise would do most of the magic. However, a solution with a focus on ML and AI was the desired outcome. After much brainstorming during the ideation phase, I decided to build the solution around two Azure Cognitive Services: Computer Vision (image to text) and Translator.

My Use Case

My use case is as simple as it could be: take an image containing text in one language and display the corresponding translation in another language.

A typical scenario: a user captures an image of some text in one language and wants it translated to their preferred language. Think of sign boards or display banners; the solution would render their text in the user's preferred language. Any text image in digital format can be converted this way. This is especially useful in places with a vast diversity of languages, where a simple translator can do the magic. India is one such great place, where native languages are used extensively across different states.

Environment and Services Used

For coding:

Visual Studio Code IDE

Services Used:

  • Azure Cognitive Services
  • Azure Translator Services

For this solution, the image text is in English, and the solution converts it to Hindi.

Application Workflow

[Image: application workflow — the user selects an image; Computer Vision OCR extracts each word and its bounding box; the Translator service translates each word; the translated words are drawn over the word positions in the output image]

The Solution

Log in to the Azure Portal at https://portal.azure.com/signin

Step 1: Create an Azure Cognitive Service

Create the service with a unique name by providing the following:

  • Subscription
  • Resource Group
  • Region
  • Name - in my case ReadMyImage
  • Pricing tier

[Screenshot: creating the Cognitive Services resource]

Once the deployment is complete, record either of the two keys, the Endpoint, and the Location (region); the code needs all three.
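The sample code later in this post hard-codes the key and endpoint for brevity, which is risky if the code is ever shared or committed. A safer pattern is to read them from environment variables; here is a minimal sketch, assuming variable names of my own choosing (`COGNITIVE_KEY` and `COGNITIVE_ENDPOINT` are hypothetical, not anything Azure defines):

```csharp
using System;

static class CognitiveConfig
{
    // COGNITIVE_KEY and COGNITIVE_ENDPOINT are hypothetical variable
    // names; set them in your shell or project settings before running.
    public static string Key =>
        Environment.GetEnvironmentVariable("COGNITIVE_KEY")
        ?? throw new InvalidOperationException("COGNITIVE_KEY is not set");

    public static string Endpoint =>
        Environment.GetEnvironmentVariable("COGNITIVE_ENDPOINT")
        ?? throw new InvalidOperationException("COGNITIVE_ENDPOINT is not set");
}
```

Throwing when the variable is missing surfaces a configuration mistake immediately instead of failing later with an authentication error.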

[Screenshot: Keys and Endpoint page of the Cognitive Services resource]

Step 2: Create an Azure Translator Service

Create the service with a unique name by providing the following:

  • Subscription
  • Resource Group
  • Region
  • Name - in my case MyImageTranslator
  • Pricing tier

[Screenshot: creating the Translator resource]

Once the deployment is complete, record either of the two keys, the Endpoint, and the Location (region); the code needs all three.

[Screenshot: Keys and Endpoint page of the Translator resource]

Step 3: Develop the Solution

The solution was developed in the Visual Studio IDE. The Azure service APIs are called from the code using the keys and endpoints recorded above. Here is how the solution was built:

The ImageReaderAzureVision class: this class calls the Computer Vision service to extract the text in the image. The method accepts the image path and returns a collection of all the extracted words together with their positions in the image.

using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ReadMyImage
{
    class ImageRegionDetails
    {
        public string Text { get; set; }
        public int Coordinate1 { get; set; } // x (left) of the word's bounding box
        public int Coordinate2 { get; set; } // y (top)
        public int Coordinate3 { get; set; } // width
        public int Coordinate4 { get; set; } // height
    }
    class ImageReaderAzureVision
    {
        // Replace with your own key and endpoint from the Azure portal.
        private string cognitiveServicesKey = "******************************";
        private string cognitiveServicesUrl = "https://readmyimage.cognitiveservices.azure.com/";
        string location = "southeastasia";

        public List<ImageRegionDetails> ProcessImage(string filepath = @"G:\demo.PNG")
        {
            var client = new ComputerVisionClient(new ApiKeyServiceClientCredentials(cognitiveServicesKey));
            client.Endpoint = cognitiveServicesUrl;

            var words = new List<ImageRegionDetails>();
            using (var fileToProcess = File.OpenRead(filepath))
            {
                // Run the printed-text OCR on the image stream.
                var apiResult = client.RecognizePrintedTextInStreamAsync(false, fileToProcess);
                apiResult.Wait();
                var ocrResult = apiResult.Result;

                // Flatten regions -> lines -> words into a single list,
                // keeping each word's bounding box ("x,y,width,height").
                foreach (var region in ocrResult.Regions)
                {
                    foreach (var line in region.Lines)
                    {
                        foreach (var word in line.Words)
                        {
                            var coordinates = word.BoundingBox.Split(',');
                            words.Add(new ImageRegionDetails
                            {
                                Text = word.Text,
                                Coordinate1 = int.Parse(coordinates[0]),
                                Coordinate2 = int.Parse(coordinates[1]),
                                Coordinate3 = int.Parse(coordinates[2]),
                                Coordinate4 = int.Parse(coordinates[3])
                            });
                        }
                    }
                }
            }
            return words;
        }
    }
}
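The OCR result reports each word's BoundingBox as a comma-separated "x,y,width,height" string, which is what the four Coordinate properties capture. As a small standalone illustration (the helper name below is my own, not part of the SDK), the string maps onto a System.Drawing.Rectangle like this:

```csharp
using System;
using System.Drawing;

static class BoundingBoxParser
{
    // Turn the OCR "x,y,width,height" string into a Rectangle
    // suitable for Graphics.FillRectangle / DrawString.
    public static Rectangle Parse(string boundingBox)
    {
        var parts = boundingBox.Split(',');
        return new Rectangle(
            int.Parse(parts[0]),  // x (left)
            int.Parse(parts[1]),  // y (top)
            int.Parse(parts[2]),  // width
            int.Parse(parts[3])); // height
    }
}
```

The main form later uses exactly these four values to paint over each word and draw its translation in the same spot.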

The TextTranslator class: this class calls the Azure Translator service to get the words translated. The purpose is to replace each word with its translated text and superimpose it on the existing image. The translate method asynchronously returns the translation of a single word.

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

namespace ReadMyImage
{
    // Minimal shape of the Translator v3 response:
    // [ { "translations": [ { "text": "...", "to": "hi" } ] } ]
    class TranslationItem
    {
        public string Text { get; set; }
        public string To { get; set; }
    }
    class TranslationResult
    {
        public List<TranslationItem> Translations { get; set; }
    }

    class TextTranslator
    {
        string location = "southeastasia";
        private static readonly string subscriptionKey = "*************************";
        private static readonly string endpoint = "https://api.cognitive.microsofttranslator.com/";

        public async Task<string> translate(string text)
        {
            // Translate from English to Hindi; change "from"/"to" for other pairs.
            string route = "/translate?api-version=3.0&from=en&to=hi";
            object[] body = new object[] { new { Text = text } };
            var requestBody = JsonConvert.SerializeObject(body);
            using (var client = new HttpClient())
            using (var request = new HttpRequestMessage())
            {
                // Build the request.
                request.Method = HttpMethod.Post;
                request.RequestUri = new Uri(endpoint + route);
                request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");
                request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
                request.Headers.Add("Ocp-Apim-Subscription-Region", location);

                // Send the request and read the response body.
                HttpResponseMessage response = await client.SendAsync(request).ConfigureAwait(false);
                string result = await response.Content.ReadAsStringAsync().ConfigureAwait(false);

                // Deserialize the JSON instead of slicing the raw string,
                // which would break if the translated text contained a comma.
                var parsed = JsonConvert.DeserializeObject<List<TranslationResult>>(result);
                return parsed[0].Translations[0].Text;
            }
        }
    }
}

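For reference, the Translator v3 endpoint returns a JSON array with one element per input text, each carrying a translations array. This standalone sketch shows that shape and how to pick out the translated text; the sample JSON below is illustrative of the response shape, not a recorded response, and it uses System.Text.Json (which ships with modern .NET) so it runs without the Newtonsoft.Json package the article's code depends on:

```csharp
using System;
using System.Text.Json;

class TranslatorResponseDemo
{
    static void Main()
    {
        // Illustrative v3 response for one input text translated to Hindi.
        string sample = "[{\"translations\":[{\"text\":\"नमस्ते\",\"to\":\"hi\"}]}]";

        using (JsonDocument doc = JsonDocument.Parse(sample))
        {
            // First array element -> "translations" array -> first entry -> "text".
            string translated = doc.RootElement[0]
                .GetProperty("translations")[0]
                .GetProperty("text")
                .GetString();
            Console.WriteLine(translated); // prints the translated word
        }
    }
}
```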

The Main Form class: a simple Windows Forms UI that orchestrates the image text replacement. It lets the user browse for an image and puts the service calls together to show the final translated, superimposed image.

using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace ReadMyImage
{
    public partial class Form1 : Form
    {

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            var dialogResult = openFileDialog1.ShowDialog();
            if (dialogResult == DialogResult.OK)
            {
                string filepath = openFileDialog1.FileName;
                pictureBox1.ImageLocation = filepath;
            }
        }

        private async void button2_Click(object sender, EventArgs e)
        {
            // Require an image to be selected first.
            if (string.IsNullOrEmpty(pictureBox1.ImageLocation))
            {
                MessageBox.Show("Please browse to an image first.");
                return;
            }

            StringBuilder sb = new StringBuilder();
            // Draw onto a blank canvas the size of the picture box.
            var image = new Bitmap(this.pictureBox1.Width, this.pictureBox1.Height);
            Font font = new Font("Times New Roman", 25, FontStyle.Bold, GraphicsUnit.Pixel);
            var graphics = Graphics.FromImage(image);
            ImageReaderAzureVision imageReaderAzureVision = new ImageReaderAzureVision();
            // Run OCR on the image the user selected, not a hard-coded path.
            var result = imageReaderAzureVision.ProcessImage(pictureBox1.ImageLocation);
            TextTranslator textTranslator = new TextTranslator();
            foreach (var l in result)
            {
                sb.Append(" " + l.Text);
                // Await instead of blocking the UI thread with .Result.
                string s = await textTranslator.translate(l.Text);

                // Cover the original word, then draw its translation at the same spot.
                graphics.FillRectangle(Brushes.Bisque, new Rectangle(l.Coordinate1, l.Coordinate2, l.Coordinate3, l.Coordinate4));
                graphics.DrawString(s, font, Brushes.Green, new Point(l.Coordinate1, l.Coordinate2));
            }
            this.pictureBox2.Image = image;
            pictureBox2.Refresh();
            label1.Text = sb.ToString();
        }
    }
}

Step 4: Testing the Solution

To test the solution, I have an image in .png format called 'demo'. I launched the application and browsed to the image.

[Screenshot: the 'demo' image loaded in the form]

With the image loaded in the form, I clicked the Translate Image button to run the OCR and translation.

Voila!!!!!

The image text is translated to my selected language, as shown here:
[Screenshot: the translated image rendered in the form]

Download the entire code from here: https://github.com/kaushlendu/AzureCognitivrService_ReadMyImage.git

References

https://techcommunity.microsoft.com/t5/azure-ai/microsoft-named-a-leader-in-2021-gartner-magic-quadrant-for/ba-p/2223100
