DEV Community

Mehr Muhammad Hamza

Explore Azure OCR in 125 Languages with IronOCR

This article covers Azure OCR and IronOCR. If you are looking for Tesseract OCR, see my previous article at this link.

Azure OCR by Microsoft:

Optical Character Recognition (OCR) lets you extract handwritten or printed text from images such as documents, bills, and articles. Microsoft supports 73 languages for extracting printed text.

The Read API in the latest Azure OCR technology extracts printed and handwritten text, digits, and currency symbols from images and PDF documents. It is optimized for extracting text from text-heavy images and multi-page PDF documents. Its features include:

  1. Print text extraction in 73 languages
  2. Handwritten text extraction in English
  3. Text lines and words with location and confidence scores
  4. No language identification required
  5. Support for mixed languages, mixed mode (print and handwritten)
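
As a rough illustration of the Read API features listed above, the sketch below submits an image URL and prints each recognized line along with the confidence score of its words. It assumes the `Microsoft.Azure.CognitiveServices.Vision.ComputerVision` NuGet package; the endpoint, key, and image URL are placeholders rather than values from this article, and `ExtractOperationId` is a helper name introduced here for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

public static class ReadApiSketch
{
    // The Read call returns an Operation-Location URL whose last 36 characters
    // are the operation id (a GUID) used to poll for the result.
    public static Guid ExtractOperationId(string operationLocation)
    {
        const int guidLength = 36;
        return Guid.Parse(operationLocation.Substring(operationLocation.Length - guidLength));
    }

    public static async Task Main()
    {
        // Placeholders (assumptions): substitute your own endpoint, key, and image URL.
        string endpoint = "https://<your-resource>.cognitiveservices.azure.com/";
        string key = "<your-key>";
        string imageUrl = "https://example.com/sample.png";

        var client = new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
        {
            Endpoint = endpoint
        };

        // Submit the image; the service processes it asynchronously.
        var headers = await client.ReadAsync(imageUrl);
        Guid operationId = ExtractOperationId(headers.OperationLocation);

        // Poll until the operation leaves the NotStarted/Running states.
        ReadOperationResult result;
        do
        {
            await Task.Delay(1000);
            result = await client.GetReadResultAsync(operationId);
        }
        while (result.Status == OperationStatusCodes.NotStarted ||
               result.Status == OperationStatusCodes.Running);

        if (result.Status != OperationStatusCodes.Succeeded)
        {
            Console.WriteLine("The Read operation did not succeed.");
            return;
        }

        // Print every line, then each word with its confidence score.
        foreach (ReadResult page in result.AnalyzeResult.ReadResults)
        {
            foreach (Line line in page.Lines)
            {
                Console.WriteLine(line.Text);
                foreach (Word word in line.Words)
                {
                    Console.WriteLine($"  {word.Text} (confidence {word.Confidence:F2})");
                }
            }
        }
    }
}
```

Note that every image processed this way is a billable API call, which is the cost the rest of this article contrasts with IronOCR.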

IronOCR:

IronOCR is a C# software library that lets .NET developers read text from images and PDF documents. It is a pure .NET OCR library. Azure OCR, by contrast, is a cloud tool that extracts text from an image through API calls. It gives developers access to advanced algorithms that process images and return information. To analyze an image, you can either upload it or specify an image URL.

How Is IronOCR Better than Azure OCR?

The cloud-based OCR API gives developers access to advanced algorithms that read text in images and return structured content, and Microsoft provides quickstarts, tutorials, and samples showing how to extract printed and handwritten text in multiple languages. IronOCR, by comparison, supports 125 international languages.

Benefits of Using IronOCR with Azure OCR:

IronOCR provides all of the following:

  1. Azure OCR relies on Azure Cognitive Services, a computer vision API. If you want to run OCR without making API calls, and without paying more as you scale up, IronOCR is a one-time fee.
  2. It supports multiple languages, whereas some other libraries support English only.
  3. It provides smart cleanup: if the scanned document or image is a bit scratchy, the library tries to clean it up before reading it.
  4. It works for both printed and handwritten text.
  5. You can easily use it in the cloud. You don't need to install anything on a server, because you may not be using a VM at all and can run entirely serverless.
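
The cleanup mentioned in point 3 can be sketched roughly as follows. This assumes the `Deskew` and `DeNoise` image filters on `OcrInput` and uses the same constructor style as the examples later in this article; `scan.png` is a hypothetical file name, and method names may differ between IronOCR versions.

```csharp
using System;
using IronOcr;

public static class CleanupSketch
{
    public static void Main()
    {
        var ocr = new IronTesseract();

        // "scan.png" is a hypothetical file name for a skewed, noisy scan.
        using (var input = new OcrInput("scan.png"))
        {
            input.Deskew();   // straighten a page that was scanned at an angle
            input.DeNoise();  // reduce speckles and other digital noise

            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```

Filters add processing time, so it is usually worth applying them only when the raw image reads poorly.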

IronOCR features:

  1. The ability to perform an OCR on almost any file, image, or PDF
  2. Lightning-fast speed
  3. Exceptional accuracy
  4. Reads bar codes and QR codes
  5. Runs locally, with no SaaS required
  6. Can turn PDFs and images into searchable documents
  7. An excellent alternative to Azure OCR from Microsoft Cognitive Services
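
Two of the features above, barcode reading and searchable PDF output, might look like this in code. This is a minimal sketch that assumes the `Configuration.ReadBarCodes` setting, the `Barcodes` collection, and the `SaveAsSearchablePdf` method of the IronOCR result object; the file names are hypothetical.

```csharp
using System;
using IronOcr;

public static class FeaturesSketch
{
    public static void Main()
    {
        var ocr = new IronTesseract();

        // Ask the engine to look for barcodes and QR codes while it reads text.
        ocr.Configuration.ReadBarCodes = true;

        // "invoice.png" is a hypothetical input file.
        using (var input = new OcrInput("invoice.png"))
        {
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);

            // Print the decoded value of every barcode found in the image.
            foreach (var barcode in result.Barcodes)
            {
                Console.WriteLine($"Barcode: {barcode.Value}");
            }

            // Write a PDF that carries a searchable text layer.
            result.SaveAsSearchablePdf("invoice-searchable.pdf");
        }
    }
}
```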

IronOCR With AzureOCR:

Let's look at an example: a screenshot from the Google Books page on Frankenstein by Mary Shelley.
image
I then opened my C#/.NET console application and ran the following in the NuGet Package Manager Console to install IronOCR:
Install-Package IronOcr
You can also download it directly from this link.
Then I wrote the code to OCR this image and extract its text, line breaks included, using 4 lines of code.

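using System;
using IronOcr;

// Create the OCR engine and load the screenshot from disk.
var ocr = new IronTesseract();
using (var input = new OcrInput("Frankenstein.PNG"))
{
    // Run recognition and print the extracted text, line breaks included.
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}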

And here is the result:

Frankenstein
Annotated for Scientists, Engineers, and Creators of All Kinds
By Mary Wollstonecraft Shelley - 2017

We use Azure Functions in a microservices architecture. Here is a simple function that takes an image URL as a parameter, OCRs the image, and returns the text:

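using System.Net.Http;
using System.Threading.Tasks;
using IronOcr;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class OCRFunction
{
    // Reuse a single HttpClient across invocations to avoid socket exhaustion.
    private static readonly HttpClient _httpClient = new HttpClient();

    [FunctionName("OCRFunction")]
    public static async Task<IActionResult> Run([HttpTrigger] HttpRequest req, ExecutionContext context)
    {
        // Read the image URL from the "image" query-string parameter.
        string imageUrl = req.Query["image"];
        if (string.IsNullOrWhiteSpace(imageUrl))
        {
            return new BadRequestObjectResult("Pass the image URL in the 'image' query parameter.");
        }

        // Download the image and OCR it straight from the response stream.
        using (var imageStream = await _httpClient.GetStreamAsync(imageUrl))
        using (var input = new OcrInput(imageStream))
        {
            var ocr = new IronTesseract();
            var result = ocr.Read(input);
            return new OkObjectResult(result.Text);
        }
    }
}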

We take the image parameter, download the image, and OCR it right away. The advantage of doing this inside an Azure Function is that it can serve various parts of our application in a microservice architecture without us duplicating the code everywhere.
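
Another service could then call the function over HTTP, roughly as sketched below. The function app host name and image URL are hypothetical placeholders, `BuildRequestUrl` is a helper introduced here for illustration, and the `/api/OCRFunction` path assumes the default HTTP route prefix. Depending on the trigger's authorization level, a function key may also need to be supplied.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class OcrFunctionClient
{
    // Builds the request URL, escaping the image URL so it travels as a single query value.
    public static string BuildRequestUrl(string functionBaseUrl, string imageUrl)
    {
        return $"{functionBaseUrl.TrimEnd('/')}/api/OCRFunction?image={Uri.EscapeDataString(imageUrl)}";
    }

    public static async Task Main()
    {
        // Hypothetical values: replace with your own function app and image.
        string baseUrl = "https://<your-function-app>.azurewebsites.net";
        string imageUrl = "https://example.com/Frankenstein.PNG";

        using (var http = new HttpClient())
        {
            // The function responds with the recognized text as the body.
            string text = await http.GetStringAsync(BuildRequestUrl(baseUrl, imageUrl));
            Console.WriteLine(text);
        }
    }
}
```

Escaping matters here because the image URL itself contains `:` and `/` characters, and may carry its own query string.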

Suppose you are currently paying for a service that charges a fee for each OCR call. The cost can seem cheap at first, but at scale the monthly bill can quickly spiral out of control. Compare this with the one-time fee for IronOCR: you get what is essentially a callable API, hosted entirely in the Azure cloud, without any ongoing OCR fees.
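
To make the comparison concrete, the break-even point can be estimated with a small calculation. The prices in this sketch are made-up placeholders, not real Azure or IronOCR pricing, and the calculation ignores the cost of hosting the function itself.

```csharp
using System;

public static class CostSketch
{
    // Smallest number of calls at which cumulative per-call charges
    // reach or exceed the one-time fee.
    public static long BreakEvenCalls(decimal oneTimeFee, decimal pricePerCall)
    {
        if (pricePerCall <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pricePerCall));
        }
        return (long)Math.Ceiling(oneTimeFee / pricePerCall);
    }

    public static void Main()
    {
        // Hypothetical prices, for illustration only.
        decimal oneTimeFee = 500m;
        decimal pricePerCall = 0.001m;

        Console.WriteLine($"Break-even after {BreakEvenCalls(oneTimeFee, pricePerCall)} calls");
    }
}
```

Plugging in your actual license price and per-call rate shows how quickly a high-volume workload passes that point.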

Non-English Support:

Many OCR libraries support only English.

However, IronOCR currently supports 125 languages, and you can add as many or as few as you like by simply installing the applicable NuGet language pack.
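
Once a language pack is installed, selecting it in code might look like the sketch below. It assumes the Persian pack (`Install-Package IronOcr.Languages.Persian`) and the `Language` and `AddSecondaryLanguage` members of `IronTesseract`; `persian-text.png` is a hypothetical file name.

```csharp
using System;
using IronOcr;

public static class PersianSketch
{
    public static void Main()
    {
        var ocr = new IronTesseract();

        // Primary language comes from the installed language pack.
        ocr.Language = OcrLanguage.Persian;
        // Optionally read a second language that appears in the same image.
        ocr.AddSecondaryLanguage(OcrLanguage.English);

        // "persian-text.png" is a hypothetical input file.
        using (var input = new OcrInput("persian-text.png"))
        {
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```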

Iron Azure OCR Language Support
image

Summary:

For the past couple of years, computer vision and optical character recognition have been in high demand. IronOCR supports 125 international languages, compared with the 73 languages that Microsoft Azure supports. IronOCR has a computer vision API and can OCR any image, file, or PDF with its advanced algorithms. Interestingly, if you buy the complete Iron Suite, you get all 5 products for the price of 2. For further details about licensing, please follow this link to purchase the complete package.
I hope that you have liked this article. Feel free to ask any questions in the comment section.

Top comments (4)

mahditaheri2004

Hi,
I see you posted about IronOCR supporting the Persian language. Does it really extract Persian text from an image?
I have attached a car plate number image to this post. Can IronOCR extract the Iranian number from this image?

Mehr Muhammad Hamza • Edited

Hi mahditaheri2004,
Yes, it can extract Iranian numbers in the Persian language from the image. Just install the NuGet package for the Persian language, "IronOcr.Languages.Persian", as attached below.
Snapshot of NuGet package IronOcr.Languages.Persian

mahditaheri2004

Yes, I have already tested this library with Persian, but it gives me better results with Arabic!
However, it is not very accurate in my project, because I want to extract car plate numbers and have still not been successful!

Mehr Muhammad Hamza

The image quality might be too low. Furthermore, car number plates don't work well with scanned-document OCR - they need computer vision software.