DEV Community

Ishwar398

Using Azure Doc intelligence for OCR

Whenever we need to read text from a PDF, an image, a Word document, or a similar file, we use Optical Character Recognition (OCR). With OCR, we can extract text, handwritten or typed, from a document in any of the supported formats.
Azure Document Intelligence is an Azure AI service that applies AI models to OCR. It has many use cases beyond OCR, but this post sticks to OCR.

Creating the Azure Document Intelligence service

Search for Azure Document Intelligence in the Azure portal and then click Create.
Fill in the details: Subscription, Resource Group, Region, Name, and Pricing Tier.

Performing OCR

We can perform OCR on a document in two ways:

  1. Using the File URL (40MB size limitation for F0 tier)
  2. Using the actual file (4MB size limitation for F0 tier)

Using the File URL

When the file is already hosted somewhere, such as on a storage service or a web server, we can pass its URL to the service. The only requirement is that the URL is publicly accessible; otherwise, Document Intelligence will not be able to read the file.
With this approach, the POST request body contains the URL of the file.

{
   "urlSource": "URL_OF_THE_FILE"
}

Along with this, the Content-Type header should be set as follows:

Content-Type: application/json
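
If you are calling the API from code rather than from a REST client, the same body and header can be built in a few lines. This is only a minimal sketch: the function name build_url_source_request and the example.com URL are placeholders I made up for illustration.

```python
import json


def build_url_source_request(file_url):
    """Return (body, headers) for an analyze call that points at a hosted file."""
    # json.dumps always emits double-quoted keys and values, which JSON requires.
    body = json.dumps({"urlSource": file_url}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return body, headers


# Hypothetical file URL, used only to show the shape of the body.
body, headers = build_url_source_request("https://example.com/sample.pdf")
print(body.decode("utf-8"))
print(headers)
```

Building the body with json.dumps instead of writing the string by hand avoids quoting mistakes in the JSON.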

Using the actual file

If we need to send the file itself to the Document Intelligence service, we can use this approach instead.
Here, the POST request body contains the file's bytes, and the Content-Type header is set as follows:

Content-Type: application/octet-stream
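
Since direct uploads have the smaller size limit, it can help to check the file size before sending anything. The sketch below assumes the 4 MB F0 figure quoted earlier in this post; read_document and the constant names are my own, so adjust the limit to whatever applies to your tier.

```python
from pathlib import Path

# Assumption: the 4 MB direct-upload limit for the F0 tier mentioned above.
F0_UPLOAD_LIMIT_BYTES = 4 * 1024 * 1024

# Header to send along with the raw file bytes.
FILE_UPLOAD_HEADERS = {"Content-Type": "application/octet-stream"}


def read_document(path, max_bytes=F0_UPLOAD_LIMIT_BYTES):
    """Read a file as bytes, refusing files larger than max_bytes."""
    file_path = Path(path)
    size = file_path.stat().st_size
    if size > max_bytes:
        raise ValueError(
            f"{file_path.name} is {size} bytes; the limit is {max_bytes} bytes"
        )
    return file_path.read_bytes()
```

The returned bytes become the POST body as they are, with no JSON wrapping.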

Document Intelligence in action for doing OCR

Once everything is set up, we can call the Document Intelligence service through its REST API. There are other options available as well, but this post focuses on the REST API.
Getting the OCR results from Document Intelligence is a two-step process.
The first step is to submit the document, either by sending the file directly or by providing the file URL.
This gives us a result ID.

The second step is to use the result ID to fetch the results.

First, let's try using the file URL.
I'll be using Postman to call the API endpoints.

Step 1:
We need to send the document for analysis. The API endpoint is as follows:

{endpoint}/formrecognizer/documentModels/{modelID}:analyze?api-version=2023-07-31

Endpoint: The endpoint provided on the Azure portal for the Document Intelligence service
modelID: prebuilt-document (using this model for OCR)
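
The URL pattern above can be assembled with a small helper. This is a sketch, not an official client: build_analyze_url and the my-resource endpoint are hypothetical names, and the trailing-slash handling is there only because an endpoint copied from the portal may end with a slash.

```python
from urllib.parse import quote

API_VERSION = "2023-07-31"


def build_analyze_url(endpoint, model_id="prebuilt-document"):
    """Build the analyze URL for a given resource endpoint and model ID."""
    # Strip a trailing slash so the path does not end up with "//".
    base = endpoint.rstrip("/")
    return (
        f"{base}/formrecognizer/documentModels/{quote(model_id)}"
        f":analyze?api-version={API_VERSION}"
    )


# Hypothetical endpoint; replace it with the one from your own resource.
print(build_analyze_url("https://my-resource.cognitiveservices.azure.com/"))
```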

Step 2: Setting up the headers
Apart from the default headers, we need to add two headers.

Ocp-Apim-Subscription-Key: the key from the Azure portal
Content-Type: application/json

Body:
For this example, I'm using a dummy hosted PDF file.

{ 
   "urlSource": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
}

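Send this as a POST request. If everything is correct, you'll get a 202 Accepted response. Check the response headers: you'll find an apim-request-id header there. The response also includes an Operation-Location header, which holds the full URL for fetching the result.
Copy this request ID; it is the result ID for the next step.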
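
The same POST can be sketched in Python with only the standard library. The helper names are my own. extract_result_id assumes the result ID is the last path segment of the Operation-Location header and falls back to apim-request-id, as used in this post, when that header is missing. The actual network call is left commented out because it needs a real endpoint and key.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_analyze_request(analyze_url, key, file_url):
    """Build (but do not send) the POST request for a hosted file."""
    body = json.dumps({"urlSource": file_url}).encode("utf-8")
    return urllib.request.Request(
        analyze_url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


def extract_result_id(headers):
    """Pull the result ID out of the response headers of the POST call."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("operation-location")
    if location:
        # Assumption: the ID is the last path segment of the result URL.
        return urlparse(location).path.rstrip("/").rsplit("/", 1)[-1]
    return lowered["apim-request-id"]


# Sending the request (requires a real endpoint and key):
# request = build_analyze_request(analyze_url, key, file_url)
# with urllib.request.urlopen(request) as response:
#     result_id = extract_result_id(response.headers)
```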

Step 3: Getting the results
To get the OCR results, we need to make a GET request. The endpoint copied from the portal already includes https://, so it is not repeated here.

{endpoint}/formrecognizer/documentModels/{modelID}/analyzeResults/{resultId}?api-version=2023-07-31

Endpoint: your Azure Document Intelligence service endpoint
modelID: prebuilt-document
resultId: apim-request-id from the first step

Make this GET request. If everything is correct, you'll get a 200 OK response along with the content of the PDF file. If the status field in the response still says running, wait a moment and send the request again.
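
Because the analysis runs asynchronously, code that fetches the result usually polls until the status field reports succeeded. Here is a minimal sketch under that assumption: the function names, the delay, and the attempt count are my own choices, and the extracted text is read from analyzeResult.content in the response JSON.

```python
import json
import time
import urllib.request


def build_result_url(endpoint, result_id, model_id="prebuilt-document",
                     api_version="2023-07-31"):
    """Build the GET URL for a previously submitted analysis."""
    return (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/{model_id}"
        f"/analyzeResults/{result_id}?api-version={api_version}"
    )


def fetch_json(url, key):
    """Send one GET request and decode the JSON response."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_text(url, key, fetch=fetch_json, delay_seconds=2.0, max_attempts=30):
    """Poll until the analysis succeeds, then return the extracted text."""
    for _ in range(max_attempts):
        payload = fetch(url, key)
        status = payload.get("status")
        if status == "succeeded":
            return payload["analyzeResult"]["content"]
        if status == "failed":
            raise RuntimeError(f"Analysis failed: {payload.get('error')}")
        time.sleep(delay_seconds)  # still notStarted or running
    raise TimeoutError("Analysis did not finish in time")
```

The fetch parameter is there so the polling logic can be tried with a fake function, without calling the service.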

Using the actual file

Now, if we need to send the file itself instead of the file URL, only two things change in the POST request: the Content-Type header and the body.

Content-Type: application/octet-stream

In the body, instead of the file URL, we need to send the actual file (in Postman, choose the binary body type and select the file).

Send the request. If everything is correct, you'll get the apim-request-id, which can then be used in the same way to fetch the results.
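
In code, the file variant differs from the URL variant only in the header and the body. The sketch below builds such a request with the standard library. build_file_analyze_request is a name I made up, and sending is again left commented out since it needs a real endpoint and key.

```python
import urllib.request


def build_file_analyze_request(analyze_url, key, document_bytes):
    """Build (but do not send) the POST request that uploads the file itself."""
    return urllib.request.Request(
        analyze_url,
        data=document_bytes,  # raw file bytes, not JSON
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
    )


# Sending the request (requires a real endpoint, key, and file):
# with open("dummy.pdf", "rb") as handle:
#     request = build_file_analyze_request(analyze_url, key, handle.read())
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.headers.get("apim-request-id"))
```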
