
Mohammed Brueckner

Low-Code Cognitive Services for all: Computers making sense of hand-written text

You know what's great about APIs? (Apart from being fun to use?)
They shield you from insane complexity and give you the essence of what you need. Perfect example: image recognition services!
You might not be a data scientist, but you sure can identify objects and do a lot of fancy stuff.

Like recognizing hand-written text with Azure Cognitive Services: Computer Vision. These services democratize complex pre-trained machine learning models. In the case of recognizing hand-written text, the results are quite impressive.
[Image: testing the Computer Vision service with Postman]

Now imagine this very important everyday business scenario:
You have a letter with hand-written text on it and want to turn it into machine-readable text, so that you can finally put it somewhere where you can do something with it. Or store it. Or post it on Facebook. Whatever.

What we will do / our TO-DO list:

  1. Take a picture of some hand-written text and upload it, unless the picture you want to use is already publicly hosted
  2. Run that picture through Cognitive Services: Computer Vision and its function to recognize hand-written text
  3. Extract the recognized text

Before we go into the APIs you need - and of course it's APIs you are looking for - let's make sure we all understand the basic classifications.
[Image: AI services]

So you see, we are in the top layer here with Cognitive Services. The same goes for similar services like Amazon Textract or Amazon Comprehend on AWS. While I will write about Azure services, I am giving no recommendations, so feel free to use other similar services as you like. Maybe you want to experiment with Google Cloud Vision as well?

How we will do it:
We will use Logic Apps as the workflow engine. Why? Minimal coding, because code you do not write is code you do not have to maintain. It is flexible: Logic App flows can be easily cloned, and logging and monitoring are fully integrated. Speaking of integrations, the service comes with plenty of connectors and plenty of configuration options. Microsoft Flow is the little brother of Logic Apps, so if you know one you pretty much know the other.

OK, now let us go through that to-do list!

  1. Upload Picture

That's so straightforward that I will not go into much detail. Upload your picture and make it publicly available. If you are dealing with sensitive content, make sure to use a short-lived shared access signature (SAS) on Azure Storage or a presigned URL on Amazon S3.

For steps 2 - 3: Run a Logic App workflow.
[Image: handwritten text recognition with logic apps]
Everything is kicked off with an HTTP trigger. The trigger expects a simple JSON payload like this:
{
"piclocation":"xyz"
}
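
Outside of Logic Apps, the same payload handling could look roughly like the Python sketch below. The function name `extract_pic_location` and the rule that the value must be an http(s) URL are my own assumptions for illustration; the flow itself simply reads the `piclocation` property.

```python
import json


def extract_pic_location(body: str) -> str:
    """Return the image URL from the trigger payload, or raise ValueError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("request body is not valid JSON") from exc

    # The trigger expects a JSON object with a "piclocation" property.
    location = payload.get("piclocation") if isinstance(payload, dict) else None

    # Assumption: the image must be reachable over http(s), since the
    # Computer Vision service fetches it by URL.
    if not isinstance(location, str) or not location.startswith(("http://", "https://")):
        raise ValueError("'piclocation' must be an http(s) URL")
    return location


if __name__ == "__main__":
    print(extract_pic_location('{"piclocation": "https://example.com/letter.jpg"}'))
```

Rejecting a bad payload this early keeps the later steps from calling the service with something that cannot work.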

The next flow step extracts the public image URL from that payload.

We'll use it to call this endpoint, at least if your region is West Europe, as in my case:
https://westeurope.api.cognitive.microsoft.com/vision/v1.0/recognizeText?handwriting=true
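
For readers who want to try the call outside of Logic Apps, here is a minimal sketch in Python. It assumes the service expects a POST with a JSON body of the form `{"url": ...}` and the subscription key in the `Ocp-Apim-Subscription-Key` header; the function names and the example image URL are hypothetical.

```python
import json
import urllib.request

# Endpoint from the article; the region prefix depends on your own resource.
ENDPOINT = (
    "https://westeurope.api.cognitive.microsoft.com"
    "/vision/v1.0/recognizeText?handwriting=true"
)


def build_recognize_request(image_url: str, subscription_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts text recognition."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the URL to poll for the result."""
    with urllib.request.urlopen(request) as response:
        location = response.headers.get("Operation-Location")
    if not location:
        raise RuntimeError("no Operation-Location header in the response")
    return location
```

Building the request and sending it are kept separate so that the request can be inspected without a network call or a valid key.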

An HTTP response header called "Operation-Location" is parsed out in the first "set variable" action:
[Image: set flow variables and delay]

That location header tells us where to "knock" for processing updates. So we give it a couple of seconds (that is what the delay is all about) and then call the service again at that location.
[Image: deferred call of the location endpoint]
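
The wait-then-ask pattern can be sketched in Python as a small polling loop. This is an illustration, not the flow itself: `fetch_status` is a hypothetical callable that performs the GET against the Operation-Location URL and returns the parsed JSON, and I am assuming the result carries a `status` field that eventually reads "Succeeded" or "Failed".

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    delay_seconds: float = 5.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait, ask for the result, and repeat until the operation finishes."""
    for _ in range(max_attempts):
        # Give the service a couple of seconds before knocking.
        sleep(delay_seconds)
        result = fetch_status()
        status = result.get("status")
        if status == "Succeeded":
            return result
        if status == "Failed":
            raise RuntimeError("text recognition failed")
        # Any other status is treated as "still working": try again.
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Unlike the single delay in the flow, this sketch retries a bounded number of times, which is one way to cope with images that take longer than expected.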

Logic Apps lets you handle case distinctions and exceptions at every flow step. (NB: I wish callbacks had been used instead of asking us to poll, but that might be a good suggestion to the Cognitive Services team.)

What's next is parsing the response. We just create a new variable and append the "text" property of every single item in the "lines" array.

[Image: getting the results from the contained array]
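
In Python, that append loop might look like the sketch below. The response shape (a `recognitionResult` object holding a `lines` array whose items each have a `text` property) is my assumption about the service's JSON, and joining the lines with newlines is my own choice; the sample text is made up.

```python
def collect_text(result: dict) -> str:
    """Join the "text" of every recognized line into one string."""
    lines = result.get("recognitionResult", {}).get("lines", [])
    # Skip items without text so that empty lines do not end up in the output.
    return "\n".join(line["text"] for line in lines if line.get("text"))


if __name__ == "__main__":
    sample = {
        "status": "Succeeded",
        "recognitionResult": {
            "lines": [{"text": "Dear reader,"}, {"text": "hello from a letter"}]
        },
    }
    print(collect_text(sample))
```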

And that is about it. Just a basic check and off we go, in my case pushing a message to Slack.
[Image: basic checks and post to slack]
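
If you are not using Logic Apps, one way to do this last step is a Slack incoming webhook, which accepts a JSON body with a `text` field. This is a swapped-in technique rather than what the flow in the picture uses, and both the webhook URL and the "text must not be empty" check are assumptions on my part.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, recognized_text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a Slack incoming webhook."""
    text = recognized_text.strip()
    # Basic check (assumed): do not post when nothing was recognized.
    if not text:
        raise ValueError("nothing recognized, not posting to Slack")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage, with a placeholder URL standing in for your own webhook:
# request = build_slack_request("https://example.com/slack-webhook", "Dear reader, ...")
# urllib.request.urlopen(request)
```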

Amazing. Computers can finally read you. Literally.
[Image: happy message output to a Slack channel]

One last piece of best-practice advice: put your Logic Apps behind an API management system like Azure API Management and expose them publicly only through it. There are plenty of reasons for that, like DDoS protection and fine-grained access control, among others.
Enjoy!
