You know what's great about APIs? (Apart from the fact that they can be fun to use?)
They shield you from insane complexity and give you the essence of what you need. Perfect example: image recognition services!
You might not be a data scientist, but you sure can identify objects and do a lot of fancy stuff.
Like recognizing hand-written text using Azure Cognitive Services: Computer Vision. These services democratize complex pre-trained machine learning models. In the case of recognizing hand-written text, the results are quite impressive.
Now imagine this very important everyday business scenario:
You have a letter with hand-written text on it and want to turn it into machine-readable text, so you can finally put it somewhere where you can do something with it. Or store it. Or post it on Facebook. Whatever.
What we will do / our TO-DO list:
- Take a picture of some hand-written text and upload it, unless the picture you want to use is already publicly hosted
- Run that picture through Cognitive Services: Computer Vision and its function to recognize hand-written text
- Extract the recognized text
So you see we are in the top layer here with Cognitive Services. The same goes for similar services like Amazon Textract or Amazon Comprehend. While I will write about Azure services, I am not making any recommendations, so feel free to use other similar services as you like. Maybe you want to experiment with Google Vision as well?
How we will do it:
We will use Logic Apps as the workflow engine. Why? Minimum coding, because code you do not write is code you do not have to maintain. It's flexible: Logic App flows can be easily cloned, and logging and monitoring are fully integrated. Speaking of integrations, the service comes with plenty of connectors and plenty of configuration options. Microsoft Flow is the little brother of Logic Apps, so if you know one you pretty much know the other.
OK, now let us go through that to-do list!
- Upload Picture
That's so straightforward I will not go into much detail. Upload your picture and make it publicly available. If you are dealing with sensitive content, make sure to use a short-lived shared access signature (SAS) on Azure Storage or a presigned URL on Amazon S3.
The next flow step extracts the public URL of the image. We'll use it to call this endpoint, at least if your region is West Europe like in my case:
Logic Apps lets you handle case distinctions and exceptions at every flow step. And we should take care of this, because in theory the processing might still not be finished, even after the x seconds we waited. (At that point I wish callbacks had been used instead of asking us to poll, but that might be a good suggestion for the Cognitive Services team.)
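If you ever want the same wait-and-retry idea outside of Logic Apps, here is a minimal Python sketch. It is not what the Logic App does internally, just an illustration. The function name `poll_until_done`, its parameters, and the status values "Running", "Succeeded" and "Failed" are my assumptions, so check them against the response you actually get back. The `fetch_status` callable stands in for whatever HTTP call you use to ask for the result.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    max_attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Ask for the result repeatedly until it is ready or we give up.

    fetch_status is a hypothetical callable returning the parsed JSON
    of one status request, e.g. {"status": "Running"}.
    """
    for attempt in range(1, max_attempts + 1):
        result = fetch_status()
        status = result.get("status")  # assumed property name
        if status == "Succeeded":
            return result
        if status == "Failed":
            raise RuntimeError("text recognition failed")
        # Not finished yet: wait before the next try, but not after the last one.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"not finished after {max_attempts} attempts")
```

The point is the explicit upper bound: instead of waiting once and hoping, you decide how often to ask and what should happen when the service is still busy after the last attempt.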
Next up is parsing the response. We just create a new variable and append the "text" property of every single item in the "line" array.
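For comparison, the same parsing step might look like this in Python. Treat it as a sketch: the response shape is my assumption (a `recognitionResult` object holding an array of line objects, called `lines` here), and `extract_text` is a made-up helper name. Check the property names in the JSON you actually receive and adjust `lines_key` if needed.

```python
from typing import Dict, List


def extract_text(response: Dict, lines_key: str = "lines") -> str:
    """Collect the "text" of every recognized line into one string."""
    # Assumed shape: {"recognitionResult": {"lines": [{"text": "..."}, ...]}}
    result = response.get("recognitionResult", {})
    lines: List[Dict] = result.get(lines_key, [])
    # Skip entries without a "text" property instead of failing on them.
    return "\n".join(line["text"] for line in lines if "text" in line)


# Hand-made sample data, not real service output.
sample = {
    "status": "Succeeded",
    "recognitionResult": {
        "lines": [
            {"text": "Dear Anna,"},
            {"text": "see you soon"},
        ]
    },
}
print(extract_text(sample))
```

Joining with a newline keeps the line structure of the letter; use a space instead if you want one running paragraph.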
One last piece of best-practice advice: put your Logic Apps behind an API management layer like Azure API Management and only expose them publicly through it. There are plenty of reasons for that, like DDoS protection and fine-grained access control, among others.