Building document management software: how to choose the best OCR library

#appdev #programming #softwaredevelopment

In this post, I will share our company's experience in choosing the OCR library that best suits our tasks and goals.

We needed to improve document management within our company, notably to automate the analysis of enterprise paper records, so we decided to create a software solution based on one of the existing OCR libraries.

OCR, or optical character recognition, is the mechanical or electronic conversion of images of typed text into machine-encoded text.

OCR is also a method of digitizing printed text so that it can be electronically stored, edited, displayed, and used in machine processes such as cognitive computing, machine translation, and data mining.

What’s more, OCR is used as a form of data entry from paper documents (financial records, business cards, invoices, and many more).

Before starting development, we researched the three most popular OCR libraries to determine which one would suit our goals best.

We investigated the following libraries:

Google Text Recognition API
Tesseract
Anyline

Google Text Recognition API

Text recognition is the process of detecting text in images and video streams and recognizing the text contained therein. The Google Text Recognition API first detects blocks of text; the recognizer then determines the actual text in each block and segments it into lines and words.
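
To make the block, line, and word hierarchy more concrete, here is a minimal Python sketch. It is not Google's API: the names `TextBlock`, `Line`, and `segment` are hypothetical, and the code only splits text that has already been recognized, treating blank lines as block boundaries. A real recognizer segments by the layout of the image, so take this as an illustration of the resulting structure only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Line:
    """One line of recognized text (hypothetical type, for illustration)."""
    text: str

    @property
    def words(self) -> List[str]:
        # Words are the whitespace-separated pieces of the line.
        return self.text.split()


@dataclass
class TextBlock:
    """A group of adjacent lines, e.g. a paragraph (hypothetical type)."""
    lines: List[Line] = field(default_factory=list)


def segment(recognized: str) -> List[TextBlock]:
    """Split recognized text into blocks, using blank lines as separators."""
    blocks: List[TextBlock] = []
    current = TextBlock()
    for raw in recognized.splitlines():
        if raw.strip():
            # Non-empty line: it belongs to the block being built.
            current.lines.append(Line(raw.strip()))
        elif current.lines:
            # Blank line: close the current block and start a new one.
            blocks.append(current)
            current = TextBlock()
    if current.lines:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Made-up sample text standing in for recognizer output.
    sample = "INVOICE No. 42\nDate: 2019-03-01\n\nTotal due: 150.00 EUR"
    for number, block in enumerate(segment(sample), start=1):
        for line in block.lines:
            print(number, line.words)
```

Walking the result block by block, then line by line, then word by word is how a document management tool could pick out individual fields from a scanned page.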

The Text API detects text in multiple languages (French, German, English, etc.) in real time.

In general, the Google Text Recognition API was effective for solving our tasks. It let us recognize text both in real time and in existing images of text documents.

During our research, we identified some pros and cons of using the Google Text Recognition API.

Pros: