
Walkman42

Originally published at juejin.cn

Chapter 1 · Computer Vision (2): Image Text Recognition (OCR)

OCR

2. Recognizing Text Information on Cards (Chinese and English)

Now that we have successfully identified each card, the next step is to recognize the information on the cards.

Solution Comparison

Here are several candidate solutions:

  1. Tesseract OCR: Tesseract is an open-source OCR engine, originally developed at Hewlett-Packard and later maintained by Google. It recognizes many languages and is known for good accuracy on clean printed text. Its Python wrapper, pytesseract, makes it easy to call from Python.
  2. EasyOCR: EasyOCR is a user-friendly OCR library built on PyTorch, with support for many languages and ready-to-use pretrained models. It uses deep learning for both text detection and text recognition in images.
  3. PyOCR: PyOCR is a Python wrapper that can drive several OCR engines, including Tesseract and CuneiForm, so you can choose the engine that best fits your requirements.
  4. Google Cloud Vision API: Google offers a cloud-based text recognition API that can be called from its Python client library. It recognizes many languages as well as printed and handwritten text, and it provides powerful image analysis capabilities.
  5. Microsoft Azure Computer Vision API: Microsoft Azure provides a set of computer vision APIs, including text recognition, which you can call from Azure's Python SDK for OCR and other image analysis tasks.
  6. Amazon Textract: Amazon Textract is Amazon's OCR service for extracting text and structured data (such as forms and tables) from scanned documents. It is available from Python through Amazon's SDK (boto3), which makes it easy to integrate into Python applications.
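One way to keep these candidates comparable is to put each engine behind the same small interface, in the spirit of PyOCR. The sketch below is illustrative only: `OcrEngine`, `TesseractEngine`, `EasyOcrEngine` and `compare_engines` are hypothetical names, and the real adapters assume pytesseract (with Tesseract's `chi_sim` language data installed) and EasyOCR are available.

```python
from typing import Dict, List, Protocol


class OcrEngine(Protocol):
    """Hypothetical common interface: image path in, recognized lines out."""

    name: str

    def read(self, image_path: str) -> List[str]:
        ...


class TesseractEngine:
    name = "tesseract"

    def read(self, image_path: str) -> List[str]:
        # Imported lazily so this module loads even if pytesseract is missing.
        import pytesseract
        from PIL import Image

        # "chi_sim+eng" asks Tesseract for Simplified Chinese plus English;
        # the chi_sim traineddata must be installed separately.
        text = pytesseract.image_to_string(Image.open(image_path), lang="chi_sim+eng")
        return [line for line in text.splitlines() if line.strip()]


class EasyOcrEngine:
    name = "easyocr"

    def __init__(self) -> None:
        import easyocr

        # Loading the models is slow, so the reader is created once and reused.
        self._reader = easyocr.Reader(["ch_sim", "en"])

    def read(self, image_path: str) -> List[str]:
        # detail=0 returns only the recognized strings, without boxes or scores.
        return self._reader.readtext(image_path, detail=0)


def compare_engines(engines: List[OcrEngine], image_path: str) -> Dict[str, List[str]]:
    """Run every engine on the same image and collect the results by engine name."""
    return {engine.name: engine.read(image_path) for engine in engines}
```

Because `compare_engines` only relies on `name` and `read`, a cloud service could be wrapped the same way and added to the list.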

Of these options, 4, 5, and 6 are cloud services that typically charge per request (with limited free tiers). In exchange, because the providers host and maintain the models, they are generally more efficient and stable than the locally run options.

EasyOCR supports over 80 languages and performs better than Tesseract OCR in Chinese text recognition. Additionally, CnOCR, which uses Baidu PaddlePaddle models, offers good efficiency and accuracy and can complement EasyOCR.
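With `detail` left at its default, EasyOCR's `Reader.readtext` returns a list of `(bounding_box, text, confidence)` entries, where the box is four corner points. A minimal sketch of turning that into ordered card text follows; the helper names `card_lines` and `read_card` are hypothetical, and the 0.5 confidence threshold is an arbitrary illustrative value.

```python
from typing import List, Sequence, Tuple

# One EasyOCR result: four [x, y] corner points, the text, and a confidence score.
Detection = Tuple[Sequence[Sequence[float]], str, float]


def card_lines(detections: List[Detection], min_confidence: float = 0.5) -> List[str]:
    """Drop low-confidence results and order the rest top-to-bottom, left-to-right.

    Ordering uses the first corner of each box (normally the top-left). Text on the
    same visual line with slightly different y values may be reordered, so this is
    only a rough first pass.
    """
    kept = [d for d in detections if d[2] >= min_confidence]
    # Sort by the y coordinate of the first corner, then by its x coordinate.
    kept.sort(key=lambda d: (d[0][0][1], d[0][0][0]))
    return [text for _, text, _ in kept]


def read_card(image_path: str) -> List[str]:
    """Hypothetical end-to-end call; requires EasyOCR to be installed."""
    import easyocr

    # Simplified Chinese and English can be loaded together in one reader.
    reader = easyocr.Reader(["ch_sim", "en"])
    return card_lines(reader.readtext(image_path))
```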

Special Handling for MacBook Air with M1/M2 Chips

I use a MacBook Air with an M1 chip, and installing EasyOCR on this machine is quite cumbersome.

If you install it the default way, you may get the error message "Could not initialize NNPACK! Reason: Unsupported hardware," and EasyOCR won't work. If your development environment is the same as mine, follow the steps below to resolve this issue.
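To decide which of the installation paths below applies, a small check like the following can help. It is a sketch: `needs_nnpack_workaround` is a hypothetical name, and it only tests whether Python reports macOS on an arm64 CPU, which is how M1/M2 Macs identify themselves.

```python
import platform


def needs_nnpack_workaround(system: str = "", machine: str = "") -> bool:
    """Return True on macOS running on Apple Silicon (M1/M2).

    Empty arguments fall back to the values reported by the current interpreter.
    A Python running under Rosetta 2 reports "x86_64" rather than "arm64".
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    print("Apple Silicon path" if needs_nnpack_workaround() else "standard path")
```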

If possible, first create a dedicated virtual environment locally with conda (or another virtual-environment tool):

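```shell
# Include python so that `python` and `pip` inside the environment belong to it
# (python=3.8 as in MMDetection's get-started guide)
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
```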
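Because pip installs packages into the interpreter it belongs to, it is worth confirming that the environment is actually active before installing anything. A minimal sketch; `active_conda_env` is a hypothetical helper that reads the `CONDA_DEFAULT_ENV` variable set by `conda activate`.

```python
import os
import sys
from typing import Mapping, Optional


def active_conda_env(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the active conda environment, or None if none is active.

    `conda activate` sets CONDA_DEFAULT_ENV to the environment name.
    """
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    name = active_conda_env()
    # sys.prefix points at the environment the running interpreter belongs to.
    print(f"conda env: {name!r}, interpreter prefix: {sys.prefix}")
    if name != "openmmlab":
        print("Run `conda activate openmmlab` before installing the packages.")
```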

Installation on machines without an Apple M1/M2 chip

```shell
conda install pytorch torchvision cpuonly -c pytorch
pip install easyocr
```

That's all. If your machine does not have an Apple M1/M2 chip, congratulations: you have just saved about an hour and a half of compiling and installing.

Installation on Macs with an M1/M2 chip

Next, install PyTorch from source with NNPACK disabled by setting "USE_NNPACK=0". NNPACK is a library that accelerates neural-network operations on the CPU, and it is the component that fails to initialize on M1/M2 chips, producing the "Unsupported hardware" error above; building PyTorch without it avoids the problem. Since MacBooks have no Nvidia GPU, no CUDA support is needed, and this build runs on the CPU.

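```shell
# --recursive also fetches the third-party submodules the build needs
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch
pip install -r requirements.txt
# Disable NNPACK for this build
export USE_NNPACK=0
# This step will take a long time
python setup.py install
```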
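After the build finishes, you can check from Python that the installed PyTorch really was built without NNPACK. This sketch assumes the build summary returned by `torch.__config__.show()` contains a `USE_NNPACK=...` entry; the helper names are hypothetical, and they return None when torch is missing or the entry is not found.

```python
import re
from typing import Optional


def nnpack_setting(config_text: str) -> Optional[str]:
    """Extract the USE_NNPACK value (e.g. "ON" or "OFF") from PyTorch's build summary."""
    match = re.search(r"USE_NNPACK=(\w+)", config_text)
    return match.group(1) if match else None


def installed_torch_nnpack() -> Optional[str]:
    """Return the USE_NNPACK value of the installed torch build, or None if unknown."""
    try:
        import torch
    except ImportError:
        return None
    # torch.__config__.show() returns a human-readable summary of build options.
    return nnpack_setting(torch.__config__.show())


if __name__ == "__main__":
    print("USE_NNPACK:", installed_torch_nnpack())  # "OFF" is expected after this build
```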

Next, install MMDetection, OpenMMLab's open-source toolbox for object detection and instance segmentation. It is built on PyTorch and serves as a powerful tool for researchers and engineers to train and deploy object detection and instance segmentation models.

Again, install it from source rather than from a prebuilt package to avoid the error mentioned earlier.

```shell
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
# -v: verbose output
# -e: install in editable (development) mode, so any local changes to the code
#     take effect without reinstalling
pip install -v -e .
```

Installing the star of the show: the EasyOCR library.

```shell
pip install easyocr
```

At this point, the preparations have been completed.
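Before moving on, a quick sanity check can confirm that the packages are importable from the active environment. A minimal sketch; `check_modules` is a hypothetical helper built on the standard library's `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names, not pip package names. mmdet is only installed on the
    # M1/M2 path above, so it is expected to be missing on the other path.
    status = check_modules(["torch", "mmdet", "easyocr"])
    for module, found in status.items():
        print(f"{module:10} {'ok' if found else 'MISSING'}")
```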
