Optical Character Recognition (OCR) is a technology that converts images, scanned documents, or PDFs into machine-readable text. Python offers several capable OCR libraries, each built on a different backend and suited to different use cases.
In this post, I walk through the most popular OCR modules available in Python, with install commands and sample code for each.
1. TESSERACT OCR (pytesseract):
Backend: Google Tesseract Engine (C++ based)
Python Wrapper: pytesseract
Tesseract is one of the most widely used open-source OCR engines. It was originally developed at Hewlett-Packard, open-sourced in 2005, and later sponsored by Google. It supports more than 100 languages.
HOW TO INSTALL:
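pip install pytesseract pillow
Note: pytesseract is only a wrapper. The Tesseract engine itself must be installed separately (for example, apt install tesseract-ocr on Debian/Ubuntu or brew install tesseract on macOS).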
SAMPLE CODE:
import pytesseract
from PIL import Image
img = Image.open("sample.png")
text = pytesseract.image_to_string(img, lang="eng")
print(text)
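Since Tesseract supports multiple languages, here is a minimal sketch of how several language codes and a page segmentation mode could be passed through pytesseract. The helper names (build_lang, build_config, ocr_image) are my own, not part of pytesseract, and the example assumes the matching language packs (for instance "tam") are installed alongside Tesseract.

```python
from typing import Sequence


def build_lang(codes: Sequence[str]) -> str:
    """Join Tesseract language codes with '+', e.g. ['eng', 'tam'] -> 'eng+tam'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def build_config(psm: int = 3, oem: int = 3) -> str:
    """Build the extra command-line flags that pytesseract forwards to Tesseract."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    return f"--oem {oem} --psm {psm}"


def ocr_image(path: str, codes: Sequence[str] = ("eng",), psm: int = 3) -> str:
    """Run Tesseract on one image file (needs pytesseract, Pillow, and Tesseract)."""
    # Imported lazily so the helpers above work without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(
            img, lang=build_lang(codes), config=build_config(psm=psm)
        )
```

With this sketch, ocr_image("sample.png", codes=("eng", "tam"), psm=6) would ask Tesseract to read English and Tamil and to treat the image as a single uniform block of text.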
2. EASYOCR:
Backend: PyTorch (Deep Learning based)
EasyOCR is a deep learning-based OCR library that supports more than 80 languages and tends to cope well with noisy or complex images.
HOW TO INSTALL:
pip install easyocr
SAMPLE CODE:
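import easyocr
# Load the English and Tamil models (downloaded on first use)
reader = easyocr.Reader(['en', 'ta'])
# Each result is a (bounding_box, text, confidence) tuple
result = reader.readtext('sample.png')
for r in result:
    print(r[1])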
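Each item returned by readtext is a (bounding_box, text, confidence) tuple, so low-confidence detections can be dropped before the text is used. The sketch below shows one way to do that. The filter_by_confidence helper and the 0.5 threshold are my own choices, and fake_result is hand-written stand-in data rather than real OCR output.

```python
from typing import List, Sequence, Tuple

# Shape of one EasyOCR detection: (bounding_box, text, confidence)
Detection = Tuple[Sequence[Sequence[float]], str, float]


def filter_by_confidence(result: Sequence[Detection], min_conf: float = 0.5) -> List[str]:
    """Return the recognized strings whose confidence is at least min_conf."""
    return [text for _box, text, conf in result if conf >= min_conf]


# Hand-written stand-in for reader.readtext('sample.png'); not real OCR output
fake_result = [
    ([[0, 0], [80, 0], [80, 20], [0, 20]], "Invoice", 0.97),
    ([[0, 30], [40, 30], [40, 50], [0, 50]], "l0t4l", 0.21),
    ([[50, 30], [90, 30], [90, 50], [50, 50]], "42.00", 0.88),
]
print(filter_by_confidence(fake_result))  # ['Invoice', '42.00']
```

A sensible threshold depends on the images, so it is worth checking a few real results before settling on a value.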
3. PADDLE OCR:
Backend: PaddlePaddle (Deep Learning Framework)
PaddleOCR is a powerful industrial-level OCR toolkit developed by Baidu.
HOW TO INSTALL:
pip install paddleocr paddlepaddle
SAMPLE CODE:
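from paddleocr import PaddleOCR
# use_angle_cls enables the text-direction classifier for rotated text
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('sample.png')
# Layout below follows PaddleOCR 2.x: result[0] is a list of [box, (text, score)]
# for the first image, or None when no text is found; newer releases may differ
for line in result[0] or []:
    print(line[1][0])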
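The nested indexing above is easy to get wrong, so here is a small sketch that flattens the result into plain dictionaries. It assumes the PaddleOCR 2.x layout, where each line is [box, (text, score)] and result[0] may be None when nothing is detected. The paddle_lines_to_records helper is hypothetical, and the sample data is hand-written.

```python
from typing import Any, Dict, List, Optional


def paddle_lines_to_records(result: Optional[List[Any]]) -> List[Dict[str, Any]]:
    """Flatten a PaddleOCR 2.x style result for one image into plain dicts."""
    records: List[Dict[str, Any]] = []
    if not result or result[0] is None:
        return records  # nothing was detected in the image
    for box, (text, score) in result[0]:
        records.append({"text": text, "score": float(score), "box": box})
    return records


# Hand-written stand-in for ocr.ocr('sample.png'); not real OCR output
fake_result = [
    [
        [[[0, 0], [10, 0], [10, 5], [0, 5]], ("Hello", 0.99)],
        [[[0, 8], [12, 8], [12, 13], [0, 13]], ("World", 0.95)],
    ]
]
for record in paddle_lines_to_records(fake_result):
    print(record["text"], record["score"])
```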
4. KERAS OCR:
Backend: TensorFlow / Keras
Keras-OCR is built on deep learning models and provides both text detection and recognition. It packages a CRAFT text detector and a CRNN recognizer into a single pipeline.
HOW TO INSTALL:
pip install keras-ocr
SAMPLE CODE:
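import keras_ocr
# Downloads the pretrained detector and recognizer weights on first use
pipeline = keras_ocr.pipeline.Pipeline()
images = [keras_ocr.tools.read('sample.png')]
# Returns one list of (word, box) tuples per input image
prediction = pipeline.recognize(images)
print(prediction)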
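Keras-OCR returns individual words with their boxes, not ready-made lines, so the words usually need to be ordered before they read naturally. The sketch below sorts them top to bottom and then left to right using each box's center. The words_in_reading_order helper and its row_height parameter are my own assumptions, and the bucketing is a rough heuristic that only suits fairly straight text.

```python
from typing import List, Sequence, Tuple

# One keras-ocr prediction: (word, box), where box holds four (x, y) corners
WordBox = Tuple[str, Sequence[Sequence[float]]]


def words_in_reading_order(words: Sequence[WordBox], row_height: float = 20.0) -> List[str]:
    """Sort words top-to-bottom, then left-to-right, using each box's center."""

    def key(item: WordBox) -> Tuple[int, float]:
        _word, box = item
        cx = sum(float(p[0]) for p in box) / len(box)
        cy = sum(float(p[1]) for p in box) / len(box)
        return (int(cy // row_height), cx)  # row bucket first, then x position

    return [word for word, _box in sorted(words, key=key)]


# Hand-written stand-in for prediction[0]; not real OCR output
fake_words = [
    ("world", [[60, 2], [100, 2], [100, 18], [60, 18]]),
    ("second", [[0, 24], [50, 24], [50, 38], [0, 38]]),
    ("hello", [[0, 0], [50, 0], [50, 18], [0, 18]]),
]
print(" ".join(words_in_reading_order(fake_words)))  # hello world second
```

The row_height value should roughly match the text height in pixels, so it needs tuning per document type.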
HOW IT WORKS IN THE BACKEND:
Image / PDF
↓
Preprocessing (OpenCV)
↓
Text Detection (DL/CNN)
↓
Text Recognition (CRNN / Transformer)
↓
Output Text (JSON / String)
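The flow above can be sketched as a chain of interchangeable stages. This is only an illustration of the structure, not how any of the libraries is implemented internally. The run_ocr_pipeline function and its stage parameters are hypothetical, and the stub stages operate on a toy "image" made of strings so the example runs without any OCR library.

```python
import json
from typing import Any, Callable, List


def run_ocr_pipeline(
    image: Any,
    preprocess: Callable[[Any], Any],
    detect: Callable[[Any], List[Any]],
    recognize: Callable[[Any, Any], str],
) -> str:
    """Chain the stages and return the recognized text as a JSON string."""
    cleaned = preprocess(image)   # e.g. grayscale, denoise, threshold
    regions = detect(cleaned)     # e.g. bounding boxes of text areas
    lines = [recognize(cleaned, region) for region in regions]
    return json.dumps({"lines": lines, "text": "\n".join(lines)})


# Stub stages over a toy "image" (a list of strings), for illustration only
toy_image = ["  Hello  ", "", " OCR "]
output = run_ocr_pipeline(
    toy_image,
    preprocess=lambda img: [row.strip() for row in img],
    detect=lambda img: [i for i, row in enumerate(img) if row],
    recognize=lambda img, i: img[i],
)
print(output)
```

In a real setup, each stub would be replaced by the corresponding library call, such as an OpenCV threshold for preprocessing and a detector and recognizer model for the other two stages.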
AVAILABLE MODULES:
1. pytesseract
2. easyocr
3. paddleocr
4. keras-ocr
5. opencv-python (image preprocessing)
6. pillow (image loading)
7. pdf2image (converting PDF pages to images)