
Oscar Nord for Eyevinn Video Dev-Team Blog


Introduction to text detection using OpenCV and pytesseract

Introduction

Detecting and reading text from images is a complex task with a multitude of factors to take into account. It can be anything from detecting handwritten text on a piece of paper to detecting the subtitles in a movie.

In this short tutorial we are going to use OpenCV's image manipulation tools together with Python-tesseract (pytesseract), the Python wrapper for Google's Tesseract-OCR Engine, to detect a set of predefined words in a video sequence.

The fun part

You need to have OpenCV for Python and pytesseract on your system. Note that pytesseract is only a wrapper, so the Tesseract-OCR engine itself must also be installed. Start by installing the Python packages via pip:

pip install opencv-python
pip install pytesseract

The idea here is to detect when (on which frame) the pre- or post-credits in a video show up. For the simplicity of this tutorial we will just print the extracted text and the frame number to the console.

We start by importing OpenCV and pytesseract and defining our list of words.

import cv2
import pytesseract

LIST = ['director', 'directed', 'written', 'writer',
        'producer', 'produced', 'production', 'lead'] 

The input will be a path to a local video file or a URL; the URL can also be an HLS manifest. Then we extract the first frame and define our frame counter.

path = input('Enter path to file or hls manifest: ')

vidcap = cv2.VideoCapture(path)
success,frame = vidcap.read()
framenbr = 0

What we want to do now is to read the next frame and process it for as long as there is content left in our vidcap variable. If no frame can be read we are done and can break out of the reading loop.

while success:
    success,frame = vidcap.read()
    if not success:
        break

We are now going to process the frame using OpenCV to make it easier for pytesseract to extract the text, by converting the image to grayscale.

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

We could also add further layers of processing to the image at this stage. I will leave them out of the tutorial for simplicity, but a rough sketch of what they could look like follows below.
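As a minimal sketch (the specific steps and parameters here are assumptions, not part of the original pipeline), upscaling, smoothing and thresholding the grayscale image could look like this:

# Assumed extra preprocessing: upscale, smooth out noise and binarise with Otsu.
gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
gray = cv2.GaussianBlur(gray, (3, 3), 0)
_, gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

Whether these steps actually help depends on the material, so they are worth evaluating on a few sample frames first.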

Now on to text extraction using pytesseract:

data = pytesseract.image_to_string(gray, lang='eng', config='--psm 6').lower() 

Here gray is the grayscale image, lang='eng' is the language we expect the detected text to be in, and config='--psm 6' tells pytesseract which page segmentation mode to use when extracting the text. Page segmentation mode 6 makes pytesseract assume a single uniform block of text. A sketch of switching to another mode follows after the list below.

The available page segmentation modes are:

  • 0 Orientation and script detection (OSD) only.
  • 1 Automatic page segmentation with OSD.
  • 2 Automatic page segmentation, but no OSD, or OCR. (not implemented)
  • 3 Fully automatic page segmentation, but no OSD. (Default)
  • 4 Assume a single column of text of variable sizes.
  • 5 Assume a single uniform block of vertically aligned text.
  • 6 Assume a single uniform block of text.
  • 7 Treat the image as a single text line.
  • 8 Treat the image as a single word.
  • 9 Treat the image as a single word in a circle.
  • 10 Treat the image as a single character.
  • 11 Sparse text. Find as much text as possible in no particular order.
  • 12 Sparse text with OSD.
  • 13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

The only thing left is to compare what we have in our data variable with our predefined list and continue to the next frame.

if any(word in data for word in LIST):
    print(data + '\nFrame number: ' + str(framenbr))
framenbr += 1

Here is the complete example code:

import cv2
import pytesseract

# Words that typically show up in pre/post credits.
LIST = ['director', 'directed', 'written', 'writer',
        'producer', 'produced', 'production', 'lead']

path = input('Enter path to file or hls manifest: ')

vidcap = cv2.VideoCapture(path)
success, frame = vidcap.read()
framenbr = 0

while success:
    success, frame = vidcap.read()
    if not success:
        break
    # Convert to grayscale and upscale to make the text easier to extract.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
    data = pytesseract.image_to_string(gray, lang='eng', config='--psm 6').lower()
    # Print the extracted text and the frame number if any of our words were found.
    if any(word in data for word in LIST):
        print(data + '\nFrame number: ' + str(framenbr))
    framenbr += 1

Conclusion

If we were to take this to the next step, it could be used to detect when the intro occurs in a TV series, making it easier for the viewer to skip it or continue to the next episode.
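
For such a use case you would typically want a timestamp rather than a frame number. As a minimal sketch (not part of the tutorial code above), the frame rate reported by OpenCV can be used to convert the frame number into seconds:

# Convert a frame number into a timestamp using the frame rate reported by OpenCV.
fps = vidcap.get(cv2.CAP_PROP_FPS)
if fps > 0:
    seconds = framenbr / fps
    print('Detected at approximately %.1f seconds' % seconds)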

If you need assistance with the development and implementation of this, our team of video developers is happy to help you out. If you have any questions or comments, just drop a line in the comments section below.
