IamCathal

Posted on Aug 6, 2020

Using Google's Tesseract OCR to cheat at online memory tests

#python #tutorial #beginners #ocr

When I first played HumanBenchmark's verbal memory test I was hooked. The idea of the minigame is that it shows a word and you have to say whether it is new or has appeared before. This might sound simple with 3 lives, but after about 12 words your ability to tell the two apart is practically as good as guessing.

Instead of actually improving my verbal memory I decided to do it the easy way: cheat. Google's Tesseract OCR is one of the most popular and easy-to-use image-to-text converters available. I wrote the script in about half an hour, and although it's no enterprise software it gets the job done.

Installation

Tesseract OCR

We need to download Tesseract OCR and add it to our PATH so that we can access it from any directory on our computer. The installation page guides you through installation for Linux, Windows and macOS, so just follow that and afterwards check it's working with the command tesseract --version.
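
If you'd rather confirm this from Python, here is a minimal sketch using only the standard library. The helper name tesseractOnPath is made up for this example; it only checks that an executable called tesseract can be found on the PATH, not that it actually runs.

```python
import shutil


def tesseractOnPath(executable="tesseract"):
    """Return True if an executable with this name is found on the PATH."""
    # shutil.which returns the full path to the executable, or None if missing.
    return shutil.which(executable) is not None


if tesseractOnPath():
    print("tesseract found at", shutil.which("tesseract"))
else:
    print("tesseract is not on the PATH yet")
```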

Python Modules

If you're running Python 3 (as you should be) and using pip3 to install your Python modules, you can get all the required ones with these 3 commands.

pip3 install pynput: This module lets us control the mouse, so we can move the pointer and click the buttons in the minigame.
pip3 install pyscreenshot: PyScreenshot is going to take the screenshots of the words to then pass on to Tesseract.
pip3 install pytesseract: We need the PyTesseract module because we can't use Tesseract directly from Python without a wrapper. This is just a middleman between us and the actual OCR.

Writing the script

So our main tasks for this script are going to be as follows:

Screenshot the word
Pass it on to Tesseract and get the text output
If the word has appeared before, click seen, else add it to the list and click new
Continue until the user's desired highscore has been reached or something goes wrong
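
Stripped of the screenshots and mouse clicks, the decision logic in the steps above can be sketched like this. The function name decide and the sample words are invented for illustration; the real script gets its words from OCR rather than from a list.

```python
def decide(words):
    """Return the answer ('NEW' or 'SEEN') the bot would give for each word."""
    known = []      # every word encountered so far
    answers = []
    for word in words:
        if word in known:
            answers.append("SEEN")
        else:
            known.append(word)
            answers.append("NEW")
    return answers


print(decide(["apple", "river", "apple", "stone", "river"]))
# ['NEW', 'NEW', 'SEEN', 'NEW', 'SEEN']
```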

Imports

from pynput.mouse import Button, Controller
from PIL import Image
import pyscreenshot
import pytesseract
import sys
import re

Pynput is quite a large module and we only need two elements of it, Button and Controller, so those are the only parts we import. We use PIL (the Python Imaging Library, provided by the Pillow package) to open the image of the screenshot, and again we only need Image. The rest of the modules are pretty self-explanatory. The last module we import is re, which is Python's regex module. We only use this once, to check whether the game has gone wrong and isn't showing a word. It's a bit overkill here, but if you were to upgrade the script you'd find much more use for regular expressions to parse text.

Screenshotting

def screenshot():
    region = pyscreenshot.grab(bbox=(700,390,1250,450))
    region.save('currentWord.png')

To get a screenshot we need to specify which region of the screen we want captured. bbox is in the format (X1,Y1,X2,Y2), the top-left and bottom-right corners of the region. For a normal 1080p screen the arguments given make a selection area big enough for any word that the game throws at you. We save the screenshot as currentWord.png, which is read later by Tesseract. We can overwrite this file on every iteration, since by then the word it held has already been put into our list.
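
As a quick sanity check on the numbers, this small sketch works out how big the captured region is. The helper bboxSize is hypothetical and not part of pyscreenshot; it just does the subtraction.

```python
def bboxSize(bbox):
    """Return (width, height) in pixels of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = bbox
    return (x2 - x1, y2 - y1)


print(bboxSize((700, 390, 1250, 450)))  # (550, 60)
```

So the region is a strip 550 pixels wide and 60 pixels tall.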

Pass the word onto Tesseract and add to list

def getWord(filename):
    return pytesseract.image_to_string(Image.open(filename))

Using pyTesseract is fairly simple. With the self-explanatory method image_to_string we pass in the opened image file, which is always going to be currentWord.png. Great, now we have the actual text from the screenshot. Next we should add it to an array (or list, as Python calls it) so that we can check for duplicates in the future.

def isNew(currentWord, allWords):
    if currentWord in allWords:
        return 0
    else:
        allWords.append(currentWord)
        return 1

In isNew() we simply pass in the currentWord and the list of all words that we've seen before. If the word is in the list we return 0, indicating that it is not new. Otherwise we append it to the list of all known words and return 1.

So far we have some functions that do the tasks we need, but we still need a main game loop to call them and tie everything together. It's a fair bit of code, but I will go through all of it.
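
Two optional refinements, sketched under assumptions rather than taken from the script: OCR output can carry stray whitespace such as a trailing newline, so it may be worth normalising the text before comparing, and a set gives faster membership checks than a list. The names cleanWord and isNewSet are hypothetical, and lowercasing assumes that case never distinguishes two words in the game.

```python
def cleanWord(rawText):
    """Normalise OCR output so the same word always compares equal."""
    # strip() drops leading/trailing whitespace such as newlines;
    # lower() is an assumption that case never distinguishes two words.
    return rawText.strip().lower()


def isNewSet(currentWord, knownWords):
    """Return True (and remember the word) if it has not been seen yet."""
    if currentWord in knownWords:
        return False
    knownWords.add(currentWord)
    return True


known = set()
for raw in ["Apple\n", "river\n", " apple \n"]:
    word = cleanWord(raw)
    print(word, isNewSet(word, known))
# apple True
# river True
# apple False
```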

def playGame(allWords, userLevel):
    mouse = Controller()
    for i in range(1,userLevel):
        screenshot()
        subject = getWord('currentWord.png')
        # if new
        if isNew(subject, allWords):
            mouse.position = (1030,505)
            mouse.click(Button.left, 1)
        # if seen
        else:
            mouse.position = (870, 505)
            mouse.click(Button.left, 1)

The function playGame() takes in the list of all words and the desired level of the user (what score the bot should achieve). It instantiates the mouse as a Controller (from Pynput), as we need to move and click the mouse. We loop from 1 up to (but not including) userLevel, and in every iteration we screenshot the word, get the text output from it and see if it's new or not. Depending on the outcome we click either new or seen, and then the loop begins again. Setting mouse.position takes a tuple in the format (X, Y), while .click() takes the button to press and the number of clicks. Again, this script only works for a 1080p screen with the window in full screen mode, as this is where the buttons are when that's the case.

From this main game loop we still need somewhere to take the user's input of what level they want to achieve and to pass in the allWords list. We'll use the main function for this.
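
Before moving on to main(), here is a small sketch of the move-then-click pattern used in playGame(). To keep it runnable without touching your real mouse it uses a stand-in class, FakeMouse, that only records what would happen; FakeMouse, clickAt and the two constants are names invented for this example. With the real Controller you would pass Button.left instead of the string 'left'.

```python
NEW_BUTTON = (1030, 505)    # coordinates used in playGame(), 1080p full screen
SEEN_BUTTON = (870, 505)


class FakeMouse:
    """Stand-in for pynput's Controller that records clicks instead of making them."""

    def __init__(self):
        self.position = (0, 0)
        self.clicks = []

    def click(self, button, count=1):
        self.clicks.append((self.position, button, count))


def clickAt(mouse, position, button="left"):
    mouse.position = position   # move: position is an (x, y) tuple
    mouse.click(button, 1)      # click: a button and a click count


mouse = FakeMouse()
clickAt(mouse, NEW_BUTTON)
clickAt(mouse, SEEN_BUTTON)
print(mouse.clicks)
# [((1030, 505), 'left', 1), ((870, 505), 'left', 1)]
```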

def main():
    allWords = list()
    mouse = Controller()
    userLevel = int(input('What level would you like to reach: '))
    if userLevel < 2:
        print('Error: defaulting to 10')
        userLevel = 10
    mouse.position = (950, 610)
    mouse.click(Button.left, 1)
    playGame(allWords, userLevel)

if __name__ == '__main__':
    main()

We get the user's desired level and check if the number is valid. We then move the mouse to the start button and click to begin. From here on, playGame() does all the work. The last two lines are just how you tell the Python interpreter to run the main function when the script is executed directly.
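
One gap worth knowing about: int() raises a ValueError if the user types something that isn't a number. A possible way to handle that, sketched with a hypothetical helper parseLevel that falls back to the same default of 10:

```python
def parseLevel(rawInput, default=10):
    """Turn the user's input into a level, falling back to a default."""
    try:
        level = int(rawInput)
    except ValueError:
        # Not a number at all, e.g. 'abc' or an empty string.
        return default
    if level < 2:
        return default
    return level


print(parseLevel("25"))   # 25
print(parseLevel("abc"))  # 10
print(parseLevel("1"))    # 10
```

In main() this would be called as parseLevel(input('What level would you like to reach: ')).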

Improvements

This script works as intended, but there are a few more little things we can add, such as seeing how the overall number of seen words compares to new words as it changes over time. We can also add some small error checking to make sure the script doesn't go off the rails.

In our getWord() function we are going to add a check to see whether the current "word" is actually the end screen. We are going to use a regex search to see if the string matches "See how you compare". If we are on that screen the program exits, as it shouldn't ever be in that state.

def getWord(filename):
    text = pytesseract.image_to_string(Image.open(filename))
    if re.search('(See how you compare)+', text):
        print('Error: Unexpected end of game.')
        exit()
    else:
        return text
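
Since the pattern contains no special regex characters, a plain substring test should detect the same screen. This is an alternative sketch, not what the script uses; isEndScreen and END_SCREEN_TEXT are names made up for it.

```python
END_SCREEN_TEXT = 'See how you compare'


def isEndScreen(text):
    """Return True if the OCR text contains the end-screen message."""
    return END_SCREEN_TEXT in text


print(isEndScreen('See how you compare\n'))  # True
print(isEndScreen('apple\n'))                # False
```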

In our playGame() function we're going to add in some logging of the frequency of both seen and new words.

def playGame(allWords, userLevel):
    mouse = Controller()
    seenWords = 0
    for i in range(1,userLevel):
        screenshot()
        subject = getWord('currentWord.png')
        # if new
        if isNew(subject, allWords):
            mouse.position = (1030,505)
            mouse.click(Button.left, 1)
        # if seen
        else:
            seenWords += 1
            mouse.position = (870, 505)
            mouse.click(Button.left, 1)
        percentage = 0
        percentage = float(seenWords/i)*100
        sys.stdout.write('\r{:.2f}% overall duplicates'.format(percentage))
        sys.stdout.flush()

I added a seenWords counter to track the total number of seen words. After every iteration we divide it by the total number of words encountered so far and display the result as a nicely formatted percentage.

We use sys.stdout.write to write to stdout. The \r, which might be new to you, is a carriage return: it moves the cursor back to the start of the current line so that the next write goes over it. It therefore doesn't print a new line for every iteration; it updates the line already printed with the new statistics. We then flush the output so the update shows up straight away.
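
To see exactly what gets written, here is the formatting pulled out into a small function. The name formatProgress is hypothetical; the format string is the one from playGame().

```python
def formatProgress(seenWords, total):
    """Build the status line that overwrites the previous one."""
    percentage = seenWords / total * 100
    return '\r{:.2f}% overall duplicates'.format(percentage)


# repr() makes the leading carriage return visible
print(repr(formatProgress(1, 4)))  # '\r25.00% overall duplicates'
```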

The finished product

from pynput.mouse import Button, Controller
from PIL import Image
import pyscreenshot
import pytesseract
import sys
import re


def getWord(filename):
    text = pytesseract.image_to_string(Image.open(filename))
    if re.search('(See how you compare)+', text):
        print('Error: Unexpected end of game.')
        exit()
    else:
        return text

def screenshot():
    region = pyscreenshot.grab(bbox=(700,390,1250,450))
    region.save('currentWord.png')


def isNew(currentWord, allWords):
    if currentWord in allWords:
        return 0
    else:
        allWords.append(currentWord)
        return 1


def playGame(allWords, userLevel):
    mouse = Controller()
    seenWords = 0
    for i in range(1,userLevel):
        screenshot()
        subject = getWord('currentWord.png')
        # if new
        if isNew(subject, allWords):
            mouse.position = (1030,505)
            mouse.click(Button.left, 1)
        # if seen
        else:
            seenWords += 1
            mouse.position = (870, 505)
            mouse.click(Button.left, 1)
        percentage = 0
        percentage = float(seenWords/i)*100
        sys.stdout.write('\r{:.2f}% overall duplicates'.format(percentage))
        sys.stdout.flush()


def main():
    allWords = list()
    mouse = Controller()
    userLevel = int(input('What level would you like to reach: '))
    if userLevel < 2:
        print('Error: defaulting to 10')
        userLevel = 10
    mouse.position = (950, 610)
    mouse.click(Button.left, 1)
    try:
        playGame(allWords, userLevel)
    except KeyboardInterrupt:
        exit()


if __name__ == '__main__':
    main()

Further Improvements

If you've finished the script and are happy that it works, here are some extra features that you can implement to improve its functionality:

Make the script adapt to different size windows (use half the screen width for the middle coordinate instead of using half of the dimensions of a 1080p screen)
Instead of hogging the mouse, return the pointer to its original position when it's not busy clicking.
Implement the script for other mini games of similar nature and wreck the high scores!
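
As a starting point for the first idea, here is a rough sketch that rescales the hard-coded 1080p coordinates to another resolution. It rests on a big assumption: that the page layout scales proportionally with the screen, which may well not hold for a web page, so treat it as something to experiment with. scalePoint, scaleBbox and the BASE_ constants are names invented here, and finding the actual screen size is left out.

```python
BASE_WIDTH, BASE_HEIGHT = 1920, 1080   # resolution the original numbers were picked for


def scalePoint(point, width, height):
    """Scale an (x, y) point from 1080p to a width x height screen."""
    x, y = point
    return (round(x * width / BASE_WIDTH), round(y * height / BASE_HEIGHT))


def scaleBbox(bbox, width, height):
    """Scale an (x1, y1, x2, y2) box by scaling its two corners."""
    x1, y1, x2, y2 = bbox
    return scalePoint((x1, y1), width, height) + scalePoint((x2, y2), width, height)


# Assumption: a 4K screen with a proportionally scaled layout
print(scaleBbox((700, 390, 1250, 450), 3840, 2160))  # (1400, 780, 2500, 900)
print(scalePoint((950, 610), 3840, 2160))            # (1900, 1220)
```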

If you have any questions about the script, are unsure how some part works, or have made your own improvements and want to show them off, you can open an issue or pull request on the repo.

Top comments (2)

Grzegorz Kućmierz • Aug 6 '20 • Edited

There is a much easier way to cheat this test.

Just a few lines of JavaScript code:

(() => {
  const DELAY = 200;
  const set = new Set();
  setInterval(() => {
    const word = document.querySelector('.word').innerText;
    const label = set.has(word) ? 'SEEN' : 'NEW';
    const btn = [...document.querySelectorAll('button')].filter(btn => btn.innerText === label).pop();
    btn.click();
    set.add(word);
  }, DELAY);
})();

IamCathal • Aug 6 '20

That is a very concise solution, nice.