DEV Community

Isaac Black (Student)

How can I determine the position of a text string on the screen?

Recently, I've been running into an annoying problem: I like listening to music while I work, but because I listen on YouTube, I'm often interrupted by ads. Since I'm on a school-issued computer, I can't just install an ad blocker. For now I've been skipping the ads manually, but sometimes my computer is in my bag, so I have to take it out and open it just to skip an ad. So I decided to write a little Python script that skips the ad for me! However, I ran into a problem: I can't find a good way to determine where on the screen I need to click to skip the ad.

My plan is to use an OCR/computer-vision library to look for the string "Skip" or "Skip ad" on the screen and then click its location. However, none of the libraries suggested in places like Stack Overflow work for me. Because I'm on a school-issued computer, I can't install whatever programs I want: I can install modules with pip, but that's it. That rules out tools like pytesseract, which needs the separate Tesseract OCR engine installed alongside the Python module. Selenium is also out: I want something I can just start and have it work, without opening browser tabs specifically for this program, and Selenium can't hook into browser windows it didn't open itself. I considered OpenCV, but as far as I can tell, its text detection only finds the bounding boxes of text, not what the text actually says.
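One pip-only idea that avoids OCR entirely is image matching: save a cropped screenshot of the "Skip" button once, then repeatedly search the screen for that exact picture and click its center. The sketch below assumes the `pyautogui` package (installable with pip); `skip_button.png`, `skip_ads`, and `locate_on_screen` are hypothetical names. By default `pyautogui.locateCenterOnScreen` needs a pixel-exact match, and its optional `confidence=` argument for fuzzy matching only works if `opencv-python` is also installed. The match will likely fail if the button looks different from the saved crop (dark mode, zoom level, or display scaling), so this is a sketch, not a tested solution.

```python
import time
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def locate_on_screen(image_path: str) -> Optional[Point]:
    """Return the screen center of image_path, or None if it isn't visible."""
    import pyautogui  # imported here so the polling loop can be tested without a display

    try:
        center = pyautogui.locateCenterOnScreen(image_path)
    except pyautogui.ImageNotFoundException:
        # Recent pyautogui versions raise this instead of returning None.
        return None
    return None if center is None else (center.x, center.y)


def skip_ads(
    image_path: str = "skip_button.png",  # hypothetical cropped screenshot of the button
    interval: float = 2.0,
    max_checks: Optional[int] = None,
    locate: Callable[[str], Optional[Point]] = locate_on_screen,
    click: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Poll the screen and click the button whenever it appears.

    Runs forever when max_checks is None; returns the number of clicks made.
    """
    if click is None:
        import pyautogui

        click = pyautogui.click  # pyautogui.click(x, y) moves the mouse there and left-clicks

    clicks = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        position = locate(image_path)
        if position is not None:
            click(*position)
            clicks += 1
        time.sleep(interval)  # wait before looking again
    return clicks
```

Calling `skip_ads()` with no arguments would keep polling every two seconds until stopped with Ctrl+C; the `locate` and `click` parameters exist only so the loop can be tried out without a real screen.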

So, what should I use?