DEV Community


Extract characters from image using tesseract.js (OCR)

geekalaa ・2 min read

Hello 👋🏻.

Welcome to my first post here. Over the past couple of years I have read many posts on this website, and I find it very useful to share information with others and to hear different opinions on many tech subjects.
My name is Alaa. I am a web developer and a 'Webmaster' graduate of the Faculty of Economics and Management of Nabeul, and a 2nd-year computer science engineering student specializing in web technologies at the Private School of Engineering and Technologies (Esprit).
What is OCR? It stands for optical character recognition: an algorithm that extracts characters from a photo, after being taught what the shape of each character looks like in terms of pixels.
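To make the idea of "character shapes in pixels" concrete, here is a toy sketch that compares a tiny black-and-white bitmap against a few known shapes and picks the closest one. It is only an illustration of the principle: the 3x5 templates and the function names are made up for this example, and real OCR engines such as Tesseract are far more elaborate.

```javascript
// Toy illustration only: match a tiny bitmap against known character shapes.

// Hypothetical 3x5 templates, one string per row ("#" = ink, "." = background).
const TEMPLATES = {
  I: ["###", ".#.", ".#.", ".#.", "###"],
  L: ["#..", "#..", "#..", "#..", "###"],
  T: ["###", ".#.", ".#.", ".#.", ".#."],
};

// Count the pixels where two same-sized bitmaps differ.
function pixelDistance(a, b) {
  let diff = 0;
  for (let row = 0; row < a.length; row++) {
    for (let col = 0; col < a[row].length; col++) {
      if (a[row][col] !== b[row][col]) diff++;
    }
  }
  return diff;
}

// Return the template character whose shape is closest to the bitmap.
function recognizeGlyph(bitmap) {
  let best = null;
  let bestDistance = Infinity;
  for (const [char, template] of Object.entries(TEMPLATES)) {
    const distance = pixelDistance(bitmap, template);
    if (distance < bestDistance) {
      best = char;
      bestDistance = distance;
    }
  }
  return best;
}

// A slightly noisy "T": one stray pixel on the left of the fourth row.
const noisyT = ["###", ".#.", ".#.", "##.", ".#."];
console.log(recognizeGlyph(noisyT)); // prints "T"
```

The noisy bitmap differs from the "T" template by one pixel, from "I" by three, and from "L" by nine, so "T" wins.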
We are going to use the tesseract.js (OCR) package to extract the words from an image, together with a file containing the data (character shapes) that it uses for character recognition.
To run tesseract.js properly, you should open the .html file that we are going to make through a server, not directly from the local file system.
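If you do not have a web server at hand, a few lines of Node.js are usually enough for testing. The sketch below is only one possible way to do it, assuming Node.js is installed; the function names, the content-type table, and port 8000 are illustrative choices and not part of tesseract.js.

```javascript
// Minimal static file server sketch using only Node.js built-in modules.
const http = require("http");
const fs = require("fs");
const path = require("path");

// Content types for the kinds of files used in this tutorial.
const CONTENT_TYPES = {
  ".html": "text/html",
  ".js": "application/javascript",
  ".wasm": "application/wasm",
  ".png": "image/png",
  ".jpg": "image/jpeg",
};

// Map a URL path such as "/js/worker.min.js" to a file inside rootDir.
// Returns null when the path is malformed or would escape rootDir.
function resolveRequestPath(rootDir, urlPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath.split("?")[0]);
  } catch (err) {
    return null; // malformed percent-encoding
  }
  if (decoded.includes("\0")) return null; // reject embedded null bytes
  const root = path.resolve(rootDir);
  const filePath = path.join(root, decoded === "/" ? "index.html" : decoded);
  const insideRoot = filePath === root || filePath.startsWith(root + path.sep);
  return insideRoot ? filePath : null;
}

// Build (but do not start) a server that serves files from rootDir.
function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveRequestPath(rootDir, req.url);
    if (filePath === null) {
      res.writeHead(400);
      res.end("Bad request");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      const type = CONTENT_TYPES[path.extname(filePath)] || "application/octet-stream";
      res.writeHead(200, { "Content-Type": type });
      res.end(data);
    });
  });
}

// To start it from the folder that contains index.html:
//   createStaticServer(".").listen(8000);
// then open http://localhost:8000/ in the browser.
```

Any other static server works just as well; the point is only that the page is loaded over http:// rather than file://, because browsers typically restrict web workers and file loading on pages opened from disk.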

1. Create an HTML file named index.html:
        <!-- the tesseract.js script file -->
        <script src="js/tesseract.min.js"></script>

        // tesseract.js options: where to load the worker, the language data, and the core from
        workerPath: "js/worker.min.js",
        langPath: "langs-folder/",
        corePath: "js/tesseract-core.wasm.js",


                   // alert(;
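For reference, here is how these options might fit into a complete call. This is a minimal sketch, assuming the tesseract.js v2-style worker API (createWorker, load, loadLanguage, initialize, recognize, terminate); newer releases of the library changed these calls, so check the version you downloaded. The function name extractText, the language code "eng", and the image name test-image.png are placeholders, not taken from the original script.

```javascript
// Sketch only: assumes the tesseract.js v2-style worker API.
// `tesseract` is the global `Tesseract` object exposed by js/tesseract.min.js;
// it is passed in as a parameter so the function has no hidden dependencies.
async function extractText(tesseract, image, lang = "eng") {
  // `await` is harmless if createWorker returns the worker directly (v2)
  // and required if it returns a promise (later versions).
  const worker = await tesseract.createWorker({
    workerPath: "js/worker.min.js",
    langPath: "langs-folder/",
    corePath: "js/tesseract-core.wasm.js",
  });
  try {
    await worker.load();              // load the core scripts
    await worker.loadLanguage(lang);  // fetch the language data from langPath
    await worker.initialize(lang);    // initialize the engine with that language
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    await worker.terminate();         // always free the worker
  }
}

// Browser usage (hypothetical image name); skipped where the library is not loaded.
if (typeof Tesseract !== "undefined") {
  extractText(Tesseract, "test-image.png").then((text) => console.log(text));
}
```

Placed in a script tag after the one that loads js/tesseract.min.js, this logs the recognized text to the console once recognition finishes.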


2. Create a directory named js in your root folder and put the JS files in it:
Download the files:
3. Create a directory named 'langs-folder' and download the data files into it:
The global lang directory:
4. We are going to use an image for the test:

Execution:

[Image: Untitled-1]

I used the same script, with more advanced features, in my online tool. Try it: character count

Discussion (3)

MeLad Kamari

Why do developers use this?
It does not support many languages.

geekalaa Author • Edited

Because I think it's the easiest way to extract text from an image without using much RAM or processing power.

geekalaa Author

Good point there. I just added the link for the global lang data: