Welcome to my first post here! Over the past couple of years I have read many posts on this website, and I find it very useful to share information with others and hear different opinions on many tech subjects.
My name is Alaa. I am a web developer and a 'Webmaster' graduate of the Faculty of Economics and Management of Nabeul, and a second-year computer science engineering student specializing in web technologies at the Private School of Engineering and Technologies (Esprit).
What is OCR? Well, OCR (optical character recognition) is a technique for extracting text from an image: the algorithm is taught what each character looks like from a pixel perspective, so it can recognize those shapes in a picture.
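To make "the shape of a character in pixels" concrete, here is a toy sketch of the simplest form of the idea, template matching: each known character is stored as a tiny bitmap, and an unknown glyph gets the label of the template that differs from it in the fewest pixels. The 3×3 bitmaps and the names `TEMPLATES`, `pixelDistance` and `recognizeGlyph` are hypothetical. This is not how Tesseract works internally (modern Tesseract versions recognize text with an LSTM neural network); it only illustrates the principle.

```javascript
// Toy illustration of pixel-based character recognition (template matching).
// NOT Tesseract's method; the 3x3 bitmaps below are hypothetical examples.

// Each template is a 3x3 grid of pixels: 1 = ink, 0 = background.
const TEMPLATES = {
  I: [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
  ],
  L: [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
  ],
  T: [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
  ],
};

// Count how many pixels differ between two equally sized bitmaps.
function pixelDistance(a, b) {
  let diff = 0;
  for (let row = 0; row < a.length; row++) {
    for (let col = 0; col < a[row].length; col++) {
      if (a[row][col] !== b[row][col]) diff++;
    }
  }
  return diff;
}

// Return the character whose template is closest to the unknown glyph.
function recognizeGlyph(glyph, templates = TEMPLATES) {
  let bestChar = null;
  let bestDistance = Infinity;
  for (const [char, template] of Object.entries(templates)) {
    const distance = pixelDistance(glyph, template);
    if (distance < bestDistance) {
      bestDistance = distance;
      bestChar = char;
    }
  }
  return bestChar;
}

// A "T" with its bottom pixel missing is still closest to the "T" template.
const noisyT = [
  [1, 1, 1],
  [0, 1, 0],
  [0, 0, 0],
];
console.log(recognizeGlyph(noisyT)); // prints "T"
```

Real OCR engines add many steps on top of this (finding text lines, splitting characters, handling fonts and sizes), which is why we rely on a library instead.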
We are going to use the tesseract.js OCR package to extract the words from an image, together with a language data file that contains the character-shape data used for recognition.
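A minimal sketch of that call, assuming tesseract.js v2 or later, where `Tesseract.recognize(image, lang, options)` returns a promise resolving to `{ data: { text, ... } }` and accepts worker options such as `langPath` and `logger`. The wrapper name `recognizeImage`, the `"eng"` language code and the `langs-folder` path (the folder created in step 3) are assumptions about how the page is set up; the library object is passed in as a parameter so the function can also be tried with a stub.

```javascript
// Minimal sketch, assuming tesseract.js v2+: recognize() resolves to
// { data: { text, ... } } and accepts worker options like langPath and logger.
// `tesseract` is the library object (the global `Tesseract` when loaded with a
// <script> tag); passing it in keeps this function easy to test.
async function recognizeImage(tesseract, image, langPath = "langs-folder") {
  const result = await tesseract.recognize(image, "eng", {
    // Load the language data from our own folder instead of the default CDN.
    langPath,
    // Progress messages look like { status: "recognizing text", progress: 0.5 }.
    logger: (message) => console.log(message.status, message.progress),
  });
  return result.data.text;
}
```

In the page it could be called as `recognizeImage(Tesseract, "OCR.png").then((text) => console.log(text));`. By default tesseract.js looks for a gzip-compressed `eng.traineddata.gz` under `langPath`; if the downloaded file is the uncompressed `eng.traineddata`, the `gzip: false` worker option is likely needed.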
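To run tesseract.js properly, serve the .html file we are going to make from a web server (a local one is fine) instead of opening it directly from disk (file://), because browsers usually block some of the requests tesseract.js makes, such as loading the language data, on file:// pages.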
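If you do not already have a web server at hand, a tiny static server is enough. The sketch below uses only Node.js core modules (`http`, `fs`, `path`); the function names, the MIME-type map and port 8080 are illustrative choices, not part of the original project. It serves files from a root folder, falls back to `index.html` for `/`, and refuses paths that try to escape the root with `..`.

```javascript
// Minimal static file server sketch (Node.js core modules only) for serving
// index.html, js/, langs-folder/ and OCR.png over http://localhost.
const http = require("http");
const fs = require("fs");
const path = require("path");

// Hypothetical extension-to-MIME map covering the files used in this demo.
const CONTENT_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".png": "image/png",
  ".gz": "application/gzip",
  ".traineddata": "application/octet-stream",
  ".wasm": "application/wasm",
};

function contentTypeFor(filePath) {
  return CONTENT_TYPES[path.extname(filePath).toLowerCase()] || "application/octet-stream";
}

// Map a URL path onto a file under rootDir; return null if it would escape rootDir.
function resolveSafePath(rootDir, urlPath) {
  const root = path.resolve(rootDir);
  const relative = decodeURIComponent(urlPath.split("?")[0]);
  const target = path.resolve(root, "." + (relative === "/" ? "/index.html" : relative));
  return target.startsWith(root + path.sep) ? target : null;
}

function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    let filePath = null;
    try {
      filePath = resolveSafePath(rootDir, req.url);
    } catch (err) {
      filePath = null; // malformed percent-encoding in the URL
    }
    if (!filePath) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        // Missing file (or a directory): report 404.
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, { "Content-Type": contentTypeFor(filePath) });
      res.end(data);
    });
  });
}

// Usage: uncomment, run `node server.js`, then open http://localhost:8080/
// createStaticServer(__dirname).listen(8080);
```

Save it as, say, `server.js` in the project root. Any other static server you already use works just as well; the only requirement is that the page is loaded over http:// rather than file://.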
1. Create an HTML file named index.html.
2. Create a directory named js in your root and put the JS files in it.
Download the files: https://github.com/geekalaa/OCRJS/tree/main/js
3. Create a directory named 'langs-folder' and download the data files into it: https://github.com/geekalaa/OCRJS/tree/main/langs-folder
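The global Tesseract language repository: https://github.com/tesseract-ocr/langdata (it holds the source training data; ready-made .traineddata files for other languages are published in the tesseract-ocr/tessdata repository)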
4. We are going to use this image for the test: https://github.com/geekalaa/OCRJS/blob/main/OCR.png
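With the files from steps 1 to 4 in place, the page script mainly has to run the recognition on `OCR.png` and show the progress and the result. The sketch below assumes tesseract.js v2+ is loaded from the js folder with a `<script>` tag (which exposes a global `Tesseract`), that the language data lives in `langs-folder`, and that `index.html` contains two hypothetical elements, `<span id="progress">` and `<pre id="result">`; the helper names `formatProgress` and `runOcr` are also illustrative.

```javascript
// Page glue sketch. Assumptions: tesseract.js v2+ exposes a global `Tesseract`,
// language data is in langs-folder/, and the page has hypothetical elements
// <span id="progress"> and <pre id="result">.

// Turn a tesseract.js logger message into a short human-readable line.
function formatProgress({ status, progress }) {
  return `${status}: ${Math.round((progress || 0) * 100)}%`;
}

// Recognize `imageUrl` and report progress and text through callbacks,
// so this logic does not depend on a particular DOM layout.
async function runOcr(tesseract, imageUrl, onProgress, onText) {
  const { data } = await tesseract.recognize(imageUrl, "eng", {
    langPath: "langs-folder",
    logger: (message) => onProgress(formatProgress(message)),
  });
  onText(data.text.trim());
  return data.text;
}

// Browser wiring: only runs when a DOM and the Tesseract global exist.
if (typeof document !== "undefined" && typeof Tesseract !== "undefined") {
  runOcr(
    Tesseract,
    "OCR.png",
    (line) => {
      document.getElementById("progress").textContent = line;
    },
    (text) => {
      document.getElementById("result").textContent = text;
    },
  );
}
```

Keeping the DOM updates in callbacks means `runOcr` does not care how the page is laid out, so the same function could feed a progress bar or a textarea instead.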
I used the same script, with more advanced features, in my online tool. Try it: character count