DEV Community

OuterSpaceHobo

How I made a free OCR Chrome extension for Japanese language learners.

In this article I talk about how I decided to create my first open source project after completing JS & React courses.

For some years I have been casually learning Japanese, and in my studies I have heavily preferred an immersive approach.

But Japanese (as well as Chinese and, partially, Korean) has a pretty high bar for casual reading, because one has to learn not only two alphabets but also memorize at least a few hundred kanji.

The regular need to switch from reading light text or manga (or watching YouTube, dorama etc.) to a dictionary tab just to look up one unknown or forgotten kanji was really ruining the experience.

I discovered some awesome resources like Yomichan (highly recommend checking it out), which recognizes text under the mouse cursor and provides translation and annotation from the given dictionaries.

Unfortunately, none of the free options used OCR, so there was no way to check an arbitrary picture, be it a manga panel or subtitles in a paused video, right from the same tab. So I decided to make one myself!

ScanLingua (project name proudly accepted from ChatGPT suggestions).

I decided to start with a Chrome extension and used the knowledge gained from self-study and Jad Joubran's JS & React courses (cool courses btw).

This is my first, experimental, and kind of portfolio project, so I tested different approaches and thus it may seem inconsistent, but that's the point.

Project structure.

The project features two main parts, servise_worker and content_script, which are separate React apps.

servise_worker is responsible for the manifest, fetch requests and rendering the main popup, while content_script provides the logic and renders the overlay & popup within the current (active) tab.

content_script is bundled into a single file inside servise_worker/public.

The apps exchange the necessary data via Chrome messaging with unique IDs, like so:

// content_script

// Pending requests: requestId -> resolve. An entry is resolved elsewhere,
// when a "your-translation" message with the same requestId arrives.
const requests = new Map()

async function requestTranslation () {
    const requestId = getRandomInt(100000) // helper defined elsewhere
    // Register the resolver before sending, so an early reply is not missed.
    const result = new Promise((resolve, reject) => {
        requests.set(requestId, resolve)
        setTimeout(() => reject(new Error("Translation timed out")), 60 * 1000) // 60 sec
    })
    await chrome.runtime.sendMessage({type: "request-translation", requestId})
    return result
}

// service_worker

chrome.runtime.onMessage.addListener(gotMessage)

// onMessage listeners receive (message, sender, sendResponse), in that order.
async function gotMessage(request, sender, sendResponse) {
    if (request.type === "request-translation") {
        // visionText is defined elsewhere in the service worker.
        const rawTranslation = await translateZone(visionText)
        const translation = await unEscape(rawTranslation)
        await sendTranslation(request.requestId, translation)
    }
}

async function sendTranslation(requestId, translation) {
    const [tab] = await chrome.tabs.query({active: true, lastFocusedWindow: true})
    await chrome.tabs.sendMessage(tab.id, {type: "your-translation", requestId, translation})
}
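The request/reply correlation used in this exchange can be isolated from the Chrome APIs. What follows is a minimal sketch, not the extension's actual code: createRequester, handleReply and the injected send function are hypothetical names, and in the extension send would presumably wrap chrome.runtime.sendMessage. It uses a counter instead of a random ID and clears the timeout once the reply arrives.

```javascript
// Hypothetical helper: correlates outgoing requests with later replies by ID.
// `send` is injected so the sketch runs without the chrome.* APIs; in an
// extension it could be (msg) => chrome.runtime.sendMessage(msg).
function createRequester(send, timeoutMs = 60 * 1000) {
    const pending = new Map() // requestId -> { resolve, timer }
    let nextId = 0

    function request(type) {
        const requestId = ++nextId // a counter avoids random ID collisions
        const result = new Promise((resolve, reject) => {
            const timer = setTimeout(() => {
                pending.delete(requestId)
                reject(new Error("Request " + requestId + " timed out"))
            }, timeoutMs)
            pending.set(requestId, { resolve, timer })
        })
        send({ type, requestId })
        return result
    }

    // Call this from the message listener when a reply arrives.
    function handleReply(message) {
        const entry = pending.get(message.requestId)
        if (!entry) return false // unknown or already timed-out request
        clearTimeout(entry.timer)
        pending.delete(message.requestId)
        entry.resolve(message.translation)
        return true
    }

    return { request, handleReply, pending }
}
```

Keeping the timer next to the resolver means a finished request leaves nothing behind: neither a map entry nor a timeout that fires a minute later.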

Active tab overlay and canvas.

I started implementing the crop zone selection by creating a main overlay div with five static children (top, bottom, right, left, zone).

Then I passed the mousedown and mousemove coords (x1, y1, x2, y2) to the sides of the child divs, so that they change dynamically.
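The geometry behind those five children can be sketched as a pure function. This is an illustration under my own assumptions, not the code from the repo: computeOverlayRects and its parameter names are hypothetical, and the viewport size is passed in explicitly so the function can run outside a browser.

```javascript
// Hypothetical helper: given the drag start (x1, y1), the current pointer
// (x2, y2) and the viewport size, return CSS boxes for the five children.
function computeOverlayRects(x1, y1, x2, y2, viewportWidth, viewportHeight) {
    // Normalize so the selection works when dragging in any direction.
    const left = Math.min(x1, x2)
    const top = Math.min(y1, y2)
    const width = Math.abs(x2 - x1)
    const height = Math.abs(y2 - y1)
    const right = left + width
    const bottom = top + height

    return {
        zone: { left, top, width, height },
        // Full-width shaded bands above and below the zone.
        top: { left: 0, top: 0, width: viewportWidth, height: top },
        bottom: { left: 0, top: bottom, width: viewportWidth, height: viewportHeight - bottom },
        // Side bands only span the zone's height, so nothing overlaps.
        left: { left: 0, top, width: left, height },
        right: { left: right, top, width: viewportWidth - right, height },
    }
}
```

In a mousemove handler each box would then be written to the matching div's style (left, top, width and height in px).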

To capture the selected zone on mouseup, I used chrome.tabs.captureVisibleTab and drew the screenshot onto an OffscreenCanvas, cropped to the zone div coords.

I rushed to convert the data to base64 and finally call the OCR API, but the results were incorrect. Visualizing the captured zone image showed why: the zone was off and looked zoomed in.

At first it was frustrating, but thanks to that I discovered Window.devicePixelRatio along with the source of the problem: a display ratio of 2 on my Mac. I'm kind of glad I found this early, because had my display ratio been 1, the bug would have slipped under the radar only to appear later on other displays.
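The fix boils down to scaling the zone from CSS pixels to device pixels before cropping. Here is a minimal sketch of that step, assuming the screenshot is captured at device resolution; toDevicePixels and the bitmap variable in the comment are hypothetical names, not taken from the repo.

```javascript
// Hypothetical helper: convert a zone measured in CSS pixels into the
// device-pixel rectangle to crop from the screenshot.
function toDevicePixels(zone, devicePixelRatio) {
    return {
        sx: Math.round(zone.left * devicePixelRatio),
        sy: Math.round(zone.top * devicePixelRatio),
        sw: Math.round(zone.width * devicePixelRatio),
        sh: Math.round(zone.height * devicePixelRatio),
    }
}

// In the browser the result would feed the 9-argument drawImage form, e.g.:
//   const { sx, sy, sw, sh } = toDevicePixels(zone, window.devicePixelRatio)
//   const canvas = new OffscreenCanvas(sw, sh)
//   canvas.getContext("2d").drawImage(bitmap, sx, sy, sw, sh, 0, 0, sw, sh)
```

With a ratio of 1 the rectangle is unchanged, which is why the bug stays invisible on such displays.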

Styling.

Another problem was the need to isolate the active tab popup styles from the page it was injected into. After some research I had to choose between creating an iframe and manually overriding conflicting styles.

Considering the active tab popup hosts only one button and some styled text, an iframe looked like overkill, so I went with the override option. I also tried using React Styled Components, since it is mandatory to bundle content_script into one .js file.

Used APIs.

For OCR & translation, ScanLingua uses the Google Vision and Translation APIs.
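As an illustration of the OCR call, here is a rough sketch of how a request to the Vision API's images:annotate REST endpoint could be assembled. It is not the extension's actual request code: buildVisionRequest is a hypothetical helper, and the TEXT_DETECTION feature type is my assumption about which feature fits this use case.

```javascript
// Hypothetical helper: build the fetch arguments for a Vision API
// images:annotate call. `apiKey` is the user's own key.
function buildVisionRequest(base64Image, apiKey) {
    // If the image is a data URL, strip the prefix: the API expects bare base64.
    const content = base64Image.replace(/^data:image\/\w+;base64,/, "")
    return {
        url: "https://vision.googleapis.com/v1/images:annotate?key=" + encodeURIComponent(apiKey),
        options: {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({
                requests: [{ image: { content }, features: [{ type: "TEXT_DETECTION" }] }],
            }),
        },
    }
}

// Usage (in an environment with fetch, inside an async function):
//   const { url, options } = buildVisionRequest(dataUrl, apiKey)
//   const json = await (await fetch(url, options)).json()
//   const text = json.responses[0].fullTextAnnotation?.text
```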

These APIs are not free, but they provide 1000 free Vision requests and 500 000 free translated characters per month, which should suffice for personal use. One just has to create an appropriate API key and keep an eye on API usage so as not to exceed the free monthly limits (key setup instructions are provided).

To help the user track key usage, a simple hotkey counter is placed on the extension's Home tab.
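A usage counter of this kind could look roughly like the sketch below. It is only an illustration: recordUsage and the shape of the usage object are hypothetical, the reset on calendar-month change is an assumption about how the free allowance renews, and string length only approximates how the Translation API counts characters.

```javascript
// Free monthly allowances quoted in the article.
const FREE_LIMITS = { visionRequests: 1000, translatedChars: 500000 }

// Hypothetical helper: return an updated usage record, resetting the
// counters when the calendar month changes. `usage` is a plain object so it
// could be persisted anywhere (for example chrome.storage.local).
function recordUsage(usage, translatedText, now = new Date()) {
    const month = now.getFullYear() + "-" + String(now.getMonth() + 1).padStart(2, "0")
    const base = usage && usage.month === month
        ? usage
        : { month, visionRequests: 0, translatedChars: 0 }
    const next = {
        month,
        visionRequests: base.visionRequests + 1,
        translatedChars: base.translatedChars + translatedText.length,
    }
    return {
        ...next,
        overLimit:
            next.visionRequests > FREE_LIMITS.visionRequests ||
            next.translatedChars > FREE_LIMITS.translatedChars,
    }
}
```

Each scan would call recordUsage once with the text sent for translation and store the returned record for the next call.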

For kanji annotation I picked Jotoba, an awesome free open source dictionary (it also takes data from various resources, such as KanjiDic, Jisho and many others; since the Jotoba API is free, its availability is not guaranteed).

Next steps.

It is still raw & buggy, but already usable overall. A proof-of-concept release has been pushed to the store:

ScanLingua - Chrome Web Store

Scan, recognize and translate parts of your screen.

chrome.google.com

Repo:

OuterSpaceHobo / ScanLingua

Free open source chrome extension for immersive japanese language learners.

ScanLingua provides OCR & translation for more than 100 languages and kanji annotation of selected screen areas.

It is great for checking subtitles, manga panels or any Japanese text.

This project is heavily inspired by Yomichan and uses the API of the awesome Japanese dictionary Jotoba for kanji annotation (Jotoba takes data from various resources, you can check them out at https://jotoba.de/about).

Demo video: https://www.youtube.com/watch?v=wZ3lVvQ6_u8

For text recognition and translation ScanLingua uses the Google Vision and Translation APIs.

These APIs are not free, but they provide 1000 free Vision requests and 500 000 free translated characters per month, which should suffice for personal use. An API key is NOT provided. Users need to create an appropriate API key.

To-Do list: work on design; fix errors and expand existing functionality; add annotation for chinese and korean languages;

Feedback is highly appreciated!

How to get API key?

  1. Register at https://console.cloud.google.com/ and create a project, in project…

I plan to work on the UI, fix errors and expand the existing functionality.

Thank you for reading. Feedback is highly appreciated!
