DEV Community

Using JavaScript to Preprocess Images for OCR

Mathew Chan
I write about Python, NLP, and languages.
Updated on ・7 min read

Preprocessing and OCR

Preprocessing transforms an image to make it more OCR-friendly. OCR engines are usually trained on image data resembling print, so the closer the text in your image is to print, the better the OCR will perform. In this post, we will apply several preprocessing methods to improve our OCR accuracy.

Methods of Preprocessing

  • Binarization
  • Skew Correction
  • Noise Removal
  • Thinning and Skeletonization

You can find detailed information on each of these methods in this article. Here we will focus on working with dialogue text from video games.

Quick Setup

In my last post, I talked about how to snip screenshots from videos and run OCR in the browser with tesseract.js. We can reuse our code for this demonstration.

To get started, you can download the HTML file and open it in your browser. It will prompt you to select a window to share. After that, click and drag over your video to snip an image for OCR.

Binarization

To binarize an image means to convert each of its pixels to either black or white. To decide which, we define a threshold value. Pixels whose grayscale value is greater than or equal to the threshold become white; all others become black.

Applying a threshold filter removes a lot of unwanted information from the image.
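
As a minimal sketch of the idea, the snippet below thresholds a tiny RGBA buffer of three pixels. The helper name binarizePixels and the sample values are illustrative assumptions, not part of the original project; the grayscale weights are the same ones used by the p5.js filter in this section.

```javascript
// Illustrative helper (hypothetical name): binarize an RGBA byte array in place.
function binarizePixels(pixels, level = 0.5) {
  const thresh = Math.floor(level * 255); // 0.5 -> 127
  for (let i = 0; i < pixels.length; i += 4) {
    // Weighted grayscale (luma) value of the pixel
    const gray = 0.2126 * pixels[i] + 0.7152 * pixels[i + 1] + 0.0722 * pixels[i + 2];
    const val = gray >= thresh ? 255 : 0;
    pixels[i] = pixels[i + 1] = pixels[i + 2] = val; // alpha stays untouched
  }
  return pixels;
}

// Three sample pixels: light gray, dark gray, pure red (all fully opaque)
const samplePixels = new Uint8ClampedArray([
  200, 200, 200, 255,
  60, 60, 60, 255,
  255, 0, 0, 255,
]);
binarizePixels(samplePixels);
// samplePixels now holds one white pixel followed by two black pixels
```

Pure red ends up black because its grayscale value is only about 54, well below the threshold of 127.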

Let's add two functions: preprocessImage and thresholdFilter. preprocessImage takes the canvas and reads its pixel data from the canvas context with ctx.getImageData().data; thresholdFilter takes that pixel array and a threshold level. For every pixel we calculate its grayscale value from its [r,g,b] values and compare it to our threshold level to set it to either black or white.

  function preprocessImage(canvas) {
    const processedImageData = canvas.getContext('2d').getImageData(0, 0, canvas.width, canvas.height);
    // Pass the level positionally; writing level=0.5 here would assign to an undeclared global.
    thresholdFilter(processedImageData.data, 0.5);
    return processedImageData;
  }

  // from https://github.com/processing/p5.js/blob/main/src/image/filters.js
  function thresholdFilter(pixels, level) {
    if (level === undefined) {
      level = 0.5;
    }
    const thresh = Math.floor(level * 255);
    for (let i = 0; i < pixels.length; i += 4) {
      const r = pixels[i];
      const g = pixels[i + 1];
      const b = pixels[i + 2];
      const gray = 0.2126 * r + 0.7152 * g + 0.0722 * b;
      let val;
      if (gray >= thresh) {
        val = 255;
      } else {
        val = 0;
      }
      pixels[i] = pixels[i + 1] = pixels[i + 2] = val;
    }
  }

Then call our new function in the VideoToCroppedImage function after we are done snipping the image with drawImage. We can apply the processed image to the canvas with putImageData.

function VideoToCroppedImage({width, height, x, y}) {
  ..
  ctx2.drawImage(videoElement, x*aspectRatioX, y*aspectRatioY, width*aspectRatioX, height*aspectRatioY, 0, 0, cv2.width, cv2.height);
  ctx2.putImageData(preprocessImage(cv2), 0, 0);
  const dataURI = cv2.toDataURL('image/jpeg');
  recognize_image(dataURI);
}

Here's what it looks like before and after the threshold filter.

original

OCR Results:

Screenshot 2020-11-13 at 1.06.25 PM

The filter removed the gray patterns behind the text. Now our OCR result has one fewer error!


Here's a more challenging image.

sample to work

comparison copy 2

OCR Results:

Screenshot 2020-11-13 at 12.57.04 PM

As you can see, the background strokes are creating noise. Simply applying the threshold filter would worsen the OCR result.

Let's find out how to remove noise.

Noise Removal

We can remove patches of high intensity in an image by blurring it. Box blur and Gaussian blur are two of the many blurring methods.
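
To make the effect concrete, here is a minimal sketch of a box blur over a single row of grayscale values. The helper name boxBlur1D and the sample row are assumptions for illustration; the filter used later in this post is a two-dimensional Gaussian blur, which weights nearby pixels more heavily than distant ones.

```javascript
// Hypothetical helper: box blur over a 1-D array of grayscale values.
// Each output value is the mean of the input values within `radius` of it;
// positions outside the array are simply skipped.
function boxBlur1D(values, radius = 1) {
  const out = new Array(values.length);
  for (let i = 0; i < values.length; i++) {
    let sum = 0;
    let count = 0;
    for (let j = i - radius; j <= i + radius; j++) {
      if (j >= 0 && j < values.length) {
        sum += values[j];
        count++;
      }
    }
    out[i] = Math.round(sum / count);
  }
  return out;
}

const noisyRow = [0, 0, 255, 0, 0]; // one isolated bright pixel
const blurredRow = boxBlur1D(noisyRow);
// blurredRow is [0, 85, 85, 85, 0]
```

The isolated spike of 255 is flattened to 85, which is low enough to fall below a mid-level threshold.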

Insert two helper functions getARGB and setPixels.

function getARGB (data, i) {
  const offset = i * 4;
  return (
    ((data[offset + 3] << 24) & 0xff000000) |
    ((data[offset] << 16) & 0x00ff0000) |
    ((data[offset + 1] << 8) & 0x0000ff00) |
    (data[offset + 2] & 0x000000ff)
  );
};

function setPixels (pixels, data) {
  let offset = 0;
  for (let i = 0, al = pixels.length; i < al; i++) {
    offset = i * 4;
    pixels[offset + 0] = (data[i] & 0x00ff0000) >>> 16;
    pixels[offset + 1] = (data[i] & 0x0000ff00) >>> 8;
    pixels[offset + 2] = data[i] & 0x000000ff;
    pixels[offset + 3] = (data[i] & 0xff000000) >>> 24;
  }
};
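
The canvas stores each pixel as four separate bytes in red, green, blue, alpha order, while the filters below work on one packed 32-bit integer per pixel in alpha, red, green, blue order. The sketch below shows the packing for a single pixel; packARGB and unpackARGB are hypothetical names used only for this illustration.

```javascript
// Hypothetical single-pixel versions of the packing helpers.
function packARGB(r, g, b, a) {
  return (a << 24) | (r << 16) | (g << 8) | b;
}

function unpackARGB(argb) {
  return {
    a: (argb >>> 24) & 0xff,
    r: (argb >>> 16) & 0xff,
    g: (argb >>> 8) & 0xff,
    b: argb & 0xff,
  };
}

const packed = packARGB(0x12, 0x34, 0x56, 0xff);
// Bitwise operators work on signed 32-bit integers, so a fully opaque pixel
// packs to a negative number; `>>> 0` shows the unsigned bit pattern.
const packedHex = (packed >>> 0).toString(16); // "ff123456"
const unpacked = unpackARGB(packed); // { a: 255, r: 18, g: 52, b: 86 }
```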

For the Gaussian blur, add two functions buildBlurKernel and blurARGB.

// internal kernel stuff for the gaussian blur filter
  let blurRadius;
  let blurKernelSize;
  let blurKernel;
  let blurMult;

  // from https://github.com/processing/p5.js/blob/main/src/image/filters.js
  function buildBlurKernel(r) {
  let radius = (r * 3.5) | 0;
  radius = radius < 1 ? 1 : radius < 248 ? radius : 248;

  if (blurRadius !== radius) {
    blurRadius = radius;
    blurKernelSize = (1 + blurRadius) << 1;
    blurKernel = new Int32Array(blurKernelSize);
    blurMult = new Array(blurKernelSize);
    for (let l = 0; l < blurKernelSize; l++) {
      blurMult[l] = new Int32Array(256);
    }

    let bk, bki;
    let bm, bmi;

    for (let i = 1, radiusi = radius - 1; i < radius; i++) {
      blurKernel[radius + i] = blurKernel[radiusi] = bki = radiusi * radiusi;
      bm = blurMult[radius + i];
      bmi = blurMult[radiusi--];
      for (let j = 0; j < 256; j++) {
        bm[j] = bmi[j] = bki * j;
      }
    }
    bk = blurKernel[radius] = radius * radius;
    bm = blurMult[radius];

    for (let k = 0; k < 256; k++) {
      bm[k] = bk * k;
    }
  }
}

// from https://github.com/processing/p5.js/blob/main/src/image/filters.js
function blurARGB(pixels, canvas, radius) {
  const width = canvas.width;
  const height = canvas.height;
  const numPackedPixels = width * height;
  const argb = new Int32Array(numPackedPixels);
  for (let j = 0; j < numPackedPixels; j++) {
    argb[j] = getARGB(pixels, j);
  }
  let sum, cr, cg, cb, ca;
  let read, ri, ym, ymi, bk0;
  const a2 = new Int32Array(numPackedPixels);
  const r2 = new Int32Array(numPackedPixels);
  const g2 = new Int32Array(numPackedPixels);
  const b2 = new Int32Array(numPackedPixels);
  let yi = 0;
  buildBlurKernel(radius);
  let x, y, i;
  let bm;
  for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
      cb = cg = cr = ca = sum = 0;
      read = x - blurRadius;
      if (read < 0) {
        bk0 = -read;
        read = 0;
      } else {
        if (read >= width) {
          break;
        }
        bk0 = 0;
      }
      for (i = bk0; i < blurKernelSize; i++) {
        if (read >= width) {
          break;
        }
        const c = argb[read + yi];
        bm = blurMult[i];
        ca += bm[(c & -16777216) >>> 24];
        cr += bm[(c & 16711680) >> 16];
        cg += bm[(c & 65280) >> 8];
        cb += bm[c & 255];
        sum += blurKernel[i];
        read++;
      }
      ri = yi + x;
      a2[ri] = ca / sum;
      r2[ri] = cr / sum;
      g2[ri] = cg / sum;
      b2[ri] = cb / sum;
    }
    yi += width;
  }
  yi = 0;
  ym = -blurRadius;
  ymi = ym * width;
  for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
      cb = cg = cr = ca = sum = 0;
      if (ym < 0) {
        bk0 = ri = -ym;
        read = x;
      } else {
        if (ym >= height) {
          break;
        }
        bk0 = 0;
        ri = ym;
        read = x + ymi;
      }
      for (i = bk0; i < blurKernelSize; i++) {
        if (ri >= height) {
          break;
        }
        bm = blurMult[i];
        ca += bm[a2[read]];
        cr += bm[r2[read]];
        cg += bm[g2[read]];
        cb += bm[b2[read]];
        sum += blurKernel[i];
        ri++;
        read += width;
      }
      argb[x + yi] =
        ((ca / sum) << 24) |
        ((cr / sum) << 16) |
        ((cg / sum) << 8) |
        (cb / sum);
    }
    yi += width;
    ymi += width;
    ym++;
  }
  setPixels(pixels, argb);
}

For this example, we also need two more functions:

  1. invertColors: inverts the colors of the pixels.
  2. dilate: increases light areas of the image.

function invertColors(pixels) {
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] = pixels[i] ^ 255; // Invert Red
    pixels[i+1] = pixels[i+1] ^ 255; // Invert Green
    pixels[i+2] = pixels[i+2] ^ 255; // Invert Blue
  }
}

// from https://github.com/processing/p5.js/blob/main/src/image/filters.js
function dilate(pixels, canvas) {
 let currIdx = 0;
 const maxIdx = pixels.length ? pixels.length / 4 : 0;
 const out = new Int32Array(maxIdx);
 let currRowIdx, maxRowIdx, colOrig, colOut, currLum;

 let idxRight, idxLeft, idxUp, idxDown;
 let colRight, colLeft, colUp, colDown;
 let lumRight, lumLeft, lumUp, lumDown;

 while (currIdx < maxIdx) {
   currRowIdx = currIdx;
   maxRowIdx = currIdx + canvas.width;
   while (currIdx < maxRowIdx) {
     colOrig = colOut = getARGB(pixels, currIdx);
     idxLeft = currIdx - 1;
     idxRight = currIdx + 1;
     idxUp = currIdx - canvas.width;
     idxDown = currIdx + canvas.width;

     if (idxLeft < currRowIdx) {
       idxLeft = currIdx;
     }
     if (idxRight >= maxRowIdx) {
       idxRight = currIdx;
     }
     if (idxUp < 0) {
       idxUp = 0;
     }
     if (idxDown >= maxIdx) {
       idxDown = currIdx;
     }
     colUp = getARGB(pixels, idxUp);
     colLeft = getARGB(pixels, idxLeft);
     colDown = getARGB(pixels, idxDown);
     colRight = getARGB(pixels, idxRight);

     //compute luminance
     currLum =
       77 * ((colOrig >> 16) & 0xff) +
       151 * ((colOrig >> 8) & 0xff) +
       28 * (colOrig & 0xff);
     lumLeft =
       77 * ((colLeft >> 16) & 0xff) +
       151 * ((colLeft >> 8) & 0xff) +
       28 * (colLeft & 0xff);
     lumRight =
       77 * ((colRight >> 16) & 0xff) +
       151 * ((colRight >> 8) & 0xff) +
       28 * (colRight & 0xff);
     lumUp =
       77 * ((colUp >> 16) & 0xff) +
       151 * ((colUp >> 8) & 0xff) +
       28 * (colUp & 0xff);
     lumDown =
       77 * ((colDown >> 16) & 0xff) +
       151 * ((colDown >> 8) & 0xff) +
       28 * (colDown & 0xff);

     if (lumLeft > currLum) {
       colOut = colLeft;
       currLum = lumLeft;
     }
     if (lumRight > currLum) {
       colOut = colRight;
       currLum = lumRight;
     }
     if (lumUp > currLum) {
       colOut = colUp;
       currLum = lumUp;
     }
     if (lumDown > currLum) {
       colOut = colDown;
       currLum = lumDown;
     }
     out[currIdx++] = colOut;
   }
 }
 setPixels(pixels, out);
};
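
The core of dilation is easier to see on a small grayscale grid. The following is a simplified sketch under stated assumptions: dilateGray is a hypothetical helper that works on plain gray values rather than packed ARGB pixels, and it skips out-of-bounds neighbours instead of clamping indices the way the p5.js version does.

```javascript
// Hypothetical helper: dilate a grayscale image stored as a flat row-major array.
// Every pixel becomes the brightest value among itself and its four neighbours.
function dilateGray(values, width, height) {
  const out = new Array(values.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = y * width + x;
      let best = values[i];
      if (x > 0) best = Math.max(best, values[i - 1]);
      if (x < width - 1) best = Math.max(best, values[i + 1]);
      if (y > 0) best = Math.max(best, values[i - width]);
      if (y < height - 1) best = Math.max(best, values[i + width]);
      out[i] = best;
    }
  }
  return out;
}

// A single bright pixel in the middle of a 3x3 image
const dot = [
  0, 0, 0,
  0, 255, 0,
  0, 0, 0,
];
const grown = dilateGray(dot, 3, 3);
// grown is [0, 255, 0, 255, 255, 255, 0, 255, 0]
```

The single bright pixel grows into a plus shape, which is how dilation thickens thin, light strokes.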

Finally, call these newly created filters in the preprocessing function. The order of these filters is significant, as you will see later.

function preprocessImage(canvas) {
  const processedImageData = canvas.getContext('2d').getImageData(0, 0, canvas.width, canvas.height);
  // Arguments are passed positionally; writing radius=1 or level=0.4 here
  // would assign to undeclared globals instead of naming the parameters.
  blurARGB(processedImageData.data, canvas, 1);
  dilate(processedImageData.data, canvas);
  invertColors(processedImageData.data);
  thresholdFilter(processedImageData.data, 0.4);
  return processedImageData;
}


Here's what the image looks like after every filter is applied.

comparison

OCR Results:

Screenshot 2020-11-13 at 12.56.48 PM

After a series of filters, our image looks a lot more like printed text and the result is nearly perfect!

Let's go through what each filter does to the image.

  1. Gaussian Blur: Smooth the image to remove random areas of high intensity.
  2. Dilation: Expand the light areas, which thickens and brightens the white text.
  3. Color Inversion: Make the bright text dark but the dark background light.
  4. Threshold Filter: Turn light pixels including the background into white, but turn the dark text black.
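
To trace steps 3 and 4 on concrete numbers, the sketch below follows two gray values through inversion and thresholding. The single-value helpers and the brightness values 230 and 40 are assumptions chosen for illustration, not measurements from the sample image.

```javascript
// Hypothetical single-value versions of two of the filters, for illustration only.
const invertValue = (v) => v ^ 255;
const thresholdValue = (v, level) => (v >= Math.floor(level * 255) ? 255 : 0);

const textGray = 230; // assumed brightness of the light text
const backgroundGray = 40; // assumed brightness of the dark background

// Invert, then threshold: dark text on a white background
const textOut = thresholdValue(invertValue(textGray), 0.4); // 0 (black)
const backgroundOut = thresholdValue(invertValue(backgroundGray), 0.4); // 255 (white)

// Threshold only: the text stays white on a black background
const textNoInvert = thresholdValue(textGray, 0.4); // 255
const backgroundNoInvert = thresholdValue(backgroundGray, 0.4); // 0
```

With the inversion in place, the output is black text on white, which is closer to the printed text that OCR engines are usually trained on.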

Note: There is no need to reinvent the wheel by writing your own filter algorithms. I borrowed these algorithms from the p5.js repository and this article so I can use the functions that I need without having to import an entire image processing library like OpenCV.

Wrapping it up

When it comes to OCR, data quality and data cleansing can be even more important to the end result than data training.

There are many more methods to preprocess data, and you will have to decide which ones to use. Alternatively, to expand on this project, you can employ adaptive processing or set rules such as inverting colors when the text is white or applying threshold filters only when the background is light.
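
One possible shape for such a rule is sketched below. It is only a heuristic under stated assumptions: meanLuminance and shouldInvert are hypothetical helpers, and the cutoff of 128 is an arbitrary starting point that assumes the background covers most of the snipped area.

```javascript
// Hypothetical helper: mean grayscale value of an RGBA byte array, in the range 0..255.
function meanLuminance(pixels) {
  let total = 0;
  for (let i = 0; i < pixels.length; i += 4) {
    total += 0.2126 * pixels[i] + 0.7152 * pixels[i + 1] + 0.0722 * pixels[i + 2];
  }
  return total / (pixels.length / 4);
}

// Assumed rule of thumb: a mostly dark image probably has light text on a dark
// background, so it should be inverted before thresholding.
function shouldInvert(pixels, cutoff = 128) {
  return meanLuminance(pixels) < cutoff;
}

// Three black background pixels and one white "text" pixel
const darkScene = new Uint8ClampedArray([
  0, 0, 0, 255,
  0, 0, 0, 255,
  0, 0, 0, 255,
  255, 255, 255, 255,
]);
const needsInvert = shouldInvert(darkScene); // true: the mean is about 64
```

In preprocessImage, the result could decide whether invertColors is called at all, so that images with dark text on a light background are left uninverted.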

Let me know if you found this post helpful. :)

Discussion (3)

fabsway23

i want to use this code in an react app can you give the whole code as one function which takes in an image and return one

skydev66

Thanks for the nice tutorial. Can you provide a python project that outputs same result?

Mathew Chan Author

Sure, I'll let you know when I write it up. It's going to cover OpenCV and I was also thinking on touching on OpenCV's Adaptive thresholding. Lately I feel like open-cv-python is relatively small compared to the other libraries one might need for a python project anyway.