DEV Community

Piotr Borys

A Deep Dive into Grid Removal for OCR

If you’ve ever tried to run OCR on handwritten notes, you know the struggle. Standard algorithms excel at clean, black-on-white typed text. But throw in a background grid or low-contrast pencil marks, and the accuracy plummets.

Some pages are easy for text recognition. Others are... let's say, hard.

Original page - fragment

The grid on this paper is almost the same intensity as the writing itself.

The following Python function uses OpenCV to perform "surgery" on an image: it identifies the grid, removes it without destroying the text, and then uses adaptive equalization to make the handwriting pop. Let’s break down how it works step-by-step.

1. Thresholding: Creating a Binary World

The process starts by converting the image to grayscale and applying an Adaptive Threshold.

import cv2
import numpy as np

img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
thresh = cv2.adaptiveThreshold(img, 255,
    cv2.ADAPTIVE_THRESH_MEAN_C,
    cv2.THRESH_BINARY_INV,
    41, 5)

Unlike global thresholding (which uses one value for the whole image), adaptive thresholding calculates different thresholds for small pixel neighborhoods.

Thresholded

  • Why? Scanned documents often have uneven lighting (shadows in the corners).
  • The Result: We get a "binary" image (black and white) where the grid and text are white and the background is black (THRESH_BINARY_INV). This makes it easier for the math in the next step to identify shapes.

2. Morphological Operations: Isolating the Grid

Now we need to tell the computer what a "grid line" looks like. We use Structuring Elements (kernels).

scale = 40
hor_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (img.shape[1] // scale, 1))
ver_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, img.shape[0] // scale))
mask_h = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, hor_kernel, iterations=1)
mask_v = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, ver_kernel, iterations=1)
grid_mask = cv2.add(mask_h, mask_v)

We create two long, thin rectangles: one horizontal and one vertical. By applying a Morphological Open operation, we effectively say: "Keep only the shapes that this rectangle can fit into."

  • mask_h: Only keeps horizontal lines.
  • mask_v: Only keeps vertical lines.
  • grid_mask: Adding the two masks together gives a map of exactly where the grid sits.

Grid

We can dilate the mask by a few pixels to make sure the removal won't leave any artifacts behind.

grid_mask = cv2.dilate(grid_mask, np.ones((5,5), np.uint8), iterations=1)

Grid after dilating

3. Inpainting: The "Content-Aware Fill"

Simply "deleting" the grid would leave white scars through your letters. Instead, we use Inpainting.

result_inpainted = cv2.inpaint(img, grid_mask, 3, cv2.INPAINT_TELEA)

Inpainting looks at the grid_mask (the areas we want to fix) and fills those pixels by interpolating data from the surrounding non-grid pixels. It’s like a smart "heal" tool. It removes the grid while attempting to preserve the continuity of the pen strokes that crossed over it.

Paper after grid removal

Looks like magic :)

4. CLAHE: Bringing Back the Contrast

Finally, we deal with legibility. Handwritten text is often faint. We use CLAHE (Contrast Limited Adaptive Histogram Equalization).

clahe = cv2.createCLAHE(clipLimit=2.5, tileGridSize=(8, 8))
enhanced = clahe.apply(result_inpainted)

Standard Histogram Equalization spreads out the most frequent intensity values, but it often over-amplifies noise. CLAHE operates on small tiles (8x8 pixels) and clips the contrast to prevent the background noise from becoming overwhelming.

Contrast enhanced

After this step the strokes could also be thickened by eroding with a small kernel (erosion thickens dark-on-light strokes), but with small, tight handwriting this can destroy the legibility of individual letters.

kernel = np.ones((2, 2), np.uint8)
thickened = cv2.erode(enhanced, kernel, iterations=1)

Thickened

The Result

By the end of this pipeline, the image has undergone a massive transformation:

  1. Gridlines are intelligently "healed" out of the image.
  2. Shadows from the scan are neutralized.
  3. Faint handwriting is darkened and sharpened.

This pre-processed image provides a much higher "signal-to-noise" ratio, giving your OCR engine a clear path to accurate character recognition.

Recognition itself is another story. For writing like in this example (yes, it's mine...), every self-hosted solution I tried fails miserably; only the big cloud vision models can cope. Why? Not just because the writing looks bad - it's also so tight that every self-hosted line-segmentation approach falls apart, whether you use clever engineering or let a vision model attempt it. Only the biggest models manage it ;) Of course, if your pages are written in a more... let's say, human way ;) then self-hosted recognition may well be an option. I spent several nights on this and in the end went with Gemini ;)
