Building a Web-based Document Scanner: Tackling Perspective and Shadows with OpenCV

#python #productivity #react #webdev

We've all been there: you need to scan a receipt or a signed document, but all you have is your phone. You take a photo, but it's skewed, shadowy, and looks nothing like a real "scan."

I decided to solve this by building DocuScanPro, a web-based document scanner that focuses on high-fidelity image enhancement directly in the browser.

The Problem: Perspective and Lighting
Standard mobile photos suffer from two major issues:

Perspective Distortion: The photo is rarely taken from directly above, so the page shows up as a skewed quadrilateral instead of a rectangle.
Uneven Lighting: Shadows from your hand or the room lighting darken parts of the page and make the text hard to read.

The Solution: A Python + OpenCV Pipeline
To tackle this, I built a backend processing engine using FastAPI and OpenCV. Here’s a high-level look at the pipeline:

Finding the Document
We use Canny edge detection followed by contour approximation to identify the four corners of the document.

```python
import cv2

# Simple example of findContours in OpenCV.
# `edged` is the binary edge map produced by the Canny step.
contours, _ = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# ... find the largest 4-point contour
```

Perspective Transform
Once we have the corners, we apply a perspective transform that maps them onto the corners of an upright rectangle, which "flattens" the document as if it had been photographed from directly above.

Shadow Removal & Enhancement
This is where the magic happens. Instead of a simple global threshold, I implemented Adaptive Thresholding. Rather than applying one cutoff to the whole image, it computes a separate threshold for each pixel from its local neighborhood. A shadowed region is therefore compared against its own darker background, which helps eliminate shadows while keeping the text sharp and the background a clean, "paper" white.

The Results (Before & After)

Try it out!
I’ve just launched the initial version and I’m looking for feedback on how the algorithm performs, especially in edge cases like white paper on a light table, where the page edges have very little contrast.

Check out the live tool here: https://www.docuscan.pro/

I'd love to hear your thoughts on the image processing logic or any libraries you recommend.