DEV Community

UCHIHAMADRA

I Built a 100% Browser-Based OCR That Never Uploads Your Documents — Here's How

Your medical prescriptions, passports, and bank statements deserve better than being uploaded to someone else's server.

I'm a developer from India, and I built DoctorDocs — a free OCR platform where every single byte of processing happens in your browser. No uploads. No servers. No data collection. Your documents never leave your device.

Here's why I built it, how it works under the hood, and what I learned shipping a WebAssembly-powered app to production.


The Problem That Made Me Angry

My grandmother needed to read a doctor's prescription. The handwriting was illegible — even the pharmacist squinted at it. I thought, "surely there's a free tool online for this."

There is. Dozens of them. And every single one requires you to upload your medical prescription to their server. Think about that — your name, your medications, your diagnosis, sitting on some random company's S3 bucket.

Google Lens works great, but it sends your image to Google's servers. Adobe Scan requires an account. Every "free OCR" tool I found was actually "free to upload your sensitive documents to our cloud."

I decided to build one that works differently.


The Architecture: Zero Server Processing

DoctorDocs runs on a thick-client / thin-server architecture built with Next.js 15. The "thin server" part? It just serves the static HTML/JS. All the actual OCR processing runs in your browser using WebAssembly.

Here's the pipeline:

User drops image
    ↓
OpenCV.js (WASM) → Binarization, shadow removal, contrast enhancement
    ↓
Tesseract.js (WASM) → LSTM neural network OCR, multi-threaded via Web Workers
    ↓
Custom text formatter → Noise reduction, error correction
    ↓
Monaco editor → Edit, copy, or export to PDF

Every step runs on the client's CPU. The server never sees the image.
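Conceptually, that pipeline is just a chain of async stages run in order on the client. Here's a minimal sketch of the idea; the stage functions are illustrative stand-ins, not the actual DoctorDocs code (in the real app each stage would wrap OpenCV.js, Tesseract.js, and so on):

```javascript
// Run a value through an ordered list of async stages, entirely locally.
// Each stage takes the previous stage's output and returns the next input.
async function runPipeline(input, stages) {
  let result = input;
  for (const stage of stages) {
    result = await stage(result); // no network call anywhere in the chain
  }
  return result;
}

// Trivial stand-in stages to show the shape:
const enhance = async (s) => s.trim();
const recognize = async (s) => s.toUpperCase();

runPipeline('  hello  ', [enhance, recognize]).then(console.log); // "HELLO"
```

Because every stage is just a local async function, swapping one out (say, a different preprocessor) doesn't touch the rest of the chain.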

The Magic Enhance Feature

The #1 problem with phone camera OCR is uneven lighting. You photograph a prescription under a desk lamp, and half the page is bright while the other half is in shadow.

Most tools just crank up the brightness globally. That makes the bright parts white and the dark parts... still dark.

I used OpenCV.js to run adaptive Gaussian thresholding — it breaks the image into 31×31 pixel neighborhoods and adjusts each one relative to its local area. Shadows disappear. Text becomes crisp. It's the same algorithm used in industrial document scanners, running in your browser via WebAssembly.

// This runs entirely in the browser via OpenCV.js (WASM build).
// `srcMat` comes from cv.imread(canvasElement); convert to grayscale first.
const grayMat = new cv.Mat();
const binaryMat = new cv.Mat();
cv.cvtColor(srcMat, grayMat, cv.COLOR_RGBA2GRAY);

cv.adaptiveThreshold(
  grayMat,
  binaryMat,
  255,                            // value assigned to "text" pixels
  cv.ADAPTIVE_THRESH_GAUSSIAN_C,  // weight neighbors by a Gaussian
  cv.THRESH_BINARY,
  31,  // block size: each pixel is thresholded against its 31×31 neighborhood
  15   // constant subtracted from the local weighted mean
);

Multi-Threaded OCR

Tesseract.js is powerful but slow on a single thread. So I query navigator.hardwareConcurrency to detect CPU cores and spin up a worker pool:

import { createWorker, OEM } from 'tesseract.js';

const cores = navigator.hardwareConcurrency || 2;
const workerCount = Math.min(Math.max(cores - 1, 1), 4);

// Each worker loads the eng_best LSTM model (Tesseract.js v5 API)
const worker = await createWorker('eng', OEM.LSTM_ONLY, {
  corePath: 'tesseract-core-lstm.wasm.js',
  langPath: '4.0.0_best',  // deep-learning "best" model, not the fast one
});

On a modern laptop, this cuts processing time by 60-70% compared to single-threaded OCR.
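To keep all those workers busy, the page has to be divided up somehow. One common approach is to split the image into horizontal strips with a small overlap so no text line is cut at a boundary. This is a sketch of that partitioning math, not the exact scheme DoctorDocs uses; the 20-pixel overlap is an illustrative guess:

```javascript
// Split an image of `height` px into `count` horizontal strips,
// overlapping by `overlap` px so a text line that straddles a boundary
// appears whole in at least one strip (duplicates are merged afterwards).
function makeStrips(height, count, overlap = 20) {
  const stripHeight = Math.ceil(height / count);
  const strips = [];
  for (let i = 0; i < count; i++) {
    const top = Math.max(0, i * stripHeight - overlap);
    const bottom = Math.min(height, (i + 1) * stripHeight + overlap);
    strips.push({ top, height: bottom - top });
  }
  return strips;
}

// Same worker-count heuristic as above, guarded for non-browser contexts:
const cpuCores =
  (typeof navigator !== 'undefined' && navigator.hardwareConcurrency) || 2;
const poolSize = Math.min(Math.max(cpuCores - 1, 1), 4);
console.log(makeStrips(1000, poolSize));
```

Each strip can then be handed to `worker.recognize()` on its own worker, and the results concatenated in strip order.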


144 Tool Pages, One Engine

DoctorDocs has 144 statically generated tool pages — /tools/handwriting-to-text, /tools/prescription-ocr, /tools/receipt-scanner, etc. They all use the same Tesseract.js engine under the hood.

"Isn't that cheating?" No. It's the same strategy Smallpdf and iLovePDF use: the OCR engine doesn't change, but the SEO metadata, titles, FAQs, and use-case descriptions do. Each page targets a different search keyword.

// Next.js generateStaticParams(): prerender all 144 tool pages at build time
export async function generateStaticParams() {
  return TOOLS_CATALOG.map((tool) => ({ slug: tool.slug }));
}

Every tool page auto-generates a "You Might Also Like" section linking to 6 related tools, creating an internal link mesh across all pages.
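Picking those six related tools can be a simple catalog lookup. This is a hypothetical sketch of the selection logic; the real `TOOLS_CATALOG` fields and ranking may differ:

```javascript
// Prefer tools in the same category as the current one, then fill
// the remaining slots with anything else, up to `limit` entries.
function relatedTools(catalog, current, limit = 6) {
  const others = catalog.filter((t) => t.slug !== current.slug);
  const sameCategory = others.filter((t) => t.category === current.category);
  const rest = others.filter((t) => t.category !== current.category);
  return [...sameCategory, ...rest].slice(0, limit);
}

const catalog = [
  { slug: 'prescription-ocr', category: 'ocr' },
  { slug: 'receipt-scanner', category: 'ocr' },
  { slug: 'merge-pdf', category: 'pdf' },
  { slug: 'handwriting-to-text', category: 'ocr' },
];
console.log(relatedTools(catalog, catalog[0], 2).map((t) => t.slug));
// → ['receipt-scanner', 'handwriting-to-text']
```

Because the catalog is static, this runs once per page at build time inside the same static generation pass.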


Beyond OCR: The Tools That Run Locally

DoctorDocs isn't just OCR. It includes 9 PDF utilities, 5 image editing tools, and 3 AI tools, all client-side:

PDF Tools (powered by pdf-lib + pdf.js):

  • Merge, Split, Compress, Watermark, Rotate PDFs
  • Extract/Remove pages
  • Image to PDF, PDF to JPG

Image Tools (powered by HTML Canvas):

  • Crop, Brighten, Black & White, AI Upscale
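The Canvas-based image tools boil down to mutating the raw RGBA byte array you get from `ctx.getImageData()`. As an illustration (not the actual DoctorDocs implementation), a brighten operation on that buffer looks like this; keeping it a pure function over the pixel array means it works outside the browser too:

```javascript
// `pixels` is RGBA data as produced by ctx.getImageData(...).data.
// Adds `amount` to each R, G, B channel; alpha is left untouched.
// Uint8ClampedArray clips assignments to [0, 255] automatically.
function brighten(pixels, amount) {
  const out = new Uint8ClampedArray(pixels);
  for (let i = 0; i < out.length; i += 4) {
    out[i] += amount;     // R
    out[i + 1] += amount; // G
    out[i + 2] += amount; // B
  }
  return out;
}

// One bright-red pixel, brightened by 20 (red channel clamps at 255):
console.log(brighten(new Uint8ClampedArray([250, 10, 0, 255]), 20));
// → Uint8ClampedArray [255, 30, 20, 255]
```

In the browser you'd write the result back with `ctx.putImageData()`; the same pattern covers black & white and other per-pixel filters.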

AI Tools (powered by @xenova/transformers):

  • AI Text Detector — runs a 300MB RoBERTa model in the browser via WebGL
  • AI Text Writer
  • AI Summarizer

Every single one runs without uploading anything.


The Self-Learning OCR Pipeline

This is the part I'm most excited about. DoctorDocs implements a three-tier OCR system that learns from every user interaction:

Tier 1: Gemini 2.5 Flash — When available, the image is sent to Google's Gemini API for enterprise-grade accuracy. This is opt-in and only used when API keys are configured.

Tier 2: TrOCR Vision Transformer — Runs entirely in the browser as a "shadow model." It processes the same image in the background, and its output is compared against Tier 1 for training purposes.

Tier 3: Tesseract.js — The offline fallback. Always works, even without internet.
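The three tiers can be expressed as an ordered fallback chain. Here's a minimal sketch with injected recognizer functions; the tier names and error handling are illustrative, not the production code:

```javascript
// Try each OCR tier in order; fall through to the next on failure.
// Each tier is an async function (image) -> text that throws when
// it is unavailable (e.g. no API key, model not loaded).
async function recognizeWithFallback(image, tiers) {
  let lastError;
  for (const tier of tiers) {
    try {
      return await tier(image);
    } catch (err) {
      lastError = err; // this tier failed; try the next one
    }
  }
  throw lastError ?? new Error('no OCR tier available');
}

// Example: Gemini unavailable, so the Tesseract tier answers.
const gemini = async () => { throw new Error('no API key'); };
const tesseract = async (img) => `text from ${img}`;
recognizeWithFallback('scan.png', [gemini, tesseract]).then(console.log);
// → "text from scan.png"
```

Ordering the array as `[gemini, trocr, tesseract]` gives exactly the Tier 1 → 2 → 3 behavior described above, with Tesseract.js as the always-available floor.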

When a user copies or downloads the text, the system captures the diff between the AI output and the user's corrected version. This ground truth data feeds future model training — making the OCR better over time.
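Capturing that ground truth can start as something as simple as a word-level diff between the model output and the user's corrected text. This is a naive positional sketch (a real training pipeline would want a proper alignment algorithm), with hypothetical field names:

```javascript
// Naive word-level diff: report words the user changed, by position.
// Enough to flag which tokens the OCR got wrong on roughly aligned text.
function wordDiff(ocrText, correctedText) {
  const a = ocrText.split(/\s+/);
  const b = correctedText.split(/\s+/);
  const edits = [];
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) {
      edits.push({ index: i, from: a[i] ?? '', to: b[i] ?? '' });
    }
  }
  return edits;
}

console.log(wordDiff('take 2 tablats daily', 'take 2 tablets daily'));
// → [{ index: 2, from: 'tablats', to: 'tablets' }]
```

Positional matching breaks down when words are inserted or deleted, which is why a production version would use an edit-distance alignment before recording corrections.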


What I Learned

WebAssembly is production-ready for heavy compute. Running a C++ OCR engine in the browser via WASM sounds crazy, but it works reliably across all modern browsers. The eng_best LSTM model uses ~500MB RAM but delivers vastly better results than the fast model.

Privacy is a real feature, not just marketing. When I tell people "your prescription never leaves your phone," they visibly relax. In India especially, where data privacy concerns are high but digital literacy varies, this matters.

SEO takes time. The site has been live for 3+ months and traffic is still building. If you're building a tool site, start promoting it on day one — don't wait until it's "perfect."

Client-side architecture eliminates your biggest cost. My hosting bill is $0. Vercel free tier serves the static assets. All compute runs on the user's device. I could handle 100,000 users without paying a cent for servers.


Try It

doctordocs.in — completely free, no sign-up required.

Drop a photo of a handwritten prescription, an old letter, a receipt, or any document. Watch the text appear — processed entirely on your device.

The entire project is built with Next.js 15, TailwindCSS, Tesseract.js, OpenCV.js, and Transformers.js. If you're interested in the technical architecture, I've documented everything in a detailed project report.


What do you think? Have you built anything with WebAssembly in the browser? I'd love to hear about your experiences in the comments.


