<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: UCHIHAMADRA</title>
    <description>The latest articles on DEV Community by UCHIHAMADRA (@uchihamadra).</description>
    <link>https://dev.to/uchihamadra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F952017%2F0c68dc5a-d2b5-4f1e-befc-8e3bca2de35a.jpeg</url>
      <title>DEV Community: UCHIHAMADRA</title>
      <link>https://dev.to/uchihamadra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/uchihamadra"/>
    <language>en</language>
    <item>
      <title>I Built a 100% Browser-Based OCR That Never Uploads Your Documents — Here's How</title>
      <dc:creator>UCHIHAMADRA</dc:creator>
      <pubDate>Fri, 10 Apr 2026 15:30:20 +0000</pubDate>
      <link>https://dev.to/uchihamadra/i-built-a-100-browser-based-ocr-that-never-uploads-your-documents-heres-how-286p</link>
      <guid>https://dev.to/uchihamadra/i-built-a-100-browser-based-ocr-that-never-uploads-your-documents-heres-how-286p</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Your medical prescriptions, passports, and bank statements deserve better than being uploaded to someone else's server.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I'm a developer from India, and I built &lt;strong&gt;&lt;a href="https://doctordocs.in" rel="noopener noreferrer"&gt;DoctorDocs&lt;/a&gt;&lt;/strong&gt; — a free OCR platform where every single byte of processing happens in your browser. No uploads. No servers. No data collection. Your documents never leave your device.&lt;/p&gt;

&lt;p&gt;Here's why I built it, how it works under the hood, and what I learned shipping a WebAssembly-powered app to production.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem That Made Me Angry
&lt;/h2&gt;

&lt;p&gt;My grandmother needed to read a doctor's prescription. The handwriting was illegible — even the pharmacist squinted at it. I thought, "surely there's a free tool online for this."&lt;/p&gt;

&lt;p&gt;There is. Dozens of them. And every single one requires you to &lt;strong&gt;upload&lt;/strong&gt; your medical prescription to their server. Think about that — your name, your medications, your diagnosis, sitting on some random company's S3 bucket.&lt;/p&gt;

&lt;p&gt;Google Lens works great, but it sends your image to Google's servers. Adobe Scan requires an account. Every "free OCR" tool I found was actually "free to upload your sensitive documents to our cloud."&lt;/p&gt;

&lt;p&gt;I decided to build one that works differently.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Zero Server Processing
&lt;/h2&gt;

&lt;p&gt;DoctorDocs runs on a &lt;strong&gt;thick-client / thin-server&lt;/strong&gt; architecture built with Next.js 15. The "thin server" part? It just serves the static HTML/JS. All the actual OCR processing runs in your browser using WebAssembly.&lt;/p&gt;

&lt;p&gt;Here's the pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User drops image
    ↓
OpenCV.js (WASM) → Binarization, shadow removal, contrast enhancement
    ↓
Tesseract.js (WASM) → LSTM neural network OCR, multi-threaded via Web Workers
    ↓
Custom text formatter → Noise reduction, error correction
    ↓
Monaco editor → Edit, copy, or export to PDF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every step runs on the client's CPU. The server never sees the image.&lt;/p&gt;
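&lt;p&gt;The text-formatter stage is DoctorDocs' own code and isn't shown in this post. A noise-reduction pass of that kind can still be sketched in a few lines. &lt;code&gt;cleanOcrText&lt;/code&gt; below is a hypothetical example, not the real formatter. It re-joins words hyphenated across a line break and collapses runs of whitespace. It also drops lines with no letters or digits (specks and ruled lines that OCR tends to read as stray punctuation) and keeps at most one blank line between paragraphs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Hypothetical noise-reduction pass for raw OCR text (not DoctorDocs' actual formatter).
function cleanOcrText(raw) {
  return raw
    .replace(/\r\n?/g, '\n')                                     // normalize line endings
    .replace(/(\p{L})-\n(\p{L})/gu, '$1$2')                      // re-join words hyphenated across a line break
    .split('\n')
    .map((line) => line.replace(/\s+/g, ' ').trim())             // collapse runs of spaces and tabs
    .filter((line) => line === '' || /[\p{L}\p{N}]/u.test(line)) // drop punctuation-only noise lines
    .join('\n')
    .replace(/\n{3,}/g, '\n\n')                                  // keep at most one blank line between paragraphs
    .trim();
}

console.log(cleanOcrText('Pre-\nscription:  Amoxicillin\n~~~\n\n\n\nTake  500 mg'));
// prints "Prescription: Amoxicillin", a blank line, then "Take 500 mg"
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The hyphen rule is a trade-off. It will also fuse a genuine compound like "follow-up" if the line happens to break at its hyphen.&lt;/p&gt;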

&lt;h3&gt;
  
  
  The Magic Enhance Feature
&lt;/h3&gt;

&lt;p&gt;The #1 problem with phone camera OCR is &lt;strong&gt;uneven lighting&lt;/strong&gt;. You photograph a prescription under a desk lamp, and half the page is bright while the other half is in shadow.&lt;/p&gt;

&lt;p&gt;Most tools just crank up the brightness globally. That makes the bright parts white and the dark parts... still dark.&lt;/p&gt;

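&lt;p&gt;I used OpenCV.js to run &lt;strong&gt;adaptive Gaussian thresholding&lt;/strong&gt;. Instead of one global cutoff, every pixel gets its own threshold: the Gaussian-weighted average of the 31×31 neighborhood around it, minus a small constant (15 here). A pixel stays white if it is brighter than that local threshold and turns black otherwise. As a result, a shadowed patch of paper is judged against its own dim surroundings rather than against the bright half of the page. Shadows disappear. Text becomes crisp. It's a standard technique in document-scanning pipelines, here running in your browser via WebAssembly.&lt;br&gt;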
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This runs entirely in the browser via OpenCV.js WASM&lt;/span&gt;
&lt;span class="nx"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;adaptiveThreshold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;grayMat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;binaryMat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ADAPTIVE_THRESH_GAUSSIAN_C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;THRESH_BINARY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// block size&lt;/span&gt;
  &lt;span class="mi"&gt;15&lt;/span&gt;   &lt;span class="c1"&gt;// constant&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
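&lt;p&gt;To make the thresholding rule concrete, here is a plain-JavaScript sketch of the &lt;em&gt;mean&lt;/em&gt; variant (OpenCV's &lt;code&gt;ADAPTIVE_THRESH_MEAN_C&lt;/code&gt;). It simplifies the call above: the Gaussian variant differs only in weighting nearby pixels more heavily when averaging. Each pixel stays white when it is brighter than its neighborhood average minus &lt;code&gt;C&lt;/code&gt;, so ink strokes, which are darker than their surroundings, turn black. The function name and the clamp-at-the-edges border handling are illustrative assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Illustrative adaptive MEAN threshold in plain JS (not OpenCV's implementation).
// gray: row-major array of 0..255 values, width * height long.
const range = (n) => [...Array(n).keys()];
const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);

function adaptiveMeanThreshold(gray, width, height, blockSize = 31, C = 15) {
  const r = Math.floor(blockSize / 2);                  // blockSize should be odd, e.g. 31
  const offsets = range(blockSize).map((k) => k - r);
  const out = new Uint8ClampedArray(width * height);
  for (const y of range(height)) {
    for (const x of range(width)) {
      let sum = 0;
      for (const dy of offsets) {
        const yy = clamp(y + dy, 0, height - 1);        // replicate edge rows
        for (const dx of offsets) {
          sum += gray[yy * width + clamp(x + dx, 0, width - 1)]; // replicate edge columns
        }
      }
      const threshold = sum / (blockSize * blockSize) - C;
      out[y * width + x] = gray[y * width + x] > threshold ? 255 : 0;
    }
  }
  return out;
}

// One row of bright paper with a single dark ink pixel in the middle.
console.log(Array.from(adaptiveMeanThreshold([200, 200, 50, 200, 200], 5, 1, 3, 10)));
// → [255, 255, 0, 255, 255]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;These nested loops cost O(width × height × blockSize²), which is fine for a demo. OpenCV instead computes the local mean with a box filter (or a Gaussian blur for the Gaussian variant), which is far faster on full-size photos. With &lt;code&gt;C = 0&lt;/code&gt;, a perfectly uniform region would equal its own mean and turn black. The constant is what keeps blank paper white.&lt;/p&gt;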



&lt;h3&gt;
  
  
  Multi-Threaded OCR
&lt;/h3&gt;

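&lt;p&gt;Tesseract.js is powerful but slow on a single thread. So I query &lt;code&gt;navigator.hardwareConcurrency&lt;/code&gt; to detect the number of logical CPU cores and size a worker pool from it, leaving one core free for the UI thread and capping the pool at four workers:&lt;br&gt;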
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hardwareConcurrency&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;workerCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cores&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Each worker loads the eng_best LSTM model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createWorker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;eng&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;OEM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LSTM_ONLY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;corePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tesseract-core-lstm.wasm.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;langPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;4.0.0_best&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Deep learning model, not the fast one&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a modern laptop, this cuts processing time by 60-70% compared to single-threaded OCR.&lt;/p&gt;
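&lt;p&gt;The snippet above creates a single worker. One way to turn &lt;code&gt;workerCount&lt;/code&gt; into an actual pool is a small concurrency limiter that keeps at most that many jobs in flight. The sketch below is generic and hypothetical: the names &lt;code&gt;pickWorkerCount&lt;/code&gt; and &lt;code&gt;runPool&lt;/code&gt; are illustrative and come from neither DoctorDocs nor Tesseract.js. Each job receives a lane index, so in the app it could call &lt;code&gt;recognize()&lt;/code&gt; on the worker in that slot. Tesseract.js also ships its own &lt;code&gt;createScheduler()&lt;/code&gt; for this purpose.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Same sizing rule as the snippet above: leave one core for the UI thread, cap at 4.
function pickWorkerCount(hardwareConcurrency) {
  const cores = hardwareConcurrency || 2;
  return Math.min(Math.max(cores - 1, 1), 4);
}

// Hypothetical pool: runs async tasks with at most `limit` in flight and keeps
// results in input order. Each task receives the lane index (its worker slot).
async function runPool(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function lane(laneId) {
    while (tasks.length > next) {
      const i = next++; // claiming a task is race-free: this line runs synchronously
      results[i] = await tasks[i](laneId);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: laneCount }, (_, id) => lane(id)));
  return results;
}

// Browser usage, assuming `workers` holds workers created with createWorker():
// const texts = await runPool(
//   pages.map((page) => (lane) => workers[lane].recognize(page).then((r) => r.data.text)),
//   workers.length,
// );

// Runnable demo with fake jobs of different durations:
const delay = (ms, value) => new Promise((resolve) => setTimeout(resolve, ms, value));
runPool([30, 10, 20].map((ms, i) => () => delay(ms, `page ${i}`)), pickWorkerCount(8))
  .then((texts) => console.log(texts)); // results stay in input order: page 0, page 1, page 2
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;A pool pays off when there are several jobs to run, such as multiple pages or image regions, because each &lt;code&gt;recognize()&lt;/code&gt; call runs inside one worker.&lt;/p&gt;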




&lt;h2&gt;
  
  
  150+ Tool Pages, One Engine
&lt;/h2&gt;

&lt;p&gt;DoctorDocs has 144 statically generated tool pages — &lt;code&gt;/tools/handwriting-to-text&lt;/code&gt;, &lt;code&gt;/tools/prescription-ocr&lt;/code&gt;, &lt;code&gt;/tools/receipt-scanner&lt;/code&gt;, etc. They all use the same Tesseract.js engine under the hood.&lt;/p&gt;

&lt;p&gt;"Isn't that cheating?" — No. It's the exact strategy Smallpdf and ILovePDF use. The OCR engine doesn't change, but the SEO metadata, titles, FAQs, and use-case descriptions do. Each page targets a different search keyword.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// generateStaticParams() SSGs all 144 pages at build time&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateStaticParams&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;TOOLS_CATALOG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
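&lt;p&gt;The per-page SEO metadata can come from the same catalog. The sketch below assumes each &lt;code&gt;TOOLS_CATALOG&lt;/code&gt; entry carries &lt;code&gt;slug&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, and &lt;code&gt;description&lt;/code&gt; fields; the real shape isn't shown in this post, and the two entries are made-up sample data. &lt;code&gt;buildToolMetadata&lt;/code&gt; is a hypothetical helper that returns the kind of object Next.js expects from &lt;code&gt;generateMetadata&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Hypothetical catalog entries; the real TOOLS_CATALOG shape is an assumption.
const TOOLS_CATALOG = [
  { slug: 'prescription-ocr', title: 'Prescription OCR', description: 'Read handwritten prescriptions in your browser.' },
  { slug: 'receipt-scanner', title: 'Receipt Scanner', description: 'Turn receipt photos into editable text.' },
];

// Builds the title, description and canonical URL for one tool page.
function buildToolMetadata(slug, siteUrl = 'https://doctordocs.in') {
  const tool = TOOLS_CATALOG.find((t) => t.slug === slug);
  if (!tool) return { title: 'Tool not found' };
  return {
    title: `${tool.title} | DoctorDocs`,
    description: tool.description,
    alternates: { canonical: `${siteUrl}/tools/${tool.slug}` },
  };
}

// In the [slug] page module; Next.js 15 passes params as a Promise:
// export async function generateMetadata({ params }) {
//   const { slug } = await params;
//   return buildToolMetadata(slug);
// }

console.log(buildToolMetadata('receipt-scanner').alternates.canonical);
// → https://doctordocs.in/tools/receipt-scanner
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;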



&lt;p&gt;Every tool page auto-generates a "You Might Also Like" section linking to 6 related tools, creating an internal link mesh across all pages.&lt;/p&gt;
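&lt;p&gt;The post doesn't say how those six related tools are picked, so the following is only one plausible approach. It scores every other tool by how many tags it shares with the current one, breaks ties alphabetically so every build produces the same links, and keeps the top six. The &lt;code&gt;tags&lt;/code&gt; field and the &lt;code&gt;relatedTools&lt;/code&gt; helper are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Hypothetical: each catalog entry has a slug and a list of tags.
function relatedTools(catalog, slug, limit = 6) {
  const current = catalog.find((t) => t.slug === slug);
  if (!current) return [];
  const tags = new Set(current.tags);
  return catalog
    .filter((t) => t.slug !== slug)
    .map((t) => ({ slug: t.slug, score: t.tags.filter((tag) => tags.has(tag)).length }))
    .sort((a, b) => b.score - a.score || a.slug.localeCompare(b.slug)) // most shared tags first, then A-Z
    .slice(0, limit)
    .map((t) => t.slug);
}

const catalog = [
  { slug: 'prescription-ocr', tags: ['ocr', 'handwriting', 'medical'] },
  { slug: 'handwriting-to-text', tags: ['ocr', 'handwriting'] },
  { slug: 'receipt-scanner', tags: ['ocr', 'finance'] },
  { slug: 'merge-pdf', tags: ['pdf'] },
];
console.log(relatedTools(catalog, 'prescription-ocr', 2)); // handwriting-to-text, then receipt-scanner
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Tools with no shared tags still fill any remaining slots. That keeps every page at six outgoing links, which suits the goal of an internal link mesh.&lt;/p&gt;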




&lt;h2&gt;
  
  
  Beyond OCR: The Tools That Run Locally
&lt;/h2&gt;

&lt;p&gt;DoctorDocs isn't just OCR. It includes &lt;strong&gt;9 PDF utilities&lt;/strong&gt; and &lt;strong&gt;5 image editing tools&lt;/strong&gt;, all client-side:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PDF Tools&lt;/strong&gt; (powered by &lt;code&gt;pdf-lib&lt;/code&gt; + &lt;code&gt;pdf.js&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Merge, Split, Compress, Watermark, Rotate PDFs&lt;/li&gt;
&lt;li&gt;Extract/Remove pages&lt;/li&gt;
&lt;li&gt;Image to PDF, PDF to JPG&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Image Tools&lt;/strong&gt; (powered by HTML Canvas):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crop, Brighten, Black &amp;amp; White, AI Upscale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI Tools&lt;/strong&gt; (powered by &lt;code&gt;@xenova/transformers&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI Text Detector — runs a 300MB RoBERTa model in the browser via WebGL&lt;/li&gt;
&lt;li&gt;AI Text Writer&lt;/li&gt;
&lt;li&gt;AI Summarizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every single one runs without uploading anything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Self-Learning OCR Pipeline
&lt;/h2&gt;

&lt;p&gt;This is the part I'm most excited about. DoctorDocs implements a &lt;strong&gt;three-tier OCR system&lt;/strong&gt; that learns from every user interaction:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: Gemini 2.5 Flash&lt;/strong&gt; — When available, the image is sent to Google's Gemini API for enterprise-grade accuracy. This is the one path where the image does leave your device, so it is opt-in and only used when API keys are configured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: TrOCR Vision Transformer&lt;/strong&gt; — Runs entirely in the browser as a "shadow model." It processes the same image in the background, and its output is compared against Tier 1 for training purposes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3: Tesseract.js&lt;/strong&gt; — The offline fallback. Always works, even without internet.&lt;/p&gt;
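&lt;p&gt;The fallback order between the tiers can be expressed as a small async function. This is a sketch under assumptions, not the actual DoctorDocs code. The three engines are passed in as hypothetical async functions (&lt;code&gt;gemini&lt;/code&gt;, &lt;code&gt;trocr&lt;/code&gt;, &lt;code&gt;tesseract&lt;/code&gt;), and &lt;code&gt;useGemini&lt;/code&gt; stands for "the user opted in and an API key is configured".&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Hypothetical orchestration of the three tiers. Each engine is an injected
// async function (image) => text.
async function recognizeWithTiers(image, { gemini, trocr, tesseract, useGemini }) {
  // Tier 2 runs as a shadow only when there is a Tier 1 result to compare against.
  // Its errors are swallowed so they can never affect what the user sees.
  const shadow = useGemini ? trocr(image).catch(() => null) : Promise.resolve(null);

  if (useGemini) {
    try {
      // Tier 1: the only path where the image leaves the device.
      return { text: await gemini(image), source: 'gemini', shadow };
    } catch {
      // API or network failure: fall through to the offline tier.
    }
  }
  // Tier 3: offline fallback that works without internet.
  return { text: await tesseract(image), source: 'tesseract', shadow };
}

// Demo with fake engines, simulating a Gemini outage:
const fake = (label, fail = false) => async () => {
  if (fail) throw new Error(`${label} unavailable`);
  return `${label} text`;
};
recognizeWithTiers('photo.jpg', {
  gemini: fake('gemini', true),
  trocr: fake('trocr'),
  tesseract: fake('tesseract'),
  useGemini: true,
}).then((r) => console.log(r.source, r.text)); // tesseract tesseract text
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Returning the shadow result as a promise lets the caller show the primary text immediately and compare the two outputs once TrOCR finishes.&lt;/p&gt;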

&lt;p&gt;When a user copies or downloads the text, the system captures the diff between the AI output and the user's corrected version. This ground truth data feeds future model training — making the OCR better over time.&lt;/p&gt;
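&lt;p&gt;One common way to turn that diff into a single number is the character error rate (CER). CER is the Levenshtein edit distance between the OCR output and the user's corrected text, divided by the length of the corrected text. Whether DoctorDocs measures it this way isn't stated. The sketch below just shows the metric, with hypothetical function names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Levenshtein edit distance: the minimum number of insertions, deletions and
// substitutions needed to turn `a` into `b`, computed one DP row at a time.
function editDistance(a, b) {
  const aChars = [...a]; // split by code point so non-ASCII characters count once
  const bChars = [...b];
  let prev = Array.from({ length: bChars.length + 1 }, (_, j) => j);
  for (const [i, ca] of aChars.entries()) {
    const curr = [i + 1];
    for (const [j, cb] of bChars.entries()) {
      curr.push(Math.min(
        prev[j + 1] + 1,               // delete ca
        curr[j] + 1,                   // insert cb
        prev[j] + (ca === cb ? 0 : 1), // substitute, or keep if equal
      ));
    }
    prev = curr;
  }
  return prev[bChars.length];
}

// Character error rate of the OCR output, measured against the user's correction.
function characterErrorRate(ocrText, correctedText) {
  return editDistance(ocrText, correctedText) / Math.max([...correctedText].length, 1);
}

console.log(characterErrorRate('Amoxicilin 250rng', 'Amoxicillin 250mg').toFixed(3));
// → 0.176 (3 edits over 17 characters)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;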




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;WebAssembly is production-ready for heavy compute.&lt;/strong&gt; Running a C++ OCR engine in the browser via WASM sounds crazy, but it works reliably across all modern browsers. The &lt;code&gt;eng_best&lt;/code&gt; LSTM model uses ~500MB RAM but delivers vastly better results than the fast model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy is a real feature, not just marketing.&lt;/strong&gt; When I tell people "your prescription never leaves your phone," they visibly relax. In India especially, where data privacy concerns are high but digital literacy varies, this matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SEO takes time.&lt;/strong&gt; The site has been live for 3+ months and traffic is still building. If you're building a tool site, start promoting it on day one — don't wait until it's "perfect."&lt;/p&gt;

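&lt;p&gt;&lt;strong&gt;Client-side architecture eliminates your biggest cost.&lt;/strong&gt; My hosting bill is $0: the Vercel free tier serves the static assets, and all compute runs on the user's device. Each new user adds bandwidth rather than server CPU. I expect I could handle 100,000 users without paying a cent for servers, as long as traffic stays within the free tier's bandwidth limits.&lt;/p&gt;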




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://doctordocs.in" rel="noopener noreferrer"&gt;doctordocs.in&lt;/a&gt;&lt;/strong&gt; — completely free, no sign-up required.&lt;/p&gt;

&lt;p&gt;Drop a photo of a handwritten prescription, an old letter, a receipt, or any document. Watch the text appear — processed entirely on your device.&lt;/p&gt;

&lt;p&gt;The entire project is built with Next.js 15, TailwindCSS, Tesseract.js, OpenCV.js, and Transformers.js. If you're interested in the technical architecture, I've documented everything in a detailed project report.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What do you think? Have you built anything with WebAssembly in the browser? I'd love to hear about your experiences in the comments.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
