Xiao Ling

Posted on Jan 22 • Originally published at dynamsoft.com

Building an Intelligent Web Document Scanner with OCR and Chrome's Built-in AI

#webdev #ai #programming #javascript

Scanning, processing, and understanding documents quickly is a routine need in the digital workplace. Whether you're digitizing invoices, processing legal documents, or managing medical records, an efficient document workflow can save hours of manual work. In this tutorial, you'll build a web-based document scanner that scans pages, extracts their text with OCR, and uses Chrome's built-in AI to summarize the result.

Demo Video: Web Document Scanner with OCR and AI Summarization

Prerequisites

A 30-day free trial license for Dynamic Web TWAIN
The OCR add-on (Windows only). Download DynamicWebTWAINOCRResources.zip from Dynamsoft's website and run the installer as administrator.

Technology Stack Overview

Our intelligent document scanner combines three powerful technologies:

1. Dynamic Web TWAIN - Professional Document Scanning

Dynamic Web TWAIN is a JavaScript SDK that enables direct scanner access from web browsers. It provides:

Direct Scanner Control - Access TWAIN/ICA/SANE scanners from the browser
Advanced Image Processing - Auto-deskew, noise reduction, border detection
OCR Integration - Built-in text extraction in multiple languages
Document Editing - Crop, rotate, adjust brightness/contrast
Multiple Output Formats - PDF, JPEG, PNG, TIFF, and more

2. Chrome's Gemini Nano - Built-in AI

Google's Gemini Nano is a lightweight AI model that runs entirely in the browser:

Privacy-First - All processing happens locally, no data sent to servers
Zero Cost - Free to use, no API keys or quotas
Fast Performance - No network latency
Offline Capable - Works without an internet connection once the model has been downloaded
Multiple APIs - Summarization, translation, question-answering, and more

3. simple-dynamsoft-mcp - Development Accelerator

simple-dynamsoft-mcp is a Model Context Protocol (MCP) server that provides:

Quick Code Snippets - Get Dynamic Web TWAIN code examples instantly
API Documentation - Access SDK documentation through your AI assistant
Best Practices - Get guidance on common implementation patterns
Faster Development - Reduce time spent searching documentation

Step-by-Step Development Tutorial

Step 1: Set Up Your Development Environment

First, create your project structure:

mkdir web-document-scanner
cd web-document-scanner

Create the following files:

index.html - Main application UI
css/style.css - Custom styles
js/app.js - Application logic

Step 2: Install Dynamic Web TWAIN

Add the Dynamic Web TWAIN SDK to your HTML:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document Scanner App</title>

    <!-- Bootstrap CSS -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">

    <!-- Font Awesome -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">

    <!-- Dynamic Web TWAIN SDK -->
    <script src="https://cdn.jsdelivr.net/npm/dwt@latest/dist/dynamsoft.webtwain.min.js"></script>

    <!-- Custom CSS -->
    <link rel="stylesheet" href="css/style.css">
</head>
<body>
    <!-- Your UI will go here -->
    <script src="js/app.js"></script>
</body>
</html>

Step 3: Initialize Dynamic Web TWAIN

In your js/app.js, initialize the SDK:

// Configure Dynamic Web TWAIN
Dynamsoft.DWT.ProductKey = "YOUR_LICENSE_KEY_HERE"; // Get free trial at dynamsoft.com
Dynamsoft.DWT.ResourcesPath = 'https://cdn.jsdelivr.net/npm/dwt@latest/dist';
Dynamsoft.DWT.AutoLoad = false;
Dynamsoft.DWT.UseDefaultViewer = false;

let DWTObject;

// Initialize SDK
Dynamsoft.DWT.CreateDWTObjectEx({
    WebTwainId: 'dwtId',
}, function (object) {
    DWTObject = object;
    console.log('Dynamic Web TWAIN initialized successfully');
    loadScanners();
}, function (error) {
    console.error('DWT initialization failed:', error);
    alert('Failed to initialize scanner. Please refresh and try again.');
});

Step 4: Load Available Scanners

Detect and list available scanners:

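// Devices returned by GetDevicesAsync(); acquireImage() in Step 5 reads from this list
let deviceList = [];

function loadScanners() {
    const sourceSelect = document.getElementById('sourceSelect');

    DWTObject.GetDevicesAsync().then(function (devices) {
        deviceList = devices;
        sourceSelect.innerHTML = '';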

        if (devices.length === 0) {
            sourceSelect.innerHTML = '<option>No scanners found</option>';
            return;
        }

        devices.forEach((device, index) => {
            const option = document.createElement('option');
            option.value = index;
            option.textContent = device.displayName;
            sourceSelect.appendChild(option);
        });

        console.log(`Found ${devices.length} scanner(s)`);
    }).catch(function (error) {
        console.error('Failed to load scanners:', error);
    });
}

Step 5: Implement Document Scanning

Add the scanning functionality:

function acquireImage() {
    const selectedIndex = document.getElementById('sourceSelect').value;

    if (!deviceList[selectedIndex]) {
        alert('Please select a scanner first.');
        return;
    }

    DWTObject.SelectDeviceAsync(deviceList[selectedIndex])
        .then(function () {
            return DWTObject.AcquireImageAsync({
                IfShowUI: false,
                IfCloseSourceAfterAcquire: true
            });
        })
        .then(function () {
            console.log('Scan completed successfully');
            updateImageDisplay();
        })
        .catch(function (error) {
            alert('Scan failed: ' + error.message);
        });
}
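
The options object passed to AcquireImageAsync can carry more than the two flags shown. As a rough sketch, the helper below builds that object from a few scan settings. The helper name buildScanConfig, its parameter names, and the 200 DPI default are assumptions made for illustration. Resolution, IfFeederEnabled, and IfDuplexEnabled are device configuration fields as I understand the SDK, so check them against the Dynamic Web TWAIN API reference for your version before relying on them.

```javascript
// Hypothetical helper: builds the options object for DWTObject.AcquireImageAsync().
// The function name, parameter names, and default values are illustrative assumptions.
function buildScanConfig({ resolution = 200, useFeeder = false, duplex = false } = {}) {
  return {
    IfShowUI: false,                      // same as the tutorial: skip the vendor dialog
    IfCloseSourceAfterAcquire: true,      // same as the tutorial: release the scanner afterwards
    Resolution: resolution,               // scan resolution in DPI
    IfFeederEnabled: useFeeder,           // use the automatic document feeder if present
    IfDuplexEnabled: useFeeder && duplex  // duplex is only requested together with the feeder
  };
}

// Example: a 300 DPI scan from the feeder, single-sided
console.log(buildScanConfig({ resolution: 300, useFeeder: true }));
```

The returned object could then be passed in place of the inline literal, for example DWTObject.AcquireImageAsync(buildScanConfig({ resolution: 300 })).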

Step 6: Display Scanned Documents

Create a thumbnail view and large preview:

function updateImageDisplay() {
    const thumbnailContainer = document.getElementById('thumbnailContainer');
    thumbnailContainer.innerHTML = '';

    if (DWTObject.HowManyImagesInBuffer === 0) {
        thumbnailContainer.innerHTML = '<div class="empty-state">No documents yet</div>';
        return;
    }

    // Create thumbnails for each image
    for (let i = 0; i < DWTObject.HowManyImagesInBuffer; i++) {
        const imgUrl = DWTObject.GetImageURL(i);
        const thumbnail = document.createElement('div');
        thumbnail.className = 'thumbnail-item';
        thumbnail.onclick = () => selectImage(i);

        const img = document.createElement('img');
        img.src = imgUrl;
        img.alt = `Document ${i + 1}`;

        thumbnail.appendChild(img);
        thumbnailContainer.appendChild(thumbnail);
    }
}

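// Index of the image currently shown in the large preview (-1 when nothing is selected)
let selectedImageIndex = -1;

function selectImage(index) {
    selectedImageIndex = index;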
    const imgUrl = DWTObject.GetImageURL(index);

    document.getElementById('largeImageDisplay').src = imgUrl;
    document.getElementById('largeImageDisplay').style.display = 'block';
    document.getElementById('ocrBtnLarge').disabled = false;
}

Step 7: Implement OCR Text Extraction

Add OCR functionality using Dynamic Web TWAIN's OCR add-on:

async function performOCR(imageIndex) {
    const modal = document.getElementById('ocrModal');
    const ocrText = document.getElementById('ocrText');
    const language = document.getElementById('ocrLanguage').value;

    modal.classList.add('show');
    ocrText.textContent = 'Processing OCR...';

    try {
        // Perform OCR
        const result = await DWTObject.Addon.OCRKit.Recognize(imageIndex, {
            settings: { language: language }
        });

        // Extract text from result
        const extractedText = formatOCRResult(result);
        ocrText.textContent = extractedText || 'No text found.';

    } catch (error) {
        console.error('OCR Error:', error);
        ocrText.textContent = `OCR Error: ${error.message}`;
    }
}

function formatOCRResult(result) {
    let text = '';
    if (result && result.blocks) {
        result.blocks.forEach(block => {
            if (block.lines) {
                block.lines.forEach(line => {
                    if (line.words) {
                        line.words.forEach(word => {
                            text += word.value + ' ';
                        });
                    }
                    text += '\n';
                });
            }
            text += '\n';
        });
    }
    return text.trim();
}
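
To make the expected result shape concrete, the sketch below runs the same flattening on a hand-written sample. The sample object is hypothetical and only mirrors the blocks / lines / words / value structure that formatOCRResult assumes; it is not captured SDK output. ocrResultToText is an illustrative variant that joins words rather than concatenating them, so lines carry no trailing space.

```javascript
// Illustrative variant of formatOCRResult: words are joined with spaces,
// lines with newlines, and blocks with a blank line.
function ocrResultToText(result) {
  const blocks = (result && result.blocks) || [];
  return blocks
    .map(block => (block.lines || [])
      .map(line => (line.words || []).map(word => word.value).join(' '))
      .join('\n'))
    .join('\n\n')
    .trim();
}

// Hypothetical sample that mirrors the assumed result structure
const sample = {
  blocks: [
    { lines: [
      { words: [{ value: 'Invoice' }, { value: '#123' }] },
      { words: [{ value: 'Total:' }, { value: '$40' }] }
    ] },
    { lines: [
      { words: [{ value: 'Thank' }, { value: 'you' }] }
    ] }
  ]
};

console.log(ocrResultToText(sample));
// Invoice #123
// Total: $40
//
// Thank you
```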

Step 8: Integrate Chrome's Gemini Nano for Summarization

Now for the exciting part - adding AI-powered summarization:

async function summarizeOCRText() {
    const ocrText = document.getElementById('ocrText').textContent;

    if (!ocrText || ocrText.startsWith('OCR Error')) {
        alert('No valid text to summarize.');
        return;
    }

    const summaryModal = document.getElementById('summaryModal');
    const summaryText = document.getElementById('summaryText');
    const progressBar = document.getElementById('summaryProgress');

    summaryModal.classList.add('show');
    summaryText.textContent = 'Preparing summary...';

    try {
        // Check if Summarizer API is supported
        if (!('Summarizer' in self)) {
            throw new Error('Summarizer API not supported. Please use Chrome 138+.');
        }

        // Check availability
        const availability = await Summarizer.availability();
        if (availability === 'unavailable') {
            throw new Error('Summarizer API is not available.');
        }

        // Download model if needed (first time only)
        summaryText.textContent = 'Loading AI model...';

        const summarizer = await Summarizer.create({
            type: 'tldr',        // Options: 'tldr', 'key-points', 'headline'
            length: 'medium',    // Options: 'short', 'medium', 'long'
            format: 'plain-text', // Options: 'plain-text', 'markdown'
            monitor(m) {
                // Track download progress
                m.addEventListener('downloadprogress', (e) => {
                    const progress = Math.round(e.loaded * 100);
                    summaryText.textContent = `Loading AI model... ${progress}%`;
                });
            }
        });

        summaryText.textContent = 'Generating summary...';

        // Generate summary
        const summary = await summarizer.summarize(ocrText, {
            context: 'This is text extracted from a scanned document via OCR.'
        });

        summaryText.textContent = summary;

    } catch (error) {
        console.error('Summarization Error:', error);
        summaryText.textContent = `Error: ${error.message}`;
    }
}
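
The availability check above only handles the 'unavailable' case. Summarizer.availability() can also report that the model still has to be downloaded or is downloading, and a small mapping like the one below can turn each state into a status message. The function names and the message wording are assumptions for illustration; the four state strings are the ones Chrome documents for the built-in AI APIs.

```javascript
// Hypothetical helper: maps a Summarizer.availability() state to a user-facing message.
function describeAvailability(state) {
  switch (state) {
    case 'available':
      return 'AI model is ready.';
    case 'downloadable':
      return 'AI model will be downloaded on first use.';
    case 'downloading':
      return 'AI model download is in progress...';
    case 'unavailable':
      return 'Summarizer API is not available on this device.';
    default:
      return `Unknown availability state: ${state}`;
  }
}

// Resolves to a status message; safe to call in browsers without the Summarizer API.
async function checkSummarizer() {
  if (!('Summarizer' in globalThis)) {
    return 'Summarizer API not supported. Please use Chrome 138+.';
  }
  return describeAvailability(await Summarizer.availability());
}
```

Calling checkSummarizer().then(console.log) from the DevTools console is a quick way to see which state the browser is in before scanning anything.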

Step 9: Add Save Functionality

Allow users to save scanned documents as PDF:

function saveImages() {
    if (DWTObject.HowManyImagesInBuffer === 0) {
        alert('No images to save.');
        return;
    }

    DWTObject.IfShowFileDialog = true;
    DWTObject.SaveAllAsPDF(
        'scanned_documents.pdf',
        function () {
            alert('Documents saved successfully!');
        },
        function (errorCode, errorString) {
            alert('Failed to save: ' + errorString);
        }
    );
}

Step 10: Build the User Interface

Create a clean, modern interface in index.html:

<div class="container scanner-container">
    <div class="scanner-header">
        <h1><i class="fas fa-scanner"></i> Document Scanner</h1>
        <p class="lead">Scan, extract, and summarize documents with AI</p>
    </div>

    <div class="card scanner-card">
        <!-- Scanner Controls -->
        <div class="scanner-controls">
            <div class="row">
                <div class="col-md-6">
                    <label for="sourceSelect">Select Scanner:</label>
                    <select class="form-select" id="sourceSelect">
                        <option>Loading scanners...</option>
                    </select>
                </div>
                <div class="col-md-6 text-center">
                    <button id="scanBtn" class="btn btn-primary btn-lg">
                        <i class="fas fa-play-circle"></i> Start Scanning
                    </button>
                </div>
            </div>
        </div>

        <!-- Document Viewer -->
        <div class="viewer-container">
            <div class="document-viewer">
                <!-- Thumbnail Panel -->
                <div class="thumbnail-panel">
                    <h6>Documents</h6>
                    <div id="thumbnailContainer"></div>
                </div>

                <!-- Large Image Panel -->
                <div class="large-image-panel">
                    <h6>Document View</h6>
                    <div id="largeImageContainer">
                        <img id="largeImageDisplay" alt="Selected document">
                    </div>
                    <div class="controls">
                        <button id="ocrBtnLarge">
                            <i class="fas fa-search"></i> OCR
                        </button>
                        <button id="editBtnLarge">
                            <i class="fas fa-pen"></i> Edit
                        </button>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

<!-- OCR Modal -->
<div id="ocrModal" class="modal">
    <div class="modal-content">
        <div class="modal-header">
            <h4>OCR Result</h4>
            <button class="close" onclick="closeOCRModal()">&times;</button>
        </div>
        <div class="modal-body">
            <div id="ocrText">Processing OCR...</div>
        </div>
        <div class="modal-footer">
            <select id="ocrLanguage">
                <option value="en">English</option>
                <option value="fr">French</option>
                <option value="es">Spanish</option>
                <option value="de">German</option>
            </select>
            <button onclick="copyOCRText()">Copy Text</button>
            <button onclick="summarizeOCRText()">Summarize</button>
        </div>
    </div>
</div>

<!-- Summary Modal -->
<div id="summaryModal" class="modal">
    <div class="modal-content">
        <div class="modal-header">
            <h4>Document Summary</h4>
            <button class="close" onclick="closeSummaryModal()">&times;</button>
        </div>
        <div class="modal-body">
            <div id="summaryProgress" style="display:none;">
                <div class="progress-fill"></div>
            </div>
            <div id="summaryText">Preparing summary...</div>
        </div>
        <div class="modal-footer">
            <button onclick="copySummaryText()">Copy Summary</button>
        </div>
    </div>
</div>
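
The markup above calls closeOCRModal(), copyOCRText(), closeSummaryModal(), and copySummaryText() from inline handlers, and the scanBtn and ocrBtnLarge buttons have no handler attached. The tutorial does not show these pieces, so the following is only a minimal sketch of how they might look in js/app.js. closeModal and copyElementText are hypothetical helpers, and their doc and clipboard parameters exist only so the functions can be exercised outside a browser.

```javascript
// Hypothetical helper: hides a modal by removing the 'show' class added in Steps 7 and 8.
function closeModal(id, doc = document) {
  const modal = doc.getElementById(id);
  if (modal) modal.classList.remove('show');
}

function closeOCRModal() { closeModal('ocrModal'); }
function closeSummaryModal() { closeModal('summaryModal'); }

// Hypothetical helper: copies an element's text to the clipboard and resolves to that text.
// navigator.clipboard requires a secure context (HTTPS or localhost).
async function copyElementText(id, doc = document, clipboard = navigator.clipboard) {
  const text = doc.getElementById(id).textContent;
  await clipboard.writeText(text);
  return text;
}

function copyOCRText() { return copyElementText('ocrText'); }
function copySummaryText() { return copyElementText('summaryText'); }

// Attach handlers to the buttons that have no inline onclick.
// acquireImage, performOCR, and selectedImageIndex come from the earlier steps.
if (typeof document !== 'undefined') {
  document.getElementById('scanBtn')
    ?.addEventListener('click', () => acquireImage());
  document.getElementById('ocrBtnLarge')
    ?.addEventListener('click', () => performOCR(selectedImageIndex));
}
```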

Using simple-dynamsoft-mcp for Faster Development

To accelerate development with Dynamic Web TWAIN, use the simple-dynamsoft-mcp server with your AI assistant:

Installation

npm install -g simple-dynamsoft-mcp

Configuration for VS Code

Add to your MCP settings:

{
  "servers": {
    "dynamsoft": {
      "command": "npx",
      "args": ["-y", "simple-dynamsoft-mcp"]
    }
  }
}

Benefits of Using MCP

Once configured, you can ask your AI assistant:

"Show me how to scan documents with Dynamic Web TWAIN"
"How do I perform OCR on a scanned image?"
"What's the code to save images as PDF?"
"How to load images from local files?"

The MCP server returns code examples and API documentation directly in your AI assistant, so you spend less time searching the documentation.

Enabling Gemini Nano in Chrome

To use AI summarization, enable Gemini Nano:

1. Update Chrome to version 138 or later
2. Enable flags:
- Go to chrome://flags/#optimization-guide-on-device-model
- Set to "Enabled BypassPerfRequirement"
- Go to chrome://flags/#summarization-api-for-gemini-nano
- Set to "Enabled"
3. Restart Chrome
4. Download model:
- Visit chrome://components/
- Find "Optimization Guide On Device Model"
- Click "Check for update"

Testing Your Application

1. Start a Local Server

# Using Python
python -m http.server 8000

# Using Node.js
npx http-server -p 8000

2. Open in Browser

Navigate to http://localhost:8000

3. Test Workflow

Select Scanner - Choose from the dropdown
Scan Document - Click "Start Scanning"
View Document - Click thumbnail to preview
Extract Text - Click "OCR" button
Summarize - Click "Summarize" in OCR modal
Save - Click "Save All" to export as PDF

Source Code

https://github.com/yushulx/web-twain-document-scan-management/tree/main/examples/scan_ocr_summarize
