Xiao Ling

Posted on • Originally published at dynamsoft.com

Building an Intelligent Web Document Scanner with OCR and Chrome's Built-in AI

In today's digital workplace, being able to scan, process, and understand documents quickly matters. Whether you're digitizing invoices, processing legal documents, or managing medical records, an efficient document workflow can save hours of manual work. In this tutorial, you'll build a web-based document scanner that scans pages, extracts their text with OCR, and summarizes the result with Chrome's built-in AI.

Demo Video: Web Document Scanner with OCR and AI Summarization

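Prerequisites

  • Chrome 138 or later, for the built-in Summarizer API
  • A Dynamic Web TWAIN license key (a free trial is available at dynamsoft.com)
  • A TWAIN, ICA, or SANE compatible scanner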

Technology Stack Overview

Our intelligent document scanner combines three powerful technologies:

1. Dynamic Web TWAIN - Professional Document Scanning

Dynamic Web TWAIN is a JavaScript SDK that enables direct scanner access from web browsers. It provides:

  • Direct Scanner Control - Access TWAIN/ICA/SANE scanners from the browser
  • Advanced Image Processing - Auto-deskew, noise reduction, border detection
  • OCR Integration - Built-in text extraction in multiple languages
  • Document Editing - Crop, rotate, adjust brightness/contrast
  • Multiple Output Formats - PDF, JPEG, PNG, TIFF, and more

2. Chrome's Gemini Nano - Built-in AI

Google's Gemini Nano is a lightweight AI model that runs entirely in the browser:

  • Privacy-First - All processing happens locally, no data sent to servers
  • Zero Cost - Free to use, no API keys or quotas
  • Fast Performance - No network latency
  • Offline Capable - Works without an internet connection once the model has been downloaded
  • Multiple APIs - Summarization, translation, question-answering, and more

3. simple-dynamsoft-mcp - Development Accelerator

simple-dynamsoft-mcp is a Model Context Protocol (MCP) server that provides:

  • Quick Code Snippets - Get Dynamic Web TWAIN code examples instantly
  • API Documentation - Access SDK documentation through your AI assistant
  • Best Practices - Get guidance on common implementation patterns
  • Faster Development - Reduce time spent searching documentation

Step-by-Step Development Tutorial

Step 1: Set Up Your Development Environment

First, create your project structure:

mkdir web-document-scanner
cd web-document-scanner

Create the following files:

  • index.html - Main application UI
  • css/style.css - Custom styles
  • js/app.js - Application logic

Step 2: Install Dynamic Web TWAIN

Add the Dynamic Web TWAIN SDK to your HTML:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document Scanner App</title>

    <!-- Bootstrap CSS -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">

    <!-- Font Awesome -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">

    <!-- Dynamic Web TWAIN SDK -->
    <script src="https://cdn.jsdelivr.net/npm/dwt@latest/dist/dynamsoft.webtwain.min.js"></script>

    <!-- Custom CSS -->
    <link rel="stylesheet" href="css/style.css">
</head>
<body>
    <!-- Your UI will go here -->
    <script src="js/app.js"></script>
</body>
</html>

Step 3: Initialize Dynamic Web TWAIN

In your js/app.js, initialize the SDK:

// Configure Dynamic Web TWAIN
Dynamsoft.DWT.ProductKey = "YOUR_LICENSE_KEY_HERE"; // Get free trial at dynamsoft.com
Dynamsoft.DWT.ResourcesPath = 'https://cdn.jsdelivr.net/npm/dwt@latest/dist';
Dynamsoft.DWT.AutoLoad = false;
Dynamsoft.DWT.UseDefaultViewer = false;

let DWTObject;

// Initialize SDK
Dynamsoft.DWT.CreateDWTObjectEx({
    WebTwainId: 'dwtId',
}, function (object) {
    DWTObject = object;
    console.log('Dynamic Web TWAIN initialized successfully');
    loadScanners();
}, function (error) {
    console.error('DWT initialization failed:', error);
    alert('Failed to initialize scanner. Please refresh and try again.');
});
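
If you prefer async/await over callbacks, the initialization above can be wrapped in a Promise. This is only a sketch: `createDWTObject` is a hypothetical helper, not part of the SDK, and it assumes nothing beyond the `(config, onSuccess, onFailure)` call shape shown in Step 3. The create function is passed in as a parameter so the helper can be tried without a browser.

```javascript
// Hypothetical helper: wraps a (config, onSuccess, onFailure) style
// function in a Promise. In the browser, `createFn` would be
// Dynamsoft.DWT.CreateDWTObjectEx.
function createDWTObject(createFn, config) {
    return new Promise((resolve, reject) => {
        createFn(config, resolve, reject);
    });
}

// Usage in the browser (assumes the SDK script from Step 2 is loaded):
// const DWTObject = await createDWTObject(
//     Dynamsoft.DWT.CreateDWTObjectEx.bind(Dynamsoft.DWT),
//     { WebTwainId: 'dwtId' }
// );
```

With this in place, a failed initialization surfaces as a rejected promise that a single try/catch can handle.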

Step 4: Load Available Scanners

Detect and list available scanners:

// Devices found by the last call to loadScanners(); acquireImage() reads from it
let deviceList = [];

function loadScanners() {
    const sourceSelect = document.getElementById('sourceSelect');

    DWTObject.GetDevicesAsync().then(function (devices) {
        // Keep the device objects so the selected index can be resolved later
        deviceList = devices;
        sourceSelect.innerHTML = '';

        if (devices.length === 0) {
            sourceSelect.innerHTML = '<option>No scanners found</option>';
            return;
        }

        devices.forEach((device, index) => {
            const option = document.createElement('option');
            option.value = index;
            option.textContent = device.displayName;
            sourceSelect.appendChild(option);
        });

        console.log(`Found ${devices.length} scanner(s)`);
    }).catch(function (error) {
        console.error('Failed to load scanners:', error);
    });
}

Step 5: Implement Document Scanning

Add the scanning functionality:

function acquireImage() {
    const selectedIndex = document.getElementById('sourceSelect').value;

    if (!deviceList[selectedIndex]) {
        alert('Please select a scanner first.');
        return;
    }

    DWTObject.SelectDeviceAsync(deviceList[selectedIndex])
        .then(function () {
            return DWTObject.AcquireImageAsync({
                IfShowUI: false,
                IfCloseSourceAfterAcquire: true
            });
        })
        .then(function () {
            console.log('Scan completed successfully');
            updateImageDisplay();
        })
        .catch(function (error) {
            alert('Scan failed: ' + error.message);
        });
}
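
The scan above passes only two settings. If you want to adjust things like resolution or the document feeder, one option is to keep the defaults in one place and merge overrides per scan. The sketch below is an assumption-laden example: `buildScanConfig` and `DEFAULT_SCAN_CONFIG` are hypothetical names, the default values are arbitrary, and the `Resolution`, `IfFeederEnabled`, and `IfDuplexEnabled` property names should be checked against the Dynamic Web TWAIN API reference for your SDK version.

```javascript
// Hypothetical defaults for AcquireImageAsync().
// Assumption: property names follow the SDK's device configuration;
// verify them against the API reference before relying on them.
const DEFAULT_SCAN_CONFIG = Object.freeze({
    IfShowUI: false,                 // skip the scanner vendor's dialog
    IfCloseSourceAfterAcquire: true, // release the device after scanning
    Resolution: 200,                 // DPI; an arbitrary default for text pages
    IfFeederEnabled: false,          // true = use the document feeder
    IfDuplexEnabled: false           // true = scan both sides, if supported
});

// Returns a new config object: defaults first, then caller overrides.
function buildScanConfig(overrides = {}) {
    return { ...DEFAULT_SCAN_CONFIG, ...overrides };
}

// Usage inside acquireImage():
// return DWTObject.AcquireImageAsync(buildScanConfig({ Resolution: 300 }));
```

Because the defaults object is frozen and the helper returns a copy, one scan's overrides cannot leak into the next.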

Step 6: Display Scanned Documents

Create a thumbnail view and large preview:

function updateImageDisplay() {
    const thumbnailContainer = document.getElementById('thumbnailContainer');
    thumbnailContainer.innerHTML = '';

    if (DWTObject.HowManyImagesInBuffer === 0) {
        thumbnailContainer.innerHTML = '<div class="empty-state">No documents yet</div>';
        return;
    }

    // Create thumbnails for each image
    for (let i = 0; i < DWTObject.HowManyImagesInBuffer; i++) {
        const imgUrl = DWTObject.GetImageURL(i);
        const thumbnail = document.createElement('div');
        thumbnail.className = 'thumbnail-item';
        thumbnail.onclick = () => selectImage(i);

        const img = document.createElement('img');
        img.src = imgUrl;
        img.alt = `Document ${i + 1}`;

        thumbnail.appendChild(img);
        thumbnailContainer.appendChild(thumbnail);
    }
}

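// Index of the image shown in the large preview (-1 = nothing selected)
let selectedImageIndex = -1;

function selectImage(index) {
    selectedImageIndex = index;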
    const imgUrl = DWTObject.GetImageURL(index);

    document.getElementById('largeImageDisplay').src = imgUrl;
    document.getElementById('largeImageDisplay').style.display = 'block';
    document.getElementById('ocrBtnLarge').disabled = false;
}

Step 7: Implement OCR Text Extraction

Add OCR functionality using Dynamic Web TWAIN's OCR add-on:

async function performOCR(imageIndex) {
    const modal = document.getElementById('ocrModal');
    const ocrText = document.getElementById('ocrText');
    const language = document.getElementById('ocrLanguage').value;

    modal.classList.add('show');
    ocrText.textContent = 'Processing OCR...';

    try {
        // Perform OCR
        const result = await DWTObject.Addon.OCRKit.Recognize(imageIndex, {
            settings: { language: language }
        });

        // Extract text from result
        const extractedText = formatOCRResult(result);
        ocrText.textContent = extractedText || 'No text found.';

    } catch (error) {
        console.error('OCR Error:', error);
        ocrText.textContent = `OCR Error: ${error.message}`;
    }
}

function formatOCRResult(result) {
    let text = '';
    if (result && result.blocks) {
        result.blocks.forEach(block => {
            if (block.lines) {
                block.lines.forEach(line => {
                    if (line.words) {
                        line.words.forEach(word => {
                            text += word.value + ' ';
                        });
                    }
                    text += '\n';
                });
            }
            text += '\n';
        });
    }
    return text.trim();
}
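
To see what the formatter produces, it helps to run it against a small sample. The sketch below uses a made-up result object in the blocks/lines/words shape that `formatOCRResult` expects; the real OCR output may carry more fields. `ocrResultToText` is a hypothetical, more compact variant that joins words with spaces, lines with newlines, and blocks with a blank line, so it avoids the trailing space the loop version leaves at the end of each line.

```javascript
// Hypothetical compact variant of formatOCRResult():
// words joined by ' ', lines by '\n', blocks by a blank line.
function ocrResultToText(result) {
    const blocks = (result && result.blocks) || [];
    return blocks
        .map(block => (block.lines || [])
            .map(line => (line.words || []).map(word => word.value).join(' '))
            .join('\n'))
        .join('\n\n')
        .trim();
}

// Made-up sample in the shape the Step 7 code expects.
const sampleResult = {
    blocks: [
        { lines: [
            { words: [{ value: 'Invoice' }, { value: '#1024' }] },
            { words: [{ value: 'Total:' }, { value: '$58.00' }] }
        ] },
        { lines: [{ words: [{ value: 'Thank' }, { value: 'you' }] }] }
    ]
};

console.log(ocrResultToText(sampleResult));
```

The sample prints two lines for the first block, a blank line, and then the second block.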

Step 8: Integrate Chrome's Gemini Nano for Summarization

Now for the exciting part - adding AI-powered summarization:

async function summarizeOCRText() {
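    const ocrText = document.getElementById('ocrText').textContent.trim();

    // Skip empty text and the placeholder/error messages written by performOCR()
    const placeholders = ['Processing OCR...', 'No text found.'];
    if (!ocrText || ocrText.startsWith('OCR Error') || placeholders.includes(ocrText)) {
        alert('No valid text to summarize.');
        return;
    }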

    const summaryModal = document.getElementById('summaryModal');
    const summaryText = document.getElementById('summaryText');
    const progressBar = document.getElementById('summaryProgress');

    summaryModal.classList.add('show');
    summaryText.textContent = 'Preparing summary...';

    try {
        // Check if Summarizer API is supported
        if (!('Summarizer' in self)) {
            throw new Error('Summarizer API not supported. Please use Chrome 138+.');
        }

        // Check availability
        const availability = await Summarizer.availability();
        if (availability === 'unavailable') {
            throw new Error('Summarizer API is not available.');
        }

        // Download model if needed (first time only)
        summaryText.textContent = 'Loading AI model...';

        const summarizer = await Summarizer.create({
            type: 'tldr',        // Options: 'tldr', 'teaser', 'key-points', 'headline'
            length: 'medium',    // Options: 'short', 'medium', 'long'
            format: 'plain-text', // Options: 'plain-text', 'markdown'
            monitor(m) {
                // Track download progress
                m.addEventListener('downloadprogress', (e) => {
                    const progress = Math.round(e.loaded * 100);
                    summaryText.textContent = `Loading AI model... ${progress}%`;
                });
            }
        });

        summaryText.textContent = 'Generating summary...';

        // Generate summary
        const summary = await summarizer.summarize(ocrText, {
            context: 'This is text extracted from a scanned document via OCR.'
        });

        summaryText.textContent = summary;

    } catch (error) {
        console.error('Summarization Error:', error);
        summaryText.textContent = `Error: ${error.message}`;
    }
}
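
On-device models accept a limited amount of input, so OCR text from a long, multi-page document may be too large for a single `summarize()` call. One possible approach, sketched below, is to split the text into chunks, summarize each chunk, and then summarize the partial summaries. `chunkText` and `summarizeLongText` are hypothetical helpers, and the 4000-character limit is an arbitrary assumption rather than a documented value. The Summarizer API also exposes `inputQuota` and `measureInputUsage()`, which may be a better basis for the limit.

```javascript
// Hypothetical helper: splits text on blank lines and packs paragraphs
// into chunks of at most maxChars characters. A single paragraph longer
// than maxChars is hard-split.
function chunkText(text, maxChars = 4000) {
    const chunks = [];
    let current = '';
    for (const paragraph of text.split(/\n\s*\n/)) {
        let rest = paragraph.trim();
        if (!rest) continue;
        // Hard-split oversized paragraphs
        while (rest.length > maxChars) {
            if (current) { chunks.push(current); current = ''; }
            chunks.push(rest.slice(0, maxChars));
            rest = rest.slice(maxChars);
        }
        if (!rest) continue;
        const candidate = current ? current + '\n\n' + rest : rest;
        if (candidate.length > maxChars) {
            chunks.push(current);
            current = rest;
        } else {
            current = candidate;
        }
    }
    if (current) chunks.push(current);
    return chunks;
}

// Summarizes each chunk, then summarizes the joined partial summaries.
// `summarizer` is any object with an async summarize(text) method,
// such as the one returned by Summarizer.create() in Step 8.
async function summarizeLongText(summarizer, text, maxChars = 4000) {
    const chunks = chunkText(text, maxChars);
    if (chunks.length === 0) return '';
    if (chunks.length === 1) return summarizer.summarize(chunks[0]);
    const partials = [];
    for (const chunk of chunks) {
        partials.push(await summarizer.summarize(chunk));
    }
    return summarizer.summarize(partials.join('\n'));
}
```

Chunks are processed one after another rather than in parallel, which keeps the load on the local model predictable at the cost of a longer wait.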

Step 9: Add Save Functionality

Allow users to save scanned documents as PDF:

function saveImages() {
    if (DWTObject.HowManyImagesInBuffer === 0) {
        alert('No images to save.');
        return;
    }

    DWTObject.IfShowFileDialog = true;
    DWTObject.SaveAllAsPDF(
        'scanned_documents.pdf',
        function () {
            alert('Documents saved successfully!');
        },
        function (errorCode, errorString) {
            alert('Failed to save: ' + errorString);
        }
    );
}

Step 10: Build the User Interface

Create a clean, modern interface in index.html:

<div class="container scanner-container">
    <div class="scanner-header">
        <h1><i class="fas fa-scanner"></i> Document Scanner</h1>
        <p class="lead">Scan, extract, and summarize documents with AI</p>
    </div>

    <div class="card scanner-card">
        <!-- Scanner Controls -->
        <div class="scanner-controls">
            <div class="row">
                <div class="col-md-6">
                    <label for="sourceSelect">Select Scanner:</label>
                    <select class="form-select" id="sourceSelect">
                        <option>Loading scanners...</option>
                    </select>
                </div>
                <div class="col-md-6 text-center">
                    <button id="scanBtn" class="btn btn-primary btn-lg">
                        <i class="fas fa-play-circle"></i> Start Scanning
                    </button>
                </div>
            </div>
        </div>

        <!-- Document Viewer -->
        <div class="viewer-container">
            <div class="document-viewer">
                <!-- Thumbnail Panel -->
                <div class="thumbnail-panel">
                    <h6>Documents</h6>
                    <div id="thumbnailContainer"></div>
                </div>

                <!-- Large Image Panel -->
                <div class="large-image-panel">
                    <h6>Document View</h6>
                    <div id="largeImageContainer">
                        <img id="largeImageDisplay" alt="Selected document">
                    </div>
                    <div class="controls">
                        <button id="ocrBtnLarge">
                            <i class="fas fa-search"></i> OCR
                        </button>
                        <button id="editBtnLarge">
                            <i class="fas fa-pen"></i> Edit
                        </button>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

<!-- OCR Modal -->
<div id="ocrModal" class="modal">
    <div class="modal-content">
        <div class="modal-header">
            <h4>OCR Result</h4>
            <button class="close" onclick="closeOCRModal()">&times;</button>
        </div>
        <div class="modal-body">
            <div id="ocrText">Processing OCR...</div>
        </div>
        <div class="modal-footer">
            <select id="ocrLanguage">
                <option value="en">English</option>
                <option value="fr">French</option>
                <option value="es">Spanish</option>
                <option value="de">German</option>
            </select>
            <button onclick="copyOCRText()">Copy Text</button>
            <button onclick="summarizeOCRText()">Summarize</button>
        </div>
    </div>
</div>

<!-- Summary Modal -->
<div id="summaryModal" class="modal">
    <div class="modal-content">
        <div class="modal-header">
            <h4>Document Summary</h4>
            <button class="close" onclick="closeSummaryModal()">&times;</button>
        </div>
        <div class="modal-body">
            <div id="summaryProgress" style="display:none;">
                <div class="progress-fill"></div>
            </div>
            <div id="summaryText">Preparing summary...</div>
        </div>
        <div class="modal-footer">
            <button onclick="copySummaryText()">Copy Summary</button>
        </div>
    </div>
</div>
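
The markup above calls `closeOCRModal()`, `copyOCRText()`, `closeSummaryModal()`, and `copySummaryText()` from inline handlers, and the `scanBtn` and `ocrBtnLarge` buttons have no handlers yet. The full versions live in the linked source code; what follows is only a minimal sketch of how they could look in js/app.js. `closeModal` and `copyElementText` are hypothetical helpers, and the wiring assumes `acquireImage`, `performOCR`, and `selectedImageIndex` from the earlier steps.

```javascript
// Hypothetical helper: hides a modal by removing the 'show' class
// that performOCR() and summarizeOCRText() add.
function closeModal(id, doc = document) {
    doc.getElementById(id).classList.remove('show');
}
function closeOCRModal() { closeModal('ocrModal'); }
function closeSummaryModal() { closeModal('summaryModal'); }

// Hypothetical helper: copies an element's text to the clipboard.
// navigator.clipboard requires a secure context (HTTPS or localhost).
async function copyElementText(id, doc = document, clipboard = navigator.clipboard) {
    const text = doc.getElementById(id).textContent;
    await clipboard.writeText(text);
    return text;
}
function copyOCRText() { return copyElementText('ocrText'); }
function copySummaryText() { return copyElementText('summaryText'); }

// Wire the buttons that have ids but no inline handlers.
if (typeof document !== 'undefined') {
    const scanBtn = document.getElementById('scanBtn');
    const ocrBtn = document.getElementById('ocrBtnLarge');
    if (scanBtn) scanBtn.addEventListener('click', () => acquireImage());
    if (ocrBtn) ocrBtn.addEventListener('click', () => performOCR(selectedImageIndex));
}
```

Since js/app.js is loaded at the end of the body, the buttons already exist when the wiring runs.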

Using simple-dynamsoft-mcp for Faster Development

To accelerate development with Dynamic Web TWAIN, use the simple-dynamsoft-mcp server with your AI assistant:

Installation

npm install -g simple-dynamsoft-mcp

Configuration for VS Code

Add to your MCP settings:

{
  "servers": {
    "dynamsoft": {
      "command": "npx",
      "args": ["-y", "simple-dynamsoft-mcp"]
    }
  }
}

Benefits of Using MCP

Once configured, you can ask your AI assistant:

  • "Show me how to scan documents with Dynamic Web TWAIN"
  • "How do I perform OCR on a scanned image?"
  • "What's the code to save images as PDF?"
  • "How to load images from local files?"

The MCP server returns code examples and API documentation directly in your AI assistant, so you spend less time searching the docs.

Enabling Gemini Nano in Chrome

To use AI summarization, enable Gemini Nano:

  1. Update Chrome to version 138 or later
  2. Enable flags:
    • Go to chrome://flags/#optimization-guide-on-device-model
    • Set to "Enabled BypassPerfRequirement"
    • Go to chrome://flags/#summarization-api-for-gemini-nano
    • Set to "Enabled"
  3. Restart Chrome
  4. Download model:
    • Visit chrome://components/
    • Find "Optimization Guide On Device Model"
    • Click "Check for update"
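
Once the steps above are done, a quick check from the DevTools console can confirm whether the API is exposed. This is a small sketch; `checkSummarizer` is a hypothetical name, and as far as I know the states reported by `Summarizer.availability()` are 'unavailable', 'downloadable', 'downloading', and 'available'.

```javascript
// Hypothetical check: resolves to 'unsupported' when the Summarizer
// global is missing, otherwise to whatever availability() reports.
async function checkSummarizer(scope = globalThis) {
    if (!('Summarizer' in scope)) return 'unsupported';
    return scope.Summarizer.availability();
}

checkSummarizer().then(state => console.log('Summarizer status:', state));
```

A status of 'downloadable' or 'downloading' means the model is not on disk yet; the first `Summarizer.create()` call from Step 8 triggers or waits for the download.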

Testing Your Application

1. Start a Local Server

# Using Python
python -m http.server 8000

# Using Node.js
npx http-server -p 8000

2. Open in Browser

Navigate to http://localhost:8000

3. Test Workflow

  1. Select Scanner - Choose from the dropdown
  2. Scan Document - Click "Start Scanning"
  3. View Document - Click thumbnail to preview
  4. Extract Text - Click "OCR" button
  5. Summarize - Click "Summarize" in OCR modal
  6. Save - Click "Save All" to export as PDF

Screenshot: Web Document Scanner with OCR and AI Summarization

Source Code

https://github.com/yushulx/web-twain-document-scan-management/tree/main/examples/scan_ocr_summarize
