Xiao Ling

Posted on Sep 16 • Originally published at dynamsoft.com

Intelligent Document Splitting: Blank Page Detection with Dynamic Web TWAIN

#webdev #programming #javascript #scanner

Document scanning workflows often involve multi-page batches that contain separator pages or blank pages used for organizational purposes. Manually identifying and removing these blank pages while splitting documents is time-consuming and error-prone. In this tutorial, we'll implement intelligent document splitting using Dynamic Web TWAIN's blank page detection.

Demo Video: Blank Image Detection for Document Management

Online Demo

https://yushulx.me/web-twain-document-scan-management/examples/split_merge_document/

Prerequisites

Dynamsoft license key

The Challenge of Blank Pages in Document Scanning

In professional document scanning environments, blank pages serve various purposes:

Document Separators: Used to divide different documents in a batch scan
Page Padding: Added to ensure proper document alignment in feeders
Organizational Markers: Inserted between sections for filing purposes
Accidental Inclusions: Blank pages mixed in with content pages

Manual processing of these documents requires:

Human review of each page
Manual identification of blank pages
Time-consuming splitting and reorganization
Risk of human error in document classification

Why Dynamic Web TWAIN for Blank Detection

Dynamic Web TWAIN provides several advantages for implementing intelligent blank page detection:

JavaScript Blank Detection API

Built-in IsBlankImageAsync() method for accurate detection
Handles various image qualities and scanning conditions

Browser-Based Scanner Control

Cross-platform compatibility (Windows, macOS, Linux)
Supports TWAIN, WIA, ICA, and SANE scanners

Understanding the Auto Split Feature

Our implementation provides an intelligent Auto Split feature that:

Analyzes each page using Dynamic Web TWAIN's blank detection
Identifies separator pages based on blank content detection
Splits documents at blank page boundaries
Removes blank pages from the final output
Creates organized document groups automatically
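
The steps above can be sketched without a scanner or a DOM. The following is a minimal model, not the sample project's code: splitAtBlanks and its isBlank predicate are hypothetical names, and pages are plain strings where an empty string stands for a blank page.

```javascript
// Pure model of the auto split steps: no scanner or DOM involved.
// `pages` is any array; `isBlank` is a caller-supplied predicate (hypothetical).
function splitAtBlanks(pages, isBlank) {
    const groups = [];
    let current = [];
    for (const page of pages) {
        if (isBlank(page)) {
            // A blank page closes the current group and is itself dropped.
            if (current.length > 0) {
                groups.push(current);
                current = [];
            }
        } else {
            current.push(page);
        }
    }
    // Keep the trailing group if the batch did not end with a blank page.
    if (current.length > 0) groups.push(current);
    return groups;
}

const scanned = ['A1', 'A2', '', 'B1', '', '', 'C1'];
const groups = splitAtBlanks(scanned, (p) => p === '');
console.log(groups); // [ [ 'A1', 'A2' ], [ 'B1' ], [ 'C1' ] ]
```

Consecutive blank pages and a leading blank page produce no empty groups in this model, which mirrors what the real feature is meant to achieve by removing blank pages and empty documents.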

Key Benefits:

Automated Workflow: Eliminates manual intervention
Improved Accuracy: Reduces human error in document organization
Time Savings: Processes large batches far faster than manual page-by-page review
Clean Output: Removes unwanted blank pages automatically

Implementation Guide

Before implementing blank page detection, ensure you have included the Dynamic Web TWAIN SDK in your project.

<!-- Dynamic Web TWAIN SDK -->
<script src="https://unpkg.com/dwt/dist/dynamsoft.webtwain.min.js"></script>

Basic Setup

Initialize the Dynamic Web TWAIN environment with your license key:

Dynamsoft.DWT.ProductKey = licenseKey;
Dynamsoft.DWT.ResourcesPath = 'https://unpkg.com/dwt/dist/';

Dynamsoft.DWT.CreateDWTObjectEx({
    WebTwainId: 'mydwt-' + Date.now()
}, (dwtObject) => {
    console.log('Dynamic Web TWAIN initialized successfully');
}, (error) => {
    console.error('DWT initialization failed:', error);
});
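
If the rest of your code uses async/await, the callback-style setup above can be wrapped in a Promise. This is only a sketch: createDwtObject is a hypothetical helper name, and the dwt parameter is assumed to be the global Dynamsoft.DWT namespace provided by the SDK script.

```javascript
// Hypothetical helper: wraps the callback-style CreateDWTObjectEx in a Promise.
// `dwt` is expected to be the global Dynamsoft.DWT namespace from the SDK script.
function createDwtObject(dwt, licenseKey, resourcesPath) {
    dwt.ProductKey = licenseKey;
    dwt.ResourcesPath = resourcesPath;
    return new Promise((resolve, reject) => {
        dwt.CreateDWTObjectEx(
            { WebTwainId: 'mydwt-' + Date.now() },
            resolve, // called with the new DWT object
            reject   // called with the error details
        );
    });
}

// Usage in the browser (assumes the SDK script tag has loaded):
// const DWTObject = await createDwtObject(
//     Dynamsoft.DWT, licenseKey, 'https://unpkg.com/dwt/dist/');
```

Passing the namespace in as an argument keeps the helper easy to try with a stand-in object when no scanner service is available.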

Auto Split Function with Blank Page Detection

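Here's the complete implementation of our intelligent auto split feature. It is written as a method in the sample project and relies on helpers (Utils, FileManager, PageManager) and shared state (DWTObject, imageCount) defined elsewhere in that project: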

async autoSplit() {
    if (!DWTObject || imageCount === 0) {
        Utils.showNotification('No images to analyze for auto split', 'error');
        return;
    }

    Utils.showNotification('Analyzing images for blank pages...', 'info');
    let splitsPerformed = 0;
    let blankPagesRemoved = 0;

    const imageBoxContainer = document.querySelector('#imagebox-1 .ds-imagebox');
    if (!imageBoxContainer) {
        Utils.showNotification('No images found to analyze', 'error');
        return;
    }

    for (let i = imageCount - 1; i >= 0; i--) {
        try {
            let isBlank = await DWTObject.IsBlankImageAsync(i);

            if (isBlank) {
                const imageID = DWTObject.IndexToImageID(i);
                const imgElement = document.querySelector(`img[imageid="${imageID}"]`);

                if (imgElement) {
                    const imageWrapper = imgElement.parentNode;
                    const previousWrapper = imageWrapper.previousElementSibling;

                    if (previousWrapper) {
                        this.splitImage(imgElement);
                        splitsPerformed++;
                        console.log(`Split performed before blank page (image index: ${i})`);
                    }

                    FileManager.deleteOneImage(imgElement);
                    blankPagesRemoved++;
                    console.log(`Blank page removed (image index: ${i})`);
                }
            }
        } catch (error) {
            console.error('Error analyzing image at index', i, ':', error);
        }
    }

    FileManager.deleteEmptyDocs();

    if (splitsPerformed > 0 || blankPagesRemoved > 0) {
        let message = 'Auto split completed! ';
        if (splitsPerformed > 0) message += `${splitsPerformed} split(s) performed. `;
        if (blankPagesRemoved > 0) message += `${blankPagesRemoved} blank page(s) removed.`;
        Utils.showNotification(message, 'success');
        PageManager.updateAll();
    } else {
        Utils.showNotification('No blank pages detected for splitting', 'info');
    }
}

Implementation Details

1. Reverse Processing Strategy

for (let i = imageCount - 1; i >= 0; i--) {
    // check page i; if it is blank, split and delete it
}

Processing images in reverse order avoids index-shifting bugs: deleting the page at index i only shifts pages at higher indices, and those have already been checked.
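
To see why the direction matters, here is a small illustration on a plain array rather than the real image buffer. The function names are hypothetical, and an empty string stands for a blank page.

```javascript
// Plain-array model of the image buffer; '' stands for a blank page.
function removeBlanksForward(pages) {
    const buffer = [...pages];
    for (let i = 0; i < buffer.length; i++) {
        // After splice, the next page slides into slot i and is never checked.
        if (buffer[i] === '') buffer.splice(i, 1);
    }
    return buffer;
}

function removeBlanksReverse(pages) {
    const buffer = [...pages];
    for (let i = buffer.length - 1; i >= 0; i--) {
        // Only indices above i shift, and those were already visited.
        if (buffer[i] === '') buffer.splice(i, 1);
    }
    return buffer;
}

const pages = ['A', '', '', 'B'];
console.log(removeBlanksForward(pages)); // [ 'A', '', 'B' ]
console.log(removeBlanksReverse(pages)); // [ 'A', 'B' ]
```

The forward loop misses the second of two adjacent blank pages, while the reverse loop removes both.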

2. Blank Page Detection

let isBlank = await DWTObject.IsBlankImageAsync(i);

Dynamic Web TWAIN's IsBlankImageAsync() method takes an image index and resolves to true when the SDK considers that page blank.
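
As a sketch of using the call on its own, the hypothetical helper below collects the indices reported as blank without modifying anything. It assumes only that the object passed in exposes IsBlankImageAsync(index), as used above; the fakeDwt object is a stand-in for trying it without a scanner.

```javascript
// Hypothetical helper: returns the indices that IsBlankImageAsync reports as blank.
// `dwtObject` only needs an IsBlankImageAsync(index) method returning a Promise.
async function findBlankIndices(dwtObject, imageCount) {
    const blanks = [];
    for (let i = 0; i < imageCount; i++) {
        try {
            if (await dwtObject.IsBlankImageAsync(i)) blanks.push(i);
        } catch (error) {
            // Treat a failed check as "not blank" so no page is deleted by mistake.
            console.error('Blank check failed at index', i, error);
        }
    }
    return blanks;
}

// Stand-in object for trying the helper without a scanner.
const fakeDwt = { IsBlankImageAsync: async (i) => i === 1 || i === 3 };
findBlankIndices(fakeDwt, 5).then((indices) => console.log(indices)); // [ 1, 3 ]
```

Separating detection from deletion like this can be handy for previewing which pages would be removed before committing to the split.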

3. Smart Document Splitting

if (previousWrapper) {
    this.splitImage(imgElement);
    splitsPerformed++;
}

A split is performed only when the blank page has a preceding page in the same document, which avoids creating an empty document when a blank page comes first.

4. Cleanup and Removal

FileManager.deleteOneImage(imgElement);
blankPagesRemoved++;

Blank pages are completely removed from the document workflow, ensuring clean output.

Document Splitting Logic

The splitImage() method handles the creation of new document groups:

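splitImage(imageEl) {
    const imageWrapperDiv = imageEl.parentNode;
    // previousElementSibling skips text nodes and matches the check in autoSplit()
    const previousDivEl = imageWrapperDiv.previousElementSibling;

    if (previousDivEl) {
        // Start a new document after the preceding page
        this.createNextDocument(previousDivEl);
    }
}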

This method:

Identifies the split point before the blank page
Creates a new document group
Moves subsequent pages to the new group
Updates the UI to reflect the new document structure
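
The effect of a split can be modeled with plain arrays. This is an illustration only: splitDocumentAfter is a hypothetical stand-in for the DOM-based createNextDocument() in the sample, and each document is represented as an array of page labels.

```javascript
// Array model of document groups: each document is an array of page labels.
// Hypothetical stand-in for the DOM-based createNextDocument() in the sample.
function splitDocumentAfter(docs, docIndex, pageIndex) {
    const doc = docs[docIndex];
    const head = doc.slice(0, pageIndex + 1); // pages up to and including the split point
    const tail = doc.slice(pageIndex + 1);    // pages that move to the new document
    return [
        ...docs.slice(0, docIndex),
        head,
        tail,
        ...docs.slice(docIndex + 1),
    ];
}

const docs = [['A1', 'A2', 'blank', 'B1']];
// Split after 'A2' (index 1): the blank page leads the new document
// until it is deleted in the cleanup step.
console.log(splitDocumentAfter(docs, 0, 1));
// [ [ 'A1', 'A2' ], [ 'blank', 'B1' ] ]
```

In this model the blank page ends up at the start of the new document, which is why the removal step follows the split in autoSplit().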

Source Code

https://github.com/yushulx/web-twain-document-scan-management/tree/main/examples/split_merge_document
