Xiao Ling

Posted on Apr 2 • Originally published at dynamsoft.com

How to Build a JavaScript Multi-Page Document Scanner Web App with Auto-Capture and PDF Export

#javascript #webdev #document #scanner

Scanning physical documents with a phone or laptop camera—and getting a clean, de-skewed result instantly—is a common requirement in expense-management, healthcare, and contract-signing workflows. The Dynamsoft Capture Vision SDK handles the heavy lifting (boundary detection, perspective normalization) entirely in the browser via WebAssembly, so no server-side component is needed.

What you'll build: A vanilla-JavaScript web app that opens the device camera, automatically detects document boundaries frame-by-frame, captures and normalizes multi-page documents, lets users edit corners, apply image filters, reorder pages, and export the result as a PDF or individual PNGs — powered by Dynamsoft Capture Vision Bundle v3.2.5000.

Online Demo

https://yushulx.me/javascript-barcode-qr-code-scanner/examples/multi-document-capture/

Demo Video: Multi-Page Document Scanner in Action

Prerequisites

A modern browser with camera access (Chrome, Edge, Safari, Firefox).
Node.js and any static HTTP server (Python's http.server, npx serve, etc.) for local development.
A Dynamsoft Capture Vision license key.

Get a 30-day free trial license for Dynamsoft Capture Vision.

Step 1: Load the SDK via CDN

The entire SDK is delivered as a single bundle from jsDelivr — no npm install required. Add both the Dynamsoft Capture Vision bundle and jsPDF to the <head> of index.html:

<script src="https://cdn.jsdelivr.net/npm/dynamsoft-capture-vision-bundle@3.2.5000/dist/dcv.bundle.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/jspdf@2.5.1/dist/jspdf.umd.min.js"></script>

The bundle includes the Document Detection & Normalization (DDN) module, the CaptureVisionRouter, and the LicenseManager — everything needed to go from raw camera frames to normalized document images.

Step 2: Initialize the License and CaptureVisionRouter

After the user submits a license key on the start screen, initSDK activates the license, loads the DDN WebAssembly module, creates a CaptureVisionRouter instance, and loads the preset template file that defines the detection and normalization tasks:

const TEMPLATE_PATH = "./DBR_and_DDN_detect_PresetTemplates.json";
const DETECT_TEMPLATE = "DetectDocumentBoundaries_Default";
const NORMALIZE_TEMPLATE = "NormalizeDocument_Default";

async function initSDK(licenseKey) {
    updateInitStatus("Activating license...");
    await Dynamsoft.License.LicenseManager.initLicense(licenseKey, true);

    updateInitStatus("Loading modules...");
    await Dynamsoft.Core.CoreModule.loadWasm(["DDN"]);

    updateInitStatus("Initializing scanner...");
    cvr = await Dynamsoft.CVR.CaptureVisionRouter.createInstance();
    await cvr.initSettings(TEMPLATE_PATH);

    updateInitStatus("Opening camera...");
    await initCamera();
}

The initSettings call registers DetectDocumentBoundaries_Default and NormalizeDocument_Default as named tasks so they can be invoked by name later.

Step 3: Run the Frame-by-Frame Detection Loop

The scan loop calls cvr.capture() on each animation frame, passing the live <video> element and the detection template name. When a four-point quad is returned, it is drawn on an overlay <canvas> and fed into the stabilizer:

async function scanLoop() {
    if (!isScanning) return;

    if (isProcessingFrame || isCaptureInProgress) {
        scanLoopId = requestAnimationFrame(scanLoop);
        return;
    }

    isProcessingFrame = true;
    try {
        const result = await cvr.capture(videoElement, DETECT_TEMPLATE);
        let quad = null;

        for (const item of result.items || []) {
            if (item.location && item.location.points && item.location.points.length === 4) {
                quad = item.location.points;
                break;
            }
        }

        if (quad) {
            latestDetectedQuad = quad;
            drawOverlay(quad);
            // ... stabilizer logic and auto-capture
        } else {
            clearOverlay();
            resetStabilizer();
        }
    } catch (_) {
        clearOverlay();
    }

    isProcessingFrame = false;
    if (isScanning) {
        scanLoopId = requestAnimationFrame(scanLoop);
    }
}

The isProcessingFrame guard prevents concurrent captures on devices where WebAssembly inference takes longer than a single animation frame.

Step 4: Stabilize the Detected Quad Before Auto-Capture

Firing a capture the instant a quad is detected produces blurry or mis-framed results. The stabilizer compares consecutive quads using Intersection over Union (IoU) and an area-delta check; only after stableFrameCount consecutive stable frames is the capture triggered automatically:

const quadStabilizer = {
    enabled: true,
    iouThreshold: 0.85,
    areaDeltaThreshold: 0.15,
    stableFrameCount: 3,
};

function isQuadStable(current, previous) {
    if (!current || !previous || current.length !== 4 || previous.length !== 4) return false;
    const boxA = pointsToBoundingBox(current);
    const boxB = pointsToBoundingBox(previous);
    const iou = computeIoU(boxA, boxB);

    const areaA = polygonArea(current);
    const areaB = polygonArea(previous);
    const areaDelta = areaB === 0 ? 1 : Math.abs(areaA - areaB) / areaB;

    return iou >= quadStabilizer.iouThreshold && areaDelta <= quadStabilizer.areaDeltaThreshold;
}

All three thresholds (iouThreshold, areaDeltaThreshold, stableFrameCount) are exposed in an in-app settings panel so users can tune them for their environment.

Step 5: Normalize the Document with Perspective Correction

Once a stable quad is confirmed (or the user taps the capture button), normalizeDocument projects the detected boundary points back into cvr to perform perspective de-skew and returns the corrected image as a <canvas>:

async function normalizeDocument(frameCanvas, points) {
    try {
        const settings = await cvr.getSimplifiedSettings(NORMALIZE_TEMPLATE);
        settings.roi.points = points;
        settings.roiMeasuredInPercentage = 0;
        await cvr.updateSettings(NORMALIZE_TEMPLATE, settings);

        const normalizedResult = await cvr.capture(frameCanvas, NORMALIZE_TEMPLATE);
        for (const item of normalizedResult.items || []) {
            if (item.toCanvas && typeof item.toCanvas === "function") {
                return item.toCanvas();
            }
        }
        return null;
    } catch (_) {
        return null;
    }
}

Setting roiMeasuredInPercentage = 0 tells the router that settings.roi.points are pixel coordinates in the source frame, not percentages.

Step 6: Apply Image Filters Per Page

After capture, each page can be rendered in color, grayscale, or binary mode. The filter is applied client-side using getImageData / putImageData on a scratch canvas, leaving the original baseCanvas untouched:

function buildFilteredCanvas(page) {
    const src = page.baseCanvas;
    const out = document.createElement("canvas");
    out.width = src.width;
    out.height = src.height;
    const ctx = out.getContext("2d", { willReadFrequently: true });
    ctx.drawImage(src, 0, 0);

    if (page.filter === "color") {
        return out;
    }

    const image = ctx.getImageData(0, 0, out.width, out.height);
    const data = image.data;
    for (let i = 0; i < data.length; i += 4) {
        const gray = Math.round(data[i] * 0.299 + data[i + 1] * 0.587 + data[i + 2] * 0.114);
        if (page.filter === "grayscale") {
            data[i] = gray;
            data[i + 1] = gray;
            data[i + 2] = gray;
        } else {
            const binary = gray > 140 ? 255 : 0;
            data[i] = binary;
            data[i + 1] = binary;
            data[i + 2] = binary;
        }
    }
    ctx.putImageData(image, 0, 0);
    return out;
}

Step 7: Edit the Document Quad After Capture

If the auto-detected corners are inaccurate, the user can open the edit screen for any captured page, drag the four corner handles to the correct positions, and re-apply perspective normalization without recapturing. openEdit loads the stored originalCanvas and quadPoints into editState, then renderEditCanvas draws the image and overlays draggable 14 px circular handles:

function openEdit() {
    const page = pages[currentPageIndex];
    if (!page.originalCanvas || !page.quadPoints) {
        showToast("No quad data available for editing.");
        return;
    }
    editState = {
        originalCanvas: page.originalCanvas,
        quadPoints: page.quadPoints.map(p => ({ x: p.x, y: p.y })),
        draggingCorner: -1,
        imgRect: null,
    };
    resultScreen.classList.add("hidden");
    editScreen.classList.remove("hidden");
    requestAnimationFrame(renderEditCanvas);
}

function renderEditCanvas() {
    if (!editState) return;
    const rect = editCanvasEl.getBoundingClientRect();
    const cw = Math.round(rect.width);
    const ch = Math.round(rect.height);
    if (cw === 0 || ch === 0) return;
    editCanvasEl.width = cw;
    editCanvasEl.height = ch;

    const img = editState.originalCanvas;
    const scale = Math.min(cw / img.width, ch / img.height);
    const imgW = Math.round(img.width * scale);
    const imgH = Math.round(img.height * scale);
    const imgX = Math.round((cw - imgW) / 2);
    const imgY = Math.round((ch - imgH) / 2);
    editState.imgRect = { x: imgX, y: imgY, w: imgW, h: imgH };

    editCtx2.clearRect(0, 0, cw, ch);
    editCtx2.drawImage(img, imgX, imgY, imgW, imgH);

    const pts = editState.quadPoints.map(p => ({
        x: imgX + (p.x / img.width) * imgW,
        y: imgY + (p.y / img.height) * imgH,
    }));

    editCtx2.beginPath();
    editCtx2.moveTo(pts[0].x, pts[0].y);
    for (let i = 1; i < 4; i++) editCtx2.lineTo(pts[i].x, pts[i].y);
    editCtx2.closePath();
    editCtx2.fillStyle = "rgba(106,196,187,0.2)";
    editCtx2.fill();
    editCtx2.strokeStyle = "#6ac4bb";
    editCtx2.lineWidth = 2;
    editCtx2.stroke();

    const HANDLE_R = 14;
    pts.forEach((p, i) => {
        editCtx2.beginPath();
        editCtx2.arc(p.x, p.y, HANDLE_R, 0, Math.PI * 2);
        editCtx2.fillStyle = i === editState.draggingCorner ? "#fe8e14" : "#6ac4bb";
        editCtx2.fill();
        editCtx2.strokeStyle = "#fff";
        editCtx2.lineWidth = 2;
        editCtx2.stroke();
    });
}

When the user taps Apply, applyEdit calls normalizeDocument with the updated corner coordinates and replaces the stored baseCanvas in-place — the page is corrected without affecting any other page:

async function applyEdit() {
    if (!editState) return;
    editApplyBtn.disabled = true;
    try {
        const normalized = await normalizeDocument(editState.originalCanvas, editState.quadPoints);
        if (!normalized) {
            showToast("Normalization failed. Adjust corners and try again.");
            return;
        }
        pages[currentPageIndex].baseCanvas = copyCanvas(normalized);
        pages[currentPageIndex].quadPoints = editState.quadPoints.map(p => ({ x: p.x, y: p.y }));
        editState = null;
        editScreen.classList.add("hidden");
        resultScreen.classList.remove("hidden");
        renderResult();
        updateThumbnailBar();
    } catch (e) {
        console.error(e);
        showToast("Edit failed: " + (e?.message || "Unknown error"));
    } finally {
        editApplyBtn.disabled = false;
    }
}

Step 8: Reorder Pages with Drag-and-Drop

For multi-page sessions, the sort overlay lets users drag thumbnail cards into the desired order. The implementation supports both mouse (dragstart / drop) and touch (touchstart / touchmove / touchend) events. A floating clone follows the finger during a touch drag, and document.elementFromPoint identifies the drop target:

function openSort() {
    if (pages.length < 2) {
        showToast("Need at least 2 pages to reorder.");
        return;
    }
    const workingOrder = pages.map((_, i) => i);
    let dragIdx = -1;

    function rebuild() {
        sortList.innerHTML = "";
        workingOrder.forEach((pageIdx, pos) => {
            const item = document.createElement("div");
            item.className = "sort-item";
            item.draggable = true;
            item.dataset.pos = pos;

            item.addEventListener("dragstart", (e) => {
                dragIdx = pos;
                item.classList.add("dragging");
                e.dataTransfer.effectAllowed = "move";
            });
            item.addEventListener("drop", (e) => {
                e.preventDefault();
                item.classList.remove("drag-over");
                const dropPos = Number(item.dataset.pos);
                if (dragIdx >= 0 && dragIdx !== dropPos) {
                    const [moved] = workingOrder.splice(dragIdx, 1);
                    workingOrder.splice(dropPos, 0, moved);
                    rebuild();
                }
            });

            // Touch drag support
            let touchClone = null;
            item.addEventListener("touchstart", (e) => {
                dragIdx = pos;
                const touch = e.touches[0];
                touchClone = item.cloneNode(true);
                touchClone.style.cssText = `position:fixed;left:${touch.clientX - 30}px;top:${touch.clientY - 24}px;width:${item.offsetWidth}px;opacity:0.8;z-index:200;pointer-events:none`;
                document.body.appendChild(touchClone);
                item.classList.add("dragging");
            }, { passive: true });
            item.addEventListener("touchend", (e) => {
                item.classList.remove("dragging");
                if (touchClone) { touchClone.remove(); touchClone = null; }
                const touch = e.changedTouches[0];
                const overItem = document.elementFromPoint(touch.clientX, touch.clientY)?.closest(".sort-item");
                if (overItem) {
                    const dropPos = Number(overItem.dataset.pos);
                    if (dragIdx >= 0 && dragIdx !== dropPos) {
                        const [moved] = workingOrder.splice(dragIdx, 1);
                        workingOrder.splice(dropPos, 0, moved);
                        rebuild();
                    }
                }
                dragIdx = -1;
            });
            sortList.appendChild(item);
        });
    }

    rebuild();
    sortOverlay.classList.remove("hidden");

    sortDoneBtn.onclick = () => {
        pages = workingOrder.map(i => pages[i]);
        currentPageIndex = 0;
        sortOverlay.classList.add("hidden");
        renderResult();
        updateThumbnailBar();
    };
}

When the user taps Done, workingOrder (an index remapping array) is applied to pages in a single .map() call, making the reorder non-destructive until confirmed.

Step 9: Export the Scanned Pages as a PDF

All captured pages (with filters applied) are packed into a single multi-page PDF using jsPDF. Each page is centered and scaled to fit A4:

async function exportPdf() {
    if (!pages.length || !window.jspdf) return;
    const { jsPDF } = window.jspdf;
    const pdf = new jsPDF({ unit: "pt", format: "a4" });

    for (let i = 0; i < pages.length; i++) {
        const canvas = buildFilteredCanvas(pages[i]);
        const imageData = canvas.toDataURL("image/jpeg", 0.95);
        if (i > 0) pdf.addPage("a4", "portrait");

        const pageWidth = pdf.internal.pageSize.getWidth();
        const pageHeight = pdf.internal.pageSize.getHeight();
        const ratio = Math.min(pageWidth / canvas.width, pageHeight / canvas.height);
        const drawWidth = canvas.width * ratio;
        const drawHeight = canvas.height * ratio;
        const x = (pageWidth - drawWidth) / 2;
        const y = (pageHeight - drawHeight) / 2;

        pdf.addImage(imageData, "JPEG", x, y, drawWidth, drawHeight);
    }

    const blob = pdf.output("blob");
    const blobUrl = URL.createObjectURL(blob);
    window.open(blobUrl, "_blank", "noopener,noreferrer");
}

Individual pages can also be downloaded as PNG files via the Export as Images option, which calls canvas.toDataURL("image/png") for each page.

Source Code

https://github.com/yushulx/javascript-barcode-qr-code-scanner/tree/main/examples/multi-document-capture

DEV Community