DEV Community

monkeymore studio


Building a Browser-Based PDF Page Removal Tool with WebAssembly and Web Workers

In this article, we'll explore how to implement a pure client-side PDF page removal tool that runs entirely in the browser. No server required, no file uploads, complete privacy protection.

Why Browser-Based PDF Processing?

Traditional PDF processing typically requires:

  • Uploading files to a server
  • Processing on the backend
  • Downloading the result

This approach has significant drawbacks:

  1. Privacy concerns - Your sensitive documents are sent to third-party servers
  2. Network dependency - Requires a stable internet connection
  3. Latency - Upload and download times for large files
  4. Server costs - Backend infrastructure required

Browser-based processing addresses each of these issues:

  • ✅ Files never leave your computer
  • ✅ Works offline after initial load
  • ✅ No upload or download wait; processing starts as soon as the file is selected
  • ✅ Zero server costs for PDF operations

Architecture Overview

Our solution combines three powerful web technologies:

  1. WebAssembly (WASM) - Running QPDF (a powerful PDF manipulation library) compiled to WASM
  2. Web Workers - Offloading heavy PDF operations to a background thread
  3. Comlink - Making worker communication as simple as async function calls

Core Data Structures

PageRange Type

The fundamental data structure for specifying which pages to remove:

// types/pdfdata.ts
export type PageRange = [number, number];

Each PageRange is a tuple where:

  • Index 0: Start page number (inclusive)
  • Index 1: End page number (inclusive)
  • Single pages are represented as [n, n]
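
Because both ends are inclusive, a little validation goes a long way. Here is a minimal sketch of two helpers for working with the tuple; isValidRange and rangeLength are hypothetical names for illustration and are not part of the project code:

```typescript
type PageRange = [number, number];

// Hypothetical helper: both ends must be positive integers with start <= end.
function isValidRange([start, end]: PageRange): boolean {
  return (
    Number.isInteger(start) &&
    Number.isInteger(end) &&
    start >= 1 &&
    start <= end
  );
}

// Hypothetical helper: number of pages covered (both ends are inclusive).
function rangeLength([start, end]: PageRange): number {
  return end - start + 1;
}

const single: PageRange = [5, 5]; // page 5 only
const span: PageRange = [1, 3]; // pages 1, 2 and 3
```

Note that rangeLength adds 1 precisely because the end page is part of the range.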

WorkerFunctions Interface

The contract between main thread and worker:

// hooks/useqpdf.ts
interface WorkerFunctions {
  init: () => Promise<void>;
  remove: (file: File, ...range: PageRange[]) => Promise<ArrayBuffer | null>;
  // ... other operations
}

Implementation Deep Dive

1. User Interface Layer

The UI component handles user input and triggers the removal process:

// app/[locale]/_components/qpdf/remove.tsx
export const Organize = () => {
  const [files, setFiles] = useState<File[]>([]);
  const { value: pages, onChange: onChangeUserPassword } =
    useInputValue<string>("1-z");
  const { remove } = useQpdf();

  const mergeInMain = async () => {
    // Parse user input: "1-3,5,10-z" → [[1,3], [5,5], [10,10000]]
    // "z" means the last page; 10000 is the upper bound the worker assumes.
    const toPage = (s: string) => (s.trim() === "z" ? 10000 : parseInt(s, 10));
    const removePages = pages
      .replaceAll("，", ",")  // Support Chinese comma
      .split(",")
      .map((e) => {
        if (e.includes("-")) {
          const t = e.split("-");
          return [toPage(t[0]!), toPage(t[1]!)] as PageRange;
        } else {
          return [toPage(e), toPage(e)] as PageRange;
        }
      });

    const outputFile = await remove(files[0]!, ...removePages);

    if (outputFile) {
      autoDownloadBlob(new Blob([outputFile]), "organize.pdf");
    }
  };
  // ...
};

Key features of the input format:

  • Comma-separated ranges: 1-3,5,10-z
  • Single pages: 5 becomes [5, 5]
  • Ranges: 1-3 becomes [1, 3]
  • Special character z represents the last page
  • Accepts the full-width Chinese comma (，) as a separator, for localization
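
The format rules above can also be pulled out of the component into a small, testable function. The following is a sketch rather than the project's actual parser: parsePageSpec, parsePageToken and LAST_PAGE are hypothetical names, and the 10000 sentinel is assumed to match the upper bound used in the worker.

```typescript
type PageRange = [number, number];

// Assumption: the same 10000 upper bound the worker uses for "last page".
const LAST_PAGE = 10000;

// Turn one token ("5" or "z") into a page number, or NaN if it is malformed.
function parsePageToken(token: string): number {
  const t = token.trim().toLowerCase();
  if (t === "z") return LAST_PAGE;
  return /^\d+$/.test(t) ? parseInt(t, 10) : NaN;
}

// Parse "1-3,5,10-z" into inclusive ranges; throws on malformed input.
function parsePageSpec(spec: string): PageRange[] {
  return spec
    .replace(/\uFF0C/g, ",") // accept the full-width (Chinese) comma
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part): PageRange => {
      const pieces = part.split("-");
      const start = parsePageToken(pieces[0] ?? "");
      const end = parsePageToken(pieces[pieces.length - 1] ?? "");
      if (
        pieces.length > 2 ||
        Number.isNaN(start) ||
        Number.isNaN(end) ||
        start < 1 ||
        start > end
      ) {
        throw new Error(`Invalid page range: "${part}"`);
      }
      return [start, end];
    });
}
```

Rejecting malformed input up front (for example "1-" or "3-1") means the worker never receives a range containing NaN.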

2. Worker Management with Comlink

The useQpdf hook manages the Web Worker lifecycle:

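// hooks/useqpdf.ts
import { useEffect, useRef } from "react";
import * as Comlink from "comlink";
// PdfWorker is the project's worker import for hooks/pdf.worker.js (not shown here)

export const useQpdf = () => {
  const workerRef = useRef<Comlink.Remote<WorkerFunctions> | null>(null);

  useEffect(() => {
    // Create the worker synchronously so the effect itself returns the cleanup.
    // (A cleanup returned from an async helper is never seen by React.)
    const worker = new PdfWorker();

    worker.onerror = (error) => {
      console.error("Worker error:", error);
    };

    const remote = Comlink.wrap<WorkerFunctions>(worker);
    workerRef.current = remote;
    remote.init().catch((error) => {
      console.error("QPDF init failed:", error);
    });

    return () => {
      workerRef.current = null;
      worker.terminate();
    };
  }, []);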

  const remove = async (
    file: File,
    ...range: PageRange[]
  ): Promise<ArrayBuffer | null> => {
    if (!workerRef.current) return null;
    const r = await workerRef.current.remove(file, ...range);
    return r;
  };

  return { remove };
};

Why Comlink?

  • Eliminates manual postMessage boilerplate
  • Provides type-safe function calls
  • Handles serialization automatically
  • Makes worker code look like regular async functions

3. The Range Inversion Algorithm

QPDF's --pages flag specifies which pages to keep, not which to remove. So we need to invert the user's "remove" ranges into "keep" ranges:

// hooks/pdf.worker.js
function removeRanges(mainRange, ...excludeRanges) {
  const [start, end] = mainRange;
  const excludeSet = new Set();

  // Collect all pages to exclude
  excludeRanges.forEach(([s, e]) => {
    for (let i = s; i <= e; i++) {
      excludeSet.add(i);
    }
  });

  // Collect remaining pages
  const remaining = [];
  for (let i = start; i <= end; i++) {
    if (!excludeSet.has(i)) {
      remaining.push(i);
    }
  }

  // Convert consecutive numbers to compact ranges
  const result = [];
  if (remaining.length === 0) return result;

  let currentStart = remaining[0];
  let currentEnd = remaining[0];

  for (let i = 1; i < remaining.length; i++) {
    if (remaining[i] === currentEnd + 1) {
      currentEnd = remaining[i];
    } else {
      result.push(
        currentStart === currentEnd
          ? [currentStart]
          : [currentStart, currentEnd]
      );
      currentStart = remaining[i];
      currentEnd = remaining[i];
    }
  }

  result.push(
    currentStart === currentEnd ? [currentStart] : [currentStart, currentEnd]
  );

  return result;
}

Example transformation:

  • Input: Remove [1,3], [5,5], [10,10000] from document with 100 pages
  • Process: Exclude pages 1-3, 5, 10-100 → Remaining: 4, 6-9
  • Output: [[4], [6,9]] → formatted as "4,6-9"
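
For comparison, the same inversion can be done without materializing every page number in a Set. The sketch below swaps in a sort-and-sweep over the ranges themselves; it is an alternative technique, not the code used in the worker. It assumes the total page count is known (the worker above uses a fixed 10000 upper bound instead), and invertRanges is a hypothetical name. Single pages come back as [n, n] so that every entry is a valid PageRange.

```typescript
type PageRange = [number, number];

// Return the inclusive ranges of pages 1..totalPages NOT covered by `remove`.
function invertRanges(totalPages: number, remove: PageRange[]): PageRange[] {
  // Clamp each range to the document, drop empty ones, sort by start page.
  const sorted = remove
    .map(([s, e]): PageRange => [Math.max(s, 1), Math.min(e, totalPages)])
    .filter(([s, e]) => s <= e)
    .sort((a, b) => a[0] - b[0]);

  const keep: PageRange[] = [];
  let next = 1; // first page not yet known to be removed
  for (const [s, e] of sorted) {
    if (s > next) keep.push([next, s - 1]); // gap before this removed range
    next = Math.max(next, e + 1); // also handles overlapping ranges
  }
  if (next <= totalPages) keep.push([next, totalPages]);
  return keep;
}
```

The cost depends on the number of ranges rather than the number of pages, which matters little at 10000 pages but keeps the logic independent of the upper bound.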

4. QPDF WASM Execution

The core PDF processing happens in the Web Worker using QPDF compiled to WebAssembly:

// hooks/pdf.worker.js
async remove(file, ...range) {
  // Convert File to ArrayBuffer
  const arrayBuffer = await file.arrayBuffer();
  const uint8Array = new Uint8Array(arrayBuffer);

  // Write to QPDF's virtual filesystem
  qpdf.FS.writeFile(`/input.pdf`, uint8Array);

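  // Calculate pages to KEEP (inverse of pages to remove)
  const LAST = 10000;  // Upper bound standing in for the last page
  const result = removeRanges([1, LAST], ...range);
  if (result.length === 0) return null;  // Every page was removed

  // Render only the upper bound as 'z' (last page); other ends stay numeric
  const page = (n) => (n === LAST ? "z" : n + "");
  const resultstr = result.map((e) => {
    if (e.length == 1) return page(e[0]);
    else return page(e[0]) + "-" + page(e[1]);
  });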

  // Build QPDF command
  const params = [
    "/input.pdf",
    "--pages",
    "/input.pdf",
    resultstr.join(","),  // Pages to KEEP
    "--",
    "/output.pdf",
  ];

  // Execute QPDF
  qpdf.callMain(params);

  // Read output from virtual filesystem
  const outputFile = qpdf.FS.readFile("/output.pdf");
  return outputFile;
}

QPDF Command Example:

# To remove pages 1-3 and 5 from a 100-page document:
# We need to keep pages 4, 6-100
qpdf input.pdf --pages input.pdf 4,6-z -- output.pdf
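
To make the mapping from "keep" ranges to that command line explicit, here is a small sketch that builds the argument array. formatPageSpec and buildQpdfArgs are hypothetical helpers, and lastPage is whatever number the caller treats as the final page (the real page count, or the 10000 upper bound used in the worker).

```typescript
type PageRange = [number, number];

// Format keep-ranges as a qpdf page spec; `lastPage` is rendered as "z".
function formatPageSpec(keep: PageRange[], lastPage: number): string {
  const fmt = (n: number): string => (n === lastPage ? "z" : String(n));
  return keep
    .map(([s, e]) => (s === e ? fmt(s) : `${fmt(s)}-${fmt(e)}`))
    .join(",");
}

// Build the argument list: <input> --pages <input> <spec> -- <output>
function buildQpdfArgs(
  input: string,
  output: string,
  keep: PageRange[],
  lastPage: number
): string[] {
  if (keep.length === 0) throw new Error("No pages left to keep");
  return [input, "--pages", input, formatPageSpec(keep, lastPage), "--", output];
}
```

For the example above, buildQpdfArgs("/input.pdf", "/output.pdf", [[4, 4], [6, 100]], 100) yields the same arguments as the shell command, with the page spec "4,6-z".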

5. WASM Initialization

The QPDF WASM module is initialized with Emscripten's virtual filesystem:

// lib/qpdfwasm.js
import createModule from "@neslinesli93/qpdf-wasm";

const f = async () => {
  const qpdf = await createModule({
    locateFile: () => "/qpdf.wasm",
    noInitialRun: true,  // Don't run main() immediately
    preRun: [(module) => {
      if (module.FS) {
        // Filesystem is ready
      }
    }],
  });
  return qpdf;
};
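
Since the loader above creates a new module on every call, one option is to cache the promise so the WASM binary is only instantiated once per worker. This is a generic sketch, not the project's init code: once is a hypothetical helper, and the loader passed to it here is a stand-in for the real createModule call.

```typescript
// Wrap an async factory so it runs at most once; later calls reuse the promise.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = factory().catch((err) => {
        cached = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}

// Stand-in loader; in the article this would call createModule({...}).
let loads = 0;
const getQpdf = once(async () => {
  loads += 1;
  return { ready: true };
});
```

Every caller awaits getQpdf(), and concurrent callers share the same in-flight promise instead of racing to load the module twice.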

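Complete Processing Flow

  1. The user selects a PDF and enters the pages to remove (for example 1-3,5)
  2. The UI parses the input into PageRange tuples and calls remove() through Comlink
  3. The worker writes the file to the virtual filesystem as /input.pdf
  4. removeRanges inverts the "remove" ranges into "keep" ranges
  5. QPDF runs with --pages and writes /output.pdf
  6. The worker reads /output.pdf and returns the bytes to the main thread
  7. autoDownloadBlob triggers the browser download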

Key Technical Decisions

1. Why QPDF?

QPDF is a powerful command-line tool for PDF manipulation. By compiling it to WASM:

  • We get battle-tested PDF processing logic
  • It supports complex operations (merge, split, rotate, encrypt)
  • It handles edge cases and malformed PDFs well

2. Why Web Workers?

PDF processing can be CPU-intensive:

  • Parsing large PDFs
  • Rebuilding document structure
  • Writing output files

Running in a Web Worker:

  • Prevents UI freezing
  • Keeps the main thread free to render and respond to input during processing
  • Provides true parallelism on multi-core systems

3. Virtual File System

Emscripten provides an in-memory filesystem:

  • No actual disk access needed
  • Fast read/write operations
  • Automatic cleanup when worker terminates
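
Because the worker lives as long as the page does, files written to the in-memory filesystem stay there between calls until they are overwritten or the worker is terminated. Below is a sketch of explicit per-call cleanup. It relies only on writeFile, readFile and unlink from Emscripten's FS API; runWithTempFiles, MinimalFS and createFakeFS are hypothetical names, and the fake filesystem exists only so the helper can be exercised outside a worker.

```typescript
// The subset of Emscripten's FS API used here.
interface MinimalFS {
  writeFile(path: string, data: Uint8Array): void;
  readFile(path: string): Uint8Array;
  unlink(path: string): void;
}

// Write the input, run the job, read the output, then delete both files.
function runWithTempFiles(
  fs: MinimalFS,
  input: Uint8Array,
  job: (inPath: string, outPath: string) => void
): Uint8Array {
  const inPath = "/input.pdf";
  const outPath = "/output.pdf";
  fs.writeFile(inPath, input);
  try {
    job(inPath, outPath);
    return fs.readFile(outPath);
  } finally {
    // Best-effort cleanup: a failed job may not have produced the output file.
    for (const path of [inPath, outPath]) {
      try {
        fs.unlink(path);
      } catch {
        /* file did not exist */
      }
    }
  }
}

// In-memory stand-in for qpdf.FS, used only to exercise the helper.
function createFakeFS(): MinimalFS & { files: Map<string, Uint8Array> } {
  const files = new Map<string, Uint8Array>();
  return {
    files,
    writeFile: (p: string, d: Uint8Array) => {
      files.set(p, d);
    },
    readFile: (p: string) => {
      const d = files.get(p);
      if (!d) throw new Error(`No such file: ${p}`);
      return d;
    },
    unlink: (p: string) => {
      if (!files.delete(p)) throw new Error(`No such file: ${p}`);
    },
  };
}
```

In the worker, the job callback would be the place to call qpdf.callMain with the input and output paths.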

File Download Utility

After processing, we trigger the browser download:

// utils/pdf.ts
export function autoDownloadBlob(blob: Blob, filename: string) {
  const blobUrl = URL.createObjectURL(blob);
  const downloadLink = document.createElement("a");
  downloadLink.href = blobUrl;
  downloadLink.download = filename;
  downloadLink.style.display = "none";

  document.body.appendChild(downloadLink);
  downloadLink.click();
  document.body.removeChild(downloadLink);
  URL.revokeObjectURL(blobUrl);
}

Benefits of This Architecture

  1. Privacy First: Files never leave the browser
  2. Performance: Near-native speed with WASM
  3. Responsive UI: Web Workers prevent blocking
  4. Type Safety: TypeScript + Comlink = type-safe worker communication
  5. Maintainability: Clean separation of concerns

Try It Yourself

Want to remove pages from your PDF without uploading anything to a server? Try our free browser-based PDF tool:

Remove PDF Pages Online →

All processing happens locally in your browser - your files never leave your computer!


Conclusion

Building a browser-based PDF processing tool demonstrates the power of modern web technologies. By combining WebAssembly, Web Workers, and Comlink, we can perform complex PDF operations entirely client-side while maintaining a responsive user interface.

This approach is ideal for:

  • Privacy-sensitive documents
  • Offline-capable applications
  • Reducing server costs
  • Improving user experience with instant processing

The complete source code demonstrates production-ready patterns for WASM integration in React applications.
