monkeymore studio

Posted on Apr 8

Converting PDF Pages to Images: A Client-Side Rendering Approach

#webdev #javascript #tutorial #frontend

Introduction

Converting PDF pages to images is a common need - whether you need to extract visuals for presentations, create thumbnails for a gallery, or share specific pages on social media. In this article, we'll explore how to build a pure browser-side PDF to image converter that renders each page as a high-quality PNG image and packages them into a downloadable ZIP file.

Why Browser-Side Conversion?

Traditional PDF to image conversion typically requires:

Server Uploads: Sending your PDF to external servers
Processing Queues: Waiting for server-side rendering
Quality Limitations: Compressed or watermarked outputs
Privacy Concerns: Documents stored on third-party servers

Browser-side processing offers significant advantages:

Documents never leave your device
No upload or queue: conversion starts as soon as the file is selected
Lossless PNG encoding of each rendered page
Privacy by design, since the file is read and processed locally

Architecture Overview

Our implementation uses PDF.js for rendering and JSZip for packaging.

Note: Unlike other features in this application, the conversion loop (canvas rendering, PNG encoding, and ZIP generation) runs on the main thread rather than in an application Web Worker. Only PDF.js's parsing runs off-thread, in its own dedicated worker.

Core Technologies

1. PDF.js - Mozilla's PDF Library

PDF.js is Mozilla's open-source JavaScript PDF renderer, the same one that powers Firefox's built-in PDF viewer:

Canvas-based rendering: High-quality output
Web Worker support: Non-blocking PDF parsing
Text extraction: Can extract text alongside images
CMap support: Proper handling of CJK (Chinese, Japanese, Korean) characters

2. JSZip - In-Browser ZIP Creation

JSZip allows creating ZIP archives entirely in the browser:

No server required: Generate ZIPs client-side
Compression options: Configurable compression levels
Streaming support: Handle large files efficiently

Implementation

1. Entry Point - Page Component

// pdf2jpg/page.tsx
import { type Metadata } from "next";
import { getTranslations } from "next-intl/server";
import { seoConfig } from "../_components/seo-config";
import { Organize } from "@/app/[locale]/_components/qpdf/pdf2image";

export async function generateMetadata({
  params,
}: {
  params: Promise<{ locale: string }>;
}): Promise<Metadata> {
  const { locale } = await params;
  const seo =
    seoConfig[locale as keyof typeof seoConfig]?.pdf2jpg ||
    seoConfig["en-us"].pdf2jpg;

  return {
    title: seo.title,
    description: seo.description,
  };
}

export default async function Page() {
  return <Organize />;
}

2. Main Component - PDF to Image Converter

// _components/qpdf/pdf2image.tsx
"use client";

import { useState } from "react";
import { useTranslations } from "next-intl";
import { PdfPage } from "../pdfpage";
import { usePdfjs } from "@/hooks/usepdfjs";
import { autoDownloadBlob } from "@/utils/pdf";

export const Organize = () => {
  const [files, setFiles] = useState<File[]>([]);
  const { page2image, isLoading } = usePdfjs();
  const t = useTranslations("Pdf2Jpg");

  const mergeInMain = async () => {
    const file = files[0];
    if (!file) return; // nothing selected yet

    // Convert PDF pages to images (only the first selected file is processed)
    try {
      const outputFile = await page2image(file);
      if (outputFile) {
        autoDownloadBlob(
          new Blob([outputFile], { type: "application/zip" }),
          "images.zip",
        );
      }
    } catch (err) {
      console.error("PDF to image conversion failed", err);
    }
  };

  const onPdfFiles = (files: File[]) => {
    console.log("Files selected");
    files.forEach((e) => console.log(e.name));
    setFiles(files);
  };

  return (
    <PdfPage
      title={t("title")}
      onFiles={onPdfFiles}
      process={mergeInMain}
      processDisabled={isLoading}
    >
      <div className="text-sm text-gray-600">
        {t("description")}
      </div>
    </PdfPage>
  );
};
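
The component above always downloads the archive as images.zip. As a minimal sketch, the name could instead be derived from the selected PDF. Note that `zipNameFor` is a hypothetical helper introduced here for illustration; it is not part of the original code.

```typescript
// Hypothetical helper: derive the ZIP file name from the uploaded PDF's name.
function zipNameFor(pdfFileName: string): string {
  // Drop a trailing ".pdf" (case-insensitive); other extensions are kept as-is.
  const base = pdfFileName.replace(/\.pdf$/i, "").trim();
  // Fall back to the fixed name when nothing usable remains.
  return base.length > 0 ? `${base}-images.zip` : "images.zip";
}
```

It would be passed as the second argument of autoDownloadBlob, for example `zipNameFor(file.name)`.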

3. Core Conversion Logic - usePdfjs Hook

// hooks/usepdfjs.ts
import { useEffect, useRef, useState } from "react";
import JSZip from "jszip";
// Type-only import: the runtime library is loaded from /pdf/pdf.min.mjs below
import type * as pdfjs from "pdfjs-dist";

type PdfjsLibType = {
  getDocument: typeof pdfjs.getDocument;
  GlobalWorkerOptions: typeof pdfjs.GlobalWorkerOptions;
};

export const usePdfjs = () => {
  const pdfjsRef = useRef<PdfjsLibType | null>(null);
  const [loaded, setLoaded] = useState(false);
  const [isLoading, setIsLoading] = useState(false);

  // Dynamically load PDF.js
  useEffect(() => {
    if (typeof globalThis === "undefined") {
      (window as any).globalThis = window;
    }

    const script = document.createElement("script");
    script.src = "/pdf/pdf.min.mjs";
    script.type = "module";
    script.async = true;
    script.onload = () => {
      console.log("pdfjs-dist loaded");
      const typedPdfjs = (window as any).pdfjsLib as PdfjsLibType;
      typedPdfjs.GlobalWorkerOptions.workerSrc = "/pdf/pdf.worker.min.mjs";
      pdfjsRef.current = typedPdfjs;
      setLoaded(true);
    };
    document.head.appendChild(script);

    return () => {
      document.head.removeChild(script);
      pdfjsRef.current = null;
    };
  }, []);

  const page2image = async (file: File): Promise<ArrayBuffer | null> => {
    if (!pdfjsRef.current) {
      console.error("pdfjs not ready yet");
      return null;
    }

    setIsLoading(true);

    // Everything that can throw sits inside try, so isLoading is always reset
    try {
      const canvas = document.createElement("canvas");
      const arrayBuffer = await file.arrayBuffer();

      // Load PDF document with CMap support for Chinese characters
      const pdfDoc = await pdfjsRef.current.getDocument({
        data: new Uint8Array(arrayBuffer),
        cMapUrl: "https://cdn.jsdelivr.net/npm/pdfjs-dist@5.4.149/cmaps/",
        cMapPacked: true, // Essential for CJK (Chinese/Japanese/Korean) PDFs
      }).promise;

      const zip = new JSZip();

      // Render each page to canvas and save as PNG
      for (let i = 1; i <= pdfDoc.numPages; ++i) {
        const page = await pdfDoc.getPage(i);
        const viewport = page.getViewport({ scale: 1 });

        // Set canvas size to match PDF page
        canvas.width = viewport.width;
        canvas.height = viewport.height;

        // Render PDF page to canvas
        await page.render({
          canvasContext: canvas.getContext("2d")!,
          viewport: viewport,
        }).promise;

        // Convert canvas to PNG blob
        const pngBlob = await new Promise<Blob>((resolve, reject) => {
          canvas.toBlob((blob) => {
            if (!blob) {
              reject(new Error("Failed to create blob"));
            } else {
              resolve(blob);
            }
          }, "image/png"); // Lossless PNG format
        });

        // Add to ZIP archive
        zip.file(`page${i}.png`, pngBlob);
      }

      // Generate ZIP with compression
      const zipBuffer = await zip.generateAsync({
        type: "arraybuffer",
        compression: "DEFLATE",
        compressionOptions: { level: 6 },
      });

      return zipBuffer;
    } finally {
      setIsLoading(false);
    }
  };

  return { page2image, isLoading, loaded };
};

Key Implementation Details:

Dynamic Loading: PDF.js is loaded dynamically from /pdf/pdf.min.mjs
Worker Configuration: PDF parsing happens in a worker via GlobalWorkerOptions.workerSrc
CMap Support: Essential for rendering PDFs with Chinese, Japanese, or Korean text
Scale 1: One canvas pixel per PDF point (1/72 inch), which is 72 DPI; a US Letter page comes out at 612 x 792 pixels
PNG Format: Lossless compression for maximum quality
ZIP Packaging: All pages packaged with DEFLATE compression
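
One detail of the ZIP packaging worth considering: names like page2.png and page10.png sort out of order in many file managers. A small sketch of zero-padded entry names follows; `pageFileName` is a hypothetical helper, not something the original hook defines.

```typescript
// Hypothetical helper: zero-padded entry names so "page02.png" sorts before "page10.png".
function pageFileName(pageNumber: number, totalPages: number): string {
  // Pad to the number of digits in the highest page number.
  const digits = String(totalPages).length;
  return `page${String(pageNumber).padStart(digits, "0")}.png`;
}
```

Inside the render loop this would replace the template string, for example `zip.file(pageFileName(i, pdfDoc.numPages), pngBlob)`.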

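4. Canvas Rendering Process

For each page, the loop in page2image fetches the page with getPage(i), computes a viewport at scale 1, resizes the shared canvas to the viewport, awaits page.render(...).promise, encodes the canvas to a PNG blob with toBlob, and adds that blob to the ZIP as page{i}.png.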

5. ZIP Generation Helper

For extracting embedded images from PDFs (different from page rendering):

// lib/parsePdfImage.js
import JSZip from "jszip";

export async function zipImageBitmaps(data) {
  const zip = new JSZip();

  // Process each image
  for (let i = 0; i < data.length; i++) {
    const bitmap = data[i];

    // Convert ImageBitmap to PNG Blob
    const pngBlob = await imageBitmapToPngBlob(
      bitmap.data,
      bitmap.width,
      bitmap.height,
    );

    console.log("Image blob size", pngBlob.size);

    // Add to ZIP with original name
    zip.file(bitmap.name, pngBlob);
  }

  // Generate compressed ZIP
  const zipBuffer = await zip.generateAsync({
    type: "arraybuffer",
    compression: "DEFLATE",
    compressionOptions: { level: 6 },
  });

  return zipBuffer;
}

// Convert ImageBitmap to PNG using canvas
export async function imageBitmapToPngBlob(data, width, height) {
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;

  const ctx = canvas.getContext("2d");
  ctx.drawImage(data, 0, 0);

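  return new Promise((resolve, reject) => {
    canvas.toBlob((blob) => {
      // Reject with a real Error and never fall through to resolve(null)
      if (!blob) {
        reject(new Error("Failed to create PNG blob"));
      } else {
        resolve(blob);
      }
    }, "image/png");
  });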
}
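
Because zipImageBitmaps adds each image under its own name, two images that share a name would collide: JSZip's file() replaces an existing entry with the same name. The following is a sketch of one way to avoid that, under the assumption that appending a counter is acceptable; `uniqueEntryNames` is a hypothetical helper.

```typescript
// Hypothetical helper: make entry names unique so a later image does not
// replace an earlier one that has the same name in the ZIP.
function uniqueEntryNames(names: string[]): string[] {
  const used = new Set<string>();
  return names.map((entryName) => {
    // Split "img.png" into stem "img" and extension ".png".
    const dot = entryName.lastIndexOf(".");
    const stem = dot > 0 ? entryName.slice(0, dot) : entryName;
    const ext = dot > 0 ? entryName.slice(dot) : "";

    // Append -1, -2, ... until the name is free.
    let candidate = entryName;
    let counter = 1;
    while (used.has(candidate)) {
      candidate = `${stem}-${counter}${ext}`;
      counter += 1;
    }
    used.add(candidate);
    return candidate;
  });
}
```

The result can be computed once from `data.map((bitmap) => bitmap.name)` and indexed by `i` inside the loop.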

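Complete User Flow

The user selects a PDF, clicks the process button, page2image renders each page of the first selected file to PNG and zips the results, and the browser downloads images.zip.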

Technical Highlights

1. CMap Support for CJK Characters

const pdfDoc = await pdfjsRef.current.getDocument({
  data: new Uint8Array(arrayBuffer),
  cMapUrl: "https://cdn.jsdelivr.net/npm/pdfjs-dist@5.4.149/cmaps/",
  cMapPacked: true, // Essential for Chinese/Japanese/Korean PDFs
}).promise;

Why CMaps Matter:

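PDFs with CJK text often reference predefined character maps (CMaps) by name instead of embedding them
Without those CMaps, PDF.js cannot map character codes to glyphs, so the text may render garbled or not at all
Loading CMaps from a CDN works, but it adds a network request; the PDF itself is still never uploaded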
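
To keep the tool fully self-contained, the CMap files can be served from the same origin as the app. The sketch below builds the getDocument options with a self-hosted directory. The path /pdf/cmaps/ is an assumption (it presumes the cmaps folder shipped in the pdfjs-dist package has been copied there), and `documentParams` is a hypothetical helper.

```typescript
// Shape of the getDocument() options used in this article.
interface CMapDocumentParams {
  data: Uint8Array;
  cMapUrl: string;
  cMapPacked: boolean;
}

// Hypothetical helper: build the options with a self-hosted CMap directory.
// "/pdf/cmaps/" is an assumed location, not one the original project defines.
function documentParams(
  data: Uint8Array,
  cMapBase: string = "/pdf/cmaps/",
): CMapDocumentParams {
  // PDF.js appends the CMap file name to cMapUrl, so it must end with "/".
  const cMapUrl = cMapBase.endsWith("/") ? cMapBase : `${cMapBase}/`;
  return { data, cMapUrl, cMapPacked: true };
}
```

The returned object would be passed straight to getDocument, for example `getDocument(documentParams(new Uint8Array(arrayBuffer))).promise`.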

2. Canvas to Blob Conversion

const pngBlob = await new Promise<Blob>((resolve, reject) => {
  canvas.toBlob((blob) => {
    if (!blob) reject(new Error("Failed"));
    else resolve(blob);
  }, "image/png"); // Explicit PNG format
});

PNG vs JPG:

PNG: Lossless, larger files, perfect quality
JPG: Lossy, smaller files, quality degradation
This implementation uses PNG for maximum fidelity
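
If smaller files matter more than fidelity, the format could be made a user choice. The sketch below maps that choice to the type and quality arguments of canvas.toBlob(callback, type, quality). The names `PageImageFormat`, `ToBlobArgs`, and `blobEncodingFor`, and the 0.92 default quality, are illustrative assumptions.

```typescript
type PageImageFormat = "png" | "jpeg";

interface ToBlobArgs {
  type: "image/png" | "image/jpeg";
  quality?: number; // only meaningful for lossy formats
}

// Hypothetical helper: map a format choice to the (type, quality)
// arguments of canvas.toBlob(callback, type, quality).
function blobEncodingFor(
  format: PageImageFormat,
  quality: number = 0.92,
): ToBlobArgs {
  if (format === "png") {
    return { type: "image/png" }; // PNG is lossless; quality does not apply
  }
  // Clamp to the 0..1 range that toBlob accepts for JPEG.
  const clamped = Math.min(1, Math.max(0, quality));
  return { type: "image/jpeg", quality: clamped };
}
```

With `const enc = blobEncodingFor("jpeg", 0.8)`, the call becomes `canvas.toBlob(callback, enc.type, enc.quality)`, and the ZIP entry name should then end in .jpg rather than .png.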

3. ZIP Compression Configuration

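const zipBuffer = await zip.generateAsync({
  type: "arraybuffer",
  compression: "DEFLATE",
  compressionOptions: { level: 6 }, // 1 (fastest) to 9 (smallest)
});

Compression Levels:

1: Fastest, least compression
6: Balanced (good compression, reasonable speed)
9: Maximum compression (slowest, smallest files)

JSZip documents levels from 1 to 9 for DEFLATE. If the compression option is omitted, JSZip falls back to STORE, which adds entries without compressing them.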
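
PNG image data is already DEFLATE-compressed, so deflating it again inside the ZIP usually saves very little while costing CPU time. A sketch of choosing the generateAsync options from the entry names follows. `zipOptionsFor` and the extension list are assumptions made for illustration.

```typescript
interface ZipGenerateOptions {
  type: "arraybuffer";
  compression: "STORE" | "DEFLATE";
  compressionOptions?: { level: number };
}

// Extensions whose data is already compressed; deflating again gains little.
const PRECOMPRESSED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp"];

// Hypothetical helper: choose generateAsync() options from the entry names.
function zipOptionsFor(
  entryNames: string[],
  level: number = 6,
): ZipGenerateOptions {
  const allPrecompressed = entryNames.every((entryName) =>
    PRECOMPRESSED_EXTENSIONS.some((ext) =>
      entryName.toLowerCase().endsWith(ext),
    ),
  );
  if (allPrecompressed) {
    // Store entries as-is: fastest, and nearly the same size for PNG/JPEG.
    return { type: "arraybuffer", compression: "STORE" };
  }
  return {
    type: "arraybuffer",
    compression: "DEFLATE",
    compressionOptions: { level },
  };
}
```

For the page renderer, where every entry is a PNG, this returns the STORE variant, which can be passed directly to `zip.generateAsync(...)`.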

4. Memory Management

// Create a single canvas and reuse it for every page
const canvas = document.createElement("canvas");
const canvasContext = canvas.getContext("2d")!;

for (let i = 1; i <= pdfDoc.numPages; ++i) {
  const page = await pdfDoc.getPage(i);
  const viewport = page.getViewport({ scale: 1 });

  // Assigning width or height resizes the canvas and clears it
  canvas.width = viewport.width;
  canvas.height = viewport.height;

  // Render, then encode; canvasToBlob stands for the toBlob Promise wrapper
  await page.render({ canvasContext, viewport }).promise;
  const blob = await canvasToBlob(canvas);
}

Benefits:

One canvas element is allocated instead of one per page
Only one page-sized pixel buffer is live at a time
Encoded PNG blobs still accumulate until the ZIP is generated

Browser Compatibility

Requirements:

Canvas API - For rendering PDF pages
PDF.js - PDF parsing and rendering
JSZip - ZIP archive creation
ES6+ - Modern JavaScript features
File API - For reading PDF files

Supported in all modern browsers (Chrome, Firefox, Safari, Edge).

Performance Considerations

1. Resolution and Scale

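// Current: scale 1 (one pixel per PDF point, 72 DPI)
const viewport = page.getViewport({ scale: 1 });

// For higher resolution, pass a larger scale instead:
// const viewport = page.getViewport({ scale: 2 }); // 2x width and height

Trade-offs:

Higher scale = Sharper output, but the pixel count grows with the square of the scale (scale 2 means 4x the pixels), so files, memory use, and render time grow too
Lower scale = Smaller files, but text looks pixelated when zoomed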
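
Since a PDF point is 1/72 inch, a target DPI translates directly into a scale factor. The helpers below are a small sketch of that arithmetic; `scaleForDpi` and `pixelSize` are hypothetical names, and the page dimensions in the usage note are those of US Letter (612 x 792 points).

```typescript
const POINTS_PER_INCH = 72; // PDF default user space: 1 point = 1/72 inch

// Scale to pass to page.getViewport({ scale }) for a target DPI.
function scaleForDpi(dpi: number): number {
  return dpi / POINTS_PER_INCH;
}

// Canvas pixel size for a page whose size is given in points.
function pixelSize(
  widthPt: number,
  heightPt: number,
  dpi: number,
): { width: number; height: number } {
  // Multiply before dividing to avoid avoidable floating-point drift.
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}
```

For example, `scaleForDpi(144)` is 2, and `pixelSize(612, 792, 150)` gives 1275 x 1650 pixels for a US Letter page.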

2. Batch Processing

For very large PDFs, consider:

Streaming ZIP generation
Progress indicators
Page-by-page downloads
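
One way to bound memory for very large PDFs is to process pages in fixed-size batches and produce one ZIP per batch. The following sketch only computes the page ranges; `pageBatches` is a hypothetical helper and the batch size is an arbitrary choice.

```typescript
// Hypothetical helper: split pages 1..numPages into inclusive [first, last] ranges.
function pageBatches(
  numPages: number,
  batchSize: number,
): Array<[number, number]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: Array<[number, number]> = [];
  for (let first = 1; first <= numPages; first += batchSize) {
    // The last batch may be shorter than batchSize.
    batches.push([first, Math.min(first + batchSize - 1, numPages)]);
  }
  return batches;
}
```

Each range can then drive its own render loop and ZIP, so only one batch of PNG blobs is held at a time. The number of completed batches also gives a natural value for a progress indicator.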

3. Memory Usage

All rendered pages are held in memory before ZIP generation:

10-page PDF at 100KB per page = ~1MB memory
100-page PDF at 500KB per page = ~50MB memory
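
The figures above count only the encoded PNGs. The canvas itself also holds an uncompressed pixel buffer of 4 bytes (RGBA) per pixel, which dominates at high scale. A rough estimate, as a sketch with hypothetical helper names and without accounting for browser overhead:

```typescript
// Bytes held by a 2D canvas backing store: 4 bytes (RGBA) per pixel.
function canvasBytes(widthPx: number, heightPx: number): number {
  return widthPx * heightPx * 4;
}

// Rough peak before ZIP generation: one reused canvas plus every encoded
// PNG, all of which are kept until the archive is built.
function estimatePeakBytes(
  widthPx: number,
  heightPx: number,
  pageCount: number,
  avgPngBytes: number,
): number {
  return canvasBytes(widthPx, heightPx) + pageCount * avgPngBytes;
}
```

For a US Letter page at scale 2 (1224 x 1584 pixels), `canvasBytes(1224, 1584)` is 7,755,264 bytes, about 7.4 MiB. With 100 pages at 500,000 bytes each, `estimatePeakBytes` gives 57,755,264 bytes, and the generated ZIP buffer then adds roughly the size of the PNGs again.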

Conclusion

Building a browser-side PDF to image converter demonstrates the power of modern web APIs. By combining:

PDF.js for high-quality PDF rendering
Canvas API for image generation
JSZip for client-side packaging
CMap support for international text

We've created a tool that offers:

Complete privacy - Documents never leave your device
Maximum quality - Lossless PNG output
International support - CJK character rendering
Instant processing - No server delays
Convenient packaging - ZIP download with all pages

The ability to convert PDF pages to images entirely in the browser makes document sharing and editing more accessible than ever.

Need to convert your PDF pages to images? Try our free online tool at Free Online PDF Tools - convert each page to high-quality PNG images packaged in a ZIP file, all processed locally in your browser for complete privacy!
