Building a Browser-Based Ebook Reader: Parsing EPUB Files in JavaScript

#javascript #webdev #beginners #programming

EPUB files are zip archives containing XHTML, CSS, and images, so a browser is already equipped to render ebook content natively. You do not need Kindle software or a dedicated reader app. Extracting and rendering that content correctly, however, involves several parsing steps that are not immediately obvious.

I built a browser-based reader and learned more about the EPUB specification than I ever expected.

The EPUB structure

An EPUB file is a zip archive with a specific directory structure:

mimetype                          (must be first, uncompressed)
META-INF/
  container.xml                   (points to the content file)
OEBPS/ (or similar)
  content.opf                     (manifest and spine)
  toc.ncx or nav.xhtml            (table of contents)
  chapter1.xhtml
  chapter2.xhtml
  styles/
    book.css
  images/
    cover.jpg

The container.xml file tells you where the .opf file is. The .opf file contains the manifest (all files in the book) and the spine (the reading order of chapters).
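
One detail that follows from this layout: href values in the manifest are relative to the location of the .opf file, not to the root of the zip. Below is a minimal sketch of turning such a reference into a zip entry path. The name resolveZipPath is a hypothetical helper for illustration, not part of JSZip or any EPUB library.

```javascript
// Hypothetical helper: resolve an href found in one EPUB file (for example
// the .opf) into a path relative to the zip root.
function resolveZipPath(basePath, href) {
  // Drop any fragment or query; only the file part names a zip entry.
  const filePart = href.split('#')[0].split('?')[0];

  // Start from the directory that contains the referencing file.
  const segments = basePath.split('/').slice(0, -1);

  // hrefs are URL-encoded, while zip entry names are not.
  for (const part of decodeURIComponent(filePart).split('/')) {
    if (part === '' || part === '.') continue;
    if (part === '..') segments.pop();
    else segments.push(part);
  }
  return segments.join('/');
}

console.log(resolveZipPath('OEBPS/content.opf', 'chapter1.xhtml'));
// OEBPS/chapter1.xhtml
console.log(resolveZipPath('OEBPS/text/ch1.xhtml', '../images/cover.jpg'));
// OEBPS/images/cover.jpg
```

The same helper works for references inside chapters and stylesheets, since those are also relative to the file that contains them.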

Parsing in the browser

JavaScript can handle every step. First, unzip the file using a library like JSZip:

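import JSZip from 'jszip';

async function parseEPUB(file) {
  const zip = await JSZip.loadAsync(file);

  // Step 1: Find the OPF file
  const container = await zip.file('META-INF/container.xml').async('string');
  const parser = new DOMParser();
  const containerDoc = parser.parseFromString(container, 'text/xml');
  const opfPath = containerDoc.querySelector('rootfile').getAttribute('full-path');

  // Step 2: Parse the OPF
  const opf = await zip.file(opfPath).async('string');
  const opfDoc = parser.parseFromString(opf, 'text/xml');

  // Step 3: Map each manifest id to its href
  const hrefById = new Map();
  for (const item of opfDoc.querySelectorAll('manifest > item')) {
    hrefById.set(item.getAttribute('id'), item.getAttribute('href'));
  }

  // Step 4: Extract spine (reading order) and look up each file path.
  // Manifest hrefs are relative to the OPF's directory, not the zip root.
  // (Simplified: assumes hrefs contain no "../" segments or URL escapes.)
  const opfDir = opfPath.slice(0, opfPath.lastIndexOf('/') + 1);
  const chapterPaths = [...opfDoc.querySelectorAll('spine > itemref')]
    .map((ref) => hrefById.get(ref.getAttribute('idref')))
    .filter(Boolean)
    .map((href) => opfDir + href);

  return { zip, opfPath, chapterPaths };
}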

Each chapter is an XHTML file that can be parsed into a DOM and rendered in an iframe or a div. The CSS included in the EPUB styles the content, though you will likely want to override some styles for your reader's theme.

Rendering challenges

Image URLs: Images are referenced with relative paths inside the EPUB. When rendering in the browser, you need to convert these to blob URLs: read the image from the zip, create a Blob, and replace the src attribute with URL.createObjectURL(blob).
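
The image step can be sketched roughly as follows. This assumes zip is a loaded JSZip instance and chapterDoc is a chapter already parsed into a Document; the function names, the MIME table, and the placeholder host epub.invalid (used only so the URL API can resolve relative paths) are all illustrative choices, not part of any library.

```javascript
// Hypothetical lookup table: file extension to image MIME type.
const IMAGE_TYPES = {
  jpg: 'image/jpeg', jpeg: 'image/jpeg', png: 'image/png',
  gif: 'image/gif', svg: 'image/svg+xml', webp: 'image/webp',
};

function imageTypeFor(path) {
  const ext = path.slice(path.lastIndexOf('.') + 1).toLowerCase();
  return IMAGE_TYPES[ext] || 'application/octet-stream';
}

// Resolve an <img src> against the chapter's own path inside the zip.
// Returns null for absolute URLs (http:, data:, //host) that need no rewriting.
function zipPathForSrc(chapterPath, src) {
  if (/^(?:[a-z][a-z0-9+.-]*:|\/\/)/i.test(src)) return null;
  const resolved = new URL(src, 'https://epub.invalid/' + chapterPath);
  return decodeURIComponent(resolved.pathname.slice(1));
}

// Replace every relative image src in the chapter with a blob URL.
async function inlineImages(chapterDoc, chapterPath, zip) {
  const createdUrls = [];
  for (const img of chapterDoc.querySelectorAll('img[src]')) {
    const path = zipPathForSrc(chapterPath, img.getAttribute('src'));
    const entry = path && zip.file(path);
    if (!entry) continue; // external or missing image: leave src untouched

    const bytes = await entry.async('uint8array');
    const blob = new Blob([bytes], { type: imageTypeFor(path) });
    const url = URL.createObjectURL(blob);
    img.setAttribute('src', url);
    createdUrls.push(url);
  }
  // Pass each URL to URL.revokeObjectURL when the chapter is unloaded.
  return createdUrls;
}
```

Returning the created URLs is a design choice: blob URLs hold memory until they are revoked, so the caller needs a way to release them when moving to another chapter.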

CSS conflicts: The book's CSS can conflict with your reader's UI CSS. Rendering book content inside an iframe isolates styles naturally. If using a div, you need careful scoping.
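
For the iframe route, one possible approach is to add the reader's theme CSS to the chapter source and load the result as a blob. The sketch below assumes the chapter has a head element, that the theme CSS contains no characters that are special in XML (such as < or &), and that image and stylesheet references were already rewritten to blob URLs. injectThemeStyle and showChapter are hypothetical names.

```javascript
// Hypothetical helper: add reader theme CSS to a chapter's XHTML source.
function injectThemeStyle(chapterXhtml, themeCss) {
  const styleTag = `<style>${themeCss}</style>`;
  // Insert just before </head> so the theme comes after the book's own
  // stylesheets and wins when selectors have equal specificity.
  // A replacer function is used so "$" in the CSS is not treated specially.
  // If there is no </head>, the source is returned unchanged.
  return chapterXhtml.replace(/<\/head\s*>/i, (closeTag) => styleTag + closeTag);
}

// Browser-only part: load the themed chapter into an isolated iframe.
function showChapter(iframe, chapterXhtml, themeCss) {
  const themed = injectThemeStyle(chapterXhtml, themeCss);
  const blob = new Blob([themed], { type: 'application/xhtml+xml' });
  // Release the previous chapter's blob URL before replacing it.
  if (iframe.src.startsWith('blob:')) URL.revokeObjectURL(iframe.src);
  iframe.src = URL.createObjectURL(blob);
}
```

Serving the chapter as application/xhtml+xml keeps the browser in XML parsing mode, which matches how EPUB chapters are written (self-closing tags, for example).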

Pagination: The web naturally scrolls. Books are traditionally paginated. One way to implement page-like reading is CSS multi-column layout, with each column sized to fill the viewport:

.reader-content {
  column-width: 100vw; /* column-width takes a length, not a percentage */
  column-gap: 0;
  height: 100vh;
  overflow: hidden;
}

Navigation between "pages" then shifts the content horizontally by one column width at a time.
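
A minimal sketch of that navigation, assuming each column is exactly as wide as the container and the column gap is 0, so that one column equals one page. The function names are illustrative; el stands for the element styled as .reader-content.

```javascript
// Number of pages, derived from how far the columns overflow horizontally.
function pageCount(el) {
  if (el.clientWidth === 0) return 1; // element not laid out yet
  return Math.max(1, Math.round(el.scrollWidth / el.clientWidth));
}

// Jump to a page index (0-based), clamped to the valid range.
// Returns the page that was actually shown.
function goToPage(el, page) {
  const last = pageCount(el) - 1;
  const target = Math.min(Math.max(page, 0), last);
  // overflow: hidden hides the scrollbar but still allows scripted scrolling.
  el.scrollLeft = target * el.clientWidth;
  return target;
}
```

Setting scrollLeft is one option; applying a translateX transform to an inner wrapper is another common choice and makes animated page turns easier.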

Font embedding: Many EPUBs include custom fonts. Extract them from the zip, create blob URLs, and inject @font-face rules.
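
Book stylesheets that ship custom fonts typically already contain @font-face rules whose url() values are relative to the CSS file. One way to handle them, sketched here under that assumption, is to rewrite those url() references to blob URLs before injecting the stylesheet. rewriteCssUrls is a hypothetical helper, blobUrlByZipPath is assumed to be a Map you filled while extracting files from the zip, and epub.invalid is only a placeholder host for path resolution.

```javascript
// Replace relative url(...) references in a stylesheet with blob URLs.
// cssPath is the stylesheet's own path inside the zip.
function rewriteCssUrls(cssText, cssPath, blobUrlByZipPath) {
  const urlRef = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  return cssText.replace(urlRef, (match, quote, rawRef) => {
    const ref = rawRef.trim();
    // Leave absolute URLs, data: URLs, and fragment-only references alone.
    if (/^(?:[a-z][a-z0-9+.-]*:|\/\/|#)/i.test(ref)) return match;

    // Resolve relative to the CSS file, then map back to a zip entry path.
    const resolved = new URL(ref, 'https://epub.invalid/' + cssPath);
    const zipPath = decodeURIComponent(resolved.pathname.slice(1));

    const blobUrl = blobUrlByZipPath.get(zipPath);
    return blobUrl ? `url("${blobUrl}")` : match;
  });
}
```

Because it rewrites every url() reference, the same pass also covers background images declared in the book's CSS.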

The reader experience

A good ebook reader needs:

Adjustable font size and family
Light and dark themes
Bookmarking and progress tracking (localStorage)
Table of contents navigation
Search within the book
Keyboard navigation (arrow keys for pages)

All of these are achievable with browser APIs alone, without any server component.
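
As a rough illustration of two items from that list, here is a sketch of progress tracking and keyboard paging. The storage key prefix, the shape of the saved position, and the function names are assumptions made for the example. The storage argument is expected to behave like window.localStorage.

```javascript
const KEY_PREFIX = 'reader-progress:'; // hypothetical storage key prefix

// Save the reading position for one book, e.g. { chapter: 3, page: 12 }.
function saveProgress(storage, bookId, position) {
  storage.setItem(KEY_PREFIX + bookId, JSON.stringify(position));
}

// Load the saved position, or null if none exists or it cannot be read.
function loadProgress(storage, bookId) {
  try {
    const raw = storage.getItem(KEY_PREFIX + bookId);
    return raw ? JSON.parse(raw) : null;
  } catch {
    return null; // corrupt entry or storage unavailable
  }
}

// Map a KeyboardEvent.key value to a page step: 1, -1, or 0 (ignore).
function pageDeltaForKey(key) {
  if (key === 'ArrowRight' || key === 'PageDown') return 1;
  if (key === 'ArrowLeft' || key === 'PageUp') return -1;
  return 0;
}

// Browser wiring (sketch):
// document.addEventListener('keydown', (event) => {
//   const delta = pageDeltaForKey(event.key);
//   if (delta !== 0) { /* turn the page, then call saveProgress(...) */ }
// });
```

Passing the storage object in, rather than reading localStorage directly, keeps the functions easy to test and lets you swap in another store later.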

The tool

For reading EPUBs directly in your browser with no uploads, no accounts, and no software installation, I built an ebook reader that handles EPUB parsing, rendering, theming, and bookmarking entirely client-side. Drop in a file and read.

I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.