monkeymore studio

Posted on Apr 2

Building a Browser-Based MOBI Splitter: Client-Side Ebook Processing with TypeScript

#javascript #webdev #showdev #typescript

Splitting a collection of ebooks into individual files typically requires server-side processing. But what if you could do it entirely in the browser? This article explores how we built a client-side MOBI splitter that parses binary files, reconstructs HTML content, and generates valid MOBI files—all without uploading data to a server.

The Challenge

MOBI, the format used by Kindle devices, is a binary format with a layered structure:

PalmDB (PDB) container format with record-based storage
MOBI headers with metadata and flags
EXTH (Extended Header) records for book information
PalmDOC compression for text content
Resource references for images and fonts

Processing these in the browser means:

Binary parsing in JavaScript/TypeScript without native libraries
Compression/decompression of PalmDOC format
HTML reconstruction from fragmented records
File generation that produces valid MOBI output
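To give a feel for what binary parsing without native libraries looks like, here is a minimal sketch that reads the PalmDB container header with a DataView. It is not the parser used in the project. The names parsePdbHeader, PdbHeader, and PdbRecordInfo are hypothetical, and the field offsets (name at 0, type at 60, creator at 64, record count at 76, record table at 78) are assumed from the commonly documented PalmDB layout.

```typescript
// Sketch only: offsets follow the commonly documented PalmDB layout (an assumption).
interface PdbRecordInfo {
  offset: number;     // byte offset of the record from the start of the file
  attributes: number; // one byte of record flags
  uniqueId: number;   // three-byte record ID
}

interface PdbHeader {
  name: string;
  type: string;
  creator: string;
  records: PdbRecordInfo[];
}

// Read a NUL-padded ASCII field.
function readAscii(bytes: Uint8Array, start: number, length: number): string {
  let text = '';
  for (let i = start; i < start + length; i++) {
    if (bytes[i] === 0) break;
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

function parsePdbHeader(buffer: ArrayBuffer): PdbHeader {
  if (buffer.byteLength < 78) throw new Error('File too short for a PalmDB header');
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);

  // DataView reads big-endian by default, which matches the format.
  const recordCount = view.getUint16(76);
  if (buffer.byteLength < 78 + recordCount * 8) throw new Error('Truncated record table');

  const records: PdbRecordInfo[] = [];
  for (let i = 0; i < recordCount; i++) {
    const base = 78 + i * 8;
    records.push({
      offset: view.getUint32(base),
      attributes: view.getUint8(base + 4),
      uniqueId: view.getUint32(base + 4) & 0xffffff, // low three bytes of the word
    });
  }

  return {
    name: readAscii(bytes, 0, 32),
    type: readAscii(bytes, 60, 4),
    creator: readAscii(bytes, 64, 4),
    records,
  };
}
```

For a MOBI file the type and creator fields are expected to read BOOK and MOBI, which makes a cheap sanity check before handing the buffer to a full parser.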

Architecture Overview

Step 1: File Parsing with Web Workers

The first challenge is parsing the MOBI format without blocking the UI. We use a Web Worker with Comlink for elegant communication:

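// MobiSplitterClient.tsx
import * as Comlink from 'comlink';

// Wrap the worker so its methods can be called like async functions
const WorkerClass = Comlink.wrap<MobiSplitterWorkerType>(
  new Worker(new URL('./mobi-splitter.worker.ts', import.meta.url))
);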

In the worker, we use @lingo-reader/mobi-parser to extract:

Metadata: Title, author, language
Table of Contents (TOC): With filepos references to chapter locations
Spine: Ordered list of content records
Resources: Images and fonts stored in the file

Step 2: HTML Reconstruction

MOBI files store content as fragmented records. We need to reconstruct the full HTML:

// Build full HTML from spine records
const chapters: string[] = [];
for (const id of book.spine) {
  const html = await book.loadChapter(id);
  chapters.push(html);
}
const fullHtml = chapters.join('\n');

During reconstruction, we:

Convert filepos links: The original TOC uses filepos:XXXX hrefs, which we convert to <a id="fileposXXXX"> anchors for position tracking.
Normalize resource references: Handle various image reference formats:
- recindex="N" - Standard MOBI references
- src="blob:..." - Blob URLs from parser
- src="Images/imageXXXX.jpg" - Path-based references
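The link side of the filepos conversion can be sketched as a single regex pass. This is an illustration under assumptions, not the project's code: the function name rewriteFileposLinks is hypothetical, it only handles hrefs written exactly as href="filepos:NNNN", and it assumes matching <a id="fileposNNNN"> anchors are placed at the targets elsewhere.

```typescript
// Sketch: rewrite filepos hrefs to fragment links and collect the positions
// that need an anchor. Assumes hrefs look exactly like href="filepos:NNNN".
function rewriteFileposLinks(html: string): { html: string; targets: number[] } {
  const targets = new Set<number>();

  const rewritten = html.replace(/href="filepos:(\d+)"/g, (_match, digits: string) => {
    const position = parseInt(digits, 10); // drops any zero padding
    targets.add(position);
    return `href="#filepos${position}"`;
  });

  // Sorted, de-duplicated list of positions referenced by the links
  const sorted = Array.from(targets).sort((a, b) => a - b);
  return { html: rewritten, targets: sorted };
}
```

Returning the target positions alongside the HTML keeps the two halves of the job separate: one pass fixes the links, and a later pass can decide where each anchor belongs.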

Step 3: Detecting Book Boundaries

A "MOBI collection" contains multiple books. We detect boundaries from TOC entries:

// Filter out non-book entries (copyright, contents, intro, etc.)
const blackList = ['copyright', 'contents', 'intro', 'introduction', 'cover'];
// Match whole words only, so a title like "Discovery" is not dropped by "cover"
const blackListPattern = new RegExp(`\\b(${blackList.join('|')})\\b`, 'i');

const bookTocEntries = toc.filter((entry, index) => {
  // Keep entries that are not front matter
  return index > 0 && // Skip first entry (usually collection title)
         !blackListPattern.test(entry.label || '');
});
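Once the book-level TOC entries are known, each book's content is the HTML between its anchor and the next book's anchor. The sketch below shows one way to do that slicing. The names sliceBooks, BookEntry, and BookSlice are hypothetical, and it assumes every entry has already been resolved to an anchor id of the form fileposXXXX that appears in the HTML as <a id="...">.

```typescript
// Sketch: cut the reconstructed HTML into per-book slices at anchor positions.
interface BookEntry {
  label: string;
  anchorId: string; // e.g. "filepos1234" (assumed to exist as <a id="...">)
}

interface BookSlice {
  label: string;
  html: string;
}

function sliceBooks(fullHtml: string, entries: BookEntry[]): BookSlice[] {
  const located = entries
    .map((entry) => ({
      label: entry.label,
      // Include the closing quote so "filepos1" does not match "filepos10"
      start: fullHtml.indexOf(`<a id="${entry.anchorId}"`),
    }))
    .filter((entry) => entry.start >= 0) // skip entries whose anchor is missing
    .sort((a, b) => a.start - b.start);

  return located.map((entry, i) => {
    // A book ends where the next one starts, or at the end of the document
    const end = i + 1 < located.length ? located[i + 1].start : fullHtml.length;
    return { label: entry.label, html: fullHtml.slice(entry.start, end) };
  });
}
```

Sorting by position rather than trusting TOC order guards against collections whose TOC lists books out of sequence.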

Step 4: Content Extraction with Image Mapping

For each book, we extract its content and build a map of the images it references.

The tricky part is image index reassignment. Since we're creating standalone MOBI files, each book needs its own resource index space starting from 1:

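// Build image map for this book
const imageMap = new Map<number, Uint8Array>(); // new index -> image bytes
const indexMap = new Map<number, number>();     // old index -> new index
let newIndex = 1; // MOBI uses 1-based indexing for resources

// Pass 1: load each referenced image once (for...of so that await works)
for (const match of html.matchAll(/recindex="(\d+)"/g)) {
  const oldIndex = parseInt(match[1], 10);
  if (!indexMap.has(oldIndex)) {
    imageMap.set(newIndex, await book.loadResource(oldIndex));
    indexMap.set(oldIndex, newIndex++);
  }
}

// Pass 2: rewrite every reference in a single pass, so an index that was
// already renumbered can never be mistaken for an old one
html = html.replace(/recindex="(\d+)"/g, (_, old) =>
  `recindex="${indexMap.get(parseInt(old, 10))}"`
);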

Step 5: PalmDOC Compression

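MOBI files use PalmDOC compression, an LZ77 variant with a 2047-byte look-back window, match lengths of 3 to 10 bytes, and a one-byte encoding for a space followed by an ASCII character: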

function palmDocCompress(data: Uint8Array): Uint8Array {
  const result: number[] = [];
  let pos = 0;

  while (pos < data.length) {
    // Look for the longest repeated sequence (up to 10 bytes) within 2047 bytes back
    let bestLength = 0;
    let bestOffset = 0;

    const searchStart = Math.max(0, pos - 2047);
    for (let i = searchStart; i < pos; i++) {
      let length = 0;
      while (length < 10 &&
             pos + length < data.length &&
             data[i + length] === data[pos + length]) {
        length++;
      }
      if (length > bestLength) {
        bestLength = length;
        bestOffset = pos - i;
      }
    }

    const byte = data[pos];
    if (bestLength >= 3) {
      // Back-reference: bits "10", then 11-bit distance, then 3-bit (length - 3)
      const encoded = 0x8000 | (bestOffset << 3) | (bestLength - 3);
      result.push((encoded >> 8) & 0xFF, encoded & 0xFF);
      pos += bestLength;
    } else if (byte === 0x20 && pos + 1 < data.length &&
               data[pos + 1] >= 0x40 && data[pos + 1] <= 0x7F) {
      // Encode "space + ASCII character" as a single byte (0xC0-0xFF)
      result.push(data[pos + 1] ^ 0x80);
      pos += 2;
    } else if (byte === 0 || (byte >= 0x09 && byte <= 0x7F)) {
      // These bytes stand for themselves
      result.push(byte);
      pos++;
    } else {
      // 0x01-0x08 and 0x80-0xFF collide with control codes, so escape them
      // as a one-byte literal run: count (1) followed by the raw byte
      result.push(0x01, byte);
      pos++;
    }
  }

  return new Uint8Array(result);
}
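A compressor like this is easiest to trust when it can be round-tripped, so here is a minimal sketch of the matching decoder. It assumes the commonly documented PalmDOC byte classes: 0x00 and 0x09-0x7F are literals, 0x01-0x08 introduce a run of that many raw bytes, 0x80-0xBF start a two-byte back-reference (11-bit distance, then 3-bit length minus 3), and 0xC0-0xFF expand to a space plus the byte XOR 0x80. The function name palmDocDecompress is our own.

```typescript
// Sketch of a PalmDOC decoder, assuming the commonly documented byte classes.
function palmDocDecompress(data: Uint8Array): Uint8Array {
  const out: number[] = [];
  let i = 0;

  while (i < data.length) {
    const b = data[i++];

    if (b === 0 || (b >= 0x09 && b <= 0x7f)) {
      // Literal byte
      out.push(b);
    } else if (b <= 0x08) {
      // Run of b raw bytes copied through unchanged
      for (let n = 0; n < b && i < data.length; n++) out.push(data[i++]);
    } else if (b <= 0xbf) {
      // Back-reference: drop the top two bits of the 16-bit pair
      if (i >= data.length) throw new Error('Truncated back-reference');
      const pair = ((b << 8) | data[i++]) & 0x3fff;
      const distance = pair >> 3;
      const length = (pair & 0x07) + 3;
      const start = out.length - distance;
      if (distance === 0 || start < 0) throw new Error('Invalid back-reference distance');
      // Copy byte by byte so overlapping references expand correctly
      for (let n = 0; n < length; n++) out.push(out[start + n]);
    } else {
      // 0xC0-0xFF: a space followed by an ASCII character
      out.push(0x20, b ^ 0x80);
    }
  }

  return new Uint8Array(out);
}
```

Feeding the compressor's output back through a decoder like this and comparing it with the original bytes is a quick way to catch encoding mistakes, especially with non-ASCII text where every byte of a UTF-8 sequence has to be escaped.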

Step 6: MOBI File Generation

Finally, we generate valid MOBI files using the MobiWriterCalibre class.

The writer creates:

PalmDB Header: Database name, attributes, record count
MOBI Header: Format version, text encoding, flags
EXTH Records: Title, author, language, ASIN
Record Table: Offset pointers for each record
FLIS/FCIS Records: Required by Kindle for proper rendering
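As an illustration of one item in that list, here is a hedged sketch of serializing an EXTH block. It is not taken from MobiWriterCalibre. The function name buildExth is hypothetical, and the layout (an EXTH tag, a header length that excludes padding, a record count, then type/length/data records padded to a multiple of 4 bytes) and the record type numbers in the usage note are assumptions based on commonly cited community documentation of the format.

```typescript
// Sketch: serialize EXTH records. Layout and type numbers are assumptions
// taken from community documentation, not from the project's writer.
function buildExth(records: Array<[number, string]>): Uint8Array {
  const encoder = new TextEncoder();
  const encoded = records.map(([type, value]) => ({ type, data: encoder.encode(value) }));

  // Each record is 4 bytes type + 4 bytes length + payload
  const bodyLength = encoded.reduce((sum, r) => sum + 8 + r.data.length, 0);
  const headerLength = 12 + bodyLength;          // tag + length + count + records
  const padding = (4 - (headerLength % 4)) % 4;  // pad to a multiple of 4 bytes

  const out = new Uint8Array(headerLength + padding);
  const view = new DataView(out.buffer);
  out.set(encoder.encode('EXTH'), 0);
  view.setUint32(4, headerLength);   // big-endian by default
  view.setUint32(8, encoded.length);

  let offset = 12;
  for (const record of encoded) {
    view.setUint32(offset, record.type);
    view.setUint32(offset + 4, 8 + record.data.length); // length includes the 8-byte prefix
    out.set(record.data, offset + 8);
    offset += 8 + record.data.length;
  }
  return out;
}
```

A call such as buildExth([[100, author], [503, title]]) would use the type numbers commonly cited for author and updated title. Treat those values as something to verify against a known-good file rather than as a specification.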

class MobiWriterCalibre {
  // Records in file order; helper methods are omitted here for brevity
  private records: Uint8Array[] = [];

  constructor(
    title: string,
    author: string,
    html: string,
    images: Map<number, Uint8Array>,
    language: string = 'en'
  ) {
    // Build MOBI structure
    this.addTextRecords(html);        // Compressed text
    this.addImageRecords(images);     // Binary resources
    this.addFLISRecord();             // Kindle compatibility
    this.addFCISRecord();
    this.buildHeader();               // Built last, once the record count is known
  }
}
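The record table is worth a closer look, because every offset in it depends on how many records there are. The sketch below computes the offsets and serializes the table. The name buildRecordTable is hypothetical, and the constants (a 78-byte PalmDB header, 8 bytes per table entry, and a two-byte gap before the first record) are assumptions based on the commonly documented PalmDB layout.

```typescript
// Sketch: compute record offsets and serialize the PalmDB record table.
// Sizes below are assumptions from the commonly documented layout.
const PDB_HEADER_SIZE = 78;
const RECORD_ENTRY_SIZE = 8;
const GAP_SIZE = 2; // conventional two zero bytes between the table and record 0

function buildRecordTable(records: Uint8Array[]): { table: Uint8Array; offsets: number[] } {
  const table = new Uint8Array(records.length * RECORD_ENTRY_SIZE);
  const view = new DataView(table.buffer);
  const offsets: number[] = [];

  // The first record starts after the header, the whole table, and the gap
  let offset = PDB_HEADER_SIZE + table.length + GAP_SIZE;

  records.forEach((record, i) => {
    offsets.push(offset);
    view.setUint32(i * RECORD_ENTRY_SIZE, offset); // absolute start of the record
    // Attributes byte (0) followed by a 3-byte unique ID; valid while i < 2^24
    view.setUint32(i * RECORD_ENTRY_SIZE + 4, i);
    offset += record.length;
  });

  return { table, offsets };
}
```

Because the first offset depends on the table length, the table can only be written once every text, image, FLIS, and FCIS record has been produced.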

Performance Considerations

Processing large ebook collections (100MB+) in the browser requires optimization.

Key Takeaways

Binary processing in TypeScript is viable for complex formats like MOBI using ArrayBuffer and typed arrays
Web Workers + Comlink provide a clean abstraction for offloading heavy computation without callback hell
PalmDOC compression uses classic LZ77 with space+ASCII optimization—understanding legacy formats helps when working with ebook standards
Resource remapping is critical when splitting files—each standalone output needs its own index space
Browser capabilities have evolved to handle complex document processing, making client-side tools increasingly powerful

Conclusion

Building a browser-based MOBI splitter demonstrates that modern web technologies can handle complex binary file processing. By leveraging Web Workers for concurrency, implementing compression algorithms in TypeScript, and carefully managing binary data structures, we created a tool that processes ebooks entirely client-side—preserving privacy and eliminating server costs.

The complete implementation shows that even legacy formats like MOBI can be parsed and generated in JavaScript, opening possibilities for other client-side document processing tools.

Technologies Used:

Next.js 14 with App Router
TypeScript 5
Comlink for Web Worker communication
@lingo-reader/mobi-parser for parsing
JSZip for archive creation
PalmDOC compression algorithm (custom implementation)

Try It Yourself

Want to split your own MOBI collections? Visit Free online Mobi splitter to try our browser-based MOBI splitter. No installation required: just select your file and download individual books instantly. Your files never leave your browser, ensuring complete privacy.

Features:

100% client-side processing—no data upload to servers
Supports large collections (tested with files over 100MB)
Automatic book detection from table of contents
Download as individual MOBI files or ZIP archive
Works on all modern browsers
