DEV Community

monkeymore studio

Building a Browser-Based MOBI Splitter: Client-Side Ebook Processing with TypeScript

Splitting a collection of ebooks into individual files typically requires server-side processing. But what if you could do it entirely in the browser? This article explores how we built a client-side MOBI splitter that parses binary files, reconstructs HTML content, and generates valid MOBI files—all without uploading data to a server.

The Challenge

MOBI files, used by Kindle devices, are binary formats with a complex structure:

  • PalmDB (PDB) container format with record-based storage
  • MOBI headers with metadata and flags
  • EXTH (Extended Header) records for book information
  • PalmDOC compression for text content
  • Resource references for images and fonts
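Here is a minimal sketch of reading the PalmDB header and record table with a DataView, to show what the container layer looks like. It assumes the standard 78-byte PDB header: a 32-byte name, type and creator at offsets 60 and 64, and a big-endian record count at offset 76, followed by 8-byte record entries. `parsePdbHeader` and `getRecord` are hypothetical names, not code from the splitter.

```typescript
// Hypothetical helper: reads the PalmDB (PDB) header and record table.
// Assumed layout: name[32] @0, type[4] @60, creator[4] @64,
// numRecords u16 big-endian @76, then numRecords 8-byte entries @78.
interface PdbHeader {
  name: string;
  type: string;     // "BOOK" for MOBI files
  creator: string;  // "MOBI" for MOBI files
  recordOffsets: number[];
}

// Read up to `length` ASCII bytes, stopping at the first NUL.
function readAscii(bytes: Uint8Array, start: number, length: number): string {
  let s = "";
  for (let i = start; i < start + length && bytes[i] !== 0; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

function parsePdbHeader(buffer: ArrayBuffer): PdbHeader {
  const bytes = new Uint8Array(buffer);
  const view = new DataView(buffer);
  const numRecords = view.getUint16(76); // DataView reads big-endian by default
  const recordOffsets: number[] = [];
  for (let i = 0; i < numRecords; i++) {
    // Each entry: u32 offset, u8 attributes, u24 unique ID
    recordOffsets.push(view.getUint32(78 + i * 8));
  }
  return {
    name: readAscii(bytes, 0, 32),
    type: readAscii(bytes, 60, 4),
    creator: readAscii(bytes, 64, 4),
    recordOffsets,
  };
}

// Record i spans from its offset to the next record's offset (or end of file).
function getRecord(buffer: ArrayBuffer, header: PdbHeader, i: number): Uint8Array {
  const start = header.recordOffsets[i];
  const end = i + 1 < header.recordOffsets.length
    ? header.recordOffsets[i + 1]
    : buffer.byteLength;
  return new Uint8Array(buffer, start, end - start);
}
```

For a MOBI file, `type` and `creator` read `BOOK` and `MOBI`. Record 0 holds the PalmDOC, MOBI and EXTH headers, and the text and image records follow it.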

Processing these in the browser means:

  1. Binary parsing in JavaScript/TypeScript without native libraries
  2. Compression/decompression of PalmDOC format
  3. HTML reconstruction from fragmented records
  4. File generation that produces valid MOBI output

Architecture Overview

Step 1: File Parsing with Web Workers

The first challenge is parsing the MOBI format without blocking the UI. We use a Web Worker with Comlink for elegant communication:

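// MobiSplitterClient.tsx
import * as Comlink from 'comlink';
import type { MobiSplitterWorkerType } from './mobi-splitter.worker';

// Wrap the worker so the API it exposes can be awaited like ordinary async calls
const WorkerClass = Comlink.wrap<MobiSplitterWorkerType>(
  new Worker(new URL('./mobi-splitter.worker.ts', import.meta.url))
);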

In the worker, we use @lingo-reader/mobi-parser to extract:

  • Metadata: Title, author, language
  • Table of Contents (TOC): With filepos references to chapter locations
  • Spine: Ordered list of content records
  • Resources: Images and fonts stored in the file
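The parser library handles this step. As an illustration of what reading metadata involves at the byte level, here is a minimal sketch of an EXTH reader. It assumes the common record-0 layout: the PalmDOC header fills the first 16 bytes, the MOBI header length sits at offset 20, the EXTH flag 0x40 sits at offset 128, and the EXTH block follows the MOBI header directly. It also assumes UTF-8 text. `parseExth` and the constants are hypothetical names, not part of @lingo-reader/mobi-parser.

```typescript
// Hypothetical EXTH reader for record 0 of a MOBI file.
// Common EXTH record types (string-valued):
const EXTH_AUTHOR = 100;
const EXTH_UPDATED_TITLE = 503;
const EXTH_LANGUAGE = 524;

function parseExth(record0: Uint8Array): Map<number, string[]> {
  const view = new DataView(record0.buffer, record0.byteOffset, record0.byteLength);
  const result = new Map<number, string[]>();
  const mobiHeaderLength = view.getUint32(20);       // length of the "MOBI" header
  const hasExth = (view.getUint32(128) & 0x40) !== 0; // EXTH flags field
  if (!hasExth) return result;

  const decoder = new TextDecoder("utf-8"); // assumes text encoding 65001 (UTF-8)
  let pos = 16 + mobiHeaderLength;          // EXTH follows the MOBI header
  if (decoder.decode(record0.subarray(pos, pos + 4)) !== "EXTH") return result;
  const count = view.getUint32(pos + 8);    // skip magic and block length
  pos += 12;

  for (let i = 0; i < count; i++) {
    const type = view.getUint32(pos);
    const length = view.getUint32(pos + 4); // includes the 8-byte record header
    // Decoded as text: only meaningful for string-valued types such as 100/503/524
    const value = decoder.decode(record0.subarray(pos + 8, pos + length));
    const list = result.get(type) ?? [];
    list.push(value);
    result.set(type, list);
    pos += length;
  }
  return result;
}
```

Numeric EXTH types, such as cover offsets, would need `getUint32` on the payload instead of text decoding.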

Step 2: HTML Reconstruction

MOBI files store content as fragmented records. We need to reconstruct the full HTML:

// Build full HTML from spine records
const chapters: string[] = [];
for (const id of book.spine) {
  const html = await book.loadChapter(id);
  chapters.push(html);
}
const fullHtml = chapters.join('\n');

During reconstruction, we:

  1. Convert filepos links: Original TOC uses filepos:XXXX hrefs. We convert these to anchors <a id="fileposXXXX"> for position tracking.
  2. Normalize resource references: Handle various image reference formats:
    • recindex="N" - Standard MOBI references
    • src="blob:..." - Blob URLs from parser
    • src="Images/imageXXXX.jpg" - Path-based references
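The filepos conversion can be sketched as three small helpers, assuming that filepos values are byte offsets into the decompressed text and that links look like `<a filepos=0000012345>` (quotes optional). The function names are hypothetical, and the anchor ids here drop the zero padding.

```typescript
// Hypothetical helpers for the filepos -> anchor conversion.

// Collect every byte offset that some filepos link points at.
function collectFileposTargets(html: string): number[] {
  const re = /filepos=["']?(\d+)/g;
  const targets: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) targets.push(parseInt(m[1], 10));
  return targets;
}

// Insert <a id="fileposN"></a> at each target offset in the raw bytes.
function insertFileposAnchors(raw: Uint8Array, positions: number[]): Uint8Array {
  const encoder = new TextEncoder();
  const sorted = Array.from(new Set(positions))
    .filter((p) => p >= 0 && p <= raw.length)
    .sort((a, b) => a - b);
  const parts: Uint8Array[] = [];
  let last = 0;
  for (const p of sorted) {
    parts.push(raw.subarray(last, p), encoder.encode(`<a id="filepos${p}"></a>`));
    last = p;
  }
  parts.push(raw.subarray(last));
  const out = new Uint8Array(parts.reduce((n, part) => n + part.length, 0));
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// Turn filepos attributes into ordinary fragment links to those anchors.
function rewriteFileposLinks(html: string): string {
  return html.replace(/filepos=["']?(\d+)["']?/g,
    (_m: string, n: string) => `href="#filepos${parseInt(n, 10)}"`);
}
```

The anchors are inserted at the byte level because a filepos counts bytes, not UTF-16 characters, so string indices drift as soon as the text contains non-ASCII characters. An offset that lands inside a tag would still need to be nudged to a safe position. This sketch does not handle that.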

Step 3: Detecting Book Boundaries

A "MOBI collection" contains multiple books. We detect boundaries from TOC entries:

// Filter out non-book entries (copyright, contents, intro, etc.)
const blackList = ['copyright', 'contents', 'intro', 'introduction', 'cover'];

const bookTocEntries = toc.filter((entry, index) => {
  const label = (entry.label || '').toLowerCase();
  // Drop front-matter entries; the remaining top-level entries are treated as book titles
  return index > 0 && // Skip the first entry (usually the collection title)
         !blackList.some(keyword => label.includes(keyword));
});
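Once the book entries are known, each book's extent follows from sorting the entries by position: a book ends where the next one begins. Here is a minimal sketch that assumes each entry's href has already been resolved to a numeric filepos. The `TocEntry` shape and `computeBookRanges` are hypothetical.

```typescript
// Hypothetical shape: a TOC entry whose link has been resolved to a byte offset.
interface TocEntry {
  label: string;
  filepos: number;
}

interface BookRange {
  title: string;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

function computeBookRanges(entries: TocEntry[], textLength: number): BookRange[] {
  // Sort a copy by position so the input order does not matter.
  const sorted = entries.slice().sort((a, b) => a.filepos - b.filepos);
  return sorted.map((entry, i) => ({
    title: entry.label.trim(),
    start: entry.filepos,
    // A book ends where the next begins; the last book runs to the end of the text.
    end: i + 1 < sorted.length ? sorted[i + 1].filepos : textLength,
  }));
}
```

Slicing the raw text bytes at `start`/`end` and decoding each slice gives that book's HTML. Tags that straddle a boundary would still need cleanup.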

Step 4: Content Extraction with Image Mapping

For each book, we extract its portion of the HTML and gather the images it references.

The tricky part is image index reassignment. Since we're creating standalone MOBI files, each book needs its own resource index space starting from 1:

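// Build image map for this book (runs inside an async function)
const oldToNew = new Map<number, number>();      // original recindex -> new recindex
const imageMap = new Map<number, Uint8Array>();  // new recindex -> image bytes
let newIndex = 1; // MOBI uses 1-based indexing for resources

// First pass: assign new indices in order of first appearance, loading each image once
for (const match of html.matchAll(/recindex="(\d+)"/g)) {
  const oldIndex = parseInt(match[1], 10); // also handles zero-padded values like "00012"
  if (!oldToNew.has(oldIndex)) {
    oldToNew.set(oldIndex, newIndex);
    imageMap.set(newIndex, await book.loadResource(oldIndex));
    newIndex++;
  }
}

// Second pass: rewrite every reference at once, so a freshly assigned index
// can never be mistaken for an original one that hasn't been processed yet
html = html.replace(/recindex="(\d+)"/g, (_match, old: string) =>
  `recindex="${oldToNew.get(parseInt(old, 10))}"`
);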

Step 5: PalmDOC Compression

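MOBI files use PalmDOC compression, a simple LZ77 variant applied to each text record (normally 4096 bytes of uncompressed text). Besides literals and back-references of 3 to 10 bytes within a 2047-byte window, it packs a space followed by an ASCII character into a single byte: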

function palmDocCompress(data: Uint8Array): Uint8Array {
  const result: number[] = [];
  let pos = 0;

  while (pos < data.length) {
    // Look for the longest match (up to 10 bytes) within the previous 2047 bytes
    let bestLength = 0;
    let bestOffset = 0;

    const searchStart = Math.max(0, pos - 2047);
    for (let i = searchStart; i < pos; i++) {
      let length = 0;
      while (length < 10 &&
             pos + length < data.length &&
             data[i + length] === data[pos + length]) {
        length++;
      }
      if (length > bestLength) {
        bestLength = length;
        bestOffset = pos - i;
      }
    }

    if (bestLength >= 3) {
      // Back-reference: bits "10" + 11-bit distance + 3-bit (length - 3)
      const encoded = 0x8000 | (bestOffset << 3) | (bestLength - 3);
      result.push((encoded >> 8) & 0xFF, encoded & 0xFF);
      pos += bestLength;
    } else if (data[pos] === 0x20 && pos + 1 < data.length &&
               data[pos + 1] >= 0x40 && data[pos + 1] <= 0x7F) {
      // Optimize "space + ASCII 0x40-0x7F" to a single byte in 0xC0-0xFF
      result.push(data[pos + 1] ^ 0x80);
      pos += 2;
    } else if (data[pos] === 0x00 || (data[pos] >= 0x09 && data[pos] <= 0x7F)) {
      // 0x00 and 0x09-0x7F are stored as-is
      result.push(data[pos]);
      pos++;
    } else {
      // 0x01-0x08 and 0x80-0xFF collide with control codes: escape with a count byte
      result.push(0x01, data[pos]);
      pos++;
    }
  }

  return new Uint8Array(result);
}
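The decoder is the clearest statement of the format. This sketch of a PalmDOC decompressor follows the standard byte-code table. `palmDocDecompress` is a hypothetical name, and the code assumes its input is a single text record with any trailing entries already stripped.

```typescript
// Hypothetical PalmDOC decoder. Byte codes:
//   0x00, 0x09-0x7F  literal byte
//   0x01-0x08        copy the next N bytes literally
//   0x80-0xBF        2-byte back-reference: 11-bit distance, 3-bit (length - 3)
//   0xC0-0xFF        space followed by (byte ^ 0x80)
function palmDocDecompress(data: Uint8Array): Uint8Array {
  const out: number[] = [];
  let i = 0;
  while (i < data.length) {
    const c = data[i++];
    if (c >= 0x01 && c <= 0x08) {
      for (let k = 0; k < c && i < data.length; k++) out.push(data[i++]);
    } else if (c < 0x80) {
      out.push(c);
    } else if (c >= 0xC0) {
      out.push(0x20, c ^ 0x80);
    } else {
      if (i >= data.length) break; // truncated back-reference
      const pair = (c << 8) | data[i++];
      const distance = (pair >> 3) & 0x07FF;
      const length = (pair & 0x07) + 3;
      if (distance === 0 || distance > out.length) {
        throw new Error("invalid back-reference");
      }
      // Copy byte by byte so overlapping references (distance < length) work.
      for (let k = 0; k < length; k++) out.push(out[out.length - distance]);
    }
  }
  return new Uint8Array(out);
}
```

Decompressing a freshly compressed record and comparing the result with the original bytes is a cheap sanity check for the compressor.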

Step 6: MOBI File Generation

Finally, we generate valid MOBI files using MobiWriterCalibre:

The writer creates:

  1. PalmDB Header: Database name, attributes, record count
  2. MOBI Header: Format version, text encoding, flags
  3. EXTH Records: Title, author, language, ASIN
  4. Record Table: Offset pointers for each record
  5. FLIS/FCIS Records: Required by Kindle for proper rendering
class MobiWriterCalibre {
  private records: Uint8Array[] = [];

  constructor(
    title: string,
    author: string,
    html: string,
    images: Map<number, Uint8Array>,
    language: string = 'en'
  ) {
    // Build MOBI structure (helper method bodies omitted for brevity)
    this.addTextRecords(html);        // PalmDOC-compressed text records
    this.addImageRecords(images);     // Binary resources, in new recindex order
    this.addFLISRecord();             // Kindle compatibility
    this.addFCISRecord();
    this.buildHeader();               // PalmDB, MOBI and EXTH headers (title, author, language)
  }
}
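Writing the record table is the mirror image of parsing it. Here is a sketch of serializing the PalmDB header and records. It is not the MobiWriterCalibre code. It assumes the standard 78-byte header with type `BOOK` and creator `MOBI`, sequential unique IDs, and Unix-epoch timestamps. `buildPalmDb` is a hypothetical name.

```typescript
// Hypothetical PalmDB serializer. Header layout: name[32] @0, attributes u16 @32,
// version u16 @34, created/modified/backup u32 @36/40/44, modNumber/appInfo/sortInfo
// u32 @48/52/56, type[4] @60, creator[4] @64, uniqueIdSeed u32 @68,
// nextRecordList u32 @72, numRecords u16 @76, then 8-byte entries and 2 padding bytes.
function buildPalmDb(name: string, records: Uint8Array[]): Uint8Array {
  const headerSize = 78 + records.length * 8 + 2;
  const total = records.reduce((n, r) => n + r.length, headerSize);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);

  // Database name: printable ASCII, at most 31 chars, NUL-padded to 32 bytes.
  const safeName = name.replace(/[^\x20-\x7E]/g, "_").slice(0, 31);
  for (let i = 0; i < safeName.length; i++) out[i] = safeName.charCodeAt(i);

  // Timestamps: seconds since the Unix epoch (an assumption for this sketch).
  const now = Math.floor(Date.now() / 1000);
  view.setUint32(36, now); // creation date
  view.setUint32(40, now); // modification date

  for (let i = 0; i < 4; i++) {
    out[60 + i] = "BOOK".charCodeAt(i);
    out[64 + i] = "MOBI".charCodeAt(i);
  }
  view.setUint32(68, records.length); // unique ID seed (assumed: next unused ID)
  view.setUint16(76, records.length);

  let offset = headerSize;
  records.forEach((record, i) => {
    const entry = 78 + i * 8;
    view.setUint32(entry, offset);               // record data offset
    view.setUint32(entry + 4, i & 0xFFFFFF);     // attributes byte 0 + 24-bit unique ID
    out.set(record, offset);
    offset += record.length;
  });
  return out;
}
```

In a real MOBI file, record 0 must itself contain the PalmDOC, MOBI and EXTH headers, which this sketch leaves to the caller.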

Performance Considerations

Processing large ebook collections (100MB+) in the browser requires optimization, since the worker holds the source file and its reconstructed HTML in memory at the same time.

Key Takeaways

  1. Binary processing in TypeScript is viable for complex formats like MOBI using ArrayBuffer and typed arrays

  2. Web Workers + Comlink provide a clean abstraction for offloading heavy computation without callback hell

  3. PalmDOC compression uses classic LZ77 with space+ASCII optimization—understanding legacy formats helps when working with ebook standards

  4. Resource remapping is critical when splitting files—each standalone output needs its own index space

  5. Browser capabilities have evolved to handle complex document processing, making client-side tools increasingly powerful

Conclusion

Building a browser-based MOBI splitter demonstrates that modern web technologies can handle complex binary file processing. By leveraging Web Workers for concurrency, implementing compression algorithms in TypeScript, and carefully managing binary data structures, we created a tool that processes ebooks entirely client-side—preserving privacy and eliminating server costs.

The complete implementation shows that even legacy formats like MOBI can be parsed and generated in JavaScript, opening possibilities for other client-side document processing tools.


Technologies Used:

  • Next.js 14 with App Router
  • TypeScript 5
  • Comlink for Web Worker communication
  • @lingo-reader/mobi-parser for parsing
  • JSZip for archive creation
  • PalmDOC compression algorithm (custom implementation)

Try It Yourself

Want to split your own MOBI collections? Visit Free online Mobi splitter to try our browser-based MOBI splitter. No installation is required: just open your file in the page and download the individual books. Your files never leave your browser, ensuring complete privacy.

Features:

  • 100% client-side processing—no data upload to servers
  • Supports large collections (tested up to 100MB+)
  • Automatic book detection from table of contents
  • Download as individual MOBI files or ZIP archive
  • Works on all modern browsers
