Splitting a collection of ebooks into individual files typically requires server-side processing. But what if you could do it entirely in the browser? This article explores how we built a client-side MOBI splitter that parses binary files, reconstructs HTML content, and generates valid MOBI files—all without uploading data to a server.
The Challenge
MOBI files, used by Kindle devices, are binary formats with a complex structure:
- PalmDB (PDB) container format with record-based storage
- MOBI headers with metadata and flags
- EXTH (Extended Header) records for book information
- PalmDOC compression for text content
- Resource references for images and fonts
Processing these in the browser means:
- Binary parsing in JavaScript/TypeScript without native libraries
- Compression/decompression of PalmDOC format
- HTML reconstruction from fragmented records
- File generation that produces valid MOBI output
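To make the binary parsing requirement concrete, here is a minimal sketch of reading the PalmDB container header with a DataView. The function and interface names (parsePdbHeader, PdbHeader) are hypothetical and not taken from the splitter's source. The field offsets follow the commonly documented PalmDB layout: a 78-byte header with the record count at byte 76, followed by 8-byte record entries.

```typescript
interface PdbHeader {
  name: string;
  type: string;
  creator: string;
  recordOffsets: number[];
}

// Hypothetical helper: reads the fixed PalmDB header and the record offset table.
function parsePdbHeader(buffer: ArrayBuffer): PdbHeader {
  if (buffer.byteLength < 78) {
    throw new Error('Too short to be a PalmDB file');
  }
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  const ascii = (start: number, length: number): string =>
    String.fromCharCode(...bytes.subarray(start, start + length));

  // Database name: 32 bytes, NUL-terminated
  const nameEnd = bytes.subarray(0, 32).indexOf(0);
  const name = ascii(0, nameEnd === -1 ? 32 : nameEnd);

  // DataView reads big-endian by default, which matches PalmDB
  const recordCount = view.getUint16(76);
  if (buffer.byteLength < 78 + recordCount * 8) {
    throw new Error('Record table is truncated');
  }

  // Each entry: 4-byte absolute offset, 1-byte attributes, 3-byte unique ID
  const recordOffsets: number[] = [];
  for (let i = 0; i < recordCount; i++) {
    recordOffsets.push(view.getUint32(78 + i * 8));
  }

  return { name, type: ascii(60, 4), creator: ascii(64, 4), recordOffsets };
}
```

For a MOBI file, type and creator are expected to read "BOOK" and "MOBI", which makes this a cheap sanity check before any heavier parsing.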
Architecture Overview
Step 1: File Parsing with Web Workers
The first challenge is parsing the MOBI format without blocking the UI. We use a Web Worker with Comlink for elegant communication:
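// MobiSplitterClient.tsx
import * as Comlink from 'comlink';

// Wrap the worker so its exposed API can be called like async functions
const WorkerClass = Comlink.wrap<MobiSplitterWorkerType>(
  new Worker(new URL('./mobi-splitter.worker.ts', import.meta.url))
);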
In the worker, we use @lingo-reader/mobi-parser to extract:
- Metadata: Title, author, language
- Table of Contents (TOC): With filepos references to chapter locations
- Spine: Ordered list of content records
- Resources: Images and fonts stored in the file
Step 2: HTML Reconstruction
MOBI files store content as fragmented records. We need to reconstruct the full HTML:
// Build full HTML from spine records
const chapters: string[] = [];
for (const id of book.spine) {
  const html = await book.loadChapter(id);
  chapters.push(html);
}
const fullHtml = chapters.join('\n');
During reconstruction, we:
- Convert filepos links: The original TOC uses filepos:XXXX hrefs. We convert these to anchors (<a id="fileposXXXX">) for position tracking.
- Normalize resource references: Handle various image reference formats:
  - recindex="N" - Standard MOBI references
  - src="blob:..." - Blob URLs from parser
  - src="Images/imageXXXX.jpg" - Path-based references
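The filepos conversion can be sketched as two small functions. This is an illustration under stated assumptions, not the splitter's actual code: the names insertFileposAnchors and rewriteFileposLinks are hypothetical, filepos values are assumed to be byte offsets into the UTF-8 text, and each offset is assumed to fall on a tag boundary rather than inside a multi-byte character.

```typescript
// Hypothetical helper: inserts <a id="fileposN"></a> at each byte offset.
// Working on the raw bytes keeps offsets valid even when the text contains
// multi-byte UTF-8 characters.
function insertFileposAnchors(raw: Uint8Array, offsets: number[]): string {
  const decoder = new TextDecoder('utf-8');
  const sorted = [...new Set(offsets)]
    .filter((offset) => offset >= 0 && offset <= raw.length)
    .sort((a, b) => a - b);

  let out = '';
  let previous = 0;
  for (const offset of sorted) {
    out += decoder.decode(raw.subarray(previous, offset));
    out += `<a id="filepos${offset}"></a>`;
    previous = offset;
  }
  return out + decoder.decode(raw.subarray(previous));
}

// Hypothetical helper: points filepos:XXXX hrefs at the anchors above.
// parseInt drops zero padding so "0000008" and "8" resolve to the same id.
function rewriteFileposLinks(html: string): string {
  return html.replace(
    /href="filepos:(\d+)"/g,
    (_match, digits: string) => `href="#filepos${parseInt(digits, 10)}"`
  );
}
```

Slicing between sorted offsets, instead of inserting into one growing string, means earlier insertions never shift the positions of later ones.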
Step 3: Detecting Book Boundaries
A "MOBI collection" contains multiple books. We detect boundaries from TOC entries:
// Filter out non-book entries (copyright, contents, intro, etc.)
const blackList = ['copyright', 'contents', 'intro', 'introduction', 'cover'];
const bookTocEntries = toc.filter((entry, index) => {
  const label = (entry.label || '').toLowerCase();
  // Keep entries that look like book titles (not chapters)
  return !blackList.some(keyword => label.includes(keyword)) &&
    index > 0; // Skip first entry (usually collection title)
});
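Once the book-level entries are known, each one can be turned into a start/end range over the reconstructed content. The sketch below assumes every entry carries a numeric position (for example, one derived from its filepos). The TocEntry shape and the toBookRanges name are hypothetical and may not match the parser's real types.

```typescript
// Hypothetical shapes for illustration
interface TocEntry {
  label: string;
  position: number; // offset of the book's first byte in the full content
}

interface BookRange {
  title: string;
  start: number; // inclusive
  end: number;   // exclusive
}

// Each book runs from its own position up to the next book's position;
// the last book runs to the end of the content.
function toBookRanges(entries: TocEntry[], totalLength: number): BookRange[] {
  const sorted = [...entries].sort((a, b) => a.position - b.position);
  return sorted.map((entry, i) => ({
    title: entry.label,
    start: entry.position,
    end: i + 1 < sorted.length ? sorted[i + 1].position : totalLength,
  }));
}
```

Sorting first guards against TOC entries that are listed out of reading order.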
Step 4: Content Extraction with Image Mapping
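For each book, we extract its portion of the reconstructed HTML together with the images it references.
The tricky part is image index reassignment. Since we're creating standalone MOBI files, each book needs its own resource index space, starting from 1 because recindex values are 1-based: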
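// Build image map for this book (new index -> image bytes).
// Runs inside an async function; `html` is a mutable string.
const imageMap = new Map<number, Uint8Array>();
const indexMap = new Map<number, number>(); // old index -> new index
let newIndex = 1; // MOBI uses 1-based indexing for resources
for (const match of html.matchAll(/recindex="(\d+)"/g)) {
  const oldIndex = parseInt(match[1], 10);
  if (indexMap.has(oldIndex)) continue; // Already remapped
  const imageData = await book.loadResource(oldIndex);
  imageMap.set(newIndex, imageData); // Store under the new index
  indexMap.set(oldIndex, newIndex);
  newIndex++;
}
// Rewrite every reference in one pass so a freshly assigned index
// is never mistaken for an old one and replaced a second time
html = html.replace(/recindex="(\d+)"/g, (_m, digits: string) =>
  `recindex="${indexMap.get(parseInt(digits, 10))}"`
);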
Step 5: PalmDOC Compression
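MOBI files use PalmDOC compression, a byte-oriented LZ77 variant. A back-reference takes two bytes holding an 11-bit distance and a 3-bit length (matches of 3 to 10 bytes), and a space followed by a printable ASCII character can be packed into a single byte: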
function palmDocCompress(data: Uint8Array): Uint8Array {
  const result: number[] = [];
  let pos = 0;
  while (pos < data.length) {
    // Look for repeated sequences within 2047 bytes back (11-bit distance)
    let bestLength = 0;
    let bestOffset = 0;
    const searchStart = Math.max(0, pos - 2047);
    for (let i = searchStart; i < pos; i++) {
      let length = 0;
      // Matches are capped at 10 bytes (the 3-bit field stores length - 3)
      while (length < 10 &&
             pos + length < data.length &&
             data[i + length] === data[pos + length]) {
        length++;
      }
      if (length > bestLength) {
        bestLength = length;
        bestOffset = pos - i;
      }
    }
    if (bestLength >= 3) {
      // Encode as back-reference: 0x8000 | (distance << 3) | (length - 3)
      const encoded = 0x8000 | (bestOffset << 3) | (bestLength - 3);
      result.push((encoded >> 8) & 0xFF, encoded & 0xFF);
      pos += bestLength;
    } else if (data[pos] === 0x20 && pos + 1 < data.length &&
               data[pos + 1] >= 0x40 && data[pos + 1] <= 0x7F) {
      // Optimize "space + ASCII" to a single byte (0xC0-0xFF)
      result.push(data[pos + 1] ^ 0x80);
      pos += 2;
    } else if (data[pos] === 0x00 || (data[pos] >= 0x09 && data[pos] <= 0x7F)) {
      // These bytes represent themselves
      result.push(data[pos]);
      pos++;
    } else {
      // 0x01-0x08 and 0x80-0xFF collide with control codes, so escape them
      // as a literal run of length 1: count byte, then the raw byte
      result.push(0x01, data[pos]);
      pos++;
    }
  }
  return new Uint8Array(result);
}
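A matching decoder is a useful way to check compressor output in tests. The sketch below follows the commonly documented PalmDOC token layout and is not taken from the splitter's source; the name palmDocDecompress is hypothetical. In that layout, bytes 0x01-0x08 introduce a literal run, 0x80-0xBF start a two-byte back-reference with the distance in the upper 11 bits and length minus 3 in the lower 3 bits, and 0xC0-0xFF expand to a space plus one character.

```typescript
// Hypothetical helper: decodes a PalmDOC-compressed record.
function palmDocDecompress(data: Uint8Array): Uint8Array {
  const out: number[] = [];
  let pos = 0;
  while (pos < data.length) {
    const c = data[pos++];
    if (c >= 0x01 && c <= 0x08) {
      // Literal run: the next c bytes are copied unchanged
      for (let i = 0; i < c && pos < data.length; i++) {
        out.push(data[pos++]);
      }
    } else if (c <= 0x7f) {
      // 0x00 and 0x09-0x7F represent themselves
      out.push(c);
    } else if (c >= 0xc0) {
      // Space followed by the character with the high bit cleared
      out.push(0x20, c ^ 0x80);
    } else {
      // 0x80-0xBF: two-byte back-reference
      if (pos >= data.length) break; // truncated input
      const pair = (c << 8) | data[pos++];
      const distance = (pair >> 3) & 0x07ff;
      const length = (pair & 0x07) + 3;
      if (distance === 0 || distance > out.length) break; // malformed input
      // Copy byte by byte so overlapping references work
      for (let i = 0; i < length; i++) {
        out.push(out[out.length - distance]);
      }
    }
  }
  return Uint8Array.from(out);
}
```

Round-tripping sample chapters through the compressor and this decoder and comparing the bytes will catch bit-layout mistakes before a Kindle does.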
Step 6: MOBI File Generation
Finally, we generate valid MOBI files using MobiWriterCalibre:
The writer creates:
- PalmDB Header: Database name, attributes, record count
- MOBI Header: Format version, text encoding, flags
- EXTH Records: Title, author, language, ASIN
- Record Table: Offset pointers for each record
- FLIS/FCIS Records: Required by Kindle for proper rendering
class MobiWriterCalibre {
  constructor(
    title: string,
    author: string,
    html: string,
    images: Map<number, Uint8Array>,
    language: string = 'en'
  ) {
    // Build MOBI structure
    this.records = [];
    this.addTextRecords(html);       // Compressed text
    this.addImageRecords(images);    // Binary resources
    this.addFLISRecord();            // Kindle compatibility
    this.addFCISRecord();
    this.buildHeader();
  }
}
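The PalmDB header and record table are the most mechanical parts of the writer. The following is a simplified sketch of how records could be packed into a PalmDB container. The buildPdb name is hypothetical and this is not the MobiWriterCalibre implementation: timestamps and the unique ID seed are left at zero, and it assumes the first record already contains the MOBI and EXTH headers.

```typescript
// Hypothetical helper: wraps pre-built records in a PalmDB container.
function buildPdb(name: string, records: Uint8Array[]): Uint8Array {
  const HEADER_SIZE = 78;
  const ENTRY_SIZE = 8;
  const GAP = 2; // conventional two padding bytes after the record table
  const dataStart = HEADER_SIZE + records.length * ENTRY_SIZE + GAP;
  const total = dataStart + records.reduce((sum, r) => sum + r.length, 0);

  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);

  // Database name: at most 31 ASCII bytes, NUL-padded to 32
  for (let i = 0; i < Math.min(name.length, 31); i++) {
    out[i] = name.charCodeAt(i) & 0x7f;
  }
  out.set([0x42, 0x4f, 0x4f, 0x4b], 60); // type "BOOK"
  out.set([0x4d, 0x4f, 0x42, 0x49], 64); // creator "MOBI"
  view.setUint16(76, records.length);    // record count, big-endian

  let offset = dataStart;
  records.forEach((record, i) => {
    const entry = HEADER_SIZE + i * ENTRY_SIZE;
    view.setUint32(entry, offset); // absolute offset of this record
    view.setUint32(entry + 4, i);  // attributes byte (0) + 24-bit unique ID
    out.set(record, offset);
    offset += record.length;
  });
  return out;
}
```

Because every offset in the table is absolute, the table size must be known before any record is written, which is why the sketch computes dataStart up front.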
Performance Considerations
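Processing large ebook collections (100MB+) in the browser requires optimization. Parsing and compression run in the Web Worker described in Step 1, so the UI stays responsive during long-running work.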
Key Takeaways
- Binary processing in TypeScript is viable for complex formats like MOBI using ArrayBuffer and typed arrays
- Web Workers + Comlink provide a clean abstraction for offloading heavy computation without callback hell
- PalmDOC compression uses classic LZ77 with a space+ASCII optimization; understanding legacy formats helps when working with ebook standards
- Resource remapping is critical when splitting files: each standalone output needs its own index space
- Browser capabilities have evolved to handle complex document processing, making client-side tools increasingly powerful
Conclusion
Building a browser-based MOBI splitter demonstrates that modern web technologies can handle complex binary file processing. By leveraging Web Workers for concurrency, implementing compression algorithms in TypeScript, and carefully managing binary data structures, we created a tool that processes ebooks entirely client-side—preserving privacy and eliminating server costs.
The complete implementation shows that even legacy formats like MOBI can be parsed and generated in JavaScript, opening possibilities for other client-side document processing tools.
Technologies Used:
- Next.js 14 with App Router
- TypeScript 5
- Comlink for Web Worker communication
- @lingo-reader/mobi-parser for parsing
- JSZip for archive creation
- PalmDOC compression algorithm (custom implementation)
Try It Yourself
Want to split your own MOBI collections? Visit Free online Mobi splitter to try our browser-based MOBI splitter. No installation required: just select your file and download individual books instantly. Your files never leave your browser, ensuring complete privacy.
Features:
- 100% client-side processing—no data upload to servers
- Supports large collections (tested up to 100MB+)
- Automatic book detection from table of contents
- Download as individual MOBI files or ZIP archive
- Works on all modern browsers




