Overview
harumi is a Pure Rust library that lets you dynamically add CJK text (Japanese, Chinese, Korean) to existing PDFs. Unlike bindings-based solutions, it has zero C dependencies and handles font subsetting automatically.
- crates.io: https://crates.io/crates/harumi
- GitHub: https://github.com/kent-tokyo/harumi
Why Another Rust PDF Crate?
The existing Rust PDF ecosystem leaves a gap:
| Crate | Limitation |
|---|---|
| lopdf | Low-level; no font subsetting or CMap generation |
| printpdf | Create-only; can't edit existing PDFs |
| pdfium-render | Requires linking against the C-based PDFium library |
harumi fills that gap: append-only editing of existing PDFs, Pure Rust, with automatic CJK font subsetting and ToUnicode CMap generation built in.
The Three Hard Problems of CJK in PDF
Getting Japanese (and CJK in general) right inside a PDF isn't just about "embedding a font." There are three distinct challenges:
1. Font Subsetting
A full Japanese font file can easily exceed 10 MB. For practical file sizes you must extract only the glyphs actually used and rebuild the font binary — this is subsetting. harumi does this automatically at save time.
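The glyph-selection half of subsetting can be sketched as follows. This is a minimal illustration, not harumi's code: `glyphs_to_keep` is a hypothetical helper, and the character-to-glyph map is passed in as a plain `HashMap` instead of being parsed from the font's `cmap` table. A real subsetter must also pull in components referenced by composite glyphs and then rebuild the font tables.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: given every string drawn with a font and that font's
/// character-to-glyph map (normally parsed from the font's `cmap` table),
/// return the sorted set of original glyph IDs a subset must keep.
fn glyphs_to_keep(texts: &[&str], cmap: &HashMap<char, u16>) -> BTreeSet<u16> {
    let mut keep = BTreeSet::new();
    // Glyph 0 (.notdef) must always be present in a valid font.
    keep.insert(0);
    for text in texts {
        for ch in text.chars() {
            // Characters the font cannot map fall back to .notdef (already kept).
            if let Some(&gid) = cmap.get(&ch) {
                keep.insert(gid);
            }
        }
    }
    keep
}
```

Everything not in the returned set can be dropped, which is where the size savings over a multi-megabyte CJK font come from.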
2. ToUnicode CMap Generation
PDFs separate rendering (Glyph IDs) from semantics (Unicode code points). Without a ToUnicode CMap, copy-paste and text search produce garbled output. harumi generates this mapping for every font it embeds.
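A ToUnicode CMap is a small PostScript-style text stream. The sketch below writes one using `bfchar` entries, assuming 2-byte codes (as with Identity-H encoding) and a hypothetical `to_unicode_cmap` helper; it is not harumi's implementation. Destination strings are UTF-16BE hex, so characters outside the Basic Multilingual Plane become surrogate pairs, and entries are emitted in chunks because a single `beginbfchar` section may hold at most 100 mappings.

```rust
/// Build a ToUnicode CMap body mapping 2-byte codes (glyph IDs / CIDs) to
/// Unicode. `mappings` pairs each code used in the content stream with the
/// character it represents.
fn to_unicode_cmap(mappings: &[(u16, char)]) -> String {
    let mut out = String::new();
    out.push_str("/CIDInit /ProcSet findresource begin\n");
    out.push_str("12 dict begin\n");
    out.push_str("begincmap\n");
    out.push_str("/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n");
    out.push_str("/CMapName /Adobe-Identity-UCS def\n");
    out.push_str("/CMapType 2 def\n");
    out.push_str("1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n");
    // A single bfchar section may hold at most 100 entries, so emit in chunks.
    for chunk in mappings.chunks(100) {
        out.push_str(&format!("{} beginbfchar\n", chunk.len()));
        for &(code, ch) in chunk {
            // Destination is UTF-16BE hex; non-BMP chars become surrogate pairs.
            let mut buf = [0u16; 2];
            let dst: String = ch
                .encode_utf16(&mut buf)
                .iter()
                .map(|unit| format!("{:04X}", unit))
                .collect();
            out.push_str(&format!("<{:04X}> <{}>\n", code, dst));
        }
        out.push_str("endbfchar\n");
    }
    out.push_str("endcmap\n");
    out.push_str("CMapName currentdict /CMap defineresource pop\n");
    out.push_str("end\nend\n");
    out
}
```

For example, glyph code `0x0001` drawn as 世 yields the line `<0001> <4E16>`, which is what lets a viewer recover the character on copy-paste or search.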
3. Glyph Advance Width Recalculation
After subsetting, Glyph IDs are reassigned. The advance widths stored in the PDF must be recalculated to match — otherwise text spacing breaks. harumi handles this as part of the save pipeline.
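Assuming the subsetter renumbers kept glyphs compactly and the font uses an Identity CIDToGIDMap (so each CID equals its new glyph ID), the remap and the CIDFont `/W` array could be built roughly like this. `remap_and_widths` is hypothetical; advances are taken in font units (as from the font's `hmtx` table) and scaled to the PDF's 1/1000-em glyph space.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: assign compact new glyph IDs (in original order) to
/// the kept glyphs, then build a `/W` array from each glyph's original advance.
fn remap_and_widths(
    kept: &BTreeSet<u16>,
    advances: &BTreeMap<u16, u16>, // original GID -> advance in font units
    units_per_em: u16,
) -> (BTreeMap<u16, u16>, String) {
    // Old GID -> new GID; iterating a BTreeSet keeps .notdef (0) at 0.
    let remap: BTreeMap<u16, u16> = kept
        .iter()
        .enumerate()
        .map(|(new, &old)| (old, new as u16))
        .collect();

    // New GIDs are consecutive from 0, so one run `0 [w0 w1 ...]` covers them all.
    let widths: Vec<String> = kept
        .iter()
        .map(|old| {
            let adv = *advances.get(old).unwrap_or(&0) as u32;
            let upem = units_per_em as u32;
            // Scale to 1000 units per em, rounding to the nearest integer.
            ((adv * 1000 + upem / 2) / upem).to_string()
        })
        .collect();
    let w_array = format!("[0 [{}]]", widths.join(" "));
    (remap, w_array)
}
```

If the widths were left keyed by the old IDs, each glyph would pick up a neighbor's advance, which is exactly the broken spacing described above.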
Lazy Subsetting Pipeline
harumi uses a lazy subsetting design to handle all three problems in one pass:
- `embed_font()`: store raw font bytes; no processing yet
- Collect all text draw calls across all pages
- Walk every page at `save()` time, gathering the complete set of used characters
- Subset the font to only those glyphs
- Reassign Glyph IDs
- Build the ToUnicode CMap
- Recalculate advance widths and write the final CIDFont object
This single-pass approach avoids redundant font processing and keeps the implementation straightforward.
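The deferral at the heart of this design can be modeled in a few lines. The types below (`LazyFont`, `Page`, `Doc`) are an illustrative model only, not harumi's internals: `embed_font` and `add_text` merely record data, and the full character set becomes known only once every page has been walked at save time.

```rust
use std::collections::BTreeSet;

/// Illustrative model of lazy subsetting (not harumi's actual types).
struct LazyFont {
    raw: Vec<u8>,         // original font file, kept unprocessed
    used: BTreeSet<char>, // filled in at save time
}

struct Page {
    draws: Vec<String>, // text passed to draw calls, not yet encoded
}

struct Doc {
    font: Option<LazyFont>,
    pages: Vec<Page>,
}

impl Doc {
    fn embed_font(&mut self, bytes: Vec<u8>) {
        // Step 1: just keep the bytes; no parsing or subsetting yet.
        self.font = Some(LazyFont { raw: bytes, used: BTreeSet::new() });
    }

    fn add_text(&mut self, page: usize, text: &str) {
        // Step 2: record the draw call; glyph IDs are not assigned yet.
        self.pages[page].draws.push(text.to_string());
    }

    /// Step 3: at save time, walk every page once and gather the complete
    /// character set. A real implementation would then subset the font,
    /// reassign glyph IDs, build the ToUnicode CMap, and write the widths.
    fn collect_used_chars(&mut self) -> Option<&BTreeSet<char>> {
        let font = self.font.as_mut()?;
        for page in &self.pages {
            for text in &page.draws {
                font.used.extend(text.chars());
            }
        }
        Some(&font.used)
    }
}
```

Because the set is complete before any font work starts, each font is subset once, no matter how many pages or draw calls use it.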
Feature Overview
```rust
use harumi::Document;

// Placeholders not defined in this snippet: `font` (presumably obtained via
// `embed_font()`), `x`, `y`, `width`, `height`, `color`, and `image_bytes`.
let mut doc = Document::open("input.pdf")?;
// Append text (including invisible text for search layers)
doc.page(0).add_text("Hello, 世界!", font, 12.0, x, y)?;
// Draw shapes and embed images
doc.page(0).draw_rect(x, y, width, height, color)?;
doc.page(0).embed_image(image_bytes, x, y, width, height)?;
// Page operations
doc.rotate_page(1, 90)?;
doc.delete_page(2)?;
doc.reorder_pages(&[2, 0, 1])?;
// Merge and split
let other = Document::open("other.pdf")?;
doc.merge(other)?;
let parts = doc.split_at(&[3])?;
// Extract text
let text = doc.extract_text(0)?;
// Metadata
doc.set_title("My Document")?;
doc.save("output.pdf")?;
```
Current Status & Roadmap
harumi is published on crates.io and the source is available on GitHub.
Planned improvements:
- Broader CJK font format support
- Form field editing
- Performance optimizations for large documents
Feedback, issues, and contributions are very welcome!