My first implementation was simple.
Open PDF → render all pages → dump into the DOM.
It worked fine up to about 100 pages. At 300 it started dragging. At 1,000 the app froze for 8 seconds on open.
The fix wasn't clever. It was just the right architecture from the start — and here's exactly what that looks like.
## The core problem: rendering what you can't see
Loading a 1,000-page PDF shouldn't mean processing 1,000 pages of content. At any given moment, a user sees maybe 2-3 pages.
The solution is virtual scrolling — only render what's visible, destroy what isn't.
The tricky part for PDFs specifically: you need each page's dimensions before rendering it, so the scroll container knows its total height. PDF pages aren't always the same size.
I solve this upfront with lopdf, reading just the MediaBox from each page:
```rust
use lopdf::Document;

/// Read each page's MediaBox to get (width, height) without rendering.
/// MediaBox is [x0, y0, x1, y1], so the dimensions are the differences —
/// indexing [2] and [3] directly only works when the box starts at the origin.
pub fn get_page_sizes(doc: &Document) -> Vec<(f32, f32)> {
    doc.page_iter()
        .map(|page_id| {
            doc.get_object(page_id)
                .and_then(|obj| obj.as_dict())
                .and_then(|dict| dict.get(b"MediaBox"))
                .and_then(|obj| obj.as_array())
                .ok()
                .and_then(|arr| {
                    let coord = |i: usize| arr.get(i).and_then(|o| o.as_float().ok());
                    Some((coord(2)? - coord(0)?, coord(3)? - coord(1)?))
                })
                .unwrap_or((595.0, 842.0)) // fall back to A4 in points
        })
        .collect()
}
```
This runs instantly — no rendering, just metadata. Now the scroll container knows exactly how tall it needs to be.
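To make the "knows how tall it needs to be" part concrete, here's a sketch of how those per-page sizes can drive the visible-range calculation. The `PAGE_GAP` constant, the prefix-sum layout, and the explicit `offsets` parameter are my assumptions — the article's two-argument `getVisiblePages(scrollTop, clientHeight)` presumably closes over this data internally.

```typescript
// Sketch: turn per-page heights into cumulative offsets, then find the
// visible page range with a binary search over those offsets.
const PAGE_GAP = 16; // px between pages (illustrative)

function buildOffsets(heights: number[]): number[] {
  const offsets: number[] = [0];
  for (const h of heights) {
    offsets.push(offsets[offsets.length - 1] + h + PAGE_GAP);
  }
  return offsets; // offsets[i] = top of page i; last entry = total scroll height
}

// Index of the first page whose bottom edge is below scrollTop.
function firstVisible(offsets: number[], scrollTop: number): number {
  let lo = 0, hi = offsets.length - 2;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (offsets[mid + 1] <= scrollTop) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function getVisiblePages(
  offsets: number[], scrollTop: number, clientHeight: number
): { first: number; last: number } {
  const first = firstVisible(offsets, scrollTop);
  let last = first;
  while (last + 1 < offsets.length - 1 && offsets[last + 1] < scrollTop + clientHeight) {
    last++;
  }
  return { first, last };
}
```

Binary search matters here: with 1,000 pages this runs on every scroll event, so a linear scan over offsets would be wasted work.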
## Ghost Batch: eliminating process spawn overhead
Virtual scrolling alone still caused jank. Every scroll event fired a new render request, and each one had process spawn overhead.
Ghost Batch fixes this by queuing render requests and processing them together:
Without Ghost Batch:

```
scroll → spawn process → render page A → return
scroll → spawn process → render page B → return
```

With Ghost Batch:

```
scroll → queue page A
scroll → queue page B
queue threshold hit → spawn once → render A + B together → return
```
This alone cut process creation overhead by ~90% in practice.
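A minimal sketch of the queue-and-flush pattern described above. The threshold and delay values, and the `renderBatch` callback, are assumptions for illustration — not the actual Hiyoko PDF internals:

```typescript
// Collect page requests in a Set and flush them with one backend call.
const BATCH_THRESHOLD = 4;  // flush once this many pages are queued
const BATCH_DELAY_MS = 30;  // ...or after a short quiet period

type RenderBatch = (pages: number[]) => void;

function makeGhostBatch(renderBatch: RenderBatch) {
  const queue = new Set<number>();
  let timer: ReturnType<typeof setTimeout> | null = null;

  const flush = () => {
    if (timer) { clearTimeout(timer); timer = null; }
    if (queue.size === 0) return;
    const pages = [...queue].sort((a, b) => a - b);
    queue.clear();
    renderBatch(pages); // one process spawn for the whole batch
  };

  return {
    request(page: number) {
      queue.add(page);
      if (queue.size >= BATCH_THRESHOLD) flush();
      else if (!timer) timer = setTimeout(flush, BATCH_DELAY_MS);
    },
    flush,
  };
}
```

Using a `Set` also deduplicates for free: rapid scrolling that requests the same page twice still renders it once.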
## Intelligent Prefetch: rendering ahead of the user
Detect the scroll direction, then pre-render the next two pages in that direction in the background:
```typescript
const handleScroll = (e: React.UIEvent) => {
  const { scrollTop, clientHeight } = e.currentTarget;
  const direction = scrollTop > prevScrollTop.current ? 'down' : 'up';

  const visiblePages = getVisiblePages(scrollTop, clientHeight);
  const prefetchPages = direction === 'down'
    ? [visiblePages.last + 1, visiblePages.last + 2]
    : [visiblePages.first - 1, visiblePages.first - 2];

  prefetchPages
    .filter(p => p >= 0 && p < totalPages)
    .forEach(p => prefetchPage(p));

  prevScrollTop.current = scrollTop;
};
```
By the time the user scrolls to the next page, it's already rendered.
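For the `prefetchPage` helper the handler calls, one workable shape is a small promise cache keyed by page number, so repeated scroll events never re-request a page already in flight. The cache size, the `requestRender` callback, and the Map-as-LRU trick are my assumptions here, not the app's actual implementation:

```typescript
// Cache rendered (or in-flight) pages; evict least-recently-used on overflow.
const CACHE_LIMIT = 20;

function makePrefetcher(requestRender: (page: number) => Promise<Uint8Array>) {
  const cache = new Map<number, Promise<Uint8Array>>();

  return function prefetchPage(page: number): Promise<Uint8Array> {
    const hit = cache.get(page);
    if (hit) {
      // Re-insert to refresh recency: Map iteration order doubles as an LRU list.
      cache.delete(page);
      cache.set(page, hit);
      return hit;
    }
    const pending = requestRender(page);
    cache.set(page, pending);
    if (cache.size > CACHE_LIMIT) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = cache.keys().next().value as number;
      cache.delete(oldest);
    }
    return pending;
  };
}
```

Caching the promise rather than the result is the key detail: a second request for a page that's still rendering just awaits the same promise instead of spawning duplicate work.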
## Results
| Metric | Before | After |
|---|---|---|
| 1,000-page open time | ~8s freeze | Near-instant |
| Memory usage | All pages | ~3x visible pages |
| Scroll jank | Noticeable | Gone |
## Current state (dev build)
1,000+ pages, no freeze. The architecture holds.
## Next devlog
Magic Pipeline — chaining OCR → compress → save into a single click. The workflow automation engine behind it.
Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok