DEV Community

hiyoyo


1,000-Page PDF. No Freeze. Here's the Rendering Architecture That Made It Possible. [Devlog #4]

My first implementation was simple.

Open PDF → render all pages → dump into the DOM.

It worked fine up to about 100 pages. At 300 it started dragging. At 1,000 the app froze for 8 seconds on open.

The fix wasn't clever. It was just the architecture I should have started with, and here's exactly what it looks like.


The core problem: rendering what you can't see

Loading a 1,000-page PDF shouldn't mean processing 1,000 pages of content. At any given moment, a user sees maybe 2-3 pages.

The solution is virtual scrolling — only render what's visible, destroy what isn't.
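To make "render what's visible, destroy what isn't" concrete, here is a minimal sketch of the bookkeeping step, assuming 0-indexed pages and a visible range that is already known. `diffMountedPages` and `PageRange` are hypothetical names for illustration, not code from the app.

```typescript
// Hypothetical names; pages are 0-indexed and the range is inclusive.
interface PageRange {
  first: number;
  last: number;
}

// Decide which pages to render (newly visible) and which to destroy (scrolled out).
function diffMountedPages(
  mounted: ReadonlySet<number>,
  visible: PageRange,
): { toMount: number[]; toUnmount: number[] } {
  const toMount: number[] = [];
  for (let p = visible.first; p <= visible.last; p++) {
    if (!mounted.has(p)) toMount.push(p);
  }
  const toUnmount: number[] = [];
  mounted.forEach((p) => {
    if (p < visible.first || p > visible.last) toUnmount.push(p);
  });
  return { toMount, toUnmount };
}

// Pages 4-6 are mounted; after a scroll, pages 5-7 are visible.
console.log(diffMountedPages(new Set([4, 5, 6]), { first: 5, last: 7 }));
// { toMount: [ 7 ], toUnmount: [ 4 ] }
```

Only the difference between the old and new window touches the DOM, so a small scroll mounts one page and destroys one page instead of re-rendering everything.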

The tricky part for PDFs specifically: you need every page's dimensions before rendering any of them, so the scroll container knows its total height and where each page sits. PDF pages aren't always the same size.

I solve this upfront with lopdf, reading just the MediaBox from each page:

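use lopdf::Document;

/// Returns (width, height) in PDF points for every page, read from its MediaBox.
/// Falls back to A4 (595 x 842 pt) when the box is missing or malformed.
/// Note: a MediaBox inherited from a parent /Pages node is not looked up here,
/// so such pages also get the A4 fallback.
pub fn get_page_sizes(doc: &Document) -> Vec<(f64, f64)> {
    const A4: (f64, f64) = (595.0, 842.0);
    doc.page_iter()
        .map(|page_id| {
            doc.get_object(page_id)
                .and_then(|page| page.as_dict())
                .and_then(|d| d.get(b"MediaBox"))
                .and_then(|o| o.as_array())
                .ok()
                .and_then(|arr| {
                    // MediaBox is [llx lly urx ury]; the size is the box extent,
                    // not just the upper-right corner.
                    let n: Vec<f64> = arr
                        .iter()
                        .map(|o| o.as_float().ok().map(f64::from))
                        .collect::<Option<_>>()?;
                    (n.len() == 4).then(|| ((n[2] - n[0]).abs(), (n[3] - n[1]).abs()))
                })
                .unwrap_or(A4)
        })
        .collect()
}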

This is fast because it only reads metadata and never renders anything. Now the scroll container knows exactly how tall it needs to be.
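With the sizes in hand, the layout is a running sum of page heights. The sketch below assumes pages are stacked vertically, scaled to one display width, with a fixed gap between them; `buildLayout`, `visibleRange` and `GAP_PX` are hypothetical, and page rotation is ignored. Because page heights vary, the visible range is found by binary search over the offsets rather than by dividing `scrollTop` by a fixed page height.

```typescript
// [width, height] in PDF points, e.g. as returned by get_page_sizes.
type PageSize = [number, number];

// Assumed spacing between pages, in CSS pixels.
const GAP_PX = 8;

interface Layout {
  offsets: number[]; // top edge of each page, in px
  heights: number[]; // rendered height of each page, in px
  totalHeight: number; // height to give the scroll container's inner spacer
}

function buildLayout(sizes: PageSize[], displayWidth: number): Layout {
  const offsets: number[] = [];
  const heights: number[] = [];
  let y = 0;
  for (const [w, h] of sizes) {
    const height = (h * displayWidth) / w; // preserve each page's aspect ratio
    offsets.push(y);
    heights.push(height);
    y += height + GAP_PX;
  }
  // No gap after the last page.
  return { offsets, heights, totalHeight: sizes.length > 0 ? y - GAP_PX : 0 };
}

// First page whose bottom edge is below y (assumes >= 1 page; clamped to a valid index).
function firstEndingAfter(layout: Layout, y: number): number {
  let lo = 0;
  let hi = layout.offsets.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (layout.offsets[mid] + layout.heights[mid] <= y) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Last page whose top edge is above y (assumes >= 1 page; clamped to a valid index).
function lastStartingBefore(layout: Layout, y: number): number {
  let lo = 0;
  let hi = layout.offsets.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (layout.offsets[mid] < y) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Pages intersecting the viewport [scrollTop, scrollTop + clientHeight).
function visibleRange(layout: Layout, scrollTop: number, clientHeight: number) {
  return {
    first: firstEndingAfter(layout, scrollTop),
    last: lastStartingBefore(layout, scrollTop + clientHeight),
  };
}
```

Each lookup costs about log2(1000) ≈ 10 comparisons for a 1,000-page document, so running it on every scroll event is cheap.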


Ghost Batch: eliminating process spawn overhead

Virtual scrolling alone still caused jank. Every scroll event fired a new render request, and each one paid the overhead of spawning a new process.

Ghost Batch fixes this by queuing render requests and processing them together:

Without Ghost Batch:
scroll → spawn process → render page A → return
scroll → spawn process → render page B → return

With Ghost Batch:
scroll → queue page A
scroll → queue page B
queue threshold hit → spawn once → render A + B together → return

This alone cut process creation overhead by ~90% in practice.
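The queue-and-flush flow in the diagram can be sketched like this. `GhostBatchQueue` and the `renderBatch` callback (which would spawn one render process for a list of pages) are hypothetical; the post only describes the threshold behavior, so the deduplication and the manual `flush()` are assumptions of this sketch.

```typescript
// Hypothetical callback: spawns ONE render process for all listed pages.
type RenderBatch = (pages: number[]) => void;

class GhostBatchQueue {
  private pending = new Set<number>(); // a Set drops duplicate requests for the same page
  private readonly renderBatch: RenderBatch;
  private readonly threshold: number;

  constructor(renderBatch: RenderBatch, threshold: number) {
    this.renderBatch = renderBatch;
    this.threshold = threshold; // flush as soon as this many pages are queued
  }

  enqueue(page: number): void {
    this.pending.add(page);
    if (this.pending.size >= this.threshold) this.flush();
  }

  // Assumption: also call this from a short timer or on scroll end, so a
  // partly filled queue is not left waiting for more requests.
  flush(): void {
    if (this.pending.size === 0) return;
    const pages = Array.from(this.pending).sort((a, b) => a - b);
    this.pending.clear();
    this.renderBatch(pages); // one spawn covers the whole batch
  }
}
```

With a threshold of 3, requests for pages 5, 3, 5, 4, 9 produce one spawn for [3, 4, 5] and leave page 9 queued until the next flush.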


Intelligent Prefetch: rendering ahead of the user

Detect scroll direction, pre-render the next 2 pages in the background:

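// prevScrollTop (a ref), getVisiblePages, totalPages and prefetchPage are assumed
// to live in the enclosing component.
const handleScroll = (e: React.UIEvent<HTMLDivElement>) => {
  const { scrollTop, clientHeight } = e.currentTarget;
  // Compare with the previous position to infer scroll direction.
  const direction = scrollTop > prevScrollTop.current ? 'down' : 'up';
  const visiblePages = getVisiblePages(scrollTop, clientHeight);

  // The two pages just beyond the visible range, in the direction of travel.
  const prefetchPages = direction === 'down'
    ? [visiblePages.last + 1, visiblePages.last + 2]
    : [visiblePages.first - 1, visiblePages.first - 2];

  prefetchPages
    .filter(p => p >= 0 && p < totalPages) // stay inside the document
    .forEach(p => prefetchPage(p));

  prevScrollTop.current = scrollTop;
};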

By the time the user scrolls to the next page, it's already rendered.
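Scroll events fire many times per second, so the same two pages get requested over and over. Below is a minimal sketch of a deduplicating backend for `prefetchPage`, assuming a hypothetical `render` callback; `prefetchTargets` restates the direction logic from `handleScroll` as a pure function, and `Prefetcher` is an illustrative name, not the app's code.

```typescript
type Direction = "down" | "up";

// Pages just beyond the visible range in the direction of travel, clamped to the document.
function prefetchTargets(
  direction: Direction,
  first: number,
  last: number,
  totalPages: number,
  count = 2,
): number[] {
  const targets: number[] = [];
  for (let i = 1; i <= count; i++) {
    const p = direction === "down" ? last + i : first - i;
    if (p >= 0 && p < totalPages) targets.push(p);
  }
  return targets;
}

// Hypothetical prefetcher: remembers requested pages so repeated scroll events
// do not trigger duplicate renders of the same page.
class Prefetcher {
  private requested = new Set<number>();
  private readonly render: (page: number) => void;

  constructor(render: (page: number) => void) {
    this.render = render;
  }

  // Returns the pages that were actually sent to the renderer.
  prefetch(pages: number[]): number[] {
    const fresh = pages.filter((p) => !this.requested.has(p));
    for (const p of fresh) {
      this.requested.add(p);
      this.render(p);
    }
    return fresh;
  }

  // Call when a page is destroyed so it can be prefetched again later.
  forget(page: number): void {
    this.requested.delete(page);
  }
}
```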


Results

| Metric | Before | After |
|---|---|---|
| 1,000-page open time | ~8 s freeze | Near-instant |
| Memory usage | All pages | ~3× visible pages |
| Scroll jank | Noticeable | Gone |
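The memory figure implies that rendered pages are dropped once a budget is exceeded. The post does not say how eviction works; one common way to hold memory at a fixed multiple of the visible pages is a least-recently-used cache, sketched here with a hypothetical `RenderedPageCache` whose capacity would be set to roughly 3× the visible page count.

```typescript
// Hypothetical LRU cache of rendered pages (T could be an ImageBitmap or a blob URL).
// A Map iterates in insertion order, so its first key is the least recently used page.
class RenderedPageCache<T> {
  private entries = new Map<number, T>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity; // e.g. about 3x the number of visible pages
  }

  get(page: number): T | undefined {
    const value = this.entries.get(page);
    if (value !== undefined) {
      // Re-insert to mark the page as most recently used.
      this.entries.delete(page);
      this.entries.set(page, value);
    }
    return value;
  }

  // Stores a rendered page and returns the pages evicted to stay within capacity.
  set(page: number, value: T): number[] {
    this.entries.delete(page);
    this.entries.set(page, value);
    const evicted: number[] = [];
    while (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as number;
      this.entries.delete(oldest);
      evicted.push(oldest);
    }
    return evicted;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Whatever the document length, memory stays bounded by `capacity` rendered pages.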

Current state (dev build)

1,000+ pages, no freeze. The architecture holds.


Next devlog

Magic Pipeline — chaining OCR → compress → save into a single click. The workflow automation engine behind it.


Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok
