DEV Community

hiyoyo

Pixel Diff vs Structural Diff for PDFs — Two Very Different Problems

All tests run on an 8-year-old MacBook Air.

"Compare these two PDFs" sounds like one problem. It's actually two completely different problems depending on what you mean.

Hiyoko PDF Vault implements both. Here's the difference.


Pixel diff: what changed visually

Render both PDFs to images and compare them pixel by pixel. This highlights visual differences: layout shifts, image changes, text that moved.

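use image::{ImageError, Rgba, RgbaImage};

// Result type used below; field types inferred from how the function fills them.
pub struct DiffResult {
    pub diff_map: RgbaImage,
    pub changed_pixels: u32,
}

pub fn pixel_diff(
    page_a: &[u8],  // rendered PNG bytes
    page_b: &[u8],
) -> Result<DiffResult, ImageError> {
    // Propagate decode errors instead of panicking on malformed input
    let img_a = image::load_from_memory(page_a)?.to_rgba8();
    let img_b = image::load_from_memory(page_b)?.to_rgba8();

    let mut diff_map = img_a.clone();
    let mut changed_pixels = 0u32;

    for (x, y, pixel_a) in img_a.enumerate_pixels() {
        // Pages can render at different sizes; get_pixel panics out of bounds,
        // so anything outside img_b counts as changed. Only img_a's area is scanned.
        let in_b = x < img_b.width() && y < img_b.height();

        if !in_b || pixel_a != img_b.get_pixel(x, y) {
            // Highlight difference in red
            diff_map.put_pixel(x, y, Rgba([255, 50, 50, 255]));
            changed_pixels += 1;
        }
    }

    Ok(DiffResult { diff_map, changed_pixels })
}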

Good for: catching layout regressions, verifying print output, spotting visual tampering.

Bad for: understanding what changed. You see where something changed, not what the underlying content change was.
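One way to make a pixel diff a little easier to act on is to report where the changes cluster, not just how many pixels differ. The sketch below is not taken from Hiyoko PDF Vault; it is a std-only illustration that assumes both pages are already decoded into raw RGBA8 buffers of the same size. `ChangedRegion` and `changed_region` are hypothetical names.

```rust
/// Bounding box of all changed pixels (inclusive), plus a count.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct ChangedRegion {
    pub min_x: u32,
    pub min_y: u32,
    pub max_x: u32,
    pub max_y: u32,
    pub changed_pixels: u32,
}

/// Compares two raw RGBA8 buffers (4 bytes per pixel, row-major).
/// Returns Ok(None) when the buffers are identical, and Err when either
/// buffer does not match the stated width and height.
pub fn changed_region(
    a: &[u8],
    b: &[u8],
    width: u32,
    height: u32,
) -> Result<Option<ChangedRegion>, &'static str> {
    let expected = width as usize * height as usize * 4;
    if a.len() != expected || b.len() != expected {
        return Err("buffer length does not match width * height * 4");
    }

    let mut region: Option<ChangedRegion> = None;

    for (i, (pa, pb)) in a.chunks_exact(4).zip(b.chunks_exact(4)).enumerate() {
        if pa == pb {
            continue;
        }
        // Recover pixel coordinates from the flat pixel index.
        let x = i as u32 % width;
        let y = i as u32 / width;

        match region.as_mut() {
            None => {
                region = Some(ChangedRegion {
                    min_x: x,
                    min_y: y,
                    max_x: x,
                    max_y: y,
                    changed_pixels: 1,
                })
            }
            Some(r) => {
                // Grow the box to include this pixel.
                r.min_x = r.min_x.min(x);
                r.min_y = r.min_y.min(y);
                r.max_x = r.max_x.max(x);
                r.max_y = r.max_y.max(y);
                r.changed_pixels += 1;
            }
        }
    }

    Ok(region)
}
```

A single bounding box is a coarse summary: two small edits in opposite corners produce a box covering the whole page. It still only says where to look, not what the content change was.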


Structural diff: what changed in the document object tree

Walk both documents' object trees, compare dictionaries and streams at the PDF object level.

pub fn structural_diff(
    doc_a: &Document,
    doc_b: &Document,
) -> Vec<StructuralChange> {
    let mut changes = Vec::new();

    for (id, obj_a) in &doc_a.objects {
        match doc_b.objects.get(id) {
            None => changes.push(StructuralChange::Removed(*id)),
            Some(obj_b) if obj_a != obj_b => {
                changes.push(StructuralChange::Modified {
                    id: *id,
                    before: obj_a.clone(),
                    after: obj_b.clone(),
                })
            }
            _ => {}
        }
    }

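    // Objects that exist only in doc_b. `Added` is an assumed variant of
    // StructuralChange; the loop above never visits IDs missing from doc_a.
    for id in doc_b.objects.keys() {
        if !doc_a.objects.contains_key(id) {
            changes.push(StructuralChange::Added(*id));
        }
    }

    changes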
}

Good for: detecting hidden object changes, metadata modifications, subtle binary tampering that wouldn't show up visually.

Bad for: human readability. The output is raw PDF object data.
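A short summary line can take some of the edge off the raw output. The following is a minimal std-only sketch, not the app's actual code: it assumes an object table can be modeled as a map from `(object number, generation)` to the object's serialized text. `Objects`, `DiffSummary`, `summarize`, and `report` are all hypothetical names.

```rust
use std::collections::BTreeMap;

/// Toy stand-ins (assumptions): an ID is (object number, generation),
/// and an object is represented by its serialized text.
pub type ObjectId = (u32, u16);
pub type Objects = BTreeMap<ObjectId, String>;

/// IDs grouped by the kind of change, in ascending ID order.
#[derive(Debug, Default, PartialEq)]
pub struct DiffSummary {
    pub added: Vec<ObjectId>,
    pub removed: Vec<ObjectId>,
    pub modified: Vec<ObjectId>,
}

/// Classifies every object ID found in either table.
pub fn summarize(a: &Objects, b: &Objects) -> DiffSummary {
    let mut summary = DiffSummary::default();

    // IDs in `a`: missing from `b` means removed, different content means modified.
    for (id, obj_a) in a {
        match b.get(id) {
            None => summary.removed.push(*id),
            Some(obj_b) if obj_a != obj_b => summary.modified.push(*id),
            _ => {}
        }
    }

    // IDs that appear only in `b` were added.
    for id in b.keys() {
        if !a.contains_key(id) {
            summary.added.push(*id);
        }
    }

    summary
}

/// One-line overview suitable for showing above the raw object data.
pub fn report(summary: &DiffSummary) -> String {
    format!(
        "{} added, {} removed, {} modified",
        summary.added.len(),
        summary.removed.len(),
        summary.modified.len()
    )
}
```

Matching by ID alone has a known limit: if a tool renumbers objects when it rewrites the file, most IDs will show up as modified or as an add/remove pair even when the content is unchanged.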


When to use which

| Situation | Use |
|---|---|
| "Did the layout change?" | Pixel diff |
| "Was this document tampered with?" | Structural diff |
| "What text was edited?" | Pixel diff + text extraction |
| "Were hidden objects added?" | Structural diff |

I expose both in the UI. Most users reach for pixel diff. Forensic users reach for structural.


Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok
