DEV Community

hiyoyo

Your "Deleted" PDF Metadata Isn't Gone. Here's How I Actually Remove It — and Hide a Tracer Inside. [Devlog #6]

All tests run on an 8-year-old MacBook Air.

Saving a PDF doesn't erase who made it, when, or with what software.

That metadata is still there. Most "remove metadata" tools just overwrite the fields — the structure remains intact, and in some cases the original data is still recoverable.
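
One reason overwritten fields can stay recoverable is PDF's incremental-update mechanism: an editor may append changed objects and a new trailer to the end of the file and leave the old bytes in place. As a rough illustration (not part of the tool described here), the sketch below counts `%%EOF` markers in the raw bytes. The function name `count_eof_markers` is hypothetical, and the check is only a heuristic, since other layouts such as linearized files may also contain more than one marker.

```rust
/// Counts `%%EOF` markers in a raw PDF byte buffer.
/// A count above one hints that the file was saved incrementally,
/// so earlier versions of some objects may still sit in the bytes.
/// (Hypothetical helper for illustration only.)
pub fn count_eof_markers(pdf: &[u8]) -> usize {
    const MARKER: &[u8] = b"%%EOF";
    pdf.windows(MARKER.len())
        .filter(|window| *window == MARKER)
        .count()
}
```

A file that reports two or more markers is worth opening in a hex viewer: the old Info dictionary is often still readable above the first `%%EOF`.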

Forensic Deep Purge takes a different approach. And then Stealth Watermark does the exact opposite — hides an invisible tracer inside.

Here's what breaks if you get either one wrong.


Forensic Deep Purge: Zero-Trust Reconstruction

Don't delete metadata. Rebuild the PDF from scratch using only what you need.

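use lopdf::Document;

// PurgeError is the app's own error type (definition not shown). It is assumed
// to convert from the errors returned by load_mem and save_to so `?` works.
pub fn deep_purge(input: &[u8]) -> Result<Vec<u8>, PurgeError> {
    let mut doc = Document::load_mem(input)?;

    // Strip Info dictionary entirely
    doc.trailer.remove(b"Info");

    // Remove XMP metadata stream from catalog
    if let Ok(catalog) = doc.catalog_mut() {
        catalog.remove(b"Metadata");
    }

    // Walk all objects and strip authorship fields.
    // The explicit `&[u8]` element type is required because the literals differ in length.
    const AUTHORSHIP_KEYS: [&[u8]; 5] =
        [b"Author", b"Creator", b"Producer", b"CreationDate", b"ModDate"];
    for (_, object) in doc.objects.iter_mut() {
        if let Ok(dict) = object.as_dict_mut() {
            for key in AUTHORSHIP_KEYS {
                dict.remove(key);
            }
        }
    }

    // Drop objects that nothing references anymore (the detached Info
    // dictionary and XMP stream), so they are not written back out.
    doc.prune_objects();

    // Full re-serialization: the file is written from the in-memory objects
    let mut output = Vec::new();
    doc.save_to(&mut output)?;
    Ok(output)
}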

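The re-serialization step is what makes this different from a simple field wipe. The output is written fresh from the parsed objects instead of being patched onto the original bytes, so superseded object versions left behind by incremental saves are not carried over. Content that is still referenced from the page tree, such as hidden layers or invisible annotations, is not touched by the snippet above and needs its own removal pass before the save.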


What a typical PDF actually contains

Before purge, a standard Word export includes:

  • Author: Windows username of the creator
  • Creator: "Microsoft Word for Microsoft 365"
  • Producer: PDF library version
  • CreationDate: exact timestamp to the second
  • ModDate: last edit time
  • XMP block with redundant copies of all of the above

After deep purge: none of it.
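
A quick way to sanity-check that claim is to scan the purged bytes for the key names listed above. The sketch below is a minimal, std-only check. The names `MARKERS` and `residual_metadata` are hypothetical, and it assumes the relevant dictionaries are stored uncompressed. Keys inside compressed object streams would not show up in a raw byte scan, so an empty result is a good sign, not a proof.

```rust
/// Key names and the XMP packet tag to look for in the raw file bytes.
const MARKERS: [&str; 7] = [
    "/Author",
    "/Creator",
    "/Producer",
    "/CreationDate",
    "/ModDate",
    "/Metadata",
    "<x:xmpmeta",
];

/// Returns every marker that still appears somewhere in `pdf`,
/// in the order they are listed in `MARKERS`.
pub fn residual_metadata(pdf: &[u8]) -> Vec<&'static str> {
    MARKERS
        .iter()
        .copied()
        .filter(|marker| {
            pdf.windows(marker.len())
                .any(|window| window == marker.as_bytes())
        })
        .collect()
}
```

Running it on the input and on the output of the purge gives a simple before/after comparison.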


Stealth Watermark: the opposite problem

Sometimes you don't want to remove a trace — you want to plant one.

Layer 1: Micro-stamp (invisible text)

1pt font, opacity 0.01 — invisible at normal zoom, detectable forensically:

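use lopdf::{Document, ObjectId};

pub fn embed_stealth_text(
    doc: &mut Document,
    page_id: ObjectId,
    stamp_text: &str,
) -> Result<(), lopdf::Error> {
    // `/F1` must be a font in the page's resources. `/GS1` is assumed to be an
    // ExtGState entry `<< /ca 0.01 >>` registered there as well; the `gs`
    // operator applies that fill opacity. (`0.01 g` would set a near-black
    // gray level, not an opacity.)
    let content_stream = format!(
        "q\n/GS1 gs\nBT\n/F1 1 Tf\n100 100 Td\n({}) Tj\nET\nQ\n",
        stamp_text  // e.g. "COPY-2024-USER-0042"
    );
    // append_content_to_page is the app's own helper (not shown here)
    append_content_to_page(doc, page_id, content_stream.as_bytes())?;
    Ok(())
}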
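
One detail the stamp text needs before it goes between the parentheses of a `Tj` operand: in a PDF literal string, a backslash or an unbalanced parenthesis must be escaped, otherwise the content stream can become unparseable. The sketch below shows one way to do that with the standard library only. `escape_pdf_literal` and `show_text_operator` are hypothetical names, and the stamp is assumed to be plain ASCII.

```rust
/// Escapes the characters that have special meaning inside a PDF
/// literal string: backslash and both parentheses.
pub fn escape_pdf_literal(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        if matches!(ch, '\\' | '(' | ')') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Builds the text-showing operator for an already chosen font and position.
pub fn show_text_operator(stamp_text: &str) -> String {
    format!("({}) Tj", escape_pdf_literal(stamp_text))
}
```

A stamp such as `COPY-2024-USER-0042` passes through unchanged, so the escaping only matters once IDs start carrying user-supplied text.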

Layer 2: Forensic Ghost-Mark

Non-standard field buried in the PDF catalog — invisible to viewers, visible in the raw object tree:

pub fn embed_ghost_mark(
    doc: &mut Document,
    seal: &str,
) -> Result<(), lopdf::Error> {
    let catalog = doc.catalog_mut()?;
    catalog.set(b"HiyokoSeal", Object::string_literal(seal));
    Ok(())
}

Two layers. Both invisible. Either one survives most re-saves.
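
Reading the ghost-mark back does not need a PDF library if the catalog is stored uncompressed. The sketch below is a minimal raw-byte lookup under that assumption. `find_seal` is a hypothetical name, and it also assumes the seal was written as a literal string with no parentheses or backslashes in it. A real check would parse the file and read the key from the catalog.

```rust
/// Looks for `/HiyokoSeal (value)` in raw PDF bytes and returns the value.
/// Returns None if the key is missing or is not followed by a literal string.
pub fn find_seal(pdf: &[u8]) -> Option<String> {
    const KEY: &[u8] = b"/HiyokoSeal";
    let key_start = pdf.windows(KEY.len()).position(|w| w == KEY)?;
    let rest = &pdf[key_start + KEY.len()..];

    // Only whitespace may sit between the key and the opening parenthesis.
    let open = rest.iter().position(|&b| b == b'(')?;
    if !rest[..open].iter().all(|b| b.is_ascii_whitespace()) {
        return None;
    }

    // Take everything up to the first closing parenthesis (no escape handling).
    let body = &rest[open + 1..];
    let close = body.iter().position(|&b| b == b')')?;
    Some(String::from_utf8_lossy(&body[..close]).into_owned())
}
```

The same lookup doubles as a regression test for the "survives re-saves" claim: embed a seal, round-trip the file through another tool, and check that `find_seal` still returns it.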


Current state (dev build)

8-year-old MacBook Air. Purge runs instantly.


Next devlog

Offline AI chat with Ollama — asking questions about PDF content without sending a single byte to any server.


Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok
