All tests run on an 8-year-old MacBook Air.
Saving a PDF doesn't erase who made it, when, or with what software.
That metadata is still there. Most "remove metadata" tools just overwrite the fields: the structure remains intact, and when the change is saved as an incremental update, the original values stay in the file and can be recovered.
Forensic Deep Purge takes a different approach. Stealth Watermark then does the exact opposite: it hides an invisible tracer inside the file.
Here's what breaks if you get either one wrong.
Forensic Deep Purge: Zero-Trust Reconstruction
Don't delete metadata. Rebuild the PDF from scratch using only what you need.
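use lopdf::Document;

// PurgeError is the project's own error type (not shown here); it is assumed
// to implement From for the errors returned by load_mem and save_to.
pub fn deep_purge(input: &[u8]) -> Result<Vec<u8>, PurgeError> {
    let mut doc = Document::load_mem(input)?;

    // Strip the Info dictionary reference from the trailer
    doc.trailer.remove(b"Info");

    // Remove the XMP metadata stream reference from the catalog
    if let Ok(catalog) = doc.catalog_mut() {
        catalog.remove(b"Metadata");
    }

    // Walk all objects and strip authorship fields.
    // The explicit slice type is required: byte-string literals of
    // different lengths have different array types.
    const AUTHORSHIP_KEYS: [&[u8]; 5] =
        [b"Author", b"Creator", b"Producer", b"CreationDate", b"ModDate"];
    for (_, object) in doc.objects.iter_mut() {
        if let Ok(dict) = object.as_dict_mut() {
            for key in AUTHORSHIP_KEYS {
                dict.remove(key);
            }
        }
    }

    // Drop objects nothing references any more, such as the detached
    // Info dictionary and XMP stream; otherwise they are written back out.
    doc.prune_objects();

    // Full re-serialization: the file is written out fresh
    let mut output = Vec::new();
    doc.save_to(&mut output)?;
    Ok(output)
}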
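The re-serialization step is what makes this different from a simple field wipe. The output is written fresh from the parsed object tree instead of being patched in place, so superseded object versions left behind by earlier incremental saves are not copied over. Hidden layers and invisible annotations that the document still references are a separate case: a re-save alone keeps them, so they need their own removal pass before the rebuild.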
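One way to picture "never included in the rebuild" is a reachability walk: start at the catalog, follow references, and write out only what was visited. The sketch below is a toy model under that assumption, not lopdf's API; `ObjectGraph`, `reachable_from`, and `rebuild` are hypothetical names, and objects are reduced to bare ids with a list of outgoing references.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for a PDF object table: object id -> ids it references.
type ObjectGraph = BTreeMap<u32, Vec<u32>>;

/// Returns the ids reachable from `root` by following references.
fn reachable_from(graph: &ObjectGraph, root: u32) -> BTreeSet<u32> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        // Skip dangling references and ids that were already visited.
        if !graph.contains_key(&id) || !seen.insert(id) {
            continue;
        }
        stack.extend(graph[&id].iter().copied());
    }
    seen
}

/// Builds a new table that contains only the reachable objects.
fn rebuild(graph: &ObjectGraph, root: u32) -> ObjectGraph {
    let keep = reachable_from(graph, root);
    graph
        .iter()
        .filter(|(id, _)| keep.contains(*id))
        .map(|(id, refs)| (*id, refs.clone()))
        .collect()
}
```

With a graph where object 1 references 2, 2 references 3, and 9 is referenced by nothing, `rebuild(&graph, 1)` keeps 1, 2, and 3 and drops 9. An object that is hidden but still referenced from the page tree would be kept by this walk, which is why it has to be detached first.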
What a typical PDF actually contains
Before purge, a standard Word export includes:
- Author: Windows username of the creator
- Creator: "Microsoft Word for Microsoft 365"
- Producer: PDF library version
- CreationDate: exact timestamp to the second
- ModDate: last edit time
- XMP block with redundant copies of all of the above
After deep purge: none of it.
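A quick way to sanity-check "none of it" is to scan the purged bytes for the field names listed above. This is a minimal sketch, assuming the markers appear as plain ASCII in uncompressed parts of the file; `MARKERS`, `contains_bytes`, and `residual_markers` are illustrative names, not part of the tool.

```rust
/// Byte patterns that suggest authorship metadata is still present.
/// The list mirrors the fields named above plus the XMP packet marker.
const MARKERS: [&str; 6] = [
    "/Author",
    "/Creator",
    "/Producer",
    "/CreationDate",
    "/ModDate",
    "<x:xmpmeta",
];

/// Returns true if `needle` occurs anywhere in `haystack`.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Lists every marker found in the raw bytes of a PDF.
/// Only uncompressed data is inspected, so an empty result is a
/// necessary check, not proof that the file is clean.
fn residual_markers(pdf: &[u8]) -> Vec<&'static str> {
    MARKERS
        .iter()
        .copied()
        .filter(|m| contains_bytes(pdf, m.as_bytes()))
        .collect()
}
```

Running `residual_markers` on the output of the purge should return an empty list. Anything it does return is worth inspecting by hand, since metadata inside compressed streams will not show up in a raw byte scan.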
Stealth Watermark: the opposite problem
Sometimes you don't want to remove a trace — you want to plant one.
Layer 1: Micro-stamp (invisible text)
1pt font, opacity 0.01 — invisible at normal zoom, detectable forensically:
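use lopdf::{Document, ObjectId};

pub fn embed_stealth_text(
    doc: &mut Document,
    page_id: ObjectId,
    stamp_text: &str,
) -> Result<(), lopdf::Error> {
    // `/F1` (font) and `/GS1` (graphics state) are assumed to exist in the
    // page's resources; `/GS1` is assumed to be an ExtGState with `/ca 0.01`.
    // Opacity comes from `gs`; the `g` operator would set a gray level instead.
    let content_stream = format!(
        "q\nBT\n/F1 1 Tf\n/GS1 gs\n100 100 Td\n({}) Tj\nET\nQ\n",
        stamp_text // e.g. "COPY-2024-USER-0042"; no unbalanced ( ) or backslashes
    );
    // append_content_to_page is the project's own helper (not shown here)
    append_content_to_page(doc, page_id, content_stream.as_bytes())?;
    Ok(())
}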
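The format string above drops `stamp_text` straight into a `(...)` literal, so a stamp containing a parenthesis or backslash would corrupt the content stream. A minimal escaping sketch, assuming ASCII stamp text; `escape_pdf_literal` and `show_text_op` are hypothetical helper names:

```rust
/// Escapes the characters that are special inside a PDF literal string.
fn escape_pdf_literal(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            // Backslash and parentheses must be prefixed with a backslash.
            '\\' | '(' | ')' => {
                out.push('\\');
                out.push(ch);
            }
            // Line breaks are written as escape sequences.
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            _ => out.push(ch),
        }
    }
    out
}

/// Builds the show-text operator for an escaped stamp.
fn show_text_op(stamp_text: &str) -> String {
    format!("({}) Tj", escape_pdf_literal(stamp_text))
}
```

A stamp like `COPY-2024-USER-0042` passes through unchanged, so the escaping only matters once stamps start carrying user-supplied text.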
Layer 2: Forensic Ghost-Mark
Non-standard field buried in the PDF catalog — invisible to viewers, visible in the raw object tree:
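use lopdf::{Document, Object};

pub fn embed_ghost_mark(
    doc: &mut Document,
    seal: &str,
) -> Result<(), lopdf::Error> {
    // Custom key on the catalog dictionary; viewers ignore keys they do not know
    let catalog = doc.catalog_mut()?;
    catalog.set(b"HiyokoSeal", Object::string_literal(seal));
    Ok(())
}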
Two layers. Both invisible. Either one survives most re-saves.
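Reading the ghost-mark back can be sketched as a raw byte search for the key. This assumes the catalog is stored uncompressed and the seal is a simple literal string without parentheses or escape sequences; a real reader should parse the object tree instead. `find_ghost_mark` is an illustrative name.

```rust
/// Looks for `/HiyokoSeal (...)` in raw PDF bytes and returns the seal text.
fn find_ghost_mark(pdf: &[u8]) -> Option<String> {
    const KEY: &[u8] = b"/HiyokoSeal";
    // Locate the key name.
    let key_pos = pdf.windows(KEY.len()).position(|w| w == KEY)?;
    let rest = &pdf[key_pos + KEY.len()..];
    // Skip optional whitespace between the key and the opening parenthesis.
    let open = rest.iter().position(|b| !b.is_ascii_whitespace())?;
    if rest[open] != b'(' {
        return None;
    }
    // Take everything up to the first closing parenthesis.
    let body = &rest[open + 1..];
    let close = body.iter().position(|&b| b == b')')?;
    String::from_utf8(body[..close].to_vec()).ok()
}
```

The same search doubles as a survival test: re-save a marked file with another tool, run `find_ghost_mark` on the result, and see whether the seal is still there.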
Current state (dev build)
8-year-old MacBook Air. Purge runs instantly.
Next devlog
Offline AI chat with Ollama — asking questions about PDF content without sending a single byte to any server.
Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok
