All tests run on an 8-year-old MacBook Air.
"scan_20260101_0042.pdf" tells you nothing.
Smart Rename reads the first page of each PDF and generates a meaningful filename. No API call, no LLM, no internet. Just heuristics and pattern matching in Rust.
What the first page usually contains
Most documents put identifying information near the top: document type, date, recipient, reference number. The challenge is extracting it reliably across wildly different layouts.
pub fn extract_rename_candidates(doc: &Document) -> RenameCandidates {
    // Text of page 1 only; an extraction failure yields an empty string.
    let first_page_text = doc.extract_text(&[1]).unwrap_or_default();
    // Trim every line and drop the blank ones before running the heuristics.
    let lines: Vec<&str> = first_page_text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .collect();
    RenameCandidates {
        title: find_title(&lines),
        date: find_date(&lines),
        doc_type: classify_document(&lines),
        reference: find_reference_number(&lines),
    }
}
Date extraction
Dates appear in many formats. Regex covers the common ones:
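use regex::Regex;

pub fn find_date(lines: &[&str]) -> Option<String> {
    // Tried in order: year-first (including 年月日), numeric with the year last, English month name.
    let patterns = [
        r"(\d{4})[年/\-\.](\d{1,2})[月/\-\.](\d{1,2})[日]?",
        r"(\d{1,2})[/\-\.](\d{1,2})[/\-\.](\d{4})",
        r"(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},?\s+\d{4}",
    ];
    // Compile each pattern once instead of once per line.
    let regexes: Vec<Regex> = patterns.iter().map(|p| Regex::new(p).unwrap()).collect();
    for line in lines {
        for re in &regexes {
            if let Some(cap) = re.captures(line) {
                return Some(normalize_date(&cap));
            }
        }
    }
    None
}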
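The post does not show normalize_date, so here is a minimal sketch of the normalization step, assuming the target is the YYYYMMDD prefix seen in the output filenames. The names normalize_ymd and month_number are hypothetical, and the sketch takes the already-captured pieces as plain strings so it needs nothing outside the standard library. Note that the second pattern (numeric, year last) cannot tell day-first from month-first dates on its own, so the caller has to decide which capture group is which.

```rust
/// Hypothetical helper: turn captured year/month/day strings into "YYYYMMDD".
/// Returns None when a component is not a number or is out of range.
pub fn normalize_ymd(year: &str, month: &str, day: &str) -> Option<String> {
    let y: u32 = year.parse().ok()?;
    let m: u32 = month.parse().ok()?;
    let d: u32 = day.parse().ok()?;
    // Coarse range check only; it does not know how many days each month has.
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(format!("{:04}{:02}{:02}", y, m, d))
}

/// Hypothetical helper: map an English month name to its number (1 to 12).
pub fn month_number(name: &str) -> Option<u32> {
    const MONTHS: [&str; 12] = [
        "january", "february", "march", "april", "may", "june",
        "july", "august", "september", "october", "november", "december",
    ];
    let lower = name.to_lowercase();
    MONTHS
        .iter()
        .position(|m| *m == lower.as_str())
        .map(|i| i as u32 + 1)
}
```

For a month-name match, the month would go through month_number first and then be passed to normalize_ymd as a string, for example normalize_ymd("2026", &month_number("March")?.to_string(), "10").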
Document type classification
A small keyword list covers most business documents:
pub fn classify_document(lines: &[&str]) -> &'static str {
    let text = lines.join(" ").to_lowercase();
    if text.contains("invoice") || text.contains("請求書") { return "invoice"; }
    if text.contains("contract") || text.contains("契約書") { return "contract"; }
    if text.contains("receipt") || text.contains("領収書") { return "receipt"; }
    if text.contains("report") || text.contains("報告書") { return "report"; }
    if text.contains("minutes") || text.contains("議事録") { return "minutes"; }
    "document"
}
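One possible way to keep that keyword list easy to extend is to move it into a table. This is a sketch of the same first-match-wins logic, not the code the app ships; DOC_TYPES and classify_text are hypothetical names, and the function takes the joined text directly to stay self-contained.

```rust
/// (label, keywords) pairs, checked top to bottom; the first hit wins.
const DOC_TYPES: &[(&str, &[&str])] = &[
    ("invoice", &["invoice", "請求書"]),
    ("contract", &["contract", "契約書"]),
    ("receipt", &["receipt", "領収書"]),
    ("report", &["report", "報告書"]),
    ("minutes", &["minutes", "議事録"]),
];

/// Hypothetical table-driven variant of the classifier above.
pub fn classify_text(text: &str) -> &'static str {
    let lower = text.to_lowercase();
    for &(label, keywords) in DOC_TYPES {
        // Plain substring search, so "invoice" also matches inside longer words.
        if keywords.iter().any(|k| lower.contains(*k)) {
            return label;
        }
    }
    "document"
}
```

Because the search is a substring match and the table order decides ties, a page that says "Receipt for invoice 12" is classified as an invoice. Reordering the table is the only tuning knob in this version.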
Output
scan_20260101_0042.pdf → 20260115_invoice_ABC-Corp.pdf
doc_final_v3.pdf → 20260203_contract_NDA.pdf
untitled.pdf → 20260310_report.pdf
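The names above follow a date_type_title pattern. The post does not show how the pieces are joined, so the following is only a sketch under stated assumptions: the field types of RenameCandidates, the sanitize helper, and build_filename are all hypothetical, and the fallback from title to reference number is a guess at reasonable behavior.

```rust
/// Hypothetical shape of the extracted fields; the real struct is not shown in the post.
pub struct RenameCandidates {
    pub title: Option<String>,
    pub date: Option<String>, // assumed to be normalized to YYYYMMDD already
    pub doc_type: &'static str,
    pub reference: Option<String>,
}

/// Replace characters that are unsafe in filenames, then join words with '-'.
fn sanitize(part: &str) -> String {
    let cleaned: String = part
        .chars()
        .map(|c| match c {
            '/' | '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|' => ' ',
            c if c.is_control() => ' ',
            c => c,
        })
        .collect();
    cleaned.split_whitespace().collect::<Vec<_>>().join("-")
}

/// Assemble "<date>_<type>_<title>.pdf", skipping any piece that is missing.
pub fn build_filename(c: &RenameCandidates) -> String {
    let mut parts: Vec<String> = Vec::new();
    if let Some(date) = &c.date {
        parts.push(date.clone());
    }
    parts.push(c.doc_type.to_string());
    // Prefer the title; fall back to the reference number if there is no title.
    if let Some(extra) = c.title.as_ref().or(c.reference.as_ref()) {
        let s = sanitize(extra);
        if !s.is_empty() {
            parts.push(s);
        }
    }
    format!("{}.pdf", parts.join("_"))
}
```

With a date of 20260115, a type of invoice, and a title of "ABC Corp", this produces 20260115_invoice_ABC-Corp.pdf. A real implementation would probably also cap the title length and handle name collisions in the target folder.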
Not perfect. Works well enough that users stop caring about the originals.
Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok