All tests run on an 8-year-old MacBook Air.
Scan → OCR → compress → save to the right folder.
Every. Single. Time.
Not complex enough to write a script for. Too repetitive to keep doing manually. So I built a pipeline engine into the app — and here's how the architecture works.
The idea: steps as data
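Each operation (OCR, compress, encrypt, rename, watermark, save) is a variant of the StepType enum, and any settings it needs live in that variant's fields. A pipeline is an ordered list of steps, each with an enabled flag; only the enabled steps run.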
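use serde::{Deserialize, Serialize};
use std::path::PathBuf;

// One variant per operation; per-step settings live in the variant's fields.
// CompressionLevel is defined elsewhere in the app and is not shown here.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum StepType {
    Ocr,
    Compress { level: CompressionLevel },
    Encrypt { password: String },
    Rename { template: String },
    Save { destination: PathBuf },
    Watermark { text: String, opacity: f32 },
}

// A step plus its on/off toggle, so disabling a step keeps its settings.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PipelineStep {
    pub step_type: StepType,
    pub enabled: bool,
}

// A named, ordered list of steps.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Pipeline {
    pub name: String,
    pub steps: Vec<PipelineStep>,
}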
Storing pipelines as serializable data means users can save, share, and reload them. The UI just edits a JSON blob.
The execution engine
Each step's output becomes the next step's input. If any step fails, the whole pipeline halts and temp files are cleaned up.
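use std::path::{Path, PathBuf};

// PipelineError stands in for the app's own error type, which this excerpt does not show.
pub async fn run_pipeline(
    pipeline: &Pipeline,
    input_path: &Path,
) -> Result<PathBuf, PipelineError> {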
    let mut current_path = input_path.to_path_buf();
    // Disabled steps are skipped, not removed, so the saved order is preserved.
    for step in pipeline.steps.iter().filter(|s| s.enabled) {
        // Each step returns the path of the file the next step should read.
        current_path = match &step.step_type {
            StepType::Ocr => run_ocr(&current_path).await?,
            StepType::Compress { level } => compress_pdf(&current_path, level).await?,
            StepType::Encrypt { password } => encrypt_pdf(&current_path, password).await?,
            StepType::Rename { template } => rename_file(&current_path, template).await?,
            StepType::Save { destination } => save_to(&current_path, destination).await?,
            StepType::Watermark { text, opacity } => add_watermark(&current_path, text, *opacity).await?,
        };
    }
    Ok(current_path)
}
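The ? operator returns early on the first Err, so later steps never run against a failed intermediate file. Each step is a plain async function from one path to the next, which makes it easy to test on its own.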
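The halt-and-clean-up behaviour can be sketched without any PDF code. This is a minimal synchronous stand-in, not the app's engine: `StepFn`, `run_steps`, the `cleanup` callback, and the `fake_*` steps are all hypothetical names made up for illustration, and errors are plain strings to keep it short.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical step signature: read one file, return the file the next step should read.
type StepFn = fn(&Path) -> Result<PathBuf, String>;

/// Runs `steps` in order. On the first error, calls `cleanup` on every
/// path produced so far and returns that error without running later steps.
fn run_steps(
    input: &Path,
    steps: &[StepFn],
    cleanup: &mut dyn FnMut(&Path),
) -> Result<PathBuf, String> {
    let mut current = input.to_path_buf();
    let mut produced: Vec<PathBuf> = Vec::new();
    for step in steps {
        match step(&current) {
            Ok(next) => {
                produced.push(next.clone());
                current = next;
            }
            Err(e) => {
                for p in &produced {
                    cleanup(p.as_path());
                }
                return Err(e);
            }
        }
    }
    Ok(current)
}

// Fake steps that only rewrite the file name; no file is touched.
fn fake_ocr(p: &Path) -> Result<PathBuf, String> {
    Ok(p.with_extension("ocr.pdf"))
}

fn fake_compress(p: &Path) -> Result<PathBuf, String> {
    Ok(p.with_extension("min.pdf"))
}

fn always_fails(_p: &Path) -> Result<PathBuf, String> {
    Err("disk full".to_string())
}
```

Passing `cleanup` as a callback keeps the runner testable: a test can record which paths would be deleted, while real code could hand in a closure that calls `std::fs::remove_file`.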
Hot Folder: drop a file, pipeline runs automatically
Point it at a folder. Any PDF that lands there triggers the pipeline once the watcher's one-second debounce has passed.
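use notify::{watcher, DebouncedEvent, RecursiveMode, Watcher};
use std::path::Path;
use std::time::Duration;

// Uses the notify 4.x debounced watcher API. This function blocks its thread,
// and tokio::spawn needs an active Tokio runtime, so call it from inside one
// (for example via tokio::task::spawn_blocking).
pub fn watch_folder(
    folder: &Path,
    pipeline: Pipeline,
) -> Result<(), notify::Error> {
    let (tx, rx) = std::sync::mpsc::channel();
    let mut watcher = watcher(tx, Duration::from_secs(1))?;
    watcher.watch(folder, RecursiveMode::NonRecursive)?;
    loop {
        match rx.recv() {
            Ok(DebouncedEvent::Create(path)) => {
                if path.extension().map_or(false, |e| e == "pdf") {
                    // Spawned tasks must be 'static, so move owned copies in.
                    let pipeline = pipeline.clone();
                    tokio::spawn(async move {
                        if let Err(e) = run_pipeline(&pipeline, &path).await {
                            eprintln!("pipeline failed for {:?}: {:?}", path, e);
                        }
                    });
                }
            }
            Ok(_) => {}
            // recv() only fails once the watcher (the sender) is gone, so stop looping.
            Err(e) => {
                eprintln!("watch error: {:?}", e);
                return Ok(());
            }
        }
    }
}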
Set your scanner's output folder as the Hot Folder. Scan the document — by the time you walk back to your desk, it's already OCR'd, compressed, and filed.
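One detail worth hedging against: some scanners write upper-case extensions such as SCAN_0001.PDF, which a plain comparison against "pdf" would skip. A minimal sketch of a case-insensitive check, using only the standard library; `is_pdf` is a hypothetical helper name, not something taken from the app.

```rust
use std::path::Path;

/// Returns true when the path's extension is "pdf" in any letter case.
/// Paths with no extension, or with a non-UTF-8 extension, return false.
fn is_pdf(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("pdf"))
}
```

A helper like this could replace the inline extension test in the watcher loop. It checks only the file name, not the file contents.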
Current state (dev build)
Steps are drag-and-drop reorderable. Toggle individual steps on/off without deleting them. Save named pipelines for different workflows.
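Because a pipeline is an ordered list with an enabled flag on each step, reordering and toggling reduce to small list edits. The sketch below is an assumption about how a UI handler might do it, not the app's code: `UiStep` is a simplified stand-in for PipelineStep, and `move_step`, `toggle_step`, and `run_order` are hypothetical names.

```rust
/// Simplified stand-in for PipelineStep (serde derives and step settings omitted).
#[derive(Debug, Clone, PartialEq)]
struct UiStep {
    label: &'static str, // hypothetical display name
    enabled: bool,
}

/// Moves the step at `from` to position `to`, as a drag-and-drop handler might.
/// Out-of-range indices leave the list unchanged.
fn move_step(steps: &mut Vec<UiStep>, from: usize, to: usize) {
    if from >= steps.len() || to >= steps.len() {
        return;
    }
    let step = steps.remove(from);
    steps.insert(to, step);
}

/// Flips a step's `enabled` flag without removing it from the list.
fn toggle_step(steps: &mut [UiStep], index: usize) {
    if let Some(step) = steps.get_mut(index) {
        step.enabled = !step.enabled;
    }
}

/// Labels of the steps that would actually run, in order.
fn run_order(steps: &[UiStep]) -> Vec<&'static str> {
    steps.iter().filter(|s| s.enabled).map(|s| s.label).collect()
}
```

Starting from ocr, compress, save with everything enabled, toggling index 1 leaves three steps in the list but only ocr and save in the run order, and the compress step keeps its place for when it is switched back on.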
Next devlog
Forensic Deep Purge and Stealth Watermark — invisible security. How do you prove a document leaked without the leaker knowing you can prove it?
Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok
