

Tiny Programs 1: docshund-rs

Long story short, I've wound up starting work on a small Tesseract OCR program. I call it docshund-rs, because it finds things in documents like a dachshund finds gophers in holes, and it's written in Rust. I'm intensely creative.

It took me longer to remember how Rust handles Result<> return types, and how to unwrap the results of the tesseract-rs calls accordingly, than it did to get the program working.
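For anyone else who needs the refresher, here is a minimal sketch of the Result<> pattern in question. It does not use the real tesseract-rs API: `recognize`, `word_count`, and `OcrError` are hypothetical stand-ins invented for illustration, and the "OCR" just returns a placeholder string.

```rust
use std::fmt;

// Hypothetical error type; a real OCR crate defines its own.
#[derive(Debug, PartialEq)]
enum OcrError {
    EmptyPath,
    UnsupportedFormat(String),
}

impl fmt::Display for OcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OcrError::EmptyPath => write!(f, "no input path given"),
            OcrError::UnsupportedFormat(ext) => write!(f, "unsupported format: {}", ext),
        }
    }
}

impl std::error::Error for OcrError {}

// Hypothetical stand-in for an OCR call (NOT the tesseract-rs API).
// It only inspects the file extension and returns placeholder text.
fn recognize(path: &str) -> Result<String, OcrError> {
    if path.is_empty() {
        return Err(OcrError::EmptyPath);
    }
    let ext = path.rsplit('.').next().unwrap_or("").to_ascii_lowercase();
    match ext.as_str() {
        "jpg" | "jpeg" | "png" | "tif" | "tiff" => Ok(format!("text from {}", path)),
        other => Err(OcrError::UnsupportedFormat(other.to_string())),
    }
}

// `?` hands the error back to the caller, where `.unwrap()` would panic.
fn word_count(path: &str) -> Result<usize, OcrError> {
    let text = recognize(path)?;
    Ok(text.split_whitespace().count())
}

fn main() {
    // Handle both outcomes explicitly at the top level.
    match word_count("scan.png") {
        Ok(n) => println!("{} words", n),
        Err(e) => eprintln!("OCR failed: {}", e),
    }
}
```

Calling `.unwrap()` is fine while prototyping, but returning a Result<> and using `?` keeps a bad input file from taking down the whole program.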

All told, though, it's already pretty cool. It can successfully scan image files like JPEG, PNG and TIF with a reasonable degree of accuracy.

Ultimately I think docshund-rs will be a program that can take a PDF file, turn it into images, and then process a bunch of those pages concurrently before barfing the output back out into a searchable PDF, or at least just a text file dump.
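The concurrent part of that plan could look something like the sketch below. It uses only the standard library's scoped threads, and everything in it is an assumption for illustration: `ocr_page` is a hypothetical placeholder that uppercases its input instead of running OCR, and pages are plain strings rather than images.

```rust
use std::thread;

// Hypothetical placeholder for per-page OCR; real code would call into
// an OCR library here. An empty page is treated as an error.
fn ocr_page(page: &str) -> Result<String, String> {
    if page.is_empty() {
        Err("empty page".to_string())
    } else {
        Ok(page.to_uppercase())
    }
}

// Runs `ocr_page` on every page in its own scoped thread and returns
// the results in the original page order.
fn ocr_pages(pages: &[&str]) -> Vec<Result<String, String>> {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = pages
            .iter()
            .map(|&page| s.spawn(move || ocr_page(page)))
            .collect();

        // Join in spawn order, which preserves page order in the output.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let pages = ["page one", "", "page three"];
    for (i, result) in ocr_pages(&pages).iter().enumerate() {
        match result {
            Ok(text) => println!("page {}: {}", i + 1, text),
            Err(e) => eprintln!("page {} failed: {}", i + 1, e),
        }
    }
}
```

One thread per page is the simplest thing that works for a short document. A long PDF would probably want a bounded worker pool instead, so the number of threads does not grow with the page count.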

This is also subject to my interest level in the project, which usually varies wildly.

Though I think I'll keep a running tab of Tiny Programs and link it all together as a series, regardless.

Title photo by James Watson on Unsplash
