DEV Community

Connor


Tiny Programs 1: docshund-rs

Long story short, I've wound up starting work on a small Tesseract OCR program. I call it docshund-rs, because it finds things in documents like a dachshund finds gophers in holes, and it's written in Rust. I'm intensely creative.

It took me longer to remember how Rust handles Result<> return types, and to unwrap the results of the tesseract-rs calls accordingly, than it did to get the program working.
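For anyone else who forgets how Result works between projects, here is a minimal sketch of the pattern. It is not the docshund-rs code and it does not call tesseract-rs: `recognize_text`, `scan`, and `OcrError` are hypothetical stand-ins invented for illustration, so only the Result handling itself should be taken at face value.

```rust
use std::fmt;

// Hypothetical error type standing in for whatever an OCR binding returns.
#[derive(Debug, PartialEq)]
enum OcrError {
    UnsupportedFormat(String),
}

impl fmt::Display for OcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OcrError::UnsupportedFormat(ext) => write!(f, "unsupported image format: {ext}"),
        }
    }
}

impl std::error::Error for OcrError {}

// Hypothetical stand-in for an OCR call. It does no real recognition;
// it only checks the file extension and returns placeholder text.
fn recognize_text(path: &str) -> Result<String, OcrError> {
    let ext = path.rsplit('.').next().unwrap_or("").to_lowercase();
    match ext.as_str() {
        "jpg" | "jpeg" | "png" | "tif" | "tiff" => Ok(format!("text from {path}")),
        other => Err(OcrError::UnsupportedFormat(other.to_string())),
    }
}

// `?` returns early with the error, where `unwrap()` would panic instead.
fn scan(path: &str) -> Result<usize, OcrError> {
    let text = recognize_text(path)?;
    Ok(text.len())
}

fn main() {
    match scan("page.png") {
        Ok(n) => println!("recognized {n} bytes"),
        Err(e) => eprintln!("scan failed: {e}"),
    }
}
```

Calling `unwrap()` on each result is fine for a quick experiment, but `?` keeps the error around so the caller can decide what to do with a bad file.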

All told, though, it's already pretty cool. It can scan image files such as JPEG, PNG, and TIFF with a reasonable degree of accuracy.

Ultimately I think docshund-rs will be a program that can take a PDF file, turn it into images, and then process a bunch of those pages concurrently before barfing the output back out into a searchable PDF, or at least just a text file dump.
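The "process a bunch of those pages concurrently" step could look something like the sketch below. This is only one possible shape, under loud assumptions: `ocr_page` and `ocr_pages` are hypothetical names, the OCR call is a stub that reports the image size, and the PDF-to-image step is skipped entirely by treating each page as a plain byte buffer.

```rust
use std::thread;

// Hypothetical stand-in for running OCR on one rendered page image.
// A real version would hand the image to the OCR engine.
fn ocr_page(page_number: usize, image: &[u8]) -> String {
    format!("page {page_number}: {} bytes scanned", image.len())
}

// Run the stub on every page in its own scoped thread and
// return the results in page order.
fn ocr_pages(pages: &[Vec<u8>]) -> Vec<String> {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = pages
            .iter()
            .enumerate()
            .map(|(i, image)| s.spawn(move || ocr_page(i + 1, image)))
            .collect();

        // Joining in spawn order keeps the output in page order.
        handles
            .into_iter()
            .map(|h| h.join().expect("OCR worker panicked"))
            .collect()
    })
}

fn main() {
    // Fake "page images" of different sizes.
    let pages = vec![vec![0u8; 10], vec![0u8; 20], vec![0u8; 30]];
    for line in ocr_pages(&pages) {
        println!("{line}");
    }
}
```

One thread per page is the simplest thing that works for a short document; a long PDF would probably want a fixed-size worker pool instead.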

This is also subject to my interest level in the project, which usually varies wildly.

Though I think I'll keep a running tab of Tiny Programs and link it all together as a series, regardless.

Title photo by James Watson on Unsplash
