I've been using document scanning apps like CamScanner and Smallpdf to send digital copies of physical documents for a long time, but I'd always wondered how exactly the apps worked. When you take a picture of a piece of paper, even without the paper perfectly centered, these apps automatically find its corners and warp the perspective of the image so it looks as if it were taken with a dedicated scanner. A few weeks ago, I started searching for open-source document scanners I could study.
The problem? I couldn't find any that did the work themselves. The only open-source document scanners I came across handed nearly everything off to OpenCV, which unfortunately has very sparse internal documentation.
So I decided to build my own document scanner, with one catch: I would be using zero third-party libraries. One month later, I have a prototype I'm happy with, and it's worked well on most of the documents I've tested it with.
Let's try it with a random image from Google:
Here's my document scanner at work:
And here is our final result:
The quality isn't perfect because the original image wasn't very high resolution, but taking pictures of most documents with a decent smartphone yields great results. If you'd like to check it out, the code and a demo website are available on GitHub.
However, I'd strongly suggest reading the rest of the articles in this series first to get a grasp on what exactly is happening under the hood. I believe that anyone can learn even the most complicated aspects of computer science, so I've written this series in such a way that you only need beginner programming skills and a basic understanding of algebra to follow along. Let's dive right in!