<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrew Judd</title>
    <description>The latest articles on DEV Community by Andrew Judd (@awjudd).</description>
    <link>https://dev.to/awjudd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F135490%2Fb09c511c-654a-4e20-8a05-6470be8a6b39.png</url>
      <title>DEV Community: Andrew Judd</title>
      <link>https://dev.to/awjudd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/awjudd"/>
    <language>en</language>
    <item>
      <title>The OCR Rabbit Hole</title>
      <dc:creator>Andrew Judd</dc:creator>
      <pubDate>Thu, 21 May 2026 15:44:33 +0000</pubDate>
      <link>https://dev.to/awjudd/the-ocr-rabbit-hole-3h7</link>
      <guid>https://dev.to/awjudd/the-ocr-rabbit-hole-3h7</guid>
      <description>&lt;p&gt;Long weekend. Pile of handwritten documents on the desk. They need to become structured data - searchable fields in an app, not just scanned images.&lt;/p&gt;

&lt;p&gt;OCR's been around forever. This should take a couple of hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Documents
&lt;/h2&gt;

&lt;p&gt;They're handwritten. Different people, different paper, different decades. Faded ink on some. Crossed-out words on others. Notes crammed into margins where there wasn't really room.&lt;/p&gt;

&lt;p&gt;Any person can read these in about five seconds. Remember that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tesseract
&lt;/h2&gt;

&lt;p&gt;Saturday morning. Tesseract is free, it's open source, I can run it on my own machine. Good enough to start.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytesseract&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pytesseract&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;image_to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image_file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;extract_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_file&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Printed text? Fine. Handwriting? "2 1/4 cups flour" comes back as "2 114 cps flcar."&lt;/p&gt;

&lt;p&gt;I try different config options. Page segmentation modes. OCR engine modes. Character allowlists. Each one helps somewhere and breaks something else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pre-Processing
&lt;/h2&gt;

&lt;p&gt;Maybe the images are the problem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytesseract&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageFilter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageEnhance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageOps&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;preprocess_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;L&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ImageOps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;autocontrast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ImageFilter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SHARPEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ImageEnhance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Contrast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;enhance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;img_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;img_array&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image_file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;preprocess_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pytesseract&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;image_to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Contrast boosting helps the faded ones. Hurts the clear ones. Binarization destroys the subtle parts of the handwriting. Sharpening makes some letters readable and others bleed together.&lt;/p&gt;

&lt;p&gt;Every fix is a trade-off. I'm tuning parameters per document. That's not automation.&lt;/p&gt;

&lt;p&gt;30-40% accuracy on the handwriting. Saturday morning gone.&lt;/p&gt;

&lt;p&gt;Tesseract is great at printed text. That's not what I have.&lt;/p&gt;

</description>
      <category>recipes</category>
      <category>ocr</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>Mistakes Happen</title>
      <dc:creator>Andrew Judd</dc:creator>
      <pubDate>Tue, 19 Oct 2021 13:15:05 +0000</pubDate>
      <link>https://dev.to/awjudd/mistakes-happen-3nf8</link>
      <guid>https://dev.to/awjudd/mistakes-happen-3nf8</guid>
      <description>&lt;p&gt;Software development is a mostly “behind the scenes” operation. This is great because we don’t need to be in the limelight to do our job, nor to be good at it. But, this also opens up the door for other possibilities as well.&lt;/p&gt;

&lt;p&gt;We have a lot of different tools at our disposal for the code that we write, and with those tools, we also have a responsibility to use for good. And guess what, we are also human. So, we make mistakes!&lt;/p&gt;

&lt;p&gt;When doing anything in life, mistakes are inevitable. They are made in every occupation, it’s how we react to them that matters. Unfortunately I’ve seen countless times people trying to “protect themselves” by covering up the mistake. We have the tools available to us to do some of this. But I have to ask why is it necessary? Just because we are sitting behind a screen and can slave away for a few extra hours to remedy it, doesn’t make it right. Sure, admitting that you made a mistake is a bit of a slap to the ego, it should be! It further proves that you are not infallible. On the flip side, if you don’t admit it to yourself or just try to cover it up, and you are able to repair it it may end up actually inflating your ego. Telling yourself, “See, I messed up and was able to power through and fix it, I’m an amazing developer.” This however, is missing the point. It’s a learning experience for you.&lt;/p&gt;

&lt;p&gt;The bigger issue that I find when people try to cover up mistakes is the fact that, it gets the problem solved today, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens if you make bigger mistakes when covering up this one?&lt;/li&gt;
&lt;li&gt;What happens if you don’t even realize you are missing one piece of the puzzle? (i.e. if you are restoring data in a database, and return data to 2 out of the 3 required tables)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Problem solving is part of our job, and these issues could just be looked at as another problem to solve, which is fair however, not favourable.&lt;/p&gt;

&lt;p&gt;What happens if you make a mistake that you can’t cover up? What happens if one of your co-workers find out that you covered something else up? That’s when things start to go downhill fast. Your reputation has been tarnished. The worst part about it is the fact people know you do this, so sometimes the unasked question (or asked question) is “what else did you cover up?”. This of course will instill doubt within your co-workers. They also will lose faith in you, and at this point it’s difficult to rebuild this trust. You built part of the working relationship on lies, and betrayed your fellow developers, how can they trust you?&lt;/p&gt;

&lt;p&gt;When you do make a mistake that you are not a fan of, instead of covering it up, do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get a good understanding of the mistake that was made, so you are able to explain it&lt;/li&gt;
&lt;li&gt;Come up with possible solutions on how to resolve it&lt;/li&gt;
&lt;li&gt;Mention to your team that you made this mistake, and have come up with a few options and ask colleagues for their opinions&lt;/li&gt;
&lt;li&gt;Work with your colleagues to resolve the issue&lt;/li&gt;
&lt;li&gt;Come up with an action plan to prevent this from happening in the future&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Yes, doing this may add a scratch on your reputation but it will also help to add a feather into your cap. You have shown your co-workers a vulnerable side, and that you are fallible. Because of that, they will, over time, trust you more. They will know you tell the truth and aren’t trying to save face. You may also become one of the people that they go to when they inevitably make a mistake.&lt;/p&gt;

&lt;p&gt;Mistakes suck, but are inevitable as humans. It’s how we choose to react and grow from them which shows a level of maturity and integrity that you have. The initial ego hit sucks, but the trust that you develop as a team member will help to reduce the overall burn from this mistake.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>mistakes</category>
      <category>professional</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
