I Tried Adding Image OCR to a Morse Code Translator

#ai #machinelearning #showdev #sideprojects

I thought adding image support to a Morse code translator would be straightforward.

Decoding Morse from text is deterministic enough: split the dots and dashes into groups, map each group to a letter, then handle the spaces between words. Image input felt like the same pipeline with OCR in front of it.
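For reference, the text-only pipeline really is small. This is a minimal sketch, assuming letters are separated by spaces and words by "/" (a common convention, but my assumption, not necessarily what Morse Coder accepts), with a table covering only A-Z and 0-9.

```python
# Standard International Morse code for letters and digits.
MORSE_TO_CHAR = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
    "-----": "0", ".----": "1", "..---": "2", "...--": "3", "....-": "4",
    ".....": "5", "-....": "6", "--...": "7", "---..": "8", "----.": "9",
}


def decode_text_morse(morse: str) -> str:
    """Decode Morse text where letters are separated by spaces and words by '/'."""
    words = []
    for word in morse.strip().split("/"):
        # Each whitespace-separated group is one character; unknown groups become '?'.
        letters = [MORSE_TO_CHAR.get(group, "?") for group in word.split()]
        words.append("".join(letters))
    return " ".join(words)


print(decode_text_morse(".... . .-.. .-.. --- / .-- --- .-. .-.. -.."))  # HELLO WORLD
```

Every step here assumes the groups and gaps are already known exactly, which is the part an image does not give you.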

That assumption broke as soon as I looked at real inputs.

Screenshots get compressed. Photos are tilted. Scans have uneven lighting. Dots and dashes may be visible to a human, but the spacing between them is often too fuzzy for a tool to trust.

The first attempt: clean the image

The obvious first step was preprocessing.

I tried the usual OCR cleanup ideas:

convert the image to black and white
increase contrast
crop around the Morse area
reduce noise
handle slight rotation or skew
make the marks easier for OCR to see
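To make "convert the image to black and white" concrete: a common choice is Otsu's method, which picks the gray level that best separates the dark and light pixel populations. The sketch below works on a flat list of 0-255 grayscale values so it needs no imaging library; loading real pixels (with Pillow, OpenCV, or similar) is left out.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the gray level (0-255) that best separates dark ink from light paper.

    Otsu's method: try every threshold and keep the one that maximizes the
    between-class variance of the two resulting pixel groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]          # pixels with value <= t
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Proportional to between-class variance (constant factor dropped).
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map ink (value <= threshold) to 1 and paper to 0."""
    return [1 if p <= threshold else 0 for p in pixels]
```

A single global threshold is often the wrong tool for photos with uneven lighting; OpenCV's `cv2.adaptiveThreshold` computes a threshold per neighborhood instead.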

These steps help in some cases. They are also dangerous.

Morse code is not just “visible marks.” The gaps are part of the data. If preprocessing makes a dot too thick, removes a faint dash, or collapses a gap, the final decoded text can change.
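A deliberately tiny illustration of that risk: a one-dimensional dilation (the "thicken strokes" step many cleanup pipelines use) with a radius of one pixel is enough to join two dots separated by a two-pixel gap. The row values are made up for illustration.

```python
def dilate_row(row: list[int], radius: int = 1) -> list[int]:
    """1-D dilation: a pixel becomes ink if any ink lies within `radius` of it."""
    n = len(row)
    return [
        1 if any(row[max(0, i - radius): min(n, i + radius + 1)]) else 0
        for i in range(n)
    ]


def count_marks(row: list[int]) -> int:
    """Count separate marks, i.e. runs of consecutive ink pixels."""
    return sum(1 for i, v in enumerate(row) if v == 1 and (i == 0 or row[i - 1] == 0))


row = [0, 1, 1, 0, 0, 1, 1, 0]   # two dots with a two-pixel gap between them
print(count_marks(row))              # 2
print(count_marks(dilate_row(row)))  # 1: the gap is gone
```

The image now looks cleaner and bolder, but two dots have become one longer mark that a decoder could read as a dash.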

That was the first useful lesson: a cleaner-looking image is not always a more faithful Morse image.

Why general OCR is a poor fit

Most OCR tools are designed around letters, words, and lines of text.

Morse code is closer to a timing or segmentation problem. The tool needs to preserve:

short marks
long marks
gaps inside a character
gaps between characters
larger gaps between words
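Standard (ITU) Morse timing makes those five categories concrete: a dot is 1 unit, a dash 3, the gap inside a character 1, between characters 3, and between words 7. A minimal sketch of classifying one binarized scanline by run length, assuming the row passes through the middle of a single line of marks and that the shortest run is exactly one unit (both are my assumptions):

```python
from itertools import groupby


def row_to_morse(row: list[int]) -> str:
    """Turn one binarized scanline (1 = ink, 0 = paper) into Morse text.

    Assumes ITU timing: dot = 1 unit, dash = 3, gap inside a letter = 1,
    gap between letters = 3, gap between words = 7.
    """
    runs = [(value, len(list(group))) for value, group in groupby(row)]
    # Leading and trailing paper carries no timing information.
    while runs and runs[0][0] == 0:
        runs.pop(0)
    while runs and runs[-1][0] == 0:
        runs.pop()
    if not runs:
        return ""

    # Assumption: the shortest run (ink or gap) is one time unit.
    unit = min(length for _, length in runs)
    out = []
    for value, length in runs:
        ratio = length / unit
        if value == 1:
            out.append("." if ratio < 2 else "-")  # boundary halfway between 1 and 3
        elif ratio >= 5:
            out.append(" / ")                       # word gap, halfway between 3 and 7
        elif ratio >= 2:
            out.append(" ")                         # letter gap
        # gaps under 2 units separate marks inside one letter: emit nothing
    return "".join(out)
```

Everything hinges on the single global `unit`, and real images rarely hit the 1:3:7 ratios exactly.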

A human can look at a messy image and infer intent. A tool has to decide from pixels.

If a photo is taken at an angle, the same visual distance may mean different things across the line. If the image is blurry, a dot may look like noise. If the spacing is inconsistent, the OCR result can look plausible while still being wrong.
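One way to keep a plausible-but-wrong result from passing silently is to flag gaps whose measured length sits close to a decision boundary. A minimal sketch, assuming gap lengths and the unit size have already been measured; the 0.5-unit margin is an arbitrary placeholder, not a tuned value.

```python
def classify_gaps(gaps: list[float], unit: float, margin: float = 0.5):
    """Label each gap and flag the ones too close to a decision boundary to trust.

    Boundaries sit halfway between the ITU gap lengths of 1, 3 and 7 units.
    Returns (label, ratio, ambiguous) per gap.
    """
    boundaries = (2.0, 5.0)
    results = []
    for gap in gaps:
        ratio = gap / unit
        if ratio < boundaries[0]:
            label = "intra"      # gap inside one character
        elif ratio < boundaries[1]:
            label = "letter"     # gap between characters
        else:
            label = "word"       # gap between words
        ambiguous = any(abs(ratio - b) < margin for b in boundaries)
        results.append((label, round(ratio, 2), ambiguous))
    return results


print(classify_gaps([1, 3, 4.8, 7], unit=1))
```

Ambiguous gaps are the ones worth highlighting for the user rather than resolving silently. Handling a tilted photo properly would also need a unit estimated locally along the line, which this sketch does not attempt.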

Browser OCR vs backend recognition

I also had to decide where recognition should happen.

Browser-side OCR is attractive because it feels immediate. The user uploads an image and gets a result without waiting for a backend job. It also keeps the workflow simple.

Backend OCR or model-based recognition gives more flexibility. You can try heavier models, orientation detection, or multiple passes. But it adds cost, latency, deployment complexity, and another place where the result can become a black box.

I looked at newer OCR model families too. PaddleOCR and Transformer-based OCR approaches like TrOCR are useful references, especially for messy inputs. But Morse images are not ordinary text images. The model still needs to preserve dot-dash structure and spacing in a form that a Morse decoder can trust.

The current product decision

In Morse Coder, I am treating image decoding as an assisted workflow instead of a magic button.

The current direction is:

Let the user upload an image.
Try to extract the Morse pattern automatically.
Use AI-assisted recognition when normal OCR struggles.
Keep manual Morse input available.
Let the user inspect or correct the extracted Morse before trusting the decoded text.
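The flow above can be sketched as a small orchestration function. Everything here is hypothetical: `fast` stands in for normal OCR, `assisted` for the AI-assisted path, and the 0.8 confidence cut-off is a placeholder. The point is only the shape: the output is always an editable Morse draft, never final text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extraction:
    morse: str          # the intermediate Morse layer shown to the user
    confidence: float   # 0.0-1.0, however the extractor chooses to estimate it
    source: str         # which extractor produced it


# Hypothetical extractor signature: image bytes in, a draft (or nothing) out.
Extractor = Callable[[bytes], Optional[Extraction]]


def extract_morse(image: bytes, fast: Extractor, assisted: Extractor,
                  min_confidence: float = 0.8) -> Optional[Extraction]:
    """Try the cheap extractor first; call the heavier one only if it struggles.

    Returns None when neither produces anything, so the UI can fall back to
    manual Morse input.
    """
    first = fast(image)
    if first is not None and first.confidence >= min_confidence:
        return first
    second = assisted(image)
    candidates = [c for c in (first, second) if c is not None]
    if not candidates:
        return None
    # Whatever comes back is still a draft: the user reviews it before decoding.
    return max(candidates, key=lambda c: c.confidence)
```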

That manual correction step is not just a fallback. It is part of the design.

If the tool jumps straight from image to final text, the user has no way to know whether an error came from image cleanup, mark detection, spacing, or Morse translation. Showing the intermediate Morse layer gives the user something concrete to repair.
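One way to make that intermediate layer repairable is to decode group by group and report exactly which groups failed, instead of only emitting "?" in the output. A minimal sketch, assuming spaces between letters and "/" between words, with a letters-only table to keep it short:

```python
# International Morse code for A-Z, in alphabetical order.
CODES = (".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- "
         ".--. --.- .-. ... - ..- ...- .-- -..- -.-- --..").split()
MORSE_TO_CHAR = dict(zip(CODES, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))


def decode_with_trace(morse: str):
    """Decode letter by letter, recording every group that has no mapping.

    Returns (text, problems), where problems lists (word_index, group_index, group)
    so a UI can highlight the exact spot in the Morse layer.
    """
    words, problems = [], []
    for wi, word in enumerate(morse.split("/")):
        letters = []
        for gi, group in enumerate(word.split()):
            char = MORSE_TO_CHAR.get(group)
            if char is None:
                problems.append((wi, gi, group))
                char = "?"
            letters.append(char)
        words.append("".join(letters))
    return " ".join(words), problems


print(decode_with_trace("... ---- ..."))  # ('S?S', [(0, 1, '----')])
```

This only catches groups that map to nothing. A wrong-but-valid group (say `..` misread as `.-`) still decodes silently, which is why the Morse layer itself needs to stay visible.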

You can try the current image workflow in Morse Coder.

The open question

The part I am still unsure about is how much the tool should guess.

For puzzle users, a good guess may be enough. For historical material, learning material, or classroom use, inspecting the Morse layer may matter more.

Right now I am leaning toward a hybrid: automate what is safe, expose the intermediate result, and make correction easy.

If you have any suggestions for this problem, I’d love to hear them.