DEV Community

Cover image for From Image to Text in Seconds — Tesseract OCR in a Docker Container
Chandrani Mukherjee
Chandrani Mukherjee

Posted on

From Image to Text in Seconds — Tesseract OCR in a Docker Container

If you’ve ever needed OCR (Optical Character Recognition) in your projects, you’ve probably come across Tesseract — the open-source OCR engine by Google.

Running Tesseract locally is great, but what if you want it in a self-contained, portable environment?

That’s where Docker comes in.

In this tutorial, we’ll containerize Tesseract so you can run OCR anywhere — no OS dependencies, no installation headaches.


📦 Why Use Docker for Tesseract?

  • No local setup — all dependencies in one container
  • Cross-platform — run the same setup on Windows, macOS, or Linux
  • Scalable — perfect for batch OCR jobs in the cloud

🛠 Step 1: Install Docker

Make sure Docker is installed and running.

You can check with:

docker --version
Enter fullscreen mode Exit fullscreen mode

If not installed, download it here: https://www.docker.com/get-started


🐳 Step 2: Create a Dockerfile for Tesseract

Create a Dockerfile:

FROM ubuntu:22.04

# Install dependencies
RUN apt-get update && apt-get install -y     tesseract-ocr     tesseract-ocr-eng     libtesseract-dev     && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /data

# Default command
CMD ["tesseract", "--version"]
Enter fullscreen mode Exit fullscreen mode

📂 Step 3: Build the Docker Image

From your project folder:

docker build -t tesseract-ocr .
Enter fullscreen mode Exit fullscreen mode

🖼 Step 4: Run Tesseract OCR in the Container

To OCR an image:

docker run --rm -v $(pwd):/data tesseract-ocr input.png output
Enter fullscreen mode Exit fullscreen mode

This will:

  • Mount your current folder to /data inside the container
  • Read input.png
  • Save OCR text as output.txt

🌍 Adding More Languages

Want to OCR in multiple languages?

Modify the Dockerfile to install extra language packs:

RUN apt-get install -y tesseract-ocr-fra tesseract-ocr-spa
Enter fullscreen mode Exit fullscreen mode

🚀 Pro Tips

  • For faster processing, mount a GPU-enabled container with OCR libraries.
  • Combine this with Python’s pytesseract inside the container for AI workflows.
  • Use Docker Compose if integrating into larger ML pipelines.

✅ Conclusion

With Docker, Tesseract becomes portable, consistent, and production-ready.

No matter where you run it — local machine, cloud VM, or CI/CD pipeline — your OCR setup will be the same every time.

Top comments (0)