Why Your Code Breaks in Production (and How Docker Fixes It)

1. Why This Matters

You write your code.
You test it locally.
Everything works perfectly.

Then it goes to production… and breaks.

You spend hours debugging, only to realize:
nothing is wrong with your code — the environment is the problem.

In data engineering, this happens all the time:

  • A Spark job runs locally but fails in production
  • Airflow works on Ubuntu but breaks on macOS
  • Kafka pipelines behave differently across environments

At its core, the issue is simple:

Your environment is not consistent.

Containerization solves this by packaging everything your application needs into a single, portable unit that runs the same way anywhere.


2. Core Concept — What is Containerization?

Let’s simplify it with an analogy.

Analogy: A Fully Equipped House

Imagine being placed in an empty field with nothing around you.

No food.
No water.
No electricity.
No shelter.

You might survive for a while, but functioning properly would be difficult.

Now imagine being placed inside a fully equipped house.

Everything you need is already there:

  • food
  • water
  • electricity
  • furniture
  • internet
  • a bed

No matter where that house is moved, you can still live comfortably because your essentials move with you.

Applications work the same way.

An application needs certain things to function:

  • libraries
  • runtime versions
  • system tools
  • environment variables
  • dependencies

Without them, the application breaks.

Containerization solves this problem by packaging the application together with everything it needs to run.

Think of a container as:

a fully equipped house for your application.

(Diagram: a Docker container as a fully equipped house, carrying everything the application needs to run consistently across environments.)

Inside the container, the app already has:

  • its dependencies
  • configurations
  • runtime environment
  • required tools

So whether the container runs on:

  • your laptop
  • a cloud server
  • a teammate’s machine

…the application still behaves the same way.

The Mental Model

Containerization gives your application its own portable environment with everything it needs to survive and run consistently.


3. Docker Basics

Key Components

  • Image - A blueprint/template
  • Container - A running instance of that image
  • Dockerfile - Instructions to build the image

(Diagram: the relationship between a Dockerfile, a Docker image, and a running Docker container.)

Let’s Make It Real

Here’s the smallest possible Docker setup for a Python app.

app.py

print("Hello from Docker!")

Dockerfile

# Start from a small official Python image
FROM python:3.10-slim

# Work inside /app in the container
WORKDIR /app

# Copy the script into the image
COPY app.py .

# Run it when the container starts
CMD ["python", "app.py"]

Build and Run

docker build -t my-python-app .
docker run my-python-app
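If everything is set up correctly, the run command should print:

Hello from Docker!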

Notice what we didn’t do:

  • Install Python manually
  • Manage versions
  • Configure anything

The environment is fully defined in the Dockerfile.


4. Why Docker is Useful in Data Engineering

In real-world data systems, you work with tools like:

  • Apache Airflow
  • Spark / PySpark
  • PostgreSQL or another data warehouse
  • Reporting tools or dashboards

Each of these has:

  • Different dependencies
  • Different configurations
  • Different runtime requirements
  • Different ports
  • Different environment variables

Without Docker, they often conflict.

For example:

  • Airflow may require specific Python packages
  • PySpark may need Java and Spark installed
  • PostgreSQL may need database credentials and storage
  • Dashboard tools may need access to the processed data

With Docker:

each tool runs in its own isolated environment — no conflicts, no surprises.

This is especially useful in batch data pipelines because the entire workflow can be reproduced across different machines and environments.
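As a rough sketch of that isolation (the image names and tags here are illustrative, not a recommendation), you can run PostgreSQL and a PySpark shell side by side without installing either on your machine:

# PostgreSQL runs from its own image, with its dependencies baked in
docker run -d --name pg -e POSTGRES_PASSWORD=secret postgres:16

# PySpark runs from a separate image that already bundles Java and Spark
docker run --rm -it apache/spark:3.5.1 /opt/spark/bin/pyspark

Each container carries its own runtime, so a Java upgrade for Spark never touches the database, and vice versa.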


5. Docker Compose — Managing Multiple Containers

Real systems are never just one container.

A Dockerized data engineering pipeline may include:

  • An Airflow webserver
  • An Airflow scheduler
  • A PostgreSQL database
  • A Spark / PySpark processing service
  • Shared folders for DAGs, logs, scripts, and data

Running each service manually quickly becomes painful.


Docker vs Docker Compose

  • Docker - runs one container
  • Docker Compose - runs an entire system made up of multiple containers

The Key Insight

Without Docker Compose:

  • Multiple terminals
  • Manual startup order
  • Constant configuration issues
  • Harder networking between services

With Docker Compose:

one command starts everything.

(Diagram: Docker Compose starting and managing multiple services from a single docker-compose.yml file.)
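That one command is:

docker compose up -d   # starts every service defined in docker-compose.yml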


Example: Multi-Service Setup

A simplified Docker Compose setup for a batch pipeline may include Airflow and PostgreSQL.

docker-compose.yml

services:
  airflow-webserver:
    image: apache/airflow:3.2.1
    container_name: airflow_webserver
    command: airflow webserver
    ports:
      - "8080:8080"
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./jobs:/opt/airflow/jobs
    depends_on:
      - postgres

  airflow-scheduler:
    image: apache/airflow:3.2.1
    container_name: airflow_scheduler
    command: airflow scheduler
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./jobs:/opt/airflow/jobs
    depends_on:
      - postgres

  postgres:
    image: postgres:16
    container_name: postgres_db
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    ports:
      - "5433:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
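With this file saved as docker-compose.yml, the whole stack is managed with a few commands:

docker compose up -d                        # start all services in the background
docker compose ps                           # see what is running
docker compose logs -f airflow-webserver    # follow one service's logs
docker compose down                         # stop and remove the containers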

6. Common Mistakes

  • Using localhost inside containers

This trips up almost everyone at first.

Inside a container:

localhost refers to the container itself, not your machine.
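With the Compose file above, for example, other containers reach the database through the service name postgres, not localhost (this assumes a psql client is available where you run it):

# Wrong inside a container: "localhost" is the container itself
psql -h localhost -p 5432 -U airflow airflow

# Right: Docker's internal DNS resolves the Compose service name
psql -h postgres -p 5432 -U airflow airflow

This is also why the connection string in the Compose file points at postgres:5432 rather than localhost:5432.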

  • Forgetting environment variables

Missing configs often cause silent failures.
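Making the required settings explicit, inline or from a file, avoids this (the .env file name is just a convention):

# Pass variables explicitly...
docker run -e POSTGRES_USER=airflow -e POSTGRES_PASSWORD=airflow postgres:16

# ...or load them from a file
docker run --env-file .env postgres:16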

  • Not persisting data

Containers are ephemeral. Without volumes, your data disappears when a container is removed or recreated.

  volumes:
    - postgres_data:/var/lib/postgresql/data
  • Rebuilding unnecessarily

Poor Dockerfile structure can slow builds significantly.
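A common fix is ordering the Dockerfile so slow, rarely-changing steps come first and stay cached. A sketch for a Python app (a requirements.txt file is assumed):

FROM python:3.10-slim

WORKDIR /app

# Dependencies change rarely: install them first so this layer stays cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often: copy it last so edits only rebuild this layer
COPY . .

CMD ["python", "app.py"]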


7. Best Practices

  • Use lightweight images
  FROM python:3.10-slim
  • Add a .dockerignore
  node_modules
  .git
  .env
  • Avoid the latest tag in production

Use fixed versions to keep builds predictable.
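For example, in a Compose file:

# Predictable: pinned to a known version
image: postgres:16

# Unpredictable: "latest" can change between pulls
image: postgres:latest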

  • Separate dev and production setups

They have different requirements.
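Docker Compose supports this by layering files: the base file stays shared, and each environment adds its own overrides (the prod file name below is just an example):

# Local development: if present, docker-compose.override.yml is applied automatically
docker compose up -d

# Production-style run: layer a prod-specific file over the base one
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d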

  • Use Docker Compose for local development

It helps simulate real systems easily.

  • Use clear service names

Examples:

  • kafka
  • postgres
  • airflow

This simplifies networking and debugging.


8. Conclusion

Containerization changes how you think about environments.

  • Docker packages your application into a portable unit.
  • Docker Compose runs entire systems with one command.
  • Your pipelines become reproducible and consistent.

The real shift is this:

You stop debugging environments — and start defining them as code.

And once you reach that point:

You’re no longer just writing code — you’re building systems.
