1. Why This Matters
You write your code.
You test it locally.
Everything works perfectly.
Then it goes to production… and breaks.
You spend hours debugging, only to realize:
nothing is wrong with your code — the environment is the problem.
In data engineering, this happens all the time:
- A Spark job runs locally but fails in production
- Airflow works on Ubuntu but breaks on macOS
- Kafka pipelines behave differently across environments
At its core, the issue is simple:
Your environment is not consistent.
Containerization solves this by packaging everything your application needs into a single, portable unit that runs the same way anywhere.
2. Core Concept — What is Containerization?
Let’s simplify it with an analogy.
Analogy: A Fully Equipped House
Imagine being placed in an empty field with nothing around you.
No food.
No water.
No electricity.
No shelter.
You might survive for a while, but functioning properly would be difficult.
Now imagine being placed inside a fully equipped house.
Everything you need is already there:
- food
- water
- electricity
- furniture
- internet
- a bed
No matter where that house is moved, you can still live comfortably because your essentials move with you.
Applications work the same way.
An application needs certain things to function:
- libraries
- runtime versions
- system tools
- environment variables
- dependencies
Without them, the application breaks.
Containerization solves this problem by packaging the application together with everything it needs to run.
Think of a container as:
a fully equipped house for your application.
Inside the container, the app already has:
- its dependencies
- configurations
- runtime environment
- required tools
So whether the container runs on:
- your laptop
- a cloud server
- a teammate’s machine
…the application still behaves the same way.
The Mental Model
Containerization gives your application its own portable environment with everything it needs to survive and run consistently.
3. Docker Basics
Key Components
- Image - A blueprint/template
- Container - A running instance of that image
- Dockerfile - Instructions to build the image
Let’s Make It Real
Here’s the smallest possible Docker setup for a Python app.
app.py
print("Hello from Docker!")
Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY app.py .
CMD ["python", "app.py"]
Build and Run
docker build -t my-python-app .
docker run my-python-app
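If the build succeeds, the run command prints Hello from Docker! and the container exits once the script finishes.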
Notice what we didn’t do:
- Install Python manually
- Manage versions
- Configure anything
The environment is fully defined in the Dockerfile.
4. Why Docker is Useful in Data Engineering
In real-world data systems, you work with tools like:
- Apache Airflow
- Spark / PySpark
- PostgreSQL or another data warehouse
- Reporting tools or dashboards
Each of these has:
- Different dependencies
- Different configurations
- Different runtime requirements
- Different ports
- Different environment variables
Without Docker, they often conflict.
For example:
- Airflow may require specific Python packages
- PySpark may need Java and Spark installed
- PostgreSQL may need database credentials and storage
- Dashboard tools may need access to the processed data
With Docker:
each tool runs in its own isolated environment — no conflicts, no surprises.
This is especially useful in batch data pipelines because the entire workflow can be reproduced across different machines and environments.
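As a tiny illustration of that isolation, the two throwaway containers below use different Python base images; each one sees only its own interpreter and packages, no matter what is installed on the host:

docker run --rm python:3.10-slim python --version
docker run --rm python:3.12-slim python --version

The same idea scales up to Airflow, Spark, and PostgreSQL images, each carrying its own dependencies.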
5. Docker Compose — Managing Multiple Containers
Real systems are rarely just one container.
A Dockerized data engineering pipeline may include:
- An Airflow webserver
- An Airflow scheduler
- A PostgreSQL database
- A Spark / PySpark processing service
- Shared folders for DAGs, logs, scripts, and data
Running each service manually quickly becomes painful.
Docker vs Docker Compose
- Docker - runs and manages individual containers
- Docker Compose - defines and runs an entire system made up of multiple containers from one YAML file
The Key Insight
Without Docker Compose:
- Multiple terminals
- Manual startup order
- Constant configuration issues
- Harder networking between services
With Docker Compose:
one command starts everything.
Example: Multi-Service Setup
A simplified Docker Compose setup for a batch pipeline may include Airflow and PostgreSQL.
docker-compose.yml
services:
  airflow-webserver:
    image: apache/airflow:3.2.1
    container_name: airflow_webserver
    command: airflow webserver
    ports:
      - "8080:8080"
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./jobs:/opt/airflow/jobs
    depends_on:
      - postgres

  airflow-scheduler:
    image: apache/airflow:3.2.1
    container_name: airflow_scheduler
    command: airflow scheduler
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./jobs:/opt/airflow/jobs
    depends_on:
      - postgres

  postgres:
    image: postgres:16
    container_name: postgres_db
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    ports:
      - "5433:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
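With the file saved as docker-compose.yml, a few standard commands manage the whole stack (shown here with the Compose V2 docker compose syntax; older installs use the docker-compose binary instead):

docker compose up -d       # start all services in the background
docker compose ps          # check what is running
docker compose logs -f airflow-scheduler    # follow one service's logs
docker compose down        # stop and remove the containers (named volumes are kept)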
8. Common Mistakes
- Using localhost inside containers
This trips up almost everyone at first.
Inside a container, localhost refers to the container itself, not your machine.
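For example, reusing the connection string from the Compose setup above, the database must be addressed by its service name when connecting from inside another container:

# Wrong inside a container: localhost is the container itself, and Postgres is not there
postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

# Right: "postgres" is the Compose service name, resolved by Docker's internal DNS
postgresql+psycopg2://airflow:airflow@postgres:5432/airflow

From the host machine it is the other way around: you would connect to localhost:5433 because of the "5433:5432" port mapping.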
- Forgetting environment variables
Missing configs often cause silent failures.
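One common pattern, sketched below with illustrative values, is to keep settings in a .env file and point the service at it with env_file:

.env
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow

docker-compose.yml (excerpt)
services:
  postgres:
    image: postgres:16
    env_file:
      - .env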
- Not persisting data
Containers are temporary. Without volumes, your data disappears.
volumes:
  - postgres_data:/var/lib/postgresql/data
- Rebuilding unnecessarily
Poor Dockerfile structure can slow builds significantly.
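A common fix, sketched here with a hypothetical requirements.txt, is to install dependencies before copying the rest of the code, so Docker's layer cache only re-runs pip install when the dependency list actually changes:

FROM python:3.10-slim
WORKDIR /app

# Dependencies change rarely, so this layer is cached across most builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Code changes only invalidate the layers from here down
COPY . .
CMD ["python", "app.py"]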
9. Best Practices
- Use lightweight images
FROM python:3.10-slim
- Add a .dockerignore
It keeps files like these out of the build context:
node_modules
.git
.env
- Avoid latest in production
Use fixed versions to keep builds predictable.
- Separate dev and production setups
They have different requirements.
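Compose supports this directly by combining multiple files; a common pattern, using a hypothetical docker-compose.prod.yml, looks like this:

# Local development: docker-compose.override.yml (if present) is applied automatically
docker compose up -d

# Production-style run: combine the base file with an explicit production override
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d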
- Use Docker Compose for local development
It helps simulate real systems easily.
- Use clear service names
Examples:
kafka, postgres, airflow
This simplifies networking and debugging.
10. Conclusion
Containerization changes how you think about environments.
- Docker packages your application into a portable unit.
- Docker Compose runs entire systems with one command.
- Your pipelines become reproducible and consistent.
The real shift is this:
You stop debugging environments — and start defining them as code.
And once you reach that point:
You’re no longer just writing code — you’re building systems.


