DEV Community 👩‍💻👨‍💻

Jakub T

How to develop Apache Airflow DAGs in Docker Compose

How to run a development environment on docker-compose

A quick overview of how to run Apache Airflow for development and tests on your local machine using docker-compose.

We will still use the unofficial puckel/docker-airflow image. There is already an official Docker image, but I haven't tested it yet.

Requirements

Project structure

  • docker-compose.yml - configuration file for docker-compose
  • dags - will contain all our DAGs
  • lib - will contain all our custom code
  • test - will contain our pytest tests
  • .env - file with the environment variables that we want to pass to the containers
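
The layout above can be created by hand, or with a few lines of Python. This is only a sketch: the `scaffold` helper is hypothetical, and it assumes you also want the `plugins` directory and the `requirements.txt` file that docker-compose.yml mounts into the container.

```python
from pathlib import Path

# Names taken from the project structure list; "plugins" and
# "requirements.txt" are assumptions based on the volumes that
# docker-compose.yml mounts.
DIRECTORIES = ("dags", "lib", "test", "plugins")
FILES = (".env", "requirements.txt")


def scaffold(root):
    """Create the project skeleton under `root` without overwriting anything."""
    root = Path(root)
    for name in DIRECTORIES:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() leaves the content of an existing file untouched
        (root / name).touch(exist_ok=True)
    return sorted(path.name for path in root.iterdir())


# Usage (hypothetical project directory):
# scaffold("airflow-on-docker-compose")
```

Running it twice is harmless, because existing directories and files are left as they are.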

The environment variables are very handy because they allow you to customize almost everything in Airflow (https://airflow.apache.org/docs/stable/best-practices.html?highlight=environment#configuration)
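
Airflow reads configuration overrides from variables named AIRFLOW__{SECTION}__{KEY} (note the double underscores). The sketch below only illustrates that naming convention. The helper names are made up for this example and are not part of Airflow.

```python
def airflow_env_name(section, key):
    """Build the environment variable name for an airflow.cfg option."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


def parse_airflow_env(environ):
    """Collect AIRFLOW__SECTION__KEY variables into {(section, key): value}."""
    prefix = "AIRFLOW__"
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Split on the first double underscore after the prefix
        section, separator, key = name[len(prefix):].partition("__")
        if separator and section and key:
            overrides[(section.lower(), key.lower())] = value
    return overrides


# Usage: inspect what the current process would hand to Airflow
# import os
# print(parse_airflow_env(os.environ))
```

For example, a line such as AIRFLOW__CORE__LOAD_EXAMPLES=False in the .env file would override load_examples in the [core] section.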

docker-compose.yml

The basic structure:

version: '2.1'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
    webserver:
        image: puckel/docker-airflow:1.10.9
        restart: always
        mem_limit: 2048m
        depends_on:
            - postgres
        env_file:
            - .env
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        volumes:
            - ./dags:/usr/local/airflow/dags
            - ./test:/usr/local/airflow/test
            # Custom plugins (optional)
            - ./plugins:/usr/local/airflow/plugins
            # Extra pip packages, installed when the container starts
            - ./requirements.txt:/requirements.txt
            # Share local AWS credentials with the container
            - ~/.aws:/usr/local/airflow/.aws
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3

As you can see, several things are going on there:

  • we pass custom environment variables straight from the .env file (best practice is not to commit this file to the repository)
  • we use a Postgres instance running in another Docker container
  • we share our dags, test and plugins directories with the host, so we can edit the code on our local machine and run all the tests in the container

Dummy DAG

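Let's create our first DAG in dags/dummy_dag.py:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# The dag_id must match the one the test looks up later ('dummy_dag').
with DAG('dummy_dag', start_date=datetime(2016, 1, 1)) as dag:
    # A single no-op task, just enough to verify that the DAG loads.
    op = DummyOperator(task_id='op')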

Running the environment

$ docker-compose up

Starting airflow-on-docker-compose_postgres_1 ... done
Starting airflow-on-docker-compose_webserver_1 ... done
Attaching to airflow-on-docker-compose_postgres_1, airflow-on-docker-compose_webserver_1
[...]
webserver_1  | __init__.py:51}} INFO - Using executor [2020-05-05 10:19:08,741] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1  | LocalExecutor
webserver_1  | [2020-05-05 10:19:08,743] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags

Let's open the Airflow web UI at http://localhost:8080
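
The webserver may need a moment before it answers after docker-compose up. If you want to script that wait, for example before running checks against the UI, a polling loop like the one below can help. It is a minimal sketch using only the standard library. The function names, the number of attempts, and the delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if an HTTP GET to `url` gets any non-5xx answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx)
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, ...
        return False


def wait_until(probe, attempts=30, delay=2.0):
    """Call `probe` until it returns True or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


# Usage: block until the Airflow UI from docker-compose.yml responds
# wait_until(lambda: is_up("http://localhost:8080"))
```

Keeping the probe separate from the retry loop makes the loop easy to reuse for other checks, such as waiting for Postgres.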

[Image: Airflow instance on docker-compose]

Running the tests in the environment

In order to run the tests in the environment we can just run:

docker-compose run webserver bash

This gives us a bash shell inside the container:

➜  airflow-on-docker-compose git:(master) ✗ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
WARNING: You are using pip version 20.0.2; however, version 20.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
airflow@be3e69366e23:~$ ls
airflow.cfg  dags  plugins  test
airflow@be3e69366e23:~$ pytest test
bash: pytest: command not found

Of course, we haven't installed pytest yet. This is easy to fix, because the container installs everything listed in requirements.txt when it starts:

$ echo "pytest" >> requirements.txt
$ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
Collecting pytest
  Downloading pytest-5.4.1-py3-none-any.whl (246 kB)
     |████████████████████████████████| 246 kB 222 kB/s
Collecting more-itertools>=4.0.0
  Downloading more_itertools-8.2.0-py3-none-any.whl (43 kB)
     |████████████████████████████████| 43 kB 3.1 MB/s
Collecting wcwidth
  Downloading wcwidth-0.1.9-py2.py3-none-any.whl (19 kB)
Requirement already satisfied: importlib-metadata>=0.12; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (1.5.0)
Collecting packaging
  Downloading packaging-20.3-py2.py3-none-any.whl (37 kB)
Collecting pluggy<1.0,>=0.12
  Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Collecting py>=1.5.0
  Downloading py-1.8.1-py2.py3-none-any.whl (83 kB)
     |████████████████████████████████| 83 kB 956 kB/s
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (19.3.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->pytest->-r /requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from packaging->pytest->-r /requirements.txt (line 1)) (1.14.0)
Collecting pyparsing>=2.0.2
  Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
     |████████████████████████████████| 67 kB 624 kB/s
Installing collected packages: more-itertools, wcwidth, pyparsing, packaging, pluggy, py, pytest
  WARNING: The scripts py.test and pytest are installed in '/usr/local/airflow/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed more-itertools-8.2.0 packaging-20.3 pluggy-0.13.1 py-1.8.1 pyparsing-2.4.7 pytest-5.4.1 wcwidth-0.1.9

We can implement our first basic test, taken directly from the Airflow best-practices documentation (https://github.com/apache/airflow/blob/master/docs/best-practices.rst), and save it as test/test_dag_loading.py:

from airflow.models import DagBag

def test_dag_loading():
    dagbag = DagBag()
    dag = dagbag.get_dag(dag_id='dummy_dag')
    assert dagbag.import_errors == {}
    assert dag is not None
    assert len(dag.tasks) == 1
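
The loading test above needs a working Airflow installation, which is why we run it inside the container. As a complementary and much cheaper check, the sketch below only verifies that every file in the dags directory is valid Python. It is not taken from the Airflow documentation. The `find_syntax_errors` helper is hypothetical, and the "dags" path assumes pytest is started from the directory that contains it.

```python
from pathlib import Path


def find_syntax_errors(dag_dir):
    """Return {file path: message} for every .py file that fails to compile."""
    errors = {}
    for path in sorted(Path(dag_dir).rglob("*.py")):
        try:
            # compile() parses the file without executing it
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except SyntaxError as exc:
            errors[str(path)] = "line {}: {}".format(exc.lineno, exc.msg)
    return errors


def test_dag_files_compile():
    # Assumes the current working directory contains the dags folder
    assert find_syntax_errors("dags") == {}
```

This catches typos without importing Airflow at all, but it says nothing about whether the DAG itself is well formed, so it does not replace the DagBag test.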

And now we can run our tests. pip installed pytest into .local/bin, which is not on PATH, so we call it by its path:

airflow@a6ca8c1b706d:~$ .local/bin/pytest
========================================================================== test session starts ==========================================================================
platform linux -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /usr/local/airflow
plugins: celery-4.4.0
collected 1 item

test/test_dag_loading.py .

===================================================================== 1 passed in 0.83s =====================================================================

All the code can be found here: https://github.com/troszok/airflow-on-docker-compose
