Introduction
As a data engineer, you may have recently learned about Apache Airflow, what it is, and how it orchestrates and automates data workflows. The next step is gaining hands-on experience by setting it up in your own environment.
This article provides a step-by-step guide to installing and configuring Apache Airflow, connecting it to PostgreSQL, and running your first DAG. By the end, you will have a fully functional Airflow environment ready for building and managing data pipelines.
We will be following the installation guide from the official airflow documentation: apache-airflow
Methodology
Prerequisites
A Linux environment (e.g., a Linux virtual private server)
Successful installation and running require:
python3 (supported versions: 3.10, 3.11, 3.12, 3.13, 3.14)
Python virtual environment
psycopg2-binary
Pandas
On your Linux server, install python3 using the command:
sudo apt install python-is-python3
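Depending on your distribution, python3 may already be present. If it is missing, or if creating a virtual environment fails later, you may also need the venv module and pip; the package names below assume Debian/Ubuntu:
sudo apt update
sudo apt install python3 python3-venv python3-pip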
Airflow requires a home directory and uses ~/airflow by default, but you can set a different location if you prefer. Setting the AIRFLOW_HOME environment variable should be done before installing Airflow so that the installation process knows where to store the necessary files.
export AIRFLOW_HOME=~/airflow
Navigate to your home directory using the command cd ~, create a folder named airflow, and navigate into it.
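For reference, the commands for this step are:
cd ~
mkdir airflow
cd airflow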

Inside the airflow directory, create a virtual environment named airflow_venv
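Assuming the venv module is available, the environment is created with:
python3 -m venv airflow_venv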

Activate the virtual environment you have created
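The activation command is:
source airflow_venv/bin/activate
After activation, your shell prompt should show (airflow_venv).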

Install Airflow using the constraints file for the correct Python version (the constraints-3.12.txt part of the URL should match your installed Python version). Make sure to specify the Airflow version you want to install; in my case I have installed Airflow version 3.2.0.
command:
pip install "apache-airflow[celery]==3.2.0" --constraint https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.12.txt
The installation will start; give it a few moments to complete.

After a few seconds, the installation completes.

Verify the installation by running the command airflow version to check the installed version.

Start Airflow
After successful installation, there are two ways you can start airflow.
1. Using Airflow Standalone
By running the command airflow standalone, a database is initialized, an admin user is created, and all components are started.
Run the command airflow standalone and Airflow will start all of its components.

Notice that after running the command, raw real-time logs are streamed to the terminal, which prevents us from using it for anything else.
If you exit this session, Airflow will stop.

To avoid this, use the command:
nohup airflow standalone > airflow.log 2>&1 &
This command starts Airflow in the background and silently writes all logs to a file named airflow.log, leaving your terminal free.
After using the nohup command, confirm that airflow is running by using the command ps aux | grep 'airflow'

Then open <your server ip>:8080 (Eg: 102.209.32.65:8080) in your web browser and the Airflow web UI will be available.

2. Run individual parts manually
If you want to run the individual parts of Airflow manually rather than using the all-in-one standalone command, you can instead run the following:
airflow db migrate
airflow users create \
--username admin \
--firstname Peter \
--lastname Parker \
--role Admin \
--email spiderman@superhero.org
airflow api-server --port 8080
airflow scheduler
airflow dag-processor
airflow triggerer
Note: In Airflow version 3 and above, the above user-management commands will not work, as they are only available when the Flask AppBuilder (FAB) auth manager is enabled. This is different from Airflow versions below 3.0, which did not have this requirement.
To enable Flask AppBuilder (FAB) auth manager, open the file airflow.cfg using the command nano /root/airflow/airflow.cfg and add the following
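The screenshot is not reproduced here; based on the FAB provider documentation, the entry to add under the [core] section looks like this:
[core]
auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager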

Then start Airflow.
You will get an error ModuleNotFoundError: No module named 'airflow.providers.fab'. This happens because the FAB provider package has not been installed yet.

Install the module using pip install apache-airflow-providers-fab


Run airflow db migrate again and the error is resolved

Start the services using the following commands (nohup is used so that the services run in the background and persist after you close the terminal, while storing their logs in the defined log files):
nohup airflow api-server --port 8080 > api-server.log 2>&1 &
nohup airflow scheduler > scheduler.log 2>&1 &
nohup airflow dag-processor > dag-processor.log 2>&1 &
nohup airflow triggerer > triggerer.log 2>&1 &
Airflow is now accessible via the browser UI.

Configurations
Having started Airflow successfully and confirmed it is accessible via the web UI, there are some configuration changes we need to make.
The configuration changes reside in the location ~/airflow, and the file we will be editing is airflow.cfg.
Before making configuration changes, stop Airflow. I started Airflow using the command nohup airflow standalone > airflow.log 2>&1 &, so to stop it I will use the command pkill -9 airflow

Use nano airflow.cfg to open the file for editing

We change the location in which our DAGs will be stored from /root/airflow/dags to /root/workflows.
Note: This is optional
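In airflow.cfg the relevant setting is dags_folder under the [core] section; after the change it reads:
[core]
dags_folder = /root/workflows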

Set airflow timezone to your local timezone
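The setting is default_timezone under [core]; the value below is only an example, substitute your own timezone:
[core]
default_timezone = Africa/Nairobi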

Verify that your executor is set to LocalExecutor if you are running Airflow locally and want to run more than one task in parallel.
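The corresponding line in airflow.cfg is:
[core]
executor = LocalExecutor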

If you want to connect Airflow to an external database, set the connection to your database. I have a PostgreSQL database running on the same server as Airflow, so I set Airflow to use that database.
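The connection string lives under the [database] section of airflow.cfg. The database name, user and password below are placeholders for illustration; the database and user must already exist in PostgreSQL before Airflow can use them:
[database]
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db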

Change load_examples to False. This prevents Airflow from loading the example DAGs it ships with.
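The line in airflow.cfg becomes:
[core]
load_examples = False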

Save the file and exit using Ctrl + S and Ctrl + X.
Airflow needs the PostgreSQL driver psycopg2. Inside the airflow virtual environment, install the pre-built package using the command pip install psycopg2-binary

Airflow also uses a module named asyncpg; install it using the command pip install asyncpg

By default, Airflow uses its own SQLite database (airflow.db), but remember, we edited the config file to tell Airflow to use our PostgreSQL database. While the airflow virtual environment is active, we run the command airflow db migrate to initialize the new database.

To add a DAG to Airflow, navigate to the workflows directory we created and pointed Airflow to read DAGs from, and add a Python file containing a DAG.
I added a simple DAG (simple.py) inside the directory workflows

Use nano simple.py and type your DAG (Python code):
from airflow import DAG
from datetime import datetime, timedelta
from airflow.providers.standard.operators.python import PythonOperator


def say_hello():
    print("Hello from Airflow!")


def say_goodbye():
    print("Goodbye from Airflow!")


with DAG(
    dag_id='simple_dag',
    start_date=datetime(2026, 1, 1),
    schedule=timedelta(minutes=5),
    catchup=False,
) as dag:
    hello_task = PythonOperator(
        task_id='hi',
        python_callable=say_hello,
    )

    goodbye_task = PythonOperator(
        task_id='bye',
        python_callable=say_goodbye,
    )

    hello_task >> goodbye_task
Airflow automatically picks up the DAG and loads it. DAGs are found in the Dags section of the UI.
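If you prefer the terminal, you can also confirm that the DAG has been parsed by listing DAGs (with the virtual environment active):
airflow dags list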

By clicking on the DAG, we are able to see its details, including the runs and whether they are successful.

Conclusion
In this article, you have successfully installed and configured Apache Airflow 3.2.0, connected it to PostgreSQL, and explored two different ways of launching Airflow: using the simplified standalone approach and the production-style setup with manually created users and individual Airflow services. You also learned how to make essential configuration changes in the airflow.cfg file and deployed your first DAG into the Airflow environment.
With this foundation in place, you now have a working orchestration platform capable of scheduling, monitoring, and managing data workflows. As you continue learning Airflow, you can expand into more advanced topics such as task dependencies, scheduling strategies, integrations with cloud platforms, and building production-grade ETL and data engineering pipelines.




