Apache Airflow 2 vs 3: A Deep Technical Comparison for Data Engineers
TL;DR — Airflow 3 dissolves the monolithic webserver into three independent
services, strips direct database access from task code, ships a fully stable
Task SDK, and rewrites the entire UI in React. If you are running Airflow 2 in
production, this article will tell you exactly what breaks, what improves, and
how to migrate without losing a night's sleep.
Why This Comparison Matters
Every major Airflow release has nudged the architecture forward. Airflow 2 gave us
the TaskFlow API, the Scheduler high-availability refactor, and provider packages.
Airflow 3 is different in kind, not just degree.
In the process of migrating a production Docker Compose stack for a healthcare ML
retraining pipeline from Airflow 2 patterns to Airflow 3, every single one of the
following hit in production:
- CPU spike to 600% caused by a silent breaking change in JWT key management
- Tasks silently failing with Connection refused because localhost no longer means what it used to
- A healthcheck that always reported unhealthy because port 8974 no longer exists
- A user creation step that silently did nothing because FAB is gone
Each of these failures traces back to a deliberate, principled architectural
decision in Airflow 3. Once you understand why the changes were made, the fixes
are obvious — but without that context, Airflow 3 can feel like it is actively
working against you.
This article is that context.
The 30-Second Summary
| Dimension | Airflow 2 | Airflow 3 |
|---|---|---|
| UI framework | Flask-AppBuilder (FAB) | React (FastAPI backend) |
| Webserver | `airflow webserver` | `airflow api-server` |
| DAG Processor | Embedded in scheduler | Mandatory separate service |
| Task Execution | Direct fork/subprocess | Task Execution API (AIP-72) |
| Metadata DB access from tasks | Allowed | Prohibited |
| Auth manager default | FAB (full RBAC) | SimpleAuthManager |
| REST API | v1 (Flask) | v2 (FastAPI, stable) |
| Default schedule | `@daily` (cron) | `None` |
| `catchup` default | `True` | `False` |
| SequentialExecutor | Available | Removed |
| SubDAGs | Available | Removed |
| SLAs | Available | Removed |
| Import path for `@dag`/`@task` | `airflow.decorators` | `airflow.sdk` |
| XCom pickling | Enabled by default | Disabled by default |
| Python minimum | 3.8 | 3.9 |
| PostgreSQL minimum | 12 | 13 |
Part 1 — The Architectural Paradigm Shift
Airflow 2: One Webserver to Rule Them All
In Airflow 2, the mental model for a self-hosted deployment is relatively
straightforward. You run four processes:
airflow webserver # Flask-AppBuilder UI + REST API v1 + auth
airflow scheduler # parses DAGs + triggers task instances
airflow worker # (CeleryExecutor) executes tasks
postgres/mysql # metadata database
The webserver does double duty: it serves the browser UI, exposes the REST
API, and handles authentication, all from a single Flask application. The
scheduler parses your dags/ directory inline, as part of its own main loop.
This is simple to reason about. It is also a single point of failure for three
completely separate concerns.
Airflow 3: Separation of Concerns as a First-Class Constraint
Airflow 3 decomposes the monolith into discrete, independently scalable services:
airflow api-server # FastAPI: UI + REST API v2 + auth (replaces webserver)
airflow scheduler # triggers task instances only; NO DAG parsing
airflow dag-processor # mandatory: parses DAGs, writes to serialized_dag table
airflow triggerer # manages deferrable operators
postgres/mysql # metadata database
The key insight: the scheduler in Airflow 3 does not parse DAGs. It reads the
serialized_dag table, which is populated exclusively by the dag-processor service.
If you start a scheduler without a dag-processor, it will start cleanly — and then
do nothing, because it has no serialized DAGs to schedule.
# Airflow 2: single scheduler did everything
[Scheduler process]
├── Parses dags/ directory
├── Updates serialized_dag table
├── Checks heartbeats
└── Triggers TaskInstances

# Airflow 3: responsibilities split
[dag-processor]                [scheduler]
├── Parses dags/               ├── Reads serialized_dag
└── Updates serialized_dag     ├── Checks heartbeats
                               └── Triggers TaskInstances via Execution API
This split unlocks horizontal scalability. The dag-processor can be scaled
independently on compute-heavy deployments with thousands of DAG files, without
adding latency to the scheduler's scheduling loop.
Part 2 — The Task Execution API (AIP-72): The Biggest Change You Haven't Heard Of
How Airflow 2 Ran Tasks
In Airflow 2 with LocalExecutor, task execution worked like this:
- Scheduler identifies a TaskInstance ready to run
- Scheduler forks a subprocess
- Subprocess imports your DAG file directly
- Subprocess calls task.execute(context)
- Task code has unrestricted access to settings.Session, DagRun, and TaskInstance models — the entire Airflow metadata database
Step 5 is a footgun. Task code could accidentally (or intentionally) query, modify,
or drop metadata. It tightly coupled your business logic to Airflow internals.
How Airflow 3 Runs Tasks
Airflow 3 introduces a Task Execution API — a lightweight HTTP interface that
sits between the task subprocess and the metadata database:
[Scheduler] ──triggers──▶ [Task Subprocess]
                               │
                               │ HTTP (JWT-authenticated)
                               ▼
                   [API Server /execution/]
                               │
                               ▼
                     [Metadata Database]
Task code no longer talks to the database. It talks to the Execution API, which
enforces a controlled, auditable surface for every metadata operation. Direct
imports like from airflow.models import DagRun inside task code will raise errors
in Airflow 3.
The JWT Problem (and Why It Caused a 600% CPU Spike)
The Execution API authenticates requests with JWT tokens. The scheduler signs each
task's token; the api-server verifies it. Both must use the same secret key.
In Airflow 3, if AIRFLOW__API_AUTH__JWT_SECRET is not explicitly set, each service
calls get_signing_key() and generates a random in-memory key. The scheduler's
random key ≠ the api-server's random key. Every task fails immediately with:
Invalid auth token: Signature verification failed
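The failure mode is easy to reproduce outside Airflow. A minimal HMAC-signing sketch (plain `hmac`/`hashlib`, a stand-in for JWT HS256 rather than Airflow's actual token code) shows why two independently generated keys can never verify each other's tokens:

```python
import hashlib
import hmac
import secrets


def sign(payload: bytes, key: bytes) -> bytes:
    # Simplified stand-in for JWT HS256 signing
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, signature: bytes, key: bytes) -> bool:
    # Recompute the signature with our own key and compare
    return hmac.compare_digest(sign(payload, key), signature)


# Each service generates its own random key when JWT_SECRET is unset
scheduler_key = secrets.token_bytes(32)
api_server_key = secrets.token_bytes(32)

token_sig = sign(b"task-instance-123", scheduler_key)

print(verify(b"task-instance-123", token_sig, scheduler_key))   # True: same key
print(verify(b"task-instance-123", token_sig, api_server_key))  # False: "Signature verification failed"
```

Sharing one static secret across every container makes both sides compute the same signature, which is exactly what the fix below does.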
The fix is one environment variable, shared across all containers:
# docker-compose.yml โ x-airflow-common environment block
AIRFLOW__API_AUTH__JWT_SECRET: "your-static-secret-change-in-prod"
The 600% CPU spike came from a related issue: the api-server, when launched with
--workers greater than 1, spawns worker processes via multiprocessing.spawn. Each
spawned process re-initialises its own random JWT key and immediately crashes when
it receives a token signed by the master process. The crash loop runs at full
speed:
[api-server] Waiting for child process [12]...
[api-server] Child process [12] died unexpectedly
[api-server] Waiting for child process [13]...
[api-server] Child process [13] died unexpectedly
Fix: enforce a single worker until this is resolved upstream.
command: api-server --workers 1
The EXECUTION_API_SERVER_URL Problem
Every scheduler container needs to know where the Execution API lives. The default
is http://localhost:8080/execution/. In a Docker Compose deployment, localhost
inside the scheduler container is the scheduler container's own loopback interface.
The api-server is a different container in a different network namespace.
# Airflow 2: localhost was fine (single process model)
# Airflow 3 Docker: localhost = wrong container
Result: every task fails with httpx.ConnectError: [Errno 111] Connection refused,
even when the api-server is perfectly healthy.
Fix:
AIRFLOW__CORE__EXECUTION_API_SERVER_URL: "http://airflow-api-server:8080/execution/"
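To confirm the scheduler can actually reach that URL, a small probe can be run from inside the scheduler container. This is a hypothetical helper, standard library only, not part of Airflow:

```python
import socket
from urllib.parse import urlparse


def probe_execution_api(url: str, timeout: float = 3.0) -> tuple:
    """Parse the Execution API URL and test a raw TCP connection to it."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers Errno 111 Connection refused, DNS failures, timeouts
        reachable = False
    return host, port, reachable


host, port, ok = probe_execution_api("http://airflow-api-server:8080/execution/")
print(f"{host}:{port} reachable={ok}")
```

If `reachable` is False for the compose service name but True for `localhost` from inside the api-server container itself, the problem is networking, not Airflow.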
Part 3 — Authentication: FAB Out, SimpleAuthManager In
Flask-AppBuilder in Airflow 2
Airflow 2 used Flask-AppBuilder (FAB) for authentication. FAB gave you:
- Full RBAC with built-in roles (Admin, Op, User, Viewer, Public)
- OAuth integrations (Google, GitHub, LDAP, etc.)
- A complete user management UI
- The _AIRFLOW_WWW_USER_CREATE environment variable for bootstrapping admin users
# Airflow 2: works as expected
_AIRFLOW_WWW_USER_CREATE: "true"
_AIRFLOW_WWW_USER_USERNAME: "admin"
_AIRFLOW_WWW_USER_PASSWORD: "admin"
_AIRFLOW_WWW_USER_ROLE: "Admin"
SimpleAuthManager in Airflow 3
Airflow 3 ships SimpleAuthManager as the default. It stores users and passwords in a plain-text JSON file:
{
  "admin": "my_secure_password"
}
FAB is not gone — it is available as an explicit provider — but it is no longer the default. The _AIRFLOW_WWW_USER_CREATE variable is silently ignored when SimpleAuthManager is active. You will see this in your init logs:
Skipping user creation as auth manager different from Fab is used
There is no warning that your carefully configured user variables did nothing.
To bootstrap a user with SimpleAuthManager in Docker Compose:
# Step 1: configure the users list and passwords file location
AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_USERS: "admin:Admin"
AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_PASSWORDS_FILE: "/opt/airflow/project/simple_auth_manager_passwords.json"
# Step 2: write the passwords file in your init container
command:
  - -c
  - |
    python3 -c "
    import json
    open('/opt/airflow/project/simple_auth_manager_passwords.json', 'w').write(
        json.dumps({'admin': 'your_password'})
    )"
    exec /entrypoint airflow version
The passwords file must be accessible to all containers — use a shared bind mount.
Choosing Between SimpleAuthManager and FAB
| Scenario | Recommendation |
|---|---|
| Local dev / CI / demos | SimpleAuthManager โ fast, zero config |
| Small team, basic username/password | SimpleAuthManager |
| Enterprise SSO (LDAP, OAuth, SAML) | FAB provider (apache-airflow-providers-fab) |
| Multi-team RBAC with fine-grained permissions | FAB provider |
| Kubernetes deployments | FAB provider or custom AuthManager implementation |
Part 4 — Breaking Changes Catalogue
4.1 SubDAGs → TaskGroups and Assets
SubDAGs are removed in Airflow 3. They were always problematic — they introduced deadlock risks with pool management, made the graph view confusing, and performed poorly at scale.
# Airflow 2 (SubDAG pattern — do not migrate this verbatim)
from airflow.operators.subdag import SubDagOperator
process_data = SubDagOperator(
    task_id="process_data",
    subdag=create_subdag(dag.dag_id, "process_data", args),
    dag=dag,
)
# Airflow 3 migration: TaskGroups for visual grouping
from airflow.sdk import dag, task
from airflow.utils.task_group import TaskGroup
@dag(schedule=None)
def my_pipeline():
    with TaskGroup("process_data") as process_data:
        @task
        def validate(): ...

        @task
        def transform(): ...

        validate() >> transform()
For cross-DAG dependencies that SubDAGs were sometimes used for, the preferred Airflow 3 pattern is Asset-based scheduling:
from airflow.sdk import Asset
raw_data = Asset("s3://my-bucket/raw/")
@dag(schedule=[raw_data]) # this DAG runs when raw_data is updated
def downstream_pipeline(): ...
4.2 SequentialExecutor Removed
SequentialExecutor (runs one task at a time, no parallelism) is gone. The replacement for local development is LocalExecutor with a PostgreSQL or SQLite backend.
# Airflow 2: SequentialExecutor was the default for fresh installs
AIRFLOW__CORE__EXECUTOR: SequentialExecutor
# Airflow 3: use LocalExecutor
AIRFLOW__CORE__EXECUTOR: LocalExecutor
Note: LocalExecutor requires a real database backend (PostgreSQL recommended). SQLite with LocalExecutor is technically functional but unsupported for production.
4.3 SLA Misses Removed
The SLA miss feature is gone. It was notoriously unreliable — callbacks fired inconsistently depending on scheduler restart timing, and the implementation was tightly coupled to the old execution model.
# Airflow 2 (no longer works in Airflow 3)
from datetime import timedelta
from airflow.decorators import dag
from airflow.operators.python import PythonOperator

@dag(
    sla_miss_callback=my_sla_handler
)
def my_dag():
    slow_task = PythonOperator(
        task_id="slow_task",
        python_callable=run_slow_thing,
        sla=timedelta(hours=2),  # removed
    )
Migration options:
- Airflow 3.2+: Use Deadline Alerts (scheduler-native, much more reliable)
- External monitoring: Instrument task duration in your observability stack (Prometheus, Datadog, etc.) and alert from there
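As a stopgap before wiring up a full observability stack, the external-monitoring option can be as simple as comparing task duration against a budget and emitting an alert message. A minimal sketch — `check_duration` is a hypothetical helper; connect its output to your own Prometheus/Datadog/Slack client:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_duration(task_id: str, start: datetime, end: datetime,
                   budget: timedelta) -> Optional[str]:
    """Return an alert message if the task exceeded its duration budget."""
    duration = end - start
    if duration > budget:
        return f"[duration-alert] {task_id} took {duration} (budget {budget})"
    return None  # within budget, no alert


# Example: a task that ran 2.5 hours against a 2-hour budget
start = datetime(2026, 4, 10, 3, 0, tzinfo=timezone.utc)
end = datetime(2026, 4, 10, 5, 30, tzinfo=timezone.utc)
print(check_duration("slow_task", start, end, budget=timedelta(hours=2)))
```

Unlike the old SLA mechanism, this runs outside the scheduler, so scheduler restarts cannot swallow an alert.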
4.4 REST API v1 Removed → FastAPI v2
The REST API v1 (Flask-based, under /api/v1/) is completely removed. Airflow 3 ships a stable, FastAPI-backed REST API under /api/v2/.
The v2 API is not backward-compatible. Common breakage points:
# v1 endpoint (broken in Airflow 3)
GET /api/v1/dags/{dag_id}/dagRuns
# v2 endpoint (Airflow 3)
GET /api/v2/dags/{dag_id}/dagRuns
Beyond the URL prefix change, the response schemas have also changed. Any custom integrations, CI scripts, or tooling that hit the Airflow API directly will require updates.
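For scripts where only the prefix changed, a mechanical rewrite gets you started; response-schema differences still need manual review. A hypothetical helper:

```python
import re


def rewrite_v1_url(path: str) -> str:
    """Rewrite an /api/v1/ path to /api/v2/ (prefix only; schemas still differ)."""
    return re.sub(r"^/api/v1/", "/api/v2/", path)


print(rewrite_v1_url("/api/v1/dags/my_dag/dagRuns"))  # /api/v2/dags/my_dag/dagRuns
```

Run it over the URLs in your CI scripts, then diff the v2 responses against what your code expects before deploying.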
The new health endpoint is:
GET /api/v2/monitor/health
{
  "metadatabase": {"status": "healthy"},
  "scheduler": {"status": "healthy"},
  "triggerer": {"status": "healthy"},
  "dag_processor": {"status": "healthy"}
}
Note that dag_processor is a new key — it did not exist in Airflow 2 health responses.
4.5 Removed Context Variables
Several context variables that were available in TaskInstance.context are removed:
# These no longer exist in Airflow 3 task context
execution_date # use logical_date
tomorrow_ds # compute manually
yesterday_ds # compute manually
prev_ds # compute manually
prev_execution_date # removed
next_execution_date # removed
The execution_date rename to logical_date reflects a deeper semantic change: in Airflow 3, logical_date represents run_after (when the DAG should run) rather than data_interval_start (the start of the data window). For event-driven and manually triggered DAGs, this distinction matters.
# Airflow 2
def my_task(**context):
    run_date = context["execution_date"]  # deprecated

# Airflow 3
def my_task(**context):
    run_date = context["logical_date"]  # correct
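The removed convenience values are all one-liners over logical_date. A sketch using plain datetime, assuming logical_date is a timezone-aware datetime as Airflow provides it:

```python
from datetime import datetime, timedelta, timezone


def legacy_date_strings(logical_date: datetime) -> dict:
    """Recompute the removed ds/yesterday_ds/tomorrow_ds context values."""
    fmt = "%Y-%m-%d"
    return {
        "ds": logical_date.strftime(fmt),
        "yesterday_ds": (logical_date - timedelta(days=1)).strftime(fmt),
        "tomorrow_ds": (logical_date + timedelta(days=1)).strftime(fmt),
    }


print(legacy_date_strings(datetime(2026, 4, 10, tzinfo=timezone.utc)))
# {'ds': '2026-04-10', 'yesterday_ds': '2026-04-09', 'tomorrow_ds': '2026-04-11'}
```

Inlining these keeps the migration mechanical instead of hunting for a replacement macro.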
4.6 XCom Pickling Disabled
XCom pickling is disabled by default in Airflow 3. In Airflow 2, Python objects were serialized via pickle and stored in the metadata database. This allowed arbitrary Python objects to flow between tasks but introduced security risks (arbitrary code execution on deserialization) and size limits.
# Airflow 2: this worked silently
@task
def extract():
    return {"data": some_sklearn_model}  # pickled into XCom

# Airflow 3: raises an error with default XCom backend
# Use JSON-serializable return values or a custom XCom backend
@task
def extract():
    return {"rows": 1000, "path": "s3://bucket/output.parquet"}  # safe
For large artifacts (models, DataFrames), the recommended pattern is to write to external storage (S3, GCS, local filesystem) and pass only the path as XCom.
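The write-then-pass-the-path pattern needs no Airflow-specific machinery. A local-filesystem sketch — `extract_to_storage`/`load_from_storage` are hypothetical helpers; swap the temp directory for S3/GCS in practice:

```python
import json
import tempfile
from pathlib import Path


def extract_to_storage(rows: list, out_dir: Path) -> dict:
    """Write the artifact to external storage; return only JSON-safe metadata."""
    out_path = out_dir / "output.json"
    out_path.write_text(json.dumps(rows))
    return {"rows": len(rows), "path": str(out_path)}  # safe XCom payload


def load_from_storage(meta: dict) -> list:
    """Downstream task: resolve the path from the XCom payload and read it."""
    return json.loads(Path(meta["path"]).read_text())


with tempfile.TemporaryDirectory() as d:
    meta = extract_to_storage([{"id": 1}, {"id": 2}], Path(d))
    json.dumps(meta)  # payload is JSON-serializable, unlike a model object
    print(load_from_storage(meta))
```

The XCom row stays tiny and JSON-serializable, and the artifact lives where large objects belong.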
Part 5 — What's New in Airflow 3
5.1 The airflow.sdk Namespace
Airflow 3 ships a stable, versioned Task SDK. All DAG authoring primitives now live under airflow.sdk:
# Airflow 2 import paths (still work in early Airflow 3, will be removed)
from airflow.decorators import dag, task
from airflow.models.dag import DAG
from airflow.sensors.base import BaseSensorOperator
from airflow.datasets import Dataset
# Airflow 3 canonical imports
from airflow.sdk import dag, task, DAG, Asset
from airflow.sdk import BaseSensorOperator
The SDK is designed to have a stable interface across minor versions. The intent is that DAGs written against airflow.sdk should be forward-compatible with future Airflow releases without import-path churn.
Important for Docker deployments: the airflow.sdk import chain triggers a connection attempt to the Task Execution API at import time. If the api-server is unavailable or CPU-starved, the dag-processor will hang on this import and eventually be SIGKILL'd by its own parse timeout. Fix the api-server first; everything else follows.
5.2 DAG Versioning (AIP-66)
Airflow 3 introduces first-class DAG versioning. Multiple versions of the same DAG can exist simultaneously in the serialized_dag table, and running DagRuns execute against the DAG version they were triggered with — not the latest version.
dag_id: "healthcare_retrain"
├── version 1: train → validate             (runs triggered before 2026-04-10)
└── version 2: load_data → train → validate (runs triggered after 2026-04-10)
This solves a long-standing pain point: in Airflow 2, modifying a DAG while runs were in flight could corrupt active DagRuns if the task structure changed.
5.3 Asset-Based Scheduling (AIP-74, AIP-75)
The Airflow 2 Dataset concept has been renamed to Asset and significantly expanded. Assets replace cron-based scheduling for data-driven pipelines:
from airflow.sdk import dag, task, Asset

# Producer DAG
raw_asset = Asset("s3://my-datalake/raw/events.parquet")

@dag(schedule="@hourly")
def ingest_events():
    @task(outlets=[raw_asset])
    def fetch_and_write():
        # ... write to S3
        pass

    fetch_and_write()

# Consumer DAG — runs when raw_asset is updated, not on a clock
@dag(schedule=[raw_asset])
def process_events():
    @task
    def transform():
        pass

    transform()
Assets enable a push-driven scheduling model where downstream DAGs run when their data dependencies are satisfied, not when a clock fires.
5.4 Edge Executor (AIP-69)
The Edge Executor allows Airflow tasks to run on lightweight remote workers without CeleryExecutor's operational overhead. Workers register with the api-server via HTTP polling and execute tasks locally, making it viable for:
- IoT / edge compute deployments
- Low-resource VMs that cannot run a Celery broker
- Multi-cloud task distribution without VPN tunnels
# airflow.cfg / env var
AIRFLOW__CORE__EXECUTOR: EdgeExecutor
5.5 Scheduler-Managed Backfills (AIP-78)
Backfills in Airflow 2 were CLI-driven one-shot operations. Airflow 3 makes backfills first-class scheduler concepts:
# Airflow 3: create a scheduler-managed backfill
airflow dags backfill create --dag-id my_dag \
--from-date 2024-01-01 --to-date 2024-12-31
# Inspect backfill state
airflow dags backfill list --dag-id my_dag
Scheduler-managed backfills respect pool limits, run in parallel with live DagRuns, and are visible in the UI — eliminating the "backfill is a black box" experience of Airflow 2.
5.6 React UI (AIP-38, AIP-84)
The Airflow 3 UI is a full rewrite in React, backed by the FastAPI REST API v2. Practical implications:
- Significantly faster rendering for DAGs with hundreds of tasks
- Grid view replaces the old Tree view as the primary timeline view
- The legacy force-directed Graph view is replaced with a cleaner task-level dependency graph
- The UI now works correctly in all modern browsers without Flask session issues
- Dark mode is available natively
Part 6 — Import Path Migration Guide
This is the table you want bookmarked during a migration:
| Airflow 2 import | Airflow 3 import |
|---|---|
| `from airflow.decorators import dag, task` | `from airflow.sdk import dag, task` |
| `from airflow.models.dag import DAG` | `from airflow.sdk import DAG` |
| `from airflow.sensors.base import BaseSensorOperator` | `from airflow.sdk import BaseSensorOperator` |
| `from airflow.datasets import Dataset` | `from airflow.sdk import Asset` |
| `from airflow.models import Variable` | `from airflow.sdk import Variable` |
| `from airflow.models import Connection` | `from airflow.sdk import Connection` |
| `from airflow.operators.python import PythonOperator` | `apache-airflow-providers-standard` package |
| `from airflow.operators.bash import BashOperator` | `apache-airflow-providers-standard` package |
| `from airflow.sensors.filesystem import FileSensor` | `apache-airflow-providers-standard` package |
Many common operators (Python, Bash, file sensors) have moved to apache-airflow-providers-standard. Install this package explicitly:
pip install apache-airflow-providers-standard
Automated Migration with Ruff
Airflow 3 ships with Ruff lint rules specifically for migration:
pip install "ruff>=0.13.1"
# Check for mandatory breaking changes (AIR301)
ruff check dags/ --select AIR301 --preview
# Auto-fix safe renames
ruff check dags/ --select AIR301 --fix --unsafe-fixes --preview
# Check for recommended updates (AIR302: deprecated-but-not-yet-removed)
ruff check dags/ --select AIR302 --preview
Example output:
dags/retrain_dag.py:3:1: AIR301 airflow.decorators.dag is removed in Airflow 3.0. Use airflow.sdk.dag instead.
[*] AIR301 auto-fix available
Part 7 — Docker Compose: What Breaks, What to Add
If you are running Airflow 2 via Docker Compose, here is a precise list of changes required for Airflow 3.
Services to Add
airflow-dag-processor:
  <<: *airflow-common
  command: dag-processor
  healthcheck:
    test: ["CMD", "airflow", "jobs", "check", "--job-type", "DagProcessorJob", "--local"]
    interval: 60s
    timeout: 60s
    retries: 3
    start_period: 300s
  restart: always
  depends_on:
    airflow-init:
      condition: service_completed_successfully
Services to Rename
# Airflow 2
airflow-webserver:
  command: webserver
  ports:
    - "8080:8080"

# Airflow 3
airflow-api-server:
  command: api-server --workers 1  # --workers 1 is critical (see Part 2)
  ports:
    - "8080:8080"
Environment Variables to Add
x-airflow-common:
  &airflow-common
  environment:
    # Critical: prevents Connection refused in scheduler
    AIRFLOW__CORE__EXECUTION_API_SERVER_URL: "http://airflow-api-server:8080/execution/"
    # Critical: prevents JWT Signature verification failed
    AIRFLOW__API_AUTH__JWT_SECRET: "change-this-in-production"
    # Required for SimpleAuthManager user configuration
    AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_USERS: "admin:Admin"
    AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_PASSWORDS_FILE: "/opt/airflow/project/simple_auth_manager_passwords.json"
Healthcheck Changes
# Airflow 2 scheduler healthcheck (port 8974 no longer exists in Airflow 3)
healthcheck:
  test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]

# Airflow 3 scheduler healthcheck
healthcheck:
  test: ["CMD", "airflow", "jobs", "check", "--job-type", "SchedulerJob", "--local"]
  interval: 60s
  timeout: 60s        # airflow jobs check takes ~42s (full Python + DB round-trip)
  retries: 3
  start_period: 300s  # covers pip install time on first start
The --local flag is essential. --hostname $(hostname) compares the container's $HOSTNAME env var against the hostname Airflow registered in the database — these often differ (9811c4ea8dec vs airflow-scheduler.internal), causing perpetual unhealthy status even when the service is running correctly.
Part 8 — Configuration Migration
Changed Defaults That Will Surprise You
catchup_by_default: was True in Airflow 2, False in Airflow 3
# If you have DAGs with start_date in the past and no explicit catchup=True,
# they will NOT backfill on first deploy โ this is usually what you want,
# but verify before deploying
[scheduler]
catchup_by_default = False # Airflow 3 default
# Default schedule: was @daily implicit in some contexts, now None
# DAGs with no schedule parameter will not run automatically
[scheduler]
# Use schedule=None explicitly if that's your intent
Renamed Configuration Keys
# Airflow 2 → Airflow 3 config key mapping
[webserver] web_server_host = 0.0.0.0   →  [api] host = 0.0.0.0
[webserver] error_logfile = ...         →  REMOVED (no replacement)
Automated Config Migration
# Check your airflow.cfg for deprecated/invalid keys
airflow config lint
# Apply automatic fixes
airflow config update --fix
Part 9 — Migration Path
If you are upgrading a production Airflow 2 deployment, follow this sequence:
Phase 1 — Prepare (Still on Airflow 2)
- Upgrade to Airflow 2.7+ — the schema migration from earlier versions significantly increases airflow db migrate time; get that done first.
- Clean the metadata database — airflow db clean removes old DagRun/TaskInstance records and dramatically speeds up the schema migration.
- Run Ruff AIR301 checks — ruff check dags/ --select AIR301 --preview.
- Fix all deprecation warnings — zero warnings in Airflow 2.9 means fewer surprises in Airflow 3.
- Audit direct database access — grep your task code for from airflow.models imports; these will break.
# Find tasks using direct metadata DB access
grep -r "from airflow.models" dags/ --include="*.py"
grep -r "settings.Session" dags/ --include="*.py"
grep -rE "DagRun|TaskInstance|Variable" dags/ --include="*.py" | grep "import"
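grep misses aliased imports such as `import airflow.models as m`; walking the AST catches both forms. A sketch of a hypothetical audit script — point it at the files under dags/:

```python
import ast

# Module prefixes that imply direct metadata-DB coupling in task code
FORBIDDEN_PREFIXES = ("airflow.models", "airflow.settings")


def audit_source(source: str) -> list:
    """Return forbidden Airflow-internal imports found in one DAG file."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        hits += [n for n in names if n.startswith(FORBIDDEN_PREFIXES)]
    return hits


sample = "from airflow.models import DagRun\nimport airflow.settings\n"
print(audit_source(sample))  # ['airflow.models', 'airflow.settings']
```

Every hit is a spot that must move to the Execution API surface (or the Python Client) before the upgrade.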
Phase 2 — Upgrade
- Back up your metadata database — non-negotiable.
- Update your Docker image to apache/airflow:3.0.0.
- Add the dag-processor service to your Compose/Kubernetes manifests.
- Rename webserver → api-server in service definitions.
- Set the three critical env vars:
  - AIRFLOW__CORE__EXECUTION_API_SERVER_URL
  - AIRFLOW__API_AUTH__JWT_SECRET
  - SimpleAuthManager password config
- Run airflow db migrate.
- Update all import paths (use Ruff auto-fix first, then manual review).
- Update healthchecks to airflow jobs check --local.
Phase 3 — Validate
# Check all services healthy
curl http://localhost:8080/api/v2/monitor/health | python3 -m json.tool
# Expected output
{
  "metadatabase": {"status": "healthy"},
  "scheduler": {"status": "healthy"},
  "triggerer": {"status": "healthy"},
  "dag_processor": {"status": "healthy"}
}
# Trigger a test DAG
airflow dags trigger your_test_dag
# Check task state
airflow tasks states-for-dag-run your_test_dag <run_id>
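The health check above is easy to automate in a smoke-test script. A sketch that validates the v2 health payload, assuming the JSON shape shown above:

```python
# Components expected in the Airflow 3 /api/v2/monitor/health response
EXPECTED_COMPONENTS = ("metadatabase", "scheduler", "triggerer", "dag_processor")


def all_healthy(health: dict) -> bool:
    """True only if every expected Airflow 3 component reports healthy."""
    return all(
        health.get(component, {}).get("status") == "healthy"
        for component in EXPECTED_COMPONENTS
    )


payload = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "healthy"},
    "triggerer": {"status": "healthy"},
    "dag_processor": {"status": "healthy"},
}
print(all_healthy(payload))  # True
```

Because dag_processor is a new key in Airflow 3, a check written against the Airflow 2 response shape would silently pass even with the dag-processor down; validating the full component list closes that gap.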
Part 10 — Should You Upgrade?
Upgrade Now If:
- You are starting a new project — there is no reason to build on Airflow 2.
- You have simple DAGs (PythonOperator, BashOperator, standard providers) — the migration is mostly find-and-replace on import paths.
- You want DAG versioning — this solves real operational pain.
- You are running on Kubernetes — the separation of concerns maps cleanly to individual pod scaling.
Wait If:
- You depend heavily on FAB's OAuth/LDAP integrations and have not tested the FAB provider on Airflow 3.
- You have extensive SLA miss callback logic and no monitoring alternative ready.
- Your codebase has heavy direct metadata database access in task code — refactoring that to the Python Client is non-trivial.
- You use CeleryKubernetesExecutor or LocalKubernetesExecutor — both are removed; evaluate the Multiple Executor Configuration feature instead.
- You have custom Flask-AppBuilder views or blueprints — these require porting to FastAPI.
The Honest Assessment
Airflow 3 is the version the project should have been architecturally from the beginning. The separation of the dag-processor, the Task Execution API, and the prohibition on direct metadata access are the right engineering decisions. They make Airflow significantly more secure, more scalable, and more maintainable, at the cost of a one-time migration investment.
The upgrade complexity is proportional to how much your codebase relied on Airflow 2's leaky abstractions: direct database access, FAB internals, SLA callbacks, and SubDAGs. If you followed Airflow 2 best practices (TaskFlow API, provider operators, no direct DB access), the migration is a half-day of import path updates and Docker Compose additions.
If you did not, this upgrade is the forcing function to do it properly.
Conclusion
The jump from Airflow 2 to Airflow 3 is the most significant change in the project's history. The webserver is gone. The scheduler no longer parses DAGs. Tasks no longer touch the metadata database. The JWT-authenticated Execution API connects them all.
Each of these changes surfaces as a concrete failure mode in the first deployment: CPU spikes from JWT key divergence, Connection refused from wrong service URLs, silent healthcheck failures from removed ports, and silently no-op user creation from a replaced auth manager.
Understanding the why behind the architecture — isolation, security, scalability — converts each failure from mysterious to obvious. The fixes are not workarounds; they are the intended configuration patterns for a distributed, multi-service orchestration system.
Airflow 3 is what a modern data orchestrator should look like. Migrate when you are ready, migrate properly, and you will not look back.
Resources
- Apache Airflow 3.0 Release Notes
- Official Upgrade to Airflow 3 Guide
- Astronomer: Upgrading from Airflow 2 to 3
- AIP-72: Task Execution Interface
- AIP-66: DAG Versioning
- Ruff AIR linting rules
Written from direct production experience migrating a healthcare ML retraining pipeline from Airflow 2 patterns to Airflow 3.0.0 on Docker Compose, April 2026.