Airflow Version Upgrade for Enterprises: A Practical Blueprint for AWS, Snowflake, dbt, and Fintech Data Platforms

Enterprise Airflow upgrades are rarely just a pip install --upgrade exercise. In regulated fintech environments, an Airflow version upgrade touches orchestration reliability, auditability, IAM boundaries, data lineage, SLAs, dbt jobs, Snowflake cost controls, and incident response. Apache Airflow’s own documentation notes that newer versions can include metadata database migrations and that airflow db migrate must be run during upgrades; Airflow 3 is also a major release with breaking changes, so enterprises need a controlled engineering process rather than an ad hoc deployment. (Apache Airflow)


1. Build an Enterprise Upgrade Strategy Before Touching Production

A good enterprise upgrade starts with an inventory: Airflow core version, Python version, providers, custom plugins, DAG import behavior, executor type, metadata database engine, secrets backend, and deployment model.

For a fintech platform running Airflow on AWS, the upgrade scope usually includes:

Area                  | Upgrade Risk | Enterprise Control
----------------------|--------------|-------------------------------------------
Airflow metadata DB   | High         | Snapshot, migration dry run, rollback plan
Providers             | High         | Pin versions and test hooks/operators
DAG code              | High         | Static validation and import tests
Snowflake connections | Medium       | Validate auth, warehouses, roles
dbt orchestration     | Medium       | Test commands, profiles, artifacts
IAM/secrets           | High         | Validate AWS Secrets Manager, KMS, IRSA
Observability         | Medium       | Update alerts, metrics, dashboards

Apache Airflow recommends using constraints when installing from PyPI because constraints files are fixed for a released Airflow version to help produce consistent installs. (Apache Airflow) On Amazon MWAA, AWS notes that environments keep using the specified Airflow image version until upgraded, and for Airflow v2.7.2 and later MWAA requires a constraints statement in requirements.txt or applies one for compatibility. (AWS Documentation)

# Example: inspect the current enterprise Airflow runtime
airflow version
python --version
airflow providers list

# Export installed dependencies for comparison
pip freeze | sort > airflow-current-freeze.txt

# Capture DAG inventory
airflow dags list --output json > airflow-current-dags.json

# Capture current configuration; review and redact secrets before sharing
airflow config list > airflow-current-config.txt

Enterprise recommendation: treat the Airflow upgrade as a platform migration with a formal change record, not a library bump.


2. Pin Airflow, Providers, and Python Dependencies Reproducibly

Dependency drift is one of the most common causes of failed Airflow upgrades. Providers evolve independently from Airflow core, and enterprise DAGs often rely on AWS, Snowflake, Kubernetes, Slack, dbt, OpenLineage, or custom internal packages.

A hardened upgrade should use:

  • A target Airflow version.
  • A supported Python version.
  • A constraints file.
  • Explicit provider versions.
  • A separate constraints lock for internal packages.
  • A reproducible container build.

For Airflow 3.0.0, Apache’s migration guide states that Python 3.9, 3.10, 3.11, and 3.12 are supported, and it also recommends being on Airflow 2.7 or later before moving from Airflow 2.x to Airflow 3. (Apache Airflow)
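
Before committing to a target version, it helps to gate the upgrade on those prerequisites. A quick preflight sketch (the version thresholds come from the migration guide; everything else is illustrative):

# Preflight: confirm the current runtime is eligible for the 2.x -> 3 path
airflow version | grep -Eq '^2\.([7-9]|1[0-9])\.' \
  && echo "OK: Airflow 2.7+, eligible for direct migration to Airflow 3" \
  || echo "WARN: move to Airflow 2.7 or later first"

# Confirm the interpreter is in the supported range for Airflow 3.0.0
python -c 'import sys; assert (3, 9) <= sys.version_info[:2] <= (3, 12), sys.version'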

# Dockerfile example for self-managed Airflow on ECS/EKS
FROM apache/airflow:3.2.1-python3.11

ARG AIRFLOW_VERSION=3.2.1
ARG PYTHON_VERSION=3.11

USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    build-essential \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

USER airflow

COPY requirements.txt /requirements.txt

RUN pip install --no-cache-dir \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt" \
    -r /requirements.txt

Example requirements.txt:

apache-airflow-providers-amazon
apache-airflow-providers-snowflake
apache-airflow-providers-slack
apache-airflow-providers-cncf-kubernetes

dbt-core==1.8.9
dbt-snowflake==1.8.4

openlineage-airflow
great-expectations

For Amazon MWAA, the dependency pattern is different because AWS manages the base image. AWS documents that MWAA builds images bundling Airflow releases with common binaries and Python libraries, so enterprises should validate requirements against the MWAA-supported Airflow version rather than assuming parity with a self-managed container. (AWS Documentation)

# MWAA requirements.txt example
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.3/constraints-3.11.txt"

apache-airflow-providers-snowflake
apache-airflow-providers-amazon
dbt-core==1.8.9
dbt-snowflake==1.8.4

3. Validate DAG Compatibility Before the Upgrade Window

Enterprise DAG validation should happen in CI before a platform image reaches staging. The goal is to catch import errors, deprecated APIs, provider incompatibilities, slow DAG parsing, direct metadata DB access, and business-critical workflow regressions.

Airflow 3 contains breaking changes, and the official Airflow 3 upgrade guide specifically calls out architectural changes and migration preparation from Airflow 2.x. (Apache Airflow) Astronomer’s Airflow 2-to-3 guidance also emphasizes testing updated DAGs locally before upgrading production. (astronomer.io)

# tests/test_dag_imports.py
import os
import pytest
from airflow.models import DagBag

DAGS_FOLDER = os.environ.get("AIRFLOW__CORE__DAGS_FOLDER", "dags")

@pytest.fixture(scope="session")
def dag_bag():
    return DagBag(dag_folder=DAGS_FOLDER, include_examples=False)

def test_no_dag_import_errors(dag_bag):
    assert dag_bag.import_errors == {}, dag_bag.import_errors

def test_dags_have_owners(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert dag.owner, f"{dag_id} has no owner"

def test_dags_have_tags(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags, f"{dag_id} has no tags"

def test_critical_dags_are_scheduled(dag_bag):
    # Fintech guardrail: money-movement DAGs must keep an explicit schedule
    critical_prefixes = ("payments_", "ledger_", "risk_", "reconciliation_")
    for dag_id in dag_bag.dags:
        if dag_id.startswith(critical_prefixes):
            assert dag_bag.dags[dag_id].schedule is not None, f"{dag_id} has no schedule"
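
Import correctness alone will not catch slow DAG parsing. A minimal timing check can live in the same module; the 30-second budget below is an assumed placeholder to tune against your measured baseline:

def test_dag_folder_parses_within_budget():
    import time

    # Assumed budget; replace with your own baseline before enforcing in CI
    start = time.monotonic()
    DagBag(dag_folder=DAGS_FOLDER, include_examples=False)
    elapsed = time.monotonic() - start
    assert elapsed < 30, f"DAG folder took {elapsed:.1f}s to parse"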

Add a CI job that runs against the target Airflow image:

# .github/workflows/airflow-upgrade-validation.yml
name: airflow-upgrade-validation

on:
  pull_request:
    paths:
      - "dags/**"
      - "plugins/**"
      - "requirements.txt"
      - "Dockerfile"

jobs:
  validate-airflow:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Build target Airflow image
        run: docker build -t enterprise-airflow-upgrade:test .

      - name: Run DAG import tests
        # Mount dags/ and tests/ so the job does not depend on them being baked into the image
        run: |
          docker run --rm \
            -e AIRFLOW__CORE__LOAD_EXAMPLES=False \
            -e AIRFLOW__CORE__DAGS_FOLDER=/opt/airflow/dags \
            -v "$PWD/dags:/opt/airflow/dags" \
            -v "$PWD/tests:/opt/airflow/tests" \
            -w /opt/airflow \
            enterprise-airflow-upgrade:test \
            bash -c "pip install pytest && pytest tests/test_dag_imports.py -q"

For fintech workloads, also add regression tests for DAGs that move money, produce ledger entries, generate regulatory reports, or trigger customer-facing notifications.
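
A sketch of one such guardrail, reusing the dag_bag fixture above (the "payments_" prefix and the retry threshold are hypothetical stand-ins for your own critical workflows):

# tests/test_fintech_guardrails.py (sketch; DAG ids and thresholds are hypothetical)
def test_money_movement_dags_have_safe_retry_policy(dag_bag):
    for dag_id, dag in dag_bag.dags.items():
        if not dag_id.startswith("payments_"):
            continue
        for task in dag.tasks:
            # Unbounded retries on money movement can double-post after an upgrade
            assert task.retries <= 3, f"{dag_id}.{task.task_id} retries too aggressive"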


4. Run Metadata Database Migration as a Controlled Operation

The metadata database is the heart of Airflow state: DAG runs, task instances, rendered templates, variables, connections, pools, serialized DAGs, and scheduler metadata. Apache Airflow states that newer versions can contain database migrations and that airflow db migrate should be run to apply schema changes. (Apache Airflow) Airflow’s database setup documentation also says Airflow components should not be running while the database migration executes, and notes that airflow db upgrade was deprecated in favor of airflow db migrate as of Airflow 2.7. (Apache Airflow)

#!/usr/bin/env bash
set -euo pipefail

echo "Stopping Airflow services..."
kubectl scale deployment airflow-scheduler --replicas=0 -n airflow
kubectl scale deployment airflow-webserver --replicas=0 -n airflow
kubectl scale deployment airflow-worker --replicas=0 -n airflow || true

echo "Running metadata DB migration..."
kubectl run airflow-db-migrate \
  --rm -i --restart=Never \
  --namespace airflow \
  --image registry.example.com/data-platform/airflow:3.2.1 \
  -- bash -c "airflow db check && airflow db migrate"

echo "Restarting Airflow services..."
kubectl scale deployment airflow-scheduler --replicas=2 -n airflow
kubectl scale deployment airflow-webserver --replicas=2 -n airflow
kubectl scale deployment airflow-worker --replicas=6 -n airflow || true

For RDS-backed metadata databases, a safer enterprise sequence is:

  1. Take an automated RDS snapshot.
  2. Restore the snapshot to a staging database.
  3. Run airflow db migrate against staging.
  4. Run DAG import and scheduler smoke tests.
  5. Run the production migration during a change window.
  6. Keep the old image available for rollback planning.
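
Steps 1 and 2 can be scripted with the AWS CLI — a sketch, with hypothetical instance identifiers:

# Snapshot production and restore it as a migration rehearsal target
aws rds create-db-snapshot \
  --db-instance-identifier airflow-metadata-prod \
  --db-snapshot-identifier airflow-pre-upgrade-rehearsal

aws rds wait db-snapshot-available \
  --db-snapshot-identifier airflow-pre-upgrade-rehearsal

aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier airflow-metadata-staging \
  --db-snapshot-identifier airflow-pre-upgrade-rehearsal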

For Amazon MWAA and other managed Airflow platforms, the procedure is vendor-specific. AWS documents supported MWAA versions and upgrade considerations, and Google Cloud Composer states that moving from Airflow 2 to Airflow 3 requires side-by-side migration rather than an in-place upgrade of its managed environments. (AWS Documentation)


5. Architecture Decision: Blue/Green Upgrade Instead of In-Place Upgrade

Decision: use a blue/green Airflow upgrade for enterprise production environments.

We chose to deploy a parallel target Airflow environment, validate DAGs, mirror connections and variables, replay non-destructive DAGs, and then cut over scheduler ownership instead of upgrading the running environment in place.
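
Mirroring state between the environments can use the standard Airflow CLI — a sketch (treat the export files as secrets and redact before storing):

# In the blue environment: export orchestration state
airflow connections export blue-connections.json
airflow variables export blue-variables.json

# In the green environment: import the mirrored state
airflow connections import blue-connections.json
airflow variables import blue-variables.json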

Trade-off: blue/green costs more in infrastructure and operational coordination, but it materially reduces outage risk and gives the enterprise a fast fallback path.

flowchart LR
    A[Git DAG Repository] --> B[CI Validation]
    B --> C[Blue Airflow Current Version]
    B --> D[Green Airflow Target Version]

    C --> E[(Current Metadata DB)]
    D --> F[(Cloned Metadata DB)]

    C --> G[Snowflake Prod]
    D --> H[Snowflake Staging / Prod Read-Only]

    D --> I[Smoke Tests]
    I --> J{Cutover Approved?}
    J -->|Yes| K[Route Schedulers and Web UI to Green]
    J -->|No| L[Keep Blue Active]

In a self-managed AWS architecture, this can be implemented using EKS namespaces, separate Helm releases, independent metadata databases, and shared read-only access to DAG code during validation.

# Example Helm-based blue/green deployment
helm upgrade --install airflow-green apache-airflow/airflow \
  --namespace airflow-green \
  --create-namespace \
  --values values-green.yaml \
  --set images.airflow.repository=registry.example.com/data-platform/airflow \
  --set images.airflow.tag=3.2.1 \
  --set dags.gitSync.enabled=true \
  --set dags.gitSync.repo=git@github.com:example/enterprise-dags.git \
  --set dags.gitSync.branch=airflow-upgrade

Example cutover guardrail:

-- Snowflake: confirm no duplicate production writes during cutover
select
    dag_id,
    task_id,
    count(*) as write_events,
    min(event_ts) as first_event_ts,
    max(event_ts) as last_event_ts
from platform_audit.airflow_write_events
where event_ts >= dateadd(hour, -2, current_timestamp())
  and environment in ('airflow-blue', 'airflow-green')
group by 1, 2
having count(distinct environment) > 1;

Blue/green is especially valuable for fintech platforms where duplicate task execution can produce customer-impacting side effects. The safest design is to make DAGs idempotent and to prevent green from writing to production until cutover is approved.
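
One way to enforce the "green cannot write to production" rule inside DAG code itself — a minimal sketch, assuming a deployment-color environment variable and an explicit cutover flag (both hypothetical names):

# plugins/upgrade_guards.py (sketch)
import os

from airflow.exceptions import AirflowSkipException


def guard_production_write() -> None:
    """Skip production writes from green until cutover is approved."""
    color = os.environ.get("AIRFLOW_DEPLOYMENT_COLOR", "blue")  # hypothetical var
    approved = os.environ.get("UPGRADE_CUTOVER_APPROVED") == "true"  # hypothetical var
    if color == "green" and not approved:
        raise AirflowSkipException("green environment: production writes disabled")

Calling this at the top of write tasks turns an accidental double-execution during validation into a visible skip instead of a duplicate ledger entry.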


6. Upgrade dbt and Snowflake Workloads With Cost and Lineage Controls

Airflow upgrades often expose hidden assumptions in dbt and Snowflake workflows: environment variables, profiles paths, warehouse names, OAuth integrations, private keys, masking policies, and task retry semantics.

A typical enterprise pattern is to run dbt inside an isolated KubernetesPodOperator, ECS task, or containerized Python virtualenv rather than installing every analytics dependency into the Airflow scheduler image.

# dags/dbt_snowflake_upgrade_smoke_test.py
from __future__ import annotations

from datetime import datetime
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="dbt_snowflake_upgrade_smoke_test",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
    tags=["upgrade", "dbt", "snowflake", "smoke-test"],
) as dag:

    dbt_debug = KubernetesPodOperator(
        task_id="dbt_debug",
        namespace="data-platform",
        image="registry.example.com/analytics/dbt-snowflake:1.8.9",
        cmds=["bash", "-lc"],
        arguments=[
            """
            set -euo pipefail
            dbt --version
            dbt debug --profiles-dir /opt/dbt/profiles
            """
        ],
        env_vars={
            "DBT_TARGET": "staging",
            "SNOWFLAKE_WAREHOUSE": "AIRFLOW_UPGRADE_WH",
        },
        get_logs=True,
        on_finished_action="delete_pod",
    )

    dbt_compile = KubernetesPodOperator(
        task_id="dbt_compile",
        namespace="data-platform",
        image="registry.example.com/analytics/dbt-snowflake:1.8.9",
        cmds=["bash", "-lc"],
        arguments=[
            """
            set -euo pipefail
            dbt deps
            dbt compile --target staging --profiles-dir /opt/dbt/profiles
            """
        ],
        get_logs=True,
        on_finished_action="delete_pod",
    )

    dbt_debug >> dbt_compile

For Snowflake, apply explicit controls during the upgrade:

-- Snowflake warehouse for upgrade validation
create warehouse if not exists AIRFLOW_UPGRADE_WH
  warehouse_size = 'XSMALL'
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;

-- Dedicated role for upgrade smoke tests
create role if not exists AIRFLOW_UPGRADE_ROLE;

grant usage on warehouse AIRFLOW_UPGRADE_WH to role AIRFLOW_UPGRADE_ROLE;
grant usage on database ANALYTICS_DEV to role AIRFLOW_UPGRADE_ROLE;
grant usage on all schemas in database ANALYTICS_DEV to role AIRFLOW_UPGRADE_ROLE;
grant select on all tables in database ANALYTICS_DEV to role AIRFLOW_UPGRADE_ROLE;

This keeps the upgrade test path observable, cost-limited, and isolated from production finance reporting.
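
Warehouse sizing alone does not cap spend. A resource monitor can bound the total credits burned by the validation path — a sketch, with an assumed 5-credit quota:

-- Hard cap on upgrade-validation spend (quota value is an assumption)
create or replace resource monitor AIRFLOW_UPGRADE_RM
  with credit_quota = 5
  triggers
    on 90 percent do notify
    on 100 percent do suspend;

alter warehouse AIRFLOW_UPGRADE_WH
  set resource_monitor = AIRFLOW_UPGRADE_RM;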


7. Post-Upgrade Observability and Rollback Readiness

After the upgrade, validate platform health using operational metrics and business metrics. Do not rely only on the webserver coming up healthy.

Minimum checks:

  • Scheduler heartbeat is healthy.
  • DAG parse time is within baseline.
  • Critical DAGs are scheduled.
  • Worker queues are draining.
  • Metadata DB connections are stable.
  • Snowflake query volume is expected.
  • dbt artifacts are generated.
  • SLA or deadline alerts still fire.
  • Secrets backend resolution works.
  • Audit logs are complete.
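
The first two items can be checked straight from the CLI — a sketch using commands available in recent Airflow releases:

# Verify a scheduler heartbeat was recorded recently and the metadata DB responds
airflow jobs check --job-type SchedulerJob --allow-multiple --limit 5
airflow db check

A scheduled canary DAG then exercises the full path end to end: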
# dags/platform_upgrade_canary.py
from __future__ import annotations

from datetime import datetime
from airflow.decorators import dag, task
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

@dag(
    dag_id="platform_upgrade_canary",
    start_date=datetime(2025, 1, 1),
    schedule="*/15 * * * *",
    catchup=False,
    tags=["platform", "upgrade", "canary"],
)
def platform_upgrade_canary():

    @task
    def check_snowflake_connection() -> str:
        hook = SnowflakeHook(snowflake_conn_id="snowflake_platform")
        result = hook.get_first("select current_version(), current_role(), current_warehouse()")
        return f"Snowflake OK: {result}"

    @task
    def check_airflow_runtime() -> str:
        import airflow
        return f"Airflow runtime OK: {airflow.__version__}"

    check_airflow_runtime() >> check_snowflake_connection()

platform_upgrade_canary()

A rollback plan should be written before the change window. For database migrations, rollback is not always a simple downgrade; the safest strategy is to keep a database snapshot, old image, old DAG branch, and frozen dependency set available. Airflow’s migration reference documents the migrations executed by airflow db migrate, which is useful for reviewing the schema impact before production execution. (Apache Airflow)
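
A lightweight way to make that rollback kit concrete is to freeze every artifact under a single tag before the window opens — a sketch, with hypothetical registry, version, and branch names:

# Freeze rollback artifacts before the change window
ROLLBACK_TAG="pre-airflow-upgrade-$(date +%Y%m%d)"

git tag -a "${ROLLBACK_TAG}" -m "DAGs and config prior to Airflow upgrade"
git push origin "${ROLLBACK_TAG}"

docker pull registry.example.com/data-platform/airflow:2.10.3
docker tag registry.example.com/data-platform/airflow:2.10.3 \
  registry.example.com/data-platform/airflow:"${ROLLBACK_TAG}"
docker push registry.example.com/data-platform/airflow:"${ROLLBACK_TAG}"

pip freeze | sort > "rollback-freeze-${ROLLBACK_TAG}.txt"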


Conclusion

An Airflow version upgrade for enterprises should be engineered like a production platform migration: version pinning, reproducible builds, DAG compatibility testing, metadata database migration, blue/green rollout, Snowflake/dbt smoke tests, and post-upgrade observability.

The most important principle is control. Control the dependencies. Control the metadata database migration. Control task side effects. Control cutover. Control rollback. Enterprises that follow this model can upgrade Airflow with lower operational risk while improving security, scheduler performance, provider compatibility, and long-term maintainability.
