DEV Community

Isha Vason

Orchestrating Our Way Out of Chaos: How I Compared Airflow, Prefect, and Dagster (and Picked What to Ship)

A few quarters ago I inherited a lovable mess: ad‑hoc cron jobs, a couple of shell scripts duct‑taped to a BI refresh, and one heroic Python file that only ran if you patted it gently. My task was simple on paper: pick an orchestrator that wouldn’t implode the moment we added a new source or missed a weekend run.

This post is the story of how I evaluated Apache Airflow, Prefect, and Dagster on a real project—with prototypes, production constraints, and the occasional oh‑no‑why‑is‑nothing‑running moment. I’ll share what I tested, what surprised me, and where each tool shined or stumbled for us.

If you want docs‑level definitions, they exist. Here, you’ll find the parts that mattered in practice, with links when I cite a factual claim or version detail.


The problem I had to solve (quick context)

  • Ingest nightly data from 3 sources (S3 drops, a SaaS API, and a warehouse copy).
  • Run dbt transforms, publish a few derived tables, and trigger a downstream dashboard.
  • Add observability and reduce “zombie jobs” without scaling workers to the moon.
  • Make it easy for another engineer to onboard next sprint.

In other words: solid batch orchestration, good visibility, and room to grow—without a six‑week platform build.


Round 1: Airflow—the dependable heavyweight

What I tried

I spun up Airflow on Kubernetes using the official Helm chart and Docker images. That gave me a web UI, a scheduler, and workers with minimal friction. The sheer size of the provider ecosystem helped me quickly wire GCP, AWS, Snowflake, Slack, and dbt without reinventing hooks.

The pleasant surprise

Our long‑wait steps (e.g., waiting on BigQuery or S3 sensors) didn’t hog workers thanks to deferrable operators. You run a lightweight triggerer process; tasks “defer” during idle time so your cluster isn’t just… waiting. For us, this was the difference between needing more workers and reusing what we had.
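The core idea behind deferrable operators is easy to sketch in plain Python: while a task is just waiting, it yields back to an event loop instead of holding a worker thread. This is a toy model of the triggerer concept using only `asyncio`, not Airflow's actual API — `deferred_wait` and the poll counts are made up for illustration.

```python
import asyncio

async def deferred_wait(name: str, appears_after: int, polls: dict) -> str:
    """Toy deferrable sensor: poll for a condition without blocking a worker thread."""
    while polls.setdefault(name, 0) < appears_after:
        polls[name] += 1
        await asyncio.sleep(0)  # yield to the event loop, like the triggerer does

    return name

async def run_triggerer() -> list[str]:
    polls: dict[str, int] = {}
    # hundreds of these waits could share one small process;
    # none of them occupies a worker slot while idle
    return await asyncio.gather(
        deferred_wait("s3_drop", appears_after=3, polls=polls),
        deferred_wait("bq_job", appears_after=5, polls=polls),
    )
```

The real triggerer does the same thing with async hooks against S3, BigQuery, and friends; the payoff is the one described above — idle waits stop consuming workers.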

Why Airflow is still Airflow in 2026

Airflow’s newest 3.x cycle modernizes UI and internals (service‑oriented components, faster DAG parsing). That matters operationally—less mysterious sluggishness when you have lots of DAGs, better developer ergonomics, and a cleaner path for upgrades.

Where I felt the drag

Authoring is Pythonic (TaskFlow API), but you still think in DAGs first. That’s great for explicit control; it can feel heavy when all you want is “run this Python, fan out over these 200 files, retry smartly.” Still, if your world is full of external tools and strict schedules, Airflow is home base.
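For reference, here is what that "run this Python, fan out, retry smartly" baseline looks like in plain standard-library Python — a sketch with hypothetical names (`process_file`, the delay values), which is roughly what all three orchestrators replace with persistence, observability, and scheduling on top.

```python
import concurrent.futures
import time

def process_file(path: str) -> int:
    # placeholder for real per-file work (parse, validate, load)
    return len(path)

def with_retries(fn, arg, attempts=3, base_delay=0.01):
    # exponential backoff between attempts: base, 2x base, 4x base...
    for i in range(attempts):
        try:
            return fn(arg)
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

def fan_out(paths: list[str]) -> int:
    # fan out over N files, then aggregate the results
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(with_retries, process_file, p) for p in paths]
        return sum(f.result() for f in futures)
```

Every tool in this post gives you a nicer version of `with_retries` and `fan_out`; the differences are in everything around them.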

My Airflow takeaway

If your stack touches “everything” and reliability under scale is non‑negotiable, start here. Use deferrables, lean on providers, and sleep better.


Round 2: Prefect—the Python‑native fast mover

What I tried

I ported the pipeline into Prefect using @flow and @task. It felt like writing normal Python—with perks. Retries, caching, and concurrent submits are built in. For our “fan‑out over N files then aggregate” steps, the code stayed tidy and readable.

The pleasant surprise

The hybrid Prefect Cloud model fit our security posture: we kept code/data in our VPC while using Cloud for orchestration metadata, UI, RBAC, and automations. Spinning up workers where the data lives kept latency predictable.

Where it clicked for the team

Debugging and iteration were fast. New engineers could run flows locally, push a deployment, and watch runs in the Cloud UI—all without touching Kubernetes on day one. For a contracting engagement with a tight runway, this mattered more than we expected.

Trade‑offs I felt

Compared to Airflow’s ocean of providers, you’ll sometimes write a sprinkle of glue. Not a blocker—just be aware if your org standardizes on ready‑made operators for everything.

My Prefect takeaway

If your team is Python‑heavy, iterates quickly, and wants a low‑friction path from laptop to production with governance, Prefect is a joy. The retries/mapping model is simple and powerful.

Round 3: Dagster—the data‑product mindset

What I tried

I rewrote the pipeline as software‑defined assets (SDAs). Instead of “run task A then B,” I declared, “the orders_clean table exists and depends on raw_orders.” Dagster then gave me lineage graphs, asset health, and re‑materialization controls out of the box.

The pleasant surprise

We could think in data products instead of job steps. With asset sensors and freshness policies, it was easy to trigger downstream work when an upstream asset changed, and to backfill just the partitions we cared about. This was perfect for dbt‑heavy transformations and ML feature tables.

What the team noticed

The UI’s asset catalog is more than pretty pictures—it made onboarding easier. New teammates grasped the pipeline by reading the graph, not spelunking code. For governance and re‑runs, that visibility was gold.

Trade‑offs I felt

Switching to an asset‑first mental model is a real shift. If your engineers are used to task DAGs, budget time for the learning curve. And while Dagster OSS is strong, Dagster+ introduces a credits model for managed materializations—fine for many teams, but something to price out.

My Dagster takeaway

If lineage, partitions/backfills, and data contracts are front‑and‑center, Dagster makes those concerns first‑class rather than bolt‑ons.

Head‑to‑head (from my notebook)

  • Integrations at enterprise scale → Airflow. The provider catalog saved me days connecting to warehouses, clouds, and SaaS. Paired with deferrables, it’s efficient for long waits.

  • Developer velocity & hybrid Cloud → Prefect. Python decorators, clean retries/mapping, and metadata‑only Cloud made shipping fast and safe.

  • Lineage, selective re‑runs, partitions → Dagster. SDAs + sensors/freshness gave us surgical control over data products and great visibility.

There’s no absolute winner—only the right fit for your constraints.

What I actually shipped (and why)

For this client, we chose Prefect for the initial rollout:

  • We needed speed, low ceremony, and a gentle onboarding curve.
  • The workloads were Python‑heavy with dynamic fan‑out.
  • Security wanted a managed control plane without data leaving our VPC.

Prefect hit those goals with minimal ops and let us keep momentum while we stabilized sources and schemas. If we had a sprawling integration surface or strict batch SLAs across many third‑party systems, I would’ve pushed Airflow. If the mandate had been strict lineage, partitions, and data governance from day one, I’d have argued for Dagster.


A few practical tips from the trenches

  1. Prototype your pain point, not a toy DAG.

    If long waits are killing you, test Airflow’s deferrable operators with your warehouse jobs. The results are tangible in a day.

  2. Don’t over‑optimize day 1.

    I’ve seen teams spend weeks perfecting Kubernetes before a single pipeline is reliable. Prefect’s local → deployment workflow can buy you that time to prove value.

  3. If governance will matter later, model as assets now.

    Even if you don’t adopt Dagster, design pipelines as data products (clear inputs/outputs, contracts). It pays off when audits or lineage questions arrive. Dagster just bakes this into the tooling.

  4. Pick one, design cleanly, keep migration possible.

    Encapsulate business logic away from orchestrator glue. Then swapping engines—should you ever need to—becomes an engineering task, not a re‑platform.
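Tips 3 and 4 can be sketched together in plain Python: give each output an explicit contract, keep the business logic in pure functions, and let the orchestrator see only a thin wrapper. All names here (`AssetContract`, `dedupe_orders`) are hypothetical — the point is the layering, not the specifics.

```python
from dataclasses import dataclass

# --- contract: explicit inputs/outputs make lineage questions answerable ---
@dataclass(frozen=True)
class AssetContract:
    name: str
    depends_on: tuple[str, ...] = ()

ORDERS_CLEAN = AssetContract(name="orders_clean", depends_on=("raw_orders",))

# --- pure business logic: no orchestrator imports anywhere ---
def dedupe_orders(rows: list[dict]) -> list[dict]:
    """Keep the last record seen per order_id (rows assumed pre-sorted by update time)."""
    latest: dict[str, dict] = {}
    for row in rows:
        latest[row["order_id"]] = row
    return list(latest.values())

# --- glue: the only layer that knows which engine you picked.
# Swapping engines means rewriting this wrapper, not the logic above.
# (Engine decorators shown as comments because they are orchestrator-specific:)
# @task(retries=3)   # Prefect
# @dg.op             # Dagster
def dedupe_orders_step(rows: list[dict]) -> list[dict]:
    return dedupe_orders(rows)
```

With this split, "migrate orchestrators" becomes a rewrite of a few dozen wrapper lines, not of the pipeline itself.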


Starter snippets I used (trimmed)

Airflow (TaskFlow with a deferrable wait)

from datetime import datetime, time

from airflow.decorators import dag, task
from airflow.sensors.time_sensor import TimeSensorAsync  # deferrable; in Airflow 3 it moves to the standard provider

@dag(start_date=datetime(2025, 1, 1), schedule="@daily", catchup=False)
def daily_ingest():
    @task
    def pull_s3_keys() -> list[str]:
        # list objects...
        return []

    # free the worker while waiting for the load window;
    # note target_time takes a datetime.time, not a string
    wait = TimeSensorAsync(task_id="window", target_time=time(3, 0))

    @task
    def load_and_transform(keys: list[str]) -> int:
        # load & process in warehouse
        return len(keys)

    keys = pull_s3_keys()
    wait >> load_and_transform(keys)

daily_ingest()

Airflow TaskFlow keeps you in Python while the triggerer handles idle time for deferrable tasks/sensors.

Prefect (fan‑out with retries)

from prefect import flow, task

@task(retries=3, retry_delay_seconds=[1, 2, 4])
def ingest_one(path: str) -> int:
    # read, validate, write to bronze
    return 1

@flow(log_prints=True)
def nightly(files: list[str]) -> int:
    futures = [ingest_one.submit(f) for f in files]
    return sum(f.result() for f in futures)

@flow/@task feels like straight Python. Retries/concurrency are built in, and you can register the flow as a deployment to Prefect Cloud (hybrid).

Dagster (assets with lineage)

import dagster as dg

@dg.asset
def raw_orders() -> str:
    return "s3://bucket/raw/orders.csv"

@dg.asset(deps=[raw_orders])
def orders_clean() -> None:
    # transform + write to warehouse
    ...

# add schedules or sensors for downstream triggers

With software‑defined assets, you’ll see lineage and materializations in the UI and can set freshness/backfill policies.


Closing thoughts

If you’re choosing an orchestrator this quarter, let your constraints pick the winner:

  • Airflow if integrations + strict scheduling at scale are the game.
  • Prefect if you want Python‑first velocity and hybrid Cloud governance.
  • Dagster if data products, lineage, and partitions are the north star.

We shipped Prefect first, and I’d make the same call given the same pressure and team. But all three are excellent—pick one, keep your business logic clean, and your future self (or the next contractor) will thank you.
