Karen Langat

TaskFlow API vs Traditional Operators in Apache Airflow

Introduction

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. It has long been the go-to orchestration platform for data pipelines, but the way DAGs are written has changed as Airflow has evolved. Airflow 2.0 introduced the TaskFlow API, which provides a more Pythonic way to write DAGs using decorators instead of traditional operators.

However, many production systems still rely on traditional operators like PythonOperator. This raises an important question:
Which approach should you use, and when?

In this article, we’ll break down:

  • The difference between TaskFlow API and traditional operators
  • How data is passed between tasks (XComs)
  • A side-by-side comparison
  • When to use each approach in real-world pipelines

Traditional Operators

Before the TaskFlow API, Airflow workflows were built using operators such as PythonOperator.

Example

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime


def extract(ti):
    data = [1, 2, 3]
    # Explicitly push the result to XCom under a named key
    ti.xcom_push(key="data", value=data)


def transform(ti):
    # Explicitly pull the upstream result by task_id and key
    data = ti.xcom_pull(task_ids="extract_task", key="data")
    return [x**2 for x in data]


with DAG(
    dag_id="traditional_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False
) as dag:

    t1 = PythonOperator(
        task_id="extract_task",
        python_callable=extract
    )

    t2 = PythonOperator(
        task_id="transform_task",
        python_callable=transform
    )

    # Dependencies are declared manually
    t1 >> t2


Key Characteristics

  • Explicit task definitions
  • Manual handling of dependencies
  • Explicit use of XCom (xcom_push / xcom_pull)

TaskFlow API

The TaskFlow API introduces decorators like @dag and @task, making DAGs cleaner and easier to read.

Example

from airflow.decorators import dag, task
from datetime import datetime


@dag(
    dag_id="taskflow_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False
)
def pipeline():

    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(data):
        return [x**2 for x in data]

    # Passing extract()'s output to transform() wires the dependency
    # and the XCom exchange automatically
    transform(extract())


dag = pipeline()

Key Characteristics

  • Uses decorators (@task, @dag)
  • Handles dependencies automatically
  • Implicit XCom handling
  • Cleaner and more Pythonic

XComs: The Core Difference

XCom (cross-communication) is Airflow's mechanism for passing small amounts of data between tasks. An XCom is identified by a key, as well as the task_id and dag_id it came from.

In Traditional Operators, values are explicitly pushed and pulled to/from their storage using the xcom_push and xcom_pull methods on Task Instances.

ti.xcom_push(key="data", value=data) 
data = ti.xcom_pull(task_ids="extract_task", key="data")

In the TaskFlow API, XComs are invisible to the developer. When a @task-decorated function returns a value, Airflow automatically pushes it to XCom; when that return value is passed as an argument to the next @task, Airflow automatically pulls it.

@task 
def extract(): 
    return [1, 2, 3]

@task 
def transform(data): 
    return [x**2 for x in data]

Key Differences

  • Push mechanism: Traditional uses ti.xcom_push(key=..., value=...); TaskFlow uses the function's return statement
  • Pull mechanism: Traditional uses ti.xcom_pull(task_ids=..., key=...); TaskFlow uses the function argument
  • Coupling: Traditional couples tasks by task ID string and key name; TaskFlow couples them by Python variable reference
  • Refactoring safety: Low in Traditional (renames break silently); high in TaskFlow (a Python linter catches the issue)
  • Visibility in UI: Traditional shows explicit key labels; TaskFlow values are auto-keyed as return_value
  • Multiple outputs: Traditional needs multiple xcom_push calls with different keys; TaskFlow uses @task(multiple_outputs=True) with a dict return
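
A TaskFlow return value is still an ordinary XCom: it is stored under the standard return_value key, so a traditional task can pull it explicitly. A brief sketch (the task name is illustrative):

# Inside a traditional PythonOperator callable:
data = ti.xcom_pull(task_ids="extract", key="return_value")

# key defaults to "return_value", so this is equivalent:
data = ti.xcom_pull(task_ids="extract")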

Multiple outputs in TaskFlow

When your task produces multiple distinct outputs, use the multiple_outputs=True flag; each key in the returned dict is pushed as its own XCom entry:

@task(multiple_outputs=True)
def extract() -> dict:
    return {
        "order_count": 1500,
        "customer_id": "cust_007"
    }
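Downstream tasks can then consume keys individually by subscripting the task's output. A minimal sketch (the dag_id and report task are illustrative):

from airflow.decorators import dag, task
from datetime import datetime


@dag(
    dag_id="multi_output_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False
)
def multi_output_pipeline():

    @task(multiple_outputs=True)
    def extract() -> dict:
        return {
            "order_count": 1500,
            "customer_id": "cust_007"
        }

    @task
    def report(order_count: int):
        print(f"orders: {order_count}")

    data = extract()
    # Subscripting pulls just that key's XCom downstream
    report(data["order_count"])


dag = multi_output_pipeline()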

When to Use Each Approach

Use the TaskFlow API when:

  • Your pipeline is primarily Python logic
  • You want clean, testable code
  • You need dynamic task mapping (see the sketch after this list)
  • Data passing between tasks is simple
  • You're building greenfield pipelines on Airflow 2.0+; for new projects without legacy constraints, TaskFlow should be your default
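
Dynamic task mapping (Airflow 2.3+) pairs naturally with TaskFlow: calling .expand() on a decorated task creates one mapped task instance per input element. A minimal sketch:

from airflow.decorators import dag, task
from datetime import datetime


@dag(
    dag_id="mapping_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False
)
def mapping_pipeline():

    @task
    def extract():
        return [1, 2, 3]

    @task
    def square(x):
        return x**2

    # One mapped "square" task instance per list element
    square.expand(x=extract())


dag = mapping_pipeline()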

Use Traditional Operators when:

  • You're working with legacy DAGs
  • You need fine-grained operator configuration
  • You're integrating with non-Python operators
  • You're handling complex XCom patterns

In practice, most teams use a hybrid approach, as sketched below:

  • TaskFlow API for core ETL logic
  • Traditional operators for integrations (e.g. Bash, SQL, external systems)

For example:

  • Extract & transform - TaskFlow
  • Load to database - Traditional operators
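
A minimal sketch of that split, with a BashOperator standing in for the load step (the dag_id, task names, and command are illustrative):

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator
from datetime import datetime


@dag(
    dag_id="hybrid_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False
)
def hybrid_pipeline():

    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(data):
        return [x**2 for x in data]

    load = BashOperator(
        task_id="load_task",
        # Pull the upstream TaskFlow result via a Jinja template
        bash_command="echo \"loading {{ ti.xcom_pull(task_ids='transform') }}\"",
    )

    transform(extract()) >> load


dag = hybrid_pipeline()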

Common Mistakes

  • Passing large data via XCom: in both approaches, avoid passing large datasets directly; write them to files or object storage and pass a reference instead (see the sketch below).
  • Mixing patterns incorrectly: switching between TaskFlow and operators without understanding dependencies can lead to broken DAGs.
  • Debugging assumptions: implicit XComs in TaskFlow can hide issues if you don't understand what's being passed.
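
A minimal sketch of the reference-passing pattern (the staging path is hypothetical):

from airflow.decorators import dag, task
from datetime import datetime


@dag(
    dag_id="staging_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False
)
def staging_pipeline():

    @task
    def extract() -> str:
        path = "/tmp/extract_output.json"  # hypothetical staging location
        # ... write the large dataset to `path` here ...
        return path  # only the small path string travels through XCom

    @task
    def transform(path: str):
        # ... read the dataset back from `path` and process it ...
        print(f"processing {path}")

    transform(extract())


dag = staging_pipeline()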

Conclusion

Both TaskFlow API and traditional operators are essential tools in Airflow.

  • TaskFlow API improves readability and developer experience
  • Traditional operators provide flexibility and control

The best approach is not choosing one over the other, but understanding when to use each effectively.
