Introduction
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. It has long been the go-to orchestration platform for data pipelines, but the way DAGs are written has changed as Airflow evolved. Airflow 2.0 introduced the TaskFlow API to provide a more Pythonic way to write DAGs, using decorators instead of traditional operators.
However, many production systems still rely on traditional operators like PythonOperator. This raises an important question:
Which approach should you use, and when?
In this article, we’ll break down:
- The difference between TaskFlow API and traditional operators
- How data is passed between tasks (XComs)
- A side-by-side comparison
- When to use each approach in real-world pipelines
Traditional Operators
Before the TaskFlow API, Airflow workflows were built using operators.
Example
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def extract(ti):
    data = [1, 2, 3]
    ti.xcom_push(key="data", value=data)

def transform(ti):
    data = ti.xcom_pull(task_ids="extract_task", key="data")
    return [x**2 for x in data]

with DAG(
    dag_id="traditional_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(
        task_id="extract_task",
        python_callable=extract,
    )
    t2 = PythonOperator(
        task_id="transform_task",
        python_callable=transform,
    )

    t1 >> t2
```
Key Characteristics
- Explicit task definitions
- Manual handling of dependencies
- Explicit use of XCom (xcom_push / xcom_pull)
TaskFlow API
The TaskFlow API introduces decorators like @dag and @task, making DAGs cleaner and easier to read.
Example
```python
from airflow.decorators import dag, task
from datetime import datetime

@dag(
    dag_id="taskflow_dag",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
)
def pipeline():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(data):
        return [x**2 for x in data]

    transform(extract())

dag = pipeline()
```
Key Characteristics
- Uses decorators (@task, @dag)
- Handles dependencies automatically
- Implicit XCom handling
- Cleaner and more Pythonic
XComs: The Core Difference
XCom (cross-communication) is Airflow's mechanism for passing small amounts of data between tasks. An XCom is identified by a key, as well as the task_id and dag_id it came from.
In Traditional Operators, values are explicitly pushed to and pulled from XCom storage using the xcom_push and xcom_pull methods on the Task Instance (ti):

```python
ti.xcom_push(key="data", value=data)
data = ti.xcom_pull(task_ids="extract_task", key="data")
```
In the TaskFlow API, XComs are made invisible to the developer. When a @task-decorated function returns a value, Airflow automatically pushes it to XCom. When that return value is passed as an argument to another @task, Airflow automatically pulls it.
```python
@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [x**2 for x in data]
```
Key Differences
| Aspect | Traditional | TaskFlow |
|---|---|---|
| Push mechanism | `ti.xcom_push(key=..., value=...)` | Function return statement |
| Pull mechanism | `ti.xcom_pull(task_ids=..., key=...)` | Function argument |
| Coupling | Coupled by task ID string and key name | Coupled by Python variable reference |
| Refactoring safety | Low - renames break silently | High - Python linter catches issues |
| Visibility in UI | Explicit key labels | Auto-keyed as `return_value` |
| Multiple outputs | Multiple `xcom_push` calls with different keys | Use `@task(multiple_outputs=True)` with a dict return |
Multiple Outputs in TaskFlow
When your task produces multiple distinct outputs, use the multiple_outputs=True flag:
```python
@task(multiple_outputs=True)
def extract() -> dict:
    return {
        "order_count": 1500,
        "customer_id": "cust_007",
    }
```
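Each key of the returned dict becomes its own XCom entry, so downstream tasks can pick out individual values by subscripting the task's output. A minimal sketch (the report task here is hypothetical, added only to show consumption):

```python
from airflow.decorators import task

@task
def report(count, customer):
    # Hypothetical downstream task; names are illustrative.
    print(f"{customer} placed {count} orders")

# Inside a DAG definition: subscripting the TaskFlow output
# pulls the matching XCom key automatically.
data = extract()
report(count=data["order_count"], customer=data["customer_id"])
```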
When to Use Each Approach
Use the TaskFlow API when:
- Your pipeline is primarily Python logic
- You want clean, testable code
- You need dynamic task mapping (see the sketch after this list)
- Data passing between tasks is simple
- You're building greenfield pipelines on Airflow 2.0+. For new projects without legacy constraints, TaskFlow should be your default.
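Dynamic task mapping (available since Airflow 2.3) is where TaskFlow is particularly natural: .expand() fans a task out over a list whose size is only known at runtime. A minimal sketch reusing the squared-numbers example from above:

```python
from airflow.decorators import dag, task
from datetime import datetime

@dag(start_date=datetime(2025, 1, 1), schedule="@daily", catchup=False)
def mapped_pipeline():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def square(x):
        return x**2

    # One mapped task instance is created per element at runtime.
    square.expand(x=extract())

mapped_pipeline()
```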
Use Traditional Operators when:
- Working with legacy DAGs
- You need fine-grained operator configuration
- Integrating with non-Python operators
- Handling complex XCom patterns
In practice, most teams use a hybrid approach:
- TaskFlow API for core ETL logic
- Traditional operators for integrations (e.g. Bash, SQL, external systems)

For example:
- Extract & transform - TaskFlow
- Load to database - Traditional Operators
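A minimal sketch of that split, with @task functions for extract and transform and a traditional BashOperator standing in for the load step (the echo command is a placeholder for a real load):

```python
from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator
from datetime import datetime

@dag(start_date=datetime(2025, 1, 1), schedule="@daily", catchup=False)
def hybrid_pipeline():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(data):
        return [x**2 for x in data]

    transformed = transform(extract())

    # Traditional operator for the load step; it pulls the TaskFlow
    # task's return value from XCom via Jinja templating.
    load = BashOperator(
        task_id="load",
        bash_command=(
            "echo 'loading {{ ti.xcom_pull(task_ids=\"transform\") }}'"
        ),
    )

    transformed >> load

hybrid_pipeline()
```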
Common Mistakes
- Passing large data via XCom: for both approaches, avoid passing large datasets directly; use files or object storage instead (see the sketch after this list).
- Mixing patterns incorrectly: Switching between TaskFlow and operators without understanding dependencies can lead to broken DAGs.
- Debugging assumptions: Implicit XComs in TaskFlow can hide issues if you don’t understand what’s being passed.
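For the first point, the usual fix is to write the dataset somewhere durable and pass only a reference through XCom. A minimal sketch assuming a shared local path; in production this would typically be object storage such as S3:

```python
import json
from airflow.decorators import task

@task
def extract(ds=None):
    # 'ds' (the logical date) is injected by Airflow when declared.
    path = f"/tmp/extract_{ds}.json"
    with open(path, "w") as f:
        json.dump(list(range(1_000_000)), f)  # the heavy payload
    return path  # only the small path string travels through XCom

@task
def transform(path):
    with open(path) as f:
        data = json.load(f)
    return sum(data)  # return a small summary, not the full dataset
```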
Conclusion
Both TaskFlow API and traditional operators are essential tools in Airflow.
- TaskFlow API improves readability and developer experience.
- Traditional operators provide flexibility and control.
The best approach is not choosing one over the other, but understanding when to use each effectively.