Rose1845
Airflow DAGs, Tasks, and Operators: A Complete Beginner’s Walkthrough

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology. A web-based UI helps you visualize, manage, and debug your workflows. You can run Airflow in a variety of configurations, from a single process on your laptop to a distributed system capable of handling massive workloads.

Workflows as code
Airflow workflows are defined entirely in Python. This “workflows as code” approach brings several advantages:

  • Dynamic: Pipelines are defined in code, enabling dynamic Dag generation and parameterization.
  • Extensible: The Airflow framework includes a wide range of built-in operators and can be extended to fit your needs.
  • Flexible: Airflow leverages the Jinja templating engine, allowing rich customizations.

Dag
A Dag is a model that encapsulates everything needed to execute a workflow. Some Dag attributes include the following:

  • Schedule: When the workflow should run.
  • Tasks: Discrete units of work that run on workers.
  • Task Dependencies: The order and conditions under which tasks execute.
  • Callbacks: Actions to take when the entire workflow completes.
  • Additional Parameters: Many other operational details, such as retries and timeouts.

Unpacking the three letters (D.A.G.)
Directed. The arrows between tasks go one way. Task A points to Task B. Not the other way around. You can't reverse a dependency.

Acyclic. No loops. Task A cannot eventually depend on itself, directly or indirectly. If it could, the pipeline would run forever. Airflow enforces this rule and will throw an error if you accidentally create a cycle.

Graph. Just a map of connected things. Nodes (your tasks) and edges (the dependencies between them). That's it. Nothing more complicated than what you'd draw on a whiteboard to explain a workflow to a colleague.
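Airflow runs this acyclicity check for you, but the idea fits in a few lines of plain Python (this is a toy sketch, not Airflow code): walk the dependency map depth-first and flag any edge that points back to a task still being visited.

```python
def has_cycle(deps):
    """deps maps each task to the tasks it points to (its downstream tasks)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / finished
    color = {task: WHITE for task in deps}

    def visit(task):
        color[task] = GRAY
        for nxt in deps.get(task, []):
            if color.get(nxt, WHITE) == GRAY:
                return True  # back-edge: this task eventually depends on itself
            if color.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        color[task] = BLACK
        return False

    return any(visit(t) for t in deps if color[t] == WHITE)


acyclic = {"A": ["B"], "B": ["C"], "C": []}   # A -> B -> C: a valid Dag
cyclic = {"A": ["B"], "B": ["A"]}             # A -> B -> A: a loop
```

`has_cycle(acyclic)` is `False`, `has_cycle(cyclic)` is `True`; Airflow rejects the second shape at parse time.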

(Figure: triggering a Dag manually from the Airflow UI.)

In other words, we can say:
"A DAG is a one-directional, no-loop map of your workflow. You define the steps. Airflow figures out the order."

Task
A task is one unit of work. One step in your pipeline. "Fetch data from the API" is a task. "Clean the data" is a task. "Save to CSV" is a task. A task does one job and one job only. The moment a task tries to do three things, it should probably be three tasks.
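The one-job-per-task rule is easy to see in plain Python before any Airflow wiring. Instead of one `do_everything()`, the pipeline above becomes three small functions (the names and sample data are hypothetical), each of which would later back one task:

```python
import csv


def fetch_data():
    """One job: fetch rows (here, pretend API output)."""
    return [{"id": 1, "value": " 42 "}, {"id": 2, "value": None}]


def clean_data(rows):
    """One job: drop empty values and strip whitespace."""
    return [
        {**row, "value": row["value"].strip()}
        for row in rows
        if row["value"] is not None
    ]


def save_to_csv(rows, path):
    """One job: write the cleaned rows to disk."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(rows)


cleaned = clean_data(fetch_data())
```

Small, single-purpose functions like these are exactly what you hand to Airflow as tasks: if one step fails, only that step retries, not the whole pipeline.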

Operator
An operator is the type of task. Airflow comes with many built-in operators for common jobs. Two of the most popular are:

  1. PythonOperator: Runs a Python function. This is what we'll use today.
  2. BashOperator: Runs a shell command. Useful for scripts, CLI tools, anything you'd run in a terminal.
