Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology. A web-based UI helps you visualize, manage, and debug your workflows. You can run Airflow in a variety of configurations, from a single process on your laptop to a distributed system capable of handling massive workloads.
Workflows as code
Airflow workflows are defined entirely in Python. This “workflows as code” approach brings several advantages:
- Dynamic: Pipelines are defined in code, enabling dynamic Dag generation and parameterization (see the sketch after this list).
- Extensible: The Airflow framework includes a wide range of built-in operators and can be extended to fit your needs.
- Flexible: Airflow leverages the Jinja templating engine, allowing rich customizations.
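To make the "dynamic" point concrete, here is a minimal sketch of generating tasks in a plain Python loop. It assumes Airflow 2.4+ import paths, and the dag ID and table names are made up for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def export_table(table_name):
    # Placeholder for real export logic.
    print(f"Exporting {table_name}")


with DAG(
    dag_id="dynamic_exports",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # A plain Python loop creates one task per table. No YAML, no config files.
    for table in ["users", "orders", "payments"]:
        PythonOperator(
            task_id=f"export_{table}",
            python_callable=export_table,
            op_kwargs={"table_name": table},
        )
```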
Dag
A Dag is a model that encapsulates everything needed to execute a workflow. Some Dag attributes include the following:
- Schedule: When the workflow should run.
- Tasks: Discrete units of work that run on workers.
- Task Dependencies: The order and conditions under which tasks execute.
- Callbacks: Actions to take when the entire workflow completes.
- Additional Parameters: Many other operational details.
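Here is a rough sketch of how those attributes look in code (again assuming Airflow 2.4+ import paths; `notify_success` is a hypothetical callback):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator


def notify_success(context):
    # Hypothetical callback; swap in Slack, email, whatever you use.
    print(f"Dag {context['dag'].dag_id} finished successfully")


with DAG(
    dag_id="daily_report",
    schedule="@daily",                    # Schedule: when the workflow runs
    start_date=datetime(2024, 1, 1),
    catchup=False,
    on_success_callback=notify_success,   # Callback: fires when the Dag succeeds
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    extract >> load  # Dependency: extract must finish before load starts
```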
Unpacking the three words (D.A.G.)
Directed. The arrows between tasks go one way. Task A points to Task B. Not the other way around. You can't reverse a dependency.
Acyclic. No loops. Task A cannot eventually depend on itself, directly or indirectly. If it could, the pipeline would run forever. Airflow enforces this rule and will throw an error if you accidentally create a cycle.
Graph. Just a map of connected things. Nodes (your tasks) and edges (the dependencies between them). That's it. Nothing more complicated than what you'd draw on a whiteboard to explain a workflow to a colleague.
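In code, the arrows are literal: the `>>` operator draws a directed edge between tasks. A minimal sketch (assuming Airflow 2.4+, where `EmptyOperator` and the `schedule` argument are available):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(dag_id="dag_shape", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    a = EmptyOperator(task_id="a")
    b = EmptyOperator(task_id="b")
    c = EmptyOperator(task_id="c")

    a >> b >> c  # Directed: a before b, b before c

    # Uncommenting the next line would close the loop (a -> b -> c -> a),
    # and Airflow would reject the Dag with a cycle error at parse time:
    # c >> a
```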

(Screenshot: triggering a Dag manually from the Airflow UI.)

In other words, we can say:
"A DAG is a one-directional, no-loop map of your workflow. You define the steps. Airflow figures out the order."
Task
A task is one unit of work. One step in your pipeline. "Fetch data from the API" is a task. "Clean the data" is a task. "Save to CSV" is a task. A task does one job and one job only. The moment a task is trying to do three things, it should probably be three tasks.
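As a sketch, here is that fetch/clean/save split written with Airflow's TaskFlow API; the function bodies are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(dag_id="etl_pipeline", start_date=datetime(2024, 1, 1), schedule=None) as dag:

    @task
    def fetch_data():
        # Placeholder: call your API here.
        return [{"id": 1}, {"id": 2}]

    @task
    def clean_data(records):
        # Placeholder: drop bad rows, fix types, and so on.
        return [r for r in records if r]

    @task
    def save_to_csv(records):
        # Placeholder: write the records out.
        print(f"Saving {len(records)} records")

    # Three jobs, three tasks, wired in order.
    save_to_csv(clean_data(fetch_data()))
```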
Operator
An operator is the type of task. Airflow comes with a bunch of built-in operators for common jobs. Two popular ones (sketched in code after this list):
- PythonOperator: Runs a Python function. This is what we'll use today.
- BashOperator: Runs a shell command. Useful for scripts, CLI tools, anything you'd run in a terminal.
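A minimal sketch showing both operators side by side (Airflow 2.x-style imports; the dag ID, callable, and command are trivial placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello from a Python task")


with DAG(dag_id="operator_demo", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    # PythonOperator: wraps a plain Python function.
    hello = PythonOperator(task_id="say_hello", python_callable=say_hello)

    # BashOperator: runs a shell command, just like in a terminal.
    list_files = BashOperator(task_id="list_files", bash_command="ls -l")

    hello >> list_files
```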

