Rain Leander
Automating a Scones Baking Workflow with Apache Airflow

Apache Airflow is an open-source platform used to author, schedule, and monitor workflows. It's widely used for orchestrating ETL processes, machine learning pipelines, and various other data processing tasks. In this blog post, we will demonstrate how to get started with Apache Airflow using a fun and simple example: automating a scones baking workflow based on the provided recipe.

Recipe: Scones

Ingredients

2 cups flour
4 teaspoons baking powder
3/4 teaspoon salt
1/3 cup sugar
4 tablespoons butter
2 tablespoons shortening
3/4 cup cream
1 egg
Handful dried currants or dried cranberries

Instructions

Heat oven to 375°F (190°C).
In a large mixing bowl, combine flour, baking powder, salt, and sugar. Mix well.
Cut in butter and shortening.
In a separate bowl, combine cream with beaten egg then add to dry ingredients. Stir in fruit.
Turn dough out onto a floured surface. Roll dough out and cut into biscuit-sized rounds.
Bake for 15 minutes or until brown.

Setting up Apache Airflow

  • Installation
  • Initialization
  • Create an Airflow User
  • Start the Airflow Webserver
  • Start the Airflow Scheduler

(For detailed steps, refer to the previous post in this series!)
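As a quick refresher, the setup steps above map to a handful of standard Airflow CLI commands. This is a minimal sketch for Airflow 2.x; adjust the version, port, and user details to your environment:

```shell
# Install Airflow (pinning a version with the official constraints file is recommended)
pip install apache-airflow

# Initialize the metadata database (SQLite by default)
airflow db init

# Create an admin user for the web UI (you will be prompted for a password)
airflow users create \
    --username admin \
    --firstname Rain \
    --lastname Leander \
    --role Admin \
    --email admin@example.com

# Start the webserver (default port 8080) and, in a second terminal, the scheduler
airflow webserver --port 8080
airflow scheduler
```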

Creating a Scones Baking Workflow using Apache Airflow

Let's create a simple Directed Acyclic Graph (DAG) that represents the scones baking workflow. We'll use the PythonOperator to define each step in the recipe as a separate task.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def heat_oven():
    print("Heating oven to 375 degrees.")

def mix_dry_ingredients():
    print("Mixing dry ingredients.")

def cut_butter_shortening():
    print("Cutting in butter and shortening.")

def combine_wet_dry_ingredients():
    print("Combining wet and dry ingredients and stirring in fruit.")

def roll_cut_dough():
    print("Rolling dough and cutting into biscuit-sized rounds.")

def bake_scones():
    print("Baking scones for 15 minutes or until brown.")

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
    'retries': 0,
}

dag = DAG(
    'scones_baking_workflow',
    default_args=default_args,
    description='A scones baking workflow using Apache Airflow',
    schedule_interval=None,
    catchup=False,
)

heat_oven_task = PythonOperator(task_id='heat_oven', python_callable=heat_oven, dag=dag)
mix_dry_ingredients_task = PythonOperator(task_id='mix_dry_ingredients', python_callable=mix_dry_ingredients, dag=dag)
cut_butter_shortening_task = PythonOperator(task_id='cut_butter_shortening', python_callable=cut_butter_shortening, dag=dag)
combine_wet_dry_ingredients_task = PythonOperator(task_id='combine_wet_dry_ingredients', python_callable=combine_wet_dry_ingredients, dag=dag)
roll_cut_dough_task = PythonOperator(task_id='roll_cut_dough', python_callable=roll_cut_dough, dag=dag)
bake_scones_task = PythonOperator(task_id='bake_scones', python_callable=bake_scones, dag=dag)

# Set task dependencies
heat_oven_task >> mix_dry_ingredients_task >> cut_butter_shortening_task >> combine_wet_dry_ingredients_task >> roll_cut_dough_task >> bake_scones_task

This DAG defines six tasks, each representing a step in the scones baking process. The tasks are executed sequentially, as defined by the task dependencies set using the >> operator.
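Conceptually, the scheduler walks this chain in dependency order, running each task only after its upstream task succeeds. As a plain-Python sketch of that sequential execution (no Airflow required; the callables here are trimmed to two steps for brevity):

```python
# Minimal sketch of sequential task execution, mirroring the linear DAG above.
# The list order encodes the same chain that the >> operator expresses.

def heat_oven():
    return "Heating oven to 375 degrees."

def mix_dry_ingredients():
    return "Mixing dry ingredients."

tasks = [
    ("heat_oven", heat_oven),
    ("mix_dry_ingredients", mix_dry_ingredients),
]

def run_pipeline(tasks):
    """Run each task in order, collecting results by task id."""
    results = {}
    for task_id, task_callable in tasks:
        results[task_id] = task_callable()
    return results

print(run_pipeline(tasks))
```

Airflow adds scheduling, retries, logging, and a UI on top of exactly this kind of ordered execution.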

To deploy the DAG, save the script as a Python file in the dags folder under your Airflow home directory (~/airflow/dags by default).

Monitoring and Managing the Scones Baking Workflow

With the scones baking workflow DAG deployed and the Airflow components running, you can now monitor and manage your DAG using the Airflow web interface. To trigger a run of the scones baking workflow, find your DAG in the list and click on the "Play" button. You can view the progress of each task, examine the logs for details about the baking process, and visualize the task dependencies.
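Besides the web UI, the Airflow CLI can trigger the DAG or exercise a single task in isolation. Two standard commands, assuming the DAG id defined above:

```shell
# Trigger a run of the whole DAG
airflow dags trigger scones_baking_workflow

# Run one task by itself for a given logical date (no scheduler needed;
# handy for debugging a single step)
airflow tasks test scones_baking_workflow heat_oven 2023-01-01
```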

In this example, the tasks are simple print statements, but in real-world scenarios, you could use Python code to interact with various tools, APIs, or systems relevant to the tasks at hand.
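For instance, a callable might validate a sensor reading before baking and raise an error so Airflow marks the task as failed. This is a hypothetical helper for illustration, not a real oven API:

```python
# Hypothetical task callable: check the oven temperature before baking.
# Raising an exception would cause Airflow to mark the task as failed
# (and retry it, if retries are configured).

TARGET_TEMP_F = 375
TOLERANCE_F = 10

def check_oven_temperature(current_temp_f: float) -> str:
    """Fail if the oven is not within tolerance of the target temperature."""
    if abs(current_temp_f - TARGET_TEMP_F) > TOLERANCE_F:
        raise ValueError(
            f"Oven at {current_temp_f}F, expected {TARGET_TEMP_F}F +/- {TOLERANCE_F}F"
        )
    return f"Oven ready at {current_temp_f}F"

print(check_oven_temperature(380))
```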

Now you know how to get started with Apache Airflow using a simple and fun example: automating a scones baking workflow based on a given recipe. While this example is not a typical use case for Apache Airflow, it showcases the flexibility and power of the platform for orchestrating complex workflows.

As you become more familiar with Apache Airflow, you can explore more advanced features such as branching, parallelism, dynamic pipelines, and custom operators to manage even more complex data processing tasks and workflows in your projects. Moreover, you can integrate Airflow with various data processing tools, databases, and cloud services, enhancing the capabilities of your data pipelines.

By leveraging Apache Airflow, you can create, schedule, and monitor workflows for diverse use cases, from simple tasks like our scones baking example to more advanced data engineering and machine learning pipelines. The platform's extensibility, flexibility, and ease of use make it an invaluable tool for managing data workflows and ensuring that your projects run smoothly and efficiently.
