Marcus Denison

My 10-Minute Airflow Pitch Approach

"Good Morning, can you check if the pipelines ran successfully?". A sentence that probably many of us know or heard a few times before.
But, wouldn't it be nice, if it would be a message that's more like "Hey, just saw that the pipelines haven't failed for a while, great work!" ?

To help teams achieve exactly that, I have a little 10-minute pitch approach I reach for whenever I see that familiar pattern: teams know they need better orchestration and observability, but they don't know where to start, their backlog is full of other priorities, or they are overwhelmed by the landscape of tools out there.

It should also work for anyone who wants to find out whether Apache Airflow is a fit for their organization or team.

Boiled down, only a few commands really matter to get up to speed, assuming you have a working Python and uv environment:

  • uv init
  • uv add apache-airflow==3.2.1
  • export AIRFLOW_HOME=$(pwd)/.airflow_home/
  • export AIRFLOW__CORE__LOAD_EXAMPLES=False
  • uv run airflow standalone

And there you go: head to http://localhost:8080 and you should see a login screen. This is where a slightly tricky part comes in. airflow standalone generates a file called simple_auth_manager_passwords.json.generated in the $AIRFLOW_HOME folder. In our case, I can get the login data by running cat $(pwd)/.airflow_home/simple_auth_manager_passwords.json.generated, which outputs something similar to {"admin": "Sr25aWGkT4n2WaZD"}. As long as that file exists and $AIRFLOW_HOME points to the same directory, those credentials stay valid. You could also create simple_auth_manager_passwords.json.generated beforehand and place it in the $AIRFLOW_HOME folder; that way, you can set your own credentials. This should, of course, never be used anywhere near a production environment!
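
If you want fixed credentials for the demo instead of a generated password, a minimal sketch that seeds the file before the first start could look like this (the username and password below are placeholders I picked, not Airflow defaults):

import json
import os
from pathlib import Path

# Seed the Simple Auth Manager password file before running `airflow standalone`.
airflow_home = Path(os.environ.get("AIRFLOW_HOME", ".airflow_home"))
airflow_home.mkdir(parents=True, exist_ok=True)
passwords_file = airflow_home / "simple_auth_manager_passwords.json.generated"
passwords_file.write_text(json.dumps({"admin": "pick-a-demo-password"}))
print(f"Wrote demo credentials to {passwords_file}")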

But why don't we load the examples? Isn't that exactly what we want to see when evaluating Airflow? Well, as always, it depends.

Having introduced Airflow to a few clients, I already know which use cases are a great fit and when I should perhaps avoid it.

I usually like to pick one very easy example to transfer to Airflow. Imagine an AWS EventBridge schedule that triggers an AWS Lambda function. That works and is reliable. But how accessible is it for co-workers who are not so familiar with AWS and its ecosystem? Is there a single place where you can see what is scheduled and when? It can become very cumbersome to find failing Lambdas or to dig through their logs. Of course, for a few Lambdas this is not a big deal. But imagine a larger data platform with a few hundred jobs running across different AWS services or even different cloud vendors; the picture gets messier with everything that gets added.

How do I decide when to move the logic into a DAG and when Airflow should just trigger a service? For the pitch, I usually start by triggering other services from Airflow, since this can be achieved very quickly and it already shows the benefits of having one centralized tool for orchestration and observability.

To trigger AWS services from Airflow, we need to install the Amazon provider package:

  • uv add apache-airflow-providers-amazon==9.28.0 installs, among other things, the LambdaInvokeFunctionOperator, which can trigger an AWS Lambda function.
  • export AIRFLOW_CONN_AWS_DEFAULT='{"conn_type": "aws", "extra": {"region_name": "eu-central-1", "profile_name": "your-profile"}}' tells the Amazon provider how to connect to your AWS services. It defines a connection called aws_default, which is referenced in the DAG via aws_conn_id.
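
Before wiring this into a DAG, I sometimes sanity-check that the profile and region from the connection extras can actually reach Lambda. A quick sketch outside of Airflow, using plain boto3 (profile and region are the same placeholders as above, adjust to your setup):

import boto3

# Same profile/region as in AIRFLOW_CONN_AWS_DEFAULT.
session = boto3.Session(profile_name="your-profile", region_name="eu-central-1")
lambda_client = session.client("lambda")

# If this lists your functions without an error, the Airflow connection should work too.
for function in lambda_client.list_functions()["Functions"]:
    print(function["FunctionName"])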

This is the example AWS Lambda function we want to trigger:

def lambda_handler(event, context):
    # The payload sent from the Airflow DAG below contains "source" and "run" keys.
    print("value1 = " + event['source'])
    print("value2 = " + event['run'])
    return "Hello from the AWS Lambda"

The DAG file goes into $(pwd)/airflow/dags, and Airflow also needs to know about this: export AIRFLOW__CORE__DAGS_FOLDER=$(pwd)/airflow/dags

import json
from datetime import datetime, timedelta

from airflow.sdk import dag, task
from airflow.providers.amazon.aws.operators.lambda_function import (
    LambdaInvokeFunctionOperator,
)


@dag(
    dag_id="invoke_lambda_function",
    start_date=datetime(2026, 1, 1),
    schedule=None,
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(seconds=30),
    },
    max_active_runs=1
)
def invoke_lambda_function():
    invoke = LambdaInvokeFunctionOperator(
        task_id="invoke_lambda_function",
        function_name="trigger_from_airflow",
        aws_conn_id="aws_default",
        invocation_type="RequestResponse",
        # "{{ ds }}" is templated by Airflow with the logical date of the run.
        payload=json.dumps({"source": "airflow", "run": "{{ ds }}"}),
    )

    @task
    def report(response: dict) -> None:
        # The Lambda's response is passed in via XCom from the invoke task.
        print(f"Lambda responded: {response}")

    report(invoke.output)


invoke_lambda_function()
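
If you want to smoke-test the DAG without the scheduler or the UI, Airflow's dag.test() runs it in a single local process. A small, optional sketch you could append to the bottom of the DAG file (assuming your local AWS credentials work), not something the pitch strictly needs:

if __name__ == "__main__":
    # Runs every task of this DAG locally; handy for a quick check before the demo.
    invoke_lambda_function().test()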

With this minimal setup, I am able to demonstrate a few key features which might be important to a data team:

  • The seamless integration with a cloud provider.
  • The visualization of current job executions and the history of past job executions.
  • The visualization of a DAG with multiple tasks.
  • How retries are handled and visualized by Airflow. This is easy to demo by making the Lambda fail on purpose.
  • How Airflow can make sure that multiple runs of the same workflow do not overlap (via max_active_runs=1 above). This is something AWS EventBridge does not provide.

Making small changes to the AWS Lambda function, such as forcing a failure or slowing it down, is a quick way to demo retries, self-healing, and long-running workflows.
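
For that, a hypothetical variant of the Lambda from above that fails or sleeps on demand is enough, for example:

import time


def lambda_handler(event, context):
    # Fail on purpose when the payload asks for it, so Airflow's retries kick in.
    if event.get("force_failure"):
        raise RuntimeError("Intentional failure to demonstrate Airflow retries")
    # Optionally slow the function down to show a long-running task in the UI.
    time.sleep(event.get("sleep_seconds", 0))
    return "Hello from the AWS Lambda"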

This is usually the point where teams decide to move forward with Airflow or ask further questions.

For a complete short example, go to https://github.com/mdenison/airflow-standalone-pitch.

And thanks to the people who maintain Airflow, there is a neat little Quick Start tutorial, which is where most of this comes from.
