Aki for AWS Community Builders

Posted on May 22

Organizing How to Use AWS Glue Workflow

#aws #dataengineering

Original Japanese article: AWS Glue Workflowの使い方について整理してみる ("Organizing How to Use AWS Glue Workflow")

Introduction

I'm Aki, an AWS Community Builder (@jitepengin).

Previously, I wrote an article comparing when to use AWS Step Functions versus Glue Workflow.
Organizing the Use Cases of AWS Step Functions and Glue Workflow for ETL Processing with AWS Glue Jobs

As I mentioned there, I personally like Glue Workflow and consider it an excellent service that balances simplicity and low cost.

However, in recent years, Step Functions has become increasingly mainstream, and I get the impression that opportunities to work with Glue Workflow have decreased.
Because of that, I think many people are unsure about how they should actually use it in practice.

So in this article, I’d like to organize everything from the basics of Glue Workflow to practical usage patterns.
I hope this helps more people become interested in Glue Workflow.

What Is Glue Workflow?

AWS Glue Workflow is Glue’s native workflow orchestration feature.
It defines ETL pipelines by combining the following three elements:

Component	Role
Glue Job	The actual ETL processing using Spark or Python Shell
Glue Crawler	Scans data sources such as S3 and registers table definitions in the Data Catalog
Glue Trigger	Defines execution conditions for Jobs and Crawlers (schedule, event, conditional, etc.)

By connecting these components as a DAG (Directed Acyclic Graph), you can build ETL pipelines.

Another characteristic is that workflows can be visually configured through the Glue console GUI.

A major advantage is that workflows themselves incur no additional cost—you only pay for Job and Crawler execution.

Details of the Core Components

The core of Glue Workflow is the Trigger mechanism.
There are four trigger types, each with different roles.

Trigger Type	Execution Condition	Typical Use Case
SCHEDULED	Scheduled execution using cron expressions (UTC, minimum 5-minute interval)	Periodic execution such as daily ETL
ON_DEMAND	Manual execution or via API/SDK	Arbitrary execution timing
CONDITIONAL	Triggered based on the status of preceding Jobs/Crawlers	Chained execution between Jobs
EVENT	Triggered by EventBridge events	Event-driven pipelines

The typical pattern is:

Use SCHEDULED, ON_DEMAND, or EVENT as the workflow’s starting trigger
Use CONDITIONAL to connect downstream Jobs

By the way, AWS officially recommends keeping the number of elements included in a workflow (Jobs + Crawlers + Triggers) under 100.
Exceeding this recommendation can cause errors when resuming or stopping Workflow Runs.

CONDITIONAL Trigger Usage Patterns

CONDITIONAL Triggers are one of the key features that provide flexibility in Glue Workflow.
Here are several representative patterns.

1. Simple Sequential Pattern

This is the most basic usage pattern: “Run JobB after JobA succeeds.”

JobA (SUCCEEDED) → JobB

You can implement this simply by configuring a CONDITIONAL Trigger with the condition:
“Start when JobA reaches the SUCCEEDED state.”
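Expressed with boto3, such a trigger might look like the following sketch. `sequential_trigger` is a hypothetical helper that only builds the `create_trigger` arguments; `EQUALS` is the only `LogicalOperator` Glue supports in a condition.

```python
def sequential_trigger(workflow_name: str, trigger_name: str,
                       upstream_job: str, downstream_job: str) -> dict:
    """create_trigger arguments for 'run downstream_job after upstream_job succeeds'."""
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            # With a single condition, AND and ANY behave the same way
            "Logical": "ANY",
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream_job, "State": "SUCCEEDED"}
            ],
        },
        "Actions": [{"JobName": downstream_job}],
    }
```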

2. Waiting for Multiple Jobs to Complete (AND Condition)

This pattern runs JobC only after both JobA and JobB succeed.

JobA (SUCCEEDED) ┐
                 ├→ JobC
JobB (SUCCEEDED) ┘

This can be implemented by setting Logical: AND in the CONDITIONAL Trigger predicate and listing multiple conditions.

This is useful in scenarios such as:
“Run an aggregation Job only after multiple data sources have finished loading.”
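A sketch of the AND version, again only building `create_trigger` arguments (the helper `join_trigger` and the job names are illustrative):

```python
def join_trigger(workflow_name: str, trigger_name: str,
                 upstream_jobs: list, downstream_job: str) -> dict:
    """create_trigger arguments that fire only after every upstream job has SUCCEEDED."""
    if len(upstream_jobs) < 2:
        raise ValueError("an AND join needs at least two upstream jobs")
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Logical": "AND",  # every listed condition must hold
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
                for job in upstream_jobs
            ],
        },
        "Actions": [{"JobName": downstream_job}],
    }
```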

3. Triggering When Any Job Completes (ANY Condition)

This pattern runs JobC when either JobA or JobB succeeds.

The predicate of a CONDITIONAL Trigger has a Logical field where you can specify either AND or ANY.
Using ANY causes the trigger to fire as soon as any one of the specified conditions is satisfied.

Note that although this behavior is logically equivalent to “OR,” the actual Glue configuration value is ANY.
This is important when defining workflows using IaC or CLI because specifying OR will result in an error.
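To catch the `OR` mistake before it reaches the API, a small builder can validate the `Logical` value up front. This is an illustrative sketch with a hypothetical helper name, not part of any AWS SDK:

```python
ALLOWED_LOGICAL = ("AND", "ANY")


def conditional_predicate(upstream_jobs: list, logical: str = "ANY") -> dict:
    """Build a CONDITIONAL trigger Predicate over the SUCCEEDED state of several jobs.

    Glue accepts only "AND" or "ANY" for Logical; "OR" is rejected,
    so this sketch fails fast locally instead of waiting for the API error.
    """
    if logical not in ALLOWED_LOGICAL:
        raise ValueError(f"Logical must be one of {ALLOWED_LOGICAL}, got {logical!r}")
    return {
        "Logical": logical,
        "Conditions": [
            {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
            for job in upstream_jobs
        ],
    }
```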

4. Failure Branching (Catching FAILED States)

CONDITIONAL Triggers can react not only to SUCCEEDED, but also to states such as:

FAILED
STOPPED
TIMEOUT
ERROR

(The supported states differ slightly between Jobs and Crawlers.)

Using this feature, you can create patterns such as launching a notification Job (for example, a Python Shell Job that publishes to SNS) when a Job fails.

JobA (SUCCEEDED) → Downstream Processing
JobA (FAILED)    → Notification Job

For relatively simple error handling, this approach allows you to avoid introducing Step Functions.
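The two branches from the diagram above map to two separate CONDITIONAL triggers watching the same Job. In this sketch (hypothetical helper and trigger names), the failure branch uses ANY so that either FAILED or TIMEOUT launches the notification Job:

```python
def branch_triggers(workflow_name: str, job: str,
                    on_success_job: str, on_failure_job: str) -> list:
    """Two CONDITIONAL triggers: one for the happy path, one that catches FAILED/TIMEOUT."""
    def condition(state: str) -> dict:
        return {"LogicalOperator": "EQUALS", "JobName": job, "State": state}

    success = {
        "Name": f"{job}-succeeded",
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {"Logical": "ANY", "Conditions": [condition("SUCCEEDED")]},
        "Actions": [{"JobName": on_success_job}],
    }
    failure = {
        "Name": f"{job}-failed",
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        # ANY: fire on either a failure or a timeout of the same job
        "Predicate": {"Logical": "ANY",
                      "Conditions": [condition("FAILED"), condition("TIMEOUT")]},
        "Actions": [{"JobName": on_failure_job}],
    }
    return [success, failure]
```

Each dict is passed to its own `create_trigger` call.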

Managing Parameters with `default_run_properties`

Glue Workflow provides a property called default_run_properties, which acts like globally shared variables across the workflow.

How It Works

default_run_properties stores key-value pairs that can be referenced by all Jobs within the workflow.

It functions as the default set of parameters passed during Workflow execution and serves as the foundation for sharing information between Jobs.

One important note:
Run Property values may appear in logs, so you should avoid storing secrets directly in them.

Instead, retrieve secrets through services such as:

AWS Secrets Manager
Glue Connections

How to Configure It

There are three main configuration methods.

Configure from the Console

In the Glue console:
Workflow → Edit Properties

You can add key-value pairs there.

Configure via boto3

import boto3

glue = boto3.client('glue')

glue.create_workflow(
    Name='my-workflow',
    DefaultRunProperties={
        'env': 'production',
        'target_date': '2026-05-21'
    }
)

Configure via IaC (CloudFormation, etc.)

You can specify DefaultRunProperties in the AWS::Glue::Workflow resource.
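As a sketch, here is a minimal template built as a Python dict that you could serialize with `json.dumps` and deploy; the logical ID `EtlWorkflow` and the property values are placeholders:

```python
def workflow_template(name: str, default_run_properties: dict) -> dict:
    """Minimal CloudFormation template (as a dict) declaring one AWS::Glue::Workflow."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "EtlWorkflow": {  # logical ID chosen for illustration
                "Type": "AWS::Glue::Workflow",
                "Properties": {
                    "Name": name,
                    # Key-value pairs that every Job in the workflow can read
                    "DefaultRunProperties": default_run_properties,
                },
            }
        },
    }
```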

Static Parameters vs Dynamic Parameters

Typical usage patterns include:

Static parameters
- Environment names (dev / prod)
- S3 bucket names
- Values that rarely change
Dynamic parameters
- Processing dates
- Execution IDs
- Values that change per execution

Dynamic parameters can be updated either by:

Passing RunProperties to start_workflow_run
Dynamically updating them later using put_workflow_run_properties
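A sketch of the first option, starting a run with a per-run `target_date`. It assumes that keys passed in `RunProperties` take precedence over the workflow's `DefaultRunProperties` for that run, and that the run should process the previous day by default; `start_daily_run` is a hypothetical helper, and `glue` is a boto3 Glue client passed in by the caller.

```python
from datetime import date, timedelta


def start_daily_run(glue, workflow_name: str, run_date: date = None) -> str:
    """Start a Workflow Run, setting the dynamic 'target_date' for this run only.

    Static defaults such as 'env' stay in DefaultRunProperties and are not passed here.
    """
    # Assumption: by default the run processes yesterday's data
    run_date = run_date or date.today() - timedelta(days=1)
    response = glue.start_workflow_run(
        Name=workflow_name,
        RunProperties={"target_date": run_date.isoformat()},  # values must be strings
    )
    return response["RunId"]
```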

Passing Data Between Jobs

Using default_run_properties as the foundation, let’s look at how to exchange data between Jobs.

Dynamically Updating Run Properties

During Job execution, you can dynamically update Workflow Run properties by calling the put_workflow_run_properties API.

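import sys
import boto3
from awsglue.utils import getResolvedOptions

# WORKFLOW_NAME and WORKFLOW_RUN_ID are passed automatically when the Job runs inside a Workflow
args = getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])

glue = boto3.client('glue')

# Existing keys are overwritten and new keys are added; values must be strings
glue.put_workflow_run_properties(
    Name=args['WORKFLOW_NAME'],
    RunId=args['WORKFLOW_RUN_ID'],
    RunProperties={
        'processed_records': '12345',
        'output_path': 's3://mybucket/output/2026-05-21/'
    }
)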

This allows downstream Jobs to reference values calculated by upstream Jobs.

Retrieving Properties from a PySpark Job

Inside a PySpark Job, you first retrieve the Workflow Run ID and then call get_workflow_run_properties.

import sys
import boto3
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(
    sys.argv,
    ['JOB_NAME', 'WORKFLOW_NAME', 'WORKFLOW_RUN_ID']
)

glue = boto3.client('glue')

response = glue.get_workflow_run_properties(
    Name=args['WORKFLOW_NAME'],
    RunId=args['WORKFLOW_RUN_ID']
)

properties = response['RunProperties']

target_date = properties.get('target_date')

WORKFLOW_NAME and WORKFLOW_RUN_ID are special arguments automatically passed when a Job is launched through a Workflow.

Retrieving Properties from a Python Shell Job

The basic approach is the same for Python Shell Jobs:

Retrieve arguments using getResolvedOptions
Access properties through boto3

Since Python Shell Jobs do not need to initialize a SparkContext, their code can be simpler and lighter.

They are also cheaper than Spark-based Glue Jobs, so depending on the requirements, they can be a good option.
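As a small Python Shell sketch, the helper below reads the Run Properties and layers them over local fallback values, so a Job still has sensible values for keys the workflow did not set. The helper name and the fallback dict are hypothetical; `workflow_name` and `run_id` would come from `getResolvedOptions` as in the PySpark example.

```python
def resolve_properties(glue, workflow_name: str, run_id: str, defaults: dict) -> dict:
    """Return Run Properties merged over local fallbacks; values set on the run win."""
    response = glue.get_workflow_run_properties(Name=workflow_name, RunId=run_id)
    return {**defaults, **response["RunProperties"]}
```

Because Run Property values are always strings, cast them where needed, for example `int(props['processed_records'])`.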

Personally, I also like Python Shell Jobs, although I feel opportunities to use them in real-world projects have decreased, which is a bit unfortunate.

I’ve also written articles about Python Shell Jobs, so feel free to check them out.

S3 Triggers: How to Launch Glue Python Shell via AWS Lambda

Anti-Patterns for Data Passing

default_run_properties should only be used for metadata-like information exchange.

The following usage patterns should generally be avoided:

Passing large datasets directly
- Store the data itself in S3 and pass only the path
Storing secrets
- Use Secrets Manager or Glue Connections instead
Frequently rewriting properties
- This increases API calls, and concurrent Jobs updating the same key can overwrite each other's values (a race condition)
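A sketch that follows all three rules at once: the payload goes to S3 as JSON Lines, and only its path plus a record count are published, in a single `put_workflow_run_properties` call. The function name, bucket, and key layout are hypothetical; `s3` and `glue` are boto3 clients passed in by the caller.

```python
import json


def publish_result(s3, glue, bucket: str, key: str, records: list,
                   workflow_name: str, run_id: str) -> dict:
    """Store the payload in S3 and share only its location and a small count."""
    body = "\n".join(json.dumps(r) for r in records)  # JSON Lines payload
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    properties = {
        "output_path": f"s3://{bucket}/{key}",
        "processed_records": str(len(records)),  # Run Property values are strings
    }
    # One call for all metadata instead of several small updates
    glue.put_workflow_run_properties(Name=workflow_name, RunId=run_id,
                                     RunProperties=properties)
    return properties
```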

Integrating with EventBridge and Other Services

Glue Workflow becomes much more flexible when combined with EventBridge.

EventBridge-Based Startup (EVENT Trigger)

Glue Workflow can be started directly by EventBridge events.

This is achieved by setting the Trigger Type to EVENT.

aws glue create-trigger \
  --workflow-name my-workflow \
  --type EVENT \
  --name s3-arrival-trigger \
  --actions JobName=my-job

By configuring an EventBridge rule with Glue Workflow as the target, the workflow starts automatically when an event occurs.

However, the IAM role that EventBridge uses to invoke the workflow needs the glue:notifyEvent permission.
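For the S3 upload case, the wiring on the EventBridge side could be sketched as below. This assumes the bucket has Amazon EventBridge notifications enabled, that the workflow ARN follows the `arn:aws:glue:<region>:<account>:workflow/<name>` form, and that `role_arn` is a role EventBridge can assume with `glue:notifyEvent` on the workflow; the rule name and target ID are placeholders, and `events` is a boto3 EventBridge client.

```python
import json


def s3_rule_pattern(bucket: str) -> dict:
    """Event pattern matching 'Object Created' events from one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket]}},
    }


def wire_rule_to_workflow(events, rule_name: str, bucket: str,
                          workflow_arn: str, role_arn: str) -> None:
    """Create the rule and point it at the Glue workflow (which has an EVENT trigger)."""
    events.put_rule(Name=rule_name, EventPattern=json.dumps(s3_rule_pattern(bucket)))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "glue-workflow", "Arn": workflow_arn, "RoleArn": role_arn}],
    )
```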

Batch Event Startup

EVENT Triggers also support event batching.

Using EventBatchingCondition, you can configure the workflow to start when either:

N events arrive
M seconds pass since the first event arrived

aws glue create-trigger \
  --workflow-name my-workflow \
  --type EVENT \
  --name batch-trigger \
  --event-batching-condition BatchSize=10,BatchWindow=300 \
  --actions JobName=my-job

This enables patterns such as:
“Run ETL once 10 files have arrived, or 300 seconds after the first file arrived, whichever comes first.”

The maximum batch window is 900 seconds (15 minutes).
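A small validation sketch for the batching settings, assuming the documented ranges of 1 to 100 for BatchSize and 1 to 900 seconds for BatchWindow (the helper name is hypothetical):

```python
def event_batching_condition(batch_size: int, batch_window_seconds: int = 900) -> dict:
    """Build EventBatchingCondition for create_trigger, checking the allowed ranges.

    The run starts when batch_size events have arrived or batch_window_seconds
    have passed since the first event, whichever comes first.
    """
    if not 1 <= batch_size <= 100:
        raise ValueError("BatchSize must be between 1 and 100")
    if not 1 <= batch_window_seconds <= 900:
        raise ValueError("BatchWindow must be between 1 and 900 seconds")
    return {"BatchSize": batch_size, "BatchWindow": batch_window_seconds}
```

The returned dict can be passed as `EventBatchingCondition` to boto3's `create_trigger`.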

Starting from S3 Events (The Parameter Passing Limitation)

A common use case is:
“Start a workflow when a file is uploaded to S3.”

When starting Glue Workflow through EventBridge, the event IDs are automatically stored in a Run Property called aws:eventIds.

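import boto3

glue_client = boto3.client('glue')
# workflow_name / workflow_run_id: values of the WORKFLOW_NAME / WORKFLOW_RUN_ID job arguments
event_ids = glue_client.get_workflow_run_properties(
    Name=workflow_name,
    RunId=workflow_run_id
)['RunProperties']['aws:eventIds']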

The returned value looks like:

'["abc-123", "def-456"]'
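Because this value is a JSON-encoded string rather than a list, it needs to be decoded before use. A minimal sketch (hypothetical helper name):

```python
import json


def parse_event_ids(run_properties: dict) -> list:
    """Decode the aws:eventIds Run Property, which is a JSON-encoded string."""
    raw = run_properties.get("aws:eventIds")
    if raw is None:
        return []  # e.g. the run was started on demand rather than by an EVENT trigger
    return json.loads(raw)
```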

However, this is where one of Glue Workflow’s limitations becomes apparent.

The EventBridge event payload itself (such as the S3 object key or bucket name) is not automatically passed as Run Properties.

Only the event IDs are provided.

If you need the actual object details, your Job must retrieve the corresponding event contents from CloudTrail, which becomes somewhat cumbersome.

Because of this, many cases are easier to manage by placing Lambda in the middle and explicitly calling start_workflow_run with structured Run Properties.

S3 PUT → EventBridge → Lambda → start_workflow_run (pass parameters via RunProperties)

Example Lambda code:

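import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # EventBridge "Object Created" events carry the bucket and key under detail
    bucket = event['detail']['bucket']['name']
    key = event['detail']['object']['key']

    # Pass the S3 location explicitly so downstream Jobs can read it from Run Properties
    response = glue.start_workflow_run(
        Name='my-workflow',
        RunProperties={
            'source_bucket': bucket,
            'source_key': key
        }
    )
    return {'workflow_run_id': response['RunId']}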

With Step Functions, the EventBridge payload can be received directly using paths such as $.detail, so there is no need for an intermediate Lambda function.

This is one of the areas where Glue Workflow limitations become noticeable compared to Step Functions.

Detecting Workflow Completion via EventBridge

Glue Workflow status changes are emitted as EventBridge events.

You can use this to:

Send SNS notifications
Trigger downstream systems
Launch post-processing workflows

Glue Workflow (COMPLETED/FAILED)
    → EventBridge
        → SNS / Lambda

This is especially useful when you want separate processing for success and failure cases.

Calling Glue Workflow from Step Functions

It is also possible to launch Glue Workflow from Step Functions.

However, since there is no .sync integration pattern available, you must implement your own polling logic to detect completion.
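Whether it lives in a Lambda function called from Step Functions or elsewhere, the polling logic boils down to something like this sketch. The terminal statuses and `Statistics` field names are taken from the `GetWorkflowRun` API to my understanding; treating a COMPLETED run that has failed or timed-out actions as a failure is a design assumption, because COMPLETED only means the run finished. The helper name is hypothetical, and `sleep` is injectable for testing.

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "ERROR"}


def wait_for_workflow(glue, workflow_name: str, run_id: str,
                      poll_seconds: int = 30, sleep=time.sleep):
    """Poll get_workflow_run until the run reaches a terminal status.

    Returns (status, number of failed or timed-out actions).
    """
    while True:
        run = glue.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
        status = run["Status"]
        if status in TERMINAL_STATUSES:
            stats = run.get("Statistics", {})
            failed = stats.get("FailedActions", 0) + stats.get("TimeoutActions", 0)
            return status, failed
        sleep(poll_seconds)
```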

I covered this in detail in the previous article, so feel free to refer to it if interested.

Organizing the Use Cases of AWS Step Functions and Glue Workflow for ETL Processing with AWS Glue Jobs

Operational Tips

Here are several practical points worth knowing when operating Glue Workflow in production.

Resuming Failed Workflows (`ResumeWorkflowRun`)

Glue Workflow provides a feature called ResumeWorkflowRun, which allows resuming execution from failed nodes.

In the console, you can:

Open the failed Workflow Run detail page
Select the nodes to resume
Enable the “Resume” checkbox

It is also available through CLI/API.

aws glue resume-workflow-run \
  --name my-workflow \
  --run-id wr_xxxx \
  --node-ids node_yyyy node_zzzz

When resumed:

The specified nodes
And all downstream nodes

are re-executed.

The resumed workflow is tracked using a new Run ID.

Compared with Step Functions Redrive, however, there are several limitations:

You must explicitly specify failed nodes
Retrieving node IDs requires calling: get-workflow-run --include-graph
Additional IAM permissions (glue:ResumeWorkflowRun) are required
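The node lookup can be scripted. This sketch treats a JOB node as failed when one of its job runs ended in FAILED, TIMEOUT, ERROR, or STOPPED and none succeeded; that rule and the helper names are assumptions to adapt, and crawler nodes are ignored here.

```python
FAILED_JOB_STATES = {"FAILED", "TIMEOUT", "ERROR", "STOPPED"}


def failed_job_node_ids(run: dict) -> list:
    """Collect UniqueIds of JOB nodes whose runs failed without any success."""
    node_ids = []
    for node in run.get("Graph", {}).get("Nodes", []):
        if node.get("Type") != "JOB":
            continue
        states = {r.get("JobRunState") for r in node.get("JobDetails", {}).get("JobRuns", [])}
        if states & FAILED_JOB_STATES and "SUCCEEDED" not in states:
            node_ids.append(node["UniqueId"])
    return node_ids


def resume_failed(glue, workflow_name: str, run_id: str):
    """Resume a Workflow Run from its failed Job nodes; returns the new Run ID or None."""
    run = glue.get_workflow_run(Name=workflow_name, RunId=run_id, IncludeGraph=True)["Run"]
    node_ids = failed_job_node_ids(run)
    if not node_ids:
        return None  # nothing to resume
    response = glue.resume_workflow_run(Name=workflow_name, RunId=run_id, NodeIds=node_ids)
    return response["RunId"]
```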

For simple retry scenarios, restarting the entire workflow via ON_DEMAND execution is often easier.

For pipelines with more sophisticated recovery requirements, Step Functions tends to provide a smoother operational experience.

Monitoring with CloudWatch

Glue Workflow execution states can be viewed in the Glue console, but detailed Job and Crawler logs are output to CloudWatch Logs.

Typical monitoring targets include:

Workflow Run list
- Glue Console
Job execution logs
- /aws-glue/jobs/output
- /aws-glue/jobs/error
Metrics
- CloudWatch Metrics
- Execution duration
- DPU usage

In practice, monitoring and alerting usually combine workflow-level statuses with job-level metrics, with the balance between them depending on the situation.

Handling the 100-Object Recommendation Limit

As mentioned earlier, AWS recommends keeping the total number of objects in a workflow (Jobs + Crawlers + Triggers) below 100.

If a large pipeline approaches this limit, consider:

Splitting workflows
- Trigger downstream workflows after upstream completion
Consolidating common processing inside Jobs
- Merge smaller processing units into larger Jobs

In my experience, workflows approaching 100 objects are often already too complex from a design perspective.
At that point, it may be worth reconsidering the architecture itself or migrating to Step Functions.
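To see how close a workflow is to the recommendation, the graph returned by `get_workflow(..., IncludeGraph=True)` can be tallied by node type. A minimal sketch with a hypothetical helper name:

```python
from collections import Counter


def count_workflow_objects(glue, workflow_name: str):
    """Count JOB, CRAWLER and TRIGGER nodes in a workflow's graph."""
    workflow = glue.get_workflow(Name=workflow_name, IncludeGraph=True)["Workflow"]
    counts = Counter(node["Type"] for node in workflow.get("Graph", {}).get("Nodes", []))
    return dict(counts), sum(counts.values())
```

For example, a CI check could fail when the total exceeds a threshold of your choosing below 100.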

Where Glue Workflow Fits — and Its Limitations

Glue Workflow shines in scenarios such as:

Simple ETL pipelines completed entirely within Glue Jobs and Crawlers
Periodic ingestion pipelines for data lakes
Lightweight small-to-medium scale pipelines that need to be launched quickly

On the other hand, Step Functions is generally better suited for:

Integrations involving Lambda, ECS, and other AWS services
Complex branching and dynamic parameter control
Large-scale development involving multiple teams

I discussed these decision criteria in more detail in the previous article, so feel free to refer to it.

Organizing the Use Cases of AWS Step Functions and Glue Workflow for ETL Processing with AWS Glue Jobs

Conclusion

In this article, I organized AWS Glue Workflow from the basics through practical usage patterns.

To summarize:

Four trigger types:
SCHEDULED / ON_DEMAND / CONDITIONAL / EVENT
CONDITIONAL Triggers:
Flexible flow control using AND / ANY conditions and failure branching
default_run_properties:
Shared workflow-wide parameter management
Data passing between Jobs:
Dynamic value propagation using put_workflow_run_properties
EventBridge integration:
Event-driven execution via EVENT Trigger
(although Lambda-based parameter passing is often easier)
ResumeWorkflowRun:
Partial restart functionality from failed nodes

After writing all of this, I still have to admit:
Step Functions is generally easier to use.

That said, Glue Workflow still provides meaningful value today because it allows you to build Glue-centric ETL pipelines in a very simple and cost-efficient way.

Rather than defaulting to Step Functions automatically, understanding Glue Workflow properly and knowing when to use it can broaden your architectural options.

I hope this article helps both people who are starting to use Glue Workflow and those already working with it.
