AWS Batch + Step Functions

1. What This Combination Means

  • AWS Batch → runs batch computing workloads (large-scale processing jobs).
  • AWS Step Functions → orchestrates workflows (runs jobs in order, with retries, parallelism, conditional logic).
  • S3 → stores input data, intermediate results, or final outputs.

So together:

Step Functions → AWS Batch → process → S3
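
To make the flow concrete, here is a minimal sketch (not a definitive implementation) of creating a Step Functions state machine with boto3 whose single task submits a Batch job and waits for it to finish. The job queue, job definition, state machine name, and role ARN are placeholders, not values from this article.

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # One task state: submit a Batch job and wait for it to finish.
    # The .sync suffix makes Step Functions block until the job
    # succeeds or fails instead of returning immediately.
    definition = {
        "Comment": "Submit a Batch job and wait for completion",
        "StartAt": "ProcessData",
        "States": {
            "ProcessData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "process-data",
                    "JobQueue": "my-job-queue",       # placeholder queue name
                    "JobDefinition": "my-job-def:1",  # placeholder definition
                },
                "End": True,
            }
        },
    }

    sfn.create_state_machine(
        name="batch-processing-workflow",                                  # placeholder
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/StepFunctionsBatchRole",   # placeholder
    )

The same definition could just as well be created in the console or with infrastructure-as-code tooling; the point is the batch:submitJob.sync integration, which lets Step Functions treat a whole Batch job as one step.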

2. When to Use This Combination

You would use Batch + Step Functions → S3 when you have large-scale, automated, orchestrated workloads that produce or consume data stored in S3.

Examples:

  • Data processing pipelines
    You have raw data in S3 that needs heavy computation (e.g., ETL, simulations, analytics). Step Functions can coordinate multiple Batch jobs and store outputs in S3.

  • Machine learning workflows
    Train models on large datasets in S3. Step Functions orchestrates Batch jobs for preprocessing, training, and evaluation.

  • Media processing
    Large-scale video/image transcoding, where Step Functions runs Batch jobs to process media stored in S3.

  • Periodic workloads
    Scheduled jobs (e.g., nightly reports) where Step Functions triggers Batch jobs automatically, and outputs go to S3.
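
All of these examples share the same container-side pattern: the Batch job pulls its input from S3, does the heavy work, and writes the result back to S3. A minimal sketch of such a job script, assuming the locations arrive as environment variables (the variable names and the "uppercase every line" processing step are hypothetical):

    import os
    import boto3

    s3 = boto3.client("s3")

    # Input/output locations are assumed to be passed in via environment
    # variables, e.g. set in the Batch job definition or container
    # overrides. The names below are hypothetical.
    bucket = os.environ["DATA_BUCKET"]
    input_key = os.environ["INPUT_KEY"]
    output_key = os.environ["OUTPUT_KEY"]

    # 1. Download the raw object from S3 to local scratch space.
    s3.download_file(bucket, input_key, "/tmp/input.csv")

    # 2. Do the heavy processing (placeholder: uppercase every line).
    with open("/tmp/input.csv") as src, open("/tmp/output.csv", "w") as dst:
        for line in src:
            dst.write(line.upper())

    # 3. Upload the result back to S3.
    s3.upload_file("/tmp/output.csv", bucket, output_key)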


3. Why This Pattern Works Well

  • Scalability: AWS Batch can handle thousands of compute jobs without you managing servers.
  • Orchestration: Step Functions gives you control; you can run jobs in sequence, in parallel, or based on conditions (a Map-state sketch follows this list).
  • Durability: S3 stores your inputs and results persistently.
  • Automation: Step Functions + Batch create fully automated workflows.
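
As a hedged illustration of the orchestration point, a Map state in the state machine definition can fan out one Batch job per input file. The fragment below uses Amazon States Language field names; the queue, job definition, input shape, and environment variable are placeholders:

    # Fragment of a state machine definition (as a Python dict): a Map
    # state that runs one Batch job per S3 key in the array at
    # $.input_keys, with at most 10 jobs in flight at a time.
    fan_out_state = {
        "Type": "Map",
        "ItemsPath": "$.input_keys",   # assumed execution input shape
        "MaxConcurrency": 10,
        "Iterator": {
            "StartAt": "ProcessOneFile",
            "States": {
                "ProcessOneFile": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::batch:submitJob.sync",
                    "Parameters": {
                        "JobName": "process-one-file",
                        "JobQueue": "my-job-queue",       # placeholder
                        "JobDefinition": "my-job-def:1",  # placeholder
                        "ContainerOverrides": {
                            "Environment": [
                                # Pass the current S3 key to the container.
                                {"Name": "INPUT_KEY", "Value.$": "$"}
                            ]
                        },
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }

Sequencing is just a chain of Next fields between task states, and a Choice state covers the conditional case.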

4. How It Works (Example Flow)

Example: Large-scale CSV processing job

  1. Raw CSV files are uploaded to S3.
  2. Step Functions is triggered (via an S3 event notification or an EventBridge rule, formerly CloudWatch Events); a trigger sketch follows this list.
  3. Step Functions starts one or more AWS Batch jobs to process the files.
  4. Each Batch job reads its input from S3, processes it, and writes the results back to S3.
  5. Step Functions tracks progress and handles errors (see the Retry/Catch sketch after the diagram).
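
Step 2 is typically a small Lambda function subscribed to the bucket's ObjectCreated event notifications (or an EventBridge rule) that starts the state machine with the new object's location as input. A sketch, with a placeholder state machine ARN:

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Hypothetical Lambda handler wired to an S3 ObjectCreated event
    # notification; it starts the workflow with the uploaded object's
    # bucket and key as the execution input.
    def handler(event, context):
        record = event["Records"][0]
        execution_input = {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        sfn.start_execution(
            stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:batch-processing-workflow",  # placeholder
            input=json.dumps(execution_input),
        )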

Diagram:

S3 (input) → Step Functions → AWS Batch → S3 (output)
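
For step 5, the error handling lives in the state definition itself: Retry and Catch fields on the Batch task let Step Functions re-run transient failures and route permanent ones to a failure-handling state. A sketch of those fields (the limits and the NotifyFailure state are illustrative):

    # Retry/Catch fields that would be added to the Batch task state.
    # Step Functions retries the job twice with exponential backoff,
    # then routes any remaining error to a failure-handling state.
    error_handling = {
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }
        ],
        "Catch": [
            {
                "ErrorEquals": ["States.ALL"],
                "Next": "NotifyFailure",  # hypothetical failure-handling state
            }
        ],
    }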

5. When NOT to Use This Pattern

  • If your workload is real-time → AWS Lambda or Kinesis may be better.
  • If you don’t need orchestration → Batch alone may suffice.
  • If your data fits in memory and doesn’t require huge scaling → simpler EC2 or Lambda jobs may be easier.

💡 Quick analogy:

  • S3 = your warehouse of data.
  • AWS Batch = your factory workers that process things in bulk.
  • Step Functions = your factory manager who coordinates jobs efficiently.
      +---------+           +----------------+           +--------+
      |         |           |                |           |        |
      |   S3    |  Trigger  | Step Functions |  Submit   | AWS    |
      | (input) | --------> | (orchestration)| --------> | Batch  |
      |         |           |                |           |        |
      +---------+           +----------------+           +--------+
                                                          /        \
                                                         /          \
                                              Process data           Process data
                                                 in Batch jobs       in Batch jobs
                                                        |                 |
                                                        v                 v
                                                   +-------------------------+
                                                   |                         |
                                                   |        S3 Output       |
                                                   | (processed results)    |
                                                   +-------------------------+

