Jayesh Shinde

🧩 From 15 Minutes to Infinite: Scaling STT Jobs with AWS Batch

💡 The Problem

We recently ran into a production issue — our Speech-to-Text (STT) service stopped working for a few hours.
The service was fixed quickly, but the transcripts for calls made during the downtime were missing.

Luckily, in Amazon Connect, all call recordings are stored in S3.
So the audio was there, but no transcripts.

We needed to reprocess all those missed files — fast.


🧠 First Attempt: Lambda (and its Limitations)

We quickly built a Lambda function to work through the unprocessed files in S3.

It worked fine — until it didn’t.
AWS Lambda has a 15-minute execution limit, and processing large audio files can easily exceed that.

We could have switched to EC2, but that felt like using a hammer for a small screw — no auto-scaling, no graceful shutdown, no built-in retry or job management.

We needed something that behaved like a job, not a script.


🚀 Enter AWS Batch + Fargate

That’s when AWS Batch came to the rescue.
It’s perfect for this kind of workload — long-running, batch-style, event-driven jobs.

Here’s the setup we used:

  1. Created a Compute Environment
  • Backed by AWS Fargate → no EC2 management.
  • Scales automatically depending on job load.
  2. Defined a Job Queue
  • All reprocessing jobs are submitted here.
  • The queue ensures controlled concurrency and retries.
  3. Built a Job Definition
  • Packaged our STT processing logic as a Docker image.
  • Uploaded it to Amazon ECR.
  • Defined the required vCPU and memory for each job.
  4. Triggered via Lambda
  • A small Lambda fetches the list of unprocessed S3 files.
  • For each batch (say, 50 files), it submits a Batch job (see the sketch after this list).
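
Here's roughly what that trigger step does, sketched with the AWS CLI instead of the actual Lambda code. The bucket, queue, and job definition names (my-call-recordings, my-queue, my-job-def) are placeholders, and the "unprocessed" check is simplified to a plain listing:

# List candidate recordings (in reality, filter out keys that already have transcripts)
aws s3api list-objects-v2 \
  --bucket my-call-recordings \
  --prefix recordings/ \
  --query 'Contents[].Key' \
  --output text | tr '\t' '\n' > pending_keys.txt

# Split into batches of 50 and submit one Batch job per chunk,
# handing each container its manifest location via an environment variable
split -l 50 pending_keys.txt chunk_
for chunk in chunk_*; do
  aws s3 cp "$chunk" "s3://my-call-recordings/manifests/$chunk"
  aws batch submit-job \
    --job-name "reprocess-$chunk" \
    --job-queue my-queue \
    --job-definition my-job-def \
    --container-overrides "environment=[{name=MANIFEST_KEY,value=manifests/$chunk}]"
done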

⚙️ The Flow in Action

  1. Lambda → Checks for unprocessed audio files in S3.
  2. Lambda → AWS Batch: Submits a job to process them.
  3. AWS Batch (Fargate) spins up compute, runs the job.
  4. Job → Downloads audio → runs STT → uploads transcript → updates metadata.
  5. Fargate shuts down automatically when the job finishes.

No idle servers, no manual cleanup, no stress.
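
If you're curious what step 4 can look like inside the container, here's a minimal entrypoint sketch. It assumes the manifest key arrives via the MANIFEST_KEY environment variable (as in the submit sketch above) and uses Amazon Transcribe as a stand-in for the STT engine; the bucket names are placeholders, and your actual engine and metadata store will differ.

#!/usr/bin/env bash
set -euo pipefail

# Fetch the manifest listing the audio keys this job should handle
aws s3 cp "s3://my-call-recordings/$MANIFEST_KEY" /tmp/manifest.txt

while read -r key; do
  # Kick off transcription for each recording; output lands in the transcripts bucket
  job_name="stt-$(basename "$key" .wav)-$RANDOM"
  aws transcribe start-transcription-job \
    --transcription-job-name "$job_name" \
    --language-code en-US \
    --media "MediaFileUri=s3://my-call-recordings/$key" \
    --output-bucket-name my-transcripts
  # ... then mark the recording as processed in your metadata store ...
done < /tmp/manifest.txt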


🧩 Why This Design Rocks

Serverless all the way — Lambda + Fargate + S3
Auto-scaling compute — no EC2 to babysit
Long-running safe zone — runs beyond Lambda’s 15-min cap
Reusable — we can reprocess any backlog anytime
Cost-efficient — pay only for what’s used


🪄 Bonus Tip

You can even schedule a “missed transcript” job to run daily or weekly,
checking for any files without transcripts and triggering a Batch job automatically.
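
One way to wire that up, assuming the checker Lambda from earlier is named find-missing-transcripts (the rule name and ARN below are placeholders):

# Create a daily schedule and point it at the checker Lambda
aws events put-rule \
  --name daily-missed-transcripts \
  --schedule-expression "rate(1 day)"

aws events put-targets \
  --rule daily-missed-transcripts \
  --targets "Id"="check-lambda","Arn"="arn:aws:lambda:us-east-1:123456789012:function:find-missing-transcripts"

# Also grant EventBridge permission to invoke the function (aws lambda add-permission)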


🧩 Understanding AWS Batch Scaling

In AWS Batch, the number of tasks (containers) that run in parallel depends on three things working together:

  1. Compute Environment capacity
    → e.g., your environment has a maximum of 10 vCPUs (see the sketch after this list).

  2. Job Definition requirements
    → e.g., each job needs 1 vCPU.

  3. How many jobs are in the queue (and their array size, if used).
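
Point 1 is just a number on the compute environment. Here's a sketch of how that cap might be set up for Fargate; the environment name, subnet, and security group IDs are placeholders, and AWS Batch falls back to its service-linked role if you don't pass one explicitly:

aws batch create-compute-environment \
  --compute-environment-name stt-fargate-env \
  --type MANAGED \
  --compute-resources "type=FARGATE,maxvCpus=10,subnets=subnet-0abc1234,securityGroupIds=sg-0abc1234"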


🔹 Case 1: You Submit Multiple Independent Jobs

If you submit 10 jobs, each with 1 vCPU, and your environment allows 10 vCPUs,
then AWS Batch can run all 10 in parallel (subject to available Fargate capacity).

Example:

# pseudo example
for i in {1..10}; do
  aws batch submit-job \
    --job-name process-audio-$i \
    --job-queue my-queue \
    --job-definition my-job-def
done

Each job = 1 vCPU → up to 10 can run simultaneously.

AWS Batch’s Job Scheduler will automatically pack as many as possible based on available compute.


🔹 Case 2: You Use an Array Job

Instead of manually looping, you can submit an array job.

Example:

aws batch submit-job \
  --job-name process-audios \
  --job-queue my-queue \
  --job-definition my-job-def \
  --array-properties size=10

This creates 10 child jobs under a single parent, each running independently (great for S3 list chunking).

Same result — 10 parallel containers, each with 1 vCPU.
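
The handy part is that AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX in every child job, so each container can pick its own slice of the work. A rough sketch, reusing the manifest idea from above with a chunk size of 50:

# Each child job processes its own 50-key slice of the manifest
CHUNK_INDEX="${AWS_BATCH_JOB_ARRAY_INDEX:-0}"
START=$((CHUNK_INDEX * 50 + 1))
END=$(((CHUNK_INDEX + 1) * 50))

sed -n "${START},${END}p" /tmp/manifest.txt | while read -r key; do
  echo "Processing s3://my-call-recordings/$key"
  # ... download audio, run STT, upload transcript, update metadata ...
done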


🔹 Case 3: You Submit a Single Job that Needs More vCPUs

If you set in your job definition:

"vcpus": 4

and your environment has 10 total vCPUs →
then Batch will reserve 4 vCPUs for that job, leaving room for other smaller jobs.

So the compute environment doesn’t spawn “10 copies automatically” —
it just enforces a maximum pool of total CPU that concurrent jobs can consume.
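
For reference, on Fargate the CPU and memory requests go through resourceRequirements when you register the job definition, and the pair has to be a supported Fargate combination (4 vCPU pairs with 8 to 30 GB of memory). A sketch with placeholder image and role ARNs:

aws batch register-job-definition \
  --job-definition-name my-job-def \
  --type container \
  --platform-capabilities FARGATE \
  --container-properties '{
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/stt-processor:latest",
    "resourceRequirements": [
      {"type": "VCPU", "value": "4"},
      {"type": "MEMORY", "value": "8192"}
    ],
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "networkConfiguration": {"assignPublicIp": "ENABLED"}
  }'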


⚙️ TL;DR — How to Scale

Goal → What to do
  • Run multiple tasks concurrently → submit multiple jobs or an array job
  • Set each job’s CPU need → define it in the Job Definition (e.g., 1 vCPU)
  • Cap the parallelism → bounded by the compute environment’s capacity (maxvCpus)
  • Control fan-out at runtime → pass --array-properties size=N dynamically
  • Scale the underlying capacity → Batch scales Fargate/EC2 capacity up and down automatically

🏁 Closing Thoughts

This experience reminded me —

“When your script starts feeling like a job, give it job-like powers.”

AWS Batch (especially with Fargate) is often underrated,
but it’s a powerful tool when you need on-demand, containerized, long-running compute
without managing any servers.
