A colleague and I are updating an aging service in our stack. The service is pretty well isolated: it takes a bit of input, does a few units of work, then puts the results in a specific S3 bucket under a key provided as part of the input.
Normally I would reach for SQS + EC2 with a homegrown background worker or a community library like shoryuken. However, having a few dozen servers in production already, I wanted a solution that wouldn’t include additional infrastructure I would need to manage.
When I run into an infrastructure-related challenge nowadays, I do a quick search in the AWS console to see what turns up. Sure enough, AWS has a service for background jobs called Batch.
At first glance, it wasn't entirely clear how Batch worked. I Googled a few different keyword arrangements, none of which turned up anything particularly useful; a few articles here and there, but nothing like an in-depth tutorial that covered our particular use case.
After going through the Getting Started Guide, I found that Batch was very simple to set up. Most of the overhead was letting go of my assumptions about what a traditional background worker setup looks like.
There are a few characteristics that, collectively, make Batch work well for background jobs.
Batch jobs are run in a Compute Environment. This is just an ECS cluster that Batch creates and manages for you. Its instances are the nodes that will run your jobs.
Jobs are defined using a Job Definition, which allows you to customize the number of CPUs, memory, and retries, as well as set environment variables and other options, like what default Docker Image you want to use.
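To make that concrete, here is a minimal sketch of what a Job Definition might look like as a request hash. The helper name, image name, bucket name, and sizes are all hypothetical. The keys follow the shape I understand `Aws::Batch::Client#register_job_definition` (from the `aws-sdk-batch` gem) to accept, so check them against the SDK docs before relying on them.

```ruby
# Hypothetical helper: builds the request hash for registering a
# container-based Job Definition. Nothing here talks to AWS.
def job_definition_params(name:, image:, vcpus:, memory_mib:, attempts:, env: {})
  {
    job_definition_name: name,
    type: "container",
    container_properties: {
      image: image, # default Docker Image for jobs using this definition
      resource_requirements: [
        # Batch expects these values as strings.
        { type: "VCPU", value: vcpus.to_s },
        { type: "MEMORY", value: memory_mib.to_s } # MiB
      ],
      # Environment variables are passed as name/value pairs.
      environment: env.map { |key, value| { name: key.to_s, value: value.to_s } }
    },
    # Number of times Batch will attempt a job before marking it failed.
    retry_strategy: { attempts: attempts }
  }
end

# Example values are placeholders, not our real configuration.
params = job_definition_params(
  name: "report-builder",
  image: "example/report-builder:latest",
  vcpus: 2,
  memory_mib: 4096,
  attempts: 3,
  env: { "OUTPUT_BUCKET" => "example-results" }
)

# With the aws-sdk-batch gem installed and credentials configured:
#   require "aws-sdk-batch"
#   Aws::Batch::Client.new.register_job_definition(params)
```

Keeping the hash construction separate from the API call makes it easy to inspect or test the definition without touching AWS.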
We knew we wanted to use Docker Images from the beginning. Our job has a large number of runtime dependencies, and we thought it best to package these into an atomic unit. Batch is a great fit, since it supports Docker Images.
Unlike most background job libraries, Batch has its own queueing system, so it doesn't rely on SQS or Redis.
Job Queues are tied to a Compute Environment and can have different priorities. Jobs are submitted to a Job Queue, and the Compute Environment behind that queue runs them.
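As a rough sketch of how a queue is tied to a Compute Environment, the hash below follows the shape I understand `Aws::Batch::Client#create_job_queue` to accept. The helper name, queue names, and compute environment name are hypothetical. When two queues share a Compute Environment, the one with the higher priority number is evaluated first.

```ruby
# Hypothetical helper: builds the request hash for creating a Job Queue
# backed by a single Compute Environment. Nothing here talks to AWS.
def job_queue_params(name:, priority:, compute_environment:)
  {
    job_queue_name: name,
    state: "ENABLED",
    priority: priority,
    # A queue lists its Compute Environments in the order Batch should try them.
    compute_environment_order: [
      { order: 1, compute_environment: compute_environment }
    ]
  }
end

# Two queues sharing one (placeholder) Compute Environment.
urgent = job_queue_params(name: "reports-urgent", priority: 10, compute_environment: "reports-env")
normal = job_queue_params(name: "reports-normal", priority: 1, compute_environment: "reports-env")

# With the aws-sdk-batch gem installed and credentials configured:
#   require "aws-sdk-batch"
#   client = Aws::Batch::Client.new
#   [urgent, normal].each { |queue| client.create_job_queue(queue) }
```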
Jobs are submitted to a Job Queue using a Job Definition. Default arguments set in the Job Definition can be overridden when submitting a job.
There are a few ways to submit jobs. We submit ours from our codebase using the Batch gem provided by AWS.
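Here is a minimal sketch of what a submission with per-job overrides could look like. It is not our production code: the helper name, the `OUTPUT_KEY` variable, and the queue and definition names are hypothetical. The `client` argument is assumed to behave like `Aws::Batch::Client` from the `aws-sdk-batch` gem, whose `submit_job` call returns a response carrying a `job_id`.

```ruby
# Hypothetical helper: submits one job to a queue, overriding the Job
# Definition's defaults for this run only. `client` is assumed to respond
# to #submit_job the way Aws::Batch::Client does.
def submit_report_job(client, queue:, definition:, output_key:, command: nil)
  # Per-job environment variables are applied on top of the definition's defaults.
  overrides = {
    environment: [{ name: "OUTPUT_KEY", value: output_key }]
  }
  # Only override the container command when the caller supplies one.
  overrides[:command] = command if command

  # Job names may only contain letters, numbers, hyphens, and underscores,
  # so replace anything else and cap the length at 128 characters.
  safe_key = output_key.gsub(/[^A-Za-z0-9_-]/, "-")
  job_name = "report-#{safe_key}".slice(0, 128)

  response = client.submit_job(
    job_name: job_name,
    job_queue: queue,
    job_definition: definition,
    container_overrides: overrides
  )
  response.job_id
end

# With the aws-sdk-batch gem installed and credentials configured:
#   require "aws-sdk-batch"
#   job_id = submit_report_job(
#     Aws::Batch::Client.new,
#     queue: "reports-normal",
#     definition: "report-builder",
#     output_key: "reports/2024/01.pdf"
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object, which is handy since real submissions cost money and time.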
There's still a lot more for me to learn about Batch, but I wanted to share my experience using it so far. Feel free to leave a comment if you have any questions.