DEV Community

How to Schedule Any Task with AWS Lambda

Renato Byrro on August 12, 2019

Did you know it is possible to schedule the execution of virtually any task to run in the cloud with AWS Lambda? I have been asked how to do it a ...
 
James

"We cannot customize how Lambda will run based only on what the scheduler provides, like we did with DynamoDB and S3."

Using the "Constant JSON text" option, as the screenshot shows, you can configure the input for the function, so you could have multiple schedules for the same function with different inputs:

rate(1 minute) → {"rate":"EVERY-MINUTE"}
cron(*/10 * * * ? *) → {"rate":"EVERY-TEN-MINUTES"}
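
A minimal sketch of how a single function could branch on that constant input. Only the `rate` key and its two values come from the rules above; the handler name, the task functions, and their return values are hypothetical placeholders.

```python
def run_minutely_task():
    # Hypothetical placeholder for work that should run every minute.
    return "ran minutely task"


def run_ten_minute_task():
    # Hypothetical placeholder for work that should run every ten minutes.
    return "ran ten-minute task"


# Maps the constant JSON value configured on each rule to a task.
TASKS = {
    "EVERY-MINUTE": run_minutely_task,
    "EVERY-TEN-MINUTES": run_ten_minute_task,
}


def handler(event, context):
    # With a constant JSON input, the event is the JSON configured on the rule.
    rate = event.get("rate")
    task = TASKS.get(rate)
    if task is None:
        raise ValueError(f"Unexpected rate: {rate!r}")
    return task()
```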
 
Renato Byrro

Hi James, thanks for the comment!

I'm replying a bit late, but what I meant was that DynamoDB and S3 can provide more information to be processed. In the case of DynamoDB, we can receive the entire item with the invocation, as well as its state prior to the change.
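
To illustrate, here is a rough sketch of reading both states from a DynamoDB Streams event. It assumes the stream is configured to include both new and old images; the handler name and the shape of the returned summary are made up for the example.

```python
def handler(event, context):
    changes = []
    for record in event.get("Records", []):
        data = record.get("dynamodb", {})
        changes.append({
            "event": record.get("eventName"),  # INSERT, MODIFY or REMOVE
            "before": data.get("OldImage"),    # state prior to the change
            "after": data.get("NewImage"),     # state after the change
        })
    return changes
```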

With CloudWatch Rules, it's just a rule. Unless it's a self-contained task, Lambda will probably need to go get more data somewhere else to actually process what it's supposed to.

Alessandro Annini

Hi Renato,
thanks for your interesting article.
What if my task needs to be executed at (almost) exactly the time specified? Do you have any advice?

Thanks

Renato Byrro

Hi Alessandro, glad you liked the post. Thanks for the comment, that is a very interesting question.

You'll need to implement custom code. I can think of a few ideas, but they need more investigation to come up with a proper architecture. Could you describe your use case in a little more detail?

One naive idea would be:

  1. Set up a Lambda to run every minute, triggered by a CloudWatch Rule.
  2. Store tasks in a DynamoDB table, indicating the precise time to execute.
  3. Lambda will query this table and get all tasks scheduled for start=current_timestamp + 30 seconds & end=current_timestamp + 90 seconds (the 30 sec start is an offset to account for Lambda startup time - this needs to be adjusted according to a number of factors).
  4. Implement one or more additional Lambdas to process each type of task.
  5. The first Lambda will invoke these executor Lambdas passing the task.
  6. Each Executor Lambda could implement a "while" loop to check whether current_timestamp has reached task_execution_timestamp (a >= comparison is safer than ==, since the loop can step past the exact instant). When it evaluates to true, it executes the task.
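
The steps above could be sketched roughly as follows. This is plain Python with no AWS calls: `select_due_tasks` stands in for the DynamoDB query, and the `execute_at` field, the function names, the half-open window, and the 10 ms polling step are all assumptions made for the illustration.

```python
import time

START_OFFSET = 30   # seconds; accounts for Lambda startup time, needs tuning
WINDOW_LENGTH = 60  # seconds; matches the one-minute trigger


def scheduling_window(now):
    """Return (start, end) of the window the first Lambda should query."""
    start = now + START_OFFSET
    return start, start + WINDOW_LENGTH


def select_due_tasks(tasks, now):
    """Stand-in for the table query: tasks with start <= execute_at < end."""
    start, end = scheduling_window(now)
    return [t for t in tasks if start <= t["execute_at"] < end]


def wait_until(execute_at, clock=time.time, sleep=time.sleep):
    """Executor side: block until the clock reaches execute_at."""
    while True:
        remaining = execute_at - clock()
        if remaining <= 0:
            return
        # Sleep in short steps so the clock is re-checked often.
        sleep(min(remaining, 0.01))
```

The window is half-open so that a task sitting exactly on a boundary is picked up by only one of two consecutive runs.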

I said it's a naive idea because it ignores some important things:

What does "exactly the time specified" mean to you?

Is it enough to run the task on a given second? Or do you need time resolution down to the millisecond, maybe microsecond?

That will have an impact on the implementation. Some programming languages resolve time only down to milliseconds.

How much deviation can you accept to meet the "almost" requirement?

If you're using AWS Lambda, beware that you can't control which machine is running your code. Could be multiple machines throughout a given period of time. It's actually most likely to be a different machine for every cold start.

This has important implications since there are issues with syncing clocks on distributed systems.

Depending on how much deviation you can accept in the "almost the exact time", this can be a problem.

Scalability

How many tasks do you expect to schedule and how are they distributed over time?

Is it possible that you'll have 50,000 tasks to run on a given millisecond? If yes, the challenge will be setting up infrastructure that can scale to that level of concurrent requests.

Reliability (in general, not only infra-wise)

What happens if the triggering process of a task fails, or if the task executor fails entirely and a block of tasks is not executed at all?

Do you need a system in place to check for that and retry the task, or can you afford to lose some tasks?

Will it be too late if a few seconds have passed before retrying?

Is it a problem if, occasionally, the same task gets executed twice? If yes, a proper locking mechanism needs to be in place to ensure each task is processed once and only once.
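
As a rough illustration of the locking idea, the sketch below lets only the first caller claim a task. It is an in-memory stand-in: across separate Lambda invocations the claim would have to live in a shared store that supports conditional writes (a DynamoDB table is one option), and the class and function names here are hypothetical.

```python
import threading


class TaskClaims:
    """In-memory stand-in for a shared store with conditional writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def claim(self, task_id):
        """Return True only for the first caller that claims task_id."""
        with self._lock:
            if task_id in self._claimed:
                return False
            self._claimed.add(task_id)
            return True


def run_once(claims, task_id, task):
    # Execute the task only if this caller won the claim.
    if not claims.claim(task_id):
        return None
    return task()
```

Note that this only prevents duplicates. If the task fails after being claimed, it is not retried, so a real design would still need the retry handling discussed above.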

Renato Byrro

By the way, make sure you research libraries that could help you with the code implementation. For example, Python has the Celery project, which can help you with scheduling tasks with precise timing.

Chris Armstrong

Another way to schedule tasks (albeit within a 15-minute time frame) is to post a message to an SQS queue with a DelaySeconds parameter. This can be useful for implementing a polling/recurrent task that is started on demand and terminates itself.
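
A small sketch of that approach, assuming a boto3-style SQS client is passed in. The function name, the JSON payload, and the upfront range check are choices made for this example.

```python
import json

MAX_DELAY_SECONDS = 900  # SQS caps DelaySeconds at 15 minutes


def schedule_message(sqs_client, queue_url, payload, delay_seconds):
    """Send payload to the queue, hidden from consumers for delay_seconds."""
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("delay_seconds must be between 0 and 900")
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(payload),
        DelaySeconds=delay_seconds,
    )
```

With boto3, `sqs_client` would be `boto3.client("sqs")`. A consumer that wants to keep polling can call `schedule_message` again from its own handler, and simply stop doing so when the task should terminate.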

Renato Byrro

That is a clever idea, thanks for sharing Chris! ;)