Ruby on AWS Lambda: Package & Ship It

#ruby #aws #serverless #devops

This article is part of our Ruby on AWS Lambda series. A recent project had us migrating an existing PDF document processing system from Rails/Sidekiq to AWS Lambda. The processing includes OCR, creating preview images, splicing the PDF, and more. Moving to Lambda made processing up to 300% faster in some cases; parallelization for the win!

This series will serve less as a step-by-step process to get OCR serverless infrastructure up and running and more of a highlight reel of our "Aha!" moments. In part one, we talked about creating an AWS Lambda Layer with Docker. In part two, we chatted about architecture. Here in part three, we'll go through some strategies surrounding deployment.

Deploying to Lambda gets more complicated as the application grows. There is a big difference between deploying a single hello world function with zero dependencies, and deploying multiple functions in multiple languages along with dependencies in a Lambda Layer. We discovered some useful things along the way while making that transition. Let's dive in.

Function naming conventions and environment variables

We followed a straightforward convention for differentiating between function names and their environments:

{function_name}_{environment}

Given our three Rails environments (development, staging, and production) and a function name of pdf_processor, we'd have three separate functions: pdf_processor_development, pdf_processor_staging, and pdf_processor_production. Adding more functions may seem a bit overwhelming in terms of function count, but the clear distinctions between functions and their environments allow for isolated development, testing, and debugging.

Along with the other environment variables that the functions use, we added a RUBY_ENV environment variable set to development, staging, or production. It is primarily used for namespacing S3 buckets and DynamoDB tables, and for calling other Lambda functions. This decision played out well because we now have isolation in how we use other systems and services. It also makes reading logs and tracking performance metrics a lot easier.
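To make the namespacing idea concrete, here is a minimal sketch of how a function might turn RUBY_ENV into resource names. The ResourceNames module, its methods, the fallback to development, and the pdf-previews bucket name are all hypothetical and only for illustration; they are not taken from our codebase.

```ruby
# Hypothetical helper, not from the original project: builds
# environment-specific resource names from RUBY_ENV.
module ResourceNames
  ENVIRONMENTS = %w[development staging production].freeze

  # Reads RUBY_ENV from the given env (ENV by default) and validates it.
  # Falling back to 'development' when unset is an assumption of this sketch.
  def self.environment(env = ENV)
    value = env.fetch('RUBY_ENV', 'development')
    return value if ENVIRONMENTS.include?(value)

    raise ArgumentError, "unknown RUBY_ENV: #{value.inspect}"
  end

  # Joins a base name and the environment, e.g. pdf_processor_staging.
  def self.namespaced(base_name, env: ENV, separator: '_')
    [base_name, environment(env)].join(separator)
  end
end

staging = { 'RUBY_ENV' => 'staging' }
puts ResourceNames.namespaced('pdf_processor', env: staging)
puts ResourceNames.namespaced('pdf-previews', env: staging, separator: '-')
```

The separator option is there because S3 bucket names cannot contain underscores, so a bucket would need a hyphenated variant of the same convention.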

Instrumentation

Instrumentation refers to measuring a product's performance, diagnosing errors, and writing trace information. Early on in development, we added instrumentation to gain insight into how the bulk of the processing work performs.

Here’s our implementation:

# instrument.rb

module Instrument
  # Starts a timer and returns a lambda that logs the elapsed time when called.
  def instrument(tag)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    lambda do
      elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round
      puts("--> #{tag} took #{elapsed_ms}ms")
    end
  end
end

Then it is used like:

require 'instrument'

# Mix the module in so that instrument is callable here
include Instrument

i = instrument('doing some work')
do_the_work()
i.call

The log would look like:

--> doing some work took 300ms

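The start variable gets caught in a closure and is available to the Proc that is returned. The semi-cryptic looking Process.clock_gettime(Process::CLOCK_MONOTONIC) reads the operating system's monotonic clock. Unlike Time.now, it isn't affected by system clock adjustments, which makes it a more reliable choice for measuring elapsed time.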
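If you'd rather not remember to invoke the returned Proc, the same closure idea can be wrapped in a block-based helper. This is an alternative sketch, not what we shipped; the BlockInstrument module, the measure method, and the out: keyword are names made up for this example.

```ruby
# Hypothetical block-based variant of the timer above.
module BlockInstrument
  # Times the given block, logs the elapsed milliseconds to out,
  # and returns whatever the block returned.
  def measure(tag, out: $stdout)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    # Runs even when the block raises, so failed work is still timed.
    elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round
    out.puts("--> #{tag} took #{elapsed_ms}ms")
  end
end

include BlockInstrument

total = measure('summing numbers') { (1..100_000).sum }
puts total
```

Because the logging sits in an ensure clause, a block that raises still produces a timing line before the exception propagates, which can be handy when reading logs for a failed invocation.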

Serverless

When we first started building on Lambda, we deployed the AWS way: zip the code up and ship it via the AWS CLI. This was fine in the beginning, but it left a lot to be desired. It puts the responsibility on the developer to know the deployment process, something that should be handled by configuration. That added responsibility also made deployments less deterministic. If you missed a small detail, the deployment, and thus the function itself, could be botched. We don't zip up and deploy our Rails apps manually, and we shouldn't have to do that with our serverless functions either!

In comes the Serverless Framework. Everything we love about configuration and deterministic deployments is handled by Serverless. All configuration is written in YAML and is concise, intuitive, and incredibly powerful. Environment variables, runtime specification, memory configuration, AWS roles, and layer versioning can all be set in a single YAML file. These features became even more valuable when we rewrote our vent function in JavaScript. We now have a separate Serverless config for each Lambda function, which allows us to achieve true language isolation. If you'd like to learn more about using Serverless, take a look at their AWS quickstart guide.
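For a rough idea of what such a file can look like, here is an illustrative serverless.yml for a single Ruby function. It is a sketch rather than our actual config: the service name, handler, runtime version, memory and timeout values, and the layer ARN are all placeholders chosen for this example.

```yaml
# serverless.yml (illustrative sketch; every value below is a placeholder)
service: pdf-processing

provider:
  name: aws
  runtime: ruby3.2
  # Stage comes from the --stage flag and defaults to development
  stage: ${opt:stage, 'development'}
  memorySize: 1024
  timeout: 60
  environment:
    RUBY_ENV: ${self:provider.stage}

functions:
  pdf_processor:
    # Follows the {function_name}_{environment} convention
    name: pdf_processor_${self:provider.stage}
    handler: handler.process
    layers:
      # Placeholder ARN for a Lambda Layer holding the dependencies
      - arn:aws:lambda:us-east-1:123456789012:layer:example-layer:1
```

With a file along these lines, running serverless deploy --stage staging would target the staging variant of the function, and the same stage value feeds both the function name and RUBY_ENV.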

Shipping a Lambda application doesn’t have to be difficult. I hope what we’ve learned has helped you!