Serverless for the server obsessed, Part 1

Elliott Roche ・5 min read

A/B testing doesn't offer much unless you get some useful data out of it, and I needed a way to calculate how statistically significant the results of an A/B test were. Python is one of the obvious choices for this kind of work. But Preferr is written in Ruby and has a Rails-based GraphQL API, so I couldn't just shove in some Python and have it work.

And then a little voice in the back of my head whispered, "Hey, remember all those podcasts and articles about serverless? You should try it."

Getting started with Serverless

The setup

As someone who had no idea where to start with serverless, I googled serverless and, what do you know, serverless.com popped up. With a name like that, surely it was the right place to start.

I signed up, followed the setup instructions, and then got to work on setting up Python and a virtual environment.

Note:
If you're running macOS and you're installing Python 3 via Homebrew like I did, be warned that it might screw up OpenSSL and really ruin your day. To unruin your day, you can run brew reinstall openssl@1.1 or check this Stack Overflow thread for other possible ways to solve the problem.

Generating a new serverless project

Serverless comes with a very nice CLI that lets you quickly generate boilerplate from a ton of different templates.

I used the aws-python3 template:

serverless create --template aws-python3 --name my-special-service --path serverless-project

That generated 3 files: .gitignore, handler.py, and serverless.yml.

Each of those comes with some generated code that can be used to take the function for a little test run.

Installing Python libraries

Among the many Serverless guides and tutorials is one about managing Python packages. I referenced it frequently.

Note: Their tutorial uses virtualenv for Python, but I used pipenv.

I needed the numpy and scipy packages, so I started a pipenv shell and installed them.

pipenv shell
pipenv install numpy
pipenv install scipy

That generated a Pipfile and Pipfile.lock in the project and I was now free to use those packages in handler.py.

The handler.py file

The handler.py file in a Serverless project is where the logic for your function(s) lives.

The generated output from serverless create is this:

import json


def hello(event, context):
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response

    # Use this code if you don't use the http event with the LAMBDA-PROXY
    # integration
    """
    return {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "event": event
    }
    """

Obviously, this file needed to be modified to do some real work.
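
Before changing anything, the generated function can be exercised locally by calling it with a plain dictionary standing in for the event. The sketch below repeats the generated hello handler so that it runs on its own. The sample event is made up for illustration, and a real API Gateway event carries many more fields.

```python
import json


def hello(event, context):
    # Same shape as the handler generated by serverless create.
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event,
    }
    return {"statusCode": 200, "body": json.dumps(body)}


# Hypothetical event for a local test run; not a real API Gateway payload.
sample_event = {"body": json.dumps({"ping": True})}

# Lambda passes a context object as the second argument; None is enough here
# because the generated handler never reads it.
response = hello(sample_event, None)
decoded = json.loads(response["body"])
print(response["statusCode"], decoded["message"])
```

Note that the body in the response is a JSON string rather than a dictionary, which is why it has to be decoded again before reading the message.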

Here's what the end result of handler.py looks like for Preferr:

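try:
    # Unpacks the zipped dependencies (see zip: true in serverless.yml).
    # This has to run before numpy or scipy are imported.
    import unzip_requirements
except ImportError:
    # The module only exists in the deployed bundle, so skip it elsewhere.
    pass
import json
# from serverless_sdk import tag_event
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats import fisher_exact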


def chiValue(event, context):
    contingency_table = json.loads(event["body"])
    statistic = chi2_contingency(contingency_table['contingencyTable'])

    headers = {
        "Access-Control-Allow-Origin": "*",
    }

    body = {
        "chi2": statistic[0],
        "pValue": statistic[1],
        "input": event
    }

    response = {
        "statusCode": 200,
        "headers": headers,
        "body": json.dumps(body)
    }

    return response


def fishersExact(event, context):
    contingency_table = json.loads(event["body"])
    statistic = fisher_exact(contingency_table['contingencyTable'])

    headers = {
        "Access-Control-Allow-Origin": "*",
    }

    body = {
        "pValue": statistic[1],
        "input": event
    }

    response = {
        "statusCode": 200,
        "headers": headers,
        "body": json.dumps(body)
    }

    return response
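
Both handlers expect the request body to be a JSON string with a contingencyTable key. As a rough cross-check of what chiValue reports for a 2x2 table, the sketch below computes the chi-squared statistic and p-value with only the standard library. It assumes a 2x2 table (one degree of freedom) and applies Yates' continuity correction, which chi2_contingency applies by default in that case. The visitor counts and the helper name chi2_2x2 are invented for illustration, and this is not how SciPy implements the test.

```python
import json
import math


def chi2_2x2(table):
    """Chi-squared test for a 2x2 table with Yates' correction (1 dof)."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            # Yates' correction shrinks each deviation by 0.5, never past zero.
            deviation = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += deviation ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical A/B results: rows are variants, columns are converted / not converted.
event = {"body": json.dumps({"contingencyTable": [[30, 70], [45, 55]]})}
table = json.loads(event["body"])["contingencyTable"]
chi2, p_value = chi2_2x2(table)
print(round(chi2, 4), round(p_value, 4))
```

For anything beyond a quick sanity check, the SciPy functions used in handler.py remain the better choice, since they handle larger tables and edge cases this sketch ignores.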

The most critical parts of this file are at the top with the imports. To understand why, I'll take you to the serverless.yml file.

The serverless.yml file

serverless.yml is the configuration file that tells Serverless how your functions are defined and how they should be deployed to AWS Lambda. It establishes what cloud provider you're using, what Serverless plugins are needed, what functions can be run, and a lot more. Here's a full reference for the options in this file when using aws as your provider: serverless.yml Reference

I spent the majority of my time tweaking this file to get it right. I deployed at least a hundred times over the course of a few days as I debugged and tried different settings.

These are the settings that ended up being the most important for me.

plugins:
  - serverless-python-requirements

This plugin automatically bundles Python libraries found in requirements.txt or Pipfile for use in the deployed function(s).

To install it, run npm init in your project and follow the directions to create a package.json file.

After that, run npm install --save serverless-python-requirements.

custom:
  pythonRequirements:
    dockerizePip: non-linux
    zip: true

This tells the serverless-python-requirements plugin how it should bundle up dependencies like numpy and scipy.

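dockerizePip uses Docker to package these dependencies, and it's essential for getting things working on AWS Lambda. Setting it to non-linux means Docker is only used when you're building on something other than Linux. To quote the Serverless docs: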

"Docker packaging is essential if you need to build native packages that are part of your dependencies like Psycopg2, NumPy, Pandas, etc."

functions:
  chiValue:
    handler: handler.chiValue
    events:
      - http:
          path: /chi-value
          method: post
  fishersExact:
    handler: handler.fishersExact
    events:
      - http:
          path: /fishers-exact
          method: post

And this defines the functions that will be given HTTP endpoints for you to call. Serverless will give you a URL that you can find on your dashboard.
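
Once deployed, each function is reachable with a POST whose body matches what the handlers parse. A minimal sketch of building such a request with Python's standard library follows. The base URL is a placeholder rather than a real endpoint, the helper name is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder; substitute the URL Serverless reports for your deployment.
BASE_URL = "https://example.invalid/prod"


def build_stats_request(path, contingency_table):
    """Build (but do not send) a POST request for one of the endpoints."""
    payload = json.dumps({"contingencyTable": contingency_table}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_stats_request("/chi-value", [[30, 70], [45, 55]])
print(request.get_method(), request.full_url)
# Sending it would look like: urllib.request.urlopen(request)
```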

Here's how Preferr's serverless.yml file ended up:

service: preferr-stats-chisquared
app: preferr-stats
org: ellitt

provider:
  name: aws
  runtime: python3.6
  stage: prod
  region: us-east-1

plugins:
  - serverless-python-requirements

package:
  individually: true
  exclude:
    - venv/**
    - .vscode/**

custom:
  pythonRequirements:
    dockerizePip: non-linux
    zip: true

functions:
  chiValue:
    handler: handler.chiValue
    events:
      - http:
          path: /chi-value
          method: post
  fishersExact:
    handler: handler.fishersExact
    events:
      - http:
          path: /fishers-exact
          method: post

An Important Modification

The guide that I followed from Serverless got me about 80% of the way towards having a usable Lambda function on AWS, but I did have to dig and figure out why my deploys kept getting rejected.

Adding zip: true to pythonRequirements

zip: true keeps the bundled dependencies compressed inside the deployment package, which makes it much smaller and prevents Lambda from rejecting it for exceeding its package size limit. numpy and scipy are both large libraries, so they need to be compressed in order to fit.
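
One way to see how close a bundle is to Lambda's size limit is to add up the size of the unpacked dependencies before deploying. The sketch below does that for any directory. The 250 MB figure is the unzipped deployment package quota that AWS documents for Lambda, so check it against the current docs. The function name and the demo directory are hypothetical.

```python
import os
import tempfile

# Unzipped deployment package quota documented by AWS; verify against current docs.
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Demo on a throwaway directory; point this at your unpacked dependencies instead.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "blob.bin"), "wb") as handle:
    handle.write(b"\0" * 1024)

size = directory_size(demo_dir)
print(f"{size} bytes, fits: {size <= UNZIPPED_LIMIT_BYTES}")
```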

zip: true also requires a change to handler.py.

try:
    import unzip_requirements
except ImportError:
    pass

This will unzip the compressed libraries so that they can be used in handler.py.
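
The idea behind that import can be illustrated with a small standalone sketch: extract a zip archive of packages into a temporary directory, then put that directory on sys.path so later imports resolve. This only illustrates the concept under that assumption and is not the plugin's actual code. The archive name and the tiny demo module are made up.

```python
import importlib
import os
import sys
import tempfile
import zipfile

work_dir = tempfile.mkdtemp()

# Build a stand-in for a zipped requirements bundle containing one module.
archive_path = os.path.join(work_dir, "requirements-demo.zip")
with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("demo_dependency.py", "ANSWER = 42\n")

# At startup: unpack the bundle and make its contents importable.
extract_dir = os.path.join(work_dir, "unpacked")
with zipfile.ZipFile(archive_path) as archive:
    archive.extractall(extract_dir)
sys.path.insert(0, extract_dir)
importlib.invalidate_caches()

# This import only succeeds because the extracted directory is on sys.path.
demo_dependency = importlib.import_module("demo_dependency")
print(demo_dependency.ANSWER)
```

This is also why the order of imports matters in handler.py: anything that lives in the compressed bundle can only be imported after the unpacking step has run.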

Deployment

After spending a few days sorting all of this out, I deployed the function and gave it a try.

serverless deploy

And it deployed successfully! Now, I had the function on AWS Lambda via Serverless, but it wasn't worth much if I wasn't making any requests to it.

I'll cover making those requests in the next post.
