Raphael Jambalos for AWS Community ASEAN


Black Belt Techniques in Serverless Framework App Deployments

We've been writing Python applications using Serverless Framework for the past 18 months. Throughout a dozen projects and POCs, our team has accumulated best practices in configuring our applications as we deploy them.

In this post, I will walk you through the top 7 best practices we wish we had known back when we started:

1. Use a single serverless.yml
2. Manage environment variables
3. Not all environment variables have to be global
4. Have a local environment
5. Prune your applications with Serverless Prune Plugin
6. Use Lambda Layers
7. Exclude specific directories from the deployment packages

You can also access the code repository I used in validating the contents of this blog post.

[1] Use a single serverless.yml

When we started out, we used to have one serverless.yml file for each environment like this:

serverless-develop.yml
serverless-uat.yml
serverless-staging.yml
serverless-production.yml

It was hard to maintain: every change had to be copied across YAML files as code was promoted from the develop branch to later branches. It also didn't play well when devs wanted their own environments; each of them had to create their own serverless.yml file (e.g. serverless-kim.yml).

Now, we use only one serverless.yml file, and we use the --stage parameter to deploy to different environments. We don't have to maintain multiple serverless.yml files anymore.

serverless deploy --stage dev
serverless deploy --stage uat
serverless deploy --stage stg
serverless deploy --stage prd

The first problem we encountered with this new approach was how to vary environment variable values between environments. For that, we have the next tip.

[2] Manage environment variables: hardcoded, stored in .env, parameter store

When we were starting out, all our variables were hardcoded. That worked well enough while we had multiple serverless.yml files, but we ran into problems once we enforced a single serverless.yml across environments.

For this tip, we will take a look at the three types of environment variables in a serverless.yml file:

[2.1] Hardcoded

There is still a place for hardcoded env vars: values that are constant across all environments and do not need to be kept secret. Examples include FREE_SHIPPING_SUBTOTAL (the minimum amount to qualify for free shipping) and MINIMUM_CHECKOUT_SUBTOTAL (the minimum amount to complete an order).

[2.2] Stored in .env with serverless-dotenv-plugin

This is for non-secure configuration that can be stored in your developer's (or build server's) environment. These values may vary among environments. An example is FAQ_BLOG_ID (the ID of the article showing the website's FAQ, which may differ between dev and prod since they use different databases).

If you are doing deployment for different environments in your local machine, you will need to have one .env file for each environment:

.env.production
.env.staging
.env.uat
.env.develop

To manage multiple .env files, we recommend the plugin serverless-dotenv-plugin. With this plugin, you can specify which .env file to use during deployment:

NODE_ENV=develop serverless deploy --stage dev
NODE_ENV=uat serverless deploy --stage uat
NODE_ENV=staging serverless deploy --stage staging
NODE_ENV=production serverless deploy --stage prd

Before you proceed, make sure to install the plugin via:

serverless plugin install -n serverless-dotenv-plugin

[2.3] Securely stored with parameter store

When storing credentials (e.g. passwords, database credentials), it is highly advised not to keep them locally. It is best to save them to AWS Systems Manager Parameter Store and use Serverless Framework's native ${ssm} syntax to retrieve their values.

To create the parameter store values, use the Python commands below (or use the AWS Console). To get a Python REPL, run python in your terminal; you can also run AWS_PROFILE=customer-profile python in case you have multiple AWS profiles on your local machine:


import boto3

client = boto3.client('ssm', region_name="ap-southeast-1")

# For real credentials, prefer Type="SecureString" so the value is
# encrypted at rest; Type="String" is used here to keep the demo simple.

# define for DEV
client.put_parameter(Name="/sf-blackbelt/dev/DB_USER", Value="dev-admin", Type="String", Overwrite=True)
client.put_parameter(Name="/sf-blackbelt/dev/DB_PASSWORD", Value="dev_secure_password", Type="String", Overwrite=True)

# define for UAT
client.put_parameter(Name="/sf-blackbelt/uat/DB_USER", Value="uat-admin", Type="String", Overwrite=True)
client.put_parameter(Name="/sf-blackbelt/uat/DB_PASSWORD", Value="uat_secure_password", Type="String", Overwrite=True)

Next, reference them in your serverless.yml (as seen in the next section).

Putting them all together

Below is a short YAML file that incorporates the three types of environment variables. We will use it as we go through the article.

service: sf-blackbelt

frameworkVersion: '2'

provider:
  name: aws
  runtime: python3.7
  versionFunctions: false
  stage: ${opt:stage, 'dev'}
  environment:

    ##################
    # Using Parameter Store
    ##################
    DB_USER: ${ssm:/sf-blackbelt/${self:provider.stage}/DB_USER} 
    DB_PASSWORD: ${ssm:/sf-blackbelt/${self:provider.stage}/DB_PASSWORD}

    ##################
    # Using Constants
    ##################
    FREE_SHIPPING_SUBTOTAL: 1000
    MINIMUM_CHECKOUT_SUBTOTAL: 300

    ##################
    # Using Env File
    ##################
    FAQ_BLOG_ID: ${env:FAQ_BLOG_ID}
    MAIN_BANNER: ${env:MAIN_BANNER}

functions:
  completeOrder:
    handler: handler.complete_order
    timeout: 30
    events:
      - http:
          path: /orders/complete
          method: post

  calculateOrder:
    handler: handler.calculate_order
    timeout: 30
    events:
      - http:
          path: /calculate_order
          method: post

  homepage:
    handler: handler.complete_order
    timeout: 30
    events:
      - http:
          path: /
          method: GET

The parameter store section shows special SF syntax: ${ssm:...} lets us reference values in Parameter Store, while ${self:provider.stage} lets us resolve the stage dynamically.

Our .env.develop file contains:

FAQ_BLOG_ID=12345
MAIN_BANNER=678910

And our .env.uat file contains:

FAQ_BLOG_ID=33333
MAIN_BANNER=121212

If you want to follow along, also copy the handler.py file from my GitHub repo.

In the .env section, we used the ${env:VAR} syntax to access the value of the variable from the environment. Since we are using the plugin serverless-dotenv-plugin, we can get this from our chosen .env file when we deploy. We specify which .env file to select by adding the NODE_ENV variable as part of our deployment command:

NODE_ENV=develop serverless deploy --stage dev
NODE_ENV=uat serverless deploy --stage uat
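Regardless of where a variable comes from (hardcoded, .env, or Parameter Store), the handler reads it the same way: from the environment. A minimal sketch (the helper function is hypothetical; Lambda exposes all values as strings, so numeric thresholds must be cast explicitly):

```python
import os

# Hypothetical helper mirroring how a handler would read the variables
# defined in serverless.yml. Lambda exposes them as strings via
# os.environ, so numbers must be cast before comparing.
def free_shipping_applies(subtotal):
    threshold = int(os.environ.get("FREE_SHIPPING_SUBTOTAL", "1000"))
    return subtotal >= threshold

# Simulate the environment Lambda would provide:
os.environ["FREE_SHIPPING_SUBTOTAL"] = "1000"
print(free_shipping_applies(1200))  # True
print(free_shipping_applies(999))   # False
```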

[3] Not all environment variables have to be global

We got a little too comfortable using the tips above. It was so convenient to put all the environment variables under the provider section of serverless.yml: everything in one place.

This worked well for a couple of months, until a long project that we built out over several months. In the fourth month, we hit Lambda's 4KB limit on environment variables and couldn't deploy anymore, so we had to trim our environment variables down.

First, we tried adding environment variables to each Lambda function. Notice that DB_USER and DB_PASSWORD stay in the provider section because every Lambda function needs them; they are genuinely global. The others are only required by some Lambda functions, as you will see here:

provider:
  environment:
    DB_USER: ${ssm:/sf-blackbelt/${self:provider.stage}/DB_USER} 
    DB_PASSWORD: ${ssm:/sf-blackbelt/${self:provider.stage}/DB_PASSWORD}

functions:
  completeOrder:
    handler: handler.complete_order
    timeout: 30
    events:
      - http:
          path: /orders/complete
          method: post
    environment:
      FREE_SHIPPING_SUBTOTAL: 1000
      MINIMUM_CHECKOUT_SUBTOTAL: 300

  calculateOrder:
    handler: handler.calculate_order
    timeout: 30
    events:
      - http:
          path: /calculate_order
          method: post
    environment:
      FREE_SHIPPING_SUBTOTAL: 1000
      MINIMUM_CHECKOUT_SUBTOTAL: 300

  homepage:
    handler: handler.complete_order
    timeout: 30
    events:
      - http:
          path: /
          method: GET
    environment:
      FAQ_BLOG_ID: ${env:FAQ_BLOG_ID}
      MAIN_BANNER: ${env:MAIN_BANNER}

As you might have guessed from the syntax above, if two or more Lambda functions need the same environment variable, you have to define it over and over. If 10 Lambda functions need the FREE_SHIPPING_SUBTOTAL variable, you define it 10 times. And that's a recipe for disaster: forget to update just one function, and some Lambda functions will grant free shipping at 1,000 pesos while others grant it at 500.

To bypass that headache, we use references:

functions:
  completeOrder:
    handler: handler.complete_order
    timeout: 30
    events:
      - http:
          path: /orders/complete
          method: post
    environment:
      FREE_SHIPPING_SUBTOTAL: ${self:custom.environment.FREE_SHIPPING_SUBTOTAL}
      MINIMUM_CHECKOUT_SUBTOTAL: ${self:custom.environment.MINIMUM_CHECKOUT_SUBTOTAL}

  calculateOrder:
    handler: handler.calculate_order
    timeout: 30
    events:
      - http:
          path: /calculate_order
          method: post
    environment:
      FREE_SHIPPING_SUBTOTAL: ${self:custom.environment.FREE_SHIPPING_SUBTOTAL}
      MINIMUM_CHECKOUT_SUBTOTAL: ${self:custom.environment.MINIMUM_CHECKOUT_SUBTOTAL}

  homepage:
    handler: handler.complete_order
    timeout: 30
    events:
      - http:
          path: /
          method: GET
    environment:
      FAQ_BLOG_ID: ${self:custom.environment.FAQ_BLOG_ID}
      MAIN_BANNER: ${self:custom.environment.MAIN_BANNER}

custom:
  environment:
    FREE_SHIPPING_SUBTOTAL: 1000
    MINIMUM_CHECKOUT_SUBTOTAL: 300
    FAQ_BLOG_ID: ${env:FAQ_BLOG_ID}
    MAIN_BANNER: ${env:MAIN_BANNER}

With this, we define those environment variables in one place (the custom section) and choose which ones each Lambda function gets. A true win-win!
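Function-level environment blocks are merged on top of the provider-level block, with function-level values winning on key collisions. A small sketch of that merge, using the values from the example above:

```python
# Sketch of how Serverless combines provider-level and function-level
# environment blocks: the dicts are merged, and on key collisions the
# function-level value wins.
provider_env = {
    "DB_USER": "dev-admin",
    "DB_PASSWORD": "dev_secure_password",
}
function_env = {
    "FREE_SHIPPING_SUBTOTAL": "1000",
    "MINIMUM_CHECKOUT_SUBTOTAL": "300",
}

# The effective environment the completeOrder function sees:
effective_env = {**provider_env, **function_env}
print(sorted(effective_env))
```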

Checking the Calculate Order Lambda function on the AWS Console, we see the environment variables it has access to:

Image description

[4] Have a local environment

With traditional frameworks like Ruby on Rails and Laravel, we start development on our local machines and work our way up to deployment, which usually happens at the end of the project. Serverless Framework (SF) inverted that paradigm: the first thing you learn in SF is how to deploy to the cloud.

That focus on deployment plays to SF's strengths as a framework that seamlessly deploys your serverless application. However, developing apps with SF is not local-dev friendly by default. You have to set up plugins one by one before you can develop, test, and debug entirely on your local machine. Some AWS services don't have a plugin, and for those you'll be forced to work against the cloud service itself. Fret not, however: the most commonly used AWS services in SF projects already have plugins.

While it might be a hassle to set up, it is absolutely worth it. For one, you won't be incurring costs if your environment is local. It is also faster to develop (and test whether your app works) if you can execute functions locally.

While I won't be able to discuss them deeply one by one, here are the plugins with a short description of each so you can check them out:

serverless invoke local

The local invoke command is available by default. You can invoke Lambda functions by calling them in the terminal.

serverless invoke local --function homepage \
                        --path mocks/registry/show_all_registries/base.json \
                        --region ap-southeast-1

serverless invoke local --function hello \
                        --region ap-northeast-1 \
                        --data '{"checkout_data": "A1234"}'
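The file passed to --path is simply the JSON event your handler will receive, equivalent to inlining it with --data. A minimal mock (the field name here mirrors the --data example above and is otherwise hypothetical) might look like:

```json
{
  "checkout_data": "A1234"
}
```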

serverless-offline

This comes with the serverless-offline plugin. It lets you "emulate" API Gateway and Lambda locally so your app is invokable via localhost:3000.

serverless-dynamodb-local

This plugin installs and sets up a local version of DynamoDB running at localhost:8000. It also lets you define a JSON file that serves as a database seed to "pre-fill" your DynamoDB tables.
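A sketch of typical settings for this plugin, based on its documented options (adjust stages and seed sources to your project):

```yaml
custom:
  dynamodb:
    stages:
      - dev
    start:
      port: 8000
      inMemory: true
      migrate: true   # create tables from your CloudFormation resources on startup
      seed: true      # load the seed data you define under custom.dynamodb.seed
```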

serverless-offline-sqs

This plugin runs a local ElasticMQ that exposes an Amazon SQS compatible interface.
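A sketch of how the plugin can be pointed at a local ElasticMQ instance (the keys below follow the plugin's documented options, and 9324 is ElasticMQ's default port; treat this as a starting point):

```yaml
custom:
  serverless-offline-sqs:
    autoCreate: true            # create queues on start
    apiVersion: '2012-11-05'
    endpoint: http://0.0.0.0:9324
    region: ap-southeast-1
```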

[5] Prune your applications with serverless-prune-plugin

When you deploy with serverless deploy, SF packages your code along with everything it needs to run into something called a "deployment package", which is uploaded to Lambda as a new function version. In the screenshot below, we show a Lambda function with 53 versions, one for each of the 53 times I deployed this function. If my deployment package is 1MB in size, I have already consumed 53MB of storage.

Image description

That may seem small considering that Lambda has a limit of 75GB of storage per region per account. But if this Lambda function is part of an SF application with 100 Lambda functions, 53 deployments of 1MB each across 100 functions quickly compound to 5,300MB.

After a few months, we reached 75GB and could not deploy to our staging region. This limit can be raised to terabytes, as per the docs, but that only spoils our developers and lets them ignore deployment package size. It's better to stick to the 75GB limit and discipline the team to follow best practices for keeping deployment packages small.

Image description

Get started by installing the plugin:

serverless plugin install -n serverless-prune-plugin

Then add this to the custom section of your serverless.yml file:

custom:
  prune:
    automatic: true
    includeLayers: true
    number: 1

Essentially, this snippet automatically deletes old versions of each of your Lambda functions (and their layers, via includeLayers), keeping only the most recent version. You can also prune on demand with serverless prune -n 1.

Every time you deploy, you will see something like this:

Image description

[6] Use Lambda Layers

Usually, it's not the code we write that bloats our deployment package to 1MB (or more); the Python packages in our requirements.txt file make up most of that 1MB.

For illustration purposes, let's say in that 1MB deployment package, 900KB of that is Python packages, while the code we write for our application is only 100KB.

The previous tip used the same code repository; in this tip, we expand that example to 100 Lambda functions. We are uploading the same 900KB of Python packages for each Lambda function, which is why each deployment totals 100MB.

With Lambda Layers, we can upload the 900KB of Python packages once, as a layer. Each function's package is then only the 100KB of application code. That's 10MB for all 100 Lambda functions, plus the 900KB layer: roughly 10.9MB per deployment, down from 100MB before this approach.
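The back-of-the-envelope math, using the illustrative sizes above:

```python
# Illustrative layer savings: 100 functions, 100KB of app code each,
# and 900KB of Python dependencies moved into a single shared layer.
n_functions = 100
code_kb = 100        # application code per function
packages_kb = 900    # Python dependencies

without_layer_kb = n_functions * (code_kb + packages_kb)  # every function ships the packages
with_layer_kb = n_functions * code_kb + packages_kb       # packages uploaded once as a layer

print(without_layer_kb, with_layer_kb)  # 100000 10900
```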

provider:
  layers:
    - Ref: PythonRequirementsLambdaLayer

custom:
  pythonRequirements:
    layer: true

I tried this out myself (note that the pythonRequirements section comes from the serverless-python-requirements plugin). This was before adding layer: true. The application has two Python packages: boto3 and pytz.

Image description

And this was after adding layer: true:

Image description

Immediately we see the deployment package per Lambda function shrink from 9.2MB to 71.3KB each. The 9.2MB is instead uploaded once as a Lambda Layer and referenced by the Lambda functions.

Image description

During serverless deploy, you will see SF upload two separate deployment packages: one for your Lambda functions, and another for your Lambda layer:

Image description

[7] Exclude specific directories from the deployment packages

Another thing that bloats our deployment packages is Python's virtual environment (venv) and Node's node_modules. Both keep a project's dependencies in a local directory so we don't have to install everything globally. The problem is that these folders are usually huge.

If we do nothing, both folders get uploaded by default as part of our deployment package. To prevent that, we include this little snippet in our serverless.yml:

package:
  exclude:
    - venv/**
    - node_modules/**
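Note that package.exclude works in Serverless Framework v2 (which this post uses) but was removed in v3 in favor of package.patterns; the equivalent there uses ! prefixes:

```yaml
package:
  patterns:
    - '!venv/**'
    - '!node_modules/**'
```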

This has an immediate effect on the size of our deployment package:

Image description

How about you? What are your black-belt Lambda tips?

Photo by Uriel Soberanes on Unsplash

Special thanks to my editor, Allen, for making my posts more coherent.
