
AWS Serverless services for REST API

Bruno Kümmel ・ 13 min read

Introduction

One of the best definitions I've seen for an API is that it works as a "translator": it lets programs that are written in different languages, or that run on different operating systems, exchange data. Considering the number of different systems, programming languages and frameworks out there, chances are that most developers will, sooner or later, have to use or develop their own API to exchange data with other programs.

Besides the programming language the developer is familiar with (or the learning curve of a new one), the next important decision, as with any web application, is where to deploy your API.

The developer must weigh performance, availability, scalability, and the cost of running the service. Whenever we see these words together, we automatically think of a cloud service, so the natural choice is to deploy the API on a cloud provider.

The Cloud ☁️

When we first think about deploying an app or a service to the cloud, the classical way would be to instantiate a machine (EC2 on AWS, Compute Engine on GCP or Azure Virtual Machines), install the needed libraries and deploy the application, right?

The thing about this strategy is that you have only changed the location of the server running your app. You still need to manage most aspects of the underlying system: patch the OS, install packages and libraries, perform backups, manage security, provision more machines if demand grows, and the list goes on... Besides, you are paying the cloud provider for the machine's runtime even when it is idle.

For a use case like an API, you can really leverage the cloud's potential once you start shifting your app to make the most of serverless services.

Serverless paradigm

"By 2022, most platform as a service (PaaS) offerings will evolve to a fundamentally serverless model, rendering the cloud platform architectures dominating in 2017 as legacy architectures."
Gartner

What is serverless anyway? Serverless is a cloud-native architecture that enables you to shift more of the operational responsibilities to the cloud provider. It allows you to build and run applications and services without thinking about servers: you just manage your code and its life cycle. Isn't that great?

  • You don't have to spend time or money on the maintenance, provisioning and management of the underlying systems;
  • You don't have to plan for scalability or provision new machines for a sudden increase in the app's usage, and
  • You only pay for the actual use of the service.

But not everything is wonderful and golden in the serverless realm. The trade-off is that the complexity of designing serverless solutions tends to increase a bit. The developer now has to know how to set up a bunch of different services and how to orchestrate them to perform the same task that a single application used to handle. So it is a good idea to create some small projects to get to know these services.

So let's play a little bit with AWS serverless solutions for a REST API. We'll be using AWS API Gateway, AWS Lambda functions and DynamoDB.

Case Study

For this mini-guide I've chosen a simple example that manages items in a warehouse. The methods are the classical actions of a regular CRUD API:

  • list all the items;
  • check the price of a specific item;
  • create a new item;
  • update the price of an item, and
  • delete an item.

Since serverless services scale according to demand (and, consequently, so does the cost of operating them), it is extremely important to set up authentication for the API before deploying it. However, I think authentication for the API deserves a post of its own, so for now let's go through the basic functions of our API.

Normally, if you were developing your API in the classical way, you would have one big monolith of a solution: the whole process would be handled by an app running on an EC2 instance. In our case, we'll split it across three AWS services: API Gateway, to handle the user-facing interface through its endpoints; AWS Lambda, to receive the requests from API Gateway, process them and exchange data with the database; and DynamoDB, to serve as the database of the solution.

So let's start:

Creating the API and the resources: AWS API Gateway

The service that handles the user requests is AWS API Gateway. It is a fully managed service (as all serverless services are) that handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls.

There are two ways to set it up manually (as with most AWS services): you can log in to the AWS console and navigate through the options, or use the AWS CLI.

By using the console you would get these options:

[Image: API Gateway service splash screen]

We click on the Build button on the REST API (not the private one).

[Image: REST API creation settings]

Then we choose the initial options for our API such as name, description, and Endpoint Type.

The equivalent CLI command:

$ aws apigateway create-rest-api --name 'my-warehouse-API' --endpoint-configuration types="REGIONAL" --description 'This is a CLI created API'
{
    "id": "123456789",
    "name": "my-warehouse-API",
    "description": "This is a CLI created API",
    "createdDate": "2020-08-23T08:12:50+02:00",
    "apiKeySource": "HEADER",
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }
}

Next we need to create the resources item and items in API Gateway. Again, we can use either the AWS console or the CLI. In the AWS console:

[Image: Creating a resource in the API Gateway console]

To create a new resource from the CLI, you need the id of the API and the id of the parent resource under which the new one will be added. The CLI commands to list the resources of a given API and to create a new resource are:

$ aws apigateway get-resources --rest-api-id 123456789 
{
    "items": [
        {
            "id": "555555",
            "parentId": "d8wd8wd8wd",
            "pathPart": "item",
            "path": "/item"
        },
        {
            "id": "d8wd8wd8wd",
            "path": "/"
        }
    ]
}
$ aws apigateway create-resource --rest-api-id 123456789 --parent-id d8wd8wd8wd --path-part 'items'
{
    "id": "666666",
    "parentId": "d8wd8wd8wd",
    "pathPart": "items",
    "path": "/items"
}

Great! Next we need the methods for the resources, but before we jump to adding the methods and the Lambda functions, we need to set up our database.
So let's pause the API Gateway setup here and create our table in DynamoDB.

DynamoDB

DynamoDB is a fully managed NoSQL database service from AWS that provides seamless scalability. There is a HUGE amount of information on DynamoDB that can't be covered here, so I'll cut to the chase on how to set it up for our API. However, I strongly recommend going through the documentation before deploying it beyond a development environment. So, let's create the table.

First, navigate to the DynamoDB service on AWS and click the Create table button. Choose your table name, partition key and sort key, and click Create.

[Image: DynamoDB Create table form]

Done!

The CLI command is a bit uglier this time. If you want to fine-tune your table, it is best to pass the configuration in a JSON file or to create a CloudFormation template, but since our table is relatively simple we can use the following command:

$ aws dynamodb create-table \
     --table-name warehouse-DDB-table \
     --attribute-definitions AttributeName=item_code,AttributeType=S AttributeName=name,AttributeType=S \
     --key-schema AttributeName=item_code,KeyType=HASH AttributeName=name,KeyType=RANGE \
     --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
{
    "TableDescription": {
        "AttributeDefinitions": [
            {
                "AttributeName": "item_code",
                "AttributeType": "S"
            },
            {
                "AttributeName": "name",
                "AttributeType": "S"
            }
        ],
        "TableName": "warehouse-DDB-table",
        "KeySchema": [
            {
                "AttributeName": "item_code",
                "KeyType": "HASH"
            },
            {
                "AttributeName": "name",
                "KeyType": "RANGE"
            }
        ],
        "TableStatus": "CREATING",
        "CreationDateTime": "2020-08-27T21:41:28.828000+02:00",
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0,
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5
        },
        "TableSizeBytes": 0,
        "ItemCount": 0,
        "TableArn": "arn:aws:dynamodb:<region>:1234567898123:table/warehouse-DDB-table",
        "TableId": "12345678-aaaa-12f-1222f-g12312345512331234e1"
    }
}

For our example, let's assume that item_code, the partition key, is a manufacturer code with enough cardinality to distribute our items homogeneously across partitions, and that name is the name of the product. The primary key is then composed of the manufacturer's code plus the item's name.
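
For instance, with hypothetical values, two products from the same manufacturer share the partition key and are told apart by the sort key:

```python
# Hypothetical sample items; the composite primary key is (item_code, name)
sample_items = [
    {"item_code": "ACME-001", "name": "hammer",      "price": "9.99"},
    {"item_code": "ACME-001", "name": "screwdriver", "price": "4.50"},  # same partition key, different sort key
    {"item_code": "BOLT-042", "name": "hex bolt M8", "price": "0.30"},
]
```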

Great! Now on to the Lambdas! Don't forget to take note of your table's ARN. We'll need it to set up the IAM roles that allow the Lambda functions to read from and write to the table.

AWS Lambda functions

Now comes the fun part! A first approach to using Lambda functions would be to simply paste your working code into the handler, creating a humongous Lambda monolith!
In the serverless paradigm, this is usually not considered the ideal design. There is a trade-off between small, quick functions, which cost less per execution, and larger functions, which may see fewer cold starts because each invocation runs longer. When in doubt about whether to increase a Lambda's execution time or its memory, AWS usually recommends increasing the memory. So I take it that, for the common case, it is best to keep the functions quick and single-purpose.

With this in mind, let's set up our functions. We'll need a function for each of the methods: GET (single item and all items), POST, PUT, and DELETE.

GET method

The steps to set up the functions are very similar. The only things that change are the code of each function and the IAM permissions each function needs. So we'll go step by step through the first function:

First we click the button to create the function:

[Image: Create function button in the Lambda console]

In our case, since we are trying to learn the concepts and commands, we'll build it from scratch, choose the name of the function and click Create...

[Image: Choosing the function name]

and Done!

[Image: The newly created Lambda function]
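
For reference, the same creation step could also be scripted with boto3. This is only a sketch: the function name, role ARN, and zip file below are hypothetical placeholders.

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical deployment package: a zip containing lambda_function.py
with open("get_item_function.zip", "rb") as f:
    zipped_code = f.read()

lambda_client.create_function(
    FunctionName="my-warehouse-get-item",                              # hypothetical name
    Runtime="python3.8",
    Role="arn:aws:iam::123456789012:role/my-warehouse-get-item-role",  # hypothetical role ARN
    Handler="lambda_function.lambda_handler",
    Code={"ZipFile": zipped_code},
)
```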

Once we have created the function, we need to set up the permissions it needs to exchange data with DynamoDB. A good practice here is to always grant the minimum permissions necessary for the function to perform its task. So we click on Permissions:

[Image: Lambda Permissions tab]

Click on the Role name. This will take us to the IAM console, where we attach the needed policies to our Lambda's role. As a rule of thumb, every function should have its own role with its own set of permissions.

[Image: The Lambda execution role in the IAM console]

For the GET method, we can use the AWS managed policy: AmazonDynamoDBReadOnlyAccess. For the POST, PUT and DELETE methods, we'll need to create our own write policy, since the AWS managed FullAccess policy is not really recommended for this case.

[Image: DynamoDB policy attached to the Lambda role]
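
As a sketch of what such a scoped-down write policy could look like (the role name, policy name, and table ARN below are hypothetical placeholders), you could attach an inline policy to the function's role with boto3:

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical table ARN: use the ARN returned when you created the table.
TABLE_ARN = "arn:aws:dynamodb:<region>:123456789012:table/warehouse-DDB-table"

write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Only the actions this particular function needs (e.g. the POST Lambda)
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": TABLE_ARN,
        }
    ],
}

iam.put_role_policy(
    RoleName="my-warehouse-post-lambda-role",   # hypothetical role name
    PolicyName="warehouse-table-write",         # hypothetical policy name
    PolicyDocument=json.dumps(write_policy),
)
```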

With the permissions in place, we can add the Lambda code. A good pattern when writing Lambda functions is to keep the handler as small as possible.
This means that the database connection (and any service connection, for that matter) should be created outside of the handler. This is a good practice because everything defined outside of the handler is frozen between Lambda invocations and may be reused by subsequent invocations of the same execution environment. This way you won't flood your database with connections when your functions scale up.

There are a lot of other settings you can tune in your Lambda functions. The one I find most interesting to know about while developing is the concurrency setting: you can control how many concurrent executions of your function can coexist at a given time. By default, the limit for your WHOLE account is 1,000 concurrent executions per region. So, if you have several functions and one of them is running 1,000 instances, the other functions will be throttled and won't respond in time. It is therefore a good idea to tune the number of concurrent executions allowed for each of your functions.
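
For example (the function name below is hypothetical), you could cap a function's reserved concurrency with boto3:

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve at most 50 concurrent executions for this function, leaving the rest
# of the account-level concurrency pool for the other functions.
lambda_client.put_function_concurrency(
    FunctionName="my-warehouse-get-item",   # hypothetical function name
    ReservedConcurrentExecutions=50,
)
```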

The following listings show the handler code for the functions that implement the GET, POST, PUT and DELETE methods.

import json
import boto3
import os
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb') 
table = dynamodb.Table(os.environ['TABLE_NAME'])

def lambda_handler(event, context):

    search_term_id = event["queryStringParameters"]['code']
    search_term_name = event["queryStringParameters"]['name']

    try:
        # The key attribute names must match the table's key schema: item_code (partition key) and name (sort key)
        response = table.get_item(Key={'item_code': search_term_id, 'name': search_term_name})

    except ClientError as e:
        return {
            'statusCode': 500,
            'body': json.dumps( e.response['Error']['Message'] )
        } 
    else:
        if 'Item' not in response:
            # No item matches the given composite key
            return {
                'statusCode': 404,
                'body': json.dumps({'message': 'item not found'})
            }

        price = "{:.2f}".format(float(response['Item']['price']))
        name = response['Item']['name']
        return {
            'statusCode': 200,
            'body': json.dumps({'name': name, 'price': price})
        }

Notice that to use the get_item function, we need both keys, even if we consider the item's name to be unique: both the partition key and the sort key must be present in the request.

GET for all items

# Same Boilerplate code: libraries and connection to DynamoDB before the handler.

def lambda_handler(event, context):
    try:
        response = table.scan()
    except ClientError as e:
        return {
            'statusCode': 500,
            'body': json.dumps( e.response['Error']['Message'] )
        } 
    else:
        for el in response['Items']:
            el['price']="{:.2f}".format(float(el['price']))

        return {'statusCode': 200, 'body': json.dumps(response['Items']) }

The table.scan() function, on the other hand, returns all the data in the table by default, so we don't need to specify any keys.
For larger tables (hundreds of thousands of entries), it is better to paginate the results. It is also important to keep in mind that scan is a costly operation that consumes a lot of DynamoDB read capacity units (RCUs) and should not be used often.
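
As a sketch of what that pagination could look like, assuming the same table object used above, you can follow the LastEvaluatedKey returned by each scan call:

```python
# Paginate a scan by following LastEvaluatedKey until the table is exhausted
def scan_all_items(table):
    items = []
    scan_kwargs = {}
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get('Items', []))
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break
        scan_kwargs['ExclusiveStartKey'] = last_key
    return items
```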

POST method: Create the Lambda function

The function has permissions attached to read from and write to the DynamoDB table. As expected for a POST method, we look up the item first to verify whether it already exists in the table. If it does, the POST method returns an error alerting that the item is already present in the database.


import json
import boto3
import os
from botocore.exceptions import ClientError
from decimal import Decimal
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb') 
table = dynamodb.Table(os.environ['TABLE_NAME'])

def lambda_handler(event, context):

    item_code = event['queryStringParameters']['code']
    item_name = event['queryStringParameters']['name']
    price = event['queryStringParameters']['price']

    # Key names must match the table's key schema (item_code, name)
    response = table.get_item(Key={'item_code': item_code, 'name': item_name})

    if 'Item' in response:
        return {
            'statusCode': 409,
            'body': json.dumps({'message':'item already exists in the DB'})
        }
    else:

        response = table.put_item(
          Item={
                'item_code': item_code,
                'name': item_name,
                'price':Decimal(price)
                }
        )
        return{
            'statusCode': 201,
            'body': json.dumps({'message':f"{item_name} inserted in the DB" })
        } 


PUT method: Update Lambda function

The PUT method, on the other hand, allows us to add a new item to our table if it doesn't exist. If the item is already present, it is possible to update its price. For this, we attach the IAM permissions to read and to update an entry in the table.


# Same code as the create function

    if 'Item' in response:

        response = table.update_item(
            Key={
                'item_code': response['Item']['item_code'],
                'name': response['Item']['name']
            },
            UpdateExpression="set price=:p",
            ExpressionAttributeValues={
                ':p': Decimal(price)
            },
            ReturnValues="UPDATED_NEW"
        )


        return {
            'statusCode': 201,
            'body': json.dumps({'message':f"{item_name} price updated to {price}"})
        }

    else:

        # Same code as the create function

DELETE method

As with the get method, to delete an item from the table we need both the item_code and item_name.

import json
import boto3
import os
from botocore.exceptions import ClientError


dynamodb = boto3.resource('dynamodb') 
table = dynamodb.Table(os.environ['TABLE_NAME'])

def lambda_handler(event, context):

    item_code = event["queryStringParameters"]['code']
    item_name = event["queryStringParameters"]['name']

    try:
        response = table.delete_item(Key={'item_code': item_code, 'name': item_name})
    except ClientError as e:
        return {'statusCode': 400, 'body': json.dumps({'message': e.response['Error']['Message']} ) }

    else:
        return {
            'statusCode': 200,
            'body': json.dumps({'message': f"{item_name} no longer in the DB"})
        }

For all of these functions, the appropriate permissions to access the DynamoDB table were created, and the functions were tested to check that they reach the service correctly.

One extra step that we would normally take at the final stage of development, when getting the function ready for testing, is to publish the Lambda.
Publishing creates a version of the function, and that version cannot be modified.
We can still change the code of the $LATEST version (the development version), but a published version of the Lambda is immutable.

There is a very nice feature in AWS Lambda that lets you create aliases for published versions. With these aliases you can manage the distribution of incoming requests across different versions of your Lambda function. This makes it possible to plan a gradual migration from one version to another when changes are made to the function, AND you don't have to go into every service that calls the function to change the version manually!
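
As a sketch (the function name, alias, versions, and weights below are hypothetical), publishing a version and shifting a small share of traffic to it could look like this with boto3:

```python
import boto3

lambda_client = boto3.client("lambda")

# Freeze the current $LATEST code as an immutable, numbered version.
new_version = lambda_client.publish_version(FunctionName="my-warehouse-get-item")["Version"]

# Point the 'prod' alias mainly at version 1, but route ~10% of requests
# to the newly published version for a gradual rollout.
lambda_client.update_alias(
    FunctionName="my-warehouse-get-item",   # hypothetical function name
    Name="prod",                            # hypothetical, previously created alias
    FunctionVersion="1",
    RoutingConfig={"AdditionalVersionWeights": {new_version: 0.1}},
)
```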

Connecting to the API and testing it

Finally, we can go back to API Gateway to create the methods for our resources. First we select the HTTP request method:

[Image: Creating a method for the resource]

Then we link it to the respective Lambda function. It is important to check the Lambda proxy integration box. With this option, API Gateway passes the raw request to the integrated Lambda function as-is, including the request headers, query string parameters, URL path variables, payload, and API configuration data.

[Image: GET method setup with Lambda proxy integration]
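
With proxy integration, the handler receives an event roughly shaped like the sketch below (a trimmed, hypothetical example) and must return a dictionary with at least a statusCode and a JSON-serialized body, which is exactly what the handlers above do:

```python
# Trimmed, hypothetical example of the proxy-integration event our handlers receive
event = {
    "resource": "/item",
    "httpMethod": "GET",
    "headers": {"Accept": "application/json"},
    "queryStringParameters": {"code": "ACME-001", "name": "hammer"},
    "pathParameters": None,
    "body": None,
}

# The handler's return value must follow the proxy-integration response format
response = {
    "statusCode": 200,
    "body": '{"name": "hammer", "price": "9.99"}',
}
```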

The diagram that follows shows the data flow from the API request to the Lambda function. DynamoDB is hidden from the rest of the structure.

[Image: Method execution flow between API Gateway and Lambda]

Once the API Gateway method is connected to the Lambda function, we can test it and see it working.
A very interesting detail, not shown in the picture, that demonstrates the cold-start issue of Lambda functions: the first invocation of the Lambda in the API Gateway test has a latency of over 800 ms, which is pretty bad! The subsequent calls, however, are under 100 ms, which is pretty good for a web service. For comparison, a similar WSGI API written in Python and running on an EC2 instance in the same AWS region has a latency of a little over 200 ms.

[Image: API Gateway test of the GET method]

Great, our tests are working fine! The next step would be to deploy the API but, as I mentioned before, it is better to set up some authentication for your API service first. Considering that we are creating a serverless application that can easily scale up, if we don't set up some basic security we could fall victim to an attack that generates thousands of requests to our API, and our AWS bill would go through the roof.

Conclusion

That's it! In this walk-through, we had a glimpse of how to set up a serverless RESTful API using API Gateway, AWS Lambda and DynamoDB. There is much more to each of these services than we could cover in a single post. The idea here was to get started with a simple example and become familiar with the very first steps.

As I mentioned in the introduction, I firmly believe that in the next couple of years most of the services offered by cloud providers will have some serverless aspect to them. Not only because it is better for the customer (in terms of cost and availability), but because it may be easier for the cloud provider to manage as well (maintenance, security and cost of operation).

So, in this fast-paced world of software development, we must keep ourselves updated, or at least somewhat familiar with some of these services.

Any comments, suggestions, tips... feel free to share your thoughts!

Special Thanks

To all the amazing people of the #100DaysOfCloud community, You guys are awesome!!
