Danny Kay

Building a serverless COVID-19 notification pipeline

Intro

Due to the lockdown in place because of COVID-19, I found myself with more spare time than usual, so I decided to put it to better use. I'd built plenty of Lambdas over the years but I'd never built a fully serverless solution...so whatever I was building was going to be serverless.

I was having a browse around Github at COVID-19 datasets and came across a repository that contained data for the UK and was updated daily...then I had a lightbulb moment!

I'd seen a ton of amazing dashboards tracking cases all over the globe but my angle was different. I'd ingest the UK data from Github and somehow send out an email showing the increase in confirmed cases, tests and deaths.

I put my AWS cap on and came up with a basic solution...

  • A Lambda that ingests a CSV document from Github and stores it in S3, triggered by a CloudWatch event
  • A Lambda that reads the CSV doc from S3 and stores the latest data in DynamoDB, utilising DynamoDB as a materialised view
  • Another Lambda which is triggered by a DynamoDB Stream and calculates the difference between the old data and new data then sends out an email notification advising of the changes

Then I put the wheels in motion...

First steps - Installing AWS SAM and Creating our Project

We are going to be using the AWS SAM framework to build the pipeline. It's pretty straightforward to get installed and the instructions can be found here.

With SAM installed, we can create a new project by running sam init and following the CLI prompts.


▶ sam init

Which template source would you like to use?

1 - AWS Quick Start Templates
2 - Custom Template Location

Choice: 1

Which runtime would you like to use?
1 - nodejs12.x
2 - python3.8
3 - ruby2.7
4 - go1.x
5 - java11
6 - dotnetcore3.1
7 - nodejs10.x
8 - python3.7
9 - python3.6
10 - python2.7
11 - ruby2.5
12 - java8
13 - dotnetcore2.1
14 - dotnetcore2.0
15 - dotnetcore1.0

Runtime: 8

Project name [sam-app]: Serverless-Covid-19-Pipeline
Cloning app templates from https://github.com/awslabs/aws-sam-cli-app-templates.git

AWS quick start application templates:

1 - Hello World Example
2 - EventBridge Hello World
3 - EventBridge App from scratch (100+ Event Schemas)

Template selection: 1

-----------------------
Generating application:
-----------------------
Name: Serverless-Covid-19-Pipeline
Runtime: python3.7
Application Template: hello-world
Output Directory: .

Next steps can be found in the README file at ./Covid-19-Serverless-Pipeline/README.md

I chose to write the Lambdas in Python because I'm a big fan of the language and it's something I want to use more in my projects, even though I'm still learning it.

With the project created we should have a folder structure like the following...

Serverless-Covid-19-Pipeline/
├── README.md
├── events/
│ └── event.json
├── HelloWorld/
│ ├── __init__.py
│ └── app.py
│ └── requirements.txt
├── template.yaml
├── .gitignore
└── tests/

We can trigger a build by running the sam build command.

▶ sam build
Building resource 'HelloWorldFunction'
Running PythonPipBuilder:ResolveDependencies
Running PythonPipBuilder:CopySource

Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Deploy: sam deploy --guided

As you can see, our first build succeeded. In order to test the function locally we need Docker installed on our machine; we'll do this later when we have our first Lambda function written.

As this project is going to contain multiple Lambdas, we have to create the directories for the two other Lambdas, add the relevant files and rename the existing Lambda folder.

Serverless-Covid-19-Pipeline/
├── README.md
├── events/
│ └── event.json
├── data_ingestion_function/
│ ├── __init__.py
│ └── app.py
│ └── requirements.txt
├── s3_processing_function/
│ ├── __init__.py
│ └── app.py
│ └── requirements.txt
├── ddb_stream_notification/
│ ├── __init__.py
│ └── app.py
│ └── requirements.txt
├── template.yaml
├── .gitignore
└── tests/

I'm not the best person when it comes to naming things: the data_ingestion_function will contain the Lambda which downloads the data from Github and saves it to S3, the s3_processing_function will contain the Lambda which downloads the data from S3 and saves it to DynamoDB, and lastly, the ddb_stream_notification will contain the Lambda which receives the DynamoDB stream and sends the message to SNS.

First Function - Data Ingestion

The first Lambda function we have to build is responsible for downloading the data from the Github repository and saving it to S3. This Lambda is going to be triggered by a CloudWatch Scheduled Event, so it can be invoked at the same time every day.

We're using the requests and boto3 libraries in this example so make sure they are included in the requirements.txt file and install them.

The first thing we want to do is open the app.py file and replace the boilerplate code with our function code.

Data Ingestion Lambda

When the function is invoked it downloads the raw data from Github, writes the data to a CSV file in the tmp folder and then uploads this file to the specified S3 Bucket.
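
The screenshot above shows the full handler; roughly, it looks something like the sketch below. The environment variable names match the ones we'll wire up in the template shortly, but the object key and temp file name are just illustrative choices, not necessarily the exact ones used.

import logging
import os

import boto3
import requests

logger = logging.getLogger()
logger.setLevel(logging.INFO)

DATA_URL = os.environ["DATA_URL"]
DESTINATION_BUCKET = os.environ["AWS_DESTINATION_BUCKET"]
TMP_FILE = "/tmp/covid-19-totals-uk.csv"

s3 = boto3.client("s3")


def lambda_handler(event, context):
    # Pull the raw CSV straight from the GitHub repository
    logger.info("Requesting data from Github")
    response = requests.get(DATA_URL)
    response.raise_for_status()
    logger.info("Data downloaded successfully")

    # Lambda only allows writes under /tmp, so stage the file there first
    with open(TMP_FILE, "w") as csv_file:
        csv_file.write(response.text)

    # Upload the staged file to the destination Bucket
    s3.upload_file(TMP_FILE, DESTINATION_BUCKET, "covid-19-totals-uk.csv")
    logger.info("Data uploaded to S3")

    return {"message": "Lambda completed"}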

One of the amazing features of the AWS SAM Framework is the ability to test Lambda functions locally before deploying them to AWS.

In order to test our function, we need a mock scheduled event. We can create one by running sam local generate-event cloudwatch scheduled-event. This will generate a JSON snippet which we can modify as necessary and paste into a new JSON file named scheduled-event.json within the events folder of the parent directory.

But, we have a little problem.

Creating an S3 Bucket with CloudFormation

When we test this Lambda it will try and upload the file to our S3 Bucket...which we don't have. We need to modify the template.yaml and deploy our CloudFormation Stack to create the S3 Bucket.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Serverless-Covid-19-Pipeline

  SAM Template for Serverless-Covid-19-Pipeline

Parameters:

  S3Bucket:
    Type: String
    Default: danieljameskay-data-ingestion-bucket

Resources:
  DataIngestionBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref S3Bucket

To deploy our CloudFormation Stack, we need to run sam deploy --guided; this allows us to save values for the deployment, which will be reused for future deployments.

$ sam build && sam deploy --guided

Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Deploy: sam deploy --guided


Configuring SAM deploy
======================

        Looking for samconfig.toml :  Not found

        Setting default arguments for 'sam deploy'
        =========================================
        Stack Name [sam-app]: serverless-covid-19-pipeline
        AWS Region [us-east-1]: eu-west-2
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [y/N]: y
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]: y
        Save arguments to samconfig.toml [Y/n]: y

        Looking for resources needed for deployment: Found!

                Managed S3 bucket: aws-sam-cli-managed-default-samclisourcebucket-1nw0fx8sanitw
                A different default S3 bucket can be set in samconfig.toml

        Saved arguments to config file
        Running 'sam deploy' for future deployments will use the parameters saved above.
        The above parameters can be changed by modifying samconfig.toml
        Learn more about samconfig.toml syntax at 
        https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-config.html

        Deploying with following values
        ===============================
        Stack name                 : serverless-covid-19-pipeline
        Region                     : eu-west-2
        Confirm changeset          : True
        Deployment s3 bucket       : aws-sam-cli-managed-default-samclisourcebucket-1nw0fx8sanitw
        Capabilities               : ["CAPABILITY_IAM"]
        Parameter overrides        : {}

Initiating deployment
=====================

Waiting for changeset to be created..

CloudFormation stack changeset
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Operation                                                    LogicalResourceId                                            ResourceType                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Add                                                        DataIngestionBucket                                          AWS::S3::Bucket                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Changeset created successfully. arn:aws:cloudformation:eu-west-2:983527076849:changeSet/samcli-deploy1586858125/37a6647c-ef63-4175-8b5c-dd28a8bd201c


Previewing CloudFormation changeset before deployment
======================================================
Deploy this changeset? [y/N]: y

2020-04-14 10:56:15 - Waiting for stack create/update to complete

CloudFormation events from changeset
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ResourceStatus                               ResourceType                                 LogicalResourceId                            ResourceStatusReason                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE_IN_PROGRESS                           AWS::S3::Bucket                              DataIngestionBucket                          -                                          
CREATE_IN_PROGRESS                           AWS::S3::Bucket                              DataIngestionBucket                          Resource creation Initiated                
CREATE_COMPLETE                              AWS::S3::Bucket                              DataIngestionBucket                          -                                          
CREATE_COMPLETE                              AWS::CloudFormation::Stack                   serverless-covid-19-pipeline                 -                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Successfully created/updated stack - serverless-covid-19-pipeline in eu-west-2

So now if we head over to the AWS Console and open the S3 Explorer we should see the newly created Bucket. Good job!

Updating the Lambda configuration in the CloudFormation template

We need to update the CloudFormation template with our Data Ingestion function configuration.

Resources:
  DataIngestionFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: DataIngestionFunction
      CodeUri: data_ingestion_function/
      Handler: app.lambda_handler
      Runtime: python3.7
      Policies:
        - AmazonS3FullAccess
        - AWSLambdaBasicExecutionRole
        - AWSXrayWriteOnlyAccess
      Tracing: Active
      Environment:
        Variables:
          AWS_DESTINATION_BUCKET: !Ref S3Bucket
          DATA_URL: https://raw.githubusercontent.com/tomwhite/covid-19-uk-data/master/data/covid-19-totals-uk.csv

We have assigned some basic policies to the Lambda function and specified the Bucket name and URL of the Github data as environment variables.

Now we should be able to build and test the function...

$ sam build && sam local invoke "DataIngestionFunction" -e events/scheduled-event.json
Building resource 'DataIngestionFunction'
Running PythonPipBuilder:ResolveDependencies
Running PythonPipBuilder:CopySource

Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Deploy: sam deploy --guided

Invoking app.lambda_handler (python3.7)

Fetching lambci/lambda:python3.7 Docker container image......
Mounting /Users/daniel.kay/dev/Serverless-Covid-19-Pipeline/Serverless-Covid-19-Pipeline/.aws-sam/build/DataIngestionFunction as /var/task:ro,delegated inside runtime container
[INFO]  2020-04-14T13:28:13.880Z                Found credentials in environment variables.

START RequestId: 46f1780b-7e04-124f-f233-6ce13fb27a11 Version: $LATEST
[INFO]  2020-04-14T13:28:14.452Z        46f1780b-7e04-124f-f233-6ce13fb27a11    Requesting data from Github

[INFO]  2020-04-14T13:28:14.946Z        46f1780b-7e04-124f-f233-6ce13fb27a11    Data downloaded successfully

[INFO]  2020-04-14T13:28:15.758Z        46f1780b-7e04-124f-f233-6ce13fb27a11    Data uploaded to S3

END RequestId: 46f1780b-7e04-124f-f233-6ce13fb27a11
REPORT RequestId: 46f1780b-7e04-124f-f233-6ce13fb27a11  Init Duration: 1513.60 ms       Duration: 1310.72 ms    Billed Duration: 1400 ms        Memory Size: 128 MB     Max Memory Used: 43 MB

{"message":"Lambda completed"}

Looking good! We can see from the log lines that the data was downloaded and uploaded to S3. If we navigate to the S3 Explorer and open up the Bucket, we should see the CSV file.

Validate S3 Upload

We can now download the file and verify the contents. The file should contain data up until the previous day.
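
If you prefer the terminal to the console, a quick boto3 sketch like the one below pulls the object down and prints the most recent row. The bucket name comes from the template above; the object key is an assumption, so adjust it to whatever your upload code used.

import boto3

s3 = boto3.client("s3")

# Bucket name from the template; the object key is whatever the upload code used
obj = s3.get_object(
    Bucket="danieljameskay-data-ingestion-bucket",
    Key="covid-19-totals-uk.csv",
)

rows = obj["Body"].read().decode("utf-8").strip().splitlines()
print(rows[0])   # header row
print(rows[-1])  # most recent day's totals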

Configuring a Scheduled Event

So our Data Ingestion Lambda works perfectly. It downloads the data and uploads it to S3. But we don't want to invoke the function manually; we want it invoked at a particular time of the day. We can utilize a CloudWatch Scheduled Event to invoke the function for us.

We have to update our CloudFormation template to include the event rule and a permission for EventBridge to invoke the Lambda function.

DataIngestionScheduledRule:
  Type: AWS::Events::Rule
  Properties:
    Name: DataIngestionScheduledRule
    Description: Triggers the Data Ingestion lambda once per day
    ScheduleExpression: cron(45 14 * * ? *)
    State: ENABLED
    Targets:
      -
        Arn: !GetAtt DataIngestionFunction.Arn
        Id: DataIngestionScheduledRule

DataIngestionInvokePermission:
  Type: AWS::Lambda::Permission
  Properties:
    FunctionName: !Ref DataIngestionFunction
    Action: lambda:InvokeFunction
    Principal: events.amazonaws.com
    SourceArn: !GetAtt DataIngestionScheduledRule.Arn

We are setting the schedule to a cron expression which invokes the function daily at 14:45 UTC...which works out to 15:45 in the UK thanks to daylight saving 😄. The target is the Data Ingestion function, as this is the function we want to be invoked.

Right, it's ready to be built and deployed...

$ sam build && sam deploy
Building resource 'DataIngestionFunction'
Running PythonPipBuilder:ResolveDependencies
Running PythonPipBuilder:CopySource

Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Deploy: sam deploy --guided


        Deploying with following values
        ===============================
        Stack name                 : serverless-covid-19-pipeline
        Region                     : eu-west-2
        Confirm changeset          : True
        Deployment s3 bucket       : aws-sam-cli-managed-default-samclisourcebucket-1nw0fx8sanitw
        Capabilities               : ["CAPABILITY_IAM"]
        Parameter overrides        : {}

Initiating deployment
=====================
Uploading to serverless-covid-19-pipeline/175b90da9d229e544ad442eeb0c9e280.template  1711 / 1711.0  (100.00%)

Waiting for changeset to be created..

CloudFormation stack changeset
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Operation                                                    LogicalResourceId                                            ResourceType                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Add                                                        DataIngestionInvokePermission                                AWS::Lambda::Permission                                    
+ Add                                                        DataIngestionScheduledRule                                   AWS::Events::Rule                                          
* Modify                                                     DataIngestionFunctionRole                                    AWS::IAM::Role                                             
* Modify                                                     DataIngestionFunction                                        AWS::Lambda::Function                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Changeset created successfully. arn:aws:cloudformation:eu-west-2:983527076849:changeSet/samcli-deploy1586873775/cd0112a5-5534-4d7e-95b7-a64503823b64


Previewing CloudFormation changeset before deployment
======================================================
Deploy this changeset? [y/N]: y

2020-04-14 15:16:25 - Waiting for stack create/update to complete

CloudFormation events from changeset
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ResourceStatus                               ResourceType                                 LogicalResourceId                            ResourceStatusReason                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
UPDATE_IN_PROGRESS                           AWS::Lambda::Function                        DataIngestionFunction                        Requested update requires the creation of  
                                                                                                                                       a new physical resource; hence creating    
                                                                                                                                       one.                                       
UPDATE_COMPLETE                              AWS::Lambda::Function                        DataIngestionFunction                        -                                          
UPDATE_IN_PROGRESS                           AWS::Lambda::Function                        DataIngestionFunction                        Resource creation Initiated                
CREATE_IN_PROGRESS                           AWS::Events::Rule                            DataIngestionScheduledRule                   -                                          
CREATE_IN_PROGRESS                           AWS::Events::Rule                            DataIngestionScheduledRule                   Resource creation Initiated                
CREATE_COMPLETE                              AWS::Events::Rule                            DataIngestionScheduledRule                   -                                          
CREATE_IN_PROGRESS                           AWS::Lambda::Permission                      DataIngestionInvokePermission                Resource creation Initiated                
CREATE_IN_PROGRESS                           AWS::Lambda::Permission                      DataIngestionInvokePermission                -                                          
CREATE_COMPLETE                              AWS::Lambda::Permission                      DataIngestionInvokePermission                -                                          
UPDATE_COMPLETE_CLEANUP_IN_PROGRESS          AWS::CloudFormation::Stack                   serverless-covid-19-pipeline                 -                                          
UPDATE_COMPLETE                              AWS::CloudFormation::Stack                   serverless-covid-19-pipeline                 -                                          
DELETE_COMPLETE                              AWS::Lambda::Function                        DataIngestionFunction                        -                                          
DELETE_IN_PROGRESS                           AWS::Lambda::Function                        DataIngestionFunction                        -                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Successfully created/updated stack - serverless-covid-19-pipeline in eu-west-2

We can see from the above that the invoke permission and the scheduled rule have been created successfully. To validate this, we can head into the CloudWatch Console and check out the Rules within the Events section.

Verify scheduled rule

If we navigate to the Lambda Console in AWS and open our Lambda from within the Functions section, we should see that the CloudWatch Event is now set up as a trigger for this Lambda.

Lambda trigger
Exciting times!

So how do we know if this function was in fact triggered at 1545?

We can do this in a few different ways:

  • We can check the timestamp on the CSV file that was created when we were testing, as it will have a new modified date and time (see the snippet after this list)
  • If the file wasn't there to begin with, the fact that there's now a CSV file in the bucket tells us the function ran
  • We can check the CloudWatch logs
  • CloudWatch ServiceLens (my favorite)
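
For the first option, head_object gives us the object's last-modified timestamp without downloading it. As before, the bucket and key names here are the ones assumed earlier, so swap in your own.

import boto3

s3 = boto3.client("s3")

head = s3.head_object(
    Bucket="danieljameskay-data-ingestion-bucket",
    Key="covid-19-totals-uk.csv",
)

# LastModified is a timezone-aware datetime; it should line up with the scheduled run
print(head["LastModified"])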

If we check the CloudWatch logs, they will look very similar to the log lines we saw when we were testing earlier.

CloudWatch logs

We can see the data was downloaded and uploaded to S3 without any issues.

CloudWatch ServiceLens gives us complete observability of our applications and services by bringing logs, metrics, traces and alarms together in one friendly dashboard.

I'm planning on writing another post about ServiceLens at a later date.

Second Function - Data Processing

We are now going to start working on the second Lambda function. This function will be responsible for listening to S3 events from the Bucket we created earlier, using the event to download the CSV file and insert the data for the most recent date into DynamoDB.

As we've covered how we interact with the AWS SAM CLI, these steps won't be as heavily detailed as they were in previous sections.

First things first, we need a DynamoDB table. We can update our CloudFormation template to include a DynamoDB resource type with some basic attributes.

Whilst we are doing this we can also add our Lambda configuration.

DynamoDBTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: danieljameskay-covid19-data
    KeySchema:
      -
        AttributeName: country
        KeyType: HASH
    AttributeDefinitions:
      -
        AttributeName: country
        AttributeType: S
    ProvisionedThroughput:
      ReadCapacityUnits: 5
      WriteCapacityUnits: 5
    StreamSpecification:
      StreamViewType: NEW_AND_OLD_IMAGES

S3ProcessingFunction:
  Type: AWS::Serverless::Function
  Properties:
    FunctionName: S3ProcessingFunction
    CodeUri: s3_processing_function/
    Handler: app.lambda_handler
    Runtime: python3.7
    Tracing: Active
    Environment:
      Variables:
        AWS_DOWNLOAD_BUCKET: !Ref S3Bucket
        DDB_TABLE: danieljameskay-covid19-data
    Policies:
      - AWSLambdaBasicExecutionRole
      - AWSXrayWriteOnlyAccess
      - AmazonDynamoDBFullAccess
      - AmazonS3FullAccess
    Events:
      s3Notification:
        Type: S3
        Properties:
          Bucket: !Ref DataIngestionBucket
          Events: s3:ObjectCreated:*

We are specifying that the key of our table is country. The key type is HASH, which is also known as the partition key. We're also specifying a StreamSpecification of type NEW_AND_OLD_IMAGES, which means that when the values for a country are updated, DynamoDB will stream both the new and the old values to any listeners...in our case, our Notification Lambda.

We should now be able to run sam build && sam deploy to deploy our DynamoDB Table.

Now the code...

S3 processing app

When the function is invoked it downloads the CSV file from S3 and stores it in the tmp directory, processes it and inserts the most recent record into DynamoDB along with the date that the data corresponds to.

There's probably a nicer way to write some of this logic but as I'm still learning Python...it does the job 👨‍💻
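
The screenshot above shows the full handler; roughly, it looks something like this sketch. The environment variable names come from the template above, while the CSV column names are assumed from the source data, so adjust them if yours differ.

import csv
import logging
import os

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

BUCKET = os.environ["AWS_DOWNLOAD_BUCKET"]
TMP_FILE = "/tmp/covid-19-totals-uk.csv"

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(os.environ["DDB_TABLE"])


def lambda_handler(event, context):
    # The S3 event tells us which object triggered the invocation
    key = event["Records"][0]["s3"]["object"]["key"]

    logger.info("Downloading CSV file from S3")
    s3.download_file(BUCKET, key, TMP_FILE)
    logger.info("File downloaded to tmp directory")

    # The file is ordered by date, so the last row holds the most recent day
    with open(TMP_FILE) as csv_file:
        rows = list(csv.DictReader(csv_file))
    latest = rows[-1]

    logger.info("Inserting record into DynamoDB...")
    table.put_item(
        Item={
            "country": "UK",
            # Column names assumed from the source CSV; the real handler also
            # reformats the date before storing it
            "date": latest["Date"],
            "tests": latest["Tests"],
            "confirmed_cases": latest["ConfirmedCases"],
            "deaths": latest["Deaths"],
        }
    )
    logger.info("Record added to DynamoDB")

    return {"message": "Lambda completed"}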

As we did earlier, we can test this Lambda by running sam build "S3ProcessingFunction" && sam local invoke "S3ProcessingFunction" -e events/s3-put-event.json. The mock S3 event can be generated with sam local generate-event s3 put.

We can verify the Lambda has worked correctly by checking the log lines...

[INFO]  2020-04-16T10:25:10.372Z        bfcc35ee-2908-1a44-afe0-e02816837dbc    Downloading CSV file from S3
[INFO]  2020-04-16T10:25:10.663Z        bfcc35ee-2908-1a44-afe0-e02816837dbc    File downloaded to tmp directory
[INFO]  2020-04-16T10:25:10.669Z        bfcc35ee-2908-1a44-afe0-e02816837dbc    Inserting record into DynamoDB...
[INFO]  2020-04-16T10:25:10.937Z        bfcc35ee-2908-1a44-afe0-e02816837dbc    Record added to DynamoDB

and checking DynamoDB via the AWS CLI...

$ aws dynamodb scan --table-name danieljameskay-covid19-data                                             
{
    "Count": 1, 
    "Items": [
        {
            "date": {
                "S": "04-13-2020"
            }, 
            "country": {
                "S": "UK"
            }, 
            "confirmed_cases": {
                "S": "88621"
            }, 
            "tests": {
                "S": "290720"
            }, 
            "deaths": {
                "S": "11329"
            }
        }
    ], 
    "ScannedCount": 1, 
    "ConsumedCapacity": null
}

We can see that our test has successfully run and the data has been inserted into DynamoDB.

We should now be able to run sam build && sam deploy to deploy our new Lambda.

Let's see if what we've done so far works 🤔

So, we've tested our 2 Lambdas locally and 1 of them in AWS. We can wait until the time we set earlier to see if our Lambdas work, or we can trigger the Data Ingestion Lambda from the AWS CLI...

$ aws lambda invoke --function-name DataIngestionFunction out --log-type None
{
    "ExecutedVersion": "$LATEST", 
    "StatusCode": 200
}

and then we can check DynamoDB.

$ aws dynamodb scan --table-name danieljameskay-covid19-data                 
{
    "Count": 1, 
    "Items": [
        {
            "date": {
                "S": "04-15-2020"
            }, 
            "country": {
                "S": "UK"
            }, 
            "confirmed_cases": {
                "S": "98476"
            }, 
            "tests": {
                "S": "313769"
            }, 
            "deaths": {
                "S": "12868"
            }
        }
    ], 
    "ScannedCount": 1, 
    "ConsumedCapacity": null
}

Looking good! So we manually triggered the Data Ingestion Lambda, which in turn triggered the S3 Processing Lambda via the S3 event, which added the new data to the Table.

Third Function - Notification

We have a Lambda which downloads the data and another which processes this data and saves it to DynamoDB.

The last Lambda in this solution listens to a stream from our DynamoDB Table which contains both the old data and the new data. Using this data it works out the percentage difference between the two and builds a message, which is then sent out to an email address using AWS SNS.

Right, CloudFormation additions...

DDBNotificationFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: ddb_stream_notification/
    FunctionName: 'DDBNotificationFunction'
    Handler: app.lambda_handler
    Runtime: python3.7
    Tracing: Active
    Environment:
      Variables:
        SNS_TOPIC: !Ref SNSTopic
    Policies:
      - AmazonSNSFullAccess
      - AWSLambdaBasicExecutionRole
      - AWSXrayWriteOnlyAccess
    Events:
      Stream:
        Type: DynamoDB
        Properties:
          Stream: !GetAtt  DynamoDBTable.StreamArn
          StartingPosition: TRIM_HORIZON

SNSTopic:
  Type: AWS::SNS::Topic
  Properties:
    TopicName: "ddb-stream-covid-19-daily-data"

SNSSubscription:
  Type: AWS::SNS::Subscription
  Properties:
    Endpoint: 4i4nbyt@2go-mail.com
    Protocol: email
    TopicArn: !Ref  'SNSTopic'

Here we have our last Lambda configuration. The only thing to really point out here is that we are specifying DynamoDB as the event source type for this Lambda, along with the Stream ARN of our DynamoDB Table.

Lastly, we are creating an SNS Topic and Subscription. Our Lambda will send the message to the SNS Topic and the Subscription will be responsible for sending it to our temporary email address.

Code time people...

DDB stream notification

We are retrieving the old and new records from the event which triggered the Lambda function. Then we establish the percentage difference between the old attributes and the new attributes, and finally we publish a message with the percentage change to SNS.
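
As before, the screenshot above shows the full handler; a rough sketch of the idea is below. The message wording and subject line are illustrative rather than the exact ones used.

import logging
import os

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

TOPIC_ARN = os.environ["SNS_TOPIC"]
ATTRIBUTES = ["confirmed_cases", "tests", "deaths"]

sns = boto3.client("sns")


def percentage_change(old, new):
    # Percentage increase from the previous day's figure
    return round(((new - old) / old) * 100, 2)


def lambda_handler(event, context):
    for record in event["Records"]:
        # Only MODIFY events carry both an old and a new image
        if record["eventName"] != "MODIFY":
            continue

        old_image = record["dynamodb"]["OldImage"]
        new_image = record["dynamodb"]["NewImage"]

        lines = [f"COVID-19 update for {new_image['date']['S']}"]
        for attribute in ATTRIBUTES:
            old_value = int(old_image[attribute]["S"])
            new_value = int(new_image[attribute]["S"])
            change = percentage_change(old_value, new_value)
            lines.append(f"{attribute}: {old_value} -> {new_value} ({change}%)")

        # Publish the summary; the email subscription on the topic does the rest
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="COVID-19 Daily Update",
            Message="\n".join(lines),
        )
        logger.info("Notification published to SNS")

    return {"message": "Lambda completed"}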

We should now be able to run sam build && sam deploy to deploy our new Lambda.

Testing from the Lambda Console

In the previous sections, we tested our functions from the AWS CLI and AWS SAM CLI, but we can also invoke our function from the Lambda Console.

With the Notification Lambda open in the Lambda Console, we can click "Test" followed by "Configure Test Events". Using the "Hello World" template, we can overwrite the JSON payload with one generated by running sam local generate-event dynamodb update, changing the values as needed.

Below is an example of the event that the DynamoDB Notification Lambda receives when the data is changed.

{
    "Records": [
        {
            "eventID": "24df338c43fa112256a16d7d13c6eb14",
            "eventName": "MODIFY",
            "eventVersion": "1.1",
            "eventSource": "aws:dynamodb",
            "awsRegion": "eu-west-2",
            "dynamodb": {
                "ApproximateCreationDateTime": 1587043428,
                "Keys": {
                    "country": {
                        "S": "UK"
                    }
                },
                "NewImage": {
                    "date": {
                        "S": "04-13-2020"
                    },
                    "country": {
                        "S": "UK"
                    },
                    "confirmed_cases": {
                        "S": "98621"
                    },
                    "tests": {
                        "S": "291720"
                    },
                    "deaths": {
                        "S": "12329"
                    }
                },
                "OldImage": {
                    "date": {
                        "S": "04-12-2020"
                    },
                    "country": {
                        "S": "UK"
                    },
                    "confirmed_cases": {
                        "S": "88621"
                    },
                    "tests": {
                        "S": "290720"
                    },
                    "deaths": {
                        "S": "11329"
                    }
                },
                "SequenceNumber": "5571000000000004914106821",
                "SizeBytes": 125,
                "StreamViewType": "NEW_AND_OLD_IMAGES"
            },
            "eventSourceARN": "arn:aws:dynamodb:eu-west-2:983527076849:table/danieljameskay-covid19-data/stream/2020-04-15T09:00:54.674"
        }
    ]
}

We can then use this saved event and some fake data to trigger our Lambda.

Lambda success

As our Lambda has successfully run we should have received an email. Let's check our inbox...

Sns email

Oh yeah 😎😎😎

And there we have it, the email with the percentage changes. It looks amazing, doesn't it?

End to end

We haven't seen the solution work end to end as of yet, let's change that!

I modified the time for the DataIngestionScheduledRule to 10 minutes in the future and altered the DynamoDB record so it had the values for the 15th of April, so that when the Data Ingestion Lambda ran it would pull down the data for the 16th.

So I deployed the changes, made a coffee and came back to see the email in my inbox with the difference between the two days.

I also checked ServiceLens and everything had run smoothly with no errors :)

Service lens

Wrapping Up

Working on this has been extremely rewarding. I know my way around AWS pretty well, but I've never used CloudFormation before and I'm still learning Python, which added an extra challenge.

We could have used LocalStack and a local DynamoDB to aid with testing locally, so we wouldn't have had to interact with AWS until we deployed.

Things I would have liked to do and may do in the future:

  • Utilize AWS Amplify and build a React/GraphQL application that interacts with the full UK dataset as opposed to the most recent dates
  • Use some more datasets for USA and Europe
  • Use the X-Ray SDK to track AWS SDK calls
  • Learn more about CloudFormation and tidy up the template.yaml file

With everything that's been going on with COVID-19, building this solution and writing it up has kept me preoccupied in these uncertain times. Hopefully, there will be another post or two coming in the future.

Big thanks to Tom White for the data used in this post. For anyone interested in his Github profile, it can be found here.

Cover Photo by Dhaya Eddine Bentaleb on Unsplash.

I hope everyone who has read this post has enjoyed it and if anyone has any questions drop me a comment or drop a tweet!

Stay safe!

Cheers

DK
