Collect and Store Streaming Time Series Data in Amazon Timestream

#aws #serverless #database #architecture

Introduction

This post discusses the serverless architectural considerations and configuration steps for deploying a streaming time series data solution for Amazon Timestream in the Amazon Web Services (AWS) Cloud. It includes links to a code repository that can be used as a base to deploy this solution while following AWS best practices for security and availability.

Component Basics

AWS Lambda
Lambda is a serverless compute service that lets you run code without provisioning or managing servers.
Amazon Timestream
Amazon Timestream is a fast, scalable, and serverless time series database service for IoT and other operational applications.
Amazon Kinesis Data Streams
Amazon Kinesis Data Streams ingests large amounts of data in real time, stores the data durably, and makes it available for consumption.

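Data Flow

After deployment, the streaming data pipeline runs from the Lambda producer, through the Kinesis data stream, to the Lambda consumer, which writes the records to the Timestream table.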

Getting Started

Amazon Kinesis Setup
Create a Kinesis data stream named timeseries-stream with one shard.
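
The stream can also be created from the Python SDK instead of the console. The snippet below is a minimal sketch, not the code from the article's repository: the helper name create_stream is made up for illustration, and it assumes a boto3 Kinesis client with credentials and a default Region already configured.

```python
def create_stream(client, name="timeseries-stream", shard_count=1):
    """Create the data stream and block until it reports ACTIVE.

    `client` is expected to be a boto3 Kinesis client; the helper name and
    defaults are illustrative, not taken from the article's repository.
    """
    client.create_stream(StreamName=name, ShardCount=shard_count)
    # The "stream_exists" waiter polls DescribeStream until the stream is ACTIVE.
    client.get_waiter("stream_exists").wait(StreamName=name)
    return name


def main():
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    create_stream(boto3.client("kinesis"))
```

Calling main() against a real account creates the stream; a single shard keeps every record in one ordered sequence, which matters for the consumer later on.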

Amazon Timestream Setup
Create a database named ecomm in the same Region as the Kinesis data stream, and a table named inventory in that database, using the gists shared on GitHub.
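
For readers who prefer the Python SDK over the gists, the database and table can be created roughly as follows. This is a sketch under assumptions: the helper name create_store is hypothetical, and the retention periods (24 hours in the memory store, 7 days in the magnetic store) are placeholder values, not settings from the article.

```python
def create_store(client, database="ecomm", table="inventory",
                 memory_hours=24, magnetic_days=7):
    """Create the Timestream database and table.

    `client` is expected to be a boto3 "timestream-write" client created in
    the same Region as the Kinesis data stream. The retention values are
    assumed placeholders; pick ones that match your own data lifecycle.
    """
    client.create_database(DatabaseName=database)
    client.create_table(
        DatabaseName=database,
        TableName=table,
        RetentionProperties={
            "MemoryStoreRetentionPeriodInHours": memory_hours,
            "MagneticStoreRetentionPeriodInDays": magnetic_days,
        },
    )
    return database, table


def main():
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    create_store(boto3.client("timestream-write"))
```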

AWS Lambda Setup
Create a Kinesis producer that generates time series data and writes it to the Kinesis data stream. The records must be read in the same order in which they were written, so create a Kinesis consumer to read them from the stream. The Python SDK examples used for this article are kept in the GitHub repository TimeStream.
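
To make the producer and consumer roles concrete, here is a minimal sketch of both sides. It is not the code from the TimeStream repository: the payload fields (sku, quantity, time_ms) and the function names are assumptions chosen to fit an inventory table, and the clients are passed in so the logic can be tried without AWS access.

```python
import base64
import json
import time

STREAM = "timeseries-stream"
DATABASE, TABLE = "ecomm", "inventory"


def put_reading(kinesis, sku, quantity, now_ms=None):
    """Producer side: send one (hypothetical) inventory reading to the stream."""
    payload = {
        "sku": sku,
        "quantity": quantity,
        "time_ms": now_ms if now_ms is not None else int(time.time() * 1000),
    }
    # Records that share a partition key go to the same shard, which keeps
    # their order; with a single shard, every record is in one sequence.
    kinesis.put_record(
        StreamName=STREAM,
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=sku,
    )
    return payload


def to_timestream_record(payload):
    """Map the assumed payload shape onto a Timestream record."""
    return {
        "Dimensions": [{"Name": "sku", "Value": payload["sku"]}],
        "MeasureName": "quantity",
        "MeasureValue": str(payload["quantity"]),
        "MeasureValueType": "BIGINT",
        "Time": str(payload["time_ms"]),
        "TimeUnit": "MILLISECONDS",
    }


def handle_batch(event, timestream):
    """Consumer side: decode a Kinesis-triggered Lambda event and store it."""
    records = [
        # Lambda delivers the Kinesis record data base64-encoded.
        to_timestream_record(json.loads(base64.b64decode(r["kinesis"]["data"])))
        for r in event["Records"]
    ]
    # WriteRecords accepts at most 100 records per call, so write in chunks.
    for i in range(0, len(records), 100):
        timestream.write_records(
            DatabaseName=DATABASE, TableName=TABLE, Records=records[i:i + 100]
        )
    return len(records)
```

In a deployed consumer, the Lambda entry point would be a thin wrapper such as a handler(event, context) function that calls handle_batch(event, boto3.client("timestream-write")).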

Deployment

The Serverless Framework makes development and deployment faster. Deploy the Lambda producers and consumers to AWS and schedule them to run on time-series event triggers or on any other schedule. In a typical production scenario, the producers might run outside the cloud Region, and events might arrive through Amazon API Gateway.
Producer and consumer logs are available in Amazon CloudWatch.

The written results can be queried using the query editor of Amazon Timestream.
select * from "ecomm"."inventory" limit 10
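
The same query can be run from Python with the Timestream query client. The sketch below assumes a boto3 "timestream-query" client; the helper names rows_as_dicts and run_query are made up for illustration. It follows NextToken because a query result may be split across several pages.

```python
def rows_as_dicts(response):
    """Turn one Timestream query response page into a list of dicts."""
    names = [col["Name"] for col in response["ColumnInfo"]]
    return [
        # Null cells carry no "ScalarValue" key, so they become None.
        {name: cell.get("ScalarValue") for name, cell in zip(names, row["Data"])}
        for row in response["Rows"]
    ]


def run_query(client, sql='select * from "ecomm"."inventory" limit 10'):
    """Run the query and collect all pages of scalar results."""
    rows, token = [], None
    while True:
        kwargs = {"QueryString": sql}
        if token:
            kwargs["NextToken"] = token
        page = client.query(**kwargs)
        rows.extend(rows_as_dicts(page))
        token = page.get("NextToken")
        if not token:
            return rows


def main():
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    for row in run_query(boto3.client("timestream-query")):
        print(row)
```

This sketch only handles scalar columns; array, row, or time series column types would need extra handling.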

Conclusion

In most organizations, time series data points are written once and read many times. This walkthrough shows that time series data can be collected and stored using only serverless services. Although Timestream can be integrated with various AWS services, Kinesis is chosen here because it offers data retention and replay. The next article about time series data will cover a use case built on the Kappa data processing architecture.

DEV Community