
Asanka Boteju


Stream Data at scale from millions of sources with Amazon Kinesis (Serverless)

Amazon Kinesis is a streaming service specifically designed to address the complexity and cost of streaming data into the AWS Cloud.

Using Kinesis you can securely CONNECT, ANALYSE AND PROCESS data in real-time or near-real-time. The data you can process with Kinesis includes social-media feeds, event logs, click-stream data, sensor data from IoT devices and data from applications.

The Kinesis Video Streams service can be used to stream binary-encoded data such as audio, video, images and time-series data. To process text-encoded data such as logs, clickstream data, social-media feeds, financial transactions and telemetry data from IoT devices, you can use the other services: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose and Amazon Kinesis Data Analytics.

When talking about data streaming, five layers come to mind: the source, stream ingestion, stream storage, stream processing and the destination.


The Stream Storage Layer is a low-latency data buffer. Data is retained for 24 hours by default; retention can be extended to 7 days for an additional charge, and up to a maximum of 365 days. The data in this layer is immutable, meaning records cannot be modified or deleted, but they expire and are removed automatically once the retention period (a pre-defined TTL) elapses. From there, data can be sent to destinations such as a data lake, a data warehouse, another data stream, or durable storage and analytics services such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), Splunk and so on.

You can use AWS Identity and Access Management (IAM) to control access to stream data. To secure data in transit you can use TLS (Transport Layer Security), and to secure it at rest you can use AWS KMS (Key Management Service), which encrypts data automatically while it is in the stream and also when it is stored in services such as Amazon S3 or Amazon Redshift.

We can divide Amazon Kinesis into four main services:

1. Amazon Kinesis Video Streams.
2. Amazon Kinesis Data Streams.
3. Amazon Kinesis Data Firehose.
4. Amazon Kinesis Data Analytics.


1. Amazon Kinesis Video Streams:

This service is specifically designed to stream binary-encoded data, such as audio, video, images or binary-encoded time-series data, from millions of sources. The AWS SDK enables you to ingest data securely so that it can then be used for playback, storage, machine learning and other data processing purposes. You can ingest data from smartphones, security cameras, IoT edge devices, radars, drones, satellites and even dash-cams.

Kinesis Video Streams also supports WebRTC, an open-source project that enables real-time media streaming between web browsers, mobile devices and other connected devices.

2. Kinesis Data Streams:

Kinesis Data Streams is a highly customizable stream processing service: you can programmatically customize data ingestion, monitoring, scaling, elasticity and consumption. AWS provisions the required resources only when they are requested.


In provisioned capacity mode, Amazon Kinesis Data Streams does not scale automatically; if you need scaling, you have to build it into your solution by adjusting the number of shards. (The on-demand capacity mode, by contrast, adjusts capacity for you.) You can use the AWS SDK, the APIs, the AWS CLI, the Kinesis Agent for Linux and the Kinesis Agent for Windows to facilitate development, usage and management activities.
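
As a rough sizing aid for do-it-yourself scaling, the sketch below estimates a shard count from expected traffic. It assumes the commonly documented per-shard limits (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared reads); the function name and its inputs are hypothetical, and the limits should be checked against the current quotas before relying on the result.

```python
import math

# Per-shard limits as commonly documented for provisioned mode (an assumption
# here; verify against the current Kinesis Data Streams quotas).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    needed = max(
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )
    return max(needed, 1)  # a stream always has at least one shard


# Example: 3.5 MB/s in, 2,500 records/s in, 7 MB/s out
shards = estimate_shards(write_mb_per_s=3.5, records_per_s=2500, read_mb_per_s=7.0)
```

A scaling routine could compare this estimate with the stream's current shard count and resize the stream when the two drift apart.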

Below are the two main components of a data streaming use case.

  • Stream Producer:

Producers ingest data into the stream. To build one you can use the AWS SDK, the Kinesis Agent, the Kinesis API or the Kinesis Producer Library (KPL).
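
A minimal producer sketch using the AWS SDK for Python might look like the following. The function expects a client that behaves like `boto3.client("kinesis")` and calls its `put_record` method; the stream name, event shape and the `FakeKinesis` stand-in are hypothetical and exist only so the example runs without AWS credentials.

```python
import json


def send_event(client, stream_name, event, partition_key):
    """Write one JSON-encoded event to a stream and return where it landed.

    `client` is expected to behave like boto3.client("kinesis").
    """
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # the data blob, as bytes
        PartitionKey=partition_key,              # used to choose the shard
    )
    return response["ShardId"], response["SequenceNumber"]


class FakeKinesis:
    """Hypothetical stand-in client that records calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.calls)),
        }


fake = FakeKinesis()
location = send_event(fake, "clickstream-demo", {"page": "/home"}, partition_key="user-42")
```

Against a real stream you would pass `boto3.client("kinesis")` instead of the fake client.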

In simple terms, a Kinesis data stream is a set of shards, where each shard contains a sequence of immutable data records. Each record consists of three components: a sequence number, a partition key and a data blob (an immutable sequence of bytes).
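
To illustrate how a partition key ends up in a particular shard: Kinesis hashes the key with MD5 into a 128-bit integer and picks the shard whose hash key range contains that value. The sketch below mimics this in plain Python. The evenly split ranges are an assumption for illustration only; real shards report their own ranges, and the helper names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count):
    """Split the hash space into contiguous, roughly equal ranges (illustrative)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the range that contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all ranges")


ranges = even_ranges(4)
shard_index = shard_for("user-42", ranges)
```

Because the mapping is deterministic, all records with the same partition key land in the same shard, which is what preserves their relative order.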

Data is retained for 24 hours by default, and retention can be extended to 7 days for an additional charge. Data stored beyond 24 hours, and beyond 7 days, is charged separately per gigabyte per month.

You can update the retention period of an existing stream with the API calls below.

  • IncreaseStreamRetentionPeriod
  • DecreaseStreamRetentionPeriod
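
The two calls above could be wrapped as follows. This is a sketch that assumes a client behaving like `boto3.client("kinesis")`, using its `describe_stream_summary`, `increase_stream_retention_period` and `decrease_stream_retention_period` methods; the helper name and the `FakeRetentionClient` stand-in are hypothetical.

```python
def set_retention_hours(client, stream_name, target_hours):
    """Move a stream's retention period to target_hours.

    Returns the name of the API that was called, or None if nothing changed.
    """
    summary = client.describe_stream_summary(StreamName=stream_name)
    current = summary["StreamDescriptionSummary"]["RetentionPeriodHours"]
    if target_hours > current:
        client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "IncreaseStreamRetentionPeriod"
    if target_hours < current:
        client.decrease_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "DecreaseStreamRetentionPeriod"
    return None


class FakeRetentionClient:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    def __init__(self, hours):
        self.hours = hours

    def describe_stream_summary(self, StreamName):
        return {"StreamDescriptionSummary": {"RetentionPeriodHours": self.hours}}

    def increase_stream_retention_period(self, StreamName, RetentionPeriodHours):
        self.hours = RetentionPeriodHours

    def decrease_stream_retention_period(self, StreamName, RetentionPeriodHours):
        self.hours = RetentionPeriodHours


client = FakeRetentionClient(hours=24)
action = set_retention_hours(client, "clickstream-demo", 168)  # extend to 7 days
```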

Retrieving data that is older than 7 days with the GetRecords() API incurs a data retrieval charge. However, there is no such long-term retrieval charge when an enhanced fan-out consumer reads the data with the SubscribeToShard() API.

These pricing models and charges can change over time, so it is always good to check the current AWS pricing when you plan to use these services in your solution.

  • Stream Consumer:

Consumer applications get data from Kinesis data streams and process it. You can build your own custom applications using the AWS SDK, the Kinesis APIs or the KCL (Kinesis Client Library). There are two consumption models:

  • Classical (Pull Model):
    Consumers pull data from the data stream, also known as polling.

  • Enhanced Fan-out (Push Model):
    With the push model, consumers subscribe to a shard and data is pushed automatically from the shard to the consumer application, giving each consumer its own 2 MB/s of read throughput per shard.
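
The classical pull model can be sketched as a small polling loop. The function below assumes a client that behaves like `boto3.client("kinesis")` and uses its `get_shard_iterator` and `get_records` calls; the stream name, shard ID and the `FakePollingClient` stand-in are hypothetical. A real consumer would also pause between calls and track its position across restarts, which is part of what the KCL handles for you.

```python
def poll_shard(client, stream_name, shard_id, max_calls=3, limit=100):
    """Read record payloads from one shard using the pull model."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
    )["ShardIterator"]

    payloads = []
    for _ in range(max_calls):
        if iterator is None:  # no further iterator: the shard was closed and fully read
            break
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        payloads.extend(record["Data"] for record in response["Records"])
        iterator = response.get("NextShardIterator")
    return payloads


class FakePollingClient:
    """Hypothetical stand-in that serves records in small pages."""

    def __init__(self, records, page_size=2):
        self.records = records
        self.page_size = page_size

    def get_shard_iterator(self, **kwargs):
        return {"ShardIterator": "0"}

    def get_records(self, ShardIterator, Limit):
        start = int(ShardIterator)
        end = start + min(self.page_size, Limit)
        return {
            "Records": [{"Data": data} for data in self.records[start:end]],
            "NextShardIterator": str(end),
        }


polled = poll_shard(
    FakePollingClient([b"a", b"b", b"c"]), "clickstream-demo", "shardId-000000000000"
)
```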

3. Kinesis Data Firehose:

Amazon Kinesis Data Firehose is a fully managed service from AWS that makes it easy to reliably load streaming data into data lakes, data stores, and analytics services.


Key Features

  • Real-time Data Ingestion:
    Kinesis Data Firehose can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), and Splunk.

  • Automatic Scaling:
    The service automatically scales to match the throughput of your data, ensuring seamless data processing without manual intervention.

  • Data Transformation:
    You can configure Kinesis Data Firehose to transform your data before delivery using AWS Lambda functions. This allows for on-the-fly data format conversion and enrichment.

  • Batch, Compress, and Encrypt Data:
    The service can batch data, compress it to save storage costs, and encrypt it for secure delivery.

  • Monitoring and Error Handling:
    Integrated with Amazon CloudWatch, Kinesis Data Firehose offers robust monitoring capabilities, and it automatically retries failed data delivery attempts, providing durable and reliable data transfer.
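
As an illustration of the Data Transformation feature, here is a sketch of a Lambda handler in the shape Firehose expects: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` (`Ok`, `Dropped` or `ProcessingFailed`) and the transformed `data`. The `message` field and the uppercase transformation are hypothetical, chosen only to keep the example small.

```python
import base64
import json


def lambda_handler(event, context):
    """Uppercase the "message" field of each JSON record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["message"] = str(payload.get("message", "")).upper()
            # Newline-delimit the output so objects stay separable downstream
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, AttributeError, TypeError):
            # Not valid JSON (or not a JSON object): hand the record back as failed
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


sample_event = {
    "records": [
        {"recordId": "1", "data": base64.b64encode(b'{"message": "hi"}').decode("utf-8")},
        {"recordId": "2", "data": base64.b64encode(b"not json").decode("utf-8")},
    ]
}
transformed = lambda_handler(sample_event, None)["records"]
```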

Use Cases

  • Log and Event Data Collection:
    Collect and analyze log data from servers, applications and devices in real-time.

  • Data Lake Ingestion:
    Stream data into Amazon S3 to build a scalable and durable data lake.

  • Real-time Analytics:
    Deliver streaming data to Amazon Redshift and other analytics services for real-time analysis.

  • IoT Data:
    Collect and process data from IoT devices for immediate insights and actions.

How It Works

1. Data Producers:
Your applications, servers, or devices generate streaming data.

2. Delivery Stream:
Kinesis Data Firehose acts as the conduit that collects, transforms, and delivers the data.

3. Data Consumers:
The transformed data is delivered to destinations such as Amazon S3, Amazon Redshift, Elasticsearch and Splunk, where it can be stored and analyzed.

Basic Workflow

  • Setup Delivery Stream:
    From the AWS Management Console, create a Firehose delivery stream.

  • Send Data:
    Use the AWS SDK or Kinesis Agent to send data to the Firehose delivery stream.

  • Transform Data:
    You can optionally configure a Lambda function to process and transform the data.

  • Delivery Destination:
    Configure the destination where Firehose should deliver the data to.
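
The Send Data step of the workflow above might be sketched like this with the AWS SDK for Python. It assumes a client that behaves like `boto3.client("firehose")` and uses its `put_record_batch` call, which accepts up to 500 records per request and reports per-record failures. The delivery stream name, event shape and the `FakeFirehose` stand-in are hypothetical.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def send_batch(client, delivery_stream, events):
    """Send events in chunks and return the ones Firehose reported as failed."""
    failed = []
    for start in range(0, len(events), MAX_BATCH):
        chunk = events[start:start + MAX_BATCH]
        response = client.put_record_batch(
            DeliveryStreamName=delivery_stream,
            Records=[{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in chunk],
        )
        if response["FailedPutCount"]:
            # Responses are returned in the same order as the submitted records
            for event, result in zip(chunk, response["RequestResponses"]):
                if "ErrorCode" in result:
                    failed.append(event)
    return failed


class FakeFirehose:
    """Hypothetical stand-in that rejects any event carrying a truthy "fail" flag."""

    def __init__(self):
        self.batch_sizes = []

    def put_record_batch(self, DeliveryStreamName, Records):
        self.batch_sizes.append(len(Records))
        responses = []
        for record in Records:
            if json.loads(record["Data"]).get("fail"):
                responses.append({"ErrorCode": "ServiceUnavailableException",
                                  "ErrorMessage": "simulated failure"})
            else:
                responses.append({"RecordId": "fake-id"})
        failed_count = sum(1 for r in responses if "ErrorCode" in r)
        return {"FailedPutCount": failed_count, "RequestResponses": responses}


firehose = FakeFirehose()
events = [{"n": i} for i in range(600)] + [{"n": 600, "fail": True}]
to_retry = send_batch(firehose, "logs-to-s3-demo", events)
```

Returning the failed events lets the caller decide how to retry them, for example with a delay.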

4. Kinesis Data Analytics:


Amazon Kinesis Data Analytics service is specifically designed to process and analyze streaming data in real time using SQL.

It allows you to easily query streaming data or build entire streaming applications, so that you can gain insights in a timely manner and react effectively.

Key Features

  • Real-Time Data Processing:
    Analyze streaming data in real time using SQL queries.
    Continuously ingest data from sources like Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose.

  • SQL-Based Analytics:
    Use standard SQL to process and analyze data streams.
    Build complex stream processing applications without the need for extensive programming.

  • Integrations:
    Seamlessly integrates with other AWS services like Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and AWS Lambda.
    Enables easy data ingestion and downstream processing.

  • Automatic Scaling:
    Automatically scales to match the volume and throughput of incoming data, ensuring consistent performance without manual intervention.

  • Built-In Error Handling:
    Provides error handling and data recovery mechanisms. Allows you to handle data processing errors gracefully.

  • Monitoring and Logging:
    Integrated with Amazon CloudWatch for monitoring and logging.
    Offers detailed metrics and logging for operational visibility and troubleshooting.

Typical Use Cases

  • Real-Time Analytics:
    Analyze sensor data, clickstreams, application logs, and other real-time data sources to gain immediate insights.

  • Log and Event Monitoring:
    Monitor application logs and events to detect anomalies, generate alerts, and automate responses in real time.

  • Data Transformation:
    Perform real-time ETL (Extract, Transform, Load) operations on streaming data, transforming it before storing it in data lakes, warehouses, or other destinations.

  • Dynamic Content Updates:
    Update dashboards, leaderboards, or other dynamic content in real time based on incoming data streams.

Basic Workflow

  • Data Ingestion:
    Stream data from sources like IoT devices, website clickstreams, social media feeds, or application logs into Amazon Kinesis Data Streams.

  • Data Processing:
    Use Amazon Kinesis Data Analytics to write SQL queries that process the streaming data in real time. For example, you might filter, aggregate, and transform the data.

  • Data Output:
    Send the processed data to various destinations, such as Amazon S3 for storage, Amazon Redshift for further analysis, or AWS Lambda to trigger additional processing workflows.
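
To make the filter-and-aggregate idea in the Data Processing step concrete, here is a plain-Python illustration of a tumbling-window count. In Kinesis Data Analytics this logic would be written as a SQL query over the stream; the code below is not that query, only a sketch of the same idea, and the `ts`, `page` and `status` fields are hypothetical.

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds=60, min_status=500):
    """Count server-error events per (window_start, page)."""
    counts = defaultdict(int)
    for event in events:
        if event["status"] < min_status:  # filter: keep only error responses
            continue
        # Assign the event to the fixed, non-overlapping window it falls into
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["page"])] += 1  # aggregate
    return dict(counts)


sample = [
    {"ts": 5, "page": "/a", "status": 500},
    {"ts": 59, "page": "/a", "status": 503},
    {"ts": 61, "page": "/a", "status": 500},
    {"ts": 62, "page": "/b", "status": 200},
]
errors_per_window = tumbling_counts(sample)
```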

By leveraging Amazon Kinesis Data Analytics, organizations can harness the power of real-time data to drive more informed decision-making, enhance operational efficiency and improve overall responsiveness to changing conditions.


Thank you for your time...
