Anita Andonoska for AWS Community Builders

Originally published at Medium

Demystifying Amazon Kinesis Data Streams Concepts

In the previous post we explored what data streaming is. Here we’ll look at the Amazon Kinesis Data Streams service.

What is Amazon Kinesis Data Streams

Amazon Kinesis Data Streams is a fully managed streaming data service that collects and stores data. It acts as a highly scalable and durable data ingestion pipeline that can continuously capture and store terabytes of data per hour from hundreds of thousands of sources, including IoT devices, applications, servers, and log files. Within each shard, records are stored in the order they were received, and they can be re-read as many times as needed within the retention period (24 hours by default, extendable up to 365 days).

It enables real-time data processing, allowing users to analyze and react to incoming data as it arrives. This is crucial for applications that require low-latency data processing and quick decision-making.

Key concepts

Stream
A stream is the fundamental concept in Amazon Kinesis Data Streams. It is a set of shards, and each shard holds an ordered sequence of data records.

Shard
A shard is a uniquely identified sequence of data records in a stream, and it is the stream’s base unit of throughput. Shards are used to scale the stream: records are distributed across shards, so multiple consumer instances can process them in parallel.

Data records
Data records are the individual units of data in a stream. Each data record consists of a data payload (an opaque blob of bytes), a partition key, which determines the shard the record is written to, and a sequence number, which uniquely identifies the record within its shard.
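To make the link between records and shards concrete: Kinesis routes each record by taking the MD5 hash of its partition key, which yields a 128-bit integer, and writing the record to the shard whose hash key range contains that integer. Below is a minimal sketch of that idea. The function names are hypothetical, and the even split of the hash space across shards is an assumption (it can stop holding after shards are split or merged).

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # an MD5 digest is a 128-bit integer


def hash_key(partition_key: str) -> int:
    """Return the 128-bit integer hash key for a partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Pick a shard index, ASSUMING shards split the hash space evenly."""
    range_size = HASH_KEY_SPACE // shard_count
    # Clamp so the few keys in the rounding remainder land in the last shard.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


# Records with the same partition key always map to the same shard,
# which is what preserves their relative order.
print(shard_index("sensor-17", 4))
```

The practical takeaway is that a partition key with many distinct values (a device ID or user ID, for example) spreads records across shards, while a constant key sends everything to one shard.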

Producers and Consumers
An application that sends records to a Kinesis data stream is called a producer. Applications and services that read and process data from a Kinesis data stream are called consumers.
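As an illustration of the producer side, here is a minimal sketch built around the `PutRecord` API. The function `send_event` and the `key_field` parameter are hypothetical names chosen for this example; the client is passed in so that the function does not depend on how it was created.

```python
import json


def send_event(client, stream_name: str, event: dict, key_field: str = "device_id") -> dict:
    """Send one event to a stream through the PutRecord API.

    `client` is expected to expose `put_record`, as a boto3 Kinesis client does.
    `key_field` (hypothetical) names the event field used as the partition key.
    """
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # the payload is sent as bytes
        PartitionKey=str(event[key_field]),      # decides which shard gets the record
    )
```

With boto3, `client` would typically be `boto3.client("kinesis")`, and the response carries the `ShardId` and `SequenceNumber` assigned to the record.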

Capacity mode
Amazon Kinesis Data Streams offers two capacity modes for data ingestion:

  1. On-demand mode. Kinesis Data Streams manages the shards for you and automatically scales capacity up or down to accommodate your workload throughput. You pay only for the throughput you actually use.
  2. Provisioned mode. You can choose provisioned mode if you want to provision and manage throughput on your own. In provisioned mode, you specify the number of shards for the data stream. The total capacity of a data stream is the sum of the capacities of its shards. You can increase or decrease the number of shards in a data stream as needed, and you pay for the number of shards at an hourly rate.
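Since the capacity of a provisioned stream is the sum of its shards' capacities, sizing a stream comes down to simple arithmetic. The sketch below assumes the commonly documented per-shard limits (writes of up to 1 MB/s or 1,000 records/s, and reads of up to 2 MB/s shared by standard consumers). Treat these constants as assumptions to check against the current service quotas; the function name is hypothetical.

```python
import math

# Assumed per-shard limits; verify against the current Kinesis quotas.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_sec: float, records_per_sec: float, read_mb_per_sec: float) -> int:
    """Estimate the shard count that satisfies every per-shard limit."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )


# 4.5 MB/s in, 3,000 records/s, 9 MB/s out: the write and read bandwidth
# both call for 5 shards, the record rate only for 3.
print(shards_needed(4.5, 3000, 9.0))  # 5
```

Whichever limit is hit first determines the shard count, so a workload of many tiny records can need more shards than its bandwidth alone suggests.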

Sending and Consuming Data

Amazon Kinesis Data Streams integrates with other AWS services and third-party applications for sending and consuming data.

To mention a few: on the producer side, Kinesis Data Streams integrates with Amazon Aurora, DynamoDB, CloudWatch, EventBridge, etc. Among the third-party services, the most interesting one (for me) is the integration with Kafka.

To consume the data, there are direct integrations with Kinesis Data Analytics, AWS Lambda, EventBridge, SQS, SNS, Glue, etc. Through Kinesis Data Firehose the data can be delivered to destinations such as S3, Redshift, OpenSearch, etc. There are also third-party integrations with Databricks, Apache Spark, etc.

You can also build custom producer and consumer applications by using Amazon Kinesis APIs.
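For a sense of what a custom consumer involves, here is a minimal sketch that reads one shard with the `GetShardIterator` and `GetRecords` APIs. The function name and the `max_batches` parameter are hypothetical, and a real consumer would also need checkpointing, error handling, and pacing between calls, which is why the Kinesis Client Library or AWS Lambda is often used instead.

```python
def read_shard(client, stream_name: str, shard_id: str, max_batches: int = 5, limit: int = 100) -> list:
    """Read up to `max_batches` batches from one shard, oldest records first.

    `client` is expected to expose `get_shard_iterator` and `get_records`,
    as a boto3 Kinesis client does.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest retained record
    )["ShardIterator"]

    payloads = []
    for _ in range(max_batches):
        if iterator is None:  # no next iterator: the shard was closed and fully read
            break
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        payloads.extend(record["Data"] for record in response["Records"])
        iterator = response.get("NextShardIterator")
    return payloads
```

Starting from `TRIM_HORIZON` replays everything still inside the retention period; a consumer that only cares about new data could request a `LATEST` iterator instead.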

Use cases

Kinesis Data Streams enables rapid, continuous data intake from producers and continuous processing of that data: transforming it before emitting it to a data store, running real-time metrics and analytics, or deriving more complex data streams for further processing.

Common use cases for Amazon Kinesis Data Streams include real-time analytics, monitoring and alerting, clickstream analysis, fraud detection, IoT data processing, and log data aggregation.
