DEV Community

Anita Andonoska for AWS Community Builders

Posted on • Originally published at Medium

Streaming Services Review: AWS Kinesis Data Streams vs Azure Event Hubs

In this post we’ll provide a comprehensive overview of AWS Kinesis Data Streams and Azure Event Hubs, covering various aspects of both services.

General overview

Data streaming (also known as event streaming) is a continuous flow of data generated by many sources, often thousands at once. Sources of streaming data include application logs, clickstream data from websites and mobile devices, telemetry from IoT devices, and real-time location tracking (more info can be found in my previous post).

Azure offers Event Hubs as its streaming service: “Azure Event Hubs is a cloud native data streaming service that can stream millions of events per second, with low latency, from any source to any destination.” In Azure, you first create an Event Hubs namespace, which is a management container for event hubs. In the namespace, you create one or more event hubs. The data records within an event hub are organized into partitions.

AWS offers Kinesis Data Streams: “Amazon Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale.” In Kinesis, you create the stream as a single resource. The data records within a Kinesis data stream are organized into shards.

Capabilities

Azure Event Hubs is a multi-protocol event streaming engine that natively supports the AMQP, Apache Kafka, and HTTPS protocols. Because Event Hubs is compatible with Apache Kafka, you can run existing Kafka workloads without any code changes. Event Hubs integrates with Azure Functions, Azure Spring Apps, Kafka Connectors, and other data analytics platforms and technologies such as Databricks, Apache Spark and Apache Flink.

AWS Kinesis Data Streams is exposed through the HTTPS protocol. It integrates with other AWS services and third-party applications for sending and consuming data. On the producer side, Kinesis Data Streams integrates with Amazon Aurora, DynamoDB, CloudWatch, EventBridge, etc., and with third-party services such as Kafka. To consume the data, there are direct integrations with Kinesis Data Analytics, AWS Lambda, EventBridge, SQS, SNS, Glue, etc., and third-party integrations with Databricks, Apache Spark, etc.

Both services allow event sizes of up to 1 MB.

Pricing Models

As usual, Microsoft offers tiered pricing for Event Hubs, with Basic, Standard, Premium and Dedicated tiers. The higher-cost tiers offer more throughput capacity, and the available features differ between tiers. Throughput units are pre-purchased and billed per hour. You are charged for throughput units and for the number of ingress events.

Kinesis Data Streams has two capacity modes: on-demand mode, where AWS manages the capacity, and provisioned mode, where you specify the number of shards for your application.
In on-demand mode, pricing is based on the volume of data ingested and retrieved along with a per-hour charge for each data stream in your account. In provisioned mode you’re charged for each shard at an hourly rate and also pay for records written into your Kinesis data stream.
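The two Kinesis billing shapes described above can be sketched as simple arithmetic. This is only an illustration: the function names and every rate passed in are hypothetical placeholders, not real AWS prices, and the per-million-records rate is a simplification of how writes are metered in provisioned mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDemandRates:
    """Hypothetical on-demand rates; look up real values on the pricing page."""
    per_stream_hour: float
    per_gb_ingested: float
    per_gb_retrieved: float


def on_demand_cost(rates: OnDemandRates, hours: float,
                   gb_ingested: float, gb_retrieved: float) -> float:
    """Per-hour stream charge plus charges for data ingested and retrieved."""
    return (rates.per_stream_hour * hours
            + rates.per_gb_ingested * gb_ingested
            + rates.per_gb_retrieved * gb_retrieved)


def provisioned_cost(shard_hour_rate: float, per_million_records_rate: float,
                     shards: int, hours: float, records_written: int) -> float:
    """Hourly charge per shard plus a charge for records written."""
    shard_charge = shard_hour_rate * shards * hours
    write_charge = per_million_records_rate * records_written / 1_000_000
    return shard_charge + write_charge


# Placeholder rates chosen only to make the arithmetic easy to follow.
example_rates = OnDemandRates(per_stream_hour=0.5,
                              per_gb_ingested=0.25,
                              per_gb_retrieved=0.125)
on_demand_total = on_demand_cost(example_rates, hours=10,
                                 gb_ingested=4, gb_retrieved=8)
provisioned_total = provisioned_cost(shard_hour_rate=0.5,
                                     per_million_records_rate=2.0,
                                     shards=2, hours=10,
                                     records_written=3_000_000)
```

Plugging in the current published rates for your region gives a rough way to compare the two modes for a given traffic profile.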

Scaling and Performance

Scaling in Event Hubs depends on throughput units (standard tier) or processing units (premium tier), and on partitions. A single throughput unit allows up to 1 MB per second or 1,000 events per second of ingress, and up to 2 MB per second or 4,096 events per second of egress. Beyond the capacity of the purchased throughput units, ingress is throttled. A standard Event Hubs namespace has a limit of 40 throughput units, and these are shared across all event hubs in that namespace.
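As a rough sizing aid, the ingress limits quoted above translate into a small calculation. The function name is hypothetical, and the sketch assumes ingress is the bottleneck (it ignores egress and partition count).

```python
import math

# Per-throughput-unit ingress limits quoted in the text (standard tier).
TU_INGRESS_MB_PER_S = 1.0
TU_INGRESS_EVENTS_PER_S = 1000
MAX_TUS_PER_STANDARD_NAMESPACE = 40


def required_throughput_units(ingress_mb_per_s: float, events_per_s: float) -> int:
    """Return the smallest TU count that satisfies both ingress limits."""
    by_bytes = math.ceil(ingress_mb_per_s / TU_INGRESS_MB_PER_S)
    by_events = math.ceil(events_per_s / TU_INGRESS_EVENTS_PER_S)
    needed = max(1, by_bytes, by_events)
    if needed > MAX_TUS_PER_STANDARD_NAMESPACE:
        raise ValueError(
            f"{needed} TUs needed, above the standard namespace limit of "
            f"{MAX_TUS_PER_STANDARD_NAMESPACE}"
        )
    return needed


# 3.5 MB/s of small batches: the byte limit dominates.
tus_for_bytes = required_throughput_units(ingress_mb_per_s=3.5, events_per_s=2000)
# 0.5 MB/s of many tiny events: the event-count limit dominates.
tus_for_events = required_throughput_units(ingress_mb_per_s=0.5, events_per_s=6500)
```

Whichever limit is hit first determines the number of throughput units, which is why a stream of many tiny events can need more units than its byte volume suggests.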

The auto-scaling feature of Event Hubs, called auto-inflate in Azure, automatically scales up by increasing the number of throughput units (TUs) to meet usage needs. You start by setting the initial number of throughput units. Increasing TUs prevents throttling scenarios where data ingress or data egress rates exceed the rates allowed by the TUs assigned to the namespace. Auto-inflate doesn’t automatically scale down the number of TUs when ingress or egress rates drop below the limits.
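The scale-up-only behaviour can be illustrated with a toy simulation. It is not how the service decides when to inflate; it simply assumes 1 MB/s of ingress per TU, and `simulate_auto_inflate`, `initial_tus` and `max_tus` are hypothetical names for this sketch.

```python
import math


def simulate_auto_inflate(ingress_mb_per_s: list[float],
                          initial_tus: int, max_tus: int) -> list[int]:
    """Return the TU count after each load sample; it only ever grows."""
    tus = initial_tus
    history = []
    for load in ingress_mb_per_s:
        needed = max(1, math.ceil(load))  # assumes 1 MB/s ingress per TU
        if needed > tus:
            tus = min(needed, max_tus)    # inflate, capped at the configured maximum
        # No branch lowers `tus`: scaling down is left to the operator.
        history.append(tus)
    return history


tu_history = simulate_auto_inflate([1, 3, 6, 2, 1], initial_tus=2, max_tus=5)
# tu_history is [2, 3, 5, 5, 5]: the count stays at 5 after the spike passes.
```

After the spike to 6 MB/s, the namespace stays at the inflated TU count even though traffic falls back to 1 MB/s, so the extra units keep being billed until someone lowers them.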

In on-demand mode, Kinesis Data Streams scales capacity automatically, freeing you from provisioning and managing capacity. It monitors your data traffic and scales the shard count up or down as traffic increases or decreases. You can choose provisioned mode if you want to provision and manage throughput on your own.

By default, new data streams created with the on-demand capacity mode have 4 MB/s of write and 8 MB/s of read throughput. As the traffic increases, data streams with the on-demand capacity mode scale up to 200 MB/s of write and 400 MB/s read throughput. You can request a quota increase up to 2 GB/s write and 4 GB/s read capacity.

Data Consumption

With Event Hubs, data is consumed through consumer groups. A consumer group is a logical grouping of consumers that read data from an event hub. It enables multiple consuming applications to read the same streaming data in an event hub independently, each at its own pace and with its own offsets.
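The idea of independent offsets per consumer group can be shown with a small in-memory model. This is a conceptual sketch only: `PartitionLog` and the group names are hypothetical and do not use or mirror the Event Hubs SDK.

```python
class PartitionLog:
    """In-memory stand-in for a single partition of an event hub."""

    def __init__(self) -> None:
        self._events = []   # append-only list of events
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, event: str) -> None:
        self._events.append(event)

    def read(self, group: str, max_events: int = 10) -> list:
        """Return the next batch for `group` and advance only that group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

dashboard_batch = log.read("dashboard", max_events=3)  # first three events
archiver_batch = log.read("archiver", max_events=5)    # all five, unaffected by "dashboard"
dashboard_rest = log.read("dashboard", max_events=3)   # resumes at event-3
```

Reading does not remove events from the log, so each group sees the full stream and only its own offset moves.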

In Kinesis, you can choose between shared fan-out and enhanced fan-out consumer types to read data from a Kinesis data stream. When a consumer uses enhanced fan-out, it gets its own 2 MB/sec allotment of read throughput per shard, allowing multiple consumers to read data from the same stream in parallel without contending for read throughput with other consumers. You should use enhanced fan-out if you have, or expect to have, multiple consumers retrieving data from a stream in parallel. There are additional charges for using the enhanced fan-out feature.
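To see why enhanced fan-out matters with several consumers, here is a back-of-the-envelope calculation. It assumes the shared fan-out model splits a 2 MB/s per-shard read limit evenly across consumers, which is a simplification; the function name is hypothetical.

```python
SHARD_READ_MB_PER_S = 2.0  # read throughput per shard


def read_throughput_per_consumer(shards: int, consumers: int,
                                 enhanced_fan_out: bool) -> float:
    """Approximate MB/s available to each consumer across the whole stream."""
    if consumers < 1:
        raise ValueError("at least one consumer is required")
    total = shards * SHARD_READ_MB_PER_S
    if enhanced_fan_out:
        # Each consumer gets its own allotment on every shard.
        return total
    # Shared fan-out: all consumers draw from the same per-shard allotment.
    return total / consumers


shared = read_throughput_per_consumer(shards=4, consumers=5, enhanced_fan_out=False)
enhanced = read_throughput_per_consumer(shards=4, consumers=5, enhanced_fan_out=True)
```

With four shards and five consumers, the shared model leaves each consumer roughly 1.6 MB/s, while enhanced fan-out gives each one the full 8 MB/s.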

Data Retention and Durability

Published events are removed from an event hub based on a configurable, time-based retention policy. The default value, which is also the shortest possible retention period, is 1 hour. For Event Hubs Standard, the maximum retention period is 7 days. For Event Hubs Premium and Dedicated, the maximum retention period is 90 days. If you need to archive events beyond the allowed retention period, you can have them automatically stored in Azure Storage or Azure Data Lake by turning on the Event Hubs Capture feature.

Kinesis Data Streams has a default retention period of 24 hours, which can be extended to 7 days. There is also long-term data retention, beyond seven days and up to 365 days, which lets you reprocess old data. You can also deliver data stored in Kinesis Data Streams to Amazon S3, Amazon OpenSearch Service, Amazon Redshift, and custom HTTP endpoints using its prebuilt integration with Kinesis Data Firehose.
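The retention ranges quoted in this section can be collected into a small lookup for sanity-checking a planned configuration. The service keys and the helper name are hypothetical labels for this sketch, and the Kinesis lower bound assumes the 24-hour default is also the minimum.

```python
# (minimum, maximum) retention in hours, taken from the figures in the text.
RETENTION_LIMITS_HOURS = {
    "eventhubs-standard": (1, 7 * 24),
    "eventhubs-premium": (1, 90 * 24),
    "eventhubs-dedicated": (1, 90 * 24),
    "kinesis": (24, 365 * 24),  # assumes the 24-hour default is the minimum
}


def is_valid_retention(service: str, hours: int) -> bool:
    """Return True if `hours` falls inside the quoted range for `service`."""
    low, high = RETENTION_LIMITS_HOURS[service]
    return low <= hours <= high


one_week_on_standard = is_valid_retention("eventhubs-standard", 7 * 24)
one_month_on_standard = is_valid_retention("eventhubs-standard", 30 * 24)
one_month_on_kinesis = is_valid_retention("kinesis", 30 * 24)
```

A 30-day requirement, for example, rules out the Event Hubs Standard tier unless Capture is used for archiving, but fits within Kinesis long-term retention.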

Conclusion

Both services offer broadly similar features, such as auto-scaling, data retention and durability. Event Hubs supports more protocols, including AMQP and Apache Kafka.

There is a difference in scaling and performance. AWS has a simple model: you either choose on-demand mode and the service scales up or down based on traffic, or you handle the capacity yourself with provisioned mode. Event Hubs has auto-scaling too, but it does not scale down, so you have to manage that yourself.
