DEV Community

Kinesis Data Analytics Summary

Kinesis Data Analytics comes in two flavors.
The first is for SQL applications, and the second is for Apache Flink.

Kinesis Data Analytics for SQL Applications

Kinesis Data Analytics sits in the center, and it can read from two data sources: Kinesis Data Streams and Kinesis Data Firehose.
You can read from either of those and then apply SQL statements to perform your real-time analytics.
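
To make the idea of a real-time SQL query more concrete, here is a minimal Python sketch of what a windowed `GROUP BY` computes. It is not the Kinesis Data Analytics API: the function name, the `ts` and `ticker` fields, and the 60-second window are all assumptions chosen for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(records, window_seconds=60):
    """Count records per (window start, key), like a GROUP BY over a tumbling window."""
    counts = defaultdict(int)
    for record in records:
        # Align each event timestamp to the start of its window.
        window_start = record["ts"] - (record["ts"] % window_seconds)
        counts[(window_start, record["ticker"])] += 1
    return dict(counts)


# Hypothetical sample events: "ts" is epoch seconds, "ticker" is the grouping key.
sample = [
    {"ts": 0, "ticker": "AMZN"},
    {"ts": 30, "ticker": "AMZN"},
    {"ts": 59, "ticker": "MSFT"},
    {"ts": 61, "ticker": "AMZN"},
]
result = tumbling_window_counts(sample)
```

In the managed service the same aggregation would be expressed as a SQL statement over the incoming stream, and the service would keep the window state for you.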

You can also join reference data stored in an Amazon S3 bucket.
This allows you, for example, to enrich the data in real time.
Then you can send the results to one of two destinations.
The first is a Kinesis data stream, so you can create a stream out of a Kinesis Data Analytics real-time query.
The second is Kinesis Data Firehose, which you can send the results into directly.
Each has its own use cases.
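
The reference-data enrichment mentioned above can be pictured as a lookup join. The sketch below is plain Python, not the service's own mechanism: the `enrich` function, the `ticker` and `company` fields, and the in-memory dictionary standing in for the S3 reference file are all hypothetical.

```python
def enrich(records, reference):
    """Attach reference attributes to each record by key, like a SQL LEFT JOIN."""
    enriched = []
    for record in records:
        # Records with no matching reference row get company=None.
        extra = reference.get(record["ticker"], {"company": None})
        enriched.append({**record, **extra})
    return enriched


# Hypothetical reference table, standing in for a file loaded from S3.
reference_table = {"AMZN": {"company": "Amazon"}}
stream = [{"ticker": "AMZN", "price": 100}, {"ticker": "XYZ", "price": 5}]
enriched = enrich(stream, reference_table)
```

Each streaming record keeps its own fields and gains the matching reference columns, which is the effect you would get from joining the stream against the reference table in SQL.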

If you send the results directly into Kinesis Data Firehose, they can then be delivered to Amazon S3, Amazon Redshift, Amazon OpenSearch, or any other Firehose destination.

Whereas if you send the results into a Kinesis data stream, you can do real-time processing of that stream using AWS Lambda or applications running on EC2 instances.
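
As a rough sketch of the Lambda option, the handler below decodes a batch of records from a Kinesis data stream. Lambda delivers each record's data base64-encoded under `Records[].kinesis.data`. The assumption that every payload is JSON, and the `ticker` and `count` fields in the fake event, are illustrative only.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed payloads."""
    payloads = []
    for record in event["Records"]:
        # Lambda delivers Kinesis record data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the upstream query emits JSON rows.
        payloads.append(json.loads(raw))
    return payloads


# Hypothetical event for local testing, shaped like a Kinesis batch.
fake_payload = json.dumps({"ticker": "AMZN", "count": 2}).encode()
fake_event = {
    "Records": [{"kinesis": {"data": base64.b64encode(fake_payload).decode()}}]
}
decoded = handler(fake_event, None)
```

A real function would do something with each payload, such as writing to a database or raising an alert, instead of just returning the list.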

It's a fully managed service, so you don't provision any servers.
Scaling is automatic, and you pay only for what actually goes through Kinesis Data Analytics.

In terms of output, you can go into Kinesis Data Streams or Kinesis Data Firehose.
The use cases are time-series analytics, real-time dashboards, and real-time metrics.

Kinesis Data Analytics for Apache Flink

As the name indicates, you can actually use Apache Flink on the service, and if you use Flink, you can write your application in Java, Scala, or even SQL to process and analyze streaming data.

So you may say, "Well, that's the same thing, isn't it?"
It's not!
Flink applications are special applications that you need to write as code, and the service lets you run them on a cluster dedicated to them on Kinesis Data Analytics, with everything managed behind the scenes.

With Apache Flink, you can read from two main data sources: Kinesis Data Streams or Amazon MSK.
So with this service, you can run any Flink application on a managed cluster on AWS.

Flink is a lot more powerful than standard SQL alone.
So if you need advanced querying capabilities, or need to read streaming data from services such as Kinesis Data Streams or Amazon MSK (managed Kafka on AWS), then you would use this service.

So with this service, you get automatic provisioning of compute resources, parallel computation, and automatic scaling.
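
To give a feel for what parallel computation means here, the sketch below routes records to parallel workers by key, so that all records for one key land on the same worker. This is a conceptual illustration in plain Python, not Flink's actual partitioning logic: the `route_by_key` function, the `key` field, and the use of CRC32 are all assumptions.

```python
import zlib
from collections import defaultdict


def route_by_key(records, parallelism):
    """Assign each record to one of `parallelism` workers based on its key."""
    workers = defaultdict(list)
    for record in records:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        index = zlib.crc32(record["key"].encode()) % parallelism
        workers[index].append(record)
    return dict(workers)


# Hypothetical events; both "a" records end up on the same worker.
events = [{"key": "a", "v": 1}, {"key": "b", "v": 2}, {"key": "a", "v": 3}]
assignment = route_by_key(events, parallelism=4)
```

Because each key always maps to the same worker, per-key state such as running counts can stay local to that worker, and adding workers spreads the keys across more compute.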
