Core of a stream processing pipeline:
Source: Any device or software program that generates real-time data quickly qualifies as a source.
Stream Ingestion: Data is gathered and ingested in real time from tens of thousands of data sources. At this stage, tools like Kafka can be useful.
Stream Storage: Data is kept, in the order it was received, for a predetermined retention period. During that period it can be replayed any number of times, and ordering matters at this stage.
Stream Processing: The data now sits in stream storage, and you want to process it so that you can make sense of it. Records are read in the order in which they were produced, enabling streaming ETL or real-time analytics. This is where KDA is very helpful.
Destination: At this point, the processed data needs to be sent somewhere. The destination is typically a data lake or data warehouse, a service such as Amazon S3, or a third-party application.
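The five stages above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `Record`, `StreamStorage`, and `run_pipeline` are hypothetical names, not part of Kafka or Kinesis, and retention is modeled as a record count rather than a time period to keep the sketch short.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Iterable, List


@dataclass(frozen=True)
class Record:
    offset: int    # position in the stream, assigned at ingestion
    payload: dict  # event produced by the source


class StreamStorage:
    """Ordered, replayable buffer (hypothetical stand-in for stream storage)."""

    def __init__(self, retention: int) -> None:
        # Assumption: retention is a record count, not a time period.
        self._records: Deque[Record] = deque(maxlen=retention)
        self._next_offset = 0

    def ingest(self, payload: dict) -> Record:
        record = Record(self._next_offset, payload)
        self._records.append(record)  # the oldest record is dropped when full
        self._next_offset += 1
        return record

    def replay(self, from_offset: int = 0) -> List[Record]:
        # Retained records come back in arrival order, as often as requested.
        return [r for r in self._records if r.offset >= from_offset]


def run_pipeline(
    source: Iterable[dict],
    storage: StreamStorage,
    process: Callable[[Record], Any],
    destination: List[Any],
) -> None:
    # Source -> ingestion -> storage
    for event in source:
        storage.ingest(event)
    # Storage -> processing -> destination
    for record in storage.replay():
        destination.append(process(record))
```

Calling `storage.replay()` a second time returns the same retained records in the same order, which is the property that lets a processor re-read data after a failure.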
What are the use cases?
Streaming ETL is the simplest place to start. Data is aggregated before it is stored, which reduces the overall amount of storage needed. Other use cases include stateful event processing (currently underutilized, but helpful for detecting fraud) and analytics applications (which help make sense of real-time data coming from applications).
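As a rough illustration of how aggregating before storing cuts down volume, the sketch below counts events per key in fixed (tumbling) time windows. The function name `tumbling_window_counts` and the `(timestamp_seconds, key)` event shape are assumptions made for this example, not part of any AWS or Flink API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]],
    window_seconds: int,
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) in fixed, non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical click events: (timestamp in seconds, page name)
clicks = [(0, "home"), (30, "home"), (59, "cart"), (60, "home"), (125, "cart")]
per_minute = tumbling_window_counts(clicks, window_seconds=60)
```

Here five raw events collapse into four aggregate rows; with real traffic, where many events share a window and key, the reduction is far larger.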
Amazon Kinesis Data Analytics (KDA)
What is stateful stream processing? It helps to start with the stateless case. IoT devices transmit temperature readings every minute in Fahrenheit, and we need to convert them to Celsius. Each record can be converted on its own, for example in a Lambda function, so nothing has to be kept between records. This is the simplest form of stream computation.
Stateful processing, by contrast, has to remember information across records, such as a running aggregate. The prebuilt libraries in Apache Flink and Apache Beam are useful here and provide a variety of operators. Broadly speaking, Flink preserves the application state, including where you are reading from and where you are writing to, which makes it simpler to maintain data integrity in a streaming scenario.
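To make the distinction concrete: converting a single reading needs no memory of earlier readings, while a per-device running average does. The sketch below is a minimal plain-Python illustration, not Flink or Lambda code; `fahrenheit_to_celsius` and `RunningAverage` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Stateless: the result depends only on the current record."""
    return (fahrenheit - 32) * 5 / 9


class RunningAverage:
    """Stateful: keeps a count and a total per device across records."""

    def __init__(self) -> None:
        self._count: DefaultDict[str, int] = defaultdict(int)
        self._total: DefaultDict[str, float] = defaultdict(float)

    def update(self, device_id: str, celsius: float) -> float:
        # This state must survive between records (and, in a real system,
        # across restarts), which is what a stateful engine manages for you.
        self._count[device_id] += 1
        self._total[device_id] += celsius
        return self._total[device_id] / self._count[device_id]


averages = RunningAverage()
for device_id, reading_f in [("sensor-1", 50.0), ("sensor-1", 68.0)]:
    latest_average = averages.update(device_id, fahrenheit_to_celsius(reading_f))
```

If the process holding `RunningAverage` crashes, its counts and totals are lost unless something persists them; the conversion function has no such problem.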
Why Apache Flink:
As already discussed, it supports diverse use cases in event-driven applications.
Expressive APIs: Developers choose Apache Flink for its layered API approach, which includes higher-level Table and SQL APIs on top of the lower-level ones.
Processing guarantees: Consistency. Regardless of how many nodes are processing the streaming data, each record is reflected in the application state exactly once.
Scale-out architecture: Adapts to the desired throughput.
Community: Vibrant open source community.
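One way to picture the exactly-once guarantee mentioned above is a checkpoint that saves the operator state together with the read position, so that after a failure both are restored and the stream is replayed from that position. The sketch below is a toy model of that idea, not Flink's API; `CheckpointedCounter` and its methods are hypothetical.

```python
from typing import Dict, List, Tuple


class CheckpointedCounter:
    """Counts keys from a replayable log, with checkpoint and restore."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}  # key -> number of records seen
        self.offset = 0                  # next log position to read
        self._checkpoint: Tuple[Dict[str, int], int] = ({}, 0)

    def process(self, log: List[str], upto: int) -> None:
        # Read records in order from the current offset up to `upto`.
        while self.offset < upto:
            key = log[self.offset]
            self.state[key] = self.state.get(key, 0) + 1
            self.offset += 1

    def checkpoint(self) -> None:
        # State and offset are saved together, as one consistent snapshot.
        self._checkpoint = (dict(self.state), self.offset)

    def restore(self) -> None:
        # Roll back to the snapshot; later records will be replayed.
        saved_state, saved_offset = self._checkpoint
        self.state = dict(saved_state)
        self.offset = saved_offset


log = ["a", "b", "a", "c", "a"]
counter = CheckpointedCounter()
counter.process(log, upto=2)
counter.checkpoint()
counter.process(log, upto=4)   # progress made after the checkpoint
counter.restore()              # simulated failure: that progress is discarded
counter.process(log, upto=len(log))
```

Records 2 and 3 are read twice, but because the state was rolled back along with the offset, each one is counted once in the final result.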
Thanks to Deepthi Mohan for sharing her perspective, and to @aws-builders.