FLaNK Stack Demo

Timothy Spann

My friend, Ian Brooks, just wrote an amazing demo utilizing the FLaNK Stack.

https://github.com/BrooksIan/Flink2Kafka

The demo uses Apache NiFi to read NYC Taxi data (CSV), preprocess and transform it, and then publish it to an Apache Kafka topic. An Apache Flink SQL streaming job reads the Kafka events, enriches them, and publishes them to a second Apache Kafka topic. Finally, Apache NiFi consumes the enriched events from that second topic.
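To make the flow concrete, here is a minimal Python sketch of the same shape. It is not code from Ian's repo: the Kafka topics are plain in-memory lists, the NiFi and Flink SQL steps are ordinary functions, and the column names, topic names (`taxi-raw`, `taxi-enriched`), and the `fare_per_mile` enrichment are all made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample rows; the real demo reads NYC Taxi CSV files through NiFi.
SAMPLE_CSV = """vendor_id,passenger_count,trip_distance,fare_amount
1,2,3.5,14.0
2,1,0.0,2.5
1,4,10.2,35.5
"""

# In-memory stand-in for Kafka: topic name -> list of JSON-encoded events.
topics = defaultdict(list)


def publish(topic, event):
    """Append one event to a topic as a JSON string."""
    topics[topic].append(json.dumps(event))


def consume(topic):
    """Return every event currently stored in a topic, decoded from JSON."""
    return [json.loads(raw) for raw in topics[topic]]


def ingest(csv_text, topic):
    """NiFi-like step: parse CSV rows, cast the types, publish each row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        publish(topic, {
            "vendor_id": int(row["vendor_id"]),
            "passenger_count": int(row["passenger_count"]),
            "trip_distance": float(row["trip_distance"]),
            "fare_amount": float(row["fare_amount"]),
        })


def enrich(source, sink):
    """Flink-SQL-like step: read events, add a derived field, publish them."""
    for event in consume(source):
        distance = event["trip_distance"]
        # Assumed enrichment: fare per mile, left empty for zero-distance trips.
        event["fare_per_mile"] = (
            round(event["fare_amount"] / distance, 2) if distance > 0 else None
        )
        publish(sink, event)


ingest(SAMPLE_CSV, "taxi-raw")
enrich("taxi-raw", "taxi-enriched")

# Final NiFi-like step: consume the enriched events from the second topic.
enriched = consume("taxi-enriched")
for event in enriched:
    print(event)
```

In the real stack each of these functions is a separate system, so the two topics are what let NiFi and Flink run and scale independently of each other.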

The next step is to store both of these feeds in Apache Kudu (or another datastore in CDP, such as Hive, Impala (Parquet), HBase, Druid, or HDFS/S3) and then write some queries and reports on top with, say, DAS, Hue, Zeppelin, or Jupyter. After that, we can do some additional machine learning with CML and write a visual application in CML.

Really cool app and great use of Flink SQL!

Also, please note that he developed and ran all of this in IntelliJ. That is a nice use of local tools, and we can then push the final app to a large cloud or K8s-hosted cluster such as Cloudera Data Platform.

#FLaNKStack #ApacheFlink #ApacheNiFi #ApacheKafka

Posted on Apr 9 by:

Timothy Spann

@tspannhw

I am a Principal Field Engineer for Data in Motion at Cloudera. I work with Apache NiFi, Apache Kafka, Apache Spark, Apache Flink, IoT, MXNet, DLJ.AI, Deep Learning, Machine Learning, Streaming...
