
Migrating Apache Flume Flows to Apache NiFi: Kafka Source to Apache Parquet on HDFS

Timothy Spann. Originally published at datainmotion.dev.


Article 3 - This article

Article 2 - https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html

Article 1 - https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html

Source Code: https://github.com/tspannhw/flume-to-nifi

This is one simple, fast replacement for "Flafka". With Apache NiFi I can read any or all Kafka topics, route and transform the records with SQL, and store them as Apache ORC, Apache Avro, Apache Parquet, JSON, CSV, XML or compressed files of many types, or write them to Apache Kudu or Apache HBase. The data can land in S3, Apache HDFS, local file systems or anywhere else you want to stream it in real time. All of this comes with a fast, easy-to-use web UI. It is everything you liked doing in Flume, but easier and with more source and sink options.
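To make "route and transform them with SQL" concrete, here is a minimal Python sketch of the idea. It is an analogy, not NiFi code: it loads a few JSON records into an in-memory SQLite table named FLOWFILE (the table name NiFi's QueryRecord processor exposes) and filters them with a SQL statement. The sample events, field names and the `query_records` helper are all hypothetical.

```python
import json
import sqlite3

# Hypothetical sensor events, standing in for JSON records consumed from a Kafka topic.
SAMPLE_EVENTS = [
    '{"sensor_id": "s1", "temperature": 71.5, "status": "ok"}',
    '{"sensor_id": "s2", "temperature": 98.2, "status": "alert"}',
    '{"sensor_id": "s3", "temperature": 65.0, "status": "ok"}',
]


def query_records(raw_events, sql):
    """Load JSON records into an in-memory table named FLOWFILE and run SQL over them."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    # Assumed schema: it matches the hypothetical sample events above.
    conn.execute(
        "CREATE TABLE FLOWFILE (sensor_id TEXT, temperature REAL, status TEXT)"
    )
    rows = [json.loads(event) for event in raw_events]
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (:sensor_id, :temperature, :status)", rows
    )
    result = [dict(row) for row in conn.execute(sql)]
    conn.close()
    return result


# Route only the hot readings, the way a SQL-based routing step would.
hot = query_records(
    SAMPLE_EVENTS,
    "SELECT sensor_id, temperature FROM FLOWFILE WHERE temperature > 90",
)
print(hot)
```

In a real flow the SQL would live in a processor property and each query result would become its own outbound relationship; the sketch only shows the record-in, SQL, record-out shape.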

Consume Kafka And Store to Apache Parquet

Kafka to Kudu, ORC, AVRO and Parquet

With Apache NiFi 1.10 I can send those Parquet files anywhere, not only to HDFS.

JSON (or CSV or Avro or ...) In and Parquet Out

In Apache NiFi 1.10, Parquet has a dedicated record reader and writer (ParquetReader and ParquetRecordSetWriter).

Or I can use PutParquet

Create A Parquet Table and Query It
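As a rough illustration of this step, the sketch below builds the kind of `CREATE EXTERNAL TABLE ... STORED AS PARQUET` statement you could run in Hive or Impala over the directory where the Parquet files land, plus a sample query. The table name, columns and HDFS location are hypothetical placeholders, not the ones used in my flow.

```python
# Hypothetical column list; replace with the fields of your own record schema.
COLUMNS = [
    ("sensor_id", "STRING"),
    ("temperature", "DOUBLE"),
    ("status", "STRING"),
]


def parquet_table_ddl(table, columns, location):
    """Build DDL for an external table over a directory of Parquet files."""
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Assumed table name and directory, for illustration only.
ddl = parquet_table_ddl("sensor_events", COLUMNS, "/tmp/sensor_events")
print(ddl)

# Once the table exists, it can be queried like any other table.
print("SELECT status, COUNT(*) AS events FROM sensor_events GROUP BY status")
```

Because the table is external, dropping it later leaves the Parquet files written by the flow in place.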


