
🇺🇦Timothy Spann🇺🇦

Originally published at datainmotion.dev

Migrating Apache Flume Flows to Apache NiFi: Kafka Source to Apache Parquet on HDFS


Article 3 - This article

Article 2 - https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html

Article 1 - https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html

Source Code - https://github.com/tspannhw/flume-to-nifi

This is one possible simple, fast replacement for "Flafka". I can read any or all Kafka topics, route and transform them with SQL, and store them as Apache ORC, Apache Avro, Apache Parquet, Apache Kudu, Apache HBase, JSON, CSV, XML, or compressed files of many types in S3, Apache HDFS, file systems, or anywhere else you want to stream this data in real time — all with a fast, easy-to-use web UI. Everything you liked doing in Flume, but now easier and with more source and sink options.
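The "route and transform with SQL" step above is typically done in NiFi's QueryRecord processor, which runs SQL against each record-oriented flow file. A minimal sketch of the kind of query you might attach to a dynamic relationship — the field names here are hypothetical, not from the article:

```sql
-- Hypothetical QueryRecord query: FLOWFILE is the incoming
-- record set; rows matching the WHERE clause are routed to the
-- relationship this query is attached to.
SELECT sensor_id, temperature, event_ts
FROM FLOWFILE
WHERE temperature > 75
```

Records matching the query can then go straight to a record writer (e.g. Parquet), so routing and format conversion happen in one flow.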

Consume Kafka And Store to Apache Parquet

Kafka to Kudu, ORC, AVRO and Parquet

With Apache NiFi 1.10 I can send those Parquet files anywhere, not only HDFS.

JSON (or CSV or AVRO or ...) and Parquet Out

In Apache NiFi 1.10, Parquet has a dedicated record reader and writer.

Or I can use the PutParquet processor.

Create A Parquet Table and Query It
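That table can be created with Hive DDL as an external table over the directory where NiFi lands the Parquet files. A sketch, assuming a hypothetical HDFS path and schema (both illustrative, not from the article):

```sql
-- External Hive table over the Parquet files NiFi writes to HDFS.
-- Location and columns are illustrative assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS kafka_events (
  sensor_id   STRING,
  temperature DOUBLE,
  event_ts    TIMESTAMP
)
STORED AS PARQUET
LOCATION '/user/nifi/kafka_events';

-- Query the data NiFi has landed.
SELECT sensor_id, AVG(temperature) AS avg_temp
FROM kafka_events
GROUP BY sensor_id;
```

Because the table is external, dropping it leaves the Parquet files in place, and new files NiFi writes into the directory become queryable immediately.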

