
Timothy Spann.   🇺🇦
Originally published at datainmotion.dev

Top 25 Use Cases of Cloudera Flow Management Powered by Apache NiFi

Cloudera Flow Management (CFM) has proven so popular across so many different use cases that I thought I would list the top twenty-five I have seen recently.

If you have never used CFM or Apache NiFi before, please check out these two quick resources: https://github.com/tspannhw/EverythingApacheNiFi and https://nifi.apache.org/docs/nifi-docs/.

21-25

25. Ingesting Data into Kafka in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-kafka-ingest/topics/cdf-datahub-fm-kafka-ingest-overview.html

24. Cybersecurity Data Collection and Filtering

https://www.datainmotion.dev/2020/10/monitoring-mac-laptops-with-apache-nifi.html

23. Ingesting Data into Hive in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-hive-ingest/topics/cdf-datahub-nifi-hive-ingest.html

22. Ingesting Data into HBase in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-hbase-ingest/topics/cdf-datahub-nifi-hbase-ingest.html

21. Ingesting Data into Kudu in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-kudu-ingest/topics/cdf-datahub-nifi-kudu-ingest.html

16-20

20. Ingesting Data into ADLS Storage

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-azure-ingest/topics/cdf-datahub-fm-adls-ingest-overview.html

19. Populating Solr Indexes

https://www.datainmotion.dev/2020/04/building-search-indexes-with-apache.html

18. Hadoop Data to Kafka

https://www.datainmotion.dev/2020/04/read-apache-impala-apache-kudu-tables.html

17. Deep Learning and Machine Learning Pipelines

https://www.datainmotion.dev/2019/12/easy-deep-learning-in-apache-nifi-with.html

16. Intercepting JMS and SOA

https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_42.html

11-15

15. Edge ML Model Integration

https://www.datainmotion.dev/2019/08/updating-machine-learning-models-at.html

14. Migrating Data from On-Premises Private Cloud to Public Cloud

https://docs.cloudera.com/cfm/2.0.4/site-to-site/topics/cdf-datahub-site-to-site.html

13. Converting XML to JSON

https://www.datainmotion.dev/2020/07/ingesting-all-weather-data-with-apache.html
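To make the conversion itself concrete, here is a minimal sketch in plain Python, outside NiFi. It is not the flow from the linked article; inside NiFi this kind of conversion is usually handled by record-oriented processors rather than hand-written code. The function names and the sample weather record are hypothetical, and XML attributes are ignored to keep it short.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into plain Python data.

    Assumption for this sketch: attributes are ignored and all
    leaf values stay as strings.
    """
    children = list(element)
    if not children:
        # Leaf element: keep its trimmed text, or None when empty.
        text = element.text.strip() if element.text else None
        return text or None
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag becomes a JSON array.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    """Parse an XML string and return the equivalent JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)


# Hypothetical sample record, loosely shaped like a weather observation.
SAMPLE_XML = """
<observation>
  <station>KTTN</station>
  <temp_f>71.0</temp_f>
  <wind><dir>North</dir><mph>5.8</mph></wind>
</observation>
""".strip()

if __name__ == "__main__":
    print(xml_to_json(SAMPLE_XML))
```

Nested elements become nested JSON objects and repeated sibling tags become arrays, which is the shape most downstream JSON consumers expect.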

12. MQTT to HDFS

https://community.cloudera.com/t5/Community-Articles/Ingesting-Apache-MXNet-Gluon-Deep-Learning-Results-Via-MQTT/ta-p/248544

https://community.cloudera.com/t5/Community-Articles/Deep-Learning-IoT-Workflows-with-Raspberry-Pi-MQTT-MXNet/ta-p/249456

11. Ingesting REST Endpoints (Bulk)

https://www.datainmotion.dev/2020/07/ingesting-all-weather-data-with-apache.html

6-10

10. Ingesting Data into AWS S3 Buckets

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-aws-ingest/topics/cdf-datahub-fm-s3-ingest-overview.html

9. Ingesting REST Endpoints

https://www.datainmotion.dev/2020/05/cloudera-flow-management-101-lets-build.html

8. Ingesting SaaS Products Like Salesforce

https://www.datainmotion.dev/2019/11/ingest-salesforce-data-into-hive-using.html

7. Automating Manual Tasks

https://www.datainmotion.dev/2020/09/using-google-forms-as-a-data-source-for.html

6. Ingesting Social Media Data

https://www.datainmotion.dev/2020/04/harnessing-data-lifecycle-for-customer.html

Top 5

5. Logs, Logs, Logs

https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_35.html

https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html

4. FLaNK Streaming Data Pipeline (Any Data to Kafka to Flink SQL)

https://www.flankstack.dev/

3. IoT - MiNiFi Agents Ingest, Store and Forward

https://www.datainmotion.dev/2020/02/edgeai-google-coral-with-coral.html

https://community.cloudera.com/t5/Community-Articles/IoT-Series-Sensors-Utilizing-Breakout-Garden-Hat-Part-2/ta-p/249380

2. Pseudo-CDC / Database Ingest

https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_15.html

1. Doing 1,000 Different Ingest, Conversion, Routing and Transformation Flows

The most common use case is doing a lot of things with a lot of data: documents, XML, JSON, Avro, Parquet, CSV, PDF, images, video, Mongo documents, logs and more. Rarely do I see someone solve just one problem with NiFi and say that was enough. One simple use case leads to another and another, and before you know it every cron job, script, ETL, ELT and big data op is touched by NiFi. Keep it up; Cloudera will make it even easier soon. Also check out NiFi Stateless for more job- or event-oriented work such as File to Kafka, Kafka to Kafka and more.

https://community.cloudera.com/t5/Community-Articles/Scanning-Documents-into-Data-Lakes-via-Tesseract-MQTT-Python/ta-p/248492
