
Connecting Apache NiFi to Apache Atlas For Data Governance At Scale in Streaming

Timothy Spann, originally published at datainmotion.dev (4 min read)


Once connected, you can see NiFi and Kafka metadata flowing into Atlas.

You must add an Atlas reporting task to your NiFi cluster.

Add a ReportLineageToAtlas reporting task under Controller Settings / Reporting Tasks.

Set the URL for Atlas and the authentication method; if the method is Basic, also supply a username and password.

You also need to set the Atlas configuration directory, the NiFi URL to use, and the lineage strategy (here, Complete Path).
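The settings above can be checked before you save the reporting task. The Python sketch below is illustrative only: the class and function names are hypothetical, and the option labels (Basic and Kerberos for authentication, Simple Path and Complete Path for the lineage strategy) are assumptions based on the ReportLineageToAtlas UI, so confirm them against your NiFi version. It encodes the one rule stated above: Basic authentication needs a username and password.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed option labels from the ReportLineageToAtlas UI; verify for your NiFi version.
AUTH_METHODS = {"Basic", "Kerberos"}
LINEAGE_STRATEGIES = {"Simple Path", "Complete Path"}


@dataclass
class AtlasReportingSettings:
    """Hypothetical container for the reporting task settings discussed above."""
    atlas_urls: str
    auth_method: str
    atlas_conf_dir: str
    nifi_url: str
    lineage_strategy: str = "Complete Path"
    username: Optional[str] = None
    password: Optional[str] = None


def validate(settings: AtlasReportingSettings) -> List[str]:
    """Return a list of problems; an empty list means nothing obvious is missing."""
    problems: List[str] = []
    required = {
        "Atlas URLs": settings.atlas_urls,
        "Atlas configuration directory": settings.atlas_conf_dir,
        "NiFi URL": settings.nifi_url,
    }
    # Every required text field must contain something other than whitespace.
    for label, value in required.items():
        if not value.strip():
            problems.append(f"{label} is required")
    if settings.auth_method not in AUTH_METHODS:
        problems.append(f"unknown authentication method: {settings.auth_method!r}")
    # Basic authentication needs both a username and a password.
    if settings.auth_method == "Basic" and not (settings.username and settings.password):
        problems.append("Basic authentication requires a username and password")
    if settings.lineage_strategy not in LINEAGE_STRATEGIES:
        problems.append(f"unknown lineage strategy: {settings.lineage_strategy!r}")
    return problems


# Example values: the Atlas host and port come from this article's cluster;
# the NiFi URL and credentials are placeholders.
example = AtlasReportingSettings(
    atlas_urls="http://princeton0.field.hortonworks.com:31000",
    auth_method="Basic",
    atlas_conf_dir="/etc/atlas/conf",
    nifi_url="https://nifi.example.com:8443/nifi",
    username="admin",
    password="change-me",
)
print(validate(example))  # []
```

Returning a list of messages rather than raising on the first problem lets you see every missing field at once.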

Another example with AWS-hosted NiFi and Atlas:

You can now see the lineage state:
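You can also pull what NiFi has reported without opening the Atlas UI. This is a minimal sketch, assuming Basic authentication and Atlas's v2 basic search endpoint (/api/atlas/v2/search/basic); the type name nifi_flow is assumed to be the entity type NiFi registers for a flow, so adjust it if your Atlas shows different types. The helper names and credentials are hypothetical. Building the request separately from sending it lets you inspect the URL and header offline.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def build_search_request(atlas_url: str, username: str, password: str,
                         type_name: str = "nifi_flow",
                         limit: int = 25) -> urllib.request.Request:
    """Build (but do not send) a GET request for Atlas's v2 basic search."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    url = f"{atlas_url.rstrip('/')}/api/atlas/v2/search/basic?{query}"
    # HTTP Basic authentication: base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def search_entities(request: urllib.request.Request,
                    timeout: float = 10.0) -> List[Dict[str, Any]]:
    """Send the request and return the entity headers from the response, if any."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The "entities" key may be absent when nothing matches, so default to [].
    return body.get("entities", [])


# Placeholder credentials; the host and port come from this article's cluster.
req = build_search_request("http://princeton0.field.hortonworks.com:31000",
                           "admin", "change-me")
print(req.full_url)
```

From a machine that can reach Atlas, call search_entities(req) and print each entity's typeName and displayText to confirm the flow was registered.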

Configure Atlas to Be Enabled and Have Kafka

Make sure the Atlas service is enabled in the NiFi configuration.

Example Configuration

You must have access to the Atlas application properties in /etc/atlas/conf:

atlas-application.properties

```properties
#Generated by Apache NiFi ReportLineageToAtlas ReportingTask at 2020-02-21T17:18:28.493Z
#Fri Feb 21 17:18:28 UTC 2020
atlas.kafka.bootstrap.servers=princeton0.field.hortonworks.com:9092
atlas.enableTLS=false
atlas.kafka.client.id=ReportLineageToAtlas.687a48e2-0170-1000-0000-00000a0de4ea
atlas.cluster.name=Princeton0
atlas.kafka.security.protocol=PLAINTEXT
```
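Before pointing the reporting task at this directory, it can help to sanity-check the file. This Python sketch uses a deliberately simplified key=value reader (not the full java.util.Properties grammar) and checks for the keys present in the example file above; the helper names and the EXPECTED_KEYS list are hypothetical choices, not something NiFi defines.

```python
from typing import Dict, List, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments.

    Deliberately simplified: no line continuations, escapes, or ':' separators,
    all of which the full java.util.Properties format allows.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def bootstrap_servers(props: Dict[str, str]) -> List[Tuple[str, int]]:
    """Split atlas.kafka.bootstrap.servers into (host, port) pairs."""
    servers: List[Tuple[str, int]] = []
    for entry in props.get("atlas.kafka.bootstrap.servers", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            raise ValueError(f"bootstrap server without a port: {entry!r}")
        servers.append((host, int(port)))
    return servers


# Keys that appear in the generated example file; extend as needed.
EXPECTED_KEYS = (
    "atlas.kafka.bootstrap.servers",
    "atlas.kafka.security.protocol",
    "atlas.cluster.name",
)

SAMPLE = """\
#Generated by Apache NiFi ReportLineageToAtlas ReportingTask at 2020-02-21T17:18:28.493Z
#Fri Feb 21 17:18:28 UTC 2020
atlas.kafka.bootstrap.servers=princeton0.field.hortonworks.com:9092
atlas.enableTLS=false
atlas.kafka.client.id=ReportLineageToAtlas.687a48e2-0170-1000-0000-00000a0de4ea
atlas.cluster.name=Princeton0
atlas.kafka.security.protocol=PLAINTEXT
"""

props = parse_properties(SAMPLE)
missing = [key for key in EXPECTED_KEYS if key not in props]
print(missing)                   # []
print(bootstrap_servers(props))  # [('princeton0.field.hortonworks.com', 9092)]
```

On the cluster itself, pass the contents of /etc/atlas/conf/atlas-application.properties instead of SAMPLE.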

atlas-server.properties

```
princeton0.field.hortonworks.com:atlas.authentication.method.kerberos=false
princeton0.field.hortonworks.com:atlas.enableTLS=false
princeton0.field.hortonworks.com:atlas.kafka.zookeeper.connection.timeout.ms=30000
princeton0.field.hortonworks.com:atlas.kafka.zookeeper.session.timeout.ms=60000
princeton0.field.hortonworks.com:atlas.kafka.zookeeper.sync.time.ms=20
princeton0.field.hortonworks.com:atlas.server.bind.address=0.0.0.0
princeton0.field.hortonworks.com:atlas.server.http.port=31000
princeton0.field.hortonworks.com:atlas.server.https.port=31443
```
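The server listing above prefixes each line with the host name, which makes it possible to derive the Atlas URL to enter in the reporting task. This is a sketch under two assumptions: the prefix is always `host:`, and Atlas listens on atlas.server.https.port when atlas.enableTLS is true and on atlas.server.http.port otherwise. Because the bind address is 0.0.0.0, the host name from the prefix is used instead. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def parse_host_prefixed(text: str) -> Dict[str, Dict[str, str]]:
    """Group 'host:key=value' lines into {host: {key: value}}."""
    by_host: Dict[str, Dict[str, str]] = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The host name is everything before the first ':'.
        host, colon, rest = line.partition(":")
        key, equals, value = rest.partition("=")
        if colon and equals:
            by_host[host][key.strip()] = value.strip()
    return dict(by_host)


def atlas_url(host: str, props: Dict[str, str]) -> str:
    """Pick the HTTPS or HTTP port depending on atlas.enableTLS (assumed semantics)."""
    if props.get("atlas.enableTLS", "false").strip().lower() == "true":
        return f"https://{host}:{props['atlas.server.https.port']}"
    return f"http://{host}:{props['atlas.server.http.port']}"


SAMPLE = """\
princeton0.field.hortonworks.com:atlas.authentication.method.kerberos=false
princeton0.field.hortonworks.com:atlas.enableTLS=false
princeton0.field.hortonworks.com:atlas.server.bind.address=0.0.0.0
princeton0.field.hortonworks.com:atlas.server.http.port=31000
princeton0.field.hortonworks.com:atlas.server.https.port=31443
"""

for host, props in parse_host_prefixed(SAMPLE).items():
    print(host, "->", atlas_url(host, props))
```

With the values shown, TLS is disabled, so this prints http://princeton0.field.hortonworks.com:31000 as the Atlas URL.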

Running Atlas

See:


Timothy Spann

@tspannhw

I am a Principal Field Engineer for Data in Motion at Cloudera. I work with Apache NiFi, Apache Kafka, Apache Spark, Apache Flink, IoT, MXNet, DJL.AI, Deep Learning, Machine Learning, Streaming...
