DEV Community

Timothy Spann 🇺🇦

Originally published at datainmotion.dev


Connecting Apache NiFi to Apache Atlas For Data Governance At Scale in Streaming


Once connected, you can see NiFi and Kafka flowing into Atlas.
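To confirm from a script, rather than the Atlas UI, that NiFi and Kafka entities have arrived, you can query Atlas directly. This is a minimal sketch, assuming the Atlas v2 basic-search REST endpoint (/api/atlas/v2/search/basic) with Basic authentication. The type names nifi_flow and kafka_topic are the ones I would expect the NiFi and Kafka integrations to register, but check them against your Atlas type list. The function names, host, and credentials are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_basic_search(atlas_url: str, type_name: str,
                       username: str, password: str,
                       limit: int = 25) -> urllib.request.Request:
    """Build, without sending, an Atlas v2 basic-search request for one entity type."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    url = f"{atlas_url.rstrip('/')}/api/atlas/v2/search/basic?{query}"
    # Basic authentication: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"}
    )


def entity_names(response_body: bytes) -> list:
    """Return the displayText of each entity in a basic-search JSON response."""
    payload = json.loads(response_body)
    return [entity.get("displayText", "") for entity in payload.get("entities", [])]


# Usage (hypothetical host and credentials):
# req = build_basic_search("http://atlas-host:31000", "nifi_flow", "admin", "secret")
# with urllib.request.urlopen(req) as resp:
#     print(entity_names(resp.read()))
```

Separating request construction from sending keeps the sketch testable without a live Atlas server. Swap "nifi_flow" for "kafka_topic" to look for the Kafka side.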

You must add an Atlas reporting task to the NiFi cluster.

Add a ReportLineageToAtlas reporting task under Controller Settings / Reporting Tasks.

You must set the Atlas URL and the authentication method and, if the method is Basic, the username and password.

You also need to set the Atlas configuration directory, the NiFi URL to use, and the lineage strategy (Complete Path).
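The settings above map onto properties of the reporting task. This sketch builds a request body for creating the task through the NiFi REST API, assumed to be a POST to /nifi-api/controller/reporting-tasks. The class name, the internal property names (atlas-urls, atlas-authentication-method, atlas-conf-dir, atlas-nifi-url, nifi-lineage-strategy, atlas-username, atlas-password), and the values basic, kerberos, and CompletePath are my recollection of the ReportLineageToAtlas source, not something this post confirms, so verify them against your NiFi version. The only rule the sketch enforces is the one stated here: Basic authentication needs a username and password.

```python
from typing import Optional

# Assumed class name and property names for ReportLineageToAtlas; verify them
# against the reporting task's documentation in your NiFi version.
REPORTING_TASK_TYPE = "org.apache.nifi.atlas.reporting.ReportLineageToAtlas"
AUTH_METHODS = ("basic", "kerberos")


def reporting_task_payload(atlas_url: str, auth_method: str, conf_dir: str,
                           nifi_url: str, username: Optional[str] = None,
                           password: Optional[str] = None) -> dict:
    """Build a hypothetical create-reporting-task request body."""
    if auth_method not in AUTH_METHODS:
        raise ValueError(f"unsupported authentication method: {auth_method!r}")
    properties = {
        "atlas-urls": atlas_url,
        "atlas-authentication-method": auth_method,
        "atlas-conf-dir": conf_dir,        # e.g. /etc/atlas/conf
        "atlas-nifi-url": nifi_url,
        "nifi-lineage-strategy": "CompletePath",
    }
    if auth_method == "basic":
        # Basic authentication requires a username and password.
        if not username or not password:
            raise ValueError("basic authentication requires username and password")
        properties["atlas-username"] = username
        properties["atlas-password"] = password
    # Kerberos needs further settings (principal, keytab) not covered by this sketch.
    return {
        "revision": {"version": 0},  # a newly created component starts at revision 0
        "component": {"type": REPORTING_TASK_TYPE, "properties": properties},
    }
```

To send it, serialize the result with json.dumps and POST it with an authenticated HTTP client. Creating the task in the UI, as described here, avoids the REST step entirely.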

Another example, with NiFi and Atlas hosted on AWS:

You can now see the lineage state:

Configure Atlas to Be Enabled and Have Kafka

Have the Atlas service enabled in the NiFi configuration.

Example Configuration

You must have access to the Atlas application properties file.

/etc/atlas/conf

atlas-application.properties

#Generated by Apache NiFi ReportLineageToAtlas ReportingTask at 2020-02-21T17:18:28.493Z
#Fri Feb 21 17:18:28 UTC 2020
atlas.kafka.bootstrap.servers=princeton0.field.hortonworks.com:9092
atlas.enableTLS=false
atlas.kafka.client.id=ReportLineageToAtlas.687a48e2-0170-1000-0000-00000a0de4ea
atlas.cluster.name=Princeton0
atlas.kafka.security.protocol=PLAINTEXT
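Because NiFi generates atlas-application.properties itself (note the "Generated by Apache NiFi ReportLineageToAtlas" header), it can be worth checking what it wrote, for example that the Kafka bootstrap servers are set. This is a minimal sketch, assuming the simple key=value layout shown here. It is not a full java.util.Properties parser, and the keys it treats as required are a hypothetical choice taken from this example file.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#'/'!' comments.

    Simplified on purpose: no escapes, line continuations, or ':' separators,
    all of which the full java.util.Properties format allows.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


# Hypothetical choice of keys to require, taken from the example file.
REQUIRED_KEYS = ("atlas.kafka.bootstrap.servers", "atlas.cluster.name")


def missing_keys(props: dict) -> list:
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Usage:
# with open("/etc/atlas/conf/atlas-application.properties") as f:
#     print(missing_keys(parse_properties(f.read())))
```

Splitting on the first "=" keeps values such as host:9092 intact, which matters for the bootstrap-servers entry.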

atlas-server.properties

princeton0.field.hortonworks.com:atlas.authentication.method.kerberos=false
princeton0.field.hortonworks.com:atlas.enableTLS=false
princeton0.field.hortonworks.com:atlas.kafka.zookeeper.connection.timeout.ms=30000
princeton0.field.hortonworks.com:atlas.kafka.zookeeper.session.timeout.ms=60000
princeton0.field.hortonworks.com:atlas.kafka.zookeeper.sync.time.ms=20
princeton0.field.hortonworks.com:atlas.server.bind.address=0.0.0.0
princeton0.field.hortonworks.com:atlas.server.http.port=31000
princeton0.field.hortonworks.com:atlas.server.https.port=31443
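In this listing, each Atlas server setting is prefixed with the host it applies to. Together with atlas.enableTLS and the two port settings, that is enough to work out the Atlas URL to enter in the reporting task. This is a minimal sketch. The rule it applies (use the HTTPS port when atlas.enableTLS is true, otherwise the HTTP port) is my assumption about how these settings relate, and the helper names are hypothetical.

```python
from collections import defaultdict


def parse_host_prefixed(text: str) -> dict:
    """Group 'host:key=value' lines by host, giving {host: {key: value}}."""
    by_host = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        left, sep, value = line.partition("=")
        host, colon, key = left.partition(":")  # assumes the host part has no ':'
        if sep and colon:
            by_host[host][key] = value
    return dict(by_host)


def atlas_url(host: str, props: dict) -> str:
    """Assumed rule: HTTPS port when atlas.enableTLS is true, HTTP port otherwise."""
    if props.get("atlas.enableTLS", "false").lower() == "true":
        return f"https://{host}:{props['atlas.server.https.port']}"
    return f"http://{host}:{props['atlas.server.http.port']}"


# Usage, where listing holds the host-prefixed settings as text:
# for host, props in parse_host_prefixed(listing).items():
#     print(atlas_url(host, props))
```

With the values shown here (TLS off, HTTP port 31000), the sketch yields http://princeton0.field.hortonworks.com:31000.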

Running Atlas

See:
