This article provides a detailed, step-by-step guide on setting up a secure Apache NiFi cluster with a NiFi Registry in Kubernetes, featuring the following capabilities:
- NiFi and the NiFi Registry are secured via https.
- Authentication of all services is realized via OpenID Connect (OIDC).
- The internal communication between the nodes is encrypted.
- The communication between the cluster and the NiFi Registry is encrypted and authenticated.
Motivation
In a world where ChatGPT is bringing artificial intelligence into our everyday lives, data integration becomes a key challenge. Ensuring that AI systems receive the right data at the right time will be crucial.
This article addresses this challenge using Apache NiFi, a proven data integration system that has been effectively solving data integration problems long before the AI revolution.
However, I do have one major criticism of Apache NiFi: The barrier to entry is relatively high. The process of setting up a secure cluster of nodes and connecting it to a secure NiFi registry can be time-consuming, especially for those new to the system.
That is why I have decided to write this article to help everyone get started with this excellent system. I promise it will be worth it!
Prerequisites
You need a Linux system (I used Ubuntu 22.04.3 LTS) with the following software installed:
- docker: Platform and runtime environment for container virtualization.
- minikube: A local Kubernetes environment for development.
- helm: A package manager for organizing software and systems developed for Kubernetes.
1. Preparations
Using minikube is very simple. A few commands start a local Kubernetes cluster with four CPUs and about 8 GB of memory, which offers all the features of Kubernetes except for high scalability:
$> minikube config set cpus 4
$> minikube config set memory 8184
$> minikube start
1.1 Enable ingress in minikube
To be able to access the Apache NiFi services via URL later on, we still need to enable the ingress controller of minikube.
$> minikube addons enable ingress
1.2 Integrate kubectl for minikube
Another useful thing about minikube is that it always comes with a matching kubectl client. This can be accessed with the command minikube kubectl and behaves identically to a standalone installation of kubectl. Therefore it is recommended to provide this command with an alias and enable auto-completion for kubectl.
$> echo 'alias kubectl="minikube kubectl --"' >> ~/.bashrc
$> echo 'source <(kubectl completion bash)' >> ~/.bashrc
$> source ~/.bashrc
After all the above steps are done, everything is set up to use kubectl against minikube. To test kubectl, you can run the following command (you can use auto-completion with Tab):
$> kubectl version -o yaml
1.3 Install cert-manager in minikube
The cert-manager is a framework to organize X.509 certificates within a Kubernetes cluster and simplifies the process of obtaining, renewing and using certificates.
To secure the NiFi cluster against unauthorized access and to encrypt the communication between the NiFi nodes, we use the cert-manager to issue the required certificates.
The installation of the cert-manager can be done with a single command:
$> kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml
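The manifest only creates the resources; the cert-manager pods still need a moment to start before any certificate can be issued. A minimal sketch for waiting on them, assuming the manifest's default namespace cert-manager. The function name wait_for_cert_manager and the default timeout of 180s are illustrative choices of mine, not part of the original setup. Keep in mind that bash does not expand the kubectl alias from section 1.2 inside scripts, so in a script you would need a real kubectl binary or a small wrapper function.

```shell
# Hypothetical helper: block until all cert-manager Deployments are Available.
# Assumes the namespace "cert-manager" created by the release manifest.
# Optional first argument: timeout (defaults to 180s, an arbitrary choice).
wait_for_cert_manager() {
  kubectl wait --namespace cert-manager \
    --for=condition=Available deployment --all \
    --timeout="${1:-180s}"
}
```

Calling wait_for_cert_manager (or wait_for_cert_manager 300s on a slow machine) returns once the deployments are available, or fails when the timeout is reached.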
1.4 Map NiFi domains to minikube
While the ingress controller handles URL mapping within the Kubernetes cluster, it's important to note that the URL must initially reach minikube. After successfully configuring the setup, you will be able to access the following two addresses via your browser:
nifi.example.org
nifi-registry.example.org
You can use the following command to add both mappings to the /etc/hosts file:
$> cat << EOF | sudo tee -a /etc/hosts
# Map nifi.example.org and nifi-registry.example.org to minikube ip
`minikube ip` nifi.example.org
`minikube ip` nifi-registry.example.org
EOF
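The command above appends the two lines every time it runs, so repeating the step leaves duplicates in /etc/hosts. A sketch of a repeatable variant: the helper names hosts_entries and missing_hosts_entries are hypothetical, and the second one only prints the mappings that are not in the file yet.

```shell
# Hypothetical helper: print the two hosts-file lines for a given IP address.
hosts_entries() {
  for host in nifi.example.org nifi-registry.example.org; do
    printf '%s %s\n' "$1" "$host"
  done
}

# Hypothetical helper: print only the entries that are missing from the given
# file (defaults to /etc/hosts), so the output can be appended more than once.
missing_hosts_entries() {
  hosts_entries "$1" | while read -r line; do
    grep -qxF "$line" "${2:-/etc/hosts}" || printf '%s\n' "$line"
  done
}
```

A possible call is missing_hosts_entries "$(minikube ip)" | sudo tee -a /etc/hosts. Note that this does not remove stale entries if the minikube IP changes after the cluster is recreated.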
1.5 Register OpenID connect (OIDC) clients
OpenID Connect (OIDC) is a protocol for secure user authentication and information sharing, where the provider performs authentication on behalf of the application. OIDC has become a standard and is offered by many large platform providers such as Google, PayPal and GitLab.
Moreover, OIDC is gaining popularity within organizations. Solutions such as Keycloak or Authelia offer a convenient way to provide OpenID Connect on top of, for example, LDAP.
Both Apache NiFi and the Apache NiFi Registry support OpenID Connect to authenticate their users. This means we have to register two clients with an OIDC provider.
For this article I will use GitLab as the OIDC provider. However, any other platform can be used as well. The only thing that changes is the domain name. The required information remains the same.
For GitLab the OIDC client registration is very easy. Just open your GitLab profile and create two new Applications:
NiFi OIDC Client:
- Name: NiFi
- Redirect URI: https://nifi.example.org/nifi-api/access/oidc/callback
- Scopes: openid email
NiFi Registry OIDC Client:
- Name: NiFi Registry
- Redirect URI: https://nifi-registry.example.org/nifi-registry-api/access/oidc/callback
- Scopes: openid email
For both registrations you need to save the following information for later integration into our NiFi services:
Name | Placeholder | Example |
---|---|---|
Discovery URL | <discovery_url> | https://gitlab.com/.well-known/openid-configuration |
Application ID | <application_id> | c9515c774fa1036cbcae5de455a23cc6ca7da54109a858f5b2c6869a89d40f08 |
Secret | <secret> | 90b45e16b759fa097461917e7ef3df2c79916b548e1dc44f8d1c2b2c8a8c5537 |
GitLab email | <registered_email> | john.doe@example.org |
The OIDC standard uses a discovery endpoint (discovery_url) to supply clients with configuration information from the OIDC server. This endpoint URL consistently ends with .well-known/openid-configuration but might have a unique prefix path depending on the provider.
For instance, Keycloak includes additional realm information in its discovery URL: https://{keycloakhost}:{keycloakport}/realms/{realm}/.well-known/openid-configuration.
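If you only know the issuer URL of your provider, the discovery URL can be derived from it by appending the well-known path. A small sketch of that rule; the function name discovery_url is hypothetical, and the Keycloak host in the usage note is a made-up example.

```shell
# Hypothetical helper: derive the discovery URL from an OIDC issuer URL.
# A single trailing slash on the issuer is dropped before the path is appended.
discovery_url() {
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}
```

For GitLab this would be discovery_url https://gitlab.com, for a Keycloak realm something like discovery_url https://keycloak.example.org/realms/demo. Fetching the result with curl -fsS "$(discovery_url https://gitlab.com)" should return a JSON document listing, among others, the authorization and token endpoints, which is a quick way to check the URL before putting it into the configuration.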
2. Set up an Apache NiFi cluster
One major advantage of Kubernetes standardization is the availability of numerous preconfigured software packages, including entire software systems in the form of helm packages. Fortunately, there's a helm chart for Apache NiFi, simplifying the process of setting up an entire cluster.
You can access this package through a public helm repository, which you can conveniently add to your local helm chart sources using these commands:
$> helm repo add cetic https://cetic.github.io/helm-charts
$> helm repo update
This helm chart provides a wide range of configuration options, all documented in the associated GitHub project helm-nifi.
The following configuration deploys a secure NiFi cluster with two nodes and OIDC authentication (the placeholders <application_id>, <secret> and <registered_email> are defined in section 1.5):
fullnameOverride: nifi
image:
  tag: 1.18.0
replicaCount: 2
properties:
  sensitiveKey: changeMechangeMe
  isNode: true
  webProxyHost: nifi.example.org
certManager:
  enabled: true
  ## Uncomment the next two lines only if you have
  ## installed the NiFi registry
  # caSecrets:
  #   - nifi-registry-ca
auth:
  admin: <registered_email>
  oidc:
    enabled: true
    discoveryUrl: https://gitlab.com/.well-known/openid-configuration
    clientId: <application_id>
    clientSecret: <secret>
    claimIdentifyingUser: email
    admin: <registered_email>
persistence:
  enabled: true
  subPath:
    enabled: true
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/app-root: /nifi
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/affinity-mode: persistent
    nginx.ingress.kubernetes.io/affinity: "cookie"
    cert-manager.io/issuer: "nifi-ca"
  hosts:
    - nifi.example.org
  tls:
    - hosts:
        - nifi.example.org
      secretName: nifi-example-crt-secret
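Since the file contains the client secret, one option is to keep the placeholders in the file and fill them in only at deployment time. The following is a sketch under my own assumptions, not part of the original setup: the function name render_values and the environment variables OIDC_CLIENT_ID, OIDC_CLIENT_SECRET and ADMIN_EMAIL are hypothetical, and the values must not contain the characters |, & or \ because they are pasted into a sed expression.

```shell
# Hypothetical helper: fill the placeholders of a values file read from stdin.
# Expects OIDC_CLIENT_ID, OIDC_CLIENT_SECRET and ADMIN_EMAIL to be set
# (illustrative variable names). Values must not contain "|", "&" or "\".
render_values() {
  sed -e "s|<application_id>|${OIDC_CLIENT_ID}|g" \
      -e "s|<secret>|${OIDC_CLIENT_SECRET}|g" \
      -e "s|<registered_email>|${ADMIN_EMAIL}|g"
}
```

A possible call is render_values < nifi_values.yaml > /tmp/nifi_values.rendered.yaml, followed by a helm call that uses the rendered file. Helm 3 should also accept - as a file name for -f to read the values from stdin, but I would verify that with your helm version first. If you go this way, every later helm upgrade in this article needs the rendered file as well.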
Now you can deploy the Apache NiFi cluster with the given configuration file nifi_values.yaml:
$> helm upgrade -i -f nifi_values.yaml nifi cetic/nifi
Open your browser and enter the address https://nifi.example.org.
Depending on your PC and internet connection, the download of all Docker images and the startup of the entire system can take several minutes.
You can check the current progress by entering:
$> kubectl get pods
If all pods are in the READY state, you should be able to access the service via the browser.
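Reading the READY column by eye gets tedious while waiting. A small sketch that filters the pod list down to the pods that are not fully ready yet; the function name not_ready_pods is hypothetical, and it relies on the usual column layout of kubectl get pods (name first, ready/total second).

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the names of pods whose READY column (ready/total) is not yet complete.
not_ready_pods() {
  awk '{ split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
}
```

Used as kubectl get pods --no-headers | not_ready_pods, it prints nothing once every pod has all of its containers ready. Pods of finished jobs would also be listed, since their containers are no longer ready.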
You may have to accept your browser's certificate warning.
3. Set up an Apache NiFi Registry
As with the installation of the NiFi cluster, there is a helm package for the NiFi registry. You need to add it to your local helm repository:
$> helm repo add dysnix https://dysnix.github.io/charts/
$> helm repo update
The following configuration deploys a secure NiFi registry with OIDC authentication (the placeholders <application_id>, <secret> and <registered_email> are defined in section 1.5):
fullnameOverride: nifi-registry
image:
  tag: 1.18.0
security:
  enabled: true
  needClientAuth: false
  admin: <registered_email>
certManager:
  enabled: true
  replaceDefaultTrustStore: false
  caSecrets:
    - nifi-ca
  additionalDnsNames:
    - nifi-registry
oidc:
  enabled: true
  discoveryUrl: https://gitlab.com/.well-known/openid-configuration
  clientId: <application_id>
  clientSecret: <secret>
  claimIdentifyingUser: email
  admin: <registered_email>
persistence:
  enabled: true
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/app-root: /nifi-registry
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    cert-manager.io/issuer: "nifi-registry-ca"
  hosts:
    - host: nifi-registry.example.org
      paths:
        - path: /
          pathType: Prefix
  tls:
    - hosts:
        - nifi-registry.example.org
      secretName: nifi-registry-example-crt-secret
Now you can deploy the Apache NiFi Registry with the given configuration file nifi_reg_values.yaml:
$> helm upgrade -i -f nifi_reg_values.yaml nifi-reg dysnix/nifi-registry
Open your browser and enter the address https://nifi-registry.example.org.
4. Configure Apache NiFi
Since we have enabled authentication with the NiFi Registry, the NiFi cluster must also authenticate itself to the registry. For this to happen, both services need to trust each other.
To do this, we have to uncomment the caSecrets configuration block in nifi_values.yaml. This imports the newly available NiFi Registry certificate into the local truststore of the NiFi nodes. To activate the change, you need to deploy the configuration again. If you originally copied the whole nifi_values.yaml, you can use the following command to uncomment the lines and redeploy the NiFi nodes:
$> sed -i 's/#[^#]//' nifi_values.yaml && helm upgrade -i -f nifi_values.yaml nifi cetic/nifi
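The sed expression removes, on each line, the first # that is followed by a different character, together with that character. A line such as "# caSecrets:" therefore loses its "# " prefix, while a "## ..." line only loses one # and the following space and stays a comment. Because the expression would also touch any other line containing a #, it can be worth previewing the result before editing the file in place. A minimal sketch; the function name preview_uncomment is hypothetical.

```shell
# Hypothetical helper: apply the same substitution as above, but without -i,
# so the result goes to stdout instead of overwriting the file. Reads the
# named file, or stdin when no file is given.
preview_uncomment() {
  sed 's/#[^#]//' "$@"
}
```

For example, preview_uncomment nifi_values.yaml | diff nifi_values.yaml - shows exactly which lines the in-place command would change.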
After the cluster nodes have been restarted, you need to add a new NiFi Registry client in the NiFi settings. Open the burger menu at the top right of the NiFi UI and select the "Controller Settings" menu. Go to the "REGISTRY CLIENTS" tab and register a new client using the "+" symbol.
Give it a name and an optional description and save it.
You will then need to edit the newly created entry, go to the "PROPERTIES" tab and set the URL to https://nifi-registry:18080.
This URL https://nifi-registry:18080 is the internal service address within the cluster. Do not use the external URL https://nifi-registry.example.org, otherwise the node authentication will fail.
Now only the node authentication is left. For this purpose, each NiFi node is created as a separate user. These can be found by clicking on the burger menu at the top right of the NiFi UI and opening the "Users" menu.
Exactly these users must now be created and authorized in the NiFi Registry. To do this, go to the NiFi Registry UI and click on "LOGIN" in the upper right corner.
Once GitLab has successfully authenticated you, you will be redirected back to your NiFi registry and your username should show up along with a "Settings" icon.
You can click it and create a new "Bucket".
Then switch to the "USERS" Tab and create the NiFi node users with read permissions on "Can manage buckets" and read, write and delete permissions on "Can proxy user requests".
If all steps have been completed successfully, you should now see the "Sample Bucket" in the NiFi UI when you import a "Process Group" from the Registry (drag & drop a Process Group from the header menu and click Import from Registry).
4. Limitations and Troubleshooting
The helm charts cetic/nifi (1.1.4) and dysnix/nifi-registry (1.1.4) are very helpful, but have some limitations.
- The latest working NiFi and NiFi Registry docker versions are 1.18.0. Later container versions use Java 11, which brings a new default format (PKCS12 instead of JKS) for the truststore (breaking change).
- When deploying the NiFi Registry, a permission error may occur on the auth-conf folder. This is due to a bug in the helm chart, which omits the setting of permissions on this folder. Either the permissions on this folder have to be set manually once, or the folder has to be added to the initContainers script.
- Due to the use of the cert-manager, certificates are issued on the fully qualified internal Kubernetes service name. This is composed as follows: {{ fullname }}-{{ replicaCount }}.{{ fullname }}-headless.{{ namespace }}.svc.cluster.local. Since the common name of a certificate may not exceed 64 bytes, this leads to the following name length restriction: 2 * length(fullname) + length(replicaCount) + length(namespace) < 35 characters.
- The sensitiveKey and clientSecret secrets cannot be passed as Kubernetes secrets. This means that the configurations should not be pushed into a version control system.
- Automatic scaling of an existing NiFi cluster is not possible by simply increasing the replicaCount. Once deployed, you must additionally add the new nodes manually to the configuration of all cluster nodes.
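The name length restriction is easy to check before deploying. The fixed parts of the name pattern add up to 30 characters, which is where the limit of 35 comes from (64 - 30 = 34 characters remain). A sketch of such a check; the helper names cn_length and cn_fits are hypothetical, and the second argument is the number that ends up in the pod name.

```shell
# Hypothetical helper: length of the certificate common name that results from
# the pattern above. Arguments: fullname, pod number, namespace.
cn_length() {
  cn="$1-$2.$1-headless.$3.svc.cluster.local"
  printf '%s\n' "${#cn}"
}

# Hypothetical helper: succeed only if the common name fits into 64 characters.
cn_fits() {
  [ "$(cn_length "$1" "$2" "$3")" -le 64 ]
}
```

With the fullname nifi from this article and assuming the namespace default, cn_length nifi 0 default prints 46, so there is still some room for longer names or namespaces.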
5. Conclusion
This article gave you a tutorial on how to set up a secure Apache NiFi cluster with Apache NiFi Registry integration. It also addresses the limitations and challenges that can arise in this process. If everything worked well, you can now seamlessly dive into modeling data integration workflows and become familiar with Apache NiFi's core functionality. With this knowledge, you're well-equipped to harness the advantages of this powerful platform and effectively handle your data integration tasks.
Top comments (12)
Update
The restrictions for using Apache NiFi versions `>1.18.0` (section 4.1) have been fixed:
- cetic/nifi in version 1.2.0 (pull request)
- dysnix/nifi-registry in version 1.1.5 (pull request)
This means that you can now also use the latest Apache NiFi versions.
Hi Jannik Rebmann,
I am trying this configuration but I am facing an issue.
Using OIDC, the redirect URIs of NiFi and NiFi Registry come out as below:
NiFi: https://:443/nifi-api/access/oidc/callback
NiFi Registry: http://:80/nifi-registry-api/access/oidc/callback
Is there something I am missing? Why is the NiFi Registry OIDC redirect URI coming out on http, although according to the logs it is running on https (18443)?
Please help me out here.
Hi @anmoln4
I faced this issue before. You need to add the headers x-proxyscheme: https and x-proxyport:443 in request-transformer so that NiFi redirects to https instead of http.
Hope it helps you.
Hi @anmoln4
I think I need more information about your OIDC configuration.
The Callback URL must be set with your OIDC provider. This is the URL that sends back the OIDC authentication response to your NiFi service.
So maybe you have set http://:80/nifi-registry-api/access/oidc/callback as Callback URL on your OIDC server?
Hi @jrebmann ,
I am following your configuration to enable CertManager, but I am hitting an issue: it is unable to locate the initial admin. Could you possibly share an example for authorizers.xml?
Error:
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'authorizer': FactoryBean threw exception on object creation; nested exception is org.apache.nifi.authorization.exception.AuthorizerCreationException: org.apache.nifi.authorization.exception.AuthorizerCreationException: Unable to locate initial admin JoseAce@xxxx.com to seed policies
authorizers.xml
Hi @kamniphat01,
thanks for your question.
I have never experienced this error.
Please make sure you also have the mail joseace@xxxx.com set in the security and oidc sections.
Hi @jrebmann ,
Thanks for the article on how to set up a secure NiFi cluster. Now I was able to successfully deploy the NiFi cluster with the OIDC method. Appreciate it.
@kamniphat01 You're welcome! Thank you for reading. I hope you will like Apache NiFi ... it has solved so many problems for me.
I am deploying NiFi and NiFi Registry on AKS and everything is working except the integration with Git.
I tried almost everything:
changing persistence from true to false, and username and password auth.
This code is part of the values.yaml of NiFi.
Hi @heni_nechi,
first of all, I would recommend that you use at least version 1.18.0 of the Apache NiFi Registry. I had a similar problem with the Git integration. I finally solved the problem by using an ssh key.
The corresponding secret looks like the following:
I hope this helps you.
Hello @jrebmann
Thanks for your quick response I tried using the integration with ssh key before, I'll give it a shot again with the code provided and I'll get back to you with a reply.
Hey again @jrebmann
As I already told you, I have tried using the ssh key before and it didn't work, same as now.
I really don't know what I am doing wrong. The secret is being set correctly, but judging by the logs it is always defaulting to FileSystemFlowPersistenceProvider and the providers.xml is not being configured :