Wait, Why Should I Care About DNS?
Let me start with a story. Early in my career, I got paged at 2 AM because "nothing was working." Applications were timing out, users couldn't access services, and my monitoring was completely useless. After hours of panic, we discovered someone had accidentally modified the DNS configuration, and the entire cluster couldn't resolve internal service names.
That night changed how I think about DNS. It's the foundation everything else builds on, and when it breaks, nothing works.
In this guide, I'll walk you through how to set up DNS governance for OpenShift using Red Hat Advanced Cluster Management (RHACM). Don't worry if you're new to this - I'll explain everything from scratch.
What Even Is DNS in OpenShift?
First, let's talk about what DNS does in your cluster. You can think of DNS as the phonebook of the internet. When your application wants to talk to another service (like a database or API), it needs to know the IP address. DNS translates service names to IP addresses.
In OpenShift, this is handled by CoreDNS - a DNS server that runs on every node in your cluster. Each node has its own DNS resolver, which means your pods don't need to go far to resolve names.
Here's the cool part: OpenShift automatically creates DNS entries for your services. If you create a service called my-app in the namespace production, other pods can reach it just by using my-app.production.svc.cluster.local. No manual configuration needed.
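To make that concrete, here's a minimal Service manifest (the port and selector labels are illustrative) that would be reachable at my-app.production.svc.cluster.local from any pod in the cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
spec:
  selector:
    app: my-app        # matches the pods backing this service
  ports:
    - port: 8080       # the DNS name resolves to the service's cluster IP,
      targetPort: 8080 # which load-balances across the selected pods
```

Thanks to the cluster's DNS search domains, pods in the same namespace can use just my-app, and pods elsewhere can use the shorter my-app.production.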
The DNS Operator in OpenShift manages all of this. It watches for services you create and automatically updates the DNS records. Pretty handy, right?
What's RHACM and Why Do I Need It?
Now, let's talk about RHACM. If you're managing multiple OpenShift clusters (like a development cluster, staging, and production), RHACM is like a "hub" that lets you control all of them from one place.
One of RHACM's superpowers is policies. A policy is basically a rule that says "this is how things should be configured." You define the policy on your hub cluster, and RHACM makes sure all your managed clusters comply with it.
For DNS governance, we want policies that:
- Check if DNS is healthy
- Verify the configuration hasn't drifted
- Alert us if something goes wrong
The Four Things We Need to Monitor
Here's my simple framework for DNS governance. We're going to check four things:
1. Is the DNS Operator Happy?
The DNS Operator is the thing that manages CoreDNS. If it's sad (degraded), nothing else will work properly. We monitor this using something called a ClusterOperator resource.
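If you want to eyeball this yourself, oc get clusteroperator dns -o yaml shows the conditions a policy would check. A healthy operator's status looks roughly like this (trimmed; your exact condition list will be longer):

```yaml
# Abbreviated status from a healthy DNS ClusterOperator
status:
  conditions:
    - type: Degraded
      status: "False"   # "True" here means trouble
    - type: Progressing
      status: "False"
    - type: Available
      status: "True"    # CoreDNS is serving queries
```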
2. Is the Corefile Correct?
The Corefile is the configuration file for CoreDNS. It tells DNS what plugins to use and how to handle queries. We want to make sure critical plugins are always present.
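For reference, a trimmed-down OpenShift Corefile looks something like this (your cluster's version will have more plugins, and exact values may differ):

```
.:5353 {
    errors                      # log query errors
    health                      # liveness endpoint for the pod
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf  # send non-cluster names upstream
    cache 900                   # cache responses for up to 900s
}
```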
3. Are All DNS Pods Running?
CoreDNS runs as a DaemonSet (one pod per node). Sometimes pods show as "Running" but aren't actually working. We need to verify all expected pods are truly available.
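Concretely, the DaemonSet's status exposes the numbers a policy can compare. Here's a trimmed example (openshift-dns and dns-default are the real namespace and DaemonSet names in OpenShift; the counts are illustrative):

```yaml
# oc get daemonset dns-default -n openshift-dns -o yaml (status only)
status:
  desiredNumberScheduled: 6   # one pod per schedulable node
  currentNumberScheduled: 6
  numberReady: 6
  numberAvailable: 5          # a mismatch with desired means a pod isn't truly serving
```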
4. Do We Get Alerted?
If something goes wrong, we need to know about it. We'll set up alerts that page us when DNS has problems.
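As a taste of what an alerting policy deploys, a PrometheusRule might look like the sketch below. The alert name, expression, and threshold here are illustrative, not the exact rules from the repository:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dns-governance-alerts
  namespace: openshift-dns
spec:
  groups:
    - name: dns.rules
      rules:
        - alert: DNSPodsUnavailable
          # fires when any dns-default pod has been unavailable for 5 minutes
          expr: kube_daemonset_status_number_unavailable{daemonset="dns-default"} > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "One or more CoreDNS pods are unavailable"
```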
Let's Build It: Step-by-Step
Ready to see how this works? Here's how to set up DNS governance on your cluster.
Prerequisites
You'll need:
- An OpenShift cluster with RHACM installed
- Access to the oc command-line tool
- Permission to create policies on the hub cluster
Step 1: Clone the Repository
First, grab the policy templates from my repository:
git clone https://github.com/tosin2013/dns-policy-config.git
cd dns-policy-config
Step 2: Create the Policy Namespace
On your RHACM hub cluster, create a namespace to hold your DNS policies:
oc apply -f demo/namespace.yaml
This creates a namespace called dns-governance-policies.
Step 3: Bind Your ClusterSet
Next, connect your managed cluster to this namespace:
oc apply -f demo/clusterset-binding.yaml
This tells RHACM which clusters should receive these policies.
Step 4: Apply the DNS Policies
Now let's add the four DNS governance policies:
# Monitor DNS Operator health
oc apply -f policies/dns/operator-health-check.yaml
# Check Corefile configuration
oc apply -f policies/dns/corefile-integrity.yaml
# Verify all DNS pods are running
oc apply -f policies/dns/resource-exhaustion.yaml
# Set up alerting
oc apply -f policies/observability/dns-alerting-rule.yaml
Step 5: Point to Your Cluster
Edit the demo/placement.yaml file to target your managed cluster. Look for the cluster name and change it to match yours:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: dns-policy-placement
  namespace: dns-governance-policies
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchExpressions:
            - key: name
              operator: In
              values:
                - your-cluster-name # Change this!
Then apply it:
oc apply -f demo/placement.yaml
oc apply -f demo/placement-binding.yaml
Step 6: Check Compliance
Let's see if everything is working:
oc get policy -n dns-governance-policies
You should see something like:
NAME                             REMEDIATION ACTION   COMPLIANCE STATE
policy-dns-operator-health       inform               Compliant
policy-dns-corefile-integrity    inform               Compliant
policy-dns-resource-exhaustion   inform               Compliant
policy-dns-alerting-rule         enforce              Compliant
All four policies should show "Compliant"!
What Does Each Policy Actually Do?
Let me break down each policy in plain English:
Policy 1: operator-health-check
This watches the DNS Operator and makes sure it's not degraded. If the operator has problems, this policy will tell you.
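Under the hood, a check like this is typically expressed as an RHACM ConfigurationPolicy that asserts on the ClusterOperator's conditions. Here's a simplified sketch (not the exact policy from the repository):

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: dns-operator-not-degraded
spec:
  remediationAction: inform      # report violations, don't auto-fix
  severity: high
  object-templates:
    - complianceType: musthave   # the cluster must match this desired state
      objectDefinition:
        apiVersion: config.openshift.io/v1
        kind: ClusterOperator
        metadata:
          name: dns
        status:
          conditions:
            - type: Degraded
              status: "False"    # non-compliant if the operator reports Degraded=True
```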
Policy 2: corefile-integrity
This checks that your CoreDNS configuration has the essential plugins: forward, errors, health, and cache. If any are missing, you'll know.
Policy 3: resource-exhaustion
This verifies that the number of DNS pods actually running matches what should be running. Sometimes pods can be in a weird state - this catches that.
Policy 4: dns-alerting-rule
This creates Prometheus alerts that will page you when DNS has problems. This is the only policy that uses "enforce" mode because it creates new alerting rules.
Why "Inform" Instead of "Enforce"?
You might notice most policies use "inform" mode instead of "enforce." Here's why that's intentional:
- Inform means "tell me if something is wrong, but don't fix it automatically"
- Enforce means "automatically fix things"
For DNS, automatic fixes are risky. Imagine if a policy accidentally overwrote your DNS configuration - you'd have a cluster-wide outage. By using "inform" mode, we get alerted to problems but don't risk making things worse automatically.
The only exception is the alerting rule, which creates new alert definitions - that's safe to enforce.
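In a policy manifest, this choice is a one-line switch:

```yaml
spec:
  remediationAction: inform   # change to "enforce" to have RHACM apply fixes itself
```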
Wrapping Up
And that's it! You've now got DNS governance set up for your OpenShift cluster. Here's what you've accomplished:
- Created policies that monitor your DNS Operator
- Verified your CoreDNS configuration is correct
- Ensured all DNS pods are actually running
- Set up alerts for when things go wrong
DNS might seem like background infrastructure, but it deserves attention. With these policies in place, you'll know about DNS problems before they become cluster-wide outages.
Remember: the best time to set up governance was when you deployed your cluster. The second best time is now.
Quick Reference
Commands used:
git clone https://github.com/tosin2013/dns-policy-config.git
oc apply -f demo/namespace.yaml
oc apply -f demo/clusterset-binding.yaml
oc apply -f policies/dns/operator-health-check.yaml
oc apply -f policies/dns/corefile-integrity.yaml
oc apply -f policies/dns/resource-exhaustion.yaml
oc apply -f policies/observability/dns-alerting-rule.yaml
oc apply -f demo/placement.yaml
oc apply -f demo/placement-binding.yaml
What to check:
oc get policy -n dns-governance-policies
Questions? Want to learn more? Check out the full repository at github.com/tosin2013/dns-policy-config