Wait, Why Should I Care About DNS?
Let me start with a story. Early in my career, I got paged at 2 AM because "nothing was working." Applications were timing out, users couldn't access services, and my monitoring was completely useless. After hours of panic, we discovered someone had accidentally modified the DNS configuration, and the entire cluster couldn't resolve internal service names.
That night changed how I think about DNS. It's the foundation everything else builds on, and when it breaks, nothing works.
In this guide, I'll walk you through how to set up DNS governance for OpenShift using Red Hat Advanced Cluster Management (RHACM). Don't worry if you're new to this - I'll explain everything from scratch.
What Even Is DNS in OpenShift?
First, let's talk about what DNS does in your cluster. You can think of DNS as the phonebook of the internet. When your application wants to talk to another service (like a database or API), it needs to know the IP address. DNS translates service names to IP addresses.
In OpenShift, this is handled by CoreDNS - a DNS server that runs on every node in your cluster. Each node has its own DNS resolver, which means your pods don't need to go far to resolve names.
Here's the cool part: OpenShift automatically creates DNS entries for your services. If you create a service called my-app in the namespace production, other pods can reach it just by using my-app.production.svc.cluster.local. No manual configuration needed.
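To make that concrete, here's a minimal Service manifest (the port and selector labels are illustrative) that would be reachable at my-app.production.svc.cluster.local from any pod in the cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
spec:
  selector:
    app: my-app        # matches the pods backing this service
  ports:
    - port: 8080       # the DNS name resolves to the service's cluster IP,
      targetPort: 8080 # which load-balances across the selected pods
```

Thanks to the cluster's DNS search domains, pods in the same namespace can use just my-app, and pods elsewhere can use the shorter my-app.production.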
The DNS Operator in OpenShift manages all of this. It watches for services you create and automatically updates the DNS records. Pretty handy, right?
What's RHACM and Why Do I Need It?
Now, let's talk about RHACM. If you're managing multiple OpenShift clusters (like a development cluster, staging, and production), RHACM is like a "hub" that lets you control all of them from one place.
One of RHACM's superpowers is policies. A policy is basically a rule that says "this is how things should be configured." You define the policy on your hub cluster, and RHACM makes sure all your managed clusters comply with it.
For DNS governance, we want policies that:
- Check if DNS is healthy
- Verify the configuration hasn't drifted
- Alert us if something goes wrong
The Four Things We Need to Monitor
Here's my simple framework for DNS governance. We're going to check four things:
1. Is the DNS Operator Happy?
The DNS Operator is the thing that manages CoreDNS. If it's sad (degraded), nothing else will work properly. We monitor this using something called a ClusterOperator resource.
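If you want to eyeball this yourself, oc get clusteroperator dns -o yaml shows the conditions a policy would check. A healthy operator's status looks roughly like this (trimmed; your exact condition list will be longer):

```yaml
# Abbreviated status from a healthy DNS ClusterOperator
status:
  conditions:
    - type: Degraded
      status: "False"   # "True" here means trouble
    - type: Progressing
      status: "False"
    - type: Available
      status: "True"    # CoreDNS is serving queries
```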
2. Is the Corefile Correct?
The Corefile is the configuration file for CoreDNS. It tells DNS what plugins to use and how to handle queries. We want to make sure critical plugins are always present.
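For reference, a trimmed-down OpenShift Corefile looks something like this (your cluster's version will have more plugins, and exact values may differ):

```
.:5353 {
    errors                      # log query errors
    health                      # liveness endpoint for the pod
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf  # send non-cluster names upstream
    cache 900                   # cache responses for up to 900s
}
```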
3. Are All DNS Pods Running?
CoreDNS runs as a DaemonSet (one pod per node). Sometimes pods show as "Running" but aren't actually working. We need to verify all expected pods are truly available.
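Concretely, the DaemonSet's status exposes the numbers a policy can compare. Here's a trimmed example (openshift-dns and dns-default are the real namespace and DaemonSet names in OpenShift; the counts are illustrative):

```yaml
# oc get daemonset dns-default -n openshift-dns -o yaml (status only)
status:
  desiredNumberScheduled: 6   # one pod per schedulable node
  currentNumberScheduled: 6
  numberReady: 6
  numberAvailable: 5          # a mismatch with desired means a pod isn't truly serving
```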
4. Do We Get Alerted?
If something goes wrong, we need to know about it. We'll set up alerts that page us when DNS has problems.
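As a taste of what an alerting policy deploys, a PrometheusRule might look like the sketch below. The alert name, expression, and threshold here are illustrative, not the exact rules from the repository:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dns-governance-alerts
  namespace: openshift-dns
spec:
  groups:
    - name: dns.rules
      rules:
        - alert: DNSPodsUnavailable
          # fires when any dns-default pod has been unavailable for 5 minutes
          expr: kube_daemonset_status_number_unavailable{daemonset="dns-default"} > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "One or more CoreDNS pods are unavailable"
```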
Let's Build It: Step-by-Step
Ready to see how this works? Here's how to set up DNS governance on your cluster.
Prerequisites
You'll need:
- An OpenShift cluster with RHACM installed
- Access to the oc command-line tool
- Permission to create policies on the hub cluster
Step 1: Clone the Repository
First, grab the policy templates from my repository:
git clone https://github.com/tosin2013/dns-policy-config.git
cd dns-policy-config
Step 2: Create the Policy Namespace
On your RHACM hub cluster, create a namespace to hold your DNS policies:
oc apply -f demo/namespace.yaml
This creates a namespace called dns-governance-policies.
Step 3: Bind Your ClusterSet
Next, connect your managed cluster to this namespace:
oc apply -f demo/clusterset-binding.yaml
This tells RHACM which clusters should receive these policies.
Step 4: Apply the DNS Policies
Now let's add the four DNS governance policies:
# Monitor DNS Operator health
oc apply -f policies/dns/operator-health-check.yaml
# Check Corefile configuration
oc apply -f policies/dns/corefile-integrity.yaml
# Verify all DNS pods are running
oc apply -f policies/dns/resource-exhaustion.yaml
# Set up alerting
oc apply -f policies/observability/dns-alerting-rule.yaml
Step 5: Point to Your Cluster
Edit the demo/placement.yaml file to target your managed cluster. Look for the cluster name and change it to match yours:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: dns-policy-placement
  namespace: dns-governance-policies
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchExpressions:
            - key: name
              operator: In
              values:
                - your-cluster-name # Change this!
Then apply it:
oc apply -f demo/placement.yaml
oc apply -f demo/placement-binding.yaml
Step 6: Check Compliance
Let's see if everything is working:
oc get policy -n dns-governance-policies
You should see something like:
NAME                             REMEDIATION ACTION   COMPLIANCE STATE
policy-dns-operator-health       inform               Compliant
policy-dns-corefile-integrity    inform               Compliant
policy-dns-resource-exhaustion   inform               Compliant
policy-dns-alerting-rule         enforce              Compliant
All four policies should show "Compliant"!
What Does Each Policy Actually Do?
Let me break down each policy in plain English:
Policy 1: operator-health-check
This watches the DNS Operator and makes sure it's not degraded. If the operator has problems, this policy will tell you.
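Under the hood, a check like this is typically expressed as an RHACM ConfigurationPolicy that asserts on the ClusterOperator's conditions. Here's a simplified sketch (not the exact policy from the repository):

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: dns-operator-not-degraded
spec:
  remediationAction: inform      # report violations, don't auto-fix
  severity: high
  object-templates:
    - complianceType: musthave   # the cluster must match this desired state
      objectDefinition:
        apiVersion: config.openshift.io/v1
        kind: ClusterOperator
        metadata:
          name: dns
        status:
          conditions:
            - type: Degraded
              status: "False"    # non-compliant if the operator reports Degraded=True
```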
Policy 2: corefile-integrity
This checks that your CoreDNS configuration has the essential plugins: forward, errors, health, and cache. If any are missing, you'll know.
Policy 3: resource-exhaustion
This verifies that the number of DNS pods actually running matches what should be running. Sometimes pods can be in a weird state - this catches that.
Policy 4: dns-alerting-rule
This creates Prometheus alerts that will page you when DNS has problems. This is the only policy that uses "enforce" mode because it creates new alerting rules.
Why "Inform" Instead of "Enforce"?
You might notice most policies use "inform" mode instead of "enforce." Here's why that's intentional:
- Inform means "tell me if something is wrong, but don't fix it automatically"
- Enforce means "automatically fix things"
For DNS, automatic fixes are risky. Imagine if a policy accidentally overwrote your DNS configuration - you'd have a cluster-wide outage. By using "inform" mode, we get alerted to problems but don't risk making things worse automatically.
The only exception is the alerting rule, which creates new alert definitions - that's safe to enforce.
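In a policy manifest, this choice is a one-line switch:

```yaml
spec:
  remediationAction: inform   # change to "enforce" to have RHACM apply fixes itself
```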
Wrapping Up
And that's it! You've now got DNS governance set up for your OpenShift cluster. Here's what you've accomplished:
- Created policies that monitor your DNS Operator
- Verified your CoreDNS configuration is correct
- Ensured all DNS pods are actually running
- Set up alerts for when things go wrong
DNS might seem like background infrastructure, but it deserves attention. With these policies in place, you'll know about DNS problems before they become cluster-wide outages.
Remember: the best time to set up governance was when you deployed your cluster. The second best time is now.
Quick Reference
Commands used:
git clone https://github.com/tosin2013/dns-policy-config.git
oc apply -f demo/namespace.yaml
oc apply -f demo/clusterset-binding.yaml
oc apply -f policies/dns/operator-health-check.yaml
oc apply -f policies/dns/corefile-integrity.yaml
oc apply -f policies/dns/resource-exhaustion.yaml
oc apply -f policies/observability/dns-alerting-rule.yaml
oc apply -f demo/placement.yaml
oc apply -f demo/placement-binding.yaml
What to check:
oc get policy -n dns-governance-policies
Questions? Want to learn more? Check out the full repository at github.com/tosin2013/dns-policy-config