K3s Certificate Rotation: Complete Guide to Managing and Rotating Certificates

TL;DR — All Commands You Need

# 1. Check certificate status on any node
sudo k3s certificate check

# 2. Rotate on the SERVER node
sudo systemctl stop k3s
sudo k3s certificate rotate
sudo systemctl start k3s

# 3. Update kubeconfig after server rotation
sudo cp /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# 4. IMMEDIATELY restart k3s-agent on EVERY worker node
#    (workers will disconnect after server cert rotation — this fixes it)
ssh root@<worker-ip>
sudo systemctl restart k3s-agent

# 5. Verify all nodes are back
kubectl get nodes -w

Certificate Types and Their Lifetimes

K3s manages two fundamentally different categories of certificates, and confusing them is the most common source of operational mistakes.

Client and Server Certificates — 365 days

K3s client and server certificates are valid for 365 days from their date of issuance. These are the leaf certificates used by every component in the cluster for mutual TLS: the API server, kubelet, etcd, kube-proxy, controller-manager, scheduler, and others. They are signed by the cluster's CA certificates.

The full list of rotatable services is: admin, api-server, controller-manager, scheduler, k3s-controller, k3s-server, cloud-controller, etcd, auth-proxy, kubelet, kube-proxy.

CA (Certificate Authority) Certificates — 10 years

By default, K3s generates self-signed CA certificates during startup of the first server node. These CA certificates are valid for 10 years from date of issuance, and are not automatically renewed. They live in /var/lib/rancher/k3s/server/tls and are the root of trust for the entire cluster. The authoritative copy is stored encrypted in the datastore (etcd or SQLite), with copies extracted to disk on startup.
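You can confirm the CA lifetimes directly with openssl. A minimal sketch, assuming the default data directory (adjust the path if you start K3s with a custom --data-dir; the helper name is my own):

```shell
# show_ca_dates <tls-dir>: print subject and expiry for every CA cert
# in the directory. Default K3s location: /var/lib/rancher/k3s/server/tls
show_ca_dates() {
  for crt in "$1"/*-ca.crt; do
    echo "== $(basename "$crt") =="
    openssl x509 -in "$crt" -noout -subject -enddate
  done
}

# Example (as root on a server node):
#   show_ca_dates /var/lib/rancher/k3s/server/tls
```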


Automatic Certificate Renewal — How It Works

Any certificates that are expired, or within 120 days of expiring, are automatically renewed every time K3s starts. This renewal reuses the existing keys and simply extends the lifetime of the existing certificates.

This is the key distinction: automatic renewal extends the same keys, it doesn't rotate them. If you want brand new keys generated — for example after a security incident — you need the rotate subcommand covered below.
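You can verify which of the two happened by fingerprinting a certificate's public key before and after: the fingerprint is unchanged after an automatic renewal (same key, new validity dates) but changes after a rotate. A small helper, assuming openssl is installed (the function name is my own):

```shell
# pubkey_fp <cert-file>: SHA-256 fingerprint of the cert's public key.
# Unchanged fingerprint after a restart => automatic renewal (key reused).
# Changed fingerprint after a rotate    => brand new key pair.
pubkey_fp() {
  openssl x509 -in "$1" -noout -pubkey | openssl sha256 | awk '{print $NF}'
}

# Example (as root on a server node):
#   pubkey_fp /var/lib/rancher/k3s/server/tls/client-admin.crt
```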

Version note: Prior to the May 2025 releases (v1.33.1+k3s1, v1.32.5+k3s1, v1.31.9+k3s1, v1.30.13+k3s1), alerts and rotation were triggered at 90 days instead of 120 days. If you're running an older cluster, the threshold is 90 days.

The Practical Implication

If your cluster nodes are rebooted, or K3s is restarted, at least once every few months as part of normal patching, certificates will silently renew themselves without any operator action. The design assumes you take hosts down periodically for patching and upgrades, but in practice many clusters go unpatched and unrebooted for longer than three months, so the best practice is to monitor certificate expiration as well.

A cluster that runs for an entire year without a restart will let its certificates expire.


Warning Signs — What You See Before Expiry

When a certificate is within 120 days of expiring, a Kubernetes Warning event with reason: CertificateExpirationWarning is created, related to the Node that uses the certificate.

You'll see events like this in your cluster logs:

Warning  CertificateExpirationWarning  Node certificates require attention -
restart k3s on this node to trigger automatic rotation:
kube-proxy/client-kube-proxy.crt: certificate CN=system:kube-proxy will
expire within 90 days at 2025-05-03T12:14:51Z

On newer versions these appear via:

kubectl get events -A --field-selector reason=CertificateExpirationWarning

What Happens When Certificates Actually Expire

This is where things get painful. You get something like this from systemctl status k3s:

x509: certificate has expired or is not yet valid

And any kubectl call returns:

Unable to connect to the server: x509: certificate has expired or is not yet valid

What still works: Apps that are currently running continue to run with no issues — workloads don't immediately crash. The data plane keeps operating. What breaks is the control plane: you lose kubectl access entirely, API server communication fails, and no new deployments, scaling, or config changes are possible. If nodes restart or pods crash, Kubernetes cannot reschedule them because the control plane is inaccessible.

The good news: certificates that are already expired are still automatically renewed every time K3s starts — so the recovery procedure is identical to the proactive rotation procedure. K3s is designed to self-heal on restart even in the fully expired state.


Checking Certificate Status

sudo k3s certificate check

Note: the --output table flag was added in the January 2025 releases (v1.32.0+k3s1, v1.31.5+k3s1, v1.30.9+k3s1) and will error on anything older. Check your version first:

k3s --version

# Only use --output table if you're on a January 2025+ release
sudo k3s certificate check --output table

Example output:

FILENAME                    SUBJECT                     USAGES       EXPIRES                  RESIDUAL TIME   STATUS
--------                    -------                     ------       -------                  -------------   ------
client-kube-proxy.crt       system:kube-proxy           ClientAuth   Jun 09, 2026 10:17 UTC   1 year          OK
client-kubelet.crt          system:node:k3s-server-1    ClientAuth   Jun 09, 2026 10:17 UTC   1 year          OK
serving-kubelet.crt         k3s-server-1                ServerAuth   Jun 09, 2026 10:17 UTC   1 year          OK
client-k3s-controller.crt   system:k3s-controller       ClientAuth   Jun 09, 2026 10:17 UTC   1 year          OK

Each certificate file in the FILENAME column bundles more than one certificate: the leaf (end-entity) client or server certificate, any intermediate Certificate Authority certificates, and the root Certificate Authority certificate. That's why a filename can show up more than once in the check output: once for the leaf cert expiring at 1 year, and once for the CA cert expiring at 10 years.
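To list every certificate inside one of these bundle files, the classic openssl pkcs7 round-trip works; it prints one subject/issuer pair per certificate in the file (the helper name is my own):

```shell
# list_bundle <file>: print subject/issuer for every cert in a bundle file,
# i.e. the leaf plus its CA chain in a single K3s .crt file.
list_bundle() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Example (as root on a server node):
#   list_bundle /var/lib/rancher/k3s/server/tls/client-admin.crt
```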

To inspect a specific cert directly with openssl:

sudo openssl x509 \
  -in /var/lib/rancher/k3s/server/tls/client-admin.crt \
  -noout -dates

To inspect the certificate currently embedded in your kubeconfig:

sudo cat /etc/rancher/k3s/k3s.yaml \
  | grep 'client-certificate-data' \
  | awk '{print $2}' \
  | base64 -d \
  | openssl x509 -noout -dates

This is useful to confirm your local kubectl context is using up-to-date credentials, separate from what the node itself reports.
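If you prefer a single number over raw dates, a small helper can compute the days remaining for any cert file (assumes GNU date for the -d parsing; the function name is my own):

```shell
# days_left <cert-file>: days until the certificate expires
# (negative once it has passed). Requires GNU date.
days_left() {
  local end
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Example (as root on a server node):
#   days_left /var/lib/rancher/k3s/server/tls/client-admin.crt
```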


Manual Rotation — Step by Step

On the Server Node

# Step 1: Stop K3s
sudo systemctl stop k3s

# Step 2: Rotate certificates (generates new certs AND new keys)
sudo k3s certificate rotate

# Step 3: Start K3s
sudo systemctl start k3s

After K3s starts back up, the new certificate is written to /etc/rancher/k3s/k3s.yaml. This file is your kubeconfig and now contains the updated client certificate, client key, and certificate authority data:

cd /etc/rancher/k3s
ls
# k3s.yaml

cat k3s.yaml
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <base64-encoded-new-CA>
    server: https://127.0.0.1:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
users:
- name: default
  user:
    client-certificate-data: <base64-encoded-new-cert>
    client-key-data: <base64-encoded-new-key>
# Step 4: Copy the updated kubeconfig
sudo cp /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# If managing the cluster from a remote machine, copy it there too
scp root@<server-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/config
# Fix the server address if it points to 127.0.0.1
sed -i 's/127.0.0.1/<your-server-ip>/g' ~/.kube/config

# Step 5: Verify server is healthy
sudo k3s certificate check
kubectl get nodes

Any CI/CD pipelines, Helm deployments, or developer workstations that had a copy of the old kubeconfig will also need this updated file — the old client certificate is invalid the moment rotation runs.

Rotating Specific Services Only

You can limit rotation to specific components rather than rotating everything at once:

sudo k3s certificate rotate --service api-server,kubelet
sudo k3s certificate rotate --service etcd
sudo k3s certificate rotate --service admin,controller-manager,scheduler

This is useful when only certain certs are approaching expiry, or when you want to minimize the blast radius in a sensitive environment. The stop/start of K3s is still required around this command.
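To keep the stop/rotate/start sequence together, the three commands can live in a tiny wrapper. A sketch only; rotate_services is my own name, not a k3s subcommand:

```shell
# rotate_services <svc[,svc...]>: stop K3s, rotate only the named
# services, then start K3s again. Run on the server node.
rotate_services() {
  sudo systemctl stop k3s
  sudo k3s certificate rotate --service "$1"
  sudo systemctl start k3s
}

# Example:
#   rotate_services api-server,kubelet
```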


⚠️ Warning: Worker Nodes Will Disconnect After Rotation

This is something the official documentation doesn't warn you about clearly enough, and it can cause a cascading failure that looks far worse than a simple cert issue.

What Happens

When you rotate certificates on the server node, the API server starts presenting new certificates, but the k3s-agent on each worker still holds the old credentials and trust data. Until the agent is restarted and picks up the new chain, the kubelet on that worker cannot talk to the API server:

Kubelet stopped posting node status.

The workers show as NotReady even though the machines themselves are perfectly healthy.

The Cascade With Local Storage

If you're using local-path storage (the K3s default), this gets significantly worse:

  1. You rotate certs on the server 🔄
  2. Kubelet on workers stops trusting the API server ❌
  3. Workers become NotReady
  4. PVCs using local-path are physically locked to those dead nodes ❌
  5. Pods tied to that storage cannot reschedule ❌
  6. Scheduler starts throwing volume node affinity conflict
  7. Any stateful workload (databases, etc.) is now stuck in Pending or Initializing forever ❌

The fix is straightforward — but you have to do it immediately after rotating the server, before anything starts crashing and trying to reschedule.

Fix — Restart the Agent on Every Worker Node

SSH into each worker and restart the agent:

# Step 1 — SSH into the worker
ssh root@<worker-ip>

# Step 2 — Restart k3s-agent
sudo systemctl restart k3s-agent

# Step 3 — Check status
sudo systemctl status k3s-agent

If the agent still won't come back after a restart:

# Full reboot as last resort
sudo reboot

Then verify from the control plane that all nodes are back:

kubectl get nodes -w

Wait for all workers to flip back to Ready. Once they do, any stuck pods will reschedule automatically — no manual intervention on the workloads themselves is needed.
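With more than a couple of workers, a loop over an inventory saves time. A sketch assuming passwordless SSH as root; the helper name and IPs are placeholders:

```shell
# restart_agents "<ip> <ip> ...": restart k3s-agent on each worker
# over SSH and report whether it came back up. Assumes passwordless
# SSH as root to every worker.
restart_agents() {
  for w in $1; do
    echo "restarting k3s-agent on $w"
    ssh "root@$w" 'systemctl restart k3s-agent && systemctl is-active k3s-agent'
  done
}

# Example:
#   restart_agents "10.0.0.11 10.0.0.12 10.0.0.13"
```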

The Correct Order of Operations for a Full Cluster Rotation

To avoid the disconnection cascade entirely, always treat cert rotation as a coordinated cluster-wide operation:

1. Stop k3s on server           →  sudo systemctl stop k3s
2. Rotate certs on server       →  sudo k3s certificate rotate
3. Start k3s on server          →  sudo systemctl start k3s
4. Update kubeconfig            →  cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
5. Restart agent on each worker →  sudo systemctl restart k3s-agent
6. Verify all nodes Ready       →  kubectl get nodes

Don't wait between steps 4 and 5. The longer workers run without restarting their agent, the higher the chance that a pod crash or node pressure event triggers a rescheduling attempt that hits the volume node affinity conflict wall.
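The whole sequence can be scripted so there is no gap between steps 4 and 5. A sketch under the same assumptions (run on the server node, passwordless root SSH to the workers, placeholder IPs and helper name):

```shell
# full_rotation "<worker-ip> ...": the complete server-plus-workers
# rotation sequence. Run on the server node; adapt before real use.
full_rotation() {
  # Steps 1-3: rotate on the server
  sudo systemctl stop k3s
  sudo k3s certificate rotate
  sudo systemctl start k3s

  # Step 4: refresh the local kubeconfig immediately
  sudo cp /etc/rancher/k3s/k3s.yaml "$HOME/.kube/config"
  sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"

  # Step 5: restart every agent right away, in parallel
  for w in $1; do
    ssh "root@$w" 'systemctl restart k3s-agent' &
  done
  wait

  # Step 6: confirm all nodes return to Ready
  kubectl get nodes
}

# Example:
#   full_rotation "10.0.0.11 10.0.0.12"
```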


CA Certificate Rotation

CA rotation is a completely different operation from leaf cert rotation. To rotate CA certificates and keys, use the k3s certificate rotate-ca command. The command performs integrity checks to confirm that the updated certificates and keys are usable. If the updated data is acceptable, the datastore's encrypted bootstrap key is updated, and the new certificates and keys will be used the next time K3s starts. If problems are encountered while validating the certificates and keys, an error is reported to the system log and the operation is cancelled without changes.

Non-disruptive vs. Disruptive Rotation

A cluster that has been started with custom CA certificates can renew or rotate the CA certificates and keys non-disruptively, as long as the same root CA is used. If a new root CA is required, the rotation will be disruptive. The k3s certificate rotate-ca --force option must be used, all nodes that were joined with a secure token (including servers) will need to be reconfigured to use the new token value, and pods will need to be restarted to trust the new root CA.

The safe path is to generate new intermediate CAs signed by the existing root CA — existing trust chains remain valid, nodes can rejoin without token changes, and pods don't need restarts.

CA Rotation Procedure

# Stage new CA certificates into a TEMP directory
# Do NOT place them directly into the live /server/tls directory
mkdir -p /tmp/k3s-ca-rotate

# Use the helper script from the K3s repo to generate new CAs
curl -o /tmp/generate-custom-ca-certs.sh \
  https://raw.githubusercontent.com/k3s-io/k3s/master/contrib/util/generate-custom-ca-certs.sh
chmod +x /tmp/generate-custom-ca-certs.sh
DATA_DIR=/tmp/k3s-ca-rotate /tmp/generate-custom-ca-certs.sh

# Load the new CAs into the datastore
sudo k3s certificate rotate-ca --path /tmp/k3s-ca-rotate/server/tls

# Restart K3s on all nodes — servers first, then agents
sudo systemctl restart k3s          # on each server
# ...then, on each worker:
# sudo systemctl restart k3s-agent

Back up the generated root and intermediate CA files somewhere safe after this; you will need them for any future CA rotation.
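A minimal sketch of that backup step (the helper name and destination path are my own choices; move the archive off the node afterwards):

```shell
# backup_tls <tls-dir> <dest-tarball>: archive the CA material so it
# can be restored or reused for a later rotation.
backup_tls() {
  tar czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Example (as root on the server node):
#   backup_tls /var/lib/rancher/k3s/server/tls \
#              /root/k3s-ca-backup-$(date +%Y%m%d).tar.gz
```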


Monitoring — Preventing Surprise Expiry

Watch for Kubernetes warning events:

kubectl get events -A \
  --field-selector reason=CertificateExpirationWarning \
  --watch

Prometheus alert rule (works with kube-prometheus-stack):

- alert: K3sCertificateExpiringSoon
  expr: |
    (k3s_certificate_expiry_seconds - time()) < 30 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "K3s certificate expiring in less than 30 days on {{ $labels.node }}"

Simple weekly cron check on the server node:

#!/bin/bash
# /etc/cron.weekly/k3s-cert-check
# Use the binary's full path: cron runs with a minimal PATH.
/usr/local/bin/k3s certificate check 2>&1 | mail -s "K3s cert check on $(hostname)" ops@yourcompany.com

Summary Table

| Certificate type | Validity | Auto-renewal | Rotation command |
| --- | --- | --- | --- |
| Client/Server leaf certs | 365 days | Yes, on restart (if < 120 days remaining) | k3s certificate rotate |
| CA certificates | 10 years | No | k3s certificate rotate-ca |
| kubeconfig embedded cert | Same as leaf | Follows leaf rotation | Copy updated k3s.yaml |

Key Takeaways

  1. Restart K3s regularly. Even just for patching every few months — this is what triggers automatic certificate renewal. A cluster that never restarts will expire its certs at the 1-year mark.
  2. rotate ≠ restart. A plain restart extends existing keys. k3s certificate rotate generates entirely new keys. Use rotate when you need cryptographic freshness, not just extended validity.
  3. Always restart worker agents immediately after rotating the server. Workers disconnect the moment the server starts presenting new certs. Don't wait — do it before anything tries to reschedule.
  4. local-path storage amplifies worker disconnection. PVCs are locked to specific nodes. If a worker goes NotReady while holding a PVC, any pod using that storage is stuck until the worker comes back.
  5. Update your kubeconfig after server node rotation — the embedded client certificate changes, and any CI/CD pipelines using the old kubeconfig will break immediately.
  6. CA certs are your 10-year bomb. Mark a calendar reminder. They won't warn you and won't auto-renew.
  7. Backups are automatic. Old certs are saved to /var/lib/rancher/k3s/server/tls-<timestamp> before rotation, giving you a rollback path.
