Mohamed Radwan for AWS Community Builders

Fix Cert-Manager Conflict with EKS

I ran into an issue affecting multiple managed worker nodes on EKS clusters.

The issue appeared at random on different nodes: I could not access the pods or fetch their logs with kubectl, which failed with this error:

x509: cannot validate certificate for 10.0.83.153 because it doesn't contain any IP SANs

The kube-apiserver logs in CloudWatch showed the following error:

E0327 08:54:17.406029 11 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"error dialing backend: x509: cannot validate certificate for 10.0.83.153 because it doesn't contain any IP SANs"}: error dialing backend: x509: cannot validate certificate for 10.0.83.153 because it doesn't contain any IP SANs 

After investigating with the AWS EKS support team, we found that cert-manager-webhook was causing the issue: on the affected nodes, the certificate chain presented on the kubelet port (10250) came from cert-manager-webhook-ca.

Run the following command on an affected node:

openssl s_client -connect localhost:10250 
CONNECTED(00000003)
---
Certificate chain
 0 s:
   i:/CN=cert-manager-webhook-ca
---
Server certificate
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
subject=
issuer=/CN=cert-manager-webhook-ca
---

Run the same command on a healthy node:

openssl s_client -connect localhost:10250 
CONNECTED(00000003)
---
Certificate chain
 0 s:/O=system:nodes/CN=system:node:ip-10-0-31-151.eu-west-1.compute.internal
   i:/CN=kubernetes
---
Server certificate
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
subject=/O=system:nodes/CN=system:node:ip-10-0-31-151.eu-west-1.compute.internal
issuer=/CN=kubernetes
---
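
To compare nodes quickly, the two outputs above can be reduced to their issuer line. What follows is a minimal sketch, not part of any tool: kubelet_issuer and classify_issuer are hypothetical helper names, and the classification assumes a healthy node shows an issuer of CN=kubernetes, as in the output above.

```shell
# Print the issuer line of whatever certificate is served on
# localhost:10250. Meant to be run on the node itself.
kubelet_issuer() {
  openssl s_client -connect localhost:10250 </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer 2>/dev/null
}

# Classify an issuer line (hypothetical helper).
# Matching on a substring avoids depending on how a given openssl
# version formats the issuer (for example "/CN=x" versus "CN = x").
classify_issuer() {
  case "$1" in
    *cert-manager-webhook-ca*) echo "conflict" ;;
    *kubernetes*)              echo "ok" ;;
    *)                         echo "unknown" ;;
  esac
}
```

On a node, classify_issuer "$(kubelet_issuer)" would print conflict when the webhook's certificate is the one being served on port 10250.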

By default, the cert-manager-webhook deployment listens on port 10250, which is also the port the kubelet uses.
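
One way to confirm which port the webhook is configured with is to read the arguments of its container. This is a sketch under assumptions: the deployment name cert-manager-webhook and the namespace cert-manager correspond to a default Helm install, the port is assumed to be passed as a --secure-port argument, and webhook_args and secure_port are hypothetical helper names.

```shell
# Read the arguments of the webhook container. Deployment and namespace
# names are assumptions based on a default Helm install.
webhook_args() {
  kubectl -n cert-manager get deployment cert-manager-webhook \
    -o jsonpath='{.spec.template.spec.containers[0].args}'
}

# Extract the value of --secure-port=N from an argument string
# (hypothetical helper). Prints nothing if the flag is absent.
secure_port() {
  printf '%s\n' "$1" | sed -n 's/.*--secure-port=\([0-9][0-9]*\).*/\1/p'
}
```

Running secure_port "$(webhook_args)" against the cluster prints the configured port; a value of 10250 means the webhook is set to the same port as the kubelet.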

The solution is to move cert-manager-webhook to a different port, here 10260, by setting webhook.securePort when installing the Helm chart:

helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.10.0 \
  --set webhook.securePort=10260
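
The install command above covers a fresh installation. If cert-manager is already running, the same value can be applied with helm upgrade. The sketch below assumes the release name, repo alias, and namespace from the install command; set_webhook_port is a hypothetical helper, and DRY_RUN is a variable introduced here only to print the command instead of running it.

```shell
# Move the webhook of an existing release to a new port.
# Release name, repo alias and namespace are taken from the install
# command above; the function itself is a hypothetical helper.
# Set DRY_RUN=1 to print the helm command instead of running it.
set_webhook_port() {
  version="$1"   # chart version to stay on, e.g. v1.10.0
  port="$2"      # new webhook port, e.g. 10260
  set -- helm upgrade cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --version "$version" \
    --reuse-values \
    --set "webhook.securePort=$port"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 set_webhook_port v1.10.0 10260 prints the command for review. Pinning --version keeps the upgrade from also moving to a newer chart. Once the webhook pods have restarted, the openssl check on port 10250 should again show the kubelet's own certificate.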

Sources:

https://cert-manager.io/docs/concepts/webhook/
https://cert-manager.io/docs/installation/compatibility/#aws-eks
