Mohamed Radwan for AWS Community Builders



Fix Cert-Manager Conflict with EKS

I ran into an issue on EKS clusters running multiple managed worker nodes.

The issue appeared at random on different nodes: I could not access the pods or get their logs with kubectl. The error was:

x509: cannot validate certificate for 10.0.83.153 because it doesn’t contain any IP SANs 

The Kubernetes API server logs in CloudWatch showed the following error:

E0327 08:54:17.406029 11 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"error dialing backend: x509: cannot validate certificate for 10.0.83.153 because it doesn't contain any IP SANs"}: error dialing backend: x509: cannot validate certificate for 10.0.83.153 because it doesn't contain any IP SANs 

After investigating the issue with the AWS EKS support team, we found that cert-manager-webhook was causing it.
On the affected nodes, the certificate chain served on the kubelet port was issued by cert-manager-webhook-ca instead of the cluster CA.

Run the following command on an affected node:

openssl s_client -connect localhost:10250 
CONNECTED(00000003)
---
Certificate chain
 0 s:
   i:/CN=cert-manager-webhook-ca
---
Server certificate
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
subject=
issuer=/CN=cert-manager-webhook-ca
---

Run the same command on a healthy node:

openssl s_client -connect localhost:10250 
CONNECTED(00000003)
---
Certificate chain
 0 s:/O=system:nodes/CN=system:node:ip-10-0-31-151.eu-west-1.compute.internal
   i:/CN=kubernetes
---
Server certificate
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
subject=/O=system:nodes/CN=system:node:ip-10-0-31-151.eu-west-1.compute.internal
issuer=/CN=kubernetes
---
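The comparison between the two nodes comes down to the issuer line, so it can be reduced to a quick check. The following is a minimal sketch, assuming openssl is available on the node; the function names kubelet_cert_issuer and classify_issuer are hypothetical helpers made up for this example, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch only: helper names below are hypothetical, not part of any CLI.

# Print the issuer line of the certificate served on localhost:10250.
# </dev/null closes stdin so s_client exits after the handshake.
kubelet_cert_issuer() {
  openssl s_client -connect localhost:10250 </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Classify an issuer line:
#   conflict - the certificate was issued by cert-manager-webhook-ca
#   unknown  - no issuer could be read (for example, connection failed)
#   ok       - any other issuer (for example, CN=kubernetes)
classify_issuer() {
  case "$1" in
    "") echo "unknown" ;;
    *cert-manager-webhook-ca*) echo "conflict" ;;
    *) echo "ok" ;;
  esac
}

# Usage on a node:
#   classify_issuer "$(kubelet_cert_issuer)"
```

On a node showing the output from the affected example above, this should print conflict; on the healthy one it should print ok.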

The cert-manager-webhook deployment listens on port 10250, which is also the port the kubelet uses.

The solution is to move cert-manager-webhook to a different port by setting webhook.securePort to 10260:

helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.10.0 \
  --set webhook.securePort=10260
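If cert-manager is already installed, the same value can be applied with an upgrade rather than a fresh install. The sketch below assumes the release is named cert-manager in the cert-manager namespace and that the jetstack repository is already configured, as in the install command above. It also assumes the chart passes the port to the webhook container as a --secure-port argument, which is worth confirming against your chart version. The function names are hypothetical helpers for this example.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; release and namespace
# names are taken from the install command above.

# Apply a new webhook port to an existing release, keeping other values.
# The chart version is pinned so the upgrade does not also bump cert-manager.
upgrade_webhook_port() {
  helm upgrade cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --version v1.10.0 \
    --reuse-values \
    --set webhook.securePort="$1"
}

# Print the webhook container's arguments as rendered in the Deployment.
webhook_args() {
  kubectl --namespace cert-manager get deployment cert-manager-webhook \
    -o jsonpath='{.spec.template.spec.containers[0].args}'
}

# Extract the value of --secure-port from an argument string
# (assumption: the chart renders the port as --secure-port=<number>).
secure_port_from_args() {
  printf '%s\n' "$1" | grep -o -- '--secure-port=[0-9]*' | cut -d= -f2
}

# Usage:
#   upgrade_webhook_port 10260
#   secure_port_from_args "$(webhook_args)"
```

After the upgrade, repeating the openssl check on a previously affected node is the more direct confirmation that port 10250 is serving the kubelet certificate again.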

Sources:

https://cert-manager.io/docs/concepts/webhook/
https://cert-manager.io/docs/installation/compatibility/#aws-eks
