Ian Packard for Octasoft Ltd

Posted on • Originally published at wsl-ui.octasoft.co.uk

Automated TLS Certificates with Let's Encrypt and DNS-01 Challenges

HTTPS everywhere isn't optional anymore. Browsers flag HTTP as insecure, service meshes expect mTLS, and APIs should be encrypted in transit. But managing TLS certificates manually is tedious - renewals every 90 days, CSR generation, key management, distribution to load balancers.

Let's Encrypt solved the cost problem (free certificates) and the validation problem (automated domain verification). cert-manager brings this to Kubernetes with native integration. This post covers how to set up fully automated TLS certificate management using DNS-01 challenges with AWS Route53.

Series context: This is Part 6 of the Homelab Kubernetes Series. We've covered bootstrapping, GitOps with ArgoCD, service mesh setup, and secrets management. Now we're adding automated TLS certificate management to complete the security layer.

Why DNS-01 Over HTTP-01

Let's Encrypt supports two main challenge types for proving domain ownership:

HTTP-01: Let's Encrypt requests a file at http://yourdomain.com/.well-known/acme-challenge/<token>. Your server must respond with the matching key authorization (the token combined with a fingerprint of your ACME account key).

DNS-01: Let's Encrypt queries a TXT record at _acme-challenge.yourdomain.com. You create this record with a value derived from the token and your account key.
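
Strictly speaking, the TXT value is not the raw token. RFC 8555 defines it as the unpadded base64url encoding of the SHA-256 digest of the key authorization (the token joined to the account key thumbprint with a dot). cert-manager does all of this for you; the sketch here only illustrates the derivation. The function names and sample inputs are invented for the example.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as ACME (RFC 8555) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_key_thumbprint: str) -> str:
    """Return the TXT record value for a DNS-01 challenge.

    key authorization = token + "." + thumbprint of the ACME account key
    TXT value         = base64url(SHA-256(key authorization))
    """
    key_authorization = f"{token}.{account_key_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


def challenge_record_name(domain: str) -> str:
    """Wildcard names are validated at their base name."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


if __name__ == "__main__":
    # Placeholder inputs; real values come from the ACME server and your account key.
    print(challenge_record_name("*.app.example.com"))
    print(dns01_txt_value("example-token", "example-thumbprint"))
```

Because a wildcard is validated at its base name, app.example.com and *.app.example.com both use the same _acme-challenge.app.example.com record, each with its own value.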

[Diagram: DNS-01 challenge flow]

For many Kubernetes deployments, DNS-01 is the better choice:

| Scenario | HTTP-01 | DNS-01 |
|---|---|---|
| Public-facing services | Works | Works |
| Private/internal services | Fails | Works |
| Wildcard certificates | Not supported | Supported |
| Firewall restrictions | Needs port 80 open | No inbound ports |
| Load balancer complexity | Needs routing | DNS only |

If your services aren't publicly accessible - homelab, private cloud, internal APIs - HTTP-01 won't work. DNS-01 proves you control the domain regardless of where your services run.

The Components

The setup involves four pieces:

  1. cert-manager - Kubernetes controller that manages certificate lifecycle
  2. Let's Encrypt - Free certificate authority with ACME protocol support
  3. AWS Route53 - DNS provider (or your DNS provider of choice)
  4. External Secrets - Secure credential management for AWS access

cert-manager watches for Certificate resources, contacts Let's Encrypt, solves the DNS challenge by creating Route53 records, and stores the issued certificate as a Kubernetes Secret.

Installing cert-manager

cert-manager installs via Helm. The key configuration enables Gateway API support and Prometheus metrics:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
spec:
  # Argo CD requires a project; "default" is assumed here
  project: default
  source:
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: v1.18.2

    helm:
      values: |
        # Install CRDs with the chart (crds.enabled replaces the deprecated installCRDs)
        crds:
          enabled: true

        # Resource limits for homelab/small clusters
        resources:
          requests:
            cpu: 20m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 256Mi

        webhook:
          resources:
            requests:
              cpu: 5m
              memory: 16Mi
            limits:
              cpu: 50m
              memory: 64Mi

        cainjector:
          resources:
            requests:
              cpu: 5m
              memory: 16Mi
            limits:
              cpu: 100m
              memory: 128Mi

        # Prometheus metrics
        prometheus:
          enabled: true
          servicemonitor:
            enabled: true

        # Enable Gateway API integration
        extraArgs:
          - --enable-gateway-api

        # Security
        securityContext:
          runAsNonRoot: true

  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager

The --enable-gateway-api flag is important if you're using Gateway API instead of Ingress. As we covered in Part 4 (Service Mesh), cert-manager can automatically provision certificates for Gateway listeners.

AWS IAM Setup

cert-manager needs permission to create DNS records in Route53. The principle of least privilege means giving it exactly what it needs - and for ACME DNS-01 challenges, that's surprisingly narrow.

IAM architecture options:

  1. IAM User with direct permissions - Simple, but the user has standing Route53 access
  2. IAM Role with IRSA (EKS) - Pod assumes role via service account, no static credentials
  3. IAM User + Role assumption - User can only assume a role, role has the actual permissions

I use option 3. The IAM user has no Route53 permissions at all - its only ability is to assume a specific role. This separates identity (the user) from capability (the role), so leaked user credentials are only useful for assuming that one narrowly scoped role.

The IAM User (Identity Only)

The user exists solely to provide credentials that can assume the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ],
      "Resource": "arn:aws:iam::ACCOUNT_ID:role/CertManagerDNSRole"
    }
  ]
}

That's it. The user can assume exactly one role and nothing else.

The IAM Role (Capability)

The role has the actual Route53 permissions, tightly scoped for ACME challenges:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "route53:GetChange",
      "Resource": "arn:aws:route53:::change/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": "arn:aws:route53:::hostedzone/ZONE_ID_HERE",
      "Condition": {
        "ForAllValues:StringEquals": {
          "route53:ChangeResourceRecordSetsRecordTypes": ["TXT"]
        },
        "ForAllValues:StringLike": {
          "route53:ChangeResourceRecordSetsNormalizedRecordNames": ["_acme-challenge.*"]
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ListHostedZonesByName",
        "route53:ListHostedZones",
        "route53:GetHostedZone"
      ],
      "Resource": "*"
    }
  ]
}

The conditions are key:

  • ChangeResourceRecordSetsRecordTypes: ["TXT"] - Can only modify TXT records
  • ChangeResourceRecordSetsNormalizedRecordNames: ["_acme-challenge.*"] - Can only modify records whose name starts with _acme-challenge.

Even if the role is compromised, it can't touch your A records, MX records, or anything else. It can only create and delete the specific TXT records needed for ACME validation.
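
To make the effect of the two conditions concrete, here is a deliberately simplified model of how a batch of record changes is judged. It is not AWS's policy evaluator: the normalization only lowercases and strips the trailing dot, and the function names are invented for illustration.

```python
import re

ALLOWED_TYPES = {"TXT"}
ALLOWED_NAME_PATTERNS = ["_acme-challenge.*"]


def string_like(value: str, pattern: str) -> bool:
    """IAM-style wildcard match: '*' is any run of characters, '?' is exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def normalize(name: str) -> str:
    """Simplified normalization (assumption): lowercase, no trailing dot."""
    return name.lower().rstrip(".")


def batch_allowed(changes) -> bool:
    """changes: iterable of (record_name, record_type) pairs in one request.

    ForAllValues semantics: every value in the request must satisfy the
    condition, so a single disallowed record rejects the whole batch.
    """
    for name, record_type in changes:
        if record_type not in ALLOWED_TYPES:
            return False
        if not any(string_like(normalize(name), p) for p in ALLOWED_NAME_PATTERNS):
            return False
    return True


if __name__ == "__main__":
    print(batch_allowed([("_acme-challenge.app.example.com.", "TXT")]))  # True
    print(batch_allowed([("www.example.com.", "A")]))                    # False
    print(batch_allowed([("_acme-challenge.app.example.com.", "TXT"),
                         ("mail.example.com.", "MX")]))                  # False
```

Note that an empty batch passes trivially. The same ForAllValues behaviour is why read-only calls such as ListResourceRecordSets, which carry no record names or types, are not blocked by these conditions.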

Trust Policy (Who Can Assume)

The role's trust policy specifies which principals can assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_ID:user/service-accounts/certmanager-dns-challenge"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Only the specific cert-manager IAM user can assume this role. Combined with the scoped permissions, this creates defence in depth: compromise the user credentials, and you can only assume this one role; compromise the role, and you can only modify ACME challenge records.

[Diagram: IAM role assumption]

Credential Management

The IAM access key and secret need to reach cert-manager. Hardcoding them in manifests is a security incident waiting to happen. As covered in Part 5 (Secrets Management), I use External Secrets Operator to pull credentials from Infisical:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: route53-credentials-external
  namespace: cert-manager
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: infisical-cluster-secretstore
    kind: ClusterSecretStore
  target:
    name: route53-credentials
    creationPolicy: Owner
  data:
    - secretKey: access-key-id
      remoteRef:
        key: /cert-manager/AWS_ACCESS_KEY_ID
    - secretKey: secret-access-key
      remoteRef:
        key: /cert-manager/AWS_SECRET_ACCESS_KEY

This creates a Kubernetes Secret named route53-credentials in the cert-manager namespace, refreshed every 15 minutes. If you rotate the AWS credentials in your secrets manager, they propagate automatically.

For simpler setups without a secrets manager, create the secret directly (but don't commit it to Git):

kubectl create secret generic route53-credentials \
  --namespace cert-manager \
  --from-literal=access-key-id=AKIAIOSFODNN7EXAMPLE \
  --from-literal=secret-access-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

The ClusterIssuer

A ClusterIssuer defines how cert-manager obtains certificates. For Let's Encrypt with DNS-01:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production server
    server: https://acme-v02.api.letsencrypt.org/directory

    # Contact email for the ACME account (Let's Encrypt no longer sends expiry emails)
    email: your-email@example.com

    # Secret to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod

    # DNS-01 solver configuration
    solvers:
    - dns01:
        route53:
          region: us-east-1
          hostedZoneID: "Z0123456789ABCDEFGHIJ"

          # Reference to the credentials secret
          accessKeyIDSecretRef:
            name: route53-credentials
            key: access-key-id
          secretAccessKeySecretRef:
            name: route53-credentials
            key: secret-access-key

          # IAM role to assume (optional but recommended)
          role: "arn:aws:iam::ACCOUNT_ID:role/CertManagerDNSRole"

      # Only use this solver for specific domains
      selector:
        dnsZones:
        - example.com

Key fields explained:

  • server: Use staging (acme-staging-v02.api.letsencrypt.org) for testing to avoid rate limits
  • email: Contact address for your ACME account. Let's Encrypt stopped sending expiry warning emails in June 2025, so rely on cert-manager's automatic renewal and your own monitoring
  • privateKeySecretRef: cert-manager creates this secret to store your ACME account key
  • hostedZoneID: Your Route53 zone ID (find it in the AWS console)
  • role: Optional IAM role to assume - if specified, cert-manager uses STS AssumeRole
  • selector.dnsZones: Restricts this solver to specific domains (useful with multiple issuers)
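
When an issuer has several solvers, the dnsZones selector decides which one handles a given name. As a rough model of the documented behaviour (not cert-manager's actual code), a name matches a zone if it equals the zone or is a subdomain of it, and the longest matching zone wins. The helper names are invented for illustration, and the dnsNames and matchLabels selectors are ignored.

```python
def zone_matches(dns_name: str, zone: str) -> bool:
    """True if dns_name equals the zone or sits underneath it."""
    name = dns_name.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # a wildcard is matched by its base name
    zone = zone.lower().rstrip(".")
    return name == zone or name.endswith("." + zone)


def pick_zone(dns_name: str, zones):
    """Return the most specific (longest) matching zone, or None."""
    matching = [z for z in zones if zone_matches(dns_name, z)]
    return max(matching, key=len) if matching else None


if __name__ == "__main__":
    zones = ["example.com", "app.example.com"]  # hypothetical solver zones
    print(pick_zone("api.app.example.com", zones))  # app.example.com
    print(pick_zone("*.example.com", zones))        # example.com
    print(pick_zone("notexample.com", zones))       # None
```

The suffix check includes the leading dot on purpose, so notexample.com is not mistaken for a subdomain of example.com.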

Always test with staging first. Let's Encrypt's production rate limits are strict (50 certificates per registered domain per week). Staging certificates aren't trusted by browsers, but they confirm that your configuration works.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # ... rest identical to production, except privateKeySecretRef.name
    # (e.g. letsencrypt-staging), since staging uses a separate ACME account

Requesting Certificates

With the ClusterIssuer configured, request a certificate:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-app-tls
  namespace: my-app
spec:
  # Where to store the certificate
  secretName: my-app-tls-cert

  # Which issuer to use
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

  # Domains to include
  dnsNames:
  - app.example.com
  - "*.app.example.com"  # Wildcard - requires DNS-01

cert-manager:

  1. Contacts Let's Encrypt to start the ACME flow
  2. Creates a TXT record in Route53: _acme-challenge.app.example.com
  3. Tells Let's Encrypt to verify the record
  4. Receives the signed certificate
  5. Stores it in the my-app-tls-cert Secret
  6. Cleans up the DNS record
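  7. Renews automatically before expiry (by default at two-thirds of the certificate's lifetime, which is 30 days before expiry for a 90-day Let's Encrypt certificate)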
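
The renewal timing in the last step is simple arithmetic. cert-manager's documented default, when renewBefore is not set, is to renew once two-thirds of the certificate's lifetime has elapsed. A minimal sketch of that calculation, with a hypothetical issue date:

```python
from datetime import datetime, timedelta, timezone


def default_renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Renewal is attempted once two-thirds of the lifetime has elapsed."""
    lifetime = not_after - not_before
    return not_before + (lifetime * 2) / 3


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, tzinfo=timezone.utc)  # hypothetical issue date
    expires = issued + timedelta(days=90)               # Let's Encrypt lifetime
    renew_at = default_renewal_time(issued, expires)
    print(renew_at.isoformat())       # 2025-03-02T00:00:00+00:00
    print((expires - renew_at).days)  # 30
```

Setting spec.renewBefore on the Certificate overrides this default if you want a wider safety margin.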

The resulting secret contains:

  • tls.crt - The certificate chain (your cert + Let's Encrypt intermediate)
  • tls.key - The private key
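
A quick way to confirm the chain is complete is to count the PEM blocks in tls.crt. This sketch assumes the Secret is available as JSON (for example from kubectl get secret my-app-tls-cert -n my-app -o json), where data values are base64-encoded. The function name and the placeholder PEM contents are made up for the example.

```python
import base64
import json
import re

PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def certs_in_secret(secret_json: str) -> list:
    """Return the PEM certificates found in the Secret's tls.crt entry."""
    secret = json.loads(secret_json)
    pem_bundle = base64.b64decode(secret["data"]["tls.crt"]).decode("ascii")
    return PEM_CERT.findall(pem_bundle)


if __name__ == "__main__":
    # Fake Secret with two placeholder PEM blocks standing in for real output.
    fake_pem = (
        "-----BEGIN CERTIFICATE-----\nLEAF\n-----END CERTIFICATE-----\n"
        "-----BEGIN CERTIFICATE-----\nINTERMEDIATE\n-----END CERTIFICATE-----\n"
    )
    fake_secret = json.dumps(
        {"data": {"tls.crt": base64.b64encode(fake_pem.encode()).decode()}}
    )
    print(len(certs_in_secret(fake_secret)))  # 2
```

If only one block comes back for a Let's Encrypt certificate, the intermediate is missing and some clients will fail to build the chain.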

Using Certificates

Kubernetes has two approaches for routing external traffic: the original Ingress API and the newer Gateway API. Gateway API is the successor - more expressive, role-oriented, and now the recommended approach for new deployments.

Gateway API (Recommended)

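Gateway API separates concerns: infrastructure teams manage Gateways (the load balancer configuration), application teams manage HTTPRoutes (where traffic goes). As we set up in Part 4 (Service Mesh), the Gateway references the TLS certificate Secret. This Gateway expects the Secret in its own namespace (istio-ingress), so the Certificate must be created there rather than in my-app; referencing a Secret in another namespace requires a ReferenceGrant: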

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
  namespace: istio-ingress
spec:
  gatewayClassName: istio
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: my-app-tls-cert
        namespace: istio-ingress
    allowedRoutes:
      namespaces:
        from: All

Services then attach to the Gateway via HTTPRoute resources. The HTTPRoute doesn't need to know about TLS - the Gateway handles termination:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app
  namespace: my-app
spec:
  parentRefs:
  - name: main-gateway
    namespace: istio-ingress
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: my-app
      port: 80

This separation means one wildcard certificate on the Gateway serves all HTTPRoutes - you don't need per-service certificates.

[Diagram: Gateway certificate flow]

Ingress (Legacy Approach)

Ingress combines routing and TLS in a single resource. It still works and is widely supported, but lacks the flexibility of Gateway API:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: my-app-tls-cert
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80

With the annotation, cert-manager automatically creates a Certificate resource when you create the Ingress. This is convenient but couples certificate management to the Ingress resource.

Which to Choose

| Consideration | Gateway API | Ingress |
|---|---|---|
| New projects | Recommended | Works, but why? |
| Existing Ingress setup | Migrate when ready | Continue using |
| Multi-team environments | Better separation of concerns | Everyone edits same resources |
| Advanced routing | Traffic splitting, header matching | Basic path/host routing |
| Controller support | Istio, Envoy, Nginx, Traefik | Nearly universal |

Troubleshooting

Check certificate status:

kubectl get certificates -A
kubectl describe certificate my-app-tls -n my-app

Check certificate request:

kubectl get certificaterequests -A
kubectl describe certificaterequest my-app-tls-xxxxx -n my-app

Check challenges:

kubectl get challenges -A
kubectl describe challenge my-app-tls-xxxxx -n my-app

Common issues:

| Symptom | Likely cause | Fix |
|---|---|---|
| Challenge stays pending | AWS credentials wrong | Check secret exists and has correct keys |
| "no hosted zone found" | Wrong zone ID or region | Verify hostedZoneID matches Route53 |
| "access denied" | IAM permissions insufficient | Add required Route53 actions to policy |
| "rate limited" | Too many requests | Wait, or use staging issuer for testing |
| Certificate not renewing | cert-manager pod unhealthy | Check logs: kubectl logs -n cert-manager deploy/cert-manager |

Verify DNS propagation:

# Check if the challenge record exists
dig TXT _acme-challenge.app.example.com

# Should return something like:
# _acme-challenge.app.example.com. 300 IN TXT "abc123..."

Other DNS Providers

Route53 is one of many supported DNS providers. cert-manager has solvers for:

  • Cloudflare - API token with Zone:DNS:Edit and Zone:Zone:Read permissions
  • Google Cloud DNS - Service account with dns.admin role
  • Azure DNS - Service principal with DNS Zone Contributor
  • DigitalOcean - API token
  • RFC2136 - Dynamic DNS updates (for self-hosted DNS)

The configuration differs slightly per provider, but the pattern is the same: give cert-manager credentials to modify DNS records, configure the solver in your ClusterIssuer.

Example for Cloudflare:

solvers:
- dns01:
    cloudflare:
      email: your-email@example.com
      apiTokenSecretRef:
        name: cloudflare-api-token
        key: api-token

Security Considerations

Credential scope: The DNS credentials can modify your domain's records. Scope them as tightly as possible - specific zones, specific record types if your provider supports it.

Namespace isolation: ClusterIssuer works across all namespaces. For multi-tenant clusters, consider namespace-scoped Issuer resources with separate credentials per tenant.

Private key protection: The certificate's private key is stored in a Kubernetes Secret. Enable encryption at rest for secrets, and use RBAC to restrict who can read them.

Rate limits: Let's Encrypt enforces rate limits. For production, this rarely matters (50 certs/domain/week). For development/testing, use the staging server.

What I'd Change

Automatic Gateway API certificates: cert-manager can watch Gateway resources and automatically provision certificates for listeners. I'm not using this yet - explicitly creating Certificate resources gives more control, but the automatic approach reduces boilerplate.

Multiple DNS providers: For redundancy, you could configure multiple solvers with different DNS providers. If Route53 has issues, fall back to Cloudflare. Probably overkill for most setups, but possible.

Certificate monitoring: Prometheus metrics are enabled, but I haven't built dashboards for certificate expiry tracking. The certmanager_certificate_expiration_timestamp_seconds metric exists - alerting on certificates expiring within 14 days would add defense in depth.
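
The alert condition itself is small. In PromQL it would be roughly certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600. The same check as a sketch in Python, with the 14-day window taken from the paragraph above and the function name invented for illustration:

```python
import time
from typing import Optional

ALERT_WINDOW_SECONDS = 14 * 24 * 3600  # 14 days


def expiring_soon(expiration_timestamp: float, now: Optional[float] = None) -> bool:
    """True if the certificate expires within the alert window (or already has).

    expiration_timestamp is a Unix timestamp, the unit the metric reports.
    """
    now = time.time() if now is None else now
    return expiration_timestamp - now < ALERT_WINDOW_SECONDS


if __name__ == "__main__":
    now = 1_700_000_000.0  # fixed reference time so the demo is repeatable
    print(expiring_soon(now + 30 * 24 * 3600, now))  # False
    print(expiring_soon(now + 10 * 24 * 3600, now))  # True
```

With the default renewal at 30 days before expiry, a certificate that reaches the 14-day mark means renewal has been failing for over two weeks, which is why this threshold makes a useful alarm.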


This is Part 6 of the Homelab Kubernetes Series, covering TLS and certificate automation for Kubernetes.

Originally published at https://wsl-ui.octasoft.co.uk/blog/homelab-part-6-tls-certificates
