Chioma Nwosu

Posted on May 26

Deploying Spring PetClinic Microservices on AWS EKS with Terraform: Lessons from an Infrastructure Engineer

#kubernetes #terraform #aws #devops

When people talk about Kubernetes projects, they often focus on the final deployment screenshots.

What they rarely talk about are:

the broken IAM trust relationships,
worker nodes failing to register,
EBS CSI drivers crashing,
NAT Gateway mistakes,
cluster connectivity issues,
and the countless troubleshooting sessions behind the scenes.

Recently, I worked as both the Infrastructure Engineer and Team Lead on a collaborative cloud-native project where we deployed the Spring PetClinic Microservices application on Amazon EKS using Terraform, Docker, and Kubernetes.

This wasn’t a simple localhost deployment.

We built a production-style Kubernetes environment on AWS — complete with networking, ingress, persistent storage, IAM integration, node scaling, and multi-service orchestration.

In this article, I’ll walk through:

the infrastructure architecture,
what I implemented,
the major challenges we faced,
and the lessons I learned while managing the infrastructure side of the project.

Project Architecture

The application followed a microservices architecture deployed on Kubernetes.

Core Components
Config Server
Discovery Server
API Gateway
Customers Service
Vets Service
Visits Service
GenAI Service
MySQL Stateful Databases

AWS Services Used

| Service | Purpose |
| --- | --- |
| Amazon EKS | Kubernetes orchestration |
| EC2 | Worker nodes |
| IAM | Access management |
| VPC | Networking |
| NAT Gateway | Internet access for private nodes |
| ALB | External traffic routing |
| EBS CSI Driver | Persistent storage |
| ECR | Container image registry |
| Terraform | Infrastructure provisioning |

My Role as Infrastructure Engineer

My responsibilities included:

Provisioning AWS infrastructure using Terraform
Managing the Amazon EKS cluster
Configuring VPC networking
Managing worker nodes and scaling
Installing Kubernetes add-ons
Configuring AWS Load Balancer Controller
Managing IAM roles and OIDC integration
Enabling persistent storage using EBS CSI Driver
Supporting deployment teams
Troubleshooting infrastructure and Kubernetes issues

I also ended up coordinating infrastructure operations across the team whenever deployment blockers occurred.

Provisioning the Infrastructure with Terraform

The environment was provisioned using Terraform.

Core Infrastructure Created

Networking
VPC
Public Subnets
Private Subnets
Internet Gateway
NAT Gateway
Route Tables

Kubernetes Infrastructure
EKS Cluster
Managed Node Groups
IAM Roles
Security Groups

Terraform Workflow
terraform init
terraform validate
terraform plan
terraform apply

One of the biggest advantages of using Terraform was reproducibility.

Instead of manually provisioning infrastructure from the AWS Console, the environment could be recreated consistently using Infrastructure as Code.

Configuring the EKS Cluster

After provisioning the cluster, I pointed my local kubectl at it using:

aws eks update-kubeconfig \
--region us-east-1 \
--name petclinic-cluster

Verification:

kubectl get nodes

At one point, the cluster returned:

No resources found

After investigation, I discovered the node group had previously been scaled down to zero to reduce costs.

Scaling the node group back up restored cluster functionality.
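
For reference, here is a minimal sketch of how that check and fix can look with the AWS CLI. The node group name `petclinic-nodes` and the sizes in the usage note are placeholders I am assuming for illustration, not values from our project; only the cluster name comes from the commands above.

```shell
# Print the current min/max/desired sizes of a managed node group.
# show_nodegroup_scaling <cluster> <nodegroup>
show_nodegroup_scaling() {
  aws eks describe-nodegroup \
    --cluster-name "$1" \
    --nodegroup-name "$2" \
    --query 'nodegroup.scalingConfig'
}

# Set new sizes on a managed node group.
# scale_nodegroup <cluster> <nodegroup> <min> <max> <desired>
scale_nodegroup() {
  aws eks update-nodegroup-config \
    --cluster-name "$1" \
    --nodegroup-name "$2" \
    --scaling-config "minSize=$3,maxSize=$4,desiredSize=$5"
}
```

Something like `scale_nodegroup petclinic-cluster petclinic-nodes 1 3 2`, followed by `kubectl get nodes --watch`, should show nodes registering again. If the node group sizes are also declared in Terraform, it is worth updating them there too, since a later apply may otherwise revert the change.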

Installing the AWS Load Balancer Controller

To expose Kubernetes Ingress resources externally, I installed the AWS Load Balancer Controller.

This involved:

configuring OIDC,
creating IAM policies,
creating IAM service accounts,
and deploying the controller using Helm.

OIDC Configuration
eksctl utils associate-iam-oidc-provider \
--region us-east-1 \
--cluster petclinic-cluster \
--approve
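
The IAM policy and service account steps listed above are not shown as commands in this article, so here is a rough sketch of how they usually look. It assumes a policy named `AWSLoadBalancerControllerIAMPolicy`, a local `iam_policy.json` downloaded from the controller's repository for the version being installed, and a placeholder account ID; none of these are taken from our actual setup.

```shell
# create_alb_controller_sa <cluster> <aws-account-id>
# Assumes iam_policy.json (the policy published with the controller release
# you are installing) is already in the current directory.
create_alb_controller_sa() {
  cluster="$1"
  account_id="$2"
  policy_name="AWSLoadBalancerControllerIAMPolicy"  # assumed policy name

  # Create the IAM policy the controller needs.
  aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document file://iam_policy.json

  # Create the IAM role and the annotated Kubernetes service account.
  eksctl create iamserviceaccount \
    --cluster "$cluster" \
    --namespace kube-system \
    --name aws-load-balancer-controller \
    --attach-policy-arn "arn:aws:iam::${account_id}:policy/${policy_name}" \
    --approve
}
```

Because this creates the service account, the Helm install can keep `serviceAccount.create=false` and simply reference the existing `aws-load-balancer-controller` account.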

Helm Installation
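# Register the EKS charts repository so the eks/ chart reference resolves
helm repo add eks https://aws.github.io/eks-charts
helm repo update

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=petclinic-cluster \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller

The OIDC Trust Policy Problem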

One major issue completely broke the ALB controller deployment.

The controller continuously failed with AccessDenied.

After several troubleshooting sessions, I discovered that the IAM trust relationship referenced the wrong OIDC provider ID.

This was one of the most important lessons from the project:

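In Amazon EKS, IAM roles for service accounts only work when the role's trust policy references the cluster's exact OIDC provider. A single wrong provider ID is enough to produce AccessDenied.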

Once the correct OIDC provider was configured, the controller immediately became healthy.
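
A check like the following would have caught the mismatch much earlier. It is a small sketch, not the exact procedure I followed: both function names are made up for illustration, and the account ID and role name in the usage note are placeholders.

```shell
# Derive the IAM OIDC provider ARN that a role's trust policy must name.
# expected_provider_arn <aws-account-id> <issuer-url>
expected_provider_arn() {
  # The provider path is the issuer URL without its https:// scheme.
  printf 'arn:aws:iam::%s:oidc-provider/%s\n' "$1" "${2#https://}"
}

# Succeeds only if the trust policy JSON read from stdin contains that ARN.
# trust_policy_matches <aws-account-id> <issuer-url>
trust_policy_matches() {
  grep -qF "$(expected_provider_arn "$1" "$2")"
}
```

The issuer URL can be read with `aws eks describe-cluster --name petclinic-cluster --query 'cluster.identity.oidc.issuer' --output text`, and the role's trust policy with `aws iam get-role --role-name <role-name> --query 'Role.AssumeRolePolicyDocument'`. Piping the second command into `trust_policy_matches <account-id> "$issuer"` returns a non-zero status when the policy points at a different provider.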

Persistent Storage with EBS CSI Driver

The database team required persistent storage for MySQL StatefulSets.

To support this, I installed the AWS EBS CSI Driver.

IAM Role Creation
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster petclinic-cluster \
--role-name AmazonEKS_EBS_CSI_DriverRole \
--role-only \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve

Initially, the CSI driver entered CrashLoopBackOff. The root cause was incorrect IAM permissions.

After correcting the IAM role attachment, the storage driver became healthy and PersistentVolumeClaims successfully bound to EBS volumes.
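
This article does not show how the role was attached, so treat the following as one possible approach rather than what we ran. It assumes the driver is installed as an EKS managed add-on and uses a placeholder account ID; the role name is the one created by the eksctl command above.

```shell
# Install the EBS CSI driver add-on with the IAM role attached.
# install_ebs_csi_addon <cluster> <aws-account-id>
install_ebs_csi_addon() {
  aws eks create-addon \
    --cluster-name "$1" \
    --addon-name aws-ebs-csi-driver \
    --service-account-role-arn "arn:aws:iam::$2:role/AmazonEKS_EBS_CSI_DriverRole"
}

# List the driver pods so a CrashLoopBackOff status is easy to spot.
check_ebs_csi_pods() {
  kubectl get pods -n kube-system \
    -l app.kubernetes.io/name=aws-ebs-csi-driver
}
```

Since `--role-only` creates the IAM role without creating the Kubernetes service account, the role ARN has to be handed to the driver some other way, which is what `--service-account-role-arn` does in this sketch.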

Worker Node Capacity Problems

As more microservices were deployed, our single worker node became overloaded.

Observed issues included:

Pending pods
High CPU utilisation
Scheduling failures

We eventually scaled the node group to multiple worker nodes.

At one point, one EC2 worker instance became unhealthy.

To recover:

I drained the node,
deleted it,
and allowed the Auto Scaling Group to recreate a replacement instance automatically.
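
The recovery steps above can be sketched roughly as follows. "Deleted it" is read here as removing the Node object and terminating the EC2 instance, which is an assumption on top of the short description; the node name and instance ID are whatever `kubectl get nodes` and the EC2 console report for the unhealthy worker.

```shell
# replace_worker_node <node-name> <ec2-instance-id>
replace_worker_node() {
  node="$1"
  instance_id="$2"

  # Evict workloads; DaemonSet pods stay and emptyDir data is discarded.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1

  # Remove the Node object from the cluster.
  kubectl delete node "$node" || return 1

  # Terminate the instance so the Auto Scaling Group launches a replacement.
  aws ec2 terminate-instances --instance-ids "$instance_id"
}
```

Stopping at the first failed step is deliberate: if the drain does not complete, the instance is left running so the pods on it are not killed abruptly.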

This was one of the most realistic operational experiences in the project.

Networking Challenges

Earlier in the project, we deleted the NAT Gateway to reduce AWS costs.

That introduced multiple failures:

worker nodes lost outbound internet access,
pods could not pull images,
API connectivity became unstable.

This reinforced another important lesson:

EKS worker nodes in private subnets need a NAT Gateway (or VPC endpoints) for outbound access. Without one, they cannot pull images or reach AWS APIs.

Terraform was later used to recreate the NAT Gateway and stabilise the environment.
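
A quick pre-flight check along these lines can confirm that a VPC still has a working NAT Gateway before anyone starts debugging image pulls. It is only a sketch: the function names are invented for this example and the VPC ID is supplied by the caller.

```shell
# List NAT gateways in the VPC that are in the "available" state.
# available_nat_gateways <vpc-id>
available_nat_gateways() {
  aws ec2 describe-nat-gateways \
    --filter "Name=vpc-id,Values=$1" "Name=state,Values=available" \
    --query 'NatGateways[].NatGatewayId' \
    --output text
}

# Fail with a message when the VPC has no available NAT gateway.
# require_nat_gateway <vpc-id>
require_nat_gateway() {
  if [ -z "$(available_nat_gateways "$1")" ]; then
    echo "no available NAT gateway in $1" >&2
    return 1
  fi
}
```

Running `require_nat_gateway <vpc-id>` before a deployment gives a clear error instead of a wall of ImagePullBackOff events.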

Kubernetes Troubleshooting Experience

This project involved far more troubleshooting than I initially expected.

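Some major issues included:

kubectl get nodes returning No resources found after the node group was scaled to zero,
the AWS Load Balancer Controller failing with AccessDenied because the IAM trust policy referenced the wrong OIDC provider ID,
the EBS CSI driver stuck in CrashLoopBackOff because of incorrect IAM permissions,
Pending pods and scheduling failures on an overloaded single worker node,
and image pull failures after the NAT Gateway was deleted.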

This project taught me that Kubernetes engineering is often less about deployment and more about troubleshooting distributed systems.

Leadership Beyond Infrastructure

Although my primary responsibility was infrastructure engineering, I also helped coordinate:

IAM access for teammates,
cluster access troubleshooting,
deployment support,
infrastructure recovery,
and operational decisions during failures.

One of the biggest lessons I learned is this:

Infrastructure engineering is not just about provisioning resources.
It’s also about ownership, communication, stability, and helping teams move forward when systems fail.

Key Lessons Learned

This project gave me hands-on experience with:

Amazon EKS administration
Kubernetes operations
IAM and OIDC integration
Terraform Infrastructure as Code
Kubernetes networking
Load balancing and ingress
Persistent storage management
Worker node recovery
Cluster troubleshooting
Cloud-native architecture

Most importantly, it taught me how real-world infrastructure behaves under operational pressure.

Final Thoughts

This project was one of the most challenging and rewarding cloud engineering experiences I’ve had so far.

We successfully built and managed:

a production-style EKS environment,
persistent Kubernetes workloads,
scalable worker nodes,
ingress-based traffic routing,
and cloud-native infrastructure on AWS.