Last week, I deployed three microservices to Kubernetes locally. This week, I deployed them to AWS EKS. What I thought would be a straightforward migration turned into a deep lesson about container platforms, cross-compilation, and production deployment patterns.
Here's what actually happened when theory met reality.
The Confidence Before the Crash
After successfully deploying to Minikube, I felt ready. I had working Kubernetes manifests, healthy pods, and services communicating properly. Moving to EKS seemed like the natural next step - just point kubectl at a different cluster, right?
I provisioned an EKS cluster, updated my kubeconfig, and confidently deployed all three services:
kubectl apply -f k8s-eks/api-service/
kubectl apply -f k8s-eks/auth-service/
kubectl apply -f k8s-eks/worker-service/
Then I watched the pods:
kubectl get pods -w
Everything showed "Running" status initially. Then "CrashLoopBackOff." All six pods. Every single one failing.
This was not the smooth migration I'd envisioned.
When Everything Fails: The Debugging Process
The first instinct when all your pods crash is panic. The second is to check the logs:
kubectl logs api-service-6bdd859969-bsssz
The output was cryptic:
exec /usr/local/bin/docker-entrypoint.sh: exec format error
I'd never seen this error before. Google taught me what it meant: platform architecture mismatch.
My development machine is an M2 Mac (ARM64 architecture). I'd been building Docker images locally, which defaulted to ARM64. Minikube on my Mac runs ARM64, so everything worked fine.
But EKS nodes run on AMD64 architecture. When Kubernetes tried to run my ARM64 containers on AMD64 nodes, the kernel couldn't execute the binaries. Hence: "exec format error."
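If you ever want to confirm a mismatch like this directly, both sides are easy to check. The image name below is mine from above; swap in your own:

# Architecture of each cluster node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'

# Architecture a local image was built for
docker image inspect 023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/api-service:v1 --format '{{.Os}}/{{.Architecture}}'

If the two disagree (arm64 image, amd64 nodes), you get exactly this crash.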
The lesson: Local development and production infrastructure need platform alignment, or you need multi-platform builds.
Solution Attempt #1: Just Add --platform (Failed)
Armed with this knowledge, I tried the obvious fix:
docker buildx build --platform linux/amd64 \
-t 023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/api-service:v1 \
--push .
I rebuilt all three services with the --platform linux/amd64 flag and pushed to ECR. Then I restarted the deployments:
kubectl rollout restart deployment api-service
kubectl rollout restart deployment worker-service
kubectl rollout restart deployment auth-service
Surely this would work. The new pods spun up. I watched them start... and crash again.
Same error. Still ARM64 images, somehow.
The lesson: The default Docker builder on Mac doesn't reliably cross-compile. You need a proper buildx builder.
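A check I'd now run first: list your builders. The active one is marked with an asterisk, and if its driver is docker rather than docker-container, cross-platform output isn't guaranteed:

docker buildx ls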
Solution Attempt #2: Multi-Platform Builder (Success)
The fix required creating a dedicated buildx builder that properly supports cross-platform builds:
docker buildx create --name multiplatform --driver docker-container --use
docker buildx inspect --bootstrap
This creates a container-based builder that can properly build for different platforms. Then I rebuilt:
cd apps/api-service
docker buildx build --platform linux/amd64 \
-t 023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/api-service:v1 \
--push .
Same for worker and auth services.
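To avoid repeating the command by hand, a simple loop handles all three. This assumes the services live in sibling directories under apps/ - only apps/api-service is confirmed above; the other two paths are my guess:

for svc in api-service auth-service worker-service; do
  (cd "apps/$svc" && docker buildx build --platform linux/amd64 \
    -t "023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/$svc:v1" \
    --push .)
done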
But there was still a problem. When I checked which platforms were actually in ECR:
docker buildx imagetools inspect \
023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/api-service:v1
The output showed multiple manifests - including both AMD64 AND an "unknown/unknown" platform (build attestations). Kubernetes was apparently pulling the wrong one.
The lesson: Multi-platform images are great, but you need to ensure Kubernetes pulls the right architecture.
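One alternative I learned about later (I didn't use it here, so treat it as an option rather than my fix): recent buildx versions attach provenance and SBOM attestations by default, and those are exactly what shows up as unknown/unknown. They can be disabled at build time:

docker buildx build --platform linux/amd64 \
  --provenance=false --sbom=false \
  -t 023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/api-service:v1 \
  --push .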
The Final Solution: Explicit SHA Digests
The breakthrough came from understanding that the :v1 tag pointed to a manifest list with multiple platform variants. Kubernetes was making the wrong choice.
The fix was to reference the specific AMD64 digest directly:
kubectl set image deployment/api-service \
api=023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/api-service:v1@sha256:6b8903f38db383b2732565a4022ad916b10009c962c761986a76841c2c354834
I got the SHA256 digest from the imagetools inspect output - it shows exactly which digest corresponds to which platform.
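If you'd rather not eyeball the output, the digest can also be extracted mechanically. A sketch, assuming jq and a reasonably recent Docker CLI:

docker manifest inspect \
  023231074087.dkr.ecr.us-east-1.amazonaws.com/secure-cloud-platform/api-service:v1 \
  | jq -r '.manifests[] | select(.platform.os == "linux" and .platform.architecture == "amd64") | .digest'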
After updating all three deployments with their specific AMD64 digests, I watched the pods:
kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
api-service-65f8f5f745-6mhd7      1/1     Running   0          39s
api-service-65f8f5f745-sv2xx      1/1     Running   0          29s
auth-service-7867d7bc58-nlkj9     1/1     Running   0          6s
worker-service-7dd95b78bc-rtslr   1/1     Running   0          20s
Finally. All pods running. All services healthy.
Testing the public API LoadBalancer:
curl http://aec0488a8aa514bafbe7b0ad77647334-1658914460.us-east-1.elb.amazonaws.com/health
{"status":"healthy","service":"api-service","timestamp":"2025-10-16T23:51:26.385Z"}
Success.
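As a side note, that ELB hostname never has to be copied from the AWS console; it lives on the Service object itself (assuming the Service is named api-service):

kubectl get svc api-service -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'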
What I Learned About Production Deployments
Platform Architecture Matters
In local development, you can ignore platform differences. In production, you can't. Your build process needs to account for where your code will actually run.
For cloud deployments, that almost always means AMD64/x86_64, even if you develop on ARM Macs.
Multi-Platform Builds Are the Standard
The proper solution isn't to avoid building on ARM machines. It's to build multi-platform images correctly:
docker buildx build --platform linux/amd64,linux/arm64 \
-t myimage:latest --push .
This creates a manifest list that works on both architectures. Kubernetes will automatically pull the right variant for each node.
But you need the right builder setup, or it silently fails.
Debugging Requires Multiple Tools
Solving this required understanding several layers:
- kubectl logs to see the error
- kubectl describe pod to check events (example just below)
- docker buildx imagetools inspect to verify what platforms were actually built
- The ECR console to confirm what was pushed
- Understanding of Docker manifest lists and multi-arch images
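For instance, the events view is often where the first real clue shows up. The pod name here is the one from earlier; yours will differ:

kubectl describe pod api-service-6bdd859969-bsssz | grep -A 10 'Events:'

# Or cluster-wide, newest last
kubectl get events --sort-by=.lastTimestamp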
No single tool showed the full picture. Production debugging means combining multiple perspectives.
Local-to-Production Gaps Are Real
Minikube worked because it matched my local architecture. Moving to EKS exposed the platform mismatch.
This is a small example of a larger truth: local development environments don't fully replicate production, no matter how hard you try.
The solution isn't to make local identical to production. It's to catch these differences early with proper CI/CD pipelines that build and test in production-like environments.
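As a rough sketch of what that pipeline step could look like - this is hypothetical, with IMAGE and TAG supplied by the CI system:

#!/usr/bin/env bash
set -euo pipefail

# Reuse the container-driver builder if it already exists, otherwise create it
docker buildx create --name ci-builder --driver docker-container --use 2>/dev/null \
  || docker buildx use ci-builder

# Always build for the platform production runs on, not the runner's
docker buildx build --platform linux/amd64 -t "${IMAGE}:${TAG}" --push .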
The Architecture That Emerged
After all the debugging, here's what's running in EKS:
Three Deployments:
- API Service: 2 replicas on port 3000
- Auth Service: 2 replicas on port 3001
- Worker Service: 2 replicas on port 3002
Three Services:
- API Service: LoadBalancer (public internet access)
- Auth Service: ClusterIP (internal only)
- Worker Service: ClusterIP (internal only)
Resource Configuration:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"
Health Checks:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
Every service has proper resource limits to prevent one container from consuming all node resources. Every service implements health checks so Kubernetes knows when to restart or remove pods from load balancing.
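If you want to confirm those settings actually landed on the running pods, they're visible in the pod spec (pod name taken from the listing above):

kubectl get pod api-service-65f8f5f745-6mhd7 -o jsonpath='{.spec.containers[0].resources}'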
This isn't just "getting it to work." This is production-ready configuration.
What's Next
This deployment taught me about the gap between local development and production infrastructure. But it also revealed the next challenges:
Immediate Needs:
- ConfigMaps for application configuration
- Secrets for sensitive data (JWT keys, DB credentials) - a quick sketch of both follows this list
- Proper logging and monitoring
- CI/CD pipeline for automated deployments
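As a taste of the ConfigMaps and Secrets work, both can be created imperatively before graduating to proper manifests. Every name and value below is a placeholder, not something from the actual project:

kubectl create configmap api-config --from-literal=LOG_LEVEL=info
kubectl create secret generic auth-secrets \
  --from-literal=JWT_SECRET=changeme \
  --from-literal=DB_PASSWORD=changeme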
Future Infrastructure:
- Ingress controller for better HTTP routing
- Network policies for service-to-service security
- Horizontal pod autoscaling based on load
- Complete observability stack
The platform is running, but it's not yet production-ready. There's still work to do.
The Value of Building in Public
When you document your learning publicly, you can't hide mistakes. That's uncomfortable but valuable.
I could have written a clean tutorial: "Here's how to deploy to EKS in 5 easy steps." That would have been simpler.
But it wouldn't have been honest. And it wouldn't have taught the real lesson: production deployments are rarely smooth, and the debugging process is where you actually learn.
Every error message is a lesson. Every wrong assumption gets corrected. Every "why doesn't this work?" forces you to understand the system more deeply.
That's how you build real expertise.
Want to Follow Along?
The complete project is on GitHub: secure-cloud-platform
All the Kubernetes manifests, Dockerfiles, and deployment documentation are there. You can see exactly what I built and how it's configured.
I'm documenting this journey on Dev.to and LinkedIn. If you're also learning Kubernetes, dealing with platform architecture issues, or just want to follow along, connect with me.
Next post: Setting up proper configuration management with ConfigMaps and Secrets, and why hardcoded values in manifests are technical debt.
About my journey: Former Air Force officer and software engineer/solutions architect, now teaching middle school computer science while transitioning back into tech with a focus on DevSecOps. Building elite expertise in Infrastructure as Code, Kubernetes security, and cloud-native platforms. AWS certified (SA Pro, Security Specialty, Developer, SysOps). Learning in public, one commit at a time.