<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lanre Awe</title>
    <description>The latest articles on DEV Community by Lanre Awe (@ralphlarry).</description>
    <link>https://dev.to/ralphlarry</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3983547%2F8264b77e-3e12-4482-8402-43fad498b23c.jpeg</url>
      <title>DEV Community: Lanre Awe</title>
      <link>https://dev.to/ralphlarry</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ralphlarry"/>
    <language>en</language>
    <item>
      <title>How I Built a Production-Style GitOps Platform on AWS EKS — Solo, From Scratch</title>
      <dc:creator>Lanre Awe</dc:creator>
      <pubDate>Sat, 27 Jun 2026 12:13:10 +0000</pubDate>
      <link>https://dev.to/ralphlarry/how-i-built-a-production-style-gitops-platform-on-aws-eks-solo-from-scratch-a2g</link>
      <guid>https://dev.to/ralphlarry/how-i-built-a-production-style-gitops-platform-on-aws-eks-solo-from-scratch-a2g</guid>
      <description>&lt;p&gt;Most DevOps portfolio projects follow the same pattern: deploy a "hello world" app to Kubernetes, write a README, call it done.&lt;/p&gt;

&lt;p&gt;This isn't that.&lt;/p&gt;

&lt;p&gt;I took the &lt;a href="https://github.com/spring-petclinic/spring-petclinic-microservices" rel="noopener noreferrer"&gt;Spring PetClinic microservices&lt;/a&gt; — a real Java application with 7 independent services, service discovery, an API gateway, and distributed tracing — and built the &lt;em&gt;entire platform around it&lt;/em&gt; on AWS. Infrastructure as code, a proper GitOps delivery pipeline, autoscaling at two layers, end-to-end observability, and a reproducible lifecycle that provisions or destroys the whole environment with a single command.&lt;/p&gt;

&lt;p&gt;The live app is running right now at &lt;a href="https://petclinic.ralphnetwork.online" rel="noopener noreferrer"&gt;petclinic.ralphnetwork.online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This post is a walkthrough of what I built, how I made the decisions I made, and — most importantly — what broke and why. Because that last part is what actually teaches you something.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;I'm an infrastructure engineer with 18 years of hands-on experience — servers, networking, firewalls, backup and DR — making the transition into DevOps and cloud engineering. I've been building cloud-native projects and documenting the journey publicly.&lt;/p&gt;

&lt;p&gt;My goal with this project was specific: &lt;strong&gt;demonstrate that I can operate at the platform layer, not just the tool layer.&lt;/strong&gt; Anyone can follow a tutorial and get &lt;code&gt;kubectl apply&lt;/code&gt; to work. What I wanted to prove was that I could make engineering decisions, build a reliable delivery pipeline, handle real failures, and articulate the trade-offs — the way a working engineer actually operates.&lt;/p&gt;

&lt;p&gt;So I treated it like a real system, not a demo.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;At a high level: a push to &lt;code&gt;main&lt;/code&gt; triggers a GitHub Actions pipeline that builds and pushes Docker images to ECR, then commits a tag bump to the Helm chart in Git. Argo CD detects the change and syncs the cluster. The CI pipeline never runs &lt;code&gt;kubectl&lt;/code&gt; directly — git is the authoritative source of truth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TB
    Dev[Push to GitHub main] --&amp;gt; GHA[GitHub Actions CI]
    GHA --&amp;gt;|OIDC role - no static keys| ECR[Amazon ECR]
    GHA --&amp;gt;|bump image tag + commit| Git[Helm chart in Git]
    Git --&amp;gt; Argo[Argo CD]
    Argo --&amp;gt;|sync| Cluster
    ECR --&amp;gt; Cluster

    subgraph Cluster["EKS cluster (petclinic-prod) — eu-central-1"]
      direction TB
      ALB[ALB Ingress - ACM TLS] --&amp;gt; GW[api-gateway]
      GW --&amp;gt; APP[customers / vets / visits]
      APP --- Platform[discovery + config server]
      HPA[HPA] -. scales pods .-&amp;gt; APP
      Karpenter[Karpenter] -. scales nodes .-&amp;gt; Nodes[EC2 nodes]
      APP -. traces .-&amp;gt; Zipkin
      APP -. metrics .-&amp;gt; Prometheus --&amp;gt; Grafana
    end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cluster:&lt;/strong&gt; &lt;code&gt;petclinic-prod&lt;/code&gt; · &lt;strong&gt;Region:&lt;/strong&gt; eu-central-1 · &lt;strong&gt;Kubernetes:&lt;/strong&gt; 1.33&lt;/p&gt;




&lt;h2&gt;
  
  
  The full stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tooling&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;AWS (EKS, ECR, VPC, IAM, ALB, ACM, SQS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IaC&lt;/td&gt;
&lt;td&gt;Terraform — remote state on S3 + DynamoDB, reusable modules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Containers&lt;/td&gt;
&lt;td&gt;Docker, Amazon ECR (one repo per service, scan-on-push)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;Kubernetes (EKS, managed node group + Karpenter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Packaging&lt;/td&gt;
&lt;td&gt;Helm (one values-driven chart for all 7 services)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitOps&lt;/td&gt;
&lt;td&gt;Argo CD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD&lt;/td&gt;
&lt;td&gt;GitHub Actions (OIDC auth — no static AWS keys)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autoscaling&lt;/td&gt;
&lt;td&gt;HPA (pods) + Karpenter (nodes) + metrics-server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Prometheus, Grafana, Zipkin (distributed tracing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;App&lt;/td&gt;
&lt;td&gt;Spring Boot microservices, Spring Cloud Config + Eureka&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Layer by layer: what I built and why
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Infrastructure as Code (Terraform)
&lt;/h3&gt;

&lt;p&gt;Every AWS resource is defined in Terraform, split into reusable modules and wired together in a single &lt;code&gt;prod&lt;/code&gt; environment. The first thing I provisioned — before anything else — was the &lt;strong&gt;remote state backend&lt;/strong&gt;: an S3 bucket (versioned, encrypted, public access blocked) and a DynamoDB lock table. If you lose your state file, you lose control of your infrastructure. That comes first, always.&lt;/p&gt;

&lt;p&gt;The modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;vpc&lt;/code&gt;&lt;/strong&gt; — 2 availability zones, public and private subnets, with the specific subnet tags the AWS Load Balancer Controller and Karpenter need to discover them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;eks&lt;/code&gt;&lt;/strong&gt; — built on the official &lt;code&gt;terraform-aws-modules/eks&lt;/code&gt; module, EKS 1.33, managed node group, IRSA and EKS Pod Identity enabled, control-plane logging on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ecr&lt;/code&gt;&lt;/strong&gt; — one repository per service with image scanning on push.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;iam&lt;/code&gt;&lt;/strong&gt; — IRSA role for the Load Balancer Controller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;github-actions&lt;/code&gt;&lt;/strong&gt; — OIDC trust policy and an IAM role so GitHub Actions can assume it without a long-lived access key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Karpenter&lt;/strong&gt; — IAM role, SQS interruption queue, and node role, via the EKS module's built-in Karpenter submodule using Pod Identity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Terraform provisions the AWS platform. Everything &lt;em&gt;above&lt;/em&gt; that — cluster add-ons, Argo CD, the app — is installed by &lt;code&gt;scripts/addons.sh&lt;/code&gt; in the correct dependency order.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitOps delivery
&lt;/h3&gt;

&lt;p&gt;This is the piece I'm proudest of, because it's the difference between "I can run kubectl" and "I built a delivery pipeline."&lt;/p&gt;

&lt;p&gt;The workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Push to &lt;code&gt;main&lt;/code&gt; triggers GitHub Actions.&lt;/li&gt;
&lt;li&gt;GitHub Actions authenticates to AWS via an &lt;strong&gt;OIDC role&lt;/strong&gt; — no &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; in secrets, not anywhere.&lt;/li&gt;
&lt;li&gt;All 7 services are built as Docker images and pushed to ECR, tagged with the git SHA.&lt;/li&gt;
&lt;li&gt;The pipeline then &lt;strong&gt;bumps the image tag in &lt;code&gt;helm/petclinic/values.yaml&lt;/code&gt; and commits it back to the repo&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Argo CD detects the change and syncs the Helm chart to the cluster.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The cluster never pulls credentials from CI. CI never holds cluster access. The audit trail for every deployment lives in git history. That's real GitOps, and it's meaningfully different from "run &lt;code&gt;kubectl apply&lt;/code&gt; at the end of a pipeline."&lt;/p&gt;

&lt;h3&gt;
  
  
  Packaging with Helm
&lt;/h3&gt;

&lt;p&gt;The app started with hand-written Kubernetes manifests with hardcoded image tags — one manifest per service, with the image version baked in. I converted everything into &lt;strong&gt;one values-driven Helm chart&lt;/strong&gt; that renders all 7 services from a single config block.&lt;/p&gt;

&lt;p&gt;That collapsed seven hardcoded image tags into &lt;strong&gt;one value&lt;/strong&gt; that CI controls. It also eliminated hundreds of lines of duplicated YAML, made per-service configuration changes a one-line edit, and gave me a single versioned artifact I can promote, roll back, or diff. As a side benefit, Argo CD's diff view became meaningful: you can see exactly what changed per deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Autoscaling at two layers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;HPA&lt;/strong&gt; (Horizontal Pod Autoscaler) is configured on the four stateless services — api-gateway, customers-service, vets-service, visits-service — with a minimum of 2 replicas, maximum of 4, scaling on CPU at 70%, fed by &lt;code&gt;metrics-server&lt;/code&gt;.&lt;/p&gt;
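
&lt;p&gt;To make those numbers concrete, here is a minimal sketch of the scaling rule the Kubernetes documentation gives for the HPA, desired = ceil(current * observed / target), clamped to the 2 to 4 replica bounds and the 70% CPU target above. The function name and the sample utilisation figures are invented for illustration, and the real controller also applies a tolerance band and stabilisation windows that are left out here.&lt;/p&gt;

```python
import math


def desired_replicas(current, cpu_percent, target_percent=70,
                     min_replicas=2, max_replicas=4):
    """Simplified HPA rule: ceil(current * observed / target), clamped to the bounds."""
    raw = math.ceil(current * cpu_percent / target_percent)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical average CPU utilisation readings for a 2-replica service.
    for cpu in (35, 70, 105, 200):
        print(cpu, desired_replicas(2, cpu))
```

&lt;p&gt;The clamp is why load beyond roughly double the target stops adding pods at 4: past that point the only remaining lever is more node capacity.&lt;/p&gt;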

&lt;p&gt;&lt;strong&gt;Karpenter&lt;/strong&gt; handles node autoscaling. When the HPA needs to schedule more pods than the current nodes can fit, Karpenter provisions a right-sized EC2 instance and decommissions it when idle. I didn't just configure this — I tested it under real load. Pending pods from HPA scaling triggered a &lt;code&gt;t3a.medium&lt;/code&gt; provisioning event, and Karpenter had a node ready in approximately 90 seconds.&lt;/p&gt;

&lt;p&gt;The choice to use Karpenter over the older cluster-autoscaler was deliberate. It bin-packs more efficiently, picks instance types dynamically, and it's the modern EKS approach. More setup, but a better result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prometheus&lt;/strong&gt; scrapes every service via the &lt;code&gt;/actuator/prometheus&lt;/code&gt; endpoint. &lt;strong&gt;Grafana&lt;/strong&gt; visualises the metrics. &lt;strong&gt;Zipkin&lt;/strong&gt; collects distributed traces, so you can follow a single user request as it travels from the api-gateway through customers-service and back.&lt;/p&gt;

&lt;p&gt;Getting all three working together — and getting &lt;em&gt;traces&lt;/em&gt; working end to end specifically — was one of the most instructive parts of the build. More on that in the debugging section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Networking and TLS
&lt;/h3&gt;

&lt;p&gt;An &lt;strong&gt;ALB Ingress&lt;/strong&gt; (provisioned by the AWS Load Balancer Controller from a Kubernetes &lt;code&gt;Ingress&lt;/code&gt; object) fronts the gateway. TLS is terminated at the ALB using an &lt;strong&gt;ACM certificate&lt;/strong&gt;, with a real DNS record at &lt;code&gt;petclinic.ralphnetwork.online&lt;/code&gt;. The cluster itself runs in private subnets. The only public entry point is the load balancer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reproducible from one command
&lt;/h2&gt;

&lt;p&gt;A platform you can't rebuild from scratch isn't really infrastructure as code — it's a managed pet. So I encoded the full lifecycle in a &lt;code&gt;Makefile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Provision the state backend once per AWS account&lt;/span&gt;
make state

&lt;span class="c"&gt;# Provision the full platform + install add-ons + deploy the app&lt;/span&gt;
make up

&lt;span class="c"&gt;# Tear everything down cleanly&lt;/span&gt;
make down
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;make up&lt;/code&gt; runs two phases in order: &lt;code&gt;terraform apply&lt;/code&gt; to provision the AWS layer, then &lt;code&gt;scripts/addons.sh&lt;/code&gt; to install add-ons in dependency order: AWS Load Balancer Controller → metrics-server → Karpenter → Argo CD → the PetClinic application.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;make down&lt;/code&gt; is the part that trips people up. Terraform provisions the base infrastructure, but the &lt;strong&gt;in-cluster controllers create resources at runtime&lt;/strong&gt; that Terraform doesn't know about — specifically the ALB and any Karpenter-provisioned EC2 nodes. A naive &lt;code&gt;terraform destroy&lt;/code&gt; hangs waiting for a VPC it can't delete because the ALB is still attached. The teardown script deletes the Kubernetes layer first, waits for the ALB and extra nodes to actually drain, and &lt;em&gt;then&lt;/em&gt; runs &lt;code&gt;terraform destroy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This means I can provision the full environment for a live demo and destroy it to near-zero cost afterward without leaving orphaned load balancers or a surprise AWS bill.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bugs — the part that actually matters
&lt;/h2&gt;

&lt;p&gt;I want to be honest about this: most of the learning in this project came from what broke. Here are the ones that taught me the most.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zipkin showed no traces, even though all services were up
&lt;/h3&gt;

&lt;p&gt;Tracing export is non-fatal — a failure to connect to Zipkin doesn't crash the application, it just silently drops spans. So the services appeared healthy while producing zero traces.&lt;/p&gt;

&lt;p&gt;The root causes were two independent misconfigurations that had to be fixed together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The tracing endpoint in the Spring Cloud Config pointed at &lt;code&gt;tracing-server&lt;/code&gt;, which didn't match the Kubernetes Service name &lt;code&gt;zipkin&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The endpoint was only set under one Spring profile, so most services never exported at all.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix: I corrected the hostname and had the Helm chart inject the Zipkin endpoint via an environment variable into &lt;strong&gt;every&lt;/strong&gt; service, so tracing is now uniform and controlled at the platform level rather than buried in per-service config files.&lt;/p&gt;

&lt;h3&gt;
  
  
  CI was building images that never actually deployed
&lt;/h3&gt;

&lt;p&gt;The deploy step rewrote &lt;code&gt;:latest&lt;/code&gt; tags, but the manifests had specific version pins (&lt;code&gt;:4.0.1&lt;/code&gt;, &lt;code&gt;:4.0.2&lt;/code&gt;). The substitution matched nothing — every "deployment" silently re-applied the old images. The cluster looked updated; it wasn't.&lt;/p&gt;
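
&lt;p&gt;The failure mode is easy to reproduce in isolation. The sketch below is a minimal illustration, not the pipeline's actual code: the manifest line, registry path, and function names are invented. It shows a &lt;code&gt;:latest&lt;/code&gt; substitution silently matching nothing against a pinned tag, next to a variant that raises when no tag was rewritten.&lt;/p&gt;

```python
import re

# Hypothetical manifest line; the registry path is invented for illustration.
MANIFEST = "image: example.registry/petclinic/customers-service:4.0.1\n"


def naive_bump(manifest, new_tag):
    """What the broken step effectively did: only rewrite ':latest'."""
    return manifest.replace(":latest", ":" + new_tag)


def pinned_bump(manifest, new_tag):
    """Rewrite whatever tag follows the image name, and fail loudly if nothing matched."""
    updated, count = re.subn(
        r"(image:\s*[^\s:]+):\S+",
        lambda m: m.group(1) + ":" + new_tag,
        manifest,
    )
    if count == 0:
        raise ValueError("no image tag matched; refusing to report a successful deploy")
    return updated


if __name__ == "__main__":
    # The naive version returns the manifest unchanged, with no error.
    print(naive_bump(MANIFEST, "abc1234") == MANIFEST)
    print(pinned_bump(MANIFEST, "abc1234"), end="")
```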

&lt;p&gt;Migrating to Helm fixed this properly. Image tags became a single chart value that CI bumps to the git SHA, and Argo CD shows a visible diff when the value changes. There's no ambiguity about what's running.&lt;/p&gt;

&lt;h3&gt;
  
  
  Argo CD and the HPA fought over replica counts
&lt;/h3&gt;

&lt;p&gt;With self-healing enabled, Argo CD kept resetting &lt;code&gt;replicas&lt;/code&gt; to the chart value while the HPA tried to scale the same Deployments based on CPU. The two controllers were in a tug of war that neither could win cleanly.&lt;/p&gt;

&lt;p&gt;The fix is a standard but non-obvious GitOps pattern: omit &lt;code&gt;replicas&lt;/code&gt; from the Deployment spec when an HPA controls the workload, and configure Argo to &lt;strong&gt;explicitly ignore differences&lt;/strong&gt; on the &lt;code&gt;replicas&lt;/code&gt; field. That way Argo reconciles everything except replica counts, and the HPA owns that field exclusively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Karpenter's IAM policy exceeded AWS's size limit
&lt;/h3&gt;

&lt;p&gt;The error was: &lt;code&gt;LimitExceeded: Cannot exceed quota for PolicySize: 6144&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Karpenter's required IAM policy is large. The fix was to switch from a managed policy (6,144-character limit, as the error says) to an &lt;strong&gt;inline policy&lt;/strong&gt; (10,240-character limit) using a flag built into the EKS Terraform module. It is a one-line change, but the reason isn't obvious unless you've hit it.&lt;/p&gt;
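
&lt;p&gt;A quick pre-flight check makes the two limits tangible. The sketch below is an approximation under stated assumptions: it counts non-whitespace characters of the serialised JSON, which mirrors how IAM sizes policies but is not guaranteed to match AWS's count exactly, and the synthetic policy is filler rather than Karpenter's real permission set. The function names are hypothetical.&lt;/p&gt;

```python
import json

MANAGED_POLICY_LIMIT = 6144   # quota named in the error message
INLINE_ROLE_LIMIT = 10240     # inline limit quoted above


def policy_size(policy):
    """Approximate IAM's size: non-whitespace characters of the JSON document."""
    text = json.dumps(policy)
    return sum(1 for ch in text if not ch.isspace())


def placement(policy):
    """Say which kind of policy attachment the document could fit into."""
    size = policy_size(policy)
    if size > INLINE_ROLE_LIMIT:
        return "too large"
    if size > MANAGED_POLICY_LIMIT:
        return "inline only"
    return "managed or inline"


if __name__ == "__main__":
    # Synthetic policy: the statements are filler, not Karpenter's real permissions.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "Stmt%03d" % i, "Effect": "Allow",
             "Action": "ec2:DescribeInstances", "Resource": "*"}
            for i in range(80)
        ],
    }
    print(policy_size(policy), placement(policy))
```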

&lt;h3&gt;
  
  
  A capacity planning decision that wasn't a bug
&lt;/h3&gt;

&lt;p&gt;Enabling HPA would have scheduled more pods than the 2-node cluster could hold — they'd have sat in &lt;code&gt;Pending&lt;/code&gt; indefinitely, which looks like a broken cluster. I had three options: add a third static node, use cluster-autoscaler, or add Karpenter.&lt;/p&gt;

&lt;p&gt;I chose Karpenter: it scales on demand rather than requiring a fixed node count, bin-packs more efficiently, and it's the approach AWS recommends for EKS. The decision had a cost in setup time and complexity. The benefit is a cluster that genuinely scales rather than one that holds a fixed headroom.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decisions and trade-offs
&lt;/h2&gt;

&lt;p&gt;The interesting engineering questions in this project weren't "which tool" — they were "why this, versus that, given these constraints."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitOps over direct &lt;code&gt;kubectl apply&lt;/code&gt; in CI.&lt;/strong&gt; More moving parts upfront. But: CI doesn't hold cluster credentials, every deployment is auditable in git, and Argo's self-healing means drift from the desired state gets corrected automatically. For any real team, this is non-negotiable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Karpenter over cluster-autoscaler.&lt;/strong&gt; Faster to respond, picks the right instance type for the pending workload, consolidates underutilised nodes. The trade-off is more setup. Worth it for the operational behaviour and the learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kept Eureka + Spring Cloud Config, deliberately.&lt;/strong&gt; On Kubernetes, native Service DNS and ConfigMaps overlap with what these frameworks provide, so they're somewhat redundant. I kept them because rewriting all 7 services to drop them was out of scope, and doing it poorly would be worse than the overlap. Going fully Kubernetes-native is explicitly on the backlog as a next step, not an oversight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single NAT gateway, single environment.&lt;/strong&gt; A deliberate cost decision. Multi-AZ NAT gateways add ~$30/month per gateway, which adds up quickly for a demo project. I know exactly where the HA gap is and have named it, rather than pretending the setup is production-grade multi-AZ when it isn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this project demonstrates
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill area&lt;/th&gt;
&lt;th&gt;Evidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure as Code&lt;/td&gt;
&lt;td&gt;Modular Terraform, remote state, full AWS platform from scratch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD&lt;/td&gt;
&lt;td&gt;GitHub Actions, OIDC auth, build → push → tag bump → deploy automated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitOps&lt;/td&gt;
&lt;td&gt;Argo CD syncing a Helm chart; git as sole source of truth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes&lt;/td&gt;
&lt;td&gt;EKS, Helm, HPA, PDBs, ALB Ingress, Pod Identity / IRSA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud (AWS)&lt;/td&gt;
&lt;td&gt;EKS, ECR, IAM, VPC, ALB, ACM, SQS — provisioned end to end&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autoscaling&lt;/td&gt;
&lt;td&gt;HPA + Karpenter, verified under real load, capacity reasoning documented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Prometheus, Grafana, distributed tracing with Zipkin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;OIDC over static keys, least-privilege IAM, TLS on all public traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;Real failures diagnosed and fixed; root causes explained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering judgment&lt;/td&gt;
&lt;td&gt;Trade-offs documented and defended, not assumed away&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What I'd do next
&lt;/h2&gt;

&lt;p&gt;I kept an honest backlog rather than claiming the project is "done":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NetworkPolicies&lt;/strong&gt; — default-deny with explicit allow rules for each service-to-service path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets management&lt;/strong&gt; — External Secrets Operator backed by AWS SSM Parameter Store or Secrets Manager, to replace env-var secrets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kube-prometheus-stack&lt;/code&gt;&lt;/strong&gt; — replace the hand-assembled Prometheus + Grafana setup with the community Helm chart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go Kubernetes-native&lt;/strong&gt; — remove Eureka and Spring Cloud Config in favour of Kubernetes Service DNS and native ConfigMaps.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;This project started as "deploy an app to Kubernetes" and became a study in what it actually means to build a platform. The delivery pipeline, the autoscaling, the tracing, the teardown ordering, the GitOps patterns — none of that comes from a tutorial. It comes from making deliberate choices, hitting real problems, and working through them.&lt;/p&gt;

&lt;p&gt;That's the work I want to do professionally, and this project is my evidence that I can.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Repos:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/Ralphlarry/petclinic-infrastructure" rel="noopener noreferrer"&gt;petclinic-infrastructure&lt;/a&gt; — Terraform, Makefile, addon scripts&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Ralphlarry/spring-petclinic-microservices" rel="noopener noreferrer"&gt;spring-petclinic-microservices&lt;/a&gt; — app code, Helm chart, GitHub Actions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Ralphlarry/spring-petclinic-microservices-config" rel="noopener noreferrer"&gt;spring-petclinic-microservices-config&lt;/a&gt; — Spring Cloud Config&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Live app:&lt;/strong&gt; &lt;a href="https://petclinic.ralphnetwork.online" rel="noopener noreferrer"&gt;petclinic.ralphnetwork.online&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're hiring for DevOps or Platform Engineering roles — remote or Lagos on-site — I'd genuinely love to talk. Find me on &lt;a href="https://www.linkedin.com/in/olanrewaju-awe-62761758" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>kubernetes</category>
      <category>terraform</category>
    </item>
    <item>
      <title>From Learning DevOps to Deploying a Production Application: My DMI Cohort 2 Experience</title>
      <dc:creator>Lanre Awe</dc:creator>
      <pubDate>Mon, 15 Jun 2026 07:07:39 +0000</pubDate>
      <link>https://dev.to/ralphlarry/from-learning-devops-to-deploying-a-production-application-my-dmi-cohort-2-experience-3m0o</link>
      <guid>https://dev.to/ralphlarry/from-learning-devops-to-deploying-a-production-application-my-dmi-cohort-2-experience-3m0o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hnl47bxm5tpe9g7cbyl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hnl47bxm5tpe9g7cbyl.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;One of the biggest milestones in my DevOps journey was participating in the deployment of a production-ready microservices application as part of the DevOps Mentorship Initiative (DMI) Cohort 2.&lt;/p&gt;

&lt;p&gt;Together with a team of talented engineers, we successfully deployed the Spring Petclinic Microservices application to Azure Kubernetes Service (AKS). This was not a simple demo project—it involved real-world tools, cloud infrastructure, CI/CD pipelines, observability, AI integration, and production deployment practices.&lt;/p&gt;

&lt;p&gt;The experience challenged me technically and personally, but it also showed me what working on a real DevOps project looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  About the Project
&lt;/h2&gt;

&lt;p&gt;Spring Petclinic is a cloud-native microservices application built with Spring Boot. The application consists of multiple services that communicate with each other and are deployed as containers.&lt;/p&gt;

&lt;p&gt;Our technology stack included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spring Boot&lt;/li&gt;
&lt;li&gt;Spring AI&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Terraform&lt;/li&gt;
&lt;li&gt;Azure Kubernetes Service (AKS)&lt;/li&gt;
&lt;li&gt;Helm&lt;/li&gt;
&lt;li&gt;Azure Pipelines&lt;/li&gt;
&lt;li&gt;Azure OpenAI&lt;/li&gt;
&lt;li&gt;Grafana&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As part of the project, I worked as the &lt;strong&gt;GenAI Engineer&lt;/strong&gt;, responsible for integrating the AI chatbot using Spring AI and Azure OpenAI.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Role: Integrating AI into the Platform
&lt;/h2&gt;

&lt;p&gt;My responsibility was to make the chatbot feature work seamlessly within the microservices environment.&lt;/p&gt;

&lt;p&gt;This involved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connecting the application to Azure OpenAI&lt;/li&gt;
&lt;li&gt;Integrating Spring AI into the backend services&lt;/li&gt;
&lt;li&gt;Ensuring the chatbot worked correctly within a reactive Spring WebFlux environment&lt;/li&gt;
&lt;li&gt;Troubleshooting deployment and runtime issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although it was a single project ticket, it required understanding multiple layers of the architecture—from application code to cloud infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges We Faced
&lt;/h2&gt;

&lt;p&gt;No real-world deployment is without problems, and this project was no exception.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Docker Hub Access Issues
&lt;/h3&gt;

&lt;p&gt;At one point, Docker Hub access became unreliable due to ISP restrictions.&lt;/p&gt;

&lt;p&gt;To solve this, we imported images directly into Azure Container Registry (ACR) using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
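&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Placeholder values: substitute your own registry, image, and tag&lt;/span&gt;
az acr import --name MY_REGISTRY --source docker.io/library/IMAGE:TAG --image IMAGE:TAG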
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This removed our dependency on Docker Hub and improved reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Azure OpenAI Quota Problems
&lt;/h3&gt;

&lt;p&gt;Having Azure credits did not automatically mean we had access to AI model quotas.&lt;/p&gt;

&lt;p&gt;We had to upgrade the subscription and request the required quota before deploying the chatbot successfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Spring WebFlux Blocking Calls
&lt;/h3&gt;

&lt;p&gt;The AI service initially caused threading issues because the AI call was blocking.&lt;/p&gt;

&lt;p&gt;The solution was to wrap the operation using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Mono&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromCallable&lt;/span&gt;&lt;span class="o"&gt;(...)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribeOn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Schedulers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;boundedElastic&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once implemented, the chatbot responded correctly without blocking the reactive application.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Secrets Accidentally Committed to Git
&lt;/h3&gt;

&lt;p&gt;One of the most important lessons came from discovering that an API key had been committed to the repository.&lt;/p&gt;

&lt;p&gt;The team immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotated the compromised key&lt;/li&gt;
&lt;li&gt;Removed it from Git history&lt;/li&gt;
&lt;li&gt;Improved secret management practices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was a valuable reminder that security starts from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Empty Vector Store
&lt;/h3&gt;

&lt;p&gt;The chatbot initially returned poor results because the vector store had been built with incompatible embeddings.&lt;/p&gt;

&lt;p&gt;We regenerated the embeddings using the correct Azure OpenAI model and rebuilt the vector store using production data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;This project taught me lessons that go beyond certifications and tutorials.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Solves Real Problems
&lt;/h3&gt;

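&lt;p&gt;Some issues that are difficult to handle locally become much easier with managed cloud services. Importing images straight into ACR, for example, took our unreliable Docker Hub access out of the deployment path entirely.&lt;/p&gt;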

&lt;h3&gt;
  
  
  Reactive Programming Requires Discipline
&lt;/h3&gt;

&lt;p&gt;Spring WebFlux works extremely well, but blocking operations can quickly cause problems if not handled correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Secrets Management Is Critical
&lt;/h3&gt;

&lt;p&gt;Security cannot be treated as an afterthought. Proper handling of credentials and sensitive information must be part of the development process from the beginning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment Order Matters
&lt;/h3&gt;

&lt;p&gt;In a microservices environment, services depend on one another. Starting services in the wrong order can prevent the entire application from functioning correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Team Behind the Project
&lt;/h2&gt;

&lt;p&gt;One of the most rewarding parts of this experience was working alongside an amazing team.&lt;/p&gt;

&lt;p&gt;Special appreciation to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Michael Ikedimma&lt;/li&gt;
&lt;li&gt;Benjamin Akinteye&lt;/li&gt;
&lt;li&gt;Gift Ukporo&lt;/li&gt;
&lt;li&gt;Duru Juliet Chinenye&lt;/li&gt;
&lt;li&gt;Pradeep Neelaboyina&lt;/li&gt;
&lt;li&gt;Angela Chibuike&lt;/li&gt;
&lt;li&gt;Oladayo Aremu&lt;/li&gt;
&lt;li&gt;Ubani OnU Chukwu&lt;/li&gt;
&lt;li&gt;Osman Farah Ali Farah&lt;/li&gt;
&lt;li&gt;Kolawole Yinusa&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everyone brought unique skills and expertise that helped make the project successful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Looking back, this project was one of the most impactful experiences of my career so far.&lt;/p&gt;

&lt;p&gt;Beyond deploying applications and solving technical problems, I learned how real engineering teams collaborate, troubleshoot, communicate, and deliver production systems.&lt;/p&gt;

&lt;p&gt;The journey was challenging, but every obstacle helped me grow as a DevOps engineer.&lt;/p&gt;

&lt;p&gt;If you're looking for a practical way to learn DevOps by working on real-world projects, I highly recommend the DevOps Mentorship Initiative (DMI).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DMI Cohort 3 starts on 27 June.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apply here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform" rel="noopener noreferrer"&gt;https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect With Me
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Ralphlarry" rel="noopener noreferrer"&gt;https://github.com/Ralphlarry&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LinkedIn: &lt;a href="http://www.linkedin.com/in/olanrewaju-awe-62761758" rel="noopener noreferrer"&gt;www.linkedin.com/in/olanrewaju-awe-62761758&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;#DMI #DevOps #CloudComputing #Azure #AKS #Terraform #Docker #Microservices #SpringBoot #AI #TheCloudAdvisory&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Deploying Spring Petclinic Microservices Locally with Docker Compose</title>
      <dc:creator>Lanre Awe</dc:creator>
      <pubDate>Sun, 14 Jun 2026 21:44:55 +0000</pubDate>
      <link>https://dev.to/ralphlarry/deploying-spring-petclinic-microservices-locally-with-docker-compose-552e</link>
      <guid>https://dev.to/ralphlarry/deploying-spring-petclinic-microservices-locally-with-docker-compose-552e</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu0pqal2w3uqnepp1c9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu0pqal2w3uqnepp1c9e.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyfalsfde5fp4gq1t19u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyfalsfde5fp4gq1t19u.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd20c1n13k3ajwni1glo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd20c1n13k3ajwni1glo.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fioj91woyk18lvj7q05jn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fioj91woyk18lvj7q05jn.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Introduction&lt;/p&gt;

&lt;p&gt;As part of the DevOps Mentorship Initiative (DMI), I deployed the Spring Petclinic Microservices application locally using Docker Compose and explored how a modern microservices architecture operates in practice.&lt;/p&gt;

&lt;p&gt;Spring Petclinic is a cloud-native sample application built with Spring Boot and Spring Cloud. Instead of running as a single application, it consists of multiple independent services that communicate with one another.&lt;/p&gt;

&lt;p&gt;The deployment included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Config Server&lt;/li&gt;
&lt;li&gt;Discovery Server (Eureka)&lt;/li&gt;
&lt;li&gt;API Gateway&lt;/li&gt;
&lt;li&gt;Customers Service&lt;/li&gt;
&lt;li&gt;Vets Service&lt;/li&gt;
&lt;li&gt;Visits Service&lt;/li&gt;
&lt;li&gt;GenAI Service&lt;/li&gt;
&lt;li&gt;Admin Server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the observability stack included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;li&gt;Grafana&lt;/li&gt;
&lt;li&gt;Zipkin&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project provided hands-on experience with service discovery, centralized configuration, container orchestration, and observability.&lt;/p&gt;

&lt;p&gt;Prerequisites&lt;/p&gt;

&lt;p&gt;Before starting, I installed the following tools.&lt;/p&gt;

&lt;p&gt;Docker Engine (on WSL)&lt;/p&gt;

&lt;p&gt;Docker Engine, running inside WSL, was used to run and manage all application containers.&lt;/p&gt;

&lt;p&gt;Verify installation:&lt;br&gt;
docker --version&lt;br&gt;
docker compose version&lt;/p&gt;

&lt;p&gt;Git&lt;/p&gt;

&lt;p&gt;Git was used to clone the project repository.&lt;/p&gt;

&lt;p&gt;Verify installation:&lt;br&gt;
git --version&lt;/p&gt;

&lt;p&gt;Step 1: Clone the Repository&lt;/p&gt;

&lt;p&gt;Clone the Spring Petclinic Microservices repository:&lt;/p&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/Ralphlarry/spring-petclinic-microservices.git" rel="noopener noreferrer"&gt;https://github.com/Ralphlarry/spring-petclinic-microservices.git&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Move into the project directory:&lt;br&gt;
cd spring-petclinic-microservices&lt;/p&gt;

&lt;p&gt;Step 2: Start the Entire Application&lt;/p&gt;

&lt;p&gt;The most interesting part of the project was that the entire platform could be started with a single command:&lt;/p&gt;

&lt;p&gt;docker compose up -d&lt;/p&gt;

&lt;p&gt;Docker Compose automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulled the required images&lt;/li&gt;
&lt;li&gt;Created the containers&lt;/li&gt;
&lt;li&gt;Connected the services through a shared network&lt;/li&gt;
&lt;li&gt;Applied the startup dependencies&lt;/li&gt;
&lt;li&gt;Started the monitoring stack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To confirm everything was running:&lt;br&gt;
docker ps&lt;/p&gt;

&lt;p&gt;Expected containers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;config-server&lt;/li&gt;
&lt;li&gt;discovery-server&lt;/li&gt;
&lt;li&gt;api-gateway&lt;/li&gt;
&lt;li&gt;customers-service&lt;/li&gt;
&lt;li&gt;vets-service&lt;/li&gt;
&lt;li&gt;visits-service&lt;/li&gt;
&lt;li&gt;genai-service&lt;/li&gt;
&lt;li&gt;admin-server&lt;/li&gt;
&lt;li&gt;prometheus-server&lt;/li&gt;
&lt;li&gt;grafana-server&lt;/li&gt;
&lt;li&gt;tracing-server&lt;/li&gt;
&lt;/ul&gt;
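
&lt;p&gt;Rather than scanning the docker ps output by eye, a small shell helper can compare the running names against the expected list. This is only a minimal sketch and not part of the project: the function name missing_from is hypothetical, and it assumes the container names match the expected list above exactly.&lt;/p&gt;

```shell
# Container names taken from the expected list in this article.
expected="config-server discovery-server api-gateway customers-service vets-service visits-service genai-service admin-server prometheus-server grafana-server tracing-server"

# missing_from RUNNING: print every expected name that is absent from RUNNING,
# a space- or newline-separated list of container names.
missing_from() {
  # Unquoted $1 is intentional: it collapses newlines into single spaces.
  running=" $(echo $1) "
  for name in $expected; do
    case "$running" in
      *" $name "*) ;;        # present, nothing to report
      *) echo "$name" ;;     # absent, report it
    esac
  done
}

# Usage against a live Docker daemon (not run here):
#   missing_from "$(docker ps --format '{{.Names}}')"
```

&lt;p&gt;An empty result means every expected container is up; any printed name is a container worth inspecting with docker logs.&lt;/p&gt;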

&lt;p&gt;Step 3: Verify the Deployment&lt;/p&gt;

&lt;p&gt;API Gateway&lt;/p&gt;

&lt;p&gt;Check the gateway health endpoint:&lt;/p&gt;

&lt;p&gt;curl &lt;a href="http://localhost:8080/actuator/health" rel="noopener noreferrer"&gt;http://localhost:8080/actuator/health&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Expected response:&lt;br&gt;
{"status":"UP"}&lt;/p&gt;
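
&lt;p&gt;The health check can also be scripted so that it returns an exit code instead of relying on reading the response. The sketch below uses a hypothetical function, is_up, and assumes the overall status is the first field of the JSON body, as in the expected response above; a JSON parser such as jq would be the more robust choice if it is available.&lt;/p&gt;

```shell
# is_up BODY: succeed only when an Actuator health body starts with an
# overall status of UP. Assumes "status" is the first field, as in the
# expected response shown in this article.
is_up() {
  case "$1" in
    '{"status":"UP"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample bodies stand in for the real curl output so the sketch runs offline.
if is_up '{"status":"UP"}'; then echo "gateway healthy"; fi
if is_up '{"status":"DOWN"}'; then echo "unexpected"; else echo "gateway not ready"; fi
```

&lt;p&gt;Against the running stack, the body would come from the same endpoint: is_up "$(curl -s http://localhost:8080/actuator/health)".&lt;/p&gt;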

&lt;p&gt;Eureka Dashboard&lt;/p&gt;

&lt;p&gt;Open:&lt;br&gt;
&lt;a href="http://localhost:8761" rel="noopener noreferrer"&gt;http://localhost:8761&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All services should appear as registered instances.&lt;/p&gt;

&lt;p&gt;Spring Boot Admin&lt;/p&gt;

&lt;p&gt;Open:&lt;br&gt;
&lt;a href="http://localhost:9090" rel="noopener noreferrer"&gt;http://localhost:9090&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This dashboard provides visibility into application health and metrics.&lt;/p&gt;

&lt;p&gt;Understanding the Startup Order&lt;/p&gt;

&lt;p&gt;One important concept in this deployment is service startup dependency.&lt;br&gt;
The Docker Compose file ensures that the Config Server and Discovery Server start before the other services.&lt;/p&gt;

&lt;p&gt;Why Config Server Starts First&lt;/p&gt;

&lt;p&gt;The Config Server stores centralized configuration for all services.&lt;br&gt;
When services such as Customers Service or API Gateway start, they immediately request configuration from the Config Server.&lt;/p&gt;

&lt;p&gt;Without it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services cannot retrieve configuration&lt;/li&gt;
&lt;li&gt;Startup may fail&lt;/li&gt;
&lt;li&gt;Environment settings become unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why Discovery Server Starts Second&lt;/p&gt;

&lt;p&gt;The Discovery Server (Eureka) acts as a service registry.&lt;/p&gt;

&lt;p&gt;Every microservice registers itself with Eureka when it starts.&lt;/p&gt;

&lt;p&gt;Without Eureka:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services cannot discover each other&lt;/li&gt;
&lt;li&gt;API Gateway routing fails&lt;/li&gt;
&lt;li&gt;Inter-service communication breaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short:&lt;/p&gt;

&lt;p&gt;Config Server&lt;br&gt;
      ↓&lt;br&gt;
Discovery Server&lt;br&gt;
      ↓&lt;br&gt;
All Other Services&lt;/p&gt;

&lt;p&gt;This startup sequence is critical for a healthy deployment.&lt;/p&gt;
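
&lt;p&gt;When services are started by hand rather than through Compose, the same ordering can be approximated by polling each dependency until it answers before moving on. The following is a minimal sketch with a hypothetical helper named retry; in the usage comments, the Config Server port 8888 and its health path are assumptions, while 8761 is the Eureka address used earlier in this article.&lt;/p&gt;

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds after each failure.
# Returns 0 on the first success, 1 if CMD never succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (not run here; needs curl and the running stack):
#   retry 30 2 curl -fsS http://localhost:8888/actuator/health   # Config Server, port and path assumed
#   retry 30 2 curl -fsS http://localhost:8761                   # Discovery Server (Eureka)
```

&lt;p&gt;Compose can express the same idea declaratively by pairing depends_on with a health check, which is why a single docker compose up -d is enough in this deployment.&lt;/p&gt;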

&lt;p&gt;Observability and Monitoring&lt;/p&gt;

&lt;p&gt;One of the most valuable parts of this project was learning how observability tools provide visibility into distributed systems.&lt;/p&gt;

&lt;p&gt;Prometheus&lt;/p&gt;

&lt;p&gt;Prometheus continuously collects metrics from the Spring Boot Actuator endpoints.&lt;/p&gt;

&lt;p&gt;Metrics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU usage&lt;/li&gt;
&lt;li&gt;Memory usage&lt;/li&gt;
&lt;li&gt;HTTP request counts&lt;/li&gt;
&lt;li&gt;Application performance statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Access:&lt;br&gt;
&lt;a href="http://localhost:9091" rel="noopener noreferrer"&gt;http://localhost:9091&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prometheus acts as the data collection layer for monitoring.&lt;/p&gt;

&lt;p&gt;Grafana&lt;/p&gt;

&lt;p&gt;Grafana visualizes metrics collected by Prometheus.&lt;/p&gt;

&lt;p&gt;Access:&lt;br&gt;
&lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using Grafana dashboards, I could monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service health&lt;/li&gt;
&lt;li&gt;JVM memory consumption&lt;/li&gt;
&lt;li&gt;Request throughput&lt;/li&gt;
&lt;li&gt;System performance trends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of reading raw metrics, Grafana transforms them into easy-to-understand charts and dashboards.&lt;/p&gt;

&lt;p&gt;Zipkin&lt;/p&gt;

&lt;p&gt;Zipkin provides distributed tracing.&lt;/p&gt;

&lt;p&gt;Access:&lt;br&gt;
&lt;a href="http://localhost:9411" rel="noopener noreferrer"&gt;http://localhost:9411&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Distributed tracing allows engineers to follow a request as it travels across multiple services.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Client&lt;br&gt;
  ↓&lt;br&gt;
API Gateway&lt;br&gt;
  ↓&lt;br&gt;
Customers Service&lt;br&gt;
  ↓&lt;br&gt;
Database&lt;/p&gt;

&lt;p&gt;Zipkin records timing information for every step, helping identify bottlenecks and performance issues.&lt;/p&gt;

&lt;p&gt;Although tracing required additional verification during testing, understanding how distributed tracing works was one of the most educational parts of the project.&lt;/p&gt;

&lt;p&gt;Docker Compose Up vs Down&lt;/p&gt;

&lt;p&gt;Start Everything&lt;/p&gt;

&lt;p&gt;docker compose up -d&lt;/p&gt;

&lt;p&gt;This command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates containers&lt;/li&gt;
&lt;li&gt;Creates networks&lt;/li&gt;
&lt;li&gt;Starts services&lt;/li&gt;
&lt;li&gt;Runs containers in the background&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stop Everything&lt;/p&gt;

&lt;p&gt;docker compose down&lt;/p&gt;

&lt;p&gt;This command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stops containers&lt;/li&gt;
&lt;li&gt;Removes containers&lt;/li&gt;
&lt;li&gt;Removes networks created by Compose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using these two commands makes managing the entire environment simple and repeatable.&lt;/p&gt;

&lt;p&gt;What I Learned&lt;/p&gt;

&lt;p&gt;The biggest lesson from this deployment was that running microservices is much more than simply starting containers.&lt;/p&gt;

&lt;p&gt;A successful deployment depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correct startup sequencing&lt;/li&gt;
&lt;li&gt;Service discovery&lt;/li&gt;
&lt;li&gt;Centralized configuration&lt;/li&gt;
&lt;li&gt;Monitoring&lt;/li&gt;
&lt;li&gt;Distributed tracing&lt;/li&gt;
&lt;li&gt;Health checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also learned how observability tools such as Prometheus, Grafana, and Zipkin help engineers understand what is happening inside a distributed system.&lt;/p&gt;

&lt;p&gt;These tools become increasingly important as systems grow larger and more complex.&lt;/p&gt;

&lt;p&gt;Looking Ahead to Production&lt;/p&gt;

&lt;p&gt;If deploying this architecture to AWS, I would replace local Docker Compose components with managed cloud services:&lt;/p&gt;

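&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Local Deployment&lt;/th&gt;
&lt;th&gt;AWS Production&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Docker Compose&lt;/td&gt;
&lt;td&gt;Amazon EKS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local containers&lt;/td&gt;
&lt;td&gt;Kubernetes Pods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local networking&lt;/td&gt;
&lt;td&gt;Kubernetes Services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local monitoring&lt;/td&gt;
&lt;td&gt;Amazon Managed Service for Prometheus and Amazon Managed Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local storage&lt;/td&gt;
&lt;td&gt;Amazon EBS/EFS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local secrets&lt;/td&gt;
&lt;td&gt;AWS Secrets Manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local images&lt;/td&gt;
&lt;td&gt;Amazon ECR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual deployment&lt;/td&gt;
&lt;td&gt;CI/CD with GitHub Actions and ArgoCD&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;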

&lt;p&gt;This would provide better scalability, availability, security, and operational reliability.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;Deploying Spring Petclinic Microservices gave me practical experience with modern cloud-native architecture and DevOps practices.&lt;/p&gt;

&lt;p&gt;From centralized configuration and service discovery to monitoring and tracing, this project demonstrated many of the concepts used in real-world production environments.&lt;/p&gt;

&lt;p&gt;This project was completed as part of the DevOps Mentorship Initiative (DMI).&lt;/p&gt;

&lt;p&gt;Interested in joining the next cohort?&lt;/p&gt;

&lt;p&gt;DMI Cohort 3 Registration:&lt;br&gt;
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform" rel="noopener noreferrer"&gt;https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Author: Olanrewaju Awe&lt;br&gt;
GitHub: &lt;a href="https://github.com/Ralphlarry" rel="noopener noreferrer"&gt;https://github.com/Ralphlarry&lt;/a&gt;&lt;br&gt;
LinkedIn: &lt;a href="http://www.linkedin.com/in/olanrewaju-awe-62761758" rel="noopener noreferrer"&gt;www.linkedin.com/in/olanrewaju-awe-62761758&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
