<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cloudev</title>
    <description>The latest articles on DEV Community by Cloudev (@copubah).</description>
    <link>https://dev.to/copubah</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2950687%2Fd93a0c8d-a947-4a77-89cb-76a4a5a08573.jpeg</url>
      <title>DEV Community: Cloudev</title>
      <link>https://dev.to/copubah</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/copubah"/>
    <language>en</language>
    <item>
      <title>A Self-Healing AWS ECS Monitoring System with Slack Alerts Using Terraform</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Fri, 20 Mar 2026 14:58:23 +0000</pubDate>
      <link>https://dev.to/copubah/a-self-healing-aws-ecs-monitoring-system-with-slack-alerts-using-terraform-2i0h</link>
      <guid>https://dev.to/copubah/a-self-healing-aws-ecs-monitoring-system-with-slack-alerts-using-terraform-2i0h</guid>
      <description>&lt;p&gt;Modern cloud applications need more than monitoring they need self-healing infrastructure. Waiting for humans to react to failures increases downtime and risks user impact. In this guide, I’ll show you how to build a system that automatically detects ECS service failures, notifies your team on Slack, and restores the service all using Terraform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Project Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In containerized environments, services can fail due to application crashes, resource exhaustion, or deployment issues. Traditional monitoring tools detect failures, but manual intervention is slow.&lt;/p&gt;

&lt;p&gt;A self-healing system solves this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detecting failures automatically&lt;/li&gt;
&lt;li&gt;Restarting services without human intervention&lt;/li&gt;
&lt;li&gt;Sending alerts to teams in real-time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architecture Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s how the system works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ECS service health degrades (task crashes, reduced running count)&lt;/li&gt;
&lt;li&gt;CloudWatch monitors ECS metrics and triggers an alarm when RunningTaskCount &amp;lt; desired count&lt;/li&gt;
&lt;li&gt;EventBridge captures the alarm state change&lt;/li&gt;
&lt;li&gt;Lambda executes: it sends a Slack alert and restarts the ECS service&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a closed-loop, event-driven system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Services Used&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Amazon ECS (Fargate) – Hosts containerized apps&lt;/li&gt;
&lt;li&gt;CloudWatch – Monitors service health&lt;/li&gt;
&lt;li&gt;EventBridge – Captures CloudWatch alarms and triggers Lambda&lt;/li&gt;
&lt;li&gt;Lambda – Executes remediation logic and sends Slack notifications&lt;/li&gt;
&lt;li&gt;Slack Webhook – Sends alerts to your team&lt;/li&gt;
&lt;/ol&gt;
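&lt;p&gt;The detection side of this pipeline can be sketched in Terraform. This is an illustrative sketch only: the resource names, the Container Insights metric namespace, the threshold wiring, and the referenced variables and Lambda are assumptions, not the repo's exact code.&lt;/p&gt;

```hcl
# Sketch of the detection side: alarm on low running-task count, then an
# EventBridge rule that forwards the ALARM transition to the Lambda.
# Names, variables, and aws_lambda_function.remediate are assumptions.
resource "aws_cloudwatch_metric_alarm" "ecs_tasks" {
  alarm_name          = "ecs-running-tasks-low"
  namespace           = "ECS/ContainerInsights"
  metric_name         = "RunningTaskCount"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "LessThanThreshold"
  threshold           = var.desired_count

  dimensions = {
    ClusterName = var.cluster_name
    ServiceName = var.service_name
  }
}

resource "aws_cloudwatch_event_rule" "alarm_change" {
  name = "ecs-alarm-state-change"
  event_pattern = jsonencode({
    source        = ["aws.cloudwatch"]
    "detail-type" = ["CloudWatch Alarm State Change"]
    detail = {
      alarmName = [aws_cloudwatch_metric_alarm.ecs_tasks.alarm_name]
      state     = { value = ["ALARM"] }
    }
  })
}

resource "aws_cloudwatch_event_target" "to_lambda" {
  rule = aws_cloudwatch_event_rule.alarm_change.name
  arn  = aws_lambda_function.remediate.arn
}
```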

&lt;p&gt;&lt;strong&gt;Terraform Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built the infrastructure using Terraform for repeatable, version-controlled deployment. Key points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Modular structure (ecs, lambda, cloudwatch, eventbridge, iam, ssm)&lt;/li&gt;
&lt;li&gt;Slack webhook stored securely in SSM Parameter Store&lt;/li&gt;
&lt;li&gt;Lambda reads the webhook at runtime and sends formatted alerts&lt;/li&gt;
&lt;/ol&gt;
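&lt;p&gt;The remediation Lambda itself stays small. Here is a sketch in Python; the environment variable names (SLACK_WEBHOOK_PARAM, CLUSTER_NAME, SERVICE_NAME) are assumptions that the Terraform module would set, not the repo's actual code.&lt;/p&gt;

```python
import json
import os
import urllib.request


def build_alert(service):
    # Slack incoming-webhook payloads are just {"text": "..."}
    return {"text": f"ECS service '{service}' is degraded; forcing a new deployment."}


def handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    ssm = boto3.client("ssm")
    ecs = boto3.client("ecs")

    # Read the Slack webhook URL from SSM Parameter Store at runtime
    webhook = ssm.get_parameter(
        Name=os.environ["SLACK_WEBHOOK_PARAM"], WithDecryption=True
    )["Parameter"]["Value"]

    # Notify the team first...
    payload = json.dumps(build_alert(os.environ["SERVICE_NAME"])).encode()
    req = urllib.request.Request(
        webhook, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)

    # ...then self-heal by forcing ECS to replace the service's tasks
    ecs.update_service(
        cluster=os.environ["CLUSTER_NAME"],
        service=os.environ["SERVICE_NAME"],
        forceNewDeployment=True,
    )
```

&lt;p&gt;Forcing a new deployment is the least invasive remediation: ECS drains the unhealthy tasks and schedules fresh ones without changing the task definition.&lt;/p&gt;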

&lt;p&gt;This project shows how to turn ECS monitoring into a self-healing system. By combining AWS services and Slack integration, you can detect failures, alert your team, and restore services automatically, reducing downtime and improving reliability.&lt;br&gt;
GitHub repo: &lt;a href="https://github.com/Copubah/AWS-ecs-monitoring-and-auto-remediation" rel="noopener noreferrer"&gt;https://github.com/Copubah/AWS-ecs-monitoring-and-auto-remediation&lt;/a&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>ecs</category>
      <category>aws</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Automating AWS Cost Monitoring with Terraform, Lambda, and Slack</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Wed, 18 Mar 2026 12:42:50 +0000</pubDate>
      <link>https://dev.to/copubah/automating-aws-cost-monitoring-with-terraform-lambda-and-slack-3h2o</link>
      <guid>https://dev.to/copubah/automating-aws-cost-monitoring-with-terraform-lambda-and-slack-3h2o</guid>
      <description>&lt;p&gt;Managing cloud costs can quickly become challenging, especially when resources scale dynamically. Instead of manually checking the AWS console, I built an automated system that sends daily cost summaries directly to Slack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS bills can grow unexpectedly without real-time visibility. Logging into the console daily is inefficient and easy to forget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built a serverless cost monitoring system using Terraform and AWS services that automatically sends cost updates to Slack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system follows a simple event-driven design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EventBridge triggers a Lambda function daily&lt;/li&gt;
&lt;li&gt;Lambda queries AWS Cost Explorer API&lt;/li&gt;
&lt;li&gt;The cost data is formatted into a readable summary&lt;/li&gt;
&lt;li&gt;A Slack webhook sends the message to a channel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools Used&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform (latest version)&lt;/li&gt;
&lt;li&gt;AWS Lambda (Python)&lt;/li&gt;
&lt;li&gt;EventBridge&lt;/li&gt;
&lt;li&gt;IAM (least privilege)&lt;/li&gt;
&lt;li&gt;AWS Cost Explorer API&lt;/li&gt;
&lt;li&gt;Slack Incoming Webhooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every day, the Lambda function runs and retrieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total AWS spend for the day&lt;/li&gt;
&lt;li&gt;Breakdown of costs by service&lt;/li&gt;
&lt;li&gt;Optional alerts if spending exceeds thresholds&lt;/li&gt;
&lt;/ul&gt;
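&lt;p&gt;The Lambda's core logic can be sketched in Python with boto3. This is a sketch under assumptions: the SLACK_WEBHOOK_URL environment variable name and the message layout are mine, modeled on the example output below, not the repo's exact code.&lt;/p&gt;

```python
import datetime as dt
import json
import os
import urllib.request


def format_summary(date, total, by_service):
    # Build the Slack message body, listing the top three services by spend
    lines = [f"AWS Cost Summary - {date}", f"Total Spend Today: ${total:.2f}", "Top Services:"]
    for name, cost in sorted(by_service.items(), key=lambda kv: kv[1], reverse=True)[:3]:
        lines.append(f"- {name}: ${cost:.2f}")
    return "\n".join(lines)


def handler(event, context):
    import boto3  # the Cost Explorer API endpoint lives in us-east-1

    ce = boto3.client("ce", region_name="us-east-1")
    today = dt.date.today()
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": str(today - dt.timedelta(days=1)), "End": str(today)},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    groups = resp["ResultsByTime"][0]["Groups"]
    by_service = {g["Keys"][0]: float(g["Metrics"]["UnblendedCost"]["Amount"]) for g in groups}
    text = format_summary(str(today), sum(by_service.values()), by_service)

    # SLACK_WEBHOOK_URL is an assumed environment variable name
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```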

&lt;p&gt;&lt;strong&gt;It then sends a message like this to Slack&lt;/strong&gt;:&lt;br&gt;
AWS Cost Summary – 18 Mar 2026&lt;br&gt;
Total Spend Today: $12.34&lt;br&gt;
Top Services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2: $6.50&lt;/li&gt;
&lt;li&gt;S3: $3.20&lt;/li&gt;
&lt;li&gt;Lambda: $2.64&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated cost visibility&lt;/li&gt;
&lt;li&gt;Early detection of unexpected spikes&lt;/li&gt;
&lt;li&gt;Low cost to run (within free tier for most users)&lt;/li&gt;
&lt;li&gt;Fully serverless and scalable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repository&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/Copubah/aws-cost-reporter" rel="noopener noreferrer"&gt;https://github.com/Copubah/aws-cost-reporter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>terraform</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Deploying a Flask Application to Kubernetes Using Minikube</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Sat, 07 Mar 2026 15:54:50 +0000</pubDate>
      <link>https://dev.to/copubah/deploying-a-flask-application-to-kubernetes-using-minikube-25k7</link>
      <guid>https://dev.to/copubah/deploying-a-flask-application-to-kubernetes-using-minikube-25k7</guid>
      <description>&lt;p&gt;I have recently been spending time tinkering with Kubernetes and exploring how container orchestration works in practice. To get hands on experience, I set up a local cluster using Minikube and started experimenting with deploying simple applications.&lt;br&gt;
&lt;strong&gt;Project Overview&lt;/strong&gt;&lt;br&gt;
The goal of this project is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a simple Flask web application&lt;/li&gt;
&lt;li&gt;Package it in a container using Docker&lt;/li&gt;
&lt;li&gt;Deploy it to a Kubernetes cluster&lt;/li&gt;
&lt;li&gt;Expose the application so it can be accessed from a browser&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup runs locally using Kubernetes through Minikube.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The deployment follows a simple Kubernetes architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User → Kubernetes Service → Deployment → Pods → Flask Container&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deployment manages the application pods&lt;/li&gt;
&lt;li&gt;Pods run the containerized Flask application&lt;/li&gt;
&lt;li&gt;Service exposes the application to the outside network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project Structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The repository contains the following files:&lt;/p&gt;

&lt;p&gt;kubernetes-flask-app&lt;br&gt;
│&lt;br&gt;
├── app.py&lt;br&gt;
├── requirements.txt&lt;br&gt;
├── Dockerfile&lt;br&gt;
├── deployment.yaml&lt;br&gt;
├── service.yaml&lt;br&gt;
└── README.md&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;app.py&lt;/strong&gt; – A simple Flask application that returns a message when accessed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;requirements.txt&lt;/strong&gt; – Contains the Python dependencies required to run the application.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dockerfile&lt;/strong&gt; – Defines how the Flask application is packaged into a container image.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;deployment.yaml&lt;/strong&gt; – Defines the Kubernetes deployment and manages the application pods.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;service.yaml&lt;/strong&gt; – Creates a Kubernetes service to expose the application.&lt;/li&gt;
&lt;/ul&gt;
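&lt;p&gt;The two manifests can be sketched as follows. This is a hedged sketch: the replica count, labels, and port are my assumptions, though the image name and service name match the commands used later in the post.&lt;/p&gt;

```yaml
# deployment.yaml (sketch) - replicas, labels, and port are assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
        - name: flask-app
          image: flask-k8s-app     # built below inside Minikube's Docker daemon
          imagePullPolicy: Never   # use the locally built image, never pull
          ports:
            - containerPort: 5000
---
# service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: flask-service             # matches "minikube service flask-service" below
spec:
  type: NodePort
  selector:
    app: flask-app
  ports:
    - port: 5000
      targetPort: 5000
```

&lt;p&gt;Setting imagePullPolicy to Never matters with Minikube: the image only exists in the cluster's local Docker daemon, so a pull attempt would fail.&lt;/p&gt;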

&lt;p&gt;&lt;strong&gt;Running the Project&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1. Start Minikube&lt;/strong&gt;&lt;br&gt;
Start your local Kubernetes cluster.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minikube start&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify the cluster is running.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kubectl get nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Build the Docker Image&lt;/strong&gt;&lt;br&gt;
Configure your environment to use the Docker daemon inside Minikube.&lt;/p&gt;

&lt;p&gt;eval $(minikube docker-env)&lt;/p&gt;

&lt;p&gt;Build the container image.&lt;/p&gt;

&lt;p&gt;docker build -t flask-k8s-app .&lt;br&gt;
&lt;strong&gt;3. Deploy the Application&lt;/strong&gt;&lt;br&gt;
Create the deployment in Kubernetes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kubectl apply -f deployment.yaml&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check the running pods.&lt;/p&gt;

&lt;p&gt;kubectl get pods&lt;br&gt;
&lt;strong&gt;4. Expose the Application&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a Kubernetes service.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kubectl apply -f service.yaml&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To access the application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minikube service flask-service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your browser will open the application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What This Project Demonstrates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project helps demonstrate core DevOps and cloud concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containerization using Docker&lt;/li&gt;
&lt;li&gt;Container orchestration using Kubernetes&lt;/li&gt;
&lt;li&gt;Local Kubernetes development using Minikube&lt;/li&gt;
&lt;li&gt;Deployments and services in Kubernetes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can find the full project source code on GitHub:&lt;br&gt;
&lt;a href="https://github.com/Copubah/kubernetes-flask-app" rel="noopener noreferrer"&gt;https://github.com/Copubah/kubernetes-flask-app&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>python</category>
      <category>flask</category>
      <category>docker</category>
    </item>
    <item>
      <title>Chaos by Design: Production Maintenance Drills on Kubernetes</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Thu, 26 Feb 2026 17:37:13 +0000</pubDate>
      <link>https://dev.to/copubah/chaos-by-designproduction-maintenance-drills-on-kubernetes-3df</link>
      <guid>https://dev.to/copubah/chaos-by-designproduction-maintenance-drills-on-kubernetes-3df</guid>
      <description>&lt;p&gt;There's an old SRE adage: "Hope is not a strategy." Yet most engineering teams only discover how their systems fail under pressure when that pressure is real, unplanned, and 2 AM on a Saturday. Production outages are expensive teachers.&lt;/p&gt;

&lt;p&gt;The alternative is to make failure boring — to rehearse it so often that when it actually happens, your team moves through the recovery playbook on autopilot. That's the idea behind prod-maintenance-drills: a self-hosted Kubernetes environment where you deliberately break things to learn how to fix them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Drills Matter&lt;/strong&gt;&lt;br&gt;
Chaos engineering, popularized by Netflix's Chaos Monkey, is the discipline of intentionally introducing failures into a system to build confidence in its ability to withstand turbulent, unexpected conditions. But you don't need a Netflix-scale infrastructure to benefit from it.&lt;/p&gt;

&lt;p&gt;Even on a local Kubernetes cluster with a handful of pods, running structured drills teaches you things you can't learn from diagrams or documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How fast does your deployment actually recover after a pod crash?&lt;/li&gt;
&lt;li&gt;Does your application handle a database restart gracefully, or does it need a manual restart too?&lt;/li&gt;
&lt;li&gt;At what CPU threshold does your HPA kick in — and does it kick in fast enough?&lt;/li&gt;
&lt;li&gt;When disk fills up, do your alerts fire before the application starts failing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running drills answers these questions with evidence, not assumptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The FastAPI application exposes two endpoints&lt;/strong&gt;: / for a health check and /db to verify database connectivity. These simple endpoints become the canary in the coal mine — you watch them during drills to confirm the system has recovered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The monitoring stack runs kube-prometheus-stack&lt;/strong&gt; via Helm, giving you Prometheus scraping, Grafana dashboards, and Alertmanager rules all preconfigured. You get real-time visibility into pod restarts, CPU usage, and database status without having to wire anything up manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Five Drills&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;app_crash.sh&lt;/strong&gt;&lt;br&gt;
Deletes a running pod to test Kubernetes self-healing and deployment recovery time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;db_failure.sh&lt;/strong&gt;&lt;br&gt;
Kills the PostgreSQL pod to validate application reconnection behavior.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;high_load.sh&lt;/strong&gt;&lt;br&gt;
Schedules a CPU stress job to trigger horizontal pod autoscaling at a 60% threshold.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;backup.sh&lt;/strong&gt;&lt;br&gt;
Creates a timestamped PostgreSQL dump to practice backup and restore procedures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;disk_fill.sh&lt;/strong&gt;&lt;br&gt;
Simulates disk exhaustion on a node and verifies that monitoring alerts fire correctly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Setting It Up&lt;/strong&gt;&lt;br&gt;
Prerequisites&lt;br&gt;
You'll need: kind or minikube, kubectl, docker, and helm. That's it: no cloud account required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Spin up the cluster&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;# Create a local kind cluster&lt;br&gt;
kind create cluster --name simple-k8s&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Install the monitoring stack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;helm repo add prometheus-community \&lt;br&gt;
&lt;a href="https://prometheus-community.github.io/helm-charts" rel="noopener noreferrer"&gt;https://prometheus-community.github.io/helm-charts&lt;/a&gt;&lt;br&gt;
helm repo update&lt;/p&gt;

&lt;p&gt;helm install monitoring \&lt;br&gt;
  prometheus-community/kube-prometheus-stack \&lt;br&gt;
  --namespace monitoring \&lt;br&gt;
  --create-namespace&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;&lt;br&gt;
For kind clusters, you'll also need to install metrics-server and add --kubelet-insecure-tls to its deployment args, otherwise HPA won't be able to read resource metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Build and load the application image&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;# Build the FastAPI image&lt;br&gt;
cd app &amp;amp;&amp;amp; docker build -t prod-app:latest . &amp;amp;&amp;amp; cd ..&lt;/p&gt;

&lt;p&gt;# Load it into kind (not needed for cloud clusters)&lt;br&gt;
kind load docker-image prod-app:latest --name simple-k8s&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Deploy everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;kubectl apply -f k8s/&lt;/p&gt;

&lt;p&gt;# Verify everything came up&lt;br&gt;
kubectl get pods -n prod&lt;br&gt;
kubectl get pods -n monitoring&lt;br&gt;
kubectl get hpa -n prod&lt;/p&gt;

&lt;p&gt;Once the pods are running, grab the node IP and hit your endpoints:&lt;/p&gt;

&lt;p&gt;NODE_IP=$(kubectl get nodes -o jsonpath=\&lt;br&gt;
  '{.items[0].status.addresses[?(@.type=="InternalIP")].address}')&lt;/p&gt;

&lt;p&gt;curl http://$NODE_IP:30007/&lt;br&gt;
# → {"status":"running"}&lt;/p&gt;

&lt;p&gt;curl http://$NODE_IP:30007/db&lt;br&gt;
# → {"db":"connected"}&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running Your First Drill&lt;/strong&gt;&lt;br&gt;
Let's walk through the pod crash drill end to end, because it's the cleanest example of what these drills teach you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open two terminal windows. In the first, start watching your pods&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;watch kubectl get pods -n prod&lt;/p&gt;

&lt;p&gt;In the second, run the drill:&lt;/p&gt;

&lt;p&gt;./app_crash.sh&lt;/p&gt;

&lt;p&gt;You'll see one pod disappear from the watch window. Within seconds (typically under 30), Kubernetes will have scheduled a replacement. That's the Deployment controller doing its job.&lt;/p&gt;

&lt;p&gt;Now do it again. And again. Notice the restart count climb in Prometheus. Notice the brief dip in the up{namespace="prod"} metric. This is what your monitoring dashboards look like during an incident. Seeing it in a drill is far less stressful than seeing it at 2 AM for the first time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prometheus Queries for Drills&lt;/strong&gt;&lt;br&gt;
Monitor these metrics live while running drills:&lt;/p&gt;

&lt;p&gt;sum(kube_pod_container_status_restarts_total{namespace="prod"}) by (pod)&lt;br&gt;
sum(rate(container_cpu_usage_seconds_total{namespace="prod"}[1m])) by (pod)&lt;br&gt;
up{namespace="prod"}&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The HPA Drill — Watching Your System Scale&lt;/strong&gt;&lt;br&gt;
The high_load.sh drill is especially satisfying because you get to watch autoscaling happen in real time. The HPA is configured with a minimum of 2 replicas, a maximum of 5, and a target CPU utilization of 60%.&lt;/p&gt;

&lt;p&gt;# In one terminal: watch the HPA&lt;br&gt;
watch kubectl get hpa -n prod&lt;/p&gt;

&lt;p&gt;# In another: trigger the load&lt;br&gt;
./high_load.sh&lt;/p&gt;

&lt;p&gt;You'll see the TARGETS column climb past 60%, and within a minute or two the REPLICAS column will tick up from 2. The load job eventually completes, and the HPA scales back down after the cooldown period.&lt;/p&gt;

&lt;p&gt;This drill builds intuition for how long scaling takes end-to-end: from metric collection, to HPA decision, to pod scheduling, to readiness. That latency matters when you're sizing your HPA thresholds for real production traffic spikes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Database Failure Drill — Resilience by Default?&lt;/strong&gt;&lt;br&gt;
This one is the most revealing. When PostgreSQL restarts, does your FastAPI application reconnect automatically, or does it need a restart too?&lt;/p&gt;

&lt;p&gt;./db_failure.sh&lt;/p&gt;

&lt;p&gt;While the PostgreSQL pod is down, hitting /db should return a connectivity error. When it comes back, your application should reconnect on the next request — if your database connection pool is configured to retry.&lt;/p&gt;

&lt;p&gt;If it doesn't reconnect automatically, you've just discovered a resilience gap before it cost you. Fix it: use a connection pool with reconnect logic, add retry wrappers around database calls, or configure proper liveness/readiness probes that cycle the app pod when the DB is unreachable.&lt;/p&gt;
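&lt;p&gt;A retry wrapper like the one suggested above can be very small. A minimal sketch in Python; the function name and backoff parameters are mine, not the repo's:&lt;/p&gt;

```python
import time


def with_retries(fn, attempts=3, base_delay=0.5, retry_on=(Exception,)):
    """Call fn, retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Re-raise once the attempt budget is spent
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

&lt;p&gt;Wrapping each query call in such a helper keeps a transient PostgreSQL restart from surfacing as a user-facing error, at the cost of a little added latency during recovery.&lt;/p&gt;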

&lt;p&gt;&lt;strong&gt;Connection Resilience Checklist&lt;/strong&gt;&lt;br&gt;
After the DB drill, verify: (1) App eventually reconnects without manual intervention. (2) Readiness probe correctly marks the pod as not-ready while DB is down. (3) Prometheus alert fires within your SLO window. (4) The alert resolves automatically after recovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integrating With Your Workflow&lt;/strong&gt;&lt;br&gt;
Drills are most valuable when they're scheduled, not spontaneous. A few patterns that work well:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weekly game days&lt;/strong&gt;: Block 30 minutes every week for one drill, rotating through the five scenarios. Document observations and improvements in a shared runbook.&lt;br&gt;
&lt;strong&gt;Pre-release validation&lt;/strong&gt;: Run the full suite before any major deployment. If your new release doesn't survive a pod crash drill, it's not ready.&lt;br&gt;
&lt;strong&gt;Onboarding tool&lt;/strong&gt;: New engineers run the drills in their first week. There's no better way to learn a system than to watch it fail and recover.&lt;br&gt;
&lt;strong&gt;CI gate&lt;/strong&gt;: In staging, run app_crash.sh as part of your pipeline and fail the build if recovery takes longer than your SLO allows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;br&gt;
The current drill set covers the most common failure modes. Here are some directions to extend it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network partition drill&lt;/strong&gt;: Use a network policy to block traffic between the app and the database for a set duration, simulating a network split.&lt;br&gt;
&lt;strong&gt;Memory pressure&lt;/strong&gt;: A complement to the CPU drill: fill pod memory to trigger OOM kills and test restart behavior.&lt;br&gt;
&lt;strong&gt;Rolling update with failure injection&lt;/strong&gt;: Trigger a deployment rollout while simultaneously running the crash drill to validate zero-downtime deploys.&lt;br&gt;
&lt;strong&gt;Restore drill&lt;/strong&gt;: Pair backup.sh with a corresponding restore.sh that brings the database back from a backup and validates data integrity.&lt;br&gt;
&lt;strong&gt;Multi-node scenarios&lt;/strong&gt;: With a multi-node kind cluster, add a node drain drill to practice pod eviction and rescheduling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Copubah/prod-maintenance-drills" rel="noopener noreferrer"&gt;https://github.com/Copubah/prod-maintenance-drills&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>sre</category>
      <category>grafana</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>Building a Simple Cloud Security Automation Tool in Rust</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Sun, 25 Jan 2026 19:44:00 +0000</pubDate>
      <link>https://dev.to/copubah/building-a-simple-cloud-security-automation-tool-in-rust-316b</link>
      <guid>https://dev.to/copubah/building-a-simple-cloud-security-automation-tool-in-rust-316b</guid>
      <description>&lt;p&gt;Cloud security is no longer just about dashboards and manual reviews. Modern security teams rely heavily on automation to detect and respond to misconfigurations in real time.&lt;/p&gt;

&lt;p&gt;In this article, I will show how I built a simple Cloud Security Posture Management (CSPM) tool using Rust and the AWS SDK. The goal is to demonstrate how Rust can be used for real-world cloud security automation, not just systems programming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Rust for Cloud Security&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most cloud security automation is written in Python or Go. Rust is less common, but it has some serious advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory safety by default&lt;/li&gt;
&lt;li&gt;High performance for log processing and scanning&lt;/li&gt;
&lt;li&gt;Single static binaries for agents and tools&lt;/li&gt;
&lt;li&gt;Strong type system for building reliable security systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rust is especially useful when building security tooling that needs to be fast, stable, and safe to run in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Overview: CloudGuard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project is a simple Rust CLI tool called CloudGuard.&lt;/p&gt;

&lt;p&gt;It performs two basic but very realistic security checks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect public S3 buckets&lt;/li&gt;
&lt;li&gt;Detect EC2 security groups open to the world on sensitive ports&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is essentially a mini CSPM tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the Tool Does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CloudGuard scans an AWS account and prints a security report showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any S3 buckets with public access&lt;/li&gt;
&lt;li&gt;Any security groups with 0.0.0.0/0 on:
&lt;ul&gt;
&lt;li&gt;Port 22 (SSH)&lt;/li&gt;
&lt;li&gt;Port 3389 (RDP)&lt;/li&gt;
&lt;li&gt;Port 3306 (MySQL)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are some of the most common real-world cloud misconfigurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The architecture is very simple:&lt;/p&gt;

&lt;p&gt;Rust CLI&lt;br&gt;
→ AWS SDK for Rust&lt;br&gt;
→ AWS APIs (S3, EC2)&lt;/p&gt;

&lt;p&gt;There is no agent and no infrastructure required. It runs using normal AWS credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project is split into small modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;main.rs: entry point&lt;/li&gt;
&lt;li&gt;s3_scan.rs: S3 public access checks&lt;/li&gt;
&lt;li&gt;sg_scan.rs: security group checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the code clean and easy to extend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting Up the Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create the project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cargo new cloud-guard&lt;/li&gt;
&lt;li&gt;cd cloud-guard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Add dependencies to Cargo.toml&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;[dependencies]&lt;br&gt;
aws-config = "1"&lt;br&gt;
aws-sdk-s3 = "1"&lt;br&gt;
aws-sdk-ec2 = "1"&lt;br&gt;
tokio = { version = "1", features = ["full"] }&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configure AWS credentials&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;aws configure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: Scanning for Public S3 Buckets&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tool lists all buckets and checks their ACLs for public access.&lt;/p&gt;

&lt;p&gt;The logic is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call ListBuckets&lt;/li&gt;
&lt;li&gt;For each bucket, call GetBucketAcl&lt;/li&gt;
&lt;li&gt;If the grantee contains AllUsers, the bucket is public&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mirrors how real CSPM tools work internally.&lt;/p&gt;
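&lt;p&gt;CloudGuard itself is written in Rust, but the check is SDK-agnostic. Here is the same logic sketched in Python with boto3 purely for illustration; it is not the repo's code.&lt;/p&gt;

```python
# Python/boto3 sketch of the S3 check described above (CloudGuard is Rust)
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"


def is_public(grants):
    # A bucket ACL is public when any grant targets the AllUsers group
    return any(g.get("Grantee", {}).get("URI") == ALL_USERS for g in grants)


def scan_buckets():
    import boto3

    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        acl = s3.get_bucket_acl(Bucket=bucket["Name"])
        if is_public(acl["Grants"]):
            print(f"Public bucket found: {bucket['Name']}")
```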

&lt;p&gt;&lt;strong&gt;Example: Scanning Open Security Groups&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The security group scan works like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call DescribeSecurityGroups&lt;/li&gt;
&lt;li&gt;Loop through inbound rules&lt;/li&gt;
&lt;li&gt;If CIDR is 0.0.0.0/0 and port is sensitive, flag it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the same logic used in enterprise security tools.&lt;/p&gt;
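&lt;p&gt;Again sketched in Python with boto3 for illustration (the repo's implementation is Rust); the port table matches the list above.&lt;/p&gt;

```python
# Python/boto3 sketch of the security group check (CloudGuard is Rust)
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP", 3306: "MySQL"}


def open_sensitive_rules(permissions):
    """Return (from_port, to_port) ranges open to 0.0.0.0/0 on sensitive ports."""
    flagged = []
    for perm in permissions:
        open_to_world = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        lo, hi = perm.get("FromPort"), perm.get("ToPort")
        if open_to_world and lo is not None and any(p in range(lo, hi + 1) for p in SENSITIVE_PORTS):
            flagged.append((lo, hi))
    return flagged


def scan_security_groups():
    import boto3

    ec2 = boto3.client("ec2")
    for sg in ec2.describe_security_groups()["SecurityGroups"]:
        for lo, hi in open_sensitive_rules(sg["IpPermissions"]):
            print(f"Open SG: {sg['GroupName']} on ports {lo}-{hi}")
```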

&lt;p&gt;&lt;strong&gt;Running the Tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run it locally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cargo run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You will get output like&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;=== S3 Public Bucket Scan ===&lt;br&gt;
Public bucket found: test-assets-bucket&lt;/p&gt;

&lt;p&gt;=== Security Group Scan ===&lt;br&gt;
Open SG: web-sg on ports 22-22&lt;/p&gt;

&lt;p&gt;That is already a working cloud security scanner.&lt;br&gt;
GitHub repo: &lt;a href="https://github.com/Copubah/aws-cloudguard" rel="noopener noreferrer"&gt;https://github.com/Copubah/aws-cloudguard&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>aws</category>
      <category>security</category>
    </item>
    <item>
      <title>Kubernetes Essentials</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Sat, 03 Jan 2026 12:46:48 +0000</pubDate>
      <link>https://dev.to/copubah/kubernetes-essentials-4j74</link>
      <guid>https://dev.to/copubah/kubernetes-essentials-4j74</guid>
      <description>&lt;p&gt;Kubernetes is the go-to platform for managing containerized applications at scale. Here’s a concise guide to the basics every developer or SysOps engineer should know&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Definition: Smallest deployable unit in Kubernetes; can run one or more containers.&lt;/p&gt;

&lt;p&gt;Commands:&lt;/p&gt;

&lt;p&gt;kubectl get pods                      # List all pods&lt;br&gt;
kubectl describe pod &amp;lt;pod-name&amp;gt;   # Detailed info&lt;br&gt;
kubectl logs &amp;lt;pod-name&amp;gt;           # View container logs&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Deployments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Purpose: Ensure your application runs with the desired number of replicas. Handles updates and rollbacks automatically.&lt;/p&gt;

&lt;p&gt;Commands:&lt;/p&gt;

&lt;p&gt;kubectl create deployment &amp;lt;name&amp;gt; --image=&amp;lt;image&amp;gt;&lt;br&gt;
kubectl get deployments&lt;br&gt;
kubectl scale deployment &amp;lt;name&amp;gt; --replicas=N&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Services&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Purpose: Expose pods to internal or external traffic.&lt;br&gt;
Types:&lt;/p&gt;

&lt;p&gt;ClusterIP – internal only (default)&lt;/p&gt;

&lt;p&gt;NodePort – accessible via node IP&lt;/p&gt;

&lt;p&gt;LoadBalancer – external access via cloud LB&lt;/p&gt;

&lt;p&gt;Commands:&lt;/p&gt;

&lt;p&gt;kubectl expose deployment &amp;lt;name&amp;gt; --type=NodePort --port=80&lt;br&gt;
kubectl get svc&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Common Commands&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;kubectl get all – List all resources in the cluster&lt;/p&gt;

&lt;p&gt;kubectl delete pod &amp;lt;pod-name&amp;gt; – Remove a pod&lt;/p&gt;

&lt;p&gt;kubectl apply -f &amp;lt;file&amp;gt; – Apply configuration files&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Troubleshooting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;kubectl describe pod &amp;lt;pod-name&amp;gt; – Check events, errors, or misconfigurations&lt;/p&gt;

&lt;p&gt;kubectl logs &amp;lt;pod-name&amp;gt; – Inspect application logs&lt;/p&gt;

&lt;p&gt;kubectl get nodes – Check node health and availability&lt;/p&gt;

&lt;p&gt;Tip: Always start by inspecting pods and their logs when troubleshooting, then check deployments and services. Kubernetes is powerful, but clear visibility into resources makes management easier.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>containers</category>
      <category>cloudnative</category>
      <category>devops</category>
    </item>
    <item>
      <title>Deploying a Highly Available AWS Architecture with Terraform</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Thu, 01 Jan 2026 12:40:49 +0000</pubDate>
      <link>https://dev.to/copubah/deploying-a-highly-available-aws-architecture-with-terraform-39pb</link>
      <guid>https://dev.to/copubah/deploying-a-highly-available-aws-architecture-with-terraform-39pb</guid>
      <description>&lt;p&gt;High availability is one of those concepts everyone mentions, but far fewer people actually implement correctly. In many demo projects, availability stops at launching an EC2 instance and exposing it to the internet. That works until something breaks, and in production something always breaks.&lt;/p&gt;

&lt;p&gt;In this post, I walk through a Terraform project where the primary goal is resilience. The architecture is designed to keep serving traffic even when individual instances or an entire Availability Zone fails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The full source code is available here&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://github.com/Copubah/Terraform-AWS-Multi-AZ-Highly-Available-Architecture" rel="noopener noreferrer"&gt;https://github.com/Copubah/Terraform-AWS-Multi-AZ-Highly-Available-Architecture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I Built This Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wanted a project that answers practical questions instead of just showing that something is running.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens if an EC2 instance crashes?&lt;/li&gt;
&lt;li&gt;What happens if an Availability Zone becomes unavailable?&lt;/li&gt;
&lt;li&gt;How fast can the environment be rebuilt from scratch?&lt;/li&gt;
&lt;li&gt;How cleanly is the infrastructure defined and reused?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project focuses on those questions using Terraform as the single source of truth for infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High Level Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At a high level, the architecture follows a common and proven AWS pattern.&lt;br&gt;
User traffic enters through an Application Load Balancer deployed in public subnets. The load balancer distributes traffic to application instances running in private subnets across multiple Availability Zones. An Auto Scaling Group ensures that capacity is always maintained. Supporting networking components like NAT Gateways and route tables ensure instances can communicate outbound without being publicly exposed.&lt;/p&gt;

&lt;p&gt;Every major component is spread across at least two Availability Zones to eliminate single points of failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core AWS Components Used&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPC&lt;/strong&gt;&lt;br&gt;
A custom VPC provides full control over networking. DNS support and hostnames are enabled to support internal service discovery and load balancing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subnets&lt;/strong&gt;&lt;br&gt;
Public subnets host the Application Load Balancer and NAT Gateways. Private subnets host the EC2 instances. Subnets are evenly distributed across Availability Zones to ensure redundancy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application Load Balancer&lt;/strong&gt;&lt;br&gt;
The ALB acts as the entry point to the system. It performs health checks on backend instances and only routes traffic to healthy targets. If an instance fails health checks, it is automatically removed from rotation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto Scaling Group&lt;/strong&gt;&lt;br&gt;
The Auto Scaling Group maintains a minimum number of EC2 instances across multiple Availability Zones. If an instance terminates or becomes unhealthy, Auto Scaling replaces it automatically. This is one of the key pieces that enables self healing behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Groups&lt;/strong&gt;&lt;br&gt;
Security groups are tightly scoped. The load balancer allows inbound HTTP traffic from the internet. Application instances only allow inbound traffic from the load balancer. This reduces the attack surface and follows least privilege principles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terraform and Modularity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the main goals of this project was clean Terraform structure.&lt;/p&gt;

&lt;p&gt;Instead of placing everything in a single main.tf file, the infrastructure is broken into reusable modules. Each module is responsible for a single concern such as networking, load balancing, or compute. This mirrors how Terraform is used in real teams and makes the code easier to reason about.&lt;/p&gt;

&lt;p&gt;Each module contains its own variables and outputs, which keeps dependencies explicit and avoids hidden coupling. The root module simply wires everything together.&lt;/p&gt;

&lt;p&gt;This modular approach also makes future expansion straightforward. Adding a database layer or extending to multi region deployments would not require restructuring the existing code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Scenarios and How the Architecture Responds&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instance failure&lt;/strong&gt;&lt;br&gt;
If an EC2 instance crashes or is terminated, the load balancer stops sending traffic to it. The Auto Scaling Group detects the capacity drop and launches a replacement instance automatically.&lt;/p&gt;
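
&lt;p&gt;This replacement loop is easy to verify with a small chaos-test script. The sketch below is illustrative and not part of the repo: it assumes boto3 credentials and an Auto Scaling Group name (app-asg is a placeholder), terminates one instance, and polls until the group is back at its desired capacity. pick_victim is kept pure so it can be tested without AWS.&lt;/p&gt;

```python
import random
import time

def pick_victim(instances):
    """Return one InService instance id to terminate, or None if none exist."""
    healthy = [i["InstanceId"] for i in instances if i["LifecycleState"] == "InService"]
    return random.choice(healthy) if healthy else None

def chaos_test(asg_name="app-asg"):
    import boto3  # imported here so the module loads without AWS deps
    asg = boto3.client("autoscaling")
    group = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name])["AutoScalingGroups"][0]
    victim = pick_victim(group["Instances"])
    if victim is None:
        return
    # Keep desired capacity unchanged so the ASG is forced to launch a replacement
    asg.terminate_instance_in_auto_scaling_group(
        InstanceId=victim, ShouldDecrementDesiredCapacity=False)
    while True:
        group = asg.describe_auto_scaling_groups(
            AutoScalingGroupNames=[asg_name])["AutoScalingGroups"][0]
        in_service = [i for i in group["Instances"] if i["LifecycleState"] == "InService"]
        if len(in_service) >= group["DesiredCapacity"]:
            break
        time.sleep(15)
```

&lt;p&gt;Run something like this against a non-production environment first; ShouldDecrementDesiredCapacity=False is what forces the group to launch a replacement.&lt;/p&gt;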

&lt;p&gt;&lt;strong&gt;Availability Zone failure&lt;/strong&gt;&lt;br&gt;
If an entire Availability Zone becomes unavailable, the load balancer routes traffic only to healthy instances in the remaining zones. Auto Scaling launches new instances in available zones to maintain capacity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traffic spikes&lt;/strong&gt;&lt;br&gt;
Auto Scaling policies can be added to scale out based on load. The architecture already supports horizontal scaling without any redesign.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure rebuild&lt;/strong&gt;&lt;br&gt;
Because everything is defined in Terraform, the entire environment can be destroyed and recreated consistently. This is critical for disaster recovery and reproducibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Makes a Strong Portfolio Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project focuses on reliability rather than visual complexity. It demonstrates understanding of core AWS concepts such as Availability Zones, load balancing, self healing infrastructure, and infrastructure as code.&lt;/p&gt;

&lt;p&gt;It also shows discipline in Terraform usage through modular design, clear variable definitions, and reproducibility. These are the qualities teams look for when reviewing real world infrastructure code.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>automation</category>
      <category>hcl</category>
    </item>
    <item>
      <title>Hands-On Journey Experimenting with Kubernetes: FastAPI + React Deployment</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Tue, 23 Dec 2025 17:03:12 +0000</pubDate>
      <link>https://dev.to/copubah/hands-on-journey-experimenting-with-kubernetes-fastapi-react-deployment-1p7m</link>
      <guid>https://dev.to/copubah/hands-on-journey-experimenting-with-kubernetes-fastapi-react-deployment-1p7m</guid>
      <description>&lt;p&gt;Over the past few weeks, I’ve been diving deep into Kubernetes, exploring how to deploy and manage containerized applications. To really understand the mechanics, I decided to create a small but complete full-stack project: a FastAPI backend with a React frontend running on a local Kubernetes cluster using Kind (Kubernetes in Docker).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I Started Experimenting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes is powerful but complex, and reading documentation can only take you so far. I wanted to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;See how Deployments and Services interact in real-time&lt;/li&gt;
&lt;li&gt;Understand how frontend and backend communicate inside a cluster&lt;/li&gt;
&lt;li&gt;Experiment with replicas, scaling, and networking without affecting production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project became my sandbox for testing Kubernetes concepts and workflows hands-on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I Set Up the Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I structured the project into three main areas:&lt;/p&gt;

&lt;p&gt;1. Backend (FastAPI)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs on Python 3.11, serving API requests on port 80&lt;/li&gt;
&lt;li&gt;Exposed internally with a ClusterIP service&lt;/li&gt;
&lt;li&gt;I’ve experimented with creating endpoints and testing them using pytest&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2. Frontend (React + Nginx)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React app is built statically and served by Nginx&lt;/li&gt;
&lt;li&gt;Exposed externally with a LoadBalancer service&lt;/li&gt;
&lt;li&gt;Configured SPA routing and CORS headers to communicate seamlessly with the backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3. Kubernetes Manifests&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate YAML files for backend and frontend&lt;/li&gt;
&lt;li&gt;Each deployment has 2 replicas&lt;/li&gt;
&lt;li&gt;Services use Kubernetes DNS for internal pod communication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My Hands-On Experiments&lt;/strong&gt;&lt;br&gt;
Here’s what I’ve been tinkering with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a Kind Cluster: Spinning up a lightweight Kubernetes environment locally&lt;/li&gt;
&lt;li&gt;Loading Docker Images: Pre-loading backend and frontend images into the cluster with kind load docker-image&lt;/li&gt;
&lt;li&gt;Deploying and Updating: Iteratively modifying endpoints, rebuilding images, and redeploying&lt;/li&gt;
&lt;li&gt;Service Discovery: Using Kubernetes DNS for frontend-backend communication instead of hardcoded IPs&lt;/li&gt;
&lt;li&gt;Debugging: Inspecting logs, port-forwarding services, and fixing issues like ImagePullBackOff and misconfigured CORS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Through this, I’ve gained a deeper understanding of&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replicas and pod distribution&lt;/li&gt;
&lt;li&gt;ClusterIP vs LoadBalancer services&lt;/li&gt;
&lt;li&gt;Internal pod networking&lt;/li&gt;
&lt;li&gt;How environment variables manage API connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Testing &amp;amp; Local Development&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tested with pytest and FastAPI TestClient&lt;/li&gt;
&lt;li&gt;GET / returns a 200 status code with JSON&lt;/li&gt;
&lt;li&gt;Tests run locally with make test&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Points to backend API via REACT_APP_API_URL&lt;/li&gt;
&lt;li&gt;Can be tested locally before deploying to Kubernetes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Experimenting with Kubernetes is all about deploying, breaking, fixing, and iterating. Creating a small project like this is the fastest way to understand how everything fits together.&lt;br&gt;
Check out the repo: &lt;a href="https://github.com/Copubah/simple-k8s-project" rel="noopener noreferrer"&gt;https://github.com/Copubah/simple-k8s-project&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>fastapi</category>
      <category>react</category>
      <category>docker</category>
    </item>
    <item>
      <title>CI/CD for Dummies</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Sat, 20 Dec 2025 16:50:17 +0000</pubDate>
      <link>https://dev.to/copubah/cicd-for-dummies-29n0</link>
      <guid>https://dev.to/copubah/cicd-for-dummies-29n0</guid>
<description>&lt;p&gt;Continuous Integration and Continuous Deployment (or Delivery), better known as CI/CD, might sound like a fancy buzzword. But at its core, it’s just automation that helps make sure your code is tested, validated, and safely shipped every time you make changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is CI/CD?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CI/CD is a workflow that automatically takes your code from development all the way through testing and ready-to-deploy stages without manual steps.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Continuous Integration (CI) means your code gets automatically tested and merged early and often. Every push triggers automated builds and tests so bugs don’t sneak in. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Continuous Delivery (CD) means your code is always in a deployable state. Once tests pass, deployments can be triggered with a click. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Continuous Deployment goes one step further — if all checks pass, your code goes straight to production without human intervention. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In traditional development, teams write code, then hand it over to QA and operations with lots of manual steps. CI/CD flips that script by automating the whole lifecycle so you can ship faster and with confidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why CI/CD Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s why engineers love CI/CD:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster feedback: tests run automatically on every commit, so you catch problems early.&lt;/li&gt;
&lt;li&gt;Reliable deployments: automation means fewer mistakes compared to manual deploys.&lt;/li&gt;
&lt;li&gt;Better collaboration: developers integrate changes more frequently with less conflict.&lt;/li&gt;
&lt;li&gt;Confidence in releases: because tests run every time, you know the code is solid before deploying.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A Simple CI/CD Example with GitHub Actions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To make CI/CD tangible, I built a basic Node.js project that demonstrates a live pipeline using GitHub Actions. The idea is simple: every time you push code, GitHub tests it automatically.&lt;/p&gt;

&lt;p&gt;Here’s what happens under the hood:&lt;/p&gt;

&lt;p&gt;1. You push your code to GitHub.&lt;br&gt;
2. GitHub Actions sees the change and triggers the pipeline.&lt;br&gt;
3. It sets up a fresh environment, installs dependencies, and runs your tests.&lt;br&gt;
4. You get feedback right inside GitHub on whether everything passed or failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your repo structure looks like this&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;CI-CD-for-Dummies/&lt;br&gt;
├── app.js&lt;br&gt;&lt;br&gt;
├── test.js&lt;br&gt;&lt;br&gt;
├── package.json&lt;br&gt;&lt;br&gt;
├── .github/&lt;br&gt;
│   └── workflows/&lt;br&gt;
│       └── ci.yml&lt;br&gt;&lt;br&gt;
└── README.md&lt;/p&gt;

&lt;p&gt;This is all you need to set up a basic pipeline; no extra servers or tools required.&lt;/p&gt;
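
&lt;p&gt;For reference, a minimal ci.yml for a pipeline like this could look as follows (a sketch; the Node version and action versions are assumptions, not the repo’s actual workflow):&lt;/p&gt;

```yaml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install
      - run: npm test
```

&lt;p&gt;Every push or pull request spins up a fresh Ubuntu runner, installs dependencies, and runs the test suite.&lt;/p&gt;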

&lt;p&gt;&lt;strong&gt;Try it Yourself&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to experience CI/CD first-hand:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clone the project&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/Copubah/CI-CD-for-Dummies.git" rel="noopener noreferrer"&gt;https://github.com/Copubah/CI-CD-for-Dummies.git&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cd CI-CD-for-Dummies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install dependencies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;npm install&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run tests locally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;npm test&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Commit and push changes back to GitHub and watch your CI/CD pipeline run on the Actions tab.&lt;br&gt;
Watching tests run automatically on every push is a little like magic when you’re just getting started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrapping Up&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CI/CD takes away repetitive manual work so you can focus on writing code. Whether you’re just learning DevOps or want reliable hands-off deployments, automating your builds and tests with CI/CD is one of the best practices you can adopt as a developer.&lt;/p&gt;

&lt;p&gt;If you’re curious how this scales to larger projects or more advanced workflows, there are tons of tools and techniques out there — but mastering the basics will give you a huge head start.&lt;/p&gt;

</description>
      <category>cicd</category>
      <category>automation</category>
      <category>testing</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Building a Modular Serverless ETL Pipeline on AWS with Terraform &amp; Lambda</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Sun, 07 Dec 2025 08:20:28 +0000</pubDate>
      <link>https://dev.to/copubah/building-a-modular-serverless-etl-pipeline-on-aws-with-terraform-lambda-5hmf</link>
      <guid>https://dev.to/copubah/building-a-modular-serverless-etl-pipeline-on-aws-with-terraform-lambda-5hmf</guid>
<description>&lt;p&gt;Many applications, even small ones, receive data as raw CSV files (customer exports, logs, partner data dumps). Without automation to clean, validate, and store that data in a standard format, teams end up with messy data, duplicated effort, inconsistent formats, and manual steps each time new data arrives.&lt;/p&gt;

&lt;p&gt;This pipeline provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated processing of raw CSV uploads
&lt;/li&gt;
&lt;li&gt;Basic data hygiene (cleaning / validation)
&lt;/li&gt;
&lt;li&gt;Ready-to-use outputs for analytics or downstream systems
&lt;/li&gt;
&lt;li&gt;Modular, reproducible, and extendable infrastructure
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining Terraform + AWS Lambda + Amazon S3, the solution is serverless, scalable, and easy to redeploy: you don’t manage servers (AWS handles compute, storage, and scaling), and you get repeatable infrastructure deployment. This pattern is ideal for small to medium data ingestion workflows, proofs‑of‑concept, and even production‑ready ETL for modest data volumes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture &amp;amp; Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s the high-level architecture of the pipeline:&lt;/p&gt;

&lt;p&gt;Raw CSV file  -&amp;gt;  S3 raw-bucket  -&amp;gt;  S3 event trigger  -&amp;gt;  Lambda function (Python)&lt;br&gt;&lt;br&gt;
                                             │&lt;br&gt;&lt;br&gt;
                                             ▼&lt;br&gt;&lt;br&gt;
                                     Data cleaning / transformation&lt;br&gt;&lt;br&gt;
                                             │&lt;br&gt;&lt;br&gt;
                                             ▼&lt;br&gt;&lt;br&gt;
                                   Save cleaned CSV to S3 clean-bucket&lt;br&gt;&lt;br&gt;
                                             │&lt;br&gt;&lt;br&gt;
                          (optional: push cleaned data to DynamoDB / RDS)  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1. A user (or another system) uploads a CSV file into the “raw” S3 bucket.&lt;/p&gt;

&lt;p&gt;2. S3 triggers the Lambda function automatically on object creation.&lt;/p&gt;

&lt;p&gt;3. The Lambda reads the CSV, parses rows, and applies validation and transformation logic (e.g. remove invalid rows, normalize text, enforce schema).&lt;/p&gt;

&lt;p&gt;4. Cleaned data is written back to a “clean” S3 bucket, and optionally also sent to a database (like DynamoDB) or another data store.&lt;/p&gt;

&lt;p&gt;5. Because everything is managed via Terraform, you can version your infrastructure, redeploy consistently across environments (dev / staging / prod), and manage permissions cleanly.&lt;/p&gt;
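
&lt;p&gt;The validation and transformation logic in step 3 can be sketched as a pure function (the required fields and normalization rules below are illustrative assumptions, not the project’s actual schema):&lt;/p&gt;

```python
import csv
import io

REQUIRED_FIELDS = ("id", "email")  # assumed schema

def clean_rows(rows):
    """Drop rows missing required fields and normalize text values."""
    cleaned = []
    for row in rows:
        if not all(row.get(field, "").strip() for field in REQUIRED_FIELDS):
            continue  # enforce schema: skip invalid rows
        cleaned.append({key: value.strip().lower() for key, value in row.items()})
    return cleaned

def transform_csv(raw_text):
    """Parse raw CSV text, clean it, and re-serialize it."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    if not rows:
        return raw_text
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(clean_rows(rows))
    return out.getvalue()
```

&lt;p&gt;The Lambda handler then only needs to read the object from the raw bucket, call a function like transform_csv, and write the result to the clean bucket.&lt;/p&gt;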

&lt;p&gt;&lt;strong&gt;Example Use Cases&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Customer data ingestion: Partners or internal teams export user data; this pipeline cleans, standardizes, and readies it for analytics or import.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daily sales / transaction reports: Automate processing of daily uploads into a clean format ready for dashboards or billing systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Log / event data processing: Convert raw logs or CSV exports into normalized data for analytics or storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pre‑processing for analytics or machine learning: Clean and standardize raw data before loading into a data warehouse or data lake.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Archival and compliance workflows: Maintain clean, versioned, and validated data sets for audits or record‑keeping.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Learning Outcomes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure as Code with Terraform
&lt;/li&gt;
&lt;li&gt;Event-driven serverless architecture with Lambda
&lt;/li&gt;
&lt;li&gt;Secure IAM policies and resource permissions
&lt;/li&gt;
&lt;li&gt;Modular, reusable Terraform modules
&lt;/li&gt;
&lt;li&gt;Clean, maintainable ETL logic in Python
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Possible Enhancements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Schema validation and error logging
&lt;/li&gt;
&lt;li&gt;Deduplication logic using DynamoDB or file hashes
&lt;/li&gt;
&lt;li&gt;Multiple destinations (S3, DynamoDB, RDS)
&lt;/li&gt;
&lt;li&gt;Monitoring and CloudWatch metrics
&lt;/li&gt;
&lt;li&gt;Multi-format support (CSV, JSON, Parquet)
&lt;/li&gt;
&lt;li&gt;CI/CD integration
&lt;/li&gt;
&lt;li&gt;Multi-environment deployment (dev, staging, prod)
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project demonstrates how to build a real-world, production-inspired ETL pipeline on AWS. It’s a small but powerful example of combining serverless computing, IaC, and automation. Having recently started experimenting with these tools, I found this project an excellent way to learn best practices while building something tangible for a portfolio.&lt;/p&gt;

&lt;p&gt;GitHub repo: &lt;a href="https://github.com/Copubah/aws-etl-pipeline-terraform" rel="noopener noreferrer"&gt;https://github.com/Copubah/aws-etl-pipeline-terraform&lt;/a&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>lambda</category>
      <category>terraform</category>
      <category>aws</category>
    </item>
    <item>
      <title>How to Cut AWS Costs and Maintain Reliability Without a FinOps Team</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Sun, 16 Nov 2025 14:50:50 +0000</pubDate>
      <link>https://dev.to/copubah/how-to-cut-aws-costs-and-maintain-reliability-without-a-finops-team-54cl</link>
      <guid>https://dev.to/copubah/how-to-cut-aws-costs-and-maintain-reliability-without-a-finops-team-54cl</guid>
      <description>&lt;p&gt;Managing AWS costs can be overwhelming, especially for startups and development teams. Running resources 24/7, oversized instances, and lack of monitoring often lead to surprise bills. But what if you could optimize costs automatically while keeping your infrastructure reliable?&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk you through a practical approach to solving seven common AWS cost problems using automation and best practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Runaway AWS Costs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Problem: Dev/test resources run continuously, and bills spiral out of control.&lt;br&gt;
The Solution: Automatically stop non‑production resources outside business hours, scale down idle services, and implement lifecycle policies for S3 data.&lt;br&gt;
Impact: 30–50% cost reduction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Manual Cost Management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Problem: Tracking and stopping resources manually is error‑prone and time‑consuming.&lt;br&gt;
The Solution: Use Lambda functions triggered by schedules and AWS Budget alerts.&lt;br&gt;
Impact: Fully automated cost management with zero manual intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Lack of Cost Visibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Problem: Teams only notice overspending when the bill arrives.&lt;br&gt;
The Solution: AWS Budgets with thresholds (50%, 80%, 100%, 120%) send proactive alerts.&lt;br&gt;
Impact: Early warnings prevent budget overruns and surprises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Reliability vs Cost Trade-off&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Problem: Cutting costs often sacrifices uptime.&lt;br&gt;
The Solution: Deploy multi‑AZ architectures with auto‑scaling, health checks, and comprehensive monitoring.&lt;br&gt;
Impact: Save money without compromising 99.9% uptime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Resource Waste&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Problem: Idle instances, oversized servers, and old data in expensive storage tiers.&lt;br&gt;
The Solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled shutdowns of non‑production resources&lt;/li&gt;
&lt;li&gt;Right‑sized instances with auto‑scaling&lt;/li&gt;
&lt;li&gt;S3 lifecycle policies (IA after 30 days, Glacier after 90 days)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Impact: Eliminates waste across compute, storage, and databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Reactive Incident Response&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Problem: Teams only learn of issues after users complain.&lt;br&gt;
The Solution: CloudWatch alarms monitor CPU, memory, latency, errors, and system health.&lt;br&gt;
Impact: Proactive alerts and automated recovery keep downtime minimal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Complex Infrastructure Setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Problem: Building cost optimization and monitoring from scratch takes weeks.&lt;br&gt;
The Solution: Use production‑ready Terraform modules to deploy the complete infrastructure in 15 minutes.&lt;br&gt;
Impact: Best practices implemented instantly with minimal setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real‑World Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before Automation&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dev environment running 24/7: $500/month&lt;/li&gt;
&lt;li&gt;Oversized instances: $300/month&lt;/li&gt;
&lt;li&gt;Manual monitoring and cost tracking&lt;/li&gt;
&lt;li&gt;Total: $800/month + hours of manual work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After Automation (via the platform)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto‑stop dev: $250/month&lt;/li&gt;
&lt;li&gt;Right‑sized with auto‑scaling: $180/month&lt;/li&gt;
&lt;li&gt;Automated monitoring &amp;amp; alerts&lt;/li&gt;
&lt;li&gt;Total: $430/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Savings: $370/month while eliminating manual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who Benefits?&lt;/strong&gt;&lt;br&gt;
1. Startups: Manage costs while scaling quickly&lt;br&gt;
2. Dev Teams: Focus on building, not shutting down resources&lt;br&gt;
3. Finance Teams: Predictable spend with proactive alerts&lt;br&gt;
4. DevOps Teams: More time on innovation, less on management&lt;br&gt;
5. CTOs: Balance speed with cost control&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terraform Example: Deploy Auto-Stop Lambda&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;resource "aws_lambda_function" "stop_dev_instances" {&lt;br&gt;
  filename         = "lambda_function_payload.zip"&lt;br&gt;
  function_name    = "stop_dev_instances"&lt;br&gt;
  handler          = "lambda_function.lambda_handler"&lt;br&gt;
  runtime          = "python3.11"&lt;br&gt;
  role             = aws_iam_role.lambda_exec.arn&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
resource "aws_cloudwatch_event_rule" "schedule_rule" {&lt;br&gt;
  name                = "stop-dev-schedule"&lt;br&gt;
  schedule_expression = "cron(0 19 ? * MON-FRI *)"&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
resource "aws_cloudwatch_event_target" "lambda_target" {&lt;br&gt;
  rule      = aws_cloudwatch_event_rule.schedule_rule.name&lt;br&gt;
  target_id = "stopDevLambda"&lt;br&gt;
  arn       = aws_lambda_function.stop_dev_instances.arn&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
This snippet schedules stopping dev instances every weekday at 7 PM (the cron expression runs in UTC).&lt;/p&gt;
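
&lt;p&gt;The Lambda code behind lambda_function_payload.zip isn’t shown above; a minimal handler could look like this (the Environment=dev tag filter is an assumption, and instance_ids is kept pure so it can be unit tested without AWS):&lt;/p&gt;

```python
def instance_ids(reservations):
    """Flatten describe_instances reservations into a flat list of instance ids."""
    ids = []
    for reservation in reservations:
        for instance in reservation["Instances"]:
            ids.append(instance["InstanceId"])
    return ids

def lambda_handler(event, context):
    import boto3  # imported here so the module loads without AWS deps
    ec2 = boto3.client("ec2")
    # Only running instances tagged Environment=dev are eligible for shutdown
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Environment", "Values": ["dev"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ids = instance_ids(response["Reservations"])
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}
```

&lt;p&gt;The Lambda’s IAM role would also need ec2:DescribeInstances and ec2:StopInstances permissions.&lt;/p&gt;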

&lt;p&gt;&lt;strong&gt;CloudWatch Alarm Example: ECS CPU Utilization&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;resource "aws_cloudwatch_metric_alarm" "ecs_high_cpu" {&lt;br&gt;
  alarm_name          = "ecs_high_cpu"&lt;br&gt;
  comparison_operator = "GreaterThanThreshold"&lt;br&gt;
  evaluation_periods  = 2&lt;br&gt;
  metric_name         = "CPUUtilization"&lt;br&gt;
  namespace           = "AWS/ECS"&lt;br&gt;
  period              = 300&lt;br&gt;
  statistic           = "Average"&lt;br&gt;
  threshold           = 80&lt;br&gt;
  alarm_actions       = [aws_sns_topic.ops_team.arn]&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
This alarm notifies the operations team if ECS CPU usage exceeds 80% for 10 minutes.&lt;/p&gt;

&lt;p&gt;Deploying this platform gives enterprise-level cost management and reliability without a dedicated FinOps team.&lt;br&gt;
GitHub Repository: &lt;a href="https://github.com/Copubah/aws-cost-optimization-platform" rel="noopener noreferrer"&gt;https://github.com/Copubah/aws-cost-optimization-platform&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>sre</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>Building a Serverless Image Processing Pipeline on AWS with Terraform</title>
      <dc:creator>Cloudev</dc:creator>
      <pubDate>Sat, 08 Nov 2025 15:43:02 +0000</pubDate>
      <link>https://dev.to/copubah/building-a-serverless-image-processing-pipeline-on-aws-with-terraform-1fec</link>
      <guid>https://dev.to/copubah/building-a-serverless-image-processing-pipeline-on-aws-with-terraform-1fec</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I recently built a production-ready pipeline that allows users to upload images, have them processed (resized or optimized), and then stored in a separate location, all with minimal manual intervention. The project is available on GitHub at &lt;a href="https://github.com/Copubah/aws-image-processing-pipeline" rel="noopener noreferrer"&gt;aws-image-processing-pipeline&lt;/a&gt;. The goal was to leverage serverless architecture and Infrastructure as Code so that the solution is decoupled, scalable, and maintainable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Many applications need to offload image processing or other heavy tasks so that the front end remains responsive and system components can scale independently. By introducing a messaging queue and event-driven processing, we separate upload, processing and storage. This enables high throughput, error isolation and simpler operational overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The workflow is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A user uploads an image to an S3 bucket for uploads&lt;/li&gt;
&lt;li&gt;A message is sent to an SQS queue&lt;/li&gt;
&lt;li&gt;A Lambda function polls the queue, downloads the image, processes it, and uploads the result to the processed bucket&lt;/li&gt;
&lt;li&gt;Messages that fail after retries go to a Dead-Letter Queue for inspection&lt;/li&gt;
&lt;li&gt;Monitoring is provided via CloudWatch logs and metrics&lt;/li&gt;
&lt;li&gt;Terraform defines all infrastructure for versioning and reuse&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tech Stack and Tools
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AWS S3 for storing raw and processed images&lt;/li&gt;
&lt;li&gt;AWS SQS for decoupled messaging&lt;/li&gt;
&lt;li&gt;AWS Lambda with Python 3.11 for processing logic&lt;/li&gt;
&lt;li&gt;Terraform for defining and deploying resources&lt;/li&gt;
&lt;li&gt;Bash scripts for deployment and testing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation Details
&lt;/h2&gt;

&lt;p&gt;The project uses a modular Terraform structure with separate modules for S3, SQS and Lambda. Each module is reusable and focused on a single responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform Examples
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Terraform configuration (excerpt)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "uploads" {
  bucket = "uploads-bucket-example"
  acl    = "private"
}

resource "aws_s3_bucket" "processed" {
  bucket = "processed-bucket-example"
  acl    = "private"
}

resource "aws_sqs_queue" "image_queue" {
  name = "image-processing-queue"
}

resource "aws_lambda_function" "image_processor" {
  function_name = "image_processor"
  handler       = "image_processor.lambda_handler"
  runtime       = "python3.11"
  role          = aws_iam_role.lambda_role.arn
  filename      = "lambda/image_processor.zip"

  environment {
    variables = {
      UPLOADS_BUCKET   = aws_s3_bucket.uploads.bucket
      PROCESSED_BUCKET = aws_s3_bucket.processed.bucket
    }
  }
}

resource "aws_lambda_event_source_mapping" "sqs_trigger" {
  event_source_arn = aws_sqs_queue.image_queue.arn
  function_name    = aws_lambda_function.image_processor.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;lambda/image_processor.py&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because the function is triggered through the SQS event source mapping, each record body carries the original S3 event notification and is decoded before the object is read:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os
import tempfile

import boto3
from PIL import Image

s3 = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        # Each SQS record wraps an S3 event notification in its body
        s3_event = json.loads(record['body'])
        for s3_record in s3_event['Records']:
            bucket = s3_record['s3']['bucket']['name']
            key = s3_record['s3']['object']['key']

            filename = os.path.basename(key)
            download_path = os.path.join(tempfile.gettempdir(), filename)
            upload_path = os.path.join(tempfile.gettempdir(), "processed-" + filename)

            s3.download_file(bucket, key, download_path)

            with Image.open(download_path) as image:
                image = image.resize((800, 600))
                image.save(upload_path)

            s3.upload_file(upload_path, os.environ['PROCESSED_BUCKET'], key)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Deployment Walkthrough
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Configure AWS credentials locally with the AWS CLI&lt;/li&gt;
&lt;li&gt;Clone the repository from GitHub&lt;/li&gt;
&lt;li&gt;Copy terraform.tfvars.example to terraform.tfvars and update the bucket names&lt;/li&gt;
&lt;li&gt;Zip the Lambda code&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd lambda
zip -r image_processor.zip image_processor.py
mv image_processor.zip ../
cd ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;From the project root run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform init
terraform plan
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then upload an image to the upload bucket to test the pipeline, monitor Lambda execution in CloudWatch logs, and inspect the processed bucket and the Dead-Letter Queue for failures. Clean up with terraform destroy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges and Learnings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Configuring the SQS visibility timeout and retry logic required careful planning&lt;/li&gt;
&lt;li&gt;IAM role policies had to be restrictive yet functional&lt;/li&gt;
&lt;li&gt;Handling large images without Lambda timeouts required optimization&lt;/li&gt;
&lt;li&gt;Storage costs were controlled with S3 lifecycle policies&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This pipeline provides a solid foundation for serverless asynchronous workloads. Possible extensions include notifications when processing completes, multiple image transformations, or CDN integration. Building this project deepened my skills in AWS, Terraform, and event-driven architectures.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>cloud</category>
      <category>aws</category>
      <category>terraform</category>
    </item>
  </channel>
</rss>
