DEV Community: Diganto Paul

Load Balancing on AWS and GCP: A Practical Guide

Diganto Paul — Thu, 02 Jul 2026 16:37:33 +0000

Choosing and configuring the right managed load balancer for your cloud architecture

Introduction

Both AWS and GCP offer mature, fully managed load balancing services — removing the need to run and patch your own HAProxy or NGINX fleet. But each cloud has its own naming, tiers, and quirks, and picking the wrong one can mean paying for capabilities you don't need or missing ones you do. This guide walks through the load balancing options on both platforms, when to use each, and how to configure them.

1. The Load Balancer Landscape

Layer	AWS	GCP
Layer 7 (HTTP/HTTPS)	Application Load Balancer (ALB)	External/Internal HTTP(S) Load Balancer
Layer 4 (TCP/UDP)	Network Load Balancer (NLB)	External/Internal TCP/UDP Network Load Balancer
Legacy/basic	Classic Load Balancer (CLB), legacy; avoid for new builds	(no separate legacy product; older variants such as target-pool network LBs live within the two families above)
Global anycast	Global Accelerator	Global External HTTP(S) Load Balancer (native)
Service mesh / internal	AWS App Mesh (end of support: September 30, 2026) + NLB/ALB	Internal HTTP(S) LB + Traffic Director (now part of Cloud Service Mesh)

Key distinction: GCP's external HTTP(S) Load Balancer is global by default, so a single anycast IP can front backends in multiple regions. AWS's ALB and NLB are regional; global reach requires layering Global Accelerator or Route 53 on top.

2. AWS Load Balancing Options

Application Load Balancer (ALB)

Layer 7, HTTP/HTTPS/gRPC-aware. Best for microservices, path-based routing, and containerized workloads (ECS/EKS).

aws elbv2 create-load-balancer \
  --name my-app-alb \
  --subnets subnet-0123abcd subnet-0456efgh \
  --security-groups sg-0789ijkl \
  --scheme internet-facing \
  --type application

Routing rules let you send traffic to different target groups based on path or host:

aws elbv2 create-rule \
  --listener-arn <listener-arn> \
  --priority 10 \
  --conditions Field=path-pattern,Values='/api/*' \
  --actions Type=forward,TargetGroupArn=<api-target-group-arn>
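ALB evaluates listener rules in priority order, lowest number first, and falls back to the listener's default action when nothing matches. The Python sketch below models that evaluation for path-pattern conditions only. The rule list and target-group names are hypothetical, and real ALB rules can also match on host, headers, HTTP method, query string, and source IP.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    priority: int       # lower number = evaluated first, like ALB rule priorities
    path_pattern: str   # * matches any run of characters, ? matches one character
    target_group: str   # hypothetical target group name


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard path pattern into a regex; matching is case-sensitive."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def route(path: str, rules: list, default_target: str) -> str:
    """Return the target group of the first matching rule, else the default action."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if pattern_to_regex(rule.path_pattern).fullmatch(path):
            return rule.target_group
    return default_target


rules = [
    Rule(priority=20, path_pattern="/static/*", target_group="static-tg"),
    Rule(priority=10, path_pattern="/api/*", target_group="api-tg"),
]
print(route("/api/users", rules, "web-tg"))      # api-tg
print(route("/static/app.js", rules, "web-tg"))  # static-tg
print(route("/about", rules, "web-tg"))          # web-tg
```

Note that /api/* does not match a bare /api, so a service that also answers on /api needs a second condition value.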

Network Load Balancer (NLB)

Layer 4, ultra-low latency, scales to millions of requests per second, and can preserve the client source IP. Best for TCP/UDP workloads, gaming, or when raw throughput matters more than content-aware routing.

aws elbv2 create-load-balancer \
  --name my-tcp-nlb \
  --subnets subnet-0123abcd subnet-0456efgh \
  --type network \
  --scheme internet-facing

Global Accelerator

Uses AWS's global network backbone and anycast IPs to route users to the nearest healthy regional endpoint (which can itself be an ALB or NLB). Useful for multi-region failover and reducing latency for globally distributed users.

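# The Global Accelerator API is served from us-west-2, wherever your endpoints live
aws globalaccelerator create-accelerator \
  --name my-global-app \
  --ip-address-type IPV4 \
  --enabled \
  --region us-west-2
# Traffic flows only after you add a listener (create-listener) and an
# endpoint group (create-endpoint-group) that points at your ALB or NLB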

Health Checks (ALB/NLB)

aws elbv2 create-target-group \
  --name my-targets \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-0abc123 \
  --health-check-path /health \
  --health-check-interval-seconds 15 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3

3. GCP Load Balancing Options

External HTTP(S) Load Balancer (Global)

Layer 7, globally distributed via a single anycast IP, backed by Google's edge network. Best default choice for public-facing web apps and APIs.

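# Create a health check
gcloud compute health-checks create http my-health-check \
  --port 80 --request-path=/health

# Allow Google's health check probers to reach the backends
# (assumes the backends are on the "default" network)
gcloud compute firewall-rules create allow-gcp-health-checks \
  --network=default \
  --allow=tcp:80 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16

# Create a backend service
gcloud compute backend-services create my-backend-service \
  --protocol=HTTP \
  --health-checks=my-health-check \
  --global

# Map the port name "http" (the backend service's default port name) to port 80
gcloud compute instance-groups set-named-ports my-instance-group \
  --named-ports=http:80 \
  --zone=us-central1-a

# Add an instance group or NEG as a backend
gcloud compute backend-services add-backend my-backend-service \
  --instance-group=my-instance-group \
  --instance-group-zone=us-central1-a \
  --global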

# Create URL map, proxy, and forwarding rule
gcloud compute url-maps create my-url-map \
  --default-service=my-backend-service

gcloud compute target-http-proxies create my-http-proxy \
  --url-map=my-url-map

gcloud compute forwarding-rules create my-http-rule \
  --global \
  --target-http-proxy=my-http-proxy \
  --ports=80

Internal HTTP(S) Load Balancer

Layer 7 routing like the external version, but regional and reachable only from inside the VPC (and connected networks). Ideal for internal microservice-to-microservice traffic within GKE or Compute Engine.

External/Internal TCP/UDP Network Load Balancer

Layer 4 pass-through balancing, preserves source IP, minimal latency overhead. Choose this for non-HTTP protocols or when you need raw packet forwarding performance.

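# Legacy target-pool-based rule; assumes my-target-pool already exists in us-central1.
# For new builds, Google recommends backend-service-based network load balancers.
gcloud compute forwarding-rules create my-tcp-rule \
  --region=us-central1 \
  --ports=443 \
  --target-pool=my-target-pool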

GKE-Native Load Balancing

On GKE, a standard Service of type LoadBalancer automatically provisions a GCP Network Load Balancer:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080

For Layer 7 routing on GKE, an Ingress resource provisions a Global External HTTP(S) Load Balancer automatically:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  rules:
    - http:
        paths:
          - path: /api/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: api-service
                port:
                  number: 80

4. Choosing Between AWS and GCP Options

Your Need	AWS	GCP
Public HTTP(S) web app, single region	ALB	External HTTP(S) LB (regional backend, still global IP)
Public HTTP(S) app, multi-region, one IP	ALB + Global Accelerator	External HTTP(S) LB (global by default)
Raw TCP/UDP, high throughput, preserve client IP	NLB	External/Internal TCP/UDP LB
Internal service-to-service traffic	ALB/NLB with `internal` scheme	Internal HTTP(S) or TCP/UDP LB
Kubernetes workloads	ALB/NLB via AWS Load Balancer Controller (EKS)	Native via GKE Ingress/Service
gRPC	ALB (native support)	External/Internal HTTP(S) LB (native support)

5. Health Checks and Failover

Both platforms let you tune how quickly failures are detected. Aim for settings where a single blip does not remove a healthy backend, but real failures are still caught quickly.

AWS: --healthy-threshold-count / --unhealthy-threshold-count on target groups, plus --health-check-interval-seconds.
GCP: --check-interval, --healthy-threshold, and --unhealthy-threshold on gcloud compute health-checks.

# GCP: tune check sensitivity
gcloud compute health-checks update http my-health-check \
  --check-interval=10s \
  --unhealthy-threshold=3 \
  --healthy-threshold=2

Recommendation: start with 2 to 3 consecutive failures to mark a backend unhealthy (the examples above use 3) and 2 consecutive successes to mark it healthy again. That is aggressive enough to react fast, yet conservative enough to avoid flapping.
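A quick way to reason about these settings is that removal and recovery each take roughly interval × threshold. The Python sketch below applies that rule of thumb to the AWS target group and GCP health check values used above; it ignores per-probe timeouts and jitter, so treat the results as estimates rather than guarantees.

```python
def detection_window_seconds(interval_s, threshold):
    """Rough time for `threshold` consecutive probes, spaced `interval_s` apart, to complete.

    Ignores per-probe timeouts and scheduling jitter, so treat it as an estimate.
    """
    return interval_s * threshold


# AWS target group example: 15 s interval, 3 failures to remove, 2 successes to restore
aws_unhealthy = detection_window_seconds(15, 3)
aws_recover = detection_window_seconds(15, 2)

# GCP health check update example: 10 s interval, 3 failures to remove, 2 successes to restore
gcp_unhealthy = detection_window_seconds(10, 3)
gcp_recover = detection_window_seconds(10, 2)

print(f"AWS: ~{aws_unhealthy}s to remove, ~{aws_recover}s to restore")  # AWS: ~45s to remove, ~30s to restore
print(f"GCP: ~{gcp_unhealthy}s to remove, ~{gcp_recover}s to restore")  # GCP: ~30s to remove, ~20s to restore
```

Lowering the GCP unhealthy threshold to 2 would cut removal to roughly 20 s, at the cost of reacting to any two consecutive slow probes.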

6. TLS Termination

Both clouds support terminating TLS at the load balancer, offloading certificate management from your application servers.

AWS (ALB with ACM certificate):

aws elbv2 create-listener \
  --load-balancer-arn <alb-arn> \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=<acm-cert-arn> \
  --default-actions Type=forward,TargetGroupArn=<target-group-arn>

GCP (HTTP(S) LB with Google-managed certificate):

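# Google-managed certificates finish provisioning only after example.com
# resolves to the IP of the HTTPS forwarding rule
gcloud compute ssl-certificates create my-cert \
  --domains=example.com \
  --global

gcloud compute target-https-proxies create my-https-proxy \
  --url-map=my-url-map \
  --ssl-certificates=my-cert

gcloud compute forwarding-rules create my-https-rule \
  --global \
  --target-https-proxy=my-https-proxy \
  --ports=443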

Both AWS Certificate Manager and Google-managed SSL certificates auto-renew, so once configured correctly this is largely maintenance-free.

Closing Thoughts

AWS and GCP take slightly different philosophies — AWS gives you regional building blocks (ALB, NLB) that you compose into a global architecture with Global Accelerator, while GCP's HTTP(S) Load Balancer is global by default. Neither is strictly better; the right choice depends on whether your traffic is HTTP-aware or raw TCP/UDP, whether you need global anycast reach, and how your workloads are deployed. Once you match the load balancer type to the traffic pattern, both platforms make the operational side — health checks, TLS, scaling — largely hands-off.

Load Balancing in System Design: A Practical Guide

Diganto Paul — Thu, 02 Jul 2026 16:32:06 +0000

How to distribute traffic, choose the right algorithm, and keep systems resilient at scale

Introduction

Every system that outgrows a single server eventually faces the same question: how do you spread incoming requests across multiple machines without breaking things? Load balancing is the answer — but doing it well requires more than just "add a load balancer." This guide covers the core concepts, algorithms, and architectural decisions behind effective load balancing in modern system design.

1. Why Load Balancing Matters

A load balancer sits between clients and backend servers, distributing requests so that:

No single server is overwhelmed while others sit idle
Failed servers are automatically removed from rotation
The system can scale horizontally by adding more servers behind the balancer
Latency is reduced by routing to the nearest or fastest available server

Without load balancing, scaling means buying a bigger machine (vertical scaling) — a strategy that hits a ceiling fast and creates a single point of failure.

2. Layer 4 vs. Layer 7 Load Balancing

Type	Operates At	Routing Decisions Based On	Examples
Layer 4 (Transport)	TCP/UDP	IP address, port	AWS NLB, HAProxy (TCP mode), IPVS
Layer 7 (Application)	HTTP/HTTPS	URL path, headers, cookies, content	NGINX, AWS ALB, Envoy

Layer 4 is faster and simpler — it just forwards packets. Layer 7 is smarter — it can route /api/* to one service and /static/* to another, terminate TLS, and inspect request content. Most modern architectures use Layer 7 for flexibility, falling back to Layer 4 when raw throughput matters most.

3. Load Balancing Algorithms

Choosing the right algorithm depends on your traffic pattern and backend characteristics.

Round Robin — requests distributed sequentially across servers. Simple, but ignores server load.
Least Connections — routes to the server with the fewest active connections. Better for long-lived or uneven requests.
Weighted Round Robin / Least Connections — accounts for servers with different capacities.
IP Hash — routes based on client IP, giving session affinity without cookie-based sticky sessions.
Least Response Time — sends traffic to the fastest-responding, least-loaded server. More adaptive, more overhead.
Random with Two Choices — picks two servers at random and routes to the less loaded one; a good balance of simplicity and effectiveness at scale.
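The selection logic behind several of these algorithms fits in a few lines each. The Python sketch below is a simplified, single-process model (no concurrency, hypothetical server addresses and connection counts), meant to show how each algorithm decides rather than how a production balancer implements it.

```python
import hashlib
import itertools
import random


class RoundRobin:
    """Cycle through servers in order, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


def least_connections(active):
    """Pick the server with the fewest active connections (ties go to the first listed)."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP deterministically to a server using a stable hash."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def power_of_two_choices(active, rng=random):
    """Sample two distinct servers at random and keep the less loaded one."""
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
active = {"10.0.0.1": 12, "10.0.0.2": 3, "10.0.0.3": 7}  # made-up connection counts

rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
print(least_connections(active))      # 10.0.0.2
print(power_of_two_choices(active))   # never 10.0.0.1, the busiest server
```

One caveat on the modulo-based IP hash: adding or removing a server changes `len(servers)`, which remaps most clients to a different backend.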

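# Both blocks belong inside the http { } context of nginx.conf
upstream backend {
    least_conn;  # fewest active connections, taking server weights into account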
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=2;
    server 10.0.0.3:8080 weight=1;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
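The upstream block above combines least_conn with weights, which amounts to picking the server with the lowest ratio of active connections to weight. The Python sketch below models that idea; it is a simplification (nginx breaks ties with weighted round robin, while this sketch picks the first tied server), and the connection counts are made up.

```python
def weighted_least_conn(servers):
    """Pick the server with the lowest active-connections-to-weight ratio.

    `servers` maps address -> (active_connections, weight).
    Ties go to the first server listed (a simplification of nginx's behavior).
    """
    return min(servers, key=lambda s: servers[s][0] / servers[s][1])


def simulate(weights, new_connections):
    """Open connections one at a time (none close) and count where each one lands."""
    pool = {addr: (0, w) for addr, w in weights.items()}
    for _ in range(new_connections):
        addr = weighted_least_conn(pool)
        conns, w = pool[addr]
        pool[addr] = (conns + 1, w)
    return {addr: conns for addr, (conns, _) in pool.items()}


# Weights mirror the upstream block: 3, 2, 1.
pool = {
    "10.0.0.1:8080": (9, 3),  # ratio 3.0
    "10.0.0.2:8080": (4, 2),  # ratio 2.0
    "10.0.0.3:8080": (3, 1),  # ratio 3.0
}
print(weighted_least_conn(pool))  # 10.0.0.2:8080

weights = {"10.0.0.1:8080": 3, "10.0.0.2:8080": 2, "10.0.0.3:8080": 1}
print(simulate(weights, 60))
# {'10.0.0.1:8080': 30, '10.0.0.2:8080': 20, '10.0.0.3:8080': 10}
```

When connections never close, new connections split in proportion to the weights (3:2:1); with real traffic, servers that finish requests faster attract more of the load.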

Tip: Start with round robin or least connections. Reach for adaptive algorithms only once you have metrics showing simple strategies aren't enough.

4. Health Checks

A load balancer is only as good as its ability to detect unhealthy servers.

Active health checks — the balancer periodically pings a /health endpoint.
Passive health checks — the balancer observes real traffic failures (timeouts, 5xx errors) and reacts.

# Illustrative, tool-agnostic settings; field names vary by load balancer
healthCheck:
  path: /health
  intervalSeconds: 10
  timeoutSeconds: 2
  unhealthyThreshold: 3
  healthyThreshold: 2

Best practices:

Keep health check endpoints lightweight — don't run full dependency checks on every ping.
Use both active and passive checks together; passive checks catch issues active checks miss between intervals.
Set thresholds to avoid flapping (a server bouncing in and out of rotation).
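The thresholds in the config above are what prevent flapping: a backend changes state only after a run of consecutive results that disagree with its current state. The Python sketch below models that state machine using the same unhealthyThreshold (3) and healthyThreshold (2); the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Flip a backend's state only after enough consecutive contrary probe results."""

    unhealthy_threshold: int = 3  # consecutive failures before removal
    healthy_threshold: int = 2    # consecutive successes before re-adding
    healthy: bool = True
    streak: int = 0               # consecutive results that argue for flipping state

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            # Result agrees with the current state, so any pending flip is cancelled.
            self.streak = 0
        else:
            self.streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self.streak >= limit:
                self.healthy = not self.healthy
                self.streak = 0
        return self.healthy


b = BackendHealth()
# A single blip does not remove the backend...
print([b.record(ok) for ok in [True, False, True, True]])  # [True, True, True, True]
# ...but three consecutive failures do, and two successes bring it back.
print([b.record(ok) for ok in [False, False, False, True, True]])  # [True, True, False, False, True]
```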

5. Session Persistence (Sticky Sessions)

Some applications need a client to keep hitting the same backend server, typically because session state is stored in memory rather than a shared store.

Approach	How It Works	Trade-off
Cookie-based affinity	LB injects a cookie tying the client to a server	If that server goes down, the client is re-routed and any in-memory session state is lost
IP hash	Client IP maps deterministically to a server	Uneven for clients behind shared IPs (e.g., NAT)
Stateless design	Session state moved to Redis/DB, no affinity needed	Requires architectural change, but most scalable
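The cookie-based row of the table can be made concrete with a small model: the balancer pins a client to a backend via a cookie and re-pins it when that backend disappears. The Python sketch below assumes a hypothetical cookie name (`lb_backend`) and round robin for first-time clients; it shows exactly where the trade-off bites, since the re-pin is the moment in-memory session state is lost.

```python
import itertools

AFFINITY_COOKIE = "lb_backend"  # hypothetical cookie name


class StickyBalancer:
    """Cookie-based affinity on top of round robin, with fallback when the pinned server is down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._rr = itertools.cycle(self.servers)

    def _next_healthy(self):
        for _ in range(len(self.servers)):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")

    def route(self, cookies):
        """Return (backend, cookies to set on the response)."""
        pinned = cookies.get(AFFINITY_COOKIE)
        if pinned in self.healthy:
            return pinned, {}
        # No cookie, or the pinned server is gone: choose a new backend and re-pin.
        # Any session state held in memory on the old server is lost here.
        backend = self._next_healthy()
        return backend, {AFFINITY_COOKIE: backend}


lb = StickyBalancer(["app-1", "app-2"])
backend, set_cookie = lb.route({})
print(backend, set_cookie)    # app-1 {'lb_backend': 'app-1'}
print(lb.route(set_cookie))   # ('app-1', {})
lb.healthy.discard("app-1")   # app-1 fails
print(lb.route(set_cookie))   # ('app-2', {'lb_backend': 'app-2'})
```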

Recommendation: where possible, design services to be stateless and externalize session data. It removes the need for sticky sessions entirely and makes failover seamless.

6. Global vs. Local Load Balancing

Local load balancing distributes traffic across servers within a single data center or region.
Global load balancing distributes traffic across regions, typically using DNS-based or network-level techniques:
- GeoDNS — routes users to the nearest region
- Anycast routing — same IP announced from multiple locations; network routes to the closest
- Latency-based routing — routes based on measured latency (e.g., AWS Route 53)

A typical production setup layers both:

Client → Global LB (DNS/Anycast) → Regional Load Balancer → Service Instances

This combination minimizes latency for users while still balancing load within each region.
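The global tier of that chain, when it uses latency-based routing, boils down to "lowest measured latency among healthy regions". The Python sketch below models only that decision; the region names and latency figures are hypothetical, and real services such as Route 53 use their own latency measurements rather than per-client probes like these.

```python
def choose_region(latency_ms, healthy):
    """Send the client to the healthy region with the lowest measured latency."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy regions")
    return min(candidates, key=candidates.get)


# Hypothetical latencies from one client's vantage point
measured = {"us-east-1": 85.0, "eu-west-1": 22.0, "ap-southeast-1": 190.0}

print(choose_region(measured, healthy={"us-east-1", "eu-west-1", "ap-southeast-1"}))  # eu-west-1
# If eu-west-1 fails its health checks, the next-closest region takes over.
print(choose_region(measured, healthy={"us-east-1", "ap-southeast-1"}))  # us-east-1
```

Once a region is chosen, the regional load balancer applies one of the local algorithms from section 3.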

7. Load Balancers as a Single Point of Failure

Ironically, a load balancer can itself become the bottleneck or single point of failure if not designed carefully.

Mitigations:

Run load balancers in active-active or active-passive pairs.
Use a floating/virtual IP (via keepalived or a cloud-managed VIP) that fails over automatically.
For DNS-based global balancing, keep TTLs low enough to allow fast failover, but not so low that DNS query volume becomes a cost or performance issue.

          ┌────────────┐
          │    VIP     │
          └─────┬──────┘
        ┌───────┴────────┐
        │                │
  ┌─────▼──────┐   ┌─────▼──────┐
  │ LB Node A  │   │ LB Node B  │   (active-passive, keepalived)
  │  (active)  │   │ (standby)  │
  └─────┬──────┘   └────────────┘
        │
  ┌─────▼──────────────────┐
  │  Backend Server Pool   │
  └────────────────────────┘

8. Rate Limiting and Load Shedding

Load balancing isn't just about distribution — it's also about protecting backends from being overwhelmed entirely.

Rate limiting — cap requests per client/IP/API key to prevent abuse or runaway clients.
Circuit breaking — stop routing to a backend that's failing repeatedly, giving it time to recover.
Load shedding — intentionally drop or reject low-priority requests when the system is at capacity, preserving service for critical traffic.

# Illustrative, tool-agnostic settings; field names vary by gateway or load balancer
rateLimiting:
  requestsPerSecond: 100
  burst: 20
  keyBy: client_ip
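A common way to implement settings like these is a token bucket per key. The Python sketch below assumes requestsPerSecond maps to the refill rate and burst to the bucket capacity; that mapping is an interpretation of the generic config, not any specific product's semantics. A simulated clock keeps the example deterministic.

```python
import time


class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Add tokens for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client IP, mirroring keyBy: client_ip.
buckets = {}


def allow_request(client_ip, clock=time.monotonic):
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(rate=100, capacity=20, clock=clock)
    return buckets[client_ip].allow()


fake_now = [0.0]


def fake_clock():
    return fake_now[0]


burst = [allow_request("198.51.100.7", fake_clock) for _ in range(25)]
print(burst.count(True))  # 20: the burst allowance is spent, the last 5 are rejected
fake_now[0] += 0.05       # 50 ms later, 100 tokens/s has refilled 5 tokens
print(sum(allow_request("198.51.100.7", fake_clock) for _ in range(10)))  # 5
```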

These mechanisms turn a load balancer from a passive traffic router into an active guardian of system stability.
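Circuit breaking, listed above, is usually modeled as a three-state machine: closed (traffic flows), open (traffic is refused for a cool-down period), and half-open (requests are let through on trial). The Python sketch below is a minimal version of that pattern; the default threshold and timeout values are illustrative and not taken from any specific proxy.

```python
import time


class CircuitBreaker:
    """Minimal closed -> open -> half-open circuit breaker for a single backend."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # cool-down over: let trial requests through
                return True
            return False
        return True  # closed or half-open

    def record(self, success):
        if success:
            # Any success (including a half-open trial) closes the circuit.
            self.state = "closed"
            self.failures = 0
            return
        self.failures += 1
        # A failed trial re-opens immediately; otherwise open at the threshold.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


now = [0.0]
cb = CircuitBreaker(failure_threshold=3, reset_timeout=10.0, clock=lambda: now[0])
for _ in range(3):
    cb.record(success=False)
print(cb.state, cb.allow_request())  # open False
now[0] = 10.0
print(cb.allow_request(), cb.state)  # True half-open
cb.record(success=True)
print(cb.state)                      # closed
```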

Closing Thoughts

Good load balancing is invisible when it works — traffic flows smoothly, failures go unnoticed by users, and the system scales without drama. The key is treating the load balancer as a first-class architectural component, not an afterthought: pick the right layer and algorithm for your traffic, make health checks meaningful, remove single points of failure, and pair distribution with protection through rate limiting and circuit breaking. Get these fundamentals right, and load balancing becomes one less thing to worry about as your system grows.

How to configure, secure, and operate a production-ready Kubernetes cluster

Diganto Paul — Thu, 02 Jul 2026 16:27:11 +0000

Introduction

Kubernetes has become the de facto standard for container orchestration, but standing up a cluster is only the beginning. The real work of administration lies in configuring it correctly, securing it, and keeping it healthy over time. This guide walks through the core building blocks of Kubernetes cluster administration — from initial setup to day-two operations — so you can run a cluster with confidence.

1. Choosing Your Cluster Architecture

Before writing a single YAML file, decide how your cluster will be built and who will manage the control plane.

Approach	Best For	Trade-offs
Managed (EKS, GKE, AKS)	Teams that want to focus on workloads, not infrastructure	Less control over control-plane internals
Self-managed (kubeadm)	On-prem, air-gapped, or highly customized environments	Full responsibility for upgrades, HA, and patching
Lightweight (k3s, kind, minikube)	Edge, dev/test, or resource-constrained setups	Not typically suited for large-scale production

A common starting point for self-managed clusters is kubeadm, which automates control-plane bootstrapping while leaving room for customization.

# Initialize the control plane node
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Set up local kubeconfig access
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

2. Networking: Choosing a CNI Plugin

Kubernetes doesn't ship with built-in pod networking — you need a Container Network Interface (CNI) plugin. Popular choices include:

Calico — strong network policy support, widely used in production
Cilium — eBPF-based, excellent observability and security features
Flannel — simple, minimal overhead, good for smaller clusters

Installing Calico, for example, is typically a single manifest away:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml

Tip: Choose your CNI before joining worker nodes — some plugins require specific pod CIDR configurations set at kubeadm init time.
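A related pre-flight check is making sure the pod CIDR does not overlap the service CIDR or the network your nodes sit on. The Python sketch below does that with the standard ipaddress module. The node network is a hypothetical placeholder, 10.96.0.0/12 is kubeadm's default service CIDR, and 192.168.0.0/16 is Calico's default pool.

```python
import ipaddress


def find_overlaps(ranges):
    """Return every pair of named ranges that overlap."""
    names = list(ranges)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if ranges[a].overlaps(ranges[b])
    ]


pod_cidr = ipaddress.ip_network("10.244.0.0/16")     # value passed to kubeadm init above
service_cidr = ipaddress.ip_network("10.96.0.0/12")  # kubeadm's default service CIDR
node_cidr = ipaddress.ip_network("192.168.1.0/24")   # hypothetical node network

print(find_overlaps({"pods": pod_cidr, "services": service_cidr, "nodes": node_cidr}))  # []

# Calico's default pool, 192.168.0.0/16, would collide with this node network:
calico_default = ipaddress.ip_network("192.168.0.0/16")
print(find_overlaps({"pods": calico_default, "nodes": node_cidr}))  # [('pods', 'nodes')]
```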

3. Node Configuration and Joining Workers

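Once the control plane is up, worker nodes join with a bootstrap token. kubeadm init prints a join command, but its token expires after 24 hours by default, so you can generate a fresh join command at any time: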

kubeadm token create --print-join-command

Run the resulting command on each worker node. After joining, verify cluster health:

kubectl get nodes -o wide
kubectl get pods -n kube-system

All nodes should report Ready, and system pods (CoreDNS, kube-proxy, CNI agents) should be Running.

4. Role-Based Access Control (RBAC)

Security starts with least-privilege access. RBAC governs who can do what within the cluster.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: development
subjects:
  - kind: User
    name: jane.doe
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Best practices:

Avoid binding to cluster-admin except for break-glass accounts.
Use Groups over individual Users for easier management at scale.
Regularly audit bindings with kubectl get rolebindings,clusterrolebindings -A.
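The audit step can be partly automated. The Python sketch below scans the JSON that `kubectl get rolebindings,clusterrolebindings -A -o json` returns and lists every subject bound to the cluster-admin ClusterRole. The sample data is hypothetical; the fetch helper shells out to kubectl and needs a kubeconfig allowed to list bindings.

```python
import json
import subprocess


def cluster_admin_subjects(binding_list):
    """Return (binding name, subject kind, subject name) for every binding to cluster-admin."""
    findings = []
    for item in binding_list.get("items", []):
        role_ref = item.get("roleRef", {})
        if role_ref.get("kind") == "ClusterRole" and role_ref.get("name") == "cluster-admin":
            for subject in item.get("subjects") or []:
                findings.append((item["metadata"]["name"], subject.get("kind"), subject.get("name")))
    return findings


def fetch_bindings():
    """Requires kubectl on PATH with permission to list RoleBindings and ClusterRoleBindings."""
    out = subprocess.run(
        ["kubectl", "get", "rolebindings,clusterrolebindings", "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Hypothetical sample in the same shape kubectl returns.
sample = {
    "items": [
        {"metadata": {"name": "cluster-admin"},
         "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
         "subjects": [{"kind": "Group", "name": "system:masters"}]},
        {"metadata": {"name": "ci-deployer"},
         "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
         "subjects": [{"kind": "ServiceAccount", "name": "deployer", "namespace": "ci"}]},
        {"metadata": {"name": "read-pods", "namespace": "development"},
         "roleRef": {"kind": "Role", "name": "pod-reader"},
         "subjects": [{"kind": "User", "name": "jane.doe"}]},
    ]
}
for row in cluster_admin_subjects(sample):
    print(row)
# Against a live cluster: cluster_admin_subjects(fetch_bindings())
```

Anything beyond the expected break-glass entries (here, the hypothetical ci-deployer service account) is a candidate for a narrower role.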

5. Resource Management: Quotas and Limits

Left unchecked, workloads can consume an entire cluster's resources. Use ResourceQuota and LimitRange to keep things fair.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"

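Once a quota constrains CPU and memory requests and limits, the quota system rejects pods that don't specify them, so pair the quota with per-container defaults: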

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
    - default:
        cpu: 500m
        memory: 256Mi
      defaultRequest:
        cpu: 250m
        memory: 128Mi
      type: Container
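It is worth checking how the quota and the defaults interact. The Python sketch below computes how many pods fit in the development namespace, assuming every pod runs a single container that relies on the LimitRange defaults. The quantity parsers are hypothetical helpers that handle only the suffixes used in this section.

```python
from fractions import Fraction


def cpu_cores(q):
    """Parse CPU quantities like "4" or "250m" (hypothetical helper, limited suffixes)."""
    return Fraction(int(q[:-1]), 1000) if q.endswith("m") else Fraction(q)


def mem_bytes(q):
    """Parse memory quantities like "8Gi" or "128Mi" (hypothetical helper, limited suffixes)."""
    units = {"Mi": 2**20, "Gi": 2**30}
    return int(q[:-2]) * units[q[-2:]]


# Pods allowed by each quota line, given the LimitRange defaults per container
caps = {
    "requests.cpu": cpu_cores("4") / cpu_cores("250m"),
    "requests.memory": Fraction(mem_bytes("8Gi"), mem_bytes("128Mi")),
    "limits.cpu": cpu_cores("8") / cpu_cores("500m"),
    "limits.memory": Fraction(mem_bytes("16Gi"), mem_bytes("256Mi")),
    "pods": Fraction(20),
}

max_pods = min(int(v) for v in caps.values())  # floor each cap, then take the tightest
binding = [k for k, v in caps.items() if int(v) == max_pods]
print(max_pods, binding)  # 16 ['requests.cpu', 'limits.cpu']
```

With these numbers, CPU runs out at 16 pods, before the pods: "20" limit is reached.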

6. Storage Configuration

Persistent workloads need reliable storage. Define a StorageClass so PersistentVolumeClaims can be dynamically provisioned:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com  # AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs plugin was removed in Kubernetes 1.27
parameters:
  type: gp3
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer

Setting reclaimPolicy: Retain keeps the PersistentVolume (and its underlying EBS volume) when the PVC is deleted, which prevents accidental data loss. The trade-off is that retained volumes must be cleaned up manually.

7. Monitoring and Observability

You can't administer what you can't see. A standard, battle-tested stack includes:

Prometheus — metrics collection and alerting
Grafana — dashboards and visualization
Loki or EFK stack — centralized logging
kube-state-metrics — cluster object state exposed as metrics

The kube-prometheus-stack chart installs Prometheus together with Alertmanager, Grafana, and kube-state-metrics in one Helm release:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

Set up alerts for the essentials early: node disk pressure, pod crash loops, and API server latency.

8. Upgrades and Maintenance

Kubernetes releases new minor versions roughly every four months, and only the latest three minor versions are supported upstream. A safe upgrade path:

Back up etcd before any upgrade.
Upgrade the control plane first, one minor version at a time.
Drain and upgrade nodes individually:

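kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# On the node itself, after upgrading the kubeadm package:
sudo kubeadm upgrade node
# Then upgrade the kubelet and kubectl packages and restart the kubelet
sudo systemctl daemon-reload && sudo systemctl restart kubelet
kubectl uncordon <node-name>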

Validate workloads after each stage before proceeding.
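Because the control plane moves one minor version at a time, a jump of several versions turns into a sequence of full upgrade rounds. The Python sketch below simply enumerates those rounds; the version numbers are hypothetical and the step description mirrors the checklist above.

```python
def upgrade_path(current, target):
    """List the minor versions to step through, one at a time (e.g. 1.28 -> 1.31)."""
    cur_major, cur_minor = (int(x) for x in current.split(".")[:2])
    tgt_major, tgt_minor = (int(x) for x in target.split(".")[:2])
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("only forward upgrades within the same major version are handled")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


for step in upgrade_path("1.28.4", "1.31"):
    print(f"{step}: back up etcd, upgrade control plane, drain/upgrade/uncordon nodes, validate")
# 1.29: ...
# 1.30: ...
# 1.31: ...
```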

9. Backup and Disaster Recovery

At minimum, back up:

etcd (the source of truth for cluster state)
Persistent volumes (application data)
Cluster manifests (via GitOps, so they're already version-controlled)

Tools like Velero simplify this significantly:

velero backup create manual-backup --include-namespaces production
velero schedule create daily-backup --schedule="0 2 * * *" --include-namespaces production

Test restores periodically — a backup you've never restored isn't a real backup.

Closing Thoughts

Kubernetes administration is less about a single "correct" configuration and more about establishing sane defaults, guardrails, and repeatable processes. Get networking, RBAC, and resource limits right early, invest in observability, and treat backups and upgrades as routine — not emergencies. With that foundation, your cluster will scale with your team rather than against it.