kazeem mohammed

Posted on Aug 28

Zero-Downtime Deployments on Kubernetes (Step-by-Step)

#kubernetes #cloud #devops #microservices

In today’s always-on world, downtime is expensive — both in terms of money and customer trust. Whether you’re running a SaaS product, an internal service, or a mission-critical API, you can’t afford to have even a few minutes of outage during upgrades.

That’s where zero-downtime deployments on Kubernetes come in.

In this article, we’ll walk step-by-step through how to update applications running on Kubernetes without causing any service interruption, complete with practical YAML examples, best practices, and troubleshooting tips.

Why Zero-Downtime Matters

Imagine you’re deploying a new version of your application at 2:00 PM on a busy weekday. If your deployment strategy stops the old pods before starting the new ones, users may experience failed requests, 500 errors, or complete outages.

Zero-downtime deployment ensures:

No user sees an error during upgrades.
Traffic is smoothly shifted from old to new versions.
You can roll back quickly if something goes wrong.

Kubernetes Strategies for Zero-Downtime

Several deployment strategies work on Kubernetes. Rolling Update is the default for a Deployment and, for most cases, the easiest way to achieve zero downtime.

1. Rolling Update

Pods are replaced gradually with new ones while keeping the service available.

Pros: Simple, built-in, no extra tools needed.
Cons: Harder to do database schema changes that aren’t backward-compatible.

2. Blue-Green Deployment

You run two environments (Blue = current, Green = new) and switch traffic instantly.

Pros: Instant rollback.
Cons: Requires double resources during deployment.

3. Canary Deployment

Deploy new versions to a small percentage of users first, then gradually increase.

Pros: Lower risk of mass outages.
Cons: More setup complexity.

In this guide, we’ll focus on Rolling Updates (with a touch on Blue-Green).

Step-by-Step: Zero-Downtime Rolling Update

Let’s walk through a practical example.

Step 1 — Prepare Your Deployment

Here’s a basic Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: myregistry/my-app:v1
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20

Key settings for zero downtime:

maxUnavailable: 0 → Never let the number of available pods drop below the desired replica count during the update.
maxSurge: 1 → Allow at most 1 extra pod above the desired count during updates.
readinessProbe → Ensures traffic only reaches pods that report ready.
livenessProbe → Restarts a container automatically if it stops responding.
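The two rollout limits can also be written as percentages, which scale with the replica count. The fragment below is a minimal sketch of the same strategy block, relying on the documented rounding rules (maxSurge rounds up, maxUnavailable rounds down). The minReadySeconds field and its value are an optional addition for illustration and are not part of the manifest above.

```yaml
# Fragment of a Deployment spec; only the rollout-related fields are shown.
spec:
  replicas: 3
  minReadySeconds: 10      # assumption: a new pod must stay ready for 10s before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # available pods never drop below 3 during the rollout
      maxSurge: 25%        # 25% of 3 = 0.75, rounded up to 1 extra pod (at most 4 in total)
```

With 3 replicas this behaves the same as maxSurge: 1, but the surge grows automatically if the replica count is raised later.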

Step 2 — Deploy Version 1

kubectl apply -f deployment.yaml
kubectl rollout status deployment/my-app

You should see:

deployment "my-app" successfully rolled out

Step 3 — Update to Version 2

Change the image tag in the YAML:

image: myregistry/my-app:v2

Apply the update:

kubectl apply -f deployment.yaml
kubectl rollout status deployment/my-app

Kubernetes will:

Spin up 1 new pod (maxSurge).
Wait until it passes the readiness probe.
Terminate 1 old pod (maxUnavailable=0 means keep all old pods running until new ones are ready).
Repeat until all pods are updated.

During the rollout, new pods receive traffic only after they pass the readiness probe.
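Readiness probes protect the new pods, but an old pod that is shutting down can still receive a few requests while its removal from the Service endpoints propagates. One common mitigation, sketched here as an assumption rather than as part of the manifest above, is a short preStop delay combined with a termination grace period. The 10 and 30 second values are illustrative, and the exec form assumes the container image ships a sleep binary.

```yaml
# Fragment of the Deployment's pod template; other fields are unchanged.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30   # total time allowed for shutdown before SIGKILL
      containers:
        - name: my-app-container
          image: myregistry/my-app:v2
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]  # delays SIGTERM so endpoint updates can propagate
```

The application should still handle SIGTERM by finishing in-flight requests before it exits.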

Step 4 — Validate Zero Downtime

You can test with a continuous request loop:

while true; do curl -sf --max-time 2 http://<service-ip>/ | grep "version" || echo "request failed"; sleep 0.5; done

During deployment, you should see responses alternating between v1 and v2, and no "request failed" lines.

Blue-Green Deployment: Instant Rollback Option

If you want an instant rollback path, try Blue-Green.

Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app-green
  template:
    metadata:
      labels:
        app: my-app-green
    spec:
      containers:
        - name: my-app-container
          image: myregistry/my-app:v2
          ports:
            - containerPort: 8080

You keep your Service pointing to the blue deployment until green is ready, then update the selector:

kubectl patch service my-app-service -p '{"spec":{"selector":{"app":"my-app-green"}}}'

Rollback? Just point the service back to blue.
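For reference, the Service being patched might look like the sketch below. The article does not show it, so the my-app-blue label and the Service port 80 are assumptions; only the name my-app-service and the container port 8080 come from the examples above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app-blue      # assumed label on the blue pods; set to my-app-green to cut over
  ports:
    - port: 80            # assumed Service port
      targetPort: 8080    # matches containerPort in the Deployments
```

Because the selector is the only thing that changes, switching in either direction is a single update to the Service.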

Best Practices for Zero-Downtime Kubernetes Deployments

Always use Readiness Probes — without them, traffic may hit pods that are still starting.
Avoid Breaking Changes — your new version should work with old clients and database schemas.
Set Proper Resource Requests/Limits — avoid pod evictions due to resource starvation.
Use kubectl rollout pause/resume for controlled, manual rollouts.
Use PodDisruptionBudgets (PDBs) to limit how many pods can be taken down at once during voluntary disruptions such as node drains.
Monitor During Deployments — tools like Prometheus, Grafana, and Datadog can alert you to issues in real time.
Use Separate Namespaces for Staging & Production — test your deployment process before going live.
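As an illustration of the PodDisruptionBudget item above, here is a minimal sketch for the my-app Deployment from the rolling update example. The name my-app-pdb and the minAvailable value are hypothetical and should be tuned to your replica count.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb        # hypothetical name
spec:
  minAvailable: 2         # illustrative: keep at least 2 of the 3 replicas during voluntary disruptions
  selector:
    matchLabels:
      app: my-app         # same label the Deployment's pods carry
```

A PDB only applies to voluntary disruptions such as node drains; it does not protect against node failures.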

Final Thoughts

Zero-downtime deployments aren’t just a nice-to-have — they’re essential for modern applications. Kubernetes gives you the tools, but it’s your deployment strategy and application design that make it truly zero-downtime.

By combining rolling updates, health checks, and careful configuration, you can ship new features and fixes without users even noticing a blip.

💬 What deployment strategy do you use in Kubernetes — rolling, blue-green, or canary? Share your thoughts in the comments!