Dilip Kola

Implementing a Cost-Efficient Microservices Platform on Azure Kubernetes Service

Terraform, Autoscaling, Spot Capacity, and Workload Identity

This article focuses on implementation details. Architectural rationale and cost trade-offs are covered in Part 1.



Scope and Assumptions

This post assumes:

  • Familiarity with Kubernetes fundamentals
  • Comfort reading Terraform and Helm
  • Interest in running systems in production, not just deploying them

The platform runs on Azure Kubernetes Service, provisioned with Terraform, and deployed using Helm.


1. AKS Baseline: Start Small, Scale on Demand

The most common AKS cost mistake is provisioning for peak load.

We instead:

  • Start with minimal baseline capacity
  • Enable the cluster autoscaler
  • Let demand drive node count

AKS Cluster with Autoscaling

resource "azurerm_kubernetes_cluster" "aks" {
  name                = var.cluster_name
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = var.cluster_name

  default_node_pool {
    name                 = "default"
    vm_size              = "Standard_D2s_v5"
    auto_scaling_enabled = true
    min_count            = 1
    max_count            = 10
  }

  identity {
    type = "SystemAssigned"
  }
}

Why this works

  • Idle cost remains low
  • Nodes are added only when pods are pending
  • Capacity matches actual demand, not estimates
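The autoscaler acts on pending pods, and pods only go pending when their resource requests cannot be placed, so accurate requests are what make this loop work. A minimal sketch with illustrative values:

resources:
  requests:
    cpu: 250m       # what the scheduler reserves; this drives pending-pod pressure
    memory: 256Mi
  limits:
    memory: 512Mi   # bound memory; CPU limits are often omitted to avoid throttling
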

2. Horizontal Pod Autoscaling with Predictable Behavior

Autoscaling defaults are aggressive and often unstable.

We explicitly tune scale behavior to reduce churn and latency spikes.

HPA with Stabilization

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

Key outcomes

  • Prevents rapid scale-down during brief traffic dips
  • Improves tail latency
  • Reduces unnecessary pod restarts
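
Scale-up can be bounded through the same behavior block when surges also need damping; a hedged sketch with illustrative values, not settings from the original chart:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # the default; react to spikes immediately
    policies:
    - type: Percent
      value: 100                    # at most double the replica count
      periodSeconds: 60             # per minute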

3. Spot Node Pools for Fault-Tolerant Workloads

Spot capacity is one of the highest-leverage cost optimizations, provided it is isolated properly.

Terraform: Spot Node Pool

resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  name                  = "spot"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_D2as_v5"

  priority        = "Spot"
  eviction_policy = "Delete"
  spot_max_price  = -1

  auto_scaling_enabled = true
  min_count            = 0
  max_count            = 10

  node_taints = [
    "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
  ]
}

Scheduling Workers on Spot Nodes

tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"

nodeSelector:
  kubernetes.azure.com/scalesetpriority: spot

Rules we followed

  • APIs never run on spot
  • Workers handle SIGTERM cleanly (see the pod-spec sketch below)
  • All state lives outside the worker
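
Clean SIGTERM handling has a pod-spec side as well as an application side: on eviction the kubelet sends SIGTERM, waits out the grace period, then force-kills. A minimal sketch with an illustrative grace period and image name:

spec:
  terminationGracePeriodSeconds: 30   # spot evictions give short notice; finish or requeue quickly
  containers:
  - name: worker
    image: registry.example.com/worker:latest
    # the process itself must trap SIGTERM, stop pulling new jobs,
    # and requeue anything in flight before exiting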

Used this way, spot capacity delivered substantial savings without user-visible impact.


4. Managed PostgreSQL with Private Networking

Databases are not where cost experiments belong.

PostgreSQL runs as a managed service with:

  • Subnet delegation
  • Private DNS
  • No public access

Delegated Subnet for PostgreSQL

resource "azurerm_subnet" "postgres" {
  name                 = "postgres-subnet"
  virtual_network_name = azurerm_virtual_network.main.name
  resource_group_name  = var.resource_group_name
  address_prefixes     = ["10.0.2.0/24"]

  delegation {
    name = "postgres"
    service_delegation {
      name = "Microsoft.DBforPostgreSQL/flexibleServers"
    }
  }
}
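For completeness, a hedged sketch of how the flexible server attaches to this subnet; the DNS zone, server name, SKU, and the variables not already used above are illustrative:

resource "azurerm_private_dns_zone" "postgres" {
  name                = "platform.postgres.database.azure.com"
  resource_group_name = var.resource_group_name
}

resource "azurerm_private_dns_zone_virtual_network_link" "postgres" {
  name                  = "postgres-vnet-link"
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.postgres.name
  virtual_network_id    = azurerm_virtual_network.main.id
}

resource "azurerm_postgresql_flexible_server" "main" {
  name                   = "platform-postgres"
  location               = var.location
  resource_group_name    = var.resource_group_name
  version                = "16"
  sku_name               = "B_Standard_B1ms"          # illustrative; size to actual load
  delegated_subnet_id    = azurerm_subnet.postgres.id # VNet-injected: no public endpoint exists
  private_dns_zone_id    = azurerm_private_dns_zone.postgres.id
  administrator_login    = var.db_admin_login         # hypothetical variables
  administrator_password = var.db_admin_password

  # the DNS zone link must exist before the server is created
  depends_on = [azurerm_private_dns_zone_virtual_network_link.postgres]
}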

Why this matters

  • Database is unreachable from the public internet
  • Access is restricted at the network layer
  • Operational risk is significantly reduced

5. Secure Deployments with Helm Defaults

Helm charts were written with secure-by-default assumptions.

Pod Security Context

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true

This immediately:

  • Shrinks the attack surface
  • Prevents runtime mutation
  • Surfaces insecure images early
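
A read-only root filesystem means anything that writes to disk needs an explicit mount. A common companion pattern, with illustrative names and paths, is an emptyDir for scratch space:

volumes:
- name: tmp
  emptyDir: {}

containers:
- name: api
  volumeMounts:
  - name: tmp
    mountPath: /tmp   # the only writable path in the container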

Health Probes That Matter

livenessProbe:
  httpGet:
    path: /health/db-cache
    port: 8080
  initialDelaySeconds: 60

We intentionally check dependencies, not just process health.
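
Dependency checks pair naturally with a readiness probe, so a pod with a degraded dependency is pulled from rotation rather than restarted. A hedged sketch; the endpoint path is hypothetical:

readinessProbe:
  httpGet:
    path: /health/ready   # hypothetical endpoint that verifies DB and cache connectivity
    port: 8080
  periodSeconds: 10
  failureThreshold: 3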


6. Workload Identity: No Secrets in Kubernetes

Storing cloud credentials in Kubernetes secrets is unnecessary.

We use Workload Identity for pod-to-Azure authentication.

Federated Identity Credential

resource "azurerm_federated_identity_credential" "api" {
  name      = "api-federated"
  parent_id = azurerm_user_assigned_identity.api.id
  issuer    = azurerm_kubernetes_cluster.aks.oidc_issuer_url
  subject   = "system:serviceaccount:production:api-sa"
  audience  = ["api://AzureADTokenExchange"]
}

Kubernetes Service Account

apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-sa
  namespace: production   # must match the subject in the federated credential
  annotations:
    azure.workload.identity/client-id: "<client-id>"
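One piece the manifests above imply but do not show: the pod must opt in before the Workload Identity webhook injects the projected token. A minimal sketch of the Deployment's pod template (image name is hypothetical):

spec:
  template:
    metadata:
      labels:
        azure.workload.identity/use: "true"   # required; without it no token is mounted
    spec:
      serviceAccountName: api-sa
      containers:
      - name: api
        image: registry.example.com/api:latest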

Result

  • No long-lived secrets in manifests
  • Automatic token rotation
  • Azure RBAC enforced at runtime

7. Observability Without Ingestion-Based Pricing

Instead of managed log ingestion, we use:

  • Prometheus for metrics
  • Loki for logs
  • Object storage for retention

Loki Storage Configuration

storage_config:
  azure:
    container_name: logs
    account_name: ${ACCOUNT_NAME}
    # the Cool blob access tier is configured on the storage account itself
    # (see the Terraform sketch below) rather than in Loki's storage_config
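Setting that tier on the Azure side is straightforward in Terraform; a hedged sketch with illustrative names:

resource "azurerm_storage_account" "logs" {
  name                     = "platformlogs"   # must be globally unique
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  access_tier              = "Cool"           # cheaper at-rest storage for rarely read logs
}

resource "azurerm_storage_container" "logs" {
  name                 = "logs"
  storage_account_name = azurerm_storage_account.logs.name
}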

Why this works

  • Logs are queried infrequently
  • Storage is inexpensive
  • Ingestion costs dominate managed observability pricing

This preserved full visibility with minimal incremental cost.


8. Infrastructure Access Patterns (Secure and Practical)

Cluster Access

  • Azure AD–backed kubectl
  • Auditable
  • No shared credentials

In practice, cluster-admin access is restricted to a small bootstrap group; most teams use namespace-scoped roles.
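
In practice this is the standard Entra ID flow; a sketch assuming AAD integration is enabled on the cluster (resource names are placeholders):

az aks get-credentials --resource-group <rg> --name <cluster>
kubelogin convert-kubeconfig -l azurecli   # exchange the Azure CLI login for cluster tokens
kubectl get pods                           # authenticated and audited as the AAD user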


Database Access (Occasional Admin Tasks)

kubectl run psql \
  --image=postgres:16 \
  --rm -it -- psql -h <private-host> -U admin

The database remains private; access is authenticated and auditable.


Observability Dashboards

Grafana is protected behind OAuth using Azure AD.
OAuth2-Proxy runs with multiple replicas and rotated cookie secrets to avoid becoming a single point of failure.
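
A hedged sketch of the OAuth2-Proxy arguments for fronting Grafana with Azure AD through the generic OIDC provider; the tenant ID, upstream address, and env-var names are placeholders:

args:
- --provider=oidc
- --oidc-issuer-url=https://login.microsoftonline.com/<tenant-id>/v2.0
- --client-id=$(OAUTH_CLIENT_ID)
- --client-secret=$(OAUTH_CLIENT_SECRET)
- --cookie-secret=$(COOKIE_SECRET)   # rotated, and shared across all replicas
- --upstream=http://grafana.monitoring.svc.cluster.local
- --email-domain=*                   # rely on the AAD app assignment for access control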

No VPNs. No bastion hosts. No additional managed services required.


Operational Lessons (The Real Ones)

Defaults Are Rarely Production-Safe

Autoscaling, probes, and security contexts need explicit tuning.

Spot Capacity Requires Discipline

It works extremely well, but only when workloads are isolated.

Identity Scales Better Than Secrets

It reduces operational load and security risk.

Kubernetes Is a Tool, Not a Destination

Use it where it adds leverage, not as a dumping ground.


Closing Thoughts

This implementation isn’t about clever tricks; it’s about intentional trade-offs.

By:

  • Scaling only when needed
  • Using spot capacity responsibly
  • Keeping critical state managed
  • Avoiding ingestion-based observability costs
  • Treating security as a default

we ended up with a system that is:

  • Predictable to operate
  • Cost-efficient at idle
  • Resilient under load
  • Easy to evolve

Architecture sets direction.
Implementation determines whether it survives contact with reality.
