Vivian Chiamaka Okose

Posted on Mar 21

Building Production-Grade Infrastructure on Azure with Terraform: A Complete Three-Tier Architecture

#terraform #devops #cloud #azure

By Vivian Chiamaka Okose
Tags: #terraform #azure #devops #threetier #applicationgateway #mysql #nextjs #nodejs #iac #cloud #security

This is the project where everything clicked.

Not because it went smoothly -- it absolutely did not. A VM got compromised by a cryptominer midway through. Regional capacity restrictions blocked MySQL provisioning twice. An Application Gateway TLS policy error required a specific fix. An IP address changed mid-deployment and locked me out of my own server.

But every one of those problems had a solution. And working through each one taught me something that no tutorial could replicate.

This is the story of Assignment 5: deploying the Book Review App on Azure using a production-grade three-tier Terraform architecture.

The Architecture

This project uses a proper three-tier design with strict security boundaries between each layer:

Network Layer:

One VNet with CIDR 10.0.0.0/16
6 tier subnets across the address space (2 web tier, 2 app tier, 2 database tier), plus a dedicated Application Gateway subnet
NSGs per tier with explicit inbound rules
Private DNS Zone for MySQL VNet integration
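
As a rough illustration of the network layer listed above, the sketch below declares the VNet, one web subnet, and an NSG attached to that subnet. The resource names, the azurerm_resource_group.main reference, and the empty NSG are assumptions for illustration. Only the 10.0.0.0/16 and 10.0.1.0/24 ranges come from this project.

```hcl
# Sketch only: names are hypothetical and azurerm_resource_group.main is assumed to exist.
resource "azurerm_virtual_network" "main" {
  name                = "bookreview-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}

resource "azurerm_subnet" "web_1" {
  name                 = "web-subnet-1"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.0.1.0/24"] # the web subnet CIDR referenced by the app tier NSG
}

# Rules are left out here; each tier gets its own NSG with its own rules.
resource "azurerm_network_security_group" "web" {
  name                = "bookreview-web-nsg"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}

# An NSG has no effect on a subnet until it is associated with it.
resource "azurerm_subnet_network_security_group_association" "web_1" {
  subnet_id                 = azurerm_subnet.web_1.id
  network_security_group_id = azurerm_network_security_group.web.id
}
```

The remaining subnets would follow the same pattern, each with its own CIDR and NSG association.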

Compute Layer:

Web Tier VM running Next.js frontend via Nginx reverse proxy
App Tier VM running Node.js backend on port 5000 via PM2
Public Application Gateway fronting the web tier
Internal Load Balancer fronting the app tier
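
A minimal sketch of what an internal load balancer in front of the app tier could look like, assuming the backend listens on port 5000. The resource names and the azurerm_subnet.app_1 and azurerm_resource_group.main references are hypothetical, not copied from the project code.

```hcl
# Sketch only: names, the subnet reference, and the resource group reference are assumed.
resource "azurerm_lb" "app_internal" {
  name                = "bookreview-app-ilb"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = "Standard"

  # A frontend bound to a subnet, with no public IP, makes the load balancer internal.
  frontend_ip_configuration {
    name                          = "app-frontend"
    subnet_id                     = azurerm_subnet.app_1.id
    private_ip_address_allocation = "Dynamic"
  }
}

resource "azurerm_lb_backend_address_pool" "app" {
  loadbalancer_id = azurerm_lb.app_internal.id
  name            = "app-backend-pool"
}

# TCP health probe against the backend port.
resource "azurerm_lb_probe" "app" {
  loadbalancer_id = azurerm_lb.app_internal.id
  name            = "app-tcp-5000"
  protocol        = "Tcp"
  port            = 5000
}

resource "azurerm_lb_rule" "app" {
  loadbalancer_id                = azurerm_lb.app_internal.id
  name                           = "app-5000"
  protocol                       = "Tcp"
  frontend_port                  = 5000
  backend_port                   = 5000
  frontend_ip_configuration_name = "app-frontend"
  backend_address_pool_ids       = [azurerm_lb_backend_address_pool.app.id]
  probe_id                       = azurerm_lb_probe.app.id
}
```

The app VM's network interface still has to be added to the backend pool, for example with an azurerm_network_interface_backend_address_pool_association resource.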

Database Layer:

Azure MySQL Flexible Server with private VNet integration
Not publicly accessible -- reachable only through the VNet
Delegated subnets for MySQL service
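
The private database layer can be sketched roughly as follows: a subnet delegated to the MySQL service, a private DNS zone linked to the VNet, and a Flexible Server attached to both. All names, the 10.0.5.0/24 CIDR, and the var.db_admin_username and var.db_password variables are assumptions for illustration.

```hcl
# Sketch only: names, CIDR, and variables are hypothetical.
resource "azurerm_subnet" "db_1" {
  name                 = "db-subnet-1"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.0.5.0/24"]

  # Delegation hands the subnet over to the MySQL Flexible Server service.
  delegation {
    name = "mysql-delegation"
    service_delegation {
      name    = "Microsoft.DBforMySQL/flexibleServers"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

resource "azurerm_private_dns_zone" "mysql" {
  name                = "bookreview.mysql.database.azure.com"
  resource_group_name = azurerm_resource_group.main.name
}

resource "azurerm_private_dns_zone_virtual_network_link" "mysql" {
  name                  = "mysql-dns-link"
  resource_group_name   = azurerm_resource_group.main.name
  private_dns_zone_name = azurerm_private_dns_zone.mysql.name
  virtual_network_id    = azurerm_virtual_network.main.id
}

resource "azurerm_mysql_flexible_server" "main" {
  name                   = "bookreview-mysql"
  resource_group_name    = azurerm_resource_group.main.name
  location               = azurerm_resource_group.main.location
  administrator_login    = var.db_admin_username
  administrator_password = var.db_password
  sku_name               = "B_Standard_B1ms"
  delegated_subnet_id    = azurerm_subnet.db_1.id
  private_dns_zone_id    = azurerm_private_dns_zone.mysql.id
  # No zone argument: Azure picks an available zone.

  # The DNS link must exist before the server is created.
  depends_on = [azurerm_private_dns_zone_virtual_network_link.mysql]
}
```

With delegated_subnet_id set, the server gets no public endpoint and resolves only through the linked private DNS zone.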

Why Multiple Terraform Files Matter

Previous projects used a single main.tf. This project used seven:

terraform-bookreview-azure/
  main.tf           # Provider and resource group
  networking.tf     # VNet, subnets, NSGs, DNS
  compute.tf        # VMs, NICs, public IPs
  loadbalancer.tf   # App Gateway and Internal LB
  database.tf       # MySQL Flexible Server
  variables.tf      # All configurable values
  outputs.tf        # Endpoints and IPs

This is not organisational preference -- it is a practical necessity at this scale. When the Application Gateway takes 9 minutes to provision and the MySQL server takes 7 minutes, you need to be able to read the plan output and understand which file to look at when something fails. A 500-line main.tf becomes unreadable. Separated files make the dependency chain visible.
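
For orientation, main.tf in a layout like this can stay very small. The following is a sketch under assumptions: the provider version pin, the resource group name, and the location variable are illustrative rather than taken from the project.

```hcl
# main.tf sketch: provider and resource group only. Names and version pin are assumed.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {}
}

# In the layout above this would normally live in variables.tf.
variable "location" {
  description = "Azure region for all resources"
  type        = string
  default     = "West Europe"
}

resource "azurerm_resource_group" "main" {
  name     = "bookreview-rg"
  location = var.location
}
```

Terraform loads every .tf file in the directory as one configuration, so resources in networking.tf or database.tf can reference azurerm_resource_group.main directly.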

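The variables file also reduces a security risk. Sensitive values like database passwords are defined once with sensitive = true, which redacts them from plan and apply output. The values are still written in plain text to the Terraform state file, so the state itself needs to be stored securely.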
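
A sensitive variable declaration might look like the sketch below. The variable name db_password is an assumption.

```hcl
# variables.tf sketch: the variable name is hypothetical.
variable "db_password" {
  description = "Administrator password for the MySQL Flexible Server"
  type        = string
  sensitive   = true # shown as (sensitive value) in plan and apply output
  # No default, so the password never lives in the repository.
}
```

One way to supply the value without writing it to disk is the TF_VAR_db_password environment variable, which Terraform reads automatically for a variable of that name.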

The NSG Design

Each tier has its own Network Security Group with rules specific to its role.

Web tier NSG allows HTTP (80), HTTPS (443) and SSH from a specific IP only:

security_rule {
  name                       = "Allow-SSH"
  priority                   = 120
  direction                  = "Inbound"
  access                     = "Allow"
  protocol                   = "Tcp"
  source_port_range          = "*"
  destination_port_range     = "22"
  source_address_prefix      = var.my_ip
  destination_address_prefix = "*"
}

App tier NSG allows the backend port (5000) only from the web subnet CIDR:

security_rule {
  name                       = "Allow-App-Port"
  priority                   = 100
  direction                  = "Inbound"
  access                     = "Allow"
  protocol                   = "Tcp"
  source_port_range          = "*"
  destination_port_range     = "5000"
  source_address_prefix      = "10.0.1.0/24"
  destination_address_prefix = "*"
}

The database tier has no NSG rule for public access at all. MySQL is accessible only through the VNet integration. There is no port 3306 open to any external IP.
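
One way to make the database boundary explicit inside the VNet is sketched below. This is an assumption-laden illustration, not the project's actual NSG: the names and the 10.0.3.0/24 app subnet CIDR are hypothetical. The explicit deny matters because Azure's default AllowVnetInBound rule otherwise permits traffic between all subnets in the same VNet.

```hcl
# Sketch only: names and the app subnet CIDR are assumed.
resource "azurerm_network_security_group" "db" {
  name                = "bookreview-db-nsg"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  # MySQL is reachable from the app tier only.
  security_rule {
    name                       = "Allow-MySQL-From-App"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "3306"
    source_address_prefix      = "10.0.3.0/24"
    destination_address_prefix = "*"
  }

  # Block the web subnet explicitly, since the default VNet rule would allow it.
  security_rule {
    name                       = "Deny-Web-To-Db"
    priority                   = 200
    direction                  = "Inbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "10.0.1.0/24"
    destination_address_prefix = "*"
  }
}
```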

The Errors That Taught Me the Most

Error 1: MySQL Provisioning Disabled in UK South

Status: "ProvisioningDisabled"
Message: "Provisioning is restricted in this region."

Azure restricts MySQL Flexible Server provisioning in certain regions for free tier subscriptions. The fix was switching to West Europe. This is the kind of constraint you only discover by trying -- I could not find documentation that lists the restricted regions clearly.

Error 2: Application Gateway Deprecated TLS Policy

ApplicationGatewayDeprecatedTlsVersionUsedInSslPolicy

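Azure rejects Application Gateway configurations that fall back to a deprecated TLS version, so the gateway needs an explicit modern TLS policy. Adding this block, which selects a predefined policy with a TLS 1.2 minimum, resolved it: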

ssl_policy {
  policy_type = "Predefined"
  policy_name = "AppGwSslPolicy20220101"
}

Error 3: Zone Not Available

Status: "ZoneNotAvailableForRegion"

Specifying zone = "1" for the MySQL server failed because that zone was not available. Removing the zone specification entirely let Azure pick an available zone automatically.

Error 4: Read Replica Not Supported on Burstable Tier

ReplicationNotSupportedForBurstableEdition

The B_Standard_B1ms burstable tier does not support read replicas. This requires the General Purpose tier which costs significantly more. For a learning environment, the replica was removed. In production, you would use GP_Standard_D2ds_v4 or similar.

The Security Incident

Midway through the first deployment, the PM2 logs showed something alarming:

xmrig-6.21.0/xmrig -o pool.supportxmr.com:443
scanner_linux -t 1000

The VM had been compromised by a cryptominer within hours of deployment. The attack vector was password authentication on port 22, open to 0.0.0.0/0. Automated bots scan the entire internet for open SSH ports and attempt common passwords. The password BookReview@1234! met typical complexity rules, but it followed a predictable pattern, and brute force tools found it.

The response was immediate:

Exit the compromised VM
Run terraform destroy to wipe all infrastructure
Rebuild with SSH key authentication and IP-restricted NSG rules

The rebuilt configuration:

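resource "azurerm_linux_virtual_machine" "web_vm" {
  # Other required arguments (name, size, image, NIC, OS disk) omitted for brevity.
  admin_username                  = var.admin_username
  disable_password_authentication = true

  # The key's username must match admin_username above.
  admin_ssh_key {
    username   = var.admin_username
    public_key = var.admin_ssh_public_key
  }
}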

And the NSG SSH rule restricted to a specific IP:

source_address_prefix = var.my_ip  # "102.90.126.10/32"
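
Since a mistyped value here could silently reopen SSH to a wide range, the variable can be guarded with a validation block. This is a sketch that assumes Terraform 1.3 or later (for endswith). The address in the error message is a documentation placeholder.

```hcl
# variables.tf sketch: rejects anything that is not a single-host CIDR.
variable "my_ip" {
  description = "Single public IP, in CIDR notation, allowed to reach SSH"
  type        = string

  validation {
    # cidrhost fails on malformed CIDRs; the /32 check rules out whole ranges.
    condition     = can(cidrhost(var.my_ip, 0)) && endswith(var.my_ip, "/32")
    error_message = "my_ip must be a single IPv4 address in CIDR form, for example 203.0.113.10/32."
  }
}
```

With this in place, a value such as 0.0.0.0/0 fails at plan time instead of reaching the NSG.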

SSH keys are computationally infeasible to brute force. A 4096-bit RSA key has more possible values than there are atoms in the observable universe. Combined with IP restriction, the attack surface becomes far smaller.

This was not a lab exercise in security. It was a real attack on a real VM that I responded to in real time. That experience is worth more than any security tutorial.

The Application Deployment

After the secure rebuild, the deployment went cleanly:

Backend startup confirmation:

Database 'book_review_db' connected successfully with SSL!
✅ Database schema updated successfully!
👤 Sample users added!
📚 Sample books added!
✍️ Sample reviews added!
🚀 Server running on port 5000

Nginx configuration proxying port 80 to Next.js on 3000 and API calls to backend on 5000:

server {
    listen 80;
    server_name _;

    location /api/ {
        proxy_pass http://localhost:5000/api/;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}

The app loaded with books, reviews, user registration and login all working end to end.

What I Now Understand About Production Infrastructure

Tier isolation is not optional. Each layer having its own NSG with explicit inbound rules means a compromise at the web tier cannot automatically reach the database. Defence in depth is built into the network topology itself.

Terraform file organisation is architecture documentation. When someone new joins a team, networking.tf tells them the network design. database.tf tells them the data tier. The files are the architecture diagram expressed in executable code.

IP addresses change. Dynamic IPs on home connections mean NSG rules need updating whenever your IP changes. In production this is handled with VPN gateways or jump boxes. In a learning environment, it means keeping terraform apply fast and knowing how to update a single variable without destroying everything.
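
For a learning environment, one possible shortcut is to let Terraform look up the current public IP on each run instead of editing the variable by hand. The sketch below assumes the hashicorp/http provider and a third-party service that returns the caller's IP as plain text. The URL and all names are assumptions, and this was not part of the project.

```hcl
# Sketch only: relies on an external service, which is a trust trade-off.
terraform {
  required_providers {
    http = {
      source  = "hashicorp/http"
      version = "~> 3.0"
    }
  }
}

# Assumed endpoint that responds with the caller's public IP as plain text.
data "http" "current_ip" {
  url = "https://api.ipify.org"
}

locals {
  # chomp strips any trailing newline before the /32 suffix is appended.
  my_ip_cidr = "${chomp(data.http.current_ip.response_body)}/32"
}

output "detected_ssh_source" {
  value = local.my_ip_cidr
}
```

Using local.my_ip_cidr in place of var.my_ip in the SSH rule would mean a plain terraform apply picks up an IP change, at the cost of depending on an outside service for a security-relevant value.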

The rebuild is not failure. Destroying compromised infrastructure and rebuilding from scratch with security fixes applied is exactly what the immutable infrastructure pattern is designed for. Terraform made that rebuild take 25 minutes instead of days.

What Is Next

This was the fifth and final project in my Terraform series. The full series covered Azure VMs, AWS EC2 with custom VPC, React app deployment, full-stack EpicBook with RDS, and this production three-tier architecture.

Each project built on the last. The networking patterns from Assignment 2 informed the VPC design in Assignment 4. The Nginx SPA configuration from Assignment 3 carried directly into Assignment 5. The RDS security group design from Assignment 4 shaped the MySQL VNet integration approach here.

That is what a curriculum looks like when it is designed properly. And that is what I am taking forward into the next chapter of this DevOps journey.

I document cloud infrastructure projects in public. Follow along for Terraform, AWS, Azure, and real-world DevOps engineering.

GitHub: vivianokose
