Building Production-Grade Infrastructure on Azure with Terraform: A Complete Three-Tier Architecture
By Vivian Chiamaka Okose
Tags: #terraform #azure #devops #threetier #applicationgateway #mysql #nextjs #nodejs #iac #cloud #security
This is the project where everything clicked.
Not because it went smoothly -- it absolutely did not. A VM got compromised by a cryptominer midway through. Regional capacity restrictions blocked MySQL provisioning twice. An Application Gateway TLS policy error required a specific fix. An IP address changed mid-deployment and locked me out of my own server.
But every one of those problems had a solution. And working through each one taught me something that no tutorial could replicate.
This is the story of Assignment 5: deploying the Book Review App on Azure using a production-grade three-tier Terraform architecture.
The Architecture
This project uses a proper three-tier design with strict security boundaries between each layer:
Network Layer:
- One VNet with CIDR 10.0.0.0/16
- 6 subnets across the address space: 2 web tier, 2 app tier, 2 database tier, plus a dedicated Application Gateway subnet
- NSGs per tier with explicit inbound rules
- Private DNS Zone for MySQL VNet integration
Compute Layer:
- Web Tier VM running Next.js frontend via Nginx reverse proxy
- App Tier VM running Node.js backend on port 5000 via PM2
- Public Application Gateway fronting the web tier
- Internal Load Balancer fronting the app tier
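As a rough illustration of the internal load balancer in front of the app tier, the sketch below wires a private frontend, a backend pool, a TCP probe and a rule for port 5000. The resource names, the `azurerm_resource_group.main` and `azurerm_subnet.app_1` references, and the probe settings are assumptions made for this example, not the project's actual code.

```hcl
# Sketch only: names and references are illustrative assumptions.
# Assumes azurerm_resource_group.main and azurerm_subnet.app_1 exist elsewhere.
resource "azurerm_lb" "app_internal" {
  name                = "app-internal-lb"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = "Standard"

  # A frontend bound to a subnet, with no public IP, keeps the LB internal
  frontend_ip_configuration {
    name                          = "app-frontend"
    subnet_id                     = azurerm_subnet.app_1.id
    private_ip_address_allocation = "Dynamic"
  }
}

resource "azurerm_lb_backend_address_pool" "app" {
  name            = "app-backend-pool"
  loadbalancer_id = azurerm_lb.app_internal.id
}

# Health probe against the Node.js backend port
resource "azurerm_lb_probe" "app" {
  name            = "app-tcp-5000"
  loadbalancer_id = azurerm_lb.app_internal.id
  protocol        = "Tcp"
  port            = 5000
}

resource "azurerm_lb_rule" "app" {
  name                           = "app-5000"
  loadbalancer_id                = azurerm_lb.app_internal.id
  protocol                       = "Tcp"
  frontend_port                  = 5000
  backend_port                   = 5000
  frontend_ip_configuration_name = "app-frontend"
  backend_address_pool_ids       = [azurerm_lb_backend_address_pool.app.id]
  probe_id                       = azurerm_lb_probe.app.id
}
```

The app tier VM's NIC would still need to be attached to the pool, for example with an `azurerm_network_interface_backend_address_pool_association` resource.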
Database Layer:
- Azure MySQL Flexible Server with private VNet integration
- Not publicly accessible -- reachable only through the VNet
- Delegated subnets for MySQL service
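A minimal sketch of how the private MySQL setup might be expressed, assuming a delegated subnet, a private DNS zone linked to the VNet, and a Flexible Server attached to both. The resource names, the 10.0.5.0/24 prefix, the zone name, the version, and the `var.db_admin_*` variables are hypothetical; `azurerm_resource_group.main` and `azurerm_virtual_network.main` are assumed to be defined elsewhere.

```hcl
# Sketch only: names, address prefix and variables are illustrative assumptions.
resource "azurerm_subnet" "db_1" {
  name                 = "db-subnet-1"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.0.5.0/24"]

  # Delegating the subnet lets the MySQL service place the server inside it
  delegation {
    name = "mysql-delegation"
    service_delegation {
      name    = "Microsoft.DBforMySQL/flexibleServers"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

resource "azurerm_private_dns_zone" "mysql" {
  name                = "bookreview.mysql.database.azure.com"
  resource_group_name = azurerm_resource_group.main.name
}

resource "azurerm_private_dns_zone_virtual_network_link" "mysql" {
  name                  = "mysql-dns-link"
  resource_group_name   = azurerm_resource_group.main.name
  private_dns_zone_name = azurerm_private_dns_zone.mysql.name
  virtual_network_id    = azurerm_virtual_network.main.id
}

resource "azurerm_mysql_flexible_server" "main" {
  name                   = "bookreview-mysql"
  resource_group_name    = azurerm_resource_group.main.name
  location               = azurerm_resource_group.main.location
  administrator_login    = var.db_admin_username
  administrator_password = var.db_admin_password
  sku_name               = "B_Standard_B1ms"
  version                = "8.0.21"

  # Private access: no public endpoint is created for the server
  delegated_subnet_id = azurerm_subnet.db_1.id
  private_dns_zone_id = azurerm_private_dns_zone.mysql.id

  # The DNS zone must be linked to the VNet before the server is created
  depends_on = [azurerm_private_dns_zone_virtual_network_link.mysql]
}
```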
Why Multiple Terraform Files Matter
Previous projects used a single main.tf. This project used seven:
terraform-bookreview-azure/
  main.tf           # Provider and resource group
  networking.tf     # VNet, subnets, NSGs, DNS
  compute.tf        # VMs, NICs, public IPs
  loadbalancer.tf   # App Gateway and Internal LB
  database.tf       # MySQL Flexible Server
  variables.tf      # All configurable values
  outputs.tf        # Endpoints and IPs
This is not organisational preference -- it is a practical necessity at this scale. When the Application Gateway takes 9 minutes to provision and the MySQL server takes 7 minutes, you need to be able to read the plan output and understand which file to look at when something fails. A 500-line main.tf becomes unreadable. Separated files make the dependency chain visible.
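For orientation, a main.tf limited to the provider and resource group could look roughly like the sketch below. The provider version constraint, the resource group name, and the `location` variable (which would normally live in variables.tf) are assumptions for illustration only.

```hcl
# Sketch only: version constraint and names are illustrative assumptions.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  # The azurerm provider requires a features block, even if empty
  features {}
}

# Shown here for completeness; would normally sit in variables.tf
variable "location" {
  description = "Azure region for all resources"
  type        = string
  default     = "westeurope"
}

resource "azurerm_resource_group" "main" {
  name     = "bookreview-rg"
  location = var.location
}
```

Every other file can then reference `azurerm_resource_group.main`, which is how Terraform infers the dependency chain across files.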
The variables file also reduces a security risk. Sensitive values like database passwords are declared once with sensitive = true, which redacts them from plan and apply output. They are still written to the Terraform state file in plain text, so the state itself has to be stored securely.
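A sensitive variable declaration might look like the following sketch. The variable name `db_admin_password` is hypothetical, chosen for illustration.

```hcl
# Sketch only: the variable name is an illustrative assumption.
variable "db_admin_password" {
  description = "Administrator password for the MySQL Flexible Server"
  type        = string
  sensitive   = true # shown as (sensitive value) in CLI output
}
```

Leaving out a default keeps the secret out of the repository. The value can then be supplied at run time, for example through the `TF_VAR_db_admin_password` environment variable.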
The NSG Design
Each tier has its own Network Security Group with rules specific to its role.
Web tier NSG allows HTTP (80), HTTPS (443) and SSH from a specific IP only:
security_rule {
  name                       = "Allow-SSH"
  priority                   = 120
  direction                  = "Inbound"
  access                     = "Allow"
  protocol                   = "Tcp"
  source_port_range          = "*"
  destination_port_range     = "22"
  source_address_prefix      = var.my_ip
  destination_address_prefix = "*"
}
App tier NSG allows port 3001 only from the web subnet CIDR:
security_rule {
  name                       = "Allow-App-Port"
  priority                   = 100
  direction                  = "Inbound"
  access                     = "Allow"
  protocol                   = "Tcp"
  source_port_range          = "*"
  destination_port_range     = "3001"
  source_address_prefix      = "10.0.1.0/24"
  destination_address_prefix = "*"
}
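Rules like the two above sit inside an NSG resource, and they only take effect once the NSG is attached to a subnet or NIC. A minimal sketch of that wiring for the app tier follows; the resource names and the `azurerm_resource_group.main` and `azurerm_subnet.app_1` references are assumptions for this example.

```hcl
# Sketch only: names and references are illustrative assumptions.
resource "azurerm_network_security_group" "app" {
  name                = "app-tier-nsg"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  security_rule {
    name                       = "Allow-App-Port"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "3001"
    source_address_prefix      = "10.0.1.0/24" # web subnet CIDR
    destination_address_prefix = "*"
  }
}

# Without an association the NSG exists but filters nothing
resource "azurerm_subnet_network_security_group_association" "app_1" {
  subnet_id                 = azurerm_subnet.app_1.id
  network_security_group_id = azurerm_network_security_group.app.id
}
```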
The database tier has no NSG rule for public access at all. MySQL is accessible only through the VNet integration. There is no port 3306 open to any external IP.
The Errors That Taught Me the Most
Error 1: MySQL Provisioning Disabled in UK South
Status: "ProvisioningDisabled"
Message: "Provisioning is restricted in this region."
Azure restricts MySQL Flexible Server provisioning in certain regions for free tier subscriptions. The fix was switching to West Europe. This is the kind of constraint you only discover by trying -- no documentation lists it clearly.
Error 2: Application Gateway Deprecated TLS Policy
ApplicationGatewayDeprecatedTlsVersionUsedInSslPolicy
Azure now requires an explicit modern TLS policy on Application Gateways. Adding this block resolved it:
ssl_policy {
  policy_type = "Predefined"
  policy_name = "AppGwSslPolicy20220101"
}
Error 3: Zone Not Available
Status: "ZoneNotAvailableForRegion"
Specifying zone = "1" for the MySQL server failed because that zone was not available. Removing the zone specification entirely let Azure pick an available zone automatically.
Error 4: Read Replica Not Supported on Burstable Tier
ReplicationNotSupportedForBurstableEdition
The B_Standard_B1ms burstable tier does not support read replicas. This requires the General Purpose tier which costs significantly more. For a learning environment, the replica was removed. In production, you would use GP_Standard_D2ds_v4 or similar.
The Security Incident
Midway through the first deployment, the PM2 logs showed something alarming:
xmrig-6.21.0/xmrig -o pool.supportxmr.com:443
scanner_linux -t 1000
The VM had been compromised by a cryptominer within hours of deployment. The attack vector was password authentication on port 22 open to 0.0.0.0/0. Automated bots scan the entire internet for open SSH ports and attempt common passwords. The password BookReview@1234! was not weak by typical standards, but brute force tools eventually found it.
The response was immediate:
- Exit the compromised VM
- Run terraform destroy to wipe all infrastructure
- Rebuild with SSH key authentication and IP-restricted NSG rules
The rebuilt configuration:
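resource "azurerm_linux_virtual_machine" "web_vm" {
  # Other required arguments (name, size, NICs, OS disk, image) omitted here

  # Reject password logins entirely; only the key below is accepted
  disable_password_authentication = true

  admin_ssh_key {
    username   = var.admin_username
    public_key = var.admin_ssh_public_key
  }
}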
And the NSG SSH rule restricted to a specific IP:
source_address_prefix = var.my_ip # "102.90.126.10/32"
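SSH keys are computationally infeasible to brute force. A 4096-bit RSA key has vastly more possible values than there are atoms in the observable universe, and with password authentication disabled there is no password left for a bot to guess. Combined with IP restriction, the attack surface becomes drastically smaller.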
This was not a lab exercise in security. It was a real attack on a real VM that I responded to in real time. That experience is worth more than any security tutorial.
The Application Deployment
After the secure rebuild, the deployment went cleanly:
Backend startup confirmation:
Database 'book_review_db' connected successfully with SSL!
✅ Database schema updated successfully!
👤 Sample users added!
📚 Sample books added!
✍️ Sample reviews added!
🚀 Server running on port 5000
Nginx configuration proxying port 80 to Next.js on 3000 and API calls to backend on 5000:
server {
    listen 80;
    server_name _;

    location /api/ {
        proxy_pass http://localhost:5000/api/;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
The app loaded with books, reviews, user registration and login all working end to end.
What I Now Understand About Production Infrastructure
Tier isolation is not optional. Each layer having its own NSG with explicit inbound rules means a compromise at the web tier cannot automatically reach the database. Defence in depth is built into the network topology itself.
Terraform file organisation is architecture documentation. When someone new joins a team, networking.tf tells them the network design. database.tf tells them the data tier. The files are the architecture diagram expressed in executable code.
IP addresses change. Dynamic IPs on home connections mean NSG rules need updating whenever your IP changes. In production this is handled with VPN gateways or jump boxes. In a learning environment, it means keeping terraform apply fast and knowing how to update a single variable without destroying everything.
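One way to make that single-variable update safer is to validate the value, so a typo or a lazy 0.0.0.0/0 never reaches the NSG. The sketch below assumes the `my_ip` variable is declared as a single-host CIDR; the validation rules are an illustrative addition, not part of the original project.

```hcl
# Sketch only: the validation block is an illustrative assumption.
variable "my_ip" {
  description = "Public IP allowed to SSH into the web tier, as a /32 CIDR"
  type        = string

  validation {
    # cidrhost() fails on malformed CIDRs and regex() fails when there is
    # no match; can() turns either failure into false
    condition     = can(cidrhost(var.my_ip, 0)) && can(regex("/32$", var.my_ip))
    error_message = "my_ip must be a single-host CIDR such as 203.0.113.10/32."
  }
}
```

When the home IP changes, running terraform apply with -var="my_ip=NEW_IP/32" should then update only the affected NSG rule in place rather than rebuilding anything.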
The rebuild is not failure. Destroying compromised infrastructure and rebuilding from scratch with security fixes applied is exactly what the immutable infrastructure pattern is designed for. Terraform made that rebuild take 25 minutes instead of days.
What Is Next
This was the fifth and final project in my Terraform series. The full series covered Azure VMs, AWS EC2 with custom VPC, React app deployment, full-stack EpicBook with RDS, and this production three-tier architecture.
Each project built on the last. The networking patterns from Assignment 2 informed the VPC design in Assignment 4. The Nginx SPA configuration from Assignment 3 carried directly into Assignment 5. The RDS security group design from Assignment 4 shaped the MySQL VNet integration approach here.
That is what a curriculum looks like when it is designed properly. And that is what I am taking forward into the next chapter of this DevOps journey.
I document cloud infrastructure projects in public. Follow along for Terraform, AWS, Azure, and real-world DevOps engineering.
GitHub: vivianokose