Hi dear reader 👋, and welcome to my very first blog post on DEV. Grab a cup of coffee; this one's worth it.
Introduction
I have been on a deliberate journey to move beyond certifications and into the craft of engineering in the cloud. While I hold a number of cloud certs and a strong foundation in cloud concepts, I knew that certs alone wouldn't cut it. To truly master cloud engineering, I had to build. I had to understand how the pieces connect, where things break, and how to design systems that are not just functional, but resilient and reproducible.
That search led me to the Cloud Resume Challenge by Forrest Brazeal (Azure Edition), and it turned out to be exactly the right project at the right time.
While going through this challenge, I didn't build it just to complete the checklist, but to practice intentional engineering: making architectural decisions consciously, understanding the trade-offs behind them, and coming out the other side with a system I could genuinely reason about.
So what follows in this post is not a step-by-step guide, but a breakdown of the architectural choices, trade-offs, lessons, and principles that shaped this build.
Let's dive in!
Architecture Overview
Before getting into the weeds, here's a high-level view of what the final system looks like:

Fig 1.0 - Image above showing the high-level architecture diagram of all components of the Cloud Resume Challenge build on Azure.
- Frontend - Azure Storage Account (Static Website), HTML, CSS & JavaScript
- Domain Registrar - Namecheap
- DNS, SSL & Edge Caching / Redirects - Cloudflare (Free-tier)
- Backend API - Azure Function App (Python)
- Database - Azure Cosmos DB (NoSQL)
- Infrastructure as Code (IaC) - Terraform
- Remote State Backend (Terraform) - Azure Blob Storage
- Automation (CI/CD) - GitHub Actions
- Cost Management - Azure Budgets
The Frontend
The frontend is a website that renders my resume and features a visitor counter that tracks every visit in real-time.
🌐 Live Project: wisdomresume.site

Fig 1.1 - Image above showing the visitor counter displayed in the footer section of the live resume website.
I was already comfortable with frontend web development, so I didn't want to spend too much time here. I generated a base User Interface (UI) using v0.dev and customized it to fit my needs. I then hosted it on an Azure Storage Account container with Static Website hosting enabled, which gives you a public URL out of the box.
The CDN & Custom Domain Challenge
Situation: The Cloud Resume Challenge (CRC) requires HTTPS and a custom domain. A custom domain (e.g. wisdomresume.site) is significantly more professional and readable than the default Azure blob endpoint (https://storage-account-name.z6.web.core.windows.net/), and getting it properly configured was a non-negotiable part of the build.
Obstacle: The expected path forward, using Azure CDN (Classic) for content delivery and the custom domain, hit an immediate wall: Azure CDN (Classic) is being retired, and new resources must now use Azure Front Door (AFD) Standard.

Fig 1.2 - Image above, showing Azure docs warning that Azure CDN Classic will be retired on August 15, 2025, and recommending migration to Azure Front Door
Unlike the original CDN's consumption-based pricing model, Front Door carries a minimum monthly base fee of approximately $35, which is inefficient for a personal resume project with a strict zero-cost operational target.
What I Did: Rather than accepting the cost, I researched alternatives and landed on Cloudflare's free tier as a replacement. I had already purchased my domain through Namecheap, so I added it to Cloudflare and delegated DNS management by pointing the domain's nameservers in Namecheap at the ones assigned by my Cloudflare DNS zone. From there, three deliberate configurations brought the setup to production standard:
- Indirect CNAME Validation — Azure requires domain ownership proof before serving content over a custom domain. As shown in the configuration below (with specific domain and target details redacted for security), I used asverify CNAME records, kept unproxied, to satisfy Azure's verification logic while routing all live traffic through Cloudflare's Web Application Firewall (WAF).

Fig 1.3 - Image above showing Cloudflare DNS configuration and the unproxied asverify CNAME records used for Azure domain ownership verification.
- SSL Full (Strict) — Think of this as a two-lock security system. Most basic setups only encrypt the connection between Cloudflare and the visitor's browser, but what about the connection between Cloudflare and Azure? That leg could still be exposed. By enabling SSL Full (Strict), I told Cloudflare: "Don't just secure the front door, verify the back door too," ensuring encryption is enforced across the entire request path, not just half of it.

Fig 1.4: Image showing Cloudflare SSL/TLS settings - enforcing end-to-end encryption between the visitor's browser, Cloudflare, and Azure (the origin server).
- Edge-Level Redirect Handling — Instead of letting the request travel all the way to Azure just to be redirected, Cloudflare intercepts it at the edge (a server closest to the visitor) and instantly issues a 301 Permanent Redirect to www.wisdomresume.site. Like a receptionist who directs you before you even walk through the door, this greatly reduces latency.

Fig 1.5 - Image showing Cloudflare Page Rule configured to issue a 301 Permanent Redirect from the root domain to the www subdomain at the network edge.
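Stepping back to the first configuration: the asverify setup follows Azure's standard indirect-validation pattern. A sketch of the record shapes, with placeholder values rather than my actual records, looks like this:

```text
; Unproxied ("DNS only") record that Azure checks for domain ownership:
asverify.www   CNAME   asverify.<storage-account>.z6.web.core.windows.net

; Proxied record that carries live traffic through Cloudflare:
www            CNAME   <storage-account>.z6.web.core.windows.net
```

Because Azure only needs to resolve the asverify record, the live www record can stay proxied behind Cloudflare the entire time.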
The Final Result: I achieved a full CDN, SSL, and a properly configured custom domain, at $0/month. The $35 price tag that blocked me ended up pushing me toward a better solution altogether.
The Permissions Lesson
While pushing an updated counter.js file via the Azure CLI:
az storage blob upload --account-name storageacctname --container-name '$web' --name counter.js --file counter.js --overwrite true --auth-mode login
I got blocked with a permissions error as shown below. My first reaction was confusion: I'm the Global Admin. Why am I being denied?
You do not have the required permissions needed to perform this operation. Depending on your operation, you may need to be assigned one of the following roles:
"Storage Blob Data Owner"
"Storage Blob Data Contributor"
"Storage Blob Data Reader"
"Storage Queue Data Contributor"
"Storage Queue Data Reader"
"Storage Table Data Contributor"
"Storage Table Data Reader"
If you want to use the old authentication method and allow querying for the right account key, please use the "--auth-mode" parameter and "key" value.
Then I realized: subscription-level ownership doesn't automatically grant data-plane access to individual resources. I had to explicitly assign myself the Storage Blob Data Contributor role. This is the principle of least privilege operating exactly as designed: access is explicit, never assumed. It's the kind of thing you read about in documentation but only really internalize when you live it.
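If you hit the same wall, the fix is a one-time data-plane role assignment. A sketch of the command, with placeholder IDs you'd swap for your own subscription, resource group, and account names:

```shell
az role assignment create \
  --assignee "<your-user-object-id-or-upn>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```

Scoping the assignment to the single storage account, rather than the subscription, keeps the grant as narrow as the task requires.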
Cache Busting
Situation: After deploying updates to the storage container hosting my static website files, I noticed the website was still serving old content. The culprit was Cloudflare's caching layer, which had retained the previous version. A way to resolve this was manually purging the cache, but doing this every time wasn't going to work long-term.
The fix: I did something called cache busting, versioning the JavaScript reference in my HTML:
<script src="counter.js?v=2" defer> </script>
With each update to my code, I'd manually increment the version number of the JavaScript reference in my HTML; for example, the script above shows src="counter.js?v=2", so the next would be src="counter.js?v=3". This makes the browser treat it as a new resource and fetch the latest file.
Result: No more logging into my Cloudflare account to manually purge the cache. Once I made an update and incremented the version number in my HTML, any visitor who refreshed got the latest changes instantly.
Note: Improvements were made to this when I integrated GitHub Actions, which made cache purging fully automatic.
The Database
Situation: Every time someone visits my resume, that visit needs to be counted and saved — even after the browser closes, the page refreshes, or the server restarts. That's what a database is for. Visitor counts need to persist, and Azure Cosmos DB was the right tool for the job.
Obstacle: The original CRC spec recommends Cosmos DB Table API. But before just following the spec, I paused to ask myself: is this still the best choice? Microsoft continuously evolves its data platform, and building on a stack that's losing momentum can create technical debt down the line. I needed something with strong long-term support and clean compatibility with the rest of my stack — particularly Terraform.
What I Did: After research, I opted for Cosmos DB NoSQL (Core SQL) API instead. Think of it as choosing a modern, well-maintained road over an older one that's slowly being phased out. The reasons were concrete:
- Better and more reliable support in the Terraform AzureRM provider
- Cleaner fit with modern JSON document modelling
- Avoids legacy table storage constraints that introduce rigidity in IaC workflows
- Free tier available — keeping costs at zero ($0)

Fig 1.6 - Image above showing the Cosmos DB NoSQL API offering in the Azure Portal.
Also, while setting up the capacity mode, I chose Provisioned Throughput over Serverless. Think of Serverless like a pay-as-you-go taxi: convenient, but unpredictable on cost. Provisioned Throughput is more like a monthly travel pass: you know exactly what you're getting and what you're paying. At 400 RU/s, it comfortably covers the traffic a resume site sees, with no throttling risk and no surprise bills.

Fig 1.7 - Image above showing Cosmos DB throughput screen with Provisioned Throughput configured at 400 RU/s.
Result: A modern, cost-predictable database layer, fully compatible with Terraform, built on a supported API, and running at $0/month. The deliberate deviation from the spec paid off.

Fig 1.8 - Image showing Azure Portal Cosmos DB overview page with the free tier badge.
The Backend API
Situation: The frontend needed a way to communicate with the database, but exposing the database directly to the public internet was never an option. I needed a secure middleware layer that could sit between the two, handle the business logic, and keep sensitive credentials away from the browser entirely. That's where the Azure Function App came in.
Obstacle: Without a middleware layer, the database would need to be publicly accessible — which is a serious security risk. Any credentials embedded in the frontend code would be visible to anyone who inspected the browser's network requests. Beyond security, there was also the question of where concerns like Cross Origin Resource Sharing (CORS) enforcement and input validation should live. Scattering that logic across the frontend is messy and unreliable.
Action: I built the backend using an Azure Function App in Python. Here's how the flow works: when a visitor lands on my site, the frontend sends an HTTP request to the Function App. The Function retrieves the current visitor count from Cosmos DB, increments it, and returns the updated value back to the browser. This makes it clean, decoupled, and secure, the database never touches the public internet.

Fig 1.9 - Image above showing the flow between the visitor's browser, the middleware (Azure Function App), and the database.

Fig 1.10 - Image above showing the Function App overview with the HTTP trigger configured and ready in the Azure Portal.
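The core of that flow is small enough to sketch in a few lines. This is an illustrative stand-in rather than the deployed code: the function and field names are hypothetical, and a plain dict plays the role of the Cosmos DB container (which would really be accessed through the azure-cosmos SDK) so the read-increment-write logic is visible on its own.

```python
# Illustrative sketch of the visitor-counter flow (hypothetical names).
# A dict stands in for the Cosmos DB container so the logic is testable locally.

def increment_counter(container: dict, item_id: str = "visitor_count") -> int:
    """Fetch the current count, increment it, persist it, and return the new value."""
    item = container.get(item_id, {"id": item_id, "count": 0})
    item["count"] += 1
    container[item_id] = item  # write the updated document back
    return item["count"]

db = {}                       # stand-in for the Cosmos DB container
print(increment_counter(db))  # 1 -- first visitor
print(increment_counter(db))  # 2 -- the count persists between calls
```

The browser only ever sees the returned integer; the container (and its credentials) stays entirely server-side.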
One configuration that deserved careful attention was CORS. Without explicitly whitelisting my website's origin in the Function App settings, the browser automatically blocks the request, a silent failure that's easy to miss until it stops your entire frontend from working. Getting this right from the start saved a lot of debugging time.

Fig 1.11 - Image above showing the frontend origin whitelisted as an allowed origin in the Function App CORS configuration on Azure Portal.
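The whitelist can be set in the Portal as shown, or scripted. A hedged CLI sketch, with placeholder resource names:

```shell
az functionapp cors add \
  --resource-group "<app-rg>" \
  --name "<function-app-name>" \
  --allowed-origins "https://www.wisdomresume.site"
```

Scripting it means the allowed origin survives a rebuild instead of living only in portal memory.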
Obstacle 2 (Within the Build) - The Python Import Error: While running backend tests with pytest -q, I ran into this:
ModuleNotFoundError: No module named 'src'
The failing import was: from src.visitors_service import increment_counter
My project had __init__.py files in both api/ and api/src/ — the correct structure for a valid Python package. So why couldn't Python find it?
The issue wasn't the package at all. It was the Python import path. When pytest runs from the command line, the project root isn't automatically added to sys.path, so Python had no idea where to look for the api package. It's the kind of problem that looks like broken code but is actually just a misconfigured environment.

Image 1.12 - Image above showing the ModuleNotFoundError in the terminal output, when running pytest before the import path was configured.
The fix was a single pytest.ini file placed in the project root. This is a configuration file that pytest reads automatically when it runs; think of it as a set of instructions you leave for pytest before it starts, telling it exactly how to behave in your project environment.
In this case, I needed to tell pytest one specific thing: "start looking for packages from the project root."

Fig 1.13 - Image above showing the pytest.ini configuration file with the pythonpath set to the project root directory.
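A minimal version of that file can be this small (note the pythonpath ini option requires pytest 7.0 or newer):

```ini
# pytest.ini, placed in the project root
[pytest]
# Add the project root to sys.path so top-level packages resolve during tests
pythonpath = .
```

With this in place, pytest anchors its import path at the project root no matter which directory you invoke it from.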
Result: 2 passed in 0.56s — backend tests passing, Function App deployed, and the visitor counter working end-to-end.

Fig 1.14 - Image showing Terminal output with 2 tests passed after the pytest.ini fix was applied.
The broader lesson: before assuming your code is broken, verify your test runner's working directory and path configuration. The environment can mislead you just as easily as the code itself.
Infrastructure as Code (Terraform)
Situation: After weeks of manually provisioning and configuring resources through the Azure Portal, I had a working system, but it was fragile. If I needed to rebuild it, I'd have to remember every click, every setting, every configuration detail. That's not engineering, that's guesswork. I needed a way to make this entire deployment reproducible and deployable in minutes, from anywhere.
I chose Terraform over Azure's native Bicep, not because Bicep isn't capable, but because Terraform is multi-cloud. Learning it once gives me the same foundational skills across Azure, AWS, and GCP. That transferability was worth more to me than any platform-native convenience.

Fig 1.15 - Image above showing VS Code Explorer panel with the Terraform project file structure including main.tf and .terraform.lock.hcl.
For context, my Terraform configuration lives in a single main.tf file; this is where all the resource definitions, provider configuration, and backend setup are declared. The .terraform.lock.hcl file is auto-generated by Terraform and locks the provider versions to ensure consistent deployments across environments.
Obstacle 1 - Import or Recreate? My first challenge: I already had live resources running. Do I tear everything down and recreate them with Terraform, or import them into Terraform state as they are?
Recreating felt risky for two concrete reasons:
- Azure allows only one free-tier Cosmos DB per subscription; recreating would forfeit mine and potentially introduce cost.
- I had working, tested configurations I wasn't willing to gamble with.
What I did: I used config-driven import (available from Terraform 1.5+), writing the resource configuration first, then declaratively linking it to the existing infrastructure. Think of it like drawing a map of a building that already exists, rather than demolishing it and rebuilding from scratch. Far cleaner than fighting with terraform import commands.
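As a sketch, a config-driven import looks like this (the resource names and the Azure resource ID are placeholders, not my actual values):

```hcl
# Declare the resource, then link it to the live infrastructure by its Azure ID.
import {
  to = azurerm_cosmosdb_account.resume
  id = "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.DocumentDB/databaseAccounts/<account-name>"
}

resource "azurerm_cosmosdb_account" "resume" {
  # ...configuration matching the existing account...
}
```

Running terraform plan then reports the resource as an import rather than a create, which is exactly the safety you want around a one-per-subscription free-tier database.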
The moment it worked:
No changes. Your infrastructure matches the configuration.
That output meant Terraform fully understood my existing infrastructure. From that point, everything was under declarative control.

Fig 1.16 - Image above showing Terminal output: "No changes. Your infrastructure matches the configuration." confirming successful Terraform import
Obstacle 2 - Local State Danger: I started with a local terraform.tfstate file, which sounds fine until you realize what it means. That file is my infrastructure's memory. If I lose it, corrupt it, or accidentally delete it, Terraform no longer knows what it deployed. I'd be flying blind.
Action: I migrated to a remote backend on Azure Blob Storage with versioning enabled. Think of it like moving your most important document from a sticky note on your desk to a fireproof safe in the cloud. This gave me:
- State locking — prevents two processes from modifying infrastructure at the same time
- Version history — every change to the state file is recorded and recoverable
- Independence — the source of truth lives in the cloud, not on my laptop

Fig 1.17 - Image above showing the Azure Blob Storage backend setup for storing Terraform state remotely on Azure portal.
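The backend declaration itself is only a few lines inside the terraform block; a sketch with hypothetical names:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-management"        # management resource group (hypothetical name)
    storage_account_name = "tfstatestorageacct"   # blob storage account holding the state
    container_name       = "tfstate"
    key                  = "resume.terraform.tfstate"
  }
}
```

The azurerm backend takes a lease on the state blob during operations, which is what provides the state locking described above.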
Obstacle 3 - Provider Compatibility Conflict: While deploying Azure Functions on the Flex Consumption plan, I ran into deprecated properties and configuration conflicts that had nothing to do with my logic. I spent time second-guessing my configuration before realizing the real problem: an outdated AzureRM provider.
What I did: I upgraded the AzureRM provider to v4.61.0, which resolved everything immediately. Cloud services evolve fast, and Terraform providers need to keep pace with them. The rule I now follow: when Terraform behaves unexpectedly, check the provider version before touching anything else.
Obstacle 4 - The Single Resource Group Risk: It's tempting to put everything in one Resource Group because it's simpler and feels tidy. But there's a quiet danger: if a deployment error wipes that Resource Group, it takes the Terraform state file with it. The very file I'd need to recover is now gone.
What I Did: I implemented a dual-resource-group architecture, think of it as separating your house keys from your spare keys. You never keep them in the same place.
- Management Resource Group — holds the Terraform remote backend, the immutable source of truth. This never gets touched by application deployments.
- Application Resource Group — holds all the functional resume workloads. This is where deployments happen.
This separation creates isolated RBAC boundaries for each group and ensures that even if something goes wrong at the application level, the state backend remains untouched and recoverable.

Fig 1.18 - Image above showing both the Management and Application Resource Groups side by side, demonstrating the dual-resource-group separation on Azure.
Result: A fully reproducible infrastructure where every resource, configuration and dependency is expressed as code. What took weeks to build manually can now be deployed in minutes. The system is auditable and recoverable.
Fig 1.19 - Image above showing a successful terraform apply with all resources provisioned and no errors in the terminal output.
CI/CD with GitHub Actions
Situation: At this point, every deployment was still a manual process - push code, open the terminal, run commands, hope nothing breaks. I needed a pipeline that would take code changes and automatically deliver them to the right place, reliably and consistently, every single time. This is where GitHub Actions came in, and honestly, it's where the project stopped feeling like a personal experiment and started feeling like a real production service.

Fig 1.20 - Image above showing the CI/CD workflow files listed under the .github/workflows directory.
Obstacle 1 - Every Change Triggering a Full Redeployment: My first concern was efficiency. I didn't want a small CSS tweak on the frontend triggering a full redeployment of the backend — and vice versa. That's wasted time, wasted runner minutes, and unnecessary risk.
What I Did: I implemented path filters in the GitHub Actions workflows. Think of it like a smart postman who only delivers to the address on the envelope, frontend changes only redeploy the static site, and backend changes only redeploy the Function App.

Fig 1.21 - Image above showing the GitHub Actions workflow YAML file with the path filter configuration for the backend API deployment pipeline.

Fig 1.22 - Image above showing the GitHub Actions workflow YAML file with the path filter configuration for the frontend deployment pipeline.
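A path-filtered trigger is only a few lines of workflow YAML; a sketch of the frontend pipeline's trigger (the directory names are assumptions about the repo layout):

```yaml
# frontend.yml -- runs only when frontend files change
on:
  push:
    branches: [main]
    paths:
      - "frontend/**"
```

The backend workflow mirrors this with its own paths filter, so each pipeline wakes up only for changes it actually owns.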
Result: Faster builds, fewer wasted runner minutes, and a deployment process that is proportional to the change being made.
Obstacle 2 - Environment Mismatch Between GitHub and Azure: I started hitting deployment failures that made no sense locally. Everything worked fine on my machine, but the pipeline kept failing. The problem was a mismatch between the GitHub Actions runner environment and Azure's Linux runtime, causing Python dependency resolution to break mid-deployment.
What I Did: The fix was enabling remote-build: true in the Azure Functions deployment action. Instead of the GitHub runner trying to build dependencies in its own environment, it hands that responsibility to Azure, letting Azure resolve and build everything natively in its own runtime. Think of it like shipping flat-pack furniture to be assembled at the destination rather than trying to build it in the delivery van.

Fig 1.23 - Image above showing GitHub Actions workflow YAML file with the remote-build: true configuration in the Azure Functions deployment step.
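In the deployment step, that is a single extra input on the Functions action; a sketch with placeholder values:

```yaml
- name: Deploy Function App
  uses: Azure/functions-action@v1
  with:
    app-name: "<function-app-name>"
    package: "./api"
    remote-build: true   # build dependencies on Azure's own Linux runtime, not the runner
```

Everything else in the step stays the same; only where the build happens changes.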
Result: Deployment failures stopped. The pipeline became reliable and consistent from that point forward.
Fig 1.24 - Image above showing a GitHub Actions workflow run with a fully successful deployment and a green checkmark.
Obstacle 3 - Cloudflare Serving Stale Content After Deployments: Even after successful deployments, I was seeing old versions of the site. The pipeline was doing its job, but Cloudflare's cache was holding onto previous versions of the files and serving them as if nothing had changed.
What I did: I added an automated cache purge step to the deployment pipeline using a scoped Cloudflare API token. Every time a deployment completes successfully, the pipeline sends a request to Cloudflare to invalidate its cache. No cache busting or manual intervention is needed anymore.

Fig 1.25 - Image above showing GitHub Actions workflow YAML with the automated Cloudflare cache purge step added at the end of the deployment pipeline.
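The purge step itself can be a single curl call against Cloudflare's cache-purge endpoint; a sketch (the secret names here are my assumptions, not necessarily the ones in my workflow):

```yaml
- name: Purge Cloudflare cache
  if: success()   # only purge after a successful deployment
  run: |
    curl -sS -X POST \
      "https://api.cloudflare.com/client/v4/zones/${{ secrets.CLOUDFLARE_ZONE_ID }}/purge_cache" \
      -H "Authorization: Bearer ${{ secrets.CLOUDFLARE_API_TOKEN }}" \
      -H "Content-Type: application/json" \
      --data '{"purge_everything":true}'
```

Using a token scoped to cache purge only means a leaked secret can't touch DNS or SSL settings.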
Result: Every deployment now goes live instantly, no stale content and no manual cache management (cache busting).

Fig 1.26 - Image above showing GitHub Actions workflow log showing the Cloudflare cache purge step completing successfully after a deployment.
Cost Governance & Budget Controls
Situation: By this point, the architecture was designed to run at zero cost: free tiers, consumption-based services, and intentional resource choices all pointed in that direction. But here's the thing: designing for low cost and knowing your costs are low are two very different things. In the cloud, surprises on your bill are rarely pleasant. I needed visibility, not just intention.

Fig 1.27 - Image above showing the Budget overview page with the configured budget name, amount, and current spend tracking on Azure Portal.
Obstacle: Without active monitoring, I would be essentially flying blind. Any misconfigured resource, accidental deployment, or unexpected usage spike could quietly rack up charges, and I'd only find out at the end of the billing cycle. Hope is not a cost control strategy.
What I did: I configured an Azure Budget to track actual spend against forecast estimates in real time. Think of it like a fuel gauge on a car: you don't wait until the engine stops to check if you have petrol. You keep an eye on it as you drive.
On top of that, I set up budget thresholds with automated email alerts. If spending went above the limit I defined, I'd be notified immediately, before it became a problem, not after.

Fig 1.28 - Image above showing Azure Budget alert configuration with the defined spending threshold and the email address set up to receive automated notifications.
This gave me three things that matter in cloud operations:
- Visibility — every resource's contribution to total spend was clear and accountable
- Predictability — forecast estimates let me anticipate costs before they landed
- Control — automated alerts meant I had guardrails, not just good intentions
Result: No surprises, every resource was accounted for, and the project ran exactly as designed, at $0/month. The guardrails were never triggered, but knowing they were there made the entire build feel more production-ready.
This solidified a principle I'll carry into every cloud project going forward: if you can't see it, you can't control it.
Key Takeaways
Here are the lessons I'll carry into every future cloud project:
Data-plane access is not inherited from subscription ownership. Explicit RBAC assignments at the resource level are required, and that's a feature, not a bug.
When Terraform behaves unexpectedly, check your provider version. Cloud services evolve fast. Compatibility issues masquerade as logic errors more often than you'd think.
Plan your architecture around free tier constraints. One free Cosmos DB per subscription shaped a significant infrastructure decision. Know your limits before you build.
Isolate your state backend from your application resources. It's a resilience pattern that costs nothing to implement and could save everything if something goes wrong.
Cloudflare is a genuinely powerful free alternative to Azure's native CDN for projects where budget is a real constraint — not just a compromise.
Test your test runner, not just your tests. Environment and path configuration issues are real, and they look like broken code until you find them.
Check It Out
🌐 Live Project: wisdomresume.site
💻 Source Code: GitHub
If this was useful to you, whether you're in the middle of your own Cloud Resume Challenge, just starting to explore Azure, or a recruiter trying to get a feel for how I think and build — I'd genuinely love to hear from you in the comments.
And if you spot something I could have done better or differently, even better. That's how we all get sharper.
