1. Introduction
An event-driven architecture built on Azure Cosmos DB, Azure Functions, and Azure Service Bus is a widely adopted enterprise pattern for designing highly decoupled, scalable, and resilient systems in the Microsoft Cloud. In this tutorial, you will construct a complete, asynchronous data flow from scratch using Azure services analogous to those in the AWS ecosystem.
The process begins when modifications occur in a Cosmos DB container and automatically surface through the Cosmos DB Change Feed. This feed is continuously monitored by a producer Azure Function, which processes each change and publishes a message to a Service Bus Topic. The Service Bus Topic acts as a high-availability message router (similar to AWS SNS), fanning the notification out to one or more Service Bus Subscriptions (each acting as a queue, similar to AWS SQS); this tutorial uses a single subscription. Finally, a second, consumer Azure Function reads the messages from the subscription to perform the final business logic processing. By the end of this guide, you will be able to deploy this robust infrastructure using Terraform, adhering to Azure security best practices such as utilizing Managed Identities instead of easily leaked connection strings.
2. Prerequisites
Before starting the infrastructure build, ensure your development environment is prepared. You will need an active Azure subscription with permissions to provision compute, messaging, database, and Role-Based Access Control (RBAC) resources. The Azure Command-Line Interface (Azure CLI) must be installed and authenticated (az login) on your local machine. Additionally, Terraform (version 1.3.0 or higher) needs to be installed and globally accessible from your command terminal. Fundamental knowledge of Terraform's declarative syntax (HCL) and a basic understanding of Azure resource hierarchy (Resource Groups) are highly recommended. A modern code editor like VS Code with the HashiCorp Terraform extension will greatly facilitate your workflow.
3. Step-by-Step
Step 1: Initial Configuration and Azure Provider
What to do: Create the root Terraform configuration file to define the required Terraform version, establish the connection with the AzureRM provider, and create a foundational Resource Group.
Why do it: The azurerm provider block instructs Terraform on how to interact with the Azure Resource Manager API. In Azure, all resources must be logically grouped into a Resource Group. Deploying all interconnected services (Cosmos DB, Service Bus, Functions) into a single Resource Group makes it easier to manage their lifecycle, monitor costs, and delete the entire environment cleanly when no longer needed.
Example (main.tf):
terraform {
  required_version = ">= 1.3.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "main" {
  name     = "rg-eventdriven-architecture-prod"
  location = "East US"

  tags = {
    Environment = "Production"
    Project     = "EventDrivenArchitecture"
  }
}
Step 2: Creation of the Cosmos DB Account and Containers
What to do: Provision a Cosmos DB account with the SQL (Core) API, a database, and two containers: one for your application data and one "leases" container required to manage the state of the Change Feed.
Why do it: Cosmos DB is Azure's fully managed NoSQL database. Unlike AWS DynamoDB where Streams must be explicitly enabled, the Cosmos DB Change Feed is enabled by default. However, to consume it reliably with Azure Functions, you must create an auxiliary container called a "lease container." This container acts as a checkpoint system, allowing the producer function to keep track of which events it has already processed, ensuring reliable, ordered, and distributed event consumption.
Example (database.tf):
resource "azurerm_cosmosdb_account" "db_account" {
  name                = "cosmos-event-app-prod" # Must be globally unique
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  offer_type          = "Standard"
  kind                = "GlobalDocumentDB"

  consistency_policy {
    consistency_level = "Session"
  }

  geo_location {
    location          = azurerm_resource_group.main.location
    failover_priority = 0
  }
}

resource "azurerm_cosmosdb_sql_database" "database" {
  name                = "ApplicationData"
  resource_group_name = azurerm_resource_group.main.name
  account_name        = azurerm_cosmosdb_account.db_account.name
}

# Main Data Container
resource "azurerm_cosmosdb_sql_container" "data_container" {
  name                = "Items"
  resource_group_name = azurerm_resource_group.main.name
  account_name        = azurerm_cosmosdb_account.db_account.name
  database_name       = azurerm_cosmosdb_sql_database.database.name
  partition_key_path  = "/id"
}

# Leases Container for Change Feed State
resource "azurerm_cosmosdb_sql_container" "leases_container" {
  name                = "leases"
  resource_group_name = azurerm_resource_group.main.name
  account_name        = azurerm_cosmosdb_account.db_account.name
  database_name       = azurerm_cosmosdb_sql_database.database.name
  partition_key_path  = "/id"
}
Step 3: Provisioning the Service Bus Topic and Subscription
What to do: Implement the Azure Service Bus Namespace, a Topic, and a Subscription. This maps perfectly to the AWS SNS/SQS fan-out pattern.
Why do it: A Service Bus Topic allows you to publish a single message that can be received by multiple independent Subscriptions. The Topic handles the routing (like SNS), while the Subscription acts as a dedicated, persistent queue for your consumer function (like SQS). This decouples your microservices and provides message retention in case the consumer function is temporarily unavailable.
Example (messaging.tf):
resource "azurerm_servicebus_namespace" "sb_namespace" {
  name                = "sb-event-routing-prod" # Must be globally unique
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku                 = "Standard" # Required for Topics
}

resource "azurerm_servicebus_topic" "event_topic" {
  name         = "data-processing-topic"
  namespace_id = azurerm_servicebus_namespace.sb_namespace.id
}

resource "azurerm_servicebus_subscription" "event_subscription" {
  name               = "data-processing-subscription"
  topic_id           = azurerm_servicebus_topic.event_topic.id
  max_delivery_count = 10 # Dead-letter the message after 10 failed deliveries
}
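Like SNS subscription filter policies, Service Bus subscriptions can filter messages before they ever reach the consumer. A sketch using azurerm_servicebus_subscription_rule; note that the eventType application property is a hypothetical example and must match whatever metadata your producer actually stamps on the message:

```hcl
# Optional: only deliver messages whose application property "eventType"
# equals "ItemUpdated". Property name and value are illustrative assumptions.
resource "azurerm_servicebus_subscription_rule" "filter_rule" {
  name            = "item-updated-only"
  subscription_id = azurerm_servicebus_subscription.event_subscription.id
  filter_type     = "SqlFilter"
  sql_filter      = "eventType = 'ItemUpdated'"
}
```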
Step 4: Deploying the Producer and Consumer Azure Functions
What to do: Create a Storage Account and an App Service Plan (Serverless Consumption tier) required to host Azure Functions. Then, deploy the two Linux Function Apps, ensuring that System Assigned Managed Identities are enabled for secure, passwordless authentication.
Why do it: Azure Functions require a backing Storage Account to manage triggers and function execution state. By enabling identity { type = "SystemAssigned" }, Azure automatically provisions a service principal in Microsoft Entra ID (formerly Azure AD) tied to the lifecycle of the Function. This is a crucial security best practice, allowing your functions to securely access Cosmos DB and Service Bus without hardcoding connection strings in your configuration.
Example (functions.tf):
resource "azurerm_storage_account" "func_storage" {
  name                     = "stfuncappdata123" # Must be globally unique: 3-24 lowercase letters and digits
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_service_plan" "consumption_plan" {
  name                = "asp-serverless-plan"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  os_type             = "Linux"
  sku_name            = "Y1" # Dynamic Serverless Consumption
}

resource "azurerm_linux_function_app" "producer_func" {
  name                = "func-cosmos-producer-prod"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  service_plan_id     = azurerm_service_plan.consumption_plan.id

  # The backing storage account still uses an access key here; the Managed
  # Identity below covers Cosmos DB and Service Bus data-plane access.
  storage_account_name       = azurerm_storage_account.func_storage.name
  storage_account_access_key = azurerm_storage_account.func_storage.primary_access_key

  site_config {
    application_stack {
      node_version = "20"
    }
  }

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_linux_function_app" "consumer_func" {
  name                = "func-sb-consumer-prod"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  service_plan_id     = azurerm_service_plan.consumption_plan.id

  storage_account_name       = azurerm_storage_account.func_storage.name
  storage_account_access_key = azurerm_storage_account.func_storage.primary_access_key

  site_config {
    application_stack {
      node_version = "20"
    }
  }

  identity {
    type = "SystemAssigned"
  }
}
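For the triggers and output bindings to authenticate with the Managed Identity rather than a connection string, the Functions runtime reads identity-based connection settings from app settings. A sketch of the app_settings map you could add inside each azurerm_linux_function_app block; the connection-name prefixes ServiceBusConnection and CosmosDbConnection are naming assumptions that must match the connection names in your function bindings, and identity-based connections require recent versions of the Service Bus and Cosmos DB binding extensions:

```hcl
  # Identity-based connection settings: no secrets stored. The
  # "__fullyQualifiedNamespace" / "__accountEndpoint" suffixes tell the
  # binding extensions to authenticate with the app's Managed Identity.
  app_settings = {
    "ServiceBusConnection__fullyQualifiedNamespace" = "${azurerm_servicebus_namespace.sb_namespace.name}.servicebus.windows.net"
    "CosmosDbConnection__accountEndpoint"           = azurerm_cosmosdb_account.db_account.endpoint
  }
```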
Step 5: Applying Role-Based Access Control (RBAC)
What to do: Grant the Functions the specific RBAC roles they need to interact with the database and messaging bus, using Azure Role Assignments.
Why do it: While AWS uses IAM Policies attached to Event Source Mappings, Azure utilizes RBAC assignments. The Producer Function needs permission to read the Cosmos DB Change Feed and send messages to the Service Bus. The Consumer Function needs permission to receive messages from the Service Bus. Defining these natively in Terraform centralizes your security posture and enforces the principle of least privilege.
Example (rbac.tf):
data "azurerm_subscription" "current" {}

# Give the Producer Function permission to send to Service Bus
resource "azurerm_role_assignment" "producer_sb_sender" {
  scope                = azurerm_servicebus_namespace.sb_namespace.id
  role_definition_name = "Azure Service Bus Data Sender"
  principal_id         = azurerm_linux_function_app.producer_func.identity[0].principal_id
}

# For simplicity we grant the control-plane "DocumentDB Account Contributor" role.
# (Note: Cosmos DB data-plane access is governed by separate SQL role assignments,
# often managed via azapi or the CLI; this represents the structural intent.)
resource "azurerm_role_assignment" "producer_cosmos_access" {
  scope                = azurerm_cosmosdb_account.db_account.id
  role_definition_name = "DocumentDB Account Contributor"
  principal_id         = azurerm_linux_function_app.producer_func.identity[0].principal_id
}

# Give the Consumer Function permission to receive from Service Bus
resource "azurerm_role_assignment" "consumer_sb_receiver" {
  scope                = azurerm_servicebus_namespace.sb_namespace.id
  role_definition_name = "Azure Service Bus Data Receiver"
  principal_id         = azurerm_linux_function_app.consumer_func.identity[0].principal_id
}
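Cosmos DB data-plane permissions (reading documents and the Change Feed) can also be granted natively in Terraform with azurerm_cosmosdb_sql_role_assignment. A sketch using the built-in "Cosmos DB Built-in Data Contributor" role; the well-known role definition GUID ending in ...0002 is stated from general knowledge of the built-in roles, so verify it against your subscription before relying on it:

```hcl
# Data-plane role assignment: lets the producer's Managed Identity read and
# write documents, including the Change Feed and the leases container.
resource "azurerm_cosmosdb_sql_role_assignment" "producer_data_access" {
  resource_group_name = azurerm_resource_group.main.name
  account_name        = azurerm_cosmosdb_account.db_account.name
  # Built-in "Cosmos DB Built-in Data Contributor" role definition
  role_definition_id  = "${azurerm_cosmosdb_account.db_account.id}/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002"
  principal_id        = azurerm_linux_function_app.producer_func.identity[0].principal_id
  scope               = azurerm_cosmosdb_account.db_account.id
}
```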
4. Common Troubleshooting
Navigating cloud permissions and distributed architectures can present hurdles. Here are common issues and their solutions when working with this Azure stack:
- Producer Function is not triggering on Cosmos DB changes: This usually indicates a problem with the leases container. Ensure that the "leases" container actually exists in your Cosmos DB database and that your Function's function.json bindings point to the correct database and collection names. Additionally, verify that your Function app settings contain the fully qualified URI of the Cosmos DB account if using Managed Identities.
- Authorization rules or "Unauthorized" errors on Service Bus: If your consumer function fails to read from the subscription, ensure that the "Azure Service Bus Data Receiver" role assignment has propagated. RBAC assignments in Azure can sometimes take 5-10 minutes to fully propagate globally. Avoid using older connection string bindings in your function code; ensure your code uses a ServiceBusClient configured with DefaultAzureCredential().
- Poison messages and infinite loops: Just like with AWS SQS, if your Consumer Function throws an unhandled exception, the Service Bus message is not marked as complete and will be redelivered. Left unchecked, this causes a retry loop. The max_delivery_count = 10 setting defined in Step 3 solves this by automatically moving the failing message to a dead-letter queue (DLQ) after 10 delivery attempts, allowing your system to continue processing healthy messages.
5. Conclusion
In this tutorial, you transitioned architectural concepts across cloud providers, building an enterprise-grade, asynchronous infrastructure on Microsoft Azure. By utilizing Cosmos DB Change Feeds, Azure Functions, and Service Bus Topics/Subscriptions, you implemented a resilient Fan-Out event-driven system. Leveraging Terraform and System Assigned Managed Identities ensured that your infrastructure is not only automated and version-controlled but also inherently secure by eliminating the need for hardcoded credentials. As next steps, consider implementing Azure Application Insights for end-to-end distributed tracing across your functions, and placing your Storage Account behind an Azure Virtual Network (VNet) private endpoint for enhanced network isolation.