Cláudio Filipe Lima Rapôso

Practical Guide: Building an Event-Driven Infrastructure on AWS with Terraform and Python

1. Introduction

Event-driven architectures transform the way systems communicate by replacing direct synchronous calls with a highly decoupled publish-subscribe model. By the end of this tutorial, you will be able to provision a complete event-based infrastructure on AWS, using Amazon EventBridge as the central router, Amazon SQS for queuing, and AWS Lambda for processing.

This approach is fundamental for creating scalable and resilient systems. Decoupling ensures that a failure in one component does not take down the entire application, and it allows different parts of the system to scale independently. Although the technical focus here is AWS, mastering event routing and asynchronous processing is a core structural skill for strategies in multicloud environments, where similar architectural patterns are applied using corresponding services from each provider within a unified landing zone.

2. Prerequisites

To follow this tutorial and execute the proposed configurations, you need the following resources and prior knowledge:

  • An active Amazon Web Services (AWS) account with administrative permissions to create IAM, Lambda, SQS, and EventBridge resources.
  • Terraform installed on your local machine (version 1.0 or higher) for provisioning the Infrastructure as Code (IaC).
  • Python (version 3.9 or higher) installed locally to package the Lambda function code.
  • AWS credentials configured in your local environment (via AWS CLI or environment variables).
  • Basic knowledge of terminal navigation and Terraform HCL syntax.

3. Step-by-Step

Before writing the code, it is important to visualize the event lifecycle in our infrastructure. The sequence diagram below illustrates the flow of information between the provisioned services.

[Sequence diagram: producer → EventBridge custom bus → routing rule → SQS queue → Lambda]

3.1 Configuring the Provider and Main File

What to do: Create the initial Terraform configuration file to define AWS as the cloud provider and establish the deployment region.

Why do it: Terraform needs to know which cloud API it will interact with and the default region to allocate resources. This centralizes the basic configuration and prepares the infrastructure state.

Example:
Create a file named main.tf and add the following block:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

3.2 Creating the Event Bus (EventBridge)

What to do: Provision a custom event bus in Amazon EventBridge.

Why do it: Although AWS provides a default bus, creating a custom bus for your application is good domain-isolation practice: it keeps AWS infrastructure events separate from your application's business events.

Example:
Add to your main.tf file:

resource "aws_cloudwatch_event_bus" "custom_bus" {
  name = "app-domain-events"
}

3.3 Creating the SQS Queue for Load Absorption

What to do: Create a queue in Amazon SQS and configure the access policy to allow EventBridge to send messages to it.

Why do it: Sending events directly from EventBridge to Lambda can cause data loss in case of failures or traffic spikes. The SQS queue acts as a buffer, ensuring message delivery and allowing reprocessing in case of temporary errors.

Example:
Add the queue configurations and its respective policy:

resource "aws_sqs_queue" "event_queue" {
  name = "event-processing-queue"
}

resource "aws_sqs_queue_policy" "event_queue_policy" {
  queue_url = aws_sqs_queue.event_queue.id
  policy    = data.aws_iam_policy_document.sqs_policy.json
}

data "aws_iam_policy_document" "sqs_policy" {
  statement {
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.event_queue.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }

    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_cloudwatch_event_rule.event_rule.arn]
    }
  }
}
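For reference, Terraform renders the data source above into a standard IAM policy document. The Python sketch below (with placeholder account and ARN values, since Terraform interpolates the real ones) makes the resulting JSON shape explicit — this is roughly what you should see on the queue's Access policy tab in the console:

```python
import json

# Placeholder ARNs; in the real deployment Terraform interpolates these.
queue_arn = "arn:aws:sqs:us-east-1:123456789012:event-processing-queue"
rule_arn = "arn:aws:events:us-east-1:123456789012:rule/app-domain-events/capture-order-created"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "events.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restricts senders to this one EventBridge rule, not the whole service.
            "Condition": {"ArnEquals": {"aws:SourceArn": rule_arn}},
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The ArnEquals condition is what prevents any other EventBridge rule (or account) from writing to your queue.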

3.4 Developing the Lambda Function in Python

What to do: Write the Python source code that will be executed when an event arrives in the queue.

Why do it: The Python code holds the business logic that reacts to the event. In this example, we iterate over the SQS records and log their details so we can visually validate the architecture in CloudWatch Logs.

Example:
Create a file named processor.py in the same directory:

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Starting processing of SQS event batch.")

    for record in event.get('Records', []):
        body_string = record.get('body', '{}')

        try:
            event_detail = json.loads(body_string)
            logger.info(f"Received event detail: {json.dumps(event_detail, indent=2)}")

            # Extracting the payload from EventBridge
            detail = event_detail.get('detail', {})
            order_id = detail.get('order_id')

            logger.info(f"Successfully processed order: {order_id}")

        except json.JSONDecodeError:
            logger.error("Error decoding message body as JSON.")

    return {
        'statusCode': 200,
        'body': json.dumps('Processing completed successfully')
    }
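Before deploying, you can sanity-check the handler's parsing logic locally. The sketch below simulates the envelope an SQS record carries when EventBridge is the producer — the record body is the full JSON-serialized EventBridge event, with your payload under the detail key. The order_id value is hypothetical:

```python
import json

# Simulated SQS event as Lambda receives it: each record's "body" is the
# JSON-serialized EventBridge event, with the business payload under "detail".
sample_event = {
    "Records": [
        {
            "body": json.dumps({
                "source": "com.mycompany.orders",
                "detail-type": "OrderCreated",
                "detail": {"order_id": "ord-123"},  # hypothetical payload
            })
        }
    ]
}

# Replicates the extraction performed in processor.py's lambda_handler.
for record in sample_event.get("Records", []):
    event_detail = json.loads(record.get("body", "{}"))
    order_id = event_detail.get("detail", {}).get("order_id")
    print(order_id)  # → ord-123
```

If this double-decoding surprises you: SQS delivers the body as a string, so the handler must json.loads it before reaching the EventBridge detail field.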

3.5 Provisioning Lambda with Terraform and Connecting to the Queue

What to do: Create the ZIP package of the Python code, define the Lambda function in Terraform, create the execution permission (IAM Role), and map the SQS queue as the Lambda event source.

Why do it: The infrastructure needs to deploy the code to AWS and grant exactly the permissions the Lambda requires to read messages from the queue and write logs to CloudWatch, following the principle of least privilege.

Example:
Add the following code to your main.tf:

data "archive_file" "lambda_zip" {
  type        = "zip"
  source_file = "processor.py"
  output_path = "processor.zip"
}

resource "aws_iam_role" "lambda_exec_role" {
  name = "lambda_sqs_execution_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_basic_execution" {
  role       = aws_iam_role.lambda_exec_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_iam_role_policy_attachment" "lambda_sqs_execution" {
  role       = aws_iam_role.lambda_exec_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaSQSQueueExecutionRole"
}

resource "aws_lambda_function" "event_processor" {
  filename         = data.archive_file.lambda_zip.output_path
  function_name    = "OrderEventProcessor"
  role             = aws_iam_role.lambda_exec_role.arn
  handler          = "processor.lambda_handler"
  source_code_hash = data.archive_file.lambda_zip.output_file_base64sha256
  runtime          = "python3.11"
}

resource "aws_lambda_event_source_mapping" "sqs_trigger" {
  event_source_arn = aws_sqs_queue.event_queue.arn
  function_name    = aws_lambda_function.event_processor.arn
  batch_size       = 10
}
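A note on the handler string: processor.lambda_handler means "file processor.py, function lambda_handler", and the archive_file data source produces a plain ZIP with processor.py at its root so that reference resolves. A minimal Python sketch of that packaging step (using a temporary directory and a stub file, purely for illustration):

```python
import tempfile
import zipfile
from pathlib import Path

# Build the same kind of artifact Terraform's archive_file data source creates:
# a ZIP containing processor.py at the archive root.
workdir = Path(tempfile.mkdtemp())
src = workdir / "processor.py"
src.write_text("def lambda_handler(event, context):\n    return {'statusCode': 200}\n")

zip_path = workdir / "processor.zip"
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    # arcname keeps the file at the root, so "processor.lambda_handler" resolves.
    zf.write(src, arcname="processor.py")

with zipfile.ZipFile(zip_path) as zf:
    print(zf.namelist())  # → ['processor.py']
```

If the file is nested inside a folder in the ZIP, Lambda fails with "Unable to import module 'processor'" — a common packaging mistake that archive_file's source_file argument avoids.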

3.6 Creating the Routing Rule in EventBridge

What to do: Configure a rule that filters specific events published on the bus and directs them to the SQS queue.

Why do it: The bus can receive thousands of different events. Rules ensure that the target system (your SQS queue) receives only the events that matter to it, optimizing processing and reducing costs.

Example:
Complete the main.tf file with the routing configurations:

resource "aws_cloudwatch_event_rule" "event_rule" {
  name           = "capture-order-created"
  event_bus_name = aws_cloudwatch_event_bus.custom_bus.name
  description    = "Captures order created events"

  event_pattern = jsonencode({
    source      = ["com.mycompany.orders"]
    detail-type = ["OrderCreated"]
  })
}

resource "aws_cloudwatch_event_target" "sqs_target" {
  rule           = aws_cloudwatch_event_rule.event_rule.name
  event_bus_name = aws_cloudwatch_event_bus.custom_bus.name
  target_id      = "SendToSQS"
  arn            = aws_sqs_queue.event_queue.arn
}

To deploy the entire infrastructure, run the commands terraform init, terraform plan, and terraform apply in your terminal.
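Once the apply finishes, you can verify the pipeline end to end by publishing a test event to the custom bus. The sketch below builds a PutEvents entry matching the rule's pattern (the order_id value is hypothetical); the actual boto3 call is shown in comments because it assumes valid AWS credentials in your environment:

```python
import json

def build_order_created_entry(order_id: str) -> dict:
    """Build a PutEvents entry matching the rule's event_pattern:
    Source and DetailType must match the pattern exactly."""
    return {
        "EventBusName": "app-domain-events",
        "Source": "com.mycompany.orders",
        "DetailType": "OrderCreated",
        "Detail": json.dumps({"order_id": order_id}),
    }

entry = build_order_created_entry("ord-123")
print(entry["DetailType"])  # → OrderCreated

# To actually send it (requires AWS credentials configured locally):
#   import boto3
#   events = boto3.client("events", region_name="us-east-1")
#   response = events.put_events(Entries=[entry])
#   print(response["FailedEntryCount"])  # 0 means the event was accepted
```

After sending, check the Lambda's CloudWatch log group for the "Successfully processed order" line to confirm the full EventBridge → SQS → Lambda path.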

4. Common Troubleshooting

Even with a well-defined script, infrastructure provisioning can encounter obstacles. Below are the most frequent problems and their solutions:

  1. Access Denied Permissions on the SQS Queue:
    What is happening: EventBridge cannot deliver messages to the queue.
    How to solve it: Check the aws_sqs_queue_policy. Ensure the EventBridge rule ARN is correct in the security condition. A misconfigured policy will result in the silent discard of events.

  2. Lambda is not triggered by SQS messages:
    What is happening: Messages arrive in the queue, but the Python function is not executed.
    How to solve it: Validate that the IAM Role linked to the Lambda (lambda_sqs_execution_role) has the AWSLambdaSQSQueueExecutionRole policy properly attached. Lambda needs the sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes permissions to consume the queue.

  3. Event Pattern Incompatibility:
    What is happening: You publish an event to the bus, but it does not appear in the queue.
    How to solve it: Verify the JSON payload you publish. The source and detail-type fields sent by the producer must match the values in the rule's event_pattern block exactly; EventBridge pattern matching is case-sensitive.
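To make that matching behavior concrete, here is a simplified local model of how EventBridge compares an event against the pattern defined in step 3.6. This sketch covers only top-level exact string matching, not the full pattern language (prefix, wildcard, numeric matching, and so on):

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matcher: every pattern key must be present in
    the event, and its value must equal one of the listed alternatives.
    Comparison is exact and case-sensitive, as in EventBridge itself."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())

pattern = {"source": ["com.mycompany.orders"], "detail-type": ["OrderCreated"]}

# Matching event: both fields agree exactly with the pattern.
print(matches(pattern, {"source": "com.mycompany.orders", "detail-type": "OrderCreated"}))  # → True

# Case mismatch in detail-type: silently dropped by the rule.
print(matches(pattern, {"source": "com.mycompany.orders", "detail-type": "ordercreated"}))  # → False
```

The second case is the classic failure mode: the event reaches the bus, matches no rule, and disappears without any error surfacing to the producer.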

5. Conclusion

In this tutorial, we built a robust backbone for an event-driven architecture. We configured the structure in code using Terraform, establishing EventBridge as the intelligent traffic router, SQS as the fault-tolerance layer, and Python Lambda as the scalable processing unit.

Mastering this topology empowers you to integrate microservices safely and cleanly. As next steps, I recommend diving into the implementation of Dead Letter Queues (DLQ) for advanced handling of unresolved failures and exploring how this decoupling architectural pattern translates into the design of complex systems in multicloud scenarios.
