In Part 1, we established the Terraform scaffolding for a resilient AWS Network Firewall, focusing on strict rule ordering and a decoupled parent-child module architecture. By defining clear boundaries between Terraform-owned and Automation-owned resources, we successfully eliminated the state drift tug-of-war for dynamic rule groups. Now, in Part 2, we move from infrastructure to orchestration to see how an event-driven, serverless pipeline manages high-frequency rule updates programmatically.
TL;DR
Externally managed stateful rule groups decouple tenant-driven rule changes from Terraform-managed infrastructure, preserving IaC workflows and remote state. But it is the automation pipeline described here that makes that decoupling safe.
Table Of Contents
- Architectural Overview
- Engineering Walkthrough
- Source Side Config
- Target End Config
- Runtime Environment
- Firewall-Lambda Function
- At the Day's End
Architectural Overview
After analyzing the application’s update frequency and internal logic, it became clear that a traditional IaC approach would be insufficient. To support high-frequency rule updates without manual bottlenecks or the risk of Terraform state drift, an asynchronous, event-driven pipeline was the only scalable path forward.
This architecture creates a strict Separation of Concerns: it decouples the tenant-facing application logic from the core network security layer. By moving the heavy lifting to a dedicated automation plane, security constraints are enforced programmatically and independently of the application's lifecycle.
The following diagram illustrates the unidirectional flow from the tenant's request to the firewall's enforcement:
1️⃣ Tenant Self-Serviced Input:
Tenants submit a hostname (new or updated) via their Admin UI. This is the entry point for all allowlist modifications.
2️⃣ Backend Application Processing:
The backend application first validates the input for format and ownership constraints. Once validated, it persists the record to the Application Database and immediately invokes the Application Lambda function to handle the downstream synchronization.
3️⃣ Source-of-Truth Synchronization:
The application-lambda aggregates the active hosts and updates the tenant_hosts.json file in a pre-configured Amazon S3 bucket. This file acts as the definitive Desired State for the firewall. The bucket is versioned to allow for immediate rollbacks and historical auditing.
4️⃣ S3 Event → EventBridge:
Critically, the application does not communicate with the firewall directly. Instead, the update to the S3 object emits an event to Amazon EventBridge. This event is filtered by bucket name and object key (tenant_hosts), ensuring the pipeline is only triggered by valid state changes (a trimmed example of this event payload is shown after this list).
5️⃣ Firewall Automation Lambda:
The EventBridge trigger invokes the Firewall Lambda, which serves as the authoritative control plane. This function:
- Reads the updated tenant_hosts.json file from S3.
- Generates the corresponding Suricata-compatible rule syntax.
- Performs a logical diff against the active AWS Network Firewall configuration.
- Executes an API update to the Externally Managed Rule Group only if actual changes are detected.
6️⃣ Human Visibility (ChatOps):
Once the operation is complete, the Lambda dispatches a success or failure notification to Microsoft Teams. This closes the feedback loop, providing the engineering team with real-time visibility into the automated security operations.
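Before diving into the Terraform, it helps to see the shape of the event that travels through this pipeline. The snippet below is a trimmed, illustrative S3 "Object Created" event as EventBridge delivers it to the Firewall Lambda; the account ID, region, bucket name, and version ID are placeholders, and real payloads carry additional detail fields that this pipeline does not use.

# A trimmed, illustrative EventBridge payload for an S3 "Object Created" event.
# Bucket name, account ID, region and version ID are placeholders.
sample_event = {
    "version": "0",
    "source": "aws.s3",
    "detail-type": "Object Created",
    "account": "111111111111",
    "region": "eu-west-1",
    "detail": {
        "version": "0",
        "bucket": {"name": "tenant-config-bucket"},
        "object": {
            "key": "tenant_hosts.json",
            "size": 1024,
            "version-id": "3sL4kqtJlcpXro...",
        },
        "reason": "PutObject",
    },
}

The detail.bucket.name and detail.object.key fields are the only parts the automation relies on, which is why both the EventBridge event patterns and the Lambda's parsing logic key off exactly those two values.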
Engineering Walkthrough
As illustrated in the architectural diagram, this is a multi-account system. Both the Internal-Apps and External-Comm accounts must be configured in tandem to facilitate this cross-account event flow.
Note on Cross-Account Events: While the granular details of cross-account EventBridge bus configurations are out of scope for this article, you can find a comprehensive guide on Triggering Cross-Account Workflows here.
Source Side Config:
Generating the Source of Truth
The application-side implementation was moderately simple: Terraform provisions the application-lambda as part of the EKS-apps infrastructure and injects its metadata (ARN/ID) into the application pods as environment variables.
Whenever a tenant modifies their hostname, the backend invokes the Lambda using the ${lambda_function_arn}. The Lambda then aggregates the current state and updates the tenant_hosts.json file in the S3 bucket. This file serves as the immutable desired state for our firewall.
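For illustration, here is roughly what the desired-state document and the upload step might look like. The { base_domain: [tenants] } shape matches what the firewall-lambda's load_domain_map() helper (shown later in this article) expects; the bucket name, object key, domains, and tenant names below are placeholders, not the real configuration.

import json

import boto3

# Hypothetical desired-state document: { base_domain: [tenant, ...] }.
# Domains and tenant names are placeholders for illustration only.
tenant_hosts = {
    "example.com": ["acme", "globex"],
    "example.net": ["initech"],
}

# Sketch of the application-lambda's upload step.
s3 = boto3.client("s3")
s3.put_object(
    Bucket="tenant-config-bucket",   # placeholder bucket name
    Key="tenant_hosts.json",
    Body=json.dumps(tenant_hosts, indent=2).encode("utf-8"),
    ContentType="application/json",
)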
Configuring S3 EventBridge Notifications
To expose S3 activity to this automation pipeline, the bucket must be configured to publish events to Amazon EventBridge across the accounts. As S3 bucket operations are scoped to the service itself by default, S3 Event Notifications must be explicitly enabled for Amazon EventBridge.
Using the aws_s3_bucket_notification resource, this can be enabled globally for the bucket. S3 then publishes Object Created events to the default EventBridge bus in the Internal-Apps account, from where they are forwarded to the custom bus in the External-Comm account, eliminating the need for any intermediate Lambda functions or custom notification logic.
resource "aws_s3_bucket_notification" "eb_xacc" {
bucket = module.nfw_bucket.name
eventbridge = true
provider = aws.eksapp #<-- required for cross-account
}
Establishing the Cross-Account Trust
For Amazon EventBridge to forward events across account boundaries, it requires explicit permission to hand off the data. This is achieved by provisioning a dedicated IAM role within the Internal-Apps account that the EventBridge service is authorized to assume.
This role serves a single, highly-scoped purpose: it grants the local EventBridge service the events:PutEvents permission, specifically targeting the ARN of our central Event Bus in the External-Comm account. This setup ensures that as soon as S3 drops a notification into the local bus, EventBridge has the credentials necessary to push that event across the border to our automation pipeline.
resource "aws_iam_role" "eb_forward" {
name = "${local.template_name}-xacc-s3eb-Role"
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [{
Effect = "Allow",
Principal = { "Service" : "events.amazonaws.com" },
Action = ["sts:AssumeRole"]
}]
})
provider = aws.eksapp #<-- required for cross-account
}
resource "aws_iam_role_policy" "eb_forward" {
role = aws_iam_role.eb_forward.id
policy = jsonencode({
Version = "2012-10-17",
Statement = [{
Effect = "Allow",
Action = "events:PutEvents",
Resource = aws_cloudwatch_event_bus.s3_trigger.arn
}]
})
provider = aws.eksapp #<-- required for cross-account
}
Capturing and Forwarding the Event(s)
With the permissions established, the only thing still needed was a mechanism to intercept the specific S3 notifications and direct them to our cross-account destination. This is handled by the EventBridge Rule and its corresponding Target.
The EventBridge rule acts as a targeted filter on the application account’s default event bus. Instead of forwarding all S3 activities, it is configured with a highly specific event pattern that matches only Object Created events from the designated S3 bucket and restricts them further to the tenant_hosts key prefix. This precision filter prevents unnecessary Lambda invocations in the downstream account, keeping the overall design both efficient and cost-effective.
resource "aws_cloudwatch_event_rule" "eb_xacc" {
for_each = local.nfw_svc_enabled ? toset([var.service_name]) : []
name = "${local.template_name}-xacc"
description = "Forward S3 Object Created events to EXC event bus"
provider = aws.eksapp #<-- required for cross-account
event_pattern = jsonencode({
source = ["aws.s3"],
detail-type = ["Object Created"],
detail = {
bucket = {
name = [local.xacc_s3_trigger.bucket]
},
object = {
key = [
{ prefix = local.xacc_s3_trigger.object }
]
}
}
})
}
The aws_cloudwatch_event_target then acts as the exit ramp, connecting the rule to our central Event Bus in the External-Comm account. By referencing the aws_iam_role.eb_forward role we created earlier, the target has the authority to cross the account boundary and deliver the payload to our central automation hub.
resource "aws_cloudwatch_event_target" "eb_xacc" {
rule = aws_cloudwatch_event_rule.eb_xacc.name
arn = aws_cloudwatch_event_bus.s3_trigger.arn
role_arn = aws_iam_role.eb_forward.arn
provider = aws.eksapp #<-- required for cross-account
}
Target End Config:
Custom Event Bus and Permissions
In the External-Comm account, I provisioned a dedicated Custom Event Bus to isolate our firewall automation traffic from the general noise of the account and to keep the security events clean and auditable. It's as simple as defining this:
resource "aws_cloudwatch_event_bus" "s3_trigger" {
name = "${local.template_name}-nfw-s3-trigger"
}
However, a custom bus is closed by default. To receive events from the Internal-Apps account, I used the aws_cloudwatch_event_permission resource to create a resource-based policy (details below) that explicitly allows the Internal-Apps account (referenced via ${local.xacc_s3_trigger.acc_id}) to call PutEvents on this custom bus. In this way, the system can securely hand off S3 notifications across the account boundary without exposing the bus any more broadly than necessary.
resource "aws_cloudwatch_event_permission" "s3_trigger" {
event_bus_name = aws_cloudwatch_event_bus.s3_trigger.name
principal = local.xacc_s3_trigger.acc_id
action = "events:PutEvents"
statement_id = "AllowIncToPutEvents"
}
Filtering for Actionable Changes
On this custom bus, an EventBridge rule is defined to listen specifically for S3 Object Created events originating from the expected bucket and object key prefix. When a matching event is received, the rule triggers the firewall-lambda responsible for validation and rule group updates.
As soon as a new/updated tenant_hosts.json lands in S3 and crosses the border, this rule captures it and triggers the automation logic that keeps our firewall synchronized in real-time.
resource "aws_cloudwatch_event_rule" "s3_trigger" {
name = "${local.template_name}-s3-trigger"
event_bus_name = aws_cloudwatch_event_bus.s3_trigger.name
description = "Trigger Lambda when object updated in ${local.xacc_s3_trigger.bucket}"
event_pattern = jsonencode({
source = ["aws.s3"],
detail-type = ["Object Created"],
detail = {
bucket = { name = [local.xacc_s3_trigger.bucket] },
object = {
key = [{ prefix = local.xacc_s3_trigger.object }]
}
}
})
}
Lambda IAM Role & Permissions
For the automation to function, the firewall-lambda needs to act as a privileged operator across a number of AWS services. This IAM role and policy document are specifically designed around the principle of least privilege, ensuring the function has exactly what it needs to perform the Diff and Update logic, and nothing more.
The most critical aspect of this policy is the AWS Network Firewall authority boundary. The UpdateRuleGroup and DescribeRuleGroup permissions are explicitly scoped to the ARN of the externally managed stateful rule group (${local.extl_suricata_rg_arn}) and nothing else. By restricting access to a single resource, the automation pipeline is technically incapable of modifying any other Terraform-managed firewall rules or resources, even in a failure or misconfiguration scenario. Standard CloudWatch Logs permissions are also included to ensure every evaluation and decision made by the automation logic is fully auditable.
// IAM role for firewall-lambda
resource "aws_iam_role" "nfw_lambda" {
name = "${local.nfw_resource_prefix}-lambda-Role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = [
"lambda.amazonaws.com"
]
}
},
]
})
}
// Policy document for firewall-lambda role
data "aws_iam_policy_document" "nfw_lambda" {
# S3 permission: read tenant_hosts.json
statement {
effect = "Allow"
actions = [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:ListBucket"
]
resources = [
"arn:aws:s3:::${local.xacc_s3_trigger.bucket}",
"arn:aws:s3:::${local.xacc_s3_trigger.bucket}/*",
]
}
# NFW permission: update the RSL rule group
statement {
effect = "Allow"
actions = [
"network-firewall:DescribeRuleGroup",
"network-firewall:UpdateRuleGroup"
]
resources = [
local.extl_suricata_rg_arn
]
}
# Allow Lambda to kms:Decrypt
statement {
effect = "Allow"
actions = [
"kms:Decrypt",
"kms:DescribeKey",
"kms:GenerateDataKey*"
]
resources = [module.nfw_bucket_key.kms_key_arn]
}
# CloudWatch Logs permission
statement {
effect = "Allow"
actions = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
resources = ["*"]
}
}
Runtime Environment:
AWS Lambda Function
The dth_updater (Dynamic Tenant Hosts) Lambda acts as the control plane for firewall updates and receives all of the required context at runtime. It runs on Python 3.12 for stronger Boto3 support and efficient string processing when handling Suricata rules. Critical configuration is passed via environment variables, including the ARN of the externally managed rule group and the SOPS-encrypted Microsoft Teams webhook URL, enabling secure, real-time notifications without exposing sensitive endpoints.
resource "aws_lambda_function" "dth_updater" {
filename = data.archive_file.dthu_lambda_zip.output_path
function_name = "${local.template_name}-tenant-fw-updater"
handler = "nfw_tenant_hosts_updater.lambda_handler"
role = aws_iam_role.nfw_lambda.arn
source_code_hash = data.archive_file.dthu_lambda_zip.output_base64sha256
runtime = "python3.12"
timeout = 30
environment {
variables = {
DYNAMIC_SURICATA_RG_ARN = local.extl_suricata_rg_arn
TEAMS_WEBHOOK_URL = lookup(
data.sops_file.tenant_teams_wh.data,
"teams_webhook_url",
null
)
ENV_NAME = var.aws_acc_name
MSG_TITLE = var.notify_teams["tenant_hosts"].title
NOTIFY_TEAMS = var.notify_teams["tenant_hosts"].enabled
SERVICE_NAME = var.tf_module_name
}
}
depends_on = [
aws_iam_role_policy_attachment.nfw_lambda
]
}
Explicit Invocation
Provisioning the Lambda and EventBridge rule alone is not sufficient; AWS services default to a zero-trust invocation model. An explicit Lambda permission is required to allow EventBridge to invoke the function.
By scoping the principal to events.amazonaws.com and restricting the source_arn to the specific S3-trigger rule, invocation is limited strictly to events originating from the custom event bus. This ensures the Lambda cannot be triggered accidentally or maliciously from other sources within the account.
resource "aws_lambda_permission" "s3_trigger" {
statement_id = "AllowEventsInvoke"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.dth_updater.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.s3_trigger.arn
}
Wiring the Pipeline
The final component is the EventBridge target, which connects the filtered rule to the firewall automation Lambda. The target references the dth_updater function ARN directly. I also added an explicit dependency on the aws_lambda_permission resource to ensure Lambda invoke permissions are fully propagated before the target becomes active. This avoids race conditions during initial deployment and guarantees the pipeline is ready and reliable from the first S3 object update onward.
resource "aws_cloudwatch_event_target" "s3_trigger" {
rule = aws_cloudwatch_event_rule.s3_trigger.name
event_bus_name = aws_cloudwatch_event_bus.s3_trigger.name
target_id = "tenant-fw-updater"
arn = aws_lambda_function.dth_updater.arn
depends_on = [
aws_lambda_permission.s3_trigger
]
}
Firewall-Lambda Function
Function Code
The Lambda function shown below is provided purely as a reference implementation. The exact structure, validation logic, and rule-generation strategy will vary depending on individual requirements, tenant models, and security posture. It simply demonstrates one way an event-driven automation Lambda can safely control an externally managed Network Firewall rule group.
At a high level, this function performs the following steps:
- Parses the incoming EventBridge S3 event to identify the updated object
- Loads tenant-defined hostnames from S3 and expands them into fully qualified domains
- Reads the current externally managed Suricata rule group from Network Firewall
- Computes a deterministic diff between existing and desired state
- Skips the update entirely if no effective change is detected
- Rebuilds the Suricata ruleset and updates the rule group using an update token
- Optional - Emits a structured notification to Microsoft Teams for visibility and auditability
This pattern keeps the Lambda focused on validation, diffing, and enforcement, while allowing upstream systems to express intent without direct access to the firewall control plane.
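To make the rule-generation step concrete, the snippet below shows the shape of a single rule that the build_suricata_rules() helper (defined in the Helper Functions section further down) emits for a hypothetical tenant host acme.example.com. The SID value is illustrative only; real SIDs are derived from a CRC32 hash of the FQDN.

# Example output of build_suricata_rules(["acme.example.com"]).
# The hostname and SID are illustrative placeholders.
example_rule = (
    'pass tls $EXTERNAL_NET any -> $HOME_NET any ('
    'tls.sni; content:"acme.example.com"; '
    'ssl_state:client_hello; '
    'msg:"Dynamic tenant allowed host"; '
    'flow:to_server, established; '
    'nocase; '
    'sid:30123456; rev:1;)'
)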
def lambda_handler(event, context):
"""
Dynamic Suricata Rule-Group updater.
Triggered via EventBridge when tenant_hosts.json changes in S3.
"""
result = {
"stage": "start",
"status": "ok",
"added": [],
"removed": [],
"error": None,
}
try:
logger.info("Event: %s", json.dumps(event))
# 1. Parse S3 event (EventBridge style)
# -----------------------------------------------
result["stage"] = "parse_event"
try:
# This is very important to pick up the correct
# S3 bucket and object from event JSON
bucket = event["detail"]["bucket"]["name"]
key = event["detail"]["object"]["key"]
except Exception as e:
raise RuntimeError(f"Invalid EventBridge S3 event: {e}")
logger.info("Processing file s3://%s/%s", bucket, key)
# 2. Load tenants + expand to full FQDNs
# -----------------------------------------------
result["stage"] = "load_and_expand"
host_map = load_domain_map(bucket, key)
new_fqdns = expand_fqdns(host_map)
logger.info("New expanded FQDNs: %d entries", len(new_fqdns))
# 3. Read existing dynamic Suricata rule-group
# -----------------------------------------------
result["stage"] = "describe_rule_group"
describe = nfw.describe_rule_group(
RuleGroupArn=DYNAMIC_SURICATA_RG_ARN,
Type="STATEFUL"
)
existing_desc = describe.get("RuleGroupResponse", {}).get("Description")
rule_group_def = describe.get("RuleGroup", {})
update_token = describe["UpdateToken"]
update_args = {
"RuleGroupArn": DYNAMIC_SURICATA_RG_ARN,
"Type": "STATEFUL",
"UpdateToken": update_token,
"RuleGroup": rule_group_def,
}
# Only include Description if AWS returned a valid string
if isinstance(existing_desc, str) and existing_desc.strip():
update_args["Description"] = existing_desc
existing_rules_str = (
rule_group_def
.get("RulesSource", {})
.get("RulesString", "")
)
old_fqdns = extract_fqdns_from_rules(existing_rules_str)
# 4. The Diff
# -----------------------------------------------
result["stage"] = "diff"
new_set = set(new_fqdns)
added = sorted(new_set - old_fqdns)
removed = sorted(old_fqdns - new_set)
result["added"] = added
result["removed"] = removed
if not added and not removed:
logger.info("No changes detected. RuleGroup remains unchanged.")
result["stage"] = "no_change"
return {
"statusCode": 200,
"body": json.dumps({
"message": "No change in tenant rules",
"changed": False,
"fqdn_count": len(new_fqdns),
}),
}
logger.info("Changes detected: +%d, -%d", len(added), len(removed))
# 5. Build new Suricata rules (stable hash SIDs)
# -----------------------------------------------
result["stage"] = "build_suricata_rules"
new_rules_str = build_suricata_rules(new_fqdns)
# 6. Insert updated rules back into RuleGroup dict
# -----------------------------------------------
result["stage"] = "prepare_update"
if "RulesSource" not in rule_group_def:
rule_group_def["RulesSource"] = {}
rule_group_def["RulesSource"]["RulesString"] = new_rules_str
# 7. Update rule group in Network Firewall
# -----------------------------------------------
result["stage"] = "update_rule_group"
resp = nfw.update_rule_group(**update_args)
logger.info("UpdateRuleGroup response: %s", json.dumps(resp, default=str))
result["stage"] = "completed"
return {
"statusCode": 200,
"body": json.dumps({
"message": "Dynamic Suricata tenant allowlist updated",
"added": added,
"removed": removed,
"fqdn_count": len(new_fqdns),
"changed": True,
}),
}
except Exception as e:
logger.error("Lambda FAILED: %s", e, exc_info=True)
result["status"] = "failure"
result["error"] = str(e)
raise
finally:
# 8. Teams notification (if enabled)
# -----------------------------------------------
if NOTIFY_TEAMS:
try:
env_name = os.environ.get("ENV_NAME", "unknown")
svc = os.environ.get("SERVICE_NAME", "tenant-fw-updater")
send_teams_update(
stage=result["stage"],
status=("success" if result["status"] == "ok" else "failure"),
added=result["added"],
removed=result["removed"],
env=env_name,
service_name=svc,
error=result["error"],
)
except Exception as te:
logger.error("Teams notification failed: %s", te)
Helper Functions
For completeness, the following helper functions support the Lambda handler shown above. Python is not among my strongest skill sets, but this code worked well for handling domain normalization, deterministic rule generation, and extraction of existing state from the firewall rule group. The key design choices are:
- Treating S3 as the canonical desired-state source
- Generating stable Suricata rule identifiers to avoid churn
- Rebuilding rules deterministically from intent, rather than mutating in place
def load_domain_map(bucket: str, key: str) -> Dict[str, List[str]]:
"""Load JSON mapping of base_domain → list of tenant names."""
obj = s3.get_object(Bucket=bucket, Key=key)
text = obj["Body"].read().decode("utf-8")
data = json.loads(text)
if not isinstance(data, dict):
raise ValueError("JSON must be a dict of { base_domain: [tenants...] }")
out = {}
for base, tenants in data.items():
if not isinstance(tenants, list):
raise ValueError(f"Value for '{base}' must be a list")
cleaned = [t.strip().lower() for t in tenants if str(t).strip()]
if cleaned:
out[base.strip().lower()] = cleaned
return out
#
def expand_fqdns(mapping: Dict[str, List[str]]) -> List[str]:
"""Expand tenant + base domain to FQDNs."""
fqdns = set()
for base, tenants in mapping.items():
for t in tenants:
fqdn = f"{t}.{base}"
fqdns.add(fqdn)
return sorted(fqdns)
#
def sid_for_fqdn(fqdn: str) -> int:
"""Stable hash-based SID."""
base = 30_000_000
h = zlib.crc32(fqdn.encode("utf-8")) & 0xFFFFFFFF
return base + (h % 9_999_999)
#
def build_suricata_rules(fqdns: List[str]) -> str:
"""Generate Suricata rules for each FQDN."""
rules = []
for fqdn in fqdns:
sid = sid_for_fqdn(fqdn)
rule = (
'pass tls $EXTERNAL_NET any -> $HOME_NET any ('
f'tls.sni; content:"{fqdn}"; '
'ssl_state:client_hello; '
'msg:"Dynamic tenant allowed host"; '
'flow:to_server, established; '
'nocase; '
f"sid:{sid}; rev:1;)"
)
rules.append(rule)
return "\n".join(rules) + "\n"
CONTENT_RE = re.compile(r'content:"([^"]+)"')
#
def extract_fqdns_from_rules(rules: str) -> Set[str]:
"""Extract FQDNs from Suricata RulesString."""
if not rules:
return set()
return {m.lower() for m in CONTENT_RE.findall(rules)}
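The send_teams_update() helper referenced in the handler is deliberately not covered in depth here, since notification handling may become its own post. For completeness, a minimal, hypothetical sketch is shown below: it posts a simple legacy MessageCard payload to the webhook URL supplied via the TEAMS_WEBHOOK_URL environment variable. The real implementation may use Adaptive Cards, retries, or richer formatting, so treat this purely as an illustration.

import json
import os
import urllib.request

def send_teams_update(stage, status, added, removed, env, service_name, error=None):
    """Post a simple summary card to the Teams incoming webhook.

    Minimal sketch only: card layout, colour-coding and retry handling
    are intentionally left out.
    """
    webhook_url = os.environ.get("TEAMS_WEBHOOK_URL")
    if not webhook_url:
        return
    facts = [
        {"name": "Environment", "value": env},
        {"name": "Service", "value": service_name},
        {"name": "Stage", "value": stage},
        {"name": "Status", "value": status},
        {"name": "Added", "value": ", ".join(added) or "-"},
        {"name": "Removed", "value": ", ".join(removed) or "-"},
    ]
    if error:
        facts.append({"name": "Error", "value": str(error)})
    msg_title = os.environ.get("MSG_TITLE", "Tenant firewall update")
    payload = {
        "@type": "MessageCard",
        "@context": "https://schema.org/extensions",
        "summary": msg_title,
        "title": f"{msg_title} [{status}]",
        "sections": [{"facts": facts}],
    }
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Fire-and-forget: failures are caught and logged by the caller.
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()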
At the Day's End
That's all for now, folks!
With the automated pipeline in place and cross-account EventBridge wiring established, I managed to turn our Network Firewall into a dynamic enforcement layer that responds safely to tenant-driven change. By delegating high-frequency updates to an event-driven Lambda workflow, we eliminated manual intervention and avoided the Terraform state drift that typically accompanies mutable firewall rules. As a result, it now takes only a few seconds to update the firewall (plus the time NFW takes to reload), compared to around 40 minutes with the old HAProxy-based system.
Let me know in the comments section below if you are interested in the observability features (Microsoft Teams notifications) I implemented, failure handling, or the long-term maintainability of dynamic Suricata rule sets. I can make those the focus of the final part of this series.
While I tried my best to ensure the accuracy of the code snippets provided, feel free to reach out in the comments below or via LinkedIn if you spot any discrepancies or have questions regarding the implementation.
