Nez Iribas

Posted on Apr 28

How I Governance-Proofed Our Bedrock Agents Across Multiple AWS Accounts

#ai #aws #terraform #community

Part 2 of 4 — Securing Agentic AI on AWS: A Cloud Engineer's Playbook

In Part 1, I fixed 11 security gaps in a single Bedrock Agent. That was the easy part.

The hard part: making sure no developer in any of our AWS accounts could ever reproduce those gaps.

The moment someone opens a new sandbox account and deploys an agent without reading the playbook, everything you fixed is undone. A playbook isn't governance. Governance is what makes the playbook irrelevant.

This article covers how to enforce Bedrock security controls at the organizational level — so insecure configurations become impossible to deploy, not just discouraged. By the end, you'll have:

3 production-ready SCPs that block insecure Bedrock configurations before they're deployed
2 AWS Config Rules (1 managed + 1 custom Lambda) that catch drift in real time
An EventBridge → SNS alerting pipeline wired up with Terraform
A hands-on test sequence to verify every control actually works

All Terraform. All verified against AWS documentation. No handwaving.

Note: Some supporting resources (KMS keys, S3 buckets, IAM roles) are referenced but not fully defined — this article focuses on the governance layer itself. Adapt the examples to fit your existing infrastructure.

The Architecture: Three Layers

Layer 1 — SCPs (Preventive)
  └── Block insecure configurations before they're deployed

Layer 2 — AWS Config Rules (Detective)
  └── Catch drift in resources that already exist

Layer 3 — EventBridge + SNS (Responsive)
  └── Alert immediately when something slips through

If Layer 1 does its job, Layer 2 should have little to report. When Layer 2 does fire, Layer 3 wakes someone up. This is defense in depth, not out of paranoia, but because people make mistakes and environments drift.

Prerequisites

You'll need:

AWS Organizations with at least one OU
Terraform v1.5+ with AWS provider v5.x
Permissions to manage org-level policies

Verify your org structure before you start:

# Confirm your org exists and get the root ID
aws organizations describe-organization

# List your OUs
ROOT_ID=$(aws organizations list-roots --query 'Roots[0].Id' --output text)
aws organizations list-organizational-units-for-parent --parent-id $ROOT_ID

Layer 1: Service Control Policies

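SCPs set the maximum permissions available in an account: an explicit deny in an SCP overrides any IAM allow, so the request is rejected no matter what the caller's IAM policies say. Even the root user of a member account cannot override them. The one exception is the organization's management account, which SCPs do not affect. This is what makes them the right tool for hard enforcement.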

⚠️ Always test SCPs in a sandbox OU first. Attach to a single test account before applying org-wide. A misconfigured SCP can immediately lock out legitimate access across every account in the OU.

SCP 1: Block Bedrock API Keys Entirely

In July 2025, AWS released Bedrock API Keys — a new authentication method that lets developers call Bedrock outside of standard IAM roles. Long-lived keys, created with one click from the console, not tied to IAM roles. If your team has no need for them, block both the creation and the usage.

This SCP requires two statements to be complete. Most examples online only include one — that's not enough:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyBedrockAPIKeyCreation",
      "Effect": "Deny",
      "Action": "iam:CreateServiceSpecificCredential",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:ServiceSpecificCredentialServiceName": "bedrock.amazonaws.com"
        }
      }
    },
    {
      "Sid": "DenyBedrockBearerTokenUsage",
      "Effect": "Deny",
      "Action": "bedrock:CallWithBearerToken",
      "Resource": "*"
    }
  ]
}

Why both statements? The first blocks creating new API keys. The second blocks using existing ones — including short-term keys that are generated client-side and never appear as a CloudTrail creation event. Skip either statement and you have a gap.

Important: Without the iam:ServiceSpecificCredentialServiceName condition, the first statement also blocks service-specific credentials for other services, such as Git credentials for CodeCommit and credentials for Amazon Keyspaces. Always scope it.
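One way to keep both statements and the scoping condition from quietly going missing is a small lint check run before `terraform apply`. The sketch below is only an illustration: `find_api_key_scp_gaps` is a hypothetical helper name, it inspects nothing beyond the policy structure shown above, and it compares action names by exact string match.

```python
def _as_list(value):
    """IAM accepts a string or a list for Action and condition values; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_api_key_scp_gaps(policy):
    """Return human-readable problems; an empty list means both halves are present."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    denies = [s for s in statements if s.get("Effect") == "Deny"]

    gaps = []

    # Half 1: creation of long-term keys must be denied AND scoped to Bedrock.
    creation = [
        s for s in denies
        if "iam:CreateServiceSpecificCredential" in _as_list(s.get("Action"))
    ]
    if not creation:
        gaps.append("no Deny on iam:CreateServiceSpecificCredential")
    for stmt in creation:
        condition = stmt.get("Condition", {}).get("StringEquals", {})
        services = _as_list(condition.get("iam:ServiceSpecificCredentialServiceName"))
        if services != ["bedrock.amazonaws.com"]:
            gaps.append("creation Deny is not scoped to bedrock.amazonaws.com")

    # Half 2: usage of any bearer token (including short-term keys) must be denied.
    if not any("bedrock:CallWithBearerToken" in _as_list(s.get("Action")) for s in denies):
        gaps.append("no Deny on bedrock:CallWithBearerToken")

    return gaps


if __name__ == "__main__":
    # A policy with only an unscoped creation Deny, the pattern warned about above.
    incomplete = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Deny", "Action": "iam:CreateServiceSpecificCredential", "Resource": "*"}
        ],
    }
    for gap in find_api_key_scp_gaps(incomplete):
        print("GAP:", gap)
```

Loading `policies/deny-bedrock-api-keys.json` with `json.load` and passing the result to the function would let a CI step fail the build whenever the returned list is non-empty.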

SCP 2: Enforce Approved Models with Wildcard ARNs

Use NotResource with wildcard ARNs — not full model version strings. Full version strings break when AWS releases a new model version, and they do regularly.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnapprovedBedrockModels",
      "Effect": "Deny",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:Converse",
        "bedrock:ConverseStream"
      ],
      "NotResource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.*",
        "arn:aws:bedrock:*::foundation-model/amazon.*"
      ]
    }
  ]
}

Why include Converse and ConverseStream? AWS says these are blocked automatically when InvokeModel is denied — but the Converse API is the recommended unified inference API, and relying on implicit cascading behavior is fragile. Make it explicit.

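Note on Cross-Region Inference (CRIS): If you use Bedrock's cross-region inference profiles, avoid aws:RequestedRegion conditions for model governance, because CRIS routes requests to multiple regions automatically. Stick to NotResource model ARN matching instead. One caveat: a call made through an inference profile is also authorized against the inference-profile ARN (arn:aws:bedrock:<region>:<account-id>:inference-profile/...), which the two foundation-model patterns above do not match. Add the inference profiles you approve to NotResource, or those calls will be denied.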
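To get a feel for which ARNs a NotResource deny catches, the wildcard matching can be approximated locally. This is a rough sketch, not IAM's real evaluator: it uses Python's `fnmatchcase` as a stand-in for IAM wildcard matching, `is_denied` is a hypothetical helper, and the account ID and model IDs are illustrative.

```python
from fnmatch import fnmatchcase

# The NotResource patterns from the approved-models SCP above.
APPROVED_PATTERNS = [
    "arn:aws:bedrock:*::foundation-model/anthropic.*",
    "arn:aws:bedrock:*::foundation-model/amazon.*",
]


def is_denied(resource_arn, not_resource=APPROVED_PATTERNS):
    """A Deny with NotResource applies to every ARN that matches none of the patterns."""
    return not any(fnmatchcase(resource_arn, pattern) for pattern in not_resource)


if __name__ == "__main__":
    samples = [
        # Approved family, pinned version: still matches the family wildcard.
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        # Family outside the approved list.
        "arn:aws:bedrock:us-east-1::foundation-model/deepseek.r1-v1:0",
        # Inference-profile ARN (illustrative account ID): not a foundation-model ARN.
        "arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0",
    ]
    for arn in samples:
        print("DENY " if is_denied(arn) else "ALLOW", arn)
```

The third sample is the interesting one: under this approximation an inference-profile ARN falls outside both patterns, which is why approved profiles may need their own NotResource entries.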

SCP 3: Block Specific Model Families

For compliance, legal, or cost reasons you may want to explicitly deny certain model families:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnapprovedModelFamilies",
      "Effect": "Deny",
      "Action": "bedrock:*",
      "Resource": [
        "arn:aws:bedrock:*:*:foundation-model/deepseek.*",
        "arn:aws:bedrock:*:*:foundation-model/cohere.*"
      ]
    }
  ]
}

Terraform: Deploy All SCPs to Your OU

# terraform/governance/scps.tf

locals {
  scp_policies = {
    deny_api_keys = {
      name        = "DenyBedrockAPIKeys"
      description = "Blocks Bedrock API key creation and usage — use IAM roles instead"
      file        = "policies/deny-bedrock-api-keys.json"
    }
    enforce_approved_models = {
      name        = "EnforceApprovedBedrockModels"
      description = "Restricts Bedrock invocations to approved model families"
      file        = "policies/approved-models.json"
    }
    deny_unapproved_families = {
      name        = "DenyUnapprovedModelFamilies"
      description = "Explicitly blocks disallowed model families"
      file        = "policies/deny-model-families.json"
    }
  }
}

resource "aws_organizations_policy" "bedrock_governance" {
  for_each = local.scp_policies

  name        = each.value.name
  description = each.value.description
  type        = "SERVICE_CONTROL_POLICY"
  content     = file("${path.module}/${each.value.file}")

  tags = {
    Purpose   = "bedrock-governance"
    ManagedBy = "terraform"
  }
}

resource "aws_organizations_policy_attachment" "bedrock_governance" {
  for_each = local.scp_policies

  policy_id = aws_organizations_policy.bedrock_governance[each.key].id
  target_id = var.ai_workloads_ou_id
}

SCP quota: AWS allows at most 5 SCPs attached directly to any single root, OU, or account, and the default FullAWSAccess policy counts toward that number. If you're close to the limit, combine related policies into a single JSON document with multiple statements rather than creating separate SCPs.
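Combining policies by hand is easy to get wrong, so here is a minimal sketch of merging several SCP documents into one. `combine_scps` is a hypothetical helper, and the 5,120-character ceiling is an assumption taken from the AWS Organizations quotas page; check the current value before relying on it.

```python
import json

# Assumed maximum size of a single SCP document (see the Organizations quotas page).
SCP_MAX_CHARS = 5120


def combine_scps(*policies):
    """Merge the statements of several SCP documents into one policy document."""
    statements = []
    seen_sids = set()
    for policy in policies:
        policy_statements = policy.get("Statement", [])
        if isinstance(policy_statements, dict):
            policy_statements = [policy_statements]
        for statement in policy_statements:
            sid = statement.get("Sid")
            # Sids must be unique within one policy document.
            if sid is not None:
                if sid in seen_sids:
                    raise ValueError(f"duplicate Sid: {sid}")
                seen_sids.add(sid)
            statements.append(statement)

    combined = {"Version": "2012-10-17", "Statement": statements}
    # Measure the compact form, since whitespace counts toward the size.
    size = len(json.dumps(combined, separators=(",", ":")))
    if size > SCP_MAX_CHARS:
        raise ValueError(f"combined policy is {size} characters, over {SCP_MAX_CHARS}")
    return combined


if __name__ == "__main__":
    deny_usage = {"Version": "2012-10-17", "Statement": [
        {"Sid": "DenyBedrockBearerTokenUsage", "Effect": "Deny",
         "Action": "bedrock:CallWithBearerToken", "Resource": "*"}]}
    deny_families = {"Version": "2012-10-17", "Statement": [
        {"Sid": "DenyUnapprovedModelFamilies", "Effect": "Deny", "Action": "bedrock:*",
         "Resource": ["arn:aws:bedrock:*:*:foundation-model/deepseek.*"]}]}
    print(json.dumps(combine_scps(deny_usage, deny_families), indent=2))
```

The trade-off is granularity: one combined SCP uses a single attachment slot, but you can no longer attach or detach the individual controls separately.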

Layer 2: AWS Config Rules

SCPs prevent new violations. Config Rules catch existing drift — resources deployed before governance was in place, or changes that slipped through manually.

Step 1: Verify the Config Recorder Is Running

Config Rules silently do nothing if the recorder isn't enabled. Always check this first:

aws configservice describe-configuration-recorder-status \
  --query 'ConfigurationRecordersStatus[].recording'

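If it returns nothing, no recorder exists yet, so create one with the Terraform below. If it returns false, a recorder already exists but is stopped; an account can have only one customer managed recorder per Region, so import that recorder into Terraform (or start it) rather than creating a second one: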

# terraform/governance/config.tf

resource "aws_config_configuration_recorder" "main" {
  name     = "bedrock-governance-recorder"
  role_arn = aws_iam_role.config_recorder.arn

  recording_group {
    all_supported                 = false
    include_global_resource_types = false
    resource_types = [
      "AWS::Lambda::Function",
      "AWS::IAM::Role",
      "AWS::CloudTrail::Trail",
    ]
  }
}

resource "aws_config_delivery_channel" "main" {
  name           = "bedrock-governance-channel"
  s3_bucket_name = aws_s3_bucket.config_logs.id
  depends_on     = [aws_config_configuration_recorder.main]
}

resource "aws_config_configuration_recorder_status" "main" {
  name       = aws_config_configuration_recorder.main.name
  is_enabled = true
  depends_on = [aws_config_delivery_channel.main]
}
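The recorder check from Step 1 can also be turned into a pre-deployment gate. The sketch below interprets the JSON that `aws configservice describe-configuration-recorder-status --output json` prints when run without `--query`; `recorder_problems` is a hypothetical helper, and the sample responses are made up for illustration.

```python
def recorder_problems(response):
    """Return reasons why Config rules would evaluate nothing; empty means healthy."""
    statuses = response.get("ConfigurationRecordersStatus", [])
    if not statuses:
        return ["no configuration recorder exists in this Region"]

    problems = []
    for status in statuses:
        name = status.get("name", "<unnamed>")
        if not status.get("recording", False):
            problems.append(f"recorder '{name}' exists but is stopped")
        elif status.get("lastStatus") == "Failure":
            problems.append(f"recorder '{name}' is recording but its last recording event failed")
    return problems


if __name__ == "__main__":
    # Illustrative response for a recorder that was created but never started.
    stopped = {"ConfigurationRecordersStatus": [
        {"name": "bedrock-governance-recorder", "recording": False}
    ]}
    for problem in recorder_problems(stopped):
        print("BLOCKER:", problem)
```

Piping the CLI output through `json.load(sys.stdin)` and exiting non-zero when the list is non-empty would stop a pipeline from deploying rules that have nothing to evaluate.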

Managed Rule: CloudTrail Enabled

resource "aws_config_config_rule" "cloudtrail_enabled" {
  name        = "cloudtrail-enabled-bedrock"
  description = "CloudTrail must be enabled — Bedrock invocations won't be logged otherwise"

  source {
    owner             = "AWS"
    source_identifier = "CLOUD_TRAIL_ENABLED"
  }

  depends_on = [aws_config_configuration_recorder_status.main]
}

Custom Rule: Bedrock Agent Lambda Must Be in a VPC

No managed Config Rule exists for this check. The rule evaluates only Lambda functions tagged BedrockAgent=true, so it doesn't create noise for unrelated functions.

# lambda/config-rules/bedrock-vpc-check/handler.py
import boto3
import json

def lambda_handler(event, context):
    invoking_event = json.loads(event["invokingEvent"])
    config_item    = invoking_event.get("configurationItem") or {}

    # Notifications without a configuration item carry nothing to evaluate
    if not config_item:
        return None

    # Non-Lambda resources and deleted functions are out of scope
    if (config_item.get("resourceType") != "AWS::Lambda::Function"
            or config_item.get("configurationItemStatus") == "ResourceDeleted"):
        return put_evaluation("NOT_APPLICABLE", config_item, event)

    # Only functions tagged BedrockAgent=true are evaluated
    tags = config_item.get("tags") or {}
    if tags.get("BedrockAgent") != "true":
        return put_evaluation("NOT_APPLICABLE", config_item, event)

    # "or {}" guards against keys that are present but null
    configuration = config_item.get("configuration") or {}
    function_name = configuration.get("functionName", "unknown")
    vpc_config    = configuration.get("vpcConfig") or {}
    subnet_ids    = vpc_config.get("subnetIds") or []

    if not subnet_ids:
        return put_evaluation(
            "NON_COMPLIANT", config_item, event,
            annotation=f"Bedrock agent '{function_name}' is not deployed in a VPC"
        )

    return put_evaluation("COMPLIANT", config_item, event)


def put_evaluation(compliance_type, config_item, event, annotation=None):
    evaluation = {
        "ComplianceResourceType": config_item["resourceType"],
        "ComplianceResourceId":   config_item["resourceId"],
        "ComplianceType":         compliance_type,
        "OrderingTimestamp":      config_item["configurationItemCaptureTime"],
    }
    if annotation:
        evaluation["Annotation"] = annotation

    boto3.client("config").put_evaluations(
        Evaluations=[evaluation],
        ResultToken=event.get("resultToken", "token")
    )
    return evaluation
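The decision logic in the handler can be exercised without AWS by pulling it into a pure function. This is a sketch under assumptions: `classify_lambda` is a hypothetical name, and the sample configuration items are hand-written using only the fields the handler above reads, not captured from a real Config notification.

```python
def classify_lambda(config_item):
    """Mirror the handler's verdict for one configuration item, with no AWS calls."""
    if config_item.get("resourceType") != "AWS::Lambda::Function":
        return "NOT_APPLICABLE"

    # Only functions tagged BedrockAgent=true are in scope.
    tags = config_item.get("tags") or {}
    if tags.get("BedrockAgent") != "true":
        return "NOT_APPLICABLE"

    # Treat a missing or null vpcConfig the same as an empty one.
    configuration = config_item.get("configuration") or {}
    vpc_config = configuration.get("vpcConfig") or {}
    return "COMPLIANT" if vpc_config.get("subnetIds") else "NON_COMPLIANT"


if __name__ == "__main__":
    samples = {
        "agent without VPC": {
            "resourceType": "AWS::Lambda::Function",
            "tags": {"BedrockAgent": "true"},
            "configuration": {"functionName": "agent-a", "vpcConfig": None},
        },
        "agent in VPC": {
            "resourceType": "AWS::Lambda::Function",
            "tags": {"BedrockAgent": "true"},
            "configuration": {"functionName": "agent-b",
                              "vpcConfig": {"subnetIds": ["subnet-0example"]}},
        },
        "untagged function": {
            "resourceType": "AWS::Lambda::Function",
            "tags": {},
            "configuration": {"functionName": "unrelated"},
        },
    }
    for label, item in samples.items():
        print(f"{label}: {classify_lambda(item)}")
```

Keeping the verdict separate from the `put_evaluations` call makes the rule unit-testable and leaves the handler as a thin wrapper around it.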

# terraform/governance/config-rules.tf

data "archive_file" "bedrock_vpc_check" {
  type        = "zip"
  source_dir  = "${path.module}/lambda/config-rules/bedrock-vpc-check"
  output_path = "${path.module}/dist/bedrock-vpc-check.zip"
}

resource "aws_lambda_function" "bedrock_vpc_check" {
  filename         = data.archive_file.bedrock_vpc_check.output_path
  source_code_hash = data.archive_file.bedrock_vpc_check.output_base64sha256
  function_name    = "bedrock-governance-vpc-check"
  role             = aws_iam_role.config_rule_lambda.arn
  handler          = "handler.lambda_handler"
  runtime          = "python3.12"
  timeout          = 30
}

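# Account ID used to restrict which account's Config service may invoke the function
data "aws_caller_identity" "current" {}

resource "aws_lambda_permission" "config_invoke" {
  action         = "lambda:InvokeFunction"
  function_name  = aws_lambda_function.bedrock_vpc_check.function_name
  principal      = "config.amazonaws.com"
  source_account = data.aws_caller_identity.current.account_id
}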

resource "aws_config_config_rule" "bedrock_agent_vpc_required" {
  name        = "bedrock-agent-vpc-required"
  description = "Lambda functions tagged BedrockAgent=true must run inside a VPC"

  source {
    owner             = "CUSTOM_LAMBDA"
    source_identifier = aws_lambda_function.bedrock_vpc_check.arn

    source_detail {
      event_source = "aws.config"
      message_type = "ConfigurationItemChangeNotification"
    }
  }

  scope {
    compliance_resource_types = ["AWS::Lambda::Function"]
  }

  depends_on = [
    aws_lambda_permission.config_invoke,
    aws_config_configuration_recorder_status.main
  ]
}

Layer 3: EventBridge + SNS Alerting

# terraform/governance/alerting.tf

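resource "aws_sns_topic" "bedrock_compliance_alerts" {
  name              = "bedrock-compliance-violations"
  # The key policy for aws_kms_key.sns must also let events.amazonaws.com call
  # kms:GenerateDataKey* and kms:Decrypt, or publishes to the encrypted topic fail.
  kms_master_key_id = aws_kms_key.sns.id
}

# EventBridge can only deliver to the topic if the topic policy allows it.
data "aws_iam_policy_document" "allow_eventbridge_publish" {
  statement {
    sid       = "AllowEventBridgePublish"
    effect    = "Allow"
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.bedrock_compliance_alerts.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }
  }
}

resource "aws_sns_topic_policy" "bedrock_compliance_alerts" {
  arn    = aws_sns_topic.bedrock_compliance_alerts.arn
  policy = data.aws_iam_policy_document.allow_eventbridge_publish.json
}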

resource "aws_sns_topic_subscription" "security_team_email" {
  topic_arn = aws_sns_topic.bedrock_compliance_alerts.arn
  protocol  = "email"
  endpoint  = var.security_team_email
}

resource "aws_cloudwatch_event_rule" "config_noncompliant" {
  name        = "bedrock-compliance-violation"
  description = "Fires when a Bedrock Config rule detects a violation"

  event_pattern = jsonencode({
    source      = ["aws.config"]
    detail-type = ["Config Rules Compliance Change"]
    detail = {
      configRuleName = [
        "bedrock-agent-vpc-required",
        "cloudtrail-enabled-bedrock"
      ]
      newEvaluationResult = {
        complianceType = ["NON_COMPLIANT"]
      }
    }
  })
}

resource "aws_cloudwatch_event_target" "notify_security_team" {
  rule      = aws_cloudwatch_event_rule.config_noncompliant.name
  target_id = "NotifySecurityTeam"
  arn       = aws_sns_topic.bedrock_compliance_alerts.arn

  input_transformer {
    input_paths = {
      rule     = "$.detail.configRuleName"
      resource = "$.detail.resourceId"
      account  = "$.account"
      region   = "$.region"
    }
    input_template = "\"Bedrock Governance Violation\\nRule: <rule>\\nResource: <resource>\\nAccount: <account> | Region: <region>\\nAction required: Review in AWS Config console\""
  }
}
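Before waiting on a real violation, it can help to check by hand that the event pattern matches the event shape you expect. The matcher below is a deliberately simplified sketch: it handles only nested objects and lists of literal values, which is all the pattern above uses, and ignores the rest of EventBridge's matching syntax. `matches` is a hypothetical helper and the sample event is hand-written.

```python
# The event pattern from the rule above, as a Python dict.
PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "configRuleName": ["bedrock-agent-vpc-required", "cloudtrail-enabled-bedrock"],
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]},
    },
}


def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: recurse into the matching sub-document.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must equal one of the listed literals.
            return False
    return True


if __name__ == "__main__":
    sample_event = {
        "source": "aws.config",
        "detail-type": "Config Rules Compliance Change",
        "account": "111122223333",
        "region": "us-east-1",
        "detail": {
            "configRuleName": "bedrock-agent-vpc-required",
            "resourceId": "bedrock-agent-test-noncompliant",
            "newEvaluationResult": {"complianceType": "NON_COMPLIANT"},
        },
    }
    print("matches:", matches(PATTERN, sample_event))
```

Fields the pattern does not mention (account, region, resourceId) are ignored, which is also how the rule can forward them to the input transformer without filtering on them.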

Hands-On: Verify Every Control Works

Don't assume your controls are working — test them. Here's the exact sequence to verify each layer.

Test 1: Verify the Model Restriction SCP

Run this from inside an account that's attached to your AI Workloads OU. It tries to call a model outside your approved list — if the SCP is working, it must fail with AccessDeniedException.

# test_scp.py
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Use a model that is NOT in your NotResource approved list
TEST_MODEL = "arn:aws:bedrock:us-east-1::foundation-model/deepseek.r1"

try:
    bedrock.invoke_model(
        modelId=TEST_MODEL,
        body=json.dumps({"prompt": "hi", "max_tokens": 1})
    )
    print("FAIL: SCP NOT working, call succeeded, check your policy")
except bedrock.exceptions.ClientError as e:
    code = e.response["Error"]["Code"]
    if code == "AccessDeniedException":
        print("PASS: SCP working, unapproved model blocked correctly")
    else:
        print(f"WARN: Unexpected error ({code}): {e}")

Test 2: Verify the API Key Block

Try to create a Bedrock API key. It must fail:

# Get your current IAM user name (the last path segment of the caller ARN)
IAM_USER=$(aws sts get-caller-identity --query 'Arn' --output text | awk -F/ '{print $NF}')

# Try to create a Bedrock API key; must fail with AccessDenied if the SCP is working
aws iam create-service-specific-credential \
  --user-name "$IAM_USER" \
  --service-name bedrock.amazonaws.com 2>&1

# Expected output contains: "An error occurred (AccessDenied)"
# Note: The lookup above only yields a valid user name when you're authenticated
# as an IAM User. From a role session the last ARN segment is the session name,
# so set IAM_USER by hand to an existing IAM User in the sandbox account.
# The SCP applies either way.

Test 3: Trigger the Config Rule

Deploy a non-compliant Lambda (no VPC, tagged as a Bedrock agent), wait for Config to detect it, then clean up:

# Create a minimal deployment package
echo 'def handler(e, c): pass' > /tmp/index.py
cd /tmp && zip dummy.zip index.py

# Deploy non-compliant Lambda: no VPC, tagged as Bedrock agent
# (assumes an execution role named lambda-basic-execution already exists in the account)
aws lambda create-function \
  --function-name bedrock-agent-test-noncompliant \
  --runtime python3.12 \
  --role arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/lambda-basic-execution \
  --handler index.handler \
  --zip-file fileb:///tmp/dummy.zip \
  --tags '{"BedrockAgent":"true"}' \
  --no-cli-pager

# Wait for Config to pick it up (~2-3 minutes)
echo "Waiting 3 minutes for Config evaluation..."
sleep 180

# Check the result
aws configservice get-compliance-details-by-resource \
  --resource-type "AWS::Lambda::Function" \
  --resource-id "bedrock-agent-test-noncompliant" \
  --query 'EvaluationResults[0].ComplianceType' \
  --output text
# Expected: NON_COMPLIANT

# Clean up
aws lambda delete-function --function-name bedrock-agent-test-noncompliant

Test 4: Confirm the Alert Pipeline

Force a re-evaluation and confirm the SNS email arrives:

# Trigger manual re-evaluation
aws configservice start-config-rules-evaluation \
  --config-rule-names bedrock-agent-vpc-required cloudtrail-enabled-bedrock

# Check recent violations
aws configservice get-compliance-details-by-config-rule \
  --config-rule-name bedrock-agent-vpc-required \
  --compliance-types NON_COMPLIANT \
  --query 'EvaluationResults[:5].{Resource:EvaluationResultIdentifier.EvaluationResultQualifier.ResourceId,Time:ResultRecordedTime}' \
  --output table

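If there's a non-compliant resource and the SNS subscription is confirmed, you should receive an email within 1-2 minutes. Note that the EventBridge rule matches compliance change events, so a resource that was already NON_COMPLIANT before the re-evaluation will not trigger a second email. To see an alert, re-create the non-compliant Lambda from Test 3 and leave it in place until the message arrives.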

Common Mistakes

1. Only one statement in the API key SCP. Blocking iam:CreateServiceSpecificCredential without also blocking bedrock:CallWithBearerToken leaves short-term API keys completely unblocked. Short-term keys are generated client-side with no CloudTrail creation event — the only way to block them is the second statement.

2. Testing SCPs in production. Org-wide lockouts are real. Attach to a single sandbox account first, run the test sequence above, then expand to the full OU.

3. Using full model version ARNs instead of wildcards. anthropic.claude-3-haiku-20240307-v1:0 stops matching when AWS releases a new version. Use anthropic.* to match the entire family.

4. Forgetting Converse and ConverseStream. The Converse API is the recommended unified inference API. Don't rely on implicit cascading — include it explicitly.

5. Missing iam:ServiceSpecificCredentialServiceName. Without this condition, the API key SCP also blocks CodeCommit and Keyspaces credentials, breaking unrelated workflows.

6. Config Rules deployed without a running recorder. Run describe-configuration-recorder-status before deploying rules. If the recorder isn't running, rules evaluate nothing and return no findings — silently.

What We've Built

AWS Organizations
└── AI Workloads OU
    ├── SCPs (apply to ALL accounts automatically on join)
    │   ├── DenyBedrockAPIKeys (creation + usage, two statements)
    │   ├── EnforceApprovedBedrockModels (wildcard families)
    │   └── DenyUnapprovedModelFamilies (explicit blocklist)
    │
    └── Each Member Account
        ├── Config Recorder
        ├── Config Rules
        │   ├── cloudtrail-enabled-bedrock (managed)
        │   └── bedrock-agent-vpc-required (custom Lambda)
        └── EventBridge → SNS
            └── Email alert on NON_COMPLIANT within ~2 minutes

When a new account joins the AI Workloads OU, SCPs apply immediately. Once the Config recorder, rules, and alerting stack from this article are deployed into that account, Config picks up existing resources within minutes and any violation triggers an alert. A developer in that account cannot deploy an insecure Bedrock agent and have it go unnoticed.

What's Next

Article	Topic
Part 1 ✅	Security boundaries: IAM, VPC, GuardDuty, cost controls
Part 2 ✅ (this)	Governance: SCPs, AWS Config, automated alerting
Part 3	Observability: CloudWatch + X-Ray for non-deterministic agents
Part 4	Cost engineering: Lambda vs ECS vs Fargate for agentic workloads

Drop a comment if you've run into governance challenges I didn't cover — especially around CRIS or multi-account setups. I'm also curious how many of you are using Bedrock API keys in production vs sticking with IAM roles.

Part 3 is going to cover observability for agents that are non-deterministic by design — if you've tried to debug a Bedrock agent with X-Ray and felt like you were reading tea leaves, that one's for you.

#AWS #CloudEngineering #AWSCommunityBuilders #Bedrock #Security #Terraform #Governance #AWSOrganizations
