mgbec for AWS Community Builders

Posted on May 18 • Originally published at Medium on May 17

Everything is Under Control

#amazonbedrock #aigovernance #ai #amazonbedrockagentco

I’m a control enthusiast, not a control freak. And control is part of my job description, so no apologies. As an enterprise, with all the new AI tools entering the atmosphere every day, we want to enable innovation and efficiency. We also need to have governance over these tools and their usage. Organizations want to make sure they minimize any potential risks, and of course, have observability into everything that is happening.

I wanted to test an AgentCore Gateway workflow with multiple control mechanisms: https://github.com/mgbec/CEDAR-plus-interceptor. There are three pieces I put into play, with the interceptor piece split into a request side and a response side:

OAuth 2.1 (via Cognito) — “Who are you?”

The problem it solves: Identity and authentication. Before the gateway can make any access decisions, it needs to know who’s making the request and verify they’re legitimate.

What it does in this scenario:

-The agent (or user) authenticates against Cognito with their email/password.

-Cognito issues a JWT containing the user’s identity (sub) and group memberships (cognito:groups: ["engineering"]).

-The gateway’s CUSTOM_JWT authorizer validates the token signature, expiry, audience, and issuer against Cognito’s OIDC discovery endpoint.

-If the token is invalid or missing → 401 immediately, nothing else runs

What it can’t do: It has no opinion on what the authenticated user is allowed to do. A valid token from a marketing user looks the same as one from an admin at this layer — both pass authentication.

One detail here confused me at first. Cognito returns both an ID token and an access token. The ID token tells the client application who the user is; the access token tells the gateway which app client is calling and which scopes it has been granted. The access token does not authorize the user to do anything beyond reaching the gateway, however. Its scope claim only gets the request past the gateway’s front door. It’s a binary check: “does this token have a valid scope?”
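To see the difference between the two tokens, it can help to decode their payloads side by side. The sketch below is only an illustration: it builds unsigned stand-in tokens with made-up claim values (the sub, client ID, and scope are hypothetical) and decodes the payload without verifying the signature. That is fine for inspection but never for authorization. Cognito marks each token with a token_use claim of id or access.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))


def fake_token(claims: dict) -> str:
    """Build an unsigned stand-in token for the demo (not a real Cognito token)."""
    header = b64url(json.dumps({"alg": "none"}).encode())
    body = b64url(json.dumps(claims).encode())
    return f"{header}.{body}.signature-placeholder"


# Hypothetical claim values, shaped like the claims described above.
id_token = fake_token({
    "token_use": "id",
    "sub": "1234-abcd",
    "email": "engineer@example.com",
    "cognito:groups": ["engineering"],
})
access_token = fake_token({
    "token_use": "access",
    "sub": "1234-abcd",
    "client_id": "example-client-id",
    "scope": "example/invoke",
    "cognito:groups": ["engineering"],
})

for name, token in (("ID token", id_token), ("Access token", access_token)):
    claims = decode_claims(token)
    print(name, "->", claims["token_use"], claims.get("scope"), claims["cognito:groups"])
```

With a real token from Cognito you would call decode_claims on it directly; the gateway itself still does the actual signature, expiry, audience, and issuer validation.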

Real-world analogy: The badge reader at the building entrance. It confirms you’re an employee, but doesn’t know which floors you’re allowed on.

Cedar Policy — “Are you allowed to do this?”

The problem it solves: Authorization. Given a verified identity with known group memberships, should this specific tool invocation be permitted?

What it does in this scenario:

-Reads the cognito:groups claim from the validated JWT to determine the principal

-Evaluates Cedar rules: “Is a user in the engineering group permitted to invoke the tool DatabaseTools___delete_records?”

-Returns allow or deny based purely on the static policy set

-The forbid on delete_records for engineers is absolute: in Cedar a matching forbid always wins, so no permit rule can override it
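The decision logic behind that last point can be modeled in a few lines. This is a rough Python sketch of Cedar's evaluation semantics (any matching forbid wins, otherwise at least one permit must match, and no match means deny), not Cedar itself. The Rule class and the rule set are hypothetical and are not the policies from my repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str  # "permit" or "forbid"
    group: str   # group the rule applies to
    tool: str    # tool name, or "*" for any tool


def is_allowed(rules: list[Rule], groups: list[str], tool: str) -> bool:
    """Cedar-style decision: forbid overrides permit, and the default is deny."""
    matching = [r for r in rules if r.group in groups and r.tool in ("*", tool)]
    if any(r.effect == "forbid" for r in matching):
        return False
    return any(r.effect == "permit" for r in matching)


# Hypothetical rule set for illustration.
rules = [
    Rule("permit", "engineering", "*"),
    Rule("forbid", "engineering", "DatabaseTools___delete_records"),
    Rule("permit", "admins", "*"),
]

print(is_allowed(rules, ["engineering"], "DatabaseTools___run_query"))                 # True
print(is_allowed(rules, ["engineering"], "DatabaseTools___delete_records"))            # False
print(is_allowed(rules, ["marketing"], "DatabaseTools___run_query"))                   # False
print(is_allowed(rules, ["engineering", "admins"], "DatabaseTools___delete_records"))  # False
```

The last call shows why the forbid is absolute in this model: even a user who also belongs to a group with a broad permit is still denied.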

What it can’t do:

-It can’t count how many times you’ve called a tool today

-It can’t call an external service to check something

-It can’t modify the request or response

-It can’t make decisions based on the request body content (e.g., “only allow SELECT queries, not DELETE queries”)

Real-world analogy: The access control list on each floor. Engineering badges open the lab doors but not the server room. Marketing badges only open the conference rooms.

Request Interceptor (Rate Limiter Lambda)- “Should we let this through right now?”

https://github.com/mgbec/CEDAR-plus-interceptor/tree/main/lambdas/rate-limiter

The problem it solves: Runtime enforcement that requires state, external lookups, or data transformation — things that can’t be expressed as static allow/deny rules.

What it does in this scenario:

-Runs only after OAuth and Cedar have both passed (no point rate-limiting a request that would be denied anyway)

-Reads the user ID and group from the request context

-Queries DynamoDB: “How many requests has this user made in the current hour?”

-Compares against the role-based quota (admins: 100, engineering: 50, marketing: 20)

-Either passes the request through or returns 429
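The steps above can be sketched as a fixed-window counter. This is a minimal illustration and not the Lambda from the repo: an in-memory dictionary stands in for the DynamoDB table, and the function and class names are made up. Only the quotas come from the list above.

```python
import time

# Hourly quotas per role, taken from the list above.
QUOTAS = {"admins": 100, "engineering": 50, "marketing": 20}


class InMemoryCounters:
    """Stand-in for the DynamoDB table: maps (user, window) to a request count."""

    def __init__(self):
        self._counts = {}

    def increment(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


def check_rate_limit(store, user_id, group, now=None):
    """Return an HTTP-style status: 200 to pass through, 429 when over quota."""
    now = time.time() if now is None else now
    window = int(now // 3600)      # fixed one-hour window
    limit = QUOTAS.get(group, 0)   # unknown groups get no quota
    count = store.increment((user_id, window))
    return 200 if count <= limit else 429


store = InMemoryCounters()
t0 = 1_700_000_000
statuses = [check_rate_limit(store, "user-1", "marketing", now=t0) for _ in range(21)]
print(statuses.count(200), statuses[-1])                              # 20 429
print(check_rate_limit(store, "user-1", "marketing", now=t0 + 3600))  # 200
```

Because the window key is derived from the clock, the counter only resets when the hour rolls over, which matters when you run tests back to back.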

Real-world analogy: The security guard who checks if the parking lot is full before letting your car in, even though your badge is valid and you’re allowed on that floor.

Response Interceptor (PII Redactor Lambda)- “Is this role allowed to view PII?”

https://github.com/mgbec/CEDAR-plus-interceptor/tree/main/lambdas/pii-redactor

This Lambda reads the user’s Cognito group and decides whether that group is allowed to see PII. Mine is a pretty simple PII detector that only covers SSNs, credit card numbers, email addresses, and phone numbers. In production you would want something more robust.

Depending on the caller’s group, the PII is redacted from responses before they reach the agent.
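A group-aware redactor along these lines might look like the sketch below. The regular expressions are deliberately simple illustrations of the four PII types and are not the patterns from the repo. The choice of admins as the only group allowed to see raw PII is an assumption for the example.

```python
import re

# Simple patterns for the four PII types (illustrative only).
# Order matters: card numbers are replaced before the looser phone pattern.
PII_PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CARD", re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
]

# Assumption: only this group may see unredacted PII.
GROUPS_ALLOWED_PII = {"admins"}


def redact(text: str, groups: list[str]) -> str:
    """Return text unchanged for privileged groups, otherwise mask known PII."""
    if GROUPS_ALLOWED_PII.intersection(groups):
        return text
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text


row = "jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111, phone 555-867-5309"
print(redact(row, ["engineering"]))
print(redact(row, ["admins"]))
```

Patterns like these miss plenty of real-world formats, which is the reason a production setup would want a more robust detector.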

Static access control is not a good fit on the response side, because Cedar can’t inspect or modify a response. Going the other way, you could implement role-based permissions in a Lambda, but that would be harder to audit, version, and reason about than Cedar policies.

Real-world analogy: On the way out of the building, the guard checks you for contraband being removed from company premises.

Building (and Troubleshooting)

Building this out took quite a bit of troubleshooting. I tried both CDK and Terraform. Terraform seemed to work better, but some resources were still problematic. Kiro was incredibly helpful with debugging, and part of this may have been user error. The issues I ran into, as far as I could tell:

Rate limit counters persist across tests — DynamoDB counters use a 1-hour window. If you test marketing (limit 20) and then test again in the same hour, the counter is already at 20+ and everything gets blocked immediately. Clear the table between test runs or wait for the next hour.

UpdateGateway replaces everything: The UpdateGateway API is a full replacement, not a patch (https://docs.aws.amazon.com/bedrock-agentcore-control/latest/APIReference/API_UpdateGateway.html). If you call it to attach the policy engine but don’t include interceptorConfigurations, the interceptor gets wiped. Every update must pass through ALL existing fields. This made my interceptors disappear multiple times.
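One way to guard against this is a read-merge-write pattern: fetch the current configuration, overlay only the fields you want to change, and send the merged result. The sketch below shows just the merge step as a pure function. Apart from interceptorConfigurations and the CUSTOM_JWT authorizer, the field names and values are placeholders I made up for illustration, and a real call would probably also need to drop read-only fields from the fetched config.

```python
def build_update_request(current: dict, changes: dict) -> dict:
    """Start from every field the gateway already has, then overlay the changes,
    so fields you did not mean to touch are sent back unchanged."""
    request = dict(current)  # copy so the fetched config is not mutated
    request.update(changes)
    return request


# Hypothetical snapshot of what a "get gateway" call returned.
current = {
    "name": "demo-gateway",
    "authorizerType": "CUSTOM_JWT",
    "interceptorConfigurations": [{"example": "rate-limiter"}, {"example": "pii-redactor"}],
}

# The only thing we actually want to change (placeholder field name).
changes = {"policyEngineConfiguration": {"example": "policy-engine"}}

naive = changes                                # would wipe the interceptors
safe = build_update_request(current, changes)  # carries them through

print("interceptorConfigurations" in naive, "interceptorConfigurations" in safe)  # False True
```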

Cedar policy entity types: AgentCore::Group doesn’t exist. The valid principal type is AgentCore::OAuthUser. Group membership is checked via tags: principal.hasTag("cognito:groups") && principal.getTag("cognito:groups") like "*engineering*". See https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-understanding-cedar.html
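One thing worth keeping in mind with the like check is that it is a wildcard match on a string, so it matches any tag value that contains the substring. The sketch below approximates Cedar's like operator in Python (it ignores Cedar's escape for a literal asterisk). It also assumes the tag holds the group list rendered as a single string; I have not confirmed the exact rendering, and the group names are hypothetical.

```python
import re


def cedar_like(value: str, pattern: str) -> bool:
    """Approximate Cedar's `like`: `*` matches any run of characters,
    everything else is literal, and the whole string must match."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Assumed tag renderings, for illustration only.
print(cedar_like("[engineering]", "*engineering*"))                  # True
print(cedar_like("[marketing]", "*engineering*"))                    # False
print(cedar_like("[reverse-engineering-vendors]", "*engineering*"))  # True
```

If that holds for your tags, it is safest to pick group names where none is a substring of another.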

Tool-specific policies require the exact gateway ARN. You can’t use “resource is AgentCore::Gateway” for tool-scoped policies — the API rejects it. And when the gateway gets recreated (new ID), all policies become stale and need to be recreated with the new ARN.

Gateway recreation breaks policy references: When Terraform recreates the gateway (e.g., terraform apply -replace), it gets a new ID and ARN. All Cedar policies that reference the old gateway ARN stop matching, so default-deny kicks in. You have to delete and recreate the policies with the new ARN.

From my understanding, the gateway ARN coupling is by design (security isolation between gateways). The best practice is to treat the gateway as a long-lived resource and avoid recreating it.

Using a combination of scripts and Terraform seemed to work best for me, as long as I remembered the correct order of operations. The danger zone is when either tool updates the gateway, because it can wipe what the other tool set. The safest workflow is:

terraform apply (creates/updates gateway shell)
create-policies.sh (attaches policy engine + interceptor, preserving existing config)

Observability (and Troubleshooting)

My first test was a bit of a failure. There is a small amount of observability built into the output of the tests, so we can at least see that things did not go as planned.

However, one of the best things about AgentCore is all of the detailed observability baked into the components.

We can even dig down into the trace level to watch our policies in action.

We can look at the bigger picture of our gateway performance with metrics like denied and allowed policy decisions.

One important thing to note for observability of the PII redactor response interceptor:

The traces and logs capture the response from the Lambda target, which contains the full unredacted PII. The PII redactor runs after that, as the last step before the client receives the response. The observability system records what the Lambda returned, not what the client ultimately saw.

The flow is:

Lambda returns full PII
│
├──→ CloudWatch logs/traces capture THIS (unredacted)
│
▼
PII Redactor intercepts
│
▼
Client receives redacted response

This is actually correct from a security audit perspective — you want the logs to show the full data so that security teams can audit what data was accessed. You can verify the redactor is working by comparing logs versus client response. To quickly see what is returned to the client, you can manually set the token and Gateway URL and then test with curl.

# Get a token for an engineering user and look up the gateway URL
TOKEN=$(./scripts/get-token.sh engineer@example.com 2>/dev/null)
GATEWAY_URL=$(terraform -chdir=terraform output -raw gateway_url)

# Call the tool through the gateway and pretty-print the JSON text it returns
curl -s -X POST "$GATEWAY_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "DatabaseTools___run_query", "arguments": {"sql": "SELECT * FROM users", "database": "analytics"}}}' \
  | jq -r '.result.content[0].text' | python3 -m json.tool

Next, try with an admin user, who should receive unredacted data.

# Same call with an admin token
TOKEN=$(./scripts/get-token.sh admin@example.com 2>/dev/null)
GATEWAY_URL=$(terraform -chdir=terraform output -raw gateway_url)

curl -s -X POST "$GATEWAY_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "DatabaseTools___run_query", "arguments": {"sql": "SELECT * FROM users", "database": "analytics"}}}' \
  | jq -r '.result.content[0].text' | python3 -m json.tool

Final Thoughts

So, do I feel like I have things completely under control? Not really, on many levels, but that may be a personal issue. AgentCore Gateway, together with OAuth 2.1, Cedar policies, and Lambda interceptors, helps us with constraints and oversight, and gives us some assistance with governance. Again, as we have heard over and over, this is such a dynamic field. I’m looking forward to the evolution of our GenAI and cybersecurity fields and the technological transformations we will see. Thanks for reading!