John Walker for AWS Community Builders

Implementing Guardrails on AWS Bedrock AgentCore

When you expose an AI or Gen AI agent on AWS, users can send it queries that return information you don't want made public, or that elicit responses that don't match your intent.

This is where Guardrails come in. They are an added layer of control that works alongside the models behind your agents and Gen AI applications, providing content policies, Personally Identifiable Information (PII) redaction, topic restrictions and grounding checks.

What are the Guardrails?

Content Filters

Content filters cover six categories, each enforced independently for input and output at one of four strengths (None / Low / Medium / High).

  • Hate speech
  • Insults
  • Sexual content
  • Violence
  • Misconduct (fraud / illegal activity)
  • Prompt Attack (Jailbreaking / Prompt Injection)

Denied topics

Denied topics let you define areas the agent shouldn't respond to, such as a financial services agent that shouldn't provide trading recommendations or competitor product comparisons.

Sensitive Information Redaction

Information returned from your systems could include sensitive data that you don't want to pass on to users. Guardrails provide three options when sensitive information is detected:

  • Block - completely block the request
  • Anonymise - blank out the information, but return the rest of the response
  • None - just log the information

There are built-in types, including Name, Email, Credit / Debit Card number and more, but you can also provide a custom regex pattern for proprietary data you want to remove, such as employee IDs.

Grounding Checks

You can use grounding checks in Bedrock to check the output of your agents against known source documents on the same topic. Grounding scores the model response on two axes: how well it is supported by the source documents (grounding), and how relevant it is to the user's query (relevance). You set a threshold for each, and responses that score below it are blocked.
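As a sketch, those two thresholds map to the contextualGroundingPolicyConfig parameter of create_guardrail. The 0.75 values here are illustrative, not a recommendation:

```python
# Sketch of the grounding-check configuration for create_guardrail.
# Scores range from 0 to 1; a response scoring below a threshold is blocked.
contextual_grounding_policy = {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},  # support from the source documents
        {"type": "RELEVANCE", "threshold": 0.75},  # relevance to the user's query
    ]
}

# Passed alongside the other policy configs:
# client.create_guardrail(..., contextualGroundingPolicyConfig=contextual_grounding_policy)
```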

Word Filters

If there are specific words you want to keep out of responses, you can provide a list of words or phrases, and the input or output is blocked if they are detected. This is on top of AWS's managed profanity list.

Guardrails via the console

In Amazon Bedrock, under Guardrails, choose Create Guardrail. The wizard then walks through each of the policy types. When you set a guardrail up in the console, it is saved as a draft, which lets you run test inputs through the test console and tweak the guardrail before publishing. You then publish the guardrail as a version that your agents run against; any further changes go into a new version.

Guardrails on Boto3

Creating Guardrails in code is done through the Bedrock SDK. I'll provide some examples using Python and boto3.

This takes two steps:

1. Create the guardrail
2. Publish the guardrail

Creating the Guardrail

import boto3
import json

client = boto3.client("bedrock", region_name="eu-west-1")

response = client.create_guardrail(
    name="customer-support-guardrail",
    description="Production guardrail for the customer support agent",

    # Content filters — input and output thresholds set independently
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE",          "inputStrength": "HIGH",   "outputStrength": "MEDIUM"},
            {"type": "INSULTS",       "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            {"type": "SEXUAL",        "inputStrength": "HIGH",   "outputStrength": "HIGH"},
            {"type": "VIOLENCE",      "inputStrength": "MEDIUM", "outputStrength": "LOW"},
            {"type": "MISCONDUCT",    "inputStrength": "HIGH",   "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH",   "outputStrength": "NONE"},
        ]
    },

    # Denied topics with natural language definitions
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "InvestmentAdvice",
                "definition": "Any recommendation, suggestion, or guidance on buying, "
                              "selling, or holding specific financial instruments or assets.",
                "examples": [
                    "Should I buy Apple stock?",
                    "Is now a good time to invest in crypto?",
                ],
                "type": "DENY",
            }
        ]
    },

    # PII redaction
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "EMAIL",        "action": "ANONYMIZE"},
            {"type": "PHONE",        "action": "ANONYMIZE"},
            {"type": "NAME",         "action": "ANONYMIZE"},
            {"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
        ],
        "regexesConfig": [
            {
                "name": "InternalAccountRef",
                "description": "Internal 8-digit account reference numbers",
                "pattern": r"\bACC-\d{8}\b",
                "action": "ANONYMIZE",
            }
        ],
    },

    # Custom blocked messages
    blockedInputMessaging="I'm not able to process that request. Please rephrase or contact support.",
    blockedOutputsMessaging="I'm unable to provide a response to that. Reference: GRD-001.",
)

guardrail_id = response["guardrailId"]
print(f"Created guardrail: {guardrail_id}")

This creates the guardrail with example content policies covering hate speech, insults, sexual references, violence, misconduct and prompt attacks. It also adds a denied topic and sensitive information filtering.

The last section customises the messages sent to the user when input or output is blocked.
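Just as in the console, you can run test inputs through the draft from code before publishing, using the ApplyGuardrail API on the bedrock-runtime client, which evaluates text against a guardrail without invoking a model. A minimal sketch, with the guardrail ID a placeholder:

```python
def build_content(text: str) -> list:
    """Shape plain text into the content payload apply_guardrail expects."""
    return [{"text": {"text": text}}]

def draft_intervenes(client, guardrail_id: str, text: str, source: str = "INPUT") -> bool:
    """Run text through the DRAFT guardrail; True if the guardrail intervened.

    source is "INPUT" for user prompts, "OUTPUT" for model responses.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion="DRAFT",
        source=source,
        content=build_content(text),
    )
    return response["action"] == "GUARDRAIL_INTERVENED"

# Usage (IDs are placeholders):
# runtime = boto3.client("bedrock-runtime", region_name="eu-west-1")
# print(draft_intervenes(runtime, guardrail_id, "Should I buy Apple stock?"))
```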

Publish the guardrail

Once the guardrail is created, it needs to be published to be used against Bedrock agents.

version_response = client.create_guardrail_version(
    guardrailIdentifier=guardrail_id,
    description="Initial production release — content filters + PII redaction",
)

guardrail_version = version_response["version"]
print(f"Published version: {guardrail_version}")  # e.g. "1"

Updating Guardrails

If you want to change your guardrails, for example to add new filters or change the strength of a policy, you call the update_guardrail function.

This updates the working draft of the guardrail, which you can then publish as a new version. Published versions are immutable, so each change requires publishing a new version, and your agents then need to be updated to use it.

client.update_guardrail(
    guardrailIdentifier=guardrail_id,
    name="my-agent-guardrail",
    description="Updated: added violence output filter tightening",
    contentPolicyConfig={
        "filtersConfig": [
            # NOTE: update_guardrail replaces the whole configuration, so in
            # practice include every filter you want to keep, not just the change
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    blockedInputMessaging="I'm not able to process that request.",
    blockedOutputsMessaging="I'm unable to provide a response to that.",
)

# Test DRAFT against your test suite, then publish
new_version_response = client.create_guardrail_version(
    guardrailIdentifier=guardrail_id,
    description="Tightened violence output threshold to MEDIUM",
)
print(f"New version: {new_version_response['version']}")  # "2"

Using Guardrails with Models

Attaching a guardrail to a model happens in the model configuration rather than in the guardrail itself. Here I'll use the Strands Agents SDK; guardrails can also be attached to agents via the AWS Console.

When using the Strands SDK, import BedrockModel from strands.models and add a guardrail_config section containing the guardrail identifier from the create_guardrail call above and the version number of the published version.

from strands import Agent
from strands.models import BedrockModel

model = BedrockModel(
    model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    region_name="eu-west-1",
    guardrail_config={
        "guardrailIdentifier": "a1b2c3d4e5f6",  # guardrailId from create_guardrail
        "guardrailVersion": "1",                  # always a specific version, never DRAFT
        "trace": "enabled",                       # surface intervention detail in responses
    },
)

agent = Agent(model=model)

response = agent("Tell me which stocks I should invest in right now.")
print(response)
# With the policies in place, this would be blocked.
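If you're not using Strands, the same attachment works in plain boto3 through the Converse API's guardrailConfig parameter. A sketch, with the model and guardrail IDs as placeholders from the earlier steps:

```python
def converse_with_guardrail(client, model_id: str, guardrail_id: str,
                            version: str, prompt: str):
    """Call the Bedrock Converse API with a guardrail attached.

    Returns (text, blocked), where blocked is True when the guardrail intervened.
    """
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": version,  # a published version, never DRAFT in production
            "trace": "enabled",
        },
    )
    blocked = response.get("stopReason") == "guardrail_intervened"
    text = response["output"]["message"]["content"][0]["text"]
    return text, blocked

# runtime = boto3.client("bedrock-runtime", region_name="eu-west-1")
# text, blocked = converse_with_guardrail(
#     runtime, "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
#     guardrail_id, "1", "Tell me which stocks I should invest in right now.")
```

When the guardrail intervenes, the returned text is the blocked-output message configured on the guardrail rather than a model response.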
