AWS Bedrock Guardrails
When building Gen AI applications on AWS, users can send your agent queries that surface information you don't want made public, or that produce responses that don't match your intent.
This is where Guardrails come in. They are an added layer of control that works with the models behind your agents and Gen AI applications, providing controls for content policies, Personally Identifiable Information (PII), topic restrictions and grounding checks.
What are Guardrails?
Content Filters
Content filters are set up across 6 categories that can be enforced independently for both input and output at different strengths (None / Low / Medium / High).
- Hate speech
- Insults
- Sexual content
- Violence
- Misconduct (fraud / illegal activity)
- Prompt Attack (Jailbreaking / Prompt Injection)
Denied topics
Denied topics allow you to define areas the AI agent shouldn't respond to, such as a financial services agent that shouldn't provide trading recommendations or competitor product comparisons.
Sensitive Information Redaction
Information returned from your systems could include sensitive data that you don't want exposed to users. Guardrails provide three options when sensitive information is detected:
- Block - completely block the request
- Anonymise - blank out the information, but return the rest of the response
- None - just log the information
There are built-in types, including Name, Email, Credit / Debit Card number and more, but you can also provide a custom regex pattern to remove proprietary data, such as employee IDs.
Grounding Checks
You can use grounding checks in Bedrock to validate your agent's output against known source documents on the same topic. Grounding compares the model response along two axes: how well the response is supported by the source documents (grounding), and how relevant the response is to the user's query (relevance). You can set a threshold for each, and responses that score below it are blocked.
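In the create_guardrail API this maps to the contextualGroundingPolicyConfig block. A minimal sketch of the configuration fragment (the 0.75 thresholds are illustrative, not recommendations):

```python
# Contextual grounding configuration fragment for create_guardrail.
# Thresholds range from 0 to 1; responses scoring below a threshold are blocked.
contextual_grounding_config = {
    "filtersConfig": [
        # How well the response is supported by the source documents
        {"type": "GROUNDING", "threshold": 0.75},
        # How relevant the response is to the user's query
        {"type": "RELEVANCE", "threshold": 0.75},
    ]
}
```

This dict would be passed as the contextualGroundingPolicyConfig argument alongside the other policy configs shown later.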
Word Filters
If there are specific words you want to keep out of responses, you can provide a list of words or phrases, and the input / output is blocked if they are detected. This is on top of a managed list for profanity.
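In the API this is the wordPolicyConfig block of create_guardrail. A sketch with hypothetical phrases of my own choosing:

```python
# Word filter configuration fragment for create_guardrail.
word_policy_config = {
    # Exact words / phrases to block (hypothetical examples)
    "wordsConfig": [
        {"text": "free money"},
        {"text": "guaranteed returns"},
    ],
    # AWS-managed profanity list
    "managedWordListsConfig": [
        {"type": "PROFANITY"},
    ],
}
```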
Guardrails via the console
In Amazon Bedrock, under Guardrails, choose Create Guardrail. The wizard then walks through each of the policy types. When you set a guardrail up in the console it is saved as a draft, which lets you run test inputs through the test console and tweak the policies as you see fit before publishing. You then publish the guardrail as a version that your agents can run against; any further changes go into a new version.
Guardrails on Boto3
Creating Guardrails in code is done through the Bedrock SDK. I'll provide some examples using Python and boto3.
This follows two steps:
1. Create the guardrail
2. Publish the guardrail
Creating the Guardrail
import boto3

client = boto3.client("bedrock", region_name="eu-west-1")

response = client.create_guardrail(
    name="customer-support-guardrail",
    description="Production guardrail for the customer support agent",
    # Content filters - input and output thresholds set independently
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "MEDIUM"},
            {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "LOW"},
            {"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            # Prompt attack filtering applies to input only, so output must be NONE
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    # Denied topics with natural language definitions
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "InvestmentAdvice",
                "definition": "Any recommendation, suggestion, or guidance on buying, "
                "selling, or holding specific financial instruments or assets.",
                "examples": [
                    "Should I buy Apple stock?",
                    "Is now a good time to invest in crypto?",
                ],
                "type": "DENY",
            }
        ]
    },
    # PII redaction
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
        ],
        "regexesConfig": [
            {
                "name": "InternalAccountRef",
                "description": "Internal 8-digit account reference numbers",
                "pattern": r"\bACC-\d{8}\b",
                "action": "ANONYMIZE",
            }
        ],
    },
    # Custom blocked messages
    blockedInputMessaging="I'm not able to process that request. Please rephrase or contact support.",
    blockedOutputsMessaging="I'm unable to provide a response to that. Reference: GRD-001.",
)

guardrail_id = response["guardrailId"]
print(f"Created guardrail: {guardrail_id}")
This creates the guardrail with example content policies covering hate speech, insults, sexual references, violence, misconduct and prompt attacks. It also adds a denied topic and sensitive information filtering.
The last section customises the messages sent to the user when a request or response is blocked.
Publish the guardrail
Once the guardrail is created, it needs to be published to be used against Bedrock agents.
version_response = client.create_guardrail_version(
    guardrailIdentifier=guardrail_id,
    description="Initial production release - content filters + PII redaction",
)

guardrail_version = version_response["version"]
print(f"Published version: {guardrail_version}")  # e.g. "1"
Updating Guardrails
If you want to change your guardrail, for example to add new filters or adjust the strength of a policy, you call the update_guardrail function.
This updates the draft guardrail, which you then publish to become a new version (published versions are immutable, so each change requires a new version, and agents then need to be updated to use it).
client.update_guardrail(
    guardrailIdentifier=guardrail_id,
    name="my-agent-guardrail",
    description="Updated: tightened violence output filter",
    # update_guardrail replaces the existing policy configuration, so supply
    # the full filter list you want going forward, not just the changed entry
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    blockedInputMessaging="I'm not able to process that request.",
    blockedOutputsMessaging="I'm unable to provide a response to that.",
)

# Test the DRAFT against your test suite, then publish
new_version_response = client.create_guardrail_version(
    guardrailIdentifier=guardrail_id,
    description="Tightened violence output threshold to MEDIUM",
)
print(f"New version: {new_version_response['version']}")  # "2"
Using Guardrails with Models
A guardrail is attached in the model setup at inference time; guardrails can also be attached via the AWS Console.
When using the Strands SDK, you import BedrockModel from strands.models, add a guardrail_config section, and pass in the guardrail identifier from the create_guardrail call above along with the version number of the published version.
from strands import Agent
from strands.models import BedrockModel

model = BedrockModel(
    model_id="eu.anthropic.claude-3-5-sonnet-20241022-v2:0",  # EU cross-region inference profile
    region_name="eu-west-1",
    guardrail_config={
        "guardrailIdentifier": "a1b2c3d4e5f6",  # guardrailId from create_guardrail
        "guardrailVersion": "1",  # always a specific version, never DRAFT
        "trace": "enabled",  # surface intervention detail in responses
    },
)

agent = Agent(model=model)

response = agent("Tell me which stocks I should invest in right now.")
print(response)
# With the policies in place, this would be blocked.