Dipayan Das
Using Guardrails in Amazon Bedrock: Building Safer and Governed Generative AI Applications

As Generative AI moves from experimentation to production, governance, safety, and control become non-negotiable—especially in regulated or enterprise environments. Amazon Bedrock addresses this need through Guardrails, a built-in capability that allows organizations to define and enforce policies on model behavior across prompts and responses.
In this blog, we walk through how to create and configure a Guardrail in Amazon Bedrock, using the AWS Console and the screenshots provided. By the end, you’ll understand not only how to set up guardrails, but also why they are critical for responsible AI adoption.

What Are Guardrails in Amazon Bedrock?

Amazon Bedrock Guardrails help you:
• Enforce responsible AI policies
• Restrict unsafe or non-compliant content
• Control how models respond to sensitive topics
• Apply consistent rules across models and applications
Guardrails operate as a policy layer that sits between:
• User prompts
• Foundation models
• Generated responses
This ensures outputs align with organizational, legal, and ethical standards—without modifying model weights or prompts.

Step 1: Navigate to Guardrails in Amazon Bedrock

  1. Sign in to the AWS Management Console
  2. Open Amazon Bedrock
  3. From the left navigation menu, locate Guardrails (under Build or Safety & Governance, depending on the UI version)

You will see the Guardrails Overview page, which includes:
• A list of existing guardrails
• Options to create, test, and deploy guardrails
• Account-level and organization-level enforcement settings
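
If you prefer to work programmatically, the same overview is available through the Bedrock control-plane API. Below is a minimal boto3 sketch (assuming your credentials and a Bedrock-enabled region are already configured) that lists the guardrails in the account:

```python
# List existing guardrails in the account (control-plane API).
# Assumes AWS credentials and a Bedrock-enabled region are configured.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.list_guardrails(maxResults=20)
for g in response.get("guardrails", []):
    print(g["id"], g["name"], g["status"], g["version"])
```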

Step 2: Create a New Guardrail

On the Guardrails page:
Click Create guardrail

This launches a guided configuration workflow.

You’ll see a high-level flow indicating:

  • Create guardrail
  • Test guardrail
  • Deploy guardrail

This mirrors the recommended lifecycle: define → validate → enforce.

Step 3: Provide Guardrail Details

In the Provide guardrail details screen:

  1. Enter a Guardrail name
     Example: enterprise-genai-guardrail
  2. (Optional) Add a description
     Example: “Guardrail to enforce safe, compliant responses for enterprise GenAI workloads.”
  3. Review the note indicating that:
     • Guardrails can be reused across multiple applications
     • Updates are versioned and auditable

The screen shows the “Provide guardrail details” page with a section highlighted for Guardrail instructions. Key fields include:
• Guardrail name – a unique identifier for the guardrail
• Description – optional context for administrators
• Guardrail instructions (highlighted)

Guardrail instructions are high-level behavioral rules written in plain language that tell the model:
• What it should do
• What it should not do
• How it should respond in restricted or ambiguous situations
Example from the screenshot:
“Say the model cannot answer questions that contain sensitive content.”
This instruction ensures the model:
• Recognizes sensitive or restricted prompts
• Refuses or safely redirects the response
• Maintains compliance without relying on prompt engineering alone
Guardrail instructions:
• Apply consistently across all prompts and applications
• Work independently of the user’s input prompt
• Reduce hallucinations and unsafe responses
• Enforce enterprise, regulatory, or ethical constraints
Unlike prompt instructions, guardrail instructions cannot be overridden by users, making them ideal for production workloads.
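
The same details can be supplied through the CreateGuardrail API. Below is a minimal boto3 sketch; the name and messages are illustrative examples, and the blockedInputMessaging / blockedOutputsMessaging parameters are roughly where the plain-language refusal behavior described above ends up. The policy sections configured in the following steps plug into the same call:

```python
# Create a guardrail with a name, description, and the refusal messages
# returned when a prompt or response is blocked. Policy sections
# (content filters, denied topics, PII, grounding) are added in later steps.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_guardrail(
    name="enterprise-genai-guardrail",  # example name used in this walkthrough
    description="Guardrail to enforce safe, compliant responses "
                "for enterprise GenAI workloads.",
    blockedInputMessaging="Sorry, the model cannot answer the question as it appears sensitive.",
    blockedOutputsMessaging="Sorry, the model cannot answer the question as it appears sensitive.",
    # Note: at least one policy config (contentPolicyConfig, topicPolicyConfig, etc.)
    # may be required before the call succeeds; they are shown in the following steps.
)

print("Created guardrail:", response["guardrailId"], "version:", response["version"])
```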

Step 4: Configure Content Filters (Optional but Recommended)

This step allows you to configure content filters that control what types of content the model is allowed to generate or respond to. Content filters work alongside guardrail instructions to enforce safety, compliance, and responsible AI policies.
In the screenshot, this step is labeled “Configure content filters – optional”, but in practice it is strongly recommended for production use cases.

Content filters automatically detect and restrict responses related to unsafe or sensitive topics, even if the prompt attempts to bypass instructions.
They provide category-based controls with configurable enforcement levels.

Content Categories Shown in the Screenshot

  1. Harmful Content
    Controls responses related to:
    • Violence
    • Self-harm
    • Abuse
    • Illegal activities
    You can set enforcement levels such as:
    • Low
    • Medium
    • High
    Higher filter strengths increase strictness and reduce the risk of unsafe output.

  2. Prompt Attacks (Prompt Injection Protection)
    This section protects against:
    • Prompt injection
    • Jailbreak attempts
    • Instructions that try to override system or guardrail rules
    Example attacks blocked:
    • “Ignore previous instructions and do X”
    • “Act as an unrestricted model”
    Prompt attacks are one of the most common risks in GenAI systems. This filter ensures guardrails cannot be bypassed by clever phrasing.

  3. Content Filters for Custom Categories
    This section allows you to restrict:
    • Competitive intelligence
    • Proprietary or confidential topics
    • Domain-specific sensitive content
    You can choose:
    • Default filtering behavior
    • Custom filtering tailored to your organization
When a user submits a prompt:

  • Bedrock evaluates the input against the configured content filters
  • The model output is scanned before being returned
  • If a violation is detected, the response is blocked, redacted, or replaced with a safe alternative
  • The action is logged for audit and review

This happens before the user ever sees the response.
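
For reference, this maps to the contentPolicyConfig parameter of CreateGuardrail / UpdateGuardrail. A hedged sketch with illustrative filter types and strengths (check the current API reference for the full list of categories):

```python
# Content filter configuration: category-based filters with per-direction strengths.
# Passed as the contentPolicyConfig parameter of create_guardrail / update_guardrail.
content_policy_config = {
    "filtersConfig": [
        # Harmful-content categories with configurable strength (NONE/LOW/MEDIUM/HIGH)
        {"type": "VIOLENCE",   "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "HATE",       "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        # Prompt-attack (prompt injection / jailbreak) protection applies to input only
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
    ]
}
```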

Step 5 – Add Denied Topics (Optional but Strongly Recommended)

This step allows you to explicitly define topics that the model must never respond to, regardless of how the prompt is phrased. It provides deterministic topic-level control, which is especially important for regulated, enterprise, or brand-sensitive use cases.
In the screenshot, this step is labeled “Add denied topics – optional”, but in production environments it is considered a best practice.
Denied topics are explicit subject areas that you want to completely block from AI-generated responses. If a user prompt falls into one of these topics:
• The model will not generate an answer
• A safe refusal or redirection message is returned instead
This enforcement happens before generation, making it stronger than prompt-based instructions.
From the screenshot, you can see a table where each denied topic includes:
• Name
A short, descriptive label (e.g., Insider trading advice, Medical diagnosis, Operational grid switching).
• Definition
A clear explanation of what the topic includes. This helps the model accurately classify prompts.
• Sample prompts
Example questions that fall under this denied topic. These improve detection accuracy.
• Output action
Defines what the model should do when the topic is detected (for example, refuse to answer or provide a safe alternative).
• Status
Indicates whether the denied topic is enabled.
You can click “Add denied topic” to create new entries or edit existing ones.
Denied topics provide:
• Hard boundaries that cannot be overridden
• Protection against policy violations, even with clever prompt phrasing
• Strong controls for regulated advice, proprietary knowledge, or unsafe instructions
Examples of common denied topics:
• Medical or legal advice
• Financial trading or insider information
• Critical infrastructure operations
• Proprietary algorithms or internal business strategies
• Instructions for illegal or harmful activities
When a prompt is submitted:

  1. Bedrock evaluates whether it matches any denied topic
  2. If a match is found:
     • Generation is blocked
     • A predefined refusal or safe message is returned
  3. The event is logged for audit and monitoring

This occurs before content filters and generation, making it one of the strongest guardrail mechanisms.

This screenshot shows the “Add denied topic” configuration dialog in Amazon Bedrock Guardrails. In this step, you define a specific topic that the model must not respond to, along with how Bedrock should handle requests related to that topic.
This step gives you deterministic, topic-level enforcement beyond general safety filters.
At the top, you provide a clear definition of the denied topic.
• This description explains what the topic includes and excludes
• It helps Bedrock accurately classify user prompts

Write definitions in plain, unambiguous language that clearly captures the intent of the restriction.
Example:
“This topic includes any requests for medical diagnosis, treatment recommendations, or health advice intended for personal use.”

In the Input section, you specify how Bedrock should treat incoming prompts that match this denied topic.
• This tells Bedrock to detect and block prompts related to the defined topic
• The system uses this configuration to intercept the request before generation
This ensures the model does not even attempt to reason about restricted subject matter.
In the Output section, you define what the model should do instead of answering.
Typical behaviors include:
• Refusing to answer politely
• Providing a safe redirection
• Displaying a predefined message
Example output behavior:
“I’m not able to help with that request, but I can provide general information on related topics.”
This protects user experience while maintaining compliance.

You can optionally add sample prompts that represent real user questions related to the denied topic.
Why this matters:
• Improves detection accuracy
• Helps Bedrock learn how users may phrase restricted requests
• Strengthens enforcement against prompt paraphrasing
Example:
• “What medication should I take for chest pain?”
• “Can you diagnose this condition based on symptoms?”
When a user submits a prompt:

  1. Bedrock checks if the prompt matches any denied topic
  2. If a match is found:
     • Generation is blocked
     • The configured output behavior is triggered
  3. The event is logged for governance and auditing

This happens before content filters and model generation, making it one of the strongest guardrail controls.
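
Programmatically, denied topics map to the topicPolicyConfig parameter of CreateGuardrail. A minimal sketch with illustrative names, definitions, and sample prompts taken from the examples above:

```python
# Denied-topic configuration: each topic has a name, a plain-language definition,
# optional sample prompts, and the DENY type. Passed as topicPolicyConfig.
topic_policy_config = {
    "topicsConfig": [
        {
            "name": "Medical diagnosis",
            "definition": (
                "Any request for medical diagnosis, treatment recommendations, "
                "or health advice intended for personal use."
            ),
            "examples": [
                "What medication should I take for chest pain?",
                "Can you diagnose this condition based on symptoms?",
            ],
            "type": "DENY",
        },
    ]
}
```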

Step 6 – Add Sensitive Information Filters (PII Protection)

This step allows you to configure Sensitive Information Filters in Amazon Bedrock Guardrails. These filters ensure that the model does not expose, generate, or repeat personally identifiable information (PII) or other sensitive data in its responses.
This is a critical step for enterprise, regulated, and customer-facing applications.
The screenshot shows the “Add new PII” dialog under Add sensitive information filters – optional.
Here, you explicitly define:
• What type of sensitive information to detect
• How the model should handle that information
• Whether it applies to user input, model output, or both

  1. Select the PII Type
    At the top, you select a PII Type from a predefined list. Examples shown include:
    • Name
    • Email
    • Phone number
    • Address
    • Date of birth
    • Username
    • Driver’s license number
    • Passport number
    • Credit card number
    • Bank account number
    • Social Security Number
    • Tax ID
    These are pre-trained, system-recognized PII categories, meaning Bedrock already understands how to detect them accurately.

  2. Choose the Filter Action
    After selecting the PII type, you define how Bedrock should respond when this data is detected.
    Typical actions include:
    • Block – Prevent the response entirely
    • Mask (Redact) – Remove or replace the sensitive portion with a placeholder
    Best practice:
    • Use Block or Redact for PII in production workloads
    • Avoid allowing raw PII to pass through generative responses

  3. Scope of Enforcement
    The configuration applies to:
    • Input prompts (prevents users from submitting PII)
    • Model outputs (prevents AI from generating or echoing PII)
    • Or both
    This ensures protection regardless of where the sensitive data originates.
Sensitive information filters:
• Protect user privacy
• Support compliance with regulations (GDPR, HIPAA, CCPA)
• Reduce legal and reputational risk
• Prevent accidental data leakage in GenAI outputs

Unlike prompt-based controls, these filters are system-enforced and cannot be bypassed by clever wording.

When a request is processed:

  1. Bedrock scans the prompt and response for configured PII types
  2. If sensitive data is detected, the defined action (block/redact) is applied
  3. The response is either sanitized or rejected before reaching the user
  4. Events are logged for audit and governance
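
In API terms, this maps to the sensitiveInformationPolicyConfig parameter. The PII entity names and the custom regex below are illustrative, so verify them against the current API reference:

```python
# Sensitive-information configuration: managed PII entity types plus optional
# custom regex patterns. Passed as sensitiveInformationPolicyConfig.
sensitive_info_policy_config = {
    "piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},                     # mask in place
        {"type": "PHONE", "action": "ANONYMIZE"},
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},     # reject entirely
    ],
    "regexesConfig": [
        {
            # Hypothetical internal identifier pattern, shown only for illustration
            "name": "employee-id",
            "description": "Internal employee IDs like EMP-123456",
            "pattern": r"EMP-\d{6}",
            "action": "ANONYMIZE",
        }
    ],
}
```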

Step 7 – Add Contextual Grounding Checks (Optional but Highly Recommended)

This step allows you to configure Contextual Grounding Checks, which ensure that the model’s responses are relevant, accurate, and grounded in the provided source material rather than hallucinated or inferred beyond context.
This is a critical control for RAG (Retrieval-Augmented Generation) and enterprise GenAI applications where answers must be based on verified knowledge.
Contextual grounding checks evaluate whether:
• The response is supported by retrieved context or known sources
• The answer stays within scope of the user’s request
• The model avoids fabricating facts or unsupported claims
In the screenshot, you see two grounding dimensions: Grounding and Relevance.

1. Grounding Check

The Grounding section verifies that responses are anchored to the provided context (e.g., Knowledge Base documents, retrieved chunks, or reference material).

Configuration options shown:
• Enable grounding check (toggle)
• Confidence threshold (slider)
• Action on violation (Block / Allow / Warn)

What this means:
• If the model generates content not supported by retrieved sources, the grounding check detects it.
• If confidence falls below the threshold, Bedrock can:
  • Block the response
  • Return a safer alternative
  • Log the violation for review

Use grounding checks whenever your application relies on:
• Knowledge Bases
• Enterprise documents
• Regulatory or factual content

2. Relevance Check

The Relevance section ensures that the response directly answers the user’s question and does not drift into unrelated or speculative content.

Configuration options shown:
• Enable relevance check
• Minimum relevance threshold
• Action on violation

Why this matters:
• Prevents verbose but irrelevant responses
• Reduces “confident but wrong” outputs
• Improves user trust and answer quality

This is especially useful in:
• Customer support bots
• Internal knowledge assistants
• Compliance-driven applications

When a prompt is submitted:

  1. Bedrock retrieves relevant context (if applicable)
  2. The model generates a response
  3. Grounding and relevance checks evaluate:
     • Is this answer supported by context?
     • Does it directly address the question?
  4. If thresholds are not met, the configured action is applied (block, warn, or replace)
  5. The final response is returned (or withheld)

This happens after generation but before delivery to the user, ensuring unsafe or misleading answers are intercepted.
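
These two checks correspond to the contextualGroundingPolicyConfig parameter, where each filter takes a threshold between 0 and 1 (the thresholds below are illustrative starting points, not recommendations):

```python
# Contextual grounding configuration: GROUNDING checks that the answer is
# supported by the supplied source/context, RELEVANCE checks that it addresses
# the query. Responses scoring below a threshold trigger the guardrail.
contextual_grounding_policy_config = {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
        {"type": "RELEVANCE", "threshold": 0.75},
    ]
}
```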

Step 8 – Add Automated Reasoning Check

This step enables Automated Reasoning, a guardrail capability that evaluates whether the model’s response follows logical, policy-compliant reasoning before it is returned to the user. It is designed to prevent internally inconsistent, illogical, or policy-violating conclusions, even when the content itself appears safe.

What Is Automated Reasoning?
Automated Reasoning uses formal logic techniques to:
• Validate that the model’s reasoning aligns with defined policies
• Detect contradictions or invalid inferences
• Ensure responses comply with organizational rules, constraints, and assumptions
This goes beyond content filtering by checking how the model arrived at an answer, not just what it says.

  1. Enable Automated Reasoning Policy
     • The toggle enables or disables automated reasoning checks for this guardrail.
     • When enabled, Bedrock evaluates the model’s reasoning chain against configured policies.
     This is especially valuable for decision-support, compliance, and regulated workloads.
  2. Confidence Threshold
     • You define a confidence threshold (for example, 0.8).
     • If the system’s confidence that the response follows correct reasoning falls below this threshold, enforcement actions are triggered.
     Start with a moderate threshold and increase it as you validate performance.
  3. Policy Selection
    You can associate one or more reasoning policies with the guardrail, such as:
    • Business rules
    • Compliance constraints
    • Operational logic (e.g., “do not recommend action X without condition Y”)
    These policies act as logical constraints that the model must respect.

  4. Enforcement Action
    If a response fails the reasoning check, Bedrock can:
    • Block the response
    • Replace it with a safe alternative
    • Log the violation for audit and monitoring
    This ensures that even fluent responses that sound correct but violate logic or policy never reach end users.
How This Works at Runtime

  1. The model generates a response
  2. Automated Reasoning evaluates:
     • Logical consistency
     • Policy adherence
     • Confidence in valid reasoning
  3. If the confidence score is below the threshold, the configured enforcement action is applied
  4. The final response is either delivered, modified, or blocked

This occurs after generation but before delivery, making it a strong last-line control.

The screenshot below shows what the guardrail configuration looks like after the content filters and checks are set up.

Step 9 – Select and Attach a Model to the Guardrail

In this step, you choose which foundation model(s) the guardrail will be applied to. Guardrails in Amazon Bedrock are model-agnostic policies, but they only take effect once they are explicitly attached to a model.
The dialog is divided into three sections:

  1. Categories (Model Providers)
     On the left, you see different model providers, such as:
     • Amazon
     • Anthropic
     • Cohere
     • Meta
     • Mistral
     • Others supported by Bedrock
     This allows you to apply the same guardrail consistently across multiple model families if needed.

  2. Models
     In the center panel, you select a specific foundation model from the chosen provider. In the screenshot, an Amazon model such as Nova Pro (or a similar Nova family model) is selected.
     This determines which model’s inputs and outputs will be governed by the guardrail rules you configured (content filters, denied topics, PII filters, grounding checks, automated reasoning, etc.).

  3. Inference Type
     On the right, you select the inference mode, such as On-demand (shown in the screenshot).
     This defines how the model is invoked at runtime and ensures the guardrail is enforced consistently regardless of invocation method.

Once you apply the selection:
• The guardrail becomes active for that model
• Every prompt sent to the model is evaluated against the guardrail
• Every generated response is filtered, validated, or blocked according to the configured policies
• Enforcement happens before responses are returned to applications or users

This creates a policy enforcement boundary around the model.

Without attaching a model:
• Guardrails exist but are not enforced
• Applications may inadvertently bypass governance

By attaching the guardrail:
• Safety and compliance become automatic
• Policies apply uniformly across applications
• Teams can reuse the same guardrail for multiple workloads

This is especially useful in enterprise environments where multiple teams use the same models.
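
At runtime, one common way to enforce the guardrail for a given model is to reference it in the invocation call, for example with the Converse API. The model ID, guardrail ID, and version below are placeholders:

```python
# Invoke a foundation model with the guardrail enforced on both the prompt
# and the generated response (Converse API).
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="amazon.nova-pro-v1:0",                  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Summarize our security policy."}]}],
    guardrailConfig={
        "guardrailIdentifier": "YOUR_GUARDRAIL_ID",  # placeholder
        "guardrailVersion": "1",                     # or "DRAFT" while testing
    },
)

print(response["output"]["message"]["content"][0]["text"])
print("Stop reason:", response["stopReason"])        # 'guardrail_intervened' when blocked
```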

Step 10 – Guardrail Validation

This screen represents the Guardrail Validation (Test Guardrail) stage in Amazon Bedrock. It allows you to validate that your guardrail behaves exactly as intended before deploying it to production workloads.
Guardrail validation is a critical quality-control step that ensures safety, compliance, and governance rules are enforced correctly.
The screen is divided into two main areas:

  1. Guardrail Overview (Left Panel)
    This section displays:
    • Guardrail name
    • Status (e.g., Working draft)
    • Model attached (e.g., Nova Pro)
    • Guardrail configuration summary
    • Versions (draft vs deployed)
    This confirms:
    • The guardrail is correctly configured
    • It is not yet promoted to a deployed version
    • It is safe to test without impacting production traffic

  2. Test Guardrail Panel (Right Side)
    This is where validation happens.
    You can:
    • Enter a test prompt
    • Choose the foundation model governed by the guardrail
    • Execute the test and observe the outcome
    The test evaluates all guardrail components simultaneously, including:
    • Guardrail instructions
    • Content filters
    • Denied topics
    • Sensitive information (PII) filters
    • Contextual grounding checks
    • Automated reasoning checks
When you click Test:

  1. The prompt is submitted to the selected foundation model
  2. The model generates a response
  3. The guardrail evaluates:
     • Whether the prompt violates any denied topics
     • Whether content filters are triggered
     • Whether sensitive information is detected
     • Whether grounding and reasoning thresholds are met
  4. Based on enforcement rules, the response is:
     • Allowed
     • Modified (redacted / replaced)
     • Blocked with a safe refusal message

All actions happen before the response is returned, exactly as they would in production. In this example, no guardrail action is taken because the prompt did not violate any restriction.

The screenshot below illustrates that the validation process confirms the guardrail is properly integrated with the model, effectively intercepting responses and enforcing established safety and governance policies. The test results indicate that responses are appropriately modified or constrained, demonstrating the guardrail’s proper functionality and readiness for deployment.

The screenshot below represents the guardrail evaluation outcome after running a test prompt against a foundation model governed by an Amazon Bedrock Guardrail. It shows how each guardrail control was evaluated and enforced for the submitted prompt.

  1. Test Prompt Execution (Left Panel)
     The prompt entered (highlighted at the top) is intentionally crafted to trigger guardrail logic—for example, asking for content that violates defined policies.
     This is a best practice during validation:
     • Test with adversarial or edge-case prompts
     • Confirm that guardrails activate as expected

  2. Guardrail Action (Left Panel – Bottom)
     The Guardrail action section clearly states:
     “Sorry, the model cannot answer the question as it appears sensitive.”
     This message confirms that:
     • The model response was intercepted
     • The guardrail blocked or modified the output
     • A safe refusal message was returned instead of raw model output
     Key signal: the guardrail is actively enforcing policy, not just monitoring.

  3. Policy Evaluation Summary (Right Panel)
     The right panel provides a policy-by-policy evaluation breakdown, which is the most important validation artifact.
     Content filters
     • Status: Tested
     • Indicates that harmful or restricted content categories were evaluated
     • No unsafe content was allowed through
     Denied topics
     • Status: Enforced (triggered)
     • Confirms the prompt matched a denied topic
     • This is the primary reason the response was blocked
     Sensitive information filters
     • Status: Not triggered
     • Indicates no PII or sensitive data was detected in this specific prompt
     This granular breakdown proves:
     • The correct policy fired
     • Other policies were evaluated but not unnecessarily triggered
     • Enforcement is precise, not over-aggressive

  4. What This Confirms Technically
     From a guardrail evaluation standpoint, this screen proves that:
     • Prompt classification is working – the system correctly identified the prompt as violating a denied topic
     • Policy enforcement is deterministic – the guardrail did not rely on prompt engineering or model discretion, and the response was blocked before it reached the user
     • Safe fallback behavior is configured – the user received a compliant refusal message, and no sensitive or unsafe content leaked
     • Multi-layer guardrails are functioning – content filters, denied topics, and PII checks all executed, and only the relevant control was enforced
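
Once validation looks good, the working draft can be promoted to an immutable, numbered version, which is what production applications should reference. A short sketch (the guardrail ID and description are placeholders):

```python
# Promote the validated working draft to a numbered, immutable guardrail version.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

version_response = bedrock.create_guardrail_version(
    guardrailIdentifier="YOUR_GUARDRAIL_ID",             # placeholder
    description="Initial validated version for production use",
)

print("Deployed guardrail version:", version_response["version"])
```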

Important: Make Sure to Delete the Knowledge Base

When you delete an Amazon Bedrock Knowledge Base, Bedrock removes only:

  • The Knowledge Base configuration in Bedrock
  • The metadata that links the KB to the data source (S3), the embedding model, and the vector store
  • The ability to query or sync that Knowledge Base

It does NOT delete the underlying infrastructure resources.

What Is Not Deleted Automatically

Because our Knowledge Base uses Amazon OpenSearch Serverless as the vector store, the following remain intact after KB deletion:

  • OpenSearch Serverless collection
  • Vector index
  • Stored embeddings
  • Network and security policies
  • IAM access policies

These resources continue to:

  • Exist in your AWS account
  • Consume cost
  • Be accessible (subject to IAM policies)

Deleting an Amazon Bedrock Knowledge Base does not delete Amazon OpenSearch Serverless resources; delete those manually if they are no longer needed, to avoid any additional AWS cost.
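
Below is a hedged cleanup sketch: it deletes the Knowledge Base via the bedrock-agent API and then removes the OpenSearch Serverless collection manually. The Knowledge Base ID and collection name are placeholders; double-check what a collection contains before deleting it:

```python
# Clean up: delete the Bedrock Knowledge Base, then manually delete the
# OpenSearch Serverless collection that served as its vector store.
import boto3

agent = boto3.client("bedrock-agent", region_name="us-east-1")
aoss = boto3.client("opensearchserverless", region_name="us-east-1")

# 1. Delete the Knowledge Base configuration (does NOT delete the vector store)
agent.delete_knowledge_base(knowledgeBaseId="YOUR_KB_ID")               # placeholder

# 2. Find and delete the OpenSearch Serverless collection backing the KB
collections = aoss.batch_get_collection(names=["my-kb-vector-store"])   # placeholder name
for collection in collections.get("collectionDetails", []):
    aoss.delete_collection(id=collection["id"])
    print("Deleted collection:", collection["name"])

# Also remove any associated security, network, and data access policies
# if they are no longer needed.
```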
