As I mentioned in the previous post, where we created the JIRA ticketing action group, I managed to create a ticket that was insulting the IT team. As an IT team, I would like not to be insulted, right? So today we are going to implement Guardrails. But wait, there's more! The database access tickets I was supposed to create could be set to expire at any date, in the past or in the far future (post-AGI). I will explain along the way.
Part 1 - Confluence Knowledge Base
Part 2 - JIRA ticket action group
GitHub: ppabis/it-agent-bedrock
Guardrail #1 - Insults and profanity
The first guardrail I would like to implement is one that prevents users from posting inappropriate content into the tickets. I will use only input filtering, as in the future I might implement another "Action" that retrieves tickets, and I don't want the guardrail to block those responses. In Terraform code this looks something like this:
resource "aws_bedrockagent_guardrail" "filtering" {
name = "ticket-agent-guardrail"
description = "Input-only guardrail for toxic language and disallowed topics."
blocked_input_messaging = "Your input includes content that is not allowed to pass further. Please rephrase your message."
blocked_outputs_message = "Response blocked by guardrail."
content_policy_config {
filters_config {
type = "HATE"
input_strength = "MEDIUM"
output_strength = "NONE"
}
filters_config {
type = "INSULTS"
input_strength = "MEDIUM"
output_strength = "NONE"
}
filters_config {
type = "VIOLENCE"
input_strength = "MEDIUM"
output_strength = "NONE"
}
filters_config {
type = "SEXUAL"
input_strength = "MEDIUM"
output_strength = "NONE"
}
}
}
Guardrail #2 - Financial and hacking topics
I would also like the agent to refuse to help the user hack into the company systems, for example based on the knowledge base. It is possible that such a Confluence wiki contains information that could be very useful for malicious actors. What is more, if I equip the agent with more tools in the future, such as getting the current state of the applications or the infrastructure, it could make things even worse. Let's extend the above guardrail.
resource "aws_bedrockagent_guardrail" "filtering" {
name = "ticket-agent-guardrail"
description = "Input-only guardrail for toxic language and disallowed topics."
// ...
topic_policy_config {
topics_config {
name = "Financial advice"
definition = "Requests for personal investment, trading, or financial guidance."
examples = [
"Should I buy crypto?",
"Recommend stocks for a quick profit.",
"Should I take a loan to buy a house?",
"Should I take a loan to buy a car?",
"Should I consolidate my debt?"
]
type = "DENY"
}
topics_config {
name = "Unauthorized system access"
definition = "Requests to get information about system vulnerabilities."
examples = [
"Which instances have SSH open?",
"How to increase my privileges without submitting a ticket?"
]
type = "DENY"
}
}
}
Guardrail #3 - PII redaction
I really don't want to deal with PII leaking into our company JIRA boards. Let's say one of the employees is not aware of the risks of writing down credit card details (mostly the number). I would like to prevent this from ending up on our kanban board.
resource "aws_bedrockagent_guardrail" "filtering" {
name = "ticket-agent-guardrail"
description = "Input-only guardrail for toxic language, disallowed topics and sensitive inputs."
// ...
sensitive_information_policy_config {
pii_entities_config {
type = "INTERNATIONAL_BANK_ACCOUNT_NUMBER"
action = "ANONYMIZE"
input_enabled = true
output_enabled = true
input_action = "ANONYMIZE"
output_action = "ANONYMIZE"
}
pii_entities_config {
type = "CREDIT_DEBIT_CARD_CVV"
action = "ANONYMIZE"
input_enabled = true
output_enabled = true
input_action = "ANONYMIZE"
output_action = "ANONYMIZE"
}
pii_entities_config {
type = "CREDIT_DEBIT_CARD_NUMBER"
action = "ANONYMIZE"
input_enabled = true
output_enabled = true
input_action = "ANONYMIZE"
output_action = "ANONYMIZE"
}
pii_entities_config {
type = "AWS_SECRET_KEY"
action = "ANONYMIZE"
input_enabled = true
output_enabled = true
input_action = "ANONYMIZE"
output_action = "ANONYMIZE"
}
}
}
Guardrails Cross-Region inference
For some reason, Guardrails are also offered as an inference profile of some sort, so I would like to use that. As I set up the agent in Frankfurt, the only cross-region profile I can use is the EU-wide one. I didn't find any way to list the inference profiles for guardrails, so you have to rely on this hardcoded ARN taken from the console.
resource "aws_bedrockagent_guardrail" "filtering" {
name = "ticket-agent-guardrail"
description = "Input-only guardrail for toxic language, disallowed topics and sensitive inputs."
// ...
cross_region_config {
guardrail_profile_identifier = "arn:aws:bedrock:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:guardrail-profile/eu.guardrail.v1:0"
}
}
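To double-check that the profile was attached, you can describe the guardrail with the CLI. A minimal sketch (the guardrail ID is a placeholder; the cross-region details should show up somewhere in the response):

# Inspect the DRAFT guardrail and look for the cross-region profile in the output.
aws bedrock get-guardrail \
  --guardrail-identifier <guardrail-id> \
  --guardrail-version DRAFT \
  --region eu-central-1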
Lastly, I would like to publish a version of this guardrail. As with almost everything in Bedrock, resources are versioned, and so are guardrails. The use case is that you want to test a change on some development agent before publishing it to production (AWS doesn't know GitFlow and multi-account strategies 😂).
resource "aws_bedrock_guardrail_version" "filtering" {
guardrail_arn = aws_bedrock_guardrail.filtering.guardrail_arn
}
Before we attach the guardrail to the agent, let's test it first with some prompts. I have already tested the insults section and, for some reason, the word "bastards" is not picked up even on high sensitivity, so I had to increase the strength 😅. In testing, some responses are blocked because this is just a raw model without the knowledge base or action groups, so it hallucinates random stuff that might contain some "how to get access" hints.
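You can also poke at the guardrail directly, without any agent, via the ApplyGuardrail API. A minimal sketch with the AWS CLI (the guardrail ID is a placeholder):

# Test the DRAFT guardrail against a sample user input.
aws bedrock-runtime apply-guardrail \
  --guardrail-identifier <guardrail-id> \
  --guardrail-version DRAFT \
  --source INPUT \
  --content '[{"text": {"text": "You bastards broke the VPN again!"}}]'

If a filter trips, the response should report the action as GUARDRAIL_INTERVENED together with the blocked-input message configured above.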
I will now apply the guardrails to our agent and test how the flow works with them. Maybe the "hacking" ban will be too strict? Who knows. What is also important is to allow the agent's IAM role to use the guardrail.
data "aws_iam_policy_document" "agent_policy" {
statement {
effect = "Allow"
actions = [
"bedrock:InvokeModel",
"bedrock:InvokeModel*",
"bedrock:GetInferenceProfile",
"bedrock:ListInferenceProfiles",
"bedrock:ListTagsForResource",
"bedrock:GetKnowledgeBase",
"bedrock:ListKnowledgeBases",
"bedrock:Retrieve",
"bedrock:RetrieveAndGenerate",
"bedrock:ApplyGuardrail",
"bedrock:GetGuardrail"
]
resources = ["*"]
}
// ... continues
}
resource "aws_bedrockagent_agent" "ticketagent" {
agent_name = "ticketagent"
foundation_model = local.inference_profile_arn
agent_resource_role_arn = aws_iam_role.agent_role.arn
depends_on = [aws_bedrockagent_knowledge_base.confluence]
prepare_agent = true
instruction = local.prompts.agent_prompt
guardrail_configuration {
guardrail_identifier = aws_bedrock_guardrail_version.filtering.guardrail_arn
guardrail_version = aws_bedrock_guardrail_version.filtering.version
}
}
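With the guardrail attached, you can also smoke-test the draft agent from the CLI instead of the Console test window. A rough sketch (TSTALIASID is the built-in test alias for the draft version; the agent ID is a placeholder, and the streamed response lands in the output file):

# Send one test message to the draft agent and save the streamed reply.
aws bedrock-agent-runtime invoke-agent \
  --agent-id <agent-id> \
  --agent-alias-id TSTALIASID \
  --session-id test-session-1 \
  --input-text "Create a ticket: the printer insulted my ancestors" \
  response.txt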
Testing the casual guardrails
I performed some tests after applying the guardrail to the agent. In both cases, success depends on the mood of the agent's model as well as the guardrail 😅. Sometimes single phrases like "database access, my manager will review the permissions" would trip the "hacking" topic, but all in all it looked pretty good. The only thing that turned out not to work the way I wanted is PII masking. I was sure that when I send a message with a credit card number, the input would be filtered and the ticket would contain some redacted fields.
No 🙅.
What AWS does in this case is simply echo your message back with the fields replaced, as if it were coming from the model itself. This is crazy and unexpected. I was sure it would just clean the message on input and pass it to the model.
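You can see this behavior by calling ApplyGuardrail on a message containing a card number. An abridged, illustrative response (the exact fields may differ); each detected entity is replaced with a {TYPE} placeholder in the echoed text:

{
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    { "text": "My card {CREDIT_DEBIT_CARD_NUMBER} got blocked, please help." }
  ]
}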
One thing that I still need to change here is that the date for any access can be arbitrarily far in the future. I could enforce it in code, but there's another feature I would like to try out...
Automated Reasoning checks
It's possible to define the reasoning policy using JSON and SMT-LIB expressions, but that's absolutely crazy to do by hand. Instead, I will write a natural-language document, use the AWS Console to generate the reasoning rules, and import them later with OpenTofu.
Just input a set of rules for your agent (such as company guidelines) and wait until it generates. For example, I gave it rules to never request access tickets longer than 2 years and to always create a ticket for IT on hardware issues instead of guiding the user to fix them themselves.
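For illustration, the rules document is plain natural language, something along these lines (hypothetical wording, not my exact file):

- Access tickets must never request access for longer than 2 years.
- For hardware issues, always create a ticket for the IT team instead of guiding the user through a fix.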
After generation, perform manual tests with the "Generate tests" button and answer some questions with thumbs up or thumbs down. This produces more tests that check whether the ruleset is correct. Create some annotations to update the rules and generate more tests.
Then, in OpenTofu, add the awscc provider, configure it for your region, and create a new file import.tf with the following content:
import {
  to = awscc_bedrock_automated_reasoning_policy.it_agent_policy
  id = "arn:aws:bedrock:eu-central-1:123456789012:automated-reasoning-policy/abcdef1234"
}
Then run tofu plan -generate-config-out=arc.tf. A new file arc.tf will be created, but unfortunately it is not complete. Using the AWS CLI you can fetch the missing IDs and ask the AI in your IDE to fix it.
POLICY_ARN=arn:aws:bedrock:eu-central-1:123456789012:automated-reasoning-policy/abcdef1234
aws cloudcontrol get-resource \
--type-name 'AWS::Bedrock::AutomatedReasoningPolicy' \
--identifier $POLICY_ARN \
--output text \
--query ResourceDescription.Properties \
| jq '.PolicyDefinition.Rules[] | {id: .Id, expression: .Expression}'
Try the first apply to import the changes. It will fail. Unfortunately, you have to recreate the whole policy with IaC if you want to have it in the Terraform state. Proceed to the AWS Console to delete the policy (along with the tests).
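For reference, the recreated policy resource could look roughly like this. The policy_definition shape below is my assumption based on the Cloud Control AWS::Bedrock::AutomatedReasoningPolicy resource type, and the variable name, type and SMT-LIB expression are made up for illustration; your generated rules will differ:

resource "awscc_bedrock_automated_reasoning_policy" "it_agent_policy" {
  name        = "it-agent-policy"
  description = "Company rules for access tickets and hardware issues."

  policy_definition = {
    variables = [
      {
        # Hypothetical variable extracted from the natural-language rules.
        name        = "accessDurationDays"
        type        = "INT"
        description = "Requested access duration in days."
      }
    ]
    rules = [
      {
        # Hypothetical SMT-LIB expression: access must not exceed 2 years.
        expression = "(<= accessDurationDays 730)"
      }
    ]
  }
}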
Now remember to set force_delete = true so that it's easy to delete the policy later on. We also need to create a version of it. Create this block to save a version of the reasoning policy.
resource "awscc_bedrock_automated_reasoning_policy_version" "it_agent_policy" {
policy_arn = awscc_bedrock_automated_reasoning_policy.it_agent_policy.policy_arn
last_updated_definition_hash = awscc_bedrock_automated_reasoning_policy.it_agent_policy.definition_hash
}
The problems emerge
I wanted to write a simple instruction here on how to connect the reasoning checks to the guardrail and test them. However, several problems appeared out of nowhere:
- aws_bedrock_guardrail doesn't have a field to add ARC,
- after a simple conversion of aws_bedrock_guardrail to awscc_bedrock_guardrail, the configuration can't be applied due to a bug I found,
- even after doing everything manually, the checks are not applied to the guardrail when interacting with the agent ¯\\_(ツ)_/¯. It always either passes or says the input is too complex,
- Bedrock is a "Production Ready Product™️" 🙄, where AWS "Best Practices™️" apparently mean doing everything in the Console with a mouse.
Testing ARC
So I will try some prompts that definitely go against the policies defined for reasoning. I will first try to create a normal database ticket, then another one with a date in the far future. Next, I will try to troubleshoot my company laptop.
![Help me fix the laptop](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3o8qhekve1cg9rfqjbe6.jpg)
As you can see, the checks do nothing to prevent me from going further. The database tickets are simply created for 2035, and the model tries to look for hardware fixes in the knowledge base (even though it suggests creating a ticket, that is simply its own creativity; on other tries it just says there's nothing in the KB to help me).
Therefore, I decided to revert any commit related to Automated Reasoning Checks.