mgbec for AWS Community Builders

Observe and Report (and Prevent) — keeping an eye on your AI with CloudWatch and CloudTrail.


Artificial intelligence components and data are evolving at rapid-fire speed. How are we supposed to keep tabs on performance, usage, and security?

Our old familiar friends, CloudWatch and CloudTrail, can step up to the plate and monitor our fast-moving Bedrock environments. Of course, CloudWatch and CloudTrail can monitor all of the usual components that might make up our Bedrock workflow, whether that is API Gateway, Lambda, DynamoDB, or something else. For Bedrock itself, there are specific metrics and data that we can log, measure, and dashboard in CloudWatch and CloudTrail. Likewise, we can generate alarms or trigger actions based on Bedrock data. More details are available here: https://docs.aws.amazon.com/bedrock/latest/userguide/monitoring.html.

CloudWatch

To get started in CloudWatch, we just need two quick steps:

First, create a CloudWatch log group.

Second, we need to enable model invocation logging in Bedrock Settings, under the Bedrock configurations sidebar. I am going to include all data types in my logs, send the logs just to CloudWatch, and create a new service role to do this. A scripted sketch of both steps follows.
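If you prefer to script this instead of clicking through the console, here is a minimal boto3 sketch of both steps. The log group name and role ARN are placeholders, and the role must already trust bedrock.amazonaws.com and be allowed to write to the log group.

```python
import boto3

logs = boto3.client("logs")
bedrock = boto3.client("bedrock")

LOG_GROUP = "/my/bedrock/invocation-logs"  # placeholder name
ROLE_ARN = "arn:aws:iam::123456789012:role/BedrockLoggingRole"  # placeholder ARN

# Step 1: create the CloudWatch log group (fails if it already exists).
logs.create_log_group(logGroupName=LOG_GROUP)

# Step 2: enable model invocation logging with all data types,
# delivering only to CloudWatch Logs.
# (Adding an s3Config block here as well avoids the 100 KB
# CloudWatch truncation discussed later in the post.)
bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": LOG_GROUP,
            "roleArn": ROLE_ARN,
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }
)
```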

Now, let’s get some data to look at. I invoked a few different models and agents that I had created previously.

CloudWatch will not only have the details for the components of your workflow, like Lambda or DynamoDB, but now you will see Bedrock invocation details in your log group.

We can analyze these log details with Log Insights:
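For example, here is a sketch of running a Logs Insights query with boto3 to count invocations and average input tokens per model. The field names (modelId, input.inputTokenCount) follow the invocation log schema I see in my log group; adjust them to match your own records.

```python
import time
import boto3

logs = boto3.client("logs")

# Invocation counts and average input tokens per model, last hour.
query = """
fields @timestamp, modelId, input.inputTokenCount
| stats count(*) as invocations, avg(input.inputTokenCount) as avgInputTokens by modelId
"""

resp = logs.start_query(
    logGroupName="/my/bedrock/invocation-logs",  # same placeholder group as above
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=query,
)

# Poll until the query finishes, then print the rows.
while True:
    result = logs.get_query_results(queryId=resp["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)
print(result["results"])
```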

We could also use some of the other CloudWatch functionality, such as Metrics, Anomaly Detection, and Alarms. There is a pre-created dashboard for Bedrock metrics that has the following fields, though we could certainly customize it (a sketch of pulling one of these metrics with the CloudWatch API follows the list):

- Invocation Count
- Invocation Latency
- Token Counts by Model
- Daily Total Tokens by ModelID
- Input Token Count, Output Token Count
- Requests Grouped by Input Tokens
- Invocation Throttles
- Invocation Error Count
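If you want to pull any of these numbers programmatically rather than from the dashboard, something like the following should work. The AWS/Bedrock namespace, Invocations metric, and ModelId dimension come from the Bedrock runtime metrics documentation; the model ID itself is just an example.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hourly invocation counts for one model over the last day.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId",
                 "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],  # example model
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], int(point["Sum"]))
```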

There is also specific data for some optional components of your workflow, such as Knowledge Bases, Guardrails, and Agents.

Knowledge bases:

https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-bases-logging.html

Knowledge base logs need to be enabled first, and there are data ingestion level logs and resource level logs. Data ingestion logs provide ingestion job information such as the data source ID and the number of resources updated, ingested, deleted, and more. Resource level logs give us details about the status of each resource in the ingestion pipeline: is it scheduled, embedded, indexed, partially done, or failed?
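Knowledge base logs are delivered through the CloudWatch Logs vended delivery APIs. A minimal sketch follows, assuming a knowledge base and destination log group that already exist; the names and ARNs are placeholders, and APPLICATION_LOGS is the log type the knowledge base logging docs describe, so verify it for your setup.

```python
import boto3

logs = boto3.client("logs")

KB_ARN = "arn:aws:bedrock:us-east-1:123456789012:knowledge-base/KB12345"  # placeholder
LOG_GROUP_ARN = "arn:aws:logs:us-east-1:123456789012:log-group:/my/kb-logs"  # placeholder

# Register the knowledge base as a log delivery source.
logs.put_delivery_source(
    name="my-kb-log-source",
    resourceArn=KB_ARN,
    logType="APPLICATION_LOGS",
)

# Point a delivery destination at the CloudWatch log group.
dest = logs.put_delivery_destination(
    name="my-kb-log-destination",
    deliveryDestinationConfiguration={"destinationResourceArn": LOG_GROUP_ARN},
)

# Wire the source to the destination to start delivery.
logs.create_delivery(
    deliverySourceName="my-kb-log-source",
    deliveryDestinationArn=dest["deliveryDestination"]["arn"],
)
```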

Guardrails:

https://docs.aws.amazon.com/bedrock/latest/userguide/monitoring-guardrails-cw-metrics.html

Guardrails are an important part of our Bedrock environment and need attention for both security and performance factors. Metrics include:

- Invocations
- InvocationLatency
- InvocationClientErrors
- InvocationServerErrors
- InvocationThrottles
- InvocationsIntervened
- TextUnitCount

“InvocationsIntervened” is a specific metric for Guardrails giving us the number of invocations where Guardrails took action and intervened.

“TextUnitCount” takes a little more explanation. A text unit is up to 1,000 characters, and this unit is how AWS determines what to charge for the enabled Guardrail policies. Different types of policies may have different pricing per text unit.
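Because InvocationsIntervened is such a strong security signal, it is a natural candidate for an alarm. Below is a sketch; I am assuming the AWS/Bedrock/Guardrails namespace and GuardrailArn dimension based on the metrics documentation linked above, so verify both against the metrics you actually see, and note that the guardrail and SNS ARNs are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when a guardrail intervenes more than 10 times in 5 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="guardrail-interventions-spike",
    Namespace="AWS/Bedrock/Guardrails",   # assumed namespace, check your metrics
    MetricName="InvocationsIntervened",
    Dimensions=[{
        "Name": "GuardrailArn",           # assumed dimension name
        "Value": "arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123",  # placeholder
    }],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=10,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:security-alerts"],  # placeholder topic
)
```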

Agents:

Agents are another component that has metrics we can look at: https://docs.aws.amazon.com/bedrock/latest/userguide/monitoring-agents-cw-metrics.html. Metrics included are:

- InvocationCount
- TotalTime
- TTFT
- InvocationThrottles
- InvocationServerErrors
- InvocationClientErrors
- ModelLatency
- ModelInvocationCount
- ModelInvocationThrottles
- ModelInvocationClientErrors
- ModelInvocationServerErrors
- InputTokenCount
- OutputTokenCount

TTFT is “time to first token” and is only given when streaming configuration is turned on for the agent request.
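For completeness, here is a sketch of invoking an agent with streaming turned on so that TTFT gets emitted. The agent, alias, and session IDs are placeholders; streamingConfigurations with streamFinalResponse is the flag described in the agent runtime documentation.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Invoke an agent with final-response streaming enabled.
response = agent_runtime.invoke_agent(
    agentId="AGENT12345",        # placeholder
    agentAliasId="ALIAS12345",   # placeholder
    sessionId="demo-session-1",  # placeholder
    inputText="Summarize yesterday's incident report.",
    streamingConfigurations={"streamFinalResponse": True},
)

# The completion arrives as an event stream of chunks.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
```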

CloudTrail

Amazon Bedrock Runtime API operations are management events, which are logged by default in CloudTrail. These include: InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
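Because management events are captured by default, you can spot-check them right away with the CloudTrail LookupEvents API. A minimal sketch:

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up InvokeModel management events from the last 24 hours.
resp = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "InvokeModel"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=24),
    EndTime=datetime.now(timezone.utc),
)
for event in resp["Events"]:
    print(event["EventTime"], event.get("Username", "unknown"), event["EventName"])
```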

Agents for Amazon Bedrock Runtime API operations are data events, and are not logged by default. These are InvokeAgent, InvokeInlineAgent, Retrieve, RetrieveAndGenerate, InvokeFlow, and RenderPrompt.

If you would like to get your CloudTrail data events included, you can enable data event logging.

https://docs.aws.amazon.com/bedrock/latest/userguide/logging-using-cloudtrail.html

To enable the data events for Bedrock:

First, create a new Trail in CloudTrail. Then, click into the new trail and enable data event collection.

The dropdown menu for Resource type will have Bedrock resources. Currently the resource types available are Bedrock Agent Alias, Bedrock Blueprint, Bedrock Data Automation Profile, Bedrock Data Automation Project, Bedrock Flow Alias, Bedrock Guardrail, Bedrock Invoke Inline-Agent, Bedrock Knowledge Base, Bedrock Model, Bedrock Prompt, and Bedrock Session. Choose your options and run some models to generate data.
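The same configuration can be scripted with advanced event selectors. In this sketch the trail name is a placeholder, and the resources.type strings follow the AWS::Bedrock::* pattern the console dropdown shows, so confirm the exact values in the CloudTrail documentation for the resource types you care about.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Enable Bedrock data events on an existing trail.
cloudtrail.put_event_selectors(
    TrailName="my-bedrock-trail",  # the trail created above
    AdvancedEventSelectors=[
        {
            "Name": "Bedrock agent alias data events",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::Bedrock::AgentAlias"]},
            ],
        },
        {
            "Name": "Bedrock knowledge base data events",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::Bedrock::KnowledgeBase"]},
            ],
        },
    ],
)
```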

You can navigate to CloudTrail and click the link to the S3 bucket. Analyze with your choice of tools, whether it is OpenSearch, a SIEM, or something else.
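If you just want a quick look before standing up OpenSearch or a SIEM, a few lines of Python can pull the gzipped log files and filter for Bedrock events. The bucket name is a placeholder; the key prefix follows CloudTrail's standard layout.

```python
import gzip
import json
import boto3

s3 = boto3.client("s3")

BUCKET = "my-cloudtrail-bucket"              # placeholder
PREFIX = "AWSLogs/123456789012/CloudTrail/"  # standard CloudTrail key layout

# Scan the log files and print any Bedrock events.
for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", []):
    body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
    for record in json.loads(gzip.decompress(body))["Records"]:
        if "bedrock" in record.get("eventSource", ""):
            print(record["eventTime"], record["eventName"],
                  record.get("userIdentity", {}).get("arn"))
```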

So, what are we looking for in CloudTrail and CloudWatch? Performance and observability, of course, and expenditure review. However, if I wanted to look for security related events, what could I see?

Some attacks on our AI resources could be caught further outside of the ring of fire, in the usual AWS ways, like API Gateway, CloudFront, and WAF. With a layered defense approach, we would want to look for multiple indicators of attacks or compromise throughout the AI workflow. Some things to watch in CloudWatch and CloudTrail logs and metrics include:

Prompt analysis: in CloudWatch Logs, the prompt is returned in full, letting us analyze any anomalies or indicators of misuse.
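As a starting point, a Logs Insights query like the one below can surface prompts containing phrases that often show up in prompt-injection attempts. The phrase list is only illustrative, so tune it to your own threat model; it can be run with the same start_query pattern shown earlier.

```python
# Surface recent invocations whose logged content matches
# common prompt-injection phrases (illustrative list only).
prompt_misuse_query = """
fields @timestamp, identity.arn, modelId
| filter @message like /(?i)(ignore previous instructions|reveal your system prompt)/
| sort @timestamp desc
| limit 50
"""
```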

The CloudWatch automatic Bedrock dashboard can also show us anomalous use of input tokens, and there is a specific graph that helps us see whether we are getting inputs that do not follow our standard pattern: “Requests, grouped by input tokens”.

Response analysis: in CloudWatch Logs, we also get to see the full response to the prompt. We will need to evaluate these for things like sensitive or proprietary information disclosure. We will also want to look for hallucinations, illogical responses, or generation of harmful content.

This is the point where I realized that when I set up Bedrock model invocation logging, I should have specified both S3 and CloudWatch Logs as the logging destination. I'm missing the full response, since sending the logs only to CloudWatch Logs limits output JSON bodies to 100 KB in size.

https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html

I went back, created a new bucket and changed this logging destination. I ran a few more invocations and confirmed that I am now getting the full response.

Again, the CloudWatch automatic Bedrock dashboard can also show us anomalous use of output tokens, and it graphs “InputTokenCount” against “OutputTokenCount”. “OutputImageCount” is another metric you could watch, but it's not in the current automatic dashboard.

Performance degradation: the CloudWatch Metrics automatic Bedrock dashboard is a great place to look for signs of this. Variance in invocation throttles, latency, and errors is included in the standard dashboard. Increased and unexpected invocations are also an indicator. Other metrics that you could set up monitoring for are InvocationClientErrors and InvocationServerErrors.
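One way to formalize this is CloudWatch anomaly detection on a latency metric, so deviations from the learned baseline stand out instead of relying on fixed thresholds. A sketch, using the same example model ID as before; pair the detector with an alarm that uses an anomaly detection threshold.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Train an anomaly detector on average invocation latency for one model.
cloudwatch.put_anomaly_detector(
    SingleMetricAnomalyDetector={
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",
        "Dimensions": [{"Name": "ModelId",
                        "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],  # example
        "Stat": "Average",
    }
)
```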

CloudTrail indicators: unexpected changes in API calls can have a number of origins. I mentioned enabling data events in addition to management event collection above. Specific to Bedrock, the Bedrock API Reference (https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock.html) can help us understand what we might be looking for. We can also look at agent-specific API calls (https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Agents_for_Amazon_Bedrock.html). Indicators of Bedrock compromise or attack would also show up in API calls to non-Bedrock resources, like S3 or DynamoDB.

Similar to all aspects of information security, monitoring Bedrock resources for indicators of an attack is a layered and iterative task. And, as in the AI field as a whole, the tools we use to monitor performance and security will keep evolving and changing. What does that mean for us as security professionals? I don't think we will run out of things to learn any time soon: more brains required, donuts optional. Thanks for reading, and let me know if you have any questions or comments!
