Introduction
In part 2 of this series, we deployed our agent with the Amazon Bedrock AgentCore Runtime Starter Toolkit.
In this part, we'll dive deeper into AgentCore Observability.
AgentCore Observability
AgentCore Observability helps us trace, debug, and monitor agent performance in production environments. It offers detailed visualizations of each step in the agent workflow, enabling us to inspect an agent's execution path, audit intermediate outputs, and debug performance bottlenecks and failures.
AgentCore Observability gives us real-time visibility into agent operational performance through access to dashboards powered by Amazon CloudWatch and telemetry for key metrics such as session count, latency, duration, token usage, and error rates. Rich metadata tagging and filtering simplify issue investigation and quality maintenance at scale. AgentCore emits telemetry data in standardized OpenTelemetry (OTEL)-compatible format, enabling us to easily integrate it with our existing monitoring and observability stack.
By default, AgentCore outputs a set of key built-in metrics for agents, gateway resources, and memory resources. We can also instrument our agent code to provide additional span and trace data and custom metrics and logs.
For detailed information, refer to the following articles:
- Observe your agent applications on Amazon Bedrock AgentCore Observability
- Add observability to your Amazon Bedrock AgentCore resources
Model Invocations Metrics
Navigate to the CloudWatch service, select GenAI Observability, and then go to the metrics in the "Model Invocations" panel:
You can find very useful metrics there, such as:
- Invocation count
- Invocation latency
- Invocation throttles
- Invocation error count
- Token count by model (input and output tokens and total)
- Requests, grouped by input token
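The same metrics can also be pulled programmatically. A hedged sketch using boto3 (the `AWS/Bedrock` namespace and the `Invocations` metric with the `ModelId` dimension are what CloudWatch exposes for model invocations; verify the exact names in your account before relying on them):

```python
from datetime import datetime, timedelta, timezone

def invocation_count_request(model_id: str, hours: int = 24) -> dict:
    """Build a GetMetricStatistics request for Bedrock model invocations."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",      # namespace for model invocation metrics
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,                  # one datapoint per hour
        "Statistics": ["Sum"],
    }

def fetch_invocation_count(model_id: str) -> list:
    # Requires AWS credentials; not executed here.
    import boto3
    cloudwatch = boto3.client("cloudwatch")
    resp = cloudwatch.get_metric_statistics(**invocation_count_request(model_id))
    return resp["Datapoints"]
```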
Model Invocation Logging
Let's also explore model invocation logging, which you can find below the "Model Invocations" panel. To receive these logs, we first need to enable them:
Press the "Enable model invocation logging" button to open the settings page:
Enable model invocation logging, select the data types to include in the logs (in our case, all of them), and select the logging destinations (in our case, CloudWatch only):
Provide the existing log group name (in our case, agentcore-logging), choose to create and use a new role (named bedrock-role) to authorize Bedrock, and press the "Save" button.
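The same settings can be applied via the API instead of the console, using boto3's `put_model_invocation_logging_configuration`. A sketch (the role ARN is a placeholder; the role and log group must already exist, and the role must trust the Bedrock service):

```python
def logging_config(log_group: str, role_arn: str) -> dict:
    """Build a loggingConfig mirroring the console settings above."""
    return {
        "cloudWatchConfig": {
            "logGroupName": log_group,  # e.g. agentcore-logging
            "roleArn": role_arn,        # role authorizing Bedrock to write logs
        },
        # include all data types, as selected in the console
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }

def enable_invocation_logging(log_group: str, role_arn: str) -> None:
    # Requires AWS credentials; not executed here.
    import boto3
    bedrock = boto3.client("bedrock")
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=logging_config(log_group, role_arn)
    )
```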
After invoking the agent several times, we see that the logs are flowing:
We can click on a specific log entry to see the model invocation input and the tool(s) used, along with their results:
as well as the model invocation output:
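Instead of scrolling through individual entries, the delivered logs can also be queried with CloudWatch Logs Insights. A sketch (the `input.inputTokenCount` / `output.outputTokenCount` field names follow the model invocation log schema; adjust them if your log entries differ):

```python
# Logs Insights query over the model invocation log group
INSIGHTS_QUERY = """
fields @timestamp, modelId, input.inputTokenCount, output.outputTokenCount
| sort @timestamp desc
| limit 20
""".strip()

def start_log_query(log_group: str, hours: int = 1) -> str:
    # Requires AWS credentials; not executed here.
    import time
    import boto3
    logs = boto3.client("logs")
    resp = logs.start_query(
        logGroupName=log_group,              # e.g. agentcore-logging
        startTime=int(time.time()) - hours * 3600,
        endTime=int(time.time()),
        queryString=INSIGHTS_QUERY,
    )
    return resp["queryId"]  # poll logs.get_query_results with this id
```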
Enabling CloudWatch Transaction Search for Bedrock AgentCore
But before we can explore traces, we need to enable CloudWatch Transaction Search in the "Bedrock AgentCore" panel of the CloudWatch GenAI Observability console:
Please press the "Configure" button. We'll then land in the X-Ray traces area in the "Transaction Search" panel:
Please press the "Edit" button.
Then check "Enable Transaction Search" and optionally increase the "X-Ray trace indexing" percentage (remember, only 1% of the spans are indexed as trace summaries for free). Then press "Save". We'll see "Ingest OpenTelemetry spans" in the Updating state:
And then finally enabled.
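For repeatable setups, these console steps have API equivalents: the X-Ray `UpdateTraceSegmentDestination` and `UpdateIndexingRule` operations. A heavily hedged sketch (assuming a recent boto3 that exposes both operations; check the X-Ray API reference for your SDK version before using this):

```python
def transaction_search_settings(sampling_percentage: float = 1.0) -> dict:
    """Settings equivalent to the console steps above."""
    return {
        # send spans to CloudWatch Logs, which backs Transaction Search
        "destination": "CloudWatchLogs",
        "indexing_rule": {
            # percentage of spans indexed as trace summaries (1% is free)
            "Probabilistic": {"DesiredSamplingPercentage": sampling_percentage}
        },
    }

def enable_transaction_search(sampling_percentage: float = 1.0) -> None:
    # Requires AWS credentials and a recent boto3; not executed here.
    import boto3
    xray = boto3.client("xray")
    settings = transaction_search_settings(sampling_percentage)
    xray.update_trace_segment_destination(Destination=settings["destination"])
    xray.update_indexing_rule(Name="Default", Rule=settings["indexing_rule"])
```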
Adding the AWS Distro for OpenTelemetry (ADOT) SDK to our agent code
To view this data on the CloudWatch GenAI Observability page and in Amazon CloudWatch, we need to add the AWS Distro for OpenTelemetry (ADOT) SDK to our agent code. ADOT is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. Part of the Cloud Native Computing Foundation, OpenTelemetry provides open source APIs, libraries, and agents to collect distributed traces and metrics for application monitoring. With ADOT, we can instrument our applications just once to send correlated metrics and traces to multiple AWS and partner monitoring solutions. In our case, we'll send them to the CloudWatch GenAI Observability service.
As we saw in the part 2 article, we deployed our agent with the Amazon Bedrock AgentCore Runtime starter toolkit. After we executed the command:
```shell
agentcore configure --entrypoint agentcore_runtime_demo.py -er IAM_ARN
```

a Dockerfile was generated that contained, among other things, the following snippets:

```dockerfile
FROM public.ecr.aws/docker/library/python:3.13-slim
...
RUN pip install "aws-opentelemetry-distro>=0.10.1"
...
CMD ["opentelemetry-instrument", "python", "-m", "agentcore_runtime_demo"]
```
The aws-opentelemetry-distro package was installed, and our agent was instrumented with opentelemetry-instrument. This is exactly where the AWS Distro for OpenTelemetry (ADOT) SDK was automatically added to our agent code.
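Under the hood, `opentelemetry-instrument` picks up its configuration from environment variables, which the starter toolkit and the AgentCore Runtime set for us. A hand-written Dockerfile would need something along these lines (the variable values below are assumptions based on the ADOT Python defaults for AgentCore, not copied from the generated file):

```dockerfile
# Hypothetical ADOT configuration for a custom image
ENV AGENT_OBSERVABILITY_ENABLED=true
ENV OTEL_PYTHON_DISTRO=aws_distro
ENV OTEL_PYTHON_CONFIGURATOR=aws_configurator
ENV OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
```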
Bedrock AgentCore Agents, Sessions and Traces View
Now let's invoke our agent several times, as described in part 2, and look at the Agents, Sessions, and Traces views in the "Bedrock AgentCore" panel in CloudWatch GenAI Observability.
In the "Agents view" below, we see some general metrics for the invoked agent, such as the number of sessions and traces, errors, and the throttle rate:
In the "Sessions view" below, we see an overview of all agent sessions:
When clicking on a session's link, we see the linked traces (in our case, only one trace) of the selected session:
By clicking on the trace itself, we get to see the spans for the whole agent invocation chain:
We see that the /invocations endpoint is invoked first (this is the endpoint exposed by the AgentCore Runtime HTTP server). Then Cognito is invoked to obtain the authentication token that is later required to invoke the AgentCore Gateway OpenAPI target (the Amazon API Gateway URL) that we set up in part 2. We also see the Strands Agents event loop cycle and that we use the Amazon Nova Pro model, which decides to invoke the MCP tool DemoOpenAPITargetS3OrderAPI___getOrdersByCreatedDates exposed via OpenAPI:
We can click on the "Span data" to see the whole payload for each span. On the right side of each span we see its latency in milliseconds.
We also have the "Timeline" view of the same spans, where we can see how much time each span took in the agent invocation chain:
We also get the "Trajectory" view of the spans, which helps us better understand the whole flow of the agent invocation:
This Trajectory view is especially useful when an error occurs, as happened during one of my agent invocations. We can see directly where (in which span) the error occurred, highlighted in red:
Conclusion
In this part of the series, we dove deeper into AgentCore Observability and explored the AWS Distro for OpenTelemetry, which lets us instrument our applications to send correlated metrics and traces to the CloudWatch GenAI Observability service. We also explored the different "Bedrock AgentCore" panel views (Agents, Sessions, and Traces) as well as model invocation metrics and logging.
In the next parts of the series, we'll use a custom agent implementation instead of the Starter Toolkit, which gives us full control over our agent's HTTP interface.