Introduction
In part 2 of this series, we deployed our agent with the Amazon Bedrock AgentCore Runtime Starter Toolkit.
In this part, we'll dive deeper into AgentCore Observability.
AgentCore Observability
AgentCore Observability helps us trace, debug, and monitor agent performance in production environments. It offers detailed visualizations of each step in the agent workflow, enabling us to inspect an agent's execution path, audit intermediate outputs, and debug performance bottlenecks and failures.
AgentCore Observability gives us real-time visibility into agent operational performance through access to dashboards powered by Amazon CloudWatch and telemetry for key metrics such as session count, latency, duration, token usage, and error rates. Rich metadata tagging and filtering simplify issue investigation and quality maintenance at scale. AgentCore emits telemetry data in standardized OpenTelemetry (OTEL)-compatible format, enabling us to easily integrate it with our existing monitoring and observability stack.
By default, AgentCore outputs a set of key built-in metrics for agents, gateway resources, and memory resources. We can also instrument our agent code to provide additional span and trace data and custom metrics and logs.
For detailed information, refer to the following articles:
- Observe your agent applications on Amazon Bedrock AgentCore Observability
- Add observability to your Amazon Bedrock AgentCore resources
Model Invocations Metrics
Navigate to the CloudWatch service, select GenAI Observability, and then go to the metrics in the "Model Invocations" panel:
You can find very useful metrics there, such as:
- Invocation count
- Invocation latency
- Invocation throttles
- Invocation error count
- Token count by model (input and output tokens and total)
- Requests, grouped by input token
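The same metrics can also be pulled programmatically. A hedged sketch using boto3 (the `AWS/Bedrock` namespace and the `Invocations` metric with the `ModelId` dimension are what CloudWatch exposes for model invocations; verify the exact names in your account before relying on them):

```python
from datetime import datetime, timedelta, timezone

def invocation_count_request(model_id: str, hours: int = 24) -> dict:
    """Build a GetMetricStatistics request for Bedrock model invocations."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",      # namespace for model invocation metrics
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,                  # one datapoint per hour
        "Statistics": ["Sum"],
    }

def fetch_invocation_count(model_id: str) -> list:
    # Requires AWS credentials; not executed here.
    import boto3
    cloudwatch = boto3.client("cloudwatch")
    resp = cloudwatch.get_metric_statistics(**invocation_count_request(model_id))
    return resp["Datapoints"]
```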
Model Invocation Logging
Let's also explore model invocation logging, which you can find below the "Model Invocations" panel. To receive these logs, we first need to enable them:
Press the "Enable model invocation logging" button to open the settings page:
Enable model invocation logging, select the data types to include in the logs (in our case, all of them), and select the logging destinations (in our case, CloudWatch only):
Provide the existing log group name (in our case, agentcore-logging), choose to create and use a new role (named bedrock-role) to authorize Bedrock, and press the "Save" button.
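The same settings can be applied via the API instead of the console, using boto3's `put_model_invocation_logging_configuration`. A sketch (the role ARN is a placeholder; the role and log group must already exist, and the role must trust the Bedrock service):

```python
def logging_config(log_group: str, role_arn: str) -> dict:
    """Build a loggingConfig mirroring the console settings above."""
    return {
        "cloudWatchConfig": {
            "logGroupName": log_group,  # e.g. agentcore-logging
            "roleArn": role_arn,        # role authorizing Bedrock to write logs
        },
        # include all data types, as selected in the console
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }

def enable_invocation_logging(log_group: str, role_arn: str) -> None:
    # Requires AWS credentials; not executed here.
    import boto3
    bedrock = boto3.client("bedrock")
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=logging_config(log_group, role_arn)
    )
```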
After invoking the agent several times, we see that the logs are flowing:
We can click on a specific log entry to see the model invocation input and the tool(s) used, along with their results:
as well as the model invocation output:
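Instead of scrolling through individual entries, the delivered logs can also be queried with CloudWatch Logs Insights. A sketch (the `input.inputTokenCount` / `output.outputTokenCount` field names follow the model invocation log schema; adjust them if your log entries differ):

```python
# Logs Insights query over the model invocation log group
INSIGHTS_QUERY = """
fields @timestamp, modelId, input.inputTokenCount, output.outputTokenCount
| sort @timestamp desc
| limit 20
""".strip()

def start_log_query(log_group: str, hours: int = 1) -> str:
    # Requires AWS credentials; not executed here.
    import time
    import boto3
    logs = boto3.client("logs")
    resp = logs.start_query(
        logGroupName=log_group,              # e.g. agentcore-logging
        startTime=int(time.time()) - hours * 3600,
        endTime=int(time.time()),
        queryString=INSIGHTS_QUERY,
    )
    return resp["queryId"]  # poll logs.get_query_results with this id
```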
Enabling CloudWatch Transaction Search for Bedrock AgentCore
But before we can explore traces, we need to enable CloudWatch Transaction Search in the "Bedrock AgentCore" panel of the CloudWatch GenAI Observability console:
Please press the "Configure" button. We'll then land in the X-Ray traces area in the "Transaction Search" panel:
Please press the "Edit" button.
Then check "Enable Transaction Search" and optionally increase the "X-Ray trace indexing" percentage (remember, only 1% of the spans are indexed as trace summaries for free). Then press "Save". We'll see "Ingest OpenTelemetry spans" in the Updating state:
And then finally enabled.
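For repeatable setups, these console steps have API equivalents: the X-Ray `UpdateTraceSegmentDestination` and `UpdateIndexingRule` operations. A heavily hedged sketch (assuming a recent boto3 that exposes both operations; check the X-Ray API reference for your SDK version before using this):

```python
def transaction_search_settings(sampling_percentage: float = 1.0) -> dict:
    """Settings equivalent to the console steps above."""
    return {
        # send spans to CloudWatch Logs, which backs Transaction Search
        "destination": "CloudWatchLogs",
        "indexing_rule": {
            # percentage of spans indexed as trace summaries (1% is free)
            "Probabilistic": {"DesiredSamplingPercentage": sampling_percentage}
        },
    }

def enable_transaction_search(sampling_percentage: float = 1.0) -> None:
    # Requires AWS credentials and a recent boto3; not executed here.
    import boto3
    xray = boto3.client("xray")
    settings = transaction_search_settings(sampling_percentage)
    xray.update_trace_segment_destination(Destination=settings["destination"])
    xray.update_indexing_rule(Name="Default", Rule=settings["indexing_rule"])
```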
Adding the AWS Distro for OpenTelemetry (ADOT) SDK to our agent code
To view this data on the CloudWatch GenAI Observability page and in Amazon CloudWatch, we need to add the AWS Distro for OpenTelemetry (ADOT) SDK to our agent code. ADOT is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. Part of the Cloud Native Computing Foundation, OpenTelemetry provides open source APIs, libraries, and agents to collect distributed traces and metrics for application monitoring. With ADOT, we can instrument our applications just once to send correlated metrics and traces to multiple AWS and partner monitoring solutions. In our case, we'll send them to the CloudWatch GenAI Observability service.
As we saw in the part 2 article, we deployed our agent with the Amazon Bedrock AgentCore Runtime starter toolkit. After we executed the command:
```shell
agentcore configure --entrypoint agentcore_runtime_demo.py -er IAM_ARN
```

a Dockerfile was generated that contained, among other things, the following snippets:

```dockerfile
FROM public.ecr.aws/docker/library/python:3.13-slim
...
RUN pip install "aws-opentelemetry-distro>=0.10.1"
...
CMD ["opentelemetry-instrument", "python", "-m", "agentcore_runtime_demo"]
```
The aws-opentelemetry-distro package was installed, and our agent was instrumented with opentelemetry-instrument. This is exactly where the AWS Distro for OpenTelemetry (ADOT) SDK was automatically added to our agent code.
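Under the hood, `opentelemetry-instrument` picks up its configuration from environment variables, which the starter toolkit and the AgentCore Runtime set for us. A hand-written Dockerfile would need something along these lines (the variable values below are assumptions based on the ADOT Python defaults for AgentCore, not copied from the generated file):

```dockerfile
# Hypothetical ADOT configuration for a custom image
ENV AGENT_OBSERVABILITY_ENABLED=true
ENV OTEL_PYTHON_DISTRO=aws_distro
ENV OTEL_PYTHON_CONFIGURATOR=aws_configurator
ENV OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
```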
Bedrock AgentCore Agents, Sessions and Traces View
Now let's invoke our agent several times, as described in part 2, and look at the Agents, Sessions, and Traces views in the "Bedrock AgentCore" panel in CloudWatch GenAI Observability.
In the "Agents view" below, we see some general metrics for the invoked agent, such as the number of sessions and traces, errors, and the throttle rate:
In the "Sessions view" below, we see an overview of all agent sessions:
When clicking on a session's link, we see the linked traces (in our case, only one trace) of the selected session:
By clicking on the trace itself, we get to see the spans for the whole agent invocation chain:
We see that the /invocations endpoint is invoked first (this is the endpoint exposed by the AgentCore Runtime HTTP server). Then Cognito is invoked to obtain the authentication token that is later required to invoke the AgentCore Gateway OpenAPI target (the Amazon API Gateway URL) that we set up in part 2. We also see the Strands Agents event loop cycle and that we use the Amazon Nova Pro model, which decides to invoke the MCP tool DemoOpenAPITargetS3OrderAPI___getOrdersByCreatedDates exposed via OpenAPI:
We can click on the "Span data" to see the whole payload for each span. On the right side of each span we see its latency in milliseconds.
We also have the "Timeline" view of the same spans, where we can see how much time each span took in the agent invocation chain:
We also get the "Trajectory" view of the spans, which helps us better understand the whole flow of the agent invocation:
This Trajectory view is especially useful when an error occurs, as happened during one of my agent invocations. We can see directly where (in which span) the error occurred, highlighted in red:
Conclusion
In this part of the series, we dove deeper into AgentCore Observability and explored the AWS Distro for OpenTelemetry, which lets us instrument our applications to send correlated metrics and traces to the CloudWatch GenAI Observability service. We also explored the different "Bedrock AgentCore" panel views (Agents, Sessions, and Traces) as well as model invocation metrics and logging.
In the next parts of the series, we'll use a custom agent implementation instead of the Starter Toolkit, which gives us full control over our agent's HTTP interface.