This is the fourth in a series of posts documenting the architecture, implementation, and lessons learned from building the AWS Briefing Agent - a personalised AWS assistant deployed on Amazon Bedrock AgentCore Runtime.
- Part 1: Building a Full-Stack AI Agent on Bedrock AgentCore
- Part 2: Data Ingestion: RSS Feeds, Knowledge Base, S3 Vectors, and Metadata Filtering
- Part 3: Strands Agents + AgentCore Runtime - a perfect match
- Part 4: Adding Memory to the Agent
- Part 5: Experimenting with API Gateway
- Part 6: Observability and Evaluations
- Part 7: Third Party Integrations - Identity, Gateway and Slack Notifications
As mentioned in the first blog post, each session on AgentCore Runtime is assigned a dedicated Firecracker microVM with isolated CPU, memory and filesystem resources. When the session finishes, the entire microVM is destroyed. There is no shared state between sessions, which prevents any cross-session data leakage.
When a user accesses our AWS Briefing Agent service for the first time, they are asked a number of questions.
This includes asking about the primary AWS services the user is interested in, their experience level in AWS, and if there are specific AWS areas they want to track closely.
Without any memory capability, the user will have to provide the same information each time they start a new session. This is where AgentCore Memory comes into play. This post walks through setting up AgentCore Memory using Strands Agents.
Configuring Memory in AgentCore
The agentcore.json file is the primary configuration file used in Amazon Bedrock AgentCore to define and manage AI agents, gateways, memory stores and datasets. It acts as the central orchestrator to package up the agents infrastructure.
When we run the agentcore deploy command, the CLI reads this file and uses the AWS CDK to synthesize and deploy CloudFormation resources. We add long term to our agent in the memory section using a resource identifier of "BriefingAgentMemory". This is the identifier that is referenced in our handler.
AgentCore Memory itself consists of several key components that work together to provide both short-term context and long-term intelligence for agents as shown in the diagram below:
The interactions with the user are stored in short term for 90 days, as specified in the event expiry duration attribute. We then specify two distinct memory strategies that transform these short term raw events into long-term memory. Note that all strategies by default ignore personally identifiable information (PII) data from long-term memory records.
We define the following strategies in the agentcore.json file:
- SEMANTIC - this memory strategy identifies and extracts key pieces of factual information and contextual knowledge from conversational data. For example, a user is running AWS Lambda in production.
- USER_PREFERENCE - this memory strategy is designed to automatically identify and extract user preferences, choices and styles from conversations. For example, a user is interested in serverless and containers.
Each strategy stores its long-term memory in a hierarchical structure within a namespace. These namespaces act as distinct logical containers. We segregate them using the special {actorId} placeholder variable, so that we guarantee separation between each user.
The complete relevant memory section in our agentcore.json file is shown below:
"memories": [
{
"name": "BriefingAgentMemory",
"eventExpiryDuration": 90,
"strategies": [
{
"type": "SEMANTIC",
"name": "semantic_facts",
"namespaces": [
"/users/{actorId}/facts"
]
},
{
"type": "USER_PREFERENCE",
"name": "user_preferences",
"namespaces": [
"/users/{actorId}/preferences"
]
}
]
}
],
Integrating Cognito and AgentCore Runtime
At this point, we need to do a segway into how we authenticate requests to our agent. AgentCore Runtime supports two inbound authentication mechanisms:
- AWS IAM SigV4 - where the request to the
InvokeAgentRuntimeAPI is SigV4-signed with valid AWS credentials that have thebedrock-agentcore:InvokeAgentRuntimeIAM permission. - JWT Bearer Token Auth - which is configured with an Inbound JWT authoriser
When our frontend invokes the agent, it is sending a request to the agent's public endpoint URL:
https://bedrock-agentcore.eu-west-1.amazonaws.com/runtimes/<arn>/invocations
This URL is a special public-facing endpoint that AgentCore Runtime exposes. We specify this in the agentcore.json file:
"runtimes": [
{
"name": "AWSBriefingAgent",
"build": "Container",
"entrypoint": "handler.py",
...
"authorizerType": "CUSTOM_JWT",
"authorizerConfiguration": {
"customJwtAuthorizer": {
"discoveryUrl": "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_dshjdhskj/.well-known/openid-configuration",
"allowedClients": [
"dhjhdjskhdjkshdjkhsd"
]
}
}
}
The discoveryUrl points to Cognito's OpenID Connect discovery document for the AWS Cognito User Pool with the specified ID that is being used to authenticate users to the frontend. When AgentCore Runtime wants to validate the JWT token, it retrieves information from this endpoint such as the issuer and JWKS endpoint (contains the public keys used to verify the JWT signature).
The allowedClients shows the Cognito Application Client ID. When a user logs in, Cognito stamps the token with the client_id. AgentCore validates the JWT’s client_id claim, so only tokens issued for one of the permitted application clients can invoke the runtime.
When the user logs into our frontend application with their email address and password, the frontend calls Cognito directly to verify, and receives back
- Access token — proves who you are and what you're allowed to do.
- ID token — contains profile info (email, name). Used by the frontend to display the username.
- Refresh token — used to get new access/ID tokens when they expire (usually after 1 hour). These tokens are stored by the frontend auth library.
When we send a request to the agent, the frontend attaches the access token as a bearer token
POST /invocations
Authorization: Bearer eyJraWQi...
Body: {"prompt": "Give me a briefing"}
This is the JWT token that gets validated by AgentCore Runtime.
Returning Memory records in Handler function
The following code snippet shows how we retrieve the memory records to display in the sidebar of the frontend.
@app.entrypoint
async def invoke(payload: Dict[str, Any], context: Any = None):
message = payload.get("prompt", payload.get("message", ""))
# Derive actor_id from the JWT 'sub' claim (source of truth)
actor_id = _extract_sub_from_jwt(context) or payload.get("user_id", "default-user")
# Sanitize actor_id for AgentCore Memory
actor_id = re.sub(r"[^a-zA-Z0-9\-_/]", "_", actor_id)
# Retrieve memory records to include in the stream
memory_used = get_memory_records(actor_id, message)
The @app.entrypoint decorator registers a function as the handler for POST requests to /invocations. AgentCore Runtime calls this handler function when a client invokes the agent. Our handler function is an async generator, which means that it automatically streams the response as Server-Sent Events (SSE) delivered to the client in real-time (more around this in the next blog post).
Within the handler, we get the message that has been sent in the payload. We then extract the user's identity from the JWT token that Cognito issued. One of the claims in the JWT token is the sub or subject, which is the unique user ID assigned by Cognito to a user when they first register. We know that the JWT token has been cryptographically signed by Cognito and validated by AgentCore Runtime before it reaches the handler function. We assign this sub value to be the actor_id. We apply some regex to the actual value to ensure it has no characters in it that are not supported.
We then call our get_memory_records function. This function calls the AgentCore retrieve memory records API to search the long-term memory for facts and preferences relevant to the promt that has just been passed in. We retrieve the 5 highest scoring results from the vector search and store them in a records array, which is streamed back to the frontend to be displayed in the sidebar.
def get_memory_records(actor_id: str, prompt: str) -> List[Dict[str, Any]]:
"""Retrieve long-term memory records relevant to the user's prompt.
Searches both the facts and preferences namespaces and returns
the records the agent would have seen for this invocation.
"""
if not MEMORY_ID:
return []
try:
client = boto3.client("bedrock-agentcore", region_name=REGION)
records = []
for namespace in [
f"users/{actor_id}/facts",
f"users/{actor_id}/preferences",
]:
try:
response = client.retrieve_memory_records(
memoryId=MEMORY_ID,
namespace=namespace,
searchCriteria={
"searchQuery": prompt,
"topK": 5,
},
maxResults=5,
)
for r in response.get("memoryRecordSummaries", []):
records.append({
"memoryRecordId": r.get("memoryRecordId", ""),
"text": r.get("content", {}).get("text", ""),
"score": r.get("score"),
"memoryStrategyId": r.get("memoryStrategyId", ""),
"namespaces": r.get("namespaces", []),
})
except Exception as exc:
logger.warning("Failed to retrieve from %s: %s", namespace, exc)
return records
except Exception as exc:
logger.warning("Failed to retrieve memory records: %s", exc)
return []
We can see an example of the sidebar in the frontend below:
Setting up Memory with Strands
Both short-term and long-term memory are handled for us automatically through the AgentCore Memory session manager integration for Strands.
The memory ID is retrieved in a module-level constant:
MEMORY_ID = os.environ.get("MEMORY_BRIEFINGAGENTMEMORY_ID")
This reads the memory resource ID that AgentCore Runtime automatically injects as an environment variable into your container at runtime. The naming convention is: MEMORY__ID. Given the memory was given a name of "BriefingAgentMemory" in the agentcore.json file, AgentCore sets MEMORY_BRIEFINGAGENTMEMORY_ID to the actual memory resource ID (something like AWSBriefingAgent_BriefingAgentMemory-q2iBfL64BS).
The following function in our code is called on every request. A new stateless Strands Agent instance is created on each invocation, configured with the relevant session manager that loads conversation history from AgentCore Memory, tools and model settings.
def _create_agent(session_id: str, actor_id: str, gateway_tools: list = None) -> Agent:
"""Create a Strands Agent with KB retrieval, AgentCore Memory, and Gateway tools."""
session_manager = None
if MEMORY_ID:
try:
from bedrock_agentcore.memory.integrations.strands.config import (
AgentCoreMemoryConfig,
RetrievalConfig,
)
from bedrock_agentcore.memory.integrations.strands.session_manager import (
AgentCoreMemorySessionManager,
)
config = AgentCoreMemoryConfig(
memory_id=MEMORY_ID,
session_id=session_id,
actor_id=actor_id,
retrieval_config={
f"users/{actor_id}/facts": RetrievalConfig(
top_k=5, relevance_score=0.5
),
f"users/{actor_id}/preferences": RetrievalConfig(
top_k=5, relevance_score=0.5
),
},
)
session_manager = AgentCoreMemorySessionManager(
agentcore_memory_config=config,
region_name=REGION,
)
except Exception as exc:
logger.warning("Failed to initialise memory session manager: %s", exc)
tools = [retrieve, format_slack_message] + (gateway_tools or [])
return Agent(
system_prompt=_load_system_prompt(),
model=_create_model(),
tools=tools,
session_manager=session_manager,
conversation_manager=SlidingWindowConversationManager(
window_size=20,
should_truncate_results=True,
per_turn=True,
),
callback_handler=None,
)
In our code, if memory has been set, then we import the AgentCoreMemorySessionManager. This session manager integrates Strands agents with AgentCore Memory, which synchronises the short-term and long-term memory capabilities. Some of its features include loading the conversation history from short-term memory during agent initialisation, and integrating with long-term memory for context injection into agent state.
Next we create a AgentCoreMemoryConfig configuration object which will be passed to the session manager telling it:
- memory_id - which AgentCore Memory resource to connect to
- session_id - the identifier for the conversation session
- actor_id - the unique identifier for the user
- retrieval_config - a dictionary mapping of namespaces to retrieval configurations. This tells the session manage to search the two namespaces for relevant long-term memories, and to get the 5 most relevant facts and user preferences
Our use of AgentCore Memory is now handled automatically by Strands Agents session manager. Before each turn, it will load recent events from the same session to populate the agent's conversation context. The short-term memory is the raw event stream. The agent will see the last 20 turns in its context window, as this has been configured with the Sliding Window Conversation Manager. After (and during) invocations of the agent, new conversation messages are automatically persisted to AgentCore Memory.
With this in place, we have now successfully added long-term memory to our agent, personalising the briefing for each user based on their preferences.
Biography
As Chief AWS Architect at IBM in the UK, I am responsible for growing the AWS capability and community within one of the fastest growing AWS consulting partners globally. This gives me the opportunity to try out the latest features in preview before they go into general availability. You'll often find me blogging about my experience, but please reach out if there are services you'd like to know more about.



Top comments (0)