Matt Lewis for AWS Heroes

Posted on Jun 16

Adding Memory to the Agent

#agents #ai #aws #tutorial

This is the fourth in a series of posts documenting the architecture, implementation, and lessons learned from building the AWS Briefing Agent - a personalised AWS assistant deployed on Amazon Bedrock AgentCore Runtime.

Part 1: Building a Full-Stack AI Agent on Bedrock AgentCore
Part 2: Data Ingestion: RSS Feeds, Knowledge Base, S3 Vectors, and Metadata Filtering
Part 3: Strands Agents + AgentCore Runtime - a perfect match
Part 4: Adding Memory to the Agent
Part 5: Experimenting with API Gateway
Part 6: Observability and Evaluations
Part 7: Third Party Integrations - Identity, Gateway and Slack Notifications

As mentioned in the first blog post, each session on AgentCore Runtime is assigned a dedicated Firecracker microVM with isolated CPU, memory and filesystem resources. When the session finishes, the entire microVM is destroyed. There is no shared state between sessions, which prevents any cross-session data leakage.

When a user accesses our AWS Briefing Agent service for the first time, they are asked a number of questions.

This includes asking about the primary AWS services the user is interested in, their experience level in AWS, and if there are specific AWS areas they want to track closely.

Without any memory capability, the user will have to provide the same information each time they start a new session. This is where AgentCore Memory comes into play. This post walks through setting up AgentCore Memory using Strands Agents.

Configuring Memory in AgentCore

The agentcore.json file is the primary configuration file used in Amazon Bedrock AgentCore to define and manage AI agents, gateways, memory stores and datasets. It acts as the central orchestrator to package up the agent's infrastructure.

When we run the agentcore deploy command, the CLI reads this file and uses the AWS CDK to synthesize and deploy CloudFormation resources. We add long-term memory to our agent in the memories section using a resource name of "BriefingAgentMemory". This is the name that is referenced in our handler.

AgentCore Memory itself consists of several key components that work together to provide both short-term context and long-term intelligence for agents as shown in the diagram below:

The interactions with the user are stored in short-term memory for 90 days, as specified in the eventExpiryDuration attribute. We then specify two distinct memory strategies that transform these raw short-term events into long-term memory. Note that, by default, all strategies exclude personally identifiable information (PII) from long-term memory records.

We define the following strategies in the agentcore.json file:

SEMANTIC - this memory strategy identifies and extracts key pieces of factual information and contextual knowledge from conversational data. For example, a user is running AWS Lambda in production.
USER_PREFERENCE - this memory strategy is designed to automatically identify and extract user preferences, choices and styles from conversations. For example, a user is interested in serverless and containers.

Each strategy stores its long-term memory in a hierarchical structure within a namespace. These namespaces act as distinct logical containers. We segregate them using the special {actorId} placeholder variable, so that we guarantee separation between each user.

The complete relevant memory section in our agentcore.json file is shown below:

  "memories": [
    {
      "name": "BriefingAgentMemory",
      "eventExpiryDuration": 90,
      "strategies": [
        {
          "type": "SEMANTIC",
          "name": "semantic_facts",
          "namespaces": [
            "/users/{actorId}/facts"
          ]
        },
        {
          "type": "USER_PREFERENCE",
          "name": "user_preferences",
          "namespaces": [
            "/users/{actorId}/preferences"
          ]
        }
      ]
    }
  ],
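To make the {actorId} substitution concrete, the sketch below loads a trimmed copy of the memories section and resolves each strategy's namespace template for one user. The resolve_namespaces helper and the "user-123" actor ID are hypothetical, and the plain string replacement only illustrates the idea; it is not how AgentCore performs the substitution internally.

```python
import json

# A trimmed copy of the "memories" section, wrapped in braces so it parses as JSON.
CONFIG = json.loads("""
{
  "memories": [
    {
      "name": "BriefingAgentMemory",
      "eventExpiryDuration": 90,
      "strategies": [
        {"type": "SEMANTIC", "name": "semantic_facts",
         "namespaces": ["/users/{actorId}/facts"]},
        {"type": "USER_PREFERENCE", "name": "user_preferences",
         "namespaces": ["/users/{actorId}/preferences"]}
      ]
    }
  ]
}
""")


def resolve_namespaces(config: dict, actor_id: str) -> dict:
    """Map each strategy name to its namespaces with {actorId} filled in."""
    resolved = {}
    for memory in config["memories"]:
        for strategy in memory["strategies"]:
            resolved[strategy["name"]] = [
                ns.replace("{actorId}", actor_id) for ns in strategy["namespaces"]
            ]
    return resolved


# "user-123" is a made-up actor ID used only for illustration.
namespaces = resolve_namespaces(CONFIG, "user-123")
print(namespaces)
```

Because the actor ID is part of every namespace path, two users never share a namespace, which is what gives the per-user separation described above.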

Integrating Cognito and AgentCore Runtime

At this point, we need to take a short detour into how we authenticate requests to our agent. AgentCore Runtime supports two inbound authentication mechanisms:

AWS IAM SigV4 - where the request to the InvokeAgentRuntime API is SigV4-signed with valid AWS credentials that have the bedrock-agentcore:InvokeAgentRuntime IAM permission.
JWT Bearer Token Auth - which is configured with an inbound JWT authoriser.

When our frontend invokes the agent, it is sending a request to the agent's public endpoint URL:

https://bedrock-agentcore.eu-west-1.amazonaws.com/runtimes/<arn>/invocations

This URL is a special public-facing endpoint that AgentCore Runtime exposes. We specify this in the agentcore.json file:

  "runtimes": [
    {
      "name": "AWSBriefingAgent",
      "build": "Container",
      "entrypoint": "handler.py",
      ...
      "authorizerType": "CUSTOM_JWT",
      "authorizerConfiguration": {
        "customJwtAuthorizer": {
          "discoveryUrl": "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_dshjdhskj/.well-known/openid-configuration",
          "allowedClients": [
            "dhjhdjskhdjkshdjkhsd"
          ]
        }
      }
    }

The discoveryUrl points to the OpenID Connect discovery document for the Amazon Cognito user pool that authenticates users to the frontend. When AgentCore Runtime needs to validate a JWT, it retrieves information from this endpoint such as the issuer and the JWKS endpoint (which contains the public keys used to verify the JWT signature).

The allowedClients list contains the Cognito application client ID. When a user logs in, Cognito stamps the token with the client_id. AgentCore validates the JWT’s client_id claim, so only tokens issued for one of the permitted application clients can invoke the runtime.
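As a rough illustration of the client_id check, the sketch below decodes the payload segment of a JWT and compares its client_id claim with an allow-list. It deliberately skips signature verification, which AgentCore Runtime performs using the keys from the JWKS endpoint, so it must not be used as a security control. The function names, the unsigned demo token, and the client IDs are all hypothetical.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_client_allowed(token: str, allowed_clients: list) -> bool:
    """Return True if the token's client_id claim is in the allow-list."""
    return decode_jwt_payload(token).get("client_id") in allowed_clients


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url segment (demo helper only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# An unsigned, hand-built token that only mimics the shape of a real one.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "user-123", "client_id": "example-client-id"}),
    "signature-placeholder",
])

print(is_client_allowed(demo_token, ["example-client-id"]))
print(is_client_allowed(demo_token, ["another-client-id"]))
```

The first call matches the allow-list and the second does not, which mirrors how a token issued for a different application client would be rejected.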

When the user logs into our frontend application with their email address and password, the frontend calls Cognito directly to verify the credentials, and receives back three tokens:

Access token - proves who you are and what you're allowed to do.
ID token - contains profile info (email, name). Used by the frontend to display the username.
Refresh token - used to get new access/ID tokens when they expire (usually after 1 hour). These tokens are stored by the frontend auth library.

When we send a request to the agent, the frontend attaches the access token as a bearer token:

POST /invocations
Authorization: Bearer eyJraWQi...
Body: {"prompt": "Give me a briefing"}

This is the JWT token that gets validated by AgentCore Runtime.
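The request above can be sketched in Python using only the standard library. The example builds the request object without sending it. The ARN and token values are made-up placeholders, build_invocation_request is a hypothetical helper, and percent-encoding the ARN in the path is an assumption based on the ARN containing ':' and '/' characters.

```python
import json
import urllib.parse
import urllib.request

REGION = "eu-west-1"


def build_invocation_request(
    agent_arn: str, access_token: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request to the invocations endpoint."""
    # Assumption: the ARN is percent-encoded because it contains ':' and '/'.
    escaped_arn = urllib.parse.quote(agent_arn, safe="")
    url = (
        f"https://bedrock-agentcore.{REGION}.amazonaws.com"
        f"/runtimes/{escaped_arn}/invocations"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute a real runtime ARN and Cognito access token.
request = build_invocation_request(
    agent_arn="arn:aws:bedrock-agentcore:eu-west-1:111122223333:runtime/example",
    access_token="eyJraWQi-placeholder",
    prompt="Give me a briefing",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access or credentials.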

Returning Memory records in Handler function

The following code snippet shows how we retrieve the memory records to display in the sidebar of the frontend.

@app.entrypoint
async def invoke(payload: Dict[str, Any], context: Any = None):
    message = payload.get("prompt", payload.get("message", ""))

    # Derive actor_id from the JWT 'sub' claim (source of truth)
    actor_id = _extract_sub_from_jwt(context) or payload.get("user_id", "default-user")

    # Sanitize actor_id for AgentCore Memory
    actor_id = re.sub(r"[^a-zA-Z0-9\-_/]", "_", actor_id)

    # Retrieve memory records to include in the stream
    memory_used = get_memory_records(actor_id, message)

The @app.entrypoint decorator registers a function as the handler for POST requests to /invocations. AgentCore Runtime calls this handler function when a client invokes the agent. Our handler function is an async generator, which means that it automatically streams the response as Server-Sent Events (SSE) delivered to the client in real time (more on this in the next blog post).

Within the handler, we get the message that has been sent in the payload. We then extract the user's identity from the JWT token that Cognito issued. One of the claims in the JWT token is the sub or subject, which is the unique user ID assigned by Cognito to a user when they first register. We know that the JWT token has been cryptographically signed by Cognito and validated by AgentCore Runtime before it reaches the handler function. We assign this sub value to be the actor_id. If no sub can be extracted, the handler falls back to the user_id in the payload, and then to "default-user". Finally, we replace every character other than letters, digits, hyphens, underscores and forward slashes with an underscore, so that the value contains no unsupported characters.
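The fallback chain and the sanitisation step can be tried in isolation. The sketch below uses the same regular expression as the handler; resolve_actor_id is a hypothetical helper that takes an already-extracted sub value, and the sample IDs are made up.

```python
import re
from typing import Optional

# Same character class as the handler: anything outside it becomes "_".
_UNSUPPORTED = re.compile(r"[^a-zA-Z0-9\-_/]")


def resolve_actor_id(jwt_sub: Optional[str], payload: dict) -> str:
    """Pick the actor ID (JWT sub, then payload, then a default) and sanitise it."""
    raw = jwt_sub or payload.get("user_id", "default-user")
    return _UNSUPPORTED.sub("_", raw)


# A UUID-style sub passes through unchanged.
print(resolve_actor_id("6f1c2a9e-3b47-4d1e-9c55-0a1b2c3d4e5f", {}))
# With no sub, the payload value is used and unsupported characters are replaced.
print(resolve_actor_id(None, {"user_id": "jane.doe@example.com"}))
# With neither, the default applies.
print(resolve_actor_id(None, {}))
```

Note that the replacement is lossy: two different raw values can map to the same sanitised ID, which is one reason to prefer the sub claim over free-form identifiers.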

We then call our get_memory_records function. This function calls the AgentCore retrieve_memory_records API to search the long-term memory for facts and preferences relevant to the prompt that has just been passed in. We retrieve up to 5 of the highest-scoring results from the vector search in each of the two namespaces and store them in a records list, which is streamed back to the frontend to be displayed in the sidebar.

def get_memory_records(actor_id: str, prompt: str) -> List[Dict[str, Any]]:
    """Retrieve long-term memory records relevant to the user's prompt.

    Searches both the facts and preferences namespaces and returns
    the records the agent would have seen for this invocation.
    """
    if not MEMORY_ID:
        return []

    try:
        client = boto3.client("bedrock-agentcore", region_name=REGION)
        records = []

        for namespace in [
            f"users/{actor_id}/facts",
            f"users/{actor_id}/preferences",
        ]:
            try:
                response = client.retrieve_memory_records(
                    memoryId=MEMORY_ID,
                    namespace=namespace,
                    searchCriteria={
                        "searchQuery": prompt,
                        "topK": 5,
                    },
                    maxResults=5,
                )
                for r in response.get("memoryRecordSummaries", []):
                    records.append({
                        "memoryRecordId": r.get("memoryRecordId", ""),
                        "text": r.get("content", {}).get("text", ""),
                        "score": r.get("score"),
                        "memoryStrategyId": r.get("memoryStrategyId", ""),
                        "namespaces": r.get("namespaces", []),
                    })
            except Exception as exc:
                logger.warning("Failed to retrieve from %s: %s", namespace, exc)

        return records
    except Exception as exc:
        logger.warning("Failed to retrieve memory records: %s", exc)
        return []
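The flattening step inside get_memory_records can be exercised without any AWS access. The sketch below applies the same field lookups to a hand-written response dictionary. The flatten_summaries helper and the sample records are hypothetical, and the response shape is assumed from the keys that get_memory_records reads rather than taken from the API reference.

```python
from typing import Any, Dict, List


def flatten_summaries(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduce a retrieve_memory_records-style response to the sidebar fields."""
    return [
        {
            "memoryRecordId": r.get("memoryRecordId", ""),
            "text": r.get("content", {}).get("text", ""),
            "score": r.get("score"),
        }
        for r in response.get("memoryRecordSummaries", [])
    ]


# Hand-written sample; the second record has no content, to show the defaults.
sample_response = {
    "memoryRecordSummaries": [
        {
            "memoryRecordId": "mem-1",
            "content": {"text": "User runs AWS Lambda in production"},
            "score": 0.82,
        },
        {"memoryRecordId": "mem-2", "score": 0.41},
    ]
}

flattened = flatten_summaries(sample_response)
for record in flattened:
    print(record)
```

Using .get with defaults at every level means a record with missing fields produces empty values instead of raising, which matches the defensive style of the original function.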

We can see an example of the sidebar in the frontend below:

Setting up Memory with Strands

Both short-term and long-term memory are handled for us automatically through the AgentCore Memory session manager integration for Strands.

The memory ID is retrieved in a module-level constant:

MEMORY_ID = os.environ.get("MEMORY_BRIEFINGAGENTMEMORY_ID")

This reads the memory resource ID that AgentCore Runtime automatically injects as an environment variable into the container at runtime. The naming convention is MEMORY_<NAME>_ID, where <NAME> is the memory name in upper case. Given the memory was given a name of "BriefingAgentMemory" in the agentcore.json file, AgentCore sets MEMORY_BRIEFINGAGENTMEMORY_ID to the actual memory resource ID (something like AWSBriefingAgent_BriefingAgentMemory-q2iBfL64BS).
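The variable name can also be derived from the memory name instead of being hard-coded. The sketch below assumes the convention is the prefix MEMORY_, the upper-cased memory name, and the suffix _ID, which is inferred from the single example in this post; memory_env_var is a hypothetical helper.

```python
import os


def memory_env_var(memory_name: str) -> str:
    """Build the environment variable name for a memory (assumed convention)."""
    return f"MEMORY_{memory_name.upper()}_ID"


var_name = memory_env_var("BriefingAgentMemory")
print(var_name)

# os.environ.get returns None when the variable is absent, e.g. on a laptop.
memory_id = os.environ.get(var_name)
print("memory configured" if memory_id else "memory not configured")
```

Because the lookup returns None when the variable is missing, the same code runs locally without memory and in the deployed container with it.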

The following function in our code is called on every request. A new stateless Strands Agent instance is created on each invocation, configured with the tools, the model settings, and a session manager that loads conversation history from AgentCore Memory.

def _create_agent(session_id: str, actor_id: str, gateway_tools: list = None) -> Agent:
    """Create a Strands Agent with KB retrieval, AgentCore Memory, and Gateway tools."""
    session_manager = None

    if MEMORY_ID:
        try:
            from bedrock_agentcore.memory.integrations.strands.config import (
                AgentCoreMemoryConfig,
                RetrievalConfig,
            )
            from bedrock_agentcore.memory.integrations.strands.session_manager import (
                AgentCoreMemorySessionManager,
            )

            config = AgentCoreMemoryConfig(
                memory_id=MEMORY_ID,
                session_id=session_id,
                actor_id=actor_id,
                retrieval_config={
                    f"users/{actor_id}/facts": RetrievalConfig(
                        top_k=5, relevance_score=0.5
                    ),
                    f"users/{actor_id}/preferences": RetrievalConfig(
                        top_k=5, relevance_score=0.5
                    ),
                },
            )
            session_manager = AgentCoreMemorySessionManager(
                agentcore_memory_config=config,
                region_name=REGION,
            )
        except Exception as exc:
            logger.warning("Failed to initialise memory session manager: %s", exc)

    tools = [retrieve, format_slack_message] + (gateway_tools or [])

    return Agent(
        system_prompt=_load_system_prompt(),
        model=_create_model(),
        tools=tools,
        session_manager=session_manager,
        conversation_manager=SlidingWindowConversationManager(
            window_size=20,
            should_truncate_results=True,
            per_turn=True,
        ),
        callback_handler=None,
    )

In our code, if MEMORY_ID has been set, we import the AgentCoreMemorySessionManager. This session manager integrates Strands agents with AgentCore Memory and synchronises the short-term and long-term memory capabilities. Its features include loading the conversation history from short-term memory during agent initialisation, and integrating with long-term memory for context injection into agent state.

Next we create an AgentCoreMemoryConfig configuration object, which is passed to the session manager and tells it:

memory_id - which AgentCore Memory resource to connect to
session_id - the identifier for the conversation session
actor_id - the unique identifier for the user
retrieval_config - a dictionary mapping namespaces to retrieval configurations. This tells the session manager to search the two namespaces for relevant long-term memories, and to get the 5 most relevant facts and the 5 most relevant user preferences
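One way to read the top_k and relevance_score settings is as a cap on the number of records and a minimum similarity score. The sketch below illustrates that reading on made-up records. It is an interpretation for illustration only, not the session manager's actual implementation, and select_records is a hypothetical helper.

```python
from typing import Any, Dict, List


def select_records(
    records: List[Dict[str, Any]], top_k: int, relevance_score: float
) -> List[Dict[str, Any]]:
    """Keep records scoring at least relevance_score, best first, capped at top_k."""
    relevant = [r for r in records if r["score"] >= relevance_score]
    relevant.sort(key=lambda r: r["score"], reverse=True)
    return relevant[:top_k]


# Made-up candidate records with similarity scores.
candidates = [
    {"text": "Interested in containers", "score": 0.64},
    {"text": "Mentioned a holiday", "score": 0.31},
    {"text": "Runs AWS Lambda in production", "score": 0.82},
]

selected = select_records(candidates, top_k=5, relevance_score=0.5)
for record in selected:
    print(record["score"], record["text"])
```

Under this reading, raising relevance_score trades recall for precision, while top_k bounds how much retrieved context is added to each turn.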

Our use of AgentCore Memory is now handled automatically by the Strands Agents session manager. Before each turn, it loads recent events from the same session to populate the agent's conversation context. The short-term memory is the raw event stream. The agent sees the most recent 20 messages in its context window, as configured by window_size on the SlidingWindowConversationManager. During and after invocations of the agent, new conversation messages are automatically persisted to AgentCore Memory.
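The effect of a sliding window of 20 can be pictured with a plain list. The sketch below is a simplified illustration and not the Strands implementation, which also has to keep tool calls paired with their results and can truncate large tool results; apply_sliding_window is a hypothetical helper.

```python
from typing import List


def apply_sliding_window(messages: List[str], window_size: int = 20) -> List[str]:
    """Return only the most recent window_size messages."""
    if window_size <= 0:
        return []
    return messages[-window_size:]


# 25 made-up messages; only the last 20 stay in the context window.
history = [f"message {i}" for i in range(1, 26)]
window = apply_sliding_window(history)

print(len(window))
print(window[0], "...", window[-1])
```

Older messages drop out of the model's context but remain in AgentCore Memory as short-term events, where the long-term strategies can still extract facts and preferences from them.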

With this in place, we have now successfully added long-term memory to our agent, personalising the briefing for each user based on their preferences.

Biography

As Chief AWS Architect at IBM in the UK, I am responsible for growing the AWS capability and community within one of the fastest growing AWS consulting partners globally. This gives me the opportunity to try out the latest features in preview before they go into general availability. You'll often find me blogging about my experience, but please reach out if there are services you'd like to know more about.