Prerequisites
Familiarity with AWS Bedrock, boto3, and building LLM-based agents.
The Problem With Stateless Agents
If you've shipped a Bedrock agent to production, you've already hit this wall. Every invocation is stateless. You hack around it by stuffing conversation history into the prompt, bloating your token count and eventually hitting context limits. Or you build your own memory layer — DynamoDB for session state, OpenSearch for semantic retrieval, some glue Lambda in between — and suddenly you're maintaining infrastructure that has nothing to do with your actual agent logic.
AgentCore Memory is AWS's answer to this. It's a managed memory service purpose-built for agents, with three distinct memory tiers and a retrieval API that plugs directly into the Bedrock agent runtime. Let's actually use it.
Setup
Enabling AgentCore Memory
First, make sure you're in a region that supports it (us-east-1 and us-west-2 at GA). Then create a memory store:
```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

response = bedrock_agent.create_memory(
    name="customer-support-memory",
    description="Persistent memory for support agent",
    memoryConfiguration={
        "enabledMemoryTypes": ["SESSION_SUMMARY", "SEMANTIC"],
        "storageDays": 90
    }
)

memory_id = response["memory"]["memoryId"]
print(f"Memory store created: {memory_id}")
```
Store the memory_id — you'll attach it to every agent invocation. Think of it like a database connection string for your agent's brain.
The Three Memory Tiers (What They Actually Do)
1. Session Memory

This is in-context working memory scoped to a single conversation. Bedrock manages it automatically when you pass a sessionId — you don't write to it directly. What you do control is whether session summaries get promoted to long-term memory when the session ends.
```python
bedrock_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_runtime.invoke_agent(
    agentId="YOUR_AGENT_ID",
    agentAliasId="YOUR_ALIAS_ID",
    sessionId="user-1234-session-abc",  # scopes the session memory
    memoryId=memory_id,                 # links to the persistent store
    inputText="My order #9982 still hasn't arrived after 10 days",
    enableTrace=True
)
```
Bedrock tracks everything in this session under user-1234-session-abc. When the session closes (or hits the TTL), it automatically summarises the key facts and pushes them into long-term memory.
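invoke_agent streams its reply back as an event stream rather than a single string. A small helper to reassemble the text — this assumes the standard bedrock-agent-runtime chunk/bytes event shape; adapt if your SDK version differs:

```python
def collect_agent_reply(completion):
    """Reassemble the agent's text reply from the invoke_agent event stream.

    Each streamed event may carry a "chunk" dict whose "bytes" field holds
    a UTF-8 fragment of the reply; trace and other events are skipped here.
    """
    parts = []
    for event in completion:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)

# reply_text = collect_agent_reply(response["completion"])
```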
2. Long-Term (Semantic) Memory
This is the tier that makes agents genuinely useful across sessions. Facts extracted from past conversations are embedded and stored in a managed vector store. When a new session starts, the agent runtime does semantic retrieval against this store before constructing the prompt.
You can also write to it directly — useful for seeding known user preferences or backfilling from an existing CRM:
```python
bedrock_agent.put_memory_record(
    memoryId=memory_id,
    memoryRecord={
        "content": {
            "text": "Customer John Doe (user-1234) has a Premium plan. "
                    "Prefers resolution via email. Had delivery issue in Jan 2025."
        },
        "memoryRecordType": "SEMANTIC",
        "sessionId": "user-1234-bootstrap"
    }
)
```
And to retrieve it manually (e.g., for a pre-flight check before invoking the agent):
```python
results = bedrock_agent.retrieve_memory_records(
    memoryId=memory_id,
    memoryRecordType="SEMANTIC",
    searchQuery="user-1234 preferences and history",
    maxResults=5
)

for record in results["memoryRecordSummaries"]:
    print(record["content"]["text"])
    print(f"Score: {record['score']}")  # cosine similarity
```
The retrieval is semantic, not keyword-based. Querying "does this user have premium?" will match a record that says "subscribed to the top-tier plan" — no exact string match required.
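If you're doing that pre-flight retrieval yourself, you'll usually want to filter by score and fold the hits into a prompt preamble. A minimal sketch, assuming the memoryRecordSummaries shape shown above (the 0.4 threshold is an arbitrary starting point, not an AWS recommendation):

```python
def build_memory_context(record_summaries, min_score=0.4):
    """Turn retrieved memory records into a prompt preamble.

    Keeps only records above a similarity threshold so weak matches
    don't pollute the agent's context window.
    """
    facts = [
        f"- {rec['content']['text']}"
        for rec in record_summaries
        if rec.get("score", 0.0) >= min_score
    ]
    if not facts:
        return ""
    return "Known facts about this user:\n" + "\n".join(facts)
```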
3. Episodic Memory

This is the newest tier and the most powerful for iterative workflows. Episodic memory stores sequences of events — entire chains of tool calls, decisions, and outcomes — not just extracted facts. The agent can later retrieve past episodes and use them to inform strategy. Enable it at store creation:
```python
response = bedrock_agent.create_memory(
    name="coding-assistant-memory",
    memoryConfiguration={
        "enabledMemoryTypes": ["SESSION_SUMMARY", "SEMANTIC", "EPISODIC"],
        "storageDays": 180
    }
)
```
Then tag sessions with a namespace so related episodes can be retrieved together:
```python
response = bedrock_runtime.invoke_agent(
    agentId="YOUR_AGENT_ID",
    agentAliasId="YOUR_ALIAS_ID",
    sessionId="user-5678-session-xyz",
    memoryId=memory_id,
    inputText="Debug why my FastAPI app returns 422 on file uploads",
    sessionAttributes={
        "episodeNamespace": "user-5678-debugging"
    }
)
```
After a few sessions, the agent accumulates episodes like: "For user-5678, file upload 422s were caused by missing Content-Type headers twice. Solution: always check middleware config first." It surfaces this automatically on the next relevant session.
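You can also pull episodes into the prompt yourself — assuming retrieve_memory_records accepts memoryRecordType="EPISODIC" and returns the same record shape as the semantic tier, a helper like this condenses the top hits into a strategy hint:

```python
def summarise_episodes(record_summaries, limit=3):
    """Condense the top retrieved episodes into a short strategy hint.

    Expects the memoryRecordSummaries shape used elsewhere in this article;
    takes at most `limit` episodes to keep the prompt preamble small.
    """
    texts = [rec["content"]["text"] for rec in record_summaries[:limit]]
    if not texts:
        return "No prior episodes for this namespace."
    return "Lessons from past sessions:\n" + "\n".join(f"- {t}" for t in texts)
```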
Controlling What Gets Remembered
By default, AgentCore summarises everything. In production you'll want to be more deliberate. Use memory consolidation filters to control promotion from session → long-term:
```python
bedrock_agent.update_memory(
    memoryId=memory_id,
    memoryConfiguration={
        "enabledMemoryTypes": ["SESSION_SUMMARY", "SEMANTIC"],
        "storageDays": 90,
        "sessionSummaryConfiguration": {
            "maxRecentSessions": 20,
            "summaryPromptTemplate": (
                "Extract only: user preferences, unresolved issues, "
                "account facts. Ignore pleasantries and small talk."
            )
        }
    }
)
```
The summaryPromptTemplate is a prompt sent to the underlying FM during consolidation. Customizing it prevents noise (greetings, filler, repeated questions) from polluting your long-term store.
Deleting Memory (GDPR / Right to Erasure)
This is non-negotiable in production. When a user requests data deletion:
```python
# List all records for a user
records = bedrock_agent.list_memory_records(
    memoryId=memory_id,
    memoryRecordType="SEMANTIC",
    maxResults=100
)

# Delete each one
for record in records["memoryRecordSummaries"]:
    if "user-1234" in record.get("sessionId", ""):
        bedrock_agent.delete_memory_record(
            memoryId=memory_id,
            memoryRecordId=record["memoryRecordId"]
        )
```
Or nuke the entire memory store for a user namespace if you're using per-user stores:
```python
bedrock_agent.delete_memory(memoryId=memory_id)
```
Architecture Pattern: Per-User vs Shared Memory Stores
Two approaches in production:
- Per-user store — one memoryId per user. Total isolation and clean deletion, but more stores to manage and higher overhead for low-activity users.
- Shared store with namespaced session IDs — one store, session IDs prefixed with the user ID (user-1234-session-abc). Simpler operationally, but retrieval must filter carefully to avoid cross-user bleed. Always scope your searchQuery with user identifiers.
For most B2C applications, the shared store with namespaced session IDs is the pragmatic choice. For enterprise multi-tenant SaaS, per-user (or per-tenant) stores are worth the overhead for the isolation guarantees.
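If you go the shared-store route, centralise the namespacing so no call site can forget it. Helpers like these (the names are mine, not an AWS convention) keep session IDs and search queries consistently scoped:

```python
def make_session_id(user_id: str, conversation: str) -> str:
    """Build a namespaced session ID like 'user-1234-session-abc'."""
    return f"{user_id}-session-{conversation}"

def scoped_query(user_id: str, query: str) -> str:
    """Prefix every semantic search with the user ID to reduce cross-user bleed."""
    return f"{user_id} {query}"
```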
Observability: Tracing Memory Retrievals
Enable traces to see exactly what's being pulled from memory on each invocation.
```python
response = bedrock_runtime.invoke_agent(
    agentId="YOUR_AGENT_ID",
    agentAliasId="YOUR_ALIAS_ID",
    sessionId="user-1234-session-new",
    memoryId=memory_id,
    inputText="What was that issue I had last month?",
    enableTrace=True
)

for event in response["completion"]:
    if "trace" in event:
        trace = event["trace"]["trace"]
        if "orchestrationTrace" in trace:
            obs = trace["orchestrationTrace"].get("observation", {})
            if "knowledgeBaseLookupOutput" in obs:
                print("Memory retrieved:", obs["knowledgeBaseLookupOutput"])
```
This surfaces which records were retrieved, their similarity scores, and how they were injected into the prompt. Essential for debugging why your agent is (or isn't) remembering something.
Cost Considerations
AgentCore Memory pricing has two components: storage (per GB/month for the vector store) and retrieval (per 1K queries). A few things to watch:
- Session summaries are generated by an FM call — this counts against your Bedrock token usage. Noisy sessions with long summaries add up. The summaryPromptTemplate customisation above directly controls this cost.
- Episodic memory stores more data than semantic — budget accordingly if you enable it.
- Set storageDays aggressively. 90 days is usually sufficient; most users don't need their agent to recall a conversation from 18 months ago.
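A back-of-envelope helper makes the trade-offs concrete. The rates below are placeholders, not actual AWS pricing — plug in the current figures from the Bedrock pricing page:

```python
def monthly_memory_cost(storage_gb, queries, gb_rate, per_1k_queries):
    """Rough monthly cost: storage (per GB/month) plus retrieval (per 1K queries).

    All rates are caller-supplied placeholders, not real AWS figures.
    """
    return storage_gb * gb_rate + (queries / 1000) * per_1k_queries

# Hypothetical: 2 GB stored, 50K retrievals/month, $0.25/GB, $0.50 per 1K queries
# monthly_memory_cost(2, 50_000, 0.25, 0.50) -> 25.5
```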