<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sampath Karan</title>
    <description>The latest articles on DEV Community by Sampath Karan (@sampathkaran).</description>
    <link>https://dev.to/sampathkaran</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1002222%2Fc0018cfe-9938-424c-8d88-63af6dadf07e.JPG</url>
      <title>DEV Community: Sampath Karan</title>
      <link>https://dev.to/sampathkaran</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sampathkaran"/>
    <language>en</language>
    <item>
      <title>AWS Bedrock AgentCore Memory: Give Your AI Agent a Brain That Actually Remembers</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Sat, 14 Mar 2026 11:51:41 +0000</pubDate>
      <link>https://dev.to/sampathkaran/aws-bedrock-agentcore-memory-give-your-ai-agent-a-brain-that-actually-remembers-12ie</link>
      <guid>https://dev.to/sampathkaran/aws-bedrock-agentcore-memory-give-your-ai-agent-a-brain-that-actually-remembers-12ie</guid>
      <description>&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Familiarity with AWS Bedrock, boto3, and building LLM-based agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Stateless Agents
&lt;/h2&gt;

&lt;p&gt;If you've shipped a Bedrock agent to production, you've already hit this wall. Every invocation is stateless. You hack around it by stuffing conversation history into the prompt, bloating your token count and eventually hitting context limits. Or you build your own memory layer (DynamoDB for session state, OpenSearch for semantic retrieval, some glue Lambda in between) and suddenly you're maintaining infrastructure that has nothing to do with your actual agent logic.&lt;br&gt;
AgentCore Memory is AWS's answer to this. It's a managed memory service purpose-built for agents, with three distinct memory tiers and a retrieval API that plugs directly into the Bedrock agent runtime. Let's actually use it.&lt;/p&gt;
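&lt;p&gt;That prompt-stuffing workaround looks roughly like this (a simplified sketch; build_prompt and approx_tokens are illustrative helpers, and the word count is only a crude stand-in for real tokenization):&lt;/p&gt;

```python
# Naive workaround: replay the full conversation on every call.
# Prompt size grows linearly with history, so long sessions
# eventually blow past the model's context window.

history = []

def build_prompt(user_message: str) -> str:
    """Append the new turn and rebuild the entire prompt from scratch."""
    history.append(f"User: {user_message}")
    return "\n".join(history) + "\nAssistant:"

def approx_tokens(prompt: str) -> int:
    """Crude word-count proxy for token usage."""
    return len(prompt.split())

prompt = build_prompt("My order #9982 still hasn't arrived after 10 days")
for turn in range(50):  # every follow-up carries all previous turns again
    prompt = build_prompt(f"Follow-up question number {turn}")
print(approx_tokens(prompt))  # keeps climbing, turn after turn
```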
&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Enabling AgentCore Memory&lt;br&gt;
First, make sure you're in a region that supports it (us-east-1 and us-west-2 at GA). Then create a memory store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;bedrock_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-support-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Persistent memory for support agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enabledMemoryTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SESSION_SUMMARY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;storageDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;memory_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Memory store created: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Store the memory_id; you'll attach it to every agent invocation. Think of it as a database connection string for your agent's brain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Memory Tiers (What They Actually Do)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Session Memory
This is in-context working memory scoped to a single conversation. Bedrock manages it automatically when you pass a sessionId; you don't write to it directly. What you control is whether session summaries get promoted to long-term memory when the session ends.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="n"&gt;bedrock_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agent-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agentId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_AGENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agentAliasId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_ALIAS_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-1234-session-abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# scopes the session memory
&lt;/span&gt;    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                   &lt;span class="c1"&gt;# links to the persistent store
&lt;/span&gt;    &lt;span class="n"&gt;inputText&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My order #9982 still hasn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t arrived after 10 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;enableTrace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bedrock tracks everything in this session under user-1234-session-abc. When the session closes (or hits the TTL), it automatically summarises the key facts and pushes them into long-term memory.&lt;/p&gt;

&lt;p&gt;2. Long-Term (Semantic) Memory&lt;br&gt;
This is the tier that makes agents genuinely useful across sessions. Facts extracted from past conversations are embedded and stored in a managed vector store. When a new session starts, the agent runtime does semantic retrieval against this store before constructing the prompt.&lt;br&gt;
You can also write to it directly, which is useful for seeding known user preferences or backfilling from an existing CRM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_memory_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryRecord&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer John Doe (user-1234) has a Premium plan. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prefers resolution via email. Had delivery issue in Jan 2025.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecordType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sessionId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-1234-bootstrap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to retrieve it manually (e.g., for a pre-flight check before invoking the agent):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_memory_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryRecordType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;searchQuery&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-1234 preferences and history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecordSummaries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# cosine similarity
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The retrieval is semantic, not keyword-based. Querying "does this user have premium?" will match a record that says "subscribed to the top-tier plan" — no exact string match required.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Episodic Memory
This is the newest tier and the most powerful for iterative workflows. Episodic memory stores sequences of events — entire chains of tool calls, decisions, and outcomes — not just extracted facts. The agent can later retrieve past episodes and use them to inform strategy.
Enable it at store creation:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coding-assistant-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enabledMemoryTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SESSION_SUMMARY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EPISODIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;storageDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then tag sessions with a namespace so related episodes can be retrieved together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agentId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_AGENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agentAliasId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_ALIAS_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-5678-session-xyz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;inputText&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Debug why my FastAPI app returns 422 on file uploads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sessionAttributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodeNamespace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-5678-debugging&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few sessions, the agent accumulates episodes like: "For user-5678, file upload 422s were caused by missing Content-Type headers twice. Solution: always check middleware config first." It surfaces this automatically on the next relevant session.&lt;/p&gt;

&lt;p&gt;Controlling What Gets Remembered&lt;br&gt;
By default, AgentCore summarises everything. In production you'll want to be more selective. Use memory consolidation filters to control promotion from session to long-term memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enabledMemoryTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SESSION_SUMMARY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;storageDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sessionSummaryConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxRecentSessions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summaryPromptTemplate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract only: user preferences, unresolved issues, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account facts. Ignore pleasantries and small talk.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The summaryPromptTemplate is a prompt sent to the underlying FM during consolidation. Customizing it prevents noise (greetings, filler, repeated questions) from polluting your long-term store.&lt;/p&gt;

&lt;p&gt;Deleting Memory (GDPR / Right to Erasure)&lt;br&gt;
This is non-negotiable in production. When a user requests data deletion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="c1"&gt;# List all records for a user
&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_memory_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryRecordType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Delete each one
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecordSummaries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-1234&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sessionId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_memory_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;memoryRecordId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecordId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or nuke the entire memory store for a user namespace if you're using per-user stores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Architecture Pattern: Per-User vs Shared Memory Stores
&lt;/h2&gt;

&lt;p&gt;There are two approaches in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Per-user store: one memoryId per user. Total isolation and clean deletion, but more stores to manage and higher overhead for low-activity users.&lt;/li&gt;
&lt;li&gt;Shared store with namespaced session IDs: one store, with session IDs prefixed by the user ID (user-1234-session-abc). Simpler operationally, but retrieval must filter carefully to avoid cross-user bleed. Always scope your searchQuery with user identifiers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most B2C applications, the shared store with namespaced session IDs is the pragmatic choice. For enterprise multi-tenant SaaS, per-user (or per-tenant) stores are worth the overhead for the isolation guarantees.&lt;/p&gt;
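&lt;p&gt;A small helper keeps the namespacing convention consistent (a sketch; session_id_for and scoped_query are hypothetical helpers, not part of any AWS SDK):&lt;/p&gt;

```python
import uuid

def session_id_for(user_id: str) -> str:
    """Build a namespaced session ID, e.g. user-1234-session-ab12cd34."""
    return f"{user_id}-session-{uuid.uuid4().hex[:8]}"

def scoped_query(user_id: str, query: str) -> str:
    """Prefix a semantic search query with the user identifier so
    retrieval in a shared store stays scoped to one user."""
    return f"{user_id} {query}"

sid = session_id_for("user-1234")
q = scoped_query("user-1234", "preferences and history")
print(sid, "|", q)
```

&lt;p&gt;Pass the generated ID as sessionId and the scoped string as searchQuery in the calls shown above.&lt;/p&gt;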
&lt;h2&gt;
  
  
  Observability: Tracing Memory Retrievals
&lt;/h2&gt;

&lt;p&gt;Enable traces to see exactly what's being pulled from memory on each invocation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agentId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_AGENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agentAliasId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_ALIAS_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-1234-session-new&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;inputText&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What was that issue I had last month?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;enableTrace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;trace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orchestrationTrace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;obs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orchestrationTrace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;observation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledgeBaseLookupOutput&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;obs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Memory retrieved:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledgeBaseLookupOutput&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This surfaces which records were retrieved, their similarity scores, and how they were injected into the prompt. Essential for debugging why your agent is (or isn't) remembering something.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Considerations
&lt;/h2&gt;

&lt;p&gt;AgentCore Memory pricing has two components: storage (per GB/month for the vector store) and retrieval (per 1K queries). A few things to watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session summaries are generated by an FM call, which counts against our Bedrock token usage. Noisy sessions with long summaries add up; the &lt;code&gt;summaryPromptTemplate&lt;/code&gt; customisation above directly controls this cost.&lt;/li&gt;
&lt;li&gt;Episodic memory stores more data than semantic memory; budget accordingly if you enable it.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;storageDays&lt;/code&gt; aggressively. 90 days is usually sufficient; most users don't need their agent to recall a conversation from 18 months ago.&lt;/li&gt;
&lt;/ul&gt;
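
&lt;p&gt;As a quick sanity check, the three cost components can be sketched in a few lines of Python. The per-unit prices below are made-up placeholders for illustration, not published AWS rates; substitute the current figures from the Bedrock pricing page.&lt;/p&gt;

```python
# Back-of-envelope cost model for AgentCore Memory (illustrative only).
# The per-unit prices are HYPOTHETICAL placeholders, not published AWS rates.

def monthly_memory_cost(storage_gb, queries, summary_tokens,
                        price_per_gb=0.25,          # hypothetical USD per GB-month
                        price_per_1k_queries=0.50,  # hypothetical USD per 1K retrievals
                        price_per_1k_tokens=0.001): # hypothetical FM token rate
    """Rough monthly cost: storage + retrieval + summary-generation tokens."""
    storage = storage_gb * price_per_gb
    retrieval = (queries / 1000) * price_per_1k_queries
    summaries = (summary_tokens / 1000) * price_per_1k_tokens
    return round(storage + retrieval + summaries, 2)

# Example: 5 GB stored, 200K retrievals, 2M summary tokens per month.
print(monthly_memory_cost(5, 200_000, 2_000_000))
```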

</description>
      <category>agents</category>
      <category>ai</category>
      <category>aws</category>
      <category>llm</category>
    </item>
    <item>
      <title>Plug &amp; Productionize Your AI Agents with AWS Bedrock AgentCore</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Wed, 26 Nov 2025 09:38:15 +0000</pubDate>
      <link>https://dev.to/aws-builders/plug-productionize-your-ai-agents-with-aws-bedrock-agentcore-3g64</link>
      <guid>https://dev.to/aws-builders/plug-productionize-your-ai-agents-with-aws-bedrock-agentcore-3g64</guid>
      <description>&lt;p&gt;&lt;strong&gt;Deploying a Local AI Agent to AWS Bedrock AgentCore&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post demonstrates a streamlined approach to deploying locally developed AI agents using AWS Bedrock AgentCore. We'll build a simple single-node LLM agent, extend it with real-time web search, and deploy it seamlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have created a simple LLM agent using OpenAI and extended it with DuckDuckGoSearchResults (LangChain) to fetch current internet information. Once tested locally, the agent can be deployed to AWS Bedrock AgentCore, giving you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic scaling&lt;/li&gt;
&lt;li&gt;Serverless execution&lt;/li&gt;
&lt;li&gt;Fully managed runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Model&lt;/strong&gt;        : gpt-5-nano&lt;br&gt;
&lt;strong&gt;Agent Framework&lt;/strong&gt;     : LangGraph&lt;br&gt;
&lt;strong&gt;Deployment&lt;/strong&gt;          : AWS Bedrock AgentCore&lt;br&gt;
&lt;strong&gt;Tooling&lt;/strong&gt;             : DuckDuckGoSearchResults (LangChain)&lt;/p&gt;

&lt;p&gt;Follow along with the GitHub repo: &lt;a href="https://github.com/sampathkaran/langgraph-openai-agentcore-demo" rel="noopener noreferrer"&gt;https://github.com/sampathkaran/langgraph-openai-agentcore-demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before starting, make sure you have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Python 3.10 installed&lt;/li&gt;
&lt;li&gt;AWS CLI configured with credentials&lt;/li&gt;
&lt;li&gt;Miniconda installed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Local Setup &amp;amp; Testing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a virtual environment.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n agentcore python=3.10
conda activate agentcore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install dependencies.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd agentcore-deployment
pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Store OpenAI API key in AWS Secrets Manager.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Go to AWS Console → Secrets Manager → Store a new secret&lt;/li&gt;
&lt;li&gt;Select Other type of secret&lt;/li&gt;
&lt;li&gt;Key name: api-key, Value: your OpenAI API key&lt;/li&gt;
&lt;li&gt;Name the secret: my-api-key&lt;/li&gt;
&lt;/ol&gt;
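
&lt;p&gt;At runtime the agent reads this secret back. Here is a minimal sketch using boto3's Secrets Manager client; the secret and key names match the steps above, while the region is an assumption for illustration. The parsing helper is pure, and boto3 is imported lazily so the live call only happens when AWS credentials are available.&lt;/p&gt;

```python
import json

def parse_secret(secret_string):
    """Extract the api-key field from the SecretString JSON payload."""
    return json.loads(secret_string)["api-key"]

def get_openai_key(secret_name="my-api-key", region="us-east-1"):
    """Fetch the key at runtime (requires AWS credentials)."""
    import boto3  # deferred so the parsing helper stays dependency-free
    client = boto3.client("secretsmanager", region_name=region)
    resp = client.get_secret_value(SecretId=secret_name)
    return parse_secret(resp["SecretString"])

print(parse_secret('{"api-key": "sk-test"}'))
```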

&lt;ul&gt;
&lt;li&gt;Now we can test the code locally. Edit &lt;code&gt;agent.py&lt;/code&gt;: uncomment the print statement for a test query, and comment out &lt;code&gt;aws_app.run()&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#print(langgraph_bedrock({"messages": "Who won the Formula 1 Singapore 2025 race? Give a brief answer"}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Invoke the script&lt;br&gt;
&lt;code&gt;python agent.py&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s what happens when you execute a query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The LLM checks if it already has the answer.&lt;/li&gt;
&lt;li&gt;If not, it calls the DuckDuckGoSearchResults tool to fetch current information.&lt;/li&gt;
&lt;li&gt;The agent may loop through the search until it has sufficient data.&lt;/li&gt;
&lt;li&gt;Once complete, the LLM generates the final response.&lt;/li&gt;
&lt;/ul&gt;
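
&lt;p&gt;This loop can be sketched in plain Python. This is not the LangGraph implementation from the repo; the model and the DuckDuckGo tool are replaced with stubs so the control flow is visible and runnable anywhere.&lt;/p&gt;

```python
# Minimal sketch of the decide/search/answer loop described above, with the
# LLM and the DuckDuckGo tool replaced by stubs. In the real agent, LangGraph
# wires the model node and tool node together in this same shape.

def stub_search(query):
    # Stand-in for DuckDuckGoSearchResults
    return f"search results for: {query}"

def stub_llm(question, context=None):
    # Stand-in for the model: requests a search once, then answers.
    if context is None:
        return {"action": "search", "query": question}
    return {"action": "final", "answer": f"Answer based on {context}"}

def run_agent(question, max_steps=3):
    context = None
    for _ in range(max_steps):
        step = stub_llm(question, context)
        if step["action"] == "final":
            return step["answer"]
        context = stub_search(step["query"])  # loop until enough data
    return "gave up"

print(run_agent("Who won the race?"))
```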

&lt;p&gt;&lt;strong&gt;Deploying to AWS Bedrock AgentCore&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon Bedrock AgentCore provides the following features:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Serverless runtime&lt;/li&gt;
&lt;li&gt;Automatic scaling&lt;/li&gt;
&lt;li&gt;Session management&lt;/li&gt;
&lt;li&gt;Security isolation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We'll use the AgentCore Starter Toolkit CLI to deploy our agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps to Deploy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update requirements.txt
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bedrock-agentcore&amp;lt;=0.1.5
bedrock-agentcore-starter-toolkit==0.1.14

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Import AgentCore runtime library
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from bedrock_agentcore.runtime import BedrockAgentCoreApp

aws_app = BedrockAgentCoreApp()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Add the entrypoint decorator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;@aws_app.entrypoint&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable the runtime server
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# print(langgraph_bedrock(...))  # Comment out local print
aws_app.run()  # Starts HTTP server on port 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure AgentCore CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;agentcore configure --entrypoint agent.py&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Accept default options (IAM roles, permissions, etc.)&lt;br&gt;
 This generates a Dockerfile and deployment YAML&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Launch the agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;agentcore launch&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Attach the IAM role permission to access AWS Secrets Manager (secret-policy.json)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify your deployed agent in the AWS console&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Invoke the agent&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;agentcore invoke '{"message": "Hello"}'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using AWS Bedrock AgentCore, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Take a local LLM agent to production&lt;/li&gt;
&lt;li&gt;Enable serverless scaling and runtime management&lt;/li&gt;
&lt;li&gt;Integrate external tools for real-time information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach keeps your AI agent lightweight locally while making it fully production-ready in the cloud.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Wed, 26 Nov 2025 09:15:13 +0000</pubDate>
      <link>https://dev.to/sampathkaran/-4i45</link>
      <guid>https://dev.to/sampathkaran/-4i45</guid>
      <description></description>
    </item>
    <item>
      <title>Plug &amp; Productionize Your AI Agents with AWS Bedrock AgentCore</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Wed, 26 Nov 2025 09:05:52 +0000</pubDate>
      <link>https://dev.to/sampathkaran/plug-productionize-your-ai-agents-with-aws-bedrock-agentcore-20ak</link>
      <guid>https://dev.to/sampathkaran/plug-productionize-your-ai-agents-with-aws-bedrock-agentcore-20ak</guid>
      <description></description>
      <category>awsco</category>
    </item>
    <item>
      <title>Prompt Engineering Techniques - AWS BedRock</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Mon, 30 Dec 2024 13:29:33 +0000</pubDate>
      <link>https://dev.to/sampathkaran/prompt-engineering-techniques-aws-bedrock-50om</link>
      <guid>https://dev.to/sampathkaran/prompt-engineering-techniques-aws-bedrock-50om</guid>
      <description>&lt;h2&gt;
  
  
  What is Prompt Engineering?
&lt;/h2&gt;

&lt;p&gt;In simple terms, Prompt Engineering is adding more context to the user request. The added context helps the LLM generate more appropriate responses.&lt;/p&gt;

&lt;p&gt;Let me illustrate with an example.&lt;/p&gt;

&lt;p&gt;Here I am using AWS Bedrock Claude 3 Haiku as an example.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, I add context that I am an experienced pilot and ask the LLM to explain the use of flaps in aircraft. As you can see, the LLM's response is suited to an experienced pilot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sbix69w6deymbtd6yiq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sbix69w6deymbtd6yiq.png" alt="Image description" width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next, I change the context to tell the LLM that I am a school student. The LLM adapts its response to suit a school student, as below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2x2p7drw5xjwn60nejrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2x2p7drw5xjwn60nejrf.png" alt="Image description" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero Shot Prompting
&lt;/h2&gt;

&lt;p&gt;Here we provide the prompt without any specific examples, so the responses generated by the LLM can be very diverse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Few Shot Prompting
&lt;/h2&gt;

&lt;p&gt;Here we provide multiple example requests and responses, so that the LLM understands the expected pattern and produces the expected results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chain of Thought Prompting (CoT)
&lt;/h2&gt;

&lt;p&gt;CoT prompting is used to break down a complex problem and produce the result step by step; essentially, it adds a thinking process to the LLM's output.&lt;/p&gt;
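
&lt;p&gt;To make few-shot prompting concrete, here is a small sketch that assembles example input/output pairs into a prompt and, optionally, sends it to Claude 3 Haiku on Bedrock via boto3. The model ID and region are assumptions based on what was current at the time of writing; the prompt builder itself is pure Python.&lt;/p&gt;

```python
import json

def few_shot_prompt(examples, query):
    """Build a few-shot prompt: example input/output pairs, then the query."""
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

def ask_claude(prompt, region="us-east-1"):
    """Send the prompt to Claude 3 Haiku on Bedrock (needs AWS credentials)."""
    import boto3  # deferred: only needed for the live call
    client = boto3.client("bedrock-runtime", region_name=region)
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 300,
        "messages": [{"role": "user", "content": prompt}],
    }
    resp = client.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=json.dumps(body),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]

print(few_shot_prompt([("cat", "animal"), ("rose", "plant")], "oak"))
```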

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS Bedrock Knowledge Base - An overview</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Mon, 23 Dec 2024 19:37:25 +0000</pubDate>
      <link>https://dev.to/sampathkaran/aws-bedrock-knowledge-base-an-overview-5bne</link>
      <guid>https://dev.to/sampathkaran/aws-bedrock-knowledge-base-an-overview-5bne</guid>
      <description>&lt;h1&gt;
  
  
  Knowledge Base
&lt;/h1&gt;

&lt;p&gt;As we witness the rapid advancement of AI in this era, AI also plays a pivotal role in every organization, assisting both customers and employees.&lt;/p&gt;

&lt;p&gt;For instance, suppose an employee of an organization asks a foundation model (FM) about the reporting hierarchy of a particular person. The FM may be unable to generate a result, because this is a domain-specific prompt scoped to that organization, and the FM may not have been trained on such internal data.&lt;/p&gt;

&lt;p&gt;To surface such domain-specific data we can make use of AWS Bedrock Knowledge Bases. A Knowledge Base is a fully managed service that helps integrate a company's proprietary information into generative-AI applications using the Retrieval Augmented Generation (RAG) technique.&lt;/p&gt;

&lt;h2&gt;
  
  
  Without Using a Knowledge Base
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmetsxiv5ai33l9fkb7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmetsxiv5ai33l9fkb7y.png" alt="Image description" width="664" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here the FM was unable to generate the output and asks for more context, as the question is specific to that organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  After Using a Knowledge Base
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhc0ti1y2mk6zhcjzu770.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhc0ti1y2mk6zhcjzu770.png" alt="Image description" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Below are the steps that happen with a Knowledge Base
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The user submits a query asking for the manager of John Doe.&lt;/li&gt;
&lt;li&gt;The query goes to the vector DB, say OpenSearch.&lt;/li&gt;
&lt;li&gt;The proprietary documents are uploaded to a data source such as S3.&lt;/li&gt;
&lt;li&gt;An embedding FM converts the raw data to vector embeddings that are stored in the DB.&lt;/li&gt;
&lt;li&gt;The vector DB is searched for content matching the query, which is used to augment the prompt before it is sent to the FM.&lt;/li&gt;
&lt;li&gt;The FM is then able to find the data and returns the manager of John Doe in its response.&lt;/li&gt;
&lt;/ol&gt;
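
&lt;p&gt;This flow maps to a single boto3 call, &lt;code&gt;retrieve_and_generate&lt;/code&gt;, on the bedrock-agent-runtime client. Here is a hedged sketch; the knowledge base ID, model ARN, and region are placeholders you would replace with your own, and the request-building helper is pure so it can be inspected without AWS access.&lt;/p&gt;

```python
# Sketch of querying a Bedrock Knowledge Base with RAG via boto3.
# KB ID, model ARN, and region below are placeholders.

def build_rag_request(query, kb_id, model_arn):
    """Pure helper: the request retrieve_and_generate expects."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

def ask_knowledge_base(query, kb_id, model_arn, region="us-east-1"):
    import boto3  # deferred: only needed for the live call
    client = boto3.client("bedrock-agent-runtime", region_name=region)
    resp = client.retrieve_and_generate(**build_rag_request(query, kb_id, model_arn))
    return resp["output"]["text"]

req = build_rag_request("Who is the manager of John Doe?", "KBID123", "MODEL_ARN")
print(req["input"]["text"])
```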

</description>
      <category>ai</category>
      <category>bedrock</category>
      <category>aws</category>
    </item>
    <item>
      <title>EKS Pod Identity AddOns</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Sat, 23 Dec 2023 17:47:39 +0000</pubDate>
      <link>https://dev.to/sampathkaran/eks-pod-identity-addons-26k0</link>
      <guid>https://dev.to/sampathkaran/eks-pod-identity-addons-26k0</guid>
      <description>&lt;p&gt;Recently a new EKS addons introduced an addon feature Pod Identities. Basically if the pod want to communicate with other AWS services it will happen through the IAM Roles for service account (IRSA) where the IAM role will be configured as service account and attached to pods and a switch happens between EKS and IAM. Now with Pod Identity addons we can provide granular permissions for the pods.&lt;/p&gt;

&lt;p&gt;You can install the add-on and verify that it has been added to the cluster:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aws eks --region ap-south-1 list-addons --cluster-name demo&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "addons": [
        "coredns",
        "eks-pod-identity-agent",
        "kube-proxy",
        "vpc-cni"
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can verify that the add-on is running as a DaemonSet in the cluster:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get daemonset -A&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
NAMESPACE     NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   aws-node                 2         2         2       2            2           &amp;lt;none&amp;gt;          51m
kube-system   eks-pod-identity-agent   2         2         2       2            2           &amp;lt;none&amp;gt;          48m
kube-system   kube-proxy               2         2         2       2            2           &amp;lt;none&amp;gt;          51m


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let us break it down and see how exactly it works; we will try to access an S3 bucket from a pod using Pod Identity.&lt;/p&gt;

&lt;p&gt;Step 1. Create a test S3 bucket named &lt;code&gt;test-884&lt;/code&gt;.&lt;br&gt;
Step 2. Create an IAM role &lt;code&gt;pod-identity-s3-demo&lt;/code&gt;, choosing the trusted entity EKS and the EKS Pod Identity use case.&lt;br&gt;
&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8fwfyfwcs6tn5hqdnzqc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8fwfyfwcs6tn5hqdnzqc.png" alt="Image description"&gt;&lt;/a&gt; &lt;br&gt;
Step 3. Click next and you will see a trust policy added to the role.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lqY9DdvY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t256uguecyap6gv7jf9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lqY9DdvY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t256uguecyap6gv7jf9n.png" alt="Image description" width="800" height="331"&gt;&lt;/a&gt; &lt;br&gt;
Step 4. Click next and create the role.&lt;br&gt;
Step 5. After creating the role, we can add an inline policy with the bucket name specified as below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "s3:GetObject",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::test-884/*",
      "Sid": "PodIdentity"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VepQq2ND--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7pz3ah18oi7poi2tw10w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VepQq2ND--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7pz3ah18oi7poi2tw10w.png" alt="Image description" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 6. Now associate the IAM role with the EKS pod using a Pod Identity association: navigate to the EKS cluster, open the Access tab, and create the association.&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E_UYfG7r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/axgxour344i7cs93ghyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E_UYfG7r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/axgxour344i7cs93ghyb.png" alt="Image description" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 7. You can specify the existing namespace and service account as below &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VEqqwj1J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3jk333nk21whr83xw8p6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VEqqwj1J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3jk333nk21whr83xw8p6.png" alt="Image description" width="800" height="704"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 8. Finally, create a pod with the service account; the pod gets temporary access to the S3 bucket.&lt;/p&gt;
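
&lt;p&gt;The console clicks in steps 6 and 7 can also be scripted. Here is a sketch using the EKS client's &lt;code&gt;create_pod_identity_association&lt;/code&gt; API; the cluster, namespace, service account, and role ARN values are placeholders drawn from this demo, and the parameter helper is pure so it runs without AWS access.&lt;/p&gt;

```python
# Sketch: creating a Pod Identity association with boto3 instead of the console.
# All identifier values are placeholders from this demo.

def association_params(cluster, namespace, service_account, role_arn):
    """Pure helper: the parameters create_pod_identity_association expects."""
    return {
        "clusterName": cluster,
        "namespace": namespace,
        "serviceAccount": service_account,
        "roleArn": role_arn,
    }

def associate(cluster, namespace, service_account, role_arn, region="ap-south-1"):
    import boto3  # deferred: only needed for the live call
    eks = boto3.client("eks", region_name=region)
    return eks.create_pod_identity_association(
        **association_params(cluster, namespace, service_account, role_arn)
    )

params = association_params("demo", "default", "s3-reader",
                            "arn:aws:iam::111122223333:role/pod-identity-s3-demo")
print(params["clusterName"])
```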

</description>
      <category>eks</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AWS EKS Upgrade Insights</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Fri, 22 Dec 2023 08:21:28 +0000</pubDate>
      <link>https://dev.to/sampathkaran/aws-eks-upgrade-insights-2o6i</link>
      <guid>https://dev.to/sampathkaran/aws-eks-upgrade-insights-2o6i</guid>
      <description>&lt;p&gt;I had posted a block previously on &lt;a href="https://dev.to/sampathkaran/upgrade-aws-elastic-kubernetes-service-eks-cluster-via-terraform-4jfe"&gt;How to upgrade EKS version via terraform&lt;/a&gt;. This time I am back with some recent updates on how effectively we could able to upgrade the cluster seamlessly with recommendations from feature known as AWS EKS Cluster Insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is EKS Upgrade Insights ?
&lt;/h2&gt;

&lt;p&gt;Whenever we plan an EKS upgrade, we should dig into the details to understand what workloads are hosted within the cluster and how they will be impacted, such as by API deprecations, and check the compatibility of external services integrated with the cluster against the new target version.&lt;/p&gt;

&lt;p&gt;EKS Upgrade Insights scans the cluster and tells us which APIs are being deprecated when we upgrade to the new version, with recommendations and remediation advice, thus reducing administration effort and enabling a seamless upgrade.&lt;/p&gt;

&lt;p&gt;You can see this tab in the console, as in the screenshot below.&lt;/p&gt;

&lt;p&gt;Here I am running an EKS cluster on the outdated version v1.23, and below are the cluster's recommendations on what is going to be deprecated in the next higher versions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XXFzjAzk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qwwv7d0babk28pgkzt74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XXFzjAzk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qwwv7d0babk28pgkzt74.png" alt="Image description" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note: These insights show recommendations only; the administrator has to take the necessary steps to update the deprecated APIs before the upgrade.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The insights refresh about once a day, so if you update a deprecated API, the cluster will still show that there are deprecated APIs until the next refresh. We have to wait up to a day to see the status change on the insights page.&lt;/p&gt;
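
&lt;p&gt;The same insights can also be pulled programmatically with the EKS &lt;code&gt;ListInsights&lt;/code&gt; API. A sketch follows; the response-shape assumption (each insight exposing a name and an insight status) follows the EKS API reference, the region is a placeholder, and the live call is deferred behind a lazy boto3 import.&lt;/p&gt;

```python
# Sketch: listing EKS Cluster Insights via boto3 instead of the console.

def summarize_insights(insights):
    """Pure helper: reduce the ListInsights payload to (name, status) pairs."""
    return [(i["name"], i["insightStatus"]["status"]) for i in insights]

def fetch_insights(cluster, region="ap-south-1"):
    import boto3  # deferred: only needed for the live call
    eks = boto3.client("eks", region_name=region)
    resp = eks.list_insights(clusterName=cluster)
    return summarize_insights(resp["insights"])

sample = [{"name": "Deprecated APIs removed in Kubernetes v1.24",
           "insightStatus": {"status": "WARNING"}}]
print(summarize_insights(sample))
```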

</description>
      <category>eks</category>
      <category>aws</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AWS FSx for ONTAP -Storage Provisioning</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Sat, 01 Jul 2023 07:17:40 +0000</pubDate>
      <link>https://dev.to/sampathkaran/aws-fsx-for-ontap-file-storage-provisioning-4de4</link>
      <guid>https://dev.to/sampathkaran/aws-fsx-for-ontap-file-storage-provisioning-4de4</guid>
      <description>&lt;h3&gt;
  
  
  Intro
&lt;/h3&gt;

&lt;p&gt;AWS FSx for NetApp ONTAP is a storage offering from AWS, born of a collaboration between AWS and NetApp. You can bring the features of a NetApp ONTAP cluster into AWS and use the AWS API or AWS CLI to manage the storage, including NetApp features such as SnapMirror, Snapshots, deduplication, and compression.&lt;/p&gt;

&lt;p&gt;Unlike a traditional ONTAP cluster, AWS FSx for ONTAP is fully managed: installation and maintenance are taken care of by AWS. It also has automated storage tiering to move cold data into low-cost storage. It supports multiprotocol access such as NFS, SMB/CIFS, and iSCSI, and can be accessed from AWS EC2 Linux and Windows instances and from container services like ECS and EKS.&lt;/p&gt;

&lt;p&gt;FSx ONTAP Resources:&lt;/p&gt;

&lt;p&gt;You provision the following three resources for your file storage:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filesystem&lt;/strong&gt;&lt;br&gt;
A file system is the primary FSx for ONTAP resource, similar to an on-premises NetApp ONTAP cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage Virtual Machine (SVM)&lt;/strong&gt;&lt;br&gt;
A storage virtual machine (SVM) is an isolated file server with its own administrative and data-access endpoints for administering and accessing data. A default SVM is created when you create a file system, and we can add more SVMs if needed. Clients and workstations access the data via the SVM endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Volumes&lt;/strong&gt;&lt;br&gt;
FSx for ONTAP volumes are virtual resources that you use for organizing and grouping your data. Volumes are logical containers, and data stored in them consumes physical capacity on your file system. Volumes are hosted on SVMs.&lt;/p&gt;

&lt;p&gt;By default, when you create a new file system from the AWS Management Console, Amazon FSx automatically creates a file system with a single storage virtual machine (SVM) and one volume. After your file system is created, you can create additional SVMs and volumes as needed.&lt;/p&gt;
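
&lt;p&gt;The same file-system creation can be scripted with boto3's FSx client. Here is a hedged sketch; the subnet IDs and capacity values are placeholders, and the parameter helper is pure so it runs without AWS access.&lt;/p&gt;

```python
# Sketch: provisioning an FSx for ONTAP file system with boto3.
# Subnet IDs and sizes below are placeholders.

def ontap_filesystem_params(subnet_ids, preferred_subnet, storage_gib=1024,
                            throughput_mbps=128):
    """Pure helper: parameters for fsx.create_file_system (ONTAP, Multi-AZ)."""
    return {
        "FileSystemType": "ONTAP",
        "StorageCapacity": storage_gib,  # minimum is 1024 GiB
        "SubnetIds": subnet_ids,
        "OntapConfiguration": {
            "DeploymentType": "MULTI_AZ_1",
            "ThroughputCapacity": throughput_mbps,
            "PreferredSubnetId": preferred_subnet,
        },
    }

def create_filesystem(params, region="us-east-1"):
    import boto3  # deferred: only needed for the live call
    fsx = boto3.client("fsx", region_name=region)
    return fsx.create_file_system(**params)

params = ontap_filesystem_params(["subnet-aaa", "subnet-bbb"], "subnet-aaa")
print(params["OntapConfiguration"]["DeploymentType"])
```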

&lt;h3&gt;
  
  
  Data Migration
&lt;/h3&gt;

&lt;p&gt;To migrate data from on-premises to AWS, we can set up SnapMirror replication between the on-premises volumes and the AWS FSx volumes and sync all the changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Create an FSx File System Using the Management Console
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Step 1
&lt;/h4&gt;

&lt;p&gt;Search for FSx and select Create file system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HUa1O7P6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t7yapwchbn2fmz30ja5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HUa1O7P6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t7yapwchbn2fmz30ja5l.png" alt="Image description" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 2
&lt;/h4&gt;

&lt;p&gt;Select Amazon FSx for NetApp ONTAP and click next &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---BR9Kcbq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uuh5lvjr2w9zbo9sa3uk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---BR9Kcbq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uuh5lvjr2w9zbo9sa3uk.png" alt="Image description" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3
&lt;/h4&gt;

&lt;p&gt;Select the Standard create option to customize the various configuration settings.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Give a name for your filesystem.&lt;/li&gt;
&lt;li&gt;Select the deployment type: Single-AZ or Multi-AZ.&lt;/li&gt;
&lt;li&gt;Select the flash SSD storage capacity; the minimum is 1024 GiB.&lt;/li&gt;
&lt;li&gt;Customize the provisioned IOPS per your application requirements by selecting the User-provisioned option.&lt;/li&gt;
&lt;li&gt;Throughput capacity is recommended based on your SSD capacity; we can customize that as well.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0U1U4cLx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tcvcqfsotezi8xw2ai6g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0U1U4cLx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tcvcqfsotezi8xw2ai6g.png" alt="Image description" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4 - Network Setup
&lt;/h4&gt;

&lt;p&gt;As we selected Multi-AZ in the previous step, it will launch a two-node ONTAP cluster: an active node in one AZ and a passive node in the other AZ.&lt;/p&gt;

&lt;p&gt;Select the VPC&lt;/p&gt;

&lt;p&gt;Attach the security group of the VPC.&lt;br&gt;
Two subnets from the VPC are selected; data syncs between the preferred-subnet cluster (active) and the standby cluster (passive). In the event of any failure of the preferred-subnet cluster, the standby cluster becomes active.&lt;br&gt;
We can select the routes as well as provide an endpoint IP address for the cluster. Here I am leaving it as default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UdUh8BaK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j6g54pszrffp12hagvq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UdUh8BaK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j6g54pszrffp12hagvq3.png" alt="Image description" width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 5 - Setup the Security and Encryption
&lt;/h4&gt;

&lt;p&gt;We have the option to set up the encryption key from KMS; I will leave that at the default for this demo.&lt;br&gt;
We also have to specify the admin password for the filesystem, used to access it from the ONTAP CLI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oVX7nDIi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oqzw0bhw3qmwur7y51im.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oVX7nDIi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oqzw0bhw3qmwur7y51im.png" alt="Image description" width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 6 - Storage Virtual Machine
&lt;/h4&gt;

&lt;p&gt;We need to create separate admin access for SVMs. You can create multiple SVMs for multi-tenancy requirements to share the same filesystem, and you can also set a password for the SVM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rFJtNEpa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vd95w7ixiy6bacbiu46o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rFJtNEpa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vd95w7ixiy6bacbiu46o.png" alt="Image description" width="794" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 7 - Create a volume
&lt;/h4&gt;

&lt;p&gt;The volumes created support thin provisioning, i.e. they consume only the storage we actually use. By enabling storage efficiency you can leverage ONTAP features such as compression, deduplication, and snapshots. You can also set the snapshot policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_MNH1IK3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nz129vz22gl3a2r0vko5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_MNH1IK3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nz129vz22gl3a2r0vko5.png" alt="Image description" width="784" height="657"&gt;&lt;/a&gt;&lt;/p&gt;
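&lt;p&gt;The same volume can also be created from the command line. A rough sketch using the AWS CLI (the SVM ID, volume name, junction path, and size below are placeholder values for this demo):&lt;/p&gt;

```shell
# Create a thin-provisioned ONTAP volume with storage efficiency enabled.
# svm-0123456789abcdef0 is a placeholder SVM ID - substitute your own.
aws fsx create-volume \
  --volume-type ONTAP \
  --name vol1 \
  --ontap-configuration StorageVirtualMachineId=svm-0123456789abcdef0,JunctionPath=/vol1,SizeInMegabytes=102400,StorageEfficiencyEnabled=true
```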

&lt;p&gt;Once you hit create, the filesystem is provisioned and you can mount it on the VM.&lt;/p&gt;
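&lt;p&gt;As a minimal sketch, mounting the volume over NFS from a Linux VM looks like the following (the SVM DNS name and junction path are placeholders; take the actual NFS endpoint from the SVM details page):&lt;/p&gt;

```shell
# Install the NFS client (Amazon Linux / RHEL family), create a mount
# point, and mount the volume via the SVM's NFS DNS name.
sudo yum install -y nfs-utils
sudo mkdir -p /mnt/fsx
sudo mount -t nfs svm-0123456789abcdef0.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com:/vol1 /mnt/fsx
df -h /mnt/fsx
```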

</description>
    </item>
    <item>
      <title>Upgrade AWS Elastic Kubernetes Service (EKS) Cluster Via Terraform</title>
      <dc:creator>Sampath Karan</dc:creator>
      <pubDate>Sun, 14 May 2023 03:18:14 +0000</pubDate>
      <link>https://dev.to/sampathkaran/upgrade-aws-elastic-kubernetes-service-eks-cluster-via-terraform-4jfe</link>
      <guid>https://dev.to/sampathkaran/upgrade-aws-elastic-kubernetes-service-eks-cluster-via-terraform-4jfe</guid>
      <description>&lt;p&gt;Kubernetes is the new normal when it comes to hosting your applications.&lt;/p&gt;

&lt;p&gt;AWS Elastic Kubernetes Service is a managed service where the control plane is deployed in a highly available configuration and is completely managed by AWS in the backend, leaving administrators/SREs/DevOps engineers to manage the data plane and the microservices running as pods.&lt;/p&gt;

&lt;p&gt;As of writing this post, the Kubernetes community follows a cadence of three releases per year. AWS, on the other hand, ships its own build of Kubernetes (the EKS version) with its own release cadence. You can find this information at &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note - An EKS upgrade is a stepped upgrade and can only move one minor version at a time, e.g. 1.21 to 1.22.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Managing AWS EKS via Terraform helps us maintain the desired state, and it also lets us perform the cluster upgrade seamlessly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-requisites in Terraform
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Verify that the EKS state file does not throw any errors before the upgrade.&lt;/li&gt;
&lt;li&gt;Ensure the state is stored in a remote backend such as Amazon S3.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Pre-requisites in EKS
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Ensure at least 5 free IP addresses in the VPC subnets of the EKS cluster (explained in the section below).&lt;/li&gt;
&lt;li&gt;Ensure the kubelet version on the nodes is the same as the control plane version.&lt;/li&gt;
&lt;li&gt;Verify the EKS add-on versions and upgrade them if necessary before starting the cluster upgrade.&lt;/li&gt;
&lt;li&gt;A Pod Disruption Budget (PDB) can sometimes cause errors while draining pods (it is recommended to disable it while upgrading).&lt;/li&gt;
&lt;li&gt;Use a Kubernetes API deprecation finder tool like Pluto to catch API changes in the newer version.&lt;/li&gt;
&lt;/ol&gt;
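&lt;p&gt;As a quick sketch of the last point, Pluto can scan a directory of manifests for APIs removed in the target version (the directory path and target version here are placeholders):&lt;/p&gt;

```shell
# Scan local manifests for APIs deprecated or removed as of Kubernetes 1.22.
pluto detect-files -d ./manifests --target-versions k8s=v1.22.0
```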

&lt;h3&gt;
  
  
  Upgrade Process
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me break down what happens when we perform the upgrade. It is a sequential process.&lt;/p&gt;

&lt;h5&gt;
  
  
  Control Plane upgrade
&lt;/h5&gt;

&lt;p&gt;The control plane upgrade is an in-place upgrade: AWS launches a new control plane with the target version within the same subnets as the existing control plane, which is why we need at least 5 free IPs in the EKS subnets to accommodate it. The new control plane goes through readiness and health checks, and once they pass it replaces the old control plane. This happens in the backend within AWS infrastructure, with no impact to the application.&lt;/p&gt;

&lt;h5&gt;
  
  
  Node upgrade
&lt;/h5&gt;

&lt;p&gt;The node upgrade is also an in-place upgrade: new nodes are launched with the target version, and the pods on the old nodes are evicted and relaunched on the new nodes.&lt;/p&gt;

&lt;h5&gt;
  
  
  Add-ons upgrade
&lt;/h5&gt;

&lt;p&gt;The add-ons on your cluster, such as CoreDNS, VPC CNI, and kube-proxy, need to be upgraded accordingly per the matrix in &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-add-on-update" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-add-on-update&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let us take the example of upgrading from 1.21 to 1.22.&lt;/p&gt;

&lt;h5&gt;
  
  
  Step-1:
&lt;/h5&gt;

&lt;p&gt;Ensure the control plane and the nodes are on the same version:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl version --short&lt;br&gt;
kubectl get nodes&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h5&gt;
  
  
  Step-2:
&lt;/h5&gt;

&lt;p&gt;Before updating your cluster, ensure that the proper Pod security policies are in place, to avoid potential security issues:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get psp eks.privileged&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h5&gt;
  
  
  Step-3:
&lt;/h5&gt;

&lt;p&gt;Update the cluster version in your Terraform file to the target version, say 1.22, then perform a TF plan and apply.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64lo98dhk2q43zqv1d4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64lo98dhk2q43zqv1d4i.png" alt=" " width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;
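&lt;p&gt;As a minimal sketch of that change (the resource, role, and variable names here are illustrative; match them to your own configuration), the version bump looks like this:&lt;/p&gt;

```hcl
# Bump the EKS control plane to the target version (one minor step at a time).
resource "aws_eks_cluster" "this" {
  name     = "demo-cluster"
  role_arn = aws_iam_role.cluster.arn
  version  = "1.22" # was "1.21"

  vpc_config {
    subnet_ids = var.subnet_ids
  }
}
```

&lt;p&gt;Then run &lt;code&gt;terraform plan&lt;/code&gt; to review the version diff and &lt;code&gt;terraform apply&lt;/code&gt; to trigger the control plane upgrade.&lt;/p&gt;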

&lt;h5&gt;
  
  
  Step-4:
&lt;/h5&gt;

&lt;p&gt;Once the control plane is upgraded, the &lt;strong&gt;managed worker nodes&lt;/strong&gt; upgrade process is invoked automatically. If you are using &lt;strong&gt;self-managed worker nodes&lt;/strong&gt;, choose the AMI for your control plane version and region from the matrix at &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html&lt;/a&gt;&lt;br&gt;
Update your worker nodes TF file with the new AMI ID and run TF plan and apply.&lt;/p&gt;
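&lt;p&gt;As a sketch, the recommended EKS-optimized AMI ID for a given Kubernetes version can also be pulled from the SSM public parameter described on that page (the version and region here are examples):&lt;/p&gt;

```shell
# Fetch the recommended Amazon Linux 2 EKS-optimized AMI ID for 1.22.
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.22/amazon-linux-2/recommended/image_id \
  --region us-east-1 \
  --query "Parameter.Value" \
  --output text
```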

&lt;h5&gt;
  
  
  Step-5:
&lt;/h5&gt;

&lt;p&gt;Once the control plane and worker node upgrades are completed, it is time to upgrade the add-ons. Check which add-ons are enabled in your cluster and upgrade each one via the console or eksctl, based on how you manage them.&lt;br&gt;
Each add-on has a compatibility matrix in the AWS documentation and has to be upgraded appropriately.&lt;br&gt;
sample ref : &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-add-on-update" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-add-on-update&lt;/a&gt;&lt;/p&gt;
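&lt;p&gt;For example, a managed add-on can be bumped with the AWS CLI (the cluster name and add-on version below are placeholders; pick the version from the compatibility matrix for your target release):&lt;/p&gt;

```shell
# List the add-ons installed on the cluster, then update the VPC CNI
# managed add-on to a version compatible with the 1.22 control plane.
aws eks list-addons --cluster-name demo-cluster
aws eks update-addon \
  --cluster-name demo-cluster \
  --addon-name vpc-cni \
  --addon-version v1.11.4-eksbuild.1 \
  --resolve-conflicts OVERWRITE
```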

&lt;h5&gt;
  
  
  Step-6:
&lt;/h5&gt;

&lt;p&gt;If you wish to upgrade from 1.22 to 1.23, repeat the steps above.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
