🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Put your data to work for Agentic AI with AWS storage (STG218)
In this video, AWS specialists Venkata Sistla and John Mallory explore building context-aware AI agents with persistent memory using AWS storage services. They explain the distinction between short-term and long-term memory types (episodic, semantic, conversational, and summary memory) and how data transforms into actionable memory through a five-stage process. The session demonstrates how Amazon S3, S3 Vectors, S3 Tables, and FSx integrate with Amazon Bedrock AgentCore to create scalable agent architectures. Real-world examples from BMW Group and Rocket Companies illustrate production implementations. Key recommendations include building Iceberg-based data lakes, using Model Context Protocol (MCP) for tool integration, and implementing proper metadata strategies for agent data discovery.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
The Critical Gap: Why Your Petabytes of Data Can't Power AI Agents
Here's the paradox we're facing in 2025. We're building AI agents that can reason, act, and plan autonomously, agents that should be able to achieve your business outcomes, optimize business operations, and make real-time decisions. Yet most organizations are sitting on petabytes, sometimes exabytes, of data that their agents simply can't use effectively. Your customer service logs live in S3, your product documentation sits in some sort of distributed file system, and years of transaction history are archived in S3. That data has been perfectly stored, governed, and backed up, but there's a critical gap between where the data sits and how your agent needs to access it.
Here's the irony: the data is already there. You've invested in storing it, backing it up, and protecting it. But when the agent needs to recall a customer interaction from years ago, find a pattern across millions of transaction records, or tap into institutional knowledge to make a decision, it hits a wall. Agents don't just need data; they need memory, they need context, and they need to be able to learn, collaborate, and adapt in real time.
The breakthrough everyone's chasing isn't about building smarter models. We're already getting pretty good at that. The real breakthrough comes when your existing data becomes agent-ready: turning passive storage into active memory, transforming archived information into accessible intelligence that your agents can actually work with. That's exactly what we're here to discuss today, because the most powerful agent is only as good as the data architecture behind it. So let's talk about putting your data to work, not just for analytics or compliance, but for the AI agents that are shaping the next generation of applications and enterprises.
Introduction and the Coming Wave of Agentic AI in Enterprise Applications
I'm Venkata Sistla, a Senior Worldwide Specialist Solutions Architect here at AWS. From today's agenda perspective, we'll cover several topics. First, we'll start off with covering a few fundamentals in terms of what makes an agent truly agentic versus just another simple chatbot. Then we'll dive deep into what it takes to build context-aware agents. We'll also explore building scalable agents using managed versus self-managed approaches. Then we'll dive deeper into how AWS storage becomes the external brain for your AI agents. Finally, we'll leave you with a few architectural examples and references and resources that you can take away after the session so you can learn and dive deeper.
These numbers should grab your attention. We're looking at a massive transformation that's happening right now. Gartner predicts that by 2028, one in three enterprise applications will have agentic AI, up from less than one percent in 2024. We're also talking about a $120 billion market by 2030, which tells you that this isn't just hype and enterprises are making serious investments.
I'm also seeing this firsthand from my customers where they're transitioning from building simple Q&A chatbots to really complex multi-agent systems as part of their enterprises to solve complex problems. The question isn't whether this will happen, but whether you'll be ready when it happens. Some of the customers that I've worked with include Ericsson, Thomson Reuters, and Expedia, who are actually going really deep in terms of deploying agentic architectures as part of their enterprise applications.
What Makes AI Truly Agentic: Beyond Simple Chatbots to Autonomous Systems
Agentic AI promises to enhance productivity and efficiency, taking on problems that were difficult to solve with traditional software, simplifying integration, and surfacing answers and data that were previously invisible. Agentic AI systems autonomously decide how to accomplish a task, taking prompts in the form of natural language and adapting their plan as they learn new information. But there is still a human in the mix. A human sets the goal in natural language and exercises supervisory control.
What makes it truly special is their ability to learn and improve over time. Every interaction, every new piece of information that you give to the agent, it essentially stores it in memory and then adapts for its future conversations. This diagram illustrates what transforms a simple LLM into an autonomous agent capable of solving complex problems. The key difference here is autonomy. These systems don't just respond to prompts, but they're actively working towards accomplishing a specific objective or goal set by the human.
Going through the components, the first is the LLM. The LLM is responsible for the reasoning capabilities: understanding the intent and making decisions about next steps. This is critical.
Second, you have tools. Tools give the agent the ability to interact with internal and external environments, whether that's extracting data, executing functions, or interacting with systems. Third, memory. Memory ensures continuity, so an agent doesn't need to start from scratch every time; it builds on or resumes from where it left off. Fourth is context awareness. Context awareness allows the agent to understand its environment so that it can adapt its behavior based on who it's interacting with. Finally, prompt engineering. Prompt engineering defines the agent's role, capabilities, and the constraints it needs to operate within.
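To make these components concrete, here is a minimal sketch in Python of how they might fit together, using the Bedrock Converse API for the reasoning step. The tool function, model ID, and memory structure are illustrative assumptions, not code from the session, and the tool is shown in the registry but not wired into the Converse toolConfig, to keep the sketch short.

```python
# Minimal agent sketch: an LLM for reasoning, a tool registry, short-term
# memory for continuity, and a system prompt for role and constraints.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def lookup_order(order_id: str) -> str:
    """Stand-in tool; in practice this would call an internal API."""
    return json.dumps({"order_id": order_id, "status": "shipped"})

TOOLS = {"lookup_order": lookup_order}   # tool registry (not wired in below)
memory: list[dict] = []                  # short-term memory: prior turns this session

def run_agent(goal: str, model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> str:
    # Prompt engineering and context awareness: the system prompt defines the
    # agent's role; memory supplies conversational continuity.
    system = [{"text": "You are a customer-support agent. Be concise and helpful."}]
    messages = memory + [{"role": "user", "content": [{"text": goal}]}]
    response = bedrock.converse(modelId=model_id, system=system, messages=messages)
    answer = response["output"]["message"]["content"][0]["text"]
    # Persist the turn so the next invocation can build on it.
    memory.append({"role": "user", "content": [{"text": goal}]})
    memory.append({"role": "assistant", "content": [{"text": answer}]})
    return answer
```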
What makes this powerful is the shift in how we interact with AI. Instead of the step-by-step instructions we've been used to over the last couple of years, you now provide the overall goal in natural language, and the agent autonomously plans the required steps to achieve that goal and keeps delivering. How to build these systems is the million-dollar question that every organization is grappling with right now. Context-aware agents don't just need a good LLM; that's just table stakes. You need persistent memory, real-time data access, and the ability to learn and adapt over time. Most importantly, you need all of this at enterprise scale, with enterprise-grade security and governance.
Agent Memory as Computational Exocortex: Why Agents Without Memory Are Like Goldfish
Let's dive into what this actually looks like in practice. Agent memory is a computational exocortex for AI agents. This means it's a system that combines an LLM's built-in memory with persistent storage so that it's able to remember, retrieve, and adapt over time with past experiences and new information that it's trying to learn. Just like human memory, it helps agents build knowledge over time, maintain context across conversations, and learn behavior adaptations based on interactions. This transforms agents from one-off responders to reliable and truly intelligent systems.
Our customer conversations show that one-off interactions are good at capturing the initial impression, but the real value comes from agents having the ability to remember context, learn from history, and adapt the overall experience. Memory management isn't just a feature; it's the core infrastructure that turns reactive agents into truly intelligent systems that deliver sustained value for your enterprises. Let me show you a real-world example. This perfectly illustrates why memory matters. The difference in user experience is dramatic, and it translates directly to business value. Pay attention to how this particular conversation flow changes completely when the agent is able to remember the past conversation and build upon it.
Without agent memory, there are several critical limitations that we will end up hitting. First, inability to maintain conversation continuity. The agent cannot build upon or reference previous dialogue. Second, no behavioral adaptation. The agent cannot learn from user feedback or adjust approaches based on your preferences. Third, lack of personal objectives. Without personal objectives, the agent cannot sustain the overall session or achieve the goal set by you. Fourth, missing personalization. The agent cannot develop user-specific preferences, and I have some examples in future slides that will demonstrate why personalization is a very important trait for AI agents.
There's research done by Microsoft and Salesforce in a study called "LLMs Get Lost in Multi-Turn Conversations." The study found that most LLMs experience significant performance drops in extended conversations, primarily because they make premature assumptions early on and fail to recover when those assumptions are proven wrong. This goes back to the conversation continuity limitation we just discussed. Here's another frustrating example of agents without memory. Every conversation starts from scratch. Notice how the agent asked for the iPhone model twice, just two days apart. This creates a terrible user experience and makes the agent feel robotic and unhelpful.
I love this quote which says an AI agent without memory is like a goldfish—everything's new every three seconds.
When we implement proper memory, we unlock three key capabilities that transform the user experience. First, contextual intelligence, meaning the agent doesn't just understand what you're asking, but why you're asking it. Second, user preferences, which truly personalize interactions: the agent remembers and adapts to how you work and communicate. Third is knowledge retention. With continued interactions, the agent builds its own knowledge base about the world, the facts, the things, and the participants, and it gets smarter with every interaction. These aren't nice-to-have features; they're essential for enterprise-level adoption.
From RAG to Agentic Memory: Extending Context Beyond Finite Windows
Now let's dive a little deeper into agentic memory. Understanding these core concepts will help you design AI systems to work at scale. Think of it as building neural pathways for your AI. If you get this right, everything else from an AI agentic deployment becomes much easier. Agentic memory is really critical, especially when you're building personalized AI systems, because it enables adaptive learning through each interaction. It allows agents to understand individual preferences, communication styles, and behavioral patterns. Without access to persistent memory, even the most sophisticated agents or chatbots cannot really provide the personalized user experience that we are demanding in this day and age of AI applications.
This is a personal reflection looking back at the last couple of years. GPT models gave us broad general knowledge to begin with. We quickly introduced RAG to ground those models in our proprietary data. But as RAG architectures scaled, we hit the limitation of finite context windows, which brought us the conversation continuity problem we uncovered earlier and limited personalization in how we interact with AI. Agentic memory extends RAG by providing persistent memory across sessions. It enables agents to build context over multiple interactions and deliver genuinely relevant, personalized experiences to users. The RAG concept doesn't go away; it is extended into agentic memory, which gives us multiple benefits.
Data retrieval becomes fundamental to agentic memory architecture. Effective systems must intelligently surface the relevant context from vast stores of user history and past interaction data. This requires sophisticated algorithms to precisely identify which elements of past conversations would genuinely supplement the current context, enabling adaptive learning that compounds over time. Advanced systems require higher-order forms of information retrieval, organization, and retention that mirror human cognitive processes, which we now define as agentic memory.
There are two fundamental types of memory that every agent needs. Short-term memory, also called raw memory, is like RAM in a computer: mostly temporary and session-based. Long-term memory, also called intelligent memory, is like a hard drive: persistent and supporting evolutionary learning. This dual memory architecture enables both immediate responsiveness and sustained improvement over time. Long-term memory is also crucial for enabling AI self-evolution, where agents automatically learn, adapt, and refine their reasoning based on accumulated examples and interaction data. By incorporating long-term memory, AI agents become adaptive teammates, specializing their skills and knowledge over time as subject matter experts, just like humans do.
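As a rough illustration of that split, here is a small Python sketch with a bounded, session-scoped buffer standing in for the RAM-like short-term memory and a simple dictionary standing in for the durable long-term store; in a real system the long-term side would be backed by something like S3 Vectors or another persistent store.

```python
# Dual-memory sketch: a bounded short-term buffer plus a durable long-term store.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class DualMemory:
    short_term: deque = field(default_factory=lambda: deque(maxlen=20))  # recent turns only
    long_term: dict = field(default_factory=dict)                        # facts and preferences

    def remember_turn(self, role: str, text: str) -> None:
        self.short_term.append({"role": role, "text": text})

    def learn_fact(self, key: str, value: str) -> None:
        # Long-term memory survives across sessions and compounds over time.
        self.long_term[key] = value

    def context_for_prompt(self) -> str:
        facts = "; ".join(f"{k}: {v}" for k, v in self.long_term.items())
        turns = "\n".join(f"{t['role']}: {t['text']}" for t in self.short_term)
        return f"Known facts: {facts}\nRecent conversation:\n{turns}"
```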
Short-Term Memory Architecture: Working Memory, Episodic Memory, and Conversational Context
Let's dive a little deeper into the short-term memory concept. Short-term memory is all about maintaining the conversation flow and immediate context. In this particular example, see how the agent is able to remember the iPhone model from the earlier conversation and provide really specific help to the user. This requires storing and retrieving conversation history in real time. The agent needs to quickly access recent messages and understand the current context and maintain that state across multiple interactions. From a storage perspective, this means we require really fast and low-latency access to the data.
Think of this as an agent's working memory. It needs to be immediately accessible, but also doesn't necessarily need to persist forever. Short-term memory units last anywhere from seconds to days, depending on the application needs.
Two terms come up when we talk about short-term memory: working memory and short-term memory. They are often used interchangeably, but there is a clear distinction. Working memory is a special type of short-term memory used specifically for actively processing information for the task at hand. Short-term memory is the broader temporary storage for the overall session.
All working memory is short-term memory, but not all short-term memory is working memory. Working memory is the doing part, while short-term memory is the holding part for that session.
Episodic memory is the agent's record of specific events and interactions, just like a human's personal memory of their own life experiences. It stores conversation history, summaries of important events, and individual occurrences with attached metadata such as timestamps and participants. Conversation memory is a specific type of episodic memory focused on chat history and user preferences.
It keeps a complete record of conversations, who said what and when, and helps agents stay consistent throughout interactions. The agent can refer back to earlier parts of the conversation and provide contextualized responses, and the system continuously updates its memory blocks as the conversation progresses. In short, episodic memory is the "what happened" storage, and conversational memory is the "what we talked about" storage.
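As a sketch of what an episodic record might look like in practice, the snippet below writes conversation events to DynamoDB with the kind of metadata described above (timestamps and participants) and reads a session back. The table name and key schema are assumptions for illustration.

```python
# Episodic / conversational memory sketch backed by DynamoDB.
import time
import uuid
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("agent-episodic-memory")  # assumed table

def record_episode(session_id: str, summary: str, participants: list[str]) -> None:
    table.put_item(Item={
        "session_id": session_id,        # partition key (assumed)
        "event_id": str(uuid.uuid4()),   # sort key (assumed)
        "timestamp": int(time.time()),
        "participants": participants,
        "summary": summary,
    })

def recall_session(session_id: str) -> list[dict]:
    # "What we talked about": every event recorded for this session.
    resp = table.query(KeyConditionExpression=Key("session_id").eq(session_id))
    return resp["Items"]
```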
Long-Term Memory Systems: Semantic Knowledge, Entity Recognition, and Summary Distillation
Long-term memory is where things get really interesting, and this is about learning and personalization over time. Notice how the agent is able to remember specific preferences from a few days ago. It is able to remember the brand, the employee discount, and also the color of the headphones. This isn't just storing data; it's about extracting and organizing those insights that can be applied to future interactions.
The agent builds up a profile of user preferences that gets richer and more accurate over time with every interaction. From a technical standpoint, this requires a very sophisticated storage and retrieval system that can quickly find relevant preferences based on the given context. This is where vector databases and semantic search become really crucial for overall agentic performance.
Next, we have semantic memory. Semantic memory is an agent's organized knowledge base—everything the agent knows about the world, including facts, concepts, and how things relate to each other. This includes knowledge bases, which are essentially collections of factual information. Entity memory provides specific details about people, facts, and things. Persona memory is role-based knowledge that guides the agent in terms of how it should behave with every interaction.
Semantic memory is essentially an agent's structured world knowledge that enables consistent reasoning. The most common real-world example is RAG, which takes the factual documents or proprietary data we supply and provides contextualized responses grounded only in that information. In simple terms, semantic memory is an agent's encyclopedia of facts and concepts that it can reference to answer questions and make decisions, with memories kept separate for each user profile.
In this particular example, the agent has learned specific business values or business rules. It has learned the return policy, the employee discount, and the overall product specification.
Semantic memory is particularly powerful in the context of enterprises, especially when you have company policies, factual documentation, or various product catalogs. When you have different dimensions of information and want an agent to find semantic meaning or context across different dimensions, semantic memory is where it comes into play. Rather than just storing the raw conversation data, the agent essentially builds up a knowledge base, as you're seeing on the right-hand side as an example, full of facts and relationships. This is where the integration between your agent's memory and your existing data sources within your organization becomes really crucial.
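As a small illustration of semantic retrieval, the sketch below embeds a handful of policy facts with a Bedrock embedding model and returns the closest match for a query. The Titan model ID is an assumed choice, the example facts simply echo the return-policy and employee-discount rules mentioned above with made-up wording, and the in-memory list is a stand-in for a real vector store such as S3 Vectors, covered later in the session.

```python
# Semantic memory sketch: embed facts once, embed the query, return the closest facts.
import json
import math
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",        # assumed embedding model
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

facts = [
    "Returns are accepted within 30 days with a receipt.",      # illustrative policy
    "Employees receive a discount on accessories.",             # illustrative policy
]
fact_index = [(f, embed(f)) for f in facts]

def recall_semantic(query: str, top_k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(fact_index, key=lambda fe: cosine(q, fe[1]), reverse=True)
    return [f for f, _ in ranked[:top_k]]
```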
Third is summary memory, which is essentially about distilling key insights. Summary memory is another type of episodic memory that condenses longer interactions down to their key insights. In practice, we could store every single interaction, but as agentic workloads scale, retrieving every interaction from storage and extracting insights from it each time becomes laborious and performance intensive. Instead, the agent stores a summary of a very long conversation that captures the key messages or the storyline of what that interaction was about.
In this particular example, you can see it covers the user buying headphones, finding a cheaper price elsewhere, and getting a price match as a result. This is also where you start to see the value of agents being able to learn and adapt through summarization over time.
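A minimal sketch of that distillation step might look like the following: ask the model for a short summary of the transcript and persist only the summary, rather than every raw turn. The model ID, bucket, and key layout are illustrative assumptions.

```python
# Summary memory sketch: distill a long conversation and store only the summary.
import boto3

bedrock = boto3.client("bedrock-runtime")
s3 = boto3.client("s3")

def summarize_and_store(transcript: str, session_id: str,
                        bucket: str = "agent-summary-memory") -> str:
    prompt = ("Summarize this support conversation in three bullet points, keeping "
              "decisions, preferences, and outcomes:\n\n" + transcript)
    resp = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",   # assumed model choice
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    summary = resp["output"]["message"]["content"][0]["text"]
    # Store the distilled summary instead of every raw turn.
    s3.put_object(Bucket=bucket, Key=f"summaries/{session_id}.txt",
                  Body=summary.encode("utf-8"))
    return summary
```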
Building and Deploying Agents: From Open Source Frameworks to Amazon Bedrock AgentCore
I'm John Mallory, a go-to-market specialist here at AWS, and I want to switch gears now. Venkata just walked us through the importance of both short-term and persistent memory for building agents. What I'd like to do is double-click into some common approaches of how you do this and then layer in how storage supports all of that.
Let's get started with how you're going to build, deploy, and host agents. A very common design pattern is that a lot of AI builders want to start off using open source frameworks like Strands Agents, LangChain, LlamaIndex, and AutoGen. There's a whole host of them out there, and they're really good because they accelerate experimentation, building, and learning. They have a lot of packaged tools built in and can really simplify things, particularly by using interfaces and protocols like MCP and agent-to-agent to stitch all the components together.
However, the challenge lies in how you start to move this into production. Then you have to start to worry about scaling infrastructure, managing security, managing the various types of memory that Venkata talked about, and gluing all these pieces together. As you want to scale to hundreds of thousands of users with very complex orchestrated agentic workflows while making sure you've got the right guardrails and safety and security around it, it can quickly get very complicated.
This is evidenced by a Gartner study predicting that over 40 percent of agentic AI projects will be canceled by 2027 due to unclear business value, increasing cost, and questionable security and governance policies that won't meet enterprise requirements. That being said, we do see a lot of sophisticated customers using self-managed frameworks. There are a couple of other approaches. Down on the left, if you're just starting your journey and want to leverage the power of agents, we have Amazon Q in our Quick Suite, which has agents packaged in for common enterprise workloads. You can leverage the power of agentic AI without needing to build anything.
The next stop is the fully managed approach using Amazon Bedrock Agents, which really starts to stitch all these pieces together. It handles key parts of the infrastructure like hosting the LLMs, building knowledge bases, and orchestrating multi-step agent workflows, so that gives you a lot of flexibility and can help you get started quickly.
The key thing is you have to choose between a do-it-yourself approach using these open source frameworks and some of the features and capabilities of Bedrock. What we released back in July, and which went GA in October, is Amazon Bedrock AgentCore. It removes a lot of the undifferentiated heavy lifting of building agents and lets you take a mix-and-match approach. You can use the open source framework of your choice and your choice of models, both inside and outside of Bedrock, and then use the AgentCore components that make it a managed experience.
Take Runtime: it handles compute so you don't have to worry about provisioning or scaling compute up and down; it does that for you. AgentCore Memory reduces a lot of the work you have to do to build both the short-term and long-term memory that was discussed; it automates much of that. Identity helps with security and permissions, and Gateway helps orchestrate between all the tools you're going to use. This is a great way to get started, but even if you take this approach and use AgentCore Memory, you still need to think about storage when it comes to building agents.
Three Critical Challenges: Data Accessibility, Security Governance, and Semantic Search at Scale
Before we dive into the various components, let's talk about a few of the challenges you need to be mindful of as you start this journey or as you evolve in this journey. The first is you've got to make all of your data accessible, discoverable, and actionable. Agents that create business value are going to need to access all forms of your data that you have in your organization, company, or enterprise. This is everything from structured data that may live in data warehouses and databases—transactional data that drives all of your business transactions, billing, and key use cases that power the business daily—all the way to unstructured data of all forms depending on what industry you're in.
This includes things like images, video, call record logs, and if you're in specific industries like healthcare, medical records and pathology reports. Agents really need to be able to understand and choose from all of these data sets. The second key challenge is privacy, security, and governance. You want the agents to access as much data as they need to be effective and add value, but you need to put guardrails on that because you cannot build an agent that exposes sensitive patient data to the world. You've got to have proper governance and guardrails.
As discussed, agents that are built well and really adding value are going to act autonomously, without human intervention, to achieve complex goals. So you need to monitor this, both to make sure you don't have any data leakage or governance breaches and to collect data on how to improve accuracy and iterate as your agents learn and evolve. You need to think about governance. Finally, semantic search and storage are key, both for grounding agents with data via RAG and for long-term memory. This isn't just theoretical.
I had a customer meeting this morning with a large enterprise software customer who has built some agents and seen some business value. Now they really want to scale it, but they're thinking about how to build the data foundation to do this. Essentially, they want an agentic layer that they can bring the agents to the data because if you start to bring data to the agents, now you're back with data sprawl and trying to maintain data fidelity across multiple copies. This really brought all these challenges to life. They've got multiple data warehouses—Snowflake, Databricks, Redshift, Athena—all these different pieces, and they're trying to figure out how to stitch this all together into a common data foundation before they can even start their journey.
AWS Storage as the External Brain: S3, FSx, DynamoDB, and the Apache Iceberg Foundation
So let's switch gears now and talk about some of our storage services and how they can help with all this. This is a map I like to use to start to layer in how storage interacts.
Even if you're going to use Bedrock AgentCore Memory, which manages short-term and long-term memory, you still need to think about that data foundation. This brings us back to the enterprise software provider challenge. You want to start with a data lake, ideally built on S3 if you're building on AWS, and then use services like S3 Tables. This will aggregate your data, allow you to apply common security and governance, and make the data cataloged, discoverable, and usable by the agents.
We use S3 because of its durability and scalability, and because both the AWS first-party services that integrate with it and the many third-party providers that build on it treat it as a common data hub. Building on that, as Venkata discussed, short-term memory is the temporary storage or working memory for information the agents are processing. It's typically stored in raw data formats. If you have multiple agents coordinating with each other, they need to exchange information, so they need a shared storage layer.
Think of a research agent workflow where you probably have one agent discovering data, maybe searching through academic papers, another verifying citations and the rights to use that data, and another summarizing. This is all iterative, so they have to communicate. Given this, we typically recommend starting with our FSx family of file services. There's a whole family to choose from: they're low latency and scalable, they give you a lot of options to save cost, and they're easy to manage for scratch space and shared memory access.
S3 plays a part here because ultimately you're going to want to snapshot that state so that if things crash, you can quickly recover. You're also going to want to stage and consolidate all that short-term memory and run it through a processing pipeline to create the longer-term memory, the semantic memory that Venkata talked about. You can use S3 for that. Another component of short-term memory is state: many of these agents can be highly transactional, so people may want a high-performance key-value store like DynamoDB to capture state.
Another layer you want to build at that level is a semantic cache, which might be something like ElastiCache, which recently introduced vector search. Even though those caching layers can look expensive on a dollar basis, if you put a cache in front of your LLM, your RAG, and your prompting and reduce the back-and-forth with the LLM by caching responses, you quickly drive down cost and end up with a much more cost-effective architecture.
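The caching idea is simple enough to sketch: check whether a sufficiently similar prompt has already been answered and, if so, skip the model call. A production semantic cache would compare embeddings, for example with ElastiCache and its vector search support; the string-similarity stand-in below just keeps the sketch dependency-free.

```python
# Cache-before-LLM sketch: reuse answers for near-duplicate prompts.
from difflib import SequenceMatcher

cache: list[tuple[str, str]] = []   # (prompt, answer) pairs

def cached_answer(prompt: str, threshold: float = 0.9) -> str | None:
    for past_prompt, answer in cache:
        if SequenceMatcher(None, prompt, past_prompt).ratio() >= threshold:
            return answer           # cache hit: no model call, no token cost
    return None

def answer_with_cache(prompt: str, call_llm) -> str:
    hit = cached_answer(prompt)
    if hit is not None:
        return hit
    answer = call_llm(prompt)       # only pay for the model when we must
    cache.append((prompt, answer))
    return answer
```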
Finally, as Venkata also indicated, you need to think about semantic memory and RAG to ground the LLMs and get the most accurate results. We recommend S3 Vectors, which went GA today in Matt Garman's keynote. It's a very cost-effective, highly durable, and very scalable approach for that. What do you want to consider when you're building that data lake foundation? I talked about this enterprise customer who has all these different platforms that they're trying to integrate, each with its own formats, catalogs, and security and governance.
The industry has built an open-source solution to start to address these challenges called Apache Iceberg. If you're building a new data lake or even if you have an existing one, you probably want to think about how to adopt Apache Iceberg as your data lake foundation. A couple of key advantages of it are that it brings a lot of the data warehouse capabilities like ACID transactions, rollback, and a number of other transactional integrity features.
Secondly, it has Iceberg REST catalog endpoints, which make it easy to integrate the data catalogs from different provider solutions with each other. Now you can start to get better interoperability between multiple data engines, both AWS-native ones and a lot of third-party and even open source ones. Definitely consider building on Iceberg as your data foundation.
To help with that, we launched S3 Tables, a native managed Iceberg table within S3. We launched it a year ago here at re:Invent and have continued to iterate on it, to the point where today we announced intelligent tiering for it, so you can really optimize cost for Iceberg, along with native Iceberg replication across regions. If you're an enterprise, you can start to build highly resilient, highly available Iceberg infrastructure.
Model Context Protocol and Data Discovery: The USB Port for Agents
I want to divert a minute and talk about Model Context Protocol because that's really the key to stitching all these components together. It's an open source standard that allows AI agents via MCP clients to communicate with external tools, data, and services in a structured way. It allows agentic AI vendors and builders to have access to potentially thousands of tools that your agents can call upon without having to learn all those individual interfaces. You just need to know how to speak MCP, pick the right MCP servers from a catalog of tools, and now you can start to build. If you have data over all these different devices, you can quickly stitch it all together. Your agents can start to understand how to access data without knowing how to speak individually to each one of these devices. It's like a USB port for agents.
We've started to adopt that in the storage family as well. We released an MCP server for S3 Tables, so agents that want to use tabular data stored in Iceberg format can understand and access data in S3 Tables. One tip when you're thinking about MCP, particularly if you're just starting out: only use it with read-only capability and scope it down to the least needed access privileges via IAM. MCP servers like the one for S3 Tables support all of that through IAM, but you should be deliberate about not granting write access unless you're absolutely sure you have a specific use case that needs it.
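To make the least-privilege point concrete, here is a sketch of a read-only policy that such an MCP server's execution role might use, created via boto3. The wildcarded action names and the ARN format are assumptions to verify against the current S3 Tables IAM documentation, and the account ID and bucket name are placeholders.

```python
# Least-privilege sketch: read-only access to one S3 table bucket, no writes.
import json
import boto3

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "S3TablesReadOnly",
        "Effect": "Allow",
        "Action": ["s3tables:Get*", "s3tables:List*"],   # deliberately no Put/Delete/Update
        "Resource": [
            "arn:aws:s3tables:us-east-1:111122223333:bucket/analytics-tables",
            "arn:aws:s3tables:us-east-1:111122223333:bucket/analytics-tables/*",
        ],
    }],
}

boto3.client("iam").create_policy(
    PolicyName="mcp-s3tables-read-only",
    PolicyDocument=json.dumps(read_only_policy),
)
```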
Another key is data discovery. You're going to build this data lake and have all your data aggregated. How are agents going to discover this? Structured data catalogs have been the foundation of analytics and data lakes for decades, but a lot of the data that AI agents are going to use is unstructured data of many different types, and traditional analytic catalogs don't really deal with that well. You really need to think about a metadata strategy and have that metadata be discoverable and consumable by agents so that they can self-discover and self-describe and understand the data that they're drawing upon. Metadata is really what makes it actionable.
To help with that, we released S3 Metadata, which is a built-in metadata service in S3. It takes system-collected metadata, lets you augment it with user-generated metadata, and makes it queryable through an Iceberg table. You can also use the MCP server for S3 Tables to let agents discover that metadata and the data itself. The way it works is that you turn it on for a bucket, and S3 populates a metadata table with an inventory of all the objects as well as the lineage and mutations of those objects, so you can also start to use it for governance.
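Once the metadata table exists, it can be queried like any other Iceberg table, for example through Athena. The sketch below shows one way that might look; the database, table, and column names are placeholders based on my reading of the S3 Metadata schema and should be checked against your own metadata configuration, and the results location must be a bucket you own.

```python
# Sketch: query S3 Metadata (exposed as an Iceberg table) through Athena.
import boto3

athena = boto3.client("athena")

query = """
SELECT key, size, last_modified_date
FROM "s3_metadata_db"."my_bucket_metadata"      -- placeholder names
WHERE size > 104857600                          -- objects larger than 100 MiB
ORDER BY last_modified_date DESC
LIMIT 20
"""

resp = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
)
print("Query execution id:", resp["QueryExecutionId"])
```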
When you combine all these pieces together with S3 Tables and MCP, AI agents can have true data discovery, particularly if they have MCP servers that also speak to things like Iceberg catalogs. Another key thing we just introduced in the keynote today is Amazon S3 access points for FSx for NetApp ONTAP. One of the challenges is that while it's good to say build a data lake, a lot of people's data still lives on premises in traditional file systems, and many legacy applications don't know how to speak to object storage; they were built for traditional file systems. IDC indicated in a study that while things are moving to the cloud, 48% of data will still live on premises and 29% in the cloud by the end of 2028, so it is a gradual transition. You really need to start bringing all this data together in a coherent way.
While you could bring the data to the agents, going back to the customer I talked about, you quickly run into data fidelity issues. This is where S3 access points for FSx really help. You can take data that lives in traditional file systems, and if it's in NetApp file systems, which have been an on-premises standard for decades, you can use their replication and snapshot copy capabilities to maintain a coherent copy in the cloud even if you're running on premises. You can expose that into S3 as if the data lived there natively. Now when you're building that data foundation, you've integrated your file data without having to maintain a separate copy.
S3 Vectors and Agentic Search: From BMW's 20 Petabyte Data Lake to Intelligent Recommendations
This is really important because the same IDC study indicated that today the average enterprise has 6.4 data silos and has to manage 13 copies of its data. Anything you can do to simplify that will really help give your agents access to data in a more streamlined way. Finally, vectors really are the language of AI. You can take increasingly sophisticated embedding models and create vectors of any data type, then find things that are semantically similar in both context and meaning. For agents, you can even find things that look the same. Vectors power a lot of this, everything from RAG to intelligent search to long-term memory.
We launched S3 Vectors this summer, and it went GA today. It's designed to deliver good enough performance for a lot of agentic and RAG workflows while lowering the cost of similarity and semantic search by up to 90 percent on a TCO basis. It scales to billions of vectors. With the limits we introduced today, two billion vectors per index and 10,000 indexes per bucket, you can scale to 20 trillion vectors in a single S3 vector bucket. You're not going to run into scalability challenges.
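For a feel of how long-term memory might land in S3 Vectors, here is a sketch of writing and querying vectors with boto3. The client name, operation names, and parameter shapes follow the service documentation as I recall it and should be verified against the current SDK; the bucket and index names are illustrative.

```python
# Long-term memory sketch on S3 Vectors: write embeddings with metadata, then query.
import boto3

s3vectors = boto3.client("s3vectors")

def remember(key: str, embedding: list[float], metadata: dict,
             bucket: str = "agent-memory-vectors", index: str = "long-term") -> None:
    s3vectors.put_vectors(
        vectorBucketName=bucket,
        indexName=index,
        vectors=[{"key": key, "data": {"float32": embedding}, "metadata": metadata}],
    )

def recall(query_embedding: list[float], top_k: int = 5,
           bucket: str = "agent-memory-vectors", index: str = "long-term") -> list[dict]:
    resp = s3vectors.query_vectors(
        vectorBucketName=bucket,
        indexName=index,
        queryVector={"float32": query_embedding},
        topK=top_k,
        returnMetadata=True,
    )
    return resp["vectors"]
```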
Stitching all these pieces together, you can start to do things like agentic search, where agents use conversational information and choose search strategies. Venkata talked about a shopping assistant and mapped it to short-term and long-term memory. The shopping assistant can pull in image data and user preference data and make intelligent recommendations to buyers. It can even create images using a multimodal model so that buyers can visualize what the dress they found, or another one like it, might look like on them. That's super powerful.
To ground it in a real-world customer example, BMW Group has a 20 petabyte data lake, which they call their Cloud Data Hub, holding all of their data about their vehicles, manufacturing processes, and warranty information. They wanted to make it simple for all of their users to access these different data types without being data experts, so they built an agent-powered search capability.
This capability can provide three modes of search including direct structured search, hybrid search across structured and semantic data, and peer similarity search based on the context of the natural language query the user is trying to do. So it really does democratize access to data for all of their users.
The Five-Stage Transformation: How Data Becomes Memory Through Aggregation, Encoding, and Activation
So with that, I'm going to turn it back to Venkata. Thank you, John. This is probably my favorite slide of the whole session. We are going to talk about when data actually becomes memory. In the first part we covered the memory constructs, and then John covered the building blocks that facilitate the overall memory building process. Now we bring both together to understand how data becomes memory.
In agentic AI development, data and memory are often used interchangeably, and that is fair, because memory is ultimately data. However, there is a clear distinction between the two terms. Data becomes memory when it transforms from passive information into an active component that informs agent behavior and reasoning. This is a five-stage transformation process.
To understand the overall transformation in an easier way, let's look at a use case. Let's pick a customer chatbot on an e-commerce website so that we can follow the overall flow and then build the overall transformation pipeline. Let's say you have your source data stream. On the e-commerce website, you are getting different channels of information: order information, supply chain merchandise, and all sorts of information coming through the sources.
The first step is to aggregate them. You consolidate them into one single structure, bringing them all together, typically in a data lake. The second step is encoding, or structuring. We take the structured and unstructured data we have aggregated and create vector embeddings from it, enriched with metadata. Creating vector embeddings alone is helpful, but adding extensive metadata goes a long way. The embeddings are created using your preferred model from those available through Bedrock.
The third step is storage, which is where the encoded data, such as the embeddings created in stage two, is persisted. This is where S3 Vectors shines as a long-term persistent memory store, because it has the right latency, storage, and performance characteristics at scale. That makes S3 Vectors a preferred long-term memory store for AI agents.
Fourth is organization. We structure the data through modeling, indexing, and relationships. Conversations are organized chronologically, product information is organized hierarchically, and order data is linked across multiple conversations. For this use case, the organization is mostly time-based or topic-based.
The last stage is activation, or retrieval. This is the crucial stage where information becomes actionable memory, through text search for exact matches, vector search for semantic similarity, and graph traversal for complex relationships. Then we have the LLM, whose built-in capabilities we combine with this memory in an iterative process. As we keep interacting with the chatbot, it takes the learnings from each interaction, puts them back into storage, reorganizes them, activates them again, and feeds them back to the LLM. This iterative loop is what gives AI agents the ability to become truly intelligent and adaptive systems.
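To keep the five stages straight, here is a deliberately simplified end-to-end sketch for the e-commerce chatbot example. Every stage is a placeholder (including the fake embedding), so the point is the shape of the pipeline, not the implementation; in practice aggregation would land in a data lake, encoding would call an embedding model, and storage and activation would use a vector store.

```python
# Five-stage sketch: aggregate -> encode -> store -> organize -> activate.
def aggregate(sources: list[list[dict]]) -> list[dict]:
    return [record for source in sources for record in source]          # 1. consolidate

def encode(records: list[dict]) -> list[dict]:
    # 2. attach embeddings plus metadata (placeholder embedding here)
    return [{"record": r, "embedding": [0.0], "metadata": {"channel": r.get("channel")}}
            for r in records]

def store(encoded: list[dict], memory_store: list[dict]) -> None:
    memory_store.extend(encoded)                                         # 3. persist

def organize(memory_store: list[dict]) -> dict:
    organized: dict = {}                                                 # 4. topic-based grouping
    for item in memory_store:
        organized.setdefault(item["metadata"]["channel"], []).append(item)
    return organized

def activate(organized: dict, topic: str) -> list[dict]:
    return organized.get(topic, [])                                      # 5. retrieve for the LLM

memory_store: list[dict] = []
store(encode(aggregate([[{"channel": "orders", "text": "Order 42 shipped"}]])), memory_store)
print(activate(organize(memory_store), "orders"))
```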
A memory unit is a structured container that holds information plus metadata and relationships about what it has learned, which makes it useful for agent reasoning. Unlike traditional data storage, which treats all information as equal, memory units carry several additional attributes. The first is temporal context: when the information was learned. The second is a strength indicator: how relevant and reliable the information is. The third is associative links: how it connects to other stored memories. Fourth is semantic context: the overall meaning captured with the interaction. The last is retrieval metadata: how and when it should be accessed.
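A memory unit can be sketched as a small data structure carrying exactly those attributes; the field names below are illustrative, not a standard schema.

```python
# Memory unit sketch: information plus temporal context, strength, links,
# semantic context, and retrieval metadata.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryUnit:
    content: str                                             # the information itself
    learned_at: datetime = field(                            # temporal context
        default_factory=lambda: datetime.now(timezone.utc))
    strength: float = 1.0                                     # relevance / reliability indicator
    links: list[str] = field(default_factory=list)            # associative links to other memories
    semantic_context: str = ""                                # meaning captured with the interaction
    retrieval_metadata: dict = field(default_factory=dict)    # how and when to access it

unit = MemoryUnit(
    content="Customer prefers the blue headphones and used an employee discount.",
    links=["order-42", "session-2025-11-28"],
    semantic_context="product preference",
    retrieval_metadata={"index": "long-term", "ttl_days": 365},
)
```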
Data essentially becomes memory at the point of storage, which is step three. When data is collected and stored with the intent of enabling adaptation and coherent interaction over time, this transformation is what enables stateless applications to become truly intelligent agents.
Production-Ready Architectures: EKS Deployment Patterns and Rocket Companies' Success Story
Here is a comprehensive architecture for deploying agents using Amazon EKS. This is more of a self-managed approach. It highlights all the components working together, including authentication, model serving, memory management, and monitoring. Notice how S3 Vectors is integrated into the memory construct: you have ElastiCache and DynamoDB as the short-term memory offerings, S3 as the working memory or scratch pad, and S3 Vectors covering long-term memory. The architecture also includes proper security with Cognito authentication and monitoring built in with Amazon CloudWatch. This is a production-ready pattern that can scale to support multiple agents and high user volumes.
Now let's look at a weather agent and how it handles a simple query workflow. The agent maintains its session state in S3, accesses its long-term memory through S3 Vectors, and stores its user content in an Amazon S3 data lakehouse. An MCP server enables standardized integration with external APIs, and we leverage one in this example. Even this simple example demonstrates the power of persistent memory and context: notice how the agent doesn't just answer the question; it remembers the user's location preferences and provides personalized recommendations as a result.
Rocket Companies is a home financing organization that helps people through the whole process of home ownership. There is a good blog available if you search for Rocket Companies, AWS, and agents, which goes into much more depth about how they did this and their architecture. They used Amazon Bedrock Agents to build an agent-powered engagement platform for their customers that really helped their customers resolve their queries much more quickly and gave much improved guidance, which really helped with their customer satisfaction. Look up the Rocket Mortgages blog if you want to learn more.
Key Takeaways: Building Actionable Data Foundations and Scaling Cost-Effectively
Closing out with a couple of key learnings to hopefully tie all this together. First, no matter where you are going in your agentic journey, make your data actionable by building a modern data foundation, ideally Iceberg-based on Amazon S3, and then use cataloging and metadata, particularly in combination with MCP servers and tools so that agents can discover data and use it and learn and evolve from it.
Second, scale cost effectively because your data is going to grow if you are successful. Use Amazon S3 as your data foundation if it works, and use S3 Vectors as your semantic data foundation if that meets your needs. Third, you want to iterate rapidly, which means you are going to implement observability and look at things like your retrieval accuracy, how your agents are interacting with each other, and how they are evolving over time. You need to have a good observability story in place.
Make your agents move quickly from proof of concept to production and drive business value, so you are not part of that 40 percent of agentic AI projects that get canceled. That means considering Amazon Bedrock AgentCore even if you want to use an open source framework, just to reduce the undifferentiated heavy lifting you have to do on the security, infrastructure, and storage side.
To wrap up with a few actions and resources, we have a couple of blogs here. The second one is the Rocket story that I mentioned, and the third, which Venkata wrote, covers the EKS design pattern: how to build a self-managed RAG pattern using EKS and Amazon S3 Vectors. Thank you so much for your time. We are happy to take questions.
This article is entirely auto-generated using Amazon Bedrock.