What Is Vertex AI Agent Memory Bank?


We have come a long way since LLMs first emerged. Initially, we worked around their knowledge cutoff with web search, and then improved their reasoning with deep research capabilities.

Through these advancements, we discovered that context is critical for generating accurate responses. We first tackled this with prompt engineering and, more recently, with context engineering, an approach that has only just started to gain attention.

To address the same challenge, Google introduced the Vertex AI Agent Memory Bank, which handles recalling, adding, updating, and deleting memories from past conversations.

Let's understand what the Vertex AI Agent Memory Bank is and how it works.

What is Vertex AI Agent Memory Bank?

The Vertex AI Agent Memory Bank is a dynamic system for storing memories from conversations between a user and an AI agent. These memories serve as context for future interactions, allowing the agent to access past conversation data across different sessions to provide personalized responses.

You might wonder, why do we need a memory bank in the first place? The issue lies in how we provide context to LLMs.

Problems with Current Methods

Feeding entire conversations as context often leads to a context window overflow.
Large contexts increase inference costs and slow down performance.
As conversation data grows over time, the quality of LLM responses degrades, causing issues such as the "lost in the middle" and "context rot" problems.

Lost in the Middle Problem: LLMs struggle to process information present in the middle of long input sequences. They perform better when relevant details are at the start or end of the text, but accuracy drops for information in the middle.
Context Rot: LLM performance declines as the input context length increases.
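
To make the cost of replaying full conversations concrete, here is a minimal sketch of how the prompt grows when every turn resends the whole history. The 4-characters-per-token ratio and the helper names are assumptions for illustration only; real tokenizers and pricing differ.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def full_history_prompt_sizes(turns: list[str]) -> list[int]:
    """Prompt size at each turn when the entire conversation so far is resent."""
    sizes = []
    running_total = 0
    for turn in turns:
        running_total += estimate_tokens(turn)
        sizes.append(running_total)
    return sizes


# Hypothetical conversation: 50 turns of similar length.
turns = ["I have oily skin and need a moisturizer. " * 5] * 50
sizes = full_history_prompt_sizes(turns)

# The prompt for the last turn is 50x the first one, and the total number of
# tokens processed across the conversation grows roughly quadratically.
total_processed = sum(sizes)
print(sizes[0], sizes[-1], total_processed)
```

The per-turn prompt grows linearly, but the total work across a conversation grows much faster, which is why both cost and latency climb as history accumulates.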

How Does Vertex AI Agent Memory Bank Solve These Issues?

It stores personalized information from user-agent conversations.
It retrieves data from past conversations, regardless of the time elapsed.
It fetches only the necessary information from the memory bank and keeps stored data up to date, which improves accuracy and reduces incorrect responses.
It eliminates the need for users to repeatedly provide context in prompts.

How Does the Memory Bank Work?

The Memory Bank integrates with Vertex AI Agent Engine sessions to generate and manage memories. It relies on several key functionalities:

Session: Created using CreateSession, a session is a chronological sequence of messages between a user and an agent. Each session requires a user ID to map memories to specific users for personalized responses.
Events: These are interactions, such as user messages, agent responses, or tool actions, that are saved to the session using AppendEvent. The agent retrieves these events with ListEvents to generate responses.
Memory Generation: Memories can be generated in two ways:
- GenerateMemories: Automatically extracts facts from conversations at specific intervals (e.g., end of a session or turn) for use in current or future sessions.
- CreateMemory: Used as a "memory-as-a-tool" approach, giving the agent more control over when to write memories.
Memory Retrieval: Memories can be retrieved using RetrieveMemories, either as a complete set (simple retrieval) or as only the most relevant ones (similarity search retrieval), depending on the need.

How to Implement Vertex AI Agent Memory Bank?

Currently, you can use the Memory Bank through two methods:

Memory Bank in Action

The image below illustrates how the AI Agent Memory Bank operates in real time.

Initially, a user mentions having oily skin and seeking a moisturizer. The memory bank stores this as "User has oily skin."
Days later, when the user asks for a cleanser recommendation, the agent retrieves the stored information about the user’s oily skin and suggests a suitable product.
Later, the user follows a routine that dries their skin, prompting the agent to update the memory bank.
When the user asks for recommendations again, the agent uses the updated "dry skin" information to provide a tailored suggestion.

This demonstrates how the memory bank efficiently supplies LLMs with relevant context, delivering personalized responses without requiring users to repeat context in every query or system prompt.
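
The skincare scenario can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `UserMemories` class, the `skin_type` topic key, and the `build_prompt` helper are hypothetical, and a plain dictionary overwrite stands in for whatever consolidation logic the actual service applies when a new fact contradicts an old one.

```python
from dataclasses import dataclass, field


@dataclass
class UserMemories:
    # topic -> latest fact; a new fact on the same topic replaces the old one
    facts: dict[str, str] = field(default_factory=dict)

    def upsert(self, topic: str, fact: str) -> None:
        self.facts[topic] = fact

    def as_context(self) -> str:
        return "\n".join(f"- {fact}" for fact in self.facts.values())


def build_prompt(memories: UserMemories, question: str) -> str:
    """Prepend stored facts so the user does not have to repeat them."""
    return (
        "Known facts about the user:\n"
        f"{memories.as_context()}\n\n"
        f"Question: {question}"
    )


memories = UserMemories()

# First conversation: the user mentions oily skin.
memories.upsert("skin_type", "User has oily skin.")
first_prompt = build_prompt(memories, "Recommend a cleanser.")

# Later conversation: the user's skin has become dry, so the memory is updated.
memories.upsert("skin_type", "User has dry skin.")
second_prompt = build_prompt(memories, "Recommend a cleanser.")

print(first_prompt)
print(second_prompt)
```

The same question produces a different prompt after the update, and the stale "oily skin" fact no longer reaches the model.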

Conclusion

The Vertex AI Agent Memory Bank enhances user experience by enabling LLMs to access and use conversation history effectively. By supplying only the relevant context instead of the full history, it helps agents deliver faster, more accurate, and more personalized responses, making interactions with AI agents more seamless and efficient.
