GitHub repository: Strands Agent Samples
Part 2: Adding Persistent Memory with FAISS
In Part 1, Multi-Modal Content Processing with Strands Agent and Just a Few Lines of Code, we explored how to build a multi-modal AI agent capable of processing images, documents, and videos using the Strands Agent framework. Today we're taking it a step further by adding persistent memory with FAISS (Facebook AI Similarity Search) to create an agent that can remember and recall information across sessions.
🧠 Why Memory Matters
- Contextual conversations: "Remember that architecture diagram I showed you yesterday? How does it relate to this new document?"
- Progressive learning: Building knowledge over time from multiple interactions
- Personalized responses: Tailoring answers based on your previous preferences and content
- Cross-session continuity: Maintaining context even after restarting your application
🚀 What We're Building
We'll enhance our multi-modal agent with:
- FAISS-powered memory storage using the mem0_memory tool
- Persistent information storage across sessions
- Smart retrieval of relevant memories based on context
- Memory management operations (store, retrieve, list)
🛠️ Setting Up the Enhanced Agent
Let's start by configuring our agent with memory capabilities:
import boto3
from strands.models import BedrockModel
from strands import Agent
from strands_tools import image_reader, file_read, mem0_memory, use_llm
from video_reader import video_reader  # custom video tool from Part 1
# Unique user ID, defined before the prompt so it can be interpolated below
USER_ID = "eli_abc"

# Enhanced system prompt with memory instructions
MULTIMODAL_SYSTEM_PROMPT = f"""You are a helpful assistant that can process documents, images, and videos.
Analyze their contents and provide relevant information. You have memory capabilities and can remember previous interactions.
You can:
1. For PNG, JPEG/JPG, GIF, or WebP formats, use image_reader to process the file
2. For PDF, CSV, DOCX, XLS, or XLSX formats, use file_read to process the file
3. For MP4, MOV, AVI, MKV, or WebM formats, use video_reader to process the file
4. Just deliver the answer
Memory capabilities:
- Store new information using mem0_memory tool (action="store")
- Retrieve relevant memories (action="retrieve")
- List all memories (action="list")
- Provide personalized responses
Key Rules:
- Always include user_id={USER_ID} in tool calls
- Be conversational and natural in responses
- Format output clearly
- Acknowledge stored information
- Reference relevant past interactions when appropriate
"""
# Configure Amazon Bedrock
session = boto3.Session(region_name='us-west-2')
bedrock_model = BedrockModel(
model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
boto_session=session,
streaming=False
)
# Create enhanced agent with memory capabilities
multimodal_agent = Agent(
system_prompt=MULTIMODAL_SYSTEM_PROMPT,
tools=[image_reader, file_read, video_reader, mem0_memory, use_llm],
model=bedrock_model,
)
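Where does FAISS fit in? The mem0_memory tool manages the vector store for us, using FAISS for local, persistent similarity search, so no extra configuration is required. If you ever want explicit control over where the index lives, the underlying mem0 library accepts a vector-store configuration. A minimal sketch, assuming mem0's config schema (the provider name and config keys below are assumptions; check the mem0 documentation):

from mem0 import Memory

# Hypothetical explicit configuration (keys are assumptions)
memory = Memory.from_config({
    "vector_store": {
        "provider": "faiss",                      # local FAISS index
        "config": {"path": "./faiss_memories"},   # on-disk location for persistence
    }
})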
💾 Memory Operations in Action
1. Storing Initial User Context
First, let's store some basic information about our user:
# USER_ID ("eli_abc") was defined earlier, alongside the system prompt
content = """Hello, my name is Elizabeth, but they call me Eli. I'm a developer advocate at AWS,
and I want to understand what's in images, videos, and documents to improve my day-to-day work."""
# Store user context in memory
multimodal_agent.tool.mem0_memory(action="store", content=content, user_id=USER_ID)
2. Image Analysis with Memory Storage
Now let's analyze an image and automatically store the results:
print("=== πΈ IMAGE ANALYSIS WITH MEMORY ===")
image_result = multimodal_agent(
f"Analyze the image data-sample/diagram.jpg in detail and describe everything you observe. "
f"Remember this information for later. USER_ID: {USER_ID}"
)
print(image_result)
The agent will:
- Process the image using image_reader
- Analyze the architectural diagram
- Automatically store the analysis in memory using mem0_memory
- Provide a detailed description
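You can sanity-check that the analysis really landed in memory with a direct retrieve, using the same tool-call pattern shown later in this post (the query string here is just an example):

# Optional sanity check: confirm the image analysis was stored
stored_check = multimodal_agent.tool.mem0_memory(
    action="retrieve",
    query="architecture diagram",
    user_id=USER_ID
)
print("Stored analysis found:", stored_check)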
3. Video Analysis with Memory
Let's process a video and store its content:
print("=== π¬ VIDEO ANALYSIS WITH MEMORY ===")
video_result = multimodal_agent(
"Analyze the video data-sample/moderation-video.mp4 and describe in detail "
"the actions and scenes you observe. Store this information in your memory."
)
print(video_result)
4. Document Processing with Memory
Process and remember document content:
print("=== π DOCUMENT ANALYSIS WITH MEMORY ===")
doc_result = multimodal_agent(
"Summarize as json the content of the document data-sample/Welcome-Strands-Agents-SDK.pdf "
"and store this information in your memory."
)
print(doc_result)
🔍 Memory Retrieval and Management
Retrieving Specific Memories
# Retrieve memories related to a specific query
retrieved_memories = multimodal_agent.tool.mem0_memory(
action="retrieve",
query="What services are in the image?",
user_id=USER_ID
)
print("Retrieved Memories:", retrieved_memories)
Listing All Stored Memories
# List all stored memories for the user
all_memories = multimodal_agent.tool.mem0_memory(
action="list",
user_id=USER_ID
)
print("All Stored Memories:", all_memories)
Testing Cross-Modal Memory Recall
The real power comes when testing memory across different media types:
print("=== π§ MEMORY RECALL TEST ===")
memory_result = multimodal_agent(
"What do you remember about the image, video, and document I showed you earlier?"
)
print(memory_result)
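Because the memories live in FAISS rather than in the agent's in-process state, a brand-new agent instance (say, after restarting your application) can pick up where the old one left off. A quick sketch to convince yourself, reusing only the objects defined above:

# Simulate a new session: a fresh agent instance with the same tools
fresh_agent = Agent(
    system_prompt=MULTIMODAL_SYSTEM_PROMPT,
    tools=[image_reader, file_read, video_reader, mem0_memory, use_llm],
    model=bedrock_model,
)

# The new instance can still retrieve memories stored in earlier sessions
previous = fresh_agent.tool.mem0_memory(
    action="retrieve",
    query="architecture diagram analysis",
    user_id=USER_ID
)
print("Recovered across sessions:", previous)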
🎯 Real-World Use Cases
This memory-enhanced agent opens up numerous practical applications:
1. Technical Documentation Assistant
- Remember architecture diagrams, code snippets, and documentation
- Provide contextual answers based on your project history
- Track changes and evolution of your technical designs
2. Content Analysis Pipeline (see the sketch after this list)
- Process batches of images, videos, and documents
- Build a knowledge base of analyzed content
- Generate reports based on accumulated insights
3. Personal Knowledge Management
- Store and recall information from various media types
- Create connections between different pieces of content
- Build a personalized AI assistant that grows with your needs
4. Educational Content Processing
- Analyze educational materials across different formats
- Remember student preferences and learning patterns
- Provide personalized learning recommendations
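As an illustration, the content analysis pipeline from use case 2 can start as a simple loop over a folder (a hypothetical sketch reusing this article's data-sample directory):

import os

# Hypothetical batch pipeline: analyze every file in a folder and let the
# agent store each result in memory
for name in os.listdir("data-sample"):
    path = os.path.join("data-sample", name)
    multimodal_agent(
        f"Analyze the file {path} and store the key findings in your memory. "
        f"USER_ID: {USER_ID}"
    )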
🚀 Getting Started
- Clone the repository:
  git clone https://github.com/elizabethfuentes12/strands-agent-samples
  cd strands-agent-samples/notebook
- Install dependencies:
  pip install -r requirements.txt
- Configure AWS credentials for Bedrock access
- Try the notebook: multi-understanding-with-memory.ipynb
The combination of Strands Agent's multi-modal capabilities with persistent memory creates a foundation for building intelligent, context-aware applications that can truly understand and remember your content.
📚 Resources
- Strands Agent Documentation
- Part 1: Basic Multi-Modal Processing
- Complete Code Examples
- AWS Bedrock Documentation
- Getting Started with Strands Agents
Stay tuned for more Strands Agent implementations!
Thank you!