Elizabeth Fuentes L for AWS · Originally published at builder.aws.com

Multi-Modal Content Processing with Strands Agents and FAISS Memory


GitHub repository: Strands Agent Samples

Part 2: Adding Persistent Memory with FAISS

In Part 1, Multi-Modal Content Processing with Strands Agents and Just a Few Lines of Code, we explored how to build a multi-modal AI agent capable of processing images, documents, and videos with the Strands Agents framework. Today we take it a step further by adding persistent memory with FAISS (Facebook AI Similarity Search), so the agent can remember and recall information across sessions.

🧠 Why Memory Matters

  • Contextual conversations: "Remember that architecture diagram I showed you yesterday? How does it relate to this new document?"
  • Progressive learning: Building knowledge over time from multiple interactions
  • Personalized responses: Tailoring answers based on your previous preferences and content
  • Cross-session continuity: Maintaining context even after restarting your application

πŸš€ What We're Building

We'll enhance our multi-modal agent with:

  1. FAISS-powered memory storage using the mem0_memory tool (a quick look at what FAISS does follows this list)
  2. Persistent information storage across sessions
  3. Smart retrieval of relevant memories based on context
  4. Memory management operations (store, retrieve, list)
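
Before wiring this together, it helps to see what FAISS itself is doing under the hood. Here is a minimal, self-contained sketch of similarity search with raw faiss, using toy random vectors in place of real text embeddings (illustrative only; the mem0_memory tool manages the embeddings and the index for you):

import numpy as np
import faiss  # pip install faiss-cpu

dim = 8                         # embedding dimension (toy value)
index = faiss.IndexFlatL2(dim)  # exact L2 nearest-neighbor search

# Pretend these are embeddings of five stored memories
memories = np.random.random((5, dim)).astype("float32")
index.add(memories)

# Embed the query the same way and fetch the 2 closest memories
query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 2)
print(ids, distances)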

πŸ› οΈ Setting Up the Enhanced Agent

Let's start by configuring our agent with memory capabilities. The mem0_memory tool from strands-tools handles storage and retrieval; when no external memory backend is configured, it falls back to a local FAISS vector store, which is what this post relies on:

import boto3
from strands.models import BedrockModel
from strands import Agent
from strands_tools import image_reader, file_read, mem0_memory, use_llm
from video_reader import video_reader  # custom video tool from Part 1

USER_ID = "eli_abc"  # unique user ID included in every memory tool call

# Enhanced system prompt with memory instructions.
# Defined as an f-string so {USER_ID} is substituted into the prompt text.
MULTIMODAL_SYSTEM_PROMPT = f"""You are a helpful assistant that can process documents, images, and videos.
Analyze their contents and provide relevant information. You have memory capabilities and can remember previous interactions.

You can:
1. For PNG, JPEG/JPG, GIF, or WebP formats, use image_reader to process the file
2. For PDF, CSV, DOCX, XLS, or XLSX formats, use file_read to process the file
3. For MP4, MOV, AVI, MKV, or WebM formats, use video_reader to process the file
4. Just deliver the answer

Memory capabilities:
- Store new information using the mem0_memory tool (action="store")
- Retrieve relevant memories (action="retrieve")
- List all memories (action="list")
- Provide personalized responses

Key Rules:
- Always include user_id={USER_ID} in tool calls
- Be conversational and natural in responses
- Format output clearly
- Acknowledge stored information
- Reference relevant past interactions when appropriate
"""

# Configure Amazon Bedrock
session = boto3.Session(region_name='us-west-2')
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    boto_session=session,
    streaming=False
)

# Create the enhanced agent with memory capabilities
multimodal_agent = Agent(
    system_prompt=MULTIMODAL_SYSTEM_PROMPT,
    tools=[image_reader, file_read, video_reader, mem0_memory, use_llm],
    model=bedrock_model,
)

πŸ’Ύ Memory Operations in Action

1. Storing Initial User Context

First, let's store some basic information about our user:

# USER_ID ("eli_abc") was already defined when we built the system prompt
content = """Hello, my name is Elizabeth, but they call me Eli. I'm a developer advocate at AWS, 
and I want to understand what's in images, videos, and documents to improve my day-to-day work."""

# Store user context in memory
multimodal_agent.tool.mem0_memory(action="store", content=content, user_id=USER_ID)

2. Image Analysis with Memory Storage

Now let's analyze an image and automatically store the results:

print("=== πŸ“Έ IMAGE ANALYSIS WITH MEMORY ===")
image_result = multimodal_agent(
    f"Analyze the image data-sample/diagram.jpg in detail and describe everything you observe. "
    f"Remember this information for later. USER_ID: {USER_ID}"
)
print(image_result)

The agent will:

  1. Process the image using image_reader
  2. Analyze the architectural diagram
  3. Automatically store the analysis in memory using mem0_memory
  4. Provide a detailed description
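
All three media steps in this post follow this same pattern: point the agent at a file, then ask it to persist what it found. If you do this often, a tiny wrapper keeps the prompts consistent. This is a sketch; analyze_and_remember is a hypothetical helper, not part of Strands:

def analyze_and_remember(prompt: str) -> str:
    """Run a multi-modal prompt and ask the agent to persist the findings."""
    result = multimodal_agent(
        f"{prompt} Store this information in your memory. USER_ID: {USER_ID}"
    )
    return str(result)

# The image step above, rewritten with the helper
print(analyze_and_remember(
    "Analyze the image data-sample/diagram.jpg in detail and describe everything you observe."
))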

3. Video Analysis with Memory

Let's process a video and store its content:

print("=== 🎬 VIDEO ANALYSIS WITH MEMORY ===")
video_result = multimodal_agent(
    "Analyze the video data-sample/moderation-video.mp4 and describe in detail "
    "the actions and scenes you observe. Store this information in your memory."
)
print(video_result)

4. Document Processing with Memory

Process and remember document content:

print("=== πŸ“„ DOCUMENT ANALYSIS WITH MEMORY ===")
doc_result = multimodal_agent(
    "Summarize as json the content of the document data-sample/Welcome-Strands-Agents-SDK.pdf "
    "and store this information in your memory."
)
print(doc_result)
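
The agent's reply is conversational text, so if you want that JSON summary as a Python object you have to extract it yourself. A rough sketch (assumes the reply contains a single JSON object, which the agent is not guaranteed to produce):

import json
import re

# str() on the agent result yields the final response text
text = str(doc_result)

# Grab the first {...} span in the reply; brittle, but fine for a demo
match = re.search(r"\{.*\}", text, re.DOTALL)
if match:
    summary = json.loads(match.group(0))
    print(list(summary.keys()))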

πŸ” Memory Retrieval and Management

Retrieving Specific Memories

# Retrieve memories related to a specific query
retrieved_memories = multimodal_agent.tool.mem0_memory(
    action="retrieve", 
    query="What services are in the image?", 
    user_id=USER_ID
)
print("Retrieved Memories:", retrieved_memories)

Listing All Stored Memories

# List all stored memories for the user
all_memories = multimodal_agent.tool.mem0_memory(
    action="list", 
    user_id=USER_ID
)
print("All Stored Memories:", all_memories)

Testing Cross-Modal Memory Recall

The real power comes when testing memory across different media types:

print("=== 🧠 MEMORY RECALL TEST ===")
memory_result = multimodal_agent(
    "What do you remember about the image, video, and document I showed you earlier?"
)
print(memory_result)
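
You can push this further and simulate a new session: build a second agent with an empty conversation history and query the same user's memories. A sketch that reuses the objects defined earlier (persistence across a full process restart depends on where the mem0 FAISS index is stored):

# Simulate a restart: a fresh agent with no conversation history
fresh_agent = Agent(
    system_prompt=MULTIMODAL_SYSTEM_PROMPT,
    tools=[image_reader, file_read, video_reader, mem0_memory, use_llm],
    model=bedrock_model,
)

# The new agent can still retrieve what the old one stored
recall = fresh_agent.tool.mem0_memory(
    action="retrieve",
    query="architecture diagram from the image analysis",
    user_id=USER_ID,
)
print(recall)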

🎯 Real-World Use Cases

This memory-enhanced agent opens up numerous practical applications:

1. Technical Documentation Assistant

  • Remember architecture diagrams, code snippets, and documentation
  • Provide contextual answers based on your project history
  • Track changes and evolution of your technical designs

2. Content Analysis Pipeline

  • Process batches of images, videos, and documents (see the sketch after this list)
  • Build a knowledge base of analyzed content
  • Generate reports based on accumulated insights
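
As a sketch of that pipeline, the agent we already built only needs a loop and a prompt per file type. The PROMPTS routing map below is a hypothetical convenience; the system prompt already tells the agent which tool matches each format:

from pathlib import Path

# Hypothetical routing table: one prompt template per file extension
PROMPTS = {
    ".jpg": "Analyze the image {path} in detail.",
    ".png": "Analyze the image {path} in detail.",
    ".pdf": "Summarize the content of the document {path}.",
    ".mp4": "Describe the actions and scenes in the video {path}.",
}

for path in sorted(Path("data-sample").iterdir()):
    template = PROMPTS.get(path.suffix.lower())
    if template is None:
        continue  # skip unsupported formats
    result = multimodal_agent(
        f"{template.format(path=path)} Store this information in your memory. "
        f"USER_ID: {USER_ID}"
    )
    print(f"--- {path.name} ---\n{result}\n")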

3. Personal Knowledge Management

  • Store and recall information from various media types
  • Create connections between different pieces of content
  • Build a personalized AI assistant that grows with your needs

4. Educational Content Processing

  • Analyze educational materials across different formats
  • Remember student preferences and learning patterns
  • Provide personalized learning recommendations

πŸš€ Getting Started

  1. Clone the repository:
   git clone https://github.com/elizabethfuentes12/strands-agent-samples
   cd strands-agent-samples/notebook
  2. Install dependencies:
   pip install -r requirements.txt
  3. Configure AWS credentials for Bedrock access
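
To confirm your credentials resolve before opening the notebook, a quick STS call is enough (note that Bedrock model access itself is granted per model in the AWS console, for us-west-2 in this post):

import boto3

# Prints your account ID if credentials are configured correctly
print(boto3.client("sts").get_caller_identity()["Account"])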

  4. Try the notebook:
   multi-understanding-with-memory.ipynb

The combination of Strands Agent's multi-modal capabilities with persistent memory creates a foundation for building intelligent, context-aware applications that can truly understand and remember your content.

πŸ“š Resources

  • Strands Agent Samples (code for this post): https://github.com/elizabethfuentes12/strands-agent-samples
  • Notebook: multi-understanding-with-memory.ipynb
  • Part 1: Multi-Modal Content Processing with Strands Agents and Just a Few Lines of Code

Stay tuned for more Strands Agents implementations!


Thank you!

