Building Your First RAG System with Amazon Bedrock Agents and FAISS: A Developer's Journey
AI is transforming the way we build intelligent applications in the cloud. Let's explore how to create your first Retrieval-Augmented Generation (RAG) system using Amazon Bedrock Agents (AWS's managed AI agents service) with FAISS (Facebook AI Similarity Search) as our local vector database, a perfect setup for those first steps into GenAI territory.
Why FAISS for Your First RAG Adventure?
The idea here is simple: when you're starting with RAG (Retrieval-Augmented Generation, a technique that enriches LLMs with external data), you want to focus on understanding the core concepts without getting lost in database configuration. FAISS runs locally, is fast, and gives you complete control over your vector operations. Plus, it's free and doesn't require any AWS infrastructure for the vector storage part.
In practice, this means FAISS is your local playground where you can experiment, learn, and prototype before moving to production-ready solutions like Amazon OpenSearch Serverless or Amazon MemoryDB.
The Architecture We're Building
Our setup combines three pieces that play well together:
- Amazon Bedrock Agents: Handles the orchestration, reasoning, and LLM (Large Language Model) interactions
- FAISS: Manages vector storage and similarity search locally
- Custom Action Group: Bridges the agent with our FAISS operations
User Query → Bedrock Agent → Action Group → FAISS Search → Context → LLM Response
Setting Up Your Local FAISS Environment
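Before any code, install the dependencies. This is a minimal setup that assumes the CPU build of FAISS (swap in faiss-gpu if you have a CUDA-capable machine):

pip install boto3 faiss-cpu sentence-transformers numpy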
Let's explore the initial configuration. What's interesting is that we can have everything running with just a few key components:
import boto3
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
import json
import uuid
from typing import List, Dict, Any
# Initialize the embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
embedding_dimension = 384
# Create FAISS index
index = faiss.IndexFlatIP(embedding_dimension)  # Inner product; equals cosine similarity on normalized embeddings
document_store = {} # Store original documents with metadata
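Before going further, a quick sanity check helps confirm the index behaves as expected. This is purely illustrative; the sample sentence is arbitrary:

# Embed one sentence, add it to the index, then search for it again
sample = ["FAISS performs fast similarity search over dense vectors."]
vectors = embedding_model.encode(sample, normalize_embeddings=True).astype('float32')

index.add(vectors)
document_store[0] = {'text': sample[0], 'metadata': {}}

scores, ids = index.search(vectors, k=1)
print(ids[0][0], scores[0][0])  # expect index 0 with a score close to 1.0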
Creating Your Document Ingestion Pipeline
The idea here is to create a simple but effective ingestion system. My recommendation is to start with this class that encapsulates all the functionality:
class FAISSDocumentStore:
    def __init__(self, embedding_model_name='all-MiniLM-L6-v2'):
        self.embedding_model = SentenceTransformer(embedding_model_name)
        self.dimension = self.embedding_model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatIP(self.dimension)
        self.documents = {}

    def add_documents(self, texts: List[str], metadata: List[Dict] = None):
        """Add documents to the FAISS index."""
        if metadata is None:
            metadata = [{}] * len(texts)

        # Generate normalized embeddings so inner product equals cosine similarity
        embeddings = self.embedding_model.encode(texts, normalize_embeddings=True)

        # Add to the FAISS index
        start_id = len(self.documents)
        self.index.add(embeddings.astype('float32'))

        # Store the original documents with their metadata
        for i, (text, meta) in enumerate(zip(texts, metadata)):
            doc_id = start_id + i
            self.documents[doc_id] = {
                'text': text,
                'metadata': meta,
                'id': doc_id
            }

    def search(self, query: str, k: int = 5) -> List[Dict]:
        """Search for similar documents."""
        query_embedding = self.embedding_model.encode([query], normalize_embeddings=True)
        scores, indices = self.index.search(query_embedding.astype('float32'), k)

        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx != -1:  # FAISS returns -1 for empty slots
                doc = self.documents[int(idx)].copy()
                doc['similarity_score'] = float(score)
                results.append(doc)
        return results

# Initialize your document store
doc_store = FAISSDocumentStore()
Building the Bedrock Agent Action Group
Let's explore how to create the action group that will handle RAG operations. In practice, this means creating a Lambda function that acts as a bridge. One caveat worth flagging: the Lambda function needs access to the FAISS index, so in a real deployment you would package a persisted index with the function, or load it from Amazon S3 or EFS at cold start:
import json

# Assumes doc_store was initialized at module load time, e.g. by loading a
# persisted FAISS index that was packaged with the function
def lambda_handler(event, context):
    """
    Lambda function to handle Bedrock Agent action group calls
    """
    # Parse the incoming request
    action_group = event.get('actionGroup', '')
    api_path = event.get('apiPath', '')
    http_method = event.get('httpMethod', '')
    parameters = event.get('parameters', [])

    # Extract the query parameter
    query = None
    for param in parameters:
        if param.get('name') == 'query':
            query = param.get('value')
            break

    if not query:
        return {
            'messageVersion': '1.0',
            'response': {
                'actionGroup': action_group,
                'apiPath': api_path,
                'httpMethod': http_method,
                'httpStatusCode': 400,
                'responseBody': {
                    'application/json': {
                        'body': json.dumps({'error': 'Query parameter is required'})
                    }
                }
            }
        }

    try:
        # Perform the FAISS search
        results = doc_store.search(query, k=3)

        # Format results for the agent
        context_documents = []
        for result in results:
            context_documents.append({
                'content': result['text'],
                'score': result['similarity_score'],
                'metadata': result.get('metadata', {})
            })

        response_body = {
            'query': query,
            'documents': context_documents,
            'total_results': len(context_documents)
        }

        return {
            'messageVersion': '1.0',
            'response': {
                'actionGroup': action_group,
                'apiPath': api_path,
                'httpMethod': http_method,
                'httpStatusCode': 200,
                'responseBody': {
                    'application/json': {
                        'body': json.dumps(response_body)
                    }
                }
            }
        }
    except Exception as e:
        return {
            'messageVersion': '1.0',
            'response': {
                'actionGroup': action_group,
                'apiPath': api_path,
                'httpMethod': http_method,
                'httpStatusCode': 500,
                'responseBody': {
                    'application/json': {
                        'body': json.dumps({'error': str(e)})
                    }
                }
            }
        }
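Before wiring this into Bedrock, you can exercise the handler locally with a hand-built event. The event below is a minimal sketch that mirrors only the fields the handler actually reads; real Bedrock invocations include additional fields such as messageVersion and sessionId:

# Minimal local test; the event contains only the fields the handler reads
test_event = {
    'actionGroup': 'document-search',
    'apiPath': '/search',
    'httpMethod': 'POST',
    'parameters': [
        {'name': 'query', 'type': 'string', 'value': 'serverless compute options'}
    ]
}

result = lambda_handler(test_event, None)
print(json.dumps(result, indent=2))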
Configuring Your Bedrock Agent
What's interesting is that we can use the AWS CLI to create the entire setup. My recommendation is this approach (note that create-agent requires an IAM service role for the agent, and the CLI expects the OpenAPI schema wrapped in the APISchema structure, here referenced from S3):
# Create the agent (YOUR_AGENT_ROLE is a placeholder for an IAM role
# that allows the agent to invoke the foundation model)
aws bedrock-agent create-agent \
  --agent-name "faiss-rag-agent" \
  --description "RAG agent using FAISS for vector search" \
  --foundation-model "anthropic.claude-3-sonnet-20240229-v1:0" \
  --agent-resource-role-arn "arn:aws:iam::YOUR_ACCOUNT:role/YOUR_AGENT_ROLE" \
  --instruction "You are a helpful assistant that can search through documents to answer questions. When a user asks a question, use the search_documents function to find relevant information, then provide a comprehensive answer based on the retrieved context." \
  --region us-east-1

# Save the OpenAPI schema to a file and upload it to S3
cat > schema.json <<'EOF'
{
  "openapi": "3.0.0",
  "info": {
    "title": "Document Search API",
    "version": "1.0.0"
  },
  "paths": {
    "/search": {
      "post": {
        "description": "Search for relevant documents",
        "parameters": [
          {
            "name": "query",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "The search query"
          }
        ],
        "responses": {
          "200": {
            "description": "Matching documents",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object"
                }
              }
            }
          }
        }
      }
    }
  }
}
EOF
aws s3 cp schema.json s3://YOUR_BUCKET/schema.json

# Create the action group (after deploying your Lambda function)
aws bedrock-agent create-agent-action-group \
  --agent-id "YOUR_AGENT_ID" \
  --agent-version "DRAFT" \
  --action-group-name "document-search" \
  --description "Search documents using FAISS vector database" \
  --action-group-executor lambda="arn:aws:lambda:us-east-1:YOUR_ACCOUNT:function:faiss-rag-function" \
  --api-schema '{"s3": {"s3BucketName": "YOUR_BUCKET", "s3ObjectKey": "schema.json"}}' \
  --region us-east-1
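One step that's easy to miss: after adding the action group, the agent has to be prepared and given an alias before you can invoke it. Roughly:

# Build the DRAFT version so the new action group takes effect
aws bedrock-agent prepare-agent \
  --agent-id "YOUR_AGENT_ID" \
  --region us-east-1

# Create an alias to invoke the agent through
aws bedrock-agent create-agent-alias \
  --agent-id "YOUR_AGENT_ID" \
  --agent-alias-name "dev" \
  --region us-east-1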
Loading Your First Documents
Let's explore how to populate our FAISS index with sample documents:
# Sample documents about AWS services
sample_docs = [
    "Amazon S3 is a highly scalable object storage service that offers industry-leading durability, availability, and performance.",
    "AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume.",
    "Amazon EC2 provides secure, resizable compute capacity in the cloud. It's designed to make web-scale cloud computing easier.",
    "Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud.",
    "Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale."
]

metadata = [
    {"service": "S3", "category": "Storage"},
    {"service": "Lambda", "category": "Compute"},
    {"service": "EC2", "category": "Compute"},
    {"service": "RDS", "category": "Database"},
    {"service": "DynamoDB", "category": "Database"}
]

# Add documents to FAISS
doc_store.add_documents(sample_docs, metadata)
print(f"Added {len(sample_docs)} documents to FAISS index")
Testing Your RAG System
The idea here is to validate that everything works before connecting the components. In practice, this means:
# Test the search functionality
test_query = "What database services does AWS offer?"
results = doc_store.search(test_query, k=2)

print(f"Query: {test_query}")
print("Results:")
for i, result in enumerate(results, 1):
    print(f"{i}. Score: {result['similarity_score']:.3f}")
    print(f"   Text: {result['text']}")
    print(f"   Service: {result['metadata'].get('service', 'N/A')}")
    print()
Connecting Everything with Bedrock Agents SDK
What's interesting is that we can interact with our agent programmatically:
import boto3
import uuid

def chat_with_rag_agent(query: str, agent_id: str, agent_alias_id: str):
    """
    Chat with the Bedrock agent that uses FAISS for RAG
    """
    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

    try:
        response = bedrock_agent_runtime.invoke_agent(
            agentId=agent_id,
            agentAliasId=agent_alias_id,
            sessionId=str(uuid.uuid4()),
            inputText=query
        )

        # Process the streaming response
        full_response = ""
        for event in response['completion']:
            if 'chunk' in event:
                chunk = event['chunk']
                if 'bytes' in chunk:
                    full_response += chunk['bytes'].decode('utf-8')
        return full_response
    except Exception as e:
        print(f"Error invoking agent: {str(e)}")
        return None

# Example usage
agent_response = chat_with_rag_agent(
    query="What are the benefits of using AWS Lambda?",
    agent_id="YOUR_AGENT_ID",
    agent_alias_id="YOUR_ALIAS_ID"
)
print("Agent Response:", agent_response)
Optimizing Your FAISS Performance
When you scale up, consider these optimizations:
# For larger datasets, use IndexIVFFlat for faster search
def create_optimized_index(dimension: int, nlist: int = 100):
    """
    Create an optimized FAISS index for larger datasets.
    Note: IVF indexes must be trained on a representative sample
    of vectors (index.train(...)) before you can add to them.
    """
    quantizer = faiss.IndexFlatIP(dimension)
    index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
    return index

# Add GPU support if available
def create_gpu_index(dimension: int):
    """
    Create a GPU-accelerated FAISS index, falling back to CPU
    """
    if faiss.get_num_gpus() > 0:
        res = faiss.StandardGpuResources()
        index = faiss.IndexFlatIP(dimension)
        return faiss.index_cpu_to_gpu(res, 0, index)
    return faiss.IndexFlatIP(dimension)
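Here's a rough sketch of how the IVF index is used in practice; the random vectors stand in for your real embeddings:

# IVF indexes need training before vectors can be added
dimension = 384
ivf_index = create_optimized_index(dimension, nlist=100)

training_vectors = np.random.rand(10000, dimension).astype('float32')
faiss.normalize_L2(training_vectors)  # keep inner product == cosine

ivf_index.train(training_vectors)
ivf_index.add(training_vectors)

# nprobe = number of clusters scanned per query;
# higher values improve recall at the cost of speed
ivf_index.nprobe = 10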
Common Pitfalls and How to Avoid Them
1. Embedding Model Consistency
Always use the same embedding model for indexing and querying. Mixing models will give you poor results.
2. Normalization Matters
# Always normalize embeddings for cosine similarity
embeddings = embedding_model.encode(texts, normalize_embeddings=True)
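If your embedding library doesn't expose a normalization flag, FAISS can normalize the vectors in place instead:

# Equivalent normalization done by FAISS itself (in place, float32 only)
vectors = embeddings.astype('float32')
faiss.normalize_L2(vectors)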
3. Batch Processing
For large document sets, process in batches to avoid memory issues:
def add_documents_in_batches(doc_store, texts, batch_size=100):
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        doc_store.add_documents(batch)
        print(f"Processed batch {i//batch_size + 1}")
Next Steps and Production Considerations
This setup is perfect for learning and prototyping, but for production, my recommendation is:
- Persistence: Save your FAISS index and document store to disk so they survive restarts (see the sketch after this list)
- Scalability: Consider Amazon OpenSearch Serverless for production workloads
- Monitoring: Add logging and metrics to your Lambda function
- Security: Implement proper IAM roles and VPC configurations
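Here's a minimal persistence sketch; the file names are arbitrary, and the document store is saved separately because FAISS only persists the vectors:

# Save the vectors (FAISS) and the texts/metadata (JSON) separately
faiss.write_index(doc_store.index, "my_index.faiss")
with open("documents.json", "w") as f:
    json.dump(doc_store.documents, f)

# Reload later; JSON turns int keys into strings, so convert them back
doc_store.index = faiss.read_index("my_index.faiss")
with open("documents.json") as f:
    doc_store.documents = {int(k): v for k, v in json.load(f).items()}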
Quick Actions for You
- [ ] Set up your local Python environment with the required dependencies
- [ ] Create a simple FAISS index with your own documents
- [ ] Deploy the Lambda function for the action group
- [ ] Configure your Bedrock Agent with the action group
- [ ] Test the end-to-end RAG pipeline
- [ ] Experiment with different embedding models
- [ ] Try optimizing FAISS for your specific use case
My Final Recommendation
Start small with this local FAISS approach to understand RAG fundamentals. Once you're comfortable, explore Amazon Bedrock Knowledge Bases for a fully managed solution, or Amazon OpenSearch Serverless for more advanced vector search capabilities.
The beauty of this approach is that you can iterate quickly, understand every component, and gradually move to more sophisticated setups as your needs grow.
Thanks for reading. Now it's your turn to try it out! Let's connect and share our experiences building with GenAI:
LinkedIn: https://www.linkedin.com/in/carloscortezcloud
X: https://x.com/ccortezb
GitHub: https://github.com/ccortezb
Dev.to: https://dev.to/ccortezb
AWS Heroes: https://builder.aws.com/community/@breakinthecloud
Medium: https://ccortezb.medium.com
Stay curious, keep experimenting, and I'll catch you in the next one!