Building Your First RAG System with Amazon Bedrock Agents and FAISS: A Developer's Journey

Let's build a first Retrieval-Augmented Generation (RAG) system using Amazon Bedrock Agents (a managed AI agent service) with FAISS (Facebook AI Similarity Search) as our local vector store. It's a good fit for those first steps into GenAI territory.

Why FAISS for Your First RAG Adventure?

The idea here is simple: when you're starting with RAG (Retrieval-Augmented Generation, a technique to enrich LLMs with external data), you want to focus on understanding the core concepts without getting lost in database configurations. FAISS runs locally, it's fast, and gives you complete control over your vector operations. Plus, it's free and doesn't require any AWS infrastructure setup for the vector storage part.

In practice, this means FAISS is your local playground where you can experiment, learn, and prototype before moving to production-ready solutions like Amazon OpenSearch Serverless or Amazon MemoryDB.

The Architecture We're Building

Our setup combines the best of both worlds:

  • Amazon Bedrock Agents: Handles the orchestration, reasoning, and LLM (Large Language Model) interactions
  • FAISS: Manages vector storage and similarity search locally
  • Custom Action Group: Bridges the agent with our FAISS operations
User Query → Bedrock Agent → Action Group → FAISS Search → Context → LLM Response

Setting Up Your Local FAISS Environment

Let's explore the initial configuration. What's interesting is that we can have everything running with just a few key components:

import boto3
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
import json
import uuid
from typing import List, Dict, Any

# Initialize the embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
embedding_dimension = 384  # all-MiniLM-L6-v2 produces 384-dimensional vectors

# Create FAISS index
index = faiss.IndexFlatIP(embedding_dimension)  # Inner product; equals cosine similarity on L2-normalized vectors
document_store = {}  # Store original documents with metadata
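To see why an inner-product index can serve as a cosine-similarity index, here is a minimal sketch using only the standard library. The vectors are made-up values, and `dot`, `l2_normalize`, and `cosine` are illustrative helpers, not part of FAISS or sentence-transformers.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from its textbook definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up 3-dimensional "embeddings" for illustration only.
a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 1.0]

raw_ip = dot(a, b)                                      # depends on vector length
normalized_ip = dot(l2_normalize(a), l2_normalize(b))   # matches the cosine

print(f"raw inner product:        {raw_ip:.4f}")
print(f"normalized inner product: {normalized_ip:.4f}")
print(f"cosine similarity:        {cosine(a, b):.4f}")
```

The last two printed values agree, which is why the embeddings need to be normalized before they go into an `IndexFlatIP`.
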

Creating Your Document Ingestion Pipeline

The idea here is to create a simple but effective ingestion system. My recommendation is to start with this class that encapsulates all the functionality:

class FAISSDocumentStore:
    def __init__(self, embedding_model_name='all-MiniLM-L6-v2'):
        self.embedding_model = SentenceTransformer(embedding_model_name)
        self.dimension = self.embedding_model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatIP(self.dimension)
        self.documents = {}

    def add_documents(self, texts: List[str], metadata: List[Dict] = None):
        """Add documents to the FAISS index"""
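        if metadata is None:
            # Give each document its own dict; [{}] * n would share a single object
            metadata = [{} for _ in texts]
        elif len(metadata) != len(texts):
            raise ValueError("texts and metadata must have the same length")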

        # Generate embeddings
        embeddings = self.embedding_model.encode(texts, normalize_embeddings=True)

        # Add to FAISS index
        start_id = len(self.documents)
        self.index.add(embeddings.astype('float32'))

        # Store documents with metadata
        for i, (text, meta) in enumerate(zip(texts, metadata)):
            doc_id = start_id + i
            self.documents[doc_id] = {
                'text': text,
                'metadata': meta,
                'id': doc_id
            }

    def search(self, query: str, k: int = 5) -> List[Dict]:
        """Search for similar documents"""
        query_embedding = self.embedding_model.encode([query], normalize_embeddings=True)
        scores, indices = self.index.search(query_embedding.astype('float32'), k)

        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx != -1:  # Valid result
                doc = self.documents[idx].copy()
                doc['similarity_score'] = float(score)
                results.append(doc)

        return results

# Initialize your document store
doc_store = FAISSDocumentStore()
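Conceptually, a flat index compares the query against every stored vector and keeps the k best scores. Here is a rough pure-Python stand-in for that behaviour (not FAISS itself). `TinyFlatIndex` is a hypothetical name and the vectors are toy values. It also shows why the `search` method skips `-1`: FAISS returns `-1` as the ID when fewer than k vectors are available.

```python
from typing import List, Tuple


class TinyFlatIndex:
    """Toy stand-in for a flat inner-product index (illustration only)."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []

    def add(self, vectors: List[List[float]]) -> None:
        # Row position doubles as the document ID, as in FAISSDocumentStore.
        self.vectors.extend(vectors)

    def search(self, query: List[float], k: int) -> Tuple[List[float], List[int]]:
        scored = [
            (sum(q * x for q, x in zip(query, vec)), idx)
            for idx, vec in enumerate(self.vectors)
        ]
        # Highest inner product first; ties broken by lower ID.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        top = scored[:k]
        scores = [score for score, _ in top]
        ids = [idx for _, idx in top]
        # Pad with -1 IDs when fewer than k vectors are stored.
        # The padded score is a placeholder; only the -1 ID matters here.
        missing = k - len(top)
        return scores + [float("-inf")] * missing, ids + [-1] * missing


index = TinyFlatIndex()
index.add([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
scores, ids = index.search([1.0, 0.0], k=5)
print(ids)  # three real IDs followed by -1 padding
```

Because IDs are just row positions, the document dictionary and the index must always be updated together, which is what `add_documents` does.
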

Building the Bedrock Agent Action Group

Let's explore how to create the action group that will handle RAG operations. In practice, this means creating a Lambda function that acts as a bridge:

def lambda_handler(event, context):
    """
    Lambda function to handle Bedrock Agent action group calls
    """

    # Parse the incoming request
    action_group = event.get('actionGroup', '')
    api_path = event.get('apiPath', '')
    http_method = event.get('httpMethod', '')
    parameters = event.get('parameters', [])

    # Extract query parameter
    query = None
    for param in parameters:
        if param.get('name') == 'query':
            query = param.get('value')
            break

    if not query:
        return {
            'messageVersion': '1.0',
            'response': {
                'actionGroup': action_group,
                'apiPath': api_path,
                'httpMethod': http_method,
                'httpStatusCode': 400,
                'responseBody': {
                    'application/json': {
                        'body': json.dumps({'error': 'Query parameter is required'})
                    }
                }
            }
        }

    try:
        # Perform FAISS search
        results = doc_store.search(query, k=3)

        # Format results for the agent
        context_documents = []
        for result in results:
            context_documents.append({
                'content': result['text'],
                'score': result['similarity_score'],
                'metadata': result.get('metadata', {})
            })

        response_body = {
            'query': query,
            'documents': context_documents,
            'total_results': len(context_documents)
        }

        return {
            'messageVersion': '1.0',
            'response': {
                'actionGroup': action_group,
                'apiPath': api_path,
                'httpMethod': http_method,
                'httpStatusCode': 200,
                'responseBody': {
                    'application/json': {
                        'body': json.dumps(response_body)
                    }
                }
            }
        }

    except Exception as e:
        return {
            'messageVersion': '1.0',
            'response': {
                'actionGroup': action_group,
                'apiPath': api_path,
                'httpMethod': http_method,
                'httpStatusCode': 500,
                'responseBody': {
                    'application/json': {
                        'body': json.dumps({'error': str(e)})
                    }
                }
            }
        }
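The three return statements in the handler share the same envelope, so one option is to factor it out and exercise it locally with a hand-written event. This is a sketch under assumptions: `get_parameter` and `build_response` are hypothetical helper names, and the sample event contains only the fields the handler reads, with placeholder values.

```python
import json
from typing import Any, Dict, Optional


def get_parameter(event: Dict[str, Any], name: str) -> Optional[str]:
    """Return the value of a named parameter from the agent event, if present."""
    for param in event.get("parameters", []):
        if param.get("name") == name:
            return param.get("value")
    return None


def build_response(event: Dict[str, Any], status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a JSON-serializable body in the action group response envelope."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup", ""),
            "apiPath": event.get("apiPath", ""),
            "httpMethod": event.get("httpMethod", ""),
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


# Hand-written sample event for local testing (values are placeholders).
sample_event = {
    "messageVersion": "1.0",
    "actionGroup": "document-search",
    "apiPath": "/search",
    "httpMethod": "POST",
    "parameters": [{"name": "query", "type": "string", "value": "What is Lambda?"}],
}

query = get_parameter(sample_event, "query")
response = build_response(sample_event, 200, {"query": query, "documents": []})
print(json.dumps(response, indent=2))
```

With helpers like these, each branch of `lambda_handler` reduces to a single `build_response(event, status, body)` call.
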

Configuring Your Bedrock Agent

What's interesting is that we can use AWS CLI to create the entire setup. My recommendation is to use this approach:

# Create the agent
aws bedrock-agent create-agent \
    --agent-name "faiss-rag-agent" \
    --description "RAG agent using FAISS for vector search" \
    --foundation-model "anthropic.claude-3-sonnet-20240229-v1:0" \
    --instruction "You are a helpful assistant that can search through documents to answer questions. When a user asks a question, use the search_documents function to find relevant information, then provide a comprehensive answer based on the retrieved context." \
    --region us-east-1

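# Create action group (after deploying your Lambda function).
# --api-schema expects a structure with a "payload" (the OpenAPI document as a string)
# or an "s3" location, so save the schema to a file and wrap it with jq (requires jq 1.6+).
cat > openapi.json <<'EOF'
{
    "openapi": "3.0.0",
    "info": {
        "title": "Document Search API",
        "version": "1.0.0"
    },
    "paths": {
        "/search": {
            "post": {
                "operationId": "search_documents",
                "description": "Search for relevant documents",
                "parameters": [
                    {
                        "name": "query",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "The search query"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "Documents matching the query"
                    }
                }
            }
        }
    }
}
EOF

aws bedrock-agent create-agent-action-group \
    --agent-id "YOUR_AGENT_ID" \
    --agent-version "DRAFT" \
    --action-group-name "document-search" \
    --description "Search documents using FAISS vector database" \
    --action-group-executor lambda="arn:aws:lambda:us-east-1:YOUR_ACCOUNT:function:faiss-rag-function" \
    --api-schema "$(jq -n --rawfile schema openapi.json '{payload: $schema}')" \
    --region us-east-1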
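If you prefer Python over the CLI, the same action group request can be assembled as a dictionary for boto3's `create_agent_action_group`. This is a sketch, not a tested deployment script: `build_action_group_request` is a hypothetical helper, the `operationId` and `responses` entries in the schema are my assumptions about what the agent needs, and the IDs and ARN are placeholders. The block only builds and prints the request; the actual AWS call is left as a comment.

```python
import json
from typing import Any, Dict

# OpenAPI document describing the single search operation.
SEARCH_SCHEMA: Dict[str, Any] = {
    "openapi": "3.0.0",
    "info": {"title": "Document Search API", "version": "1.0.0"},
    "paths": {
        "/search": {
            "post": {
                "operationId": "search_documents",  # assumption: matches the agent instruction
                "description": "Search for relevant documents",
                "parameters": [
                    {
                        "name": "query",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string"},
                        "description": "The search query",
                    }
                ],
                "responses": {"200": {"description": "Documents matching the query"}},
            }
        }
    },
}


def build_action_group_request(agent_id: str, lambda_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for bedrock-agent's create_agent_action_group."""
    return {
        "agentId": agent_id,
        "agentVersion": "DRAFT",
        "actionGroupName": "document-search",
        "description": "Search documents using FAISS vector database",
        "actionGroupExecutor": {"lambda": lambda_arn},
        # The schema travels as a JSON string under the "payload" key.
        "apiSchema": {"payload": json.dumps(SEARCH_SCHEMA)},
    }


request = build_action_group_request(
    agent_id="YOUR_AGENT_ID",
    lambda_arn="arn:aws:lambda:us-east-1:YOUR_ACCOUNT:function:faiss-rag-function",
)
print(json.dumps(request, indent=2))

# With AWS credentials configured, the request could be sent like this:
# import boto3
# boto3.client("bedrock-agent", region_name="us-east-1").create_agent_action_group(**request)
```

Keeping the schema as a Python dictionary means `json.dumps` handles the string escaping that is awkward to do by hand on the command line.
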

Loading Your First Documents

Let's explore how to populate our FAISS index with sample documents:

# Sample documents about AWS services
sample_docs = [
    "Amazon S3 is a highly scalable object storage service that offers industry-leading durability, availability, and performance.",
    "AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume.",
    "Amazon EC2 provides secure, resizable compute capacity in the cloud. It's designed to make web-scale cloud computing easier.",
    "Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud.",
    "Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale."
]

metadata = [
    {"service": "S3", "category": "Storage"},
    {"service": "Lambda", "category": "Compute"},
    {"service": "EC2", "category": "Compute"},
    {"service": "RDS", "category": "Database"},
    {"service": "DynamoDB", "category": "Database"}
]

# Add documents to FAISS
doc_store.add_documents(sample_docs, metadata)
print(f"Added {len(sample_docs)} documents to FAISS index")

Testing Your RAG System

The idea here is to validate that everything works before connecting the components. In practice, this means:

# Test the search functionality
test_query = "What database services does AWS offer?"
results = doc_store.search(test_query, k=2)

print(f"Query: {test_query}")
print("Results:")
for i, result in enumerate(results, 1):
    print(f"{i}. Score: {result['similarity_score']:.3f}")
    print(f"   Text: {result['text']}")
    print(f"   Service: {result['metadata'].get('service', 'N/A')}")
    print()
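Reading scores by eye works for one query; for a handful of queries, a small hit-rate check makes regressions easier to spot. The sketch below is an assumption-laden example: `hit_rate` is a hypothetical helper, and `fake_search` is a keyword-matching stub that stands in for `doc_store.search` so the block runs without any model.

```python
from typing import Callable, Dict, List, Tuple

SearchFn = Callable[[str, int], List[Dict]]


def hit_rate(search: SearchFn, cases: List[Tuple[str, str]], k: int = 2) -> float:
    """Fraction of queries whose expected service appears in the top-k results."""
    hits = 0
    for query, expected_service in cases:
        services = {r["metadata"].get("service") for r in search(query, k)}
        if expected_service in services:
            hits += 1
    return hits / len(cases)


def fake_search(query: str, k: int) -> List[Dict]:
    """Keyword-matching stub standing in for doc_store.search (illustration only)."""
    corpus = [
        {"text": "object storage", "metadata": {"service": "S3"}},
        {"text": "serverless compute", "metadata": {"service": "Lambda"}},
        {"text": "relational database", "metadata": {"service": "RDS"}},
    ]
    words = set(query.lower().split())
    # Rank by number of shared words; sorted() is stable, so ties keep corpus order.
    ranked = sorted(corpus, key=lambda d: -len(words & set(d["text"].split())))
    return ranked[:k]


cases = [
    ("where do I keep object files", "S3"),
    ("run serverless code", "Lambda"),
    ("managed relational database", "RDS"),
]
print(f"hit rate @1: {hit_rate(fake_search, cases, k=1):.2f}")
```

To run it against the real index, pass `doc_store.search` in place of `fake_search` and write cases that match your own documents.
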

Connecting Everything with Bedrock Agents SDK

What's interesting is that we can interact with our agent programmatically:

import uuid

import boto3

def chat_with_rag_agent(query: str, agent_id: str, agent_alias_id: str):
    """
    Chat with the Bedrock agent that uses FAISS for RAG
    """
    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

    try:
        response = bedrock_agent_runtime.invoke_agent(
            agentId=agent_id,
            agentAliasId=agent_alias_id,
            sessionId=str(uuid.uuid4()),
            inputText=query
        )

        # Process the streaming response
        full_response = ""
        for event in response['completion']:
            if 'chunk' in event:
                chunk = event['chunk']
                if 'bytes' in chunk:
                    full_response += chunk['bytes'].decode('utf-8')

        return full_response

    except Exception as e:
        print(f"Error invoking agent: {str(e)}")
        return None

# Example usage
agent_response = chat_with_rag_agent(
    query="What are the benefits of using AWS Lambda?",
    agent_id="YOUR_AGENT_ID",
    agent_alias_id="YOUR_ALIAS_ID"
)

print("Agent Response:", agent_response)
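The function above creates a new session ID on every call, so each question starts a fresh conversation. For follow-up questions, one option is to keep the session ID and reuse it. The sketch below assumes that pattern: `AgentSession` and `collect_completion` are hypothetical names, and `FakeRuntimeClient` is an offline stand-in for the `bedrock-agent-runtime` client so the example runs without AWS credentials.

```python
import uuid
from typing import Any, Dict, Iterable, List


def collect_completion(events: Iterable[Dict[str, Any]]) -> str:
    """Join the text chunks of an invoke_agent event stream into one string."""
    parts = []
    for event in events:
        chunk = event.get("chunk", {})
        if "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


class AgentSession:
    """Keeps one session ID so follow-up questions share conversation state."""

    def __init__(self, client: Any, agent_id: str, agent_alias_id: str) -> None:
        self.client = client
        self.agent_id = agent_id
        self.agent_alias_id = agent_alias_id
        self.session_id = str(uuid.uuid4())

    def ask(self, text: str) -> str:
        response = self.client.invoke_agent(
            agentId=self.agent_id,
            agentAliasId=self.agent_alias_id,
            sessionId=self.session_id,  # same ID on every turn
            inputText=text,
        )
        return collect_completion(response["completion"])


class FakeRuntimeClient:
    """Offline stand-in for the bedrock-agent-runtime client (illustration only)."""

    def __init__(self) -> None:
        self.session_ids: List[str] = []

    def invoke_agent(self, **kwargs: Any) -> Dict[str, Any]:
        self.session_ids.append(kwargs["sessionId"])
        reply = f"echo: {kwargs['inputText']}".encode("utf-8")
        # Two text chunks with a non-chunk event in between.
        events = [{"chunk": {"bytes": reply[:4]}}, {"trace": {}}, {"chunk": {"bytes": reply[4:]}}]
        return {"completion": events}


fake = FakeRuntimeClient()
session = AgentSession(fake, "YOUR_AGENT_ID", "YOUR_ALIAS_ID")
print(session.ask("What is Lambda?"))
print(session.ask("And how is it billed?"))
```

With a real client, replace `fake` with `boto3.client('bedrock-agent-runtime', region_name='us-east-1')`.
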

Optimizing Your FAISS Performance

When your dataset grows, consider these optimizations:

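# For larger datasets, use IndexIVFFlat for faster (approximate) search
def create_optimized_index(dimension: int, nlist: int = 100):
    """
    Create an optimized FAISS index for larger datasets.
    The index must be trained (index.train(vectors)) before vectors are added.
    """
    quantizer = faiss.IndexFlatIP(dimension)
    # Pass the metric explicitly: IndexIVFFlat defaults to L2 distance
    index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT)
    return index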

# Add GPU support if available
def create_gpu_index(dimension: int):
    """
    Create GPU-accelerated FAISS index
    """
    if faiss.get_num_gpus() > 0:
        res = faiss.StandardGpuResources()
        index = faiss.IndexFlatIP(dimension)
        gpu_index = faiss.index_cpu_to_gpu(res, 0, index)
        return gpu_index
    else:
        return faiss.IndexFlatIP(dimension)
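Choosing `nlist` is mostly a matter of corpus size. As a starting point only, the sketch below encodes two rules of thumb: roughly 4 * sqrt(N) clusters, in line with the FAISS index-selection guidelines for collections under a million vectors, and at least 39 training vectors per cluster, the level below which FAISS prints a clustering warning. `suggest_ivf_params` and the 50,000-vector cutoff for staying on a flat index are my own assumptions, so tune them on real data.

```python
import math
from typing import Any, Dict

# Assumed cutoff: below this size, exhaustive search is usually fast enough.
FLAT_INDEX_CUTOFF = 50_000


def suggest_ivf_params(num_vectors: int) -> Dict[str, Any]:
    """Rule-of-thumb index settings for a corpus size (heuristic, not a guarantee)."""
    if num_vectors < FLAT_INDEX_CUTOFF:
        return {"index": "flat", "nlist": None, "min_training_vectors": 0}
    nlist = int(4 * math.sqrt(num_vectors))
    return {
        "index": "ivf_flat",
        "nlist": nlist,
        # k-means training wants a minimum number of points per cluster.
        "min_training_vectors": 39 * nlist,
    }


for size in (5_000, 100_000, 1_000_000):
    print(size, suggest_ivf_params(size))
```

The resulting `nlist` can be passed straight to `create_optimized_index`.
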

Common Pitfalls and How to Avoid Them

1. Embedding Model Consistency
Always use the same embedding model for indexing and querying. Mixing models will give you poor results.

2. Normalization Matters

# Always normalize embeddings for cosine similarity
embeddings = embedding_model.encode(texts, normalize_embeddings=True)

3. Batch Processing
For large document sets, process in batches to avoid memory issues:

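def add_documents_in_batches(doc_store, texts, batch_size=100, metadata=None):
    """Add documents in fixed-size batches, keeping metadata aligned with texts."""
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        batch_meta = metadata[i:i+batch_size] if metadata is not None else None
        doc_store.add_documents(batch, batch_meta)
        print(f"Processed batch {i//batch_size + 1}")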

Next Steps and Production Considerations

This setup is perfect for learning and prototyping, but for production, my recommendation is:

  1. Persistence: Save your FAISS index to disk
faiss.write_index(index, "my_index.faiss")
  2. Scalability: Consider Amazon OpenSearch Serverless for production workloads
  3. Monitoring: Add logging and metrics to your Lambda function
  4. Security: Implement proper IAM roles and VPC configurations
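
On the persistence point: `faiss.write_index` saves the vectors, not the texts and metadata held in the `documents` dictionary, so that mapping needs its own file. Here is a minimal sketch using JSON. `save_documents`, `load_documents`, and the file name are hypothetical choices. Note that JSON turns integer keys into strings, so the loader converts them back.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_documents(path: str, documents: Dict[int, Dict[str, Any]]) -> None:
    """Write the ID-to-document mapping next to the saved index."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(documents, f)


def load_documents(path: str) -> Dict[int, Dict[str, Any]]:
    """Read the mapping back, restoring integer IDs (JSON keys are strings)."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(doc_id): doc for doc_id, doc in raw.items()}


documents = {
    0: {"id": 0, "text": "Amazon S3 is object storage.", "metadata": {"service": "S3"}},
    1: {"id": 1, "text": "AWS Lambda runs code.", "metadata": {"service": "Lambda"}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_index.docs.json")
    save_documents(path, documents)
    restored = load_documents(path)

print(restored == documents)
```

Loading the index with `faiss.read_index` and the documents with `load_documents` at Lambda cold start keeps the two in sync, as long as both files were written from the same state.
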

The Main Takeaway

The main takeaway is that starting with FAISS locally allows you to understand RAG fundamentals without additional complexities. Once you feel comfortable, you can explore Amazon Bedrock Knowledge Bases for a fully managed solution, or Amazon OpenSearch Serverless for more advanced vector search capabilities.

What's interesting is that this approach allows you to iterate quickly, understand each component, and gradually move to more sophisticated setups as your needs grow.

Quick Actions for You

  • [ ] Set up your local Python environment with the required dependencies
  • [ ] Create a simple FAISS index with your own documents
  • [ ] Deploy the Lambda function for the action group
  • [ ] Configure your Bedrock Agent with the action group
  • [ ] Test the end-to-end RAG pipeline
  • [ ] Experiment with different embedding models
  • [ ] Try optimizing FAISS for your specific use case

My Final Recommendation

Start small with this local FAISS approach to understand RAG fundamentals. Once you're comfortable, explore Amazon Bedrock Knowledge Bases for a fully managed solution, or Amazon OpenSearch Serverless for more advanced vector search capabilities.

The beauty of this approach is that you can iterate quickly, understand every component, and gradually move to more sophisticated setups as your needs grow.


Thanks for reading, now it's your turn to try it out! Let's connect and share our experiences building with GenAI:

🔗 LinkedIn: https://www.linkedin.com/in/carloscortezcloud

🐦 X: https://x.com/ccortezb

💻 GitHub: https://github.com/ccortezb

📝 Dev.to: https://dev.to/ccortezb

🏆 AWS Heroes: https://builder.aws.com/community/@breakinthecloud

📖 Medium: https://ccortezb.medium.com

Stay curious, keep experimenting, and I'll catch you in the next one!
