How to Build a RAG System with Amazon Bedrock
Introduction
I am currently implementing an AI Chatbot with RAG functionality using Amazon Bedrock for a project.
Now that things have settled down, I'd like to document what I've researched as a reference.
Since LLMs and RAG cover a wide range of topics, this article sticks to a conceptual overview.
What is RAG Functionality?
Basic Knowledge
Before diving in, let me briefly explain what RAG is.
RAG stands for "Retrieval-Augmented Generation."
RAG is a technique for improving the capabilities of large language models (LLMs). It works through the following steps (a minimal code sketch follows the list):
- Retrieval: search external databases or documents for information related to the user's question
- Augmented: add the retrieved information to the LLM's input
- Generation: the LLM generates a response using both the original question and the retrieved information
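Here is that flow as a minimal Python sketch. The functions search_documents and call_llm are hypothetical placeholders standing in for a vector search and an LLM call; when using Bedrock, these steps are handled for you, as described later.

# Conceptual RAG flow. search_documents() and call_llm() are hypothetical
# placeholders, not a real API.
def answer_with_rag(question: str) -> str:
    # Retrieval: find documents related to the question
    related_docs = search_documents(question, top_k=3)

    # Augmented: add the retrieved text to the prompt
    context = "\n".join(doc["text"] for doc in related_docs)
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

    # Generation: the LLM answers using both the question and the retrieved context
    return call_llm(prompt)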
Main Benefits of RAG
A traditional LLM generates responses only from the knowledge contained in its training data. If the information you are asking about is recent, or involves uncommon domain knowledge or terminology, you may not get a correct answer.
With RAG, on the other hand, you can supply external documents as reference material and generate responses that better reflect your specific context.
Here are some practical examples:
- Chatbots that search internal company documents to answer questions
- Research support systems that reference the latest academic papers
- Customer support that searches product manuals for information
RAG has become an important technology for building AI systems that can better meet user-specific requirements by combining the general knowledge of LLMs with specific contexts and the latest information.
How to Build with AWS?
About Bedrock
AWS has a service called Bedrock for generative AI, which we will use.
Bedrock is a service that provides API access to a variety of LLMs, such as Claude.
Bedrock offers a range of features, but to build LLM conversations with RAG you need the following two:
- Agent: provides API-based dialogue and response generation with an LLM.
- KnowledgeBase: the RAG component that connects data sources to a vector store and handles vectorization of the data.
With Agent alone you can have general LLM conversations via the API; by adding a KnowledgeBase, those conversations can use RAG.
How KnowledgeBase Works
Bedrock KnowledgeBase is an orchestration layer for implementing RAG.
It provides functions such as importing source documents and retrieving relevant data at query time to improve the quality of LLM responses.
KnowledgeBase connects to various data sources, vectorizes documents, and registers them in vector databases.
Data sources include S3, SharePoint, Confluence, Salesforce, etc., and vector databases can use OpenSearch, Pinecone, Amazon Aurora, etc.
Vectorization Process
In a RAG system, the documents that will serve as context are converted to numerical vectors before being stored.
Vectorization is the key step that turns text into a numerical form that can be compared mathematically.
The following steps are involved in vectorization:
- Chunk splitting
- Embedding model selection
- Vector conversion
1. Chunk Splitting
First, long documents are divided into appropriate sizes according to the following policy:
- Size: Usually around 500-1000 characters
- Overlap: Provide 50-100 character overlap between chunks
- Boundaries: Split at paragraph or sentence breaks, not in the middle of sentences
Example:
Original document: "Amazon Bedrock is a generative AI service. It provides access to various models. It can be used via API. Main features include Agent functionality and KnowledgeBase functionality."
↓
Chunk 1: "Amazon Bedrock is a generative AI service. It provides access to various models. It can be used via API."
Chunk 2: "It can be used via API. Main features include Agent functionality and KnowledgeBase functionality."
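As a rough illustration of this policy (this is not how Bedrock chunks internally, just a character-based sketch), fixed-size chunking with overlap could look like this in Python:

def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Character-based fixed-size chunking with overlap (illustrative only;
    # Bedrock KnowledgeBase does its own chunking based on the data source settings).
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step forward so that `overlap` characters are shared between neighboring chunks
        start = end - overlap
    return chunks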
2. Embedding Model Selection
Next, select an embedding model. This model is used to vectorize the chunked text.
Bedrock provides embedding models such as the following (see the sketch after this list):
- Amazon Titan Embeddings: General-purpose text embedding
- Cohere Embed: Multi-language support
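To give a concrete feel for it, here is a minimal sketch of requesting an embedding from Titan Embeddings through the bedrock-runtime API. The model ID and request body follow the Titan Text Embeddings format; other models such as Cohere Embed expect a different body, so adjust accordingly.

import json
import boto3

# Minimal sketch: embed one piece of text with Amazon Titan Text Embeddings.
client = boto3.client("bedrock-runtime", region_name="<your-aws-region>")

response = client.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Amazon Bedrock is a generative AI service."}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # the vector dimension, which must match the index mapping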
3. Vector Conversion
Convert each chunk to vectors and store them in the vector store. When stored, vectors are typically saved as arrays of floating-point numbers.
During dialogue creation, similarity calculations are performed from these vectors to search for information needed for dialogue.
During search, the user's question is also vectorized using the same method, and cosine similarity or Euclidean distance is calculated to search for documents with high similarity.
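For intuition, cosine similarity between two small vectors can be computed like this (a toy example; the vector store performs this kind of comparison at scale using its ANN index):

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Toy cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([0.1, 0.9, 0.0], [0.2, 0.8, 0.1]))  # close to 1.0 => similar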
Sample Implementation
Now let's write a sample implementation for Bedrock KnowledgeBase.
The implementation language is Python, the vector store is an OpenSearch managed cluster, and the data source is S3.
The implementation flow is as follows:
- Create index in OpenSearch
- Create VectorStore
- Create data source
- Ingest vector data into VectorStore
- Execute agent combined with KnowledgeBase
Create Index in OpenSearch
First, as preparation, let's create an index in OpenSearch.
There are various configuration items, but the official documentation provides a sample, so it's a good idea to customize based on it.
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<vector-name>": {
        "type": "knn_vector",
        "dimension": <embedding-dimension>,
        "data_type": "binary", # Only needed for binary embeddings
        "space_type": "l2" | "hamming", # Use l2 for float embeddings and hamming for binary embeddings
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      },
      "AMAZON_BEDROCK_METADATA": {
        "type": "text",
        "index": "false"
      },
      "AMAZON_BEDROCK_TEXT_CHUNK": {
        "type": "text",
        "index": "true"
      }
    }
  }
}
For parameters such as ef_construction, the official OpenSearch documentation explains what they mean and gives recommended parameter combinations, which is a helpful reference:
https://opensearch.org/blog/a-practical-guide-to-selecting-hnsw-hyperparameters/#:~:text=efficiency%20%5B3%2C%204%5D.-,Recommended,-HNSW%20configurations
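As a minimal sketch, the mapping above can be applied with the opensearch-py client. Authentication is simplified to basic auth here; use SigV4 signing or whatever your domain's access control requires.

from opensearchpy import OpenSearch

# Apply the k-NN index mapping shown above to the managed cluster.
client = OpenSearch(
    hosts=[{"host": "<os-domain-endpoint>", "port": 443}],
    http_auth=("<user>", "<password>"),
    use_ssl=True,
)

index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "<vector-name>": {
                "type": "knn_vector",
                "dimension": 1024,  # must match your embedding model's dimension
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "parameters": {"ef_construction": 128, "m": 24},
                },
            },
            "AMAZON_BEDROCK_METADATA": {"type": "text", "index": False},
            "AMAZON_BEDROCK_TEXT_CHUNK": {"type": "text", "index": True},
        }
    },
}

client.indices.create(index="<vector-store-index-name>", body=index_body)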
Create VectorStore
Next, use create_knowledge_base to create a VectorStore.
import boto3

def main():
    client = boto3.client(
        "bedrock-agent", region_name='<your-aws-region>'
    )
    bedrock_kb_role_arn = "<KB-IAMrole-arn>"
    embedding_model_arn = "<KB-Embedding-model-arn>"
    opensearch_domain_endpoint = "https://<os-domain-endpoint>"
    opensearch_domain_arn = "<os-domain-arn>"
    vector_index_name = "<vector-store-index-name>"
    vector_field_name = "<vector_field_name>"

    response = client.create_knowledge_base(
        name="kb-sample",
        roleArn=bedrock_kb_role_arn,
        knowledgeBaseConfiguration={
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
                # Set this if implementing multimodal processing for images like PDFs
                "supplementalDataStorageConfiguration": {
                    "storageLocations": [
                        {
                            "s3Location": {
                                "uri": "<s3-multimodal-bucket-arn>"
                            },
                            "type": "S3",
                        }
                    ]
                }
            },
        },
        storageConfiguration={
            "type": "OPENSEARCH_MANAGED_CLUSTER",
            # Managed-cluster settings are nested under opensearchManagedClusterConfiguration
            "opensearchManagedClusterConfiguration": {
                "domainEndpoint": opensearch_domain_endpoint,
                "domainArn": opensearch_domain_arn,
                "vectorIndexName": vector_index_name,
                "fieldMapping": {
                    "vectorField": vector_field_name,
                    "textField": "AMAZON_BEDROCK_TEXT_CHUNK",
                    "metadataField": "AMAZON_BEDROCK_METADATA",
                },
            },
        }
    )
    ...
If you want to vectorize images embedded in PDFs, you need to set an S3 bucket for multimodal processing in supplementalDataStorageConfiguration.
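One note from my side: knowledge base creation does not complete instantly, so before moving on it can help to poll get_knowledge_base until the status becomes ACTIVE. A rough sketch:

import time
import boto3

# Poll the knowledge base until it is ACTIVE before creating data sources
# and ingesting documents.
client = boto3.client("bedrock-agent", region_name="<your-aws-region>")
knowledge_base_id = "<knowledge-base-id>"  # from the create_knowledge_base response

while True:
    kb = client.get_knowledge_base(knowledgeBaseId=knowledge_base_id)
    status = kb["knowledgeBase"]["status"]
    if status in ("ACTIVE", "FAILED"):
        break
    time.sleep(10)
print(status)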
Create Data Source
Next, create a data source for the vector store. Here's an example when the text chunk strategy is set to fixed_size.
import boto3

def main():
    client = boto3.client(
        "bedrock-agent", region_name='<your-aws-region>'
    )
    knowledge_base_id = <knowledge-base-id>
    datasource_s3_arn = <S3-arn>
    aws_account_id = <aws-account-id>
    inclusion_prefixes = <inclusion-prefixes-array>
    foundation_model_arn = <foundation-model-arn>
    chunking_max_tokens = <chunking-max-tokens>
    overlap_percentage = <overlap-percentage>

    response = client.create_data_source(
        knowledgeBaseId=knowledge_base_id,
        name=<datasource-name>,
        dataSourceConfiguration={
            "type": "S3",
            "s3Configuration": {
                "bucketArn": datasource_s3_arn,
                "bucketOwnerAccountId": aws_account_id,
                "inclusionPrefixes": inclusion_prefixes,
            },
        },
        vectorIngestionConfiguration={
            "chunkingConfiguration": {
                "chunkingStrategy": "FIXED_SIZE",
                "fixedSizeChunkingConfiguration": {
                    "maxTokens": chunking_max_tokens,
                    "overlapPercentage": overlap_percentage
                }
            },
            "parsingConfiguration": {
                "bedrockFoundationModelConfiguration": {
                    "modelArn": foundation_model_arn,
                    "parsingModality": "MULTIMODAL",
                },
                "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            },
        }
    )
    ...
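As an aside, if you want to sync the whole data source in one go rather than ingesting individual files as in the next step, there is also start_ingestion_job. A minimal sketch:

import boto3

# Alternative to per-file ingestion: sync the entire data source in one job.
client = boto3.client("bedrock-agent", region_name="<your-aws-region>")

response = client.start_ingestion_job(
    knowledgeBaseId="<knowledge-base-id>",
    dataSourceId="<datasource-id>",
)
print(response["ingestionJob"]["status"])  # e.g. STARTING, IN_PROGRESS, COMPLETE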
Ingest Vector Data into VectorStore
Next, ingest the vector data. Here we'll use ingest_knowledge_base_documents to specify files on S3 and insert their vectors.
import boto3

def main():
    client = boto3.client(
        "bedrock-agent", region_name='<your-aws-region>'
    )
    knowledge_base_id = <knowledge-base-id>
    datasource_id = <datasource-id>
    s3_uri = <s3-uri-where-the-source-file-is-located>
    metadata_s3_uri = f"{s3_uri}.metadata.json"
    aws_account_id = <aws-account-id>

    response = client.ingest_knowledge_base_documents(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=datasource_id,
        documents=[
            {
                # Set this if you want to add custom metadata
                'metadata': {
                    'type': 'S3_LOCATION',
                    's3Location': {
                        'uri': metadata_s3_uri,
                        'bucketOwnerAccountId': aws_account_id
                    }
                },
                'content': {
                    'dataSourceType': 'S3',
                    's3': {
                        's3Location': {
                            'uri': s3_uri
                        }
                    }
                }
            },
        ]
    )
When importing files, you can get page numbers and source file information by default, but you can also set custom metadata. You can add metadata to the vector store by preparing a file named filename.metadata.json in the same directory as the target file and specifying that path.
https://docs.aws.amazon.com/bedrock/latest/userguide/kb-metadata.html
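For reference, a metadata file is just a small JSON document placed next to the source file. A minimal example for my-document.pdf.metadata.json might look like the following (the attribute names here are made up for illustration; see the documentation above for the exact format and limits):

{
  "metadataAttributes": {
    "department": "engineering",
    "year": 2024
  }
}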
Execute Agent Combined with KnowledgeBase
Finally, let's call the agent to generate a response. You can invoke a conversation simply by calling invoke_inline_agent. To keep the explanation simple, I'll only show the parts related to KnowledgeBase.
import boto3

def main():
    client = boto3.client(
        "bedrock-agent-runtime", region_name='<your-aws-region>'
    )
    vector_store_id = <vector-store-id>
    override_search_type = <override-search-type>

    response = client.invoke_inline_agent(
        # Other required parameters (e.g. foundationModel, instruction, sessionId,
        # inputText) are omitted here to focus on the KnowledgeBase settings
        knowledgeBases=[
            {
                "knowledgeBaseId": vector_store_id,
                "description": "Knowledge base for document retrieval",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "overrideSearchType": override_search_type
                    }
                },
            }
        ]
    )
After the API search, you can get metadata of referenced documents from the following fields in the response:
{
    'completion': EventStream({
        'trace': {
            'trace': {
                'orchestrationTrace': {
                    'observation': {
                        'knowledgeBaseLookupOutput': {
                            'metadata': {
                                'clientRequestId': 'string',
                                'endTime': datetime(2015, 1, 1),
                                'operationTotalTimeMs': 123,
                                'startTime': datetime(2015, 1, 1),
                                'totalTimeMs': 123,
                                'usage': {
                                    'inputTokens': 123,
                                    'outputTokens': 123
                                }
                            },
                            'retrievedReferences': [
                                {
                                    'content': {
                                        'byteContent': 'string',
                                        'row': [
                                            {
                                                'columnName': 'string',
                                                'columnValue': 'string',
                                                'type': 'BLOB'|'BOOLEAN'|'DOUBLE'|'NULL'|'LONG'|'STRING'
                                            },
                                        ],
                                        'text': 'string',
                                        'type': 'TEXT'|'IMAGE'|'ROW'
                                    },
                                    'location': {
                                        'confluenceLocation': {
                                            'url': 'string'
                                        },
                                        'customDocumentLocation': {
                                            'id': 'string'
                                        },
                                        'kendraDocumentLocation': {
                                            'uri': 'string'
                                        },
                                        's3Location': {
                                            'uri': 'string'
                                        },
                                        'salesforceLocation': {
                                            'url': 'string'
                                        },
                                        'sharePointLocation': {
                                            'url': 'string'
                                        },
                                        'sqlLocation': {
                                            'query': 'string'
                                        },
                                        'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'KENDRA'|'SQL',
                                        'webLocation': {
                                            'url': 'string'
                                        }
                                    },
                                    'metadata': {
                                        'string': {...}|[...]|123|123.4|'string'|True|None
                                    }
                                },
                            ]
                        },
                    },
                },
            }
        },
    }),
}
Among these, retrievedReferences[*].metadata contains metadata about the referenced files, such as:
- x-amz-bedrock-kb-source-uri: S3 URL of the source file
- x-amz-bedrock-kb-chunk-id: chunk ID used to generate the response
- x-amz-bedrock-kb-data-source-id: data source ID
- x-amz-bedrock-kb-document-page-number: page number of the source document
In addition, if you specify a metadata.json file, that metadata is added here as well.
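As a rough sketch, assuming the agent was invoked with enableTrace=True so that trace events are returned, you could collect the answer text and these references from the stream like this:

# Walk the EventStream, collecting the generated answer and the retrieved
# references from orchestration trace events (assumes enableTrace=True).
answer = ""
references = []

for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
    if "trace" in event:
        observation = (
            event["trace"]["trace"]
            .get("orchestrationTrace", {})
            .get("observation", {})
        )
        kb_output = observation.get("knowledgeBaseLookupOutput", {})
        references.extend(kb_output.get("retrievedReferences", []))

for ref in references:
    print(ref.get("metadata", {}).get("x-amz-bedrock-kb-source-uri"))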
Conclusion
This has been an explanation of RAG functionality using KnowledgeBase.
KnowledgeBase and RAG functionality have much to discuss, and it's impossible to cover everything in this blog, so I've only introduced the basics.
If I have time, I'd also like to cover parameter tuning for the OpenSearch index.
I hope this article will be helpful for everyone.