Here's a step-by-step process for using Knowledge Bases for Amazon Bedrock to customize Bedrock with your own data. The code and commands used can be found here. The cost will be under $5, if you remember to clean up at the end!
The architecture looks like this:
Architecture Components
SageMaker Notebook
A SageMaker notebook serves as the IDE (Integrated Development Environment): it's where you run the commands to set everything up and the code that interacts with Bedrock.
S3
The custom data is uploaded to an S3 bucket. After the bucket is created, manually upload any text-based data that you want to work with, e.g. PDF, .csv, Microsoft Word, or .txt files. Sample documents to test with are included in the GitHub repo.
Amazon OpenSearch Serverless
Amazon OpenSearch Serverless (AOSS) is used to create a vector index/data store from the S3 data.
Bedrock Knowledge Base
The Bedrock Knowledge Base is configured to use the AOSS vector index as its data store and will answer prompts based on the provided data.
Prerequisites
1) Do everything in us-west-2.
2) In your AWS account, request access to the Bedrock models that you would like to use. You'll find this in the Bedrock console, under Model access. (For this, I enabled all the Anthropic Claude models.)
3) To create the SageMaker Notebook, first make sure you have a SageMaker Domain in us-west-2. This one-time step creates the home directory space and VPC configuration needed by any Notebooks you create in this Region. If you don't have one already, select the Create Domain option and it will do everything for you.
Next, use this CloudFormation template to create a SageMaker Notebook that we'll run the commands from. The template configures the SageMaker Notebook instance with an associated IAM role that includes permissions for a few required services, including:
- S3 full access
- Bedrock full access
- IAM full access
- Lambda full access
- Amazon OpenSearch Serverless full access
After everything has been configured, the permissions can be tightened up if needed.
4) When the Notebook is ready, select the Notebook instance and choose Open JupyterLab. The GitHub repository will already be downloaded.
5) From the cloned repository, open the file named bedrock_rag_demo.ipynb. This is an interactive Python notebook: each block of code is displayed in a cell, and the cells can be run in sequence to observe the outcome of each step.
6) Run all the cells contained in the .ipynb file, which, at a high level, will do the following:
Install the required libraries: boto3, the AWS SDK for Python used to interact with Bedrock, and opensearch-py, the Python client used to interact with OpenSearch.
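In a notebook cell, that step looks roughly like this (a minimal sketch; the notebook may pin specific versions):

```python
# Install the AWS SDK for Python and the OpenSearch client (run in a notebook cell).
%pip install -q boto3 opensearch-py

import boto3

# Quick sanity check: list the Bedrock foundation models available in us-west-2.
bedrock = boto3.client("bedrock", region_name="us-west-2")
models = bedrock.list_foundation_models()["modelSummaries"]
print(f"{len(models)} foundation models available")
```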
Create an S3 bucket to store our custom data. (Then manually upload the custom data; I just used a PDF containing some text.)
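A minimal sketch of that cell, assuming a placeholder bucket name and sample file:

```python
import boto3

s3 = boto3.client("s3", region_name="us-west-2")
bucket_name = "bedrock-rag-demo-data-<your-account-id>"  # placeholder; bucket names must be globally unique

# Outside us-east-1, a LocationConstraint is required when creating the bucket.
s3.create_bucket(
    Bucket=bucket_name,
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)

# Upload a sample document (or use the S3 console to upload the files from the repo).
s3.upload_file("sample_docs/bobs_pizza_handbook.pdf", bucket_name, "bobs_pizza_handbook.pdf")
```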
Create the OpenSearch Serverless collection, which is a container for OpenSearch indexes.
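Roughly, that looks like the following sketch; the collection and policy names are placeholders, and a real setup also needs network and data access policies:

```python
import json
import boto3

aoss = boto3.client("opensearchserverless", region_name="us-west-2")
collection_name = "bedrock-rag-demo"  # placeholder name

# A vector collection needs an encryption policy before it can be created
# (network and data access policies are also required; omitted here for brevity).
aoss.create_security_policy(
    name=f"{collection_name}-enc",
    type="encryption",
    policy=json.dumps({
        "Rules": [{"ResourceType": "collection", "Resource": [f"collection/{collection_name}"]}],
        "AWSOwnedKey": True,
    }),
)

collection = aoss.create_collection(name=collection_name, type="VECTORSEARCH")
print(collection["createCollectionDetail"]["arn"])
```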
Create the OpenSearch Serverless vector index. This will contain the vector embeddings, i.e. numerical representations of your data, so that the LLM can make sense of your data and understand the meaning it contains.
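A sketch of that step using opensearch-py; the endpoint and field names are placeholders, and the 1536-dimension setting assumes the Titan Text Embeddings model:

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# Sign requests to the collection endpoint with SigV4 (service name "aoss").
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, "us-west-2", "aoss")

client = OpenSearch(
    hosts=[{"host": "<collection-id>.us-west-2.aoss.amazonaws.com", "port": 443}],  # placeholder endpoint
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# The knn_vector field holds the embeddings; text and metadata fields are used by the Knowledge Base.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1536,
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            },
            "text": {"type": "text"},
            "metadata": {"type": "text"},
        }
    },
}
client.indices.create(index="bedrock-kb-index", body=index_body)
```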
Configure the Bedrock Knowledge Base to use the OpenSearch Serverless vector index, with S3 as the data source.
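This is roughly what creating the Knowledge Base and its data source looks like with the bedrock-agent client; the role, collection, and bucket ARNs are placeholders, and the embedding model shown is an assumption:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")

kb = bedrock_agent.create_knowledge_base(
    name="bedrock-rag-demo-kb",
    roleArn="arn:aws:iam::<account-id>:role/<kb-service-role>",  # placeholder service role
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v1"
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "<collection-arn>",  # placeholder
            "vectorIndexName": "bedrock-kb-index",
            "fieldMapping": {"vectorField": "vector", "textField": "text", "metadataField": "metadata"},
        },
    },
)
kb_id = kb["knowledgeBase"]["knowledgeBaseId"]

# Point the Knowledge Base at the S3 bucket that holds the custom documents.
ds = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id,
    name="s3-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::<bucket-name>"},  # placeholder
    },
)
ds_id = ds["dataSource"]["dataSourceId"]
```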
Ingest the data into the Knowledge Base
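A sketch of the ingestion step, with placeholder IDs from the previous steps:

```python
import time
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")
kb_id = "<knowledge-base-id>"  # placeholder, from create_knowledge_base
ds_id = "<data-source-id>"     # placeholder, from create_data_source

# Kick off an ingestion job so the documents in S3 are chunked, embedded, and indexed.
job = bedrock_agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
job_id = job["ingestionJob"]["ingestionJobId"]

# Poll until the job finishes before running any test prompts.
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
    )["ingestionJob"]["status"]
    print(status)
    if status in ("COMPLETE", "FAILED"):
        break
    time.sleep(10)
```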
Testing
Run some prompts to test that the LLM is using the Knowledge Base to answer. Try a prompt whose answer we know it won't find in the custom data. If it's working properly, the model should respond that it doesn't know.
Example Prompts to Try:
If you uploaded the files provided in the repo, try running the following prompts:
- What is the parental leave policy at Bob's Pizza?
- What are the top three most expensive services?
- What is FinOps?
- What is the sick leave policy at Bob's Pizza? (None of the provided documents contain this data, so the model should tell you it doesn't know).
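If you want to run a prompt against the Knowledge Base directly from a notebook cell, a minimal sketch with the bedrock-agent-runtime client looks like this (the Knowledge Base ID is a placeholder, and the model ARN is just one example of an enabled Claude model):

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

response = runtime.retrieve_and_generate(
    input={"text": "What is the parental leave policy at Bob's Pizza?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<knowledge-base-id>",  # placeholder
            "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])
```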
So this is a great way to avoid dreaded hallucinations, improve accuracy by providing correct and up-to-date data, and control the data the model uses.
Cleaning Up to Avoid Charges
After testing, run the last four cells in the notebook to clean up the Knowledge Base, OpenSearch Serverless, S3, and IAM resources and avoid unnecessary charges. Then remember to manually stop and delete the Notebook instance if you no longer need it.
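The cleanup cells roughly correspond to calls like these (a sketch with placeholder IDs; the actual notebook cells may differ):

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")
aoss = boto3.client("opensearchserverless", region_name="us-west-2")
s3 = boto3.resource("s3", region_name="us-west-2")

# Delete the Knowledge Base data source, then the Knowledge Base itself.
bedrock_agent.delete_data_source(knowledgeBaseId="<knowledge-base-id>", dataSourceId="<data-source-id>")
bedrock_agent.delete_knowledge_base(knowledgeBaseId="<knowledge-base-id>")

# Delete the OpenSearch Serverless collection (and any policies you created for it).
aoss.delete_collection(id="<collection-id>")

# Empty and delete the S3 bucket.
bucket = s3.Bucket("<bucket-name>")
bucket.objects.all().delete()
bucket.delete()
```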