
How to Use RAG with Amazon Bedrock + Nova for Building Chatbots

Recently, I have been exploring Retrieval-Augmented Generation (RAG) and building a chatbot for fun. I became interested in RAG because I use AI every day, especially generative AI such as large language models (LLMs). LLMs are very helpful and often produce impressive responses.

However, LLM-generated responses come with a challenge: they are not always grounded in factual information. When you use an LLM to generate a response, it relies only on the data it was trained on.

For example, when you ask, "What food should I eat when I do X?", the result will likely be an impressive, grammatically correct, and logical response, but because it isn't based on factual or contextual data, it may be inaccurate.

In contrast, you can supply relevant, factual context and information so the model generates a contextualized, relevant, and accurate response.

That knowledge can come from any document containing relevant data. For example, you could use a record of the food you eat every day to help answer the prompt "What food should I eat when I do X?" so that the response is grounded in relevant information.

To ensure that an LLM provides accurate and domain-specific responses, you can use Retrieval-Augmented Generation (RAG).

RAG is a process for optimizing the output of a large language model by retrieving information that is relevant to the user's prompt.

RAG has three steps:

  1. Retrieve data based on the user prompt.
  2. Augment the prompt with knowledge data.
  3. Use a language model to generate a response.

By retrieving context from specified data, you ensure that the LLM uses relevant information when responding, rather than relying solely on its training data.

A standard RAG app usually has two main parts:

  • Indexing: a process that takes data from a source, organizes it, and stores it in an index.
  • Retrieval and generation: the RAG workflow itself, where a user’s query is received, the system looks up the most relevant information from the index, and then sends that context to the model to generate a response.

In this exploration, I will build a simple chatbot that can generate answers based on a provided document, using the following tools:

  • LangChain
  • Bedrock Embeddings
  • Amazon Titan embedding model
  • Amazon Nova as the chat model
  • Chroma as the vector DB

So, we will start by indexing. Let's go!

Setup

First, we will set up the tools we will use. Install these packages:
pip install -qU "langchain[aws]" langchain-aws langchain-chroma PyMuPDF langchain-text-splitters langchain-community langgraph
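
The snippets in this post assume your AWS credentials and region are already configured (for example via the AWS CLI or environment variables). If you prefer to be explicit, here is a minimal sketch showing how a region and profile can be passed directly; "us-east-1" and "my-profile" are placeholders, not values from this project:

from langchain_aws import BedrockEmbeddings

# Assumption: replace the region and profile with ones that actually have
# access to Amazon Bedrock in your AWS account.
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name="us-east-1",
    credentials_profile_name="my-profile",
)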

Indexing

After we set up the project, the next step is indexing. Indexing the data has these parts:

  • Load: Start by loading data.
  • Split: Break big documents into smaller pieces.
  • Store: Save those in a storage system that supports indexing and searching. This is often done with a Vector Store combined with an embeddings model.

Write the following code.

# Imports and the COLLECTION, DB_DIR, and PDF_PATH constants are shown in the full script below.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")
vector_store = Chroma(
    collection_name=COLLECTION,
    embedding_function=embeddings,
    persist_directory=DB_DIR,
)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

def index_documents():
    print("📂 Loading and indexing documents...")
    loader = PyMuPDFLoader(PDF_PATH, mode="page")
    docs = loader.load()
    chunks = splitter.split_documents(docs)

    vector_store.add_documents(chunks)
    # langchain-chroma persists automatically when persist_directory is set,
    # so no explicit persist() call is needed here.
    print(f"✅ Indexed {len(chunks)} chunks into {DB_DIR}")

First, we set up embeddings using Amazon’s Titan model. Embeddings are a way to turn text into numbers so the computer can understand and compare meaning.

Then, we create a vector database with Chroma that can store and search those embeddings. We also set up a text splitter so large documents can be cut into smaller, manageable chunks of around 1,000 characters each, with a bit of overlap to keep context intact.
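
To get a feel for these two pieces, here is a small sanity check you can run right after the snippet above (the sample strings are just placeholders):

# Titan Text Embeddings V2 turns a sentence into a vector of 1,024 numbers by default.
vector = embeddings.embed_query("What food should I eat when I run?")
print(len(vector))

# The splitter cuts a long string into overlapping chunks of at most 1,000 characters.
chunks = splitter.split_text("some very long document text... " * 200)
print(len(chunks), len(chunks[0]))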

The index_documents() function creates a loader that reads a PDF file (located at PDF_PATH). The mode="page" option means each page of the PDF is treated as a separate unit.

Then, it splits the documents into smaller parts (chunking) and saves those chunks into the vector database.

At the end, you’ll have your document neatly indexed, so later, when a user asks a question, the system can quickly find the right pieces of text to feed into the AI model.
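
If you want to double-check the result, one quick way (after calling index_documents()) is to query the store directly; the query string below is only an example:

# Print the page number and a short preview of the closest matches from the PDF.
for doc in vector_store.similarity_search("example question about the document", k=2):
    print(doc.metadata.get("page"), doc.page_content[:80])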

Retrieval and Generation

Now, it’s time to build the core application. The goal is to make a simple app that takes a user’s question, finds the most relevant documents, sends both the question and those documents to a model, and then gives back an answer.

For generation, we will use Amazon Nova as the chat model. Write the following code.

llm = init_chat_model("us.amazon.nova-lite-v1:0", model_provider="bedrock_converse")
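
Before wiring the model into the pipeline, you can confirm it is reachable with a one-off call (this assumes your account has been granted access to Nova Lite in your region):

# The chat model accepts a plain string; the reply is a message object whose .content holds the text.
print(llm.invoke("Say hello in one short sentence.").content)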

Then, write RAG pipeline. We will write state for application. Write this code.

from langchain_core.documents import Document
from typing_extensions import List, TypedDict

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

We will keep track of the question, the context (docs we retrieve), and the answer.

Next, write this function.

def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"], k=4)
    return {"context": retrieved_docs}

When a user asks a question, this part looks inside the vector database and pulls out the top 4 most relevant documents. Those docs will become the context for the model to use.
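
If you are curious how relevant each hit actually is, Chroma's LangChain wrapper also has a scored variant, which can help when tuning k; the query string here is a placeholder:

# similarity_search_with_score returns (document, distance) pairs; a lower distance means a closer match.
for doc, score in vector_store.similarity_search_with_score("your question here", k=4):
    print(round(score, 3), doc.page_content[:60])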

After that, write the system prompt and create the generate function.

SYSTEM_PROMPT = "Your system prompt"

def generate(state: State):
    if not state["context"]:
        return {"answer": "Sorry, I can't answer that."}

    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    user_payload = f"""Context: {docs_content} Question: {state['question']} Answer:"""
    messages = [
        SystemMessage(content=SYSTEM_PROMPT),
        HumanMessage(content=user_payload)
    ]
    response = llm.invoke(messages)
    return {"answer": response.content}

Lastly, we compile our application into a single graph object. In this case, we are just connecting the retrieval and generation steps into a single sequence.

graph_builder = StateGraph(State)
graph_builder.add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
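
Once compiled, the graph is invoked with the initial state and returns the final state, so asking a question end to end looks like this (the question is only a placeholder):

result = graph.invoke({"question": "What does the document say about X?"})
print(result["answer"])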

This is the complete code.

import argparse
import os
import textwrap
import langchain
from typing_extensions import List, TypedDict

from langchain_aws import BedrockEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_chroma import Chroma
from langchain.chat_models import init_chat_model
from langchain.schema import SystemMessage, HumanMessage
from langgraph.graph import START, StateGraph


# ===================================================================
# Global Config
# ===================================================================
langchain.verbose = False
langchain.debug = False
langchain.llm_cache = None

DB_DIR = "./chroma_langchain_db"
COLLECTION = "example_collection"
PDF_PATH = "docs/ilovepdf_merged.pdf"

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")
vector_store = Chroma(
    collection_name=COLLECTION,
    embedding_function=embeddings,
    persist_directory=DB_DIR,
)

llm = init_chat_model("us.amazon.nova-lite-v1:0", model_provider="bedrock_converse")

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)


# ===================================================================
# Indexing Function
# ===================================================================
def index_documents():
    print("📂 Loading and indexing documents...")
    loader = PyMuPDFLoader(PDF_PATH, mode="page")
    docs = loader.load()
    chunks = splitter.split_documents(docs)

    vector_store.add_documents(chunks)
    # langchain-chroma persists automatically when persist_directory is set,
    # so no explicit persist() call is needed here.
    print(f"✅ Indexed {len(chunks)} chunks into {DB_DIR}")


# ===================================================================
# RAG Pipeline
# ===================================================================
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"], k=4)
    return {"context": retrieved_docs}


SYSTEM_PROMPT = "Your system prompt"


def generate(state: State):
    if not state["context"]:
        return {"answer": "Sorry, I can't answer that."}

    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    user_payload = f"""Context: {docs_content} Question: {state['question']} Answer:"""
    messages = [
        SystemMessage(content=SYSTEM_PROMPT),
        HumanMessage(content=user_payload)
    ]
    response = llm.invoke(messages)
    return {"answer": response.content}

graph_builder = StateGraph(State)
graph_builder.add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

def query_pipeline(question: str):
    response = graph.invoke({"question": question})
    print(f"Question: {question}\nAnswer: {response['answer']}")


# ===================================================================
# CLI Entrypoint
# ===================================================================
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--index", action="store_true", help="Index documents into vector DB")
    parser.add_argument("--query", type=str, help="Run a query against the DB")
    args = parser.parse_args()

    if args.index:
        index_documents()
    elif args.query:
        query_pipeline(args.query)
    else:
        parser.print_help()

You can test it by giving it a prompt and asking questions related to the information in the documents.
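
For example, assuming you saved the script as rag_chatbot.py (the filename is up to you), index the PDF first and then run a query:

python rag_chatbot.py --index
python rag_chatbot.py --query "What food should I eat when I run?"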

If you encounter an error, make sure you have access to Amazon Bedrock and to the Amazon Titan and Nova models.

Thanks for reading.

Want to connect?
Arsy Opraza, Curriculum Developer at Dicoding.
https://www.linkedin.com/in/arasopraza/
https://github.com/arasopraza
https://twitter.com/arsyopraza
