<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tim Schill</title>
    <description>The latest articles on DEV Community by Tim Schill (@tim_schill).</description>
    <link>https://dev.to/tim_schill</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1198975%2F383ea3ea-ca58-4884-88b9-d6a95b05fee0.jpg</url>
      <title>DEV Community: Tim Schill</title>
      <link>https://dev.to/tim_schill</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tim_schill"/>
    <language>en</language>
    <item>
      <title>How to use rerank models in Amazon Bedrock</title>
      <dc:creator>Tim Schill</dc:creator>
      <pubDate>Fri, 27 Dec 2024 16:45:35 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-to-use-rerank-models-in-amazon-bedrock-5dm3</link>
      <guid>https://dev.to/aws-builders/how-to-use-rerank-models-in-amazon-bedrock-5dm3</guid>
      <description>&lt;p&gt;During re:Invent 2024 it was announced that Amazon Bedrock now have support for a new kind of model — the reranker.&lt;/p&gt;

&lt;p&gt;In a typical RAG workflow, the user query is converted into a vector with the help of an embedding model. The vector is then used to search a vector database, and the results are passed back as context in the prompt that is fed to an LLM. In a Proof of Concept, the amount of data being searched and returned is often so small that this workflow works just fine. But in production the amount of data is probably much bigger, and the results may not be as relevant.&lt;/p&gt;
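The retrieval step described above can be sketched in plain Python. This is a toy illustration only: the three-dimensional vectors below are invented, where a real system would call an embedding model and query a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings; a real system would get these from an embedding model
index = {
    "Stockholm is the capital of Sweden": [0.9, 0.1, 0.0],
    "Bangkok is the capital of Thailand": [0.1, 0.9, 0.2],
    "Oslo is the capital of Norway": [0.3, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # would be the embedding of the user query

# Rank every document by similarity to the query, best match first
results = sorted(
    index, key=lambda doc: cosine_similarity(index[doc], query_vector), reverse=True
)
print(results[0])  # -> Stockholm is the capital of Sweden
```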

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ri7xv66nnkbn0ngim6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ri7xv66nnkbn0ngim6x.png" alt="Rerank in a RAG workflow" width="720" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  1. Reranker to the rescue
&lt;/h2&gt;

&lt;p&gt;This is where a reranker, or rerank model, comes into the picture. It acts as a second filter between the vector search and the prompt. Let’s say we want to return up to 50 results from our vector database; most likely, not all 50 of those results are relevant to the user’s question.&lt;/p&gt;

&lt;p&gt;So we take the 50 results, pass them to our rerank model, and tell it we want the 5 closest matches back. The rerank model evaluates the 50 documents against the user query and returns the 5 highest-scoring documents (scores range from 0 to 1, where closer to 1 means a better match). We can then add these 5 documents to our prompt as context and answer the user’s question with the help of an ordinary LLM. The result should, in theory, be more precise and accurate.&lt;/p&gt;
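Conceptually, the filtering a reranker performs boils down to scoring and truncation. A minimal sketch (the scores below are invented; in reality the rerank model produces them):

```python
def rerank_top_n(documents, scores, top_n):
    """Return the top_n documents ordered by descending relevance score."""
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

docs = ["doc-a", "doc-b", "doc-c", "doc-d", "doc-e"]
scores = [0.12, 0.91, 0.33, 0.87, 0.05]  # invented; a rerank model would produce these

print(rerank_top_n(docs, scores, 3))  # -> ['doc-b', 'doc-d', 'doc-c']
```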

&lt;p&gt;Amazon Bedrock currently supports two rerank models, Amazon Rerank 1.0 and Cohere Rerank 3.5. Below I will show you how you can work with them through two available methods in Bedrock’s API.&lt;/p&gt;

&lt;h2&gt;
  2. Our Options
&lt;/h2&gt;

&lt;p&gt;The first option is the standard “invoke_model” method in the bedrock-runtime API, and the second is the “rerank” method available in the bedrock-agent-runtime API.&lt;/p&gt;

&lt;p&gt;The bedrock-agent-runtime endpoint is, as you can probably guess, meant to be used as part of an agentic workflow and with Amazon Bedrock Knowledge Bases (another managed feature in Bedrock), but it can be used without involving either of those features.&lt;/p&gt;

&lt;p&gt;The reason I will show you both is that I found each of them to have its own strengths and weaknesses. For example, the “invoke_model” option is more straightforward: it allows us to pass documents in the form of strings or JSON. But it won’t let us (at least to my knowledge) use rank_fields, which can be super effective if you are working with JSON documents. With rank_fields you basically tell the rerank model which keys it should take into consideration while scoring the documents.&lt;/p&gt;
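As a rough mental model (not the actual implementation), rank_fields restricts which keys of a JSON document contribute to the text being scored:

```python
def text_for_scoring(doc, rank_fields):
    """Join only the whitelisted fields into the text the model would score."""
    return " ".join(str(doc[field]) for field in rank_fields if field in doc)

# "internal_id" is noise we do not want influencing the relevance score
doc = {"city": "Stockholm", "country": "Sweden", "internal_id": "x-123"}

print(text_for_scoring(doc, ["city", "country"]))  # -> Stockholm Sweden
```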

&lt;p&gt;The “rerank” method in bedrock-agent-runtime, on the other hand, requires a more complex payload, and you can’t mix and match TEXT (strings) and JSON, but you can leverage rank_fields.&lt;/p&gt;

&lt;p&gt;So let’s get started; the first approach in this demonstration will be the invoke_model path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example 1: Using invoke_model with mixed string and JSON documents
import boto3
import json

bedrock_client = boto3.client("bedrock-runtime")

# Prepare the documents, mixing strings and dictionaries
documents = [
    "Stockholm is the capital of Sweden",  # String
    {"title": "Bangkok", "text": "Bangkok is the capital of Thailand"},  # Dictionary
    "Oslo is the capital of Norway",  # String
    "Udon Thani is a city in Thailand",  # String
    "Kuala Lumpur is the capital of Malaysia",  # String
    {"title": "Vientiane", "text": "Vientiane is the capital of Laos"},  # Dictionary
]

# Serialize dictionaries to JSON strings to simulate mixed inputs
serialized_documents = [
    json.dumps(doc) if isinstance(doc, dict) else doc for doc in documents
]

# Construct the body
body = json.dumps(
    {
        "query": "What are three cities in south east asia?",
        "documents": serialized_documents,
        "top_n": 3,
        "api_version": 2,
    }
)

# Invoke the Bedrock model
response = bedrock_client.invoke_model(
    modelId="cohere.rerank-v3-5:0",
    accept="application/json",
    contentType="application/json",
    body=body,
)

# Process the response
response_body = json.loads(response.get("body").read())
print(response_body)

# Extract the indices of the matching documents
matching_indices = [result["index"] for result in response_body["results"]]

# Retrieve and print the matching documents
matching_documents = [documents[i] for i in matching_indices]

print("Matching Documents:")
for doc in matching_documents:
    # Deserialize JSON strings back to dictionaries for readability
    if isinstance(doc, str) and doc.startswith("{") and doc.endswith("}"):
        doc = json.loads(doc)
    print(doc)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output for the query “What are three cities in south east asia?”&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{'results': [
  {'index': 1, 'relevance_score': 0.20046411},
  {'index': 4, 'relevance_score': 0.19896571},
  {'index': 5, 'relevance_score': 0.18133602}
]}

Matching Documents:
{'title': 'Bangkok', 'text': 'Bangkok is the capital of Thailand'}
Kuala Lumpur is the capital of Malaysia
{'title': 'Vientiane', 'text': 'Vientiane is the capital of Laos'}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second approach with “rerank” method in the bedrock-agent-runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example 2: Using rerank with JSON documents and rank_fields
import boto3
import json

bedrock_client = boto3.client("bedrock-agent-runtime")

payload = {
    "queries": [
        {
            "textQuery": {"text": "What is the capital of Sweden?"},
            "type": "TEXT",
        }
    ],
    "rerankingConfiguration": {
        "bedrockRerankingConfiguration": {
            "modelConfiguration": {
                "modelArn": "arn:aws:bedrock:eu-central-1::foundation-model/amazon.rerank-v1:0",
                "additionalModelRequestFields": {"rank_fields": ["city", "text"]},
            },
            "numberOfResults": 2,
        },
        "type": "BEDROCK_RERANKING_MODEL",
    },
    "sources": [
        {
            "inlineDocumentSource": {
                "jsonDocument": {
                    "city": "Stockholm",
                    "text": "Stockholm is the capital of Sweden.",
                },
                "type": "JSON",
            },
            "type": "INLINE",
        },
        {
            "inlineDocumentSource": {
                "jsonDocument": {
                    "city": "Bangkok",
                    "text": "Bangkok is the capital of Thailand.",
                },
                "type": "JSON",
            },
            "type": "INLINE",
        },
        {
            "inlineDocumentSource": {
                "jsonDocument": {
                    "city": "New Stockholm",
                    "text": "New Stockholm is a city in USA.",
                },
                "type": "JSON",
            },
            "type": "INLINE",
        },
    ],
}

# Invoke the rerank API with the payload constructed above
response_body = bedrock_client.rerank(**payload)

print(response_body["results"])

# Extract the indices of the matching documents
matching_indices = [result["index"] for result in response_body["results"]]

# Retrieve and print the matching documents
matching_documents = [payload["sources"][i] for i in matching_indices]

print("Matching Documents:")
for doc in matching_documents:
    # Each matching source is already a dictionary, so print it as-is
    print(doc)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output from the query “What is the capital of Sweden?”&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {'index': 0, 'relevanceScore': 0.981979250907898},
  {'index': 2, 'relevanceScore': 0.006241259165108204}
]

Matching Documents:
{'inlineDocumentSource': {'jsonDocument': {'city': 'Stockholm', 'text': 'Stockholm is the capital of Sweden.'}, 'type': 'JSON'}, 'type': 'INLINE'}
{'inlineDocumentSource': {'jsonDocument': {'city': 'New Stockholm', 'text': 'New Stockholm is a city in USA.'}, 'type': 'JSON'}, 'type': 'INLINE'}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I must say that I favour the bedrock-agent-runtime approach because it can leverage “rank_fields”, but it does add some complexity to the payload being sent to Bedrock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Overall I am really happy to see Amazon Bedrock implement support for not one but two great rerank models. I know this is a feature many people have been waiting for, and I hope people start to implement these kinds of models in their RAG workflows, because they can make a big difference.&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>aws</category>
      <category>rerank</category>
      <category>rag</category>
    </item>
    <item>
      <title>How to Build Chatbots with Amazon Bedrock &amp; LangChain</title>
      <dc:creator>Tim Schill</dc:creator>
      <pubDate>Wed, 01 Nov 2023 08:29:47 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-to-build-chatbots-with-amazon-bedrock-langchain-517a</link>
      <guid>https://dev.to/aws-builders/how-to-build-chatbots-with-amazon-bedrock-langchain-517a</guid>
      <description>&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; was released on the 28th of September, and I have been fortunate enough to have had access to it for some time now while it has been in closed preview.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are reading this article, I’m sure you already know about Amazon Bedrock; if not, let’s summarize it quickly.&lt;/strong&gt; Amazon Bedrock is a fully managed service that gives you access to foundation models (FMs). Being serverless means there is no infrastructure to manage. However, you still have access to some powerful features (for some models), like fine-tuning with your own data and agents, which you can think of as tools you create yourself to supercharge your FM with abilities it usually doesn’t have. For example, you could create an agent to query a database or talk with an external API.&lt;/p&gt;

&lt;p&gt;You can also create context-aware applications with the help of Retrieval-Augmented Generation (RAG). We will not dive deeper than that in this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Getting started with Bedrock is simple, but to utilize its full power, you can add &lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; to the mix. LangChain is a framework that simplifies the creation of applications using large language models (LLMs). Its power lies in its ability to “chain” or combine multiple components.&lt;/p&gt;

&lt;p&gt;In this example, we will create a chatbot with the help of Streamlit, LangChain, and its classes “ConversationChain” and “ConversationBufferMemory.”&lt;/p&gt;

&lt;p&gt;First, we create a new Python file. Let’s call it “bedrock.py”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import boto3
from langchain.chains import ConversationChain
from langchain.llms.bedrock import Bedrock
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We import some libraries from LangChain. Let’s quickly go through them. Bedrock will allow us to create an object with details about what FM we want to use and configure model parameters, authentication, etc. PromptTemplate will enable us to create prompts we can ingest variables into, similar to Python f-strings.&lt;/p&gt;

&lt;p&gt;ConversationBufferMemory allows us to take control of the history or memory efficiently. And finally, ConversationChain will tie all these objects together in a chain.&lt;/p&gt;
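As a rough analogy, PromptTemplate works much like Python’s built-in string formatting: named placeholders in the template are filled in when the chain runs. A plain-Python sketch using the same placeholders our template will use:

```python
# The same placeholders our PromptTemplate uses: {history} and {input}
template = (
    "System: The following is a friendly conversation between a "
    "knowledgeable helpful assistant and a customer.\n\n"
    "Current conversation:\n{history}\n\n"
    "User: {input}\nBot:"
)

# str.format fills the placeholders, just as the chain does at run time
prompt = template.format(history="User: Hi\nBot: Hello!", input="What is Bedrock?")
print(prompt)
```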

&lt;p&gt;Next we will create the chain function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def bedrock_chain():
    profile = os.environ["AWS_PROFILE"]

    bedrock_runtime = boto3.client(
        service_name="bedrock-runtime",
        region_name="us-east-1",
    )

    titan_llm = Bedrock(
        model_id="amazon.titan-text-express-v1", client=bedrock_runtime, credentials_profile_name=profile
    )
    titan_llm.model_kwargs = {"temperature": 0.5, "maxTokenCount": 700}

    prompt_template = """System: The following is a friendly conversation between a knowledgeable helpful assistant and a customer.
    The assistant is talkative and provides lots of specific details from its context.

    Current conversation:
    {history}

    User: {input}
    Bot:"""
    PROMPT = PromptTemplate(
        input_variables=["history", "input"], template=prompt_template
    )

    memory = ConversationBufferMemory(human_prefix="User", ai_prefix="Bot")
    conversation = ConversationChain(
        prompt=PROMPT,
        llm=titan_llm,
        verbose=True,
        memory=memory,
    )

    return conversation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our “bedrock_chain” function will create our Bedrock object. In our case, we use the Titan Text G1 — Express model. We then make our prompt template. Our prompt holds three essential parts: the instruction, the context (history), and our user query (input). We then configure the memory attribute with the help of “ConversationBufferMemory.” And finally, we put all this together by creating a “ConversationChain” object.&lt;/p&gt;

&lt;p&gt;And we will end with creating two functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def run_chain(chain, prompt):
    num_tokens = chain.llm.get_num_tokens(prompt)
    return chain({"input": prompt}), num_tokens


def clear_memory(chain):
    return chain.memory.clear()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first one, run_chain, will be used when we call our chain from our Streamlit app. And clear_memory is pretty obvious; it empties our history.&lt;/p&gt;

&lt;p&gt;We now have everything we need to communicate with Amazon Bedrock. In the next step, we will create our &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; app.&lt;/p&gt;

&lt;p&gt;We start with creating a new file called “app.py” and import a couple of libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st
import uuid
import bedrock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is nothing strange here. Streamlit is needed, uuid is used to handle user sessions, and bedrock is the file we created before.&lt;/p&gt;

&lt;p&gt;Now, let’s configure our session_state in Streamlit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USER_ICON = "images/user-icon.png"
AI_ICON = "images/ai-icon.png"

if "user_id" in st.session_state:
    user_id = st.session_state["user_id"]
else:
    user_id = str(uuid.uuid4())
    st.session_state["user_id"] = user_id

if "llm_chain" not in st.session_state:
    st.session_state["llm_app"] = bedrock
    st.session_state["llm_chain"] = bedrock.bedrock_chain()

if "questions" not in st.session_state:
    st.session_state.questions = []

if "answers" not in st.session_state:
    st.session_state.answers = []

if "input" not in st.session_state:
    st.session_state.input = ""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we create a function to create our top bar, add a button for clearing the chat, and an if-statement with some clearing functionality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def write_top_bar():
    col1, col2, col3 = st.columns([2, 10, 3])
    with col2:
        header = "Amazon Bedrock Chatbot"
        st.write(f"&amp;lt;h3 class='main-header'&amp;gt;{header}&amp;lt;/h3&amp;gt;", unsafe_allow_html=True)
    with col3:
        clear = st.button("Clear Chat")

    return clear


clear = write_top_bar()

if clear:
    st.session_state.questions = []
    st.session_state.answers = []
    st.session_state.input = ""
    bedrock.clear_memory(st.session_state["llm_chain"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now create the main function for handling input from the user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def handle_input():
    input = st.session_state.input

    llm_chain = st.session_state["llm_chain"]
    chain = st.session_state["llm_app"]
    result, amount_of_tokens = chain.run_chain(llm_chain, input)
    question_with_id = {
        "question": input,
        "id": len(st.session_state.questions),
        "tokens": amount_of_tokens,
    }
    st.session_state.questions.append(question_with_id)

    st.session_state.answers.append(
        {"answer": result, "id": len(st.session_state.questions)}
    )
    st.session_state.input = ""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is where we grab our chain from the session state, run it, and store the result. Next is to create a couple of functions to render the question, the answer, and our history.&lt;/p&gt;

&lt;p&gt;And finally, we call all the functions and add our input form.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def write_user_message(md):
    col1, col2 = st.columns([1, 12])

    with col1:
        st.image(USER_ICON, use_column_width="always")
    with col2:
        st.warning(md["question"])
        st.write(f"Tokens used: {md['tokens']}")


def render_answer(answer):
    col1, col2 = st.columns([1, 12])
    with col1:
        st.image(AI_ICON, use_column_width="always")
    with col2:
        st.info(answer["response"])


def write_chat_message(md):
    chat = st.container()
    with chat:
        render_answer(md["answer"])


with st.container():
    for q, a in zip(st.session_state.questions, st.session_state.answers):
        write_user_message(q)
        write_chat_message(a)


st.markdown("---")
input = st.text_input(
    "You are talking to an AI, ask any question.", key="input", on_change=handle_input
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it; we can now start our application by typing “streamlit run app.py” and start chatting.&lt;/p&gt;

&lt;p&gt;Our chatbot built with Streamlit, LangChain and Amazon Bedrock&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3d26b65a6oz4h1i2hd7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3d26b65a6oz4h1i2hd7.png" alt="Our chatbot built with Streamlit, LangChain and Amazon Bedrock" width="720" height="665"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next?
&lt;/h2&gt;

&lt;p&gt;I recommend you look deeper at LangChain if you are not already familiar with it. You can also look at the &lt;a href="https://github.com/aws-samples" rel="noopener noreferrer"&gt;aws-samples&lt;/a&gt; GitHub page; they have some great examples to get you started. For example, you could add &lt;a href="https://aws.amazon.com/kendra/" rel="noopener noreferrer"&gt;Amazon Kendra&lt;/a&gt; to the mix. Connect it with one of its many sources, like Atlassian Confluence, and set up LangChain to utilize the Kendra retriever. You then have a chatbot that can answer questions based on the context it grabs from your internal Confluence wiki pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  In Conclusion
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock is super easy to start with; it does not require many lines of code, and with the help of LangChain, you can create some really powerful applications. And if you add agents to the mix, the possibilities are almost limitless.&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>langchain</category>
      <category>python</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
