Building a Friends-Themed Chatbot: Exploring Amazon Bedrock for Dialogue Refinement

Hi Everyone,

While browsing datasets on Kaggle, I came across one that provides dialogues from the Friends series, labeled by character.

Friends Sitcom Dataset

The dialogues in the dataset brought back the fun I had watching Friends, and that gave me the idea of building a chatbot from this dataset.

Initial Thought Process: The initial plan was to divide the dialogues by character, generate an embedding for each dialogue, store the embeddings in OpenSearch, query them with the user prompt, and return the most suitable dialogue from the index.
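To make the retrieval idea concrete, here is a minimal sketch of a nearest-dialogue lookup over toy vectors. The three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are assumptions for illustration only. Real embeddings have many more dimensions, and the similarity measure an OpenSearch index actually uses depends on how the index is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query_vec, indexed):
    """Return the (dialogue, vector) pair whose vector is closest to query_vec."""
    return max(indexed, key=lambda item: cosine_similarity(query_vec, item[1]))

# Toy vectors standing in for real embeddings (values are made up).
toy_index = [
    ("Joey doesn't share food!", [0.9, 0.1, 0.0]),
    ("Spring vacation.", [0.0, 0.2, 0.9]),
]

best_dialogue, _ = most_similar([0.8, 0.2, 0.1], toy_index)
print(best_dialogue)
```

The lookup always returns something, even when nothing in the index is a good match, which is one way to picture the gap described in the next paragraph.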

Challenges: Following that plan, I converted each dialogue into an embedding using an Amazon Bedrock model and stored the embeddings in OpenSearch. However, when querying them, there was often a big gap between the user prompt and the returned dialogue.

Solution: Even though the search finds the most relevant dialogue in the dataset for the user prompt, the result sometimes reads as completely unrelated. So I thought of adding one more Bedrock model to refine the retrieved dialogue and provide a relevant response.

Final Conclusion: In the end, after querying a similar dialogue, I pass it to a Bedrock text model that refines it into a relevant response without changing the tone of the dialogue. I gave this model its context through a prompt that includes a few examples.

Finally, the bot came out in good shape (to my knowledge 😁).

You can access the bot using this link. Give it a try with your input. I am open to suggestions, so feel free to comment.

Update: The link has been removed for now to reduce cost.

Step-by-step implementation:

  • Refine the dataset and store the dialogues character-wise

  • Generate embeddings and store them in OpenSearch

  • Query the OpenSearch index and refine the received dialogues using the Titan Model

  • Deploy a Front-End application to chat

Refine the dataset and store the dialogues character-wise:

  • Download the dataset from Kaggle using the link shared above

  • Extract the zip file. It contains 3 files; we are going to use the friends.csv file

  • Use the script below to divide the dialogues character-wise and store them in a folder

    import pandas as pd
    import os

    df = pd.read_csv('friends.csv')

    refined_df = df[['text','speaker']]

    characters = ['Monica Geller', 'Joey Tribbiani', 'Chandler Bing', 'Phoebe Buffay', 'Ross Geller', 'Rachel Green']

    output_dir = "char_wise_dialogs"

    os.makedirs(output_dir, exist_ok=True)

    for character in characters:
        char_dialogs = refined_df[refined_df['speaker'] == character]

        file_name = f"{character.replace(' ','_')}_dialogues.csv"
        output_file = os.path.join(output_dir, file_name)

        char_dialogs.to_csv(output_file, index=False)
        print(f"Saved {character}'s dialogues to {output_file}")
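The split above relies on exact matches against the `speaker` column, so it can help to check how the names are spelled before filtering. Below is a minimal sketch using only the standard library. The helper name `count_dialogues_by_speaker` and the sample rows are made up for illustration; only the `text` and `speaker` column names come from the script above.

```python
import csv
import io
from collections import Counter

def count_dialogues_by_speaker(csv_file):
    """Count rows per value of the 'speaker' column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["speaker"] for row in reader)

# Tiny in-memory sample with the same two columns (rows are made up).
sample = io.StringIO(
    "text,speaker\n"
    "How you doin'?,Joey Tribbiani\n"
    "Hi!,Rachel Green\n"
    "Pizza!,Joey Tribbiani\n"
)

counts = count_dialogues_by_speaker(sample)
print(counts.most_common())
```

With the real file, the same function could be called inside `with open("friends.csv", newline="", encoding="utf-8") as f:`. A character whose count comes back as zero would point to a spelling mismatch in the `characters` list.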

Generate embeddings and store them in OpenSearch:

  • Visit the OpenSearch service and create a domain with instance type with 10GB of Storage in a single AZ

  • Make the OpenSearch domain public and create a master user for login

  • Use the script below to iterate through the dialogues, generate embeddings, and store them in an index

  • We will be using the model amazon.titan-embed-text-v2:0
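As a side note on the request format, here is a small sketch of how a request body for this model could be built and validated before it is sent. The helper name `build_embedding_request` is hypothetical. The optional `dimensions` and `normalize` fields reflect my understanding of the Titan Text Embeddings V2 request format, so please check them against the current Bedrock documentation. Whatever dimension is chosen has to match the `dimension` of the `knn_vector` field in the index mapping.

```python
import json

# Output sizes I understand Titan Text Embeddings V2 to support (assumption).
ALLOWED_DIMENSIONS = (256, 512, 1024)

def build_embedding_request(text, dimensions=1024, normalize=True):
    """Return the JSON request body for one piece of text."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if dimensions not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimensions must be one of {ALLOWED_DIMENSIONS}")
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": normalize,
    })

request_body = build_embedding_request("Joey doesn't share food!")
print(request_body)
```

Rejecting empty text early avoids spending a model call on rows in the CSV that have no dialogue.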

    import boto3
    import pandas as pd
    import os
    import json
    from opensearchpy import OpenSearch, RequestsHttpConnection, helpers

    # AWS OpenSearch domain details
    OPENSEARCH_HOST = "open search endpoint without https"  # Replace with your endpoint
    INDEX_NAME = "friends-dialogues"

    # Initialize OpenSearch client
    client = OpenSearch(
        hosts=[{'host': OPENSEARCH_HOST, 'port': 443}],
        http_auth=('admin', '******'),  # Replace with your OpenSearch credentials
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

    # Initialize Bedrock client
    bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')  # Replace with your region

    # Folder containing dialogues
    input_folder = "char_wise_dialogs"

    # Batch size for processing
    BATCH_SIZE = 20

    # Function to generate an embedding using Bedrock
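    def generate_embedding(text):
        payload = {
            "inputText": text
        }
        response = bedrock_client.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps(payload)
        )
        response_body = json.loads(response['body'].read())
        return response_body.get('embedding')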

    # Function to index documents in bulk in OpenSearch
    def bulk_index_documents(batch):
        actions = [
            {
                "_index": INDEX_NAME,
                "_source": {
                    "character": doc["character"],
                    "dialogue": doc["dialogue"],
                    "embedding": doc["embedding"]
                }
            }
            for doc in batch
        ]
        helpers.bulk(client, actions)

    # Create the index in OpenSearch (if not already created)
    if not client.indices.exists(index=INDEX_NAME):
        client.indices.create(index=INDEX_NAME, body={
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 1,
                "index": {
                    "knn": True  # Enable kNN search for this index
                }
            },
            "mappings": {
                "properties": {
                    "character": {"type": "keyword"},
                    "dialogue": {"type": "text"},
                    "embedding": {
                        "type": "knn_vector",
                        "dimension": 1024  # Replace with the embedding size
                    }
                }
            }
        })
        print(f"Created index with knn_vector: {INDEX_NAME}")

    # Process each character file
    for file_name in os.listdir(input_folder):
        if file_name.endswith('.csv'):
            # Read character dialogues
            character_file = os.path.join(input_folder, file_name)
            df = pd.read_csv(character_file)

            # Process in batches
            batch = []
            for index, row in df.iterrows():
                dialogue = row['text']
                character = row['speaker']

                try:
                    # Generate embedding for each dialogue
                    embedding = generate_embedding(dialogue)
                    batch.append({"dialogue": dialogue, "character": character, "embedding": embedding})

                    # Process the batch if it reaches the batch size
                    if len(batch) == BATCH_SIZE:
                        # Bulk index the batch into OpenSearch
                        bulk_index_documents(batch)
                        print(f"Indexed batch of size {len(batch)}")
                        batch = []  # Reset the batch
                except Exception as e:
                    # str() guards against non-string values such as NaN
                    print(f"Error processing dialogue: {str(dialogue)[:50]} - {e}")

            # Process any remaining documents in the last batch
            if batch:
                bulk_index_documents(batch)
                print(f"Indexed remaining batch of size {len(batch)}")
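The script above handles full batches and the leftover batch in two separate places. As an alternative sketch, a small generator can yield both kinds of batch from a single loop. The helper name `in_batches` is hypothetical and not part of the original script.

```python
from itertools import islice

def in_batches(iterable, size):
    """Yield lists of up to `size` items; the last list may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

# 45 placeholder items split with the same batch size the script uses.
batch_sizes = [len(chunk) for chunk in in_batches(range(45), 20)]
print(batch_sizes)  # prints [20, 20, 5]
```

With a helper like this, the indexing loop could call the bulk function once per yielded batch, so the "remaining documents" branch would no longer be needed.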

Query the OpenSearch index and refine the received dialogues using the Titan Model:

  • Once the index has our data, let’s create a script to query the index

  • Create a Lambda function with Python 3.9

  • Copy and paste the following code in the Lambda function and provide the necessary permissions

  • This script will query similar dialogues from the index and pass the received dialogue to the next model

  • We will be using the amazon.titan-text-express-v1 model to refine the dialogue and add some relevant data to match the user prompt

  • Once the Lambda is ready, create an API in API Gateway and add a POST method for sending the user message
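Before wiring up API Gateway, it can be useful to see the shape of the event the function receives. The sketch below assumes a Lambda proxy integration, where the request body arrives as a JSON string under `event["body"]` with a `message` field, which matches how the handler code reads it. The helper name `parse_message` is hypothetical.

```python
import json

def parse_message(event):
    """Return the trimmed 'message' from a proxy-style event, or None if absent."""
    raw_body = event.get("body") or "{}"
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return None
    message = body.get("message") if isinstance(body, dict) else None
    if isinstance(message, str) and message.strip():
        return message.strip()
    return None

# A hand-built event for local testing (not a full API Gateway event).
sample_event = {"body": json.dumps({"message": "What's your favorite food?"})}
print(parse_message(sample_event))
```

Returning `None` for a missing or malformed body lets the handler answer with a clear client error instead of falling into the generic exception path.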

    import boto3
    import json
    from opensearchpy import OpenSearch, RequestsHttpConnection

    # OpenSearch configuration
    OPENSEARCH_HOST = "open search endpoint without https"
    INDEX_NAME = "friends-dialogues"

    # Initialize OpenSearch client
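    client = OpenSearch(
        hosts=[{'host': OPENSEARCH_HOST, 'port': 443}],
        http_auth=('admin', '******'),  # Replace with your OpenSearch credentials
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )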

    # Bedrock clients for embedding and refinement
    bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')

    # Function to generate embedding for user input
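    def generate_embedding(text):
        payload = {"inputText": text}
        response = bedrock_client.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps(payload)
        )
        response_body = json.loads(response['body'].read())
        return response_body.get('embedding')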

    # Function to query OpenSearch for similar dialogues
    def query_opensearch(user_embedding):
        query = {
            "size": 1,
            "query": {
                "knn": {
                    "embedding": {
                        "vector": user_embedding,
                        "k": 1
                    }
                }
            }
        }
        response = client.search(index=INDEX_NAME, body=query)
        hits = response["hits"]["hits"]
        if hits:
            return hits[0]["_source"]
        return None

    def refine_response(user_prompt, character, retrieved_dialogue):
        # Construct a guided and controlled prompt
        prompt = (
            f"You are an assistant generating responses for a Friends-themed chatbot. Your task is to:\n"
            f"1. Respond in the tone and style of the specified character.\n"
            f"2. Avoid adding irrelevant details or extra sentences.\n"
            f"3. Ensure responses are casual and character-specific.\n"
            f"4. Exclude any metadata or instructional text in the response.\n\n"
            f"- User Prompt: \"What's your favorite food?\"\n"
            f"  Character: Joey Tribbiani\n"
            f"  Dialogue: \"Joey doesn't share food!\"\n"
            f"  Response: \"Joey doesn't share food! But I do love a big meatball sub.\"\n\n"
            f"- User Prompt: \"Let's go for a vacation.\"\n"
            f"  Character: Ross Geller\n"
            f"  Dialogue: \"Spring vacation.\"\n"
            f"  Response: \"Spring vacation! I’ll pack my fossils!\"\n\n"
            f"User Prompt: {user_prompt}\n"
            f"Retrieved Dialogue: \"{retrieved_dialogue}\"\n"
            f"Character: {character}\n\n"
            f"Now, generate a response as the specified character, ensuring it aligns with the dialogue and the user's prompt."
        )

        payload = {"inputText": prompt}

        try:
            # Invoke the Titan Text G1 - Express model
            response = bedrock_client.invoke_model(
                modelId="amazon.titan-text-express-v1",
                contentType="application/json",
                accept="application/json",
                body=json.dumps(payload)
            )
            response_body = json.loads(response['body'].read())
            generated_response = response_body['results'][0]['outputText']

            # Post-process the response
            # 1. Remove metadata or prompt details
            if "User Prompt" in generated_response:
                generated_response = generated_response.split("User Prompt")[0].strip()

            # 2. Limit response length
            max_length = 150
            if len(generated_response) > max_length:
                generated_response = generated_response[:max_length].rsplit(" ", 1)[0] + "..."

            # 3. Ensure relevance: Fallback to retrieved dialogue if response is invalid
            if not generated_response or "irrelevant" in generated_response.lower():
                return retrieved_dialogue

            return generated_response

        except Exception as e:
            print(f"Error refining response: {e}")
            # Fallback to the retrieved dialogue in case of an error
            return retrieved_dialogue

    # Lambda function handler
    def lambda_handler(event, context):
        try:
            # Extract user input
            body = json.loads(event["body"])
            user_input = body["message"]

            # Generate embedding for user input
            user_embedding = generate_embedding(user_input)

            # Query OpenSearch for the most relevant dialogue
            result = query_opensearch(user_embedding)

            if not result:
                return {
                    "statusCode": 200,
                    "headers": {
                        "Content-Type": "application/json",
                        "Access-Control-Allow-Origin": "*"
                    },
                    "body": json.dumps({"character": "Unknown", "response": "I'm not sure how to respond to that!"})
                }

            # Refine the response
            # refined_response = f"{result['dialogue']}"
            refined_response = refine_response(user_input, result["character"], result["dialogue"])

            # Return the refined response
            return {
                "statusCode": 200,
                "headers": {
                    "Content-Type": "application/json",
                    "Access-Control-Allow-Origin": "*"
                },
                "body": json.dumps({
                    "character": result["character"],
                    "response": refined_response
                })
            }

        except Exception as e:
            return {
                "statusCode": 500,
                "headers": {
                    "Content-Type": "application/json",
                    "Access-Control-Allow-Origin": "*"
                },
                "body": json.dumps({"error": str(e)})
            }
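The post-processing steps inside `refine_response` can also be pulled into a separate function, which makes them testable without calling Bedrock. This is only a sketch of that idea: the name `postprocess` and its signature are my own, while the three rules (cut at an echoed "User Prompt", cap the length at 150 characters, fall back to the retrieved dialogue) mirror the code above.

```python
def postprocess(generated, fallback, max_length=150):
    """Clean up model output; return `fallback` when nothing usable remains."""
    text = (generated or "").strip()

    # 1. Drop everything from an echoed "User Prompt" marker onward
    if "User Prompt" in text:
        text = text.split("User Prompt")[0].strip()

    # 2. Limit the length, cutting at the last space before the limit
    if len(text) > max_length:
        text = text[:max_length].rsplit(" ", 1)[0] + "..."

    # 3. Fall back to the retrieved dialogue if the result is empty or flagged
    if not text or "irrelevant" in text.lower():
        return fallback
    return text

cleaned = postprocess("Joey doesn't share food! User Prompt: next", "Joey doesn't share food!")
print(cleaned)
```

Keeping these rules in one place also makes it easier to adjust the length limit or add new filters later.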

Deploy a Front-End application to chat:

  • Once everything is ready, let’s build a simple front-end application and host it using S3 static website hosting.

  • If you have your own domain, add it to Route 53 and point it to the S3 bucket.

  • Use the below code to create an HTML file and host it in an S3 bucket

  • Replace the API link with your own API

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Friends Chatbot</title>
        <style>
            #chat-container {
                width: 90%;
                max-width: 600px;
                margin: 20px auto;
                font-family: Arial, sans-serif;
            }
            #messages {
                height: 400px;
                overflow-y: auto;
                border: 1px solid #ccc;
                padding: 10px;
                border-radius: 5px;
                background-color: #f9f9f9;
            }
            .message {
                margin: 10px 0;
            }
            .user {
                text-align: right;
                color: blue;
            }
            .bot {
                text-align: left;
                color: green;
            }
            #input-container {
                display: flex;
                margin-top: 10px;
            }
            #user-input {
                flex: 1;
                padding: 10px;
                border: 1px solid #ccc;
                border-radius: 5px;
            }
            button {
                margin-left: 5px;
                padding: 10px 20px;
                background-color: #007bff;
                color: white;
                border: none;
                border-radius: 5px;
                cursor: pointer;
            }
            button:hover {
                background-color: #0056b3;
            }
        </style>
    </head>
    <body>
        <div id="chat-container">
            <div id="messages"></div>
            <div id="input-container">
                <input type="text" id="user-input" placeholder="Type your message...">
                <button onclick="sendMessage()">Send</button>
            </div>
        </div>
        <script>
            const apiEndpoint = "replace with your api gateway endpoint";

            function sendMessage() {
                const inputField = document.getElementById("user-input");
                const message = inputField.value.trim();
                if (!message) return;

                const messagesContainer = document.getElementById("messages");

                // Add user message
                const userMessage = document.createElement("div");
                userMessage.className = "message user";
                userMessage.textContent = message;
                messagesContainer.appendChild(userMessage);

                // Clear input
                inputField.value = "";

                // Send API request
                fetch(apiEndpoint, {
                    method: "POST",
                    headers: { "Content-Type": "application/json" },
                    body: JSON.stringify({ message }),
                })
                    .then((response) => response.json())
                    .then((data) => {
                        // Add bot response
                        const botMessage = document.createElement("div");
                        botMessage.className = "message bot";
                        botMessage.textContent = `${data.character}: ${data.response}`;
                        messagesContainer.appendChild(botMessage);

                        // Scroll to bottom
                        messagesContainer.scrollTop = messagesContainer.scrollHeight;
                    })
                    .catch((error) => {
                        console.error("Error:", error);
                        const botMessage = document.createElement("div");
                        botMessage.className = "message bot";
                        botMessage.textContent = "Error connecting to the chatbot.";
                        messagesContainer.appendChild(botMessage);
                    });
            }
        </script>
    </body>
    </html>
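To check the API without opening the page, the same POST request can be built from Python with the standard library. This is a sketch under assumptions: the function names `build_chat_request` and `send_chat_message` and the placeholder URL are made up, and only the `{"message": ...}` body and the JSON content type come from the front-end code above.

```python
import json
import urllib.request

def build_chat_request(api_endpoint, message):
    """Build (but do not send) the POST request the front end would make."""
    data = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        api_endpoint,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_chat_message(api_endpoint, message, timeout=10):
    """Send the request and return the decoded JSON reply."""
    chat_request = build_chat_request(api_endpoint, message)
    with urllib.request.urlopen(chat_request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Placeholder URL; nothing is sent until send_chat_message is called.
sample_request = build_chat_request("https://example.invalid/chat", "How you doin'?")
print(sample_request.get_method(), sample_request.full_url)
```

Calling `send_chat_message` with your own API Gateway URL should return a dictionary with the `character` and `response` keys that the Lambda handler produces.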

That’s it. Visit my hosted solution using the link shared above and share your feedback in the comments.

Thanks 😀

