GitHub repo and link to the live app are at the end of the article.
Let's build a web application to chat with any website, using the Exa Neural Search API and OpenAI GPT-3.5 turbo behind a Streamlit front-end.
Creating the web retrieval function
Since we want to run RAG over the whole website, not just a single page, this step is a bit tricky. The hack we came up with is to use the Exa search API and constrain it to the domain of the website we want to chat with.
After getting a free API key on their website, we can build our retrieval function:
pip install exa-py loguru
from typing import Dict, List, Tuple

from exa_py import Exa
from loguru import logger

exa = Exa("EXA_API_KEY")  # replace with your Exa API key


def get_text_chunks(
    query: str,
    url: str,
    num_sentences: int = 15,
    highlights_per_url: int = 5,
) -> Tuple[List[str], List[str]]:
    """
    Return a list of text chunks from the given URL that are relevant to the query.
    """
    highlights_options = {
        "num_sentences": num_sentences,  # how long our highlights should be
        "highlights_per_url": highlights_per_url,
    }
    search_response = exa.search_and_contents(
        query,
        highlights=highlights_options,
        num_results=10,
        use_autoprompt=True,
        include_domains=[url],
    )
    # Keep the first (most relevant) highlight of each result
    chunks = [sr.highlights[0] for sr in search_response.results]
    # Deduplicate the source URLs
    url_sources = list(set([sr.url for sr in search_response.results]))
    return chunks, url_sources
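A quick way to try it out from a Python shell (the query and domain below are purely illustrative):

chunks, sources = get_text_chunks(
    "What is this website about?", "https://example.com"
)
for chunk in chunks:
    print(chunk[:100])  # preview the first 100 characters of each chunk
print(sources)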
Using GPT-3.5 turbo to generate an answer with the content chunks
To generate an answer to the user query, we pass the content chunks retrieved from the website to GPT-3.5 turbo as context.
We use the LlamaIndex RAG prompt template, available here, to build our prompt:
def generate_prompt_from_chunks(chunks: List[str], query: str) -> str:
    """
    Generate a prompt from the given chunks and query.

    TODO: add a check on token length to avoid exceeding the max token length of the model.
    """
    assert len(chunks) > 0, "Chunks should not be empty"
    concatenated_chunks = ""
    for chunk in chunks:
        concatenated_chunks += chunk + "\n\n"
    prompt = f"""
Context information is below.
---------------------
{concatenated_chunks}
---------------------
Given the context information and not prior knowledge, answer the query.
Do not start your answer with something like "Based on the provided context information...".
Query: {query}
Answer:
"""
    return prompt
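The TODO above can be handled with a simple guard before building the prompt. Here is a minimal sketch using tiktoken; the 14,000-token budget is an arbitrary assumption, chosen to leave headroom below gpt-3.5-turbo's 16k context window:

import tiktoken
from typing import List

MAX_PROMPT_TOKENS = 14_000  # assumed budget, below gpt-3.5-turbo's 16k context window


def truncate_chunks(chunks: List[str], model_name: str = "gpt-3.5-turbo") -> List[str]:
    """Drop trailing chunks once the running token count exceeds the budget."""
    encoding = tiktoken.encoding_for_model(model_name)
    kept, total_tokens = [], 0
    for chunk in chunks:
        total_tokens += len(encoding.encode(chunk))
        if total_tokens > MAX_PROMPT_TOKENS:
            break
        kept.append(chunk)
    return kept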
Then, we can invoke GPT-3.5 turbo on this prompt (optionally with the previous messages):
pip install openai
from typing import Dict, List

from openai import OpenAI

import config  # our own module holding the API keys

openai_client = OpenAI(api_key=config.OPENAI_API_KEY)


def invoke_llm(
    prompt: str,
    model_name: str = "gpt-3.5-turbo",
    previous_messages: List[Dict[str, str]] = None,
) -> str:
    """
    Invoke the language model with the given prompt and return the response.
    """
    if previous_messages is None:
        previous_messages = []
    completion = openai_client.chat.completions.create(
        model=model_name,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant replying to questions given a context.",
            }
        ]
        + previous_messages
        + [
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content
Now, we have our answering function ready to go:
def query2answer(
    query: str, url: str, session_messages: List[Dict[str, str]]
) -> Tuple[str, List[str]]:
    """
    Given a query and a URL, return the answer to the query.
    """
    try:
        logger.info(f"Query: {query}")
        chunks, url_sources = get_text_chunks(query, url)
        logger.info(f"Retrieved {len(chunks)} chunks from {url}")
        prompt = generate_prompt_from_chunks(chunks, query)
        # TODO: add a check on token length to avoid exceeding the max token length of the model.
        llm_answer = invoke_llm(prompt, previous_messages=session_messages)
        logger.info(f"Answer: {llm_answer}")
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        llm_answer = "Sorry, I was not able to answer. Either the URL is wrong or the website is too new."
        url_sources = []
    return llm_answer, url_sources
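Before wiring up the UI, we can sanity-check the whole pipeline from a Python shell (the URL is just an example):

answer, sources = query2answer(
    "What is this website about?", "https://example.com", []
)
print(answer)
print(sources)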
Using Streamlit to build the frontend
Using Streamlit, we can easily build a frontend in Python for our agent:
pip install streamlit
import time
from urllib.parse import urlparse

import streamlit as st

import config
from agent import query2answer

# Initialize URL
# Check the query parameters for a URL
if "url" in st.query_params:
    # Check it is not None
    if st.query_params.url and st.query_params.url != "None":
        st.session_state.url = st.query_params.url

if "url" not in st.session_state:
    st.session_state.url = None

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

st.markdown("# 📖 url2chat - Chat with any website")

ROLE_TO_AVATAR = {
    "user": "🦸‍♂️",
    "assistant": "📖",
}

if st.session_state.url is None:
    url = st.text_input("Enter the URL of a website to chat with it")
    if url:
        # Format checks
        if not url.startswith("http"):
            url = "https://" + url
        # Parse the URL to only keep the domain
        o = urlparse(url)
        domain = o.hostname
        st.session_state.url = f"https://{domain}"
        # Set the URL as a query parameter to trigger a rerun
        st.query_params.url = f"https://{domain}"
        # Trigger a rerun to start chatting
        time.sleep(0.5)
        st.rerun()
else:
    # Add the URL as a query parameter (the rerun will remove it from the URL bar)
    st.query_params.url = st.session_state.url
    # Buttons to change the URL and clear the chat
    col1, col2 = st.columns([1, 1])
    with col1:
        if st.button("Change URL", use_container_width=True):
            st.session_state.url = None
            st.query_params.pop("url", None)
            st.session_state.messages = []
            # We need to add a small delay, otherwise the query parameter is not removed before the rerun
            time.sleep(0.5)
            st.rerun()
    with col2:
        if st.button("Clear chat", use_container_width=True):
            st.session_state.messages = []
            st.rerun()

    with st.chat_message("assistant", avatar=ROLE_TO_AVATAR["assistant"]):
        st.markdown(f"You're chatting with {st.session_state.url}. Ask me anything! 📖")

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"], avatar=ROLE_TO_AVATAR[message["role"]]):
            st.markdown(message["content"])

    # Accept user input
    if prompt := st.chat_input("What is this website about?"):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        # Display user message in chat message container
        with st.chat_message("user", avatar=ROLE_TO_AVATAR["user"]):
            st.markdown(prompt)
        # Display assistant response in chat message container
        chat_answer, url_sources = query2answer(
            prompt, st.session_state.url, st.session_state.messages
        )
        with st.chat_message("assistant", avatar=ROLE_TO_AVATAR["assistant"]):
            st.markdown(chat_answer)
            # Display the sources in a collapsed expander
            with st.expander("Sources", expanded=False):
                for source in url_sources:
                    st.markdown("- " + source)
        st.session_state.messages.append({"role": "assistant", "content": chat_answer})
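Assuming the file is saved as app.py (the filename is our choice, not mandated by Streamlit), we can start the app locally with:

streamlit run app.py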
Adding text analytics to understand how our app performs and collect user feedback
Now that our app is working, let's see how people are using it and how it is performing. To do so, we will use phospho, an open-source text analytics solution. In this example, we will use the free trial of the hosted version, but you can self-host it (see the GitHub repo for more info on how to do so).
First, we need to get our phospho project ID and API key and add them to our .streamlit/secrets.toml file:

PHOSPHO_API_KEY=""
PHOSPHO_PROJECT_ID=""

Then install the phospho package:

pip install --upgrade phospho
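The snippets above reference a config module that the article never shows. A minimal sketch of what it could look like, reading the keys from Streamlit's secrets (the exact names and layout are our assumption, including OPENAI_API_KEY, which you would add to secrets.toml as well):

# config.py -- hypothetical helper, reads the secrets exposed by Streamlit
import streamlit as st

OPENAI_API_KEY = st.secrets.get("OPENAI_API_KEY", "")
PHOSPHO_API_KEY = st.secrets.get("PHOSPHO_API_KEY", "")
PHOSPHO_PROJECT_ID = st.secrets.get("PHOSPHO_PROJECT_ID", "")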
Then, in our Streamlit file, we can start logging messages:

import phospho

phospho.init()

# ...

phospho.log(
    input=prompt,
    output=chat_answer,
    metadata={"sources": url_sources},
)
phospho also enables us to group messages into sessions. Let's add session support (see the full file on GitHub).
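A minimal sketch of what session support could look like: generate a session ID once per Streamlit session and attach it to each log call (this assumes phospho.log accepts a session_id keyword; check phospho's docs):

import uuid

# Create one session ID per Streamlit session
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())

# ...

phospho.log(
    input=prompt,
    output=chat_answer,
    session_id=st.session_state.session_id,  # assumed keyword, see phospho's docs
    metadata={"sources": url_sources},
)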
Let's also handle feedback from our users (this relies on the session ID set up above):
pip install streamlit_feedback
from streamlit_feedback import streamlit_feedback

# ...

# Add a feedback button
def _submit_feedback(feedback: dict):
    # Check that phospho is set up before sending the feedback
    if config.PHOSPHO_API_KEY and config.PHOSPHO_PROJECT_ID:
        phospho.user_feedback(
            task_id=phospho.latest_task_id,
            raw_flag=feedback["score"],
            notes=feedback["text"],
        )
        st.toast("Thank you for your feedback!")
    else:
        st.toast("phospho is not set up, feedback not sent.")


if len(st.session_state.messages) > 1:
    feedback = streamlit_feedback(
        feedback_type="thumbs",
        optional_text_label="[Optional] Please provide an explanation",
        on_submit=_submit_feedback,
        # To create a new feedback component for every message and session, provide a unique key
        key=f"{st.session_state.session_id}_{len(st.session_state.messages)}",
    )
Now, we can use phospho to detect some events of interest:
- When the assistant answers that it doesn't have the information
- When the user wants to take an action (for instance, buying a good or a service)
Conclusion
In this article, we've taken a deep dive into how to build url2chat, a web application that enables users to chat with any website. Leveraging the Exa Neural Search API, OpenAI GPT-3.5 turbo, and Streamlit, we created a system that extracts relevant information from entire websites, generates context-aware responses, and presents it all within a user-friendly interface.
Possible improvements
According to the data we collected using phospho, the user experience on our app isn't meeting our quality standard. Some possible improvements are:
- not using a RAG search API, but passing the whole website into the LLM context window (only suitable for small websites)
- using the sitemap to find pages relevant to the query, then passing these pages to the LLM (see the sketch below)
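As a starting point for the sitemap idea, here is a minimal sketch that fetches a site's sitemap.xml and returns the listed page URLs. It assumes the sitemap lives at the standard /sitemap.xml path and is a plain URL set, not an index of sub-sitemaps:

import requests
import xml.etree.ElementTree as ET


def get_sitemap_urls(domain: str) -> list[str]:
    """Fetch /sitemap.xml and return the page URLs it lists."""
    response = requests.get(f"{domain}/sitemap.xml", timeout=10)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    # Sitemap entries live in the sitemaps.org namespace
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]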
Want to test it?
Clone the GitHub repo here and run it locally, or use the version deployed on Streamlit Community Cloud here.