Tiago Souto

AI Series Part V: Creating a RAG chatbot with LangChain (NextJS+Python)

Previously, we created our first chatbot integrated with OpenAI and our first RAG chat using LangChain and NextJS.

As mentioned in a previous post, LangChain was originally built in Python, and a JavaScript version was created later. Python is the most commonly used language for AI and ML work thanks to its versatility, frameworks, scalability, and learning curve. So it's no surprise that LangChain not only has better support for Python, but that there are also more AI features and resources available in Python than in JavaScript today.

I have almost zero expertise with Python, but I recognize the need to learn at least the basics so I can use it when needed, or find solutions that aren't yet available in JS and replicate their logic.

The same might apply to you. Maybe you work with a team whose backend expertise is in Python, or you have that expertise yourself and want to take advantage of the ML support it provides, or maybe you just want to learn something new, like me. So, how can we integrate Python with NextJS? I'll show you in this post by creating a RAG chat app using Python, LangChain, and NextJS.

Starting the Server
First things first: as always, keep the base chat app we created in Part III of this AI series at hand. We'll reuse a lot of that code, so I won't repeat the basic setup. Feel free to also get it from this GitHub repo: https://github.com/soutot/ai-series/tree/main/nextjs-chat-base

For this example, we'll use Flask since, for someone like me with no previous Python experience, it's the easiest way to create a web server.

Assuming you already have the base chat app running, let’s start by creating a directory in the root of the project called “flask”. Inside it, create a file app.py, and then let’s create our first Flask endpoint:

from flask import Flask, jsonify
from flask_cors import CORS
from dotenv import load_dotenv

app = Flask(__name__)
CORS(app)
load_dotenv()
@app.route('/', methods=['GET'])
def index():
    return jsonify({"message": 'Flask server is running'})

Cool, but in order to be able to run that, we need to install a few packages and dependencies. So let’s go for it.

Installation
If you don’t have Python 3 installed, download and install it following the official website: https://www.python.org/downloads/

Then let's install and initialize Flask. For this example, I'm going to run the macOS bash commands. If you're on a Windows machine, please check the commands suggested on the official website: https://flask.palletsprojects.com/en/3.0.x/installation/

Inside the flask directory, run

python3 -m venv .venv

This will create a virtual environment where packages will be installed. Next, we need to activate this virtual environment to tell Python where we’re working.

. .venv/bin/activate

And now we can install Flask:

pip install Flask flask_cors

All set, now it’s time to run the app:

flask run --host=0.0.0.0 --debug -p 5328

If everything's correct, you should now be able to access http://localhost:5328 and see the "Flask server is running" message.

As our initial setup is ready, we can now start working on the RAG app. First, let’s install LangChain dependencies:

pip install langchain langchain-community langchain-core langchain-openai langchainhub python-dotenv gpt4all chromadb

After installing the dependencies, you might want to create a requirements.txt file, which is similar to Node's package.json: it holds the project's dependencies, so you can install them all at once when starting from a fresh environment (with pip install -r requirements.txt). You can generate it by running

pip freeze > requirements.txt

With all the dependencies installed, let's move back to the code.

Python RAG Implementation
Now we'll start the RAG implementation: creating endpoints, uploading and embedding files, storing them in a vector store, retrieving documents, making LLM requests, and returning a response. So let's get started.

We’ll first create a class to handle streaming responses

import queue
import sys

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

q = queue.Queue()
stop_item = "###finish###"

class StreamingStdOutCallbackHandlerYield(StreamingStdOutCallbackHandler):
    def on_llm_start(
        self, serialized, prompts, **kwargs
    ) -> None:
        """Run when LLM starts running."""
        with q.mutex:
            q.queue.clear()
    def on_llm_new_token(self, token, **kwargs) -> None:
        """Run on new LLM token. Only available when streaming is enabled."""
        sys.stdout.write(token)
        sys.stdout.flush()
        q.put(token)
    def on_llm_end(self, response, **kwargs) -> None:
        """Run when LLM ends running."""
        q.put(stop_item)

Then we'll initialize our POST endpoint and import all the dependencies we're going to use in it

from flask import request, Response, stream_with_context
from langchain_community.chat_models import ChatOpenAI
import os
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import ConversationalRetrievalChain
@app.route('/', methods=['POST'])
def send_prompt():
    data = request.get_json()
    prompt = data['prompt']

Then, we'll create our LLM object that'll be used to send the chat requests

llm = ChatOpenAI(
    temperature=0,
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    model='gpt-3.5-turbo',
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandlerYield()]
)

After that, we can initialize the vectorstore instance, where our embedded document will be stored

DB_PATH = "vectorstores/db/"
vectorstore = Chroma(persist_directory=DB_PATH, embedding_function=GPT4AllEmbeddings())

Now, we create the chain that'll be responsible for putting it all together and sending the requests

chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=vectorstore.as_retriever(),
    condense_question_llm=llm,
    return_source_documents=True,
    chain_type="stuff",
    get_chat_history=lambda h: h
)

We’ll create the system prompt to provide instructions to the LLM and append it to our chain

PROMPT_TEMPLATE = """You are a good assistant that answers questions. Your knowledge is strictly limited to the following piece of context. Use it to answer the question at the end.
    If the answer can't be found in the context, just say you don't know. *DO NOT* try to make up an answer.
    If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
    Give a response in the same language as the question.

    Context: {context}
    """
system_prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
chain.combine_docs_chain.llm_chain.prompt.messages[0] = system_prompt
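One thing worth noting: ChatPromptTemplate.from_template wraps the text in a human message template, so the instructions above are effectively sent as a user message. If you'd rather keep them as a proper system message, a small variation (just a sketch, assuming the same LangChain version as above) is to build the prompt with SystemMessagePromptTemplate instead:

# Optional variation: keep the instructions as a system message
from langchain_core.prompts import SystemMessagePromptTemplate

system_prompt = SystemMessagePromptTemplate.from_template(PROMPT_TEMPLATE)
chain.combine_docs_chain.llm_chain.prompt.messages[0] = system_prompt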

With all that set, we can finally call the chain and return the streaming response to the frontend

ai_response = chain({"question": prompt, 'chat_history': ''})

return Response(stream_with_context(ai_response['answer']))
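One caveat: the chain call above only returns after the whole answer has been generated, so the response body is already complete by the time it's sent. If you want true token-by-token streaming, a minimal sketch (not part of the original code; the stream_tokens helper name is just for illustration) is to run the chain in a background thread and drain the queue that StreamingStdOutCallbackHandlerYield fills:

# Sketch: run the chain in a background thread and yield tokens from the
# queue as the callback handler pushes them, until the stop marker arrives.
from threading import Thread

def stream_tokens():
    Thread(target=lambda: chain({"question": prompt, "chat_history": ""})).start()
    while True:
        token = q.get()  # blocks until the next token is available
        if token == stop_item:
            break
        yield token

# return Response(stream_with_context(stream_tokens()))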

Great! The endpoint is ready to receive requests. But before we can test it, there are still a few things left to do. We'll start by adding another endpoint for uploading the document file, embedding it, and storing it in the vectorstore.

Let’s start by creating the endpoint and getting the file content

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import GPT4AllEmbeddings
@app.route('/embed', methods=['POST'])
def embed():
    try:
        if 'file' not in request.files:
            return jsonify({"error": "No file provided"}), 400

        file = request.files['file']
        file_content = file.read().decode("utf-8")

We'll then create the document chunks by splitting the content using line breaks as separators

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
    separators=[
        "\n\n",
        "\n",
    ],
)
texts = text_splitter.create_documents([file_content])
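If you want to sanity-check how the file was split before embedding it (purely optional), you can print a quick summary of the chunks:

# Optional: inspect the generated chunks
print(f"Created {len(texts)} chunks")
if texts:
    print(texts[0].page_content[:100])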

We're now going to use GPT4AllEmbeddings to embed the documents and store them in ChromaDB

DB_PATH = "vectorstores/db/"
vectorstore = Chroma.from_documents(documents=texts, embedding=GPT4AllEmbeddings(), persist_directory=DB_PATH)      
vectorstore.persist()
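Since the /embed handler opened a try block at the top, it still needs a return statement and an error branch before it will run. A minimal way to close it out (the response messages here are just placeholders) is:

# Still inside embed(): finish the try block with a success response,
# then handle any failure with an error response.
        return jsonify({"message": "Embeddings stored successfully"}), 200
    except Exception as e:
        return jsonify({"error": str(e)}), 500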

Last, but not least, let’s create a .env file with the OPENAI_API_KEY in the root of the flask app

OPENAI_API_KEY=<your key>

We’re finished with the Python code. You can now restart your server by running

flask run --host=0.0.0.0 --debug -p 5328

Let’s go back to the NextJS App.

NextJS and Python Integration
In page.tsx, let's add the uploadFile function, similar to how we did for the NextJS RAG app

async function uploadFile(file: File) {
  try {
    const formData = new FormData()
    formData.append('file', file)
    const response = await fetch('/api/embed', {
      method: 'POST',
      body: formData,
    })

    if (response.ok) {
      console.log('Embedding successful!')
    } else {
      const errorResponse = await response.text()
      throw new Error(`Embedding failed: ${errorResponse}`)
    }
  } catch (error) {
    throw new Error(`Error during embedding: ${error}`)
  }
}

Then, call it inside the handleFileSelected method

const handleFileSelected = async (event?: ChangeEvent<HTMLInputElement>) => {
    if (!event) return clearFile()
    setIsUploading(true)
    const {files} = event.currentTarget

    if (!files?.length) {
      return
    }
    const selectedFile = files[0]
    await uploadFile(selectedFile)
    setFile(selectedFile)
    setIsUploading(false)
    event.target.value = '' // clear input as we handle the file selection in state
  }

Okay, now we need to make sure the NextJS frontend app sends requests to the Flask backend server. To do so, we'll update the next.config.mjs file and include a rewrite rule:

rewrites: async () => {
    return [
      {
        source: '/api/:path*',
        destination: 'http://localhost:5328/:path*',
      },
    ]
  },

What we're doing here is telling NextJS that every request sent to /api should be forwarded to http://localhost:5328. This includes subroutes, so a request to /api/embed will be forwarded to http://localhost:5328/embed.

If you're running through Docker, instead of using localhost you should use the container's name. So if the container's name is flask-server, then the destination should be http://flask-server:5328/:path*

We can now delete the src/api directory from the NextJS app as it’s no longer needed.

If everything looks good, you can now run the NextJS app

pnpm run dev
and start playing with the chat app.

Great, we've now covered the basics of how to use RAG in NextJS with a Python server. There are still a lot of things you can do from here, but I'll keep it simple for now.

Let’s move forward with new topics in the upcoming posts. Hope this one was helpful for someone. See you in the next one where we'll talk about using images with GPT-4 Vision to ingest our vectorstore.

GitHub code repository: https://github.com/soutot/ai-series/tree/main/python-chat-rag
