A chatbot system that can be trained with custom data from PDF files.
In this tutorial, we will create a chatbot system that can be trained with custom data from PDF files. The chatbot will utilize Next.js for the frontend, Material UI for the UI components, LangChain and OpenAI for working with language models, and Supabase to store the data and embeddings. By the end, you will have a fully functional chatbot that can answer questions based on the contents of uploaded PDF files.
Next.js
Next.js is a powerful and flexible React framework developed by Vercel that enables developers to build server-side rendering (SSR) and static web applications with ease. It combines the best features of React with additional capabilities to create optimized and scalable web applications.
OpenAI
The OpenAI module in Node.js provides a way to interact with OpenAI’s API, allowing developers to leverage powerful language models like GPT-3 and GPT-4. This module enables you to integrate advanced AI functionalities into your Node.js applications.
LangChain.js
LangChain is a powerful framework designed for developing applications with language models. Originally developed for Python, it is also available for JavaScript and TypeScript as LangChain.js. Here’s an overview of LangChain in the context of Node.js:
What is LangChain?
LangChain is a library that simplifies the creation of applications using large language models (LLMs). It provides tools to manage and integrate LLMs into your applications, handle chaining of calls to these models, and enable complex workflows with ease.
Key Features
- Model Integration: Connect and interact with various language models, including those from OpenAI, Hugging Face, and more.
- Prompt Management: Manage, optimize, and format prompts effectively.
- Chain Building: Create sequences of model interactions, enabling more sophisticated workflows.
- Memory Management: Maintain context between interactions, making the models’ responses more coherent and contextually aware.
- Tool Use: Integrate external tools and APIs to augment the capabilities of language models.
- Streaming: Support for streaming responses, which is useful for real-time applications.
How Do Large Language Models (LLMs) Work?
Large Language Models (LLMs) like OpenAI’s GPT-3.5 are trained on vast amounts of text data to understand and generate human-like text. They can generate responses, translate languages, and perform many other natural language processing tasks.
Supabase
Supabase is an open-source backend-as-a-service (BaaS) platform designed to help developers quickly build and deploy scalable applications. It offers a suite of tools and services that simplify database management, authentication, storage, and real-time capabilities, all built on top of PostgreSQL.
Our Goals
Training the Model
- PDF File Conversion: The text of the uploaded PDF files is converted into vectors (embeddings). Vectors are numerical representations that the AI can compare.
- Embedding Storage: The vectors are saved in a vector store for efficient querying.
Chatting with the Trained Model
- User Input: The user provides an input query.
- Prompt Conversion: The input is converted into a standalone question.
- Vector Conversion: The question is converted into a vector.
- Nearest Match Search: The system searches for the nearest match in the vector store.
- Response Generation: The system generates an answer based on the closest match.
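The two flows above can be sketched end to end with an in-memory store. This is only a toy illustration, not the implementation built later in this tutorial: `toyEmbed` is a hypothetical stand-in for a real embedding model (it simply counts letters), and the sample sentences are made up.

```typescript
// Toy stand-ins only: a real system would call an embedding model instead of toyEmbed.
type StoredChunk = { content: string; vector: number[] };

// Hypothetical "embedding": counts of the letters a-z in the text.
function toyEmbed(text: string): number[] {
  const counts: number[] = [];
  for (let i = 0; i < 26; i++) counts.push(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const idx = lower.charCodeAt(i) - 97; // 97 is the char code of "a"
    if (idx >= 0 && idx < 26) counts[idx] += 1;
  }
  return counts;
}

// Cosine similarity: 1 means the vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "Training": convert each piece of text into a vector and keep both together.
function buildStore(chunks: string[]): StoredChunk[] {
  return chunks.map((content) => ({ content, vector: toyEmbed(content) }));
}

// "Chatting": convert the question into a vector and return the closest stored chunk.
function nearestChunk(store: StoredChunk[], question: string): StoredChunk | undefined {
  const queryVector = toyEmbed(question);
  let best: StoredChunk | undefined = undefined;
  let bestScore = -Infinity;
  for (const chunk of store) {
    const score = cosineSimilarity(queryVector, chunk.vector);
    if (score > bestScore) {
      best = chunk;
      bestScore = score;
    }
  }
  return best;
}

const toyStore = buildStore(["cats purr and meow", "stock markets fell sharply"]);
const closest = nearestChunk(toyStore, "why do cats meow");
// closest?.content is "cats purr and meow"; an LLM would then answer using that text as context.
```

In the real application, the embedding model and the vector store do this work, and the matched text is handed to the language model as context.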
Prerequisites
Before we start, ensure you have the following:
- Node.js and npm installed
- A Supabase account
- API key for OpenAI
Step 1: Setting Up Supabase
Creating Tables and Functions
First, enable the vector extension (pgvector) that backs our vector store, if it isn’t enabled already:
create extension if not exists vector;
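Next, create a table named “documents”. This table will store the content of our uploaded PDF files along with its embedding in vector format. The 1536 dimensions match the size of the vectors produced by OpenAI embedding models such as text-embedding-ada-002: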
create table if not exists documents (
id bigint primary key generated always as identity,
content text,
metadata jsonb,
embedding vector(1536)
);
Now, we need a function to query our embedded data:
create or replace function match_documents (
query_embedding vector(1536),
match_count int default null,
filter jsonb default '{}'
) returns table (
id bigint,
content text,
metadata jsonb,
similarity float
) language plpgsql as $$
begin
return query
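-- Qualify the columns with the table name: the bare names would be
-- ambiguous with the function's output columns (id, content, metadata).
select
documents.id,
documents.content,
documents.metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where documents.metadata @> filter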
order by documents.embedding <=> query_embedding
limit match_count;
end;
$$;
The “match_documents” function performs the task of querying the embedded data. We will call this function in our Next.js app via Supabase Vector Store.
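To make the SQL easier to reason about, here is a rough in-memory equivalent in TypeScript. It only illustrates the query's logic and is not something the app uses: `DocRow`, `containsAll`, and `matchDocumentsInMemory` are hypothetical names, and `containsAll` handles only flat key/value metadata, whereas the `@>` operator in Postgres also handles nested JSON.

```typescript
type DocRow = {
  id: number;
  content: string;
  metadata: Record<string, string | number>;
  embedding: number[];
};
type MatchRow = { id: number; content: string; similarity: number };

// pgvector's `<=>` operator is cosine distance, which equals 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Choice made for this sketch: treat a zero vector as maximally distant.
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Simplified stand-in for `metadata @> filter` on flat objects:
// every key/value pair in the filter must also be present in the metadata.
function containsAll(
  metadata: Record<string, string | number>,
  filter: Record<string, string | number>
): boolean {
  for (const key of Object.keys(filter)) {
    if (metadata[key] !== filter[key]) return false;
  }
  return true;
}

function matchDocumentsInMemory(
  rows: DocRow[],
  queryEmbedding: number[],
  matchCount: number,
  filter: Record<string, string | number> = {}
): MatchRow[] {
  return rows
    .filter((row) => containsAll(row.metadata, filter)) // where metadata @> filter
    .map((row) => ({ row, distance: cosineDistance(row.embedding, queryEmbedding) }))
    .sort((x, y) => x.distance - y.distance) // order by embedding <=> query_embedding
    .slice(0, matchCount) // limit match_count
    .map(({ row, distance }) => ({
      id: row.id,
      content: row.content,
      similarity: 1 - distance, // 1 - (embedding <=> query_embedding)
    }));
}
```

Sorting by distance in ascending order puts the most similar rows first, which is why the SQL orders by the `<=>` expression rather than by the `similarity` column.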
Next, we need to set up our tables for the chatbot system:
create table if not exists files (
id bigint primary key generated always as identity,
name text not null,
created_at timestamp with time zone default timezone('utc'::text, now()) not null
);
create table if not exists rooms (
id bigint primary key generated always as identity,
created_at timestamp with time zone default timezone('utc'::text, now()) not null
);
create table if not exists chats (
id bigint primary key generated always as identity,
room bigint references rooms(id) on delete cascade,
role text not null,
message text not null,
created_at timestamp with time zone default timezone('utc'::text, now()) not null
);
The “files” table will store details of the uploaded PDF files. This allows us to reference and filter the files in the “documents” table. Our chatbot system will query embedding data with the given “file id” selected in our app. This way, our chatbot system can manage multiple PDF files and focus on the context of a specific file.
The “rooms” table will store all the chat sessions, allowing users to have multiple chat sessions within our app.
Finally, the “chats” table will store all the messages from a particular chat session (room). The role column records who sent each message: “user” for messages typed by the user and “bot” for answers generated by the chatbot.
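As a sketch, the three tables could be mirrored in TypeScript like this. The `ChatRole` union and the `isChatRole` guard are assumptions made for illustration: the schema stores the role as free text, and the two strings come from how the services in this tutorial write it. Timestamps are typed as strings because they arrive as text in JSON responses.

```typescript
// Assumption: the app only ever writes these two role strings; the column itself is free text.
type ChatRole = "user" | "bot";

// One type per table, with the same column names as the SQL above.
type FileRow = { id: number; name: string; created_at: string };
type RoomRow = { id: number; created_at: string };
type ChatRow = {
  id: number;
  room: number; // references rooms.id
  role: ChatRole;
  message: string;
  created_at: string;
};

// Hypothetical guard for narrowing a raw string read from the database.
function isChatRole(value: string): value is ChatRole {
  return value === "user" || value === "bot";
}
```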
Step 2: Setting Up Next.js
Create Next.js app
$ npx create-next-app chatbot
$ cd ./chatbot
Install the required dependencies:
npm install @langchain/community @langchain/core @langchain/openai @supabase/supabase-js langchain openai pdf-parse pdfjs-dist
Then install Material UI to build the interface. Feel free to use another UI library instead:
npm install @mui/material @emotion/react @emotion/styled
Connecting to Supabase
Create a file to connect your Next.js app to Supabase:
// src/libs/supabaseClient.ts
import { createClient, SupabaseClient } from "@supabase/supabase-js";
const supabaseUrl: string = process.env.NEXT_PUBLIC_SUPABASE_URL || "";
const supabaseAnonKey: string = process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY || "";
if (!supabaseUrl) throw new Error("Supabase URL not found.");
if (!supabaseAnonKey) throw new Error("Supabase Anon key not found.");
export const supabaseClient: SupabaseClient = createClient(supabaseUrl, supabaseAnonKey);
Setting Up LLM clients
Create a file to set up LangChain and Embeddings:
// src/libs/openAI.ts
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
const openAIApiKey: string = process.env.NEXT_PUBLIC_OPENAI_API_KEY || "";
if (!openAIApiKey) throw new Error("OpenAI API key not found.");
export const llm = new ChatOpenAI({
openAIApiKey,
modelName: "gpt-3.5-turbo",
temperature: 0.9,
});
export const embeddings = new OpenAIEmbeddings(
{
openAIApiKey,
},
{ maxRetries: 0 }
);
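Both files repeat the same "read a variable, throw if it is empty" check. One possible way to factor it out is sketched below; `requireValue` is a hypothetical helper, not part of Next.js or Supabase. It takes the value that was already read rather than the variable's name, because Next.js only inlines `NEXT_PUBLIC_` variables into the browser bundle when they are referenced statically, so a dynamic lookup such as `process.env[name]` would not work in client-side code.

```typescript
// Hypothetical helper: returns the trimmed value, or throws a descriptive
// error when the value is missing or blank.
function requireValue(value: string | undefined, label: string): string {
  const trimmed = (value ?? "").trim();
  if (!trimmed) throw new Error(`${label} not found.`);
  return trimmed;
}

// Possible usage in src/libs/supabaseClient.ts (kept as a comment so this block runs on its own):
// const supabaseUrl = requireValue(process.env.NEXT_PUBLIC_SUPABASE_URL, "Supabase URL");
// const supabaseAnonKey = requireValue(process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY, "Supabase Anon key");
```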
Next.js Config
Lastly, we need to update the Next.js config file. We will be using the Web PDF Loader from LangChain, and it depends on the fs module, which throws an error when used in the browser. Update your config file following this snippet:
/** @type {import('next').NextConfig} */
const nextConfig = {
reactStrictMode: true,
output: "export",
webpack: (config, { isServer }) => {
// See https://webpack.js.org/configuration/resolve/#resolvealias
config.resolve.alias = {
...config.resolve.alias,
sharp$: false,
"onnxruntime-node$": false,
};
config.experiments = {
...config.experiments,
topLevelAwait: true,
asyncWebAssembly: true,
};
config.module.rules.push({
test: /\.md$/i,
use: "raw-loader",
});
// Fixes npm packages that depend on `fs` module
if (!isServer) {
config.resolve.fallback = {
...config.resolve.fallback, // keep the fallbacks set by Next.js; without this spread they would be dropped
fs: false, // leave fs out of the browser bundle
"node:fs/promises": false,
module: false,
perf_hooks: false,
};
}
return config;
},
};
export default nextConfig;
Now our Next.js app is ready! Let’s continue building the chatbot system.
Step 3: Prepare the services that communicate with our database
We will use these services to communicate with Supabase from our React components.
File Service
The file service handles file-related operations, such as fetching the list of files and saving a new file to the database.
// src/services/file.ts
import { embeddings } from "@/libs/openAI";
import { supabaseClient } from "@/libs/supabaseClient";
import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
export interface IFile {
id?: number | undefined;
name: string;
created_at?: string | undefined; // timestamps arrive as strings in the JSON response
}
// Fetch the list of uploaded files from the Supabase database.
export async function fetchFiles(): Promise<IFile[]> {
const { data, error } = await supabaseClient
.from("files")
.select()
.order("created_at", { ascending: false })
.returns<IFile[]>();
if (error) throw error;
return data;
}
// Save a new file to the database, convert it to vectors, and store the vectors.
export async function saveFile(file: File): Promise<IFile> {
const { data, error } = await supabaseClient
.from("files")
.insert({ name: file.name })
.select()
.single<IFile>();
if (error) throw error;
const loader = new WebPDFLoader(file);
const output = await loader.load();
const docs = output.map((d) => ({
...d,
metadata: { ...d.metadata, file_id: data.id },
}));
await SupabaseVectorStore.fromDocuments(docs, embeddings, {
client: supabaseClient,
tableName: "documents",
queryName: "match_documents",
});
return data;
}
- fetchFiles: Fetches the list of uploaded files from the Supabase database, ordered by creation date.
- saveFile: Saves a new file to the database, converts the PDF content to vectors using the LangChain library, and stores the vectors in the Supabase vector store.
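To make the file_id tagging in saveFile concrete, here is a small sketch of that step on plain objects. `PageDoc` is a hypothetical type that mirrors the `pageContent` and `metadata` shape of the documents returned by the loader, and the sample pages are made up.

```typescript
type PageDoc = { pageContent: string; metadata: Record<string, unknown> };

// Copy each document and add the owning file's id to its metadata,
// leaving the input documents untouched.
function tagWithFileId(docs: PageDoc[], fileId: number): PageDoc[] {
  return docs.map((d) => ({
    ...d,
    metadata: { ...d.metadata, file_id: fileId },
  }));
}

const pages: PageDoc[] = [
  { pageContent: "Page one text", metadata: { page: 1 } },
  { pageContent: "Page two text", metadata: { page: 2 } },
];
const tagged = tagWithFileId(pages, 7);
// tagged[0].metadata is { page: 1, file_id: 7 }; pages[0].metadata is still { page: 1 }
```

Each row written to the “documents” table then carries file_id inside its metadata column, which is what lets a later query narrow the search to a single PDF.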
Room Service
The room service handles operations related to chat rooms, such as fetching the list of rooms and creating a new room.
// src/services/room.ts
import { supabaseClient } from "@/libs/supabaseClient";
export interface IRoom {
id?: number | undefined;
created_at?: string | undefined; // timestamps arrive as strings in the JSON response
}
// Fetch the list of chat rooms from the Supabase database.
export async function fetchRooms(): Promise<IRoom[]> {
const { data, error } = await supabaseClient
.from("rooms")
.select()
.order("created_at", { ascending: false })
.returns<IRoom[]>();
if (error) throw error;
return data;
}
// Create a new chat room in the database.
export async function createRoom(): Promise<IRoom> {
const { data, error } = await supabaseClient
.from("rooms")
.insert({})
.select()
.single<IRoom>();
if (error) throw error;
return data;
}
- fetchRooms: Fetches the list of chat rooms from the Supabase database, ordered by creation date.
- createRoom: Creates a new chat room in the database and returns the created room.
Chat Service
The chat service handles operations related to chats, such as fetching the list of chats, posting a new chat, and getting an answer from the chatbot.
// src/services/chat.ts
import { embeddings, llm } from "@/libs/openAI";
import { supabaseClient } from "@/libs/supabaseClient";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
import { StringOutputParser } from "@langchain/core/output_parsers";
import {
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
} from "@langchain/core/prompts";
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
export interface IChat {
id?: number | undefined;
room: number;
role: string;
message: string;
created_at?: string | undefined; // timestamps arrive as strings in the JSON response
}
// Fetch the list of chats for a given room from the Supabase database.
export async function fetchChats(roomId: number): Promise<IChat[]> {
const { data, error } = await supabaseClient
.from("chats")
.select()
.eq("room", roomId)
.order("created_at", { ascending: true })
.returns<IChat[]>();
if (error) throw error;
return data;
}
// Post a new chat message to the database.
export async function postChat(chat: IChat): Promise<IChat> {
const { data, error } = await supabaseClient
.from("chats")
.insert(chat)
.select()
.single<IChat>();
if (error) throw error;
return data;
}
// Get an answer from the chatbot based on the user's chat message.
export async function getAnswer(chat: IChat, fileId: number): Promise<IChat> {
const vectorStore = await SupabaseVectorStore.fromExistingIndex(embeddings, {
client: supabaseClient,
tableName: "documents",
queryName: "match_documents",
});
const retriever = vectorStore.asRetriever({
filter: (rpc) => rpc.filter("metadata->>file_id", "eq", fileId),
k: 2,
});
const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;
const messages = [
SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
HumanMessagePromptTemplate.fromTemplate("{question}"),
];
const prompt = ChatPromptTemplate.fromMessages(messages);
const chain = RunnableSequence.from([
{
context: retriever.pipe(formatDocumentsAsString),
question: new RunnablePassthrough(),
},
prompt,
llm,
new StringOutputParser(),
]);
const answer = await chain.invoke(chat.message);
const { data, error } = await supabaseClient
.from("chats")
.insert({
role: "bot",
room: chat.room,
message: answer,
})
.select()
.single<IChat>();
if (error) throw error;
return data;
}
- fetchChats: Fetches the list of chats for a given room from the Supabase database, ordered by creation date.
- postChat: Posts a new chat message to the database.
- getAnswer: Gets an answer from the chatbot based on the user’s chat message. It uses the LangChain library to retrieve the most relevant documents from the vector store and generates a response using OpenAI’s language model.
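To see roughly what the model receives, the prompt assembly inside the chain can be approximated with plain string handling. This is a simplified sketch, not LangChain's code: `joinDocs` mimics `formatDocumentsAsString` under the assumption that it joins each document's `pageContent` with a blank line, and `buildMessages` is a hypothetical stand-in for the prompt template.

```typescript
type RetrievedDoc = { pageContent: string };
type PromptMessage = { role: "system" | "user"; content: string };

// Same wording as SYSTEM_TEMPLATE in getAnswer.
const SYSTEM_TEMPLATE_SKETCH = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;

// Assumption: retrieved documents are joined with a blank line between them.
function joinDocs(docs: RetrievedDoc[]): string {
  return docs.map((d) => d.pageContent).join("\n\n");
}

// Fill the {context} slot with the retrieved text and pass the question through unchanged.
function buildMessages(docs: RetrievedDoc[], question: string): PromptMessage[] {
  const context = joinDocs(docs);
  return [
    {
      role: "system",
      // A replacer function avoids special handling of "$" sequences in the context.
      content: SYSTEM_TEMPLATE_SKETCH.replace("{context}", () => context),
    },
    { role: "user", content: question },
  ];
}
```

With `k: 2` in the retriever, at most two documents end up in the context, so the answer can only draw on the two closest matches from the selected file.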
Step 4: Building the UI
Chat Room Component
The ChatRoom component handles the display and interaction of the chat interface.
// src/components/ChatRoom.tsx
import {
Box,
Button,
LinearProgress,
Stack,
TextField,
Typography,
} from "@mui/material";
import { ChangeEvent, MouseEvent, useEffect, useState } from "react";
import { IChat, fetchChats, getAnswer, postChat } from "@/services/chat";
export default function ChatRoom({
roomId,
fileId,
}: {
roomId: number;
fileId: number;
}) {
const [message, setMessage] = useState<string>("");
const [chats, setChats] = useState<IChat[]>([]);
const [submitting, setSubmitting] = useState(false);
const onChangeInput = (e: ChangeEvent<HTMLInputElement>) =>
setMessage(e.target.value);
const onSubmitInput = async (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
if (!message) return;
try {
setSubmitting(true);
const chat = await postChat({
role: "user",
room: roomId,
message,
});
setMessage("");
// Show the user's message right away so it stays visible even if the answer request fails.
setChats((prev) => [...prev, chat]);
const answer = await getAnswer(chat, fileId);
setChats((prev) => [...prev, answer]);
} catch (err) {
console.error(err);
} finally {
setSubmitting(false);
}
};
useEffect(() => {
(async () => {
try {
if (typeof roomId !== "undefined") {
const chats = await fetchChats(roomId);
setChats(chats);
}
} catch (err) {
console.error(err);
}
})();
}, [roomId]);
return (
<>
<Stack sx={{ gap: 2, mb: 2 }}>
{chats.map((chat, i) => (
<Box
key={i}
sx={{
display: "flex",
justifyContent: chat.role === "user" ? "flex-end" : "flex-start",
}}
>
<Box
sx={{
minWidth: "250px",
maxWidth: "1000px",
p: 2,
border: "1px solid #555",
borderRadius: (theme) => theme.spacing(2),
}}
>
<Typography
sx={{
whiteSpace: "pre-line",
wordBreak: "break-word",
mb: 2,
display: "block",
}}
>
{chat.message}
</Typography>
</Box>
</Box>
))}
</Stack>
{submitting && <LinearProgress />}
<TextField
fullWidth
multiline
minRows={2}
maxRows={10}
value={message}
label="Write Something ..."
onChange={onChangeInput}
sx={{ mb: 2 }}
/>
<Button
fullWidth
type="submit"
variant="contained"
onClick={onSubmitInput}
disabled={submitting}
>
<Typography>Send</Typography>
</Button>
</>
);
}
- useEffect: Fetches the list of chats for the room when the component mounts or the roomId changes.
- onSubmitInput: Handles sending a new chat message, posting it to the database, and getting a response from the chatbot.
File Uploader Component
The FileUploader component handles uploading files.
// src/components/Fi
import { ChangeEvent, MouseEvent, useState } from "react";
import { Box, Button, Typography } from "@mui/material";
import { IFile, saveFile } from "@/services/file";
export default function FileUploader({
onSave,
}: {
onSave: (file: IFile) => void;
}) {
const [inputFile, setInputFile] = useState<File | undefined>(undefined);
const [uploading, setUploading] = useState<boolean>(false);
const onChangeFile = (e: ChangeEvent<HTMLInputElement>) => {
const file = e?.target?.files?.[0];
setInputFile(file);
};
const handleSaveFile = async (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
if (!inputFile) return;
try {
setUploading(true);
const file = await saveFile(inputFile);
onSave(file);
} catch (err) {
console.error(err);
} finally {
setUploading(false);
}
};
return (
<>
<Box
component="label"
htmlFor="file-uploader"
sx={{ mb: 2, display: "block" }}
>
<input
accept="application/pdf"
id="file-uploader"
type="file"
style={{ display: "none" }}
onChange={onChangeFile}
/>
<Button variant="outlined" fullWidth component="span">
<Typography>{inputFile ? inputFile.name : "Select File"}</Typography>
</Button>
</Box>
<Button
fullWidth
variant="contained"
color="primary"
disabled={!inputFile || uploading}
onClick={handleSaveFile}
>
<Typography>Upload</Typography>
</Button>
</>
);
}
- handleSaveFile: Handles file upload, saving the file to the database, and updating the list of files.
Home Page
The Home component is the main page. It lets the user create or select a chat room, upload a file, and select a file to chat about.
// src/pages/index.tsx
import ChatRoom from "@/components/ChatRoom";
import FileUploader from "@/components/FileUploader";
import { IFile, fetchFiles } from "@/services/file";
import { IRoom, createRoom, fetchRooms } from "@/services/room";
import {
Button,
Divider,
Grid,
List,
ListItemButton,
Typography,
} from "@mui/material";
import { MouseEvent, useEffect, useMemo, useState } from "react";
export default function Home() {
const [rooms, setRooms] = useState<IRoom[]>([]);
const [files, setFiles] = useState<IFile[]>([]);
const [roomId, setRoomId] = useState<number | undefined>(undefined);
const [fileId, setFileId] = useState<number | undefined>(undefined);
const onSaveFile = (file: IFile) => setFiles((v) => [file, ...v]);
const handleCreateRoom = async (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
try {
const newRoom = await createRoom();
setRooms((v) => [newRoom, ...v]);
setRoomId(newRoom.id);
} catch (err) {
console.error(err);
}
};
const handleSelectRoom =
(id: number | undefined) => (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
setRoomId(id);
};
const handleSelectFile =
(id: number | undefined) => (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
setFileId(id);
};
useEffect(() => {
(async () => {
try {
const rooms = await fetchRooms();
setRooms(rooms);
const files = await fetchFiles();
setFiles(files);
} catch (err) {
console.error(err);
}
})();
}, []);
return (
<Grid container>
<Grid item xs={2} sx={{ p: 2 }}>
<Button fullWidth variant="contained" onClick={handleCreateRoom}>
New Chat
</Button>
<Divider sx={{ my: 2 }} />
<List>
{rooms.map((room, i) => (
<ListItemButton
selected={roomId === room.id}
key={i}
onClick={handleSelectRoom(room.id)}
>
{room.created_at?.toString()}
</ListItemButton>
))}
</List>
</Grid>
<Grid item xs={2} sx={{ p: 2 }}>
<FileUploader onSave={onSaveFile} />
<Divider sx={{ my: 2 }} />
<List>
{files.map((file, i) => (
<ListItemButton
selected={fileId === file.id}
key={i}
onClick={handleSelectFile(file.id)}
>
{file.name}
</ListItemButton>
))}
</List>
</Grid>
<Grid item xs sx={{ p: 2 }}>
{roomId && fileId ? (
<ChatRoom roomId={roomId as number} fileId={fileId as number} />
) : (
<Typography>Select one room and one file</Typography>
)}
</Grid>
</Grid>
);
}
- onSaveFile: Callback invoked once the FileUploader component has saved the file to the database, so that we can add the new file to the “files” state.
- handleCreateRoom: Handles creating a new chat room.
- handleSelectRoom: Handles selecting a room.
- handleSelectFile: Handles selecting a file.
- Conditional Rendering: Renders the helper text until both a room and a file are selected, and the ChatRoom component once they are.
Step 5: Running the Application
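Create a .env file in the root of your project to store your environment variables. Note that Next.js inlines every variable prefixed with NEXT_PUBLIC_ into the browser bundle, so the OpenAI key below is readable by anyone who can open the app; keep this setup to local experiments: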
NEXT_PUBLIC_SUPABASE_URL=your-supabase-url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-supabase-anon-key
NEXT_PUBLIC_OPENAI_API_KEY=your-openai-api-key
Finally, start your Next.js application:
npm run dev
Now, you should have a running application where you can upload PDF files, chat with a bot trained on your data, and receive relevant responses based on the uploaded content.
Conclusion
This guide provided a comprehensive overview of building a custom chatbot that can answer questions based on uploaded PDF files. You learned how to set up your project, configure Supabase and OpenAI, create the necessary services, and build the frontend components with React and Material UI. With this foundation, you can extend and customize the chatbot to fit your specific needs.
Check the source code in this repo:
https://github.com/firstpersoncode/chatbot
Happy coding!