Welcome to our journey into the world of conversational AI! In this blog series, we will build a simple but complete chatbot application that enables users to interact with PDF documents they upload.
Our goal is to create a chatbot that empowers users to engage with PDF content in a conversational manner: extracting information, answering questions, and providing insights, all through natural language interaction with the help of OpenAI's large language models (LLMs).
Our tech stack for this adventure includes some powerful tools and frameworks:
- LangChain: A framework that helps developers build context-aware reasoning applications. LangChain supports many LLMs; in this series we will use the models provided by OpenAI.
- Vercel AI SDK: an open-source library designed by Vercel to help developers build conversational streaming user interfaces.
- PostgreSQL with the pgvector extension: A robust relational database with advanced vector storage capabilities, essential for storing and retrieving document embeddings efficiently.
- Prisma: A modern database toolkit and ORM for Node.js and TypeScript, simplifying database access and management in our Nuxt application.
- Nuxt: A lightning-fast web framework built on Vue.js and Nitro, offering server-side rendering and a seamless developer experience.
- Tailwind CSS with the daisyUI plugin: A utility-first CSS framework paired with the amazing daisyUI plugin, offering a sleek and customizable UI design for our chatbot interface.
Throughout the series, we'll delve into each key component of chatting with documents, from setting up the infrastructure to implementing LLM capabilities. A fundamental knowledge of web development with Nuxt is required to start, but you don't have to be an expert in it. Let's dive in!
1. An Overview of the App Flow and a Key Concept
The App Flow
First, let's take a bird's-eye view of our chatbot application, which comprises two main parts: Data Contribution and Question Answering.
- Data Contribution
The user can contribute knowledge to our chatbot by uploading a PDF document. Here's what happens behind the scenes:
- PDF Upload: The user uploads a PDF file containing valuable information.
- Text Splitting: The uploaded file is split into smaller chunks using a text splitter, breaking down the document into digestible pieces.
- Embeddings: Each chunk of text is sent to OpenAI, which generates an embedding, a numerical vector that captures the essence of the text. We will explain embeddings in detail later on.
- Vector Store: These embeddings are stored in PostgreSQL, creating a searchable database of document representations.
- Question Answering
Here is how our chatbot engages with users to provide insightful answers:
- User Input: The user inputs some words or a question related to the content of the uploaded PDF.
- Conversation Memory: The user's input is stored in the conversation memory, serving as context for the ongoing conversation.
- Question Summarization: OpenAI summarizes the input into a concise question, ensuring clarity and relevance.
- Question Embedding: The summarized question is transformed into an embedding, a numerical representation, by OpenAI.
- Vector Search: The chatbot searches the PostgreSQL vector store for the nearest match to the question embedding, retrieving the most relevant document chunk.
- Answer Generation: Leveraging the user input, the nearest match, and the conversation history stored in memory, OpenAI generates a comprehensive answer.
- Presentation and Storage: The answer is presented to the user, providing valuable insights, and is also stored in the conversation memory for future reference.
By understanding this flow, you'll gain insight into how our chatbot seamlessly interacts with users, leveraging advanced technologies to provide intelligent responses based on uploaded PDF content and user inquiries.
Key Concept: Embeddings
Embeddings are like fingerprints for words or documents. They're numerical representations that capture the essence of text, enabling computers to understand and compare words based on their meaning. Think of embeddings as unique identifiers that help machines interpret language and make sense of textual data.
Embeddings encode semantic similarities between words by placing similar words closer together in the vector space. Words with similar meanings will have similar embeddings, allowing machines to infer relationships and make intelligent predictions.
Let's consider a simple example using a small corpus of text:
"The quick brown fox jumps over the lazy dog."
This piece of text can be tokenized (split) into individual words or tokens, to prepare the data for further processing, like this:
["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
Then we can train a model on the tokenized text data to generate an embedding for each word, like this:
"quick": [0.2, 0.4, -0.1, 0.5, ...]
"fox": [0.1, 0.3, -0.2, 0.6, ...]
"lazy": [0.3, 0.6, -0.3, 0.8, ...]
Each word is represented as a high-dimensional vector, where the values in the vector capture different aspects of the word's meaning. Similar words will have similar vector representations, allowing machines to understand and process language more effectively.
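To make "similar words have similar vectors" concrete, here is a minimal sketch of how two embedding vectors can be compared using cosine similarity, one common similarity measure. The numbers are made up purely for illustration and are not real OpenAI embeddings.
// Toy vectors invented for illustration; real embeddings have hundreds of dimensions.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (normA * normB);
}
const cat = [0.8, 0.1, 0.6];
const kitten = [0.75, 0.15, 0.62]; // a word with a similar meaning
const car = [-0.5, 0.9, -0.2]; // an unrelated word
console.log(cosineSimilarity(cat, kitten)); // ~0.998: the vectors point in nearly the same direction
console.log(cosineSimilarity(cat, car)); // ~-0.41: the vectors point in very different directions
This is exactly the kind of comparison the vector store will perform later, when it searches for the document chunk closest to a question.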
2. The App Skeleton
Before delving into the explanation of key concepts, let's roll up our sleeves and dive into building a chatbot app skeleton using Nuxt, Vercel AI SDK, and LangChain. Our primary objective here is to implement the simplest feature: enabling users to chat with a Language Model (LLM) powered by Vercel AI SDK.
Prerequisites
Make sure you have the following software installed on your local machine.
- Node.js (18+)
- PNPM (latest)
- Visual Studio Code (latest)
- Docker Desktop (latest, which will be used in later parts)
Setting Up the Nuxt App
We can refer to the Nuxt documentation to create a Nuxt app project from scratch.
Let's execute the command below in a terminal to create the project skeleton. Here we give our app a name: chatpdf.
npx nuxi@latest init chatpdf
When we are asked which package manager to use, let's choose pnpm.
❯ Which package manager would you like to use?
○ npm
● pnpm
○ yarn
○ bun
When the package installation step is finished and we are asked whether to initialize a git repository, choose Yes.
❯ Initialize git repository?
● Yes / ○ No
Then let's run code chatpdf to open the project in Visual Studio Code. From now on, the commands can be run in the VS Code terminal.
We can see that a minimal project structure has already been created, which includes:
- app.vue: a component that serves as the home page of the app
- public: a folder in which we can put static files like images and icons
- server: a folder in which we can put Nitro backend source code
Now we can run pnpm dev in the terminal to start a dev server and open http://localhost:3000/ in the browser to view the web app.
Installing Essential Modules
To equip our Nuxt app with essential chat features, we'll begin by installing the following Nuxt modules and packages:
- @nuxtjs/tailwindcss: the zero-config Tailwind CSS module for Nuxt
- daisyui: a Tailwind CSS plugin with a bunch of powerful UI elements
- ai: the Vercel AI SDK
- langchain: the LangChain framework
- lodash: a utility library that provides helpful functions for manipulating arrays and objects
We can run this command to install all of them.
pnpm add lodash ai langchain @langchain/core @langchain/openai @langchain/community
pnpm add -D @nuxtjs/tailwindcss daisyui @types/lodash
Then let's add the modules to the app configuration by modifying nuxt.config.ts (which resides in the project root folder) as below.
export default defineNuxtConfig({
devtools: { enabled: true },
modules: ['@nuxtjs/tailwindcss'], // add modules
tailwindcss: {
config: {
plugins: [require('daisyui')], // add daisyui plugins
},
},
});
Implementing the Chat UI
Now, let's create a Vue component named ChatBox.vue in the components subdirectory and insert the following source code into it.
<template>
<div class="flex flex-col gap-2">
<div class="overflow-scroll border border-solid border-gray-300 rounded p-4 flex-grow">
<template v-for="message in messages" :key="message.id">
<div
class="whitespace-pre-wrap chat"
:class="[isMessageFromUser(message) ? 'chat-end' : 'chat-start']"
>
<div class="chat-image flex flex-col items-center">
<div class="avatar">
<div class="w-10 rounded-full">
<img :src="isMessageFromUser(message) ? 'https://cdn-icons-png.flaticon.com/512/3541/3541871.png' : 'https://cdn-icons-png.flaticon.com/512/1624/1624640.png'" />
</div>
</div>
<strong>
{{ isMessageFromUser(message) ? 'Me' : 'AI' }}
</strong>
</div>
<div class="chat-header"></div>
<div class="chat-bubble mb-4" :class="isMessageFromUser(message) ? 'chat-bubble-secondary' : 'chat-bubble-primary'">
{{ message.content }}
</div>
</div>
</template>
</div>
<form class="w-full" @submit.prevent="handleSubmit">
<input
class="w-full p-2 mb-8 border border-gray-300 rounded shadow-xl outline-none"
v-model="input"
placeholder="Say something..."
/>
</form>
</div>
</template>
<script lang="ts" setup>
const messages = ref<any>([{
id: 0,
role: 'ai',
content: 'Hello!'
}]);
const input = ref<string>('');
function isMessageFromUser(message: any) {
return message.role === 'user';
}
function handleSubmit() {
if (input.value) {
messages.value = [
...messages.value,
{
id: messages.value.length,
role: 'user',
content: input.value,
},
{
id: messages.value.length + 1,
role: 'ai',
content: input.value,
},
];
input.value = '';
}
}
</script>
<style></style>
Then modify the app.vue file as below.
<template>
<div class="h-screen w-1/2 mx-auto my-2 flex flex-col gap-2">
<ChatBox class="flex-grow" />
</div>
</template>
Now we can run pnpm dev and open http://localhost:3000 to see a super simple echo bot app running!
Wiring Up OpenAI in the Backend
Now, let's integrate OpenAI into our chatpdf app to facilitate smooth conversations with ChatGPT.
First, obtain your OpenAI API key here. To store the API key securely, separate from the source code, create a .env file in the root directory and add your API key like this:
NUXT_OPENAI_API_KEY=your_api_key_here
Be sure to replace "your_api_key_here" with the actual key generated from your OpenAI account.
Please note that the key name NUXT_OPENAI_API_KEY must remain exactly as written, including its casing, so that our Nuxt app can read it properly. You can find more information on its usage here.
Next, let's define the environment variable we're using in nuxt.config.ts:
export default defineNuxtConfig({
...,
// add your desired config
runtimeConfig: {
openaiApiKey: '',
}
});
Then, create the api subdirectory inside the server directory and add a chat.ts file to it. Replace the contents of chat.ts with the following source code to integrate with the Vercel AI SDK, LangChain, and OpenAI.
import { LangChainStream, Message, StreamingTextResponse } from 'ai';
import { ChatOpenAI } from '@langchain/openai';
import { AIMessage, HumanMessage } from '@langchain/core/messages';
export default defineLazyEventHandler(() => {
// fetch the OpenAI API key
const apiKey = useRuntimeConfig().openaiApiKey;
if (!apiKey) {
throw createError('Missing OpenAI API key');
}
// create an OpenAI LLM client
const llm = new ChatOpenAI({
openAIApiKey: apiKey,
streaming: true,
});
return defineEventHandler(async (event) => {
const { messages } = await readBody<{ messages: Message[] }>(event);
const { stream, handlers } = LangChainStream();
llm
.invoke(
(messages as Message[]).map((message) =>
message.role === 'user' ? new HumanMessage(message.content) : new AIMessage(message.content)
),
{ callbacks: [handlers] }
)
.catch(console.error);
return new StreamingTextResponse(stream);
});
});
Please note that:
- We're utilizing defineLazyEventHandler to perform one-time setup before defining the actual event handler.
- useRuntimeConfig automatically retrieves environment variables from the .env file, mapping NUXT_*-style names (in SCREAMING_SNAKE_CASE) to their camelCase equivalents. For instance, NUXT_OPENAI_API_KEY is accessed as useRuntimeConfig().openaiApiKey.
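If you'd like to sanity-check the endpoint before wiring up the frontend, here is a minimal sketch of a standalone test script (not part of the app code). It assumes the dev server is running at http://localhost:3000 and that you execute the script with Node.js 18+ (for example via tsx); the file name is hypothetical.
// test-chat.ts (hypothetical file name): post one message and print the streamed reply.
const res = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [{ id: '1', role: 'user', content: 'Say hello in one sentence.' }],
  }),
});
// The endpoint streams plain text, so read the body chunk by chunk.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value));
}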
With the Vercel AI SDK now offering streaming capabilities for both frontend and backend, our task is to connect our ChatBox.vue component to the Vercel AI SDK for Vue. Let's proceed by updating the ChatBox.vue component as follows.
<template>
...
</template>
<script lang="ts" setup>
import { useChat, type Message } from 'ai/vue';
const { messages, input, handleSubmit } = useChat({
headers: { 'Content-Type': 'application/json' },
});
function isMessageFromUser(message: Message) {
return message.role === 'user';
}
</script>
<style></style>
As you've observed, the chat messages, the input message, and the chat submit handler are already thoroughly defined and implemented within the Vercel AI SDK. Therefore, all we need to do is import and utilize them via its useChat composable, which by default posts to the /api/chat endpoint we just created. Quite convenient, isn't it?
Now, let's launch our app with pnpm dev, and open http://localhost:3000 in the browser to test interacting with ChatGPT. Congratulations, you've done it!
3. Setting Up the Database and Vector Store
Let's begin by configuring the database to store document data and associated embeddings. We'll utilize Docker to run PostgreSQL, ensuring a smooth setup process. Ensure that Docker Desktop is installed and running on your local machine.
Running PostgreSQL with Docker Compose
If you're not well-versed in Docker and PostgreSQL, there's no need to worry. You don't need to be a Docker expert to set up your database efficiently. Docker Compose simplifies the setup process and guarantees consistency across different environments.
To start, create a docker-compose.yml file in the project's root directory with the following configuration, defining our PostgreSQL service and including the necessary settings for the pgvector extension.
version: '3'
services:
  db:
    image: ankane/pgvector
    ports:
      - 5432:5432
    volumes:
      - db:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_DB=chatpdf
volumes:
  db:
You can rest assured that the complexities of PostgreSQL installation and extension setup are all taken care of by Docker Compose. Ensure that Docker Desktop is running on your local machine, then execute the following command in your project's root directory:
docker compose up -d
This command will initiate the PostgreSQL service defined in the docker-compose.yml file, along with the pgvector extension. Docker Compose will handle container creation, network setup, and volume mounting automatically.
If you're comfortable with the command line, you won't need to install any additional PostgreSQL client tools on your local computer. Simply use docker compose to access the running container and inspect the database:
docker compose exec db psql -h localhost -U postgres -d chatpdf
Once inside the container's psql prompt, execute \l to display all databases. You should see something like this:
chatpdf=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
-----------+----------+----------+------------+------------+------------+-----------------+-----------------------
chatpdf | postgres | UTF8 | en_US.utf8 | en_US.utf8 | | libc |
postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 | | libc |
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
(4 rows)
The chatpdf database is the one we'll connect to our app. Execute \dt to display all tables in the current database (which should be chatpdf):
chatpdf=# \dt
Did not find any relations
As expected, it's currently empty; we'll create our first table shortly. Execute \q to exit psql and leave the container, since we'll manipulate the database from the source code (using Prisma) rather than the command line.
Setting Up the Database with Prisma
Prisma is a modern database toolkit that streamlines database access and manipulation for developers. It provides a type-safe and auto-generated database client, facilitating seamless interaction with our PostgreSQL database.
Let's begin by installing the necessary packages for Prisma and performing initialization:
pnpm add prisma @prisma/client
npx prisma init
This will generate a prisma/schema.prisma file with the following content:
// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema
generator client {
provider = "prisma-client-js"
}
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
}
Now, append the following source code to define our first model, Document, which will store document chunks:
model Document {
id String @id @default(cuid())
content String
vector Unsupported("vector")?
}
In simple terms, we've defined three fields for the Document model:
- id: the document chunk ID
- content: the original text of the document chunk
- vector: the generated embedding for the document chunk
Since the schema uses an environment variable named DATABASE_URL, we need to define it in the .env file. Add the following line to .env, keeping the NUXT_OPENAI_API_KEY entry we added earlier:
DATABASE_URL=postgres://postgres:postgres@localhost:5432/chatpdf
Now, run npx prisma migrate dev to create a migration and apply it to the database. You'll be prompted to name the new migration; for example, you can name it create the document table:
Environment variables loaded from .env
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "chatpdf", schema "public" at "localhost:5432"
? Enter a name for the new migration: › create the document table
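A quick hedge before moving on: depending on your Prisma and Postgres versions, this migration can fail with an error like type "vector" does not exist, because the pgvector extension has to be enabled once per database and the ankane/pgvector image only ships the extension files. If that happens, one workaround (an assumption about your setup, not a step the original flow requires) is to create the migration without applying it via npx prisma migrate dev --create-only, add the extension line at the top of the generated SQL, and then run npx prisma migrate dev again to apply it.
-- Hypothetical path: prisma/migrations/<timestamp>_create_the_document_table/migration.sql
CREATE EXTENSION IF NOT EXISTS vector;
-- ...keep the generated CREATE TABLE "Document" statement below this line...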
Once the command successfully executes, the Document table will be created in the chatpdf database. To verify the result, connect to the container's psql console:
docker compose exec db psql -h localhost -U postgres -d chatpdf
In the psql console, run select * from "Document" to display the contents of the table:
chatpdf=# select * from "Document";
id | content | vector
----+---------+--------
(0 rows)
Now that we have an empty Document table created, let's proceed to populate it with some data.
4. Uploading Files, Text Splitting, and Embedding
In this section, we'll delve into the process of uploading files, splitting their text content into manageable chunks, generating embeddings to represent each chunk, and finally storing the embeddings into the database (vector store). This is a critical step in the data contribution phase and lays the groundwork for building a chatbot capable of intelligently interacting with the content of uploaded documents.
File Uploading
As file uploading is a common task for web applications, we'll simplify the process by leveraging the popular Node.js package called formidable. Since Nuxt's backend is powered by the h3 framework, we'll opt for the h3-formidable package instead. This package seamlessly integrates formidable with h3, and it includes the original formidable package as a dependency. Additionally, we'll install its corresponding type library for enhanced TypeScript support.
pnpm add h3-formidable
pnpm add -D @types/formidable
Next, we'll create a server/api/upload.ts file and add the following source code to it.
import _ from 'lodash';
import { readFiles } from 'h3-formidable';
export default defineEventHandler(async (event) => {
const { files } = await readFiles(event, {
maxFiles: 1,
keepExtensions: true,
});
_.chain(files)
.values()
.flatten()
.compact()
.value()
.forEach((file) => {
const { originalFilename, newFilename } = file;
console.log({ originalFilename, newFilename });
});
});
For now, our primary goal is to parse and log the uploaded files to verify the basic feature. To keep things simple, let's restrict the number of uploaded files to just one.
Next, let's integrate the file uploading feature into the frontend. In app.vue, insert a file-input element above the ChatBox component to enable file selection and uploading. Make sure to include the accept=".pdf" attribute in the <input> element to limit file selection to only PDF files.
<template>
<div class="h-screen w-1/2 mx-auto my-2 flex flex-col gap-2">
<form class="flex justify-between items-center gap-1">
<input
type="file"
id="file"
accept=".pdf"
@change="uploadFile($event.target as HTMLInputElement)"
/>
</form>
<ChatBox class="flex-grow" />
</div>
</template>
Afterwards, include the following TypeScript source code within a <script> tag to implement the uploadFile function:
<script lang="ts" setup>
async function uploadFile(elem: HTMLInputElement) {
const formData = new FormData();
formData.append('file', elem.files?.[0] as Blob);
try {
await useFetch('/api/upload', {
method: 'POST',
body: formData,
});
} catch (error) {
console.error(error);
}
}
</script>
Now, run pnpm dev and access http://localhost:3000. Click "Choose File" and select a local file to complete the upload. You should see log messages similar to the following, confirming that the basic file uploading is functioning:
{
originalFilename: 'Design Patterns.pdf',
newFilename: '30e4616e653a5dd30a0b1c300.pdf'
}
Text Splitting
After uploading a file, we'll employ the LangChain framework to split its text content into smaller, digestible chunks. This step is essential for breaking down lengthy documents into individual sentences or paragraphs, enabling our chatbot to process and analyze each part effectively.
To begin, we'll need to install the pdf-parse package to gain the capability to parse PDF data:
pnpm add pdf-parse
Next, we'll modify the server/api/upload.ts file as follows to handle this task.
import _ from 'lodash';
import { readFiles } from 'h3-formidable';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
export default defineEventHandler(async (event) => {
const { files } = await readFiles(event, {
maxFiles: 1,
keepExtensions: true,
});
_.chain(files)
.values()
.flatten()
.compact()
.value()
.forEach(async (file) => {
const loader = new PDFLoader(file.filepath);
const docs = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
});
const chunks = await splitter.splitDocuments(docs);
chunks.map((chunk, i) => {
console.log(
`page ${i}:`,
_.truncate(chunk.pageContent, { length: 10 })
);
});
});
});
We utilize PDFLoader to load and parse the uploaded file, and RecursiveCharacterTextSplitter to split it into chunks. The parameters for RecursiveCharacterTextSplitter are:
- chunkSize: the maximum size of one document chunk
- chunkOverlap: the size of the overlap between adjacent chunks
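If you want to get a feel for how these two parameters interact before uploading a real PDF, here is a small standalone sketch (not part of the upload endpoint) with deliberately tiny values so the chunking behavior is easy to observe.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
// Tiny values for demonstration only; the endpoint above uses 500/50.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 40,
  chunkOverlap: 10,
});
const text =
  'Design patterns are reusable solutions to common problems in software design.';
const chunks = await splitter.splitText(text);
console.log(chunks);
// Each chunk is at most about 40 characters, and neighboring chunks may share
// up to 10 characters of overlapping text at their boundaries.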
Now, let's run the app with docker compose up -d && pnpm dev, then access http://localhost:3000 in your browser. Once the page loads, select a PDF document (for instance, I'll choose a lecture document about design patterns) and upload it to the server.
Inspecting the terminal running the dev server, we'll see log messages resembling the following. (Note: text contents may vary.)
page 0: Design Patterns
...
page 1: • connections among...
page 2: between flexibility...
page 3: Disadvantages: The ...
page 4: Disadvantages: Comm...
page 5: and do bookkeeping ...
page 6: Disadvantages: Comm...
page 7: 1.2 When (not) to ...
page 8: the domain and pri...
page 9: can increase under ...
page 10: way and in this pr...
...
These log messages indicate that the uploaded file has been successfully split into multiple document chunks. Now, it's time to generate embeddings for each chunk.
Embeddings & Vector Store
To generate embeddings, we'll utilize OpenAIEmbeddings, and to store the embeddings in the database, we'll use PrismaVectorStore. Let's import these components and implement the code in server/api/upload.ts.
import _ from 'lodash';
import { readFiles } from 'h3-formidable';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from '@langchain/openai';
import { PrismaVectorStore } from '@langchain/community/vectorstores/prisma';
import { Prisma, PrismaClient, type Document } from '@prisma/client';
export default defineEventHandler(async (event) => {
const { files } = await readFiles(event, {
maxFiles: 1,
keepExtensions: true,
});
_.chain(files)
.values()
.flatten()
.compact()
.value()
.forEach(async (file) => {
// split the uploaded file into chunks
const loader = new PDFLoader(file.filepath);
const docs = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
});
const chunks = await splitter.splitDocuments(docs);
// generate embeddings and the vector store
const openAIApiKey = useRuntimeConfig().openaiApiKey;
const embeddings = new OpenAIEmbeddings({ openAIApiKey });
const db = new PrismaClient();
const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
embeddings,
{
prisma: Prisma,
tableName: 'Document',
vectorColumnName: 'vector',
columns: {
id: PrismaVectorStore.IdColumn,
content: PrismaVectorStore.ContentColumn,
},
}
);
// store the chunks in the database
await db.document.deleteMany(); // delete existing document chunks
await vectorStore.addModels(
await db.$transaction(
chunks.map((chunk) =>
db.document.create({ data: { content: chunk.pageContent } })
)
)
);
});
});
Since we only allow one file to exist at a time, we delete all existing document chunks before inserting the new data.
Let's proceed with the following steps to test our implementation:
- Run the app using the command: docker compose up -d && pnpm dev
- Access http://localhost:3000 in your browser.
- Upload a PDF file using the provided UI.
- Enter the db container by running: docker compose exec db psql -h localhost -U postgres -d chatpdf
- In the psql console, execute the following SQL statement to inspect the result: select * from "Document" limit 1
You should observe a result similar to the following:
id | content | vector
--------------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
clt0t638o000uhkyprf1voxkr | Design Patterns +| [0.017870644,0.010731987,-0.0072209467,0.00070460804,-0.012679516,0.004573952,-0.032422256,-0.014085304,-0.012117201,-0.026236793,3.6932724e-06,0.012713804,-0.0016620865,-0.010649697,0.010341109,0.023822954,0.0016320848,-0.0071455142,0.037551668,0.0007217518,0.008551301,-0.0059111645,-0.010574264,-0.007755832,-0.009730792,...]
The sequence of numbers represents the generated embeddings vector for the text chunk.
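As an optional sanity check (a sketch for a scratch script or a temporary server route, not something the tutorial requires), you can also query the vector store directly and print the chunks closest to a test phrase:
import { OpenAIEmbeddings } from '@langchain/openai';
import { PrismaVectorStore } from '@langchain/community/vectorstores/prisma';
import { Prisma, PrismaClient, type Document } from '@prisma/client';
// Build the same vector store as in upload.ts; here the API key is read straight from the environment.
const db = new PrismaClient();
const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
  new OpenAIEmbeddings({ openAIApiKey: process.env.NUXT_OPENAI_API_KEY }),
  {
    prisma: Prisma,
    tableName: 'Document',
    vectorColumnName: 'vector',
    columns: {
      id: PrismaVectorStore.IdColumn,
      content: PrismaVectorStore.ContentColumn,
    },
  }
);
// Return the two chunks whose embeddings are nearest to the query embedding.
const results = await vectorStore.similaritySearch('design patterns', 2);
results.forEach((doc, i) => console.log(`match ${i}:`, doc.pageContent));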
5. Standalone Questions and Prompt Templates
Now, let's delve into the question-answering aspect. In essence, the process revolves around generating a vector for the user's input question, finding the nearest match in our vector store, and formulating an appropriate response.
However, before we begin, it's crucial to understand the concept of standalone questions.
Standalone Questions
A standalone question is a distilled version of the user's input question, devoid of any contextual dependencies. It serves as a concise summary, facilitating clear and efficient communication.
For instance, consider a user query on an online store:
I'm thinking of buying one of your T-shirts, but I need to know what your return policy is as some T-shirts just don't fit me and I don't want to waste my money.
Parsing this original question directly might yield inaccurate results due to its verbosity and extraneous details. Therefore, we distill it down to its core essence: I need to know what your return policy is. This concise version is what we search for in our database, ensuring accuracy and relevance in our responses.
To accomplish this, we leverage the mechanism of prompt templates in LangChain.
Prompt Templates
Prompt templates serve as a structured framework for guiding interactions between users and AI systems. They define predefined formats or patterns for AI responses, ensuring relevance and coherence in conversations.
For instance, consider the following example using LangChain's prompt templates:
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
const model = new ChatOpenAI({});
const promptTemplate = PromptTemplate.fromTemplate("Tell me a joke about {topic}");
const chain = promptTemplate.pipe(model);
const result = await chain.invoke({ topic: "bears" });
console.log(result);
/*
AIMessage {
content: "Why don't bears wear shoes?\n\nBecause they have bear feet!",
}
*/
In this example, the prompt template "Tell me a joke about {topic}" incorporates a parameter {topic}. By providing a value for {topic}, such as "bears", we generate a specific prompt: "Tell me a joke about bears". This structured prompt guides the AI's response, resulting in a relevant and targeted answer.
Now, let's leverage prompt templates to generate standalone questions in the server/api/chat.ts file.
import { LangChainStream, Message, StreamingTextResponse } from 'ai';
import { ChatOpenAI } from '@langchain/openai';
import { PromptTemplate } from '@langchain/core/prompts';
import _ from 'lodash';
export default defineLazyEventHandler(() => {
const apiKey = useRuntimeConfig().openaiApiKey;
if (!apiKey) {
throw createError('Missing OpenAI API key');
}
const llm = new ChatOpenAI({
openAIApiKey: apiKey,
streaming: true,
});
return defineEventHandler(async (event) => {
const { messages } = await readBody<{ messages: Message[] }>(event);
const { stream, handlers } = LangChainStream();
const standaloneQuestionTemplate = `Given a question, convert the question to a standalone question.
question: {question}
standalone question:`;
const standaloneQuestionPrompt = PromptTemplate.fromTemplate(standaloneQuestionTemplate);
standaloneQuestionPrompt
.pipe(llm)
.invoke(
{ question: _.last(messages)?.content || '' },
{ callbacks: [handlers] }
)
.catch(console.error);
return new StreamingTextResponse(stream);
});
});
Notice that standaloneQuestionPrompt.pipe(llm) is just the first chain we've created. In LangChain, chains are fundamental components that consist of sequences of calls to various entities, such as large language models (LLMs), tools, or data preprocessing steps. Chains are assembled by linking objects and actions together with the pipe method, analogous to how pipes (|) work in the Linux command line, to execute a series of tasks. Finally, the invoke method is called to obtain the final result.
Now, let's launch our application and give it a test run. Instead of a direct answer, you should receive the standalone version of your input question as the response.
6. Retrieval and Answering
Now that we've set up the standalone question prompt, let's proceed to use these standalone questions as inputs to search the database and generate responses.
Retrieval
The process outlined above is known as Retrieval. It typically involves the following steps:
- Generating embeddings for the standalone question.
- Utilizing these embeddings to query the vector store and identify the closest match.
- Retrieving all matched document chunks.
The component responsible for executing retrieval tasks is referred to as a retriever. You can obtain a retriever from a Prisma vector store using the code snippet provided below.
const llm = new ChatOpenAI({ openAIApiKey: "..." });
const db = new PrismaClient();
const embeddings = new OpenAIEmbeddings({ openAIApiKey: "..." });
const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
embeddings,
{
prisma: Prisma,
tableName: 'Document',
vectorColumnName: 'vector',
columns: {
id: PrismaVectorStore.IdColumn,
content: PrismaVectorStore.ContentColumn,
},
}
);
const retriever = vectorStore.asRetriever();
...
The retriever object can seamlessly integrate into our chain, handling all the intricate retrieval tasks for us. Let's incorporate the provided code snippet into our app's server/api/chat.ts file.
import { Message, StreamingTextResponse } from 'ai';
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { PromptTemplate } from '@langchain/core/prompts';
import _ from 'lodash';
import { Prisma, PrismaClient, type Document } from '@prisma/client';
import { PrismaVectorStore } from '@langchain/community/vectorstores/prisma';
import { StringOutputParser } from '@langchain/core/output_parsers';
export default defineLazyEventHandler(() => {
const apiKey = useRuntimeConfig().openaiApiKey;
if (!apiKey) {
throw createError('Missing OpenAI API key');
}
const llm = new ChatOpenAI({
openAIApiKey: apiKey,
streaming: true,
});
const db = new PrismaClient();
const embeddings = new OpenAIEmbeddings({ openAIApiKey: apiKey });
const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
embeddings,
{
prisma: Prisma,
tableName: 'Document',
vectorColumnName: 'vector',
columns: {
id: PrismaVectorStore.IdColumn,
content: PrismaVectorStore.ContentColumn,
},
}
);
return defineEventHandler(async (event) => {
const { messages } = await readBody<{ messages: Message[] }>(event);
// standalone question prompt
const standaloneQuestionTemplate = `Given a question, convert the question to a standalone question.
question: {question}
standalone question:`;
const standaloneQuestionPrompt = PromptTemplate.fromTemplate(standaloneQuestionTemplate);
// retrieval
const retriever = vectorStore.asRetriever();
const outputParser = new StringOutputParser();
// chain them together
const chain = standaloneQuestionPrompt
.pipe(llm)
.pipe(outputParser)
.pipe(retriever)
.pipe((docs) => docs.map((e) => e.pageContent).join('\n\n'));
const stream = await chain.stream({
question: _.last(messages)?.content || '',
});
return new StreamingTextResponse(stream);
});
});
Please note that:
- We've introduced a StringOutputParser object into our chain to extract a plain text string from the LLM result, because the next step, retriever, expects a string as input.
- Instead of using handlers and stream from LangChainStream, we've used chain.stream() to generate a streaming HTTP response. This is because LangChainStream only handles events from direct interaction with the LLM.
- To concatenate the retrieved document chunks into a single text string, we've employed the following function as the last step in the chain: (docs) => docs.map((e) => e.pageContent).join('\n\n')
With these changes in place, let's launch our app using the command pnpm dev and conduct a test in the browser. You should observe the content of the matched document chunks displayed together.
Answering
Let's proceed to generate a readable answer based on the matched document chunks.
To begin, we'll define the personality of our chatbot and create suitable prompt templates for crafting answers. Our chatbot is designed to:
- Be friendly
- Provide answers only based on the given context
- Avoid making up answers
- Apologize if it's unable to find an answer
Below is an example of an answering prompt:
const answerTemplate = `You are a helpful support assistant who can provide answers based on the provided context.
Please try to find the answer within the given context.
If you're unable to find the answer, respond with "I'm sorry, I don't know the answer to that."
Never attempt to create an answer. Always respond as if you were conversing with a friend.
Context: {context}
Question: {question}
Answer: `;
const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);
Next, let's incorporate this prompt into our chain within the server/api/chat.ts file.
...
export default defineLazyEventHandler(() => {
...
return defineEventHandler(async (event) => {
...
// standalone question prompt
...
// answer prompt
const answerTemplate = `You are a helpful support assistant who can answer a given question based on the context provided.
Try to find the answer in the context.
If you can't find the answer, say "I'm sorry, I don't know the answer to that."
Don't try to make up an answer. Always speak as if you were chatting to a friend.
context: {context}
question: {question}
answer: `;
const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);
...
// chain
const question = _.last<Message>(messages)?.content || '';
const chain = standaloneQuestionPrompt
.pipe(llm)
.pipe(outputParser)
.pipe(retriever)
.pipe((docs) => ({
question,
context: docs.map((e) => e.pageContent).join('\n\n'),
}))
.pipe(answerPrompt)
.pipe(llm)
.pipe(outputParser);
const stream = await chain.stream({ question });
return new StreamingTextResponse(stream);
});
});
Please keep in mind:
- We define a question variable outside the chain to store the user's original input question, so it can be accessed in multiple steps.
- Retrieved document chunks are concatenated into a context field to match the input expected by the subsequent step.
- The output generated by the LLM is parsed into a text string by outputParser to ensure correct streaming.
To test the application, run docker compose up -d && pnpm dev and conduct a trial in the browser.
As the source code of our chain has become lengthy, it needs some restructuring to stay manageable. This is where the RunnableSequence class, introduced in the next section, becomes essential.
7. The RunnableSequence Class
In LangChain, a Runnable is akin to an independent task or job that can be executed. Similarly, a RunnableSequence is a series of such tasks lined up sequentially, where the output of one task becomes the input for the next. Think of it as a production line, where each step completes its job and passes the result downstream.
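Before applying this to our chat endpoint, here is a minimal, self-contained sketch (not part of the app code) showing how RunnableSequence.from() chains plain functions, each receiving the previous step's output:
import { RunnableSequence } from '@langchain/core/runnables';
// A toy production line: trim the text, then uppercase it, then wrap the result.
const toyChain = RunnableSequence.from([
  (input: { text: string }) => input.text.trim(),
  (text: string) => text.toUpperCase(),
  (text: string) => `Result: ${text}`,
]);
console.log(await toyChain.invoke({ text: '  hello runnables  ' }));
// Result: HELLO RUNNABLES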
To enhance the organization of our current chain, we'll split it into three smaller chains: standalone question processing, retrieval, and answering. This structured approach will streamline our code and improve manageability. Let's proceed to build these chains individually.
- The standalone-question chain:
  const standaloneQuestionTemplate = ...
  const standaloneQuestionPrompt = PromptTemplate.fromTemplate(standaloneQuestionTemplate);
  const questionChain = standaloneQuestionPrompt
    .pipe(llm)
    .pipe(outputParser)
    .pipe((qst) => {
      standaloneQuestion = qst;
      return qst;
    });
- The retrieval chain:
  const retriever = vectorStore.asRetriever();
  const retrievalChain = retriever.pipe((docs) => ({
    question: standaloneQuestion,
    context: docs.map((e) => e.pageContent).join('\n\n'),
  }));
- The answering chain:
  const answerTemplate = ...
  const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);
  const answerChain = answerPrompt.pipe(llm).pipe(outputParser);
We can refactor our server/api/chat.ts file by combining the smaller chains into one using RunnableSequence.from(). This will help streamline the code and improve readability. Let's proceed with the modification.
...
import { RunnableSequence } from '@langchain/core/runnables';
export default defineLazyEventHandler(() => {
...
return defineEventHandler(async (event) => {
const { messages } = await readBody<{ messages: Message[] }>(event);
const outputParser = new StringOutputParser();
let question = _.last<Message>(messages)?.content || '';
// standalone question chain
const standaloneQuestionTemplate = `Given a question, convert the question to a standalone question.
question: {question}
standalone question:`;
const standaloneQuestionPrompt = PromptTemplate.fromTemplate(
standaloneQuestionTemplate
);
const questionChain = standaloneQuestionPrompt.pipe(llm).pipe(outputParser);
// retrieval chain
const retriever = vectorStore.asRetriever();
const retrievalChain = retriever.pipe((docs) => ({
question,
context: docs.map((e) => e.pageContent).join('\n\n'),
}));
// answering chain
const answerTemplate = `You are a helpful support assistant who can answer a given question based on the context provided.
Try to find the answer in the context.
If you can't find the answer, say "I'm sorry, I don't know the answer to that."
Don't try to make up an answer. Always speak as if you were chatting to a friend.
context: {context}
question: {question}
answer: `;
const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);
const answerChain = answerPrompt.pipe(llm).pipe(outputParser);
// overall chain
const chain = RunnableSequence.from([questionChain, retrievalChain, answerChain]);
const stream = await chain.stream({ question: _.last(messages)?.content || '' });
return new StreamingTextResponse(stream);
});
});
Please note that:
- RunnableSequence.from() takes an array of RunnableLike objects, which behave like Runnable, to construct a new chain.
- We've defined a question variable outside the chain, representing the original input, so that it remains accessible across the whole chain.
Using variables defined outside the chain can hinder maintainability as the chain logic grows. LangChain provides a solution with the RunnablePassthrough class.
Consider RunnablePassthrough as a conveyor belt in a factory, seamlessly transferring input from one end to the other without alteration. When the input is an object, it can also be combined with other steps to augment that input with extra information.
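Here is a tiny sketch (again, not app code) of the pattern we're about to use: when a step in a RunnableSequence is an object map, each value runs against the same input, and a RunnablePassthrough entry simply keeps that original input available for later steps.
import { RunnablePassthrough, RunnableSequence } from '@langchain/core/runnables';
const demo = RunnableSequence.from([
  {
    shouted: (input: { text: string }) => input.text.toUpperCase(),
    originalInput: new RunnablePassthrough(),
  },
  (step: { shouted: string; originalInput: { text: string } }) =>
    `${step.shouted} (original was: "${step.originalInput.text}")`,
]);
console.log(await demo.invoke({ text: 'hello passthrough' }));
// HELLO PASSTHROUGH (original was: "hello passthrough")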
Let's integrate RunnablePassthrough into our server/api/chat.ts file, replacing the reliance on external variables.
...
import { RunnablePassthrough, RunnableSequence} from '@langchain/core/runnables';
import { Document as DocumentChunk } from '@langchain/core/documents';
export default defineLazyEventHandler(() => {
...
return defineEventHandler(async (event) => {
const { messages } = await readBody<{ messages: Message[] }>(event);
const outputParser = new StringOutputParser();
// standalone question chain
...
// retrieval chain
const retriever = vectorStore.asRetriever();
const retrievalChain = RunnableSequence.from([
(prevResult) => prevResult.standaloneQuestion,
retriever,
(docs: DocumentChunk[]) =>
docs.map((doc) => doc.pageContent).join('\n\n'),
]);
// answering chain
...
// overall chain
const chain = RunnableSequence.from([
{
standaloneQuestion: questionChain,
originalInput: new RunnablePassthrough(),
},
{
context: retrievalChain,
question: ({ originalInput }) => originalInput.question,
},
answerChain,
]);
const stream = await chain.stream({
question: _.last(messages)?.content || '',
});
return new StreamingTextResponse(stream);
});
});
Please note that:
- In the first step of the overall chain, we've turned the input into an object that wraps a RunnablePassthrough instance (under the key originalInput). This keeps the original input parameters available for later steps.
- We've adjusted the second step of the overall chain to match the input expected by the following answerChain.
- We've revamped the retrievalChain using a RunnableSequence instead of the previous pipe, and introduced a destructuring function before the retriever. This is necessary because the output of the preceding step (which wraps questionChain) is no longer a simple text string; it now has this structure: { standaloneQuestion: ..., originalInput: ... }.
8. Conversation Memory
To enhance the LLM's answer generation process, it's advantageous to leverage the following information:
- Nearest match
- Original user input
- Conversation history
While we've already integrated the first two types of information, let's finalize the process by incorporating the conversation history.
Fortunately, the Vercel AI SDK offers built-in functionality to manage conversation history in memory for both the backend and the frontend.
In server/api/chat.ts, the initial line of the defineEventHandler function is:
const { messages } = await readBody<{ messages: Message[] }>(event);
This line extracts the messages variable from the HTTP body: an array of Message objects representing the conversation history. Let's update server/api/chat.ts to include this conversation history in our chain.
...
return defineEventHandler(async (event) => {
...
const standaloneQuestionTemplate = `Given some conversation history (if any) and a question, convert the question to a standalone question.
conversation history: {conversation}
question: {question}
standalone question:`;
...
const answerTemplate = `You are a helpful support assistant who can answer a given question based on the context provided.
At first, try to find the answer in the context.
If the answer is not given in the context, find the answer in the conversation history if possible.
If you really don't know the answer, say "I'm sorry, I don't know the answer to that."
Don't try to make up an answer. Always speak as if you were chatting to a friend.
context: {context}
conversation history: {conversation}
question: {question}
answer: `;
...
// overall chain
const chain = RunnableSequence.from([
{
standaloneQuestion: questionChain,
originalInput: new RunnablePassthrough(),
},
{
context: retrievalChain,
conversation: ({ originalInput }) => originalInput.conversation,
question: ({ originalInput }) => originalInput.question,
},
answerChain,
]);
const stream = await chain.stream({
question: _.last(messages)?.content || '',
conversation: messages.map((m) => `${m.role}: ${m.content}`).join('\n\n'),
});
...
});
...
Please note that:
- We've updated both the question and answer prompt templates to incorporate the conversation history, whether generating standalone questions or answers.
- We've included the received messages in the chain's input as the conversation field.
Now, run the app again with docker compose up -d && pnpm dev, and observe the outcome. You'll notice that we've successfully developed a comprehensive chatbot application capable of processing any PDF document provided.
9. Wrap Up
Congratulations on completing this journey! I trust you've gained valuable insights into building a simple document chatbot with basic skills. However, the features we've developed represent just the tip of the iceberg in terms of what a real-life chatbot can offer. Consider enhancing the app with additional features such as managing multiple file uploads, implementing user authentication, and handling extensive conversation histories.
Moreover, it's essential to address any potential limitations in AI performance. To improve precision and efficiency, consider adjusting parameters like chunk size and overlap in text splitting, optimizing prompt engineering, and fine-tuning OpenAI settings such as temperature and model selection.
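For instance, here is a hedged sketch of what adjusting those settings could look like in server/api/chat.ts; the model name is only an example, so swap in whichever model fits your account and budget.
import { ChatOpenAI } from '@langchain/openai'; // already imported in chat.ts
// Replace the existing llm construction with something like this:
const llm = new ChatOpenAI({
  openAIApiKey: useRuntimeConfig().openaiApiKey,
  modelName: 'gpt-3.5-turbo', // example model name; an assumption, not a requirement
  temperature: 0.2, // a lower temperature tends to produce more deterministic answers
  streaming: true,
});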
Remember, the path to mastering generative AI is both challenging and rewarding. Keep pushing forward and exploring new possibilities.
Feel free to refer to the source code provided here. Additionally, the sample PDF document used in this tutorial can be found here.
Keep striving for excellence, and don't hesitate to reach out if you encounter any hurdles along the way.