Edgar

Building an Intelligent Chatbot with Nuxt, LangChain, and Vercel AI SDK

Welcome to our journey into the world of conversational AI! In this blog series, we will build a simple but complete chatbot application that enables users to interact with PDF documents they upload.

Our goal is to create a chatbot that empowers users to engage with PDF content in a conversational manner, extracting information, answering questions, and providing insights — all through natural language interaction, with the help of OpenAI's large language models (LLMs).

Our tech stack for this adventure includes some powerful tools and frameworks:

  • LangChain: A framework that helps developers build context-aware reasoning applications. Although LangChain supports many LLMs, we will use the models provided by OpenAI, the most popular option.
  • Vercel AI SDK: An open-source library designed by Vercel to help developers build conversational streaming user interfaces.
  • PostgreSQL with the pgvector extension: A robust relational database with advanced vector storage capabilities, essential for storing and retrieving document embeddings efficiently.
  • Prisma: A modern database toolkit and ORM for Node.js and TypeScript, simplifying database access and management in our Nuxt application.
  • Nuxt: A lightning-fast web framework built on Vue.js and Nitro, offering server-side rendering and a seamless developer experience.
  • Tailwind CSS with the daisyUI plugin: A utility-first CSS framework paired with the amazing daisyUI plugin, offering a sleek and customizable UI design for our chatbot interface.

Throughout the series, we'll delve into each key component of chatting with documents, from setting up the infrastructure to implementing LLM capabilities. A fundamental knowledge of web development with Nuxt is required to get started, but you don't have to be an expert. Let's dive in!

1. An Overview of the App Flow and Key Concept

The App Flow

First, let's take a bird's-eye view of our chatbot application, which comprises two main parts: Data Contribution and Question Answering.

  • Data Contribution

The user can contribute knowledge to our chatbot by uploading a PDF document. Here's what happens behind the scenes:

  1. PDF Upload: The user uploads a PDF file containing valuable information.
  2. Text Splitting: The uploaded file is split into smaller chunks using a text splitter, breaking down the document into digestible pieces.
  3. Embeddings: Each chunk of text is sent to OpenAI, where sophisticated algorithms generate embeddings, numerical vectors that capture the essence of the text. We will explain embeddings in more detail later on.
  4. Vector Store: These embeddings are stored in PostgreSQL, creating a searchable database of document representations.
  • Question Answering

Here is how our chatbot engages with users to provide insightful answers:

  1. User Input: The user enters some words or a question related to the content of the uploaded PDF.
  2. Conversation Memory: The input is stored in the conversation memory, serving as the context for the ongoing conversation.
  3. Question Summarization: OpenAI summarizes the input into a concise question, ensuring clarity and relevance.
  4. Question Embedding: The summarized question is transformed into an embedding — a numerical representation — by OpenAI's powerful algorithms.
  5. Vector Search: The chatbot searches the PostgreSQL vector store for the nearest match to the question embedding, retrieving the most relevant document chunk.
  6. Answer Generation: Leveraging the user input, the nearest match, and the conversation history stored in memory, OpenAI generates a comprehensive answer.
  7. Presentation and Storage: The answer is presented to the user, providing valuable insights, and is also stored in the conversation memory for future reference.

By understanding this flow, you'll gain insight into how our chatbot seamlessly interacts with users, leveraging advanced technologies to provide intelligent responses based on uploaded PDF content and user inquiries.

Key Concept: Embeddings

Embeddings are like fingerprints for words or documents. They're numerical representations that capture the essence of text, enabling computers to understand and compare words based on their meaning. Think of embeddings as unique identifiers that help machines interpret language and make sense of textual data.

Embeddings encode semantic similarities between words by placing similar words closer together in the vector space. Words with similar meanings will have similar embeddings, allowing machines to infer relationships and make intelligent predictions.

Let's consider a simple example using a small corpus of text:

"The quick brown fox jumps over the lazy dog."

This piece of text can be tokenized (split) into individual words or tokens, to prepare the data for further processing, like this:

["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

Then we can train a model on the tokenized text data to generate an embedding for each word, like this:

"quick": [0.2, 0.4, -0.1, 0.5, ...]
"fox": [0.1, 0.3, -0.2, 0.6, ...]
"lazy": [0.3, 0.6, -0.3, 0.8, ...]

Each word is represented as a high-dimensional vector, where the values in the vector capture different aspects of the word's meaning. Similar words will have similar vector representations, allowing machines to understand and process language more effectively.
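Because similarity is encoded geometrically, comparing two embeddings usually comes down to a distance or angle calculation such as cosine similarity. Here is a minimal TypeScript sketch of that idea; the short vectors are made-up toy values, not real OpenAI embeddings:

function cosineSimilarity(a: number[], b: number[]): number {
  // dot product of the two vectors
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  // magnitude of each vector
  const normA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0));
  const normB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0));
  return dot / (normA * normB);
}

// toy vectors: two "similar" words point in roughly the same direction
console.log(cosineSimilarity([0.2, 0.4, -0.1], [0.25, 0.38, -0.05])); // close to 1
console.log(cosineSimilarity([0.2, 0.4, -0.1], [-0.3, 0.1, 0.9]));    // much lower

The vector store we set up later performs this kind of nearest-neighbor comparison for us at query time, so we never have to compute it by hand.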

2. The App Skeleton

Before delving into more key concepts, let's roll up our sleeves and dive into building the chatbot app skeleton using Nuxt, the Vercel AI SDK, and LangChain. Our primary objective here is to implement the simplest feature: enabling users to chat with a large language model (LLM), powered by the Vercel AI SDK.

Prerequisites

Make sure you have the following software installed on your local machine.

  • Node.js (18+)
  • PNPM (latest)
  • Visual Studio Code (latest)
  • Docker Desktop (latest, which will be used in later parts)

Setting Up the Nuxt App

We can refer to the Nuxt documentation to create a Nuxt project from scratch.

Let's execute the command below in a terminal to create the project skeleton. Here we give our app a name: chatpdf.

npx nuxi@latest init chatpdf

When asked which package manager to use, choose pnpm.

❯ Which package manager would you like to use?
○ npm
● pnpm
○ yarn
○ bun

When the package installation finishes and we are asked whether to initialize a git repository, choose Yes.

❯ Initialize git repository?
● Yes / ○ No

Then let's run code chatpdf to open the project in Visual Studio Code. From now on, commands can be run in the VS Code terminal.

We can see that a minimal project structure has already been created, which includes:

  • A component named app.vue, which serves as the app's home page
  • A public folder in which we can put static files like images and icons.
  • A server folder in which we can put Nitro backend source code.

Now we can run pnpm dev in the terminal to start a dev server and open http://localhost:3000/ in the browser to view the web app.

Install Essential Modules

To equip our Nuxt app with essential chatting features, we'll begin by installing the Nuxt modules and packages below.

  • @nuxtjs/tailwindcss: the zero-config Tailwind CSS module for Nuxt
  • daisyui: a Tailwind CSS plugin with a bunch of powerful UI elements
  • ai: Vercel AI SDK
  • langchain: the LangChain framework, together with its scoped packages @langchain/core, @langchain/openai, and @langchain/community
  • lodash: a utility library that provides helpful functions for manipulating arrays and objects

We can run these commands to install all of them.

pnpm add lodash ai langchain @langchain/core @langchain/openai @langchain/community
pnpm add -D @nuxtjs/tailwindcss daisyui @types/lodash

Then let's add modules into the app configuration by modifying nuxt.config.ts (that resides in the project root folder) as below.

export default defineNuxtConfig({
  devtools: { enabled: true },
  modules: ['@nuxtjs/tailwindcss'],  // add modules
  tailwindcss: {
    config: {
      plugins: [require('daisyui')],  // add daisyui plugins
    },
  },
});

Implement the Chat UI

Now, let's create a components subdirectory containing a Vue component named ChatBox.vue, and insert the following source code into it.

<template>
  <div class="flex flex-col gap-2">
    <div class="overflow-scroll border border-solid border-gray-300 rounded p-4 flex-grow">
      <template v-for="message in messages" :key="message.id">
        <div
          class="whitespace-pre-wrap chat"
          :class="[isMessageFromUser(message) ? 'chat-end' : 'chat-start']"
        >
          <div class="chat-image flex flex-col items-center">
            <div class="avatar">
              <div class="w-10 rounded-full">
                <img :src="isMessageFromUser(message) ? 'https://cdn-icons-png.flaticon.com/512/3541/3541871.png' : 'https://cdn-icons-png.flaticon.com/512/1624/1624640.png'" />
              </div>
            </div>
            <strong>
              {{ isMessageFromUser(message) ? 'Me' : 'AI' }}
            </strong>
          </div>
          <div class="chat-header"></div>
          <div class="chat-bubble mb-4" :class="isMessageFromUser(message) ? 'chat-bubble-secondary' : 'chat-bubble-primary'">
            {{ message.content }}
          </div>
        </div>
      </template>
    </div>
    <form class="w-full" @submit.prevent="handleSubmit">
      <input
        class="w-full p-2 mb-8 border border-gray-300 rounded shadow-xl outline-none"
        v-model="input"
        placeholder="Say something..."
      />
    </form>
  </div>
</template>

<script lang="ts" setup>
const messages = ref<any>([{
  id: 0,
  role: 'ai',
  content: 'Hello!'
}]);
const input = ref<string>('');

function isMessageFromUser(message: any) {
  return message.role === 'user';
}

function handleSubmit() {
  if (input.value) {
    messages.value = [
      ...messages.value,
      {
        id: messages.value.length,
        role: 'user',
        content: input.value,
      },
      {
        id: messages.value.length + 1,
        role: 'ai',
        content: input.value,
      },
    ];
    input.value = '';
  }
}
</script>

<style></style>

Then modify the app.vue file as below.

<template>
  <div class="h-screen w-1/2 mx-auto my-2 flex flex-col gap-2">
    <ChatBox class="flex-grow" />
  </div>
</template>

Now we can run pnpm dev and open http://localhost:3000 to see a super simple echo bot app running!

Wire Up OpenAI in the Backend

Now, let's integrate OpenAI into our chatpdf app to facilitate smooth conversations with ChatGPT.

First, obtain your OpenAI API key here. To securely store the API key separately from the source code, create a .env file in the project's root directory and add your API key like this:

NUXT_OPENAI_API_KEY=your_api_key_here

Be sure to replace "your_api_key_here" with the actual key generated from your OpenAI account.

Please note that the key name NUXT_OPENAI_API_KEY must be kept exactly as written, including its casing, so that our Nuxt app can read it properly. You can find more information on its usage here.

Next, let's define the environment variable we're using in nuxt.config.ts:

export default defineNuxtConfig({
  ...,
  // add your desired config
  runtimeConfig: {
    openaiApiKey: '',
  }
});

Then, create the api subdirectory inside the server directory and add a chat.ts file to it.

Now replace the contents of chat.ts with the following source code to integrate with Vercel SDK, LangChain, and OpenAI.

import { LangChainStream, Message, StreamingTextResponse } from 'ai';
import { ChatOpenAI } from '@langchain/openai';
import { AIMessage, HumanMessage } from '@langchain/core/messages';

export default defineLazyEventHandler(() => {
  // fetch the OpenAI API key
  const apiKey = useRuntimeConfig().openaiApiKey;
  if (!apiKey) {
    throw createError('Missing OpenAI API key');
  }

  // create an OpenAI LLM client
  const llm = new ChatOpenAI({
    openAIApiKey: apiKey,
    streaming: true,
  });

  return defineEventHandler(async (event) => {
    const { messages } = await readBody<{ messages: Message[] }>(event);

    const { stream, handlers } = LangChainStream();
    llm
      .invoke(
        (messages as Message[]).map((message) =>
          message.role === 'user' ? new HumanMessage(message.content) : new AIMessage(message.content)
        ),
        { callbacks: [handlers] }
      )
      .catch(console.error);
    return new StreamingTextResponse(stream);
  });
});

Please note that:

  • We're utilizing defineLazyEventHandler to perform one-time setup before defining the actual event handler.
  • useRuntimeConfig will automatically retrieve environment variables from the .env file, converting NUXT_***-like names (in SCREAMING_SNAKE_CASE) to their camel-case equivalents. For instance, NUXT_OPENAI_API_KEY will be accessed as useRuntimeConfig().openaiApiKey.

With Vercel AI SDK now offering streaming capabilities for both frontend and backend, our task is to connect our ChatBox.vue component to the Vercel AI SDK for Vue. Let's proceed by updating our ChatBox.vue component as follows.

<template>
...
</template>

<script lang="ts" setup>
import { useChat, type Message } from 'ai/vue';

const { messages, input, handleSubmit } = useChat({
  headers: { 'Content-Type': 'application/json' },
});

function isMessageFromUser(message: Message) {
  return message.role === 'user';
}
</script>

<style></style>

As you've observed, the chat messages, input message, and chat submit handler are already thoroughly defined and implemented within the Vercel AI SDK. Therefore, all we need to do is import and utilize them using its useChat composable. Quite convenient, isn't it?

Now, let's launch our app with pnpm dev, and open http://localhost:3000 in the browser to test interacting with ChatGPT. Congratulations, you've done it!

[Screenshot: v0]

3. Setting Up the Database and Vector Store

Let's begin by configuring the database to store document data and associated embeddings. We'll utilize Docker to run PostgreSQL, ensuring a smooth setup process. Ensure that Docker Desktop is installed and running on your local machine.

Running PostgreSQL with Docker Compose

If you're not well-versed in Docker and PostgreSQL, there's no need to worry. You don't need to be a Docker expert to set up your database efficiently. Docker Compose simplifies the setup process and guarantees consistency across different environments.

To start, create a docker-compose.yml file in the project's root directory with the following configuration, defining our PostgreSQL service and including the necessary settings for the pgvector extension.

version: '3'
services:
  db:
    image: ankane/pgvector
    ports:
      - 5432:5432
    volumes:
      - db:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_DB=chatpdf

volumes:
  db:

You can rest assured that the complexities of PostgreSQL installation and extension setup are all taken care of by Docker Compose. Ensure that Docker Desktop is running on your local machine, then execute the following command in your project's root directory:

docker compose up -d

This command will initiate the PostgreSQL service defined in the docker-compose.yml file, along with the pgvector extension. Docker Compose will handle container creation, network setup, and volume mounting automatically.

If you're comfortable with the command line, you won't need to install any additional PostgreSQL client tools on your local computer. Simply use docker compose to access the running container and inspect the database:

docker compose exec db psql -h localhost -U postgres -d chatpdf

Once inside the container's terminal, execute \l to display all databases. You should see something like this:

chatpdf=# \l
                                                List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    | ICU Locale | Locale Provider |   Access privileges
-----------+----------+----------+------------+------------+------------+-----------------+-----------------------
 chatpdf   | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | =c/postgres          +
           |          |          |            |            |            |                 | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |            | libc            | =c/postgres          +
           |          |          |            |            |            |                 | postgres=CTc/postgres
(4 rows)

The chatpdf database is the one we'll connect to our app. Execute \dt to display all tables in the current database (which should be chatpdf):

chatpdf=# \dt
Did not find any relations

As expected, it's currently empty. We'll create our first table shortly. To exit the terminal and the container, execute \q since we'll manipulate the database in the source code (using Prisma) rather than the command line.

Setting Up the Database with Prisma

Prisma is a modern database toolkit that streamlines database access and manipulation for developers. It provides a type-safe and auto-generated database client, facilitating seamless interaction with our PostgreSQL database.

Let's begin by installing the necessary packages for Prisma and performing initialization:

pnpm add prisma @prisma/client
npx prisma init

This will generate a prisma/schema.prisma file with the following content:

// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

Now, append the following source code to define our first model, Document, which will store document chunks:

model Document {
  id      String    @id @default(cuid())
  content String
  vector  Unsupported("vector")?
}

In simple terms, we've defined three fields for the Document model:

  • id: the document chunk ID
  • content: the original text of the document chunk
  • vector: the generated embedding for the document chunk

Since the schema uses an environment variable named DATABASE_URL, we need to define it in the .env file. Replace the contents of .env with the following:

DATABASE_URL=postgres://postgres:postgres@localhost:5432/chatpdf

Now, run npx prisma migrate dev to create and apply a migration. You'll be prompted to name the new migration. For example, you can name it create the document table:

Environment variables loaded from .env
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "chatpdf", schema "public" at "localhost:5432"

? Enter a name for the new migration: › create the document table

Once the command successfully executes, the Document table will be created in the chatpdf database. (If the migration instead fails with an error such as type "vector" does not exist, the pgvector extension has probably not been enabled yet; running CREATE EXTENSION IF NOT EXISTS vector; in the psql console we used earlier and re-running the migration should fix it.) To verify the result, connect to the container's console:

docker compose exec db psql -h localhost -U postgres -d chatpdf

In the container's console, run select * from "Document" to inspect the Document table:

chatpdf=# select * from "Document";

 id | content | vector
----+---------+--------
(0 rows)

Now that we have an empty Document table created, let's proceed to populate some data into it.

4. Uploading Files, Text Splitting, and Embedding

In this section, we'll delve into the process of uploading files, splitting their text content into manageable chunks, generating embeddings to represent each chunk, and finally storing the embeddings into the database (vector store). This is a critical step in the data contribution phase and lays the groundwork for building a chatbot capable of intelligently interacting with the content of uploaded documents.

File Uploading

As file uploading is a common task for web applications, we'll simplify the process by leveraging the popular Node.js package called formidable. Since Nuxt's backend is powered by the h3 framework, we'll opt for the h3-formidable package instead. This package seamlessly integrates formidable with h3, and it includes the original formidable package as a dependency. Additionally, we'll install its corresponding type library for enhanced TypeScript support.

pnpm add h3-formidable
pnpm add -D @types/formidable

Next, we'll create a server/api/upload.ts file and incorporate the provided source code into it.

import _ from 'lodash';
import { readFiles } from 'h3-formidable';

export default defineEventHandler(async (event) => {
  const { files } = await readFiles(event, {
    maxFiles: 1,
    keepExtensions: true,
  });
  _.chain(files)
    .values()
    .flatten()
    .compact()
    .value()
    .forEach((file) => {
      const { originalFilename, newFilename } = file;
      console.log({ originalFilename, newFilename });
    });
});

For now, our primary goal is to parse and log the uploaded files to verify the basic feature. To keep things simple, let's restrict the number of uploaded files to just one.

Next, let's integrate the file uploading feature into the frontend. In app.vue, insert a file-input element above the ChatBox component to enable file selection and uploading. Make sure to include the accept=".pdf" attribute in the <input> element to limit file selection to only PDF files.

<template>
  <div class="h-screen w-1/2 mx-auto my-2 flex flex-col gap-2">
    <form class="flex justify-between items-center gap-1">
      <input
        type="file"
        id="file"
        accept=".pdf"
        @change="uploadFile($event.target as HTMLInputElement)"
      />
    </form>
    <ChatBox class="flex-grow" />
  </div>
</template>

Afterwards, include the following TypeScript source code within a <script> tag to implement the uploadFile function:

<script lang="ts" setup>
async function uploadFile(elem: HTMLInputElement) {
  const formData = new FormData();
  formData.append('file', elem.files?.[0] as Blob);
  try {
    await useFetch('/api/upload', {
      method: 'POST',
      body: formData,
    });
  } catch (error) {
    console.error(error);
  }
}
</script>

Now, run pnpm dev and access http://localhost:3000. Click "Choose File" and select a local PDF file to complete the upload. You should see log messages similar to the following, confirming that the basic file uploading is functioning:

{
  originalFilename: 'Design Patterns.pdf',
  newFilename: '30e4616e653a5dd30a0b1c300.pdf'
}

Text Splitting

After uploading a file, we'll employ the LangChain framework to split its text content into smaller, digestible chunks. This step is essential for breaking down lengthy documents into individual sentences or paragraphs, enabling our chatbot to process and analyze each part effectively.

To begin, we'll need to install the pdf-parse package to gain the capability to parse PDF data:

pnpm add pdf-parse

Next, we'll modify the server/api/upload.ts file as follows to handle this task.

import _ from 'lodash';
import { readFiles } from 'h3-formidable';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

export default defineEventHandler(async (event) => {
  const { files } = await readFiles(event, {
    maxFiles: 1,
    keepExtensions: true,
  });
  _.chain(files)
    .values()
    .flatten()
    .compact()
    .value()
    .forEach(async (file) => {
      const loader = new PDFLoader(file.filepath);
      const docs = await loader.load();
      const splitter = new RecursiveCharacterTextSplitter({
        chunkSize: 500,
        chunkOverlap: 50,
      });
      const chunks = await splitter.splitDocuments(docs);
      chunks.map((chunk, i) => {
        console.log(
          `page ${i}:`,
          _.truncate(chunk.pageContent, { length: 10 })
        );
      });
    });
});

We utilize PDFLoader to load and parse the uploaded file, and RecursiveCharacterTextSplitter to split it into chunks. The parameters for RecursiveCharacterTextSplitter are:

  • chunkSize: the maximum size of one document chunk
  • chunkOverlap: the size of overlap between chunks
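To get a feel for how these two parameters interact, you can run the splitter on a plain string, independent of any PDF. A minimal sketch, with arbitrary sample text and deliberately tiny sizes:

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 40,    // at most ~40 characters per chunk
  chunkOverlap: 10, // neighboring chunks share up to ~10 characters
});

const chunks = await splitter.splitText(
  'Design patterns are reusable solutions to commonly occurring problems in software design.'
);
console.log(chunks); // an array of short, slightly overlapping strings

Larger chunks preserve more context per embedding, while more overlap reduces the chance that a sentence is cut in half right where the answer lives.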

Let's execute the app by running docker compose up -d && pnpm dev, then access http://localhost:3000 in your browser. Once the page loads, select a PDF document (for instance, I'll choose a lecture document about design patterns), and upload it to the server.

If we inspect the server console, the log messages will resemble the following (the exact text will vary with your document):

page 0: Design Patterns
...
page 1: • connections among...
page 2: between flexibility...
page 3: Disadvantages: The ...
page 4: Disadvantages: Comm...
page 5: and do bookkeeping ...
page 6: Disadvantages: Comm...
page 7: 1.2 When (not) to ...
page 8: the domain and pri...
page 9: can increase under ...
page 10: way and in this pr...
...

These log messages indicate that the uploaded file has been successfully split into multiple document chunks. Now, it's time to generate embeddings for each chunk.

Embeddings & Vector Store

To generate embeddings, we'll utilize OpenAIEmbeddings, and to store the embeddings into the database, we'll use PrismaVectorStore. Let's import these components and implement the code in server/api/upload.ts.

import _ from 'lodash';
import { readFiles } from 'h3-formidable';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from '@langchain/openai';
import { PrismaVectorStore } from '@langchain/community/vectorstores/prisma';
import { Prisma, PrismaClient, type Document } from '@prisma/client';

export default defineEventHandler(async (event) => {
  const { files } = await readFiles(event, {
    maxFiles: 1,
    keepExtensions: true,
  });
  _.chain(files)
    .values()
    .flatten()
    .compact()
    .value()
    .forEach(async (file) => {
      // split the uploaded file into chunks
      const loader = new PDFLoader(file.filepath);
      const docs = await loader.load();
      const splitter = new RecursiveCharacterTextSplitter({
        chunkSize: 500,
        chunkOverlap: 50,
      });
      const chunks = await splitter.splitDocuments(docs);
      // generate embeddings and the vector store
      const openAIApiKey = useRuntimeConfig().openaiApiKey;
      const embeddings = new OpenAIEmbeddings({ openAIApiKey });
      const db = new PrismaClient();
      const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
        embeddings,
        {
          prisma: Prisma,
          tableName: 'Document',
          vectorColumnName: 'vector',
          columns: {
            id: PrismaVectorStore.IdColumn,
            content: PrismaVectorStore.ContentColumn,
          },
        }
      );
      // store the chunks in the database
      await db.document.deleteMany();   // delete existing document chunks
      await vectorStore.addModels(
        await db.$transaction(
          chunks.map((chunk) =>
            db.document.create({ data: { content: chunk.pageContent } })
          )
        )
      );
    });
});

Since we assume only one file exists at a time, we delete all existing document chunks before inserting new data.

Let's proceed with the following steps to test our implementation:

  1. Run the app using the commands: docker compose up -d && pnpm dev
  2. Access http://localhost:3000 in your browser.
  3. Upload a PDF file using the provided UI.
  4. Enter the db container by running the command: docker compose exec db psql -h localhost -U postgres -d chatpdf
  5. In the container's console, execute the following SQL statement to inspect the result:
   select * from "Document" limit 1;

You should observe a result similar to the following:

            id            |      content     |              vector
--------------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
clt0t638o000uhkyprf1voxkr | Design Patterns +| [0.017870644,0.010731987,-0.0072209467,0.00070460804,-0.012679516,0.004573952,-0.032422256,-0.014085304,-0.012117201,-0.026236793,3.6932724e-06,0.012713804,-0.0016620865,-0.010649697,0.010341109,0.023822954,0.0016320848,-0.0071455142,0.037551668,0.0007217518,0.008551301,-0.0059111645,-0.010574264,-0.007755832,-0.009730792,...]

The sequence of numbers represents the generated embeddings vector for the text chunk.

5. Standalone Questions and Prompt Templates

Now, let's delve into the question-answering aspect. In essence, the process revolves around generating a vector for the user's input question, finding the nearest match in our vector store, and formulating an appropriate response.

However, before we begin, it's crucial to understand the concept of standalone questions.

Standalone Questions

A standalone question is a distilled version of the user's input question, devoid of any contextual dependencies. It serves as a concise summary, facilitating clear and efficient communication.

For instance, consider a user query on an online store:

I'm thinking of buying one of your T-shirts, but I need to know what your return policy is as some T-shirts just don't fit me and I don't want to waste my money.

Parsing this original question directly might yield inaccurate results due to its verbosity and extraneous details. Therefore, we distill it down to its core essence: I need to know what your return policy is. This concise version is what we search for in our database, ensuring accuracy and relevance in our responses.

To accomplish this, we leverage the mechanism of prompt templates in LangChain.

Prompt Templates

Prompt templates serve as a structured framework for guiding interactions between users and AI systems. They define predefined formats or patterns for AI responses, ensuring relevance and coherence in conversations.

For instance, consider the following example using LangChain's prompt templates:

import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

const model = new ChatOpenAI({});
const promptTemplate = PromptTemplate.fromTemplate("Tell me a joke about {topic}");

const chain = promptTemplate.pipe(model);

const result = await chain.invoke({ topic: "bears" });

console.log(result);

/*
  AIMessage {
    content: "Why don't bears wear shoes?\n\nBecause they have bear feet!",
  }
*/

In this example, the prompt template "Tell me a joke about {topic}" incorporates a parameter {topic}. By providing a value for {topic}, such as "bears", we generate a specific prompt: "Tell me a joke about bears". This structured prompt guides the AI's response, resulting in a relevant and targeted answer.

Now, let's leverage prompt templates to generate standalone questions in the server/api/chat.ts file.

import { LangChainStream, Message, StreamingTextResponse } from 'ai';
import { ChatOpenAI } from '@langchain/openai';
import { PromptTemplate } from '@langchain/core/prompts';
import _ from 'lodash';

export default defineLazyEventHandler(() => {
  const apiKey = useRuntimeConfig().openaiApiKey;
  if (!apiKey) {
    throw createError('Missing OpenAI API key');
  }
  const llm = new ChatOpenAI({
    openAIApiKey: apiKey,
    streaming: true,
  });

  return defineEventHandler(async (event) => {
    const { messages } = await readBody<{ messages: Message[] }>(event);

    const { stream, handlers } = LangChainStream();

    const standaloneQuestionTemplate = `Given a question, convert the question to a standalone question.
      question: {question}
      standalone question:`;
    const standaloneQuestionPrompt = PromptTemplate.fromTemplate(standaloneQuestionTemplate);

    standaloneQuestionPrompt
      .pipe(llm)
      .invoke(
        { question: _.last(messages)?.content || '' },
        { callbacks: [handlers] }
      )
      .catch(console.error);
    return new StreamingTextResponse(stream);
  });
});


Notice that standaloneQuestionPrompt.pipe(llm) is just the first chain we've created. In LangChain, chains are fundamental components that consist of sequences of calls to various entities, such as large language models (LLMs), tools, or data preprocessing steps.

Chains are assembled by linking objects and actions together using the pipe method, analogous to how pipes (|) function in the Linux command line, to execute a series of tasks. Finally, the invoke method is invoked to obtain the ultimate result.

Now, let's launch our application and give it a test run. You should receive the standalone version of your input question, as depicted in the image below.

[Screenshot: v1]

6. Retrieval and Answering

Now that we've set up the standalone question prompt, let's proceed to use these standalone questions as inputs to search the database and generate responses.

Retrieval

The process outlined above is known as Retrieval. It typically involves the following steps:

  • Generating embeddings for the standalone question.
  • Utilizing these embeddings to query the vector store and identify the closest match.
  • Retrieving all matched document chunks.

The component responsible for executing retrieval tasks is referred to as a retriever. You can obtain a retriever from a Prisma vector store using the code snippet provided below.

const llm = new ChatOpenAI({ openAIApiKey: "..." });
const db = new PrismaClient();
const embeddings = new OpenAIEmbeddings({ openAIApiKey: "..." });
const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
  embeddings,
  {
    prisma: Prisma,
    tableName: 'Document',
    vectorColumnName: 'vector',
    columns: {
      id: PrismaVectorStore.IdColumn,
      content: PrismaVectorStore.ContentColumn,
    },
  }
);
const retriever = vectorStore.asRetriever();
...

The retriever object can seamlessly integrate into our chain, handling all the intricate retrieval tasks for us.
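Before wiring the retriever into a chain, it can be handy to sanity-check it on its own. A small sketch, assuming the vectorStore from the snippet above and a document already uploaded (the query string is just an example):

const retriever = vectorStore.asRetriever();
// fetch the document chunks closest to an arbitrary query
const docs = await retriever.getRelevantDocuments('What is the factory pattern?');
docs.forEach((doc, i) => {
  console.log(`match ${i}:`, doc.pageContent.slice(0, 80));
});

By default the retriever returns a handful of the closest chunks; you can usually tune this by passing a value for k to vectorStore.asRetriever().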

Let's incorporate the provided code snippet into our app's server/api/chat.ts file.

import { Message, StreamingTextResponse } from 'ai';
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { PromptTemplate } from '@langchain/core/prompts';
import _ from 'lodash';
import { Prisma, PrismaClient, type Document } from '@prisma/client';
import { PrismaVectorStore } from '@langchain/community/vectorstores/prisma';
import { StringOutputParser } from '@langchain/core/output_parsers';

export default defineLazyEventHandler(() => {
  const apiKey = useRuntimeConfig().openaiApiKey;
  if (!apiKey) {
    throw createError('Missing OpenAI API key');
  }
  const llm = new ChatOpenAI({
    openAIApiKey: apiKey,
    streaming: true,
  });

  const db = new PrismaClient();
  const embeddings = new OpenAIEmbeddings({ openAIApiKey: apiKey });
  const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
    embeddings,
    {
      prisma: Prisma,
      tableName: 'Document',
      vectorColumnName: 'vector',
      columns: {
        id: PrismaVectorStore.IdColumn,
        content: PrismaVectorStore.ContentColumn,
      },
    }
  );

  return defineEventHandler(async (event) => {
    const { messages } = await readBody<{ messages: Message[] }>(event);
    // standalone question prompt
    const standaloneQuestionTemplate = `Given a question, convert the question to a standalone question.
      question: {question}
      standalone question:`;
    const standaloneQuestionPrompt = PromptTemplate.fromTemplate(standaloneQuestionTemplate);
    // retrieval
    const retriever = vectorStore.asRetriever();
    const outputParser = new StringOutputParser();
    // chain them together
    const chain = standaloneQuestionPrompt
      .pipe(llm)
      .pipe(outputParser)
      .pipe(retriever)
      .pipe((docs) => docs.map((e) => e.pageContent).join('\n\n'));
    const stream = await chain.stream({
      question: _.last(messages)?.content || '',
    });
    return new StreamingTextResponse(stream);
  });
});

Please note that:

  • We've introduced a StringOutputParser object into our chain to extract a text string from the LLM result, as the next step, retriever, expects a string as input.
  • Instead of using handlers and stream from LangChainStream, we've used chain.stream() to generate a streaming HTTP response, because LangChainStream only handles callback events from LLM interactions, not from the other steps in the chain.
  • To concatenate retrieved document chunks into a single text string, we've employed the following function as the last step in the chain:
  (docs) => docs.map((e) => e.pageContent).join('\n\n')

With these changes in place, let's launch our app using the command pnpm dev and conduct a test in the browser. You should observe the content of matched document chunks displayed together.

[Screenshot: v2]

Answering

Let's proceed to generate a readable answer based on the matched document chunks.

To begin, we'll define the personality of our chatbot and create suitable prompt templates for crafting answers. Our chatbot is designed to:

  • Be friendly
  • Provide answers only based on the given context
  • Avoid making up answers
  • Apologize if it's unable to find an answer

Below is an example of an answering prompt:

const answerTemplate = `You are a helpful support assistant who can provide answers based on the provided context.
  Please try to find the answer within the given context.
  If you're unable to find the answer, respond with "I'm sorry, I don't know the answer to that."
  Never attempt to create an answer. Always respond as if you were conversing with a friend.
  Context: {context}
  Question: {question}
  Answer: `;
const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);

Next, let's incorporate this prompt into our chain within the server/api/chat.ts file.

...
export default defineLazyEventHandler(() => {
  ...
  return defineEventHandler(async (event) => {
    ...
    // standalone question prompt
    ...
    // answer prompt
    const answerTemplate = `You are a helpful support assistant who can answer a given question based on the context provided.
      Try to find the answer in the context.
      If you can't find the answer, say "I'm sorry, I don't know the answer to that."
      Don't try to make up an answer. Always speak as if you were chatting to a friend.
      context: {context}
      question: {question}
      answer: `;
    const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);
    ...
    // chain
    const question = _.last<Message>(messages)?.content || '';
    const chain = standaloneQuestionPrompt
      .pipe(llm)
      .pipe(outputParser)
      .pipe(retriever)
      .pipe((docs) => ({
        question,
        context: docs.map((e) => e.pageContent).join('\n\n'),
      }))
      .pipe(answerPrompt)
      .pipe(llm)
      .pipe(outputParser);
    const stream = await chain.stream({ question });
    return new StreamingTextResponse(stream);
  });
});

Please keep in mind:

  • We define a question variable outside the chain to store the original input question from the user for access in multiple steps.
  • Retrieved document chunks are concatenated into a context field to match the input for the subsequent step.
  • The output generated by the LLM is parsed into a text string by outputParser to ensure correct streaming.

To test the application, run docker compose up -d && pnpm dev and conduct a trial in the browser.

As the chain's source code has grown lengthy, it needs some restructuring to stay manageable. This is where the RunnableSequence class, covered in the next section, becomes essential.

7. The RunnableSequence Class

In LangChain, a Runnable is akin to an independent task or job that can be executed. Similarly, a RunnableSequence is a series of such tasks lined up sequentially, where the output of one task becomes the input for the next. Think of it as a production line, where each step completes its job and passes the result downstream.
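To make that concrete, here is a tiny, self-contained sketch (unrelated to our app) showing how RunnableSequence.from() chains plain functions, each one consuming the previous output:

import { RunnableSequence } from '@langchain/core/runnables';

const sequence = RunnableSequence.from([
  (text: string) => text.trim(),          // step 1: clean up the input
  (text: string) => text.toUpperCase(),   // step 2: transform it
  (text: string) => `Result: ${text}`,    // step 3: format the final output
]);

console.log(await sequence.invoke('  hello world  ')); // "Result: HELLO WORLD"

In our app, each step will be a real LangChain component (a prompt, an LLM call, a retriever) instead of a plain function, but the flow is exactly the same.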

To enhance the organization of our current chain, we'll split it into three smaller chains: standalone question processing, retrieval, and answering. This structured approach will streamline our code and improve manageability. Let's proceed to build these chains individually.

  • The standalone-question chain:
  let standaloneQuestion = '';
  const standaloneQuestionTemplate = ...
  const standaloneQuestionPrompt = PromptTemplate.fromTemplate(standaloneQuestionTemplate);
  const questionChain = standaloneQuestionPrompt.pipe(llm).pipe(outputParser).pipe((qst: string) => {
    standaloneQuestion = qst;
    return qst;
  });
  • The retrieval chain:
  const retriever = vectorStore.asRetriever();
  const retrievalChain = retriever.pipe((docs) => ({
    question: standaloneQuestion,
    context: docs.map((e) => e.pageContent).join('\n\n'),
  }));
  • The answering chain:
  const answerTemplate = ...
  const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);
  const answerChain = answerPrompt.pipe(llm).pipe(outputParser);

We can refactor our server/api/chat.ts file by combining the smaller chains into one using RunnableSequence.from(). This will help streamline the code and improve readability. Let's proceed with the modification.

...
import { RunnableSequence } from '@langchain/core/runnables';

export default defineLazyEventHandler(() => {
  ...

  return defineEventHandler(async (event) => {
    const { messages } = await readBody<{ messages: Message[] }>(event);
    const outputParser = new StringOutputParser();
    let question = _.last<Message>(messages)?.content || '';

    // standalone question chain
    const standaloneQuestionTemplate = `Given a question, convert the question to a standalone question.
      question: {question}
      standalone question:`;
    const standaloneQuestionPrompt = PromptTemplate.fromTemplate(
      standaloneQuestionTemplate
    );
    const questionChain = standaloneQuestionPrompt.pipe(llm).pipe(outputParser);

    // retrieval chain
    const retriever = vectorStore.asRetriever();
    const retrievalChain = retriever.pipe((docs) => ({
      question,
      context: docs.map((e) => e.pageContent).join('\n\n'),
    }));

    // answering chain
    const answerTemplate = `You are a helpful support assistant who can answer a given question based on the context provided.
      Try to find the answer in the context.
      If you can't find the answer, say "I'm sorry, I don't know the answer to that."
      Don't try to make up an answer. Always speak as if you were chatting to a friend.
      context: {context}
      question: {question}
      answer: `;
    const answerPrompt = PromptTemplate.fromTemplate(answerTemplate);
    const answerChain = answerPrompt.pipe(llm).pipe(outputParser);

    // overall chain
    const chain = RunnableSequence.from([questionChain, retrievalChain, answerChain]);
    const stream = await chain.stream({ question: _.last(messages)?.content || '' });
    return new StreamingTextResponse(stream);
  });
});

Please note that:

  • RunnableSequence.from() takes an array of RunnableLike objects, which behave like Runnable, to construct a new chain.
  • We've defined a question variable outside the chain, representing the original user input, so that it is accessible from any step in the chain.

Using variables defined outside the chain can hinder maintainability as the chain logic grows. LangChain provides a solution with the RunnablePassthrough class.

Consider RunnablePassthrough as a conveyor belt in a factory, seamlessly transferring input from one end to the other without alteration. Additionally, it can augment input with extra information if it's an object.
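Here is a tiny, self-contained sketch of that behavior (again unrelated to our app); the object step is coerced into a parallel map, and RunnablePassthrough simply forwards the original input under the originalInput key:

import { RunnablePassthrough, RunnableSequence } from '@langchain/core/runnables';

const chain = RunnableSequence.from([
  {
    originalInput: new RunnablePassthrough(),                        // keep the untouched input around
    shouted: (input: { text: string }) => input.text.toUpperCase(),  // derive something from the same input
  },
  // the next step receives both keys at once
  (step: { originalInput: { text: string }; shouted: string }) =>
    `${step.originalInput.text} -> ${step.shouted}`,
]);

console.log(await chain.invoke({ text: 'hello' })); // "hello -> HELLO"

This is exactly the trick we use below: the standalone question and the untouched original input travel through the chain side by side.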

Let's integrate RunnablePassthrough into our server/api/chat.ts file, replacing the reliance on external variables.

...
import { RunnablePassthrough, RunnableSequence} from '@langchain/core/runnables';
import { Document as DocumentChunk } from '@langchain/core/documents';

export default defineLazyEventHandler(() => {
  ...

  return defineEventHandler(async (event) => {
    const { messages } = await readBody<{ messages: Message[] }>(event);
    const outputParser = new StringOutputParser();

    // standalone question chain
    ...

    // retrieval chain
    const retriever = vectorStore.asRetriever();
    const retrievalChain = RunnableSequence.from([
      (prevResult) => prevResult.standaloneQuestion,
      retriever,
      (docs: DocumentChunk[]) =>
        docs.map((doc) => doc.pageContent).join('\n\n'),
    ]);

    // answering chain
    ...

    // overall chain
    const chain = RunnableSequence.from([
      {
        standaloneQuestion: questionChain,
        originalInput: new RunnablePassthrough(),
      },
      {
        context: retrievalChain,
        question: ({ originalInput }) => originalInput.question,
      },
      answerChain,
    ]);
    const stream = await chain.stream({
      question: _.last(messages)?.content || '',
    });
    return new StreamingTextResponse(stream);
  });
});

Please note that:

  • In the first step of the overall chain, the input is mapped to an object whose originalInput key holds a RunnablePassthrough instance; the passthrough keeps the original input parameters available for later steps.
  • We have adjusted the second step in the overall chain to match the input expected by the following answerChain.
  • We have rebuilt retrievalChain as a RunnableSequence instead of the previous pipe, and introduced an extraction function before the retriever. This is necessary because the input it now receives is no longer a simple text string but an object of the form { standaloneQuestion: ..., originalInput: ... }, so we pull out standaloneQuestion before passing it to the retriever.

8. Conversation Memory

To enhance the LLM's answer generation process, it's advantageous to leverage the following information:

  • Nearest match
  • Original user input
  • Conversation history

While we've already integrated the first two types of information, let's finalize the process by incorporating the conversation history.

Fortunately, the Vercel AI SDK offers built-in functionality to manage conversation history in memory for both the backend and the frontend.

In server/api/chat.ts, the initial line of the defineEventHandler function is:

const { messages } = await readBody<{ messages: Message[] }>(event);

This line extracts the messages variable from the HTTP body, an array of Message objects representing the conversation history. Let's update server/api/chat.ts to include this conversation history in our chain.

...
return defineEventHandler(async (event) => {
  ...

  const standaloneQuestionTemplate = `Given some conversation history (if any) and a question, convert the question to a standalone question.
    conversation history: {conversation}
    question: {question}
    standalone question:`;
  ...

  const answerTemplate = `You are a helpful support assistant who can answer a given question based on the context provided.
    At first, try to find the answer in the context.
    If the answer is not given in the context, find the answer in the conversation history if possible.
    If you really don't know the answer, say "I'm sorry, I don't know the answer to that."
    Don't try to make up an answer. Always speak as if you were chatting to a friend.
    context: {context}
    conversation history: {conversation}
    question: {question}
    answer: `;
  ...

  // overall chain
  const chain = RunnableSequence.from([
    {
      standaloneQuestion: questionChain,
      originalInput: new RunnablePassthrough(),
    },
    {
      context: retrievalChain,
      conversation: ({ originalInput }) => originalInput.conversation,
      question: ({ originalInput }) => originalInput.question,
    },
    answerChain,
  ]);
  const stream = await chain.stream({
    question: _.last(messages)?.content || '',
    conversation: messages.map((m) => `${m.role}: ${m.content}`).join('\n\n'),
  });
  ...
});
...

Please note that:

  • We've updated the question and answer prompt templates to incorporate the conversation history, both when generating standalone questions and when generating answers.
  • We've included received messages in the chain's input as the conversation field.

Now, run the app again with docker compose up -d && pnpm dev, and observe the outcome. You'll notice that we've successfully developed a comprehensive chatbot application capable of processing any PDF document provided.

[Screenshot: v3]

9. Wrap Up

Congratulations on completing this journey! I trust you've gained valuable insights into building a simple document chatbot with basic tools. However, the features we've developed represent just the tip of the iceberg compared to what a real-life chatbot can offer. Consider enhancing the app with additional features such as managing multiple file uploads, implementing user authentication, and handling extensive conversation histories.

Moreover, it's essential to address any potential limitations in AI performance. To improve precision and efficiency, consider adjusting parameters like chunk size and overlap in text splitting, optimizing prompt engineering, and fine-tuning OpenAI settings such as temperature and model selection.
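For example, the model and temperature are just constructor options on the LLM client we already create in server/api/chat.ts; a hedged sketch (the model name is only an illustration, pick whichever chat model your account has access to):

const llm = new ChatOpenAI({
  openAIApiKey: apiKey,
  streaming: true,
  modelName: 'gpt-3.5-turbo', // example model; swap in a stronger model for better answers
  temperature: 0.2,           // lower values make answers more focused and deterministic
});

Small changes here, combined with different chunkSize and chunkOverlap values, can noticeably change answer quality, so it is worth experimenting.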

Remember, the path to mastering generative AI is both challenging and rewarding. Keep pushing forward and exploring new possibilities.

Feel free to refer to the source code provided here. Additionally, the sample PDF document used in this tutorial can be found here.

Keep striving for excellence, and don't hesitate to reach out if you encounter any hurdles along the way.
