<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrew Smith</title>
    <description>The latest articles on DEV Community by Andrew Smith (@emertechie).</description>
    <link>https://dev.to/emertechie</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3320453%2F49187b04-a0fe-4f7b-9382-d20cc63faef5.jpg</url>
      <title>DEV Community: Andrew Smith</title>
      <link>https://dev.to/emertechie</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/emertechie"/>
    <language>en</language>
    <item>
      <title>Building a RAG chatbot with TypeScript and Next.js</title>
      <dc:creator>Andrew Smith</dc:creator>
      <pubDate>Fri, 04 Jul 2025 16:23:58 +0000</pubDate>
      <link>https://dev.to/emertechie/building-a-rag-chatbot-with-typescript-and-nextjs-53c6</link>
      <guid>https://dev.to/emertechie/building-a-rag-chatbot-with-typescript-and-nextjs-53c6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This post walks through how I built a Retrieval Augmented Generation (RAG) chatbot using TypeScript and Next.js, along with a companion Node.js tool to fetch and index documents. The chatbot UI is based on Vercel's &lt;a href="https://chat-sdk.dev/" rel="noopener noreferrer"&gt;Chat SDK&lt;/a&gt;, which is built on their excellent &lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Retrieval Augmented Generation (RAG) is a technique for providing additional context to an LLM to improve the accuracy of its response. This may be needed if the information is held in a private knowledge base for example, or if you want the LLM to always use the most up-to-date documentation.&lt;/p&gt;

&lt;p&gt;By the end of the post, you'll know how to create a RAG chatbot that uses your own knowledge base to answer questions and links back to the source documents.&lt;/p&gt;

&lt;p&gt;All the code is available in this GitHub repo: &lt;a href="https://github.com/emertechie/rag-ai-chatbot" rel="noopener noreferrer"&gt;https://github.com/emertechie/rag-ai-chatbot&lt;/a&gt;. The &lt;a href="https://cal.com/docs/developing/introduction" rel="noopener noreferrer"&gt;cal.com developer documentation&lt;/a&gt; is used as an example knowledge base in this post and the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who is this post for?
&lt;/h2&gt;

&lt;p&gt;This post is aimed at JavaScript developers who want to explore how to build a RAG chatbot, but are not familiar with Python or Python-based libraries like LangChain. You may have heard of RAG terms like embeddings and vector databases, but are not sure how they fit together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;If you want to get the project running locally, follow the steps below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fork the &lt;a href="https://github.com/emertechie/rag-ai-chatbot" rel="noopener noreferrer"&gt;rag-ai-chatbot&lt;/a&gt; repository on GitHub&lt;/li&gt;
&lt;li&gt;Clone your forked repository locally&lt;/li&gt;
&lt;li&gt;Create a PostgreSQL database and retrieve its connection string to set the &lt;code&gt;DATABASE_URL&lt;/code&gt; environment variable in &lt;code&gt;.env.local&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Create a &lt;a href="https://vercel.com/marketplace/redis" rel="noopener noreferrer"&gt;Redis instance&lt;/a&gt; and set its connection string in the &lt;code&gt;REDIS_URL&lt;/code&gt; environment variable in &lt;code&gt;.env.local&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sign up for an &lt;a href="https://auth.openai.com/log-in" rel="noopener noreferrer"&gt;OpenAI platform account&lt;/a&gt; (different from a ChatGPT account), &lt;a href="https://platform.openai.com/settings/organization/billing/overview" rel="noopener noreferrer"&gt;add some credit&lt;/a&gt; (you'll likely need much less than $5), &lt;a href="https://platform.openai.com/settings/organization/api-keys" rel="noopener noreferrer"&gt;create an API key&lt;/a&gt;, and then set the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable in &lt;code&gt;.env.local&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Install dependencies using &lt;code&gt;pnpm install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start the development server using &lt;code&gt;pnpm dev&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These steps are based on the &lt;a href="https://chat-sdk.dev/docs/getting-started/setup#develop-locally-and-deploy-later" rel="noopener noreferrer"&gt;Chat SDK local setup guide&lt;/a&gt;, but with OpenAI used instead of Grok as the AI provider. I also didn't configure a blob store, as I won't be using file storage in this example.&lt;/p&gt;
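&lt;p&gt;For reference, a minimal &lt;code&gt;.env.local&lt;/code&gt; only needs the three variables named in the steps above (the values shown are placeholders, substitute your own connection strings and key):&lt;/p&gt;

```shell
# .env.local : placeholder values only
DATABASE_URL=postgres://user:password@localhost:5432/chatbot
REDIS_URL=redis://localhost:6379
OPENAI_API_KEY=your-openai-api-key
```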

&lt;h2&gt;
  
  
  RAG overview
&lt;/h2&gt;

&lt;p&gt;Before describing the system components, it's important to have a basic understanding of RAG systems. Feel free to skip this section if you're already familiar.&lt;/p&gt;

&lt;p&gt;RAG systems provide additional &lt;strong&gt;context&lt;/strong&gt; to an LLM so it can accurately answer a user's query. A RAG system is based around &lt;strong&gt;embeddings&lt;/strong&gt;, also known as &lt;strong&gt;vectors&lt;/strong&gt;. An embedding is a numerical representation of text or images that captures its &lt;strong&gt;semantic meaning&lt;/strong&gt;. From the AI SDK documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Embeddings are a way to represent words, phrases, or images as vectors in a high-dimensional space. In this space, similar words are close to each other, and the distance between words can be used to measure their similarity.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An &lt;strong&gt;ingestion&lt;/strong&gt; process fetches source documents, breaks them up into smaller &lt;strong&gt;chunks&lt;/strong&gt;, and creates an embedding (vector) for each. Embeddings are stored in a &lt;strong&gt;vector database&lt;/strong&gt; along with the original chunk text, and details of the source document.&lt;/p&gt;

&lt;p&gt;A vector database efficiently stores and indexes vectors (surprise) and provides a way to query the &lt;strong&gt;distance between vectors&lt;/strong&gt; in the high-dimensional space, which is the key to finding closely related information.&lt;/p&gt;

&lt;p&gt;When a user queries the chatbot, their query is turned into an embedding and used to find the closest vectors in the database. The chunks associated with those vectors contain information highly related to the query and are passed to the chatbot LLM so it can generate its response.&lt;/p&gt;
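&lt;p&gt;As a minimal sketch of that retrieval step, here is the vector math only. In practice the database does this work for you, and the short vectors and chunk shape below are made up for illustration:&lt;/p&gt;

```typescript
// Cosine similarity: 1 means same direction (very similar), 0 means unrelated.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

interface StoredChunk {
  text: string;
  embedding: number[];
}

// Rank stored chunks by similarity to the query embedding and keep the top k.
function findClosestChunks(queryEmbedding: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

&lt;p&gt;The text of the top-ranked chunks is what gets passed to the LLM as context.&lt;/p&gt;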

&lt;h2&gt;
  
  
  System components
&lt;/h2&gt;

&lt;p&gt;This system integrates the following components to build a RAG chatbot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI library&lt;/strong&gt;: AI SDK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI provider&lt;/strong&gt;: OpenAI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector database&lt;/strong&gt;: Postgres pgvector extension&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document indexer&lt;/strong&gt;: Custom implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chatbot UI&lt;/strong&gt;: Chat SDK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document source UI component&lt;/strong&gt;: Custom implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chatbot state databases&lt;/strong&gt;: Postgres and Redis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following sections describe some of these in more detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI library
&lt;/h2&gt;

&lt;p&gt;Using an AI library lets you work with a unified API over the different LLM providers, models, and capabilities available in AI systems today.&lt;/p&gt;

&lt;p&gt;I initially considered the Python-based LangChain library, given that it was one of the original LLM frameworks and is the foundation of many AI systems. But I wanted something in the JavaScript space that I could get up and running with more quickly.&lt;/p&gt;

&lt;p&gt;Fortunately, there's now an excellent TypeScript option in the &lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; from Vercel. The SDK has two main libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/ai-sdk-core" rel="noopener noreferrer"&gt;AI SDK Core&lt;/a&gt;: "A unified API for generating text, structured objects, tool calls, and building agents with LLMs."&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui" rel="noopener noreferrer"&gt;AI SDK UI&lt;/a&gt;: "A set of framework-agnostic hooks for quickly building chat and generative user interface."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SDK also has excellent documentation, something that can be lacking in other libraries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Document indexer
&lt;/h2&gt;

&lt;p&gt;To get my source documents into a vector database format, I needed a document indexer process to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Document ingestion&lt;/strong&gt;: loading documents from various sources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking&lt;/strong&gt;: breaking a document into smaller pieces called chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt;: creating an embedding for each chunk, representing its semantic meaning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexing&lt;/strong&gt;: storing the embeddings and chunks in a vector database to enable querying&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built my indexer in Node.js, and the following sections describe the different steps in the process in more detail. The indexer source code is available in the &lt;a href="https://github.com/emertechie/rag-ai-chatbot/tree/main/indexer" rel="noopener noreferrer"&gt;&lt;code&gt;/indexer&lt;/code&gt;&lt;/a&gt; folder of the repo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4phsw703hfj3hq3zsqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4phsw703hfj3hq3zsqz.png" alt="Screenshot of document indexer command line" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Document ingestion
&lt;/h3&gt;

&lt;p&gt;I wanted to index markdown documents from my local machine, so I could quickly experiment with different chunking and indexing options. I also wanted to support other document sources in future, so the indexer has a &lt;code&gt;DataSource&lt;/code&gt; base class with an overridable &lt;code&gt;discoverDocuments&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="cm"&gt;/**
   * Discover indexable documents from this data source
   * @param options Configuration options for discovery
   * @returns AsyncGenerator that yields indexable documents one by one
   */&lt;/span&gt;
  &lt;span class="nx"&gt;abstract&lt;/span&gt; &lt;span class="nf"&gt;discoverDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DataSourceOptions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;IndexableDocument&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that this uses an &lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/AsyncGenerator" rel="noopener noreferrer"&gt;async generator&lt;/a&gt; to return documents one by one as they are loaded, avoiding the need to load all documents into memory before processing can start.&lt;/p&gt;
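&lt;p&gt;A stripped-down illustration of the pattern, using a hypothetical in-memory source rather than the repo's real implementations:&lt;/p&gt;

```typescript
interface IndexableDocument {
  sourceUri: string;
  content: string;
}

// A toy data source that yields documents one at a time, so the
// consumer can start processing before discovery has finished.
class InMemoryDataSource {
  constructor(private docs: IndexableDocument[]) {}

  async *discoverDocuments() {
    for (const doc of this.docs) {
      yield doc;
    }
  }
}

async function indexAll(source: InMemoryDataSource) {
  const seen: string[] = [];
  for await (const doc of source.discoverDocuments()) {
    seen.push(doc.sourceUri); // a real indexer would chunk and embed here
  }
  return seen;
}
```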

&lt;p&gt;The indexer comes with a couple of &lt;code&gt;DataSource&lt;/code&gt; implementations, described below.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;FileSystemDataSource&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;FileSystemDataSource&lt;/code&gt; type was the first implementation and allowed me to use local markdown files to quickly get an end-to-end test working.&lt;/p&gt;

&lt;p&gt;However, I wanted to link to source documents from the chatbot UI—to show where the LLM got its information—so local file paths were not much use to anyone else! I needed a data source to discover and fetch documents from a URL, described next.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;URLDataSource&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;I was considering building a web scraper, but then I thought of the new &lt;a href="https://llmstxt.org/" rel="noopener noreferrer"&gt;&lt;code&gt;llms.txt&lt;/code&gt; proposal&lt;/a&gt;, which attempts to standardize how sites present information for LLMs. Under the proposal, now adopted by many documentation sites such as &lt;a href="https://docs.stripe.com/llms.txt" rel="noopener noreferrer"&gt;Stripe&lt;/a&gt; and &lt;a href="https://developers.cloudflare.com/llms.txt" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt;, an &lt;code&gt;llms.txt&lt;/code&gt; file lists a site's documentation links, each with a &lt;code&gt;.md&lt;/code&gt; extension that returns the page content as Markdown.&lt;/p&gt;

&lt;p&gt;So using &lt;code&gt;llms.txt&lt;/code&gt; effectively gives you a sitemap of a site's content, but all links point to Markdown files, which are much easier for an LLM to consume than HTML. And by simply removing the &lt;code&gt;.md&lt;/code&gt; extension at the end of each link, you have the link to the human-readable version, which you can use in the chatbot UI to link to document sources.&lt;/p&gt;
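&lt;p&gt;Extracting the Markdown links from an &lt;code&gt;llms.txt&lt;/code&gt; file and deriving the human-readable URL can be as simple as the sketch below (the repo's real parsing may differ):&lt;/p&gt;

```typescript
// Pull every Markdown-style link target ending in .md out of an llms.txt file,
// pairing it with the human-readable URL (the same link minus the .md extension).
function extractDocLinks(llmsTxt: string) {
  const linkPattern = /\]\((https?:\/\/\S+?\.md)\)/g;
  return Array.from(llmsTxt.matchAll(linkPattern), (match) => ({
    markdownUrl: match[1],
    humanUrl: match[1].replace(/\.md$/, ''),
  }));
}
```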

&lt;p&gt;This was a perfect solution for my purposes since the cal.com docs I was using to test against also supported &lt;a href="https://cal.com/docs/llms.txt" rel="noopener noreferrer"&gt;&lt;code&gt;llms.txt&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chunking
&lt;/h3&gt;

&lt;p&gt;The next step in the indexing process is chunking, which breaks up larger documents into smaller chunks. An embedding is created for each chunk to capture its semantic meaning.&lt;/p&gt;

&lt;p&gt;The size and content of each chunk affect how well its embedding matches a query. If a chunk is too big, its embedding may blend together too many distinct details, so it won't be "close" in the high-dimensional space to a user's query. If it's too small, it may not capture enough context to make a good match possible.&lt;/p&gt;

&lt;p&gt;Unfortunately, there isn't a one-size-fits-all solution to chunk sizing. It's highly dependent on your content and its structure. For example, a novel with long narrative passages may perform well with larger chunks, but using larger chunks with a dense, technical Markdown document may include too much distinct information in each chunk.&lt;/p&gt;

&lt;h4&gt;
  
  
  Text splitters
&lt;/h4&gt;

&lt;p&gt;There are different strategies to split up text into chunks, including by length, by sentence or paragraph breaks, and by using additional knowledge of the document structure such as Markdown headers.&lt;/p&gt;

&lt;p&gt;There are tools to help you visualize how the different strategies and parameter values affect chunks, such as this one: &lt;a href="https://huggingface.co/spaces/m-ric/chunk_visualizer" rel="noopener noreferrer"&gt;https://huggingface.co/spaces/m-ric/chunk_visualizer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this project, I was indexing Markdown documents so I wanted a Markdown-specific splitter. Unfortunately, Vercel's AI SDK doesn't include any text splitters, but I was able to use the &lt;code&gt;MarkdownTextSplitter&lt;/code&gt; from LangChain's &lt;a href="https://js.langchain.com/docs/concepts/text_splitters/" rel="noopener noreferrer"&gt;TypeScript library&lt;/a&gt; and fall back to the &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; for any other file type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MarkdownTextSplitter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RecursiveCharacterTextSplitter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain/text_splitter&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textSplitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;isMarkdownDoc&lt;/span&gt;
  &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MarkdownTextSplitter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunkOverlap&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunkOverlap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;separators&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;textSplitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createDocuments&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Embedding
&lt;/h3&gt;

&lt;p&gt;Once you have your documents split into chunks, you can create embeddings for each using a text &lt;strong&gt;embedding model&lt;/strong&gt;. For this project I used OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt; model, and used the AI SDK &lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/embed-many" rel="noopener noreferrer"&gt;&lt;code&gt;embedMany&lt;/code&gt;&lt;/a&gt; function to process multiple chunks at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embedMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;myProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;textEmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;embedding-model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The embedding model is also used to create the embedding of a user's query, so the same model must be used for comparisons to make sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; the embedding model is not the same as the LLM the chatbot uses to generate responses. A text embedding model specializes in converting text into dense vectors that capture semantic meaning; its focus is retrieval, not generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing
&lt;/h3&gt;

&lt;p&gt;The last step in the indexer process is to store each embedding in the vector database, along with the chunk of text it represents, and a link to the source document. If a query embedding matches an embedding in the database, the corresponding original chunk of text can be passed to the LLM as context for its response, and a link to the source document shown in the chatbot UI.&lt;/p&gt;

&lt;p&gt;Special-purpose vector databases such as &lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt; can be used if performance and scalability are a concern. But since I just wanted to run this project locally, and already had a Postgres database for general application state, I chose the &lt;code&gt;pgvector&lt;/code&gt; Postgres extension.&lt;/p&gt;

&lt;p&gt;Vercel's Chat SDK repo already uses the Drizzle ORM for database query and migration support. I added TypeScript schema definitions for the &lt;code&gt;Resource&lt;/code&gt; and &lt;code&gt;ResourceChunk&lt;/code&gt; tables (&lt;code&gt;Document&lt;/code&gt; was already in use), and used &lt;code&gt;pnpm db:generate&lt;/code&gt; to create the migrations shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"Resource"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"source_type"&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"source_uri"&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"content_hash"&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"createdAt"&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"updatedAt"&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="nv"&gt;"Resource_source_uri_unique"&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"source_uri"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"ResourceChunk"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"resource_id"&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"content"&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"embedding"&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;"ResourceChunk"&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="nv"&gt;"ResourceChunk_resource_id_Resource_id_fk"&lt;/span&gt; &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"resource_id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;cascade&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"embedding_index"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"ResourceChunk"&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"embedding"&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This migration creates a straightforward parent-child relationship between &lt;code&gt;Resource&lt;/code&gt; and &lt;code&gt;ResourceChunk&lt;/code&gt;, with a couple of notable columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Resource.content_hash&lt;/code&gt;: a hash of the entire source document, used to skip processing unchanged ones&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ResourceChunk.embedding&lt;/code&gt;: the embedding for the text chunk stored in the &lt;code&gt;content&lt;/code&gt; column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drizzle also generates a special &lt;code&gt;hnsw&lt;/code&gt; index, which supports efficient approximate nearest-neighbor queries on the &lt;code&gt;embedding&lt;/code&gt; vector column.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: different embedding models use different numbers of &lt;strong&gt;dimensions&lt;/strong&gt;, so ensure the &lt;code&gt;embedding&lt;/code&gt; column size matches your selected model. The AI SDK documentation has a list of &lt;a href="https://ai-sdk.dev/docs/ai-sdk-core/embeddings#embedding-providers--models" rel="noopener noreferrer"&gt;supported embedding models&lt;/a&gt; and their dimension sizes.&lt;/p&gt;
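&lt;p&gt;For instance, &lt;code&gt;text-embedding-3-small&lt;/code&gt; produces 1536-dimensional vectors, which is why the migration declares &lt;code&gt;vector(1536)&lt;/code&gt;. A small guard like this sketch can catch a mismatch early (the dimension counts are those documented for OpenAI's embedding models):&lt;/p&gt;

```typescript
// Dimension counts for OpenAI's embedding models; must match the vector(N) column.
const EMBEDDING_DIMENSIONS = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
  'text-embedding-ada-002': 1536,
};

const COLUMN_DIMENSIONS = 1536; // from the vector(1536) column in the migration

function assertDimensionsMatch(model: keyof typeof EMBEDDING_DIMENSIONS) {
  if (EMBEDDING_DIMENSIONS[model] !== COLUMN_DIMENSIONS) {
    throw new Error(`Model ${model} produces ${EMBEDDING_DIMENSIONS[model]}-dimensional vectors, but the column stores ${COLUMN_DIMENSIONS}`);
  }
}
```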

&lt;p&gt;In the TypeScript database code, since an embedding is just an array of numbers, you don't need special handling when inserting one into the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunkValues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunksWithEmbeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Type: number[]&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resourceChunk&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunkValues&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;returning&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Document deletion
&lt;/h2&gt;

&lt;p&gt;The document indexer inserts new content, and updates existing content, each time it runs. But what about previously indexed documents that no longer exist? You don't want your LLM providing stale information, so the indexer also needs to remove deleted documents from the database.&lt;/p&gt;

&lt;p&gt;By tracking the set of discovered documents on each run, and comparing it with the list of documents currently stored in the database, the indexer can determine which documents used to exist but no longer do, and delete them.&lt;/p&gt;

&lt;p&gt;Deletion tracking is implemented in the &lt;code&gt;indexer/index.ts&lt;/code&gt; script, where the core logic looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;discoveredUris&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;dataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;discoverDocuments&lt;/span&gt;&lt;span class="p"&gt;({}))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;discoveredUris&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sourceUri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleDocumentDeletion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getSourceType&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;discoveredUris&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
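
&lt;p&gt;The &lt;code&gt;handleDocumentDeletion&lt;/code&gt; helper isn't shown above, but the set-difference logic it needs can be sketched as follows. This is a minimal illustration rather than the repo's implementation; the &lt;code&gt;listStoredUris&lt;/code&gt; and &lt;code&gt;deleteByUris&lt;/code&gt; database helpers are hypothetical stand-ins for the real queries:&lt;/p&gt;

```typescript
// Hypothetical sketch: determine which stored documents were not seen on
// this indexing run, and delete them. The two database helpers are injected
// stand-ins for the repo's real queries.
type ListStoredUris = (sourceType: string) => Promise<string[]>;
type DeleteByUris = (sourceType: string, uris: string[]) => Promise<void>;

export async function handleDocumentDeletion(
  sourceType: string,
  discoveredUris: Set<string>,
  listStoredUris: ListStoredUris,
  deleteByUris: DeleteByUris,
): Promise<string[]> {
  // Everything currently in the database for this source type...
  const storedUris = await listStoredUris(sourceType);
  // ...minus everything discovered on this run = stale documents.
  const staleUris = storedUris.filter((uri) => !discoveredUris.has(uri));
  if (staleUris.length > 0) {
    await deleteByUris(sourceType, staleUris);
  }
  return staleUris;
}
```

&lt;p&gt;Injecting the database helpers like this also keeps the stale-URI calculation easy to unit test.&lt;/p&gt;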



&lt;h2&gt;
  
  
  Chatbot integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Search tool
&lt;/h3&gt;

&lt;p&gt;The LLM uses a &lt;strong&gt;tool&lt;/strong&gt; interface to query the information in the vector database. The AI SDK has a really nice interface for defining &lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/tool" rel="noopener noreferrer"&gt;tools&lt;/a&gt;, including a type-safe &lt;code&gt;parameters&lt;/code&gt; collection. The repo includes example tools from the Chat SDK in the &lt;code&gt;lib/ai/tools&lt;/code&gt; folder, such as one that fetches the weather.&lt;/p&gt;

&lt;p&gt;I added a &lt;code&gt;searchKnowledge&lt;/code&gt; tool with the following definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchKnowledge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Search for relevant information about cal.com developer documentation in the knowledge base using a natural language query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The search query to find relevant information&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM uses a tool's description to know when to call it, and uses each parameter description to construct the parameters passed to the &lt;code&gt;execute&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;For my search tool, the &lt;code&gt;execute&lt;/code&gt; function receives the user's query and calls the AI SDK's &lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/embed" rel="noopener noreferrer"&gt;&lt;code&gt;embed&lt;/code&gt;&lt;/a&gt; function to create an embedding for it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;myProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;textEmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;embedding-model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// `query` passed to the `execute` function&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;code&gt;embedding&lt;/code&gt; can then be compared with the embeddings in the database to find related chunks of text. The most common way to compare embeddings is &lt;strong&gt;cosine similarity&lt;/strong&gt;, which measures the cosine of the angle between two vectors. This gives a floating point value between -1 and 1 (typically between 0 and 1 for text embeddings) indicating how closely their directions align.&lt;/p&gt;
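
&lt;p&gt;As a quick illustration (not code from the repo), cosine similarity can be computed directly from the dot product and the vector magnitudes:&lt;/p&gt;

```typescript
// Cosine similarity: cos(theta) = (a . b) / (|a| * |b|)
// Returns ~1 for vectors pointing the same way, ~0 for orthogonal ones.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

&lt;p&gt;In practice you would let the database compute this over the stored vectors rather than loading every embedding into application memory.&lt;/p&gt;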

&lt;p&gt;Embedding comparison is implemented in the &lt;code&gt;searchSimilarChunks&lt;/code&gt; function, which uses cosine similarity to sort and filter the results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;export&lt;/span&gt; &lt;span class="n"&gt;async&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;searchSimilarChunks&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;const&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;`1 - (${cosineDistance(resourceChunk.embedding, embedding)})`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;const&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="n"&gt;chunkId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resourceChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chunkContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resourceChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;resourceType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sourceType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;resourceUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sourceUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;resourceCreatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;resourceUpdatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;updatedAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resourceChunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;innerJoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resourceChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;-- filter by cosine distance&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;-- sort by cosine distance&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;-- take top N results&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may need to tweak the &lt;code&gt;threshold&lt;/code&gt; parameter value to ensure relevant chunks are returned from the database. For example, I had to change it from &lt;code&gt;0.75&lt;/code&gt; to &lt;code&gt;0.6&lt;/code&gt; after switching to a different text embedding model.&lt;/p&gt;

&lt;h3&gt;
  
  
  System prompt
&lt;/h3&gt;

&lt;p&gt;The system prompt is how you control the chatbot LLM's overall behaviour. For example, asking it to assume a particular role or giving it constraints about what it can and cannot answer.&lt;/p&gt;

&lt;p&gt;Here's the original system prompt included with the Chat SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a friendly assistant! Keep your responses concise and helpful
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wanted to test how accurately this assistant responded to queries. Since I was testing with the public cal.com documentation, though, there was a chance the LLM's training data already included a previous copy of that site. So I needed to ensure my chatbot was actually using my indexed documents, and not its own (possibly stale) knowledge.&lt;/p&gt;

&lt;p&gt;I started with a blank vector database and used the &lt;code&gt;FileSystemDataSource&lt;/code&gt; introduced earlier to index a local copy of the cal.com documentation, which included a freshly merged PR with some updated documentation that I knew wouldn't be in the LLM's training data.&lt;/p&gt;

&lt;p&gt;I then asked the chatbot about that new information, and my friendly assistant helpfully hallucinated a response that sounded very plausible, but unfortunately wasn't accurate. So I needed to tweak the system prompt.&lt;/p&gt;

&lt;p&gt;After a bit of trial and error, I ended up with this system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a friendly assistant that can help with developer questions about using cal.com.
You can ONLY answer using knowledge you get from the tools you have access to.
DO NOT RELY ON YOUR OWN KNOWLEDGE TO ANSWER THE QUESTION.
If you cannot answer the question, say "I'm sorry, I don't know the answer to that". No exceptions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to explicitly tell the ever-eager LLM to say "I don't know" if it can't answer your question. I also had to really emphasise that it shouldn't use its own knowledge, since earlier, gentler prompts still led to hallucinations.&lt;/p&gt;

&lt;p&gt;After tweaking the prompt and seeing the chatbot correctly use the information from my database, I temporarily disconnected my &lt;code&gt;searchKnowledge&lt;/code&gt; tool from the LLM to confirm the chatbot would now &lt;em&gt;not&lt;/em&gt; be able to answer my original query. After starting a new chat (so previous messages with the correct answer weren't provided as context to the LLM), I re-ran my original query and the LLM correctly told me it didn't know the answer. Much better!&lt;/p&gt;

&lt;p&gt;As this shows, it's important to test not only positive cases but negative ones also, where the LLM &lt;em&gt;shouldn't&lt;/em&gt; answer.&lt;/p&gt;

&lt;p&gt;While testing, you'll probably come up with several different queries that exercise different response behavior, so it's also important to ensure these aren't forgotten, for example if someone else adjusts the system prompt in future. To prevent this kind of regression, you can use &lt;strong&gt;evals&lt;/strong&gt;: automated tests that get the LLM to answer a set of pre-defined questions, often using another LLM to judge whether each response was correct (since the responses are non-deterministic).&lt;/p&gt;
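
&lt;p&gt;A minimal eval harness can be sketched as a loop over pre-defined question/expectation pairs. This is an illustrative sketch only; the &lt;code&gt;askChatbot&lt;/code&gt; and &lt;code&gt;judge&lt;/code&gt; functions are hypothetical and would be backed by real LLM calls in practice:&lt;/p&gt;

```typescript
// Illustrative eval harness sketch. `askChatbot` and `judge` are injected so
// they can be backed by real LLM calls (e.g. a judge model); any stub works
// for testing the harness itself.
type EvalCase = { question: string; expectation: string };

export async function runEvals(
  cases: EvalCase[],
  askChatbot: (question: string) => Promise<string>,
  judge: (answer: string, expectation: string) => Promise<boolean>,
): Promise<{ passed: number; failed: number }> {
  let passed = 0;
  for (const c of cases) {
    const answer = await askChatbot(c.question);
    // An LLM judge copes with non-deterministic wording in the answer.
    if (await judge(answer, c.expectation)) {
      passed++;
    }
  }
  return { passed, failed: cases.length - passed };
}
```

&lt;p&gt;Crucially, the eval set should include negative cases, where the correct behaviour is for the chatbot to say it doesn't know.&lt;/p&gt;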

&lt;p&gt;One final adjustment I made was to ensure that, if the chatbot response links to a Markdown document, it strips off the trailing &lt;code&gt;.md&lt;/code&gt; extension from the URL so it points to the HTML version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If linking to a document returned from the \`searchKnowledge\` tool, remove any '.md' extension from the link (That will link to a human-readable version).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Showing document sources
&lt;/h3&gt;

&lt;p&gt;Similar to how online LLM tools like Perplexity link to their sources, I wanted to show which documents were used to answer the user's query, to give users more confidence in the response accuracy.&lt;/p&gt;

&lt;p&gt;The AI SDK's &lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-ui/use-chat" rel="noopener noreferrer"&gt;&lt;code&gt;useChat&lt;/code&gt;&lt;/a&gt; hook returns all the chat messages so far, including results from tool calls. So by filtering for my &lt;code&gt;searchKnowledge&lt;/code&gt; tool, I can pass the result from the &lt;code&gt;searchSimilarChunks&lt;/code&gt; query function, discussed above, to my &lt;code&gt;SearchKnowledge&lt;/code&gt; React component. The result includes any chunks relevant to the query, and the document they belong to.&lt;/p&gt;
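
&lt;p&gt;As a rough sketch of that filtering (the real message shapes come from the AI SDK and vary by version, so the types below are simplified, hypothetical stand-ins):&lt;/p&gt;

```typescript
// Simplified local types; the real shapes come from the AI SDK's useChat
// hook and vary by SDK version, so treat this as an illustrative sketch.
type ToolInvocationPart = {
  type: "tool-invocation";
  toolName: string;
  result?: unknown;
};
type TextPart = { type: "text"; text: string };
type MessagePart = ToolInvocationPart | TextPart;
type ChatMessage = { role: "user" | "assistant"; parts: MessagePart[] };

// Pull out every result returned by the searchKnowledge tool so a
// "Sources" component can render the matched documents.
export function getSearchKnowledgeResults(messages: ChatMessage[]): unknown[] {
  return messages.flatMap((m) =>
    m.parts
      .filter(
        (p): p is ToolInvocationPart =>
          p.type === "tool-invocation" &&
          p.toolName === "searchKnowledge" &&
          p.result !== undefined,
      )
      .map((p) => p.result),
  );
}
```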

&lt;p&gt;The component shows a simple "Sources" header and link. The UI and UX can certainly do with some improvement, but that wasn't my focus for this project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd0axnz1kgwa495usmu8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd0axnz1kgwa495usmu8.png" alt="Screenshot of chatbot showing document sources" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;With all those components in place, I can index the cal.com docs using the following command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx tsx &lt;span class="nt"&gt;--env-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.env.local indexer/index.ts &lt;span class="nt"&gt;--url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://cal.com/docs/llms.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And run the chatbot with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pmpm dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For additional indexer flags, see the &lt;a href="https://github.com/emertechie/rag-ai-chatbot" rel="noopener noreferrer"&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this post I showed how to use the AI and Chat SDKs from Vercel, along with a custom document indexer, to roll your own RAG chatbot. A reminder that all the source is available here: &lt;a href="https://github.com/emertechie/rag-ai-chatbot" rel="noopener noreferrer"&gt;https://github.com/emertechie/rag-ai-chatbot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I encourage you to play around with the system and try to apply it to a real use case. Having a practical application is a great way to really learn something in depth.&lt;/p&gt;

&lt;p&gt;Also check out the docs for the &lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; and &lt;a href="https://chat-sdk.dev/" rel="noopener noreferrer"&gt;Chat SDK&lt;/a&gt;, which are great resources for further learning.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>nextjs</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
