Pratik Agarwal for Momento

Originally published at gomomento.com

How to build a question answering system in Node.js with a vector index and OpenAI

In this step-by-step guide, we delve into building a question answering system from scratch, focusing on a specific topic: carrots. Central to our exploration is the concept of treating question answering as a retrieval process. This approach involves identifying source documents or specific sections within them that contain the answers to users' queries. By revealing the underlying process without the complexities introduced by external libraries, we aim to provide valuable insights into the fundamental workings of such systems. We'll be presenting this tutorial with code examples in TypeScript (Node.js).

Here's a quick overview of how we will get this done:

  • Initialize OpenAI and Momento clients.

  • Fetch and process (create chunks) carrot data from Wikipedia.

  • Generate embeddings for the text using OpenAI.

  • Store the embeddings in Momento Vector Index.

  • Search and respond to queries using the stored data.

  • Utilize OpenAI's chat completions for refined responses.

Environment Setup

Before we start coding, we need to create our index in Momento for storing data and generate an API key to access Momento programmatically. You can do both in our console; follow this guide for details! The code below uses mvi-openai-demo as the index name, 1536 for the number of dimensions (more on this soon!), and cosine similarity as the similarity metric. Cosine similarity cares more about the orientation of vectors than their magnitude (roughly, the amount of text in this case), which makes it well suited to a question answering system.
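
To build intuition for what the index computes under the hood, here is a tiny, purely illustrative cosine similarity function in TypeScript. MVI does this math for you; the snippet only exists to show why vector orientation, not length, drives the score.

// Illustrative only: cosine similarity measures the angle between two vectors,
// so two chunks with similar meaning score near 1 even if one is much longer.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}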

We also need an OpenAI API key to generate embeddings of our data and search queries. 

Next, we have to install the necessary packages. For TypeScript, we use @gomomento/sdk and openai.

NodeJS:

npm install @gomomento/sdk openai

Step 1: Initializing Clients

We begin by initializing our OpenAI and Momento clients. Here, we set up our development environment with the necessary packages and API keys. This step establishes communication with the OpenAI and Momento services and lays the foundation for our Q&A engine.

Make sure you have the environment variables 'OPENAI_API_KEY' and 'MOMENTO_API_KEY' set before you run the code!

import OpenAI from 'openai';
import * as https from 'https';
import { ALL_VECTOR_METADATA, CredentialProvider, PreviewVectorIndexClient, VectorIndexConfigurations, VectorSearch, VectorUpsertItemBatch } from '@gomomento/sdk';
import { CreateEmbeddingResponse } from 'openai/resources';

const openai = new OpenAI({ apiKey: process.env['OPENAI_API_KEY'] });
const mviClient = new PreviewVectorIndexClient({
  credentialProvider: CredentialProvider.fromEnvironmentVariable({ environmentVariableName: 'MOMENTO_API_KEY' }),
  configuration: VectorIndexConfigurations.Laptop.latest(),
});
const indexName = 'mvi-openai-demo';
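
If you'd rather create the index from code than from the console, the Momento SDK also exposes index management on the same client. The sketch below assumes the createIndex(indexName, numDimensions) call and the CreateVectorIndex response types from @gomomento/sdk; check your SDK version for the exact signature and for how to set the similarity metric explicitly.

import { CreateVectorIndex } from '@gomomento/sdk';

async function ensureIndexExists() {
  // Assumption: createIndex creates the index if it does not already exist.
  // 1536 matches the vector length produced by text-embedding-ada-002 (see Step 4).
  const createResponse = await mviClient.createIndex(indexName, 1536);
  if (createResponse instanceof CreateVectorIndex.Success) {
    console.log(`Created index ${indexName}`);
  } else if (createResponse instanceof CreateVectorIndex.AlreadyExists) {
    console.log(`Index ${indexName} already exists`);
  } else if (createResponse instanceof CreateVectorIndex.Error) {
    console.error('Error creating index:', createResponse.message());
  }
}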

Step 2: Loading data from Wikipedia

We start by extracting data about carrots from Wikipedia. This step demonstrates how to handle external API calls and parse JSON responses. Go ahead and try this out locally for any Wikipedia page!

interface WikipediaResponse {
  query: {
    pages: {
      [key: string]: {
        extract: string;
      };
    };
  };
}

function getWikipediaExtract(url: string): Promise<string> {
  return new Promise((resolve, reject) => {
    https
      .get(url, response => {
        let data = '';

        response.on('data', chunk => {
          data += chunk;
        });

        response.on('end', () => {
          try {
            const jsonData: WikipediaResponse = JSON.parse(data);
            const pages = jsonData.query.pages;
            const extract = Object.values(pages)[0].extract;
            resolve(extract);
          } catch (error) {
            reject(error);
          }
        });
      })
      .on('error', error => {
        reject(error);
      });
  });
}

const url = "https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Carrot&prop=extracts&explaintext";

Now let’s run these snippets and view the length of the carrot Wikipedia page along with a sample of its text.

const extractText = await getWikipediaExtract(url);
console.log('Total characters in carrot Wikipedia page: ' + String(extractText.length));
console.log('Sample text in carrot Wikipedia page:\n\n ' + extractText.substring(0, 500));

Output:

Total characters in carrot Wikipedia page: 21534

Sample text in carrot Wikipedia page:

The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten.

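If you are running Node.js 18 or newer, the built-in fetch API offers a more compact alternative to the https module used above. This is just a sketch under that assumption; it parses the same WikipediaResponse shape.

async function getWikipediaExtractWithFetch(url: string): Promise<string> {
  // Node.js 18+ ships fetch globally, so no extra dependency is needed.
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Wikipedia request failed with status ${response.status}`);
  }
  const jsonData = (await response.json()) as WikipediaResponse;
  // The pages object is keyed by page ID, so take the first (and only) page.
  return Object.values(jsonData.query.pages)[0].extract;
}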

Step 3: Preprocessing data to create chunks

In building our Q&A engine, we approach question answering as a kind of retrieval: identifying which source documents (or parts of them) contain the answers to a user's query. This concept is fundamental to our process and influences how we handle our data.

To make our system effective, we preprocess the data into chunks. This is because, in a question-answering context, answers often reside in specific sections of a document rather than across the entire text. By splitting the data into manageable chunks, we're effectively creating smaller, searchable units that our system can scan to find relevant answers. This chunking process is a crucial step in transforming extensive text into a format conducive to semantic search and retrieval.

We've opted for a straightforward approach: splitting the text by character count. However, it's crucial to understand that the size and method of chunking can significantly impact the system's effectiveness. Chunks that are too large might dilute the relevance of search results, while chunks that are too small may miss critical context.

Alternative chunking methods may use tokenizers such as tiktoken to split the text along boundaries that align with the text embedding model. These methods may produce better results, but they require external libraries. For demonstration we opt for the simpler character-based method; a small overlap-based variation is sketched after the output below.

function splitTextIntoChunks(text: string, chunkSize = 600): string[] {
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.substring(i, i + chunkSize));
  }
  return chunks;
}

const chunks = splitTextIntoChunks(extractText);

Now we can view the total number of chunks that were created:

console.log('Total number of chunks created: ' + String(chunks.length));
console.log('Total characters in each chunk: ' + String(chunks[0].length));

Output:

Total number of chunks created: 36
Total characters in each chunk: 600
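
As mentioned above, fixed-size chunks can cut a sentence in half and lose context at the boundaries. A common mitigation is to let adjacent chunks overlap by a small amount, sketched below; the 100-character overlap is an arbitrary illustration, not a tuned value.

function splitTextIntoOverlappingChunks(text: string, chunkSize = 600, overlap = 100): string[] {
  const chunks: string[] = [];
  // Step forward by (chunkSize - overlap) so each chunk repeats the tail of the previous one.
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.substring(i, i + chunkSize));
  }
  return chunks;
}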

Step 4: Generating Embeddings with OpenAI

In our approach to building a Q&A engine, we've chosen to leverage the power of vector search, a state-of-the-art technique in semantic search. This method differs significantly from traditional keyword search approaches, like those used in Elasticsearch or Lucene. Vector search delves deeper into the intricacies of language, capturing concepts and meanings in a way that keyword search can't.

To facilitate vector search, our first task is to transform our textual data into a format that embodies this richer semantic understanding. We achieve this by generating embeddings using OpenAI's text-embedding-ada-002 model. This model is known for striking a balance between accuracy, cost, and speed, making it an ideal choice for generating text embeddings.

async function generateEmbeddings(chunks: string[]): Promise<CreateEmbeddingResponse> {
  return await openai.embeddings.create({ input: chunks, model: 'text-embedding-ada-002' });
}

Recall that we selected 1536 as the dimensionality for our vector index. This decision was based on the fact that OpenAI, when generating embeddings for each chunk, produces these embeddings as floating point vectors with a length of 1536.

const embeddingsResponse = await generateEmbeddings(chunks);


console.log('Length of each embedding: ' + String(embeddingsResponse.data[0].embedding.length));
console.log('Sample embedding: ' + String(embeddingsResponse.data[0].embedding.slice(0, 10)));

Output:

Length of each embedding: 1536

Sample embedding: 0.008307404,-0.03437371,0.00043777542,-0.01768263,-0.010926112,-0.0056728064,-0.0025742147,-0.023453956,-0.021114917,-0.020148791

Step 5: Storing Data in Momento Vector Index

After generating embeddings, we store them in Momento's Vector Index. This involves creating items with IDs, vectors, and metadata, then upserting them to MVI. When storing data in the Momento Vector Index, it's important to use deterministic chunk IDs. This ensures that the same data isn't re-indexed repeatedly, which optimizes storage, retrieval efficiency, and response accuracy. Managing data storage effectively is key to maintaining a scalable and responsive Q&A system.

async function upsertToMomentoVectorIndex(embeddingsResponse: CreateEmbeddingResponse, chunks: string[]) {
  const embeddings = embeddingsResponse.data.map(embedding => embedding.embedding);

  // Generate IDs for each chunk
  const ids = chunks.map((_, index) => `chunk${index + 1}`);

  // Generate metadata for each chunk. This will be needed when we search.
  const metadatas = chunks.map(chunk => ({text: chunk}));

  // Create VectorIndexItem objects
  const items = ids.map((id, index) => {
    return {
      id: id,
      vector: embeddings[index],
      metadata: metadatas[index],
    };
  });

  // Upsert to Momento Vector Index
  try {
    const upsertResponse = await mviClient.upsertItemBatch(indexName, items);
    if (upsertResponse instanceof VectorUpsertItemBatch.Success) {
      console.log('\n\nUpsert successful. Items have been stored');
    } else if (upsertResponse instanceof VectorUpsertItemBatch.Error) {
      console.error('Upsert error:', upsertResponse.message());
    }
  } catch (error) {
    console.error('Unexpected error during upsert:', error);
  }
}

await upsertToMomentoVectorIndex(embeddingsResponse, chunks);


Output:

Upsert successful. Items have been stored.
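
The snippet above derives IDs from chunk position (chunk1, chunk2, ...), which stays deterministic as long as the source text and chunking are unchanged. Another option is to derive the ID from the chunk contents, so re-running the pipeline upserts an identical item instead of a new one. Here is a minimal sketch using Node's built-in crypto module; the chunk- prefix and the 16-character hash length are arbitrary choices.

import { createHash } from 'crypto';

function chunkId(chunk: string): string {
  // Hash the chunk text so identical content always maps to the same item ID.
  return 'chunk-' + createHash('sha256').update(chunk).digest('hex').slice(0, 16);
}

const contentIds = chunks.map(chunkId);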

Step 6: Searching and Responding to Queries

This step highlights the core functionality of the Q&A engine: retrieving answers using Momento Vector Index. This process involves searching through the indexed data using text embeddings, a technique that ensures we find the most relevant and contextually appropriate results.

When we indexed snippets of text in the previous steps, we first transformed these text snippets into vector representations using OpenAI's model. This transformation was key to preparing our data for efficient storage and retrieval in the Momento Vector Index.

Now, as we turn to the task of querying, it's crucial to apply a similar preprocessing step. The user's question, "What is a carrot?" in this instance, must also be converted into a vector. This enables us to perform a vector-to-vector search within our index.

The effectiveness of our search hinges on the consistency of preprocessing. The same embedding model and process used during indexing must be applied to the query. This ensures that the vector representation of the query aligns with the vectors stored in our index, otherwise the approach would not work.
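
One simple way to make this consistency hard to get wrong is to route every embedding call through a single helper, so the model name lives in exactly one place. The helper below is only an illustration and is not wired into the snippets that follow; they embed the query inline for clarity.

const EMBEDDING_MODEL = 'text-embedding-ada-002';

async function embedTexts(texts: string[]): Promise<number[][]> {
  // Both the indexing path and the query path would call this helper,
  // guaranteeing the same model produces every vector in the index.
  const response = await openai.embeddings.create({ input: texts, model: EMBEDDING_MODEL });
  return response.data.map(d => d.embedding);
}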

async function searchQuery(queryText: string): Promise<string[]> {
  const queryResponse = await openai.embeddings.create({ input: queryText, model: 'text-embedding-ada-002' });
  const queryVector = queryResponse.data[0].embedding;

  try {
    const searchResponse = await mviClient.search(indexName, queryVector, {
      topK: 2,
      metadataFields: ALL_VECTOR_METADATA
    });

    if (searchResponse instanceof VectorSearch.Success) {
      const texts: string[] = searchResponse.hits().map(hit => hit.metadata.text as string);
      return texts;
    } else if (searchResponse instanceof VectorSearch.Error) {
      console.error('Search error:', searchResponse.message());
    }
  } catch (error) {
    console.error('Unexpected error during search:', error);
  }

  return [];
}

Let’s start with a simple search for “What is a carrot?”:

const query = 'What is a carrot?';
const texts = await searchQuery(query);
if (texts.length > 0) {
  console.log('\n=========================================\n');
  console.log('Embedding search results:\n\n', texts.join('\n'));
  console.log('\n=========================================\n');
}

The output for this query looks like:

The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten. The domestic carrot has been selectively bred for its enlarged, more palatable, less woody-textured taproot. The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability

As you see, we indexed vectors in Momento Vector Index and stored the original text as metadata in the items. When asked the question “What is a carrot?”, we transformed the text into a vector, performed a vector search in MVI, and returned the original text stored in the metadata. Under the hood we did a vector-to-vector matching, yet from a user perspective it looks like a text-to-text search.

Step 7: Too verbose? Let’s use chat completions to enhance query responses

Until now, our approach has treated question answering primarily as a retrieval task. We've taken the user's query, performed a search, and presented snippets of information that could potentially contain the answer. This method, while effective in fetching relevant data, still places the onus on the user to sift through the results and extract the answer. It's akin to providing pages from a reference book without pinpointing the exact information sought.

To elevate the user experience from mere retrieval to direct answer generation, we introduce Large Language Models (LLMs) like OpenAI's GPT-3.5. LLMs have the ability to not just find but also synthesize information, offering concise and contextually relevant answers. This is a significant leap from delivering a page of search results to providing a clear, succinct response to the user's query.

async function searchWithChatCompletion(texts: string[], queryText: string) {
  const text = texts.join('\n');
  const prompt: string = 'Given the following extracted parts about carrot, answer questions pertaining to' +
    " carrot only from the provided text. If you don't know the answer, just say that " +
    "you don't know. Don't try to make up an answer. Do not answer anything outside of the context given. " +
    "Your job is to only answer about carrots, and only from the text below. If you don't know the answer, just " +
    "say that you don't know. Here's the text:\n\n----------------\n\n";

  const resp = await openai.chat.completions.create({
    messages: [
      { role: 'system', content: prompt + text },
      { role: 'user', content: queryText },
    ],
    model: 'gpt-3.5-turbo',
  });

  return resp;

}

And let’s use the same query “What is a carrot?” to compare the response.

const chatCompletionResp = await searchWithChatCompletion(texts, query);

console.log('\n=========================================\n');
console.log('Chat completion search results:\n\n', chatCompletionResp.choices[0].message.content);
console.log('\n=========================================\n');

Output:

A carrot is a root vegetable that is typically orange in color, although there are also other colored variants such as purple, black, red, white, and yellow.

Now let’s quickly compare the outputs for a more specific question such as "How fast do fast-growing cultivars mature in carrots?"

const query = 'how fast do fast-growing cultivars mature in carrots?';
const texts = await searchQuery(query);
if (texts.length > 0) {
  console.log('\n=========================================\n');
  console.log('Embedding search results:\n\n', texts[0]);
  console.log('\n=========================================\n');

  const chatCompletionResp = await searchWithChatCompletion(texts, query);

  console.log('\n=========================================\n');
  console.log('Chat completion search results:\n\n', chatCompletionResp.choices[0].message.content);
  console.log('\n=========================================\n');
}

Output:

Notice how brief and precise the chat completion response is compared to the raw semantic search results.

Embedding search results: The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability.

Chat completion search results: Fast-growing cultivars of carrots mature within about three months (90 days) of sowing the seed.
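
For reference, here is one way to stitch the snippets from Steps 2 through 7 into a single script. It is a sketch that assumes all the functions above live in one module, with error handling kept minimal.

async function main() {
  // Steps 2-3: fetch the article and split it into chunks.
  const extractText = await getWikipediaExtract(url);
  const chunks = splitTextIntoChunks(extractText);

  // Steps 4-5: embed the chunks and upsert them into the vector index.
  const embeddingsResponse = await generateEmbeddings(chunks);
  await upsertToMomentoVectorIndex(embeddingsResponse, chunks);

  // Steps 6-7: answer a question via vector search plus a chat completion.
  const query = 'What is a carrot?';
  const texts = await searchQuery(query);
  const chatCompletionResp = await searchWithChatCompletion(texts, query);
  console.log(chatCompletionResp.choices[0].message.content);
}

main().catch(console.error);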

Conclusion

In this guide, we embarked on a journey to build a question answering system from the ground up. The key idea behind our approach was to treat question answering as a retrieval problem. By using text embeddings and vector search, we've brought in state-of-the-art, semantically rich search that surpasses traditional keyword-based approaches. Let's briefly recap the steps we took to get here:

  • Initializing Clients: Set up OpenAI and Momento clients, laying the groundwork for our system.

  • Fetching and Processing Data: Extracted and processed data from Wikipedia, preparing it for embedding generation. We learnt about the significance of creating chunks of data for efficient retrieval.

  • Generating Embeddings: Utilized OpenAI's text-embedding-ada-002 model to generate text embeddings, converting our corpus into a format suitable for semantic search. We learnt how the length of these embeddings determines the number of dimensions of the vector index.

  • Storing in MVI: Stored these embeddings in Momento's Vector Index, ensuring efficient retrieval. We learnt about a common pitfall of using UUID as an index’s item ID, which results in repeated re-indexing of the same data. 

  • Searching and Responding to Queries: Implemented search functionality that leverages vector indexing for semantic search to find the most relevant responses. We perform a vector-to-vector search and display the text stored in each item's metadata to the user.

  • Enhancing Responses with Chat Completions: Added a layer of refinement using OpenAI's chat completions to generate concise and accurate answers. Here we witnessed that Large Language Models not only improve the accuracy of the responses but also ensure they are contextually relevant, coherent, and presented in a user-friendly format.

Finally, while our hands-on approach offers a deep dive into the mechanics of building a Q&A engine, we recognize the complexity involved in such an endeavor. Frameworks like Langchain abstract much of this complexity, providing a higher-level interface that simplifies chaining components like OpenAI embeddings together and swapping out the vector store. Langchain is a tool of choice for many developers, making it easier to build, modify, and maintain complex AI-driven applications.
