DEV Community: Ryan Michael

Building a RAG tool with Vercel's Generative UI components

Ryan Michael — Thu, 07 Mar 2024 19:52:11 +0000

Retrieval-Augmented Generation (RAG) blends the generative abilities of LLMs with the retrieval of information from diverse knowledge bases (KBs). However, traditional implementations of RAG have often relied on pre-scripting the logic for selecting and utilizing these KBs.

This conventional method, while effective, limits the flexibility, adaptability, and performance of the application.

But consider an alternative: applications that let the LLM itself decide which actions are needed to generate a response. This approach harnesses not only the LLM's generative capabilities but also its ability to make contextual decisions on the fly, opening up a more dynamic and responsive way to handle knowledge search tasks. As a result, you can build targeted KBs and use the LLM to write application-specific queries that retrieve better, more relevant results.

To demonstrate this approach, we'll build a simple chatbot backed by a custom tool. The chatbot will have access to a KB containing product information. Whenever you ask a question that the chatbot decides this tool can answer, it will query the KB, show both the in-progress query and the retrieved results as custom UI elements, and then respond using the retrieved products as context.

We'll use Vercel's Generative UI library to render the chat window, and Dewy as the knowledge store that implements the information storage and retrieval behind our tool.

Why Dewy and Vercel?

Vercel's Generative UI library is designed to simplify the creation of custom LLM "tools" and to render rich, interactive UI elements specific to each tool the LLM chooses to use. In practice, this means your UI can adapt dynamically to the tools the LLM selects, displaying tool-specific information and progress indicators to the user.

Dewy is an OSS knowledge base with the simplicity and ease-of-use of a document store: insert your documents and Dewy takes care of preparing them for semantic search. Dewy's flexibility and ease of use allow developers to focus on building amazing user experiences rather than complex data-processing pipelines.

Prerequisites

Before diving into the tutorial, ensure you have the following prerequisites covered:

Basic knowledge of TypeScript and React
A Next.js environment set up on your local machine
A copy of Dewy running on your local machine (see Dewy's installation instructions if you need help here)
Access to the OpenAI API platform

Set up your project

Initialize a new NextJS project: Create a NextJS app by running the following command in your terminal:
```
pnpm dlx create-next-app@canary rag-tool
```
Navigate into your new project directory:
```
cd rag-tool
```
Install required packages: Next, we'll install ai (Vercel's AI library), openai (OpenAI's official JavaScript SDK, compatible with the Vercel Edge Runtime), and dewy-ts (the Dewy client library). Zod will be used to describe the input types for our custom tool.
```
pnpm install ai openai zod dewy-ts
```
Prepare your environment: Configure your OpenAI key and Dewy endpoint.
```
# .env.local
OPENAI_API_KEY=xxxxxxxxx
DEWY_ENDPOINT=http://localhost:8000
```

Create a custom tool

First, create the custom tool the LLM will use to answer product-related questions. This tool describes its purpose and parameters so the LLM knows when and how to use it, and defines the implementation and UI logic to complete the tool invocation.

In this case, the tool will "get information about products", given a search query and a result count. When the LLM determines this tool should be used, it will generate the query string and choose how many results are appropriate, then execute the render function defined below.
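Under the hood, a tool call from the model arrives as a tool name plus a JSON string of arguments; the application parses those arguments, validates them, and runs the matching implementation. The Generative UI library does this for you, so the sketch below only illustrates the idea in plain TypeScript. The names ToolCall, tools, parseProductSearchArgs, and dispatchToolCall are hypothetical, and the hand-written check stands in for the zod schema used in the tutorial.

```typescript
// A tool call as emitted by the model: a tool name plus JSON-encoded arguments.
interface ToolCall {
  name: string;
  arguments: string;
}

interface ProductSearchArgs {
  query: string;
  count: number;
}

// Minimal runtime check standing in for the zod schema in the tutorial.
function parseProductSearchArgs(args: unknown): ProductSearchArgs {
  const a = args as Partial<ProductSearchArgs> | null;
  if (typeof a?.query !== "string" || typeof a?.count !== "number") {
    throw new Error("invalid arguments for product_search");
  }
  return { query: a.query, count: a.count };
}

// Hypothetical tool registry: tool name -> implementation.
const tools: Record<string, (args: unknown) => string> = {
  product_search: (args) => {
    const { query, count } = parseProductSearchArgs(args);
    // A real implementation would query the knowledge base here.
    return `searching for ${count} products matching "${query}"`;
  },
};

// Look up the tool the model chose and run it with the parsed arguments.
function dispatchToolCall(call: ToolCall): string {
  const tool = tools[call.name];
  if (!tool) {
    throw new Error(`unknown tool: ${call.name}`);
  }
  return tool(JSON.parse(call.arguments));
}
```

For example, a call named product_search with arguments {"query":"hiking boots","count":3} reaches the product search implementation, while an unknown tool name or malformed arguments raise an error instead of running anything.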

On execution, the tool searches Dewy for relevant products, then calls the LLM again to pick up where it left off. Since this process can take a while, the render function yields components indicating that the tool is in use and how far it has progressed. These components let the UI reflect information specific to this tool's outcome.
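The render function is an async generator: each yield hands the UI an interim element, and the value it finally returns is the finished result. The sketch below reproduces that control flow with plain strings instead of React components. The names fakeSearch, renderSearch, and collectFrames are hypothetical, and the consumer loop is only an illustration of how such a generator can be driven, not the library's internals.

```typescript
// Hypothetical stand-in for the knowledge-base search.
async function fakeSearch(query: string, count: number): Promise<string[]> {
  return Array.from({ length: count }, (_, i) => `${query} #${i + 1}`);
}

// Yields a progress "frame" first, then returns the final frame.
async function* renderSearch(
  query: string,
  count: number,
): AsyncGenerator<string, string, void> {
  yield `Searching for ${count} products matching "${query}"...`;
  const products = await fakeSearch(query, count);
  return `Found: ${products.join(", ")}`;
}

// Drives the generator manually so the returned value is captured too;
// a for-await loop would skip the generator's final return value.
async function collectFrames(gen: AsyncGenerator<string, string, void>): Promise<string[]> {
  const frames: string[] = [];
  while (true) {
    const step = await gen.next();
    frames.push(step.value);
    if (step.done) return frames;
  }
}
```

Driving renderSearch("tent", 2) produces a progress frame followed by the final "Found: tent #1, tent #2" frame, which mirrors how the SearchCard is shown first and then replaced by the results.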

```
// app/productSearch.tsx

import { z } from 'zod';
import { Dewy } from 'dewy-ts';
import { OpenAIStream } from 'ai';
import { Tokens } from 'ai/react';

import AssistantMessage from './AssistantMessage';
import ResultCard from './ResultCard';
import SearchCard from './SearchCard';

// Create a Dewy client.
const dewy = new Dewy({
    BASE: process.env.DEWY_ENDPOINT
})

// Implement the tool's logic.
// In this case, we search for the given query
// and return the `count` most similar chunks in the KB.
// The returned chunks are used in the `render` method below.
async function searchProducts(query: string, count: number) {
    const context = await dewy.kb.retrieveChunks({
        collection: "product_info",
        query: query,
        n: count,
    })
    return context.text_results.map(c => c.text)
}

// Define the behavior of the product search tool.
export default function productSearch(aiState, openai) {
    return {
        // A description of the tool.
        // This is used by the LLM to decide when to use the tool.
        description: 'Get information about products',

        // Parameters control how the tool behaves.
        // These values will be picked by the LLM,
        // so be sure to clearly explain what they're
        // used for.
        parameters: z.object({
            query: z.string().describe(
                'A description of a product or what the product can be used for.'
            ),
            count: z.number().describe(
                'The number of products to return.'
            ),
        }).required(),

        // Configure the tool's behavior.
        // This function will be called after the LLM has
        // chosen to use the tool and generated values for
        // the parameters we configured above.
        render: async function* ({ query, count }) {
            // Let the user know we're looking for 
            // products related to their message
            yield <SearchCard query={query} count={count} />

            // Search for products related to the user's question
            const products = await searchProducts(query, count)

            // Update the message history 
            // with the results we found
            aiState.update([
                ...aiState.get(),
                {
                    role: "function",
                    name: "product_search",
                    content: JSON.stringify(products),
                }
            ]);

            // Now reply to the user.
            // The products we retrieved are part of the state 
            // provided as the messages parameter
            const resp = await openai.chat.completions.create({
                model: 'gpt-4-0125-preview',
                messages: aiState.get(),
                stream: true,
            })

            // Stream the results back as they're generated
            const stream = OpenAIStream(resp, {
                onFinal: (completion) => {
                    // Update the conversation history 
                    // once the full response is received
                    aiState.done([
                        ...aiState.get(),
                        {
                            role: "assistant",
                            content: completion,
                        }
                    ])
                }
            });

            // Display the response alongside 
            // the products provided to the model.
            return <div className="flex flex-col gap-2">
                <ResultCard query={query} results={products} />
                <AssistantMessage>
                    <Tokens stream={stream} />
                </AssistantMessage>
            </div>
        }
    }
}
```

Set up the server-side message handler

Configure a message handling action to receive new messages from the client. This handler configures the initial state of the chatbot and defines the server's behavior when a message is received from the user.

```
// app/action.tsx

import { OpenAI } from 'openai';
import { createAI, getMutableAIState, render } from 'ai/rsc';

import productSearch from './productSearch';
import AssistantMessage from './AssistantMessage';

// Configure the LLM, in this case OpenAI
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function submitUserMessage(userInput: string) {
  'use server';

  // The AI state. This contains our message history,
  // and will be provided to the LLM each time a 
  // chat completion is generated.
  const aiState = getMutableAIState<typeof AI>();
  aiState.update([
    ...aiState.get(),
    {
      role: 'user',
      content: userInput,
    },
  ]);

  // Helper function for building streamable UI components.
  const ui = render({
    model: 'gpt-4-0125-preview',
    provider: openai,
    messages: [
      { role: 'system', content: `
        You are a product recommendation engine. 
        Respond only with information about products 
        retrieved using the "product_search" function role.
        `},
      { role: 'user', content: userInput }
    ],

    // This determines how generated text 
    // (as opposed to function calls) will
    // be rendered.
    text: ({ content, done }) => {
      if (done) {
        aiState.done([
          ...aiState.get(),
          {
            role: "assistant",
            content
          }
        ]);
      }
      return <AssistantMessage>{content}</AssistantMessage>
    },

    // Configure the tools available to the LLM
    tools: {
      product_search: productSearch(aiState, openai),
    }
  })

  return {
    id: Date.now(),
    display: ui
  };
}

// Define the initial state of the AI. It can be any JSON object.
const initialAIState: {
  role: 'user' | 'assistant' | 'system' | 'function';
  content: string;
  id?: string;
  name?: string;
}[] = [];

// The initial UI state that the client will keep track of, which contains the message IDs and their UI nodes.
const initialUIState: {
  id: number;
  display: React.ReactNode;
}[] = [{id: 0, display: <AssistantMessage>Hi! How can I help you today?</AssistantMessage>}];

// AI is a provider you wrap your application with so you can access AI and UI state in your components.
export const AI = createAI({
  actions: {
    submitUserMessage
  },
  // Each state can be any shape of object, but for chat applications
  // it makes sense to have an array of messages. Or you may prefer something like { id: number, messages: Message[] }
  initialUIState,
  initialAIState
});
```
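The AI state is simply the message history sent to the model, and each step appends to it rather than mutating it in place. The sketch below shows the sequence a single product question produces, using the same message shape as initialAIState; appendMessage is a hypothetical helper and the message contents are made up for illustration.

```typescript
type Role = "user" | "assistant" | "system" | "function";

interface AIMessage {
  role: Role;
  content: string;
  id?: string;
  name?: string;
}

// Returns a new array so earlier snapshots of the state stay untouched,
// mirroring the `aiState.update([...aiState.get(), message])` pattern.
function appendMessage(state: readonly AIMessage[], message: AIMessage): AIMessage[] {
  return [...state, message];
}

// 1. The user's question.
const afterUser = appendMessage([], { role: "user", content: "Any waterproof tents?" });

// 2. The tool result, recorded as a "function" message named after the tool.
const afterTool = appendMessage(afterUser, {
  role: "function",
  name: "product_search",
  content: JSON.stringify(["Example tent: fully waterproof, 2-person"]),
});

// 3. The assistant's final answer, added once streaming completes.
const afterReply = appendMessage(afterTool, {
  role: "assistant",
  content: "The Example tent is a waterproof 2-person option.",
});
```

Because the function message sits in the history before the second completion call, the model sees the retrieved products as context when it writes its reply.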

Set up the chat UI on the client

Finally, set up the chat UI on the client.
Map over the UI state's messages and render each one's display property. Then configure a form that adds the user's message to the UI state and calls the submitUserMessage server-side action.

```
// app/page.tsx
'use client'

import { useState } from 'react';
import { useUIState, useActions } from 'ai/rsc';
import type { AI } from './action';
import { Input } from '@/components/ui/input';
import { Button } from '@/components/ui/button';

import UserMessage from './UserMessage';
import { PlaneIcon } from './icons';

export default function Page() {
  const [inputValue, setInputValue] = useState('');
  const [messages, setMessages] = useUIState<typeof AI>();
  const { submitUserMessage } = useActions<typeof AI>();

  return (
    <>
      <div className="flex-1 overflow-auto px-4 mt-4">
        <div className="grid gap-4 md:gap-8">
          {
            messages.map((message) => (
              <div key={message.id}>
                {message.display}
              </div>
            ))
          }

        </div>
      </div>

      <form onSubmit={async (e) => {
        e.preventDefault();

        // Add user message to UI state
        setMessages((currentMessages) => [
          ...currentMessages,
          {
            id: Date.now(),
            display: <UserMessage>{inputValue}</UserMessage>,
          },
        ]);

        // Submit and get response message
        const responseMessage = await submitUserMessage(inputValue);
        setMessages((currentMessages) => [
          ...currentMessages,
          responseMessage,
        ]);

        setInputValue('');
      }}>
        <div className="border-t-2">
          <div className="flex items-center h-14 px-4">
            <Input
              className="rounded-full flex-1 min-w-0 bg-gray-200 dark:bg-gray-800"
              placeholder="Type a message..."
              value={inputValue}
              type="text"
              onChange={(event) => {
                setInputValue(event.target.value)
              }}
            />
            <Button className="ml-2 h-8 w-8 rounded-full bg-gray-200 dark:bg-gray-800" size="icon">
              <PlaneIcon className="h-4 w-4" />
              <span className="sr-only">Send message</span>
            </Button>
          </div>
        </div>
      </form>
    </>
  )
}
```

To keep this post from getting any longer, several basic UI components (such as AssistantMessage, UserMessage, SearchCard, and ResultCard) have been omitted, but you can check out the full implementation in the examples directory of the Dewy repo.

Conclusion

This post has walked through building a Retrieval-Augmented Generation (RAG) tool using Vercel's Generative UI components. Rather than pre-scripting the logic for selecting and using knowledge bases, this approach lets the LLM decide which knowledge bases are needed and how best to query them, giving a more dynamic, efficient, and user-centric way to handle information retrieval tasks.

By utilizing Dewy as our knowledge base, we've emphasized the importance of a flexible, easy-to-use platform for storing, organizing, and retrieving information. This synergy between Dewy's streamlined data management and Vercel's dynamic UI rendering paves the way for developers to create more intelligent, responsive, and user-friendly applications.

If this tutorial has been helpful and you'd like to help others learn about Dewy, please consider starring our GitHub repo!

Typescript is a perfect fit for your RAG app

Ryan Michael — Mon, 26 Feb 2024 20:06:38 +0000

When it comes to building Retrieval-Augmented Generation (RAG) applications, many full-stack developers assume they need to start by learning Python and its extensive ML and GenAI ecosystem. But TypeScript (or JavaScript, if that’s your thing) is a perfect fit for this type of application.

In this post we’ll look at the defining challenges of writing a RAG application and see how Typescript’s strengths are perfectly aligned with this problem.

First of all, you don’t need Python

Let's get this out of the way: Python is a great language with a rich tool ecosystem, but for the majority of RAG applications Python isn’t necessary.

While Python's libraries and frameworks are indeed powerful for training models, the reality is that you're more likely to consume existing models than train your own. Leveraging pre-trained models through libraries and APIs is the crux of RAG application development, a domain where TypeScript shines.

Many of the most popular Gen AI libraries (e.g., LangChain and LlamaIndex) have recognized this and released TypeScript implementations of their toolkits. Expect the TypeScript Gen AI ecosystem to keep growing as Gen AI becomes increasingly mainstream.

Prompt Engineering is where you’ll spend your time

At the heart of any retrieval-augmented generation (RAG) application lies the art and science of prompt engineering: designing and refining the inputs (prompts) given to large language models (LLMs) so that they produce the most accurate, relevant, and contextually appropriate responses.

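```
import { PromptTemplate } from "@langchain/core/prompts";

// {context} and {question} are filled in when the chain runs.
const prompt =
    PromptTemplate.fromTemplate(`Answer the question
    based only on the following context:
    {context}

    Question: {question}`);
```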
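PromptTemplate.fromTemplate treats {context} and {question} as variables that are filled in at run time. To show what that substitution amounts to, here is a hypothetical fillTemplate helper in plain TypeScript. It is a simplified stand-in, not LangChain's implementation: LangChain's templates also handle escaped braces, which this sketch ignores.

```typescript
// Replaces each {name} placeholder with the matching variable.
// Throws if the template references a variable that was not supplied.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match, name: string) => {
    if (!(name in vars)) {
      throw new Error(`missing template variable: ${name}`);
    }
    return vars[name];
  });
}

const template = `Answer the question based only on the following context:
{context}

Question: {question}`;

const filled = fillTemplate(template, {
  context: "Dewy is a knowledge base.",
  question: "What is Dewy?",
});
```

Failing loudly on a missing variable is a deliberate choice here: a prompt silently sent with a literal "{context}" in it is a hard bug to spot from the model's output alone.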

This process is important because the quality of the output generated by an LLM is directly influenced by how well the prompt is constructed. A well-engineered prompt can dramatically improve the effectiveness of the generative model, making prompt engineering arguably the most critical aspect of building a RAG application.

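```
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { formatDocumentsAsString } from "langchain/util/document";

// `retriever` (e.g. a vector store retriever), `model` (a chat model),
// `prompt` (the template above), and `question` are assumed to be defined.
const chain = RunnableSequence.from([
    {
        context: retriever.pipe(formatDocumentsAsString),
        question: new RunnablePassthrough(),
    },
    prompt,
    model,
    new StringOutputParser(),
]);

const stream = await chain.streamLog(question);
```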
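RunnableSequence.from pipes each step's output into the next: the retriever and passthrough build { context, question }, the prompt formats it, the model answers, and the output parser extracts a string. A dependency-free sketch of the same data flow follows; retrieve and callModel are hypothetical stand-ins for a real retriever and LLM (the "model" just echoes the context so the flow is observable), and the sketch omits the streaming that streamLog provides.

```typescript
interface Doc {
  pageContent: string;
}

// Hypothetical retriever: returns documents mentioning any longer word in the question.
async function retrieve(question: string, corpus: Doc[]): Promise<Doc[]> {
  const words = question.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  return corpus.filter((d) => words.some((w) => d.pageContent.toLowerCase().includes(w)));
}

// Hypothetical model: echoes the prompt's context section back.
async function callModel(prompt: string): Promise<{ content: string }> {
  const context = prompt.split("Context:\n")[1]?.split("\n\nQuestion:")[0] ?? "";
  return { content: `Based on the context: ${context}` };
}

// retriever -> format documents -> prompt -> model -> string output, in order.
async function answer(question: string, corpus: Doc[]): Promise<string> {
  const docs = await retrieve(question, corpus);
  const context = docs.map((d) => d.pageContent).join("\n\n");
  const prompt =
    `Answer the question based only on the following context.\n` +
    `Context:\n${context}\n\nQuestion: ${question}`;
  const message = await callModel(prompt);
  return message.content; // the "output parser" step: extract the text
}
```

Writing it out this way makes clear why the retriever's output has to be formatted as a string before it reaches the prompt: the prompt step only knows how to interpolate text.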

The TypeScript ecosystem is full of tools and libraries that facilitate rich prompt engineering and inference. Frameworks like LangChain and tools for interacting with APIs from OpenAI or Vertex AI are readily accessible in the JavaScript ecosystem. Moreover, utilities like LangSmith for refining prompts and integrated logging mechanisms ensure that developers have everything they need to craft and optimize interactions with AI models.

Orchestrating API calls is what will make your app smart

At the heart of RAG apps lies the ability to fetch, process, and synthesize information from various data sources. The “Retrieval” portion of RAG is what sets your application apart from naive ChatGPT queries. By pulling in knowledge relevant to your users’ needs, you can provide a tailored and superior experience, and this will generally be accomplished by interacting with APIs and knowledge bases.

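```
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// `prompt` is assumed to be an array of chat messages built elsewhere.
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    ...prompt,
    { role: 'user', content: 'Tell me about RAG' },
];
const res = await openai.chat.completions.create({
    messages,
    model: 'gpt-3.5-turbo',
    temperature: 0.7,
});
```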

TypeScript excels at orchestrating API calls and handling responses, consolidating disparate data into cohesive, actionable results. Efficiently managing asynchronous operations and network requests comes naturally to it, thanks to JavaScript's event-driven runtime and first-class support for Promises and async/await syntax.
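Because each lookup is a Promise, independent retrievals can run concurrently and be merged once they settle. The sketch below uses Promise.allSettled so a single failing source does not sink the whole answer; the Source type, gatherContext, and the sources in the test are hypothetical.

```typescript
type Source = (query: string) => Promise<string[]>;

// Queries every source concurrently; failed sources are reported, not fatal.
async function gatherContext(
  query: string,
  sources: Record<string, Source>,
): Promise<{ snippets: string[]; failed: string[] }> {
  const names = Object.keys(sources);
  const results = await Promise.allSettled(names.map((name) => sources[name](query)));

  const snippets: string[] = [];
  const failed: string[] = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      snippets.push(...result.value);
    } else {
      failed.push(names[i]);
    }
  });
  return { snippets, failed };
}
```

Promise.all would reject as soon as any source failed; allSettled trades that for partial results, which is usually the better default when the output is context for a generated answer.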

User experience is king

User experience can make or break a RAG application, where the dynamic and often unpredictable nature of generated content presents unique challenges for user interface (UI) and user experience (UX) design. The stochasticity (unpredictability) inherent in the responses of generative models requires interfaces that can gracefully handle and present varying outputs in a coherent and user-friendly manner.

Additionally, the necessity for user feedback loops to gauge and improve the performance of generative models makes it important to design interfaces that actively engage users and encourage interaction. Given these applications' computational requirements, optimizing for streaming and minimizing latency are critical to maintaining a seamless user experience. Ensuring that the application remains responsive and interactive, even as it performs complex back-end operations, is paramount to user satisfaction and overall application success.

TypeScript, with its roots deeply embedded in web technologies, is unparalleled in crafting responsive and interactive user interfaces. The ecosystem offers a variety of frameworks and libraries, such as React, Vue, and Angular, which enable you to build applications that not only look great but also perform exceptionally well, even in real-time, latency-sensitive environments. Moreover, TypeScript can run both on the client and server side, making it easier to build amazing applications quickly.

Dewy handles document ingestion and indexing

Document ingestion and indexing is an area where Python's mature Gen AI ecosystem has set it apart from Typescript. Until recently, the most commonly used libraries for preparing documents for RAG have been Python-based, and managing this process has been challenging:

Libraries like LangChain give you a hundred ways to do the same thing and no guidance on which is best
Orchestrating document processing and vector indexing in a robust, fault-tolerant way is challenging
Querying and ranking extracted information is complex and it can be difficult to understand why your results aren’t relevant to the context

These challenges are why we created Dewy.

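```
import { Dewy } from 'dewy-ts';

// Connect to a locally running Dewy instance.
const dewy = new Dewy({ BASE: 'http://localhost:8000' });

// `document_url` is assumed to point at the document to ingest.
const document = await dewy.kb.addDocument({
    collection: 'receipts',
    url: document_url,
});
```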

Dewy is a standalone knowledge base that simplifies the complex process of extracting and querying your documents. Dewy fills in the last piece of the RAG puzzle, allowing you to build applications from start to finish quickly using the tools you’re already familiar with.

Conclusion

The development landscape for RAG applications is evolving, with TypeScript leading the charge in accessibility, functionality, and efficiency. This ecosystem not only provides a comprehensive suite of tools for interacting with advanced AI models but also excels in the critical areas of API integration, user interface development, and data processing.

If you're embarking on the journey of creating a RAG application, TypeScript offers a compelling, versatile, and powerful toolkit that aligns perfectly with the demands of modern app development.

RAG for Real - Gotchas to consider before building your app

Ryan Michael — Thu, 22 Feb 2024 16:04:16 +0000

As developers, we sometimes find ourselves at the transition from experimentation to production wishing we'd thought of something earlier. The journey from a prototype to a fully operational application is full of challenges, especially for RAG and Gen AI applications.

This post presents some of the surprising challenges specific to building RAG applications, and offers concrete solutions to help you build a successful application. The goal is to help you avoid some of the common gotchas and save you some headaches down the road.

Challenge 1: Managing changing documents and metadata

What needs to be done: When a RAG application ingests a document, it often extracts and indexes multiple chunks to facilitate efficient search. Documents are also frequently associated with metadata used for access control, hybrid search, and data organization. In a production system, the set of documents is often dynamic, changing as users interact with the application.

Why it's hard: Each chunk of a document is indexed as a separate vector, so any update or deletion requires tracking these fragments and their metadata to ensure consistency. Hybrid search models, which leverage both traditional and vector search mechanisms, require a careful approach to indexing and querying. Additionally, enforcing ACLs at the chunk level complicates permissions management.

Design Around Documents

The key to a successful RAG application lies in recognizing that vector indexes are just one component of a complete solution. Instead of treating documents as just data to be chunked and indexed, think of each document as the central entity around which your application architecture is built.

This perspective allows the development of systems that support complex interactions with documents, such as updates, deletions, and access control, in a more integrated manner. By focusing on documents, you ensure that every part of your application, from search to security, is aligned with the goal of delivering relevant, accessible, and secure content.

Capture document-level metadata

Document metadata can support many use cases, for example hybrid search and access control - but it's important to remember that a single document will be associated with an arbitrary number of extracted text chunks, images, summarizations, etc.

There are two basic ways to manage this type of 1:N relationship: normalized (where you store the metadata in a separate table) and denormalized (where the metadata is duplicated and stored alongside each chunk).

Whenever possible, the normalized approach is preferable. For example, you can store document metadata alongside documents or in a separate table and use SQL joins to associate metadata with rows in your vector index. This approach simplifies document and metadata changes, which could otherwise affect a huge number of related DB records (chunks, summaries, etc.).
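A sketch of the normalized layout, with in-memory collections standing in for SQL tables: chunks carry only a documentId, metadata lives once per document, and a query-time join attaches it. All names here are hypothetical; in Postgres the join would be a JOIN between a documents table and the chunk/vector table.

```typescript
interface DocumentMeta {
  id: string;
  title: string;
  allowedGroups: string[];
}

interface Chunk {
  id: string;
  documentId: string; // foreign key into the documents "table"
  text: string;
}

const documents = new Map<string, DocumentMeta>();
const chunks: Chunk[] = [];

// Query-time join: attach document metadata to each matching chunk and
// drop chunks the caller's groups are not allowed to see.
function searchChunks(term: string, userGroups: string[]) {
  return chunks
    .filter((c) => c.text.includes(term))
    .map((c) => ({ chunk: c, doc: documents.get(c.documentId) }))
    .filter(({ doc }) => doc !== undefined && doc.allowedGroups.some((g) => userGroups.includes(g)))
    .map(({ chunk, doc }) => ({ text: chunk.text, title: doc!.title }));
}
```

With this layout, granting a new group access to a document is one update to its entry in documents, and every chunk of that document becomes visible at once; a denormalized layout would need the same change written to every chunk row.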

Implement versioned document storage

To handle the challenges of document updates and deletions efficiently, it's a good idea to implement a system of document versioning. This approach allows you to manage updates and deletions in a controlled manner, where each version of a document is tracked, and changes can be rolled back if necessary. This setup is particularly beneficial for applications that require frequent updates to their documents, as it minimizes the impact on search functionality and ensures that users always have access to the most current and accurate information. Moreover, versioning supports the implementation of ACLs by providing a clear framework for who has access to what information and when, thereby enhancing the security and integrity of the application.
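One way to get atomic updates and rollback is to write each new version's chunks under a fresh version number and only then move the document's "active version" pointer. The in-memory sketch below shows that idea; VersionedStore and its methods are hypothetical, and a database-backed version would perform the pointer switch inside a transaction.

```typescript
interface Version {
  version: number;
  chunks: string[];
}

class VersionedStore {
  private versions = new Map<string, Version[]>();
  private active = new Map<string, number>();

  // Stage a new version; readers keep seeing the active one until publish().
  stage(docId: string, chunks: string[]): number {
    const history = this.versions.get(docId) ?? [];
    const version = history.length + 1;
    history.push({ version, chunks });
    this.versions.set(docId, history);
    return version;
  }

  // Switch readers to a fully processed version (also used for rollback).
  publish(docId: string, version: number): void {
    if (!this.versions.get(docId)?.some((v) => v.version === version)) {
      throw new Error(`unknown version ${version} for ${docId}`);
    }
    this.active.set(docId, version);
  }

  // Readers only ever see the active version's chunks.
  read(docId: string): string[] {
    const version = this.active.get(docId);
    const found = this.versions.get(docId)?.find((v) => v.version === version);
    return found ? found.chunks : [];
  }
}
```

Rolling back is just publishing an earlier version number, since old versions are kept rather than overwritten.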

Challenge 2: Ensuring Persistence and Durability

What needs to be done: Moving from a prototype to a production-ready application often means meeting more stringent requirements for data persistence and durability. Production applications need to work for multiple users concurrently, and users will be continually interacting with the application, adding & removing documents, making queries, etc. In a Jupyter notebook, local or in-memory storage may suffice for temporary experimentation, but production environments demand reliable storage solutions built for concurrent users.

Why it's hard: External databases and persistent queues introduce complexity, including network latency, data synchronization issues, and the need for robust error handling mechanisms. Ensuring the durability of vector indexes, which are crucial for the fast retrieval of information in RAG applications, requires careful selection of storage solutions that can support high read/write throughput. Finally, ingesting documents to support RAG often involves time-consuming data extraction and summarization: these processes shouldn’t happen as part of an API request, and need to be tolerant of machine failures, restarts, and code deployments.

Use a durable ingestion queue

A durable ingestion queue is foundational for managing the continuous addition and removal of documents by multiple users. Services like Amazon SQS and Inngest or durable databases like Postgres can be used to build an ingestion queue that ensures all document processing tasks are reliably captured and processed, even in the face of system failures or restarts.

This queue acts as a buffer, absorbing spikes in activity and allowing document processing tasks to be carried out asynchronously, without impacting the responsiveness of the application to user queries. By decoupling document ingestion from immediate processing, you provide a scalable way to handle user interactions and background tasks, ensuring that the system remains responsive and reliable.
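The property that matters in a durable queue is that a job disappears only when a worker acknowledges it; if the worker crashes mid-job, its lease expires and another worker picks the job up again. The in-memory sketch below illustrates just those lease/ack semantics and is not durable itself; a service such as SQS (via its visibility timeout) or a Postgres table would supply the persistence. LeaseQueue and its methods are hypothetical.

```typescript
interface Job {
  id: string;
  payload: string;
  leaseExpiresAt: number | null; // null when no worker holds the job
}

class LeaseQueue {
  private jobs = new Map<string, Job>();

  enqueue(id: string, payload: string): void {
    this.jobs.set(id, { id, payload, leaseExpiresAt: null });
  }

  // Claim the first job that is unleased or whose lease has expired.
  claim(now: number, leaseMs: number): Job | undefined {
    for (const job of this.jobs.values()) {
      if (job.leaseExpiresAt === null || job.leaseExpiresAt <= now) {
        job.leaseExpiresAt = now + leaseMs;
        return job;
      }
    }
    return undefined;
  }

  // Only an explicit ack removes the job; a crash before this point
  // leaves it in the queue to be claimed again after the lease expires.
  ack(id: string): void {
    this.jobs.delete(id);
  }

  get size(): number {
    return this.jobs.size;
  }
}
```

Passing the current time in explicitly (rather than reading the clock inside claim) keeps the lease logic easy to test and to reason about across restarts.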

Use idempotent ingestion logic

To manage the complexities of concurrent document workloads and ensure atomic updates, idempotent ingestion logic is essential. This means designing your ingestion processes so that performing the same operation multiple times has the same effect as performing it once.

For example, if an LLM is used during document chunking, its outputs may change from one run to the next. If your ingestion pipeline includes steps after chunking and a job is restarted during the chunking operation, you could end up with inconsistent results, with different steps based on different chunking output.

Implementing idempotent operations allows for retries without side effects, which is crucial in a distributed system where network issues or component failures can interrupt tasks. This approach also facilitates concurrent processing, ensuring that multiple workers can safely process tasks without stepping on each other's toes, thereby increasing the efficiency and reliability of document ingestion and updates. Idempotent processes work hand-in-hand with document versioning, allowing new versions to be released atomically after all processing has completed successfully.
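A minimal sketch of one idempotent write step, assuming chunks are keyed by document ID, version, and position: the step replaces the whole chunk set for that document version, so retrying it leaves the index exactly as a single run would, and a rerun that produces different chunking output replaces the old chunks instead of mixing with them. The in-memory Map stands in for a chunk table or vector index, and chunkKey and writeChunks are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface StoredChunk {
  key: string;
  documentId: string;
  version: number;
  text: string;
}

// In-memory stand-in for a chunk table or vector index.
const index = new Map<string, StoredChunk>();

// Deterministic key: the same document, version, and position always map
// to the same key, so a retried job overwrites instead of duplicating.
function chunkKey(documentId: string, version: number, position: number): string {
  return createHash("sha256").update(`${documentId}:${version}:${position}`).digest("hex");
}

// Replaces the full chunk set for one document version. Running it twice
// with the same input leaves the index in the same state as running it once.
function writeChunks(documentId: string, version: number, texts: string[]): void {
  for (const [key, chunk] of index) {
    if (chunk.documentId === documentId && chunk.version === version) {
      index.delete(key); // deleting during Map iteration is safe in JavaScript
    }
  }
  texts.forEach((text, position) => {
    const key = chunkKey(documentId, version, position);
    index.set(key, { key, documentId, version, text });
  });
}
```

In a real database the delete and inserts would run in one transaction, and combined with versioning, the new version would only be published after this step succeeds.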

Use a remote vector index

Choosing a remote vector index like pgvector or AstraDB is critical for supporting the durable, high-throughput read/write operations necessary for RAG applications. These technologies are designed to handle the demands of vector indexing, offering the scalability and performance needed for fast retrieval of information.

A remote vector index also enables separation of concerns, allowing the computational workload associated with vector operations to be offloaded from the primary application database. This separation ensures that the indexing and retrieval processes do not interfere with each other, maintaining high availability and performance even under heavy load. Moreover, these solutions come with built-in durability and fault tolerance features, ensuring that your vector data remains intact and accessible even in the event of system failures.

Challenge 3: Monitoring, Auditing, and Debugging

What needs to be done: As RAG applications are deployed in production, understanding their operation and performance becomes crucial. You want to know if your users are having a good experience and how much your LLM bill is going to be before it gets out of hand.

Monitoring, auditing, and debugging take on heightened importance, not just for traditional metrics like response times and error rates, but also for tracking the non-determinism of language model outputs and the effectiveness of prompt engineering.

Why it's hard: The inherent non-determinism of language models and the ambiguity of natural language processing make it challenging to predict and diagnose issues. Traditional monitoring tools may not be sufficient to capture the nuanced behavior of RAG applications, requiring developers to think creatively about how to track performance and user interactions.

Persist DB and LLM queries

To effectively monitor and audit RAG applications, it is crucial to persist database (DB) and large language model (LLM) queries. This means capturing and storing a record of every interaction with the database and the language model, using mechanisms such as Postgres query logging or proprietary tools like LangSmith. The general principle is to ensure that every query, its response, and associated metadata are recorded.

This logging allows you to analyze the performance of both DB queries and LLM interactions over time, identify patterns or anomalies, and troubleshoot issues with data retrieval or language model responses. Additionally, by persisting these queries, organizations can track and forecast usage costs, crucial for managing expenses related to LLM usage.
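A lightweight way to persist LLM calls is to wrap the model client so every call records its prompt, response, latency, and token counts. The sketch below uses a hypothetical ModelFn type and an in-memory log; the prices passed to estimateCost are caller-supplied placeholders, not real rates, and in production the log would go to a database or tracing tool.

```typescript
interface ModelResult {
  text: string;
  promptTokens: number;
  completionTokens: number;
}

type ModelFn = (prompt: string) => Promise<ModelResult>;

interface LogEntry {
  prompt: string;
  response: string;
  latencyMs: number;
  promptTokens: number;
  completionTokens: number;
}

// Wraps a model call so every request/response pair is recorded.
function withLogging(model: ModelFn, log: LogEntry[]): ModelFn {
  return async (prompt) => {
    const start = Date.now();
    const result = await model(prompt);
    log.push({
      prompt,
      response: result.text,
      latencyMs: Date.now() - start,
      promptTokens: result.promptTokens,
      completionTokens: result.completionTokens,
    });
    return result;
  };
}

// Estimates spend from the log; prices are per 1,000 tokens.
function estimateCost(log: LogEntry[], promptPricePer1k: number, completionPricePer1k: number): number {
  return log.reduce(
    (sum, e) =>
      sum + (e.promptTokens / 1000) * promptPricePer1k + (e.completionTokens / 1000) * completionPricePer1k,
    0,
  );
}
```

Because the wrapper has the same signature as the model function, it can be dropped in without changing any calling code.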

Design for user feedback

Incorporating mechanisms for user feedback directly into the RAG application is essential for understanding user satisfaction and the effectiveness of the system's outputs. Simple features like thumbs-up/down buttons, star ratings, and options to regenerate responses provide users with a direct way to communicate their experience.

This feedback not only serves as a critical metric for monitoring user satisfaction but also offers valuable data that can be used to fine-tune the application. In the absence of deterministic outputs, user feedback can be a useful way to determine if your application is doing what you expect it to.
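As a sketch of turning that feedback into a metric, the code below records thumbs-up/down ratings tagged with the prompt version that produced each response and computes the approval rate per version, so a new prompt can be compared against the old one using real user signals. The Feedback fields and promptVersion tagging are hypothetical.

```typescript
interface Feedback {
  responseId: string;
  promptVersion: string;
  rating: "up" | "down";
}

// Fraction of "up" ratings per prompt version.
function approvalByVersion(feedback: Feedback[]): Map<string, number> {
  const counts = new Map<string, { up: number; total: number }>();
  for (const f of feedback) {
    const c = counts.get(f.promptVersion) ?? { up: 0, total: 0 };
    c.total += 1;
    if (f.rating === "up") c.up += 1;
    counts.set(f.promptVersion, c);
  }

  const rates = new Map<string, number>();
  for (const [version, c] of counts) {
    rates.set(version, c.up / c.total);
  }
  return rates;
}
```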

Implement LLM performance testing

To ensure the reliability and accuracy of RAG applications over time, implementing a robust regression testing framework is key. Tools like TruLens or RAGAS can be employed to systematically test the application against a suite of predefined scenarios and inputs. This testing helps identify any deviations from expected outcomes, potentially highlighting issues introduced by updates to the language model, changes in the data, or alterations in the application code.

Regression testing is particularly important in the context of RAG applications due to the non-deterministic nature of language models; it ensures that updates or changes do not adversely affect the application's performance or the quality of its outputs. Furthermore, consistent regression testing facilitates a continuous improvement cycle, where feedback and testing results inform ongoing development and optimization efforts.
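TruLens and RAGAS are Python libraries with their own evaluation metrics; the sketch below is not their API, just a hand-rolled TypeScript harness showing the shape of a regression suite. Each hypothetical `Scenario` lists phrases the answer is expected to contain, and `runSuite` runs them against whatever `answer` function your app exposes. Substring checks are a crude stand-in for the semantic metrics those tools provide, but they are cheap enough to run on every change.

```typescript
// A hypothetical regression scenario: a question plus phrases the answer must mention.
interface Scenario {
  name: string;
  question: string;
  mustInclude: string[];
}

interface ScenarioResult {
  name: string;
  passed: boolean;
  missing: string[]; // expected phrases not found in the answer
}

// Case-insensitive check that every expected phrase appears in the answer.
function checkAnswer(s: Scenario, answer: string): ScenarioResult {
  const lower = answer.toLowerCase();
  const missing = s.mustInclude.filter((t) => !lower.includes(t.toLowerCase()));
  return { name: s.name, passed: missing.length === 0, missing };
}

// Run every scenario sequentially against the application's answer function.
async function runSuite(
  scenarios: Scenario[],
  answer: (question: string) => Promise<string>,
): Promise<ScenarioResult[]> {
  const results: ScenarioResult[] = [];
  for (const s of scenarios) {
    results.push(checkAnswer(s, await answer(s.question)));
  }
  return results;
}
```

Because LLM output varies between runs, it can help to run each scenario a few times and track the pass rate rather than a single pass/fail.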

Conclusion

Transitioning RAG applications from prototyping to production can be challenging, but with careful planning and the right tools, these hurdles can be overcome. By addressing document management complexities, ensuring data persistence, and implementing robust monitoring and debugging practices, you can deploy RAG applications that are not only powerful but also reliable and maintainable. Remember, the key to success in production is not just leveraging the latest technology but also understanding the nuances of your application and anticipating potential challenges before they arise.

We developed Dewy to incorporate many of these lessons and ensure that you can focus on creating impactful applications without getting bogged down by the underlying complexities.

Dewy is designed to handle document management, data persistence, and operational transparency, allowing you to leverage the power of RAG applications seamlessly.

Check out the docs to learn more!

Building a Question-Answering CLI with Dewy and LangChain.js

Ryan Michael — Thu, 15 Feb 2024 19:47:03 +0000

In this tutorial, we're focusing on how to build a question-answering CLI tool using Dewy and LangChain.js. Dewy is an open-source knowledge base that helps developers organize and retrieve information efficiently. LangChain.js is a framework that simplifies the integration of large language models (LLMs) into applications. By combining Dewy's capabilities for managing knowledge with LangChain.js's LLM integration, you can create tools that answer complex queries with precise and relevant information.

This guide walks you through setting up your environment, loading documents into Dewy, and using an LLM through LangChain.js to answer questions based on the stored data. It's designed for engineers looking to enhance their projects with advanced question-answering functionalities.

Why Dewy and LangChain.js?

Dewy is an OSS knowledge base designed to streamline the way developers store, organize, and retrieve information. Its flexibility and ease of use make it an excellent choice for developers aiming to build knowledge-driven applications.

LangChain.js, on the other hand, is a powerful framework that enables developers to integrate LLMs into their applications seamlessly. By combining Dewy's structured knowledge management with LangChain.js's LLM capabilities, developers can create sophisticated question-answering systems that can understand and process complex queries, offering precise and contextually relevant answers.

The Goal

Our aim is to build a simple yet powerful question-answering CLI script. This script will allow users to load documents into the Dewy knowledge base and then use an LLM, through LangChain.js, to answer questions based on the information stored in Dewy. This tutorial will guide you through the process, from setting up your environment to implementing the CLI script.

You'll learn how to use LangChain to build a simple question-answering application, and how to integrate Dewy as a source of knowledge, allowing your application to answer questions based on specific documents you provide it.

Prerequisites

Before diving into the tutorial, ensure you have the following prerequisites covered:

Basic knowledge of TypeScript programming
Familiarity with CLI tool development
A copy of Dewy running on your local machine (see Dewy's installation instructions if you need help here).

Step 1: Set Up Your Project

:::info
The final code for this example is available in the Dewy repo if you'd like to jump ahead.
:::

First, create a directory for the TypeScript CLI project and change into it:

mkdir dewy_qa
cd dewy_qa

With the directory set up, you can install TypeScript and initialize the project:

npm init -y
npm i typescript --save-dev
npx tsc --init

Depending on your environment, you may need to make some changes to your TypeScript config. Make sure that your tsconfig.json looks something like the following:

{
  "compilerOptions": {
    "target": "ES6",
    "module": "CommonJS",
    "moduleResolution": "node",
    "declaration": true,
    "outDir": "./dist",
    "esModuleInterop": true,
    "strict": true
  }
}

Now you're ready to create the CLI application. To keep the code from getting too messy, organize it into several directories, with the following layout:

dewy_qa/
├── commands/
│   └── ...
├── utils/
│   └── ...
├── index.ts
├── package.json
└── tsconfig.json

Each command will be implemented in the commands directory, and shared code will go in the utils directory. The entrypoint to the CLI application is the file index.ts.

Start with a simple "hello world" version of index.ts; you'll fill it out in Step 4:

#!/usr/bin/env ts-node-script

console.log("hello world");

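To verify that the environment is set up correctly, run the following command; you should see "hello world" printed in the console. The command uses ts-node to run TypeScript directly, so install it first (npm i ts-node --save-dev) if it isn't already available: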

npx ts-node index.ts

Rather than typing this command every time, add a bin entry to package.json. This records how to invoke the CLI and makes it easy to install as a command:

{
  ...
  "bin": {
    "dewy_qa": "./index.ts"
  }
  ...
}

Now you can run your script with npm exec dewy_qa, or npm link the package and run it as just dewy_qa.

Step 2: Implement document loading

Load documents by setting up the Dewy client. The first step is to add some dependencies to the project. The first is dewy-ts, the client library for Dewy. The second is commander, which helps build a CLI application with argument parsing, subcommands, and more. Finally, chalk makes the console output more colorful. Chalk 5 and later are ESM-only, so with the CommonJS configuration above, install chalk 4:

npm install dewy-ts commander chalk@4

Next, implement the load command's logic. You'll do this in a separate file named commands/load.ts. This file implements a function named load, which expects a URL and some additional options - this will be wired up with the CLI in a later section.

Dewy makes document loading simple: set up the client and call addDocument with the URL of the file you'd like to load. Dewy takes care of extracting the document's contents, splitting them into chunks sized for sending to an LLM, and indexing them for semantic search.

// commands/load.ts
import { Dewy } from 'dewy-ts';

import { success, error } from '../utils/colors';

// commander exposes the --dewy-endpoint flag as options.dewyEndpoint
export async function load(url: string, options: { collection: string, dewyEndpoint: string }): Promise<void> {
  console.log(success(`Loading ${url} into collection: ${options.collection}`));

  try {
    const dewy = new Dewy({
        BASE: options.dewyEndpoint
    })

    const result = await dewy.kb.addDocument({ collection: options.collection, url });

    console.log(success(`File loaded successfully`));
    console.log(JSON.stringify(result, null, 2));

  } catch (err: any) {
    console.error(error(`Failed to load file: ${err.message}`));
  }
}

You may have noticed that some functions were imported from ../utils/colors. This file just sets up some helpers for coloring console output; put it in utils/colors.ts so it can be used elsewhere:

import chalk from 'chalk';

export const success = (message: string) => chalk.green(message);
export const info = (message: string) => chalk.blue(message);
export const error = (message: string) => chalk.red(message);

Step 3: Implement question-answering

With the ability to load documents into Dewy, it's time to integrate LangChain.js to utilize LLMs for answering questions. This step involves setting up LangChain.js to query the Dewy knowledge base and process the results using an LLM to generate answers.

To start, install some additional packages: dewy-langchain (which provides the DewyRetriever used below), langchain, and @langchain/openai plus openai to use the OpenAI API as the LLM:

npm install dewy-langchain langchain @langchain/openai openai

:::info
The query command is fairly long, so we'll walk through its pieces before combining them at the end.
:::

Create Clients for OpenAI and Dewy

The first thing to set up is Dewy (as before) and an LLM. One difference from before is that dewy is used to build a DewyRetriever: a special type LangChain uses to retrieve information as part of a chain. You'll see how the retriever is used in just a minute.

const model = new ChatOpenAI({
    openAIApiKey: options.openaiApiKey,
});
const dewy = new Dewy({
    BASE: options.dewyEndpoint
})

const retriever = new DewyRetriever({ dewy, collection: options.collection });

Create a LangChain Prompt

This is a string template that instructs the LLM how it should behave, with placeholders for additional context which will be provided when the "chain" is created. In this case, the LLM is instructed to answer the question, but only using the information it's provided. This reduces the model's tendency to "hallucinate", or make up an answer that's plausible but wrong. The values of context and question are provided in the next step:

const prompt =
PromptTemplate.fromTemplate(`Answer the question 
based only on the following context:

{context}

Question: {question}`);

Build the Chain

LangChain works by building up "chains" of behavior that control how to query the LLM and other data sources. This example uses the LangChain Expression Language (LCEL), which provides a more flexible programming experience than some of LangChain's original interfaces.

Use a RunnableSequence to create an LCEL chain. This chain describes how to generate the context and question values: the context is generated using the retriever created earlier, and the question is generated by passing through the step's input. The results Dewy retrieves are formatted as a string by piping them to the formatDocumentsAsString function.

This chain does the following:

It retrieves documents using the DewyRetriever and assigns them to context and assigns the chain's input value to question.
It formats the prompt string using the context and question variables.
It passes the formatted prompt to the LLM to generate a response.
It formats the LLM's response as a string.

const chain = RunnableSequence.from([
    {
        context: retriever.pipe(formatDocumentsAsString),
        question: new RunnablePassthrough(),
    },
    prompt,
    model,
    new StringOutputParser(),
]);
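To make the data flow concrete, here is a dependency-free sketch of what the chain does, with the retriever and model passed in as plain functions. It is an illustration, not how LangChain implements RunnableSequence: `formatDocs` assumes formatDocumentsAsString joins each document's pageContent with blank lines, and `Doc` is a minimal stand-in for LangChain's Document type.

```typescript
// Minimal stand-in for LangChain's Document type.
interface Doc {
  pageContent: string;
}

// Mirrors formatDocumentsAsString (assumed: page contents joined by blank lines).
function formatDocs(docs: Doc[]): string {
  return docs.map((d) => d.pageContent).join("\n\n");
}

// Fill the {context} and {question} placeholders, as the PromptTemplate does.
function fillPrompt(context: string, question: string): string {
  return `Answer the question based only on the following context:\n\n${context}\n\nQuestion: ${question}`;
}

// The four chain steps, with retriever and model injected as plain functions.
async function runChain(
  question: string,
  retrieve: (q: string) => Promise<Doc[]>,
  callModel: (prompt: string) => Promise<string>,
): Promise<string> {
  // Step 1: build { context, question }; the question passes through unchanged.
  const context = formatDocs(await retrieve(question));
  // Step 2: format the prompt from both values.
  const prompt = fillPrompt(context, question);
  // Steps 3 and 4: call the model and return its output as a plain string.
  return callModel(prompt);
}
```

In the real chain, the first step is a map whose context and question entries LCEL can evaluate in parallel; the sketch runs them in sequence for readability.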

Execute the chain

Now that the chain has been constructed, execute it and output the results to the console. As you'll see, question is an input argument provided by the caller of the function.

Executing the chain using chain.streamLog() allows you to see each response chunk as it's returned from the LLM. The stream handler loop looks a bit messy, but it just filters for the relevant stream results and writes them to STDOUT with process.stdout.write (console.log would add a newline after each chunk).

const stream = await chain.streamLog(question);

// Write chunks of the response to STDOUT as they're received
console.log("Answer:");
for await (const chunk of stream) {
    if (chunk.ops?.length > 0 && chunk.ops[0].op === "add") {
        const addOp = chunk.ops[0];
        if (
        addOp.path.startsWith("/logs/ChatOpenAI") &&
        typeof addOp.value === "string" &&
        addOp.value.length
        ) {
        process.stdout.write(addOp.value);
        }
    }
}
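The filtering inside that loop can be pulled into a small pure function, which makes it easy to unit-test without calling OpenAI. In the sketch below, `PatchOp` and `LogPatch` are minimal hypothetical stand-ins for the patch objects streamLog yields, typed only as far as the loop uses them. Separately, because this chain ends in a StringOutputParser, iterating over `await chain.stream(question)` should also yield the answer as plain string chunks, which may be simpler if you don't need the intermediate logs.

```typescript
// Minimal stand-ins for the JSON-patch-style chunks produced by streamLog().
interface PatchOp {
  op: string;
  path: string;
  value?: unknown;
}
interface LogPatch {
  ops?: PatchOp[];
}

// Return the streamed LLM text in a chunk, or undefined if the chunk is
// not an "add" of non-empty text under the ChatOpenAI log path.
function tokenFromPatch(chunk: LogPatch): string | undefined {
  const first = chunk.ops?.[0];
  if (!first || first.op !== "add") return undefined;
  if (
    first.path.startsWith("/logs/ChatOpenAI") &&
    typeof first.value === "string" &&
    first.value.length > 0
  ) {
    return first.value;
  }
  return undefined;
}
```

Inside the loop, the body then becomes `const token = tokenFromPatch(chunk); if (token) process.stdout.write(token);`.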

Pull it all together as a command

Now that you've seen all the pieces, you're ready to create the query command in commands/query.ts. This should look similar to the load command from before, with some additional imports.

import { StringOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
import { formatDocumentsAsString } from "langchain/util/document";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";

import { Dewy } from 'dewy-ts'; 
import { DewyRetriever } from 'dewy-langchain';

import { success, error } from '../utils/colors';

export async function query(question: string, options: { collection: string, dewyEndpoint: string, openaiApiKey: string }): Promise<void> {
  console.log(success(`Querying ${options.collection} collection for: "${question}"`));

  try {
    const model = new ChatOpenAI({
        openAIApiKey: options.openaiApiKey,
    });
    const dewy = new Dewy({
        BASE: options.dewyEndpoint
    })

    const retriever = new DewyRetriever({ dewy, collection: options.collection });

    const prompt =
    PromptTemplate.fromTemplate(`Answer the question based only on the following context:
    {context}

    Question: {question}`);

    const chain = RunnableSequence.from([
        {
            context: retriever.pipe(formatDocumentsAsString),
            question: new RunnablePassthrough(),
        },
        prompt,
        model,
        new StringOutputParser(),
    ]);

    const stream = await chain.streamLog(question);

    // Write chunks of the response to STDOUT as they're received
    console.log("Answer:");
    for await (const chunk of stream) {
        if (chunk.ops?.length > 0 && chunk.ops[0].op === "add") {
          const addOp = chunk.ops[0];
          if (
            addOp.path.startsWith("/logs/ChatOpenAI") &&
            typeof addOp.value === "string" &&
            addOp.value.length
          ) {
            process.stdout.write(addOp.value);
          }
        }
    }

  } catch (err: any) {
    console.error(error(`Failed to query: ${err.message}`));
  }
}

Step 4: Building the CLI

With Dewy and LangChain.js integrated, the next step is to build the CLI interface. Use commander (installed in Step 2) to create a user-friendly command-line interface that supports commands for loading documents into Dewy and querying the knowledge base using LangChain.js.

First, rewrite index.ts to create the subcommands load and query. The --collection option determines which Dewy collection to load documents into or query (Dewy lets you organize documents into different collections, similar to file folders). The --dewy-endpoint option specifies how to connect to Dewy; by default, an instance running locally on port 8000 is assumed. Finally, the --openai-api-key option (which defaults to the OPENAI_API_KEY environment variable) configures the OpenAI API:

#!/usr/bin/env ts-node-script

import { Command } from 'commander';
import { load } from './commands/load';
import { query } from './commands/query';

const program = new Command();

program.name('dewy_qa').description('CLI tool for interacting with a knowledge base API').version('1.0.0');

const defaultOpenAIKey = process.env.OPENAI_API_KEY;

program
  .command('load')
  .description("Load documents into Dewy from a URL")
  .option('--collection <collection>', 'Specify the collection name', 'main')
  .option('--dewy-endpoint <endpoint>', 'Specify the Dewy API endpoint', 'http://localhost:8000')
  .argument('<url>', 'URL to load into the knowledge base')
  .action(load);

program
  .command('query')
  .description('Ask questions using an LLM and the loaded documents for answers')
  .option('--collection <collection>', 'Specify the collection name', 'main')
  .option('--dewy-endpoint <endpoint>', 'Specify the Dewy API endpoint', 'http://localhost:8000')
  .option('--openai-api-key <key>', 'Specify the OpenAI API key', defaultOpenAIKey)
  .argument('<question>', 'Question to ask the knowledge base')
  .action(query);

program.parse(process.argv);
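Note how commander names the parsed options: a multi-word flag such as --dewy-endpoint is passed to the action handler in camelCase, as options.dewyEndpoint, and --openai-api-key becomes options.openaiApiKey, so the handlers' option types need to use those camelCase names. The sketch below reproduces that naming rule so you can check a flag's key; `optionKey` is a hypothetical helper written for illustration, not commander's own code.

```typescript
// Derive the options key commander uses for a long flag, e.g. "--dewy-endpoint".
// Mirrors commander's camelCase rule; a leading "no-" (negatable flag) is dropped.
function optionKey(longFlag: string): string {
  const name = longFlag.replace(/^--/, "").replace(/^no-/, "");
  return name
    .split("-")
    .map((word, i) => (i === 0 ? word : word.charAt(0).toUpperCase() + word.slice(1)))
    .join("");
}
```

For example, `optionKey("--openai-api-key")` gives `"openaiApiKey"`, the property the query handler should read.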

OK, all done - wasn't that easy? You can try it out by running the command:

dewy_qa load https://arxiv.org/pdf/2009.08553.pdf

You should see something like:

Loading https://arxiv.org/pdf/2009.08553.pdf into collection: main
File loaded successfully
{
  "id": 18,
  "collection": "main",
  "extracted_text": null,
  "url": "https://arxiv.org/pdf/2009.08553.pdf",
  "ingest_state": "pending",
  "ingest_error": null
}

:::tip
Extracting the content of a large PDF can take a minute or two, so you'll often see "ingest_state": "pending" when you first load a new document.
:::
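If a script needs to wait until ingestion finishes, one option is to poll the document's ingest_state until it is no longer "pending". In the sketch below, `waitForIngest` is a hypothetical helper, and the `getState` callback is deliberately left to you: how to fetch a single document's current state from Dewy (and which terminal state names it reports besides "pending") is not covered here, so check the dewy-ts client before wiring it up.

```typescript
// Hypothetical helper: poll until a document leaves the "pending" ingest state.
// getState is caller-supplied; fetching the state from Dewy is an assumption.
async function waitForIngest(
  getState: () => Promise<string>,
  intervalMs = 2000,
  timeoutMs = 120000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const state = await getState();
    if (state !== "pending") return state; // finished (successfully or not)
    if (Date.now() >= deadline) {
      throw new Error(`document still pending after ${timeoutMs} ms`);
    }
    // Wait before asking again so the Dewy server isn't hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning whatever non-pending state comes back lets the caller decide how to report a failed ingest, for example by printing the document's ingest_error.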

Next, try asking some questions:

dewy_qa query "tell me about RAG"

You should see something like:

Querying main collection for: "tell me about RAG"
Answer:
Based on the given context, RAG refers to the RAG proteins, 
which are involved in DNA binding and V(D)J recombination. 
The RAG1 and RAG2 proteins work together to bind specific 
DNA sequences known as RSS (recombination signal sequences) 
and facilitate the cutting and rearrangement of DNA segments 
during the process of V(D)J recombination...

Conclusion

By following this guide, you've learned how to create a CLI that uses Dewy to manage knowledge and LangChain.js to process questions and generate answers. This tool demonstrates the practical application of combining a structured knowledge base with the analytical power of LLMs, enabling developers to build more intelligent and responsive applications.

Building a RAG chatbot with NextJS, OpenAI & Dewy

Ryan Michael — Mon, 12 Feb 2024 15:40:45 +0000

Creating a Retrieval-Augmented Generation (RAG) application allows you to leverage the capabilities of language models while grounding their responses in specific, reliable information you provide to the model.

This guide will walk you through building a RAG application using NextJS for the web framework, the OpenAI API for the language model, and Dewy as your knowledge base.

By the end of this tutorial, you'll understand how to integrate these technologies to reduce hallucinations in language model responses and ensure the information provided is relevant and accurate.

What is Dewy?

Dewy is a knowledge base designed to simplify RAG applications by managing the extraction of knowledge from your documents and implementing semantic search over the extracted content.

Providing recent, relevant domain knowledge in the form of documents ensures the models have the right information to answer your questions, without needing to hallucinate.

Using large documents (e.g., PDFs) for Retrieval-Augmented Generation (RAG) poses several challenges:

Token limits imposed by language models restrict the amount of text that can be analyzed in a single query, requiring documents to be summarized or "chunked" into smaller pieces.
Automatically extracting information from documents designed for humans requires complex parsing and post-processing.
Documents may change over time, making it difficult to maintain the accuracy and relevancy of the RAG system's outputs.
Finding the portions of a document that are most relevant to a given conversation requires sophisticated semantic search techniques.
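Dewy handles chunking for you, but to make the idea concrete, here is a naive fixed-size chunker with overlap. It is purely illustrative and not Dewy's algorithm: it counts characters rather than tokens and ignores sentence and section boundaries, which a production extractor would respect. The overlap keeps text that straddles a boundary present in both neighboring chunks.

```typescript
// Split text into chunks of at most `size` characters, where each chunk
// repeats the last `overlap` characters of the previous one.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far each new chunk advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`.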

Passing the entire content of all the documents through the LLM for every question isn’t practical. Dewy (and RAG in general) addresses these by doing some work upfront – extracting and indexing the content – so that it can do less at query time.

Dewy addresses these challenges by automating the extraction, indexing, and retrieval of information from your documents.
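The work done upfront pays off at query time: once every chunk has an embedding vector stored in an index, answering a question only requires embedding the question and ranking stored chunks by similarity. The sketch below shows that ranking step with cosine similarity over an in-memory array. The `IndexedChunk` shape and helpers are hypothetical, the embeddings are assumed to come from some embedding model, and real systems typically use a vector index rather than this linear scan.

```typescript
// A chunk of document text with its precomputed embedding (hypothetical shape).
interface IndexedChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topK(index: IndexedChunk[], queryEmbedding: number[], k: number): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosine(chunk.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```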

Getting Started

This guide will walk you through how to create a simple RAG-powered chatbot.
The final code is available as an example if you'd rather skip to the end and start hacking 😉.

Prerequisites

Basic knowledge of JavaScript and React.
A NextJS environment set up on your local machine.
A copy of Dewy running on your local machine (see Dewy's installation instructions if you need help here).
Access to the OpenAI API (an API key).

Step 1: Set Up Your NextJS Project

Initialize a new NextJS project: Create a new NextJS app by running the following command in your terminal:

   npx create-next-app@latest my-rag-app

Navigate into your new project directory:

   cd my-rag-app

Install required packages: Install the client libraries for the OpenAI API and Dewy, plus Vercel's AI SDK (ai), which provides the streaming helpers used below:
```
npm install openai dewy-ts ai
```
Prepare environment variables: Set up environment variables for the OpenAI API key and your Dewy instance. Create a .env.local file in the root of your NextJS project and add the following lines:

   OPENAI_API_KEY=<your_openai_api_key_here>
   DEWY_ENDPOINT=http://localhost:8000
   DEWY_COLLECTION=main

Step 2: Create an API endpoint to add documents

// app/api/documents/route.ts

import { NextResponse } from 'next/server';
import { Dewy } from 'dewy-ts';

export const runtime = 'edge'

// Create a Dewy client
const dewy = new Dewy({
    BASE: process.env.DEWY_ENDPOINT
})

export async function POST(req: Request) {
    // Pull the document's URL out of the request
    const formData = await req.formData();
    const url = formData.get('url');
    if (typeof url !== 'string' || url.length === 0) {
        return NextResponse.json({ error: 'missing url' }, { status: 400 });
    }

    // Ask Dewy to download, extract, and index the document
    const document = await dewy.kb.addDocument({
        collection: process.env.DEWY_COLLECTION,
        url,
    });

    return NextResponse.json({ document_id: document.id })
}

This API handler receives a form containing a document's URL and indexes the document in the knowledge base.
Dewy takes care of downloading the document, extracting information from it, and making that information available as searchable chunks.
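Because the route forwards whatever the client submits to Dewy, you may want to validate the URL first. Here is a minimal sketch of such a check; `parseHttpUrl` is a hypothetical helper that accepts only absolute http(s) URLs and returns null for anything else, so the handler can respond with a 400 before calling addDocument.

```typescript
// Return a normalized absolute http(s) URL, or null if the value isn't one.
// Takes unknown so it can accept FormData.get() results (string, File, or null).
function parseHttpUrl(value: unknown): string | null {
  if (typeof value !== "string" || value.trim() === "") return null;
  let parsed: URL;
  try {
    parsed = new URL(value.trim());
  } catch {
    return null; // not an absolute URL
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:" ? parsed.toString() : null;
}
```

In the route, `const url = parseHttpUrl(formData.get('url'))` would then replace the raw lookup.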

Step 3: Create an API endpoint for chat generation

Create a generation function: This function will take the user's query and the retrieved documents from Dewy, and send a request to the OpenAI API to generate a response. The key is to format the prompt to include relevant information from the retrieved documents.

// app/api/chat/utils.tsx

import OpenAI from 'openai';
import { Dewy } from 'dewy-ts';

// Create Dewy and OpenAI clients
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
})
const dewy = new Dewy({
    BASE: process.env.DEWY_ENDPOINT
})

type ChatMessage = { role: 'system' | 'user' | 'assistant', content: string };

export async function generate(messages: ChatMessage[]) {
    // Use the latest user message as the search query
    const query = messages[messages.length - 1].content;

    // Search Dewy for chunks relevant to the given query.
    const context = await dewy.kb.retrieveChunks({
        collection: process.env.DEWY_COLLECTION,
        query: query,
        n: 10,
    });

    // Build an augmented prompt providing the retrieved chunks as context for the LLM.
    const prompt: ChatMessage[] = [{
        role: 'system',
        content: `You are a helpful assistant.
            You will take into account any CONTEXT BLOCK
            that is provided in a conversation.
            START CONTEXT BLOCK
            ${context.results.map((c: any) => c.chunk.text).join("\n")}
            END OF CONTEXT BLOCK`,
    }]

    // Call the OpenAI chat completion API with the system prompt followed by the
    // conversation so far (keeping only role and content from each message);
    // stream: true lets the route stream tokens back as they arrive
    const res = await openai.chat.completions.create({
        messages: [...prompt, ...messages.map(({ role, content }) => ({ role, content }))],
        model: 'gpt-3.5-turbo',
        temperature: 0.7,
        stream: true,
    })

    return res
}
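Retrieving n: 10 chunks keeps the prompt focused, but long chunks can still push a request past the model's context window. One option is to cap the context before building the system prompt. The sketch below is a hypothetical helper, not part of dewy-ts: it keeps chunks in retrieval order (assumed most relevant first) until a character budget is used up, with characters as a rough proxy for tokens.

```typescript
// Join retrieved chunk texts into one context string, stopping before the
// total length (including "\n" separators) would exceed maxChars.
// Characters are a rough proxy for tokens; a tokenizer would be more precise.
function buildContextBlock(chunks: string[], maxChars: number): string {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = chunk.length + (kept.length > 0 ? 1 : 0); // +1 for the "\n" separator
    if (used + cost > maxChars) break;
    kept.push(chunk);
    used += cost;
  }
  return kept.join("\n");
}
```

In generate, the inline join could then become `buildContextBlock(context.results.map((c: any) => c.chunk.text), 8000)`, where the 8000-character budget is an arbitrary example value.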

Create the route handler: This function handles chat messages by calling the generation function we just created and streaming back the generated response in real-time.

// app/api/chat/route.tsx
import { OpenAIStream, StreamingTextResponse } from 'ai';

import { generate } from "./utils";

export async function POST(req: Request) {
    const json = await req.json()
    const { messages } = json

    // Generate a response to the updated conversation
    const response = await generate(messages);

    // Convert the response into a friendly text-stream
    const stream = OpenAIStream(response);

    // Respond with the stream
    return new StreamingTextResponse(stream);
}

Step 4: Build the Frontend

Basic form for loading documents: This component creates a simple form with a text box and submit button for sending URLs to the document-creation route created earlier.

// app/components/AddFromUrl.tsx
import React, { FormEvent } from 'react';

export default function AddFromUrl({ className }: { className?: string }) {
    async function onSubmit(event: FormEvent<HTMLFormElement>) {
        event.preventDefault()

        const formData = new FormData(event.currentTarget)
        await fetch('/api/documents', {
            method: 'POST',
            body: formData,
        })
    }

    return (
        <form onSubmit={onSubmit} className={className}>
            <input type="text" name="url" placeholder="URL to load..."/>
            <button type="submit">Load</button>
        </form>
    )
}

Create a simple chat UI: Use the useChat hook from the ai package to build a user interface where users can input their queries. This involves creating a form in the app/page.tsx file.

    // app/page.tsx

    'use client';

    import { useChat } from 'ai/react';
    import AddFromUrl from './components/AddFromUrl';

    export default function Chat() {
        const { messages, input, handleInputChange, handleSubmit } = useChat();
        return (
            <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
            {messages.map(m => (
                <div key={m.id} className="whitespace-pre-wrap">
                {m.role === 'user' ? 'User: ' : 'AI: '}
                {m.content}
                </div>
            ))}

            <form onSubmit={handleSubmit}>
                <input
                className="fixed bottom-10 w-full max-w-md p-2 mb-8 border border-gray-300 rounded "
                value={input}
                placeholder="Say something..."
                onChange={handleInputChange}
                />
            </form>
            <AddFromUrl className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"/>
            </div>
        );
    }

Try it out!

Build your application and run it locally:

npm run dev

You should see a simple chat UI in your browser (the Next.js dev server listens on http://localhost:3000 by default).

Managing Documents for RAG Using Dewy's Admin Console

In addition to API endpoints for managing documents programmatically, Dewy provides a GUI admin console.
You can see the admin console in a browser at port 8000 (e.g., http://localhost:8000 if you're running Dewy locally).

Dewy's admin console is designed to streamline the management of documents used for Retrieval-Augmented Generation (RAG) applications.
By offering an intuitive interface and comprehensive features, it helps you fine-tune your knowledge bases, ensuring the AI generates responses that are both accurate and relevant.
Here's how you can use Dewy's admin console to manage your documents effectively:

Adding a Document

Upload or input new documents into your Dewy knowledge base through the admin console.

Once a document is added, you can immediately observe how it influences the AI-generated results.
This is useful for assessing the utility of new information and ensuring it aligns with the desired output quality and relevance.

Exploring Extracted Information

Dewy's console allows you to get into the specifics of how information is extracted from each document.
You can view structured data extracted from the text, making it easier to understand how the document might influence generation.

This exploration aids in fine-tuning the extraction process, ensuring that the most relevant pieces of information are highlighted and utilized in the RAG process.

Sample Queries

The admin console lets you test sample queries against your knowledge base.
This helps when evaluating how well the RAG system retrieves relevant document chunks based on different inputs.

By observing what is returned for each sample query, you can quickly gauge the effectiveness of your current document set and retrieval algorithms, making it easier to identify areas for improvement.

Conclusion and Key Takeaways

By building this RAG application, you've learned how to reduce hallucinations by providing specific, relevant information to your Gen AI application.
This approach grounds the AI's responses in accurate information, and it addresses the challenge of managing large documents by intelligently retrieving only the most relevant information for each query.
