DEV Community: Kaylynn

Debugging Your RAG Application: A LangChain, Python, and OpenAI Tutorial

Kaylynn — Thu, 11 Jan 2024 19:41:24 +0000

Let’s explore a real-world example of debugging a RAG-type application. I recently undertook this process while updating our company knowledge base – a resource for potential clients and employees to learn about us.

Tech Stack:

I work with Python and the LangChain framework, specifically using LangChain Expression Language (LCEL) to build chains. You can find the LangChain LCEL documentation here.

This approach serves as a good alternative to LangChain’s debugging tool, LangSmith.

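# Import paths assume a LangChain release that ships the langchain_core package.
from operator import itemgetter

from langchain.memory import ConversationBufferMemory
from langchain_core.messages import get_buffer_string
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# store, llm, vector_store, the prompts, combine_documents, and
# DEFAULT_DOCUMENT_PROMPT are defined elsewhere in the app (not shown).

# Load memory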
def get_session_history(session_id: str) -> ConversationBufferMemory:
    if session_id not in store:
        store[session_id] = ConversationBufferMemory(
            return_messages=True, output_key="answer", input_key="question"
        )
    return store[session_id]

def _get_loaded_memory(x):
    return get_session_history(x["session_id"]).load_memory_variables({"question": x["question"]})

def load_memory_chain():
    return RunnablePassthrough.assign(
        chat_history=RunnableLambda(_get_loaded_memory) | itemgetter("history"),
    )

# Create Question
def create_question_chain():
    return {
        "standalone_question": {
                                   "question": itemgetter("question"),
                                   "chat_history": lambda x: get_buffer_string(x["chat_history"]),
                               }
                               | CONDENSE_QUESTION_PROMPT
                               | llm
                               | StrOutputParser(),
        "role": itemgetter("role"),
    }

# Retrieve Documents
def retrieve_documents_chain(vector_store):
    retriever = vector_store.as_retriever()
    return {
        "role": itemgetter("role"),
        "docs": itemgetter("standalone_question") | retriever,
        "question": lambda x: x["standalone_question"],
    }

# Answer
def create_answer_chain():
    final_inputs = {
        "role": itemgetter("role"),
        "context": lambda x: combine_documents(x["docs"], DEFAULT_DOCUMENT_PROMPT),
        "question": itemgetter("question"),
    }
    return {
        "answer": final_inputs | ANSWER_PROMPT | llm,
        "docs": itemgetter("docs"),
    }

# Final Chain looks like this
chain = load_memory_chain() | create_question_chain() | retrieve_documents_chain(vector_store) | create_answer_chain()

While debugging, I prefer using a cheaper model like gpt-3.5-turbo for its cost-effectiveness. The less advanced models are more than adequate for basic testing. For final testing and deployment to production, you might consider upgrading to gpt-4-turbo or a similar advanced model.

I also favor Jupyter notebooks for much of my debugging. This way, I can include the notebook in a .gitignore file, reducing cleanup from debugging shenanigans in my main code. I can also run very specific pieces of my code without plumbing overhead.

Initial Observations

I noticed that basic queries received correct answers, but any follow-up question would lack the appropriate context, indicating that conversational memory was no longer functioning effectively.

Here's what I observed:

Question: What are the Focused Labs core values?
> AI: The core values of Focused Labs are Love Your Craft, Listen First, and Learn Why ✅
> Sources: ...

Question: Tell me more about the first one.
> AI: Based on the given context, the first one is about the importance of the "Red" step in Test Driven Development (TDD). ❌
> Sources: ...

However, I expected a response more along the lines of "Love Your Craft is when you are passionate about what you do."

For more context, this issue with conversational memory arose while I was implementing a new feature: allowing end users to customize responses based on their role. So, for example, a developer could receive a highly technical answer while a marketing manager would see more high-level details.

Debugging Steps

1. Ensure Role Feature Integrity

To avoid impacting the newly implemented role feature, I made it overly obvious and active in every response during this debugging session by temporarily updating my system prompt.

SYSTEM_PROMPT = """Answer the question from the perspective of a {role}."""

DEBUGGING_SYSTEM_PROMPT = """Answer the question in a {role} accent."""

Here's how the AI responded, clearly adhering to my updated prompt:

Question: What are the Focused Labs core values?
Role: pirate
> AI: Arr, the core values of Focused Labs be Love Your Craft, Listen First, and Learn Why, matey! ✅
> Sources: ...

Question: Tell me more about the first one.
> AI: Arr, the first one be talkin' about the importance of reachin' the "Red" stage in Test Driven Development... ✅
> Sources: ...

2. Creating a Visual Representation

I created a diagram of the app to visualize the process flow.

I began at the end of my flow and worked backward to identify issues. I first checked whether my LLM was answering questions based on the provided context. Upon inspecting the sources, I realized that the given context was a blog on TDD.

> Sources: [{'URL': 'https://focusedlabs.io/blog/tdd-first-step-think'}, ...]

Thus, I ruled out the answer component as the source of the bug.
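To make this kind of source check quick to repeat, a small helper can list where the retrieved context came from. This is a minimal sketch: the Doc class is a hypothetical stand-in for a retrieved document, and it assumes each document carries a metadata dict with a 'URL' key, as the Sources output above suggests.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only metadata matters here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def source_urls(docs):
    """Return the unique source URLs of the retrieved documents, in retrieval order."""
    seen = []
    for doc in docs:
        url = doc.metadata.get("URL")
        if url and url not in seen:
            seen.append(url)
    return seen


docs = [
    Doc("Red, green, refactor...", {"URL": "https://focusedlabs.io/blog/tdd-first-step-think"}),
    Doc("More on TDD...", {"URL": "https://focusedlabs.io/blog/tdd-first-step-think"}),
    Doc("No source recorded"),
]
print(source_urls(docs))
```

If the URLs point at the wrong topic, as they did here, the answer step is only doing what it was told and the problem lies upstream.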

3. Tracing the Bug's Origin

Next, I examined the logic for retrieving documents. I added a 'standalone question' key to every input and output chain to log runtime values, which revealed that questions were being incorrectly rephrased.

💡Adding these keys to the chains allows us to log the values seen by the components at runtime. Using breakpoints will only show the code when it’s instantiated and not populated with real-time values.

# Code snippet with added keys
def retrieve_documents_chain(vector_store):
    retriever = vector_store.as_retriever()
    return {
        # ... existing keys unchanged ...
        "standalone_question": itemgetter("standalone_question"),  # Added
    }

def create_answer_chain():
    final_inputs = {
        # ... existing keys unchanged ...
        "standalone_question": itemgetter("standalone_question"),  # Added
    }
    return {
        # ... existing keys unchanged ...
        "standalone_question": itemgetter("standalone_question"),  # Added
    }
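Another way to see runtime values is a pass-through step that logs what it receives and hands it on unchanged. The sketch below uses only the standard library; log_step and the logger name are hypothetical, and the assumption is that in an LCEL chain the returned function would be wrapped in RunnableLambda and piped between two steps.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rag_debug")  # hypothetical logger name


def log_step(label, keys=None):
    """Build a pass-through function that logs selected keys and returns its input unchanged."""
    def _tap(values):
        shown = values if keys is None else {k: values.get(k) for k in keys}
        logger.info("%s: %r", label, shown)
        return values
    return _tap


# Called directly on a sample payload here; in a chain it could sit
# between two steps as RunnableLambda(log_step(...)).
payload = {"standalone_question": "What can you tell me about the first one?", "chat_history": []}
tap = log_step("after create_question", keys=["standalone_question", "chat_history"])
result = tap(payload)
```

Because the function returns its input untouched, it can be added or removed without changing what the downstream steps see.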

I expected the standalone_question to be more specific, like “What can you tell me about the core value of Love your Craft?”

Question: What are the Focused Labs core values?
> standalone_question: What are the core values of Focused Labs? ✅

Question: Tell me more about the first one.
> standalone_question: What can you tell me about the first one? ❌

4. Identifying the Exact Source

I focused on the chat_history variable, suspecting an issue with how the chat history was being recognized.

def retrieve_documents_chain(vector_store):
    retriever = vector_store.as_retriever()
    return {
        # ... existing keys unchanged ...
        "standalone_question": itemgetter("standalone_question"),  # Added
        "chat_history": itemgetter("chat_history"),  # Added
    }

def create_answer_chain():
    final_inputs = {
        # ... existing keys unchanged ...
        "standalone_question": itemgetter("standalone_question"),  # Added
        "chat_history": itemgetter("chat_history"),  # Added
    }
    return {
        # ... existing keys unchanged ...
        "standalone_question": itemgetter("standalone_question"),  # Added
        "chat_history": itemgetter("chat_history"),  # Added
    }

Question: What are the Focused Labs core values?

Question: Tell me more about the first one.
> chat_history: [] ❌

🔔 Found the issue! Since the chat_history was blank, it wasn’t being loaded as I had assumed.

5. Implementing the Solution

I resolved the issue by checking my conversation memory store. Because the store is a dict, lookups are sensitive to the type of the key. I saved the messages under a str version of session_id, but I invoked the chain with an Optional[UUID] version, so the lookup never matched the saved key. The conversation memory store itself was set up correctly; I needed to update how I invoked my chain.

result = chain.invoke({"question": question, "session_id": session_id, "role": role})

Therefore, I updated the session_id type to str.

result = chain.invoke({"question": question, "session_id": str(session_id), "role": role})
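The mismatch is easy to reproduce with a plain dict, outside of LangChain entirely. This is a minimal sketch with a made-up store; it only illustrates that a UUID object and its string form are different dictionary keys.

```python
from uuid import uuid4

store = {}
session_id = uuid4()  # what the caller holds: a UUID object

# Messages were saved under the string form of the ID...
store[str(session_id)] = ["What are the Focused Labs core values?"]

# ...but the lookup used the UUID itself, which is a different dict key.
print(session_id in store)       # False
print(str(session_id) in store)  # True
```

With a lookup function that creates a fresh entry whenever the key is missing, this fails silently: the chain gets a new, empty memory instead of an error.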

6. Confirming the Fix

I confirmed that the conversation memory now functioned correctly.

Question: What are the Focused Labs core values?

Question: Tell me more about the first one.
> chat_history: ['What are the Focused Labs core values?'] ✅
> standalone_question: Can you provide more information about the first core value: Love Your Craft? ✅
> AI: This value means that we are passionate about being the best at what we do, paying attention to every detail... ✅
> Sources: [{'URL': 'https://www.notion.so/Who-are-we-c42efb179fa64f6bb7866deb363fb7ef'}, ...] ✅

7. Final Cleanup and Future-Proofing

I reverted the temporary pirate accent debug prompt, which I had used to make the role feature easy to spot.

I decided to maintain detailed logging within the system for future debugging efforts.

Key Takeaways

Debugging AI Systems: A mix of traditional and AI-specific debugging techniques is essential.
Opting for Cost-Effective Models: Use more affordable models to reduce costs during repeated queries.
Importance of Transparency: Clear visibility into each step and component of your RAG accelerates debugging.
Type Consistency: Paying attention to small details, like variable types, can significantly impact functionality.
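One way to guard against this class of bug is to normalize the ID once, at the boundary, before it reaches the chain. The helper below is a hypothetical sketch, not part of the original app; generating a new ID when none is supplied is an assumption made for illustration.

```python
from typing import Optional, Union
from uuid import UUID, uuid4


def normalize_session_id(session_id: Optional[Union[UUID, str]]) -> str:
    """Return the session ID as a canonical string, creating one if none was given."""
    if session_id is None:
        return str(uuid4())
    # Round-tripping through UUID normalizes case and rejects malformed IDs.
    return str(UUID(str(session_id)))


example = uuid4()
print(normalize_session_id(example) == normalize_session_id(str(example).upper()))  # True
```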

Thanks for reading!

Stay tuned for more insights into the world of software engineering and AI. Have questions or insights? Feel free to share them in the comments below!

Open AI Assistants: Limited, but Incredible

Kaylynn — Fri, 22 Dec 2023 17:58:47 +0000

In the ever-evolving realm of AI development, OpenAI has ventured into the domain of Retrieval Augmented Generation (RAG) with its latest offering, Assistants. Assistants with Retrieval represents OpenAI's attempt to harness the power of RAG — a technique widely embraced and refined within open-source libraries to augment an AI’s knowledge with custom information. Think ChatGPT, but with the ability to ask it about custom information. While it doesn't yet lead the pack in creating AI-driven knowledge bases, its inception marks a significant step towards more sophisticated and nuanced AI applications. As developers, our dive into this beta tool is not just about evaluating its current capabilities, but also understanding its place in the broader context of RAG's evolution and its potential to reshape our approaches to LLMs.

Incredible Accuracy for Incredible Ease of Use

OpenAI's RAG tool is an awesome prospect for developers eager to explore the nuances of retrieval-augmented models. It serves as a canvas for experimentation, offering a glimpse into the future of sophisticated AI applications. Here's why it's a valuable experimental platform:

Ease of Use: Its user-friendly nature invites developers to explore and experiment with minimal setup. This allows more people to try out more use cases to learn where LLM retrieval could improve their workflows.
Respectable Accuracy: We swapped out the gpt-3.5-turbo model in our custom chatbot for an OpenAI Assistant and saw a slightly lower, but similar, level of accuracy. This is so cool! Being able to spin up a custom chatbot in a couple of hours with a ~75% accuracy rate is incredible! For more information about our custom chatbot, visit our website: focusedlabs.io/ai.

Understanding the Limitations: A Beta Analysis

In its beta stage, this tool has its limitations. The current landscape of RAG in open-source libraries such as LangChain sets a high benchmark, and OpenAI's iteration is an ambitious stride into this territory. However, with ambition come the teething problems of any beta technology. Here's what developers should be aware of:

Source Citation Feature: A vital feature for gaining user trust, source citation, is not yet functional, indicating future improvements but also a present gap.
Document Limitations: An assistant supports only 20 documents, each up to 512 MB, which is not scalable for most enterprise datasets. For smaller data sets, while the file size itself is not limiting, combining multiple smaller files into 1 larger file is an anti-pattern resulting in loss of structure and context.
Lack of Customization: While the simple interface facilitates quick setup times, high abstraction levels mean less control for developers to adjust and optimize the tool for specific use cases.
Polling Over Streaming: Without streaming capabilities, developers are left with polling methods, which hamper real-time efficiency.
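For the polling point above, the loop looks roughly like the sketch below. It is deliberately generic: fetch_status is a hypothetical callable that would wrap whatever API call retrieves the run and returns its status string, and the terminal status names are assumptions to check against the current API reference.

```python
import time


def poll_until_done(fetch_status, interval_s=1.0, timeout_s=60.0,
                    terminal=("completed", "failed", "cancelled", "expired"),
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_s}s")
        sleep(interval_s)


# Simulated run that finishes on the third check; sleeping is stubbed out
# so the demo returns immediately.
statuses = iter(["queued", "in_progress", "completed"])
final_status = poll_until_done(lambda: next(statuses), sleep=lambda _: None)
print(final_status)  # completed
```

Every check costs a round trip and the user sees nothing until the run finishes, which is why polling feels slower than streaming even when total generation time is the same.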

Tips for Maximizing Your Experience

Navigating a beta tool requires a mix of patience and strategy. To make the most of your journey with OpenAI's RAG tool, keep these tips in mind:

Data Management: Make the most of your limited document space by converting data to plain text and concatenating it wherever possible. We recommend using *.txt formats for ease and compression.
File Type Performance: Experiment with different file types to discover which yields the best results for your specific case. For us, *.txt returned more accurate results than PDFs.
Use with Other Libraries: LangChain supports OpenAI Assistants. While Assistants may not be the leader on their own, they are still powerful when combined with other techniques. I also recommend using LlamaHub’s loaders to help integrate with various data sources.
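As a sketch of the data management tip, the snippet below concatenates several text files into one while labelling each section with its file name, so some of the structure survives the merge. The "=== Source: ... ===" header is a made-up convention, and the file names and contents are placeholders.

```python
import tempfile
from pathlib import Path


def combine_text_files(paths, out_path):
    """Concatenate text files into one, labelling each section with its file name."""
    sections = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8").strip()
        sections.append(f"=== Source: {Path(path).name} ===\n{text}\n")
    Path(out_path).write_text("\n".join(sections), encoding="utf-8")
    return Path(out_path)


# Demo with two throwaway files (names and contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    values = Path(tmp, "values.txt")
    values.write_text("Love Your Craft, Listen First, Learn Why\n", encoding="utf-8")
    tdd = Path(tmp, "tdd.txt")
    tdd.write_text("Start with a failing test.\n", encoding="utf-8")
    combined = combine_text_files([values, tdd], Path(tmp, "combined.txt")).read_text(encoding="utf-8")

print(combined)
```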

Conclusion

OpenAI's entry into the RAG space is a testament to the ongoing evolution of AI and machine learning technologies. While this particular tool may not be ready for production, it offers a valuable learning curve for developers keen on the future of retrieval-augmented models. By engaging with it critically and creatively, we can contribute to its growth and simultaneously expand our own understanding of where RAG can take us in the realm of AI development.

Enhancing AI Apps with Streaming: Practical Tips for Smoother AI Generation

Kaylynn — Thu, 14 Dec 2023 20:01:13 +0000

Introduction

Interested in building an AI-powered app using generative models? You're on the right track! The realm of generative AI is brimming with untapped potential. A popular approach is to prompt the AI to generate a list of ideas based on provided context, allowing users to select their preferred option and then refine and expand on the content.

An example of a generative AI app is one my team and I created for copywriters to seamlessly integrate storytelling into marketing emails. This user-friendly wizard elevates email creation, combining the power of generative AI with the user’s own expertise and creativity for impactful results. Our development journey led to the integration of streaming technology, significantly reducing AI response times.

In this article, I'll share 2 essential tips to enhance user experience in AI-driven apps through effective use of streaming. Let's dive in!

Tech Stack

Backend: Node.js Typescript with Express
Frontend: React Typescript
AI Integration: OpenAI’s Node SDK and GPT-4

Let’s start with the basics.

First, send requests to OpenAI leveraging their Node.js SDK. Due to our prompt, the AI response is a list with bullet numbers like “1.”. We use the bullet number format to parse the separate options.

import OpenAI from "openai";

require("dotenv").config();
const openai = new OpenAI({
    apiKey: process.env.OPEN_AI_API_KEY,
});
const chatModel = "gpt-4";

export const createIdeas = async ({ occasion }: {
    occasion: string;
}) => {
    const completion = await openai.chat.completions.create({
        messages: [
            {
                role: "user",
                content: `I want to host an event for the following occasion: ${occasion}. Write me a list of 4 separate ideas for this event`,
            },
        ],
        model: chatModel,
    });
    const content = completion.choices[0].message.content;
    return { ideas: parseIdeas(content || "") };
};

// Use the bullet number format (ex: "1.") to split the ideas into individual elements in an array
const parseIdeas = (text: string): string[] => {
    const message = text.split(/[0-9]+\./gm);

    // Drop the first element: it holds whatever text came before the "1." bullet
    return message.slice(1);
};

Then, let’s return the parsed ideas to the frontend. This example method leverages Express.

app.post("/generate-ideas", async (req: Request, res: Response) => {
  const { occasion } = req.body;
  const generatedIdeas = await createIdeas({ occasion });
  res.send(generatedIdeas);
});

This sends a nicely formatted JSON response to the frontend that is very easy to pass into the appropriate UI components.

{
    "ideas": [
        " Example first idea \n\n",
        ...
    ]
}

The Challenge: Latency

The hiccup? A waiting time of up to 30 seconds before users can view the AI’s suggestions. Watching a loading icon spin for half a minute is not a good user experience.

Tip #1: Leverage Streaming

Enter OpenAI’s “streaming” feature - a savior for reducing latency. By setting the stream input parameter of OpenAI’s Node SDK to true, we display words to the user as they become available. This doesn’t expedite the complete generation process, but it cuts down the wait time for the first word. Think of it as the “typewriter” effect seen in ChatGPT.

To peek under the hood, the streaming feature uses an HTML5 capability called Server-Sent Events (SSE). SSE allows servers to push real-time data to web clients over a single HTTP connection. Unlike WebSockets, which are bidirectional, SSE is unidirectional, making it perfect for sending data from the server to the client in scenarios where the client doesn't need to send data back.
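To make the wire format concrete: an SSE body is plain text in which each event carries one or more "data:" lines and events are separated by a blank line. The Python sketch below parses such a body; it is a simplification that assumes a complete body with "\n" line endings, and the sample payloads only imitate the shape of a streamed chat chunk.

```python
import json


def parse_sse(raw: str):
    """Yield the decoded JSON payload of each `data:` event in an SSE body."""
    for event in raw.split("\n\n"):
        data_lines = [
            line[len("data:"):].removeprefix(" ")
            for line in event.split("\n")
            if line.startswith("data:")
        ]
        if not data_lines:
            continue
        payload = "\n".join(data_lines)
        if payload == "[DONE]":  # end-of-stream sentinel sent by OpenAI's streaming API
            return
        yield json.loads(payload)


raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    'data: [DONE]\n\n'
)
text = "".join(event["choices"][0]["delta"]["content"] for event in parse_sse(raw))
print(text)  # Hello
```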

So, we refactor the request to OpenAI to include the input parameter stream and we return a Stream wrapped in an API Promise to our controller method.

export const createIdeas = async ({ occasion }: {
    occasion: string;
}) => {
  return openai.chat.completions.create({
    model: chatModel,
    stream: true,
    messages: [
      {
        role: "user",
        content: `I want to host an event for the following occasion: ${occasion}. Write me a list of 4 separate ideas for this event`,
      },
    ],
  });
};

In our controller method, we clean up the data a little bit, and then send the data over the HTTP connection to the frontend as it is received from our request to OpenAI.

app.post("/generate-ideas", async (req, res) => {
  const { occasion } = req.body;

  for await (const rawChunk of await createIdeas({ occasion })) {
    // Sometimes a chunk arrives as a raw string with an extra "data:" label.
    // Remove the label and parse the JSON before reading the content.
    const chunk =
      typeof rawChunk === "string"
        ? JSON.parse((rawChunk as string).replace("data: ", ""))
        : rawChunk;

    // If we don't receive any more content, end the connection.
    const text = chunk.choices[0]?.delta.content;
    if (text == undefined) {
      res.end();
      return;
    }

    res.write(text);
  }

  // The stream finished without an empty chunk; close the connection anyway.
  res.end();
});

After understanding the mechanics, our next step was to update the frontend client that calls the Express endpoint above so that it can consume the streamed response.

export const requestIdeas = async (occasion: string) => {
  const res = await fetch(`${process.env.REACT_APP_API_BASE_URL}/generate-ideas`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({occasion})
  });
  return res.body?.pipeThrough(new TextDecoderStream()).getReader();
};

The Impact

The results are night and day. From a staggering 30-second wait, users now see the initial AI-generated content within half a second.

Tip #2: UI Components of Streaming

When users are given multiple options to choose from, the UI should split those different options into different components. For example, each option should be a radio button or a different div. But streaming text in real-time throws a wrench in the works. How can we parse each AI-generated suggestion into a distinct UI component?

The Solution: Add parsing based on a unique character.

Add a unique Bullet Identifier: Update your prompt to the AI to use an unusual character as a bullet point that is unlikely to appear in the rest of your text. We used the “¶” symbol and updated our prompt to include the following: Start each bullet point with a new line and the ¶ character. Do not include the normal bullet point character, the '-' character, or list numbers.
Splitting the Stream: We segmented each byte array from the SSE endpoint into distinct words. This separation was pivotal, given that the content of a single SSE byte array was unpredictable. Sometimes it included a single word; other times it included full phrases that contained the “¶” character, like “subject matter. ¶ Engage”.
Append each word: Once each word is prepared, we append the value to the appropriate UI component. Tracking the “¶” occurrences helps us assign words to the correct component. For instance, a single “¶” means the content belongs to the first-option component. We repeat this process in the loop until the SSE endpoint closes.

```tsx
export const parseResponseValue = (index: number, value: string, setters: Function[]) => {
  // Separate into individual words only (no phrases)
  const splitValue = value.split(/(?! {2})/g);

  for (let word of splitValue) {
    if (word.includes('¶')) {
      index++;
    } else {
      // Append the word to the idea that is currently being streamed
      setters[index]((previous: string) => previous + word);
    }
  }
  // Return the index to the calling function for use for the next byte array received from the endpoint.
  return index;
};
```

```tsx
const [firstIdea, setFirstIdea] = useState("");
const [secondIdea, setSecondIdea] = useState("");
const [thirdIdea, setThirdIdea] = useState("");
const [fourthIdea, setFourthIdea] = useState("");

const getIdeas = async (newStatus: boolean) => {
  const responseReader = await requestIdeas(occasion);

  // The index will be incremented each time the unique bullet identifier is seen. This includes the first bullet, so offset by 1.
  let index = -1;
  while (true) {
    const { value, done } = await responseReader.read();
    if (done) {
      break;
    }

    index = parseResponseValue(index, value, [
      setFirstIdea,
      setSecondIdea,
      setThirdIdea,
      setFourthIdea,
    ]);
  }
};
```
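The routing logic is easy to test outside the browser. Below is a Python sketch of the same idea, not a port of the component code: it walks the stream character by character, and as an added assumption it ignores any text that arrives before the first bullet or after the last expected option. The sample chunks are made up.

```python
BULLET = "¶"


def route_chunks(chunks, option_count=4):
    """Assign streamed text to options, advancing to the next option at each bullet marker."""
    options = [""] * option_count
    index = -1  # no bullet seen yet; the first bullet moves this to 0
    for chunk in chunks:
        for char in chunk:
            if char == BULLET:
                index += 1
            elif 0 <= index < option_count:
                options[index] += char
    return [text.strip() for text in options]


# Chunk boundaries fall mid-phrase on purpose, as they do in a real stream.
chunks = ["\n¶ Garden", " party\n¶ Trivia ", "night\n", "¶ Potluck\n¶ Movie marathon"]
print(route_chunks(chunks))
# ['Garden party', 'Trivia night', 'Potluck', 'Movie marathon']
```

Carrying the index across chunks is what lets an option that is split over several chunks end up in a single component.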

Though string parsing occasionally fails due to edge cases from the AI, it facilitates an overall better user experience by stylizing real-time text streaming. On encountering an AI anomaly, equip users with the ability to retry AI generation. Generally, this fixes any parsing issue encountered the first time.

Importantly, by avoiding multiple smaller requests, we economized on tokens sent to GPT-4. This not only curtailed costs but also enriched the result quality.

Conclusion

Harnessing the power of generative AI in applications is undeniably transformative, but it doesn't come without its challenges. As we've explored, latency can be a significant hurdle, potentially hampering user experience. However, with innovative solutions like real-time streaming and strategic UI component parsing, we can overcome these challenges, making our applications not only more responsive but also user-friendly.

The Blueprint for Trustworthy AI: Constructing Accurate Chatbots with Sophisticated Data Pipelines

Kaylynn — Thu, 09 Nov 2023 15:49:41 +0000

AI Custom Chatbot with an Advanced Data Pipeline

To get started leveling up your custom AI chatbot, check out our GitHub repository here.

This tutorial shows you how to level up your data pipeline when building a custom AI chatbot.

We recommend setting up a basic custom chatbot first. This tutorial will build on top of the quick start you can reference here.

Building AI chatbot apps is easy. Customizing is hard.

We are using our preferred tech stack.

We are leveraging Retrieval Augmented Generation (RAG).

The goal of this tutorial is to help level up your customized chatbot beyond the basics.

Increase accuracy
Grow dataset by integrating more disparate data sources
Earn user trust

Technical Objectives

Increase accuracy by implementing multiple data cleansing techniques
Integrate Notion
Earn user trust by providing the link to the original data source

Here’s a peek at the components of the AI custom chatbot.

For more resources: check out our other blog posts on AI.

From Basic to Custom: AI Chatbot Building 101

Kaylynn — Thu, 02 Nov 2023 15:47:19 +0000

AI Custom Chatbot Quickstart

To get started leveling up your custom AI chatbot, check out our GitHub repository here.

Domain-specific AI chatbots are powerful applications that can be used in many use cases:

Virtual Assistants
Knowledge retrieval
Text synthesis
Text formatting
Sentiment analysis

Building AI chatbot apps is easy. Customizing is hard.

This ready-to-go sample project shows you how to build a custom chatbot using our preferred LLM tech stack:

We are leveraging Retrieval Augmented Generation (RAG).

The goal of this tutorial is to demonstrate the basics.

Demonstrate how to add custom data to an LLM model (we're using OpenAI's gpt-3.5-turbo)
Demonstrate a conversational memory LLM chatbot
Demonstrate using agents and tools with the Langchain Framework

Technical Objectives

Ingest data into a vector database
Query the vector database
Query an agent that decides whether to query the vector database
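To illustrate the first two objectives, here is a toy in-memory version of ingesting and querying. It is a sketch only: the word-count "embedding" stands in for a real embedding model, TinyVectorStore is a hypothetical class rather than the project's vector database, and the texts and file names are made up.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (a real app would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    def __init__(self):
        self.rows = []  # each row is (vector, text, source)

    def ingest(self, text, source):
        self.rows.append((embed(text), text, source))

    def query(self, question, k=1):
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, source) for _, text, source in ranked[:k]]


store = TinyVectorStore()
store.ingest("our core values are love your craft listen first and learn why", "values.txt")
store.ingest("test driven development starts with a failing red test", "tdd.txt")
top = store.query("what are the core values", k=1)
print(top[0][1])  # values.txt
```

Keeping the source next to each stored text is what later makes it possible to show users where an answer came from.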
