<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tiago Souto</title>
    <description>The latest articles on DEV Community by Tiago Souto (@tiagocsouto).</description>
    <link>https://dev.to/tiagocsouto</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F328371%2Fb54661bb-fc68-4222-af47-9ecd84562236.jpg</url>
      <title>DEV Community: Tiago Souto</title>
      <link>https://dev.to/tiagocsouto</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tiagocsouto"/>
    <language>en</language>
    <item>
      <title>AI Series Part V: Creating a RAG chatbot with LangChain (NextJS+Python)</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Fri, 12 Apr 2024 13:40:38 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/ai-series-part-v-creating-a-rag-chatbot-with-langchain-nextjspython-4b3c</link>
      <guid>https://dev.to/tiagocsouto/ai-series-part-v-creating-a-rag-chatbot-with-langchain-nextjspython-4b3c</guid>
      <description>&lt;p&gt;Previously, we created our first chatbot integrated with OpenAI and our first RAG chat using LangChain and NextJS.&lt;/p&gt;

&lt;p&gt;As mentioned in a previous post, LangChain was originally built in Python, and a JavaScript version was created later. Python is the most commonly used programming language for AI and ML work thanks to its versatility, frameworks, scalability, and learning curve. So it’s no surprise that not only does LangChain support Python better, but there are also more AI features and resources available in Python than in JavaScript nowadays.&lt;/p&gt;

&lt;p&gt;I have almost zero expertise with Python, but I recognize the need to learn at least the basics so I can use it when needed, or find solutions that aren’t yet available in JS and try to replicate their logic.&lt;/p&gt;

&lt;p&gt;The same may apply to you. Maybe you work with a team whose backend expertise is in Python, or you have that expertise yourself and want to take advantage of the ML support it provides, or maybe you just want to learn something new like me. So, how can we integrate Python with NextJS? I’ll show you in this post by creating a RAG chat app using Python, LangChain, and NextJS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Starting the Server&lt;/strong&gt;&lt;br&gt;
First things first: as always, keep the base chat app that we created in Part III of this AI series at hand. We’ll reuse a lot of that code, so I won’t repeat the basic setup. Feel free to also get it from this GitHub repo: &lt;a href="https://github.com/soutot/ai-series/tree/main/nextjs-chat-base"&gt;https://github.com/soutot/ai-series/tree/main/nextjs-chat-base&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this example, we’ll use Flask: for someone like me with no previous Python experience, it’s the easiest way to create a web server.&lt;/p&gt;

&lt;p&gt;Assuming you already have the base chat app running, let’s start by creating a directory in the root of the project called “flask”. Inside it, create a file app.py, and then let’s create our first Flask endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask, jsonify
from flask_cors import CORS
from dotenv import load_dotenv

app = Flask(__name__)
CORS(app)
load_dotenv()

@app.route('/', methods=['GET'])
def index():
    return jsonify({"message": 'Flask server is running'})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cool, but in order to run that, we need to install a few packages and dependencies. So let’s get to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation&lt;/strong&gt;&lt;br&gt;
If you don’t have Python 3 installed, download and install it following the official website: &lt;a href="https://www.python.org/downloads/"&gt;https://www.python.org/downloads/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then let’s install and initialize Flask. For this example, I’m going to use the macOS shell commands. If you’re on a Windows machine, please check the equivalent commands on the official website: &lt;a href="https://flask.palletsprojects.com/en/3.0.x/installation/"&gt;https://flask.palletsprojects.com/en/3.0.x/installation/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inside the flask directory, run&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python3 -m venv .venv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will create a virtual environment where packages will be installed. Next, we need to activate it so that Python and pip use this environment rather than the global one.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;. .venv/bin/activate&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And now we can install Flask:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install Flask flask_cors&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;All set, now it’s time to run the app:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;flask run --host=0.0.0.0 --debug -p 5328&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If everything’s correct, you should now be able to access &lt;a href="http://localhost:5328"&gt;http://localhost:5328&lt;/a&gt; and see the "Flask server is running" message.&lt;/p&gt;

&lt;p&gt;As our initial setup is ready, we can now start working on the RAG app. First, let’s install LangChain dependencies:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install langchain langchain-community langchain-core langchain-openai langchainhub python-dotenv gpt4all chromadb&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;After installing the dependencies, you may want to create a requirements.txt file, which is similar to Node’s package.json: it records the project’s dependencies, and you can install them all from it when starting a fresh app. You can create it by running&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip freeze &amp;gt; requirements.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;With all the dependencies installed, let's move back to the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python RAG Implementation&lt;/strong&gt;&lt;br&gt;
Now we'll start the RAG implementation: creating endpoints, uploading and embedding files, storing embeddings in a vector store, retrieving documents, making LLM requests, and returning a response. So let's get started.&lt;/p&gt;

&lt;p&gt;We’ll first create a class to handle streaming responses&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import queue
q = queue.Queue()
stop_item = "###finish###"

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import sys
class StreamingStdOutCallbackHandlerYield(StreamingStdOutCallbackHandler):
    def on_llm_start(
        self, serialized, prompts, **kwargs
    ) -&amp;gt; None:
        """Run when LLM starts running."""
        with q.mutex:
            q.queue.clear()
    def on_llm_new_token(self, token, **kwargs) -&amp;gt; None:
        """Run on new LLM token. Only available when streaming is enabled."""
        sys.stdout.write(token)
        sys.stdout.flush()
        q.put(token)
    def on_llm_end(self, response, **kwargs) -&amp;gt; None:
        """Run when LLM ends running."""
        q.put(stop_item)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then initialize our POST endpoint and import all dependencies we’re going to use on it&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import request, Response, stream_with_context
from langchain_community.chat_models import ChatOpenAI
import os
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import ConversationalRetrievalChain
@app.route('/', methods=['POST'])
def send_prompt():
    data = request.get_json()
    prompt = data['prompt']
Then, we’ll create our LLM object that’ll be used to send the chat requests

llm = ChatOpenAI(
      temperature=0,
      openai_api_key=os.getenv('OPENAI_API_KEY'),
      model='gpt-3.5-turbo',
      streaming=True,
      callbacks=[StreamingStdOutCallbackHandlerYield()]
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, we can initialize the vectorstore instance, where our embedded document will be stored&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB_PATH = "vectorstores/db/"
vectorstore = Chroma(persist_directory=DB_PATH, embedding_function=GPT4AllEmbeddings())
Now, we create the chain that’ll be responsible for putting all-together and sending the requests

chain = ConversationalRetrievalChain.from_llm(
        llm,
        retriever=vectorstore.as_retriever(),
        condense_question_llm=llm,
        return_source_documents=True,
        chain_type="stuff",
        get_chat_history=lambda h : h
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll create the system prompt to provide instructions to the LLM and append it to our chain&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROMPT_TEMPLATE = """You are a good assistant that answers questions. Your knowledge is strictly limited to the following piece of context. Use it to answer the question at the end.
    If the answer can't be found in the context, just say you don't know. *DO NOT* try to make up an answer.
    If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
    Give a response in the same language as the question.

    Context: {context}
    """
system_prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
chain.combine_docs_chain.llm_chain.prompt.messages[0] = system_prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With all that set, we can finally call the chain and return the streaming response to the frontend&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ai_response = chain({"question": prompt, 'chat_history': ''})

return Response(stream_with_context(ai_response['answer']))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
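&lt;p&gt;Note that this returns the already-complete answer rather than a true token stream. If you want to stream the tokens that StreamingStdOutCallbackHandlerYield pushes into the queue, one option (my own sketch, not part of the original code) is to run the chain in a background thread and drain the queue with a generator until the stop marker arrives:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import queue
import threading

# Same queue and sentinel used by the callback handler, redefined here so the sketch is self-contained
q = queue.Queue()
stop_item = "###finish###"

def stream_tokens(run_chain):
    """Run the chain call in a background thread and yield queued tokens until the sentinel."""
    threading.Thread(target=run_chain, daemon=True).start()
    while True:
        token = q.get()  # blocks until the callback handler puts a token
        if token == stop_item:
            break
        yield token

# Stand-in for the chain call, which would feed the queue through the callbacks:
def fake_chain():
    for token in ["Hello", " ", "world"]:
        q.put(token)
    q.put(stop_item)

print("".join(stream_tokens(fake_chain)))  # prints "Hello world"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the real endpoint, you could then return the generator via &lt;code&gt;Response(stream_with_context(...))&lt;/code&gt; instead of the final answer string.&lt;/p&gt;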



&lt;p&gt;Great! The endpoint is ready to receive requests. But there are still a few things left to do. We’ll start by adding another endpoint for uploading the document file, embedding it, and storing it in the vectorstore&lt;/p&gt;

&lt;p&gt;Let’s start by creating the endpoint and getting the file content&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import GPT4AllEmbeddings
@app.route('/embed', methods=['POST'])
def embed():
    try:
        if 'file' not in request.files:
            return jsonify({"error": "No file provided"}), 400

        file = request.files['file']
        file_content = file.read().decode("utf-8")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll then create the document chunks by splitting the content, using line breaks as separators&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
    separators=[
    "\n\n",
    "\n",
    ]
  )
texts=text_splitter.create_documents([file_content])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
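&lt;p&gt;To make the splitter parameters concrete, here's a rough pure-Python illustration of what chunk_size and chunk_overlap mean (a simplified sketch, not LangChain's actual recursive algorithm, which also tries the separators in order before falling back to character counts):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def naive_chunk(text, chunk_size=1000, chunk_overlap=100):
    """Split text into windows of chunk_size characters, each sharing chunk_overlap characters with the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

print(naive_chunk("abcdefghij", chunk_size=4, chunk_overlap=2))  # prints ['abcd', 'cdef', 'efgh', 'ghij']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The overlap means each chunk repeats the tail of the previous one, so a sentence cut at a boundary is still seen whole in at least one chunk.&lt;/p&gt;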



&lt;p&gt;We’re now going to use GPT4AllEmbeddings to embed the documents and store them in ChromaDB&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB_PATH = "vectorstores/db/"
vectorstore = Chroma.from_documents(documents=texts, embedding=GPT4AllEmbeddings(), persist_directory=DB_PATH)      
vectorstore.persist()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Last, but not least, let’s create a .env file with the OPENAI_API_KEY in the root of the flask app&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=&amp;lt;your key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’re finished with the Python code. You can now restart your server by running&lt;/p&gt;

&lt;p&gt;&lt;code&gt;flask run --host=0.0.0.0 --debug -p 5328&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Let’s go back to the NextJS App.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NextJS and Python Integration&lt;/strong&gt;&lt;br&gt;
In page.tsx, let’s add the uploadFile function, similar to what we did for the NextJS RAG app&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function uploadFile(file: File) {
  try {
    const formData = new FormData()
    formData.append('file', file)
    const response = await fetch('/api/embed', {
      method: 'POST',
      body: formData,
    })

  if (response.ok) {
      console.log('Embedding successful!')
    } else {
      const errorResponse = await response.text()
      throw new Error(`Embedding failed: ${errorResponse}`)
    }
  } catch (error) {
    throw new Error(`Error during embedding: ${error}`)
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, call it inside the handleFileSelected method&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const handleFileSelected = async (event?: ChangeEvent&amp;lt;HTMLInputElement&amp;gt;) =&amp;gt; {
    if (!event) return clearFile()
    setIsUploading(true)
    const {files} = event.currentTarget

  if (!files?.length) {
      return
    }
    const selectedFile = files[0]
    await uploadFile(selectedFile)
    setFile(selectedFile)
    setIsUploading(false)
    event.target.value = '' // clear input as we handle the file selection in state
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, now we need to make sure the NextJS frontend app sends requests to the Flask backend server. To do so, we’ll update the next.config.mjs file and include a rewrite rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rewrites: async () =&amp;gt; {
    return [
      {
        source: '/api/:path*',
        destination: 'http://localhost:5328/:path*',
      },
    ]
  },
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What we’re doing here is telling NextJS that every request sent to /api should be forwarded to &lt;a href="http://localhost:5328"&gt;http://localhost:5328&lt;/a&gt;. It includes subroutes, so sending a request to /api/embed will forward it to &lt;a href="http://localhost:5328/embed"&gt;http://localhost:5328/embed&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're running through Docker, instead of using localhost you should use the container's name. So if the container's name is &lt;code&gt;flask-server&lt;/code&gt;, then the destination should be &lt;code&gt;http://flask-server:5328/:path*&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We can now delete the src/api directory from the NextJS app as it’s no longer needed.&lt;/p&gt;

&lt;p&gt;If everything looks good, you can now run the NextJS app&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pnpm run dev&lt;/code&gt;&lt;br&gt;
and start playing with the chat app.&lt;/p&gt;

&lt;p&gt;Great, we’ve now covered the basics of how to use RAG in NextJS with a Python server. There are still a lot of things you can do from here, but I’ll keep it simple for now.&lt;/p&gt;

&lt;p&gt;Let’s move forward with new topics in the upcoming posts. I hope this one was helpful. See you in the next one, where we'll talk about using images with GPT-4 Vision to ingest our vectorstore.&lt;/p&gt;

&lt;p&gt;GitHub code repository: &lt;a href="https://github.com/soutot/ai-series/tree/main/python-chat-rag"&gt;https://github.com/soutot/ai-series/tree/main/python-chat-rag&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>nextjs</category>
      <category>langchain</category>
      <category>ai</category>
    </item>
    <item>
      <title>AI Series Part IV: Creating a RAG chatbot with LangChain (NextJS)</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Wed, 10 Apr 2024 11:00:00 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/ai-series-part-iv-creating-a-rag-chatbot-with-langchain-nextjs-46dl</link>
      <guid>https://dev.to/tiagocsouto/ai-series-part-iv-creating-a-rag-chatbot-with-langchain-nextjs-46dl</guid>
      <description>&lt;p&gt;In this post, we'll do some more coding to build a simple chat app that we can use to ask questions, limiting the LLM's answers to a specific topic of our choice. But before we start coding, we need to define some concepts that will help us understand the whole picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RAG?
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is a technique for interacting with LLMs in which a source of truth provides the base knowledge the LLM uses to generate responses to a user prompt.&lt;/p&gt;

&lt;p&gt;This approach was proposed in May 2020 to address issues that fine-tuned models face, such as limited long-term memory, lack of accuracy for specific outputs, time-consuming training, and high costs. But it was only widely adopted in 2023, when it became consolidated as one of the most used techniques for working with LLMs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does it work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I won't go deep into the explanation; I'll stick to what's important to know from a practical perspective.&lt;br&gt;
RAG basically needs four things to work: a prompt, a store, a retriever, and an LLM.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt&lt;/strong&gt;: generally, it's textual data provided as input by a user or as instructions by the application&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store&lt;/strong&gt;: a vector store where the embedded source of truth data (RAG knowledge) is stored&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retriever&lt;/strong&gt;: it's a method used to find relevant pieces of data in the store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM&lt;/strong&gt;: it's used for interpreting the user prompt, following the application instructions, reading the retrieved context, and generating a response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And, the workflow goes as the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A data source is selected to be used (PDFs, DOCX, Web Pages, Video, Audio, Image, etc)&lt;/li&gt;
&lt;li&gt;The data source is split into smaller pieces of documents ("chunks")&lt;/li&gt;
&lt;li&gt;An embedding model is used to embed (convert to a vector representation) each document chunk&lt;/li&gt;
&lt;li&gt;The embedded data is stored in a vector store&lt;/li&gt;
&lt;li&gt;A user sends a prompt&lt;/li&gt;
&lt;li&gt;The application may provide further instructions about how the LLM should behave as a "system message"&lt;/li&gt;
&lt;li&gt;The user prompt is embedded&lt;/li&gt;
&lt;li&gt;The embedded user prompt is used to search the vector store for semantically similar chunks of documents&lt;/li&gt;
&lt;li&gt;The most similar documents are retrieved and converted back to textual representation&lt;/li&gt;
&lt;li&gt;The user prompt, system prompt, and document chunks are sent to a generative LLM&lt;/li&gt;
&lt;li&gt;The LLM generates a response in natural language that'll be sent back to the user&lt;/li&gt;
&lt;/ol&gt;
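&lt;p&gt;The workflow above can be pictured with a toy example (my own simplified sketch: a bag-of-words "embedding" and cosine similarity stand in for a real embedding model and vector store):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words vector. A real app would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-4: split the source into chunks and store their embeddings
chunks = ["Paris is the capital of France", "The Nile is a river in Africa"]
store = [(chunk, embed(chunk)) for chunk in chunks]

# Steps 5-9: embed the user prompt and retrieve the most similar chunk
prompt = "What is the capital of France?"
context = max(store, key=lambda item: cosine(embed(prompt), item[1]))[0]

# Steps 10-11: the retrieved context plus the prompts would then be sent to a generative LLM
print(context)  # prints "Paris is the capital of France"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;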

&lt;p&gt;There are many different use cases where RAG can be the best choice for working with LLMs, such as Questions &amp;amp; Answers, virtual assistants, content generation, and many others.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's LangChain?
&lt;/h2&gt;

&lt;p&gt;LangChain is a framework originally written in Python - but it also has a JavaScript version - that helps with the development of AI-powered solutions in many ways. It has integrations with the most used LLM APIs and vector stores, and it can handle document splitting, many different retrievers, file loaders, embeddings, prompt templates, and more.&lt;/p&gt;

&lt;p&gt;It also has a feature called a "chain" that links calls and results from one LLM to another. That's very helpful when you have to handle multiple LLM calls in a single request. For example, you can ask an LLM to summarize some content, then take its response and send it to another LLM, asking it to generate a nice response for the user that includes the summarized data. Or you can split the user request across different LLMs to take advantage of their specializations, or even to reduce costs.&lt;/p&gt;

&lt;p&gt;There are many different use cases for chains; it's really a great feature. I'd say LangChain is a must-have for most use cases.&lt;/p&gt;
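&lt;p&gt;The chaining idea can be pictured in a few lines of plain Python (a language-agnostic toy where simple functions stand in for LLM calls):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def summarize(text):
    # Stand-in for an LLM call that summarizes content
    return text.split(".")[0] + "."

def make_friendly(summary):
    # Stand-in for a second LLM call that rewrites the summary for the user
    return "Here's the gist: " + summary

def chain(*steps):
    """Compose steps so each one's output feeds the next, like a LangChain chain."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

pipeline = chain(summarize, make_friendly)
print(pipeline("LangChain links LLM calls. It has many integrations."))  # prints "Here's the gist: LangChain links LLM calls."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;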

&lt;p&gt;Now that the basic concepts are covered, let's move to the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The resource you won't find on Google
&lt;/h2&gt;

&lt;p&gt;I'd like to bring your attention to this because I found it to be very important. When working with open source projects, it's quite common for the documentation to miss some features or details of the API, and that's no different with LangChain. People are doing incredible work with it, but sometimes it's hard to find methods, attribute definitions, and the full range of possibilities by just following the official documentation, especially the JS one. It covers lots of things, but not everything. So I highly recommend you keep the API reference (&lt;a href="https://api.js.langchain.com/index.html" rel="noopener noreferrer"&gt;https://api.js.langchain.com/index.html&lt;/a&gt;) at hand when you work with it. In many cases you'll try to find details in the official docs or by googling and won't find any relevant resource, but you can find it in the API reference. So keep that in mind. And thanks to the LangChain team for providing it; this is tremendously helpful. If you ever lose this link, you can get it from the references on their GitHub repo: &lt;a href="https://github.com/langchain-ai/langchainjs?tab=readme-ov-file#-documentation" rel="noopener noreferrer"&gt;https://github.com/langchain-ai/langchainjs?tab=readme-ov-file#-documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a RAG Chat
&lt;/h2&gt;

&lt;p&gt;We'll reuse the same base project we built for the OpenAI chat app post. If you missed that, I recommend you take a look at it to understand the project details and follow the OpenAI API setup. You can also download the code from this GitHub repo: &lt;a href="https://github.com/soutot/ai-series/tree/main/nextjs-chat-base" rel="noopener noreferrer"&gt;https://github.com/soutot/ai-series/tree/main/nextjs-chat-base&lt;/a&gt;&lt;br&gt;
First, create a new directory and name it nextjs-langchain-openai. Then initialize the project just like we did before.&lt;br&gt;
Once you get it up and running, we can start by installing the LangChain packages:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pnpm install @langchain/community @langchain/core @langchain/openai langchain&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The main package is langchain, but we'll also need @langchain/community to use some packages developed by the community, and @langchain/openai to get specific integrations with the OpenAI API. @langchain/core is the base package needed by all packages other than the main one. You can find more details in the official docs: &lt;a href="https://js.langchain.com/docs/get_started/installation" rel="noopener noreferrer"&gt;https://js.langchain.com/docs/get_started/installation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we need to install our vectorstore dependency. For this sample, we'll use HNSWLib as it's very simple to set up. More details here: &lt;a href="https://js.langchain.com/docs/integrations/vectorstores/hnswlib" rel="noopener noreferrer"&gt;https://js.langchain.com/docs/integrations/vectorstores/hnswlib&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pnpm install hnswlib-node&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then, update your next.config.mjs as the following:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="cm"&gt;/** @type {import('next').NextConfig} */&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;webpack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;externals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;externals&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;externals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;externals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hnswlib-node&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is needed to prevent NextJS errors when importing hnswlib and using fs.&lt;/p&gt;

&lt;p&gt;With everything we need in place, we'll now add a new endpoint to upload our file and create the embeddings.&lt;/p&gt;

&lt;p&gt;Inside the api directory, create an embed/route.ts file. And, we'll start by importing the packages needed&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;HNSWLib&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/community/vectorstores/hnswlib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain/text_splitter&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then, we'll create our POST method and read the file it'll receive from the form data&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;File&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;File&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Missing file input&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We'll now initialize the text splitter. We're going to use RecursiveCharacterTextSplitter as this is the recommended way of starting splitting texts. You can find more splitters in the official docs: &lt;a href="https://js.langchain.com/docs/modules/data_connection/document_transformers/" rel="noopener noreferrer"&gt;https://js.langchain.com/docs/modules/data_connection/document_transformers/&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textSplitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunkOverlap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;separators&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
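To build intuition for what chunkSize and chunkOverlap mean, here's a rough, dependency-free sketch of fixed-size splitting with overlap. The real RecursiveCharacterTextSplitter is smarter: it recursively falls back through the separators list to avoid cutting mid-sentence, which this sketch omits.

```typescript
// Naive fixed-size splitter with overlap: NOT the real LangChain
// implementation, just an illustration of chunkSize/chunkOverlap.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  const chunks: string[] = []
  let start = 0
  while (text.length > start) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break
    // The next chunk starts chunkOverlap characters before the previous
    // one ended, so neighboring chunks share some context.
    start += chunkSize - chunkOverlap
  }
  return chunks
}

const parts = chunkText('a'.repeat(2500), 1000, 100)
console.log(parts.length)    // 3 chunks for 2500 chars at size 1000 / overlap 100
console.log(parts[0].length) // 1000
```

The overlap is what lets a sentence that straddles a chunk boundary still be retrieved in one piece.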

&lt;p&gt;This is how we split the file content into chunks, using the createDocuments method (which expects an array of texts):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;splitDocs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;textSplitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then we initialize the embedding model. In this case, we'll use OpenAIEmbeddings, which by default uses text-embedding-ada-002. You can use different models, including ones that aren't OpenAI's. More details in the official docs: &lt;a href="https://js.langchain.com/docs/integrations/text_embedding/openai" rel="noopener noreferrer"&gt;https://js.langchain.com/docs/integrations/text_embedding/openai&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;openAIApiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
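It helps to remember that an embedding is just an array of floats, so "similar meaning" becomes a geometric question. Here's a minimal sketch of cosine similarity, the metric vector stores commonly use to compare a query embedding against stored chunks:

```typescript
// Cosine similarity between two embedding vectors: 1 means the vectors
// point the same way (similar meaning), 0 means unrelated.
// Illustration only; real ada-002 embeddings have 1536 dimensions.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((x, i) => {
    dot += x * b[i]
    normA += x * x
    normB += b[i] * b[i]
  })
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

console.log(cosineSimilarity([1, 0], [1, 0])) // 1
console.log(cosineSimilarity([1, 0], [0, 1])) // 0
```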

&lt;p&gt;And store the embeddings in the HNSW vector store:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectorStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;HNSWLib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;splitDocs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;vectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vectorstore/rag-store.index&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Okay, the embedding endpoint is done. We can now generate the vector representation of our data source and store it in a vector store for later use.&lt;br&gt;
Next, we have to update the frontend to send the document to be embedded.&lt;br&gt;
Open page.tsx and add this new function to perform the upload:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;uploadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;File&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FormData&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/embed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Embedding successful!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;errorResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Embedding failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;errorResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Error during embedding: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now, go to the handleFileSelected method and call this function, passing the selectedFile:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handleFileSelected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;ChangeEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;HTMLInputElement&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;clearFile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;setIsUploading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currentTarget&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;selectedFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;uploadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selectedFile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;setFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selectedFile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;setIsUploading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt; &lt;span class="c1"&gt;// clear input as we handle the file selection in state&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Cool, we're done here. Now, as the last step, let's update our main API route to retrieve a response based on the user prompt.&lt;br&gt;
First, let's import all the dependencies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;HNSWLib&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/community/vectorstores/hnswlib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/core/messages&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/core/prompts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;LangChainStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StreamingTextResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;ConversationalRetrievalQAChain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain/chains&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;ChatMessageHistory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ConversationTokenBufferMemory&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain/memory&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can find more details of each dependency in the official docs: &lt;a href="https://api.js.langchain.com/" rel="noopener noreferrer"&gt;https://api.js.langchain.com/&lt;/a&gt; and &lt;a href="https://js.langchain.com/docs" rel="noopener noreferrer"&gt;https://js.langchain.com/docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let's create a prompt template that will instruct the LLM on how it should behave:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;QA_PROMPT_TEMPLATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a good assistant that answers questions. Your knowledge is strictly limited to the following piece of context. Use it to answer the question at the end.
  If the answer can't be found in the context, just say you don't know. *DO NOT* try to make up an answer.
  If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
  Give a response in the same language as the question.

  Context: """{context}"""
  Question: """{question}"""
  Helpful answer in markdown:`&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's create the POST method and read the user prompt&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bodySchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bodySchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now let's prepare the retriever to read the data from the vector store&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;openAIApiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectorStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;HNSWLib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vectorstore/rag-store.index&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asRetriever&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
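asRetriever() wraps the vector store in a simple interface: embed the query, score it against the stored chunks, and return the best matches (HNSW does this approximately via a graph index, without scanning everything). A brute-force sketch of the idea, using dot product as the score and made-up toy vectors:

```typescript
// Brute-force top-k retrieval over stored (vector, text) pairs.
// Real HNSW uses an approximate graph index instead of a full scan;
// the vectors and texts below are hypothetical toy data.
type StoredDoc = { vector: number[]; text: string }

function retrieveTopK(query: number[], docs: StoredDoc[], k: number): string[] {
  const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0)
  return docs
    .map((d) => ({ text: d.text, score: dot(query, d.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((d) => d.text)
}

const docs: StoredDoc[] = [
  { vector: [1, 0], text: 'chunk about cats' },
  { vector: [0, 1], text: 'chunk about finance' },
]
console.log(retrieveTopK([0.9, 0.1], docs, 1)) // ['chunk about cats']
```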

&lt;p&gt;We now initialize the LLM that will generate the response. Note the temperature: 0. For RAG we want the LLM to answer precisely from what's in the document, so lower temperatures are better: they prevent the model from introducing terms or concepts beyond what's retrieved from the store.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LangChainStream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;openAIApiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;streaming&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
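Conceptually, temperature scales the model's logits before sampling, which is why low values make the output nearly deterministic. A sketch of softmax with temperature (an illustration of the idea, not the OpenAI implementation):

```typescript
// Softmax with temperature: lower T sharpens the distribution toward
// the highest-logit token, which is why temperature 0 behaves greedily.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature)
  const max = Math.max(...scaled) // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

console.log(softmaxWithTemperature([2, 1, 0], 1))   // fairly spread out
console.log(softmaxWithTemperature([2, 1, 0], 0.1)) // almost all mass on the first token
```

At temperature 0 the API effectively always picks the most likely token, which is what we want for grounded, document-based answers.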

&lt;p&gt;And, finally, make the LLM request and send back the streaming response&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ConversationalRetrievalQAChain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;returnSourceDocuments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;qaChainOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stuff&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;QA_PROMPT_TEMPLATE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamingTextResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that we're passing chat_history: '' because we're not handling history yet, so the LLM won't have the context of previously sent messages. That's good enough for our testing purposes; I'll walk through history and memory in a future post.&lt;/p&gt;

&lt;p&gt;Now, just run&lt;br&gt;
&lt;code&gt;pnpm run dev&lt;/code&gt;&lt;br&gt;
and, if everything's correct, you'll be able to see the app running.&lt;/p&gt;

&lt;p&gt;Attach a file so it's uploaded to the backend and the vector store is created from the embedded document. Then ask a question that can be answered by the content of the file you've just uploaded.&lt;/p&gt;

&lt;p&gt;In the example below, I used Part I of this AI series:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pjt6r3gav1zwu1yolwj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pjt6r3gav1zwu1yolwj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Well, that's just the beginning. There are so many things that can be done, even by changing this simple example.&lt;/p&gt;

&lt;p&gt;Hope someone finds it helpful.&lt;/p&gt;

&lt;p&gt;See you in the next part.&lt;/p&gt;

&lt;p&gt;GitHub code repository: &lt;a href="https://github.com/soutot/ai-series/tree/main/nextjs-chat-rag" rel="noopener noreferrer"&gt;https://github.com/soutot/ai-series/tree/main/nextjs-chat-rag&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>langchain</category>
      <category>ai</category>
      <category>rag</category>
    </item>
    <item>
      <title>AI Series Part III: Creating a chatbot with OpenAI GPT Model (NextJS)</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Tue, 09 Apr 2024 11:00:00 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/ai-series-part-iii-creating-a-chatbot-with-openai-gpt-model-nextjs-56k9</link>
      <guid>https://dev.to/tiagocsouto/ai-series-part-iii-creating-a-chatbot-with-openai-gpt-model-nextjs-56k9</guid>
      <description>&lt;p&gt;In the previous posts, I covered some shallow AI concepts and mentioned a few tips to help you use AI tools to improve your performance as a developer. Now it’s time to take a look at some code and build a simple chatbot.&lt;/p&gt;

&lt;p&gt;In this post, I’ll assume you have some previous basic knowledge of React and NextJS, so my goal is to focus on the OpenAI integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;First of all, you need an OpenAI account to generate an API key, which is required for the code we’ll write to work properly. If you don’t have an account already, visit the OpenAI webpage (&lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;https://openai.com/&lt;/a&gt;) and create one.&lt;/p&gt;

&lt;p&gt;After you log in, go to the Apps page (&lt;a href="https://platform.openai.com/apps" rel="noopener noreferrer"&gt;https://platform.openai.com/apps&lt;/a&gt;) and select the API&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4xvar8ol051lrh9hi1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4xvar8ol051lrh9hi1u.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the left menu, select API Key&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfnpel2yj0yrhsazgxaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfnpel2yj0yrhsazgxaj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on Create new secret key, add a name, and click the Create button. You’ll be prompted with your secret key. Copy it and save it somewhere safe; we’ll use this key soon.&lt;/p&gt;

&lt;p&gt;Now go to the Settings, then to Billing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdmy172tsv598b9zxgta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdmy172tsv598b9zxgta.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If it’s your first time on the platform, you might see $5 in your credit balance, which OpenAI gives for free for people to test. If your balance is $0, then you’ll have to add credits using a payment method. For now, $5 is more than enough for the tests we’re about to do. But for the next posts, you might need to add more funds — we’ll see.&lt;/p&gt;

&lt;p&gt;Okay, now that we have the API Key and funds, we can start with the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  NextJS App Setup
&lt;/h2&gt;

&lt;p&gt;First, let’s create our NextJS app by running&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx create-next-app@latest nextjs-chat-openai&lt;/code&gt;&lt;br&gt;
Select the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;ESLint&lt;/li&gt;
&lt;li&gt;Tailwind CSS&lt;/li&gt;
&lt;li&gt;src/ directory&lt;/li&gt;
&lt;li&gt;App Router&lt;/li&gt;
&lt;li&gt;Import Alias (@/*)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now open the project directory and install the following dependencies. For this post, I’ll use pnpm, but you can use npm or yarn if you prefer.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pnpm add ai class-variance-authority clsx date-fns highlight.js lucide-react openai rehype-highlight react-markdown tailwind-merge tailwindcss-animate zustand&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ai: a helper library for working with AI chat, streaming, and more&lt;/li&gt;
&lt;li&gt;class-variance-authority, clsx, tailwind-merge, and tailwindcss-animate: Tailwind helpers we’ll use to prevent class conflicts, add conditional styles, and more&lt;/li&gt;
&lt;li&gt;date-fns: used to parse dates&lt;/li&gt;
&lt;li&gt;highlight.js, rehype-highlight, and react-markdown: used to render code blocks in chat messages&lt;/li&gt;
&lt;li&gt;lucide-react: an icon library&lt;/li&gt;
&lt;li&gt;openai: needed to use the OpenAI API&lt;/li&gt;
&lt;li&gt;zustand: global state management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let’s install some dev dependencies:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pnpm add -D @tailwindcss/typography css-loader style-loader prettier prettier-plugin-tailwindcss zod&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;@tailwindcss/typography, css-loader, style-loader, prettier, and prettier-plugin-tailwindcss: used to set Tailwind configs and format the code&lt;/li&gt;
&lt;li&gt;zod: will be used for API validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can add more dependencies as you like, such as lint-staged, husky, and others. But for this post’s purposes, those are enough.&lt;/p&gt;

&lt;p&gt;Now let’s set up the .prettierrc.cjs file. You can add your own preferences or skip this if you don’t like prettier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;module.exports&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;bracketSpacing:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;jsxBracketSameLine:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;singleQuote:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;trailingComma:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'es&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;semi:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;printWidth:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;tabWidth:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;useTabs:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;importOrder:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;'^\\u&lt;/span&gt;&lt;span class="mi"&gt;0000&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'^@?\\w'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'^&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;^.&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'^\\.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;importOrderSeparation:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;plugins:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"prettier-plugin-tailwindcss"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;;&lt;/span&gt;&lt;span class="w"&gt;


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We’ll use shadcn-ui to install some UI components. Follow the steps 2 and 3 from their guide here: &lt;a href="https://ui.shadcn.com/docs/installation/next" rel="noopener noreferrer"&gt;https://ui.shadcn.com/docs/installation/next&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, install the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Button: &lt;a href="https://ui.shadcn.com/docs/components/button" rel="noopener noreferrer"&gt;https://ui.shadcn.com/docs/components/button&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Input: &lt;a href="https://ui.shadcn.com/docs/components/input" rel="noopener noreferrer"&gt;https://ui.shadcn.com/docs/components/input&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Label: &lt;a href="https://ui.shadcn.com/docs/components/label" rel="noopener noreferrer"&gt;https://ui.shadcn.com/docs/components/label&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Separator: &lt;a href="https://ui.shadcn.com/docs/components/separator" rel="noopener noreferrer"&gt;https://ui.shadcn.com/docs/components/separator&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Textarea: &lt;a href="https://ui.shadcn.com/docs/components/textarea" rel="noopener noreferrer"&gt;https://ui.shadcn.com/docs/components/textarea&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’d take quite a while to walk through creating all the components. So I recommend going to my GitHub repo and copying the following files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/components/Avatar&lt;/li&gt;
&lt;li&gt;/components/Chat&lt;/li&gt;
&lt;li&gt;/components/Root&lt;/li&gt;
&lt;li&gt;/components/Message&lt;/li&gt;
&lt;li&gt;page.tsx&lt;/li&gt;
&lt;li&gt;layout.tsx&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also the /lib folder.&lt;/p&gt;

&lt;p&gt;A brief explanation of every component:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avatar.tsx: a wrapper component that receives children and sets the common styles for the avatar&lt;/li&gt;
&lt;li&gt;BotAvatar.tsx: uses the base Avatar.tsx component and renders a robot icon&lt;/li&gt;
&lt;li&gt;UserAvatar.tsx: same as BotAvatar, but renders a person icon&lt;/li&gt;
&lt;li&gt;index.ts: each component folder has its own index file that re-exports its components, so they can be imported from a single path&lt;/li&gt;
&lt;li&gt;Chat.tsx: the main chat wrapper; receives a list of messages and renders each message item, containing the message balloon and avatars&lt;/li&gt;
&lt;li&gt;ChatInput.tsx: the text input and send button&lt;/li&gt;
&lt;li&gt;Message.tsx: a wrapper that sets the message item row and avatar positioning&lt;/li&gt;
&lt;li&gt;MessageBalloon.tsx: renders the message text, and also implements markdown handling, code highlighting, and copy and download buttons&lt;/li&gt;
&lt;li&gt;Root (index.tsx): contains the html and body tags and global state handlers. This is needed because the layout can’t be a client component&lt;/li&gt;
&lt;li&gt;lib/store.ts: the global state manager; currently we’re persisting the messages&lt;/li&gt;
&lt;li&gt;lib/utils.ts: a helper function to deal with Tailwind classes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The page.tsx renders the chat and the input, and handles the logic and API calls.&lt;/p&gt;
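&lt;p&gt;To give you an idea of what lib/store.ts is responsible for, here’s a dependency-free sketch of the message state it manages. This is illustrative only: the real file uses zustand’s create with persistence, and the names below (createMessageStore, Message) are my own, not the repo’s:&lt;/p&gt;

```typescript
// Illustrative, dependency-free sketch of the message state that lib/store.ts
// manages via zustand. createMessageStore and Message are hypothetical names.
type Message = {role: 'user' | 'assistant'; content: string}

function createMessageStore() {
  let messages: Message[] = []
  return {
    getMessages: (): Message[] => messages,
    // Replace the array immutably, as a zustand set() call would.
    addMessage: (message: Message) => {
      messages = [...messages, message]
    },
    clear: () => {
      messages = []
    },
  }
}
```

In the real app, zustand additionally notifies React components when the state changes and persists the messages across reloads.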

&lt;h2&gt;
  
  
  OpenAI Integration
&lt;/h2&gt;

&lt;p&gt;Okay, we have the basics to start working. Now, it’s time to use the OpenAI key we created before. In the root of the project, create a .env file and add &lt;code&gt;OPENAI_API_KEY=your_key&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next, create a file in the lib folder and name it openai.ts. In this file, we'll initialize the OpenAI API client as follows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now, let’s create a new folder called &lt;code&gt;api&lt;/code&gt; inside &lt;code&gt;src/app&lt;/code&gt; and then create a route.ts file inside of it.&lt;/p&gt;

&lt;p&gt;We’ll start importing the modules we’ll need:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;OpenAIStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StreamingTextResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// helpers to deal with ai chat streaming&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// NextJS response helper&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;ChatCompletionMessageParam&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai/resources/index.mjs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// type definition&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// used for API scheme validation&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@/lib/openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// our openai initializer&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then, we’ll create the system prompt that’ll be sent to the OpenAI GPT model. As we’ve mentioned previously in the basic concepts post, the system prompt is the instruction that we’ll send to the LLM in order to define its behavior. Here’s how we set it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generateSystemPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;ChatCompletionMessageParam&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a chat bot and will interact with a user. Be cordial and reply their messages using markdown syntax if needed. If markdown is a code block, specify the programming language accordingly.`&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And, finally, we’ll start writing our POST method that will be called by the frontend.&lt;/p&gt;

&lt;p&gt;First, we’ll start with the basic definition and get the prompt argument sent in the HTTP request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bodySchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bodySchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We use zod to validate that the expected argument is a string.&lt;/p&gt;
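&lt;p&gt;Conceptually, the zod schema above is equivalent to a hand-written type guard like the one below. This is purely illustrative (parseBody is my own name; in the route we let zod do this work):&lt;/p&gt;

```typescript
// Hand-rolled equivalent of z.object({prompt: z.string()}).parse(body),
// shown only to illustrate what the zod schema enforces.
type ChatBody = {prompt: string}

function parseBody(body: unknown): ChatBody {
  if (
    typeof body !== 'object' ||
    body === null ||
    typeof (body as {prompt?: unknown}).prompt !== 'string'
  ) {
    // zod similarly throws (a ZodError) when validation fails
    throw new Error('Invalid body: expected {prompt: string}')
  }
  return {prompt: (body as {prompt: string}).prompt}
}
```

The advantage of zod is that schemas like this stay declarative and composable as the API grows.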

&lt;p&gt;Now we can call our system prompt generator function and store the result in a variable so we can use it later:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;const systemPrompt = generateSystemPrompt()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We’re almost done. Now it’s time to make the request to OpenAI, passing some arguments in order to get the GPT response:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
try {
    const response = await openai.chat.completions.create({
      model: 'gpt-3.5-turbo-16k',
      temperature: 0.5,
      messages: [systemPrompt, {role: 'user', content: prompt}],
      stream: true,
    })


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We’re using the &lt;code&gt;chat.completions&lt;/code&gt; method to create a chat request.&lt;/p&gt;

&lt;p&gt;We’re pre-defining the LLM we want to use in the model property. You can replace that with other available models like GPT-4. But keep in mind that different models have different costs.&lt;/p&gt;

&lt;p&gt;The temperature controls how creative we want the LLM to be. It's a range from 0 to 1, where 0 means we don't want it to be creative (it'll follow the instructions and respond with exactly what was asked in the prompt) and 1 means we want it to be very creative (it might include additional information and details that are related to the prompt but weren't asked for). Each temperature value has a purpose depending on the app we're building. For this example, we'll go with a balanced 0.5.&lt;/p&gt;

&lt;p&gt;The messages attribute is the list of messages from the chat. We could also add the chat history here to make the LLM aware of the whole conversation context. But for now, we're just passing the system instructions and the user prompt.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;stream&lt;/code&gt; is a boolean that defines whether we want the response to arrive in chunks as it's generated, or all at once when it's ready.&lt;/p&gt;

&lt;p&gt;And, finally, we return the response to the frontend:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamingTextResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
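&lt;p&gt;On the frontend, the body of this streamed response can be read incrementally as chunks arrive. Here’s a minimal sketch of such a reader (readStream is an illustrative helper, not the repo’s actual code; in practice, hooks from the ai package can handle this for you):&lt;/p&gt;

```typescript
// Illustrative client-side reader for a streamed text response.
async function readStream(response: Response) {
  const reader = response.body!.getReader()
  const decoder = new TextDecoder()
  let text = ''
  while (true) {
    const {done, value} = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters split across chunks intact
    text += decoder.decode(value, {stream: true})
  }
  return text
}
```

Reading the stream this way is what lets the UI render the answer token by token instead of waiting for the full completion.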
&lt;h2&gt;
  
  
  Running the App
&lt;/h2&gt;

&lt;p&gt;We’re done! If everything’s good, you should be able to test the app by running &lt;code&gt;pnpm build &amp;amp;&amp;amp; pnpm start&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Access &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt; and you can start chatting with OpenAI GPT-3.5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9nj8gsij4e19lixwyzga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9nj8gsij4e19lixwyzga.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a very basic example just to give you a starting point. Many other improvements can be implemented like selecting different models, limiting token usage, chat history, and much more.&lt;/p&gt;
&lt;h2&gt;
  
  
  Extra: Running on Docker
&lt;/h2&gt;

&lt;p&gt;Another way of running the app is using Docker. For now, it’s not needed, that’s why I didn’t include it in the main scope. But this will be helpful for future posts as we’ll start integrating new features. So feel free to add it now so you can use this first project as a base for what’s coming next.&lt;/p&gt;

&lt;p&gt;First, create a Dockerfile in the root of the project and add the following:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

FROM node:20.6.1
ARG PNPM_VERSION=8.7.1

COPY . ./chat-app
WORKDIR /chat-app
RUN npm install -g pnpm@${PNPM_VERSION}
ENTRYPOINT pnpm install &amp;amp;&amp;amp; pnpm run build &amp;amp;&amp;amp; pnpm start
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Then, create a docker-compose.yaml file in the root of the project and add the following:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
services:
  chat-app:
    container_name: chat-app
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
    ports:
      - 3000:3000
    entrypoint: sh -c "pnpm install &amp;amp;&amp;amp; pnpm run build &amp;amp;&amp;amp; pnpm run dev"
    working_dir: /chat-app
    volumes:
      - .:/chat-app


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now, if you run &lt;code&gt;docker-compose up&lt;/code&gt; you'll be able to see the app up and running. Make sure Docker is installed and running on your machine before running this command.&lt;/p&gt;

&lt;p&gt;We’re going to explore more topics in the coming posts.&lt;/p&gt;

&lt;p&gt;See you there!&lt;/p&gt;

&lt;p&gt;GitHub code repository: &lt;a href="https://github.com/soutot/ai-series/tree/main/nextjs-chat-openai" rel="noopener noreferrer"&gt;https://github.com/soutot/ai-series/tree/main/nextjs-chat-openai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>openai</category>
      <category>gpt3</category>
      <category>ai</category>
    </item>
    <item>
      <title>AI Series Part II: Tips for using GPT and Copilot as a Developer</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Mon, 08 Apr 2024 11:00:00 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/ai-series-part-ii-tips-for-using-gpt-and-copilot-as-a-developer-2a4f</link>
      <guid>https://dev.to/tiagocsouto/ai-series-part-ii-tips-for-using-gpt-and-copilot-as-a-developer-2a4f</guid>
      <description>&lt;p&gt;Previously, I wrote a bit about how I started working with LLMs and some shallow and basic AI concepts I understood so far.&lt;/p&gt;

&lt;p&gt;Before we move forward to the code, let’s talk about something I believe is very important nowadays and that will probably change forever how we do coding.&lt;/p&gt;

&lt;p&gt;The usage of generative AI reached an unexpected spot really fast, as it’s very intuitive and can help people in ways we can’t even imagine. Speaking from a software engineer/developer perspective, I can picture a not-so-distant future where we look back and wonder “how did people write code without AI assistance?”, because the benefits and productivity boost are so huge that working without it will feel odd, just like it feels odd for someone used to working with strongly typed languages to start working with an untyped language.&lt;/p&gt;

&lt;p&gt;Many people are afraid of AI, and others say it’s going to steal our jobs or that software engineering is condemned to fade. Recently, the Devin project made a lot of noise in the community, as it’s claimed to be the solution to replace human developers. To be honest, I don’t believe any of that. AIs are tools, and tools are made for a purpose. For a software engineer, AI can be used to automate repetitive tasks, help troubleshoot a problem, find a quick solution, and many other cases that we’ll explore a bit more in this post. But at the end of the day, there’s one main purpose: boosting productivity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fff90iyyl9ba5idkarso3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fff90iyyl9ba5idkarso3.jpeg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Misconception
&lt;/h2&gt;

&lt;p&gt;People are used to thinking that the economy is related to hard work. So the harder people work, the richer they get. But that’s wrong. The economy is related to productivity. That’s why the Industrial Revolution changed the world. Not because everybody worked harder, but because they could make things faster. What do the countries with the highest GDP have in common? They are all highly productive. And a similar concept can be applied to individuals. You’ll get richer (understand that as money, freedom, free time, or however you like) the more productive you are. So fighting against the solutions that will help you become more productive is a mistake.&lt;/p&gt;

&lt;p&gt;Some companies may start firing people as their crew productivity increases in a way costs can be reduced. However, opportunities will also be created as the barrier to starting a business is reduced. As long as the demand in the market keeps growing, the need for solutions will also grow. So those who are prepared to deliver high-quality work with high productivity may be compensated.&lt;/p&gt;

&lt;p&gt;Let’s use the tools available to us to our advantage so we can focus on what we’re good at: solving problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips
&lt;/h2&gt;

&lt;p&gt;There are some patterns I like to use when working with GPT and Copilot that help me instruct the tools to get the results I want, rather than just letting them guess. Here are a few tips to get you started.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT
&lt;/h2&gt;

&lt;p&gt;Let's start by mentioning a few things that GPT can help you with and a prompt example to achieve that goal:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Brainstorming&lt;/strong&gt;: GPT can be a great tool for brainstorming ideas or generating content&lt;br&gt;
Prompt example: &lt;em&gt;What are some resources or tutorials to learn &lt;code&gt;{specific technology/framework}&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning&lt;/strong&gt;: You can use GPT to learn about new topics or concepts.&lt;br&gt;
Prompt example: &lt;em&gt;Can you explain how to implement &lt;code&gt;{specific algorithm or data structure}&lt;/code&gt; in &lt;code&gt;{specific programming language}&lt;/code&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code generation&lt;/strong&gt;: You can ask GPT to generate a piece of code for you.&lt;br&gt;
Prompt example: &lt;em&gt;Can you provide a code snippet to connect to a database using &lt;code&gt;{specific programming language}&lt;/code&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code explanation&lt;/strong&gt;: You can ask GPT to explain a piece of code to you.&lt;br&gt;
Prompt example: &lt;em&gt;Can you explain what this code is doing &lt;code&gt;{specific algorithm or data structure}&lt;/code&gt; in &lt;code&gt;{specific programming language}&lt;/code&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging&lt;/strong&gt;: You can describe a bug you’re experiencing and ChatGPT may be able to suggest possible solutions.&lt;br&gt;
Prompt example: &lt;em&gt;Can you help me understand the error message &lt;code&gt;{specific error message}&lt;/code&gt; in &lt;code&gt;{specific programming language}&lt;/code&gt; and give me an example of how to fix it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can use it in many other ways you can imagine, like asking it to write unit tests for your code, rewrite a script from one programming language in another, review your code and suggest improvements, write documentation for your code, and many others. A few other tips you may find helpful:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Be clear on your prompt and provide context as much as possible&lt;/li&gt;
&lt;li&gt;Ask a single question per request; avoid asking subsequent questions on the same prompt&lt;/li&gt;
&lt;li&gt;Use triple quotes to indicate the context/code content (i.e. """context""", &lt;code&gt;context&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Make it clear what programming language you’re referring to. You can do it in the context like this &lt;code&gt;javascript \n console.log(‘foo bar’) \n&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Give a role to the GPT. If you want it to write code, start the prompt by telling the GPT it’s an experienced software engineer with sharp Python knowledge; or a professor teaching a student about algorithms; or anything else that matches how you want the GPT to behave&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s good to keep the GPT’s limitations in mind. GPT is a powerful tool, but it’s not perfect. It doesn’t understand context the same way humans do and can sometimes provide incorrect or nonsensical responses. So your reasoning over the responses is crucial. Always double-check that the responses make sense and be prepared to fix any misinformation the GPT may provide.&lt;/p&gt;
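&lt;p&gt;Putting tips 3 to 5 together, a prompt built programmatically might look like this (buildPrompt is a hypothetical helper I’m using just to show the structure):&lt;/p&gt;

```typescript
// Hypothetical helper combining the tips above: assign a role, ask one
// question, and wrap the code context in triple quotes with a language tag.
function buildPrompt(language: string, code: string, question: string): string {
  return [
    `You are an experienced software engineer with sharp ${language} knowledge.`,
    question,
    '"""',
    '```' + language,
    code,
    '```',
    '"""',
  ].join('\n')
}
```

The same structure works whether you paste the prompt into the ChatGPT UI or send it through the API.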

&lt;h2&gt;
  
  
  Copilot
&lt;/h2&gt;

&lt;p&gt;Copilot is another great tool for boosting your performance as you write code. There are two main ways I directly ask it to generate code instead of passively waiting for it to guess:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a comment to your code, something like “function to calc 2 ints and return the result”. Then, on the next line, start writing the function and Copilot will complete the code for you&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9q7g1t004f4h8xnisfe3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9q7g1t004f4h8xnisfe3.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write an untyped function and use Copilot to generate the TypeScript types. It can recognize how each argument is used, look for similar approaches in the codebase, and create the full type definition.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr75um19usdrzu2uf3jie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr75um19usdrzu2uf3jie.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using strategies similar to those above, you can do a lot of things with Copilot’s support, such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Code generation&lt;/li&gt;
&lt;li&gt;Learning a new language&lt;/li&gt;
&lt;li&gt;Code reviews&lt;/li&gt;
&lt;li&gt;Writing tests&lt;/li&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pxtssqt9noksbmkppmh.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pxtssqt9noksbmkppmh.jpeg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The advent of AI in the software development landscape is not a threat but an opportunity. It’s a tool that can significantly enhance our productivity, streamline our workflow, and help us deliver high-quality work more efficiently. The fear of AI replacing human developers is unfounded. Instead, AI tools like GPT and Copilot are here to assist us, automate repetitive tasks, and help us troubleshoot problems more effectively.&lt;/p&gt;

&lt;p&gt;GPT can be a great tool for brainstorming, learning, code generation, code explanation, and debugging. On the other hand, Copilot can boost your performance as you write code. It can help with code generation, learning new languages, code reviews, writing tests, and documentation. The trick is to actively instruct it to generate code instead of passively waiting for it to guess.&lt;/p&gt;

&lt;p&gt;There are many other ways of using AI to our advantage; these are just a few examples to give you some insight. Now it’s up to you to use a human capability that AI doesn’t have: your creativity.&lt;/p&gt;

&lt;p&gt;I hope this can be of some help. Next, we’ll start writing some code.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>githubcopilot</category>
      <category>gpt3</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>AI Series Part I: Introduction Concepts</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Sun, 07 Apr 2024 11:00:00 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/ai-series-part-i-introduction-concepts-14eo</link>
      <guid>https://dev.to/tiagocsouto/ai-series-part-i-introduction-concepts-14eo</guid>
      <description>&lt;p&gt;In the last post, I did a brief introduction of the AI topic, of myself, and how I started working with it. Now it’s time to start looking at some AI concepts that may help us to understand how it works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;br&gt;
I’m not an ML specialist. This is the shallowest of shallow descriptions of some AI-related concepts. This is NOT a guide or tutorial of any sort. There might be flaws, misconceptions, and incomplete information. This is a brief account of my current understanding of these topics, and you should treat it as base content to make you think about it and search for more from specialized sources. And, of course, I’m more than happy if you can share with me what you’ve learned differently, correct me if I say anything inaccurate, or add further details to any related topic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That said, let’s get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concepts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What’s Artificial Intelligence (AI)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Artificial Intelligence concept was first brought up as a study field in 1956 by John McCarthy at the Dartmouth Conference, and many consider this the birthdate of AI. However, the core concept can be traced far back in time — and when I say “far” I do mean it.&lt;/p&gt;

&lt;p&gt;If we understand the AI concept as a trained humanlike intelligence that resides outside of a life form, then we can go back to somewhere around 700 B.C., when, according to historians, Hesiod and Homer wrote their famous Greek poems. In them, Hephaestus forged Talos from bronze, women made of gold, and even automata devices, and to some of them he gave intelligence: pretty much what we call Artificial Intelligence (or Artificial General Intelligence) nowadays. And, considering that from the mythology perspective even humankind was made of mud and then granted intelligence, could we also say we are god-made Artificial Intelligence? Well, that’s a topic for another philosophical discussion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a9bxmehs72z47ifcscf.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a9bxmehs72z47ifcscf.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The point is that the AI concept is not something new; it has been in the social imaginary and collective unconscious for a very long time. However, only recently, with advances in technology and research, have we been able to take AI usage to a new level by accomplishing human tasks. And in 2022, OpenAI unlocked a myriad of new possibilities by releasing ChatGPT.&lt;/p&gt;

&lt;p&gt;Well, as this is supposed to be a more technical view of AI concepts, I’ll try my best to keep social and philosophical topics out of this from now on. (Sorry about that.)&lt;/p&gt;

&lt;p&gt;So, going straight to the point and answering the question “What is Artificial Intelligence?”, I could briefly say: AI is a concept that represents the ability of a machine to simulate humanlike intelligence and mimic human skills to learn, solve problems, and make decisions, based on pre-trained data and on concepts and techniques such as Neural Networks, Machine Learning, Deep Learning, and Data Science.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s a Neural Network?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Neural Network follows the same concept as our brain’s neural system, which is responsible for our learning process. A neural network is made of functions (neurons) built to approximate other functions. A neuron is a linear function that, given an input, uses algorithms to try different parameters (weights) and biases until it produces the expected output. This try-and-fail process is known as backpropagation. The target function may never be perfectly recovered; instead, an approximate solution is found that outputs the expected result.&lt;/p&gt;

&lt;p&gt;Neurons need to be activated in order to process the different parameters and approximate the expected result. This activation is performed by applying a non-linear function to the neuron’s linear output. And the number of parameters used for training is what the “n b(illion)” means in Large Language Model names, “Llama-2-7b” for example. The parameter count alone doesn’t matter that much; what’s most important is how the neural network has been trained, so it can reach a faster and more precise approximation of the expected output for a given input.&lt;/p&gt;

&lt;p&gt;Basically, a neural network is capable of approximating the solution to almost any computing problem, as long as its layers are deep enough to handle the amount of processing needed; stacking many such layers is what’s called Deep Learning.&lt;/p&gt;
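&lt;p&gt;A tiny sketch of the idea above: a single neuron computes a weighted sum of its inputs plus a bias (the linear part) and passes it through a non-linear activation, here ReLU. The weights and inputs are made up for illustration:&lt;/p&gt;

```typescript
// ReLU: a common non-linear activation function
function relu(x: number): number {
  return Math.max(0, x);
}

// One neuron: linear combination (weights and bias) + activation
function neuron(inputs: number[], weights: number[], bias: number): number {
  const linear = inputs.reduce((sum, x, i) => sum + x * weights[i], bias);
  return relu(linear);
}

// 1*2 + 2*(-1) + 1 = 1, and relu(1) = 1
const output = neuron([1, 2], [2, -1], 1);
console.log(output); // 1
```

Training is the process of adjusting those weights and biases until the outputs approximate the expected ones.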

&lt;p&gt;All this process is done by mathematical calculations using vectors and matrices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcpha5v9ro4ejn9a3z4k.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcpha5v9ro4ejn9a3z4k.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are vectors and matrices?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before explaining vectors and matrices, we first need to recall that everything in computing is represented by numbers: the characters you type on your keyboard, the pixels that form the images you see on your screen. Everything is numbers. This is where vectors come into play.&lt;/p&gt;

&lt;p&gt;A vector is a list of numbers. We can think of it as an array. So a vector can, for example, contain the numerical representation of words. A single number is a scalar value and is processed by CPUs. Each number in a vector is stored with a given precision, like 16 bits, 32 bits, and so on. We can think of this precision as the “resolution” of the vector: the more bits, the more detailed the data, but the more resources it needs.&lt;/p&gt;

&lt;p&gt;A matrix, on the other hand, is a representation of a list of vectors (or an array of arrays). As humans, we can visualize only 3 dimensions (X, Y, and Z), but a matrix can be made of multiple dimensions that we can only understand through mathematics.&lt;/p&gt;

&lt;p&gt;To process matrices, we need to perform many multiplication calculations, and that’s why GPUs are needed: they are built to process large amounts of multiplications efficiently, in parallel. Originally, GPUs were designed for image processing, as images are composed of matrices. But since LLMs also need matrix processing, GPUs became the best tool for the job.&lt;/p&gt;
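&lt;p&gt;To make this concrete, here’s a minimal sketch of matrices as arrays of arrays (lists of vectors), and of the multiply-and-add work a matrix multiplication involves — the part GPUs parallelize. The values are arbitrary:&lt;/p&gt;

```typescript
// A matrix is a list of vectors: an array of arrays of numbers
type Matrix = number[][];

// Naive matrix multiplication: every output cell is a sum of products.
// GPUs compute many of these cells at the same time.
function matmul(a: Matrix, b: Matrix): Matrix {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, x, k) => sum + x * b[k][j], 0))
  );
}

const result = matmul(
  [[1, 2], [3, 4]],
  [[5, 6], [7, 8]]
);
console.log(result); // [[19, 22], [43, 50]]
```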

&lt;p&gt;&lt;strong&gt;What’s Embedding?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A common term we see when working with Machine Learning and Large Language Models is “embedding”. An embedding model transforms words (or longer pieces of text) into a vector representation (an array of numbers). That representation can then be stored in a vector store and used for semantic similarity search.&lt;/p&gt;
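&lt;p&gt;As a toy illustration (the 3-dimensional vectors below are invented; real embeddings have hundreds or thousands of dimensions), semantic similarity between embedding vectors is typically measured with cosine similarity:&lt;/p&gt;

```typescript
// Cosine similarity: dot product of the vectors divided by their magnitudes.
// Values close to 1 mean "semantically similar".
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Made-up "embeddings": cat and kitten point in similar directions, car doesn't
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const car = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, car)); // true
```

A vector store does essentially this comparison at scale, returning the stored vectors closest to the query vector.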

&lt;p&gt;&lt;strong&gt;What is GPT?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPT, largely known from ChatGPT, stands for Generative Pre-trained Transformer. It’s basically an AI that uses techniques to, given a sequence of words, predict the next word according to its probability. Contrary to what people usually believe, GPT does not create anything new; it just uses its training data to generate a sequence of words by predicting one after another.&lt;/p&gt;
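&lt;p&gt;The prediction loop can be sketched in a toy way. The probability table below is invented, and a real model scores every token in its vocabulary at each step, but the essence is the same: repeatedly pick the next word based on probabilities.&lt;/p&gt;

```typescript
// Invented next-word probability table, standing in for a trained model
const nextWordProbs: {[word: string]: {[next: string]: number}} = {
  what: {is: 0.8, are: 0.2},
  is: {ai: 0.6, it: 0.4},
  ai: {'?': 0.9, '.': 0.1},
};

function generate(start: string, steps: number): string[] {
  const words = [start];
  for (let i = 0; i < steps; i++) {
    const probs = nextWordProbs[words[words.length - 1]];
    if (!probs) break;
    // greedy decoding: always take the highest-probability next token
    const next = Object.entries(probs).sort((a, b) => b[1] - a[1])[0][0];
    words.push(next);
  }
  return words;
}

console.log(generate('what', 3)); // ['what', 'is', 'ai', '?']
```

Real models also sample from the distribution instead of always picking the top word (that's what the "temperature" setting controls), which is why responses vary between runs.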

&lt;p&gt;&lt;strong&gt;What’s LLM?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLM stands for Large Language Model. It’s an AI system trained to generate human-like text using NLP (Natural Language Processing), based on patterns learned across a vast number of parameters. GPT is considered an LLM. And it can be used for a large variety of use cases, like summarization, code generation, text generation, question answering, and more.&lt;/p&gt;

&lt;p&gt;We can say LMMs, or Large Multimodal Models, are the evolution of LLMs. While LLMs are capable of receiving a single input and output type, such as text, LMMs can receive and produce multiple types, like text, images, and videos. It means an LMM can receive a text and an image as input and output a video, or any other combination.&lt;/p&gt;

&lt;p&gt;Some common LLMs are: GPT-3.5, Mistral, Llama2&lt;br&gt;
Some common LMMs are: GPT-4, Llava, Gemini, Claude-3&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is all of this coming to play together?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we covered the basics, let’s see how everything works together.&lt;/p&gt;

&lt;p&gt;Let’s say that OpenAI used Neural Networks, Machine Learning, Deep Learning, and Data Science to train a GPT with 175 billion parameters. Then you open the ChatGPT website and type &lt;em&gt;“what is AI?”&lt;/em&gt;. Your prompt is embedded into a vector representation, then the GPUs on OpenAI’s servers perform a large number of matrix calculations, multiplying the vector representation of your prompt against the model’s pre-trained data to retrieve the words with the highest semantic similarity to your prompt. A sequence of words is displayed, and it feels like someone is really typing them as they appear, but in fact the words are being calculated one by one in real time. You get your answer and are mesmerized by how amazing ChatGPT is, but it “just” auto-completed your question in a way that looked like an answer. It’s really fascinating how well it can achieve these results, and how many possibilities such technology opens up, even though the GPU consumption may be pretty high.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx52622d5dpgv7k4bck7.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx52622d5dpgv7k4bck7.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 1-bit LLM&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When dealing with vectors and matrices, there’s a technique called quantization, which means reducing the numeric precision of the vectors, optimizing energy and storage consumption and increasing response speed. But it comes with a trade-off: reduced accuracy in the LLM’s responses.&lt;/p&gt;

&lt;p&gt;Recently, a Microsoft team released a paper introducing the 1-bit LLM concept. It suggests a different approach to dealing with vectors and matrices: reducing each weight from 32 or 16 bits to 1.58 bits (in practical terms, a 2-bit length). It means each weight would hold only the values -1, 0, or 1. Their benchmarks of energy and storage consumption look really good, and the response time drops drastically. We could even question the need for GPUs for serving LLM responses, as matrix calculations would be reduced from multiplications to sums — and sums can be well handled by CPUs, which are cheaper and demand less energy. This is a huge win when we think of the real costs that big tech companies are paying to let us use GPT nowadays, or for companies that are self-hosting their LLMs and investing tons of money in cloud services. However, some experts are really unsure whether the 1-bit LLM will work as well as the paper suggests, especially considering the decrease in accuracy it may come with. This is something for us all to figure out in the near future once they release the BitNet b1.58 model.&lt;/p&gt;
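&lt;p&gt;Here’s a rough sketch of the ternary idea (not the actual BitNet algorithm; the threshold and values below are illustrative): weights are rounded to -1, 0, or 1, so a dot product needs only additions and subtractions, no multiplications.&lt;/p&gt;

```typescript
// Round each weight to -1, 0, or 1 (an illustrative fixed threshold)
function quantizeTernary(weights: number[], threshold = 0.5): number[] {
  return weights.map((w) => (w > threshold ? 1 : w < -threshold ? -1 : 0));
}

// Dot product against ternary weights: each weight just adds,
// subtracts, or skips the input value. No multiplications needed.
function ternaryDot(inputs: number[], ternary: number[]): number {
  return ternary.reduce(
    (sum, t, i) => (t === 1 ? sum + inputs[i] : t === -1 ? sum - inputs[i] : sum),
    0
  );
}

const q = quantizeTernary([0.9, -0.7, 0.1, 0.6]);
console.log(q); // [1, -1, 0, 1]
console.log(ternaryDot([2, 3, 4, 5], q)); // 2 - 3 + 0 + 5 = 4
```

This is where the CPU-friendliness argument comes from: the expensive multiply-accumulate work shrinks to additions and subtractions.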

&lt;p&gt;This approach, however, looks game-changing for serving LLMs, not for training them. Even though the demand for GPUs would not fall to zero, I wonder how it would impact NVIDIA’s market value if GPUs become less needed in the AI race, and how interested Microsoft would be in making that happen — but that’s a topic for a different discussion.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI and Anthropic directions
&lt;/h2&gt;

&lt;p&gt;Recently, the CEOs of both companies leading today’s AI trends, along with Microsoft and Meta, have shown interest in making AI an even more immersive tool for daily tasks. Some people speculate that with GPT-5 and Claude-4 we might see some sort of feature that integrates with our computers, mobile devices, and gadgets.&lt;/p&gt;

&lt;p&gt;This is not something totally new, as there are already projects moving in that direction. One open-source project that really stands out in this regard is Open Interpreter. It basically allows us to chat with our computer and ask it to complete tasks for us, from simple things like “move this file from folder A to folder B” to more complex things like scheduling a meeting and sending emails.&lt;/p&gt;

&lt;p&gt;This is just a glimpse of how we’re at the very beginning of the adoption of AI in our daily routine. There are still a lot of fields to explore and solutions to be developed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, we’ve taken a look at the world of Artificial Intelligence, exploring its origins, its core concepts, and its current applications. We’ve discussed the concept of AI, Neural Networks, vectors and matrices, embedding, GPT, LLMs, the exciting new concept of 1-bit LLMs, and what the big companies seem to be interested in.&lt;/p&gt;

&lt;p&gt;Remember, this is just a brief overview of these topics. I encourage you to continue your own research and deepen your understanding of these fascinating concepts. And, as always, I welcome your thoughts, corrections, and additions to this discussion.&lt;/p&gt;

&lt;p&gt;For the next posts, we’ll start implementing solutions using mainly JavaScript/TypeScript (NextJS) and a bit of Python.&lt;/p&gt;

&lt;p&gt;Stay tuned for the next part of the AI series!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Brave New AI-World</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Sat, 06 Apr 2024 12:45:08 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/brave-new-ai-world-54ln</link>
      <guid>https://dev.to/tiagocsouto/brave-new-ai-world-54ln</guid>
      <description>&lt;p&gt;The world is continuously changing. Since the dawn of time, the entropy and evolution are non-stoppable. From the void, comes the energy. From the energy, comes the matter. From the matter, comes the most things we know. This endless cycle of processing and changing lives on until the present time and will continue until we're no longer here to witness. And, as nature-made beings, we are part of this cycle and we apply this concept to our daily-basis tasks: we changed the way we create fire, harvest food, and craft shelter. From caves to pyramids, and then buildings that reach the sky, our society changed and so did our needs and capabilities. We created machines, computers, and the internet. And recently we've just created something that will change our society from now on. We created the Artificial Intelligence.&lt;/p&gt;

&lt;p&gt;Among billions of people, there's a small, self-conscious grain of sand that I call "me". Just a random guy who started studying HTML and CSS as a kid to play around creating game webpages, and who a few years later would improve those skills to use them no longer for play, but professionally.&lt;br&gt;
After many changes and transitions in my career, I recently came across an unstoppable change that will affect us all. Especially us, who work as developers and software engineers. And as you can already guess, I'm speaking of AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Steps
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvod3dfcn9gcfhi457i28.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvod3dfcn9gcfhi457i28.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My first professional contact with AI happened in 2023, when a company I worked for as a contractor wanted to build a feature for content summarization. At that point, I was already longing to work with it, as I'd been having a lot of fun as a user of ChatGPT, Midjourney, and others, had lots of ideas in mind, and knew that's where our future as professionals is heading. So I started following tutorials to learn some basic concepts and finally began working on this feature. It was originally written in Python by a very experienced developer, and my goal was to fully understand the code and translate it to TypeScript. He explained to me how it works and introduced me to some tools like LangChain. So my first challenge was finding the correlations between LangChain's Python and JavaScript APIs, as they are a bit different. Also, as I had almost zero experience with Python, I had to learn a bit of it to understand how to translate the logic to TypeScript.&lt;/p&gt;

&lt;p&gt;It was really fun to work on this, and it took me just a few hours to fully convert the code and get it up and running. So our summarization feature worked pretty much as expected.&lt;br&gt;
Working on that feature was pretty fun, and I kept studying LangChain and ML in general. I ended up finding a cool tutorial that taught me how to build a web app that generates a title and description for an uploaded video using an OpenAI model.&lt;/p&gt;

&lt;p&gt;This app consists of uploading an mp4 file, using WebAssembly to convert the video to mp3, then using OpenAI Whisper to transcribe the audio to text, and finally using OpenAI GPT-3 to generate the title and description. It simulates a tool that could be used by YouTube content creators to automate part of the process of uploading videos.&lt;br&gt;
Originally the tutorial used Vite and a NodeJS server, but I followed a different approach and created a NextJS app, dockerized it, and added a few more features. It was pretty fun.&lt;/p&gt;

&lt;p&gt;You can find the project in this GitHub repo: &lt;a href="https://github.com/soutot/upload-ai"&gt;https://github.com/soutot/upload-ai&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Challenge
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xoqp3yr8v8jwzlsvd11.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xoqp3yr8v8jwzlsvd11.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few weeks later, I was about to start a new project when I got the opportunity to work on a simple chat app using OpenAI. I did that using my previous knowledge and the result was pretty nice, so more features started being requested. First, uploading files and sending them as context, then experimenting with different LLMs, adding integrations with third-party APIs, and different tools and features. I started learning about RAG, vector stores, and retrievers. I had to learn a bit of Python. I learned different ways of generating data sources to be consumed by the LLMs, and about RAG's limitations. I studied so many different things in a really short period. But it was so satisfying that I could see that kid back in time learning his first HTML tags and CSS styles to create web pages for fun. I was really having fun learning all of this.&lt;/p&gt;

&lt;p&gt;And, in the end, this experimental app became a real project.&lt;br&gt;
I'm so grateful for this, and I'm glad to have the opportunity to work on things I believe are going to be our new way of coding.&lt;/p&gt;

&lt;p&gt;From this project, I learned a lot of things in a very short period. I dove deep into LLM and RAG concepts, read a ton of articles, code samples, and tutorials, watched countless videos, and have no idea how many hours I put into this. But the result was satisfying.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Goal
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm87cvjll8rb5g8lxkg5x.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm87cvjll8rb5g8lxkg5x.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the past, I used to contribute more to the dev community by answering questions on Stack Overflow, Facebook, and Slack groups. I actively participated in events, contributed to open-source code, and also wrote a few blog posts to help people.&lt;/p&gt;

&lt;p&gt;Over the past few years, I stepped away from this as I focused on other things in life and career. And, also, I was missing a topic I believed I could contribute to.&lt;/p&gt;

&lt;p&gt;As I was studying this whole AI subject, I felt it could be an opportunity to share what I've learned (and am still learning) about AI from a software engineer's perspective. There's already a lot of new content on this subject every day, but I guess I can contribute my 2 cents. And that's what motivated me to write this series of posts. I'm far from being an expert, and there are still many things for me to learn, revisit, and realize I may have misunderstood. But given all the content I've had contact with and my experiences on real-life projects, maybe it can be helpful for somebody, or at least for my future self.&lt;/p&gt;

&lt;p&gt;I guess that's it for now. I just wanted to provide an overview of myself and how I got into this, and to point out that this will be more of a learning process than a tutorial. So I recommend you keep your expectations low, but I hope I can contribute some knowledge.&lt;/p&gt;

&lt;p&gt;See you in the next post.&lt;/p&gt;

&lt;p&gt;What's coming next:&lt;br&gt;
Part I: Introduction Concepts&lt;br&gt;
Part II: Tips for using GPT and Copilot as a Developer&lt;br&gt;
Part III: Creating a chatbot with OpenAI GPT Model (NextJS)&lt;br&gt;
Part IV: Creating a RAG chatbot with LangChain (NextJS)&lt;br&gt;
Part V: Creating a RAG chatbot with LangChain (NextJS+Python)&lt;br&gt;
Part VI: Using Images for RAG (NextJS)&lt;br&gt;
Part VII: Evaluating your RAG with TruLens (NextJS+Python)&lt;br&gt;
Part VIII: Evaluating your RAG responses with Ragas (NextJS+Python)&lt;br&gt;
Part IX: Evaluating your RAG responses with custom LLM (NextJS)&lt;br&gt;
Part X: Using different LLMs: Llama2, Mistral, Claude-3 (NextJS)&lt;br&gt;
Part XI: Improving your RAG responses (NextJS)&lt;br&gt;
Part XII: Introducing LlamaIndex (NextJS)&lt;br&gt;
Part XIII: Serving self-hosted LLMs with vLLM&lt;br&gt;
Part XIV: Working with different vector stores&lt;br&gt;
Part XV: Knowledge Graphs for RAG&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>GraphQL Anti-pattern in FrontEnd Clients</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Tue, 07 Apr 2020 02:28:00 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/graphql-anti-pattern-in-frontend-clients-4iel</link>
      <guid>https://dev.to/tiagocsouto/graphql-anti-pattern-in-frontend-clients-4iel</guid>
      <description>&lt;p&gt;GraphQL came to solve lots of problems in how do we handle data access. One of its benefits is providing an API to the frontend so it can fetch the data as it wants without the need of the backend team to provide a specific endpoint for each data combination. The frontend needs a client (Apollo or Relay) to provide the link with the GraphQL API and so be able to perform queries and mutations as you define them. It gives more power to the frontend to design how they will be handled.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"With great power comes great responsibilities", Uncle Ben&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you don't architect your queries correctly in the frontend, you might face serious problems in the future, some of which will be hard to fix. So below I'll give some examples, from my past experience, of anti-patterns you should avoid when building the queries and fragments you use in your GraphQL client.&lt;/p&gt;

&lt;h5&gt;
  
  
  TL;DR
&lt;/h5&gt;

&lt;p&gt;Avoid anti-patterns in your queries and mutations to prevent issues as your application grows&lt;/p&gt;

&lt;h1&gt;
  
  
  Reuse Fragments in different queries
&lt;/h1&gt;

&lt;p&gt;As the smart developers we are, we try to avoid repeating ourselves as much as possible (DRY), so we write reusable code everywhere and import it whenever we need it. That's nice! But it may not be so nice when it comes to fragments and queries.&lt;/p&gt;

&lt;p&gt;This mindset might lead us to think we can reuse fragments across different queries. If I need the user's name and email, why don't I create a userFragment and spread it on every query that needs the user?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sg88xLgk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/iofcop1nzphxj1rr80zc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sg88xLgk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/iofcop1nzphxj1rr80zc.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem is: what if a specific component now needs the user's job role? And another component needs the user's friends list? Especially if you're working on a big team, you may easily run into over-fetching issues, and that's what we want the least. GraphQL is designed specifically to solve this problem, and we'd be recreating it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CUMpKaxc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/oykzwla0eftmotc1np3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CUMpKaxc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/oykzwla0eftmotc1np3e.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A simple example of different queries using the same fragment.&lt;br&gt;
So it's not a matter of never using this, but of avoiding it. Of course, there might be use cases where the approach is helpful, but in general it's all too easy to get tempted into adding more and more fields to the shared fragment and face issues in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;Instead of doing that, you can use smart components to solve this issue, where each component in your tree is responsible for defining a fragment with all the data it needs to render. You might repeat yourself more often, but you'll never face over-fetching, under-fetching, or performance issues. It also helps with the scalability and maintainability of the project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dNcfV0JV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u0338mkjjcu429rbx04l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dNcfV0JV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u0338mkjjcu429rbx04l.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Nested fragments and over fetching
&lt;/h1&gt;

&lt;p&gt;In the previous topic, we talked about reusing fragments and the over-fetching issues it causes. This one is closely related. Let's say we have an application where users select the latest books they've read. Then we want to display the list of users and the books they read, so we create a query using a UserFragment and a BookFragment. Something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Pouyhlxh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tbm2j27ouh6if2o8a56i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Pouyhlxh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tbm2j27ouh6if2o8a56i.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's say we need to display the list of books available in our application for users to select from. We would need a query like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iE2TDnEA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/7zp1v9y7lj17d81e5ke4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iE2TDnEA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/7zp1v9y7lj17d81e5ke4.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cool! Now let's say we want to display the author data in the books' list. We can create an AuthorFragment and add it to the BookFragment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wJyfGqPu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/n75rxhs929li0rhl8toi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wJyfGqPu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/n75rxhs929li0rhl8toi.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here our problem begins.&lt;/p&gt;

&lt;p&gt;Even though we don't need the author data in the user list, since we're spreading the AuthorFragment inside the BookFragment, and the BookFragment inside the UserFragment, loading the user list will fetch more data than the component needs, and you fall into over-fetching again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NiDzwwSb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rbce40d3w59iaruwe6c0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NiDzwwSb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rbce40d3w59iaruwe6c0.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same applies if, after a few months in the project, you no longer need some data in the users' list but still do in the books' list: you might end up removing fields from the AuthorFragment to fix the users' list and introduce errors in the books' list.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G1iHTGPl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/11356z7g8ysj73ycw6va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G1iHTGPl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/11356z7g8ysj73ycw6va.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is quite a simple example, but think about it in a large and complex application. You'll definitely have headaches when you start facing performance issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;Instead of using nested fragments, I'd suggest keeping your fragments as flat as possible. Even if a fragment looks big (too many lines), it's better to have a well-controlled fragment in a single component than to face over-fetching or data-dependency issues.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0Clsy0ay--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m9v345ux01kfbh9zieyo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0Clsy0ay--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m9v345ux01kfbh9zieyo.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Nested fragments and argument dependencies
&lt;/h1&gt;

&lt;p&gt;Another issue with nested fragments is argument dependencies. Taking our previous case as an example, let's say we want to display a list of related books in the books' list. To do so, we create a fragment that takes an argument to order the related books by name, so it'll look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xJJALAzc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/c8ryu61d879p6cb0872k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xJJALAzc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/c8ryu61d879p6cb0872k.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works great! But do you remember that we have this BookFragment spread in the users' list?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--o2nFfzLt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/n9kbzjeri2oiygytvrt2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--o2nFfzLt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/n9kbzjeri2oiygytvrt2.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, since we added the RelatedBooks node with an argument, we now need to declare the same argument in the users' list query; otherwise, it'll break your GraphQL request.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CATzvr83--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4l2bjeyr1utd5p5faxgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CATzvr83--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4l2bjeyr1utd5p5faxgg.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you use the approach of splitting your GraphQL documents (queries and fragments) across multiple files, you might only hit this error at runtime. As your application grows, and if you don't have enough tests, you might only become aware of the error when your application is already deployed. And of course, we don't want our customers to see a Bad Request GraphQL error.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;The smart-components and flat-fragments solution also helps prevent this error.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZwhwCzXE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/2giuiw4q5d9fm5a3eh6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZwhwCzXE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/2giuiw4q5d9fm5a3eh6r.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Refetch after mutate, instead of using mutate result
&lt;/h1&gt;

&lt;p&gt;This one divides opinions in the community, especially because Apollo Client's mutation options include refetchQueries, which lets you specify queries to refetch after the mutation completes. The main problem with this approach is that it throws away one of GraphQL's core ideas.&lt;br&gt;
A GraphQL mutation is designed to send a write to the server and return a read as its result. So every time you call a mutation, you can also get query data back from it. That's why using refetchQueries after a mutation may be considered an anti-pattern: you can design your mutation to return the data you need and update your cache from that result, so a single request to the server is enough.&lt;br&gt;
If you use refetchQueries instead, you'll hit your server multiple times: once for the mutation and N more times, one for each query you listed in refetchQueries, which can cause performance and bandwidth issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;So, as already mentioned, the ideal solution is to design your mutations to return what your components need to update the cache, instead of calling all the queries over again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0a6wjLl5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8vhlxnx9dwye293gonx5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0a6wjLl5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8vhlxnx9dwye293gonx5.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Final thoughts
&lt;/h1&gt;

&lt;p&gt;So that's it. I just wanted to share some of my past experiences and show how not following good practices at the beginning can lead to tremendous issues and refactoring in the future. I hope this article helps someone else. I'm also open to suggestions and opinions!&lt;/p&gt;

</description>
      <category>graphql</category>
      <category>frontend</category>
      <category>apollo</category>
      <category>relay</category>
    </item>
    <item>
      <title>The Coder Lair</title>
      <dc:creator>Tiago Souto</dc:creator>
      <pubDate>Sat, 01 Feb 2020 16:50:51 +0000</pubDate>
      <link>https://dev.to/tiagocsouto/the-coder-lair-113g</link>
      <guid>https://dev.to/tiagocsouto/the-coder-lair-113g</guid>
      <description>&lt;p&gt;For RPG and fantasy lovers, there's a well-known rule that shall always be followed: &lt;del&gt;don't split the party&lt;/del&gt; NEVER stole a treasure from a dragon's lair - not while the dragon lives.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Dragons in Fantasy
&lt;/h1&gt;

&lt;p&gt;For those not so familiar with the lore, I'll tell you that some kinds of dragons have the hobby of hoarding piles of treasure. Unfortunately, they don't work 40 hours a week to buy their preciousness, nor do they take quests for the king in exchange for a bag of gold. Instead, they generally take it from the cities they attack or from adventurers who dare to enter their lairs. They like to keep their treasure in the very core of the lair, and the tales tell that they are aware of every single piece of gold that lies within their halls. They form a kind of connection with their treasure that allows them to recognize and track down even the smallest copper coin that's miles away from them. And they absolutely HATE losing any piece of their hoard.&lt;/p&gt;

&lt;p&gt;Now imagine you, a naive (or greedy) adventurer who sneaks into the dragon's lair to try to take something from it. No matter how quietly you slip away, the dragon will know you've been there, and it will know which treasure you're carrying - so you'd better be prepared. It will hunt you down without cease until it smashes you to the ground.&lt;/p&gt;

&lt;h5&gt;
  
  
  &lt;em&gt;TL;DR if you touch a dragon's treasure, you are in really big trouble&lt;/em&gt;
&lt;/h5&gt;

&lt;h1&gt;
  
  
  Starting the analogy
&lt;/h1&gt;

&lt;p&gt;Ok, now that I've introduced you to the danger of touching a dragon's treasure, and how the dragon is aware of everything around its lair, let's move on.&lt;/p&gt;

&lt;p&gt;I want you to think of it in another way now: you are the dragon and the codebase you're working on is your lair.&lt;/p&gt;

&lt;h2&gt;
  
  
  Welcome, adventurer, to the Coder Lair.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tvCBXmGB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/wkecgfotccakixnzzur3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tvCBXmGB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/wkecgfotccakixnzzur3.jpeg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You must be aware of everything that comes in and out of the codebase. Every corner. Everyone who touches it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But why would I like to do that?&lt;/em&gt; You might be asking. Well, I'll tell you that by completely understanding your codebase you can achieve lots of benefits that I'm going to explore more in this article.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Coder Lair
&lt;/h1&gt;

&lt;p&gt;You know that dragons have a strong connection with their lairs and their treasures, right? They can even cast especially powerful spells when fighting inside their lairs, all thanks to their knowledge of and connection with the environment. So maybe we could say you can become as powerful as a dragon by knowing YOUR environment (the codebase)?&lt;/p&gt;

&lt;h5&gt;
  
  
  &lt;em&gt;TL;DR the environment brings power, rule yours&lt;/em&gt;
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8FhA_nQ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8jnauh263kwiaql0os0l.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8FhA_nQ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/8jnauh263kwiaql0os0l.jpeg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So let's list a couple of powers you might get from the &lt;strong&gt;Coder Lair&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Support your peers&lt;/strong&gt;&lt;br&gt;
If you know the details of how things work, you're more likely to be able to help others with troubleshooting and problem-solving, saving everyone time. That's huge value for you and for the team.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Raise your seniority&lt;/strong&gt;&lt;br&gt;
The more you learn from the codebase, the faster you'll be in your deliveries, and you might start making fewer mistakes since you've seen similar working solutions a couple of times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Find patterns&lt;/strong&gt;&lt;br&gt;
You'll become better at reasoning about the code's structure, and it'll be easier to think about solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reuse code&lt;/strong&gt;&lt;br&gt;
Since you now have broad visibility of what's going on and how everything is connected, you'll be able to recognize code that can be reused in different places. It saves a lot of time and makes your code less error-prone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Predict possible bugs&lt;/strong&gt;&lt;br&gt;
You're more aware of what works, what doesn't work, and what could be refactored. So you are more likely to know where the code can fail and fix it before the client gets it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Find quicker solutions and debug faster&lt;/strong&gt;&lt;br&gt;
Your troubleshooting skills will be sharper. You'll be more likely to know what caused an error from its message alone, and you'll have a better idea of where to add breakpoints or logs and what state likely triggered it, since you've read that code so often.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Feature insights&lt;/strong&gt;&lt;br&gt;
Since you have a nice overview of what your code delivers and its flaws, it'll be easier for you to think about improvements and features you can add to it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Know current limitations&lt;/strong&gt;&lt;br&gt;
Never again fall into the pitfall of "this is going to be quick". You know the limitations of your current code, and you can raise a flag to the team before something "easy" that can't be handled yet is supposed to be addressed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Estimate tasks more precisely&lt;/strong&gt;&lt;br&gt;
You know whether the requested task has a similar solution somewhere else you can borrow from, whether you'll need to write it from scratch, or even whether the data isn't structured to handle it yet. With that in mind, you can better estimate the effort needed to complete the task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prevent regressions&lt;/strong&gt;&lt;br&gt;
You know that bugfix you shipped that broke another page you didn't even know existed? Well, that's a gap in the knowledge of your lair. With a complete overview of it, you'll introduce fewer (or no) regressions to your code.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;"If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle." Sun Tzu, Art of War&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And do you want to know the Almighty Dragon Breath move? You can apply most of these items from a team perspective: caring not only about your own code but also about your peers'. We're going to talk more about it in the next topic.&lt;/p&gt;

&lt;h1&gt;
  
  
  Into the Coder Lair
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yi46_R-T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bjdalr1s1uyfgdq5rc5l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yi46_R-T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bjdalr1s1uyfgdq5rc5l.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let me share with you a few tips that might help you get into your Coder Lair:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Read the codebase as much as possible&lt;/strong&gt;&lt;br&gt;
When you get a task to work on, avoid touching just your single line; read the surrounding code instead. Try to understand what it does and how it relates to other files, and think about how you'd explain it to someone else if asked. Little by little, you'll get a nice overview of the entire project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Adopt a design system&lt;/strong&gt;&lt;br&gt;
Adding a design system to your codebase will help you and the team to make sure everything is written following the same pattern. You'll reason faster about the code and move faster in your product.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Adopt a git flow or similar pattern&lt;/strong&gt;&lt;br&gt;
This is important to make sure things are getting into the code by following some strict rules, it gives you a lot of control of what's happening.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Review every single PR&lt;/strong&gt;&lt;br&gt;
Just like the dragon knows when a gold coin is put or removed from its pile of treasures, you must be aware of everything that's coming and leaving the codebase. Review even merged PRs that you couldn't do while it was open.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Understand what problem the PR is solving&lt;/strong&gt;&lt;br&gt;
This gives you an overview of what's going on, where the product is going to, and if that's the proper solution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Understand how the PR is solving the problem&lt;/strong&gt;&lt;br&gt;
This is a crucial step in the code review process. You must understand how that code is fixing the problem. You'll learn a lot from it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Volunteer to work on the hardest tasks&lt;/strong&gt;&lt;br&gt;
Your muscles only grow when you lift heavy weights. You can't expect to stay sharp if you only take quick wins. The more complex a task is, the more you'll learn from it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prefer working on bugs over features&lt;/strong&gt;&lt;br&gt;
Okay, I know many people will disagree with this one. My point is: to add new features, you may not need to know the codebase; you could just write everything from scratch if you wanted (I don't recommend it!), and if you don't have a strong review process, your team most likely wouldn't even notice. But to fix a bug, you generally need to keep the same patterns as the existing code, read more of it, and think about debugging and regressions. And you can always learn from the features by doing code reviews, as item #4 suggests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Write and read tests&lt;/strong&gt;&lt;br&gt;
Testing code is an efficient way of getting to know it. You can get an overview of what is expected from it to do, how it should work, and how it shouldn't.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enforce type-checking&lt;/strong&gt;&lt;br&gt;
Similar to tests, type-checking is a strong ally when it comes to understanding and maintaining the code, making it much clearer to reason about why something is written the way it is.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Talk about the code with your peers&lt;/strong&gt;&lt;br&gt;
There's no omniscience. You'll never know everything, and you can always learn from every team member, from the 10-year senior to the junior who just got their first job. Be open to talking about the code with your peers, hearing their opinions, and sharing knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Do the impossible&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Man! Are you nuts? I have limited hours and too many tasks to deliver, how would I be able to do all these things? How will I remember the whole codebase? Ha! You must be kidding!!! This article is total bullshit!&lt;/em&gt;&lt;br&gt;
Just take this with you: the closer to the impossible, the farther from mediocrity.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Final thoughts
&lt;/h1&gt;

&lt;p&gt;Well, that's it for now. I know it's quite hard work - I don't even do it as much as I'd like to - but it's something I'm always willing to make time for: thoughts that inspire me and guide my career as I try to keep getting better than I was in the past.&lt;br&gt;
I hope this article helps at least 1 person become that fearsome and Almighty Dragon! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y-_6laP4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/h7eo53jjupwqk11iu0l5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y-_6laP4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/h7eo53jjupwqk11iu0l5.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to share your thoughts on this topic and what you do to control your Coder Lair!&lt;/p&gt;

&lt;p&gt;Twitter: &lt;a class="comment-mentioned-user" href="https://dev.to/tiagocsouto"&gt;@tiagocsouto&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>codequality</category>
      <category>career</category>
      <category>productivity</category>
      <category>motivation</category>
    </item>
  </channel>
</rss>
