
Akmal Chaudhri for SingleStore


Quick tip: SingleStoreDB integration with LangChain

Abstract

Recently, SingleStoreDB has been integrated with LangChain. In this short article, we'll walk through a quick example to demonstrate the integration and how easy it is to use these two technologies together.

Introduction

LangChain is a software development framework designed to simplify the creation of applications that use Large Language Models (LLMs). In this short article, we'll streamline the example from a previous article, which was developed before the SingleStoreDB LangChain integration was announced, and show how easy it is to use SingleStoreDB and LangChain together.

As described in the previous article, we'll follow the instructions to create a SingleStoreDB Cloud account, Workspace Group, Workspace, and Notebook.

Fill out the Notebook

First, we'll install some libraries:

!pip install langchain --quiet
!pip install openai --quiet
!pip install singlestoredb --quiet
!pip install tiktoken --quiet
!pip install unstructured --quiet

Next, we'll read in a PDF document. This is an article by Neal Leavitt titled "Whatever Happened to Object-Oriented Databases?" Object-oriented databases (OODBs) were an emerging technology during the late 1980s and early 1990s. To let the notebook reach the file, we'll add leavcom.com to the firewall by selecting the Edit Firewall option in the top right. Once the address has been added to the firewall, we'll read the PDF file:

from langchain.document_loaders import OnlinePDFLoader

loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")

data = loader.load()

LangChain's OnlinePDFLoader fetches the PDF from the URL and loads its text as LangChain documents, so we don't need to download or parse the file ourselves.

Next, we'll get some data on the document:

from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

The output should be:

You have 1 document(s) in your data
There are 13040 characters in your document

We'll now split the document into chunks (which we'll call pages) of at most 2,000 characters each, with no overlap between them:

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 2000, chunk_overlap = 0)
texts = text_splitter.split_documents(data)

print(f"You have {len(texts)} pages")
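To make the chunking step more concrete, here is a minimal sketch of fixed-size splitting in plain Python. It is not how RecursiveCharacterTextSplitter works internally: that class prefers to break on separators such as paragraph and line breaks, so its chunks can be shorter than chunk_size and the number of pages it reports may differ. The function name split_fixed and the placeholder text are assumptions made for illustration.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 2000, chunk_overlap: int = 0) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters.

    Hypothetical helper for illustration; not part of LangChain.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")
    # Each chunk starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Placeholder text with the same length as the document in this article.
sample = "x" * 13040
chunks = split_fixed(sample)

print(len(chunks))      # 7
print(len(chunks[-1]))  # 1040 (the final, shorter chunk)
```

With no overlap, joining the chunks gives back the original text, which is a quick way to check that nothing was lost.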

Next, we'll set our OpenAI API Key:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

and use LangChain's OpenAIEmbeddings:

from langchain.embeddings import OpenAIEmbeddings

embedder = OpenAIEmbeddings()

Now we'll store the text with the vector embeddings in the database system. This is much simpler using the LangChain integration:

from langchain.vectorstores import SingleStoreDB

os.environ["SINGLESTOREDB_URL"] = "admin:<password>@<host>:3306/pdf_db"

docsearch = SingleStoreDB.from_documents(
    texts,
    embedder,
    table_name = "pdf_docs2",
)

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.
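As a quick sanity check on the connection string, the sketch below pulls it apart with Python's standard library to show which piece is which. The password and host used here are hypothetical placeholders, not real values, and this is only an illustration of the string's layout, not how the integration itself reads it.

```python
from urllib.parse import urlsplit

# Hypothetical credentials and host, for illustration only.
url = "admin:s3cret@db.example.com:3306/pdf_db"

# urlsplit needs a leading "//" to treat the text as user:password@host:port.
parts = urlsplit("//" + url)

# The path component holds the database name after the leading slash.
database = parts.path.lstrip("/")

print(parts.username)  # admin
print(parts.hostname)  # db.example.com
print(parts.port)      # 3306
print(database)        # pdf_db
```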

We can now ask a question, as follows:

query_text = "Will object-oriented databases be commercially successful?"

docs = docsearch.similarity_search(query_text)

print(docs[0].page_content)

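Again, the integration does the work for us: similarity_search embeds the question and returns the stored chunks that are most similar to it, without any SQL on our part.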
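For intuition about what a similarity search does, here is a minimal sketch that ranks a few texts against a query by cosine similarity. The three-dimensional vectors are made up for illustration (real embeddings have many more dimensions), the helper names are hypothetical, and the metric the integration actually uses may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], stored: Dict[str, List[float]], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy "embeddings", invented for this example.
stored = {
    "OO databases found niche markets": [0.9, 0.1, 0.0],
    "Relational databases dominate": [0.1, 0.9, 0.0],
    "CAD systems store complex objects": [0.5, 0.2, 0.7],
}
query_vec = [0.8, 0.2, 0.1]

best = top_k(query_vec, stored, k=2)
for score, text in best:
    print(f"{score:.3f}  {text}")
```

The first text scores highest because its vector points in nearly the same direction as the query vector.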

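Finally, we can use a GPT model to provide an answer, based on the earlier question and the best-matching text. The code below uses the openai.ChatCompletion interface, which is available in versions of the openai package before 1.0: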

import openai

prompt = f"The user asked: {query_text}. The most similar text from the document is: {docs[0].page_content}"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response['choices'][0]['message']['content'])

Here is some example output:

While object-oriented databases are still in use and have solid niche markets,
they have not gained as much commercial success as relational databases.
Observers previously anticipated that OO databases would surpass relational
databases, especially with the emergence of multimedia data on the internet,
but this prediction did not come to fruition. However, OO databases continue
to be used in specific fields, such as CAD and telecommunications. Experts
have varying opinions on the future of OO databases, with some predicting
further decline and others seeing potential growth.
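The prompt in this example includes only the top match, docs[0]. One possible variation, sketched below, is to pass several matches to the model while keeping the prompt under a size budget. The function build_prompt, its max_chars parameter, and the sample passages are assumptions for illustration and not part of LangChain or the OpenAI library.

```python
from typing import List


def build_prompt(question: str, passages: List[str], max_chars: int = 6000) -> str:
    """Combine a question with retrieved passages, within a character budget.

    Hypothetical helper: passages are added in order until the next one
    would push the combined passage text past max_chars.
    """
    selected: List[str] = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_chars:
            break
        selected.append(passage)
        used += len(passage)
    # Number the passages so the model can tell them apart.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(selected, start=1))
    return (
        f"The user asked: {question}\n\n"
        f"The most similar passages from the document are:\n\n{context}"
    )


# Made-up passages standing in for search results.
passages = [
    "OO databases have solid niche markets.",
    "Relational databases remain dominant.",
]
prompt = build_prompt("Will object-oriented databases be commercially successful?", passages)
print(prompt)
```

In the notebook, the passages could be taken from the search results with [d.page_content for d in docs], and the resulting prompt passed as the user message in the same way as before.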

Summary

Comparing the solution in this article with the previous one, we can see that the LangChain integration is simpler. For example, we did not need to write any SQL statements. The framework abstracted the database access, which saved time and allowed us to focus on the business problem.
