
Roshan Sanjeewa Wijesena

AI Chatbot with Pinecone Vector DB

Vector databases like Pinecone are good candidates for storing the custom data you want to use in your next AI application.

In this blog post I will be using the Pinecone vector database, which is easy to use and cloud native.

I will also be using the OpenAI API as my LLM.

First, get your Pinecone API key: https://app.pinecone.io/organizations/-/projects

You will need an OpenAI API key as well to call the OpenAI models.

Install the Python libraries below:

!pip install langchain
!pip install pinecone-client
!pip install openai
!pip install pypdf
!pip install tiktoken
!pip install langchain-community
%pip install --upgrade --quiet  langchain-pinecone langchain-openai langchain
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings
import os

Create a folder in the current workspace and upload any PDF files with your data into it. This will be the custom data used to ground your chat agent.

!mkdir pdfs
loader = PyPDFDirectoryLoader("pdfs")
data = loader.load()

Split the loaded documents into smaller chunks before inserting them into the vector database:

# chunk_size/chunk_overlap values are illustrative; tune them for your data
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
text_chunks = text_splitter.split_documents(data)

Set your API keys:

os.environ["OPENAI_API_KEY"] = "<Key>"

os.environ["PINECONE_API_KEY"] = "<Key>"

Use OpenAIEmbeddings to embed your text chunks:

embeddings = OpenAIEmbeddings()

Use the LangChain PineconeVectorStore module to store the data in Pinecone.
Before that, make sure you have created a new index in the Pinecone database and picked a namespace; in my case "roshan".

from langchain_pinecone import PineconeVectorStore
index = "vectorone"
docsearch = PineconeVectorStore.from_documents(text_chunks, embeddings, index_name=index, namespace="roshan")

Now, if you check your Pinecone index, you should be able to see the stored vectors.


You can run a query to ask questions about your uploaded PDF data:

query = "what is Scaled Dot-Product Attention?"

# Optional: inspect the raw nearest-neighbour matches for the query
docs = docsearch.similarity_search(query)

# Build a retrieval QA chain that answers from the stored chunks
llm = OpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
qa.run(query)

You can build a small command-line chatbot to see how it works:

import sys

while True:
  user_input = input("Input Prompt: ")
  if user_input == "exit":
    sys.exit()
  if user_input == "":
    continue
  result = qa.run({"query": user_input})
  print(result)

