Roshan Sanjeewa Wijesena

AI Chatbot with Pinecone Vector DB

Vector databases like Pinecone are a good place to store the custom data you want to use in your next AI application.

In this blog post I will be using the Pinecone vector database, which is easy to use and cloud native.

I will also be using the OpenAI API as my LLM.

First, get your Pinecone API key - https://app.pinecone.io/organizations/-/projects

You will need an OpenAI API key as well to call the OpenAI models.

Install the Python libraries below:

!pip install langchain
!pip install pinecone-client
!pip install openai
!pip install pypdf
!pip install tiktoken
!pip install langchain-community
%pip install --upgrade --quiet  langchain-pinecone langchain-openai langchain
# Deduplicated imports, using the current langchain-community / langchain-openai paths
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import os

Create a folder in the current workspace and upload any PDF file with your data into it. This will be the custom data your chat agent answers from.

!mkdir pdfs
loader = PyPDFDirectoryLoader("pdfs")
data = loader.load()

Split the loaded data into smaller chunks before inserting into the vector database

# Define the splitter first; chunk_size/chunk_overlap are reasonable defaults you can tune
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)
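To build intuition for what the splitter produces, here is a plain-Python sketch of fixed-size chunking with overlap. The sizes and the `chunk_text` helper are made up for illustration; LangChain's recursive splitter is smarter about splitting at natural boundaries like paragraphs and sentences.

```python
def chunk_text(text, size=20, overlap=5):
    """Slice text into fixed-size windows that overlap slightly,
    so an idea cut at a chunk boundary still appears whole in a neighbor."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars shared
    return chunks

pieces = chunk_text("Scaled dot-product attention is the core of the Transformer.")
print(len(pieces), repr(pieces[0]))  # 4 'Scaled dot-product a'
```

The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.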

Set your OpenAI and Pinecone API keys:

os.environ["OPENAI_API_KEY"] = "<Key>"

os.environ["PINECONE_API_KEY"] = "<Key>"

Use OpenAIEmbeddings to embed your text chunks:

embeddings = OpenAIEmbeddings()
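Under the hood, both indexing and querying come down to comparing embedding vectors: Pinecone ranks stored vectors by their similarity to the query vector, typically cosine similarity. A toy illustration with made-up 2-d vectors (real OpenAI embeddings have 1536 dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [1.0, 0.0]  # stand-in for the embedded user question
chunks = {
    "attention chunk": [0.9, 0.1],   # points roughly the same way as the query
    "unrelated chunk": [0.0, 1.0],   # orthogonal to the query
}
best = max(chunks, key=lambda name: cosine(query_vec, chunks[name]))
print(best)  # attention chunk
```

This is why the embedding model used at query time must be the same one used at indexing time: the vectors only compare meaningfully within one embedding space.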

Use LangChain's PineconeVectorStore module to store the data in Pinecone.
Before that, make sure you have created an index in your Pinecone project; the namespace (in my case "roshan") is created automatically on the first upsert.
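If you haven't created the index yet, it can also be done in code with the Pinecone v3+ client. This is a sketch under assumptions: the `ensure_index` helper name is mine, the cloud/region values are examples, and 1536 dimensions matches OpenAI's text-embedding-ada-002 output.

```python
import os

def ensure_index(name: str, dimension: int = 1536) -> None:
    """Create a serverless Pinecone index if it doesn't already exist.
    dimension must match the embedding model (ada-002 produces 1536-d vectors)."""
    from pinecone import Pinecone, ServerlessSpec  # pinecone-client v3+
    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    if name not in pc.list_indexes().names():
        pc.create_index(
            name=name,
            dimension=dimension,
            metric="cosine",  # matches how the embeddings will be compared
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )

# ensure_index("vectorone")  # run once, with PINECONE_API_KEY set
```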

from langchain_pinecone import PineconeVectorStore
index = "vectorone"
docsearch = PineconeVectorStore.from_documents(text_chunks, embeddings, index_name=index, namespace="roshan")

Now if you check your Pinecone database, you should be able to see the data.


You can now run queries to ask questions about your uploaded PDF data:

query = "what is Scaled Dot-Product Attention?"
docs = docsearch.similarity_search(query)  # quick sanity check: the top matching chunks
llm = OpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
qa.run(query)
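The chain_type="stuff" above simply "stuffs" all retrieved chunks into a single prompt for the LLM. A minimal sketch of that idea - `build_stuff_prompt` is a hypothetical helper, not LangChain's actual prompt template:

```python
def build_stuff_prompt(question, docs):
    """Concatenate retrieved chunks into one context block, then append the question."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_stuff_prompt(
    "what is Scaled Dot-Product Attention?",
    [
        "Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V",
        "The Transformer uses multi-head attention.",
    ],
)
print(prompt)
```

"Stuffing" is the simplest chain type and works well while the retrieved chunks fit in the model's context window; for larger result sets LangChain also offers map_reduce and refine.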

You can build a small command-line chatbot to see how it works:

import sys
while True:
  user_input = input("Input Prompt: ")
  if user_input == "exit":
    sys.exit()
  if user_input == '':
    continue
  result = qa.run({'query': user_input})
  print(result)

