<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vardhanam Daga</title>
    <description>The latest articles on DEV Community by Vardhanam Daga (@vardhanam).</description>
    <link>https://dev.to/vardhanam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1227320%2F39fa9981-98c9-4757-84e7-de4029a35406.jpg</url>
      <title>DEV Community: Vardhanam Daga</title>
      <link>https://dev.to/vardhanam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vardhanam"/>
    <language>en</language>
    <item>
      <title>Using Qdrant’s Discovery API for Video Search</title>
      <dc:creator>Vardhanam Daga</dc:creator>
      <pubDate>Tue, 16 Jan 2024 13:41:13 +0000</pubDate>
      <link>https://dev.to/vardhanam/using-qdrants-discovery-api-for-video-search-3dpd</link>
      <guid>https://dev.to/vardhanam/using-qdrants-discovery-api-for-video-search-3dpd</guid>
      <description>&lt;h2&gt;
  
  
  How Video Search Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xY8WtIMR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ffj62q5trxdh42mzdw88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xY8WtIMR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ffj62q5trxdh42mzdw88.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this blog, we are going to look at how semantic search works for a video database. Instead of matching raw keywords, the description of a video is treated as a short narrative, which makes querying far more natural. But first, let’s review how most video search works today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Metadata-Based Search:&lt;/strong&gt; This is one of the most common methods. Videos are tagged with various metadata such as titles, descriptions, tags, and other relevant information. When you search, the system looks for keywords in this metadata to find matching videos. This search is limited to the keywords contained in the video’s metadata, and misses the semantic intent behind those descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Transcript or Closed Caption Search:&lt;/strong&gt; With the advent of better speech recognition technology, many platforms have started using automated transcripts or closed captions for videos. This allows for text-based searching within the video content. When you search for a phrase, the system checks the transcript for matches and returns videos containing those words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Thumbnail-Based Search:&lt;/strong&gt; Some search algorithms analyze the thumbnails of videos to determine their content and relevance to a search query. This method, however, is less precise than the others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Content-Based Video Retrieval (CBVR):&lt;/strong&gt; This is a more advanced approach where the search algorithm analyzes the actual content of the video (like objects, scenes, actions, faces, etc.) using image and video processing techniques. This method can be resource-intensive and is not yet widely implemented in most commercial video search platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. User Interaction Data:&lt;/strong&gt; Search algorithms also take into account user interaction data like views, likes, comments, and watch history to determine the relevance and ranking of videos in search results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Contextual and Semantic Analysis:&lt;/strong&gt; Some advanced search systems also perform contextual and semantic analysis of the video description and user query to understand the intent behind a search and the context of the content in the videos.&lt;/p&gt;

&lt;p&gt;In this article, we are going to delve deeper into this last point, i.e., semantic analysis of video descriptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Semantic Analysis?
&lt;/h2&gt;

&lt;p&gt;Semantic analysis involves understanding the themes, narratives, and concepts presented in the video description, beyond just the visible keywords. For example, a video description might have the words ‘a person running’ but, semantically, it could be about ‘fitness’, ‘perseverance’, or even a ‘sports brand advertisement’.&lt;/p&gt;

&lt;p&gt;Let's consider a more nuanced example of semantic video search:&lt;/p&gt;

&lt;p&gt;Suppose our Search Query is: 'Easy dinner for busy weekdays’.&lt;/p&gt;

&lt;p&gt;A Traditional Keyword-Based Search Approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Would focus on keywords like 'dinner,' 'busy,' and 'weekdays.'&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Might return generic dinner recipes or videos titled with these specific keywords.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the other hand, a Semantic Video Search approach would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand the Query Semantics: Recognize that the user is looking for a meal that is quick and easy to prepare, suitable for a busy weekday schedule. The system might also consider semantically related concepts such as 'meal prep ideas,' 'quick healthy dinners,' or 'family-friendly quick recipes,' expanding the search scope to include relevant content that doesn't necessarily match the exact query terms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the system would search for videos whose descriptions align with the concept of 'easy' and 'quick' preparation, even if those exact words aren't used in the video's title, description, or content. For example, it might prioritize videos with concepts such as '30 minutes or less,' 'simple ingredients,' or 'one-pot meals.'&lt;/p&gt;
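&lt;p&gt;To make the limitation concrete, here is a minimal, plain-Python sketch (no model involved; the query and descriptions below are made up for illustration) of why naive keyword overlap misses semantically relevant videos:&lt;/p&gt;

```python
# Naive keyword search: a video "matches" if it shares any word with the query.
def keyword_match(query: str, description: str) -> bool:
    query_terms = set(query.lower().split())
    desc_terms = set(description.lower().split())
    return bool(query_terms & desc_terms)

query = "easy dinner for busy weekdays"
descriptions = [
    "a 30-minute one-pot pasta recipe",  # semantically relevant, zero shared words
    "my busy morning routine",           # shares 'busy', but irrelevant
]

matches = [d for d in descriptions if keyword_match(query, d)]
print(matches)
```

&lt;p&gt;The only hit is the irrelevant 'busy morning' video, while the one-pot recipe is missed entirely; this is exactly the gap semantic search closes.&lt;/p&gt;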

&lt;h2&gt;
  
  
  Sentence Transformers
&lt;/h2&gt;

&lt;p&gt;To generate embeddings for our video descriptions, we’ll use the &lt;a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2"&gt;‘all-MiniLM-L6-v2’&lt;/a&gt; Sentence Transformers model from Hugging Face. This model maps the sentences of our video descriptions to a 384-dimensional vector space, and is well suited to tasks like semantic search. &lt;/p&gt;
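&lt;p&gt;Once descriptions live in that vector space, search reduces to comparing vectors, typically with cosine similarity. Here is a self-contained sketch using tiny made-up 3-dimensional vectors (the real model produces 384 dimensions):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings, invented purely for illustration.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "quick one-pot dinner": [0.8, 0.2, 0.1],
    "marathon training tips": [0.1, 0.2, 0.9],
}

# The best match is the description whose vector points in nearly
# the same direction as the query vector.
best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
print(best)
```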

&lt;h2&gt;
  
  
  Building a Semantic Video Search Engine with Qdrant
&lt;/h2&gt;

&lt;p&gt;In this article, we’ll use a collection of GIFs (which are, essentially, short low-resolution video snippets) and use the sentence-transformers model to generate their embeddings. Then we’ll store these embeddings in &lt;a href="https://qdrant.tech"&gt;Qdrant&lt;/a&gt;, a vector database designed for efficient storage, search, and retrieval of vector embeddings.&lt;/p&gt;

&lt;p&gt;An interesting feature of Qdrant is its Discovery API. This API is particularly notable for refining search parameters to achieve greater precision, focusing on the concept of 'context.' This context refers to a collection of positive-negative pairs that define zones within a space, dividing the space into positive or negative segments. The search algorithm then prioritizes points based on their inclusion within positive zones or their avoidance of negative zones.&lt;/p&gt;

&lt;p&gt;The Discovery API offers two main modes of operation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Discovery Search: This utilizes a target point to find the most relevant points in the collection, but only within preferred areas. It's essentially a controlled search operation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mH4o__v9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9huk9ygufuw33ztl280h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mH4o__v9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9huk9ygufuw33ztl280h.png" alt="Image description" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(from Qdrant’s Documentation)&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Context Search: Similar to discovery search but without a target point. Instead, it uses 'context' to navigate the Hierarchical Navigable Small World (HNSW) graph towards preferred zones. This mode is expected to yield diverse results, not centered around one point, and is suitable for more exploratory approaches in navigating the vector space.&lt;/li&gt;
&lt;/ol&gt;
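&lt;p&gt;The following plain-Python sketch illustrates the context idea described above (loosely adapted from the documentation; it is a conceptual toy, not Qdrant's actual HNSW-based implementation). A candidate satisfies a positive-negative pair when it is more similar to the pair's positive example than to its negative one; discovery search then ranks candidates first by how many pairs they satisfy, then by similarity to the target:&lt;/p&gt;

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def discovery_rank(candidates, target, pairs):
    """Rank candidate vectors: pairs satisfied first, target similarity second."""
    def rank_key(vec):
        satisfied = sum(cos_sim(vec, pos) > cos_sim(vec, neg) for pos, neg in pairs)
        return (satisfied, cos_sim(vec, target))
    return sorted(candidates, key=lambda name: rank_key(candidates[name]), reverse=True)

# Toy 2-d vectors, purely illustrative.
candidates = {"tiger clip": [0.9, 0.1], "cat clip": [0.1, 0.9]}
target = [1.0, 0.0]                  # stands in for the 'tiger' embedding
pairs = [([0.8, 0.2], [0.0, 1.0])]   # (positive 'animals', negative 'cats')

print(discovery_rank(candidates, target, pairs))
```

&lt;p&gt;The cat clip falls outside the pair's positive zone, so it ranks below the tiger clip even before target similarity is considered.&lt;/p&gt;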

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tqDz5Gds--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kc1ek035em6d4xlusjnn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tqDz5Gds--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kc1ek035em6d4xlusjnn.png" alt="Image description" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(from Qdrant’s Documentation)&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Environment and Code
&lt;/h2&gt;

&lt;p&gt;We’ll use data from the &lt;a href="https://github.com/raingo/TGIF-Release"&gt;raingo/TGIF-Release&lt;/a&gt; GitHub repository. The data is a TSV file containing 10,000 rows of GIF URLs and descriptions. Download it with the following command in your Colab notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!wget https://github.com/raingo/TGIF-Release/raw/master/data/tgif-v1.0.tsv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, install all the dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install requests pillow transformers qdrant-client sentence-transformers accelerate tqdm sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, launch a cluster on &lt;a href="https://cloud.qdrant.io/"&gt;Qdrant Cloud&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VZmVjT-H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tx0eohgerr3lwvhjb2pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VZmVjT-H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tx0eohgerr3lwvhjb2pk.png" alt="Image description" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Retain the API key of the cluster. In this cluster, we’ll create a collection for storing the vector embeddings (also known as points).&lt;/p&gt;

&lt;p&gt;Load the sentence-transformers model and create a Qdrant client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.http import models

print("[INFO] Loading the model...")
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

client = QdrantClient(
    url="https://xxxxxx-xxxxx-xxxxx-xxxx-xxxxxxxxx.us-east.aws.cloud.qdrant.io:6333",
    api_key="&amp;lt;your-api-key&amp;gt;",
)
print("[INFO] Client created...")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll now create embeddings of the descriptions of our videos. We’ll also create a list of sequential IDs for the vectors. Additionally, we’ll build a list of payloads containing the URL and description of each video.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv

# Replace 'your_file.tsv' with the path to your TSV file
file_path = '/content/tgif-v1.0.tsv'

# Lists to store the data
descriptions = []
payload = []


# Reading the TSV file
with open(file_path, 'r', encoding='utf-8') as file:
    tsv_reader = csv.reader(file, delimiter='\t')


    # Iterate through each row in the TSV file
    for row in tsv_reader:
        if len(row) &amp;gt;= 2:  # Checking if the row has at least two elements (URL and description)
            url, description = row[0], row[1]
            descriptions.append(description)
            payload.append({"url": url, "description": description})


idx = list(range(1, len(descriptions) + 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create embeddings of the description.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(descriptions).tolist()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a Qdrant collection with the name gif_collection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("[INFO] Creating qdrant data collection...")
client.create_collection(
    collection_name="gif_collection",
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we upload the records to our collection. We’ll only upload the first 2500 records for convenience’s sake.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#uploading the records to client
print("[INFO] Uploading data records to data collection...")
client.upsert(
    collection_name="gif_collection",
    points=models.Batch(
        ids=idx[:2500],
        payloads=payload[:2500],
        vectors=embeddings[:2500],
    ),
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have our Vector DB ready, and we can use the Discovery API to query it.&lt;/p&gt;

&lt;p&gt;Let’s set our target to ‘tiger’ videos, with ‘animals’ as the positive context; we don’t want any cat videos, so we’ll set ‘cats’ as the negative context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;target = 'animals'
positive = 'tiger'
negative = 'cats'
emb_target = model.encode(target).tolist()
emb_positive = model.encode(positive).tolist()
emb_negative = model.encode(negative).tolist()

hits = client.discover(
    collection_name="gif_collection",
    target = emb_target,
    context = [{'positive': emb_positive, 'negative': emb_negative}],
    limit=5,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s see what the top 5 results look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ScoredPoint(id=53, version=0, score=1.6944957, payload={'description': 'the tiger is running really fast in the garden', 'url': 'https://38.media.tumblr.com/8c14aa571911985d6d774762f8159452/tumblr_n4gk7efcMw1spy7ono1_400.gif'}, vector=None, shard_key=None),
 ScoredPoint(id=1120, version=0, score=1.6891048, payload={'description': 'a giant tiger creature jumps through the air.', 'url': 'https://38.media.tumblr.com/be596acc2860e691e578c2a64d044d94/tumblr_nq0mhrqyAf1u4fhlko1_250.gif'}, vector=None, shard_key=None),
 ScoredPoint(id=1269, version=0, score=1.6566299, payload={'description': 'a young tiger cub nestles down to sleep next to a large teddy bear.', 'url': 'https://38.media.tumblr.com/b9762b528bbd4a61f4b1c6ac5352b92f/tumblr_nqi6joMEOR1uv0y2ro1_250.gif'}, vector=None, shard_key=None),
 ScoredPoint(id=4, version=0, score=1.6384017, payload={'description': 'an animal comes close to another in the jungle', 'url': 'https://38.media.tumblr.com/9f659499c8754e40cf3f7ac21d08dae6/tumblr_nqlr0rn8ox1r2r0koo1_400.gif'}, vector=None, shard_key=None),
 ScoredPoint(id=2333, version=0, score=1.6347136, payload={'description': 'a view from car where a lion is looking in the tries to get in.', 'url': 'https://33.media.tumblr.com/2aeb5012cde835e61f2e35cd49960971/tumblr_nnums3GbD71ssgyoro1_400.gif'}, vector=None, shard_key=None)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get three video descriptions containing the word ‘tiger’, followed by a description containing the word ‘animal’, and, finally, a video of a ‘lion’. This is as expected: most of the results match our target, ‘tiger’. We got one ‘animal’ video because that is our positive context, and not a single ‘cat’ video, since that is our negative context. The ‘lion’ video appears because ‘lion’ is semantically close to ‘tiger’.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, the process of building a semantic video search engine using sentence-transformers and Qdrant demonstrates the capabilities of AI in understanding and categorizing video descriptions. The traditional keyword-based search methods are limited to metadata, transcripts, and thumbnails, which may not capture the full context or semantic meaning of the video. However, with semantic search, the system comprehends the deeper themes and narratives within the videos, making search results more relevant and precise.&lt;/p&gt;

&lt;p&gt;The Discovery API of Qdrant, with its unique ability to refine search parameters through context, further enhances the precision of these search results.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qdrant.tech/documentation/concepts/explore/"&gt;https://qdrant.tech/documentation/concepts/explore/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Build a Customer Service Chatbot Using Flask, Llama 2, LangChain and Qdrant</title>
      <dc:creator>Vardhanam Daga</dc:creator>
      <pubDate>Tue, 26 Dec 2023 12:34:41 +0000</pubDate>
      <link>https://dev.to/vardhanam/build-a-customer-service-chatbot-using-flask-llama-2-langchain-and-qdrant-659</link>
      <guid>https://dev.to/vardhanam/build-a-customer-service-chatbot-using-flask-llama-2-langchain-and-qdrant-659</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yo_-wtmz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zbk51l218wirv2i2z734.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yo_-wtmz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zbk51l218wirv2i2z734.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Customer service chatbots play a pivotal role in transforming the customer experience across diverse industries. These intelligent virtual assistants are employed in a myriad of use cases to streamline and enhance customer support. They excel in handling routine tasks, such as automated FAQs, providing instant responses to common queries and offering real-time updates on order status and tracking. Chatbots are instrumental in automating appointment scheduling, making reservations, and even facilitating product recommendations based on user preferences. Additionally, they prove invaluable in resolving technical issues through guided troubleshooting, collecting valuable customer feedback, and easing the onboarding process for new users. Industries ranging from e-commerce to healthcare leverage chatbots for tasks such as billing inquiries, language translation, and lead generation. The versatility of customer service chatbots extends to internal support, where they assist employees with HR-related queries. While providing swift and efficient responses, these chatbots significantly contribute to optimizing operational processes and fostering positive customer interactions.&lt;/p&gt;

&lt;p&gt;In this article, we’ll build a Customer Service Chatbot that is powered by Flask, Qdrant, LangChain, and Llama 2. As an example we’ll supply the chatbot with a document containing Google’s Code of Conduct, but you can use your own company’s documentation as a context for the chatbot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Components
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Flask:&lt;/strong&gt; Flask is an eminent web framework in the Python programming community, renowned for its simplicity and elegance. As a micro-framework, it is minimalistic yet powerful, offering developers a solid foundation for building a variety of web applications without imposing any specific tools or libraries. This flexibility allows Flask to be lightweight and straightforward to learn, making it an excellent choice for both beginners and experienced developers. It leverages the Jinja2 template engine for dynamic content rendering, ensuring a seamless integration of Python code with HTML. Flask's built-in development server and debugger facilitate efficient development and troubleshooting. Moreover, its capacity for easy extension with numerous available plugins supports more complex requirements, like database integration, form validation, and authentication. Flask's robustness and versatility, combined with a strong community and comprehensive documentation, have cemented its status as a go-to framework for web development in Python.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qdrant:&lt;/strong&gt; Qdrant is an open-source vector search engine designed to facilitate efficient and scalable similarity search in high-dimensional vector spaces. It is particularly tailored for machine learning applications, such as image or natural language processing, where embedding vectors are commonly used to represent complex data. Qdrant stands out for its performance optimization and ease of use, providing a robust system for storing, managing, and querying large volumes of vector data. It supports various distance metrics, enabling precise and relevant search results for different types of data and use cases. Additionally, Qdrant offers features like filtering and full-text search, allowing for more complex and refined queries. Its architecture is designed to be horizontally scalable, making it well-suited for handling big data scenarios. Qdrant's user-friendly API and compatibility with popular programming languages like Python further enhance its accessibility to developers and data scientists. As a result, Qdrant is becoming increasingly popular in the field of AI and data-driven applications, where efficient and accurate vector search is crucial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain&lt;/strong&gt;: LangChain is an innovative open-source library designed to augment the capabilities of large language models (LLMs), in creating applications that require complex, multi-step reasoning or knowledge retrieval. It primarily focuses on enhancing the ability of LLMs to interface with external knowledge sources, such as databases and search engines, thereby extending their problem-solving and informational capabilities beyond their intrinsic knowledge. LangChain's architecture is modular, allowing for the integration of various components and tools to tailor the LLM's performance to specific applications. This modularity also facilitates experimentation with different strategies for knowledge retrieval and reasoning, making it a versatile tool for developers and researchers in the field of AI and natural language processing. By leveraging LangChain, developers can create more sophisticated and intelligent applications that combine the nuanced understanding of human language inherent in LLMs with vast, dynamic external knowledge bases. This combination opens up new possibilities for AI applications in areas such as automated research, complex decision-making, and personalized content generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 2&lt;/strong&gt;: Llama 2, a powerhouse language model developed by Meta and Microsoft, stands as a giant in the world of AI. Trained on a vast ocean of internet data, it possesses the remarkable ability to converse, generate creative text formats, and answer your questions in an informative way. This open-source behemoth, freely available for research and commercial use, marks a significant leap forward in AI accessibility. With its immense potential and commitment to responsible development, Llama 2 paves the way for a future where AI empowers human creativity and understanding.&lt;/p&gt;

&lt;p&gt;Using the above four components, we are going to implement a RAG (Retrieval-Augmented Generation) pipeline in our customer service chatbot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Environment
&lt;/h2&gt;

&lt;p&gt;Install the following dependencies by creating a requirements.txt file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Flask
Flask-Session
sentence-transformers
langchain
transformers
scipy
trl
bitsandbytes
peft
accelerate
torch
datasets
qdrant-client
pypdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a file called app.py. This file will contain the backend Flask code and all the logic for the RAG pipeline. First, I’ll paste the entire working code, then I’ll walk through each section one-by-one in case you want to understand what the code does.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask, request, session, jsonify
from werkzeug.utils import secure_filename
import os
import uuid
import torch
import transformers
from transformers import (
AutoTokenizer,
AutoModelForCausalLM,
BitsAndBytesConfig,
pipeline
)
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_transformers import Html2TextTransformer
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import Qdrant
from langchain.document_loaders import PyPDFLoader
from langchain.llms import HuggingFacePipeline
from langchain.chains import LLMChain, RetrievalQA

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'uploads/'
ALLOWED_EXTENSIONS = {'pdf'}

if not os.path.exists(app.config['UPLOAD_FOLDER']):
    os.makedirs(app.config['UPLOAD_FOLDER'])

#Loading the Llama-2 Model
model_name='meta-llama/Llama-2-7b-chat-hf'
model_config = transformers.AutoConfig.from_pretrained(
model_name,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

#################################################################
# bitsandbytes parameters
#################################################################

# Activate 4-bit precision base model loading
use_4bit = True
# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"
# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"
# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

#################################################################
# Set up quantization config
#################################################################
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
load_in_4bit=use_4bit,
bnb_4bit_quant_type=bnb_4bit_quant_type,
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=use_nested_quant,
)
# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major &amp;gt;= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

#################################################################
# Load pre-trained config
#################################################################
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=bnb_config,
)

# Building a LLM QNA chain
text_generation_pipeline = transformers.pipeline(
model=model,
tokenizer=tokenizer,
task="text-generation",
temperature=0.2,
repetition_penalty=1.1,
return_full_text=True,
max_new_tokens=300,
)

llama_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
file_id = None
retrieval_chain = None

def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/upload', methods=['POST'])
def upload_file():
    global file_id
    if 'file' not in request.files:
        return 'No file part', 400
    file = request.files['file']
    if file.filename == '':
        return 'No selected file', 400
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        file_id = str(uuid.uuid4())
        filepath = os.path.join(app.config['UPLOAD_FOLDER'], file_id)
        file.save(filepath)

        # Process the uploaded PDF and build the retrieval chain
        process_pdf(filepath)

        return 'File uploaded &amp;amp; processed successfully. You can begin querying now', 200

def process_pdf(filepath):
    global retrieval_chain
    # Load and split the document
    loader = PyPDFLoader(filepath)
    docs = loader.load_and_split()
    # Chunk text
    text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    chunked_documents = text_splitter.split_documents(docs)

    # Load chunked documents into the Qdrant index
    db = Qdrant.from_documents(
        chunked_documents,
        HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2'),
        location=":memory:",
    )
    retriever = db.as_retriever()
    retrieval_chain = RetrievalQA.from_llm(llm= llama_llm, retriever= retriever)

@app.route('/query', methods=['POST'])
def query():
    global retrieval_chain
    data = request.json
    query = data.get('query')

    return jsonify(retrieval_chain.run(query)), 200

if __name__ == '__main__':
    app.run(debug=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask, request, session, jsonify
......
from langchain.chains import LLMChain, RetrievalQA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, here’s a list of all the import statements needed to initialize our project.&lt;/p&gt;

&lt;p&gt;2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app = Flask(__name__)
....
    os.makedirs(app.config['UPLOAD_FOLDER'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We initialize our Flask app and create the necessary directories.&lt;/p&gt;

&lt;p&gt;3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Loading the Llama-2 Model
model_name='meta-llama/Llama-2-7b-chat-hf'
………
quantization_config=bnb_config,)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this part of the code, we set the quantization parameters of our model so that it runs quickly and efficiently.&lt;/p&gt;

&lt;p&gt;4.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Building a LLM QNA chain
text_generation_pipeline = transformers.pipeline(
……retrieval_chain = None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, an LLM chain pipeline is created and some global variables are initialized, which we’ll later use to save data while the Flask server is running.&lt;/p&gt;

&lt;p&gt;5.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def allowed_file(filename):
    …………
           filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Helper function that checks whether the uploaded file has an allowed extension (PDF, in our case).&lt;/p&gt;
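&lt;p&gt;The elided body of this helper likely reduces to the extension check visible in the snippet; a self-contained sketch (the contents of &lt;code&gt;ALLOWED_EXTENSIONS&lt;/code&gt; are an assumption):&lt;/p&gt;

```python
# assumption: only PDFs are accepted, mirroring the article's upload flow
ALLOWED_EXTENSIONS = {"pdf"}

def allowed_file(filename):
    # accept only names that contain an extension and whose lowercased
    # extension is in the whitelist
    return "." in filename and \
           filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
```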

&lt;p&gt;6.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.route('/upload', methods=['POST'])
def upload_file():
…………

        return 'File uploaded &amp;amp; processed successfully. You can begin querying now', 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where we create the API endpoint for uploading the PDF file. The uploaded file is saved under a unique id. In between, we also use the helper function process_pdf to process our document for the vector store.&lt;/p&gt;
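&lt;p&gt;Saving under a unique id can be done with the standard library’s uuid module; a small sketch (the helper name is ours, not from the article’s code):&lt;/p&gt;

```python
import os
import uuid

def unique_filename(filename):
    # prefix the original base name with a random uuid so that two
    # uploads of files with the same name never collide on disk
    return f"{uuid.uuid4().hex}_{os.path.basename(filename)}"
```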

&lt;p&gt;7.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def process_pdf(filepath):
    ………    retrieval_chain = RetrievalQA.from_llm(llm= llm, retriever= retriever)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We chunk the document into smaller parts, use the Hugging Face embeddings to upsert it into our vector store, and then create a retrieval QA chain, which is assigned to a global variable so that it can be used later.&lt;/p&gt;
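&lt;p&gt;Conceptually, the splitter slides a fixed-size window with overlap across the text. A plain-Python sketch of that idea (LangChain’s CharacterTextSplitter additionally prefers to cut at separators such as newlines):&lt;/p&gt;

```python
def chunk_text(text, chunk_size=500, chunk_overlap=100):
    # slide a window of chunk_size characters, stepping so that each
    # chunk shares chunk_overlap characters with the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```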

&lt;p&gt;8.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.route('/query', methods=['POST'])
def query():
………
    return jsonify(retrieval_chain.run(query)), 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we create the API endpoint for sending queries to our server and receiving the answers returned by the retrieval_chain.&lt;/p&gt;
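&lt;p&gt;Beyond curl, a client can hit this endpoint from Python with nothing but the standard library. A sketch that builds the same JSON POST the curl commands below send (the helper name and default URL are illustrative):&lt;/p&gt;

```python
import json
import urllib.request

def build_query_request(query, url="http://localhost:5000/query"):
    # mirror the curl command: a POST with a single JSON "query" field
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# send it with: urllib.request.urlopen(build_query_request("your query"))
```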

&lt;h3&gt;
  
  
  Launching the app and interacting with our server (chatbot)
&lt;/h3&gt;

&lt;p&gt;Save the app.py file and, in the terminal, launch it with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ULzGw70t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/so5pa4ohid1xtv2wam1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ULzGw70t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/so5pa4ohid1xtv2wam1u.png" alt="Image description" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You now have a Flask server (the chatbot) up and running on port 5000 of localhost.&lt;/p&gt;

&lt;p&gt;To upload your PDF file, use the following command in a new terminal. For this article, we shall upload Google’s Code of Conduct into our chatbot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST -F 'file=@/path/to/your/file.pdf' http://localhost:5000/upload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the file is uploaded and processed, you should see the following message:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--05Z3ZwV1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/npc3n5eit53edrvsvny7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--05Z3ZwV1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/npc3n5eit53edrvsvny7.png" alt="Image description" width="800" height="32"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To send queries, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST -H "Content-Type: application/json" -d '{"query":"your query text here"}' http://localhost:5000/query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example queries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST -H "Content-Type: application/json" -d '{"query":"Summarize the code of conduct document for me"}' http://localhost:5000/query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--buFnHCXN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/75fuvo89bm7mvlahjxx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--buFnHCXN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/75fuvo89bm7mvlahjxx3.png" alt="Image description" width="800" height="149"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST -H "Content-Type: application/json" -d '{"query":"What is google's stance on gender discrimination?"}' http://localhost:5000/query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y6KGxFxh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p4xkcsp9c0ipshjzr3sa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y6KGxFxh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p4xkcsp9c0ipshjzr3sa.png" alt="Image description" width="800" height="86"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST -H "Content-Type: application/json" -d '{"query":"What is policy for outside employment?"}' http://localhost:5000/query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C2C_cQ69--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9evh4mj2b8kstt0ztlg7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C2C_cQ69--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9evh4mj2b8kstt0ztlg7.png" alt="Image description" width="800" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, the provided code sets up a Flask server to deploy a chatbot based on the RAG (Retrieval-Augmented Generation) pipeline. The server allows users to upload PDF documents, which are processed into a vector store that the chatbot then answers queries against.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Customer service chatbots play a pivotal role in transforming the customer experience across diverse industries. These intelligent virtual assistants are employed in a myriad of use cases to streamline and enhance customer support. They excel in handling routine tasks, such as automated FAQs, providing instant responses to common queries and offering real-time updates on order status and tracking. Chatbots are instrumental in automating appointment scheduling, making reservations, and even facilitating product recommendations based on user preferences. Additionally, they prove invaluable in resolving technical issues through guided troubleshooting, collecting valuable customer feedback, and easing the onboarding process for new users. Industries ranging from e-commerce to healthcare leverage chatbots for tasks such as billing inquiries, language translation, and lead generation. The versatility of customer service chatbots extends to internal support, where they assist employees with HR-related queries. While providing swift and efficient responses, these chatbots significantly contribute to optimizing operational processes and fostering positive customer interactions.&lt;/p&gt;

&lt;p&gt;In this article, we’ll build a Customer Service Chatbot that is powered by Flask, Qdrant, LangChain, and Llama 2. As an example we’ll supply the chatbot with a document containing Google’s Code of Conduct, but you can use your own company’s documentation as a context for the chatbot.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Querying news articles via a streamlit app using openAI, langchain, and qdrant db</title>
      <dc:creator>Vardhanam Daga</dc:creator>
      <pubDate>Tue, 19 Dec 2023 08:02:15 +0000</pubDate>
      <link>https://dev.to/vardhanam/querying-news-articles-via-a-streamlit-app-using-openai-langchain-and-qdrant-db-4na4</link>
      <guid>https://dev.to/vardhanam/querying-news-articles-via-a-streamlit-app-using-openai-langchain-and-qdrant-db-4na4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Chatbots integrated into news querying serve various crucial purposes. They offer a convenient and conversational approach for users to access news, eliminating the need to navigate websites or apps. Users can simply ask a chatbot for news on a specific topic or event, making information more accessible, especially for those who find traditional methods challenging.&lt;/p&gt;

&lt;p&gt;Personalization is a key feature of chatbots in the news industry. By learning from users' past queries and interactions, chatbots can present more relevant and personalized news articles, enhancing the overall user experience. This tailoring of content ensures that users receive information aligned with their preferences.&lt;/p&gt;

&lt;p&gt;Time efficiency is another significant advantage. Chatbots quickly sift through vast amounts of information, presenting users with the most relevant news articles. This time-saving aspect is particularly beneficial for users who would otherwise have to manually search and filter through numerous sources.&lt;/p&gt;

&lt;p&gt;Moreover, chatbots contribute to interactive news consumption. They engage users in a conversation, answering follow-up questions and providing additional context or related information. This interactive approach adds depth to the news-reading experience, surpassing the passive nature of traditional methods.&lt;/p&gt;

&lt;p&gt;Information overload is a common issue in the digital age, and chatbots help mitigate it by filtering out noise. They deliver news that is most relevant to the user's interests and needs, streamlining the consumption process and enhancing user satisfaction.&lt;/p&gt;

&lt;p&gt;Visually impaired users benefit significantly from chatbots, especially when integrated with voice technology. This combination provides an invaluable audio-based method for accessing news, promoting inclusivity in news consumption.&lt;/p&gt;

&lt;p&gt;Integration into commonly used platforms, such as messaging apps, enhances user convenience. Users can receive news updates in the same environment where they communicate with others, streamlining their digital experience.&lt;/p&gt;

&lt;p&gt;Automated updates and alerts are a proactive feature of chatbots. Programmed to send timely news updates or alerts about breaking news, chatbots ensure that users stay informed in real time, contributing to a more connected and aware user base.&lt;/p&gt;

&lt;p&gt;Language and regional customization further extend the accessibility of news. Chatbots can be designed to deliver news in multiple languages and tailor content to regional or local interests, catering to diverse demographics and preferences.&lt;/p&gt;

&lt;p&gt;In summary, chatbots in the news industry elevate user experience through convenient, personalized, and interactive access to news. They address various challenges in traditional news consumption methods while catering to a diverse range of user needs and preferences.&lt;/p&gt;

&lt;p&gt;In this article, we’ll design a RAG pipeline using OpenAI, Langchain, and Qdrant DB and encase it in a user interface via Streamlit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is RAG
&lt;/h2&gt;

&lt;p&gt;"RAG" stands for "Retrieval-Augmented Generation." It's a technique used in natural language processing and machine learning, particularly in the development of advanced language models like chatbots.In a RAG setup, when a query is input (like a question or a prompt), the retrieval system first searches through its database to find relevant information or documents. This information is then passed on to the generative model, which synthesizes it to create a coherent and contextually appropriate response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bwvRqClf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p3i98tw6iy2d1f1y8tnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bwvRqClf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p3i98tw6iy2d1f1y8tnl.png" alt="Image description" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Brief Note on the Components
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Langchain:&lt;/strong&gt; LangChain is an open-source framework designed to simplify the creation of applications using large language models (LLMs). LangChain's use-cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis. LangChain enables developers to connect LLMs to other data sources, interact with their environment, and build complex applications. It is written in Python and JavaScript and supports a variety of language models, including GPT-3, LLaMA, Jurassic-1 Jumbo, models hosted on Hugging Face, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qdrant:&lt;/strong&gt; Qdrant is an open-source vector similarity search engine and vector database written in Rust. It provides a production-ready service with a convenient API to store, search, and manage points—vectors with an additional payload. Qdrant is tailored to extended filtering support, making it useful for various neural network or semantic-based matching, faceted search, and other applications. &lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Environment &amp;amp; Code
&lt;/h2&gt;

&lt;p&gt;First, in your directory, create a requirements.txt file with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;langchain
streamlit
requests
openai
qdrant-client
tiktoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the command to install these dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create a file named &lt;code&gt;app.py&lt;/code&gt; and paste the following code into it; the comments explain each part’s functionality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#importing the needed libraries
import streamlit as st
import requests
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Qdrant
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
import os

#function to fetch text data from the links of news websites
def fetch_article_content(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        st.error(f"Error fetching {url}: {e}")
        return ""

#function to collate all the text from the news website into a single string
def process_links(links):
    all_contents = ""
    for link in links:
        content = fetch_article_content(link.strip())
        all_contents += content + "\n\n"
    return all_contents

#function to chunk the articles before creating vector embeddings
def get_text_chunks_langchain(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts = text_splitter.split_text(text)
    return texts

#creating the streamlit app

def main():
    st.title('News Article Fetcher')

    # Initialize state variables
    if 'articles_fetched' not in st.session_state:
        st.session_state.articles_fetched = False
    if 'chat_history' not in st.session_state:
        st.session_state.chat_history = ""

    # Model selection
    model_choice = st.radio("Choose your model", ["GPT 3.5", "GPT 4"], key= "model_choice")
    model = "gpt-3.5-turbo-1106" if st.session_state.model_choice == "GPT 3.5" else "gpt-4-1106-preview"

    #API_KEY
    API_KEY = st.text_input("Enter your OpenAI API key", type="password", key= "API_KEY")

    # Ensure API_KEY is set before proceeding
    if not API_KEY:
        st.warning("Please enter your OpenAI API key.")
        st.stop()

    #asking user to upload a text file with links to news articles (1 link per line)
    uploaded_file = st.file_uploader("Upload a file with links", type="txt")

    # Read the file into a list of links
    if uploaded_file:
        stringio = uploaded_file.getvalue().decode("utf-8")
        links = stringio.splitlines()

    # Fetch the articles' content
    if st.button("Fetch Articles") and uploaded_file:
        progress_bar = st.progress(0)
        with st.spinner('Fetching articles...'):
            article_contents = process_links(links)
            progress_bar.progress(0.25)  # Update progress to 25%

            #Process the article contents
            texts = get_text_chunks_langchain(article_contents)
            progress_bar.progress(0.5)  # Update progress to 50%

            #storing the chunked articles as embeddings in Qdrant
            os.environ["OPENAI_API_KEY"] =  st.session_state.API_KEY
            embeddings = OpenAIEmbeddings()
            vector_store = Qdrant.from_texts(texts, embeddings, location=":memory:",)
            retriever = vector_store.as_retriever()
            progress_bar.progress(0.75)  # Update progress to 75%

            #Creating a QA chain against the vectorstore
            llm = ChatOpenAI(model_name= model)
            if 'qa' not in st.session_state:
                st.session_state.qa = RetrievalQA.from_llm(llm= llm, retriever= retriever)
            progress_bar.progress(1)

            st.success('Articles fetched successfully!')
            st.session_state.articles_fetched = True

    #once articles are fetched, take input for user query

    if 'articles_fetched' in st.session_state and st.session_state.articles_fetched:

        query = st.text_input("Enter your query here:", key="query")

        if query:
            # Process the query using your QA model (assuming it's already set up)
            with st.spinner('Analyzing query...'):
                qa = st.session_state.qa
                response = qa.run(st.session_state.query)  
            # Update chat history
            st.session_state.chat_history += f"&amp;gt; {st.session_state.query}\n{response}\n\n"

        # Display conversation history
        st.text_area("Conversation:", st.session_state.chat_history, height=1000, key="conversation_area")
        # JavaScript to scroll to the bottom of the text area
        st.markdown(
            f"&amp;lt;script&amp;gt;document.getElementById('conversation_area').scrollTop = document.getElementById('conversation_area').scrollHeight;&amp;lt;/script&amp;gt;",
            unsafe_allow_html=True
        )

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then save the app.py and run the following command in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This launches your application on localhost at port 8501.&lt;/p&gt;

&lt;p&gt;Here’s what the UI of the news article fetcher application looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n7kt39AT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/65n7ps48m42w73xoiymk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n7kt39AT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/65n7ps48m42w73xoiymk.png" alt="Image description" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, the integration of chatbots into news querying not only addresses the challenges of traditional news consumption but also significantly enhances user experience by providing a convenient, personalized, and interactive access to information. The discussed RAG pipeline, incorporating OpenAI, Langchain, and Qdrant DB, coupled with a Streamlit-based user interface, exemplifies the cutting-edge technological advancements in natural language processing and machine learning. This comprehensive solution not only streamlines the process of fetching and analyzing news articles but also showcases the potential of AI-driven systems in delivering tailored content, mitigating information overload, and ensuring inclusivity for visually impaired users. The outlined code implementation serves as a practical guide for developers interested in building advanced chatbot applications for news retrieval, demonstrating the fusion of language models, vector similarity search engines, and efficient UI design. Ultimately, this innovative approach represents a paradigm shift in news consumption, offering a glimpse into the future of user-centric and technology-driven information access.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://api.python.langchain.com/en/latest/api_reference.html"&gt;https://api.python.langchain.com/en/latest/api_reference.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://python.langchain.com/docs/integrations/vectorstores/qdrant"&gt;https://python.langchain.com/docs/integrations/vectorstores/qdrant&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Multidocument Chatbot using Mistral 7B, Qdrant, and Langchain</title>
      <dc:creator>Vardhanam Daga</dc:creator>
      <pubDate>Wed, 06 Dec 2023 09:28:10 +0000</pubDate>
      <link>https://dev.to/vardhanam/building-a-multidocument-chatbot-using-mistral-7b-qdrant-and-langchain-523n</link>
      <guid>https://dev.to/vardhanam/building-a-multidocument-chatbot-using-mistral-7b-qdrant-and-langchain-523n</guid>
      <description>&lt;p&gt;Introduction&lt;/p&gt;

&lt;p&gt;A Multi-document chatbot is basically a robot friend that can read lots of different stories or articles and then chat with you about them, giving you the scoop on all they’ve learned. This robot doesn’t get tired, even if it has to go through a whole pile of papers to find the information. They’re like your personal homework helper who does all the reading for you and then tells you what you need to know in a nice and easy way. In this blog post, we’ll elucidate the process of creating a multidocument chatbot using advanced technologies such as Mistral 7B, Qdrant, and Langchain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Mistral 7B
&lt;/h2&gt;

&lt;p&gt;Mistral 7B is a cutting-edge language model crafted by the startup Mistral, which has impressively raised $113 million in seed funding to focus on building and openly sharing advanced AI models. It boasts sophisticated features such as deep language comprehension, impressive text generation, and the ability to adapt to specialized tasks. For those interested in creating chatbots that can process information from multiple documents, delving into Mistral 7B’s functionalities would be crucial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector Database
&lt;/h2&gt;

&lt;p&gt;A vector database is a specialized type of database that is designed to store and manage vector data, which are arrays of numbers that represent various features of high-dimensional data points. In the context of machine learning and artificial intelligence, these vectors often come from the embeddings of more complex data types, such as images, text, or audio.&lt;/p&gt;

&lt;p&gt;The vectors are a way to translate this complex data into a language that a computer can more easily work with for tasks such as similarity search. For instance, in a vector database, you could ask it to find images similar to a given image, and the database would search through its stored vectors — numerical representations of images — to find those that are closest to your input vector in terms of distance in the multi-dimensional space they exist in.&lt;/p&gt;

&lt;p&gt;The “closeness” or “similarity” is determined by distance metrics like Euclidean distance, cosine similarity, or others, depending on the database and the specific application. Vector databases are optimized to perform these types of high-speed similarity searches and are crucial for supporting various AI-driven applications, including recommendation systems, search engines, and data analysis tools.&lt;/p&gt;
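&lt;p&gt;For instance, cosine similarity between two vectors can be computed directly:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # dot product of the vectors divided by the product of their lengths;
    # 1.0 means same direction, 0.0 means orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```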

&lt;h2&gt;
  
  
  Qdrant: An Introduction
&lt;/h2&gt;

&lt;p&gt;Qdrant is a vector database that specializes in storing and searching through high-dimensional vectors — the kind of data that represents complex items like images, text, or user preferences in a form that AI can understand. It provides an API service that developers use to add search functionality to their applications, allowing them to quickly find the most relevant pieces of information based on their similarity. This is particularly useful for building smarter search engines, recommendation systems, and other applications that need to sift through large amounts of data to find patterns or connections.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing Langchain for Language Workflows
&lt;/h2&gt;

&lt;p&gt;Using Langchain with Language Models: When you’re working with language models, sometimes you need specific information that the model wasn’t trained on. Retrieval Augmented Generation (RAG) is one way to solve this problem. It involves finding outside information and then giving it to the language model to help it answer questions. Langchain has many tools to help with this, from easy to advanced. In this part of the guide, we’ll talk about how to get the information you need, including finding documents, preparing them, and then using that information to help the language model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Getting Documents:&lt;/strong&gt; Langchain can find documents from lots of different places for you. It works with more than 100 types of document finders and connects with other big services like AirByte and Unstructured. Whether you’re working with web pages, PDFs, or code, and whether your documents are stored privately (like in S3 buckets) or publicly, Langchain can help you load them up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preparing Documents:&lt;/strong&gt; It’s important to only fetch the parts of documents you really need. Langchain has tools to make this easier by breaking down big documents into smaller pieces that are easier to handle. It has different ways to do this, and some are even tailored for specific kinds of documents, like code or markdown files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding Text:&lt;/strong&gt; To make sure we can find documents that are similar to each other, we need to turn them into embeddings, which are like numerical fingerprints of text. Langchain lets you use more than 25 different services to create these embeddings. You can pick from free options or paid services, depending on what you need. Plus, Langchain makes it simple to switch from one service to another.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storing Embeddings:&lt;/strong&gt; As embeddings become more common, we need special databases that can store and find them quickly. Langchain has options for this, too, with over 50 different databases you can use, from free local ones to paid cloud services. No matter which one you choose, Langchain keeps the way you interact with them consistent, so it’s easy to change databases if you need to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrievers
&lt;/h2&gt;

&lt;p&gt;When managing data within your database for a multidocument chatbot, the effectiveness of retrieving the right information is crucial. Langchain steps up to this challenge with a suite of retrieval tools. These include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Parent Document Retriever&lt;/strong&gt;, which allows for the generation of multiple data ‘fingerprints’ for a larger document. This method enables the chatbot to search through smaller sections of text while still understanding the context of the entire document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Self Query Retriever&lt;/strong&gt;, which is designed to tackle queries that aren’t just about semantic meaning but also include specific user details or conditions. It separates the meaningful content of a user’s question from other non-semantic information to ensure the response is as relevant as possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Ensemble Retriever&lt;/strong&gt;, which is especially useful when your chatbot needs to pull information from a variety of databases or when you want to blend different retrieval methods for a more robust search.&lt;/p&gt;

&lt;p&gt;Integrating these sophisticated retrieval capabilities with Langchain can significantly enhance your chatbot’s ability to access and utilize multiple documents, delivering precise and contextually rich interactions.&lt;/p&gt;
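&lt;p&gt;Blending ranked result lists, as an ensemble retriever does, is commonly implemented with reciprocal rank fusion. A minimal sketch of that scoring (the constant k=60 is the conventional default; document ids are illustrative):&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of document ids per retriever;
    # each appearance contributes 1/(k + rank) to the document's score,
    # so documents ranked highly by several retrievers float to the top
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```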

&lt;h2&gt;
  
  
  Building the Multidocument Chatbot
&lt;/h2&gt;

&lt;p&gt;Now that we understand Mistral 7B, Qdrant, and Langchain, we can begin building the multidocument chatbot. This entails data preprocessing, model fine-tuning, and deployment strategies to ensure that your chatbot provides accurate and informative responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running LangChain and RAG for Text Generation and Retrieval
&lt;/h2&gt;

&lt;p&gt;In this section, we’ll go through the steps to use LangChain and the Retrieval-Augmented Generation (RAG) model to perform text generation and information retrieval tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running With LangChain: Setting Up the Environment
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U einops
!pip install -q -U safetensors
!pip install -q -U torch
!pip install -q -U xformers
!pip install -q -U langchain
!pip install -q -U ctransformers[cuda]
!pip install qdrant_client
!pip install sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Authenticate with Hugging Face
&lt;/h3&gt;

&lt;p&gt;To authenticate with Hugging Face, you’ll need an access token. Here’s how to get it:&lt;/p&gt;

&lt;p&gt;Go to your Hugging Face account.&lt;br&gt;
Navigate to ‘Settings’ and click on ‘Access Tokens’.&lt;br&gt;
Create a new token or copy an existing one&lt;br&gt;
(see the &lt;a href="https://huggingface.co/settings/tokens"&gt;Access Tokens page&lt;/a&gt; on Hugging Face).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;!pip install huggingface_hub&lt;br&gt;
from huggingface_hub import notebook_login&lt;br&gt;
notebook_login()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We begin by defining the model we want to use; in this case, it’s “mistralai/Mistral-7B-Instruct-v0.1.”&lt;br&gt;
First we define a 4-bit quantization configuration so the 7B model fits comfortably in GPU memory; then we create an instance of the model for text generation and set various parameters for its behavior.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;import torch&lt;br&gt;
from transformers import BitsAndBytesConfig&lt;br&gt;
quantization_config = BitsAndBytesConfig(&lt;br&gt;
    load_in_4bit=True,&lt;br&gt;
    bnb_4bit_compute_dtype=torch.float16,&lt;br&gt;
    bnb_4bit_quant_type="nf4",&lt;br&gt;
    bnb_4bit_use_double_quant=True,&lt;br&gt;
)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;LangChain Setup&lt;br&gt;
First we import LangChain components.&lt;br&gt;
Then we create a LangChain pipeline using the model for text generation.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;model_id = "mistralai/Mistral-7B-Instruct-v0.1"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline&lt;br&gt;
model_4bit = AutoModelForCausalLM.from_pretrained(&lt;br&gt;
    model_id,&lt;br&gt;
    device_map="auto",&lt;br&gt;
    quantization_config=quantization_config,&lt;br&gt;
)&lt;br&gt;
tokenizer = AutoTokenizer.from_pretrained(model_id)&lt;br&gt;
text_pipeline = pipeline(&lt;br&gt;
    "text-generation",&lt;br&gt;
    model=model_4bit,&lt;br&gt;
    tokenizer=tokenizer,&lt;br&gt;
    use_cache=True,&lt;br&gt;
    device_map="auto",&lt;br&gt;
    max_length=500,&lt;br&gt;
    do_sample=True,&lt;br&gt;
    top_k=5,&lt;br&gt;
    num_return_sequences=1,&lt;br&gt;
    eos_token_id=tokenizer.eos_token_id,&lt;br&gt;
    pad_token_id=tokenizer.eos_token_id,&lt;br&gt;
)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from langchain import HuggingFacePipeline&lt;br&gt;
from langchain import PromptTemplate, LLMChain&lt;br&gt;
llm = HuggingFacePipeline(pipeline=text_pipeline)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Generating Text&lt;br&gt;
We define a template for generating responses that includes context and a question.&lt;br&gt;
We provide a specific question and context for the model to generate a response.&lt;br&gt;
The response variable now contains the generated response.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;template = """[INST] You are a helpful, respectful and honest assistant. Answer exactly in few words from the context&lt;br&gt;
Answer the question below from context below :&lt;br&gt;
{context}&lt;br&gt;
{question} [/INST]&lt;br&gt;
"""&lt;br&gt;
question_p = """Which companies announced their mergers"""&lt;br&gt;
context_p = """ In a landmark fusion of tech titans, Cybervine and QuantumNet announced their merger today, promising to redefine the digital frontier with their combined innovation powerhouse, now known as CyberQuantum."""&lt;br&gt;
prompt = PromptTemplate(template=template, input_variables=["question","context"])&lt;br&gt;
llm_chain = LLMChain(prompt=prompt, llm=llm)&lt;br&gt;
response = llm_chain.run({"question":question_p,"context":context_p})&lt;br&gt;
response&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;template = """[INST] You are a helpful, respectful and honest assistant. Answer exactly in few words from the context&lt;br&gt;
Answer the question below from context below :&lt;br&gt;
{context}&lt;br&gt;
{question} [/INST]&lt;br&gt;
"""&lt;br&gt;
question_p = """How many stocks did the investor buy? """&lt;br&gt;
context_p = """ Amidst a flurry of market activity, investor Jordan McHale made headlines by confidently scooping up 50,000 shares of the burgeoning tech firm Solarity Innovations, signaling a bullish stance on the company's green energy prospects."""&lt;br&gt;
prompt = PromptTemplate(template=template, input_variables=["question","context"])&lt;br&gt;
llm_chain = LLMChain(prompt=prompt, llm=llm)&lt;br&gt;
response = llm_chain.run({"question":question_p,"context":context_p})&lt;br&gt;
response&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;Retrieval-Augmented Generation (RAG)&lt;/h2&gt;

&lt;p&gt;Setting Up RAG&lt;br&gt;
We start by importing the necessary modules for the RAG setup.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from qdrant_client import QdrantClient&lt;br&gt;
from langchain.llms import HuggingFacePipeline&lt;br&gt;
from langchain.document_loaders import TextLoader&lt;br&gt;
from langchain.text_splitter import RecursiveCharacterTextSplitter&lt;br&gt;
from langchain.embeddings import HuggingFaceEmbeddings&lt;br&gt;
from langchain.chains import RetrievalQA&lt;br&gt;
from langchain.vectorstores import Qdrant&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Providing Document Context&lt;br&gt;
We provide a sample document context, which is a news article in this case.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mna_news = """Vectora, a forward-thinking player in the tech startup ecosystem, has ushered in a new chapter by naming Priyanka Desai as its Chief Executive Officer. Desai, a renowned figure in the tech community for her groundbreaking work at Nexus Energy Solutions, takes the reins at Vectora to propel the company into a leader in sustainable technology. With an expansive vision and a stellar record, Desai emerged as the chosen leader after an extensive international search, reflecting the board's confidence in her innovative approach and strategic acumen.&lt;br&gt;
This strategic appointment coincides with Vectora's most recent milestone--securing a transformative $200 million in funding aimed at accelerating its growth. Desai's illustrious career, highlighted by her success in scaling Nexus Energy Solutions to an industry vanguard, speaks to her exceptional leadership. "Priyanka is the embodiment of leadership with purpose, and her alignment with our core values is what makes this appointment a perfect match," expressed Anil Mehta, Vectora's co-founder. Desai's plans for Vectora not only encompass financial growth but also reinforce the company's pledge to environmental innovation.&lt;br&gt;
Addressing the company after her appointment, Desai unveiled an ambitious roadmap to expand Vectora's R&amp;amp;D efforts and introduce groundbreaking products to reduce industrial carbon emissions. "I am energized to lead a company that is as committed to sustainability as it is to technological innovation," Desai shared, underscoring her commitment to combating the urgent challenges posed by climate change.&lt;br&gt;
Desai's leadership style, characterized by her emphasis on inclusive growth and collaborative innovation, has been met with resounding approval from within Vectora's ranks and across the tech sector. Her drive for fostering a workplace where diverse ideas flourish has drawn particular admiration. "Priyanka brings a dynamic perspective to Vectora that will undoubtedly spark creativity and drive," commented Anjali Vaidya, a prominent technology sector analyst. "Her track record of empowering her teams speaks volumes about her potential impact on Vectora's trajectory."&lt;br&gt;
As Desai takes charge, industry observers are keenly awaiting the rollout of Vectora's most ambitious endeavor yet--an AI-driven toolset designed to optimize energy management for a global clientele. With Desai at the wheel, Vectora stands on the precipice of not just market success, but also contributing a significant handprint to the global sustainability effort. The tech world is abuzz as Desai is set to officially step into her new role next week, marking a potentially transformative era for Vectora and the industry at large.&lt;br&gt;
"""&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Setting Up RAG Components&lt;br&gt;
Let’s configure the remaining components: text splitting and embeddings.&lt;br&gt;
Then we create a vector store from the resulting document chunks and embeddings.&lt;br&gt;
Finally, we configure the retriever and set up the RetrievalQA chain.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from langchain.schema.document import Document&lt;br&gt;
documents = [Document(page_content=mna_news, metadata={"source": "local"})]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)&lt;br&gt;
all_splits = text_splitter.split_documents(documents)&lt;br&gt;
model_name = "sentence-transformers/all-mpnet-base-v2"&lt;br&gt;
model_kwargs = {"device": "cuda"}&lt;br&gt;
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)&lt;/code&gt;&lt;/p&gt;
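&lt;p&gt;To see what chunk_size and chunk_overlap mean, here is a simplified, character-level sketch of overlapping chunking. The real RecursiveCharacterTextSplitter additionally prefers to break on separators such as paragraphs and sentences, so this approximates the idea rather than reimplementing it.&lt;/p&gt;

```python
def chunk_text(text, chunk_size=500, chunk_overlap=20):
    """Slice text into windows of chunk_size characters, stepping
    chunk_size - chunk_overlap forward each time so consecutive
    chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character text with a 480-character step yields chunks
# starting at offsets 0, 480, and 960.
chunks = chunk_text("a" * 1200, chunk_size=500, chunk_overlap=20)
```

The overlap keeps a sentence that straddles a chunk boundary fully visible in at least one chunk, which helps retrieval later.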

&lt;p&gt;&lt;code&gt;vectordb = Qdrant.from_documents(&lt;br&gt;
    documents=all_splits,&lt;br&gt;
    embedding=embeddings,&lt;br&gt;
    location=":memory:",  # local mode with in-memory storage only&lt;br&gt;
    prefer_grpc=True,&lt;br&gt;
    collection_name="my_documents",&lt;br&gt;
)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;retriever = vectordb.as_retriever()&lt;/code&gt;&lt;/p&gt;
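&lt;p&gt;Conceptually, the retriever embeds the query and returns the chunks whose vectors score highest by similarity (typically cosine similarity). Here is a toy illustration with hand-written two-dimensional vectors; real embeddings come from the sentence-transformers model configured above.&lt;/p&gt;

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document vectors most similar to the query."""
    sims = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(sims, reverse=True)[:k]]

# Three toy "chunk embeddings"; the query vector sits near the first two.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ranked = top_k([1.0, 0.05], docs)  # -> [0, 1]
```

Qdrant performs this nearest-neighbor search at scale with approximate indexing, so the chain never has to scan every vector.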

&lt;p&gt;&lt;code&gt;qa = RetrievalQA.from_chain_type(&lt;br&gt;
    llm=llm,&lt;br&gt;
    chain_type="stuff",&lt;br&gt;
    retriever=retriever,&lt;br&gt;
    verbose=True&lt;br&gt;
)&lt;br&gt;
def run_my_rag(qa, query):&lt;br&gt;
    print(f"Query: {query}\n")&lt;br&gt;
    result = qa.run(query)&lt;br&gt;
    print("\nResult: ", result)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Running a Query&lt;/p&gt;

&lt;p&gt;&lt;code&gt;query = """Who has been appointed as the CEO of Vectora?"""&lt;br&gt;
run_my_rag(qa, query)&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;Real-World Applications&lt;/h2&gt;

&lt;p&gt;Chatbots that can read and understand information from many documents are useful across a range of jobs: talking to customers to help solve their problems, helping researchers find the information they need, and curating the best content for different topics.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The development of multidocument chatbots is an exciting frontier in the field of AI-powered conversational agents. By combining Mistral 7B’s language understanding, Qdrant’s vector database, and LangChain’s orchestration tools, developers can create chatbots that provide comprehensive, context-aware responses to user queries. This article serves as a starting point for anyone interested in building chatbots with these technologies, opening up new possibilities for human-machine interaction. With the right tools and techniques, you can create chatbots that are more informative and engaging than ever before.&lt;/p&gt;

&lt;h2&gt;References&lt;/h2&gt;

&lt;p&gt;Qdrant documentation: &lt;a href="https://qdrant.tech/documentation/overview/"&gt;https://qdrant.tech/documentation/overview/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain documentation: &lt;a href="https://python.langchain.com/docs/modules/data_connection/"&gt;https://python.langchain.com/docs/modules/data_connection/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mistral 7B Research Paper: &lt;a href="https://arxiv.org/pdf/2310.06825.pdf"&gt;https://arxiv.org/pdf/2310.06825.pdf&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>qdrant</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
