<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: azhar</title>
    <description>The latest articles on DEV Community by azhar (@moazharu).</description>
    <link>https://dev.to/moazharu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1260240%2Fc48613f5-c6a7-40ea-a6a6-b0a08a58d5b4.png</url>
      <title>DEV Community: azhar</title>
      <link>https://dev.to/moazharu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/moazharu"/>
    <language>en</language>
    <item>
      <title>Implementing RAG with Mamba and the Qdrant Database: A Detailed Exploration (with Code)</title>
      <dc:creator>azhar</dc:creator>
      <pubDate>Tue, 30 Jan 2024 14:35:05 +0000</pubDate>
      <link>https://dev.to/moazharu/implementing-rag-with-mamba-and-the-qdrant-database-a-detailed-exploration-with-code-4f2l</link>
      <guid>https://dev.to/moazharu/implementing-rag-with-mamba-and-the-qdrant-database-a-detailed-exploration-with-code-4f2l</guid>
      <description>&lt;p&gt;Hi everyone, and welcome! Today, we’re diving into the fascinating world of AI, particularly focusing on implementing Retrieval-Augmented Generation (RAG) with Mamba and utilizing the Qdrant database. Mamba, a recent development in AI, challenges the conventional norms set by Transformers, especially in processing lengthy sequences. The synergy of RAG, Mamba, and Qdrant promises a compelling blend of efficiency and scalability, revolutionizing how we approach large-scale data processing and retrieval.&lt;/p&gt;

&lt;p&gt;Before we proceed, let’s stay connected! Please consider following me on Dev.to, and don’t forget to connect with me on LinkedIn for a regular dose of data science and deep learning insights. 🚀📊🤖&lt;/p&gt;

&lt;p&gt;To learn more about Mamba, be sure to check out our previous article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/ai-insights-cobet/decoding-mamba-the-next-big-leap-in-ai-sequence-modeling-ef3908060cb8"&gt;Decoding Mamba: The Next Big Leap in AI Sequence Modeling&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Basics of RAG and Mamba: An Overview
&lt;/h2&gt;

&lt;p&gt;Mamba stands out with its Selective State Spaces, blending the adaptability of LSTMs with the efficiency of state space models. Its capability to process entire sequences in one go is reminiscent of Transformers but with a novel twist. RAG, on the other hand, improves the factual precision of Large Language Models (LLMs) by retrieving relevant context from massive external datasets and grounding generation in it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Role of Mamba in RAG
&lt;/h3&gt;

&lt;p&gt;The Mamba architecture plays a pivotal role in augmenting the capabilities of Retrieval Augmented Generation (RAG). Mamba, with its innovative approach to handling lengthy sequences, is particularly well-suited for enhancing RAG’s efficiency and accuracy. Its Selective State Spaces model allows for a more flexible and adaptable transition of states compared to traditional state space models, making it highly effective in the context of RAG.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Mamba Improves RAG
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Handling Lengthy Sequences:&lt;/strong&gt; Mamba’s inherent ability to scale to longer sequences without a significant trade-off in computational efficiency is crucial for RAG. This characteristic becomes particularly beneficial when dealing with extensive external knowledge bases, ensuring that the retrieval process is both quick and accurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Selective State Spaces:&lt;/strong&gt; The Selective State Spaces in Mamba provide a more nuanced approach to sequence processing. This feature is invaluable in RAG’s context retrieval process, as it allows for a more dynamic and context-sensitive analysis of the query and the corresponding information retrieved from databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Efficient Computation:&lt;/strong&gt; Mamba retains the efficient computation traits of state space models, enabling it to perform forward passes of entire sequences in one sweep. This efficiency is beneficial in the RAG framework, especially when integrating and processing large volumes of external data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility and Adaptability:&lt;/strong&gt; Mamba’s architecture, akin to LSTMs, offers flexibility and adaptability in processing sequences. This flexibility is advantageous when dealing with the variety and unpredictability of user queries in RAG, ensuring that the system can adeptly handle a wide range of information retrieval tasks.&lt;/p&gt;

&lt;p&gt;Before diving into the technical implementation, let’s set the stage for how we bring the concepts of Retrieval-Augmented Generation (RAG), Mamba architecture, and the Qdrant database together in a practical, code-driven scenario.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Utilize the Mamba Model and Integrate RAG with &lt;a href="https://qdrant.tech/"&gt;Qdrant&lt;/a&gt; for Efficient Data Retrieval
&lt;/h3&gt;

&lt;p&gt;In this section, we will explore a Python script that exemplifies the integration of these advanced technologies. This script not only illustrates the installation and setup of the necessary environments and libraries but also demonstrates how to prepare and process data, initialize and utilize the Mamba model, and effectively integrate RAG with &lt;a href="https://qdrant.tech/"&gt;Qdrant&lt;/a&gt; for efficient data retrieval and response generation. The following breakdown of the code will provide insights into each step of the process, showcasing how the synergy of Mamba’s computational efficiency and Qdrant’s retrieval capabilities can enhance the performance of a RAG-based system.&lt;/p&gt;

&lt;p&gt;Let’s delve into the code to see these cutting-edge technologies in action.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Environment Setup and Library Installation
&lt;/h3&gt;

&lt;p&gt;Initially, the script installs the necessary libraries, including PyTorch, Mamba-SSM, LangChain, the &lt;a href="https://qdrant.tech/"&gt;Qdrant&lt;/a&gt; client, and others, and then imports them. These installations are crucial for setting up the environment needed for RAG, Mamba, and Qdrant to work together.&lt;br&gt;
&lt;/p&gt;
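
&lt;p&gt;The article doesn’t list the exact install commands, so here is a minimal sketch (the package list and lack of version pins are assumptions; mamba-ssm additionally needs causal-conv1d and a CUDA toolchain):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical notebook install commands; pin versions to match your CUDA setup
!pip install -q torch transformers langchain langchain-community qdrant-client
!pip install -q causal-conv1d mamba-ssm sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the environment in place, the script pulls in its imports:&lt;/p&gt;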

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;inspect&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cleandoc&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mamba_ssm.models.mixer_seq_simple&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MambaLMHeadModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceBgeEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Qdrant&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DirectoryLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Loading Data
&lt;/h3&gt;

&lt;p&gt;Then, it downloads and unzips a dataset (new_articles.zip), presumably containing textual documents to be used in the RAG process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;wget&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;www&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;vs6ocyvpzzncvwh&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;new_articles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;unzip&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="n"&gt;new_articles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;new_articles&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Mamba Model Initialization
&lt;/h3&gt;

&lt;p&gt;The Mamba model is initialized with a specific model name ("havenhq/mamba-chat") and set to use either a GPU or CPU based on availability. This step is crucial for leveraging Mamba's efficient computation for long sequences in RAG.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;havenhq/mamba-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MambaLMHeadModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DEVICE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Tokenization and Model Input Preparation
&lt;/h3&gt;

&lt;p&gt;The tokenizer prepares the input for the model. It’s configured to handle the inputs and outputs appropriately for the Mamba model. The tokenization process is essential for transforming user queries into a format that Mamba can process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ANSWER_START&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;|assistant|&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;ANSWER_END&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;|endoftext|&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ANSWER_END&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-en-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;chat_template&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. RAG Process: Retrieval and Generation
&lt;/h3&gt;

&lt;p&gt;The script includes functions for loading documents, splitting text, and creating a database index using Qdrant. It illustrates the integration of Qdrant for efficient vector-based retrieval of relevant documents, a critical step in the RAG process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DirectoryLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./new_articles/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./*.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loader_cls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;#splitting the text into
&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_index&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="c1"&gt;#creates and returns an in-memory vector store to be used in the application
&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-en-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;encode_kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;normalize_embeddings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;# set True to compute cosine similarity
&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceBgeEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;encode_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;encode_kwargs&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;index_from_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qdrant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Local mode with in-memory storage only
&lt;/span&gt;            &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;index_from_loader&lt;/span&gt; &lt;span class="c1"&gt;#return the index to be cached by the client app
&lt;/span&gt;
&lt;span class="n"&gt;vector_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Qdrant serves as our vector database due to its fast indexing, querying capabilities, and support for various distance metrics. This makes it ideal for managing large volumes of vector data with enhanced search accuracy and relevance.&lt;/p&gt;
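
&lt;p&gt;For context, here is a minimal sketch of creating a collection with an explicit distance metric directly through the qdrant-client API (the collection name mirrors the one above; the 384-dimensional vector size matches BAAI/bge-small-en-v1.5 embeddings):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(":memory:")  # local, in-memory instance, as in get_index()
client.create_collection(
    collection_name="my_documents",
    # bge-small-en-v1.5 emits 384-dim vectors; cosine suits normalized embeddings
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The LangChain wrapper used above performs this setup for us inside Qdrant.from_documents.&lt;/p&gt;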

&lt;p&gt;The semantic_search function performs the retrieval part of RAG, querying the Qdrant vector index to find documents relevant to a given prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;semantic_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;original_prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;#rag client function
&lt;/span&gt;
    &lt;span class="n"&gt;relevant_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;list_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_prompts&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;list_prompts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_prompts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;list_prompts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The predict function then integrates the retrieval part with the generation part, where the Mamba model generates responses based on the context provided by the retrieved documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;selected_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;semantic_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;selected_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; , &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selected_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;selected_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please respond to the original query. If the selected document prompt is relevant and informative, provide a detailed answer based on its content. However, if the selected prompt does not offer useful information or is not applicable, simply state &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No answer found&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Original Prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;
                    Selected Prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;selected_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;
                    respond: &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;input_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DEVICE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;eos_token_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;extract_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
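
&lt;p&gt;Note that predict calls extract_response, which the article never shows. A minimal sketch, assuming the answer sits between the ANSWER_START and ANSWER_END markers defined earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def extract_response(decoded: str) -&amp;gt; str:
    # Keep only the assistant turn: text after the last &amp;lt;|assistant|&amp;gt; marker,
    # trimmed at the end-of-text token
    answer = decoded.split(ANSWER_START)[-1]
    return answer.split(ANSWER_END)[0].strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;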



&lt;h3&gt;
  
  
  6. Generating Responses
&lt;/h3&gt;

&lt;p&gt;The model generates responses to user queries ("How much money did Pando raise?", "What is the news about Pando?") by considering both the original prompt and the context retrieved from the Qdrant database. This step demonstrates the practical application of RAG, enhanced by Mamba's efficient processing and Qdrant's retrieval capabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How much money did Pando raise?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Selected Prompt: How much money did Pando raise?&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Selected Answer: $30 million in a Series B round, bringing its total raised to $45 million.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the news about Pando?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;    
Selected Prompt: What is the news about Pando?&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Selected Response: Pando has raised $30 million in a Series B round, bringing its total raised to $45 million. The startup is led by Nitin Jayakrishnan and Abhijeet Manohar, who previously worked together at iDelivery, an India-based freight tech marketplace. The startup is focused on global logistics and supply chain management through a software-as-a-service platform. Pando has a compelling sales, marketing and delivery capabilities, according to Jayakrishnan. The startup has also tapped existing enterprise users at warehouses, factories, freight yards and ports and expects to expand its customer base. The company is also open to exploring strategic partnerships and acquisitions with this round of funding.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our current experiment, we are utilizing a model with 2.7 billion parameters, and its performance is striking: it operates nearly as effectively as the 7-billion-parameter LLaMA2 model while being both faster and more memory-efficient despite its smaller size. This advantage could be pivotal when deploying AI in environments with limited computational power, such as mobile phones or other low-capacity devices.&lt;/p&gt;

&lt;p&gt;However, there is a trade-off; the 2.7B parameter model seems to lag slightly behind in reasoning capabilities when compared to some of its larger Transformer counterparts. Looking ahead, fine-tuning the model to enhance its reasoning skills could be a valuable step. For now, though, its balance of performance and efficiency makes it a compelling choice, especially for applications where computing resources are a constraint. This model holds the promise of broadening the accessibility and applicability of advanced AI technology.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/azharlabs/medium/blob/main/notebooks/rag_with_mamba_with_qdrant.ipynb?source=post_page-----3e9a12b610f3--------------------------------"&gt;Code&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Words
&lt;/h2&gt;

&lt;p&gt;In conclusion, this integration of RAG, Mamba, and Qdrant stands as a testament to the relentless pursuit of innovation in the field of AI. It represents a step towards making AI more efficient, accessible, and capable of handling the ever-growing demands of data processing in our digital world. As we continue to explore and refine these technologies, we eagerly anticipate the new possibilities they will unlock for the future of AI.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Vision Mamba: The Next Leap in Visual Representation Learning</title>
      <dc:creator>azhar</dc:creator>
      <pubDate>Sat, 20 Jan 2024 16:59:41 +0000</pubDate>
      <link>https://dev.to/moazharu/vision-mamba-the-next-leap-in-visual-representation-learning-58ja</link>
      <guid>https://dev.to/moazharu/vision-mamba-the-next-leap-in-visual-representation-learning-58ja</guid>
      <description>&lt;p&gt;In the ever-evolving landscape of artificial intelligence, the introduction of the Vision Mamba architecture heralds a significant shift in how we approach visual data processing. Mamba, an alternative neural network architecture to Transformers, initially captivated the AI community with its text-based applications. However, the recent development of its vision-centric variant, as detailed in the paper “Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Models,” signifies a groundbreaking stride in computer vision.&lt;/p&gt;

&lt;p&gt;Before we proceed, let’s stay connected! Please consider following me on DEV, and don’t forget to connect with me on LinkedIn for a regular dose of data science and deep learning insights. 🚀📊🤖&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Vision Mamba
&lt;/h2&gt;

&lt;p&gt;Vision Mamba, as an architecture, is designed to efficiently handle vision tasks — a departure from its text-focused predecessor. This shift is crucial given the increasing importance of visual data in our digital age, where images and videos are omnipresent, from social media to surveillance systems.&lt;/p&gt;

&lt;p&gt;The core of Vision Mamba lies in its ability to process visual data through a novel approach that differs from the Transformer models predominantly used in computer vision tasks. Transformers, while powerful, often require substantial computational resources, particularly for high-resolution images. Vision Mamba aims to address this by offering a more efficient alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vision Tasks and Their Importance
&lt;/h2&gt;

&lt;p&gt;To appreciate the significance of Vision Mamba, it’s essential to understand the variety of tasks in computer vision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Classification:&lt;/strong&gt; Identifying the category of an object within an image, like determining if an X-ray indicates pneumonia.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detection:&lt;/strong&gt; Locating specific objects within an image, such as identifying cars in a street scene.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Segmentation:&lt;/strong&gt; Differentiating and labeling various parts of an image, often used in medical imaging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tasks are integral to numerous applications, from healthcare diagnostics to traffic monitoring and beyond.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vision Mamba vs. Transformer Models
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98w73ogv7qrnj5bi57qo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98w73ogv7qrnj5bi57qo.png" alt="Image description" width="781" height="613"&gt;&lt;/a&gt;&lt;br&gt;
The paper, predominantly contributed by researchers from &lt;strong&gt;Huazhong University of Science and Technology, Horizon Robotics, and the Beijing Academy of Artificial Intelligence&lt;/strong&gt;, delves into how Vision Mamba is tailored for these vision tasks. The architecture’s efficiency comes from its bidirectional state space model, which theoretically allows for quicker processing of visual data compared to traditional Transformer models.&lt;/p&gt;

&lt;p&gt;Transformers, although highly effective, can be resource-intensive due to their self-attention mechanisms, especially when dealing with large image datasets. Vision Mamba’s architecture promises a more scalable solution, potentially enabling more complex and larger-scale visual processing tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Unique Challenges of Visual Data
&lt;/h3&gt;

&lt;p&gt;Handling visual data is inherently more complex than processing text. Images are not just sequences of pixels; they encompass intricate patterns, varying spatial relationships, and a need for understanding the overall context. This complexity makes the efficient processing of visual data a challenging task, particularly at scale and with high resolution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vision Mamba’s Approach
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F237wo66feeuevs8dr233.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F237wo66feeuevs8dr233.png" alt="Image description" width="800" height="539"&gt;&lt;/a&gt;&lt;br&gt;
The bidirectional Mamba blocks, a key feature of Vim, tackle these challenges head-on. By marking image sequences with positional embeddings and compressing visual representation with bidirectional state space models, Vision Mamba effectively captures the global context of an image. This approach addresses the inherent position sensitivity of visual data, a critical aspect that traditional Transformer models often struggle with, especially at higher resolutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vision Mamba Encoder
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu9yh1i3cx55y7n937nv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu9yh1i3cx55y7n937nv.png" alt="Image description" width="800" height="332"&gt;&lt;/a&gt;&lt;br&gt;
The proposed Vim model begins by dividing the input image into patches, which are then projected into patch tokens. These tokens are subsequently fed into the Vim encoder. For tasks such as ImageNet classification, we add an additional learnable classification token to the sequence of patch tokens. Unlike the Mamba model used for text sequence modeling, the Vim encoder uniquely processes the token sequence in both forward and backward directions.&lt;/p&gt;
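
&lt;p&gt;As a rough illustration (not the authors’ code), patchification and token projection can be sketched in PyTorch; the 16x16 patch size and 192-dimensional embedding are assumptions chosen to resemble a tiny-model configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into patches and project each one to a token."""
    def __init__(self, patch_size=16, in_chans=3, embed_dim=192):
        super().__init__()
        # A strided convolution is equivalent to cropping patches, flattening
        # them, and applying a shared linear projection
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, x):  # x: (batch, 3, height, width)
        tokens = self.proj(x).flatten(2).transpose(1, 2)  # (batch, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        # Add the learnable classification token (prepended here for simplicity)
        return torch.cat([cls, tokens], dim=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feeding a (1, 3, 224, 224) image through this module yields 196 patch tokens plus one classification token.&lt;/p&gt;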

&lt;h3&gt;
  
  
  Bidirectional Processing: A Game Changer
&lt;/h3&gt;

&lt;p&gt;A standout feature of Vision Mamba is its bidirectional processing capability. Unlike many contemporary models that process data in a unidirectional manner, Vision Mamba’s encoder processes tokens in both forward and backward directions. This approach is reminiscent of BERT in text processing and offers a more comprehensive analysis of the visual data. The bidirectional model allows for a richer understanding of the image context, a critical factor in accurate image classification and segmentation.&lt;/p&gt;
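
&lt;p&gt;Conceptually (a simplified sketch, not the paper’s exact block), the bidirectional pass amounts to running one state space scan left-to-right and another right-to-left over the same tokens, then merging the two views:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def bidirectional_mix(ssm_forward, ssm_backward, tokens):
    """tokens: (batch, seq_len, dim); ssm_* are causal sequence models."""
    fwd = ssm_forward(tokens)                                 # left-to-right scan
    bwd = ssm_backward(tokens.flip(dims=[1])).flip(dims=[1])  # right-to-left scan
    return fwd + bwd  # merge the two directions (summation is one simple choice)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The actual Vim block is richer, adding projections, convolutions, and gating around the two scans, but the sketch captures the core idea.&lt;/p&gt;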

&lt;h3&gt;
  
  
  Benchmarks and Performance
&lt;/h3&gt;

&lt;p&gt;The paper presents compelling evidence of Vision Mamba’s superiority through various benchmarks. On ImageNet classification, COCO object detection, and ADE20K semantic segmentation, Vim demonstrates not just higher performance but also greater efficiency. For instance, in handling high-resolution images (1248x1248), Vim is 2.8 times faster than DeiT while saving a significant 86.8% of GPU memory. This efficiency is particularly notable given the memory constraints often encountered in high-resolution image processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Comparative Analysis with ViT
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbxe2125tfze568hjb1p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbxe2125tfze568hjb1p.png" alt="Image description" width="652" height="747"&gt;&lt;/a&gt;&lt;br&gt;
Interestingly, the paper doesn’t just stop at comparing Vim with DeiT. It also includes comparisons with Google’s Vision Transformer (ViT). This is an important inclusion because ViT represents another significant advancement in Transformer-based vision models. The results in the paper show that while ViT is indeed a powerful model, Vim still surpasses it in efficiency and performance, especially as the resolution increases. This comparison is vital for readers familiar with the landscape of computer vision models, as it provides a broader context for evaluating Vim’s capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Importance of High-Resolution Image Processing
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t3dzd314ow11muyk98n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t3dzd314ow11muyk98n.png" alt="Image description" width="800" height="227"&gt;&lt;/a&gt;&lt;br&gt;
The paper emphasizes the critical importance of high-resolution image processing in various fields. In satellite imagery, for instance, high resolution is essential for detailed analysis and accurate conclusions. Similarly, in industrial settings such as PCB manufacturing, the ability to detect minute faults in high-resolution images can be crucial for quality control. VIM’s proficiency in handling such tasks not only shows its practical utility but also underscores the need for efficient high-resolution image processing models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Key Contributions of the Paper
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Introduction of Vision Mamba (VIM):&lt;/strong&gt; The paper introduces a revolutionary approach in the form of VIM, which utilizes bidirectional state space models (SSMs) for global visual context modeling and positional embeddings. This approach marks a departure from reliance on traditional attention mechanisms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Positional Understanding:&lt;/strong&gt; The VIM demonstrates an efficient way to grasp the positional context of visual data without the need for Transformer-based attention mechanisms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computation and Memory Efficiency:&lt;/strong&gt; VIM stands out for its sub-quadratic time computation and linear memory complexity, a stark contrast to the quadratic increase typically seen in Transformer models. This aspect makes VIM particularly suitable for processing high-resolution images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensive Experimental Validation:&lt;/strong&gt; Through comprehensive testing on benchmarks like ImageNet classification, VIM’s performance and efficiency are validated, solidifying its position as a formidable model in computer vision.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implications and Future Directions
&lt;/h2&gt;

&lt;p&gt;The development of Vision Mamba opens up exciting possibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Efficiency:&lt;/strong&gt; With its potential for faster processing, Vision Mamba could revolutionize areas like real-time video analysis and large-scale image processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility:&lt;/strong&gt; Its efficiency could make advanced computer vision more accessible to organizations with limited computational resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Innovation:&lt;/strong&gt; Vision Mamba might spur further innovations in neural network architectures, especially for specialized data types.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paper: &lt;a href="https://arxiv.org/pdf/2401.09417.pdf"&gt;https://arxiv.org/pdf/2401.09417.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Code: &lt;a href="https://github.com/hustvl/Vim"&gt;https://github.com/hustvl/Vim&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In summary, Vision Mamba (Vim) stands as a revolutionary model in the field of computer vision. Its unique architecture, bidirectional processing, and efficiency in handling high-resolution images position it as a superior alternative to existing Transformer-based models. Its potential applications are vast, spanning various sectors that rely on detailed visual data.&lt;/p&gt;

&lt;p&gt;As we progress further into an era dominated by visual content, models like Vision Mamba will become increasingly vital. They offer the promise of not just keeping up with the growing demand for image processing but doing so in a way that is both efficient and effective. The future of computer vision is being reshaped by these advancements, and Vision Mamba is at the forefront of this transformation. For those keen on exploring the cutting edge of AI and computer vision, delving into the full details of the Vision Mamba paper will undoubtedly be a rewarding endeavor.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Enhancing Text-to-Image AI: Prompt Recommendation System for Stable Diffusion Using Qdrant Vector Search and RAG</title>
      <dc:creator>azhar</dc:creator>
      <pubDate>Thu, 18 Jan 2024 17:19:07 +0000</pubDate>
      <link>https://dev.to/moazharu/enhancing-text-to-image-ai-prompt-recommendation-system-for-stable-diffusion-using-qdrant-vector-search-and-rag-lcm</link>
      <guid>https://dev.to/moazharu/enhancing-text-to-image-ai-prompt-recommendation-system-for-stable-diffusion-using-qdrant-vector-search-and-rag-lcm</guid>
      <description>&lt;p&gt;Stable Diffusion has emerged as a groundbreaking text-to-image model, transforming the way digital art and image synthesis are approached. By converting textual descriptions into detailed and nuanced images, Stable Diffusion opens a world of possibilities for artists, designers, and content creators. However, the effectiveness of this technology hinges on the quality of the input prompts, which guide the AI in generating relevant images.&lt;/p&gt;

&lt;p&gt;Before we proceed, let’s stay connected! Please consider following me on DEV, and don’t forget to connect with me on &lt;a href="https://www.linkedin.com/in/mohamed-azharudeen/"&gt;LinkedIn&lt;/a&gt; for a regular dose of data science and deep learning insights. 🚀📊🤖&lt;/p&gt;

&lt;h3&gt;
  
  
  The Challenge of Prompting Stable Diffusion
&lt;/h3&gt;

&lt;p&gt;Crafting the perfect prompt for Stable Diffusion is a nuanced art. The model responds to the intricacies of language, and a well-constructed prompt can lead to stunning visual outputs. Conversely, vague or poorly structured prompts may result in unsatisfactory images. The challenge for users is navigating through and understanding the vast array of potential prompts to find one that aligns with their vision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;To assist users in this task, a sophisticated system using Vector Search and Retrieval Augmented Generation (RAG) can be employed. This system aims to analyze a vast database of successful prompts, identifying and suggesting the most relevant ones to the user’s input, thus streamlining the process of initiating Stable Diffusion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector Search — A Key Solution
&lt;/h3&gt;

&lt;p&gt;Vector Search plays a pivotal role in this system. It involves transforming textual data into high-dimensional vectors using models like BGE embeddings. These vectors capture the semantic essence of the text, enabling the system to perform semantic searches. By comparing the vector of a user’s input with vectors from a prompt database, the system can identify the most semantically similar prompts.&lt;/p&gt;
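
&lt;p&gt;To make this concrete, here is a small sketch of the comparison (the example prompts are invented; BGE embeddings are normalized, so a plain dot product equals cosine similarity):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from langchain.embeddings import HuggingFaceBgeEmbeddings

embedder = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},
)

user_input = np.array(embedder.embed_query("a castle at sunset"))
stored_prompt = np.array(embedder.embed_query(
    "an ancient castle under a golden sunset, oil painting, highly detailed"
))
print(float(user_input @ stored_prompt))  # higher score = more similar prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Scaled to thousands of stored prompts, this pairwise comparison is exactly the search that a vector database accelerates.&lt;/p&gt;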

&lt;h3&gt;
  
  
  Utilizing Qdrant for Vector Database
&lt;/h3&gt;

&lt;p&gt;Qdrant, chosen for its efficiency and scalability, serves as the vector database. It offers fast indexing and querying capabilities, essential for handling large volumes of vector data. Qdrant’s support for different distance metrics and filtering options further enhances the search’s accuracy and relevance.&lt;/p&gt;

&lt;h2&gt;
  
  
  This system would involve several key steps:
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Prompt Database Creation
&lt;/h3&gt;

&lt;p&gt;The first step is compiling a diverse and comprehensive collection of prompts previously used with Stable Diffusion.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Importance of Diversity and Comprehensiveness
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Diversity:&lt;/strong&gt; This implies that the prompts should cover a wide range of subjects, styles, and themes. The goal is to encompass as many different types of imagery as possible — from landscapes and portraits to abstract art and specific object representations. Diversity ensures that the system can cater to a broad spectrum of user requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensiveness:&lt;/strong&gt; A comprehensive database is one that not only covers a wide range of subjects but also includes variations in the detail, complexity, and structure of the prompts. This includes prompts of varying lengths, different levels of descriptiveness, and diverse linguistic styles. A comprehensive database allows the system to understand and generate more nuanced and tailored prompts.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dataset&lt;/span&gt;

&lt;span class="c1"&gt;######################### Part 1: Load DiffusionDB ############################
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urlretrieve&lt;/span&gt;

&lt;span class="c1"&gt;# Download the parquet table
&lt;/span&gt;&lt;span class="n"&gt;table_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://huggingface.co/datasets/poloclub/diffusiondb/resolve/main/metadata.parquet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="nf"&gt;urlretrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata.parquet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Read the table using Pandas
&lt;/span&gt;&lt;span class="n"&gt;raw_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata.parquet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;raw_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Keep top 10K prompts
&lt;/span&gt;prompts_raw = raw_df['prompt'][0:10000]

del raw_df


######################### Part 2: Data Preparation ############################

# Keep only prompts with a word count of at least 10
def filter_strings_with_word_count(strings):
    filtered_strings = []
    for text in strings:
        words = text.split()
        if len(words) &amp;gt;= 10:
            filtered_strings.append(text)
    return filtered_strings

prompts_filtered = filter_strings_with_word_count(prompts_raw)

# Remove prompts with very high similarities (small Levenshtein edit distance).
# Note: the original version submitted is_unique() to a ThreadPoolExecutor but
# called .result() immediately on each future, which runs serially anyway, so
# the executor is dropped here. The all-pairs comparison is O(n^2) and slow
# for large lists.
import Levenshtein

def remove_similar_strings(strings, threshold):
    unique_strings = []

    def is_unique(s):
        for us in unique_strings:
            if Levenshtein.distance(s, us) &amp;lt;= threshold:
                return False
        return True

    for i, s in enumerate(strings):
        if is_unique(s):
            unique_strings.append(s)

        # Print number of strings processed for every 1000 steps
        # if (i + 1) % 1000 == 0:
        #     print(f"Processed {i + 1} strings")

    return unique_strings

# Set a similarity threshold in edit-distance units (adjust as needed)
similarity_threshold = 10

# Remove similar prompts
prompts_unique = remove_similar_strings(prompts_filtered, similarity_threshold)


########################## Part 3: Data Storage ###############################

import csv  # needed here unless already imported in Part 1

# Specify the CSV file name
csv_file_name = "prompts_unique.csv"

# Open the CSV file for writing with UTF-8 encoding
with open(csv_file_name, mode="w", newline="", encoding="utf-8") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["prompt example"])
    # Write each string as a separate row in the CSV file
    for string in prompts_unique[0:1000]:
        csv_writer.writerow([string])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script is part of a pipeline to process a large dataset of prompts for a model like Stable Diffusion. It involves downloading and filtering this dataset to ensure the prompts are diverse and unique, and then storing a subset of these prompts in a CSV file for further use.&lt;/p&gt;

&lt;p&gt;This kind of preprocessing is crucial for creating an effective dataset for tasks like training AI models or creating a prompt recommendation system.&lt;/p&gt;
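&lt;p&gt;To make the deduplication step concrete: an edit-distance threshold of 10 means two prompts are treated as near-duplicates if one can be turned into the other with at most ten single-character insertions, deletions, or substitutions. A minimal, self-contained check (the example strings below are invented for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import Levenshtein  # pip install python-Levenshtein

a = "a portrait of an astronaut riding a horse, highly detailed"
b = "a portrait of an astronaut riding a zebra, highly detailed"

# The strings differ in a single word, so the edit distance is small (at most 5)
# and falls well under the threshold of 10; remove_similar_strings() would keep
# only one of the two.
print(Levenshtein.distance(a, b))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;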

&lt;h3&gt;
  
  
  2. Vector Embedding
&lt;/h3&gt;

&lt;p&gt;We use a language model (BAAI/bge-small-en-v1.5, shown in the code below) to convert these prompts into semantic vectors and index them in Qdrant.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Representation:&lt;/strong&gt; The vectors produced by the language model are not just random numbers. They are carefully structured so that similar prompts have similar vector representations. This similarity in vector space ideally reflects semantic similarity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-Dimensional Space:&lt;/strong&gt; These vectors usually exist in a high-dimensional space (hundreds or thousands of dimensions), enabling them to encapsulate a wide range of linguistic features.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-en-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;encode_kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;normalize_embeddings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;# set True to compute cosine similarity
&lt;/span&gt;
&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceBgeEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="n"&gt;encode_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;encode_kwargs&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Debugging: Check if the file exists
&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompts_unique.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Or the correct relative path to your file
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The file &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; was not found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CSVLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    
&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;index_from_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qdrant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Local mode with in-memory storage only
&lt;/span&gt;        &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The conversion of prompts into semantic vectors and their subsequent indexing in a vector database like Qdrant is a foundational step in creating a prompt recommendation system for Stable Diffusion.&lt;/p&gt;

&lt;p&gt;This process enables the system to understand and work with prompts in a machine-readable format, paving the way for advanced search and retrieval functions based on the semantic content of the prompts. This step is vital for leveraging the full capabilities of AI in generating relevant and effective prompts for text-to-image models.&lt;/p&gt;
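&lt;p&gt;As a quick sanity check, you can embed a few prompts directly with the &lt;code&gt;embeddings&lt;/code&gt; object defined above. This is a minimal sketch, not part of the original pipeline: bge-small-en-v1.5 produces 384-dimensional vectors, and because &lt;code&gt;normalize_embeddings&lt;/code&gt; is set to True, the dot product of two vectors equals their cosine similarity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;v1 = embeddings.embed_query("a castle on a hill at sunset, oil painting")
v2 = embeddings.embed_query("an oil painting of a hilltop castle at dusk")
v3 = embeddings.embed_query("macro photo of a circuit board")

print(len(v1))  # 384 dimensions for bge-small-en-v1.5

# Vectors are normalized, so a plain dot product is the cosine similarity;
# the two castle prompts should score noticeably higher than the unrelated one.
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

print(dot(v1, v2), dot(v1, v3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;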

&lt;h3&gt;
  
  
  3. Semantic Search Implementation
&lt;/h3&gt;

&lt;p&gt;When a user inputs a prompt, the system converts it into a vector and performs a semantic search in Qdrant, retrieving closely related prompts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ldw0HJ_S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzmewxdcswe750yu6u6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ldw0HJ_S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzmewxdcswe750yu6u6k.png" alt="Original Prompt" width="800" height="171"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;semantic_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;original_prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;#rag client function
&lt;/span&gt;
    &lt;span class="n"&gt;relevant_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    

    &lt;span class="n"&gt;list_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_prompts&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;list_prompts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_prompts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;list_prompts&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
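&lt;p&gt;For example, wired to the in-memory index built earlier (the query string is illustrative; LangChain’s &lt;code&gt;similarity_search&lt;/code&gt; returns four results by default, and a &lt;code&gt;k&lt;/code&gt; argument changes that):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;original_prompt = "a cyberpunk city street at night, neon lights"  # illustrative input

related = semantic_search(index_from_loader, original_prompt)

for i, prompt in enumerate(related):
    print(i, prompt[:80])  # preview each retrieved prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;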



&lt;h3&gt;
  
  
  The Process of Semantic Search Implementation
&lt;/h3&gt;

&lt;h3&gt;
  
  
  User Input Conversion
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initial Step:&lt;/strong&gt; When a user inputs a prompt into the system, the first step is to interpret this input in a way that the machine understands — as a vector.&lt;/li&gt;
&lt;li&gt;The system employs a language model to convert the textual prompt into a high-dimensional vector. This process involves analyzing the linguistic characteristics of the prompt and encoding them into numerical form.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Performing the Semantic Search in Qdrant
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Searching for Similar Vectors:&lt;/strong&gt; The user’s input vector is then used to query a vector database (in this case, Qdrant).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How Qdrant Works:&lt;/strong&gt; Qdrant has indexed a vast array of prompts (also converted into vectors) in its database. When it receives the vector representation of a user’s prompt, it performs a search to find the most similar vectors from its index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Similarity:&lt;/strong&gt; The similarity between vectors is determined by their positions in the high-dimensional space: vectors that are close to each other represent prompts that are semantically similar, as the scored search sketch after this list illustrates.&lt;/li&gt;
&lt;/ul&gt;
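&lt;p&gt;To inspect those scores directly, LangChain’s Qdrant wrapper also exposes &lt;code&gt;similarity_search_with_score&lt;/code&gt;, which returns each document together with its score (cosine similarity here, since the embeddings are normalized). A minimal sketch against the index built above, with an illustrative query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;query = "a watercolor landscape with mountains"  # illustrative user prompt

# Each result is a (Document, score) pair; higher cosine similarity means a closer match
results = index_from_loader.similarity_search_with_score(query, k=3)

for doc, score in results:
    print(f"{score:.3f}  {doc.page_content[:80]}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;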

&lt;h4&gt;
  
  
  Retrieving Closely Related Prompts
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Result Generation:&lt;/strong&gt; The output of this search is a list of prompts whose vectors are most similar to the vector of the user’s input. These are the prompts that, semantically, closely relate to what the user is looking for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advantage Over Keyword Searches:&lt;/strong&gt; This method is more efficient and accurate than traditional keyword searches as it understands and matches the context and nuances of the user’s input, rather than just matching words.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DjGmlEG3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e671t7k77apa08jhj7wm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DjGmlEG3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e671t7k77apa08jhj7wm.png" alt="After conducting a semantic search of the user’s prompt using the Qdrant database" width="800" height="402"&gt;&lt;/a&gt;&lt;br&gt;
Semantic search is the component that makes the system feel precise: it matches the intent behind the user’s prompt rather than its exact wording, so the prompts surfaced for text-to-image generation reflect the user’s creative intent more faithfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Integration with RAG
&lt;/h3&gt;

&lt;p&gt;The top results from the vector search are then fed into a RAG setup, which intelligently combines elements from these prompts with the user’s original input, refining the prompt further.&lt;/p&gt;

&lt;p&gt;For the Retrieval Augmented Generation (RAG) component, we used the Mistral 7B model, served locally through LM Studio.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E6-EB20W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c6gc5xs9rtqb31m6cxvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E6-EB20W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c6gc5xs9rtqb31m6cxvj.png" alt="LM Studio is utilized for querying the Mistral model." width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration Process
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Combining with User’s Original Input:
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yW9yabnt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kvy94ug7fkrz98zebecp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yW9yabnt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kvy94ug7fkrz98zebecp.png" alt="Choosing the index corresponding to the relevant semantic that we intend to use." width="800" height="97"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The RAG setup takes these top-ranked prompts and intelligently merges their elements with the user’s original input.&lt;/li&gt;
&lt;li&gt;This integration is crucial as it ensures that the essence of the user’s initial intent is preserved, while enriching it with ideas and expressions from the retrieved prompts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Refining the Prompt
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The RAG language model then works on this combined input to generate a new, refined prompt.&lt;/li&gt;
&lt;li&gt;This refinement process involves creatively fusing the various elements, ensuring that the new prompt is not only relevant but also likely to produce more effective and accurate results when used in a text-to-image model.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LM Studio Endpoint URL
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:1234/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Headers
&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Data payload
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This app is to generate prompt for image generation. the user will provide Original Prompt for image generation. Based on Selected prompt, Only slightly revise Original Prompt. &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;                Please keep the Generated Prompt clear, complete, and less than 50 words. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Original Prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;original_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;
                Selected Prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;selected_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;
                Generated Prompt: &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Make the POST request
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Check if the request was successful
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Success:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
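&lt;p&gt;Putting the pieces together, here is a hedged end-to-end sketch (the function and variable names follow the snippets above; in the real app the selected prompt is the retrieved result the user picks rather than automatically the top hit):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;original_prompt = "a cyberpunk city street at night, neon lights"  # illustrative input

# 1. Retrieve semantically similar stored prompts from Qdrant
candidates = semantic_search(index_from_loader, original_prompt)

# 2. Let the user (or a heuristic) pick one; here we simply take the top hit
selected_prompt = candidates[0]

# 3. Ask the local Mistral model (via LM Studio) to merge the two
refined = generate_refined_prompt(original_prompt, selected_prompt)
print(refined)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;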



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I9cBpwbC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xtci4qx6posgqtfvy3oh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I9cBpwbC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xtci4qx6posgqtfvy3oh.png" alt="RAG-Generated Result: Enhanced and Refined Prompt" width="800" height="127"&gt;&lt;/a&gt;&lt;br&gt;
This RAG step not only streamlines prompt creation for complex models like Stable Diffusion but also raises the quality and effectiveness of the prompts themselves, showing how retrieval and generative techniques can be combined into practical AI applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Prompt Testing with Stable Diffusion
&lt;/h3&gt;

&lt;p&gt;The refined prompts can be tested with the Stable Diffusion model to demonstrate their effectiveness in generating high-quality images.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7cFng8zJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p2ogojft9j2srkfcq5v2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7cFng8zJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p2ogojft9j2srkfcq5v2.jpg" alt="Generated Image" width="512" height="512"&gt;&lt;/a&gt;&lt;br&gt;
The primary goal of prompt testing is to evaluate how well the refined prompts perform when used with the Stable Diffusion text-to-image model. This involves feeding the refined prompts into Stable Diffusion and analyzing the quality, relevance, and accuracy of the images produced.&lt;/p&gt;

&lt;p&gt;The goal is a user-friendly system that significantly reduces the time and effort needed to discover effective prompts for Stable Diffusion. By leveraging Vector Search and RAG, users can quickly find and refine prompts, leading to more satisfying and relevant image generation outcomes.&lt;/p&gt;
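&lt;p&gt;To close the loop, the refined prompt can be fed straight into Stable Diffusion. The sketch below uses the Hugging Face diffusers library on a GPU; the library and the model id are our assumptions for illustration, not part of the article’s repo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
from diffusers import StableDiffusionPipeline

# Assumed checkpoint; any diffusers-compatible Stable Diffusion model works
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# `refined` is the RAG-generated prompt from the previous step
image = pipe(refined).images[0]
image.save("generated.png")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;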

&lt;h4&gt;
  
  
  Code
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;GitHub Code:&lt;/strong&gt; &lt;a href="https://github.com/azharlabs/Vector-Search-and-RAG-for-Stable-Diffusion-using-Qdrant-DB"&gt;Vector Search and RAG for Stable Diffusion using Qdrant DB&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The integration of Vector Search and RAG into the process of generating prompts for Stable Diffusion represents a significant step forward in democratizing AI-driven art creation. It addresses a key challenge faced by many users of these advanced models and opens up new avenues for creative expression. As these technologies continue to evolve, we can expect even more sophisticated tools and systems to emerge, further enhancing the accessibility and utility of AI in artistic and design endeavors.&lt;/p&gt;

</description>
      <category>stablediffusion</category>
      <category>rag</category>
      <category>semantic</category>
      <category>qdrant</category>
    </item>
  </channel>
</rss>
