<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Robin Lee</title>
    <description>The latest articles on DEV Community by Robin Lee (@sl5035).</description>
    <link>https://dev.to/sl5035</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1083474%2F1a5f7cdb-8629-4aef-a299-cdd849d179be.jpeg</url>
      <title>DEV Community: Robin Lee</title>
      <link>https://dev.to/sl5035</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sl5035"/>
    <language>en</language>
    <item>
      <title>LLMs and Vector Databases</title>
      <dc:creator>Robin Lee</dc:creator>
      <pubDate>Mon, 22 May 2023 10:03:54 +0000</pubDate>
      <link>https://dev.to/sl5035/llms-and-vector-databases-40jo</link>
      <guid>https://dev.to/sl5035/llms-and-vector-databases-40jo</guid>
<description>&lt;p&gt;About a month ago, the vector database company &lt;a href="https://www.prnewswire.com/news-releases/weaviate-raises-50-million-series-b-funding-to-meet-soaring-demand-for-ai-native-vector-database-technology-301803296.html"&gt;Weaviate&lt;/a&gt; landed $50 million in Series B funding. About three weeks ago, &lt;a href="https://www.businessinsider.com/vector-database-startup-chroma-raises-seed-funding-generative-artificial-intelligence-2023-4"&gt;Chroma&lt;/a&gt;, an open-source project with only 5k GitHub stars, raised $18 million for its embeddings database. And about two weeks ago, &lt;a href="https://techcrunch.com/2023/04/27/pinecone-drops-100m-investment-on-750m-valuation-as-vector-database-demand-grows/"&gt;Pinecone&lt;/a&gt; announced a $100 million Series B investment at a $750 million post-money valuation. Naturally, a question arises: what is a vector database?&lt;/p&gt;

&lt;p&gt;To talk about vector databases, we first need to know what a vector is. A vector is just an array of numbers. However, vectors can represent more complex objects such as words, sentences, images, or audio files as points in a continuous, high-dimensional space; this representation is called an embedding.&lt;/p&gt;
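&lt;p&gt;To make this concrete, here is a minimal sketch of how similarity between embeddings is usually measured. The vectors below are made-up toy values, not the output of a real embedding model:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # angle-based similarity: 1.0 means identical direction, near 0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# toy 4-dimensional "embeddings" (real models output hundreds of dimensions;
# these numbers are made up purely for illustration)
king = [0.9, 0.8, 0.1, 0.2]
queen = [0.88, 0.82, 0.12, 0.21]
banana = [0.1, 0.05, 0.95, 0.9]

print(cosine_similarity(king, queen))   # high: similar concepts
print(cosine_similarity(king, banana))  # low: unrelated concepts
```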

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5KJoOIJd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezfhdrqbc1vzj14izinh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5KJoOIJd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezfhdrqbc1vzj14izinh.png" width="800" height="597"&gt;&lt;/a&gt;&lt;br&gt;Vector embeddings
  &lt;/p&gt;

&lt;p&gt;Embeddings place words with similar meanings close together, and they do the same for similar features in virtually any other data type. These embeddings can then power recommendation systems, search engines, and even text generation such as ChatGPT. But once you have your embeddings, the real question becomes: where do you store them, and how do you query them?&lt;/p&gt;

&lt;p&gt;That's where vector databases come in. In a relational database, you have rows and columns. In a document database, you have documents and collections. In a vector database, you have arrays of numbers clustered together by similarity, which can later be queried with ultra-low latency, making vector databases an ideal choice for AI-driven applications.&lt;/p&gt;
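&lt;p&gt;Under the hood, the core operation is a nearest-neighbor search over stored vectors. Here is a deliberately naive, brute-force sketch of that idea; real vector databases add approximate-nearest-neighbor indexes to stay fast at scale, and the class and data here are purely illustrative:&lt;/p&gt;

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Brute-force stand-in for a vector database (illustration only)."""

    def __init__(self):
        self.rows = []  # (id, vector, metadata) triples

    def upsert(self, vec_id, vector, metadata=None):
        self.rows.append((vec_id, vector, metadata))

    def query(self, vector, top_k=3):
        # score every stored vector against the query, best first
        scored = [(cosine(vector, v), vec_id, meta) for vec_id, v, meta in self.rows]
        scored.sort(reverse=True)
        return scored[:top_k]

store = TinyVectorStore()
store.upsert("a", [1.0, 0.0], {"text": "dogs"})
store.upsert("b", [0.9, 0.1], {"text": "puppies"})
store.upsert("c", [0.0, 1.0], {"text": "stock prices"})
results = store.query([1.0, 0.05], top_k=2)
print(results)
```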

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A34VLwSN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vq6l67r8ksgetdd5w5l5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A34VLwSN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vq6l67r8ksgetdd5w5l5.png" width="800" height="235"&gt;&lt;/a&gt;&lt;br&gt;Relational vs. Document databases
  &lt;/p&gt;

&lt;p&gt;Relational databases such as PostgreSQL have extensions like pgvector to support this type of functionality, and Redis offers first-class vector support through RediSearch. A bunch of new native vector databases are popping up, too. Weaviate and Milvus are open-source options written in Go. Chroma, which runs on ClickHouse under the hood, is another open-source option. Another extremely popular option is Pinecone, but it is not open source.&lt;/p&gt;




&lt;p&gt;Let's jump into some code to see what this looks like. I will be using Pinecone and Python. Following the official guide, I will implement an abstractive question answering program using the ELI5 BART model. Abstractive question answering focuses on generating multi-sentence answers to open-ended questions. It usually works by searching massive document stores for relevant information and then using that information to synthesize answers.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Our source data will be taken from the Wiki Snippets dataset, which contains over 17 million passages from Wikipedia. We will only utilize 5,000 passages that include "History" in the "section title" column &lt;em&gt;(due to memory constraints; you can use the complete dataset if you want, and the official guide used 50,000 passages)&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;
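&lt;p&gt;As a rough sketch of that selection step, here is the kind of filter loop that runs over the streamed dataset. The records below are hand-written stand-ins, not real Wiki Snippets rows, and the tiny limit is only to keep the sketch readable:&lt;/p&gt;

```python
# toy records standing in for streamed Wiki Snippets entries; real records
# carry these same fields, among others
stream = [
    {"article_title": "Electric power system", "section_title": "History", "passage_text": "..."},
    {"article_title": "Banana", "section_title": "Cultivation", "passage_text": "..."},
    {"article_title": "Rome", "section_title": "History of Rome", "passage_text": "..."},
]

total = 2  # the post keeps 5,000 passages; a tiny limit keeps this sketch short
docs = []
for record in stream:
    # keep only passages whose section title mentions "History"
    if "History" in record["section_title"]:
        docs.append(record)
    if len(docs) == total:
        break

print(len(docs))
```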

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# create a pandas dataframe with the documents we extracted
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D1nQ1Jbw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2gitdrrjv16gmcl410c5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D1nQ1Jbw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2gitdrrjv16gmcl410c5.png" alt="A sneak peek at our source dataset" width="800" height="170"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To build our vector index, we must first establish a connection with Pinecone. Then, we create a new index. An index is the highest-level organizational unit of vector data in Pinecone. It accepts and stores vectors, serves queries over the vectors it contains, and performs other vector operations over its contents. We specify the metric type as "cosine" and the dimension as 768 because the retriever we use to generate context embeddings is optimized for cosine similarity and outputs 768-dimensional vectors. The other available metrics are "euclidean" and "dotproduct."&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pinecone&lt;/span&gt;

&lt;span class="c1"&gt;# connect to pinecone environment
&lt;/span&gt;&lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"YOUR_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"us-central1-gcp"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;index_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"abstractive-question-answering"&lt;/span&gt;

&lt;span class="c1"&gt;# check if the abstractive-question-answering index exists
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;list_indexes&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# create the index if it does not exist
&lt;/span&gt;    &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"cosine"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# connect to abstractive-question-answering index we created
&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;We will use a SentenceTransformer model based on Microsoft's MPNet as our retriever. For the generator, we will use ELI5 BART, a sequence-to-sequence model trained on the "Explain Like I'm 5" (ELI5) dataset. Sequence-to-sequence models take a text sequence as input and produce a different text sequence as output. You can download both models from the &lt;a href="https://huggingface.co/"&gt;Hugging Face hub&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="c1"&gt;# set device to GPU if available
&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'cuda'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="s"&gt;'cpu'&lt;/span&gt;
&lt;span class="c1"&gt;# load the retriever model from huggingface model hub
&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"flax-sentence-embeddings/all_datasets_v3_mpnet-base"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;retriever&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PDYENcWz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6i5cp6is2gs2b6yarf47.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PDYENcWz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6i5cp6is2gs2b6yarf47.png" alt="Sentence Transformer" width="800" height="85"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BartTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BartForConditionalGeneration&lt;/span&gt;

&lt;span class="c1"&gt;# load bart tokenizer and model from huggingface
&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BartTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'vblagoje/bart_lfqa'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BartForConditionalGeneration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'vblagoje/bart_lfqa'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Next, we upload our data to the Pinecone index using the &lt;code&gt;index.upsert()&lt;/code&gt; command. If the operation is successful, you should see the following output.&lt;/p&gt;
&lt;/blockquote&gt;
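&lt;p&gt;As a sketch of how that upload is usually organized: the embeddings are upserted in small batches rather than one giant call. The helper below only builds the batches; in the real script, each batch would be passed to &lt;code&gt;index.upsert()&lt;/code&gt;. The vectors here are dummies:&lt;/p&gt;

```python
def to_batches(vectors, batch_size=64):
    # yield consecutive slices; each slice would be one index.upsert() call
    for i in range(0, len(vectors), batch_size):
        yield vectors[i:i + batch_size]

# dummy (id, embedding, metadata) tuples shaped like an upsert payload
vectors = [(str(i), [0.0] * 768, {"passage_text": "..."}) for i in range(130)]
batches = list(to_batches(vectors))
print(len(batches), len(batches[-1]))
```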

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--P1cwgLk---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/chvu166srupdr89qt2dh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P1cwgLk---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/chvu166srupdr89qt2dh.png" alt="Uploading data" width="800" height="134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Then let's write some helper functions to retrieve context passages from the Pinecone index and to format the query the way the generator expects its input.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# generate embeddings for the query
&lt;/span&gt;    &lt;span class="n"&gt;xq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# search pinecone index for context passage with the answer
&lt;/span&gt;    &lt;span class="n"&gt;xc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;xc&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# extract passage_text from Pinecone search result and add the &amp;lt;P&amp;gt; tag
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;P&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'metadata'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'passage_text'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# concatenate all context passages
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# concatenate the query and context passages
&lt;/span&gt;    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"when was the first electric power system built?"&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yEru-LjE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o9f7rddb187k9lcl5pth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yEru-LjE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o9f7rddb187k9lcl5pth.png" alt="Query result" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;
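&lt;p&gt;To see what &lt;code&gt;format_query&lt;/code&gt; produces without a live index, here is the same formatting logic applied to a hand-mocked result. The matches below imitate the shape of the &lt;code&gt;index.query()&lt;/code&gt; output, and the passage tag is built from character codes only to keep the snippet self-contained:&lt;/p&gt;

```python
P_TAG = chr(60) + "P" + chr(62)  # the passage marker tag the ELI5 BART generator expects

def format_query_demo(query, context):
    # same logic as format_query above, usable without a live Pinecone index
    passages = [P_TAG + " " + m["metadata"]["passage_text"] for m in context]
    return "question: " + query + " context: " + " ".join(passages)

# hand-mocked matches shaped like a Pinecone query result
matches = [
    {"metadata": {"passage_text": "Pearl Street Station began service in 1882."}},
    {"metadata": {"passage_text": "It was built by the Edison Illuminating Company."}},
]

prompt = format_query_demo("when was the first electric power system built?", matches)
print(prompt)
```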

&lt;blockquote&gt;
&lt;p&gt;Lastly, we'll write a helper function that generates the answer given a query.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pprint&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# tokenize the query to get input_ids
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"pt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# use generator to predict output ids
&lt;/span&gt;    &lt;span class="n"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"input_ids"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;num_beams&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# use tokenizer to decode the output ids
&lt;/span&gt;    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clean_up_tokenization_spaces&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;We use this function to test different queries as shown below.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OjPf3Itu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i6ug3fnwmib9nx6t4vdd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OjPf3Itu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i6ug3fnwmib9nx6t4vdd.png" alt="Different queries and their results" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note that the answers are not complete since we only utilized 5,000 passages. You can adjust the number of passages and observe the results.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;The real reason these databases are so hot right now is that they can extend LLMs with long-term memory. You start with a general-purpose model like OpenAI's GPT-4, Meta's LLaMA, or Google's LaMDA, then provide your own data in a vector database. When the user makes a prompt, you can query relevant documents from your own database to update the context, which customizes the final response. You can also retrieve historical conversations the same way, giving the AI long-term memory. Vector databases also integrate with tools like LangChain that chain multiple LLM calls together.&lt;/p&gt;
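&lt;p&gt;A minimal sketch of that retrieval-augmented flow, with the vector-database lookup already done and only the prompt assembly shown; the prompt format here is made up, since it is entirely up to the application:&lt;/p&gt;

```python
def build_prompt(question, retrieved_passages, history):
    # stitch vector-database search results and past turns into one context
    context = "\n".join(retrieved_passages)
    memory = "\n".join(history)
    return (
        "Answer the question using the context and conversation history.\n"
        "Context:\n" + context + "\n"
        "History:\n" + memory + "\n"
        "Question: " + question
    )

prompt = build_prompt(
    "What did we decide about the launch date?",
    ["Doc 17: launch moved to June 3 after the review."],   # from the vector DB
    ["user: let's revisit the launch date",                  # long-term memory
     "assistant: noted, checking the docs"],
)
print(prompt)
```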

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Some parts transcribed from Fireship's video: Vector databases are so hot right now. WTF are they?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Official abstractive-question-answering guide: &lt;a href="https://docs.pinecone.io/docs/abstractive-question-answering"&gt;https://docs.pinecone.io/docs/abstractive-question-answering&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>machinelearning</category>
      <category>news</category>
    </item>
    <item>
      <title>Twitter's Open-Source Recommendation Algorithm</title>
      <dc:creator>Robin Lee</dc:creator>
      <pubDate>Thu, 18 May 2023 06:02:32 +0000</pubDate>
      <link>https://dev.to/sl5035/twitters-open-source-recommendation-algorithm-2c08</link>
      <guid>https://dev.to/sl5035/twitters-open-source-recommendation-algorithm-2c08</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--s4GS0ue5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.cms-twdigitalassets.com/content/dam/blog-twitter/engineering/en_us/main-template-assets/Eng_EXPLORE_Pink.png.twimg.768.png" height="288" class="m-0" width="768"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm" rel="noopener noreferrer" class="c-link"&gt;
          Twitter's Recommendation Algorithm
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Twitter aims to deliver you the best of what’s happening in the world right now. This blog is an introduction to how the algorithm selects Tweets for your timeline.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--NaYBMdGN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.twitter.com/etc/designs/blog-twitter/public/img/favicon.ico" width="48" height="48"&gt;
        blog.twitter.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Less than seven months ago, Elon Musk paid 44 billion dollars for Twitter. Since then, he has fired half the company and given blue check marks to everyone. Twitter is now worth only 20 billion dollars. Many users have moved to Mastodon, and the NYT lost its blue check. It looks like Twitter is collapsing. In reality, however, Elon is playing a long game of chess against mainstream news media like Fox News and CNN. He is trying to take their advertisers by making Twitter the future platform for all journalism.&lt;/p&gt;

&lt;p&gt;Twitter open-sourced part of its recommendation algorithm about a month ago. Although it is real production code, it is not 100 percent of the code base, so it is mainly useful for research and transparency. The code is mostly written in Scala, a JVM language similar to Java but more concise. Twitter was originally built with Ruby on Rails, but the company moved away from it over a decade ago.&lt;/p&gt;

&lt;p&gt;If you take a closer look at some of the files in the repo, you can spot some extremely interesting implementation details. Take the ranking parameters, for example &lt;em&gt;(getLinearRankingParams from the EarlybirdTensorflowBasedSimilarityEngine.scala file has been deprecated as of Apr 05, 2023)&lt;/em&gt;.&lt;/p&gt;







&lt;p&gt;We have a bunch of ranking parameters, each with a default value. Retweets provide a 20x boost while likes provide a 30x boost. Images and videos also provide a small boost. Not surprisingly, you also get a boost for being a paying Twitter Blue member.&lt;/p&gt;

&lt;p&gt;On the other hand, a tweet can get a negative boost if its account has a lot of mutes, blocks, or spam reports. Spelling errors and made-up words will also earn you a debuff.&lt;/p&gt;

&lt;p&gt;Offensive, spamming, and NSFW tweets can also get a debuff while trending, verified, and media tweets get a boost. There is also a long list of topics that won't be amplified: anything that has been flagged as misinformation, harassment, etc.&lt;/p&gt;
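&lt;p&gt;Conceptually, you can think of these parameters as multipliers applied to a tweet's base score. The sketch below is a toy model with made-up signal names and simplified math, not Twitter's actual Scala logic:&lt;/p&gt;

```python
# illustrative multipliers loosely mirroring the defaults discussed above;
# the names and structure here are invented for the sketch
BOOSTS = {
    "retweet": 20.0,
    "like": 30.0,
    "has_media": 2.0,
    "is_blue_member": 4.0,
    "spam_reports": 0.5,  # values below 1.0 act as debuffs
    "offensive": 0.1,
}

def rank_score(base, signals):
    # apply each active signal's multiplier to the base score
    score = base
    for name in signals:
        score *= BOOSTS[name]
    return score

print(rank_score(1.0, ["retweet", "has_media"]))  # boosted
print(rank_score(1.0, ["offensive"]))             # debuffed
```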




&lt;p&gt;How does Twitter actually select the tweets to display on our home page using these parameters, then? We can break the recommendation pipeline into three parts. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_qHtAMGB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rm3lmoqct13u6va03sci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_qHtAMGB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rm3lmoqct13u6va03sci.png" width="800" height="257"&gt;&lt;/a&gt;&lt;br&gt;How the twitter recommendation pipeline works
  &lt;/p&gt;

&lt;p&gt;The first step is to find a pool of roughly 1,500 tweets you might be interested in, using a technique called &lt;strong&gt;candidate sourcing&lt;/strong&gt;. Twitter sources candidates in three ways. The first pool, which makes up the majority of your home timeline, comes from the accounts you follow: your in-network source. For this, Twitter uses a model called &lt;a href="https://www.ueo-workshop.com/wp-content/uploads/2014/04/sig-alternate.pdf"&gt;Realgraph&lt;/a&gt;, which predicts the likelihood of engagement between two users. The second pool comes from accounts you don't follow yet, your out-of-network source, using two concepts: social graphs and embedding spaces. To select relevant tweets from your social graph, Twitter uses &lt;a href="https://www.vldb.org/pvldb/vol9/p1281-sharma.pdf"&gt;GraphJet&lt;/a&gt;, a graph processing engine that maintains and traverses a real-time interaction graph between users and tweets. For most of your out-of-network tweets, however, Twitter uses an algorithm called &lt;a href="https://dl.acm.org/doi/10.1145/3394486.3403370"&gt;SimClusters&lt;/a&gt; to discover communities anchored by clusters of influential users in an embedding space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Bwr__zgu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uxnvo25qidaofu7l1wzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Bwr__zgu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uxnvo25qidaofu7l1wzo.png" width="800" height="524"&gt;&lt;/a&gt;&lt;br&gt;Communities in an embedding space grouped by the SimClusters algorithm
  &lt;/p&gt;

&lt;p&gt;From there, Twitter ranks that pool of tweets with a 48-million-parameter neural network. Lastly, it filters the ranked tweets using static rules, removing, for example, tweets from accounts you've blocked or muted.&lt;/p&gt;
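&lt;p&gt;Put together, the three stages look roughly like this. This is a toy sketch: in reality the ranker is a neural network and the filtering rules are far more elaborate:&lt;/p&gt;

```python
def recommend(candidates, score, is_filtered, limit=2):
    # stage 2: rank the sourced candidates (stage 1 built `candidates`)
    ranked = sorted(candidates, key=score, reverse=True)
    # stage 3: drop tweets failing static rules (blocked/muted authors, etc.)
    return [t for t in ranked if not is_filtered(t)][:limit]

# toy candidate pool with pre-computed scores
tweets = [
    {"id": 1, "score": 0.9, "author": "blocked_account"},
    {"id": 2, "score": 0.7, "author": "friend"},
    {"id": 3, "score": 0.4, "author": "celebrity"},
]
blocked = {"blocked_account"}
timeline = recommend(tweets, lambda t: t["score"], lambda t: t["author"] in blocked)
print([t["id"] for t in timeline])
```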




&lt;p&gt;Why would Elon do this? Why would he release his trade secrets to the public? Well, it kind of makes Twitter the Linux of social media: the public can identify parts of the algorithm that are unfair and address them in the open.&lt;/p&gt;

&lt;p&gt;In my opinion, it is mostly a marketing move to build trust. It no longer feels like Twitter is run by a mysterious figure who can de-boost content without any transparency. There is also a huge opportunity here: trust in the mainstream media has fallen so low that many people already use Twitter to consume the news. And although Twitter is currently losing money, the company has talked about compensating content creators just like YouTube and other platforms do. When that happens, journalists could potentially make a living on Twitter and put their best content there.&lt;/p&gt;

&lt;p&gt;Elon knows Twitter Blue is never going to make Twitter much money; rather, it is designed to uplift independent creators while embarrassing the establishment. The blue checks are now irrelevant, and by open-sourcing the code, Twitter is laying the groundwork to become the "fair and balanced," most trusted name in news. This may also force other social media platforms to become more transparent.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Some parts transcribed from Fireship's video: Twitter algorithm open-sourced...&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>twitter</category>
      <category>machinelearning</category>
      <category>news</category>
      <category>scala</category>
    </item>
  </channel>
</rss>
