<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ce Gao</title>
    <description>The latest articles on DEV Community by Ce Gao (@gaocegege).</description>
    <link>https://dev.to/gaocegege</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F924627%2F45195224-8adc-48fb-97e9-39f353ab0f9a.jpeg</url>
      <title>DEV Community: Ce Gao</title>
      <link>https://dev.to/gaocegege</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gaocegege"/>
    <language>en</language>
    <item>
      <title>My binary vector search is better than your FP32 vectors</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Tue, 26 Mar 2024 02:10:06 +0000</pubDate>
      <link>https://dev.to/gaocegege/my-binary-vector-search-is-better-than-your-fp32-vectors-4j2k</link>
      <guid>https://dev.to/gaocegege/my-binary-vector-search-is-better-than-your-fp32-vectors-4j2k</guid>
      <description>&lt;p&gt;Within the field of vector search, an intriguing development has arisen: binary vector search. This approach shows promise in tackling the long-standing issue of memory consumption by achieving a remarkable 30x reduction. However, a critical aspect that sparks debate is its effect on accuracy.&lt;/p&gt;

&lt;p&gt;We believe that using binary vector search, along with specific optimization techniques, can maintain similar accuracy. To provide clarity on this subject, we showcase a series of experiments that will demonstrate the effects and implications of this approach.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is a binary vector?
&lt;/h1&gt;

&lt;p&gt;A binary vector is a representation of a vector where each element in the vector is encoded as a binary value, typically either 0 or 1. This encoding scheme transforms the original vector, which may contain real-valued or high-dimensional data, into a binary format.&lt;/p&gt;
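
&lt;p&gt;One common way to obtain a binary vector from a real-valued embedding is to keep only the sign of each element. Here is a minimal sketch (thresholding at zero is an assumption for illustration; other quantization rules exist):&lt;/p&gt;

```python
def binarize(vec):
    """Map each real-valued element to a bit: 1 if positive, else 0.
    (Thresholding at zero is one common scheme, assumed here for
    illustration; other thresholds and quantization rules exist.)"""
    return [1 if x > 0 else 0 for x in vec]

print(binarize([0.34, -1.77, 0.08, -0.87]))  # [1, 0, 1, 0]
```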

&lt;p&gt;Binary vectors require only one bit of memory to store each element, while the original float32 vectors need 4 bytes for each element. This means that using binary vectors can reduce memory usage by &lt;strong&gt;up to 32 times&lt;/strong&gt;. Additionally, this reduction in memory requirements corresponds to a notable increase in Requests Per Second (RPS) for binary vector operations.&lt;/p&gt;

&lt;p&gt;Let's consider an example where we have 1 million vectors, and each vector is represented by float32 values in a 3072-dimensional space. In this scenario, the original float32 vector index would require around 20 gigabytes (GB) of memory to store all the vectors.&lt;/p&gt;

&lt;p&gt;Now, if we were to use binary vectors instead, the memory usage would be significantly reduced. In this case, the binary vector index would take approximately 600 megabytes (MB) to store all 1 million vectors.&lt;/p&gt;
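
&lt;p&gt;As a back-of-the-envelope check, the raw storage for both representations can be computed directly. The index sizes quoted above are somewhat larger than these raw figures because an index also stores graph links and metadata:&lt;/p&gt;

```python
# Raw storage for 1M vectors of 3072 dimensions, float32 vs. binary.
num_vectors = 1_000_000
dims = 3072

fp32_bytes = num_vectors * dims * 4     # 4 bytes per float32 element
binary_bytes = num_vectors * dims // 8  # 1 bit per element

print(f"float32: {fp32_bytes / 2**30:.1f} GiB")    # float32: 11.4 GiB
print(f"binary:  {binary_bytes / 2**20:.1f} MiB")  # binary:  366.2 MiB
print(f"ratio:   {fp32_bytes // binary_bytes}x")   # ratio:   32x
```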

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One might expect this reduction in memory to come with a significant drop in accuracy, since binary vectors discard much of the original information.&lt;/p&gt;

&lt;p&gt;Surprisingly, our experiments showed that the &lt;strong&gt;decrease in accuracy was not as big as expected.&lt;/strong&gt; Even though binary vectors lose some specific details, they can still capture important patterns and similarities that allow them to maintain a reasonable level of accuracy.&lt;/p&gt;

&lt;h1&gt;
  
  
  Experiment
&lt;/h1&gt;

&lt;p&gt;To evaluate the performance metrics in comparison to the original vector approach, we conducted benchmarking using the &lt;a href="https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M"&gt;&lt;code&gt;dbpedia-entities-openai3-text-embedding-3-large-3072-1M&lt;/code&gt;&lt;/a&gt; dataset. The benchmark was performed on a Google Cloud virtual machine (VM) with specifications of n2-standard-8, which includes 8 virtual CPUs and 32GB of memory. We used &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; v0.2.1 as the vector database.&lt;/p&gt;

&lt;p&gt;After inserting 1 million vectors into the database table, we built indexes for both the original float32 vectors and the binary vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;text_embedding_3_large_3072_embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;text_embedding_3_large_3072_bvector&lt;/span&gt; &lt;span class="n"&gt;bvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;openai_vector_index&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding_3_large_3072_embedding&lt;/span&gt; &lt;span class="n"&gt;vector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;openai_vector_index_bvector&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai3072&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding_3_large_3072_bvector&lt;/span&gt; &lt;span class="n"&gt;bvector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After building the indexes, we conducted vector search queries to assess the performance. These queries were executed with varying limits, indicating the number of search results to be retrieved (limit 5, 10, 50, 100).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RHksQ0b---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710992242559/95798c49-7e3d-49ba-a7ce-412cd737ba36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RHksQ0b---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710992242559/95798c49-7e3d-49ba-a7ce-412cd737ba36.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We observed that the Requests Per Second (RPS) for binary vector search was approximately 3000, whereas the RPS for the original vector search was only around 300.&lt;/p&gt;

&lt;p&gt;The RPS metric indicates the number of requests or queries that can be processed by the system per second. A higher RPS value signifies a higher throughput and faster response time.&lt;/p&gt;

&lt;p&gt;However, the accuracy of the binary vector search dropped to about 80% of that of the original vector search. Such a decrease may be tolerable for some use cases, but it is unacceptable in situations where high accuracy is crucial.&lt;/p&gt;

&lt;h1&gt;
  
  
  Optimization: adaptive retrieval
&lt;/h1&gt;

&lt;p&gt;Luckily, there is a simple and effective method called adaptive retrieval, which we learned from &lt;a href="https://aniketrege.github.io/blog/2024/mrl/#what-is-mrl-really-this-time"&gt;Matryoshka Representation Learning&lt;/a&gt;, to improve the accuracy.&lt;/p&gt;

&lt;p&gt;The name sounds complex, but the idea behind adaptive retrieval is straightforward. Let's say we want to find the best 100 candidates. We can follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query the binary vector index&lt;/strong&gt; to retrieve a larger set (e.g. 200 candidates) from the 1 million embeddings. This is a fast operation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rerank the candidates using a KNN query&lt;/strong&gt; to retrieve the top 100. Note that this step runs exact KNN rather than ANN: KNN is well-suited to small candidate sets where exact similarity search is affordable, making it an excellent choice for reranking.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
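
&lt;p&gt;The two steps above can be sketched in plain Python over in-memory vectors. This is a toy illustration only (&lt;code&gt;hamming&lt;/code&gt;, &lt;code&gt;l2&lt;/code&gt;, and &lt;code&gt;adaptive_search&lt;/code&gt; are our own helpers, not pgvecto.rs APIs); in the database, step 1 goes through the bvector index and step 2 is an exact order-by over the shortlist:&lt;/p&gt;

```python
import math

def binarize(vec):
    # Illustrative quantization rule: 1 for positive elements, else 0.
    return [1 if x > 0 else 0 for x in vec]

def hamming(a, b):
    # Distance between two binary vectors: number of differing bits.
    return sum(x != y for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_search(vectors, query, k):
    """Return indices of the top-k vectors for `query` (toy version)."""
    # Step 1: fast coarse pass over binary codes, keep 2*k candidates.
    bq = binarize(query)
    shortlist = sorted(range(len(vectors)),
                       key=lambda i: hamming(binarize(vectors[i]), bq))[:2 * k]
    # Step 2: exact KNN rerank of the shortlist with full-precision vectors.
    return sorted(shortlist, key=lambda i: l2(vectors[i], query))[:k]
```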

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7vZHJEHM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710993929166/a09969ef-8515-4236-9d4a-471916b8f363.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7vZHJEHM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710993929166/a09969ef-8515-4236-9d4a-471916b8f363.png" alt="" width="800" height="650"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By incorporating this reranking step, we can achieve a notable increase in accuracy, potentially reaching up to 95%. Additionally, the system maintains a high Requests Per Second (RPS), approximately 1700. Furthermore, despite these improvements, the memory usage of the index remains significantly smaller, around 30 times less, compared to the original vector representation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below is the SQL code that can be used to execute the adaptive retrieval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;match_documents_adaptive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;match_count&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;SETOF&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="c1"&gt;-- Step 1: Query binary vector index to retrieve match_count * 2 candidates&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;shortlist&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;text_embedding_3_large_3072_bvector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;binarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;match_count&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;-- Step 2: Rerank the candidates using a KNN query to retrieve the top candidates&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;shortlist&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;text_embedding_3_large_3072_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Comparison with shortening vectors
&lt;/h1&gt;

&lt;p&gt;OpenAI's latest embedding model &lt;code&gt;text-embedding-3-large&lt;/code&gt; has a feature that allows you to &lt;a href="https://openai.com/blog/new-embedding-models-and-api-updates#fn-A"&gt;shorten vectors&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BVhI6THY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/shortening-embedding.9WYnJK_l.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BVhI6THY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/shortening-embedding.9WYnJK_l.svg" alt="" width="800" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It produces embeddings with 3072 dimensions by default, but you can safely truncate dimensions from the end of the vector and still retain a valid representation of the text. For example, you could shorten the embeddings to 1024 dimensions.&lt;/p&gt;
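
&lt;p&gt;Shortening amounts to truncating the embedding and re-normalizing it. Here is a minimal sketch (our own helper, not an OpenAI API; we assume the embeddings start out unit-length, so re-normalizing keeps cosine and dot-product scores comparable):&lt;/p&gt;

```python
import math

def shorten(embedding, dims):
    """Keep the first `dims` dimensions and re-normalize to unit length.
    (Assumes a model trained so that prefixes remain meaningful, as with
    Matryoshka-style embeddings such as text-embedding-3-large.)"""
    truncated = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]
```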

&lt;p&gt;This feature can help you save memory and make your requests faster, just like binary vectors. It would be a good idea to compare the performance and see which one works better for your needs.&lt;/p&gt;

&lt;p&gt;Based on what we discovered, the conclusion is clear: &lt;strong&gt;Binary vectors significantly outperform shortened vectors.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We performed similar benchmarks to compare with binary vectors. We created indexes using the same dataset and machine type, but with varying dimensionalities. One index had 256 dimensions, while the other had 1024 dimensions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LbH2AjZM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/first-pass.6ZFqbcX6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LbH2AjZM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/first-pass.6ZFqbcX6.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The 1024-dimensional index achieved an accuracy of approximately 85% with a request rate of 1000 requests per second (RPS). On the other hand, the 256-dimensional index had around 60% accuracy with a higher request rate of 1200 RPS.&lt;/p&gt;

&lt;p&gt;The 1024-dimensional index required approximately 8GB of memory, while the 256-dimensional index used around 2GB. In comparison, the binary vector approach achieved an accuracy of around 80% with a request rate of 3000 RPS, and its memory usage was approximately 600MB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We implemented adaptive retrieval with lower-dimensional indexes. The binary vector index still outperformed the 256-dimensional index in terms of both request rate (RPS) and accuracy, while also exhibiting lower memory usage. On the other hand, the adaptive retrieval with the 1024-dimensional index achieved a higher accuracy of 99%; however, it had a relatively lower request rate and consumed 12 times more memory compared to the other indexes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j7CcjVnC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-bench.6qE-m9sI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j7CcjVnC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-bench.6qE-m9sI.png" alt="adaptive retrieval benchmark" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;By utilizing adaptive retrieval, binary vectors can maintain a high level of accuracy while reducing memory usage by 30 times. We have presented the benchmark metrics in a table to summarize the results. Note that these outcomes are specific to the OpenAI text-embedding-3-large model, which possesses this particular property.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kau5NtIV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-tab.EPcQOvxQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kau5NtIV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-tab.EPcQOvxQ.png" alt="table" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>vectordatabase</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Wed, 20 Mar 2024 02:41:57 +0000</pubDate>
      <link>https://dev.to/gaocegege/pgvector-vs-pgvectors-in-2024-a-comprehensive-comparison-for-vector-search-in-postgresql-3n08</link>
      <guid>https://dev.to/gaocegege/pgvector-vs-pgvectors-in-2024-a-comprehensive-comparison-for-vector-search-in-postgresql-3n08</guid>
      <description>&lt;p&gt;pgvector and &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; are both vector search extensions designed to enhance the capabilities of PostgreSQL. These extensions revolutionize the way vector search is performed within the database, providing scalable, low-latency, and hybrid-enabled solutions. In this blog post, we will illustrate the differences between pgvector and &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;TL;DR: navigate directly to the final section, which contains a table showcasing the main distinctions between the two.&lt;/p&gt;

&lt;h2&gt;
  
  
  Search
&lt;/h2&gt;

&lt;p&gt;Search is essential for a database. It plays a pivotal role in facilitating efficient data retrieval, enabling users to quickly find and access the information they need. We will demonstrate their performance and highlight the various features &lt;a href="https://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; and pgvector offer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Filtering
&lt;/h3&gt;

&lt;p&gt;While both pgvector and &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; provide support for vector similarity search in PostgreSQL, &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; goes a step further by introducing the &lt;a href="https://www.usenix.org/conference/osdi23/presentation/zhang-qianxi"&gt;VBASE method from OSDI 2023&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Real-world workloads often involve &lt;strong&gt;a combination of vector search and relational queries&lt;/strong&gt;. While vector search is powerful for similarity matching and retrieval of similar vectors, relational queries allow you to perform complex joins, filters, and aggregations on structured data.&lt;/p&gt;

&lt;p&gt;pgvecto.rs is designed to perform efficiently in scenarios that involve &lt;strong&gt;Single-Vector TopK + Filter + Join.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We utilize the &lt;code&gt;laion-768-5m-ip-probability&lt;/code&gt; dataset for benchmarking purposes due to the absence of a comprehensive relational benchmark. The dataset is derived from LAION 2B images and contains 5,000,000 vectors and 10,000 queries.&lt;/p&gt;

&lt;p&gt;The dataset includes a &lt;code&gt;probability&lt;/code&gt; column that stores random floating-point values drawn from a uniform distribution between 0 and 1. A ratio of 0.01 means that each query's filter matches 1% of the dataset, allowing for focused analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n9b_QidZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832203612/14235301-ff6e-401e-be2e-62f6ea29cebf.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n9b_QidZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832203612/14235301-ff6e-401e-be2e-62f6ea29cebf.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HHw_6YN---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832190249/b370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HHw_6YN---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832190249/b370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We present the recall, latency (in milliseconds), and RPS (requests per second) for various &lt;code&gt;probability&lt;/code&gt; ranges while keeping the &lt;code&gt;ef_search&lt;/code&gt; constant. The &lt;code&gt;ef_search&lt;/code&gt; parameter represents the size of the list utilized during k-NN (k-Nearest Neighbors) searches, determining the trade-off between search accuracy and query processing time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0EPlw9JM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707279248378/4ebbb813-0593-4949-9898-9abdee58e4fc.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0EPlw9JM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707279248378/4ebbb813-0593-4949-9898-9abdee58e4fc.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" alt="" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs/"&gt;&lt;strong&gt;pgvecto.rs&lt;/strong&gt;&lt;/a&gt;, when used with VBASE, consistently yields improved recall, particularly when working with low probability values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sparse Vector Search
&lt;/h3&gt;

&lt;p&gt;Sparse vectors are high-dimensional vectors that contain few non-zero values. They are suitable for traditional information retrieval use cases. For example, a vector with 32,000 dimensions but only 2 non-zero elements is a typical sparse vector.&lt;/p&gt;

&lt;p&gt;$$S[1\times 32000]=\left[ \begin{matrix} 0 &amp;amp; 0 &amp;amp; 0.015 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; \cdots &amp;amp; 0.543 &amp;amp; 0 \end{matrix} \right]$$&lt;/p&gt;

&lt;p&gt;Dense vectors are embeddings produced by neural networks, such as text embedding models, with most or all elements non-zero. They have far fewer dimensions than sparse vectors, such as 256 or 1536.&lt;/p&gt;

&lt;p&gt;$$D[1\times 256]=\left[ \begin{matrix} 0.342 &amp;amp; 1.774 &amp;amp; 0.087 &amp;amp; 0.870 &amp;amp; 0.001 &amp;amp; \cdots &amp;amp; 0.543 &amp;amp; 0.999 \end{matrix} \right]$$&lt;/p&gt;
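
&lt;p&gt;A sparse vector like $S$ above is usually stored as (index, value) pairs rather than as a full array, so storage and distance computations touch only the non-zero entries. A minimal sketch of this idea (illustrative helpers, not the pgvecto.rs implementation):&lt;/p&gt;

```python
def to_sparse(dense):
    """Store only the non-zero entries of a vector as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}

def sparse_dot(a, b):
    # The inner product touches only indices present in both vectors,
    # which is what makes sparse search cheap in high dimensions.
    return sum(v * b[i] for i, v in a.items() if i in b)

s = to_sparse([0.0, 0.0, 0.015, 0.0, 0.543, 0.0])
print(s)  # {2: 0.015, 4: 0.543}
```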

&lt;p&gt;pgvector supports dense vector search well, but &lt;a href="https://github.com/pgvector/pgvector/issues/81"&gt;there is no plan to support sparse vectors&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;, on the other hand, supports both dense vector search and &lt;a href="https://docs.pgvecto.rs/use-case/sparse-vector.html"&gt;sparse vector search&lt;/a&gt;. It provides the ability to use the &lt;a href="https://docs.pgvecto.rs/use-case/sparse-vector.html"&gt;&lt;code&gt;svector&lt;/code&gt;&lt;/a&gt; data type to build sparse vector indexes and perform searches on them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;svector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.1,0,0,0,0,0,0,0,0,0]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;your_index_name&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;svector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.3,0,0,0,0,0,0,0,0,0]'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vector dimensions
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; extension provides support for up to 65535 dimensions, whereas the pgvector extension supports a maximum of 2000 dimensions. This difference in dimensionality support allows &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; to handle vectors from OpenAI's latest embedding model directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single Instruction/Multiple Data (SIMD)
&lt;/h3&gt;

&lt;p&gt;pgvecto.rs detects and uses the SIMD (Single Instruction, Multiple Data) instructions available on the user's machine at runtime, whereas pgvector relies on the compiler to generate SIMD code at compile time and cannot switch to a faster instruction set at runtime. This makes vector distance calculations, and therefore search, faster in pgvecto.rs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Binary Vector
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://docs.pgvecto.rs/usage/vector-types.html#bvector-binary-vector"&gt;&lt;code&gt;bvector&lt;/code&gt;&lt;/a&gt; type is a binary vector type in &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;. It represents a binary vector, which is a vector where each component can take on two possible values, typically 0 and 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;bvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1,0,1]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0,1,0]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;your_index_name&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;bvector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[1,0,1]'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binary vectors offer significant advantages with certain embedding models, such as the OpenAI embedding models, effectively reducing memory usage while preserving a satisfactory level of accuracy.&lt;/p&gt;

&lt;p&gt;Here are some performance benchmarks for the &lt;code&gt;bvector&lt;/code&gt; type. We use the &lt;a href="https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M"&gt;dbpedia-entities-openai3-text-embedding-3-large-3072-1M&lt;/a&gt; dataset for the benchmark. The VM is n2-standard-8 (8 vCPUs, 32 GB memory) on Google Cloud.&lt;/p&gt;

&lt;p&gt;We upsert 1M binary vectors into the table and then run a KNN query for each embedding. Indexing 1M binary vectors takes only about 600 MB of memory, while the &lt;code&gt;vector&lt;/code&gt; type takes about 18 GB to index the same number of vectors. The &lt;code&gt;bvector&lt;/code&gt;'s accuracy exceeds 95% if we adopt &lt;a href="https://docs.pgvecto.rs/use-case/adaptive-retrieval.html"&gt;&lt;strong&gt;adaptive retrieval&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" alt="bvector" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FP16/INT8
&lt;/h3&gt;

&lt;p&gt;Besides binary vectors, &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; also provides support for &lt;a href="https://docs.pgvecto.rs/usage/vector-types.html#vecf16-half-precision-vector"&gt;FP16 (16-bit floating point)&lt;/a&gt; and INT8 (8-bit integer) data types.&lt;/p&gt;
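&lt;p&gt;Usage mirrors the regular &lt;code&gt;vector&lt;/code&gt; type. The sketch below assumes the &lt;code&gt;vecf16&lt;/code&gt; type name from the linked documentation; the INT8 counterpart is analogous:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Half-precision storage: 2 bytes per dimension instead of 4.
CREATE TABLE items_f16 (
  id bigserial PRIMARY KEY,
  embedding vecf16(3) NOT NULL
);

INSERT INTO items_f16 (embedding) VALUES ('[0.1, 0.2, 0.3]');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;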

&lt;h2&gt;
  
  
  Indexing
&lt;/h2&gt;

&lt;p&gt;pgvecto.rs takes a different approach compared to pgvector. It handles the storage and memory of indexes separately from PostgreSQL, instead of relying on the native storage engine of PostgreSQL like pgvector does.&lt;/p&gt;

&lt;p&gt;It's a design tradeoff. During the initial stages of our development, we conducted experiments with PostgreSQL's page storage for the Hierarchical Navigable Small World (HNSW) index, similar to pgvector. However, we encountered various limitations that hindered its effectiveness and functionality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Parallelization Challenges: PostgreSQL's process model, where each statement runs in a single process, combined with the lack of thread-safe APIs, makes parallelization difficult. Building vector indexes is computationally intensive, so parallelization can significantly enhance performance. However, our efforts to parallelize index construction were impeded by frequent 'Too many shared buffer locked' errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write-Ahead Logging (WAL) Amplification: Inserting a single 2KB vector can generate over 20KB of WAL. This amplification is inherent in the HNSW algorithm, as a single point insertion modifies multiple edges, and PostgreSQL records each change individually in the WAL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lock Contention: Lock contention arises due to the need to lock every edge list during reads and writes when traversing the HNSW graph. The hierarchical structure of HNSW, where higher levels contain fewer points, often results in lock contention becoming a common bottleneck during index usage. This contention occurs when multiple operations attempt to access or modify the same edge list simultaneously, leading to delays and reduced concurrency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; adopted a design akin to FreshDiskANN, resembling the Log-Structured Merge (LSM) tree concept. This architecture comprises three components: the writing segment, the growing segment, and the sealed segment. New vectors are initially written to the writing segment. A background process then asynchronously transforms them into the immutable growing segment. Subsequently, the growing segment undergoes a merge with the sealed segment, akin to the compaction process in an LSM tree. This design offers several benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Non-blocking Insertions: Index modification operations do not impede insertion processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Batched Modifications: Grouping modifications to the HNSW graph improves throughput.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elimination of Read-Write Lock Contention: Since sealed segments are immutable, issues related to read-write lock contention are mitigated.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, there are drawbacks to this approach. One notable limitation is the absence of out-of-the-box Write-Ahead Logging (WAL) support for the index. This means that features like Point-in-Time Recovery and Physical Replication, which rely on WAL, are not readily available for the index. Nevertheless, the PostgreSQL ecosystem is robust and allows for extensions to define their own custom WAL through a resource manager. While implementing this solution requires additional effort, it is feasible to overcome the limitation and enable WAL support for the index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The table below summarizes the main distinctions between pgvecto.rs and pgvector.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;pgvecto.rs&lt;/th&gt;
&lt;th&gt;pgvector&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Filtering&lt;/td&gt;
&lt;td&gt;Introduces the VBASE method for vector search and relational queries (e.g. Single-Vector TopK + Filter + Join).&lt;/td&gt;
&lt;td&gt;When filters are applied, the results may be incomplete. For example, if you originally intended to limit the results to 10, you might end up with only 5 after filtering.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sparse Vector Search&lt;/td&gt;
&lt;td&gt;Supports both dense and sparse vector search.&lt;/td&gt;
&lt;td&gt;Supports dense vector search.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector Dimensions&lt;/td&gt;
&lt;td&gt;Supports up to 65535 dimensions.&lt;/td&gt;
&lt;td&gt;Supports up to 2000 dimensions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SIMD&lt;/td&gt;
&lt;td&gt;SIMD instructions are dynamically dispatched at runtime to maximize performance based on the capabilities of the specific machine.&lt;/td&gt;
&lt;td&gt;Relies on compiler-generated SIMD code at compile time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Types&lt;/td&gt;
&lt;td&gt;Introduces additional data types: binary vectors, FP16 (16-bit floating point), and INT8 (8-bit integer).&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indexing&lt;/td&gt;
&lt;td&gt;Handles the storage and memory of indexes separately from PostgreSQL.&lt;/td&gt;
&lt;td&gt;Relies on the native storage engine of PostgreSQL.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WAL Support&lt;/td&gt;
&lt;td&gt;Provides Write-Ahead Logging (WAL) support for data; index WAL support is a work in progress.&lt;/td&gt;
&lt;td&gt;Provides Write-Ahead Logging (WAL) support for index and data.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>postgres</category>
      <category>vectordatabase</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>pgvecto.rs 0.2: Unifying Relational Queries and Vector Search in PostgreSQL</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Thu, 14 Mar 2024 04:16:45 +0000</pubDate>
      <link>https://dev.to/gaocegege/pgvectors-02-unifying-relational-queries-and-vector-search-in-postgresql-424e</link>
      <guid>https://dev.to/gaocegege/pgvectors-02-unifying-relational-queries-and-vector-search-in-postgresql-424e</guid>
      <description>&lt;p&gt;We are excited to announce the release of &lt;a href="https://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2, a significant milestone in the journey of bridging the gap between relational queries and vector search in PostgreSQL. This update brings together the power of both worlds, offering enhanced efficiency and enabling complex queries within PostgreSQL.&lt;/p&gt;

&lt;p&gt;In the past, developers and data scientists encountered the significant challenge of managing separate systems for relational queries and vector search. This resulted in increased complexity and resource overhead. However, with the release of &lt;a href="https://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2, we have addressed this issue by integrating the cutting-edge &lt;a href="https://www.usenix.org/conference/osdi23/presentation/zhang-qianxi" rel="noopener noreferrer"&gt;VBASE method from OSDI 2023&lt;/a&gt;. This integration has substantially refined the efficiency of vector search within PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832190249%2Fb370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832190249%2Fb370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832203612%2F14235301-ff6e-401e-be2e-62f6ea29cebf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832203612%2F14235301-ff6e-401e-be2e-62f6ea29cebf.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-world applications: Immich
&lt;/h2&gt;

&lt;p&gt;Real-world applications often require complex queries that go beyond simple Approximate Nearest Neighbor (ANN) search. To explore a practical example of such applications, let's take a closer look at &lt;a href="https://immich.app/" rel="noopener noreferrer"&gt;immich&lt;/a&gt;, a self-hosted photo and video backup solution that highlights the importance of advanced vector and traditional relational queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimmich.app%2Fassets%2Fimages%2Fsearch-ex-2-707fe5ab1ab89621a7a1f3e8807b724a.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimmich.app%2Fassets%2Fimages%2Fsearch-ex-2-707fe5ab1ab89621a7a1f3e8807b724a.webp"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://immich.app/" rel="noopener noreferrer"&gt;immich&lt;/a&gt; leverages advanced &lt;strong&gt;vector-based and relational queries&lt;/strong&gt; to provide intelligent search capabilities. With &lt;a href="https://immich.app/" rel="noopener noreferrer"&gt;immich&lt;/a&gt;, you can efficiently search and discover relevant media files based on visual similarity, metadata, and user-defined tags. The underlying technology powering this functionality is pgvecto.rs.&lt;/p&gt;

&lt;p&gt;We will provide a concise overview of the search feature in immich. Consider a scenario where our database consists of three tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;AssetEntity&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ownerId&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updatedAt&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;deletedAt&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;isArchived&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;isVisible&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ExifInfo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;lat&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;long&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ImageEmbedding&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;-- assuming 'n' is the dimensionality of the vector&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have a table named &lt;code&gt;AssetEntity&lt;/code&gt; that stores information about the images, including their unique identifier (&lt;code&gt;id&lt;/code&gt;), the owner (&lt;code&gt;ownerId&lt;/code&gt;), creation and update timestamps (&lt;code&gt;createdAt&lt;/code&gt; and &lt;code&gt;updatedAt&lt;/code&gt;), and other relevant attributes.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ExifInfo&lt;/code&gt; table contains information specific to the &lt;a href="https://en.wikipedia.org/wiki/Exif" rel="noopener noreferrer"&gt;EXIF data&lt;/a&gt; of the images, such as the latitude (&lt;code&gt;lat&lt;/code&gt;), longitude (&lt;code&gt;long&lt;/code&gt;), city, state, country, and description. The &lt;code&gt;assetId&lt;/code&gt; column in this table establishes a relationship with the &lt;code&gt;AssetEntity&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;Additionally, we have the &lt;code&gt;ImageEmbedding&lt;/code&gt; table, which stores vector-based embeddings for each image. The &lt;code&gt;embedding&lt;/code&gt; column holds the image's embedding as a &lt;code&gt;vector(n)&lt;/code&gt; of floating-point numbers. The &lt;code&gt;assetId&lt;/code&gt; column in this table also establishes a relationship with the &lt;code&gt;AssetEntity&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;The query statement below is used to search for images based on certain criteria and sorting by the similarity of the image embeddings. It joins the &lt;code&gt;AssetEntity&lt;/code&gt;, &lt;code&gt;ImageEmbedding&lt;/code&gt;, and &lt;code&gt;ExifInfo&lt;/code&gt; tables, filters the results based on criteria like &lt;code&gt;ownerId&lt;/code&gt;, &lt;code&gt;isArchived&lt;/code&gt;, &lt;code&gt;isVisible&lt;/code&gt;, &lt;code&gt;createdAt&lt;/code&gt;, and &lt;code&gt;city&lt;/code&gt; in the EXIF info, then orders the images by the similarity of the provided embedding. The query returns a limited number of results based on the specified limit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;AssetEntity&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;ImageEmbedding&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;ExifInfo&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ownerId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(:&lt;/span&gt;&lt;span class="n"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isArchived&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isVisible&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;numResults&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can be seen as a scenario involving &lt;strong&gt;Single-Vector TopK + Filter + Join&lt;/strong&gt; operations. The limitations of pgvector in supporting such operations highlight the need for VBASE.&lt;/p&gt;

&lt;p&gt;When it comes to Single-Vector TopK operations, pgvector falls short in providing efficient performance. TopK queries require finding the K nearest neighbors to a target vector, but pgvector struggles to predict the optimal value of K, leading to suboptimal query performance. VBASE, on the other hand, addresses this limitation by leveraging relaxed monotonicity and offering significantly higher efficiency. It provides a more accurate and efficient solution for single-vector TopK queries.&lt;/p&gt;

&lt;p&gt;Additionally, pgvector's support for Filter and Join operations in conjunction with vector queries is limited. Complex queries that involve filtering or joining on both scalar and vector data can be challenging to execute efficiently in pgvector. VBASE, however, is designed to handle these types of queries seamlessly. It integrates vector search systems with relational databases, allowing for the execution of complex queries involving filters and joins on both scalar and vector attributes. This capability makes VBASE a more suitable choice for applications that require these operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark
&lt;/h2&gt;

&lt;p&gt;To evaluate the performance, benchmarks can be conducted to measure the efficiency and effectiveness of both systems. We utilize the &lt;code&gt;laion-768-5m-ip-probability&lt;/code&gt; dataset for benchmarking purposes due to the absence of a comprehensive relational benchmark. The dataset is derived from LAION 2B images and contains 5,000,000 vectors and 10,000 queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707279248378%2F4ebbb813-0593-4949-9898-9abdee58e4fc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707279248378%2F4ebbb813-0593-4949-9898-9abdee58e4fc.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dataset includes a &lt;code&gt;probability&lt;/code&gt; column that stores random floating-point values drawn from a uniform distribution between 0 and 1. A ratio of 0.01 means that each query's filter covers 1% of the dataset, allowing for focused analysis.&lt;/p&gt;
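&lt;p&gt;As an illustration, a query at ratio 0.01 is simply a filtered KNN search. The table and column names below are assumed for the sketch, and it uses the inner-product distance operator since the dataset is built for inner-product search:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Selectivity 0.01: the filter keeps roughly 1% of the 5M rows,
-- and the TopK search runs over the surviving candidates.
SELECT id
FROM laion
WHERE probability &amp;lt; 0.01
ORDER BY embedding &amp;lt;#&amp;gt; :query_embedding
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;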

&lt;p&gt;We present the recall, latency (in milliseconds), and RPS (requests per second) for various &lt;code&gt;probability&lt;/code&gt; ranges while keeping the &lt;code&gt;ef_search&lt;/code&gt; constant. The &lt;code&gt;ef_search&lt;/code&gt; parameter represents the size of the list utilized during k-NN (k-Nearest Neighbors) searches, determining the trade-off between search accuracy and query processing time.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt;, when used with VBASE, consistently yields improved recall, particularly when working with low probability values.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other features
&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2 introduces the following key features and improvements other than VBASE integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;FP16: Users can now store embeddings in PostgreSQL at half the float32 size, significantly improving latency. This optimization has a negligible impact on final recall (less than 1%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Asynchronous indexing: Insertion operations are &lt;strong&gt;non-blocking&lt;/strong&gt;, ensuring a smoother and more efficient data insertion and indexing process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Doubled query performance: &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2 offers query performance that is twice as fast as the previous version (0.1), marking a significant leap forward in system efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observability: The new &lt;code&gt;pg_vector_index_stat&lt;/code&gt; view provides a transparent view into the indexing internals of &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt;. Users can monitor index construction, configuration adjustments, and detailed statistical analysis in real time, fostering a more intuitive and controlled environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Quick start&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To get started with &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2, you can run it in a Docker container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; pgvecto-rs-demo &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mysecretpassword &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 5432:5432 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; tensorchord/pgvecto-rs:pg16-v0.2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
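&lt;p&gt;Once the container is up, connect with &lt;code&gt;psql&lt;/code&gt; (e.g. &lt;code&gt;psql -h localhost -p 5432 -U postgres&lt;/code&gt;) and enable the extension. The smoke test below is a minimal sketch using the same operators shown earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;

-- A tiny sanity check: nearest neighbor by L2 distance.
CREATE TABLE demo (id bigserial PRIMARY KEY, emb vector(3) NOT NULL);
INSERT INTO demo (emb) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT id, emb &amp;lt;-&amp;gt; '[3,2,1]' AS dist FROM demo ORDER BY dist LIMIT 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;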



&lt;p&gt;Please &lt;a href="https://docs.pgvecto.rs/getting-started/overview.html" rel="noopener noreferrer"&gt;check out our documentation&lt;/a&gt; for more details. We encourage you to try out &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;&lt;strong&gt;pgvecto.rs&lt;/strong&gt;&lt;/a&gt;, benchmark it against your workloads, and contribute your indexing innovations. Join our &lt;a href="https://discord.gg/KqswhpVgdU" rel="noopener noreferrer"&gt;&lt;strong&gt;Discord community&lt;/strong&gt;&lt;/a&gt; to connect with the developers and other users working to improve pgvecto.rs!&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>postgres</category>
      <category>rust</category>
      <category>python</category>
    </item>
    <item>
      <title>20x Faster as the Beginning: Introducing pgvecto.rs extension written in Rust</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Mon, 07 Aug 2023 04:51:54 +0000</pubDate>
      <link>https://dev.to/gaocegege/20x-faster-as-the-beginning-introducing-pgvectors-extension-written-in-rust-3d2f</link>
      <guid>https://dev.to/gaocegege/20x-faster-as-the-beginning-introducing-pgvectors-extension-written-in-rust-3d2f</guid>
<description>&lt;p&gt;We are thrilled to announce the release of &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;, a powerful Postgres extension for vector similarity search written in Rust. Its HNSW algorithm is 20x faster than pgvector at 90% recall. But speed is just the start: pgvecto.rs has an extensible architecture that lets contributors implement new indexes with ease, and we look forward to the open source community driving it to new heights!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Rust?
&lt;/h2&gt;

&lt;p&gt;Pgvecto.rs is implemented in Rust rather than C, the language of most existing Postgres extensions. It is built on top of the pgrx framework for writing Postgres extensions in Rust. Rust provides many advantages for an extension like pgvecto.rs: its strict compile-time checks guarantee memory safety, avoiding entire classes of bugs and security issues that can plague C extensions. Just as importantly, Rust offers modern developer ergonomics with great documentation, package management, and excellent error messages. This makes pgvecto.rs more approachable for developers to use and contribute to than sprawling C codebases. The safety and ease of use of Rust make it an ideal language for building the next generation of Postgres extensions like pgvecto.rs on top of pgrx.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extensible Architectures
&lt;/h2&gt;

&lt;p&gt;Pgvecto.rs is designed with an extensible architecture that makes it easy to add support for new index types. At the core is a set of traits that define the required behaviors for a vector index, like building, saving, loading, and querying. Implementing a new index is as straightforward as creating a struct for that index type and implementing the required traits. Pgvecto.rs currently comes with two built-in index types - HNSW for maximum search speed, and ivfflat for quantization-based approximate search. But the doors are open for anyone to create additional indexes like RHNSW, NGT, or custom types tailored to specific use cases. The extensible architecture makes pgvecto.rs adaptable as new vector search algorithms emerge. And it lets you select the right index for your data and performance needs. Pgvecto.rs provides the framework for making vector search in Postgres as flexible and future-proof as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed and Performance
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/usamoi/pgvecto.rs-bench"&gt;Benchmarks&lt;/a&gt; show pgvecto.rs offers massive speed improvements over existing Postgres extensions like pgvector. In tests, its HNSW index demonstrates search performance up to 25x faster compared to pgvector's ivfflat index. The flexible architecture also allows using different indexing algorithms to optimize for either maximum throughput or precision. We're working on the quantization HNSW now, please also stay tuned!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FHC4VSk3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://hackmd.io/_uploads/SyOOvsC5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FHC4VSk3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://hackmd.io/_uploads/SyOOvsC5n.png" alt="" width="742" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Persistence and Management
&lt;/h2&gt;

&lt;p&gt;Previous work pg_embedding did a great job implementing HNSW indexes, but lacked support for persistence and proper CRUD operations. pgvecto.rs adds those two core functionalities that were missing in pg_embedding. Vector indexes in pgvecto.rs are properly persisted using WAL (write-ahead logging). pgvecto.rs handles saving, loading, rebuilding, and updating indexes automatically behind the scenes. You get durable indexes that don't require external management while fitting cleanly into current Postgres deployments and workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Let's assume you've created a table using the following SQL command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;vector(4)&lt;/code&gt; denotes the vector data type, with &lt;code&gt;4&lt;/code&gt; representing the dimension of the vector. You can use &lt;code&gt;vector&lt;/code&gt; without specifying a dimension, but be aware that you cannot create an index on a vector type without a specified dimension.&lt;/p&gt;

&lt;p&gt;You can then insert data at any time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1.1, 2.2, 3.3, 4.4]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create an index on the &lt;code&gt;emb&lt;/code&gt; vector column using squared Euclidean distance, you can use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="n"&gt;l2_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="n"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2097152&lt;/span&gt;
&lt;span class="n"&gt;size_ram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4294967296&lt;/span&gt;
&lt;span class="n"&gt;storage_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"ram"&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hnsw&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"ram"&lt;/span&gt;
&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;
&lt;span class="n"&gt;ef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to retrieve the top 10 vectors closest to the origin, you can use the following SQL command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0, 0, 0, 0]'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0, 0, 0, 0]'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;pgvecto.rs represents an exciting step forward for vector search in Postgres. Its implementation in Rust and extensible architecture provide key advantages over existing extensions: speed, safety, and flexibility. We're thrilled to release pgvecto.rs as an open source project under the Apache 2.0 license and can't wait to see what the community builds on top of it. There's ample room for pgvecto.rs to expand: adding new index types and algorithms, optimizing for different data distributions and use cases, and integrating with existing Postgres workflows.&lt;/p&gt;

&lt;p&gt;We encourage you to try out &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs on GitHub&lt;/a&gt;, benchmark it against your workloads, and contribute your own indexing innovations. Together, we can make pgvecto.rs the best vector search extension Postgres has ever seen! The potential is vast, and we're just getting started. Please join us on this journey to bring unprecedented vector search capabilities to the Postgres ecosystem.  Join our &lt;a href="https://discord.gg/KqswhpVgdU"&gt;Discord community&lt;/a&gt; to connect with the developers and other users working to improve pgvecto.rs!&lt;/p&gt;

&lt;h2&gt;
  
  
  Advertisement Time
&lt;/h2&gt;

&lt;p&gt;The mission of ModelZ is to simplify the process of taking machine learning models into production. With experience from AWS, TikTok, and Kubeflow, our team has extensive expertise in MLOps engineering. So if you have any questions about putting models into production, please feel free to &lt;a href="https://docs.modelz.ai/community"&gt;reach out&lt;/a&gt; by joining &lt;a href="https://discord.gg/F4WnzqmeNj"&gt;Discord&lt;/a&gt; or through &lt;a href="mailto:modelz-support@tensorchord.ai"&gt;modelz-support@tensorchord.ai&lt;/a&gt;. We're happy to draw on our background building MLOps platforms across companies to provide guidance on any part of the model development to deployment workflow.&lt;/p&gt;

&lt;p&gt;More products with ModelZ:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://modelz.ai"&gt;ModelZ&lt;/a&gt; - A Managed serverless GPU platform to deploy your own models&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mosecorg/mosec"&gt;Mosec&lt;/a&gt; - A high-performance serving framework for ML models, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine. Simple and faster alternative to NVIDIA Triton.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/tensorchord/envd"&gt;envd&lt;/a&gt; - A command-line tool that helps you create the container-based environment for AI/ML, from development to the production. Python is all you need to know to use this tool.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/tensorchord/modelz-llm"&gt;ModelZ-llm&lt;/a&gt; - OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>vectordatabase</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Do we really need a specialized vector database?</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Mon, 07 Aug 2023 04:45:26 +0000</pubDate>
      <link>https://dev.to/gaocegege/do-we-really-need-a-specialized-vector-database-5aci</link>
      <guid>https://dev.to/gaocegege/do-we-really-need-a-specialized-vector-database-5aci</guid>
      <description>&lt;p&gt;With the popularity of LLM (Large Language Model), vector databases have also become a hot topic. With just a few lines of simple Python code, a vector database can act as a cheap but highly effective "external brain" for your LLM. But do we really need a specialized vector database?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does LLM need vector search?
&lt;/h2&gt;

&lt;p&gt;First, let me briefly introduce why LLMs need vector search. Vector search is a long-standing problem: given an object, find the most similar objects in a collection. Text, images, and other data can be converted into vector representations, so a similarity problem over text or images becomes a similarity problem over vectors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2zgp6bb32qh7qarlj6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2zgp6bb32qh7qarlj6h.png" alt="Image description" width="800" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, we convert different words into a three-dimensional vector. Therefore, we can intuitively display the similarity between different words in a 3D space. For example, the similarity between "student" and "school" is higher than the similarity between "student" and "food".&lt;/p&gt;

&lt;p&gt;Returning to LLMs, the limited context window length is a major challenge. For instance, GPT-3.5 has a context length limit of 4k tokens. This significantly constrains the model's in-context learning ability and hurts the user experience. However, vector search provides an elegant solution to this problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Divide the text that exceeds the context length limit into shorter chunks and convert different chunks into vectors (embeddings).&lt;/li&gt;
&lt;li&gt;Before inputting the prompt to LLM, convert the prompt into a vector (embedding).&lt;/li&gt;
&lt;li&gt;Use the prompt vector to search for the most similar chunk vectors.&lt;/li&gt;
&lt;li&gt;Concatenate the text of the most similar chunks with the prompt as the input to the LLM.&lt;/li&gt;
&lt;/ul&gt;
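&lt;p&gt;The four steps above can be sketched end to end in a few lines of Python. The &lt;code&gt;embed&lt;/code&gt; function below is a toy stand-in so the sketch is self-contained; in practice it would call an embedding model such as &lt;code&gt;text-embedding-ada-002&lt;/code&gt;:&lt;/p&gt;

```python
def embed(text):
    # Toy embedding: a 26-dim letter-frequency vector. A real pipeline would
    # call an embedding model (e.g. OpenAI's text-embedding-ada-002) here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# 1. Split the long text into chunks and embed each chunk.
chunks = [
    "cats are small mammals",
    "postgres stores relational data",
    "vectors encode semantics",
]
chunk_vecs = [embed(c) for c in chunks]

# 2. Convert the prompt into a vector.
prompt = "how does postgres store data"
prompt_vec = embed(prompt)

# 3. Vector search: find the chunk most similar to the prompt.
best = max(range(len(chunks)), key=lambda i: cosine(prompt_vec, chunk_vecs[i]))

# 4. Concatenate the retrieved chunk text with the prompt as the LLM input.
llm_input = chunks[best] + "\n\n" + prompt
```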

&lt;p&gt;This is like giving LLM an external memory, which allows it to search for the most relevant information from this memory. This memory is the ability brought by vector search. If you want to learn more details, you can read these articles (&lt;a href="https://simplicityissota.substack.com/p/what-is-an-embedding-anyways"&gt;Article 1&lt;/a&gt; and &lt;a href="https://betterprogramming.pub/enhancing-chatgpt-with-infinite-external-memory-using-vector-database-and-chatgpt-retrieval-plugin-b6f4ea16ab8"&gt;Article 2&lt;/a&gt;), which explain it more clearly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is vector database so popular?
&lt;/h2&gt;

&lt;p&gt;In LLM, the vector database has become an indispensable part, and one of the most important reasons is its ease of use. After being used in conjunction with OpenAI Embedding models (such as &lt;code&gt;text-embedding-ada-002&lt;/code&gt;), it only takes about ten lines of code to convert a prompt query into a vector and perform the entire process of vector search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="c1"&gt;# Creates embedding vector from user query
&lt;/span&gt;    &lt;span class="n"&gt;embedded_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EMBEDDING_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;near_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedded_query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Queries input schema with vectorized user query
&lt;/span&gt;    &lt;span class="n"&gt;query_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_near_vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;near_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;do&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query_result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For LLMs, vector search mainly serves recall. Simply put, recall means finding the most similar objects in a candidate set; here the candidate set is all chunks, and the goal is the chunk most similar to the prompt. During LLM inference, vector search is the main implementation of recall. It is easy to set up: OpenAI Embedding models solve the most troublesome part, converting text into vectors, and what remains is an independent, clean vector search problem that current vector databases handle well. As a result, the entire workflow is particularly smooth.&lt;/p&gt;

&lt;p&gt;As the name suggests, a vector database is a database designed specifically for the vector data type. Similarity computation over a collection was originally expensive: a brute-force search compares the query against every vector, and comparing all vectors pairwise is an O(n^2) problem. The industry therefore developed Approximate Nearest Neighbor (ANN) algorithms. An ANN algorithm precomputes a vector index inside the database, trading space for time, which greatly speeds up similarity search. This is analogous to an index in a traditional database.&lt;/p&gt;
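&lt;p&gt;The idea of trading space for time can be illustrated with a toy IVF-style index in Python: we spend extra space up front assigning each vector to a bucket around a centroid, so a query only scans one bucket instead of the whole collection. This is a simplified sketch for intuition, not any database's actual implementation:&lt;/p&gt;

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    # Precompute (space for time): assign every vector to its nearest centroid.
    buckets = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
        buckets[c].append(idx)
    return buckets

def ivf_search(query, vectors, centroids, buckets, top_k=1):
    # Query time: only scan the bucket of the nearest centroid,
    # which is why the result is approximate rather than exact.
    c = min(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    return sorted(buckets[c], key=lambda i: sq_dist(vectors[i], query))[:top_k]

vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # real systems learn these with k-means
buckets = build_ivf(vectors, centroids)
print(ivf_search([0.02, 0.0], vectors, centroids, buckets))  # index of the nearest vector
```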

&lt;p&gt;Therefore, vector databases not only have strong performance but also excellent ease of use, making them a perfect match for LLM! (Really?)&lt;/p&gt;

&lt;h2&gt;
  
  
  Perhaps a general-purpose database would be better?
&lt;/h2&gt;

&lt;p&gt;We've talked about the advantages and benefits of vector databases, but what are their limitations? &lt;a href="https://www.singlestore.com/blog/why-your-vector-database-should-not-be-a-vector-database/"&gt;A blog post by SingleStore&lt;/a&gt; provides a good answer to this question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Vectors and vector search are a data type and query processing approach, not a foundation for a new way of processing data. Using a specialty vector database (SVDB) will lead to the usual problems we see (and solve) again and again with our customers who use multiple specialty systems: redundant data, excessive data movement, lack of agreement on data values among distributed components, extra labor expense for specialized skills, extra licensing costs, limited query language power, programmability and extensibility, limited tool integration, and poor data integrity and availability compared with a true DBMS.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are two issues that I think are important. The first is the issue of data consistency. During the prototyping phase, vector databases are very suitable, and ease of use is more important than anything else. However, a vector database is an independent system that is completely decoupled from other data storage systems such as TP databases and AP data lakes. Therefore, data needs to be synchronized, streamed, and processed between multiple systems.&lt;/p&gt;

&lt;p&gt;Imagine if your data is already stored in an OLTP database such as PostgreSQL. To perform vector search using an independent vector database, you need to first extract the data from the database, then convert each data point into a vector using services such as OpenAI Embedding, and then synchronize it to a dedicated vector database. This adds a lot of complexity. Furthermore, if a user deletes a data point in PostgreSQL but it is not deleted in the vector database, then there will be data inconsistency issues. This issue can be very serious in actual production environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Update the embedding column for the documents table&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create an index on the embedding column&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Query the similar embeddings&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;openai_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'hello world'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the other hand, if everything is done in a general-purpose database, the user experience may be simpler than with an independent vector database. Vectors are just one data type in a general-purpose database, not an independent system. This way, data consistency is no longer an issue.&lt;/p&gt;

&lt;p&gt;The second issue is with query language. The query language of vector databases is typically designed specifically for vector search, so there may be many limitations in other types of queries. For example, in metadata filtering scenarios, users need to filter based on certain metadata fields. The filtering operators supported by some vector databases are limited.&lt;/p&gt;

&lt;p&gt;In addition, the supported data types for metadata are also very limited, usually only including String, Number, List of Strings, and Booleans. This is not friendly for complex metadata queries.&lt;/p&gt;

&lt;p&gt;If traditional databases can support the vector data type, then the aforementioned issues do not exist. Firstly, data consistency is already taken care of as TP or AP databases are existing infrastructure in production environments. Secondly, the issue of query language no longer exists because vector data type is just one data type in the database, so queries for vector data type can use the native query language of the database, such as SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Detailed explanation
&lt;/h2&gt;

&lt;p&gt;However, it is unfair to only compare the disadvantages of vector databases. There are several counterpoints to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ease of Use&lt;/strong&gt;: Vector databases are designed with ease of use in mind, and users can easily work with them without worrying about the underlying implementation details. However, integrating them with other data storage systems can be a challenge, as mentioned earlier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Vector databases have a significant advantage over traditional databases in terms of performance for certain use cases. Their design for vector search allows for fast and efficient similarity searches on large-scale datasets with high-dimensional vectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Filtering&lt;/strong&gt;: While metadata filtering capabilities in vector databases may be limited, they can still meet the needs of most business scenarios. However, for more complex metadata queries, a hybrid approach may be needed, where metadata is stored in a separate database or data lake and linked to the vector data in the vector database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How can these points be addressed? In the following sections, I will give my perspective on each of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector databases are easy to use
&lt;/h3&gt;

&lt;p&gt;While it is true that vector databases are easy to use, this is not unique to them. The ease of use of vector databases is mainly due to their abstraction of a specific domain, which allows them to be specifically designed for the most commonly used machine learning programming language, Python, and optimized for vector search scenarios. However, if traditional databases could also support the vector data type, they could offer similar ease of use.&lt;/p&gt;

&lt;p&gt;In addition, traditional databases can provide Python SDKs and other integrated tools to meet the needs of most scenarios, as well as standard SQL interfaces to handle more complex query scenarios. Therefore, it is not necessary to use a vector database solely for its ease of use.&lt;/p&gt;

&lt;p&gt;Another advantage of vector databases is their distributed design, which allows them to scale horizontally to meet the data volume and QPS requirements of users. However, traditional databases can also meet these requirements through distributed systems. Nevertheless, the decision to use a distributed system should be based on the actual needs of the data volume and QPS requirements, as well as the associated costs.&lt;/p&gt;

&lt;p&gt;In summary, while vector databases have their advantages, traditional databases can also provide similar ease of use and distributed capabilities if they support the vector data type. Therefore, the choice between a vector database and a traditional database should be based on the specific needs of the application and the available resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector databases have better performance
&lt;/h3&gt;

&lt;p&gt;To investigate the performance of vector databases in LLM scenarios, a &lt;a href="https://www.ethanrosenthal.com/2023/04/10/nn-vs-ann/"&gt;naive benchmark&lt;/a&gt; of vector retrieval was conducted. The benchmark involved N randomly initialized 256-dimensional vectors, and the query time for the top-5 nearest neighbors was measured for different scales of N. Two different methods were used for the test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Numpy was used for &lt;strong&gt;real-time&lt;/strong&gt; calculation: an exact, non-precomputed nearest neighbor search.&lt;/li&gt;
&lt;li&gt;Hnswlib was used to &lt;strong&gt;precompute approximate&lt;/strong&gt; nearest neighbors.&lt;/li&gt;
&lt;/ul&gt;
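&lt;p&gt;The exact-calculation side of this benchmark is easy to reproduce. The following is a minimal sketch of the setup described above; N is reduced here so it runs quickly, and the hnswlib half is omitted to keep the sketch dependency-free beyond numpy:&lt;/p&gt;

```python
import time

import numpy as np

rng = np.random.default_rng(0)
N, dim, top_k = 10_000, 256, 5  # scale N up to 1_000_000 to match the original benchmark

vectors = rng.standard_normal((N, dim), dtype=np.float32)
query = rng.standard_normal(dim, dtype=np.float32)

start = time.perf_counter()
# Exact, non-precomputed nearest neighbors: squared L2 distance to every vector.
dists = np.sum((vectors - query) ** 2, axis=1)
top5 = np.argpartition(dists, top_k)[:top_k]
top5 = top5[np.argsort(dists[top5])]  # order the winners by distance
elapsed = time.perf_counter() - start
print(f"exact top-{top_k} of {N} vectors in {elapsed * 1000:.1f}ms")
```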

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww74e6286spa1wai159u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww74e6286spa1wai159u.png" alt="Image description" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The benchmark results show that, at the scale of 1 million vectors, the delay of real-time calculation using Numpy is approximately 50ms. Using this as a benchmark, we can compare the time spent on LLM inference after completing vector search. For instance, the 7B model requires approximately 10 seconds for inference on 300 Chinese characters on an Nvidia A100 (40GB). Therefore, even if the query time for real-time accurate calculation of the similarity of 1 million vectors using Numpy is considered, it only accounts for 0.5% of the total delay in the end-to-end LLM inference. Thus, in terms of delay, the benefits brought by vector databases may be overshadowed by the delay of LLM itself in the current LLM scenario. Therefore, we need to also consider throughput. The throughput of LLM is much lower than that of vector databases. Thus, I do not believe that throughput is the core issue in this scenario.&lt;/p&gt;
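&lt;p&gt;The 0.5% figure is simple arithmetic on the two latencies quoted above:&lt;/p&gt;

```python
# Latency figures quoted above, taken as given for illustration.
vector_search_s = 0.050  # exact top-5 over 1M 256-dim vectors with Numpy
llm_inference_s = 10.0   # 7B model, ~300 Chinese characters, Nvidia A100 40GB

fraction = vector_search_s / llm_inference_s
print(f"vector search is {fraction:.1%} of LLM inference latency")  # prints 0.5%
```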

&lt;p&gt;If performance is not the primary concern, what factors will determine the user's choice? I think it is the overall ease of use, including ease of use for both usage and operation, consistency, and other solutions to database-related issues. Traditional databases have mature solutions for these problems, while vector databases are still in the early stages of development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metadata filtering can still meet the needs of most business scenarios
&lt;/h3&gt;

&lt;p&gt;When considering metadata filtering, it's important to note that it's not just a matter of the number of supported operators. Consistency of data is also a crucial factor. Metadata in vectors is essentially data in traditional databases, while vectors themselves are indexes of the data. Therefore, it's reasonable to consider storing both vectors and metadata in traditional databases.&lt;/p&gt;

&lt;p&gt;Traditional databases do have the capability to support vector data types and provide similar ease of use and distributed capabilities as vector databases. Furthermore, traditional databases have mature solutions to ensure data consistency and integrity, such as transaction management and data backup and recovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vectors in traditional databases
&lt;/h2&gt;

&lt;p&gt;Since we see vectors as a new data type in traditional databases, let's take a look at how to support vector data types in traditional databases, using PostgreSQL as an example. &lt;a href="https://github.com/pgvector/pgvector"&gt;pgvector&lt;/a&gt; is an open-source PostgreSQL plugin that supports vector data types. pgvector uses exact calculation by default, but it also supports building an IVFFlat index and precomputing ANN results using the IVFFlat algorithm, sacrificing calculation accuracy for performance.&lt;/p&gt;

&lt;p&gt;pgvector has done an excellent job of supporting vectors and is used by products such as &lt;a href="https://supabase.com/blog/openai-embeddings-postgres-vector"&gt;supabase&lt;/a&gt;. However, its index support is limited: only the simplest algorithm, IVFFlat, is implemented, with no quantization or storage optimization. Moreover, pgvector's index is designed for in-memory use and is not disk-friendly. Vector index algorithms designed for disk, such as &lt;a href="https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf"&gt;DiskANN&lt;/a&gt;, are therefore also valuable in the traditional database ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1rSpTWLf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://pbs.twimg.com/media/FF8KN7cXsBAGOLj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1rSpTWLf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://pbs.twimg.com/media/FF8KN7cXsBAGOLj.jpg" alt="" width="400" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extending pgvector can be challenging due to its implementation in the C programming language. Despite being open-source for two years, pgvector currently has only three contributors. While the implementation of pgvector is not particularly complex, it may be worth considering rewriting it in Rust.&lt;/p&gt;

&lt;p&gt;Rewriting pgvector in Rust can enable the code to be organized in a more modern and extensible way. Rust's ecosystem is also very rich, with existing Rust bindings such as &lt;a href="https://github.com/Enet4/faiss-rs"&gt;faiss-rs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzmeih0n6w6wmdt0t1q5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzmeih0n6w6wmdt0t1q5.png" alt="Image description" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a result, &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; was created. pgvecto.rs currently supports exact vector query operations and three distance calculation operators. Work is underway to design and implement index support. In addition to IVFFlat, we also hope to support more indexing algorithms such as DiskANN, SPTAG, and ScaNN. We welcome contributions and feedback from the community!&lt;/p&gt;

&lt;p&gt;pgvecto.rs offers a modern and extensible codebase with improved performance and concurrency. Its design and implementation allow seamless integration with other machine learning libraries and tools, making it an ideal choice for similarity search scenarios.&lt;/p&gt;

&lt;p&gt;With ongoing development, pgvecto.rs aims to be a valuable tool for data scientists and machine learning practitioners. Its support for various indexing algorithms and its ease of use make it a promising candidate for large-scale similarity search applications. We look forward to continuing development and contributions from the community.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- call the distance function through operators&lt;/span&gt;

&lt;span class="c1"&gt;-- square Euclidean distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="c1"&gt;-- dot product distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;#&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="c1"&gt;-- cosine distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;-- create table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;[]);&lt;/span&gt;
&lt;span class="c1"&gt;-- insert values&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="c1"&gt;-- query the similar embeddings&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- query the neighbors within a certain distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
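&lt;p&gt;To make the operator semantics above concrete, here is a small pure-Python sketch of the three distance functions. The helper names are ours, and the sign convention for the dot-product operator is an assumption, not the extension's definition:&lt;/p&gt;

```python
import math

# Pure-Python sketch of the three distances used in the SQL examples above.
# Helper names are ours; the dot-product sign convention is an assumption.

def squared_euclidean(a, b):
    # "&lt;-&gt;" operator: squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))

def neg_dot_product(a, b):
    # "&lt;#&gt;" operator: negated dot product, so smaller means more similar
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # "&lt;=&gt;" operator: 1 minus cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(squared_euclidean([1, 2, 3], [3, 2, 1]))  # 8
```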



&lt;h2&gt;
  
  
  Future
&lt;/h2&gt;

&lt;p&gt;As LLMs gradually move into production environments, infrastructure requirements are becoming increasingly demanding, and vector databases are an important addition to that infrastructure. We do not believe vector databases will replace traditional databases; rather, each will play to its strengths in different scenarios. The rise of vector databases will also push traditional databases to support vector data types.&lt;/p&gt;

&lt;p&gt;We hope that pgvecto.rs can become an important component of the Postgres ecosystem, providing better vector support for Postgres. Its implementation in Rust and support for various indexing algorithms make it a promising candidate for large-scale similarity search applications. We believe that its development and contributions from the community will help it become a valuable tool for data scientists and machine learning practitioners.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Launching ModelZ Beta!</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Wed, 21 Jun 2023 03:21:18 +0000</pubDate>
      <link>https://dev.to/gaocegege/launching-modelz-beta-4gpa</link>
      <guid>https://dev.to/gaocegege/launching-modelz-beta-4gpa</guid>
      <description>&lt;p&gt;We're excited to announce the launch of &lt;a href="https://modelz.ai"&gt;Modelz&lt;/a&gt; beta, a serverless GPU inference platform. Our team has been hard at work building a platform that democratizes access to machine learning, making it easier than ever to build and deploy models for a variety of use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://modelz.ai"&gt;Modelz&lt;/a&gt; is a fully managed platform that provides users with a simple API for deploying their machine learning models. The platform takes care of all the underlying infrastructure, including servers, storage, and networking. This means that users can focus on developing their models and deploying them on the platform without worrying about the underlying infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://modelz.ai"&gt;Modelz&lt;/a&gt; provides the following features out-of-the-box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless&lt;/strong&gt;: The serverless architecture scales your deployment up or down with demand, providing a reliable and scalable way to deploy and prototype machine learning applications at any scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduce cost&lt;/strong&gt;: Pay only for the resources you consume, without any additional charges for idle servers or cold starts. Get 30 free minutes of L4 GPU usage when you join us. Attach a payment method and get an extra 90 minutes free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI compatible API&lt;/strong&gt;: Our platform supports OpenAI compatible API, which means you can easily integrate new open source LLMs into your existing applications with just a few lines of code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping environment&lt;/strong&gt;: We provide a robust prototyping environment with support for Gradio and Streamlit. With our integration with HuggingFace Space, accessing pre-trained models and launching demos is easier than ever, with just one click. This allows you to quickly test and iterate on your models, saving you time and effort in the development process.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;

&lt;p&gt;Getting started with Modelz is easy and straightforward. Here are the quick steps to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.modelz.ai/login"&gt;Sign up for an account&lt;/a&gt; on the website.&lt;/li&gt;
&lt;li&gt;Use the Modelz templates to create the deployment.&lt;/li&gt;
&lt;li&gt;Send requests or visit the UI of the deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---NkZyGtu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/templates.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---NkZyGtu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/templates.png" alt="" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a full workflow example using the Modelz Beta platform and the bloomz 560M template to create an inference deployment.&lt;/p&gt;

&lt;p&gt;After the creation, you will get the detailed information in the UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XFSQJw0N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/detail.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XFSQJw0N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/detail.png" alt="" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard shows the logs, events (e.g. deployment scale-up and scale-down events), and metrics (e.g. total requests, in-flight requests). Besides these, you can find the usage guide there as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b3gY4uyu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/guide.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b3gY4uyu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/guide.png" alt="" width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Bloomz 560M template is powered by &lt;a href="https://github.com/tensorchord/modelz-llm"&gt;modelz-llm&lt;/a&gt;, which provides an OpenAI-compatible API for the model. Thus you can use the OpenAI Python&lt;br&gt;
package to call the model. First, get the endpoint and &lt;a href="https://cloud.modelz.ai/settings"&gt;API key&lt;/a&gt; from the dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://bloomz-webhiq5i9dagphhu.modelz.io&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Use your API Key in modelz.
&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mzi-xxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# create a chat completion
&lt;/span&gt;&lt;span class="n"&gt;chat_completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;any&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
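&lt;p&gt;Under the hood, an OpenAI-compatible API accepts the standard Chat Completions request body. As a rough sketch, the request the snippet above sends looks like this. The &lt;code&gt;/chat/completions&lt;/code&gt; path is assumed from the OpenAI convention, and the endpoint and key are the placeholders shown earlier:&lt;/p&gt;

```python
import json

# Sketch of the HTTP request an OpenAI-compatible endpoint receives.
# The endpoint URL and key are placeholders; the path is assumed from
# the OpenAI convention (the openai package appends it to api_base).
api_base = "https://bloomz-webhiq5i9dagphhu.modelz.io"
api_key = "mzi-xxx"

url = api_base + "/chat/completions"
headers = {
    "Authorization": "Bearer " + api_key,
    "Content-Type": "application/json",
}
payload = {
    # The template serves a single model, so the name appears to be ignored.
    "model": "any",
    "messages": [{"role": "user", "content": "Hello world"}],
}
body = json.dumps(payload)
print(url)
```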



&lt;h2&gt;
  
  
  Serverless
&lt;/h2&gt;

&lt;p&gt;The deployment will scale down to 0 after an idle interval (configured on the creation page). You can see the autoscaling events and metrics in the dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8UbQeNdH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/serverless.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8UbQeNdH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/serverless.png" alt="" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Community
&lt;/h2&gt;

&lt;p&gt;Modelz is built on top of &lt;a href="https://github.com/tensorchord/envd"&gt;&lt;code&gt;envd&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://github.com/mosecorg/mosec"&gt;mosec&lt;/a&gt;, &lt;a href="https://github.com/tensorchord/modelz-llm"&gt;modelz-llm&lt;/a&gt; and many other open source projects. If you're interested in joining the Modelz community, here are some ways to get involved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join the &lt;a href="https://discord.gg/KqswhpVgdU"&gt;Modelz discord community&lt;/a&gt;: We have a discord community where you can connect with other developers and data scientists, ask questions, and share your knowledge and expertise.&lt;/li&gt;
&lt;li&gt;Contribute to open source projects: Modelz is built on top of &lt;a href="https://github.com/tensorchord/envd"&gt;&lt;code&gt;envd&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://github.com/mosecorg/mosec"&gt;mosec&lt;/a&gt;, &lt;a href="https://github.com/tensorchord/modelz-llm"&gt;modelz-llm&lt;/a&gt; and many other open source projects. If you're interested in contributing to these projects, you can check out their GitHub repositories and start contributing.&lt;/li&gt;
&lt;li&gt;Share your models and projects: If you've built a machine learning model or a project using Modelz, we'd love to hear about it! You can share your projects in our Discord community or on Twitter using the hashtag #Modelz, or mention &lt;a href="https://twitter.com/TensorChord"&gt;@TensorChord&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As we continue to explore the possibilities of AI and its impact on our world, I wish you a great journey in your pursuit of knowledge and innovation. Whether you are just getting started with AI or are an experienced professional, the field of AI offers endless opportunities for growth and discovery.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>beginners</category>
    </item>
    <item>
      <title>A command-line tool to create development environments for AI/ML, based on buildkit</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Mon, 12 Sep 2022 01:32:50 +0000</pubDate>
      <link>https://dev.to/gaocegege/a-command-line-tool-to-create-development-environments-for-aiml-based-on-buildkit-21ie</link>
      <guid>https://dev.to/gaocegege/a-command-line-tool-to-create-development-environments-for-aiml-based-on-buildkit-21ie</guid>
      <description>&lt;h2&gt;
  
  
  What is envd?
&lt;/h2&gt;

&lt;p&gt;envd (&lt;code&gt;ɪnˈvdɪ&lt;/code&gt;) is a command-line tool that helps you create container-based development environments for AI/ML.&lt;/p&gt;

&lt;p&gt;Development environments are full of Python and system dependencies, CUDA, Bash scripts, Dockerfiles, SSH configurations, Kubernetes YAMLs, and many other clunky things that are always breaking. envd solves this problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Declare the list of dependencies (CUDA, python packages, your favorite IDE, and so on) in &lt;code&gt;build.envd&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Simply run &lt;code&gt;envd up&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Develop in the isolated environment.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyg20mnnnpt2tmll3yh4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyg20mnnnpt2tmll3yh4.gif" alt="Image description" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use &lt;code&gt;envd&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;Environments built with &lt;code&gt;envd&lt;/code&gt; provide the following features out-of-the-box:&lt;/p&gt;

&lt;p&gt;❤️ &lt;strong&gt;Knowledge reuse in your team&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;envd&lt;/code&gt; build functions can be reused. Use the &lt;code&gt;include&lt;/code&gt; function to import any git repository. No more copying and pasting Dockerfile instructions; reuse them instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;envdlib&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;include&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://github.com/tensorchord/envdlib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ubuntu20.04&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;envdlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensorboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8888&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⏱️ &lt;strong&gt;BuildKit native, build up to 6x faster&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/moby/buildkit"&gt;Buildkit&lt;/a&gt; supports parallel builds and software cache (e.g. pip index cache and apt cache). You can enjoy the benefits without knowledge of it.&lt;/p&gt;

&lt;p&gt;For example, the PyPI cache is shared across builds and thus the package will be cached if it has been downloaded before.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YkIvNJvD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188601795-8c37f5a3-b13b-422b-816f-8a0c51f1f8b1.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YkIvNJvD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188601795-8c37f5a3-b13b-422b-816f-8a0c51f1f8b1.svg" width="800" height="600"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;🐍 &lt;strong&gt;One configuration to rule them all&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Development environments are full of Dockerfiles, bash scripts, Kubernetes YAML manifests, and many other clunky files that are always breaking. With &lt;code&gt;envd&lt;/code&gt; you need just one configuration file, &lt;code&gt;build.envd&lt;/code&gt;&lt;sup id="fnref1"&gt;1&lt;/sup&gt;; it works for both local Docker and Kubernetes clusters in the cloud.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wIrinPzF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188821980-dcbd9069-b504-436a-9ffd-05ac5543a6d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wIrinPzF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188821980-dcbd9069-b504-436a-9ffd-05ac5543a6d1.png" alt="envd" width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✍️ &lt;strong&gt;Don't sacrifice your developer experience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SSH is configured for the created environment. You can use vscode-remote, jupyter, pycharm or any other IDE that you love. Besides this, declare the IDE extensions you want and let &lt;code&gt;envd&lt;/code&gt; take care of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vscode_extensions&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ms-python.python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;☁️ &lt;strong&gt;No polluted environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Are you working on multiple projects, all of which need different versions of CUDA? &lt;code&gt;envd&lt;/code&gt; helps you create isolated and clean environments. &lt;/p&gt;

&lt;h2&gt;
  
  
  Who should use envd?
&lt;/h2&gt;

&lt;p&gt;We’re focused on helping data scientists and teams that develop AI/ML models, who may suffer from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building the development environments with Python/R/Julia, CUDA, Docker, SSH, and so on. Do you have a complicated Dockerfile or build script that sets up all your dev environments, but is always breaking?&lt;/li&gt;
&lt;li&gt;Updating the environment. Do you always need to ask infrastructure engineers how to add a new Python/R/Julia package in the Dockerfile?&lt;/li&gt;
&lt;li&gt;Managing environments and machines. Do you always forget which machines are used for the specific project, because you handle multiple projects concurrently?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started 🚀
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docker (20.10.0 or above)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install and bootstrap &lt;code&gt;envd&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;envd&lt;/code&gt; can be installed with &lt;code&gt;pip&lt;/code&gt; (Python 3 only). After the installation, please run &lt;code&gt;envd bootstrap&lt;/code&gt; to bootstrap.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--pre&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; envd
envd bootstrap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create an &lt;code&gt;envd&lt;/code&gt; environment
&lt;/h3&gt;

&lt;p&gt;Please clone the &lt;a href="https://github.com/tensorchord/envd-quick-start"&gt;&lt;code&gt;envd-quick-start&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/tensorchord/envd-quick-start.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The build manifest &lt;code&gt;build.envd&lt;/code&gt; looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ubuntu20.04&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Configure the pip index if needed.
&lt;/span&gt;    &lt;span class="c1"&gt;# config.pip_index(url = "https://pypi.tuna.tsinghua.edu.cn/simple")
&lt;/span&gt;    &lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;python_packages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numpy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zsh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note that we use Python here as an example but please check out examples for other languages such as R and Julia &lt;a href="https://github.com/tensorchord/envd/tree/main/examples"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then please run the command below to set up a new environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd envd-quick-start &amp;amp;&amp;amp; envd up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cd envd-quick-start &amp;amp;&amp;amp; envd up
[+] ⌚ parse build.envd and download/cache dependencies 2.8s ✅ (finished)
 =&amp;gt; download oh-my-zsh                                                    2.8s
[+] 🐋 build envd environment 18.3s (25/25) ✅ (finished)
 =&amp;gt; create apt source dir                                                 0.0s
 =&amp;gt; local://cache-dir                                                     0.1s
 =&amp;gt; =&amp;gt; transferring cache-dir: 5.12MB                                     0.1s
...
 =&amp;gt; pip install numpy                                                    13.0s
 =&amp;gt; copy /oh-my-zsh /home/envd/.oh-my-zsh                                 0.1s
 =&amp;gt; mkfile /home/envd/install.sh                                          0.0s
 =&amp;gt; install oh-my-zsh                                                     0.1s
 =&amp;gt; mkfile /home/envd/.zshrc                                              0.0s
 =&amp;gt; install shell                                                         0.0s
 =&amp;gt; install PyPI packages                                                 0.0s
 =&amp;gt; merging all components into one                                       0.3s
 =&amp;gt; =&amp;gt; merging                                                            0.3s
 =&amp;gt; mkfile /home/envd/.gitconfig                                          0.0s
 =&amp;gt; exporting to oci image format                                         2.4s
 =&amp;gt; =&amp;gt; exporting layers                                                   2.0s
 =&amp;gt; =&amp;gt; exporting manifest sha256:7dbe9494d2a7a39af16d514b997a5a8f08b637f  0.0s
 =&amp;gt; =&amp;gt; exporting config sha256:1da06b907d53cf8a7312c138c3221e590dedc2717  0.0s
 =&amp;gt; =&amp;gt; sending tarball                                                    0.4s
envd-quick-start via Py v3.9.13 via 🅒 envd 
⬢ [envd]❯ # You are in the container-based environment!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set up Jupyter notebook
&lt;/h3&gt;

&lt;p&gt;Please edit the &lt;code&gt;build.envd&lt;/code&gt; to enable jupyter notebook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ubuntu20.04&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Configure the pip index if needed.
&lt;/span&gt;    &lt;span class="c1"&gt;# config.pip_index(url = "https://pypi.tuna.tsinghua.edu.cn/simple")
&lt;/span&gt;    &lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;python_packages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numpy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zsh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jupyter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get the endpoint of the running Jupyter notebook via &lt;code&gt;envd envs ls&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;envd up &lt;span class="nt"&gt;--detach&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;envd envs &lt;span class="nb"&gt;ls
&lt;/span&gt;NAME                    JUPYTER                 SSH TARGET              CONTEXT                                 IMAGE                   GPU     CUDA    CUDNN   STATUS          CONTAINER ID
envd-quick-start        http://localhost:42779   envd-quick-start.envd   /home/gaocegege/code/envd-quick-start   envd-quick-start:dev    &lt;span class="nb"&gt;false&lt;/span&gt;   &amp;lt;none&amp;gt;  &amp;lt;none&amp;gt;  Up 54 seconds   bd3f6a729e94
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  More on documentation 📝
&lt;/h2&gt;

&lt;p&gt;See &lt;a href="https://envd.tensorchord.ai/guide/getting-started.html"&gt;envd documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap 🗂️
&lt;/h2&gt;

&lt;p&gt;Please check out the &lt;a href="https://envd.tensorchord.ai/community/roadmap.html"&gt;ROADMAP&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contribute 😊
&lt;/h2&gt;

&lt;p&gt;We welcome all kinds of contributions from the open-source community, individuals, and partners.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join our &lt;a href="https://discord.gg/KqswhpVgdU"&gt;Discord community&lt;/a&gt;!&lt;/li&gt;
&lt;li&gt;To build from the source, please read our &lt;a href="https://envd.tensorchord.ai/community/contributing.html"&gt;contributing documentation&lt;/a&gt; and &lt;a href="https://envd.tensorchord.ai/community/development.html"&gt;development tutorial&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://gitpod.io/#https://github.com/tensorchord/envd"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RwAleoJm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gitpod.io/button/open-in-gitpod.svg" alt="Open in Gitpod" width="138" height="32"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The build language is &lt;a href="https://docs.bazel.build/versions/main/skylark/language.html"&gt;Starlark&lt;/a&gt;, a dialect of Python. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;
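
One practical consequence of the footnote above: because starlark is syntactically a dialect of Python, a `build.envd` file is also valid Python syntax. A minimal sketch (plain Python, standard library only) that parses the quick-start `build.envd` with the `ast` module and lists its top-level functions:

```python
import ast

# build.envd is written in starlark, whose syntax is a subset of
# Python's, so Python's own ast module can parse it.
build_envd = '''
def build():
    base(os="ubuntu20.04", language="python3")
    install.python_packages(name=["numpy"])
    shell("zsh")
    config.jupyter()
'''

tree = ast.parse(build_envd)
# Collect the names of all top-level function definitions.
funcs = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
print(funcs)  # -> ['build']
```

This is only an illustration of the shared syntax; envd evaluates the file with a starlark interpreter, not the Python runtime, so Python-only features are not available inside `build.envd`.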

</description>
      <category>machinelearning</category>
      <category>docker</category>
      <category>showdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
