<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ce Gao</title>
    <description>The latest articles on DEV Community by Ce Gao (@gaocegege).</description>
    <link>https://dev.to/gaocegege</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F924627%2F45195224-8adc-48fb-97e9-39f353ab0f9a.jpeg</url>
      <title>DEV Community: Ce Gao</title>
      <link>https://dev.to/gaocegege</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gaocegege"/>
    <language>en</language>
    <item>
      <title>My binary vector search is better than your FP32 vectors</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Tue, 26 Mar 2024 02:10:06 +0000</pubDate>
      <link>https://dev.to/gaocegege/my-binary-vector-search-is-better-than-your-fp32-vectors-4j2k</link>
      <guid>https://dev.to/gaocegege/my-binary-vector-search-is-better-than-your-fp32-vectors-4j2k</guid>
      <description>&lt;p&gt;Within the field of vector search, an intriguing development has arisen: binary vector search. This approach shows promise in tackling the long-standing issue of memory consumption by achieving a remarkable 30x reduction. However, a critical aspect that sparks debate is its effect on accuracy.&lt;/p&gt;

&lt;p&gt;We believe that using binary vector search, along with specific optimization techniques, can maintain similar accuracy. To provide clarity on this subject, we showcase a series of experiments that will demonstrate the effects and implications of this approach.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is a binary vector?
&lt;/h1&gt;

&lt;p&gt;A binary vector is a representation of a vector where each element in the vector is encoded as a binary value, typically either 0 or 1. This encoding scheme transforms the original vector, which may contain real-valued or high-dimensional data, into a binary format.&lt;/p&gt;
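
&lt;p&gt;One common way to obtain a binary vector from a real-valued embedding is to keep only the sign of each element. Here is a minimal sketch (thresholding at zero is an assumption for illustration; other quantization rules exist):&lt;/p&gt;

```python
def binarize(vec):
    """Map each real-valued element to a bit: 1 if positive, else 0.
    (Thresholding at zero is one common scheme, assumed here for
    illustration; other thresholds and quantization rules exist.)"""
    return [1 if x > 0 else 0 for x in vec]

print(binarize([0.34, -1.77, 0.08, -0.87]))  # [1, 0, 1, 0]
```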

&lt;p&gt;Binary vectors require only one bit of memory to store each element, while the original float32 vectors need 4 bytes for each element. This means that using binary vectors can reduce memory usage by &lt;strong&gt;up to 32 times&lt;/strong&gt;. Additionally, this reduction in memory requirements corresponds to a notable increase in Requests Per Second (RPS) for binary vector operations.&lt;/p&gt;

&lt;p&gt;Let's consider an example where we have 1 million vectors, and each vector is represented by float32 values in a 3072-dimensional space. In this scenario, the original float32 vector index would require around 20 gigabytes (GB) of memory to store all the vectors.&lt;/p&gt;

&lt;p&gt;Now, if we were to use binary vectors instead, the memory usage would be significantly reduced. In this case, the binary vector index would take approximately 600 megabytes (MB) to store all 1 million vectors.&lt;/p&gt;
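
&lt;p&gt;As a back-of-the-envelope check, the raw storage for both representations can be computed directly. The index sizes quoted above are somewhat larger than these raw figures because an index also stores graph links and metadata:&lt;/p&gt;

```python
# Raw storage for 1M vectors of 3072 dimensions, float32 vs. binary.
num_vectors = 1_000_000
dims = 3072

fp32_bytes = num_vectors * dims * 4     # 4 bytes per float32 element
binary_bytes = num_vectors * dims // 8  # 1 bit per element

print(f"float32: {fp32_bytes / 2**30:.1f} GiB")    # float32: 11.4 GiB
print(f"binary:  {binary_bytes / 2**20:.1f} MiB")  # binary:  366.2 MiB
print(f"ratio:   {fp32_bytes // binary_bytes}x")   # ratio:   32x
```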

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One might expect this reduction in memory to come with a significant drop in accuracy, since binary vectors discard much of the original information.&lt;/p&gt;

&lt;p&gt;Surprisingly, our experiments showed that the &lt;strong&gt;decrease in accuracy was not as big as expected.&lt;/strong&gt; Even though binary vectors lose some specific details, they can still capture important patterns and similarities that allow them to maintain a reasonable level of accuracy.&lt;/p&gt;

&lt;h1&gt;
  
  
  Experiment
&lt;/h1&gt;

&lt;p&gt;To evaluate the performance metrics in comparison to the original vector approach, we conducted benchmarking using the &lt;a href="https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M"&gt;&lt;code&gt;dbpedia-entities-openai3-text-embedding-3-large-3072-1M&lt;/code&gt;&lt;/a&gt; dataset. The benchmark was performed on a Google Cloud virtual machine (VM) with specifications of n2-standard-8, which includes 8 virtual CPUs and 32GB of memory. We used &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; v0.2.1 as the vector database.&lt;/p&gt;

&lt;p&gt;After inserting 1 million vectors into the database table, we built indexes for both the original float32 vectors and the binary vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;text_embedding_3_large_3072_embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;text_embedding_3_large_3072_bvector&lt;/span&gt; &lt;span class="n"&gt;bvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;openai_vector_index&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding_3_large_3072_embedding&lt;/span&gt; &lt;span class="n"&gt;vector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;openai_vector_index_bvector&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai3072&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding_3_large_3072_bvector&lt;/span&gt; &lt;span class="n"&gt;bvector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After building the indexes, we conducted vector search queries to assess the performance. These queries were executed with varying limits, indicating the number of search results to be retrieved (limit 5, 10, 50, 100).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RHksQ0b---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710992242559/95798c49-7e3d-49ba-a7ce-412cd737ba36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RHksQ0b---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710992242559/95798c49-7e3d-49ba-a7ce-412cd737ba36.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We observed that the Requests Per Second (RPS) for binary vector search was approximately 3000, whereas the RPS for the original vector search was only around 300.&lt;/p&gt;

&lt;p&gt;The RPS metric indicates the number of requests or queries that can be processed by the system per second. A higher RPS value signifies a higher throughput and faster response time.&lt;/p&gt;

&lt;p&gt;However, the accuracy of the binary vector search dropped to about 80% of that of the original vector search. Such a decrease may be tolerable for some use cases, but it is unacceptable in situations where high accuracy is crucial.&lt;/p&gt;

&lt;h1&gt;
  
  
  Optimization: adaptive retrieval
&lt;/h1&gt;

&lt;p&gt;Luckily, there is a simple and effective method called adaptive retrieval, which we learned from &lt;a href="https://aniketrege.github.io/blog/2024/mrl/#what-is-mrl-really-this-time"&gt;Matryoshka Representation Learning&lt;/a&gt;, to improve the accuracy.&lt;/p&gt;

&lt;p&gt;The name sounds complex, but the idea behind adaptive retrieval is straightforward. Let's say we want to find the best 100 candidates. We can follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query the binary vector index&lt;/strong&gt; to retrieve a larger set (e.g. 200 candidates) from the 1 million embeddings. This is a fast operation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rerank the candidates using a KNN query&lt;/strong&gt; to retrieve the top 100. Note that this step runs exact KNN rather than ANN: KNN is well-suited to small candidate sets where exact similarity search is affordable, making it an excellent choice for reranking.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
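
&lt;p&gt;The two steps above can be sketched in plain Python over in-memory vectors. This is a toy illustration only (&lt;code&gt;hamming&lt;/code&gt;, &lt;code&gt;l2&lt;/code&gt;, and &lt;code&gt;adaptive_search&lt;/code&gt; are our own helpers, not pgvecto.rs APIs); in the database, step 1 goes through the bvector index and step 2 is an exact order-by over the shortlist:&lt;/p&gt;

```python
import math

def binarize(vec):
    # Illustrative quantization rule: 1 for positive elements, else 0.
    return [1 if x > 0 else 0 for x in vec]

def hamming(a, b):
    # Distance between two binary vectors: number of differing bits.
    return sum(x != y for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_search(vectors, query, k):
    """Return indices of the top-k vectors for `query` (toy version)."""
    # Step 1: fast coarse pass over binary codes, keep 2*k candidates.
    bq = binarize(query)
    shortlist = sorted(range(len(vectors)),
                       key=lambda i: hamming(binarize(vectors[i]), bq))[:2 * k]
    # Step 2: exact KNN rerank of the shortlist with full-precision vectors.
    return sorted(shortlist, key=lambda i: l2(vectors[i], query))[:k]
```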

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7vZHJEHM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710993929166/a09969ef-8515-4236-9d4a-471916b8f363.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7vZHJEHM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1710993929166/a09969ef-8515-4236-9d4a-471916b8f363.png" alt="" width="800" height="650"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By incorporating this reranking step, we can achieve a notable increase in accuracy, potentially reaching up to 95%. Additionally, the system maintains a high Requests Per Second (RPS), approximately 1700. Furthermore, despite these improvements, the memory usage of the index remains significantly smaller, around 30 times less, compared to the original vector representation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below is the SQL code that can be used to execute the adaptive retrieval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;match_documents_adaptive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;match_count&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;SETOF&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="c1"&gt;-- Step 1: Query binary vector index to retrieve match_count * 2 candidates&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;shortlist&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;openai3072&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;text_embedding_3_large_3072_bvector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;binarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;match_count&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;-- Step 2: Rerank the candidates using a KNN query to retrieve the top candidates&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;shortlist&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;text_embedding_3_large_3072_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;match_count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Comparison with shortening vectors
&lt;/h1&gt;

&lt;p&gt;OpenAI's latest embedding model &lt;code&gt;text-embedding-3-large&lt;/code&gt; has a feature that allows you to &lt;a href="https://openai.com/blog/new-embedding-models-and-api-updates#fn-A"&gt;shorten vectors&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BVhI6THY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/shortening-embedding.9WYnJK_l.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BVhI6THY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/shortening-embedding.9WYnJK_l.svg" alt="" width="800" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It produces embeddings with 3072 dimensions by default, but you can safely truncate dimensions from the end of the vector and still retain a valid representation of the text. For example, you could shorten the embeddings to 1024 dimensions.&lt;/p&gt;
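
&lt;p&gt;Shortening amounts to truncating the embedding and re-normalizing it. Here is a minimal sketch (our own helper, not an OpenAI API; we assume the embeddings start out unit-length, so re-normalizing keeps cosine and dot-product scores comparable):&lt;/p&gt;

```python
import math

def shorten(embedding, dims):
    """Keep the first `dims` dimensions and re-normalize to unit length.
    (Assumes a model trained so that prefixes remain meaningful, as with
    Matryoshka-style embeddings such as text-embedding-3-large.)"""
    truncated = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]
```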

&lt;p&gt;This feature can help you save memory and make your requests faster, just like binary vectors. It would be a good idea to compare the performance and see which one works better for your needs.&lt;/p&gt;

&lt;p&gt;Based on what we discovered, the conclusion is clear: &lt;strong&gt;Binary vectors significantly outperform shortened vectors.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We performed similar benchmarks to compare with binary vectors. We created indexes using the same dataset and machine type, but with varying dimensionalities. One index had 256 dimensions, while the other had 1024 dimensions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LbH2AjZM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/first-pass.6ZFqbcX6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LbH2AjZM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/first-pass.6ZFqbcX6.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The 1024-dimensional index achieved an accuracy of approximately 85% with a request rate of 1000 requests per second (RPS). On the other hand, the 256-dimensional index had around 60% accuracy with a higher request rate of 1200 RPS.&lt;/p&gt;

&lt;p&gt;The 1024-dimensional index required approximately 8GB of memory, while the 256-dimensional index used around 2GB. In comparison, the binary vector approach achieved an accuracy of around 80% with a request rate of 3000 RPS, and its memory usage was approximately 600MB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JsKiMj1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/memusage.iXLefc3e.png" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We implemented adaptive retrieval with lower-dimensional indexes. The binary vector index still outperformed the 256-dimensional index in terms of both request rate (RPS) and accuracy, while also exhibiting lower memory usage. On the other hand, the adaptive retrieval with the 1024-dimensional index achieved a higher accuracy of 99%; however, it had a relatively lower request rate and consumed 12 times more memory compared to the other indexes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j7CcjVnC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-bench.6qE-m9sI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j7CcjVnC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-bench.6qE-m9sI.png" alt="adaptive retrieval benchmark" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;By utilizing adaptive retrieval, binary vectors can maintain a high level of accuracy while reducing memory usage by 30 times. We have presented the benchmark metrics in a table to summarize the results. Note that these outcomes are specific to the OpenAI text-embedding-3-large model, which possesses this particular property.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kau5NtIV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-tab.EPcQOvxQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kau5NtIV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/adaptive-retrieval-tab.EPcQOvxQ.png" alt="table" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>vectordatabase</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Wed, 20 Mar 2024 02:41:57 +0000</pubDate>
      <link>https://dev.to/gaocegege/pgvector-vs-pgvectors-in-2024-a-comprehensive-comparison-for-vector-search-in-postgresql-3n08</link>
      <guid>https://dev.to/gaocegege/pgvector-vs-pgvectors-in-2024-a-comprehensive-comparison-for-vector-search-in-postgresql-3n08</guid>
      <description>&lt;p&gt;pgvector and &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; are both vector search extensions designed to enhance the capabilities of PostgreSQL. These extensions revolutionize the way vector search is performed within the database, providing scalable, low-latency, and hybrid-enabled solutions. In this blog post, we will illustrate the differences between pgvector and &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;TL;DR: navigate directly to the final section, which contains a table showcasing the main distinctions between the two.&lt;/p&gt;

&lt;h2&gt;
  
  
  Search
&lt;/h2&gt;

&lt;p&gt;Search is essential for a database. It plays a pivotal role in facilitating efficient data retrieval, enabling users to quickly find and access the information they need. We will demonstrate their performance and highlight the various features &lt;a href="https://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; and pgvector offer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Filtering
&lt;/h3&gt;

&lt;p&gt;While both pgvector and &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; provide support for vector similarity search in PostgreSQL, &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; goes a step further by introducing the &lt;a href="https://www.usenix.org/conference/osdi23/presentation/zhang-qianxi"&gt;VBASE method from OSDI 2023&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Real-world workloads often involve &lt;strong&gt;a combination of vector search and relational queries&lt;/strong&gt;. While vector search is powerful for similarity matching and retrieval of similar vectors, relational queries allow you to perform complex joins, filters, and aggregations on structured data.&lt;/p&gt;

&lt;p&gt;pgvecto.rs is designed to perform efficiently in scenarios that involve &lt;strong&gt;Single-Vector TopK + Filter + Join.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We utilize the &lt;code&gt;laion-768-5m-ip-probability&lt;/code&gt; dataset for benchmarking purposes due to the absence of a comprehensive relational benchmark. The dataset is derived from LAION 2B images and contains 5,000,000 vectors and 10,000 queries.&lt;/p&gt;

&lt;p&gt;The dataset includes a &lt;code&gt;probability&lt;/code&gt; column that stores random floating-point values drawn from a uniform distribution between 0 and 1. A ratio of 0.01 means that each query's filter matches 1% of the dataset, allowing for focused analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n9b_QidZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832203612/14235301-ff6e-401e-be2e-62f6ea29cebf.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n9b_QidZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832203612/14235301-ff6e-401e-be2e-62f6ea29cebf.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HHw_6YN---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832190249/b370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HHw_6YN---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707832190249/b370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" alt="" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We present the recall, latency (in milliseconds), and RPS (requests per second) for various &lt;code&gt;probability&lt;/code&gt; ranges while keeping the &lt;code&gt;ef_search&lt;/code&gt; constant. The &lt;code&gt;ef_search&lt;/code&gt; parameter represents the size of the list utilized during k-NN (k-Nearest Neighbors) searches, determining the trade-off between search accuracy and query processing time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0EPlw9JM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707279248378/4ebbb813-0593-4949-9898-9abdee58e4fc.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0EPlw9JM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1707279248378/4ebbb813-0593-4949-9898-9abdee58e4fc.png%3Fauto%3Dcompress%2Cformat%26format%3Dwebp" alt="" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs/"&gt;&lt;strong&gt;pgvecto.rs&lt;/strong&gt;&lt;/a&gt;, when used with VBASE, consistently yields improved recall, particularly when working with low probability values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sparse Vector Search
&lt;/h3&gt;

&lt;p&gt;Sparse vectors are high-dimensional vectors that contain few non-zero values. They are suitable for traditional information retrieval use cases. For example, a vector with 32,000 dimensions but only 2 non-zero elements is a typical sparse vector.&lt;/p&gt;

&lt;p&gt;$$S[1\times 32000]=\left[ \begin{matrix} 0 &amp;amp; 0 &amp;amp; 0.015 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; \cdots &amp;amp; 0.543 &amp;amp; 0 \end{matrix} \right]$$&lt;/p&gt;

&lt;p&gt;Dense vectors are embeddings produced by neural networks, such as text embedding models, with most or all elements non-zero. They have far fewer dimensions than sparse vectors, such as 256 or 1536.&lt;/p&gt;

&lt;p&gt;$$D[1\times 256]=\left[ \begin{matrix} 0.342 &amp;amp; 1.774 &amp;amp; 0.087 &amp;amp; 0.870 &amp;amp; 0.001 &amp;amp; \cdots &amp;amp; 0.543 &amp;amp; 0.999 \end{matrix} \right]$$&lt;/p&gt;
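
&lt;p&gt;A sparse vector like $S$ above is usually stored as (index, value) pairs rather than as a full array, so storage and distance computations touch only the non-zero entries. A minimal sketch of this idea (illustrative helpers, not the pgvecto.rs implementation):&lt;/p&gt;

```python
def to_sparse(dense):
    """Store only the non-zero entries of a vector as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}

def sparse_dot(a, b):
    # The inner product touches only indices present in both vectors,
    # which is what makes sparse search cheap in high dimensions.
    return sum(v * b[i] for i, v in a.items() if i in b)

s = to_sparse([0.0, 0.0, 0.015, 0.0, 0.543, 0.0])
print(s)  # {2: 0.015, 4: 0.543}
```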

&lt;p&gt;pgvector supports dense vector search well, but &lt;a href="https://github.com/pgvector/pgvector/issues/81"&gt;there is no plan to support sparse vectors&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;, on the other hand, supports both dense vector search and &lt;a href="https://docs.pgvecto.rs/use-case/sparse-vector.html"&gt;sparse vector search&lt;/a&gt;. It provides the ability to use the &lt;a href="https://docs.pgvecto.rs/use-case/sparse-vector.html"&gt;&lt;code&gt;svector&lt;/code&gt;&lt;/a&gt; data type to build sparse vector indexes and perform searches on them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;svector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.1,0,0,0,0,0,0,0,0,0]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;your_index_name&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;svector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.3,0,0,0,0,0,0,0,0,0]'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vector dimensions
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; extension provides support for up to 65535 dimensions, whereas the pgvector extension supports a maximum of 2000 dimensions. This difference in dimensionality support allows &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; to handle vectors from OpenAI's latest embedding model directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single Instruction/Multiple Data (SIMD)
&lt;/h3&gt;

&lt;p&gt;pgvecto.rs detects and uses the SIMD (Single Instruction, Multiple Data) instructions available on the user's machine at runtime, whereas pgvector relies on the compiler to generate SIMD code at compile time and cannot switch to a faster instruction set at runtime. This makes vector distance calculations, and therefore search, faster in pgvecto.rs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Binary Vector
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://docs.pgvecto.rs/usage/vector-types.html#bvector-binary-vector"&gt;&lt;code&gt;bvector&lt;/code&gt;&lt;/a&gt; type is a binary vector type in &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;. It represents a binary vector, which is a vector where each component can take on two possible values, typically 0 and 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;bvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1,0,1]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0,1,0]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;your_index_name&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;bvector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[1,0,1]'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binary vectors offer significant advantages with certain embedding models, such as the OpenAI embedding models, effectively reducing memory usage while preserving a satisfactory level of accuracy.&lt;/p&gt;

&lt;p&gt;Here are some performance benchmarks for the &lt;code&gt;bvector&lt;/code&gt; type. We use the &lt;a href="https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M"&gt;dbpedia-entities-openai3-text-embedding-3-large-3072-1M&lt;/a&gt; dataset for the benchmark. The VM is n2-standard-8 (8 vCPUs, 32 GB memory) on Google Cloud.&lt;/p&gt;

&lt;p&gt;We upsert 1M binary vectors into the table and then run a KNN query for each embedding. Indexing 1M binary vectors takes only about 600 MB of memory, while the &lt;code&gt;vector&lt;/code&gt; type takes about 18 GB to index the same number of vectors. The &lt;code&gt;bvector&lt;/code&gt;'s accuracy exceeds 95% if we adopt &lt;a href="https://docs.pgvecto.rs/use-case/adaptive-retrieval.html"&gt;&lt;strong&gt;adaptive retrieval&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4vdfmrfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.pgvecto.rs/assets/bvector.x2qPilMU.png" alt="bvector" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FP16/INT8
&lt;/h3&gt;

&lt;p&gt;Besides binary vectors, &lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; also provides support for &lt;a href="https://docs.pgvecto.rs/usage/vector-types.html#vecf16-half-precision-vector"&gt;FP16 (16-bit floating point)&lt;/a&gt; and INT8 (8-bit integer) data types.&lt;/p&gt;
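&lt;p&gt;Usage mirrors the regular &lt;code&gt;vector&lt;/code&gt; type. The sketch below assumes the &lt;code&gt;vecf16&lt;/code&gt; type name from the linked documentation; the INT8 counterpart is analogous:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Half-precision storage: 2 bytes per dimension instead of 4.
CREATE TABLE items_f16 (
  id bigserial PRIMARY KEY,
  embedding vecf16(3) NOT NULL
);

INSERT INTO items_f16 (embedding) VALUES ('[0.1, 0.2, 0.3]');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;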

&lt;h2&gt;
  
  
  Indexing
&lt;/h2&gt;

&lt;p&gt;pgvecto.rs takes a different approach compared to pgvector. It handles the storage and memory of indexes separately from PostgreSQL, instead of relying on the native storage engine of PostgreSQL like pgvector does.&lt;/p&gt;

&lt;p&gt;It's a design tradeoff. During the initial stages of our development, we conducted experiments with PostgreSQL's page storage for the Hierarchical Navigable Small World (HNSW) index, similar to pgvector. However, we encountered various limitations that hindered its effectiveness and functionality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Parallelization Challenges: PostgreSQL's process model, where each statement runs in a single process, combined with the lack of thread-safe APIs, makes parallelization difficult. Building vector indexes is computationally intensive, so parallelization can significantly enhance performance. However, our efforts to parallelize index construction were impeded by frequent 'Too many shared buffer locked' errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write-Ahead Logging (WAL) Amplification: Inserting a single 2KB vector can generate over 20KB of WAL. This amplification is inherent in the HNSW algorithm, as a single point insertion modifies multiple edges, and PostgreSQL records each change individually in the WAL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lock Contention: Lock contention arises due to the need to lock every edge list during reads and writes when traversing the HNSW graph. The hierarchical structure of HNSW, where higher levels contain fewer points, often results in lock contention becoming a common bottleneck during index usage. This contention occurs when multiple operations attempt to access or modify the same edge list simultaneously, leading to delays and reduced concurrency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; adopted a design akin to FreshDiskANN, resembling the Log-Structured Merge (LSM) tree concept. This architecture comprises three components: the writing segment, the growing segment, and the sealed segment. New vectors are initially written to the writing segment. A background process then asynchronously transforms them into the immutable growing segment. Subsequently, the growing segment undergoes a merge with the sealed segment, akin to the compaction process in an LSM tree. This design offers several benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Non-blocking Insertions: Index modification operations do not impede insertion processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Batched Modifications: Grouping modifications to the HNSW graph improves throughput.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elimination of Read-Write Lock Contention: Since sealed segments are immutable, issues related to read-write lock contention are mitigated.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, there are drawbacks to this approach. One notable limitation is the absence of out-of-the-box Write-Ahead Logging (WAL) support for the index. This means that features like Point-in-Time Recovery and Physical Replication, which rely on WAL, are not readily available for the index. Nevertheless, the PostgreSQL ecosystem is robust and allows for extensions to define their own custom WAL through a resource manager. While implementing this solution requires additional effort, it is feasible to overcome the limitation and enable WAL support for the index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The table below summarizes the main distinctions between pgvecto.rs and pgvector.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;pgvecto.rs&lt;/th&gt;
&lt;th&gt;pgvector&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Filtering&lt;/td&gt;
&lt;td&gt;Introduces the VBASE method for vector search and relational queries (e.g. Single-Vector TopK + Filter + Join).&lt;/td&gt;
&lt;td&gt;When filters are applied, the results may be incomplete. For example, if you originally intended to limit the results to 10, you might end up with only 5 after filtering.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sparse Vector Search&lt;/td&gt;
&lt;td&gt;Supports both dense and sparse vector search.&lt;/td&gt;
&lt;td&gt;Supports dense vector search.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector Dimensions&lt;/td&gt;
&lt;td&gt;Supports up to 65535 dimensions.&lt;/td&gt;
&lt;td&gt;Supports up to 2000 dimensions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SIMD&lt;/td&gt;
&lt;td&gt;SIMD instructions are dynamically dispatched at runtime to maximize performance based on the capabilities of the specific machine.&lt;/td&gt;
&lt;td&gt;Relies on compiler-generated SIMD code at compile time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Types&lt;/td&gt;
&lt;td&gt;Introduces additional data types: binary vectors, FP16 (16-bit floating point), and INT8 (8-bit integer).&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indexing&lt;/td&gt;
&lt;td&gt;Handles the storage and memory of indexes separately from PostgreSQL.&lt;/td&gt;
&lt;td&gt;Relies on the native storage engine of PostgreSQL.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WAL Support&lt;/td&gt;
&lt;td&gt;Provides Write-Ahead Logging (WAL) support for data; index WAL support is a work in progress.&lt;/td&gt;
&lt;td&gt;Provides Write-Ahead Logging (WAL) support for index and data.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>postgres</category>
      <category>vectordatabase</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>pgvecto.rs 0.2: Unifying Relational Queries and Vector Search in PostgreSQL</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Thu, 14 Mar 2024 04:16:45 +0000</pubDate>
      <link>https://dev.to/gaocegege/pgvectors-02-unifying-relational-queries-and-vector-search-in-postgresql-424e</link>
      <guid>https://dev.to/gaocegege/pgvectors-02-unifying-relational-queries-and-vector-search-in-postgresql-424e</guid>
      <description>&lt;p&gt;We are excited to announce the release of &lt;a href="https://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2, a significant milestone in the journey of bridging the gap between relational queries and vector search in PostgreSQL. This update brings together the power of both worlds, offering enhanced efficiency and enabling complex queries within PostgreSQL.&lt;/p&gt;

&lt;p&gt;In the past, developers and data scientists encountered the significant challenge of managing separate systems for relational queries and vector search. This resulted in increased complexity and resource overhead. However, with the release of &lt;a href="https://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2, we have addressed this issue by integrating the cutting-edge &lt;a href="https://www.usenix.org/conference/osdi23/presentation/zhang-qianxi" rel="noopener noreferrer"&gt;VBASE method from OSDI 2023&lt;/a&gt;. This integration has substantially refined the efficiency of vector search within PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832190249%2Fb370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832190249%2Fb370b8e0-47a0-4aaf-a7b8-67bbd9f51314.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832203612%2F14235301-ff6e-401e-be2e-62f6ea29cebf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707832203612%2F14235301-ff6e-401e-be2e-62f6ea29cebf.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-world applications: Immich
&lt;/h2&gt;

&lt;p&gt;Real-world applications often require complex queries that go beyond simple Approximate Nearest Neighbor (ANN) search. To explore a practical example of such applications, let's take a closer look at &lt;a href="https://immich.app/" rel="noopener noreferrer"&gt;immich&lt;/a&gt;, a self-hosted photo and video backup solution that highlights the importance of advanced vector and traditional relational queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimmich.app%2Fassets%2Fimages%2Fsearch-ex-2-707fe5ab1ab89621a7a1f3e8807b724a.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimmich.app%2Fassets%2Fimages%2Fsearch-ex-2-707fe5ab1ab89621a7a1f3e8807b724a.webp"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://immich.app/" rel="noopener noreferrer"&gt;immich&lt;/a&gt; leverages advanced &lt;strong&gt;vector-based and relational queries&lt;/strong&gt; to provide intelligent search capabilities. With &lt;a href="https://immich.app/" rel="noopener noreferrer"&gt;immich&lt;/a&gt;, you can efficiently search and discover relevant media files based on visual similarity, metadata, and user-defined tags. The underlying technology powering this functionality is pgvecto.rs.&lt;/p&gt;

&lt;p&gt;We will provide a concise overview of the search feature in immich. Consider a scenario where our database consists of three tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;AssetEntity&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ownerId&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updatedAt&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;deletedAt&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;isArchived&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;isVisible&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ExifInfo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;lat&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;long&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ImageEmbedding&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;-- assuming 'n' is the dimensionality of the vector&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have a table named &lt;code&gt;AssetEntity&lt;/code&gt; that stores information about the images, including their unique identifier (&lt;code&gt;id&lt;/code&gt;), the owner (&lt;code&gt;ownerId&lt;/code&gt;), creation and update timestamps (&lt;code&gt;createdAt&lt;/code&gt; and &lt;code&gt;updatedAt&lt;/code&gt;), and other relevant attributes.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ExifInfo&lt;/code&gt; table contains information specific to the &lt;a href="https://en.wikipedia.org/wiki/Exif" rel="noopener noreferrer"&gt;EXIF data&lt;/a&gt; of the images, such as the latitude (&lt;code&gt;lat&lt;/code&gt;), longitude (&lt;code&gt;long&lt;/code&gt;), city, state, country, and description. The &lt;code&gt;assetId&lt;/code&gt; column in this table establishes a relationship with the &lt;code&gt;AssetEntity&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;Additionally, we have the &lt;code&gt;ImageEmbedding&lt;/code&gt; table, which stores vector-based embeddings for each image. The &lt;code&gt;embedding&lt;/code&gt; column holds the image's embedding as a &lt;code&gt;vector(n)&lt;/code&gt; of floating-point numbers. The &lt;code&gt;assetId&lt;/code&gt; column in this table also establishes a relationship with the &lt;code&gt;AssetEntity&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;The query statement below is used to search for images based on certain criteria and sorting by the similarity of the image embeddings. It joins the &lt;code&gt;AssetEntity&lt;/code&gt;, &lt;code&gt;ImageEmbedding&lt;/code&gt;, and &lt;code&gt;ExifInfo&lt;/code&gt; tables, filters the results based on criteria like &lt;code&gt;ownerId&lt;/code&gt;, &lt;code&gt;isArchived&lt;/code&gt;, &lt;code&gt;isVisible&lt;/code&gt;, &lt;code&gt;createdAt&lt;/code&gt;, and &lt;code&gt;city&lt;/code&gt; in the EXIF info, then orders the images by the similarity of the provided embedding. The query returns a limited number of results based on the specified limit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;AssetEntity&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;ImageEmbedding&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;ExifInfo&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assetId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ownerId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(:&lt;/span&gt;&lt;span class="n"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isArchived&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isVisible&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;numResults&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can be seen as a scenario involving &lt;strong&gt;Single-Vector TopK + Filter + Join&lt;/strong&gt; operations. The limitations of pgvector in supporting such operations highlight the need for VBASE.&lt;/p&gt;

&lt;p&gt;When it comes to Single-Vector TopK operations, pgvector falls short in providing efficient performance. TopK queries require finding the K nearest neighbors to a target vector, but pgvector struggles to predict the optimal value of K, leading to suboptimal query performance. VBASE, on the other hand, addresses this limitation by leveraging relaxed monotonicity and offering significantly higher efficiency. It provides a more accurate and efficient solution for single-vector TopK queries.&lt;/p&gt;

&lt;p&gt;Additionally, pgvector's support for Filter and Join operations in conjunction with vector queries is limited. Complex queries that involve filtering or joining on both scalar and vector data can be challenging to execute efficiently in pgvector. VBASE, however, is designed to handle these types of queries seamlessly. It integrates vector search systems with relational databases, allowing for the execution of complex queries involving filters and joins on both scalar and vector attributes. This capability makes VBASE a more suitable choice for applications that require these operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark
&lt;/h2&gt;

&lt;p&gt;To evaluate the performance, benchmarks can be conducted to measure the efficiency and effectiveness of both systems. We utilize the &lt;code&gt;laion-768-5m-ip-probability&lt;/code&gt; dataset for benchmarking purposes due to the absence of a comprehensive relational benchmark. The dataset is derived from LAION 2B images and contains 5,000,000 vectors and 10,000 queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707279248378%2F4ebbb813-0593-4949-9898-9abdee58e4fc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1707279248378%2F4ebbb813-0593-4949-9898-9abdee58e4fc.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dataset includes a &lt;code&gt;probability&lt;/code&gt; column that stores random floating-point values drawn from a uniform distribution between 0 and 1. A ratio of 0.01 means that each query's filter covers 1% of the dataset, allowing for focused analysis.&lt;/p&gt;
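&lt;p&gt;As an illustration, a query at ratio 0.01 is simply a filtered KNN search. The table and column names below are assumed for the sketch, and it uses the inner-product distance operator since the dataset is built for inner-product search:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Selectivity 0.01: the filter keeps roughly 1% of the 5M rows,
-- and the TopK search runs over the surviving candidates.
SELECT id
FROM laion
WHERE probability &amp;lt; 0.01
ORDER BY embedding &amp;lt;#&amp;gt; :query_embedding
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;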

&lt;p&gt;We present the recall, latency (in milliseconds), and RPS (requests per second) for various &lt;code&gt;probability&lt;/code&gt; ranges while keeping the &lt;code&gt;ef_search&lt;/code&gt; constant. The &lt;code&gt;ef_search&lt;/code&gt; parameter represents the size of the list utilized during k-NN (k-Nearest Neighbors) searches, determining the trade-off between search accuracy and query processing time.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt;, when used with VBASE, consistently yields improved recall, particularly when working with low probability values.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other features
&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2 introduces the following key features and improvements other than VBASE integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;FP16: Users can now store embeddings in PostgreSQL at half the float32 size, significantly improving latency. This optimization has a negligible impact on final recall (less than 1%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Asynchronous indexing: Insertion operations are &lt;strong&gt;non-blocking&lt;/strong&gt;, ensuring a smoother and more efficient data insertion and indexing process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Doubled query performance: &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2 offers query performance that is twice as fast as the previous version (0.1), marking a significant leap forward in system efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observability: The new &lt;code&gt;pg_vector_index_stat&lt;/code&gt; view provides a transparent view into the indexing internals of &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt;. Users can monitor index construction, configuration adjustments, and detailed statistical analysis in real time, fostering a more intuitive and controlled environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Quick start&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To get started with &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;pgvecto.rs&lt;/a&gt; 0.2, you can run it in a Docker container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; pgvecto-rs-demo &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mysecretpassword &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 5432:5432 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; tensorchord/pgvecto-rs:pg16-v0.2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
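&lt;p&gt;Once the container is up, connect with &lt;code&gt;psql&lt;/code&gt; (e.g. &lt;code&gt;psql -h localhost -p 5432 -U postgres&lt;/code&gt;) and enable the extension. The smoke test below is a minimal sketch using the same operators shown earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;

-- A tiny sanity check: nearest neighbor by L2 distance.
CREATE TABLE demo (id bigserial PRIMARY KEY, emb vector(3) NOT NULL);
INSERT INTO demo (emb) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT id, emb &amp;lt;-&amp;gt; '[3,2,1]' AS dist FROM demo ORDER BY dist LIMIT 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;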



&lt;p&gt;Please &lt;a href="https://docs.pgvecto.rs/getting-started/overview.html" rel="noopener noreferrer"&gt;check out our documentation&lt;/a&gt; for more details. We encourage you to try out &lt;a href="http://pgvecto.rs" rel="noopener noreferrer"&gt;&lt;strong&gt;pgvecto.rs&lt;/strong&gt;&lt;/a&gt;, benchmark it against your workloads, and contribute your indexing innovations. Join our &lt;a href="https://discord.gg/KqswhpVgdU" rel="noopener noreferrer"&gt;&lt;strong&gt;Discord community&lt;/strong&gt;&lt;/a&gt; to connect with the developers and other users working to improve pgvecto.rs!&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>postgres</category>
      <category>rust</category>
      <category>python</category>
    </item>
    <item>
      <title>20x Faster as the Beginning: Introducing pgvecto.rs extension written in Rust</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Mon, 07 Aug 2023 04:51:54 +0000</pubDate>
      <link>https://dev.to/gaocegege/20x-faster-as-the-beginning-introducing-pgvectors-extension-written-in-rust-3d2f</link>
      <guid>https://dev.to/gaocegege/20x-faster-as-the-beginning-introducing-pgvectors-extension-written-in-rust-3d2f</guid>
<description>&lt;p&gt;We are thrilled to announce the release of &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt;, a powerful Postgres extension for vector similarity search written in Rust. Its HNSW algorithm is 20x faster than pgvector at 90% recall. But speed is just the start: pgvecto.rs has an extensible architecture that lets contributors implement new indexes with ease, and we look forward to the open source community driving it to new heights!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Rust?
&lt;/h2&gt;

&lt;p&gt;Pgvecto.rs is implemented in Rust rather than C, the language of most existing Postgres extensions. It is built on top of the pgrx framework for writing Postgres extensions in Rust. Rust provides many advantages for an extension like pgvecto.rs: its strict compile-time checks guarantee memory safety, avoiding entire classes of bugs and security issues that can plague C extensions. Just as importantly, Rust offers modern developer ergonomics with great documentation, package management, and excellent error messages. This makes pgvecto.rs more approachable for developers to use and contribute to than sprawling C codebases. The safety and ease of use of Rust make it an ideal language for building the next generation of Postgres extensions like pgvecto.rs on top of pgrx.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extensible Architectures
&lt;/h2&gt;

&lt;p&gt;Pgvecto.rs is designed with an extensible architecture that makes it easy to add support for new index types. At the core is a set of traits that define the required behaviors for a vector index, like building, saving, loading, and querying. Implementing a new index is as straightforward as creating a struct for that index type and implementing the required traits. Pgvecto.rs currently comes with two built-in index types - HNSW for maximum search speed, and ivfflat for quantization-based approximate search. But the doors are open for anyone to create additional indexes like RHNSW, NGT, or custom types tailored to specific use cases. The extensible architecture makes pgvecto.rs adaptable as new vector search algorithms emerge. And it lets you select the right index for your data and performance needs. Pgvecto.rs provides the framework for making vector search in Postgres as flexible and future-proof as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed and Performance
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/usamoi/pgvecto.rs-bench"&gt;Benchmarks&lt;/a&gt; show pgvecto.rs offers massive speed improvements over existing Postgres extensions like pgvector. In tests, its HNSW index demonstrates search performance up to 25x faster compared to pgvector's ivfflat index. The flexible architecture also allows using different indexing algorithms to optimize for either maximum throughput or precision. We're working on the quantization HNSW now, please also stay tuned!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FHC4VSk3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://hackmd.io/_uploads/SyOOvsC5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FHC4VSk3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://hackmd.io/_uploads/SyOOvsC5n.png" alt="" width="742" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Persistence and Management
&lt;/h2&gt;

&lt;p&gt;Previous work pg_embedding did a great job implementing HNSW indexes, but lacked support for persistence and proper CRUD operations. pgvecto.rs adds those two core functionalities that were missing in pg_embedding. Vector indexes in pgvecto.rs are properly persisted using WAL (write-ahead logging). pgvecto.rs handles saving, loading, rebuilding, and updating indexes automatically behind the scenes. You get durable indexes that don't require external management while fitting cleanly into current Postgres deployments and workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Let's assume you've created a table using the following SQL command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;vector(4)&lt;/code&gt; denotes the vector data type, with &lt;code&gt;4&lt;/code&gt; representing the dimension of the vector. You can use &lt;code&gt;vector&lt;/code&gt; without specifying a dimension, but be aware that you cannot create an index on a vector type without a specified dimension.&lt;/p&gt;

&lt;p&gt;You can then insert data at any time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1.1, 2.2, 3.3, 4.4]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create an index on the &lt;code&gt;emb&lt;/code&gt; vector column using squared Euclidean distance, you can use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="n"&gt;l2_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="n"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2097152&lt;/span&gt;
&lt;span class="n"&gt;size_ram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4294967296&lt;/span&gt;
&lt;span class="n"&gt;storage_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"ram"&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hnsw&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"ram"&lt;/span&gt;
&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;
&lt;span class="n"&gt;ef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to retrieve the top 10 vectors closest to the origin, you can use the following SQL command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0, 0, 0, 0]'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0, 0, 0, 0]'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;pgvecto.rs represents an exciting step forward for vector search in Postgres. Its implementation in Rust and extensible architecture provide key advantages over existing extensions: speed, safety, and flexibility. We're thrilled to release pgvecto.rs as an open source project under the Apache 2.0 license and can't wait to see what the community builds on top of it. There's ample room for pgvecto.rs to expand: adding new index types and algorithms, optimizing for different data distributions and use cases, and integrating with existing Postgres workflows.&lt;/p&gt;

&lt;p&gt;We encourage you to try out &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs on GitHub&lt;/a&gt;, benchmark it against your workloads, and contribute your own indexing innovations. Together, we can make pgvecto.rs the best vector search extension Postgres has ever seen! The potential is vast, and we're just getting started. Please join us on this journey to bring unprecedented vector search capabilities to the Postgres ecosystem.  Join our &lt;a href="https://discord.gg/KqswhpVgdU"&gt;Discord community&lt;/a&gt; to connect with the developers and other users working to improve pgvecto.rs!&lt;/p&gt;

&lt;h2&gt;
  
  
  Advertisement Time
&lt;/h2&gt;

&lt;p&gt;The mission of ModelZ is to simplify the process of taking machine learning models into production. With experience from AWS, TikTok, and Kubeflow, our team has extensive expertise in MLOps engineering. So if you have any questions about putting models into production, please feel free to &lt;a href="https://docs.modelz.ai/community"&gt;reach out&lt;/a&gt; by joining &lt;a href="https://discord.gg/F4WnzqmeNj"&gt;Discord&lt;/a&gt; or through &lt;a href="mailto:modelz-support@tensorchord.ai"&gt;modelz-support@tensorchord.ai&lt;/a&gt;. We're happy to draw on our background building MLOps platforms across companies to provide guidance on any part of the model development to deployment workflow.&lt;/p&gt;

&lt;p&gt;More products with ModelZ:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://modelz.ai"&gt;ModelZ&lt;/a&gt; - A Managed serverless GPU platform to deploy your own models&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mosecorg/mosec"&gt;Mosec&lt;/a&gt; - A high-performance serving framework for ML models, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine. Simple and faster alternative to NVIDIA Triton.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/tensorchord/envd"&gt;envd&lt;/a&gt; - A command-line tool that helps you create the container-based environment for AI/ML, from development to the production. Python is all you need to know to use this tool.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/tensorchord/modelz-llm"&gt;ModelZ-llm&lt;/a&gt; - OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>vectordatabase</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Do we really need a specialized vector database?</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Mon, 07 Aug 2023 04:45:26 +0000</pubDate>
      <link>https://dev.to/gaocegege/do-we-really-need-a-specialized-vector-database-5aci</link>
      <guid>https://dev.to/gaocegege/do-we-really-need-a-specialized-vector-database-5aci</guid>
      <description>&lt;p&gt;With the popularity of LLM (Large Language Model), vector databases have also become a hot topic. With just a few lines of simple Python code, a vector database can act as a cheap but highly effective "external brain" for your LLM. But do we really need a specialized vector database?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does LLM need vector search?
&lt;/h2&gt;

&lt;p&gt;First, let me briefly introduce why LLMs need vector search. Vector search is a long-standing problem: given an object, find the most similar objects in a collection. Text, images, and other data can be converted into vector representations, so a similarity problem over text or images becomes a similarity problem over vectors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2zgp6bb32qh7qarlj6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2zgp6bb32qh7qarlj6h.png" alt="Image description" width="800" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, we convert different words into a three-dimensional vector. Therefore, we can intuitively display the similarity between different words in a 3D space. For example, the similarity between "student" and "school" is higher than the similarity between "student" and "food".&lt;/p&gt;

&lt;p&gt;Returning to LLMs, the limited context window length is a major challenge. For instance, GPT-3.5 has a context length limit of 4k tokens. This significantly constrains the model's in-context learning ability and hurts the user experience. However, vector search provides an elegant solution to this problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Divide the text that exceeds the context length limit into shorter chunks and convert different chunks into vectors (embeddings).&lt;/li&gt;
&lt;li&gt;Before inputting the prompt to LLM, convert the prompt into a vector (embedding).&lt;/li&gt;
&lt;li&gt;Use the prompt vector to search for the most similar chunk vectors.&lt;/li&gt;
&lt;li&gt;Concatenate the text of the most similar chunks with the prompt as the input to the LLM.&lt;/li&gt;
&lt;/ul&gt;
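&lt;p&gt;The four steps above can be sketched end to end in a few lines of Python. The &lt;code&gt;embed&lt;/code&gt; function below is a toy stand-in so the sketch is self-contained; in practice it would call an embedding model such as &lt;code&gt;text-embedding-ada-002&lt;/code&gt;:&lt;/p&gt;

```python
def embed(text):
    # Toy embedding: a 26-dim letter-frequency vector. A real pipeline would
    # call an embedding model (e.g. OpenAI's text-embedding-ada-002) here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# 1. Split the long text into chunks and embed each chunk.
chunks = [
    "cats are small mammals",
    "postgres stores relational data",
    "vectors encode semantics",
]
chunk_vecs = [embed(c) for c in chunks]

# 2. Convert the prompt into a vector.
prompt = "how does postgres store data"
prompt_vec = embed(prompt)

# 3. Vector search: find the chunk most similar to the prompt.
best = max(range(len(chunks)), key=lambda i: cosine(prompt_vec, chunk_vecs[i]))

# 4. Concatenate the retrieved chunk text with the prompt as the LLM input.
llm_input = chunks[best] + "\n\n" + prompt
```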

&lt;p&gt;This is like giving LLM an external memory, which allows it to search for the most relevant information from this memory. This memory is the ability brought by vector search. If you want to learn more details, you can read these articles (&lt;a href="https://simplicityissota.substack.com/p/what-is-an-embedding-anyways"&gt;Article 1&lt;/a&gt; and &lt;a href="https://betterprogramming.pub/enhancing-chatgpt-with-infinite-external-memory-using-vector-database-and-chatgpt-retrieval-plugin-b6f4ea16ab8"&gt;Article 2&lt;/a&gt;), which explain it more clearly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is vector database so popular?
&lt;/h2&gt;

&lt;p&gt;In LLM, the vector database has become an indispensable part, and one of the most important reasons is its ease of use. After being used in conjunction with OpenAI Embedding models (such as &lt;code&gt;text-embedding-ada-002&lt;/code&gt;), it only takes about ten lines of code to convert a prompt query into a vector and perform the entire process of vector search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="c1"&gt;# Creates embedding vector from user query
&lt;/span&gt;    &lt;span class="n"&gt;embedded_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EMBEDDING_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;near_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedded_query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Queries input schema with vectorized user query
&lt;/span&gt;    &lt;span class="n"&gt;query_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_near_vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;near_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;do&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query_result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For LLMs, vector search mainly serves recall. Simply put, recall means finding the most similar objects in a candidate set; here the candidate set is all chunks, and the goal is the chunk most similar to the prompt. During LLM inference, vector search is the main implementation of recall. It is easy to set up: OpenAI Embedding models solve the most troublesome part, converting text into vectors, and what remains is an independent, clean vector search problem that current vector databases handle well. As a result, the entire workflow is particularly smooth.&lt;/p&gt;

&lt;p&gt;As the name suggests, a vector database is a database designed specifically for the vector data type. Similarity computation over a collection was originally expensive: a brute-force search compares the query against every vector, and comparing all vectors pairwise is an O(n^2) problem. The industry therefore developed Approximate Nearest Neighbor (ANN) algorithms. An ANN algorithm precomputes a vector index inside the database, trading space for time, which greatly speeds up similarity search. This is analogous to an index in a traditional database.&lt;/p&gt;
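&lt;p&gt;The idea of trading space for time can be illustrated with a toy IVF-style index in Python: we spend extra space up front assigning each vector to a bucket around a centroid, so a query only scans one bucket instead of the whole collection. This is a simplified sketch for intuition, not any database's actual implementation:&lt;/p&gt;

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    # Precompute (space for time): assign every vector to its nearest centroid.
    buckets = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
        buckets[c].append(idx)
    return buckets

def ivf_search(query, vectors, centroids, buckets, top_k=1):
    # Query time: only scan the bucket of the nearest centroid,
    # which is why the result is approximate rather than exact.
    c = min(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    return sorted(buckets[c], key=lambda i: sq_dist(vectors[i], query))[:top_k]

vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # real systems learn these with k-means
buckets = build_ivf(vectors, centroids)
print(ivf_search([0.02, 0.0], vectors, centroids, buckets))  # index of the nearest vector
```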

&lt;p&gt;Therefore, vector databases not only have strong performance but also excellent ease of use, making them a perfect match for LLM! (Really?)&lt;/p&gt;

&lt;h2&gt;
  
  
  Perhaps a general-purpose database would be better?
&lt;/h2&gt;

&lt;p&gt;We've talked about the advantages and benefits of vector databases, but what are their limitations? &lt;a href="https://www.singlestore.com/blog/why-your-vector-database-should-not-be-a-vector-database/"&gt;A blog post by SingleStore&lt;/a&gt; provides a good answer to this question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Vectors and vector search are a data type and query processing approach, not a foundation for a new way of processing data. Using a specialty vector database (SVDB) will lead to the usual problems we see (and solve) again and again with our customers who use multiple specialty systems: redundant data, excessive data movement, lack of agreement on data values among distributed components, extra labor expense for specialized skills, extra licensing costs, limited query language power, programmability and extensibility, limited tool integration, and poor data integrity and availability compared with a true DBMS.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are two issues that I think are important. The first is the issue of data consistency. During the prototyping phase, vector databases are very suitable, and ease of use is more important than anything else. However, a vector database is an independent system that is completely decoupled from other data storage systems such as TP databases and AP data lakes. Therefore, data needs to be synchronized, streamed, and processed between multiple systems.&lt;/p&gt;

&lt;p&gt;Imagine if your data is already stored in an OLTP database such as PostgreSQL. To perform vector search using an independent vector database, you need to first extract the data from the database, then convert each data point into a vector using services such as OpenAI Embedding, and then synchronize it to a dedicated vector database. This adds a lot of complexity. Furthermore, if a user deletes a data point in PostgreSQL but it is not deleted in the vector database, then there will be data inconsistency issues. This issue can be very serious in actual production environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Update the embedding column for the documents table&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create an index on the embedding column&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_l2_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Query the similar embeddings&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;openai_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'hello world'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the other hand, if everything is done in a general-purpose database, the user experience may be simpler than with an independent vector database. Vectors are just one data type in a general-purpose database, not an independent system. This way, data consistency is no longer an issue.&lt;/p&gt;

&lt;p&gt;The second issue is with query language. The query language of vector databases is typically designed specifically for vector search, so there may be many limitations in other types of queries. For example, in metadata filtering scenarios, users need to filter based on certain metadata fields. The filtering operators supported by some vector databases are limited.&lt;/p&gt;

&lt;p&gt;In addition, the supported data types for metadata are also very limited, usually only including String, Number, List of Strings, and Booleans. This is not friendly for complex metadata queries.&lt;/p&gt;

&lt;p&gt;If traditional databases can support the vector data type, then the aforementioned issues do not exist. Firstly, data consistency is already taken care of as TP or AP databases are existing infrastructure in production environments. Secondly, the issue of query language no longer exists because vector data type is just one data type in the database, so queries for vector data type can use the native query language of the database, such as SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Detailed explanation
&lt;/h2&gt;

&lt;p&gt;However, it is unfair to only compare the disadvantages of vector databases. There are several counterpoints to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ease of Use&lt;/strong&gt;: Vector databases are designed with ease of use in mind, and users can easily work with them without worrying about the underlying implementation details. However, integrating them with other data storage systems can be a challenge, as mentioned earlier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Vector databases have a significant advantage over traditional databases in terms of performance for certain use cases. Their design for vector search allows for fast and efficient similarity searches on large-scale datasets with high-dimensional vectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Filtering&lt;/strong&gt;: While metadata filtering capabilities in vector databases may be limited, they can still meet the needs of most business scenarios. However, for more complex metadata queries, a hybrid approach may be needed, where metadata is stored in a separate database or data lake and linked to the vector data in the vector database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How can these points be addressed? In the following sections, I will give my perspective on each of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector databases are easy to use
&lt;/h3&gt;

&lt;p&gt;While it is true that vector databases are easy to use, this is not unique to them. The ease of use of vector databases is mainly due to their abstraction of a specific domain, which allows them to be specifically designed for the most commonly used machine learning programming language, Python, and optimized for vector search scenarios. However, if traditional databases could also support the vector data type, they could offer similar ease of use.&lt;/p&gt;

&lt;p&gt;In addition, traditional databases can provide Python SDKs and other integrated tools to meet the needs of most scenarios, as well as standard SQL interfaces to handle more complex query scenarios. Therefore, it is not necessary to use a vector database solely for its ease of use.&lt;/p&gt;

&lt;p&gt;Another advantage of vector databases is their distributed design, which allows them to scale horizontally to meet the data volume and QPS requirements of users. However, traditional databases can also meet these requirements through distributed systems. Nevertheless, the decision to use a distributed system should be based on the actual needs of the data volume and QPS requirements, as well as the associated costs.&lt;/p&gt;

&lt;p&gt;In summary, while vector databases have their advantages, traditional databases can also provide similar ease of use and distributed capabilities if they support the vector data type. Therefore, the choice between a vector database and a traditional database should be based on the specific needs of the application and the available resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector databases have better performance
&lt;/h3&gt;

&lt;p&gt;To investigate the performance of vector databases in LLM scenarios, a &lt;a href="https://www.ethanrosenthal.com/2023/04/10/nn-vs-ann/"&gt;naive benchmark&lt;/a&gt; of vector retrieval was conducted. The benchmark involved N randomly initialized 256-dimensional vectors, and the query time for the top-5 nearest neighbors was measured for different scales of N. Two different methods were used for the test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Numpy was used for &lt;strong&gt;real-time&lt;/strong&gt; calculation: an exact, non-precomputed nearest neighbor search.&lt;/li&gt;
&lt;li&gt;Hnswlib was used to &lt;strong&gt;precompute approximate&lt;/strong&gt; nearest neighbors.&lt;/li&gt;
&lt;/ul&gt;
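&lt;p&gt;The exact-calculation side of this benchmark is easy to reproduce. The following is a minimal sketch of the setup described above; N is reduced here so it runs quickly, and the hnswlib half is omitted to keep the sketch dependency-free beyond numpy:&lt;/p&gt;

```python
import time

import numpy as np

rng = np.random.default_rng(0)
N, dim, top_k = 10_000, 256, 5  # scale N up to 1_000_000 to match the original benchmark

vectors = rng.standard_normal((N, dim), dtype=np.float32)
query = rng.standard_normal(dim, dtype=np.float32)

start = time.perf_counter()
# Exact, non-precomputed nearest neighbors: squared L2 distance to every vector.
dists = np.sum((vectors - query) ** 2, axis=1)
top5 = np.argpartition(dists, top_k)[:top_k]
top5 = top5[np.argsort(dists[top5])]  # order the winners by distance
elapsed = time.perf_counter() - start
print(f"exact top-{top_k} of {N} vectors in {elapsed * 1000:.1f}ms")
```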

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww74e6286spa1wai159u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww74e6286spa1wai159u.png" alt="Image description" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The benchmark results show that, at the scale of 1 million vectors, the delay of real-time calculation using Numpy is approximately 50ms. Using this as a benchmark, we can compare the time spent on LLM inference after completing vector search. For instance, the 7B model requires approximately 10 seconds for inference on 300 Chinese characters on an Nvidia A100 (40GB). Therefore, even if the query time for real-time accurate calculation of the similarity of 1 million vectors using Numpy is considered, it only accounts for 0.5% of the total delay in the end-to-end LLM inference. Thus, in terms of delay, the benefits brought by vector databases may be overshadowed by the delay of LLM itself in the current LLM scenario. Therefore, we need to also consider throughput. The throughput of LLM is much lower than that of vector databases. Thus, I do not believe that throughput is the core issue in this scenario.&lt;/p&gt;
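&lt;p&gt;The 0.5% figure is simple arithmetic on the two latencies quoted above:&lt;/p&gt;

```python
# Latency figures quoted above, taken as given for illustration.
vector_search_s = 0.050  # exact top-5 over 1M 256-dim vectors with Numpy
llm_inference_s = 10.0   # 7B model, ~300 Chinese characters, Nvidia A100 40GB

fraction = vector_search_s / llm_inference_s
print(f"vector search is {fraction:.1%} of LLM inference latency")  # prints 0.5%
```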

&lt;p&gt;If performance is not the primary concern, what factors will determine the user's choice? I think it is the overall ease of use, including ease of use for both usage and operation, consistency, and other solutions to database-related issues. Traditional databases have mature solutions for these problems, while vector databases are still in the early stages of development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metadata filtering can still meet the needs of most business scenarios
&lt;/h3&gt;

&lt;p&gt;When considering metadata filtering, it's important to note that it's not just a matter of the number of supported operators. Consistency of data is also a crucial factor. Metadata in vectors is essentially data in traditional databases, while vectors themselves are indexes of the data. Therefore, it's reasonable to consider storing both vectors and metadata in traditional databases.&lt;/p&gt;

&lt;p&gt;Traditional databases do have the capability to support vector data types and provide similar ease of use and distributed capabilities as vector databases. Furthermore, traditional databases have mature solutions to ensure data consistency and integrity, such as transaction management and data backup and recovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vectors in traditional databases
&lt;/h2&gt;

&lt;p&gt;Since we see vectors as a new data type in traditional databases, let's take a look at how to support vector data types in traditional databases, using PostgreSQL as an example. &lt;a href="https://github.com/pgvector/pgvector"&gt;pgvector&lt;/a&gt; is an open-source PostgreSQL plugin that supports vector data types. pgvector uses exact calculation by default, but it also supports building an IVFFlat index and precomputing ANN results using the IVFFlat algorithm, sacrificing calculation accuracy for performance.&lt;/p&gt;

&lt;p&gt;pgvector has done an excellent job of supporting vectors and is used by products such as &lt;a href="https://supabase.com/blog/openai-embeddings-postgres-vector"&gt;supabase&lt;/a&gt;. However, its index support is limited: only the simplest algorithm, IVFFlat, is implemented, with no quantization or storage optimization. Moreover, pgvector's index is designed for in-memory use and is not disk-friendly. Vector index algorithms designed for disk, such as &lt;a href="https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf"&gt;DiskANN&lt;/a&gt;, are therefore also valuable in the traditional database ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1rSpTWLf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://pbs.twimg.com/media/FF8KN7cXsBAGOLj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1rSpTWLf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://pbs.twimg.com/media/FF8KN7cXsBAGOLj.jpg" alt="" width="400" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extending pgvector can be challenging due to its implementation in the C programming language. Despite being open-source for two years, pgvector currently has only three contributors. While the implementation of pgvector is not particularly complex, it may be worth considering rewriting it in Rust.&lt;/p&gt;

&lt;p&gt;Rewriting pgvector in Rust can enable the code to be organized in a more modern and extensible way. Rust's ecosystem is also very rich, with existing Rust bindings such as &lt;a href="https://github.com/Enet4/faiss-rs"&gt;faiss-rs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzmeih0n6w6wmdt0t1q5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzmeih0n6w6wmdt0t1q5.png" alt="Image description" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a result, &lt;a href="https://github.com/tensorchord/pgvecto.rs"&gt;pgvecto.rs&lt;/a&gt; was created. pgvecto.rs currently supports exact vector query operations and three distance calculation operators. Work is underway to design and implement index support. In addition to IVFFlat, we also hope to support more indexing algorithms such as DiskANN, SPTAG, and ScaNN. We welcome contributions and feedback from the community!&lt;/p&gt;

&lt;p&gt;pgvecto.rs offers a modern and extensible codebase with improved performance and concurrency. Its design and implementation allow seamless integration with other machine learning libraries and tools, making it an ideal choice for similarity search scenarios.&lt;/p&gt;

&lt;p&gt;With ongoing development, pgvecto.rs aims to be a valuable tool for data scientists and machine learning practitioners. Its support for various indexing algorithms and its ease of use make it a promising candidate for large-scale similarity search applications. We look forward to continuing development and contributions from the community.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- call the distance function through operators&lt;/span&gt;

&lt;span class="c1"&gt;-- square Euclidean distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="c1"&gt;-- dot product distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;#&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="c1"&gt;-- cosine distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;-- create table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;[]);&lt;/span&gt;
&lt;span class="c1"&gt;-- insert values&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="c1"&gt;-- query the similar embeddings&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- query the neighbors within a certain distance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
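&lt;p&gt;To make the operator semantics above concrete, here is a small pure-Python sketch of the three distance functions. The helper names are ours, and the sign convention for the dot-product operator is an assumption, not the extension's definition:&lt;/p&gt;

```python
import math

# Pure-Python sketch of the three distances used in the SQL examples above.
# Helper names are ours; the dot-product sign convention is an assumption.

def squared_euclidean(a, b):
    # "&lt;-&gt;" operator: squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))

def neg_dot_product(a, b):
    # "&lt;#&gt;" operator: negated dot product, so smaller means more similar
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # "&lt;=&gt;" operator: 1 minus cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(squared_euclidean([1, 2, 3], [3, 2, 1]))  # 8
```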



&lt;h2&gt;
  
  
  Future
&lt;/h2&gt;

&lt;p&gt;As LLMs gradually move into production environments, infrastructure requirements are becoming increasingly demanding, and vector databases are an important addition to that infrastructure. We do not believe vector databases will replace traditional databases; rather, each will play to its strengths in different scenarios. The rise of vector databases will also push traditional databases to support vector data types.&lt;/p&gt;

&lt;p&gt;We hope that pgvecto.rs can become an important component of the Postgres ecosystem, providing better vector support for Postgres. Its implementation in Rust and support for various indexing algorithms make it a promising candidate for large-scale similarity search applications. We believe that its development and contributions from the community will help it become a valuable tool for data scientists and machine learning practitioners.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Launching ModelZ Beta!</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Wed, 21 Jun 2023 03:21:18 +0000</pubDate>
      <link>https://dev.to/gaocegege/launching-modelz-beta-4gpa</link>
      <guid>https://dev.to/gaocegege/launching-modelz-beta-4gpa</guid>
      <description>&lt;p&gt;We're excited to announce the launch of &lt;a href="https://modelz.ai"&gt;Modelz&lt;/a&gt; beta, a serverless GPU inference platform. Our team has been hard at work building a platform that democratizes access to machine learning, making it easier than ever to build and deploy models for a variety of use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://modelz.ai"&gt;Modelz&lt;/a&gt; is a fully managed platform that provides users with a simple API for deploying their machine learning models. The platform takes care of all the underlying infrastructure, including servers, storage, and networking. This means that users can focus on developing their models and deploying them on the platform without worrying about the underlying infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://modelz.ai"&gt;Modelz&lt;/a&gt; provides the following features out-of-the-box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless&lt;/strong&gt;: The serverless architecture scales your deployment up or down with demand, providing a reliable and scalable way to deploy and prototype machine learning applications at any scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduce cost&lt;/strong&gt;: Pay only for the resources you consume, without any additional charges for idle servers or cold starts. Get 30 free minutes of L4 GPU usage when you join us. Attach a payment method and get an extra 90 minutes free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI compatible API&lt;/strong&gt;: Our platform supports OpenAI compatible API, which means you can easily integrate new open source LLMs into your existing applications with just a few lines of code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping environment&lt;/strong&gt;: We provide a robust prototyping environment with support for Gradio and Streamlit. With our integration with HuggingFace Space, accessing pre-trained models and launching demos is easier than ever, with just one click. This allows you to quickly test and iterate on your models, saving you time and effort in the development process.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;

&lt;p&gt;Getting started with Modelz is easy and straightforward. Here are the quick steps to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.modelz.ai/login"&gt;Sign up for an account&lt;/a&gt; on the website.&lt;/li&gt;
&lt;li&gt;Use the Modelz templates to create the deployment.&lt;/li&gt;
&lt;li&gt;Send requests or visit the UI of the deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---NkZyGtu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/templates.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---NkZyGtu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/templates.png" alt="" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a full workflow example using the Modelz Beta platform and the bloomz 560M template to create an inference deployment.&lt;/p&gt;

&lt;p&gt;After the creation, you will get the detailed information in the UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XFSQJw0N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/detail.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XFSQJw0N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/detail.png" alt="" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard shows the logs, events (e.g. deployment scale-up and scale-down events), and metrics (e.g. total requests, in-flight requests). Besides these, you can find the usage guide there as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b3gY4uyu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/guide.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b3gY4uyu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/guide.png" alt="" width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Bloomz 560M template is powered by &lt;a href="https://github.com/tensorchord/modelz-llm"&gt;modelz-llm&lt;/a&gt;, which provides an OpenAI-compatible API for the model. Thus you can use the OpenAI Python&lt;br&gt;
package to call the model. First, get the endpoint and &lt;a href="https://cloud.modelz.ai/settings"&gt;API key&lt;/a&gt; from the dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://bloomz-webhiq5i9dagphhu.modelz.io&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Use your API Key in modelz.
&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mzi-xxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# create a chat completion
&lt;/span&gt;&lt;span class="n"&gt;chat_completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;any&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
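&lt;p&gt;Under the hood, an OpenAI-compatible API accepts the standard Chat Completions request body. As a rough sketch, the request the snippet above sends looks like this. The &lt;code&gt;/chat/completions&lt;/code&gt; path is assumed from the OpenAI convention, and the endpoint and key are the placeholders shown earlier:&lt;/p&gt;

```python
import json

# Sketch of the HTTP request an OpenAI-compatible endpoint receives.
# The endpoint URL and key are placeholders; the path is assumed from
# the OpenAI convention (the openai package appends it to api_base).
api_base = "https://bloomz-webhiq5i9dagphhu.modelz.io"
api_key = "mzi-xxx"

url = api_base + "/chat/completions"
headers = {
    "Authorization": "Bearer " + api_key,
    "Content-Type": "application/json",
}
payload = {
    # The template serves a single model, so the name appears to be ignored.
    "model": "any",
    "messages": [{"role": "user", "content": "Hello world"}],
}
body = json.dumps(payload)
print(url)
```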



&lt;h2&gt;
  
  
  Serverless
&lt;/h2&gt;

&lt;p&gt;The deployment will scale down to 0 after an idle interval (configured on the creation page). You can see the autoscaling events and metrics in the dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8UbQeNdH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/serverless.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8UbQeNdH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://modelz.ai/blog-images/modelz-beta/serverless.png" alt="" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Community
&lt;/h2&gt;

&lt;p&gt;Modelz is built on top of &lt;a href="https://github.com/tensorchord/envd"&gt;&lt;code&gt;envd&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://github.com/mosecorg/mosec"&gt;mosec&lt;/a&gt;, &lt;a href="https://github.com/tensorchord/modelz-llm"&gt;modelz-llm&lt;/a&gt; and many other open source projects. If you're interested in joining the Modelz community, here are some ways to get involved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join the &lt;a href="https://discord.gg/KqswhpVgdU"&gt;Modelz discord community&lt;/a&gt;: We have a discord community where you can connect with other developers and data scientists, ask questions, and share your knowledge and expertise.&lt;/li&gt;
&lt;li&gt;Contribute to open source projects: Modelz is built on top of &lt;a href="https://github.com/tensorchord/envd"&gt;&lt;code&gt;envd&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://github.com/mosecorg/mosec"&gt;mosec&lt;/a&gt;, &lt;a href="https://github.com/tensorchord/modelz-llm"&gt;modelz-llm&lt;/a&gt; and many other open source projects. If you're interested in contributing to these projects, you can check out their GitHub repositories and start contributing.&lt;/li&gt;
&lt;li&gt;Share your models and projects: If you've built a machine learning model or a project using Modelz, we'd love to hear about it! You can share your projects in our Discord community or on Twitter using the hashtag #Modelz, or mention &lt;a href="https://twitter.com/TensorChord"&gt;@TensorChord&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As we continue to explore the possibilities of AI and its impact on our world, I wish you a great journey in your pursuit of knowledge and innovation. Whether you are just getting started with AI or are an experienced professional, the field of AI offers endless opportunities for growth and discovery.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>beginners</category>
    </item>
    <item>
      <title>A command-line tool to create development environments for AI/ML, based on buildkit</title>
      <dc:creator>Ce Gao</dc:creator>
      <pubDate>Mon, 12 Sep 2022 01:32:50 +0000</pubDate>
      <link>https://dev.to/gaocegege/a-command-line-tool-to-create-development-environments-for-aiml-based-on-buildkit-21ie</link>
      <guid>https://dev.to/gaocegege/a-command-line-tool-to-create-development-environments-for-aiml-based-on-buildkit-21ie</guid>
      <description>&lt;h2&gt;
  
  
  What is envd?
&lt;/h2&gt;

&lt;p&gt;envd (&lt;code&gt;ɪnˈvdɪ&lt;/code&gt;) is a command-line tool that helps you create container-based development environments for AI/ML.&lt;/p&gt;

&lt;p&gt;Development environments are full of Python and system dependencies, CUDA, Bash scripts, Dockerfiles, SSH configurations, Kubernetes YAMLs, and many other clunky things that are always breaking. envd solves this problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Declare the list of dependencies (CUDA, python packages, your favorite IDE, and so on) in &lt;code&gt;build.envd&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Simply run &lt;code&gt;envd up&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Develop in the isolated environment.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyg20mnnnpt2tmll3yh4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyg20mnnnpt2tmll3yh4.gif" alt="Image description" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use &lt;code&gt;envd&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;Environments built with &lt;code&gt;envd&lt;/code&gt; provide the following features out-of-the-box:&lt;/p&gt;

&lt;p&gt;❤️ &lt;strong&gt;Knowledge reuse in your team&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;envd&lt;/code&gt; build functions can be reused. Use the &lt;code&gt;include&lt;/code&gt; function to import any git repository. No more copying and pasting Dockerfile instructions; reuse them instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;envdlib&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;include&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://github.com/tensorchord/envdlib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ubuntu20.04&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;envdlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensorboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8888&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⏱️ &lt;strong&gt;BuildKit native, build up to 6x faster&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/moby/buildkit"&gt;Buildkit&lt;/a&gt; supports parallel builds and software cache (e.g. pip index cache and apt cache). You can enjoy the benefits without knowledge of it.&lt;/p&gt;

&lt;p&gt;For example, the PyPI cache is shared across builds and thus the package will be cached if it has been downloaded before.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YkIvNJvD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188601795-8c37f5a3-b13b-422b-816f-8a0c51f1f8b1.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YkIvNJvD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188601795-8c37f5a3-b13b-422b-816f-8a0c51f1f8b1.svg" width="800" height="600"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;🐍 &lt;strong&gt;One configuration to rule them all&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Development environments are full of Dockerfiles, bash scripts, Kubernetes YAML manifests, and many other clunky files that are always breaking. With &lt;code&gt;envd&lt;/code&gt; you need just one configuration file, &lt;code&gt;build.envd&lt;/code&gt;&lt;sup id="fnref1"&gt;1&lt;/sup&gt;; it works for both local Docker and Kubernetes clusters in the cloud.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wIrinPzF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188821980-dcbd9069-b504-436a-9ffd-05ac5543a6d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wIrinPzF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/5100735/188821980-dcbd9069-b504-436a-9ffd-05ac5543a6d1.png" alt="envd" width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✍️ &lt;strong&gt;Don't sacrifice your developer experience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SSH is configured for the created environment. You can use vscode-remote, jupyter, pycharm or any other IDE that you love. Besides this, declare the IDE extensions you want and let &lt;code&gt;envd&lt;/code&gt; take care of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vscode_extensions&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ms-python.python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;☁️ &lt;strong&gt;No polluted environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Are you working on multiple projects, all of which need different versions of CUDA? &lt;code&gt;envd&lt;/code&gt; helps you create isolated and clean environments. &lt;/p&gt;

&lt;h2&gt;
  
  
  Who should use envd?
&lt;/h2&gt;

&lt;p&gt;We’re focused on helping data scientists and teams that develop AI/ML models, who may suffer from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building the development environments with Python/R/Julia, CUDA, Docker, SSH, and so on. Do you have a complicated Dockerfile or build script that sets up all your dev environments, but is always breaking?&lt;/li&gt;
&lt;li&gt;Updating the environment. Do you always need to ask infrastructure engineers how to add a new Python/R/Julia package in the Dockerfile?&lt;/li&gt;
&lt;li&gt;Managing environments and machines. Do you always forget which machines are used for the specific project, because you handle multiple projects concurrently?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started 🚀
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docker (20.10.0 or above)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install and bootstrap &lt;code&gt;envd&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;envd&lt;/code&gt; can be installed with &lt;code&gt;pip&lt;/code&gt; (Python 3 only). After the installation, please run &lt;code&gt;envd bootstrap&lt;/code&gt; to bootstrap.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--pre&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; envd
envd bootstrap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create an &lt;code&gt;envd&lt;/code&gt; environment
&lt;/h3&gt;

&lt;p&gt;Please clone the &lt;a href="https://github.com/tensorchord/envd-quick-start"&gt;&lt;code&gt;envd-quick-start&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/tensorchord/envd-quick-start.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The build manifest &lt;code&gt;build.envd&lt;/code&gt; looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ubuntu20.04&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Configure the pip index if needed.
&lt;/span&gt;    &lt;span class="c1"&gt;# config.pip_index(url = "https://pypi.tuna.tsinghua.edu.cn/simple")
&lt;/span&gt;    &lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;python_packages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numpy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zsh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note that we use Python here as an example but please check out examples for other languages such as R and Julia &lt;a href="https://github.com/tensorchord/envd/tree/main/examples"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then please run the command below to set up a new environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd envd-quick-start &amp;amp;&amp;amp; envd up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cd envd-quick-start &amp;amp;&amp;amp; envd up
[+] ⌚ parse build.envd and download/cache dependencies 2.8s ✅ (finished)
 =&amp;gt; download oh-my-zsh                                                    2.8s
[+] 🐋 build envd environment 18.3s (25/25) ✅ (finished)
 =&amp;gt; create apt source dir                                                 0.0s
 =&amp;gt; local://cache-dir                                                     0.1s
 =&amp;gt; =&amp;gt; transferring cache-dir: 5.12MB                                     0.1s
...
 =&amp;gt; pip install numpy                                                    13.0s
 =&amp;gt; copy /oh-my-zsh /home/envd/.oh-my-zsh                                 0.1s
 =&amp;gt; mkfile /home/envd/install.sh                                          0.0s
 =&amp;gt; install oh-my-zsh                                                     0.1s
 =&amp;gt; mkfile /home/envd/.zshrc                                              0.0s
 =&amp;gt; install shell                                                         0.0s
 =&amp;gt; install PyPI packages                                                 0.0s
 =&amp;gt; merging all components into one                                       0.3s
 =&amp;gt; =&amp;gt; merging                                                            0.3s
 =&amp;gt; mkfile /home/envd/.gitconfig                                          0.0s
 =&amp;gt; exporting to oci image format                                         2.4s
 =&amp;gt; =&amp;gt; exporting layers                                                   2.0s
 =&amp;gt; =&amp;gt; exporting manifest sha256:7dbe9494d2a7a39af16d514b997a5a8f08b637f  0.0s
 =&amp;gt; =&amp;gt; exporting config sha256:1da06b907d53cf8a7312c138c3221e590dedc2717  0.0s
 =&amp;gt; =&amp;gt; sending tarball                                                    0.4s
envd-quick-start via Py v3.9.13 via 🅒 envd 
⬢ [envd]❯ # You are in the container-based environment!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set up Jupyter notebook
&lt;/h3&gt;

&lt;p&gt;Please edit the &lt;code&gt;build.envd&lt;/code&gt; to enable jupyter notebook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ubuntu20.04&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Configure the pip index if needed.
&lt;/span&gt;    &lt;span class="c1"&gt;# config.pip_index(url = "https://pypi.tuna.tsinghua.edu.cn/simple")
&lt;/span&gt;    &lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;python_packages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numpy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zsh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jupyter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get the endpoint of the running Jupyter notebook via &lt;code&gt;envd envs ls&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;envd up &lt;span class="nt"&gt;--detach&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;envd envs &lt;span class="nb"&gt;ls
&lt;/span&gt;NAME                    JUPYTER                 SSH TARGET              CONTEXT                                 IMAGE                   GPU     CUDA    CUDNN   STATUS          CONTAINER ID
envd-quick-start        http://localhost:42779   envd-quick-start.envd   /home/gaocegege/code/envd-quick-start   envd-quick-start:dev    &lt;span class="nb"&gt;false&lt;/span&gt;   &amp;lt;none&amp;gt;  &amp;lt;none&amp;gt;  Up 54 seconds   bd3f6a729e94
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  More on documentation 📝
&lt;/h2&gt;

&lt;p&gt;See &lt;a href="https://envd.tensorchord.ai/guide/getting-started.html"&gt;envd documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap 🗂️
&lt;/h2&gt;

&lt;p&gt;Please check out the &lt;a href="https://envd.tensorchord.ai/community/roadmap.html"&gt;ROADMAP&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contribute 😊
&lt;/h2&gt;

&lt;p&gt;We welcome all kinds of contributions from the open-source community, individuals, and partners.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join our &lt;a href="https://discord.gg/KqswhpVgdU"&gt;Discord community&lt;/a&gt;!&lt;/li&gt;
&lt;li&gt;To build from the source, please read our &lt;a href="https://envd.tensorchord.ai/community/contributing.html"&gt;contributing documentation&lt;/a&gt; and &lt;a href="https://envd.tensorchord.ai/community/development.html"&gt;development tutorial&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://gitpod.io/#https://github.com/tensorchord/envd"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RwAleoJm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gitpod.io/button/open-in-gitpod.svg" alt="Open in Gitpod" width="138" height="32"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The build language is &lt;a href="https://docs.bazel.build/versions/main/skylark/language.html"&gt;Starlark&lt;/a&gt;, a dialect of Python. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;
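
One practical consequence of the footnote above: because starlark is syntactically a dialect of Python, a `build.envd` file is also valid Python syntax. A minimal sketch (plain Python, standard library only) that parses the quick-start `build.envd` with the `ast` module and lists its top-level functions:

```python
import ast

# build.envd is written in starlark, whose syntax is a subset of
# Python's, so Python's own ast module can parse it.
build_envd = '''
def build():
    base(os="ubuntu20.04", language="python3")
    install.python_packages(name=["numpy"])
    shell("zsh")
    config.jupyter()
'''

tree = ast.parse(build_envd)
# Collect the names of all top-level function definitions.
funcs = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
print(funcs)  # -> ['build']
```

This is only an illustration of the shared syntax; envd evaluates the file with a starlark interpreter, not the Python runtime, so Python-only features are not available inside `build.envd`.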

</description>
      <category>machinelearning</category>
      <category>docker</category>
      <category>showdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
