<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Myriel</title>
    <description>The latest articles on DEV Community by David Myriel (@davidmyriel).</description>
    <link>https://dev.to/davidmyriel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1279937%2F7ed63d14-11bc-4b8b-abdb-eda2eead7c94.png</url>
      <title>DEV Community: David Myriel</title>
      <link>https://dev.to/davidmyriel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/davidmyriel"/>
    <language>en</language>
    <item>
      <title>Blog-Reading Chatbot with GPT-4o</title>
      <dc:creator>David Myriel</dc:creator>
      <pubDate>Tue, 14 May 2024 16:31:49 +0000</pubDate>
      <link>https://dev.to/qdrant/blog-reading-chatbot-with-gpt-4o-2kha</link>
      <guid>https://dev.to/qdrant/blog-reading-chatbot-with-gpt-4o-2kha</guid>
      <description>&lt;p&gt;In this tutorial, you will build a RAG system that combines blog content ingestion with the capabilities of semantic search. &lt;strong&gt;OpenAI’s GPT-4o LLM&lt;/strong&gt; is powerful, but scaling its use requires us to supply context systematically.&lt;/p&gt;

&lt;p&gt;RAG enhances the LLM’s generation of answers by retrieving relevant documents to aid the question-answering process. This setup showcases the integration of advanced search and AI language processing to improve information retrieval and generation tasks.&lt;/p&gt;

&lt;p&gt;A notebook for this tutorial is available on &lt;a href="https://github.com/qdrant/examples/blob/master/langchain-lcel-rag/Langchain-LCEL-RAG-Demo.ipynb" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Privacy and Sovereignty:&lt;/strong&gt; RAG applications often rely on sensitive or proprietary internal data. Running the entire stack within your own environment becomes crucial for maintaining control over this data. Qdrant Hybrid Cloud deployed on &lt;a href="https://www.scaleway.com/en/" rel="noopener noreferrer"&gt;Scaleway&lt;/a&gt; addresses this need perfectly, offering a secure, scalable platform that still leverages the full potential of RAG. Scaleway offers serverless &lt;a href="https://www.scaleway.com/en/serverless-functions/" rel="noopener noreferrer"&gt;Functions&lt;/a&gt; and serverless &lt;a href="https://www.scaleway.com/en/serverless-jobs/" rel="noopener noreferrer"&gt;Jobs&lt;/a&gt;, both of which are ideal for embedding creation in large-scale RAG cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Components
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Host:&lt;/strong&gt; Scaleway on managed Kubernetes for compatibility with Qdrant Hybrid Cloud.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Database:&lt;/strong&gt; Qdrant Hybrid Cloud as the vector search engine for retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM:&lt;/strong&gt; GPT-4o, developed by OpenAI, is used as the generator for producing answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework:&lt;/strong&gt; &lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; for extensive RAG capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7hhu9s71ye9gabatk6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7hhu9s71ye9gabatk6f.png" alt="Architecture diagram" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&amp;gt; LangChain &lt;a href="https://python.langchain.com/v0.1/docs/integrations/chat/" rel="noopener noreferrer"&gt;supports a wide range of LLMs&lt;/a&gt;, and GPT-4o is used as the main generator in this tutorial. You can easily swap it out for a model hosted on your own premises to complete a fully private setup. For the sake of simplicity, we used the OpenAI APIs, but LangChain makes the transition seamless.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying Qdrant Hybrid Cloud on Scaleway
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.scaleway.com/en/kubernetes-kapsule/" rel="noopener noreferrer"&gt;Scaleway Kapsule&lt;/a&gt; and &lt;a href="https://www.scaleway.com/en/kubernetes-kosmos/" rel="noopener noreferrer"&gt;Kosmos&lt;/a&gt; are managed Kubernetes services from Scaleway. They abstract away the complexities of managing and operating a Kubernetes cluster. &lt;/p&gt;

&lt;p&gt;The primary difference is that Kapsule clusters are composed solely of Scaleway Instances, whereas a Kosmos cluster is a managed multi-cloud Kubernetes engine that lets you connect instances from any cloud provider to a single managed control plane.&lt;/p&gt;

&lt;p&gt;To start using managed Kubernetes on Scaleway, follow the &lt;a href="https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#scaleway" rel="noopener noreferrer"&gt;platform-specific documentation.&lt;/a&gt;&lt;br&gt;
Once your Kubernetes clusters are up, &lt;a href="https://qdrant.tech/documentation/hybrid-cloud/" rel="noopener noreferrer"&gt;you can begin deploying Qdrant Hybrid Cloud.&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;To prepare the environment for working with Qdrant and related libraries, it’s necessary to install all required Python packages. This can be done using Poetry, a tool for dependency management and packaging in Python. The code snippet imports various libraries essential for the tasks ahead, including bs4 for parsing HTML and XML documents, langchain and its community extensions for working with language models and document loaders, and Qdrant for vector storage and retrieval. These imports lay the groundwork for utilizing Qdrant alongside other tools for natural language processing and machine learning tasks.&lt;/p&gt;

&lt;p&gt;Qdrant will be running on a specific URL and access will be restricted by the API key. Make sure to store them both as environment variables as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export QDRANT_URL="https://qdrant.example.com"
export QDRANT_API_KEY="your-api-key"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
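&lt;p&gt;In Python, you can then read these values back with &lt;code&gt;os.environ&lt;/code&gt;. A small sketch (the helper name is ours, not part of any library) that fails fast with a clear message instead of a confusing connection error later:&lt;/p&gt;

```python
import os

def require_env(name: str) -> str:
    """Return the named environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value

# Usage, assuming the variables exported above:
#   qdrant_url = require_env("QDRANT_URL")
#   qdrant_api_key = require_env("QDRANT_API_KEY")
```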



&lt;p&gt;Optional: Whenever you use LangChain, you can also &lt;a href="https://docs.smith.langchain.com/" rel="noopener noreferrer"&gt;configure LangSmith&lt;/a&gt;, which will help us trace, monitor and debug LangChain applications. You can sign up for LangSmith &lt;a href="https://smith.langchain.com/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="your-api-key"
export LANGCHAIN_PROJECT="your-project"  # if not specified, defaults to "default"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Now you can get started:

import getpass
import os

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_qdrant import Qdrant
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up the OpenAI API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;os.environ["OPENAI_API_KEY"] = getpass.getpass()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the language model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
llm = ChatOpenAI(model="gpt-4o")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where we configure both the embeddings model and the LLM. You can replace them with your own models using Ollama or other services; Scaleway's L4 GPU Instances are a solid compute option for self-hosting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Download and parse data
&lt;/h2&gt;

&lt;p&gt;To begin working with blog post contents, load and parse the HTML. This is done with &lt;code&gt;WebBaseLoader&lt;/code&gt;, which parses pages using &lt;code&gt;BeautifulSoup&lt;/code&gt;. After the content is loaded and parsed, it is indexed using Qdrant. The code snippet below loads, chunks, and indexes the contents of a blog post by specifying the blog URL and the HTML elements to parse. This step prepares the data for further processing and analysis with Qdrant.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Chunking data
&lt;/h3&gt;

&lt;p&gt;When dealing with large documents, such as a blog post exceeding 42,000 characters, it’s crucial to manage the data efficiently for processing. Many models have a limited context window and struggle with long inputs, making it difficult to extract or find relevant information. To overcome this, the document is divided into smaller chunks. This approach enhances the model’s ability to process and retrieve the most pertinent sections of the document effectively.&lt;/p&gt;
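&lt;p&gt;To build intuition, fixed-size chunking with overlap can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's actual splitter, which recursively splits on separators before falling back to character counts:&lt;/p&gt;

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share an overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 2500)
# A 2,500-character text yields 4 chunks; adjacent chunks share 200 characters,
# so information near a boundary is never lost between chunks.
```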

&lt;p&gt;In this scenario, the document is split into chunks using the &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; with a specified chunk size and overlap. The overlap ensures that no critical information is lost at chunk boundaries. Following the splitting, these chunks are indexed into Qdrant—a vector database for efficient similarity search and storage of embeddings. The &lt;code&gt;Qdrant.from_documents&lt;/code&gt; function is used for indexing, with the split chunks as documents and embeddings generated through &lt;code&gt;OpenAIEmbeddings&lt;/code&gt;. Everything runs against an in-memory Qdrant instance, meaning no persistent storage is needed, and the collection is named “lilianweng” for reference.&lt;/p&gt;

&lt;p&gt;This chunking and indexing strategy significantly improves the management and retrieval of information from large documents, making it a practical solution for handling extensive texts in data processing workflows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

vectorstore = Qdrant.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(), location=":memory:", collection_name="lilianweng"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Retrieve and generate content
&lt;/h2&gt;

&lt;p&gt;The vectorstore is used as a retriever to fetch relevant documents based on vector similarity. The &lt;code&gt;hub.pull("rlm/rag-prompt")&lt;/code&gt; call pulls a prompt from the LangChain Hub that is designed to combine the retrieved documents and a question to generate a response.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;format_docs&lt;/code&gt; function formats the retrieved documents into a single string, preparing them for further processing. This formatted string, along with a question, is passed through a chain of operations. First, the context (formatted documents) and the question are processed by the retriever and the prompt. Then, the result is fed into a large language model (&lt;code&gt;llm&lt;/code&gt;) for content generation. Finally, the output is parsed into a string using &lt;code&gt;StrOutputParser()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This chain of operations demonstrates a sophisticated approach to information retrieval and content generation, leveraging both the semantic understanding capabilities of vector search and the generative prowess of large language models.&lt;/p&gt;

&lt;p&gt;Now, retrieve and generate data using relevant snippets from the blog:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Invoking the RAG Chain
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rag_chain.invoke("What is Task Decomposition?")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;We built a solid foundation for a simple chatbot, but there is still a lot to do. To make the system production-ready, consider integrating the mechanism into your existing stack.&lt;/p&gt;

&lt;p&gt;Our vector database can easily be hosted on &lt;a href="https://www.scaleway.com/" rel="noopener noreferrer"&gt;Scaleway&lt;/a&gt;, our trusted &lt;a href="https://qdrant.tech/documentation/hybrid-cloud/" rel="noopener noreferrer"&gt;Qdrant Hybrid Cloud&lt;/a&gt; partner. This means that Qdrant can run from your Scaleway region, while the database itself is still managed from within Qdrant Cloud’s interface. Both products have been tested for compatibility and scalability, and we recommend their &lt;a href="https://www.scaleway.com/en/kubernetes-kapsule/" rel="noopener noreferrer"&gt;managed Kubernetes&lt;/a&gt; service. Their deployment regions in France are excellent for network latency and data sovereignty. For hosted GPUs, try their &lt;a href="https://www.scaleway.com/en/l4-gpu-instance/" rel="noopener noreferrer"&gt;L4 GPU Instances&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have any questions, feel free to ask in our Discord community.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Qdrant 1.8.0 - Major Performance Enhancements</title>
      <dc:creator>David Myriel</dc:creator>
      <pubDate>Fri, 08 Mar 2024 16:41:12 +0000</pubDate>
      <link>https://dev.to/qdrant/qdrant-180-major-performance-enhancements-384o</link>
      <guid>https://dev.to/qdrant/qdrant-180-major-performance-enhancements-384o</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.8.0" rel="noopener noreferrer"&gt;Qdrant 1.8.0 is out!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This time around, we have focused on Qdrant's internals. Our goal was to optimize performance, so that your existing setup can run faster and save on compute. Here is what we've been up to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faster sparse vectors:&lt;/strong&gt; Hybrid search is up to 16x faster now!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU resource management:&lt;/strong&gt; You can allocate CPU threads for faster indexing. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better indexing performance:&lt;/strong&gt; We optimized text indexing on the backend.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Faster search with sparse vectors
&lt;/h2&gt;

&lt;p&gt;Search throughput is now up to 16 times faster for sparse vectors. If you are &lt;a href="https://qdrant.tech/articles/sparse-vectors/"&gt;using Qdrant for hybrid search&lt;/a&gt;, this means you can now handle up to sixteen times as many queries. This improvement comes from extensive backend optimizations aimed at increasing efficiency and capacity. &lt;/p&gt;

&lt;p&gt;What this means for your setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query speed:&lt;/strong&gt; The time it takes to run a search query has been significantly reduced. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search capacity:&lt;/strong&gt; Qdrant can now handle a much larger volume of search requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User experience:&lt;/strong&gt; Results will appear faster, leading to a smoother experience for the user.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; You can easily accommodate a rapidly growing user base or an expanding dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sparse vectors benchmark
&lt;/h3&gt;

&lt;p&gt;Performance results are publicly available for you to test. Qdrant's R&amp;amp;D developed a dedicated &lt;a href="https://github.com/qdrant/sparse-vectors-benchmark" rel="noopener noreferrer"&gt;open-source benchmarking tool&lt;/a&gt; just to test sparse vector performance.&lt;/p&gt;

&lt;p&gt;A real-life simulation of sparse vector queries was run against the &lt;a href="https://big-ann-benchmarks.com/neurips23.html" rel="noopener noreferrer"&gt;NeurIPS 2023 dataset&lt;/a&gt;. All tests were done on an 8 CPU machine on Azure. &lt;/p&gt;

&lt;p&gt;Latency (y-axis) has dropped significantly for queries. You can see the before/after here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kgj248oqlxs4i4wvvmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kgj248oqlxs4i4wvvmy.png" alt="dropping latency" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; Dropping latency in sparse vector search queries across versions 1.7-1.8.&lt;/p&gt;

&lt;p&gt;The colors within both scatter plots show the frequency of results. The red dots show that the highest concentration is around 2200ms (before) and 135ms (after). This tells us that latency for sparse vector queries dropped by roughly a factor of 16, so the time it takes to retrieve an answer with Qdrant is that much shorter. &lt;/p&gt;

&lt;p&gt;This performance increase can have a dramatic effect on hybrid search implementations. &lt;a href="https://qdrant.tech/articles/sparse-vectors/"&gt;Read more about how to set this up.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;FYI, sparse vectors were released in &lt;a href="https://qdrant.tech/articles/qdrant-1.7.x/#sparse-vectors"&gt;Qdrant v1.7.0&lt;/a&gt;. They are stored using a separate index, so first &lt;a href="https://qdrant.tech/documentation/concepts/indexing/#sparse-vector-index"&gt;check out the documentation&lt;/a&gt; if you want to try an implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  CPU resource management
&lt;/h2&gt;

&lt;p&gt;Indexing is Qdrant’s most resource-intensive process. Now you can account for this by allocating compute specifically to indexing. You can assign a number of CPU threads to indexing and leave the rest for search. As a result, indexes will build faster, and search quality will remain unaffected.&lt;/p&gt;

&lt;p&gt;This isn't mandatory, as Qdrant is by default tuned to strike the right balance between indexing and search. However, if you wish to define specific CPU usage, you will need to do so from &lt;code&gt;config.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This version introduces an &lt;code&gt;optimizer_cpu_budget&lt;/code&gt; parameter to control the maximum number of CPUs used for indexing. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Read more about &lt;code&gt;config.yaml&lt;/code&gt; in the &lt;a href="https://qdrant.tech/documentation/guides/configuration/"&gt;configuration guide&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# CPU budget, how many CPUs (threads) to allocate for an optimization job.&lt;/span&gt;
&lt;span class="na"&gt;optimizer_cpu_budget&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If left at 0, Qdrant will keep one or more CPUs unallocated, depending on the machine's CPU count.&lt;/li&gt;
&lt;li&gt;If the setting is positive, Qdrant will use this exact number of CPUs for indexing.&lt;/li&gt;
&lt;li&gt;If the setting is negative, Qdrant will subtract this number of CPUs from the available CPUs for indexing.&lt;/li&gt;
&lt;/ul&gt;
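&lt;p&gt;The three cases above can be sketched as a small Python helper. This is an illustration of the rules, not Qdrant's actual scheduling logic:&lt;/p&gt;

```python
def indexing_cpus(cpu_budget: int, available_cpus: int) -> int:
    """Approximate how many CPUs indexing gets under optimizer_cpu_budget
    (illustrative only; Qdrant's real heuristic may differ)."""
    if cpu_budget > 0:
        # Positive: use exactly this many CPUs (capped by what exists).
        return min(cpu_budget, available_cpus)
    if cpu_budget < 0:
        # Negative: subtract |budget| CPUs from the available CPUs.
        return max(available_cpus + cpu_budget, 1)
    # 0 (default): keep one or more CPUs unallocated for search.
    return max(available_cpus - 1, 1)
```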

&lt;p&gt;For most users, the default &lt;code&gt;optimizer_cpu_budget&lt;/code&gt; setting will work well. We only recommend changing it if your indexing load is significant.&lt;/p&gt;

&lt;p&gt;Our backend leverages dynamic CPU saturation to increase indexing speed, so the impact on search query performance ends up being minimal. Ultimately, you will be able to strike the best possible balance between indexing times and search performance.&lt;/p&gt;

&lt;p&gt;This configuration can be done at any time, but it requires a restart of Qdrant. Changing it affects both existing and new collections. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This feature is not configurable on &lt;a href="https://qdrant.to/cloud" rel="noopener noreferrer"&gt;Qdrant Cloud&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Better indexing for text data
&lt;/h2&gt;

&lt;p&gt;In order to minimize your RAM expenditure, we have developed a new way to index specific types of data. Please keep in mind that this is a backend improvement, and you won't need to configure anything. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Going forward, if you are indexing immutable text fields, we estimate a 10% reduction in RAM loads. Our benchmark result is based on a system that uses 64GB of RAM. If you are using less RAM, this reduction might be higher than 10%.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Immutable text fields are static and do not change once they are added to Qdrant. These entries usually represent some type of attribute, description, or tag. Vectors associated with them can be indexed more efficiently, since you don’t need to re-index them anymore. Conversely, mutable fields are dynamic and can be modified after their initial creation; keep in mind that they will continue to require additional RAM.&lt;/p&gt;

&lt;p&gt;This approach ensures stability in the vector search index, with faster and more consistent operations. We achieved this by setting up a field index which helps minimize what is stored. To improve search performance we have also optimized the way we load documents for searches with a text field index. Now our backend loads documents mostly sequentially and in increasing order.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minor improvements and new features
&lt;/h2&gt;

&lt;p&gt;Beyond these enhancements, &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.8.0" rel="noopener noreferrer"&gt;Qdrant v1.8.0&lt;/a&gt; adds and improves on several smaller features:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Order points by payload:&lt;/strong&gt; In addition to searching for semantic results, you might want to retrieve results by specific metadata (such as price). You can now use the Scroll API to &lt;a href="https://qdrant.tech/documentation/concepts/points/#order-points-by-payload-key"&gt;order points by payload key&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Datetime support:&lt;/strong&gt; We have implemented &lt;a href="https://qdrant.tech/documentation/concepts/filtering/#datetime-range"&gt;datetime support for the payload index&lt;/a&gt;. Prior to this, if you wanted to search for a specific datetime range, you would have had to convert dates to UNIX timestamps. (&lt;a href="https://github.com/qdrant/qdrant/issues/3320" rel="noopener noreferrer"&gt;PR#3320&lt;/a&gt;) &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check collection existence:&lt;/strong&gt; You can check whether a collection exists via the &lt;code&gt;/collections/{collection_name}/exists&lt;/code&gt; endpoint, which returns a true/false response. (&lt;a href="https://github.com/qdrant/qdrant/pull/3472" rel="noopener noreferrer"&gt;PR#3472&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find points&lt;/strong&gt; whose payloads match more than a minimum number of conditions: we added the &lt;code&gt;min_should&lt;/code&gt; match feature for conditions (&lt;a href="https://github.com/qdrant/qdrant/pull/3466/" rel="noopener noreferrer"&gt;PR#3331&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modify nested fields:&lt;/strong&gt; We have improved the &lt;code&gt;set_payload&lt;/code&gt; API, adding the ability to update nested fields (&lt;a href="https://github.com/qdrant/qdrant/pull/3548" rel="noopener noreferrer"&gt;PR#3548&lt;/a&gt;).&lt;/li&gt;
&lt;/ol&gt;
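&lt;p&gt;For example, the collection-existence check from item 3 can be called with nothing but the Python standard library. A sketch assuming an unsecured local instance and the usual Qdrant JSON envelope; the helper names are ours:&lt;/p&gt;

```python
import json
import urllib.request

def parse_exists(payload: bytes) -> bool:
    # Qdrant wraps responses as {"result": {"exists": true/false}, ...}
    return json.loads(payload)["result"]["exists"]

def collection_exists(base_url: str, collection: str) -> bool:
    url = f"{base_url}/collections/{collection}/exists"
    with urllib.request.urlopen(url) as resp:  # add an api-key header if secured
        return parse_exists(resp.read())

# Usage against a local instance:
#   collection_exists("http://localhost:6333", "lilianweng")
```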

&lt;h2&gt;
  
  
  Release notes
&lt;/h2&gt;

&lt;p&gt;For more information, see &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.8.0" rel="noopener noreferrer"&gt;our release notes&lt;/a&gt;. &lt;br&gt;
Qdrant is an open source project. We welcome your contributions; raise &lt;a href="https://github.com/qdrant/qdrant/issues" rel="noopener noreferrer"&gt;issues&lt;/a&gt;, or contribute via &lt;a href="https://github.com/qdrant/qdrant/pulls" rel="noopener noreferrer"&gt;pull requests&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>devops</category>
      <category>news</category>
      <category>database</category>
    </item>
    <item>
      <title>RAG is Dead. Long Live RAG!</title>
      <dc:creator>David Myriel</dc:creator>
      <pubDate>Wed, 28 Feb 2024 21:44:14 +0000</pubDate>
      <link>https://dev.to/qdrant/rag-is-dead-long-live-rag-5k8</link>
      <guid>https://dev.to/qdrant/rag-is-dead-long-live-rag-5k8</guid>
      <description>&lt;p&gt;When Anthropic came out with a context window of 100K tokens, they said: “&lt;em&gt;Vector search is dead. LLMs are getting more accurate and won’t need RAG anymore.&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;Google’s Gemini 1.5 now offers a context window of 10 million tokens. &lt;a href="https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf" rel="noopener noreferrer"&gt;Their supporting paper&lt;/a&gt; claims victory over accuracy issues, even when applying Greg Kamradt’s &lt;a href="https://twitter.com/GregKamradt/status/1722386725635580292" rel="noopener noreferrer"&gt;NIAH methodology&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;It’s over. RAG must be completely obsolete now. Right?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;Larger context windows are never the solution. Let me repeat. Never. They require more computational resources and lead to slower processing times. &lt;/p&gt;

&lt;p&gt;The community is already stress testing Gemini 1.5: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftait4tne3qsw6yl3hzch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftait4tne3qsw6yl3hzch.png" alt="rag-is-dead-1.png" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not surprising. LLMs require massive amounts of compute and memory to run. To cite Grant, running such a model by itself “would deplete a small coal mine to generate each completion”. Also, who is waiting 30 seconds for a response?&lt;/p&gt;

&lt;h2&gt;
  
  
  Context stuffing is not the solution
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Relying on context is expensive, and it doesn’t improve response quality in real-world applications. Retrieval based on vector search offers much higher precision.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you solely rely on an LLM to perfect retrieval and precision, you are doing it wrong. &lt;/p&gt;

&lt;p&gt;A large context window makes it harder to focus on relevant information. This increases the risk of errors or hallucinations in its responses. &lt;/p&gt;

&lt;p&gt;Google found Gemini 1.5 significantly more accurate than GPT-4 at shorter context lengths and “a very small decrease in recall towards 1M tokens”. The recall is still below 0.8.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefu8rcww1dvr3hakh4pm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefu8rcww1dvr3hakh4pm.png" alt="rag-is-dead-2" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We don’t think 60-80% is good enough. The LLM might retrieve enough relevant facts in its context window, but it still loses up to 40% of the available information.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The whole point of vector search is to circumvent this process by efficiently picking the information your app needs to generate the best response. A vector database keeps the compute load low and the query response fast. You don’t need to wait for the LLM at all.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Qdrant’s benchmark results are strongly in favor of accuracy and efficiency. We recommend that you consider them before deciding that an LLM is enough. Take a look at our &lt;a href="https://qdrant.tech/benchmarks/" rel="noopener noreferrer"&gt;open-source benchmark reports&lt;/a&gt; and &lt;a href="https://github.com/qdrant/vector-db-benchmark" rel="noopener noreferrer"&gt;try out the tests&lt;/a&gt; yourself. &lt;/p&gt;

&lt;h2&gt;
  
  
  Vector search in compound systems
&lt;/h2&gt;

&lt;p&gt;The future of AI lies in careful system engineering. As per &lt;a href="https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/" rel="noopener noreferrer"&gt;Zaharia et al.&lt;/a&gt;, results from Databricks find that “60% of LLM applications use some form of RAG, while 30% use multi-step chains.” &lt;/p&gt;

&lt;p&gt;Even Gemini 1.5 demonstrates the need for a complex strategy. When looking at &lt;a href="https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf" rel="noopener noreferrer"&gt;Google’s MMLU Benchmark&lt;/a&gt;, the model was called 32 times to reach a score of 90.0% accuracy. This shows us that even a basic compound arrangement is superior to monolithic models. &lt;/p&gt;

&lt;p&gt;As a retrieval system, a vector database perfectly fits the need for compound systems. Introducing them into your design opens the possibilities for superior applications of LLMs. It is superior because it’s faster, more accurate, and much cheaper to run. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The key advantage of RAG is that it allows an LLM to pull in real-time information from up-to-date internal and external knowledge sources, making it more dynamic and adaptable to new information. - Oliver Molander, CEO of IMAGINAI&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Qdrant scales to enterprise RAG scenarios
&lt;/h2&gt;

&lt;p&gt;Many people still underestimate the economic benefit of vector databases. Why would a large corporate AI system need a stand-alone vector database like Qdrant? To us, this is the most important question. Let’s assume, for the sake of argument, that LLMs stop struggling with context thresholds altogether. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much would all of this cost?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;If you are running a RAG solution in an enterprise environment with petabytes of private data, your compute bill will be staggering. Let's assume 1 cent per 1K input tokens (the GPT-4 Turbo pricing at the time of writing). Whatever you are doing, every query that consumes 100 thousand tokens of context will cost you $1. &lt;/p&gt;

&lt;p&gt;That’s a buck a question. &lt;/p&gt;
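&lt;p&gt;That arithmetic is easy to sanity-check. The snippet below is a back-of-the-envelope sketch; the price constant is the article's assumption, not a live quote.&lt;/p&gt;

```python
# Back-of-the-envelope check of the per-question cost claimed above.
# The price is an assumption taken from the article (GPT-4 Turbo input
# pricing at the time of writing); adjust it for current rates.
PRICE_PER_1K_INPUT_TOKENS_USD = 0.01

def prompt_cost(input_tokens: int) -> float:
    """Cost in USD of feeding `input_tokens` of context to the LLM."""
    return input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS_USD

print(f"${prompt_cost(100_000):.2f} per 100K-token question")
```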

&lt;blockquote&gt;
&lt;p&gt;According to our estimates, vector search queries are &lt;strong&gt;at least&lt;/strong&gt; 100 million times cheaper than queries made by LLMs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Conversely, the only up-front investment with vector databases is the indexing step (which requires more compute). After that, everything else is a breeze. Once set up, Qdrant easily scales via &lt;a href="https://qdrant.tech/articles/multitenancy/" rel="noopener noreferrer"&gt;features like Multitenancy and Sharding&lt;/a&gt;. This lets you scale up your reliance on the vector retrieval process and minimize your use of compute-heavy LLMs. As an optimization measure, Qdrant is irreplaceable. &lt;/p&gt;

&lt;p&gt;Julien Simon from Hugging Face says it best:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG is not a workaround for limited context size. For mission-critical enterprise use cases, RAG is a way to leverage high-value, proprietary company knowledge that will never be found in public datasets used for LLM training. At the moment, the best place to index and query this knowledge is some sort of vector index. In addition, RAG downgrades the LLM to a writing assistant. Since built-in knowledge becomes much less important, a nice small 7B open-source model usually does the trick at a fraction of the cost of a huge generic model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Long Live RAG
&lt;/h2&gt;

&lt;p&gt;As LLMs continue to require enormous computing power, users will need to leverage vector search and RAG.&lt;/p&gt;

&lt;p&gt;Our customers remind us of this fact every day. As a product, our vector database is highly scalable and business-friendly. We develop our features strategically to follow our company’s Unix philosophy. &lt;/p&gt;

&lt;p&gt;We want to keep Qdrant compact, efficient and with a focused purpose. This purpose is to empower our customers to use it however they see fit. &lt;/p&gt;

&lt;p&gt;When large enterprises release their generative AI into production, they need to keep costs under control, while retaining the best possible quality of responses. Qdrant has the tools to do just that. &lt;/p&gt;

&lt;p&gt;Whether through &lt;a href="https://qdrant.tech/articles/vector-similarity-beyond-search/" rel="noopener noreferrer"&gt;RAG, Semantic Search, Dissimilarity Search, Recommendations or Multimodality&lt;/a&gt; - Qdrant will continue to journey on.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Best Practices for Massive-Scale Deployments: Multitenancy and Custom Sharding</title>
      <dc:creator>David Myriel</dc:creator>
      <pubDate>Tue, 13 Feb 2024 14:53:58 +0000</pubDate>
      <link>https://dev.to/qdrant/best-practices-for-massive-scale-deployments-multitenancy-and-custom-sharding-1mjb</link>
      <guid>https://dev.to/qdrant/best-practices-for-massive-scale-deployments-multitenancy-and-custom-sharding-1mjb</guid>
      <description>&lt;p&gt;We are seeing the topics of &lt;a href="https://qdrant.tech/documentation/guides/multiple-partitions/" rel="noopener noreferrer"&gt;multitenancy&lt;/a&gt; and &lt;a href="https://qdrant.tech/documentation/guides/distributed_deployment/#sharding" rel="noopener noreferrer"&gt;distributed deployment&lt;/a&gt; pop-up daily on our &lt;a href="https://qdrant.to/discord" rel="noopener noreferrer"&gt;Discord support channel&lt;/a&gt;. This tells us that many of you are looking to scale Qdrant along with the rest of your machine learning setup. &lt;/p&gt;

&lt;p&gt;Whether you are building a bank fraud-detection system, RAG for e-commerce, or services for the federal government - you will need to leverage a multitenant architecture to scale your product. In the world of SaaS and enterprise apps, this setup is the norm. It will considerably increase your application's performance and lower your hosting costs. &lt;/p&gt;

&lt;p&gt;We have developed two major features just for this. &lt;strong&gt;You can now scale a single Qdrant cluster and support all of your customers worldwide.&lt;/strong&gt; Under &lt;a href="https://qdrant.tech/documentation/guides/multiple-partitions/" rel="noopener noreferrer"&gt;multitenancy&lt;/a&gt;, each customer's data is completely isolated and only accessible by them. If this data is location-sensitive, Qdrant also gives you the option to divide your cluster by region or other criteria that further secure your customers' access. This is called &lt;a href="https://qdrant.tech/documentation/guides/distributed_deployment/#user-defined-sharding" rel="noopener noreferrer"&gt;custom sharding&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Combining these two will result in an efficiently-partitioned architecture that further leverages the convenience of a single Qdrant cluster. This article will briefly explain the benefits and show how you can get started using both features.&lt;/p&gt;
&lt;h2&gt;
  
  
  One collection, many tenants
&lt;/h2&gt;

&lt;p&gt;When working with Qdrant, you can upsert all your data to a single collection, and then partition each vector via its payload. This means that all your users are leveraging the power of a single Qdrant cluster, but their data is still isolated within the collection. Let's take a look at a two-tenant collection:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; Each individual vector is assigned a specific payload that denotes which tenant it belongs to. This is how a large number of different tenants can share a single Qdrant collection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6gcrk4ylh7wvhbtivvx1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6gcrk4ylh7wvhbtivvx1.png" alt="Qdrant Multitenanc" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Qdrant is built to excel in a single collection with a vast number of tenants. You should only create multiple collections when your data is not homogeneous or when users' vectors are produced by different embedding models. Creating too many collections results in resource overhead and inter-collection dependencies, which can increase costs and degrade overall performance. &lt;/p&gt;
&lt;h2&gt;
  
  
  Sharding your database
&lt;/h2&gt;

&lt;p&gt;With Qdrant, you can also specify a shard for each vector individually. This feature is useful if you want to &lt;a href="https://qdrant.tech/documentation/guides/distributed_deployment/#sharding" rel="noopener noreferrer"&gt;control where your data is kept in the cluster&lt;/a&gt;. For example, one set of vectors can be assigned to one shard on its own node, while another set can be on a completely different node.&lt;/p&gt;

&lt;p&gt;During vector search, your operations will be able to hit only the subset of shards they actually need. In massive-scale deployments, &lt;strong&gt;this can significantly improve the performance of operations that do not require the whole collection to be scanned&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This works in the other direction as well. Whenever you search for something, you can specify one or several shards, and Qdrant will query only those. It avoids asking every machine in your cluster for results, which minimizes overhead and maximizes performance. &lt;/p&gt;
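&lt;p&gt;To see why shard-keyed search avoids cluster-wide fan-out, here is a deliberately simplified, in-memory sketch - this is not Qdrant's implementation, just the routing idea: with a shard key, only one shard's points are scanned.&lt;/p&gt;

```python
# Minimal in-memory sketch (not Qdrant's implementation) of shard-keyed
# routing: a shard key narrows the search to one shard's points.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Each shard key maps to the points one node would hold.
shards = {
    "canada": [("point-1", [0.9, 0.1, 0.1])],
    "germany": [("point-3", [0.1, 0.1, 0.9])],
}

def search(query, shard_key=None):
    # With a shard key, only one shard is scanned; without it,
    # every shard in the cluster has to answer.
    selected = [shards[shard_key]] if shard_key else list(shards.values())
    candidates = [p for shard in selected for p in shard]
    return max(candidates, key=lambda p: dot(p[1], query))

print(search([0.95, 0.0, 0.0], shard_key="canada")[0])  # only Canada is scanned
```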
&lt;h3&gt;
  
  
  Common use cases
&lt;/h3&gt;

&lt;p&gt;A clear use case for this feature is managing a multitenant collection, where each tenant (be it a user or an organization) is assumed to be segregated, so their data can be stored in separate shards. Sharding also solves the problem of region-based data placement, whereby certain data needs to be kept within specific locations. To achieve this, however, you will need to &lt;a href="https://qdrant.tech/documentation/guides/distributed_deployment/#moving-shards" rel="noopener noreferrer"&gt;move your shards between nodes&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 2:&lt;/strong&gt; Users can both upsert and query shards that are relevant to them, all within the same collection. Regional sharding can help avoid cross-continental traffic. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgsrslrna13ukfu45bzk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgsrslrna13ukfu45bzk.png" alt="Qdrant Multitenancy" width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Custom sharding also gives you precise control over other use cases. With time-based data placement, incoming data streams can be indexed into shards that represent the latest updates. If you organize your shards by date, you gain fine-grained control over the recency of retrieved data. This is relevant for social media platforms, which rely heavily on time-sensitive data. &lt;/p&gt;
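&lt;p&gt;As a sketch of time-based placement, a hypothetical helper can derive a monthly shard key from each point's timestamp; you would then register those keys with &lt;code&gt;create_shard_key&lt;/code&gt; and pass them as the &lt;code&gt;shard_key_selector&lt;/code&gt; on upsert. The helper name is illustrative, not part of any API.&lt;/p&gt;

```python
from datetime import datetime, timezone

def monthly_shard_key(ts: datetime) -> str:
    # Hypothetical helper: route each point to a shard named after the
    # month it was created in, so the newest shards hold the latest data
    # and can be queried (or dropped) on their own.
    return ts.strftime("%Y-%m")

# A point ingested on 13 Feb 2024 would land in shard "2024-02".
print(monthly_shard_key(datetime(2024, 2, 13, tzinfo=timezone.utc)))
```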
&lt;h2&gt;
  
  
  Before I go any further... how secure is my user data?
&lt;/h2&gt;

&lt;p&gt;By design, Qdrant offers three levels of isolation. We initially introduced collection-based isolation, but a scaled setup has to move beyond this level. In that scenario, you will leverage payload-based isolation (from multitenancy) and resource-based isolation (from sharding). The ultimate goal is a single collection in which you can precisely customize the placement of shards inside your cluster and avoid any kind of overhead. The diagram below shows how your data is organized within this two-tier isolation setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 3:&lt;/strong&gt; Users can query the collection based on two filters: the &lt;code&gt;group_id&lt;/code&gt; and the individual &lt;code&gt;shard_key_selector&lt;/code&gt;. This gives your data two additional levels of isolation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flliib0gkmdnh57r9mezn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flliib0gkmdnh57r9mezn.png" alt="Qdrant Multitenancy" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Create custom shards for a single collection
&lt;/h2&gt;

&lt;p&gt;When creating a collection, you will need to configure user-defined sharding. This lets you control the shard placement of your data, so that operations can hit only the subset of shards they actually need. In big clusters, this can significantly improve the performance of operations, since you won't need to go through the entire collection to retrieve data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;shard_number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sharding_method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ShardingMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CUSTOM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... other collection parameters
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_shard_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;canada&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_shard_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;germany&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, your cluster is divided between Germany and Canada, whose laws differ when it comes to international data transfer. Let's say you are creating a RAG application that supports the healthcare industry. For compliance purposes, your Canadian customer data will have to be clearly separated from your German customer data. &lt;/p&gt;

&lt;p&gt;Even though it is part of the same collection, data from each shard is isolated from other shards and can be retrieved as such. For additional examples on shards and retrieval, consult the &lt;a href="https://qdrant.tech/documentation/guides/distributed_deployment/" rel="noopener noreferrer"&gt;Distributed Deployments&lt;/a&gt; documentation and the &lt;a href="https://python-client.qdrant.tech" rel="noopener noreferrer"&gt;Qdrant Client specification&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configure a multitenant setup for users
&lt;/h2&gt;

&lt;p&gt;Let's continue and start adding data. As you upsert your vectors to your new collection, you can add a &lt;code&gt;group_id&lt;/code&gt; field to each vector. If you do this, Qdrant will assign each vector to its respective group. &lt;/p&gt;

&lt;p&gt;Additionally, each vector can now be allocated to a shard. You can specify the &lt;code&gt;shard_key_selector&lt;/code&gt; for each individual vector. In this example, you are upserting data belonging to &lt;code&gt;tenant_1&lt;/code&gt; to the Canadian region.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PointStruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PointStruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;shard_key_selector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;canada&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep in mind that the data for each &lt;code&gt;group_id&lt;/code&gt; is isolated. In the example below, &lt;code&gt;tenant_1&lt;/code&gt; vectors are kept separate from &lt;code&gt;tenant_2&lt;/code&gt;. The first tenant will be able to access their data in the Canadian portion of the cluster. However, as shown below, &lt;code&gt;tenant_2&lt;/code&gt; might only be able to retrieve information hosted in Germany.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PointStruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;shard_key_selector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;germany&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Retrieve data via filters
&lt;/h2&gt;

&lt;p&gt;The access-control setup is complete once you specify the criteria for data retrieval. When searching for vectors, use a &lt;code&gt;query_filter&lt;/code&gt; on the &lt;code&gt;group_id&lt;/code&gt; field to restrict results to each user's own data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MatchValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
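&lt;p&gt;The payload filter can also be combined with shard selection, so the query only hits the relevant region instead of fanning out to the whole cluster. This sketch assumes a Qdrant deployment and client version (1.7 or later) where &lt;code&gt;search&lt;/code&gt; accepts a &lt;code&gt;shard_key_selector&lt;/code&gt;, and a server running locally; check the client reference for your version.&lt;/p&gt;

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

# Search only the Canadian shard for tenant_1's vectors.
# shard_key_selector support is an assumption about your client version.
client.search(
    collection_name="{tenant_data}",
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="group_id",
                match=models.MatchValue(value="tenant_1"),
            ),
        ]
    ),
    query_vector=[0.1, 0.1, 0.9],
    limit=10,
    shard_key_selector="canada",
)
```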



&lt;h2&gt;
  
  
  Performance considerations
&lt;/h2&gt;

&lt;p&gt;The speed of indexing may become a bottleneck if you are adding large amounts of data in this way, as each user's vectors will be indexed into the same collection. To avoid this bottleneck, consider &lt;em&gt;bypassing the construction of a global vector index&lt;/em&gt; for the entire collection and building it only for individual groups instead.&lt;/p&gt;

&lt;p&gt;By adopting this strategy, Qdrant will index vectors for each user independently, significantly accelerating the process.&lt;/p&gt;

&lt;p&gt;To implement this approach, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set &lt;code&gt;payload_m&lt;/code&gt; in the HNSW configuration to a non-zero value, such as 16.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;m&lt;/code&gt; in the HNSW config to 0. This disables building a global index for the whole collection.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6333&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vectors_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VectorParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;hnsw_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HnswConfigDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;payload_m&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create keyword payload index for &lt;code&gt;group_id&lt;/code&gt; field.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_payload_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{tenant_data}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;field_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PayloadSchemaType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;KEYWORD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: Keep in mind that global requests (without the &lt;code&gt;group_id&lt;/code&gt; filter) will be slower, since they require scanning all groups to identify the nearest neighbors.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;Qdrant is ready to support a massive-scale architecture for your machine learning project. If you want to see whether our vector database is right for you, try the &lt;a href="https://qdrant.tech/documentation/quick-start/" rel="noopener noreferrer"&gt;quickstart tutorial&lt;/a&gt; or read our &lt;a href="https://qdrant.tech/documentation/" rel="noopener noreferrer"&gt;docs and tutorials&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To spin up a free instance of Qdrant, sign up for &lt;a href="https://qdrant.to/cloud" rel="noopener noreferrer"&gt;Qdrant Cloud&lt;/a&gt; - no strings attached.&lt;/p&gt;

&lt;p&gt;Get support or share ideas in our &lt;a href="https://qdrant.to/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community. This is where we talk about vector search theory, publish examples and demos, and discuss vector database setups.&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>devops</category>
      <category>database</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
