<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sabrina</title>
    <description>The latest articles on DEV Community by Sabrina (@sabrinaesaquino).</description>
    <link>https://dev.to/sabrinaesaquino</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1024677%2F46882f41-ef88-4683-8e6a-2bcc01f51cac.jpg</url>
      <title>DEV Community: Sabrina</title>
      <link>https://dev.to/sabrinaesaquino</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sabrinaesaquino"/>
    <language>en</language>
    <item>
      <title>Conjuring Cursed Halloween Tales with Qdrant's Dark Arts</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Thu, 31 Oct 2024 15:58:13 +0000</pubDate>
      <link>https://dev.to/sabrinaesaquino/conjuring-cursed-halloween-tales-with-qdrants-dark-arts-3eji</link>
      <guid>https://dev.to/sabrinaesaquino/conjuring-cursed-halloween-tales-with-qdrants-dark-arts-3eji</guid>
      <description>&lt;p&gt;It’s finally Halloween!! 🎃 &lt;/p&gt;

&lt;p&gt;That time of the year for carved pumpkins, sexy costumes, and eerie stories whispered around a flickering candle.&lt;/p&gt;

&lt;p&gt;But if you’re like me, you never quite remember any creepy tales when you need them most. So I thought, why not create a tool that can go through a massive collection of stories and pick the ones that will really give us the chills?&lt;/p&gt;

&lt;p&gt;So that’s exactly what we’re building today.&lt;/p&gt;

&lt;p&gt;The plan is simple. &lt;/p&gt;

&lt;p&gt;We’ll take a dataset of Reddit horror stories, embed it, and set up a Qdrant collection we can search by theme or atmosphere, essentially capturing a “vibe” like ‘haunted house’ or ‘creepy forest.’&lt;/p&gt;

&lt;p&gt;I’ll show all the steps you'll need to build an app like this: setting up the vector database, embedding and indexing the data, and conjuring the most cursed Halloween tales. &lt;/p&gt;

&lt;p&gt;So let's get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Install the Libraries
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr24ycn563ule2tubop0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr24ycn563ule2tubop0.png" alt="Astronaut Craving Pumpkings" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First things first, let's start by installing the tools we'll be using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;qdrant-client sentence_transformers datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Download the Dataset
&lt;/h2&gt;

&lt;p&gt;We'll be using the &lt;a href="https://huggingface.co/datasets/intone/horror_stories_reddit" rel="noopener noreferrer"&gt;Reddit horror stories dataset&lt;/a&gt;. Let's download it using the datasets library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dataset&lt;/span&gt;

&lt;span class="n"&gt;ds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intone/horror_stories_reddit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Load the Embedding Model
&lt;/h2&gt;

&lt;p&gt;We'll use the &lt;code&gt;sentence_transformers&lt;/code&gt; library to help us embed our data with the model &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;. Here's how we'll set it up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have a GPU available and want to speed things up, simply change it to &lt;code&gt;device='cuda:0'&lt;/code&gt;.&lt;/p&gt;
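&lt;p&gt;If you’d rather not hard-code the device, a small sketch like this picks it automatically (assuming PyTorch, which &lt;code&gt;sentence_transformers&lt;/code&gt; depends on, is importable; otherwise it falls back to CPU):&lt;/p&gt;

```python
# Sketch: choose the best available device automatically.
# Falls back to CPU if PyTorch (or a GPU) is not available.
try:
    import torch
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(device)
# Then pass it along when loading the model, e.g.:
# model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=device)
```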

&lt;h2&gt;
  
  
  4. Create the Embeddings
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsoap3e4427cuf1ls6ff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsoap3e4427cuf1ls6ff.png" alt="Astronaut with a Marshmellow" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;generate_embeddings&lt;/code&gt; function processes a dataset split (like "train") by breaking it into smaller groups called batches, based on the specified &lt;code&gt;batch_size&lt;/code&gt;. This keeps memory usage manageable.&lt;/p&gt;

&lt;p&gt;For each batch, the function extracts a set of sentences (e.g., 32 at a time) and uses the loaded embedding model to embed them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;split_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_split&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data_split&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generating embeddings for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;split_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; split&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pbar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;batch_sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;batch_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_sentences&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;pbar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_sentences&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function returns the full list of embeddings, which we then attach to the dataset as a new column. This way, the dataset is updated in one step without overloading memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;train_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;add_column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Set up a Client
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64ofopknwdwcwpwsg5iv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64ofopknwdwcwpwsg5iv.png" alt="Astronaut and Ghost" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we can start our Qdrant Client. If you’re working locally, just connect to the default endpoint and you’re good to go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantClient&lt;/span&gt;

&lt;span class="c1"&gt;# Connecting to a locally running instance
&lt;/span&gt;&lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple enough, right? But in the real world, you’re likely working in the cloud. That means getting your &lt;a href="https://cloud.qdrant.io/" rel="noopener noreferrer"&gt;Qdrant Cloud&lt;/a&gt; instance set up and authenticated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Setup
&lt;/h3&gt;

&lt;p&gt;To connect to your cloud instance, you’ll need the instance URL and an API key. Here’s how to do it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantClient&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the client with the Qdrant Cloud URL and API key
&lt;/span&gt;&lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://YOUR_CLOUD_INSTANCE_ID.aws.qdrant.tech&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your cloud instance URL
&lt;/span&gt;    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your API key
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure to replace &lt;code&gt;YOUR_CLOUD_INSTANCE_ID&lt;/code&gt; with your actual instance ID and &lt;code&gt;YOUR_API_KEY&lt;/code&gt; with the API key you created. You’ll find these in your Qdrant Cloud Console.&lt;/p&gt;
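&lt;p&gt;To avoid committing credentials to source control, a common pattern is to read them from environment variables instead. Here’s a minimal sketch (the variable names &lt;code&gt;QDRANT_URL&lt;/code&gt; and &lt;code&gt;QDRANT_API_KEY&lt;/code&gt; are just a convention I’m assuming, not anything Qdrant requires):&lt;/p&gt;

```python
import os

# Hypothetical variable names; adjust to your own deployment conventions.
url = os.environ.get("QDRANT_URL", "http://localhost:6333")  # local fallback
api_key = os.environ.get("QDRANT_API_KEY")  # None -> unauthenticated local instance

# qdrant_client = QdrantClient(url=url, api_key=api_key)
print(url)
```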

&lt;h2&gt;
  
  
  6. Create a Collection
&lt;/h2&gt;

&lt;p&gt;A collection in Qdrant is like a mini-database optimized for storing and querying vectors. When defining one, we need to set the size of our vectors and the metric to measure similarity. Here’s what that setup might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;

&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;halloween&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Creating a collection to hold vectors for product features
&lt;/span&gt;&lt;span class="n"&gt;qdrant_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vectors_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VectorParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We defined a collection &lt;code&gt;halloween&lt;/code&gt; with 384-dimensional vectors which is the size of the &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; embeddings. Cosine distance is used here as our similarity metric. Depending on your data and use case, you might want to use different distance metrics like &lt;code&gt;Distance.EUCLID&lt;/code&gt; or &lt;code&gt;Distance.DOT&lt;/code&gt;.&lt;/p&gt;
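&lt;p&gt;To make the difference concrete, here’s a tiny pure-Python sketch: cosine similarity only compares direction, while the dot product also grows with vector magnitude.&lt;/p&gt;

```python
import math

def cosine_sim(a, b):
    # Cosine similarity: dot product of the vectors divided by their norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, but twice the magnitude

print(cosine_sim(a, b))                  # -> 1.0 (direction identical)
print(sum(x * y for x, y in zip(a, b)))  # dot product -> 2.0 (magnitude matters)
```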

&lt;h2&gt;
  
  
  7. Load the Vectors
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptvofnozzprd7t53nz68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptvofnozzprd7t53nz68.png" alt="Astronaut Eating Apples" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Collections are nothing without data, so it’s time to insert the embeddings we created earlier. Here’s a strategy to load them in batches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;batched&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;iterator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;islice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;

&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;current_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Initialize a counter
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;batched&lt;/code&gt; function divides an iterable into smaller chunks of size &lt;code&gt;n&lt;/code&gt;. It uses &lt;code&gt;islice&lt;/code&gt; to extract consecutive elements and yields each chunk until the dataset is fully processed.&lt;br&gt;
&lt;/p&gt;
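&lt;p&gt;The same &lt;code&gt;batched&lt;/code&gt; helper, run on a toy range, shows the behavior: seven items in chunks of three yields two full chunks and one leftover.&lt;/p&gt;

```python
from itertools import islice

def batched(iterable, n):
    # Yield successive lists of at most n items from the iterable.
    iterator = iter(iterable)
    while batch := list(islice(iterator, n)):
        yield batch

print(list(batched(range(7), 3)))  # -> [[0, 1, 2], [3, 4, 5], [6]]
```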

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;itertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;islice&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;batched&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Generate a list of IDs using the counter
&lt;/span&gt;    &lt;span class="n"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_id&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="c1"&gt;# Update the counter to continue from the next ID after the batch
&lt;/span&gt;    &lt;span class="n"&gt;current_id&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;qdrant_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;payloads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each batch is sent to Qdrant with the &lt;code&gt;upsert&lt;/code&gt; method, which takes the lists of IDs, vectors, and remaining item data (payloads) and inserts new points or updates existing ones in the collection.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Conjuring the Cursed Tales
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6u3jloi5qhmmtc29ksjw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6u3jloi5qhmmtc29ksjw.png" alt="Astronault in Candle Lights" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, it's time. &lt;/p&gt;

&lt;p&gt;With everything set up, let’s see if our horror story search tool can deliver some real scares. We’ll search for a theme like &lt;code&gt;“creepy clown”&lt;/code&gt; and see what we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;textwrap&lt;/span&gt;

&lt;span class="c1"&gt;# Function to wrap and print long text
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_wrapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;wrapped_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;textwrap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wrapped_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Search result query
&lt;/span&gt;&lt;span class="n"&gt;search_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_points&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;creepy clown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access the first result
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;search_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Pretty-print the payload
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ID:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Original:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;isOriginal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Print specific payload fields
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Title:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Author:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subreddit:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;subreddit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;URL:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Print the text of the story separately with word wrapping for readability
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Story Text:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print_wrapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tale&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No text available&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No results found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result popped up, and there it was: a story titled “Sneaky Peeky.”&lt;/p&gt;

&lt;p&gt;And honestly, it was CREEPY. &lt;/p&gt;

&lt;p&gt;Is it based on a true story? Honestly, I don’t know. But it leaves you with that lingering unease, like something’s watching. It's quite long, so I won't post it here, but if you want to see for yourself, run the program and try it. &lt;/p&gt;

&lt;p&gt;You can explore any other atmosphere: “haunted house,” “creepy forest,” “possessed doll,” or whatever you’re in the mood for. Who knows? You might find something even creepier. &lt;/p&gt;

&lt;p&gt;If you do, please post it in the comments. I’d love to see what else this thing can discover.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Thanks for sticking with me through this Halloween experiment! If you’ve followed along, you’ve now taken your first step into the world of vector search and learned how to find stories that feel creepy rather than just containing spooky words.&lt;/p&gt;

&lt;p&gt;If you’re ready to dive into the dark arts of vector search, there are plenty of more advanced topics you can explore, like &lt;a href="https://qdrant.tech/documentation/concepts/collections/?q=multitenancy" rel="noopener noreferrer"&gt;multitenancy&lt;/a&gt;, &lt;a href="https://qdrant.tech/documentation/concepts/payload/" rel="noopener noreferrer"&gt;payload structures&lt;/a&gt;, and &lt;a href="https://qdrant.tech/documentation/tutorials/bulk-upload/" rel="noopener noreferrer"&gt;bulk upload&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;So go ahead and see just how deep you can go.&lt;/p&gt;

&lt;p&gt;Happy hunting! 👻&lt;/p&gt;

</description>
      <category>qdrant</category>
      <category>halloween</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>What is Vector Quantization?</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Fri, 27 Sep 2024 16:14:21 +0000</pubDate>
      <link>https://dev.to/qdrant/what-is-vector-quantization-nna</link>
      <guid>https://dev.to/qdrant/what-is-vector-quantization-nna</guid>
      <description>&lt;p&gt;Vector quantization is a data compression technique used to reduce the size of high-dimensional data. Compressing vectors reduces memory usage while maintaining nearly all of the essential information. This method allows for more efficient storage and faster search operations, particularly in large datasets.&lt;/p&gt;

&lt;p&gt;When working with high-dimensional vectors, such as embeddings from providers like OpenAI, a single 1536-dimensional vector requires &lt;strong&gt;6 KB of memory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyngfp8rta3lrolbsq7um.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyngfp8rta3lrolbsq7um.png" alt="1536-dimensional vector of 6 KB in size" width="742" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One million such vectors already need around 6 GB of memory, and as your dataset grows to multiple &lt;strong&gt;millions of vectors&lt;/strong&gt;, the memory and processing demands increase significantly.&lt;/p&gt;

&lt;p&gt;To understand why this process is so computationally demanding, let's take a look at the nature of the &lt;a href="https://qdrant.tech/documentation/concepts/indexing/#vector-index" rel="noopener noreferrer"&gt;HNSW index&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;HNSW (Hierarchical Navigable Small World) index&lt;/strong&gt; organizes vectors in a layered graph, connecting each vector to its nearest neighbors. At each layer, the algorithm narrows down the search area until it reaches the lower layers, where it efficiently finds the closest matches to the query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxme94wiq18xae15xqup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxme94wiq18xae15xqup.png" alt="HNSW representation" width="723" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each time a new vector is added, the system must determine its position in the existing graph, a process similar to searching. This makes both inserting and searching for vectors complex operations.&lt;/p&gt;

&lt;p&gt;One of the key challenges with the HNSW index is that it requires a lot of &lt;strong&gt;random reads&lt;/strong&gt; and &lt;strong&gt;sequential traversals&lt;/strong&gt; through the graph. This makes the process computationally expensive, especially when you're dealing with millions of high-dimensional vectors.&lt;/p&gt;

&lt;p&gt;The system has to jump between various points in the graph in an unpredictable manner. This unpredictability makes optimization difficult, and as the dataset grows, the memory and processing requirements increase significantly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvu4zkyo9st9j55s2j59.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvu4zkyo9st9j55s2j59.png" alt="HNSW random searches example" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since vectors need to be stored in &lt;strong&gt;fast storage&lt;/strong&gt; like &lt;strong&gt;RAM&lt;/strong&gt; or &lt;strong&gt;SSD&lt;/strong&gt; for low-latency searches, the cost of storing and processing them efficiently grows along with the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quantization&lt;/strong&gt; offers a solution by compressing vectors to smaller memory sizes, making the process more efficient.&lt;/p&gt;

&lt;p&gt;There are several methods to achieve this, and here we will focus on three main ones:&lt;/p&gt;

&lt;p&gt;&lt;a href="/articles_data/what-is-vector-quantization/types-of-quant.png" class="article-body-image-wrapper"&gt;&lt;img src="/articles_data/what-is-vector-quantization/types-of-quant.png" alt="Types of Quantization: 1. Scalar Quantization, 2. Product Quantization, 3. Binary Quantization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What is Scalar Quantization?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflk1e2v73ih1jcrdssqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflk1e2v73ih1jcrdssqb.png" alt="Astronaut holding cube" width="800" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Qdrant, each dimension is represented by a &lt;code&gt;float32&lt;/code&gt; value, which uses &lt;strong&gt;4 bytes&lt;/strong&gt; of memory. When using &lt;a href="https://qdrant.tech/documentation/guides/quantization/#scalar-quantization" rel="noopener noreferrer"&gt;Scalar Quantization&lt;/a&gt;, we map our vectors to a range that the smaller &lt;code&gt;int8&lt;/code&gt; type can represent. An &lt;code&gt;int8&lt;/code&gt; is only &lt;strong&gt;1 byte&lt;/strong&gt; and can represent 256 distinct values (-128 to 127 signed, or 0 to 255 unsigned). This results in a &lt;strong&gt;75% reduction&lt;/strong&gt; in memory size.&lt;/p&gt;

&lt;p&gt;For example, if our data lies in the range of -1.0 to 1.0, Scalar Quantization will transform these values to a range that &lt;code&gt;int8&lt;/code&gt; can represent, i.e., within -128 to 127. The system &lt;strong&gt;maps&lt;/strong&gt; the &lt;code&gt;float32&lt;/code&gt; values into this range.&lt;/p&gt;

&lt;p&gt;Here's a simple linear example of what this process looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdax8eqlo5jw8sgam7qli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdax8eqlo5jw8sgam7qli.png" alt="Scalar Quantization example" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;
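&lt;p&gt;As a rough sketch of that linear mapping (illustrative only; Qdrant's internal implementation also accounts for the &lt;code&gt;quantile&lt;/code&gt; setting described below), quantizing and dequantizing might look like this:&lt;/p&gt;

```python
def scalar_quantize(values, lo=-1.0, hi=1.0):
    """Linearly map floats in [lo, hi] onto the int8 range [-128, 127]."""
    scale = 255 / (hi - lo)
    return [round((v - lo) * scale) - 128 for v in values]

def scalar_dequantize(codes, lo=-1.0, hi=1.0):
    """Approximate inverse: recover floats from the 1-byte codes."""
    scale = (hi - lo) / 255
    return [(c + 128) * scale + lo for c in codes]

vec = [-1.0, -0.5, 0.0, 0.42, 1.0]
codes = scalar_quantize(vec)         # [-128, -64, 0, 53, 127], 1 byte each
restored = scalar_dequantize(codes)  # close to vec, up to a small rounding error
```

&lt;p&gt;Each dimension now fits in a single byte, and the round trip shows exactly where the small precision loss comes from.&lt;/p&gt;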

&lt;p&gt;To set up Scalar Quantization in Qdrant, you need to include the &lt;code&gt;quantization_config&lt;/code&gt; section when creating or updating a collection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/{collection_name}
{
    "vectors": {
      "size": 128,
      "distance": "Cosine"
    },
    "quantization_config": {
        "scalar": {
            "type": "int8",
            "quantile": 0.99,
            "always_ram": true
        }
    }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;quantile&lt;/code&gt; parameter is used to calculate the quantization bounds. For example, if you specify a &lt;code&gt;0.99&lt;/code&gt; quantile, the most extreme 1% of values will be excluded from the quantization bounds.&lt;/p&gt;

&lt;p&gt;This parameter only affects the resulting precision, not the memory footprint. You can adjust it if you experience a significant decrease in search quality.&lt;/p&gt;
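&lt;p&gt;To see what the quantile does, here is a hypothetical, simplified version of the bound selection: trim the most extreme tails before fixing the range, so a single outlier cannot stretch it.&lt;/p&gt;

```python
def quantile_bounds(values, quantile=0.99):
    """Pick quantization bounds from the central `quantile` share of values."""
    s = sorted(values)
    cut = int(len(s) * (1 - quantile) / 2)  # values trimmed from each tail
    trimmed = s[cut:len(s) - cut] if cut else s
    return trimmed[0], trimmed[-1]

# A single outlier at 100.0 no longer stretches the quantization range:
data = [i / 1000 for i in range(1000)] + [100.0]
print(quantile_bounds(data, 0.99))  # (0.005, 0.995)
```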

&lt;p&gt;Scalar Quantization is a great choice if you're looking to boost compression and search speed without losing much accuracy. Distance calculations (such as dot product or cosine similarity) on &lt;code&gt;int8&lt;/code&gt; values are computationally cheaper than on &lt;code&gt;float32&lt;/code&gt; values, which also yields a modest performance gain.&lt;/p&gt;

&lt;p&gt;While the performance gains of Scalar Quantization may not match those achieved with Binary Quantization (which we'll discuss later), it remains an excellent default choice when Binary Quantization isn’t suitable for your use case.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. What is Binary Quantization?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkav5g7facj83ggyugpan.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkav5g7facj83ggyugpan.png" alt="Astronaut in surreal white environment" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://qdrant.tech/documentation/guides/quantization/#binary-quantization" rel="noopener noreferrer"&gt;Binary Quantization&lt;/a&gt; is an excellent option if you're looking to &lt;strong&gt;reduce memory&lt;/strong&gt; usage while also achieving a significant &lt;strong&gt;boost in speed&lt;/strong&gt;. It works by converting high-dimensional vectors into simple binary (0 or 1) representations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Values greater than zero are converted to 1.&lt;/li&gt;
&lt;li&gt;Values less than or equal to zero are converted to 0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's consider our initial example of a 1536-dimensional vector that requires &lt;strong&gt;6 KB&lt;/strong&gt; of memory (4 bytes for each &lt;code&gt;float32&lt;/code&gt; value).&lt;/p&gt;

&lt;p&gt;After Binary Quantization, each dimension is reduced to 1 bit (1/8 byte), so the memory required is:&lt;/p&gt;

&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;1536 dimensions8 bits per byte=192 bytes
\frac{1536 \text{ dimensions}}{8 \text{ bits per byte}} = 192 \text{ bytes}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;8&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt; bits per byte&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1536&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt; dimensions&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;192&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt; bytes&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;



&lt;p&gt;This leads to a &lt;strong&gt;32x&lt;/strong&gt; memory reduction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxrq191v6pv8mqa21jzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxrq191v6pv8mqa21jzg.png" alt="Binary Quantization example" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;
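&lt;p&gt;Both the conversion rule and the memory math are easy to verify in a few lines of Python (a conceptual sketch, not Qdrant's implementation):&lt;/p&gt;

```python
def binarize(vector):
    """1 for positive components, 0 otherwise."""
    return [1 if v > 0 else 0 for v in vector]

dims = 1536
float32_bytes = dims * 4   # 6144 bytes, i.e. 6 KB
binary_bytes = dims // 8   # 192 bytes, one bit per dimension
print(float32_bytes // binary_bytes)    # 32, the 32x reduction
print(binarize([0.7, -0.2, 0.0, 1.3]))  # [1, 0, 0, 1]
```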

&lt;p&gt;Qdrant automates the Binary Quantization process during indexing. As vectors are added to your collection, each 32-bit floating-point component is converted into a binary value according to the configuration you define.&lt;/p&gt;

&lt;p&gt;Here’s how you can set it up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/{collection_name}
{
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    },
    "quantization_config": {
        "binary": {
            "always_ram": true
        }
    }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binary Quantization is by far the quantization method that provides the most significant processing &lt;strong&gt;speed gains&lt;/strong&gt; compared to Scalar and Product Quantizations. This is because the binary representation allows the system to use highly optimized CPU instructions, such as &lt;a href="https://en.wikipedia.org/wiki/XOR_gate#:~:text=XOR%20represents%20the%20inequality%20function,the%20other%20but%20not%20both%22" rel="noopener noreferrer"&gt;XOR&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Hamming_weight" rel="noopener noreferrer"&gt;Popcount&lt;/a&gt;, for fast distance computations.&lt;/p&gt;

&lt;p&gt;It can speed up search operations by &lt;strong&gt;up to 40x&lt;/strong&gt;, depending on the dataset and hardware.&lt;/p&gt;
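&lt;p&gt;The core trick is that the Hamming distance between two packed bit patterns is just one XOR followed by a popcount. A minimal sketch in Python:&lt;/p&gt;

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into a single integer bit pattern."""
    n = 0
    for b in bits:
        n = n * 2 + b
    return n

def hamming(a, b):
    """XOR the two patterns, then count the differing bits (popcount)."""
    return bin(a ^ b).count("1")

a = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
b = pack_bits([1, 1, 1, 0, 0, 0, 1, 1])
print(hamming(a, b))  # 3: the vectors differ in 3 bit positions
```

&lt;p&gt;Real implementations do this with dedicated CPU instructions over 64-bit words, which is where the large speedups come from.&lt;/p&gt;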

&lt;p&gt;Not all models are equally compatible with Binary Quantization; some experience a greater loss in accuracy when quantized. We recommend using Binary Quantization with models that have &lt;strong&gt;at least 1024 dimensions&lt;/strong&gt; to minimize accuracy loss.&lt;/p&gt;

&lt;p&gt;The models that have shown the best compatibility with this method include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI text-embedding-ada-002&lt;/strong&gt; (1536 dimensions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cohere AI embed-english-v2.0&lt;/strong&gt; (4096 dimensions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These models demonstrate minimal accuracy loss while still benefiting from substantial speed and memory gains.&lt;/p&gt;

&lt;p&gt;Even though Binary Quantization is incredibly fast and memory-efficient, the trade-offs are in &lt;strong&gt;precision&lt;/strong&gt; and &lt;strong&gt;model compatibility&lt;/strong&gt;, so you may need to ensure search quality using techniques like oversampling and rescoring.&lt;/p&gt;

&lt;p&gt;If you're interested in exploring Binary Quantization in more detail—including implementation examples, benchmark results, and usage recommendations—check out our dedicated article on &lt;a href="https://qdrant.tech/articles/binary-quantization/" rel="noopener noreferrer"&gt;Binary Quantization - Vector Search, 40x Faster&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. What is Product Quantization?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdefp3dfh6s565lpls7wx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdefp3dfh6s565lpls7wx.png" alt="Astronaut with centroids" width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://qdrant.tech/documentation/guides/quantization/#product-quantization" rel="noopener noreferrer"&gt;Product Quantization&lt;/a&gt; is a method used to compress high-dimensional vectors by representing them with a smaller set of representative points.&lt;/p&gt;

&lt;p&gt;The process begins by splitting the original high-dimensional vectors into smaller &lt;strong&gt;sub-vectors.&lt;/strong&gt; Each sub-vector represents a segment of the original vector, capturing different characteristics of the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjy9jtvmlsn8pqr93xtwp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjy9jtvmlsn8pqr93xtwp.png" alt="Creation of the Sub-vector" width="800" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For each sub-vector, a separate &lt;strong&gt;codebook&lt;/strong&gt; is created, representing regions in the data space where common patterns occur.&lt;/p&gt;

&lt;p&gt;The codebook in Qdrant is trained automatically during the indexing process. As vectors are added to the collection, Qdrant uses your specified quantization settings in the &lt;code&gt;quantization_config&lt;/code&gt; to build the codebook and quantize the vectors. Here’s how you can set it up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/{collection_name}
{
    "vectors": {
      "size": 1024,
      "distance": "Cosine"
    },
    "quantization_config": {
        "product": {
            "compression": "x32",
            "always_ram": true
        }
    }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each region in the codebook is defined by a &lt;strong&gt;centroid&lt;/strong&gt;, which serves as a representative point summarizing the characteristics of that region. Instead of treating every single data point as equally important, we can group similar sub-vectors together and represent them with a single centroid that captures the general characteristics of that group.&lt;/p&gt;

&lt;p&gt;The centroids used in Product Quantization are determined using the &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/K-means_clustering" rel="noopener noreferrer"&gt;K-means clustering algorithm&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4sgobztmeyzsv31kd2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4sgobztmeyzsv31kd2g.png" alt="Codebook and Centroids example" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Qdrant always selects &lt;strong&gt;K = 256&lt;/strong&gt; as the number of centroids in its implementation, based on the fact that 256 is the maximum number of unique values that can be represented by a single byte.&lt;/p&gt;

&lt;p&gt;This makes the compression process efficient because each centroid index can be stored in a single byte.&lt;/p&gt;

&lt;p&gt;The original high-dimensional vectors are quantized by mapping each sub-vector to the nearest centroid in its respective codebook.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7zjbzxr5tr7xwnge3xu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7zjbzxr5tr7xwnge3xu.png" alt="Vectors being mapped to their corresponding centroids example" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The compressed vector stores the index of the closest centroid for each sub-vector.&lt;/p&gt;

&lt;p&gt;Here’s how a 1024-dimensional vector, originally taking up 4096 bytes, is reduced to just 128 bytes by representing it as 128 indexes, each pointing to the centroid of a sub-vector:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp37escdizqs8srip6pr2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp37escdizqs8srip6pr2.png" alt="Product Quantization example" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;
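&lt;p&gt;The encoding step can be sketched as follows. This is a toy illustration: the random centroids below stand in for codebooks that would really be trained with K-means.&lt;/p&gt;

```python
import random

def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    m = len(codebooks)        # number of sub-vectors
    d = len(vector) // m      # dimensions per sub-vector
    codes = []
    for i in range(m):
        sub = vector[i * d:(i + 1) * d]
        def sqdist(centroid):
            return sum((s - c) ** 2 for s, c in zip(sub, centroid))
        codes.append(min(range(len(codebooks[i])), key=lambda j: sqdist(codebooks[i][j])))
    return codes

random.seed(0)
dims, m, k = 1024, 128, 256   # 128 sub-vectors of 8 dims, 256 centroids each
codebooks = [[[random.uniform(-1, 1) for _ in range(dims // m)]
              for _ in range(k)] for _ in range(m)]
vec = [random.uniform(-1, 1) for _ in range(dims)]
codes = pq_encode(vec, codebooks)
print(len(codes))             # 128 one-byte indexes instead of 4096 bytes
```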

&lt;p&gt;After setting up quantization and adding your vectors, you can perform searches as usual. Qdrant will automatically use the quantized vectors, optimizing both speed and memory usage. Optionally, you can enable rescoring for better accuracy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /collections/{collection_name}/points/search
{
    "query": [0.22, -0.01, -0.98, 0.37],
    "params": {
        "quantization": {
            "rescore": true
        }
    },
    "limit": 10
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Product Quantization can significantly reduce memory usage, potentially offering up to &lt;strong&gt;64x&lt;/strong&gt; compression in certain configurations. However, it's important to note that this level of compression can lead to a noticeable drop in quality.&lt;/p&gt;

&lt;p&gt;If your application requires high precision or real-time performance, Product Quantization may not be the best choice. However, if &lt;strong&gt;memory savings&lt;/strong&gt; are critical and some accuracy loss is acceptable, it could still be an ideal solution.&lt;/p&gt;

&lt;p&gt;Here’s a comparison of speed, accuracy, and compression for all three methods, adapted from &lt;a href="https://qdrant.tech/documentation/guides/quantization/#how-to-choose-the-right-quantization-method" rel="noopener noreferrer"&gt;Qdrant's documentation&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quantization method&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scalar&lt;/td&gt;
&lt;td&gt;0.99&lt;/td&gt;
&lt;td&gt;up to x2&lt;/td&gt;
&lt;td&gt;x4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;x0.5&lt;/td&gt;
&lt;td&gt;up to x64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary&lt;/td&gt;
&lt;td&gt;0.95*&lt;/td&gt;
&lt;td&gt;up to x40&lt;/td&gt;
&lt;td&gt;x32&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;* - for compatible models&lt;/p&gt;

&lt;p&gt;For a more in-depth understanding of the benchmarks you can expect, check out our dedicated article on &lt;a href="https://qdrant.tech/articles/product-quantization/" rel="noopener noreferrer"&gt;Product Quantization in Vector Search&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rescoring, Oversampling, and Reranking
&lt;/h2&gt;

&lt;p&gt;When we use quantization methods like Scalar, Binary, or Product Quantization, we're compressing our vectors to save memory and improve performance. However, this compression removes some detail from the original vectors.&lt;/p&gt;

&lt;p&gt;This can slightly reduce the accuracy of our similarity searches because the quantized vectors are approximations of the original data. To mitigate this loss of accuracy, you can use &lt;strong&gt;oversampling&lt;/strong&gt; and &lt;strong&gt;rescoring&lt;/strong&gt;, which help improve the accuracy of the final search results.&lt;/p&gt;

&lt;p&gt;The original vectors are never deleted during this process, and you can easily switch between quantization methods or parameters by updating the collection configuration at any time.&lt;/p&gt;

&lt;p&gt;Here’s how the process works, step by step:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Initial Quantized Search
&lt;/h3&gt;

&lt;p&gt;When you perform a search, Qdrant first retrieves the top candidates using the quantized vectors, ranking them by their approximate similarity to the query vector. This step is fast because the quantized vectors are small and cheap to compare.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkzu1flsaiy7pmwwqgyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkzu1flsaiy7pmwwqgyt.png" alt="ANN Search with Quantization" width="800" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Oversampling
&lt;/h3&gt;

&lt;p&gt;Oversampling is a technique that helps compensate for any precision lost due to quantization. Since quantization simplifies vectors, some relevant matches could be missed in the initial search. To avoid this, you can &lt;strong&gt;retrieve more candidates&lt;/strong&gt;, increasing the chances that the most relevant vectors make it into the final results.&lt;/p&gt;

&lt;p&gt;You can control the number of extra candidates by setting an &lt;code&gt;oversampling&lt;/code&gt; parameter. For example, if your desired number of results (&lt;code&gt;limit&lt;/code&gt;) is 4 and you set an &lt;code&gt;oversampling&lt;/code&gt; factor of 2, Qdrant will retrieve 8 candidates (4 × 2).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv52syp3swcfq5m2f9v7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv52syp3swcfq5m2f9v7.png" alt="ANN Search with Quantization and Oversampling" width="800" height="596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can adjust the oversampling factor to control how many extra vectors Qdrant includes in the initial pool. More candidates mean a better chance of obtaining high-quality top-K results, especially after rescoring with the original vectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Rescoring with Original Vectors
&lt;/h3&gt;

&lt;p&gt;After oversampling gathers a larger pool of potential matches, each candidate is re-evaluated to recover the accuracy lost to quantization.&lt;/p&gt;

&lt;p&gt;The rescoring process &lt;strong&gt;maps&lt;/strong&gt; each quantized candidate back to its corresponding original vector and recomputes its similarity to the query at full precision, correcting ranking errors introduced by the compressed representation and leading to more accurate results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xofzja0fqb4xutgqqaz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xofzja0fqb4xutgqqaz.png" alt="Rescoring with Original Vectors" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During rescoring, one of the lower-ranked candidates from oversampling might turn out to be a better match than some of the original top-K candidates.&lt;/p&gt;

&lt;p&gt;Even though rescoring uses the original, larger vectors, the process remains much faster because only a very small number of vectors are read. The initial quantized search already identifies the specific vectors to read, rescore, and rerank.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Reranking
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reranking&lt;/strong&gt; is where the final top-K candidates are determined, using the updated similarity scores produced by rescoring.&lt;/p&gt;

&lt;p&gt;For example, in our case with a limit of 4, a candidate that ranked 6th in the initial quantized search might improve its score after rescoring, because the original vectors preserve the precision that quantization discards. As a result, this candidate could move into the final top 4 after reranking, replacing a less relevant option from the initial search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff45cru5len2fprc22rx9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff45cru5len2fprc22rx9.png" alt="Reranking with Original Vectors" width="800" height="610"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's how you can set it up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /collections/{collection_name}/points/search
{
  "query": [0.22, -0.01, -0.98, 0.37],
  "params": {
    "quantization": {
      "rescore": true,
      "oversampling": 2
    }
  },
  "limit": 4
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can adjust the &lt;code&gt;oversampling&lt;/code&gt; factor to find the right balance between search speed and result accuracy.&lt;/p&gt;

&lt;p&gt;If quantization is impacting performance in an application that requires high accuracy, combining oversampling with rescoring is a great choice. However, if you need faster searches and can tolerate some loss in accuracy, you might choose to use oversampling without rescoring, or adjust the oversampling factor to a lower value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Distributing Resources Between Disk &amp;amp; Memory
&lt;/h2&gt;

&lt;p&gt;Qdrant stores both the original and the quantized vectors. When quantization is enabled, both are kept in RAM by default. Moving the original vectors to disk significantly reduces RAM usage and lowers system costs, but simply enabling quantization is not enough: you need to explicitly move the original vectors to disk by setting &lt;code&gt;on_disk=True&lt;/code&gt;.&lt;/p&gt;
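&lt;p&gt;A rough back-of-envelope calculation shows what is at stake. Assuming one million 1536-dimensional &lt;code&gt;float32&lt;/code&gt; vectors and binary quantization at one bit per dimension:&lt;/p&gt;

```python
# RAM estimate: original float32 vectors vs. their binary-quantized form.
NUM_VECTORS = 1_000_000
DIM = 1536

original_bytes = NUM_VECTORS * DIM * 4    # float32 = 4 bytes per dimension
quantized_bytes = NUM_VECTORS * DIM // 8  # binary  = 1 bit per dimension

print(f"originals: {original_bytes / 1e9:.2f} GB")   # 6.14 GB
print(f"quantized: {quantized_bytes / 1e9:.2f} GB")  # 0.19 GB
print(f"compression: {original_bytes // quantized_bytes}x")  # 32x
```

&lt;p&gt;Keeping only the quantized vectors in RAM shrinks the hot footprint 32x, which is why moving the originals to disk matters so much.&lt;/p&gt;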

&lt;p&gt;Here’s an example configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/{collection_name}
{
  "vectors": {
    "size": 1536,
    "distance": "Cosine",
    "on_disk": true  # Move original vectors to disk
  },
  "quantization_config": {
    "binary": {
      "always_ram": true  # Store only quantized vectors in RAM
    }
  }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without explicitly setting &lt;code&gt;on_disk=True&lt;/code&gt;, you won't see any RAM savings, even with quantization enabled. So, make sure to configure both storage and quantization options based on your memory and performance needs. If your storage has high disk latency, consider disabling rescoring to maintain speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speeding Up Rescoring with io_uring
&lt;/h3&gt;

&lt;p&gt;When dealing with large collections of quantized vectors, rescoring requires frequent disk reads to retrieve both the original and the compressed data. While &lt;code&gt;mmap&lt;/code&gt; makes I/O more efficient by reducing user-to-kernel transitions, the sheer volume of reads can still slow rescoring down for large on-disk datasets.&lt;/p&gt;

&lt;p&gt;On Linux-based systems, &lt;code&gt;io_uring&lt;/code&gt; allows multiple disk operations to be processed in parallel, significantly reducing I/O overhead. This optimization is particularly effective during rescoring, where multiple vectors need to be re-evaluated after the initial search. With io_uring, Qdrant can retrieve and rescore vectors from disk in the most efficient way, improving overall search performance.&lt;/p&gt;

&lt;p&gt;Without it, these many small, parallel vector reads are bottlenecked by the system’s limited ability to handle concurrent disk accesses.&lt;/p&gt;

&lt;p&gt;To enable &lt;code&gt;io_uring&lt;/code&gt; in Qdrant, add the following to your storage configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;async_scorer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Enable io_uring for async storage&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this configuration, Qdrant will default to using &lt;code&gt;mmap&lt;/code&gt; for disk I/O operations.&lt;/p&gt;

&lt;p&gt;For more information and benchmarks comparing io_uring with traditional I/O approaches like mmap, check out &lt;a href="https://qdrant.tech/articles/io_uring/" rel="noopener noreferrer"&gt;Qdrant's io_uring implementation article.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance of Quantized vs. Non-Quantized Data
&lt;/h2&gt;

&lt;p&gt;Qdrant uses the quantized vectors by default if they are available. If you want to evaluate how quantization affects your search results, you can temporarily disable it to compare results from quantized and non-quantized searches. To do this, set &lt;code&gt;ignore: true&lt;/code&gt; in the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /collections/{collection_name}/points/query
{
    "query": [0.22, -0.01, -0.98, 0.37],
    "params": {
        "quantization": {
            "ignore": true
        }
    },
    "limit": 4
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
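&lt;p&gt;With both query variants available, you can quantify the impact yourself by comparing the two result sets. The sketch below estimates recall@K on toy data, using a brute-force exact search as ground truth; data and helper names are illustrative only:&lt;/p&gt;

```python
# Measure quantization impact as recall@K: how many of the exact top-K
# also appear in the top-K of a binary-quantized (Hamming) search.
import random

random.seed(0)
DIM, N, K = 64, 300, 10

vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
query = [random.gauss(0, 1) for _ in range(DIM)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

bits = [[1 if x > 0 else 0 for x in v] for v in vectors]
q_bits = [1 if x > 0 else 0 for x in query]

# Ground truth: exact similarity search over the original vectors.
exact_top = set(sorted(range(N), key=lambda i: dot(query, vectors[i]), reverse=True)[:K])
# Approximate: ranking by Hamming distance on the sign bits.
quant_top = set(sorted(range(N), key=lambda i: hamming(q_bits, bits[i]))[:K])

recall = len(exact_top.intersection(quant_top)) / K
print(f"recall@{K}: {recall:.2f}")
```

&lt;p&gt;A recall close to 1.0 means quantization is barely affecting your results; a low recall suggests raising the oversampling factor or reconsidering the quantization method.&lt;/p&gt;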



&lt;h3&gt;
  
  
  Switching Between Quantization Methods
&lt;/h3&gt;

&lt;p&gt;Not sure if you’ve chosen the right quantization method? In Qdrant, you have the flexibility to remove quantization and rely solely on the original vectors, adjust the quantization type, or change compression parameters at any time without affecting your original vectors.&lt;/p&gt;

&lt;p&gt;To switch the collection to product quantization and adjust the compression level, for example, update the collection’s quantization configuration (the Python client exposes this as &lt;code&gt;update_collection&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/{collection_name}
{
  "vectors": {
    "size": 1536,
    "distance": "Cosine"
  },
  "quantization_config": {
    "binary": {
      "always_ram": true,
      "compression_rate": 0.8  # Set the new compression rate
    }
  }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you decide to &lt;strong&gt;turn off quantization&lt;/strong&gt; and use only the original vectors, you can remove the quantization settings entirely by setting &lt;code&gt;quantization_config&lt;/code&gt; to &lt;code&gt;null&lt;/code&gt; (&lt;code&gt;None&lt;/code&gt; in the Python client):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/my_collection
{
  "vectors": {
    "size": 1536,
    "distance": "Cosine"
  },
  "quantization_config": null  # Remove quantization and use original vectors only
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7b9hbwu41bmnt0141tu1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7b9hbwu41bmnt0141tu1.png" alt="Astronaut leaving" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quantization methods like Scalar, Product, and Binary Quantization offer powerful ways to optimize memory usage and improve search performance when dealing with large datasets of high-dimensional vectors. Each method comes with its own trade-offs between memory savings, computational speed, and accuracy.&lt;/p&gt;

&lt;p&gt;Here are some final thoughts to help you choose the right quantization method for your needs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Quantization Method&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Key Features&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;When to Use&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Binary Quantization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• &lt;strong&gt;Fastest method and most memory-efficient&lt;/strong&gt;&lt;br&gt;•  Up to &lt;strong&gt;40x&lt;/strong&gt; faster search and &lt;strong&gt;32x&lt;/strong&gt; reduced memory footprint&lt;/td&gt;
&lt;td&gt;• Use with tested models like OpenAI's &lt;code&gt;text-embedding-ada-002&lt;/code&gt; and Cohere's &lt;code&gt;embed-english-v2.0&lt;/code&gt;&lt;br&gt;• When speed and memory efficiency are critical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scalar Quantization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• &lt;strong&gt;Minimal loss of accuracy&lt;/strong&gt;&lt;br&gt;•  Up to &lt;strong&gt;4x&lt;/strong&gt; reduced memory footprint&lt;/td&gt;
&lt;td&gt;• Safe default choice for most applications.&lt;br&gt;• Offers a good balance between accuracy, speed, and compression.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Product Quantization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• &lt;strong&gt;Highest compression ratio&lt;/strong&gt;&lt;br&gt;• Up to &lt;strong&gt;64x&lt;/strong&gt; reduced memory footprint&lt;/td&gt;
&lt;td&gt;• When minimizing memory usage is the top priority&lt;br&gt;• Acceptable if some loss of accuracy and slower indexing is tolerable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
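&lt;p&gt;To make the scalar row concrete, here is a minimal sketch of the idea behind scalar quantization: each &lt;code&gt;float32&lt;/code&gt; component is mapped to a single byte, giving a 4x smaller representation at the cost of a small rounding error. The uniform min-max mapping shown here is a simplification, not Qdrant's exact scheme (which clips outliers by quantile):&lt;/p&gt;

```python
# Scalar quantization sketch: map float32 components to one byte each.
import random

random.seed(1)
vec = [random.uniform(-1.0, 1.0) for _ in range(1536)]

lo, hi = min(vec), max(vec)
scale = (hi - lo) / 255.0

def quantize(x):
    return round((x - lo) / scale)   # integer in [0, 255], stored as one byte

def dequantize(q):
    return q * scale + lo

quantized = [quantize(x) for x in vec]
restored = [dequantize(q) for q in quantized]

# Rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(vec, restored))
print(f"max rounding error: {max_err:.4f}")
print(f"compression: {4 * len(vec)} bytes -> {len(vec)} bytes (4x)")
```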

&lt;h3&gt;
  
  
  Learn More
&lt;/h3&gt;

&lt;p&gt;If you want to learn more about improving accuracy, memory efficiency, and speed when using quantization in Qdrant, we have a dedicated &lt;a href="https://qdrant.tech/documentation/guides/quantization/#quantization-tips" rel="noopener noreferrer"&gt;Quantization tips&lt;/a&gt; section in our docs with practical tips for getting the best results.&lt;/p&gt;

&lt;p&gt;Learn more about optimizing real-time precision with oversampling in Binary Quantization by watching this interview with Qdrant’s CTO, Andrey Vasnetsov:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/4aUq5VnR_VI"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Stay up-to-date on the latest in &lt;a href="https://dev.to/advanced-search/"&gt;vector search&lt;/a&gt; and quantization: share your projects, ask questions, and &lt;a href="https://discord.com/invite/qdrant" rel="noopener noreferrer"&gt;join our vector search community&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>A Complete Guide to Filtering in Vector Search</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Thu, 12 Sep 2024 14:45:10 +0000</pubDate>
      <link>https://dev.to/qdrant/a-complete-guide-to-filtering-in-vector-search-33lk</link>
      <guid>https://dev.to/qdrant/a-complete-guide-to-filtering-in-vector-search-33lk</guid>
      <description>&lt;p&gt;Imagine you sell computer hardware. To help shoppers easily find products on your website, you need to have a &lt;strong&gt;user-friendly &lt;a href="https://qdrant.tech" rel="noopener noreferrer"&gt;search engine&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0vwfd8cpy0698f5mhq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0vwfd8cpy0698f5mhq6.png" alt="vector-search-ecommerce" width="800" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re selling computers and have extensive data on laptops, desktops, and accessories, your search feature should guide customers to the exact device they want - or a &lt;strong&gt;very similar&lt;/strong&gt; match.&lt;/p&gt;

&lt;p&gt;When storing data in Qdrant, each product is a point, consisting of an &lt;code&gt;id&lt;/code&gt;, a &lt;code&gt;vector&lt;/code&gt;, and a &lt;code&gt;payload&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="nl"&gt;"vector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;899.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"laptop"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;id&lt;/code&gt; is a unique identifier for the point in your collection. The &lt;code&gt;vector&lt;/code&gt; is a mathematical representation of similarity to other points in the collection. &lt;br&gt;
Finally, the &lt;code&gt;payload&lt;/code&gt; holds metadata that directly describes the point.&lt;/p&gt;

&lt;p&gt;Though we may not be able to decipher the vector, we can derive additional information about the item from its metadata. In this specific case, &lt;strong&gt;we are looking at a data point for a laptop that costs $899.99&lt;/strong&gt;. &lt;/p&gt;
&lt;h2&gt;
  
  
  What is filtering?
&lt;/h2&gt;

&lt;p&gt;When searching for the perfect computer, your customers may end up with results that are mathematically similar to the search entry, but not exact. For example, if they are searching for &lt;strong&gt;laptops under $1000&lt;/strong&gt;, a simple &lt;a href="https://dev.to/advanced-search/"&gt;vector search&lt;/a&gt; without constraints might still show laptops over $1000. &lt;/p&gt;

&lt;p&gt;This is why &lt;a href="https://dev.to/advanced-search/"&gt;semantic search&lt;/a&gt; alone &lt;strong&gt;may not be enough&lt;/strong&gt;. In order to get the exact result, you would need to enforce a payload filter on the &lt;code&gt;price&lt;/code&gt;. Only then can you be sure that the search results abide by the chosen characteristic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is called &lt;strong&gt;filtering&lt;/strong&gt; and it is one of the key features of &lt;a href="https://qdrant.tech" rel="noopener noreferrer"&gt;vector databases&lt;/a&gt;. &lt;br&gt;
Here is how a &lt;strong&gt;filtered vector search&lt;/strong&gt; looks behind the scenes. We'll cover its mechanics in the following section.&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /collections/online_store/points/search
{
  "vector": [ 0.2, 0.1, 0.9, 0.7 ],
  "filter": {
    "must": [
      {
        "key": "category",
        "match": { "value": "laptop" }
      },
      {
        "key": "price",
        "range": {
          "lte": 1000
        }
      }
    ]
  },
  "limit": 3,
  "with_payload": true,
  "with_vector": false
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The filtered result will be a combination of the semantic search and the filtering conditions imposed upon the query. In the following pages, we will show that &lt;strong&gt;filtering is a key practice in vector search for two reasons:&lt;/strong&gt; &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;With filtering, you can &lt;strong&gt;dramatically increase search precision&lt;/strong&gt;. More on this in the next section.&lt;/li&gt;
&lt;li&gt;Filtering helps control resources and &lt;strong&gt;reduce compute use&lt;/strong&gt;. More on this in &lt;strong&gt;Payload Indexing&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What you will learn in this guide:
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/advanced-search/"&gt;vector search&lt;/a&gt;, filtering and sorting are more interdependent than they are in traditional databases. While databases like SQL use commands such as &lt;code&gt;WHERE&lt;/code&gt; and &lt;code&gt;ORDER BY&lt;/code&gt;, the interplay between these processes in vector search is a bit more complex.&lt;/p&gt;

&lt;p&gt;Most people use default settings and build vector search apps that aren't properly configured or even set up for precise retrieval. In this guide, we will show you how to &lt;strong&gt;use filtering to get the most out of vector search&lt;/strong&gt; with some basic and advanced strategies that are easy to implement. &lt;/p&gt;

&lt;h3&gt;
  
  
  Remember to run all tutorial code in Qdrant's Dashboard
&lt;/h3&gt;

&lt;p&gt;The easiest way to reach that "Hello World" moment is to &lt;a href="https://dev.to/documentation/quickstart-cloud/"&gt;&lt;strong&gt;try filtering in a live cluster&lt;/strong&gt;&lt;/a&gt;. Our interactive tutorial will show you how to create a cluster, add data and try some filtering clauses. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyvsxdq1qtak3kxt45y6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyvsxdq1qtak3kxt45y6.png" alt="qdrant-filtering-tutorial" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Qdrant's approach to filtering
&lt;/h2&gt;

&lt;p&gt;Qdrant follows a specific method of searching and filtering through dense vectors. &lt;/p&gt;

&lt;p&gt;Let's take a look at this &lt;strong&gt;3-stage diagram&lt;/strong&gt;. In this case, we are trying to find the nearest neighbor to the query vector &lt;strong&gt;(green)&lt;/strong&gt;. Your search journey starts at the bottom &lt;strong&gt;(orange)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By default, Qdrant connects all your data points within the &lt;a href="https://dev.to/documentation/concepts/indexing/"&gt;&lt;strong&gt;vector index&lt;/strong&gt;&lt;/a&gt;. After you &lt;a href="https://dev.to/documentation/concepts/filtering/"&gt;&lt;strong&gt;introduce filters&lt;/strong&gt;&lt;/a&gt;, some data points become disconnected. Vector search can't cross the grayed out area and it won't reach the nearest neighbor. &lt;br&gt;
How can we bridge this gap? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; How Qdrant maintains a filterable vector index. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw5ev6l3u7bzp8861sfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw5ev6l3u7bzp8861sfp.png" alt="filterable-vector-index" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/documentation/concepts/indexing/"&gt;&lt;strong&gt;Filterable vector index&lt;/strong&gt;&lt;/a&gt;: This technique builds additional links &lt;strong&gt;(orange)&lt;/strong&gt; between leftover data points. The filtered points which stay behind are now traversible once again. Qdrant uses special category-based methods to connect these data points. &lt;/p&gt;
&lt;h2&gt;
  
  
  Qdrant's approach vs traditional filtering methods
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tgwitznyx39lxze4ad1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tgwitznyx39lxze4ad1.png" alt="stepping-lens" width="800" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The filterable vector index is Qdrant's answer to the pre- and post-filtering problem: it adds specialized links to the search graph. This maintains the speed advantages of vector search while allowing precise filtering, avoiding the inefficiencies that occur when filters are applied only after the vector search.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pre-filtering
&lt;/h3&gt;

&lt;p&gt;In pre-filtering, a search engine first narrows down the dataset based on chosen metadata values, and then searches within that filtered subset. This reduces unnecessary computation over a dataset that is potentially much larger.&lt;/p&gt;

&lt;p&gt;The choice between pre-filtering and using the filterable HNSW index depends on filter cardinality. When metadata cardinality is too low, the filter becomes restrictive and it can disrupt the connections within the graph. This leads to fragmented search paths (as in &lt;strong&gt;Figure 1&lt;/strong&gt;). When the semantic search process begins, it won’t be able to travel to those locations. &lt;/p&gt;

&lt;p&gt;However, Qdrant still benefits from pre-filtering &lt;strong&gt;under certain conditions&lt;/strong&gt;. In cases of low cardinality, Qdrant's query planner stops using HNSW and switches over to the payload index alone. This makes the search process much cheaper and faster than if using HNSW.&lt;/p&gt;
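&lt;p&gt;A toy version of that planner decision might look like this; the threshold and function names are made up purely to illustrate the trade-off (Qdrant's real heuristic is internal):&lt;/p&gt;

```python
# Toy planner heuristic: when a filter matches only a tiny fraction of the
# collection, scanning the payload-index hits directly is cheaper than
# traversing the HNSW graph.
def choose_plan(filtered_count, total_points, ratio=0.01):
    threshold = total_points * ratio
    if threshold >= filtered_count:
        return "payload_index_scan"
    return "filterable_hnsw"

print(choose_plan(50, 1_000_000))       # low cardinality: scan the few hits
print(choose_plan(200_000, 1_000_000))  # high cardinality: use the HNSW graph
```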

&lt;p&gt;&lt;strong&gt;Figure 2:&lt;/strong&gt; On the user side, this is how filtering looks. We start with five products with different prices. First, the $1000 price &lt;strong&gt;filter&lt;/strong&gt; is applied, narrowing down the selection of laptops. Then, a vector search finds the relevant &lt;strong&gt;results&lt;/strong&gt; within this filtered set. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxomreb49ere8ow2n5d6g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxomreb49ere8ow2n5d6g.png" alt="pre-filtering-vector-search" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In conclusion, pre-filtering is efficient in specific cases when you use small datasets with low cardinality metadata. However, pre-filtering should not be used over large datasets as it breaks too many links in the HNSW graph, causing lower accuracy.&lt;/p&gt;
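&lt;p&gt;In plain Python, pre-filtering on the five-laptop dataset looks like this; a dot product stands in for the real similarity measure:&lt;/p&gt;

```python
# Pre-filtering sketch: apply the payload filter first, then run a
# brute-force similarity search over the surviving subset only.
laptops = [
    (1, [0.1, 0.2, 0.3, 0.4], {"price": 899.99}),
    (2, [0.2, 0.3, 0.4, 0.5], {"price": 1299.99}),
    (3, [0.3, 0.4, 0.5, 0.6], {"price": 799.99}),
    (4, [0.4, 0.5, 0.6, 0.7], {"price": 1099.99}),
    (5, [0.5, 0.6, 0.7, 0.8], {"price": 949.99}),
]
query = [0.2, 0.1, 0.9, 0.7]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Step 1: the price filter narrows the dataset down first.
subset = [p for p in laptops if 1000 >= p[2]["price"]]

# Step 2: the similarity search only ever sees the filtered subset.
results = sorted(subset, key=lambda p: dot(query, p[1]), reverse=True)[:3]
print([point_id for point_id, _, _ in results])  # [5, 3, 1]
```

&lt;p&gt;All three laptops under $1000 are reachable, because the filter runs before the search.&lt;/p&gt;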
&lt;h3&gt;
  
  
  Post-filtering
&lt;/h3&gt;

&lt;p&gt;In post-filtering, a search engine first looks for similar vectors and retrieves a larger set of results. Then, it applies filters to those results based on metadata. The problem with post-filtering becomes apparent when using low-cardinality filters. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you apply a low-cardinality filter after performing a vector search, you often end up discarding a large portion of the results that the vector search returned.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Figure 3:&lt;/strong&gt; In the same example, we have five laptops. First, the vector search finds the top two relevant &lt;strong&gt;results&lt;/strong&gt;, but they may not satisfy the price filter. When the $1000 price &lt;strong&gt;filter&lt;/strong&gt; is applied, other potential results are discarded.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqofpcv9w7q0exrq0cuot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqofpcv9w7q0exrq0cuot.png" alt="post-filtering-vector-search" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The system will waste computational resources by first finding similar vectors and then discarding many that don't meet the filter criteria. You're also limited to filtering only from the initial set of &lt;a href="https://dev.to/advanced-search/"&gt;vector search&lt;/a&gt; results. If your desired items aren't in this initial set, you won't find them, even if they exist in the database.&lt;/p&gt;
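&lt;p&gt;The same toy dataset shows this failure mode in code: searching first and filtering afterwards returns fewer results than requested, and silently misses relevant laptops:&lt;/p&gt;

```python
# Post-filtering sketch: search the whole collection first (top-2), then
# apply the price filter to whatever came back.
laptops = [
    (1, [0.1, 0.2, 0.3, 0.4], {"price": 899.99}),
    (2, [0.2, 0.3, 0.4, 0.5], {"price": 1299.99}),
    (3, [0.3, 0.4, 0.5, 0.6], {"price": 799.99}),
    (4, [0.4, 0.5, 0.6, 0.7], {"price": 1099.99}),
    (5, [0.5, 0.6, 0.7, 0.8], {"price": 949.99}),
]
query = [0.2, 0.1, 0.9, 0.7]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Step 1: vector search over ALL points, keeping the top 2.
top_two = sorted(laptops, key=lambda p: dot(query, p[1]), reverse=True)[:2]

# Step 2: the filter runs only on those 2 results and discards laptop 4.
survivors = [p for p in top_two if 1000 >= p[2]["price"]]
print([point_id for point_id, _, _ in survivors])  # [5]
```

&lt;p&gt;Only laptop 5 survives: laptops 3 and 1 are under budget and reasonably similar, but they never made the initial top-2, so post-filtering can never return them.&lt;/p&gt;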
&lt;h2&gt;
  
  
  Basic filtering example: ecommerce and laptops
&lt;/h2&gt;

&lt;p&gt;We know that there are three possible laptops that suit our price point. &lt;br&gt;
Let's see how Qdrant's filterable vector index works and why it is the best method of capturing all available results.  &lt;/p&gt;

&lt;p&gt;First, add five new laptops to your online store. Here is a sample input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;laptops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;899.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;laptop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1299.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;laptop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;799.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;laptop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1099.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;laptop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;949.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;laptop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The four-dimensional vector could represent features such as CPU performance, RAM, or battery life; the example leaves that unspecified. The payload, however, stores the exact price and product category.&lt;/p&gt;

&lt;p&gt;Now, set the filter to "price is less than or equal to $1000":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gte"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lte"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this "price less than or equal to $1000" filter applied, the vector search returns the following results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9978443564622781&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;799.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"laptop"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9938079894227599&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;899.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"laptop"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9903751498208603&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;949.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"laptop"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, only the laptops priced at $1000 or less are returned, still ranked by similarity. Because Qdrant applies the filter during the vector search itself rather than discarding results afterwards, it is far less likely to miss valid matches.&lt;/p&gt;

&lt;p&gt;This specific example uses the &lt;code&gt;range&lt;/code&gt; condition for filtering. Qdrant, however, offers many other ways to structure a filter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For detailed usage examples, &lt;a href="https://dev.to/documentation/concepts/filtering/"&gt;filtering&lt;/a&gt; docs are the best resource.&lt;/strong&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Scrolling instead of searching
&lt;/h2&gt;

&lt;p&gt;You don't need to use our &lt;code&gt;search&lt;/code&gt; and &lt;code&gt;query&lt;/code&gt; APIs to filter through data. The &lt;code&gt;scroll&lt;/code&gt; API is another option that lets you retrieve lists of points matching a filter.&lt;/p&gt;

&lt;p&gt;If you aren't interested in finding similar points, you can simply list the ones that match a given filter. While search returns the points most similar to a query vector, scroll returns all points matching your filter, regardless of similarity. &lt;/p&gt;

&lt;p&gt;In Qdrant, scrolling is used to iteratively &lt;strong&gt;retrieve large sets of points from a collection&lt;/strong&gt;. It is particularly useful when you’re dealing with a large number of points and don’t want to load them all at once. Instead, Qdrant provides a way to scroll through the points &lt;strong&gt;one page at a time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You start by sending a scroll request to Qdrant with specific conditions like filtering by payload, vector search, or other criteria.&lt;/p&gt;

&lt;p&gt;Let's retrieve the top 10 laptops in the store, ordered by price:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /collections/online_store/points/scroll
{
    "filter": {
        "must": [
            {
                "key": "category",
                "match": {
                    "value": "laptop"
                }
            }
        ]
    },
    "limit": 10,
    "with_payload": true,
    "with_vector": false,
    "order_by": {
        "key": "price"
    }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response contains a batch of points that match the criteria and a reference (offset or next page token) to retrieve the next set of points.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://dev.to/documentation/concepts/points/#scroll-points"&gt;&lt;strong&gt;Scrolling&lt;/strong&gt;&lt;/a&gt; is designed to be efficient. It minimizes the load on the server and reduces memory consumption on the client side by returning only manageable chunks of data at a time.&lt;/p&gt;
&lt;h3&gt;
  
  
  Available filtering conditions
&lt;/h3&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Condition&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Condition&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Match&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exact value match.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Range&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter by value range.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Match Any&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Match multiple values.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Datetime Range&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter by date range.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Match Except&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exclude specific values.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;UUID Match&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Match UUID payload values.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Nested Key&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter by nested data.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Geo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter by location.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Nested Object&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter by nested objects.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Values Count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter by element count.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full Text Match&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Search in text fields.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Is Empty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter empty fields.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Has ID&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter by point IDs.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Is Null&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filter null values.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;All clauses and conditions are outlined in Qdrant's &lt;a href="https://dev.to/documentation/concepts/filtering/"&gt;filtering&lt;/a&gt; documentation. &lt;/p&gt;
&lt;h3&gt;
  
  
  Filtering clauses to remember
&lt;/h3&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Clause&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Clause&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Must&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Includes items that meet the condition  (similar to &lt;code&gt;AND&lt;/code&gt;).&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Should&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filters if at least one condition is met  (similar to &lt;code&gt;OR&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Must Not&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excludes items that meet the condition  (similar to &lt;code&gt;NOT&lt;/code&gt;).&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Clauses Combination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Combines multiple clauses to refine filtering  (similar to &lt;code&gt;AND&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Advanced filtering example: dinosaur diets
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6wzc3g0uqu3rylr714u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6wzc3g0uqu3rylr714u.png" alt="advanced-payload-filtering" width="800" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can also use nested filtering to query arrays of objects within the payload. In this example, we have two points, each representing a dinosaur with a list of food preferences (&lt;code&gt;diet&lt;/code&gt;) indicating which foods it likes or dislikes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dinosaur"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"t-rex"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"diet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"food"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"leaves"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"food"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"meat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dinosaur"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"diplodocus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"diet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"food"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"leaves"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"food"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"meat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To ensure that both conditions are applied to the same array element (e.g., food = meat and likes = true must refer to the same diet item), you need to use a nested filter.&lt;/p&gt;

&lt;p&gt;Nested filters apply conditions within an array of objects, ensuring the conditions are evaluated per array element rather than across all elements. For contrast, let's first run the two conditions with the plain &lt;code&gt;diet[].&lt;/code&gt; projection syntax, where each condition can be satisfied by a different array element:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /collections/dinosaurs/points/scroll
{
    "filter": {
        "must": [
            {
                "key": "diet[].food",
                "match": {
                    "value": "meat"
                }
            },
            {
                "key": "diet[].likes",
                "match": {
                    "value": true
                }
            }
        ]
    }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scroll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dinosaurs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scroll_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diet[].food&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MatchValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diet[].likes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MatchValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scroll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;dinosaurs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;must&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;diet[].food&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;meat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;diet[].likes&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;qdrant_client&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;qdrant&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ScrollPointsBuilder&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;
    &lt;span class="nf"&gt;.scroll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;ScrollPointsBuilder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"dinosaurs"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;must&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="nn"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"diet[].food"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meat"&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
            &lt;span class="nn"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"diet[].likes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;])),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.List&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;qdrant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ConditionFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;qdrant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ConditionFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;matchKeyword&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;io.qdrant.client.QdrantClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;io.qdrant.client.QdrantGrpcClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;io.qdrant.client.grpc.Points.Filter&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;io.qdrant.client.grpc.Points.ScrollPoints&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;QdrantClient&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;QdrantClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;QdrantGrpcClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6334&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;scrollAsync&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ScrollPoints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setCollectionName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"dinosaurs"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAllMust&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matchKeyword&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"diet[].food"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meat"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"diet[].likes"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt;
                    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Qdrant.Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;Qdrant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Grpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Conditions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;6334&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ScrollAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collectionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"dinosaurs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;MatchKeyword&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"diet[].food"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nf"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"diet[].likes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This happens because both points match both conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the "t-rex" matches food=meat on &lt;code&gt;diet[1].food&lt;/code&gt; and likes=true on &lt;code&gt;diet[1].likes&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;the "diplodocus" matches food=meat on &lt;code&gt;diet[1].food&lt;/code&gt; and likes=true on &lt;code&gt;diet[0].likes&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To retrieve only the points where the conditions hold within the same array element (such as the point with id 1 in this example), you need to use a nested object filter.&lt;/p&gt;

&lt;p&gt;Nested object filters enable querying arrays of objects independently, ensuring conditions are checked within individual array elements.&lt;/p&gt;

&lt;p&gt;This is done by using the &lt;code&gt;nested&lt;/code&gt; condition type, which consists of a payload key that targets an array and a filter to apply. The key should reference an array of objects and can be written with or without bracket notation (e.g., "data" or "data[]").&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /collections/dinosaurs/points/scroll
{
    "filter": {
        "must": [{
            "nested": {
                "key": "diet",
                "filter":{
                    "must": [
                        {
                            "key": "food",
                            "match": {
                                "value": "meat"
                            }
                        },
                        {
                            "key": "likes",
                            "match": {
                                "value": true
                            }
                        }
                    ]
                }
            }
        }]
    }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scroll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dinosaurs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scroll_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NestedCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Nested&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                            &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;food&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MatchValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                            &lt;span class="p"&gt;),&lt;/span&gt;
                            &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;likes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MatchValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                            &lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scroll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;dinosaurs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;must&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;nested&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;diet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;must&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
              &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;food&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;meat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;likes&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;qdrant_client&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;qdrant&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NestedCondition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ScrollPointsBuilder&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;
    &lt;span class="nf"&gt;.scroll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;ScrollPointsBuilder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"dinosaurs"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;must&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;NestedCondition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"diet"&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;must&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nn"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meat"&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
                &lt;span class="nn"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;])),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()])),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.List&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;qdrant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ConditionFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;qdrant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ConditionFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;matchKeyword&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;qdrant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ConditionFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;nested&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;io.qdrant.client.grpc.Points.Filter&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;io.qdrant.client.grpc.Points.ScrollPoints&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;scrollAsync&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ScrollPoints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setCollectionName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"dinosaurs"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addMust&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                            &lt;span class="s"&gt;"diet"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                            &lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAllMust&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                                    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                                        &lt;span class="n"&gt;matchKeyword&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meat"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"likes"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt;
                                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
                    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Qdrant.Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;Qdrant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Grpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Conditions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;6334&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ScrollAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collectionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"dinosaurs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Nested&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"diet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;MatchKeyword&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"food"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"meat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nf"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The matching logic is adjusted to operate at the level of individual elements within an array in the payload.&lt;/p&gt;

&lt;p&gt;Nested filters function as though each element of the array is evaluated separately. The parent document will be considered a match if at least one array element satisfies the nested filter conditions.&lt;/p&gt;
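The difference between the two matching modes can be sketched in plain Python. This is only an illustration of the semantics described above, not Qdrant's implementation; the sample payloads mirror the t-rex and diplodocus points from earlier in the article.

```python
def flat_match(diet, conditions):
    """Plain array filtering: each condition may be satisfied by a
    *different* element of the array."""
    return all(any(e.get(k) == v for e in diet) for k, v in conditions.items())

def nested_match(diet, conditions):
    """Nested filtering: a *single* array element must satisfy every
    condition at once."""
    return any(all(e.get(k) == v for k, v in conditions.items()) for e in diet)

points = {
    "t-rex": [{"food": "leaves", "likes": False}, {"food": "meat", "likes": True}],
    "diplodocus": [{"food": "leaves", "likes": True}, {"food": "meat", "likes": False}],
}
conditions = {"food": "meat", "likes": True}

flat = [name for name, diet in points.items() if flat_match(diet, conditions)]
nested = [name for name, diet in points.items() if nested_match(diet, conditions)]
print(flat)    # ['t-rex', 'diplodocus'] - conditions checked independently
print(nested)  # ['t-rex'] - only one point has an element matching both
```

Under plain matching both dinosaurs survive the filter, because "meat" and `likes: true` can come from different array elements; under nested matching only the t-rex does.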

&lt;h2&gt;
  
  
  Other creative uses for filters
&lt;/h2&gt;

&lt;p&gt;You can use filters to retrieve data points without knowing their &lt;code&gt;id&lt;/code&gt;. You can search through and manage your data using filters alone. Let's take a look at some creative uses for filters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/points/#delete-points"&gt;Delete Points&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Deletes all points matching the filter.&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/payload/#set-payload"&gt;Set Payload&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Adds payload fields to all points matching the filter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/points/#scroll-points"&gt;Scroll Points&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Lists all points matching the filter.&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/payload/#overwrite-payload"&gt;Update Payload&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Updates payload fields for points matching the filter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/points/#order-points-by-payload-key"&gt;Order Points&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Lists all points matching the filter, sorted by a payload key.&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/payload/#delete-payload-keys"&gt;Delete Payload&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Deletes fields for points matching the filter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/points/#counting-points"&gt;Count Points&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Totals the points matching the filter.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
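As one concrete instance of the table above, counting the carnivores in the article's `dinosaurs` collection could look like the request below. This is a sketch assuming the standard count endpoint; `"exact": true` asks for a precise total rather than a fast estimate.

```http
POST /collections/dinosaurs/points/count
{
    "filter": {
        "must": [
            {
                "key": "diet[].food",
                "match": { "value": "meat" }
            }
        ]
    },
    "exact": true
}
```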

&lt;h2&gt;
  
  
  Filtering with the payload index
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnetjc13aqk0wuoqxh06e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnetjc13aqk0wuoqxh06e.png" alt="vector-search-filtering-vector-search" width="800" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you start working with Qdrant, your data is by default organized in a vector index. &lt;br&gt;
In addition to this, we recommend adding a secondary data structure - &lt;strong&gt;the payload index&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Just as the vector index organizes vectors, the payload index structures your metadata.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 4:&lt;/strong&gt; The payload index is an additional data structure that supports vector search. A payload index (in green) organizes candidate results by cardinality, so that semantic search (in red) can traverse the vector index quickly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffur2cj4wfc4ja57jufln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffur2cj4wfc4ja57jufln.png" alt="payload-index-vector-search" width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On its own, semantic searching over terabytes of data can take up lots of RAM. &lt;a href="https://dev.to/documentation/concepts/filtering/"&gt;&lt;strong&gt;Filtering&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://dev.to/documentation/concepts/indexing/"&gt;&lt;strong&gt;Indexing&lt;/strong&gt;&lt;/a&gt; are two easy strategies to reduce your compute usage and still get the best results. Remember, this is only a guide. For an exhaustive list of filtering options, you should read the &lt;a href="https://dev.to/documentation/concepts/filtering/"&gt;filtering documentation&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Here is how you can create a single index for a metadata field "category":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/computers/index
{
    "field_name": "category",
    "field_schema": "keyword"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_payload_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;computers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;field_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you mark a field indexable, &lt;strong&gt;you don't need to do anything else&lt;/strong&gt;. Qdrant will handle all optimizations in the background.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why should you index metadata?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhez94br0kvkjiz02m74u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhez94br0kvkjiz02m74u.png" alt="payload-index-filtering" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The payload index acts as a secondary data structure that speeds up retrieval. Whenever you run vector search with a filter, Qdrant will consult a payload index - if there is one. &lt;/p&gt;

&lt;p&gt;If you are indexing your metadata, the difference in search performance can be dramatic. &lt;/p&gt;

&lt;p&gt;As your dataset grows in complexity, Qdrant takes up additional resources to go through all data points. Without a proper data structure, the search can take longer - or run out of resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Payload indexing helps evaluate the most restrictive filters
&lt;/h3&gt;

&lt;p&gt;The payload index is also used to accurately estimate &lt;strong&gt;filter cardinality&lt;/strong&gt;, which helps the query planner choose a search strategy. &lt;strong&gt;Filter cardinality&lt;/strong&gt; refers to the number of distinct values that a filter can match within a dataset. Qdrant's search strategy can switch from &lt;strong&gt;HNSW search&lt;/strong&gt; to &lt;strong&gt;payload index-based search&lt;/strong&gt; if the cardinality is too low.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it affects your queries:&lt;/strong&gt; Depending on the filter used in the search, there are several possible scenarios for query execution. Qdrant chooses one of them based on the available indexes, the complexity of the conditions, and the cardinality of the filtered result.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The planner estimates the cardinality of a filtered result before selecting a strategy.&lt;/li&gt;
&lt;li&gt;Qdrant retrieves points using the &lt;strong&gt;payload index&lt;/strong&gt; if the cardinality is below a threshold.&lt;/li&gt;
&lt;li&gt;Qdrant uses the &lt;strong&gt;filterable vector index&lt;/strong&gt; if the cardinality is above the threshold.&lt;/li&gt;
&lt;/ul&gt;
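The planner's decision can be sketched as follows. The real threshold and mechanics are internal to Qdrant; the index layout and cutoff here are made up purely to illustrate the cardinality-based choice described above.

```python
def choose_strategy(payload_index, field, value, threshold):
    """Estimate how many points match `field == value` from the payload
    index, then pick a search strategy based on that cardinality."""
    cardinality = len(payload_index.get(field, {}).get(value, []))
    if cardinality < threshold:
        # Few candidates: fetch them via the payload index and score directly.
        return "payload-index scan"
    # Many candidates: traverse the filterable HNSW vector index instead.
    return "filtered HNSW search"

# Toy payload index: field -> value -> matching point ids.
index = {"category": {"laptop": [1, 4, 7], "desktop": list(range(100, 600))}}

print(choose_strategy(index, "category", "laptop", threshold=100))
print(choose_strategy(index, "category", "desktop", threshold=100))
```

A rare value like `laptop` resolves to a handful of ids, so scanning them directly is cheaper; a common value like `desktop` matches too many points, so the vector index remains the better path.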

&lt;h3&gt;
  
  
  What happens if you don't use payload indexes?
&lt;/h3&gt;

&lt;p&gt;If you only rely on &lt;strong&gt;searching for the nearest vector&lt;/strong&gt;, Qdrant will have to go through the entire vector index, calculating similarities against every vector in the collection, relevant or not. When you filter with the help of a payload index, the HNSW algorithm won't have to evaluate every point. Furthermore, the payload index helps HNSW construct the graph with additional links.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does the payload index look?
&lt;/h2&gt;

&lt;p&gt;A payload index works much like an index in a conventional document-oriented database. It connects metadata fields with their corresponding point IDs for quick retrieval.&lt;/p&gt;

&lt;p&gt;In this example, you are indexing all of your computer hardware inside of the &lt;code&gt;computers&lt;/code&gt; collection. Let’s take a look at a sample payload index for the field &lt;code&gt;category&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Payload&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;keyword:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;+------------+-------------+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;category&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;id&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;+------------+-------------+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;laptop&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;desktop&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;speakers&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;keyboard&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;+------------+-------------+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When fields are properly indexed, the search engine knows roughly where to start its journey. It can look up points that contain relevant metadata without scanning the entire dataset, which reduces the engine’s workload considerably. As a result, queries return faster and the system scales more easily.&lt;/p&gt;
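&lt;p&gt;As a mental model, the index above is just a reverse map from field values to point IDs. A minimal sketch in Python:&lt;/p&gt;

```python
# The payload index as a plain reverse map: field value -> point IDs,
# mirroring the table above.
payload_index = {
    "laptop":   [1, 4, 7],
    "desktop":  [2, 5, 9],
    "speakers": [3, 6, 8],
    "keyboard": [10, 11],
}

# A filter such as category = "laptop" becomes a single dictionary lookup
# instead of a scan over every point in the collection.
candidate_ids = payload_index["laptop"]
print(candidate_ids)  # [1, 4, 7]
```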

&lt;blockquote&gt;
&lt;p&gt;You may create as many payload indexes as you want, and we recommend you do so for each field that is frequently used. &lt;br&gt;
If your users are often filtering by &lt;strong&gt;laptop&lt;/strong&gt; when looking up a product &lt;strong&gt;category&lt;/strong&gt;, indexing all computer metadata will speed up retrieval and make the results more precise.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Different types of payload indexes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/indexing/#full-text-index"&gt;Full-text Index&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Enables efficient text search in large datasets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/indexing/#tenant-index"&gt;Tenant Index&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;For data isolation and retrieval efficiency in multi-tenant architectures.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/indexing/#principal-index"&gt;Principal Index&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Manages data based on primary entities like users or accounts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/indexing/#on-disk-payload-index"&gt;On-Disk Index&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Stores indexes on disk to handle large datasets while keeping memory usage low.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/documentation/concepts/indexing/#parameterized-index"&gt;Parameterized Index&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Allows for dynamic querying, where the index can adapt based on different parameters or conditions provided by the user. Useful for numeric data like prices or timestamps.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Indexing payloads in multitenant setups
&lt;/h2&gt;

&lt;p&gt;Some applications need data segregation, where different users must see different data inside the same program. When setting up storage for such an application, many users assume they need a separate database for each group of users.&lt;/p&gt;

&lt;p&gt;We see this mistake quite often: users create a separate collection for each tenant inside the same cluster. This can quickly exhaust the cluster’s resources, since running vector search across too many collections uses up too much RAM. You may start seeing out-of-memory (OOM) errors and degraded performance. &lt;/p&gt;

&lt;p&gt;To mitigate this, we offer extensive support for multitenant systems, so that you can build an entire global application in one single Qdrant collection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT /collections/{collection_name}/index
{
   "field_name": "payload_field_name",
   "field_schema": {
       "type": "keyword",
       "is_tenant": true
   }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tenant index is another variant of the payload index. When creating or updating a collection, you can mark a metadata field as indexable; this time, the request specifies the field as a tenant. This means you can mark fields holding user types or customer IDs with &lt;code&gt;"is_tenant": true&lt;/code&gt;.&lt;/p&gt;
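&lt;p&gt;Conceptually, the tenant index partitions the collection by the marked field, so a query only ever touches one tenant's slice of the data. A purely illustrative sketch (the point data and tenant names are made up):&lt;/p&gt;

```python
# Illustrative only: what a tenant index buys you, modeled as a value -> IDs map.
points = [
    {"id": 1, "tenant_id": "acme",   "text": "invoice"},
    {"id": 2, "tenant_id": "globex", "text": "invoice"},
    {"id": 3, "tenant_id": "acme",   "text": "report"},
]

# Build the tenant index once, then resolve each tenant's queries with a lookup.
tenant_index = {}
for point in points:
    tenant_index.setdefault(point["tenant_id"], []).append(point["id"])

print(tenant_index["acme"])  # [1, 3] -- only acme's points are considered
```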

&lt;h2&gt;
  
  
  Key takeaways in filtering and indexing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvghqy7yhnib2rs2znau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvghqy7yhnib2rs2znau.png" alt="best-practices" width="800" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Filtering with floating-point (decimal) numbers
&lt;/h2&gt;

&lt;p&gt;If you filter on the float data type with exact matches, your search results may be imprecise or inaccurate. &lt;/p&gt;

&lt;p&gt;Float data type numbers have a decimal point and are 64 bits in size. Here is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;11.99&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you filter for a specific float number, such as 11.99, you may get a different result, like 11.98 or 12.00. Many decimal values cannot be represented exactly in binary floating point, so logically identical values may be stored slightly differently. As a consequence, searching for exact matches is unreliable.&lt;/p&gt;
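&lt;p&gt;You can see the underlying problem with nothing but plain Python, since it uses the same 64-bit floats:&lt;/p&gt;

```python
import math

# Many decimal values have no exact binary representation, so arithmetically
# equal numbers can compare as unequal.
price = 0.1 + 0.2
print(price)         # 0.30000000000000004
print(price == 0.3)  # False -- an exact-match filter would miss this point

# A tolerance-based comparison (the idea behind range-based filtering) works:
print(math.isclose(price, 0.3, abs_tol=1e-9))  # True
```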

&lt;p&gt;To avoid these inaccuracies, use a different filtering method. We recommend range-based filtering instead of exact matches. This method tolerates minor variations in the data, and it also improves performance, especially with large datasets. &lt;/p&gt;

&lt;p&gt;Here is a sample JSON range filter for values greater than or equal to 11.99 and less than or equal to the same number. This will retrieve any values within the range of 11.99, including those with additional decimal places.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"gt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"gte"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;11.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"lt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"lte"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;11.99&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Working with pagination in queries
&lt;/h2&gt;

&lt;p&gt;When you're implementing pagination in filtered queries, indexing becomes even more critical. When paginating results, you often need to exclude items you've already seen. This is typically managed by applying filters that specify which IDs should not be included in the next set of results. &lt;/p&gt;

&lt;p&gt;However, an interesting aspect of Qdrant's data model is that a single point can have multiple values for the same field, such as different color options for a product. This means that during filtering, an ID might appear multiple times if it matches on different values of the same field. &lt;/p&gt;

&lt;p&gt;Proper indexing ensures that these queries are efficient, preventing duplicate results and making pagination smoother.&lt;/p&gt;
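&lt;p&gt;A minimal sketch of this pagination pattern, assuming hypothetical point IDs: exclude IDs returned on earlier pages, and collapse duplicates caused by points matching on several values of the same field.&lt;/p&gt;

```python
# Sketch of filter-based pagination with de-duplication (illustrative only).

def next_page(matches, seen, page_size):
    """Return the next page of unique, unseen IDs from raw filter matches."""
    page = []
    for point_id in matches:
        if point_id in seen or point_id in page:
            continue  # already returned, or duplicated by a multi-value field
        page.append(point_id)
        if len(page) == page_size:
            break
    return page

raw = [7, 7, 3, 9, 3, 12, 5]        # 7 and 3 each match on two field values
first = next_page(raw, set(), 3)
second = next_page(raw, set(first), 3)
print(first, second)                # [7, 3, 9] [12, 5]
```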

&lt;h2&gt;
  
  
  Conclusion: Real-life use cases of filtering
&lt;/h2&gt;

&lt;p&gt;Filtering in a &lt;a href="https://qdrant.tech" rel="noopener noreferrer"&gt;vector database&lt;/a&gt; like Qdrant can significantly enhance search capabilities by enabling more precise and efficient retrieval of data. &lt;/p&gt;

&lt;p&gt;As a conclusion to this guide, let's look at some real-life use cases where filtering is crucial:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Vector Search&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Filtering&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/advanced-search/"&gt;E-Commerce Product Search&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Search for products by style or visual similarity&lt;/td&gt;
&lt;td&gt;Filter by price, color, brand, size, ratings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/recommendations/"&gt;Recommendation Systems&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Recommend similar content (e.g., movies, songs)&lt;/td&gt;
&lt;td&gt;Filter by release date, genre, etc. (e.g., movies after 2020)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/articles/geo-polygon-filter-gsoc/"&gt;Geospatial Search in Ride-Sharing&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Find similar drivers or delivery partners&lt;/td&gt;
&lt;td&gt;Filter by rating, distance radius, vehicle type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/data-analysis-anomaly-detection/"&gt;Fraud &amp;amp; Anomaly Detection&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Detect transactions similar to known fraud cases&lt;/td&gt;
&lt;td&gt;Filter by amount, time, location&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Before you go - all the code is in Qdrant's Dashboard
&lt;/h3&gt;

&lt;p&gt;The easiest way to reach that "Hello World" moment is to &lt;a href="https://dev.to/documentation/quickstart-cloud/"&gt;&lt;strong&gt;try filtering in a live cluster&lt;/strong&gt;&lt;/a&gt;. Our interactive tutorial will show you how to create a cluster, add data and try some filtering clauses. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>database</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>What is RAG (Retrieval-Augmented Generation)?</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Tue, 19 Mar 2024 16:47:57 +0000</pubDate>
      <link>https://dev.to/qdrant/what-is-rag-understanding-retrieval-augmented-generation-534n</link>
      <guid>https://dev.to/qdrant/what-is-rag-understanding-retrieval-augmented-generation-534n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Retrieval-augmented generation (RAG) integrates external information retrieval into the process of generating responses by Large Language Models (LLMs). It searches a database for information beyond its pre-trained knowledge base, significantly improving the accuracy and relevance of the generated responses.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Language models have exploded on the internet ever since ChatGPT came out, and rightfully so. They can write essays, code entire programs, and even make memes (though we’re still deciding on whether that's a good thing).&lt;/p&gt;

&lt;p&gt;But as brilliant as these chatbots have become, they still have &lt;strong&gt;limitations&lt;/strong&gt; in tasks requiring external knowledge and factual information. &lt;/p&gt;

&lt;p&gt;Yes, they can describe the honeybee's waggle dance in excruciating detail. But they become far more valuable if they can generate insights from &lt;strong&gt;any data&lt;/strong&gt; that we provide, rather than just their &lt;em&gt;original training data.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Since retraining those large language models from scratch costs millions of dollars and takes months, we need better ways to give our existing LLMs access to our custom data.&lt;/p&gt;

&lt;p&gt;While you could be more creative with your prompts, it is only a short-term solution. LLMs can consider only a &lt;strong&gt;limited&lt;/strong&gt; amount of text in their responses, known as a &lt;a href="https://www.hopsworks.ai/dictionary/context-window-for-llms"&gt;context window&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Some models like GPT-3 can see up to around 12 pages of text (that’s 4,096 tokens of context). That’s not good enough for most knowledge bases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbo4w8jqvyqxduzq9fwi5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbo4w8jqvyqxduzq9fwi5.png" alt="How a RAG works" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image above shows how a basic RAG system works. Before forwarding the question to the LLM, we have a layer that searches our knowledge base for the "relevant knowledge" to answer the user query. Specifically, in this case, the spending data from the last month. &lt;/p&gt;

&lt;p&gt;Our LLM can now generate a &lt;strong&gt;relevant non-hallucinated&lt;/strong&gt; response about our budget. &lt;/p&gt;

&lt;p&gt;As your data grows, you’ll need efficient ways to identify the most relevant information for your LLM's limited memory. This is where you’ll want a proper way to store and retrieve the specific data you’ll need for your query, without needing the LLM to remember it. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector databases&lt;/strong&gt; store information as &lt;strong&gt;vector embeddings&lt;/strong&gt;. This format supports efficient similarity searches to retrieve relevant data for your query. For example, Qdrant is specifically designed to perform fast, even in scenarios dealing with billions of vectors.&lt;/p&gt;

&lt;p&gt;This article will focus on RAG systems and architecture. If you’re interested in learning more about vector search, we recommend the following articles: &lt;a href="https://qdrant.tech/articles/what-is-a-vector-database/"&gt;What is a Vector Database?&lt;/a&gt; and &lt;a href="https://qdrant.tech/articles/what-are-embeddings/"&gt;What are Vector Embeddings?&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG architecture
&lt;/h2&gt;

&lt;p&gt;At its core, a RAG architecture includes the &lt;strong&gt;retriever&lt;/strong&gt; and the &lt;strong&gt;generator&lt;/strong&gt;. Let's start by understanding what each of these components does.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Retriever
&lt;/h3&gt;

&lt;p&gt;When you ask a question to the retriever, it uses &lt;strong&gt;similarity search&lt;/strong&gt; to scan through a vast knowledge base of vector embeddings. It then pulls out the most &lt;strong&gt;relevant&lt;/strong&gt; vectors to help answer that query. There are a few different techniques it can use to know what’s relevant:&lt;/p&gt;

&lt;h3&gt;
  
  
  How indexing works in RAG retrievers
&lt;/h3&gt;

&lt;p&gt;The indexing process organizes the data into your vector database in a way that makes it easily searchable. This allows the RAG to access relevant information when responding to a query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9w6dvfzo4s4dep9t8ryc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9w6dvfzo4s4dep9t8ryc.png" alt="How indexing works" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As shown in the image above, here’s the process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with a &lt;em&gt;loader&lt;/em&gt; that gathers &lt;em&gt;documents&lt;/em&gt; containing your data. These documents could be anything from articles and books to web pages and social media posts. &lt;/li&gt;
&lt;li&gt;Next, a &lt;em&gt;splitter&lt;/em&gt; divides the documents into smaller chunks, typically sentences or paragraphs, because RAG models work better with smaller pieces of text. In the diagram, these are &lt;em&gt;document snippets&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Each text chunk is then fed into an &lt;em&gt;embedding machine&lt;/em&gt;. This machine uses complex algorithms to convert the text into &lt;a href="https://qdrant.tech/articles/what-are-embeddings/"&gt;vector embeddings&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All the generated vector embeddings are stored in a knowledge base of indexed information. This supports efficient retrieval of similar pieces of information when needed.&lt;/p&gt;
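&lt;p&gt;The loader-splitter-embedder pipeline above can be sketched end to end. Note that the "embedding machine" here is a hash-based stand-in, not a real model:&lt;/p&gt;

```python
# Toy indexing pipeline: load documents, split them into snippets, embed each
# snippet, and store everything in a searchable knowledge base.

def embed(text, dim=4):
    """Hypothetical stand-in for a real embedding model."""
    return [float(hash(text + str(i)) % 100) / 100 for i in range(dim)]

def index_documents(documents):
    knowledge_base = []
    for doc_id, doc in enumerate(documents):
        for chunk in doc.split(". "):          # splitter: naive sentence chunks
            knowledge_base.append({
                "doc_id": doc_id,
                "text": chunk,
                "vector": embed(chunk),        # the "embedding machine"
            })
    return knowledge_base

kb = index_documents(["Bees dance. Bees make honey.", "Ants farm fungus."])
print(len(kb))  # 3 snippets, each stored with its vector
```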

&lt;h3&gt;
  
  
  Query vectorization
&lt;/h3&gt;

&lt;p&gt;Once you have vectorized your knowledge base you can do the same to the user query. When the model sees a new query, it uses the same preprocessing and embedding techniques. This ensures that the query vector is compatible with the document vectors in the index.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgqcvfupjx6pvtc9nttz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgqcvfupjx6pvtc9nttz.png" alt="How retrieval works" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval of relevant documents
&lt;/h3&gt;

&lt;p&gt;When the system needs to find the most relevant documents or passages to answer a query, it utilizes vector similarity techniques. &lt;strong&gt;Vector similarity&lt;/strong&gt; is a fundamental concept in machine learning and natural language processing (NLP) that quantifies the resemblance between vectors, which are mathematical representations of data points.&lt;/p&gt;

&lt;p&gt;The system can employ different vector similarity strategies depending on the type of vectors used to represent the data:&lt;/p&gt;

&lt;h4&gt;
  
  
  Sparse vector representations
&lt;/h4&gt;

&lt;p&gt;A sparse vector is characterized by a high dimensionality, with most of its elements being zero.&lt;/p&gt;

&lt;p&gt;The classic approach is &lt;strong&gt;keyword search&lt;/strong&gt;, which scans documents for the exact words or phrases in the query. The search creates sparse vector representations of documents by counting word occurrences and inversely weighting common words. Queries with rarer words get prioritized.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez1mthlieogdyfv6lvlc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez1mthlieogdyfv6lvlc.png" alt="Sparse Vectors explanation" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf"&gt;TF-IDF&lt;/a&gt; (Term Frequency-Inverse Document Frequency) and &lt;a href="https://en.wikipedia.org/wiki/Okapi_BM25"&gt;BM25&lt;/a&gt; are two classic related algorithms. They're simple and computationally efficient. However, they can struggle with synonyms and don't always capture semantic similarities.&lt;/p&gt;

&lt;p&gt;If you’re interested in going deeper, refer to our article on &lt;a href="https://qdrant.tech/articles/sparse-vectors/"&gt;Sparse Vectors&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Dense vector embeddings
&lt;/h4&gt;

&lt;p&gt;This approach uses large language models like &lt;a href="https://en.wikipedia.org/wiki/BERT_(language_model)"&gt;BERT&lt;/a&gt; to encode the query and passages into dense vector embeddings. These models are compact numerical representations that capture semantic meaning. Vector databases like Qdrant store these embeddings, allowing retrieval based on &lt;strong&gt;semantic similarity&lt;/strong&gt; rather than just keywords using distance metrics like cosine similarity.&lt;/p&gt;

&lt;p&gt;This allows the retriever to match based on semantic understanding rather than just keywords. So if I ask about "compounds that cause BO," it can retrieve relevant info about "molecules that create body odor" even if those exact words weren't used. We explain more about it in our &lt;a href="https://qdrant.tech/articles/what-are-embeddings/"&gt;What are Vector Embeddings&lt;/a&gt; article.&lt;/p&gt;
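&lt;p&gt;Cosine similarity itself is simple to compute. A sketch with hypothetical 3-dimensional embeddings (real models use hundreds of dimensions):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, for illustration only.
query = [0.2, 0.9, 0.1]
passage_a = [0.25, 0.8, 0.15]   # semantically close to the query
passage_b = [0.9, 0.1, 0.4]     # unrelated

print(cosine_similarity(query, passage_a) > cosine_similarity(query, passage_b))  # True
```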

&lt;h3&gt;
  
  
  Hybrid search
&lt;/h3&gt;

&lt;p&gt;However, neither keyword search nor vector search are always perfect. Keyword search may miss relevant information expressed differently, while vector search can sometimes struggle with specificity or neglect important statistical word patterns. Hybrid methods aim to combine the strengths of different techniques.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws5e60ew6dawaobvf3v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws5e60ew6dawaobvf3v6.png" alt="Hybrid search overview" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some common hybrid approaches include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using keyword search to get an initial set of candidate documents. Next, the documents are re-ranked/re-scored using semantic vector representations.&lt;/li&gt;
&lt;li&gt;Starting with semantic vectors to find generally topically relevant documents. Next, the documents are filtered/re-ranked based on keyword matches or other metadata.&lt;/li&gt;
&lt;li&gt;Considering both semantic vector closeness and statistical keyword patterns/weights in a combined scoring model.&lt;/li&gt;
&lt;li&gt;Using multiple stages with different techniques. One example: start with an initial keyword retrieval, followed by semantic re-ranking, then a final re-ranking using even more complex models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you combine the powers of different search methods in a complementary way, you can provide higher quality, more comprehensive results. Check out our article on &lt;a href="https://qdrant.tech/articles/hybrid-search/"&gt;Hybrid Search&lt;/a&gt; if you’d like to learn more.&lt;/p&gt;
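&lt;p&gt;The first hybrid approach above, keyword candidates followed by semantic re-ranking, can be sketched with a stubbed similarity function, where word overlap stands in for real embedding similarity:&lt;/p&gt;

```python
# Two-stage hybrid retrieval sketch (illustrative; similarity is stubbed).

def keyword_candidates(query, docs):
    """Stage 1: keep documents sharing at least one query term."""
    terms = set(query.split())
    return [d for d in docs if terms.intersection(set(d.split()))]

def semantic_score(query, doc):
    """Stub: word-overlap ratio standing in for embedding similarity."""
    q, d = set(query.split()), set(doc.split())
    return len(q.intersection(d)) / len(q.union(d))

docs = ["haunted house story", "forest hike guide", "house repair tips"]
candidates = keyword_candidates("haunted house", docs)          # stage 1
ranked = sorted(candidates,
                key=lambda d: semantic_score("haunted house", d),
                reverse=True)                                   # stage 2
print(ranked[0])  # haunted house story
```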

&lt;h2&gt;
  
  
  The Generator
&lt;/h2&gt;

&lt;p&gt;With the top relevant passages retrieved, it's now the generator's job to produce a final answer by synthesizing and expressing that information in natural language. &lt;/p&gt;

&lt;p&gt;The LLM is typically a model like GPT, BART or T5, trained on massive datasets to understand and generate human-like text. It now takes not only the query (or question) as input but also the relevant documents or passages that the retriever identified as potentially containing the answer to generate its response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlq9rw4mw26gvf4qxl9r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlq9rw4mw26gvf4qxl9r.png" alt="How a Generator works" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The retriever and generator don't operate in isolation. The image below shows how the output of the retrieval feeds the generator to produce the final generated response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckvyszdo4cyklt09ts38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckvyszdo4cyklt09ts38.png" alt="The entire architecture of a RAG system" width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where is RAG being used?
&lt;/h2&gt;

&lt;p&gt;Because of their more knowledgeable and contextual responses, RAG models are being applied in many areas today, especially those that require factual accuracy and knowledge depth. &lt;/p&gt;

&lt;h3&gt;
  
  
  Real-World Applications:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question answering:&lt;/strong&gt; This is perhaps the most prominent use case for RAG models. They power advanced question-answering systems that can retrieve relevant information from large knowledge bases and then generate fluent answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language generation:&lt;/strong&gt; RAG enables more factual and contextualized text generation, such as summarization that draws on multiple sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data-to-text generation:&lt;/strong&gt; By retrieving relevant structured data, RAG models can generate product or business intelligence reports from databases, or describe insights from data visualizations and charts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multimedia understanding:&lt;/strong&gt; RAG isn't limited to text: it can retrieve multimodal information like images, video, and audio to enhance understanding, for example by answering questions about images or videos using relevant textual context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating your first RAG chatbot with Langchain, Groq, and OpenAI
&lt;/h2&gt;

&lt;p&gt;Are you ready to create your own RAG chatbot from the ground up? We have a video explaining everything from the beginning. Daniel Romero will guide you through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setting up your chatbot&lt;/li&gt;
&lt;li&gt;Preprocessing and organizing data for your chatbot's use&lt;/li&gt;
&lt;li&gt;Applying vector similarity search algorithms&lt;/li&gt;
&lt;li&gt;Enhancing the efficiency and response quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After building your RAG chatbot, you'll be able to evaluate its performance against that of a chatbot powered solely by a Large Language Model (LLM).&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/O60-KuZZeQA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next?
&lt;/h2&gt;

&lt;p&gt;Have a RAG project you want to bring to life? Join our &lt;a href="//discord.gg/qdrant"&gt;Discord community&lt;/a&gt; where we’re always sharing tips and answering questions on vector search and retrieval.&lt;/p&gt;

&lt;p&gt;Learn more about how to properly evaluate your RAG responses: &lt;a href="https://superlinked.com/vectorhub/evaluating-retrieval-augmented-generation-a-framework-for-assessment"&gt;Evaluating Retrieval Augmented Generation - a framework for assessment&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>What are Vector Embeddings?</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Wed, 07 Feb 2024 17:56:15 +0000</pubDate>
      <link>https://dev.to/qdrant/what-are-vector-embeddings-24pd</link>
      <guid>https://dev.to/qdrant/what-are-vector-embeddings-24pd</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Embeddings&lt;/strong&gt; are numerical machine learning representations of the semantics of the input data. They capture the meaning of complex, high-dimensional data, like text, images, or audio, in vectors, enabling algorithms to process and analyze the data more efficiently.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Use Vector Embeddings?
&lt;/h2&gt;

&lt;p&gt;You know when you’re scrolling through your social media feeds and the content just feels incredibly tailored to you? There's the news you care about, followed by a perfect tutorial with your favorite tech stack, and then a meme that makes you laugh so hard you snort.&lt;/p&gt;

&lt;p&gt;Or what about how YouTube recommends videos you end up loving? They’re by creators you’ve never even heard of, and you didn’t even send YouTube a note about your ideal content lineup.&lt;/p&gt;

&lt;p&gt;This is the magic of embeddings.&lt;/p&gt;

&lt;p&gt;These are the result of &lt;strong&gt;deep learning models&lt;/strong&gt; analyzing the data of your interactions online: your likes, shares, comments, searches, the kind of content you linger on, and even the content you decide to skip. This also allows the algorithm to predict future content that you are likely to appreciate.&lt;/p&gt;

&lt;p&gt;The same embeddings can be repurposed for search, ads, and other features, creating a highly personalized user experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3364e5wsuqj6kmkk076f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3364e5wsuqj6kmkk076f.png" alt="How embeddings are applied to perform recommendantions and other use cases" width="720" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They make &lt;a href="https://www.sciencedirect.com/topics/computer-science/high-dimensional-data" rel="noopener noreferrer"&gt;high-dimensional&lt;/a&gt; data more manageable. This reduces storage requirements, improves computational efficiency, and makes sense of a ton of &lt;strong&gt;unstructured&lt;/strong&gt; data.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do embeddings work?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;nuances&lt;/strong&gt; of natural language or the hidden &lt;strong&gt;meaning&lt;/strong&gt; in large datasets of images, sounds, or user interactions are hard to fit into a table. Traditional relational databases can't efficiently query most types of data being currently used and produced, making the &lt;strong&gt;retrieval&lt;/strong&gt; of this information very limited.&lt;/p&gt;

&lt;p&gt;In the embeddings space, synonyms tend to appear in similar contexts and end up having similar embeddings. The space is a system smart enough to understand that "pretty" and "attractive" are playing for the same team, without ever being explicitly told so.&lt;/p&gt;

&lt;p&gt;That’s the magic.&lt;/p&gt;

&lt;p&gt;At their core, vector embeddings are about semantics. They take the idea that "a word is known by the company it keeps" and apply it on a grand scale. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqszeqbs7qwm3p7jg4u4h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqszeqbs7qwm3p7jg4u4h.png" alt="Example of how synonyms are placed closer together in the embeddings space" width="720" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This capability is crucial for creating search systems, recommendation engines, retrieval augmented generation (RAG) and any application that benefits from a deep understanding of content.&lt;/p&gt;

&lt;p&gt;Embeddings are created through neural networks, which capture complex relationships and semantics in &lt;a href="https://www1.se.cuhk.edu.hk/~seem5680/lecture/semantics-with-dense-vectors-2018.pdf" rel="noopener noreferrer"&gt;dense vectors&lt;/a&gt; that are more suitable for machine learning and data processing applications. These vectors can then be projected into a proper &lt;strong&gt;high-dimensional&lt;/strong&gt; space, specifically, a &lt;a href="https://qdrant.tech/articles/what-is-a-vector-database/" rel="noopener noreferrer"&gt;Vector Database&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsy1gd0of5zlgxj68f7ta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsy1gd0of5zlgxj68f7ta.png" alt="The process for turning raw data into embeddings and placing them into the vector space" width="720" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The meaning of a data point is implicitly defined by its &lt;strong&gt;position&lt;/strong&gt; on the vector space. After the vectors are stored, we can use their spatial properties to perform &lt;a href="https://en.wikipedia.org/wiki/Nearest_neighbor_search#:~:text=Nearest%20neighbor%20search%20(NNS)%2C,the%20larger%20the%20function%20values." rel="noopener noreferrer"&gt;nearest neighbor searches&lt;/a&gt;. These searches retrieve semantically similar items based on how close they are in this space.  &lt;/p&gt;
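&lt;p&gt;As a toy sketch of that idea (the vectors and labels below are invented for illustration, not the output of a real embedding model), a nearest neighbor search over a few stored vectors looks like this:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # Closer to 1.0 means the vectors point in nearly the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# A tiny "vector space": each item's position implicitly encodes its meaning.
stored = {
    "haunted house story": [0.9, 0.1, 0.0],
    "creepy forest tale":  [0.8, 0.3, 0.1],
    "cooking recipe":      [0.0, 0.2, 0.9],
}

query = [0.9, 0.15, 0.0]  # an embedded query, e.g. "scary places"

# Rank stored items by how close they sit to the query vector.
results = sorted(stored, key=lambda k: cosine_similarity(query, stored[k]), reverse=True)
print(results)
```

&lt;p&gt;A real vector database replaces this brute-force scan with an index like HNSW, but the geometric intuition is the same.&lt;/p&gt;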

&lt;blockquote&gt;
&lt;p&gt;The quality of the vector representations drives the performance. The embedding model that works best for you depends on your use case.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Creating Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;Embeddings translate the complexities of human language into a format that computers can understand. They use neural networks to assign &lt;strong&gt;numerical values&lt;/strong&gt; to the input data, in such a way that similar data has similar values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevrb8i2gd9y4uclbfixc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevrb8i2gd9y4uclbfixc.png" alt="The process of using Neural Networks to create vector embeddings" width="720" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, if I want to make my computer understand the word 'right', I can assign it a number like 1.3. So whenever my computer sees 1.3, it sees the word 'right'.&lt;/p&gt;

&lt;p&gt;Now I want to make my computer understand the context of the word ‘right’. I can use a two-dimensional vector, such as [1.3, 0.8], to represent 'right'. The first number 1.3 still identifies the word 'right', but the second number 0.8 specifies the context.&lt;/p&gt;

&lt;p&gt;We can introduce more dimensions to capture more nuances. For example, a third dimension could represent formality of the word, a fourth could indicate its emotional connotation (positive, neutral, negative), and so on. &lt;/p&gt;
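&lt;p&gt;Sticking with the same toy scheme (all numbers invented purely for illustration), a few extra dimensions are enough to keep different senses of 'right' apart:&lt;/p&gt;

```python
# Hand-made toy vectors: dimension 0 identifies the word "right",
# dimension 1 encodes the context of use, dimension 2 encodes formality.
right_direction   = [1.3, 0.8, 0.2]   # "turn right at the corner"
right_correct     = [1.3, 0.1, 0.5]   # "your answer is right"
right_entitlement = [1.3, 0.4, 0.9]   # "the right to freedom of speech"

def euclidean(a, b):
    # Straight-line distance: smaller means the usages are more alike.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# All three share dimension 0 (it's the same word), but the extra
# dimensions separate the senses in the vector space.
print(euclidean(right_direction, right_correct))
print(euclidean(right_direction, right_entitlement))
```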

&lt;p&gt;The evolution of this concept led to the development of embedding models like &lt;a href="https://en.wikipedia.org/wiki/Word2vec" rel="noopener noreferrer"&gt;Word2Vec&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/GloVe" rel="noopener noreferrer"&gt;GloVe&lt;/a&gt;. They learn to understand the context in which words appear to generate high-dimensional vectors for each word, capturing far more complex properties. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbariijzbgj9madaseze7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbariijzbgj9madaseze7.png" alt="How Word2Vec model creates the embeddings for a word" width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, these models still have limitations. They generate a single vector per word, based on its usage across texts. This means all the nuances of the word "right" are blended into one vector representation. That is not enough information for computers to fully understand the context.&lt;/p&gt;

&lt;p&gt;So, how do we help computers grasp the nuances of language in different contexts? In other words, how do we differentiate between: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"your answer is right" &lt;/li&gt;
&lt;li&gt;"turn right at the corner"&lt;/li&gt;
&lt;li&gt;"everyone has the right to freedom of speech"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these sentences uses the word 'right' with a different meaning.&lt;/p&gt;

&lt;p&gt;More advanced models like &lt;a href="https://en.wikipedia.org/wiki/BERT_(language_model)" rel="noopener noreferrer"&gt;BERT&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Generative_pre-trained_transformer" rel="noopener noreferrer"&gt;GPT&lt;/a&gt; use deep learning models based on the &lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;transformer architecture&lt;/a&gt;, which helps computers consider the full context of a word. These models pay attention to the entire context. The model understands the specific use of a word in its &lt;strong&gt;surroundings&lt;/strong&gt;, and then creates different embeddings for each.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5cdug5numi5w2y1l4ii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5cdug5numi5w2y1l4ii.png" alt="How the BERT model creates the embeddings for a word" width="720" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But how does this process of understanding and interpreting work in practice? Take the term "biophilic design", for example. To generate its embedding, the transformer architecture can use the following contexts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Biophilic design incorporates natural elements into architectural planning."&lt;/li&gt;
&lt;li&gt;"Offices with biophilic design elements report higher employee well-being."&lt;/li&gt;
&lt;li&gt;"...plant life, natural light, and water features are key aspects of biophilic design."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then it compares contexts to known architectural and design principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Sustainable designs prioritize environmental harmony."&lt;/li&gt;
&lt;li&gt;"Ergonomic spaces enhance user comfort and health."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model creates a vector embedding for "biophilic design" that encapsulates the concept of integrating natural elements into man-made environments, augmented with attributes that highlight the correlation between this integration and its positive impact on health, well-being, and environmental sustainability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration with Embedding APIs
&lt;/h3&gt;

&lt;p&gt;Selecting the right embedding model for your use case is crucial to your application performance. Qdrant makes it easier by offering seamless integration with the best selection of embedding APIs, including &lt;a href="https://qdrant.tech/documentation/embeddings/cohere/" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt;, &lt;a href="https://qdrant.tech/documentation/embeddings/gemini/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, &lt;a href="https://qdrant.tech/documentation/embeddings/jina-embeddings/" rel="noopener noreferrer"&gt;Jina Embeddings&lt;/a&gt;, &lt;a href="https://qdrant.tech/documentation/embeddings/openai/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, &lt;a href="https://qdrant.tech/documentation/embeddings/aleph-alpha/" rel="noopener noreferrer"&gt;Aleph Alpha&lt;/a&gt;, &lt;a href="https://github.com/qdrant/fastembed" rel="noopener noreferrer"&gt;Fastembed&lt;/a&gt;, and &lt;a href="https://qdrant.tech/documentation/embeddings/bedrock/" rel="noopener noreferrer"&gt;AWS Bedrock&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;If you’re looking for NLP and rapid prototyping, including language translation, question-answering, and text generation, OpenAI is a great choice. Gemini is ideal for image search, duplicate detection, and clustering tasks. &lt;/p&gt;

&lt;p&gt;Fastembed, which we’ll use in the example below, is designed for efficiency and speed, great for applications needing low-latency responses, such as autocomplete and instant content recommendations. &lt;/p&gt;

&lt;p&gt;I plan to go deeper into selecting the best model based on performance, cost, integration ease, and scalability in a future post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a Neural Search Service with Fastembed
&lt;/h2&gt;

&lt;p&gt;Now that you’re familiar with the core concepts around vector embeddings, why not start building your own &lt;a href="https://qdrant.tech/documentation/tutorials/neural-search-fastembed/" rel="noopener noreferrer"&gt;Neural Search Service&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;The tutorial guides you through a practical application of Qdrant for document management, based on descriptions of companies from &lt;a href="https://www.startups-list.com/" rel="noopener noreferrer"&gt;startups-list.com&lt;/a&gt;: from embedding the data and integrating it with Qdrant's vector database to constructing a search API and finally deploying your solution with FastAPI.&lt;/p&gt;

&lt;p&gt;Check out what the final version of this project looks like on the &lt;a href="https://qdrant.to/semantic-search-demo" rel="noopener noreferrer"&gt;live online demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let us know what you’re building with embeddings! Join our &lt;a href="https://discord.gg/qdrant-907569970500743200" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community and share your projects!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>opensource</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>What is a Vector Database?</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Thu, 25 Jan 2024 21:29:58 +0000</pubDate>
      <link>https://dev.to/qdrant/what-is-a-vector-database-2h0b</link>
      <guid>https://dev.to/qdrant/what-is-a-vector-database-2h0b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A Vector Database is a specialized database system designed for efficiently indexing, querying, and retrieving high-dimensional vector data. These systems enable advanced data analysis and similarity-search operations that extend well beyond the traditional, structured query approach of conventional databases.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why use a Vector Database?
&lt;/h2&gt;

&lt;p&gt;The data flood is real. &lt;/p&gt;

&lt;p&gt;In 2024, we're drowning in unstructured data like images, text, and audio that doesn’t fit into neatly organized tables. Still, we need a way to easily tap into the value within this chaos of almost 330 million terabytes of data being created each day.&lt;/p&gt;

&lt;p&gt;Traditional databases, even with extensions that provide some vector handling capabilities, struggle with the complexities and demands of high-dimensional vector data. &lt;/p&gt;

&lt;p&gt;Handling vector data is extremely resource-intensive. A typical vector is around 6 KB, so you can see how scaling to millions of vectors can demand substantial system memory and computational resources - which is, at the very least, very challenging for traditional &lt;a href="https://www.ibm.com/topics/oltp" rel="noopener noreferrer"&gt;OLTP&lt;/a&gt; and &lt;a href="https://www.ibm.com/topics/olap" rel="noopener noreferrer"&gt;OLAP&lt;/a&gt; databases to manage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4vg4uxgax6dkrzipvvh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4vg4uxgax6dkrzipvvh.jpg" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vector databases allow you to understand the &lt;strong&gt;context&lt;/strong&gt; or &lt;strong&gt;conceptual similarity&lt;/strong&gt; of unstructured data by representing them as &lt;strong&gt;vectors&lt;/strong&gt;, enabling advanced analysis and retrieval based on data similarity.&lt;/p&gt;

&lt;p&gt;For example, in recommendation systems, vector databases can analyze user behavior and item characteristics to suggest products or content with a high degree of personal relevance. &lt;/p&gt;

&lt;p&gt;In search engines and research databases, they enhance the user experience by providing results that are &lt;strong&gt;semantically&lt;/strong&gt; similar to the query, rather than relying solely on the exact words typed into the search bar.&lt;/p&gt;

&lt;p&gt;If you’re new to the vector search space, this article explains the key concepts and relationships that you need to know.&lt;/p&gt;

&lt;p&gt;So let's get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Vector Data?
&lt;/h2&gt;

&lt;p&gt;To understand vector databases, let's begin by defining what a 'vector', or 'vector data', actually is. &lt;/p&gt;

&lt;p&gt;Vectors are a &lt;strong&gt;numerical representation&lt;/strong&gt; of some type of complex information. &lt;/p&gt;

&lt;p&gt;To represent textual data, for example, a vector will encapsulate the nuances of language, such as semantics and context. &lt;/p&gt;

&lt;p&gt;With an image, the vector data encapsulates aspects like color, texture, and shape. The &lt;strong&gt;dimensions&lt;/strong&gt; relate to the complexity and the amount of information each image contains. &lt;/p&gt;

&lt;p&gt;Each pixel in an image can be seen as one dimension, as it holds data (like color intensity values for red, green, and blue channels in a color image). So even a small image with thousands of pixels translates to thousands of dimensions.&lt;/p&gt;
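&lt;p&gt;The arithmetic is straightforward - even a thumbnail-sized color image carries tens of thousands of raw values before any embedding compresses them:&lt;/p&gt;

```python
# A small 100x100 color image with 3 channels (red, green, blue) per pixel.
width, height, channels = 100, 100, 3
raw_dimensions = width * height * channels
print(raw_dimensions)  # 30000 raw values, i.e. 30,000 dimensions
```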

&lt;p&gt;So from now on, when we talk about high-dimensional data, we mean that the data contains a large number of data points (pixels, features, semantics, syntax). &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;creation&lt;/strong&gt; of vector data (so we can store this high-dimensional data on our vector database) is primarily done through &lt;strong&gt;embeddings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fj4euxeao2896k4jzwz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fj4euxeao2896k4jzwz.jpg" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How do Embeddings Work?
&lt;/h3&gt;

&lt;p&gt;Embeddings translate this high-dimensional data into a more manageable, &lt;strong&gt;lower-dimensional&lt;/strong&gt; vector form that's more suitable for machine learning and data processing applications, typically through &lt;strong&gt;neural network models&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In creating dimensions for text, for example, the process involves analyzing the text to capture its linguistic elements. &lt;/p&gt;

&lt;p&gt;Transformer-based neural networks like &lt;strong&gt;BERT&lt;/strong&gt; (Bidirectional Encoder Representations from Transformers) and &lt;strong&gt;GPT&lt;/strong&gt; (Generative Pre-trained Transformer), are widely used for creating text embeddings. &lt;/p&gt;

&lt;p&gt;Each layer extracts different levels of features, such as context, semantics, and syntax.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1uw2homrkxwg7xs2i5r.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1uw2homrkxwg7xs2i5r.jpg" width="800" height="741"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final layers of the network condense this information into a vector that is a compact, lower-dimensional representation of the input but still retains the essential information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Functionalities of Vector Databases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Indexing?
&lt;/h3&gt;

&lt;p&gt;Have you ever tried to find a specific face in a massive crowd photo? Well, vector databases face a similar challenge when dealing with tons of high-dimensional vectors. &lt;/p&gt;

&lt;p&gt;Now, imagine dividing the crowd into smaller groups based on hair color, then eye color, then clothing style. Each layer gets you closer to who you’re looking for. Vector databases use similar &lt;strong&gt;multi-layered&lt;/strong&gt; structures called indexes to organize vectors based on their "likeness." &lt;/p&gt;

&lt;p&gt;This way, finding similar images becomes a quick hop across related groups, instead of scanning every picture one by one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfvngcefmfcm5ckqdugl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfvngcefmfcm5ckqdugl.jpg" width="800" height="737"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Different indexing methods exist, each with its strengths. &lt;a href="https://qdrant.tech/articles/filtrable-hnsw/" rel="noopener noreferrer"&gt;HNSW&lt;/a&gt; balances speed and accuracy like a well-connected network of shortcuts in the crowd. Others, like IVF or Product Quantization, focus on specific tasks or memory efficiency.&lt;/p&gt;

&lt;h4&gt;
  
  
  What is Binary Quantization?
&lt;/h4&gt;

&lt;p&gt;Quantization is a technique used for reducing the total size of the database. It works by compressing vectors into a more compact representation at the cost of accuracy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://qdrant.tech/articles/binary-quantization/" rel="noopener noreferrer"&gt;Binary Quantization&lt;/a&gt; is a fast indexing and data compression method used by Qdrant. It supports vector comparisons, which can dramatically speed up query processing times (up to 40x faster!).&lt;/p&gt;

&lt;p&gt;Think of each data point as a ruler. Binary quantization splits this ruler in half at a certain point, marking everything above as "1" and everything below as "0". This &lt;a href="https://arc.net/l/quote/uezrjlae" rel="noopener noreferrer"&gt;binarization&lt;/a&gt; process results in a string of bits, representing the original vector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bnwtvxu5aftptcc1zct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bnwtvxu5aftptcc1zct.png" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This "quantized" code is much smaller and easier to compare. Especially for OpenAI embeddings, this type of quantization has proven to achieve a massive performance improvement at a lower cost of accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Similarity Search?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://qdrant.tech/documentation/concepts/search/" rel="noopener noreferrer"&gt;Similarity search&lt;/a&gt; allows you to search not by keywords but by meaning. This way you can do searches such as similar songs that evoke the same mood, finding images that match your artistic vision, or even exploring emotional patterns in text.&lt;/p&gt;

&lt;p&gt;The way it works is, when the user queries the database, this query is also converted into a vector (the query vector). The &lt;a href="https://qdrant.tech/documentation/overview/vector-search/" rel="noopener noreferrer"&gt;vector search&lt;/a&gt; starts at the top layer of the HNSW index, where the algorithm quickly identifies the area of the graph likely to contain vectors closest to the query vector. The algorithm compares your query vector to all the others, using metrics like "distance" or "similarity" to gauge how close they are.&lt;/p&gt;

&lt;p&gt;The search then moves down the layers, progressively narrowing in on more closely related vectors, until the dataset is reduced to the most relevant items. The image below illustrates this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrwbjbdo9yfc27wm2yti.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrwbjbdo9yfc27wm2yti.jpg" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the closest vectors are identified at the bottom layer, these points translate back to actual data, like images or music, representing your search results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scalability
&lt;/h3&gt;

&lt;p&gt;Vector databases often deal with datasets that comprise billions of high-dimensional vectors. This data isn't just large in volume but also complex in nature, requiring more computing power and memory to process. Scalable systems can handle this increased complexity without performance degradation. This is achieved through a combination of a &lt;strong&gt;distributed architecture&lt;/strong&gt;, &lt;strong&gt;dynamic resource allocation&lt;/strong&gt;, &lt;strong&gt;data partitioning&lt;/strong&gt;, &lt;strong&gt;load balancing&lt;/strong&gt;, and &lt;strong&gt;optimization techniques&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Systems like Qdrant exemplify scalability in vector databases. Qdrant leverages Rust's efficiency in &lt;strong&gt;memory management&lt;/strong&gt; and &lt;strong&gt;performance&lt;/strong&gt;, allowing it to handle large-scale data with optimized resource usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Efficient Query Processing
&lt;/h3&gt;

&lt;p&gt;The key to efficient query processing in these databases is linked to their &lt;strong&gt;indexing methods&lt;/strong&gt;, which enable quick navigation through complex data structures. By mapping and accessing the high-dimensional vector space, HNSW and similar indexing techniques significantly reduce the time needed to locate and retrieve relevant data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sy92b0uv6gr25o33mpt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sy92b0uv6gr25o33mpt.jpg" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other techniques like &lt;strong&gt;handling computational load&lt;/strong&gt; and &lt;strong&gt;parallel processing&lt;/strong&gt; are used for performance, especially when managing multiple simultaneous queries. Complementing them, &lt;strong&gt;strategic caching&lt;/strong&gt; is also employed to store frequently accessed data, facilitating a quicker retrieval for subsequent queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Metadata and Filters
&lt;/h3&gt;

&lt;p&gt;Filters use metadata to refine search queries within the database. For example, in a database containing text documents, a user might want to search for documents not only based on textual similarity but also filter the results by publication date or author.&lt;/p&gt;

&lt;p&gt;When a query is made, the system can use &lt;strong&gt;both&lt;/strong&gt; the vector data and the metadata to process the query. In other words, the database doesn’t just look for the closest vectors. It also considers the additional criteria set by the metadata filters, creating a more customizable search experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0e9wvku07xtnsbwen3z8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0e9wvku07xtnsbwen3z8.jpg" width="800" height="712"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Security and Access Control
&lt;/h3&gt;

&lt;p&gt;Vector databases often store sensitive information. This could include personal data in customer databases, confidential images, or proprietary text documents. Ensuring data security means protecting this information from unauthorized access, breaches, and other forms of cyber threats.&lt;/p&gt;

&lt;p&gt;At Qdrant, this includes mechanisms such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User authentication&lt;/li&gt;
&lt;li&gt;Role-based access control&lt;/li&gt;
&lt;li&gt;Attribute-based access control&lt;/li&gt;
&lt;li&gt;Encryption for data at rest and in transit&lt;/li&gt;
&lt;li&gt;Keeping audit trails&lt;/li&gt;
&lt;li&gt;Advanced database monitoring and anomaly detection&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture of a Vector Database
&lt;/h2&gt;

&lt;p&gt;A vector database is made up of multiple entities and relations. Here's a high-level overview of Qdrant's terminologies and how they fit into the larger picture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3w9wo86kw1dn33imnwz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3w9wo86kw1dn33imnwz.jpg" width="800" height="703"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collections&lt;/strong&gt;: A &lt;a href="https://qdrant.tech/documentation/concepts/collections/" rel="noopener noreferrer"&gt;collection&lt;/a&gt; is a named set of data points, where each point is a vector with an associated payload. All vectors within a collection must have the same dimensionality and be comparable using a single metric.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distance Metrics&lt;/strong&gt;: These metrics measure the similarity between vectors. The distance metric is chosen when a collection is created, and the right choice depends on the nature of the vectors and on the neural network that generated them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Points&lt;/strong&gt;: Each &lt;a href="https://qdrant.tech/documentation/concepts/points/" rel="noopener noreferrer"&gt;point&lt;/a&gt; consists of a &lt;strong&gt;vector&lt;/strong&gt; and can also include an optional &lt;strong&gt;identifier&lt;/strong&gt; (ID) and &lt;strong&gt;&lt;a href="https://qdrant.tech/documentation/concepts/payload/" rel="noopener noreferrer"&gt;payload&lt;/a&gt;&lt;/strong&gt;. The vector represents the high-dimensional data and the payload carries metadata information in a JSON format, giving the data point more context or attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage Options&lt;/strong&gt;: There are two primary storage options. The in-memory storage option keeps all vectors in RAM, which allows for the highest speed in data access since disk access is only required for persistence. &lt;/p&gt;

&lt;p&gt;Alternatively, the Memmap storage option creates a virtual address space linked to the file on disk, striking a balance between memory usage and access speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clients&lt;/strong&gt;: Qdrant supports client libraries in various programming languages, such as Python, Go, Rust, and TypeScript, so developers can connect to and interact with Qdrant using the language they prefer.&lt;/p&gt;
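&lt;p&gt;To make these terms concrete, here is a minimal, framework-free Python sketch of the same ideas: a collection of points (ID + vector + payload) that all share one dimensionality and are searched with one distance metric (cosine similarity). The &lt;code&gt;MiniCollection&lt;/code&gt; class and its method names are illustrative only; this is not Qdrant's actual client API.&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, one common distance metric for comparing vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class MiniCollection:
    """Toy in-memory 'collection': points share one dimensionality and metric."""
    def __init__(self, dim):
        self.dim = dim
        self.points = []  # each point: (id, vector, payload)

    def upsert(self, point_id, vector, payload=None):
        if len(vector) != self.dim:
            raise ValueError("all vectors in a collection must share one dimensionality")
        self.points.append((point_id, vector, payload or {}))

    def search(self, query, limit=3):
        # score every point against the query and return the closest ones
        scored = [(cosine_similarity(query, vec), pid, payload)
                  for pid, vec, payload in self.points]
        return sorted(scored, key=lambda t: t[0], reverse=True)[:limit]

col = MiniCollection(dim=3)
col.upsert(1, [0.9, 0.1, 0.0], {"theme": "haunted house"})
col.upsert(2, [0.1, 0.9, 0.0], {"theme": "creepy forest"})
col.upsert(3, [0.0, 0.2, 0.9], {"theme": "urban legend"})

best = col.search([1.0, 0.0, 0.1], limit=1)
print(best[0][2])  # payload of the nearest point: {'theme': 'haunted house'}
```

&lt;p&gt;A real deployment would replace the brute-force scan with an approximate nearest-neighbor index, but the entities — collection, points, payloads, one shared metric — map one-to-one.&lt;/p&gt;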

&lt;h2&gt;
  
  
  Vector Database Use Cases
&lt;/h2&gt;

&lt;p&gt;If we had to summarize the use cases for vector databases in a single word, it would be "match". They are great at finding non-obvious ways to correspond or “match” data with a given query, whether through similarity in images, text, user preferences, or patterns in data.&lt;/p&gt;

&lt;p&gt;Here are some examples of how to take advantage of using vector databases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personalized recommendation systems&lt;/strong&gt; to analyze and interpret complex user data, such as preferences, behaviors, and interactions. For example, on Spotify, if a user frequently listens to the same song or skips it, the recommendation engine takes note of this to personalize future suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic search&lt;/strong&gt; allows systems to capture the deeper semantic meaning of words and text. In modern search engines, if someone searches for "tips for planting in spring," the engine tries to understand the intent and contextual meaning behind the query rather than just matching the words themselves. Here’s an example of a &lt;a href="https://demo.qdrant.tech/" rel="noopener noreferrer"&gt;vector search engine for Startups&lt;/a&gt; made with Qdrant:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruq5ldq7y58awlsby4ti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruq5ldq7y58awlsby4ti.png" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;
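&lt;p&gt;A toy sketch of that difference is below. The two document titles and their 2-D "embedding" vectors are hand-made purely for illustration; a real system would produce high-dimensional vectors with a neural text encoder.&lt;/p&gt;

```python
import math

# Hand-made 2-D "embeddings" purely for illustration; in practice these
# come from a neural text encoder, not assigned by hand.
docs = {
    "A beginner's guide to gardening in springtime": (0.95, 0.05),
    "Quarterly spring sale on mattresses": (0.10, 0.90),
}
query_vec = (0.90, 0.10)  # pretend embedding of "tips for planting in spring"

# Keyword search: "planting" appears in neither title, so it finds nothing.
keyword_hits = [title for title in docs if "planting" in title.lower()]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Vector search: cosine similarity still surfaces the gardening article,
# because the query's *intent* is close to it in embedding space.
best_title = max(docs, key=lambda title: cosine(query_vec, docs[title]))

print(keyword_hits)  # []
print(best_title)
```

&lt;p&gt;The keyword match comes back empty, while the vector comparison recovers the gardening article anyway — that gap is exactly what semantic search closes.&lt;/p&gt;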

&lt;p&gt;There are many other use cases, such as &lt;strong&gt;fraud detection and anomaly analysis&lt;/strong&gt; in sectors like finance and cybersecurity, and &lt;strong&gt;Content-Based Image Retrieval (CBIR)&lt;/strong&gt;, which finds images by comparing vector representations rather than metadata or tags. &lt;/p&gt;

&lt;p&gt;Those are just a few examples. The ability of vector databases to “match” data with queries makes them essential for multiple types of applications. Here are some more &lt;a href="https://qdrant.tech/use-cases/" rel="noopener noreferrer"&gt;use case examples&lt;/a&gt; you can take a look at.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Your First Vector Database Project
&lt;/h2&gt;

&lt;p&gt;Now that you're familiar with the core concepts around vector databases, it’s time to get our hands dirty. &lt;a href="https://qdrant.tech/documentation/tutorials/search-beginners/" rel="noopener noreferrer"&gt;Start by building your own semantic search engine&lt;/a&gt; for science fiction books in just about 5 minutes with the help of Qdrant. You can also watch our &lt;a href="https://www.youtube.com/watch?v=AASiqmtKo54" rel="noopener noreferrer"&gt;video tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Feeling ready to dive into a more complex project? Take the next step and get started building an actual &lt;a href="https://qdrant.tech/documentation/tutorials/neural-search/" rel="noopener noreferrer"&gt;Neural Search Service with a complete API and a dataset&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let’s get into action!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Distributed Systems Like You're 5</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Thu, 30 Mar 2023 17:25:08 +0000</pubDate>
      <link>https://dev.to/sabrinaesaquino/explain-distributed-systems-like-im-5-51o6</link>
      <guid>https://dev.to/sabrinaesaquino/explain-distributed-systems-like-im-5-51o6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;TL;DR&lt;/em&gt;&lt;br&gt;
Let's break down distributed systems! In this blog post, I'll explore how a group of computers works together as a team to handle &lt;strong&gt;big tasks&lt;/strong&gt; and see how these systems are essential for solving real-world problems, optimizing databases and computing, and playing a major role in MLOps. I'll also be hosting a &lt;a href="https://twitter.com/i/spaces/1LyxBqRwLYWJN?s=20" rel="noopener noreferrer"&gt;Twitter Space&lt;/a&gt; tomorrow, where four distributed systems experts and I will dive deeper into building and maintaining distributed systems, addressing challenges, and sharing some best practices.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What are distributed systems?
&lt;/h2&gt;

&lt;p&gt;Imagine you're building a huge Minecraft castle, and you have to gather resources, craft items, and construct the entire structure on your own. It would be &lt;strong&gt;overwhelming&lt;/strong&gt; and &lt;strong&gt;time-consuming&lt;/strong&gt; to do everything by yourself. So, you invite your friends to join your server and help you. Each of you takes responsibility for gathering specific resources, crafting particular items, and constructing parts of the castle, making the whole process much faster and more enjoyable. That's what distributed systems are like – &lt;strong&gt;they take a big, complex task and break it down into smaller, manageable parts that can be done by different computers working together&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the world of computing, there are lots of tasks that are just too big for one computer to handle, and just like the Minecraft castle, these tasks need the help of many computers to get the job done quickly and efficiently. These computers work together as a team to complete the task by dividing it into &lt;strong&gt;smaller pieces&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvrwsgpooxihglpu0pj7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvrwsgpooxihglpu0pj7.jpg" alt="Huge minecraft building" width="602" height="401"&gt;&lt;/a&gt;&lt;/p&gt;
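&lt;p&gt;The "many hands" idea can be sketched with Python's standard library: one big job gets split into chunks, and each chunk is handed to a separate worker. (Here the workers are threads on a single machine standing in for separate computers; a real distributed system would ship the chunks over a network instead.)&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def square_chunk(chunk):
    # one "worker" handles just its own small piece of the big task
    return [n * n for n in chunk]

numbers = list(range(1000))  # the big task
chunks = [numbers[i:i + 250] for i in range(0, len(numbers), 250)]

# hand each chunk to a worker, then stitch the partial results back together
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(square_chunk, chunks))
results = [n for part in parts for n in part]

print(len(results), results[3])  # 1000 9
```

&lt;p&gt;Split, work in parallel, merge — that same shape shows up in real frameworks like MapReduce and Spark, just stretched across many machines.&lt;/p&gt;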

&lt;h2&gt;
  
  
  How do they work?
&lt;/h2&gt;

&lt;p&gt;Have you ever played a game of "Telephone"? It's a bit like that. Each computer passes along a message to the next one. They use "special languages", called protocols, to communicate with each other and share information, making sure every computer knows what's going on and which part of the task it needs to work on.&lt;/p&gt;

&lt;p&gt;To make sure the computers can communicate well and maintain consistency across the system, distributed systems use various algorithms and protocols. Consensus algorithms like Paxos and Raft help nodes agree on the state of the system, while replication and data-distribution techniques like sharding and partitioning spread data across computers for better fault tolerance, which means they can keep working even when something goes wrong.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1srcvqq0ezgv180p6yl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1srcvqq0ezgv180p6yl.png" alt="source: https://www.atlassian.com/microservices/microservices-architecture/distributed-architecture" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;
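&lt;p&gt;As a rough illustration of partitioning with replication, here is a deliberately naive sketch: each key is assigned a "primary" node by hashing, and also copied to the next node so the data survives a single failure. Real systems typically use consistent hashing and rebalancing; the node names below are made up.&lt;/p&gt;

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]

def owners(key, replicas=2):
    """Pick a primary node by hashing the key, plus the next node as a replica."""
    start = zlib.crc32(key.encode()) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas)]

# every key lives on two different nodes, so losing one node loses no data
placement = {key: owners(key) for key in ["user:1", "user:2", "user:3"]}
for key, nodes in placement.items():
    print(key, "->", nodes)
```

&lt;p&gt;The fault-tolerance property falls straight out of the layout: whichever node fails, every key still has a live copy somewhere else.&lt;/p&gt;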

&lt;h2&gt;
  
  
  Where is it?
&lt;/h2&gt;

&lt;p&gt;Distributed systems are &lt;strong&gt;widely&lt;/strong&gt; used in real-world applications by some of the most popular tech companies. For instance, &lt;strong&gt;Google&lt;/strong&gt; uses distributed systems in its search engine infrastructure to handle billions of queries and return results quickly. Also, &lt;strong&gt;Netflix&lt;/strong&gt; relies on distributed systems to manage and deliver its massive library of movies and TV shows to millions of users around the globe. We can also mention &lt;em&gt;AWS&lt;/em&gt; and &lt;em&gt;Microsoft Azure&lt;/em&gt;, which provide cloud computing platforms built on distributed systems that enable businesses to scale their applications and services efficiently.&lt;/p&gt;

&lt;p&gt;If you're interested in getting started with distributed systems, there are plenty of resources available to help. You can begin by exploring online courses and tutorials covering the basics of distributed computing and parallel processing. It's also a great idea to experiment with some of the popular tools and technologies, like Apache Kafka, Apache Spark, and Dask. Working hands-on with these tools will give you a deeper understanding of how distributed systems work in practice and build the skills needed to work with them effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's talk about it!
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Looking for a live discussion about it with top industry experts to answer all your questions? Well, this is your lucky day&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I'm excited to invite you to our upcoming Twitter Spaces event, where we'll dive even deeper into distributed systems. We have an outstanding panel of experts who will share their knowledge and insights about distributed systems, answering all your questions and helping you easily learn this complex topic.&lt;/p&gt;

&lt;p&gt;The Space will take place tomorrow (Friday, March 31st) at 12 pm EST.&lt;/p&gt;

&lt;p&gt;Mark your calendars and access the event through this link:&lt;br&gt;
&lt;a href="https://twitter.com/i/spaces/1LyxBqRwLYWJN?s=20" rel="noopener noreferrer"&gt;https://twitter.com/i/spaces/1LyxBqRwLYWJN?s=20&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6p1wy1uw0ytv6cghsmt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6p1wy1uw0ytv6cghsmt.jpg" alt="Twitter Spaces announcement" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
Set your reminders!&lt;/p&gt;

&lt;p&gt;Won't be able to attend live but would still like to watch? The recording will be available at the same link.&lt;/p&gt;

</description>
      <category>spark</category>
      <category>programming</category>
      <category>beginners</category>
      <category>devops</category>
    </item>
    <item>
      <title>How I Got a Career in DevRel While Still in College</title>
      <dc:creator>Sabrina</dc:creator>
      <pubDate>Tue, 21 Feb 2023 17:43:43 +0000</pubDate>
      <link>https://dev.to/devrelbr/como-consegui-uma-carreira-em-devrel-ainda-na-faculdade-anc</link>
      <guid>https://dev.to/devrelbr/como-consegui-uma-carreira-em-devrel-ainda-na-faculdade-anc</guid>
      <description>&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Intro&lt;/li&gt;
&lt;li&gt;Making a Few Jokes&lt;/li&gt;
&lt;li&gt;The POWER of Networking&lt;/li&gt;
&lt;li&gt;Taking a Hobby Seriously&lt;/li&gt;
&lt;li&gt;A LOT to Learn&lt;/li&gt;
&lt;li&gt;In Short: How to Start a Community&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;TL;DR&lt;/em&gt;&lt;br&gt;
In this blog I'll share my &lt;strong&gt;"accidental"&lt;/strong&gt; story as a content creator, which led me to meet incredible people and to the opportunity to work in DevRel while still in college. I'll talk about what I've learned so far, from my first tweets to building an &lt;strong&gt;international career&lt;/strong&gt;. I took plenty of hits along the way, and these are the biggest lessons and tips I have for anyone who'd like to follow a similar path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Intro &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;With not much to do during the Covid pandemic, I decided to log into an old Twitter account of mine, which I had never really used and which had a little over 5 followers at the time. I started to &lt;strong&gt;follow lots of topics and accounts I found interesting&lt;/strong&gt;, and some of them were related to Computer Science.&lt;/p&gt;

&lt;p&gt;At that point I was already coding occasionally in college, and I had had my first contact with programming through Harvard's &lt;a href="https://pll.harvard.edu/course/cs50-introduction-computer-science?delta=0" rel="noopener noreferrer"&gt;CS50 program&lt;/a&gt; &lt;strong&gt;(essential for anyone starting out)&lt;/strong&gt; back in high school, where I did a technical track in informatics, plus a few scattered projects here and there. But nothing too &lt;strong&gt;woooow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Eventually I came across a &lt;strong&gt;hashtag&lt;/strong&gt; that really caught my attention: &lt;a href="https://www.100daysofcode.com/" rel="noopener noreferrer"&gt;#100DaysOfCode&lt;/a&gt;. It basically consists of people coding for 100 days in a row, &lt;strong&gt;at least 1 hour per day&lt;/strong&gt;, and &lt;strong&gt;tweeting&lt;/strong&gt; about what they learned that day. &lt;/p&gt;

&lt;p&gt;I decided to start learning a bit of &lt;strong&gt;front-end&lt;/strong&gt;, since there was lots of content from people helping beginners and it was the most talked-about topic on tech Twitter at the time. This was literally me &lt;strong&gt;with no followers&lt;/strong&gt;, okay? I was doing it just to force myself to code consistently and &lt;strong&gt;"punch the clock"&lt;/strong&gt;, even if only for a single hour.&lt;/p&gt;

&lt;p&gt;That's why the tweets I posted with the hashtag weren't meant to be very informative or to teach anything. They were usually just a funny thought or an interesting observation about something I had learned.&lt;/p&gt;

&lt;p&gt;Because of the reach of that hashtag at the time, created precisely to give visibility to new devs and to &lt;strong&gt;help each other as a community&lt;/strong&gt;, those tweets started getting some likes. I remember my very first post already got some engagement, and I was really excited because my account was genuinely quite &lt;strong&gt;isolated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjn1q2hbqkihxn94c6ei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjn1q2hbqkihxn94c6ei.png" alt="tweet Day1 #100DaysOfCode" width="533" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I picked up a few followers every day, and the tweets kept reaching more and more people.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv2hfsu529qwiwjpon2d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv2hfsu529qwiwjpon2d.png" alt="tweet Day43 #100DaysOfCode" width="533" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rjmv3uxgte5j3k80q50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rjmv3uxgte5j3k80q50.png" alt="tweet Day54 #100DaysOfCode" width="536" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back then I interacted &lt;strong&gt;a LOT&lt;/strong&gt; with people through DMs and comments, made friends I still talk to today, and met people who are literally tech Twitter &lt;strong&gt;legends&lt;/strong&gt;, like the legendary &lt;a href="https://twitter.com/TheJackForge" rel="noopener noreferrer"&gt;Jack Forge&lt;/a&gt;, who helped me a lot. Everyone was really nice, and all of it was very &lt;strong&gt;new&lt;/strong&gt; and &lt;strong&gt;interesting&lt;/strong&gt; to me.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaspzqvhy5p390qkyqwu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaspzqvhy5p390qkyqwu.png" alt="Quote tweet from Jack Forge" width="536" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Making a Few Jokes &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Many people started following me because they identified with the more sarcastic type of content, which I also enjoyed making. And without meaning to, I was shaping my &lt;strong&gt;niche&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Being able to &lt;strong&gt;create and identify your niche&lt;/strong&gt; is one of the most important steps in content creation, and it's what will make the difference between growing fast or slowly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29obd6cmurhjxjk6bxgu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29obd6cmurhjxjk6bxgu.png" alt="tweet @sabrinaesaquino " width="540" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Over time I realized that when I posted content for that niche, the engagement was much bigger. That happens because my audience is made up of people who &lt;strong&gt;like&lt;/strong&gt; those tweets and &lt;strong&gt;already associate&lt;/strong&gt; them with my account, so they're much more likely to like or retweet that content.&lt;/p&gt;

&lt;p&gt;Being welcoming in the comments makes people interact with you more often. Here are some of the comments from that same post I just showed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm96ddrjkgcm4armx3xo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm96ddrjkgcm4armx3xo.png" alt="tweet comments" width="541" height="854"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Comments are also one of the most powerful tools for expanding your reach. A big part of how people will see you, and decide whether they like you, comes from watching how you &lt;strong&gt;interact&lt;/strong&gt; with other users or followers in the comments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdl52up3bhwqtbn95juqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdl52up3bhwqtbn95juqz.png" alt="tweet comments" width="541" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Understanding how content creation works and how to make the algorithm work in your favor is crucial if you plan to build an audience for your content.&lt;/p&gt;

&lt;p&gt;On that post, I gained a lot of reach just by replying to comments, and accounts with few followers also got plenty of visibility from a simple reply.&lt;/p&gt;

&lt;h2&gt;
  
  
  The POWER of Networking &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;When I was starting out, interacting with the community was crucial for my growth. Besides letting you meet incredible people, many of them are willing to help or have a quick chat that can make a difference in your career.&lt;/p&gt;

&lt;p&gt;Because of those interactions, I got mentorship, picked up insights from some &lt;strong&gt;AMAZING&lt;/strong&gt; people, was invited to online events, and had many other opportunities I would never have had without the community's help.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0980hhxm1ymziz2v301f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0980hhxm1ymziz2v301f.jpg" alt="Picture of a developers meetup where I participated as a speaker" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I can't &lt;strong&gt;emphasize&lt;/strong&gt; enough how &lt;strong&gt;important&lt;/strong&gt; it is, regardless of your content, follower count, or number of likes, to make &lt;strong&gt;connections&lt;/strong&gt; with other people. In the end, that's what really matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To sum up&lt;/strong&gt;, these are the steps I consider fundamental to &lt;strong&gt;succeed&lt;/strong&gt; with all the tips so far:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose your &lt;strong&gt;niche&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Find &lt;strong&gt;reference&lt;/strong&gt; people in it&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Interact&lt;/strong&gt; with those people's content&lt;/li&gt;
&lt;li&gt;Create your &lt;strong&gt;own&lt;/strong&gt; content&lt;/li&gt;
&lt;li&gt;Stay &lt;strong&gt;consistent&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Engage with the &lt;strong&gt;community&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By following steps &lt;strong&gt;1&lt;/strong&gt; through &lt;strong&gt;3&lt;/strong&gt;, you're guaranteed to start drawing attention to your account. Your job now is to convert that &lt;em&gt;attention&lt;/em&gt; into &lt;em&gt;followers&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In step 4, focus on &lt;strong&gt;producing value&lt;/strong&gt; and &lt;strong&gt;sticking to your specific niche&lt;/strong&gt;. Be original, post things authentically, and always set aside a little time to plan your next move. That's what will make people want to follow you and engage with your content.&lt;/p&gt;

&lt;p&gt;For steps 5 and 6, try to &lt;strong&gt;post every day&lt;/strong&gt;. That way people will see more of you, increasing the chance they become a follower. Reply to as many comments as you can and interact with your followers. Nobody likes following a ghost account! &lt;/p&gt;

&lt;h2&gt;
  
  
  Taking a Hobby Seriously &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Building an online presence was a process that taught me a lot, but for a long time I treated it only as a &lt;strong&gt;hobby&lt;/strong&gt; that motivated me to code consistently. I did it simply because I &lt;strong&gt;enjoyed it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But how did that lead me to DevRel?&lt;/p&gt;

&lt;p&gt;As I was leaving an internship as a software developer, I received an offer to work part-time as a community manager for a Canadian startup called &lt;a href="https://www.shakudo.io/" rel="noopener noreferrer"&gt;Shakudo&lt;/a&gt;, where I still work today. Obviously I was thrilled at the prospect of working for a company &lt;strong&gt;abroad&lt;/strong&gt;, and even more so at doing something I already enjoyed doing for free.&lt;/p&gt;

&lt;p&gt;During that period I graduated in Computer Science and Technology at UFRN, a general degree that leads into my current program, Computer Engineering. Eventually, the &lt;strong&gt;technical skills&lt;/strong&gt; I had built over the years on my own and in college, together with my community-building experience, helped me join &lt;strong&gt;Shakudo full-time as a Developer Advocate&lt;/strong&gt; 🎉 &lt;/p&gt;

&lt;p&gt;Now I'm responsible for running &lt;strong&gt;product demos&lt;/strong&gt;, &lt;strong&gt;creating case studies&lt;/strong&gt;, &lt;strong&gt;writing documentation&lt;/strong&gt;, and managing several initiatives, making sure they're &lt;strong&gt;aligned&lt;/strong&gt; with the company's goals. I'll talk more about my day-to-day as a Developer Advocate in future DevRel Brasil posts.&lt;/p&gt;

&lt;p&gt;Most of my team works on-site in Canada, so I often look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawhus4ijzzk3gy5lay6l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawhus4ijzzk3gy5lay6l.jpg" alt="Online meeting with me on a huge TV" width="800" height="710"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But Sabrina, do I &lt;strong&gt;NEED&lt;/strong&gt; to build my own audience to work in DevRel? &lt;/p&gt;

&lt;p&gt;The answer is: &lt;strong&gt;NO!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You don't need any online influence to be good at DevRel, not least because not everyone wants to expose themselves publicly like that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HOWEVER&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to work in this field, it helps to have some &lt;strong&gt;track record of success&lt;/strong&gt; creating content and growing communities, on top of your technical skills. &lt;/p&gt;

&lt;p&gt;I talk about online presence because it's the method that worked for me and for dozens of people I know who work in DevRel today. &lt;strong&gt;However&lt;/strong&gt;, I'd say that isn't the case for a large share of DevRel professionals. Since there are many ways in, it all depends on you and how you want to build your career.&lt;/p&gt;

&lt;h2&gt;
  
  
  A LOT to Learn &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;There are certainly &lt;strong&gt;many&lt;/strong&gt; things you need to learn to succeed in DevRel. It's normal for some of them to come more easily to you than others, so my tip is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Faça algo ruim com esforço para depois fazer algo bom sem esforço."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpv352jhztnjrc2wprlf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpv352jhztnjrc2wprlf.png" alt="meme Ok ima google my error - damn all links are purple" width="680" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Don't try to reach perfection right away. If you make something that doesn't look so good now but can be improved, &lt;strong&gt;post it&lt;/strong&gt;, analyze what can be refined, and keep improving over time. Remember, Rome wasn't built in a day.&lt;/p&gt;

&lt;p&gt;Trust me, this process will eventually become natural to you, increasing the amount of quality content you can produce in a short time.&lt;/p&gt;

&lt;p&gt;Don't forget to post your learning process on social media. Besides being a &lt;strong&gt;great&lt;/strong&gt; type of content, it's very &lt;strong&gt;motivating&lt;/strong&gt; for anyone who'd like to do something similar.&lt;/p&gt;

&lt;h2&gt;
  
  
  In Short: How to Start a Community &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;If you don't have an online presence yet and are wondering how to start your own personal brand or developer community from scratch, here are a few things to consider before diving in headfirst:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Plan your social network&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I talk a lot about Twitter here, because it ended up being the network I like best and the one that gave me many &lt;strong&gt;opportunities&lt;/strong&gt;. That doesn't mean other networks don't also offer amazing opportunities for personal and community growth. Other examples include &lt;strong&gt;Instagram&lt;/strong&gt;, &lt;strong&gt;Twitch&lt;/strong&gt;, &lt;strong&gt;Reddit&lt;/strong&gt;, &lt;strong&gt;Youtube&lt;/strong&gt;, and &lt;strong&gt;Discord&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foz64ijw8zivr6eozas1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foz64ijw8zivr6eozas1r.png" alt="redes sociais" width="800" height="580"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before starting, &lt;strong&gt;analyze&lt;/strong&gt; which one best fits your profile and what kind of content you would like to produce. Here are some examples of successful developer communities on these other networks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Javascript Wizz &lt;a href="https://instagram.com/javascript_wizz?igshid=YmMyMTA2M2Y=" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Midudev &lt;a href="https://www.twitch.tv/midudev" rel="noopener noreferrer"&gt;Twitch&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;WebDev &lt;a href="https://www.reddit.com/r/webdev/" rel="noopener noreferrer"&gt;Reddit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Joma Tech &lt;a href="https://www.youtube.com/@jomaoppa" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;He4art Developers &lt;a href="https://discord.gg/he4rt" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Start small&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Setting big goals right away can easily lead to frustration, demotivation, or &lt;strong&gt;burnout&lt;/strong&gt;. You're more likely to do well by focusing on gaining &lt;strong&gt;10&lt;/strong&gt; followers or members in one week than &lt;strong&gt;10,000&lt;/strong&gt; in a year.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Stay consistent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It never hurts to repeat how essential this is. One of the methods I used to stay consistent was the &lt;strong&gt;#100DaysOfCode&lt;/strong&gt; hashtag. Whatever you choose, the important thing is to &lt;strong&gt;show up&lt;/strong&gt; every day and create a little bit of value for the community.&lt;/p&gt;

&lt;p&gt;Understand your community's &lt;strong&gt;niche&lt;/strong&gt;, study what other people are doing, and &lt;strong&gt;apply&lt;/strong&gt; that knowledge to your content &lt;strong&gt;consistently&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Innovate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Having a &lt;strong&gt;differentiator&lt;/strong&gt; is an important factor in scaling your community. Ask yourself: there are already plenty of communities out there that developers can join. &lt;strong&gt;Why would they choose yours?&lt;/strong&gt; Think of something you would like to see happening today that doesn't yet have much support.&lt;/p&gt;

&lt;p&gt;It could be an &lt;strong&gt;open-source&lt;/strong&gt; project that would help many people get started in a given language. Or a way to ease the path for people transitioning into a tech career &lt;strong&gt;after 30&lt;/strong&gt;. Maybe a community entirely focused on building things with &lt;strong&gt;no-code&lt;/strong&gt; tools or &lt;strong&gt;artificial intelligence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Choose something you have an &lt;strong&gt;affinity&lt;/strong&gt; for and create something new that &lt;strong&gt;you&lt;/strong&gt; would like to see in that area. Chances are plenty of other people would also like to be part of your idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I've shared here a bit of my experience and many of the small things that allowed me to build an international career while still finishing my Computer Engineering degree.&lt;/p&gt;

&lt;p&gt;However, the most &lt;strong&gt;valuable&lt;/strong&gt; tip I've learned so far, one that can help in any career and in life in general, is: &lt;strong&gt;always be ready to fail and learn&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you learn constantly, work hard, and aren't a jerk, the odds of being extremely successful in DevRel or in &lt;strong&gt;any other career&lt;/strong&gt; are enormous.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhsbs3g5da0jwvxkoo1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhsbs3g5da0jwvxkoo1b.png" alt="bro relax i'm just good at googling meme" width="800" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope this post helped you understand a bit more about the importance of building an online presence and, at best, serves as the little seed of motivation for someone to start building their own developer community.&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
  </channel>
</rss>
