midegdugarova

Posted on Jun 25

Qdrant in Production: 10 Gotchas the Quickstart Won't Tell You

#ai #rag #qdrant #vectorsearch

The Qdrant quickstart is genuinely good — you're upserting vectors and getting
search results in five minutes. But there's a gap between "the demo works" and
"this runs in production without surprising me," and most of what lives in that
gap isn't in any single docs page. It's scattered across reference sections,
GitHub issues, and the scars of people who hit it at 2 a.m.

I collected these while ramping up on Qdrant — reading the docs end to end,
building demos, and auditing the gaps. Here are the ten that matter, ordered
roughly by when they'll bite you: your first week, your first month, your
first incident.

All code uses the current Python client API (query_points, not the deprecated
search).

Your first week

1. Payload indexing is not automatic

This is the big one. Qdrant lets you filter on any payload field out of the
box, and at demo scale it's fast — so it's easy to assume filtering "just
works." It does. It's just doing a full scan over candidate payloads,
which falls off a cliff as the collection grows.

Every field you filter on needs an explicit index:

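from qdrant_client import QdrantClient

# Assumes a local instance; point this at your own deployment.
client = QdrantClient(url="http://localhost:6333")

client.create_payload_index(
    collection_name="my_docs",
    field_name="category",
    field_schema="keyword",
)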

There's no warning when you filter on an unindexed field. The symptom is just
"filtered queries got slow somewhere past a few hundred thousand points." Make
payload indexes part of your collection-creation script, not an afterthought.
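
One way to make that habit concrete is to keep every filterable field in a
single mapping and apply it right after the collection is created. The sketch
below is a minimal version under assumptions: `ensure_payload_indexes` and the
field names are hypothetical, and `client` is any object that exposes
`create_payload_index` with the keyword arguments shown above (a
`QdrantClient` in practice).

```python
def ensure_payload_indexes(client, collection_name, field_schemas):
    """Create one payload index per filterable field.

    field_schemas maps a payload field name to a schema name such as
    "keyword" or "integer". Returns the field names that were requested.
    """
    created = []
    for field_name, schema in field_schemas.items():
        client.create_payload_index(
            collection_name=collection_name,
            field_name=field_name,
            field_schema=schema,
        )
        created.append(field_name)
    return created


# Hypothetical field list: every field the application filters on.
FILTER_FIELDS = {
    "category": "keyword",
    "tenant_id": "keyword",
    "published_year": "integer",
}
```

Calling `ensure_payload_indexes(client, "my_docs", FILTER_FIELDS)` from the
same script that creates the collection keeps the index list reviewable in one
place.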

2. Cosine vs. dot product: normalization decides

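If your embeddings are L2-normalized (OpenAI documents its embeddings as
unit-length; for Cohere and other providers, verify with a quick norm check),
cosine similarity and dot product give identical rankings. Qdrant computes
cosine as a dot product over vectors it normalizes on upload, so DOT only
skips that normalization step. The gain is small, but it's free: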

vectors_config=VectorParams(size=1536, distance=Distance.DOT)

The trap runs the other way: use DOT with un-normalized embeddings and your
results get silently biased toward vectors with larger magnitudes. No error,
just subtly wrong rankings — the worst kind of bug.

Rule of thumb: embeddings you've confirmed are unit-length (OpenAI's, for
example) → DOT. Anything else, or unsure → COSINE, which normalizes for you.
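
To see the bias concretely, here is a small pure-Python sketch (no Qdrant
involved) that ranks two made-up documents under both metrics. The vectors and
the `is_unit_length` helper are illustrative assumptions; the same norm check
can be run on a sample of real embeddings before committing to DOT.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def is_unit_length(vec, tol=1e-3):
    """True when the vector's L2 norm is within tol of 1.0."""
    return abs(norm(vec) - 1.0) <= tol


def rank(query, docs, score):
    """Return doc names ordered from best to worst under the given score."""
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)


query = [1.0, 0.0]
docs = {
    "aligned_short": [0.9, 0.1],  # points almost exactly along the query
    "off_axis_long": [5.0, 5.0],  # 45 degrees off, but much larger magnitude
}

cosine_order = rank(query, docs, cosine)  # direction only
dot_order = rank(query, docs, dot)        # direction and magnitude
```

Cosine puts `aligned_short` first, while dot product promotes
`off_axis_long` purely because of its length, which is the silent bias
described above.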

3. Collection config is forever

Vector dimensions and distance metric are immutable after
create_collection. There is no migration path — switching embedding models
means a new collection and a full re-ingest of everything.

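That's worth a real decision upfront, not a default. And if you suspect you'll
ever migrate models (you will), use named vectors from day one. With named
vectors, one point can hold embeddings from several models side by side, so a
migration becomes a backfill instead of rebuilding the world. Check whether
your Qdrant version lets you add a named vector to an existing collection; if
it doesn't, declare the slots you expect to need at creation time: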

vectors_config={
    "openai-small": VectorParams(size=1536, distance=Distance.DOT),
    # room to add "openai-large" later without a new collection
}

Your first month

4. `upsert` replaces the entire point

Qdrant has three update operations, and using the wrong one silently loses
data:

upsert — replaces the whole point: vector and all payload fields
set_payload — updates only the payload fields you pass
update_vectors — updates only the vector

The classic mistake is using upsert to "update one field." Any payload field
you didn't re-include is gone — no error, no warning. If you're patching
metadata, you want set_payload.
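
The difference is easy to see in a toy model. The class below is a
hypothetical in-memory stand-in, not the Qdrant client; it only mimics the
replace-versus-merge semantics described above so the data loss is visible.

```python
class ToyPointStore:
    """In-memory stand-in that mimics the update semantics described above.

    This is NOT the Qdrant client; it only models replace-vs-merge behaviour.
    """

    def __init__(self):
        self.points = {}

    def upsert(self, point_id, vector, payload):
        # Replaces the whole point: any payload key not passed here is dropped.
        self.points[point_id] = {"vector": list(vector), "payload": dict(payload)}

    def set_payload(self, point_id, payload):
        # Merges only the given keys; vector and other payload keys survive.
        self.points[point_id]["payload"].update(payload)


store = ToyPointStore()
store.upsert(1, [0.1, 0.2], {"title": "Intro", "category": "docs"})

# Patching one field the safe way keeps "title".
store.set_payload(1, {"category": "guides"})
after_patch = dict(store.points[1]["payload"])

# The classic mistake: "updating" one field with upsert drops "title".
store.upsert(1, [0.1, 0.2], {"category": "archive"})
after_upsert = dict(store.points[1]["payload"])
```

With the real client, the safe patch corresponds to
`client.set_payload(collection_name="my_docs", payload={"category": "guides"}, points=[1])`.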

5. Very selective filters quietly change the algorithm

Qdrant's filtered search is smart: the query planner estimates how many points
match your filter, and if the match set is small enough, it skips the HNSW
index entirely and does an exact scan over the matching points, because that's
genuinely faster at that selectivity. The cutoff is not a fixed percentage; it
follows the collection's full_scan_threshold setting, so treat "under ~1% of
the collection" as rough intuition rather than a rule.

This is correct behavior, but it produces a confusing symptom: "search is
fast usually, slow sometimes," depending on which filter a user happens to
pick. If you have a dimension that's always extremely selective (per-tenant
data is the classic case), reach for Qdrant's multitenancy pattern first: one
collection partitioned by an indexed tenant field. A separate collection per
tenant is reasonable only when the tenant count is small, since every
collection carries its own overhead.

6. Set `score_threshold`, or your RAG pipeline will hallucinate politely

By default, search returns the limit nearest results no matter how far
away they are. Ask about something your collection knows nothing about, and
you still get back the top 5 "closest" chunks — which are garbage — and your
LLM will confidently synthesize an answer from them.

The fix is one parameter plus one honest code path:

results = client.query_points(
    collection_name="my_docs",
    query=query_vector,
    limit=5,
    score_threshold=0.7,
).points

if not results:
    return "I don't have information about that."

A threshold around 0.7 is a reasonable starting point for OpenAI embeddings,
but calibrate it per model — score distributions vary a lot. The empty-results
branch is not an edge case; it's the feature.
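
Calibration can be as simple as collecting top-hit scores for questions the
collection should and should not be able to answer, then picking the cutoff
that separates them best. The sketch below is one rough way to do that; the
function name and the scores are made up for illustration, and it assumes a
metric where a higher score means a closer match (cosine or dot).

```python
def calibrate_threshold(relevant_scores, irrelevant_scores):
    """Pick the candidate cutoff that classifies the labelled scores best.

    relevant_scores: top-hit scores for queries the collection can answer.
    irrelevant_scores: top-hit scores for queries it cannot answer.
    Returns (threshold, accuracy). A hit is accepted when score >= threshold.
    """
    candidates = sorted(set(relevant_scores) | set(irrelevant_scores))
    total = len(relevant_scores) + len(irrelevant_scores)
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        kept = sum(s >= threshold for s in relevant_scores)
        dropped = sum(s < threshold for s in irrelevant_scores)
        accuracy = (kept + dropped) / total
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


# Made-up scores for illustration only; collect real ones from your own queries.
relevant = [0.82, 0.78, 0.74, 0.69]
irrelevant = [0.55, 0.61, 0.66, 0.71]

threshold, accuracy = calibrate_threshold(relevant, irrelevant)
```

On ties the lowest candidate wins, which favors answering over refusing; flip
that choice if a wrong answer costs more than a missing one.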

Your first incident

7. HNSW tuning: know which knob to turn first

Three parameters control the recall/speed/memory trade-off:

ef (search time; hnsw_ef in the Python client's SearchParams): beam width during search. Tune this first: it needs no rebuild and is often all you need.
ef_construct (default 100): beam width during index build. Higher = better graph quality, but 3–5× slower ingest. Requires rebuild.
m (default 16): edges per node. Higher = better recall and more memory, permanently. Requires rebuild.

So the debugging sequence when recall is too low: raise ef → if that's not
enough, raise ef_construct and rebuild → only then touch m. Going straight
to m=64 because a blog post said so costs you memory forever.
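
"Recall is too low" is only actionable if you measure it. A common approach is
to compare approximate results against an exact search for the same queries
and sweep ef until the overlap is good enough. The helpers below are a
hypothetical sketch of that loop; the search function is a stand-in supplied
by the caller, and the result ids are invented.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def smallest_ef_meeting_target(search_ids, exact_ids, ef_values, target_recall):
    """Return (ef, recall) for the first ef in ef_values that reaches the target.

    search_ids is a caller-supplied function ef -> list of point ids.
    Returns (None, best_recall_seen) when no value reaches the target.
    """
    best = 0.0
    for ef in sorted(ef_values):
        recall = recall_at_k(search_ids(ef), exact_ids)
        best = max(best, recall)
        if recall >= target_recall:
            return ef, recall
    return None, best


# Stand-in for a real search call: pretend larger ef finds more true neighbours.
EXACT = [1, 2, 3, 4, 5]
FAKE_RESULTS = {32: [1, 2, 9, 8, 7], 64: [1, 2, 3, 4, 9], 128: [1, 2, 3, 4, 5]}

chosen_ef, chosen_recall = smallest_ef_meeting_target(
    lambda ef: FAKE_RESULTS[ef], EXACT, [32, 64, 128], target_recall=0.8
)
```

Against a real collection, `search_ids` would wrap `query_points` with
`search_params=models.SearchParams(hnsw_ef=ef)`, and the exact ids would come
from the same query run with `models.SearchParams(exact=True)`. If no ef value
reaches the target, that is the signal to move on to ef_construct.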

8. Snapshots are your backup primitive — and they don't schedule themselves

Self-hosted Qdrant has no automatic backups. The primitive is the snapshot:

client.create_snapshot(collection_name="my_docs")

Three things to internalize before the incident, not during:

Nothing triggers snapshots for you. Cron it, or it doesn't happen.
A snapshot on the same disk as the data protects you from nothing. Ship it off-node.
Replication is not backup. replication_factor > 1 in distributed mode gives you high availability — it cheerfully replicates your bad deploy's deletions too.

(Qdrant Cloud handles backups for you — this one is squarely a self-hosting
gotcha.)
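
For the "cron it" part, a scheduled job usually needs to create a snapshot and
prune old ones so the disk doesn't fill up. The function below is a minimal
sketch under assumptions: `snapshot_and_prune` and the `keep` policy are
hypothetical, and `client` is expected to behave like `QdrantClient`, whose
`create_snapshot`, `list_snapshots`, and `delete_snapshot` calls return or
accept snapshot descriptions with `name` and `creation_time` fields. Copying
the new snapshot off-node is left out on purpose, because it depends on your
storage.

```python
def snapshot_and_prune(client, collection_name, keep=7):
    """Create a snapshot, then delete all but the `keep` newest ones.

    Returns (new_snapshot_name, deleted_names). Copying the new snapshot
    off-node is deliberately left to the caller.
    """
    created = client.create_snapshot(collection_name=collection_name)
    snapshots = client.list_snapshots(collection_name=collection_name)
    # ISO-8601 creation times sort correctly as strings; snapshots with a
    # missing creation time are treated as the oldest.
    ordered = sorted(snapshots, key=lambda s: s.creation_time or "", reverse=True)
    deleted = []
    for old in ordered[keep:]:
        client.delete_snapshot(
            collection_name=collection_name, snapshot_name=old.name
        )
        deleted.append(old.name)
    return created.name, deleted
```

Run the off-node copy before pruning in a real job, so a failed upload never
coincides with deleting the last good local snapshot.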

Two you'll be glad you knew

9. Sparse vectors are a different type, and hybrid search is a query shape

Sparse vectors (for BM25-style keyword matching) are not "dense vectors with a
different flag." They're configured separately (sparse_vectors_config with
SparseVectorParams) and use their own value type
(SparseVector(indices=[...], values=[...])).

And hybrid search isn't a magic hybrid=True parameter — it's a query shape:
two prefetch sub-queries (one dense, one sparse) fused with Reciprocal Rank
Fusion:

client.query_points(
    collection_name="my_docs",
    prefetch=[
        models.Prefetch(query=dense_vector, using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(indices=[...], values=[...]),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
)

Once you see it as composition rather than configuration, the whole Query API
makes more sense.
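
If Reciprocal Rank Fusion is new to you, it helps to see how little it does:
each result earns 1 / (k + rank) from every list it appears in, and the sums
decide the final order. The pure-Python sketch below illustrates the idea; it
is not Qdrant's implementation, the constant k=60 is a conventional value
assumed here, and the document ids are invented.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked id lists into one, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional choice, assumed here.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id for a stable result.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense prefetch order
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical sparse prefetch order

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that both sub-queries agree on (`doc_a`, `doc_c`) rise above ones
that only one of them found. Because only ranks are used, the dense and sparse
scores never need to be on the same scale.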

10. One point can carry many vectors

The model that finally clicked for me: a Qdrant point is not "a vector with
metadata." It's an entity that can hold multiple named dense vectors and
sparse vectors simultaneously:

vector={
    "text": text_embedding,
    "image": image_embedding,
    "sparse": SparseVector(indices=[...], values=[...]),
}

That's text search, image search, and keyword search over the same objects
from one collection — no syncing three stores, no duplicate payloads. If
you're designing a multimodal or hybrid system, this is the feature to design
around from the start (see gotcha #3: you can't bolt it on later without a
re-ingest).

The pattern underneath

Almost every item on this list is the same lesson wearing different clothes:
Qdrant's defaults are tuned for the demo, and production is a set of
explicit decisions — index your filter fields, pick your distance metric on
purpose, choose the right update operation, schedule your own snapshots,
threshold your own scores.

None of these are flaws; they're the configuration surface of a tool that
trusts you. But the quickstart can't make those decisions for you, and the
worst failures here are the silent ones. Better to meet them in a blog post
than in an incident channel.

Top comments (1)

Ahmet Özel • Jun 25

This matches my experience. Payload indexing being manual is the one that bites everyone. The symptom is sneaky too: filtered queries still return correct results in the demo, so nothing looks broken, and then latency quietly degrades as the collection grows because the filter is doing a full scan. Two more that fit your list: be deliberate about on-disk vs in-memory once the collection outgrows RAM, and watch the consistency and write-ordering settings if you upsert and immediately query, since you can read your own write before it is indexed. Did you settle on a default for hnsw_ef at query time, or tune it per collection? That one is easy to leave at the default and lose recall.