What is RAG?
Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate external information at query time. By feeding relevant context to the LLM alongside the user's prompt, RAG produces more accurate and grounded answers.
Why Vector Search Matters in RAG
Traditional keyword search breaks down when queries are vague, paraphrased, or semantically rich. Vector search solves this by representing text as high-dimensional embeddings that encode meaning rather than literal wording.
Embedding converts documents and user queries into high-dimensional vectors that capture their semantic meaning.
Indexing then uses these vectors to build an approximate nearest-neighbor (ANN) structure such as HNSW, IVF-Flat, or PQ, enabling efficient similarity search.
Retrieval embeds the incoming query and compares it against the indexed vectors, returning the closest matches based on semantic similarity.
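To make these three stages concrete, here is a minimal in-memory sketch in TypeScript. The embed function is a hypothetical placeholder for whatever embedding model you call, and the brute-force scan stands in for a real ANN index:

// Minimal in-memory sketch of the embed -> index -> retrieve loop.
// `embed` is a hypothetical placeholder for a real embedding-model call.
declare function embed(text: string): Promise<number[]>;

interface IndexedDoc {
  text: string;
  vector: number[];
}

// Cosine similarity: how closely two vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "Indexing" here is just an array; production systems build an ANN
// structure (HNSW, IVF-Flat, PQ) over the same vectors instead.
async function buildIndex(docs: string[]): Promise<IndexedDoc[]> {
  return Promise.all(docs.map(async (text) => ({ text, vector: await embed(text) })));
}

// Retrieval: embed the query and return the k most similar documents.
async function retrieve(query: string, index: IndexedDoc[], k = 3) {
  const queryVector = await embed(query);
  return index
    .map((doc) => ({ text: doc.text, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}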
Let's make this concrete with an example.
Imagine we have these sources of data:
Source 1
"Hono is a fast, lightweight JavaScript framework built on Web Standards. It focuses on low overhead, edge-friendly execution, and a minimal API surface."
Source 2
"Elysia is a Bun-optimized web framework that provides strong typing, schema validation, and excellent performance. It is designed for building scalable HTTP services with good developer ergonomics."
Source 3
"Express is a minimalistic and widely adopted Node.js framework. It is commonly used to build REST APIs because of its simplicity, extensive ecosystem, and flexible middleware model."
Now imagine a user searches for the following query:
"How can I build a backend service in JavaScript, with better Bun runtime integration?"
When this query is embedded, the resulting vector represents concepts such as backend service, API development, HTTP frameworks, and JavaScript server-side technologies.
Most modern embedding models (Voyage, OpenAI, Hugging Face, etc.) generate vectors with between 512 and 3,072 dimensions. Here is a truncated example (60 values shown):
[
0.0182, -0.0925, 0.0441, 0.0107, -0.0713, 0.1234, -0.0089, 0.0562,
-0.0041, 0.0977, 0.0229, -0.0335, 0.1412, -0.0611, 0.0054, 0.0883,
-0.0122, 0.0745, -0.1099, 0.0671, 0.0144, -0.0528, 0.0995, -0.0173,
0.0811, -0.0442, 0.0368, 0.1210, -0.0075, 0.0932, -0.0661, 0.0152,
0.0473, -0.0891, 0.1329, 0.0287, -0.0174, 0.0721, -0.0554, 0.1012,
0.0069, -0.0312, 0.1184, -0.0251, 0.0526, 0.0048, -0.0903, 0.1301,
0.0110, -0.0782, 0.0433, 0.0271, -0.0622, 0.0999, -0.0148, 0.0711,
0.0835, -0.0222, 0.0579, -0.0384
]
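In practice, a vector like this comes from an embedding API call. As one example, here is a sketch using the OpenAI Node SDK; the model name is an assumption, and Voyage or any other provider follows the same pattern:

// Producing a query embedding with the OpenAI Node SDK.
// The model name is illustrative; any embedding provider works similarly.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.embeddings.create({
  model: "text-embedding-3-small", // produces 1536-dimensional vectors
  input: "How can I build a backend service in JavaScript, with better Bun runtime integration?",
});

const queryVector: number[] = response.data[0].embedding;
console.log(queryVector.length); // 1536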
Vector search then uses the index to compare this query vector against the vectors generated from the sources above.
The similarity search identifies which sources are semantically closest to the intent of the query and retrieves them. In this case:
Source 2 (Elysia) is likely to rank highest because it explicitly mentions Bun optimization and scalable HTTP services—concepts directly aligned with building a backend service in JavaScript with strong Bun runtime integration.
Source 3 (Express) would typically appear next, as it strongly relates to backend development and REST API construction, even though it is not tailored to Bun.
Source 1 (Hono) remains relevant but ranks lower because its description emphasizes minimalism and edge-friendly execution rather than Bun-specific integration or backend-focused features.
A vector search result might look like this:
[
  {
    "text": "Elysia is a Bun-optimized web framework that provides strong typing, schema validation, and excellent performance. It is designed for building scalable HTTP services with good developer ergonomics.",
    "score": 0.91
  },
  {
    "text": "Express is a minimalistic and widely adopted Node.js framework. It is commonly used to build REST APIs because of its simplicity, extensive ecosystem, and flexible middleware model.",
    "score": 0.78
  },
  {
    "text": "Hono is a fast, lightweight JavaScript framework built on Web Standards. It focuses on low overhead, edge-friendly execution, and a minimal API surface.",
    "score": 0.61
  }
]
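With the buildIndex and retrieve helpers sketched earlier, reproducing this ranking takes only a few lines (the exact scores depend on the embedding model):

// Reuses buildIndex/retrieve from the earlier sketch.
const sources = [
  "Hono is a fast, lightweight JavaScript framework built on Web Standards. It focuses on low overhead, edge-friendly execution, and a minimal API surface.",
  "Elysia is a Bun-optimized web framework that provides strong typing, schema validation, and excellent performance. It is designed for building scalable HTTP services with good developer ergonomics.",
  "Express is a minimalistic and widely adopted Node.js framework. It is commonly used to build REST APIs because of its simplicity, extensive ecosystem, and flexible middleware model.",
];

const index = await buildIndex(sources);
const results = await retrieve(
  "How can I build a backend service in JavaScript, with better Bun runtime integration?",
  index,
  3
);
// results is ordered by score, Elysia first, matching the output above.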
Why use MongoDB Atlas Vector Search?
MongoDB Atlas Vector Search brings vector similarity, metadata filtering, and document storage into a single, unified system.
Instead of splitting your stack between a vector database and an operational database, Atlas lets you keep embeddings, raw documents, and application data side-by-side. This removes the overhead of synchronizing two systems, reduces latency, and simplifies your architecture.
For RAG pipelines, this matters: you can store the original sources, their embeddings, and any contextual metadata (tags, timestamps, access rules, versions) all in one place and query everything in a single round trip.
How MongoDB Vector Search Works
MongoDB stores your embeddings inside collections as arrays of numbers, just like any other field. When you enable vector search on that field, Atlas builds an ANN index optimized for fast semantic similarity lookup.
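For example, one of the framework sources above might be stored as a document shaped like this (field names and metadata are illustrative):

{
  "_id": "framework_elysia",
  "text": "Elysia is a Bun-optimized web framework that provides strong typing, schema validation, and excellent performance. It is designed for building scalable HTTP services with good developer ergonomics.",
  "embedding": [0.0182, -0.0925, 0.0441 /* ...1536 values in total */],
  "metadata": { "runtime": "bun", "category": "web-framework" }
}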
When a query comes in, Atlas takes the embedded query vector (usually produced by your app or an LLM workflow), compares it against the indexed vectors, and returns the documents with the highest similarity (the smallest distance).
Creating a Vector Search Index in MongoDB Atlas
To enable vector search in MongoDB Atlas, you start by defining a vector index on the field that stores your embeddings. This index tells Atlas how to structure the ANN graph (HNSW) and what similarity metric to use. A typical index definition looks like this:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
Here, path points to the field where each document stores its vector representation. The numDimensions value must match the output size of the embedding model you use, and similarity defines how distance is calculated during retrieval.
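You can create this index in the Atlas UI, or programmatically. Here is a sketch assuming a recent MongoDB Node.js driver with search-index management support; the connection string, database, and collection names are placeholders:

// Creating the same Vector Search index from the Node.js driver.
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_URI!);
const collection = client.db("rag_demo").collection("frameworks");

await collection.createSearchIndex({
  name: "frameworks_vector_index",
  type: "vectorSearch",
  definition: {
    fields: [
      { type: "vector", path: "embedding", numDimensions: 1536, similarity: "cosine" },
    ],
  },
});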
Once indexed, MongoDB can perform vector search queries through the $vectorSearch stage. A minimal example looks like this:
{
  "$vectorSearch": {
    "index": "frameworks_vector_index",
    "path": "embedding",
    "queryVector": [/* query embedding values */],
    "numCandidates": 50,
    "limit": 3
  }
}
Behind the scenes, Atlas takes your queryVector, traverses the HNSW graph, and identifies the closest nodes based on the configured similarity metric.
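Putting it together, here is a sketch of running that stage from the Node.js driver, reusing the hypothetical embed helper and collection from the earlier snippets; vectorSearchScore is the metadata field Atlas uses to expose the similarity score:

// Running $vectorSearch as an aggregation pipeline from the Node.js driver.
// `collection` and `embed` come from the earlier sketches.
const queryVector = await embed(
  "How can I build a backend service in JavaScript, with better Bun runtime integration?"
);

const results = await collection
  .aggregate([
    {
      $vectorSearch: {
        index: "frameworks_vector_index",
        path: "embedding",
        queryVector,
        numCandidates: 50,
        limit: 3,
      },
    },
    // Keep the text and surface the similarity score Atlas computed.
    { $project: { _id: 0, text: 1, score: { $meta: "vectorSearchScore" } } },
  ])
  .toArray();
// results matches the shape of the example output shown earlier.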
For the earlier query, Atlas would receive the embedded user input, compare it against all stored embeddings, and return the most semantically aligned frameworks, just like in the example shown above. Because retrieval happens inside the same database where your documents and metadata live, your RAG pipeline can immediately assemble the final context to feed into the LLM.
A critical requirement: your embedding model must be the same for both stored documents and incoming queries. Mixing models or versions breaks vector compatibility and degrades similarity search. Always embed with the same model.
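One simple way to enforce this is to route every embedding call, for documents and queries alike, through a single helper pinned to one model. A sketch, with an illustrative model name:

// Pin one model for both ingestion and queries so vectors stay compatible.
import OpenAI from "openai";

const EMBEDDING_MODEL = "text-embedding-3-small"; // never differs between indexing and querying
const openai = new OpenAI();

export async function embedText(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({ model: EMBEDDING_MODEL, input: text });
  return res.data[0].embedding;
}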
This was a brief overview of how RAG, vector search, and MongoDB Atlas fit together in a practical workflow. I will continue publishing more articles exploring RAG architectures, vector indexing strategies, hybrid search, retrieval optimization, and real-world patterns using MongoDB Atlas.
Documentation
MongoDB Atlas Vector Search
https://www.mongodb.com/docs/atlas/atlas-search/vector-search/
MongoDB University (free courses)
https://learn.mongodb.com/
Voyage AI
https://docs.voyageai.com/
