
An Vo

GenAI #2: What are Vector Databases?

What is a Vector?

A vector is a list of numbers that represents something (text, image, audio, user, product, etc.) in a way computers can compare.

Think of a vector as a “numerical fingerprint.”
A vector might look like: [0.48, -0.87, 1.04, 0.33, ...]
Each number captures some aspect of meaning or features.

Why vectors matter in GenAI

AI models can’t directly understand words or images — they understand numbers. So we convert things into vectors using embedding models.

Example:

"I love dogs" → vector A

"I like puppies" → vector B

Those vectors will be close together in vector space because they mean similar things.
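"Close together" can be measured with cosine similarity. Below is a minimal sketch using made-up 3-dimensional vectors (real embedding models output hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: ~1.0 means same direction (similar meaning), ~0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only
dogs    = [0.9, 0.1, 0.0]   # "I love dogs"
puppies = [0.8, 0.2, 0.1]   # "I like puppies"
stocks  = [0.0, 0.1, 0.9]   # "The stock market fell"

print(cosine_similarity(dogs, puppies))  # high (~0.98)
print(cosine_similarity(dogs, stocks))   # low  (~0.01)
```

The two dog-related vectors score near 1.0, while the unrelated sentence scores near 0 — exactly the property a vector database exploits.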

What are Vector Databases?

A vector database stores vectors and lets you quickly find the most similar ones.

Why regular databases aren’t enough

Traditional databases are great for:

  • Exact matches (id = 123)

  • Filters (price > 50)

But GenAI needs:

“Find text similar to this”

“Find documents that mean the same thing”

“Find images that look alike”

That’s where vector databases come in.

What does a Vector Database do?

A vector database:

  • Stores vectors (embeddings)

  • Indexes them efficiently

  • Performs similarity search (usually cosine similarity or distance)

Typical GenAI Workflow (RAG-style)

  • You start with data such as PDFs, notes, system logs, customer support chat logs, etc.

  • These documents are split into smaller chunks and converted into vectors (embeddings) using an embedding model.

  • The vectors are stored in a Vector Database.

  • A user asks a question.

  • The question is converted into a query vector using the same embedding model.

  • The Vector Database performs a similarity search to find the vectors (document chunks) most relevant to the query vector.

  • The retrieved text chunks are fed into an LLM as context.

  • The LLM uses this context to understand, summarize, and synthesize the information—following the system prompt—and returns a natural-language response to the end user.
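The retrieval half of this workflow can be sketched in a few lines. The `embed()` function here is a stand-in — a toy bag-of-words over a hypothetical vocabulary — where a real system would call an embedding model (OpenAI, Bedrock, sentence-transformers, etc.), and the list-of-tuples `store` stands in for a real vector database:

```python
import math
import re

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words over a tiny,
    # hand-picked vocabulary (hypothetical, for illustration only).
    vocab = ["refund", "shipping", "password", "invoice"]
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0  # guard against zero vectors
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Steps 1-3: chunk the documents and store (chunk, embedding) pairs.
chunks = [
    "Refunds are processed within 5 business days.",
    "Reset your password from the account settings page.",
    "Shipping takes 3 to 7 days depending on region.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

# Steps 4-6: embed the question with the SAME model, then similarity-search.
question = "How do I reset my password?"
q_vec = embed(question)
best = max(store, key=lambda item: cosine(q_vec, item[1]))

# Steps 7-8: the retrieved chunk would be passed to the LLM as context.
print(best[0])  # the password-reset chunk
```

The key detail is step 5: the question must be embedded with the same model used for the documents, or the vectors live in incompatible spaces.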

Popular Vector Databases:

  • Pinecone
  • Weaviate
  • Milvus
  • Chroma
  • FAISS (library, not a DB)
  • PostgreSQL + pgvector (Relational database with vector support)
  • Qdrant
  • AWS:
    • Standalone options: Amazon Kendra, Amazon OpenSearch Service, and Amazon RDS for PostgreSQL with pgvector.
    • Supported vector stores for Amazon Bedrock Knowledge Bases: Aurora PostgreSQL, Neptune Analytics, OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud.

Vector Index Optimization:

In a Vector Store, there can be millions or even billions of vectors.
To retrieve relevant information efficiently, these vectors cannot be stored as a simple flat list, which would require brute-force comparison against every vector.

Vector index optimization refers to selecting and configuring specialized algorithms that organize vectors into efficient data structures, enabling fast approximate similarity search while maintaining acceptable accuracy.

Brute-force comparison means comparing a query vector against every single vector in the vector store to find the most similar ones. It guarantees accuracy but kills performance at scale.

Assume:

- You have 1,000,000 vectors
- Each vector has 1,536 dimensions (typical OpenAI embedding size)
- When a user asks a question:
    - Convert the question into a vector
    - Compute similarity between the question vector and Vector 1, Vector 2, Vector 3, …, Vector 1,000,000

    - Sort the results
    - Return the top-K most similar vectors

That’s brute force.
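A minimal implementation of that procedure (shrunk to 10,000 random 64-dimensional vectors so it runs quickly; the point is that every vector is compared):

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def brute_force_top_k(query, vectors, k=3):
    # Compare the query against EVERY stored vector -- O(N) similarity
    # computations per query, which is what kills performance at scale.
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)          # sort ALL results...
    return [i for _, i in scored[:k]]  # ...return the top-K indices

random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(64)] for _ in range(10_000)]
query = vectors[42]  # query identical to one stored vector

top = brute_force_top_k(query, vectors, k=3)
print(top[0])  # 42 -- the identical vector ranks first
```

This is exact: the true nearest neighbor is always found. Index structures like HNSW and IVF trade a little of that exactness for dramatically fewer comparisons.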

Hierarchical Navigable Small World (HNSW): builds a graph structure that connects similar vectors. It is fast (low latency, high recall) but memory-hungry, since the graph is kept in RAM, and it is a strong fit for RAG chatbots serving end users.
Below is an intuitive sample:

Layer 3 (top, very sparse)
    o -------- o
       \
        o

Layer 2
  o ---- o ---- o
    \      \
     o      o

Layer 1
 o -- o -- o -- o
 |    |    |    |
 o -- o -- o -- o

- Each `o` is a vector (a data point)
- Lines are connections to nearby vectors
- Higher layers have fewer nodes and longer jumps

**Step 1: Start at the top layer**

  • Begin from a random entry point
  • This layer helps you make big jumps across the space
  • Goal: get close to the target region, like using Google Maps zoomed out at the continent level

**Step 2: Move down layer by layer**

  • At each layer:
    • Look at neighboring nodes
    • Move to the neighbor that’s closer to your query
    • Repeat until you can’t get closer
  • Drop down to the next layer, like zooming from continent → country → city → street

**Step 3: Final search at the bottom layer**

  • Bottom layer is dense
  • You now explore the local neighborhood
  • Retrieve the nearest vectors
  • This is where accuracy comes from.
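The core move in every step — "go to whichever neighbor is closer to the query" — can be sketched on a single layer. This is a deliberately simplified illustration, not real HNSW (which stacks several such graphs and uses heuristic neighbor selection); the nodes are plain 1-D numbers and the graph is hand-built so the walk is easy to trace:

```python
# Toy proximity graph: each node is a 1-D point, edges link nearby points.
graph = {
    0: [2, 5], 2: [0, 3, 5], 3: [2, 4], 4: [3, 5],
    5: [0, 2, 4, 7], 7: [5, 8, 9], 8: [7, 9], 9: [7, 8],
}

def greedy_search(graph, query, entry):
    """Greedily hop to whichever neighbor is closer to the query;
    stop when no neighbor improves (a local best)."""
    current = entry
    while True:
        candidates = [current] + graph[current]
        best = min(candidates, key=lambda node: abs(node - query))
        if best == current:
            return current  # no neighbor is closer: search is done
        current = best

print(greedy_search(graph, query=8.2, entry=0))  # walks 0 -> 5 -> 7 -> 8
```

HNSW runs this same greedy walk on each layer, using the result of a sparse upper layer as the entry point for the denser layer below — that is what makes the "big jumps first, fine search last" strategy work.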

Inverted File Index (IVF): IVF does not create paths or layers like HNSW. Instead, it:

  • Partitions the space into regions
  • Stores vectors in buckets
  • Searches only the most relevant buckets

Imagine the entire vector space as a world map:

+-------------------+-------------------+
|     Region A      |     Region B      |
|   (Cluster C1)    |   (Cluster C2)    |
|   ● ● ● ● ●       |   ● ● ● ●         |
|                   |                   |
+-------------------+-------------------+
|     Region C      |     Region D      |
|   (Cluster C3)    |   (Cluster C4)    |
|   ● ● ●           |   ● ● ● ● ● ●     |
+-------------------+-------------------+
- Each region is a cluster
- Each ● is a vector (document chunk)
- Each region has a centroid (not shown yet)

**Step 1: Zoom in to the centroids (region representatives)**
Now imagine flags planted at the center of each region:

      🚩C1                🚩C2

      ● ● ● ●            ● ● ●

      🚩C3                🚩C4

      ● ● ●              ● ● ● ● ●

These flags (centroids) act like: “If your query is close to me, search my region.”

**Step 2: A user question becomes a vector**

          🔎 Query
              ●

**Step 3: Compare ONLY with centroids**
Instead of checking every ●, you check:

Distance(Query, 🚩C1)
Distance(Query, 🚩C2)
Distance(Query, 🚩C3)
Distance(Query, 🚩C4)

This is cheap and fast.

**Step 4: Pick nearest regions (nprobe)**
Say the closest centroids are C2 and C1:

Search these regions only:
✅ Region B (C2)
✅ Region A (C1)

Ignore:
❌ Region C
❌ Region D


**Step 5: Search inside selected buckets**
Now you finally compare the query with vectors:

Region B: ● ● ● ●
Region A: ● ● ● ● ●

But only those — not the entire world.

Curious why it’s called an “Inverted File”?
Instead of: “Which cluster does this vector belong to?”
You store: “Here are all vectors belonging to this cluster.”
==> That’s the inversion.
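Steps 1–5 above fit in a small sketch. This is not production IVF (a real index learns its centroids with k-means and holds far more of them); the 2-D points and four hand-picked centroids are invented to keep the walkthrough concrete:

```python
import math

# Hand-picked centroids standing in for k-means cluster centers.
centroids = {"C1": (0, 0), "C2": (10, 0), "C3": (0, 10), "C4": (10, 10)}
buckets = {name: [] for name in centroids}  # the "inverted file"

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance

def add(vec):
    # Inverted file: each centroid keeps the list of vectors assigned to it.
    nearest = min(centroids, key=lambda c: dist(vec, centroids[c]))
    buckets[nearest].append(vec)

for v in [(1, 1), (9, 1), (1, 9), (9, 9), (8, 2), (2, 2)]:
    add(v)

def search(query, nprobe=2):
    # Step 3: compare ONLY with the centroids (cheap)...
    probe = sorted(centroids, key=lambda c: dist(query, centroids[c]))[:nprobe]
    # Steps 4-5: ...then scan just the nprobe closest buckets.
    candidates = [v for c in probe for v in buckets[c]]
    return min(candidates, key=lambda v: dist(query, v))

print(search((8.7, 1.4)))  # (9, 1) -- found without scanning every bucket
```

Raising `nprobe` scans more buckets — slower but more accurate — which is exactly the speed/recall knob IVF exposes in libraries like FAISS.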

This series builds toward preparing for the AWS Certified Generative AI Developer – Professional (AIP-C01) certification:
https://docs.aws.amazon.com/aws-certification/latest/examguides/ai-professional-01.html

Ref:

https://docs.aws.amazon.com/prescriptive-guidance/latest/choosing-an-aws-vector-database-for-rag-use-cases/introduction.html
