Making sense of PDFs with AI: Why we need both Sentence-BERT and FAISS

When I first started exploring question-answering on PDFs, one thing confused me:
👉 Why do we use Sentence-BERT and FAISS together?
Can’t FAISS just create embeddings on its own?

Here’s the simple breakdown 👇

🔹 Sentence-BERT

It’s a neural network model: a version of BERT fine-tuned so that it produces one embedding per sentence or passage.

It converts text (a sentence, or a chunk of a PDF) into an embedding: a fixed-length vector of numbers.

These vectors capture the meaning of the text.

Example: “dog” and “puppy” → their vectors end up close to each other.
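To make “close to each other” concrete: closeness between embeddings is usually measured with cosine similarity. The sketch below uses tiny, hand-made 3-dimensional vectors as stand-ins for real embeddings. The numbers are invented purely for illustration and are not real model output. A real Sentence-BERT model such as `all-MiniLM-L6-v2` outputs 384 numbers per text, which you would get from `SentenceTransformer("all-MiniLM-L6-v2").encode(...)` in the `sentence-transformers` package.

```python
import math

# HYPOTHETICAL toy "embeddings": invented 3-d vectors, NOT real Sentence-BERT output.
# They are chosen so that "dog" and "puppy" point in similar directions.
toy_embeddings = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "car":   [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

dog_puppy = cosine_similarity(toy_embeddings["dog"], toy_embeddings["puppy"])
dog_car = cosine_similarity(toy_embeddings["dog"], toy_embeddings["car"])
print(f"dog vs puppy: {dog_puppy:.3f}")
print(f"dog vs car:   {dog_car:.3f}")
```

Cosine similarity ignores vector length and only looks at direction, which is why it is a common choice for comparing sentence embeddings.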

🔹 FAISS (Facebook AI Similarity Search)

It doesn’t create embeddings.

Instead, it’s an efficient search engine for vectors.

Given a query vector, it finds the nearest neighbors (the stored vectors closest to the query, i.e. the most similar chunks of text) super fast.
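What “finding the nearest neighbors” means can be sketched without FAISS at all. The class below is a hypothetical, pure-Python stand-in (not FAISS code) that loosely mirrors FAISS's simplest exact index, `faiss.IndexFlatL2`: you add vectors, then search with a query and `k`, and get back distances and positions. It compares the query against every stored vector. FAISS does this same work much faster (optimized C++, optional GPU), or avoids most of it with approximate indexes. The chunk texts and vectors are invented for illustration.

```python
class ToyFlatIndex:
    """HYPOTHETICAL pure-Python stand-in for an exact (brute-force) vector index.

    Mirrors the add/search idea of a flat index; it is NOT the FAISS API.
    """

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []  # a vector's id is its position in this list

    def add(self, vectors):
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(v)}")
            self.vectors.append(list(v))

    def search(self, query, k):
        """Return (distances, ids) of the k stored vectors closest to `query`.

        Distances are squared Euclidean (L2) distances, smallest first.
        """
        scored = []
        for idx, v in enumerate(self.vectors):
            dist = sum((q - x) ** 2 for q, x in zip(query, v))
            scored.append((dist, idx))
        scored.sort()
        top = scored[:k]
        return [d for d, _ in top], [i for _, i in top]


# Invented chunk vectors; in a real app these would be Sentence-BERT embeddings of PDF chunks.
chunks = ["Dogs need daily walks.", "Puppies chew everything.", "Car engines need oil."]
chunk_vectors = [[0.90, 0.80, 0.10], [0.85, 0.75, 0.20], [0.10, 0.20, 0.95]]

index = ToyFlatIndex(dim=3)
index.add(chunk_vectors)
distances, ids = index.search([0.88, 0.78, 0.12], k=2)
for d, i in zip(distances, ids):
    print(f"{d:.4f}  {chunks[i]}")
```

With the real library the equivalent is roughly `index = faiss.IndexFlatL2(d)`, `index.add(xb)`, `D, I = index.search(xq, k)`, where `xb` and `xq` are float32 NumPy arrays and `D`, `I` hold the distances and ids for each query.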

👉 Think of it like this:

Sentence-BERT = Translator (text → coordinates on a “map of meaning”)

FAISS = GPS (finds the closest points on that map in milliseconds)

💡 Together, they make semantic search possible:

Sentence-BERT gives us the “language of meaning” (embeddings)

FAISS makes searching through thousands or millions of embeddings lightning fast
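Putting the two roles together, a PDF question-answering pipeline roughly looks like this: split the extracted text into chunks, embed each chunk, store the vectors, then embed the question and retrieve the closest chunks. The stdlib-only sketch below makes loud assumptions. `toy_embed` is a hypothetical hashed bag-of-words function standing in for Sentence-BERT (it matches shared words, not meaning). The “PDF text” is a hard-coded string rather than real extraction. The `search` loop stands in for a FAISS index. Normalizing vectors to unit length so that an inner product equals cosine similarity is a common pattern with FAISS's `IndexFlatIP` too.

```python
import math
import re
import zlib

DIM = 256  # size of the toy vectors (all-MiniLM-L6-v2, for comparison, produces 384)

def toy_embed(text):
    """HYPOTHETICAL stand-in for Sentence-BERT: a hashed bag-of-words vector.

    It only notices shared words, so unlike a real sentence encoder it has no idea
    that "dog" and "puppy" are related. Swap in a real model for semantic search.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0  # deterministic hash bucket
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, so dot product = cosine similarity

def chunk_by_sentence(text):
    """Deliberately simple chunker: one chunk per sentence."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def search(query, chunk_vectors, k=1):
    """Exact search: inner product of unit vectors (cosine similarity), highest first."""
    q = toy_embed(query)
    scores = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(chunk_vectors)]
    scores.sort(reverse=True)
    return scores[:k]

# Hypothetical text standing in for what a PDF extractor would return.
pdf_text = (
    "Invoices are due within thirty days of receipt. "
    "Late payments incur a two percent monthly fee. "
    "Refund requests must be submitted through the support portal."
)

chunks = chunk_by_sentence(pdf_text)                 # 1. chunk
chunk_vectors = [toy_embed(c) for c in chunks]       # 2. embed ("translator")
results = search("What fee applies to late payments?", chunk_vectors, k=1)  # 3. search ("GPS")
for score, i in results:
    print(f"{score:.3f}  {chunks[i]}")
```

In a real setup, `toy_embed` would be replaced by something like `SentenceTransformer("all-MiniLM-L6-v2").encode(chunks, normalize_embeddings=True)` and the loop in `search` by a FAISS index; the surrounding structure stays the same.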

Without Sentence-BERT → FAISS has nothing meaningful to compare.
Without FAISS → you can still compare the query against every embedding one by one, but that gets painfully slow at scale.
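Why is the no-FAISS route slow? Brute-force search computes one distance per stored vector for every query, so the cost grows linearly with the number of chunks. Approximate indexes skip most of that work. The sketch below illustrates one such idea, the inverted-file (IVF) approach behind FAISS's `IndexIVFFlat`. Vectors are grouped into `nlist` buckets around centroids, and a query only scans the `nprobe` nearest buckets. It is a hypothetical pure-Python toy, not FAISS code: real IVF trains its centroids with k-means, whereas this sketch simply samples random stored vectors as centroids, and the data is synthetic random numbers.

```python
import random

def sq_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ToyIVFIndex:
    """HYPOTHETICAL sketch of an inverted-file (IVF) index, the idea behind FAISS's
    IndexIVFFlat. Simplification: centroids are randomly sampled stored vectors
    instead of k-means centroids. Not the FAISS API."""

    def __init__(self, vectors, nlist, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        self.centroids = rng.sample(vectors, nlist)
        self.buckets = [[] for _ in range(nlist)]  # bucket j: ids of vectors nearest centroid j
        for idx, v in enumerate(vectors):
            nearest = min(range(nlist), key=lambda j: sq_l2(v, self.centroids[j]))
            self.buckets[nearest].append(idx)

    def search(self, query, k, nprobe):
        """Scan only the `nprobe` buckets whose centroids are closest to `query`.
        Returns (ids, number_of_distance_computations)."""
        order = sorted(range(len(self.centroids)), key=lambda j: sq_l2(query, self.centroids[j]))
        candidates = [i for j in order[:nprobe] for i in self.buckets[j]]
        best = sorted(candidates, key=lambda i: sq_l2(query, self.vectors[i]))[:k]
        return best, len(self.centroids) + len(candidates)

def brute_force(vectors, query, k):
    """Exact search: one distance computation per stored vector."""
    best = sorted(range(len(vectors)), key=lambda i: sq_l2(query, vectors[i]))[:k]
    return best, len(vectors)

rng = random.Random(42)
dim, n = 16, 2000
data = [[rng.random() for _ in range(dim)] for _ in range(n)]  # synthetic "embeddings"
query = data[123]  # a stored vector, so its exact nearest neighbor is itself

index = ToyIVFIndex(data, nlist=40)
exact_ids, exact_cost = brute_force(data, query, k=1)
approx_ids, approx_cost = index.search(query, k=1, nprobe=4)
print("exact:", exact_ids, "distance computations:", exact_cost)
print("IVF:  ", approx_ids, "distance computations:", approx_cost)
```

The trade-off: scanning fewer buckets means fewer distance computations, but the true nearest neighbor can be missed if it sits in a bucket that was not probed. Raising `nprobe` trades speed back for accuracy. (Here the query is itself a stored vector, so its own bucket is always probed and the exact match is found.)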

DEV Community
