What Is Sentence-Transformers (SBERT)?
When I first explored text similarity, it was for an exam management system. The problem? Different teachers sometimes unknowingly entered the same question into the database twice, just phrased a little differently. Traditional keyword-based matching struggled with this, since two questions could look different on the surface but mean exactly the same thing.
For example:
“What is the capital city of France?”
“Which city is the capital of France?”
To a keyword search, those might look different. But to a student (and to us), they’re clearly duplicates.
This is where Sentence-Transformers (SBERT) comes in. By converting questions into semantic embeddings, SBERT allows us to measure how close two sentences are in meaning, not just in wording. That makes it perfect for detecting duplicate questions automatically.
In this article, I’ll walk you through how to build a Text Similarity Checker API using SBERT and Flask. While my first use case was exam management, this same idea applies to semantic search, chatbots, recommendation engines, and plagiarism detection. By the end, you’ll have a lightweight REST API that can take two texts and tell you how similar they are.
Project Setup
Before we start coding, let’s set up a clean environment for our project. I recommend using a virtual environment (venv) to keep dependencies isolated and avoid conflicts with other Python projects on your system.
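For example:
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate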
Next, with the venv activated, install the required libraries:
pip install flask sentence-transformers
This will install:
- Flask → to create our REST API.
- Sentence Transformers → to load SBERT and compute sentence embeddings.
Building A Minimal Flask Application
Before we bring in Sentence-BERT, let’s start with the simplest possible Flask app to make sure everything works.
Create a file called app.py
inside your project folder and add the following code:
from flask import Flask

# Create a Flask app instance
app = Flask(__name__)

# Define a basic route
@app.route('/')
def home():
    return "Hello, Flask is running!"

if __name__ == '__main__':
    app.run(debug=True)
Then start the server with:
python app.py
or
flask --app app run
If everything is set up correctly, you’ll see output like:
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Now, open a browser and go to http://127.0.0.1:5000 and you should see:
Hello, Flask is running!
This minimal example confirms Flask is installed and working. Next, we’ll extend this app to build our Text Similarity Checker API using SBERT.
Adding a Helper for SBERT
Our Flask app is running; now let’s bring in Sentence-BERT (SBERT).
To keep our code clean, we’ll create a new file called embedding_service.py where we handle three things:
- Loading the model once (lazy loading).
- Turning text into embeddings (vectors).
- Comparing two embeddings to get similarity.
We’ll build this step by step.
Step 1: Load the Model
from transformers import AutoTokenizer, AutoModel

# Lazy-loaded global variables
_tokenizer = None
_model = None

def _load_model():
    """Load SBERT tokenizer and model once (lazy loading)."""
    global _tokenizer, _model
    if _tokenizer is None or _model is None:
        _tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        _model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
    return _tokenizer, _model
Here’s what’s happening:
- _tokenizer and _model are kept as global variables so the model is only loaded once.
- By default, we’re using all-MiniLM-L6-v2, a fast and lightweight SBERT model (384 dimensions).
- You can swap this out with any other Sentence Transformer model depending on your use case (better accuracy, domain-specific data, etc.).
A wide collection of SBERT models can be found on Hugging Face here:
https://huggingface.co/models?library=sentence-transformers
For example, you could replace the default with:
- sentence-transformers/all-mpnet-base-v2 (higher accuracy, 768 dimensions)
- sentence-transformers/paraphrase-MiniLM-L12-v2 (optimized for paraphrase detection)
This flexibility is part of what makes SBERT so useful: you can start small, then upgrade the model later without changing the rest of your code.
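For instance, switching to all-mpnet-base-v2 is just a matter of updating the model name inside _load_model:

# Inside _load_model(), point both calls at the new model name:
_tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
_model = AutoModel.from_pretrained("sentence-transformers/all-mpnet-base-v2")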
Step 2: Convert Sentences into Vectors
import torch
import torch.nn.functional as F

def mean_pooling(model_output, attention_mask):
    """Average token embeddings with attention mask (standard SBERT pooling)."""
    token_embeddings = model_output[0]
    mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask_expanded, 1) / torch.clamp(mask_expanded.sum(1), min=1e-9)

def to_vector(text):
    """Convert a string (or list of strings) into SBERT embeddings."""
    if isinstance(text, str):
        text = [text]
    tokenizer, model = _load_model()
    encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        model_output = model(**encoded_input)
    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
    return F.normalize(sentence_embeddings, p=2, dim=1)
- mean_pooling: SBERT doesn’t just use the [CLS] token; instead, we average all token embeddings while respecting the attention mask.
- to_vector:
  - Takes a string (or list of strings).
  - Tokenizes it (padding/truncation for batch processing).
  - Runs it through the model in inference mode (torch.no_grad()).
  - Pools it into a single vector per sentence.
  - Normalizes the vector (important for cosine similarity).

Now we can turn any sentence into a numerical vector.
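As a quick sanity check (my own throwaway snippet, not part of the service), you can inspect the shape of an embedding:

vec = to_vector("What is the capital of France?")
print(vec.shape)  # torch.Size([1, 384]) with the default all-MiniLM-L6-v2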
Step 3: Compare Two Sentences
def compare(text1, text2):
    """Compare two texts and return cosine similarity."""
    v1 = to_vector(text1)
    v2 = to_vector(text2)
    similarity = torch.nn.functional.cosine_similarity(v1, v2)
    return similarity.item()
- This function takes two texts.
- It converts them into vectors using to_vector.
- Then, it uses cosine similarity to calculate how close they are in meaning.
- Finally, it returns a single similarity score (close to 0 → unrelated, close to 1 → identical meaning).
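To try it out, a quick test at the bottom of embedding_service.py (my own addition, not required by the API) could look like:

if __name__ == "__main__":
    score = compare("What is the capital city of France?",
                    "Which city is the capital of France?")
    print(score)  # near-duplicates should score close to 1.0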
A Note on Using the SentenceTransformers Shortcut
You might be wondering:
“Why not just use the SentenceTransformers library directly? Isn’t there already a function for this?”
And you’d be right. Here’s the quick way:
from sentence_transformers import SentenceTransformer

# Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode sentences directly
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

# Calculate pairwise similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
This works perfectly fine, and for small projects or experiments, it’s the fastest way to go.
Why We Wrote It Manually
In this tutorial, I chose to build the embedding helper “by hand” (with AutoTokenizer, AutoModel, and pooling) instead of relying only on the SentenceTransformer.encode() shortcut. Why?
- Transparency → you see exactly how embeddings are generated, step by step.
- Deeper Understanding → by writing it out, you gain a better grasp of what’s happening under the hood (tokenization → model forward pass → pooling → normalization → similarity).
- Flexibility → you can change pooling strategies, thresholds, or normalization methods to fit your use case.
- Configurability → you can tie the model name into Flask’s app.config or an environment variable, so you can swap models or tweak parameters without rewriting code (see the sketch after this list).
- Scalability → manual control lets you optimize for GPUs, batching, or mixed precision when moving to production.
- Compatibility → you’re not limited to models wrapped by SentenceTransformers. Any Hugging Face model can be plugged in.
Most importantly: by building it this way, you’ll understand it well enough to modify it later. If your project needs domain-specific embeddings or a different similarity function, you’ll know exactly where and how to make those changes.
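As one minimal sketch of that configurability, you could let an environment variable override the default model in embedding_service.py (SBERT_MODEL_NAME is my own made-up name, not something Flask or Transformers define):

import os
from transformers import AutoTokenizer, AutoModel

_tokenizer = None
_model = None

def _load_model():
    """Load the tokenizer and model once, honoring an optional override."""
    global _tokenizer, _model
    if _tokenizer is None or _model is None:
        # SBERT_MODEL_NAME is a hypothetical env var used only in this sketch.
        name = os.environ.get("SBERT_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2")
        _tokenizer = AutoTokenizer.from_pretrained(name)
        _model = AutoModel.from_pretrained(name)
    return _tokenizer, _model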
Building the REST API
Now we’ll extend our app.py file so that it:
- Accepts two input texts via a POST request.
- Uses our embedding_service to calculate similarity.
- Returns the similarity score as JSON.
Step 1: Import the Helper
Inside app.py, import the helper we created:
from flask import Flask, request, jsonify
from embedding_service import compare
Step 2: Add a New Route
We’ll define a new endpoint /similarity that accepts a POST request with two texts.
@app.route('/similarity', methods=['POST'])
def similarity():
    try:
        data = request.get_json()
        text1 = data.get("text1")
        text2 = data.get("text2")

        if not text1 or not text2:
            return jsonify({"error": "Both text1 and text2 are required"}), 400

        score = compare(text1, text2)
        return jsonify({
            "text1": text1,
            "text2": text2,
            "similarity": score
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500
In this code, we:
- Grab the JSON body with request.get_json().
- Extract text1 and text2.
- Validate that both are provided.
- Call our compare() function from the embedding service.
- Return a JSON response with the result.
Step 3: Test the API
Run the Flask app again:
python app.py
Now, send a POST request with two sentences (using curl, Postman, or httpie):
curl -X POST http://127.0.0.1:5000/similarity \
-H "Content-Type: application/json" \
-d '{"text1": "What is the capital of France?", "text2": "Which city is the capital of France?"}'
If everything is working, you should get a JSON response like:
{
  "similarity": 0.9416358470916748,
  "text1": "What is the capital of France?",
  "text2": "Which city is the capital of France?"
}
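You can also verify the validation branch by leaving out one field:

curl -X POST http://127.0.0.1:5000/similarity \
  -H "Content-Type: application/json" \
  -d '{"text1": "What is the capital of France?"}'

which returns a 400 response:

{"error": "Both text1 and text2 are required"}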
Now you have a fully working Text Similarity REST API powered by SBERT and Flask.
Conclusion and Next Steps
In this article, we built a hands-on text similarity API step by step:
- First, we learned how embeddings work by writing the comparison logic manually.
- Then, we saw how the SentenceTransformer API automates embedding and similarity.
- Finally, we wrapped everything in a Flask REST service so others can consume it easily.

This is a solid foundation, but you don’t have to stop here.
You could improve this project by:
- Storing embeddings in a vector database (such as Pinecone, Weaviate, Qdrant, FAISS, or pgvector). That way, instead of only comparing two sentences, you can run nearest-neighbor queries across thousands or millions of stored texts, unlocking semantic search, recommendation systems, and even “chat with your documents” use cases (see the sketch after this list).
- Scaling the API with FastAPI + Uvicorn for production.
- Adding authentication and logging if others will consume your service.
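As a minimal sketch of the first idea, assuming faiss-cpu is installed (pip install faiss-cpu) and reusing to_vector from our embedding_service:

import faiss
from embedding_service import to_vector

corpus = [
    "What is the capital city of France?",
    "How do I bake sourdough bread?",
    "Which planet is closest to the sun?",
]

# Inner product over L2-normalized vectors (which to_vector returns)
# is exactly cosine similarity, so IndexFlatIP works here.
index = faiss.IndexFlatIP(384)  # 384 dimensions for all-MiniLM-L6-v2
index.add(to_vector(corpus).numpy())

# Nearest-neighbor query against everything stored in the index
scores, ids = index.search(to_vector("Which city is the capital of France?").numpy(), k=1)
print(corpus[ids[0][0]], scores[0][0])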
With these improvements, your little embedding service can evolve into a real-world semantic search engine.