Chidinma Oham

Build Your First Semantic Search with Sentence Transformers and ChromaDB

I recently finished watching Game of Thrones (no comment on the final season) and as the final credits rolled, I wasn’t quite ready to leave that atmosphere behind. I loved the political scheming, the shifting loyalties and even the moral ambiguity of certain characters so I found myself wanting to read a book that captured that exact same vibe or maybe even listen to a playlist that matched the emotional texture of it all.

Most search engines couldn't help me because they were built to recommend based on genre, popularity or what other people had clicked. They don't understand feelings, moods or the subtle nuances of human emotion. I didn't want “another fantasy show”. I wanted something that felt emotionally adjacent to what I had just experienced.

That idea stayed in my head for a while and eventually I decided to play around with transformers to see what I could do about it.

The result was a semantic media recommendation engine using ChromaDB and Sentence Transformers. It takes a natural language input describing an emotion, vibe, concept or narrative situation, converts that input into an embedding, then retrieves semantically similar media recommendations across books, films, poems and songs.

In this tutorial, we will build that cross-media recommender using semantic search, understand the core concepts that power it and compare different architectures along with their benefits and trade-offs.

What We Are Building

At a high level, the system works like this:

  1. A dataset of media items is prepared.
  2. Each media item is converted into an embedding vector.
  3. Those embeddings are stored inside ChromaDB.
  4. A user enters a natural language query.
  5. The query is embedded using the same model.
  6. ChromaDB retrieves semantically similar media items.
  7. The system returns recommendations across multiple media types.

[Image: system architecture flowchart]

An important thing to note here is that we are not matching keywords.

The user does not need to type “sad movie”, “grief song” or “political fantasy book”.

Instead, they can describe an emotion or situation naturally and the system retrieves results based on semantic similarity. For instance: "last day in a city before I relocate".

The Tech Stack

For this project, we'll be using:

  • Python
  • ChromaDB
  • Sentence Transformers (the all-MiniLM-L6-v2 embedding model)

The architecture is intentionally simple so we can appreciate the mechanics of semantic retrieval before adding APIs, interfaces or LLM integrations.

Understanding Embeddings

Before writing any code, we need to understand embeddings properly.

An embedding is a numerical representation of text. When we pass text into a transformer model, the model converts that text into a high-dimensional vector.

Something like:

[0.231, -0.884, 0.442, ...]

The actual numbers are not important. What matters is spatial similarity. Text with similar meaning ends up mathematically close together in vector space.

For example:

“grief after death”
“mourning someone”
“coping with loss”

may all exist near each other inside that space. Meanwhile:

“summer beach party”
“high energy dance music”

would likely exist far away.

[Image: 3D embedding visualization — semantically similar concepts cluster near one another in embedding space, while unrelated concepts sit farther apart]
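
To make this concrete, here is a small, self-contained sketch (not part of the project files we build below) that embeds these phrases with the same model we will use later and compares them with the util.cos_sim helper from sentence-transformers:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

phrases = ["grief after death", "mourning someone", "summer beach party"]
embeddings = model.encode(phrases)  # one 384-dimensional vector per phrase

# Cosine similarity: values closer to 1.0 mean the texts are closer in meaning
print(util.cos_sim(embeddings[0], embeddings[1]))  # related phrases -> higher score
print(util.cos_sim(embeddings[0], embeddings[2]))  # unrelated phrases -> lower score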

Step 1: Setting Up the Project

Create a new folder for the project and navigate into it:

mkdir semantic-media-search
cd semantic-media-search

Next, create a virtual environment.

python -m venv venv

Activate the environment. On Windows:

venv\Scripts\activate

On macOS/Linux:

source venv/bin/activate

Once activated, install the required dependencies:

pip install chromadb sentence-transformers

Here is what each package is responsible for:

  • sentence-transformers handles text embeddings using transformer models.
  • chromadb stores and retrieves embeddings using vector similarity search.

Step 2: Project Structure

Create the following project structure:

semantic-media-search/
│
├── data/
│   └── media.json
│
├── chroma_db/
│
├── embed.py
├── search.py
└── venv/

Step 3: Structuring The Dataset

Since semantic search depends heavily on contextual meaning, the quality of the dataset matters a lot.

For this prototype, we'll be using a manually curated JSON dataset containing books, films, songs and poems. Each item contains a title, creator, themes, mood, description and media type.

In the media.json file inside the data folder:

[
  {
    "id": "1",
    "title": "Purple Hibiscus",
    "type": "book",
    "creator": "Chimamanda Ngozi Adichie",
    "themes": ["family", "religion", "silence"],
    "mood": ["melancholic", "tense"],
    "description": "A coming-of-age story exploring control, silence and political unrest."
  },
  {
    "id": "2",
    "title": "Moonlight",
    "type": "film",
    "creator": "Barry Jenkins",
    "themes": ["identity", "loneliness", "masculinity"],
    "mood": ["introspective", "emotional"],
    "description": "A deeply emotional film about identity, vulnerability and human connection."
  }
]

One thing you'll quickly realize while building this project is that embeddings become much better when the descriptions are emotionally descriptive instead of mechanically factual. For example:

"description": "A fantasy film released in 2012" does not carry much semantic value. Meanwhile:

"description": "A haunting story about grief, memory and emotional isolation" contains significantly richer contextual meaning for the embedding model to work with.

Step 4: Creating Embeddings

Now we can begin generating embeddings from our media dataset.

In our embed.py file, start with the imports:

import json
import chromadb
from sentence_transformers import SentenceTransformer

We first load the embedding model:

print("Downloading and loading model...")
model = SentenceTransformer('all-MiniLM-L6-v2')

The first time you run this, the model will be downloaded locally. Next, initialize ChromaDB:

chroma_client = chromadb.PersistentClient(path="./chroma_db")

collection = chroma_client.get_or_create_collection(
    name="media_recommendations"
)

We are using PersistentClient instead of an in-memory database because we want the embeddings stored permanently between runs.
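
For comparison, a purely in-memory client looks like this (its data disappears when the script exits, which is why we do not use it here):

# Ephemeral alternative: embeddings live only in memory for the current run
chroma_client = chromadb.Client()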

Now load the dataset:

with open('data/media.json', 'r') as file:
    media_items = json.load(file)

At this stage, we need to convert each media item into embedding-friendly text.

This is an important step.

We are not embedding only the description field. Instead, we combine the title, themes, mood, creator and description into a single contextual string. That gives the transformer more semantic information to work with.

Inside a loop:

for item in media_items:

    embedding_text = f"""
    Title: {item['title']}
    Type: {item['type']}
    Creator: {item['creator']}
    Themes: {", ".join(item['themes'])}
    Mood: {", ".join(item['mood'])}
    Description: {item['description']}
    """


Now, still inside the loop, generate the embedding:

    embedding = model.encode(embedding_text).tolist()


The .tolist() conversion is necessary because ChromaDB expects standard Python lists instead of NumPy arrays.

Next, still inside the loop, store everything inside ChromaDB:

    collection.add(
        ids=[item["id"]],
        embeddings=[embedding],
        documents=[embedding_text],
        metadatas=[{
            "title": item["title"],
            "type": item["type"],
            "creator": item["creator"]
        }]
    )


Finally, print a success message:

print("Success! Embeddings stored in ChromaDB.")

Run the script:

python embed.py

If everything works correctly, your embeddings will now be stored inside the chroma_db directory.
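
If you want to double check, here is a small optional sanity check (run it separately; it is not part of embed.py) that counts the stored records:

import chromadb

# Optional sanity check: confirm the records were actually written
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection(name="media_recommendations")
print(collection.count())  # should match the number of items in media.json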

At this point, the system understands your media semantically. We now need a way to retrieve similar results from natural language input.

Step 5: Building The Search Layer

Inside the search.py file:

import chromadb
from sentence_transformers import SentenceTransformer

Load the same embedding model again:

model = SentenceTransformer('all-MiniLM-L6-v2')

This part is extremely important. The same embedding model used during ingestion must also be used during retrieval. Otherwise, the vectors would exist in different semantic spaces and similarity search would break.

Now reconnect to ChromaDB:

chroma_client = chromadb.PersistentClient(path="./chroma_db")

collection = chroma_client.get_collection(
    name="media_recommendations"
)


Take user input:

query = input(
    "What kind of vibe or story are you looking for?\n> "
)


Convert the query into an embedding:

query_embedding = model.encode(query).tolist()

Now comes the retrieval step.

Instead of retrieving random recommendations across all media, we will intentionally retrieve one recommendation for each media type: a book, a film, a song and a poem.

Define the media types:

media_types = ["book", "film", "song", "poem"]

Now loop through them:

for media in media_types:

    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=1,
        where={"type": media}
    )

The where filter ensures we retrieve recommendations within each category separately. Without this filter, the system might return four books or four songs depending on vector similarity.
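
As an aside, ChromaDB where filters also support operators such as $in, so a single query across all four types is possible. A rough sketch of that alternative is below, but we will stick with the per-type loop because it guarantees one result per category:

# Alternative (not used here): one query across several types via the $in operator.
# The overall closest matches are returned, so one media type may dominate.
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=4,
    where={"type": {"$in": ["book", "film", "song", "poem"]}}
)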

Still inside the loop, handle the results safely:

    if len(results["metadatas"][0]) > 0:
        metadata = results["metadatas"][0][0]

        print("-------------------------")
        print(f"Type: {metadata['type'].upper()}")
        print(f"Title: {metadata['title']}")
        print(f"Creator: {metadata['creator']}")
    else:
        print("-------------------------")
        print(f"Type: {media.upper()}")
        print("Result: Nothing found.")

Run the script:

python search.py

Then try prompts like:

  • grief after losing someone
  • feeling like a million bucks
  • last day in a city before relocating
  • the feeling of growing apart from your childhood

Architecture Comparisons and Trade-Offs

The stack for this prototype is entirely local. We used a local instance of ChromaDB and a lightweight open-source embedding model. When you are building a semantic search engine, you generally have a few architectural routes, each with its own headaches and perks.

Route 1: The Fully Local Stack (Our approach)

We ran the database on our machine and generated embeddings using our own CPU. This approach is completely free: no API key or internet connection is needed (after the initial model download) and nobody else has access to your data. However, it can be heavy on local resources. The all-MiniLM-L6-v2 model is tiny, but if you scale up to millions of rows or use a larger, more nuanced model, your machine might struggle.

Route 2: The Managed API Stack

You use OpenAI or Cohere for embeddings and a managed vector database like Pinecone or Weaviate Cloud. These services handle the heavy math, meaning your app runs fast on any device and scales effortlessly. But you pay per token and per database hour. Plus, you are completely dependent on external services staying online.

Route 3: The Hybrid Stack

You might keep the database local or self-hosted but use an external API for the embeddings. This allows you to control your data storage while outsourcing the intense compute required for generating embeddings. You still have an API dependency, and moving large chunks of vector data back and forth over a network can create latency.
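
As a rough illustration of Route 3 (a sketch only; it assumes the openai package, an OPENAI_API_KEY environment variable and the text-embedding-3-small model, none of which are used in this tutorial), the embedding step could be swapped for an API call while ChromaDB stays local:

# Route 3 sketch: local ChromaDB, embeddings generated by an external API
import chromadb
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection(name="media_recommendations")

def embed_remotely(text: str) -> list[float]:
    # The heavy transformer compute happens on the provider's servers, not your CPU
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding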

Next Steps

This same architecture powers a wide range of real-world applications, from enterprise recommendation systems to customer support AI, semantic document search and any other contextual retrieval systems.

When I was satisfied with the results in the terminal, I wrapped the project in FastAPI and deployed it as a web application with the help of Codex. I tested it with a prompt about political scheming, shifting loyalties and characters making highly questionable moral choices. It handed me The Traitor Baru Cormorant by Seth Dickinson. 10/10, no notes.
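
The exact app I deployed is out of scope for this tutorial, but a minimal sketch of wrapping the search logic in FastAPI (the file name, route and response shape here are illustrative) looks roughly like this:

# search_api.py - illustrative FastAPI wrapper around the search logic
import chromadb
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer('all-MiniLM-L6-v2')
collection = chromadb.PersistentClient(path="./chroma_db").get_collection(
    name="media_recommendations"
)

@app.get("/recommend")
def recommend(query: str):
    query_embedding = model.encode(query).tolist()
    recommendations = []
    for media in ["book", "film", "song", "poem"]:
        results = collection.query(
            query_embeddings=[query_embedding],
            n_results=1,
            where={"type": media}
        )
        if results["metadatas"][0]:
            recommendations.append(results["metadatas"][0][0])
    return {"query": query, "recommendations": recommendations}

# Run with: uvicorn search_api:app --reload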
