DEV Community

vaatiesther

How to Perform Semantic Search using ChromaDB in JavaScript

This tutorial will cover how to use embeddings and vectors to perform semantic search using ChromaDB in JavaScript.

What Are Embeddings?

Have you ever wondered how recommendation systems like Netflix's almost always seem to know which movies you will like? When you log in to Netflix, the app presents recommendations that are likely to fit your tastes and preferences; embeddings are one of the techniques behind this kind of matching.
Embeddings are the result of transforming words, text, or audio into numerical vectors. A numerical vector is essentially an array of numbers. This transformation preserves the meaning of the words and also captures their relationship to other words in the vector space.

What Is a Vector Space?

A vector space is a mathematical space in which data is represented as vectors. For example, consider the words 'cat' and 'kitten.' When an embedding model represents these words as vectors, the positions of those vectors capture the semantic relationship between the words.

The distance between the 'cat' and 'kitten' vectors measures their relatedness. Because 'cat' and 'kitten' have closely related meanings, their vectors sit near one another and the distance between them is small. Larger distances between vectors indicate that the words or texts are less closely related.
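To make the idea of distance concrete, here is a small JavaScript sketch. The three-dimensional vectors are invented for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and Euclidean distance is only one common measure; cosine distance is another.

```javascript
// Toy 3-dimensional "embeddings". These numbers are invented for illustration;
// real embedding vectors are much longer and come from an embedding model.
const vectors = {
  cat: [0.9, 0.8, 0.1],
  kitten: [0.85, 0.75, 0.2],
  car: [0.1, 0.2, 0.95],
};

// Euclidean (L2) distance: square root of the summed squared differences.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

const catKitten = euclideanDistance(vectors.cat, vectors.kitten);
const catCar = euclideanDistance(vectors.cat, vectors.car);
console.log(catKitten.toFixed(3), catCar.toFixed(3)); // prints "0.122 1.312"
```

With these made-up numbers, 'cat' ends up much closer to 'kitten' than to 'car', which is exactly the property semantic search relies on.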

This means that when you search for "cat," the system can recognize the similarity and suggest content related to cats and kittens.

Similarity search of this kind is one of the building blocks that platforms like Netflix and Spotify can use to offer personalized recommendations.

How to Create Embeddings with OpenAI

OpenAI provides embedding models that convert text into vectors whose distances reflect how related the texts are. To get embeddings for the words 'cat' and 'kitten', we send each string to the OpenAI embeddings API endpoint along with the model name.

First, define your OpenAI API key:

// Prefer reading the key from an environment variable over hard-coding it.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY ?? "your_openai_api_key";

Create a function that takes a phrase or word as an argument, sends it to the OpenAI embeddings API, and gives back the embedding.

async function createEmbeddings(word) {
  // No leading space in the URL.
  const url = "https://api.openai.com/v1/embeddings";

  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${OPENAI_API_KEY}`,
  };
  const data = {
    input: word,
    model: "text-embedding-3-small",
  };

  const response = await fetch(url, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(data),
  });
  if (!response.ok) {
    throw new Error(`OpenAI API error ${response.status}: ${await response.text()}`);
  }

  const json = await response.json();
  // json.data holds one entry per input string.
  console.log(json.data);
  // Give back the vector itself (an array of numbers).
  return json.data[0].embedding;
}

Now let's invoke the function with the words "cat" and "kitten". Awaiting each call runs the requests one after the other, so the outputs are printed in order:

await createEmbeddings("cat");
await createEmbeddings("kitten");

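The output will look like this (Node.js prints only the first 100 numbers of the array; text-embedding-3-small returns 1,536 numbers per input by default):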

[
  {
    object: 'embedding',
    index: 0,
    embedding: [
         0.02552942,  -0.023411665, -0.016092611,    0.03937628,   0.02094483,
        -0.02632067,  0.0018908527,  0.030602723,  -0.015929706, 0.0053118416,
         0.02214334, -0.0002121755,  0.010460779,  0.0031213614,   0.02985802,
        0.006265995,  -0.021363726, -0.010716772,  -0.030532908,  0.057528466,
         0.03409353,    0.04589245,  0.020502662,  -0.046637155, -0.006871068,
         0.03800323,  -0.009268087,   0.04405396,   0.051803548, -0.013497779,
       0.0033686268,  -0.043123078,   -0.0112753,  -0.029090041, -0.022946225,
        0.017768197,   0.017570386, -0.028019529,  -0.015743531,   0.01378868,
       -0.037281796,  -0.008773557,  0.045799363,   0.011473113,  0.009460081,
         -0.0533395,  -0.022597145, -0.019606689,   0.019362332,  0.037142165,
        0.023388393,  -0.014870829,   0.01746566,    0.04998833, -0.004168603,
      -0.0011636016,  -0.019292515,   0.04659061, -0.0029279126,  0.009279723,
       -0.024970891,  0.0059925485,   0.02518034,  -0.002679193,  0.019420512,
        0.038282495,    0.01837327,  0.017232941,   -0.05962295, -0.018210366,
      -0.0058034635,   0.028415153, -0.062089786,   0.011286936,  0.047218956,
        0.009401902,  -0.029974379, -0.000250538,   0.062974125,  0.043425616,
       0.0011352389,   0.058552437,  0.016243879,  -0.025226884,   0.01259017,
       -0.023202218,  -0.034512427,   0.02850824,   0.011054216, -0.026041405,
      -0.0038457036,   0.015487539, -0.044798665,  -0.038980655, -0.010332783,
        0.043774694,  -0.008517564, -0.048219655,  -0.001969396,  0.014149397,
      ... 1436 more items
    ]
  }
]
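Calling the endpoint once per word works, but the embeddings API also accepts an array of strings as input and returns one embedding per string, each tagged with the index of its input. The sketch below assumes that behavior; createEmbeddingsBatch and vectorsInInputOrder are hypothetical helper names, and the network function is defined here but not called.

```javascript
// Hypothetical helper: embed several strings with a single request to the
// OpenAI embeddings endpoint. Requires a real API key to actually run.
async function createEmbeddingsBatch(texts, apiKey) {
  const response = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // `input` may be an array of strings; one embedding comes back per item.
    body: JSON.stringify({ input: texts, model: "text-embedding-3-small" }),
  });
  if (!response.ok) {
    throw new Error(`OpenAI API error ${response.status}`);
  }
  return vectorsInInputOrder(await response.json());
}

// Each item in `data` carries an `index` pointing back to its input string.
// Sorting by it guarantees the vectors line up with `texts`.
function vectorsInInputOrder(json) {
  return [...json.data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

For example, const [catVector, kittenVector] = await createEmbeddingsBatch(["cat", "kitten"], OPENAI_API_KEY); would fetch both vectors in one round trip. Sorting by index is a defensive choice so the result order never depends on the order of the response.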

What Is a Vector Database?

As the name suggests, a vector database is a database built to store and search vectors. Traditional databases usually look records up by exact matches on keys or column values; a vector database stores data as high-dimensional vectors and, at query time, uses mathematical proximity (the distance between vectors) to find the most similar items.
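Conceptually, a similarity query boils down to "rank the stored vectors by how close they are to the query vector and keep the top k". Here is a minimal brute-force sketch of that idea using cosine similarity on invented two-dimensional vectors. nearestNeighbors is a hypothetical helper; real vector databases such as Chroma use an approximate index (HNSW) rather than scanning every vector.

```javascript
// Cosine similarity: dot product divided by the product of the vector lengths.
// 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored items most similar to `queryVector`.
// This is a linear scan over every item: fine for a handful of vectors,
// but too slow for large collections.
function nearestNeighbors(items, queryVector, k) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(item.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data with invented 2-dimensional vectors.
const store = [
  { id: "toy-story", vector: [0.9, 0.1] },
  { id: "black-swan", vector: [0.1, 0.9] },
  { id: "despicable-me", vector: [0.8, 0.3] },
];
console.log(nearestNeighbors(store, [1, 0], 2).map((r) => r.id));
// prints [ 'toy-story', 'despicable-me' ]
```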

How to Set Up a Vector Database with ChromaDB and Docker

Vector databases are a common building block for AI applications such as semantic search. Chroma (ChromaDB) is an open-source vector database that requires minimal configuration to get started.

To get started, you need Docker installed. Follow the steps below to get Chroma running on your machine:

Pull the Chroma image from Docker Hub:

docker pull chromadb/chroma

Run the Chroma container in the background, mapping the server's default port 8000 to port 8000 on your machine:

docker run -d -p 8000:8000 --name chromadb chromadb/chroma

To verify that the container is running, issue this command:

docker ps

You should see the chromadb container in your list of running containers.

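Because docker run -d returns immediately, the server may still be starting when a script first tries to connect. A small sketch like the one below can poll the server before continuing. It assumes Node.js 18+ (for the built-in fetch) and the /api/v1/heartbeat endpoint, which matches the v1 REST URL used later in this tutorial; newer Chroma releases expose /api/v2/heartbeat instead. waitForChroma is a hypothetical helper name.

```javascript
// Poll Chroma's heartbeat endpoint until the server answers or we give up.
// Assumption: the server exposes /api/v1/heartbeat (swap in /api/v2/heartbeat
// for newer Chroma releases).
async function waitForChroma(
  baseUrl = "http://localhost:8000",
  { retries = 10, delayMs = 1000 } = {}
) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const response = await fetch(`${baseUrl}/api/v1/heartbeat`);
      if (response.ok) return true;
    } catch {
      // Request failed (e.g. connection refused): the container may still be starting.
    }
    if (attempt < retries) {
      // Wait before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

In a script, const ready = await waitForChroma(); tells you whether it is safe to go on and create collections.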

Adding Data to the Vector Store

Semantic search works best when data is split into small, self-contained chunks, so we will describe each movie in its own short string. Start by adding the movies to an array like this:

const movies = [
  '"Title":"Due Date","Year":"2010","Rated":"R","Released":"05 Nov 2010","Runtime":"95 min","Genre":"Comedy, Drama","Actors":"Robert Downey Jr., Zach Galifianakis, Michelle Monaghan","Plot":"High-strung father-to-be Peter Highman is forced to hitch a ride with aspiring actor Ethan Tremblay on a road trip in order to make it to his child\'s birth on time."',
  '"Title":"Easy A","Year":"2010","Rated":"PG-13","Released":"17 Sep 2010","Runtime":"92 min","Genre":"Comedy, Drama, Romance","Actors":"Emma Stone, Amanda Bynes, Penn Badgley","Plot":"When Olive lies to her best friend about losing her virginity to one of the college boys, a girl overhears their conversation. Soon, her story spreads across the entire school like wildfire."',
  '"Title":"Unstoppable","Year":"2010","Rated":"PG-13","Released":"12 Nov 2010","Runtime":"98 min","Genre":"Action, Thriller","Actors":"Denzel Washington, Chris Pine, Rosario Dawson","Plot":"With an unmanned, half-mile-long freight train barreling toward a city, a veteran engineer and a young conductor race against the clock to prevent a catastrophe."',
  '"Title":"Despicable Me","Year":"2010","Rated":"PG","Runtime":"95 min","Genre":"Animation, Adventure, Comedy","Actors":"Steve Carell, Jason Segel, Russell Brand","Plot":"Gru, a criminal mastermind, adopts three orphans as pawns to carry out the biggest heist in history. His life takes an unexpected turn when the little girls see the evildoer as their potential father."',
  '"Title":"Don Henley: Live Inside Job","Year":"2000","Rated":"N/A","Runtime":"105 min","Genre":"Documentary, Music","Actors":"Don Henley, Jonathan K. Bendis, Will Hollis","Plot":"Don Henley performs his greatest hits live in Dallas."',
  '"Title":"Harry Potter and the Deathly Hallows: Part 1","Year":"2010","Rated":"PG-13","Runtime":"146 min","Genre":"Adventure, Family, Fantasy","Actors":"Daniel Radcliffe, Emma Watson, Rupert Grint","Plot":"As Harry, Ron and Hermione race against time and evil to destroy the Horcruxes, they uncover the existence of the three most powerful objects in the wizarding world: the Deathly Hallows."',
  '"Title":"Tangled","Year":"2010","Rated":"PG","Runtime":"100 min","Genre":"Animation, Adventure, Comedy","Actors":"Mandy Moore, Zachary Levi, Donna Murphy","Plot":"The magically long-haired Rapunzel has spent her entire life in a tower, but now that a runaway thief has stumbled upon her, she is about to discover the world for the first time, and who she really is."',
  '"Title":"Black Swan","Year":"2010","Rated":"R","Runtime":"108 min","Genre":"Drama, Thriller","Actors":"Natalie Portman, Mila Kunis, Vincent Cassel","Plot":"Nina is a talented but unstable ballerina on the verge of stardom. Pushed to the breaking point by her artistic director and a seductive rival, Nina\'s grip on reality slips, plunging her into a waking nightmare."',
  '"Title":"The Social Network","Year":"2010","Rated":"PG-13","Released":"01 Oct 2010","Runtime":"120 min","Genre":"Biography, Drama","Actors":"Jesse Eisenberg, Andrew Garfield, Justin Timberlake","Plot":"As Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, he is sued by the twins who claimed he stole their idea and by the co-founder who was later squeezed out of the business."',
  '"Title":"Toy Story 3","Year":"2010","Rated":"G","Runtime":"103 min","Genre":"Animation, Adventure, Comedy","Actors":"Tom Hanks, Tim Allen, Joan Cusack","Plot":"The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it\'s up to Woody to convince the other toys that they weren\'t abandoned and to return home."',
  '"Title":"A Clockwork Orange","Year":"1971","Rated":"R","Runtime":"136 min","Genre":"Crime, Sci-Fi","Actors":"Malcolm McDowell, Patrick Magee, Michael Bates","Plot":"In the future, a sadistic gang leader is imprisoned and volunteers for a conduct-aversion experiment, but it doesn\'t go as planned."',
  '"Title":"Inception","Year":"2010","Rated":"PG-13","Runtime":"148 min","Genre":"Action, Adventure, Sci-Fi","Actors":"Leonardo DiCaprio, Joseph Gordon-Levitt, Elliot Page","Plot":"A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a C.E.O., but his tragic past may doom the project."'
];
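Each movie string is a list of "key":"value" pairs with the outer braces missing, so a small helper can turn it back into an object. This is a sketch; parseMovie is a hypothetical name, and it only works when the fragment is otherwise valid JSON (for example, no doubled commas). Fields like Title, Rated, or Genre could then be stored as compact metadata instead of the full text.

```javascript
// Wrap the "key":"value" fragment in braces so JSON.parse can read it.
// Throws a SyntaxError if the fragment is malformed.
function parseMovie(movieText) {
  return JSON.parse(`{${movieText}}`);
}

const sample =
  '"Title":"Toy Story 3","Year":"2010","Rated":"G","Genre":"Animation, Adventure, Comedy"';
const parsed = parseMovie(sample);
console.log(parsed.Title, parsed.Rated); // prints "Toy Story 3 G"
```

Assuming every entry is well formed, movies.map(parseMovie) yields one object per movie.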

Install the Chroma JavaScript client (npm install chromadb), then import ChromaClient:

import { ChromaClient } from "chromadb";

Instantiate a Chroma client that connects to the Chroma server running in Docker:

const client = new ChromaClient({ path: "http://localhost:8000" });

Create a collection.

A collection is a way to organize vectors. Our collection will store the details of every movie in the movies array. Each record in the collection has:

  • an ID
  • metadata
  • a document (the movie details)
  • an embedding

Chroma ships with an OpenAI embedding function, so it can call OpenAI's embeddings API for you whenever documents are added or queried.

Replace the earlier import with one that also imports the OpenAIEmbeddingFunction class, then create an instance authenticated with your OpenAI API key. You will pass this embedding function in when creating the collection.

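import { ChromaClient, OpenAIEmbeddingFunction } from "chromadb";

const embeddingFunction = new OpenAIEmbeddingFunction({
  openai_api_key: OPENAI_API_KEY,
  // Optional: use the same model as the earlier fetch example;
  // otherwise the client falls back to its own default model.
  openai_model: "text-embedding-3-small",
});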

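Create a collection called movies and specify the embedding function:

// getOrCreateCollection reuses the collection if it already exists;
// createCollection would throw when the script is run a second time.
const collection = await client.getOrCreateCollection({
  name: "movies",
  embeddingFunction: embeddingFunction,
});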

The embedding function lets Chroma transform each movie description into an embedding (a high-dimensional vector) when it is added, and transform query text the same way at search time. Because both live in the same vector space, queries can match on meaning rather than on exact keywords.
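To see what an embedding function does behind the scenes, here is a hand-rolled sketch. It assumes the interface used by Chroma's JavaScript client: an object with an async generate(texts) method that returns one vector per input text. FetchOpenAIEmbeddingFunction and its fetchImpl option are hypothetical, and the request mirrors the earlier createEmbeddings function; the built-in OpenAIEmbeddingFunction remains the simpler choice.

```javascript
// Hypothetical embedding function with the shape Chroma's JS client expects:
// generate(texts) resolves to an array of vectors, one per text.
class FetchOpenAIEmbeddingFunction {
  constructor({ apiKey, model = "text-embedding-3-small", fetchImpl = fetch }) {
    this.apiKey = apiKey;
    this.model = model;
    // Injectable so the class can be exercised without network access.
    this.fetchImpl = fetchImpl;
  }

  async generate(texts) {
    // Call fetch unbound: some runtimes reject fetch invoked with a foreign `this`.
    const doFetch = this.fetchImpl;
    const response = await doFetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({ input: texts, model: this.model }),
    });
    if (!response.ok) {
      throw new Error(`Embedding request failed with status ${response.status}`);
    }
    const json = await response.json();
    // Keep vectors in the same order as `texts`.
    return [...json.data]
      .sort((a, b) => a.index - b.index)
      .map((item) => item.embedding);
  }
}
```

Under that assumption, you could pass embeddingFunction: new FetchOpenAIEmbeddingFunction({ apiKey: OPENAI_API_KEY }) when creating the collection.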

Add Data to the Collection

Each movie needs a unique ID, so we loop over the movies array, give each movie an ID based on its position, and add it to the collection:

for (const [index, movie] of movies.entries()) {
  // Position-based IDs are unique and stable; Date.now() plus Math.random()
  // can collide when many items are added within the same millisecond.
  const uniqueId = `movie-${index}`;

  // add() returns a promise, so await it to make sure every movie is
  // stored before we query the collection.
  await collection.add({
    documents: [movie],
    ids: [uniqueId],
    metadatas: [{ name: movie }],
  });
}
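Calling add() once per movie sends one request per item. Chroma's add() also accepts parallel arrays of ids, documents, and metadatas, so the whole array can go in a single call. The sketch below builds that payload; buildAddPayload is a hypothetical helper, and it stores only each movie's title as metadata (pulled out with a simple regular expression that assumes the "Title":"..." format used above) instead of repeating the full text.

```javascript
// Hypothetical helper: build parallel arrays for one collection.add() call.
// ids[i], documents[i] and metadatas[i] all describe the same movie.
function buildAddPayload(movies) {
  return {
    ids: movies.map((_, index) => `movie-${index}`),
    documents: movies,
    // Keep metadata small: just the title, or "unknown" if it is missing.
    metadatas: movies.map((movie) => {
      const match = movie.match(/"Title":"([^"]*)"/);
      return { title: match ? match[1] : "unknown" };
    }),
  };
}
```

You would then call await collection.add(buildAddPayload(movies)); once instead of looping.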

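To view the collection, navigate to http://localhost:8000/api/v1/collections in your browser; you should see all your collections. (Newer Chroma releases have replaced the v1 REST API with /api/v2, so this URL may not work on the latest image.)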


Perform Similarity Search

Let's first get the collection. Use the .getCollection() method, passing the name of your collection and the same embeddingFunction, which Chroma uses to embed your query text:

const mycollection = await client.getCollection({
  name: "movies",
  embeddingFunction: embeddingFunction,
});

Search the Collection

Let's query the collection with the phrase “recommend for me a movie suitable for kids”. The nResults option sets how many of the closest matches are returned:

const results = await mycollection.query({
  queryTexts: ["recommend for me a movie suitable for kids"],
  nResults: 2,
});
console.log(results.documents);
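The query result holds parallel, nested arrays (ids, documents, distances, and so on) with one inner array per query text, which is why results.documents prints an array inside an array. The hypothetical topMatches helper below pairs those arrays up for a single query; the sample object only mimics that shape, and its IDs and distances are invented rather than real Chroma output.

```javascript
// Pair up Chroma's parallel result arrays for one query text.
// Missing fields (e.g. distances not included) become null.
function topMatches(results, queryIndex = 0) {
  const ids = results.ids[queryIndex];
  return ids.map((id, i) => ({
    id,
    document: results.documents?.[queryIndex]?.[i] ?? null,
    // Smaller distance means a closer match.
    distance: results.distances?.[queryIndex]?.[i] ?? null,
  }));
}

// Shape-only example with invented values.
const sampleResults = {
  ids: [["movie-3", "movie-9"]],
  documents: [['"Title":"Despicable Me"', '"Title":"Toy Story 3"']],
  distances: [[0.31, 0.35]],
};
console.log(topMatches(sampleResults));
```

Calling topMatches(results) on the real query result would give you each movie's ID, text, and distance in one object.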

Here is the response:

[
  [
    '"Title":"Despicable Me","Year":"2010","Rated":"PG","Runtime":"95 min","Genre":"Animation, Adventure, Comedy","Actors":"Steve Carell, Jason Segel, Russell Brand","Plot":"Gru, a criminal mastermind, adopts three orphans as pawns to carry out the biggest heist in history. His life takes an unexpected turn when the little girls see the evildoer as their potential father."',
    `"Title":"Toy Story 3","Year":"2010","Rated":"G","Runtime":"103 min","Genre":"Animation, Adventure, Comedy","Actors":"Tom Hanks, Tim Allen, Joan Cusack","Plot":"The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home."`
  ]
]


We expected the query to return movies that are semantically similar to the request, and as you can see, the response is accurate: Despicable Me (rated PG) and Toy Story 3 (rated G) are both animated movies suitable for kids, even though neither description contains the word "kids". How awesome is this?

Conclusion

This tutorial has shown you how to create embeddings with OpenAI, store them in ChromaDB, and use them to perform semantic search in JavaScript.

Stay tuned for part 2, where we will cover how to add a retriever.
