Stephen Collins

Posted on Oct 2, 2023

How to use Chroma to store and query vector embeddings

#chroma #vectordatabase #vectorembeddings #semanticsearch

Chroma is an open-source embedding database designed to store and query vector embeddings efficiently, enhancing Large Language Models (LLMs) by providing relevant context to user inquiries. In this tutorial, I will explain how to use Chroma in persistent server mode using a custom embedding model within an example Python project. The companion code repository for this blog post is available on GitHub.

Prerequisites

Here are the items that you need to have installed before continuing with this tutorial:

Git installed on your system (for cloning Chroma).
Chroma (for our example project), PyTorch and Transformers installed in your Python environment.
Docker installed on your system.
Docker Compose also installed on your system.

Setting Up Chroma

Before diving into the code, we need to set up Chroma in server mode.

Create a new directory for our example project. Next, from the root of that project directory, clone the Chroma repository:

git clone git@github.com:chroma-core/chroma.git

This will create a chroma subdirectory inside your project directory. Once the clone finishes, navigate to the root of the chroma directory and run the following command to start the server:

docker compose up --build

This will set up Chroma and run it as a server with uvicorn, making port 8000 accessible outside the Docker network (named net). The command also mounts a persistent Docker volume for Chroma's database, found at chroma/chroma from your project's root.

To keep this blog post simple and focused on exploring Chroma's functionality, I won't cover how to implement authentication with Chroma in server mode. More information is available in Chroma's authentication documentation.

Next, ensure that the server is running by executing in another terminal:

curl http://localhost:8000/api/v1/heartbeat

You should get a response like:

{"nanosecond heartbeat":1696129725137410131}
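The same check can be done from Python using only the standard library. The sketch below assumes the heartbeat value is a Unix timestamp in nanoseconds (which the magnitude of the sample response suggests); the helper names heartbeat_to_datetime and fetch_heartbeat are my own and not part of Chroma.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

# Same URL as the curl command above.
HEARTBEAT_URL = "http://localhost:8000/api/v1/heartbeat"


def heartbeat_to_datetime(payload: str) -> datetime:
    """Parse a heartbeat JSON body and convert it to a UTC datetime.

    Assumption: the value is nanoseconds since the Unix epoch.
    """
    nanoseconds = json.loads(payload)["nanosecond heartbeat"]
    # Integer division avoids float rounding on a 19-digit value.
    return datetime.fromtimestamp(nanoseconds // 1_000_000_000, tz=timezone.utc)


def fetch_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 5.0) -> datetime:
    """Call the running server and return its heartbeat as a datetime."""
    with urlopen(url, timeout=timeout) as response:
        return heartbeat_to_datetime(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Uses the sample response from this post, so no server is needed here.
    sample = '{"nanosecond heartbeat":1696129725137410131}'
    print(heartbeat_to_datetime(sample))
```

If the printed time is close to the current time when you call fetch_heartbeat(), the server is up and responding.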

Now that the Chroma server is running, let's move on to our example Python app project for creating, storing and querying vector embeddings.

Embedding Generation

In embedding_util.py, used by our app.py module, we define a custom embedding class (which I am calling CustomEmbeddingFunction) by inheriting from Chroma's EmbeddingFunction class and leveraging the Transformers library. This class tokenizes the input text and generates embeddings using a pre-trained model, in this case thenlper/gte-base, one of the top-performing open-source embedding models at the time of writing, and one that runs well on a lot of consumer hardware. The implementation of generate_embeddings was inspired by the gte-base model card on Hugging Face.

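from chromadb import Documents, EmbeddingFunction, Embeddings

class CustomEmbeddingFunction(EmbeddingFunction):
    def __call__(self, texts: Documents) -> Embeddings:
        # generate_embeddings (defined in this same module) embeds one text at a time
        return list(map(generate_embeddings, texts))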

Creating the Chroma Client

Now in app.py, we import the necessary modules and create a Chroma client by specifying the host and port where the Chroma server is running.

from chromadb import HttpClient
from embedding_util import CustomEmbeddingFunction

client = HttpClient(host="localhost", port=8000)

We can test our client with the following heartbeat check:

print('HEARTBEAT:', client.heartbeat())

Creating Collections and Adding Documents

Once the Chroma client is created, we need to create a Chroma collection to store our documents. A collection can be created or retrieved using the get_or_create_collection method.

collection = client.get_or_create_collection(
    name="test", embedding_function=CustomEmbeddingFunction())

After creating the collection, we can add documents to it. Here, I've added an array of documents related to various topics, each assigned a unique ID.

documents = [
    "A group of vibrant parrots chatter loudly, sharing stories of their tropical adventures.",
    "The mathematician found solace in numbers, deciphering the hidden patterns of the universe.",
    "The robot, with its intricate circuitry and precise movements, assembles the devices swiftly.",
    "The chef, with a sprinkle of spices and a dash of love, creates culinary masterpieces.",
    "The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past.",
    "The detective, with keen observation and logical reasoning, unravels the intricate web of clues.",
    "The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea.",
    "In the dense forest, the howl of a lone wolf echoes, blending with the symphony of the night.",
    "The dancer, with graceful moves and expressive gestures, tells a story without uttering a word.",
    "In the quantum realm, particles flicker in and out of existence, dancing to the tunes of probability."]

# Every document needs an id for Chroma
document_ids = [f"id{i}" for i in range(len(documents))]

collection.add(documents=documents, ids=document_ids)
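Because the server is persistent, position-based IDs like id0 and id1 will point at different documents if the list is ever reordered between runs. One hypothetical alternative, not used in this example project, is to derive each ID from the document's content. The content_ids helper and its doc- prefix below are my own illustration.

```python
import hashlib
from typing import List


def content_ids(documents: List[str], prefix: str = "doc-") -> List[str]:
    """Build one stable ID per document from a hash of its text.

    The same text always yields the same ID, regardless of list order.
    """
    return [
        # 16 hex characters (64 bits) keeps IDs short while making
        # accidental collisions very unlikely for small collections.
        prefix + hashlib.sha256(doc.encode("utf-8")).hexdigest()[:16]
        for doc in documents
    ]


if __name__ == "__main__":
    docs = ["The chef creates culinary masterpieces.",
            "The robot assembles the devices swiftly."]
    for doc_id, doc in zip(content_ids(docs), docs):
        print(doc_id, doc)
```

Note that two identical documents would share an ID under this scheme, so it fits best when documents are expected to be unique. It could also be paired with collection.upsert instead of collection.add so that re-running the script overwrites existing entries.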

Querying the Collection

With our documents added, we can query the collection to find the documents most similar to a given query. Below, we execute a query and print the most similar documents along with their distance scores, from which we calculate cosine similarity as 1 - cosine distance (this assumes the collection reports cosine distance). The higher the cosine similarity, the more similar the given document is to the input query.

This is particularly useful for developing applications like AI-driven customer support agents, especially when utilizing existing collections of help documentation or e-commerce product listings.

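query = "Give me some content about the ocean"
result = collection.query(query_texts=[query], n_results=5, include=["documents", "distances"])

# Results are nested per query; index 0 selects the results for our single query
ids, documents, distances = result["ids"][0], result["documents"][0], result["distances"][0]
for id_, document, distance in zip(ids, documents, distances):
    print(f"ID: {id_}, Document: {document}, Similarity: {1 - distance}")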
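To make the 1 - distance conversion concrete, here is a small pure-Python sketch of the underlying arithmetic. One caveat worth checking in your own setup: to my understanding, a Chroma collection reports squared L2 distance unless it is created with metadata such as {"hnsw:space": "cosine"}, in which case 1 - distance is a different quantity. For unit-length embeddings the two are related by squared L2 = 2 - 2 * cosine similarity, which the code also demonstrates. The function names are mine.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, so that similarity == 1 - distance."""
    return 1.0 - cosine_similarity(a, b)


def squared_l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    # Two unit-length toy vectors standing in for normalized embeddings.
    query_vec = [1.0, 0.0]
    doc_vec = [0.6, 0.8]

    sim = cosine_similarity(query_vec, doc_vec)
    print("cosine similarity:", sim)
    print("1 - cosine distance:", 1 - cosine_distance(query_vec, doc_vec))
    # For unit vectors, squared L2 distance equals 2 - 2 * cosine similarity.
    print("squared L2:", squared_l2_distance(query_vec, doc_vec))
    print("2 - 2 * sim:", 2 - 2 * sim)
```

Either distance produces the same ranking for normalized embeddings, so the order of results is unaffected; only the interpretation of the printed similarity value changes.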

Running the Example

To run our example app, first ensure you've installed the dependencies listed in the requirements.txt file, and then run app.py using a modern Python 3 version (this example project was tested with Python 3.9.6).

python app.py

You should see output printed similar to the following:

HEARTBEAT: 1696127501102440278
Query: Give me some content about the ocean
Most similar sentences:
ID: id6, Document: The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea., Similarity: 0.6018089274366792
ID: id4, Document: The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past., Similarity: 0.5219426511858611
ID: id0, Document: A group of vibrant parrots chatter loudly, sharing stories of their tropical adventures., Similarity: 0.5164872313681625
ID: id7, Document: In the dense forest, the howl of a lone wolf echoes, blending with the symphony of the night., Similarity: 0.48931321779282144
ID: id1, Document: The mathematician found solace in numbers, deciphering the hidden patterns of the universe., Similarity: 0.4799339689190174

Chroma orders the results by similarity to the input query, most similar first. This is vector search in action: results ranked by how close their embeddings are to the query's embedding.

Conclusion

Chroma provides a versatile and efficient platform for managing vector embeddings, allowing developers to easily integrate advanced search and similarity features into their applications. By following this tutorial, you can set up and interact with Chroma to explore its capabilities and adapt them to suit your project needs.

For more details and resources, visit Chroma’s official documentation and GitHub repository.

This blog post's companion code repository is available on GitHub.

Questions or comments? Feel free to contact me or connect on social media!
