DEV Community

MongoDB Guests for MongoDB


My first “local” Vector Search: MongoDB community edition

This article was written by Darshan Jayarama.

Ever since I received an email about the Vector Search auto-embedding feature released for MongoDB Community Edition, my palms have been itching to test it out. And now I’m happily writing this blog after witnessing its power firsthand.

Vector Search is one of MongoDB's most admired and powerful products. I used to feel sorry for MongoDB Community/on-prem customers, who could not use it because Vector Search and Atlas Search were available only in MongoDB Atlas. But this is no longer the case!

How did I test this?

I am sure by now you might be thinking, “Enough of this product endorsement, show me the code!”

I’m getting there :). Initially, I followed the MongoDB documentation, but I struggled a lot because its steps for setting up the Docker containers don't quite line up:

  1. In the Installation documentation, select Docker as your operating system.
  2. When the MongoDB Community Server container starts, its host and port are mongod.search-community:27017. Make a note of this.
  3. When you are asked to start the MongoDB Search container, the yml file provided there sets syncSource to mongot-community.search-community:27017.

    • Don't just copy/paste the yml from there. Make sure you set the correct syncSource, which should be the Community Server container's address, mongod.search-community:27017. To confirm your container's hostname, run db.isMaster().me in mongosh; it prints the hostname.
  4. Next, the yml file needs a username and password for mongot. When you create the user, the instructions name it mongot, but the yml file refers to it as mongotUser. If you created the user with a different name, correct that section.
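Because a mismatched syncSource cost me the most time, a quick automated check may save you some. The sketch below uses a hypothetical helper, `check_sync_source` (not part of any MongoDB tool), that pulls the `hostAndPort` value out of a mongot config file and compares it with the host mongod reports. It assumes the value sits on a single `hostAndPort:` line, as in the configs used in this post; it is a grep/sed check, not a YAML parser.

```shell
# Hypothetical helper: compare the syncSource host in a mongot config file
# with the host:port that mongod reports (e.g. via db.isMaster().me).
# Assumes a single-line `hostAndPort: "host:port"` entry; not a YAML parser.
check_sync_source() {
  local config="$1" expected="$2" actual
  # Take the first hostAndPort line, drop the key, then strip quotes and whitespace
  actual=$(grep -m 1 'hostAndPort:' "$config" | sed 's/.*hostAndPort:[[:space:]]*//' | tr -d '"[:space:]')
  if [ "$actual" = "$expected" ]; then
    echo "OK: syncSource is $actual"
  else
    echo "MISMATCH: config has '$actual', mongod reports '$expected'" >&2
    return 1
  fi
}

# Usage (credentials from this post):
# check_sync_source mongot.config "$(mongosh --quiet -u root -p rootpass --eval 'db.isMaster().me')"
```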

After these corrections, you can see the two containers running happily (I literally jumped out of my chair with joy). Next, connect to mongod using mongosh to test its power.

I created an index on the plot field of my favorite movies collection:

db.movies.createSearchIndex("vector_index", "vectorSearch", {
 "fields": [
   {
     "type": "autoEmbed",
     "modality": "text",
     "path": "plot",
     "model": "voyage-4"
   },
   {
     "type": "filter",
     "path": "year"
   }
 ]
})
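Creating the index presumably only starts the build: mongot still has to send every plot to Voyage AI for embedding before the index can answer queries. A minimal sketch of a wait loop, assuming that `getSearchIndexes()` in this setup returns documents with a boolean `queryable` field, as it does in Atlas, and reusing the root/rootpass credentials from this post. The function name `wait_for_search_index` and the `MONGOSH` override (there only so the loop can be exercised with a stub) are hypothetical.

```shell
# Hypothetical helper: poll until a search index reports queryable.
# Assumes getSearchIndexes() returns documents with a boolean `queryable`
# field (as in Atlas) and the root/rootpass credentials used in this post.
wait_for_search_index() {
  local index="$1" tries="${2:-60}" delay="${3:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    # MONGOSH can point at a stub for testing; defaults to the real mongosh
    if "${MONGOSH:-mongosh}" --quiet -u root -p rootpass --authenticationDatabase admin \
        --eval "quit(db.getSiblingDB('sample_mflix').movies.getSearchIndexes('$index').some(ix => ix.queryable) ? 0 : 1)"; then
      echo "index $index is queryable"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "index $index is still not queryable after $tries checks" >&2
  return 1
}

# Usage: check every 10 seconds, up to 60 times
# wait_for_search_index vector_index 60 10
```

A connection failure also makes mongosh exit non-zero, so the loop simply retries in that case too.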

You can use voyage-4-lite instead, but according to the Voyage usage stats, voyage-4 and voyage-4-lite have the same limits: 10,000 TPM (tokens per minute) and 3 RPM (requests per minute). My preferred choice is voyage-4.
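Those limits matter because the initial index build has to embed every document. As a back-of-the-envelope sketch: each limit on its own sets a floor on the build time, so the build takes roughly at least the larger of ceil(tokens / TPM) and ceil(requests / RPM) minutes. How many documents mongot packs into one Voyage request is not stated here, so the batch size below is a hypothetical input, as are the document count and the average tokens per plot.

```shell
# Rough floor (whole minutes) on the time to embed a collection under
# Voyage rate limits; each limit alone sets a floor, so take the larger.
# Hypothetical inputs: docs, avg tokens per doc, docs per request (batch).
min_embed_minutes() {
  local docs="$1" avg_tokens="$2" batch="${3:-1}" tpm="${4:-10000}" rpm="${5:-3}"
  local tokens=$(( docs * avg_tokens ))
  local requests=$(( (docs + batch - 1) / batch ))     # ceil(docs / batch)
  local by_tokens=$(( (tokens + tpm - 1) / tpm ))      # ceil(tokens / TPM)
  local by_requests=$(( (requests + rpm - 1) / rpm ))  # ceil(requests / RPM)
  if [ "$by_tokens" -gt "$by_requests" ]; then
    echo "$by_tokens"
  else
    echo "$by_requests"
  fi
}

min_embed_minutes 1000 60       # one plot per request: prints 334 (3 RPM dominates)
min_embed_minutes 1000 60 128   # 128 plots per request: prints 6 (10,000 TPM dominates)
```

Under these assumptions, the request limit is the bottleneck unless many plots fit into one request.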

Then I wanted to see results based on context (meaning) rather than exact keywords, so I ran the query below:

db.movies.aggregate([
 {
   "$vectorSearch": {
     "index": "vector_index",
     "path": "plot",
     "query": {
       "text": "bullied boy learns karate"
     },
     "model": "voyage-4-lite",
     "numCandidates": 10000,
     "limit": 10
   }
 },
 {
   "$project": {
     "_id": 0,
     "title": 1,
     "year":1,
     "plot": 1,
     "score": { $meta: "vectorSearchScore" }
   }
 }
])

The query above should retrieve movies matching the context of “bullied boy learns karate” (here I expected The Karate Kid):

The answers amused me:

[
 {
   plot: 'A love-struck weakling must pretend to be boxer in order to gain respect from the family of the girl he loves.',
   title: 'Battling Butler',
   year: 1926,
   score: 0.6800339221954346
 },
 {
   plot: 'Two young brothers become the leaders of a gang of kids in their neighborhood. Their father is an office clerk who tries for advancement by playing up his boss. When the boys visit the boss...',
   title: 'I Was Born, But...',
   year: 1932,
   score: 0.6692402362823486
 },
 {
   plot: 'A living puppet, with the help of a cricket as his conscience, must prove himself worthy to become a real boy.',
   title: 'Pinocchio',
   year: 1940,
   score: 0.6628834009170532
 },
 {
   plot: 'An idealistic adolescent, suffering under the thumb of a sadistic schoolmaster, falls in love with a loose girl who is bullied and tormented by another lover.',
   title: 'Torment',
   year: 1944,
   score: 0.6620646715164185
 },
 {
   plot: `Against all odds Father Flanagan starts "Boys' Town" after hearing a convict's story. Whitey Marsh comes there. He runs away but, hungry, returns. He runs away again but, when friend Pee ...`,
   title: 'Boys Town',
   year: 1938,
   score: 0.6570459604263306
 },
 {
   plot: 'When three thuggish men are responsible for the death of his father and the crippling of his brother, young David must choose between supporting his family or risking his life and exacting vengeance.',
   title: "Tol'able David",
   year: 1921,
   score: 0.6559526920318604
 },
 {
   plot: "Fight promoter Nick Donati grooms a bellhop as a future champ, but has second thoughts when the 'kid' falls for his sister.",
   title: 'Kid Galahad',
   year: 1937,
   score: 0.6525536179542542
 },
 {
   plot: 'In a repressive boarding school with rigid rules of behavior, four boys decide to rebel against the direction on a celebration day.',
   title: 'Zero for Conduct',
   year: 1933,
   score: 0.651709794998169
 },
 {
   plot: 'While at a ski lodge, Larry Blake sees instructor Karin Borg and decides to sign up for private lessons. The next thing he knows, she is Mrs. Blake. When he announces that he is going back ...',
   title: 'Two-Faced Woman',
   year: 1941,
   score: 0.6513179540634155
 },
 {
   plot: 'To reconcile with his girlfriend, a bookish college student tries to become an athlete.',
   title: 'College',
   year: 1927,
   score: 0.6459328532218933
 }
]

Even though the expected result was not there, most results revolve around related themes: boxing, bullying, and weaklings or young boys trying to prove themselves.

Below are the commands I used to complete this setup.

Pull the Docker images:

docker pull mongodb/mongodb-community-search:latest
docker pull mongodb/mongodb-community-server:latest

Create an internal Docker network so the containers can communicate:

docker network create search-community

Start the Community Server container:

echo '
net:
  port: 27017
  bindIpAll: true  # Equivalent to --bind_ip_all

replication:
  replSetName: rs0
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true

setParameter:
  searchIndexManagementHostAndPort: mongot-community.search-community:27028
  mongotHost: mongot-community.search-community:27028
  skipAuthenticationToSearchIndexManagementServer: false
  useGrpcForSearch: true

# Security configuration
security:
  authorization: enabled  # Equivalent to --auth
  keyFile: /keyfile' > mongod.conf

mkdir -p ./data/db

openssl rand -base64 756 > keyfile
chmod 400 keyfile

docker run -d --rm --name mongod \
  -e MONGODB_INITDB_ROOT_USERNAME=root \
  -e MONGODB_INITDB_ROOT_PASSWORD=rootpass \
  -v ./mongod.conf:/etc/mongod.conf:ro \
  -v ./data/db:/data/db \
  -v ./keyfile:/keyfile \
  -p 27017:27017 \
  --network search-community \
  mongodb/mongodb-community-server:latest \
  --config /etc/mongod.conf

Initiate the replica set and create the mongot user:

# sleep() in mongosh takes milliseconds: wait about 10 seconds for the primary election
mongosh -u root -p rootpass --eval 'rs.initiate(); sleep(10000); rs.status()'
mongosh -u root -p rootpass --eval "db.getSiblingDB('admin').createUser(
 {
   user: 'mongot',
   pwd: 'mongotPassword',
   roles: ['searchCoordinator']
 }
)"

Load the sample movies collection data (I downloaded movies.json from https://github.com/neelabalan/mongodb-sample-dataset/blob/main/sample_mflix/movies.json):

mongoimport -d sample_mflix -c movies /Users/darshanj/Downloads/movies.json -u root -p rootpass --authenticationDatabase admin
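One gotcha with that link: it points to GitHub's HTML page for the file, so downloading it directly (for example with curl) gives you HTML rather than JSON. The raw file is served from raw.githubusercontent.com under the same owner, repository, branch, and path. A small sketch that rewrites one URL form into the other; the `github_raw_url` name is hypothetical.

```shell
# Hypothetical helper: rewrite a GitHub "blob" page URL into its raw-file URL.
#   https://github.com/OWNER/REPO/blob/BRANCH/PATH
#   -> https://raw.githubusercontent.com/OWNER/REPO/BRANCH/PATH
github_raw_url() {
  printf '%s\n' "$1" |
    sed 's#^https://github\.com/\([^/]*\)/\([^/]*\)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}

# Usage: download the raw JSON, then run mongoimport on it as above
# curl -fsSL -o movies.json "$(github_raw_url https://github.com/neelabalan/mongodb-sample-dataset/blob/main/sample_mflix/movies.json)"
```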

Prepare a data directory for the search process (mongot):

mkdir mongot_data

Create the mongot config file and password file:

# mongod's own host:port as it reports it; mongot must sync from exactly this address.
# --quiet keeps mongosh's connection messages out of the captured value.
hostname=$(mongosh --quiet -u root -p rootpass --eval "db.isMaster().me")
cat << EOF > mongot.config
syncSource:
  replicaSet:
    hostAndPort: "$hostname"
    username: "mongot"
    passwordFile: "/passwordFile"
    authSource: "admin"
    tls: false
    readPreference: primaryPreferred
storage:
  dataPath: "/data/mongot"
server:
  grpc:
    address: "mongot-community.search-community:27028"
    tls:
      mode: "disabled"
metrics:
  enabled: true
  address: "mongot-community.search-community:9946"
healthCheck:
  address: "mongot-community.search-community:8080"
logging:
  verbosity: INFO
embedding:
  queryKeyFile: /etc/mongot/voyage-api-query-key
  indexingKeyFile: /etc/mongot/voyage-api-indexing-key
  providerEndpoint: https://api.voyageai.com/v1/embeddings
  isAutoEmbeddingViewWriter: true
EOF

# Password of the mongot user created earlier
printf '%s' "mongotPassword" > passwordFile
chmod 400 passwordFile

Go to https://dashboard.voyageai.com/organization/api-keys, create an API key, and copy its secret into the query key file:

printf "<your-voyage-api-query-key>" > voyage-api-query-key

Repeat this once more for the indexing key: go to https://dashboard.voyageai.com/organization/api-keys, create another API key, and copy its secret:

printf "<your-voyage-api-index-key>" > voyage-api-indexing-key
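A side note on the two printf commands: they put the key itself on the command line, so it usually ends up in your shell history. A sketch of an alternative that reads the key from standard input (hiding it while you type at a terminal), creates the file with owner-only permissions via umask, and then applies the same chmod 400 used for the other secrets in this post. `write_secret` is a hypothetical helper name.

```shell
# Hypothetical helper: read one line (the secret) from stdin and write it to FILE
# without a trailing newline, readable only by the owner (chmod 400 at the end).
write_secret() {
  local file="$1" secret=""
  # Hide typed input when reading from an interactive terminal
  if [ -t 0 ]; then stty -echo; fi
  IFS= read -r secret || true
  if [ -t 0 ]; then stty echo; echo; fi
  # Refuse to write an empty secret (e.g. nothing was pasted)
  [ -n "$secret" ] || return 1
  # umask 077 in a subshell: the file is created owner-only from the start
  ( umask 077 && printf '%s' "$secret" > "$file" ) && chmod 400 "$file"
}

# Usage: run, paste the key, press Enter
# write_secret voyage-api-query-key
# write_secret voyage-api-indexing-key
```

If a key file already exists with mode 400, delete it first; as a non-root user you cannot overwrite it.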

Start MongoDB Search (mongot):

docker run -d --rm --name mongot-community \
  -v ./mongot_data:/data/mongot \
  -v ./mongot.config:/mongot-community/config.default.yml \
  -v ./passwordFile:/passwordFile:ro \
  -v ./voyage-api-indexing-key:/etc/mongot/voyage-api-indexing-key:ro \
  -v ./voyage-api-query-key:/etc/mongot/voyage-api-query-key:ro \
  --network search-community \
  -p 8080:8080 -p 9946:9946 \
  mongodb/mongodb-community-search:latest \
  --internalListAllIndexesForTesting=true

Check that both containers are running:

docker ps
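Instead of eyeballing the docker ps output, you can turn it into a pass/fail check. `docker ps --format '{{.Names}}'` prints one running container name per line; the hypothetical helper below succeeds only if every expected name is in that list. This is handy here because both containers were started with --rm, so a container that exits is removed and simply disappears from the list.

```shell
# Hypothetical helper: read container names (one per line) from stdin and
# succeed only if every name given as an argument is present.
all_running() {
  local names name missing=0
  names=$(cat)
  for name in "$@"; do
    # -x: whole-line match, -F: treat the name literally, -q: no output
    if printf '%s\n' "$names" | grep -qxF -- "$name"; then
      echo "running: $name"
    else
      echo "NOT running: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# docker ps --format '{{.Names}}' | all_running mongod mongot-community
```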

Now connect with mongosh, create the index, and run the test query:

mongosh -u root -p rootpass

use sample_mflix
db.movies.createSearchIndex("vector_index", "vectorSearch", {
 "fields": [
   {
     "type": "autoEmbed",
     "modality": "text",
     "path": "plot",
     "model": "voyage-4"
   },
   {
     "type": "filter",
     "path": "year"
   }
 ]
})

db.movies.aggregate([
 {
   "$vectorSearch": {
     "index": "vector_index",
     "path": "plot",
     "query": {
       "text": "bullied boy learns karate"
     },
     "model": "voyage-4-lite",
     "numCandidates": 10000,
     "limit": 10
   }
 },
 {
   "$project": {
     "_id": 0,
     "title": 1,
     "year":1,
     "plot": 1,
     "score": { $meta: "vectorSearchScore" }
   }
 }
])

I know we could put everything into a single Docker Compose file to make it easier, but I prefer this step-by-step approach: it makes debugging easier and each step clearer to understand.

Congratulations! You have successfully created your first local vector search.

PS: To be safe, delete the API keys when you no longer need them, just like I did :).
