This article was written by Darshan Jayarama.
Ever since I received an email about the vector search auto-embedding feature released in MongoDB Community Edition, my palms have been itching to test it out. And now I’m happily writing this blog after witnessing its power firsthand.
Vector Search is one of the most admired and powerful MongoDB products. I felt pity for MongoDB Community/on-prem customers, who were unable to use it because Vector Search and Atlas Search were only available in MongoDB Atlas. But this is no longer the case!
How did I test this?
I am sure by now you might be thinking, “Enough of this product endorsement, show me the code!”
I’m getting there :). Initially, I followed what the MongoDB documentation suggests, but I struggled a lot because the instructions for setting up the Docker containers do not line up with each other:
- As per the Installation documentation, select Docker as your operating system.
- When the MongoDB Community server starts, its hostname is mongod.search-community:27017. Make a note of this.
- When you are asked to start the MongoDB Search container, the yml file there sets the syncSource to mongot-community.search-community:27017.
  - Don't just copy/paste the yml from there. Make sure you set the correct syncSource, which should be the Community server container name, mongod.search-community:27017. To confirm your container hostname, run db.isMaster().me; it should print the hostname.
Next, there is a mongot username and password entry that must be set in the yml file. When you create the user, the instructions name the mongot user mongot, but the yml file refers to it as mongotUser. If you created the user under another name, correct that section.
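Both mismatches above can be caught before starting the container. The following is a minimal sketch, not anything shipped with mongot or mongosh: checkSyncSource is a hypothetical helper that reads the hostAndPort and username values out of the config text with a simple regular expression and compares them with the values you expect.

```javascript
// Sketch: compare mongot's syncSource settings with the values mongod actually uses.
// checkSyncSource is a hypothetical helper, not part of mongosh or mongot.
function checkSyncSource(configText, expectedHost, expectedUser) {
  // Read the first quoted or bare value that follows a given YAML key.
  const readValue = (key) => {
    const match = configText.match(new RegExp('^\\s*' + key + ':\\s*"?([^"\\s#]+)"?', 'm'));
    return match ? match[1] : null;
  };
  const problems = [];
  const host = readValue('hostAndPort');
  const user = readValue('username');
  if (host !== expectedHost) {
    problems.push(`hostAndPort is ${host}, but mongod reports ${expectedHost}`);
  }
  if (user !== expectedUser) {
    problems.push(`username is ${user}, but the created user is ${expectedUser}`);
  }
  return problems;
}

// Example using the mismatched values described above.
const sampleConfig = `
syncSource:
  replicaSet:
    hostAndPort: "mongot-community.search-community:27017"
    username: "mongotUser"
`;
const problems = checkSyncSource(sampleConfig, 'mongod.search-community:27017', 'mongot');
console.log(problems);
```

An empty array means the two values agree. In mongosh, the expected host could come from db.isMaster().me instead of a hard-coded string.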
After these corrections, you can see the two containers running happily (I literally jumped out of my chair for joy). Next, connect to mongod using mongosh to test its power.
I created an index on the plot field of my favorite movies collection:
db.movies.createSearchIndex("vector_index", "vectorSearch", {
  "fields": [
    {
      "type": "autoEmbed",
      "modality": "text",
      "path": "plot",
      "model": "voyage-4"
    },
    {
      "type": "filter",
      "path": "year"
    }
  ]
})
You can use voyage-4-lite instead, but as per the Voyage usage stats, voyage-4 and voyage-4-lite have the same rate limits: 10,000 TPM (tokens per minute) and 3 RPM (requests per minute). My preferred choice is voyage-4.
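Those limits matter mostly while the index is being built, because every plot has to be embedded. Here is a back-of-the-envelope sketch of how long that could take. The document count, the average tokens per plot, and the batch size are all assumptions made for illustration, and how mongot actually batches its requests is not something this post verified.

```javascript
// Rough lower bound on indexing time under a tokens-per-minute and a requests-per-minute cap.
// avgTokensPerDoc and docsPerRequest are guesses; mongot's real batching may differ.
function estimateIndexingMinutes({ docCount, avgTokensPerDoc, docsPerRequest, tpm, rpm }) {
  const totalTokens = docCount * avgTokensPerDoc;
  const totalRequests = Math.ceil(docCount / docsPerRequest);
  const minutesByTokens = Math.ceil(totalTokens / tpm);
  const minutesByRequests = Math.ceil(totalRequests / rpm);
  // The tighter of the two limits sets the pace.
  return Math.max(minutesByTokens, minutesByRequests);
}

const minutes = estimateIndexingMinutes({
  docCount: 20000,      // assumed collection size
  avgTokensPerDoc: 50,  // assumed average plot length in tokens
  docsPerRequest: 100,  // assumed batch size
  tpm: 10000,           // limit quoted above
  rpm: 3,               // limit quoted above
});
console.log(`At least ${minutes} minutes to embed everything`);
```

With these made-up inputs the token limit is the bottleneck, so switching to a lighter model with the same limits would not shorten the wait.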
Then I wanted to see the results based on the context. So I ran the below query:
db.movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot",
      "query": {
        "text": "bullied boy learns karate"
      },
      "model": "voyage-4-lite",
      "numCandidates": 10000,
      "limit": 10
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "year": 1,
      "plot": 1,
      "score": { $meta: "vectorSearchScore" }
    }
  }
])
The above query should retrieve movies matching the context of “bullied boy learns karate” (here we expect The Karate Kid):
The answers amused me:
[
{
plot: 'A love-struck weakling must pretend to be boxer in order to gain respect from the family of the girl he loves.',
title: 'Battling Butler',
year: 1926,
score: 0.6800339221954346
},
{
plot: 'Two young brothers become the leaders of a gang of kids in their neighborhood. Their father is an office clerk who tries for advancement by playing up his boss. When the boys visit the boss...',
title: 'I Was Born, But...',
year: 1932,
score: 0.6692402362823486
},
{
plot: 'A living puppet, with the help of a cricket as his conscience, must prove himself worthy to become a real boy.',
title: 'Pinocchio',
year: 1940,
score: 0.6628834009170532
},
{
plot: 'An idealistic adolescent, suffering under the thumb of a sadistic schoolmaster, falls in love with a loose girl who is bullied and tormented by another lover.',
title: 'Torment',
year: 1944,
score: 0.6620646715164185
},
{
plot: `Against all odds Father Flanagan starts "Boys' Town" after hearing a convict's story. Whitey Marsh comes there. He runs away but, hungry, returns. He runs away again but, when friend Pee ...`,
title: 'Boys Town',
year: 1938,
score: 0.6570459604263306
},
{
plot: 'When three thuggish men are responsible for the death of his father and the crippling of his brother, young David must choose between supporting his family or risking his life and exacting vengeance.',
title: "Tol'able David",
year: 1921,
score: 0.6559526920318604
},
{
plot: "Fight promoter Nick Donati grooms a bellhop as a future champ, but has second thoughts when the 'kid' falls for his sister.",
title: 'Kid Galahad',
year: 1937,
score: 0.6525536179542542
},
{
plot: 'In a repressive boarding school with rigid rules of behavior, four boys decide to rebel against the direction on a celebration day.',
title: 'Zero for Conduct',
year: 1933,
score: 0.651709794998169
},
{
plot: 'While at a ski lodge, Larry Blake sees instructor Karin Borg and decides to sign up for private lessons. The next thing he knows, she is Mrs. Blake. When he announces that he is going back ...',
title: 'Two-Faced Woman',
year: 1941,
score: 0.6513179540634155
},
{
plot: 'To reconcile with his girlfriend, a bookish college student tries to become an athlete.',
title: 'College',
year: 1927,
score: 0.6459328532218933
}
]
Even though the expected result was not there, most of the results were related to fighting, boxing, or being bullied.
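One thing the query above never uses is the year filter field declared in the index. Here is a small sketch of how the same pipeline could be narrowed by year. buildPipeline is a hypothetical helper written for this post, and I am assuming the filter option of $vectorSearch works with auto-embedded text queries in the same way it does with regular vector queries.

```javascript
// Sketch: build the same $vectorSearch pipeline, optionally pre-filtered by year.
// buildPipeline is a hypothetical helper; "filter" support for autoEmbed queries is assumed.
function buildPipeline(queryText, { minYear, limit = 10 } = {}) {
  const vectorSearch = {
    index: 'vector_index',
    path: 'plot',
    query: { text: queryText },
    model: 'voyage-4-lite',
    numCandidates: 10000,
    limit,
  };
  if (minYear !== undefined) {
    // "year" was declared as a filter field in the index definition.
    vectorSearch.filter = { year: { $gte: minYear } };
  }
  return [
    { $vectorSearch: vectorSearch },
    { $project: { _id: 0, title: 1, year: 1, plot: 1, score: { $meta: 'vectorSearchScore' } } },
  ];
}

const pipeline = buildPipeline('bullied boy learns karate', { minYear: 1980 });
console.log(JSON.stringify(pipeline[0], null, 2));
// In mongosh: db.movies.aggregate(pipeline)
```

Since every result above is from the 1920s to the 1940s, a lower bound on year is a quick way to see whether newer movies show up at all.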
Below are the commands I used to complete this setup.
Pulling Docker images:
docker pull mongodb/mongodb-community-search:latest
docker pull mongodb/mongodb-community-server:latest
Create an internal Docker network for the containers to communicate
docker network create search-community
Starting community server container
echo '
net:
  port: 27017
  bindIpAll: true # Equivalent to --bind_ip_all
replication:
  replSetName: rs0
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
setParameter:
  searchIndexManagementHostAndPort: mongot-community.search-community:27028
  mongotHost: mongot-community.search-community:27028
  skipAuthenticationToSearchIndexManagementServer: false
  useGrpcForSearch: true
# Security configuration
security:
  authorization: enabled # Equivalent to --auth
  keyFile: /keyfile' > mongod.conf
mkdir -p ./data/db
openssl rand -base64 756 > keyfile
chmod 400 keyfile
docker run -d --rm --name mongod -e MONGODB_INITDB_ROOT_USERNAME=root -e MONGODB_INITDB_ROOT_PASSWORD=rootpass -v ./mongod.conf:/etc/mongod.conf:ro -v ./data/db:/data/db -v ./keyfile:/keyfile -p 27017:27017 --network search-community mongodb/mongodb-community-server:latest --config /etc/mongod.conf
Initiate replication
mongosh -u root -p rootpass --eval 'rs.initiate(); sleep(10000); rs.status()'
mongosh -u root -p rootpass --eval "db.getSiblingDB('admin').createUser(
{
user: 'mongot',
pwd: 'mongotPassword',
roles: ['searchCoordinator']
}
)"
Load sample movies collection data
Download movies.json from https://github.com/neelabalan/mongodb-sample-dataset/blob/main/sample_mflix/movies.json
mongoimport -d sample_mflix -c movies /Users/darshanj/Downloads/movies.json -u root -p rootpass --authenticationDatabase admin
Prepare search process
mkdir mongot_data
Creating mongot config file
hostname=$(mongosh -u root -p rootpass --quiet --eval "db.isMaster().me")
cat << EOF > mongot.config
syncSource:
  replicaSet:
    hostAndPort: "$hostname"
    username: "mongot"
    passwordFile: "/passwordFile"
    authSource: "admin"
    tls: false
    readPreference: primaryPreferred
storage:
  dataPath: "/data/mongot"
server:
  grpc:
    address: "mongot-community.search-community:27028"
    tls:
      mode: "disabled"
metrics:
  enabled: true
  address: "mongot-community.search-community:9946"
healthCheck:
  address: "mongot-community.search-community:8080"
logging:
  verbosity: INFO
embedding:
  queryKeyFile: /etc/mongot/voyage-api-query-key
  indexingKeyFile: /etc/mongot/voyage-api-indexing-key
  providerEndpoint: https://api.voyageai.com/v1/embeddings
  isAutoEmbeddingViewWriter: true
EOF
echo -n "mongotPassword" > passwordFile
chmod 400 passwordFile
Go to https://dashboard.voyageai.com/organization/api-keys, create an API key, and copy the secret
printf "<your-voyage-api-query-key>" > voyage-api-query-key
Repeat one more time for the indexing key. Go to https://dashboard.voyageai.com/organization/api-keys, create another API key, and copy the secret
printf "<your-voyage-api-index-key>" > voyage-api-indexing-key
Starting MongoDB Search
docker run -d --rm --name mongot-community -v ./mongot_data:/data/mongot -v ./mongot.config:/mongot-community/config.default.yml -v ./passwordFile:/passwordFile:ro -v ./voyage-api-indexing-key:/etc/mongot/voyage-api-indexing-key:ro -v ./voyage-api-query-key:/etc/mongot/voyage-api-query-key:ro --network search-community -p 8080:8080 -p 9946:9946 mongodb/mongodb-community-search:latest --internalListAllIndexesForTesting=true
Check the Docker container status
docker ps
Now connect with mongosh, create the index, and run the test query
mongosh -u root -p rootpass
use sample_mflix
db.movies.createSearchIndex("vector_index", "vectorSearch", {
  "fields": [
    {
      "type": "autoEmbed",
      "modality": "text",
      "path": "plot",
      "model": "voyage-4"
    },
    {
      "type": "filter",
      "path": "year"
    }
  ]
})
db.movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot",
      "query": {
        "text": "bullied boy learns karate"
      },
      "model": "voyage-4-lite",
      "numCandidates": 10000,
      "limit": 10
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "year": 1,
      "plot": 1,
      "score": { $meta: "vectorSearchScore" }
    }
  }
])
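A search index is typically built in the background after createSearchIndex returns, so a query fired immediately afterwards may see few or no results. Below is a small sketch of a readiness check based on the output of db.movies.getSearchIndexes(). isIndexReady is a hypothetical helper, and the queryable and status field names are what I would expect from Atlas; please verify them against the output on your own build.

```javascript
// Sketch: decide whether a search index is ready, given getSearchIndexes() output.
// The "queryable" and "status" field names are assumed; check them on your build.
function isIndexReady(indexes, name) {
  const index = indexes.find((ix) => ix.name === name);
  return Boolean(index && index.queryable === true);
}

// Stand-in for db.movies.getSearchIndexes() so the sketch also runs outside mongosh.
const sampleIndexes = [
  { name: 'vector_index', type: 'vectorSearch', status: 'BUILDING', queryable: false },
];
console.log(isIndexReady(sampleIndexes, 'vector_index'));
// In mongosh: isIndexReady(db.movies.getSearchIndexes(), 'vector_index')
```

Re-running the check every so often until it returns true is a simple way to know when the test query is worth running.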
I know we could put all of this into a single Docker Compose file, but I prefer this way because it makes debugging easier and each step clearer.
Congratulations! You have successfully created your first local vector search.
PS: To be safe, delete the API keys when no longer needed, just like I did :).
