Building an Archaeology Matcher: A (Literal) Deep Dive Into Multimodal Vector Search

The MongoDB team built a visual similarity search app for SymfonyCon. This article lays out how to build it using Voyage AI multimodal embeddings, MongoDB Atlas, and Symfony AI.

This article was written by Pauline Vos.

What you’ll learn

  • What vector search and embedding models are.
  • How to use MongoDB to store embeddings and create vector search indexes.
  • How Symfony AI can help enhance your applications with AI.

SymfonyCon 2025 is behind us, and MongoDB's PHP team is still basking in the afterglow of so many great talks and conversations at our booth. After six years, SymfonyCon returned to my hometown of Amsterdam, a major European city with a rich history that celebrates its 750-year jubilee this year. What better way to represent ourselves than to show off a bit of that history and our powerful vector search capabilities?

First, some context

(Or watch the video.)

If you're ever in Amsterdam, I recommend making a stop at the Rokin metro station. One of the escalators serves as a sort of history museum: it displays a curated selection of archaeological finds collected during the construction of metro line 52.

Not planning to come over anytime soon? Lucky for you, many of the ~140,000 recorded objects were cataloged and published on Below the Surface. I've adored this website since its launch in 2017, as it lets users explore a timeline of objects ranging from the early 2000s to over a hundred thousand years ago. Not only that, it offers all the data in a downloadable CSV file to use freely—and don't we love some good data at MongoDB?

The Symfony app we built

So, how to wrangle all this data? Using MongoDB Atlas Vector Search, Voyage AI's multimodal capabilities, and Symfony AI, I built a little Symfony app that lets you submit a description or image of an object and finds the most similar object from the depths of Amsterdam's history! Pretty neat, right?

Go on, try it out (we do not store your image).

QR code that takes you to the app (scan it if you're not on your phone).

You can also explore the repo on GitHub.

How the app works

A common use of AI is similarity search. This technique uses an embedding model to take data and turn it into a vector embedding—a multi-dimensional array that records many different dimensions of said data. For instance, given one image of a parrot and one of a bicycle, a vector may record features like:

| Cartoon illustration of a red macaw parrot | Cartoon illustration of a bicycle |
| --- | --- |
| Animal. | Vehicle. |
| Has 2 wings. | Has 2 wheels. |
| Has beak. | Has basket. |
| Flies. | Rolls. |
| Jungle. | Urban. |
| Mostly red. | Mostly brown. |

The number of data points recorded corresponds to the number of dimensions the vector has, which is often several thousand.

Using a similarity algorithm, we can now compare vectors to see which are mathematically most similar. This is called a vector search. Given a vector store containing embeddings for the parrot and the bicycle, we can generate another embedding for, say, an image of a duck. When we perform a vector search using the duck's vector embedding, it's statistically most likely the top search result will be a parrot.
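
To make the "similarity algorithm" part concrete, here's a minimal, self-contained sketch of cosine similarity, one of the most common measures used for comparing vectors (Atlas Vector Search supports cosine, dot product, and Euclidean similarity). The toy vectors and function below are purely illustrative and not part of the app:

<?php

// Cosine similarity: 1.0 means the vectors point in the same direction
// (very similar), 0 means unrelated, -1.0 means opposite.
function cosineSimilarity(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
$parrot  = [0.9, 0.1, 0.8];
$bicycle = [0.1, 0.9, 0.2];
$duck    = [0.8, 0.2, 0.7];

echo cosineSimilarity($duck, $parrot);  // ~0.99: strong match
echo cosineSimilarity($duck, $bicycle); // ~0.40: weak match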

That's vector search in a nutshell! Different types of data may be used for this type of search. For instance, a blog enhanced with semantic search may embed text. A reverse image search feature will use image embeddings. In our demo use case, we use multiple data types (text from the CSV file, and images from the website), which is called "multimodal embedding."

Not all embedding models support every data type. Also, different models may have different "opinions" of what constitutes similarity. This can be further affected by the specific purpose a model is trained for. All are important considerations when choosing the right model for your data and use case!

How it was built using MongoDB, Symfony, and Voyage AI

There are two parts to this app: indexing the data, and letting users perform a vector search.

Diagram detailing the indexing architecture

Indexing

Symfony AI's Store component offers an Indexer that handles indexing. It will:

  1. Load data from a given source (Loader).
  2. Generate vector embeddings (Vectorizer).
  3. Store the embeddings (Store).

1. Load data

First, we can create an implementation of LoaderInterface to load all the artifacts' text and image data from the CSV dataset and the website respectively, to prepare for embedding. For the embedding input, I decided on:

  • A concatenated string made up of all the text columns in the CSV (describing the object's function, material, date range, and more).
  • The object's image or images (one or two, depending on the object).

Creating this sort of multimodal embedding improves the similarity search by giving it both visual data and descriptions to compare the artifacts on. If we only embedded the images, for example, we wouldn't have data points for things like country of origin or date range.
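
To make that concrete, here's a minimal sketch of how the embedding input for a single artifact could be assembled from a CSV row. The column names and the buildEmbeddingInput() helper are hypothetical, chosen for illustration; in the app, this kind of logic lives inside the LoaderInterface implementation:

<?php

// Hypothetical helper: turn one CSV row from the Below the Surface dataset into
// a multimodal embedding input (one text string plus up to two image URLs).
// The column names here are assumptions, not the dataset's real headers.
function buildEmbeddingInput(array $row): array
{
    // Concatenate the descriptive text columns into a single string.
    $textColumns = ['object_function', 'material', 'start_date', 'end_date', 'find_location'];
    $text = implode('. ', array_filter(array_map(
        static fn (string $column): string => trim($row[$column] ?? ''),
        $textColumns,
    )));

    // Collect one or two image URLs, depending on the object.
    $images = array_values(array_filter([
        $row['image_url_1'] ?? null,
        $row['image_url_2'] ?? null,
    ]));

    return ['text' => $text, 'images' => $images];
}

$input = buildEmbeddingInput([
    'object_function' => 'drinking vessel',
    'material'        => 'stoneware',
    'start_date'      => '1575',
    'end_date'        => '1625',
    'find_location'   => 'Rokin',
    'image_url_1'     => 'https://example.com/artefact-123.jpg',
]);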

2. Create vector embeddings

Now that we've gathered the inputs for our embedding model (voyage-multimodal-3), we can pass them to the Vectorizer. To do this, we define a Voyage Platform service, which we inject into the Vectorizer along with the model we want to use:

# config/packages/ai.yaml
ai:
    platform:
        voyage:
            api_key: '%env(VOYAGE_API_KEY)%'
    vectorizer:
        voyage:
            platform: 'ai.platform.voyage'
            model: 'voyage-multimodal-3'

The Vectorizer will handle sending all the inputs to Voyage AI (in batches) and return the resulting vector embeddings for storage.
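
If you're curious what the Vectorizer abstracts away: under the hood, each batch ends up as a call to Voyage AI's multimodal embeddings endpoint. Here's a hand-rolled sketch of a single request using Symfony's HttpClient; the endpoint path and payload shape follow Voyage AI's public API documentation as I understand it, so treat this as illustrative rather than as the component's actual implementation:

<?php

use Symfony\Component\HttpClient\HttpClient;

// Illustrative only: the Symfony AI Vectorizer does this for you, in batches.
$client = HttpClient::create();

$response = $client->request('POST', 'https://api.voyageai.com/v1/multimodalembeddings', [
    'auth_bearer' => $_ENV['VOYAGE_API_KEY'],
    'json' => [
        'model' => 'voyage-multimodal-3',
        'inputs' => [
            [
                // One multimodal input: interleaved text and image content.
                'content' => [
                    ['type' => 'text', 'text' => 'Drinking vessel. Stoneware. 1575-1625. Rokin.'],
                    ['type' => 'image_url', 'image_url' => 'https://example.com/artefact-123.jpg'],
                ],
            ],
        ],
    ],
]);

// Each input comes back with one embedding: an array of floats.
$embedding = $response->toArray()['data'][0]['embedding'];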

3. Store the embeddings

Now that we have our vectors, we can store them in MongoDB.

Note: Before doing this, we have to create vector search indexes in our database. Since we're using Doctrine ODM entities with the VectorSearchIndex attribute, that's a matter of running bin/console doctrine:mongodb:schema:create.
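
If you'd rather see what that schema command sets up, here's a rough equivalent using the MongoDB PHP library directly. The database and collection names, the field path, the number of dimensions (assuming 1024 for voyage-multimodal-3), and the similarity function are all assumptions for illustration; check that your library version supports createSearchIndex():

<?php

use MongoDB\Client;

// Illustrative alternative to `bin/console doctrine:mongodb:schema:create`:
// create the Atlas vector search index with the MongoDB PHP library.
$collection = (new Client($_ENV['MONGODB_URI']))->selectCollection('archaeology', 'Artefact');

$collection->createSearchIndex(
    [
        'fields' => [
            [
                'type'          => 'vector',
                'path'          => 'embeddingVector',
                'numDimensions' => 1024,     // assumed output size of voyage-multimodal-3
                'similarity'    => 'cosine',
            ],
        ],
    ],
    ['name' => 'default', 'type' => 'vectorSearch'],
);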

While Symfony AI comes with a MongoDB store, for this app I made my own Doctrine ODM implementation of StoreInterface. I hope to contribute one to the Store component soon to make it easier to use Doctrine entities with Symfony AI.

When we wire up the Indexer with these three services, it'll pass the data through each of them to complete the indexing process.

# config/packages/ai.yaml
ai:

    # ...

    indexer:
        artifacts:
            loader: 'App\Service\ArtefactLoader'
            vectorizer: 'ai.vectorizer.voyage'
            store: 'App\Store\DoctrineODMStore'

This configuration now enables us to run bin/console ai:store:index artifacts, by the end of which all our artifacts will have multimodal embeddings stored in MongoDB Atlas.

Diagram detailing the search app architecture

Performing the vector search

With our embedding data and search indexes set up, we're ready to start finding matches.

1. Submit photo and/or prompt

In the UI, the user enters a description of what they're trying to find, takes a photo of an object, or both. For instance, a user may take a photo of a toothbrush and enter "medieval" in the text field to see if people in the Middle Ages used similar objects.

2. Generate image description

If there's an image, we use one of OpenAI's GPT models to generate a description of it to embed alongside the image data. We'll need to define another Platform service for OpenAI, which we can inject into our query handler.

# config/packages/ai.yaml
ai:
    platform:
        voyage:
            api_key: '%env(VOYAGE_API_KEY)%'
        openai:
            api_key: '%env(OPENAI_API_KEY)%'
# config/services.yaml
    App\Query\MatchQueryHandler:
        arguments:
            $openAi: '@ai.platform.openai'

You may wonder why I chose to generate a description of an image. After all, don't we already have an embedding model recording data from that same image? The main reason is that we can send instructions to GPT to only focus on the object in the photo and ignore any background, which would only serve as irrelevant noise in our similarity search. We can also ask it to focus specifically on features recorded in the CSV dataset (things like function and finish).

This can add more weight to data points that we're especially interested in comparing, and may improve our results. It's definitely trial and error, so I recommend playing around with different messages to see how they change results.

For our app, we're sending the following to GPT:

$messages = new MessageBag(
    Message::forSystem('You are an image analyzer bot that helps identify objects in images.'),
    Message::ofUser(
        'Describe the object in the foreground, ignoring the background or the person holding it. Try to focus primarily on the function. Color, shape, material, and finish of the object may also be included.',
        $image
    ),
);

3. Create vector embedding

Armed with our text description and image data, we can generate another multimodal embedding by sending it all off to Voyage AI. The resulting vector can then be used as a search query for a vector search!

4. Perform vector search

For our final trick, we use the Doctrine ODM aggregation builder to execute a vector search aggregation:

$builder = $this->documentManager->getRepository(Artefact::class)
    ->createAggregationBuilder()
    ->hydrate(MatchCandidate::class);

$builder
    ->vectorSearch()
    ->limit(10)                 // return at most 10 matches
    ->numCandidates(200)        // how many nearest neighbors Atlas considers
    ->index('default')          // the vector search index created earlier
    ->path('embeddingVector')   // the field holding the stored embeddings
    ->queryVector($vectorized->vector->getData())
    ->project()
    ->field('_id')->expression(0)                // drop _id from the output
    ->field('artefact')->expression('$$ROOT')    // embed the full document
    ->field('score')->meta('vectorSearchScore'); // expose the similarity score

$matches = $builder
    ->getAggregation()
    ->execute();

This aggregation will return the 10 best matches it could find, including their scores. By default, they're ordered by score (descending), so the first match will be the most similar.
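
For reference, the builder above produces roughly the following aggregation pipeline, which is the documented Atlas $vectorSearch syntax you would write with the MongoDB PHP driver and no ODM. The $queryVector and $collection variables are assumed to hold the Voyage AI query embedding and the artifacts collection:

<?php

// Roughly what the aggregation builder sends to Atlas.
$pipeline = [
    [
        '$vectorSearch' => [
            'index'         => 'default',
            'path'          => 'embeddingVector',
            'queryVector'   => $queryVector,  // the embedding returned by Voyage AI
            'numCandidates' => 200,
            'limit'         => 10,
        ],
    ],
    [
        '$project' => [
            '_id'      => 0,
            'artefact' => '$$ROOT',
            'score'    => ['$meta' => 'vectorSearchScore'],
        ],
    ],
];

$matches = $collection->aggregate($pipeline)->toArray();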

Et voilà! Pass them on to a template, and the user is presented with a cool object from Amsterdam's history and some other close matches. For example, this picture of the ring my stepdad made me when I was a kid yields the following results:

A photograph of a hand holding a small ring. The band is silver. A square made of gold and silver sits on the band, and a small round ruby is embedded into the square.

A screenshot of a web app showing two columns. The left column provides a description of the ring held in the previous image, and below it a picture of an old metal ring with a header saying

Next steps

Want to see more? Try out the app! You can also build it yourself using Symfony AI and MongoDB.

Key takeaways

  • Generate vector embeddings and store them in MongoDB Atlas to implement similarity search.
  • Combine different types of data in multimodal embeddings to enhance search results.
  • Symfony AI can help you tie your store and platform together to easily index and embed your data.
