Jonathan Ellis for DataStax

Originally published at datastax.com

The Best Embedding Models for Information Retrieval in 2025

The just-released Voyage-3-large is the surprise leader in embedding relevance

With the exception of OpenAI (whose text-embedding-3 models from January 2024 are ancient in light of the pace of AI progress), all the prominent commercial vector embedding vendors released a new version of their flagship models in late 2024 or early 2025.

Here’s how the latest and greatest proprietary and open-source models stack up against each other in DataStax Astra DB vector search.
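
The comparison itself boils down to embedding each dataset's documents and queries, retrieving nearest neighbors, and scoring the results. Here's a minimal sketch of that loop, using brute-force cosine similarity in NumPy as a stand-in for Astra DB's vector search, a hypothetical embed() callable for whichever provider is being tested, and recall@k as a simple stand-in metric:

```python
import numpy as np

def recall_at_k(embed, corpus, queries, relevant, k=10):
    """Rough relevance check for one embedding model on one dataset.

    embed    -- callable mapping a list of strings to an (n, d) numpy array
                (hypothetical; wraps whichever provider's API is under test)
    relevant -- for each query, the set of corpus indices that are relevant
    Returns recall@k: the fraction of queries whose top-k results contain
    at least one relevant document.
    """
    docs = embed(corpus)
    qs = embed(queries)

    # Normalize so a dot product is cosine similarity.
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    qs = qs / np.linalg.norm(qs, axis=1, keepdims=True)

    hits = 0
    for q, rel in zip(qs, relevant):
        top_k = np.argsort(q @ docs.T)[::-1][:k]
        hits += bool(set(top_k) & set(rel))
    return hits / len(queries)
```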

Models tested

I tested eight commercial models from three categories:

  • The Gemini and OpenAI models are still the default options for most people. (Google has released text-embedding-005, but only on its enterprise Vertex platform so far. I tested the older text-embedding-004 that’s available via the Gemini API.)
  • Jina, Cohere, and Voyage are the third-party vendors enjoying the most success with embeddings models designed for retrieval.
  • NVIDIA is of course the 800 lb. gorilla in its home market, looking to commoditize its complements by providing high-quality models licensed to run on NVIDIA hardware. They have previously offered fine-tunes of the e5 embedding model; the llama-based model evaluated here is the first of a new generation for them.

I also tested three open models:

  • Stella is the top-performing model on the MTEB retrieval leaderboard that allows commercial use, so I tested both the 400m and 1.5b variants. (The bge-en-icl model does slightly better, but it’s designed for few-shot use rather than zero-shot, so it’s a different paradigm than everything else here.)
  • ModernBERT Embed is a brand-new model based on the ModernBERT model from Answer.AI and LightOn AI. ModernBERT aims to improve on the foundational BERT model in both speed and accuracy, enabling models like Nomic’s Embed to inherit the same advantages.

Here are the details:

[Table: model details]

Models marked with * are trained with Matryoshka techniques, meaning they are designed to keep the most important information in the first dimensions of the output vector, allowing the vector to be truncated while preserving most of the semantic information. I only evaluated the largest, most accurate sizes for these models.
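
Using a Matryoshka-trained model at a smaller size is just truncation plus re-normalization; a minimal sketch (assuming the provider returns its vectors as a NumPy array):

```python
import numpy as np

def truncate_matryoshka(vectors: np.ndarray, dims: int) -> np.ndarray:
    """Keep only the first `dims` dimensions of Matryoshka-trained
    embeddings and re-normalize so cosine similarity still behaves."""
    truncated = vectors[:, :dims]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# e.g. shrink 1024-dimensional vectors to 256 dimensions before indexing:
# small_vectors = truncate_matryoshka(full_vectors, 256)
```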

Datasets

These are test sets from the ViDoRe image search benchmark, OCR’d using Gemini Flash 1.5. Details on the datasets can be found in section 3.1 of the ColPali paper. Notably, TabFQuAD and Shift Project sources are in French; the rest are in English.

I picked these because most if not all of the classic text-search datasets are being trained on by model developers for whom it’s more important to get to the top of the MTEB leaderboard than to build something actually useful. By OCRing data from image search datasets, I believe I was able to give these models data that they haven’t seen before.
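
For reference, here's a rough sketch of that kind of OCR pass using the google-generativeai Python client; the model ID and prompt are illustrative, not necessarily exactly what was used to build the benchmark data:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def ocr_page(image_path: str) -> str:
    """Transcribe one benchmark page image to plain text."""
    page = Image.open(image_path)
    response = model.generate_content(
        [page, "Transcribe all of the text in this document image as plain text."]
    )
    return response.text
```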

Cost

Of course, cost is also a concern when evaluating models. Here’s a way to visualize cost versus performance for these models.

I estimated costs for NVIDIA llama v1, ModernBERT Embed, and the Stella models by multiplying their parameter counts by Jina v3’s price per parameter, since Jina is the only proprietary model for which I have both hosted pricing and a parameter count.
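
That estimate is just a ratio; here's a sketch of the arithmetic, with the actual prices and parameter counts to be filled in from the table above:

```python
def estimated_price(params: float, jina_price: float, jina_params: float) -> float:
    """Back-of-the-envelope hosted price for a self-hostable model:
    scale Jina v3's price per parameter by the model's parameter count.

    params      -- parameter count of the model being estimated
    jina_price  -- Jina v3's hosted price (per million tokens)
    jina_params -- Jina v3's parameter count
    """
    return params * (jina_price / jina_params)

# e.g. a 400M-parameter model, priced relative to Jina v3 (~570M parameters):
# estimated_price(400e6, jina_price=JINA_V3_PRICE, jina_params=570e6)
```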

Observations

  1. ModernBERT Embed and Gemini text-embedding-004 are trained on English only, so their results are not included for the French datasets. The other models are all multilingual. (The Stella models contain “en” in their full name, but they do just fine on the French datasets, so I left them in.)
  2. Voyage-3-large is in a league of its own. None of the others consistently comes close. Having also swept the reranker results, Tengyu Ma and his team are doing phenomenal work.
  3. There seems to be a general trend towards larger-dimension outputs for models prioritizing the highest relevance. OpenAI’s v3-large was early to the over-2k output size, but NVIDIA’s llama model and voyage-3-large have also moved up to 2048 dimensions. Not coincidentally, these are the three models delivering the most accurate results. And yet, Voyage-3-lite delivers results very nearly as good as NVIDIA llama and OpenAI v3-large in only 512 output dimensions.
  4. Sitting a notch below voyage-3-large, OpenAI’s v3-large and NVIDIA’s llama-v1 are quite good.
  5. Stella is also in this second tier, which represents incredibly impressive work from its author, Dun Zhang. After dropping the Stella model like a bomb on HuggingFace, Zhang released a whitepaper in late December giving a few more details. However, the 4x larger stella-1.5b is not significantly more accurate than stella-400m.
  6. Gemini 004 is in a class by itself with modest performance but the low price of Free. This comes with a reasonable rate limit of 1500 RPM; the only downside is that there’s no way to pay for more throughput.
  7. Jina v3 and Cohere v3 are at the bottom and are strictly outcompeted: as much as I love Cohere v3, you can get better performance from other models for less money.

Conclusion

Voyage continues to kill it with their recent releases; if you want the maximum possible relevance, there is a wide gap between voyage-3-large and the group of models that collectively take second place. Voyage-3-lite is also in a strong position with respect to cost:performance, coming very close to openai-v3-large performance for about 1/5 of the price – and with a much smaller output size, meaning searches will be proportionally faster.

On the open source side, Stella is an excellent option out-of-the-box, and small enough to easily fine-tune for even better performance. It’s crazy to me that this came from a single developer.

It’s an exciting time to build with AI!
