DEV Community

Reena Sharma

Posted on Jun 17

Speed, Accuracy, and Efficiency: Benchmarking Endee vs. Google Vertex AI

#benchmark #ai #vectordatabase #machinelearning

Vector databases are the quiet powerhouse behind generative AI, semantic search, and real-time recommendation engines. Picking the right one isn't just an engineering detail; it's a decision that ripples through your cloud bill, your system's scalability, and the experience your users actually get.

So we put that decision to the test. We ran a head-to-head benchmark of Endee vs. Google Vertex AI Vector Search using VectorDBBench (Zilliz's open-source benchmarking tool) against 1 million Cohere vectors (768 dimensions).

We measured three things:
1) Accuracy (recall)
2) Throughput (QPS)
3) Responsiveness (p99 latency)

The Hardware

Google Vertex AI: n1-standard-16 — 16 vCPUs / 60 GB RAM
Endee: custom container — 4 vCPUs / 16 GB RAM

Keep that gap in mind: every result below comes from Endee running on a fraction of the iron Vertex AI was given.

1. Retrieval Accuracy: Recall vs. Top-K
Recall measures how good a system is at surfacing the truly relevant matches out of a huge dataset. Top-K is how many results you ask it to return.

For this test, throughput was held steady at ~800 QPS so we could isolate how accuracy behaves as K increases.

Configuration:

Vertex AI: approx_neighbors=128
Endee: m=32, ef_con=256, precision=int16

Results:

Top-3: Vertex AI 89.97% | Endee 99.23%
Top-5: Vertex AI 89.32% | Endee 99.34%
Top-10: Vertex AI 88.93% | Endee 99.18%
Top-15: Vertex AI 85.80% | Endee 99.11%
Top-30: Vertex AI 77.76% | Endee 98.67%

Why it matters:

If your app can't find the right vectors, it serves bad recommendations, full stop. Vertex AI's recall erodes as K grows, from 89.97% at Top-3 to 77.76% at Top-30. Endee barely moves, staying above 98.6% across the board, on lighter hardware.
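To make the metric concrete, here is a minimal sketch of how recall at K can be computed for a batch of queries. It assumes you already have, for each query, the IDs the database returned and the exact nearest-neighbor IDs from a brute-force search. The function names and toy IDs are illustrative only and are not taken from VectorDBBench, Endee, or Vertex AI.

```python
def recall_at_k(retrieved_ids, ground_truth_ids, k):
    """Fraction of the true top-k neighbors that appear in the returned top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth_ids[:k])
    hits = sum(1 for vid in retrieved_ids[:k] if vid in truth)
    return hits / len(truth)


def mean_recall_at_k(all_retrieved, all_truth, k):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)


# Toy data (hypothetical IDs): two queries, K = 5.
truth = [[1, 2, 3, 4, 5], [10, 11, 12, 13, 14]]
returned = [[1, 2, 3, 9, 5], [10, 11, 12, 13, 14]]

# First query finds 4 of 5 true neighbors, second finds 5 of 5.
print(f"mean recall@5 = {mean_recall_at_k(returned, truth, 5):.2%}")
```

A benchmark tool typically averages this value over many queries, which is how a single percentage per Top-K setting is produced.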

2. Throughput: QPS vs. Concurrency
Throughput is your ceiling for traffic. To compare fairly, we tuned both systems to a matching ~97.3% recall baseline, then ramped up concurrent requests.

Configuration:

Vertex AI: leaf_nodes_to_search=0.195
Endee: m=16, ef_con=128, ef_search=128, precision=int16

Results (QPS):

Concurrency 2: Vertex AI 140.81 | Endee 661.13
Concurrency 4: Vertex AI 279.66 | Endee 1,295.04
Concurrency 8: Vertex AI 544.99 | Endee 1,881.23
Concurrency 16: Vertex AI 1,079.52 | Endee 2,091.50

Why it matters:

Throughput is a direct line to your cloud bill. At concurrency 16, Vertex AI reaches about 1,080 QPS on 16 vCPUs. Endee nearly doubles that, at about 2,090 QPS on 4 vCPUs. That's not a marginal efficiency gain; it's a different cost curve entirely as you scale.
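One way to read the cost argument is to normalize throughput by allocated vCPUs. The sketch below does that arithmetic using only the QPS figures and machine sizes reported in this post. Treat QPS per vCPU as a rough proxy: it assumes cost scales with vCPU count and that each run actually kept its cores busy, neither of which this benchmark verifies.

```python
# Machine sizes and QPS results as reported above.
VCPUS = {"Vertex AI": 16, "Endee": 4}
QPS_BY_CONCURRENCY = {
    2: {"Vertex AI": 140.81, "Endee": 661.13},
    4: {"Vertex AI": 279.66, "Endee": 1295.04},
    8: {"Vertex AI": 544.99, "Endee": 1881.23},
    16: {"Vertex AI": 1079.52, "Endee": 2091.50},
}


def qps_per_vcpu(qps, vcpus):
    """Throughput normalized by the number of allocated vCPUs."""
    return qps / vcpus


for concurrency, row in QPS_BY_CONCURRENCY.items():
    vertex = qps_per_vcpu(row["Vertex AI"], VCPUS["Vertex AI"])
    endee = qps_per_vcpu(row["Endee"], VCPUS["Endee"])
    print(
        f"concurrency {concurrency:>2}: "
        f"Vertex AI {vertex:7.2f} QPS/vCPU | "
        f"Endee {endee:7.2f} QPS/vCPU | "
        f"ratio {endee / vertex:.1f}x"
    )
```

At concurrency 16 this works out to roughly 67 QPS per vCPU for Vertex AI against roughly 523 for Endee, which is where the gap between "nearly double the QPS" and "a quarter of the cores" compounds.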

3. Responsiveness: p99 Latency vs. Concurrency
p99 latency is the response time that 99% of requests come in under: the tail number that actually determines whether your UI feels instant or sluggish. This test uses the same ~97.3% recall baseline as the throughput test above.
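As a minimal sketch of what that number means, the snippet below computes a percentile from raw per-request latencies using the nearest-rank method. The choice of nearest-rank is an assumption made for illustration, and benchmark tools may interpolate instead. The latency samples are made up and are not measurements from this benchmark.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in ms: 98 fast requests and 2 slow outliers.
latencies = [3.5] * 98 + [40.0] * 2

mean = sum(latencies) / len(latencies)
print(f"mean = {mean:.2f} ms")
print(f"p50  = {percentile(latencies, 50)} ms")
print(f"p99  = {percentile(latencies, 99)} ms")
```

In this toy data the median and mean both look healthy while p99 lands on the slow outliers, which is why tail latency is the figure worth tracking for user-facing requests.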

Results (p99 latency, ms):

Concurrency 2: Vertex AI 59.2 | Endee 3.7
Concurrency 4: Vertex AI 68.7 | Endee 3.7
Concurrency 8: Vertex AI 62.5 | Endee 3.8
Concurrency 16: Vertex AI 25.3 | Endee 3.7

Why it matters:

Vertex AI bounces between 25 ms and 69 ms depending on load. Endee sits flat at 3.7–3.8 ms regardless of concurrency, basically removing the database as a bottleneck in your request path.

Wrapping Up
Across all three axes (recall, throughput, and latency), Endee came out ahead of Vertex AI Vector Search, and did it on roughly a quarter of the compute (4 vCPU / 16 GB vs. 16 vCPU / 60 GB).

All the configs above are reproducible: VectorDBBench is open-source, and so is Endee. If you want to run this benchmark on your own dataset, or just want to see how Endee handles your specific recall/throughput tradeoffs, the fastest way is to spin it up directly. Endee is licensed under Apache 2.0, self-hostable via Docker, or available as a managed service with a free Starter tier. Full docs, quickstarts, and integration guides (LangChain, LlamaIndex, CrewAI) are at docs.endee.io.
If you run your own benchmark against Endee, we'd genuinely like to see the numbers. Drop them in the comments or find us at the repository below:

endee-io / endee

Endee.io – A high-performance vector database, designed to handle up to 1B vectors on a single node, delivering significant performance gains through optimized indexing and execution. Also available in cloud https://endee.io/

High-performance open-source vector database for AI search, RAG, semantic search, and hybrid retrieval.

Endee: Open-Source Vector Database for AI Search

Endee is a high-performance open-source vector database built for AI search and retrieval workloads. It is designed for teams building RAG pipelines, semantic search, hybrid search, recommendation systems, and filtered vector retrieval APIs that need production-oriented performance and control.

Endee combines vector search with filtering, sparse retrieval support, backup workflows, and deployment flexibility across local builds and Docker-based environments. The project is implemented in C++ and optimized for modern CPU targets, including AVX2, AVX512, NEON, and SVE2.

If you want the fastest path to evaluate Endee locally, start with the Getting Started guide or the hosted docs at docs.endee.io.

Why Endee

Built as a dedicated vector database for…
