Apache Cloudberry is an advanced and mature open-source Massively Parallel Processing (MPP) database, derived from the open-source version of the Pivotal Greenplum Database® but built on a more modern PostgreSQL kernel and with more advanced enterprise capabilities. Cloudberry can serve as a data warehouse and can also be used for large-scale analytics and AI/ML workloads.
This benchmark is based on the latest Apache Cloudberry main branch 3.0 (a development branch that has not yet been officially released). It presents a systematic performance comparison of three full-text search implementations within the PostgreSQL ecosystem: ParadeDB BM25 (tested via pg_search), Postgres GIN (trigram), and ZomboDB.
The benchmark covers key metrics, including index build time, query performance, and memory consumption, with the goal of assessing their real-world behavior in large-scale text retrieval workloads. The results aim to offer practical advice to Cloudberry users and database developers when selecting an appropriate full-text search solution within the PostgreSQL ecosystem.
Test Environment
All tests were conducted on a system equipped with a 4-core CPU and 16 GB of memory, running Apache Cloudberry version 3.0 (main branch).
The dataset used in this evaluation is All The News 2.1 (dropbox.com/s/cn2utnr5ipathhh/all-...), a publicly available news corpus containing 2,688,878 articles sourced from 27 U.S. publications.
The searchable fields include:
date, year, month, day, author, title, article, url, section, and publication.
ParadeDB BM25 is a full-text search extension developed by ParadeDB.
It is built on a vectorized BM25 ranking algorithm, enabling efficient relevance-based retrieval over large text corpora.
Within Cloudberry, it can be integrated via the pg_search interface and is characterized by high query performance and low memory footprint.
However, the current version does not yet support JOIN operations.
Postgres GIN (trigram) is the built-in full-text search option in Apache Cloudberry, implemented through PostgreSQL’s core pg_trgm module.
It supports similarity matching and text search using trigram indexes, requires no external dependencies, and offers high integration and ease of deployment.
As such, it serves as one of Cloudberry’s default, out-of-the-box search mechanisms.
ZomboDB is an extension that integrates PostgreSQL with Elasticsearch, allowing users to execute Elasticsearch-powered full-text searches directly within SQL queries.
It combines PostgreSQL’s transactional consistency with Elasticsearch’s rich full-text and boolean search capabilities, making it particularly suitable for workloads that require complex search logic or external search integration.
At present, ZomboDB for Apache Cloudberry has not yet been open-sourced and remains in the planning stage.
This evaluation uses a compatible build solely to assess its potential performance within the Cloudberry environment.
All three performed similarly; differences were negligible.
GIN (trigram) was marginally faster (0.11 s).
Boolean query performance
AND queries → ZomboDB showed a clear lead (4.08 s).
OR queries → ParadeDB BM25 was fastest (25.66 s).
GIN (trigram) had the slowest OR query performance.
Memory Usage During Query Execution
Index Type
Approx. Memory Usage
ParadeDB BM25
≈ 120 MB
GIN (trigram)
≈ 1.4 GB
ZomboDB
≈ 120 MB
Conclusions
Each of the three full-text search solutions has their own dist :
ZomboDB: Offers the fastest index build time, making it well-suited for environments that require frequent index rebuilding or rapid deployment. It delivers stable query performance and supports complex Boolean search operations, making it ideal for applications that prioritize indexing speed and extensibility.
GIN (trigram): Provides moderate index build times but exhibits weaker performance in query execution—particularly with high-frequency terms and composite OR queries—while consuming significantly more memory. Its key advantages lie in simplicity and tight integration with PostgreSQL, which makes it suitable for lightweight workloads or use cases that depend on native PostgreSQL extensions.
ParadeDB BM25: Delivers the best overall query performance, especially for relevance-based searches, high-frequency terms, and complex queries. It also maintains low memory usage. However, its index build process takes longer, and current feature limitations—such as the lack of JOIN support—should be considered during system design.
In conclusion, ParadeDB BM25 is best suited for full-text search scenarios that emphasize text relevance and query performance; ZomboDB is more appropriate for applications requiring complex Boolean logic and extended search capabilities; and GIN (trigram) is most fitting for lightweight, low-frequency text matching tasks.
These benchmarking results provide valuable insights for Cloudberry users and PostgreSQL developers when selecting an appropriate full-text search solution within the PostgreSQL ecosystem.
Top comments (1)
Apache Cloudberry is an advanced and mature open-source Massively Parallel Processing (MPP) database, derived from the open-source version of the Pivotal Greenplum Database® but built on a more modern PostgreSQL kernel and with more advanced enterprise capabilities. Cloudberry can serve as a data warehouse and can also be used for large-scale analytics and AI/ML workloads.
This benchmark is based on the latest Apache Cloudberry main branch 3.0 (a development branch that has not yet been officially released). It presents a systematic performance comparison of three full-text search implementations within the PostgreSQL ecosystem: ParadeDB BM25 (tested via pg_search), Postgres GIN (trigram), and ZomboDB.
The benchmark covers key metrics, including index build time, query performance, and memory consumption, with the goal of assessing their real-world behavior in large-scale text retrieval workloads. The results aim to offer practical advice to Cloudberry users and database developers when selecting an appropriate full-text search solution within the PostgreSQL ecosystem.
Test Environment
All tests were conducted on a system equipped with a 4-core CPU and 16 GB of memory, running Apache Cloudberry version 3.0 (main branch).
The dataset used in this evaluation is All The News 2.1 (dropbox.com/s/cn2utnr5ipathhh/all-...), a publicly available news corpus containing 2,688,878 articles sourced from 27 U.S. publications.
The searchable fields include:
date, year, month, day, author, title, article, url, section, and publication.
The table schema is defined as follows:
1. ParadeDB BM25 Index
ParadeDB BM25 is a full-text search extension developed by ParadeDB.
It is built on a vectorized BM25 ranking algorithm, enabling efficient relevance-based retrieval over large text corpora.
Within Cloudberry, it can be integrated via the pg_search interface and is characterized by high query performance and low memory footprint.
However, the current version does not yet support JOIN operations.
2. GIN (trigram) Index
Postgres GIN (trigram) is the built-in full-text search option in Apache Cloudberry, implemented through PostgreSQL’s core pg_trgm module.
It supports similarity matching and text search using trigram indexes, requires no external dependencies, and offers high integration and ease of deployment.
As such, it serves as one of Cloudberry’s default, out-of-the-box search mechanisms.
3. ZomboDB Index
ZomboDB is an extension that integrates PostgreSQL with Elasticsearch, allowing users to execute Elasticsearch-powered full-text searches directly within SQL queries.
It combines PostgreSQL’s transactional consistency with Elasticsearch’s rich full-text and boolean search capabilities, making it particularly suitable for workloads that require complex search logic or external search integration.
At present, ZomboDB for Apache Cloudberry has not yet been open-sourced and remains in the planning stage.
This evaluation uses a compatible build solely to assess its potential performance within the Cloudberry environment.
Index Build Time Comparison
Performance Benchmark Results
Word-Level Search Latency
Boolean (Composite) Queries
Detailed Analysis
High-frequency term (“the”)
Medium-frequency term (“table”)
Low-frequency term (“abcd”)
Boolean query performance
Memory Usage During Query Execution
Conclusions
Each of the three full-text search solutions has their own dist :
ZomboDB: Offers the fastest index build time, making it well-suited for environments that require frequent index rebuilding or rapid deployment. It delivers stable query performance and supports complex Boolean search operations, making it ideal for applications that prioritize indexing speed and extensibility.
GIN (trigram): Provides moderate index build times but exhibits weaker performance in query execution—particularly with high-frequency terms and composite OR queries—while consuming significantly more memory. Its key advantages lie in simplicity and tight integration with PostgreSQL, which makes it suitable for lightweight workloads or use cases that depend on native PostgreSQL extensions.
ParadeDB BM25: Delivers the best overall query performance, especially for relevance-based searches, high-frequency terms, and complex queries. It also maintains low memory usage. However, its index build process takes longer, and current feature limitations—such as the lack of JOIN support—should be considered during system design.
In conclusion, ParadeDB BM25 is best suited for full-text search scenarios that emphasize text relevance and query performance; ZomboDB is more appropriate for applications requiring complex Boolean logic and extended search capabilities; and GIN (trigram) is most fitting for lightweight, low-frequency text matching tasks.
These benchmarking results provide valuable insights for Cloudberry users and PostgreSQL developers when selecting an appropriate full-text search solution within the PostgreSQL ecosystem.
Welcome to Apache Cloudberry
Dev mailing list: