Introduction
Ever needed powerful search capabilities in your application but dreaded the complexity of managing Elasticsearch or OpenSearch? You're not alone. While these dedicated search engines are powerful, they introduce significant operational overhead: another cluster to deploy, ETL pipelines to maintain, and constant synchronization headaches.
What if you could get search engine capabilities directly inside PostgreSQL? That's exactly what pg_search offers. In this post, I'll walk you through what pg_search is, why it can't run on Amazon RDS, and how we architected a solution using EC2 that keeps everything in PostgreSQL while delivering modern search features.
What is pg_search?
pg_search is a PostgreSQL extension from ParadeDB that transforms your database into a full-featured search engine. Instead of basic substring matching with LIKE or ILIKE, you get:
- Full-text search with BM25 ranking (the algorithm behind modern search engines)
- Vector search for semantic similarity using embeddings
- Hybrid search that combines text relevance and vector similarity
- All using standard SQL queries
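To make the BM25 bullet concrete: the algorithm scores a document by weighing how often each query term appears (term frequency) against how rare the term is across the corpus (inverse document frequency), with a penalty for long documents. A toy standalone sketch of the scoring math — illustrative only, not pg_search's actual implementation:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one document against a query with the BM25 formula.

    corpus: list of documents (each a list of terms), used for IDF
    and average document length. k1 and b are the usual defaults.
    """
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n = len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))   # rarity weight
        tf = doc_terms.count(term)                         # term frequency
        denom = tf + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * (tf * (k1 + 1)) / denom
    return score

corpus = [
    ["configure", "postgres", "search"],
    ["postgres", "backup", "guide"],
    ["cooking", "pasta", "recipes"],
]
query = ["postgres", "search"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
# The document matching both query terms scores highest; the
# unrelated document scores zero.
```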
Here's a simple example (pg_search's SQL API has changed across versions; recent releases use the @@@ match operator together with paradedb.score rather than a standalone ranking function):
SELECT id, title,
       paradedb.score(id) AS score
FROM articles
WHERE title @@@ 'configure postgres search'
   OR body @@@ 'configure postgres search'
ORDER BY score DESC
LIMIT 10;
This returns ranked results like a real search engine, not just rows that contain your keywords.
Why pg_search Instead of Traditional Search Solutions?
You might reach for pg_search when:
Plain Postgres queries aren't enough. Basic LIKE queries don't provide relevance ranking, and scaling text search across multiple columns becomes difficult.
You want to avoid separate infrastructure. Solutions like Elasticsearch require:
- Additional clusters to deploy and monitor
- ETL pipelines to sync data from Postgres
- Managing consistency between two systems
- Extra latency from network hops
You prefer the Postgres ecosystem. With pg_search, everything stays in Postgres. Your backups, permissions, transactions, and tooling all work the same way. No new stack to learn.
The RDS Challenge
Here's the catch: Amazon RDS doesn't support pg_search.
RDS only allows a pre-approved list of extensions. Since RDS is a managed service, AWS controls what can be loaded into the PostgreSQL process. If you try:
CREATE EXTENSION pg_search;
You'll get:
ERROR: extension "pg_search" is not available
This limitation is by design. To use pg_search, you need a self-managed PostgreSQL instance where you control the installation.
Our Solution: RDS + EC2 Architecture
Since we can't install pg_search on RDS, we use a hybrid approach:
- Amazon RDS remains our primary database for all application writes and transactions
- EC2 PostgreSQL runs as a search replica with pg_search installed
- Data flows from RDS to EC2 using logical replication
- Applications query RDS for normal operations and EC2 for search
Here's the architecture:
┌─────────────────────────────┐
│ Application │
│ (API / backend / service) │
└────────────┬────────────────┘
│
Write / Read
│
┌────────────▼──────────────┐
│ Amazon RDS PostgreSQL │
│ (Primary DB) │
└────────────┬──────────────┘
│
Logical Replication
│
┌────────────▼──────────────┐
│ Amazon EC2 │
│ PostgreSQL + pg_search │
│ (Search / Analytics) │
└────────────▲──────────────┘
│
Search Queries
│
┌────────────┴──────────────┐
│ Application │
└───────────────────────────┘
Everything stays inside your private AWS VPC with no external services.
Setting Up pg_search on EC2
Step 1: Provision Your EC2 Instance
Launch an EC2 instance (t3.medium or larger) in the same VPC as your RDS instance. Place it in a private subnet and attach sufficient storage (100+ GB gp3 EBS).
Configure security groups to allow:
- EC2 → RDS communication on port 5432
- Your admin access via SSH
Step 2: Install PostgreSQL with pg_search
We'll use Docker for a reproducible setup. Create a Dockerfile:
FROM postgres:16
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential git curl pkg-config libssl-dev \
libclang-dev clang postgresql-server-dev-16 \
libicu-dev wget ca-certificates
# Install Rust toolchain
ENV CARGO_HOME=/usr/local/cargo
ENV PATH=$CARGO_HOME/bin:$PATH
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y && \
. "$CARGO_HOME/env" && \
cargo install cargo-pgrx --version 0.15.0 --locked && \
cargo pgrx init --pg16=$(which pg_config)
# Install pgvector (optional, for vector search)
RUN wget https://github.com/pgvector/pgvector/archive/refs/tags/v0.5.1.tar.gz && \
tar -xvzf v0.5.1.tar.gz && \
cd pgvector-0.5.1 && make && make install && \
cd .. && rm -rf pgvector-0.5.1 v0.5.1.tar.gz
# Build and install pg_search
RUN git clone https://github.com/paradedb/paradedb.git && \
cd paradedb/pg_search && \
cargo pgrx install --release && \
cd ../.. && rm -rf paradedb
Create a docker-compose.yml:
version: '3.8'
services:
  postgres:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: pgsearch-ec2
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: your_strong_password
    ports:
      - "5432:5432"
    volumes:
      - /mnt/pgdata:/var/lib/postgresql/data
    command:
      - "postgres"
      - "-c"
      - "wal_level=logical"
      - "-c"
      - "max_replication_slots=10"
      - "-c"
      - "max_wal_senders=10"
      - "-c"
      - "shared_preload_libraries=pg_search"
Start the container:
docker compose up -d
Step 3: Enable the Extensions
Connect to your EC2 PostgreSQL instance:
psql -h localhost -U postgres
Create your database and enable extensions:
CREATE DATABASE app_db;
\c app_db
CREATE EXTENSION pg_search;
CREATE EXTENSION vector; -- if using vector search
Step 4: Create Search Indexes
Now you can create powerful search indexes on your tables:
-- BM25 full-text search index
-- (the key_field column must be included in the indexed column list)
CREATE INDEX documents_search_idx
ON documents
USING bm25 (id, title, body)
WITH (key_field = 'id');
-- Vector similarity index (for semantic search)
CREATE INDEX documents_embedding_idx
ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);
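The lists = 100 value above isn't arbitrary. The pgvector README suggests roughly rows / 1000 lists for tables up to about a million rows, and sqrt(rows) beyond that; more lists means faster but coarser approximate search. A small helper encoding that heuristic — treat it as a starting point and tune against your own recall and latency measurements:

```python
import math

def recommended_ivfflat_lists(row_count: int) -> int:
    """Heuristic from the pgvector README: rows/1000 up to ~1M rows,
    sqrt(rows) above that. Always returns at least 1."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, int(math.sqrt(row_count)))

print(recommended_ivfflat_lists(100_000))    # 100, matching the index above
print(recommended_ivfflat_lists(4_000_000))  # 2000
```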
Syncing Data from RDS to EC2
Installing pg_search is only half the solution. We need continuous data replication from RDS to EC2.
Initial Data Load
First, take a snapshot of your RDS data:
# Dump from RDS
pg_dump -h your-rds-endpoint.amazonaws.com \
-U your_user -d app_db \
-Fc -f app_db_dump.backup
# Restore to EC2 (skip CREATE DATABASE if app_db already exists from Step 3)
psql -h localhost -U postgres -c "CREATE DATABASE app_db;"
pg_restore -h localhost -U postgres \
-d app_db app_db_dump.backup
Configure RDS for Logical Replication
In your RDS parameter group, set:
rds.logical_replication = 1
max_replication_slots = 50 (or more, depending on your needs)
max_wal_senders = 10
max_slot_wal_keep_size = 2048 (MB — safety net to avoid WAL filling storage)
Apply changes and reboot your RDS instance.
Create a replication user on RDS:
CREATE ROLE repl_user WITH LOGIN REPLICATION PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE app_db TO repl_user;
GRANT USAGE ON SCHEMA public TO repl_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO repl_user;
Create a publication on RDS:
-- Replicate all tables
CREATE PUBLICATION app_pub FOR ALL TABLES;
-- Or specific tables only
-- CREATE PUBLICATION app_pub FOR TABLE documents, users;
Set Up Subscription on EC2
On your EC2 PostgreSQL instance:
CREATE SUBSCRIPTION app_sub
CONNECTION 'host=your-rds-endpoint.amazonaws.com port=5432
            user=repl_user password=secure_password dbname=app_db'
PUBLICATION app_pub
WITH (
  slot_name = 'app_slot',
  create_slot = true,
  copy_data = false, -- data already loaded via pg_restore
  enabled = true
);
Verify Replication
Check replication status on EC2:
SELECT subname, pid, latest_end_time,
now() - latest_end_time AS delay
FROM pg_stat_subscription;
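If search freshness matters, it's worth wrapping that query in a periodic health check. A sketch of the application-side logic — the 30-second threshold is an arbitrary assumption; choose one that matches how stale your search results are allowed to be:

```python
from datetime import timedelta

# Query to run on the EC2 subscriber (same as above):
LAG_SQL = """
SELECT subname, now() - latest_end_time AS delay
FROM pg_stat_subscription;
"""

def lag_is_healthy(delay: timedelta, threshold_seconds: float = 30.0) -> bool:
    """Return True while replication lag stays under the threshold.

    `delay` is the interval returned by the query above; most Postgres
    drivers map it to a timedelta.
    """
    return delay.total_seconds() < threshold_seconds

# Example readings:
print(lag_is_healthy(timedelta(seconds=2)))   # healthy
print(lag_is_healthy(timedelta(minutes=5)))   # should trigger an alert
```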
Test with a simple insert on RDS:
INSERT INTO documents (id, title, body)
VALUES (gen_random_uuid(), 'Test', 'Replication check');
The row should appear on EC2 within seconds.
Running Search Queries
Now your application can run powerful search queries against the EC2 instance:
Full-text search with ranking:
SELECT id, title,
       paradedb.score(id) AS score
FROM documents
WHERE title @@@ 'postgres configuration'
   OR body @@@ 'postgres configuration'
ORDER BY score DESC
LIMIT 10;
Vector similarity search:
SELECT id, title,
1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
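For hybrid search, the BM25 score and the cosine similarity live on different scales, so they need to be normalised before blending. One common approach is a weighted sum after min-max normalisation — this is an assumption for illustration, not pg_search's built-in hybrid mechanism, and reciprocal rank fusion is a popular alternative:

```python
def normalize(scores):
    """Min-max normalise raw BM25 scores into [0, 1] so they are
    comparable with cosine similarity."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_score(bm25_norm: float, cosine_sim: float, alpha: float = 0.5) -> float:
    """Blend normalised text relevance with semantic similarity.
    alpha = 1.0 means pure keyword search, 0.0 pure vector search."""
    return alpha * bm25_norm + (1 - alpha) * cosine_sim

raw_bm25 = [7.2, 3.1, 0.4]    # from the BM25 query
cosines  = [0.62, 0.88, 0.15] # from the vector query
blended = [hybrid_score(b, c) for b, c in zip(normalize(raw_bm25), cosines)]
```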
Your application connects to RDS for writes and normal reads, and to EC2 for search operations.
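In application code, that split can be as simple as routing by operation type. A minimal sketch — the DSN strings and operation names are hypothetical placeholders for your own configuration:

```python
# Hypothetical endpoints -- substitute your real hosts and credentials.
RDS_DSN = "host=your-rds-endpoint.amazonaws.com dbname=app_db"
EC2_DSN = "host=your-ec2-search-host.internal dbname=app_db"

# Operations served by the pg_search replica.
SEARCH_OPS = {"fulltext_search", "vector_search", "hybrid_search"}

def dsn_for(operation: str) -> str:
    """Route search operations to the EC2 replica; everything else
    (writes and normal reads) goes to the RDS primary."""
    return EC2_DSN if operation in SEARCH_OPS else RDS_DSN

print(dsn_for("vector_search"))  # EC2 replica
print(dsn_for("create_order"))   # RDS primary
```

Keeping the routing in one helper also makes it easy to fall back to RDS (with plain LIKE queries) if the search replica is ever unavailable.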
Keeping New Tables in Sync
If you used FOR ALL TABLES in your publication, new tables are automatically included on the publisher side, but the subscriber must refresh the subscription before it "sees" them:
ALTER SUBSCRIPTION app_sub REFRESH PUBLICATION WITH (copy_data = true);
You can automate this with a cron job on EC2:
*/15 * * * * psql -h localhost -U postgres -d app_db \
-c "ALTER SUBSCRIPTION app_sub REFRESH PUBLICATION WITH (copy_data = true);"
This ensures new tables created on RDS are picked up and synced to EC2 regularly.
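If you'd rather refresh only when something has actually changed, you can compare the publisher's table list (pg_publication_tables on RDS) against what the subscriber is already syncing (pg_subscription_rel joined to pg_class on EC2). The comparison itself is trivial; a sketch with hypothetical inputs:

```python
def tables_missing_on_subscriber(publisher_tables, subscriber_synced):
    """Given table names from pg_publication_tables (RDS) and the
    subscriber's synced set (EC2), report what a REFRESH PUBLICATION
    would pick up."""
    return sorted(set(publisher_tables) - set(subscriber_synced))

missing = tables_missing_on_subscriber(
    ["documents", "users", "orders"],  # publisher side
    ["documents", "users"],            # subscriber side
)
# → ["orders"]
```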
Pros and Cons
Advantages
✅ Advanced search features – BM25 ranking, vector search, and hybrid search capabilities
✅ Postgres-native – Use SQL and standard Postgres tooling instead of learning new systems
✅ RDS stays primary – Keep RDS for transactions, backups, and managed infrastructure
✅ Private network – Everything stays inside your VPC with no external dependencies
✅ Flexible extensions – Install any Postgres extension on EC2, not limited by RDS restrictions
Challenges
❌ Additional infrastructure – You now manage a self-hosted Postgres instance
❌ Replication complexity – Setting up and maintaining logical replication requires care
❌ Replication lag – Search data trails RDS by a few seconds, so results are near real-time rather than instant (acceptable for most use cases)
❌ Dual endpoints – Application needs to connect to two different databases
❌ Higher costs – Extra EC2 instance and storage expenses
Conclusion
We chose this architecture because it gives us powerful search capabilities without the operational complexity of maintaining a separate search infrastructure. While managing an EC2 Postgres instance adds some overhead, it's significantly simpler than running Elasticsearch clusters and ETL pipelines.
For teams already comfortable with PostgreSQL, this approach feels natural. Everything stays in the Postgres ecosystem, uses familiar SQL, and remains inside your private network. The trade-off of managing one additional Postgres node is worthwhile for the search capabilities you gain.
If you're considering this architecture, start small: set up a test EC2 instance, replicate a subset of tables, and validate that the search performance meets your needs before committing to production deployment.
Have questions about this setup? Feel free to reach out or leave a comment below.