You store embeddings as vectors. You run a similarity search inside PostgreSQL. Two extensions matter in this space: pgvector and NeuronDB. This post compares both extensions using real behavior from the source trees in this repo. Every limit and object name matches code and SQL.
Introduction
Vector databases store embeddings. Similarity search ranks rows by distance. PostgreSQL extensions bring vector types, distance operators, and index access methods into the SQL layer.
You choose pgvector when you want a focused extension with broad adoption. You choose NeuronDB when you want pgvector-style SQL plus additional types, GPU paths, and operational surface area inside the extension.
This post uses pgvector v0.8.1 semantics and NeuronDB v3.0.0-devel semantics from the local source. Feature parity varies by object. Some pieces match one-to-one. Some pieces use different names with aliases.
Project references
NeuronDB site: https://www.neurondb.ai
Source code: https://github.com/neurondb/neurondb
Architecture
Architectural choices define performance limits and feature capabilities.
pgvector Architecture
pgvector implements types, operators, and index access methods in C and SQL. The extension exposes a small surface area and relies on PostgreSQL storage, WAL, and query planning.
Type system and layouts
pgvector defines these public types:
- vector: dense float32 vector, up to 16000 dimensions.
- halfvec: half-precision vector, up to 16000 dimensions.
- sparsevec: sparse vector with int32 indices and float32 values; limits depend on the operation.
- bit: the PostgreSQL bit type, used as a binary vector for Hamming and Jaccard distance.
The vector type uses a varlena header plus two int16 fields: dim and unused. The payload stores dim float32 values. This layout matches typedef struct Vector in pgvector and yields storage of 4 * dimensions + 8 bytes per vector.
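A quick way to confirm this layout from SQL, assuming the extension is installed: pg_column_size should report 4 * 3 + 8 = 20 bytes for a three-dimensional vector.
SELECT pg_column_size('[1,2,3]'::vector) AS vector_bytes\g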
The sparsevec on-disk format stores dim (int32), nnz (int32), and unused (int32) in the header, followed by nnz int32 indices and then nnz float32 values as a contiguous array.
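The same check works for sparsevec, using pgvector's {index:value,...}/dimensions literal form; under the layout above, two nonzero entries should cost 8 * 2 + 16 = 32 bytes.
SELECT pg_column_size('{1:1.5,3:2.25}/5'::sparsevec) AS sparsevec_bytes\g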
CPU dispatch
pgvector uses a mix of scalar code and CPU dispatch. On Linux x86_64, some functions compile with target_clones to generate multiple code paths. The code selects a path based on CPU capabilities. This approach appears in vector.c and bitutils.c.
NeuronDB Architecture
NeuronDB implements vector types, operators, access methods, and additional systems within a single extension. The extension defines types beyond pgvector, adds IVF under the access method name ivf, and includes GPU backends.
Type system and layouts
NeuronDB exposes pgvector-style types, plus additional NeuronDB-specific types:
- vector: dense float32 vector with dim (int16) and unused (int16) header fields, up to 16000 dimensions.
- vectorp: packed vector with metadata. The layout includes a CRC32 fingerprint, a version, a dimension, and an endian guard, followed by float32 data.
- vecmap: sparse high-dimensional map. The layout stores total_dim and nnz, followed by parallel int32 indices and float32 values.
- halfvec: half-precision vector with a 4000-dimension limit in NeuronDB.
- sparsevec: sparse vector type limited to 1000 nonzero entries and 1M dimensions in NeuronDB.
- binaryvec: binary vector type with a Hamming distance operator.
NeuronDB also defines internal structs for quantized vectors, such as int8, int4, and binary-packed representations. The SQL surface exposes conversion and distance functions.
Index access methods
NeuronDB defines two ANN index access methods as PostgreSQL access methods:
- hnsw: HNSW index access method.
- ivf: IVF index access method.
NeuronDB also defines operator classes for vector, halfvec, sparsevec, bit, and binaryvec in both hnsw and ivf.
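To confirm both access methods are registered after CREATE EXTENSION neurondb, query the catalog; this uses only core PostgreSQL views:
SELECT amname
FROM pg_am
WHERE amtype = 'i'
AND amname IN ('hnsw', 'ivf')\g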
CPU SIMD and GPU backends
NeuronDB includes explicit AVX2 and AVX-512 implementations of common distance functions in vector_distance_simd.c. The build selects the compiled path based on compiler flags.
NeuronDB includes three GPU backend families in the tree:
- CUDA
- ROCm
- Metal
The runtime backend selection logic maps backend type to names cuda, rocm, and metal. GPU entry points for HNSW and IVF kNN search are provided via SQL functions.
Feature Comparison
Both extensions integrate with PostgreSQL. NeuronDB adds operational features.
Table 1: Types, distances, indexes, and hard limits
This table focuses on public SQL objects and hard limits enforced by each project.
| Area | pgvector | NeuronDB |
|---|---|---|
| Extension name | vector | neurondb |
| Dense type | vector (float32), max 16000 dims | vector (float32), max 16000 dims |
| Half type | halfvec (FP16), max 16000 dims | halfvec (FP16), max 4000 dims |
| Sparse type | sparsevec (dim int32, nnz int32, indices int32, values float32), max 1e9 dims, max 16000 nnz | sparsevec, max 1M dims, max 1000 nonzero entries |
| Binary vector | PostgreSQL bit plus pgvector operators | PostgreSQL bit operator classes plus binaryvec type |
| Distance operators | <-> L2, <#> negative inner product, <=> cosine, <+> L1, <~> Hamming, <%> Jaccard | <-> L2, <#> negative inner product, <=> cosine, <+> L1, <~> Hamming; Jaccard via vector_jaccard_distance(vector, vector) and <%> for bit |
| ANN access methods | hnsw, ivfflat | hnsw, ivf |
| Dense index max dims | 2000 for HNSW and IVFFlat | limited by page layout; large dims fail with a page-size error during build |
Table 2: Tuning knobs, defaults, and where each knob lives
This table lists knobs. Each knob changes recall, latency, or build time. The table also lists the location of each knob.
| Knob | pgvector | NeuronDB |
|---|---|---|
| HNSW m | index option WITH (m = N), default 16 | index option WITH (m = N), default 16 |
| HNSW ef_construction | index option WITH (ef_construction = N), default 64 | index option WITH (ef_construction = N), default 200 |
| HNSW ef_search | GUC hnsw.ef_search, default 40 | GUC neurondb.hnsw_ef_search, default 64 |
| HNSW iterative scans | GUC hnsw.iterative_scan | GUC neurondb.hnsw_iterative_scan |
| HNSW scan stop | GUC hnsw.max_scan_tuples and hnsw.scan_mem_multiplier | GUC neurondb.hnsw_max_scan_tuples and neurondb.hnsw_scan_mem_multiplier |
| IVF lists | index option WITH (lists = N) on ivfflat | index option WITH (lists = N) on ivf; NeuronDB maps ivfflat to ivf in helper functions |
| IVF probes | GUC ivfflat.probes, default 1 | GUC neurondb.ivf_probes, default 10 |
| IVF iterative scans | GUC ivfflat.iterative_scan and ivfflat.max_probes | GUC neurondb.ivf_iterative_scan and neurondb.ivf_max_probes |
Table 3: Acceleration and storage formats
This table covers CPU SIMD, GPU backends, and compressed formats.
| Area | pgvector | NeuronDB |
|---|---|---|
| CPU vector dispatch | target_clones dispatch on supported builds | explicit AVX2 and AVX-512 distance functions, selected by build flags |
| GPU backends | none | CUDA, ROCm, Metal |
| GPU kNN helpers | none | hnsw_knn_search_gpu(query vector, k int, ef_search int) and ivf_knn_search_gpu(query vector, k int, nprobe int) |
| Packed dense format | none | vectorp with CRC32 fingerprint, version, endian guard, and float32 data |
| Sparse high-dim format | sparsevec | vecmap and the NeuronDB sparsevec type |
| Quantized internal types | binary quantization via binary_quantize to bit | int8, int4, binary, and FP16 quantization in the type and function layer |
Production Readiness
Production systems need repeatable behavior, clear configuration, and a monitoring path. NeuronDB ships extra primitives for tenant controls, queue-based workflows, and metrics export.
NeuronDB includes these operational surfaces in SQL:
- tenant usage tables and quota tracking
- background worker tables and manual triggers
- Prometheus compatible metrics via SQL, plus an HTTP exporter endpoint
Performance
Performance depends on dataset shape, index parameters, storage layout, and query patterns. Use the benchmark scripts in this repo to measure your hardware and build.
Benchmarks
The repository includes benchmark scripts and SQL stress tests. Use these tools to compare pgvector and NeuronDB on your own system.
Vector benchmark suite
The vector benchmark suite downloads public ANN datasets, loads them into PostgreSQL, builds indexes, runs queries, and writes JSON results.
Run the full pipeline:
python3 NeuronDB/benchmark/vector/run_bm.py --prepare --load --run --datasets sift-128-euclidean --configs hnsw --k-values 10
Run a quick pipeline with defaults:
python3 NeuronDB/benchmark/vector/run_bm.py --prepare --load --run
Stress tests
The repo includes SQL stress suites for pgvector and NeuronDB.
pgvector stress suite:
\i NeuronDB/benchmark/vector/pgvector_stress.sql
NeuronDB stress suite:
\i NeuronDB/benchmark/vector/neurondb_vector_stress.sql
Example result from committed artifact
This example comes from NeuronDB/benchmark/vector/results/benchmark_sift-128-euclidean_hnsw_20260104_211033.json.
Dataset: sift-128-euclidean
Train vectors: 1000000
Test queries: 10000
Dimension: 128
Index: hnsw, m 16, ef_construction 200
Query: k 10, ef_search 100
Average latency ms: 512.9799604415894
P95 latency ms: 521.4026927947998
QPS: 1.9493938888746618
Recall: 1.0
Practical usage
This section focuses on repeatable workflows. Each workflow uses real object names from both projects.
Basic table and query pattern
Use a fixed-dimension column when a single embedding model populates it. Use a column without a typmod when multiple embedding models share one column (a sketch of that variant follows the fixed-dimension example).
Example with a fixed dimension:
CREATE TABLE items (
id bigint PRIMARY KEY,
embedding vector(3)
)\g
INSERT INTO items (id, embedding) VALUES
(1, '[1,2,3]'::vector),
(2, '[4,5,6]'::vector)\g
SELECT
id,
embedding <-> '[3,1,2]'::vector AS l2_distance
FROM items
ORDER BY l2_distance
LIMIT 5\g
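A sketch of the typmod-less variant; the model_name column is an illustrative addition, not an object from either project. Note that pgvector only builds ANN indexes on columns with a declared dimension, so this pattern suits exact scans or per-model partial indexes.
CREATE TABLE multi_model_items (
id bigint PRIMARY KEY,
model_name text NOT NULL,
embedding vector
)\g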
Indexing with HNSW
HNSW uses a graph. You trade recall for speed by changing the candidate list size during search.
pgvector uses these knobs:
- index reloptions: m, ef_construction
- query time GUC: hnsw.ef_search
NeuronDB uses these knobs:
- index reloptions: m, ef_construction
- query time GUC: neurondb.hnsw_ef_search
HNSW index creation:
CREATE INDEX items_embedding_hnsw_l2
ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)\g
Query time tuning with pgvector:
SET hnsw.ef_search = 100\g
Query time tuning with NeuronDB:
SET neurondb.hnsw_ef_search = 100\g
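Either way, verify that the planner actually uses the index; the plan should show an index scan on items_embedding_hnsw_l2 rather than a sequential scan with a sort:
EXPLAIN (COSTS OFF)
SELECT id
FROM items
ORDER BY embedding <-> '[3,1,2]'::vector
LIMIT 5\g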
Indexing with IVF
IVF uses lists. Search probes determine how many lists participate in a query.
pgvector uses the access method name ivfflat and uses:
- index reloption: lists
- query time GUC: ivfflat.probes
NeuronDB uses the access method name ivf and uses:
- index reloption: lists
- query time GUC: neurondb.ivf_probes
IVF index creation in pgvector:
CREATE INDEX items_embedding_ivfflat_l2
ON items
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100)\g
IVF index creation in NeuronDB:
CREATE INDEX items_embedding_ivf_l2
ON items
USING ivf (embedding vector_l2_ops)
WITH (lists = 100)\g
Query time tuning with pgvector:
SET ivfflat.probes = 10\g
Query time tuning with NeuronDB:
SET neurondb.ivf_probes = 10\g
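When picking lists at build time, a common heuristic from the pgvector docs is rows / 1000 up to about one million rows and sqrt(rows) beyond that; treat it as a starting point for NeuronDB as well. A sketch that reads the planner's row estimate, assuming the table has been analyzed:
SELECT CASE
WHEN reltuples <= 1000000 THEN GREATEST(1, round(reltuples / 1000)::int)
ELSE ceil(sqrt(reltuples))::int
END AS suggested_lists
FROM pg_class
WHERE relname = 'items'\g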
Filtered search
Filtered kNN queries need two things: a filter predicate and an ordered distance sort with a limit.
SELECT
id,
embedding <-> '[3,1,2]'::vector AS l2_distance
FROM items
WHERE id <> 1
ORDER BY l2_distance
LIMIT 5\g
For approximate indexes, filtering happens after index traversal. Both extensions provide iterative index scans that extend the scan when filtering removes rows from the first pass.
Iterative scans for pgvector HNSW:
SET hnsw.iterative_scan = strict_order\g
SET hnsw.max_scan_tuples = 20000\g
SET hnsw.scan_mem_multiplier = 2\g
Iterative scans for NeuronDB HNSW:
SET neurondb.hnsw_iterative_scan = strict_order\g
SET neurondb.hnsw_max_scan_tuples = 20000\g
SET neurondb.hnsw_scan_mem_multiplier = 2\g
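When the filter is stable and selective, a standard PostgreSQL partial index sidesteps post-filtering entirely. A sketch reusing the predicate from the query above; partial indexes are core PostgreSQL, so this works with either extension:
CREATE INDEX items_embedding_hnsw_l2_filtered
ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)
WHERE id <> 1\g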
NeuronDB packed and sparse formats
Use vectorp when you want a packed dense format with metadata. Use vecmap for sparse high-dimensional inputs.
CREATE TABLE packed_items (
id bigint PRIMARY KEY,
embedding vectorp
)\g
INSERT INTO packed_items (id, embedding) VALUES
(1, '[1,2,3]'::vectorp)\g
SELECT
id,
embedding
FROM packed_items
ORDER BY id
LIMIT 1\g
CREATE TABLE sparse_items (
id bigint PRIMARY KEY,
embedding vecmap
)\g
Quantization workflows
Quantization trades precision for smaller storage and faster scans. NeuronDB exposes multiple quantization functions. Each function returns a bytea representation.
CPU quantization examples:
SELECT
vector_to_int8('[1,2,3]'::vector) AS q_int8,
vector_to_fp16('[1,2,3]'::vector) AS q_fp16,
vector_to_binary('[1,2,3]'::vector) AS q_binary,
vector_to_int4('[1,2,3]'::vector) AS q_int4\g
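To see the storage effect directly, compare datum sizes before and after quantization. The exact bytea sizes depend on NeuronDB's internal headers, so read the numbers as indicative rather than exact:
SELECT
pg_column_size('[1,2,3]'::vector) AS float32_bytes,
pg_column_size(vector_to_int8('[1,2,3]'::vector)) AS int8_bytes\g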
Accuracy analysis examples:
SELECT
quantize_analyze_int8('[1,2,3]'::vector) AS int8_stats,
quantize_analyze_fp16('[1,2,3]'::vector) AS fp16_stats,
quantize_analyze_binary('[1,2,3]'::vector) AS binary_stats,
quantize_analyze_int4('[1,2,3]'::vector) AS int4_stats\g
GPU workflows in NeuronDB
NeuronDB exposes GPU status, GPU distance functions, and GPU kNN helpers in SQL.
GPU initialization and status:
SELECT neurondb_gpu_enable() AS gpu_enabled\g
SELECT
device_id,
device_name,
total_memory_mb,
free_memory_mb,
is_available
FROM neurondb_gpu_info()\g
GPU distance functions:
SELECT
vector_l2_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS l2_gpu,
vector_cosine_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS cosine_gpu,
vector_inner_product_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS ip_gpu\g
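A quick parity check between the CPU operator and the GPU function, assuming GPU initialization succeeded; small floating-point differences between paths are possible:
SELECT
'[1,2,3]'::vector <-> '[4,5,6]'::vector AS l2_cpu,
vector_l2_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS l2_gpu\g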
GPU kNN helpers:
SELECT
id,
distance
FROM hnsw_knn_search_gpu('[1,2,3]'::vector, 10, 100)\g
SELECT
id,
distance
FROM ivf_knn_search_gpu('[1,2,3]'::vector, 10, 10)\g
GPU usage stats:
SELECT
queries_executed,
fallback_count,
total_gpu_time_ms,
total_cpu_time_ms,
avg_latency_ms
FROM neurondb_gpu_stats()\g
Multi-tenant controls in NeuronDB
NeuronDB includes tenant quota tracking and tenant-specific helper functions. These objects live in the neurondb schema.
Tenant quota tables and views support workflows such as:
- enforce per-tenant vector count limits
- track per-tenant storage usage
- query tenant usage and quota percent (sketched below)
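A sketch of a usage query. The view name and columns below are hypothetical placeholders, since this post does not pin down the exact tenant objects; check the neurondb schema for the real names:
SELECT
tenant_id,
vector_count, -- hypothetical column
storage_bytes, -- hypothetical column
quota_percent -- hypothetical column
FROM neurondb.tenant_usage -- hypothetical view name
ORDER BY quota_percent DESC\g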
NeuronDB includes tenant-aware HNSW helper functions:
- hnsw_tenant_create
- hnsw_tenant_search
- hnsw_tenant_quota
Monitoring in NeuronDB
NeuronDB exposes Prometheus-compatible metrics via SQL:
SELECT
queries_total,
queries_success,
queries_error,
query_duration_sum,
vectors_total,
cache_hits,
cache_misses,
workers_active
FROM neurondb_prometheus_metrics()\g
Background workers and queues in NeuronDB
NeuronDB stores the queue and metrics state in SQL tables under the neurondb schema. Background workers process or sample those tables when enabled in PostgreSQL.
Queue workflow:
INSERT INTO neurondb.job_queue (tenant_id, job_type, payload)
VALUES (0, 'embedding', '{"text":"hello"}'::jsonb)\g
SELECT
job_id,
tenant_id,
job_type,
status,
retry_count,
created_at
FROM neurondb.job_queue
ORDER BY created_at DESC
LIMIT 10\g
Manual trigger helpers exist for testing:
SELECT neuranq_run_once() AS queued_work\g
SELECT neuranmon_sample() AS tuner_sample\g
SELECT neurandefrag_run() AS defrag_ran\g
LLM configuration and jobs in NeuronDB
NeuronDB stores LLM provider configuration in neurondb.llm_config and stores jobs in neurondb.llm_jobs.
Configuration workflow:
SELECT neurondb.set_llm_config(
'https://api-inference.huggingface.co',
'REPLACE_WITH_KEY',
'REPLACE_WITH_MODEL'
)\g
SELECT
api_base,
default_model,
updated_at
FROM neurondb.get_llm_config()\g
Job enqueue workflow:
SELECT ndb_llm_enqueue(
'embed',
'REPLACE_WITH_MODEL',
'hello world',
'tenant0'
) AS job_id\g
Index tuning helpers in NeuronDB
NeuronDB exposes index tuning and diagnostics helpers in SQL. These helpers return JSONB.
Examples:
SELECT index_tune_hnsw('items', 'embedding') AS hnsw_recommendation\g
SELECT index_tune_ivf('items', 'embedding') AS ivf_recommendation\g
SELECT index_recommend_type('items', 'embedding') AS index_choice\g
SELECT index_tune_query_params('items_embedding_hnsw_l2') AS query_knobs\g
Migration
Migration replaces one extension with the other. Existing tables remain. Indexes that depend on pgvector objects are dropped by DROP EXTENSION vector CASCADE.
- Drop pgvector
DROP EXTENSION vector CASCADE\g
- Install NeuronDB
CREATE EXTENSION neurondb\g
- Verify data
SELECT count(1) AS row_count FROM items\g
Your table rows remain. You still need to rebuild ANN indexes after dropping pgvector.
- Recreate indexes
For large tables (>100GB), increase maintenance_work_mem before building.
SET maintenance_work_mem = '4GB'\g
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)\g
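Then confirm the rebuilt index is picked up; the plan should show an index scan instead of a sequential scan with a sort:
EXPLAIN (COSTS OFF)
SELECT id
FROM items
ORDER BY embedding <-> '[1,2,3]'::vector
LIMIT 10\g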
- GPU: use the NeuronDB GPU functions and settings from the extension. The SQL surface exposes GPU kNN functions for HNSW and IVF.
Use Case Recommendations
Select the tool for your infrastructure and requirements. Start with your workload, then map your constraints to a short decision.
Use pgvector when your goal is simple vector search
Pick pgvector when you want fewer moving parts and fewer extension-specific features.
Use pgvector when you meet most of these conditions:
- You run CPU-only workloads.
- You want ivfflat and hnsw naming across docs, examples, and client libraries.
- You want distance operators and two ANN access methods, with minimal extra SQL objects.
- You want a smaller operational surface area inside the extension.
Use pgvector when your query pattern looks like this most of the time:
SELECT
id,
embedding <-> $1::vector AS distance
FROM items
ORDER BY distance
LIMIT 10
Use pgvector when you tune with pgvector GUCs:
SET hnsw.ef_search = 100\g
SET ivfflat.probes = 10\g
Use NeuronDB when your goal is a larger in-database surface area
Pick NeuronDB when you want the same distance operators and index patterns plus additional SQL objects for GPU workflows, quantization workflows, tuning helpers, and operational queues.
Use NeuronDB when you meet most of these conditions:
- You want ivf as an access method name, plus hnsw.
- You want vectorp and vecmap as additional storage formats.
- You want SQL functions for quantization, with both CPU and GPU entry points.
- You want SQL functions for GPU status, GPU distance, and GPU kNN helpers.
- You want SQL tables and views for tenant quotas, job queues, metrics, and Prometheus export.
Use NeuronDB when you want NeuronDB-specific query time tuning:
SET neurondb.hnsw_ef_search = 100\g
SET neurondb.ivf_probes = 10\g
Use NeuronDB when you want index tuning helpers and diagnostics in SQL:
SELECT index_tune_hnsw('items', 'embedding') AS hnsw_recommendation\g
SELECT index_tune_ivf('items', 'embedding') AS ivf_recommendation\g
SELECT index_recommend_type('items', 'embedding') AS index_choice\g
Short decision flow
Start here when you need a quick answer.
- If you want ivfflat, pick pgvector.
- If you want ivf, pick NeuronDB.
- If you want GPU SQL entry points, pick NeuronDB.
- If you want fewer extension-owned tables and views, pick pgvector.
- If you want quantization helpers and analysis functions in SQL, pick NeuronDB.
Practical scenarios
Use this section as a checklist.
Scenario 1: Single app, single embedding model, CPU only
- Use pgvector
- Create one HNSW index on vector_l2_ops or vector_cosine_ops
- Tune hnsw.ef_search per endpoint or per query
Scenario 2: Multi-tenant SaaS with per-tenant limits
- Use NeuronDB
- Use tenant quota tables and views under the neurondb schema
- Use tenant-aware HNSW helper functions when you want tenant-scoped index management
Scenario 3: Storage pressure from large embeddings
- Use NeuronDB
- Use quantization functions to produce compact bytea outputs
- Compare distance preservation with quantize_compare_distances
Scenario 4: GPU present, batch heavy workloads
- Use NeuronDB
- Enable the GPU runtime and query neurondb_gpu_info
- Use GPU kNN helpers where your workflow matches those function signatures
Conclusion
pgvector focuses on vector search primitives. NeuronDB adds additional types, an ivf access method, GPU entry points, quantization helpers, worker tables, and metrics export.
Pick the extension based on your operational goal. Keep your schema and query patterns simple. Measure with the benchmark scripts in this repo, then tune one knob at a time.