You store embeddings as vectors. You run a similarity search inside PostgreSQL. Two extensions matter in this space: pgvector and NeuronDB. This post compares both extensions using real behavior from the source trees in this repo. Every limit and object name matches code and SQL.
Introduction
Vector databases store embeddings. Similarity search ranks rows by distance. PostgreSQL extensions bring vector types, distance operators, and index access methods into the SQL layer.
You choose pgvector when you want a focused extension with broad adoption. You choose NeuronDB when you want pgvector-style SQL plus additional types, GPU paths, and operational surface area inside the extension.
This post uses pgvector v0.8.1 semantics and NeuronDB v3.0.0-devel semantics from the local source. Feature parity varies by object. Some pieces match one-to-one. Some pieces use different names with aliases.
Project references
NeuronDB site: https://www.neurondb.ai
Source code: https://github.com/neurondb/neurondb
Architecture
Architectural choices define performance limits and feature capabilities.
pgvector Architecture
pgvector implements types, operators, and index access methods in C and SQL. The extension exposes a small surface area and relies on PostgreSQL storage, WAL, and query planning.
Type system and layouts
pgvector defines these public types:
- vector: dense float32 vector, up to 16000 dimensions.
- halfvec: half-precision vector, up to 16000 dimensions.
- sparsevec: sparse vector with int32 indices and float32 values; limits depend on the operation.
- bit: the PostgreSQL bit type, used as a binary vector for Hamming and Jaccard distance.
The vector type uses a varlena header plus two int16 fields: dim and unused. The payload stores dim float32 values. This layout matches typedef struct Vector in pgvector and yields storage of 4 * dimensions + 8 bytes per vector.
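A quick way to confirm this layout from SQL, assuming the extension is installed: pg_column_size should report 4 * 3 + 8 = 20 bytes for a three-dimensional vector.
SELECT pg_column_size('[1,2,3]'::vector) AS vector_bytes\g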
The sparsevec on-disk format stores dim (int32), nnz (int32), and unused (int32) in the header, followed by nnz int32 indices and then nnz float32 values as a contiguous array.
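The same check works for sparsevec, using pgvector's {index:value,...}/dimensions literal form; under the layout above, two nonzero entries should cost 8 * 2 + 16 = 32 bytes.
SELECT pg_column_size('{1:1.5,3:2.25}/5'::sparsevec) AS sparsevec_bytes\g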
CPU dispatch
pgvector uses a mix of scalar code and CPU dispatch. On Linux x86_64, some functions compile with target_clones to generate multiple code paths. The code selects a path based on CPU capabilities. This approach appears in vector.c and bitutils.c.
NeuronDB Architecture
NeuronDB implements vector types, operators, access methods, and additional systems within a single extension. The extension defines types beyond pgvector, adds IVF under the access method name ivf, and includes GPU backends.
Type system and layouts
NeuronDB exposes pgvector-style types, plus additional NeuronDB-specific types:
- vector: dense float32 vector with dim (int16) and unused (int16) header fields, up to 16000 dimensions.
- vectorp: packed vector with metadata. The layout includes a CRC32 fingerprint, a version, a dimension, and an endian guard, followed by float32 data.
- vecmap: sparse high-dimensional map. The layout stores total_dim and nnz, followed by parallel int32 indices and float32 values.
- halfvec: half-precision vector with a 4000-dimension limit in NeuronDB.
- sparsevec: sparse vector type limited to 1000 nonzero entries and 1M dimensions in NeuronDB.
- binaryvec: binary vector type with a Hamming distance operator.
NeuronDB also defines internal structs for quantized vectors, such as int8, int4, and binary-packed representations. The SQL surface exposes conversion and distance functions.
Index access methods
NeuronDB defines two ANN index access methods as PostgreSQL access methods:
- hnsw: HNSW index access method.
- ivf: IVF index access method.
NeuronDB also defines operator classes for vector, halfvec, sparsevec, bit, and binaryvec in both hnsw and ivf.
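To confirm both access methods are registered after CREATE EXTENSION neurondb, query the catalog; this uses only core PostgreSQL views:
SELECT amname
FROM pg_am
WHERE amtype = 'i'
AND amname IN ('hnsw', 'ivf')\g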
CPU SIMD and GPU backends
NeuronDB includes explicit AVX2 and AVX-512 implementations of common distance functions in vector_distance_simd.c. The build selects the compiled path based on compiler flags.
NeuronDB includes three GPU backend families in the tree:
- CUDA
- ROCm
- Metal
The runtime backend selection logic maps backend type to names cuda, rocm, and metal. GPU entry points for HNSW and IVF kNN search are provided via SQL functions.
Feature Comparison
Both extensions integrate with PostgreSQL. NeuronDB adds operational features.
Table 1: Types, distances, indexes, and hard limits
This table focuses on public SQL objects and hard limits enforced by each project.
| Area | pgvector | NeuronDB |
|---|---|---|
| Extension name | vector | neurondb |
| Dense type | vector (float32), max 16000 dims | vector (float32), max 16000 dims |
| Half type | halfvec (FP16), max 16000 dims | halfvec (FP16), max 4000 dims |
| Sparse type | sparsevec (dim int32, nnz int32, indices int32, values float32), max 1e9 dims, max 16000 nnz | sparsevec, max 1M dims, max 1000 nonzero entries |
| Binary vector | PostgreSQL bit plus pgvector operators | PostgreSQL bit operator classes plus binaryvec type |
| Distance operators | <-> L2, <#> negative inner product, <=> cosine, <+> L1, <~> Hamming, <%> Jaccard | <-> L2, <#> negative inner product, <=> cosine, <+> L1, <~> Hamming; Jaccard via vector_jaccard_distance(vector, vector) and <%> for bit |
| ANN access methods | hnsw, ivfflat | hnsw, ivf |
| Dense index max dims | 2000 for HNSW and IVFFlat | limited by page layout; large dims fail with a page-size error during build |
Table 2: Tuning knobs, defaults, and where each knob lives
This table lists knobs. Each knob changes recall, latency, or build time. The table also lists the location of each knob.
| Knob | pgvector | NeuronDB |
|---|---|---|
| HNSW m | index option WITH (m = N), default 16 | index option WITH (m = N), default 16 |
| HNSW ef_construction | index option WITH (ef_construction = N), default 64 | index option WITH (ef_construction = N), default 200 |
| HNSW ef_search | GUC hnsw.ef_search, default 40 | GUC neurondb.hnsw_ef_search, default 64 |
| HNSW iterative scans | GUC hnsw.iterative_scan | GUC neurondb.hnsw_iterative_scan |
| HNSW scan stop | GUC hnsw.max_scan_tuples and hnsw.scan_mem_multiplier | GUC neurondb.hnsw_max_scan_tuples and neurondb.hnsw_scan_mem_multiplier |
| IVF lists | index option WITH (lists = N) on ivfflat | index option WITH (lists = N) on ivf; NeuronDB maps ivfflat to ivf in helper functions |
| IVF probes | GUC ivfflat.probes, default 1 | GUC neurondb.ivf_probes, default 10 |
| IVF iterative scans | GUC ivfflat.iterative_scan and ivfflat.max_probes | GUC neurondb.ivf_iterative_scan and neurondb.ivf_max_probes |
Table 3: Acceleration and storage formats
This table covers CPU SIMD, GPU backends, and compressed formats.
| Area | pgvector | NeuronDB |
|---|---|---|
| CPU vector dispatch | target_clones dispatch on supported builds | explicit AVX2 and AVX-512 distance functions, selected by build flags |
| GPU backends | none | CUDA, ROCm, Metal |
| GPU kNN helpers | none | hnsw_knn_search_gpu(query vector, k int, ef_search int) and ivf_knn_search_gpu(query vector, k int, nprobe int) |
| Packed dense format | none | vectorp with CRC32 fingerprint, version, endian guard, and float32 data |
| Sparse high-dim format | sparsevec | vecmap and the NeuronDB sparsevec type |
| Quantized internal types | binary quantization via binary_quantize to bit | int8, int4, binary, and FP16 quantization in the type and function layer |
Production Readiness
Production systems need repeatable behavior, clear configuration, and a monitoring path. NeuronDB ships extra primitives for tenant controls, queue-based workflows, and metrics export.
NeuronDB includes these operational surfaces in SQL:
- tenant usage tables and quota tracking
- background worker tables and manual triggers
- Prometheus compatible metrics via SQL, plus an HTTP exporter endpoint
Performance
Performance depends on dataset shape, index parameters, storage layout, and query patterns. Use the benchmark scripts in this repo to measure your hardware and build.
Benchmarks
The repository includes benchmark scripts and SQL stress tests. Use these tools to compare pgvector and NeuronDB on your own system.
Vector benchmark suite
The vector benchmark suite downloads public ANN datasets, loads them into PostgreSQL, builds indexes, runs queries, and writes JSON results.
Run the full pipeline:
python3 NeuronDB/benchmark/vector/run_bm.py --prepare --load --run --datasets sift-128-euclidean --configs hnsw --k-values 10
Run a quick pipeline with defaults:
python3 NeuronDB/benchmark/vector/run_bm.py --prepare --load --run
Stress tests
The repo includes SQL stress suites for pgvector and NeuronDB.
pgvector stress suite:
\i NeuronDB/benchmark/vector/pgvector_stress.sql
NeuronDB stress suite:
\i NeuronDB/benchmark/vector/neurondb_vector_stress.sql
Example result from committed artifact
This example comes from NeuronDB/benchmark/vector/results/benchmark_sift-128-euclidean_hnsw_20260104_211033.json.
Dataset: sift-128-euclidean
Train vectors: 1000000
Test queries: 10000
Dimension: 128
Index: hnsw, m 16, ef_construction 200
Query: k 10, ef_search 100
Average latency ms: 512.9799604415894
P95 latency ms: 521.4026927947998
QPS: 1.9493938888746618
Recall: 1.0
Practical usage
This section focuses on repeatable workflows. Each workflow uses real object names from both projects.
Basic table and query pattern
Use a fixed-dimension column when a single embedding model populates it. Use a column without a typmod when multiple embedding models share one column (a sketch of that variant follows the fixed-dimension example).
Example with a fixed dimension:
CREATE TABLE items (
id bigint PRIMARY KEY,
embedding vector(3)
)\g
INSERT INTO items (id, embedding) VALUES
(1, '[1,2,3]'::vector),
(2, '[4,5,6]'::vector)\g
SELECT
id,
embedding <-> '[3,1,2]'::vector AS l2_distance
FROM items
ORDER BY l2_distance
LIMIT 5\g
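A sketch of the typmod-less variant; the model_name column is an illustrative addition, not an object from either project. Note that pgvector only builds ANN indexes on columns with a declared dimension, so this pattern suits exact scans or per-model partial indexes.
CREATE TABLE multi_model_items (
id bigint PRIMARY KEY,
model_name text NOT NULL,
embedding vector
)\g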
Indexing with HNSW
HNSW uses a graph. You trade recall for speed by changing the candidate list size during search.
pgvector uses these knobs:
- index reloptions: m, ef_construction
- query time GUC: hnsw.ef_search
NeuronDB uses these knobs:
- index reloptions: m, ef_construction
- query time GUC: neurondb.hnsw_ef_search
HNSW index creation:
CREATE INDEX items_embedding_hnsw_l2
ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)\g
Query time tuning with pgvector:
SET hnsw.ef_search = 100\g
Query time tuning with NeuronDB:
SET neurondb.hnsw_ef_search = 100\g
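Either way, verify that the planner actually uses the index; the plan should show an index scan on items_embedding_hnsw_l2 rather than a sequential scan with a sort:
EXPLAIN (COSTS OFF)
SELECT id
FROM items
ORDER BY embedding <-> '[3,1,2]'::vector
LIMIT 5\g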
Indexing with IVF
IVF uses lists. Search probes determine how many lists participate in a query.
pgvector uses the access method name ivfflat and uses:
- index reloption: lists
- query time GUC: ivfflat.probes
NeuronDB uses the access method name ivf and uses:
- index reloption: lists
- query time GUC: neurondb.ivf_probes
IVF index creation in pgvector:
CREATE INDEX items_embedding_ivfflat_l2
ON items
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100)\g
IVF index creation in NeuronDB:
CREATE INDEX items_embedding_ivf_l2
ON items
USING ivf (embedding vector_l2_ops)
WITH (lists = 100)\g
Query time tuning with pgvector:
SET ivfflat.probes = 10\g
Query time tuning with NeuronDB:
SET neurondb.ivf_probes = 10\g
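When picking lists at build time, a common heuristic from the pgvector docs is rows / 1000 up to about one million rows and sqrt(rows) beyond that; treat it as a starting point for NeuronDB as well. A sketch that reads the planner's row estimate, assuming the table has been analyzed:
SELECT CASE
WHEN reltuples <= 1000000 THEN GREATEST(1, round(reltuples / 1000)::int)
ELSE ceil(sqrt(reltuples))::int
END AS suggested_lists
FROM pg_class
WHERE relname = 'items'\g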
Filtered search
Filtered kNN queries need two things: a filter predicate and an ordered distance sort with a limit.
SELECT
id,
embedding <-> '[3,1,2]'::vector AS l2_distance
FROM items
WHERE id <> 1
ORDER BY l2_distance
LIMIT 5\g
For approximate indexes, filtering happens after index traversal. Both extensions provide iterative index scans that extend the scan when filtering removes rows from the first pass.
Iterative scans for pgvector HNSW:
SET hnsw.iterative_scan = strict_order\g
SET hnsw.max_scan_tuples = 20000\g
SET hnsw.scan_mem_multiplier = 2\g
Iterative scans for NeuronDB HNSW:
SET neurondb.hnsw_iterative_scan = strict_order\g
SET neurondb.hnsw_max_scan_tuples = 20000\g
SET neurondb.hnsw_scan_mem_multiplier = 2\g
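When the filter is stable and selective, a standard PostgreSQL partial index sidesteps post-filtering entirely. A sketch reusing the predicate from the query above; partial indexes are core PostgreSQL, so this works with either extension:
CREATE INDEX items_embedding_hnsw_l2_filtered
ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)
WHERE id <> 1\g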
NeuronDB packed and sparse formats
Use vectorp when you want a packed dense format with metadata. Use vecmap for sparse high-dimensional inputs.
CREATE TABLE packed_items (
id bigint PRIMARY KEY,
embedding vectorp
)\g
INSERT INTO packed_items (id, embedding) VALUES
(1, '[1,2,3]'::vectorp)\g
SELECT
id,
embedding
FROM packed_items
ORDER BY id
LIMIT 1\g
CREATE TABLE sparse_items (
id bigint PRIMARY KEY,
embedding vecmap
)\g
Quantization workflows
Quantization trades precision for smaller storage and faster scans. NeuronDB exposes multiple quantization functions. Each function returns a bytea representation.
CPU quantization examples:
SELECT
vector_to_int8('[1,2,3]'::vector) AS q_int8,
vector_to_fp16('[1,2,3]'::vector) AS q_fp16,
vector_to_binary('[1,2,3]'::vector) AS q_binary,
vector_to_int4('[1,2,3]'::vector) AS q_int4\g
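To see the storage effect directly, compare datum sizes before and after quantization. The exact bytea sizes depend on NeuronDB's internal headers, so read the numbers as indicative rather than exact:
SELECT
pg_column_size('[1,2,3]'::vector) AS float32_bytes,
pg_column_size(vector_to_int8('[1,2,3]'::vector)) AS int8_bytes\g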
Accuracy analysis examples:
SELECT
quantize_analyze_int8('[1,2,3]'::vector) AS int8_stats,
quantize_analyze_fp16('[1,2,3]'::vector) AS fp16_stats,
quantize_analyze_binary('[1,2,3]'::vector) AS binary_stats,
quantize_analyze_int4('[1,2,3]'::vector) AS int4_stats\g
GPU workflows in NeuronDB
NeuronDB exposes GPU status, GPU distance functions, and GPU kNN helpers in SQL.
GPU initialization and status:
SELECT neurondb_gpu_enable() AS gpu_enabled\g
SELECT
device_id,
device_name,
total_memory_mb,
free_memory_mb,
is_available
FROM neurondb_gpu_info()\g
GPU distance functions:
SELECT
vector_l2_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS l2_gpu,
vector_cosine_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS cosine_gpu,
vector_inner_product_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS ip_gpu\g
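A quick parity check between the CPU operator and the GPU function, assuming GPU initialization succeeded; small floating-point differences between paths are possible:
SELECT
'[1,2,3]'::vector <-> '[4,5,6]'::vector AS l2_cpu,
vector_l2_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS l2_gpu\g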
GPU kNN helpers:
SELECT
id,
distance
FROM hnsw_knn_search_gpu('[1,2,3]'::vector, 10, 100)\g
SELECT
id,
distance
FROM ivf_knn_search_gpu('[1,2,3]'::vector, 10, 10)\g
GPU usage stats:
SELECT
queries_executed,
fallback_count,
total_gpu_time_ms,
total_cpu_time_ms,
avg_latency_ms
FROM neurondb_gpu_stats()\g
Multi-tenant controls in NeuronDB
NeuronDB includes tenant quota tracking and tenant-specific helper functions. These objects live in the neurondb schema.
Tenant quota tables and views support workflows such as:
- enforce per-tenant vector count limits
- track per-tenant storage usage
- query tenant usage and quota percent (sketched below)
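A sketch of a usage query. The view name and columns below are hypothetical placeholders, since this post does not pin down the exact tenant objects; check the neurondb schema for the real names:
SELECT
tenant_id,
vector_count, -- hypothetical column
storage_bytes, -- hypothetical column
quota_percent -- hypothetical column
FROM neurondb.tenant_usage -- hypothetical view name
ORDER BY quota_percent DESC\g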
NeuronDB includes tenant-aware HNSW helper functions:
- hnsw_tenant_create
- hnsw_tenant_search
- hnsw_tenant_quota
Monitoring in NeuronDB
NeuronDB exposes Prometheus-compatible metrics via SQL:
SELECT
queries_total,
queries_success,
queries_error,
query_duration_sum,
vectors_total,
cache_hits,
cache_misses,
workers_active
FROM neurondb_prometheus_metrics()\g
Background workers and queues in NeuronDB
NeuronDB stores the queue and metrics state in SQL tables under the neurondb schema. Background workers process or sample those tables when enabled in PostgreSQL.
Queue workflow:
INSERT INTO neurondb.job_queue (tenant_id, job_type, payload)
VALUES (0, 'embedding', '{"text":"hello"}'::jsonb)\g
SELECT
job_id,
tenant_id,
job_type,
status,
retry_count,
created_at
FROM neurondb.job_queue
ORDER BY created_at DESC
LIMIT 10\g
Manual trigger helpers exist for testing:
SELECT neuranq_run_once() AS queued_work\g
SELECT neuranmon_sample() AS tuner_sample\g
SELECT neurandefrag_run() AS defrag_ran\g
LLM configuration and jobs in NeuronDB
NeuronDB stores LLM provider configuration in neurondb.llm_config and stores jobs in neurondb.llm_jobs.
Configuration workflow:
SELECT neurondb.set_llm_config(
'https://api-inference.huggingface.co',
'REPLACE_WITH_KEY',
'REPLACE_WITH_MODEL'
)\g
SELECT
api_base,
default_model,
updated_at
FROM neurondb.get_llm_config()\g
Job enqueue workflow:
SELECT ndb_llm_enqueue(
'embed',
'REPLACE_WITH_MODEL',
'hello world',
'tenant0'
) AS job_id\g
Index tuning helpers in NeuronDB
NeuronDB exposes index tuning and diagnostics helpers in SQL. These helpers return JSONB.
Examples:
SELECT index_tune_hnsw('items', 'embedding') AS hnsw_recommendation\g
SELECT index_tune_ivf('items', 'embedding') AS ivf_recommendation\g
SELECT index_recommend_type('items', 'embedding') AS index_choice\g
SELECT index_tune_query_params('items_embedding_hnsw_l2') AS query_knobs\g
Migration
Migration replaces one extension with the other. Existing tables remain. Indexes that depend on pgvector objects are dropped by DROP EXTENSION vector CASCADE.
- Drop pgvector
DROP EXTENSION vector CASCADE\g
- Install NeuronDB
CREATE EXTENSION neurondb\g
- Verify data
SELECT count(1) AS row_count FROM items\g
Your table rows remain. You still need to rebuild ANN indexes after dropping pgvector.
- Recreate indexes
For large tables (>100GB), increase maintenance_work_mem before building.
SET maintenance_work_mem = '4GB'\g
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)\g
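Then confirm the rebuilt index is picked up; the plan should show an index scan instead of a sequential scan with a sort:
EXPLAIN (COSTS OFF)
SELECT id
FROM items
ORDER BY embedding <-> '[1,2,3]'::vector
LIMIT 10\g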
- GPU: use the NeuronDB GPU functions and settings from the extension. The SQL surface exposes GPU kNN functions for HNSW and IVF.
Use Case Recommendations
Select the tool for your infrastructure and requirements. Start with your workload, then map your constraints to a short decision.
Use pgvector when your goal is simple vector search
Pick pgvector when you want fewer moving parts and fewer extension-specific features.
Use pgvector when you meet most of these conditions:
- You run CPU-only workloads.
- You want ivfflat and hnsw naming across docs, examples, and client libraries.
- You want distance operators and two ANN access methods, with minimal extra SQL objects.
- You want a smaller operational surface area inside the extension.
Use pgvector when your query pattern looks like this most of the time:
SELECT
id,
embedding <-> $1::vector AS distance
FROM items
ORDER BY distance
LIMIT 10
Use pgvector when you tune with pgvector GUCs:
SET hnsw.ef_search = 100\g
SET ivfflat.probes = 10\g
Use NeuronDB when your goal is a larger in-database surface area
Pick NeuronDB when you want the same distance operators and index patterns plus additional SQL objects for GPU workflows, quantization workflows, tuning helpers, and operational queues.
Use NeuronDB when you meet most of these conditions:
- You want ivf as an access method name, plus hnsw.
- You want vectorp and vecmap as additional storage formats.
- You want SQL functions for quantization, with both CPU and GPU entry points.
- You want SQL functions for GPU status, GPU distance, and GPU kNN helpers.
- You want SQL tables and views for tenant quotas, job queues, metrics, and Prometheus export.
Use NeuronDB when you want NeuronDB-specific query time tuning:
SET neurondb.hnsw_ef_search = 100\g
SET neurondb.ivf_probes = 10\g
Use NeuronDB when you want index tuning helpers and diagnostics in SQL:
SELECT index_tune_hnsw('items', 'embedding') AS hnsw_recommendation\g
SELECT index_tune_ivf('items', 'embedding') AS ivf_recommendation\g
SELECT index_recommend_type('items', 'embedding') AS index_choice\g
Short decision flow
Start here when you need a quick answer.
- If you want ivfflat, pick pgvector.
- If you want ivf, pick NeuronDB.
- If you want GPU SQL entry points, pick NeuronDB.
- If you want fewer extension-owned tables and views, pick pgvector.
- If you want quantization helpers and analysis functions in SQL, pick NeuronDB.
Practical scenarios
Use this section as a checklist.
Scenario 1: Single app, single embedding model, CPU only
- Use pgvector
- Create one HNSW index on vector_l2_ops or vector_cosine_ops
- Tune hnsw.ef_search per endpoint or per query
Scenario 2: Multi-tenant SaaS with per-tenant limits
- Use NeuronDB
- Use tenant quota tables and views under the neurondb schema
- Use tenant-aware HNSW helper functions when you want tenant-scoped index management
Scenario 3: Storage pressure from large embeddings
- Use NeuronDB
- Use quantization functions to produce compact bytea outputs
- Compare distance preservation with quantize_compare_distances
Scenario 4: GPU present, batch heavy workloads
- Use NeuronDB
- Enable the GPU runtime and query neurondb_gpu_info
- Use GPU kNN helpers where your workflow matches those function signatures
Conclusion
pgvector focuses on vector search primitives. NeuronDB adds additional types, an ivf access method, GPU entry points, quantization helpers, worker tables, and metrics export.
Pick the extension based on your operational goal. Keep your schema and query patterns simple. Measure with the benchmark scripts in this repo, then tune one knob at a time.