NeuronDB Vector vs pgvector: Technical Comparison

You store embeddings as vectors. You run a similarity search inside PostgreSQL. Two extensions matter in this space: pgvector and NeuronDB. This post compares both extensions using real behavior from the source trees in this repo. Every limit and object name matches code and SQL.

Introduction

Vector databases store embeddings. Similarity search ranks rows by distance. PostgreSQL extensions bring vector types, distance operators, and index access methods into the SQL layer.

You choose pgvector when you want a focused extension with broad adoption. You choose NeuronDB when you want pgvector-style SQL plus additional types, GPU paths, and operational surface area inside the extension.

This post uses pgvector v0.8.1 semantics and NeuronDB v3.0.0-devel semantics from the local source. Feature parity varies by object: some pieces match one-to-one, while others use different names with aliases.

Project references
NeuronDB site: https://www.neurondb.ai
Source code: https://github.com/neurondb/neurondb

Architecture

Architectural choices define performance limits and feature capabilities.

pgvector Architecture

pgvector implements types, operators, and index access methods in C and SQL. The extension exposes a small surface area and relies on PostgreSQL storage, WAL, and query planning.

Type system and layouts

pgvector defines these public types:

  • vector: dense float32 vector, up to 16000 dimensions.
  • halfvec: half-precision vector, up to 16000 dimensions.
  • sparsevec: sparse vector with int32 indices and float32 values, limits depend on operation.
  • bit: PostgreSQL bit type, used as a binary vector for Hamming and Jaccard distance.

The vector type uses a varlena header plus two int16 fields, dim and unused, followed by dim float32 values. This layout matches typedef struct Vector in pgvector and yields storage of 4 bytes per dimension plus 8 bytes of overhead per vector.
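You can confirm the layout with PostgreSQL's standard pg_column_size function: a 3-dimension vector reports 20 bytes, which is 3 * 4 + 8.

SELECT pg_column_size('[1,2,3]'::vector) AS vector_bytes\g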

The sparsevec on-disk format stores dim (int32), nnz (int32), and unused (int32) in the header, followed by nnz int32 indices and then nnz float32 values as a contiguous array.
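pgvector's sparsevec text format mirrors this layout: 1-based index:value pairs followed by the total dimension.

SELECT '{1:1.5,3:2.5}/5'::sparsevec AS sv\g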

CPU dispatch

pgvector uses a mix of scalar code and CPU dispatch. On Linux x86_64, some functions compile with target_clones to generate multiple code paths. The code selects a path based on CPU capabilities. This approach appears in vector.c and bitutils.c.

NeuronDB Architecture

NeuronDB implements vector types, operators, access methods, and additional systems within a single extension. The extension defines types beyond pgvector, adds IVF under the access method name ivf, and includes GPU backends.

Type system and layouts

NeuronDB exposes pgvector-style types plus additional NeuronDB-specific types:

  • vector: dense float32 vector with dim int16 and an unused int16 field, up to 16000 dimensions.
  • vectorp: packed vector with metadata. The layout includes a CRC32 fingerprint, a version, a dimension, and an endian guard, followed by float32 data.
  • vecmap: sparse high-dimensional map. The layout stores total_dim and nnz, followed by parallel int32 indices and float32 values.
  • halfvec: half-precision vector with a 4000-dimension limit in NeuronDB.
  • sparsevec: sparse vector type limited to 1000 nonzero entries and 1M dimensions in NeuronDB.
  • binaryvec: binary vector type with a Hamming distance operator.

NeuronDB also defines internal structs for quantized vectors, such as int8, int4, and binary-packed representations. The SQL surface exposes conversion and distance functions.

Index access methods

NeuronDB defines two ANN index access methods as PostgreSQL access methods:

  • hnsw: HNSW index access method.
  • ivf: IVF index access method.

NeuronDB also defines operator classes for vector, halfvec, sparsevec, bit, and binaryvec in both hnsw and ivf.

CPU SIMD and GPU backends

NeuronDB includes explicit AVX2 and AVX-512 implementations of common distance functions in vector_distance_simd.c. The build selects the compiled path based on compiler flags.

NeuronDB includes three GPU backend families in the tree:

  • CUDA
  • ROCm
  • Metal

The runtime backend selection logic maps backend type to names cuda, rocm, and metal. GPU entry points for HNSW and IVF kNN search are provided via SQL functions.

Feature Comparison

Both extensions integrate with PostgreSQL. NeuronDB adds operational features.

Table 1: Types, distances, indexes, and hard limits

This table focuses on public SQL objects and hard limits enforced by each project.

| Area | pgvector | NeuronDB |
| --- | --- | --- |
| Extension name | vector | neurondb |
| Dense type | vector (float32), max 16000 dims | vector (float32), max 16000 dims |
| Half type | halfvec (FP16), max 16000 dims | halfvec (FP16), max 4000 dims |
| Sparse type | sparsevec (dim int32, nnz int32, int32 indices, float32 values), max 1e9 dims, max 16000 nnz | sparsevec, max 1M dims, max 1000 nonzero entries |
| Binary vector | PostgreSQL bit plus pgvector operators | PostgreSQL bit operator classes plus binaryvec type |
| Distance operators | <-> L2, <#> negative inner product, <=> cosine, <+> L1, <~> Hamming, <%> Jaccard | <-> L2, <#> negative inner product, <=> cosine, <+> L1, <~> Hamming; Jaccard via vector_jaccard_distance(vector, vector) and <%> for bit |
| ANN access methods | hnsw, ivfflat | hnsw, ivf |
| Dense index max dims | 2000 for HNSW and IVFFlat | limited by page layout; large dims fail with a page size error during build |
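For pgvector, the documented workaround for the 2000-dimension index cap is an expression index over a halfvec cast, which raises the HNSW limit to 4000 dimensions. A sketch, assuming a hypothetical items_large table with vector(3072) embeddings:

CREATE INDEX ON items_large
USING hnsw ((embedding::halfvec(3072)) halfvec_l2_ops)\g

Queries must then order by the same cast expression for the index to apply.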

Table 2: Tuning knobs, defaults, and where each knob lives

This table lists knobs. Each knob changes recall, latency, or build time. The table also lists the location of each knob.

| Knob | pgvector | NeuronDB |
| --- | --- | --- |
| HNSW m | index option WITH (m = N), default 16 | index option WITH (m = N), default 16 |
| HNSW ef_construction | index option WITH (ef_construction = N), default 64 | index option WITH (ef_construction = N), default 200 |
| HNSW ef_search | GUC hnsw.ef_search, default 40 | GUC neurondb.hnsw_ef_search, default 64 |
| HNSW iterative scans | GUC hnsw.iterative_scan | GUC neurondb.hnsw_iterative_scan |
| HNSW scan stop | GUCs hnsw.max_scan_tuples and hnsw.scan_mem_multiplier | GUCs neurondb.hnsw_max_scan_tuples and neurondb.hnsw_scan_mem_multiplier |
| IVF lists | index option WITH (lists = N) on ivfflat | index option WITH (lists = N) on ivf; NeuronDB maps ivfflat to ivf in helper functions |
| IVF probes | GUC ivfflat.probes, default 1 | GUC neurondb.ivf_probes, default 10 |
| IVF iterative scans | GUCs ivfflat.iterative_scan and ivfflat.max_probes | GUCs neurondb.ivf_iterative_scan and neurondb.ivf_max_probes |

Table 3: Acceleration and storage formats

This table covers CPU SIMD, GPU backends, and compressed formats.

| Area | pgvector | NeuronDB |
| --- | --- | --- |
| CPU vector dispatch | target_clones dispatch on supported builds | explicit AVX2 and AVX-512 distance functions, selected by build flags |
| GPU backends | none | CUDA, ROCm, Metal |
| GPU kNN helpers | none | hnsw_knn_search_gpu(query vector, k int, ef_search int) and ivf_knn_search_gpu(query vector, k int, nprobe int) |
| Packed dense format | none | vectorp with CRC32 fingerprint, version, endian guard, and float32 data |
| Sparse high-dim format | sparsevec | vecmap and the NeuronDB sparsevec type |
| Quantized internal types | binary quantization via binary_quantize to bit | int8, int4, binary, and FP16 quantization in the type and function layer |
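On the pgvector side, binary_quantize is a one-call path from vector to bit: positive components become 1 and everything else becomes 0.

SELECT binary_quantize('[1,-2,3]'::vector) AS bits\g

The result here is 101, which you can then index and compare with the bit operators shown earlier.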

Production Readiness

Production systems need repeatable behavior, clear configuration, and a monitoring path. NeuronDB ships extra primitives for tenant controls, queue-based workflows, and metrics export.

NeuronDB includes these operational surfaces in SQL:

  • tenant usage tables and quota tracking
  • background worker tables and manual triggers
  • Prometheus compatible metrics via SQL, plus an HTTP exporter endpoint

Performance

Performance depends on dataset shape, index parameters, storage layout, and query patterns. Use the benchmark scripts in this repo to measure your hardware and build.

Benchmarks

The repository includes benchmark scripts and SQL stress tests. Use these tools to compare pgvector and NeuronDB on your own system.

Vector benchmark suite

The vector benchmark suite downloads public ANN datasets, loads them into PostgreSQL, builds indexes, runs queries, and writes JSON results.

Run the full pipeline:

python3 NeuronDB/benchmark/vector/run_bm.py --prepare --load --run --datasets sift-128-euclidean --configs hnsw --k-values 10

Run a quick pipeline with defaults:

python3 NeuronDB/benchmark/vector/run_bm.py --prepare --load --run

Stress tests

The repo includes SQL stress suites for pgvector and NeuronDB.

pgvector stress suite:

\i NeuronDB/benchmark/vector/pgvector_stress.sql

NeuronDB stress suite:

\i NeuronDB/benchmark/vector/neurondb_vector_stress.sql

Example result from a committed artifact

This example comes from NeuronDB/benchmark/vector/results/benchmark_sift-128-euclidean_hnsw_20260104_211033.json.

Dataset: sift-128-euclidean
Train vectors: 1000000
Test queries: 10000
Dimension: 128
Index: hnsw, m 16, ef_construction 200
Query: k 10, ef_search 100
Average latency ms: 512.9799604415894
P95 latency ms: 521.4026927947998
QPS: 1.9493938888746618
Recall: 1.0
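Note that QPS in this artifact equals 1000 divided by the average latency in milliseconds, so it reflects single-stream throughput rather than concurrent load.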

Practical usage

This section focuses on repeatable workflows. Each workflow uses real object names from both projects.

Basic table and query pattern

Use a fixed-dimension column when a single embedding model feeds it. Use a column without a typmod (no declared dimension) when multiple embedding models share one column; a sketch follows the fixed-dimension example below.

Example with a fixed dimension:

CREATE TABLE items (
    id bigint PRIMARY KEY,
    embedding vector(3)
)\g

INSERT INTO items (id, embedding) VALUES
    (1, '[1,2,3]'::vector),
    (2, '[4,5,6]'::vector)\g

SELECT
    id,
    embedding <-> '[3,1,2]'::vector AS l2_distance
FROM items
ORDER BY l2_distance
LIMIT 5\g
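For the multi-model case, a minimal sketch with a typmod-less column (multi_model_items and the model column are illustrative names; vector_dims is a pgvector function):

CREATE TABLE multi_model_items (
    id bigint PRIMARY KEY,
    model text NOT NULL,
    embedding vector  -- no declared dimension
)\g

INSERT INTO multi_model_items (id, model, embedding) VALUES
    (1, 'model-a', '[1,2,3]'::vector),
    (2, 'model-b', '[1,2,3,4]'::vector)\g

SELECT id, model, vector_dims(embedding) AS dims
FROM multi_model_items\g

Keep in mind that ANN index builds need a fixed dimension, so typmod-less columns suit storage and exact scans rather than hnsw or ivfflat indexes.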

Indexing with HNSW

HNSW uses a graph. You trade recall for speed by changing the candidate list size during search.

pgvector uses these knobs:

  • index reloptions: m, ef_construction
  • query time GUC: hnsw.ef_search

NeuronDB uses these knobs:

  • index reloptions: m, ef_construction
  • query time GUC: neurondb.hnsw_ef_search

HNSW index creation:

CREATE INDEX items_embedding_hnsw_l2
ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)\g

Query time tuning with pgvector:

SET hnsw.ef_search = 100\g

Query time tuning with NeuronDB:

SET neurondb.hnsw_ef_search = 100\g
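With either extension, a quick EXPLAIN confirms the planner chose the HNSW index over a sequential scan:

EXPLAIN SELECT id
FROM items
ORDER BY embedding <-> '[3,1,2]'::vector
LIMIT 5\g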

Indexing with IVF

IVF clusters vectors into lists. The probes setting determines how many lists participate in each query.
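As a starting point, pgvector's documentation suggests lists around rows / 1000 for tables up to about one million rows and sqrt(rows) beyond that, with probes around sqrt(lists) as a first guess when tuning recall.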

pgvector uses the access method name ivfflat with:

  • index reloption: lists
  • query time GUC: ivfflat.probes

NeuronDB uses the access method name ivf with:

  • index reloption: lists
  • query time GUC: neurondb.ivf_probes

IVF index creation in pgvector:

CREATE INDEX items_embedding_ivfflat_l2
ON items
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100)\g

IVF index creation in NeuronDB:

CREATE INDEX items_embedding_ivf_l2
ON items
USING ivf (embedding vector_l2_ops)
WITH (lists = 100)\g

Query time tuning with pgvector:

SET ivfflat.probes = 10\g

Query time tuning with NeuronDB:

SET neurondb.ivf_probes = 10\g

Filtered search

Filtered kNN queries need two things: a filter predicate and an ordered distance sort with a limit.

SELECT
    id,
    embedding <-> '[3,1,2]'::vector AS l2_distance
FROM items
WHERE id <> 1
ORDER BY l2_distance
LIMIT 5\g

For approximate indexes, filtering happens after index traversal in pgvector. pgvector provides iterative index scans to extend scans when filtering removes rows from the first pass.

Iterative scans for pgvector HNSW:

SET hnsw.iterative_scan = strict_order\g
SET hnsw.max_scan_tuples = 20000\g
SET hnsw.scan_mem_multiplier = 2\g

Iterative scans for NeuronDB HNSW:

SET neurondb.hnsw_iterative_scan = strict_order\g
SET neurondb.hnsw_max_scan_tuples = 20000\g
SET neurondb.hnsw_scan_mem_multiplier = 2\g

NeuronDB packed and sparse formats

Use vectorp when you want a packed dense format with metadata. Use vecmap for sparse high-dimensional inputs.

CREATE TABLE packed_items (
    id bigint PRIMARY KEY,
    embedding vectorp
)\g

INSERT INTO packed_items (id, embedding) VALUES
    (1, '[1,2,3]'::vectorp)\g

SELECT
    id,
    embedding
FROM packed_items
ORDER BY id
LIMIT 1\g

CREATE TABLE sparse_items (
    id bigint PRIMARY KEY,
    embedding vecmap
)\g

Quantization workflows

Quantization trades precision for smaller storage and faster scans. NeuronDB exposes multiple quantization functions. Each function returns a bytea representation.

CPU quantization examples:

SELECT
    vector_to_int8('[1,2,3]'::vector) AS q_int8,
    vector_to_fp16('[1,2,3]'::vector) AS q_fp16,
    vector_to_binary('[1,2,3]'::vector) AS q_binary,
    vector_to_int4('[1,2,3]'::vector) AS q_int4\g

Accuracy analysis examples:

SELECT
    quantize_analyze_int8('[1,2,3]'::vector) AS int8_stats,
    quantize_analyze_fp16('[1,2,3]'::vector) AS fp16_stats,
    quantize_analyze_binary('[1,2,3]'::vector) AS binary_stats,
    quantize_analyze_int4('[1,2,3]'::vector) AS int4_stats\g

GPU workflows in NeuronDB

NeuronDB exposes GPU status, GPU distance functions, and GPU kNN helpers in SQL.

GPU initialization and status:

SELECT neurondb_gpu_enable() AS gpu_enabled\g

SELECT
    device_id,
    device_name,
    total_memory_mb,
    free_memory_mb,
    is_available
FROM neurondb_gpu_info()\g

GPU distance functions:

SELECT
    vector_l2_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS l2_gpu,
    vector_cosine_distance_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS cosine_gpu,
    vector_inner_product_gpu('[1,2,3]'::vector, '[4,5,6]'::vector) AS ip_gpu\g

GPU kNN helpers:

SELECT
    id,
    distance
FROM hnsw_knn_search_gpu('[1,2,3]'::vector, 10, 100)\g

SELECT
    id,
    distance
FROM ivf_knn_search_gpu('[1,2,3]'::vector, 10, 10)\g

GPU usage stats:

SELECT
    queries_executed,
    fallback_count,
    total_gpu_time_ms,
    total_cpu_time_ms,
    avg_latency_ms
FROM neurondb_gpu_stats()\g

Multi tenant controls in NeuronDB

NeuronDB includes tenant quota tracking and tenant-specific helper functions. These objects live in the neurondb schema.

Tenant quota tables and views support workflows such as:

  • enforce per-tenant vector count limits
  • track per-tenant storage usage
  • query tenant usage and quota percent

NeuronDB includes tenant-aware HNSW helper functions:

  • hnsw_tenant_create
  • hnsw_tenant_search
  • hnsw_tenant_quota

Monitoring in NeuronDB

NeuronDB exposes Prometheus-compatible metrics via SQL:

SELECT
    queries_total,
    queries_success,
    queries_error,
    query_duration_sum,
    vectors_total,
    cache_hits,
    cache_misses,
    workers_active
FROM neurondb_prometheus_metrics()\g

Background workers and queues in NeuronDB

NeuronDB stores the queue and metrics state in SQL tables under the neurondb schema. Background workers process or sample those tables when enabled in PostgreSQL.

Queue workflow:

INSERT INTO neurondb.job_queue (tenant_id, job_type, payload)
VALUES (0, 'embedding', '{"text":"hello"}'::jsonb)\g

SELECT
    job_id,
    tenant_id,
    job_type,
    status,
    retry_count,
    created_at
FROM neurondb.job_queue
ORDER BY created_at DESC
LIMIT 10\g

Manual trigger helpers exist for testing:

SELECT neuranq_run_once() AS queued_work\g
SELECT neuranmon_sample() AS tuner_sample\g
SELECT neurandefrag_run() AS defrag_ran\g

LLM configuration and jobs in NeuronDB

NeuronDB stores LLM provider configuration in neurondb.llm_config and stores jobs in neurondb.llm_jobs.

Configuration workflow:

SELECT neurondb.set_llm_config(
    'https://api-inference.huggingface.co',
    'REPLACE_WITH_KEY',
    'REPLACE_WITH_MODEL'
)\g

SELECT
    api_base,
    default_model,
    updated_at
FROM neurondb.get_llm_config()\g

Job enqueue workflow:

SELECT ndb_llm_enqueue(
    'embed',
    'REPLACE_WITH_MODEL',
    'hello world',
    'tenant0'
) AS job_id\g

Index tuning helpers in NeuronDB

NeuronDB exposes index tuning and diagnostics helpers in SQL. These helpers return JSONB.

Examples:

SELECT index_tune_hnsw('items', 'embedding') AS hnsw_recommendation\g
SELECT index_tune_ivf('items', 'embedding') AS ivf_recommendation\g
SELECT index_recommend_type('items', 'embedding') AS index_choice\g
SELECT index_tune_query_params('items_embedding_hnsw_l2') AS query_knobs\g

Migration

Migration replaces one extension with the other. Existing tables remain, but indexes that depend on pgvector objects are dropped by DROP EXTENSION vector CASCADE.

  1. Drop pgvector:

DROP EXTENSION vector CASCADE\g

  2. Install NeuronDB:

CREATE EXTENSION neurondb\g

  3. Verify data:

SELECT count(1) AS row_count FROM items\g

Your table rows remain. You still need to rebuild ANN indexes after dropping pgvector.

  4. Recreate indexes. For large tables (over 100 GB), increase maintenance_work_mem before building:

SET maintenance_work_mem = '4GB'\g
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)\g
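If the table serves live traffic, the standard PostgreSQL CREATE INDEX CONCURRENTLY variant avoids blocking writes during the rebuild:

CREATE INDEX CONCURRENTLY items_embedding_hnsw_l2
ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64)\g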
  5. Enable GPU paths. Use the NeuronDB GPU functions and settings from the extension. The SQL surface exposes GPU kNN functions for HNSW and IVF.

Use Case Recommendations

Select the tool for your infrastructure and requirements. Start with your workload, then map your constraints to a short decision.

Use pgvector when your goal is simple vector search

Pick pgvector when you want fewer moving parts and fewer extension-specific features.

Use pgvector when you meet most of these conditions:

  • You run CPU only workloads.
  • You want ivfflat and hnsw naming across docs, examples, and client libraries.
  • You want distance operators and two ANN access methods, with minimal extra SQL objects.
  • You want a smaller operational surface area inside the extension.

Use pgvector when your query pattern looks like this most of the time:

SELECT
    id,
    embedding <-> $1::vector AS distance
FROM items
ORDER BY distance
LIMIT 10
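One way to exercise that parameterized shape from psql is a prepared statement:

PREPARE knn(vector) AS
SELECT
    id,
    embedding <-> $1 AS distance
FROM items
ORDER BY distance
LIMIT 10\g

EXECUTE knn('[3,1,2]')\g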

Use pgvector when you tune with pgvector GUCs:

SET hnsw.ef_search = 100\g
SET ivfflat.probes = 10\g

Use NeuronDB when your goal is a larger in-database surface area

Pick NeuronDB when you want the same distance operators and index patterns plus additional SQL objects for GPU workflows, quantization workflows, tuning helpers, and operational queues.

Use NeuronDB when you meet most of these conditions:

  • You want ivf as an access method name, plus hnsw.
  • You want vectorp and vecmap as additional storage formats.
  • You want SQL functions for quantization, with both CPU and GPU entry points.
  • You want SQL functions for GPU status, GPU distance, and GPU kNN helpers.
  • You want SQL tables and views for tenant quotas, job queues, metrics, and Prometheus export.

Use NeuronDB when you want NeuronDB specific query time tuning:

SET neurondb.hnsw_ef_search = 100\g
SET neurondb.ivf_probes = 10\g

Use NeuronDB when you want index tuning helpers and diagnostics in SQL:

SELECT index_tune_hnsw('items', 'embedding') AS hnsw_recommendation\g
SELECT index_tune_ivf('items', 'embedding') AS ivf_recommendation\g
SELECT index_recommend_type('items', 'embedding') AS index_choice\g

Short decision flow

Start here when you need a quick answer.

  1. If you want ivfflat, pick pgvector.
  2. If you want ivf, pick NeuronDB.
  3. If you want GPU SQL entry points, pick NeuronDB.
  4. If you want fewer extension-owned tables and views, pick pgvector.
  5. If you want quantization helpers and analysis functions in SQL, pick NeuronDB.

Practical scenarios

Use this section as a checklist.

Scenario 1: Single app, single embedding model, CPU only

  • Use pgvector
  • Create one HNSW index on vector_l2_ops or vector_cosine_ops
  • Tune hnsw.ef_search per endpoint or per query, as sketched below
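Per-query tuning works with standard transaction-scoped settings; SET LOCAL reverts at commit, so one endpoint can raise ef_search without affecting others:

BEGIN\g
SET LOCAL hnsw.ef_search = 200\g
SELECT id FROM items ORDER BY embedding <-> '[3,1,2]'::vector LIMIT 10\g
COMMIT\g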

Scenario 2: Multi-tenant SaaS with per-tenant limits

  • Use NeuronDB
  • Use tenant quota tables and views under the neurondb schema
  • Use tenant-aware HNSW helper functions when you want tenant-scoped index management

Scenario 3: Storage pressure from large embeddings

  • Use NeuronDB
  • Use quantization functions to produce compact bytea outputs
  • Compare distance preservation with quantize_compare_distances

Scenario 4: GPU present, batch heavy workloads

  • Use NeuronDB
  • Enable GPU runtime and query neurondb_gpu_info
  • Use GPU kNN helpers where your workflow matches those function signatures

Conclusion

pgvector focuses on vector search primitives. NeuronDB adds additional types, an ivf access method, GPU entry points, quantization helpers, worker tables, and metrics export.

Pick the extension based on your operational goal. Keep your schema and query patterns simple. Measure with the benchmark scripts in this repo, then tune one knob at a time.
