To store and query embeddings, we need a database capable of handling vector similarity search.
A practical solution is using PostgreSQL with the pgvector extension.
pgvector adds a vector column type and distance operators, so PostgreSQL can store embeddings and run similarity queries in plain SQL.
Docker Compose Setup
Create a docker-compose.yml.
services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: vector-db
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
Run:
docker compose up -d
Enable the Extension
Connect to the database (for example with docker exec -it vector-db psql -U admin -d vectordb) and run:
CREATE EXTENSION IF NOT EXISTS vector;
Creating a Vector Table
CREATE TABLE document_embedding (
    id BIGSERIAL PRIMARY KEY,
    document_id BIGINT,
    chunk_text TEXT,
    embedding VECTOR(1536)
);
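The dimension in VECTOR(1536) must match the output size of your embedding model. 1536 is the size produced by OpenAI's text-embedding-ada-002 and text-embedding-3-small, for example; other models use different sizes.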
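When writing rows to this table from application code, the embedding is usually sent in pgvector's text form, a bracketed comma-separated list such as [0.1,0.2,0.3]. Below is a minimal Java sketch of that conversion. The class PgVectorLiteral and the method toLiteral are hypothetical helpers written for illustration; they are not part of pgvector or of any JDBC driver.

```java
import java.util.StringJoiner;

/** Hypothetical helper (not part of pgvector): formats a float[] as a pgvector text literal. */
final class PgVectorLiteral {

    private PgVectorLiteral() {}

    /** Returns the vector in bracketed text form, e.g. [0.5,1.0,-2.0]. */
    static String toLiteral(float[] embedding) {
        if (embedding == null || embedding.length == 0) {
            throw new IllegalArgumentException("embedding must contain at least one value");
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            // NaN and infinity are rejected here rather than sent to the database.
            if (Float.isNaN(value) || Float.isInfinite(value)) {
                throw new IllegalArgumentException("embedding values must be finite");
            }
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }
}
```

One way to use the result, assuming plain JDBC, is to bind it as a string parameter and cast it in SQL: INSERT INTO document_embedding (document_id, chunk_text, embedding) VALUES (?, ?, ?::vector).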
Similarity Search
Example query:
SELECT chunk_text
FROM document_embedding
ORDER BY embedding <-> '[0.123, 0.443, ...]'
LIMIT 5;
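The <-> operator computes the Euclidean (L2) distance between two vectors, so the query returns the five chunks closest to the given vector. pgvector also provides <=> for cosine distance and <#> for negative inner product.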
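To make the query's behavior concrete, here is a small in-memory Java sketch of what ORDER BY embedding <-> ... LIMIT 5 computes: the Euclidean distance from the query vector to every stored vector, sorted ascending and cut off after k rows. The class names NearestChunks and Chunk are hypothetical, and this brute-force scan only illustrates the semantics of the query; it makes no claim about how pgvector executes it internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: mimics ORDER BY embedding <-> query LIMIT k in memory. */
final class NearestChunks {

    /** Hypothetical stand-in for one row of document_embedding. */
    static final class Chunk {
        final String chunkText;
        final float[] embedding;

        Chunk(String chunkText, float[] embedding) {
            this.chunkText = chunkText;
            this.embedding = embedding;
        }
    }

    private NearestChunks() {}

    /** Euclidean (L2) distance, the metric computed by the <-> operator. */
    static double l2Distance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the chunk text of the k rows closest to the query, nearest first. */
    static List<String> nearest(List<Chunk> rows, float[] query, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must not be negative");
        }
        // Copy before sorting so the caller's list is left untouched.
        List<Chunk> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> l2Distance(c.embedding, query)));
        List<String> result = new ArrayList<>();
        for (Chunk c : sorted.subList(0, Math.min(k, sorted.size()))) {
            result.add(c.chunkText);
        }
        return result;
    }
}
```

Calling NearestChunks.nearest(rows, queryVector, 5) plays the role of the SQL query above. In practice the database does this work, which avoids loading every embedding into application memory.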
Why Use PostgreSQL?
Advantages:
- production-ready database
- familiar SQL
- easy Docker setup
- integrates well with Java
- avoids introducing new infrastructure
Sequence
- Meaning: How Data Vectorization Powers AI
- Turning PostgreSQL Into a Vector Database with Docker
- Indexing Knowledge Base Content with Spring Boot and pgvector
- Building Semantic Search with Spring Boot, PostgreSQL, and pgvector (RAG Retrieval)