To store and query embeddings, we need a database capable of handling vector similarity search.
A practical solution is PostgreSQL with the pgvector extension.
pgvector adds a vector column type and distance operators, so embeddings can be stored and searched by similarity with plain SQL.
Docker Compose Setup
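Create a docker-compose.yml file with the following content. The admin/admin credentials are placeholders for local development only.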
services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: vector-db
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
Then start the container in the background:
docker compose up -d
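Once the container is running, a plain JDBC check is one way to confirm that it accepts connections from Java. The following is a minimal sketch, not part of the original setup: it assumes the PostgreSQL JDBC driver (org.postgresql:postgresql) is on the classpath, the class name VectorDbConnection is illustrative, and the host, port, database, and credentials simply mirror the Compose file above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class VectorDbConnection {
    // Values mirrored from docker-compose.yml (local development only).
    static final String HOST = "localhost";
    static final int PORT = 5432;
    static final String DATABASE = "vectordb";
    static final String USER = "admin";
    static final String PASSWORD = "admin";

    /** Builds the JDBC URL for the port published by the container. */
    static String jdbcUrl() {
        return "jdbc:postgresql://" + HOST + ":" + PORT + "/" + DATABASE;
    }

    public static void main(String[] args) {
        // Assumption: the PostgreSQL JDBC driver is available at runtime.
        try (Connection conn = DriverManager.getConnection(jdbcUrl(), USER, PASSWORD)) {
            System.out.println("Connected to PostgreSQL "
                    + conn.getMetaData().getDatabaseProductVersion());
        } catch (SQLException e) {
            // Reached when the container is down or the driver is missing.
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```

If the connection fails, checking the container state with docker compose ps is a reasonable first step.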
Enable the Extension
Connect to the vectordb database (for example, with psql) and run:
CREATE EXTENSION IF NOT EXISTS vector;
Creating a Vector Table
CREATE TABLE document_embedding (
    id BIGSERIAL PRIMARY KEY,
    document_id BIGINT,
    chunk_text TEXT,
    embedding VECTOR(1536)
);
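The dimension in VECTOR(1536) must match the output size of your embedding model. 1536 is a common size, used for example by OpenAI's text-embedding-3-small and text-embedding-ada-002. pgvector rejects inserts whose vector length differs from the declared dimension.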
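On the Java side, an embedding usually arrives as a float[] and has to be turned into pgvector's text form, such as [0.1,0.2,0.3], before it can be written to the embedding column. The sketch below shows one way to do that while checking the dimension up front. The class PgVectorLiteral and its method are hypothetical helpers for illustration, not part of pgvector or any driver.

```java
import java.util.StringJoiner;

public class PgVectorLiteral {
    // Must match the dimension declared in VECTOR(1536).
    static final int DIMENSIONS = 1536;

    /** Formats an embedding as pgvector's text representation, e.g. [0.25,-1.5,3.0]. */
    static String toLiteral(float[] embedding, int expectedDimensions) {
        if (embedding.length != expectedDimensions) {
            throw new IllegalArgumentException(
                    "Expected " + expectedDimensions + " dimensions, got " + embedding.length);
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            // pgvector does not accept NaN or infinite elements.
            if (Float.isNaN(value) || Float.isInfinite(value)) {
                throw new IllegalArgumentException("Embedding contains a non-finite value");
            }
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // A 3-dimensional toy vector; real calls would pass DIMENSIONS.
        System.out.println(toLiteral(new float[] {0.25f, -1.5f, 3.0f}, 3)); // [0.25,-1.5,3.0]
    }
}
```

The resulting string can then be bound as a text parameter in an insert such as INSERT INTO document_embedding (document_id, chunk_text, embedding) VALUES (?, ?, CAST(? AS vector)). Failing early in Java gives a clearer error than waiting for the database to reject the row.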
Similarity Search
Example query:
SELECT chunk_text
FROM document_embedding
ORDER BY embedding <-> '[0.123, 0.443, ...]'
LIMIT 5;
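The <-> operator computes the Euclidean (L2) distance between two vectors, so ordering by it in ascending order returns the closest chunks first. pgvector also provides <=> for cosine distance and <#> for negative inner product. In a real query, the vector literal must contain as many values as the column's declared dimension.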
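To make the ordering concrete, the sketch below reproduces in plain Java what ORDER BY embedding <-> ... LIMIT k does when no index is involved: compute the Euclidean distance from the query to every stored vector and keep the k smallest. It is an illustration of the metric only, with a hypothetical class name and toy 2-dimensional data. It is not how pgvector is implemented internally.

```java
import java.util.Arrays;
import java.util.Comparator;

public class L2DistanceDemo {
    /** Euclidean (L2) distance, the metric computed by pgvector's <-> operator. */
    static double l2Distance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the indices of the k stored vectors closest to the query, nearest first. */
    static int[] nearest(float[][] stored, float[] query, int k) {
        Integer[] indices = new Integer[stored.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
        // Sort row indices by ascending distance, like ORDER BY embedding <-> query.
        Arrays.sort(indices,
                Comparator.comparingDouble((Integer i) -> l2Distance(stored[i], query)));
        // Keep the first k, like LIMIT k.
        return Arrays.stream(indices).limit(k).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        float[][] stored = { {0f, 0f}, {3f, 4f}, {1f, 1f} };
        float[] query = {0f, 0f};
        System.out.println(l2Distance(stored[1], query));               // 5.0
        System.out.println(Arrays.toString(nearest(stored, query, 2))); // [0, 2]
    }
}
```

Scanning every row like this costs time proportional to the table size, which is fine for small datasets. For larger ones, pgvector offers approximate indexes (HNSW and IVFFlat) that trade a little recall for speed.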
Why Use PostgreSQL?
Advantages:
- production-ready database
- familiar SQL
- easy Docker setup
- integrates well with Java
- avoids introducing new infrastructure
Next Article
Now that the database is ready, the next step is indexing a knowledge base using Spring Boot.
Project Here