Turning PostgreSQL Into a Vector Database with Docker

#postgres #ai #programming

To store and query embeddings, we need a database capable of handling vector similarity search.

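A practical solution is PostgreSQL with the pgvector extension.

pgvector adds a vector column type and distance operators to PostgreSQL, so embeddings can be stored and searched with ordinary SQL. For larger tables it also supports approximate indexes (HNSW and IVFFlat) to keep similarity queries fast.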

Docker Compose Setup

Create a docker-compose.yml file:

services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: vector-db
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

Run:

docker compose up -d

Enable the Extension

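Connect to the database (for example with docker exec -it vector-db psql -U admin -d vectordb), then run:

CREATE EXTENSION IF NOT EXISTS vector;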

Creating a Vector Table

CREATE TABLE document_embedding (
    id BIGSERIAL PRIMARY KEY,
    document_id BIGINT,
    chunk_text TEXT,
    embedding VECTOR(1536)
);

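1536 is the output size of several popular embedding models, including OpenAI's text-embedding-ada-002 and text-embedding-3-small. Set it to match the model you actually use: pgvector rejects any vector whose length differs from the declared dimension.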
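Since a dimension mismatch only surfaces as a database error at insert time, it can help to check it on the application side first. Below is a minimal Java sketch, not part of the original project: VectorLiteral and toLiteral are hypothetical names, and the helper simply formats a float[] into the bracketed text form that pgvector accepts as input.

```java
import java.util.StringJoiner;

// Hypothetical helper (not from the original project).
final class VectorLiteral {
    // Must match the dimension declared in the table: VECTOR(1536).
    static final int DIMENSIONS = 1536;

    // Formats an embedding as pgvector's text form, e.g. [0.25,-1.0,3.5],
    // after checking its length against the expected dimension.
    static String toLiteral(float[] embedding, int expectedDimensions) {
        if (embedding.length != expectedDimensions) {
            throw new IllegalArgumentException(
                "expected " + expectedDimensions + " dimensions, got " + embedding.length);
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            // pgvector does not allow NaN or infinite elements.
            if (Float.isNaN(value) || Float.isInfinite(value)) {
                throw new IllegalArgumentException("embedding contains NaN or infinity");
            }
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }
}
```

In real code you would call VectorLiteral.toLiteral(embedding, VectorLiteral.DIMENSIONS) and pass the result to the database as a string parameter.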

Similarity Search

Example query (the vector literal is abbreviated here; a real query vector needs all 1536 values):

SELECT chunk_text
FROM document_embedding
ORDER BY embedding <-> '[0.123, 0.443, ...]'
LIMIT 5;

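The <-> operator computes Euclidean (L2) distance. pgvector also provides <=> for cosine distance and <#> for negative inner product. A smaller distance means a closer match, so the default ascending order returns the five nearest chunks.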
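To make the ordering concrete, here is a small in-memory Java sketch of what ORDER BY embedding <-> ... LIMIT n does, using the L2 distance that <-> computes in pgvector. It is only an illustration: NearestChunks and Chunk are hypothetical names, and a real application would let PostgreSQL do this work.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: mimics ORDER BY embedding <-> query LIMIT n in memory.
final class NearestChunks {
    // Hypothetical stand-in for one row of document_embedding.
    static final class Chunk {
        final String text;
        final float[] embedding;

        Chunk(String text, float[] embedding) {
            this.text = text;
            this.embedding = embedding;
        }
    }

    // Euclidean (L2) distance: square root of the summed squared differences.
    static double l2Distance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Sorts by ascending distance to the query and keeps the first `limit` texts.
    static List<String> nearest(List<Chunk> chunks, float[] query, int limit) {
        return chunks.stream()
            .sorted(Comparator.comparingDouble((Chunk c) -> l2Distance(c.embedding, query)))
            .limit(limit)
            .map(c -> c.text)
            .collect(Collectors.toList());
    }
}
```

This brute-force scan compares the query against every row, which is also what PostgreSQL does for this query when the table has no vector index.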

Why Use PostgreSQL?

Advantages:

production-ready database
familiar SQL
easy Docker setup
integrates well with Java
avoids introducing new infrastructure
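As a rough illustration of the Java point, the similarity query can be run through plain JDBC. This is a sketch under assumptions rather than the project's actual code: ChunkSearch is a hypothetical class, the connection values are copied from the docker-compose.yml in this post, and the PostgreSQL JDBC driver (org.postgresql:postgresql) is assumed to be on the classpath. The query vector is passed as a string in pgvector's bracketed text form and cast to vector in SQL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; assumes the PostgreSQL JDBC driver is on the classpath.
final class ChunkSearch {
    // Connection values taken from the docker-compose.yml in this post.
    static final String JDBC_URL = "jdbc:postgresql://localhost:5432/vectordb";
    static final String USER = "admin";
    static final String PASSWORD = "admin";

    // The first parameter is the query vector as text, cast to vector in SQL;
    // the second is the number of rows to return.
    static final String SEARCH_SQL =
        "SELECT chunk_text FROM document_embedding "
        + "ORDER BY embedding <-> CAST(? AS vector) LIMIT ?";

    // Returns the chunk_text of the `limit` rows nearest to the given vector.
    static List<String> search(String vectorLiteral, int limit) throws SQLException {
        List<String> results = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(JDBC_URL, USER, PASSWORD);
             PreparedStatement stmt = conn.prepareStatement(SEARCH_SQL)) {
            stmt.setString(1, vectorLiteral);
            stmt.setInt(2, limit);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    results.add(rs.getString("chunk_text"));
                }
            }
        }
        return results;
    }
}
```

Binding the vector as a parameter instead of concatenating it into the SQL string keeps the statement reusable and avoids injection issues.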

Sequence

Meaning: How Data Vectorization Powers AI
Turning PostgreSQL Into a Vector Database with Docker
Indexing Knowledge Base Content with Spring Boot and pgvector
Building Semantic Search with Spring Boot, PostgreSQL, and pgvector (RAG Retrieval)
How I Added LangChain4j Without Letting It Take Over My Spring Boot App

