DEV Community

yukaty

Part 1: Setup with PostgreSQL and pgvector

Ever wondered how Netflix suggests movies you might like, or how Spotify creates personalized playlists? These AI-powered features often use vector similarity search under the hood. In this series, we'll build our own AI search engine using PostgreSQL with pgvector!

Let's get started...🐢


Project Overview ✨

We'll build a search engine to find similar content based on meaning, not just matching keywords. This is the same type of technology behind:

  • GitHub Copilot's code suggestions
  • Spotify's song recommendations
  • Netflix's movie recommendations

While various tools and services offer similar functionality, we'll use pgvector to implement vector similarity search directly within PostgreSQL.

In Part 1, we'll set up the database infrastructure. In Part 2, we'll implement the search functionality using OpenAI's embeddings.


What is Vector Search? 🔎

When an AI model processes content (text, code, or images), it creates a list of numbers called an embedding. Think of it as a smart summary that captures the content's meaning. Similar content produces similar numbers, which makes related items easy to find.
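
As a rough illustration of "similar content has similar numbers", here is a minimal Python sketch. The three-dimensional vectors are made-up toy values (real embeddings have hundreds or thousands of dimensions), and cosine similarity is just one common way to compare them, not necessarily the one this series will use.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" invented for this example, not output from a real model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # close to 0: unrelated
```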

If you're not familiar with machine learning, don't worry! You can obtain these embeddings from AI APIs such as OpenAI's, even without deep AI knowledge.

pgvector helps us efficiently store and search these embeddings as vectors in PostgreSQL.


Step-by-Step Setup 👣

Make sure you have Docker Desktop installed on your computer.

Project Structure

vector-search/
├── compose.yml
└── postgres/
    └── schema.sql

1. Create compose.yml

services:
  db:
    image: pgvector/pgvector:pg17 # PostgreSQL with pgvector support
    container_name: pgvector-db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: example_db
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./postgres/schema.sql:/docker-entrypoint-initdb.d/schema.sql

volumes:
  pgdata: # Stores data outside the container to ensure persistence
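
If you later connect from application code, the settings above map onto a standard PostgreSQL connection URI. The sketch below only builds that string. The helper name `build_dsn` is hypothetical, and its defaults simply mirror the values in this compose.yml.

```python
from urllib.parse import quote

def build_dsn(user: str = "postgres", password: str = "password",
              host: str = "localhost", port: int = 5432,
              database: str = "example_db") -> str:
    """Build a postgresql:// URI, percent-encoding user, password, and db name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )

print(build_dsn())
# postgresql://postgres:password@localhost:5432/example_db
```

If you change the host port in compose.yml, pass the new value as `port`.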

2. Define Database Schema

Create postgres/schema.sql:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create sample table
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    item_data JSONB,
    embedding vector(1536) -- 1536-dimensional vector; must match the embedding model's output size
);
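
pgvector accepts vectors as text in the form `[v1,v2,...]`. The sketch below shows one way to format a Python list that way and to catch a dimension mismatch with the `vector(1536)` column before it reaches the database. The helper `to_vector_literal` is a hypothetical name introduced for this example.

```python
EMBEDDING_DIM = 1536  # must match vector(1536) in schema.sql

def to_vector_literal(values: list[float], dim: int = EMBEDDING_DIM) -> str:
    """Format a list of numbers as a pgvector text literal like '[0.1,0.2]'."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# A 3-dimensional toy vector, with dim overridden for the demo.
print(to_vector_literal([0.1, 0.2, 0.3], dim=3))  # [0.1,0.2,0.3]
```

The resulting string can be passed as a query parameter and cast with `::vector`.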

3. Start the Database

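Run Docker Compose to start the PostgreSQL container with pgvector. On the first start, when the data volume is still empty, the image's entrypoint runs schema.sql automatically.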

docker compose up --build

4. Verify the Setup

Connect to PostgreSQL:

docker exec -it pgvector-db psql -U postgres -d example_db

Check if everything is set up correctly:

-- Check installed extensions
\dx

-- Check table creation
\dt

-- Check table structure
\d items
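
For a non-interactive check (for example in a setup script), a similar verification can be run through `psql -c`. This is a sketch, assuming the container name, user, and database from compose.yml. `build_check_command` and `run_check` are hypothetical helpers, and `run_check` only works while the container is running.

```python
import subprocess

# Query the system catalog for the pgvector extension and its version.
CHECK_SQL = "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"

def build_check_command(container: str = "pgvector-db",
                        user: str = "postgres",
                        database: str = "example_db") -> list[str]:
    """Return the argv for running CHECK_SQL inside the container."""
    return ["docker", "exec", container,
            "psql", "-U", user, "-d", database, "-c", CHECK_SQL]

def run_check() -> str:
    """Run the check and return psql's output (needs Docker and a running container)."""
    result = subprocess.run(build_check_command(),
                            capture_output=True, text=True, check=True)
    return result.stdout

# Show the command without executing it.
print(build_check_command())
```

An empty result from `run_check()` would suggest that the extension was not created.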

Troubleshooting Tips 🛠️

Error: Port 5432 already in use

Change the host port (the left-hand number) in compose.yml to 5433 or another free port. The container port on the right stays 5432.

  ports:
    - "5433:5432"
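
To find out whether a port is already taken before editing compose.yml, a quick check like the following can help. It is a minimal sketch that simply tries to open a TCP connection on localhost. The helper name `is_port_in_use` is hypothetical.

```python
import socket

def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0

for candidate in (5432, 5433):
    state = "in use" if is_port_in_use(candidate) else "free"
    print(f"port {candidate}: {state}")
```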

Database not initializing properly

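Scripts in /docker-entrypoint-initdb.d run only when the data directory is empty, so schema.sql is skipped if the pgdata volume already exists. Remove the volume and restart. Note that this deletes all data stored in the database.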

  docker compose down -v    # Stop containers and delete the pgdata volume
  docker compose up --build # Start fresh; schema.sql runs again

Still not sure what's wrong?

Check the container logs.

  docker compose logs db

Quick Preview 👀

Here's a quick preview of how we'll query similar items in Part 2:

-- Find items similar to a specific vector
SELECT id, name, item_data
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 5;

Replace [0.1, 0.2, ...] with an actual embedding produced by an AI model. Its length must match the column definition, vector(1536).
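
To make the `ORDER BY embedding <-> ... LIMIT 5` pattern concrete: `<->` is pgvector's Euclidean (L2) distance operator, so the query returns the rows whose vectors lie closest to the query vector. Below is a minimal in-memory Python sketch of the same ranking, using made-up 2-dimensional vectors and a hypothetical `nearest` helper.

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the metric behind pgvector's <-> operator."""
    return math.dist(a, b)

def nearest(items: dict[str, list[float]], query: list[float], k: int = 5) -> list[str]:
    """Return the names of the k items closest to query, nearest first."""
    return sorted(items, key=lambda name: l2_distance(items[name], query))[:k]

# Toy data invented for illustration; real rows would hold 1536-dim embeddings.
items = {
    "apple": [1.0, 0.0],
    "pear": [0.9, 0.2],
    "truck": [-1.0, 0.5],
}
print(nearest(items, [1.0, 0.1], k=2))  # ['apple', 'pear']
```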


What's Next? 💭

In Part 2, we'll:

  • Understand what embeddings are and how they work
  • Generate embeddings using OpenAI
  • See how vector search works in practice

Stay tuned! 🚀

Spot any mistakes or have a better way? Please leave a comment below! 💬
