DEV Community

yukaty

Getting Started with Vector Search (Part 2)

In Part 1, we set up PostgreSQL with pgvector. Now, let's see how vector search actually works.

What are Embeddings?

An embedding is a list of numbers (a vector) that acts as a compact summary of a piece of content. The distance between two embeddings indicates how similar the content is: a small distance means the two items are closely related, and a large distance means they are less related.

📚 Book A: Web Development  (Distance: 0.2) ⬅️ Very Similar!
📚 Book B: JavaScript 101   (Distance: 0.3) ⬅️ Similar!
📚 Book C: Cooking Recipes  (Distance: 0.9) ❌ Not Similar
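To make "distance" concrete, here is a minimal Python sketch using Euclidean distance. The 3-dimensional vectors are invented for illustration and do not come from any model; real embeddings have far more dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors, made up for this example only.
query = [0.9, 0.1, 0.0]
books = {
    "Web Development": [0.8, 0.2, 0.1],
    "JavaScript 101": [0.7, 0.3, 0.1],
    "Cooking Recipes": [0.0, 0.1, 0.9],
}

# Smaller distance = more similar, so sort in ascending order.
ranked = sorted(books, key=lambda title: euclidean_distance(query, books[title]))
for title in ranked:
    print(f"{title}: {euclidean_distance(query, books[title]):.2f}")
```

The two programming titles sort ahead of the cookbook because their vectors sit closer to the query vector.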

Loading Sample Data

Now, let's populate our database with some data. We'll use:

  • Open Library API for book data
  • OpenAI API to create embeddings
  • pgvector to store and search them

Project Structure

pgvector-setup/             # From Part 1
  ├── compose.yml
  ├── postgres/
  │   └── schema.sql
  ├── .env                  # New: for API keys
  └── scripts/              # New: for data loading
      ├── requirements.txt
      ├── Dockerfile
      └── load_data.py

Create a Script

Let's start with a script that loads data from external APIs. The full script is available here.
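The full script is not reproduced in this post, so the following is only a rough sketch of the data-preparation step such a loader might contain. The function names, the metadata shape, and the sample record are assumptions made for illustration; the keys title, author_name, and first_publish_year follow the Open Library search response format.

```python
import json

def build_embedding_text(book):
    """Combine the fields the embedding should capture into one string."""
    authors = ", ".join(book.get("author_name", []))
    return f"{book['title']} by {authors}" if authors else book["title"]

def build_metadata(book):
    """Build the JSON stored next to the vector (hypothetical shape)."""
    return json.dumps({
        "title": book["title"],
        "authors": book.get("author_name", []),
        "first_publish_year": book.get("first_publish_year"),
    })

def to_vector_literal(embedding):
    """Format a list of floats as pgvector text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"

# Made-up record shaped like one entry of an Open Library search result.
sample = {
    "title": "Example Web Book",
    "author_name": ["A. Author"],
    "first_publish_year": 2020,
}

print(build_embedding_text(sample))
print(build_metadata(sample))
print(to_vector_literal([0.1, -0.2, 0.3]))
```

In a real loader, the text from build_embedding_text would be sent to the embeddings API, and the returned list of floats would be passed through to_vector_literal (or a driver adapter) before the INSERT.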

Setting Up Data Loading

Create .env:

OPENAI_API_KEY=your_openai_api_key

Update compose.yml to add the data loader:

services:
  # ... existing db service from Part 1

  data_loader:
    build:
      context: ./scripts
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/example_db
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - db
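Inside the container, the loader can pick up these two settings from its environment. A minimal sketch, assuming the script reads them with os.environ; the helper name load_settings is hypothetical and not taken from the actual script.

```python
import os
from urllib.parse import urlparse

def load_settings(env=os.environ):
    """Read the settings that compose.yml passes to the data_loader service."""
    missing = [k for k in ("DATABASE_URL", "OPENAI_API_KEY") if not env.get(k)]
    if missing:
        # Fail early with a clear message rather than a confusing error later.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    db = urlparse(env["DATABASE_URL"])
    return {
        "db_host": db.hostname,          # "db" is the compose service name
        "db_port": db.port,
        "db_name": db.path.lstrip("/"),
        "db_user": db.username,
        "openai_api_key": env["OPENAI_API_KEY"],
    }

# Example with the values from compose.yml and a placeholder API key.
settings = load_settings({
    "DATABASE_URL": "postgresql://postgres:password@db:5432/example_db",
    "OPENAI_API_KEY": "placeholder-key",
})
print(settings["db_host"], settings["db_port"], settings["db_name"])
```

Note that the host in DATABASE_URL is db, the service name, because containers on the same Compose network reach each other by service name rather than localhost.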

Load the data:

docker compose up data_loader

You should see 10 programming books with their metadata.

Exploring Vector Search

Connect to your database:

docker exec -it pgvector-db psql -U postgres -d example_db

Understanding Vector Data

Let's peek at what embeddings actually look like:

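-- View the first 5 dimensions of an embedding
-- (pgvector can cast a vector directly to real[]; a vector's text form
-- is not a valid array literal, so casting through text fails)
SELECT
    name,
    (embedding::real[])[1:5] AS first_5_dimensions
FROM items
LIMIT 1;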

A few things to note:

  • Each embedding has 1536 dimensions (set by the OpenAI embedding model used)
  • Individual values typically fall between -1 and 1
  • No single number is meaningful on its own; the semantic meaning is encoded in the vector as a whole
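When a client reads a vector column as text, pgvector returns it in the form [0.1,0.2,...], which also happens to be valid JSON. Here is a small sketch of inspecting such a value in Python; the short sample string is a made-up stand-in for a real 1536-dimension embedding.

```python
import json
import math

def parse_vector(text):
    """Parse pgvector's text representation into a list of floats."""
    return [float(x) for x in json.loads(text)]

sample = "[0.6,-0.8,0]"   # invented values, not a real embedding
vec = parse_vector(sample)

dimensions = len(vec)
norm = math.sqrt(sum(x * x for x in vec))   # L2 norm (length of the vector)

print("dimensions:", dimensions)
print("min/max:", min(vec), max(vec))
print("norm:", round(norm, 6))
```

OpenAI documents its embeddings as normalized to length 1, which is consistent with individual values staying between -1 and 1.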

Finding Similar Books

Try a simple similarity search:

-- Find the 3 books closest to a book with "Web" in its title
SELECT name, metadata
FROM items
ORDER BY embedding <-> (
    SELECT embedding
    FROM items
    WHERE metadata->>'title' LIKE '%Web%'
    LIMIT 1
)
LIMIT 3;

Here is what the query does:

  1. Find a book with "Web" in its title
  2. Get that book's embedding (its numeric representation)
  3. Compute the distance from this embedding to every book's embedding
  4. Return the 3 books with the smallest distances (the reference book itself comes first, at distance 0)
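The four steps above can be sketched in plain Python as a brute-force search. The rows and their 2-dimensional vectors are invented for illustration; in the real table PostgreSQL does this work over the stored embeddings.

```python
import math

# Invented rows standing in for the items table.
items = [
    {"title": "Web Development Basics", "embedding": [0.9, 0.1]},
    {"title": "JavaScript 101", "embedding": [0.8, 0.3]},
    {"title": "Python Crash Course", "embedding": [0.5, 0.6]},
    {"title": "Cooking Recipes", "embedding": [0.0, 1.0]},
]

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Steps 1-2: take the first row whose title contains "Web" and use its embedding.
reference = next(row for row in items if "Web" in row["title"])["embedding"]

# Steps 3-4: measure every row against it, sort ascending, keep the top 3.
nearest = sorted(items, key=lambda row: l2(row["embedding"], reference))[:3]
for row in nearest:
    print(row["title"], round(l2(row["embedding"], reference), 3))
```

The reference book is its own nearest neighbor at distance 0, and the SQL query behaves the same way. Excluding it would take an extra filter on the row's identifier.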

Understanding PostgreSQL Operators

Let's break down the operators used in vector search queries:

JSON Text Operator: ->>

Extracts a field from a JSON value and returns it as text.

Example:

-- If metadata = {"title": "ABC"}, it returns "ABC"
SELECT metadata->>'title' FROM items;

Vector Distance Operator: <->

Returns the distance between two vectors. In pgvector, <-> is Euclidean (L2) distance.

  • Smaller distance = More similar
  • Larger distance = Less similar

Example:

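-- Rank books by distance to a reference embedding
-- (here the first row's embedding stands in for a query vector)
SELECT name, embedding <-> (SELECT embedding FROM items LIMIT 1) AS distance
FROM items
ORDER BY distance
LIMIT 3;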
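For intuition about what this operator computes, here is a small sketch comparing Euclidean distance with cosine distance, which pgvector exposes as a separate operator, <=>. The vectors are made up and scaled to unit length, as OpenAI embeddings are documented to be. For unit-length vectors, both measures rank results in the same order.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale a vector to length 1."""
    n = norm(v)
    return [x / n for x in v]

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (norm(a) * norm(b))

# Invented 2-dimensional vectors, normalized to unit length.
query = normalize([1.0, 0.2])
candidates = {
    "a": normalize([0.9, 0.3]),
    "b": normalize([0.2, 1.0]),
    "c": normalize([0.6, 0.6]),
}

by_l2 = sorted(candidates, key=lambda k: l2_distance(query, candidates[k]))
by_cosine = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))
print(by_l2, by_cosine)
```

The orderings match because, for unit vectors, the squared Euclidean distance equals twice the cosine distance.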

Next Steps

Up next, we'll:

  • Build a FastAPI application
  • Create search endpoints
  • Make our vector search accessible via API

Stay tuned for Part 3: "Building a Vector Search API"! 🚀

Feel free to drop a comment below! 💬