Michael Wong
Building Your Notion AI with pgvector in 10 Minutes

Ever wish you could ask your team's knowledge base a question and get instant answers? That's exactly what Notion AI does—and it's pretty magical. But here's the thing: you can build your own version in about 10 minutes.

In this tutorial, we'll create a RAG (Retrieval-Augmented Generation) agent that turns your Notion pages into a personal AI assistant. We're using PostgreSQL with pgvector (yes, Postgres can do vector search!), OpenAI, and Python.

The starter template for this tutorial is available in my GitHub repository: https://github.com/michaelwwn/notion-ai-tutorial. Clone it to get started quickly!

What You'll Need

  • Python 3.8 or above
  • Docker: For running Postgres with pgvector easily.
  • OpenAI API Key: For generating embeddings and responses.

Step 1: Set Up Environment

We'll use Docker to spin up a Postgres instance with the pgvector extension pre-installed.

  1. Create a docker-compose.yml file in your project directory with the following content:
services:
  postgres:
    image: pgvector/pgvector:pg18
    restart: unless-stopped
    networks:
      - default
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./postgres/schema.sql:/docker-entrypoint-initdb.d/schema.sql
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres

volumes:
  postgres-data:
  2. Create a postgres directory and add a schema.sql file inside it:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS runbooks (
    content TEXT,
    embedding VECTOR(1536)
);
CREATE INDEX ON runbooks USING hnsw (embedding vector_cosine_ops);
  3. Start the Docker container:
docker-compose up -d

This will launch Postgres on localhost:5432, enable pgvector, and create the runbooks table automatically. The table stores each page's raw text alongside a 1,536-dimensional embedding; 1,536 is the output size of OpenAI's text-embedding-3-small model, which we'll use below.
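As an optional sanity check, you can describe the table from inside the container to confirm the schema ran:

docker-compose exec postgres psql -U postgres -c "\d runbooks"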

The Magic Behind RAG: What's Actually Happening

Before we write the code, let's cover the foundational concepts behind transformer models, which power tools like Notion AI. Here is a simplified flow.

Tokenization

Tokenization breaks text into tokens, the small units a model actually reads. For example, "I love playing football." splits into tokens along the lines of ["I", " love", " playing", " football", "."], though the exact split depends on the tokenizer.
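You can try this yourself with OpenAI's tiktoken library (a side demo, not needed for the rest of the tutorial; requires pip install tiktoken):

import tiktoken

# cl100k_base is the tokenizer used by OpenAI's embedding models
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("I love playing football.")
print([enc.decode([t]) for t in tokens])
# something like: ['I', ' love', ' playing', ' football', '.']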

Embeddings

Embeddings are vectors that capture semantic meaning. Each piece of text is mapped into a high-dimensional space where semantically similar inputs land close together, a property the model learns during pre-training.
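A quick way to build intuition, as a minimal sketch (it assumes OPENAI_API_KEY is set in your environment; the example sentences are arbitrary):

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text):
    """Returns the embedding vector for a piece of text as a NumPy array."""
    response = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(response.data[0].embedding)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related sentences score higher than unrelated ones, despite sharing no words
print(cosine_similarity(embed("server outage"), embed("the website is down")))
print(cosine_similarity(embed("server outage"), embed("banana smoothie recipe")))

The first pair should score noticeably higher than the second. That is exactly the property our vector search will rely on later.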


Positional Encoding

Positional encoding adds position information to each token's embedding so that word order is preserved; attention by itself has no notion of order.
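Here is the classic sinusoidal scheme from the original transformer paper, sketched in NumPy (one common approach; many modern models learn positional embeddings instead):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Classic sin/cos positional encodings from "Attention Is All You Need"."""
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
    angle_rates = 1 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])     # even dimensions get sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])     # odd dimensions get cosine
    return encoding

# Each row gets added to the corresponding token's embedding
print(sinusoidal_positional_encoding(seq_len=4, d_model=8))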


Attention (Self-Attention)

Self-attention is the core of the transformer architecture. It lets each token weigh how relevant every other token is to it. In "I ate an apple in the Apple store.", the first "apple" attends strongly to "ate", which is how the model tells the fruit apart from the company.
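Mechanically, each token's embedding is projected into query, key, and value vectors, and the attention weights come from a softmax over query-key dot products. A toy single-head sketch in NumPy (random weights, just to show the shapes and mechanics):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how much each token attends to each other token
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # weighted mix of value vectors

# Toy run: 4 "tokens" with 8-dimensional embeddings and random projection weights
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one contextualized vector per token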


Transformer Flow

The encoded embeddings pass through a stack of transformer layers, each applying multi-head self-attention followed by a feedforward network and normalization. The final layer outputs a contextual embedding vector for each token, which downstream tasks can use.


How do we generate a response from a user query?

Notion has documented how their RAG agent works. In short: generate an embedding for each page's content and store it in a vector database; at query time, embed the user's query, retrieve the most similar content from the database, and pass that content to an LLM to generate the response.


Step 2: Start Building

Get your API key from https://platform.openai.com/api-keys; a few cents of API usage covers this entire tutorial. Create a .env file in your project directory:

OPENAI_API_KEY=your-api-key-here

Export your Notion content, including all subpages, as HTML. In this example, I'm using an on-call runbooks table view with the incident details documented in subpages. If you don't already have one, you can use the free incident management template from The Stanford Daily.


Move the exported HTML files into a pages folder in your project directory.

Install Dependencies

pip install "psycopg[binary]" pgvector openai beautifulsoup4 python-dotenv

(The [binary] extra bundles libpq, so you don't need Postgres client libraries installed locally.)

Database Connection Function

A helper to connect to Postgres and register vector support.

import psycopg
from pgvector.psycopg import register_vector

def get_db_connection():
    """Creates a connection to PostgreSQL with vector support."""
    conn = psycopg.connect(
        host="localhost",
        dbname="postgres",
        user="postgres",
        password="postgres",
        port=5432
    )
    register_vector(conn)
    return conn

Extract Text from HTML

Notion exports include lots of markup. We need just the text:

from bs4 import BeautifulSoup

def parse_html_to_text(html_path):
    """Extracts clean text content from HTML file."""
    with open(html_path, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')
        body = soup.body
        if body:
            return body.get_text(separator='\n', strip=True)
        return ""

Ingest Embeddings

This is where we process your Notion pages and store their embeddings:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

HTML_DIR = "pages"

def ingest():
    """Reads Notion pages and stores them with embeddings in the database."""
    conn = get_db_connection()
    cur = conn.cursor()

    html_files = [f for f in os.listdir(HTML_DIR) if f.endswith('.html')]

    for index, html_file in enumerate(html_files, 1):
        print(f"Processing {index}/{len(html_files)}: {html_file}")

        html_path = os.path.join(HTML_DIR, html_file)
        text = parse_html_to_text(html_path)

        if text:
            # Generate embedding vector for the page content
            response = client.embeddings.create(
                model="text-embedding-3-small",
                input=[text]
            )
            embedding = response.data[0].embedding

            cur.execute("""
                INSERT INTO runbooks (content, embedding)
                VALUES (%s, %s)
            """, (text, embedding))

    conn.commit()
    cur.close()
    conn.close()
    print(f"✅ Ingested {len(html_files)} pages")
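One caveat: we embed each page as a single string, and text-embedding-3-small accepts at most 8,191 tokens per input, so a very long page will make the embedding call fail. If you hit that limit, split pages into chunks and insert one row per chunk. A minimal character-based sketch (chunk_text is a hypothetical helper, not part of the tutorial repo, and the sizes are rough guesses):

def chunk_text(text, max_chars=4000, overlap=200):
    """Naive fixed-size chunking with a little overlap between chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks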

Retrieve Similar Content (Search Embeddings)

Embed the query, then find the most similar content by cosine distance (pgvector's <=> operator).

def retrieve_similar_content(query_text, limit=3):
    """Finds pages most similar to the query using vector search."""
    conn = get_db_connection()
    cur = conn.cursor()

    # Convert query to embedding
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=[query_text]
    ).data[0].embedding

    # Find similar embeddings using cosine distance
    cur.execute("""
        SELECT content
        FROM runbooks 
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_embedding, limit))

    results = cur.fetchall()
    cur.close()
    conn.close()

    return results
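A quick aside on pgvector's operators: <=> is cosine distance (1 minus cosine similarity), <-> is Euclidean distance, and <#> is negative inner product. We order by <=> because the HNSW index was built with vector_cosine_ops, so Postgres can use it for this query. If you'd like to see how close each match is, a small variation of the query inside retrieve_similar_content returns the distance as well:

    cur.execute("""
        SELECT content, embedding <=> %s::vector AS cosine_distance
        FROM runbooks
        ORDER BY cosine_distance
        LIMIT %s
    """, (query_embedding, limit))

Distances near 0 mean near-identical meaning; values around 1 mean the texts are unrelated.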

Generate Response

With the relevant pages retrieved, we pass them to the LLM along with the user's query:

def generate_response(query, relevant_contexts):
    """Generates AI answer based on relevant page content."""
    context_str = "\n\n".join([
        f"Knowledge Base {i+1}:\n{content[0]}"
        for i, content in enumerate(relevant_contexts)
    ])

    response = client.responses.create(
        model="gpt-5-mini",
        input=f"""
  You are a helpful assistant. Use the following retrieved contexts to answer the user's query.
  If the contexts don't contain relevant information, say you don't know.

  Retrieved Knowledge Bases:
  {context_str}

  User Query: {query}

  Answer:
  """)

    return response.output_text

The Complete Script

Here's everything put together:

import psycopg
from pgvector.psycopg import register_vector
import os
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

HTML_DIR = "pages"

def get_db_connection():
    """Creates a connection to PostgreSQL with vector support."""
    conn = psycopg.connect(
        host="localhost",
        dbname="postgres",
        user="postgres",
        password="postgres",
        port=5432
    )
    register_vector(conn)
    return conn

def parse_html_to_text(html_path):
    """Extracts clean text content from HTML file."""
    with open(html_path, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')
        body = soup.body
        if body:
            return body.get_text(separator='\n', strip=True)
        return ""

def ingest():
    """Reads Notion pages and stores them with embeddings in the database."""
    conn = get_db_connection()
    cur = conn.cursor()

    html_files = [f for f in os.listdir(HTML_DIR) if f.endswith('.html')]

    for index, html_file in enumerate(html_files, 1):
        print(f"Processing {index}/{len(html_files)}: {html_file}")

        html_path = os.path.join(HTML_DIR, html_file)
        text = parse_html_to_text(html_path)

        if text:
            # Generate embedding vector for the page content
            response = client.embeddings.create(
                model="text-embedding-3-small",
                input=[text]
            )
            embedding = response.data[0].embedding

            cur.execute("""
                INSERT INTO runbooks (content, embedding)
                VALUES (%s, %s)
            """, (text, embedding))

    conn.commit()
    cur.close()
    conn.close()
    print(f"✅ Ingested {len(html_files)} pages")

def retrieve_similar_content(query_text, limit=3):
    """Finds pages most similar to the query using vector search."""
    conn = get_db_connection()
    cur = conn.cursor()

    # Convert query to embedding
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=[query_text]
    ).data[0].embedding

    # Find similar embeddings using cosine distance
    cur.execute("""
        SELECT content
        FROM runbooks 
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_embedding, limit))

    results = cur.fetchall()
    cur.close()
    conn.close()

    return results

def generate_response(query, relevant_contexts):
    """Generates AI answer based on relevant page content."""
    context_str = "\n\n".join([
        f"Knowledge Base {i+1}:\n{content[0]}"
        for i, content in enumerate(relevant_contexts)
    ])

    response = client.responses.create(
        model="gpt-5-mini",
        input=f"""
  You are a helpful assistant. Use the following retrieved contexts to answer the user's query.
  If the contexts don't contain relevant information, say you don't know.

  Retrieved Knowledge Bases:
  {context_str}

  User Query: {query}

  Answer:
  """)

    return response.output_text

if __name__ == "__main__":
    # Step 1: Ingest Notion pages (comment this out after first run)
    ingest()

    # Step 2: Query the knowledge base
    query = "What is our organization email?"
    similar_pages = retrieve_similar_content(query, limit=3)
    response = generate_response(query, similar_pages)

    print(f"\nQuery: {query}")
    print(f"\nAnswer:\n{response}")
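Save the script (I'll assume main.py, though any filename works) and run it:

python main.py

The first run ingests your pages; after that, comment out the ingest() call so you don't store duplicate rows on every query.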

You've just built a working RAG agent that can answer questions from your team's knowledge base. But this is just the beginning.

Try experimenting with different embedding models, adding memory to track conversation history, or filtering by document category. The possibilities are endless once you understand the fundamentals.


Have questions? Drop them in the comments. Thank you for reading! ❤️
