How I Built a Personal AI Knowledge Base with Amazon Aurora pgvector and Next.js — AWS H0 Hackathon

#aws #aurora #h0hackathon #pgvector

I built ChatScroll for the AWS H0 Hackathon — an app that
lets you save AI answers as searchable "Scrolls" using
Amazon Aurora PostgreSQL with pgvector for semantic search.

The Problem

Every day people ask AI assistants valuable questions and
get great answers — then lose them forever. Chat history
is linear, unsearchable, and ephemeral. I kept re-Googling
the same questions, knowing I had already found the answer
somewhere but unable to find it again.

The Solution

ChatScroll transforms AI conversations into a personal
knowledge library. Save any AI answer as a "Scroll",
organize it automatically, and find it later with
semantic search.

The Core Technical Challenge

The challenge was making search understand meaning, not just
keywords. A search for "blood thinner medication" should find
your warfarin Scroll even though the phrase "blood thinner"
appears nowhere in its title.

How pgvector on Aurora Solves This

Amazon Aurora PostgreSQL with the pgvector extension stores
3072-dimensional vector embeddings for every saved Scroll.

When a user saves a Scroll:

1. The answer text is sent to Google's gemini-embedding-001 model
2. The model returns a 3072-dimensional vector
3. The vector is stored in Aurora alongside the content
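
The save steps above can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not ChatScroll's actual code: the `Embedder` type, the `scrolls` table with `title`, `content`, and `embedding` columns, and the helper names are all hypothetical. The embedding call is passed in as a function so the sketch does not assume any particular SDK signature.

```typescript
// Hypothetical wrapper around whatever client calls gemini-embedding-001.
type Embedder = (text: string) => Promise<number[]>;

const EMBEDDING_DIMS = 3072;

interface InsertStatement {
  text: string;
  values: string[];
}

// pgvector accepts a vector as a text literal such as '[0.1,0.2,0.3]'.
function toVectorLiteral(values: number[]): string {
  if (values.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dimensions, got ${values.length}`);
  }
  for (const v of values) {
    if (!isFinite(v)) throw new Error("embedding contains a non-finite value");
  }
  return "[" + values.join(",") + "]";
}

// Builds a parameterized INSERT; table and column names are assumptions.
function buildScrollInsert(title: string, content: string, embedding: number[]): InsertStatement {
  return {
    text: "INSERT INTO scrolls (title, content, embedding) VALUES ($1, $2, $3::vector)",
    values: [title, content, toVectorLiteral(embedding)],
  };
}

// Embeds the answer text, then hands the statement to a caller-supplied runner
// (for example a thin wrapper around a PostgreSQL client).
async function saveScroll(
  title: string,
  content: string,
  embed: Embedder,
  run: (stmt: InsertStatement) => Promise<void>
): Promise<void> {
  const embedding = await embed(content);
  await run(buildScrollInsert(title, content, embedding));
}
```

Checking the dimension count before the insert surfaces a model or configuration mismatch in application code rather than as a database error.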

When a user searches:

1. The search query is converted to a vector by the same embedding model
2. Aurora finds the most similar vectors using cosine distance
3. Results are ranked by semantic similarity

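-- Semantic search with a similarity threshold.
-- Table and column names (scrolls, id, title) are illustrative; $1 is the query vector.
SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity
FROM scrolls
WHERE 1 - (embedding <=> $1::vector) > 0.5
ORDER BY embedding <=> $1::vector
LIMIT 5;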
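
To make the 0.5 threshold concrete, here is what the query above computes, written out in plain TypeScript. This is only an illustration: in the app the database does this work, and the names `cosineDistance`, `rankBySimilarity`, and `Candidate` are hypothetical. pgvector's `<=>` operator returns cosine distance, so similarity is one minus that value.

```typescript
// Cosine distance: 1 - cos(angle between a and b). Assumes non-zero vectors.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  id: string;
  embedding: number[];
}

// Mirrors the SQL: keep rows with similarity above the threshold,
// order by closeness, and return at most `limit` rows.
function rankBySimilarity(
  query: number[],
  items: Candidate[],
  threshold = 0.5,
  limit = 5
): { id: string; similarity: number }[] {
  return items
    .map((item) => ({ id: item.id, similarity: 1 - cosineDistance(query, item.embedding) }))
    .filter((row) => row.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

With a query of `[1, 0]`, a candidate `[1, 1]` has similarity of about 0.71 and passes the threshold, while an orthogonal candidate `[0, 1]` has similarity 0 and is dropped.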

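Three PostgreSQL Features Working Together

Aurora PostgreSQL suits this use case because three features
work together in a single database. Two are extensions
(pgvector and ltree); the third, tsvector, is PostgreSQL's
built-in full-text search type: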

pgvector — stores 3072-dim embeddings, enables cosine
similarity search between vectors

ltree — stores folder paths as dot-separated label trees
(programming.containers), enables subtree queries without
recursive CTEs
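
As a rough illustration of the folder paths, the sketch below turns folder names into an ltree path and mimics the subtree test in memory. The helper names are hypothetical, and the label rules are deliberately conservative (lowercase letters, digits, and underscores only), which is an assumption rather than the app's actual normalization.

```typescript
// Converts a folder name to a conservative ltree label.
function toLabel(name: string): string {
  const label = name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse anything else into one underscore
    .replace(/^_+|_+$/g, "");    // trim leading and trailing underscores
  if (label.length === 0) throw new Error(`cannot derive a label from "${name}"`);
  return label;
}

// Joins labels with dots, e.g. ["Programming", "Containers"] gives
// "programming.containers".
function toLtreePath(folders: string[]): string {
  return folders.map((folder) => toLabel(folder)).join(".");
}

// In-memory equivalent of the ltree test `path <@ ancestor`:
// true when path equals ancestor or sits anywhere below it.
function isInSubtree(path: string, ancestor: string): boolean {
  return path === ancestor || path.indexOf(ancestor + ".") === 0;
}
```

In SQL the same subtree check is a single predicate such as `WHERE folder_path <@ 'programming'`, where `folder_path` is an assumed column name.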

tsvector — powers full-text search with ranking via
ts_rank, combined with pgvector for hybrid search
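
The post does not say how the two rankings are merged, so the following is only one possible approach: reciprocal rank fusion, which combines result lists by position rather than by raw score. That is convenient here because ts_rank values and cosine similarities are not on the same scale. The function name and the constant `k = 60` are assumptions for illustration.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal rank fusion: each list contributes 1 / (k + rank) for every id,
// with rank starting at 1. Ids that appear in several lists accumulate score.
function reciprocalRankFusion(lists: string[][], k = 60): FusedResult[] {
  const scores: Record<string, number> = {};
  for (const list of lists) {
    list.forEach((id, index) => {
      scores[id] = (scores[id] ?? 0) + 1 / (k + index + 1);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}
```

Here the first input list would hold Scroll ids ordered by vector distance and the second the ids ordered by ts_rank. An id found by both searches rises above ids found by only one.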

The Dual Database Architecture

I made a deliberate choice to use TWO AWS databases:

Amazon Aurora PostgreSQL for structured data:

Scrolls with embeddings
Folder hierarchy (ltree)
User accounts (Cognito sub)
Conversation metadata

Amazon DynamoDB for chat messages:

PK: conversationId
SK: timestamp#messageId
TTL: 90-day auto-expiry
PAY_PER_REQUEST billing

This separation keeps Aurora lean for complex queries
while DynamoDB handles the high-volume chat stream.
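
A chat message item following that key design might be built like this. It is a sketch under assumptions: the attribute names (`conversationId`, `sortKey`, `expiresAt`), the `role` field, and the ISO-8601 timestamp format are all hypothetical. What DynamoDB does require is that the TTL attribute be a number holding a Unix epoch time in seconds.

```typescript
const NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60;

interface ChatMessageItem {
  conversationId: string; // partition key
  sortKey: string;        // sort key: timestamp#messageId
  role: "user" | "assistant";
  content: string;
  expiresAt: number;      // TTL attribute, Unix epoch seconds
}

function buildChatMessageItem(
  conversationId: string,
  messageId: string,
  role: "user" | "assistant",
  content: string,
  sentAt: Date
): ChatMessageItem {
  // ISO-8601 UTC strings sort lexicographically in chronological order,
  // so a range query on the sort key returns messages in send order.
  const timestamp = sentAt.toISOString();
  return {
    conversationId,
    sortKey: `${timestamp}#${messageId}`,
    role,
    content,
    expiresAt: Math.floor(sentAt.getTime() / 1000) + NINETY_DAYS_SECONDS,
  };
}
```

Appending the message id to the timestamp keeps the sort key unique even when two messages share the same millisecond.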

The Result

Searching "containerization technology" correctly surfaces
the Docker Scroll. Searching "blood thinner medication"
finds the warfarin Scroll, with no programming results mixed in.

Scoping semantic search to a single folder category keeps
results from unrelated topics out.
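
One way to express that folder scoping is to add an ltree subtree predicate to the similarity query. The builder below is a hypothetical sketch: the `scrolls` table, its `folder_path` column, and the function name are assumptions, and the real app may scope results differently.

```typescript
interface SqlQuery {
  text: string;
  values: (string | number)[];
}

// queryVector is a pgvector text literal such as "[0.1,0.2,...]";
// folderPath is an ltree path such as "health.medication".
function buildScopedSearch(
  queryVector: string,
  folderPath: string,
  threshold = 0.5,
  limit = 5
): SqlQuery {
  return {
    text: [
      "SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity",
      "FROM scrolls",
      "WHERE folder_path <@ $2::ltree", // only this folder and its descendants
      "  AND 1 - (embedding <=> $1::vector) > $3",
      "ORDER BY embedding <=> $1::vector",
      "LIMIT $4",
    ].join("\n"),
    values: [queryVector, folderPath, threshold, limit],
  };
}
```

Because the folder filter and the vector comparison run in the same statement, a medical query never has to be compared against Scrolls filed under programming.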