Turning a Pile of Saved Links Into a Queryable Knowledge Hub (.NET + pgvector)

#ai #postgres #webdev #productivity

You save things for months — articles, videos, threads that explained exactly the thing you were stuck on. Your library fills up. And then, when you actually need something, you Google it from scratch anyway, because finding it in your own saves feels harder than searching the whole internet again.

That's the gap I wanted to close: turn a passive pile of bookmarks into a knowledge hub you can just ask. Type a question in plain language, get a real answer, with links back to the exact saves it came from.

Here's how it's built.

Disclosure: I'm the founder of SavePosty, a read-it-later app, and this is a write-up of a real feature in our own .NET codebase — not a sponsored post. Every detail below is from production.

The shape of it

The hub is two pipelines that share one index. Ingest runs once per save and builds the searchable representation. Query runs every time you ask a question and reads that representation live.

Nothing here is a separate search product. It's all .NET on the Postgres I already run — which turns out to be the most important design decision.

Storage: the hub lives in Postgres

The popular instinct is to reach for a dedicated vector database. I used pgvector instead — a Postgres extension that adds a vector column type and distance operators (cosine, L2, inner product).

Why it's the right call at my scale:

Every save already lives in Postgres. A similarity search is one SQL query joined on user_id — no second datastore, no sync job, no "did the vector store drift from the relational store?" class of bug.
An HNSW index turns that search from O(n) brute force (seconds at half a million vectors) into O(log n) graph traversal (milliseconds). It's approximate, but "close enough" is exactly what semantic search wants.

The honest tradeoff: the HNSW index is memory-hungry — it wants RAM as your vector count grows. That's fine per user; the real conversation starts at platform scale, and by then I'll be deciding with production numbers instead of a guess.

Embeddings: the provider is config, not code

An embedding is just an array of floats that captures the meaning of a chunk of text. The model that produces them is the one piece I deliberately did not hardcode.

I started on self-hosted Ollama (nomic-embed-text, zero per-call cost) and later moved to OpenAI's text-embedding-3-small. The migration was zero code: the embedding provider is a capability mapping in an admin UI, so I changed Embedding → provider and re-indexed. If prices change or a better model ships, I flip it back the same way.

The ingest details that matter: text is chunked into ~200-word windows with ~40-word overlap (so an idea that straddles a boundary is still retrievable), and an EmbeddedAt timestamp gates visibility — the hub never answers from a half-indexed save.

The agent: it doesn't just retrieve, it acts

A plain RAG chain is a fixed pipeline: embed query → retrieve top-k → stuff into prompt → answer. It works. But I wanted the hub to do things, so I gave it an agent with tools.

The agent has search_my_saves() and create_collection(), and it decides when to use them. So it can skip searching when it already knows, reformulate a vague question, and — the fun part — act on the results: ask it to "group everything I saved about RAG" and it searches, then builds a collection from what it found. Retrieval plus action, from one request.

The cost is real (more tokens, slightly less determinism), so I split the work: a stronger model (claude-sonnet-4-6 via OpenRouter) drives the agent, while a cheap one (claude-haiku-4-5) handles quick one-shot questions for next to nothing.

Streaming: the answer arrives as it forms

Answers stream back over Server-Sent Events, with typed events the UI reacts to.

thinking, text_chunk, citation, collection_created — that's the whole protocol. I chose SSE over WebSockets on purpose: the data flows one direction (server → client), so a stateful bidirectional socket would only add complexity. SSE is HTTP-native, reconnects cleanly, and the typed events are what make the live "thinking" indicator and inline citations possible.

What I'd reach for next

Hybrid search (BM25 + vector) to catch the exact-keyword queries pure semantic search occasionally fumbles.
Semantic chunking instead of fixed windows, for tighter retrieval.
Embedding-cache invalidation so editing a save re-embeds only what changed.

The big lesson: a "knowledge hub" sounds like it needs heavy infrastructure, and it mostly doesn't. Postgres you already run, a swappable embedding provider, an agent with two tools, and a streaming endpoint get you a long way.

I'm building SavePosty in the open. If the internals are your thing, come follow the build — and tell me what you'd do differently in the comments.