Why Twio Chose Vertex AI Search over pgvector for Production RAG

#ai #architecture #database #rag

When we first built RAG at Twio, pgvector was the obvious pick. Our business data was already in PostgreSQL, and dropping embeddings into the same database was the fastest path to a working product.

For the first version, that was right. As we scaled, the problem stopped being "how do we store vectors?" and became "how do we reliably understand thousands of broker documents, emails, and attachments in production?" That changed the answer. Today, Vertex AI Search is our main retrieval layer.

RAG is Twio's memory layer, not a search feature

Twio is an AI SaaS for loan brokers. A single client case is a mess of fragmented information:

email threads
payslips, bank statements, identity documents
loan forms, lender requirements
handwritten notes, follow-up emails, missing-document requests

The AI needs to answer questions like:

What documents has this client already sent?
Which email mentioned the missing requirement?
Does this bank statement support the income claim?
Summarize all documents related to this borrower.

If retrieval is weak, the answer is weak. If indexing lags, context is missing. If parsing is wrong, the model sees the wrong evidence. RAG isn't a feature on the side — it's the memory layer of the product.

Why pgvector was the right first choice

Twio is a multi-tenant SaaS, so retrieval can't just return "similar content" — it has to return similar content scoped to the right user, client, application, or file. pgvector made that trivial: embeddings sat next to the business records, joined cleanly, and filtered with plain SQL.
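
A minimal sketch of that tenant-scoped lookup, assuming a hypothetical `document_chunks` table with `tenant_id`, `client_id`, and `embedding` columns (none of these names come from Twio's schema). It uses pgvector's `<=>` cosine-distance operator and keeps the scoping in ordinary `WHERE` clauses:

```python
from typing import Optional, Sequence


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_scoped_query(
    tenant_id: str,
    query_embedding: Sequence[float],
    client_id: Optional[str] = None,
    limit: int = 5,
) -> tuple[str, list]:
    """Return (sql, params) for a tenant-scoped nearest-neighbour search.

    Table and column names are hypothetical. Placeholders use the %s style
    accepted by psycopg; values are never interpolated into the SQL string.
    """
    where = ["tenant_id = %s"]
    params: list = [tenant_id]
    if client_id is not None:
        where.append("client_id = %s")
        params.append(client_id)

    sql = (
        "SELECT id, content "
        "FROM document_chunks "
        f"WHERE {' AND '.join(where)} "
        # <=> is pgvector's cosine-distance operator (smaller is closer).
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # Parameter order follows placeholder order: filters, vector, limit.
    params.extend([to_pgvector_literal(query_embedding), limit])
    return sql, params
```

The returned pair can be passed to a database cursor as `cursor.execute(sql, params)`. The tenant filter is always present, so a caller cannot accidentally run an unscoped search.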

The early wins were real:

no new infrastructure
low cost, easy local dev
SQL inspection for debugging
straightforward metadata filtering
fast to ship

It let us ship the first version quickly and learn from actual usage, which is easy to undervalue when choosing infrastructure.

Where pgvector stopped paying off

pgvector didn't fail. It did exactly what it's designed for. The issue was that vector storage is only one slice of the RAG pipeline, and pgvector left every other slice to us:

download attachments
extract text from PDFs, run OCR on scans
chunk documents, generate embeddings
design metadata, build retrieval queries
tune indexes, improve ranking
monitor Postgres load, debug retrieval quality
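
To make one of those stages concrete, here is a minimal sketch of fixed-size chunking with overlap. The sizes are illustrative assumptions, not Twio's settings, and a production pipeline would more likely split on sentence or layout boundaries than on raw character offsets:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so the tail is not emitted again as a shorter duplicate chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Even this small function has tunable parameters that affect retrieval quality, which is part of why owning the whole pipeline adds up.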

A clean PDF is easy. A scanned bank statement isn't. An email body is easy. An email with five attachments, lender forms, tables, and partial OCR isn't. A demo dataset is easy. A real broker workspace with years of historical emails isn't.

With pgvector, every weakness in that pipeline was ours to fix. When retrieval quality dropped, the suspect list ran all the way from OCR through chunking and embedding to vector distance, SQL filtering, ranking, and DB performance. The extension is simple. The production RAG system around it isn't.
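
One way to narrow that suspect list is to score the retrieval stage in isolation against a small hand-labeled set of questions. A minimal sketch of recall@k follows; the document IDs and the labeled set are hypothetical:

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # nothing to find, so report zero rather than divide by zero
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)


# Hypothetical labeled example: three documents are relevant to the question.
score = recall_at_k(
    retrieved_ids=["doc_a", "doc_b", "doc_c", "doc_d"],
    relevant_ids={"doc_b", "doc_d", "doc_z"},
    k=3,
)
```

If recall@k is already low, the problem likely sits in parsing, chunking, embedding, or filtering. If it is high and answers are still poor, ranking and what happens after retrieval are the better places to look.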

The cost shifted from cloud bill to engineering time — and engineering time was the constrained resource.

pgvector vs Vertex AI Search, in Twio's terms

Scenario	pgvector	Vertex AI Search
Clean text PDF	We own extraction, chunking, embedding, storage, search	Vertex handles most of the indexing and retrieval workflow
Scanned document	We build or integrate OCR ourselves	Vertex absorbs much of the document-processing logic
Broker asks a document question	We own query design, ranking, filtering	Managed search with stronger out-of-the-box quality
Attachment bursts	Postgres carries more search and indexing load	Search workload lives outside the main database
Debugging	Excellent SQL visibility, but many custom layers to inspect	Less low-level control, but far less custom infra to debug
Cost	Lower direct service cost	Higher service cost, lower engineering and maintenance cost
Production readiness	Significant custom work required	Easier to operate as a managed layer

pgvector was cheaper as a database extension. Vertex is cheaper as a product decision. The cloud bill is one input; engineering time, reliability, and iteration speed are the bigger ones at our stage.
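
The comparison comes down to simple arithmetic: total monthly cost is the service bill plus the engineering hours spent keeping the pipeline healthy. A sketch with placeholder numbers follows. Every figure below is invented for illustration and is not Twio's data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalOption:
    name: str
    service_cost_per_month: float       # direct cloud or service bill
    maintenance_hours_per_month: float  # engineering time to keep it healthy

    def total_cost(self, hourly_rate: float) -> float:
        """Service bill plus engineering time priced at hourly_rate."""
        return self.service_cost_per_month + self.maintenance_hours_per_month * hourly_rate


def break_even_hours(managed: RetrievalOption, self_managed: RetrievalOption,
                     hourly_rate: float) -> float:
    """Engineering hours per month the managed option must save to pay for itself."""
    extra_service_cost = managed.service_cost_per_month - self_managed.service_cost_per_month
    return extra_service_cost / hourly_rate


# Placeholder inputs, chosen only to show the shape of the calculation.
HOURLY_RATE = 80.0
self_managed = RetrievalOption("self-managed pipeline", 100.0, 40.0)
managed = RetrievalOption("managed search", 1000.0, 5.0)

self_managed_total = self_managed.total_cost(HOURLY_RATE)
managed_total = managed.total_cost(HOURLY_RATE)
hours_needed = break_even_hours(managed, self_managed, HOURLY_RATE)
```

With real inputs, the managed option wins whenever the engineering hours it saves exceed `hours_needed`.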

Why Vertex fits Twio's shape of problem

Twio's RAG problem is document-heavy. We aren't searching short snippets — we're dealing with messy broker PDFs, scans, forms, tables, and forwarded attachments. Vertex helps in four concrete ways:

Less infrastructure to own. Indexing and retrieval are handled by the managed layer, so we don't rebuild that surface.
Less document-processing logic to maintain. OCR and parsing for messy broker files are among the harder parts of the pipeline to keep healthy. Vertex covers much of that work.
Postgres stays focused on what it's good at — business data, transactions, workflow state — instead of competing with OLTP work for the same resources.
It scales with document volume without adding load to the primary database.
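
As a rough sketch of what a scoped query against the managed layer can look like, the snippet below builds a search request for the `google-cloud-discoveryengine` Python client. The `tenant_id` metadata field, the resource IDs, and the helper names are assumptions for illustration; nothing here reflects Twio's actual configuration.

```python
def build_search_kwargs(
    project_id: str,
    location: str,
    engine_id: str,
    tenant_id: str,
    query: str,
    page_size: int = 10,
) -> dict:
    """Build keyword arguments for a tenant-scoped SearchRequest.

    Assumes each indexed document carries a filterable `tenant_id` field
    (a hypothetical schema choice).
    """
    serving_config = (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/engines/{engine_id}"
        "/servingConfigs/default_config"
    )
    # Escape backslashes and quotes so the value cannot break out of the filter string.
    safe_tenant = tenant_id.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "serving_config": serving_config,
        "query": query,
        "page_size": page_size,
        "filter": f'tenant_id: ANY("{safe_tenant}")',
    }


def search(kwargs: dict) -> list:
    """Run the search. Requires google-cloud-discoveryengine and credentials."""
    # Imported lazily so the request builder works without the client library.
    from google.cloud import discoveryengine_v1 as discoveryengine

    # Non-global locations also need a regional api_endpoint; omitted here.
    client = discoveryengine.SearchServiceClient()
    request = discoveryengine.SearchRequest(**kwargs)
    return list(client.search(request))
```

Keeping the filter construction in one helper mirrors the role the SQL `WHERE` clause played with pgvector: every query passes through the same scoping code.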

Vertex isn't free, but the alternative isn't either. Building OCR, indexing, ranking, monitoring, and tuning ourselves has its own bill — paid in engineer-weeks.

What pgvector still does well

pgvector is still a strong choice when:

data volume is moderate
you're already on Postgres and want retrieval close to your data
your documents are already clean text
you need tight SQL filtering and full control
you want a fast, low-cost first version

For us, it was the right first implementation, and it taught us what kind of retrieval the product actually needed. It may stay in the stack for internal or fallback use cases.

Takeaway

The lesson from Twio's RAG evolution is simple:

Start with the tool that helps you learn fastest. Move to the tool that helps you operate best.

pgvector got us to a working RAG system quickly. As the product matured, the real challenge shifted to document processing, indexing quality, and operational reliability — and at that point, Vertex AI Search became the better fit. It costs more as a service and less as a system to maintain. For a SaaS at Twio's stage, that's the trade that matters.