
马国锦

Posted on Jun 11

Build Your RAG System Right the First Time: 6 Decisions That Make or Break It

#rag #ai #machinelearning #tutorial

After debugging 20+ broken RAG systems, I have identified the 6 decisions that determine whether yours works.

Decision 1: Embedding Model

Language	Use This
Chinese	BAAI/bge-large-zh-v1.5
Chinese + English	BAAI/bge-m3
English	text-embedding-3-large

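Non-negotiable: the indexing model and the query model must be the exact same checkpoint and version. Vectors from different models live in different spaces, so similarity scores between them are meaningless.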
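One way to enforce this rule is to store the model name next to the index and refuse any vector that comes from a different model. The sketch below is a toy, in-memory illustration; the `VectorIndex` class and its methods are hypothetical names, not part of any vector database API.

```python
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Toy index that remembers which embedding model produced its vectors."""
    model_name: str                      # e.g. "BAAI/bge-m3"
    dim: int                             # embedding dimension of that model
    vectors: list = field(default_factory=list)

    def add(self, vector, model_name):
        # Reject vectors that were produced by a different model.
        self._check(vector, model_name)
        self.vectors.append(list(vector))

    def check_query(self, vector, model_name):
        # Same check at query time: a different model means incomparable vectors.
        self._check(vector, model_name)

    def _check(self, vector, model_name):
        if model_name != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, got {model_name!r}"
            )
        if len(vector) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vector)}")
```

A dimension check alone is not enough, since two different models can share the same output dimension. That is why the model name is compared first.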

Decision 2: Chunk Size

Document Type	Sweet Spot	Overlap
FAQ	128-256	20
Technical docs	512	50
Long-form	768-1024	100

Use recursive splitting (paragraphs first, then sentences, then words), not fixed-length cuts, so chunks break at natural boundaries.
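A minimal sketch of recursive splitting with overlap, in plain Python. It measures size in characters, which is an assumption (the table above does not state its unit), and the function names and separator list are illustrative rather than taken from any library.

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_len characters.

    Tries the coarsest separator first and only falls back to finer ones
    for pieces that are still too long. Separators at chunk boundaries
    are dropped.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        return recursive_split(text, max_len, finer)
    chunks, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            current = candidate          # keep packing parts into this chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_len:
            current = part
        else:
            chunks.extend(recursive_split(part, max_len, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def split_with_overlap(text, chunk_size, overlap):
    """Recursive split, then prepend the tail of the previous piece."""
    pieces = recursive_split(text, chunk_size - overlap)
    chunks = []
    for i, piece in enumerate(pieces):
        prefix = pieces[i - 1][-overlap:] if i > 0 and overlap > 0 else ""
        chunks.append(prefix + piece)    # never longer than chunk_size
    return chunks
```

For technical docs, the table's row would translate to something like `split_with_overlap(doc, chunk_size=512, overlap=50)`. A production splitter would usually count tokens with the embedding model's tokenizer instead of characters.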

Decision 3: Index Type — HNSW vs IVF

Scale	Use
< 1M vectors	HNSW (recall > 0.95)
1-5M, RAM tight	IVF + PQ
> 5M	IVF + PQ + Sharding
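To make "RAM tight" concrete, here is a rough memory estimate for the two index families, plus the table encoded as a function. The per-vector formulas are common rules of thumb (float32 vectors plus about `M * 2 * 4` bytes of graph links for HNSW; a PQ code plus an 8-byte ID for IVF + PQ), and they ignore centroids, codebooks, and metadata. Choosing HNSW for the 1-5M range when RAM is not tight is my assumption; the table does not cover that case. All function and parameter names are hypothetical.

```python
def hnsw_bytes(n_vectors, dim, m=16):
    """Approximate HNSW memory: raw float32 vectors plus graph links."""
    return n_vectors * (dim * 4 + m * 2 * 4)


def ivfpq_bytes(n_vectors, pq_bytes_per_vector=64, id_bytes=8):
    """Approximate IVF + PQ memory: one compressed code and one ID per vector."""
    return n_vectors * (pq_bytes_per_vector + id_bytes)


def choose_index(n_vectors, ram_tight=False):
    """Encode the scale table above as a simple rule."""
    if n_vectors > 5_000_000:
        return "IVF + PQ + Sharding"
    if n_vectors >= 1_000_000 and ram_tight:
        return "IVF + PQ"
    # Assumption: without RAM pressure, HNSW is still fine in the 1-5M range.
    return "HNSW"


if __name__ == "__main__":
    n, dim = 5_000_000, 1024
    print(f"HNSW:     {hnsw_bytes(n, dim) / 1e9:.2f} GB")
    print(f"IVF + PQ: {ivfpq_bytes(n) / 1e9:.2f} GB")
    print(choose_index(n, ram_tight=True))
```

Under these assumptions, 5M vectors at 1024 dimensions come to roughly 21 GB for HNSW versus roughly 0.36 GB for IVF + PQ, which is why compression becomes attractive as the collection grows.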

Decision 4: Metadata

Without metadata filtering, every query searches the entire collection. A filter such as department=engineering AND date > 2024-01-01 can shrink the search space, for example from 5M vectors to 50K.
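The idea can be sketched as a pre-filter followed by similarity search over the survivors. This is a brute-force, in-memory illustration; the chunk layout (`department`, `date`, `vector` keys) and the `search` function are assumptions, and a real vector database would apply the filter inside its own query API.

```python
import math
from datetime import date


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_vec, top_k=3, department=None, after=None):
    """Filter on metadata first, then rank only the remaining chunks."""
    candidates = [
        c for c in chunks
        if (department is None or c["department"] == department)
        and (after is None or c["date"] > after)
    ]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:top_k]
```

A call like `search(chunks, q, department="engineering", after=date(2024, 1, 1))` mirrors the filter in the text. The metadata has to be attached at indexing time, so this decision needs to be made before the first document is ingested.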

Decision 5: Deduplication — Do It Twice

Document-level: MinHash + LSH, threshold 0.85
Chunk-level: SimHash, threshold 0.95
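Both fingerprints can be sketched in pure Python. Several things here are assumptions: MinHash uses seeded hashes over 5-character shingles, SimHash similarity is defined as the fraction of matching bits out of 64, and signatures are compared pairwise with no LSH bucketing, which you would add at scale. Function names are illustrative.

```python
import hashlib


def _hash64(text, seed=0):
    """Stable 64-bit hash (unlike built-in hash(), same across runs)."""
    data = f"{seed}:{text}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def shingles(text, k=5):
    """Set of overlapping k-character substrings of the normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(text, num_perm=128):
    """MinHash signature: the minimum hash per seed over all shingles."""
    sh = shingles(text)
    return [min(_hash64(s, seed) for s in sh) for seed in range(num_perm)]


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def simhash(text, bits=64):
    """SimHash fingerprint built from whitespace-separated tokens."""
    counts = [0] * bits
    for token in text.lower().split():
        h = _hash64(token)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def simhash_similarity(a, b, bits=64):
    """1.0 for identical fingerprints, lower as more bits differ."""
    return 1 - bin(a ^ b).count("1") / bits
```

With these helpers, two documents would count as duplicates when `minhash_similarity(...) >= 0.85`, and two chunks when `simhash_similarity(...) >= 0.95`, matching the thresholds above.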

Decision 6: Query Processing

Technique	When
Query rewriting	Short/fuzzy queries
HyDE	Factual QA
RRF fusion	Semantic + exact-match
Cross-Encoder rerank	Post-retrieval

Minimum viable stack: Query rewriting + Cross-Encoder rerank.
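RRF (reciprocal rank fusion) is the easiest of these to show in code: each document scores the sum of 1 / (k + rank) across the rankings it appears in. The sketch below assumes two ranked lists of document IDs, one from semantic search and one from exact-match search. The constant k = 60 is the commonly used default, and the function name is illustrative.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns document IDs ordered by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    semantic = ["a", "b", "c"]   # hypothetical vector-search result
    keyword = ["c", "a", "d"]    # hypothetical exact-match result
    print(rrf_fuse([semantic, keyword]))  # ['a', 'c', 'b', 'd']
```

Because RRF only looks at ranks, it sidesteps the problem that cosine scores and keyword scores are on different scales. The fused list is then a natural input for the Cross-Encoder reranker.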

Optimization Priority

1. Is the embedding model language-appropriate?
2. Is the chunk size reasonable (256-768)?
3. Are you deduplicating?
4. Add query rewriting
5. Add Cross-Encoder reranking
6. Add metadata filtering

