Building a Retrieval‑Augmented‑Generation stack still feels like LEGO on hard mode: too many bricks, not enough labels.
So I stole a page from Feynman: break the system into “atoms,” keep only one killer advantage and one honest gotcha per tool, and let you mix and match. Grab one block from each column and you’ve got a working RAG pipeline.
🧱 Data Sources & Content Prep
- Common Crawl – nonprofit snapshot of the open web; + free 250 B‑page corpus, – raw HTML is very noisy
- Apache Tika – Swiss‑army text extractor; + handles 1 000+ file types, – Java dependencies add weight
- Unstructured – Python ETL that chunks complex docs; + turns PDFs into LLM‑friendly bits, – API shifts fast
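Whatever extractor you choose, the prep step that matters most for retrieval quality is chunking. A minimal, dependency‑free sketch of the fixed‑window-with-overlap strategy these tools automate (the sizes here are illustrative, not recommendations):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap,
    so no sentence is stranded at a chunk boundary."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "word " * 300  # stand-in for text extracted by Tika/Unstructured
pieces = chunk_text(doc, size=500, overlap=50)
print(len(pieces))
```

Real pipelines usually chunk by tokens or by document structure (headings, paragraphs) rather than raw characters, but the overlap idea is the same.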
🧭 Embeddings
- Sentence‑Transformers – SBERT family of open‑source encoders; + can fine‑tune locally, – slower than one‑shot API vectors
- OpenAI Embeddings – pay‑per‑call vectors; + one‑line rollout, – vendor lock‑in & cost
- Cohere Embed – multilingual / multimodal embeddings; + strong non‑English support, – closed‑source API only
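Whichever encoder you pick, downstream comparison is the same cosine similarity over the vectors it emits. A toy, dependency‑free version (the 3‑dim vectors are made up; real models emit hundreds to thousands of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

q  = [0.1, 0.9, 0.2]   # toy query embedding
d1 = [0.1, 0.8, 0.3]   # semantically close document
d2 = [0.9, 0.1, 0.0]   # unrelated document
print(cosine(q, d1) > cosine(q, d2))  # → True
```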
🗄️ Vector Stores
- Pinecone – managed vector database; + zero ops, – gets pricey at scale
- Faiss – C++/GPU similarity‑search library; + blazing local speed, – DIY sharding & infra
- Qdrant – Rust‑based engine with filters; + fast & simple, – smaller community
- Weaviate – GraphQL‑native hybrid search; + combines keyword+vector, – memory‑hungry at scale
- Milvus – cloud‑native cluster‑scale store; + billion‑scale horizontal scaling, – helm‑chart complexity
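Under the hood, the simplest of these (a Faiss flat index) is just exact nearest‑neighbour search over every stored vector. A sketch of that brute‑force core, before you reach for ANN structures or sharding:

```python
import heapq
import math

def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 2):
    """Exact nearest-neighbour search by cosine score over a tiny in-memory
    'index' of (doc_id, vector) pairs."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    return heapq.nlargest(k, ((cos(query, v), doc_id) for doc_id, v in index))

index = [("doc1", [1.0, 0.0]), ("doc2", [0.7, 0.7]), ("doc3", [0.0, 1.0])]
print(top_k([1.0, 0.1], index, k=2))  # doc1 first, then doc2
```

The managed stores above add what this sketch lacks: persistence, metadata filters, approximate indexes, and horizontal scaling.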
🧠 Large Language Models
- GPT‑4o – OpenAI’s flagship multimodal LLM; + tops most benchmarks, – highest token cost
- Llama 3 – Meta’s permissively‑licensed weights; + self‑host friendly, – trails GPT‑4o on code
- Mixtral 8×7B – sparse‑mixture MoE rocket; + GPT‑3.5‑class quality with fast sparse inference, – big MoE weight files
- DeepSeek‑LLM – bilingual 2 T‑token model; + English‑Chinese fluency, – EU data‑privacy questions
- Grok 4 – xAI’s real‑time reasoning model; + built‑in web search, – premium subscription wall
- Claude 3 – Anthropic’s alignment‑first family; + 200 K context window, – rate‑limited free tier
🔗 RAG Orchestration
- LangChain – LEGO‑style building blocks for chains & agents; + huge ecosystem, – can feel heavyweight
- LlamaIndex – data‑centric RAG graphs; + 100+ data loaders, – API churn
- Haystack – node‑based pipelines with built‑in eval; + first‑class RAG evaluation, – verbose YAML configs
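Strip away the framework machinery and a RAG chain is retrieve → stuff context into a prompt → call the model. A minimal sketch with a toy keyword retriever standing in for the vector‑store lookup (the corpus and retriever are illustrative, not any framework's API):

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Toy keyword retriever standing in for a vector-store lookup."""
    scored = sorted(
        corpus.items(),
        key=lambda kv: sum(w in kv[1].lower() for w in query.lower().split()),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(query: str, passages: list[str]) -> str:
    """The 'stuff' strategy: paste retrieved context above the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = {
    "a": "Qdrant is a Rust-based vector database.",
    "b": "Streamlit builds data apps in Python.",
}
prompt = build_prompt("What is Qdrant?", retrieve("What is Qdrant?", corpus))
# `prompt` would then go to your LLM of choice (Llama 3, GPT-4o, ...)
print(prompt)
```

LangChain, LlamaIndex, and Haystack all wrap this same loop; what you pay for is their loaders, retrievers, and eval plumbing.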
🏷️ Re‑ranking & Evaluation
- BGE‑Reranker – tiny cross‑encoder for relevance boost; + state‑of‑the‑art relevance gains, – adds latency
- Cohere Rerank – drop‑in API reordering; + one‑line integration, – paid quota
- ColBERT – late‑interaction bi‑encoder; + scales to 100 M docs, – GPU hungry
- FlashRank – super‑lite LLM ranker; + CPU friendly, – very new project
- RAGAS – automated RAG metrics dashboards; + generates test sets, – metrics still evolving
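All of these rerankers share one shape: score each (query, passage) pair, then reorder the first‑stage hits. A sketch with a word‑overlap stub in place of the cross‑encoder forward pass (the stub is mine, purely for illustration):

```python
def rerank(query: str, passages: list[str], score_fn) -> list[str]:
    """Re-order first-stage retrieval hits by a pairwise relevance score.
    score_fn stands in for a cross-encoder like BGE-Reranker."""
    return sorted(passages, key=lambda p: score_fn(query, p), reverse=True)

def overlap(q: str, p: str) -> float:
    """Stub scorer: fraction of query words appearing in the passage."""
    qs, ps = set(q.lower().split()), set(p.lower().split())
    return len(qs & ps) / max(len(qs), 1)

hits = [
    "The weather is nice today.",
    "Pinecone is a managed vector database.",
    "Milvus scales to billions of vectors.",
]
print(rerank("managed vector database", hits, overlap))  # Pinecone line first
```

Swapping `overlap` for a real cross‑encoder is the whole upgrade; the reorder logic stays identical.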
🚢 Deploy & Safety
- Kubernetes – the de‑facto container orchestrator; + autoscaling pods, – steep ops learning curve
- OpenFaaS – serverless on your own K8s; + function‑first DX, – cold‑start lag
- Guardrails AI – schema‑based output validators; + easy policy checks, – can over‑filter creativity
- NeMo Guardrails – Nvidia hallucination fence; + programmable dialogue rules, – CUDA bias
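The core idea behind the guardrail tools is a gate between the model and your app: reject output that doesn't match a declared schema. A minimal stdlib sketch of that pattern (not the Guardrails AI API itself):

```python
import json

def validate_output(raw: str, required: dict[str, type]) -> dict:
    """Minimal schema gate: reject LLM output that isn't valid JSON
    with the expected fields and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    for field, typ in required.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} missing or not {typ.__name__}")
    return data

schema = {"answer": str, "confidence": float}
ok = validate_output('{"answer": "42", "confidence": 0.9}', schema)
print(ok["answer"])
```

The real libraries add re‑asking the model on failure, policy checks, and dialogue rules on top of this validate‑or‑reject loop.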
🎛️ UI / UX
- Streamlit – data apps in five lines of Python; + instant dashboards, – limited custom CSS
- Gradio – shareable ML demos; + public links in seconds, – hefty JS bundle
- Reflex – full‑stack web apps in pure Python; + no JavaScript required, – pre‑1.0 API churn
- PostHog – open‑source product analytics; + autocapture events, – self‑hosting eats RAM
👩‍🍳 10‑Line Recipe
Common Crawl ➜ Apache Tika ➜ Sentence‑Transformers ➜ Qdrant ➜ BGE‑Reranker ➜ Llama 3 ➜ LangChain ➜ Guardrails AI ➜ Streamlit ➜ Kubernetes
Swap any piece in its column to tweak cost, latency, or licensing — the chemistry still works. Happy RAG‑building!