Harsh Pundhir
Still RAG‑ing in 2025? Use this periodic table instead


Building a Retrieval‑Augmented‑Generation stack still feels like LEGO on hard‑mode—too many bricks, not enough labels.

So I stole a page from Feynman: break the system into “atoms,” keep only one killer advantage and one honest gotcha per tool, and let you mix‑and‑match. Grab one block from each column and you’ve got production RAG.


🧱 Data Sources & Content Prep

  • Common Crawl – nonprofit snapshot of the open web; + free 250 B‑page corpus, – raw HTML is very noisy
  • Apache Tika – Swiss‑army text extractor; + handles 1 000+ file types, – Java dependencies add weight
  • Unstructured – Python ETL that chunks complex docs; + turns PDFs into LLM‑friendly bits, – API shifts fast
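Whichever extractor you pick, the raw text still needs chunking before it can be embedded. A minimal sketch of fixed-size chunking with overlap (the window and overlap sizes here are illustrative, not tuned):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with overlap,
    so a sentence cut at one boundary still appears whole in the next chunk."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # advance by the stride, keeping the overlap
    return chunks

pieces = chunk_text("a" * 1200)
print(len(pieces))  # 3 windows cover 1200 chars at a 450-char stride
```

Production pipelines usually chunk on semantic boundaries (headings, paragraphs) instead of raw character counts — that's exactly what Unstructured's partitioners give you.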

🧭 Embeddings

  • Sentence‑Transformers – SBERT family of open‑source encoders; + can fine‑tune locally, – slower than one‑shot API vectors
  • OpenAI Embeddings – pay‑per‑call vectors; + one‑line rollout, – vendor lock‑in & cost
  • Cohere Embed – multilingual / multimodal embeddings; + strong non‑English support, – closed‑source API only
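Whichever encoder produces your vectors, downstream retrieval compares them the same way, and cosine similarity is the usual yardstick. A dependency-free sketch (real vectors would come from one of the models above; the toy numbers are made up):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product of the two vectors
    divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 — identical direction
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 — orthogonal, i.e. unrelated
```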

🗄️ Vector Stores

  • Pinecone – managed vector database; + zero ops, – gets pricey at scale
  • Faiss – C++ similarity‑search library with GPU support; + blazing local speed, – DIY sharding & infra
  • Qdrant – Rust‑based engine with filters; + fast & simple, – smaller community
  • Weaviate – GraphQL‑native hybrid search; + combines keyword+vector, – HNSW indexes are memory‑hungry
  • Milvus – cloud‑native cluster‑scale store; + billion‑scale horizontal scaling, – helm‑chart complexity
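Under the hood, all five do some version of the same job: store vectors and return nearest neighbours. A naive in-memory version makes clear what you're paying Pinecone (or tuning Faiss) for — the real engines swap this brute-force scan for ANN indexes like HNSW or IVF:

```python
import math

class TinyVectorStore:
    """Brute-force nearest-neighbour search: fine for thousands of
    vectors, which is exactly where the real engines take over."""

    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        self.items.append((doc_id, vector))

    def search(self, query: list[float], k: int = 3) -> list[str]:
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a))
                          * math.sqrt(sum(y * y for y in b)))
        ranked = sorted(self.items, key=lambda it: cos(query, it[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = TinyVectorStore()
store.upsert("cats", [0.9, 0.1])
store.upsert("dogs", [0.8, 0.2])
store.upsert("tax-law", [0.1, 0.9])
print(store.search([1.0, 0.0], k=2))  # ['cats', 'dogs']
```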

🧠 Large Language Models

  • GPT‑4o – OpenAI’s flagship multimodal LLM; + tops most benchmarks, – highest token cost
  • Llama 3 – Meta’s permissively‑licensed weights; + self‑host friendly, – trails GPT‑4o on code
  • Mixtral 8×7B – sparse mixture‑of‑experts rocket; + GPT‑3.5 quality at 6× speed, – big MoE weight files
  • DeepSeek‑LLM – bilingual model trained on 2 T tokens; + English‑Chinese fluency, – EU data‑privacy questions
  • Grok 4 – xAI’s real‑time reasoning model; + built‑in web search, – premium subscription wall
  • Claude 3 – Anthropic’s alignment‑first family; + 200 K context window, – rate‑limited free tier

🔗 RAG Orchestration

  • LangChain – LEGO‑style building blocks for chains & agents; + huge ecosystem, – can feel heavyweight
  • LlamaIndex – data‑centric RAG graphs; + 100+ data loaders, – API churn
  • Haystack – node‑based pipelines with built‑in eval; + first‑class RAG evaluation, – verbose YAML configs
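Strip the frameworks away and the orchestration step is just retrieve → stuff → generate. A framework-free sketch of that loop (the `retrieve` and `llm` callables are stand-ins for whichever store and model you picked above):

```python
def rag_answer(question: str, retrieve, llm, k: int = 3) -> str:
    """The core RAG loop: fetch top-k chunks, stuff them into a
    prompt, and have the model answer grounded in that context."""
    chunks = retrieve(question, k)
    context = "\n---\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)

# Stub components, just to show the wiring:
fake_retrieve = lambda q, k: ["Qdrant is written in Rust."][:k]
fake_llm = lambda prompt: "It is written in Rust."
print(rag_answer("What is Qdrant written in?", fake_retrieve, fake_llm))
```

LangChain, LlamaIndex, and Haystack all add real value on top of this — retries, streaming, tracing, loaders — but it helps to remember how little the core actually is.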

🏷️ Re‑ranking & Evaluation

  • BGE‑Reranker – tiny cross‑encoder for relevance boost; + SOTA recall, – adds latency
  • Cohere Rerank – drop‑in API reordering; + one‑line integration, – paid quota
  • ColBERT – late‑interaction bi‑encoder; + scales to 100 M docs, – GPU hungry
  • FlashRank – super‑lite LLM ranker; + CPU friendly, – very new project
  • RAGAS – automated RAG metrics dashboards; + generates test sets, – metrics still evolving
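Re-ranking is a second, more careful scoring pass over the retriever's shortlist. The tools above use cross-encoders for that pass; a keyword-overlap stand-in shows where it slots in (the scoring function here is deliberately dumb — only the shape matters):

```python
def rerank(query: str, docs: list[str]) -> list[str]:
    """Reorder candidate docs by a relevance score. A cross-encoder
    would score each (query, doc) pair jointly; we fake that signal
    with simple token overlap."""
    q_tokens = set(query.lower().split())

    def score(doc: str) -> int:
        return len(q_tokens & set(doc.lower().split()))

    return sorted(docs, key=score, reverse=True)

docs = [
    "how to bake sourdough bread",
    "vector databases compared",
    "qdrant vector database filters",
]
print(rerank("qdrant vector filters", docs)[0])  # qdrant vector database filters
```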

🚢 Deploy & Safety

  • Kubernetes – the de‑facto container orchestrator; + autoscaling pods, – steep ops learning curve
  • OpenFaaS – serverless on your own K8s; + function‑first DX, – cold‑start lag
  • Guardrails AI – schema‑based output validators; + easy policy checks, – can over‑filter creativity
  • NeMo Guardrails – Nvidia’s programmable guardrail toolkit; + scriptable dialogue rules, – Nvidia‑ecosystem bias
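Both guardrail tools boil down to the same move: validate the model's output against a policy before it reaches the user, and retry or refuse on failure. A minimal schema-check sketch using only the stdlib (Guardrails AI and NeMo Guardrails each have far richer DSLs for this; the required-keys policy here is purely illustrative):

```python
import json

REQUIRED_KEYS = {"answer", "sources"}  # illustrative policy, not a real schema

def validate_output(raw: str) -> dict:
    """Reject anything that is not JSON carrying the required keys;
    the caller can retry the LLM call or fall back on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

good = '{"answer": "42", "sources": ["doc1"]}'
print(validate_output(good)["answer"])  # 42
```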

🎛️ UI / UX

  • Streamlit – data apps in five lines of Python; + instant dashboards, – limited custom CSS
  • Gradio – shareable ML demos; + public links in seconds, – hefty JS bundle
  • Reflex – full‑stack web apps in pure Python; + no JavaScript required, – pre‑1.0 API churn
  • PostHog – open‑source product analytics; + autocapture events, – self‑hosting eats RAM

👩‍🍳 10‑Line Recipe

Common Crawl ➜ Apache Tika ➜ Sentence‑Transformers ➜ Qdrant ➜ BGE‑Reranker ➜ Llama 3 ➜ LangChain ➜ Guardrails‑AI ➜ Streamlit ➜ Kubernetes
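It really is about ten lines once every block hides behind a function. A stubbed end-to-end sketch of the data path (each lambda is a stand-in for the tool named in the comment — none of these are real clients):

```python
# Each stage stands in for one block of the recipe above.
crawl    = lambda: ["Qdrant is a Rust vector database."]    # Common Crawl
extract  = lambda pages: pages                              # Apache Tika
embed    = lambda chunks: [[float(len(c))] for c in chunks] # Sentence-Transformers
store    = lambda vecs, chunks: list(zip(vecs, chunks))     # Qdrant
retrieve = lambda index, q: [c for _, c in index][:3]       # vector search
rerank   = lambda q, docs: docs                             # BGE-Reranker
generate = lambda q, docs: f"Based on {len(docs)} docs: ..."  # Llama 3
guard    = lambda ans: ans if ans else "refused"            # Guardrails AI

chunks = extract(crawl())
index = store(embed(chunks), chunks)
question = "What is Qdrant?"
answer = guard(generate(question, rerank(question, retrieve(index, question))))
print(answer)  # Based on 1 docs: ...
```

Orchestration, UI, and deployment (LangChain, Streamlit, Kubernetes) wrap around this path rather than sitting inside it.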

Swap any piece in its column to tweak cost, latency, or licensing — the chemistry still works. Happy RAG‑building!


🌟 Honorable Mentions

Data & Prep

  • Wikipedia Dumps – monthly snapshot of Wikipedia; + encyclopedic & structured, – limited domain diversity
  • arXiv Bulk Data – open archive of scientific papers; + high‑quality technical text, – LaTeX markup noise

Embeddings

  • Jina Embeddings v2 – multilingual, lightweight models; + small footprint, – trails SOTA accuracy
  • E5 – weakly‑supervised all‑rounder embedder; + top BEIR scores, – larger variants get heavy

Vector Stores

  • Redis Vector – in‑memory ANN in Redis; + sub‑ms latency, – RAM gets expensive
  • Elasticsearch k‑NN – vector search inside ELK; + reuses existing ELK clusters, – slower than purpose‑built DBs

Large Language Models

  • Phi‑3 – 3.8 B‑parameter tiny powerhouse; + SOTA for size, – short context window
  • Gemini 1.5 Flash – Google’s fast Gemini tier; + 1 M‑token context, – closed API

Orchestration

  • Semantic Kernel – Microsoft’s .NET/JS orchestrator; + tight MS 365 hooks, – smaller OSS community
  • CrewAI – YAML‑first agentic RAG; + super‑simple syntax, – bleeding‑edge project

Re‑rank & Eval

  • TruLens – LLM eval & guardrails; + interactive dashboards, – setup complexity
  • LangSmith – hosted observability for LangChain; + deep LC hooks, – vendor lock‑in

Deploy & Safety

  • Ollama – run open models locally with one command; + dead‑easy local GPU, – Windows support is newer
  • Replicate – serverless model hosting; + zero infra, pay‑per‑second, – cost spikes on heavy use

UI / UX

  • Dash – Plotly‑powered analytic apps; + awesome charts, – more boilerplate than Streamlit
  • Shiny for Python – R‑style reactive web in Py; + reactive magic, – smaller ecosystem