Priyadharshini Selvaraj

RAG and LangChain Basics

Introduction

Large Language Models (LLMs) like GPT-4 and Claude are incredibly powerful, but they have a significant limitation: they can only work with information they were trained on. What if you want your AI application to access your company's internal documents, real-time data, or domain-specific knowledge? This is where Retrieval-Augmented Generation (RAG) comes in, and LangChain makes implementing RAG systems remarkably straightforward.

What is RAG?

RAG is an architectural pattern that combines the generative capabilities of LLMs with external knowledge retrieval. Instead of relying solely on the model's pre-trained knowledge, RAG systems first retrieve relevant information from external sources, then use that context to generate more accurate and up-to-date responses.

Traditional LLM vs RAG Architecture

Traditional LLM:
User Query → LLM → Response (limited to training data)

RAG System:
User Query → Retrieve Relevant Docs → LLM + Context → Enhanced Response

Key Benefits of RAG:

  • Access to current, domain-specific information
  • Reduced hallucinations through grounded responses
  • Cost-effective alternative to fine-tuning
  • Easy to update knowledge base without retraining

RAG System Architecture


The RAG pipeline consists of two main phases:

1. Indexing Phase (Data Preparation)

  • Document Loading: Ingest documents from various sources (PDFs, web pages, databases)
  • Text Splitting: Break documents into manageable chunks
  • Embedding: Convert text chunks into vector representations
  • Storage: Store embeddings in a vector database

2. Retrieval Phase (Query Processing)

  • Query Embedding: Convert user query to vector format
  • Similarity Search: Find the most relevant document chunks (see the sketch after this list)
  • Context Assembly: Combine retrieved chunks with the original query
  • Generation: LLM generates response using augmented context
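
Under the hood, the similarity search step usually reduces to comparing vectors. Here is a minimal, library-agnostic sketch (using NumPy and made-up toy embeddings) of how a query vector is matched against stored chunk vectors with cosine similarity:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors divided by their norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy "embeddings" for three stored chunks and one query (illustrative values only)
chunk_vectors = {
    "auth_docs": np.array([0.9, 0.1, 0.0]),
    "deploy_docs": np.array([0.1, 0.8, 0.2]),
    "billing_docs": np.array([0.0, 0.2, 0.9]),
}
query_vector = np.array([0.85, 0.15, 0.05])

# Rank chunks by similarity to the query and keep the top k (here k=2)
scores = {name: cosine_similarity(query_vector, vec) for name, vec in chunk_vectors.items()}
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_k)  # e.g. ['auth_docs', 'deploy_docs']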

Enter LangChain

LangChain is a framework that simplifies building applications with LLMs. It provides abstractions and tools that make implementing RAG systems much easier than building from scratch.

Core LangChain Components for RAG

# Document Loading
from langchain.document_loaders import PyPDFLoader, WebBaseLoader

# Text Splitting
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Embeddings
from langchain.embeddings import OpenAIEmbeddings

# Vector Stores
from langchain.vectorstores import Chroma, Pinecone

# LLMs
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

# Chains
from langchain.chains import RetrievalQA

Building a RAG System with LangChain

Let's walk through creating a simple RAG system that can answer questions about your documentation:

Step 1: Document Ingestion and Processing

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load documents
loader = DirectoryLoader('./docs', glob="**/*.md")
documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings
)
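
If you want the index to survive restarts, Chroma can be persisted to disk instead of being rebuilt on every run. A small sketch (the ./chroma_db directory name is just an example):

# Persist the index to disk so it doesn't have to be rebuilt on every run
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"  # example path
)

# Later, reload the existing index instead of re-embedding everything
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)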

Step 2: Setting up the Retrieval Chain

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Initialize LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Create retrieval chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(
        search_kwargs={"k": 3}  # Retrieve top 3 relevant chunks
    )
)

# Query the system
response = qa_chain.run("How do I implement authentication?")
print(response)
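
For debugging, and for showing citations to users, it helps to return the retrieved chunks alongside the answer. A short sketch using RetrievalQA's return_source_documents option:

# Return the retrieved chunks alongside the generated answer
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

result = qa_chain({"query": "How do I implement authentication?"})
print(result["result"])                 # the generated answer
for doc in result["source_documents"]:  # the chunks that were retrieved
    print(doc.metadata.get("source"))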

RAG Query Flow Visualization

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│  Vector Search   │───▶│  Top K Chunks   │
│"How to deploy?" │    │   (Similarity)   │    │   Retrieved     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│Final Response   │◀───│      LLM         │◀───│  Query + Context│
│  (Grounded)     │    │   Generation     │    │   Combined      │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Advanced RAG Patterns

1. Multi-Query Retrieval

Generate multiple variations of the user query to improve retrieval quality:

from langchain.retrievers import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm
)
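
To see what the retriever is doing, you can enable logging for the generated query variants and then fetch documents as usual (a minimal usage sketch):

import logging

# Log the alternative queries the LLM generates for each user question
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

docs = retriever.get_relevant_documents("How do I deploy the application?")
print(len(docs))  # unique chunks gathered across all query variants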

2. Contextual Compression

Filter retrieved documents to only include relevant portions:

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever()
)
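
The compression retriever can be queried directly or dropped into the same RetrievalQA chain from earlier; a short usage sketch:

# Retrieve and compress: only the passages relevant to the query are kept
compressed_docs = compression_retriever.get_relevant_documents(
    "How do I implement authentication?"
)
for doc in compressed_docs:
    print(doc.page_content[:200])

# It can also replace the plain retriever in the QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever
)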

3. Self-Query Retrieval

Let the retriever translate a natural-language question into a structured query with metadata filters:

from langchain.retrievers.self_query.base import SelfQueryRetriever

retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents="Technical documentation",
    metadata_field_info=metadata_field_info
)
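
The metadata_field_info argument describes the metadata fields the retriever is allowed to filter on. A hedged example with made-up fields (category, last_updated) that you would replace with whatever metadata your documents actually carry:

from langchain.chains.query_constructor.base import AttributeInfo

# Hypothetical metadata schema; adjust to your own documents' metadata
metadata_field_info = [
    AttributeInfo(
        name="category",
        description="The section of the docs, e.g. 'auth', 'deployment', 'billing'",
        type="string",
    ),
    AttributeInfo(
        name="last_updated",
        description="Year the page was last updated",
        type="integer",
    ),
]

Note that self-query retrieval also requires a vector store that supports metadata filtering.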

Best Practices for Production RAG Systems

Data Quality and Chunking Strategy

  • Optimal Chunk Size: 500-1000 characters with 100-200 character overlap
  • Preserve Context: Avoid splitting mid-sentence or breaking logical units
  • Metadata Enrichment: Add source, timestamp, and category information (see the sketch below)
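
For example, metadata can be attached to each chunk at indexing time and used later for filtering or citations. A minimal sketch reusing the splits from Step 1 (the field names are illustrative):

from datetime import date

# Enrich each chunk with metadata before it goes into the vector store
for doc in splits:
    doc.metadata.update({
        "source": doc.metadata.get("source", "unknown"),
        "indexed_on": date.today().isoformat(),
        "category": "technical-docs",  # illustrative category label
    })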

Retrieval Optimization

  • Hybrid Search: Combine semantic and keyword search (see the sketch after this list)
  • Reranking: Use cross-encoders to improve retrieval quality
  • Evaluation Metrics: Track retrieval precision, recall, and end-to-end accuracy
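
One way to get hybrid search in LangChain is to combine a keyword retriever (BM25) with the vector retriever in an ensemble. A sketch reusing splits and vectorstore from Step 1, and assuming the rank_bm25 package is installed:

from langchain.retrievers import BM25Retriever, EnsembleRetriever

# Keyword-based retriever over the same chunks (requires the rank_bm25 package)
bm25_retriever = BM25Retriever.from_documents(splits)
bm25_retriever.k = 3

# Blend keyword and semantic results (the weights here are illustrative)
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vectorstore.as_retriever(search_kwargs={"k": 3})],
    weights=[0.4, 0.6]
)

docs = hybrid_retriever.get_relevant_documents("How do I deploy the application?")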

Performance Considerations

# Async processing for better performance: chains expose async methods,
# e.g. `await qa_chain.arun("How do I implement authentication?")`

# Caching for repeated queries
from langchain.cache import InMemoryCache
import langchain
langchain.llm_cache = InMemoryCache()

# Streaming responses (yields the chain's output chunks as they arrive)
def stream_response(query):
    for chunk in qa_chain.stream({"query": query}):
        yield chunk

Real-World Applications

RAG systems are being used across various domains:

  • Customer Support: AI chatbots with access to knowledge bases
  • Legal Research: Query legal documents and case law
  • Medical Assistance: Access medical literature and guidelines
  • Code Documentation: Developer tools with codebase awareness
  • Educational Content: Personalized learning with curriculum access

Conclusion

RAG represents a paradigm shift in how we build AI applications. By combining the power of LLMs with external knowledge retrieval, we can create systems that are both intelligent and grounded in factual information. LangChain makes this implementation accessible to developers, providing the tools and abstractions needed to build production-ready RAG systems.

The key to successful RAG implementation lies in understanding your data, optimizing your retrieval strategy, and continuously evaluating system performance. As the field evolves, we're seeing more sophisticated approaches like agentic RAG, multi-modal retrieval, and graph-based knowledge systems.

Start small with a simple RAG system using LangChain, then gradually incorporate advanced patterns as your needs grow. The future of AI applications is not just about better models, but about better ways to ground those models in real-world knowledge.


Ready to build your own RAG system? Check out the LangChain documentation and start experimenting with your own documents today!
