DEV Community

CriticalMynd


LightRAG Tutorial: Getting Started with Knowledge Graph-Based RAG

This tutorial walks through setting up and using LightRAG, a retrieval-augmented generation system that combines knowledge graphs with vector search for document retrieval.

What is LightRAG?

LightRAG is a RAG (Retrieval-Augmented Generation) system that builds knowledge graphs from your documents. Unlike classical RAG systems that rely solely on vector similarity search, LightRAG extracts entities and relationships from documents to create a structured knowledge graph, then uses both the graph and vector search for retrieval.

How LightRAG Differs from Classical RAG

Classical RAG:

  • Uses vector embeddings to find semantically similar document chunks
  • Retrieval is based on cosine similarity between query and document vectors
  • No structured understanding of entities or relationships

LightRAG:

  • Extracts entities (people, organizations, concepts) and relationships from documents
  • Builds a knowledge graph that captures these relationships
  • Uses both graph traversal and vector search for retrieval
  • Offers multiple query modes: naive, local, global, hybrid, and mix

The knowledge graph enables more precise retrieval by understanding not just what documents are similar, but how concepts relate to each other. This can improve accuracy for queries that require understanding relationships between entities.
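To make the dual-retrieval idea concrete, here is a deliberately tiny Python sketch. This is not LightRAG's implementation — the vectors, chunks, and graph are made up — but it shows the two complementary signals: ranking chunks by cosine similarity, and pulling in entities related to those mentioned in the query.

```python
from math import sqrt

# Toy corpus: chunk id -> (pretend embedding vector, text)
chunks = {
    "c1": ([1.0, 0.0], "LightRAG builds knowledge graphs."),
    "c2": ([0.0, 1.0], "Vector search finds similar chunks."),
}

# Toy knowledge graph: entity -> related entities
graph = {
    "LightRAG": {"Knowledge Graph", "Vector Search"},
    "Knowledge Graph": {"LightRAG"},
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, query_entities):
    # Vector side: rank chunks by similarity to the query embedding.
    by_similarity = sorted(
        chunks, key=lambda c: cosine(chunks[c][0], query_vec), reverse=True
    )
    # Graph side: collect entities related to those mentioned in the query.
    related = set()
    for entity in query_entities:
        related |= graph.get(entity, set())
    return by_similarity, related

ranked, related = retrieve([0.9, 0.1], {"LightRAG"})
print(ranked[0])        # most similar chunk: c1
print(sorted(related))  # graph-related entities
```

A real system would merge both result sets into the context passed to the LLM; the point here is that the graph surfaces "Vector Search" as relevant even though no similarity score involved it.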

Prerequisites

  • Docker and Docker Compose
  • An LLM provider (OpenAI, Gemini, Ollama, Azure OpenAI, AWS Bedrock, or Jina)
  • An embedding model (supports OpenAI, Gemini, Ollama, Jina, and others)
  • Optional: A reranker model

Setup

1. Clone the Repository

git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG

2. Configure Environment Variables

Copy the example environment file and configure it:

cp env.example .env

Edit .env with your configuration. Here's an example using Ollama for both LLM and embeddings. Make sure EMBEDDING_DIM matches your embedding model's output dimension — bge-m3 produces 1024-dimensional vectors:

LLM_BINDING=ollama
LLM_MODEL=llama3.2:latest
LLM_BINDING_HOST=http://host.docker.internal:11434

EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_DIM=1024
EMBEDDING_BINDING_HOST=http://host.docker.internal:11434

RERANK_BINDING=null
PORT=9621

3. Start the Server

docker compose up -d

The web UI will be available at http://localhost:9621/webui/
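Before opening the UI, you can confirm the container is healthy from the command line. The /health route is an assumption here — the API tab in the web UI lists the exact endpoints your version exposes:

```shell
# Check container status, then ping the server
docker compose ps
curl http://localhost:9621/health
```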

Using LightRAG: Step-by-Step Tutorial

Step 1: Access the Web UI

Navigate to http://localhost:9621/webui/ in your browser. You'll see the main interface with tabs for Documents, Knowledge Graph, Retrieval, and API.

(Screenshot: Initial UI)

Step 2: Upload a Document

Click on the Documents tab. Initially, you'll see "No Documents" if this is a fresh installation.

(Screenshot: Documents Tab)

You can upload documents in two ways:

  1. Direct Upload (Recommended): Click the Upload button in the Documents tab. A dialog will open where you can drag and drop files or click to browse and select files from your file system. LightRAG supports many file types including TXT, MD, DOCX, PDF, PPTX, XLSX, and various code files.

(Screenshot: Upload Dialog)

After selecting a file, it will be uploaded automatically and processing will begin. You'll see a success notification and the document will appear in the table with a "Processing" status.

(Screenshot: File Uploaded and Processing)

  2. Directory Scan: Alternatively, you can place your document in the data/inputs/ directory, then click Scan/Retry. LightRAG will detect the document and queue it for processing.

Step 3: Document Processing

LightRAG processes the document by:

  1. Chunking the text
  2. Extracting entities and relationships
  3. Building embeddings
  4. Constructing the knowledge graph
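The first step, chunking, can be sketched as a sliding token window with overlap. This is a simplified illustration, not LightRAG's chunker: whitespace tokens stand in for real tokenizer tokens, and the 1200/100 defaults follow the CHUNK_SIZE and CHUNK_OVERLAP_SIZE variables in env.example (verify against your version):

```python
def chunk_text(text, max_tokens=1200, overlap=100):
    """Split text into overlapping windows of roughly max_tokens tokens."""
    tokens = text.split()  # stand-in for a real tokenizer
    pieces = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        pieces.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return pieces

doc = ("word " * 3000).strip()  # a 3000-token dummy document
pieces = chunk_text(doc)
print(len(pieces))  # 3 overlapping chunks
```

The overlap ensures entities mentioned near a chunk boundary appear in both neighboring chunks, so the extraction step does not miss relationships that straddle the cut.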

You can monitor the processing status. Once complete, you'll see "Completed" with the number of chunks processed.

(Screenshot: Document Processing)

(Screenshot: Document Completed)

Step 4: Explore the Knowledge Graph

This is where LightRAG's key differentiator becomes visible. Click on the Knowledge Graph tab to explore the extracted entities and relationships.

(Screenshot: Knowledge Graph Tab)

Open the "Search node name" dropdown to see all extracted entities. In our example, we can see entities like:

  • LightRAG
  • Knowledge Graph
  • Entity-Relationship Extraction
  • Query Modes
  • LLM Providers
  • And many more...

Select a node (e.g., "LightRAG") to view its details and relationships.

(Screenshot: Knowledge Graph with Node Selected)

The visualization shows:

  • Node Details Panel (right side): Displays the node's ID, labels, degree (number of connections), properties, and relations
  • Graph Canvas (center): Visual representation of the knowledge graph showing nodes and their connections

In this example, the "LightRAG" node has:

  • Degree: 14 - It's connected to 14 other entities
  • Relations: Including "Retrieval-Augmented Generation (RAG)", "Knowledge Graph", "Entity-Relationship Extraction", "Vector Search", "Query Modes", "LLM Providers", and others

You can click on neighbor nodes to expand the graph and explore relationships:

(Screenshot: Expanded Knowledge Graph)

This visualization makes it clear how LightRAG understands relationships between concepts, not just document similarity.
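The "degree" shown in the details panel is simply the number of edges touching a node. A minimal sketch with a hand-made edge list (toy data, not LightRAG code) makes the idea concrete:

```python
# Toy edge list: each pair is a relationship between two entities.
edges = [
    ("LightRAG", "Knowledge Graph"),
    ("LightRAG", "Vector Search"),
    ("LightRAG", "Query Modes"),
    ("Knowledge Graph", "Entity-Relationship Extraction"),
]

def degree(node, edges):
    """Number of edges that touch the given node."""
    return sum(1 for a, b in edges if node in (a, b))

def neighbors(node, edges):
    """Entities directly connected to the given node."""
    out = set()
    for a, b in edges:
        if a == node:
            out.add(b)
        elif b == node:
            out.add(a)
    return out

print(degree("LightRAG", edges))                 # 3 in this toy graph
print(sorted(neighbors("Knowledge Graph", edges)))
```

Clicking a neighbor in the UI is the visual equivalent of calling neighbors() on the selected node and expanding the result.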

Step 5: Query the System

Switch to the Retrieval tab to query your documents. You can:

  1. Enter your query
  2. Select a query mode (naive, local, global, hybrid, or mix)
  3. Optionally provide custom instructions for the LLM
  4. Adjust token limits and other parameters

(Screenshot: Query Interface)

Example query: "What is LightRAG? Please provide a short answer (2-3 sentences maximum)."

(Screenshot: Query Result)

The system retrieves relevant context using both the knowledge graph and vector search, then generates a response that follows your instructions.
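The same query can be sent programmatically. The field names below follow the /query route shown in the web UI's API tab; adjust them if your version differs:

```shell
# Query LightRAG's REST API directly instead of using the web UI
curl -X POST http://localhost:9621/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LightRAG?", "mode": "hybrid"}'
```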

Query Modes Explained

LightRAG offers five query modes:

  • Naive: Plain vector search over chunks, without using the knowledge graph
  • Local: Entity-level retrieval using the subgraph around entities mentioned in the query
  • Global: Relationship-level retrieval that draws on the broader knowledge graph
  • Hybrid: Combines local and global retrieval
  • Mix: Integrates knowledge graph retrieval with vector retrieval

Each mode trades off speed against accuracy differently; which one works best depends on your use case and query type.

Key Observations

  1. Knowledge Graph Construction: LightRAG automatically extracts entities and relationships during document processing. This happens in the background and doesn't require manual annotation.

  2. Visual Exploration: The knowledge graph visualization makes it easy to understand how your documents are structured and how concepts relate to each other.

  3. Dual Retrieval: LightRAG uses both graph-based and vector-based retrieval, which can provide more accurate results than vector search alone.

  4. Flexible Configuration: Supports multiple LLM and embedding providers, making it adaptable to different infrastructure setups.

Configuration Tips

  • LLM Requirements: LightRAG recommends ≥32B parameter models for knowledge graph extraction. Smaller models may not extract relationships as effectively.

  • Embedding Models: Popular choices include BAAI/bge-m3, text-embedding-3-large, and gemini-embedding-001.

  • Local Setup: Using Ollama for both LLM and embeddings provides a fully local setup without API dependencies.

  • Production: For production deployments, LightRAG supports PostgreSQL, MongoDB, Neo4j, and other enterprise databases instead of JSON storage.
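For example, switching from the default JSON storage to PostgreSQL and Neo4j is done through .env. The variable names below follow the project's env.example; verify them against your version:

```
# Swap JSON file storage for PostgreSQL + Neo4j backends
LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_VECTOR_STORAGE=PGVectorStorage
LIGHTRAG_GRAPH_STORAGE=Neo4JStorage
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
NEO4J_URI=neo4j://localhost:7687
```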

Conclusion

LightRAG provides a practical approach to RAG that combines the benefits of knowledge graphs with vector search. The automatic entity and relationship extraction, combined with the visual graph exploration, makes it easier to understand how your documents are structured and how concepts relate to each other.

The knowledge graph visualization is particularly useful for:

  • Understanding document structure
  • Debugging retrieval results
  • Exploring relationships between concepts
  • Validating that entities and relationships were extracted correctly

While classical RAG systems work well for many use cases, LightRAG's knowledge graph approach can provide advantages when queries require understanding relationships between entities or when you need more precise retrieval based on structured knowledge.

Resources

  • GitHub Repository: https://github.com/HKUDS/LightRAG
  • PyPI Package: pip install lightrag-hku
  • Documentation: See the repository README for detailed API documentation and advanced configuration options
