DEV Community

Muhammad Arslan

Build a GraphRAG Knowledge Base from HazelJS Docs: The hazeljs-rag-graph-starter

We've released hazeljs-rag-graph-starter — a production-ready example that indexes the entire HazelJS documentation using GraphRAG from @hazeljs/rag. In minutes, you get a REST API that answers questions about HazelJS using knowledge graph retrieval: local (entity-centric), global (thematic), and hybrid search. This post walks through what it does, how it works, and how to extend it.


What is GraphRAG?

Traditional RAG retrieves the K most similar text chunks by cosine distance. It works well for narrow, keyword-anchored questions but struggles with:

  • Cross-document reasoning — "How do all the components in the system relate to each other?"
  • Thematic questions — "What are the main architectural layers of this codebase?"
  • Entity-relationship queries — "What does the AgentGraph depend on?"

GraphRAG fixes this by building a knowledge graph of entities and relationships extracted from your documents. Instead of searching raw chunks, it retrieves structured facts and cross-document themes. The @hazeljs/rag package includes a full GraphRAGPipeline that:

  1. Extracts entities and relationships from text (LLM-powered)
  2. Builds an in-memory knowledge graph
  3. Detects communities (clusters of related entities)
  4. Summarizes each community with an LLM-generated report
  5. Searches via local (entity-centric), global (community reports), or hybrid (both)

See the GraphRAG Guide for the full architecture.
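As a miniature illustration of stages 2 and 3 above, the sketch below builds an in-memory graph from already-extracted entities and relationships (stage 1 would be the LLM call) and groups them into communities. It uses connected components as a deliberately simplified stand-in for the Leiden-style clustering real GraphRAG implementations typically use; none of this is the actual @hazeljs/rag code.

```typescript
// Toy walk-through of "build graph" + "detect communities", with
// pre-extracted entities and relationships standing in for the LLM step.
type Edge = [string, string];

const entities = ["HazelJS", "Controller", "Module", "VectorStore", "Embedding"];
const edges: Edge[] = [
  ["HazelJS", "Controller"],
  ["HazelJS", "Module"],
  ["VectorStore", "Embedding"],
];

// Stage 2: build an undirected adjacency list.
const adj = new Map<string, string[]>();
for (const e of entities) adj.set(e, []);
for (const [a, b] of edges) {
  adj.get(a)!.push(b);
  adj.get(b)!.push(a);
}

// Stage 3 (simplified): communities as connected components via DFS.
function communities(): string[][] {
  const seen = new Set<string>();
  const result: string[][] = [];
  for (const start of entities) {
    if (seen.has(start)) continue;
    const group: string[] = [];
    const stack = [start];
    while (stack.length) {
      const node = stack.pop()!;
      if (seen.has(node)) continue;
      seen.add(node);
      group.push(node);
      stack.push(...(adj.get(node) ?? []));
    }
    result.push(group);
  }
  return result;
}

const clusters = communities();
// Two clusters: the HazelJS core cluster and the vector-store cluster.
```

Stage 4 would then hand each cluster to the LLM for a community report, which is exactly the context global search consumes.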


What hazeljs-rag-graph-starter Does

The starter is a minimal HazelJS app that:

  • Indexes HazelJS docs from hazeljs-landing/src/content/docs (.md and .mdx)
  • Exposes a REST API for building the graph and searching
  • Supports all three search modes: local, global, hybrid

By default, it loads docs from the sibling hazeljs-landing directory — so you can run it alongside the docs repo and query the same content you're reading. You can also point it at any directory, file, or raw text array.


Quick Start

# Clone or navigate to hazeljs-rag-graph-starter
cd hazeljs-rag-graph-starter

# Add your OpenAI API key
cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...

# Install and run
npm install
npm run dev

Then:

# 1. Build the graph from hazeljs-landing docs (default)
curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{}'

# 2. Search (hybrid mode — local + global)
curl -X POST http://localhost:3000/api/graphrag/search \
  -H "Content-Type: application/json" \
  -d '{"query":"How does GraphRAG work and when should I use it?"}'

The first call extracts entities and relationships from all docs, detects communities, and generates reports. The second call runs hybrid search and returns an LLM-synthesized answer with entities and communities used as context.


Search Modes in Detail

Local Search — Entity-Centric

Best for specific "what is / how does" questions about named concepts, technologies, or features.

curl -X POST http://localhost:3000/api/graphrag/search/local \
  -H "Content-Type: application/json" \
  -d '{"query":"What is the RAG package and what loaders does it support?"}'

How it works: Finds seed entities matching the query (e.g. "RAG", "loaders"), traverses the graph up to K hops, assembles entity + relationship context, and synthesizes an answer.
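A self-contained sketch of that flow, with a toy graph and a substring-match seed heuristic standing in for the pipeline's actual entity matching:

```typescript
// Entity-centric local search in miniature: find seeds in the query,
// then BFS up to `depth` hops to collect relationship context.
interface Relationship { source: string; target: string; type: string }

const entities = ["RAG", "DirectoryLoader", "GraphRAGPipeline", "OpenAI"];
const relationships: Relationship[] = [
  { source: "GraphRAGPipeline", target: "RAG", type: "PART_OF" },
  { source: "RAG", target: "DirectoryLoader", type: "USES" },
  { source: "GraphRAGPipeline", target: "OpenAI", type: "USES" },
];

// 1. Seed entities whose name appears in the query (toy heuristic).
function findSeeds(query: string): string[] {
  const q = query.toLowerCase();
  return entities.filter((e) => q.includes(e.toLowerCase()));
}

// 2. Undirected BFS up to `depth` hops from the seeds.
function traverse(seeds: string[], depth: number): Set<string> {
  const visited = new Set(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const rel of relationships) {
      if (frontier.includes(rel.source) && !visited.has(rel.target)) {
        visited.add(rel.target);
        next.push(rel.target);
      }
      if (frontier.includes(rel.target) && !visited.has(rel.source)) {
        visited.add(rel.source);
        next.push(rel.source);
      }
    }
    frontier = next;
  }
  return visited;
}

const seeds = findSeeds("What is the RAG package?");
const context = traverse(seeds, 2); // GRAPH_LOCAL_DEPTH=2
```

In the real pipeline, the entities and relationships collected this way are formatted into a context block and passed to the LLM for answer synthesis.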

Global Search — Community Reports

Best for broad, thematic questions that span many parts of the knowledge base.

curl -X POST http://localhost:3000/api/graphrag/search/global \
  -H "Content-Type: application/json" \
  -d '{"query":"What are the main architectural layers of HazelJS?"}'

How it works: Ranks community reports by query relevance, assembles the top-K summaries as context, and synthesizes a holistic answer.
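The ranking step can be sketched with a simple keyword-overlap score; the real pipeline scores relevance with the LLM or embeddings, so treat this heuristic as a stand-in:

```typescript
// Global search in miniature: score community reports against the query,
// keep the top-K, and hand those summaries to the LLM as context.
interface CommunityReport { title: string; summary: string }

const reports: CommunityReport[] = [
  { title: "Architecture layers", summary: "Modules, controllers, and providers form the core layers." },
  { title: "RAG tooling", summary: "Loaders, vector stores, and pipelines for retrieval." },
  { title: "CLI", summary: "Scaffolding commands for new projects." },
];

// Toy relevance: count query words that appear in the report text.
function score(report: CommunityReport, query: string): number {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  const text = `${report.title} ${report.summary}`.toLowerCase();
  return words.filter((w) => text.includes(w)).length;
}

function globalSearch(query: string, topK: number): CommunityReport[] {
  return [...reports]
    .sort((a, b) => score(b, query) - score(a, query))
    .slice(0, topK);
}

const top = globalSearch("What are the main architectural layers?", 2);
```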

Hybrid Search — Recommended Default

Runs local and global in parallel, merges both contexts, and makes a single LLM synthesis call. Covers specific entities and broad themes in one go.

curl -X POST http://localhost:3000/api/graphrag/search \
  -H "Content-Type: application/json" \
  -d '{"query":"What vector stores does @hazeljs/rag support and how do I swap them?"}'
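The parallel-then-merge flow can be sketched with stub retrievers; the function names here are illustrative, not pipeline internals:

```typescript
// Hybrid search shape: run local and global retrieval concurrently,
// merge both contexts, then make one synthesis call with the result.
async function localContext(query: string): Promise<string[]> {
  return [`entity facts for "${query}"`]; // stub for local search
}

async function globalContext(query: string): Promise<string[]> {
  return [`community themes for "${query}"`]; // stub for global search
}

async function hybridSearch(query: string): Promise<string[]> {
  const [local, global_] = await Promise.all([
    localContext(query),
    globalContext(query),
  ]);
  // The merged context would feed a single LLM synthesis call.
  return [...local, ...global_];
}
```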

API Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/graphrag/build | POST | Build graph from default docs, or pass dirPath, filePath, or texts |
| /api/graphrag/search | POST | Hybrid search (default) |
| /api/graphrag/search/local | POST | Entity-centric local search |
| /api/graphrag/search/global | POST | Community-report global search |
| /api/graphrag/graph | GET | Full knowledge graph (entities, relationships, communities) |
| /api/graphrag/communities | GET | Community reports only |
| /api/graphrag/stats | GET | Entity count, relationship count, community count |
| /api/graphrag/clear | DELETE | Wipe the graph |

Project Structure

hazeljs-rag-graph-starter/
├── src/
│   ├── graphrag/
│   │   ├── dto/graphrag.dto.ts      # Request/response types
│   │   ├── graphrag.controller.ts  # REST endpoints
│   │   ├── graphrag.module.ts      # HazelJS module
│   │   └── graphrag.service.ts     # GraphRAGPipeline wrapper
│   ├── health/health.controller.ts
│   ├── app.module.ts
│   └── main.ts
├── .env.example
├── package.json
└── README.md

The GraphRAGService wraps GraphRAGPipeline from @hazeljs/rag, configures the OpenAI LLM for extraction and synthesis, and uses DirectoryLoader to load .md and .mdx files from the default docs path.
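A rough sketch of that wrapper shape, using a stubbed pipeline interface rather than the real GraphRAGPipeline API (the method names below are assumptions for illustration):

```typescript
// Hypothetical service-wrapper shape: the service owns a pipeline instance
// and exposes narrow build/search methods to the REST controller.
interface Pipeline {
  build(texts: string[]): Promise<void>;
  search(query: string, mode: "local" | "global" | "hybrid"): Promise<string>;
}

class GraphRagService {
  constructor(private readonly pipeline: Pipeline) {}

  async buildFromTexts(texts: string[]): Promise<void> {
    await this.pipeline.build(texts);
  }

  async hybridSearch(query: string): Promise<string> {
    return this.pipeline.search(query, "hybrid");
  }
}

// Usage with a stub pipeline in place of the real one:
const stub: Pipeline = {
  async build() {},
  async search(q, mode) { return `[${mode}] answer for: ${q}`; },
};
const service = new GraphRagService(stub);
```

Keeping the pipeline behind a service like this is what lets the controller stay a thin HTTP layer.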


Customizing the Docs Source

By default, the starter loads from ../hazeljs-landing/src/content/docs. You can override this:

Custom directory:

curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{"dirPath":"./my-knowledge-base"}'

Single file:

curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{"filePath":"./readme.md"}'

Raw text:

curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{"texts":["HazelJS is a TypeScript framework...","The RAG package provides..."]}'

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_API_KEY | (required) | OpenAI API key for extraction and synthesis |
| QA_MODEL | gpt-4o-mini | Model for LLM calls |
| GRAPH_EXTRACTION_CHUNK_SIZE | 2000 | Max chars per extraction chunk |
| GRAPH_COMMUNITY_REPORTS | true | Generate community reports (needed for global search) |
| GRAPH_MAX_COMMUNITY_SIZE | 15 | Max entities per community before splitting |
| GRAPH_LOCAL_DEPTH | 2 | BFS hops for local search |
| GRAPH_LOCAL_TOPK | 5 | Seed entities for local search |
| GRAPH_GLOBAL_TOPK | 5 | Community reports for global search |
| PORT | 3000 | Server port |

Inspecting the Graph

The full knowledge graph is available for visualization (e.g. D3.js, Cytoscape.js):

# Full graph (entities, relationships, communities, stats)
curl http://localhost:3000/api/graphrag/graph

# Community reports only
curl http://localhost:3000/api/graphrag/communities

# Statistics
curl http://localhost:3000/api/graphrag/stats

Each entity has a type (CONCEPT, TECHNOLOGY, FEATURE, etc.), description, and source document IDs. Relationships have types like USES, DEPENDS_ON, PART_OF. Community reports include a title, summary, key findings, and importance rating.
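If you consume the /api/graphrag/graph payload from TypeScript, shapes along these lines are a reasonable starting point. The field names are inferred from the description above, not the exact @hazeljs/rag schema:

```typescript
// Illustrative types for the graph payload (assumed field names).
type EntityType = "CONCEPT" | "TECHNOLOGY" | "FEATURE";

interface Entity {
  name: string;
  type: EntityType;
  description: string;
  sourceDocs: string[]; // source document IDs
}

interface Relationship {
  source: string;
  target: string;
  type: "USES" | "DEPENDS_ON" | "PART_OF";
}

interface CommunityReport {
  title: string;
  summary: string;
  keyFindings: string[];
  importance: number;
}

// A sample entity in this assumed shape:
const example: Entity = {
  name: "GraphRAGPipeline",
  type: "TECHNOLOGY",
  description: "Builds and queries the knowledge graph.",
  sourceDocs: ["docs/rag/graphrag.md"],
};
```

These interfaces map directly onto node/edge formats expected by D3.js or Cytoscape.js once you rename fields to each library's conventions.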


GraphRAG vs Traditional RAG

| Dimension | Traditional RAG | GraphRAG |
| --- | --- | --- |
| Storage | Flat vector index | Knowledge graph + entities |
| Retrieval unit | Text chunk (K nearest) | Entity + relationships + community reports |
| Cross-document reasoning | Limited | Native via graph traversal |
| Broad thematic questions | Poor | Excellent (community reports) |
| Specific entity questions | Good | Excellent (BFS traversal) |
| Setup cost | Low | Medium (LLM extraction pass) |
| Best use case | Single-domain Q&A | Multi-document knowledge bases |

For a deeper comparison, see RAG vs Agentic RAG and the RAG Package docs.


Extending the Starter

  • Add more loaders: Use WebLoader, GitHubLoader, or PdfLoader from @hazeljs/rag to ingest from URLs, repos, or PDFs. See the Document Loaders Guide.
  • Persist the graph: Serialize graphRag.getGraph().toJSON() to Redis or a file and reload on startup to avoid re-extraction on every deploy.
  • Combine with vector RAG: Use hazeljs-rag-documents-starter for full RAG + GraphRAG with all loaders.
  • Agentic RAG: Add @QueryPlanner, @SelfReflective, or @HyDE from @hazeljs/rag/agentic for autonomous retrieval. See the Agentic RAG Guide.
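The persistence idea from the list above can be sketched as a save/load round-trip. A plain snapshot object stands in for the real graphRag.getGraph().toJSON() output:

```typescript
// Persist the graph to disk at shutdown and reload it at startup so a
// redeploy doesn't trigger a full LLM re-extraction pass.
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the serialized graph shape.
interface GraphSnapshot {
  entities: { name: string; type: string }[];
  relationships: { source: string; target: string; type: string }[];
}

const SNAPSHOT_PATH = join(tmpdir(), "graph-snapshot.json");

function saveGraph(graph: GraphSnapshot): void {
  writeFileSync(SNAPSHOT_PATH, JSON.stringify(graph));
}

function loadGraph(): GraphSnapshot | null {
  if (!existsSync(SNAPSHOT_PATH)) return null;
  return JSON.parse(readFileSync(SNAPSHOT_PATH, "utf8")) as GraphSnapshot;
}

// On startup: reuse the snapshot if present, otherwise rebuild from docs.
saveGraph({ entities: [{ name: "HazelJS", type: "TECHNOLOGY" }], relationships: [] });
const restored = loadGraph();
```

The same round-trip works against Redis; swap the fs calls for GET/SET on a single key.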


Install

npm install @hazeljs/rag openai

Clone the starter from the HazelJS monorepo or create your own using GraphRAGPipeline and DirectoryLoader from @hazeljs/rag.

Try it and share your feedback: GitHub · npm · hazeljs.ai
