DEV Community

Muhammad Arslan

Build a GraphRAG Knowledge Base from HazelJS Docs: The hazeljs-rag-graph-starter

We've released hazeljs-rag-graph-starter — a production-ready example that indexes the entire HazelJS documentation using GraphRAG from @hazeljs/rag. In minutes, you get a REST API that answers questions about HazelJS using knowledge graph retrieval: local (entity-centric), global (thematic), and hybrid search. This post walks through what it does, how it works, and how to extend it.


What is GraphRAG?

Traditional RAG retrieves the K most similar text chunks by cosine distance. It works well for narrow, keyword-anchored questions but struggles with:

  • Cross-document reasoning — "How do all the components in the system relate to each other?"
  • Thematic questions — "What are the main architectural layers of this codebase?"
  • Entity-relationship queries — "What does the AgentGraph depend on?"

GraphRAG fixes this by building a knowledge graph of entities and relationships extracted from your documents. Instead of searching raw chunks, it retrieves structured facts and cross-document themes. The @hazeljs/rag package includes a full GraphRAGPipeline that:

  1. Extracts entities and relationships from text (LLM-powered)
  2. Builds an in-memory knowledge graph
  3. Detects communities (clusters of related entities)
  4. Summarizes each community with an LLM-generated report
  5. Searches via local (entity-centric), global (community reports), or hybrid (both)

See the GraphRAG Guide for the full architecture.
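As a miniature illustration of stages 2 and 3 above, the sketch below builds an in-memory graph from already-extracted entities and relationships (stage 1 would be the LLM call) and groups them into communities. It uses connected components as a deliberately simplified stand-in for the Leiden-style clustering real GraphRAG implementations typically use; none of this is the actual @hazeljs/rag code.

```typescript
// Toy walk-through of "build graph" + "detect communities", with
// pre-extracted entities and relationships standing in for the LLM step.
type Edge = [string, string];

const entities = ["HazelJS", "Controller", "Module", "VectorStore", "Embedding"];
const edges: Edge[] = [
  ["HazelJS", "Controller"],
  ["HazelJS", "Module"],
  ["VectorStore", "Embedding"],
];

// Stage 2: build an undirected adjacency list.
const adj = new Map<string, string[]>();
for (const e of entities) adj.set(e, []);
for (const [a, b] of edges) {
  adj.get(a)!.push(b);
  adj.get(b)!.push(a);
}

// Stage 3 (simplified): communities as connected components via DFS.
function communities(): string[][] {
  const seen = new Set<string>();
  const result: string[][] = [];
  for (const start of entities) {
    if (seen.has(start)) continue;
    const group: string[] = [];
    const stack = [start];
    while (stack.length) {
      const node = stack.pop()!;
      if (seen.has(node)) continue;
      seen.add(node);
      group.push(node);
      stack.push(...(adj.get(node) ?? []));
    }
    result.push(group);
  }
  return result;
}

const clusters = communities();
// Two clusters: the HazelJS core cluster and the vector-store cluster.
```

Stage 4 would then hand each cluster to the LLM for a community report, which is exactly the context global search consumes.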


What hazeljs-rag-graph-starter Does

The starter is a minimal HazelJS app that:

  • Indexes HazelJS docs from hazeljs-landing/src/content/docs (.md and .mdx)
  • Exposes a REST API for building the graph and searching
  • Supports all three search modes: local, global, hybrid

By default, it loads docs from the sibling hazeljs-landing directory — so you can run it alongside the docs repo and query the same content you're reading. You can also point it at any directory, file, or raw text array.


Quick Start

# Clone or navigate to hazeljs-rag-graph-starter
cd hazeljs-rag-graph-starter

# Add your OpenAI API key
cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...

# Install and run
npm install
npm run dev

Then:

# 1. Build the graph from hazeljs-landing docs (default)
curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{}'

# 2. Search (hybrid mode — local + global)
curl -X POST http://localhost:3000/api/graphrag/search \
  -H "Content-Type: application/json" \
  -d '{"query":"How does GraphRAG work and when should I use it?"}'

The first call extracts entities and relationships from all docs, detects communities, and generates reports. The second call runs hybrid search and returns an LLM-synthesized answer with entities and communities used as context.


Search Modes in Detail

Local Search — Entity-Centric

Best for specific "what is / how does" questions about named concepts, technologies, or features.

curl -X POST http://localhost:3000/api/graphrag/search/local \
  -H "Content-Type: application/json" \
  -d '{"query":"What is the RAG package and what loaders does it support?"}'

How it works: Finds seed entities matching the query (e.g. "RAG", "loaders"), traverses the graph up to K hops, assembles entity + relationship context, and synthesizes an answer.
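A self-contained sketch of that flow, with a toy graph and a substring-match seed heuristic standing in for the pipeline's actual entity matching:

```typescript
// Entity-centric local search in miniature: find seeds in the query,
// then BFS up to `depth` hops to collect relationship context.
interface Relationship { source: string; target: string; type: string }

const entities = ["RAG", "DirectoryLoader", "GraphRAGPipeline", "OpenAI"];
const relationships: Relationship[] = [
  { source: "GraphRAGPipeline", target: "RAG", type: "PART_OF" },
  { source: "RAG", target: "DirectoryLoader", type: "USES" },
  { source: "GraphRAGPipeline", target: "OpenAI", type: "USES" },
];

// 1. Seed entities whose name appears in the query (toy heuristic).
function findSeeds(query: string): string[] {
  const q = query.toLowerCase();
  return entities.filter((e) => q.includes(e.toLowerCase()));
}

// 2. Undirected BFS up to `depth` hops from the seeds.
function traverse(seeds: string[], depth: number): Set<string> {
  const visited = new Set(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const rel of relationships) {
      if (frontier.includes(rel.source) && !visited.has(rel.target)) {
        visited.add(rel.target);
        next.push(rel.target);
      }
      if (frontier.includes(rel.target) && !visited.has(rel.source)) {
        visited.add(rel.source);
        next.push(rel.source);
      }
    }
    frontier = next;
  }
  return visited;
}

const seeds = findSeeds("What is the RAG package?");
const context = traverse(seeds, 2); // GRAPH_LOCAL_DEPTH=2
```

In the real pipeline, the entities and relationships collected this way are formatted into a context block and passed to the LLM for answer synthesis.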

Global Search — Community Reports

Best for broad, thematic questions that span many parts of the knowledge base.

curl -X POST http://localhost:3000/api/graphrag/search/global \
  -H "Content-Type: application/json" \
  -d '{"query":"What are the main architectural layers of HazelJS?"}'

How it works: Ranks community reports by query relevance, assembles the top-K summaries as context, and synthesizes a holistic answer.
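The ranking step can be sketched with a simple keyword-overlap score; the real pipeline scores relevance with the LLM or embeddings, so treat this heuristic as a stand-in:

```typescript
// Global search in miniature: score community reports against the query,
// keep the top-K, and hand those summaries to the LLM as context.
interface CommunityReport { title: string; summary: string }

const reports: CommunityReport[] = [
  { title: "Architecture layers", summary: "Modules, controllers, and providers form the core layers." },
  { title: "RAG tooling", summary: "Loaders, vector stores, and pipelines for retrieval." },
  { title: "CLI", summary: "Scaffolding commands for new projects." },
];

// Toy relevance: count query words that appear in the report text.
function score(report: CommunityReport, query: string): number {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  const text = `${report.title} ${report.summary}`.toLowerCase();
  return words.filter((w) => text.includes(w)).length;
}

function globalSearch(query: string, topK: number): CommunityReport[] {
  return [...reports]
    .sort((a, b) => score(b, query) - score(a, query))
    .slice(0, topK);
}

const top = globalSearch("What are the main architectural layers?", 2);
```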

Hybrid Search — Recommended Default

Runs local and global in parallel, merges both contexts, and makes a single LLM synthesis call. Covers specific entities and broad themes in one go.

curl -X POST http://localhost:3000/api/graphrag/search \
  -H "Content-Type: application/json" \
  -d '{"query":"What vector stores does @hazeljs/rag support and how do I swap them?"}'
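The parallel-then-merge flow can be sketched with stub retrievers; the function names here are illustrative, not pipeline internals:

```typescript
// Hybrid search shape: run local and global retrieval concurrently,
// merge both contexts, then make one synthesis call with the result.
async function localContext(query: string): Promise<string[]> {
  return [`entity facts for "${query}"`]; // stub for local search
}

async function globalContext(query: string): Promise<string[]> {
  return [`community themes for "${query}"`]; // stub for global search
}

async function hybridSearch(query: string): Promise<string[]> {
  const [local, global_] = await Promise.all([
    localContext(query),
    globalContext(query),
  ]);
  // The merged context would feed a single LLM synthesis call.
  return [...local, ...global_];
}
```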

API Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/graphrag/build | POST | Build graph from default docs, or pass dirPath, filePath, or texts |
| /api/graphrag/search | POST | Hybrid search (default) |
| /api/graphrag/search/local | POST | Entity-centric local search |
| /api/graphrag/search/global | POST | Community-report global search |
| /api/graphrag/graph | GET | Full knowledge graph (entities, relationships, communities) |
| /api/graphrag/communities | GET | Community reports only |
| /api/graphrag/stats | GET | Entity count, relationship count, community count |
| /api/graphrag/clear | DELETE | Wipe the graph |

Project Structure

hazeljs-rag-graph-starter/
├── src/
│   ├── graphrag/
│   │   ├── dto/graphrag.dto.ts      # Request/response types
│   │   ├── graphrag.controller.ts  # REST endpoints
│   │   ├── graphrag.module.ts      # HazelJS module
│   │   └── graphrag.service.ts     # GraphRAGPipeline wrapper
│   ├── health/health.controller.ts
│   ├── app.module.ts
│   └── main.ts
├── .env.example
├── package.json
└── README.md

The GraphRAGService wraps GraphRAGPipeline from @hazeljs/rag, configures the OpenAI LLM for extraction and synthesis, and uses DirectoryLoader to load .md and .mdx files from the default docs path.
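A rough sketch of that wrapper shape, using a stubbed pipeline interface rather than the real GraphRAGPipeline API (the method names below are assumptions for illustration):

```typescript
// Hypothetical service-wrapper shape: the service owns a pipeline instance
// and exposes narrow build/search methods to the REST controller.
interface Pipeline {
  build(texts: string[]): Promise<void>;
  search(query: string, mode: "local" | "global" | "hybrid"): Promise<string>;
}

class GraphRagService {
  constructor(private readonly pipeline: Pipeline) {}

  async buildFromTexts(texts: string[]): Promise<void> {
    await this.pipeline.build(texts);
  }

  async hybridSearch(query: string): Promise<string> {
    return this.pipeline.search(query, "hybrid");
  }
}

// Usage with a stub pipeline in place of the real one:
const stub: Pipeline = {
  async build() {},
  async search(q, mode) { return `[${mode}] answer for: ${q}`; },
};
const service = new GraphRagService(stub);
```

Keeping the pipeline behind a service like this is what lets the controller stay a thin HTTP layer.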


Customizing the Docs Source

By default, the starter loads from ../hazeljs-landing/src/content/docs. You can override this:

Custom directory:

curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{"dirPath":"./my-knowledge-base"}'

Single file:

curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{"filePath":"./readme.md"}'

Raw text:

curl -X POST http://localhost:3000/api/graphrag/build \
  -H "Content-Type: application/json" \
  -d '{"texts":["HazelJS is a TypeScript framework...","The RAG package provides..."]}'

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_API_KEY | (required) | OpenAI API key for extraction and synthesis |
| QA_MODEL | gpt-4o-mini | Model for LLM calls |
| GRAPH_EXTRACTION_CHUNK_SIZE | 2000 | Max chars per extraction chunk |
| GRAPH_COMMUNITY_REPORTS | true | Generate community reports (needed for global search) |
| GRAPH_MAX_COMMUNITY_SIZE | 15 | Max entities per community before splitting |
| GRAPH_LOCAL_DEPTH | 2 | BFS hops for local search |
| GRAPH_LOCAL_TOPK | 5 | Seed entities for local search |
| GRAPH_GLOBAL_TOPK | 5 | Community reports for global search |
| PORT | 3000 | Server port |

Inspecting the Graph

The full knowledge graph is available for visualization (e.g. D3.js, Cytoscape.js):

# Full graph (entities, relationships, communities, stats)
curl http://localhost:3000/api/graphrag/graph

# Community reports only
curl http://localhost:3000/api/graphrag/communities

# Statistics
curl http://localhost:3000/api/graphrag/stats

Each entity has a type (CONCEPT, TECHNOLOGY, FEATURE, etc.), description, and source document IDs. Relationships have types like USES, DEPENDS_ON, PART_OF. Community reports include a title, summary, key findings, and importance rating.
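If you consume the /api/graphrag/graph payload from TypeScript, shapes along these lines are a reasonable starting point. The field names are inferred from the description above, not the exact @hazeljs/rag schema:

```typescript
// Illustrative types for the graph payload (assumed field names).
type EntityType = "CONCEPT" | "TECHNOLOGY" | "FEATURE";

interface Entity {
  name: string;
  type: EntityType;
  description: string;
  sourceDocs: string[]; // source document IDs
}

interface Relationship {
  source: string;
  target: string;
  type: "USES" | "DEPENDS_ON" | "PART_OF";
}

interface CommunityReport {
  title: string;
  summary: string;
  keyFindings: string[];
  importance: number;
}

// A sample entity in this assumed shape:
const example: Entity = {
  name: "GraphRAGPipeline",
  type: "TECHNOLOGY",
  description: "Builds and queries the knowledge graph.",
  sourceDocs: ["docs/rag/graphrag.md"],
};
```

These interfaces map directly onto node/edge formats expected by D3.js or Cytoscape.js once you rename fields to each library's conventions.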


GraphRAG vs Traditional RAG

| Dimension | Traditional RAG | GraphRAG |
| --- | --- | --- |
| Storage | Flat vector index | Knowledge graph + entities |
| Retrieval unit | Text chunk (K nearest) | Entity + relationships + community reports |
| Cross-document reasoning | Limited | Native via graph traversal |
| Broad thematic questions | Poor | Excellent (community reports) |
| Specific entity questions | Good | Excellent (BFS traversal) |
| Setup cost | Low | Medium (LLM extraction pass) |
| Best use case | Single-domain Q&A | Multi-document knowledge bases |

For a deeper comparison, see RAG vs Agentic RAG and the RAG Package docs.


Extending the Starter

  • Add more loaders: Use WebLoader, GitHubLoader, or PdfLoader from @hazeljs/rag to ingest from URLs, repos, or PDFs. See the Document Loaders Guide.
  • Persist the graph: Serialize graphRag.getGraph().toJSON() to Redis or a file and reload on startup to avoid re-extraction on every deploy.
  • Combine with vector RAG: Use hazeljs-rag-documents-starter for full RAG + GraphRAG with all loaders.
  • Agentic RAG: Add @QueryPlanner, @SelfReflective, or @HyDE from @hazeljs/rag/agentic for autonomous retrieval. See the Agentic RAG Guide.
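The persistence idea from the list above can be sketched as a save/load round-trip. A plain snapshot object stands in for the real graphRag.getGraph().toJSON() output:

```typescript
// Persist the graph to disk at shutdown and reload it at startup so a
// redeploy doesn't trigger a full LLM re-extraction pass.
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the serialized graph shape.
interface GraphSnapshot {
  entities: { name: string; type: string }[];
  relationships: { source: string; target: string; type: string }[];
}

const SNAPSHOT_PATH = join(tmpdir(), "graph-snapshot.json");

function saveGraph(graph: GraphSnapshot): void {
  writeFileSync(SNAPSHOT_PATH, JSON.stringify(graph));
}

function loadGraph(): GraphSnapshot | null {
  if (!existsSync(SNAPSHOT_PATH)) return null;
  return JSON.parse(readFileSync(SNAPSHOT_PATH, "utf8")) as GraphSnapshot;
}

// On startup: reuse the snapshot if present, otherwise rebuild from docs.
saveGraph({ entities: [{ name: "HazelJS", type: "TECHNOLOGY" }], relationships: [] });
const restored = loadGraph();
```

The same round-trip works against Redis; swap the fs calls for GET/SET on a single key.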


Install

npm install @hazeljs/rag openai

Clone the starter from the HazelJS monorepo or create your own using GraphRAGPipeline and DirectoryLoader from @hazeljs/rag.

Try it and share your feedback: GitHub · npm · hazeljs.ai
