DEV Community

Mir Mursalin Ankur
Graphify + code-review-graph: Build a Self-Updating Knowledge Graph for Claude Code and Other AI Coding Agents

Every developer working with LLMs on a large codebase eventually hits the same wall: context windows are finite, but codebases are not.

You start a new AI coding session, ask about the payment flow — and your agent starts re-reading dozens of files just to get oriented. Twenty thousand tokens evaporate before a single line of code is written. Multiply that by every session, every team member, every day.

Two open-source tools solve this in different but complementary ways:

  • Graphify — converts your folder into a queryable knowledge graph with community detection, Obsidian-compatible reports, and cross-file traversal
  • code-review-graph (CRG) — builds a SQLite-backed AST graph with blast-radius analysis, embedding-based semantic search, ~25 MCP tools (allow-listable to a working set of 8), and sub-second incremental updates

This guide walks through installing both tools, connecting them to any AI coding agent — Claude Code, Cursor, Gemini CLI, Windsurf, GitHub Copilot, and more — wiring auto-updates for code edited by humans, git commits, or the agent itself, and pairing everything with an Obsidian vault as a persistent memory layer.

Pick one, both, or none — every section degrades gracefully. Each tool works standalone; the smart-grep-hook, session-start cheatsheet, and CLAUDE.md routing rules all detect what is present and adapt. Use just graphify if you want a pure CLI / zero-MCP setup. Use just CRG if you want embedding-aware semantic search and PR-grade impact tools. Use both for the full stack (CRG primary, graphify on miss). If neither is installed, the agent silently falls back to grep — no broken hooks, no errors.
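
The detection that makes this degradation work can be sketched in a few lines of shell. This is a sketch only: pick_search_backend is a hypothetical helper name, and the two paths are the default output locations each tool uses.

```shell
# Sketch of the graceful-degradation order, assuming the default output
# paths (.code-review-graph/graph.db, graphify-out/graph.json).
# pick_search_backend is a hypothetical name, not part of either tool.
pick_search_backend() {
  if [ -f .code-review-graph/graph.db ]; then
    echo crg        # primary: MCP tools over the SQLite graph
  elif [ -f graphify-out/graph.json ]; then
    echo graphify   # fallback: CLI queries against the JSON graph
  else
    echo grep       # neither present: plain grep, no broken hooks
  fi
}
```

The same check, in one form or another, is what the smart-grep hook and the session-start cheatsheet below perform before doing anything visible.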

All commands in this guide were tested on Ubuntu and macOS across multiple real pnpm monorepos of varying sizes.


Real Numbers from Two Test Projects

Before diving in, here's what both tools produced across two real codebases — one a full-stack TypeScript monorepo with 5 packages, the other a lighter frontend-only repo:

| Metric | Graphify (AST-only) | code-review-graph |
|---|---|---|
| Files indexed (large) | 1,020 | 1,052 |
| Nodes (large) | 3,815 | 5,780 |
| Edges (large) | 4,830 | 30,611 |
| Files indexed (small) | 702 | 711 |
| Nodes (small) | 2,035 | 2,773 |
| Edges (small) | 2,357 | 15,037 |
| Communities | 750 / 499 | 28 wiki pages each |
| Incremental update | ~10s (8 workers) | 0.425s |
| LLM tokens used | 0 | 0 |
| Storage | graphify-out/ (JSON) | .code-review-graph/ (SQLite) |

How Each Tool Works

Pipeline Architecture: Graphify vs code-review-graph

Graphify — Two-Pass Graph with Communities

Your Code
    │
    ▼
Pass 1: Tree-sitter AST   ← 0 tokens, 25 languages
(classes, functions, imports, call graphs)
    │
    ▼
Pass 2: AI Extraction     ← only for PDFs, images, markdown (optional)
(semantic relationships via Claude subagents)
    │
    ▼
NetworkX Graph + Leiden Clustering
    │
    ├── graphify-out/graph.json       (queryable)
    ├── graphify-out/GRAPH_REPORT.md  (750 communities, Obsidian links)
    ├── graphify-out/graph.html       (interactive visual)
    └── graphify-out/cache/           (SHA256 per file)

Each edge has a confidence tag:

| Tag | Source | Confidence |
|---|---|---|
| EXTRACTED | Directly in AST | 1.0 |
| INFERRED | Reasonable deduction | 0.7–0.9 |
| AMBIGUOUS | Needs review | <0.7 |

On the large monorepo: 87% EXTRACTED · 13% INFERRED · 0% AMBIGUOUS
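
You can tally these percentages on your own graph with a small helper. The edges-array layout and the per-edge tag field name are assumptions here — inspect your actual graphify-out/graph.json to confirm them before relying on the numbers.

```shell
# Tally confidence tags from graphify's JSON output. The 'edges' array and
# per-edge 'tag' field are assumed names; check your graph.json schema.
edge_tag_stats() {  # usage: edge_tag_stats graphify-out/graph.json
  python3 - "$1" <<'PY'
import collections, json, sys
graph = json.load(open(sys.argv[1]))
counts = collections.Counter(e.get('tag', 'UNKNOWN') for e in graph.get('edges', []))
total = sum(counts.values()) or 1
for tag, n in counts.most_common():
    print(f'{tag:<10} {n:>6}  {100 * n / total:.0f}%')
PY
}
```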

code-review-graph — Blast-Radius Graph with MCP

Your Code (git-tracked files)
    │
    ▼
Tree-sitter AST (23 languages, 0 tokens)
    │
    ▼
SQLite (.code-review-graph/graph.db)
    │
    ├── Nodes: functions, classes, files
    ├── Edges: imports, calls, inheritance
    └── Full-text search index (FTS5)
    │
    ▼
~25 MCP tools available (allow-list to ~8 via CRG_TOOLS env — see "Strip Unused CRG Tools")
(semantic_search_nodes, query_graph, get_impact_radius, list_communities, ...)

Installation

Install one or both — the rest of the guide shows where each section applies.

Ubuntu

# Pick what you need:
pip install graphifyy         # graphify CLI (note: two y's on PyPI)
pip install code-review-graph # CRG (CLI + MCP server, includes embeddings extra)

# Verify (only the lines for installed tools):
graphify --help | head -5
code-review-graph --version

macOS

# Via uv (fastest):
uv tool install graphifyy           # graphify only
uv tool install code-review-graph   # CRG only — or run both lines for both

# Or via pipx:
brew install pipx
pipx install graphifyy
pipx install code-review-graph

PyPI quirk: The package is graphifyy (two y's). The CLI command after install is graphify (one y).


Step 1: Create Ignore Files

Before building any graph, exclude noise from indexing. Place these at your project root.

.graphifyignore

node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*

.code-review-graphignore (same content)

node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*

Step 2: Build the Graphs Manually

code-review-graph

cd /path/to/your-project

# Full build (first time) — parses all files
code-review-graph build

# Output (large monorepo):
# Full build: 1052 files, 5780 nodes, 30611 edges (postprocess=full)

# Output (smaller frontend repo):
# Full build: 711 files, 2773 nodes, 15037 edges (postprocess=full)
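
A quick way to sanity-check those build numbers against the database itself — this reads the same nodes and edges tables the session-start cheatsheet hook later in this guide queries; crg_counts is a hypothetical helper name:

```shell
# Read node/edge counts straight from CRG's SQLite store. Table names
# (nodes, edges) match what the session-start cheatsheet hook queries.
crg_counts() {  # usage: crg_counts [path/to/graph.db]
  python3 - "${1:-.code-review-graph/graph.db}" <<'PY'
import sqlite3, sys
db = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
nodes = db.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
edges = db.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
db.close()
print(f"{nodes} nodes, {edges} edges")
PY
}
```

If the counts printed here disagree wildly with what `build` reported, the ignore files from Step 1 are probably not being picked up.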

Graphify (AST-only, no LLM cost)

cd /path/to/your-project

# AST-only update (no API key required)
graphify update .

# Output (large monorepo):
# Rebuilt: 3815 nodes, 4830 edges, 750 communities
# graph.json, graph.html and GRAPH_REPORT.md updated in graphify-out

# Output (smaller repo):
# Rebuilt: 2035 nodes, 2357 edges, 499 communities

For the richer semantic graph (PDFs, images, markdown — uses LLM):

# Full extraction with Claude subagents (requires ANTHROPIC_API_KEY)
graphify extract .

Step 3: Register with Your AI Agent

code-review-graph (auto-configures 5 platforms)

code-review-graph install

This single command:

  • Writes .mcp.json (Claude Code MCP server config)
  • Writes .cursor/mcp.json, .opencode.json, Zed settings, .cursorrules, GEMINI.md, AGENTS.md
  • Creates .claude/skills/ for Claude Code tool integration
  • Installs hooks in .claude/settings.json (move these out — see below)
  • Installs a git pre-commit hook
  • Updates .gitignore to exclude .code-review-graph/

Post-install housekeeping: code-review-graph install is aggressive — it writes configs for every AI IDE it knows about. Most teams only use one. Add the noise to .gitignore:

# .gitignore additions
AGENTS.md
GEMINI.md
.mcp.json          # keep only .mcp.example.json
.cursorrules
.windsurfrules
.opencode.json
.kiro/

Three-file hook pattern: settings.json stays clean — permissions only, no hooks. Hooks go into example/local:

  • .claude/settings.example.json (committed) — documents the full hook structure for teammates. Contains all graph hooks (Stop, SessionStart, PreToolUse). Copy from here to activate.
  • .claude/settings.local.json (gitignored) — your actual personal hooks that fire at runtime. Copy from settings.example.json on first setup, then customize with env vars or secrets. You can also copy individual hooks into your global ~/.claude/settings.json if you want them active across all projects.
// .claude/settings.example.json  committed reference, shows hook structure
// Copy this file to .claude/settings.local.json and customize
{
  "hooks": {
    "PostToolUse": [],
    "Stop": [
      {
        "_comment": "CRG auto-update after AI finishes turn — once per turn, PID-guarded.",
        "hooks": [{
          "type": "command",
          "command": "command -v code-review-graph >/dev/null 2>&1 && [ -d .code-review-graph ] && { PF=/tmp/crg-claude.pid; if [ -f \"$PF\" ] && kill -0 \"$(cat \"$PF\")\" 2>/dev/null; then true; else { code-review-graph update --skip-flows 2>/dev/null && nohup code-review-graph embed >/dev/null 2>&1 & } & echo $! > \"$PF\"; fi; } || true",
          "timeout": 5
        }]
      }
    ],
    "SessionStart": [
      {
        "_comment": "CRG graph status on session open",
        "hooks": [{
          "type": "command",
          "command": "command -v code-review-graph >/dev/null 2>&1 && code-review-graph status 2>/dev/null || true",
          "timeout": 10
        }]
      },
      {
        "_comment": "Setup nudge — prompt to initialize if CLI installed but no graph built yet",
        "hooks": [{
          "type": "command",
          "command": "if command -v code-review-graph >/dev/null 2>&1 && git rev-parse --is-inside-work-tree >/dev/null 2>&1 && [ ! -d .code-review-graph ]; then printf '{\"systemMessage\":\"Graph tool installed but not yet initialized. Ask me to set up: code-review-graph (code-review-graph install)\"}'; fi || true",
          "timeout": 10
        }]
      },
      {
        "_comment": "Graph query cheatsheet — injected once per session (~150 tokens)",
        "hooks": [{
          "type": "command",
          "command": "if [ -f .code-review-graph/graph.db ] || [ -f graphify-out/graph.json ]; then STATS=\"\"; TOOL_LINES=\"\"; if [ -f .code-review-graph/graph.db ]; then STATS=$(python3 -c \"import sqlite3; c=sqlite3.connect('.code-review-graph/graph.db'); n=c.execute('SELECT COUNT(*) FROM nodes').fetchone()[0]; e=c.execute('SELECT COUNT(*) FROM edges').fetchone()[0]; print(f'{n} nodes, {e} edges'); c.close()\" 2>/dev/null || echo \"\"); TOOL_LINES=\"  where is X defined    → semantic_search_nodes_tool(query=X)\\n  who calls X           → query_graph_tool(pattern=callers_of, target=X)\\n  pre-refactor blast    → get_impact_radius_tool(changed_files=[...])\\n  community/cluster     → list_communities_tool()\\n  code review context   → get_review_context_tool(changed_files=[...])\"; fi; if [ -f graphify-out/graph.json ]; then GFY_STATS=$(python3 -c \"import json; g=json.load(open('graphify-out/graph.json')); nodes=g.get('nodes',[]); comms=len(set(n.get('community','') for n in nodes if n.get('community',''))); print(f'{len(nodes)} nodes, {comms} communities')\" 2>/dev/null || echo \"\"); [ -n \"$GFY_STATS\" ] && STATS=\"${STATS:+$STATS | }graphify: $GFY_STATS\"; TOOL_LINES=\"${TOOL_LINES:+$TOOL_LINES\\n}  CRG miss / explore    → graphify query '<term>' --graph graphify-out/graph.json\\n  path A→B              → graphify path '<from>' '<to>' --graph graphify-out/graph.json\"; fi; printf '{\"hookSpecificOutput\":{\"hookEventName\":\"SessionStart\",\"additionalContext\":\"GRAPH QUERY CHEATSHEET (%s) — use BEFORE Read/Grep/Bash-find on code:\\n%s\\nSkip graph for: .md .json .yml .log .jsonl configs cross-repo paths.\\nOverride grep gate: append --graph-tried to any Bash command.\"}}' \"$STATS\" \"$TOOL_LINES\"; fi",
          "timeout": 5
        }]
      }
    ]
  }
}
# .claude/settings.local.json — gitignored, actual runtime hooks
# Copy from settings.example.json on first setup:
cp .claude/settings.example.json .claude/settings.local.json
# Then add any personal env vars or secrets

Why Stop, not PostToolUse? PostToolUse fires after every individual file edit — if Claude makes 10 edits in one response, it fires 10 times. The pgrep guard that prevents double-spawning has a race window of a few milliseconds, and near-simultaneous PostToolUse events slip through it. The result: multiple CRG Python processes pile up, load average hits 12+, RAM saturates, and the machine slows to a crawl. Stop fires once when the AI finishes its entire turn — all edits batched, a single update triggered. The PID-file guard then prevents overlap across turns (skip if the previous turn's update is still running). Zero pile-up by design.
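
Stripped to its essentials, the PID-file guard in the Stop hook works like this (run_once_guarded is a hypothetical name used for illustration):

```shell
# Minimal PID-file guard: skip the launch while a previous run is alive,
# otherwise background the command and record its PID. Same pattern as the
# Stop hook above, with error handling trimmed for clarity.
run_once_guarded() {  # usage: run_once_guarded /tmp/my.pid cmd args...
  pidfile=$1; shift
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    return 0          # previous update still running: skip, no pile-up
  fi
  "$@" &
  echo $! > "$pidfile"
}
```

kill -0 sends no signal; it only tests whether the recorded PID still exists, which is why a stale PID file from a finished run never blocks the next one.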

Why graphify update is NOT in any Claude hook: graphify update takes ~10s+ on a real monorepo (even with SHA256 caching). Running it in Stop or PostToolUse would block the session or cause the process to hang in the background, accumulating across turns. It belongs only in git hooks (post-commit, post-checkout) where it runs as a fully detached nohup process after the commit returns — the developer has already moved on, so the latency is invisible.


The .mcp.json it creates (keep as .mcp.example.json only):

{
  "mcpServers": {
    "code-review-graph": {
      "command": "uvx",
      "args": ["code-review-graph", "serve"],
      "type": "stdio"
    }
  }
}
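
If you do keep a local .mcp.json, the CRG_TOOLS allow-list mentioned earlier can live in the server's env block. The comma-separated value format here is an assumption — confirm the exact syntax against the CRG documentation; the tool names are the ones listed in the pipeline diagram above:

```json
{
  "mcpServers": {
    "code-review-graph": {
      "command": "uvx",
      "args": ["code-review-graph", "serve"],
      "type": "stdio",
      "env": {
        "CRG_TOOLS": "semantic_search_nodes,query_graph,get_impact_radius,list_communities"
      }
    }
  }
}
```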

Graphify (AI Agent Integration)

# Adds graphify section to CLAUDE.md + PreToolUse hook (Claude Code)
graphify claude install

Graphify's claude install adds a PreToolUse hook that intercepts grep, rg, find commands and redirects the agent to graphify query instead — turning search interception into graph navigation. For other agents: code-review-graph install already writes equivalent config to GEMINI.md, AGENTS.md, .cursorrules, and Zed settings.

Note: graphify claude install writes the PreToolUse hook into .claude/settings.json. Move it to .claude/settings.example.json (committed as a reference) and copy to .claude/settings.local.json to activate it locally — not every teammate will have graphify installed, so it shouldn't fire automatically for everyone.

Auto-Update on Commit (the full picture)

Auto-Update Triggers: PostToolUse, git commit, branch switch

Three independent triggers keep the graph fresh — install all of them so no edit path leaves it stale:

| Trigger | Hook location | Tool | Notes |
|---|---|---|---|
| Claude finishes a turn | .claude/settings.local.json (Stop hook; reference copy in settings.example.json) | CRG only | ~0.425s — fast enough to run after every AI turn; PID-guarded |
| Any commit | .git/hooks/post-commit (or .husky/post-commit) | graphify + CRG | terminal commits, IDE commits, other AI tools; graphify runs as background nohup |
| Branch switch | .git/hooks/post-checkout (or .husky/post-checkout) | graphify + CRG | smart: ≤5 files diff → incremental update; >5 files or new branch → full rebuild; background nohup |

graphify is NOT in the Claude Stop hook. graphify update takes ~10s+ on large monorepos — too slow for an AI turn hook. It would pile up, hang in the background, and saturate CPU/RAM (observed: 3 stuck processes at 65–73% CPU each). Use git hooks instead — the developer has already moved on by the time graphify finishes.

Graphify ships its own installer for the git side:

graphify hook install   # writes post-commit + post-checkout

Code-review-graph does not. Its install command writes a pre-commit hook that runs detect-changes --brief (a status warning before the commit lands) but no post-commit hook to update the SQLite graph after. Without one, .code-review-graph/graph.db only refreshes when Claude touches a file — every terminal/IDE/other-tool commit drifts.

Add the post-commit update yourself. Both forms are detached so git commit returns immediately.

Resource guard required. Graph rebuilds are CPU-intensive. Without guards, multiple rebuilds can pile up (observed: 3 concurrent graphify processes at 65–73% CPU each, load average 12+, RAM saturated). Every hook below includes a _resources_ok check (CPU ≤ 50% of cores, memory ≥ 2 GB free) and a pgrep process deduplication guard. If either check fails, the rebuild silently skips — the next commit retriggers a fresh attempt.

Plain git project — append to .git/hooks/post-commit:

#!/bin/sh
# Knowledge graph tools are optional — each section silently skips if tools are absent.

# Returns 0 (ok) when CPU load and free memory are within acceptable limits.
# CPU: 1-min load average must be <= 50% of logical core count (Linux/macOS)
#      or aggregate load percentage <= 50% (Windows).
# Memory: effectively available memory must be >= 2048 MB on all platforms.
_resources_ok() {
  _os=$(uname -s 2>/dev/null)
  case "$_os" in
    Linux)
      _nproc=$(nproc 2>/dev/null || grep -c '^processor' /proc/cpuinfo 2>/dev/null || echo 1)
      _cpu_ok=$(awk -v n="$_nproc" 'NR==1 { print ($1 / n <= 0.50) ? "1" : "0" }' /proc/loadavg 2>/dev/null)
      if [ "${_cpu_ok:-1}" != "1" ]; then
        echo "[graph hook] Skipping rebuild — CPU load above 50% threshold"
        return 1
      fi
      _mem_ok=$(awk '/^MemAvailable:/ { print ($2 / 1024 >= 2048) ? "1" : "0" }' /proc/meminfo 2>/dev/null)
      if [ "${_mem_ok:-1}" != "1" ]; then
        echo "[graph hook] Skipping rebuild — available memory below 2 GB threshold"
        return 1
      fi
      ;;
    Darwin)
      _nproc=$(sysctl -n hw.logicalcpu 2>/dev/null || echo 1)
      _cpu_ok=$(sysctl -n vm.loadavg 2>/dev/null | awk -v n="$_nproc" \
        '{ gsub(/[{}]/, ""); print ($1 / n <= 0.50) ? "1" : "0" }')
      if [ "${_cpu_ok:-1}" != "1" ]; then
        echo "[graph hook] Skipping rebuild — CPU load above 50% threshold"
        return 1
      fi
      _page_size=$(sysctl -n hw.pagesize 2>/dev/null || echo 4096)
      _mem_ok=$(vm_stat 2>/dev/null | awk -v ps="$_page_size" '
        /Pages free:/        { gsub(/\./, "", $3); free = $3 + 0 }
        /Pages inactive:/    { gsub(/\./, "", $3); inactive = $3 + 0 }
        /Pages speculative:/ { gsub(/\./, "", $3); spec = $3 + 0 }
        END { print ((free + inactive + spec) * ps / 1048576 >= 2048) ? "1" : "0" }
      ')
      if [ "${_mem_ok:-1}" != "1" ]; then
        echo "[graph hook] Skipping rebuild — available memory below 2 GB threshold"
        return 1
      fi
      ;;
    MINGW*|MSYS*|CYGWIN*)
      if command -v powershell.exe > /dev/null 2>&1; then
        _cpu_pct=$(powershell.exe -NoProfile -Command \
          "(Get-CimInstance Win32_Processor | Measure-Object -Property LoadPercentage -Average).Average" \
          2>/dev/null | tr -d '\r\n ')
        _cpu_over=$(echo "${_cpu_pct:-0}" | awk '{ print ($1 > 50) ? "1" : "0" }')
        if [ "$_cpu_over" = "1" ]; then
          echo "[graph hook] Skipping rebuild — CPU load above 50% threshold"
          return 1
        fi
        _mem_mb=$(powershell.exe -NoProfile -Command \
          "[math]::Round((Get-CimInstance Win32_OperatingSystem).FreePhysicalMemory / 1024)" \
          2>/dev/null | tr -d '\r\n ')
        _mem_low=$(echo "${_mem_mb:-9999}" | awk '{ print ($1 < 2048) ? "1" : "0" }')
        if [ "$_mem_low" = "1" ]; then
          echo "[graph hook] Skipping rebuild — available memory below 2 GB threshold"
          return 1
        fi
      fi
      ;;
  esac
  return 0
}

# Skip during rebase/merge/cherry-pick
GIT_DIR=$(git rev-parse --git-dir 2>/dev/null)
[ -d "$GIT_DIR/rebase-merge" ] && exit 0
[ -d "$GIT_DIR/rebase-apply" ] && exit 0
[ -f "$GIT_DIR/MERGE_HEAD" ] && exit 0
[ -f "$GIT_DIR/CHERRY_PICK_HEAD" ] && exit 0

# ── graphify: rebuild graph after commit ─────────────────────────────────────
# Entire block skips if graphify is not installed.
if command -v graphify > /dev/null 2>&1; then
  CHANGED=$(git diff --name-only HEAD~1 HEAD 2>/dev/null || git diff --name-only HEAD 2>/dev/null)
  if [ -n "$CHANGED" ]; then
    GRAPHIFY_BIN=$(command -v graphify)
    case "$GRAPHIFY_BIN" in
      *.exe) _SHEBANG="" ;;
      *)     _SHEBANG=$(head -1 "$GRAPHIFY_BIN" | sed 's/^#![[:space:]]*//') ;;
    esac
    case "$_SHEBANG" in
      */env\ *) GRAPHIFY_PYTHON="${_SHEBANG#*/env }" ;;
      *)        GRAPHIFY_PYTHON="$_SHEBANG" ;;
    esac
    case "$GRAPHIFY_PYTHON" in
      *[!a-zA-Z0-9/_.@-]*) GRAPHIFY_PYTHON="" ;;
    esac
    if [ -z "$GRAPHIFY_PYTHON" ] || ! "$GRAPHIFY_PYTHON" -c "import graphify" 2>/dev/null; then
      GRAPHIFY_PYTHON=""
      command -v python3 >/dev/null 2>&1 && python3 -c "import graphify" 2>/dev/null \
        && GRAPHIFY_PYTHON="python3"
      [ -z "$GRAPHIFY_PYTHON" ] && command -v python >/dev/null 2>&1 \
        && python -c "import graphify" 2>/dev/null && GRAPHIFY_PYTHON="python"
    fi
    if [ -n "$GRAPHIFY_PYTHON" ] && ! pgrep -qf 'graphify' 2>/dev/null && _resources_ok; then
      export GRAPHIFY_CHANGED="$CHANGED"
      _LOG="${HOME}/.cache/graphify-rebuild.log"
      mkdir -p "$(dirname "$_LOG")"
      echo "[graphify hook] launching background rebuild (log: $_LOG)"
      nohup timeout 300 $GRAPHIFY_PYTHON -c "
import os, sys
from pathlib import Path
changed_raw = os.environ.get('GRAPHIFY_CHANGED', '')
changed = [Path(f.strip()) for f in changed_raw.strip().splitlines() if f.strip()]
if not changed:
    sys.exit(0)
print(f'[graphify hook] {len(changed)} file(s) changed - rebuilding graph...')
try:
    from graphify.watch import _rebuild_code
    _force = os.environ.get('GRAPHIFY_FORCE', '').lower() in ('1', 'true', 'yes')
    _rebuild_code(Path('.'), force=_force)
except Exception as exc:
    print(f'[graphify hook] Rebuild failed: {exc}')
    sys.exit(1)
" > "$_LOG" 2>&1 < /dev/null &
      disown 2>/dev/null || true
    fi
  fi
fi

# ── code-review-graph: incrementally update MCP graph index after commit ────
# Uses `update --skip-flows` (incremental, sub-second) rather than `build`
# (full rebuild — minutes on large repos).
if command -v code-review-graph > /dev/null 2>&1 && [ -d .code-review-graph ] && ! pgrep -qf 'code-review-graph update' 2>/dev/null && _resources_ok; then
  _LOG="${HOME}/.cache/code-review-graph-update.log"
  mkdir -p "$(dirname "$_LOG")"
  echo "[code-review-graph] Commit detected - launching background update (log: $_LOG)"
  nohup sh -c 'code-review-graph update --skip-flows && code-review-graph embed' > "$_LOG" 2>&1 < /dev/null &
  disown 2>/dev/null || true
fi

Husky project — create .husky/post-commit (Husky v9+ runs .husky/<hookname> directly; no .husky/_/ shim editing needed). The content is identical to the plain git version above — copy the entire script. Always prefer update --skip-flows (incremental, sub-second) over build (full rebuild — minutes on large repos):

# .husky/post-commit — same content as .git/hooks/post-commit above
# (copy the full script from the plain git example)

Verify both graphs caught up to the latest commit:

git log -1 --format=%ci                          # last commit timestamp
stat -f %Sm graphify-out/graph.json              # macOS
stat -f %Sm .code-review-graph/graph.db          # macOS
# stat -c %y graphify-out/graph.json             # Linux equivalent

Both file mtimes should be at or after the commit timestamp. If graph.db lags, the post-commit hook isn't firing — check chmod +x on the hook file and that pnpm install ran (so Husky's prepare script wired up .husky/).
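
That manual comparison can be scripted. This sketch handles both stat dialects (graph_is_fresh is a hypothetical helper, not part of either tool):

```shell
# Compare the graph file's mtime against the last commit time. GNU stat
# uses -c %Y, BSD/macOS stat uses -f %m, hence the fallback chain.
graph_is_fresh() {  # usage: graph_is_fresh path/to/graph-file
  commit_ts=$(git log -1 --format=%ct 2>/dev/null || echo 0)
  file_ts=$(stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null || echo 0)
  [ "$file_ts" -ge "$commit_ts" ]
}
```

Call it as `graph_is_fresh .code-review-graph/graph.db` right after a commit; a nonzero exit means the graph lagged behind and the hook chain needs checking.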

Agent Config: trigger-list pattern (CLAUDE.md / AGENTS.md / GEMINI.md)

Rather than pasting the full tool docs into your agent config file (which bloats every session's context), create a dedicated docs/agent/knowledge-graph.md and add a trigger-list pointer. This works the same way across agents — Claude Code reads CLAUDE.md, Gemini CLI reads GEMINI.md, Codex and OpenAI agents read AGENTS.md:

## Knowledge Graph

**Read `docs/agent/knowledge-graph.md` whenever you:**
- Answer any architecture, cross-module, or "how does X work" question
- Plan to grep, find, or glob through the codebase
- Need to understand an unfamiliar module or trace a call chain
- Are about to refactor or change something with unclear blast radius

The doc covers graphify (community detection, path tracing, GRAPH_REPORT.md), code-review-graph (MCP tools, impact analysis), when to use each, and the full auto-update pipeline.

Place this block in CLAUDE.md for Claude Code, GEMINI.md for Gemini CLI, or AGENTS.md for Codex/OpenAI-compatible agents — the trigger-list pattern works the same across all of them. A vague "read before exploring unfamiliar code" pointer is easy for any agent to skip. Explicit trigger conditions — especially "plan to grep" — activate the graph habit reliably. The full reference lives in docs/agent/knowledge-graph.md and is only loaded when actually needed.

Enforcing Graph-First: The Smart Grep Hook

The trigger-list in CLAUDE.md is a soft nudge — in practice, agents still reach for grep by default even with both knowledge graphs initialized and the pointer in place. Soft warnings alone do not reliably change behaviour.

The fix is a smarter PreToolUse hook that doesn't just warn — it answers the question inside the rejection, and handles every scenario gracefully: no graph, cross-repo grep, multi-repo registries.

Smart Grep Hook Decision Ladder

Decision ladder (first match wins, zero false positives):

| Situation | Hook behaviour |
|---|---|
| No graph in this repo, no registered siblings | Silent pass — zero noise |
| Non-code target (.md, .json, .log, .jsonl) | Silent pass |
| --graph-tried flag in command | Silent pass — explicit override |
| Grep targets abs path outside CWD + that dir has graph | Query that graph, show result as context, allow grep |
| Grep targets abs path outside CWD + no graph found | Silent pass — grep is the right tool here |
| No local graph + registered sibling repo matches pattern | Show sibling result as context, allow grep |
| First grep this session + local graph hit | Show answer and allow grep (one-shot lesson) |
| First grep this session + no graph hit | Allow grep, suggest MCP tool for next time |
| Subsequent grep + local graph hit | Deny with answer inline — zero retry loop |
| Subsequent grep + no graph hit | Silent pass |

The key insight: the deny message IS the answer. The costly retry loop (deny → model reads error → model tries graph → gets answer = 2 API calls) collapses to 1.
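
Concretely, an "answering deny" is just hook JSON whose denial reason carries the graph hit. The permissionDecision / permissionDecisionReason fields follow the Claude Code PreToolUse hook output schema — verify the field names against the current hooks documentation; the graph hit shown is a fabricated example result:

```shell
# Shape of an "answering deny". Field names follow the Claude Code PreToolUse
# hook schema (verify against current docs); the [crg] hit is made up.
HIT='[crg] function  processPayment  ->  src/billing/pay.ts:42'
printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"Graph already answers this: %s. Append --graph-tried to force grep."}}' "$HIT"
```

Because the model reads the reason string directly, it gets the file and line in the same API call that blocked the grep.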

Three graph sources, consulted in priority order:

  1. Local graph: .code-review-graph/graph.db in the current working directory
  2. Cross-repo graph: walk up from any absolute path in the command to find the nearest .code-review-graph/ directory
  3. Multi-repo registry: code-review-graph repos for globally registered repos

All lookups use the nodes_fts FTS5 full-text index — sub-millisecond, no external process, no network. Falls back to a LIKE scan if FTS returns empty.

Save ~/.claude/scripts/smart-grep-hook.sh:

#!/usr/bin/env bash
# ~/.claude/scripts/smart-grep-hook.sh
# Adaptive graph-first grep interceptor — CRG (SQLite FTS5) + graphify (JSON)
# Decision ladder (first match wins):
#   1. Not a search command              → pass silently
#   2. --graph-tried override            → pass silently
#   3. Non-code target (.md/.json/…)    → pass silently
#   4. Cross-repo abs path:
#      a. Target has CRG or graphify     → query both, show context, allow grep
#      b. No graph in target             → pass silently
#   5. No local graph of either type:
#      a. CRG registered repo match     → show context, allow grep
#      b. Global merged graphify match  → show context, allow grep
#      c. No match                      → pass silently
#   6. Local graph(s) present — session-aware gating:
#      a. First grep + hit              → show result, allow (one-shot lesson)
#      b. First grep + miss             → allow, suggest tool for next time
#      c. Subsequent + hit              → answering deny (result inline, no retry)
#      d. Subsequent + miss             → pass silently
set -uo pipefail

INPUT=$(cat)
CMD=$(printf '%s' "$INPUT" | python3 -c \
  "import json,sys; d=json.load(sys.stdin); print(d.get('tool_input',d).get('command',''))" \
  2>/dev/null || echo "")

# ── Tier 1-3: fast exits ──────────────────────────────────────────────────────
case "$CMD" in *grep*|*" rg "*|*"   rg "*|*ripgrep*|*" fd "*|*" ack "*|*" ag "*) ;;
  *find\ *) ;; *) exit 0 ;; esac
case "$CMD" in *--graph-tried*|*"# graph-checked"*|*"GRAPH_TRIED=1"*) exit 0 ;; esac
case "$CMD" in
  *.md*|*.json*|*.yml*|*.yaml*|*.log*|*.jsonl*|*.txt*|*.csv*|\
  *node_modules*|*"/.git/"*|*/dist/*|*/build/*|*/.next/*|*/__pycache__/*) exit 0 ;; esac

json_esc() { python3 -c "import json,sys; print(json.dumps(sys.stdin.read())[1:-1])"; }

# ── CRG: SQLite FTS5 query (sub-ms) ──────────────────────────────────────────
query_crg() {
  local db="$1" pat="$2"
  python3 - "$pat" "$db" <<'PYEOF' 2>/dev/null
import sqlite3, sys, os
pat, db = sys.argv[1], sys.argv[2]
if not os.path.exists(db): sys.exit(0)
try:
    c = sqlite3.connect(f"file:{db}?mode=ro", uri=True, timeout=3)
    rows = c.execute(
        "SELECT n.kind, n.name, n.file_path, n.line_start "
        "FROM nodes_fts f JOIN nodes n ON n.id=f.rowid WHERE nodes_fts MATCH ? LIMIT 5",
        (pat,)).fetchall()
    if not rows:
        rows = c.execute(
            "SELECT kind, name, file_path, line_start FROM nodes WHERE name LIKE ? LIMIT 5",
            (f'%{pat}%',)).fetchall()
    c.close()
    for kind, name, path, line in rows:
        print(f'[crg] {kind}  {name}  →  {path}:{line}')
except: pass
PYEOF
}

# ── graphify: JSON in-memory search (~5ms for 4k nodes) ──────────────────────
query_graphify() {
  local json="$1" pat="$2"
  python3 - "$pat" "$json" <<'PYEOF' 2>/dev/null
import json, sys, os
pat_low, gfile = sys.argv[1].lower(), sys.argv[2]
if not os.path.exists(gfile): sys.exit(0)
try:
    g = json.load(open(gfile))
    results = []
    for n in g.get('nodes', []):
        label = n.get('label', '')
        if pat_low in label.lower() or pat_low in n.get('id', '').lower():
            src = n.get('source_file', '')
            loc = n.get('source_location', '') or n.get('line', '')
            kind = n.get('file_type', 'node')
            community = n.get('community', '')
            results.append(f'[graphify] {kind}  {label}  →  {src}:{loc}  (community {community})')
    for r in results[:5]: print(r)
except: pass
PYEOF
}

# ── Query both graphs at a given root ─────────────────────────────────────────
query_at_root() {
  local root="$1" pat="$2"
  local out=""
  local crg_db="${root}/.code-review-graph/graph.db"
  local gfy_json="${root}/graphify-out/graph.json"
  [ -f "$crg_db" ]   && out+=$(query_crg "$crg_db" "$pat")$'\n'
  [ -f "$gfy_json" ] && out+=$(query_graphify "$gfy_json" "$pat")$'\n'
  printf '%s' "$out" | sed '/^[[:space:]]*$/d'
}

# ── Walk up to nearest repo with any graph ────────────────────────────────────
find_graph_root() {
  local d="$1"; [ ! -d "$d" ] && d=$(dirname "$d")
  while [ "$d" != "/" ] && [ "$d" != "$HOME" ]; do
    { [ -f "$d/.code-review-graph/graph.db" ] || [ -f "$d/graphify-out/graph.json" ]; } \
      && echo "$d" && return
    d=$(dirname "$d")
  done
}

# ── Extract search pattern ────────────────────────────────────────────────────
PATTERN=$(printf '%s' "$CMD" | python3 -c "
import sys, shlex
cmd = sys.stdin.read().strip()
try: parts = shlex.split(cmd)
except: parts = cmd.split()
bases = {'grep','rg','ripgrep','egrep','fgrep','ag','ack','fd'}
idx = next((i for i,p in enumerate(parts) if p.rsplit('/',1)[-1] in bases), -1)
if idx < 0: sys.exit(0)
for p in parts[idx+1:]:
    if not p.startswith('-') and '/' not in p and len(p) > 2: print(p[:60]); break
" 2>/dev/null || echo "")

# ── Tier 4: cross-repo detection ──────────────────────────────────────────────
TARGET=$(printf '%s' "$CMD" | python3 -c "
import sys, shlex, os
try: parts = shlex.split(sys.stdin.read())
except: parts = sys.stdin.read().split()
cwd = os.getcwd()
for p in parts:
    if p.startswith('/') and not p.startswith('-') and not p.startswith(cwd): print(p); break
" 2>/dev/null || echo "")

if [ -n "$TARGET" ]; then
  ROOT=$(find_graph_root "$TARGET")
  if [ -n "$ROOT" ] && [ -n "$PATTERN" ]; then
    RESULT=$(query_at_root "$ROOT" "$PATTERN")
    if [ -n "$RESULT" ]; then
      MSG="Cross-repo graph hit (${ROOT##*/}) for '${PATTERN}':\n${RESULT}\n\nGrep proceeding — result shown as context. Open a session in that repo for deeper analysis."
      printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","additionalContext":"%s"}}' \
        "$(printf '%s' "$MSG" | json_esc)"
    fi
  fi
  exit 0  # always allow cross-repo grep
fi

# ── Local graph detection ─────────────────────────────────────────────────────
HAVE_CRG=0; HAVE_GFY=0
[ -f .code-review-graph/graph.db ]   && HAVE_CRG=1
[ -f graphify-out/graph.json ]       && HAVE_GFY=1

# ── Tier 5: no local graph → registry + global merged graph ───────────────────
if [ "$HAVE_CRG" = "0" ] && [ "$HAVE_GFY" = "0" ]; then
  EXTRA_RESULT=""

  # 5a: CRG multi-repo registry
  if command -v code-review-graph >/dev/null 2>&1 && [ -n "$PATTERN" ] && [ "${#PATTERN}" -gt 2 ]; then
    REG=$(code-review-graph repos 2>/dev/null | python3 - "$PATTERN" <<'PYEOF' 2>/dev/null
import sys, os, sqlite3
pat = sys.argv[1]; results = []
for line in sys.stdin.read().strip().splitlines():
    for token in line.split():
        if not token.startswith('/'): continue
        db = f"{token}/.code-review-graph/graph.db"
        if not os.path.exists(db): continue
        try:
            c = sqlite3.connect(f'file:{db}?mode=ro', uri=True, timeout=2)
            rows = c.execute(
                'SELECT n.kind, n.name, n.file_path, n.line_start '
                'FROM nodes_fts f JOIN nodes n ON n.id=f.rowid WHERE nodes_fts MATCH ? LIMIT 3',
                (pat,)).fetchall()
            c.close()
            for kind, name, fp, ln in rows:
                results.append(f'[crg:{token}] {kind}  {name}  →  {fp}:{ln}')
        except: pass
for r in results[:5]: print(r)
PYEOF
    )
    [ -n "$REG" ] && EXTRA_RESULT+="$REG"$'\n'
  fi

  # 5b: global merged graphify (~/obsidian-vault/merged-graph.json)
  GLOBAL_GFY="${HOME}/obsidian-vault/merged-graph.json"
  if [ -f "$GLOBAL_GFY" ] && [ -n "$PATTERN" ] && [ "${#PATTERN}" -gt 2 ]; then
    GRESULT=$(query_graphify "$GLOBAL_GFY" "$PATTERN")
    [ -n "$GRESULT" ] && EXTRA_RESULT+="$GRESULT"$'\n'
  fi

  if [ -n "$EXTRA_RESULT" ]; then
    CLEAN=$(printf '%s' "$EXTRA_RESULT" | sed '/^[[:space:]]*$/d')
    MSG="No local graph. Sibling/global graph hit for '${PATTERN}':\n${CLEAN}\n\nGrep proceeding. Open a session in the repo shown above for deeper analysis."
    printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","additionalContext":"%s"}}' \
      "$(printf '%s' "$MSG" | json_esc)"
  fi
  exit 0
fi

# ── Tier 6: local graph(s) — session-aware gating ────────────────────────────
KEY=$(printf '%s' "${PWD}" | md5 2>/dev/null || printf '%s' "${PWD}" | md5sum 2>/dev/null | cut -c1-8)
DIR="${HOME}/.cache/claude-graph-hook"; mkdir -p "$DIR"
SLOT="${DIR}/first-grep-${KEY}-$(date +%Y%m%d_%H)"

RESULT=""
[ -n "$PATTERN" ] && [ "${#PATTERN}" -gt 2 ] && \
  RESULT=$(query_at_root "." "$PATTERN")

# Adaptive tool hint — shows MCP tool (CRG) or graphify CLI or both
TOOL_HINT=""
[ "$HAVE_CRG" = "1" ] && TOOL_HINT="semantic_search_nodes_tool(query='${PATTERN}')"
[ "$HAVE_GFY" = "1" ] && {
  GFY_HINT="graphify query '${PATTERN}' --graph graphify-out/graph.json"
  TOOL_HINT="${TOOL_HINT:+${TOOL_HINT} or }${GFY_HINT}"
}

if [ ! -f "$SLOT" ]; then
  touch "$SLOT"
  if [ -n "$RESULT" ]; then
    MSG="Graph pre-answer for '${PATTERN}':\n${RESULT}\n\nIf enough — skip the grep. Running this time (one-shot). Future code-path greps denied when graph answers. Override: --graph-tried."
  else
    MSG="No graph hit for '${PATTERN}' — grep proceeding (one-shot). Next time try: ${TOOL_HINT}. Append --graph-tried to bypass permanently."
  fi
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","additionalContext":"%s"}}' \
    "$(printf '%s' "$MSG" | json_esc)"
  exit 0
fi

# Subsequent greps: deny with the graph answer inline if it hits, else pass
if [ -n "$RESULT" ]; then
  MSG="Graph has this — no retry needed:\n\n${RESULT}\n\nUse: ${TOOL_HINT}. Append --graph-tried to override."
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"%s"}}' \
    "$(printf '%s' "$MSG" | json_esc)"
  exit 0
fi

printf '[graph-hook] no result for "%s"\n' "$PATTERN" >> "${DIR}/bypass.log"
exit 0

Wire it in settings.example.json (then copy to settings.local.json or ~/.claude/settings.json):

"PreToolUse": [
  {
    "matcher": "Bash",
    "hooks": [
      {
        "type": "command",
        "command": "bash ~/.claude/scripts/smart-grep-hook.sh",
        "timeout": 6
      }
    ]
  }
]

Pair with a concrete SessionStart cheatsheet — injects a query→tool map once per session (~150 tokens, prevents most reflexive greps before they happen). Fires only when a graph exists in the current project, silent otherwise:

"SessionStart": [
  {
    "hooks": [{
      "type": "command",
      "command": "if [ -f .code-review-graph/graph.db ] || [ -f graphify-out/graph.json ]; then STATS=''; TOOL_LINES=''; if [ -f .code-review-graph/graph.db ]; then STATS=$(python3 -c \"import sqlite3; c=sqlite3.connect('.code-review-graph/graph.db'); n=c.execute('SELECT COUNT(*) FROM nodes').fetchone()[0]; e=c.execute('SELECT COUNT(*) FROM edges').fetchone()[0]; print(f'{n} nodes, {e} edges'); c.close()\" 2>/dev/null || echo ''); TOOL_LINES=\"  where is X defined    → semantic_search_nodes_tool(query=X)\\n  who calls X           → query_graph_tool(pattern=callers_of, target=X)\\n  pre-refactor blast    → get_impact_radius_tool(changed_files=[...])\\n  community/cluster     → list_communities_tool()\\n  code review context   → get_review_context_tool(changed_files=[...])\"; fi; if [ -f graphify-out/graph.json ]; then GFY_STATS=$(python3 -c \"import json; g=json.load(open('graphify-out/graph.json')); nodes=g.get('nodes',[]); comms=len(set(n.get('community','') for n in nodes if n.get('community',''))); print(f'{len(nodes)} nodes, {comms} communities')\" 2>/dev/null || echo ''); [ -n \"$GFY_STATS\" ] && STATS=\"${STATS:+$STATS | }graphify: $GFY_STATS\"; TOOL_LINES=\"${TOOL_LINES:+$TOOL_LINES\\n}  CRG miss / explore    → graphify query '<term>' --graph graphify-out/graph.json\\n  path A→B              → graphify path '<from>' '<to>' --graph graphify-out/graph.json\"; fi; printf '{\"hookSpecificOutput\":{\"hookEventName\":\"SessionStart\",\"additionalContext\":\"GRAPH QUERY CHEATSHEET (%s) — use BEFORE Read/Grep/Bash-find on code:\\n%s\\nSkip graph for: .md .json .yml .log .jsonl configs cross-repo paths.\\nOverride grep gate: append --graph-tried to any Bash command.\"}}' \"$STATS\" \"$TOOL_LINES\"; fi",
      "timeout": 5
    }]
  }
]

Why --graph-tried beats a hard block:

13 characters. The agent adds it when it has already consulted the graph or when the query targets content the AST graph cannot index (string literals, config values, log text, content in another language not indexed). The hook detects the flag and passes silently — legitimate grep is never more than one retry away.
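The flag check itself is trivial: a substring test near the top of the hook. A minimal sketch (the function name is hypothetical; the real script's parsing may differ):

```python
def should_bypass(cmd: str) -> bool:
    """True when the agent opted out of graph gating via the escape hatch.
    The marker rides along as a shell comment, so a substring test suffices."""
    return "--graph-tried" in cmd

print(should_bypass('grep -rn "TODO" src/ # --graph-tried'))  # True
```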

Cross-repo and no-graph: always allow, never penalise:

The hook is non-blocking in every case where the graph can't definitively help. Cross-repo greps get graph context shown as additional info, not a gate. Repos without any graph see zero hook overhead. The only hard deny is when the local graph has the exact answer — and even then, --graph-tried is the immediate escape.

Session state (hourly bucket per repo):

~/.cache/claude-graph-hook/
├── first-grep-<repo-hash>-<YYYYMMDDHH>   ← one-shot allowance marker
└── bypass.log                             ← audit: greps that passed with no graph answer

The allowance resets hourly. The bypass.log is an audit trail — recurring patterns signal queries worth adding to a custom graph index or documentation.
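Mining that audit trail takes only a few lines. A sketch that tallies the most frequent unanswered patterns, assuming the `[graph-hook] no result for "..."` line format the hook writes above:

```python
import re
from collections import Counter
from pathlib import Path

def top_misses(log_text: str, n: int = 5):
    """Tally patterns that fell through to grep with no graph answer."""
    patterns = re.findall(r'no result for "([^"]+)"', log_text)
    return Counter(patterns).most_common(n)

log = Path.home() / ".cache/claude-graph-hook/bypass.log"
if log.exists():
    for pattern, count in top_misses(log.read_text()):
        print(f"{count:4d}  {pattern}")
```

Patterns that show up repeatedly are candidates for documentation or a custom index entry.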


Real Token Cost Comparison

Token Cost Per Lookup: grep vs graphify vs CRG

All numbers measured live on a 1336-node TypeScript monorepo, embeddings enabled (code-review-graph embed run once, then incremental after each edit).

Symbol and concept lookups

| Query | grep | graphify query | CRG keyword | CRG semantic |
|---|---|---|---|---|
| "Where is createThread?" | 594 tokens (2374 chars, 24 lines) | 1532 tokens (165 nodes, BFS depth=2) | 115 tokens | 115 tokens |
| "uiState persistence layer" | 1069 tokens (4274 chars, 43 lines) | 1552 tokens (87 nodes) | miss | 100 tokens |
| "comment submit handler" | 114 tokens (narrow query) | 1538 tokens | miss | 100 tokens |
| "what imports api.ts?" | 437 tokens (filenames only) | 1563 tokens (378 nodes) | 125 tokens (41 results) | 125 tokens |
| "who calls createThread?" | ~1109 tokens + manual filter | 1532 tokens | 80 tokens | 80 tokens |
| "error handling in server" | 2220 chars (noise: auto-generated .react-router types) | 1538 tokens | 477 chars (source only) | 477 chars |

Hidden grep cost: after a grep hit the model usually reads the matched file. comment-form.tsx = 909 tokens. comment-thread.tsx = 1397 tokens. A typical grep + follow-up read costs 1300–2500 tokens per question. CRG returns file:line — no follow-up read needed.
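The grep figures in the table track the common rule of thumb of roughly 4 characters per token. A quick estimator for budgeting your own queries (a heuristic, not a real tokenizer):

```python
import math

def est_tokens(chars: int) -> int:
    """Rough token estimate: ~4 characters per token for English and code."""
    return math.ceil(chars / 4)

# Matches the table: 2374 chars -> 594 tokens, 4274 chars -> 1069 tokens
print(est_tokens(2374), est_tokens(4274))
```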

Signal quality gap: grep returns every file containing the pattern — including dist/, generated .react-router/types/, auto-compiled .d.ts, and node_modules leakage. CRG indexes only what Tree-sitter parses as source. For the "error handling" query: grep returned 20 lines from auto-generated type files before any real source. CRG returned 5 precise hits: sendError and withJsonBody in http-utils.ts, ErrorBoundary in root.tsx and diff.tsx. Zero noise.

graphify query — right tool, wrong use case: BFS depth=2 returns 87–378 nodes (~1500 tokens) regardless of query specificity. For symbol lookup ("where is createThread?"), this is 15× more expensive than CRG semantic. For neighborhood exploration ("what connects to this module when I don't know the codebase?"), the BFS output is exactly what you want — a visual map of relationships. The smart-grep hook intercepts it for symbol queries and routes through CRG; if CRG misses, graphify query proceeds. Use graphify query deliberately for exploration, not as a default lookup tool.

Why graphify appears less in session logs than you'd expect: In a measured multi-hour session on a 1336-node monorepo, graphify had 2 actual code lookups vs CRG's 36 graph tool calls. This is correct behaviour, not underuse. CRG with embeddings covers 90%+ of targeted symbol and concept queries at 100–350 tokens — graphify BFS for the same query returns 1,500 tokens. The session was targeted feature work on a known codebase; both tools appropriately routed to CRG. Graphify's value surface is: unfamiliar codebase onboarding, cross-community architecture tracing, and path exploration between distant modules — none of which were needed. If the session had started with "I've never seen this codebase, orient me", graphify would have dominated.

The CRG-miss → grep fallback gap: When semantic_search_nodes_tool returns 0 results (happened 3× for exact symbol names like validateRef, parseUnifiedDiff), the current cheatsheet doesn't tell the agent to try graphify query next. The agent falls back to grep, which returned 6–29 chars of near-nothing. The correct fallback chain is: CRG semantic (miss) → graphify query '<term>' → grep. Add this line to your CLAUDE.md trigger list or SessionStart cheatsheet to close the gap:

CRG 0 results → graphify query '<term>' --graph graphify-out/graph.json

What grep cannot do at all

Some questions require multi-hop graph traversal. Grep answers the text search question; it cannot answer the dependency question.

Scenario: "What breaks if I change server.ts?"

With grep you would need to:

  1. Read server.ts header to find what it exports (~828 chars)
  2. grep -rn "startServer" packages/ to find 1-hop callers (~715 chars)
  3. For each caller found, grep again for their importers (~309+ chars)
  4. Aggregate manually — still missing transitive re-exports and inferred dependencies

Total: 3 separate operations, 1852+ chars, still incomplete.

With get_impact_radius_tool:

{"summary": "Blast radius for server.ts:
  - 21 nodes directly changed
  - 500 nodes impacted (within 2 hops)
  - 135 additional files affected",
 "risk": "high",
 "key_entities": ["completer","promptBranch","fetchDefaultReviewers"]}

1 call, 274 chars, complete 2-hop picture, risk rating included.

Grep reaches one level of importers in one pass. To get two levels, you need to loop. To get re-exports of re-exports, you need to understand the module graph. get_impact_radius_tool does all of this from the pre-built AST graph — sub-second.

This matters most before refactoring: touching a widely-imported file without an impact check risks breaking 135 files silently.
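Under the hood, a multi-hop blast radius is a breadth-first walk over reverse dependency edges. A sketch against CRG's SQLite store; the `edges(src, dst)` column names are an assumption here, so inspect your actual schema with `sqlite3 .code-review-graph/graph.db .schema` first:

```python
import sqlite3

def blast_radius(db_path: str, seed_ids: set, hops: int = 2) -> set:
    """Breadth-first walk over reverse edges: everything that depends
    on the seed nodes within `hops` levels."""
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    frontier, seen = set(seed_ids), set(seed_ids)
    for _ in range(hops):
        if not frontier:
            break
        marks = ",".join("?" * len(frontier))
        rows = con.execute(
            # src depends on dst, so walking src <- dst collects dependents
            f"SELECT src FROM edges WHERE dst IN ({marks})",
            tuple(frontier),
        ).fetchall()
        frontier = {r[0] for r in rows} - seen
        seen |= frontier
    con.close()
    return seen
```

The real tool adds risk scoring and file aggregation on top; the traversal itself is this cheap, which is why it returns sub-second.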

get_review_context_tool — best single call for code review

Before any code review, this one call returns: risk level, impacted node count, impacted file count, and test gaps — all in ~111 tokens (session-measured average; range 80–270 depending on depth). No follow-up reads needed.

get_review_context_tool(changed_files=["packages/git/src/threads.ts"], depth="minimal", include_source=false)

→ Risk: high
→ 359 impacted nodes in 126 files
→ test_gaps: 15

Compare with doing it manually: read the file (828 tokens) + grep importers (715 tokens) + read callers to assess risk (1400+ tokens) = 2943 tokens, and still incomplete.

Add it to your review workflow via the SessionStart cheatsheet — it fires once per session at ~150 tokens total, pointing the agent at this tool first for any PR or diff review.

The winner by query type

┌────────────────────────────────┬────────────────────────────────┬──────────────────────────────┐
│ Query type                     │ Best tool                      │ Est. tokens                  │
├────────────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ Find function/class X          │ semantic_search_nodes_tool     │ ~100                         │
│ Who calls X?                   │ query_graph_tool(callers_of)   │ ~80                          │
│ What imports file X?           │ query_graph_tool(importers)    │ ~125 (41 results)            │
│ Concept graph traversal        │ traverse_graph_tool(query=X)   │ ~1000 (measured avg, depth=3) │
│ Pre-refactor impact check      │ get_impact_radius_tool         │ ~70, complete 2-hop          │
│ Code review / PR impact        │ get_review_context_tool        │ ~111 (measured avg), risk + test gaps │
│ Module/behavioral clusters     │ list_communities_tool          │ ~170 (12 communities)        │
│ CRG miss / neighborhood explore│ graphify query '<term>'        │ ~1500 (BFS — use sparingly)  │
│ Exact path A→B hop-by-hop      │ graphify path '<A>' '<B>'      │ ~200                         │
│ String/regex in code body      │ grep (# --graph-tried)         │ varies                       │
│ Config/JSON/log values         │ grep — graph can't index these │ varies                       │
└────────────────────────────────┴────────────────────────────────┴──────────────────────────────┘

Real session measurement: complete token accounting

The per-query numbers above cover tool result sizes. A complete picture must also include the hidden context costs each approach carries: MCP schema overhead, required setup reads, and cache dynamics. These numbers were measured from a live multi-hour coding session on the same 1336-node monorepo (578 total API calls).

Hidden context overhead per approach:

| Approach | Schema/setup overhead | Per-call result | How it's paid |
|---|---|---|---|
| grep | 0 tokens (none) | ~550 tok/call | per call, direct |
| graphify CLI | 0 tokens (it's a Bash call) | ~292 tok/call | per call, direct |
| + GRAPH_REPORT.md | +458 tok (partial read, measured) | one-time | Read into context once |
| CRG MCP tools | ~6,000 tok always (~25 tool schemas, default) — drops to ~1,800 tok with CRG_TOOLS allow-list (8 tools) | ~69–1,359/call | written to cache on cold start, then served from cache at 10× lower cost |

Honest full-session accounting (37 graph tool calls):

| What | With CRG | With grep+Read |
|---|---|---|
| Schema/setup overhead | ~6,000 tok (cached) | 0 |
| Tool result tokens | 14,167 tok (measured) | 110,335 tok (estimated) |
| Total | ~20,167 tok | ~110,335 tok |
| Net saving | ~90,168 tokens — 5.5× cheaper after schema overhead | |

CRG schema tokens are written to the prompt cache on the first API call of a session (cold start: 59,310 tokens including system prompt + all tool schemas). After that they're served from cache_read at $0.30/MTok — one-tenth the cost of new input ($3.00/MTok). The get_architecture_overview_tool block saved 33,000 tokens from a single denied call; that alone exceeds the CRG schema overhead.

Cache economics for the full session:

| Metric | Value |
|---|---|
| Cold-start cache_write (call 1) | 59,310 tokens — full context incl. all schemas |
| Total cache_write across session | 3,056,502 tokens |
| Cold-start share of writes | 1.9% |
| Total cache_read | 50,942,267 tokens |
| New input tokens | 7,694 tokens |
| Est. session cost (Sonnet 4.x) | ~$36 (cache_write $11.46 + cache_read $15.28 + output $9.49) |

Estimated cost without CRG (same session, grep+Read instead): the additional 90K tool result tokens would each need to be cache-written then read — adding roughly $0.34 in cache_write + $2.70 in cache_read per 100K tokens. More importantly, a larger context window means cache blocks expire and must be re-created more often, compounding costs across 578 calls.
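The dollar line items above follow directly from the token counts. A sketch reproducing them, using the cache-read rate stated in this section and the cache-write rate implied by the measured figures (~$3.75/MTok, not quoted directly above):

```python
# Reproduces the session's cache line items from the measured token counts.
# The $3.75/MTok write rate is implied (11.46 / 3.056502), not quoted above.
MTOK = 1_000_000
CACHE_WRITE_RATE = 3.75   # $/MTok, implied by the session numbers
CACHE_READ_RATE = 0.30    # $/MTok, stated above

write_cost = 3_056_502 / MTOK * CACHE_WRITE_RATE
read_cost = 50_942_267 / MTOK * CACHE_READ_RATE
print(f"cache_write ${write_cost:.2f}  cache_read ${read_cost:.2f}")
# cache_write $11.46  cache_read $15.28
```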

Three-way comparison including all hidden costs (37 lookup calls, measured session):

                      grep          graphify CLI      CRG MCP
─────────────────────────────────────────────────────────────
Schema overhead       0 tok         0 tok             ~6,000 tok
                      (none)        (Bash call,       (~25 tool schemas,
                                     no MCP schema)    always in context)
Setup read            0 tok         +458 tok          0 tok
                                    (GRAPH_REPORT.md,
                                     one partial read,
                                     measured)
Per-call result       ~550 tok      ~292 tok          ~69–1,359 tok
Follow-up Read?       +~1,842 tok   not needed        not needed
                      (usually yes)

37 calls — total cost:
  Tool results        ~20,350 tok   ~10,804 tok       14,167 tok (measured)
  + overhead          0             +458 tok          +6,000 tok
  ─────────────────────────────────────────────────────────────
  TOTAL               ~20,350 tok   ~11,262 tok       ~20,167 tok

vs grep+Read equiv:   110,335 tok   110,335 tok       110,335 tok
Net saving            ~5.4×         ~9.8×             ~5.5×

graphify CLI is the most token-efficient per lookup when full overhead is counted — zero MCP schema tax, no follow-up Read needed, compact output. Its limitation is output size at BFS scale (87–378 nodes, ~1,500 tokens) and needing GRAPH_REPORT.md for orientation. For targeted symbol lookup CRG wins; for schema-free cost-conscious exploration graphify wins.

CRG's 6K schema overhead pays back fast: the blocked get_architecture_overview_tool call alone saved 33,000 tokens — 5× the entire schema overhead — from a single hook denial in this session. After cold-start, schemas cost $0.30/MTok (cache_read) not $3.00/MTok (new input).

Graphify vs CRG split: graphify had 2 actual code lookups; CRG had 36 tool calls. This is the right ratio for targeted feature work on a known codebase — CRG semantic search (100–350 tok) dominated because the work was precise. Graphify BFS (~1,500 tok) is for orientation on unfamiliar ground. Same session on an unknown codebase would flip the ratio.

Strip Unused CRG Tools (Cuts Schema Overhead 70%)

CRG registers 25 MCP tools by default but a typical session uses ~8. Every unused tool still costs always-loaded schema tokens. CRG ships with a built-in allow-list filter — _apply_tool_filter removes tools at server boot via mcp.local_provider.remove_tool(name). Filtered tools cannot be invoked at all, so the explicit PreToolUse block on get_architecture_overview_tool becomes redundant.

Set CRG_TOOLS in ~/.claude.json mcpServers env (or pass serve --tools ...):

"mcpServers": {
  "code-review-graph": {
    "type": "stdio",
    "command": "uvx",
    "args": ["--with", "code-review-graph[embeddings]", "code-review-graph", "serve"],
    "env": {
      "CRG_TOOLS": "semantic_search_nodes_tool,query_graph_tool,get_impact_radius_tool,traverse_graph_tool,list_communities_tool,get_community_tool,get_review_context_tool,list_graph_stats_tool"
    }
  }
}

The 8-tool working set covers 95% of session usage:

| Tool | When |
|---|---|
| semantic_search_nodes_tool | "where is X defined" — embedding match, ~100–350 tok |
| query_graph_tool | callers/callees/relations — pattern=callers_of, ~80 tok |
| get_impact_radius_tool | pre-refactor blast radius |
| traverse_graph_tool | dependency walk (keep depth ≤ 3) |
| list_communities_tool | architecture overview, ~170 tok for 12 clusters |
| get_community_tool | cluster drill-down |
| get_review_context_tool | code review with risk + test gaps, ~111 tok avg |
| list_graph_stats_tool | sanity check / status |

Dropped (17 tools): get_architecture_overview_tool (33k tok output — never useful), refactor_tool / apply_refactor_tool (use git), get_minimal_context_tool, get_docs_section_tool, get_wiki_page_tool, find_large_functions_tool, list_flows_tool / get_flow_tool / get_affected_flows_tool (flow analysis is rare), get_hub_nodes_tool / get_bridge_nodes_tool (covered by communities), get_knowledge_gaps_tool, get_surprising_connections_tool, get_suggested_questions_tool, list_repos_tool / cross_repo_search_tool (single-repo work).

Token math:

  • Before: ~6,000 tok schema overhead (25 tools, full descriptions + params, always loaded after cold-start)
  • After: ~1,800 tok (8 tools)
  • Savings: ~4,200 tok per session, ~70% schema reduction
  • Bonus: Claude Code's deferred-tool layer further lazy-loads param schemas via ToolSearch — only the description metadata stays always-loaded, so real overhead is even lower

Lazy loading reality check: MCP protocol itself does not support lazy-loading tool schemas — all are listed at startup. Claude Code adds a deferred-tool layer on top: tool descriptions are exposed for matching, but full param schemas only load when ToolSearch resolves a tool by name. So stripping at the CRG layer cuts the always-loaded description set; the Claude Code layer cuts the params. They compound.

By case:

| Workflow | Strategy |
|---|---|
| Targeted symbol lookup on known codebase | semantic_search_nodes_tool → fallback to query_graph_tool → fallback to graphify query → grep |
| Pre-refactor blast | get_impact_radius_tool + traverse_graph_tool(depth=2) |
| Architecture orientation | list_communities_tool → get_community_tool (NEVER get_architecture_overview_tool) |
| Code review on a PR | get_review_context_tool(changed_files=[...]) only |
| Cross-repo work | re-add cross_repo_search_tool, list_repos_tool to allow-list |
| Flow / DAG analysis | re-add list_flows_tool, get_flow_tool, get_affected_flows_tool |

A concurrent session on the same codebase used zero graph tools — only grep+Read (15 grep calls, 22 reads). Cold start: 60,610 tokens (the same MCP schemas loaded — schemas are always present whether or not graph tools are used). Tool result tokens were proportionally higher per lookup, with no structured relationship data returned.

Bypassing the smart-grep hook: append # --graph-tried as a shell comment (not a flag) to pass silently: grep -rn "TODO" src/ # --graph-tried. On macOS, ugrep rejects --graph-tried as an unknown flag — always use it as a comment.

Expensive Calls to Avoid

Some graph tool calls look useful but are token-budget traps.

get_architecture_overview_tool — returns 131,000 chars (~33,000 tokens), exceeds any usable context budget. Best fix: strip it via CRG_TOOLS allow-list (see "Strip Unused CRG Tools" above) so it cannot be called at all. If you can't strip it, block via PreToolUse hook. Alternative: list_communities_tool first (~170 tokens for 12 clusters), then get_community_tool for any cluster you want to drill into.

graphify query '<term>' — BFS depth=2 returns 87–378 nodes (~1500 tokens) regardless of specificity. The smart-grep hook intercepts it: if CRG has a matching symbol it blocks with the CRG result inline (~100 tokens); if CRG misses it allows graphify to proceed but warns about cost. This is the right split — CRG for symbol lookup, graphify for neighborhood exploration when you genuinely want to see what's connected.

traverse_graph_tool on hub nodes without budget — takes a query string (not source/target nodes), does concept-based traversal, and at depth=5+ on hub nodes (server.ts, api.ts) returns thousands of nodes. Keep depth ≤ 3 and token_budget ≤ 1500. For a specific A→B path, use graphify path '<A>' '<B>' instead — it returns the shortest hop chain in ~200 tokens. For "who calls X", use query_graph_tool(pattern=callers_of, target=X) (~80 tokens).

Tool stripping > hook blocking. The cleanest enforcement is the CRG_TOOLS allow-list shown earlier — stripped tools are removed from the MCP server entirely, so they cannot be invoked and their schemas never enter context. The PreToolUse hook block is only useful if you cannot edit the MCP server env (e.g., shared config).

The graphify query interception lives inside smart-grep-hook.sh and requires no extra config — it fires via the existing Bash PreToolUse hook.

Making This Work for Any Project

The hooks in settings.example.json are designed to be portable — they silently no-op when the relevant tool or graph data isn't present. This means you can copy them to settings.local.json per-project, or into ~/.claude/settings.json to activate across all projects.

What the hooks do (each row is a no-op when the relevant tool/file is missing):

| Hook | What fires | When | Resource-safe? |
|---|---|---|---|
| Stop | CRG update + embed (background, PID-guarded — silent if not installed/initialized) | Once after AI finishes each turn (all edits batched) | PID file prevents overlap |
| PreToolUse | smart-grep-hook.sh (graph-first routing — adaptive: CRG only, graphify only, both, or silent fallback) | Before every grep/find/graphify query | Read-only, no resource concern |
| PreToolUse | Linter config guard | Before writing ESLint/Prettier/Biome | N/A |
| SessionStart | Graph cheatsheet (adaptive: shows CRG tools, graphify CLI, both, or nothing) | Session open, only if graph exists | Read-only |
| SessionStart | Setup nudge (only when CLI installed but graph not initialized) | Session open | Read-only |

What the git hooks do (resource-guarded, all platforms):

| Hook | What fires | When | Guards |
|---|---|---|---|
| post-commit | graphify rebuild + CRG update + embed (both background nohup) | Every commit | _resources_ok + pgrep + timeout 300 |
| post-checkout | graphify rebuild + CRG update/build (smart: ≤5 files incremental, >5 full) | Branch switch only | _resources_ok + pgrep + timeout 300 |
| pre-commit | Branch protection (blocks direct commits to master/development/release branches) | Every commit attempt | — |

graphify is not in the Stop hook. Running graphify update after every AI turn causes long-running background processes that accumulate, spike CPU, and exhaust swap. graphify update fires only via git hooks (post-commit, post-checkout) where it runs detached after the developer's action — invisible latency.

Why no get_architecture_overview_tool block hook? The recommended setup uses the CRG_TOOLS allow-list to strip that tool entirely (see "Strip Unused CRG Tools"). A stripped tool cannot be invoked, so the explicit block hook becomes redundant. If you cannot or do not want to strip, add the PreToolUse block as a fallback.

Project .claude/settings.json is only for project-specific permissions — which package-manager commands to allow, which config files to protect. Graph hooks go in settings.example.json (committed reference) and settings.local.json (live copy) — or your global ~/.claude/settings.json if you want them everywhere.

One-time setup per project — pick what you have installed:

# CRG only (embedding-aware, MCP integration)
code-review-graph install
code-review-graph embed

# Graphify only (pure CLI, zero MCP overhead)
graphify init .
graphify update .

# Both — run all four; the smart-grep-hook routes CRG-first, graphify on miss

# Git hooks — graphify ONLY updates via git hooks (too slow for Claude hooks)
code-review-graph hook install   # post-commit: CRG update + embed (also runs in Stop hook)
graphify hook install            # post-commit + post-checkout: graphify update (background nohup)

After setup: every time the AI finishes a turn, the Stop hook fires once — CRG only updates in background (PID-guarded, ~0.425s, no pile-up). Every commit triggers both git hooks. graphify never runs from a Claude hook — only from git hooks. If only one tool is installed, only that one updates. If neither is installed, hooks are silent no-ops.


Step 4: Verify Your Setup

Run the checks for whichever tools you installed. Skip the rows for tools you did not install — there is no requirement to have both.

Check CLI installation

# Either or both:
graphify --help | head -3            # graphify only
code-review-graph --version          # CRG only
# code-review-graph 2.3.2

Check graph outputs exist

# Graphify (skip if not installed):
ls graphify-out/
# cache  graph.html  graph.json  GRAPH_REPORT.md  manifest.json

# CRG (skip if not installed):
code-review-graph status
# Nodes: 5780  Edges: 30611  Files: 1052
# Languages: bash, javascript, vue, typescript
# Last updated: 2026-05-05T18:29:51

Check GRAPH_REPORT.md freshness

head -15 graphify-out/GRAPH_REPORT.md
# ## Corpus Check
# - 1020 files · ~1,587,186 words
# - Verdict: corpus is large enough that graph structure adds value.
#
# ## Summary
# - 3815 nodes · 4830 edges · 750 communities (533 shown, 217 thin omitted)
# - Extraction: 87% EXTRACTED · 13% INFERRED · 0% AMBIGUOUS
#
# ## Graph Freshness
# - Built from commit: `4968d67a`
# - Run `git rev-parse HEAD` and compare to check if the graph is stale.
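That staleness check can be scripted rather than done by eye. A sketch that parses the `Built from commit` line out of GRAPH_REPORT.md and compares it to HEAD (helper names are hypothetical):

```python
import re
import subprocess

def report_commit(report_text: str):
    """Extract the short SHA from a 'Built from commit: `abc1234`' line."""
    m = re.search(r"Built from commit: `([0-9a-f]+)`", report_text)
    return m.group(1) if m else None

def graph_is_stale(report_path: str = "graphify-out/GRAPH_REPORT.md") -> bool:
    """True when HEAD no longer starts with the commit the graph was built from."""
    built = report_commit(open(report_path).read())
    head = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip()
    return built is None or not head.startswith(built)
```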

Check Stop hook is in settings.json (or settings.local.json)

python3 -m json.tool .claude/settings.json | grep -A5 '"Stop"'
# Should show the CRG PID-guard command
# PostToolUse should be empty ([]) — graph updates moved to Stop hook

# Verify PostToolUse is cleared (no graph update commands there):
python3 -m json.tool .claude/settings.local.json | grep -A3 '"PostToolUse"'
# Should show: "PostToolUse": []

Check MCP server config

cat .mcp.example.json
# Should show: code-review-graph serve entry
# Copy to .mcp.json locally if it doesn't exist yet:
cp .mcp.example.json .mcp.json

Check git hooks

# Husky projects (hooks live in .husky/, not .husky/_/)
grep -c "graphify" .husky/post-commit
# 1 (or more)

# Plain git projects
head -5 .git/hooks/pre-commit

Test incremental update speed

time code-review-graph update --skip-flows
# real  0m0.425s  ← CRG incremental speed (runs in Stop hook after each AI turn)

time graphify update .
# SHA256 cache skips unchanged files — subsequent runs are near-instant
# Note: graphify is slower (~10s on large repos) — this is why it updates only
# via git hooks (post-commit, post-checkout), never from Claude Code hooks

Test vault sync (if vault is configured)

cp graphify-out/GRAPH_REPORT.md ai-vault/graphify/GRAPH_REPORT.md
ls ai-vault/graphify/
# GRAPH_REPORT.md

Step 5: Query the Graphs

Graphify CLI

# BFS traversal — find connections between concepts
graphify query "what connects payment to enrollment" \
  --graph graphify-out/graph.json --budget 1500

# Depth-first traversal for flow tracing
graphify query "auth flow" --dfs --graph graphify-out/graph.json

# Shortest path between two nodes
graphify path "UserDashboard.vue" "PaymentService" \
  --graph graphify-out/graph.json

# Plain-language explanation of a node
graphify explain "UserDashboard.vue" --graph graphify-out/graph.json

Example output from graphify explain:

Node: UserDashboard.vue
  Source:    packages/client/src/views/Dashboard
  Community: 239
  Degree:    3

Connections (3):
  --> UserDashboard()  [imports_from] [EXTRACTED]
  --> handleApiError() [contains]     [EXTRACTED]
  --> AuthService      [calls]        [INFERRED]
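The same kind of view can be reconstructed straight from graph.json, which is handy when scripting. This sketch assumes the node/link schema used later in this guide (id, label, community on nodes; source, target, relation on links) and counts only outgoing edges, so it is an approximation of the CLI's output, not a replica:

```python
import json

def explain(graph_path, label):
    """Summarize a node's community and outgoing connections from graph.json."""
    g = json.load(open(graph_path))
    node = next(n for n in g["nodes"] if n["label"] == label)
    labels = {n["id"]: n["label"] for n in g["nodes"]}
    edges = [(labels.get(l["target"], l["target"]), l["relation"])
             for l in g.get("links", []) if l["source"] == node["id"]]
    return {"community": node.get("community"),
            "degree": len(edges),
            "edges": edges}
```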

Inside Your AI Agent

After graphify install and code-review-graph install, your AI agent gets graph tools automatically. For Claude Code, a SessionStart hook shows the graph status on every session open. Cursor, Gemini CLI, and Windsurf pick up the MCP server from .mcp.json instead.

# code-review-graph status on session open:
Nodes: 5780  Edges: 30611  Files: 1052
Languages: bash, javascript, vue, typescript
Last updated: 2026-05-05T18:29:51
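Those counts come from the SQLite database at .code-review-graph/graph.db; they can be read directly with the same nodes/edges queries the SessionStart hook runs:

```python
import sqlite3

def graph_stats(db_path=".code-review-graph/graph.db"):
    """Return (node_count, edge_count) from the CRG SQLite graph."""
    con = sqlite3.connect(db_path)
    try:
        nodes = con.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
        edges = con.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
    finally:
        con.close()
    return nodes, edges
```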

Step 6: Auto-Update Strategies

The graph is only valuable if it reflects current code. Here are the four strategies, from manual to fully automatic.

Strategy A: Manual Update (Always Available)

# Incremental (only changed files — fast)
code-review-graph update        # 0.425s on ~1000-file project
graphify update .               # SHA256 cache, AST-only

# Full rebuild (when switching branches or major refactors)
code-review-graph build
graphify update . --force

Strategy B: Auto-Update via Git Hooks (Recommended for graphify)

graphify update belongs in git hooks, not in Claude Code hooks. The incremental rebuild takes ~10s+ on a real monorepo — too slow for a per-turn trigger. Git hooks fire at the right moment: after a commit or branch switch, when the developer has already moved on.

graphify hook install   # installs post-commit + post-checkout

The post-commit hook runs the rebuild in the background (detached nohup process) so git commit returns immediately. Rebuild logs go to ~/.cache/graphify-rebuild.log. The post-checkout hook fires on every git checkout <branch> and rebuilds for the new branch state.

This is the only recommended automatic trigger for graphify update — do not add it to Stop, PostToolUse, or SessionStart hooks.

Both hooks installed by graphify hook install run as detached nohup processes, so the git operation returns immediately. Add the _resources_ok function (shown in the post-commit hook above) to the top of each hook file to prevent resource exhaustion.

.git/hooks/post-commit — see the full resource-guarded version in the "Auto-Update on Commit" section above.

.git/hooks/post-checkout (or .husky/post-checkout for Husky projects):

#!/bin/sh
# Knowledge graph tools are optional — each section silently skips if tools are absent.

# _resources_ok function — copy from the post-commit hook above
# (CPU ≤ 50% of logical cores, memory ≥ 2 GB free — same implementation)

PREV_HEAD=$1
NEW_HEAD=$2
BRANCH_SWITCH=$3

# Only run on branch switches, not file checkouts
[ "$BRANCH_SWITCH" != "1" ] && exit 0

# Skip during rebase/merge/cherry-pick
GIT_DIR=$(git rev-parse --git-dir 2>/dev/null)
[ -d "$GIT_DIR/rebase-merge" ] && exit 0
[ -d "$GIT_DIR/rebase-apply" ] && exit 0
[ -f "$GIT_DIR/MERGE_HEAD" ] && exit 0
[ -f "$GIT_DIR/CHERRY_PICK_HEAD" ] && exit 0

# Smart rebuild threshold — shared by graphify and CRG sections below
_ZERO="0000000000000000000000000000000000000000"
_CHANGED_COUNT=$(git diff --name-only "$PREV_HEAD" "$NEW_HEAD" 2>/dev/null | wc -l | tr -d ' ')
_FULL_REBUILD=false
{ [ "$PREV_HEAD" = "$_ZERO" ] || [ "${_CHANGED_COUNT:-0}" -gt 5 ]; } && _FULL_REBUILD=true

# ── graphify: force-rebuild graph after branch switch ────────────────────────
# Entire block skips if graphify is not installed or graph has never been built.
if command -v graphify > /dev/null 2>&1 && [ -d "graphify-out" ]; then
  # (Python interpreter detection — same as post-commit hook)
  GRAPHIFY_BIN=$(command -v graphify)
  # ... shebang detection omitted for brevity — copy from post-commit hook ...
  GRAPHIFY_PYTHON="python3"  # fallback

  if [ -n "$GRAPHIFY_PYTHON" ] && ! pgrep -qf 'graphify' 2>/dev/null && _resources_ok; then
    _LOG="${HOME}/.cache/graphify-rebuild.log"
    mkdir -p "$(dirname "$_LOG")"
    [ "$_FULL_REBUILD" = "true" ] && export GRAPHIFY_FORCE=true
    echo "[graphify] Branch switched (${_CHANGED_COUNT:-?} files diff, full=$_FULL_REBUILD) - launching background rebuild (log: $_LOG)"
    nohup timeout 300 $GRAPHIFY_PYTHON -c "
from graphify.watch import _rebuild_code
from pathlib import Path
import os, sys
try:
    _force = os.environ.get('GRAPHIFY_FORCE', '').lower() in ('1', 'true', 'yes')
    _rebuild_code(Path('.'), force=_force)
except Exception as exc:
    print(f'[graphify] Rebuild failed: {exc}')
    sys.exit(1)
" > "$_LOG" 2>&1 < /dev/null &
    disown 2>/dev/null || true
  fi
fi

# ── code-review-graph: update MCP graph index after branch switch ────────────
if command -v code-review-graph > /dev/null 2>&1 && [ -d .code-review-graph ] && ! pgrep -qf 'code-review-graph update\|code-review-graph build' 2>/dev/null && _resources_ok; then
  _LOG="${HOME}/.cache/code-review-graph-update.log"
  mkdir -p "$(dirname "$_LOG")"
  if [ "$_FULL_REBUILD" = "true" ]; then
    echo "[code-review-graph] Branch switched (full rebuild: ${_CHANGED_COUNT:-?} files diff) - launching background build (log: $_LOG)"
    nohup sh -c 'code-review-graph build' > "$_LOG" 2>&1 < /dev/null &
  else
    echo "[code-review-graph] Branch switched (incremental: ${_CHANGED_COUNT:-?} files diff) - launching background update (log: $_LOG)"
    nohup sh -c 'code-review-graph update --skip-flows && code-review-graph embed' > "$_LOG" 2>&1 < /dev/null &
  fi
  disown 2>/dev/null || true
fi

Key differences from post-commit:

  • Only runs on actual branch switches (BRANCH_SWITCH=1 check)
  • Smart rebuild: ≤5 files diff → incremental update; >5 files or new branch → full rebuild
  • CRG uses build (full SQLite rebuild) for large diffs, update --skip-flows for small ones
  • graphify sets GRAPHIFY_FORCE=true for large diffs to force full AST reparse

Strategy C: Git Hooks (On Commit + Branch Switch)

Both tools install git hooks automatically:

# code-review-graph: pre-commit hook in .git/hooks/pre-commit
code-review-graph install

# graphify: installs hooks in .git/hooks/ — for Husky projects, copy the hook logic to .husky/post-commit
graphify hook install

The graphify post-commit hook runs the rebuild in the background (detached process) so git commit returns immediately. Rebuild logs go to ~/.cache/graphify-rebuild.log.

Strategy D: Claude Code Hooks (AI-Driven Updates)

Only CRG belongs in Claude hooks. graphify does not.

| Tool | Claude hook? | Why |
|---|---|---|
| code-review-graph | ✅ Stop hook | ~0.425s incremental — safe after every AI turn |
| graphify | ❌ No Claude hook | ~10s+ on large monorepos — causes hanging background processes, CPU spikes, swap exhaustion |

When an AI agent edits files, the Stop hook triggers a CRG incremental update. code-review-graph install writes hooks into .claude/settings.json initially — move them out. The right home is .claude/settings.example.json (committed reference for teammates) and .claude/settings.local.json (your live runtime copy, gitignored).

Use the fallback-safe version below, which silently no-ops if the CLI isn't installed. At 0.425s per update, CRG runs after every AI turn without blocking the agent's workflow.

The settings.example.json hooks (copy to settings.local.json for this project, or to ~/.claude/settings.json for all projects):

{
  "hooks": {
    "PostToolUse": [],
    "Stop": [
      {
        "hooks": [{
          "type": "command",
          "command": "command -v code-review-graph >/dev/null 2>&1 && [ -d .code-review-graph ] && { PF=/tmp/crg-claude.pid; if [ -f \"$PF\" ] && kill -0 \"$(cat \"$PF\")\" 2>/dev/null; then true; else { code-review-graph update --skip-flows 2>/dev/null && nohup code-review-graph embed >/dev/null 2>&1 & } & echo $! > \"$PF\"; fi; } || true",
          "timeout": 5
        }]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{
          "type": "command",
          "command": "bash ~/.claude/scripts/smart-grep-hook.sh",
          "timeout": 6
        }]
      }
    ],
    "SessionStart": [
      {
        "hooks": [{
          "type": "command",
          "command": "command -v code-review-graph >/dev/null 2>&1 && code-review-graph status 2>/dev/null || true",
          "timeout": 10
        }]
      },
      {
        "hooks": [{
          "type": "command",
          "command": "if command -v code-review-graph >/dev/null 2>&1 && git rev-parse --is-inside-work-tree >/dev/null 2>&1 && [ ! -d .code-review-graph ]; then printf '{\"systemMessage\":\"Graph tool installed but not yet initialized. Ask me to set up: code-review-graph (code-review-graph install)\"}'; fi || true",
          "timeout": 10
        }]
      },
      {
        "hooks": [{
          "type": "command",
          "command": "if [ -f .code-review-graph/graph.db ] || [ -f graphify-out/graph.json ]; then STATS=\"\"; TOOL_LINES=\"\"; if [ -f .code-review-graph/graph.db ]; then STATS=$(python3 -c \"import sqlite3; c=sqlite3.connect('.code-review-graph/graph.db'); n=c.execute('SELECT COUNT(*) FROM nodes').fetchone()[0]; e=c.execute('SELECT COUNT(*) FROM edges').fetchone()[0]; print(f'{n} nodes, {e} edges'); c.close()\" 2>/dev/null || echo \"\"); TOOL_LINES=\"  where is X defined    → semantic_search_nodes_tool(query=X)\\n  who calls X           → query_graph_tool(pattern=callers_of, target=X)\\n  pre-refactor blast    → get_impact_radius_tool(changed_files=[...])\\n  community/cluster     → list_communities_tool()\\n  code review context   → get_review_context_tool(changed_files=[...])\"; fi; if [ -f graphify-out/graph.json ]; then GFY_STATS=$(python3 -c \"import json; g=json.load(open('graphify-out/graph.json')); nodes=g.get('nodes',[]); comms=len(set(n.get('community','') for n in nodes if n.get('community',''))); print(f'{len(nodes)} nodes, {comms} communities')\" 2>/dev/null || echo \"\"); [ -n \"$GFY_STATS\" ] && STATS=\"${STATS:+$STATS | }graphify: $GFY_STATS\"; TOOL_LINES=\"${TOOL_LINES:+$TOOL_LINES\\n}  CRG miss / explore    → graphify query '<term>' --graph graphify-out/graph.json\\n  path A→B              → graphify path '<from>' '<to>' --graph graphify-out/graph.json\"; fi; printf '{\"hookSpecificOutput\":{\"hookEventName\":\"SessionStart\",\"additionalContext\":\"GRAPH QUERY CHEATSHEET (%s) — use BEFORE Read/Grep/Bash-find on code:\\n%s\\nSkip graph for: .md .json .yml .log .jsonl configs cross-repo paths.\\nOverride grep gate: append --graph-tried to any Bash command.\"}}' \"$STATS\" \"$TOOL_LINES\"; fi",
          "timeout": 5
        }]
      }
    ]
  }
}

Why Stop, not PostToolUse? PostToolUse fires after every individual file edit — if Claude makes 10 edits in one response, it fires 10 times. The PID-file guard has a race window, so concurrent CRG Python processes pile up. Stop fires once when the AI finishes its entire turn — all edits batched, single update, PID-guarded. Zero pile-up by design.

Why PostToolUse: []? Explicitly empty. Previous setups used PostToolUse for CRG updates. Clearing it prevents stale hook configs from double-triggering with the Stop hook.
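For reference, the race window exists because the shell guard reads and writes the PID file in two separate steps. An atomic create-exclusive lock closes that window; this is a sketch of the idea in Python, not a drop-in replacement for the hook command:

```python
import os

LOCK = "/tmp/crg-claude.lock"  # illustrative path, mirroring the PID file above

def try_lock(path=LOCK):
    """Atomically create a lockfile; return None if another run holds it."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return path

def release(path=LOCK):
    """Remove the lockfile; ignore if it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```

Acquire the lock before running the update, release it afterwards; a second concurrent run gets None and simply skips.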


Resource exhaustion prevention — all git hooks above include a _resources_ok() guard that checks CPU load (≤ 50% of logical cores) and available memory (≥ 2 GB) before spawning any background rebuild. If either threshold is exceeded, the rebuild silently skips and logs why. This prevents a real problem: rapid-fire commits (amend chains, interactive rebase) spawning multiple concurrent graphify/CRG processes — observed at 3 stuck processes at 65–73% CPU each, load average 12+, RAM saturated, machine unresponsive. The pgrep guard adds process-level deduplication (if a previous rebuild is still running, skip). The timeout 300 ensures no background process lives longer than 5 minutes. Together these three guards ensure graph rebuilds are always non-blocking and device-safe.


Step 7: Generate Additional Outputs

Wiki (Markdown pages per community)

code-review-graph wiki
# Wiki: 28 new pages → .code-review-graph/wiki/

graphify tree --graph graphify-out/graph.json
# D3 collapsible-tree HTML

Interactive Visualization

code-review-graph visualize
# → .code-review-graph/graph.html

# graphify-out/graph.html is generated automatically during build

Blast-Radius Analysis

# What does this commit affect?
code-review-graph detect-changes --base HEAD~1 --brief

# Risk-scored output:
# Analyzed 3 changed file(s)
# - 2 changed function(s)/class(es)
# - 1 affected flow(s)
# - Overall risk score: 0.42

Step 8: Obsidian Vault (Visual Graph + Persistent Memory)

The Obsidian layer adds a human-readable, visual knowledge graph on top of your codebase. You've probably seen screenshots shared in the Claude Code / graphify community — a force-directed graph of hundreds of connected nodes. That's Obsidian's Graph View rendering one .md file per symbol, linked by imports.

Install Obsidian:

# Ubuntu
snap install obsidian

# macOS
brew install --cask obsidian

Three Vault Modes

| Mode | Where | Best for |
|---|---|---|
| Browser only | graphify-out/graph.html | One-off exploration, no Obsidian needed |
| Project vault | &lt;project&gt;/ai-vault/ | Single project, clean wikilinks |
| Global vault | ~/obsidian-vault/ | Multiple projects + cross-repo merged view |

Mode 1: Browser Only (Zero Setup)

The graph.html file graphify generates is a self-contained interactive D3 force graph — drag nodes, zoom, click to inspect. No Obsidian required.

xdg-open graphify-out/graph.html   # Linux
open graphify-out/graph.html        # macOS

Mode 2: Project Vault

The project vault lives at ./ai-vault/ inside the project (gitignored). It has two layers:

Layer 1 — Basic (GRAPH_REPORT.md only):

mkdir -p ai-vault/{graphify,permanent,logs,chats}
cp graphify-out/GRAPH_REPORT.md ai-vault/graphify/

This gives you the community report with wikilinks — useful for navigating clusters.

Layer 2 — Rich vault (one .md per symbol):

The GRAPH_REPORT.md uses wikilinks like [[_COMMUNITY_Community 0]], but those target files don't exist yet. To make Obsidian's Graph View actually light up with a full network, generate individual note files for every node and community:

Save this script to ~/.graphify/gen-obsidian-vault.py:

#!/usr/bin/env python3
"""
gen-obsidian-vault.py — Generate Obsidian notes from graphify graph.json.

Usage:
  # Project vault
  python3 ~/.graphify/gen-obsidian-vault.py --project-root /path/to/project

  # Global vault (multiple projects)
  python3 ~/.graphify/gen-obsidian-vault.py \
    --project-root /path/to/project \
    --vault ~/obsidian-vault \
    --project-name my-project

  # Merged cross-repo graph
  python3 ~/.graphify/gen-obsidian-vault.py \
    --merged-graph ~/obsidian-vault/merged-graph.json \
    --vault ~/obsidian-vault \
    --project-name merged
"""

import argparse, json, os, re, sys
from collections import defaultdict
from pathlib import Path

MAX_LINKS = 30

def _build_adjacency(nodes, links):
    out, inc = defaultdict(list), defaultdict(list)
    for link in links:
        src, tgt, rel = link["source"], link["target"], link["relation"]
        if src in nodes and tgt in nodes:
            out[src].append((tgt, rel))
            inc[tgt].append((src, rel))
    return out, inc

def _node_files(nodes, out, inc, nodes_dir, node_pfx, comm_pfx):
    nodes_dir.mkdir(parents=True, exist_ok=True)
    for nid, node in nodes.items():
        lines = [f"# {node['label']}", ""]
        if node.get("source_file"): lines.append(f"**File:** `{node['source_file']}`  ")
        if node.get("source_location"): lines.append(f"**Location:** {node['source_location']}  ")
        if node.get("file_type"): lines.append(f"**Type:** {node['file_type']}  ")
        lines.append(f"**Community:** [[{comm_pfx}/Community {node.get('community','?')}|Community {node.get('community','?')}]]")
        lines.append("")
        if out[nid]:
            lines.append("## Imports / Depends On")
            for tgt_id, rel in out[nid][:MAX_LINKS]:
                if nodes.get(tgt_id): lines.append(f"- [[{node_pfx}/{tgt_id}|{nodes[tgt_id]['label']}]] `{rel}`")
            lines.append("")
        if inc[nid]:
            lines.append("## Used By")
            for src_id, rel in inc[nid][:MAX_LINKS]:
                if nodes.get(src_id): lines.append(f"- [[{node_pfx}/{src_id}|{nodes[src_id]['label']}]] `{rel}`")
            lines.append("")
        (nodes_dir / f"{nid}.md").write_text("\n".join(lines))

def _community_files(nodes, out, inc, comm_dir, node_pfx, comm_pfx):
    comm_dir.mkdir(parents=True, exist_ok=True)
    groups = defaultdict(list)
    for nid, node in nodes.items(): groups[node.get("community", "?")].append(nid)
    for cid, mids in sorted(groups.items()):
        lines = [f"# Community {cid}", "", f"**Members:** {len(mids)}", "", "## Members"]
        for mid in mids:
            if nodes.get(mid): lines.append(f"- [[{node_pfx}/{mid}|{nodes[mid]['label']}]]")
        cross = {nodes[t].get("community") for m in mids for t, _ in out[m] if nodes.get(t) and nodes[t].get("community") != cid}
        cross |= {nodes[s].get("community") for m in mids for s, _ in inc[m] if nodes.get(s) and nodes[s].get("community") != cid}
        if cross:
            lines += ["", "## Connected Communities"] + [f"- [[{comm_pfx}/Community {c}|Community {c}]]" for c in sorted(cross) if c]
        lines.append("")
        (comm_dir / f"Community {cid}.md").write_text("\n".join(lines))
    return len(groups)

def generate(project_root=None, vault_root=None, project_name=None, graph_json_path=None):
    graph_json = Path(graph_json_path) if graph_json_path else Path(project_root) / "graphify-out" / "graph.json"
    if not graph_json.exists():
        print(f"ERROR: {graph_json} not found. Run 'graphify update .' first."); sys.exit(1)
    g = json.loads(graph_json.read_text())
    nodes = {n["id"]: n for n in g["nodes"]}
    links = g.get("links", [])
    out, inc = _build_adjacency(nodes, links)
    vault = Path(vault_root).expanduser() if vault_root else Path(project_root) / "ai-vault"
    base = vault / project_name if project_name else vault
    pfx = f"{project_name}/" if project_name else ""
    _node_files(nodes, out, inc, base / "nodes", f"{pfx}nodes", f"{pfx}communities")
    n_comm = _community_files(nodes, out, inc, base / "communities", f"{pfx}nodes", f"{pfx}communities")
    if project_root:
        report = Path(project_root) / "graphify-out" / "GRAPH_REPORT.md"
        if report.exists():
            gdir = base / "graphify"; gdir.mkdir(parents=True, exist_ok=True)
            content = re.sub(r"\[\[_COMMUNITY_Community (\d+)\|Community \1\]\]",
                lambda m: f"[[{pfx}communities/Community {m.group(1)}|Community {m.group(1)}]]",
                report.read_text())
            (gdir / "GRAPH_REPORT.md").write_text(content)
    print(f"{len(nodes)} nodes · {len(links)} links · {n_comm} communities → {base}")

def main():
    p = argparse.ArgumentParser()
    p.add_argument("--project-root"); p.add_argument("--vault"); p.add_argument("--project-name")
    p.add_argument("--merged-graph")
    a = p.parse_args()
    generate(a.project_root, a.vault, a.project_name, a.merged_graph)

if __name__ == "__main__": main()

Run it:

mkdir -p ~/.graphify
# save the script above to ~/.graphify/gen-obsidian-vault.py, then:

python3 ~/.graphify/gen-obsidian-vault.py --project-root .
# ✓ 3815 nodes · 4830 links · 749 communities → ./ai-vault

This generates ai-vault/nodes/ (one .md per symbol) and ai-vault/communities/ (one .md per cluster). Each file contains wikilinks to its neighbors — Obsidian's Graph View draws the edges automatically.

ai-vault/
├── .obsidian/               ← Obsidian config (auto-created)
├── nodes/                   ← one .md per function / class / file  ← NEW
├── communities/             ← one .md per module cluster           ← NEW
├── graphify/
│   └── GRAPH_REPORT.md      ← synced from graphify-out/
├── permanent/               ← Architecture decisions (human notes)
├── logs/                    ← Session records
└── CLAUDE.md                ← vault-level agent instructions

Open in Obsidian → press Ctrl+G → you get the full network.
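Because Graph View edges come from wikilinks, a broken link means a missing edge. A quick integrity check over the generated vault (assuming the [[target|label]] form the generator script emits):

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # captures the target before | or ]]

def broken_links(vault="ai-vault"):
    """Return (note, target) pairs whose wikilink target has no .md file."""
    vault = Path(vault)
    existing = {p.relative_to(vault).with_suffix("").as_posix()
                for p in vault.rglob("*.md")}
    missing = []
    for note in vault.rglob("*.md"):
        for target in WIKILINK.findall(note.read_text()):
            if target.strip() not in existing:
                missing.append((note.name, target.strip()))
    return missing
```

Run it after regenerating the vault; an empty list means every edge Obsidian draws resolves to a real note.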


Mode 3: Global Vault (Multiple Projects)

One Obsidian window, all projects. Each project goes into its own subfolder with namespaced wikilinks — necessary because two projects often share node IDs (shared packages like dto, types, utilities).

# Add each project to the global vault
python3 ~/.graphify/gen-obsidian-vault.py \
  --project-root /path/to/project-a \
  --vault ~/obsidian-vault \
  --project-name project-a

python3 ~/.graphify/gen-obsidian-vault.py \
  --project-root /path/to/project-b \
  --vault ~/obsidian-vault \
  --project-name project-b
~/obsidian-vault/
├── 00-Welcome.md           ← optional index note
├── project-a/
│   ├── nodes/
│   ├── communities/
│   └── graphify/GRAPH_REPORT.md
├── project-b/
│   ├── nodes/
│   ├── communities/
│   └── graphify/GRAPH_REPORT.md
└── merged/                 ← optional cross-repo combined view

Filter by project in Graph View: Graph View sidebar → Filters → path:project-a/nodes


Cross-Repo Merged View

When projects share types or utilities, merging their graphs reveals bridge nodes — code that affects both sides.

graphify merge-graphs \
  /path/to/project-a/graphify-out/graph.json \
  /path/to/project-b/graphify-out/graph.json \
  --out ~/obsidian-vault/merged-graph.json

python3 ~/.graphify/gen-obsidian-vault.py \
  --merged-graph ~/obsidian-vault/merged-graph.json \
  --vault ~/obsidian-vault \
  --project-name merged

Nodes in merged/communities/ that link to both projects are your highest-impact change points — changing them ripples across repos.
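Finding those bridge nodes can also be scripted against the merged graph. This sketch assumes the merged file keeps the same nodes/links schema, and it assumes (hypothetically) that a node's project is readable from a prefix on its id — check your merged-graph.json for the actual convention and adapt project_of accordingly:

```python
import json
from collections import defaultdict

def bridge_nodes(merged_path, project_of=lambda nid: nid.split("/", 1)[0]):
    """Nodes whose edges touch more than one project.

    project_of is an assumed convention (project name as id prefix);
    swap it for whatever your merged graph actually encodes.
    """
    g = json.load(open(merged_path))
    projs = defaultdict(set)
    for l in g.get("links", []):
        s, t = l["source"], l["target"]
        both = {project_of(s), project_of(t)}
        projs[s] |= both
        projs[t] |= both
    return sorted(n for n, p in projs.items() if len(p) > 1)
```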


Switching Between Vaults

This is a common pain point: clicking X quits Obsidian entirely and the next launch reopens the same vault.

Switch inside Obsidian (always works):
Click the vault icon at the very bottom-left of the sidebar (looks like a small building/safe). A panel lists all known vaults — click to switch.

Open two vaults simultaneously (Obsidian v1.x+):
Open the vault switcher → hold Ctrl while clicking a vault → opens in a new window. Both are live at once.

From the terminal — Linux snap install:

The obsidian:// URI scheme and xdg-open do not work with snap-installed Obsidian because snap doesn't register the protocol with the system. Use a small helper script instead.

Save ~/.graphify/obs.py:

#!/usr/bin/env python3
"""Open a specific Obsidian vault from the terminal (snap-compatible)."""

import hashlib, json, subprocess, sys, time
from pathlib import Path

OBSIDIAN_CONFIG = Path("~/snap/obsidian/current/.config/obsidian/obsidian.json").expanduser()

def _load():
    return json.loads(OBSIDIAN_CONFIG.read_text()) if OBSIDIAN_CONFIG.exists() else {"vaults": {}}

def _save(c):
    OBSIDIAN_CONFIG.parent.mkdir(parents=True, exist_ok=True)
    OBSIDIAN_CONFIG.write_text(json.dumps(c, indent=2))

def open_vault(vault_path):
    path = str(Path(vault_path).expanduser().resolve())
    config = _load()
    vaults = config.setdefault("vaults", {})
    vid = next((k for k, v in vaults.items() if v.get("path") == path), None)
    if vid is None:
        vid = hashlib.md5(path.encode()).hexdigest()[:16]
        vaults[vid] = {"path": path, "ts": int(time.time() * 1000)}
        print(f"Registered: {path}")
    for k in vaults: vaults[k].pop("open", None)
    vaults[vid]["open"] = True
    _save(config)
    print(f"Opening: {path}")
    subprocess.Popen(["/snap/bin/obsidian"], start_new_session=True,
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

if __name__ == "__main__":
    if len(sys.argv) < 2: print("Usage: obs.py <vault-path>"); sys.exit(1)
    open_vault(sys.argv[1])

This registers the vault in Obsidian's JSON config and launches Obsidian pointing to it.

Shell aliases (add to ~/.zshrc on macOS, ~/.bashrc on Linux):

alias obs='python3 ~/.graphify/obs.py ~/obsidian-vault'
alias obs-project='python3 ~/.graphify/obs.py /path/to/your-project/ai-vault'

Then: source ~/.zshrc (or source ~/.bashrc on Linux) and use obs / obs-project from any terminal.

macOS (non-snap):

open "obsidian://open?vault=obsidian-vault"
# or
alias obs='open "obsidian://open?vault=obsidian-vault"'

Practical tip — avoid switching entirely:
Keep the global vault as your default. Inside Graph View, use the path filter to scope to one project:

  • Graph View sidebar → Filters → path:my-project/nodes

This shows only that project's symbols without switching vaults.


Graph View Filters

| Filter | Shows |
|---|---|
| path:nodes | All code symbols |
| path:my-project/nodes | One project's symbols (global vault) |
| path:communities | Module clusters only |
| path:permanent | Architecture decisions (human notes) |
| path:logs | Session records |
| -path:nodes | Human notes only, no code graph |

ai-vault/CLAUDE.md (vault-level agent instructions — Claude Code example; use AGENTS.md or GEMINI.md for other agents)

## Session Commands
- `/resume` — Read the latest log in `logs/` to restore context
- `/save` — Write a timestamped session summary to `logs/YYYY-MM-DD-HH-MM.md`

## Navigation
- `graphify/GRAPH_REPORT.md` — Codebase graph report
- `permanent/` — Architecture decisions and atomic notes
- `logs/` — Session records

## Graph Usage
Before answering architecture questions, read `graphify/GRAPH_REPORT.md`.
Use `graphify query`, `graphify path`, and `graphify explain` from the project root.
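The /save command boils down to writing one timestamped file; a minimal sketch of the behavior the vault CLAUDE.md describes (filename pattern as above):

```python
from datetime import datetime
from pathlib import Path

def save_session(summary, logs_dir="ai-vault/logs"):
    """Write a timestamped session summary, as /save describes."""
    logs = Path(logs_dir)
    logs.mkdir(parents=True, exist_ok=True)
    name = datetime.now().strftime("%Y-%m-%d-%H-%M") + ".md"
    path = logs / name
    path.write_text(f"# Session {name[:-3]}\n\n{summary}\n")
    return path
```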

Keeping the Vault Fresh

Git hooks keep graph.json current automatically. Refresh the Obsidian notes after significant changes:

# Project vault
python3 ~/.graphify/gen-obsidian-vault.py --project-root .

# Global vault entry
python3 ~/.graphify/gen-obsidian-vault.py \
  --project-root /path/to/project \
  --vault ~/obsidian-vault \
  --project-name my-project

# Rebuild merged view
graphify merge-graphs \
  /path/to/a/graphify-out/graph.json \
  /path/to/b/graphify-out/graph.json \
  --out ~/obsidian-vault/merged-graph.json && \
python3 ~/.graphify/gen-obsidian-vault.py \
  --merged-graph ~/obsidian-vault/merged-graph.json \
  --vault ~/obsidian-vault \
  --project-name merged

Add vault regen to git hooks

Husky projects: create .husky/post-commit (git-tracked). Note that .husky/_/ is gitignored, and code appended there is dead — the h wrapper exits before any appended code runs.

Plain git projects: use .git/hooks/post-commit instead.

The hook is structured in two independent sections so teammates without these tools see zero overhead — each section guards itself and silently no-ops if the tool isn't present:

#!/bin/sh
# Knowledge graph tools are optional — each section silently skips if tools are absent.

# _resources_ok function — copy from the post-commit hook above
# (CPU ≤ 50% of logical cores, memory ≥ 2 GB free, all platforms)

GIT_DIR=$(git rev-parse --git-dir 2>/dev/null)
[ -d "$GIT_DIR/rebase-merge" ] && exit 0
[ -d "$GIT_DIR/rebase-apply" ] && exit 0
[ -f "$GIT_DIR/MERGE_HEAD" ] && exit 0
[ -f "$GIT_DIR/CHERRY_PICK_HEAD" ] && exit 0

# ── graphify: rebuild graph after commit ─────────────────────────────────────
# Entire block skips if graphify is not installed.
if command -v graphify > /dev/null 2>&1; then
  CHANGED=$(git diff --name-only HEAD~1 HEAD 2>/dev/null || git diff --name-only HEAD 2>/dev/null)
  if [ -n "$CHANGED" ]; then
    # (Python interpreter detection omitted for brevity — see full hook above)
    if [ -n "$GRAPHIFY_PYTHON" ] && ! pgrep -qf 'graphify' 2>/dev/null && _resources_ok; then
      export GRAPHIFY_CHANGED="$CHANGED"
      _LOG="${HOME}/.cache/graphify-rebuild.log"
      mkdir -p "$(dirname "$_LOG")"
      echo "[graphify hook] launching background rebuild (log: $_LOG)"
      nohup timeout 300 $GRAPHIFY_PYTHON -c "
import os, sys
from pathlib import Path
changed_raw = os.environ.get('GRAPHIFY_CHANGED', '')
changed = [Path(f.strip()) for f in changed_raw.strip().splitlines() if f.strip()]
if not changed:
    sys.exit(0)
from graphify.watch import _rebuild_code
_rebuild_code(Path('.'))
" > "$_LOG" 2>&1 < /dev/null &
      disown 2>/dev/null || true
    fi
  fi
fi

# ── code-review-graph: incrementally update MCP graph index after commit ────
if command -v code-review-graph > /dev/null 2>&1 && [ -d .code-review-graph ] && ! pgrep -qf 'code-review-graph update' 2>/dev/null && _resources_ok; then
  _LOG="${HOME}/.cache/code-review-graph-update.log"
  mkdir -p "$(dirname "$_LOG")"
  echo "[code-review-graph] Commit detected - launching background update (log: $_LOG)"
  nohup sh -c 'code-review-graph update --skip-flows && code-review-graph embed' > "$_LOG" 2>&1 < /dev/null &
  disown 2>/dev/null || true
fi

Why timeout 300? graphify _rebuild_code has no internal deadline. On large repos it can hang indefinitely if it hits a file lock, encoding error, or memory pressure. timeout 300 auto-kills the background process after 5 minutes — the git commit returns immediately either way, and the next commit retriggers a fresh attempt.

Why _resources_ok? Without it, rapid-fire commits (amend chains, rebase, interactive rebase) spawn multiple concurrent graphify processes. Observed: 3 stuck processes at 65–73% CPU each, load average 12+, RAM exhausted. The resource check ensures a rebuild only starts when the machine can handle it.
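The guard itself is only a few lines. The real hooks use POSIX shell; this Python sketch mirrors the same logic (1-minute load average ≤ 50% of logical cores, ≥ 2 GB available memory read from /proc/meminfo on Linux, pass-through where that file is absent):

```python
import os

def resources_ok(max_load_frac=0.5, min_free_gb=2.0):
    """Cheap pre-flight check before spawning a background rebuild."""
    load1, _, _ = os.getloadavg()  # 1-minute load average
    if load1 > max_load_frac * os.cpu_count():
        return False
    try:  # Linux reports MemAvailable in kB; elsewhere, skip the memory check
        with open("/proc/meminfo") as f:
            mem = {line.split(":")[0]: line.split()[1] for line in f}
        if int(mem["MemAvailable"]) < min_free_gb * 1024 * 1024:
            return False
    except (FileNotFoundError, KeyError):
        pass
    return True
```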

Create the same file at .husky/post-checkout with one change: wrap the graphify block with if command -v graphify > /dev/null 2>&1 && [ -d "graphify-out" ] (force-rebuild only when the graph has been built before). Mark both executable: chmod +x .husky/post-commit .husky/post-checkout.

What "safe to commit" means here: a developer with none of these tools runs exactly the 4 rebase guards and exits 0 — no Python spawned, no directories created, no errors.


Complete Auto-Update Flow

After full setup, the pipeline looks like this:

```
AI agent session opens (Claude Code shown — other agents use equivalent plugin hooks)
    → SessionStart: code-review-graph status
    → SessionStart: setup nudge (only if CLI installed but no graph built yet)
    → SessionStart: graph cheatsheet injected (query→tool map, ~150 tokens, once per session)

AI agent searches the codebase (grep / rg / find)
    → PreToolUse: smart-grep-hook.sh intercepts
        ├── no graph / non-code target / --graph-tried flag → pass silently
        ├── first grep this session
        │     ├── graph has answer → show answer in context, allow grep (one-shot lesson)
        │     └── graph has no answer → allow grep, suggest MCP tool
        └── subsequent grep
              ├── graph has answer → deny with answer inline (zero retry, saves 2000+ tokens)
              └── graph has no answer → pass silently

AI agent edits a file
    → Stop hook (once per full AI turn): code-review-graph update --skip-flows (0.425s)
    [graphify does NOT update here — too slow for per-turn trigger]

Code committed
    → post-commit: _resources_ok() check
        ├── CPU > 50% or memory < 2 GB → skip rebuild, log why
        ├── pgrep: graphify already running → skip
        └── resources OK → graphify update . (background nohup, timeout 300)
    → post-commit: _resources_ok() check
        ├── pgrep: CRG already running → skip
        └── resources OK → code-review-graph update + embed (background nohup)

Branch switched
    → post-checkout: diff count = git diff --name-only PREV NEW | wc -l
        ├── ≤5 files OR same-base branch switch
        │     ├── graphify: incremental update (SHA256 cached, fast) [resource-guarded]
        │     └── CRG: update --skip-flows + embed [resource-guarded]
        └── >5 files OR brand-new branch
              ├── graphify: force full rebuild (GRAPHIFY_FORCE=true) [resource-guarded]
              └── CRG: build (full SQLite rebuild) [resource-guarded]

Result: developer commits or switches branches → both graphs update in background (if device has resources).
        Agent edits files → CRG updates after each AI turn.
        Agent searches → graph answers before grep fires.
        Device overloaded → rebuilds silently skip, next commit retries.
```
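The branch-switch decision above boils down to a file count. A sketch, assuming the helper receives the previous and new HEAD SHAs that git passes to post-checkout hooks (the function name is illustrative):

```shell
# Decide between incremental and full rebuild after a branch switch.
# $1 = previous HEAD, $2 = new HEAD (as passed to post-checkout hooks).
choose_rebuild_mode() {
  changed=$(git diff --name-only "$1" "$2" | wc -l | tr -d ' ')
  if [ "$changed" -le 5 ]; then
    echo "incremental"   # graphify incremental update + CRG update
  else
    echo "full"          # GRAPHIFY_FORCE=true rebuild + CRG full build
  fi
}
```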

Quick Reference

Each block below is independent — install only the tools you want. Pure CLI setup → graphify only. Embedding + MCP setup → CRG only. Full stack → both.

Installation

```shell
# Either, both, or pick one:
pip install graphify              # graphify (CLI only)
pip install code-review-graph     # CRG (CLI + MCP server)
# Or via uvx / pipx — see the Installation section earlier in this guide
```

Project Setup (run once per project, for each tool you installed)

```shell
# Create ignore files (only the ones you need)
touch .graphifyignore                 # graphify only
touch .code-review-graphignore        # CRG only

# Build graphs manually
graphify update .           # graphify: JSON graph, AST-only (no tokens)
code-review-graph build     # CRG: SQLite graph (fast, no tokens)

# Register with your AI agent (run only the lines for installed tools)
graphify claude install     # graphify hooks + CLAUDE.md trigger
graphify hook install       # graphify post-commit hook
code-review-graph install   # CRG MCP server + skills + hooks
```

Verify (run only the rows for tools you installed)

```shell
code-review-graph status            # CRG: node/edge counts
head -15 graphify-out/GRAPH_REPORT.md  # graphify: corpus + freshness
time code-review-graph update --skip-flows   # CRG: should be <1s
python3 -m json.tool .claude/settings.local.json | grep -A3 PostToolUse
```

Daily Commands (each line targets one tool — use what is relevant)

```shell
# Manual incremental update
graphify update .           # graphify: SHA256 incremental
code-review-graph update    # CRG: 0.4s

# Query (graphify CLI)
graphify query "auth flow" --graph graphify-out/graph.json
graphify path "ComponentA" "ServiceB" --graph graphify-out/graph.json
graphify explain "MyService" --graph graphify-out/graph.json

# Blast-radius review (CRG)
code-review-graph detect-changes --base HEAD~1 --brief

# Status (CRG)
code-review-graph status

# Wiki / visualization (CRG)
code-review-graph wiki
code-review-graph visualize
```

Watch Mode

```shell
graphify watch .                  # continuous auto-rebuild (graphify)
code-review-graph watch           # continuous auto-rebuild (CRG)
```

Why Not a Vector Database?

Code navigation is fundamentally relational — UserController calls AuthService which imports TokenRepository. This is a directed graph, not a bag of vectors.

| | Knowledge Graph | Vector DB |
| --- | --- | --- |
| Code structure | Topology-exact | Approximate |
| Setup | No embedding pipeline | Embedding + chunking + sync |
| Hallucination | None (AST is deterministic) | Can return similar-but-wrong |
| Cost (indexing) | 0 tokens (AST mode) | Embedding cost per file |
| Incremental update | 0.4s (SQLite diff) | Re-embed changed chunks |
| Exact symbol lookup | Use grep/LSP (still best) | Often worse |

Knowledge graphs excel at "how does X relate to Y" questions. For exact symbol lookup, grep and LSP still win — and both tools' PreToolUse hooks (or equivalent agent config) redirect the agent toward the graph for structural questions while leaving grep for exact matches.


Files Created / Modified

After completing this setup, here's what changes in each project:

Committed to repo:

```
your-project/
├── .graphifyignore                    ← new
├── .code-review-graphignore           ← new
├── .mcp.example.json                  ← new (reference config for teammates)
├── CLAUDE.md                          ← 2-line pointer to docs/agent/knowledge-graph.md
├── .claude/
│   ├── settings.json                  ← permissions only, no hooks
│   ├── settings.example.json          ← new (hook reference: Stop CRG update + SessionStart cheatsheet + setup nudge)
│   └── skills/                        ← new (code-review-graph skills relevant to the repo)
├── .husky/                            ← Husky hooks (if using Husky)
│   ├── pre-commit                     ← branch protection (blocks direct commits to protected branches)
│   ├── post-commit                    ← graphify + CRG rebuild (resource-guarded, background nohup)
│   └── post-checkout                  ← graphify + CRG rebuild on branch switch (resource-guarded, smart incremental/full)
└── docs/agent/knowledge-graph.md      ← new (full tool reference)
```

Gitignored (local only, not committed):

```
your-project/
├── .mcp.json                          ← personal MCP config (copy from .mcp.example.json)
├── .claude/settings.local.json        ← live personal hooks (copy from settings.example.json)
├── graphify-out/                      ← generated (graph.json, GRAPH_REPORT.md, graph.html)
├── .code-review-graph/                ← generated (SQLite db, wiki, visualization)
├── ai-vault/                          ← Obsidian vault (personal, lives inside the project)
├── AGENTS.md / GEMINI.md              ← tool-generated, IDE-specific
├── .cursorrules / .windsurfrules      ← IDE-specific
└── .kiro/ / .opencode.json            ← IDE-specific
```

New teammate setup (copy example files and activate):

```shell
cp .mcp.example.json .mcp.json
cp .claude/settings.example.json .claude/settings.local.json
# Or copy hooks into ~/.claude/settings.json to activate across all projects
```

References

  • Graphify — AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

  • code-review-graph — Local knowledge graph for Claude Code. Builds a persistent map of your codebase so Claude reads only what matters — 6.8× fewer tokens on reviews and up to 49× on daily coding tasks.

  • From Karpathy's LLM Wiki to Graphify: AI Memory Layers are Here — Analytics Vidhya article on the evolution and implementation of knowledge graphs for AI systems.


Let's Connect

I'm always excited to hear about what you're building! If you found this guide helpful, have questions, or just want to share your AI setup strategy:
