Mayne

Posted on Jul 1

ThinkGraph Tutorial: How to Make Your LLM Actually Think Before Answering

#ai #tutorial #llm #opensource

In my previous article, I introduced ThinkGraph -- an open-source pipeline that forces LLMs to decompose complex prompts into atomic fact DAGs before answering. This time, let's get hands-on.

What You Will Learn

How to install ThinkGraph for Claude Code, OpenCode, Cursor, and others
How to use the CLI for step-by-step prompt decomposition
How self-consistency voting catches hallucinations
How web grounding fills knowledge gaps without API keys
How to integrate ThinkGraph as an MCP server

Installation

git clone https://github.com/Mayne-X/thinkgraph.git
cd thinkgraph

# Auto-detect and install for all your agents
python install.py

# See what it would do without touching files
python install.py --dry-run

The installer detects OpenCode, Claude Code, Cursor, Codex, Copilot, and Gemini CLI configs and injects the appropriate adapter. It is idempotent, so it is safe to re-run.

CLI Tutorial: Decompose a Complex Prompt

Let's walk through a real example. Say you ask:

"Should we migrate our PostgreSQL database to CockroachDB for a multi-region SaaS product?"

Step 1: Triage

python cli/thinkgraph.py triage "Should we migrate our PostgreSQL database to CockroachDB for a multi-region SaaS product?"

Output: multi-hop -- not trivial, enters the pipeline.

Step 2: Decompose into DAG

The agent (not the CLI) generates a dependency graph like:

{
  "nodes": [
    {"id": "Q1", "q": "What are CockroachDB's multi-region capabilities?", "deps": []},
    {"id": "Q2", "q": "What are PostgreSQL's multi-region limitations?", "deps": []},
    {"id": "Q3", "q": "What is the migration complexity from PG to CockroachDB?", "deps": ["Q1","Q2"]},
    {"id": "Q4", "q": "What is the cost comparison for our scale?", "deps": ["Q1","Q2"]},
    {"id": "Q5", "q": "Should we migrate given Q1-Q4?", "deps": ["Q3","Q4"]}
  ]
}

Step 3: Validate the DAG

python cli/thinkgraph.py validate-dag graph.json

The validator checks for cycles, dangling deps, and orphan nodes, then computes the execution batches:

Batch 0: Q1, Q2 (parallel)
Batch 1: Q3, Q4 (parallel)
Batch 2: Q5 (needs Q3 + Q4)
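To see where those batches come from, here is a minimal sketch of layer-by-layer topological batching over the same graph. It is not ThinkGraph's actual validator: the function name `execution_batches` is my own, and the orphan-node check is left out for brevity.

```python
def execution_batches(graph):
    """Group node ids into batches that can run in parallel, in dependency order.

    Raises ValueError on a dangling dependency or a cycle.
    """
    deps = {node["id"]: set(node["deps"]) for node in graph["nodes"]}

    # A dangling dep points at an id that is not defined anywhere in the graph.
    for node_id, node_deps in deps.items():
        missing = node_deps - set(deps)
        if missing:
            raise ValueError(f"{node_id} depends on unknown node(s): {sorted(missing)}")

    batches = []
    resolved = set()
    while len(resolved) < len(deps):
        # Every unresolved node whose deps are all resolved can run in this batch.
        ready = sorted(n for n, d in deps.items() if n not in resolved and d <= resolved)
        if not ready:
            stuck = sorted(set(deps) - resolved)
            raise ValueError("cycle detected among: " + ", ".join(stuck))
        batches.append(ready)
        resolved.update(ready)
    return batches


graph = {
    "nodes": [
        {"id": "Q1", "deps": []},
        {"id": "Q2", "deps": []},
        {"id": "Q3", "deps": ["Q1", "Q2"]},
        {"id": "Q4", "deps": ["Q1", "Q2"]},
        {"id": "Q5", "deps": ["Q3", "Q4"]},
    ]
}

for i, batch in enumerate(execution_batches(graph)):
    print(f"Batch {i}: {', '.join(batch)}")
# Batch 0: Q1, Q2
# Batch 1: Q3, Q4
# Batch 2: Q5
```

If no node is ready while some are still unresolved, the remaining nodes must depend on each other, which is how the sketch detects a cycle.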

Step 4: Resolve with Web Grounding

Sub-questions resolved with low confidence (below 0.6) can trigger an automatic web search:

python cli/thinkgraph.py web-search "CockroachDB multi-region latency benchmark 2026" --num-results 5

No API key needed -- ThinkGraph parses DuckDuckGo results directly via HTTP. This is huge for automated pipelines where you don't want to manage search API credentials.
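The gating step itself is simple to picture. The sketch below assumes each resolved fact is a dict with `claim` and `confidence` keys (the shape used in the cache example in this tutorial); the function name `needs_grounding` and the 0.41 value are made up for illustration.

```python
CONFIDENCE_THRESHOLD = 0.6  # the threshold quoted in this step


def needs_grounding(facts, threshold=CONFIDENCE_THRESHOLD):
    """Return the ids of sub-questions whose confidence falls below the threshold."""
    return [qid for qid, fact in facts.items() if fact["confidence"] < threshold]


# Illustrative values only: Q4's confidence is hypothetical.
facts = {
    "Q1": {"claim": "CockroachDB offers automated partitioning", "confidence": 0.92},
    "Q4": {"claim": "Cost is roughly comparable at our scale", "confidence": 0.41},
}

print(needs_grounding(facts))
# ['Q4']
```

Only the ids returned here would be sent to the web-search step; confident answers skip it.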

Step 5: Cache Results

python cli/thinkgraph.py cache-set "what are cockroachdb multi-region capabilities" '{"claim": "CockroachDB offers automated partitioning, survivable across regions", "confidence": 0.92}'
python cli/thinkgraph.py cache-get "what are cockroachdb multi-region capabilities"

Facts persist globally at ~/.thinkgraph/cache.json -- shared across projects and sessions.
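If you are curious what a cache like this boils down to, here is a minimal sketch of a JSON-file cache keyed by a normalized question string. The normalization rule (lowercase, collapsed whitespace) and the function names are my assumptions, not ThinkGraph's internals, and a temporary file stands in for ~/.thinkgraph/cache.json.

```python
import json
import tempfile
from pathlib import Path


def normalize(question):
    """Lowercase and collapse whitespace so near-identical questions share a key."""
    return " ".join(question.lower().split())


def cache_set(path, question, fact):
    """Store a fact under the normalized question, creating the file if needed."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data[normalize(question)] = fact
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))


def cache_get(path, question):
    """Return the cached fact, or None on a miss."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(normalize(question))


# A temporary file is used here instead of the real global cache.
cache_path = Path(tempfile.mkdtemp()) / "cache.json"
cache_set(
    cache_path,
    "What are CockroachDB multi-region capabilities",
    {"claim": "CockroachDB offers automated partitioning, survivable across regions", "confidence": 0.92},
)
print(cache_get(cache_path, "what are cockroachdb  multi-region capabilities"))
```

Because the file is plain JSON, you can inspect or edit cached facts by hand.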

Step 6: Self-Consistency Vote

Run synthesis 2-3 times, then vote on consistency:

python cli/thinkgraph.py vote \
  "CockroachDB is better for multi-region due to automated survivability" \
  "CockroachDB offers better multi-region support with automated partitioning and survivability" \
  "For multi-region, CockroachDB has automated survivability which PostgreSQL lacks"

The output picks the Jaccard centroid -- the response with the highest average word overlap with the other responses:

{"winner": "CockroachDB offers better multi-region support...", "scores": [0.72, 0.81, 0.68], "response_count": 3}
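Here is a minimal sketch of Jaccard-centroid voting, in case the idea is new. The tokenization (lowercase alphanumeric runs) is my assumption, so the scores will not necessarily match what the CLI prints.

```python
import re


def word_set(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def vote(responses):
    """Pick the response with the highest average Jaccard overlap with the others."""
    sets = [word_set(r) for r in responses]
    scores = []
    for i, current in enumerate(sets):
        others = [jaccard(current, other) for j, other in enumerate(sets) if j != i]
        scores.append(sum(others) / len(others) if others else 1.0)
    winner = responses[scores.index(max(scores))]
    return {
        "winner": winner,
        "scores": [round(s, 2) for s in scores],
        "response_count": len(responses),
    }


result = vote([
    "CockroachDB is better for multi-region due to automated survivability",
    "CockroachDB offers better multi-region support with automated partitioning and survivability",
    "For multi-region, CockroachDB has automated survivability which PostgreSQL lacks",
])
print(result["winner"])
```

A response that disagrees with the rest shares fewer words with them, so its average overlap drops and it loses the vote. Word overlap is a rough proxy for agreement: it will not catch two answers that use the same words to say opposite things.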

Step 7: Prune Unnecessary Nodes

python cli/thinkgraph.py prune-dag graph.json --facts facts.json --prompt "Should we migrate PostgreSQL to CockroachDB?"

If a parent node's answer already covers a child's question, the child is auto-removed. Saves tokens without losing information.

Step 8: Export as Report

python cli/thinkgraph.py export results.json --format markdown > migration-analysis.md

Supports JSON, YAML, and Markdown.

Using ThinkGraph as an MCP Server

This is the slickest integration path. Start the server:

python mcp/thinkgraph_mcp.py

Then configure any MCP-compatible client:

{
  "mcpServers": {
    "thinkgraph": {
      "command": "python",
      "args": ["/path/to/thinkgraph/mcp/thinkgraph_mcp.py"]
    }
  }
}

Seven tools are exposed: triage, validate-dag, vote, web-search, cache-get, cache-set, tokens. Your MCP client can call these directly without the CLI.

A/B Testing Your Prompts

Want to prove ThinkGraph improves quality? Use the A/B scorer:

python cli/thinkgraph.py ab-score \
  "CockroachDB supports multi-region deployments" \
  --ground-truth "CockroachDB supports multi-region deployments with automated partitioning and survivability across regions"

Output:

keyword_recall: 66.67%
precision: 100.00%
claim_count: 1
uncertainty_markers: 0
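For intuition, keyword recall and precision can be computed with plain set operations. This is a sketch under my own assumptions about tokenization and stopwords (the `STOPWORDS` list is hypothetical), so it will not necessarily reproduce the exact percentages above.

```python
import re

# Hypothetical stopword list; the real scorer may filter differently.
STOPWORDS = {"a", "an", "and", "the", "with", "of", "for", "to", "in", "across"}


def keywords(text):
    """Lowercase alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def ab_score(answer, ground_truth):
    """Recall: share of ground-truth keywords the answer covers.
    Precision: share of answer keywords that appear in the ground truth."""
    answer_kw = keywords(answer)
    truth_kw = keywords(ground_truth)
    hits = answer_kw & truth_kw
    return {
        "keyword_recall": len(hits) / len(truth_kw) if truth_kw else 0.0,
        "precision": len(hits) / len(answer_kw) if answer_kw else 0.0,
    }


print(ab_score("red green", "red green blue yellow"))
# {'keyword_recall': 0.5, 'precision': 1.0}
```

High precision with low recall, as in the toy example, means the answer is correct as far as it goes but leaves out part of the ground truth.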

Compressing Long Contexts

When synthesis context is too long, compress it:

python cli/thinkgraph.py compress long_spec.txt --ratio 0.3

Uses TF-IDF sentence extraction to keep the 30% most important content. No embeddings, no API calls -- pure term-frequency math in stdlib Python.
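A stdlib-only TF-IDF extractor fits in a few lines. The sketch below treats each sentence as its own "document" and interprets the ratio as the fraction of sentences to keep; both choices are my assumptions about how such a compressor could work, not a copy of ThinkGraph's code.

```python
import math
import re
from collections import Counter


def compress(text, ratio=0.3):
    """Keep roughly `ratio` of the sentences, ranked by summed TF-IDF, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)

    # Document frequency: the number of sentences each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    def score(tokens):
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        # Terms that appear in every sentence get idf = log(1) = 0 and add nothing.
        return sum((count / len(tokens)) * math.log(n / doc_freq[term])
                   for term, count in tf.items())

    keep = max(1, round(n * ratio))
    top = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(top))


sample = " ".join(f"Sentence number {i} mentions topic{i}." for i in range(10))
print(compress(sample, ratio=0.3))
```

Sentences full of words that show up everywhere score low, while sentences carrying rare terms survive the cut.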

Writing Custom Plugins

Need a custom resolve function? Register a plugin:

python cli/thinkgraph.py plugin-register my_db_lookup 'def fn(q, ctx): return {"claim": db.query(q), "confidence": 0.95}'
python cli/thinkgraph.py plugin-list
# my_db_lookup, shell, weblookup
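The one-liner above assumes a `db` object already exists. For a self-contained picture, here is a sketch of a resolve function with the same `fn(q, ctx)` signature and return shape, backed by an in-memory dict. `KNOWN_FACTS`, `PLUGINS`, and `resolve` are hypothetical names for illustration, not part of ThinkGraph.

```python
# Hypothetical in-memory table standing in for a real database.
KNOWN_FACTS = {
    "what are cockroachdb multi-region capabilities":
        "CockroachDB offers automated partitioning, survivable across regions",
}


def my_db_lookup(q, ctx):
    """Resolve a sub-question from KNOWN_FACTS; report zero confidence on a miss."""
    claim = KNOWN_FACTS.get(" ".join(q.lower().split()))
    if claim is None:
        return {"claim": "", "confidence": 0.0}
    return {"claim": claim, "confidence": 0.95}


# Hypothetical registry: plugin name -> resolve function.
PLUGINS = {"my_db_lookup": my_db_lookup}


def resolve(plugin_name, q, ctx=None):
    """Look up a plugin by name and call it with the question and context."""
    return PLUGINS[plugin_name](q, ctx or {})


print(resolve("my_db_lookup", "What are CockroachDB multi-region capabilities"))
```

Returning a low confidence on a miss, rather than raising, lets the rest of the pipeline decide whether to fall back to another source.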

The Token Math

ThinkGraph sets a hard ceiling of 4x the direct answer tokens. If the pipeline would exceed that, it aborts to a direct answer. Here's the budget for each stage:

Stage	Max Tokens
Triage	50
Decompose	200
Per sub-question	300
Synthesize	600
Hard ceiling	4x direct answer

Simple prompts (< 200 tokens) bypass the pipeline entirely. For everything else, you never pay more than 4x the direct answer, and most of the time you pay far less.
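To make the arithmetic concrete, here is a small sketch that adds up the per-stage maxima from the table and compares the total against the 4x ceiling. Treating the worst case as a plain sum of stage maxima is my assumption, and the direct-answer token counts are made-up inputs.

```python
# Per-stage maxima taken from the table above.
STAGE_BUDGET = {"triage": 50, "decompose": 200, "per_sub_question": 300, "synthesize": 600}
CEILING_MULTIPLIER = 4


def pipeline_budget(sub_questions):
    """Worst-case token spend, assuming every stage uses its full allowance."""
    return (STAGE_BUDGET["triage"]
            + STAGE_BUDGET["decompose"]
            + sub_questions * STAGE_BUDGET["per_sub_question"]
            + STAGE_BUDGET["synthesize"])


def should_abort(sub_questions, direct_answer_tokens):
    """True when the worst-case spend would exceed 4x the direct answer."""
    return pipeline_budget(sub_questions) > CEILING_MULTIPLIER * direct_answer_tokens


print(pipeline_budget(5))    # 50 + 200 + 5 * 300 + 600 = 2350
print(should_abort(5, 500))  # 2350 > 2000, so True
print(should_abort(5, 800))  # 2350 > 3200 is false, so False
```

With five sub-questions, the worst case only fits under the ceiling when the direct answer would be around 590 tokens or more, which is why short prompts are better off skipping the pipeline.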

Why Zero Dependencies Matters

The entire CLI is Python 3.8+ stdlib only. No pip install. No venv. No requirements.txt. Under the hood it relies on precompiled regexes, LRU cache decorators, and sets for membership checks. It runs anywhere Python runs -- including CI runners, minimal Docker images, and air-gapped environments.

Test Suite

python tests/test_golden.py       # 15/15 passing
python tests/test_new_features.py # 14/14 passing
python tests/benchmark.py         # 10-prompt quality benchmark

The benchmark scores triage accuracy at 80%, compression quality at 70%, and vote consistency at 64% -- solid baselines that improve with use.

When NOT to Use ThinkGraph

Honest answer: don't use it for simple Q&A ("What is the capital of France?"). The triage stage catches trivial prompts and short-circuits. Also don't use it for creative writing -- decomposition kills flow. But for any prompt involving comparison, planning, analysis, or multi-step reasoning, it shines.

Get Started

The entire project is MIT-licensed at github.com/Mayne-X/thinkgraph. Star it, fork it, break it, fix it. Contributions welcome.

If you want the big-picture overview first, read my introductory article. Or just clone the repo and run python cli/thinkgraph.py triage on your hardest prompt right now.

ThinkGraph forces structured thinking before guessing. Your prompts deserve a foundation, not a hallucination.

DEV Community