
Mayne

ThinkGraph Tutorial: How to Make Your LLM Actually Think Before Answering

In my previous article, I introduced ThinkGraph -- an open-source pipeline that forces LLMs to decompose complex prompts into atomic fact DAGs before answering. This time, let's get hands-on.

What You Will Learn

  • How to install ThinkGraph for Claude Code, OpenCode, Cursor, and others
  • How to use the CLI for step-by-step prompt decomposition
  • How self-consistency voting catches hallucinations
  • How web grounding fills knowledge gaps without API keys
  • How to integrate ThinkGraph as an MCP server

Installation

git clone https://github.com/Mayne-X/thinkgraph.git
cd thinkgraph

# Auto-detect and install for all your agents
python install.py

# See what it would do without touching files
python install.py --dry-run

The installer detects OpenCode, Claude Code, Cursor, Codex, Copilot, and Gemini CLI configs and injects the appropriate adapter. It is idempotent, so re-running it is safe.

CLI Tutorial: Decompose a Complex Prompt

Let's walk through a real example. Say you ask:

"Should we migrate our PostgreSQL database to CockroachDB for a multi-region SaaS product?"

Step 1: Triage

python cli/thinkgraph.py triage "Should we migrate our PostgreSQL database to CockroachDB for a multi-region SaaS product?"

Output: multi-hop -- not trivial, enters the pipeline.

Step 2: Decompose into DAG

The agent (not the CLI) generates a dependency graph like:

{
  "nodes": [
    {"id": "Q1", "q": "What are CockroachDB's multi-region capabilities?", "deps": []},
    {"id": "Q2", "q": "What are PostgreSQL's multi-region limitations?", "deps": []},
    {"id": "Q3", "q": "What is the migration complexity from PG to CockroachDB?", "deps": ["Q1","Q2"]},
    {"id": "Q4", "q": "What is the cost comparison for our scale?", "deps": ["Q1","Q2"]},
    {"id": "Q5", "q": "Should we migrate given Q1-Q4?", "deps": ["Q3","Q4"]}
  ]
}

Step 3: Validate the DAG

python cli/thinkgraph.py validate-dag graph.json

The validator checks for cycles, dangling deps, and orphan nodes, then computes execution batches:

Batch 0: Q1, Q2 (parallel)
Batch 1: Q3, Q4 (parallel)
Batch 2: Q5 (needs Q3 + Q4)
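
If you are curious how batches like these can be derived, here is a minimal sketch in plain Python. It is not ThinkGraph's actual implementation: the function name `execution_batches` is hypothetical, and it only covers dangling deps and cycles (no orphan check). The idea is a layered topological sort: a node becomes ready once everything it depends on has been resolved.

```python
from typing import Dict, List, Set


def execution_batches(nodes: List[dict]) -> List[List[str]]:
    """Group node ids into batches; each batch depends only on earlier batches."""
    deps: Dict[str, Set[str]] = {n["id"]: set(n["deps"]) for n in nodes}

    # Dangling deps: a node refers to an id that is not in the graph.
    for node_id, node_deps in deps.items():
        missing = node_deps - set(deps)
        if missing:
            raise ValueError(f"{node_id} depends on unknown nodes: {sorted(missing)}")

    batches: List[List[str]] = []
    done: Set[str] = set()
    while len(done) < len(deps):
        # A node is ready once all of its dependencies are resolved.
        ready = sorted(i for i, d in deps.items() if i not in done and d <= done)
        if not ready:
            # Nodes remain but none can run, so the graph contains a cycle.
            raise ValueError(f"cycle detected; unresolved nodes: {sorted(set(deps) - done)}")
        batches.append(ready)
        done.update(ready)
    return batches


graph = {
    "nodes": [
        {"id": "Q1", "deps": []},
        {"id": "Q2", "deps": []},
        {"id": "Q3", "deps": ["Q1", "Q2"]},
        {"id": "Q4", "deps": ["Q1", "Q2"]},
        {"id": "Q5", "deps": ["Q3", "Q4"]},
    ]
}
for i, batch in enumerate(execution_batches(graph["nodes"])):
    print(f"Batch {i}: {', '.join(batch)}")
```

Nodes inside one batch have no dependencies on each other, which is why they can be resolved in parallel.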

Step 4: Resolve with Web Grounding

Sub-questions resolved with a confidence below 0.6 can trigger an automatic web search:

python cli/thinkgraph.py web-search "CockroachDB multi-region latency benchmark 2026" --num-results 5

No API key needed -- ThinkGraph parses DuckDuckGo results directly via HTTP. This is useful for automated pipelines where you don't want to manage search API credentials.
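
The gating rule itself is easy to picture. Below is a rough sketch of how an agent might pick which sub-questions to ground, assuming resolved facts are stored as a dict of `{"claim", "confidence"}` entries keyed by node id. The function name `needs_grounding`, the storage shape, and the Q4 claim are my assumptions for illustration; only the 0.6 threshold comes from the pipeline.

```python
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.6  # threshold quoted above


def needs_grounding(facts: Dict[str, dict], threshold: float = CONFIDENCE_THRESHOLD) -> List[str]:
    """Return the ids of sub-questions whose confidence is below the threshold."""
    # A fact without a confidence value is treated as unknown (0.0).
    return [qid for qid, fact in facts.items() if fact.get("confidence", 0.0) < threshold]


facts = {
    "Q1": {"claim": "CockroachDB offers automated partitioning", "confidence": 0.92},
    "Q4": {"claim": "Cost is roughly comparable at our scale", "confidence": 0.41},  # hypothetical
}
print(needs_grounding(facts))  # ['Q4']
```

Each id returned here would then be turned into a search query for the `web-search` command.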

Step 5: Cache Results

python cli/thinkgraph.py cache-set "what are cockroachdb multi-region capabilities" '{"claim": "CockroachDB offers automated partitioning, survivable across regions", "confidence": 0.92}'
python cli/thinkgraph.py cache-get "what are cockroachdb multi-region capabilities"

Facts persist globally at ~/.thinkgraph/cache.json -- shared across projects and sessions.
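
To make the idea concrete, here is a small sketch of a JSON-file fact cache. It is an illustration under assumptions, not the project's code: the helper names and the key normalization (lowercase, collapsed whitespace) are mine, and the example writes to a temporary directory rather than ~/.thinkgraph/cache.json.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def _normalize(question: str) -> str:
    # Assumption: keys are lowercased with whitespace collapsed.
    return " ".join(question.lower().split())


def cache_set(path: Path, question: str, fact: dict) -> None:
    """Store a fact under the normalized question, creating the file if needed."""
    cache = json.loads(path.read_text()) if path.exists() else {}
    cache[_normalize(question)] = fact
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cache, indent=2))


def cache_get(path: Path, question: str) -> Optional[dict]:
    """Return the cached fact, or None if the question was never stored."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(_normalize(question))


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache.json"
    cache_set(cache_file, "What are CockroachDB multi-region capabilities",
              {"claim": "automated partitioning", "confidence": 0.92})
    print(cache_get(cache_file, "what are cockroachdb multi-region capabilities"))
```

Normalizing the key means the same question phrased with different casing or spacing still hits the cache.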

Step 6: Self-Consistency Vote

Run synthesis 2-3 times, then vote on consistency:

python cli/thinkgraph.py vote \
  "CockroachDB is better for multi-region due to automated survivability" \
  "CockroachDB offers better multi-region support with automated partitioning and survivability" \
  "For multi-region, CockroachDB has automated survivability which PostgreSQL lacks"

The output picks the Jaccard centroid -- the response with the highest average word overlap with the others:

{"winner": "CockroachDB offers better multi-region support...", "scores": [0.72, 0.81, 0.68], "response_count": 3}
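
For intuition, a Jaccard centroid can be computed in a few lines. This is a sketch of the general technique, assuming simple lowercase word tokenization; ThinkGraph's tokenizer and tie-breaking may differ, so the scores will not necessarily match the output above.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the word intersection divided by the size of the word union."""
    wa, wb = _words(a), _words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


def vote(responses: List[str]) -> dict:
    """Pick the response with the highest mean Jaccard similarity to the others."""
    if not responses:
        raise ValueError("need at least one response")
    raw = []
    for i, resp in enumerate(responses):
        others = [jaccard(resp, o) for j, o in enumerate(responses) if j != i]
        raw.append(sum(others) / len(others) if others else 1.0)
    # Ties go to the earliest response.
    winner = responses[raw.index(max(raw))]
    return {
        "winner": winner,
        "scores": [round(s, 2) for s in raw],
        "response_count": len(responses),
    }


result = vote([
    "CockroachDB is better for multi-region due to automated survivability",
    "CockroachDB offers better multi-region support with automated partitioning and survivability",
    "For multi-region, CockroachDB has automated survivability which PostgreSQL lacks",
])
print(result["winner"])
```

A response that hallucinates a detail the other runs do not mention shares fewer words with them, so its average overlap drops and it loses the vote.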

Step 7: Prune Unnecessary Nodes

python cli/thinkgraph.py prune-dag graph.json --facts facts.json --prompt "Should we migrate PostgreSQL to CockroachDB?"

If a parent node's answer already covers a child's question, the child is auto-removed. Saves tokens without losing information.

Step 8: Export as Report

python cli/thinkgraph.py export results.json --format markdown > migration-analysis.md

Supports JSON, YAML, and Markdown.

Using ThinkGraph as an MCP Server

This is the slickest integration path. Start the server:

python mcp/thinkgraph_mcp.py

Then configure any MCP-compatible client:

{
  "mcpServers": {
    "thinkgraph": {
      "command": "python",
      "args": ["/path/to/thinkgraph/mcp/thinkgraph_mcp.py"]
    }
  }
}

Seven tools are exposed: triage, validate-dag, vote, web-search, cache-get, cache-set, tokens. Your MCP client can call these directly without the CLI.

A/B Testing Your Prompts

Want to prove ThinkGraph improves quality? Use the A/B scorer:

python cli/thinkgraph.py ab-score \
  "CockroachDB supports multi-region deployments" \
  --ground-truth "CockroachDB supports multi-region deployments with automated partitioning and survivability across regions"

Output:

keyword_recall: 66.67%
precision: 100.00%
claim_count: 1
uncertainty_markers: 0
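
If you want to see what keyword recall and precision measure, here is a rough sketch. The stopword list and tokenizer are my own illustrative choices, not the scorer's, so the numbers it produces differ from the CLI output above (this version gives 50% recall for the same inputs).

```python
import re
from typing import Dict, Set

# Illustrative stopword list; the real scorer's list is not documented here.
STOPWORDS = {"a", "an", "and", "the", "of", "with", "for", "to", "in", "across"}


def keywords(text: str) -> Set[str]:
    """Lowercase words (hyphens kept) with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9-]+", text.lower()) if w not in STOPWORDS}


def ab_score(response: str, ground_truth: str) -> Dict[str, float]:
    resp, truth = keywords(response), keywords(ground_truth)
    hits = resp & truth
    return {
        # Recall: how much of the ground truth the response covers.
        "keyword_recall": len(hits) / len(truth) if truth else 0.0,
        # Precision: how much of the response is backed by the ground truth.
        "precision": len(hits) / len(resp) if resp else 0.0,
    }


score = ab_score(
    "CockroachDB supports multi-region deployments",
    "CockroachDB supports multi-region deployments with automated partitioning "
    "and survivability across regions",
)
print(score)
```

High precision with low recall, as in this example, means the answer is correct but incomplete.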

Compressing Long Contexts

When synthesis context is too long, compress it:

python cli/thinkgraph.py compress long_spec.txt --ratio 0.3

Uses TF-IDF sentence extraction to keep the 30% most important content. No embeddings, no API calls -- pure term-frequency math in stdlib Python.
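
Here is a minimal sketch of TF-IDF sentence extraction using only the standard library. It treats each sentence as a document, uses a smoothed IDF, and averages term weights per sentence; those choices and the function names are my assumptions, so treat it as an illustration of the technique rather than the `compress` command's exact scoring.

```python
import math
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compress(text: str, ratio: float = 0.3) -> str:
    """Keep roughly `ratio` of the sentences, chosen by mean TF-IDF weight."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)

    # Document frequency: in how many sentences each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

    def score(tokens: List[str]) -> float:
        # Summing idf over every token occurrence equals sum(tf * idf);
        # dividing by length keeps long sentences from winning by default.
        return sum(idf[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, round(n * ratio))
    top = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)[:keep]
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))


spec = (
    "The service stores orders in PostgreSQL. "
    "The service is deployed in three regions. "
    "Latency between regions averages 80 ms. "
    "The team wants survivable writes during a regional outage."
)
print(compress(spec, ratio=0.5))
```

Sentences built from terms that appear everywhere in the text score low, while sentences carrying rarer terms are kept.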

Writing Custom Plugins

Need a custom resolve function? Register a plugin:

python cli/thinkgraph.py plugin-register my_db_lookup 'def fn(q, ctx): return {"claim": db.query(q), "confidence": 0.95}'
python cli/thinkgraph.py plugin-list
# my_db_lookup, shell, weblookup

The Token Math

ThinkGraph sets a hard ceiling of 4x the token count of a direct answer. If the pipeline would exceed that, it aborts and falls back to a direct answer. Here's the budget for each stage:

Stage              Max Tokens
Triage             50
Decompose          200
Per sub-question   300
Synthesize         600
Hard ceiling       4x direct answer

Simple prompts (< 200 tokens) bypass the pipeline entirely. For everything else, you never pay more than 4x the cost of a direct answer, and most of the time you pay far less.
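
As a quick worked example of the budget, the sketch below adds up the per-stage maxima for a given number of sub-questions and compares the total with the 4x ceiling. It assumes the worst case is simply the sum of the stage limits, which is my reading of the budget rather than a documented formula; the function names are hypothetical.

```python
STAGE_MAX = {"triage": 50, "decompose": 200, "per_sub_question": 300, "synthesize": 600}
CEILING_FACTOR = 4


def pipeline_budget(n_sub_questions: int) -> int:
    """Worst-case pipeline tokens, assuming every stage uses its full budget."""
    return (
        STAGE_MAX["triage"]
        + STAGE_MAX["decompose"]
        + n_sub_questions * STAGE_MAX["per_sub_question"]
        + STAGE_MAX["synthesize"]
    )


def should_abort(n_sub_questions: int, direct_answer_tokens: int) -> bool:
    """True when the worst case would exceed 4x the direct answer."""
    return pipeline_budget(n_sub_questions) > CEILING_FACTOR * direct_answer_tokens


# Four sub-questions: 50 + 200 + 4 * 300 + 600 = 2050 tokens.
print(pipeline_budget(4))      # 2050
print(should_abort(4, 500))    # True: the ceiling is 2000
print(should_abort(4, 600))    # False: the ceiling is 2400
```

Under this reading, a four-question graph only pays off when the direct answer would itself run past roughly 500 tokens.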

Why Zero Dependencies Matters

The entire CLI is Python 3.8+ stdlib only. No pip install. No venv. No requirements.txt. Internally it relies on precompiled regexes, LRU cache decorators, and sets for membership tests. It runs anywhere Python runs -- including CI runners, minimal Docker images, and air-gapped environments.

Test Suite

python tests/test_golden.py       # 15/15 passing
python tests/test_new_features.py # 14/14 passing
python tests/benchmark.py         # 10-prompt quality benchmark

The benchmark scores triage accuracy at 80%, compression quality at 70%, and vote consistency at 64% -- solid baselines that improve with use.

When NOT to Use ThinkGraph

Honest answer: don't use it for simple Q&A ("What is the capital of France?"). The triage stage catches trivial prompts and short-circuits. Also don't use it for creative writing -- decomposition kills flow. But for any prompt involving comparison, planning, analysis, or multi-step reasoning, it shines.

Get Started

The entire project is MIT-licensed at github.com/Mayne-X/thinkgraph. Star it, fork it, break it, fix it. Contributions welcome.

If you want the big-picture overview first, read my introductory article. Or just clone the repo and run python cli/thinkgraph.py triage on your hardest prompt right now.


ThinkGraph forces structured thinking before guessing. Your prompts deserve a foundation, not a hallucination.
