In my previous article, I introduced ThinkGraph -- an open-source pipeline that forces LLMs to decompose complex prompts into atomic fact DAGs before answering. This time, let's get hands-on.
What You Will Learn
- How to install ThinkGraph for Claude Code, OpenCode, Cursor, and others
- How to use the CLI for step-by-step prompt decomposition
- How self-consistency voting catches hallucinations
- How web grounding fills knowledge gaps without API keys
- How to integrate ThinkGraph as an MCP server
Installation
git clone https://github.com/Mayne-X/thinkgraph.git
cd thinkgraph
# Auto-detect and install for all your agents
python install.py
# See what it would do without touching files
python install.py --dry-run
The installer detects OpenCode, Claude Code, Cursor, Codex, Copilot, and Gemini CLI configs and injects the appropriate adapter. It is idempotent, so it is safe to re-run.
CLI Tutorial: Decompose a Complex Prompt
Let's walk through a real example. Say you ask:
"Should we migrate our PostgreSQL database to CockroachDB for a multi-region SaaS product?"
Step 1: Triage
python cli/thinkgraph.py triage "Should we migrate our PostgreSQL database to CockroachDB for a multi-region SaaS product?"
Output: multi-hop -- not trivial, enters the pipeline.
Step 2: Decompose into DAG
The agent (not the CLI) generates a dependency graph like:
{
  "nodes": [
    {"id": "Q1", "q": "What are CockroachDB's multi-region capabilities?", "deps": []},
    {"id": "Q2", "q": "What are PostgreSQL's multi-region limitations?", "deps": []},
    {"id": "Q3", "q": "What is the migration complexity from PG to CockroachDB?", "deps": ["Q1","Q2"]},
    {"id": "Q4", "q": "What is the cost comparison for our scale?", "deps": ["Q1","Q2"]},
    {"id": "Q5", "q": "Should we migrate given Q1-Q4?", "deps": ["Q3","Q4"]}
  ]
}
Step 3: Validate the DAG
python cli/thinkgraph.py validate-dag graph.json
This checks for cycles, dangling deps, and orphan nodes, then computes execution batches:
Batch 0: Q1, Q2 (parallel)
Batch 1: Q3, Q4 (parallel)
Batch 2: Q5 (needs Q3 + Q4)
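To make the batching idea concrete, here is a minimal sketch of how execution batches can be derived from the `deps` lists with a Kahn-style topological sort. It is an illustration only, not ThinkGraph's actual code: the function name `execution_batches` is hypothetical, and the orphan-node check is left out for brevity.

```python
from typing import Dict, List, Set


def execution_batches(nodes: List[dict]) -> List[List[str]]:
    """Group node ids into batches that can run in parallel (hypothetical helper)."""
    deps: Dict[str, Set[str]] = {n["id"]: set(n["deps"]) for n in nodes}
    known = set(deps)

    # A dangling dep references an id that no node defines.
    for node_id, node_deps in deps.items():
        missing = node_deps - known
        if missing:
            raise ValueError(f"{node_id} has dangling deps: {sorted(missing)}")

    batches: List[List[str]] = []
    done: Set[str] = set()
    while len(done) < len(deps):
        # A node is ready once every one of its deps sits in an earlier batch.
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            # Nothing is ready but nodes remain: they must depend on each other.
            raise ValueError(f"cycle detected among: {sorted(known - done)}")
        batches.append(ready)
        done.update(ready)
    return batches


# Same shape as the graph from Step 2 (the "q" text is omitted here).
nodes = [
    {"id": "Q1", "deps": []},
    {"id": "Q2", "deps": []},
    {"id": "Q3", "deps": ["Q1", "Q2"]},
    {"id": "Q4", "deps": ["Q1", "Q2"]},
    {"id": "Q5", "deps": ["Q3", "Q4"]},
]
for index, batch in enumerate(execution_batches(nodes)):
    print(f"Batch {index}: {', '.join(batch)}")
```

For the five-node graph this yields the same three batches listed above. Nodes inside one batch share no dependencies on each other, which is why they can be resolved in parallel.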
Step 4: Resolve with Web Grounding
Low-confidence sub-questions (< 0.6) can trigger automatic web search:
python cli/thinkgraph.py web-search "CockroachDB multi-region latency benchmark 2026" --num-results 5
No API key needed -- ThinkGraph parses DuckDuckGo results directly via HTTP. This is huge for automated pipelines where you don't want to manage search API credentials.
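If you are scripting this step yourself, the gating logic might look like the sketch below. The shape of the `facts` dictionary and the helper name `grounding_commands` are assumptions made for illustration; only the `web-search` subcommand, the `--num-results` flag, and the 0.6 threshold come from this article.

```python
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.6  # threshold quoted in the article


def grounding_commands(facts: Dict[str, dict], num_results: int = 5) -> List[List[str]]:
    """Build one web-search argv per sub-question whose confidence is too low.

    Hypothetical helper: `facts` maps a sub-question to {"claim", "confidence"}.
    """
    commands = []
    for question, fact in facts.items():
        if fact["confidence"] < CONFIDENCE_THRESHOLD:
            commands.append([
                "python", "cli/thinkgraph.py", "web-search", question,
                "--num-results", str(num_results),
            ])
    return commands


facts = {
    "CockroachDB multi-region capabilities": {"claim": "placeholder", "confidence": 0.92},
    "CockroachDB multi-region latency benchmark 2026": {"claim": "placeholder", "confidence": 0.41},
}
for argv in grounding_commands(facts):
    print(argv)
```

Each returned list can be handed to `subprocess.run` as-is, which avoids shell quoting problems when a sub-question contains quotes or spaces.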
Step 5: Cache Results
python cli/thinkgraph.py cache-set "what are cockroachdb multi-region capabilities" '{"claim": "CockroachDB offers automated partitioning, survivable across regions", "confidence": 0.92}'
python cli/thinkgraph.py cache-get "what are cockroachdb multi-region capabilities"
Facts persist globally at ~/.thinkgraph/cache.json -- shared across projects and sessions.
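Conceptually, a cache like this is a JSON file keyed by a normalized question. The sketch below assumes a flat `{question: fact}` layout and a lowercase, whitespace-collapsing normalizer; the real on-disk format of `~/.thinkgraph/cache.json` may differ, so the example writes to a temporary directory instead of touching it.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so equivalent questions share a key."""
    return " ".join(question.lower().split())


def cache_set(path: Path, question: str, fact: dict) -> None:
    """Store a fact under the normalized question (assumed flat JSON layout)."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data[normalize(question)] = fact
    path.write_text(json.dumps(data, indent=2))


def cache_get(path: Path, question: str) -> Optional[dict]:
    """Return the cached fact, or None on a miss."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(normalize(question))


# Temporary file so the sketch never modifies a real ThinkGraph cache.
cache_file = Path(tempfile.mkdtemp()) / "cache.json"
cache_set(
    cache_file,
    "What are CockroachDB multi-region capabilities",
    {"claim": "CockroachDB offers automated partitioning, survivable across regions", "confidence": 0.92},
)
print(cache_get(cache_file, "what are cockroachdb   multi-region capabilities"))
```

Normalizing the key is what lets differently capitalized or spaced versions of the same question hit the same entry.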
Step 6: Self-Consistency Vote
Run synthesis 2-3 times, then vote on consistency:
python cli/thinkgraph.py vote \
  "CockroachDB is better for multi-region due to automated survivability" \
  "CockroachDB offers better multi-region support with automated partitioning and survivability" \
  "For multi-region, CockroachDB has automated survivability which PostgreSQL lacks"
The output picks the Jaccard centroid -- the response with the highest average word overlap with the other responses:
{"winner": "CockroachDB offers better multi-region support...", "scores": [0.72, 0.81, 0.68], "response_count": 3}
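Here is a minimal sketch of a Jaccard-centroid vote, assuming a simple lowercase word tokenizer. ThinkGraph's own tokenization is not described in this article, so the scores this produces will not necessarily match the CLI output above.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word set; the tokenizer is an assumption of this sketch."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def vote(responses: List[str]) -> dict:
    """Pick the response with the highest mean Jaccard overlap with the others."""
    sets = [tokens(r) for r in responses]
    scores = []
    for i, current in enumerate(sets):
        others = [jaccard(current, other) for j, other in enumerate(sets) if j != i]
        scores.append(round(sum(others) / len(others), 2) if others else 1.0)
    best = max(range(len(responses)), key=scores.__getitem__)
    return {"winner": responses[best], "scores": scores, "response_count": len(responses)}


result = vote([
    "CockroachDB is better for multi-region due to automated survivability",
    "CockroachDB offers better multi-region support with automated partitioning and survivability",
    "For multi-region, CockroachDB has automated survivability which PostgreSQL lacks",
])
print(result)
```

The intuition: a hallucinated detail tends to show up in only one of the runs, so the run that shares the most vocabulary with its siblings is the least likely to contain it.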
Step 7: Prune Unnecessary Nodes
python cli/thinkgraph.py prune-dag graph.json --facts facts.json --prompt "Should we migrate PostgreSQL to CockroachDB?"
If a parent node's answer already covers a child's question, the child is auto-removed. Saves tokens without losing information.
Step 8: Export as Report
python cli/thinkgraph.py export results.json --format markdown > migration-analysis.md
Supports JSON, YAML, and Markdown.
Using ThinkGraph as an MCP Server
This is the slickest integration path. Start the server:
python mcp/thinkgraph_mcp.py
Then configure any MCP-compatible client:
{
  "mcpServers": {
    "thinkgraph": {
      "command": "python",
      "args": ["/path/to/thinkgraph/mcp/thinkgraph_mcp.py"]
    }
  }
}
Seven tools are exposed: triage, validate-dag, vote, web-search, cache-get, cache-set, tokens. Your MCP client can call these directly without the CLI.
A/B Testing Your Prompts
Want to prove ThinkGraph improves quality? Use the A/B scorer:
python cli/thinkgraph.py ab-score \
  "CockroachDB supports multi-region deployments" \
  --ground-truth "CockroachDB supports multi-region deployments with automated partitioning and survivability across regions"
Output:
keyword_recall: 66.67%
precision: 100.00%
claim_count: 1
uncertainty_markers: 0
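For intuition, keyword recall and precision can be computed roughly as in the sketch below. The stop-word list and tokenizer here are assumptions, not ThinkGraph's, so the percentages will differ from what the CLI reports for the same strings.

```python
import re
from typing import Set

# Assumed stop-word list, purely for illustration.
STOPWORDS = {"a", "an", "and", "the", "with", "of", "for", "to", "in"}


def keywords(text: str) -> Set[str]:
    """Lowercase words (hyphenated terms kept whole) minus stop words."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {w for w in words if w not in STOPWORDS}


def ab_score(answer: str, ground_truth: str) -> dict:
    """Recall: share of ground-truth keywords found in the answer.
    Precision: share of answer keywords that appear in the ground truth."""
    got, want = keywords(answer), keywords(ground_truth)
    overlap = got & want
    return {
        "keyword_recall": len(overlap) / len(want) if want else 0.0,
        "precision": len(overlap) / len(got) if got else 0.0,
    }


print(ab_score(
    "PostgreSQL supports replication",
    "PostgreSQL supports replication and sharding",
))
```

High precision with low recall, as in the CLI output above, means the answer said nothing wrong but left out part of the ground truth.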
Compressing Long Contexts
When synthesis context is too long, compress it:
python cli/thinkgraph.py compress long_spec.txt --ratio 0.3
Uses TF-IDF sentence extraction to keep the 30% most important content. No embeddings, no API calls -- pure term-frequency math in stdlib Python.
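A stdlib-only TF-IDF extractor can be surprisingly small. The sketch below treats each sentence as a document, scores it by the sum of its TF-IDF term weights, and keeps the top-scoring share in original order. The sentence splitter, the smoothing formula, and the interpretation of `ratio` as a share of sentences are all assumptions of this sketch, not a description of ThinkGraph's implementation.

```python
import math
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compress(text: str, ratio: float = 0.3) -> str:
    """Keep roughly `ratio` of the sentences, ranked by TF-IDF score."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    total = len(sentences)
    # Document frequency: number of sentences each term appears in.
    doc_freq = Counter(term for words in tokenized for term in set(words))

    def score(words: List[str]) -> float:
        if not words:
            return 0.0
        counts = Counter(words)
        # Length-normalized term frequency times a smoothed inverse document frequency.
        return sum(
            (count / len(words)) * (math.log((1 + total) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        )

    keep = max(1, round(total * ratio))
    ranked = sorted(range(total), key=lambda i: score(tokenized[i]), reverse=True)[:keep]
    # Re-sort the surviving indices so the summary reads in document order.
    return " ".join(sentences[i] for i in sorted(ranked))


sample = (
    "The service stores user sessions in Redis. "
    "Sessions expire after thirty minutes. "
    "The service is written in Go. "
    "Failover between regions is handled by a custom Raft layer."
)
print(compress(sample, ratio=0.5))
```

Sentences built from terms that are rare elsewhere in the text score highest, which is the property that makes this a cheap stand-in for embedding-based summarization.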
Writing Custom Plugins
Need a custom resolve function? Register a plugin:
python cli/thinkgraph.py plugin-register my_db_lookup 'def fn(q, ctx): return {"claim": db.query(q), "confidence": 0.95}'
python cli/thinkgraph.py plugin-list
# my_db_lookup, shell, weblookup
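The one-liner above relies on a `db` object existing wherever the plugin runs. A fully self-contained resolve function with the same `fn(q, ctx)` signature and return shape might look like the sketch below. The `KNOWN_FACTS` table and the miss behavior (a `None` claim with zero confidence) are assumptions for illustration; how ThinkGraph treats such a result, and what it passes in `ctx`, is not covered in this article.

```python
from typing import Any, Dict

# Hypothetical in-memory lookup table standing in for a real data source.
KNOWN_FACTS: Dict[str, str] = {
    "what port does postgresql listen on by default": "PostgreSQL listens on TCP port 5432 by default.",
}


def fn(q: str, ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve a sub-question from the local table.

    Returns the same {"claim", "confidence"} shape as the registered one-liner.
    `ctx` is accepted for signature compatibility but unused in this sketch.
    """
    claim = KNOWN_FACTS.get(" ".join(q.lower().split()))
    if claim is None:
        # Assumed convention: signal a miss with zero confidence.
        return {"claim": None, "confidence": 0.0}
    return {"claim": claim, "confidence": 0.95}


print(fn("What port does PostgreSQL listen on by default", {}))
```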
The Token Math
ThinkGraph sets a hard ceiling of 4x the direct answer tokens. If the pipeline would exceed that, it aborts to a direct answer. Here's the budget for each stage:
| Stage | Max Tokens |
|---|---|
| Triage | 50 |
| Decompose | 200 |
| Per sub-question | 300 |
| Synthesize | 600 |
| Hard ceiling | 4x direct answer |
Simple prompts (< 200 tokens) bypass the pipeline entirely. For everything else, you never pay more than 4x the direct answer, and most of the time you pay far less.
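To see how the ceiling plays out, here is the worst-case arithmetic as a small sketch. It assumes the per-stage caps simply add up and that you already have an estimate of the direct-answer token count; both are simplifications, and the function names are hypothetical.

```python
# Per-stage caps from the table above.
BUDGET = {"triage": 50, "decompose": 200, "per_sub_question": 300, "synthesize": 600}
CEILING_MULTIPLIER = 4


def pipeline_budget(sub_questions: int) -> int:
    """Worst-case pipeline tokens, assuming every stage uses its full cap."""
    return (
        BUDGET["triage"]
        + BUDGET["decompose"]
        + sub_questions * BUDGET["per_sub_question"]
        + BUDGET["synthesize"]
    )


def should_abort(sub_questions: int, direct_answer_tokens: int) -> bool:
    """True when the worst case would exceed 4x the direct answer."""
    return pipeline_budget(sub_questions) > CEILING_MULTIPLIER * direct_answer_tokens


# Five sub-questions: 50 + 200 + 5 * 300 + 600 = 2350 tokens worst case.
print(pipeline_budget(5))
print(should_abort(5, direct_answer_tokens=500))  # ceiling is 2000
print(should_abort(5, direct_answer_tokens=600))  # ceiling is 2400
```

Under these assumptions, the five-node migration graph fits the budget only when the direct answer would run to roughly 600 tokens or more; for a shorter direct answer, the pipeline would fall back to answering directly.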
Why Zero Dependencies Matters
The entire CLI uses only the Python 3.8+ standard library. No pip install. No venv. No requirements.txt. Under the hood it relies on precompiled regexes, LRU cache decorators, and sets for membership tests. It runs anywhere Python runs -- including CI runners, minimal Docker images, and air-gapped environments.
Test Suite
python tests/test_golden.py # 15/15 passing
python tests/test_new_features.py # 14/14 passing
python tests/benchmark.py # 10-prompt quality benchmark
The benchmark scores triage accuracy at 80%, compression quality at 70%, and vote consistency at 64% -- solid baselines that improve with use.
When NOT to Use ThinkGraph
Honest answer: don't use it for simple Q&A ("What is the capital of France?"). The triage stage catches trivial prompts and short-circuits. Also don't use it for creative writing -- decomposition kills flow. But for any prompt involving comparison, planning, analysis, or multi-step reasoning, it shines.
Get Started
The entire project is MIT-licensed at github.com/Mayne-X/thinkgraph. Star it, fork it, break it, fix it. Contributions welcome.
If you want the big-picture overview first, read my introductory article. Or just clone the repo and run python cli/thinkgraph.py triage on your hardest prompt right now.
ThinkGraph forces structured thinking before guessing. Your prompts deserve a foundation, not a hallucination.