Mayne

Posted on Jul 1

ThinkGraph - Give Your LLM a 50% Accuracy Boost by Building a Fact Foundation First

#ai #llm #opensource #showdev

ThinkGraph: Structured Decomposition for LLM Prompts

Every developer using LLMs has experienced this: you ask a complex question, and the model guesses the whole answer at once. It hallucinates details, misses constraints, and assumes facts it should verify first.

ThinkGraph is an open-source skill that intercepts prompts, forces the LLM to decompose them into a dependency graph of atomic facts, resolves those facts sequentially, and then synthesizes a grounded answer.

The result? A 50%+ accuracy improvement on multi-hop prompts, while simple prompts are answered directly and avoid the pipeline's token overhead.

The Problem: LLMs Guess Whole Answers

Prompt: "Compare React and Vue for a large enterprise dashboard with SSR requirements"

Asked directly, the LLM guesses that React is better without checking:
   - SSR maturity of each framework
   - Enterprise adoption rates
   - Team size implications
   - Bundle size trade-offs

The Solution: Build a Foundation First

ThinkGraph runs the prompt through a 5-stage pipeline, with the final answer produced in the last stage:

1. Triage

Classify the prompt: trivial? Answer directly (save tokens). Multi-hop/planning? Enter the pipeline.
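The post does not show how triage decides, so the following is only a rough sketch of one cheap way to approximate it. The cue words, the word-count threshold, and the function name `triage` are all assumptions for illustration, not ThinkGraph's actual rules.

```python
import re

# Hypothetical cue words; ThinkGraph's real triage rules are not shown in this post.
MULTI_HOP_CUES = {"compare", "versus", "vs", "plan", "trade-off", "tradeoff", "which", "why"}


def triage(prompt: str, max_trivial_words: int = 8) -> str:
    """Return "direct" for trivial prompts and "pipeline" for multi-hop ones."""
    words = re.findall(r"[a-z\-]+", prompt.lower())
    has_cue = any(w in MULTI_HOP_CUES for w in words)
    # Assumed rule: a cue word or a long prompt sends the request into the pipeline.
    if has_cue or len(words) > max_trivial_words:
        return "pipeline"
    return "direct"


print(triage("What is SSR?"))
print(triage("compare React and Vue for enterprise SSR"))
```

Under these assumed rules the first prompt is answered directly and the second enters the pipeline. A keyword heuristic like this costs no model tokens at all, but it will misclassify prompts that an LLM-based classifier would catch.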

2. Decompose

Break the prompt into a DAG of atomic sub-questions:

{
  "nodes": [
    {"id": "Q1", "q": "What is React's SSR maturity?", "deps": []},
    {"id": "Q2", "q": "What is Vue's SSR maturity?", "deps": []},
    {"id": "Q3", "q": "Which has better enterprise adoption?", "deps": []},
    {"id": "Q4", "q": "Given Q1-Q3, which is better for a 10-person team?", "deps": ["Q1","Q2","Q3"]}
  ]
}
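Before resolving anything, it helps to confirm the graph really is a DAG. Below is a minimal sketch of such a check using only the standard library; the function name `validate_dag` and the wording of the error messages are assumptions, and the checks that the project's own `validate-dag` command performs may differ.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(graph: dict) -> list[str]:
    """Return a list of problems; an empty list means the graph is a valid DAG."""
    nodes = graph.get("nodes", [])
    ids = [n["id"] for n in nodes]
    problems = []

    if len(ids) != len(set(ids)):
        problems.append("duplicate node ids")

    # Every dependency must point at a node that exists in the graph.
    known = set(ids)
    for n in nodes:
        for dep in n.get("deps", []):
            if dep not in known:
                problems.append(f"{n['id']} depends on unknown node {dep}")

    # prepare() raises CycleError if the dependencies contain a cycle.
    try:
        TopologicalSorter({n["id"]: n.get("deps", []) for n in nodes}).prepare()
    except CycleError:
        problems.append("dependency cycle")
    return problems


example = {
    "nodes": [
        {"id": "Q1", "q": "What is React's SSR maturity?", "deps": []},
        {"id": "Q2", "q": "What is Vue's SSR maturity?", "deps": []},
        {"id": "Q3", "q": "Which has better enterprise adoption?", "deps": []},
        {"id": "Q4", "q": "Given Q1-Q3, which is better?", "deps": ["Q1", "Q2", "Q3"]},
    ]
}
print(validate_dag(example))  # prints []
```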

3. Resolve

Answer each node in topological order. Parallelize independent nodes. Each answer carries a confidence score:

Q1 - React 18+ supports streaming SSR (conf: 0.92)
Q2 - Vue 3 has SSR via Nuxt 3 (conf: 0.88)
Q3 - React is more widely adopted in enterprise (conf: 0.85)

Facts with a confidence below 0.6 can be checked automatically with a DuckDuckGo web search, which needs no API key.
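The resolve order can be sketched with Python's standard `graphlib` module: each batch holds nodes whose dependencies are already answered, so the nodes inside a batch can run in parallel. The helper names `resolution_batches` and `needs_search` and the shape of the `answers` dictionary are assumptions for illustration, not ThinkGraph's actual code.

```python
from graphlib import TopologicalSorter


def resolution_batches(nodes):
    """Group node ids into batches; every node in a batch has all its deps resolved."""
    sorter = TopologicalSorter({n["id"]: n["deps"] for n in nodes})
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes with no unresolved dependencies
        batches.append(ready)
        sorter.done(*ready)  # mark the batch resolved, which unlocks its dependents
    return batches


def needs_search(answers, threshold=0.6):
    """Return ids of answers whose confidence falls below the threshold."""
    return [qid for qid, (_text, conf) in answers.items() if conf < threshold]


nodes = [
    {"id": "Q1", "deps": []},
    {"id": "Q2", "deps": []},
    {"id": "Q3", "deps": []},
    {"id": "Q4", "deps": ["Q1", "Q2", "Q3"]},
]
print(resolution_batches(nodes))  # prints [['Q1', 'Q2', 'Q3'], ['Q4']]

answers = {
    "Q1": ("React 18+ supports streaming SSR", 0.92),
    "Q2": ("Vue 3 has SSR via Nuxt 3", 0.88),
}
print(needs_search(answers))  # prints []
```

For the example graph, Q1 to Q3 form the first batch and Q4 waits for all three.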

4. Self-Consistency Vote

Run the synthesis stage (stage 5) 2-3 times, then pick the Jaccard centroid: the response with the highest average word overlap with all the others.
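A minimal sketch of the Jaccard-centroid selection follows. It assumes "word overlap" means the Jaccard similarity (intersection over union) of lowercase word sets; the tokenization and the function names are illustrative assumptions rather than the project's implementation.

```python
import re


def word_set(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_centroid(responses):
    """Return the response with the highest mean Jaccard similarity to the others."""
    if not responses:
        raise ValueError("need at least one response")
    if len(responses) == 1:
        return responses[0]
    sets = [word_set(r) for r in responses]

    def mean_similarity(i):
        others = [jaccard(sets[i], s) for j, s in enumerate(sets) if j != i]
        return sum(others) / len(others)

    best = max(range(len(responses)), key=mean_similarity)
    return responses[best]


candidates = [
    "React is better for SSR dashboards",
    "React is better for SSR",
    "Vue is simpler to learn",
]
print(jaccard_centroid(candidates))  # prints: React is better for SSR
```

The outlier answer shares almost no words with the other two, so it scores lowest, and the response closest to the consensus wins.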

5. Synthesize

Feed ONLY verified sub-facts into the final prompt.
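One way to picture this stage is a prompt builder that drops anything unverified before the model sees it. The sketch below treats "verified" as a confidence of at least 0.6, reusing the threshold from the resolve stage; that interpretation, the prompt wording, and the function name `build_synthesis_prompt` are assumptions.

```python
def build_synthesis_prompt(question, facts, min_conf=0.6):
    """Build the final prompt from (id, text, confidence) facts at or above min_conf."""
    # Assumption: a fact counts as verified when its confidence is >= min_conf.
    verified = [(qid, text) for qid, text, conf in facts if conf >= min_conf]
    fact_lines = [f"- {qid}: {text}" for qid, text in verified]
    return (
        "Answer the question using ONLY the facts listed below.\n"
        f"Question: {question}\n"
        "Facts:\n" + "\n".join(fact_lines)
    )


facts = [
    ("Q1", "React 18+ supports streaming SSR", 0.92),
    ("Q2", "Vue 3 has SSR via Nuxt 3", 0.88),
    ("Q3", "React is more widely adopted in enterprise", 0.85),
]
print(build_synthesis_prompt(
    "Compare React and Vue for a large enterprise dashboard with SSR requirements",
    facts,
))
```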

Token Efficiency

Stage	Max Tokens
Triage	50
Decompose	200
Per sub-question	300
Synthesize	600
Hard ceiling	4x direct answer
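To see how these limits combine, here is a small worked calculation. It assumes the hard ceiling applies to the whole pipeline and is measured against the token count of a direct answer, and it ignores the repeated synthesis runs of the self-consistency vote; the post does not spell out either detail, so treat both as assumptions.

```python
# Per-stage limits taken from the table above.
STAGE_BUDGET = {
    "triage": 50,
    "decompose": 200,
    "per_sub_question": 300,
    "synthesize": 600,
}


def pipeline_budget(num_sub_questions, direct_answer_tokens):
    """Worst-case token budget for one pipeline run, capped at 4x a direct answer."""
    staged = (
        STAGE_BUDGET["triage"]
        + STAGE_BUDGET["decompose"]
        + num_sub_questions * STAGE_BUDGET["per_sub_question"]
        + STAGE_BUDGET["synthesize"]
    )
    ceiling = 4 * direct_answer_tokens  # assumed meaning of "4x direct answer"
    return min(staged, ceiling)


# Four sub-questions: 50 + 200 + 4 * 300 + 600 = 2050 tokens before the cap.
print(pipeline_budget(4, direct_answer_tokens=600))  # prints 2050 (ceiling is 2400)
print(pipeline_budget(4, direct_answer_tokens=400))  # prints 1600 (ceiling applies)
```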

Key Design Decisions

No API keys required. The helper CLI does deterministic bookkeeping only.

stdlib only (Python). Zero dependencies.

Works with 6+ agents. OpenCode, Claude Code, Cursor, Codex, Copilot, Gemini CLI.

Quick Start

git clone https://github.com/Mayne-X/thinkgraph.git
cd thinkgraph
python install.py
python thinkgraph.py triage "compare React and Vue for enterprise SSR"
python thinkgraph.py validate-dag graph.json

MCP Server

ThinkGraph also ships as an MCP server (7 tools):

{
  "mcpServers": {
    "thinkgraph": {
      "command": "python",
      "args": ["/path/to/mcp/thinkgraph_mcp.py"]
    }
  }
}

What's Next

CLI interactive mode
Multi-model routing
Streaming DAG emission

Try It

GitHub: github.com/Mayne-X/thinkgraph
MIT License

ThinkGraph forces structured thinking before guessing. Your prompts deserve a foundation, not a hallucination.

Top comments (1)

Alex Shev • Jul 1

The fact-foundation step is underrated. Most LLM failures are not because the model cannot write the final answer; they happen because the model starts composing before the facts are pinned down. I like the idea of forcing a fact graph before asking for synthesis.