Tian AI Thinker: Building a Three-Layer LLM Reasoning Engine


The Thinker is the cognitive core of Tian AI. It orchestrates a local Qwen2.5-1.5B model through three distinct reasoning modes, each optimized for different query types.

Architecture Overview

┌─────────────────────────────────────────────┐
│              ThinkerRouter                    │
│  ┌──────────┐  ┌──────────┐  ┌────────────┐ │
│  │   Fast   │  │  CoT     │  │   Deep     │ │
│  │  Mode    │  │  Mode    │  │   Mode     │ │
│  └────┬─────┘  └────┬─────┘  └─────┬──────┘ │
│       └──────────────┼──────────────┘        │
│                      ▼                       │
│            LLMBridge (Qwen2.5)               │
│       ┌─────────────────────────────┐        │
│       │  llama-server :8080         │        │
│       │  (Qwen2.5-1.5B GGUF)       │        │
│       └─────────────────────────────┘        │
└─────────────────────────────────────────────┘
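
To make the flow concrete, here is a minimal sketch of how a router like this could dispatch a query to one of the three modes. The class names, the heuristic, and the prompt format are illustrative assumptions, not code from the Tian AI repository:

```python
from enum import Enum


class ThinkMode(Enum):
    FAST = "fast"
    COT = "cot"
    DEEP = "deep"


class ThinkerRouter:
    """Dispatches a query to one of three reasoning modes, all backed by one LLM bridge."""

    def __init__(self, bridge):
        self.bridge = bridge  # wrapper around the llama-server endpoint

    def classify(self, query: str) -> ThinkMode:
        # Placeholder heuristic: real routing could use query length,
        # keywords, or a lightweight classifier.
        words = query.lower().split()
        if len(words) < 15:
            return ThinkMode.FAST
        if any(k in words for k in ("why", "compare", "evaluate")):
            return ThinkMode.DEEP
        return ThinkMode.COT

    def think(self, query: str, context: str = "") -> str:
        mode = self.classify(query)
        prompt = f"[{mode.value}] Context:\n{context}\n\nQuestion: {query}"
        return self.bridge.complete(prompt)
```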

Three Thinking Modes

Fast Mode (~1-3s)

Single-pass generation for quick queries. The LLM receives a concise prompt with relevant knowledge context and generates a direct answer.
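
Roughly, a fast-mode call could look like the following; `fast_answer`, `bridge.complete`, and the prompt wording are assumed names for illustration, not the project's actual API:

```python
def fast_answer(bridge, query: str, knowledge: list[str]) -> str:
    """Single-pass answer: pack the top knowledge snippets into one concise prompt."""
    notes = "\n".join(f"- {snippet}" for snippet in knowledge[:3])
    prompt = (
        "Answer briefly using the notes below.\n"
        f"Notes:\n{notes}\n\n"
        f"Question: {query}\nAnswer:"
    )
    # One LLM call, no intermediate steps -- this is what keeps latency at ~1-3s.
    return bridge.complete(prompt, max_tokens=256)
```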

Chain-of-Thought Mode (~30-60s)

For complex problems that require step-by-step reasoning, the LLM is prompted to (see the sketch after this list):

  1. Break the problem into steps
  2. Analyze each step with knowledge base context
  3. Synthesize a final answer
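
A sketch of that three-phase loop, again with assumed names (`kb.search` stands in for whatever knowledge-base lookup the project uses):

```python
def chain_of_thought(bridge, kb, query: str) -> str:
    """Decompose, analyze each step with knowledge-base context, then synthesize."""
    # 1. Break the problem into steps
    plan = bridge.complete(f"Break this problem into 3-5 numbered steps:\n{query}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Analyze each step with knowledge base context
    analyses = []
    for step in steps:
        notes = kb.search(step, top_k=3)  # hypothetical knowledge-base lookup
        analyses.append(
            bridge.complete(f"Step: {step}\nRelevant notes:\n{notes}\nAnalysis:")
        )

    # 3. Synthesize a final answer
    return bridge.complete(
        f"Question: {query}\n\nStep analyses:\n"
        + "\n\n".join(analyses)
        + "\n\nFinal answer:"
    )
```

Each step costs at least one extra LLM call, which is where the ~30-60s budget goes.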

Deep Mode (~60-120s)

Multi-perspective analysis with reflection. As sketched after this list, the system:

  1. Generates multiple analysis angles
  2. Evaluates each with knowledge base lookup
  3. Reflects on the analysis quality
  4. Synthesizes a comprehensive answer
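
In Python-level pseudocode, the four phases might be wired together like this; the angle count, prompts, and helper names are assumptions:

```python
def deep_think(bridge, kb, query: str, n_angles: int = 3) -> str:
    """Multi-perspective analysis with a reflection pass before synthesis."""
    # 1. Generate multiple analysis angles
    raw = bridge.complete(f"List {n_angles} distinct angles for analyzing:\n{query}")
    angles = [line.strip() for line in raw.splitlines() if line.strip()]

    # 2. Evaluate each angle with knowledge base lookup
    evaluations = []
    for angle in angles:
        notes = kb.search(angle, top_k=3)  # hypothetical knowledge-base lookup
        evaluations.append(
            bridge.complete(f"Angle: {angle}\nNotes:\n{notes}\nEvaluation:")
        )
    joined = "\n\n".join(evaluations)

    # 3. Reflect on the analysis quality
    reflection = bridge.complete(
        f"Critique these evaluations for gaps or contradictions:\n{joined}"
    )

    # 4. Synthesize a comprehensive answer
    return bridge.complete(
        f"Question: {query}\n\nEvaluations:\n{joined}\n\n"
        f"Reflection:\n{reflection}\n\nComprehensive answer:"
    )
```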

PromptCache

To reduce LLM calls on repeated queries, Tian AI uses a PromptCache (sketched after this list) with:

  • LRU eviction (max 1000 entries)
  • TTL expiry (5-30 minutes depending on mode)
  • Multi-level caching: fast-mode queries get a longer TTL
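
A compact way to get both behaviors is an ordered dict used as an LRU with a per-entry expiry timestamp. This sketch mirrors the stated limits (1000 entries, mode-dependent TTL) but is not the project's actual implementation:

```python
import time
from collections import OrderedDict


class PromptCache:
    """LRU cache with per-entry TTL; cheap fast-mode answers can be kept longer."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._store = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str):
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.time() > expires_at:      # TTL expiry
            del self._store[key]
            return None
        self._store.move_to_end(key)      # mark as recently used
        return value

    def put(self, key: str, value: str, ttl_seconds: int = 300):
        self._store[key] = (time.time() + ttl_seconds, value)
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used


# Example: fast-mode answers cached for 30 minutes, deep-mode answers for 5 minutes
cache = PromptCache()
cache.put("fast::what is rust", "Rust is a systems language...", ttl_seconds=1800)
cache.put("deep::compare gc strategies", "...", ttl_seconds=300)
```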

LLMBridge

The LLMBridge handles all communication with the llama.cpp server (a sketch follows the list):

  • 30-second health check timeout
  • 5 retry attempts with exponential backoff
  • Streaming response support
  • Automatic fallback to cached responses
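
A minimal version of such a bridge, assuming llama.cpp's standard /health and /completion endpoints; streaming and the cached-response fallback are omitted for brevity, and the timeouts and retry counts simply mirror the numbers above:

```python
import time
import requests


class LLMBridge:
    """Thin HTTP client for a llama.cpp server, with health checks and retries."""

    def __init__(self, base_url: str = "http://127.0.0.1:8080",
                 health_timeout: float = 30.0, max_retries: int = 5):
        self.base_url = base_url
        self.health_timeout = health_timeout
        self.max_retries = max_retries

    def is_healthy(self) -> bool:
        try:
            r = requests.get(f"{self.base_url}/health", timeout=self.health_timeout)
            return r.status_code == 200
        except requests.RequestException:
            return False

    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        payload = {"prompt": prompt, "n_predict": max_tokens}
        for attempt in range(self.max_retries):
            try:
                r = requests.post(f"{self.base_url}/completion",
                                  json=payload, timeout=120)
                r.raise_for_status()
                return r.json()["content"]
            except requests.RequestException:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
        raise RuntimeError("llama-server unreachable after retries")
```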

Key Insight

Small models (1.5B parameters) can punch above their weight when combined with a smart prompting architecture and a large local knowledge base. Tian AI's Thinker proves that local AI doesn't have to be dumb AI.


Published on 2026-04-25 21:19 UTC by Tian AI Dev Team
