Tian AI Thinker: Building a Three-Layer LLM Reasoning Engine


The Thinker is the cognitive core of Tian AI. It orchestrates a local Qwen2.5-1.5B model through three distinct reasoning modes, each optimized for different query types.

Architecture Overview

┌─────────────────────────────────────────────┐
│              ThinkerRouter                    │
│  ┌──────────┐  ┌──────────┐  ┌────────────┐ │
│  │   Fast   │  │  CoT     │  │   Deep     │ │
│  │  Mode    │  │  Mode    │  │   Mode     │ │
│  └────┬─────┘  └────┬─────┘  └─────┬──────┘ │
│       └──────────────┼──────────────┘        │
│                      ▼                       │
│            LLMBridge (Qwen2.5)               │
│       ┌─────────────────────────────┐        │
│       │  llama-server :8080         │        │
│       │  (Qwen2.5-1.5B GGUF)       │        │
│       └─────────────────────────────┘        │
└─────────────────────────────────────────────┘
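
To make the flow concrete, here is a minimal sketch of how a router like this could dispatch a query to one of the three modes. The class names, the heuristic, and the prompt format are illustrative assumptions, not code from the Tian AI repository:

```python
from enum import Enum


class ThinkMode(Enum):
    FAST = "fast"
    COT = "cot"
    DEEP = "deep"


class ThinkerRouter:
    """Dispatches a query to one of three reasoning modes, all backed by one LLM bridge."""

    def __init__(self, bridge):
        self.bridge = bridge  # wrapper around the llama-server endpoint

    def classify(self, query: str) -> ThinkMode:
        # Placeholder heuristic: real routing could use query length,
        # keywords, or a lightweight classifier.
        words = query.lower().split()
        if len(words) < 15:
            return ThinkMode.FAST
        if any(k in words for k in ("why", "compare", "evaluate")):
            return ThinkMode.DEEP
        return ThinkMode.COT

    def think(self, query: str, context: str = "") -> str:
        mode = self.classify(query)
        prompt = f"[{mode.value}] Context:\n{context}\n\nQuestion: {query}"
        return self.bridge.complete(prompt)
```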

Three Thinking Modes

Fast Mode (~1-3s)

Single-pass generation for quick queries. The LLM receives a concise prompt with relevant knowledge context and generates a direct answer.
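
Roughly, a fast-mode call could look like the following; `fast_answer`, `bridge.complete`, and the prompt wording are assumed names for illustration, not the project's actual API:

```python
def fast_answer(bridge, query: str, knowledge: list[str]) -> str:
    """Single-pass answer: pack the top knowledge snippets into one concise prompt."""
    notes = "\n".join(f"- {snippet}" for snippet in knowledge[:3])
    prompt = (
        "Answer briefly using the notes below.\n"
        f"Notes:\n{notes}\n\n"
        f"Question: {query}\nAnswer:"
    )
    # One LLM call, no intermediate steps -- this is what keeps latency at ~1-3s.
    return bridge.complete(prompt, max_tokens=256)
```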

Chain-of-Thought Mode (~30-60s)

For complex problems that require step-by-step reasoning, the LLM is prompted to (see the sketch after this list):

  1. Break the problem into steps
  2. Analyze each step with knowledge base context
  3. Synthesize a final answer
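
A sketch of that three-phase loop, again with assumed names (`kb.search` stands in for whatever knowledge-base lookup the project uses):

```python
def chain_of_thought(bridge, kb, query: str) -> str:
    """Decompose, analyze each step with knowledge-base context, then synthesize."""
    # 1. Break the problem into steps
    plan = bridge.complete(f"Break this problem into 3-5 numbered steps:\n{query}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Analyze each step with knowledge base context
    analyses = []
    for step in steps:
        notes = kb.search(step, top_k=3)  # hypothetical knowledge-base lookup
        analyses.append(
            bridge.complete(f"Step: {step}\nRelevant notes:\n{notes}\nAnalysis:")
        )

    # 3. Synthesize a final answer
    return bridge.complete(
        f"Question: {query}\n\nStep analyses:\n"
        + "\n\n".join(analyses)
        + "\n\nFinal answer:"
    )
```

Each step costs at least one extra LLM call, which is where the ~30-60s budget goes.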

Deep Mode (~60-120s)

Multi-perspective analysis with reflection. As sketched after this list, the system:

  1. Generates multiple analysis angles
  2. Evaluates each with knowledge base lookup
  3. Reflects on the analysis quality
  4. Synthesizes a comprehensive answer
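
In Python-level pseudocode, the four phases might be wired together like this; the angle count, prompts, and helper names are assumptions:

```python
def deep_think(bridge, kb, query: str, n_angles: int = 3) -> str:
    """Multi-perspective analysis with a reflection pass before synthesis."""
    # 1. Generate multiple analysis angles
    raw = bridge.complete(f"List {n_angles} distinct angles for analyzing:\n{query}")
    angles = [line.strip() for line in raw.splitlines() if line.strip()]

    # 2. Evaluate each angle with knowledge base lookup
    evaluations = []
    for angle in angles:
        notes = kb.search(angle, top_k=3)  # hypothetical knowledge-base lookup
        evaluations.append(
            bridge.complete(f"Angle: {angle}\nNotes:\n{notes}\nEvaluation:")
        )
    joined = "\n\n".join(evaluations)

    # 3. Reflect on the analysis quality
    reflection = bridge.complete(
        f"Critique these evaluations for gaps or contradictions:\n{joined}"
    )

    # 4. Synthesize a comprehensive answer
    return bridge.complete(
        f"Question: {query}\n\nEvaluations:\n{joined}\n\n"
        f"Reflection:\n{reflection}\n\nComprehensive answer:"
    )
```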

PromptCache

To reduce LLM calls on repeated queries, Tian AI uses a PromptCache (sketched after this list) with:

  • LRU eviction (max 1000 entries)
  • TTL expiry (5-30 minutes depending on mode)
  • Multi-level caching: fast-mode queries get a longer TTL
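
A compact way to get both behaviors is an ordered dict used as an LRU with a per-entry expiry timestamp. This sketch mirrors the stated limits (1000 entries, mode-dependent TTL) but is not the project's actual implementation:

```python
import time
from collections import OrderedDict


class PromptCache:
    """LRU cache with per-entry TTL; cheap fast-mode answers can be kept longer."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._store = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str):
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.time() > expires_at:      # TTL expiry
            del self._store[key]
            return None
        self._store.move_to_end(key)      # mark as recently used
        return value

    def put(self, key: str, value: str, ttl_seconds: int = 300):
        self._store[key] = (time.time() + ttl_seconds, value)
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used


# Example: fast-mode answers cached for 30 minutes, deep-mode answers for 5 minutes
cache = PromptCache()
cache.put("fast::what is rust", "Rust is a systems language...", ttl_seconds=1800)
cache.put("deep::compare gc strategies", "...", ttl_seconds=300)
```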

LLMBridge

The LLMBridge handles all communication with the llama.cpp server (a sketch follows the list):

  • 30-second health check timeout
  • 5 retry attempts with exponential backoff
  • Streaming response support
  • Automatic fallback to cached responses
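
A minimal version of such a bridge, assuming llama.cpp's standard /health and /completion endpoints; streaming and the cached-response fallback are omitted for brevity, and the timeouts and retry counts simply mirror the numbers above:

```python
import time
import requests


class LLMBridge:
    """Thin HTTP client for a llama.cpp server, with health checks and retries."""

    def __init__(self, base_url: str = "http://127.0.0.1:8080",
                 health_timeout: float = 30.0, max_retries: int = 5):
        self.base_url = base_url
        self.health_timeout = health_timeout
        self.max_retries = max_retries

    def is_healthy(self) -> bool:
        try:
            r = requests.get(f"{self.base_url}/health", timeout=self.health_timeout)
            return r.status_code == 200
        except requests.RequestException:
            return False

    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        payload = {"prompt": prompt, "n_predict": max_tokens}
        for attempt in range(self.max_retries):
            try:
                r = requests.post(f"{self.base_url}/completion",
                                  json=payload, timeout=120)
                r.raise_for_status()
                return r.json()["content"]
            except requests.RequestException:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
        raise RuntimeError("llama-server unreachable after retries")
```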

Key Insight

Small models (1.5B parameters) can punch above their weight when combined with a smart prompting architecture and a large local knowledge base. Tian AI's Thinker proves that local AI doesn't have to be dumb AI.


Published on 2026-04-25 21:19 UTC by Tian AI Dev Team
