# The Three-Layer Reasoning Architecture
Tian AI's Thinker module implements a three-layer reasoning engine that adapts to query complexity. Routing each query to the appropriate mode lets it balance response speed against reasoning depth instead of paying the full cost of deep reasoning on every turn.
## Layer 1: Fast Mode (Direct Response)
For simple queries like greetings or basic facts, Fast Mode generates direct responses using minimal context. The prompt engineering here is deliberately lightweight:
```text
System: You are a helpful AI assistant. Respond concisely.
Query: {user_input}
Response:
```
This mode achieves roughly 30 tokens/second on Qwen2.5-1.5B, making it well suited to real-time chat interactions.
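Assembling the Fast Mode prompt is just string formatting. A minimal sketch (the `build_fast_prompt` helper name is my own, not from Tian AI's codebase):

```python
def build_fast_prompt(user_input: str) -> str:
    """Fill the lightweight Fast Mode template with the user's query.

    Hypothetical helper: mirrors the template shown above.
    """
    return (
        "System: You are a helpful AI assistant. Respond concisely.\n"
        f"Query: {user_input}\n"
        "Response:"
    )

# The filled prompt is then passed straight to the model with no
# retrieval or reasoning scaffold, which is what keeps latency low.
prompt = build_fast_prompt("What is the capital of France?")
```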
## Layer 2: CoT Mode (Chain-of-Thought)
For multi-step reasoning problems, CoT Mode activates step-by-step thinking:
```text
System: You are a reasoning AI. Think step by step.
Query: {user_input}
Let me think through this carefully:
1. First, I need to understand...
```
The key trick is temperature control: we set temperature=0.3 for CoT to favor logical consistency while leaving room for some creative exploration.
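Pairing the CoT template with its sampling parameters might look like the sketch below. The `GenerationConfig` dataclass and the `max_new_tokens` value of 512 are assumptions for illustration; only `temperature=0.3` comes from the text:

```python
from dataclasses import dataclass


@dataclass
class GenerationConfig:
    """Sampling parameters passed to the model (hypothetical container)."""
    temperature: float
    max_new_tokens: int


def build_cot_prompt(user_input: str) -> str:
    """Fill the CoT template, including the step-by-step thinking prefix."""
    return (
        "System: You are a reasoning AI. Think step by step.\n"
        f"Query: {user_input}\n"
        "Let me think through this carefully:\n"
        "1. First, I need to understand..."
    )


# temperature=0.3 trades determinism for a little exploration, per the text;
# the 512-token cap is an assumed budget for multi-step answers.
COT_CONFIG = GenerationConfig(temperature=0.3, max_new_tokens=512)
```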
## Layer 3: Deep Mode (Context-Enhanced Reasoning)
The most powerful mode activates context-aware reasoning with retrieved knowledge:
```text
System: You are a knowledgeable AI with access to a personal knowledge base.
Context: {retrieved_entries}
Query: {user_input}
Based on the context and my knowledge:
```
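Deep Mode's only extra step is splicing the retrieved knowledge-base entries into the context slot. A minimal sketch, assuming entries arrive as a list of strings (the bullet-list formatting is my choice, not a documented detail):

```python
def build_deep_prompt(user_input: str, retrieved_entries: list[str]) -> str:
    """Fill the Deep Mode template with retrieved knowledge-base entries.

    Hypothetical helper: entries are joined as a bullet list so the
    small model can attribute facts to individual snippets.
    """
    context = "\n".join(f"- {entry}" for entry in retrieved_entries)
    return (
        "System: You are a knowledgeable AI with access to a "
        "personal knowledge base.\n"
        f"Context:\n{context}\n"
        f"Query: {user_input}\n"
        "Based on the context and my knowledge:"
    )
```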
## Prompt Engineering for Small Models
Making Qwen2.5-1.5B punch above its weight requires careful prompt engineering:
- Structured output formats: Always request JSON or numbered lists for complex responses
- Few-shot examples: Include 2-3 examples in the system prompt for new tasks
- Negative constraints: Explicitly tell the model what NOT to do ("Do not mention external tools you don't have")
- Token budget: Cap response length to match query complexity
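The four techniques above can be combined into a single system-prompt builder. This is a sketch under my own assumptions (function name, argument shapes, and phrasing are all illustrative, not Tian AI's actual code):

```python
def build_system_prompt(
    task: str,
    examples: list[tuple[str, str]],
    forbidden: list[str],
    token_budget: int,
) -> str:
    """Compose a system prompt for a small model (hypothetical helper).

    Applies the four techniques from the list above: a structured
    output format, 2-3 few-shot examples, explicit negative
    constraints, and a response-length cap.
    """
    parts = [f"Task: {task}", "Format your answer as a numbered list."]
    # Few-shot examples: keep at most three to stay within context limits.
    for question, answer in examples[:3]:
        parts.append(f"Example Q: {question}\nExample A: {answer}")
    # Negative constraints: spell out what the model must NOT do.
    for rule in forbidden:
        parts.append(f"Do not {rule}.")
    # Token budget: cap response length to match query complexity.
    parts.append(f"Keep your answer under {token_budget} tokens.")
    return "\n\n".join(parts)
```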
## Performance Results
| Mode | Latency | Quality Score | Use Case |
|---|---|---|---|
| Fast | 0.5-1s | 6/10 | Greetings, simple facts |
| CoT | 2-3s | 8/10 | Math, logic problems |
| Deep | 3-5s | 9/10 | Knowledge-based Q&A |
The Thinker module selects a layer per query from a lightweight complexity analysis, so each interaction pays only for the reasoning depth it actually needs.
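The routing step could be as simple as a few heuristics. The sketch below is my own guess at what such a router might look like, not Tian AI's actual selection logic; the keyword list and length threshold are placeholder assumptions:

```python
import re

# Placeholder cues suggesting multi-step reasoning; the real router's
# complexity analysis is not described in this level of detail.
_REASONING_CUES = re.compile(r"\d|why|how|prove|calculate|step")


def select_layer(query: str, has_kb_hit: bool) -> str:
    """Pick 'fast', 'cot', or 'deep' for a query (hypothetical router)."""
    if has_kb_hit:
        # Retrieved knowledge available: use context-enhanced Deep Mode.
        return "deep"
    if _REASONING_CUES.search(query.lower()):
        # Math/logic-flavored query: use Chain-of-Thought.
        return "cot"
    if len(query.split()) <= 6:
        # Short query, no reasoning cues: greeting or simple fact.
        return "fast"
    return "cot"
```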