
Jeffrey.Feillp


Tian AI Knowledge Base: Million Entries on Your Phone

How Tian AI Builds a Million-Entry Knowledge Base on Your Phone

Tian AI includes a large local knowledge base: millions of indexed concepts across 100+ domains, stored in a single SQLite file and searchable in roughly 0.04 to 0.1 seconds.

The Problem

Large language models like GPT-4 store knowledge in their weights, so smaller local models (around 1.5B parameters) can hold far less of it. The solution: augment the local LLM with an external knowledge base.

The Architecture

User Query → KnowledgeRetriever → Confidence > 0.8? → Direct Answer
                                      ↓ No
                               Inject context into LLM prompt
                                      ↓
                               LLM generates augmented response
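
The routing in the diagram can be sketched in a few lines of Python. This is a minimal sketch, not Tian AI's actual code: the `Hit` class, the `answer` function, the prompt layout, and the fallback when nothing is retrieved are all assumptions made for illustration. Only the 0.8 threshold comes from the diagram.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.8  # threshold shown in the diagram


@dataclass
class Hit:
    """One retrieval result (hypothetical shape, not from the article)."""
    text: str
    confidence: float


def answer(query: str,
           retrieve: Callable[[str], Optional[Hit]],
           generate: Callable[[str], str]) -> str:
    """Route a query: direct answer if confident, otherwise augment the LLM."""
    hit = retrieve(query)
    if hit is None:
        # Assumption: with no retrieval result, fall back to the bare LLM.
        return generate(query)
    if hit.confidence > CONFIDENCE_THRESHOLD:
        # High confidence: return the stored knowledge directly.
        return hit.text
    # Lower confidence: inject the retrieved text into the prompt as context.
    prompt = f"Context:\n{hit.text}\n\nQuestion: {query}"
    return generate(prompt)
```

Passing `retrieve` and `generate` as plain callables keeps the sketch independent of any particular retriever or model runtime.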

Database Schema

CREATE TABLE IF NOT EXISTS concepts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    concept TEXT NOT NULL,
    category TEXT,
    response_template TEXT,
    question_patterns TEXT
);

CREATE VIRTUAL TABLE IF NOT EXISTS concepts_fts USING fts5(
    concept, category, response_template, question_patterns
);

Each concept stores:

  • The concept name (e.g., "artificial intelligence")
  • Category (e.g., "technology")
  • Response template (the knowledge content)
  • 30 question patterns for flexible retrieval
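
To make the schema concrete, here is a small sketch that stores one concept and looks it up through the FTS5 table using Python's built-in `sqlite3` module. Several things here are assumptions rather than details from the article: the helper names `add_concept` and `search`, storing the question patterns as newline-separated text, linking the two tables through `rowid`, and the sample row itself. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS concepts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    concept TEXT NOT NULL,
    category TEXT,
    response_template TEXT,
    question_patterns TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS concepts_fts USING fts5(
    concept, category, response_template, question_patterns
);
"""

COLUMNS = "concept, category, response_template, question_patterns"


def add_concept(conn, concept, category, template, patterns):
    """Insert one concept and mirror it into the FTS5 table."""
    joined = "\n".join(patterns)  # assumption: one pattern per line
    row = (concept, category, template, joined)
    with conn:  # commit both inserts together
        cur = conn.execute(
            f"INSERT INTO concepts ({COLUMNS}) VALUES (?, ?, ?, ?)", row)
        # Assumption: reuse the concept id as the FTS rowid so rows can be joined.
        conn.execute(
            f"INSERT INTO concepts_fts (rowid, {COLUMNS}) VALUES (?, ?, ?, ?, ?)",
            (cur.lastrowid, *row))
    return cur.lastrowid


def search(conn, question, limit=3):
    """Return (concept, response_template) rows matching every word in the question."""
    # Quote each word so FTS5 treats it as a literal phrase, not query syntax.
    terms = ['"' + t.replace('"', '""') + '"' for t in question.split()]
    if not terms:
        return []
    return conn.execute(
        """SELECT c.concept, c.response_template
           FROM concepts_fts JOIN concepts AS c ON c.id = concepts_fts.rowid
           WHERE concepts_fts MATCH ?
           ORDER BY bm25(concepts_fts) LIMIT ?""",
        (" ".join(terms), limit)).fetchall()


conn = sqlite3.connect(":memory:")  # the article uses a single on-disk file
conn.executescript(SCHEMA)
add_concept(
    conn, "artificial intelligence", "technology",
    "Artificial intelligence is the study of systems that perform tasks "
    "normally requiring human intelligence.",  # made-up sample content
    ["what is artificial intelligence", "define artificial intelligence",
     "explain AI"])
print(search(conn, "What is artificial intelligence?"))
```

Because the patterns are indexed alongside the concept name, a question phrased like any stored pattern can match the row even when it does not repeat the concept name exactly.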

Batch Generation Strategy

Building millions of entries requires careful batch processing:

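  1. No explicit primary key on batch insert: rows are inserted without an id, so SQLite assigns one automatically and a plain INSERT cannot hit primary-key conflicts. INSERT OR IGNORE is not needed, and it would silently drop any conflicting rows.
  2. Chinese tokenization: single-character splitting (each Chinese character is a token) instead of the regex r'[\w]+', which in Python matches a whole run of Chinese characters as one token.
  3. Index after insert: build the FTS5 index only after all data is loaded.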
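
The three steps above might look roughly like this in Python. This is a sketch under stated assumptions, not the project's actual loader: `segment` and `load_concepts` are hypothetical names, the batch size is arbitrary, only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is split per character, and the `concepts` and `concepts_fts` tables from the schema are assumed to exist already. Splitting Chinese text into space-separated characters is one way to get single-character tokens out of FTS5's default tokenizer.

```python
import re
import sqlite3
from itertools import islice

# One CJK character per token, or a run of non-CJK word characters.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")

COLUMNS = "concept, category, response_template, question_patterns"


def segment(text):
    """Space-separate tokens so each Chinese character is indexed on its own."""
    if text is None:
        return None
    return " ".join(_TOKEN.findall(text))


def load_concepts(conn, rows, batch_size=10_000):
    """Bulk-load (concept, category, template, patterns) tuples, then build FTS."""
    rows = iter(rows)
    insert_sql = f"INSERT INTO concepts ({COLUMNS}) VALUES (?, ?, ?, ?)"
    # Step 1: plain INSERT without an id; SQLite assigns the primary key.
    while batch := list(islice(rows, batch_size)):
        with conn:  # one transaction per batch
            conn.executemany(insert_sql, batch)
    # Step 2: expose the tokenizer to SQL so it can run during the copy.
    conn.create_function("segment", 1, segment)
    # Step 3: fill the FTS5 table once, after all rows are loaded.
    with conn:
        conn.execute(
            f"""INSERT INTO concepts_fts (rowid, {COLUMNS})
                SELECT id, segment(concept), category,
                       segment(response_template), segment(question_patterns)
                FROM concepts""")
```

For comparison, `re.findall(r"[\w]+", "人工智能 AI")` returns `['人工智能', 'AI']`, while `segment("人工智能 AI")` returns `'人 工 智 能 AI'`, so a query for a single character such as 智 can match.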

Retrieval Performance

Metric                          Value
Query time                      0.04-0.1 s
Database size                   ~34 GB (indexed)
Concepts                        Millions
Domains                         100+
Question patterns per concept   30
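
Query times like these depend heavily on the device and on caching, so it helps to measure them locally. The helper below is a hypothetical sketch (`time_query` is not part of Tian AI) that reports the median wall-clock time of a query over several runs.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), repeats=20):
    """Return the median time in seconds to run `sql` and fetch all rows."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

A call such as `time_query(conn, "SELECT rowid FROM concepts_fts WHERE concepts_fts MATCH ?", ("intelligence",))` would time a full-text lookup against an open connection to the knowledge base.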

The Result

Even without a cloud connection, Tian AI can answer questions about science, technology, history, medicine, finance, and more — drawing from its local knowledge base rather than relying on model parameters.


Published on 2026-04-25 21:19 UTC by Tian AI Dev Team
