# Tian AI PromptCache: LRU + TTL Strategy for Local LLMs
LLM inference is expensive, in both time and battery. Tian AI's PromptCache cuts out redundant inference calls by answering repeated queries from a local cache.
## The Strategy
Tian AI uses a dual eviction strategy: LRU (Least Recently Used) + TTL (Time To Live).
### LRU Eviction
- Maximum cache size: 1000 entries
- When the cache is full, the least recently used entry is evicted (not necessarily the oldest by insertion time)
- Frequently accessed entries stay near the front of the recency order, so they survive eviction — though TTL expiry still applies to them
### TTL Expiry
- Fast mode queries: 30-minute TTL
- CoT mode queries: 15-minute TTL
- Deep mode queries: 5-minute TTL
- Knowledge base lookups: 60-minute TTL
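The per-mode TTLs above map naturally onto a small lookup table. A sketch (the `MODE_TTLS` name, lowercase mode keys, and the 30-minute fallback for unknown modes are my assumptions, not confirmed details of Tian AI):

```python
# Per-mode TTLs in seconds, matching the values listed above.
MODE_TTLS = {
    "fast": 30 * 60,       # Fast mode: 30 minutes
    "cot": 15 * 60,        # CoT mode: 15 minutes
    "deep": 5 * 60,        # Deep mode: 5 minutes
    "knowledge": 60 * 60,  # Knowledge base lookups: 60 minutes
}

def ttl_for(mode: str, default: int = 1800) -> int:
    # Unknown modes fall back to a 30-minute default (an assumption).
    return MODE_TTLS.get(mode.lower(), default)

print(ttl_for("deep"))  # 300
```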
## Cache Key Design
```python
import hashlib

def cache_key(query, mode, knowledge_context):
    # Only the first 200 characters of the knowledge context
    # contribute to the key.
    text = f"{mode}:{query}:{knowledge_context[:200]}"
    return hashlib.md5(text.encode()).hexdigest()
```
The key includes:
- The reasoning mode (Fast/CoT/Deep)
- The raw query text
- A prefix of the knowledge context
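One consequence of including the mode in the key: the same question asked in two different reasoning modes never shares a cached answer. A quick demonstration (the key function is repeated here so the snippet stands alone; the context string is illustrative):

```python
import hashlib

def cache_key(query, mode, knowledge_context):
    # Mode + query + context prefix together identify a cache entry.
    text = f"{mode}:{query}:{knowledge_context[:200]}"
    return hashlib.md5(text.encode()).hexdigest()

ctx = "Tian AI is a fully offline assistant."
k_fast = cache_key("What's Tian AI?", "fast", ctx)
k_deep = cache_key("What's Tian AI?", "deep", ctx)
print(k_fast == k_deep)  # False: different modes, different entries
```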
## Performance Impact
| Scenario | Without Cache | With Cache |
|---|---|---|
| Repeated query (10x) | 30-60 seconds | 0.1 seconds |
| Similar queries | Full LLM call | Partial match |
| User asks "What's Tian AI?" twice | Full 1.5B-model inference | Instant reply |
## Implementation Details
```python
import time
from collections import OrderedDict

class PromptCache:
    def __init__(self, max_size=1000, default_ttl=1800):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.cache = OrderedDict()  # insertion order doubles as recency order
        self.expiry = {}            # key -> absolute expiry timestamp

    def get(self, key):
        if key in self.cache:
            if time.time() < self.expiry[key]:
                self.cache.move_to_end(key)  # mark as most recently used
                return self.cache[key]
            # Expired: drop both the entry and its TTL record.
            del self.cache[key]
            del self.expiry[key]
        return None
```
The cache uses OrderedDict for O(1) LRU operations and a separate expiry dict for TTL tracking. This design was chosen over Redis or memcached because Tian AI runs entirely offline with no external dependencies.
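The excerpt above shows only the read path. A minimal self-contained sketch of how the write path and LRU eviction could fit together (the `put` method name, its signature, and the tiny sizes/TTLs used for the demo are my assumptions):

```python
import time
from collections import OrderedDict

class TinyCache:
    """Miniature LRU + TTL cache for demonstration only."""
    def __init__(self, max_size=2, default_ttl=0.2):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.cache = OrderedDict()
        self.expiry = {}

    def put(self, key, value, ttl=None):
        if key in self.cache:
            self.cache.move_to_end(key)
        elif len(self.cache) >= self.max_size:
            evicted, _ = self.cache.popitem(last=False)  # drop LRU entry
            del self.expiry[evicted]
        self.cache[key] = value
        self.expiry[key] = time.time() + (ttl or self.default_ttl)

    def get(self, key):
        if key in self.cache:
            if time.time() < self.expiry[key]:
                self.cache.move_to_end(key)
                return self.cache[key]
            del self.cache[key]
            del self.expiry[key]
        return None

c = TinyCache(max_size=2, default_ttl=0.2)
c.put("a", 1)
c.put("b", 2)
c.get("a")          # touch "a" so it becomes most recently used
c.put("c", 3)       # cache full: "b" (least recently used) is evicted
print(c.get("b"))   # None -> evicted by LRU
print(c.get("a"))   # 1   -> still cached
time.sleep(0.25)
print(c.get("a"))   # None -> expired by TTL
```

Note that `get` on an expired key cleans up both dicts, so stale entries never linger once they are touched again.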