# Tian AI PromptCache: LRU + TTL Strategy for Local LLMs
LLM inference is expensive, in both time and battery. Tian AI's PromptCache cuts out redundant inference calls by answering repeated queries from a local cache.
## The Strategy
Tian AI uses a dual eviction strategy: LRU (Least Recently Used) + TTL (Time To Live).
### LRU Eviction
- Maximum cache size: 1000 entries
- When the cache is full, the least recently used entry is evicted (not necessarily the oldest by insertion time)
- Frequently accessed entries stay near the front of the recency order, so they survive eviction — though TTL expiry still applies to them
### TTL Expiry
- Fast mode queries: 30-minute TTL
- CoT mode queries: 15-minute TTL
- Deep mode queries: 5-minute TTL
- Knowledge base lookups: 60-minute TTL
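The per-mode TTLs above map naturally onto a small lookup table. A sketch (the `MODE_TTLS` name, lowercase mode keys, and the 30-minute fallback for unknown modes are my assumptions, not confirmed details of Tian AI):

```python
# Per-mode TTLs in seconds, matching the values listed above.
MODE_TTLS = {
    "fast": 30 * 60,       # Fast mode: 30 minutes
    "cot": 15 * 60,        # CoT mode: 15 minutes
    "deep": 5 * 60,        # Deep mode: 5 minutes
    "knowledge": 60 * 60,  # Knowledge base lookups: 60 minutes
}

def ttl_for(mode: str, default: int = 1800) -> int:
    # Unknown modes fall back to a 30-minute default (an assumption).
    return MODE_TTLS.get(mode.lower(), default)

print(ttl_for("deep"))  # 300
```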
## Cache Key Design
```python
import hashlib

def cache_key(query, mode, knowledge_context):
    # Only the first 200 characters of the knowledge context
    # contribute to the key.
    text = f"{mode}:{query}:{knowledge_context[:200]}"
    return hashlib.md5(text.encode()).hexdigest()
```
The key includes:
- The reasoning mode (Fast/CoT/Deep)
- The raw query text
- A prefix of the knowledge context
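One consequence of including the mode in the key: the same question asked in two different reasoning modes never shares a cached answer. A quick demonstration (the key function is repeated here so the snippet stands alone; the context string is illustrative):

```python
import hashlib

def cache_key(query, mode, knowledge_context):
    # Mode + query + context prefix together identify a cache entry.
    text = f"{mode}:{query}:{knowledge_context[:200]}"
    return hashlib.md5(text.encode()).hexdigest()

ctx = "Tian AI is a fully offline assistant."
k_fast = cache_key("What's Tian AI?", "fast", ctx)
k_deep = cache_key("What's Tian AI?", "deep", ctx)
print(k_fast == k_deep)  # False: different modes, different entries
```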
## Performance Impact
| Scenario | Without Cache | With Cache |
|---|---|---|
| Repeated query (10x) | 30-60 seconds | 0.1 seconds |
| Similar queries | Full LLM call | Partial match |
| User asks "What's Tian AI?" twice | Full 1.5B-model inference | Instant reply |
## Implementation Details
```python
import time
from collections import OrderedDict

class PromptCache:
    def __init__(self, max_size=1000, default_ttl=1800):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.cache = OrderedDict()  # insertion order doubles as recency order
        self.expiry = {}            # key -> absolute expiry timestamp

    def get(self, key):
        if key in self.cache:
            if time.time() < self.expiry[key]:
                self.cache.move_to_end(key)  # mark as most recently used
                return self.cache[key]
            # Expired: drop both the entry and its TTL record.
            del self.cache[key]
            del self.expiry[key]
        return None
```
The cache uses OrderedDict for O(1) LRU operations and a separate expiry dict for TTL tracking. This design was chosen over Redis or memcached because Tian AI runs entirely offline with no external dependencies.
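The excerpt above shows only the read path. A minimal self-contained sketch of how the write path and LRU eviction could fit together (the `put` method name, its signature, and the tiny sizes/TTLs used for the demo are my assumptions):

```python
import time
from collections import OrderedDict

class TinyCache:
    """Miniature LRU + TTL cache for demonstration only."""
    def __init__(self, max_size=2, default_ttl=0.2):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.cache = OrderedDict()
        self.expiry = {}

    def put(self, key, value, ttl=None):
        if key in self.cache:
            self.cache.move_to_end(key)
        elif len(self.cache) >= self.max_size:
            evicted, _ = self.cache.popitem(last=False)  # drop LRU entry
            del self.expiry[evicted]
        self.cache[key] = value
        self.expiry[key] = time.time() + (ttl or self.default_ttl)

    def get(self, key):
        if key in self.cache:
            if time.time() < self.expiry[key]:
                self.cache.move_to_end(key)
                return self.cache[key]
            del self.cache[key]
            del self.expiry[key]
        return None

c = TinyCache(max_size=2, default_ttl=0.2)
c.put("a", 1)
c.put("b", 2)
c.get("a")          # touch "a" so it becomes most recently used
c.put("c", 3)       # cache full: "b" (least recently used) is evicted
print(c.get("b"))   # None -> evicted by LRU
print(c.get("a"))   # 1   -> still cached
time.sleep(0.25)
print(c.get("a"))   # None -> expired by TTL
```

Note that `get` on an expired key cleans up both dicts, so stale entries never linger once they are touched again.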