DEV Community

Jeffrey.Feillp

Tian AI PromptCache: LRU+TTL Strategy for Local LLMs


LLM inference is expensive — both in time and battery. Tian AI's PromptCache dramatically reduces unnecessary calls.

The Strategy

Tian AI uses a dual eviction strategy: LRU (Least Recently Used) + TTL (Time To Live).

LRU Eviction

  • Maximum cache size: 1000 entries
  • When full, the least recently used entry is evicted
  • Frequently accessed entries stay resident until their TTL expires

TTL Expiry

  • Fast mode queries: 30 minute TTL
  • CoT mode queries: 15 minute TTL
  • Deep mode queries: 5 minute TTL
  • Knowledge base lookups: 60 minute TTL
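The per-mode TTLs above can be captured in a small lookup table. This is an illustrative sketch, not Tian AI's actual code: the names `MODE_TTL_SECONDS` and `ttl_for_mode` are assumptions, and the 1800-second fallback matches the `default_ttl` used later in the article.

```python
# Hypothetical per-mode TTL table (names assumed; values from the article)
MODE_TTL_SECONDS = {
    "fast": 30 * 60,       # Fast mode queries: 30 minutes
    "cot": 15 * 60,        # CoT mode queries: 15 minutes
    "deep": 5 * 60,        # Deep mode queries: 5 minutes
    "knowledge": 60 * 60,  # Knowledge base lookups: 60 minutes
}

def ttl_for_mode(mode: str, default: int = 1800) -> int:
    """Return the TTL in seconds for a reasoning mode, with a safe fallback."""
    return MODE_TTL_SECONDS.get(mode, default)
```

The ordering reflects the intuition that deeper reasoning modes produce answers that are more context-sensitive, so their cached results should go stale sooner.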

Cache Key Design

import hashlib

def cache_key(query, mode, knowledge_context):
    # Mode, query, and a 200-character context prefix identify a prompt
    text = f"{mode}:{query}:{knowledge_context[:200]}"
    return hashlib.md5(text.encode()).hexdigest()

The key includes:

  • The reasoning mode (Fast/CoT/Deep)
  • The raw query text
  • A prefix of the knowledge context
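Because only the first 200 characters of the knowledge context feed into the hash, two contexts that diverge past that prefix map to the same key, while changing the reasoning mode always produces a distinct key. A quick check of both properties (the `cache_key` function is repeated from above so the snippet runs standalone):

```python
import hashlib

def cache_key(query, mode, knowledge_context):
    text = f"{mode}:{query}:{knowledge_context[:200]}"
    return hashlib.md5(text.encode()).hexdigest()

base = "x" * 200  # fills the entire 200-character prefix

# Contexts identical in their first 200 characters collide onto one key...
k1 = cache_key("What's Tian AI?", "fast", base + "tail A")
k2 = cache_key("What's Tian AI?", "fast", base + "tail B")

# ...but a different reasoning mode yields a different key.
k3 = cache_key("What's Tian AI?", "deep", base + "tail A")
```

The prefix truncation is a deliberate trade-off: it keeps key computation cheap and lets queries with near-identical context hit the same cache entry.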

Performance Impact

Scenario                            Without Cache     With Cache
Repeated query (10x)                30-60 seconds     0.1 seconds
Similar queries                     Full LLM call     Partial match
User asks "What's Tian AI?" twice   1.5B inference    Instant reply

Implementation Details

import time
from collections import OrderedDict

class PromptCache:
    def __init__(self, max_size=1000, default_ttl=1800):
        self.max_size = max_size          # LRU capacity (1000 entries)
        self.default_ttl = default_ttl    # fallback TTL in seconds
        self.cache = OrderedDict()        # preserves access order for LRU
        self.expiry = {}                  # key -> absolute expiry timestamp

    def get(self, key):
        if key in self.cache:
            if time.time() < self.expiry[key]:
                self.cache.move_to_end(key)  # refresh LRU position
                return self.cache[key]
            # TTL expired: drop both the value and its expiry record
            del self.cache[key]
            del self.expiry[key]
        return None

The cache uses OrderedDict for O(1) LRU operations and a separate expiry dict for TTL tracking. This design was chosen over Redis or memcached because Tian AI runs entirely offline with no external dependencies.
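The snippet above shows only the read path. A write path that enforces the 1000-entry LRU limit could look like the following sketch; the `put` method and its signature are assumptions for illustration, not Tian AI's actual API, and the class is repeated so the example runs standalone.

```python
import time
from collections import OrderedDict

class PromptCache:
    def __init__(self, max_size=1000, default_ttl=1800):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.cache = OrderedDict()
        self.expiry = {}

    def get(self, key):
        if key in self.cache:
            if time.time() < self.expiry[key]:
                self.cache.move_to_end(key)
                return self.cache[key]
            del self.cache[key]
            del self.expiry[key]
        return None

    def put(self, key, value, ttl=None):
        """Store a value, evicting the least recently used entry when full."""
        if key in self.cache:
            self.cache.move_to_end(key)
        elif len(self.cache) >= self.max_size:
            lru_key, _ = self.cache.popitem(last=False)  # LRU eviction
            self.expiry.pop(lru_key, None)               # keep dicts in sync
        self.cache[key] = value
        self.expiry[key] = time.time() + (ttl if ttl is not None else self.default_ttl)
```

Note that eviction must remove the key from both dicts; otherwise the `expiry` dict grows without bound even though the LRU cache itself is capped.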
