Smarter AI Agents with Caching: Building cache_utiles

When building AI agents or LLM-powered systems, one of the biggest performance challenges is managing expensive computations: model loading, embeddings, and repeated API calls.

Each time your agent recomputes something it could’ve cached, you lose both time and tokens.

That’s why I built cache_utiles, a lightweight Python toolkit that brings LRU caching, memory control, and result persistence to AI workflows.


Overview

This module includes three key utilities that make AI pipelines faster and more efficient:

  • MemoryCache: a generic LRU cache with a memory limit (in MB). Ideal for large LLM responses or embeddings.
  • ModelLRUStore: keeps a limited number of loaded model instances in memory, avoiding repeated reloading (see the sketch below).
  • ResultCache: caches computed results with TTL and LRU eviction. Perfect for embeddings, API responses, etc.
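To make the ModelLRUStore idea concrete, here is a minimal, generic sketch of the pattern it implements: keep at most N loaded models in memory and evict the least recently used one when a new model arrives. This is an illustration of the technique only, not the library's actual code, and the names below (TinyModelStore, get_or_load) are hypothetical.

from collections import OrderedDict

class TinyModelStore:
    """Toy illustration of the model-store pattern (not cache_utils code)."""

    def __init__(self, max_models=2):
        self.max_models = max_models
        self._models = OrderedDict()  # model name -> loaded instance

    def get_or_load(self, name, loader):
        if name in self._models:
            self._models.move_to_end(name)    # mark as most recently used
            return self._models[name]
        model = loader()                      # expensive load runs only on a miss
        self._models[name] = model
        if len(self._models) > self.max_models:
            self._models.popitem(last=False)  # evict the least recently used model
        return model

# Usage (loader is whatever actually loads your model):
# store = TinyModelStore(max_models=2)
# embedder = store.get_or_load("mini-embedder", loader=lambda: load_embedder())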

1. MemoryCache: Control Memory Like a Pro

A flexible, thread-safe cache that automatically removes the least recently used items once the memory cap is reached.

from cache_utils import MemoryCache

# Create a cache with a 100MB limit
cache = MemoryCache(max_size_mb=100)

# Store responses
cache.set("response:123", {"text": "Hello, world!"})

# Retrieve them later
print(cache.get("response:123"))  
# {'text': 'Hello, world!'}

# View cache stats
print(cache.stats())
# {'entries': 1, 'used_MB': 0.01, 'max_MB': 100.0}



Perfect for LLM results, embeddings, or any intermediate computation your agents reuse often.


Why It Matters

These caching layers make a real difference in AI and LLM systems:

  • Reduce API & compute costs
  • Speed up inference and responses
  • Prevent memory leaks
  • Thread-safe for concurrent agents
  • Easy to integrate with LangChain, LlamaIndex, or any custom framework (see the wrapper sketch below)
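
As an example of that last point, here is a small wrapper around an LLM call that reuses MemoryCache from the example above. The call_llm function is a placeholder for whatever client or chain you use, and I'm assuming get returns None on a cache miss:

from cache_utils import MemoryCache

cache = MemoryCache(max_size_mb=100)

def call_llm(prompt: str) -> str:
    # Placeholder for your real client (OpenAI, a LangChain chain, etc.)
    return f"echo: {prompt}"

def cached_completion(prompt: str) -> str:
    key = f"prompt:{prompt}"
    hit = cache.get(key)
    if hit is not None:            # cache hit: no extra tokens or latency
        return hit
    answer = call_llm(prompt)      # cache miss: pay for the call once
    cache.set(key, answer)
    return answer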

Tech Highlights

  • Pure Python, no heavy dependencies
  • Optional: pympler for advanced memory tracking
  • LRU + TTL eviction strategies (illustrated below)
  • MIT licensed and open source
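
If LRU + TTL eviction is new to you, here is a tiny, generic sketch of what it means in practice. It is not the ResultCache implementation, just the idea: entries expire after a time-to-live, and when the cache is full the least recently used entry is dropped.

import time
from collections import OrderedDict

class TinyTTLCache:
    """Toy LRU + TTL cache illustrating the eviction strategy (not cache_utils code)."""

    def __init__(self, max_entries=128, ttl_seconds=60.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._data = OrderedDict()  # key -> (value, stored_at)

    def set(self, key, value):
        self._data[key] = (value, time.monotonic())
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # LRU eviction: drop the least recently used entry

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, stored_at = item
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]              # TTL eviction: the entry has expired
            return default
        self._data.move_to_end(key)          # refresh recency on every hit
        return value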

🔗 Try It Yourself

👉 Full project: CodingStation/cache_utiles

pip install pympler # Optional dependency

Start caching smarter today!

