In AI applications, performance is not just about speed; it is also about intelligent resource management. Litho's caching system works like a "smart memory palace": it remembers everything important and forgets what is no longer needed.
Project open-source repository: https://github.com/sopaco/deepwiki-rs
Introduction: The Art of AI Application Performance Optimization
Imagine an AI documentation generation system that needs to analyze thousands of code files. Without caching, each analysis would require re-processing the same files, re-calling expensive AI APIs, and re-generating the same documentation structures. The cost would be prohibitive, and the performance unacceptable.
Litho solves this problem with a multi-level, intelligent caching strategy that balances performance, cost, and freshness requirements.
Chapter 1: Multi-Level Cache Architecture Design
1.1 Cache Hierarchy: From Memory to Disk
Litho implements a sophisticated multi-level cache architecture. As the implementation below shows, a lookup falls through four levels:
- L1: in-memory cache, the fastest and smallest level
- L2: local disk cache
- L3: persistent cache
- L4: optional external cache
1.2 Cache Level Implementation
// Multi-level cache manager
// Assumed imports for this snippet; CacheKey and the individual level types
// (L1MemoryCache, L2DiskCache, L3PersistentCache, L4ExternalCache) are Litho-internal.
use std::sync::Arc;
use std::time::Duration;
use anyhow::Result;
use serde::{de::DeserializeOwned, Serialize};
pub struct MultiLevelCache {
l1_cache: Arc<L1MemoryCache>,
l2_cache: Arc<L2DiskCache>,
l3_cache: Arc<L3PersistentCache>,
l4_cache: Option<Arc<L4ExternalCache>>,
}
impl MultiLevelCache {
// T also needs Serialize so that hits can be promoted into faster levels below
pub async fn get<T: Serialize + DeserializeOwned + Send + Sync>(
&self,
key: &CacheKey
) -> Result<Option<T>> {
// Try L1 cache first (fastest)
if let Some(value) = self.l1_cache.get(key).await? {
return Ok(Some(value));
}
// Try L2 cache
if let Some(value) = self.l2_cache.get(key).await? {
// Store in L1 cache for future fast access
self.l1_cache.set(key, &value, None).await?;
return Ok(Some(value));
}
// Try L3 cache
if let Some(value) = self.l3_cache.get(key).await? {
// Store in L2 and L1 caches
self.l2_cache.set(key, &value, None).await?;
self.l1_cache.set(key, &value, None).await?;
return Ok(Some(value));
}
// Try L4 cache if available
if let Some(l4_cache) = &self.l4_cache {
if let Some(value) = l4_cache.get(key).await? {
// Populate all lower level caches
self.l3_cache.set(key, &value, None).await?;
self.l2_cache.set(key, &value, None).await?;
self.l1_cache.set(key, &value, None).await?;
return Ok(Some(value));
}
}
Ok(None)
}
pub async fn set<T: Serialize + Send + Sync>(
&self,
key: &CacheKey,
value: &T,
ttl: Option<Duration>
) -> Result<()> {
// Set in all cache levels
if let Some(l4_cache) = &self.l4_cache {
l4_cache.set(key, value, ttl).await?;
}
self.l3_cache.set(key, value, ttl).await?;
self.l2_cache.set(key, value, ttl).await?;
self.l1_cache.set(key, value, ttl).await?;
Ok(())
}
}
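To see how the fall-through and write-back behavior is used in practice, here is a minimal read-through sketch. ModuleSummary, analyze_module, the one-hour TTL, and the way the key is built are illustrative assumptions; Chapter 2 covers Litho's actual content-based key generation.
// Hypothetical usage sketch: read-through access with write-back on a miss.
// ModuleSummary and analyze_module are placeholders, not Litho types.
async fn summarize_module(
    cache: &MultiLevelCache,
    module_path: &str,
    module_source: &str,
) -> Result<ModuleSummary> {
    // A plain content hash stands in for the real key-generation logic here.
    let key = CacheKey::from(blake3::hash(module_source.as_bytes()).as_bytes());

    // Fast path: whichever level holds the value answers, and faster levels get warmed.
    if let Some(summary) = cache.get::<ModuleSummary>(&key).await? {
        return Ok(summary);
    }

    // Slow path: compute once, then write the result through every level.
    let summary = analyze_module(module_path, module_source).await?;
    cache.set(&key, &summary, Some(Duration::from_secs(3600))).await?;
    Ok(summary)
}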
Chapter 2: Intelligent Cache Invalidation Strategies
2.1 Content-Based Cache Keys
Litho uses content-based cache keys to ensure cache validity:
// Content-based cache key generation
// (the hasher is assumed to be blake3::Hasher from the blake3 crate)
pub struct CacheKeyGenerator {
hasher: blake3::Hasher,
}
impl CacheKeyGenerator {
pub fn generate_key(&self, content: &CacheableContent) -> CacheKey {
let mut hasher = self.hasher.clone();
// Include content hash
hasher.update(&content.content_hash());
// Include content metadata
hasher.update(&content.metadata_hash());
// Include analysis parameters
hasher.update(&content.parameters_hash());
// Include model version (for AI responses)
if let Some(model_version) = content.model_version() {
hasher.update(model_version.as_bytes());
}
CacheKey::from(hasher.finalize().as_bytes())
}
}
// Cacheable content trait
pub trait CacheableContent {
fn content_hash(&self) -> [u8; 32];
fn metadata_hash(&self) -> [u8; 32];
fn parameters_hash(&self) -> [u8; 32];
fn model_version(&self) -> Option<String>;
fn cache_ttl(&self) -> Duration;
}
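To make the trait concrete, here is a minimal example implementation for a hypothetical source-file analysis job. The SourceFileAnalysis type, its fields, and the 24-hour TTL are illustrative assumptions rather than Litho's actual types.
// Hypothetical example: a cacheable source-file analysis job.
use std::time::Duration;

pub struct SourceFileAnalysis {
    pub file_bytes: Vec<u8>,        // raw file content
    pub language: String,           // e.g. "rust"
    pub prompt_template_id: String, // analysis parameters
    pub model: Option<String>,      // AI model used, if any
}

impl CacheableContent for SourceFileAnalysis {
    fn content_hash(&self) -> [u8; 32] {
        *blake3::hash(&self.file_bytes).as_bytes()
    }
    fn metadata_hash(&self) -> [u8; 32] {
        *blake3::hash(self.language.as_bytes()).as_bytes()
    }
    fn parameters_hash(&self) -> [u8; 32] {
        *blake3::hash(self.prompt_template_id.as_bytes()).as_bytes()
    }
    fn model_version(&self) -> Option<String> {
        self.model.clone()
    }
    fn cache_ttl(&self) -> Duration {
        Duration::from_secs(86_400) // 24 hours, purely illustrative
    }
}
Because every one of these inputs feeds into the key, a change to the file content, the analysis parameters, or the model version produces a different key, so stale entries are simply never looked up again.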
2.2 Event-Driven Cache Invalidation
Litho implements an event-driven cache invalidation system:
// Cache invalidation manager
pub struct CacheInvalidationManager {
event_bus: EventBus,
cache: Arc<MultiLevelCache>,
}
impl CacheInvalidationManager {
pub async fn start_invalidation_listener(&self) -> Result<()> {
let mut receiver = self.event_bus.subscribe().await;
while let Ok(event) = receiver.recv().await {
match event {
CacheEvent::FileModified { path, old_hash, new_hash } => {
self.invalidate_file_cache(&path, old_hash, new_hash).await?;
}
CacheEvent::CodeStructureChanged { module_path, changes } => {
self.invalidate_structure_cache(&module_path, &changes).await?;
}
CacheEvent::AIModelUpdated { model_name, new_version } => {
self.invalidate_ai_responses(model_name, new_version).await?;
}
CacheEvent::ConfigurationChanged { new_config } => {
self.invalidate_config_dependent_caches(&new_config).await?;
}
}
}
Ok(())
}
async fn invalidate_file_cache(
&self,
path: &Path,
old_hash: [u8; 32],
new_hash: [u8; 32]
) -> Result<()> {
// Generate cache keys for old content
let old_key = self.generate_file_key(path, &old_hash);
// Invalidate cache entries for old content
self.cache.invalidate(&old_key).await?;
// If file was deleted, also clean up related metadata
if new_hash == [0; 32] { // Special hash for deleted files
self.cleanup_file_metadata(path).await?;
}
info!("Invalidated cache for modified file: {}", path.display());
Ok(())
}
}
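The listener above only reacts to events; somewhere a watcher has to publish them. The sketch below shows one plausible producer side. The publish method and the notify_file_change helper are assumptions for illustration, not Litho's actual API.
// Hypothetical producer side: publish a FileModified event when a watched file changes.
use std::path::Path;

async fn notify_file_change(
    event_bus: &EventBus,
    path: &Path,
    old_hash: [u8; 32],
    new_bytes: &[u8],
) -> Result<()> {
    let new_hash = *blake3::hash(new_bytes).as_bytes();
    // Only publish when the content actually changed, to avoid needless invalidation.
    if new_hash != old_hash {
        event_bus
            .publish(CacheEvent::FileModified {
                path: path.to_path_buf(),
                old_hash,
                new_hash,
            })
            .await?;
    }
    Ok(())
}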
Chapter 3: Cost-Aware Caching Strategies
3.1 AI API Call Cost Optimization
Litho implements cost-aware caching specifically for expensive AI API calls:
// Cost-aware cache manager for AI responses
pub struct CostAwareAICache {
cache: Arc<MultiLevelCache>,
cost_tracker: Arc<CostTracker>,
budget_manager: Arc<BudgetManager>,
}
impl CostAwareAICache {
pub async fn get_ai_response(
&self,
prompt: &AIPrompt,
model: &str
) -> Result<Option<AIResponse>> {
let cache_key = self.generate_ai_cache_key(prompt, model);
// Check cache first
if let Some(response) = self.cache.get::<AIResponse>(&cache_key).await? {
// Update cost statistics (cache hit saves money)
self.cost_tracker.record_cache_hit(model, prompt.token_count()).await;
return Ok(Some(response));
}
// Calculate expected cost of API call
let expected_cost = self.cost_tracker.estimate_cost(model, prompt.token_count()).await;
// Check if within budget
if !self.budget_manager.can_spend(expected_cost).await? {
return Err(anyhow!("Budget exceeded for AI API calls"));
}
// Make actual API call
let response = self.make_ai_api_call(prompt, model).await?;
// Record actual cost
self.cost_tracker.record_api_call(model, response.token_count(), response.cost()).await;
// Cache the response with appropriate TTL
let ttl = self.calculate_ai_response_ttl(&response);
self.cache.set(&cache_key, &response, Some(ttl)).await?;
Ok(Some(response))
}
fn calculate_ai_response_ttl(&self, response: &AIResponse) -> Duration {
// Base TTL on response characteristics
match response.volatility() {
ResponseVolatility::High => Duration::from_secs(300), // 5 minutes
ResponseVolatility::Medium => Duration::from_secs(3600), // 1 hour
ResponseVolatility::Low => Duration::from_secs(86400), // 24 hours
ResponseVolatility::Static => Duration::from_secs(604800), // 1 week
}
}
}
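In use, every documentation prompt can be routed through this cache instead of calling the model directly. The sketch below assumes an AIPrompt::new constructor and the model name "gpt-4o-mini"; both are placeholders for illustration.
// Hypothetical usage sketch: route a documentation prompt through the cost-aware cache.
async fn describe_module(ai_cache: &CostAwareAICache, source: &str) -> Result<AIResponse> {
    let prompt = AIPrompt::new(format!("Summarize this module:\n{source}"));
    // A hit returns immediately and records the avoided spend; a miss checks the
    // budget, pays for exactly one API call, and caches the result for next time.
    ai_cache
        .get_ai_response(&prompt, "gpt-4o-mini")
        .await?
        .ok_or_else(|| anyhow!("no response returned"))
}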
3.2 Adaptive Cache Sizing
Litho dynamically adjusts cache size based on available resources:
// Adaptive cache size manager
pub struct AdaptiveCacheManager {
memory_monitor: Arc<MemoryMonitor>,
cache: Arc<MultiLevelCache>,
config: AdaptiveCacheConfig,
}
impl AdaptiveCacheManager {
pub async fn start_adaptive_management(&self) -> Result<()> {
loop {
// Monitor system memory usage
let memory_info = self.memory_monitor.get_memory_info().await?;
// Adjust cache size based on available memory
self.adjust_cache_size(&memory_info).await?;
// Clean up expired entries
self.cleanup_expired_entries().await?;
// Wait before next adjustment
tokio::time::sleep(self.config.adjustment_interval).await;
}
}
async fn adjust_cache_size(&self, memory_info: &MemoryInfo) -> Result<()> {
// Memory figures are in bytes; cast to f64 so the fractional thresholds compile.
let available_memory = memory_info.available_memory() as f64;
let total_memory = memory_info.total_memory() as f64;
// Calculate target cache size based on available memory
let target_cache_size = if available_memory > total_memory * 0.3 {
// Plenty of memory available - use aggressive caching
(available_memory * 0.4) as u64 // Use 40% of available memory
} else if available_memory > total_memory * 0.1 {
// Moderate memory available - use conservative caching
(available_memory * 0.2) as u64 // Use 20% of available memory
} else {
// Low memory available - use minimal caching
(available_memory * 0.1) as u64 // Use 10% of available memory
};
// Resize cache
self.cache.resize(target_cache_size).await?;
info!(
"Adjusted cache size to {} MB ({}% of available memory)",
target_cache_size / 1024 / 1024,
(target_cache_size as f64 / available_memory as f64 * 100.0) as u32
);
Ok(())
}
}
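As a quick sanity check of those thresholds: on a machine with 16 GB of RAM and 8 GB available (50% of total, above the 30% threshold), the cache may grow to about 0.4 × 8 GB ≈ 3.2 GB; with only 1 GB available (about 6%, below the 10% threshold), it shrinks to roughly 0.1 × 1 GB ≈ 100 MB.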
Chapter 4: Performance Monitoring and Analytics
4.1 Comprehensive Cache Metrics
Litho collects detailed cache performance metrics:
// Cache performance metrics
// (AtomicU64 and Ordering come from std::sync::atomic; AtomicF64 is not in std and is
// assumed to come from a crate such as atomic_float)
pub struct CacheMetrics {
hits: AtomicU64,
misses: AtomicU64,
hit_ratio: AtomicF64,
average_access_time: AtomicU64, // nanoseconds
memory_usage: AtomicU64, // bytes
}
impl CacheMetrics {
pub fn new() -> Self {
Self {
hits: AtomicU64::new(0),
misses: AtomicU64::new(0),
hit_ratio: AtomicF64::new(0.0),
average_access_time: AtomicU64::new(0),
memory_usage: AtomicU64::new(0),
}
}
pub fn record_hit(&self, access_time_ns: u64) {
self.hits.fetch_add(1, Ordering::Relaxed);
self.update_hit_ratio();
self.update_average_time(access_time_ns);
}
pub fn record_miss(&self, access_time_ns: u64) {
self.misses.fetch_add(1, Ordering::Relaxed);
self.update_hit_ratio();
self.update_average_time(access_time_ns);
}
// update_average_time (a running average of access times) is omitted here for brevity
fn update_hit_ratio(&self) {
let hits = self.hits.load(Ordering::Relaxed);
let misses = self.misses.load(Ordering::Relaxed);
let total = hits + misses;
if total > 0 {
let ratio = hits as f64 / total as f64;
self.hit_ratio.store(ratio, Ordering::Relaxed);
}
}
pub fn get_metrics_report(&self) -> CacheMetricsReport {
CacheMetricsReport {
hits: self.hits.load(Ordering::Relaxed),
misses: self.misses.load(Ordering::Relaxed),
hit_ratio: self.hit_ratio.load(Ordering::Relaxed),
average_access_time_ns: self.average_access_time.load(Ordering::Relaxed),
memory_usage_bytes: self.memory_usage.load(Ordering::Relaxed),
}
}
}
4.2 Real-Time Performance Dashboard
Litho exposes these metrics through real-time cache performance monitoring, so cache behavior can be observed while an analysis runs.
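The dashboard itself is not reproduced here, but the underlying idea is simple: a background task periodically snapshots CacheMetrics and reports it. A minimal sketch, assuming a tokio runtime and plain log output (a real dashboard would push these values to a UI or metrics backend):
// Minimal monitoring sketch: periodically snapshot the metrics and report them.
use std::sync::Arc;

async fn run_metrics_reporter(metrics: Arc<CacheMetrics>) {
    loop {
        let report = metrics.get_metrics_report();
        info!(
            "cache: {} hits / {} misses ({:.1}% hit ratio), avg access {} ns, {} MB in memory",
            report.hits,
            report.misses,
            report.hit_ratio * 100.0,
            report.average_access_time_ns,
            report.memory_usage_bytes / 1024 / 1024
        );
        // The 10-second interval is an illustrative choice.
        tokio::time::sleep(std::time::Duration::from_secs(10)).await;
    }
}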
Chapter 5: Case Study: Large-Scale Project Analysis
5.1 Performance Impact Analysis
A real-world case study of using Litho to analyze a large codebase:
Project Specifications:
- Codebase Size: 1.5 million lines of code
- File Count: 8,000+ source files
- Analysis Duration: 4 hours with caching vs 12+ hours without caching
- Cost Savings: 75% reduction in AI API costs
Cache Performance Metrics:
- Cache Hit Ratio: 82%
- Average Response Time: 3.2ms (cached) vs 2.3s (uncached)
- Memory Usage: 512MB peak
- Disk Cache Size: 2.3GB
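A quick back-of-the-envelope check of these figures: with an 82% hit ratio, the average lookup costs roughly 0.82 × 3.2 ms + 0.18 × 2.3 s ≈ 0.42 s, versus 2.3 s when nothing is cached, a five- to six-fold reduction per lookup before the avoided API spend is even counted.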
5.2 Cost-Benefit Analysis
Without Caching:
- AI API Costs: $120 per analysis
- Time Cost: 12 hours
- Resource Usage: High memory and CPU
With Litho's Intelligent Caching:
- AI API Costs: $30 per analysis (75% savings)
- Time Cost: 4 hours (67% faster)
- Resource Usage: Optimized through intelligent caching
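The headline numbers are consistent with each other: ($120 - $30) / $120 = 75% lower API spend, and (12 h - 4 h) / 12 h ≈ 67% less wall-clock time.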
Conclusion: The Economics of AI Performance
Litho's intelligent caching system demonstrates that in AI applications, performance optimization is not just a technical challenge—it's an economic imperative. By intelligently managing cache resources, Litho achieves:
- Cost Efficiency: Dramatic reduction in expensive AI API calls
- Performance Optimization: Millisecond-level response times for cached content
- Resource Management: Adaptive caching based on available system resources
- Freshness Guarantee: Intelligent invalidation ensures data accuracy
This approach makes advanced AI documentation generation accessible and affordable for projects of all sizes, from small open-source libraries to large enterprise codebases.
This article concludes the Litho project technical analysis series. Litho's open-source repository: https://github.com/sopaco/deepwiki-rs
Series Summary: Through these five articles, we've explored Litho's core technical innovations, including its four-stage pipeline, multi-agent architecture, Rust technology advantages, plugin system design, and intelligent caching strategies. Together, these technologies make Litho a powerful, extensible, and efficient AI-driven documentation generation platform.