Caching AI Responses in a Desktop App — Don't Pay Twice for the Same Question

If this is useful, a ❤️ helps others find it.

All tests run on an 8-year-old MacBook Air.

If a user closes the AI diagnosis overlay and reopens it, should you call Gemini again?

No. Cache the result. Same input → same output. No reason to burn rate limit quota.

Here's the caching layer I built into HiyokoLogcat.


The problem

Without caching:

  1. User clicks diagnose on error line 847
  2. Gemini responds in 3 seconds
  3. User closes the overlay
  4. User reopens the overlay
  5. Gemini call again → 3 more seconds, 1 more request

With caching:
Steps 4-5 → instant, zero API calls.


Cache key: hash the input

Same log context → same hash → same cached result.

use std::collections::HashMap;
use sha2::{Sha256, Digest};

pub struct DiagnosisCache {
    entries: HashMap<String, CacheEntry>,
    max_size: usize,
}

#[derive(Clone)]
pub struct CacheEntry {
    pub result: String,
    pub created_at: std::time::Instant,
}

impl DiagnosisCache {
    pub fn new(max_size: usize) -> Self {
        Self { entries: HashMap::new(), max_size }
    }

    pub fn key(context: &str) -> String {
        let mut hasher = Sha256::new();
        hasher.update(context.as_bytes());
        format!("{:x}", hasher.finalize())
    }

    pub fn get(&self, key: &str) -> Option<&CacheEntry> {
        self.entries.get(key)
    }

    pub fn insert(&mut self, key: String, result: String) {
        // Evict oldest entries if at capacity
        if self.entries.len() >= self.max_size {
            if let Some(oldest_key) = self.entries
                .iter()
                .min_by_key(|(_, v)| v.created_at)
                .map(|(k, _)| k.clone())
            {
                self.entries.remove(&oldest_key);
            }
        }

        self.entries.insert(key, CacheEntry {
            result,
            created_at: std::time::Instant::now(),
        });
    }
}
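Before wiring this into Tauri, a quick standalone sketch of how the cache behaves: key determinism and the size cap. The log line and diagnosis strings are invented for illustration, and it assumes the sketch lives in the same module as DiagnosisCache:

// Illustrative only: contexts and results are made up.
fn main() {
    let mut cache = DiagnosisCache::new(2); // tiny capacity to exercise eviction

    // Same log context always hashes to the same key.
    let key = DiagnosisCache::key("E/AndroidRuntime: FATAL EXCEPTION");
    assert_eq!(key, DiagnosisCache::key("E/AndroidRuntime: FATAL EXCEPTION"));

    // The third insert evicts the oldest entry, so the map never exceeds 2.
    cache.insert(key, "NullPointerException in onCreate".into());
    cache.insert(DiagnosisCache::key("ctx B"), "diagnosis B".into());
    cache.insert(DiagnosisCache::key("ctx C"), "diagnosis C".into());

    // The newest entry is still retrievable without an API call.
    assert!(cache.get(&DiagnosisCache::key("ctx C")).is_some());
}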

Using it in the command

use std::sync::Mutex;

#[tauri::command]
pub async fn diagnose(
    context: String,
    api_key: String,
    cache: tauri::State<'_, Mutex<DiagnosisCache>>,
) -> Result<String, String> {
    let key = DiagnosisCache::key(&context);

    // Check cache first
    {
        let cache = cache.lock().unwrap();
        if let Some(entry) = cache.get(&key) {
            return Ok(entry.result.clone());  // instant
        }
    }

    // Cache miss — call Gemini
    let result = call_gemini(&context, &api_key).await?;

    // Store result
    {
        let mut cache = cache.lock().unwrap();
        cache.insert(key, result.clone());
    }

    Ok(result)
}
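The command leans on a call_gemini helper that isn't shown here. For completeness, here's a hedged sketch of one way to write it against Gemini's public generateContent REST endpoint, using the reqwest and serde_json crates; the model name and prompt wording are my assumptions, not necessarily what HiyokoLogcat uses:

use serde_json::{json, Value};

// Sketch only: one plausible call_gemini matching the signature used above.
pub async fn call_gemini(context: &str, api_key: &str) -> Result<String, String> {
    // Model choice is an assumption; swap in whatever your app targets.
    let url = format!(
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key={api_key}"
    );

    let body = json!({
        "contents": [{ "parts": [{ "text": format!("Diagnose this Android log:\n{context}") }] }]
    });

    let resp: Value = reqwest::Client::new()
        .post(&url)
        .json(&body)
        .send()
        .await
        .map_err(|e| e.to_string())?
        .json()
        .await
        .map_err(|e| e.to_string())?;

    // Pull the first candidate's text out of the response JSON.
    resp["candidates"][0]["content"]["parts"][0]["text"]
        .as_str()
        .map(|s| s.to_owned())
        .ok_or_else(|| "unexpected Gemini response shape".to_string())
}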

Cache size

50 entries is enough for a session. Log lines change constantly — a cache from 2 hours ago is rarely useful. Clear on app restart, or add a TTL if you want.

// Register in main.rs
.manage(Mutex::new(DiagnosisCache::new(50)))
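If you do want the TTL mentioned above, a minimal sketch: treat a hit older than the cutoff as a miss, so the caller falls through to a fresh Gemini call that overwrites the stale entry. The 30-minute lifetime is an arbitrary pick, not a tested value:

impl DiagnosisCache {
    /// Hypothetical TTL; tune to taste.
    const TTL: std::time::Duration = std::time::Duration::from_secs(30 * 60);

    /// Like get(), but a stale entry counts as a miss.
    pub fn get_fresh(&self, key: &str) -> Option<&CacheEntry> {
        self.entries
            .get(key)
            .filter(|entry| entry.created_at.elapsed() < Self::TTL)
    }
}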

Result

First diagnosis: 3 seconds. Every repeat: instant. Rate limit usage cut significantly for users who re-examine the same errors.


Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok
