Mukunda Rao Katta

Posted on May 25

tool-result-cache-rs: LRU Memoization for Agent Tool Calls in Rust

#hermeschallenge #ai #rust #agents

The geocoder that got called 400 times

The setup: a multi-step research agent that takes a list of company names, finds each company's headquarters city, and builds a report. There were forty companies in the input, and each one triggered a call to get_coordinates so downstream tools could compute distances.

Several companies shared the same headquarters city. "San Francisco" appeared 11 times across the company list. The agent called get_coordinates("San Francisco") 11 times, to the same external geocoding API, getting the same result each time.

The geocoding API had a per-day quota. The agent burned 40 calls on 18 unique cities. Over a dataset of 200 companies, that became 200 API calls for 30 unique cities. Quota hit before the job finished.

The fix is obvious in retrospect. Cache the result of get_coordinates keyed on the city name. If you asked for "San Francisco" already this session, return the cached coordinates. Do not call the API again.

tool-result-cache-rs does exactly that. You give it a tool name and args, it gives you the cached result or a miss. When you get a miss, you call the tool, then store the result. Next call with the same args is a cache hit.

Shape of the fix

[dependencies]
tool-result-cache-rs = "0.1"
serde_json = "1"

Basic lookup-and-fill pattern:

use tool_result_cache_rs::ToolResultCache;
use serde_json::json;
use std::time::Duration;

// 1000-entry LRU, 5-minute TTL
let cache = ToolResultCache::new(1000, Duration::from_secs(300));

let args = json!({"city": "San Francisco"});
let key = cache.cache_key("get_coordinates", &args);

// Try the cache first
if let Some(cached) = cache.get(&key) {
    return Ok(cached);
}

// Cache miss: call the actual tool
let result = get_coordinates_impl(&args)?;

// Store for next time
cache.set(key, result.clone());

Ok(result)

You can also skip the manual key step with the combined get_or_insert method:

// Build the args once and reuse them for both the key and the tool call
let args = json!({"city": "Austin"});
let result = cache.get_or_insert(
    "get_coordinates",
    &args,
    || get_coordinates_impl(&args),
)?;

Inspect cache state:

let stats = cache.stats();
println!("Hits: {}, Misses: {}, Evictions: {}", stats.hits, stats.misses, stats.evictions);
println!("Current size: {}/{}", stats.current_size, stats.capacity);

The cache key is a SHA-256 hash of the canonical JSON form of (tool_name, args). Argument objects with the same keys in different orders hash to the same key. {"a": 1, "b": 2} and {"b": 2, "a": 1} are the same cache entry.

What it does NOT do

This crate does not know whether a tool is safe to cache. It does not inspect the tool function or its side effects. You are responsible for only caching idempotent tools. If send_email is cached, a second call with the same args is served from the cache and the second email is never sent, which is a bug. Calling get_coordinates twice is fine. The crate gives you the cache machinery. Deciding which tools should use it is your call.
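One way to keep that decision explicit is an allowlist of tools you have judged idempotent. The sketch below is not part of tool-result-cache-rs: GuardedCache and its call method are hypothetical names, and a plain HashMap stands in for the real cache so the example needs only the standard library.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical wrapper: caches results only for tools the caller has
/// explicitly marked as idempotent. Not part of tool-result-cache-rs.
struct GuardedCache {
    cacheable: HashSet<&'static str>,
    entries: HashMap<(String, String), String>,
}

impl GuardedCache {
    fn new(cacheable: &[&'static str]) -> Self {
        GuardedCache {
            cacheable: cacheable.iter().copied().collect(),
            entries: HashMap::new(),
        }
    }

    /// Runs `call` every time for tools that are not on the allowlist.
    /// For allowlisted tools, runs it only on the first call per (tool, args).
    fn call<F>(&mut self, tool: &str, args: &str, call: F) -> String
    where
        F: FnOnce() -> String,
    {
        if !self.cacheable.contains(tool) {
            return call(); // side-effecting tool: never served from cache
        }
        let key = (tool.to_string(), args.to_string());
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        let result = call();
        self.entries.insert(key, result.clone());
        result
    }
}
```

With this shape, send_email runs on every call because it was never added to the allowlist, while get_coordinates runs once per distinct argument.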

The cache is also not durable. When the process exits, the cache is gone. Every run starts fresh. For tools where the result is expensive enough to persist across runs (external API calls with quotas, file reads, database lookups), a persistent cache keyed on the same hash is worth building on top of this. The cache_key method gives you the hash string that works as a persistent store key.
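A rough sketch of what such a persistent layer could look like, using one file per key. DiskStore is a hypothetical name, not something the crate ships, and the sketch assumes keys are filename-safe (as a hex digest would be) and values are already serialized to strings.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical disk layer: one file per cache key inside `dir`.
/// Assumes keys are filename-safe, as a hex digest would be.
struct DiskStore {
    dir: PathBuf,
}

impl DiskStore {
    /// Creates the directory if needed and opens the store on it.
    fn open(dir: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(DiskStore { dir })
    }

    /// Returns the stored value, or None if the key was never written.
    fn get(&self, key: &str) -> io::Result<Option<String>> {
        match fs::read_to_string(self.dir.join(key)) {
            Ok(value) => Ok(Some(value)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }

    /// Writes (or overwrites) the value for `key`.
    fn set(&self, key: &str, value: &str) -> io::Result<()> {
        fs::write(self.dir.join(key), value)
    }
}
```

The in-memory cache would sit in front of this: check memory, then disk, then call the tool. Expiry on disk is left out here and would need its own timestamp per file.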

Inside the lib

The cache is a hand-rolled LRU backed by a HashMap and a doubly linked list. There are no external crate dependencies beyond serde_json and sha2.

The choice to hand-roll LRU rather than pull in lru or quick-cache was deliberate. The dependency surface matters for a utility crate that is going to be composed with other crates. Adding a transitive dependency on an LRU crate that itself has opinions about allocation or unsafe code creates friction when the crate is embedded in larger systems. The implementation is about 250 lines and covers the standard LRU contract: access promotes a node to the front, eviction removes from the back, insert at the front.
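To make the contract concrete, here is a deliberately simplified sketch. It is not the crate's code: it tracks recency in a VecDeque instead of a doubly linked list, so promotion is O(n) rather than O(1), and the Lru type and its fields are illustrative names.

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified LRU: `order` holds keys from most to least recently used.
struct Lru {
    capacity: usize,
    map: HashMap<String, String>,
    order: VecDeque<String>,
    evictions: u64,
}

impl Lru {
    fn new(capacity: usize) -> Self {
        Lru { capacity, map: HashMap::new(), order: VecDeque::new(), evictions: 0 }
    }

    /// Moves `key` to the front of the recency order.
    /// O(n) here; a linked list makes this O(1).
    fn promote(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_front(key.to_string());
    }

    /// A successful read counts as an access and promotes the key.
    fn get(&mut self, key: &str) -> Option<String> {
        let value = self.map.get(key).cloned()?;
        self.promote(key);
        Some(value)
    }

    /// Inserts at the front, evicting from the back when full.
    fn set(&mut self, key: &str, value: &str) {
        if self.capacity == 0 {
            return;
        }
        if !self.map.contains_key(key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_back() {
                self.map.remove(&oldest);
                self.evictions += 1;
            }
        }
        self.map.insert(key.to_string(), value.to_string());
        self.promote(key);
    }
}
```

With capacity 2, inserting a and b, reading a, then inserting c evicts b, because the read moved a ahead of it.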

The SHA-256 key is computed over the canonical JSON representation of the tool name and args. Canonical here means keys are sorted before serialization. This is the same approach used in llm-message-hash for stable prompt hashes.

cache_key("get_coordinates", {"city": "Austin"})
    -> SHA-256(canonical_json(["get_coordinates", {"city": "Austin"}]))
    -> "a3f2b9..."
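The idea can be sketched with the standard library alone. Two loud assumptions: the Value enum below is a toy stand-in for serde_json::Value, and std's DefaultHasher stands in for SHA-256, so the keys it produces will not match the crate's. Only the sort-then-hash structure is the point.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Tiny JSON-like value, just enough for the illustration.
enum Value {
    Null,
    Num(i64),
    Str(String),
    Object(Vec<(String, Value)>), // keeps insertion order, like parsed input
}

/// Serializes with object keys sorted, so input key order is irrelevant.
fn canonical(v: &Value) -> String {
    match v {
        Value::Null => "null".to_string(),
        Value::Num(n) => n.to_string(),
        // Debug quoting is a stand-in for real JSON string escaping.
        Value::Str(s) => format!("{:?}", s),
        Value::Object(fields) => {
            let sorted: BTreeMap<&str, &Value> =
                fields.iter().map(|(k, v)| (k.as_str(), v)).collect();
            let body: Vec<String> = sorted
                .iter()
                .map(|(k, v)| format!("{:?}:{}", k, canonical(v)))
                .collect();
            format!("{{{}}}", body.join(","))
        }
    }
}

/// Stand-in for SHA-256: hashes the canonical form of [tool_name, args].
fn cache_key(tool: &str, args: &Value) -> String {
    let canon = format!("[{:?},{}]", tool, canonical(args));
    let mut hasher = DefaultHasher::new();
    canon.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}
```

Including the tool name in the hashed input is what keeps two different tools with identical args from sharing an entry.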

The TTL is checked at read time, not at write time. When you call cache.get(&key), the crate checks whether the entry's age exceeds the TTL. If it does, the entry is removed and None is returned. Entries are not proactively swept on a timer. This means entries can sit in the LRU past their TTL until something reads them. For typical agent loop usage (a few hundred entries, TTL of minutes to hours) this is not a problem. The LRU eviction bound keeps the total size capped regardless of staleness.
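A minimal sketch of read-time expiry, assuming each entry stores its insertion Instant. TtlMap, set_at and get_at are illustrative names, not the crate's API; the current time is passed in as a parameter only so the behavior can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Lazy-expiry map: entries are only checked for staleness when read.
struct TtlMap {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>, // value plus insertion time
}

impl TtlMap {
    fn new(ttl: Duration) -> Self {
        TtlMap { ttl, entries: HashMap::new() }
    }

    /// Stores the value and records `now` as its insertion time.
    fn set_at(&mut self, key: &str, value: &str, now: Instant) {
        self.entries.insert(key.to_string(), (value.to_string(), now));
    }

    /// Returns the value if its age does not exceed the TTL.
    /// An expired entry is removed on this read and None is returned.
    fn get_at(&mut self, key: &str, now: Instant) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some((_, stored_at)) => now.duration_since(*stored_at) > self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(value, _)| value.clone())
    }
}
```

Nothing in this sketch runs on a timer: an entry that is never read again stays in the map until something else (here nothing, in the crate the LRU bound) pushes it out.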

The 30-test suite covers: LRU eviction ordering, TTL expiration at read time, canonical key stability across arg orderings, stats tracking, concurrent access via Arc<Mutex<>> wrapping, and edge cases like empty args and null values.
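For readers who have not used the Arc<Mutex<>> pattern, here is a rough standard-library sketch of sharing one cache across threads. A plain HashMap stands in for the cache, shared_lookup_demo is a made-up function, and the coordinate string is a placeholder rather than real API output.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `workers` threads that all look up the same city through one shared
/// map, and returns how many times the simulated tool actually ran.
fn shared_lookup_demo(workers: usize) -> usize {
    let cache: Arc<Mutex<HashMap<String, String>>> = Arc::new(Mutex::new(HashMap::new()));
    let tool_calls = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let tool_calls = Arc::clone(&tool_calls);
            thread::spawn(move || {
                // Hold the lock across check-and-fill so only one thread misses.
                let mut guard = cache.lock().unwrap();
                let value = guard
                    .entry("San Francisco".to_string())
                    .or_insert_with(|| {
                        *tool_calls.lock().unwrap() += 1;
                        "37.77,-122.42".to_string() // placeholder result
                    })
                    .clone();
                value
            })
        })
        .collect();

    for handle in handles {
        // Every worker sees the same value, whether it hit or missed.
        assert_eq!(handle.join().unwrap(), "37.77,-122.42");
    }
    let total = *tool_calls.lock().unwrap();
    total
}
```

Holding the lock while the tool runs is the simplest way to guarantee a single call per key, at the cost of serializing the other workers behind it. Releasing the lock during the call avoids that stall but lets two threads miss on the same key at once.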

When useful

Agent loops where the same lookup tool gets called repeatedly across turns or across items in a batch (geocoding, ticker lookups, taxonomy classification, static config reads).
Batch jobs over datasets with low cardinality. If your dataset has 1,000 items but only 50 unique values for a given field, caching the tool result for that field cuts API calls by 95%.
Rate-limited external APIs where repeated identical calls waste quota. The cache absorbs repeated calls; the API sees only unique ones.
Multi-turn conversations where the agent re-derives context that was already fetched in an earlier turn (user profile lookups, document metadata fetches).
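The arithmetic behind those savings is just one real call per unique value. A small standard-library sketch, with lookup_batch and savings as illustrative helpers and a closure standing in for the external tool:

```rust
use std::collections::HashMap;

/// Looks up every item through a per-run map and returns
/// (results in input order, number of real tool calls).
fn lookup_batch<F>(cities: &[&str], mut tool: F) -> (Vec<String>, usize)
where
    F: FnMut(&str) -> String,
{
    let mut cache: HashMap<String, String> = HashMap::new();
    let mut calls = 0;
    let mut results = Vec::with_capacity(cities.len());
    for &city in cities {
        let value = match cache.get(city) {
            Some(hit) => hit.clone(),
            None => {
                // First time this city is seen: pay for one real call.
                calls += 1;
                let fresh = tool(city);
                cache.insert(city.to_string(), fresh.clone());
                fresh
            }
        };
        results.push(value);
    }
    (results, calls)
}

/// Fraction of calls avoided: 1 - unique / total.
fn savings(total: usize, unique: usize) -> f64 {
    1.0 - unique as f64 / total as f64
}
```

For the 1,000-item, 50-unique case, savings(1000, 50) works out to 0.95. For the 200-company run from the opening story, 30 unique cities means 30 calls instead of 200, or 85% fewer.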

When NOT

Tools with side effects. Caching the result of a write operation and returning the cached result on the next identical call is almost always wrong.
Tools where fresh data is critical on every call. Weather, stock prices, live sensor readings. A 5-minute TTL on "current temperature" gives you a stale reading. Set TTL to zero (or do not cache) for fresh-data tools.
High-cardinality tools where args are always unique. If every call to query_database uses a different query string, the cache will have a 0% hit rate and you are paying the hash cost for nothing.
Cross-process agent deployments where all workers need to share a cache. This crate is in-process. Use Redis or a shared service for multi-process caching.

Install

[dependencies]
tool-result-cache-rs = "0.1"
serde_json = "1"

crates.io: tool-result-cache-rs
GitHub: MukundaKatta/tool-result-cache-rs

Siblings

Crate / Package	What it does
tool-call-cache (Python)	Python port; same LRU+TTL pattern for Python agent loops
tool-result-cache (Python)	Extended Python version with async support and disk persistence
tool-loop-guard-rs	Detect repeated identical calls in a sliding window; stop loops
llm-message-hash	Canonical hash of full LLM requests for cross-turn prompt dedup
agentidemp-rs	Idempotency keys for agent operations; broader than tool-level

What is next

Async support via a tokio::sync::Mutex-backed variant is the most requested addition. The current implementation uses std::sync::Mutex, which blocks the async executor thread during lock contention. In practice, the lock is held for microseconds and contention is rare in agent loops. But for high-throughput async applications with many concurrent tool calls, a tokio-aware version with RwLock (multiple readers, one writer) would reduce contention.

A Cache::with_key_fn constructor that accepts a custom key function would also help. Right now the key is always SHA-256 of canonical JSON. Some tools have natural string keys (a URL, a city name, a ticker symbol) where the hash is unnecessary overhead. Allowing the caller to provide a key function directly would make those cases faster and easier to debug (you can inspect the key without reversing a hash).
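What that could look like, as a sketch only. Nothing here exists in the crate today: KeyedCache is a hypothetical type, with_key_fn borrows the proposed name, and args are simplified to a single string to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical cache that derives keys with a caller-supplied function
/// instead of hashing. Not an existing API in tool-result-cache-rs.
struct KeyedCache {
    key_fn: fn(&str, &str) -> String,
    entries: HashMap<String, String>,
}

impl KeyedCache {
    fn with_key_fn(key_fn: fn(&str, &str) -> String) -> Self {
        KeyedCache { key_fn, entries: HashMap::new() }
    }

    /// Returns the cached value, or runs `call` once and stores its result.
    fn get_or_insert(&mut self, tool: &str, arg: &str, call: impl FnOnce() -> String) -> String {
        let key = (self.key_fn)(tool, arg);
        self.entries.entry(key).or_insert_with(call).clone()
    }

    /// Sorted keys, readable as-is because no hash is involved.
    fn keys(&self) -> Vec<String> {
        let mut keys: Vec<String> = self.entries.keys().cloned().collect();
        keys.sort();
        keys
    }
}
```

A key function such as |tool, arg| format!("{}:{}", tool, arg) would give keys like get_coordinates:Austin, which can be read directly in a debugger or log. The caller then owns the job the canonical hash was doing: making sure equivalent args always map to the same string.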

Part of the Hermes Agent Challenge sprint. All crates shipped on crates.io.
