The bill that showed up at the end of the session
I ran a research agent that used a geocoding tool. The tool called a paid geocoding API and returned lat/lng for an address string. Cost: $0.005 per call.
The agent ran for a session. At the end, I pulled the tool call log and counted.
800 calls to the geocoding tool. 40 unique addresses. That is 20 calls per address on average.
760 of those calls were duplicates. The agent had re-geocoded addresses it had already resolved earlier in the same session. Same string in, same coordinates out, 760 times over.
Cost from unique work: $0.20. Cost from duplicate calls: $3.80. Four dollars total, $3.80 of it wasted.
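The same numbers fall out of a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted above (800 calls, 40 unique addresses, $0.005 per call); the variable names are illustrative.

```python
from decimal import Decimal

# Figures quoted in the post.
total_calls = 800
unique_addresses = 40
price_per_call = Decimal("0.005")  # dollars

duplicate_calls = total_calls - unique_addresses
calls_per_address = total_calls / unique_addresses

useful_cost = unique_addresses * price_per_call
wasted_cost = duplicate_calls * price_per_call
total_cost = total_calls * price_per_call

print(f"duplicates: {duplicate_calls}, calls per address: {calls_per_address:.0f}")
print(f"useful: ${useful_cost:.2f}, wasted: ${wasted_cost:.2f}, total: ${total_cost:.2f}")
```

Decimal keeps the cents exact, which matters more once the per-call price has three decimal places.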
The fix is not complicated. Memoize the tool function. Return the cached result when the same args come in again. Do not call the API a second time.
tool-result-cache is that memoizer.
The shape of the fix
Install the library:
pip install tool-result-cache
Wrap the tool function with @cache_result:
from tool_result_cache import ToolResultCache, cache_result

cache = ToolResultCache(max_size=256)

@cache_result(cache)
def geocode(address: str) -> dict:
    # real API call here
    return geocoding_api.lookup(address)
Now the agent can call geocode("1600 Pennsylvania Ave NW, Washington DC") 20 times. The first call hits the API. The other 19 return from cache instantly.
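To watch that behavior without a paid API, here is a rough stand-in built from the standard library. It is not the library's implementation: `memoize_by_kwargs` and `fake_geocode_api` are hypothetical names invented for this sketch, the returned coordinates are placeholder data, and the toy decorator only accepts keyword arguments.

```python
import functools
import json

api_calls = 0  # counts how often the (fake) paid API is hit


def fake_geocode_api(address: str) -> dict:
    """Hypothetical stand-in for a paid geocoding client."""
    global api_calls
    api_calls += 1
    return {"lat": 38.8977, "lng": -77.0365}  # placeholder coordinates


def memoize_by_kwargs(func):
    """Toy memoizer keyed on the canonical JSON of the keyword arguments."""
    store = {}

    @functools.wraps(func)
    def wrapper(**kwargs):
        key = json.dumps(kwargs, sort_keys=True)
        if key not in store:
            store[key] = func(**kwargs)  # first call: do the real work
        return store[key]                # repeats: return the stored result

    return wrapper


@memoize_by_kwargs
def geocode(address: str) -> dict:
    return fake_geocode_api(address)


for _ in range(20):
    result = geocode(address="1600 Pennsylvania Ave NW, Washington DC")

print(api_calls)  # 1: only the first call reached the fake API
```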
With an optional TTL:
cache = ToolResultCache(max_size=512, ttl_seconds=3600)

@cache_result(cache)
def geocode(address: str) -> dict:
    return geocoding_api.lookup(address)
Entries older than one hour are treated as misses and re-fetched. Useful for data that changes on a known schedule.
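The expiry rule can be illustrated with a tiny store that timestamps each entry. This is a sketch of the general technique, assuming a simple "age greater than TTL means miss" check; `TinyTTLCache` and its injectable `clock` are hypothetical and say nothing about how ToolResultCache is built internally.

```python
import time


class TinyTTLCache:
    """Hypothetical TTL store; not the library's ToolResultCache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def set(self, key, value):
        self._entries[key] = (self.clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl_seconds:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return value


# A fake clock makes expiry observable without sleeping.
now = [0.0]
cache = TinyTTLCache(ttl_seconds=3600, clock=lambda: now[0])
cache.set("1600 Pennsylvania Ave NW", {"lat": 38.8977, "lng": -77.0365})

now[0] = 1800.0
fresh = cache.get("1600 Pennsylvania Ave NW")    # still within the hour

now[0] = 3601.0
expired = cache.get("1600 Pennsylvania Ave NW")  # past the TTL, treated as a miss
```

Passing the clock in as a parameter is a testing convenience: expiry can be exercised in a unit test without waiting an hour.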
You can also call the cache directly without the decorator:
address = "1600 Pennsylvania Ave NW"
args = {"address": address}
result = cache.get(args)
if result is None:
    # cache miss: call the API and store the result
    result = geocoding_api.lookup(address)
    cache.set(args, result)
The direct API is useful when you need to skip the cache conditionally or when the tool function is not under your control.
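One way to skip the cache conditionally is a flag on the calling function. The sketch below uses a hypothetical `DictCache` that mirrors the get(args) / set(args, value) shape shown above, so it runs without the library installed; `force_refresh` and `fake_lookup` are also made up for illustration.

```python
import json


class DictCache:
    """Hypothetical stand-in exposing get(args) / set(args, value)."""

    def __init__(self):
        self._store = {}

    def _key(self, args: dict) -> str:
        return json.dumps(args, sort_keys=True)

    def get(self, args: dict):
        return self._store.get(self._key(args))

    def set(self, args: dict, value) -> None:
        self._store[self._key(args)] = value


lookups = []  # records every call that reaches the fake API


def fake_lookup(address: str) -> dict:
    """Hypothetical stand-in for the paid API call."""
    lookups.append(address)
    return {"lat": 38.8977, "lng": -77.0365}  # placeholder coordinates


def geocode(cache: DictCache, address: str, force_refresh: bool = False) -> dict:
    args = {"address": address}
    if not force_refresh:
        cached = cache.get(args)
        if cached is not None:
            return cached
    # Miss, or the caller asked to bypass the cache: fetch and store.
    result = fake_lookup(address)
    cache.set(args, result)
    return result


cache = DictCache()
geocode(cache, "1600 Pennsylvania Ave NW")                      # miss, calls the API
geocode(cache, "1600 Pennsylvania Ave NW")                      # hit, no API call
geocode(cache, "1600 Pennsylvania Ave NW", force_refresh=True)  # bypass, calls the API
print(len(lookups))  # 2
```

Note that a get-returns-None convention cannot distinguish "not cached" from "cached a None result", so this pattern fits tools that always return a value.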
What it does NOT do
Before going further, here is what the library intentionally skips:
- No semantic similarity matching. geocode("1600 Pennsylvania Ave") and geocode("1600 Pennsylvania Avenue") hash to different keys. The library does not try to decide if two argument strings are "close enough." That would require another model call to cache a tool call, which is backwards.
- No cross-session persistence. The cache is in-memory. When the process exits, cached results are gone. If you need results to survive restarts, pair this with agent-resume or write your own persistence layer around the direct API.
- No distributed backend. This is a single-process LRU cache. Redis, Memcached, and similar are out of scope.
- No side-effect detection. The library does not know whether your tool has side effects. It is the caller's job to only wrap tools that are safe to memoize. Do not wrap send_email or write_file with this decorator.
Inside the lib: SHA-256 canonical key design
The cache key is computed from the JSON-serialized, key-sorted args dict:
import json
import hashlib

def make_key(kwargs: dict) -> str:
    # sort_keys gives one canonical string per set of args
    canonical = json.dumps(kwargs, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode()).hexdigest()
sort_keys=True is the load-bearing part. It means:
make_key({"address": "1600 Pennsylvania Ave", "country": "US"})
make_key({"country": "US", "address": "1600 Pennsylvania Ave"})
Both produce the same hash. Argument insertion order does not affect the key. The values are what matter.
This handles a real quirk of LLM tool calls. The model sometimes generates keyword arguments in different orders across turns. The canonical representation absorbs that variation without any special handling.
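A quick check of that claim, reusing the make_key function from above. The comparison without sort_keys is added here only to show what the flag buys you.

```python
import hashlib
import json


def make_key(kwargs: dict) -> str:
    canonical = json.dumps(kwargs, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode()).hexdigest()


first = {"address": "1600 Pennsylvania Ave", "country": "US"}
second = {"country": "US", "address": "1600 Pennsylvania Ave"}

# The dicts iterate in different orders, but the canonical JSON is identical.
print(list(first) == list(second))          # False
print(make_key(first) == make_key(second))  # True

# Without sort_keys, the serialized strings (and so the hashes) would differ.
print(json.dumps(first) == json.dumps(second))  # False
```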
What does NOT hash to the same key:
make_key({"address": "1600 Pennsylvania Ave"})
make_key({"address": "1600 Pennsylvania Avenue"})
Different strings, different hashes, treated as different calls. The library does structural identity, not semantic similarity. That boundary is intentional and important. Crossing it would mean the cache can produce incorrect results if the similarity judgment is wrong.
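Exact matching applies to every character, not just to abbreviations. If trivially different strings should share an entry, one option is to clean up the argument on the caller's side before the tool is invoked. The sketch below assumes whitespace cleanup is safe for your data; `normalize_address` is a hypothetical helper, not something the library provides, and it is mechanical cleanup rather than similarity matching.

```python
import hashlib
import json


def make_key(kwargs: dict) -> str:
    canonical = json.dumps(kwargs, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode()).hexdigest()


def normalize_address(address: str) -> str:
    """Hypothetical caller-side cleanup: collapse whitespace only."""
    return " ".join(address.split())


ave = make_key({"address": "1600 Pennsylvania Ave"})
avenue = make_key({"address": "1600 Pennsylvania Avenue"})
padded = make_key({"address": " 1600  Pennsylvania Ave "})
cleaned = make_key({"address": normalize_address(" 1600  Pennsylvania Ave ")})

print(ave == avenue)   # False: different strings, different keys
print(ave == padded)   # False: stray whitespace also changes the key
print(ave == cleaned)  # True: exact match again after caller-side cleanup
```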
When this is useful
The library earns its keep in a few specific patterns:
High-cost external APIs with stable data. Geocoding, currency conversion, taxonomy lookups, anything billed per call where the underlying data does not change between agent turns. Cache on the first call, skip the API on every repeat.
Multi-turn conversations with context re-fetching. Agents tend to re-establish context at each reasoning step. A tool that retrieves a customer record or a product description may get called once per turn even though the record has not changed. A result cache cuts that to one call per unique set of args per session.
Development and testing. Wrap the real API tool with @cache_result during development. The first test run hits the live API and populates the cache. Subsequent runs return cached results without burning rate limit or budget. Useful when you are iterating quickly on prompt logic and the tool outputs are not what you are testing.
Reducing latency in multi-agent setups. When multiple agents share a process and call the same tools with the same args, a shared ToolResultCache instance lets a result fetched by one agent serve the others, so the same endpoint is not hit again for the same request.
When NOT to use it
Some tool calls should not be cached:
Side-effecting tools. send_notification, charge_card, post_to_slack. The second call to a cached side-effecting tool would silently do nothing. The agent would think the action happened. It did not.
Time-sensitive data. Live stock prices, current queue depths, real-time availability. Use TTL to bound staleness if you need any caching here at all, or skip the cache entirely.
Tools where identical args should produce different results. Random sampling, UUID generation, "get current time." The cache would return the same value on every call, which is wrong by design.
Stateful tools. Tools that depend on server-side session state or that perform streaming reads. The first result may not represent what a second call would return.
The library has no way to detect any of these cases automatically. That judgment belongs to the code that wires up the agent. Wrap only the tools where same-args-same-result is the correct contract.
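The side-effect failure is easy to reproduce. The sketch below uses a toy memoizer rather than the library itself; `memoize_by_kwargs`, `send_notification`, and the `outbox` list are hypothetical and exist only to make the dropped action visible.

```python
import functools
import json

outbox = []  # stands in for the outside world


def memoize_by_kwargs(func):
    """Toy memoizer, hypothetical; same idea as a result cache."""
    store = {}

    @functools.wraps(func)
    def wrapper(**kwargs):
        key = json.dumps(kwargs, sort_keys=True)
        if key not in store:
            store[key] = func(**kwargs)
        return store[key]

    return wrapper


@memoize_by_kwargs  # wrong: this tool has a side effect
def send_notification(user: str, text: str) -> dict:
    outbox.append((user, text))
    return {"status": "sent"}


first = send_notification(user="ada", text="build finished")
second = send_notification(user="ada", text="build finished")

print(first, second)  # both report {'status': 'sent'}
print(len(outbox))    # 1: the second notification was never sent
```

The return value looks healthy both times, which is exactly why this mistake is hard to spot from the agent's transcript.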
Install
pip install tool-result-cache
Zero runtime dependencies. Python 3.9 and up. 25 tests.
The library has no opinion about your LLM SDK, your agent framework, or how you define tool functions. It works on any callable that takes keyword arguments and returns a JSON-serializable value. Wrap it with @cache_result or use the direct ToolResultCache API.
Source: MukundaKatta/tool-result-cache
Siblings
| Lib | Boundary | Repo |
|---|---|---|
| tool-call-cache | Similar idea, intercepts at the LLM call layer rather than the tool function layer | MukundaKatta/tool-call-cache |
| tool-result-cache-rs | Rust port of this library, 30 tests, LRU plus pure-Rust SHA-256 | MukundaKatta/tool-result-cache-rs |
| llm-message-hash-py | Canonical JSON hashing they share, applied to full LLM request structs | MukundaKatta/llm-message-hash-py |
| agent-resume | Persistence across runs, a different concern, complements in-session caching | MukundaKatta/agent-resume |
What is next
A few things worth adding once the basics are solid:
Hit and miss counters. A simple stats surface so callers can see cache effectiveness without adding their own instrumentation around the decorator. Knowing your hit rate is useful when deciding whether to increase max_size.
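Until something like that ships, counters are simple to approximate. This is only a sketch of what a stats surface could look like; `CountingCache`, `hits`, `misses`, and `hit_rate` are hypothetical names, not a planned API.

```python
import json


class CountingCache:
    """Hypothetical cache with hit/miss counters; not the library's API."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, args: dict):
        key = json.dumps(args, sort_keys=True)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def set(self, args: dict, value) -> None:
        self._store[json.dumps(args, sort_keys=True)] = value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache()
for address in ["A St", "B St", "A St", "A St"]:
    args = {"address": address}
    if cache.get(args) is None:
        cache.set(args, {"lat": 0.0, "lng": 0.0})  # placeholder result

print(cache.hits, cache.misses, cache.hit_rate())  # 2 2 0.5
```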
Eviction hooks. A callback that fires when an entry is evicted from the LRU. Useful for logging or for writing evicted entries to an external store before they disappear.
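For a sense of how such a hook could behave, here is a small LRU built on collections.OrderedDict that calls a callback on eviction. `LRUWithHook` and `on_evict` are hypothetical; the real design may differ.

```python
from collections import OrderedDict


class LRUWithHook:
    """Hypothetical LRU store that reports evictions to a callback."""

    def __init__(self, max_size, on_evict=None):
        self.max_size = max_size
        self.on_evict = on_evict
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            # popitem(last=False) removes the least recently used entry.
            old_key, old_value = self._entries.popitem(last=False)
            if self.on_evict is not None:
                self.on_evict(old_key, old_value)


evicted = []
cache = LRUWithHook(max_size=2, on_evict=lambda k, v: evicted.append(k))
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.set("c", 3)  # over capacity: "b" is the least recently used

print(evicted)  # ['b']
```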
Async decorator support. The current @cache_result decorator works on sync functions. An async-native variant would avoid awkward wrapping in asyncio agent loops.
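In the meantime, an async memoizer can be sketched in a few lines. `async_memoize` is a hypothetical name and this is not the library's planned design; it handles sequential awaits only.

```python
import asyncio
import functools
import json


def async_memoize(func):
    """Hypothetical async memoizer; not part of the library today."""
    store = {}

    @functools.wraps(func)
    async def wrapper(**kwargs):
        key = json.dumps(kwargs, sort_keys=True)
        if key not in store:
            store[key] = await func(**kwargs)
        return store[key]

    return wrapper


api_calls = 0


@async_memoize
async def geocode(address: str) -> dict:
    global api_calls
    api_calls += 1
    await asyncio.sleep(0)  # stands in for the network round trip
    return {"lat": 38.8977, "lng": -77.0365}  # placeholder coordinates


async def main() -> int:
    for _ in range(5):
        await geocode(address="1600 Pennsylvania Ave NW")
    return api_calls


calls = asyncio.run(main())
print(calls)  # 1
```

One caveat with this simple version: if several tasks request the same cold key concurrently, each of them reaches the API before the first result is stored. Avoiding that would need per-key locking or a shared in-flight future.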
If any of those would be useful for your agent stack, open an issue or PR on the repo.
This is part of the Hermes Agent Challenge, a sprint to build and ship practical agent infrastructure libraries. The goal is a library per day covering the gaps between LLM SDK calls and production-ready agent behavior. Each library is small, focused, and ships with a full test suite.