Damir Karimov

Originally published at blog.damir-karimov.com

LLM-Driven Client-Side Caching: A Hybrid Decision Architecture

Client-side caching is usually implemented as a storage optimization layer (TTL, SWR, invalidation rules). In practice it behaves like a decision system under uncertainty.

Static strategies fail when data volatility is non-uniform across the same application. This leads to either stale UI or excessive network traffic.

This article breaks down:

  • why standard caching approaches plateau
  • where ML improves the system
  • where LLMs actually fit
  • how to design a production-grade decision pipeline

Problem: caching is not a storage problem

Different data types behave differently:

  • user profiles → low volatility
  • feeds / notifications → high volatility
  • search results → context-dependent volatility
  • partially hydrated UI → unknown volatility

The core issue:

caching requires a policy decision per request, not a static rule

So the real problem is:

data → context → decision (cache / revalidate / bypass)

Baseline systems (what already exists)

1. SWR / TTL-based caching

Used in React Query / SWR:

  • stale-while-revalidate
  • background refetch
  • TTL invalidation
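
For reference, a minimal React Query setup for this baseline might look like the sketch below (the hook, fetcher, endpoint, and the 60-second staleTime are illustrative assumptions, not a prescribed configuration):

import { useQuery } from '@tanstack/react-query'

// Hypothetical fetcher; the endpoint is a placeholder.
async function fetchUserProfile(userId: string): Promise<unknown> {
  const res = await fetch(`/api/users/${userId}`)
  return res.json()
}

// Static policy: data is considered fresh for 60 s, then revalidated in the background.
function useUserProfile(userId: string) {
  return useQuery({
    queryKey: ['userProfile', userId],
    queryFn: () => fetchUserProfile(userId),
    staleTime: 60_000,           // fixed TTL
    refetchOnWindowFocus: true,  // stale-while-revalidate style refetch
  })
}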

Works when:

  • update cycles are predictable
  • data freshness is stable

Fails when:

  • volatility varies inside the same dataset
  • freshness depends on UI state

2. Heuristic scoring systems

Example adaptive TTL:


volatilityScore = EWMA(changeFrequency)
priorityScore = userInteractionWeight * dataImportance
ttl = baseTTL / volatilityScore

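A TypeScript sketch of this scoring, assuming change frequency is observed per cache key (the smoothing factor, base TTL, and floor value are illustrative assumptions to tune per product):

const ALPHA = 0.3          // EWMA smoothing factor
const BASE_TTL_MS = 60_000

// EWMA(changeFrequency): blend the newest observation with the running score.
function updateVolatility(previousScore: number, observedChangesPerMin: number): number {
  return ALPHA * observedChangesPerMin + (1 - ALPHA) * previousScore
}

// priorityScore = userInteractionWeight * dataImportance
function priorityScore(userInteractionWeight: number, dataImportance: number): number {
  return userInteractionWeight * dataImportance
}

// ttl = baseTTL / volatilityScore, with a floor to avoid division by ~0.
function adaptiveTtlMs(volatilityScore: number): number {
  return BASE_TTL_MS / Math.max(volatilityScore, 0.1)
}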

Improves:

  • adaptive cache lifetime
  • frequency-aware invalidation

Limitations:

  • requires manual feature design
  • domain-specific tuning
  • breaks under missing signals

3. Lightweight ML models

Typical approach:

  • logistic regression
  • XGBoost / LightGBM
  • embedding classifiers

Pros:

  • fast inference
  • stable behavior
  • cheaper than LLMs

Cons:

  • needs labeled “optimal cache decision” data (rare)
  • retraining pipeline required
  • brittle under product changes

Why all baseline approaches plateau

All classical systems assume:

  • feature space is complete
  • behavior is stationary

In real systems:

  • user behavior is contextual
  • volatility depends on UI state
  • freshness is semantic, not numeric
  • signals are incomplete

Result:

  • heuristics → saturate
  • ML-light → overfit or drift

Key idea: caching is a decision system under uncertainty

Instead of:

“how long do we cache this?”

The correct formulation is:

“what action should we take given incomplete information?”

Actions:

  • HIT
  • REVALIDATE
  • BYPASS
  • SWR

Where LLMs fit (and where they don’t)

LLMs are not a replacement layer.

They function as:

a fallback policy engine for the ambiguous part of the decision space

They are useful only when:

  • scoring model confidence is low
  • signals conflict
  • unseen patterns appear

Architecture: layered decision system

UI Layer
   ↓
Context Builder
   ↓
Policy Engine
   ├── Rule Layer (deterministic)
   ├── ML Scoring Layer (probabilistic)
   └── LLM Fallback Layer (uncertainty)
   ↓
Cache Layer
   ↓
Network

Context model (input abstraction)

All decisions must be based on structured signals:

{
  "key": "user_feed",
  "lastUpdatedMs": 1200,
  "accessFrequency": "high",
  "volatilityScore": 0.82,
  "userAction": "scroll",
  "stalenessToleranceMs": 500
}

Important constraint:

  • no raw prompts
  • only structured features
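
A matching TypeScript shape plus a context-builder sketch (field names follow the JSON above; the telemetry store and its default values are assumptions):

// Mirrors the structured-context JSON above.
interface CacheContext {
  key: string
  lastUpdatedMs: number
  accessFrequency: 'low' | 'medium' | 'high'
  volatilityScore: number
  userAction: string
  stalenessToleranceMs: number
}

interface TelemetryStats {
  lastUpdateTs: number
  accessFrequency: 'low' | 'medium' | 'high'
  volatilityScore: number
  stalenessToleranceMs: number
}

// Hypothetical per-key stats store; how it gets populated is application-specific.
const telemetry = new Map<string, TelemetryStats>()

const defaultStats: TelemetryStats = {
  lastUpdateTs: 0,
  accessFrequency: 'low',
  volatilityScore: 0.5,
  stalenessToleranceMs: 1_000,
}

// Context builder: assembles only structured features, never raw prompts or user text.
function buildContext(key: string, userAction: string): CacheContext {
  const stats = telemetry.get(key) ?? defaultStats
  return {
    key,
    lastUpdatedMs: Date.now() - stats.lastUpdateTs,
    accessFrequency: stats.accessFrequency,
    volatilityScore: stats.volatilityScore,
    userAction,
    stalenessToleranceMs: stats.stalenessToleranceMs,
  }
}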

LLM role (strictly bounded)

The LLM is used only as a classifier:

{
  "strategy": "HIT | REVALIDATE | BYPASS | SWR",
  "ttlMs": 1200,
  "confidence": 0.78
}

Triggered only when:

  • ML confidence < threshold
  • feature signals conflict
  • unseen context patterns
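
A bounded fallback call might look like the sketch below. callPolicyModel is a hypothetical transport (the article does not prescribe a provider), and the output validation plus safe default are assumptions; CacheContext is reused from the earlier sketch:

type CacheAction = 'HIT' | 'REVALIDATE' | 'BYPASS' | 'SWR'

interface PolicyResult {
  strategy: CacheAction
  ttlMs: number
  confidence: number
}

// Hypothetical transport: sends the structured context to the model, expects JSON back.
declare function callPolicyModel(context: CacheContext): Promise<unknown>

const ACTIONS: CacheAction[] = ['HIT', 'REVALIDATE', 'BYPASS', 'SWR']

async function llmFallback(context: CacheContext): Promise<PolicyResult> {
  const raw = (await callPolicyModel(context)) as Partial<PolicyResult> | null
  // Validate the model output; fall back to a safe default on malformed responses.
  if (
    raw &&
    ACTIONS.includes(raw.strategy as CacheAction) &&
    typeof raw.ttlMs === 'number' &&
    typeof raw.confidence === 'number'
  ) {
    return raw as PolicyResult
  }
  return { strategy: 'REVALIDATE', ttlMs: 0, confidence: 0 }
}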

Meta-cache: caching the decision layer

To reduce cost:

decisionCache(contextHash) → strategy

Effects:

  • avoids repeated LLM calls
  • stabilizes latency
  • amortizes inference cost
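
A sketch of that decision cache, keyed by a hash of the context (the TTL value and hashing approach are assumptions; PolicyResult, CacheContext, and llmFallback come from the earlier sketches):

const DECISION_TTL_MS = 30_000   // assumption: how long a cached decision stays valid

interface CachedDecision {
  result: PolicyResult
  expiresAt: number
}

const decisionCache = new Map<string, CachedDecision>()

// Stable stringify as an illustrative context hash. In practice, continuous fields
// (e.g. lastUpdatedMs) should be bucketed first, or hit rates will be poor.
function contextHash(context: CacheContext): string {
  return JSON.stringify(context, Object.keys(context).sort())
}

async function cachedDecision(context: CacheContext): Promise<PolicyResult> {
  const key = contextHash(context)
  const hit = decisionCache.get(key)
  if (hit && hit.expiresAt > Date.now()) return hit.result

  const result = await llmFallback(context)
  decisionCache.set(key, { result, expiresAt: Date.now() + DECISION_TTL_MS })
  return result
}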

Cost-aware execution pipeline

IF rule matches:
    use rule engine
ELSE IF ML confidence > threshold:
    use ML model
ELSE:
    use LLM
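
In TypeScript the same routing might read as follows (the rule matcher, ML scorer, and threshold value are stand-ins; cachedDecision is the meta-cached LLM fallback sketched above):

const CONFIDENCE_THRESHOLD = 0.7   // assumption: tune per product

declare function matchRule(context: CacheContext): PolicyResult | null
declare function scoreWithMl(context: CacheContext): PolicyResult

async function routeDecision(context: CacheContext): Promise<PolicyResult> {
  // 1. Deterministic rules: cheapest path, expected to absorb most traffic.
  const ruled = matchRule(context)
  if (ruled) return ruled

  // 2. Lightweight ML: accepted only when its confidence clears the threshold.
  const scored = scoreWithMl(context)
  if (scored.confidence > CONFIDENCE_THRESHOLD) return scored

  // 3. LLM fallback for the remaining ambiguous tail, amortized by the meta-cache.
  return cachedDecision(context)
}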

Typical production distribution:

  • 80–90% rules
  • 10–20% ML
  • <10% LLM

Failure modes

1. Overuse of LLM

Problem:

  • cost spikes
  • unpredictable latency

Mitigation:

  • strict confidence gating
  • bounded invocation layer

2. Latency variance

Problem:

  • inconsistent response time in UI

Mitigation:

  • decision caching
  • async precomputation

3. Model drift

Problem:

  • ML decisions degrade over time

Mitigation:

  • feedback loop
  • periodic recalibration
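
One way to close the feedback loop is to record each decision alongside the outcome actually observed, feeding periodic recalibration (a sketch; the log sink and the servedStale signal are assumptions, and the types come from the earlier sketches):

interface DecisionOutcome {
  contextKey: string
  strategy: CacheAction
  confidence: number
  servedStale: boolean   // observed after the fact: was the served value actually stale?
  timestamp: number
}

// Hypothetical sink; in practice this would batch into an analytics / ML pipeline.
declare function logOutcome(outcome: DecisionOutcome): void

function recordOutcome(context: CacheContext, result: PolicyResult, servedStale: boolean): void {
  logOutcome({
    contextKey: context.key,
    strategy: result.strategy,
    confidence: result.confidence,
    servedStale,
    timestamp: Date.now(),
  })
}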

Engineering takeaways

  • caching is a decision system, not storage optimization
  • SWR + heuristics solve the majority of cases
  • ML-light is optimal in stable feature spaces
  • LLMs are only for ambiguous cases
  • production systems require strict routing hierarchy

Conclusion

Client-side caching becomes effective only when modeled as a layered decision system.

  • rules handle deterministic cases
  • ML handles structured uncertainty
  • LLM handles ambiguity

The correct design is hybrid, with strict boundaries and cost control, not LLM-centric.

Discussion

Where should the boundary be defined between ML confidence and LLM fallback in production caching systems?
