Client-side caching is usually implemented as a storage optimization layer (TTL, SWR, invalidation rules). In practice, it behaves like a decision system under uncertainty.
Static strategies fail when data volatility is non-uniform across the same application, producing either stale UI or excessive network traffic.
This article breaks down:
- why standard caching approaches plateau
- where ML improves the system
- where LLMs actually fit
- how to design a production-grade decision pipeline
Problem: caching is not a storage problem
Different data types behave differently:
- user profiles → low volatility
- feeds / notifications → high volatility
- search results → context-dependent volatility
- partially hydrated UI → unknown volatility
The core issue:
caching requires a policy decision per request, not a static rule
So the real problem is:
data → context → decision (cache / revalidate / bypass)
Baseline systems (what already exists)
1. SWR / TTL-based caching
Used in React Query / SWR (a usage sketch follows this list):
- stale-while-revalidate
- background refetch
- TTL invalidation
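For example, a typical setup with TanStack Query; the endpoint and the 30s `staleTime` are illustrative, not prescriptive:

```ts
import { useQuery } from "@tanstack/react-query";

// Hypothetical feed hook: "/api/feed" and the timings are placeholders.
function useFeed() {
  return useQuery({
    queryKey: ["user_feed"],
    queryFn: () => fetch("/api/feed").then((res) => res.json()),
    staleTime: 30_000,          // data counts as "fresh" for 30s, then stale
    refetchOnWindowFocus: true, // stale-while-revalidate on window focus
  });
}
```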
Works when:
- update cycles are predictable
- data freshness is stable
Fails when:
- volatility varies inside the same dataset
- freshness depends on UI state
2. Heuristic scoring systems
Example adaptive TTL:
```
volatilityScore = EWMA(changeFrequency)
priorityScore = userInteractionWeight * dataImportance
ttl = baseTTL / volatilityScore
```
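The heuristic above computes `priorityScore` but leaves its use unstated; the runnable sketch below folds it in as a TTL multiplier, which is an assumption, as are the smoothing factor and clamp bounds:

```ts
// Exponentially weighted moving average; volatilityScore would be
// maintained by calling this as change events are observed.
function ewma(prev: number, sample: number, alpha = 0.3): number {
  return alpha * sample + (1 - alpha) * prev;
}

interface TtlInputs {
  volatilityScore: number;       // EWMA of change frequency, > 0
  userInteractionWeight: number; // e.g. 0..1
  dataImportance: number;        // e.g. 0..1
}

// Adaptive TTL: volatile data expires sooner; important data lives longer.
function adaptiveTtlMs(baseTtlMs: number, s: TtlInputs): number {
  const priorityScore = s.userInteractionWeight * s.dataImportance;
  const raw =
    (baseTtlMs * (1 + priorityScore)) / Math.max(s.volatilityScore, 0.05);
  return Math.min(Math.max(raw, 250), 60_000); // clamp bounds are assumptions
}
```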
Improves:
- adaptive cache lifetime
- frequency-aware invalidation
Limitations:
- requires manual feature design
- domain-specific tuning
- breaks under missing signals
3. Lightweight ML models
Typical approach (a minimal inference sketch follows the lists below):
- logistic regression
- XGBoost / LightGBM
- embedding classifiers
Pros:
- fast inference
- stable behavior
- cheaper than LLMs
Cons:
- needs labeled “optimal cache decision” data (rare)
- retraining pipeline required
- brittle under product changes
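At this layer, inference is just a cheap scoring pass. A logistic-regression sketch; the weights are placeholders for a model trained and exported offline:

```ts
// Placeholder weights; a real model would be fit on logged outcomes.
const WEIGHTS = { volatilityScore: -2.1, accessFrequency: 1.4, bias: 0.3 };

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

// Returns P(serving from cache is the right action) plus a confidence
// signal used later to route ambiguous cases to the fallback layer.
// accessFrequency is encoded numerically (e.g. low=0, medium=0.5, high=1).
function scoreHit(features: { volatilityScore: number; accessFrequency: number }) {
  const z =
    WEIGHTS.bias +
    WEIGHTS.volatilityScore * features.volatilityScore +
    WEIGHTS.accessFrequency * features.accessFrequency;
  const p = sigmoid(z);
  return { p, confidence: Math.abs(p - 0.5) * 2 }; // 0 = unsure, 1 = certain
}
```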
Why all baseline approaches plateau
All classical systems assume:
- feature space is complete
- behavior is stationary
In real systems:
- user behavior is contextual
- volatility depends on UI state
- freshness is semantic, not numeric
- signals are incomplete
Result:
- heuristics → saturate
- ML-light → overfit or drift
Key idea: caching is a decision system under uncertainty
Instead of:
“how long do we cache this?”
The correct formulation is:
“what action should we take given incomplete information?”
actions:
- HIT
- REVALIDATE
- BYPASS
- SWR
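One way to type this action space in TypeScript; the field names are assumptions that the later sketches reuse:

```ts
type CacheAction = "HIT" | "REVALIDATE" | "BYPASS" | "SWR";

interface CacheDecision {
  action: CacheAction;
  ttlMs?: number;     // only meaningful for HIT / SWR
  confidence: number; // 0..1, drives fallback routing
}
```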
Where LLMs fit (and where they don’t)
LLMs are not a replacement layer.
They function as:
a fallback policy engine for the ambiguous region of the decision space
They are useful only when:
- scoring model confidence is low
- signals conflict
- unseen patterns appear
Architecture: layered decision system
```
UI Layer
   ↓
Context Builder
   ↓
Policy Engine
   ├── Rule Layer (deterministic)
   ├── ML Scoring Layer (probabilistic)
   └── LLM Fallback Layer (uncertainty)
   ↓
Cache Layer
   ↓
Network
```
Context model (input abstraction)
All decisions must be based on structured signals:
```json
{
  "key": "user_feed",
  "lastUpdatedMs": 1200,
  "accessFrequency": "high",
  "volatilityScore": 0.82,
  "userAction": "scroll",
  "stalenessToleranceMs": 500
}
```
Important constraint:
- no raw prompts
- only structured features
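In TypeScript, that constraint is just a typed context object mirroring the example above (a sketch):

```ts
interface CacheContext {
  key: string;
  lastUpdatedMs: number;
  accessFrequency: "low" | "medium" | "high";
  volatilityScore: number;      // 0..1, from the heuristic layer
  userAction: string;           // e.g. "scroll"
  stalenessToleranceMs: number;
}
```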
LLM role (strictly bounded)
LLM is only a classifier:
```json
{
  "strategy": "HIT | REVALIDATE | BYPASS | SWR",
  "ttlMs": 1200,
  "confidence": 0.78
}
```
Triggered only when:
- ML confidence < threshold
- feature signals conflict
- unseen context patterns appear
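A sketch of the bounded invocation, assuming a hypothetical `callModel` client and the `CacheContext` / `CacheDecision` types above; the key point is the closed action set and a safe default on any invalid output:

```ts
// Placeholder for whatever inference client is actually in use.
declare function callModel(structuredInput: string): Promise<string>;

const ALLOWED: CacheAction[] = ["HIT", "REVALIDATE", "BYPASS", "SWR"];

async function llmLayer(ctx: CacheContext): Promise<CacheDecision> {
  // Structured features only: the context object itself is the input.
  const raw = await callModel(JSON.stringify(ctx));
  try {
    const out = JSON.parse(raw) as {
      strategy: CacheAction;
      ttlMs: number;
      confidence: number;
    };
    if (ALLOWED.includes(out.strategy)) {
      return { action: out.strategy, ttlMs: out.ttlMs, confidence: out.confidence };
    }
  } catch {
    // malformed JSON falls through to the safe default below
  }
  // Anything outside the closed action set degrades to a conservative revalidate.
  return { action: "REVALIDATE", confidence: 0 };
}
```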
Meta-cache: caching the decision layer
To reduce cost:
decisionCache(contextHash) → strategy
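A minimal sketch; the coarse bucketing in `contextHash` and the 60s decision TTL are assumptions:

```ts
// Meta-cache: memoize expensive policy decisions by context shape.
const decisionCache = new Map<
  string,
  { decision: CacheDecision; expiresAt: number }
>();

function contextHash(ctx: CacheContext): string {
  // Bucket continuous signals so similar contexts share one decision.
  const volatilityBucket = Math.round(ctx.volatilityScore * 10);
  return `${ctx.key}:${ctx.accessFrequency}:${volatilityBucket}:${ctx.userAction}`;
}

function cachedDecision(ctx: CacheContext): CacheDecision | undefined {
  const entry = decisionCache.get(contextHash(ctx));
  return entry && entry.expiresAt > Date.now() ? entry.decision : undefined;
}

function storeDecision(ctx: CacheContext, decision: CacheDecision, ttlMs = 60_000): void {
  decisionCache.set(contextHash(ctx), { decision, expiresAt: Date.now() + ttlMs });
}
```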
Effects:
- avoids repeated LLM calls
- stabilizes latency
- amortizes inference cost
Cost-aware execution pipeline
```
IF rule matches:
    use rule engine
ELSE IF ML confidence > threshold:
    use ML model
ELSE:
    use LLM
```
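As runnable TypeScript, reusing the meta-cache and `llmLayer` sketches above; `ruleLayer`, `mlLayer`, and the threshold value are hypothetical:

```ts
declare function ruleLayer(ctx: CacheContext): CacheDecision | undefined;
declare function mlLayer(ctx: CacheContext): CacheDecision;

const ML_CONFIDENCE_THRESHOLD = 0.75; // tuning value is an assumption

async function decide(ctx: CacheContext): Promise<CacheDecision> {
  const memoized = cachedDecision(ctx);       // meta-cache short-circuit
  if (memoized) return memoized;

  const ruled = ruleLayer(ctx);               // deterministic, cheapest path
  if (ruled) return ruled;

  const scored = mlLayer(ctx);                // fast probabilistic scorer
  if (scored.confidence >= ML_CONFIDENCE_THRESHOLD) return scored;

  const decision = await llmLayer(ctx);       // bounded LLM fallback
  storeDecision(ctx, decision);               // amortize the expensive call
  return decision;
}
```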
Typical production distribution:
- 80–90% rules
- 10–20% ML
- <10% LLM
Failure modes
1. Overuse of LLM
Problem:
- cost spikes
- unpredictable latency
Mitigation:
- strict confidence gating
- bounded invocation layer
2. Latency variance
Problem:
- inconsistent response times in the UI
Mitigation:
- decision caching
- async precomputation
3. Model drift
Problem:
- ML decisions degrade over time
Mitigation:
- feedback loop
- periodic recalibration
Engineering takeaways
- caching is a decision system, not storage optimization
- SWR + heuristics solve the majority of cases
- ML-light is optimal in stable feature spaces
- LLMs are only for ambiguous cases
- production systems require strict routing hierarchy
Conclusion
Client-side caching becomes effective only when modeled as a layered decision system.
- rules handle deterministic cases
- ML handles structured uncertainty
- LLM handles ambiguity
The correct design is a hybrid with strict boundaries and cost control, not an LLM-centric system.
Discussion
Where should the boundary be defined between ML confidence and LLM fallback in production caching systems?