From 0.3% Crash Rate to Zero: Production-Grade Performance
This is Part 3 of a 3-part series.
Catch up on the series:
- Part 1: When SharedPreferences Fails (Resilience & Architecture)
- Part 2: The JWT Token Incident (Security & Encryption)
Our cache now:
- ✅ Survives platform failures (Circuit Breaker → 0% crash rate)
- ✅ Prevents memory leaks (LRU Eviction → 52MB vs 380MB)
- ✅ Protects sensitive data (Hardware-backed encryption)
But when we launched to production, our telemetry showed new problems:
| Metric | Status | Issue |
|---|---|---|
| Cache crashes | ✅ 0% | Circuit breaker works! |
| Security vulnerabilities | ✅ 0 | Encryption works! |
| Login sync time (50 items) | ❌ 2.5s | Users frustrated |
| Large batch operations | ❌ CRASH | TransactionTooLargeException |
| UI reactivity | ❌ Stale | Profile photo update doesn't refresh screen |
These aren't architecture problems; they're operational problems.
Production exposed three critical gaps:
- Performance: Sequential writes are too slow. Naive parallelism crashes the app.
- Concurrency: Race conditions cause data loss during TTL expiration.
- Observability: Cache updates don't trigger UI updates (stale state).
In this part, we'll solve production-scale challenges:
- ⚡ Performance: Chunked batching to respect platform limits (1MB Binder, XPC constraints)
- ⚡ Concurrency: Optimistic locking to prevent race conditions (version-based atomic operations)
- ⚡ Reactivity: Observer Pattern to make cache changes trigger UI updates
- ⚡ Monitoring: Production metrics and circuit breaker health checks
- ⚡ Quality: 3-layer testing strategy (unit, integration, widget tests)
- ⚡ Lessons Learned: What worked, what we'd change
By the end, you'll have a production-ready cache that handles millions of operations with zero downtime.
Full source code available on GitHub
Part 6: Performance - Batching Against Platform Limits
The Sequential Write Problem
When we first implemented setMultiple(), we took the straightforward approach:
```dart
@override
Future<void> setMultiple(
  Map<String, dynamic> items, {
  String? driver,
  Duration? ttl,
}) async {
  for (final entry in items.entries) {
    await set(entry.key, entry.value, driver: driver, ttl: ttl);
  }
}
```
This looks innocent. It's readable. It works.
But in production, when syncing 50 user preferences after login, we measured:
- 2.5 seconds on Android (Pixel 4a)
- 1.8 seconds on iOS (iPhone 11)
Each await blocks the next operation. With 50 items at ~50ms per write, you're looking at 2,500ms of sequential I/O.
The Naive Parallel Solution
Our first optimization attempt was obvious:
```dart
// WORSE: Unbounded parallelism
await Future.wait(items.map((item) => cache.set(item.key, item.value)));
```
On paper, this should be 50x faster (50ms vs 2,500ms).
In reality:
- ✅ Local testing: 50 items in 120ms (20x improvement!)
- ❌ Production (Android): App crashes after 200+ items
- ❌ Production (iOS): Keychain access-denied errors
Deep Dive: Why Unbounded Parallelism Crashes
Flutter communicates with native code via platform channels. These have undocumented hard limits:
Android (MethodChannel - Binder Transaction Limit):
```
Max concurrent calls: ~64
Transaction buffer:   1MB total
When exceeded:        TransactionTooLargeException
Result:               App crash
```
iOS (FlutterMethodChannel - XPC Limit):
```
Max concurrent calls: ~128
Message queue limit:  Varies by iOS version
When exceeded:        Dropped messages (silent failures!)
Result:               Data loss without exceptions
```
When we launched 1,000 concurrent platform channel calls, we exhausted these system resources. The crash wasn't in our Dart code; it was in the native bridge layer.
Problem Breakdown:

1. Memory Exhaustion: Each Future holds:
   - Serialized data (strings)
   - Platform channel buffers (native memory)
   - Event loop callbacks
   - Stack frames

   For 1,000 items, we saw 500MB+ memory spikes and GC pauses causing UI jank.

2. Platform Channel Saturation: The Android Binder has a 1MB transaction buffer limit shared across all IPC. iOS XPC has similar constraints. When exceeded, writes silently fail without throwing exceptions.

3. Secure Storage Rate Limiting:
   - iOS Keychain: ~100 writes/second (error: errSecInteractionNotAllowed -25308)
   - Android KeyStore: ~50 writes/second (error: android.security.KeyStoreException)
These aren't documented by Flutter or the platform vendors; we discovered them through production telemetry and crash reports.
Understanding Platform Limits
Android Binder Transaction Buffer:
The Binder buffer is shared across all system services (not just your app). If another app is also using Binder heavily, your app's available buffer shrinks.
iOS XPC Message Limits:
```
┌──────────────────────────────────────────┐
│ XPC Message Queue (per connection)       │
├──────────────────────────────────────────┤
│ Message 1:   Keychain write              │
│ Message 2:   Keychain write              │
│ ...                                      │
│ Message 128: Keychain write              │
├──────────────────────────────────────────┤
│ Total: ~128 messages in flight           │
│ Message 129: DROPPED (no error!) ❌      │
└──────────────────────────────────────────┘
```
XPC drops messages instead of crashing. This means data loss without any visible exception in your Dart code.
The Solution: Chunked Parallel Writes
After profiling across 15 device models, we found the sweet spot:
```dart
Future<void> setMultiple(
  Map<String, dynamic> items, {
  String? driver,
  Duration? ttl,
}) async {
  // Tune per platform and driver type
  final chunkSize = _getOptimalChunkSize(driver);
  final entries = items.entries.toList();

  for (var i = 0; i < entries.length; i += chunkSize) {
    final chunk = entries.skip(i).take(chunkSize);

    // Parallel within chunk, sequential between chunks
    await Future.wait(
      chunk.map((entry) =>
          set(entry.key, entry.value, driver: driver, ttl: ttl)),
    );
  }
}

int _getOptimalChunkSize(String? driver) {
  switch (driver) {
    case 'secure_storage':
      return 10; // Keychain/KeyStore rate limits
    case 'shared_prefs':
      return 50; // Disk I/O sweet spot
    case 'memory':
      return 100; // CPU-bound, can go higher
    default:
      return 50; // Conservative default
  }
}
```
Why This Works:
- Memory-Bounded: Only N futures in flight at once (vs unbounded)
- Platform-Safe: Stays under Binder/XPC channel limits
- Rate-Limit Compliant: Respects native storage throttling
- Back-Pressure: Chunks act as natural flow control
Visualizing the Chunking Strategy

```
Sequential (original):  w1 → w2 → w3 → … → w50            (each write waits for the previous)
Chunked parallel:       [w1 … w50] → [w51 … w100] → …     (parallel inside a chunk, sequential between chunks)
```
Production Results:
| Scenario | Sequential | Naive Parallel | Chunked Parallel |
|---|---|---|---|
| 50 items (shared_prefs) | 2,500ms | 120ms | 140ms ✅ |
| 500 items (shared_prefs) | 25,000ms | CRASH ❌ | 1,200ms ✅ |
| 50 items (secure_storage) | 12,000ms | ERROR ❌ | 2,500ms ✅ |
| 1,000 items (memory) | 8,000ms | 180ms* | 120ms ✅ |
*Naive parallel succeeded in memory driver but used 400MB RAM
Architectural Insight: This is a classic distributed-systems problem applied to mobile. In backend systems, you use circuit breakers and bulkheads to prevent cascading failures. On mobile, the constraint isn't network latency; it's shared OS resources (memory, file handles, Keychain locks).
Chunking is our bulkhead pattern. It isolates failures:
- If chunk 5 fails, chunks 1-4 succeeded
- We can retry chunk 5 without redoing all work
- Memory pressure is predictable and bounded
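To make that retry isolation concrete, here's a minimal sketch, assuming a write callback shaped like the cache's set(); setMultipleWithRetry, its parameters, and the backoff values are illustrative, not part of the real API:

```dart
// Sketch: retry only the failed chunk, never redoing completed chunks.
// `write` is a hypothetical stand-in for a driver write like set().
Future<void> setMultipleWithRetry(
  Map<String, dynamic> items,
  Future<void> Function(String key, dynamic value) write, {
  int chunkSize = 50,
  int maxRetries = 2,
}) async {
  final entries = items.entries.toList();
  for (var i = 0; i < entries.length; i += chunkSize) {
    final chunk = entries.skip(i).take(chunkSize).toList();
    var attempt = 0;
    while (true) {
      try {
        // Parallel within the chunk, sequential between chunks.
        await Future.wait(chunk.map((e) => write(e.key, e.value)));
        break; // Chunk succeeded; earlier chunks are never redone.
      } catch (_) {
        if (++attempt > maxRetries) rethrow;
        // Brief backoff before retrying just this chunk.
        await Future.delayed(Duration(milliseconds: 50 * attempt));
      }
    }
  }
}
```

Because writes are idempotent (re-setting a key is harmless), retrying a whole chunk after a partial failure is safe.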
Benchmarking Chunk Sizes
We benchmarked different chunk sizes on various device tiers:
SharedPreferences Driver:
| Chunk Size | 50 items | 500 items | 1,000 items | Memory Peak |
|---|---|---|---|---|
| 10 | 180ms | 2,100ms | 4,300ms | 45MB |
| 25 | 152ms | 1,450ms | 2,800ms | 78MB |
| 50 | 140ms | 1,200ms | 2,350ms | 95MB |
| 100 | 138ms | CRASH | CRASH | 180MB |
SecureStorage Driver:
| Chunk Size | 50 items | 500 items | Memory Peak |
|---|---|---|---|
| 5 | 3,200ms | 35,000ms | 42MB |
| 10 | 2,500ms | 26,000ms | 58MB |
| 20 | 2,450ms | ERROR | 95MB |
| 50 | ERROR | ERROR | - |
Key Findings:
- SharedPrefs: 50 items/chunk optimal (disk I/O saturates after this)
- SecureStorage: 10 items/chunk optimal (Keychain rate-limits kick in)
- Memory: 100 items/chunk optimal (pure CPU, no I/O bottleneck)
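If you want to reproduce these numbers on your own devices, a minimal Stopwatch harness along these lines works; benchmarkChunkSize is a hypothetical helper, and the write callback stands in for a real driver write:

```dart
// Sketch: time chunked writes for a given chunk size.
// `write` is a stand-in for a real cache/driver write.
Future<int> benchmarkChunkSize(
  int chunkSize,
  int itemCount,
  Future<void> Function(String key) write,
) async {
  final sw = Stopwatch()..start();
  final keys = List.generate(itemCount, (i) => 'bench_$i');
  for (var i = 0; i < keys.length; i += chunkSize) {
    // Same pattern as setMultiple: parallel chunk, then next chunk.
    await Future.wait(keys.skip(i).take(chunkSize).map(write));
  }
  sw.stop();
  return sw.elapsedMilliseconds;
}
```

Run it across [10, 25, 50, 100] on your lowest-tier supported device; absolute numbers will differ from the tables above.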
Part 8: The Observer Pattern - Making Cache Observable
The Problem: Stale UI State
Traditional caches are black boxes:
- Widget A updates the cache
- Widget B still shows old data
- You have to manually refresh Widget B
This leads to stale UI and bugs like:
- User updates profile photo → API succeeds → Cache updates → Profile screen doesn't refresh
- Settings changed → Cache updated → App still uses old settings
The Solution: Reactive Events (Pub/Sub)
We implemented an Observer Pattern using Dart Streams:
```dart
// lib/core/cache/domain/events/cache_event.dart
enum CacheEventType {
  created,
  updated,
  removed,
  expired,
  cleared,
}

class CacheEvent {
  final String key;
  final CacheEventType type;
  final dynamic value;
  final dynamic oldValue;
  final DateTime timestamp;

  const CacheEvent({
    required this.key,
    required this.type,
    this.value,
    this.oldValue,
    required this.timestamp,
  });

  bool get isCreated => type == CacheEventType.created;
  bool get isUpdated => type == CacheEventType.updated;
  bool get isRemoved => type == CacheEventType.removed;
  bool get isExpired => type == CacheEventType.expired;
}
```
Implementation:
```dart
// lib/core/cache/utils/cache_subscription_manager.dart
class CacheSubscriptionManager {
  final Map<String, StreamController<CacheEvent>> _controllers = {};

  Stream<T?> watch<T>(String key) {
    _controllers.putIfAbsent(
      key,
      () => StreamController<CacheEvent>.broadcast(),
    );
    return _controllers[key]!.stream.map((event) => event.value as T?);
  }

  void notify(CacheEvent event) {
    _controllers[event.key]?.add(event);
  }

  void dispose(String key) {
    _controllers[key]?.close();
    _controllers.remove(key);
  }
}
```
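One caveat with this design: broadcast streams don't replay past events, so a widget that starts watching after the last write sees nothing until the next mutation. A hedged sketch of one fix, seeding the stream with the current cached value first (the read callback is a hypothetical stand-in for a cache lookup, and watchWithInitial is not part of the real API):

```dart
// Sketch: emit the latest cached value immediately, then live updates.
// `read` is a hypothetical lookup; `liveUpdates` stands in for watch().
Stream<T?> watchWithInitial<T>(
  String key,
  Stream<T?> liveUpdates,
  Future<T?> Function(String key) read,
) async* {
  yield await read(key); // Late subscribers still get the current value
  yield* liveUpdates;    // Then every subsequent cache event
}
```

With this, a StreamBuilder mounted long after the last set() still renders the cached value instead of a placeholder.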
Integration with Cache Operations
Every cache mutation triggers an event:
```dart
@override
Future<void> set<T>(String key, T value, {String? driver, Duration? ttl}) async {
  // Get old value for event
  dynamic oldValue;
  try {
    oldValue = await get<T>(key, driver: driver);
  } catch (_) {
    // Key doesn't exist yet
  }

  // Perform write
  final targetDriver = _manager.getDriver(driver);
  final serialized = CacheSerializer.serialize(value);
  await targetDriver.set(key, serialized);

  // Set TTL if provided
  if (ttl != null && _config?.enableTTL == true) {
    _ttl.set(key, ttl);
  }

  // Notify subscribers
  _subscriptions.notify(CacheEvent(
    key: key,
    type: oldValue == null ? CacheEventType.created : CacheEventType.updated,
    value: value,
    oldValue: oldValue,
    timestamp: DateTime.now(),
  ));
}
```
Usage in UI:
```dart
class ProfileScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        if (!snapshot.hasData) return CircularProgressIndicator();
        final user = snapshot.data!;
        return Column(
          children: [
            CircleAvatar(backgroundImage: NetworkImage(user.photoUrl)),
            Text(user.name),
          ],
        );
      },
    );
  }
}

// Anywhere in the app:
await Cache.set('current_user', updatedUser);
// ↑ This automatically triggers the StreamBuilder to rebuild!
```
How Broadcast Streams Work
We use StreamController.broadcast() instead of regular streams:
```dart
// Regular stream (single listener)
final controller = StreamController<CacheEvent>();
controller.stream.listen((event) { }); // OK
controller.stream.listen((event) { }); // ERROR: Already has listener

// Broadcast stream (multiple listeners)
final broadcastController = StreamController<CacheEvent>.broadcast();
broadcastController.stream.listen((event) { }); // OK
broadcastController.stream.listen((event) { }); // OK - multiple listeners allowed
```
This allows:
- Multiple widgets subscribing to the same cache key
- Different parts of the app reacting to the same data changes
- No coupling between widgets (they don't know about each other)
Real-World Usage Patterns
Pattern 1: Profile Photo Update
```dart
// ProfileEditScreen.dart
class ProfileEditScreen extends StatelessWidget {
  Future<void> _updatePhoto(File photo) async {
    // Upload to API
    final photoUrl = await api.uploadPhoto(photo);

    // Update user object
    final user = await Cache.get<User>('current_user');
    final updated = user.copyWith(photoUrl: photoUrl);

    // Cache update triggers all listeners
    await Cache.set('current_user', updated);

    // No need to manually update UI!
    // ProfileScreen automatically rebuilds via StreamBuilder
  }
}

// ProfileScreen.dart (automatically updates)
class ProfileScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        // Rebuilds when 'current_user' changes
        return CircleAvatar(
          backgroundImage: NetworkImage(snapshot.data!.photoUrl),
        );
      },
    );
  }
}

// AppDrawer.dart (also automatically updates)
class AppDrawer extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        // Also rebuilds when 'current_user' changes
        return DrawerHeader(
          child: CircleAvatar(
            backgroundImage: NetworkImage(snapshot.data!.photoUrl),
          ),
        );
      },
    );
  }
}
```
Pattern 2: Feature Flags
```dart
// Remote config updates feature flags
void onRemoteConfigFetched(Map<String, dynamic> flags) async {
  await Cache.set('feature_flags', flags);
  // All widgets watching 'feature_flags' rebuild automatically
}

// Multiple screens react
class HomeScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<Map<String, dynamic>?>(
      stream: Cache.watch<Map<String, dynamic>>('feature_flags'),
      builder: (context, snapshot) {
        final flags = snapshot.data ?? {};
        if (flags['new_ui_enabled'] == true) {
          return NewHomeUI();
        }
        return LegacyHomeUI();
      },
    );
  }
}
```
This effectively turns our Cache into a lightweight State Management solution for global app data, similar to:
- Redux/Riverpod (but simpler)
- Provider (but with persistence)
- GetX (but with Clean Architecture)
When to use Cache.watch() vs Provider/Riverpod:
- Use Cache.watch() for: User session, app config, feature flags (data that needs persistence)
- Use Provider/Riverpod for: UI state, form state, navigation state (ephemeral data)
Performance Considerations
Question: Won't this rebuild widgets too often?
Answer: No, because:
- Dart Streams are lazy: StreamBuilder only listens while the widget is mounted
- Flutter rebuilds are cheap: widget-tree diffing makes unnecessary rebuilds fast
- You control granularity: Subscribe to specific keys, not the entire cache
Benchmarks:
```dart
// Worst case: 1,000 cache writes/second
for (var i = 0; i < 1000; i++) {
  await Cache.set('counter', i);
}
// Result: StreamBuilder rebuilds 1,000 times
// Performance impact: ~5ms total (0.005ms per rebuild)
// UI remains smooth (60fps)
```
Flutter's widget-tree diffing means that if the actual UI output is identical, no expensive repaints occur.
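If identical values are written repeatedly (say, the same feature-flag map on every remote-config fetch), you can suppress the redundant events with Dart's built-in distinct(). A small sketch; the event stream here stands in for a Cache.watch() stream:

```dart
// Sketch: drop events whose value equals the previous one, so
// StreamBuilder only fires on real changes.
Stream<T?> dedupe<T>(Stream<T?> events) => events.distinct();
```

In a widget, that's simply stream: Cache.watch&lt;User&gt;('current_user').distinct() — note that distinct() relies on the value type implementing ==.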
Part 9: Optimistic Locking - Solving Race Conditions
The Race Condition Scenario
Consider this timeline with concurrent operations:
```
Time   Thread A                       Thread B
──────────────────────────────────────────────────────────────
0ms    get('api_response')
1ms    └─ Sees TTL expired
2ms                                   set('api_response', fresh_data)
3ms                                   └─ Updates version to 1
4ms    remove('api_response')
5ms    └─ Deletes fresh data! ❌
```
Result: Thread A deletes the fresh data Thread B just wrote, causing data loss.
The Solution: Versioning with Optimistic Locking
We attached a version counter to every TTL entry:
```dart
// lib/core/cache/utils/cache_ttl.dart
class TTLEntry {
  final DateTime expiresAt;
  final int version;

  TTLEntry(this.expiresAt, this.version);

  TTLEntry copyWithNewVersion() => TTLEntry(expiresAt, version + 1);
}

class CacheTTL {
  final Map<String, TTLEntry> _ttlMap = {};
  int _globalVersion = 0;

  void set(String key, Duration ttl) {
    _ttlMap[key] = TTLEntry(DateTime.now().add(ttl), _globalVersion++);
  }

  /// Check if expired and return entry for atomic operations
  TTLEntry? getIfExpired(String key) {
    if (!_ttlMap.containsKey(key)) return null;
    final entry = _ttlMap[key]!;
    if (DateTime.now().isAfter(entry.expiresAt)) {
      return entry; // Return entry with version for atomic check
    }
    return null;
  }

  /// Remove only if version matches (atomic operation)
  bool removeIfVersionMatches(String key, int version) {
    final entry = _ttlMap[key];
    // Atomic check: only delete if the version hasn't changed
    if (entry != null && entry.version == version) {
      _ttlMap.remove(key);
      log('TTL EXPIRED (atomic): $key', name: 'Cache');
      return true;
    }
    return false; // Version changed, key was updated! Abort deletion.
  }
}
```
How it's used in CacheImpl:
```dart
@override
Future<T?> get<T>(String key, {String? driver}) async {
  // Check for expiration with atomic version tracking
  final expiredEntry = _ttl.getIfExpired(key);
  if (expiredEntry != null) {
    // Get value before removing for event notification
    dynamic lastValue;
    try {
      final targetDriver = _manager.getDriver(driver);
      final raw = await targetDriver.get(key);
      if (raw != null) {
        lastValue = CacheSerializer.deserialize<T>(raw);
      }
    } catch (_) {
      // Ignore errors when getting expired value
    }

    // Atomic removal - only if version hasn't changed
    if (_ttl.removeIfVersionMatches(key, expiredEntry.version)) {
      await _manager.getDriver(driver).remove(key);

      // Notify subscribers about expiration
      _subscriptions.notify(CacheEvent(
        key: key,
        type: CacheEventType.expired,
        oldValue: lastValue,
        timestamp: DateTime.now(),
      ));

      throw CacheTTLExpiredException(
        key: key,
        expiredAt: expiredEntry.expiresAt,
      );
    }
    // else: Version changed, key was updated - continue to get fresh data
  }

  // ... rest of get() logic
}
```
Timeline with Optimistic Locking:
```
Time   Thread A                             Thread B
────────────────────────────────────────────────────────────────────
0ms    get('api_response')
1ms    └─ Sees TTL expired (version 0)
2ms                                         set('api_response', fresh_data)
3ms                                         └─ Updates version to 1
4ms    removeIfVersionMatches(key, 0)
5ms    └─ Check fails (version is now 1) ✅
6ms    └─ Aborts deletion, returns fresh data
```
Result: Data consistency guaranteed, no data loss.
Understanding Optimistic Locking
Optimistic Locking assumes conflicts are rare, so it doesn't lock resources upfront. Instead:
- Read data with version number
- Process data (do work)
- Write back only if version hasn't changed
- Retry if version mismatch (conflict detected)
Compare to Pessimistic Locking:
```dart
// Pessimistic (traditional databases)
lock.acquire(); // Block other threads
final data = read('key');
process(data);
write('key', data);
lock.release(); // Allow other threads
```
Optimistic (our approach):
```dart
// No lock! Multiple threads can read simultaneously
final entry = getWithVersion('key'); // Read + version
process(entry.data);

// Only check version when writing
if (currentVersion == entry.version) {
  write('key', data); // Success
} else {
  retry(); // Version changed, someone else updated it
}
```
Why Optimistic for Mobile?
- No thread blocking: UI remains responsive (no frozen screens)
- Better battery life: No spinning on locks
- Conflicts are rare: Mobile apps usually have single-user, single-device access
- Simpler code: No need for mutexes, semaphores, or deadlock prevention
This is the same pattern used in:
- Database transactions (Optimistic Concurrency Control)
- Version control systems (Git merge conflicts)
- Distributed caches (Redis WATCH command)
- RESTful APIs (ETags for conditional updates)
When Optimistic Locking Fails (and How to Handle It)
Scenario: High Contention
```dart
// 10 threads all trying to update the same key
for (var i = 0; i < 10; i++) {
  Future(() async {
    final entry = _ttl.getIfExpired('hot_key');
    if (entry == null) return; // Not expired (or already removed) - nothing to do

    // ... do work ...

    final success = _ttl.removeIfVersionMatches('hot_key', entry.version);
    if (!success) {
      // Version changed - retry
      await Future.delayed(Duration(milliseconds: 10));
      // Retry logic here
    }
  });
}
```
When does this happen in production?
- Background sync + user action on same data
- Multiple API calls updating same cache key
- TTL expiration + manual refresh simultaneously
Solution: Exponential Backoff
```dart
Future<T?> getWithRetry<T>(String key, {int maxRetries = 3}) async {
  for (var attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await get<T>(key);
    } on CacheTTLExpiredException {
      if (attempt == maxRetries - 1) rethrow;
      // Exponential backoff: 10ms, 20ms, 40ms
      await Future.delayed(Duration(milliseconds: 10 * (1 << attempt)));
    }
  }
  return null;
}
```
Part 13: Production Metrics & Monitoring
We built a health check endpoint that exposes system status:
```dart
@override
Future<Map<String, dynamic>> stats() async {
  final stats = <String, dynamic>{
    'defaultDriver': _manager.defaultDriver,
    'availableDrivers': _manager.drivers.keys.map((k) => k.value).toList(),
    'driverHealth': _manager.driverHealth,
    'memorySize': await size(driver: 'memory'),
    'sharedPrefsSize': await size(driver: 'shared_prefs'),
    'secureStorageSize': await size(driver: 'secure_storage'),
    'config': _manager.config.toString(),
  };
  return stats;
}
```
Example output:
```json
{
  "defaultDriver": "shared_prefs",
  "availableDrivers": ["memory", "shared_prefs", "secure_storage"],
  "driverHealth": {
    "memory": true,
    "shared_prefs": false,
    "secure_storage": true
  },
  "memorySize": 42,
  "sharedPrefsSize": 0,
  "secureStorageSize": 3,
  "config": "CacheConfig(ttl: true, maxKeyLength: 250, logFallbacks: true)"
}
```
Monitoring Strategy:
```dart
void main() async {
  await Cache.initialize(config: CacheConfig.defaults());

  // Send cache health to analytics on app launch
  final stats = await Cache.stats();
  Analytics.track('cache_health', stats);

  // Alert if the SharedPrefs circuit is open
  if (stats['driverHealth']['shared_prefs'] == false) {
    Analytics.track('cache_circuit_breaker_triggered', {
      'driver': 'shared_prefs',
      'fallback': 'memory',
    });
  }

  runApp(MyApp());
}
```
Production Value:
A spike in shared_prefs: false alerts us to platform-specific issues before users complain. We can correlate with:
- Device models (Pixel 6 Pro having issues?)
- OS versions (Android 14 bug?)
- App versions (Did our last update break something?)
Key Metrics to Track
1. Circuit Breaker State:
- driver_availability_rate: % of time each driver is available
- fallback_events_count: how often fallbacks occur
- circuit_open_duration: how long circuits stay open

2. Performance Metrics:
- cache_hit_rate: % of successful cache retrievals
- cache_miss_rate: % of cache misses (fetch from source)
- avg_read_latency: average time for get() operations
- avg_write_latency: average time for set() operations

3. Memory Metrics:
- lru_evictions_count: how often LRU evicts entries
- memory_cache_size: current number of entries in memory
- memory_pressure_events: iOS memory warnings received

4. Error Metrics:
- serialization_errors: malformed data count
- ttl_expirations: how many items expire (vs. manually removed)
- keystore_failures: secure storage access failures
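A minimal in-app recorder for a few of these counters could look like the sketch below; CacheMetrics and its fields are illustrative names, not part of the cache's real API:

```dart
// Sketch: a tiny in-memory metrics recorder for hit rate and latency.
// All names here are hypothetical, for illustration only.
class CacheMetrics {
  int hits = 0;
  int misses = 0;
  final List<int> readLatenciesMs = [];

  void recordRead({required bool hit, required int latencyMs}) {
    hit ? hits++ : misses++;
    readLatenciesMs.add(latencyMs);
  }

  double get hitRate {
    final total = hits + misses;
    return total == 0 ? 0 : hits / total;
  }

  double get avgReadLatencyMs => readLatenciesMs.isEmpty
      ? 0
      : readLatenciesMs.reduce((a, b) => a + b) / readLatenciesMs.length;
}
```

Flushing these counters to your analytics backend once per session keeps the overhead negligible while still surfacing trends.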
Alerting Rules
Critical Alerts:
```dart
if (stats['driverHealth']['shared_prefs'] == false) {
  // Circuit breaker triggered - disk storage down
  sendAlert('P1: Cache degraded to memory-only mode');
}

if (stats['memorySize'] > 5000) {
  // LRU limit might be too high for device
  sendAlert('P2: Memory cache approaching limit');
}

if (errorRate > 0.01) {
  // More than 1% of operations failing
  sendAlert('P1: Cache error rate elevated');
}
```
Warning Alerts:
```dart
if (cacheHitRate < 0.80) {
  // Low hit rate = users fetching from network too often
  sendAlert('P3: Cache hit rate below 80%');
}

if (avgWriteLatency > 500) {
  // Slow writes = poor UX
  sendAlert('P3: Cache write latency elevated');
}
```
Part 14: Testing Architecture - From Theory to Practice
One of the biggest benefits of this Clean Architecture approach is testability. We employ a 3-Layer Testing Strategy.
Layer 1: Unit Tests (Domain Logic - Pure Dart)
Testing pure logic without mocking platform channels or Flutter framework.
Testing Optimistic Locking for Race Conditions:
```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:flutter_production_architecture/core/cache/utils/cache_ttl.dart';

void main() {
  group('CacheTTL - Optimistic Locking', () {
    late CacheTTL ttl;

    setUp(() {
      ttl = CacheTTL(enabled: true);
    });

    test('Race condition: Stale version should not delete fresh data', () {
      // Simulate two threads operating on the same key

      // Thread A: Set key with TTL of 1 second (version 0)
      ttl.set('api_response', Duration(seconds: 1));

      // Capture the expired entry info (what Thread A sees)
      final expiredEntry = ttl.getIfExpired('api_response');
      expect(expiredEntry, isNull); // Not expired yet

      // Simulate time passing (Thread A goes to check expiration)
      // Meanwhile, Thread B updates the key with fresh data
      ttl.set('api_response', Duration(seconds: 10)); // version 1

      // Now Thread A finally checks expiration (after network delay)
      // Thread A tries to remove with its stale version (0)
      final removed = ttl.removeIfVersionMatches('api_response', 0);

      // Should FAIL because version is now 1 (Thread B updated it)
      expect(removed, false);

      // The key should still exist with fresh TTL
      expect(ttl.isExpired('api_response'), false);
    });

    test('Successful removal when version matches', () async {
      ttl.set('key', Duration(milliseconds: 10));

      // Wait for expiration
      await Future.delayed(Duration(milliseconds: 50));

      final expiredEntry = ttl.getIfExpired('key');
      expect(expiredEntry, isNotNull);

      // Remove with correct version
      final removed = ttl.removeIfVersionMatches('key', expiredEntry!.version);
      expect(removed, true);

      // Entry should be gone
      expect(ttl.isExpired('key'), false); // No longer tracked
    });
  });
}
```
Why this matters: This test documents our architectural decision better than any comment could. It proves the race condition is handled correctly.
Layer 2: Integration Tests (Data Layer - Platform Interaction)
Testing the Circuit Breaker logic by simulating platform failures.
```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:shared_preferences/shared_preferences.dart';
import 'package:flutter_production_architecture/core/cache/data/repositories/cache_repository_impl.dart';
import 'package:flutter_production_architecture/core/cache/domain/entities/cache_config.dart';

void main() {
  group('CacheImpl - Circuit Breaker', () {
    test('Falls back to memory when SharedPreferences write fails', () async {
      // Arrange: Create a cache with default driver = shared_prefs
      SharedPreferences.setMockInitialValues({}); // Start clean

      final cache = await CacheImpl.create(
        defaultDriver: 'shared_prefs',
        config: CacheConfig(logFallbacks: true),
      );

      // In a real scenario, SharedPreferences might fail due to:
      // - Disk full
      // - Permissions error
      // - Corrupt storage
      // Our circuit breaker catches this and falls back to memory

      // Act: Write to cache
      await cache.set('test_key', 'test_value');

      // Assert: Data should be retrievable (from memory fallback if needed)
      final value = await cache.get<String>('test_key');
      expect(value, 'test_value');

      // Verify circuit breaker health
      final stats = await cache.stats();
      expect(stats['driverHealth'], isNotNull);
    });

    test('Memory driver always works as last resort', () async {
      final cache = await CacheImpl.create(defaultDriver: 'memory');

      // Memory driver should never fail
      await cache.set('key', 'value');
      final value = await cache.get<String>('key');
      expect(value, 'value');
    });
  });
}
```
Production bugs this caught:
- Disk full errors on devices with <500MB free space
- SharedPreferences corruption after force-stop during write
- SecureStorage unavailable on Android emulators without Google Play Services
The circuit breaker prevented 100% of potential cache crashes.
Layer 3: Widget Tests (Presentation Layer - UI Reactivity)
Testing if the UI reacts to cache changes via the Observer Pattern.
```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:flutter_production_architecture/core/cache/presentation/cache_facade.dart';

void main() {
  group('Cache Subscriptions - Widget Updates', () {
    testWidgets('Widget rebuilds when subscribed cache key changes', (tester) async {
      // Arrange: Create a test app with a widget that subscribes to cache
      await tester.pumpWidget(
        MaterialApp(
          home: Scaffold(
            body: Builder(
              builder: (context) {
                return StreamBuilder<String?>(
                  stream: Cache.watch<String>('user_name'),
                  builder: (context, snapshot) {
                    return Text(snapshot.data ?? 'No user');
                  },
                );
              },
            ),
          ),
        ),
      );

      // Initial state
      expect(find.text('No user'), findsOneWidget);

      // Act: Update cache
      await Cache.set('user_name', 'Alice');
      await tester.pump(); // Process the stream event

      // Assert: Widget should reflect new value
      expect(find.text('Alice'), findsOneWidget);
      expect(find.text('No user'), findsNothing);

      // Act: Update again
      await Cache.set('user_name', 'Bob');
      await tester.pump();

      // Assert: Should show updated value
      expect(find.text('Bob'), findsOneWidget);
      expect(find.text('Alice'), findsNothing);
    });

    testWidgets('Multiple widgets can subscribe to the same key', (tester) async {
      await tester.pumpWidget(
        MaterialApp(
          home: Column(
            children: [
              StreamBuilder<String?>(
                stream: Cache.watch<String>('count'),
                builder: (context, snapshot) => Text('Widget1: ${snapshot.data ?? "0"}'),
              ),
              StreamBuilder<String?>(
                stream: Cache.watch<String>('count'),
                builder: (context, snapshot) => Text('Widget2: ${snapshot.data ?? "0"}'),
              ),
            ],
          ),
        ),
      );

      // Both should start at default
      expect(find.text('Widget1: 0'), findsOneWidget);
      expect(find.text('Widget2: 0'), findsOneWidget);

      // Update cache
      await Cache.set('count', '42');
      await tester.pump();

      // Both should update
      expect(find.text('Widget1: 42'), findsOneWidget);
      expect(find.text('Widget2: 42'), findsOneWidget);
    });
  });
}
```
Real bug this caught: User updates their profile photo → API call succeeds → Cache updates → Profile screen doesn't refresh (shows the old photo). We forgot to subscribe the widget to cache changes. This test would have prevented the bug from reaching production.
Coverage Targets
```shell
flutter test --coverage
genhtml coverage/lcov.info -o coverage/html
open coverage/html/index.html
```
Our targets:
- Domain layer (cache_ttl.dart, cache_validator.dart): 100% coverage ✅
- Data layer (cache_repository_impl.dart): 95%+ coverage ✅
- Presentation layer (cache_facade.dart): 80%+ coverage ✅
What we DON'T test:
- Platform plugin code (SharedPreferences, SecureStorage) - trust the package maintainers
- Flutter framework internals (StreamBuilder) - trust the Flutter team
Key Insight: Tests are executable documentation. When someone asks "Why optimistic locking?", the answer is: "Run the race condition test. Watch it fail without locking, pass with it."
Part 15: Lessons Learned - What We'd Do Differently
What Worked Well ✅
1. Circuit Breaker Pattern:
   - Zero cache-related crashes in 8+ months of production
   - Graceful degradation on 100% of storage failures

2. Interface Segregation (Clean Architecture):
   - Unit tests run in 5ms instead of 500ms (no platform channels)
   - Business logic is 100% platform-independent

3. Optimistic Locking:
   - Eliminated race conditions in high-concurrency flows
   - No data loss in TTL expiration scenarios

4. LRU Eviction:
   - Memory stable at 52MB vs 380MB before
   - 0% OOM crashes from unbounded cache growth

5. Observable Cache:
   - UI automatically syncs with cache changes
   - Replaced the need for separate state management in many cases
What We'd Improve 🔧
1. Compression:
   - Problem: Large JSON payloads (10KB+) waste disk space
   - Future solution: Add gzip compression for values >5KB
   - Implementation: Transparent in CacheSerializer
   - Trade-off: 15% CPU overhead for 70% space savings

2. Encryption-at-Rest:
   - Problem: Only SecureStorage is encrypted; SharedPrefs is plain text
   - Future solution: Encrypt SharedPreferences data by default
   - Trade-off: Performance hit (5ms → 15ms per operation)
   - API: CacheConfig(encryptSharedPrefs: true)

3. Cache Warming:
   - Problem: App starts cold; critical data is loaded on demand
   - Future solution: Preload critical data during the splash screen
   - Implementation: await Cache.warmup(['current_user', 'app_config', 'feature_flags']);

4. Selective Eviction:
   - Problem: LRU evicts by access time, not importance
   - Future solution: Priority-based eviction (pin critical keys)
   - API: await Cache.set('user', user, priority: CachePriority.high);

5. Cross-Tab Synchronization:
   - Problem: Multiple instances (web) don't sync cache changes
   - Future solution: Use BroadcastChannel (web) or IsolateChannel (mobile)
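The compression idea from item 1 above could be prototyped inside the serializer roughly like this; the 5KB threshold and the 'gz:' marker prefix are assumptions, not the shipped design:

```dart
import 'dart:convert';
import 'dart:io';

// Sketch: gzip-compress serialized values above a size threshold.
// The 5KB cutoff and the 'gz:' prefix are illustrative assumptions.
const _compressionThreshold = 5 * 1024;

String compressIfLarge(String json) {
  if (json.length <= _compressionThreshold) return json;
  final compressed = gzip.encode(utf8.encode(json));
  // Prefix marks the value as compressed; base64 keeps it string-safe.
  return 'gz:${base64Encode(compressed)}';
}

String decompress(String stored) {
  if (!stored.startsWith('gz:')) return stored; // Was stored uncompressed
  final bytes = gzip.decode(base64Decode(stored.substring(3)));
  return utf8.decode(bytes);
}
```

Note the trade-off from the list applies: every large read now pays a decompress cost, so this only helps for payloads that are big, repetitive, and read infrequently.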
Part 16: Putting It All Together - Complete Usage Guide
Basic CRUD Operations
// Initialize once in main()
await Cache.initialize(config: CacheConfig.defaults());
// Store data (auto-serialized)
await Cache.set<User>('current_user', user);
await Cache.set<List<String>>('tags', ['flutter', 'dart', 'mobile']);
// Retrieve data (type-safe)
final user = await Cache.get<User>('current_user');
final tags = await Cache.get<List<String>>('tags');
// Check existence
if (await Cache.has('current_user')) {
print('User cached');
}
// Remove data
await Cache.remove('current_user');
// Clear all
await Cache.clear();
Secure Storage (Encrypted)
// Automatically uses FlutterSecureStorage (Keychain/KeyStore)
await Cache.secure.set('jwt_token', 'abc123');
await Cache.secure.set('api_key', 'sk_live_...');
// Retrieve
final token = await Cache.secure.get<String>('jwt_token');
// Remove
await Cache.secure.remove('jwt_token');
TTL (Time-To-Live)
// Cache with 1-hour expiration
await Cache.set('api_response', response, ttl: Duration(hours: 1));
// After 1 hour, this throws CacheTTLExpiredException
try {
final response = await Cache.get('api_response');
} on CacheTTLExpiredException catch (e) {
print('Data expired at: ${e.expiredAt}');
// Fetch fresh data
}
Observable Cache (Reactive UI)
// Subscribe to changes
Cache.watch<User>('current_user').listen((user) {
print('User updated: ${user?.name}');
});
// Update from anywhere
await Cache.set('current_user', updatedUser);
// ✅ All subscribers automatically notified!
// Use in widgets
class ProfileWidget extends StatelessWidget {
@override
Widget build(BuildContext context) {
return StreamBuilder<User?>(
stream: Cache.watch<User>('current_user'),
builder: (context, snapshot) {
if (!snapshot.hasData) return CircularProgressIndicator();
return Text(snapshot.data!.name);
},
);
}
}
Error Handling
try {
final user = await Cache.get<User>('user');
} on CacheMissException {
// Key doesn't exist - fetch from API
final user = await api.fetchUser();
await Cache.set('user', user);
} on CacheTTLExpiredException catch (e) {
// Data expired - refresh it
print('Expired at: ${e.expiredAt}');
final user = await api.fetchUser();
await Cache.set('user', user, ttl: Duration(hours: 1));
} on CacheSerializationException catch (e) {
// Data corrupt - clear and refetch
print('Corrupt data for type: ${e.type}');
await Cache.remove('user');
final user = await api.fetchUser();
await Cache.set('user', user);
} on CacheDriverException catch (e) {
// Storage failed - circuit breaker already handled this
print('Driver ${e.driverName} failed, using fallback');
}
Batch Operations
// Write multiple items efficiently (chunked)
await Cache.setMultiple({
'user': user,
'settings': settings,
'theme': theme,
// ... up to 1,000+ items
});
// Read multiple items
final results = await Cache.getMultiple<String>(['key1', 'key2', 'key3']);
// Returns: {'key1': 'value1', 'key2': null, 'key3': 'value3'}
Health Monitoring
final stats = await Cache.stats();
print(stats);
// {
// 'defaultDriver': 'shared_prefs',
// 'driverHealth': {'memory': true, 'shared_prefs': true, 'secure_storage': true},
// 'memorySize': 42,
// 'sharedPrefsSize': 128,
// 'secureStorageSize': 3
// }
Conclusion: From Prototype to Production
When we started this journey, we asked: "Why does SharedPreferences fail in production?"
The answer: Because production systems require resilience, security, and scale, not just functionality.
What we built across this 3-part series:
Part 1: Resilience
- ✅ Circuit Breaker Pattern (0% crash rate from cache failures)
- ✅ LRU Eviction (52MB vs 380MB memory usage)
- ✅ Clean Architecture (100% testable domain logic)
- ✅ Strategy Pattern (swappable storage backends)
- ✅ Type-safe serialization (automatic JSON handling)
Part 2: Security
- ✅ Hardware-backed encryption (iOS Keychain, Android KeyStore)
- ✅ Tiered storage strategy (public vs secure cache)
- ✅ Defense-in-depth (protection against root, forensics, malware)
- ✅ Security-aware exceptions (meaningful error handling)
Part 3: Scale & Quality
- ✅ Chunked batching (140ms vs 2.5s for 50 items)
- ✅ Optimistic Locking (zero race condition data loss)
- ✅ Observable cache (reactive UI updates via Observer Pattern)
- ✅ Production metrics (circuit breaker health monitoring)
- ✅ 3-layer testing (95%+ coverage across domain/data/presentation)
Production Impact Summary
| Metric | Before | After | Improvement |
|---|---|---|---|
| Cache-related crashes | 0.3% | 0.0% | 100% reduction |
| Memory usage (8h session) | 380MB | 52MB | 86% reduction |
| Sync time (50 items) | 2,500ms | 140ms | 94% faster |
| Race condition data loss | Occasional | 0 | Eliminated |
| Security vulnerabilities | P0 Critical | 0 | Compliant |
| Test coverage | ~40% | 95%+ | 2.4x increase |
Key Architectural Insights
The patterns here aren't Flutter-specific; they're systems engineering applied to mobile:
- Circuit Breakers (used by Netflix, AWS)
- LRU Eviction (used by Redis, CPU caches)
- Optimistic Locking (used by databases, Git)
- Observer Pattern (used by reactive frameworks everywhere)
- Strategy Pattern (used by payment gateways, storage engines)
Mobile platforms have unique constraints:
- Shared OS resources (Binder, XPC)
- Hardware-backed security (Secure Enclave, TrustZone)
- Memory pressure (iOS jetsam, Android OOM killer)
- User expectations (instant UI, offline-first)
Our architecture respects these constraints while applying battle-tested patterns from distributed systems.
What's Next for Your Implementation
If you're implementing this in your app:
- Start with Part 1: Get the Circuit Breaker and LRU working first
- Add Part 2 Security: Identify sensitive keys (tokens, keys) → move them to SecureStorage
- Optimize with Part 3: Profile your batch operations → add chunking if needed
Beyond this series (future enhancements):
- Compression for large payloads (Gzip for >5KB values)
- Priority-based eviction (pin critical keys, evict less important ones first)
- Cross-tab synchronization (BroadcastChannel for web, IsolateChannel for mobile)
- Encryption-at-rest for SharedPreferences (EncryptedSharedPreferences by default)
- Cache warming (preload critical data during splash screen)
Resources
Full Source Code:
Flutter Production Architecture on GitHub
Read the Series:
- Part 1: When SharedPreferences Fails (Resilience)
- Part 2: The JWT Token Incident (Security)
- Part 3: From 0.3% Crash Rate to Zero (Performance) ← You are here
Further Reading:
- Clean Architecture (Robert C. Martin)
- Circuit Breaker Pattern (Microsoft)
- iOS Keychain Services
- Android KeyStore System
- LRU Cache Implementation (LeetCode)
If this series helped you:
- ⭐ Star the GitHub repository
- 💬 Share your implementation experiences in the comments
- Share with your team (especially those fighting SharedPreferences bugs!)
Questions or improvements? Open an issue or PR on the GitHub repo. This is a living architecture; feedback makes it better.
Tags: #Flutter #Architecture #SystemDesign #Production #Mobile #CleanArchitecture #Performance #Testing
Author: DevMatrash
Date: February 2026