Mahmoud Alatrash
From 0.3% Crash Rate to Zero: Scaling Flutter Cache with Batching, Locking, and Observable State

This is Part 3 of a 3-part series.


Our cache now:

  • βœ… Survives platform failures (Circuit Breaker β†’ 0% crash rate)
  • βœ… Prevents memory leaks (LRU Eviction β†’ 52MB vs 380MB)
  • βœ… Protects sensitive data (Hardware-backed encryption)

But when we launched to production, our telemetry showed new problems:

| Metric | Status | Issue |
| --- | --- | --- |
| Cache crashes | βœ… 0% | Circuit breaker works! |
| Security vulnerabilities | βœ… 0 | Encryption works! |
| Login sync time (50 items) | ❌ 2.5s | Users frustrated |
| Large batch operations | ❌ CRASH | TransactionTooLargeException |
| UI reactivity | ❌ Stale | Profile photo update doesn't refresh screen |

These aren't architecture problemsβ€”they're operational problems.

Production exposed three critical gaps:

  1. Performance: Sequential writes are too slow. Naive parallelism crashes the app.
  2. Concurrency: Race conditions cause data loss during TTL expiration.
  3. Observability: Cache updates don't trigger UI updates (stale state).

In this part, we'll solve production-scale challenges:

  • ⚑ Performance: Chunked batching to respect platform limits (1MB Binder, XPC constraints)
  • ⚑ Concurrency: Optimistic Locking to prevent race conditions (version-based atomic operations)
  • ⚑ Reactivity: Observer Pattern to make cache changes trigger UI updates
  • ⚑ Monitoring: Production metrics and circuit breaker health checks
  • ⚑ Quality: 3-layer testing strategy (unit, integration, widget tests)
  • ⚑ Lessons Learned: What worked, what we'd change

By the end, you'll have a production-ready cache that handles millions of operations with zero downtime.

Full source code available on GitHub


Part 6: Performance - Batching Against Platform Limits

The Sequential Write Problem

When we first implemented setMultiple(), we took the straightforward approach:

@override
Future<void> setMultiple(
  Map<String, dynamic> items, {
  String? driver,
  Duration? ttl,
}) async {
  for (final entry in items.entries) {
    await set(entry.key, entry.value, driver: driver, ttl: ttl);
  }
}

This looks innocent. It's readable. It works.

But in production, when syncing 50 user preferences after login, we measured:

  • 2.5 seconds on Android (Pixel 4a)
  • 1.8 seconds on iOS (iPhone 11)

Each await blocks the next operation. With 50 items at ~50ms per write, you're looking at 2,500ms of sequential I/O.
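
To reproduce the measurement yourself, a Stopwatch around the call is enough; a minimal sketch, assuming an async context and a populated userPreferences map (both hypothetical here):

final stopwatch = Stopwatch()..start();
await cache.setMultiple(userPreferences); // 50 entries in our test
stopwatch.stop();
print('setMultiple took ${stopwatch.elapsedMilliseconds}ms');
// Sequential: 50 items Γ— ~50ms per write β‰ˆ 2,500ms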

The Naive Parallel Solution

Our first optimization attempt was obvious:

// WORSE: Unbounded Parallelism
await Future.wait(items.entries.map((e) => cache.set(e.key, e.value)));

On paper, this should be 50x faster (50ms vs 2,500ms).

In reality:

  • βœ… Local testing: 50 items in 120ms (20x improvement!)
  • ❌ Production (Android): App crashes after 200+ items
  • ❌ Production (iOS): Keychain access denied errors

Deep Dive: Why Unbounded Parallelism Crashes

Flutter communicates with native code via platform channels. These have undocumented hard limits:

Android (MethodChannel - Binder Transaction Limit):

Max concurrent calls: ~64
Transaction buffer: 1MB total
When exceeded: TransactionTooLargeException
Result: App crash

iOS (FlutterMethodChannel - XPC Limit):

Max concurrent calls: ~128
Message queue limit: Varies by iOS version
When exceeded: Dropped messages (silent failures!)
Result: Data loss without exceptions

When we launched 1,000 concurrent platform channel calls, we exhausted these system resources. The crash wasn't in our Dart codeβ€”it was in the native bridge layer.

Problem Breakdown:

  1. Memory Exhaustion: Each Future holds:
    • Serialized data (strings)
    • Platform channel buffers (native memory)
    • Event loop callbacks
    • Stack frames

For 1,000 items, we saw 500MB+ memory spikes and GC pauses causing UI jank.

  2. Platform Channel Saturation: The Android Binder has a 1MB transaction buffer limit shared across all IPC. iOS XPC has similar constraints. When exceeded, writes silently fail without throwing exceptions.

  3. Secure Storage Rate Limiting:

    • iOS Keychain: ~100 writes/second (error: errSecInteractionNotAllowed -25308)
    • Android KeyStore: ~50 writes/second (error: android.security.KeyStoreException)

These aren't documented by Flutter or the platform vendorsβ€”we discovered them through production telemetry and crash reports.

Understanding Platform Limits

Android Binder Transaction Buffer:

The Binder buffer is shared across all system services (not just your app). If another app is also using Binder heavily, your app's available buffer shrinks.

iOS XPC Message Limits:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  XPC Message Queue (per connection)    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Message 1: Keychain write              β”‚
β”‚ Message 2: Keychain write              β”‚
β”‚ ...                                    β”‚
β”‚ Message 128: Keychain write            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total: ~128 messages in flight         β”‚
β”‚ Message 129: DROPPED (no error!) ❌     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

XPC drops messages instead of crashing. This means data loss without any visible exception in your Dart code.
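
Because the failure is silent, the only defense we know of for truly critical writes is reading the value back. A minimal sketch against the Cache facade from this series (setVerified is a hypothetical helper, not part of the library):

Future<void> setVerified(String key, dynamic value) async {
  await Cache.set(key, value);

  // If XPC silently dropped the message, the value won't be there.
  final readBack = await Cache.get<dynamic>(key);
  if (readBack == null) {
    throw StateError('Write for "$key" was silently dropped');
  }
}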

The Solution: Chunked Parallel Writes

After profiling across 15 device models, we found the sweet spot:

Future<void> setMultiple(
  Map<String, dynamic> items, {
  String? driver,
  Duration? ttl,
}) async {
  // Tune per platform and driver type
  final chunkSize = _getOptimalChunkSize(driver);
  final entries = items.entries.toList();

  for (var i = 0; i < entries.length; i += chunkSize) {
    final chunk = entries.skip(i).take(chunkSize);

    // Parallel within chunk, sequential between chunks
    await Future.wait(
      chunk.map((entry) => 
        set(entry.key, entry.value, driver: driver, ttl: ttl)
      )
    );
  }
}

int _getOptimalChunkSize(String? driver) {
  switch (driver) {
    case 'secure_storage':
      return 10;  // Keychain/KeyStore rate limits
    case 'shared_prefs':
      return 50;  // Disk I/O sweet spot
    case 'memory':
      return 100; // CPU-bound, can go higher
    default:
      return 50;  // Conservative default
  }
}

Why This Works:

  1. Memory-Bounded: Only N futures in flight at once (vs unbounded)
  2. Platform-Safe: Stays under Binder/XPC channel limits
  3. Rate-Limit Compliant: Respects native storage throttling
  4. Back-Pressure: Chunks act as natural flow control
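
Chunking isn't the only way to bound concurrency. A fixed pool of workers pulling from a shared iterator keeps exactly N operations in flight, instead of waiting on each chunk's slowest write. A sketch in pure Dart (forEachLimited is hypothetical; the chunked version above is what we actually shipped):

Future<void> forEachLimited<T>(
  Iterable<T> items,
  int limit,
  Future<void> Function(T item) action,
) async {
  final iterator = items.iterator;

  Future<void> worker() async {
    // A Dart isolate is single-threaded, so advancing the shared
    // iterator between awaits needs no extra locking.
    while (iterator.moveNext()) {
      final item = iterator.current; // capture before the await
      await action(item);
    }
  }

  // Exactly `limit` writes in flight at any moment.
  await Future.wait(List.generate(limit, (_) => worker()));
}

// Usage: await forEachLimited(items.entries, 10, (e) => cache.set(e.key, e.value));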

Production Results:

| Scenario | Sequential | Naive Parallel | Chunked Parallel |
| --- | --- | --- | --- |
| 50 items (shared_prefs) | 2,500ms | 120ms | 140ms βœ… |
| 500 items (shared_prefs) | 25,000ms | CRASH ❌ | 1,200ms βœ… |
| 50 items (secure_storage) | 12,000ms | ERROR ❌ | 2,500ms βœ… |
| 1,000 items (memory) | 8,000ms | 180ms* | 120ms βœ… |

*Naive parallel succeeded in memory driver but used 400MB RAM

Architectural Insight: This is a classic distributed systems problem applied to mobile. In backend systems, you use circuit breakers and bulkheads to prevent cascading failures. On mobile, the constraint isn't network latencyβ€”it's shared OS resources (memory, file handles, Keychain locks).

Chunking is our bulkhead pattern. It isolates failures:

  • If chunk 5 fails, chunks 1-4 succeeded
  • We can retry chunk 5 without redoing all work (see the retry sketch below)
  • Memory pressure is predictable and bounded
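
A sketch of that per-chunk retry, assuming it lives in the same class as the chunked setMultiple above (_writeChunkWithRetry, the retry count, and the backoff values are illustrative, not shipped code):

Future<void> _writeChunkWithRetry(
  List<MapEntry<String, dynamic>> chunk, {
  int maxRetries = 2,
}) async {
  for (var attempt = 0; ; attempt++) {
    try {
      await Future.wait(
        chunk.map((entry) => set(entry.key, entry.value)),
      );
      return; // This chunk succeeded; earlier chunks are untouched.
    } catch (_) {
      if (attempt >= maxRetries) rethrow;
      // Short pause, then retry only this chunk.
      await Future.delayed(Duration(milliseconds: 50 * (attempt + 1)));
    }
  }
}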

Benchmarking Chunk Sizes

We benchmarked different chunk sizes on various device tiers:

SharedPreferences Driver:

| Chunk Size | 50 items | 500 items | 1,000 items | Memory Peak |
| --- | --- | --- | --- | --- |
| 10 | 180ms | 2,100ms | 4,300ms | 45MB |
| 25 | 152ms | 1,450ms | 2,800ms | 78MB |
| 50 | 140ms | 1,200ms | 2,350ms | 95MB |
| 100 | 138ms | CRASH | CRASH | 180MB |

SecureStorage Driver:

| Chunk Size | 50 items | 500 items | Memory Peak |
| --- | --- | --- | --- |
| 5 | 3,200ms | 35,000ms | 42MB |
| 10 | 2,500ms | 26,000ms | 58MB |
| 20 | 2,450ms | ERROR | 95MB |
| 50 | ERROR | ERROR | - |

Key Findings:

  • SharedPrefs: 50 items/chunk optimal (disk I/O saturates after this)
  • SecureStorage: 10 items/chunk optimal (Keychain rate-limits kick in)
  • Memory: 100 items/chunk optimal (pure CPU, no I/O bottleneck)

Part 8: The Observer Pattern - Making Cache Observable

The Problem: Stale UI State

Traditional caches are black boxes:

  • Widget A updates the cache
  • Widget B still shows old data
  • You have to manually refresh Widget B

This leads to stale UI and bugs like:

  • User updates profile photo β†’ API succeeds β†’ Cache updates β†’ Profile screen doesn't refresh
  • Settings changed β†’ Cache updated β†’ App still uses old settings

The Solution: Reactive Events (Pub/Sub)

We implemented an Observer Pattern using Dart Streams:

// lib/core/cache/domain/events/cache_event.dart
enum CacheEventType {
  created,
  updated,
  removed,
  expired,
  cleared,
}

class CacheEvent {
  final String key;
  final CacheEventType type;
  final dynamic value;
  final dynamic oldValue;
  final DateTime timestamp;

  const CacheEvent({
    required this.key,
    required this.type,
    this.value,
    this.oldValue,
    required this.timestamp,
  });

  bool get isCreated => type == CacheEventType.created;
  bool get isUpdated => type == CacheEventType.updated;
  bool get isRemoved => type == CacheEventType.removed;
  bool get isExpired => type == CacheEventType.expired;
}

Implementation:

// lib/core/cache/utils/cache_subscription_manager.dart
import 'dart:async';
class CacheSubscriptionManager {
  final Map<String, StreamController<CacheEvent>> _controllers = {};

  Stream<T?> watch<T>(String key) {
    _controllers.putIfAbsent(
      key,
      () => StreamController<CacheEvent>.broadcast(),
    );

    return _controllers[key]!.stream.map((event) => event.value as T?);
  }

  void notify(CacheEvent event) {
    _controllers[event.key]?.add(event);
  }

  void dispose(String key) {
    _controllers[key]?.close();
    _controllers.remove(key);
  }
}
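
One operational wrinkle with this manager: broadcast controllers live until dispose(key) is called, so a long-lived app can accumulate controllers for keys nobody watches anymore. A sketch of automatic cleanup via onCancel, which fires when the last listener unsubscribes (an assumed extension, not the shipped manager):

import 'dart:async';

class AutoDisposingSubscriptionManager {
  final Map<String, StreamController<CacheEvent>> _controllers = {};

  Stream<T?> watch<T>(String key) {
    final controller = _controllers.putIfAbsent(key, () {
      late StreamController<CacheEvent> c;
      c = StreamController<CacheEvent>.broadcast(
        // Runs when the last listener cancels: close and forget the key.
        onCancel: () {
          c.close();
          _controllers.remove(key);
        },
      );
      return c;
    });
    return controller.stream.map((event) => event.value as T?);
  }

  void notify(CacheEvent event) => _controllers[event.key]?.add(event);
}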

Integration with Cache Operations

Every cache mutation triggers an event:

@override
Future<void> set<T>(String key, T value, {String? driver, Duration? ttl}) async {
  // Get old value for event
  dynamic oldValue;
  try {
    oldValue = await get<T>(key, driver: driver);
  } catch (_) {
    // Key doesn't exist yet
  }

  // Perform write
  final targetDriver = _manager.getDriver(driver);
  final serialized = CacheSerializer.serialize(value);
  await targetDriver.set(key, serialized);

  // Set TTL if provided
  if (ttl != null && _config?.enableTTL == true) {
    _ttl.set(key, ttl);
  }

  // Notify subscribers
  _subscriptions.notify(CacheEvent(
    key: key,
    type: oldValue == null ? CacheEventType.created : CacheEventType.updated,
    value: value,
    oldValue: oldValue,
    timestamp: DateTime.now(),
  ));
}

Usage in UI:

class ProfileScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        if (!snapshot.hasData) return CircularProgressIndicator();

        final user = snapshot.data!;
        return Column(
          children: [
            CircleAvatar(backgroundImage: NetworkImage(user.photoUrl)),
            Text(user.name),
          ],
        );
      },
    );
  }
}

// Anywhere in the app:
await Cache.set('current_user', updatedUser);
// ↑ This automatically triggers the StreamBuilder to rebuild!

How Broadcast Streams Work

We use StreamController.broadcast() instead of regular streams:

// Regular stream (single listener)
final controller = StreamController<CacheEvent>();
controller.stream.listen((event) { }); // OK
controller.stream.listen((event) { }); // ERROR: Already has listener

// Broadcast stream (multiple listeners)
final controller = StreamController<CacheEvent>.broadcast();
controller.stream.listen((event) { }); // OK
controller.stream.listen((event) { }); // OK - multiple listeners allowed

This allows:

  • Multiple widgets subscribing to the same cache key
  • Different parts of the app reacting to the same data changes
  • No coupling between widgets (they don't know about each other)

Real-World Usage Patterns

Pattern 1: Profile Photo Update

// ProfileEditScreen.dart
class ProfileEditScreen extends StatelessWidget {
  Future<void> _updatePhoto(File photo) async {
    // Upload to API
    final photoUrl = await api.uploadPhoto(photo);

    // Update user object
    final user = await Cache.get<User>('current_user');
    final updated = user.copyWith(photoUrl: photoUrl);

    // Cache update triggers all listeners
    await Cache.set('current_user', updated);

    // No need to manually update UI!
    // ProfileScreen automatically rebuilds via StreamBuilder
  }
}

// ProfileScreen.dart (automatically updates)
class ProfileScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        // Rebuilds when 'current_user' changes
        return CircleAvatar(
          backgroundImage: NetworkImage(snapshot.data!.photoUrl),
        );
      },
    );
  }
}

// AppDrawer.dart (also automatically updates)
class AppDrawer extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        // Also rebuilds when 'current_user' changes
        return DrawerHeader(
          child: CircleAvatar(
            backgroundImage: NetworkImage(snapshot.data!.photoUrl),
          ),
        );
      },
    );
  }
}

Pattern 2: Feature Flags

// Remote config updates feature flags
void onRemoteConfigFetched(Map<String, dynamic> flags) async {
  await Cache.set('feature_flags', flags);
  // All widgets watching 'feature_flags' rebuild automatically
}

// Multiple screens react
class HomeScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<Map<String, dynamic>?>(
      stream: Cache.watch<Map<String, dynamic>>('feature_flags'),
      builder: (context, snapshot) {
        final flags = snapshot.data ?? {};

        if (flags['new_ui_enabled'] == true) {
          return NewHomeUI();
        }
        return LegacyHomeUI();
      },
    );
  }
}

This effectively turns our Cache into a lightweight State Management solution for global app data, similar to:

  • Redux/Riverpod (but simpler)
  • Provider (but with persistence)
  • GetX (but with Clean Architecture)

When to use Cache.watch() vs Provider/Riverpod:

  • Use Cache.watch() for: User session, app config, feature flags (data that needs persistence)
  • Use Provider/Riverpod for: UI state, form state, navigation state (ephemeral data)
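
The two approaches also compose. If the rest of the app already uses Riverpod, a persisted cache key can be surfaced as a provider; a minimal sketch, assuming the riverpod package and the Cache facade above (currentUserProvider is hypothetical):

// import 'package:flutter_riverpod/flutter_riverpod.dart';

// Bridge a persisted cache key into Riverpod's provider graph.
final currentUserProvider = StreamProvider<User?>(
  (ref) => Cache.watch<User>('current_user'),
);

// In a ConsumerWidget:
// final user = ref.watch(currentUserProvider); // AsyncValue<User?>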

Performance Considerations

Question: Won't this rebuild widgets too often?

Answer: No, because:

  1. Dart Streams are lazy: StreamBuilder only listens when widget is mounted
  2. Flutter rebuilds are cheap: element tree diffing skips unchanged render objects, so redundant rebuilds cost little
  3. You control granularity: Subscribe to specific keys, not the entire cache

Benchmarks:

// Worst case: 1,000 cache writes/second
for (var i = 0; i < 1000; i++) {
  await Cache.set('counter', i);
}

// Result: StreamBuilder rebuilds 1,000 times
// Performance impact: ~5ms total (0.005ms per rebuild)
// UI remains smooth (60fps)

Flutter diffs the element tree on rebuild, so when the UI output is identical, no expensive layout or paint work occurs.
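
And if a hot key ever does cause noisy rebuilds, the stream can be de-duplicated before it reaches the StreamBuilder using Dart's built-in Stream.distinct:

// Only emits when the value actually changes, so re-writing the same
// value (e.g. setting an unchanged counter) triggers no rebuild.
final changes = Cache.watch<int>('counter').distinct();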


Part 9: Optimistic Locking - Solving Race Conditions

The Race Condition Scenario

Consider this timeline with concurrent operations:

Time  Thread A                    Thread B
──────────────────────────────────────────────────
0ms   get('api_response')
1ms   β†’ Sees TTL expired
2ms                                set('api_response', fresh_data)
3ms                                β†’ Updates version to 1
4ms   remove('api_response')
5ms   β†’ Deletes fresh data! ❌

Result: Thread A deletes the fresh data Thread B just wrote, causing data loss.

The Solution: Versioning with Optimistic Locking

We attached a version counter to every TTL entry:

// lib/core/cache/utils/cache_ttl.dart
import 'dart:developer'; // for log()
class TTLEntry {
  final DateTime expiresAt;
  final int version;

  TTLEntry(this.expiresAt, this.version);

  TTLEntry copyWithNewVersion() => TTLEntry(expiresAt, version + 1);
}

class CacheTTL {
  final Map<String, TTLEntry> _ttlMap = {};
  int _globalVersion = 0;

  void set(String key, Duration ttl) {
    _ttlMap[key] = TTLEntry(DateTime.now().add(ttl), _globalVersion++);
  }

  /// Check if expired and return entry for atomic operations
  TTLEntry? getIfExpired(String key) {
    if (!_ttlMap.containsKey(key)) return null;

    final entry = _ttlMap[key]!;
    if (DateTime.now().isAfter(entry.expiresAt)) {
      return entry; // Return entry with version for atomic check
    }
    return null;
  }

  /// Remove only if version matches (atomic operation)
  bool removeIfVersionMatches(String key, int version) {
    final entry = _ttlMap[key];

    // Atomic Check: Only delete if version hasn't changed
    if (entry != null && entry.version == version) {
      _ttlMap.remove(key);
      log('TTL EXPIRED (atomic): $key', name: 'Cache');
      return true;
    }

    return false; // Version changed, key was updated! Abort deletion.
  }
}

How it's used in CacheImpl:

@override
Future<T?> get<T>(String key, {String? driver}) async {
  // Check for expiration with atomic version tracking
  final expiredEntry = _ttl.getIfExpired(key);
  if (expiredEntry != null) {
    // Get value before removing for event notification
    dynamic lastValue;
    try {
      final targetDriver = _manager.getDriver(driver);
      final raw = await targetDriver.get(key);
      if (raw != null) {
        lastValue = CacheSerializer.deserialize<T>(raw);
      }
    } catch (_) {
      // Ignore errors when getting expired value
    }

    // Atomic removal - only if version hasn't changed
    if (_ttl.removeIfVersionMatches(key, expiredEntry.version)) {
      await _manager.getDriver(driver).remove(key);

      // Notify subscribers about expiration
      _subscriptions.notify(CacheEvent(
        key: key,
        type: CacheEventType.expired,
        oldValue: lastValue,
        timestamp: DateTime.now(),
      ));

      throw CacheTTLExpiredException(
        key: key,
        expiredAt: expiredEntry.expiresAt,
      );
    }
    // else: Version changed, key was updated - continue to get fresh data
  }

  // ... rest of get() logic
}

Timeline with Optimistic Locking:

Time  Thread A                          Thread B
────────────────────────────────────────────────────────
0ms   get('api_response')
1ms   β†’ Sees TTL expired (version 0)
2ms                                      set('api_response', fresh_data)
3ms                                      β†’ Updates version to 1
4ms   removeIfVersionMatches(key, 0)
5ms   β†’ Check fails (version is now 1) βœ…
6ms   β†’ Aborts deletion, returns fresh data

Result: Data consistency guaranteed, no data loss.

Understanding Optimistic Locking

Optimistic Locking assumes conflicts are rare, so it doesn't lock resources upfront. Instead:

  1. Read data with version number
  2. Process data (do work)
  3. Write back only if version hasn't changed
  4. Retry if version mismatch (conflict detected)

Compare to Pessimistic Locking:

// Pessimistic (traditional databases)
lock.acquire();  // Block other threads
final data = read('key');
process(data);
write('key', data);
lock.release();  // Allow other threads

Optimistic (our approach):

// No lock! Multiple threads can read simultaneously
final entry = getWithVersion('key');  // Read + version
process(entry.data);

// Only check version when writing
if (currentVersion == entry.version) {
  write('key', data);  // Success
} else {
  retry();  // Version changed, someone else updated it
}

Why Optimistic for Mobile?

  1. No thread blocking: UI remains responsive (no frozen screens)
  2. Better battery life: No spinning on locks
  3. Conflicts are rare: Mobile apps usually have single-user, single-device access
  4. Simpler code: No need for mutexes, semaphores, or deadlock prevention

This is the same pattern used in:

  • Database transactions (Optimistic Concurrency Control)
  • Version control systems (Git merge conflicts)
  • Distributed caches (Redis WATCH command)
  • RESTful APIs (ETags for conditional updates)

When Optimistic Locking Fails (and How to Handle It)

Scenario: High Contention

// 10 threads all trying to update the same key
for (var i = 0; i < 10; i++) {
  Future(() async {
    final entry = _ttl.getIfExpired('hot_key');
    if (entry == null) return; // Not expired, or another thread already handled it
    // ... do work ...
    final success = _ttl.removeIfVersionMatches('hot_key', entry.version);
    if (!success) {
      // Version changed - retry
      await Future.delayed(Duration(milliseconds: 10));
      // Retry logic here
    }
  });
}

When does this happen in production?

  • Background sync + user action on same data
  • Multiple API calls updating same cache key
  • TTL expiration + manual refresh simultaneously

Solution: Exponential Backoff

Future<T?> getWithRetry<T>(String key, {int maxRetries = 3}) async {
  for (var attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await get<T>(key);
    } on CacheTTLExpiredException {
      if (attempt == maxRetries - 1) rethrow;

      // Exponential backoff: 10ms, 20ms, 40ms
      await Future.delayed(Duration(milliseconds: 10 * (1 << attempt)));
    }
  }
  return null;
}

Part 13: Production Metrics & Monitoring

We built a health check endpoint that exposes system status:

@override
Future<Map<String, dynamic>> stats() async {
  final stats = <String, dynamic>{
    'defaultDriver': _manager.defaultDriver,
    'availableDrivers': _manager.drivers.keys.map((k) => k.value).toList(),
    'driverHealth': _manager.driverHealth,
    'memorySize': await size(driver: 'memory'),
    'sharedPrefsSize': await size(driver: 'shared_prefs'),
    'secureStorageSize': await size(driver: 'secure_storage'),
    'config': _manager.config.toString(),
  };

  return stats;
}

Example output:

{
  "defaultDriver": "shared_prefs",
  "availableDrivers": ["memory", "shared_prefs", "secure_storage"],
  "driverHealth": {
    "memory": true,
    "shared_prefs": false,
    "secure_storage": true
  },
  "memorySize": 42,
  "sharedPrefsSize": 0,
  "secureStorageSize": 3,
  "config": "CacheConfig(ttl: true, maxKeyLength: 250, logFallbacks: true)"
}

Monitoring Strategy:

void main() async {
  await Cache.initialize(config: CacheConfig.defaults());

  // Send cache health to analytics on app launch
  final stats = await Cache.stats();
  Analytics.track('cache_health', stats);

  // Alert if SharedPrefs circuit is open
  if (stats['driverHealth']['shared_prefs'] == false) {
    Analytics.track('cache_circuit_breaker_triggered', {
      'driver': 'shared_prefs',
      'fallback': 'memory',
    });
  }

  runApp(MyApp());
}

Production Value:

A spike in shared_prefs: false alerts us to platform-specific issues before users complain. We can correlate with:

  • Device models (Pixel 6 Pro having issues?)
  • OS versions (Android 14 bug?)
  • App versions (Did our last update break something?)

Key Metrics to Track

1. Circuit Breaker State:

  • driver_availability_rate: % of time each driver is available
  • fallback_events_count: How often fallbacks occur
  • circuit_open_duration: How long circuits stay open

2. Performance Metrics (see the tracker sketch below):

  • cache_hit_rate: % of successful cache retrievals
  • cache_miss_rate: % of cache misses (fetch from source)
  • avg_read_latency: Average time for get() operations
  • avg_write_latency: Average time for set() operations

3. Memory Metrics:

  • lru_evictions_count: How often LRU evicts entries
  • memory_cache_size: Current number of entries in memory
  • memory_pressure_events: iOS memory warnings received

4. Error Metrics:

  • serialization_errors: Malformed data count
  • ttl_expirations: How many items expire (vs manually removed)
  • keystore_failures: Secure storage access failures
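
Most of these counters are cheap to collect in-process. A minimal hit-rate tracker sketch (CacheMetrics and its wiring into get() are hypothetical):

class CacheMetrics {
  int hits = 0;
  int misses = 0;

  void recordHit() => hits++;
  void recordMiss() => misses++;

  /// Fraction of reads served from cache (0.0 on a cold start).
  double get hitRate {
    final total = hits + misses;
    return total == 0 ? 0.0 : hits / total;
  }
}

// In get(): recordHit() after a successful driver read,
// recordMiss() when falling back to the network or source of truth.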

Alerting Rules

Critical Alerts:

if (stats['driverHealth']['shared_prefs'] == false) {
  // Circuit breaker triggered - disk storage down
  sendAlert('P1: Cache degraded to memory-only mode');
}

if (stats['memorySize'] > 5000) {
  // LRU limit might be too high for device
  sendAlert('P2: Memory cache approaching limit');
}

if (errorRate > 0.01) {
  // More than 1% of operations failing
  sendAlert('P1: Cache error rate elevated');
}

Warning Alerts:

if (cacheHitRate < 0.80) {
  // Low hit rate = users fetching from network too often
  sendAlert('P3: Cache hit rate below 80%');
}

if (avgWriteLatency > 500) {
  // Slow writes = poor UX
  sendAlert('P3: Cache write latency elevated');
}

Part 14: Testing Architecture - From Theory to Practice

One of the biggest benefits of this Clean Architecture approach is testability. We employ a 3-Layer Testing Strategy.

Layer 1: Unit Tests (Domain Logic - Pure Dart)

Testing pure logic without mocking platform channels or Flutter framework.

Testing Optimistic Locking for Race Conditions:

import 'package:flutter_test/flutter_test.dart';
import 'package:flutter_production_architecture/core/cache/utils/cache_ttl.dart';

void main() {
  group('CacheTTL - Optimistic Locking', () {
    late CacheTTL ttl;

    setUp(() {
      ttl = CacheTTL(enabled: true);
    });

    test('Race condition: Stale version should not delete fresh data', () {
      // Simulate two threads operating on the same key

      // Thread A: Set key with TTL of 1 second (version 0)
      ttl.set('api_response', Duration(seconds: 1));

      // Capture the expired entry info (what Thread A sees)
      final expiredEntry = ttl.getIfExpired('api_response');
      expect(expiredEntry, isNull); // Not expired yet

      // Simulate time passing (thread A goes to check expiration)
      // Meanwhile, Thread B updates the key with fresh data
      ttl.set('api_response', Duration(seconds: 10)); // version 1

      // Now Thread A finally checks expiration (after network delay)
      // Thread A tries to remove with its stale version (0)
      final removed = ttl.removeIfVersionMatches('api_response', 0);

      // Should FAIL because version is now 1 (Thread B updated it)
      expect(removed, false);

      // The key should still exist with fresh TTL
      expect(ttl.isExpired('api_response'), false);
    });

    test('Successful removal when version matches', () async {
      ttl.set('key', Duration(milliseconds: 10));

      // Wait for expiration
      await Future.delayed(Duration(milliseconds: 50));

      final expiredEntry = ttl.getIfExpired('key');
      expect(expiredEntry, isNotNull);

      // Remove with correct version
      final removed = ttl.removeIfVersionMatches('key', expiredEntry!.version);
      expect(removed, true);

      // Entry should be gone
      expect(ttl.isExpired('key'), false); // No longer tracked
    });
  });
}

Why this matters: This test documents our architectural decision better than any comment could. It proves the race condition is handled correctly.

Layer 2: Integration Tests (Data Layer - Platform Interaction)

Testing the Circuit Breaker logic by simulating platform failures.

import 'package:flutter_test/flutter_test.dart';
import 'package:shared_preferences/shared_preferences.dart';
import 'package:flutter_production_architecture/core/cache/data/repositories/cache_repository_impl.dart';
import 'package:flutter_production_architecture/core/cache/domain/entities/cache_config.dart';

void main() {
  group('CacheImpl - Circuit Breaker', () {
    test('Falls back to memory when SharedPreferences write fails', () async {
      // Arrange: Create a cache with default driver = shared_prefs
      SharedPreferences.setMockInitialValues({}); // Start clean
      final cache = await CacheImpl.create(
        defaultDriver: 'shared_prefs',
        config: CacheConfig(logFallbacks: true),
      );

      // In a real scenario, SharedPreferences might fail due to:
      // - Disk full
      // - Permissions error
      // - Corrupt storage
      // Our circuit breaker catches this and falls back to memory

      // Act: Write to cache
      await cache.set('test_key', 'test_value');

      // Assert: Data should be retrievable (from memory fallback if needed)
      final value = await cache.get<String>('test_key');
      expect(value, 'test_value');

      // Verify circuit breaker health
      final stats = await cache.stats();
      expect(stats['driverHealth'], isNotNull);
    });

    test('Memory driver always works as last resort', () async {
      final cache = await CacheImpl.create(defaultDriver: 'memory');

      // Memory driver should never fail
      await cache.set('key', 'value');
      final value = await cache.get<String>('key');

      expect(value, 'value');
    });
  });
}

Production bugs this caught:

  • Disk full errors on devices with <500MB free space
  • SharedPreferences corruption after force-stop during write
  • SecureStorage unavailable on Android emulators without Google Play Services

The circuit breaker prevented 100% of potential cache crashes.

Layer 3: Widget Tests (Presentation Layer - UI Reactivity)

Testing if the UI reacts to cache changes via the Observer Pattern.

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:flutter_production_architecture/core/cache/presentation/cache_facade.dart';

void main() {
  group('Cache Subscriptions - Widget Updates', () {
    testWidgets('Widget rebuilds when subscribed cache key changes', (tester) async {
      // Arrange: Create a test app with a widget that subscribes to cache
      await tester.pumpWidget(
        MaterialApp(
          home: Scaffold(
            body: Builder(
              builder: (context) {
                return StreamBuilder<String?>(
                  stream: Cache.watch<String>('user_name'),
                  builder: (context, snapshot) {
                    return Text(snapshot.data ?? 'No user');
                  },
                );
              },
            ),
          ),
        ),
      );

      // Initial state
      expect(find.text('No user'), findsOneWidget);

      // Act: Update cache
      await Cache.set('user_name', 'Alice');
      await tester.pump(); // Process the stream event

      // Assert: Widget should reflect new value
      expect(find.text('Alice'), findsOneWidget);
      expect(find.text('No user'), findsNothing);

      // Act: Update again
      await Cache.set('user_name', 'Bob');
      await tester.pump();

      // Assert: Should show updated value
      expect(find.text('Bob'), findsOneWidget);
      expect(find.text('Alice'), findsNothing);
    });

    testWidgets('Multiple widgets can subscribe to the same key', (tester) async {
      await tester.pumpWidget(
        MaterialApp(
          home: Column(
            children: [
              StreamBuilder<String?>(
                stream: Cache.watch<String>('count'),
                builder: (context, snapshot) => Text('Widget1: ${snapshot.data ?? "0"}'),
              ),
              StreamBuilder<String?>(
                stream: Cache.watch<String>('count'),
                builder: (context, snapshot) => Text('Widget2: ${snapshot.data ?? "0"}'),
              ),
            ],
          ),
        ),
      );

      // Both should start at default
      expect(find.text('Widget1: 0'), findsOneWidget);
      expect(find.text('Widget2: 0'), findsOneWidget);

      // Update cache
      await Cache.set('count', '42');
      await tester.pump();

      // Both should update
      expect(find.text('Widget1: 42'), findsOneWidget);
      expect(find.text('Widget2: 42'), findsOneWidget);
    });
  });
}

Real bug this caught: User updates their profile photo β†’ API call succeeds β†’ Cache updates β†’ Profile screen doesn't refresh (shows old photo). We forgot to subscribe the widget to cache changes. This test would have prevented the bug from reaching production.

Coverage Targets

flutter test --coverage
genhtml coverage/lcov.info -o coverage/html
open coverage/html/index.html

Our targets:

  • Domain layer (cache_ttl.dart, cache_validator.dart): 100% coverage βœ…
  • Data layer (cache_repository_impl.dart): 95%+ coverage βœ…
  • Presentation layer (cache_facade.dart): 80%+ coverage βœ…

What we DON'T test:

  • Platform plugin code (SharedPreferences, SecureStorage) - trust the package maintainers
  • Flutter framework internals (StreamBuilder) - trust the Flutter team

Key Insight: Tests are executable documentation. When someone asks "Why optimistic locking?", the answer is: "Run the race condition test. Watch it fail without locking, pass with it."


Part 15: Lessons Learned - What We'd Do Differently

What Worked Well βœ…

  1. Circuit Breaker Pattern:

    • Zero cache-related crashes in 8+ months of production
    • Graceful degradation on 100% of storage failures
  2. Interface Segregation (Clean Architecture):

    • Unit tests run in 5ms instead of 500ms (no platform channels)
    • Business logic is 100% platform-independent
  3. Optimistic Locking:

    • Eliminated race conditions in high-concurrency flows
    • No data loss in TTL expiration scenarios
  4. LRU Eviction:

    • Memory stable at 52MB vs 380MB before
    • 0% OOM crashes from unbounded cache growth
  5. Observable Cache:

    • UI automatically syncs with cache changes
    • Replaced need for separate state management in many cases

What We'd Improve πŸ”§

  1. Compression:

    • Problem: Large JSON payloads (10KB+) waste disk space
    • Future Solution: Add Gzip compression for values >5KB
    • Implementation: Transparent in CacheSerializer (see the sketch after this list)
    • Trade-off: 15% CPU overhead for 70% space savings
  2. Encryption-at-Rest:

    • Problem: Only SecureStorage is encrypted. SharedPrefs is plain text.
    • Future Solution: Encrypt SharedPreferences data by default
    • Trade-off: Performance hit (5ms β†’ 15ms per operation)
    • API: CacheConfig(encryptSharedPrefs: true)
  3. Cache Warming:

    • Problem: App starts cold, critical data loaded on-demand
    • Future Solution: Preload critical data during splash screen
    • Implementation:
     await Cache.warmup(['current_user', 'app_config', 'feature_flags']);
    
  4. Selective Eviction:

    • Problem: LRU evicts by access time, not importance
    • Future Solution: Priority-based eviction (pin critical keys)
    • API:
     await Cache.set('user', user, priority: CachePriority.high);
    
  5. Cross-Tab Synchronization:

    • Problem: Multiple instances (web) don't sync cache changes
    • Future Solution: Use BroadcastChannel (web) or IsolateChannel (mobile)
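
For the compression item above, a minimal sketch of what the transparent CacheSerializer hook could look like, using dart:io's gzip codec (the 5KB threshold and the 'gz:' prefix are illustrative assumptions, and dart:io is unavailable on Flutter web):

import 'dart:convert';
import 'dart:io';

const _compressThreshold = 5 * 1024; // 5KB, per the trade-off above
const _gzipPrefix = 'gz:';

String maybeCompress(String json) {
  if (json.length <= _compressThreshold) return json;
  final compressed = gzip.encode(utf8.encode(json));
  return '$_gzipPrefix${base64Encode(compressed)}';
}

String maybeDecompress(String stored) {
  if (!stored.startsWith(_gzipPrefix)) return stored;
  final bytes = base64Decode(stored.substring(_gzipPrefix.length));
  return utf8.decode(gzip.decode(bytes));
}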

Part 16: Putting It All Together - Complete Usage Guide

Basic CRUD Operations

// Initialize once in main()
await Cache.initialize(config: CacheConfig.defaults());

// Store data (auto-serialized)
await Cache.set<User>('current_user', user);
await Cache.set<List<String>>('tags', ['flutter', 'dart', 'mobile']);

// Retrieve data (type-safe)
final user = await Cache.get<User>('current_user');
final tags = await Cache.get<List<String>>('tags');

// Check existence
if (await Cache.has('current_user')) {
  print('User cached');
}

// Remove data
await Cache.remove('current_user');

// Clear all
await Cache.clear();

Secure Storage (Encrypted)

// Automatically uses FlutterSecureStorage (Keychain/KeyStore)
await Cache.secure.set('jwt_token', 'abc123');
await Cache.secure.set('api_key', 'sk_live_...');

// Retrieve
final token = await Cache.secure.get<String>('jwt_token');

// Remove
await Cache.secure.remove('jwt_token');

TTL (Time-To-Live)

// Cache with 1-hour expiration
await Cache.set('api_response', response, ttl: Duration(hours: 1));

// After 1 hour, this throws CacheTTLExpiredException
try {
  final response = await Cache.get('api_response');
} on CacheTTLExpiredException catch (e) {
  print('Data expired at: ${e.expiredAt}');
  // Fetch fresh data
}
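
A convenient layer on top of this is a remember-style helper that refetches on miss or expiry; a sketch, assuming the Cache facade and exceptions from this series (remember itself is hypothetical, inspired by Laravel's cache API):

Future<T> remember<T>(
  String key,
  Duration ttl,
  Future<T> Function() fetch,
) async {
  try {
    final cached = await Cache.get<T>(key);
    if (cached != null) return cached;
  } on CacheMissException {
    // Key absent: fall through and fetch.
  } on CacheTTLExpiredException {
    // Expired: fall through and refetch.
  }

  final fresh = await fetch();
  await Cache.set(key, fresh, ttl: ttl);
  return fresh;
}

// Usage:
// final profile = await remember('profile', Duration(hours: 1), api.fetchProfile);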

Observable Cache (Reactive UI)

// Subscribe to changes
Cache.watch<User>('current_user').listen((user) {
  print('User updated: ${user?.name}');
});

// Update from anywhere
await Cache.set('current_user', updatedUser);
// ↑ All subscribers automatically notified!

// Use in widgets
class ProfileWidget extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        if (!snapshot.hasData) return CircularProgressIndicator();
        return Text(snapshot.data!.name);
      },
    );
  }
}

Error Handling

try {
  final user = await Cache.get<User>('user');
} on CacheMissException {
  // Key doesn't exist - fetch from API
  final user = await api.fetchUser();
  await Cache.set('user', user);
} on CacheTTLExpiredException catch (e) {
  // Data expired - refresh it
  print('Expired at: ${e.expiredAt}');
  final user = await api.fetchUser();
  await Cache.set('user', user, ttl: Duration(hours: 1));
} on CacheSerializationException catch (e) {
  // Data corrupt - clear and refetch
  print('Corrupt data for type: ${e.type}');
  await Cache.remove('user');
  final user = await api.fetchUser();
  await Cache.set('user', user);
} on CacheDriverException catch (e) {
  // Storage failed - circuit breaker already handled this
  print('Driver ${e.driverName} failed, using fallback');
}

Batch Operations

// Write multiple items efficiently (chunked)
await Cache.setMultiple({
  'user': user,
  'settings': settings,
  'theme': theme,
  // ... up to 1,000+ items
});

// Read multiple items
final results = await Cache.getMultiple<String>(['key1', 'key2', 'key3']);
// Returns: {'key1': 'value1', 'key2': null, 'key3': 'value3'}

Health Monitoring

final stats = await Cache.stats();
print(stats);
// {
//   'defaultDriver': 'shared_prefs',
//   'driverHealth': {'memory': true, 'shared_prefs': true, 'secure_storage': true},
//   'memorySize': 42,
//   'sharedPrefsSize': 128,
//   'secureStorageSize': 3
// }

Conclusion: From Prototype to Production

When we started this journey, we asked: "Why does SharedPreferences fail in production?"

The answer: Because production systems require resilience, security, and scaleβ€”not just functionality.

What we built across this 3-part series:

Part 1: Resilience

  • βœ… Circuit Breaker Pattern (0% crash rate from cache failures)
  • βœ… LRU Eviction (52MB vs 380MB memory usage)
  • βœ… Clean Architecture (100% testable domain logic)
  • βœ… Strategy Pattern (swappable storage backends)
  • βœ… Type-safe serialization (automatic JSON handling)

Part 2: Security

  • βœ… Hardware-backed encryption (iOS Keychain, Android KeyStore)
  • βœ… Tiered storage strategy (public vs secure cache)
  • βœ… Defense-in-depth (protection against root, forensics, malware)
  • βœ… Security-aware exceptions (meaningful error handling)

Part 3: Scale & Quality

  • βœ… Chunked batching (140ms vs 2.5s for 50 items)
  • βœ… Optimistic Locking (zero race condition data loss)
  • βœ… Observable cache (reactive UI updates via Observer Pattern)
  • βœ… Production metrics (circuit breaker health monitoring)
  • βœ… 3-layer testing (95%+ coverage across domain/data/presentation)

Production Impact Summary

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Cache-related crashes | 0.3% | 0.0% | 100% reduction |
| Memory usage (8h session) | 380MB | 52MB | 86% reduction |
| Sync time (50 items) | 2,500ms | 140ms | 94% faster |
| Race condition data loss | Occasional | 0 | Eliminated |
| Security vulnerabilities | P0 Critical | 0 | Compliant |
| Test coverage | ~40% | 95%+ | 2.4x increase |

Key Architectural Insights

The patterns here aren't Flutter-specificβ€”they're systems engineering applied to mobile:

  • Circuit Breakers (used by Netflix, AWS)
  • LRU Eviction (used by Redis, CPU caches)
  • Optimistic Locking (used by databases, Git)
  • Observer Pattern (used by reactive frameworks everywhere)
  • Strategy Pattern (used by payment gateways, storage engines)

Mobile platforms have unique constraints:

  • Shared OS resources (Binder, XPC)
  • Hardware-backed security (Secure Enclave, TrustZone)
  • Memory pressure (iOS jetsam, Android OOM killer)
  • User expectations (instant UI, offline-first)

Our architecture respects these constraints while applying battle-tested patterns from distributed systems.


What's Next for Your Implementation

If you're implementing this in your app:

  1. Start with Part 1: Get the Circuit Breaker and LRU working first
  2. Add Part 2 Security: Identify sensitive keys (tokens, keys) β†’ move to SecureStorage
  3. Optimize with Part 3: Profile your batch operations β†’ add chunking if needed

Beyond this series (future enhancements):

  • Compression for large payloads (Gzip for >5KB values)
  • Priority-based eviction (pin critical keys, evict less important ones first)
  • Cross-tab synchronization (BroadcastChannel for web, IsolateChannel for mobile)
  • Encryption-at-rest for SharedPreferences (EncryptedSharedPreferences by default)
  • Cache warming (preload critical data during splash screen)

Resources

Full Source Code:

πŸ™ Flutter Production Architecture on GitHub



If this series helped you:

  • ⭐ Star the GitHub repository
  • πŸ’¬ Share your implementation experiences in the comments
  • πŸ”— Share with your team (especially those fighting SharedPreferences bugs!)

Questions or improvements? Open an issue or PR on the GitHub repo. This is a living architectureβ€”feedback makes it better.


Tags: #Flutter #Architecture #SystemDesign #Production #Mobile #CleanArchitecture #Performance #Testing

Author: DevMatrash

Date: February 2026

