Mahmoud Alatrash
From 0.3% Crash Rate to Zero: Scaling Flutter Cache with Batching, Locking, and Observable State

This is Part 3 of a 3-part series.


Our cache now:

  • βœ… Survives platform failures (Circuit Breaker β†’ 0% crash rate)
  • βœ… Prevents memory leaks (LRU Eviction β†’ 52MB vs 380MB)
  • βœ… Protects sensitive data (Hardware-backed encryption)

But when we launched to production, our telemetry showed new problems:

| Metric | Status | Issue |
| --- | --- | --- |
| Cache crashes | βœ… 0% | Circuit breaker works! |
| Security vulnerabilities | βœ… 0 | Encryption works! |
| Login sync time (50 items) | ❌ 2.5s | Users frustrated |
| Large batch operations | ❌ CRASH | TransactionTooLargeException |
| UI reactivity | ❌ Stale | Profile photo update doesn't refresh screen |

These aren't architecture problemsβ€”they're operational problems.

Production exposed three critical gaps:

  1. Performance: Sequential writes are too slow. Naive parallelism crashes the app.
  2. Concurrency: Race conditions cause data loss during TTL expiration.
  3. Observability: Cache updates don't trigger UI updates (stale state).

In this part, we'll solve production-scale challenges:

  • ⚑ Performance: Chunked batching to respect platform limits (1MB Binder, XPC constraints)
  • ⚑ Concurrency: Optimistic Locking to prevent race conditions (version-based atomic operations)
  • ⚑ Reactivity: Observer Pattern to make cache changes trigger UI updates
  • ⚑ Monitoring: Production metrics and circuit breaker health checks
  • ⚑ Quality: 3-layer testing strategy (unit, integration, widget tests)
  • ⚑ Lessons Learned: What worked, what we'd change

By the end, you'll have a production-ready cache that handles millions of operations with zero downtime.

Full source code available on GitHub


Part 6: Performance - Batching Against Platform Limits

The Sequential Write Problem

When we first implemented setMultiple(), we took the straightforward approach:

@override
Future<void> setMultiple(
  Map<String, dynamic> items, {
  String? driver,
  Duration? ttl,
}) async {
  for (final entry in items.entries) {
    await set(entry.key, entry.value, driver: driver, ttl: ttl);
  }
}

This looks innocent. It's readable. It works.

But in production, when syncing 50 user preferences after login, we measured:

  • 2.5 seconds on Android (Pixel 4a)
  • 1.8 seconds on iOS (iPhone 11)

Each await blocks the next operation. With 50 items at ~50ms per write, you're looking at 2,500ms of sequential I/O.
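
To reproduce the measurement yourself, a Stopwatch around the call is enough; a minimal sketch, assuming an async context and a populated userPreferences map (both hypothetical here):

final stopwatch = Stopwatch()..start();
await cache.setMultiple(userPreferences); // 50 entries in our test
stopwatch.stop();
print('setMultiple took ${stopwatch.elapsedMilliseconds}ms');
// Sequential: 50 items Γ— ~50ms per write β‰ˆ 2,500ms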

The Naive Parallel Solution

Our first optimization attempt was obvious:

// WORSE: Unbounded Parallelism
await Future.wait(items.entries.map((e) => cache.set(e.key, e.value)));

On paper, this should be 50x faster (50ms vs 2,500ms).

In reality:

  • βœ… Local testing: 50 items in 120ms (20x improvement!)
  • ❌ Production (Android): App crashes after 200+ items
  • ❌ Production (iOS): Keychain access denied errors

Deep Dive: Why Unbounded Parallelism Crashes

Flutter communicates with native code via platform channels. These have undocumented hard limits:

Android (MethodChannel - Binder Transaction Limit):

Max concurrent calls: ~64
Transaction buffer: 1MB total
When exceeded: TransactionTooLargeException
Result: App crash

iOS (FlutterMethodChannel - XPC Limit):

Max concurrent calls: ~128
Message queue limit: Varies by iOS version
When exceeded: Dropped messages (silent failures!)
Result: Data loss without exceptions

When we launched 1,000 concurrent platform channel calls, we exhausted these system resources. The crash wasn't in our Dart codeβ€”it was in the native bridge layer.

Problem Breakdown:

  1. Memory Exhaustion: Each Future holds:
    • Serialized data (strings)
    • Platform channel buffers (native memory)
    • Event loop callbacks
    • Stack frames

For 1,000 items, we saw 500MB+ memory spikes and GC pauses causing UI jank.

  2. Platform Channel Saturation: The Android Binder has a 1MB transaction buffer limit shared across all IPC. iOS XPC has similar constraints. When exceeded, writes silently fail without throwing exceptions.

  3. Secure Storage Rate Limiting:

    • iOS Keychain: ~100 writes/second (error: errSecInteractionNotAllowed -25308)
    • Android KeyStore: ~50 writes/second (error: android.security.KeyStoreException)

These aren't documented by Flutter or the platform vendorsβ€”we discovered them through production telemetry and crash reports.

Understanding Platform Limits

Android Binder Transaction Buffer:

The Binder buffer is shared across all system services (not just your app). If another app is also using Binder heavily, your app's available buffer shrinks.

iOS XPC Message Limits:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  XPC Message Queue (per connection)    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Message 1: Keychain write              β”‚
β”‚ Message 2: Keychain write              β”‚
β”‚ ...                                    β”‚
β”‚ Message 128: Keychain write            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total: ~128 messages in flight         β”‚
β”‚ Message 129: DROPPED (no error!) ❌     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

XPC drops messages instead of crashing. This means data loss without any visible exception in your Dart code.
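
Because the failure is silent, the only defense we know of for truly critical writes is reading the value back. A minimal sketch against the Cache facade from this series (setVerified is a hypothetical helper, not part of the library):

Future<void> setVerified(String key, dynamic value) async {
  await Cache.set(key, value);

  // If XPC silently dropped the message, the value won't be there.
  final readBack = await Cache.get<dynamic>(key);
  if (readBack == null) {
    throw StateError('Write for "$key" was silently dropped');
  }
}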

The Solution: Chunked Parallel Writes

After profiling across 15 device models, we found the sweet spot:

Future<void> setMultiple(
  Map<String, dynamic> items, {
  String? driver,
  Duration? ttl,
}) async {
  // Tune per platform and driver type
  final chunkSize = _getOptimalChunkSize(driver);
  final entries = items.entries.toList();

  for (var i = 0; i < entries.length; i += chunkSize) {
    final chunk = entries.skip(i).take(chunkSize);

    // Parallel within chunk, sequential between chunks
    await Future.wait(
      chunk.map((entry) => 
        set(entry.key, entry.value, driver: driver, ttl: ttl)
      )
    );
  }
}

int _getOptimalChunkSize(String? driver) {
  switch (driver) {
    case 'secure_storage':
      return 10;  // Keychain/KeyStore rate limits
    case 'shared_prefs':
      return 50;  // Disk I/O sweet spot
    case 'memory':
      return 100; // CPU-bound, can go higher
    default:
      return 50;  // Conservative default
  }
}

Why This Works:

  1. Memory-Bounded: Only N futures in flight at once (vs unbounded)
  2. Platform-Safe: Stays under Binder/XPC channel limits
  3. Rate-Limit Compliant: Respects native storage throttling
  4. Back-Pressure: Chunks act as natural flow control
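
Chunking isn't the only way to bound concurrency. A fixed pool of workers pulling from a shared iterator keeps exactly N operations in flight, instead of waiting on each chunk's slowest write. A sketch in pure Dart (forEachLimited is hypothetical; the chunked version above is what we actually shipped):

Future<void> forEachLimited<T>(
  Iterable<T> items,
  int limit,
  Future<void> Function(T item) action,
) async {
  final iterator = items.iterator;

  Future<void> worker() async {
    // A Dart isolate is single-threaded, so advancing the shared
    // iterator between awaits needs no extra locking.
    while (iterator.moveNext()) {
      final item = iterator.current; // capture before the await
      await action(item);
    }
  }

  // Exactly `limit` writes in flight at any moment.
  await Future.wait(List.generate(limit, (_) => worker()));
}

// Usage: await forEachLimited(items.entries, 10, (e) => cache.set(e.key, e.value));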

Production Results:

| Scenario | Sequential | Naive Parallel | Chunked Parallel |
| --- | --- | --- | --- |
| 50 items (shared_prefs) | 2,500ms | 120ms | 140ms βœ… |
| 500 items (shared_prefs) | 25,000ms | CRASH ❌ | 1,200ms βœ… |
| 50 items (secure_storage) | 12,000ms | ERROR ❌ | 2,500ms βœ… |
| 1,000 items (memory) | 8,000ms | 180ms* | 120ms βœ… |

*Naive parallel succeeded in memory driver but used 400MB RAM

Architectural Insight: This is a classic distributed systems problem applied to mobile. In backend systems, you use circuit breakers and bulkheads to prevent cascading failures. On mobile, the constraint isn't network latencyβ€”it's shared OS resources (memory, file handles, Keychain locks).

Chunking is our bulkhead pattern. It isolates failures:

  • If chunk 5 fails, chunks 1-4 succeeded
  • We can retry chunk 5 without redoing all work (see the retry sketch below)
  • Memory pressure is predictable and bounded
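
A sketch of that per-chunk retry, assuming it lives in the same class as the chunked setMultiple above (_writeChunkWithRetry, the retry count, and the backoff values are illustrative, not shipped code):

Future<void> _writeChunkWithRetry(
  List<MapEntry<String, dynamic>> chunk, {
  int maxRetries = 2,
}) async {
  for (var attempt = 0; ; attempt++) {
    try {
      await Future.wait(
        chunk.map((entry) => set(entry.key, entry.value)),
      );
      return; // This chunk succeeded; earlier chunks are untouched.
    } catch (_) {
      if (attempt >= maxRetries) rethrow;
      // Short pause, then retry only this chunk.
      await Future.delayed(Duration(milliseconds: 50 * (attempt + 1)));
    }
  }
}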

Benchmarking Chunk Sizes

We benchmarked different chunk sizes on various device tiers:

SharedPreferences Driver:

| Chunk Size | 50 items | 500 items | 1,000 items | Memory Peak |
| --- | --- | --- | --- | --- |
| 10 | 180ms | 2,100ms | 4,300ms | 45MB |
| 25 | 152ms | 1,450ms | 2,800ms | 78MB |
| 50 | 140ms | 1,200ms | 2,350ms | 95MB |
| 100 | 138ms | CRASH | CRASH | 180MB |

SecureStorage Driver:

| Chunk Size | 50 items | 500 items | Memory Peak |
| --- | --- | --- | --- |
| 5 | 3,200ms | 35,000ms | 42MB |
| 10 | 2,500ms | 26,000ms | 58MB |
| 20 | 2,450ms | ERROR | 95MB |
| 50 | ERROR | ERROR | - |

Key Findings:

  • SharedPrefs: 50 items/chunk optimal (disk I/O saturates after this)
  • SecureStorage: 10 items/chunk optimal (Keychain rate-limits kick in)
  • Memory: 100 items/chunk optimal (pure CPU, no I/O bottleneck)

Part 8: The Observer Pattern - Making Cache Observable

The Problem: Stale UI State

Traditional caches are black boxes:

  • Widget A updates the cache
  • Widget B still shows old data
  • You have to manually refresh Widget B

This leads to stale UI and bugs like:

  • User updates profile photo β†’ API succeeds β†’ Cache updates β†’ Profile screen doesn't refresh
  • Settings changed β†’ Cache updated β†’ App still uses old settings

The Solution: Reactive Events (Pub/Sub)

We implemented an Observer Pattern using Dart Streams:

// lib/core/cache/domain/events/cache_event.dart
enum CacheEventType {
  created,
  updated,
  removed,
  expired,
  cleared,
}

class CacheEvent {
  final String key;
  final CacheEventType type;
  final dynamic value;
  final dynamic oldValue;
  final DateTime timestamp;

  const CacheEvent({
    required this.key,
    required this.type,
    this.value,
    this.oldValue,
    required this.timestamp,
  });

  bool get isCreated => type == CacheEventType.created;
  bool get isUpdated => type == CacheEventType.updated;
  bool get isRemoved => type == CacheEventType.removed;
  bool get isExpired => type == CacheEventType.expired;
}

Implementation:

// lib/core/cache/utils/cache_subscription_manager.dart
import 'dart:async';
class CacheSubscriptionManager {
  final Map<String, StreamController<CacheEvent>> _controllers = {};

  Stream<T?> watch<T>(String key) {
    _controllers.putIfAbsent(
      key,
      () => StreamController<CacheEvent>.broadcast(),
    );

    return _controllers[key]!.stream.map((event) => event.value as T?);
  }

  void notify(CacheEvent event) {
    _controllers[event.key]?.add(event);
  }

  void dispose(String key) {
    _controllers[key]?.close();
    _controllers.remove(key);
  }
}
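
One operational wrinkle with this manager: broadcast controllers live until dispose(key) is called, so a long-lived app can accumulate controllers for keys nobody watches anymore. A sketch of automatic cleanup via onCancel, which fires when the last listener unsubscribes (an assumed extension, not the shipped manager):

import 'dart:async';

class AutoDisposingSubscriptionManager {
  final Map<String, StreamController<CacheEvent>> _controllers = {};

  Stream<T?> watch<T>(String key) {
    final controller = _controllers.putIfAbsent(key, () {
      late StreamController<CacheEvent> c;
      c = StreamController<CacheEvent>.broadcast(
        // Runs when the last listener cancels: close and forget the key.
        onCancel: () {
          c.close();
          _controllers.remove(key);
        },
      );
      return c;
    });
    return controller.stream.map((event) => event.value as T?);
  }

  void notify(CacheEvent event) => _controllers[event.key]?.add(event);
}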

Integration with Cache Operations

Every cache mutation triggers an event:

@override
Future<void> set<T>(String key, T value, {String? driver, Duration? ttl}) async {
  // Get old value for event
  dynamic oldValue;
  try {
    oldValue = await get<T>(key, driver: driver);
  } catch (_) {
    // Key doesn't exist yet
  }

  // Perform write
  final targetDriver = _manager.getDriver(driver);
  final serialized = CacheSerializer.serialize(value);
  await targetDriver.set(key, serialized);

  // Set TTL if provided
  if (ttl != null && _config?.enableTTL == true) {
    _ttl.set(key, ttl);
  }

  // Notify subscribers
  _subscriptions.notify(CacheEvent(
    key: key,
    type: oldValue == null ? CacheEventType.created : CacheEventType.updated,
    value: value,
    oldValue: oldValue,
    timestamp: DateTime.now(),
  ));
}

Usage in UI:

class ProfileScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        if (!snapshot.hasData) return CircularProgressIndicator();

        final user = snapshot.data!;
        return Column(
          children: [
            CircleAvatar(backgroundImage: NetworkImage(user.photoUrl)),
            Text(user.name),
          ],
        );
      },
    );
  }
}

// Anywhere in the app:
await Cache.set('current_user', updatedUser);
// ↑ This automatically triggers the StreamBuilder to rebuild!

How Broadcast Streams Work

We use StreamController.broadcast() instead of regular streams:

// Regular stream (single listener)
final controller = StreamController<CacheEvent>();
controller.stream.listen((event) { }); // OK
controller.stream.listen((event) { }); // ERROR: Already has listener

// Broadcast stream (multiple listeners)
final controller = StreamController<CacheEvent>.broadcast();
controller.stream.listen((event) { }); // OK
controller.stream.listen((event) { }); // OK - multiple listeners allowed

This allows:

  • Multiple widgets subscribing to the same cache key
  • Different parts of the app reacting to the same data changes
  • No coupling between widgets (they don't know about each other)

Real-World Usage Patterns

Pattern 1: Profile Photo Update

// ProfileEditScreen.dart
class ProfileEditScreen extends StatelessWidget {
  Future<void> _updatePhoto(File photo) async {
    // Upload to API
    final photoUrl = await api.uploadPhoto(photo);

    // Update user object
    final user = await Cache.get<User>('current_user');
    final updated = user.copyWith(photoUrl: photoUrl);

    // Cache update triggers all listeners
    await Cache.set('current_user', updated);

    // No need to manually update UI!
    // ProfileScreen automatically rebuilds via StreamBuilder
  }
}

// ProfileScreen.dart (automatically updates)
class ProfileScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        // Rebuilds when 'current_user' changes
        return CircleAvatar(
          backgroundImage: NetworkImage(snapshot.data!.photoUrl),
        );
      },
    );
  }
}

// AppDrawer.dart (also automatically updates)
class AppDrawer extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        // Also rebuilds when 'current_user' changes
        return DrawerHeader(
          child: CircleAvatar(
            backgroundImage: NetworkImage(snapshot.data!.photoUrl),
          ),
        );
      },
    );
  }
}

Pattern 2: Feature Flags

// Remote config updates feature flags
void onRemoteConfigFetched(Map<String, dynamic> flags) async {
  await Cache.set('feature_flags', flags);
  // All widgets watching 'feature_flags' rebuild automatically
}

// Multiple screens react
class HomeScreen extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<Map<String, dynamic>?>(
      stream: Cache.watch<Map<String, dynamic>>('feature_flags'),
      builder: (context, snapshot) {
        final flags = snapshot.data ?? {};

        if (flags['new_ui_enabled'] == true) {
          return NewHomeUI();
        }
        return LegacyHomeUI();
      },
    );
  }
}

This effectively turns our Cache into a lightweight State Management solution for global app data, similar to:

  • Redux/Riverpod (but simpler)
  • Provider (but with persistence)
  • GetX (but with Clean Architecture)

When to use Cache.watch() vs Provider/Riverpod:

  • Use Cache.watch() for: User session, app config, feature flags (data that needs persistence)
  • Use Provider/Riverpod for: UI state, form state, navigation state (ephemeral data)
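
The two approaches also compose. If the rest of the app already uses Riverpod, a persisted cache key can be surfaced as a provider; a minimal sketch, assuming the riverpod package and the Cache facade above (currentUserProvider is hypothetical):

// import 'package:flutter_riverpod/flutter_riverpod.dart';

// Bridge a persisted cache key into Riverpod's provider graph.
final currentUserProvider = StreamProvider<User?>(
  (ref) => Cache.watch<User>('current_user'),
);

// In a ConsumerWidget:
// final user = ref.watch(currentUserProvider); // AsyncValue<User?>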

Performance Considerations

Question: Won't this rebuild widgets too often?

Answer: No, because:

  1. Dart Streams are lazy: StreamBuilder only listens when widget is mounted
  2. Flutter rebuilds are cheap: element tree diffing skips unchanged render objects, so redundant rebuilds cost little
  3. You control granularity: Subscribe to specific keys, not the entire cache

Benchmarks:

// Worst case: 1,000 cache writes/second
for (var i = 0; i < 1000; i++) {
  await Cache.set('counter', i);
}

// Result: StreamBuilder rebuilds 1,000 times
// Performance impact: ~5ms total (0.005ms per rebuild)
// UI remains smooth (60fps)

Flutter diffs the element tree on rebuild, so when the UI output is identical, no expensive layout or paint work occurs.
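
And if a hot key ever does cause noisy rebuilds, the stream can be de-duplicated before it reaches the StreamBuilder using Dart's built-in Stream.distinct:

// Only emits when the value actually changes, so re-writing the same
// value (e.g. setting an unchanged counter) triggers no rebuild.
final changes = Cache.watch<int>('counter').distinct();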


Part 9: Optimistic Locking - Solving Race Conditions

The Race Condition Scenario

Consider this timeline with concurrent operations:

Time  Thread A                    Thread B
──────────────────────────────────────────────────
0ms   get('api_response')
1ms   β†’ Sees TTL expired
2ms                                set('api_response', fresh_data)
3ms                                β†’ Updates version to 1
4ms   remove('api_response')
5ms   β†’ Deletes fresh data! ❌

Result: Thread A deletes the fresh data Thread B just wrote, causing data loss.

The Solution: Versioning with Optimistic Locking

We attached a version counter to every TTL entry:

// lib/core/cache/utils/cache_ttl.dart
import 'dart:developer'; // for log()
class TTLEntry {
  final DateTime expiresAt;
  final int version;

  TTLEntry(this.expiresAt, this.version);

  TTLEntry copyWithNewVersion() => TTLEntry(expiresAt, version + 1);
}

class CacheTTL {
  final Map<String, TTLEntry> _ttlMap = {};
  int _globalVersion = 0;

  void set(String key, Duration ttl) {
    _ttlMap[key] = TTLEntry(DateTime.now().add(ttl), _globalVersion++);
  }

  /// Check if expired and return entry for atomic operations
  TTLEntry? getIfExpired(String key) {
    if (!_ttlMap.containsKey(key)) return null;

    final entry = _ttlMap[key]!;
    if (DateTime.now().isAfter(entry.expiresAt)) {
      return entry; // Return entry with version for atomic check
    }
    return null;
  }

  /// Remove only if version matches (atomic operation)
  bool removeIfVersionMatches(String key, int version) {
    final entry = _ttlMap[key];

    // Atomic Check: Only delete if version hasn't changed
    if (entry != null && entry.version == version) {
      _ttlMap.remove(key);
      log('TTL EXPIRED (atomic): $key', name: 'Cache');
      return true;
    }

    return false; // Version changed, key was updated! Abort deletion.
  }
}

How it's used in CacheImpl:

@override
Future<T?> get<T>(String key, {String? driver}) async {
  // Check for expiration with atomic version tracking
  final expiredEntry = _ttl.getIfExpired(key);
  if (expiredEntry != null) {
    // Get value before removing for event notification
    dynamic lastValue;
    try {
      final targetDriver = _manager.getDriver(driver);
      final raw = await targetDriver.get(key);
      if (raw != null) {
        lastValue = CacheSerializer.deserialize<T>(raw);
      }
    } catch (_) {
      // Ignore errors when getting expired value
    }

    // Atomic removal - only if version hasn't changed
    if (_ttl.removeIfVersionMatches(key, expiredEntry.version)) {
      await _manager.getDriver(driver).remove(key);

      // Notify subscribers about expiration
      _subscriptions.notify(CacheEvent(
        key: key,
        type: CacheEventType.expired,
        oldValue: lastValue,
        timestamp: DateTime.now(),
      ));

      throw CacheTTLExpiredException(
        key: key,
        expiredAt: expiredEntry.expiresAt,
      );
    }
    // else: Version changed, key was updated - continue to get fresh data
  }

  // ... rest of get() logic
}

Timeline with Optimistic Locking:

Time  Thread A                          Thread B
────────────────────────────────────────────────────────
0ms   get('api_response')
1ms   β†’ Sees TTL expired (version 0)
2ms                                      set('api_response', fresh_data)
3ms                                      β†’ Updates version to 1
4ms   removeIfVersionMatches(key, 0)
5ms   β†’ Check fails (version is now 1) βœ…
6ms   β†’ Aborts deletion, returns fresh data

Result: Data consistency guaranteed, no data loss.

Understanding Optimistic Locking

Optimistic Locking assumes conflicts are rare, so it doesn't lock resources upfront. Instead:

  1. Read data with version number
  2. Process data (do work)
  3. Write back only if version hasn't changed
  4. Retry if version mismatch (conflict detected)

Compare to Pessimistic Locking:

// Pessimistic (traditional databases)
lock.acquire();  // Block other threads
final data = read('key');
process(data);
write('key', data);
lock.release();  // Allow other threads

Optimistic (our approach):

// No lock! Multiple threads can read simultaneously
final entry = getWithVersion('key');  // Read + version
process(entry.data);

// Only check version when writing
if (currentVersion == entry.version) {
  write('key', data);  // Success
} else {
  retry();  // Version changed, someone else updated it
}

Why Optimistic for Mobile?

  1. No thread blocking: UI remains responsive (no frozen screens)
  2. Better battery life: No spinning on locks
  3. Conflicts are rare: Mobile apps usually have single-user, single-device access
  4. Simpler code: No need for mutexes, semaphores, or deadlock prevention

This is the same pattern used in:

  • Database transactions (Optimistic Concurrency Control)
  • Version control systems (Git merge conflicts)
  • Distributed caches (Redis WATCH command)
  • RESTful APIs (ETags for conditional updates)

When Optimistic Locking Fails (and How to Handle It)

Scenario: High Contention

// 10 threads all trying to update the same key
for (var i = 0; i < 10; i++) {
  Future(() async {
    final entry = _ttl.getIfExpired('hot_key');
    if (entry == null) return; // Not expired, or another thread already handled it
    // ... do work ...
    final success = _ttl.removeIfVersionMatches('hot_key', entry.version);
    if (!success) {
      // Version changed - retry
      await Future.delayed(Duration(milliseconds: 10));
      // Retry logic here
    }
  });
}

When does this happen in production?

  • Background sync + user action on same data
  • Multiple API calls updating same cache key
  • TTL expiration + manual refresh simultaneously

Solution: Exponential Backoff

Future<T?> getWithRetry<T>(String key, {int maxRetries = 3}) async {
  for (var attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await get<T>(key);
    } on CacheTTLExpiredException {
      if (attempt == maxRetries - 1) rethrow;

      // Exponential backoff: 10ms, 20ms, 40ms
      await Future.delayed(Duration(milliseconds: 10 * (1 << attempt)));
    }
  }
  return null;
}

Part 13: Production Metrics & Monitoring

We built a health check endpoint that exposes system status:

@override
Future<Map<String, dynamic>> stats() async {
  final stats = <String, dynamic>{
    'defaultDriver': _manager.defaultDriver,
    'availableDrivers': _manager.drivers.keys.map((k) => k.value).toList(),
    'driverHealth': _manager.driverHealth,
    'memorySize': await size(driver: 'memory'),
    'sharedPrefsSize': await size(driver: 'shared_prefs'),
    'secureStorageSize': await size(driver: 'secure_storage'),
    'config': _manager.config.toString(),
  };

  return stats;
}

Example output:

{
  "defaultDriver": "shared_prefs",
  "availableDrivers": ["memory", "shared_prefs", "secure_storage"],
  "driverHealth": {
    "memory": true,
    "shared_prefs": false,
    "secure_storage": true
  },
  "memorySize": 42,
  "sharedPrefsSize": 0,
  "secureStorageSize": 3,
  "config": "CacheConfig(ttl: true, maxKeyLength: 250, logFallbacks: true)"
}

Monitoring Strategy:

void main() async {
  await Cache.initialize(config: CacheConfig.defaults());

  // Send cache health to analytics on app launch
  final stats = await Cache.stats();
  Analytics.track('cache_health', stats);

  // Alert if SharedPrefs circuit is open
  if (stats['driverHealth']['shared_prefs'] == false) {
    Analytics.track('cache_circuit_breaker_triggered', {
      'driver': 'shared_prefs',
      'fallback': 'memory',
    });
  }

  runApp(MyApp());
}

Production Value:

A spike in shared_prefs: false alerts us to platform-specific issues before users complain. We can correlate with:

  • Device models (Pixel 6 Pro having issues?)
  • OS versions (Android 14 bug?)
  • App versions (Did our last update break something?)

Key Metrics to Track

1. Circuit Breaker State:

  • driver_availability_rate: % of time each driver is available
  • fallback_events_count: How often fallbacks occur
  • circuit_open_duration: How long circuits stay open

2. Performance Metrics (see the tracker sketch below):

  • cache_hit_rate: % of successful cache retrievals
  • cache_miss_rate: % of cache misses (fetch from source)
  • avg_read_latency: Average time for get() operations
  • avg_write_latency: Average time for set() operations

3. Memory Metrics:

  • lru_evictions_count: How often LRU evicts entries
  • memory_cache_size: Current number of entries in memory
  • memory_pressure_events: iOS memory warnings received

4. Error Metrics:

  • serialization_errors: Malformed data count
  • ttl_expirations: How many items expire (vs manually removed)
  • keystore_failures: Secure storage access failures
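
Most of these counters are cheap to collect in-process. A minimal hit-rate tracker sketch (CacheMetrics and its wiring into get() are hypothetical):

class CacheMetrics {
  int hits = 0;
  int misses = 0;

  void recordHit() => hits++;
  void recordMiss() => misses++;

  /// Fraction of reads served from cache (0.0 on a cold start).
  double get hitRate {
    final total = hits + misses;
    return total == 0 ? 0.0 : hits / total;
  }
}

// In get(): recordHit() after a successful driver read,
// recordMiss() when falling back to the network or source of truth.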

Alerting Rules

Critical Alerts:

if (stats['driverHealth']['shared_prefs'] == false) {
  // Circuit breaker triggered - disk storage down
  sendAlert('P1: Cache degraded to memory-only mode');
}

if (stats['memorySize'] > 5000) {
  // LRU limit might be too high for device
  sendAlert('P2: Memory cache approaching limit');
}

if (errorRate > 0.01) {
  // More than 1% of operations failing
  sendAlert('P1: Cache error rate elevated');
}

Warning Alerts:

if (cacheHitRate < 0.80) {
  // Low hit rate = users fetching from network too often
  sendAlert('P3: Cache hit rate below 80%');
}

if (avgWriteLatency > 500) {
  // Slow writes = poor UX
  sendAlert('P3: Cache write latency elevated');
}

Part 14: Testing Architecture - From Theory to Practice

One of the biggest benefits of this Clean Architecture approach is testability. We employ a 3-Layer Testing Strategy.

Layer 1: Unit Tests (Domain Logic - Pure Dart)

Testing pure logic without mocking platform channels or Flutter framework.

Testing Optimistic Locking for Race Conditions:

import 'package:flutter_test/flutter_test.dart';
import 'package:flutter_production_architecture/core/cache/utils/cache_ttl.dart';

void main() {
  group('CacheTTL - Optimistic Locking', () {
    late CacheTTL ttl;

    setUp(() {
      ttl = CacheTTL(enabled: true);
    });

    test('Race condition: Stale version should not delete fresh data', () {
      // Simulate two threads operating on the same key

      // Thread A: Set key with TTL of 1 second (version 0)
      ttl.set('api_response', Duration(seconds: 1));

      // Capture the expired entry info (what Thread A sees)
      final expiredEntry = ttl.getIfExpired('api_response');
      expect(expiredEntry, isNull); // Not expired yet

      // Simulate time passing (thread A goes to check expiration)
      // Meanwhile, Thread B updates the key with fresh data
      ttl.set('api_response', Duration(seconds: 10)); // version 1

      // Now Thread A finally checks expiration (after network delay)
      // Thread A tries to remove with its stale version (0)
      final removed = ttl.removeIfVersionMatches('api_response', 0);

      // Should FAIL because version is now 1 (Thread B updated it)
      expect(removed, false);

      // The key should still exist with fresh TTL
      expect(ttl.isExpired('api_response'), false);
    });

    test('Successful removal when version matches', () async {
      ttl.set('key', Duration(milliseconds: 10));

      // Wait for expiration
      await Future.delayed(Duration(milliseconds: 50));

      final expiredEntry = ttl.getIfExpired('key');
      expect(expiredEntry, isNotNull);

      // Remove with correct version
      final removed = ttl.removeIfVersionMatches('key', expiredEntry!.version);
      expect(removed, true);

      // Entry should be gone
      expect(ttl.isExpired('key'), false); // No longer tracked
    });
  });
}

Why this matters: This test documents our architectural decision better than any comment could. It proves the race condition is handled correctly.

Layer 2: Integration Tests (Data Layer - Platform Interaction)

Testing the Circuit Breaker logic by simulating platform failures.

import 'package:flutter_test/flutter_test.dart';
import 'package:shared_preferences/shared_preferences.dart';
import 'package:flutter_production_architecture/core/cache/data/repositories/cache_repository_impl.dart';
import 'package:flutter_production_architecture/core/cache/domain/entities/cache_config.dart';

void main() {
  group('CacheImpl - Circuit Breaker', () {
    test('Falls back to memory when SharedPreferences write fails', () async {
      // Arrange: Create a cache with default driver = shared_prefs
      SharedPreferences.setMockInitialValues({}); // Start clean
      final cache = await CacheImpl.create(
        defaultDriver: 'shared_prefs',
        config: CacheConfig(logFallbacks: true),
      );

      // In a real scenario, SharedPreferences might fail due to:
      // - Disk full
      // - Permissions error
      // - Corrupt storage
      // Our circuit breaker catches this and falls back to memory

      // Act: Write to cache
      await cache.set('test_key', 'test_value');

      // Assert: Data should be retrievable (from memory fallback if needed)
      final value = await cache.get<String>('test_key');
      expect(value, 'test_value');

      // Verify circuit breaker health
      final stats = await cache.stats();
      expect(stats['driverHealth'], isNotNull);
    });

    test('Memory driver always works as last resort', () async {
      final cache = await CacheImpl.create(defaultDriver: 'memory');

      // Memory driver should never fail
      await cache.set('key', 'value');
      final value = await cache.get<String>('key');

      expect(value, 'value');
    });
  });
}

Production bugs this caught:

  • Disk full errors on devices with <500MB free space
  • SharedPreferences corruption after force-stop during write
  • SecureStorage unavailable on Android emulators without Google Play Services

The circuit breaker prevented 100% of potential cache crashes.

Layer 3: Widget Tests (Presentation Layer - UI Reactivity)

Testing if the UI reacts to cache changes via the Observer Pattern.

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:flutter_production_architecture/core/cache/presentation/cache_facade.dart';

void main() {
  group('Cache Subscriptions - Widget Updates', () {
    testWidgets('Widget rebuilds when subscribed cache key changes', (tester) async {
      // Arrange: Create a test app with a widget that subscribes to cache
      await tester.pumpWidget(
        MaterialApp(
          home: Scaffold(
            body: Builder(
              builder: (context) {
                return StreamBuilder<String?>(
                  stream: Cache.watch<String>('user_name'),
                  builder: (context, snapshot) {
                    return Text(snapshot.data ?? 'No user');
                  },
                );
              },
            ),
          ),
        ),
      );

      // Initial state
      expect(find.text('No user'), findsOneWidget);

      // Act: Update cache
      await Cache.set('user_name', 'Alice');
      await tester.pump(); // Process the stream event

      // Assert: Widget should reflect new value
      expect(find.text('Alice'), findsOneWidget);
      expect(find.text('No user'), findsNothing);

      // Act: Update again
      await Cache.set('user_name', 'Bob');
      await tester.pump();

      // Assert: Should show updated value
      expect(find.text('Bob'), findsOneWidget);
      expect(find.text('Alice'), findsNothing);
    });

    testWidgets('Multiple widgets can subscribe to the same key', (tester) async {
      await tester.pumpWidget(
        MaterialApp(
          home: Column(
            children: [
              StreamBuilder<String?>(
                stream: Cache.watch<String>('count'),
                builder: (context, snapshot) => Text('Widget1: ${snapshot.data ?? "0"}'),
              ),
              StreamBuilder<String?>(
                stream: Cache.watch<String>('count'),
                builder: (context, snapshot) => Text('Widget2: ${snapshot.data ?? "0"}'),
              ),
            ],
          ),
        ),
      );

      // Both should start at default
      expect(find.text('Widget1: 0'), findsOneWidget);
      expect(find.text('Widget2: 0'), findsOneWidget);

      // Update cache
      await Cache.set('count', '42');
      await tester.pump();

      // Both should update
      expect(find.text('Widget1: 42'), findsOneWidget);
      expect(find.text('Widget2: 42'), findsOneWidget);
    });
  });
}

Real bug this caught: User updates their profile photo β†’ API call succeeds β†’ Cache updates β†’ Profile screen doesn't refresh (shows old photo). We forgot to subscribe the widget to cache changes. This test would have prevented the bug from reaching production.

Coverage Targets

flutter test --coverage
genhtml coverage/lcov.info -o coverage/html
open coverage/html/index.html

Our targets:

  • Domain layer (cache_ttl.dart, cache_validator.dart): 100% coverage βœ…
  • Data layer (cache_repository_impl.dart): 95%+ coverage βœ…
  • Presentation layer (cache_facade.dart): 80%+ coverage βœ…

What we DON'T test:

  • Platform plugin code (SharedPreferences, SecureStorage) - trust the package maintainers
  • Flutter framework internals (StreamBuilder) - trust the Flutter team

Key Insight: Tests are executable documentation. When someone asks "Why optimistic locking?", the answer is: "Run the race condition test. Watch it fail without locking, pass with it."


Part 15: Lessons Learned - What We'd Do Differently

What Worked Well βœ…

  1. Circuit Breaker Pattern:

    • Zero cache-related crashes in 8+ months of production
    • Graceful degradation on 100% of storage failures
  2. Interface Segregation (Clean Architecture):

    • Unit tests run in 5ms instead of 500ms (no platform channels)
    • Business logic is 100% platform-independent
  3. Optimistic Locking:

    • Eliminated race conditions in high-concurrency flows
    • No data loss in TTL expiration scenarios
  4. LRU Eviction:

    • Memory stable at 52MB vs 380MB before
    • 0% OOM crashes from unbounded cache growth
  5. Observable Cache:

    • UI automatically syncs with cache changes
    • Replaced need for separate state management in many cases

What We'd Improve πŸ”§

  1. Compression:

    • Problem: Large JSON payloads (10KB+) waste disk space
    • Future Solution: Add Gzip compression for values >5KB
    • Implementation: Transparent in CacheSerializer (see the sketch after this list)
    • Trade-off: 15% CPU overhead for 70% space savings
  2. Encryption-at-Rest:

    • Problem: Only SecureStorage is encrypted. SharedPrefs is plain text.
    • Future Solution: Encrypt SharedPreferences data by default
    • Trade-off: Performance hit (5ms β†’ 15ms per operation)
    • API: CacheConfig(encryptSharedPrefs: true)
  3. Cache Warming:

    • Problem: App starts cold, critical data loaded on-demand
    • Future Solution: Preload critical data during splash screen
    • Implementation:
     await Cache.warmup(['current_user', 'app_config', 'feature_flags']);
    
  4. Selective Eviction:

    • Problem: LRU evicts by access time, not importance
    • Future Solution: Priority-based eviction (pin critical keys)
    • API:
     await Cache.set('user', user, priority: CachePriority.high);
    
  5. Cross-Tab Synchronization:

    • Problem: Multiple instances (web) don't sync cache changes
    • Future Solution: Use BroadcastChannel (web) or IsolateChannel (mobile)
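
For the compression item above, a minimal sketch of what the transparent CacheSerializer hook could look like, using dart:io's gzip codec (the 5KB threshold and the 'gz:' prefix are illustrative assumptions, and dart:io is unavailable on Flutter web):

import 'dart:convert';
import 'dart:io';

const _compressThreshold = 5 * 1024; // 5KB, per the trade-off above
const _gzipPrefix = 'gz:';

String maybeCompress(String json) {
  if (json.length <= _compressThreshold) return json;
  final compressed = gzip.encode(utf8.encode(json));
  return '$_gzipPrefix${base64Encode(compressed)}';
}

String maybeDecompress(String stored) {
  if (!stored.startsWith(_gzipPrefix)) return stored;
  final bytes = base64Decode(stored.substring(_gzipPrefix.length));
  return utf8.decode(gzip.decode(bytes));
}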

Part 16: Putting It All Together - Complete Usage Guide

Basic CRUD Operations

// Initialize once in main()
await Cache.initialize(config: CacheConfig.defaults());

// Store data (auto-serialized)
await Cache.set<User>('current_user', user);
await Cache.set<List<String>>('tags', ['flutter', 'dart', 'mobile']);

// Retrieve data (type-safe)
final user = await Cache.get<User>('current_user');
final tags = await Cache.get<List<String>>('tags');

// Check existence
if (await Cache.has('current_user')) {
  print('User cached');
}

// Remove data
await Cache.remove('current_user');

// Clear all
await Cache.clear();

Secure Storage (Encrypted)

// Automatically uses FlutterSecureStorage (Keychain/KeyStore)
await Cache.secure.set('jwt_token', 'abc123');
await Cache.secure.set('api_key', 'sk_live_...');

// Retrieve
final token = await Cache.secure.get<String>('jwt_token');

// Remove
await Cache.secure.remove('jwt_token');

TTL (Time-To-Live)

// Cache with 1-hour expiration
await Cache.set('api_response', response, ttl: Duration(hours: 1));

// After 1 hour, this throws CacheTTLExpiredException
try {
  final response = await Cache.get('api_response');
} on CacheTTLExpiredException catch (e) {
  print('Data expired at: ${e.expiredAt}');
  // Fetch fresh data
}
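
A convenient layer on top of this is a remember-style helper that refetches on miss or expiry; a sketch, assuming the Cache facade and exceptions from this series (remember itself is hypothetical, inspired by Laravel's cache API):

Future<T> remember<T>(
  String key,
  Duration ttl,
  Future<T> Function() fetch,
) async {
  try {
    final cached = await Cache.get<T>(key);
    if (cached != null) return cached;
  } on CacheMissException {
    // Key absent: fall through and fetch.
  } on CacheTTLExpiredException {
    // Expired: fall through and refetch.
  }

  final fresh = await fetch();
  await Cache.set(key, fresh, ttl: ttl);
  return fresh;
}

// Usage:
// final profile = await remember('profile', Duration(hours: 1), api.fetchProfile);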

Observable Cache (Reactive UI)

// Subscribe to changes
Cache.watch<User>('current_user').listen((user) {
  print('User updated: ${user?.name}');
});

// Update from anywhere
await Cache.set('current_user', updatedUser);
// ↑ All subscribers automatically notified!

// Use in widgets
class ProfileWidget extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return StreamBuilder<User?>(
      stream: Cache.watch<User>('current_user'),
      builder: (context, snapshot) {
        if (!snapshot.hasData) return CircularProgressIndicator();
        return Text(snapshot.data!.name);
      },
    );
  }
}

Error Handling

try {
  final user = await Cache.get<User>('user');
} on CacheMissException {
  // Key doesn't exist - fetch from API
  final user = await api.fetchUser();
  await Cache.set('user', user);
} on CacheTTLExpiredException catch (e) {
  // Data expired - refresh it
  print('Expired at: ${e.expiredAt}');
  final user = await api.fetchUser();
  await Cache.set('user', user, ttl: Duration(hours: 1));
} on CacheSerializationException catch (e) {
  // Data corrupt - clear and refetch
  print('Corrupt data for type: ${e.type}');
  await Cache.remove('user');
  final user = await api.fetchUser();
  await Cache.set('user', user);
} on CacheDriverException catch (e) {
  // Storage failed - circuit breaker already handled this
  print('Driver ${e.driverName} failed, using fallback');
}

Batch Operations

// Write multiple items efficiently (chunked)
await Cache.setMultiple({
  'user': user,
  'settings': settings,
  'theme': theme,
  // ... up to 1,000+ items
});

// Read multiple items
final results = await Cache.getMultiple<String>(['key1', 'key2', 'key3']);
// Returns: {'key1': 'value1', 'key2': null, 'key3': 'value3'}

Health Monitoring

final stats = await Cache.stats();
print(stats);
// {
//   'defaultDriver': 'shared_prefs',
//   'driverHealth': {'memory': true, 'shared_prefs': true, 'secure_storage': true},
//   'memorySize': 42,
//   'sharedPrefsSize': 128,
//   'secureStorageSize': 3
// }

Conclusion: From Prototype to Production

When we started this journey, we asked: "Why does SharedPreferences fail in production?"

The answer: Because production systems require resilience, security, and scaleβ€”not just functionality.

What we built across this 3-part series:

Part 1: Resilience

  • βœ… Circuit Breaker Pattern (0% crash rate from cache failures)
  • βœ… LRU Eviction (52MB vs 380MB memory usage)
  • βœ… Clean Architecture (100% testable domain logic)
  • βœ… Strategy Pattern (swappable storage backends)
  • βœ… Type-safe serialization (automatic JSON handling)

Part 2: Security

  • βœ… Hardware-backed encryption (iOS Keychain, Android KeyStore)
  • βœ… Tiered storage strategy (public vs secure cache)
  • βœ… Defense-in-depth (protection against root, forensics, malware)
  • βœ… Security-aware exceptions (meaningful error handling)

Part 3: Scale & Quality

  • βœ… Chunked batching (140ms vs 2.5s for 50 items)
  • βœ… Optimistic Locking (zero race condition data loss)
  • βœ… Observable cache (reactive UI updates via Observer Pattern)
  • βœ… Production metrics (circuit breaker health monitoring)
  • βœ… 3-layer testing (95%+ coverage across domain/data/presentation)

Production Impact Summary

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Cache-related crashes | 0.3% | 0.0% | 100% reduction |
| Memory usage (8h session) | 380MB | 52MB | 86% reduction |
| Sync time (50 items) | 2,500ms | 140ms | 94% faster |
| Race condition data loss | Occasional | 0 | Eliminated |
| Security vulnerabilities | P0 Critical | 0 | Compliant |
| Test coverage | ~40% | 95%+ | 2.4x increase |

Key Architectural Insights

The patterns here aren't Flutter-specificβ€”they're systems engineering applied to mobile:

  • Circuit Breakers (used by Netflix, AWS)
  • LRU Eviction (used by Redis, CPU caches)
  • Optimistic Locking (used by databases, Git)
  • Observer Pattern (used by reactive frameworks everywhere)
  • Strategy Pattern (used by payment gateways, storage engines)

Mobile platforms have unique constraints:

  • Shared OS resources (Binder, XPC)
  • Hardware-backed security (Secure Enclave, TrustZone)
  • Memory pressure (iOS jetsam, Android OOM killer)
  • User expectations (instant UI, offline-first)

Our architecture respects these constraints while applying battle-tested patterns from distributed systems.


What's Next for Your Implementation

If you're implementing this in your app:

  1. Start with Part 1: Get the Circuit Breaker and LRU working first
  2. Add Part 2 Security: Identify sensitive keys (tokens, keys) β†’ move to SecureStorage
  3. Optimize with Part 3: Profile your batch operations β†’ add chunking if needed

Beyond this series (future enhancements):

  • Compression for large payloads (Gzip for >5KB values)
  • Priority-based eviction (pin critical keys, evict less important ones first)
  • Cross-tab synchronization (BroadcastChannel for web, IsolateChannel for mobile)
  • Encryption-at-rest for SharedPreferences (EncryptedSharedPreferences by default)
  • Cache warming (preload critical data during splash screen)

Resources

Full Source Code:

πŸ™ Flutter Production Architecture on GitHub



If this series helped you:

  • ⭐ Star the GitHub repository
  • πŸ’¬ Share your implementation experiences in the comments
  • πŸ”— Share with your team (especially those fighting SharedPreferences bugs!)

Questions or improvements? Open an issue or PR on the GitHub repo. This is a living architectureβ€”feedback makes it better.


Tags: #Flutter #Architecture #SystemDesign #Production #Mobile #CleanArchitecture #Performance #Testing

Author: DevMatrash

Date: February 2026

