A 400MB Memory Leak from 12 Lines of Cache Code
I watched a production service climb from 200MB to 4GB over 6 hours. The culprit? A dictionary-based cache that never forgot anything.
```python
# The silent killer
class ImageProcessor:
    _cache = {}  # This grows forever

    @classmethod
    def get_processed(cls, image_id, raw_image):
        if image_id not in cls._cache:
            cls._cache[image_id] = expensive_process(raw_image)
        return cls._cache[image_id]
```
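To see why this leaks, here's a minimal sketch (names hypothetical, with a trivial stand-in for the expensive processing step). Every result stays pinned in the dict even after the caller is done with it:

```python
import gc

class ProcessedImage:
    """Stand-in for an expensive processing result."""

cache = {}

def get_processed(image_id):
    if image_id not in cache:
        cache[image_id] = ProcessedImage()  # stand-in for expensive_process
    return cache[image_id]

for i in range(1000):
    img = get_processed(i)
    del img      # the caller discards its reference...

gc.collect()
print(len(cache))  # ...but all 1000 results are still strongly referenced
```

The dict itself is the strong reference, so garbage collection can never reclaim a single entry.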
The fix was three lines. Here's the working version first, then I'll explain why the naive approach fails so spectacularly:
```python
import weakref

class ImageProcessor:
    _cache = weakref.WeakValueDictionary()

    @classmethod
    def get_processed(cls, image_id, processed_image):
        # Only caches while the caller holds a reference
        cls._cache[image_id] = processed_image
        return processed_image

    @classmethod
    def get_cached(cls, image_id):
        return cls._cache.get(image_id)  # Returns None if GC'd
```
That's it. A WeakValueDictionary holds only weak references to its values, so it never prevents garbage collection. When the last strong reference to a cached object disappears, the entry evicts itself. No manual cleanup, no LRU bookkeeping, no TTL tracking.
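The eviction behavior is easy to verify directly. A minimal sketch (class name hypothetical; note that instances of many builtins like `list`, `dict`, and `bytes` can't be weakly referenced, so a user-defined class is used here):

```python
import gc
import weakref

class ProcessedImage:
    """Instances of user-defined classes support weak references."""

cache = weakref.WeakValueDictionary()

img = ProcessedImage()
cache["thumb-1"] = img
assert cache.get("thumb-1") is img  # alive while a strong reference exists

del img       # drop the last strong reference
gc.collect()  # CPython frees it immediately; collect() helps on other implementations

print(cache.get("thumb-1"))  # None -- the entry evicted itself
```

If you need to cache objects that don't support weak references, a bounded cache such as `functools.lru_cache` is the usual alternative.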