Memory leaks are among the most challenging issues to diagnose and fix in any programming language, Ruby included. When your Ruby application consumes more and more memory over time without releasing it, you're likely facing a memory leak. This article dives deep into what memory leaks are in Ruby, how to identify them, and most importantly, how to fix them, with practical examples and benchmark comparisons.
Table of Contents
- What is a Memory Leak?
- Ruby's Memory Management
- Common Causes of Memory Leaks in Ruby
- Tools to Detect Memory Leaks
- Practical Examples
- Benchmarking Memory Usage
- Advanced Techniques
- Best Practices to Prevent Memory Leaks
- Conclusion
What is a Memory Leak?
A memory leak occurs when an application allocates memory but fails to release it when it's no longer needed. Over time, these unreleased memory allocations accumulate, causing the application to consume more and more memory. Eventually, this can lead to degraded performance, out-of-memory errors, and even application crashes.
In Ruby, memory leaks usually happen when objects remain referenced when they should be garbage collected. The Ruby garbage collector (GC) only reclaims memory for objects that are no longer referenced by the program.
Ruby's Memory Management
Before diving into memory leaks, it's important to understand how Ruby manages memory.
Ruby uses a garbage collector to automatically free memory that's no longer being used. The Ruby GC is a mark-and-sweep collector that:
- Marks all objects that are still referenced by the program
- Sweeps away (deallocates) all unmarked objects
Ruby 2.1 introduced a generational garbage collector called RGenGC, which relies on a "remembered set" (RSet) of old objects that reference young ones so that minor collections can skip most of the old generation. Later versions have continued to refine and improve the collector's performance.
Generational Structure
Conceptually, the heap is divided into generations. (In MRI, an object's generation is tracked with per-object age bits rather than physically separate memory regions, so treat the layout below as an illustration, not a memory map.)
- Young Generation (Eden): New objects start here and are subject to frequent, fast "minor" collections, since most objects die young.
- Survivor stage: Objects that live through a minor GC age with each collection; after surviving a few, they are promoted to the old generation.
- Old Generation: Long-lived objects, scanned only by the less frequent "major" collections.
A text-based diagram illustrates this:
+-----------------------------------+
| Young Generation |
| (Eden) |
+-----------------------------------+
| - New objects are allocated here. |
| - Frequent garbage collection. |
+-----------------------------------+
| Survivor Space |
+-----------------------------------+
| - Objects that survive one GC are |
| moved here. |
+-----------------------------------+
|
| When Survivor Space is full, |
| objects are promoted to Old |
| Generation. |
v
+-----------------------------------+
| Old Generation |
+-----------------------------------+
| - Long-lived objects. |
| - Less frequent garbage collection|
+-----------------------------------+
This structure optimizes performance by focusing on the generation most likely to have garbage, leveraging the generational hypothesis that most objects die young.
Garbage Collection Phases
- Mark Phase: Identifies reachable objects from root objects (globals, stack variables).
- Sweep Phase: Reclaims memory from unreachable objects.
Incremental marking, introduced in Ruby 2.2, splits the mark phase into smaller steps to reduce pause times, which matters most in long-running applications; subsequent releases have continued to shave GC overhead.
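You can watch the generational machinery at work through GC.stat, whose counters distinguish minor (young-only) collections from major (full-heap) ones. A quick illustration:
# Minor GCs scan only young objects; major GCs scan the whole heap
puts "Minor GCs so far: #{GC.stat(:minor_gc_count)}"
puts "Major GCs so far: #{GC.stat(:major_gc_count)}"
# Churning through short-lived objects should mostly trigger minor GCs
200_000.times { Object.new }
puts "Minor GCs now: #{GC.stat(:minor_gc_count)}"
puts "Old (promoted) objects: #{GC.stat(:old_objects)}"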
Here's a simple visualization of how Ruby's memory management works:
# New objects are allocated in the young generation (Eden)
obj = "I'm a new string"
# Objects that survive a few minor GCs are promoted to the old generation
survived_objects = []
10000.times { survived_objects << "I will survive" }
GC.start # Force a garbage collection
# Objects with no references are collected
temporary = "I'll be collected soon"
temporary = nil # The string is now unreferenced
GC.start # The unreferenced string is collected
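To see how much memory individual objects occupy, the objspace standard-library extension provides memsize_of. The figures are implementation-dependent approximations, not exact byte counts:
require 'objspace'
str = "x" * 1_000
puts ObjectSpace.memsize_of(str) # Approximate bytes used by the string
arr = Array.new(10_000) { |i| i }
puts ObjectSpace.memsize_of(arr) # Bytes for the array container, not its elements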
Common Causes of Memory Leaks in Ruby
Memory leaks in Ruby typically fall into several categories:
1. Global Variables and Class Variables
class MemoryLeaker
@@accumulated_data = []
def self.process_data(data)
@@accumulated_data << data
# Process the data...
# But we never clean up @@accumulated_data!
end
end
# This will keep growing indefinitely
loop do
MemoryLeaker.process_data("Some data")
end
2. Long-lived References in Closures
def create_handlers
data = load_large_dataset() # Loads 100MB of data
# This proc keeps a reference to the entire data set
handler = proc { |item| data.find { |d| d.id == item.id } }
return handler # data remains referenced by the handler
end
handlers = []
100.times { handlers << create_handlers() } # We now have 100 copies of the data
3. Circular References
class Node
attr_accessor :parent, :children
def initialize
@children = []
end
def add_child(child)
@children << child
child.parent = self # Creates a circular reference
end
end
# Create a tree structure
root = Node.new
100_000.times do
child = Node.new
root.add_child(child)
end
# Later, when we're done with the tree
root = nil
# Ruby's mark-and-sweep GC collects cycles: once no root references the tree,
# the whole structure is reclaimed despite the parent/child loops. The leak
# appears when any node in a cycle is also held by a long-lived root, e.g.
# $all_nodes << child -- a single global reference then pins the entire tree.
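If nodes do end up pinned (say, a global registry holds a few children), one mitigation is to make the back-reference weak, so a parent is never kept alive merely through its children. A minimal sketch, assuming we are free to modify Node:
require 'weakref'
class Node
attr_accessor :children
attr_writer :parent_ref
def initialize
@children = []
end
# Returns the parent while it is alive, nil once it has been collected
def parent
@parent_ref && @parent_ref.__getobj__
rescue WeakRef::RefError
nil
end
def add_child(child)
@children << child
child.parent_ref = WeakRef.new(self) # Weak back-reference instead of a strong cycle
end
end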
4. Caching Without Bounds
class ExpensiveCalculator
def initialize
@cache = {}
end
def calculate(input)
@cache[input] ||= perform_expensive_calculation(input)
end
private
def perform_expensive_calculation(input)
# ... expensive calculation ...
result = input * input * Math.sqrt(input)
return result
end
end
calculator = ExpensiveCalculator.new
# If we call this with unlimited distinct inputs, the cache grows forever
(1..1_000_000).each { |i| calculator.calculate(i) }
5. Event Handlers That Aren't Unregistered
class EventEmitter
def initialize
@listeners = []
end
def add_listener(listener)
@listeners << listener
end
def emit(event)
@listeners.each { |listener| listener.call(event) }
end
end
emitter = EventEmitter.new
# This creates objects that are never released
1000.times do |i|
temp_data = "Large data #{i}" * 1000
emitter.add_listener(proc { |event| puts "Processing event with data: #{temp_data}" })
end
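The fix is symmetry: every add_listener needs a matching removal path that the listener's owner calls when it is done. A small sketch extending the emitter above:
class EventEmitter
def remove_listener(listener)
@listeners.delete(listener)
end
end
listener = proc { |event| puts "Handling #{event}" }
emitter.add_listener(listener)
emitter.emit(:ping)
emitter.remove_listener(listener) # The proc and its captured data can now be collected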
Tools to Detect Memory Leaks
Several tools can help identify memory leaks in Ruby applications:
1. memory_profiler Gem
The memory_profiler gem provides detailed information about memory allocation and retention.
require 'memory_profiler'
report = MemoryProfiler.report do
# Code you want to profile
1000.times { "abc" * 100 }
end
report.pretty_print
Output example:
Total allocated: 1441280 bytes (14000 objects)
Total retained: 0 bytes (0 objects)
allocated memory by gem
-----------------------------------
1441280 ruby-2.7.0/lib
allocated memory by file
-----------------------------------
1441280 example.rb
allocated memory by location
-----------------------------------
1441280 example.rb:5
allocated memory by class
-----------------------------------
1441280 String
allocated objects by gem
-----------------------------------
14000 ruby-2.7.0/lib
allocated objects by file
-----------------------------------
14000 example.rb
allocated objects by location
-----------------------------------
14000 example.rb:5
allocated objects by class
-----------------------------------
14000 String
2. Derailed Benchmarks
For Rails applications, the derailed_benchmarks gem is incredibly useful:
# Add to your Gemfile
gem 'derailed_benchmarks', group: :development
# Then run
$ bundle exec derailed exec perf:mem
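Beyond perf:mem, derailed ships tasks aimed squarely at leak hunting; perf:mem_over_time, for example, hits an endpoint repeatedly and reports whether memory keeps growing:
# Memory use across many requests - steady growth suggests a leak
$ bundle exec derailed exec perf:mem_over_time
# Object allocation counts for a request
$ bundle exec derailed exec perf:objects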
3. ObjectSpace Module
Ruby's built-in ObjectSpace module lets you enumerate all live objects:
require 'objspace'
GC.start # Collect garbage first so the baseline counts only live objects
before = {}
ObjectSpace.each_object do |obj|
before[obj.class] ||= 0
before[obj.class] += 1
end
# Run some code you suspect has a leak
100.times { "test" * 100 }
GC.start # Collect again so the second count reflects retained objects only
after = {}
ObjectSpace.each_object do |obj|
after[obj.class] ||= 0
after[obj.class] += 1
end
# Compare before and after
after.each do |klass, count|
before_count = before[klass] || 0
if count > before_count
puts "#{klass}: #{count - before_count} new objects"
end
end
4. GC.stat
The GC.stat method provides statistics about the garbage collector:
before = GC.stat
# Run suspect code
after = GC.stat
puts "Objects allocated: #{after[:total_allocated_objects] - before[:total_allocated_objects]}"
puts "Heap pages: #{after[:heap_allocated_pages] - before[:heap_allocated_pages]}"
Practical Examples
Let's look at some real-world examples of memory leaks and how to fix them.
Example 1: Fixing a Class Variable Leak
# Bad implementation - leaks memory
class UserActivityLogger
@@logs = []
def self.log_activity(user_id, action)
@@logs << { user_id: user_id, action: action, timestamp: Time.now }
end
end
# Simulating activity
100_000.times do |i|
UserActivityLogger.log_activity(i % 1000, "login")
end
puts "Memory usage after logging: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
Fixed version:
# Better implementation - limits memory usage
class UserActivityLogger
@@logs = []
@@max_logs = 1000
def self.log_activity(user_id, action)
@@logs << { user_id: user_id, action: action, timestamp: Time.now }
@@logs.shift if @@logs.size > @@max_logs # Keep only the most recent logs
end
def self.clear_old_logs
cutoff_time = Time.now - 3600 # 1 hour ago
@@logs.reject! { |log| log[:timestamp] < cutoff_time }
end
end
# Simulating activity
100_000.times do |i|
UserActivityLogger.log_activity(i % 1000, "login")
end
puts "Memory usage after logging with limit: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
Example 2: Fixing a Closure Leak
# Memory leak with closures
def generate_processors
large_data = "x" * 1_000_000 # 1MB of data
processors = []
10.times do |i|
# Each processor captures the large_data in its closure
processors << proc { |item| item.to_s + large_data[0..10] }
end
return processors
end
all_processors = []
100.times do
all_processors.concat(generate_processors)
end
puts "Memory usage with closure leak: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
GC.start
puts "Memory usage after GC: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
Fixed version:
# Fixed closure implementation
def generate_processors
# Extract only what's needed from the large data
prefix = "x" * 10 # Only keep what we need
processors = []
10.times do |i|
# Now each processor only captures the small prefix
processors << proc { |item| item.to_s + prefix }
end
return processors
end
all_processors = []
100.times do
all_processors.concat(generate_processors)
end
puts "Memory usage with fixed closures: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
GC.start
puts "Memory usage after GC: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
Example 3: Implementing an LRU Cache
class LRUCache
def initialize(max_size)
@max_size = max_size
@cache = {}
@access_order = []
end
def [](key)
return nil unless @cache.key?(key)
# Update access order
@access_order.delete(key)
@access_order.push(key)
@cache[key]
end
def []=(key, value)
if @cache.key?(key)
# Update existing key's position in access order
@access_order.delete(key)
elsif @cache.size >= @max_size
# Remove least recently used item
oldest_key = @access_order.shift
@cache.delete(oldest_key)
end
@cache[key] = value
@access_order.push(key)
value
end
def size
@cache.size
end
end
# Usage
cache = LRUCache.new(1000)
100_000.times do |i|
cache[i % 5000] = "Value #{i}"
end
puts "Cache size after insertions: #{cache.size}"
puts "Memory usage with LRU cache: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
Example 4: Weak References
Ruby's WeakRef class can help prevent memory leaks by allowing objects to be garbage collected even while weak references to them exist.
require 'weakref'
class DocumentCache
def initialize
@cache = {}
end
def cache_document(id, document)
# Store a weak reference instead of the document itself
@cache[id] = WeakRef.new(document)
end
def get_document(id)
return nil unless @cache.key?(id)
begin
# Attempt to get the referenced object
@cache[id].__getobj__
rescue WeakRef::RefError
# Reference has been garbage collected
@cache.delete(id)
nil
end
end
end
# Usage
cache = DocumentCache.new
documents = []
100.times do |i|
doc = "Document content " * 10000 # Large document
cache.cache_document(i, doc)
documents << doc if i % 10 == 0 # Keep only some documents
end
# Force garbage collection
GC.start
# Check which documents are still available
documents_available = 0
100.times do |i|
documents_available += 1 if cache.get_document(i)
end
puts "Documents still in cache: #{documents_available}"
puts "Memory usage with weak references: #{`ps -o rss= -p #{Process.pid}`.to_i / 1024} MB"
Benchmarking Memory Usage
Let's compare different approaches to managing a cache and observe their memory usage patterns.
require 'benchmark'
require 'weakref'
def measure_memory
# Resident set size in MB (relies on `ps`, so Unix-like systems only)
`ps -o rss= -p #{Process.pid}`.to_i / 1024
end
def run_benchmark(name)
start_memory = measure_memory
start_time = Time.now
yield
GC.start
end_memory = measure_memory
end_time = Time.now
puts "#{name}:"
puts " Time: #{(end_time - start_time).round(2)} seconds"
puts " Memory before: #{start_memory} MB"
puts " Memory after: #{end_memory} MB"
puts " Difference: #{end_memory - start_memory} MB"
puts
end
# Scenario: Caching results of expensive calculations
# Benchmark 1: Unlimited cache
run_benchmark("Unlimited Cache") do
cache = {}
100_000.times do |i|
input = i % 10000
cache[input] ||= "Result of calculation #{input} " * 100
end
end
# Benchmark 2: Limited size cache
run_benchmark("Fixed-Size Cache (1000 items)") do
cache = {}
access_order = []
max_size = 1000
100_000.times do |i|
input = i % 10000
if cache.key?(input)
# Update access order
access_order.delete(input)
elsif cache.size >= max_size
# Remove least recently used item
oldest = access_order.shift
cache.delete(oldest)
end
cache[input] = "Result of calculation #{input} " * 100
access_order.push(input)
end
end
# Benchmark 3: Weak References
run_benchmark("Weak References") do
cache = {}
strong_refs = []
100_000.times do |i|
input = i % 10000
unless cache.key?(input)
result = "Result of calculation #{input} " * 100
cache[input] = WeakRef.new(result)
# Keep a strong reference to every 50th result to see the difference
strong_refs << result if input % 50 == 0
end
end
end
# Benchmark 4: The lru_redux gem (a third-party LRU cache)
begin
require 'lru_redux'
run_benchmark("LruRedux::Cache (1000 items)") do
cache = LruRedux::Cache.new(1000)
100_000.times do |i|
input = i % 10000
cache[input] ||= "Result of calculation #{input} " * 100
end
end
rescue LoadError
puts "LruRedux gem not installed, skipping benchmark 4"
end
# Benchmark 5: Time-based expiration
run_benchmark("Time-Based Expiration (5 seconds)") do
cache = {}
timestamps = {}
max_age = 5 # seconds
100_000.times do |i|
input = i % 10000
now = Time.now
# Clear expired entries
if i % 1000 == 0
expired_keys = timestamps.select { |k, t| now - t > max_age }.keys
expired_keys.each do |k|
cache.delete(k)
timestamps.delete(k)
end
end
unless cache.key?(input)
cache[input] = "Result of calculation #{input} " * 100
timestamps[input] = now
end
# Simulate time passing
sleep(0.00001) if i % 100 == 0
end
end
puts "Final comparison of memory usage approaches:"
puts "--------------------------------------------"
Advanced Techniques
Let's explore some advanced techniques for managing memory in Ruby.
Using ObjectSpace for Heap Dumping
Ruby's ObjectSpace module can dump heap information to a file for analysis:
require 'objspace'
# Enable object space tracing
ObjectSpace.trace_object_allocations_start
# Run your code with suspected memory issues
def allocate_objects
1000.times { "A" * 1000 }
end
allocate_objects
# Dump the heap to a file
File.open("heap_dump.json", "w") do |f|
ObjectSpace.dump_all(output: f)
end
ObjectSpace.trace_object_allocations_stop
puts "Heap dump created at heap_dump.json"
You can then analyze this dump with tools like heapy, which provides a command-line interface:
# gem install heapy
$ heapy read heap_dump.json
# Drill into a specific GC generation from the summary:
$ heapy read heap_dump.json 0
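While allocation tracing is enabled, objspace can also report where any individual live object was allocated, which is often enough to pin down a leak without a full dump:
require 'objspace'
ObjectSpace.trace_object_allocations_start
obj = "suspicious" * 100
puts ObjectSpace.allocation_sourcefile(obj) # File where obj was allocated
puts ObjectSpace.allocation_sourceline(obj) # Line number of the allocation
ObjectSpace.trace_object_allocations_stop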
Compacting Garbage Collection
In Ruby 2.7+, you can use compaction to reduce memory fragmentation:
# Force full garbage collection with compaction
GC.start(full_mark: true, immediate_sweep: true)
GC.compact
before = GC.stat
puts "Initial heap_live_slots: #{before[:heap_live_slots]}"
puts "Initial heap_eden_pages: #{before[:heap_eden_pages]}"
# Allocate many strings
strings = []
10_000.times { strings << "A" * 100 }
# Free half of them
5_000.times { strings.pop }
# Run GC with compaction
GC.start(full_mark: true, immediate_sweep: true)
GC.compact
after = GC.stat
puts "After heap_live_slots: #{after[:heap_live_slots]}"
puts "After heap_eden_pages: #{after[:heap_eden_pages]}"
Analyzing Memory with a Custom Tracer
Let's build a simple memory allocation tracer. It approximates allocation sites by watching c-call events for .new through the legacy set_trace_func hook; note that plain Ruby cannot hook allocations directly, so objects created via literals slip past it (which is why the demo below uses String.new):
class MemoryTracer
def initialize
@allocated = Hash.new(0)
@stack = []
end
def start
@start_stats = GC.stat
# Enable tracing
ObjectSpace.trace_object_allocations_start
set_trace_func proc { |event, file, line, id, binding, classname|
case event
when 'call'
@stack.push([file, line, id])
when 'return'
@stack.pop
when 'c-call'
if id == :new
# Count the allocation against the Ruby method currently on our stack
allocation_point = @stack.last
@allocated[allocation_point] += 1 if allocation_point
end
end
}
end
def stop
set_trace_func(nil)
ObjectSpace.trace_object_allocations_stop
@end_stats = GC.stat
end
def report
puts "Memory Allocation Report"
puts "-----------------------"
puts "Total objects allocated: #{@end_stats[:total_allocated_objects] - @start_stats[:total_allocated_objects]}"
puts "\nTop allocation sites:"
@allocated.sort_by { |k, v| -v }.first(10).each do |site, count|
file, line, method_name = site
puts " #{count} objects: #{file}:#{line} in #{method_name}"
end
end
end
# Usage
tracer = MemoryTracer.new
tracer.start
# Run code to analyze
def create_strings
result = []
1000.times { result << String.new("test" * 100) } # String.new so the c-call hook sees it
result
end
strings = create_strings
tracer.stop
tracer.report
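On modern Rubies, a simpler and more precise alternative to the c-call heuristic is to reuse objspace's allocation tracing from the heap-dumping section and group live objects by their recorded allocation site:
require 'objspace'
ObjectSpace.trace_object_allocations do
1_000.times { "test" * 100 }
counts = Hash.new(0)
ObjectSpace.each_object(String) do |obj|
file = ObjectSpace.allocation_sourcefile(obj)
counts[[file, ObjectSpace.allocation_sourceline(obj)]] += 1 if file
end
counts.sort_by { |_, c| -c }.first(5).each do |(file, line), c|
puts "#{c} strings allocated at #{file}:#{line}"
end
end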
Thread-Safe Caching with Timeouts
A more robust caching solution with timeouts and thread safety:
require 'monitor'
require 'concurrent'
class TimedCache
def initialize(max_size: 1000, ttl: 300)
@max_size = max_size # Maximum number of items
@ttl = ttl # Time to live in seconds
@cache = {}
@monitor = Monitor.new
@cleanup_task = schedule_cleanup
end
def [](key)
@monitor.synchronize do
entry = @cache[key]
return nil if entry.nil? || entry[:expires_at] < Time.now
# Update access time
entry[:last_accessed] = Time.now
entry[:value]
end
end
def []=(key, value)
@monitor.synchronize do
now = Time.now
# Make room if necessary
if !@cache.key?(key) && @cache.size >= @max_size
# Find the least recently accessed item
oldest_key = @cache.min_by { |_, entry| entry[:last_accessed] }&.first
@cache.delete(oldest_key) if oldest_key
end
@cache[key] = {
value: value,
last_accessed: now,
expires_at: now + @ttl
}
value
end
end
def size
@monitor.synchronize { @cache.size }
end
def clear
@monitor.synchronize { @cache.clear }
end
def stop
@cleanup_task&.shutdown # TimerTask is an executor; shutdown stops the timer
end
private
def schedule_cleanup
Concurrent::TimerTask.new(execution_interval: [@ttl / 10, 60].min) do
cleanup_expired
end.tap(&:execute)
end
def cleanup_expired
@monitor.synchronize do
now = Time.now
expired_keys = @cache.select { |_, entry| entry[:expires_at] < now }.keys
expired_keys.each { |key| @cache.delete(key) }
end
end
end
# Usage example
cache = TimedCache.new(max_size: 100, ttl: 10) # 10 second TTL
# Fill cache
100.times do |i|
cache[i] = "Value #{i}" * 100
end
puts "Initial cache size: #{cache.size}"
sleep 5 # Wait a bit
# Add more items, exceeding max size
50.times do |i|
cache[i + 100] = "New value #{i}" * 100
end
puts "Cache size after adding more: #{cache.size}"
sleep 6 # Wait for original items to expire
puts "Cache size after expiration: #{cache.size}"
cache.stop # Clean up background task
Best Practices to Prevent Memory Leaks
Based on the examples and techniques discussed, here are some best practices to prevent memory leaks in Ruby:
1. Limit the size of caches and collections
- Always set upper bounds on cache sizes
- Use LRU or similar eviction strategies
- Consider time-based expiration for cached items
2. Be careful with global and class variables
- Avoid unbounded growth in global state
- Consider alternative designs that don't require global state
- Periodically clean up class-level collections
3. Avoid strong references from long-lived roots into object cycles
- Use weak references when appropriate
- Break circular references when objects are no longer needed
- Consider observer patterns with explicit registration/deregistration
4. Monitor memory usage
- Use tools like memory_profiler regularly
- Set up memory usage alerts in production
- Run occasional heap dumps and analyze them
5. Be mindful of closures
- Be aware of which variables are captured in closures
- Only capture what you need, not entire environments
- Consider passing needed data as arguments instead of capturing
6. Clean up after background processes
- Ensure background threads are properly terminated
- Remove event listeners when they're no longer needed
- Use finalizers when appropriate to clean up resources
7. Use benchmarks to compare approaches
- Test memory usage with different implementations
- Run load tests before deploying memory-sensitive code
- Monitor memory usage over time in production
8. Leverage Ruby's garbage collector
- Understand how the GC works
- Use GC.start and GC.compact strategically
- Configure GC parameters for your workload (see the snippet after this list)
Conclusion
Memory leaks in Ruby applications can be tricky to diagnose and fix, but with a good understanding of Ruby's memory management system and the right tools, they can be effectively managed. Regular monitoring, careful design, and proper testing are key to avoiding memory leaks in production.
Remember that Ruby's garbage collector is quite sophisticated, but it can only do so much. As developers, we need to be mindful of how we're using memory, especially in long-running applications like web servers and background workers.
By applying the techniques and best practices outlined in this article, you'll be well-equipped to build Ruby applications that maintain stable memory usage over time, providing reliable performance for your users.
References
- Ruby Official Documentation on GC
- memory_profiler gem
- derailed_benchmarks gem
- heapy gem
- Ruby Under a Microscope by Pat Shaughnessy
- Understanding and Measuring Memory Usage in Ruby by Nate Berkopec