Ruby's functional programming features shine when building AI pipelines. Procs, lambdas, and closures let you encapsulate behavior, while Enumerable methods transform data with elegance. This article, part of the Ruby for AI series, shows how these patterns solve real problems in machine learning workflows.
Understanding Procs and Lambdas
Both Procs and lambdas are blocks of code you can store in variables and pass around. The differences matter for AI pipelines where data integrity is crucial.
# A Proc - flexible argument handling
normalize_proc = Proc.new { |value, min, max| (value - min) / (max - min).to_f }
# A lambda - strict about arguments
normalize_lambda = ->(value, min, max) { (value - min) / (max - min).to_f }
# Both work for normalizing features
data_point = 75.0
min_val = 0.0
max_val = 100.0
puts normalize_proc.call(data_point, min_val, max_val) # => 0.75
puts normalize_lambda.call(data_point, min_val, max_val) # => 0.75
# Procs ignore extra arguments, lambdas raise errors
puts normalize_proc.call(data_point, min_val, max_val, "extra") # works, ignores extra
# normalize_lambda.call(data_point, min_val, max_val, "extra") # ArgumentError!
Use lambdas when you need strict contracts for data transformations. Use Procs when building flexible processing pipelines that might receive varying inputs.
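One more difference matters in practice: `return` inside a lambda exits only the lambda, while `return` inside a Proc exits the enclosing method, which can silently short-circuit a pipeline. A minimal sketch (the method and variable names here are illustrative, not from a library):

```ruby
# In a lambda, return exits only the lambda itself
def process_with_lambda(values)
  validate = ->(v) { return 0.0 if v.negative?; v }
  values.map { |v| validate.call(v) }
end

# In a Proc, return exits the ENCLOSING method
def process_with_proc(values)
  validate = Proc.new { |v| return [] if v.negative?; v }
  values.map { |v| validate.call(v) }
end

puts process_with_lambda([1.0, -2.0, 3.0]).inspect # => [1.0, 0.0, 3.0]
puts process_with_proc([1.0, -2.0, 3.0]).inspect   # => []
```

The Proc version bails out of the whole method the moment it sees a negative value, discarding every result so far, which is rarely what you want mid-pipeline.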
Closures Capture State
Closures remember their surrounding context. This is powerful for creating configurable transformers without classes.
def make_scaler(method: :standard)
  case method
  when :min_max
    ->(data) {
      min, max = data.minmax
      data.map { |x| (x - min) / (max - min).to_f }
    }
  when :standard
    mean = nil # captured in closure
    std = nil
    ->(data) {
      mean ||= data.sum / data.size.to_f
      std ||= Math.sqrt(data.map { |x| (x - mean) ** 2 }.sum / data.size)
      data.map { |x| (x - mean) / std }
    }
  when :robust
    ->(data) {
      sorted = data.sort
      median = sorted[data.size / 2]
      mad = sorted.map { |x| (x - median).abs }.sort[data.size / 2]
      data.map { |x| (x - median) / mad.to_f }
    }
  end
end
# Create specialized scalers
raw_features = [10, 20, 30, 40, 50, 100, 200]
min_max_scaler = make_scaler(method: :min_max)
standard_scaler = make_scaler(method: :standard)
puts "Min-max scaled: #{min_max_scaler.call(raw_features).map { |f| f.round(3) }}"
# => [0.0, 0.053, 0.105, 0.158, 0.211, 0.474, 1.0]
puts "Standard scaled: #{standard_scaler.call(raw_features).map { |f| f.round(3) }}"
# => [-0.881, -0.719, -0.556, -0.394, -0.232, 0.58, 2.203]
The closures capture configuration (method) and computed values (mean, std) without explicit object instantiation. Note the ||= memoization in the standard scaler: it computes mean and std from the first dataset it sees and reuses those statistics on every later call, which is how you apply training-set statistics to new data.
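The same capture mechanism supports stateful accumulators for streaming data. As a small sketch (the helper name is made up for this example), a closure can maintain a running mean incrementally, with no class and no full array in memory:

```ruby
# Hypothetical helper: a closure holding running state (count, mean)
def make_running_mean
  count = 0
  mean = 0.0
  ->(x) {
    count += 1
    mean += (x - mean) / count # incremental mean update
    mean
  }
end

tracker = make_running_mean
[10.0, 20.0, 30.0].each { |x| puts tracker.call(x) }
# => 10.0, 15.0, 20.0
```

Each call updates the captured count and mean in place, so the lambda behaves like a tiny stateful object.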
Enumerable Powers Data Pipelines
Ruby's Enumerable module provides the vocabulary for data transformation. These methods chain together to build readable pipelines.
Map for Feature Extraction
require 'json'
# Raw sensor data
sensor_readings = [
  { "timestamp" => "2024-01-15T10:00:00", "temperature" => 22.5, "humidity" => 45, "vibration" => 0.02 },
  { "timestamp" => "2024-01-15T10:01:00", "temperature" => 23.1, "humidity" => 46, "vibration" => 0.05 },
  { "timestamp" => "2024-01-15T10:02:00", "temperature" => 22.8, "humidity" => 44, "vibration" => 0.15 }
]
# Extract features using map
feature_vectors = sensor_readings.map do |reading|
  # Feature engineering inline
  temp_humidity_interaction = reading["temperature"] * reading["humidity"] / 1000.0
  vibration_anomaly = reading["vibration"] > 0.1 ? 1 : 0
  [
    reading["temperature"],
    reading["humidity"],
    reading["vibration"],
    temp_humidity_interaction,
    vibration_anomaly
  ]
end
puts "Feature vectors:"
feature_vectors.each { |v| puts v.map { |f| f.round(3) }.inspect }
# => [22.5, 45, 0.02, 1.012, 0]
# => [23.1, 46, 0.05, 1.063, 0]
# => [22.8, 44, 0.15, 1.003, 1]
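For time-series sensors, windowed features often matter more than single readings. Enumerable's each_cons makes rolling windows trivial; here is a minimal sketch over the temperature values (the window size of 3 is arbitrary):

```ruby
temperatures = [22.5, 23.1, 22.8, 23.4, 22.9]

# each_cons(3) yields every run of 3 consecutive readings;
# mapping over the windows gives a rolling mean feature
rolling_means = temperatures.each_cons(3).map do |window|
  (window.sum / window.size).round(3)
end

puts rolling_means.inspect # => [22.8, 23.1, 23.033]
```

Note the result is two elements shorter than the input, since each_cons only yields complete windows.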
Select and Reject for Filtering
# Filter training data by quality criteria
dataset = [
  { features: [1.0, 2.0], label: "A", confidence: 0.95 },
  { features: [1.5, 2.5], label: "A", confidence: 0.82 },
  { features: [3.0, 1.0], label: "B", confidence: 0.45 }, # low confidence
  { features: [nil, 2.0], label: "A", confidence: 0.90 }, # missing data
  { features: [2.0, 3.0], label: "B", confidence: 0.91 }
]
# Build a reusable quality filter using a proc
quality_threshold = 0.8
has_complete_features = ->(sample) { sample[:features].all? }
has_high_confidence = ->(sample) { sample[:confidence] >= quality_threshold }
clean_data = dataset
  .select(&has_complete_features)
  .select(&has_high_confidence)
puts "Clean samples: #{clean_data.size}" # => 3
puts "Rejected: #{dataset.size - clean_data.size}" # => 2
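When you also want to inspect the rejects, say for a data-quality report, partition splits the dataset in a single pass instead of two select calls. A sketch with the same dataset and criteria repeated so the snippet runs standalone:

```ruby
quality_threshold = 0.8
dataset = [
  { features: [1.0, 2.0], label: "A", confidence: 0.95 },
  { features: [1.5, 2.5], label: "A", confidence: 0.82 },
  { features: [3.0, 1.0], label: "B", confidence: 0.45 },
  { features: [nil, 2.0], label: "A", confidence: 0.90 },
  { features: [2.0, 3.0], label: "B", confidence: 0.91 }
]

# partition returns [matching, non-matching] in one traversal
passing, failing = dataset.partition do |sample|
  sample[:features].all? && sample[:confidence] >= quality_threshold
end

puts "Clean: #{passing.size}, rejected: #{failing.size}" # => Clean: 3, rejected: 2
```

Having the failing samples on hand lets you log why data was dropped rather than silently discarding it.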
Reduce for Aggregation
# Calculate confusion matrix from predictions
predictions = [
  { actual: "cat", predicted: "cat" },
  { actual: "dog", predicted: "dog" },
  { actual: "cat", predicted: "dog" },
  { actual: "dog", predicted: "cat" },
  { actual: "cat", predicted: "cat" }
]
confusion_matrix = predictions.reduce(Hash.new(0)) do |matrix, pred|
  key = "#{pred[:actual]}_#{pred[:predicted]}"
  matrix[key] += 1
  matrix
end
puts "Confusion counts: #{confusion_matrix}"
# => {"cat_cat"=>2, "dog_dog"=>1, "cat_dog"=>1, "dog_cat"=>1}
# Calculate accuracy with reduce
correct = predictions.reduce(0) { |sum, p| p[:actual] == p[:predicted] ? sum + 1 : sum }
accuracy = correct.to_f / predictions.size
puts "Accuracy: #{(accuracy * 100).round(1)}%" # => 60.0%
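For simple counting, reduce is more general than you need. Ruby's tally (2.7+) and count with a block express the same aggregations more directly; a sketch over the same predictions:

```ruby
predictions = [
  { actual: "cat", predicted: "cat" },
  { actual: "dog", predicted: "dog" },
  { actual: "cat", predicted: "dog" },
  { actual: "dog", predicted: "cat" },
  { actual: "cat", predicted: "cat" }
]

# tally counts occurrences of each actual/predicted pair (Ruby >= 2.7)
confusion = predictions.map { |p| "#{p[:actual]}_#{p[:predicted]}" }.tally

# count with a block replaces the manual reduce for accuracy
correct = predictions.count { |p| p[:actual] == p[:predicted] }

puts confusion # => {"cat_cat"=>2, "dog_dog"=>1, "cat_dog"=>1, "dog_cat"=>1}
puts correct.fdiv(predictions.size) # => 0.6
```

Reach for reduce when the accumulator is genuinely custom; for counts, the specialized methods say what they mean.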
Find and Detect for Search
# Find first anomalous reading in time series
time_series = [0.1, 0.12, 0.11, 0.15, 0.13, 0.45, 0.12, 0.11]
# Create a dynamic threshold detector
def make_anomaly_detector(window_size: 5, threshold_multiplier: 3.0)
  ->(series) {
    series.each_with_index.find do |value, index|
      next false if index < window_size
      window = series[[0, index - window_size].max...index]
      mean = window.sum / window.size
      std = Math.sqrt(window.map { |v| (v - mean) ** 2 }.sum / window.size)
      std > 0 && (value - mean).abs > threshold_multiplier * std
    end
  }
end
detector = make_anomaly_detector(window_size: 5, threshold_multiplier: 2.5)
anomaly = detector.call(time_series)
if anomaly
  value, index = anomaly
  puts "Anomaly detected at index #{index}: value #{value}"
  # => Anomaly detected at index 5: value 0.45
end
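Because find stops at the first match, it pairs well with lazy enumerators when the series is unbounded, such as a live sensor feed. A minimal sketch with a synthetic feed and a fixed threshold (simpler than the windowed detector above; the spike position is contrived):

```ruby
# Hypothetical infinite feed: small noise with a spike at index 50
feed = Enumerator.new do |y|
  i = 0
  loop do
    y << (i == 50 ? 0.9 : 0.1)
    i += 1
  end
end

# lazy evaluates only as far as needed to find the first exceedance
first_spike = feed.lazy.each_with_index.find { |value, _i| value > 0.5 }
puts first_spike.inspect # => [0.9, 50]
```

Without the short-circuiting semantics of find, an infinite enumerator like this would never terminate.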
Chaining Methods for Pipeline Elegance
Real AI pipelines combine these operations. Ruby's method chaining creates readable data flows.
class DataPipeline
  def initialize(data)
    @data = data
  end

  def filter(&block)
    @data = @data.select(&block)
    self
  end

  def transform(&block)
    @data = @data.map(&block)
    self
  end

  def aggregate(&block)
    @data = @data.reduce(&block)
    self
  end

  def to_a
    @data
  end
end
# Process batch inference results
results = [
  { id: 1, prediction: 0.92, latency_ms: 45, model_version: "v2" },
  { id: 2, prediction: 0.34, latency_ms: 120, model_version: "v1" },
  { id: 3, prediction: 0.89, latency_ms: 38, model_version: "v2" },
  { id: 4, prediction: 0.95, latency_ms: 200, model_version: "v1" },
  { id:
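The original listing is cut off above. As a hedged sketch of how such a pipeline call might chain, with the class repeated and a small invented results array so the snippet runs standalone, filtering to one model version and extracting its predictions:

```ruby
class DataPipeline
  def initialize(data)
    @data = data
  end

  def filter(&block)
    @data = @data.select(&block)
    self
  end

  def transform(&block)
    @data = @data.map(&block)
    self
  end

  def to_a
    @data
  end
end

# Hypothetical batch results (the article's full array is truncated above)
results = [
  { id: 1, prediction: 0.92, latency_ms: 45, model_version: "v2" },
  { id: 2, prediction: 0.34, latency_ms: 120, model_version: "v1" },
  { id: 3, prediction: 0.89, latency_ms: 38, model_version: "v2" }
]

v2_predictions = DataPipeline.new(results)
  .filter { |r| r[:model_version] == "v2" }
  .transform { |r| r[:prediction] }
  .to_a

puts v2_predictions.inspect # => [0.92, 0.89]
```

Each stage returns self, so the chain reads top to bottom like the data flow it describes.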