Ruby's functional programming features shine when building AI pipelines. Procs, lambdas, and closures let you encapsulate behavior, while Enumerable methods transform data with elegance. This article, part of the Ruby for AI series, shows how these patterns solve real problems in machine learning workflows.
Understanding Procs and Lambdas
Both Procs and lambdas are blocks of code you can store in variables and pass around. The differences matter for AI pipelines where data integrity is crucial.
# A Proc - flexible argument handling
normalize_proc = Proc.new { |value, min, max| (value - min) / (max - min).to_f }
# A lambda - strict about arguments
normalize_lambda = ->(value, min, max) { (value - min) / (max - min).to_f }
# Both work for normalizing features
data_point = 75.0
min_val = 0.0
max_val = 100.0
puts normalize_proc.call(data_point, min_val, max_val) # => 0.75
puts normalize_lambda.call(data_point, min_val, max_val) # => 0.75
# Procs ignore extra arguments, lambdas raise errors
puts normalize_proc.call(data_point, min_val, max_val, "extra") # works, ignores extra
# normalize_lambda.call(data_point, min_val, max_val, "extra") # ArgumentError!
Use lambdas when you need strict contracts for data transformations. Use Procs when building flexible processing pipelines that might receive varying inputs.
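One more difference matters in practice: `return` inside a lambda exits only the lambda, while `return` inside a Proc exits the enclosing method, which can silently short-circuit a pipeline. A minimal sketch (the method and variable names here are illustrative, not from a library):

```ruby
# In a lambda, return exits only the lambda itself
def process_with_lambda(values)
  validate = ->(v) { return 0.0 if v.negative?; v }
  values.map { |v| validate.call(v) }
end

# In a Proc, return exits the ENCLOSING method
def process_with_proc(values)
  validate = Proc.new { |v| return [] if v.negative?; v }
  values.map { |v| validate.call(v) }
end

puts process_with_lambda([1.0, -2.0, 3.0]).inspect # => [1.0, 0.0, 3.0]
puts process_with_proc([1.0, -2.0, 3.0]).inspect   # => []
```

The Proc version bails out of the whole method the moment it sees a negative value, discarding every result so far, which is rarely what you want mid-pipeline.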
Closures Capture State
Closures remember their surrounding context. This is powerful for creating configurable transformers without classes.
def make_scaler(method: :standard)
  case method
  when :min_max
    ->(data) {
      min, max = data.minmax
      data.map { |x| (x - min) / (max - min).to_f }
    }
  when :standard
    mean = nil # captured in closure
    std = nil
    ->(data) {
      mean ||= data.sum / data.size.to_f
      std ||= Math.sqrt(data.map { |x| (x - mean) ** 2 }.sum / data.size)
      data.map { |x| (x - mean) / std }
    }
  when :robust
    ->(data) {
      sorted = data.sort
      median = sorted[data.size / 2]
      mad = sorted.map { |x| (x - median).abs }.sort[data.size / 2]
      data.map { |x| (x - median) / mad.to_f }
    }
  end
end
# Create specialized scalers
raw_features = [10, 20, 30, 40, 50, 100, 200]
min_max_scaler = make_scaler(method: :min_max)
standard_scaler = make_scaler(method: :standard)
puts "Min-max scaled: #{min_max_scaler.call(raw_features).map { |f| f.round(3) }}"
# => [0.0, 0.053, 0.105, 0.158, 0.211, 0.474, 1.0]
puts "Standard scaled: #{standard_scaler.call(raw_features).map { |f| f.round(3) }}"
# => [-0.881, -0.719, -0.556, -0.394, -0.232, 0.58, 2.203]
The closures capture configuration (method) and computed values (mean, std) without explicit object instantiation. Note the ||= memoization in the standard scaler: it computes mean and std from the first dataset it sees and reuses those statistics on every later call, which is how you apply training-set statistics to new data.
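The same capture mechanism supports stateful accumulators for streaming data. As a small sketch (the helper name is made up for this example), a closure can maintain a running mean incrementally, with no class and no full array in memory:

```ruby
# Hypothetical helper: a closure holding running state (count, mean)
def make_running_mean
  count = 0
  mean = 0.0
  ->(x) {
    count += 1
    mean += (x - mean) / count # incremental mean update
    mean
  }
end

tracker = make_running_mean
[10.0, 20.0, 30.0].each { |x| puts tracker.call(x) }
# => 10.0, 15.0, 20.0
```

Each call updates the captured count and mean in place, so the lambda behaves like a tiny stateful object.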
Enumerable Powers Data Pipelines
Ruby's Enumerable module provides the vocabulary for data transformation. These methods chain together to build readable pipelines.
Map for Feature Extraction
require 'json'
# Raw sensor data
sensor_readings = [
  { "timestamp" => "2024-01-15T10:00:00", "temperature" => 22.5, "humidity" => 45, "vibration" => 0.02 },
  { "timestamp" => "2024-01-15T10:01:00", "temperature" => 23.1, "humidity" => 46, "vibration" => 0.05 },
  { "timestamp" => "2024-01-15T10:02:00", "temperature" => 22.8, "humidity" => 44, "vibration" => 0.15 }
]
# Extract features using map
feature_vectors = sensor_readings.map do |reading|
  # Feature engineering inline
  temp_humidity_interaction = reading["temperature"] * reading["humidity"] / 1000.0
  vibration_anomaly = reading["vibration"] > 0.1 ? 1 : 0
  [
    reading["temperature"],
    reading["humidity"],
    reading["vibration"],
    temp_humidity_interaction,
    vibration_anomaly
  ]
end
puts "Feature vectors:"
feature_vectors.each { |v| puts v.map { |f| f.round(3) }.inspect }
# => [22.5, 45, 0.02, 1.012, 0]
# => [23.1, 46, 0.05, 1.063, 0]
# => [22.8, 44, 0.15, 1.003, 1]
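For time-series sensors, windowed features often matter more than single readings. Enumerable's each_cons makes rolling windows trivial; here is a minimal sketch over the temperature values (the window size of 3 is arbitrary):

```ruby
temperatures = [22.5, 23.1, 22.8, 23.4, 22.9]

# each_cons(3) yields every run of 3 consecutive readings;
# mapping over the windows gives a rolling mean feature
rolling_means = temperatures.each_cons(3).map do |window|
  (window.sum / window.size).round(3)
end

puts rolling_means.inspect # => [22.8, 23.1, 23.033]
```

Note the result is two elements shorter than the input, since each_cons only yields complete windows.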
Select and Reject for Filtering
# Filter training data by quality criteria
dataset = [
  { features: [1.0, 2.0], label: "A", confidence: 0.95 },
  { features: [1.5, 2.5], label: "A", confidence: 0.82 },
  { features: [3.0, 1.0], label: "B", confidence: 0.45 }, # low confidence
  { features: [nil, 2.0], label: "A", confidence: 0.90 }, # missing data
  { features: [2.0, 3.0], label: "B", confidence: 0.91 }
]
# Build a reusable quality filter using a proc
quality_threshold = 0.8
has_complete_features = ->(sample) { sample[:features].all? }
has_high_confidence = ->(sample) { sample[:confidence] >= quality_threshold }
clean_data = dataset
  .select(&has_complete_features)
  .select(&has_high_confidence)
puts "Clean samples: #{clean_data.size}" # => 3
puts "Rejected: #{dataset.size - clean_data.size}" # => 2
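When you also want to inspect the rejects, say for a data-quality report, partition splits the dataset in a single pass instead of two select calls. A sketch with the same dataset and criteria repeated so the snippet runs standalone:

```ruby
quality_threshold = 0.8
dataset = [
  { features: [1.0, 2.0], label: "A", confidence: 0.95 },
  { features: [1.5, 2.5], label: "A", confidence: 0.82 },
  { features: [3.0, 1.0], label: "B", confidence: 0.45 },
  { features: [nil, 2.0], label: "A", confidence: 0.90 },
  { features: [2.0, 3.0], label: "B", confidence: 0.91 }
]

# partition returns [matching, non-matching] in one traversal
passing, failing = dataset.partition do |sample|
  sample[:features].all? && sample[:confidence] >= quality_threshold
end

puts "Clean: #{passing.size}, rejected: #{failing.size}" # => Clean: 3, rejected: 2
```

Having the failing samples on hand lets you log why data was dropped rather than silently discarding it.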
Reduce for Aggregation
# Calculate confusion matrix from predictions
predictions = [
  { actual: "cat", predicted: "cat" },
  { actual: "dog", predicted: "dog" },
  { actual: "cat", predicted: "dog" },
  { actual: "dog", predicted: "cat" },
  { actual: "cat", predicted: "cat" }
]
confusion_matrix = predictions.reduce(Hash.new(0)) do |matrix, pred|
  key = "#{pred[:actual]}_#{pred[:predicted]}"
  matrix[key] += 1
  matrix
end
puts "Confusion counts: #{confusion_matrix}"
# => {"cat_cat"=>2, "dog_dog"=>1, "cat_dog"=>1, "dog_cat"=>1}
# Calculate accuracy with reduce
correct = predictions.reduce(0) { |sum, p| p[:actual] == p[:predicted] ? sum + 1 : sum }
accuracy = correct.to_f / predictions.size
puts "Accuracy: #{(accuracy * 100).round(1)}%" # => 60.0%
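For simple counting, reduce is more general than you need. Ruby's tally (2.7+) and count with a block express the same aggregations more directly; a sketch over the same predictions:

```ruby
predictions = [
  { actual: "cat", predicted: "cat" },
  { actual: "dog", predicted: "dog" },
  { actual: "cat", predicted: "dog" },
  { actual: "dog", predicted: "cat" },
  { actual: "cat", predicted: "cat" }
]

# tally counts occurrences of each actual/predicted pair (Ruby >= 2.7)
confusion = predictions.map { |p| "#{p[:actual]}_#{p[:predicted]}" }.tally

# count with a block replaces the manual reduce for accuracy
correct = predictions.count { |p| p[:actual] == p[:predicted] }

puts confusion # => {"cat_cat"=>2, "dog_dog"=>1, "cat_dog"=>1, "dog_cat"=>1}
puts correct.fdiv(predictions.size) # => 0.6
```

Reach for reduce when the accumulator is genuinely custom; for counts, the specialized methods say what they mean.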
Find and Detect for Search
# Find first anomalous reading in time series
time_series = [0.1, 0.12, 0.11, 0.15, 0.13, 0.45, 0.12, 0.11]
# Create a dynamic threshold detector
def make_anomaly_detector(window_size: 5, threshold_multiplier: 3.0)
  ->(series) {
    series.each_with_index.find do |value, index|
      next false if index < window_size
      window = series[[0, index - window_size].max...index]
      mean = window.sum / window.size
      std = Math.sqrt(window.map { |v| (v - mean) ** 2 }.sum / window.size)
      std > 0 && (value - mean).abs > threshold_multiplier * std
    end
  }
end
detector = make_anomaly_detector(window_size: 5, threshold_multiplier: 2.5)
anomaly = detector.call(time_series)
if anomaly
  value, index = anomaly
  puts "Anomaly detected at index #{index}: value #{value}"
  # => Anomaly detected at index 5: value 0.45
end
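Because find stops at the first match, it pairs well with lazy enumerators when the series is unbounded, such as a live sensor feed. A minimal sketch with a synthetic feed and a fixed threshold (simpler than the windowed detector above; the spike position is contrived):

```ruby
# Hypothetical infinite feed: small noise with a spike at index 50
feed = Enumerator.new do |y|
  i = 0
  loop do
    y << (i == 50 ? 0.9 : 0.1)
    i += 1
  end
end

# lazy evaluates only as far as needed to find the first exceedance
first_spike = feed.lazy.each_with_index.find { |value, _i| value > 0.5 }
puts first_spike.inspect # => [0.9, 50]
```

Without the short-circuiting semantics of find, an infinite enumerator like this would never terminate.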
Chaining Methods for Pipeline Elegance
Real AI pipelines combine these operations. Ruby's method chaining creates readable data flows.
class DataPipeline
  def initialize(data)
    @data = data
  end

  def filter(&block)
    @data = @data.select(&block)
    self
  end

  def transform(&block)
    @data = @data.map(&block)
    self
  end

  def aggregate(&block)
    @data = @data.reduce(&block)
    self
  end

  def to_a
    @data
  end
end
# Process batch inference results
results = [
  { id: 1, prediction: 0.92, latency_ms: 45, model_version: "v2" },
  { id: 2, prediction: 0.34, latency_ms: 120, model_version: "v1" },
  { id: 3, prediction: 0.89, latency_ms: 38, model_version: "v2" },
  { id: 4, prediction: 0.95, latency_ms: 200, model_version: "v1" },
  { id:
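The original listing is cut off above. As a hedged sketch of how such a pipeline call might chain, with the class repeated and a small invented results array so the snippet runs standalone, filtering to one model version and extracting its predictions:

```ruby
class DataPipeline
  def initialize(data)
    @data = data
  end

  def filter(&block)
    @data = @data.select(&block)
    self
  end

  def transform(&block)
    @data = @data.map(&block)
    self
  end

  def to_a
    @data
  end
end

# Hypothetical batch results (the article's full array is truncated above)
results = [
  { id: 1, prediction: 0.92, latency_ms: 45, model_version: "v2" },
  { id: 2, prediction: 0.34, latency_ms: 120, model_version: "v1" },
  { id: 3, prediction: 0.89, latency_ms: 38, model_version: "v2" }
]

v2_predictions = DataPipeline.new(results)
  .filter { |r| r[:model_version] == "v2" }
  .transform { |r| r[:prediction] }
  .to_a

puts v2_predictions.inspect # => [0.92, 0.89]
```

Each stage returns self, so the chain reads top to bottom like the data flow it describes.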