DEV Community

kojix2


Easy machine learning with Ruby using Rumale

What is Rumale

A powerful library for machine learning written in pure Ruby!
Rumale was created by @yoshoku.
https://github.com/yoshoku/rumale

Rumale (Ruby machine learning) is a machine learning library for Ruby. It provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering, Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.

Install

# daru and rdatasets are only needed for the dataset example below
gem install rumale daru rdatasets

Prepare a dataset

require 'rumale'
require 'daru'
require 'rdatasets'

# Load the iris dataset as a Daru::DataFrame
iris = RDatasets.load(:datasets, :iris)

# Labels: encode the species names as integers (Numo::Int32, shape [150])
iris_labels = iris['Species'].to_a
encoder = Rumale::Preprocessing::LabelEncoder.new
labels = encoder.fit_transform(iris_labels)

# Samples: convert the four feature columns from Daru to NArray
# (Numo::DFloat, shape [150, 4])
samples = Numo::DFloat[*iris[0..3].to_matrix.to_a]
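
To make the label-encoding step less magical, here is a plain-Ruby sketch of the idea: each distinct class name gets an integer code. This is not Rumale's implementation. The helper name `encode_labels` is made up for this post, and numbering the classes in sorted order is an assumption.

```ruby
# Plain-Ruby sketch of label encoding (not Rumale's implementation).
# Assumption: classes are numbered by their sorted order.
def encode_labels(raw_labels)
  classes = raw_labels.uniq.sort
  index_of = classes.each_with_index.to_h # class name -> integer code
  codes = raw_labels.map { |label| index_of.fetch(label) }
  [classes, codes]
end

species = %w[setosa virginica versicolor setosa]
classes, codes = encode_labels(species)
p classes # => ["setosa", "versicolor", "virginica"]
p codes   # => [0, 2, 1, 0]
```

Keeping the `classes` array around is what lets you map a predicted integer back to a species name later.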

Classification models

# Support vector machine
model = Rumale::LinearModel::SVC.new(
  reg_param: 0.0001,
  fit_bias: true,
  max_iter: 3000,
  random_seed: 1
)

Various classifiers

model = Rumale::Tree::DecisionTreeClassifier.new(random_seed: 1)
model = Rumale::Ensemble::RandomForestClassifier.new(random_seed: 1)
model = Rumale::NearestNeighbors::KNeighborsClassifier.new(n_neighbors: 5)
model = Rumale::NaiveBayes::GaussianNB.new
# etc...

Cross validation

# Stratified K-fold (each fold keeps the class proportions)
kf = Rumale::ModelSelection::StratifiedKFold.new(
  n_splits: 5,
  random_seed: 1
)

cv = Rumale::ModelSelection::CrossValidation.new(
  estimator: model,
  splitter: kf
)
report = cv.perform(samples, labels)
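
If you are wondering what "stratified" means here, the following plain-Ruby sketch shows the idea: sample indices are grouped by class and dealt out evenly, so every fold sees roughly the same class mix. It is only an illustration, not how Rumale does it internally. `stratified_folds` is a hypothetical helper, and the labels are a stand-in with the same 50/50/50 balance as iris.

```ruby
# Plain-Ruby sketch of stratified k-fold splitting (not Rumale's implementation).
def stratified_folds(labels, n_splits:, seed:)
  rng = Random.new(seed)
  folds = Array.new(n_splits) { [] }
  # Group sample indices by class, shuffle each group, then deal them out
  # round-robin so every fold gets about the same share of each class.
  labels.each_index.group_by { |i| labels[i] }.each_value do |indices|
    indices.shuffle(random: rng).each_with_index do |idx, pos|
      folds[pos % n_splits] << idx
    end
  end
  folds
end

labels = [0] * 50 + [1] * 50 + [2] * 50 # stand-in with the same class balance as iris
folds = stratified_folds(labels, n_splits: 5, seed: 1)

folds.each_with_index do |test_ids, k|
  train_ids = (0...labels.size).to_a - test_ids
  counts = test_ids.map { |i| labels[i] }.tally
  puts "fold #{k}: train=#{train_ids.size} test=#{test_ids.size} classes=#{counts}"
end
```

With 150 balanced samples and 5 splits, each fold holds out 30 samples, 10 per class, and trains on the remaining 120.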

Result

scores = report[:test_score]
puts scores.sum / scores.size
# 0.9466666666666667
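
The mean alone hides how much the folds disagree, so it can be handy to print the spread as well. A small sketch in plain Ruby follows. The scores here are made-up numbers for illustration, not results from the iris run above.

```ruby
# Hypothetical per-fold accuracies, made up for illustration only.
fold_scores = [0.9, 1.0, 0.9, 1.0, 0.95]

mean = fold_scores.sum / fold_scores.size
# Population variance: average squared distance from the mean.
variance = fold_scores.sum { |s| (s - mean)**2 } / fold_scores.size
std = Math.sqrt(variance)

puts format('accuracy: %.3f +/- %.3f', mean, std)
# prints: accuracy: 0.950 +/- 0.045
```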

Learning and Predicting

# Learning
model.fit(samples, labels)

# Predicting
# predict accepts a 2D NArray (here Numo::DFloat with shape [150, 4])
p model.predict(samples).to_a
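
To turn predictions into a single number, you can compare them with the true labels. Below is a plain-Ruby sketch. `accuracy` is a hypothetical helper written for this post, and the arrays are made-up examples. With Rumale you would pass in something like `labels.to_a` and `model.predict(samples).to_a`. Keep in mind that scoring on the same samples used for fitting gives an optimistic number, which is why the cross validation above is the better estimate.

```ruby
# Fraction of predictions that match the true labels.
def accuracy(true_labels, predicted_labels)
  unless true_labels.size == predicted_labels.size
    raise ArgumentError, 'length mismatch'
  end

  hits = true_labels.zip(predicted_labels).count { |t, y| t == y }
  hits.to_f / true_labels.size
end

# Hypothetical labels and predictions, made up for illustration.
truth     = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]
puts accuracy(truth, predicted) # 5 of 6 predictions are correct
```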

Save and load models

# Save a model
File.binwrite("model.dat", Marshal.dump(model))

# Load a model
model = Marshal.load(File.binread("model.dat"))
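
Here is the same save-and-load round trip as a self-contained sketch you can run without Rumale. `ToyModel` is a made-up stand-in for a trained model. Two things to remember: the classes of the saved object must be loaded (for a real model, `require 'rumale'`) before calling `Marshal.load`, and `Marshal.load` should only be used on files you trust, because loading crafted data can run arbitrary code.

```ruby
require 'tempfile'

# Stand-in for a trained model; ToyModel is hypothetical, not part of Rumale.
ToyModel = Struct.new(:weights, :bias)
toy_model = ToyModel.new([0.5, -1.2, 3.0], 0.1)

restored = Tempfile.create('model') do |file|
  file.binmode
  file.write(Marshal.dump(toy_model)) # save
  file.flush
  Marshal.load(File.binread(file.path)) # load (trusted file only)
end

puts restored == toy_model # true: Struct equality compares the members
```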

Enjoy!
