
pickuma

Posted on • Originally published at pickuma.com

Optuna Tutorial: Automate Hyperparameter Tuning for ML Models in Python

Grid search has a scaling problem. Tune four hyperparameters with five candidate values each and you have 625 model fits to run. Add a fifth parameter and it jumps to 3,125. Random search trims the count, but it still spends trials on regions of the search space a smarter method would have abandoned after the first few results. Optuna, the open-source optimization framework maintained by Preferred Networks, takes a different approach: it treats tuning as a sequential optimization problem and uses the outcome of past trials to decide what to evaluate next.
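The arithmetic is easy to check. As a minimal sketch (grid_size is a hypothetical helper name, not part of Optuna or scikit-learn), this enumerates a full grid and counts the fits it would require:

```python
from itertools import product


def grid_size(n_params: int, n_values: int) -> int:
    """Count the combinations in a full grid by enumerating them."""
    grid = product(range(n_values), repeat=n_params)
    return sum(1 for _ in grid)


four_params = grid_size(4, 5)
five_params = grid_size(5, 5)
print(four_params, five_params)  # 625 3125
```

Each extra parameter multiplies the count by the number of candidate values, which is why exhaustive grids stop being practical after a handful of parameters.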

We ran Optuna against several scikit-learn and PyTorch models to see where it fits a real tuning workflow. This is what the framework does, where the speedups come from, and how to wire it into training code you already have.

How the define-by-run API works

Most tuning libraries make you declare the entire search space up front as a static dictionary. Optuna uses what it calls a define-by-run API: the search space is constructed dynamically while the objective function executes. You write a plain Python function, ask for parameter values inside it with suggest_* calls, and return a score.

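import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data so the snippet runs as written; substitute your own X and y.
X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Each suggest_* call declares a parameter and samples a value for this trial.
    n_estimators = trial.suggest_int("n_estimators", 50, 500)
    max_depth = trial.suggest_int("max_depth", 2, 32)
    max_features = trial.suggest_float("max_features", 0.1, 1.0)

    clf = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        max_features=max_features,
    )
    # The mean 3-fold cross-validated accuracy is the score Optuna maximizes.
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)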

The payoff is conditional search spaces. Because the suggestions are ordinary Python calls, you can branch on them: call trial.suggest_categorical("classifier", ["svm", "rf"]) first, then suggest C and gamma only when the SVM branch runs. A static grid can't express that without enumerating invalid combinations and wasting trials on them. Each call to study.optimize runs n_trials evaluations, and study.best_params and study.best_value hold the winner afterward. The same API handles multi-objective problems, such as trading accuracy against inference latency: create the study with directions=[...] (one entry per objective) and return a tuple of scores. In that case there is no single winner, so read the Pareto-optimal trials from study.best_trials instead of best_params.

Pruners and samplers: the two levers

Optuna's speed comes from two components you can swap independently.

The sampler decides which values to try. The default is the Tree-structured Parzen Estimator (TPE), which models the relationship between parameter values and scores, then draws new candidates from the regions that have performed well. Optuna also ships RandomSampler, GridSampler, CmaEsSampler, and NSGAIISampler for multi-objective work. You change the strategy with one argument: optuna.create_study(sampler=optuna.samplers.CmaEsSampler()).

The pruner stops unpromising trials before they finish. For any model trained iteratively — gradient-boosted trees, neural networks — you report an intermediate score after each step and let Optuna kill trials tracking well below the others.

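import optuna

# build_model, train_one_epoch, and validate stand in for your own training code.
def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    model = build_model(lr)

    for epoch in range(30):
        train_one_epoch(model)
        accuracy = validate(model)
        # Report the intermediate score so the pruner can compare trials at this step.
        trial.report(accuracy, epoch)
        if trial.should_prune():
            # Abandon the trial early; Optuna records it as pruned.
            raise optuna.TrialPruned()
    return accuracy

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(),
)
study.optimize(objective, n_trials=100)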

MedianPruner cuts a trial when its best intermediate score so far is worse than the median of the intermediate scores that earlier trials reported at the same step. SuccessiveHalvingPruner and HyperbandPruner allocate budget more aggressively. The log=True flag on suggest_float matters here too: learning rates and regularization strengths span orders of magnitude, and a log-uniform scale spreads trials evenly across those orders of magnitude instead of clustering them near the high end.

Pruning interacts with your sampler. The TPE sampler learns from completed trials, so very aggressive pruning can starve it of the full-length results it needs to model the space well. If your best score plateaus early, try a gentler pruner or raise the pruner's n_startup_trials (for example MedianPruner(n_startup_trials=20)) so that more trials run to completion before any pruning starts.

Wiring Optuna into PyTorch, TensorFlow, and scikit-learn

Optuna is framework-agnostic because the objective function is just Python, so whatever runs inside it is up to you. For the iterative-training case, the optuna.integration module provides callbacks that handle the report and should_prune plumbing for you, with hooks for PyTorch Lightning, Keras, XGBoost, and LightGBM so you don't hand-write the pruning loop. In recent Optuna releases these integrations are distributed as the separate optuna-integration package, so you may need to install it alongside optuna.

Two features matter once you move past a laptop notebook. First, storage: pass storage="sqlite:///optuna.db" to create_study and every trial is persisted to disk. If the process dies, call create_study again with the same study_name and load_if_exists=True, and the study continues from where it stopped.

study = optuna.create_study(
    study_name="rf-tuning",
    storage="sqlite:///optuna.db",
    load_if_exists=True,
)

Second, parallelism: point multiple processes or machines at the same database backend and they share one study, each pulling trials and writing results back. SQLite is adequate for a few workers on a single box; for a cluster, or for heavy parallel writes, use PostgreSQL or MySQL. There is no separate scheduler to stand up. The companion optuna-dashboard package reads the same storage and renders trial history, parameter importance, and optimization curves in the browser.

A tuning run that won't waste your afternoon

Start small and let the data tell you where to spend. Define your objective, run 100 trials with the default TPE sampler and MedianPruner, and open the dashboard. The parameter-importance chart ranks which hyperparameters actually moved the score; often one or two dominate and the rest are noise. Freeze the parameters that don't matter, narrow the ranges of the ones that do, and run another 100 trials on the smaller space. Two or three rounds of that loop usually beat hours of manual tuning, and because every trial sits in the storage backend, you can stop and resume between rounds without losing history. If the score stalls, swap the sampler before you reach for a bigger trial budget: CmaEsSampler often does better on smooth, continuous spaces.


