Introduction
Hyperparameter tuning across multiple models presents a common challenge for ML practitioners. Tracking experiment results, managing configurations, and ensuring reproducibility becomes increasingly difficult as the number of models grows. This post walks through a solution that combines Amazon SageMaker, MLflow, and Optuna to create an automated, scalable hyperparameter optimization pipeline.
The use case that motivated this work involved training separate demand forecasting models for different product categories—smartphones, laptops, tablets, and accessories. Each category exhibits distinct patterns, making category-specific models more effective than a single unified model. The goal was to automate the hyperparameter search, centralize experiment tracking, and enable parallel training across all categories.
Manual hyperparameter tuning workflows often suffer from several issues. Experiment results end up scattered across notebooks and spreadsheets. Configurations from previous runs get lost or forgotten. Comparing results across different models requires tedious manual aggregation. And scaling to additional models means duplicating effort. A viable solution needs to address these pain points while integrating smoothly with existing ML workflows and AWS infrastructure.
Architecture Overview
The architecture leverages several AWS services working together. SageMaker Studio provides the development environment for notebook-based experimentation. When ready for full optimization runs, SageMaker Pipelines orchestrates notebook jobs for each product category in parallel. Each job uses Optuna to search for optimal XGBoost hyperparameters, with all experiments logged to a managed MLflow server.
Model artifacts, metrics, and visualizations are stored in Amazon S3. The entire infrastructure is defined in CloudFormation, enabling consistent deployments across accounts and regions.
The stack can be deployed by running the following commands in bash, setting your user (for the SageMaker domain), the bucket name for storing the artifacts, and the region.
cd infrastructure
./deploy.sh --user ryan --bucket sm-mlflow-optuna --region us-east-1
You can monitor the deployment of the resources in the CloudFormation console under the stack name.
Note: Deployment typically takes several minutes, with most of that time spent provisioning the MLflow tracking server (however, you could also update the CloudFormation template and use the MLflow serverless option in Studio instead).
After deployment, access your SageMaker Studio domain via the user created by the CloudFormation template. You will see the private space provisioned. The CloudFormation stack deploys this with the latest SageMaker distribution image and a default ml.t3.medium instance size. A lifecycle configuration script is also attached to install additional dependencies, and auto shutdown is enabled to stop the space after 60 minutes of idle time. Run the space and wait for it to show a Running state.
Navigate to the MLflow app; under Tracking Servers, you should see your tracking server provisioned, with the artifact location set to a prefix within the bucket deployed by CloudFormation. Make a note of the MLflow tracking server ARN, as you will need to update it in the notebooks.
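In the notebooks, the tracking server is selected by pointing MLflow at this ARN. A minimal sketch of what that cell typically looks like (the ARN below is a placeholder, and the sagemaker-mlflow plugin must be available for MLflow to resolve it):
import mlflow

# Placeholder ARN - replace with the tracking server ARN noted above
tracking_server_arn = "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/my-tracking-server"
mlflow.set_tracking_uri(tracking_server_arn)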
Once the space is in a Running state, choose Open. Navigate to the Git branch icon in the sidebar and clone the repository https://github.com/ryankarlos/sagemaker_mlflow_optuna.git over HTTPS.
The repository has two notebooks which will be used:
- fm_train.ipynb: Runs the data preparation, processing, and model training, logging to the MLflow server and using Optuna as the backend for hyperparameter tuning. When run directly, it executes for the category and parameters defined in the notebook cell. Open the notebook and update the cell with the MLflow tracking server ARN you noted previously. We will briefly go over what each cell does in the next sections.
- nb-job-pipeline.ipynb: The main orchestration notebook, where we execute different configurations of the fm_train.ipynb notebook as Notebook Job Steps and stitch them together into a single SageMaker Pipeline. This runs the training for each of the 4 product categories in the dataset, so we end up with 4 models. We will describe how we accomplish this in later sections, as it requires a few settings in SageMaker Studio from the user.
You will need to update the variables for the bucket and region in the cell shown in the screenshot below if you deployed the CloudFormation stack with different values.
Data Preparation
This example uses synthetic electronics sales data, generated with Claude Opus 4.5 in Kiro, describing the number of daily units sold for laptops, smartphones, accessories, and tablets. The data generator creates features with realistic correlations to the target variable, including price sensitivity, promotional effects, seasonality, weekend effects, and competitive dynamics. One requirement in the generation prompt was to produce feature correlations in the 0.2-0.76 range against the target, giving Optuna meaningful signal for optimization; weak or nonexistent correlations would limit the effectiveness of any hyperparameter search. The target variable units_sold was generated as a combination of these features with some added noise.
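As an illustration only (the feature names and coefficients below are hypothetical, not the exact generator used in the repository), a simplified version of this kind of generation looks like:
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 730  # two years of daily observations

dates = pd.date_range("2023-01-01", periods=n, freq="D")
price = rng.normal(800, 100, n)                                    # price sensitivity
promo = rng.binomial(1, 0.2, n)                                    # promotional effects
weekend = (dates.dayofweek >= 5).astype(int)                       # weekend effect
seasonality = 1 + 0.3 * np.sin(2 * np.pi * dates.dayofyear / 365)  # yearly seasonality

# Target is a combination of the features plus noise, giving moderate correlations
units_sold = 200 * seasonality - 0.1 * price + 60 * promo + 30 * weekend + rng.normal(0, 25, n)

df = pd.DataFrame({"date": dates, "price": price, "promo": promo,
                   "weekend": weekend, "units_sold": units_sold})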
Hyperparameter Optimization with Optuna
Optuna is an automatic hyperparameter optimization framework designed for machine learning. It handles the hyperparameter search via Bayesian optimization, using a Tree-structured Parzen Estimator (TPE) sampler by default, although other samplers are available. TPE models the relationship between hyperparameters and objective values, focusing exploration on promising regions of the search space.
For this example, we use XGBoost to predict the number of units sold for each category. The XGBoost search space includes:
- Booster type (gbtree, gblinear, dart)
- Regularization parameters (lambda, alpha) with log-uniform distributions
- Tree depth and learning rate
- Growth policy
Log-uniform distributions work well for regularization parameters where optimal values can span several orders of magnitude. The Optuna documentation on search spaces covers the available distribution options.
Optuna uses the concepts of a study and a trial. A study is the overall optimization based on an objective function; a trial is a single execution of the objective function.
The objective function for this use case is defined as below. The goal of a study is to find out the optimal set of hyperparameter values through multiple trials (e.g., n_trials=50).
import math

import mlflow
import xgboost as xgb
from sklearn.metrics import mean_squared_error

def objective(trial):
    """Optuna objective function with MLflow child runs."""
    with mlflow.start_run(run_name=f"trial-{trial.number}", nested=True):
        # Suggest hyperparameters
        params = {
            "objective": "reg:squarederror",
            "eval_metric": "rmse",
            "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
            "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),
            "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),
        }
        # Conditional hyperparameters based on booster type
        if params["booster"] in ["gbtree", "dart"]:
            params["max_depth"] = trial.suggest_int("max_depth", 1, 9)
            params["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)
            params["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
            params["grow_policy"] = trial.suggest_categorical(
                "grow_policy", ["depthwise", "lossguide"]
            )
        # Train model
        dtrain = xgb.DMatrix(train_x, label=train_y)
        dvalid = xgb.DMatrix(valid_x, label=valid_y)
        model = xgb.train(params, dtrain, num_boost_round=100)
        preds = model.predict(dvalid)
        # Calculate metrics
        mse = mean_squared_error(valid_y, preds)
        rmse = math.sqrt(mse)
        # Log to MLflow
        mlflow.log_params(params)
        mlflow.log_metric("mse", mse)
        mlflow.log_metric("rmse", rmse)
        return mse  # Optuna minimizes this value
Optuna integrates with MLflow, which allows every trial to be systematically recorded. MLflow supports nesting runs, so each Optuna trial can be treated as a 'child run'. Each child run tracks the specific hyperparameters used and the resulting metrics, providing a consolidated view of the entire optimization process. All child runs are grouped under a parent run in MLflow, which represents the entire optimization study for a particular product category, e.g. laptops. This structure keeps experiments organized in the MLflow UI, where the overall best result appears at the parent level, with individual trials available for detailed inspection.
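As a rough sketch of how this fits together (experiment and run names here are illustrative), the study for one category is created inside a parent MLflow run, and each trial's nested start_run call in the objective function then becomes a child of it:
import mlflow
import optuna

mlflow.set_experiment("electronics-laptops")  # illustrative experiment name

with mlflow.start_run(run_name="laptops-optuna-study"):  # parent run for the study
    study = optuna.create_study(direction="minimize")    # TPE sampler by default
    study.optimize(objective, n_trials=50)                # each trial logs a nested child run

    # Record the best trial at the parent level
    mlflow.log_params(study.best_params)
    mlflow.log_metric("best_mse", study.best_value)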
Parameterizing Notebooks for Pipeline Execution
SageMaker Pipelines executes notebooks as jobs, but proper parameterization is essential. The mechanism relies on cell tags, specifically tagging a cell with "parameters" (https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html) using the JupyterLab metadata editor.
In this example, you will need to tag the cell shown in the screenshot below with the parameters tag. This cell defines all the parameters and configuration that may need to change with each model training run for a category, e.g. the category name, model starting parameters, number of Optuna trials, etc.
Open the fm_train.ipynb notebook, select the cell titled Configuration, and expand the Common Tools section in the right sidebar. You should see a parameters tag already in the tag box; click on it to apply the tag to the cell. It will appear with a check mark, as in the screenshot below.
When a notebook job runs, the notebook job executor searches for a Jupyter cell tagged with the parameters tag and applies the new parameters or parameter overrides immediately after this cell.
Note: All parameter values must be strings, so any constants that are overridden are injected as strings. Hence, one of the cells in the notebook casts the variables back to int and float as required. The notebook jobs documentation provides complete details on parameterization.
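As a sketch, the tagged cell and the casting cell might look roughly like this (the actual Configuration cell in fm_train.ipynb defines the full set of variables):
# Cell tagged "parameters" - overrides from the notebook job are injected as strings after it
category = "laptops"
n_trials = "50"
test_size = "0.25"
experiment_name = "electronics-laptops"

# Later cell - cast overridden values back to numeric types before use
n_trials = int(n_trials)
test_size = float(test_size)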
SageMaker Pipeline with Notebook Job Steps for Running Training Across Multiple Categories
SageMaker Notebook Step Jobs enable automated execution of Jupyter notebooks as managed compute jobs. When integrated with SageMaker Pipelines, they provide a scalable mechanism for running parameterized training workflows across multiple model categories.
The pipeline creates a NotebookJobStep for each product category using the SageMaker Pipelines SDK. Each step is configured with category-specific parameters, compute resources, and execution policies. The NotebookJobStep API reference details all available configuration options.
The following snippet shows how the notebook job step is created in the nb-job-pipeline.ipynb notebook using the Python SDK (https://docs.aws.amazon.com/sagemaker/latest/dg/create-notebook-auto-run-sdk.html).
import sagemaker
from sagemaker.workflow.notebook_job_step import NotebookJobStep

nb_step = NotebookJobStep(
    name=step_name,
    description=f"XGBoost training for {category}",
    notebook_job_name=step_name,
    image_uri=image_uri,
    kernel_name=kernel_name,
    display_name=step_name,
    role=sagemaker.get_execution_role(),
    s3_root_uri=notebook_artifacts,
    additional_dependencies=[
        "/home/sagemaker-user/sagemaker_mlflow_optuna/scripts"
    ],
    initialization_script="nb_job_init.sh",
    input_notebook=train_notebook,
    instance_type=instance_type,
    parameters=nb_job_params,
    max_runtime_in_seconds=3600,
    max_retry_attempts=2,
)
There are a few more non-default settings that need to be included. The parameters to override in the notebook are passed as a dictionary, for example:
{
    "category": "smartphones",
    "n_trials": 50,
    "experiment_name": "electronics-smartphones",
    "test_size": 0.25
}
By default, SageMaker only copies the main input_notebook file when it initializes the job instance. The additional_dependencies option lets us pass a list of extra files or folders to make available to the job on the SageMaker-managed instance; here, the scripts folder is included because the notebook imports functions from Python modules in that folder. The directory structure is preserved while the notebook job is running, so these imports work the same as they do locally. The initialization_script option allows installing any libraries the job needs that are not present in the base image_uri.
While we use the Python SDK to automate this, a Notebook Job can also be initiated from the console, as explained in the docs. At the top of the notebook tab, click the blue Notebook Jobs widget.
In the next configuration tab, we can input all the required options, e.g. adding parameters or including additional files such as the scripts folder. The scheduler tries to infer a selection of default options and automatically populates the form to help you get started quickly. If you are using Studio, you can submit an on-demand job without setting any options at all, or submit a scheduled notebook job definition by supplying just the schedule information; you can customize other fields if your job requires specialized settings. If you are running a local Jupyter notebook, the scheduler extension lets you specify your own defaults (for a subset of options) so you don't have to manually insert the same values every time.
When you create a notebook job this way, you can include additional files such as datasets, images, and local scripts by choosing Run job with input folder. The notebook job will then have access to all files under the input file's folder, and the directory structure remains unchanged while the job is running.
Each notebook job runs on its own compute instance, enabling true parallel execution. The ml.m5.xlarge instance type (4 vCPUs, 16 GB RAM) provides sufficient resources for XGBoost training with 50 Optuna trials. For larger workloads or GPU-accelerated training, you can specify different instance types. The SageMaker instance types documentation lists all available options for notebook jobs.
By defining an iterable of Notebook Job Steps with a different parameter configuration for each category, we can then execute these as a SageMaker Pipeline.
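For illustration, the steps list can be assembled in a loop over the categories, roughly like below (reusing the variables from the earlier NotebookJobStep snippet), before the pipeline is created as shown next:
categories = ["smartphones", "laptops", "tablets", "accessories"]

pipeline_steps = []
for category in categories:
    step_name = f"train-{category}"
    # Per-category parameter overrides, following the dictionary format shown earlier
    nb_job_params = {
        "category": category,
        "n_trials": 50,
        "experiment_name": f"electronics-{category}",
        "test_size": 0.25,
    }
    pipeline_steps.append(
        NotebookJobStep(
            name=step_name,
            notebook_job_name=step_name,
            display_name=step_name,
            input_notebook=train_notebook,
            image_uri=image_uri,
            kernel_name=kernel_name,
            role=sagemaker.get_execution_role(),
            s3_root_uri=notebook_artifacts,
            instance_type=instance_type,
            parameters=nb_job_params,
        )
    )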
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession

session = PipelineSession()
role = sagemaker.get_execution_role()

pipeline = Pipeline(
    name=pipeline_name,
    steps=pipeline_steps,
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()

print(f"Pipeline: {pipeline_name}")
print(f"Execution: {execution.arn}")
When the pipeline starts, all four categories begin training simultaneously. Each runs 50 Optuna trials, logs results to MLflow, and saves the best model. You can monitor the pipeline execution from the SageMaker Studio UI under the Pipelines section and check the logs for each step.
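You can also follow the execution programmatically from the notebook, for example:
# Optionally block until the pipeline finishes, then inspect per-step status
execution.wait(delay=60, max_attempts=60)

for step in execution.list_steps():
    print(step["StepName"], step["StepStatus"])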
Notebook Job Logs and Executed Notebooks
After execution, the notebook job outputs for all steps in the pipeline are stored in S3 at the specified s3_root_uri, under a prefix associated with the SageMaker pipeline execution ID, as shown in the screenshot below.
Download the output.tar.gz file and unzip it; the executed notebook is named after the step, and the SageMaker execution log file is also included. Open the notebook in SageMaker to view the cell executions or any execution errors.
Integration with MLflow and Experiment Tracking
The MLflow UI displays all experiments organized by category. Each experiment shows optimization history—how the objective value improved across trials. The nested run structure (parent run per category, child runs per trial) provides clear organization. Runs can be compared, parameter distributions examined, and artifacts downloaded for offline analysis.
Notebook jobs integrate seamlessly with MLflow because they run in isolated environments with the same MLflow tracking URI configured. Each job connects to the managed MLflow server independently, ensuring all experiments are logged centrally regardless of which compute instance executed the training.
The nested run structure provides clarity. At a glance, the best result for each category is visible. Expanding a parent run reveals all child trials with their logged parameters and metrics.
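The same comparison can also be done programmatically with the MLflow client; for example, pulling the best trials for one category's experiment (experiment name illustrative):
import mlflow

# Best trials for a category, sorted by validation RMSE
runs = mlflow.search_runs(
    experiment_names=["electronics-laptops"],
    order_by=["metrics.rmse ASC"],
    max_results=5,
)
print(runs[["run_id", "metrics.rmse", "params.booster"]])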
Optuna's built-in visualizations—parameter importance plots, parallel coordinate plots, optimization history—are logged as artifacts alongside the models.
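One way this can be done (a sketch, not necessarily the exact calls in the repository) is to generate the Plotly figures with optuna.visualization inside the parent run and log them with mlflow.log_figure:
import mlflow
import optuna.visualization as vis

# Called while the parent run is active; figures are stored as HTML artifacts
mlflow.log_figure(vis.plot_optimization_history(study), "optimization_history.html")
mlflow.log_figure(vis.plot_param_importances(study), "param_importances.html")
mlflow.log_figure(vis.plot_parallel_coordinate(study), "parallel_coordinate.html")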
Clicking on any child run (tuning trial) reveals the logged metrics associated with that trial.
The parent run stores the best model metrics and the logged model artifacts, signature, plots, code, etc. The model can then be retrieved for inference or added to the model registry for versioning, promotion, and deployment.
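For example, assuming the best model was logged under a "model" artifact path on the parent run (the run ID and registry name below are placeholders), it could be loaded or registered like this:
import mlflow

run_id = "<parent-run-id>"           # placeholder
model_uri = f"runs:/{run_id}/model"  # assumes the model artifact path is "model"

# Load the model for local inference
model = mlflow.xgboost.load_model(model_uri)

# Or register a new version in the MLflow model registry
mlflow.register_model(model_uri, "electronics-laptops-forecaster")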
Plots that were logged, such as feature importance and residual plots, are visible directly in the MLflow console.
By setting the environment variable MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING to true in the code, we can also log system metrics automatically for each run and child run, which helps decide on optimal instance types for future runs.
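For example, setting the variable before any runs are started:
import os

# Enables automatic CPU, memory, and GPU utilization logging for subsequent runs
os.environ["MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING"] = "true"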
Tearing Down Resources
Once you are done experimenting, you can tear down the resources to save cost by navigating to the CloudFormation console and selecting Delete stack. Before doing this, make sure you shut down any running SageMaker apps and empty the bucket contents.
You can monitor the deletion status of the resources in the console. Note that deletion of the MLflow tracking server may take over 20 minutes.