Model parameters refer to the weights and biases learned by the model as it goes through training iterations.
Hyperparameters, on the other hand, are parameters that we as model builders set ourselves; they are not learned during training.
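Model architecture hyperparameters - Hyperparameters that control the model's underlying mathematical function (e.g., the number of layers or the number of neurons per layer)
Model training hyperparameters - Hyperparameters that control the training loop and the way the optimizer works (e.g., learning rate, batch size, number of epochs)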
- Define a set of values for each hyperparameter that you want to optimize
- Use grid search - it will try every combination of the specified values and return the combination that results in the best evaluation metric for the model
- As the number of hyperparameters and the number of values for each hyperparameter increase, the number of combinations, and with it the time required to try them all, grows multiplicatively => combinatorial explosion. For example, 3 hyperparameters with 5 values each already require 5 x 5 x 5 = 125 training runs.
- It's a brute-force solution => it doesn't learn from earlier trials. It keeps trying the remaining combinations even when earlier results already suggest they are unpromising, say, after we reach a point where the error starts increasing instead of decreasing.
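The grid search steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `validation_error` is a hypothetical stand-in for "train the model and return its validation error", and the hyperparameter names and values in `grid` are made up for the example.

```python
import itertools

def validation_error(params):
    """Hypothetical stand-in for training a model and returning its validation error."""
    return (
        (params["units"] - 64) ** 2 / 1000
        + abs(params["learning_rate"] - 0.01)
        + 0.05 * params["layers"]
    )

# Assumed example grid: a fixed set of values per hyperparameter.
grid = {
    "units": [32, 64, 128],
    "learning_rate": [0.001, 0.01, 0.1],
    "layers": [1, 2],
}

def grid_search(grid, objective):
    """Evaluate every combination in the grid and keep the one with the lowest error."""
    names = list(grid)
    best_params, best_score, n_trials = None, float("inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)  # one full "training run" per combination
        n_trials += 1
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_trials

best_params, best_score, n_trials = grid_search(grid, validation_error)
print(n_trials)     # 3 * 3 * 2 = 18 combinations, all of them evaluated
print(best_params)
```

Adding one more hyperparameter with four candidate values would multiply the 18 runs to 72, which is the combinatorial explosion described above.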
A faster alternative to grid search.
Unlike grid search, this approach will randomly sample values for each hyperparameter and try the combination.
- Define a range of values for each hyperparameter that you want to optimize
- Specify how many random combinations (trials) you want to sample from those ranges
- Use random search
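A minimal random search sketch, again in plain Python. The objective `validation_error`, the `search_space` ranges, and the choice to sample the learning rate on a log scale are assumptions made for illustration, not part of any particular library.

```python
import math
import random

def validation_error(params):
    """Hypothetical stand-in for training a model and returning its validation error."""
    return abs(params["units"] - 128) / 256 + abs(math.log10(params["learning_rate"]) + 2)

# Assumed example ranges (low, high) instead of fixed value lists.
search_space = {
    "units": (32, 256),             # integer range, inclusive
    "learning_rate": (1e-4, 1e-1),  # sampled on a log scale
}

def random_search(space, objective, n_trials, seed=0):
    """Sample n_trials random combinations and return the best one seen."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        lo, hi = space["learning_rate"]
        params = {
            "units": rng.randint(*space["units"]),
            "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        }
        history.append((objective(params), params))  # one training run per trial
    best_score, best_params = min(history, key=lambda trial: trial[0])
    return best_params, best_score, history

best_params, best_score, history = random_search(search_space, validation_error, n_trials=20)
print(len(history), best_params, best_score)
```

The cost is fixed by `n_trials` rather than by the size of the search space, which is why this is usually faster than grid search; like grid search, though, it does not use earlier results to decide what to try next.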
This library (Keras Tuner) provides a solution that scales and learns from previous trials to find an optimal combination of hyperparameter values.
EXAMPLE - tuning the number of neurons in the first and second hidden layers (and the learning rate) of an MNIST classification model
import keras_tuner as kt
from tensorflow import keras

# Load MNIST and scale pixel values to [0, 1]
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train / 255.0

def build_model(hp):
    # hp.Int / hp.Float declare the search range of each hyperparameter
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(
            units=hp.Int('first_hidden', min_value=32, max_value=256, step=32),
            activation='relu'),
        keras.layers.Dense(
            units=hp.Int('second_hidden', min_value=32, max_value=256, step=32),
            activation='relu'),
        keras.layers.Dense(units=10, activation='softmax')
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', min_value=.005, max_value=.01, sampling='log')),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model

# Bayesian optimization tuner: at most 10 trials, maximizing validation accuracy
tuner = kt.BayesianOptimization(
    build_model,
    objective='val_accuracy',
    max_trials=10,
)
tuner.search(x_train, y_train, validation_split=0.1, epochs=10)

# get_best_hyperparameters returns a list; take the single best entry
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
Goal of this optimization approach - Call the objective function (i.e., actually train the ML model) as few times as possible, because each call is a costly operation
One of the issues with the above approaches is that every time a new set of hyperparameters is tried, the model has to be run through an entire training loop. This is the problem Bayesian optimization tries to solve.
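To make the idea concrete, here is a small pure-Python sketch of a surrogate-driven loop. Everything in it is an assumption chosen for illustration: the toy `objective` (a stand-in for a full training run, over the base-10 log of the learning rate), a Gaussian-process surrogate with a squared-exponential kernel, and a lower-confidence-bound rule for picking the next trial. Real libraries make their own choices here and handle this internally.

```python
import math

def objective(log_lr):
    """Hypothetical expensive step: 'train the model' and return validation loss.
    A toy curve with its minimum at log10(learning rate) = -2.5."""
    return (log_lr + 2.5) ** 2

def rbf(a, b, length=0.5):
    """Squared-exponential kernel: nearby inputs get similar predictions."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def surrogate(xs, ys, x_new, jitter=1e-4):
    """Cheap emulator: Gaussian-process posterior mean and std at x_new."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v)), 0.0)
    return mu, math.sqrt(var)

def bayesian_optimize(objective, low, high, n_init=3, n_iter=7, kappa=2.0):
    """Alternate between querying the surrogate and running one real trial."""
    xs = [low + (high - low) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]  # a few real runs to seed the surrogate
    candidates = [low + (high - low) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        def lower_confidence_bound(x):
            mu, sigma = surrogate(xs, ys, x)
            return mu - kappa * sigma  # low predicted loss or high uncertainty
        x_next = min(candidates, key=lower_confidence_bound)  # cheap: no training
        xs.append(x_next)
        ys.append(objective(x_next))  # costly: one real training run
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)

best_x, best_y, n_calls = bayesian_optimize(objective, -4.0, 0.0)
print(n_calls, best_x, best_y)
```

The surrogate is evaluated on 201 candidate values in every round, while the expensive objective is called only 10 times in total (3 seed runs plus 7 guided trials).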
- Choose hyperparameters that need optimization
- Define a range of values for these hyperparameters
- Define the objective function
- Bayesian optimization uses this objective function to build a new function (the surrogate function) that emulates our model and is much cheaper to run
- B.O. uses the surrogate function to find the most promising combination of hyperparameters
- Once that combination is found, the model is run through a full training loop using these values
- The results post training are fed back into the surrogate function and the process is repeated for