Hyperparameter Optimization and Tuning


A machine learning model is a mathematical model with a number of parameters that are learned from data. By training the model on existing data, we fit these parameters.
However, there is another kind of parameter, known as a hyperparameter. Hyperparameters are settings that govern the training process itself and are chosen before training rather than learned from the data. They express important properties of the model, such as its complexity or how fast it should learn. Settings that define the model architecture are also hyperparameters, so the process of searching for the ideal model architecture is referred to as hyperparameter optimization or tuning.
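To make the distinction concrete, here is a minimal sketch using scikit-learn. The choice of `LogisticRegression` and the iris dataset is an assumption made for illustration only; the same split between chosen settings and learned values applies to other estimators.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: set by us before training starts.
# C (inverse regularization strength) and max_iter are illustrative values.
model = LogisticRegression(C=1.0, max_iter=1000)

# Parameters: learned from the data during fit().
model.fit(X, y)

hyperparameters = model.get_params()  # dict that includes 'C' and 'max_iter'
learned_weights = model.coef_         # one row of weights per class
learned_bias = model.intercept_       # one bias term per class

print("Hyperparameter C:", hyperparameters["C"])
print("Learned weight matrix shape:", learned_weights.shape)
```

Changing `C` changes how the weights are learned, but `C` itself is never updated by `fit()`.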

The purpose of this article is to consider various strategies for optimizing hyperparameters.

Hyperparameter tuning and optimization might address model design questions such as:

What degree of polynomial features should I use for my linear model?
What is the maximum depth allowed for my decision tree?
What is the minimum number of samples required at a leaf node in my decision tree?
How many trees should I include in my random forest?
How many neurons should I have in my neural network layer?
How many layers should I have in my neural network?
What should I set my learning rate to for gradient descent?
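As a rough illustration, each of the questions above corresponds to a constructor argument in scikit-learn. The mapping below is a sketch: the estimators are one possible choice per question, and the values are arbitrary placeholders, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier

# Placeholder values chosen only to show where each hyperparameter lives.
design_choices = {
    "degree of polynomial features": PolynomialFeatures(degree=2),
    "tree depth and minimum samples per leaf": DecisionTreeClassifier(
        max_depth=5, min_samples_leaf=2
    ),
    "number of trees in the forest": RandomForestClassifier(n_estimators=100),
    # hidden_layer_sizes=(64, 32) means two hidden layers with 64 and 32 neurons.
    "neurons, layers and learning rate": MLPClassifier(
        hidden_layer_sizes=(64, 32), learning_rate_init=0.001
    ),
}

for question, estimator in design_choices.items():
    print(question, "->", type(estimator).__name__)
```

None of these objects has been trained yet; every value shown is fixed before `fit()` is called.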

Hyperparameter tuning can be accomplished using a variety of methods. We’ll focus on two strategies in particular: grid search and random search.

Grid vs Random

Grid search is an exhaustive search through a manually specified set of values for each hyperparameter. This gives you a set of candidate models that differ from each other in their hyperparameter values, which lie on a grid. You train each of the models, evaluate it using cross-validation, and then select the one that performed best.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
svc = SVC()

# Grid search over the kernel and C hyperparameters (2 x 2 = 4 combinations)
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svc, param_grid=parameters)
clf.fit(iris.data, iris.target)

# Report the best combination and its mean cross-validated accuracy
print('Grid best parameters (max accuracy): ', clf.best_params_)
print('Grid best score (accuracy): ', clf.best_score_)

Grid best parameters (max accuracy):  {'C': 1, 'kernel': 'linear'}
Grid best score (accuracy):  0.98
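To show what the search does internally, the train, evaluate and select loop can be written out by hand. This is a simplified sketch built on `ParameterGrid` and `cross_val_score`, not a reproduction of how `GridSearchCV` is implemented; among other things, `GridSearchCV` also refits the best model on the full data by default.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination of the listed values: 2 kernels x 2 values of C.
grid = ParameterGrid({"kernel": ["linear", "rbf"], "C": [1, 10]})

results = []
for params in grid:
    # Train and evaluate one candidate model with 5-fold cross-validation.
    scores = cross_val_score(SVC(**params), X, y, cv=5)
    results.append((scores.mean(), params))

# Select the candidate with the highest mean validation accuracy.
best_score, best_params = max(results, key=lambda item: item[0])

print("Candidates evaluated:", len(results))
print("Best parameters:", best_params)
print("Best mean CV accuracy:", best_score)
```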

Drawback: GridSearchCV evaluates every combination of the listed hyperparameter values, so the number of models to train grows multiplicatively with each added hyperparameter. This makes grid search computationally very expensive.
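The cost can be estimated before running anything. The sketch below counts model fits as the number of grid points times the number of cross-validation folds; `count_fits` is a hypothetical helper, and the larger grid (including the `gamma` values) is made up purely to show the growth.

```python
from sklearn.model_selection import ParameterGrid

# The grid used in the example above.
small_grid = {"kernel": ["linear", "rbf"], "C": [1, 10]}

# A hypothetical, slightly larger grid with one extra hyperparameter.
larger_grid = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}

cv_folds = 5


def count_fits(param_grid, folds):
    """Number of model fits: one per grid point per cross-validation fold."""
    return len(ParameterGrid(param_grid)) * folds


print(count_fits(small_grid, cv_folds))   # 2 * 2 * 5 = 20
print(count_fits(larger_grid, cv_folds))  # 3 * 4 * 4 * 5 = 240
```

Adding one hyperparameter with four values and slightly widening the other two lists raises the cost from 20 fits to 240.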

Random search differs from grid search in that we do not provide a discrete set of values to try for each hyperparameter. Instead, we define a sampling distribution for each hyperparameter, and values are drawn from it at random. Often some of the hyperparameters matter much more than others. For a fixed number of trials, random search tries more distinct values of each hyperparameter than a grid does, which allows much more precise discovery of good values for the important ones. Because the number of trials is set in advance rather than dictated by the size of a grid, this approach also reduces unnecessary computation.
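The point about important hyperparameters can be checked with a small sketch that compares the two strategies at the same budget of nine trials. The hyperparameter names `C` and `gamma`, their ranges, and the log-uniform distributions are assumptions chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

budget = 9  # same number of candidate models for both strategies

# Grid: 3 values of C x 3 values of gamma = 9 candidates.
grid_candidates = list(
    ParameterGrid({"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]})
)

# Random: 9 candidates, each drawn from continuous distributions.
random_candidates = list(
    ParameterSampler(
        {"C": loguniform(0.1, 10), "gamma": loguniform(0.01, 1)},
        n_iter=budget,
        random_state=0,
    )
)

distinct_c_grid = len({c["C"] for c in grid_candidates})
distinct_c_random = len({c["C"] for c in random_candidates})

print("Distinct C values tried by grid search:", distinct_c_grid)
print("Distinct C values tried by random search:", distinct_c_random)
```

If `C` turns out to be the hyperparameter that matters, the grid has explored only three settings of it, while the random sample has explored a different one in every trial.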

# Necessary imports
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Defining the hyperparameter distributions to sample from
# (randint(1, 9) draws integers from 1 to 8; lists are sampled uniformly)
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiating Decision Tree classifier
tree = DecisionTreeClassifier()

# Instantiating RandomizedSearchCV object (n_iter defaults to 10 sampled candidates)
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)

# X and y are the feature matrix and labels; they are assumed to be loaded
# beforehand (X needs at least 8 columns for max_features up to 8)
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))


Tuned Decision Tree Parameters: {'min_samples_leaf': 5, 'max_depth': 3, 'max_features': 5, 'criterion': 'gini'}
Best score is 0.7265625
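The random search snippet above does not show how X and y are loaded, so it cannot be run as is. A self-contained variant is sketched below. It swaps in scikit-learn's built-in breast cancer dataset, which is an assumption and not the data behind the reported score, so the printed results will differ. The `random_state` values are added only for repeatability.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (30 features), not the one used for the score above.
X, y = load_breast_cancer(return_X_y=True)

param_dist = {
    "max_depth": [3, None],
    "max_features": randint(1, 9),      # integers 1 to 8
    "min_samples_leaf": randint(1, 9),  # integers 1 to 8
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_dist,
    n_iter=10,       # fixed budget: 10 sampled candidates
    cv=5,
    random_state=0,
)
search.fit(X, y)

print("Tuned Decision Tree Parameters: {}".format(search.best_params_))
print("Best score is {}".format(search.best_score_))
```

The `n_iter` argument is what caps the cost: the search trains 10 candidates per fold regardless of how wide the distributions are.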

