How to Tune Algorithm Parameters with Scikit-Learn

By Jason Brownlee on August 21, 2019 in Python Machine Learning

Machine learning models are parameterized so that their behavior can be tuned for a given problem.

Models can have many parameters and finding the best combination of parameters can be treated as a search problem.

In this post, you will discover how to tune the parameters of machine learning algorithms in Python using the scikit-learn library.

Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update Jan/2017: Updated to reflect changes to the scikit-learn API in version 0.18.

Tuning an algorithm like Tuning a Piano
Photo by Katie Fricker, some rights reserved

Machine Learning Algorithm Parameters

Algorithm tuning is a final step in the process of applied machine learning before presenting results.

It is sometimes called hyperparameter optimization: the settings of the algorithm are referred to as hyperparameters, whereas the coefficients found by the machine learning algorithm itself are referred to as parameters. The word optimization points to the search nature of the problem.

Phrased as a search problem, you can use different search strategies to find a good and robust parameter or set of parameters for an algorithm on a given problem.

Two simple and easy search strategies are grid search and random search. Scikit-learn provides these two methods for algorithm parameter tuning and examples of each are provided below.

Grid Search Parameter Tuning

Grid search is an approach to parameter tuning that will methodically build and evaluate a model for each combination of algorithm parameters specified in a grid.

The recipe below evaluates different alpha values for the Ridge Regression algorithm on the standard diabetes dataset. This is a one-dimensional grid search.

# Grid Search for Algorithm Tuning
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
# load the diabetes dataset
dataset = datasets.load_diabetes()
# prepare a range of alpha values to test
alphas = np.array([1,0.1,0.01,0.001,0.0001,0])
# create and fit a ridge regression model, testing each alpha
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=dict(alpha=alphas))
grid.fit(dataset.data, dataset.target)
print(grid)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)

For more information see the API for GridSearchCV and the Exhaustive Grid Search section in the user guide.
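The same approach extends to more than one parameter. As a minimal sketch, the grid below adds a second Ridge parameter, fit_intercept. Choosing that parameter is an assumption made purely for illustration and is not part of the recipe above. Every combination is evaluated, so 4 alpha values and 2 intercept settings give 8 candidate models.

```python
# Two-dimensional grid search sketch (illustrative, not from the original recipe)
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

dataset = datasets.load_diabetes()
# every combination is evaluated: 4 alphas x 2 intercept settings
param_grid = {
    "alpha": [1.0, 0.1, 0.01, 0.001],
    "fit_intercept": [True, False],
}
grid2d = GridSearchCV(estimator=Ridge(), param_grid=param_grid, cv=5)
grid2d.fit(dataset.data, dataset.target)
# cv_results_ holds one entry per evaluated combination
n_candidates = len(grid2d.cv_results_["params"])
print(n_candidates)
print(grid2d.best_params_)
```

The number of candidates grows multiplicatively with each added parameter, which is one reason random search becomes attractive for larger search spaces.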

Random Search Parameter Tuning

Random search is an approach to parameter tuning that will sample algorithm parameters from a random distribution (e.g. uniform) for a fixed number of iterations. A model is constructed and evaluated for each combination of parameters chosen.

The recipe below evaluates random alpha values between 0 and 1 for the Ridge Regression algorithm on the standard diabetes dataset.

# Randomized Search for Algorithm Tuning
import numpy as np
from scipy.stats import uniform as sp_rand
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
# load the diabetes dataset
dataset = datasets.load_diabetes()
# prepare a uniform distribution to sample for the alpha parameter
param_grid = {'alpha': sp_rand()}
# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)
rsearch.fit(dataset.data, dataset.target)
print(rsearch)
# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)

For more information see the API for RandomizedSearchCV and the Randomized Parameter Optimization section in the user guide.
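To see what a random search draws before any model is fit, scikit-learn's ParameterSampler can be used on its own. This is a minimal sketch assuming the same uniform distribution on [0, 1] as the recipe above. The n_iter value of 5 and the random_state value of 7 are arbitrary choices for illustration.

```python
# Inspect the candidates a random search would draw (illustrative sketch)
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# uniform() with default arguments samples from [0, 1]
param_distributions = {"alpha": uniform()}
sampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=7))
for candidate in sampled:
    # each candidate is a dict such as {'alpha': <float>}
    print(candidate)
```

Fixing random_state makes the draws repeatable, which can help when comparing runs.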

Summary

Algorithm parameter tuning is an important step for improving algorithm performance right before presenting results or preparing a system for production.

In this post, you discovered algorithm parameter tuning and two methods that you can use right now in Python and the scikit-learn library to improve your algorithm results: grid search and random search.


82 Responses to How to Tune Algorithm Parameters with Scikit-Learn

Harsh October 23, 2014 at 4:59 pm
Nice summary. I think that because a few parameters depend on each other, you cannot choose any combination of them in GridSearch, else it would error out. I’ve written a post exclusively on GridSearch: http://harshtechtalk.com/model-hyperparameter-tuning-scikit-learn-using-gridsearch/
Alex September 5, 2016 at 6:29 pm
Sir, this is an excellent introduction to hyperparameter optimization.
I’m now thinking, there must be a process for determining an optimal range of parameter values for a particular parameter. For example, when demonstrating GridSearchCV, you used alphas = np.array([1, 0.1, 0.01, 0.001, 0.0001, 0]). What principles guide you into selecting those particular values? And where can I read more about those principles — do they have their roots in statistics, probability theory, or something else?
One more thing, I’m still a machine learning novice and the parameters used to tune Scikit-learn algorithms hardly make sense to me. For example, the Ridge model has parameters “alpha”, “fit_intercept”, “normalize”, “copy_X”, “max_iter”, “tol”, “solver”, and “random_state”. Those parameters don’t make sense to me because I understand I lack the background necessary to make sense of them. What is this background that I am missing?
By the way, I’m subscribed to your newsletter with the same email I’ve used to post this comment. I like your mail, very insightful. I’ll appreciate it if you can also send a copy of your response to my mailbox.
- Jason Brownlee September 6, 2016 at 9:44 am
  Hi Alex, I just chose popular values for alpha as a starting point for the search. A good practice.
  You could use random search on a suite of similar problems and try to deduce cause-effect for the parameter settings or heuristics, but you will always find a problem that breaks the rules. It is always good to use a mix of random and grid searching to expose “good” regions of the hyperparameter search space.
  Often only a few parameters make a big difference when tuning an algorithm. You can research a given algorithm and figure out what each parameter does and the normal ranges for its values. A difficulty is that different implementations may expose different parameters, so careful reading of each implementation's documentation may be required as well. Basically, lots of hard work is required.
  I hope this helps.
Chris Knowles September 17, 2016 at 6:55 pm
What exactly do you mean by ‘a mix of random and grid search’? Can you please elaborate? Thanks.
- Jason Brownlee September 18, 2016 at 7:58 am
  Great question Chris.
  You can use random search to find good starting points, then grid search to zoom in and find the local optima (or close to it) for those good starting points. Using the two approaches interchangeably like a manual optimization algorithm. If you have a lot of resources, you could just use a genetic algorithm or similar.
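The two-stage idea described above might be sketched as follows. This is only an illustration on the Ridge and diabetes example from the post. The refinement window (half to one and a half times the best random draw), the 11-point grid and the random_state value are all assumptions, not settings from the post.

```python
# Random search for a starting point, then grid search to zoom in (sketch)
import numpy as np
from scipy.stats import uniform
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target

# stage 1: coarse random search over alpha in [0, 1]
coarse = RandomizedSearchCV(Ridge(), {"alpha": uniform()}, n_iter=20, random_state=1)
coarse.fit(X, y)
center = coarse.best_params_["alpha"]

# stage 2: fine grid centred on the best random draw (0.5x to 1.5x its value)
fine_alphas = np.linspace(center * 0.5, center * 1.5, 11)
fine = GridSearchCV(Ridge(), {"alpha": fine_alphas})
fine.fit(X, y)
print(center, fine.best_params_["alpha"], fine.best_score_)
```

Because the fine grid includes the coarse optimum, the second stage should not score worse than the first under the same cross-validation splits.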
Himanshu Rai September 28, 2016 at 3:32 am
Hey Jason,
Can you suggest any relevant material on the implementation of accelerated random search? Thanks.
- Jason Brownlee September 28, 2016 at 7:42 am
  No, sorry. Using lots of cores with random search has always worked well for me.
Aizzaac October 6, 2016 at 7:01 am
When does the tuning have to be done, before or after feature selection (i mean: Forward feature selection, Recursive feature elimination , etc)?
- Jason Brownlee October 6, 2016 at 9:42 am
  Hi Aizzaac,
  I recommend tuning a model after you have spot checked a number of methods. I think it is an activity to improve what is working and get the most out of it, not to find what might work.
  This step-by-step process for working through a problem might make things clearer:
  https://machinelearningmastery.com/start-here/#process
Ehsan October 8, 2016 at 6:48 am
Thanks Jason.
Lets say we optimized our parameters by grid search or random search and get the accuracy of 0.98 so how do we realize if it did over fit or not?
I mean i remember in Poly Kernel I used grid search and got very high accuracy but then I realized it might be over fit.
- Jason Brownlee October 8, 2016 at 10:46 am
  Really great question Ehsan.
  You must develop a robust test harness. Try really hard to falsify any results you get.
  For example:
  – use k-fold cross validation
  – use multiple repeats of your cross validation
  – look at the graph of performance of an algorithm while it learns over each epoch/iteration and check for test accuracy>train accuracy
  – hold back a validation dataset for final confirmation
  – and so on.
  I hope that gives you some ideas.
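A few of the checks listed above can be combined in one short sketch: repeated k-fold cross validation inside the search, plus a hold-out set that the search never sees. The Ridge model, the alpha grid, the 80/20 split and the random_state values are assumptions chosen for illustration.

```python
# Repeated k-fold inside the search plus a held-back validation set (sketch)
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold, train_test_split

dataset = datasets.load_diabetes()
# hold back a validation set that the search never sees
X_train, X_hold, y_train, y_hold = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=1)

# 5 folds repeated 3 times gives 15 train/test splits per candidate
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
search = GridSearchCV(Ridge(), {"alpha": [1, 0.1, 0.01, 0.001]}, cv=cv)
search.fit(X_train, y_train)

cv_score = search.best_score_
# with no scoring argument this is the estimator's own score (R^2 for Ridge)
holdout_score = search.score(X_hold, y_hold)
print(cv_score, holdout_score)
```

A hold-out score far below the cross-validation score is one hint that the tuned model may have overfit.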
  - Harry March 5, 2020 at 7:26 am
    Thank you. very informative!
    Most of the times train accuracy is higher than test accuracy. So can we consider test accuracy>train accuracy as sign of over-fitting?
    - Jason Brownlee March 5, 2020 at 10:34 am
      Perhaps. Or an unrepresentative or too small test set.
Robin CABANNES February 24, 2017 at 8:33 pm
Hi, Thank you for these explanations.
However, when I used the Grid Search Parameter Tuning with my model, it always returned to me the first value of the param_grid dictionary. For example, if I write
param_grid = {
'solver': ['lbfgs','sgd','adam'],
'alpha': [0.0001,0.00001,0.1,1],
'activation':['relu','tanh'],
'hidden_layer_sizes': [(20)],
'learning_rate_init': [1,0.01,0.1,0.001],
'learning_rate':['invscaling','constant'],
'beta_1':[0.9],
'max_iter':[1000],
'momentum': [0.2,0.6,1],
}
It will return as best_params
{'max_iter': 1000, 'activation': 'relu', 'hidden_layer_sizes': 20, 'learning_rate': 'invscaling', 'alpha': 0.0001, 'learning_rate_init': 1, 'beta_1': 0.9, 'solver': 'sgd', 'momentum': 0.2}
but if I just change the order of learning_rate_init, for example 'learning_rate_init': [0.001,0.01,0.1,1], it will return:
{'max_iter': 1000, 'activation': 'relu', 'hidden_layer_sizes': 20, 'learning_rate': 'invscaling', 'alpha': 0.0001, 'learning_rate_init': 0.001, 'beta_1': 0.9, 'solver': 'sgd', 'momentum': 0.2}
Have you already had this issue?
I don’t know if I was clear,
Thanks,
- Jason Brownlee February 25, 2017 at 5:55 am
  That is very odd, I have not seen this issue.
abhinav June 23, 2017 at 10:21 am
Hi Jason,
If i were to conduct a grid search on say the value of k in KNN. If i standardize the whole training dataset before I fit GridSearchCv with cv = 10, wouldnt that lead to leakage of data. (referring to your example of tuning parameters in your book – lesson 21).
I am trying to create a pipeline and feed that to GridSearchCV but I get an error.
This is what I am doing:
estimator = []
estimator.append(('Scaler', StandardScaler()))
estimator.append(('KNN', KNeighborsClassifier))
model = Pipeline(estimator)
param_grid = dict(n_neighbors = [1,3,5,7,9])
kfold = KFold(n_splits = num_folds, random_state = seed)
grid = GridSearchCV(estimator = model, param_grid = param_grid, scoring = scoring, cv = kfold)
grid_result = grid.fit(X_train, Y_train)
Could you Let me know what am i missing?
Thanks
- Jason Brownlee June 24, 2017 at 7:55 am
  Ideally you want to split data into train test, then split train into train/validation or CV the train set. All data prep should be performed on train and applied to validation, or performed on train (all of it) and applied to test.
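A pipeline along the lines of the one in the question can be passed to GridSearchCV so that the scaler is fit only on the training part of each fold. The following is a minimal sketch, not a diagnosis of the exact error, which was not posted. Two likely causes are assumed: KNeighborsClassifier was passed as a class rather than an instance, and the parameter name lacked the step prefix. The iris data, the accuracy metric and the seed are stand-ins for the undefined names in the question.

```python
# Standardize inside each CV fold by tuning a Pipeline (illustrative sketch)
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # stand-in data, an assumption for illustration

# the scaler is re-fit on the training part of every fold, so nothing leaks
model = Pipeline([
    ("Scaler", StandardScaler()),
    ("KNN", KNeighborsClassifier()),  # an instance, not the class itself
])
# pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {"KNN__n_neighbors": [1, 3, 5, 7, 9]}
# random_state only has an effect when shuffle=True
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring="accuracy", cv=kfold)
grid_result = grid.fit(X, y)
print(grid_result.best_score_, grid_result.best_params_)
```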
Arun July 31, 2017 at 2:33 pm
Hi Jason,
I would like to select features based on non-zero Lasso coefficients using my model. In doing so, I am confused by the ‘random_state’ variable, as changing its value gives a different R^2 and mean-squared error (MSE). I am afraid that it would make us suspicious about the data fed as features.
For example, such changes are complex when I vary the ‘random_state’ for the X_test, X_train, y_test and y_train splits, and again use the ‘random_state’ in Lasso for reducing the number of features.
If my intention is to get only the reduced features, can I ignore the R^2 and MSE of the corresponding feature selection using Lasso? Or, can I identify and use any value of ‘random_state’ corresponding to a better R^2 (or RMSE) value?
Thanking you,
- Jason Brownlee July 31, 2017 at 3:52 pm
  Machine learning algorithms are stochastic, and most suffer variance under different input data.
  I’d recommend re-running feature selection multiple times and perhaps taking the average of the results. Or build a model from each set (if different) and compare the performance of predictions from the resulting model.
Sah San August 21, 2017 at 12:07 am
Hi Jason,
How do we know which hyperparameter to include in either the gridsearchcv or randomsearch?
For example, decision trees has many hyperparameter such as
min_samples_split, min_samples_leaf, min_weight_fraction_leaf, max_features, max_leaf_nodes.
I don’t know if we can include all these hyperparameter .
Any comment would be appreciated. Thanks
- Jason Brownlee August 21, 2017 at 6:08 am
  Great question, one idea is to try them all.
  After the experience, you will notice that often only a few matter the most. E.g. the number of features to consider at each split point, and others can be pushed until diminishing returns (e.g. the number of trees).
Sah San August 24, 2017 at 8:06 am
Thank you. It was so helpful
- Jason Brownlee August 24, 2017 at 4:24 pm
  I’m glad to hear that!
Pranita Pradhan September 4, 2017 at 5:47 pm
Hello Jason,
Thank you for this nice tutorial. I want to do multi-class classification (using OneVsRestClassifier) on some patient images. And now I am in the last step of hyperparameter tuning (gridsearchCv) for which I want to use leave-one-group out cross validation. I want to know if I am doing it right. Unfortunately, I did not get any examples for gridsearch and leave-one-group out.
Here is my code,
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
import numpy as np
from sklearn.metrics import recall_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import make_scorer
X = np.random.rand((10*10)).reshape((10,10))
y = np.array([1, 2, 2, 2, 1, 2, 1, 2, 1, 2])
groups = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3])
pca = PCA(n_components=2)
selection = SelectKBest(k=1)
# Build estimator from PCA and Univariate selection:
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
# use leave one out cross validation
logo = LeaveOneGroupOut()
# Make a pipeline where the features are selected by PCA and KBest.
# Multi-class classification is performed by OneVsRestClassification scheme using SVM classifier based on leave one out CV
n_components = tuple([1, 2, 3])
k = tuple([1, 2])
C = tuple([0.1, 1, 10])
model_to_set = OneVsRestClassifier(SVC(kernel="poly"))
pipeline = Pipeline([("features", combined_features), ("clf", model_to_set)])
parameters = {'features__pca__n_components': n_components,
'features__univ_select__k':k,
'clf__estimator__C': C}
# Parameters are optimized using grid search
grid_search = GridSearchCV(pipeline, parameters, cv = logo, scoring= make_scorer(recall_score), verbose = 20)
grid_search.fit(X, y, groups=groups)
y_pred = grid_search.predict(X)
print("\n\n\n Best estimator..... \n %s" % grid_search.best_estimator_)
print("\n\n\n Best parameters..... \n %s" % grid_search.best_params_)
print("\n\n\n Best score..... \n %0.3f" % grid_search.best_score_)
scores = grid_search.cv_results_['mean_test_score']
confusion_matrix(y, y_pred, labels=[1,2])
target_names = ['class 1', 'class 2']
print(classification_report(y, y_pred, target_names=target_names))
- Jason Brownlee September 7, 2017 at 12:34 pm
  Sorry, I cannot review your code.
Shabnam October 9, 2017 at 5:07 pm
Thank you for such a great post.
I have a question about parameters. Can they affect each other?
Suppose there are two parameters; a and b.
1. Let b be fixed and increase a. Suppose accuracy (acc1) will increase (acc2). //acc2 > acc1
2. Change b and then for the new b and new accuracy (acc3), increase a. Will accuracy (acc3) increase with a? In other words acc4 > acc3? Or since a and b are related and b is changed, acc4 < acc3 is also possible?
- Jason Brownlee October 10, 2017 at 7:42 am
  Yes, they can interact which makes evaluating them separately problematic. We do the best we can.
  - Shabnam October 10, 2017 at 11:38 am
    Thank you for your quick response.
    I have another question as well.
    I have two sets of data. I am using mlp classifier from neural network to model them (I use the hyperparameter method explained in this blog), but the accuracy is not changing that much. It is always around 50%. Is there any thing I can observe to see why this is happening?
    Generally, what are the ways to debug/understand a result?
    - Jason Brownlee October 10, 2017 at 4:45 pm
      I would recommend looking at skill of the model on train and test sets over epochs. See this post on diagnostics:
      https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
      The post is for LSTMs, but just as suitable for MLPs.
      - Shabnam October 11, 2017 at 5:19 am
        Thanks a lot for your quick responses and help.
        Sure, I will look at that. Thanks.
      - Jason Brownlee October 11, 2017 at 7:58 am
        No problem, let me know how you go.
Tobi Adeyemi November 23, 2017 at 8:12 pm
Hi Jason. can you please refer me to any material that discusses parameter tuning for Boosting and Bagging methods.
Thanks
- Jason Brownlee November 24, 2017 at 9:38 am
  With both, you can simply increase the number of trees until you reach a point of diminishing returns. That would be a good first start.
  I have information on xgboost (gradient boosting) parameter tuning here:
  https://machinelearningmastery.com/start-here/#xgboost
Tobi Adeyemi November 24, 2017 at 11:28 pm
Thanks Jason
- Jason Brownlee November 25, 2017 at 10:20 am
  You’re welcome.
ashish March 7, 2018 at 10:23 pm
Should we do grid search on a separate cross validation set and then, when we get the best_params_ dictionary, fit the model with those best_params_ on the whole training set
OR
should we do grid search on the whole training set and also train the final model on that same training set?
- Jason Brownlee March 8, 2018 at 6:29 am
  Good question. I answer it here:
  https://machinelearningmastery.com/difference-test-validation-datasets/
Saheli Saha March 29, 2018 at 8:01 am
Can you suggest a way to tune parameters for Association Rule Algorithm?
- Jason Brownlee March 29, 2018 at 3:16 pm
  Good question, sorry I don’t have a worked example at this stage.
Diana May 27, 2018 at 2:38 am
Hello Jason,
1) Is it necessary to standard scale the train set before doing GridSearch?
2) I would like to tune xgboost before using it as a meta-learner for ensemble learning. Should i include the first-level prediction results in the train set? Or just the original features? ( I have tried both methods, with F1 score as the cross-validation metric, and I am getting a gridsearch best score of 1 if i do the former, and a score of 0.5 if do the latter)
- Jason Brownlee May 27, 2018 at 6:50 am
  It depends on the data and the algorithm being used.
  xgboost uses decision trees internally.
  You could use xgboost in a stacking configuration. You would output predictions from other models and convert them into a new training dataset to fit the xgboost model.
sndn July 31, 2018 at 4:18 pm
Hi, can grid search be used to select which model is the best?
As an example, for a classification problem, can grid search be used to select which classifier among Naive Bayes, SVM, AdaBoost, Random Forest etc… is best for which parameters, for the given data?
- Jason Brownlee August 1, 2018 at 7:39 am
  Grid search will tune hyperparameters.
  Model selection involves comparing the performance of well tuned models. It can be a good idea to use statistical hypothesis tests to help identify whether a model with better average performance is indeed better.
  See this post:
  https://machinelearningmastery.com/statistical-significance-tests-for-comparing-machine-learning-algorithms/
mesob August 5, 2018 at 2:35 am
Jason, in RandomSearch instead of searching a continuous parameter uniformly, such as,
from scipy.stats import uniform
param_grid = {'C': uniform(loc=0, scale=1)}
how to do a sampling like 10^p with p being uniformly between [a,b]? Thanks.
- Jason Brownlee August 5, 2018 at 5:34 am
  You could use a loop or list comprehension to create such a series.
  For example:
  [10**x for x in range(1, 5)]
  - mesob August 5, 2018 at 6:48 am
    Thanks, Jason. I was wondering if there’s such a way that generates continuous numbers like ‘uniform’ does.
    Of course, following your approach at least I can do
    [10**(x/1000) for x in range(1000,5000)] if I want a finer interval.
    - Jason Brownlee August 6, 2018 at 6:22 am
      Not that I’m aware.
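One option that may fit the question above is a log-uniform distribution, which samples 10**p with p uniform on an interval. The sketch below assumes a SciPy version that provides scipy.stats.loguniform (1.4 or later). The Ridge model, the range from 1e-4 to 1, the n_iter value and the random_state value are illustrative assumptions.

```python
# Sampling a parameter on a log scale with RandomizedSearchCV (sketch)
from scipy.stats import loguniform
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

dataset = datasets.load_diabetes()
# equivalent to 10**p with p uniform on [-4, 0]
param_distributions = {"alpha": loguniform(1e-4, 1e0)}
rsearch = RandomizedSearchCV(Ridge(), param_distributions, n_iter=30, random_state=3)
rsearch.fit(dataset.data, dataset.target)
# collect the alpha values that were actually tried
sampled_alphas = [p["alpha"] for p in rsearch.cv_results_["params"]]
print(min(sampled_alphas), max(sampled_alphas), rsearch.best_params_)
```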
Scott Miller September 22, 2018 at 1:35 am
Hello, is there a way to implement ‘early stopping’ while using RandomizedSearchCV()?
I want to stop when the cross_validation cost starts to increase. When training my neural network (model = Sequential()) some converge (achieve minimum cost function) in 10000epochs but some take 150,000epochs.
Currently I need to do a for-loop over the different hyperparameters and calculate the cost myself. The code below is embedded in a bootstrap and 3-for_loops (1 for each hyperparameter I am testing). I just can’t get early stopping to work in either GridSearchCV() or RandomizedSearchCV().
Thanks for any guidance.
##Define some callbacks
##history: Allows us to access the MSE of the data for each epoch.
##EarlyStopping:
early_stopping = EarlyStopping(monitor='val_loss', patience=10000)
##Callback list to pass to model fitting
call_back=[history, early_stopping]
##Fit the model.
model.fit(X_train_scaled, y_train, validation_data=(X_cv_scaled,y_cv), epochs=n_epochs,
batch_size=n_batch_train, verbose=0, callbacks=call_back)
- Jason Brownlee September 22, 2018 at 6:30 am
  Yes, use the EarlyStopping callback with your model. What is the problem exactly?
Nygline November 2, 2018 at 6:07 am
Hi Jason, can you please tell what kind of scores is GridSearchCV() is providing when we use ‘mean_test_score’ or ‘best_score_’ , i mean is it auc_score or any other evaluation method.
- Jason Brownlee November 2, 2018 at 6:14 am
  It will use the default, which is probably accuracy for classification problems.
  Specify the “scoring” argument with a value from:
  http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
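As a small illustration of the scoring argument, the sketch below tunes Ridge using negated mean squared error. With no scoring argument the search falls back to the estimator's own score method, which is R^2 for Ridge and accuracy for classifiers. The alpha grid is an arbitrary choice for illustration.

```python
# Choosing the metric used by GridSearchCV via the scoring argument (sketch)
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

dataset = datasets.load_diabetes()
grid = GridSearchCV(Ridge(), {"alpha": [1, 0.1, 0.01]}, scoring="neg_mean_squared_error")
grid.fit(dataset.data, dataset.target)
# scores are negated errors, so higher (closer to zero) is better
print(grid.best_score_)
print(grid.cv_results_["mean_test_score"])
```

Both best_score_ and mean_test_score are reported in the chosen metric.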
  - Nygline November 2, 2018 at 10:04 pm
    Thank you Jason i got it..we can mention our desired metric in ‘scoring’ parameter..
    - Jason Brownlee November 3, 2018 at 7:05 am
      Yes.
Muhammad Waseem Akram March 3, 2019 at 9:46 am

def create_model(optimizer, learning_rate):
    autoencoder = Sequential()
    # Encoder Layers
    autoencoder.add(Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=x_train.shape[1:]))
    autoencoder.add(MaxPooling2D((2, 2), padding='same'))
    autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
    autoencoder.add(MaxPooling2D((2, 2), padding='same'))
    autoencoder.add(Conv2D(8, (3, 3), strides=(2,2), activation='relu', padding='same'))
    # Flatten encoding for visualization
    autoencoder.add(Flatten())
    autoencoder.add(Reshape((4, 4, 8)))
    # Decoder Layers
    autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
    autoencoder.add(UpSampling2D((2, 2)))
    autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
    autoencoder.add(UpSampling2D((2, 2)))
    autoencoder.add(Conv2D(16, (3, 3), activation='relu'))
    autoencoder.add(UpSampling2D((2, 2)))
    autoencoder.add(Conv2D(1, (3, 3), activation='sigmoid', padding='same'))
    autoencoder.summary()
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
    return autoencoder

from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
model_classifier = KerasClassifier(build_fn=create_model, verbose=1)
# define the grid search parameters
batch_size = [10]
#loss = ['mean_squared_error', 'binary_crossentropy']
#optimizer = [Adam, SGD, RMSprop]
#learning_rate = [0.001]
epochs = [3, 5]
param_grid = dict(batch_size=batch_size, epochs=epochs)
#[(slice(None), slice(None))]
grid = GridSearchCV(cv=2, estimator=model_classifier, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(x_train, x_train)
print("training Successfully completed")

I want to apply grid search to an autoencoder, but I’m getting this error. Please help me.

ValueError: Invalid shape for y: (30000, 28, 28, 1)
There is no issue with y, because in an autoencoder we pass x as y.

Jason Brownlee March 4, 2019 at 6:55 am
I recommend using a manual grid search, for example:
https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
- Gorkem Can Ates October 25, 2020 at 7:53 am
  For autoencoders, KerasRegressor should be used. The rest stays the same.

Karan April 12, 2019 at 9:00 pm
Jason,
Any rationale to use the gridsearchcv() method to find alpha? Ridgecv() also takes an array of alpha and gets you the right alpha. I tried both methods. There is a significant difference in the best alpha chosen by these methods. By the way, I used the same diabetes data that you have used in your demo code.
- Jason Brownlee April 13, 2019 at 6:30 am
  Interesting, I wonder if the different methods are using different config, e.g. for CV.
  Or if the number of repeats is too low to show a meaningful summary result.
Sara May 8, 2019 at 10:15 am
Hi Jason,
in scikit-learn documentation:
https://scikitlearn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-download-auto-examples-model-selection-plot-nested-cross-validation-iris-py
It talks about the difference between non-nested and nested cross-validation, using GridSearchCV as the inner loop.
The non-nested makes sense and is the same as what you have written in this article, however, the Nested one is confusing.
Do you recommend using non-nested and nested cross-validation? and why?
Many Thanks
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score
# Load the dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
# Set up possible values of parameters to optimize over
p_grid = {"C": [1, 10, 100], "gamma": [.01, .1]}
# We will use a Support Vector Classifier with "rbf" kernel
svm = SVC(kernel="rbf")
# Non_nested parameter search and scoring: Inner loop
clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=4)
clf.fit(X_iris, y_iris)
print('non_nested_scores=', clf.best_score_)
# Nested CV with parameter optimization: Outer loop
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=4)
print('Nested_scores=', nested_score.mean())
- Jason Brownlee May 8, 2019 at 2:10 pm
  Not nested, it just gets too messy. A simpler test harness is easier to understand and harder to mess up.
Niez Ghabi June 19, 2019 at 12:23 am
Hey Jason,
I have been trying GridSearchCV on a neural network, but it took so long that I decided to work with RandomizedSearchCV.
I was trying to tune the learning rate this way :
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
param_grid = dict(learn_rate=learn_rate)
grid = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=5)
but i got this error:
ValueError: learn_rate is not a legal parameter
Can you please help me figure it out, given that i didn’t get this error with grid search ?
Thank you
Reply
- Jason BrownleeJune 19, 2019 at 8:13 am#
  Perhaps try this approach if you are using Keras:
  https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/
  Reply
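A likely cause of a "not a legal parameter" error is that a key in the search dictionary does not match any parameter name the estimator exposes. The commenter's Keras model is not shown, so the sketch below uses scikit-learn's MLPClassifier as a stand-in (an assumption), where the learning rate is called `learning_rate_init`. It shows how to list the legal names before running the search.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic data and a small network, both chosen only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=7)
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=200, random_state=7)

# The search accepts exactly the names returned by get_params().
legal_names = sorted(model.get_params().keys())
print("learn_rate" in legal_names)          # not a parameter of this estimator
print("learning_rate_init" in legal_names)  # the name this estimator uses

# Setting an unknown name fails with a ValueError.
try:
    model.set_params(learn_rate=0.1)
    unknown_name_rejected = False
except ValueError:
    unknown_name_rejected = True

# Search over the learning rate using the legal name.
param_distributions = {"learning_rate_init": [0.001, 0.01, 0.1, 0.2, 0.3]}
search = RandomizedSearchCV(model, param_distributions=param_distributions,
                            n_iter=5, cv=3, random_state=7)
search.fit(X, y)
print(search.best_params_)
```

With a wrapped Keras model, the same check applies: the names in the dictionary need to match arguments that the wrapper actually exposes.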
SaraJune 20, 2019 at 10:56 am#
I am using an SVM for classification. After all data preparation and feature selection, I got good predictions on unseen data. Then I tuned the SVM, and the tuned model has a large C=1,000,000 (penalty parameter). This tuned SVM, which has the best metric evaluation on test data and cross-validated splits, gives awful predictions for unseen data.
Is it acceptable that I do not tune the hyperparameters C and gamma and just continue with the untuned C and gamma, which give good predictions?
Can I say that an SVM with the highest metric evaluation on train/test data does not necessarily give the best prediction model?
Reply
- Jason BrownleeJune 20, 2019 at 1:59 pm#
  The results suggest that the model may have overfit the training data.
  Perhaps tune the model using a validation dataset:
  https://machinelearningmastery.com/difference-test-validation-datasets/
  Reply
  - Moin KhanAugust 23, 2019 at 1:32 pm#
    In Python, logistic regression can be applied in two ways:
    1) sklearn.logisticmodel()
    2) stats.logit()
    Logistic regression in sklearn gives us only the final prediction.
    The Logit model gives us a summary that contains p-values, coefficients, the intercept, etc., as well as predictions.
    I want to run stats.logit() through randomCV so that it gives me a summary, but stats.logit() does not work with randomCV. That is where I am facing an issue.
    Please advise.
    Reply
    - Jason BrownleeAugust 23, 2019 at 2:11 pm#
      You may need to wrap the statsmodels implementation in a class suitable for processing with sklearn.
      Sorry, I don’t have an example.
      Reply
adaneDecember 5, 2019 at 7:08 pm#
But could you please help me with how to use it on a CNN?
Reply
- Jason BrownleeDecember 6, 2019 at 5:12 am#
  Yes, you can get started here:
  https://machinelearningmastery.com/start-here/#dlfcv
  Reply
CarolinaDecember 22, 2019 at 3:31 am#
Is there a way of using GridSearchCV to find the best hyperparameters using a specific validation set, instead of cross-validation?
Reply
- Jason BrownleeDecember 22, 2019 at 6:17 am#
  Not with GridSearchCV, it is designed to use cross-validation.
  You can use your own for-loops over the hyperparameters, then fit/evaluate your model manually.
  Reply
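The manual approach suggested above can be sketched as follows. This is a minimal example, assuming a Ridge model on the diabetes dataset and an arbitrary 75/25 split into train and validation sets. As a second option, scikit-learn also provides `PredefinedSplit`, which can be passed as `cv` so that the search scores every candidate on one fixed validation fold; the sketch checks that both routes pick the same alpha.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import (GridSearchCV, ParameterGrid,
                                     PredefinedSplit, train_test_split)

X, y = load_diabetes(return_X_y=True)
# The split ratio and seed are illustrative choices.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=7)
param_grid = {"alpha": [1.0, 0.1, 0.01, 0.001, 0.0001]}

# Option 1: loop over the grid, fit on train, score on the validation set.
best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    model = Ridge(**params).fit(X_train, y_train)
    score = model.score(X_val, y_val)  # R^2, higher is better
    if score > best_score:
        best_score, best_params = score, params
print(best_params, best_score)

# Option 2: tell the search which rows form the single validation fold.
# -1 marks rows that are always used for training, 0 marks the validation fold.
X_all = np.vstack([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
test_fold = np.concatenate([np.full(len(X_train), -1),
                            np.zeros(len(X_val), dtype=int)])
search = GridSearchCV(Ridge(), param_grid,
                      cv=PredefinedSplit(test_fold), refit=False)
search.fit(X_all, y_all)
print(search.best_params_, search.best_score_)
```

Here `refit=False` keeps the search from refitting on the train and validation rows combined; whether to refit is a design choice that depends on how the final model will be used.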
daniele baranziniJanuary 27, 2020 at 7:03 pm#
Jason, is it correct to say that if you split into training and test sets, you then tune with GridSearchCV on the training set in order to improve cross-validation on the test set?
Reply
- Jason BrownleeJanuary 28, 2020 at 7:52 am#
  No.
  Fit with train.
  Tune with validation.
  Evaluate with test.
  More here:
  https://machinelearningmastery.com/difference-test-validation-datasets/
  Reply
  - daniele baranziniJanuary 28, 2020 at 7:18 pm#
    So tuning hyper-parameters (e.g., with gridsearchcv) on the validation set? (not on train or test sets)
    did I get it right? : )
    ciao! from italy
    Reply
    - Jason BrownleeJanuary 29, 2020 at 6:32 am#
      Yes. Although the train set can be split into train and validation sets by the grid searching API – automatically.
      This is what grid search cv is essentially doing.
      Reply
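The workflow described in this exchange can be sketched as follows. This is a minimal example, assuming a Ridge model on the diabetes dataset with an illustrative 80/20 split: the test set is held back, the grid search splits the training set into train and validation folds internally, and the test set is used once at the end.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)

# Hold back a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7)

# Each of the 5 folds of the training set takes a turn as the validation set.
param_grid = {"alpha": [1.0, 0.1, 0.01, 0.001, 0.0001]}
grid = GridSearchCV(Ridge(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)

# The best configuration is refit on the whole training set by default,
# so it can be evaluated directly on the held-back test set.
test_r2 = grid.score(X_test, y_test)
print(test_r2)
```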
felixJune 5, 2020 at 3:48 am#
Jason, how do I do hyperparameter tuning for the Bernoulli Naive Bayes classifier using sklearn? I’m trying GridSearchCV; how do I specify the alpha parameter in the params?
Reply
- Jason BrownleeJune 5, 2020 at 8:21 am#
  This may help:
  https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html
  Perhaps test a range of values.
  Reply
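Testing a range of alpha values for BernoulliNB might look like the sketch below. The synthetic dataset and the particular alpha values are assumptions chosen for illustration; the key point is that the dictionary key is the parameter name, `alpha`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

# Synthetic data; BernoulliNB binarizes features at 0.0 by default.
X, y = make_classification(n_samples=300, n_features=20, random_state=7)

# The key must match the estimator's parameter name.
param_grid = {"alpha": [0.01, 0.1, 0.5, 1.0, 2.0, 10.0]}
grid = GridSearchCV(BernoulliNB(), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print(grid.best_score_)
print(grid.best_params_)
```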
RahulAugust 9, 2020 at 6:24 am#
Hi Jason,
In grid search, the best estimator is the one with the best value of the metric given in the scoring function, but I am getting a large difference between mean_train_score and mean_test_score for that parameter setting. What should we do in this case?
Reply
- Jason BrownleeAugust 10, 2020 at 5:42 am#
  This can happen when using a small or unrepresentative test set.
  Perhaps confirm that the size and composition of the train and test sets is the same.
  Reply
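To inspect the gap the commenter describes, the train scores have to be requested explicitly with `return_train_score=True`. The sketch below assumes an SVC on the iris dataset with a small grid, purely for illustration, and prints the train score, test score and their difference for each configuration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Train scores are only recorded when return_train_score=True.
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4,
                    return_train_score=True)
grid.fit(X, y)

results = grid.cv_results_
gaps = results["mean_train_score"] - results["mean_test_score"]
for params, train, test, gap in zip(results["params"],
                                    results["mean_train_score"],
                                    results["mean_test_score"], gaps):
    print("train=%.3f test=%.3f gap=%.3f with: %r" % (train, test, gap, params))
```

A large positive gap for a configuration is a hint that it fits the training folds much better than the held-out folds.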
irmaAugust 9, 2020 at 5:31 pm#
Hi,
I am not sure how that is the best result. If you’re using Ridge it should find the model with the smallest loss (Ridge is minimizing the least squares function). If I print all the results from the grid search with your example above using this code:
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
I get:
0.410176 (0.044979) with: {'alpha': 1.0}
0.479884 (0.047264) with: {'alpha': 0.1}
0.481444 (0.048503) with: {'alpha': 0.01}
0.482309 (0.049028) with: {'alpha': 0.001}
0.482323 (0.049236) with: {'alpha': 0.0001}
0.482318 (0.049266) with: {'alpha': 0.0}
Obviously, the smallest score is 0.41 with alpha = 1, right? Or am I completely wrong about interpreting this?
P.S. The same result is for the training score:
0.435197 (0.009882) with: {'alpha': 1.0}
0.513133 (0.008521) with: {'alpha': 0.1}
0.518137 (0.008749) with: {'alpha': 0.01}
0.519445 (0.008983) with: {'alpha': 0.001}
0.519506 (0.008996) with: {'alpha': 0.0001}
0.519506 (0.008996) with: {'alpha': 0.0}
How does one choose the smallest loss?
Reply
- Jason BrownleeAugust 10, 2020 at 5:47 am#
  Yes, we seek the config that results in the smallest loss.
  The grid search will tell you the best performing model and result:
  ...
  print(rsearch.best_score_)
  print(rsearch.best_estimator_.alpha)
  Reply
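One point that may help with reading these numbers: when no `scoring` argument is given, GridSearchCV uses the estimator's own `score` method, which for Ridge is R^2, where larger is better. The search therefore reports the configuration with the highest mean test score, not the lowest. The sketch below assumes the diabetes dataset and 5-fold cross-validation; it also shows the `neg_mean_squared_error` scorer, which negates the error so that larger is still better.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)
alphas = [1.0, 0.1, 0.01, 0.001, 0.0001]

# Default scoring for a regressor is R^2 (higher is better).
grid = GridSearchCV(Ridge(), {"alpha": alphas}, cv=5)
grid.fit(X, y)
means = grid.cv_results_["mean_test_score"]
print(grid.best_params_, grid.best_score_)
print(alphas[int(np.argmax(means))])  # same alpha as best_params_

# With an error metric, scikit-learn negates it so the search still maximizes.
grid_mse = GridSearchCV(Ridge(), {"alpha": alphas}, cv=5,
                        scoring="neg_mean_squared_error")
grid_mse.fit(X, y)
print(-grid_mse.best_score_)  # mean squared error of the best configuration
```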
KusJuly 26, 2021 at 3:55 pm#
Hello Mr. Jason. I need hyperparameter tuning with an evolutionary search algorithm, which I thought I would find here. Can you suggest a link or documentation that works? I tried it myself and got an error.
Reply
- Jason BrownleeJuly 27, 2021 at 5:04 am#
  This may help you to get started:
  https://machinelearningmastery.com/simple-genetic-algorithm-from-scratch-in-python/
  Reply
