A hyperopt wrapper - simplifying hyperparameter tuning with Scikit-learn style estimators.
Works with classification evaluation metrics ("f1", "auc", "accuracy") as well as regression metrics ("rmse", "mse").
Installation:
pip install skperopt
Usage:
Just pass in an estimator and a parameter grid, and hyperopt will do the rest. No need to define objectives or write hyperopt-specific parameter grids.
Recipe (vanilla flavour):
Import skperopt
Initialize skperopt
Run skperopt.HyperSearch.search
Collect the results
Code example below.
import skperopt as sk
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# generate classification data
data = make_classification(n_samples=1000, n_features=10, n_classes=2)
X = pd.DataFrame(data[0])
y = pd.DataFrame(data[1])

# init the classifier
kn = KNeighborsClassifier()
param = {"n_neighbors": [int(x) for x in np.linspace(1, 60, 30)],
         "leaf_size": [int(x) for x in np.linspace(1, 60, 30)],
         "p": [1, 2, 3, 4, 5, 10, 20],
         "algorithm": ['auto', 'ball_tree', 'kd_tree', 'brute'],
         "weights": ["uniform", "distance"]}

# search parameters
search = sk.HyperSearch(kn, X, y, params=param)
search.search()

# gather and apply the best parameters
kn.set_params(**search.best_params)

# view run results
print(search.stats)
HyperSearch parameters
est ([sklearn estimator] required)
any sklearn style estimator
X ([pandas Dataframe] required)
your training data
y ([pandas Dataframe] required)
your training data
params ([dictionary] required)
a parameter search grid
iters (default 500[int])
number of iterations to try before early stopping
time_to_search (default None[int])
time in seconds to run for before early stopping (None = no time limit)
cv (default 5[int])
number of folds to use in cross validation tests
cv_times (default 1[int])
number of times to perform cross validation, each on a new random sample of the data - higher values decrease variance but increase run time
randomState (default 10[int])
random state for the data shuffling
scorer (default "f1"[str])
type of evaluation metric to use - accepts classification "f1","auc","accuracy" or regression "rmse" and "mse"
verbose (default 1[int])
amount of verbosity
0 = none 1 = some 2 = debug
random (default False[bool])
whether the data should be randomized during cross validation
foldtype (default "Kfold"[str])
type of folds to use - accepts "KFold", "Stratified"
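A minimal sketch of how these options might be passed at initialization. The keyword names are taken from the parameter list above; only est, X, y and params appear in the earlier example, so treat the remaining keyword arguments as assumptions about the constructor signature.

import skperopt as sk
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# toy classification data
data = make_classification(n_samples=500, n_features=8, n_classes=2)
X = pd.DataFrame(data[0])
y = pd.DataFrame(data[1])

rf = RandomForestClassifier()
param = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 10, None]}

# keyword names below follow the parameter list above (assumed constructor kwargs)
search = sk.HyperSearch(rf, X, y, params=param,
                        iters=200,            # stop after 200 hyperopt iterations
                        time_to_search=120,   # or after 120 seconds, whichever comes first
                        cv=5, cv_times=2,     # 5-fold CV, repeated twice on fresh samples
                        scorer="f1",          # classification metric
                        verbose=1,
                        foldtype="Stratified")
search.search()
rf.set_params(**search.best_params)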
HyperSearch methods
HyperSearch.search() (None)
Used to search the parameter grid using hyperopt. No parameters need to be passed to the function. All parameters are set during initialization.
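The same workflow applies to regression. Below is a short sketch, assuming a regressor and scorer="rmse" are accepted as described in the parameter list above.

import skperopt as sk
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# toy regression data
data = make_regression(n_samples=500, n_features=8, noise=0.1)
X = pd.DataFrame(data[0])
y = pd.DataFrame(data[1])

ridge = Ridge()
param = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# scorer="rmse" selects the regression metric; other options left at their defaults
search = sk.HyperSearch(ridge, X, y, params=param, scorer="rmse")
search.search()

# apply the best parameters and inspect the run
ridge.set_params(**search.best_params)
print(search.stats)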
Testing
Tested with 100 runs of 150 search iterations each for both RandomSearch and skperopt search.
Skperopt (hyperopt) performs better than RandomSearch, producing a higher average f1 score with a smaller standard deviation.
Skperopt Search Results
f1 score over 100 test runs:
Mean: 0.9340930
Standard deviation: 0.0062275
Random Search Results
f1 score over 100 test runs:
Mean: 0.927461652
Standard deviation: 0.0063314
Updates
V0.0.73
Added cv_times attribute - runs the cross validation n times (i.e. cv 5x5) each iteration on a new randomly sampled data set; this should reduce overfitting
V0.0.7
Added/fixed RMSE eval metric
Added MSE eval metric