GradientBoostingClassifier

class sklearn.ensemble.GradientBoostingClassifier(*, loss='log_loss', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, validation_fraction=0.1, n_iter_no_change=None, tol=0.0001, ccp_alpha=0.0)

Gradient Boosting for classification.

This algorithm builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the loss function, e.g. binary or multiclass log loss. Binary classification is a special case where only a single regression tree is induced.

HistGradientBoostingClassifier is a much faster variant of this algorithm for intermediate and large datasets (n_samples >= 10_000) and supports monotonic constraints.

Read more in the User Guide.

Parameters:
loss : {'log_loss', 'exponential'}, default='log_loss'

The loss function to be optimized. 'log_loss' refers to binomial and multinomial deviance, the same as used in logistic regression. It is a good choice for classification with probabilistic outputs. For loss 'exponential', gradient boosting recovers the AdaBoost algorithm.

learning_rate : float, default=0.1

Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators. Values must be in the range [0.0, inf).

For an example of the effects of this parameter and its interaction with subsample, see Gradient Boosting regularization.

n_estimators : int, default=100

The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance. Values must be in the range [1, inf).

subsample : float, default=1.0

The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias. Values must be in the range (0.0, 1.0].

criterion : {'friedman_mse', 'squared_error'}, default='friedman_mse'

The function to measure the quality of a split. Supported criteria are 'friedman_mse' for the mean squared error with improvement score by Friedman, 'squared_error' for mean squared error. The default value of 'friedman_mse' is generally the best as it can provide a better approximation in some cases.

Added in version 0.18.

min_samples_split : int or float, default=2

The minimum number of samples required to split an internal node:

  • If int, values must be in the range [2, inf).

  • If float, values must be in the range (0.0, 1.0] and min_samples_split will be ceil(min_samples_split * n_samples).

Changed in version 0.18: Added float values for fractions.
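
As a rough illustration of the float rule above, the sketch below resolves min_samples_split to a sample count. The helper name resolve_min_samples_split is hypothetical and not part of scikit-learn; it only mirrors the documented formula and may differ from the library's internal handling.

```python
import math


def resolve_min_samples_split(min_samples_split, n_samples):
    """Hypothetical helper mirroring the documented rule (not a scikit-learn API)."""
    if isinstance(min_samples_split, int):
        # Integer values are used directly as a sample count.
        return min_samples_split
    # Float values are treated as a fraction of n_samples and rounded up.
    return math.ceil(min_samples_split * n_samples)


count_from_int = resolve_min_samples_split(10, n_samples=1001)
count_from_float = resolve_min_samples_split(0.25, n_samples=1001)
print(count_from_int, count_from_float)
```

The same ceil-of-a-fraction rule is documented for min_samples_leaf.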

min_samples_leaf : int or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, values must be in the range [1, inf).

  • If float, values must be in the range (0.0, 1.0) and min_samples_leaf will be ceil(min_samples_leaf * n_samples).

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaf : float, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. Values must be in the range [0.0, 0.5].

max_depth : int or None, default=3

Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. If int, values must be in the range [1, inf).

min_impurity_decrease : float, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value. Values must be in the range [0.0, inf).

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

Added in version 0.19.
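
To make the weighted impurity decrease equation concrete, here is a small worked sketch. The function name and the sample counts and impurities are made up for illustration; only the formula itself comes from the text above.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Evaluate the documented formula for a single candidate split."""
    return N_t / N * (
        impurity
        - N_t_R / N_t * right_impurity
        - N_t_L / N_t * left_impurity
    )


# Illustrative numbers: a node holding 40 of 100 samples is split 10 / 30.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
print(round(decrease, 4))
```

With these numbers the decrease is about 0.08, so the split would be accepted for any min_impurity_decrease up to roughly that value.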

init : estimator or 'zero', default=None

An estimator object that is used to compute the initial predictions. init has to provide fit and predict_proba. If 'zero', the initial raw predictions are set to zero. By default, a DummyEstimator predicting the class priors is used.
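
A minimal sketch of the two non-default init options, assuming a synthetic dataset from make_classification and LogisticRegression as the custom initial estimator. Both choices are illustrative; any estimator that provides fit and predict_proba should be usable in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Start boosting from raw predictions of zero.
clf_zero = GradientBoostingClassifier(
    n_estimators=20, init="zero", random_state=0
).fit(X, y)

# Start boosting from the probabilities of a fitted logistic regression.
clf_custom = GradientBoostingClassifier(
    n_estimators=20, init=LogisticRegression(max_iter=1000), random_state=0
).fit(X, y)

print(clf_zero.score(X, y), clf_custom.score(X, y))
```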

random_state : int, RandomState instance or None, default=None

Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details). It also controls the random splitting of the training data to obtain a validation set if n_iter_no_change is not None. Pass an int for reproducible output across multiple function calls. See Glossary.

max_features : {'sqrt', 'log2'}, int or float, default=None

The number of features to consider when looking for the best split:

  • If int, values must be in the range [1, inf).

  • If float, values must be in the range (0.0, 1.0] and the features considered at each split will be max(1, int(max_features * n_features_in_)).

  • If 'sqrt', then max_features=sqrt(n_features).

  • If 'log2', then max_features=log2(n_features).

  • If None, then max_features=n_features.

Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

verbose : int, default=0

Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree. Values must be in the range [0, inf).

max_leaf_nodes : int, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. Values must be in the range [2, inf). If None, then unlimited number of leaf nodes.

warm_start : bool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. See the Glossary.
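
The following sketch shows one way warm_start might be used to grow an ensemble incrementally. The dataset and the stage counts (20, then 30) are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

clf = GradientBoostingClassifier(n_estimators=20, warm_start=True, random_state=0)
clf.fit(X, y)
n_before = clf.estimators_.shape[0]

# Raise n_estimators and fit again: only the additional stages are trained.
clf.set_params(n_estimators=30)
clf.fit(X, y)
n_after = clf.estimators_.shape[0]

print(n_before, n_after)
```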

validation_fraction : float, default=0.1

The proportion of training data to set aside as validation set for early stopping. Values must be in the range (0.0, 1.0). Only used if n_iter_no_change is set to an integer.

Added in version 0.20.

n_iter_no_change : int, default=None

n_iter_no_change is used to decide if early stopping will be used to terminate training when validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set aside validation_fraction size of the training data as validation and terminate training when validation score is not improving in all of the previous n_iter_no_change numbers of iterations. The split is stratified. Values must be in the range [1, inf). See Early stopping in Gradient Boosting.

Added in version 0.20.

tol : float, default=1e-4

Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops. Values must be in the range [0.0, inf).

Added in version 0.20.
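
A short sketch of how validation_fraction, n_iter_no_change and tol might be combined for early stopping. The dataset and the specific values are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of stages
    validation_fraction=0.2,  # held out internally for early stopping
    n_iter_no_change=5,       # patience, in boosting iterations
    tol=1e-4,
    random_state=0,
).fit(X, y)

# n_estimators_ reports how many stages were actually kept.
print(clf.n_estimators_, clf.estimators_.shape[0])
```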

ccp_alpha : non-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. Values must be in the range [0.0, inf). See Minimal Cost-Complexity Pruning for details. See Post pruning decision trees with cost complexity pruning for an example of such pruning.

Added in version 0.22.

Attributes:
n_estimators_ : int

The number of estimators as selected by early stopping (if n_iter_no_change is specified). Otherwise it is set to n_estimators.

Added in version 0.20.

n_trees_per_iteration_ : int

The number of trees that are built at each iteration. For binary classifiers, this is always 1.

Added in version 1.4.0.

feature_importances_ : ndarray of shape (n_features,)

The impurity-based feature importances.

oob_improvement_ : ndarray of shape (n_estimators,)

The improvement in loss on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator. Only available if subsample < 1.0.

oob_scores_ : ndarray of shape (n_estimators,)

The full history of the loss values on the out-of-bag samples. Only available if subsample < 1.0.

Added in version 1.3.

oob_score_ : float

The last value of the loss on the out-of-bag samples. It is the same as oob_scores_[-1]. Only available if subsample < 1.0.

Added in version 1.3.

train_score_ : ndarray of shape (n_estimators,)

The i-th score train_score_[i] is the loss of the model at iteration i on the in-bag sample. If subsample == 1 this is the loss on the training data.

init_ : estimator

The estimator that provides the initial predictions. Set via the init argument.

estimators_ : ndarray of DecisionTreeRegressor of shape (n_estimators, n_trees_per_iteration_)

The collection of fitted sub-estimators. n_trees_per_iteration_ is 1 for binary classification, otherwise n_classes.
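
The shape of estimators_ can be checked directly. The sketch below fits one binary and one three-class model on synthetic data (an assumption made for illustration) and compares the second dimension.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_bin, y_bin = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_classes=2, random_state=0
)
X_multi, y_multi = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_classes=3, random_state=0
)

binary = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X_bin, y_bin)
multi = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X_multi, y_multi)

print(binary.estimators_.shape)  # one tree per stage
print(multi.estimators_.shape)   # one tree per class per stage
```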

classes_ : ndarray of shape (n_classes,)

The class labels.

n_features_in_ : int

Number of features seen during fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

n_classes_ : int

The number of classes.

max_features_ : int

The inferred value of max_features.

See also

HistGradientBoostingClassifier

Histogram-based Gradient Boosting Classification Tree.

sklearn.tree.DecisionTreeClassifier

A decision tree classifier.

RandomForestClassifier

A meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

AdaBoostClassifier

A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

Notes

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999.

T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.

Examples

The following example shows how to fit a gradient boosting classifier with 100 decision stumps as weak learners.

>>> from sklearn.datasets import make_hastie_10_2
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> X, y = make_hastie_10_2(random_state=0)
>>> X_train, X_test = X[:2000], X[2000:]
>>> y_train, y_test = y[:2000], y[2000:]
>>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
...     max_depth=1, random_state=0).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
0.913

apply(X)

Apply trees in the ensemble to X, return leaf indices.

Added in version 0.17.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted to a sparse csr_matrix.

Returns:
X_leaves : array-like of shape (n_samples, n_estimators, n_classes)

For each datapoint x in X and for each tree in the ensemble, return the index of the leaf x ends up in each estimator. In the case of binary classification n_classes is 1.
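
A brief sketch of the array returned by apply, using a synthetic three-class dataset chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_classes=3, random_state=0
)
clf = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)

# One leaf index per sample, per boosting stage, per class-specific tree.
leaves = clf.apply(X)
print(leaves.shape)
```

Leaf indices like these are sometimes one-hot encoded and used as features for a downstream linear model.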

decision_function(X)

Compute the decision function of X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns:
score : ndarray of shape (n_samples, n_classes) or (n_samples,)

The decision function of the input samples, which corresponds to the raw values predicted from the trees of the ensemble. The order of the classes corresponds to that in the attribute classes_. Regression and binary classification produce an array of shape (n_samples,).
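
The two return shapes can be seen side by side in the sketch below, which assumes synthetic binary and three-class datasets:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_bin, y_bin = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_classes=2, random_state=0
)
X_multi, y_multi = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_classes=3, random_state=0
)

binary = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X_bin, y_bin)
multi = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X_multi, y_multi)

raw_bin = binary.decision_function(X_bin)      # 1-D: one raw score per sample
raw_multi = multi.decision_function(X_multi)   # 2-D: one column per class
print(raw_bin.shape, raw_multi.shape)
```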

fit(X, y, sample_weight=None, monitor=None)

Fit the gradient boosting model.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

y : array-like of shape (n_samples,)

Target values (strings or integers in classification, real numbers in regression). For classification, labels must correspond to classes.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.

monitor : callable, default=None

The monitor is called after each iteration as callable(i, self, locals()), receiving the current iteration, a reference to the estimator and the local variables of _fit_stages. If the callable returns True the fitting procedure is stopped. The monitor can be used for various things such as computing held-out estimates, early stopping, model introspection, and snapshotting.

Returns:
self : object

Fitted estimator.
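
As a sketch of the monitor hook described above, the callable below stops fitting once ten stages have been built. The function name stop_after_ten_stages and the stage budget are hypothetical; a real monitor would more likely inspect a held-out score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def stop_after_ten_stages(i, estimator, local_vars):
    """Hypothetical monitor: i is the zero-based index of the stage just fitted."""
    # Returning True asks fit to stop after the current stage.
    return i >= 9


clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X, y, monitor=stop_after_ten_stages)

print(clf.estimators_.shape[0])
```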

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict class for X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns:
y : ndarray of shape (n_samples,)

The predicted values.

predict_log_proba(X)

Predict class log-probabilities for X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

Raises:
AttributeError

If the loss does not support probabilities.

predict_proba(X)

Predict class probabilities for X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

Raises:
AttributeError

If the loss does not support probabilities.
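
To illustrate how the probability columns line up with classes_, the sketch below trains on string labels. The label names and the synthetic dataset are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y_int = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)
# Map integer targets to string labels (illustrative names).
y = np.array(["bird", "cat", "dog"])[y_int]

clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])
log_proba = clf.predict_log_proba(X[:5])

print(clf.classes_)       # column order of proba and log_proba
print(proba.sum(axis=1))  # each row sums to 1 up to rounding
```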

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_fit_request(*, monitor: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') -> GradientBoostingClassifier

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
monitor : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for monitor parameter in fit.

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
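
A short sketch of both forms of set_params. The pipeline step names "scale" and "gbc" are arbitrary labels chosen for this example.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by name, and the estimator is returned.
clf = GradientBoostingClassifier()
returned = clf.set_params(n_estimators=200, max_depth=2)

# Nested object: use <component>__<parameter> to reach a pipeline step.
pipe = Pipeline([("scale", StandardScaler()), ("gbc", GradientBoostingClassifier())])
pipe.set_params(gbc__learning_rate=0.05)

print(clf.get_params()["n_estimators"], pipe.named_steps["gbc"].learning_rate)
```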

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> GradientBoostingClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.

staged_decision_function(X)

Compute decision function of X for each iteration.

This method allows monitoring (i.e. determine error on testing set) after each stage.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Yields:
score : generator of ndarray of shape (n_samples, k)

The decision function of the input samples, which corresponds to the raw values predicted from the trees of the ensemble. The order of the classes corresponds to that in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

staged_predict(X)

Predict class at each stage for X.

This method allows monitoring (i.e. determine error on testing set) after each stage.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Yields:
y : generator of ndarray of shape (n_samples,)

The predicted value of the input samples.
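
The monitoring use case mentioned above can be sketched as follows: compute test accuracy after every stage and pick the stage where it peaks. The train/test split and dataset are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# staged_predict yields one prediction array per boosting stage.
stage_accuracy = [
    accuracy_score(y_test, y_pred) for y_pred in clf.staged_predict(X_test)
]
best_stage = int(np.argmax(stage_accuracy)) + 1  # stages counted from 1

print(best_stage, stage_accuracy[-1])
```

The last staged prediction matches what predict returns for the full ensemble, so stage_accuracy[-1] agrees with clf.score(X_test, y_test).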

staged_predict_proba(X)

Predict class probabilities at each stage for X.

This method allows monitoring (i.e. determine error on testing set) after each stage.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Yields:
y : generator of ndarray of shape (n_samples, n_classes)

The predicted class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

Gallery examples

Feature transformations with ensembles of trees

Gradient Boosting Out-of-Bag estimates

Gradient Boosting regularization

Feature discretization
