cross_val_predict#
- sklearn.model_selection.cross_val_predict(estimator,X,y=None,*,groups=None,cv=None,n_jobs=None,verbose=0,params=None,pre_dispatch='2*n_jobs',method='predict')[source]#
Generate cross-validated estimates for each input data point.
The data is split according to the cv parameter. Each sample belongsto exactly one test set, and its prediction is computed with anestimator fitted on the corresponding training set.
Passing these predictions into an evaluation metric may not be a validway to measure generalization performance. Results can differ from
cross_validateandcross_val_scoreunless all tests setshave equal size and the metric decomposes over samples.Read more in theUser Guide.
- Parameters:
- estimatorestimator
The estimator instance to use to fit the data. It must implement a
fitmethod and the method given by themethodparameter.- X{array-like, sparse matrix} of shape (n_samples, n_features)
The data to fit. Can be, for example a list, or an array at least 2d.
- y{array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs), default=None
The target variable to try to predict in the case ofsupervised learning.
- groupsarray-like of shape (n_samples,), default=None
Group labels for the samples used while splitting the dataset intotrain/test set. Only used in conjunction with a “Group”cvinstance (e.g.,
GroupKFold).Changed in version 1.4:
groupscan only be passed if metadata routing is not enabledviasklearn.set_config(enable_metadata_routing=True). When routingis enabled, passgroupsalongside other metadata via theparamsargument instead. E.g.:cross_val_predict(...,params={'groups':groups}).- cvint, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy.Possible inputs for cv are:
None, to use the default 5-fold cross validation,
int, to specify the number of folds in a
(Stratified)KFold,An iterable that generates (train, test) splits as arrays of indices.
For int/None inputs, if the estimator is a classifier and
yiseither binary or multiclass,StratifiedKFoldis used. In allother cases,KFoldis used. These splitters are instantiatedwithshuffle=Falseso the splits will be the same across calls.ReferUser Guide for the variouscross-validation strategies that can be used here.
Changed in version 0.22:
cvdefault value if None changed from 3-fold to 5-fold.- n_jobsint, default=None
Number of jobs to run in parallel. Training the estimator andpredicting are parallelized over the cross-validation splits.
Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. SeeGlossaryfor more details.- verboseint, default=0
The verbosity level.
- paramsdict, default=None
Parameters to pass to the underlying estimator’s
fitand the CVsplitter.Added in version 1.4.
- pre_dispatchint or str, default=’2*n_jobs’
Controls the number of jobs that get dispatched during parallelexecution. Reducing this number can be useful to avoid anexplosion of memory consumption when more jobs get dispatchedthan CPUs can process. This parameter can be:
None, in which case all the jobs are immediately created and spawned. Usethis for lightweight and fast-running jobs, to avoid delays due to on-demandspawning of the jobs
An int, giving the exact number of total jobs that are spawned
A str, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
- method{‘predict’, ‘predict_proba’, ‘predict_log_proba’, ‘decision_function’}, default=’predict’
The method to be invoked by
estimator.
- Returns:
- predictionsndarray
This is the result of calling
method. Shape:When
methodis ‘predict’ and in special case wheremethodis‘decision_function’ and the target is binary: (n_samples,)When
methodis one of {‘predict_proba’, ‘predict_log_proba’,‘decision_function’} (unless special case above):(n_samples, n_classes)If
estimatorismultioutput, an extra dimension‘n_outputs’ is added to the end of each shape above.
See also
cross_val_scoreCalculate score for each CV split.
cross_validateCalculate one or more scores and timings for each CV split.
Notes
In the case that one or more classes are absent in a training portion, adefault score needs to be assigned to all instances for that class if
methodproduces columns per class, as in {‘decision_function’,‘predict_proba’, ‘predict_log_proba’}. Forpredict_probathis value is0. In order to ensure finite output, we approximate negative infinity bythe minimum finite float value for the dtype in other cases.Examples
>>>fromsklearnimportdatasets,linear_model>>>fromsklearn.model_selectionimportcross_val_predict>>>diabetes=datasets.load_diabetes()>>>X=diabetes.data[:150]>>>y=diabetes.target[:150]>>>lasso=linear_model.Lasso()>>>y_pred=cross_val_predict(lasso,X,y,cv=3)
For a detailed example of using
cross_val_predictto visualizeprediction errors, please seePlotting Cross-Validated Predictions.
