Pipeline

class sklearn.pipeline.Pipeline(steps, *, transform_input=None, memory=None, verbose=False)

A sequence of data transformers with an optional final predictor.

Pipeline allows you to sequentially apply a list of transformers to preprocess the data and, if desired, conclude the sequence with a final predictor for predictive modeling.

Intermediate steps of the pipeline must be transformers, that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using the memory argument.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a '__', as in the example below. A step's estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer may be removed by setting it to 'passthrough' or None.
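A minimal sketch of this naming convention follows. The step names "scaler" and "clf" are arbitrary choices for illustration; the estimators are ordinary scikit-learn classes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

# "<step name>__<parameter name>" sets a parameter of one step.
pipe.set_params(clf__C=10)
svc_C = pipe.named_steps["clf"].C

# Setting the step name itself replaces the whole estimator.
pipe.set_params(clf=LogisticRegression())

# "passthrough" (or None) removes a transformer from the chain.
pipe.set_params(scaler="passthrough")
pipe.fit(X, y)
```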

For an example use case of Pipeline combined with GridSearchCV, refer to Selecting dimensionality reduction with Pipeline and GridSearchCV. The example Pipelining: chaining a PCA and a logistic regression shows how to grid search on a pipeline using '__' as a separator in the parameter names.

Read more in the User Guide.

Added in version 0.5.

Parameters:
steps : list of tuples

List of (name of step, estimator) tuples that are to be chained in sequential order. To be compatible with the scikit-learn API, all steps must define fit. All non-last steps must also define transform. See Combining Estimators for more details.

transform_input : list of str, default=None

The names of the metadata parameters that should be transformed by the pipeline before being passed to the step consuming them.

This enables input arguments to fit other than X to be transformed by the steps of the pipeline up to the step that requires them. Requirement is defined via metadata routing. For instance, this can be used to pass a validation set through the pipeline.

You can only set this if metadata routing is enabled, which you can do using sklearn.set_config(enable_metadata_routing=True).

Added in version 1.6.

memory : str or object with the joblib.Memory interface, default=None

Used to cache the fitted transformers of the pipeline. The last step will never be cached, even if it is a transformer. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming. See Caching nearest neighbors for an example on how to enable caching.

verbose : bool, default=False

If True, the time elapsed while fitting each step will be printed as it is completed.
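The memory option can be sketched as follows, assuming a throwaway directory from Python's tempfile module as the cache location. The point to notice is the cloning behaviour described under memory: the PCA instance passed in stays unfitted, and the fitted copy is reached through named_steps.

```python
import tempfile

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(random_state=0)  # 20 features by default
pca = PCA(n_components=5)

with tempfile.TemporaryDirectory() as cachedir:
    # A string is used as the path of the joblib cache directory.
    pipe = Pipeline([("pca", pca), ("clf", LogisticRegression())], memory=cachedir)
    pipe.fit(X, y)
    # Only the final step changed, so the PCA fit can be served from the cache.
    pipe.set_params(clf__C=0.5).fit(X, y)

# The transformer was cloned before fitting: `pca` itself is still unfitted.
pca_was_fitted = hasattr(pca, "components_")
# The fitted copy lives inside the pipeline.
fitted_pca = pipe.named_steps["pca"]
```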

Attributes:
named_steps : Bunch

Access the steps by name.

classes_ : ndarray of shape (n_classes,)

The class labels.

n_features_in_ : int

Number of features seen during the fit method of the first step.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during the fit method of the first step.
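A short sketch of reading these attributes after fitting; the iris dataset and the step names are illustrative choices only. feature_names_in_ is omitted because it is only set when the input carries string column names (for example a pandas DataFrame).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

scaler = pipe.named_steps.scaler  # Bunch allows attribute-style access
classes = pipe.classes_           # delegated to the final estimator
n_features = pipe.n_features_in_  # delegated to the first step
```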

See also

make_pipeline

Convenience function for simplified pipeline construction.
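For comparison, a sketch using make_pipeline, which generates step names automatically from the lowercased class names of the estimators:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = make_pipeline(StandardScaler(), SVC())
step_names = [name for name, _ in pipe.steps]  # ['standardscaler', 'svc']
```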

Examples

>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=0)
>>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
>>> # The pipeline can be used as any other estimator
>>> # and avoids leaking the test set into the train set
>>> pipe.fit(X_train, y_train).score(X_test, y_test)
0.88
>>> # An estimator's parameter can be set using '__' syntax
>>> pipe.set_params(svc__C=10).fit(X_train, y_train).score(X_test, y_test)
0.76
decision_function(X, **params)

Transform the data, and apply decision_function with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its decision_function method. Only valid if the final estimator implements decision_function.

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
y_score : ndarray of shape (n_samples, n_classes)

Result of calling decision_function on the final estimator.
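A sketch checking that the pipeline call matches applying each step by hand. With the binary target that make_classification produces by default, SVC.decision_function returns a 1-D array of shape (n_samples,) rather than (n_samples, n_classes).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)  # binary target by default
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())]).fit(X, y)

scores = pipe.decision_function(X[:5])

# The same computation done step by step.
X_scaled = pipe.named_steps["scaler"].transform(X[:5])
manual = pipe.named_steps["svc"].decision_function(X_scaled)
```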

fit(X, y=None, **params)

Fit the model.

Fit all the transformers one after the other and sequentially transform the data. Finally, fit the transformed data using the final estimator.

Parameters:
X : iterable

Training data. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object
  • If enable_metadata_routing=False (default): Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

  • If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config.

See Metadata Routing User Guide for more details.

Returns:
self : object

Pipeline with fitted steps.
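A sketch of the default (non-routing) parameter passing, where the s__p prefix selects which step's fit receives the value. The weighting scheme is hypothetical, chosen only to have something to pass; an unprefixed name is expected to be rejected with a ValueError.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
weights = np.where(y == 1, 2.0, 1.0)  # hypothetical per-sample weights

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# The "svc__" prefix routes sample_weight to SVC.fit only.
pipe.fit(X, y, svc__sample_weight=weights)

# Without a step prefix the pipeline cannot tell which step the value is for.
unprefixed_error = None
try:
    pipe.fit(X, y, sample_weight=weights)
except ValueError as exc:
    unprefixed_error = exc
```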

fit_predict(X, y=None, **params)

Transform the data, and apply fit_predict with the final estimator.

Call fit_transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its fit_predict method. Only valid if the final estimator implements fit_predict.

Parameters:
X : iterable

Training data. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object
  • If enable_metadata_routing=False (default): Parameters passed to the fitting method of each step (fit_transform for the transformers, fit_predict for the final estimator), where each parameter name is prefixed such that parameter p for step s has key s__p.

  • If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:
y_pred : ndarray

Result of calling fit_predict on the final estimator.
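A sketch with a clusterer as the final step, assuming KMeans as an example of an estimator that implements fit_predict:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# StandardScaler.fit_transform runs first, then KMeans.fit_predict.
labels = pipe.fit_predict(X)
```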

fit_transform(X, y=None, **params)

Fit the model and transform with the final estimator.

Fit all the transformers one after the other and sequentially transform the data. Only valid if the final estimator either implements fit_transform or both fit and transform.

Parameters:
X : iterable

Training data. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object
  • If enable_metadata_routing=False (default): Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

  • If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Returns:
Xt : ndarray of shape (n_samples, n_transformed_features)

Transformed samples.
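A minimal sketch, assuming a scaler followed by PCA so that the final step is itself a transformer:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(n_samples=50, n_features=10, random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("pca", PCA(n_components=3))])

# Each step is fitted and applied in turn; the final PCA uses fit_transform.
Xt = pipe.fit_transform(X)
print(Xt.shape)  # (50, 3)
```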

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Transform input features using the pipeline.

Parameters:
input_features : array-like of str or None, default=None

Input features.

Returns:
feature_names_out : ndarray of str objects

Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRouter

A MetadataRouter encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Returns the parameters given in the constructor as well as the estimators contained within the steps of the Pipeline.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of str to any

Parameter names mapped to their values.
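A sketch contrasting shallow and deep parameter listings for a small pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(C=3.0))])

shallow = pipe.get_params(deep=False)  # constructor arguments only
deep = pipe.get_params()               # adds steps by name and "<step>__<param>" keys
```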

inverse_transform(X, **params)

Apply inverse_transform for each step in reverse order.

All estimators in the pipeline must support inverse_transform.

Parameters:
X : array-like of shape (n_samples, n_transformed_features)

Data samples, where n_samples is the number of samples and n_transformed_features is the number of features. Must fulfill input requirements of the inverse_transform method of the last step of the pipeline.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
X_original : ndarray of shape (n_samples, n_features)

Inverse transformed data, that is, data in the original feature space.
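A round-trip sketch, assuming a scaler followed by a PCA that keeps every component, so that inverse_transform can recover the input up to floating-point error:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).normal(size=(30, 4))
# PCA() with no n_components keeps all 4 components, so no information is lost.
pipe = Pipeline([("scaler", StandardScaler()), ("pca", PCA())]).fit(X)
Xt = pipe.transform(X)

# PCA.inverse_transform runs first, then StandardScaler.inverse_transform.
X_back = pipe.inverse_transform(Xt)
```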

property named_steps

Access the steps by name.

Read-only attribute to access any step by its name. Keys are step names and values are the step objects.
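A sketch of named_steps alongside the pipeline's own indexing, which also accepts a step name, an integer position, or a slice (the slice returns a sub-pipeline):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# named_steps maps step names to the estimator objects themselves.
scaler = pipe.named_steps["scaler"]

# Indexing the pipeline directly.
same_scaler = pipe["scaler"]  # by name
last = pipe[-1]               # by position
head = pipe[:1]               # slice -> Pipeline containing only "scaler"
```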

predict(X, **params)

Transform the data, and apply predict with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its predict method. Only valid if the final estimator implements predict.

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object
  • If enable_metadata_routing=False (default): Parameters passed to the predict method of the final estimator, which is called at the end of all transformations in the pipeline.

  • If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config.

See Metadata Routing User Guide for more details.

Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:
y_pred : ndarray

Result of calling predict on the final estimator.
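A sketch of the non-routing case, assuming GaussianProcessRegressor as a final estimator whose predict accepts return_std. As the note above says, the returned standard deviation reflects only the regressor's uncertainty on the already-scaled inputs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(20, 1))
y = np.sin(X).ravel()

pipe = Pipeline([("scaler", StandardScaler()), ("gpr", GaussianProcessRegressor())])
pipe.fit(X, y)

# Extra keyword arguments go to GaussianProcessRegressor.predict.
y_mean, y_std = pipe.predict(X[:5], return_std=True)
```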

predict_log_proba(X, **params)

Transform the data, and apply predict_log_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its predict_log_proba method. Only valid if the final estimator implements predict_log_proba.

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object
  • If enable_metadata_routing=False (default): Parameters passed to the predict_log_proba method of the final estimator, which is called at the end of all transformations in the pipeline.

  • If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Returns:
y_log_proba : ndarray of shape (n_samples, n_classes)

Result of calling predict_log_proba on the final estimator.
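A sketch, assuming a LogisticRegression final step, checking that exponentiating the log-probabilities gives back predict_proba:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

log_proba = pipe.predict_log_proba(X[:3])  # shape (3, n_classes)
proba = pipe.predict_proba(X[:3])
```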

predict_proba(X, **params)

Transform the data, and apply predict_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its predict_proba method. Only valid if the final estimator implements predict_proba.

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object
  • If enable_metadata_routing=False (default): Parameters passed to the predict_proba method of the final estimator, which is called at the end of all transformations in the pipeline.

  • If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Returns:
y_proba : ndarray of shape (n_samples, n_classes)

Result of calling predict_proba on the final estimator.
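A sketch, assuming a LogisticRegression final step on the iris data: each row of the output sums to one, and column j corresponds to pipe.classes_[j].

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

proba = pipe.predict_proba(X)
# Map each row's most probable column back to a class label.
most_likely = pipe.classes_[proba.argmax(axis=1)]
```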

score(X, y=None, sample_weight=None, **params)

Transform the data, and apply score with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its score method. Only valid if the final estimator implements score.

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

y : iterable, default=None

Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.

sample_weight : array-like, default=None

If not None, this argument is passed as the sample_weight keyword argument to the score method of the final estimator.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
score : float

Result of calling score on the final estimator.
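A sketch of passing sample_weight to score. Uniform weights are used only so the result can be checked against the unweighted score; for a classifier such as SVC the score is mean accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())]).fit(X_train, y_train)

plain = pipe.score(X_test, y_test)
# sample_weight is forwarded to SVC.score; equal weights leave accuracy unchanged.
uniform = pipe.score(X_test, y_test, sample_weight=np.ones(len(y_test)))
```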

score_samples(X)

Transform the data, and apply score_samples with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its score_samples method. Only valid if the final estimator implements score_samples.

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of the first step of the pipeline.

Returns:
y_score : ndarray of shape (n_samples,)

Result of calling score_samples on the final estimator.
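A sketch, assuming KernelDensity as a final estimator that implements score_samples; it returns one log-density value per sample, computed on the scaled data:

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).normal(size=(100, 2))
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("kde", KernelDensity(bandwidth=0.5)),
]).fit(X)

# One log-density per row of the (scaled) input.
log_density = pipe.score_samples(X[:4])
```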

set_output(*, transform=None)

Set the output container when "transform" and "fit_transform" are called.

Calling set_output will set the output of all estimators in steps.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**kwargs)

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in steps.

Parameters:
**kwargs : dict

Parameters of this estimator or parameters of estimators contained in steps. Parameters of the steps may be set using their name and the parameter name separated by a '__'.

Returns:
self : object

Pipeline class instance.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> Pipeline

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for the sample_weight parameter in score.

Returns:
self : object

The updated object.

transform(X, **params)

Transform the data, and apply transform with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls its transform method. Only valid if the final estimator implements transform.

This also works when the final estimator is None, in which case all prior transformations are applied.

Parameters:
X : iterable

Data to transform. Must fulfill input requirements of the first step of the pipeline.

**params : dict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
Xt : ndarray of shape (n_samples, n_transformed_features)

Transformed data.
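A sketch of the case where there is no real final estimator. The string "passthrough" is used as the final step here; per the description above, None is also accepted. Only the two scalers are applied.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.random.RandomState(0).normal(size=(10, 3))
pipe = Pipeline([
    ("std", StandardScaler()),
    ("minmax", MinMaxScaler()),
    ("final", "passthrough"),  # no final estimator
])

# transform applies StandardScaler, then MinMaxScaler.
Xt = pipe.fit(X).transform(X)
```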

Gallery examples

Feature agglomeration vs. univariate selection

Column Transformer with Heterogeneous Data Sources

Column Transformer with Mixed Types

Selecting dimensionality reduction with Pipeline and GridSearchCV

Pipelining: chaining a PCA and a logistic regression

Concatenating multiple feature extraction methods

Pipeline ANOVA SVM

Recursive feature elimination

Permutation Importance vs Random Forest Feature Importance (MDI)

Poisson regression and non-normal loss

Explicit feature map approximation for RBF kernels

Displaying Pipelines

Balance model complexity and cross-validated score

Sample pipeline for text feature extraction and evaluation

Underfitting vs. Overfitting

Caching nearest neighbors

Nearest Neighbors Classification

Comparing Nearest Neighbors with and without Neighborhood Components Analysis

Restricted Boltzmann Machine features for digit classification

Target Encoder’s Internal Cross fitting

Release Highlights for scikit-learn 1.7

Semi-supervised Classification on a Text Dataset

SVM-Anova: SVM with univariate feature selection