Pipeline

class sklearn.pipeline.Pipeline(steps, *, transform_input=None, memory=None, verbose=False)

A sequence of data transformers with an optional final predictor.

Pipeline allows you to sequentially apply a list of transformers to preprocess the data and, if desired, conclude the sequence with a final predictor for predictive modeling.

Intermediate steps of the pipeline must be transformers, that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using the memory argument.
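As a minimal sketch of the caching described above, the snippet below passes a temporary directory as memory. The dataset, step names, and the cache_dir variable are illustrative assumptions, not part of the reference text.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=8, random_state=0)

cache_dir = mkdtemp()  # temporary directory used as the cache location
pca = PCA(n_components=2)
try:
    cached_pipe = Pipeline([("pca", pca), ("svc", SVC())], memory=cache_dir)
    cached_pipe.fit(X, y)
    # With caching enabled, the transformer is cloned before fitting, so the
    # fitted PCA is read from the pipeline rather than from `pca` itself.
    fitted_components = cached_pipe.named_steps["pca"].components_
    original_is_fitted = hasattr(pca, "components_")
finally:
    rmtree(cache_dir)  # remove the cache once it is no longer needed
```

Caching mainly pays off when fitting a transformer is expensive and the same pipeline is refit many times, for example during a grid search.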

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a '__', as in the example below. A step's estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to 'passthrough' or None.
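The following sketch illustrates both mechanisms from the paragraph above: setting a nested parameter with the '__' separator, and disabling a step with 'passthrough'. The step names ("scaler", "pca", "clf") and the synthetic dataset are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("pca", PCA(n_components=3)),
        ("clf", LogisticRegression()),
    ]
)

# Set a parameter of the "pca" step using <step name>__<parameter name>.
pipe.set_params(pca__n_components=5)
n_components_after = pipe.get_params()["pca__n_components"]

# Disable the "pca" step entirely; the data then flows straight from the
# scaler to the classifier.
pipe.set_params(pca="passthrough")
pipe.fit(X, y)
n_features_seen = pipe.named_steps["clf"].n_features_in_
```

With the PCA step skipped, the classifier sees all 10 original features rather than the reduced set.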

For an example use case of Pipeline combined with GridSearchCV, refer to Selecting dimensionality reduction with Pipeline and GridSearchCV. The example Pipelining: chaining a PCA and a logistic regression shows how to grid search on a pipeline using '__' as a separator in the parameter names.

See also

make_pipeline: Convenience function for simplified pipeline construction.

Examples

>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=0)
>>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
>>> # The pipeline can be used as any other estimator
>>> # and avoids leaking the test set into the train set
>>> pipe.fit(X_train, y_train).score(X_test, y_test)
0.88
>>> # An estimator's parameter can be set using '__' syntax
>>> pipe.set_params(svc__C=10).fit(X_train, y_train).score(X_test, y_test)
0.76

decision_function(X, **params)

Transform the data, and apply decision_function with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls decision_function method. Only valid if the final estimator implements decision_function.

Parameters:

X : iterable
    Data to predict on. Must fulfill input requirements of first step of the pipeline.

**params : dict of string -> object
    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:

y_score : ndarray of shape (n_samples, n_classes)
    Result of calling decision_function on the final estimator.
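As an illustration of the chaining described for decision_function, the sketch below compares the pipeline call with the same two steps applied by hand. The step names and dataset are assumptions; for this binary problem SVC returns one score per sample, so the result is one-dimensional.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())]).fit(X, y)

# The pipeline transforms X with the scaler, then calls SVC.decision_function.
scores = pipe.decision_function(X)

# The same computation written out step by step.
X_scaled = pipe.named_steps["scaler"].transform(X)
manual_scores = pipe.named_steps["svc"].decision_function(X_scaled)
```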

fit(X, y=None, **params)

Fit the model.

Fit all the transformers one after the other and sequentially transform the data. Finally, fit the transformed data using the final estimator.

Parameters:

X : iterable
    Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None
    Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object
    If enable_metadata_routing=False (default): Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
    If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config.

    See Metadata Routing User Guide for more details.

Returns:

self : object
    Pipeline with fitted steps.
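A minimal sketch of the s__p prefix convention, assuming the default enable_metadata_routing=False: the keyword svc__sample_weight is forwarded as sample_weight to the fit method of the step named "svc". The weights and step names are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)

# Hypothetical weights: samples of class 1 count twice as much.
weights = np.where(y == 1, 2.0, 1.0)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# "svc__sample_weight" means: pass sample_weight to the fit of step "svc".
fitted = pipe.fit(X, y, svc__sample_weight=weights)
```

fit returns the pipeline itself, which is what allows chained calls such as pipe.fit(...).score(...).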

fit_predict(X, y=None, **params)

Transform the data, and apply fit_predict with the final estimator.

Call fit_transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls fit_predict method. Only valid if the final estimator implements fit_predict.

Parameters:

X : iterable
    Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None
    Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object
    If enable_metadata_routing=False (default): Parameters to the predict called at the end of all transformations in the pipeline.
    If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 0.20.

    Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

    See Metadata Routing User Guide for more details.

    Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:

y_pred : ndarray
    Result of calling fit_predict on the final estimator.
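fit_predict is typically useful when the final step is a clusterer. The sketch below assumes a KMeans final step on synthetic blobs; the step names and data are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

pipe = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
    ]
)

# Fits the scaler, transforms X, then calls KMeans.fit_predict on the result.
labels = pipe.fit_predict(X)
```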

fit_transform(X, y=None, **params)

Fit the model and transform with the final estimator.

Fit all the transformers one after the other and sequentially transform the data. Only valid if the final estimator either implements fit_transform or fit and transform.

Parameters:

X : iterable
    Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None
    Training targets. Must fulfill label requirements for all steps of the pipeline.

**params : dict of str -> object
    If enable_metadata_routing=False (default): Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
    If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

    See Metadata Routing User Guide for more details.

Returns:

Xt : ndarray of shape (n_samples, n_transformed_features)
    Transformed samples.
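A short sketch of fit_transform on a pipeline made only of transformers, so the final step is itself a transformer. The choice of StandardScaler followed by PCA and the dataset shape are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=50, n_features=6, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("pca", PCA(n_components=2))])

# Fit every step in order and return the output of the last transformer.
Xt = pipe.fit_transform(X)

# Once fitted, transform applies the same chain without refitting.
Xt_again = pipe.transform(X)
```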

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Transform input features using the pipeline.

Parameters:

input_features : array-like of str or None, default=None
    Input features.

Returns:

feature_names_out : ndarray of str objects
    Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing : MetadataRouter
    A MetadataRouter encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Returns the parameters given in the constructor as well as the estimators contained within the steps of the Pipeline.

Parameters:

deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any
    Parameter names mapped to their values.
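The difference between deep=True and deep=False can be sketched as follows; the step names and the value C=2.0 are assumptions chosen for the example.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(C=2.0))])

# deep=True (default): constructor parameters, the step estimators by name,
# and the nested parameters of each step as <step>__<parameter>.
params = pipe.get_params()

# deep=False: only the constructor parameters of the Pipeline itself.
shallow = pipe.get_params(deep=False)
```

The keys of the deep mapping are exactly the names accepted by set_params.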

inverse_transform(X, **params)

Apply inverse_transform for each step in a reverse order.

All estimators in the pipeline must support inverse_transform.

Parameters:

X : array-like of shape (n_samples, n_transformed_features)
    Data samples, where n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of last step of pipeline's inverse_transform method.

**params : dict of str -> object
    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:

X_original : ndarray of shape (n_samples, n_features)
    Inverse transformed data, that is, data in the original feature space.
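A round-trip sketch, assuming two invertible scalers as steps (the step names and the small array are illustrative): inverse_transform undoes the last step first, then the one before it.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])

pipe = Pipeline([("standard", StandardScaler()), ("minmax", MinMaxScaler())])
Xt = pipe.fit_transform(X)

# Applies MinMaxScaler.inverse_transform, then StandardScaler.inverse_transform.
X_back = pipe.inverse_transform(Xt)
```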

property named_steps

Access the steps by name.

Read-only attribute to access any step by given name. Keys are step names and values are the step objects.
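A brief sketch of the access patterns for named_steps; the step names "scaler" and "svc" are assumptions for the example.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Dictionary-style access by step name.
svc = pipe.named_steps["svc"]

# Attribute-style access is also available on the returned object.
scaler = pipe.named_steps.scaler

# Indexing the pipeline by name returns the same step object.
svc_by_index = pipe["svc"]
```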

predict(X, **params)

Transform the data, and apply predict with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict method. Only valid if the final estimator implements predict.

Parameters:

X : iterable
    Data to predict on. Must fulfill input requirements of first step of the pipeline.

**params : dict of str -> object
    If enable_metadata_routing=False (default): Parameters to the predict called at the end of all transformations in the pipeline.
    If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 0.20.

    Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config.

    See Metadata Routing User Guide for more details.

Returns:

y_pred : ndarray
    Result of calling predict on the final estimator.

predict_log_proba(X, **params)

Transform the data, and apply predict_log_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_log_proba method. Only valid if the final estimator implements predict_log_proba.

Parameters:

X : iterable
    Data to predict on. Must fulfill input requirements of first step of the pipeline.

**params : dict of str -> object
    If enable_metadata_routing=False (default): Parameters to the predict_log_proba called at the end of all transformations in the pipeline.
    If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 0.20.

    Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

    See Metadata Routing User Guide for more details.

Returns:

y_log_proba : ndarray of shape (n_samples, n_classes)
    Result of calling predict_log_proba on the final estimator.

predict_proba(X, **params)

Transform the data, and apply predict_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_proba method. Only valid if the final estimator implements predict_proba.

Parameters:

X : iterable
    Data to predict on. Must fulfill input requirements of first step of the pipeline.

**params : dict of str -> object
    If enable_metadata_routing=False (default): Parameters to the predict_proba called at the end of all transformations in the pipeline.
    If enable_metadata_routing=True: Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 0.20.

    Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

    See Metadata Routing User Guide for more details.

Returns:

y_proba : ndarray of shape (n_samples, n_classes)
    Result of calling predict_proba on the final estimator.
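The sketch below calls predict_proba and predict_log_proba on the same fitted pipeline. It assumes a LogisticRegression final step, which implements both methods; the dataset and step names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
pipe = Pipeline(
    [("scaler", StandardScaler()), ("clf", LogisticRegression())]
).fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = pipe.predict_proba(X)

# Same shape, holding the logarithm of the class probabilities.
log_proba = pipe.predict_log_proba(X)
```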

score(X, y=None, sample_weight=None, **params)

Transform the data, and apply score with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score method. Only valid if the final estimator implements score.

Parameters:

X : iterable
    Data to predict on. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None
    Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.

sample_weight : array-like, default=None
    If not None, this argument is passed as sample_weight keyword argument to the score method of the final estimator.

**params : dict of str -> object
    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:

score : float
    Result of calling score on the final estimator.
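A sketch of score with and without sample_weight, assuming a classifier as the final step so that the score is accuracy. The weights are hypothetical, and accuracy_score is used only to cross-check the result.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
pipe = Pipeline(
    [("scaler", StandardScaler()), ("clf", LogisticRegression())]
).fit(X, y)

# Unweighted score of the final estimator on the transformed data.
plain_score = pipe.score(X, y)

# Hypothetical weights: errors on class 1 count three times as much.
weights = np.where(y == 1, 3.0, 1.0)
weighted_score = pipe.score(X, y, sample_weight=weights)

# Cross-check against the metric computed by hand.
expected_weighted = accuracy_score(y, pipe.predict(X), sample_weight=weights)
```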

score_samples(X)

Transform the data, and apply score_samples with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score_samples method. Only valid if the final estimator implements score_samples.

Parameters:

X : iterable
    Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:

y_score : ndarray of shape (n_samples,)
    Result of calling score_samples on the final estimator.
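score_samples requires a final estimator that implements it. The sketch below assumes IsolationForest as one such estimator; the data and step names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=80, centers=1, random_state=0)

pipe = Pipeline(
    [("scaler", StandardScaler()), ("iforest", IsolationForest(random_state=0))]
).fit(X)

# One score per sample, computed on the scaled data.
scores = pipe.score_samples(X)

# The same computation written out step by step.
X_scaled = pipe.named_steps["scaler"].transform(X)
manual_scores = pipe.named_steps["iforest"].score_samples(X_scaled)
```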

set_output(*, transform=None)

Set the output container when "transform" and "fit_transform" are called.

Calling set_output will set the output of all estimators in steps.

Parameters:

transform : {"default", "pandas", "polars"}, default=None
    Configure output of transform and fit_transform.

    "default": Default output format of a transformer
    "pandas": DataFrame output
    "polars": Polars output
    None: Transform configuration is unchanged

    Added in version 1.4: "polars" option was added.

Returns:

self : estimator instance
    Estimator instance.

set_params(**kwargs)

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in steps.

Parameters:

**kwargs : dict
    Parameters of this estimator or parameters of estimators contained in steps. Parameters of the steps may be set using its name and the parameter name separated by a '__'.

Returns:

self : object
    Pipeline class instance.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Pipeline

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
    Metadata routing for sample_weight parameter in score.

Returns:

self : object
    The updated object.

transform(X, **params)

Transform the data, and apply transform with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls transform method. Only valid if the final estimator implements transform.

This also works where final estimator is None in which case all prior transformations are applied.

Parameters:

X : iterable
    Data to transform. Must fulfill input requirements of first step of the pipeline.

**params : dict of str -> object
    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

    Added in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns: