SelectFromModel

class sklearn.feature_selection.SelectFromModel(estimator, *, threshold=None, prefit=False, norm_order=1, max_features=None, importance_getter='auto')

Meta-transformer for selecting features based on importance weights.

Added in version 0.17.

Read more in the User Guide.

Parameters:
estimator : object

The base estimator from which the transformer is built. This can be either a fitted (if prefit is set to True) or a non-fitted estimator. The estimator should have a feature_importances_ or coef_ attribute after fitting. Otherwise, the importance_getter parameter should be used.

threshold : str or float, default=None

The threshold value to use for feature selection. Features whose absolute importance value is greater than or equal to the threshold are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, “mean” is used by default.
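The rules above for turning threshold into a number can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: resolve_threshold, select_indices and the l1_like flag (standing in for "the estimator uses an L1 penalty, e.g. Lasso") are hypothetical names, and the importances are assumed to be absolute values already.

```python
from statistics import mean, median


def resolve_threshold(importances, threshold=None, l1_like=False):
    """Turn a threshold spec into a float (sketch of the documented rules).

    l1_like is a hypothetical flag meaning "the estimator uses an L1
    penalty, e.g. Lasso"; scikit-learn inspects the estimator instead.
    """
    if threshold is None:
        # Documented default: 1e-5 for L1-penalized models, "mean" otherwise.
        threshold = 1e-5 if l1_like else "mean"
    if not isinstance(threshold, str):
        return float(threshold)
    # Optional scaling factor, e.g. "1.25*mean".
    if "*" in threshold:
        scale_str, reference = threshold.split("*")
        scale = float(scale_str.strip())
        reference = reference.strip()
    else:
        scale, reference = 1.0, threshold
    if reference == "mean":
        return scale * mean(importances)
    if reference == "median":
        return scale * median(importances)
    raise ValueError(f"Unknown threshold reference: {reference!r}")


def select_indices(importances, threshold):
    """Indices of the kept features: importance >= threshold."""
    return [i for i, v in enumerate(importances) if v >= threshold]


# Hypothetical absolute importances for four features.
importances = [0.1, 0.4, 1.3, 0.8]
print(select_indices(importances, resolve_threshold(importances)))               # [2, 3]
print(select_indices(importances, resolve_threshold(importances, "1.25*mean")))  # [2]
print(select_indices(importances, resolve_threshold(importances, 0.3)))          # [1, 2, 3]
print(resolve_threshold(importances, l1_like=True))                              # 1e-05
```

A feature whose importance equals the threshold exactly is kept, matching the "greater than or equal" rule.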

prefit : bool, default=False

Whether a prefit model is expected to be passed into the constructor directly or not. If True, estimator must be a fitted estimator. If False, estimator is fitted and updated by calling fit and partial_fit, respectively.

norm_order : non-zero int, inf, -inf, default=1

Order of the norm used to filter the vectors of coefficients below threshold in the case where the coef_ attribute of the estimator is of dimension 2.
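When coef_ is two-dimensional (for example, one row per class in a multiclass linear model), each feature's column of coefficients has to be collapsed into a single score before it is compared with the threshold. A minimal sketch, assuming coef_ is given as a list of rows of shape (n_targets, n_features) and that the score is the vector norm of order norm_order taken down each column; coef_importances is a hypothetical name.

```python
import math


def coef_importances(coef, norm_order=1):
    """Collapse a 2-D coef_ (n_targets x n_features) into one score per feature.

    Each feature's column is reduced with a vector norm of order norm_order.
    Negative orders combined with zero coefficients are not handled here.
    """
    scores = []
    for j in range(len(coef[0])):
        column = [abs(row[j]) for row in coef]
        if norm_order == math.inf:
            scores.append(max(column))
        elif norm_order == -math.inf:
            scores.append(min(column))
        else:
            total = sum(v ** norm_order for v in column)
            scores.append(total ** (1.0 / norm_order))
    return scores


# Hypothetical coefficients: 2 classes (rows) x 3 features (columns).
coef = [[0.5, -2.0, 0.0],
        [-1.5, 1.0, 3.0]]
print(coef_importances(coef, norm_order=1))         # [2.0, 3.0, 3.0]
print(coef_importances(coef, norm_order=math.inf))  # [1.5, 2.0, 3.0]
```

With norm_order=1 every class contributes equally; norm_order=inf scores a feature by its single largest coefficient magnitude.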

max_features : int, callable, default=None

The maximum number of features to select.

  • If an integer, then it specifies the maximum number of features to allow.

  • If a callable, then it specifies how to calculate the maximum number of features allowed by using the output of max_features(X).

  • If None, then all features are kept.

To only select based on max_features, set threshold=-np.inf.

Added in version 0.20.

Changed in version 1.1: max_features accepts a callable.
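How max_features and threshold interact can be sketched as follows: when max_features is set, only the max_features highest-scoring features are candidates, and a candidate scoring below the threshold is still dropped. This is a minimal pure-Python illustration, not the library's code; support_mask is a hypothetical name and the scores are assumed to be absolute importances.

```python
def support_mask(scores, threshold, max_features=None, X=None):
    """Boolean mask combining max_features and threshold (sketch)."""
    if callable(max_features):
        # A callable is evaluated on X; this value is what max_features_ holds.
        max_features = max_features(X)
    n = len(scores)
    if max_features is None:
        mask = [True] * n
    else:
        # Rank by descending score; Python's sort is stable, so ties go to
        # the feature with the lower index.
        ranked = sorted(range(n), key=lambda i: -scores[i])
        top = set(ranked[:max_features])
        mask = [i in top for i in range(n)]
    # Candidates scoring below the threshold are dropped in every case.
    return [keep and s >= threshold for keep, s in zip(mask, scores)]


def half_callable(X):
    return round(len(X[0]) / 2)


scores = [0.1, 0.4, 1.3, 0.8]
X = [[0.0] * 4]  # one hypothetical sample with four features

print(support_mask(scores, 0.3))                  # [False, True, True, True]
print(support_mask(scores, 0.3, max_features=2))  # [False, False, True, True]
# threshold=-inf disables the threshold, so only max_features matters.
print(support_mask(scores, float("-inf"), max_features=1))  # [False, False, True, False]
print(support_mask(scores, float("-inf"), max_features=half_callable, X=X))
# [False, False, True, True]
```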

importance_getter : str or callable, default='auto'

If 'auto', uses the feature importance either through a coef_ attribute or feature_importances_ attribute of estimator.

Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in case of TransformedTargetRegressor or named_steps.clf.feature_importances_ in case of Pipeline with its last step named clf.

If callable, overrides the default feature importance getter. The callable is passed the fitted estimator and should return the importance of each feature.

Added in version 0.24.
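The string form of importance_getter is a dotted attribute path resolved with operator.attrgetter, so its behavior can be checked on plain objects. A minimal sketch: the SimpleNamespace objects are hypothetical stand-ins for a fitted Pipeline (last step named clf) and a fitted TransformedTargetRegressor, and the values are made up.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical stand-ins for fitted estimators (values are made up).
pipeline_like = SimpleNamespace(
    named_steps=SimpleNamespace(
        clf=SimpleNamespace(feature_importances_=[0.05, 0.7, 0.25])
    )
)
ttr_like = SimpleNamespace(regressor_=SimpleNamespace(coef_=[-0.3, 0.0, 1.2]))

# String form: attrgetter walks the dotted path on the fitted estimator.
get_pipeline_importances = attrgetter("named_steps.clf.feature_importances_")
get_ttr_coef = attrgetter("regressor_.coef_")
print(get_pipeline_importances(pipeline_like))  # [0.05, 0.7, 0.25]
print(get_ttr_coef(ttr_like))                   # [-0.3, 0.0, 1.2]


# Callable form: receives the fitted estimator, returns one value per feature.
def last_step_importances(estimator):
    return estimator.named_steps.clf.feature_importances_


print(last_step_importances(pipeline_like) == get_pipeline_importances(pipeline_like))  # True
```

With scikit-learn itself, the same path would be given as SelectFromModel(pipe, importance_getter="named_steps.clf.feature_importances_"), where pipe is a Pipeline whose last step is named clf.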

Attributes:
estimator_ : estimator

The base estimator from which the transformer is built. This attribute exists only when fit has been called.

  • If prefit=True, it is a deep copy of estimator.

  • If prefit=False, it is a clone of estimator and fit on the data passed to fit or partial_fit.

n_features_in_ : int

Number of features seen during fit.

max_features_ : int

Maximum number of features calculated during fit. Only defined if max_features is not None.

  • If max_features is an int, then max_features_ = max_features.

  • If max_features is a callable, then max_features_ = max_features(X).

Added in version 1.1.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

threshold_ : float

Threshold value used for feature selection.

See also

RFE

Recursive feature elimination based on importance weights.

RFECV

Recursive feature elimination with built-in cross-validated selection of the best number of features.

SequentialFeatureSelector

Sequential cross-validation based feature selection. Does not rely on importance weights.

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression
>>> X = [[ 0.87, -1.34,  0.31 ],
...      [-2.79, -0.02, -0.85 ],
...      [-1.34, -0.48, -2.55 ],
...      [ 1.92,  1.48,  0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)
>>> selector.estimator_.coef_
array([[-0.3252,  0.8345,  0.4976]])
>>> selector.threshold_
np.float64(0.55249)
>>> selector.get_support()
array([False,  True, False])
>>> selector.transform(X)
array([[-1.34],
       [-0.02],
       [-0.48],
       [ 1.48]])

Using a callable to create a selector that can use no more than half of the input features.

>>> def half_callable(X):
...     return round(len(X[0]) / 2)
>>> half_selector = SelectFromModel(estimator=LogisticRegression(),
...                                 max_features=half_callable)
>>> _ = half_selector.fit(X, y)
>>> half_selector.max_features_
2

fit(X, y=None, **fit_params)

Fit the SelectFromModel meta-transformer.

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,), default=None

The target values (integers that correspond to classes in classification, real numbers in regression).

**fit_params : dict

  • If enable_metadata_routing=False (default): Parameters directly passed to the fit method of the sub-estimator. They are ignored if prefit=True.

  • If enable_metadata_routing=True: Parameters safely routed to the fit method of the sub-estimator. They are ignored if prefit=True.

Changed in version 1.4: See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_feature_names_out(input_features=None)

Mask feature names according to selected features.

Parameters:
input_features : array-like of str or None, default=None

Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:
feature_names_out : ndarray of str objects

Transformed feature names.
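The name-masking rules for get_feature_names_out can be sketched in plain Python. masked_feature_names and its arguments are hypothetical: mask plays the role of get_support(), feature_names_in plays the role of feature_names_in_, and the sketch returns a list rather than an ndarray.

```python
def masked_feature_names(mask, input_features=None, feature_names_in=None):
    """Keep the names of selected features (sketch of the documented rules)."""
    n_features = len(mask)
    if input_features is None:
        if feature_names_in is not None:
            input_features = list(feature_names_in)
        else:
            # Generated default names: x0, x1, ..., x(n_features - 1).
            input_features = [f"x{i}" for i in range(n_features)]
    elif feature_names_in is not None and list(input_features) != list(feature_names_in):
        raise ValueError("input_features must match feature_names_in_")
    if len(input_features) != n_features:
        raise ValueError("expected one name per input feature")
    return [name for name, keep in zip(input_features, mask) if keep]


mask = [False, True, False]
print(masked_feature_names(mask))  # ['x1']
print(masked_feature_names(mask, feature_names_in=["age", "income", "height"]))  # ['income']
```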

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Added in version 1.4.

Returns:
routing : MetadataRouter

A MetadataRouter encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : bool, default=False

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
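The two return forms of get_support carry the same information, and converting between them takes one line each way. A minimal sketch with hypothetical helper names mask_to_indices and indices_to_mask, using plain lists instead of NumPy arrays:

```python
def mask_to_indices(mask):
    """Boolean mask (one entry per input feature) -> indices of kept features."""
    return [i for i, keep in enumerate(mask) if keep]


def indices_to_mask(indices, n_input_features):
    """Indices of kept features -> boolean mask over all input features."""
    kept = set(indices)
    return [i in kept for i in range(n_input_features)]


mask = [False, True, False, True]  # shape [# input features]
indices = mask_to_indices(mask)    # shape [# output features]
print(indices)                              # [1, 3]
print(indices_to_mask(indices, len(mask)))  # [False, True, False, True]

# Either form selects the retained entries of a feature vector.
row = [5.0, 6.0, 7.0, 8.0]
print([row[i] for i in indices])                  # [6.0, 8.0]
print([v for v, keep in zip(row, mask) if keep])  # [6.0, 8.0]
```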

inverse_transform(X)

Reverse the transformation operation.

Parameters:
X : array of shape [n_samples, n_selected_features]

The input samples.

Returns:
X_original : array of shape [n_samples, n_original_features]

X with columns of zeros inserted where features would have been removed by transform.
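Because the removed columns are not stored, inverse_transform can restore the original shape but not the original values. A minimal pure-Python sketch, assuming mask is the boolean output of get_support(); inverse_transform_sketch is a hypothetical name, and the reduced input reuses the transform(X) output from the Examples section.

```python
def inverse_transform_sketch(X_r, mask):
    """Re-insert zero columns where mask marks a feature as removed."""
    n_selected = sum(mask)
    X_original = []
    for row in X_r:
        if len(row) != n_selected:
            raise ValueError("each row must have one value per selected feature")
        values = iter(row)
        # Take the next kept value at selected positions, 0.0 elsewhere.
        X_original.append([next(values) if keep else 0.0 for keep in mask])
    return X_original


mask = [False, True, False]
X_r = [[-1.34], [-0.02], [-0.48], [1.48]]
print(inverse_transform_sketch(X_r, mask))
# [[0.0, -1.34, 0.0], [0.0, -0.02, 0.0], [0.0, -0.48, 0.0], [0.0, 1.48, 0.0]]
```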

partial_fit(X, y=None, **partial_fit_params)

Fit the SelectFromModel meta-transformer only once.

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,), default=None

The target values (integers that correspond to classes in classification, real numbers in regression).

**partial_fit_params : dict

  • If enable_metadata_routing=False (default): Parameters directly passed to the partial_fit method of the sub-estimator.

  • If enable_metadata_routing=True: Parameters passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.

Changed in version 1.4: **partial_fit_params are routed to the sub-estimator, if enable_metadata_routing=True is set via set_config, which allows for aliasing.

See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
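The <component>__<parameter> convention amounts to splitting each key at its first double underscore. For SelectFromModel the wrapped model is the estimator parameter, so selector.set_params(estimator__C=0.5) would update C on a wrapped LogisticRegression. The sketch below only illustrates the key splitting; split_nested_params is a hypothetical helper, not part of scikit-learn.

```python
def split_nested_params(params):
    """Separate own parameters from nested '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # e.g. "estimator__C" -> component "estimator", sub_key "C"
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({"threshold": "median", "estimator__C": 0.5})
print(own)     # {'threshold': 'median'}
print(nested)  # {'estimator': {'C': 0.5}}
```

Splitting only at the first "__" leaves deeper paths such as "estimator__clf__C" for the nested component to resolve in turn.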

transform(X)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.
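transform keeps, for every sample, only the columns whose mask entry is True. A minimal pure-Python sketch (transform_sketch is a hypothetical name) that reproduces the transform(X) output from the Examples section, using the mask that get_support() returned there:

```python
def transform_sketch(X, mask):
    """Keep only the columns of X whose mask entry is True."""
    for row in X:
        if len(row) != len(mask):
            raise ValueError("each row must have one value per input feature")
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


X = [[0.87, -1.34, 0.31],
     [-2.79, -0.02, -0.85],
     [-1.34, -0.48, -2.55],
     [1.92, 1.48, 0.65]]
mask = [False, True, False]  # get_support() result from the Examples section
print(transform_sketch(X, mask))  # [[-1.34], [-0.02], [-0.48], [1.48]]
```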

Gallery examples

Model-based and sequential feature selection
