robust_scale

sklearn.preprocessing.robust_scale(X, *, axis=0, with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True, unit_variance=False)

Standardize a dataset along any axis.

Center to the median and scale component-wise according to the interquartile range (IQR).
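For the default settings, this amounts to subtracting each column's median and dividing by that column's 75th minus 25th percentile. The following is a minimal sketch, assuming NumPy and a recent scikit-learn. The toy data, which has an outlier in the first column, is hypothetical. The sketch compares `robust_scale` with that hand-written formula:

```python
import numpy as np
from sklearn.preprocessing import robust_scale

# Hypothetical toy data; 100.0 is an outlier in the first column.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [100.0, 40.0]])

# Manual version: subtract the per-column median, divide by the per-column IQR.
median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25.0, 75.0], axis=0)
manual = (X - median) / (q75 - q25)

result = robust_scale(X)
print(np.allclose(result, manual))  # True
```

The outlier has no effect on the median or the IQR of the first column, so the other three values are scaled as if it were absent.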

Read more in the User Guide.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The data to center and scale.

axis : int, default=0

Axis along which the medians and IQR are computed. If 0, independently scale each feature; otherwise (if 1), scale each sample.

with_centering : bool, default=True

If True, center the data before scaling.

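with_scaling : bool, default=True

If True, scale the data by dividing by the width of quantile_range (the IQR by default), so that this range has unit width in the output. This does not by itself give unit variance; see unit_variance.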

quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0, default=(25.0, 75.0)

Quantile range used to calculate the scale (scale_ in RobustScaler). By default this is equal to the IQR, i.e., q_min is the first quartile and q_max is the third quartile.

Added in version 0.18.

copy : bool, default=True

If False, try to avoid a copy and scale in place. This is not guaranteed to always work in place; e.g., if the data is a NumPy array with an int dtype, a copy will be returned even with copy=False.

unit_variance : bool, default=False

If True, scale data so that normally distributed features have a variance of 1. In general, if the difference between the x-values of q_max and q_min for a standard normal distribution is greater than 1, the dataset will be scaled down. If less than 1, the dataset will be scaled up.

Added in version 0.24.
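The sketch below checks several of the parameters described above against plain NumPy expressions. It is a minimal example that assumes NumPy, SciPy (a scikit-learn dependency) and a recent scikit-learn. The 5 by 3 array is hypothetical.

For `unit_variance`, the code assumes the adjustment factor is the width of the same quantile range on a standard normal distribution, computed with `scipy.stats.norm.ppf`. For (25, 75) this factor is about 1.349, which is greater than 1, so the data is scaled down:

```python
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import robust_scale

# Hypothetical toy data: 5 samples, 3 features.
X = np.array([[1.0, -2.0, 10.0],
              [2.0, 0.0, 20.0],
              [4.0, 1.0, 30.0],
              [8.0, 3.0, 40.0],
              [16.0, 7.0, 90.0]])
median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25.0, 75.0], axis=0)

# axis=1 computes statistics per row: same as scaling the transpose per column.
by_sample = robust_scale(X, axis=1)
print(np.allclose(by_sample, robust_scale(X.T, axis=0).T))  # True

# with_scaling=False: subtract the median only.
centered = robust_scale(X, with_scaling=False)
print(np.allclose(centered, X - median))  # True

# with_centering=False: divide by the IQR only.
scaled = robust_scale(X, with_centering=False)
print(np.allclose(scaled, X / (q75 - q25)))  # True

# quantile_range changes which percentiles define the scale (centering is unchanged).
q10, q90 = np.percentile(X, [10.0, 90.0], axis=0)
wide = robust_scale(X, quantile_range=(10.0, 90.0))
print(np.allclose(wide, (X - median) / (q90 - q10)))  # True

# unit_variance=True: divide the scale by the standard-normal width of (25, 75).
adjust = norm.ppf(0.75) - norm.ppf(0.25)  # about 1.349
unit = robust_scale(X, unit_variance=True)
print(np.allclose(unit, robust_scale(X) * adjust))  # True
```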

Returns:
X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features)

The transformed data.
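The copy parameter determines whether the returned X_tr is a new array. In the case below, integer input is first converted to a floating dtype, so a new float64 array comes back even with copy=False, and the caller's array is left unchanged. This is a minimal sketch that assumes NumPy and a recent scikit-learn. It makes no assertion about in-place behavior for float input, because the documentation above does not guarantee it.

```python
import numpy as np
from sklearn.preprocessing import robust_scale

# Hypothetical integer input.
X_int = np.array([[1, 2], [3, 4], [5, 8]])

# copy=False cannot be honored here: the int array must be converted to float.
X_tr = robust_scale(X_int, copy=False)
print(X_tr.dtype)                     # float64
print(np.shares_memory(X_tr, X_int))  # False
print(X_int)                          # still the original integers
```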

See also

RobustScaler

Performs centering and scaling using the Transformer API (e.g. as part of a preprocessing Pipeline).
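Both robust_scale and RobustScaler compute the same statistics. The difference is that RobustScaler keeps them in its center_ and scale_ attributes, so the same transformation can be applied to data that arrives later. The following minimal sketch assumes NumPy and a recent scikit-learn, and the arrays in it are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, robust_scale

X_train = np.array([[-2.0, 1.0, 2.0],
                    [-1.0, 0.0, 1.0],
                    [0.0, 5.0, 9.0]])
X_new = np.array([[0.5, 2.0, 3.0]])

# One-shot function: statistics are computed, used, and discarded.
functional = robust_scale(X_train)

# Estimator: medians and IQRs are stored after fit.
scaler = RobustScaler().fit(X_train)
print(np.allclose(functional, scaler.transform(X_train)))       # True
print(np.allclose(scaler.center_, np.median(X_train, axis=0)))  # True

# The stored training statistics are reused on new data.
X_new_tr = scaler.transform(X_new)
```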

Notes

This implementation will refuse to center scipy.sparse matrices since it would make them non-sparse and would potentially crash the program with memory exhaustion problems.

Instead, the caller is expected to either explicitly set with_centering=False (in which case only scaling will be performed on the features of the CSR matrix) or to call X.toarray() if they expect the materialized dense array to fit in memory.

To avoid a memory copy, the caller should pass a CSR matrix.
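The following minimal sketch shows the two options described above. It assumes NumPy, SciPy and a recent scikit-learn, and uses a small hypothetical CSR matrix. The code expects the default call to raise a ValueError because centering is refused. Scaling alone keeps the result sparse, and densifying first is the alternative when memory allows:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import robust_scale

# Hypothetical sparse input.
X = sparse.csr_matrix(np.array([[0.0, 1.0],
                                [2.0, 0.0],
                                [0.0, 3.0],
                                [4.0, 0.0]]))

# Default with_centering=True: centering a sparse matrix is refused.
refused = False
try:
    robust_scale(X)
except ValueError as exc:
    refused = True
    print("refused:", exc)

# Option 1: scale only; the result stays sparse.
X_scaled = robust_scale(X, with_centering=False)
print(sparse.issparse(X_scaled))  # True

# Option 2: densify first, only if the dense array fits in memory.
X_dense = robust_scale(X.toarray())
```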

For a comparison of the different scalers, transformers, and normalizers, see: Compare the effect of different scalers on data with outliers.

Warning

Risk of data leak

Do not use robust_scale unless you know what you are doing. A common mistake is to apply it to the entire data before splitting into training and test sets. This will bias the model evaluation because information would have leaked from the test set to the training set. In general, we recommend using RobustScaler within a Pipeline in order to prevent most risks of data leaking: pipe = make_pipeline(RobustScaler(), LogisticRegression()).
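The recommended pattern is shown below with a train/test split. This minimal sketch assumes a recent scikit-learn and uses a synthetic dataset from make_classification in place of real data. The scaler inside the pipeline learns its medians from X_train only, and the test data is transformed with those training statistics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in data.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Split first, then fit: the scaler only ever sees the training set.
pipe = make_pipeline(RobustScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

# The fitted scaler's medians come from X_train, not from all of X.
print(np.allclose(pipe[0].center_, np.median(X_train, axis=0)))  # True
```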

Examples

>>> from sklearn.preprocessing import robust_scale
>>> X = [[-2, 1, 2], [-1, 0, 1]]
>>> robust_scale(X, axis=0)  # scale each column independently
array([[-1.,  1.,  1.],
       [ 1., -1., -1.]])
>>> robust_scale(X, axis=1)  # scale each row independently
array([[-1.5,  0. ,  0.5],
       [-1. ,  0. ,  1. ]])