EllipticEnvelope

class sklearn.covariance.EllipticEnvelope(*, store_precision=True, assume_centered=False, support_fraction=None, contamination=0.1, random_state=None)

An object for detecting outliers in a Gaussian distributed dataset.

Read more in the User Guide.

Parameters:
store_precision : bool, default=True

Specify if the estimated precision is stored.

assume_centered : bool, default=False

If True, the support of the robust location and covariance estimates is computed, and a covariance estimate is recomputed from it, without centering the data. Useful for data whose mean is close to, but not exactly, zero. If False, the robust location and covariance are computed directly with the FastMCD algorithm, without additional treatment.

support_fraction : float, default=None

The proportion of points to be included in the support of the raw MCD estimate. If None, the minimum value of support_fraction will be used within the algorithm: (n_samples + n_features + 1) / (2 * n_samples). Range is (0, 1).

contamination : float, default=0.1

The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Range is (0, 0.5].

random_state : int, RandomState instance or None, default=None

Determines the pseudo random number generator for shuffling the data. Pass an int for reproducible results across multiple function calls. See Glossary.
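As a minimal sketch of the random_state behaviour, the snippet below fits two estimators with the same integer seed on the same data and checks that the estimates coincide. The synthetic data and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Illustrative synthetic data (an assumption of this sketch).
rng = np.random.RandomState(42)
X = rng.multivariate_normal(mean=[0, 0], cov=[[0.8, 0.3], [0.3, 0.4]], size=300)

# Both estimators receive the same integer seed.
first = EllipticEnvelope(random_state=0).fit(X)
second = EllipticEnvelope(random_state=0).fit(X)

# Compare the fitted robust estimates of the two runs.
same_covariance = np.allclose(first.covariance_, second.covariance_)
same_location = np.allclose(first.location_, second.location_)
print(same_covariance, same_location)
```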

Attributes:
location_ : ndarray of shape (n_features,)

Estimated robust location.

covariance_ : ndarray of shape (n_features, n_features)

Estimated robust covariance matrix.

precision_ : ndarray of shape (n_features, n_features)

Estimated pseudo-inverse matrix (stored only if store_precision is True).

support_ : ndarray of shape (n_samples,)

A mask of the observations that have been used to compute the robust estimates of location and shape.

offset_ : float

Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. The offset depends on the contamination parameter and is defined in such a way that we obtain the expected number of outliers (samples with decision function < 0) in training.

Added in version 0.20.

raw_location_ : ndarray of shape (n_features,)

The raw robust estimated location before correction and re-weighting.

raw_covariance_ : ndarray of shape (n_features, n_features)

The raw robust estimated covariance before correction and re-weighting.

raw_support_ : ndarray of shape (n_samples,)

A mask of the observations that have been used to compute the raw robust estimates of location and shape, before correction and re-weighting.

dist_ : ndarray of shape (n_samples,)

Mahalanobis distances of the training set (on which fit is called) observations.

n_features_in_ : int

Number of features seen during fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.
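The following sketch inspects some of the fitted attributes listed above. It assumes synthetic two-dimensional data and an explicit support_fraction of 0.75, both chosen only for illustration. The size of the raw support is expected to be roughly support_fraction * n_samples, while the re-weighted support may differ.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Illustrative synthetic data (an assumption of this sketch).
rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[0.8, 0.3], [0.3, 0.4]], size=300)
n_samples, n_features = X.shape

est = EllipticEnvelope(support_fraction=0.75, random_state=0).fit(X)

# Shapes of the robust estimates and of the boolean support masks.
print(est.location_.shape, est.covariance_.shape)
print(est.raw_support_.shape, est.support_.shape)

# Number of observations in the raw support and in the re-weighted support.
n_raw = int(est.raw_support_.sum())
n_reweighted = int(est.support_.sum())
print(n_raw, n_reweighted)
```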

See also

EmpiricalCovariance

Maximum likelihood covariance estimator.

GraphicalLasso

Sparse inverse covariance estimation with an l1-penalized estimator.

LedoitWolf

LedoitWolf Estimator.

MinCovDet

Minimum Covariance Determinant (robust estimator of covariance).

OAS

Oracle Approximating Shrinkage Estimator.

ShrunkCovariance

Covariance estimator with shrinkage.

Notes

Outlier detection from covariance estimation may break or not perform well in high-dimensional settings. In particular, always take care to work with n_samples > n_features ** 2.
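The rule of thumb above can be checked before fitting. The helper below, has_enough_samples, is a hypothetical name introduced for this sketch and is not part of scikit-learn.

```python
import numpy as np


def has_enough_samples(X):
    """Return True if X satisfies n_samples > n_features ** 2 (hypothetical helper)."""
    n_samples, n_features = np.asarray(X).shape
    return n_samples > n_features ** 2


# 500 samples with 2 features: 500 > 4.
low_dimensional_ok = has_enough_samples(np.zeros((500, 2)))
# 50 samples with 10 features: 50 > 100 does not hold.
high_dimensional_ok = has_enough_samples(np.zeros((50, 10)))
print(low_dimensional_ok, high_dimensional_ok)
```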

References

[1]

Rousseeuw, P.J., Van Driessen, K. “A fast algorithm for the minimum covariance determinant estimator” Technometrics 41(3), 212 (1999)

Examples

>>> import numpy as np
>>> from sklearn.covariance import EllipticEnvelope
>>> true_cov = np.array([[.8, .3],
...                      [.3, .4]])
>>> X = np.random.RandomState(0).multivariate_normal(mean=[0, 0],
...                                                  cov=true_cov,
...                                                  size=500)
>>> cov = EllipticEnvelope(random_state=0).fit(X)
>>> # predict returns 1 for an inlier and -1 for an outlier
>>> cov.predict([[0, 0],
...              [3, 3]])
array([ 1, -1])
>>> cov.covariance_
array([[0.7411, 0.2535],
       [0.2535, 0.3053]])
>>> cov.location_
array([0.0813 , 0.0427])

correct_covariance(data)

Apply a correction to raw Minimum Covariance Determinant estimates.

Correction using the empirical correction factor suggested by Rousseeuw and Van Driessen in [RVD].

Parameters:
data : array-like of shape (n_samples, n_features)

The data matrix, with p features and n samples. The data set must be the one which was used to compute the raw estimates.

Returns:
covariance_corrected : ndarray of shape (n_features, n_features)

Corrected robust covariance estimate.

References

[RVD]

A Fast Algorithm for the Minimum Covariance Determinant Estimator, 1999, American Statistical Association and the American Society for Quality, TECHNOMETRICS

decision_function(X)

Compute the decision function of the given observations.

Parameters:
X : array-like of shape (n_samples, n_features)

The data matrix.

Returns:
decision : ndarray of shape (n_samples,)

Decision function of the samples. It is equal to the shifted Mahalanobis distances. The threshold for being an outlier is 0, which ensures compatibility with other outlier detection algorithms.
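The relation decision_function = score_samples - offset_ and the threshold at 0 can be checked with a short sketch. It reuses the synthetic data of the Examples section; the query points in X_new are the same two points used there.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[0.8, 0.3], [0.3, 0.4]], size=500)
est = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)

X_new = np.array([[0.0, 0.0], [3.0, 3.0]])
decision = est.decision_function(X_new)

# The decision function is the raw score shifted by the fitted offset.
shifted = est.score_samples(X_new) - est.offset_
print(np.allclose(decision, shifted))

# Negative decision values correspond to the outlier label -1.
labels = est.predict(X_new)
labels_from_decision = np.where(decision < 0, -1, 1)
print(labels, labels_from_decision)
```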

error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)

Compute the Mean Squared Error between two covariance estimators.

Parameters:
comp_cov : array-like of shape (n_features, n_features)

The covariance to compare with.

norm : {"frobenius", "spectral"}, default="frobenius"

The type of norm used to compute the error. Available error types:
- 'frobenius' (default): sqrt(tr(A^t.A))
- 'spectral': sqrt(max(eigenvalues(A^t.A)))
where A is the error (comp_cov - self.covariance_).

scaling : bool, default=True

If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.

squared : bool, default=True

Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:
result : float

The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov covariance estimators.
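As a sketch of what the default settings compute, the snippet below compares error_norm against a manual NumPy calculation: the squared Frobenius norm of the error, divided by n_features. The data and the reference matrix true_cov are taken from the Examples section; the other variable names are illustrative.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3], [0.3, 0.4]])
rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
est = EllipticEnvelope(random_state=0).fit(X)

# Defaults: norm="frobenius", scaling=True, squared=True.
reported = est.error_norm(true_cov)

# Manual computation: squared Frobenius norm of A, divided by n_features.
error = true_cov - est.covariance_
manual = np.sum(error ** 2) / error.shape[0]

# With squared=False, the square root of the scaled squared norm is returned.
reported_root = est.error_norm(true_cov, squared=False)

print(np.isclose(reported, manual))
print(np.isclose(reported_root, np.sqrt(manual)))
```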

fit(X, y=None)

Fit the EllipticEnvelope model.

Parameters:
X : array-like of shape (n_samples, n_features)

Training data.

y : Ignored

Not used, present for API consistency by convention.

Returns:
self : object

Returns the instance itself.

fit_predict(X, y=None, **kwargs)

Perform fit on X and return labels for X.

Returns -1 for outliers and 1 for inliers.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

**kwargs : dict

Arguments to be passed to fit.

Added in version 1.4.

Returns:
y : ndarray of shape (n_samples,)

1 for inliers, -1 for outliers.
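A minimal sketch of fit_predict on synthetic data (an assumption of this example): because offset_ is derived from the contamination parameter, the fraction of training samples labelled -1 is expected to be close to contamination.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[0.8, 0.3], [0.3, 0.4]], size=500)

# Fit and label the training data in one call.
labels = EllipticEnvelope(contamination=0.1, random_state=0).fit_predict(X)

# Labels are either 1 (inlier) or -1 (outlier).
distinct_labels = set(labels.tolist())
outlier_fraction = np.mean(labels == -1)
print(distinct_labels, outlier_fraction)
```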

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

get_precision()

Getter for the precision matrix.

Returns:
precision_ : array-like of shape (n_features, n_features)

The precision matrix associated to the current covariance object.
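Since the precision matrix is the (pseudo-)inverse of the covariance, their product should be close to the identity when the covariance is well conditioned. The sketch below checks this on synthetic data, which is an assumption of the example.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[0.8, 0.3], [0.3, 0.4]], size=500)
est = EllipticEnvelope(random_state=0).fit(X)

precision = est.get_precision()

# For a well-conditioned covariance, precision @ covariance is close to I.
product = precision @ est.covariance_
print(np.allclose(product, np.eye(X.shape[1])))
```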

mahalanobis(X)

Compute the squared Mahalanobis distances of given observations.

For a detailed example of how outliers affect the Mahalanobis distance, see Robust covariance estimation and Mahalanobis distances relevance.

Parameters:
X : array-like of shape (n_samples, n_features)

The observations whose Mahalanobis distances we compute. Observations are assumed to be drawn from the same distribution as the data used in fit.

Returns:
dist : ndarray of shape (n_samples,)

Squared Mahalanobis distances of the observations.
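The squared Mahalanobis distance of an observation x can be written as (x - location_)^T P (x - location_), where P is the precision matrix. The sketch below compares this manual computation with the method output; the data and the query points in X_new are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[0.8, 0.3], [0.3, 0.4]], size=500)
est = EllipticEnvelope(random_state=0).fit(X)

X_new = np.array([[0.0, 0.0], [3.0, 3.0], [1.0, -1.0]])

# Manual squared distances: one quadratic form per row of X_new.
diff = X_new - est.location_
manual = np.einsum("ij,jk,ik->i", diff, est.get_precision(), diff)

reported = est.mahalanobis(X_new)
print(np.allclose(manual, reported))

# score_samples returns the opposite of these squared distances.
print(np.allclose(est.score_samples(X_new), -reported))
```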

predict(X)

Predict labels (1 inlier, -1 outlier) of X according to fitted model.

Parameters:
X : array-like of shape (n_samples, n_features)

The data matrix.

Returns:
is_inlier : ndarray of shape (n_samples,)

Returns -1 for anomalies/outliers and +1 for inliers.

reweight_covariance(data)

Re-weight raw Minimum Covariance Determinant estimates.

Re-weight observations using Rousseeuw’s method (equivalent to deleting outlying observations from the data set before computing location and covariance estimates) described in [RVDriessen].

Parameters:
data : array-like of shape (n_samples, n_features)

The data matrix, with p features and n samples. The data set must be the one which was used to compute the raw estimates.

Returns:
location_reweighted : ndarray of shape (n_features,)

Re-weighted robust location estimate.

covariance_reweighted : ndarray of shape (n_features, n_features)

Re-weighted robust covariance estimate.

support_reweighted : ndarray of shape (n_samples,), dtype=bool

A mask of the observations that have been used to compute the re-weighted robust location and covariance estimates.

References

[RVDriessen]

A Fast Algorithm for the Minimum Covariance Determinant Estimator, 1999, American Statistical Association and the American Society for Quality, TECHNOMETRICS

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
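For an outlier detector, the labels passed to score use the same convention as predict (1 for inliers, -1 for outliers). The sketch below compares score with a manual accuracy computation; the test points and the labels in y_true are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[0.8, 0.3], [0.3, 0.4]], size=500)
est = EllipticEnvelope(random_state=0).fit(X)

# Hypothetical test points and ground-truth labels (1 inlier, -1 outlier).
X_test = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0]])
y_true = np.array([1, 1, -1])

accuracy = est.score(X_test, y_true)

# Manual mean accuracy of the predicted labels.
manual = np.mean(est.predict(X_test) == y_true)
print(np.isclose(accuracy, manual))
```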

score_samples(X)

Compute the negative Mahalanobis distances.

Parameters:
X : array-like of shape (n_samples, n_features)

The data matrix.

Returns:
negative_mahal_distances : array-like of shape (n_samples,)

Opposite of the Mahalanobis distances.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
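The sketch below illustrates both forms: setting a parameter directly on the estimator, and setting it through a Pipeline with the <component>__<parameter> syntax. The step names "scale" and "detector" are arbitrary choices made for this example.

```python
from sklearn.covariance import EllipticEnvelope
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Direct use on a simple estimator; set_params returns the estimator itself.
est = EllipticEnvelope().set_params(contamination=0.2, random_state=0)
direct_value = est.get_params()["contamination"]

# Nested use: "detector" is the (arbitrary) name of the pipeline step.
pipe = Pipeline([("scale", StandardScaler()), ("detector", EllipticEnvelope())])
pipe.set_params(detector__contamination=0.05)
nested_value = pipe.named_steps["detector"].contamination

print(direct_value, nested_value)
```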

Gallery examples

Outlier detection on a real data set

Comparing anomaly detection algorithms for outlier detection on toy datasets