OAS

class sklearn.covariance.OAS(*, store_precision=True, assume_centered=False)

Oracle Approximating Shrinkage Estimator.

Read more in the User Guide.

Parameters:
store_precision : bool, default=True

Specify if the estimated precision is stored.

assume_centered : bool, default=False

If True, data will not be centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False (default), data will be centered before computation. A short sketch of this behavior follows.
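A minimal sketch of what assume_centered changes, using synthetic, approximately zero-mean data: when it is True, the estimator treats the data as already centered and fixes the estimated mean at zero.

import numpy as np
from sklearn.covariance import OAS

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 2))            # approximately zero-mean data
oas = OAS(assume_centered=True).fit(X)
print(oas.location_)                     # [0. 0.] -- mean is assumed, not estimated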

Attributes:
covariance_ : ndarray of shape (n_features, n_features)

Estimated covariance matrix.

location_ : ndarray of shape (n_features,)

Estimated location, i.e. the estimated mean.

precision_ : ndarray of shape (n_features, n_features)

Estimated pseudo-inverse matrix (stored only if store_precision is True).

shrinkage_ : float

Coefficient in the convex combination used for the computation of the shrunk estimate. Range is [0, 1].

n_features_in_ : int

Number of features seen during fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

See also

EllipticEnvelope

An object for detecting outliers in a Gaussian distributed dataset.

EmpiricalCovariance

Maximum likelihood covariance estimator.

GraphicalLasso

Sparse inverse covariance estimation with an l1-penalized estimator.

GraphicalLassoCV

Sparse inverse covariance with cross-validated choice of the l1 penalty.

LedoitWolf

LedoitWolf Estimator.

MinCovDet

Minimum Covariance Determinant (robust estimator of covariance).

ShrunkCovariance

Covariance estimator with shrinkage.

Notes

The regularised covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features),

where mu = trace(cov) / n_features and shrinkage is given by the OAS formula (see [1]).

The shrinkage formulation implemented here differs from Eq. 23 in [1]. In the original article, formula (23) states that 2/p (p being the number of features) is multiplied by Trace(cov*cov) in both the numerator and denominator, but this operation is omitted because for large p, the value of 2/p is so small that it doesn't affect the value of the estimator.
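As a sanity check on the formula above, the shrunk covariance can be rebuilt by hand from the empirical covariance and the fitted shrinkage_ coefficient. A minimal sketch, using sklearn.covariance.empirical_covariance and synthetic data:

import numpy as np
from sklearn.covariance import OAS, empirical_covariance

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[.8, .3], [.3, .4]], size=500)

oas = OAS().fit(X)
cov = empirical_covariance(X)                 # maximum-likelihood covariance of X
mu = np.trace(cov) / cov.shape[0]             # mu = trace(cov) / n_features
shrunk = (1 - oas.shrinkage_) * cov + oas.shrinkage_ * mu * np.identity(2)
print(np.allclose(shrunk, oas.covariance_))   # True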

References

[1] Chen, Y., Wiesel, A., Eldar, Y. C., and Hero, A. O., "Shrinkage algorithms for MMSE covariance estimation", IEEE Transactions on Signal Processing, 58(10), 5016-5029, 2010.

Examples

>>> import numpy as np
>>> from sklearn.covariance import OAS
>>> from sklearn.datasets import make_gaussian_quantiles
>>> real_cov = np.array([[.8, .3],
...                      [.3, .4]])
>>> rng = np.random.RandomState(0)
>>> X = rng.multivariate_normal(mean=[0, 0],
...                             cov=real_cov,
...                             size=500)
>>> oas = OAS().fit(X)
>>> oas.covariance_
array([[0.7533, 0.2763],
       [0.2763, 0.3964]])
>>> oas.precision_
array([[ 1.7833, -1.2431],
       [-1.2431,  3.3889]])
>>> oas.shrinkage_
np.float64(0.0195)

See also Shrinkage covariance estimation: LedoitWolf vs OAS and max-likelihood and Ledoit-Wolf vs OAS estimation for more detailed examples.

error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)

Compute the Mean Squared Error between two covariance estimators.

Parameters:
comp_cov : array-like of shape (n_features, n_features)

The covariance to compare with.

norm : {'frobenius', 'spectral'}, default='frobenius'

The type of norm used to compute the error. Available error types:

- 'frobenius' (default): sqrt(tr(A^t.A))
- 'spectral': sqrt(max(eigenvalues(A^t.A)))

where A is the error (comp_cov - self.covariance_).

scaling : bool, default=True

If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.

squared : bool, default=True

Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:
result : float

The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov covariance estimators.
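A short usage sketch, assuming the true covariance is known (synthetic data), showing both the default Frobenius MSE and the spectral variant:

import numpy as np
from sklearn.covariance import OAS

real_cov = np.array([[.8, .3], [.3, .4]])
rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=real_cov, size=500)
oas = OAS().fit(X)

print(oas.error_norm(real_cov))  # scaled, squared Frobenius error (default)
print(oas.error_norm(real_cov, norm='spectral', scaling=False, squared=False))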

fit(X, y=None)

Fit the Oracle Approximating Shrinkage covariance model to X.

Parameters:
X : array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored

Not used, present for API consistency by convention.

Returns:
self : object

Returns the instance itself.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

get_precision()

Getter for the precision matrix.

Returns:
precision_ : array-like of shape (n_features, n_features)

The precision matrix associated with the current covariance object.
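A brief sketch of why this getter exists, assuming the behavior described under store_precision above: with store_precision=False the precision_ attribute is not kept, but get_precision() still derives the pseudo-inverse from covariance_ on demand.

import numpy as np
from sklearn.covariance import OAS

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 2))
oas = OAS(store_precision=False).fit(X)

print(oas.precision_ is None)  # True: the matrix was not stored
print(np.allclose(oas.get_precision(), np.linalg.pinv(oas.covariance_)))  # True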

mahalanobis(X)

Compute the squared Mahalanobis distances of given observations.

For a detailed example of how outliers affect the Mahalanobis distance, see Robust covariance estimation and Mahalanobis distances relevance.

Parameters:
X : array-like of shape (n_samples, n_features)

The observations whose squared Mahalanobis distances we compute. Observations are assumed to be drawn from the same distribution as the data used in fit.

Returns:
dist : ndarray of shape (n_samples,)

Squared Mahalanobis distances of the observations.
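A minimal usage sketch on synthetic data; the returned values are squared distances, so they are non-negative and grow for points far from location_ under the fitted covariance:

import numpy as np
from sklearn.covariance import OAS

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[.8, .3], [.3, .4]], size=500)
oas = OAS().fit(X)

d2 = oas.mahalanobis(X)   # squared Mahalanobis distance of each row of X
print(d2.shape)           # (500,)
print(d2.min() >= 0)      # True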

score(X_test, y=None)

Compute the log-likelihood of X_test under the estimated Gaussian model.

The Gaussian model is defined by its mean and covariance matrix, which are represented respectively by self.location_ and self.covariance_.

Parameters:
X_test : array-like of shape (n_samples, n_features)

Test data of which we compute the likelihood, where n_samples is the number of samples and n_features is the number of features. X_test is assumed to be drawn from the same distribution as the data used in fit (including centering).

y : Ignored

Not used, present for API consistency by convention.

Returns:
res : float

The log-likelihood of X_test, with self.location_ and self.covariance_ as estimators of the Gaussian model's mean and covariance matrix respectively.
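A short sketch, assuming held-out data drawn from the same synthetic distribution as the training data:

import numpy as np
from sklearn.covariance import OAS

real_cov = np.array([[.8, .3], [.3, .4]])
rng = np.random.RandomState(0)
X_train = rng.multivariate_normal(mean=[0, 0], cov=real_cov, size=500)
X_test = rng.multivariate_normal(mean=[0, 0], cov=real_cov, size=100)

oas = OAS().fit(X_train)
print(oas.score(X_test))  # Gaussian log-likelihood of the held-out samples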

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
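A minimal sketch of the <component>__<parameter> syntax, using a hypothetical two-step Pipeline whose second step is named 'oas':

from sklearn.covariance import OAS
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scale', StandardScaler()), ('oas', OAS())])
pipe.set_params(oas__assume_centered=True)        # reaches the nested OAS step
print(pipe.get_params()['oas__assume_centered'])  # True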

Gallery examples

Normal, Ledoit-Wolf and OAS Linear Discriminant Analysis for classification

Shrinkage covariance estimation: LedoitWolf vs OAS and max-likelihood

Ledoit-Wolf vs OAS estimation