
pandas.DataFrame.cov

DataFrame.cov(min_periods=None, ddof=1, numeric_only=False)

Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

Both NA and null values are automatically excluded from the calculation, pairwise: each entry uses only the rows in which both columns are non-missing. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations required for each entry of the result; column pairs with fewer observations than this threshold are returned as NaN.
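The pairwise exclusion and the threshold can be checked by hand on a tiny frame. This is a sketch only; the column names and values are made up for illustration. Each off-diagonal entry should match a covariance computed on just the rows where both columns are present, and each diagonal entry should match the variance of that column's non-missing values.

```python
import numpy as np
import pandas as pd

# Illustrative frame: each column is missing a value in a different row.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0],
    "y": [2.0, np.nan, 1.0, 3.0, 6.0],
})

cov = df.cov()

# Off-diagonal entry: only rows where both x and y are present (rows 0, 3, 4).
both = df.dropna()
manual_xy = both["x"].cov(both["y"])

# Diagonal entry: every non-missing value of x (rows 0, 1, 3, 4).
manual_xx = df["x"].dropna().var()

# min_periods=4: the (x, y) pair shares only 3 rows, so that entry is NaN,
# while x and y each still have 4 observations on their own.
strict = df.cov(min_periods=4)
print(cov)
print(strict)
```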

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

Parameters:
min_periods : int, optional

Minimum number of observations required per pair of columns to have a valid result.

ddof : int, default 1

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when no NaN is present in the DataFrame.

numeric_only : bool, default False

Include only float, int or boolean data.

Added in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.
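The ddof and numeric_only parameters can be exercised on small frames, as sketched below with illustrative data. The ddof part deliberately uses a frame with no missing values, since ddof only applies in that case; it rebuilds the matrix as the sum of cross-products of deviations divided by N - ddof. The numeric_only part shows that a string column is dropped while a bool column is kept; the exact exception raised by the default call is not pinned down here, so both TypeError and ValueError are caught.

```python
import numpy as np
import pandas as pd

# ddof: with no missing values, the divisor is N - ddof.
nums = pd.DataFrame({"a": [1.0, 2.0, 4.0, 7.0], "b": [2.0, 1.0, 0.0, 5.0]})
n = len(nums)
centered = nums - nums.mean()
cross = centered.T @ centered  # sums of cross-products of deviations
manual_ddof0 = cross / (n - 0)
manual_ddof2 = cross / (n - 2)

# numeric_only=True: the string column is dropped, the bool column is kept.
mixed = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "qty": [10, 8, 7, 3],
    "flag": [True, False, True, False],
    "label": ["w", "x", "y", "z"],
})
numeric_cov = mixed.cov(numeric_only=True)

# With the default numeric_only=False the string column cannot be
# converted to float, so the call fails instead of silently dropping it.
try:
    mixed.cov()
    raised = False
except (TypeError, ValueError):
    raised = True
```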

Returns:
DataFrame

The covariance matrix of the series of the DataFrame.
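A quick check of what the returned DataFrame looks like, sketched on random data with arbitrary column names: it is square and symmetric, labelled by the input columns on both axes, and its diagonal matches DataFrame.var with the same (default) ddof.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["p", "q", "r"])

cov = df.cov()

# Square, labelled by the original columns on both axes.
same_labels = cov.index.equals(df.columns) and cov.columns.equals(df.columns)
# Symmetric: cov(p, q) == cov(q, p).
symmetric = np.allclose(cov.to_numpy(), cov.to_numpy().T)
# Diagonal holds each column's variance (both default to ddof=1).
diag_is_var = np.allclose(np.diag(cov.to_numpy()), df.var().to_numpy())
```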

See also

Series.cov

Compute covariance with another Series.

core.window.ewm.ExponentialMovingWindow.cov

Exponential weighted sample covariance.

core.window.expanding.Expanding.cov

Expanding sample covariance.

core.window.rolling.Rolling.cov

Rolling sample covariance.
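To relate the matrix to two of the methods listed above, the sketch below uses random illustrative data. Series.cov computes a single entry of the matrix. Rolling.cov recomputes the same quantity over a moving window, so its last value should match the covariance of the final window's rows. The window length of 10 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(30, 2)), columns=["a", "b"])

matrix_entry = df.cov().loc["a", "b"]

# Series.cov gives the same pairwise quantity for two Series.
series_value = df["a"].cov(df["b"])

# Rolling.cov over a 10-row window; the last window covers the final 10 rows,
# and the first 9 positions are NaN because the window is not yet full.
rolling = df["a"].rolling(window=10).cov(df["b"])
last_window = df.iloc[-10:].cov().loc["a", "b"]
```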

Notes

Returns the covariance matrix of the DataFrame’s time series. The covariance is normalized by N - ddof.
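For two columns j and k, the normalization works out as shown below, where N is the number of rows used. When the DataFrame contains NaN, N counts only the rows in which both columns are present, the means are taken over those same rows, and, per the ddof description above, the divisor is N - 1 regardless of ddof.

```latex
\widehat{\operatorname{cov}}(x_j, x_k)
  = \frac{1}{N - \mathrm{ddof}}
    \sum_{i=1}^{N} \left(x_{ij} - \bar{x}_j\right)\left(x_{ik} - \bar{x}_k\right)
```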

For DataFrames that have Series that are missing data (assuming that data is missing at random), the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.

However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations with absolute values greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.
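The failure mode described above can be reproduced with deliberately contrived data. In the sketch below, each pair of columns is observed together only on its own block of three rows. The pairwise estimates are then mutually inconsistent: a and b look perfectly positively related, b and c too, but a and c negatively. The resulting matrix has a negative eigenvalue, and the correlation implied by its entries exceeds one.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Contrived data: (a, b) share rows 0-2, (b, c) rows 3-5, (a, c) rows 6-8.
df = pd.DataFrame({
    "a": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "b": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "c": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})

cov = df.cov()
eigenvalues = np.linalg.eigvalsh(cov.to_numpy())

# Correlation implied by the pairwise entries; a valid correlation is in [-1, 1].
implied_corr_ab = cov.loc["a", "b"] / np.sqrt(cov.loc["a", "a"] * cov.loc["b", "b"])
print(cov)
print(eigenvalues)
print(implied_corr_ab)
```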

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
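Why only the (a, b) entry is NaN in the output above can be seen by counting the rows each pair of columns shares. The sketch below rebuilds the same frame and forms that count matrix from the not-missing indicators. a and b overlap on only 10 of the 20 rows, below the threshold of 12; every other pair shares 15 or 20 rows.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c'])
df.loc[df.index[:5], 'a'] = np.nan
df.loc[df.index[5:10], 'b'] = np.nan

# Number of rows where both columns of each pair are non-missing.
present = df.notna().astype(int)
pair_counts = present.T @ present

# Pairs with fewer than 12 shared observations come back as NaN.
result = df.cov(min_periods=12)
too_few = pair_counts < 12
print(pair_counts)
```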
