Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add kernel density estimation to the statistics module #115532

Closed
Labels
3.13bugs and security fixestype-featureA feature request or enhancement
@rhettinger

Description

@rhettinger

Feature

Proposal:

I propose promoting the KDE statistics recipe to be an actual part of the statistics module.
See:https://docs.python.org/3.13/library/statistics.html#kernel-density-estimation

My thought Is that it is an essential part of statistical reasoning about sample data
See:https://www.itm-conferences.org/articles/itmconf/pdf/2018/08/itmconf_sam2018_00037.pdf

Discussion of this feature:

This was discussed and reviewed with the module author who gave it his full support:

This looks really great, both as a concept and the code itself. It's a
good showcase for match, and an important statistics function.

Thank you for working on this.

Working Draft:

fromtypingimportSequence,Callablefrombisectimportbisect_left,bisect_rightfrommathimportsqrt,exp,cos,pi,coshdefkde(sample:Sequence[float],h:float,kernel_function:str='gauss',    )->Callable[[float],float]:"""Kernel Density Estimation.  Creates a continuous probability    density function from discrete samples.    The basic idea is to smooth the data using a kernel function    to help draw inferences about a population from a sample.    The degree of smoothing is controlled by the scaling parameter h    which is called the bandwidth.  Smaller values emphasize local    features while larger values give smoother results.    Kernel functions that give some weight to every sample point:       gauss or normal       logistic       sigmoid    Kernel functions that only give weight to sample points within    the bandwidth:       rectangular or uniform       triangular       epanechnikov or parabolic       biweight or quartic       triweight       cosine    Example    -------    Given a sample of six data points, estimate the    probablity density at evenly spaced points from -6 to 10:        >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]        >>> f_hat = kde(sample, h=1.5)        >>> total = 0.0        >>> for x in range(-6, 11):        ...     density = f_hat(x)        ...     total += density        ...     plot = ' ' * int(density * 400) + 'x'        ...     print(f'{x:2}: {density:.3f} {plot}')        ...        -6: 0.002 x        -5: 0.009    x        -4: 0.031             x        -3: 0.070                             x        -2: 0.111                                             x        -1: 0.125                                                   x         0: 0.110                                            x         1: 0.086                                   x         2: 0.068                            x         3: 0.059                        x         4: 0.066                           x         5: 0.082                                 x         6: 0.082                                 x         7: 0.058                        x         8: 0.028            x         9: 0.009    x        10: 0.002 x        >>> round(total, 3)        0.999    References    ----------    Kernel density estimation and its application:    https://www.itm-conferences.org/articles/itmconf/pdf/2018/08/itmconf_sam2018_00037.pdf    Kernel functions in common use:    https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use    Interactive graphical demonstration and exploration:    https://demonstrations.wolfram.com/KernelDensityEstimation/    """kernel:Callable[[float],float]support:float|Nonematchkernel_function:case'gauss'|'normal':c=1/sqrt(2*pi)kernel=lambdat:c*exp(-1/2*t*t)support=Nonecase'logistic':# 1.0 / (exp(t) + 2.0 + exp(-t))kernel=lambdat:1/2/ (1.0+cosh(t))support=Nonecase'sigmoid':# (2/pi) / (exp(t) + exp(-t))c=1/pikernel=lambdat:c/cosh(t)support=Nonecase'rectangular'|'uniform':kernel=lambdat:1/2support=1.0case'triangular':kernel=lambdat:1.0-abs(t)support=1.0case'epanechnikov'|'parabolic':kernel=lambdat:3/4* (1.0-t*t)support=1.0case'biweight'|'quartic':kernel=lambdat:15/16* (1.0-t*t)**2support=1.0case'triweight':kernel=lambdat:35/32* (1.0-t*t)**3support=1.0case'cosine':c1=pi/4c2=pi/2kernel=lambdat:c1*cos(c2*t)support=1.0case _:raiseValueError(f'Unknown kernel function:{kernel_function!r}')n=len(sample)ifsupportisNone:defpdf(x:float)->float:returnsum(kernel((x-x_i)/h)forx_iinsample)/ (n*h)else:sample=sorted(sample)bandwidth=h*supportdefpdf(x:float)->float:i=bisect_left(sample,x-bandwidth)j=bisect_right(sample,x+bandwidth)supported=sample[i :j]returnsum(kernel((x-x_i)/h)forx_iinsupported)/ (n*h)returnpdfdef_test()->None:fromstatisticsimportNormalDistdefkde_normal(sample:Sequence[float],h:float)->Callable[[float],float]:"Create a continuous probability density function from a sample."# Smooth the sample with a normal distribution kernel scaled by h.kernel_h=NormalDist(0.0,h).pdfn=len(sample)defpdf(x:float)->float:returnsum(kernel_h(x-x_i)forx_iinsample)/nreturnpdfD=NormalDist(250,50)data=D.samples(100)h=30pd=D.pdffg=kde(data,h,'gauss')fg2=kde_normal(data,h)fr=kde(data,h,'rectangular')ft=kde(data,h,'triangular')fb=kde(data,h,'biweight')fe=kde(data,h,'epanechnikov')fc=kde(data,h,'cosine')fl=kde(data,h,'logistic')fs=kde(data,h,'sigmoid')defshow(x:float)->None:forfuncin (pd,fg,fg2,fr,ft,fb,fe,fc,fl,fs):print(func(x))print()show(197)show(255)show(320)defintegrate(func:Callable[[float],float],low:float,high:float,steps:int=1000)->float:width= (high-low)/stepsxarr= (low+width*iforiinrange(steps))returnsum(map(func,xarr))*widthprint('\nIntegrals:')forfuncin (pd,fg,fg2,fr,ft,fb,fe,fc,fl,fs):print(integrate(func,0,500,steps=10_000))if__name__=='__main__':importdoctest_test()print(doctest.testmod())

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.13bugs and security fixestype-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions


      [8]ページ先頭

      ©2009-2025 Movatter.jp