numpy.random.Generator.f

method

random.Generator.f(dfnum, dfden, size=None)

Draw samples from an F distribution.

Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters must be greater than zero.

The F distribution (also known as the Fisher distribution) is a continuous probability distribution that arises in ANOVA tests. An F variate is the ratio of two independent chi-square variates, each divided by its degrees of freedom.
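The chi-square relationship can be checked numerically. The sketch below is illustrative only: the seed, sample count, and variable names are assumptions, not part of the NumPy API. It builds F variates by hand from `rng.chisquare` draws and compares their mean with that of `rng.f` and with the theoretical mean dfden / (dfden - 2), which holds for dfden > 2.

```python
import numpy as np

rng = np.random.default_rng(12345)  # seed chosen only for reproducibility
dfnum, dfden, n = 5.0, 20.0, 100_000

# Build F variates by hand: each chi-square draw is divided by its own
# degrees of freedom before the ratio is taken.
chi2_num = rng.chisquare(dfnum, size=n)
chi2_den = rng.chisquare(dfden, size=n)
f_manual = (chi2_num / dfnum) / (chi2_den / dfden)

# Draw from the same distribution directly.
f_direct = rng.f(dfnum, dfden, size=n)

# For dfden > 2 the theoretical mean is dfden / (dfden - 2).
expected_mean = dfden / (dfden - 2.0)
print(f_manual.mean(), f_direct.mean(), expected_mean)
```

Both sample means should land close to the theoretical value, differing only by sampling noise.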

Parameters:
dfnum : float or array_like of floats

Degrees of freedom in numerator, must be > 0.

dfden : float or array_like of floats

Degrees of freedom in denominator, must be > 0.

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if dfnum and dfden are both scalars. Otherwise, np.broadcast(dfnum, dfden).size samples are drawn.

Returns:
out : ndarray or scalar

Drawn samples from the parameterized Fisher distribution.
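The interaction between size and array-valued parameters can be seen in a short sketch. The parameter values and variable names here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng()

# Both parameters scalar and size=None: a single value is returned.
scalar_draw = rng.f(2.0, 10.0)

# Explicit output shape: 2 * 3 samples are drawn.
sized_draw = rng.f(2.0, 10.0, size=(2, 3))

# Array-valued dfnum broadcast against a scalar dfden.
array_draw = rng.f([1.0, 2.0, 3.0], 10.0)

# A (2, 1) dfnum broadcast against a (3,) dfden gives a (2, 3) result.
grid_draw = rng.f([[1.0], [2.0]], [10.0, 20.0, 30.0])

print(np.ndim(scalar_draw), sized_draw.shape, array_draw.shape, grid_draw.shape)
```

When size is omitted, the output shape follows the broadcast shape of dfnum and dfden.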

See also

scipy.stats.f

Probability density function, cumulative distribution function, and related methods.

Notes

The F statistic is used to compare in-group variances to between-group variances. Calculating the distribution depends on the sampling, and so it is a function of the respective degrees of freedom in the problem. The variable dfnum is the between-groups degrees of freedom, the number of groups minus one, while dfden is the within-groups degrees of freedom, the sum of the number of samples in each group minus the number of groups.
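As a small worked case, the degrees of freedom can be computed from the group sizes. This sketch reads the between-groups degrees of freedom as the number of groups minus one, an interpretation consistent with the two-group example on this page (where dfnum is 1). The group sizes are taken from that example and the variable names are illustrative.

```python
group_sizes = [25, 25]  # observations per group (values from the example on this page)

n_groups = len(group_sizes)
dfnum = n_groups - 1                 # between-groups degrees of freedom
dfden = sum(group_sizes) - n_groups  # within-groups degrees of freedom
print(dfnum, dfden)
```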

References

[1]

Glantz, Stanton A. “Primer of Biostatistics”, McGraw-Hill, Fifth Edition, 2002.

[2]

Wikipedia, “F-distribution”, https://en.wikipedia.org/wiki/F-distribution

Examples

An example from Glantz [1], pp 47-40:

Two groups are compared: children of diabetics (25 people) and children of people without diabetes (25 controls). Fasting blood glucose was measured. The case group had a mean value of 86.1 and the controls had a mean value of 82.2. Standard deviations were 2.09 and 2.49, respectively. Are these data consistent with the null hypothesis that the parents' diabetic status does not affect their children's blood glucose levels? Calculating the F statistic from the data gives a value of 36.01.
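The quoted F statistic can be approximately reproduced from the summary statistics. The sketch below assumes a one-way ANOVA with equal group sizes, which is not stated in the text above, and its variable names are illustrative. Because the inputs are rounded, the result comes out near 36 rather than exactly 36.01.

```python
import numpy as np

n_per_group = 25
means = np.array([86.1, 82.2])   # case group, control group
stds = np.array([2.09, 2.49])
k = means.size                   # number of groups

# With equal group sizes, the overall mean is the mean of the group means.
grand_mean = means.mean()

# Between-groups mean square: variation of group means around the overall mean.
ms_between = n_per_group * np.sum((means - grand_mean) ** 2) / (k - 1)

# Within-groups mean square: pooled variance (simple average for equal sizes).
ms_within = np.mean(stds ** 2)

f_stat = ms_between / ms_within
print(f_stat)
```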

Draw samples from the distribution:

>>> dfnum = 1.  # between group degrees of freedom
>>> dfden = 48.  # within groups degrees of freedom
>>> rng = np.random.default_rng()
>>> s = rng.f(dfnum, dfden, 1000)

The lower bound for the top 1% of the samples is:

>>> np.sort(s)[-10]
7.61988120985 # random

So there is about a 1% chance that the F statistic will exceed 7.62. The measured value is 36, so the null hypothesis is rejected at the 1% level.
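The sampled threshold varies from run to run. As a cross-check, the exact 99th percentile and the tail probability of the measured value can be obtained from scipy.stats.f, assuming SciPy is installed. The variable names are illustrative.

```python
from scipy import stats

dfnum, dfden = 1.0, 48.0
f_measured = 36.01

# Exact 99th percentile of the F(1, 48) distribution.
critical_value = stats.f.ppf(0.99, dfnum, dfden)

# Probability of an F statistic at least as large as the measured one.
p_value = stats.f.sf(f_measured, dfnum, dfden)

print(critical_value, p_value)
```

An empirical estimate from 1000 samples fluctuates around the exact critical value, so the two numbers need not match closely.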

The corresponding probability density function for n = 20 and m = 20 (that is, dfnum = 20 and dfden = 20) is:

>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> dfnum, dfden, size = 20, 20, 10000
>>> s = rng.f(dfnum=dfnum, dfden=dfden, size=size)
>>> density, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.linspace(0, 5, 1000)
>>> plt.plot(x, stats.f.pdf(x, dfnum, dfden))
>>> plt.xlim([0, 5])
>>> plt.show()
[Figure: histogram of the samples with the F probability density function overlaid]