numpy.random.Generator.f
method
- random.Generator.f(dfnum, dfden, size=None)
Draw samples from an F distribution.
Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters must be greater than zero.
The F distribution (also known as the Fisher distribution) is a continuous probability distribution that arises in ANOVA tests. Its random variate is the ratio of two chi-square variates, each divided by its degrees of freedom.
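The relationship to chi-square variates can be checked numerically. The following is a minimal sketch, not part of the original reference: the seed, sample count, and degrees of freedom are arbitrary choices made for illustration. It builds F variates by hand from `Generator.chisquare` and compares their mean with direct draws and with the theoretical mean, which is dfden / (dfden - 2) when dfden > 2.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility only
dfnum, dfden, n = 5.0, 40.0, 200_000  # illustrative values, not from the docs

# Build F variates by hand: each chi-square variate is divided by its own
# degrees of freedom before the ratio is taken.
ratio = (rng.chisquare(dfnum, n) / dfnum) / (rng.chisquare(dfden, n) / dfden)

# Draw directly from the F distribution for comparison.
direct = rng.f(dfnum, dfden, n)

# Theoretical mean of the F distribution, valid for dfden > 2.
expected_mean = dfden / (dfden - 2)
print(ratio.mean(), direct.mean(), expected_mean)
```

Both sample means should land close to the theoretical value, which suggests the two constructions describe the same distribution.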
- Parameters:
- dfnum : float or array_like of floats
Degrees of freedom in numerator, must be > 0.
- dfden : float or array_like of floats
Degrees of freedom in denominator, must be > 0.
- size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if dfnum and dfden are both scalars. Otherwise, np.broadcast(dfnum, dfden).size samples are drawn.
- Returns:
- out : ndarray or scalar
Drawn samples from the parameterized Fisher distribution.
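The shape rules for size can be illustrated with a short sketch. The parameter values below are arbitrary and chosen only to show the three cases: scalar parameters with no size, an explicit size tuple, and array parameters that are broadcast against each other.

```python
import numpy as np

rng = np.random.default_rng()

# Both parameters scalar and size=None: a single value is returned.
scalar_draw = rng.f(2.0, 10.0)

# Explicit shape: 2 * 3 * 4 = 24 samples are drawn.
grid_draw = rng.f(2.0, 10.0, size=(2, 3, 4))

# Array parameters with size=None are broadcast against each other.
dfnum = np.array([[1.0], [5.0]])      # shape (2, 1)
dfden = np.array([10.0, 20.0, 30.0])  # shape (3,)
broadcast_draw = rng.f(dfnum, dfden)  # shape (2, 3)

print(np.ndim(scalar_draw), grid_draw.shape, broadcast_draw.shape)
```

In the broadcast case each output element is drawn with its own pair of degrees of freedom, so a single call can sample from several F distributions at once.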
See also
scipy.stats.f : probability density function, distribution or cumulative distribution function, etc.
Notes
The F statistic is used to compare within-group variances to between-group variances. Calculating the distribution depends on the sampling, and so it is a function of the respective degrees of freedom in the problem. The variable dfnum is the between-groups degrees of freedom, the number of groups (samples) minus one, while dfden is the within-groups degrees of freedom, the sum of the number of observations in each group minus the number of groups.
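As a small illustration of this bookkeeping, the helper below computes both degrees of freedom from a list of group sizes. The function name `anova_dof` is hypothetical and is not part of NumPy; it only restates the rule described above.

```python
def anova_dof(group_sizes):
    """Return (dfnum, dfden) for a one-way comparison of the given groups.

    Hypothetical helper for illustration; not a NumPy function.
    """
    n_groups = len(group_sizes)
    dfnum = n_groups - 1                 # between-groups degrees of freedom
    dfden = sum(group_sizes) - n_groups  # within-groups degrees of freedom
    return dfnum, dfden


# Two groups of 25 observations each.
print(anova_dof([25, 25]))  # (1, 48)
```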
References
[1] Glantz, Stanton A. “Primer of Biostatistics.”, McGraw-Hill, Fifth Edition, 2002.
[2] Wikipedia, “F-distribution”, https://en.wikipedia.org/wiki/F-distribution
Examples
An example from Glantz [1], pp 47-40:
Two groups, children of diabetics (25 people) and children of people without diabetes (25 controls). Fasting blood glucose was measured; the case group had a mean value of 86.1 and the controls had a mean value of 82.2. Standard deviations were 2.09 and 2.49 respectively. Are these data consistent with the null hypothesis that the parents' diabetic status does not affect their children’s blood glucose levels? Calculating the F statistic from the data gives a value of 36.01.
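The F statistic can be approximately reproduced from the summary statistics alone. The sketch below assumes a standard one-way ANOVA computation (between-groups mean square divided by within-groups mean square); the variable names are illustrative. With the rounded means and standard deviations quoted above it gives a value of roughly 36, and the small gap from the quoted 36.01 is presumably due to rounding in those inputs.

```python
import numpy as np

n = np.array([25, 25])          # group sizes
means = np.array([86.1, 82.2])  # group means
sds = np.array([2.09, 2.49])    # group standard deviations

k = len(n)
grand_mean = np.sum(n * means) / np.sum(n)

# Between-groups mean square: spread of the group means around the grand mean.
ms_between = np.sum(n * (means - grand_mean) ** 2) / (k - 1)

# Within-groups mean square: pooled variance of the groups.
ms_within = np.sum((n - 1) * sds ** 2) / (np.sum(n) - k)

f_stat = ms_between / ms_within
print(round(f_stat, 2))
```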
Draw samples from the distribution:
>>> import numpy as np
>>> dfnum = 1.  # between group degrees of freedom
>>> dfden = 48.  # within groups degrees of freedom
>>> rng = np.random.default_rng()
>>> s = rng.f(dfnum, dfden, 1000)
The lower bound for the top 1% of the samples is:
>>> np.sort(s)[-10]
7.61988120985 # random
So there is about a 1% chance that the F statistic will exceed 7.62. The measured value is 36, so the null hypothesis is rejected at the 1% level.
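With only 1000 draws the estimated threshold varies noticeably from run to run. As a rough alternative sketch, the same threshold can be estimated with `np.quantile` on a larger sample, along with the fraction of draws that exceed the measured value. The seed and the sample size of 200,000 are arbitrary assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed, for reproducibility only
s = rng.f(1.0, 48.0, 200_000)      # dfnum = 1, dfden = 48, as in the example

threshold = np.quantile(s, 0.99)   # empirical 99th percentile of the draws
tail_fraction = np.mean(s > 36.0)  # share of draws above the measured value
print(threshold, tail_fraction)
```

A tail fraction at or near zero is consistent with rejecting the null hypothesis at the 1% level.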
The corresponding probability density function for dfnum = 20 and dfden = 20 is:
>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> dfnum, dfden, size = 20, 20, 10000
>>> s = rng.f(dfnum=dfnum, dfden=dfden, size=size)
>>> # plt.hist returns (bin heights, bin edges, patches)
>>> density, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.linspace(0, 5, 1000)
>>> plt.plot(x, stats.f.pdf(x, dfnum, dfden))
>>> plt.xlim([0, 5])
>>> plt.show()
