numpy.random.Generator.standard_t#
method
- random.Generator.standard_t(df,size=None)#
Draw samples from a standard Student’s t distribution withdf degreesof freedom.
A special case of the hyperbolic distribution. Asdf getslarge, the result resembles that of the standard normaldistribution (
standard_normal).- Parameters:
- dffloat or array_like of floats
Degrees of freedom, must be > 0.
- sizeint or tuple of ints, optional
Output shape. If the given shape is, e.g.,
(m,n,k), thenm*n*ksamples are drawn. If size isNone(default),a single value is returned ifdfis a scalar. Otherwise,np.array(df).sizesamples are drawn.
- Returns:
- outndarray or scalar
Drawn samples from the parameterized standard Student’s t distribution.
Notes
The probability density function for the t distribution is
\[P(x, df) = \frac{\Gamma(\frac{df+1}{2})}{\sqrt{\pi df}\Gamma(\frac{df}{2})}\Bigl( 1+\frac{x^2}{df} \Bigr)^{-(df+1)/2}\]The t test is based on an assumption that the data come from aNormal distribution. The t test provides a way to test whetherthe sample mean (that is the mean calculated from the data) isa good estimate of the true mean.
The derivation of the t-distribution was first published in1908 by William Gosset while working for the Guinness Breweryin Dublin. Due to proprietary issues, he had to publish undera pseudonym, and so he used the name Student.
References
[1]Dalgaard, Peter, “Introductory Statistics With R”,Springer, 2002.
[2]Wikipedia, “Student’s t-distribution”https://en.wikipedia.org/wiki/Student’s_t-distribution
Examples
From Dalgaard page 83[1], suppose the daily energy intake for 11women in kilojoules (kJ) is:
>>>intake=np.array([5260.,5470,5640,6180,6390,6515,6805,7515, \...7515,8230,8770])
Does their energy intake deviate systematically from the recommendedvalue of 7725 kJ? Our null hypothesis will be the absence of deviation,and the alternate hypothesis will be the presence of an effect that could beeither positive or negative, hence making our test 2-tailed.
Because we are estimating the mean and we have N=11 values in our sample,we have N-1=10 degrees of freedom. We set our significance level to 95% andcompute the t statistic using the empirical mean and empirical standarddeviation of our intake. We use a ddof of 1 to base the computation of ourempirical standard deviation on an unbiased estimate of the variance (note:the final estimate is not unbiased due to the concave nature of the squareroot).
>>>np.mean(intake)6753.636363636364>>>intake.std(ddof=1)1142.1232221373727>>>t=(np.mean(intake)-7725)/(intake.std(ddof=1)/np.sqrt(len(intake)))>>>t-2.8207540608310198
We draw 1000000 samples from Student’s t distribution with the adequatedegrees of freedom.
>>>importmatplotlib.pyplotasplt>>>rng=np.random.default_rng()>>>s=rng.standard_t(10,size=1000000)>>>h=plt.hist(s,bins=100,density=True)
Does our t statistic land in one of the two critical regions found atboth tails of the distribution?
>>>np.sum(np.abs(t)<np.abs(s))/float(len(s))0.018318 #random < 0.05, statistic is in critical region
The probability value for this 2-tailed test is about 1.83%, which islower than the 5% pre-determined significance threshold.
Therefore, the probability of observing values as extreme as our intakeconditionally on the null hypothesis being true is too low, and we rejectthe null hypothesis of no deviation.
