In this vignette, we’ll walk through conducting \(t\)-tests and their randomization-based analogues using infer. We’ll start out with a 1-sample \(t\)-test, which compares a sample mean to a hypothesized true mean value. Then, we’ll discuss 2-sample \(t\)-tests, testing the difference in means of two populations using a sample of data drawn from them. If you’re interested in evaluating whether differences in paired values (e.g. some measure taken of a person before and after an experiment) differ from 0, see vignette("paired", package = "infer").
Throughout this vignette, we’ll make use of the gss dataset supplied by infer, which contains a sample of data from the General Social Survey. See ?gss for more information on the variables included and their source. Note that this data (and our examples on it) are for demonstration purposes only, and will not necessarily provide accurate estimates unless weighted properly. For these examples, let’s suppose that this dataset is a representative sample of a population we want to learn about: American adults. The data looks like this:
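The preview that follows is in the format of dplyr::glimpse() output; a call along the lines of the sketch below, assuming infer is loaded so that gss is available, would produce it:

library(infer)

# peek at the structure of the gss sample (sketch)
dplyr::glimpse(gss)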
## Rows: 500
## Columns: 11
## $ year    <dbl> 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 1998, 20…
## $ age     <dbl> 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, 25, 56…
## $ sex     <fct> male, female, male, male, male, female, female, female, female…
## $ college <fct> degree, no degree, degree, no degree, degree, no degree, no de…
## $ partyid <fct> ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, ind, de…
## $ hompop  <dbl> 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1, 2, 5,…
## $ hours   <dbl> 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, 48, 40…
## $ income  <ord> $25000 or more, $20000 - 24999, $25000 or more, $25000 or more…
## $ class   <fct> middle class, working class, working class, working class, mid…
## $ finrela <fct> below average, below average, below average, above average, ab…
## $ weight  <dbl> 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, 0.4785…

The 1-sample \(t\)-test can be used to test whether a sample of continuous data could have plausibly come from a population with a specified mean.
As an example, we’ll test whether the average American adult works 40 hours a week using data from the gss. To do so, we make use of the hours variable, giving the number of hours that respondents reported having worked in the previous week. The distribution of hours in the observed data looks like this:
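The plot itself comes from standard ggplot2 code rather than infer; a minimal sketch, assuming a simple binned histogram of hours, might be:

library(ggplot2)

# sketch: histogram of self-reported hours worked per week
ggplot(gss, aes(x = hours)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Hours worked per week", y = "Number of respondents")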
It looks like most respondents reported having worked 40 hours, but there’s quite a bit of variability. Let’s test whether we have evidence that the true mean number of hours that Americans work per week is 40.
infer’s randomization-based analogue to the 1-sample \(t\)-test is a 1-sample mean test. We’ll start off showcasing that test before demonstrating how to carry out a theory-based \(t\)-test with the package.
First, to calculate the observed statistic, we can use specify() and calculate().
# calculate the observed statistic
observed_statistic <- gss |>
  specify(response = hours) |>
  calculate(stat = "mean")

The observed statistic is 41.382. Now, we want to compare this statistic to a null distribution, generated under the assumption that the mean was actually 40, to get a sense of how likely it would be for us to see this observed mean if the true number of hours worked per week in the population was really 40.
We can generate() the null distribution using the bootstrap. In the bootstrap, for each replicate, a sample of size equal to the input sample size is drawn (with replacement) from the input sample data. This allows us to get a sense of how much variability we’d expect to see in the entire population so that we can then understand how unlikely our sample mean would be.
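Conceptually, a single bootstrap replicate under this point null might look like the base R sketch below; the recentering step (shifting the sample so its mean equals the hypothesized value of 40) reflects our reading of how infer imposes a point null on the bootstrap, and the pipeline in the next chunk handles all of this internally.

# sketch of one bootstrap replicate under the point null (infer does this for us)
set.seed(1)
shifted_hours <- gss$hours - mean(gss$hours) + 40   # recenter so the null is true
mean(sample(shifted_hours, size = nrow(gss), replace = TRUE))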
# generate the null distribution
null_dist_1_sample <- gss |>
  specify(response = hours) |>
  hypothesize(null = "point", mu = 40) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

To get a sense for what these distributions look like, and where our observed statistic falls, we can use visualize():
# visualize the null distribution and test statistic!
null_dist_1_sample |>
  visualize() +
  shade_p_value(observed_statistic,
                direction = "two-sided")

It looks like our observed mean of 41.382 would be relatively unlikely if the true mean was actually 40 hours a week. More exactly, we can calculate the p-value:
# calculate the p value from the test statistic and null distribution
p_value_1_sample <- null_dist_1_sample |>
  get_p_value(obs_stat = observed_statistic,
              direction = "two-sided")

p_value_1_sample
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1   0.032

Thus, if the true mean number of hours worked per week was really 40, our approximation of the probability that we would see a test statistic as or more extreme than 41.382 is approximately 0.032.
Analogously to the steps shown above, the package supplies a wrapper function, t_test(), to carry out 1-sample \(t\)-tests on tidy data. Rather than using randomization, the wrappers carry out the theory-based \(t\)-test. The syntax looks like this:
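As a sketch, assuming the response = hours interface with the hypothesized mean supplied via the mu argument:

# sketch of the 1-sample wrapper call; mu gives the hypothesized mean
t_test(gss, response = hours, mu = 40)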
## # A tibble: 1 × 7
##   statistic  t_df p_value alternative estimate lower_ci upper_ci
##       <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>    <dbl>
## 1      2.09   499  0.0376 two.sided       41.4     40.1     42.7

An alternative approach to the t_test() wrapper is to calculate the observed statistic with an infer pipeline and then supply it to the pt() function from base R.
# calculate the observed statistic
observed_statistic <- gss |>
  specify(response = hours) |>
  hypothesize(null = "point", mu = 40) |>
  calculate(stat = "t") |>
  dplyr::pull()

Note that this pipeline to calculate an observed statistic includes a call to hypothesize() since the \(t\) statistic requires a hypothesized mean value.
Then, juxtaposing that \(t\) statistic with its associated distribution using the pt() function:
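As a sketch, assuming n - 1 = 499 degrees of freedom and doubling the upper-tail probability to obtain a two-sided p-value:

# sketch: two-sided p-value from the t distribution with n - 1 degrees of freedom
pt(observed_statistic, df = nrow(gss) - 1, lower.tail = FALSE) * 2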
## [1] 0.03756

Note that the p-values resulting from these two theory-based approaches are the same.
2-sample \(t\)-tests evaluate the difference in the mean values of two populations, using approximately normally distributed data randomly sampled from each. As an example, we’ll test whether Americans work the same number of hours a week regardless of whether they have a college degree, using data from the gss. The college and hours variables allow us to do so:
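As before, the plotting code is not part of the infer pipeline; a minimal sketch with ggplot2, faceting a histogram of hours by college, might look like this:

library(ggplot2)

# sketch: distribution of hours worked, split by degree status
ggplot(gss, aes(x = hours)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(~ college) +
  labs(x = "Hours worked per week", y = "Number of respondents")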
It looks like both of these distributions are centered near 40 hours a week, but the distribution for those with a degree is slightly right-skewed.
infer’s randomization-based analogue to the 2-sample \(t\)-test is a difference in means test. We’ll start off showcasing that test before demonstrating how to carry out a theory-based \(t\)-test with the package.
As with the one-sample test, to calculate the observed difference in means, we can use specify() and calculate().
# calculate the observed statistic
observed_statistic <- gss |>
  specify(hours ~ college) |>
  calculate(stat = "diff in means", order = c("degree", "no degree"))

observed_statistic
## Response: hours (numeric)
## Explanatory: college (factor)
## # A tibble: 1 × 1
##    stat
##   <dbl>
## 1  1.54

Note that, in the line specify(hours ~ college), we could have swapped this out with the syntax specify(response = hours, explanatory = college)!
The order argument in that calculate() line gives the order to subtract the mean values in: in our case, we’re taking the mean number of hours worked by those with a degree minus the mean number of hours worked by those without a degree; a positive difference, then, would mean that people with degrees worked more than those without a degree.
Now, we want to compare this difference in means to a null distribution, generated under the assumption that the number of hours worked a week has no relationship with whether or not one has a college degree, to get a sense of how likely it would be for us to see this observed difference in means if there were really no relationship between these two variables.
We can generate() the null distribution using permutation, where, for each replicate, each value of degree status will be randomly reassigned (without replacement) to a new number of hours worked per week in the sample in order to break any association between the two.
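Conceptually, a single permutation replicate amounts to shuffling the college labels and recomputing the difference in means, as in the base R sketch below; infer’s generate() does this for us in the chunk that follows.

# sketch of one permutation replicate (infer's generate() handles this internally)
set.seed(1)
shuffled_college <- sample(gss$college)   # reshuffle degree status
mean(gss$hours[shuffled_college == "degree"]) -
  mean(gss$hours[shuffled_college == "no degree"])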
# generate the null distribution with randomization
null_dist_2_sample <- gss |>
  specify(hours ~ college) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("degree", "no degree"))

Again, note that, in the line specify(hours ~ college) in the above chunk, we could have used the syntax specify(response = hours, explanatory = college) instead!
To get a sense for what these distributions look like, and where our observed statistic falls, we can use visualize().
# visualize the randomization-based null distribution and test statistic!
null_dist_2_sample |>
  visualize() +
  shade_p_value(observed_statistic,
                direction = "two-sided")

It looks like our observed statistic of 1.5384 would be unlikely if there was truly no relationship between degree status and number of hours worked. More exactly, we’ll use the randomization-based null distribution to calculate the p-value.
# calculate the p value from the randomization-based null
# distribution and the observed statistic
p_value_2_sample <- null_dist_2_sample |>
  get_p_value(obs_stat = observed_statistic,
              direction = "two-sided")

p_value_2_sample
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1    0.28

Thus, if there were really no relationship between the number of hours worked a week and whether one has a college degree, the probability that we would see a statistic as or more extreme than 1.5384 is approximately 0.28.
Note that, similarly to the steps shown above, the package supplies a wrapper function, t_test(), to carry out 2-sample \(t\)-tests on tidy data. The syntax looks like this:
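As a sketch, reusing the formula and order arguments from the randomization-based pipeline above:

# sketch of the 2-sample wrapper call
t_test(gss,
       formula = hours ~ college,
       order = c("degree", "no degree"),
       alternative = "two-sided")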
## # A tibble: 1 × 7
##   statistic  t_df p_value alternative estimate lower_ci upper_ci
##       <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>    <dbl>
## 1      1.12  366.   0.264 two.sided       1.54    -1.16     4.24

In the above example, we specified the relationship with the syntax formula = hours ~ college; we could have also written response = hours, explanatory = college.
An alternative approach to the t_test() wrapper is to calculate the observed statistic with an infer pipeline and then supply it to the pt() function from base R. We can calculate the statistic as before, switching out the stat = "diff in means" argument with stat = "t".
# calculate the observed statistic
observed_statistic <- gss |>
  specify(hours ~ college) |>
  hypothesize(null = "point", mu = 40) |>
  calculate(stat = "t", order = c("degree", "no degree")) |>
  dplyr::pull()

observed_statistic
##     t 
## 1.119

Note that this pipeline to calculate an observed statistic includes hypothesize() since the \(t\) statistic requires a hypothesized mean value.
Then, juxtaposing that \(t\) statistic with its associated distribution using the pt() function:
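As a sketch, assuming the pooled n - 2 = 498 degrees of freedom (rather than the Welch-adjusted degrees of freedom reported by t_test() above, which is why the two p-values agree only approximately):

# sketch: two-sided p-value from the t distribution with n - 2 degrees of freedom
pt(observed_statistic, df = nrow(gss) - 2, lower.tail = FALSE) * 2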
## [1] 0.2635

Note that the results from these two theory-based approaches are nearly the same.