This package provides functions to aid common data analysis processes and the reporting of statistical results in an RMarkdown file. The data analysis functions combine multiple base R functions used to describe simple bivariate relationships into a single, easy-to-use function. The reporting functions return character strings for reporting p-values, confidence intervals, and hypothesis test and regression results. Strings are LaTeX-formatted as necessary and knit cleanly in an RMarkdown document. The package also provides a wrapper for the `CreateTableOne` function in the `tableone` package to make the results knittable.
Suppose we have the following data:

    pred1 = sample(letters[1:3], size = 50, replace = TRUE)
    out1 = sample(letters[4:6], size = 50, replace = TRUE)
    out2 = rnorm(50)

We can investigate the relationship between `pred1` and `out1` using `cat_compare()`:

    cat_compare(x = pred1, y = out1)

    ## Warning in chisq.test(tab_no_miss): Chi-squared approximation may be incorrect
    ## $counts
    ##      y
    ## x      d  e  f Sum
    ##   a    8  7  3  18
    ##   b    6  2  9  17
    ##   c    7  4  4  15
    ##   Sum 21 13 16  50
    ##
    ## $chisq
    ##
    ##  Pearson's Chi-squared test
    ##
    ## data:  tab_no_miss
    ## X-squared = 6.5486, df = 4, p-value = 0.1618
    ##
    ##
    ## $CramersV
    ## [1] 0.2559017
    ##
    ## $plot

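To make the pieces of this output concrete, here is a minimal base-R sketch of how a contingency table with margins, a Pearson chi-squared test, and Cramér's V can be computed by hand. It is not the package's implementation: the object names (`tab`, `counts`, `chi`, `cramers_v`) and the call to `set.seed()` are assumptions added for illustration. The sketch uses the standard definition of Cramér's V, $\sqrt{\chi^2 / (n (\min(r, c) - 1))}$. Plugging in the values reported above ($\chi^2 = 6.5486$, $n = 50$, a 3-by-3 table) gives roughly 0.2559, which agrees with `$CramersV`.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
pred1 = sample(letters[1:3], size = 50, replace = TRUE)
out1 = sample(letters[4:6], size = 50, replace = TRUE)

# Cross-tabulate the two variables and append row/column totals
tab = table(x = pred1, y = out1)
counts = addmargins(tab)

# Pearson chi-squared test; the warning about small expected counts
# is suppressed here only to keep the sketch quiet
chi = suppressWarnings(chisq.test(tab))

# Cramer's V: sqrt(X^2 / (n * (min(rows, cols) - 1)))
n = sum(tab)
cramers_v = sqrt(unname(chi$statistic) / (n * (min(dim(tab)) - 1)))

counts
chi
cramers_v
```

Because Cramér's V rescales the chi-squared statistic by the sample size and table dimensions, it always falls between 0 and 1, which makes it easier to compare across tables than the raw statistic.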
We can investigate the distribution of `out2` across levels of `pred1` using `num_compare()`:

    num_compare(y = out2, grp = pred1)

    ## $summary_stats
    ##    n obs mis        mean     stdev         med         q1        q3
    ## a 18  18   0 0.006755781 1.0851542  0.04793630 -0.4201223 0.8188215
    ## b 17  17   0 0.001604250 0.8911016  0.07865256 -0.2591153 0.5775767
    ## c 15  15   0 0.198539217 1.0332958 -0.07142822 -0.2773362 0.6744763
    ##
    ## $decomp
    ## Call:
    ##    aov(formula = y ~ grp, data = mydat)
    ##
    ## Terms:
    ##                      grp Residuals
    ## Sum of Squares   0.39657  47.67131
    ## Deg. of Freedom        2        47
    ##
    ## Residual standard error: 1.007116
    ## Estimated effects may be unbalanced
    ##
    ## $eta_sq
    ## [1] 0.008250299
    ##
    ## $plot

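The `$eta_sq` value can be read directly off the `$decomp` table: it is the between-group sum of squares divided by the total sum of squares, here $0.39657 / (0.39657 + 47.67131) \approx 0.00825$. The following base-R sketch shows one way to obtain group summaries, the ANOVA decomposition, and eta-squared by hand. It is an illustration under stated assumptions, not the package's code: the object names and the `set.seed()` call are not from the original.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
pred1 = sample(letters[1:3], size = 50, replace = TRUE)
out2 = rnorm(50)

# Group-wise summaries of the numeric variable
summary_stats = data.frame(
  n = as.vector(tapply(out2, pred1, length)),
  mean = as.vector(tapply(out2, pred1, mean)),
  stdev = as.vector(tapply(out2, pred1, sd)),
  row.names = sort(unique(pred1))
)

# One-way ANOVA decomposition of out2 by group
fit = aov(out2 ~ pred1)
ss = summary(fit)[[1]][["Sum Sq"]]  # between-group SS, then residual SS

# Eta-squared: share of total variation explained by the grouping
eta_sq = ss[1] / sum(ss)

summary_stats
eta_sq
```

For a one-way layout, eta-squared equals the $R^2$ of the corresponding linear model, so `summary(lm(out2 ~ pred1))$r.squared` provides a quick cross-check.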
## `inline` and `write` functions

- `inline_test()`
- `inline_reg()`
- `inline_coef()`
- `inline_anova()`
- `write_int()`
- `write_p()`
- `as_perc()`

Using the simulated data below, we can obtain some inferential results:

    x = rnorm(50)
    y = rnorm(50)
    a = sample(letters[1:3], size = 50, replace = TRUE)
    b = sample(letters[1:3], size = 50, replace = TRUE)
    test1 = t.test(x)
    test2 = chisq.test(table(a, b))
    model1 = lm(y ~ x)
    model2 = lm(y ~ a)

We can then report the results of the hypothesis test inline using `inline_test(test1)` and get the following: $t(49) = -0.7$, $p = 0.49$. Similarly, `inline_test(test2)` will report the results of the chi-squared test: $\chi^2(4) = 4.85$, $p = 0.3$. So far, `inline_test()` only works for $t$ and chi-squared tests, but the goal is to add more functionality; requests are gladly accepted.
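To illustrate the general idea behind this kind of reporting, the sketch below builds a LaTeX-formatted string from the components of an `htest` object returned by `t.test()`. The helper `sketch_inline_t()` is hypothetical and is not part of the package; its rounding and formatting choices are assumptions and may differ from what `inline_test()` actually does.

```r
# Hypothetical helper (not the package's function): format a t-test result
# as a LaTeX string suitable for an inline R expression in RMarkdown
sketch_inline_t = function(test, digits = 2) {
  stopifnot(inherits(test, "htest"))
  sprintf(
    "$t(%s) = %s$, $p = %s$",
    format(unname(test$parameter)),                                   # degrees of freedom
    formatC(unname(test$statistic), format = "f", digits = digits),  # t statistic
    formatC(test$p.value, format = "f", digits = digits)             # p-value
  )
}

set.seed(1)  # assumption: seed added only so the sketch is reproducible
test1 = t.test(rnorm(50))
sketch_inline_t(test1)
```

Since the helper returns a single character string wrapped in `$...$`, calling it from an inline R expression lets the knitted document render the result as math.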
The regression results can be reported with `inline_reg(model1)` and `inline_coef(model1, 'x')` to get $R^2 = 0.02$, $F(1,48) = 0.81$, $p = 0.37$ and $b = -0.14$, $t(48) = -0.9$, $p = 0.37$, respectively. In addition, `inline_anova(model2)` will report the ANOVA $F$ statistic and relevant results: $F(2,47) = 2.81$, $p = 0.07$. Currently, `inline_reg()` and `inline_coef()` work for `lm` and `glm` objects; `inline_anova()` only works for `lm` objects.
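The quantities reported here all live in the summary of an `lm` fit. The base-R sketch below shows where each one can be found; it is an illustration only and makes no claim about how `inline_reg()` or `inline_coef()` are implemented. The object names and the `set.seed()` call are assumptions.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
x = rnorm(50)
y = rnorm(50)
model1 = lm(y ~ x)
s = summary(model1)

# Overall fit: R-squared and the F statistic with its degrees of freedom
r2 = s$r.squared
f = s$fstatistic  # named vector: value, numdf, dendf
p_model = pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Single coefficient: estimate, t statistic, and p-value for the term "x"
co = coef(s)["x", ]
b = co[["Estimate"]]
t_val = co[["t value"]]
p_coef = co[["Pr(>|t|)"]]

# Assemble LaTeX-formatted strings from the extracted pieces
sprintf("$R^2 = %.2f$, $F(%d,%d) = %.2f$, $p = %.2f$",
        r2, as.integer(f[["numdf"]]), as.integer(f[["dendf"]]), f[["value"]], p_model)
sprintf("$b = %.2f$, $t(%d) = %.2f$, $p = %.2f$",
        b, as.integer(f[["dendf"]]), t_val, p_coef)
```

With a single predictor, the model $F$ test and the coefficient $t$ test are equivalent ($F = t^2$), which is why the two p-values in the text above are both 0.37.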
We can also report confidence intervals using `write_int()` with a length-2 vector of interval endpoints. For example, `write_int(c(3.04, 4.7))` and `write_int(test1$conf.int)` yield (3.04, 4.70) and (-0.37, 0.18), respectively. If a 2-column matrix is provided to `write_int()`, the entries in each row will be formatted into an interval and a character vector will be returned.
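The vector and matrix behaviors described above can be sketched with one code path. The helper `sketch_int()` below is hypothetical: it is not the package's `write_int()`, and its fixed two-decimal default is an assumption chosen to reproduce the formatting shown in the text.

```r
# Hypothetical helper (not the package's function): format interval endpoints
sketch_int = function(x, digits = 2) {
  # Treat a length-2 vector as a 1-row matrix so both inputs share one path
  if (is.null(dim(x))) x = matrix(x, ncol = 2)
  stopifnot(ncol(x) == 2)
  lower = formatC(x[, 1], format = "f", digits = digits)
  upper = formatC(x[, 2], format = "f", digits = digits)
  paste0("(", lower, ", ", upper, ")")
}

sketch_int(c(3.04, 4.7))
# returns "(3.04, 4.70)"

sketch_int(rbind(c(1, 2), c(-0.5, 0.25)))
# returns "(1.00, 2.00)" "(-0.50, 0.25)"
```

Coercing the vector case to a matrix keeps the formatting logic in one place, so a matrix such as the output of `confint()` yields one string per row.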
P-values can be reported using `write_p()`. This function will take either a numeric value or a list-like object with an element named `p.value`. For example, `write_p(0.00002)` gives $p < 0.01$ and `write_p(test1)` gives $p = 0.49$.
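A minimal sketch of this dual-input behavior follows. The helper `sketch_p()` is hypothetical and not the package's `write_p()`; the rule of switching to "less than" below $10^{-\text{digits}}$ is an assumption that happens to reproduce the two examples above.

```r
# Hypothetical helper (not the package's function): format a p-value
sketch_p = function(x, digits = 2) {
  # Accept either a bare number or a list-like object carrying `p.value`
  p = if (is.numeric(x)) x else x$p.value
  cutoff = 10^(-digits)
  if (p < cutoff) {
    # Too small to show at this precision: report an upper bound instead
    sprintf("$p < %s$", formatC(cutoff, format = "f", digits = digits))
  } else {
    sprintf("$p = %s$", formatC(p, format = "f", digits = digits))
  }
}

sketch_p(0.00002)
# returns "$p < 0.01$"

sketch_p(list(p.value = 0.49))
# returns "$p = 0.49$"
```

Reporting a bound rather than a rounded value avoids printing $p = 0.00$, which would wrongly suggest the p-value is exactly zero.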
Many R functions produce proportions, though analysts may want to report the output as a percentage. `as_perc()` will do this. For example, `as_perc(0.01)` will produce 1%.
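The conversion itself is simple, as this sketch suggests. The helper `sketch_perc()` and its `digits` argument are hypothetical and are not taken from the package; `as_perc()` may offer different options.

```r
# Hypothetical helper (not the package's function): proportion to percentage
sketch_perc = function(x, digits = 0) {
  paste0(formatC(100 * x, format = "f", digits = digits), "%")
}

sketch_perc(0.01)
# returns "1%"

sketch_perc(c(0.256, 0.5), digits = 1)
# returns "25.6%" "50.0%"
```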
See the help files of the functions described above for more details and options. For example, all test and regression reporting functions have wrappers ending in `_p` which report only the p-value of the input.
## `KreateTableOne`

The package also provides the function `KreateTableOne`, a wrapper for `CreateTableOne` from the `tableone` package which makes the results knittable. First use `KreateTableOne` in an R chunk with `results='hide'` (or outside the RMarkdown document), then recall the saved data frame in a new chunk. For example:

    table1 = KreateTableOne(x = mtcars, strata = 'am', factorVars = 'vs')
    colnames(table1)[1:2] = c('am = 0', 'am = 1')

Then:

    knitr::kable(table1[, 1:3], align = 'r')

| | am = 0 | am = 1 | p |
|---|---|---|---|
| n | 19 | 13 | |
| mpg (mean (SD)) | 17.15 (3.83) | 24.39 (6.17) | <0.001 |
| cyl (mean (SD)) | 6.95 (1.54) | 5.08 (1.55) | 0.002 |
| disp (mean (SD)) | 290.38 (110.17) | 143.53 (87.20) | <0.001 |
| hp (mean (SD)) | 160.26 (53.91) | 126.85 (84.06) | 0.180 |
| drat (mean (SD)) | 3.29 (0.39) | 4.05 (0.36) | <0.001 |
| wt (mean (SD)) | 3.77 (0.78) | 2.41 (0.62) | <0.001 |
| qsec (mean (SD)) | 18.18 (1.75) | 17.36 (1.79) | 0.206 |
| vs = 1 (%) | 7 (36.8) | 7 (53.8) | 0.556 |
| am (mean (SD)) | 0.00 (0.00) | 1.00 (0.00) | <0.001 |
| gear (mean (SD)) | 3.21 (0.42) | 4.38 (0.51) | <0.001 |
| carb (mean (SD)) | 2.74 (1.15) | 2.92 (2.18) | 0.754 |