| Type: | Package |
| Title: | Confidence Intervals for Education |
| Version: | 0.0.1 |
| Description: | An educational package providing intuitive functions for calculating confidence intervals (CI) for various statistical parameters. Designed primarily for teaching and learning about statistical inference (particularly confidence intervals). Offers user-friendly wrappers around established methods for proportions, means, and bootstrap-based intervals. Integrates seamlessly with Tidyverse workflows, making it ideal for classroom demonstrations and student exercises. |
| License: | GPL (≥ 3) |
| URL: | https://github.com/GegznaV/ci |
| BugReports: | https://github.com/GegznaV/ci/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | DescTools, dplyr, tidyr, purrr, forcats, tibble, checkmate |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-10-19 20:21:15 UTC; ViG |
| Author: | Vilmantas Gegzna |
| Maintainer: | Vilmantas Gegzna <GegznaV@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-22 19:30:02 UTC |
Proportion CI: Binary Variable (2 groups)
Description
Calculates confidence intervals (CI) for proportions in binary variables. This enhanced version of DescTools::BinomCI() returns a data frame.
Usage
ci_binom(x, n, method = "modified wilson", conf.level = 0.95, ...)
Arguments
x | Number of events of interest or favorable outcomes. |
n | Total number of events. |
method | Calculation method: |
conf.level | Confidence level. Default: 0.95. |
... | Additional parameters for |
Details
Similar to DescTools::BinomCI(), but uses the modified Wilson method by default and returns a data frame instead of a vector, enabling plotting with ggplot2.
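To make the idea behind the default method more concrete, here is a minimal base R sketch of the plain Wilson score interval. It is an illustration only: `wilson_ci` is a hypothetical helper, not a function of this package, and it does not reproduce the adjustment applied by the "modified wilson" method.

```r
# Plain Wilson score interval for a binomial proportion (base R only).
# NOTE: `wilson_ci` is a hypothetical helper written for this illustration;
# it is not part of this package and omits the "modified" adjustment.
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # two-sided normal quantile
  p <- x / n                             # observed proportion
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  data.frame(est = p, lwr.ci = centre - half, upr.ci = centre + half,
             x = x, n = n)
}

res <- wilson_ci(x = 54, n = 80)
res

# Cross-check: prop.test() without continuity correction
# reports the same Wilson score interval.
ref <- prop.test(54, 80, correct = FALSE)$conf.int
ref
```

Because the result is a data frame with est, lwr.ci and upr.ci columns, it can be passed directly to a plotting function, which is the design choice described above.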
Value
A data frame with columns:
est (<dbl>) – proportion estimate;
lwr.ci, upr.ci (<dbl>) – lower and upper CI bounds;
x (<int>) – number of events of interest;
n (<int>) – total number of events.
Examples
# Example 1: Survey responses
# 54 out of 80 people agree with a statement
# What is the true proportion of agreement in the population?
ci_binom(x = 54, n = 80)
# Interpretation: We're 95% confident the true proportion
# is between lwr.ci and upr.ci (roughly 0.57 to 0.77)

# Example 2: Medical treatment success
# 23 out of 30 patients recovered
ci_binom(x = 23, n = 30)

# Example 3: Coin flips
# Testing if a coin is fair: 58 heads in 100 flips
ci_binom(x = 58, n = 100)
# If 0.5 is in the CI, we can't rule out the coin being fair

# Example 4: Effect of sample size
# Same number of events (x = 54) with increasing sample sizes
# Note: the estimate changes as well (54/80 = 0.675, 54/100 = 0.54, ...)
ci_binom(x = 54, n = c(80, 100, 200, 500))
# Notice: Larger samples give narrower (more precise) CIs

# Example 5: Two separate groups
# Group A: 23 successes, Group B: 45 successes
successes <- c(23, 45)
ci_binom(successes, n = sum(successes))

# Example 6: Student exam pass rates
# 67 out of 85 students passed
ci_binom(x = 67, n = 85)

# Example 7: Different confidence levels
# 90% confidence (narrower interval, less confident)
ci_binom(x = 54, n = 80, conf.level = 0.90)
# 99% confidence (wider interval, more confident)
ci_binom(x = 54, n = 80, conf.level = 0.99)

# Example 8: Comparing different methods
ci_binom(x = 15, n = 25, method = "wilson")
ci_binom(x = 15, n = 25, method = "agresti-coull")
# Different methods can give slightly different results
Confidence Intervals via Bootstrap
Description
Calculates confidence intervals (CI) using bootstrap methods. This enhanced version of DescTools::BootCI() returns a data frame.
Usage
ci_boot(.data, x, y = NULL, conf.level = 0.95, ...)
Arguments
.data | Data frame. |
x, y | Column names (unquoted). |
conf.level | Confidence level. Default: 0.95. |
... | Additional parameters for |
Details
Similar to DescTools::BootCI(), but:
First argument is a data frame;
Arguments x and y are unquoted column names;
Responds to dplyr::group_by() for subgroup calculations;
Returns a data frame for convenient plotting with ggplot2.
Value
A data frame with confidence intervals. Columns depend on arguments and grouping:
(if grouped) grouping variable names;
Column matching the statistic name (from FUN) containing the estimate;
lwr.ci, upr.ci – lower and upper CI bounds.
Note
Notes:
Each group should have at least 20 observations for bootstrap methods.
Use set.seed() for reproducible results.
If using bci.method = "bca" produces the warning "extreme order statistics used as endpoints", the BCa method is unsuitable; use "perc" instead (https://rcompanion.org/handbook/E_04.html).
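The percentile method and the role of set.seed() can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: `boot_perc_ci` is a hypothetical helper, and its quantile interpolation may differ slightly from what DescTools::BootCI() does internally.

```r
# Percentile bootstrap CI for a statistic (base R only).
# NOTE: `boot_perc_ci` is a hypothetical helper for illustration; it is not
# part of this package and does not reproduce DescTools::BootCI() exactly.
boot_perc_ci <- function(x, FUN = median, R = 1000, conf.level = 0.95) {
  # Recompute the statistic on R resamples drawn with replacement
  stats <- replicate(R, FUN(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - conf.level
  # The CI bounds are the empirical quantiles of the resampled statistics
  bounds <- quantile(stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(est = FUN(x), lwr.ci = bounds[1], upr.ci = bounds[2])
}

# Setting the same seed before each call yields the same resamples,
# and therefore the same interval.
set.seed(123)
ci_1 <- boot_perc_ci(iris$Petal.Length)
set.seed(123)
ci_2 <- boot_perc_ci(iris$Petal.Length)

ci_1
identical(ci_1, ci_2)
```

Without the set.seed() calls, the two intervals would usually differ in the last digits, which is why the note above recommends fixing the seed.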
Examples
# Bootstrap is useful when:
# - Data is skewed (not normal)
# - You want CI for statistics other than the mean (e.g., median, SD)
# - You don't want to assume a specific distribution
data(iris, package = "datasets")
head(iris)
set.seed(123) # For reproducible results

# Example 1: CI for the median (resistant to outliers)
iris |>
  ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "perc")
# Compare to mean CI - median is often more robust

# Example 2: CI for the median by group
iris |>
  dplyr::group_by(Species) |>
  ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "perc")
# Useful when groups have different distributions

# Example 3: CI for standard deviation
# How variable is petal length?
set.seed(456)
iris |>
  ci_boot(Petal.Length, FUN = sd, R = 1000, bci.method = "perc")

# Example 4: CI for interquartile range (IQR)
# IQR = 75th percentile - 25th percentile
set.seed(789)
iris |>
  ci_boot(Petal.Length, FUN = IQR, R = 1000, bci.method = "perc")

# Example 5: CI for correlation coefficient (Pearson's r)
# How related are petal length and width?
set.seed(101)
iris |>
  dplyr::group_by(Species) |>
  ci_boot(
    Petal.Length, Petal.Width,
    FUN = cor, method = "pearson",
    R = 1000, bci.method = "perc"
  )
# Look for CIs that don't include 0 (suggests real correlation)

# Example 6: Comparing BCa and percentile methods
set.seed(111)
# BCa method (often more accurate but requires more assumptions)
iris |>
  ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "bca")
# Percentile method (simpler, more robust)
iris |>
  ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "perc")

# Example 7: Effect of number of bootstrap replications
set.seed(222)
# Fewer replications (faster but less stable)
iris |>
  ci_boot(Petal.Length, FUN = median, R = 500, bci.method = "perc")
# More replications (slower but more stable)
iris |>
  ci_boot(Petal.Length, FUN = median, R = 5000, bci.method = "perc")
# For teaching: 1000 is usually enough; for research: 5000-10000

# Example 8: Handling missing values
set.seed(333)
iris |>
  ci_boot(
    Petal.Length,
    FUN = median, na.rm = TRUE,
    R = 1000, bci.method = "bca"
  )

# Example 9: With mtcars dataset
set.seed(444)
data(mtcars, package = "datasets")
mtcars |>
  dplyr::group_by(cyl) |>
  ci_boot(mpg, FUN = median, R = 1000, bci.method = "perc")
# Compare median MPG for different cylinder counts

# Example 10: Spearman correlation (rank-based, robust to outliers)
set.seed(555)
iris |>
  dplyr::group_by(Species) |>
  ci_boot(
    Petal.Length, Petal.Width,
    FUN = cor, method = "spearman",
    R = 1000, bci.method = "perc"
  )
Mean CI from Data
Description
ci_mean_t() calculates the mean's confidence interval (CI) using the classic formula with Student's t coefficient for data in data frame format. This enhanced version of DescTools::MeanCI() responds to dplyr::group_by(), enabling subgroup calculations. The result is a data frame.
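The classic formula referred to here is mean ± t × sd / sqrt(n), with t taken from Student's t distribution with n − 1 degrees of freedom. The base R sketch below shows that calculation and how a grouped result amounts to applying it to each subgroup. `mean_ci_t` is a hypothetical helper used only for illustration; it is not a function of this package.

```r
# t-based CI for a mean, overall and per group (base R only).
# NOTE: `mean_ci_t` is a hypothetical helper for illustration,
# not a function of this package.
mean_ci_t <- function(x, conf.level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                            # standard error of the mean
  tq <- qt(1 - (1 - conf.level) / 2, df = n - 1)   # Student's t quantile
  c(mean = mean(x), lwr.ci = mean(x) - tq * se, upr.ci = mean(x) + tq * se)
}

data(npk, package = "datasets")

# Whole sample
overall <- mean_ci_t(npk$yield)
overall

# One interval per level of N: split the column, apply the helper, bind rows
by_n <- do.call(rbind, lapply(split(npk$yield, npk$N), mean_ci_t))
by_n

# Cross-check against the interval reported by t.test()
ref <- t.test(npk$yield)$conf.int
ref
```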
Usage
ci_mean_t(.data, x, conf.level = 0.95, ...)
Arguments
.data | Data frame. |
x | Column name (unquoted). |
conf.level | Confidence level. Default: 0.95. |
... | Additional parameters for |
Value
A data frame with columns:
(if present) grouping variable names;
mean (<dbl>) – mean estimate;
lwr.ci, upr.ci (<dbl>) – lower and upper CI bounds.
Examples
# Example with built-in dataset
data(npk, package = "datasets")
head(npk)

# Basic CI calculation for crop yield
ci_mean_t(npk, yield)
# Interpretation: We're 95% confident the true mean yield
# falls between lwr.ci and upr.ci

# Using pipe operator (tidyverse style)
npk |> ci_mean_t(yield)

# Compare yields with nitrogen (N) treatment vs. without
npk |>
  dplyr::group_by(N) |>
  ci_mean_t(yield)
# Look at the CIs: Do they overlap? Non-overlapping CIs suggest
# a potential difference between groups

# More complex grouping: Three factors at once
npk |>
  dplyr::group_by(N, P, K) |>
  ci_mean_t(yield)

# Example with iris dataset: Petal length by species
data(iris, package = "datasets")
iris |>
  dplyr::group_by(Species) |>
  ci_mean_t(Petal.Length)
# Notice how the three species have clearly different intervals

# Example with mtcars: MPG by number of cylinders
data(mtcars, package = "datasets")
mtcars |>
  dplyr::group_by(cyl) |>
  ci_mean_t(mpg)

# 90% confidence interval (less confident, narrower interval)
npk |> ci_mean_t(yield, conf.level = 0.90)

# 99% confidence interval (more confident, wider interval)
npk |> ci_mean_t(yield, conf.level = 0.99)
Mean CI from Descriptive Statistics
Description
ci_mean_t_stat() calculates the mean's confidence interval (CI) using the classic formula with Student's t coefficient when descriptive statistics (mean, standard deviation, sample size) are provided. Useful when these values are reported in scientific literature.
Usage
ci_mean_t_stat(mean_, sd_, n, group = "", conf.level = 0.95)
Arguments
mean_ | Vector of group means. |
sd_ | Vector of group standard deviations. |
n | Vector of group sizes. |
group | Group name. Default: empty string ( |
conf.level | Confidence level. Default: 0.95. |
Value
A data frame with columns:
group (<fct>) – group name;
mean (<dbl>) – mean estimate;
lwr.ci (<dbl>) – lower CI bound (lwr. = lower);
upr.ci (<dbl>) – upper CI bound (upr. = upper);
sd (<dbl>) – standard deviation;
n (<int>) – sample/group size.
Calculations can be performed for multiple groups simultaneously.
Note
Each of mean_, sd_, n, and group must have a length of either (a) one, or (b) the length of the longest vector among these arguments.
See examples for clarification.
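The length rule can be illustrated with a small base R sketch that computes the same kind of t-based interval from summary statistics. `ci_from_stats` is a hypothetical helper written for this illustration; it relies on data.frame() to reuse length-one inputs for every row and is not the package's implementation.

```r
# t-based CI from summary statistics, with length-1 arguments recycled.
# NOTE: `ci_from_stats` is a hypothetical helper for illustration only;
# it is not a function of this package.
ci_from_stats <- function(mean_, sd_, n, conf.level = 0.95) {
  # data.frame() recycles length-1 inputs to the longest vector
  d  <- data.frame(mean = mean_, sd = sd_, n = n)
  tq <- qt(1 - (1 - conf.level) / 2, df = d$n - 1)  # one t quantile per row
  se <- d$sd / sqrt(d$n)                            # standard error per row
  d$lwr.ci <- d$mean - tq * se
  d$upr.ci <- d$mean + tq * se
  d[, c("mean", "lwr.ci", "upr.ci", "sd", "n")]
}

# One mean and one SD, four sample sizes:
# the two scalars are reused for every row.
res <- ci_from_stats(mean_ = 75, sd_ = 10, n = c(10, 25, 50, 100))
res
```

Each row corresponds to one element of the longest argument (here n), and the intervals narrow as n grows.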
Examples
# Basic example: Test scores
# Suppose a class of 25 students has a mean score of 75 with SD of 10
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25)
# The result tells us we can be 95% confident that the true mean score
# lies between the lower and upper CI bounds

# Example from literature: A study reports mean = 362, SD = 35, n = 100
ci_mean_t_stat(mean_ = 362, sd_ = 35, n = 100)

# Without argument names (order matters: mean, sd, n):
ci_mean_t_stat(362, 35, 100)

# Comparing multiple groups (e.g., teaching methods):
# Method A: mean = 78, SD = 8, n = 30 students
# Method B: mean = 82, SD = 7, n = 28 students
# Method C: mean = 75, SD = 9, n = 32 students
mean_val <- c(78, 82, 75)
std_dev <- c(8, 7, 9)
n <- c(30, 28, 32)
group <- c("Method A", "Method B", "Method C")
ci_mean_t_stat(mean_val, std_dev, n, group)

# Educational example: Effect of sample size on CI width
# Same mean and SD, but different sample sizes
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = c(10, 25, 50, 100))
# Notice: Larger samples give narrower (more precise) confidence intervals

# Educational example: Changing confidence level (default is 95%)
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25, conf.level = 0.99)
# 99% CI is wider than 95% CI (more confident = less precise)
# NOTE: Changing conf.level just to get a narrower CI is BAD PRACTICE!
# Please choose the confidence level based on study design, not desired CI width.

# To display more decimal places, convert tibble to data frame:
result_ci <- ci_mean_t_stat(75, 10, 25)
as.data.frame(result_ci)
# Or use:
# View(result_ci)
Proportion CI: Multinomial Variable (3 or more groups)
Description
Calculates simultaneous confidence intervals (CI) for proportions in multinomial variables (k >= 3). This enhanced version of DescTools::MultinomCI() returns a data frame.
Usage
ci_multinom(x, method = "goodman", conf.level = 0.95, gr_colname = "group", ...)
Arguments
x | Vector of group sizes. Best if elements have meaningful names (see examples). |
method | Calculation method: |
conf.level | Confidence level. Default: 0.95. |
gr_colname | Column name (quoted) for group names. Default: |
... | Additional parameters for |
Details
Similar to DescTools::MultinomCI(), but uses the Goodman method by default and returns a data frame, enabling convenient plotting with ggplot2.
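As a rough illustration of what "simultaneous" means here, the base R sketch below computes Bonferroni-adjusted Wilson score intervals, which is how Goodman's method is commonly described: each of the k categories gets a score interval at level 1 − α/k instead of 1 − α. This is an assumption about the method's textbook form, not a copy of the package's code. `goodman_ci` is a hypothetical helper and may not match ci_multinom() or DescTools::MultinomCI() output digit for digit.

```r
# Simultaneous CIs for multinomial proportions via Bonferroni-adjusted
# Wilson score intervals (the usual textbook form of Goodman's method).
# NOTE: `goodman_ci` is a hypothetical helper for illustration; it is not
# part of this package and its output is not guaranteed to match it.
goodman_ci <- function(x, conf.level = 0.95) {
  n <- sum(x)        # total number of events
  k <- length(x)     # number of categories
  # Chi-square quantile (1 df) at the Bonferroni-adjusted level
  a <- qchisq(1 - (1 - conf.level) / k, df = 1)
  root <- sqrt(a * (a + 4 * x * (n - x) / n))
  grp <- if (is.null(names(x))) as.character(seq_along(x)) else names(x)
  data.frame(
    group  = grp,
    est    = as.numeric(x) / n,
    lwr.ci = as.numeric((a + 2 * x - root) / (2 * (n + a))),
    upr.ci = as.numeric((a + 2 * x + root) / (2 * (n + a))),
    x      = as.numeric(x),
    n      = n
  )
}

grades <- c("A" = 20, "B" = 35, "C" = 25, "D/F" = 15)
res <- goodman_ci(grades)
res

# Cross-check for the first category: a Wilson score interval
# at the adjusted confidence level gives the same bounds.
ref <- prop.test(20, sum(grades), conf.level = 1 - 0.05 / 4,
                 correct = FALSE)$conf.int
ref
```

The adjusted level makes each interval wider than a single-proportion 95% interval, so that all k intervals hold jointly with at least the stated confidence.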
Value
A data frame with columns:
group or user-specified name (<fct>) – group names;
est (<dbl>) – proportion estimate;
lwr.ci, upr.ci (<dbl>) – lower and upper CI bounds;
x (<int>) – group size;
n (<int>) – total number of events.
Examples
# Example 1: Student grade distribution
# A: 20 students, B: 35 students, C: 25 students, D/F: 15 students
grades <- c("A" = 20, "B" = 35, "C" = 25, "D/F" = 15)
ci_multinom(grades)
# Each row shows the CI for that grade's proportion

# Example 2: Transportation preferences
transport <- c("Car" = 45, "Bus" = 30, "Bike" = 15, "Walk" = 20)
ci_multinom(transport)

# Example 3: Blood type distribution
blood_types <- c("O" = 156, "A" = 134, "B" = 38, "AB" = 22)
ci_multinom(blood_types)

# Example 4: Political party preference
parties <- c("Party A" = 380, "Party B" = 420, "Party C" = 200)
ci_multinom(parties)

# Unnamed frequencies (groups will be numbered)
ci_multinom(c(20, 35, 54))

# Using pipe operator
c("Small" = 20, "Medium" = 35, "Large" = 54) |>
  ci_multinom()

# Different method for simultaneous intervals
c("Small" = 33, "Medium" = 35, "Large" = 30) |>
  ci_multinom(method = "sisonglaz")

# Custom column name for groups
c("Dog" = 65, "Cat" = 48, "Bird" = 22, "Other" = 15) |>
  ci_multinom(gr_colname = "pet_type")

# Example 5: Teaching method effectiveness
# Outcome categories: Poor, Fair, Good, Excellent
outcomes <- c("Poor" = 8, "Fair" = 22, "Good" = 45, "Excellent" = 35)
ci_multinom(outcomes)
# Look for non-overlapping CIs to identify categories that differ significantly