| Type: | Package |
| Version: | 1.0.3 |
| Title: | Dropout Analysis by Condition |
| Description: | Analysis and visualization of dropout between conditions in surveys and (online) experiments. Features include computation of dropout statistics, comparing dropout between conditions (e.g. Chi square), analyzing survival (e.g. Kaplan-Meier estimation), comparing conditions with the most different rates of dropout (Kolmogorov-Smirnov) and visualizing the result of each in designated plotting functions. Sources: Andrea Frick, Marie-Terese Baechtiger & Ulf-Dietrich Reips (2001) <https://www.researchgate.net/publication/223956222_Financial_incentives_personal_information_and_drop-out_in_online_studies>; Ulf-Dietrich Reips (2002) "Standards for Internet-Based Experimenting" <doi:10.1027//1618-3169.49.4.243>. |
| Depends: | R (≥ 3.0.0) |
| Imports: | shiny, ggplot2, data.table, survival, lifecycle |
| Suggests: | DT, shinydashboard, knitr, rmarkdown, kableExtra |
| Date: | 2024-06-24 |
| License: | GPL (≥ 3) |
| LazyData: | true |
| RoxygenNote: | 7.3.1 |
| URL: | https://iscience-kn.github.io/dropR/, https://github.com/iscience-kn/dropR |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| NeedsCompilation: | no |
| Packaged: | 2024-07-03 09:38:39 UTC; Admin |
| Author: | Annika Tave Overlander [aut, cre], Matthias Bannert [aut], Ulf-Dietrich Reips [ctb] |
| Maintainer: | Annika Tave Overlander <annika-tave.overlander@uni.kn> |
| Repository: | CRAN |
| Date/Publication: | 2024-07-03 16:20:02 UTC |
dropR: Dropout Analysis by Condition
Description
Analysis and visualization of dropout between conditions in surveys and (online) experiments. Features include computation of dropout statistics, comparing dropout between conditions (e.g. Chi square), analyzing survival (e.g. Kaplan-Meier estimation), comparing conditions with the most different rates of dropout (Kolmogorov-Smirnov) and visualizing the result of each in designated plotting functions. Sources: Andrea Frick, Marie-Terese Baechtiger & Ulf-Dietrich Reips (2001) https://www.researchgate.net/publication/223956222_Financial_incentives_personal_information_and_drop-out_in_online_studies; Ulf-Dietrich Reips (2002) "Standards for Internet-Based Experimenting" doi:10.1027//1618-3169.49.4.243.
Author(s)
Maintainer: Annika Tave Overlander <annika-tave.overlander@uni.kn>
Authors:
Matthias Bannert <bannert@kof.ethz.ch>
Other contributors:
Ulf-Dietrich Reips [contributor]
See Also
Useful links:
- https://iscience-kn.github.io/dropR/
- https://github.com/iscience-kn/dropR
Add Dropout Index to a Data.Frame
Description
Find dropout positions in a data.frame that contains multiple questions that had been asked sequentially. This function adds the Dropout Index variable do_idx to the data.frame, which is necessary for further analyses of dropout.
Use this function first to prepare your dropout analysis. Then, keep going by creating the dropout statistics using compute_stats().
Usage
add_dropout_idx(df, q_pos)
Arguments
df | data.frame containing |
q_pos | numeric range of columns that contain question items |
Details
Importantly, this function will start counting missing data at the end of the data frame. Any missing data which is somewhere in between, i.e. a single item that was skipped or forgotten, will not be counted as dropout. The function will identify sequences of missing data that go until the end of the data frame and add the number of the last answered question in do_idx.
Therefore, the variables must be in the order that they were asked, otherwise analyses will not be valid.
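The rule described above (only missing values that run to the end of a row count as dropout) can be illustrated with a minimal base R sketch. The helper last_answered() is hypothetical and is not the dropR implementation; in particular, returning 0 for a row without any answers is an assumption made only for this illustration.

```r
# Hypothetical helper (not part of dropR): position of the last answered
# question per row, counted within the question columns q_pos.
last_answered <- function(df, q_pos) {
  items <- df[, q_pos, drop = FALSE]
  apply(is.na(items), 1, function(missing) {
    answered <- which(!missing)
    # Assumption: a row without any answer gets index 0
    if (length(answered) == 0) 0L else max(answered)
  })
}

demo <- data.frame(
  id = 1:4,
  q1 = c(1, 2, NA, 4),
  q2 = c(1, NA, NA, 3),
  q3 = c(1, NA, NA, NA),  # row 4 skips q3 but answers q4
  q4 = c(1, NA, NA, 2)
)

idx <- last_answered(demo, 2:5)
idx
```

Row 4 skips a single item in the middle but answers the last question, so it is not treated as dropout in this sketch.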
Value
Returns original data frame with column do_idx added.
Source
R/add_dropout_idx.R
See Also
compute_stats() which is usually the next step for dropout analysis.
Examples
dropout <- add_dropout_idx(dropRdemo, 3:54)
Compute Dropout Statistics
Description
This is the second step in conducting dropout analysis with dropR. Outputs all necessary statistics to analyze and visualize dropout, such as the sample size N of the data (and in each condition if selected), cumulative dropout and remaining participants in absolute numbers and percent. If no experimental condition is added, the stats are only calculated for the whole data in total.
Usage
compute_stats(df, by_cond = "None", no_of_vars)
Arguments
df | data.frame containing variable |
by_cond | character name of condition variable in the data, defaults to 'None' to output total statistics. |
no_of_vars | numeric number of variables that contain questions |
Value
A data frame with 6 columns (q_idx, condition, cs, N, remain, pct_remain) and as many rows as questions in original data (for overall data and if conditions selected again for each condition).
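To make the columns more concrete, here is a rough base R sketch that derives cumulative dropout and remaining participants from a vector of dropout indices. It is not the compute_stats() implementation: the counting convention (a participant "remains" at question q if the last answered question is q or later) and the use of a proportion rather than a percentage for pct_remain are assumptions of this example.

```r
# Assumed toy input: last answered question for six participants, four items
do_idx  <- c(4, 1, 2, 4, 3, 4)
n_items <- 4
N       <- length(do_idx)

q_idx  <- seq_len(n_items)
# Assumption: a participant remains at question q if do_idx >= q
remain <- vapply(q_idx, function(q) sum(do_idx >= q), integer(1))

stats <- data.frame(
  q_idx      = q_idx,
  cs         = N - remain,   # cumulative number of dropouts
  N          = N,
  remain     = remain,
  pct_remain = remain / N    # proportion here; the package may report percent
)
stats
```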
Examples
do_stats <- compute_stats(df = add_dropout_idx(dropRdemo, 3:54),
                          by_cond = "experimental_condition",
                          no_of_vars = 52)
Compute Chisq-Test Given a Question Position
Description
This function performs a chi-squared contingency table test on dropout for a given question in the data. Note that the input data should be in the format as computed by compute_stats(). The test can be performed on either all conditions (excluding total) or on select conditions.
Usage
do_chisq(do_stats, chisq_question, sel_cond_chisq, p_sim = TRUE)
Arguments
do_stats | data.frame of dropout statistics as computed by compute_stats(). |
chisq_question | numeric Which question to compare dropout at. |
sel_cond_chisq | vector (same class as in conditions variable in original data set) selected conditions. |
p_sim | boolean Simulate p value parameter (by Monte Carlo simulation)? Defaults to TRUE. |
Value
Returns test results from chisq.test between experimental conditions at defined question.
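As a rough illustration of the kind of test involved, the following base R sketch builds a contingency table of remaining versus dropped participants for two conditions and passes it to stats::chisq.test() with a simulated p value. The counts and condition names are invented for the example, and the way do_chisq() assembles its table internally may differ.

```r
# Invented counts at one question: participants remaining and total N per condition
remain <- c(cond_a = 40L, cond_b = 25L)
N      <- c(cond_a = 60L, cond_b = 60L)

# Rows: outcome, columns: condition
tab <- rbind(remained = remain, dropped = N - remain)

set.seed(1)  # the Monte Carlo p value depends on the random seed
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
res
```

With simulate.p.value = TRUE no continuity correction is applied to the test statistic, and the p value is estimated from B random tables with the same margins.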
See Also
add_dropout_idx() and compute_stats() which are necessary for the proper data structure.
Examples
do_stats <- compute_stats(add_dropout_idx(dropRdemo, 3:54),
                          by_cond = "experimental_condition",
                          no_of_vars = 52)
do_chisq(do_stats, 47, c(12, 22), TRUE)
Kaplan-Meier Survival Estimation
Description
This function needs a data set with a dropout index added by add_dropout_idx(). The do_kpm function performs survival analysis with Kaplan-Meier Estimation and returns a list containing survival steps, the original data frame, and the model fit type. The function can fit the survival model either for the entire data set or separately by a specified condition column.
Usage
do_kpm(df, condition_col = "experimental_condition", model_fit = "total")
Arguments
df | data set with |
condition_col | character denoting the experimental conditions to model |
model_fit | character Should be either "total" for a total model or "conditions" |
Value
Returns a list containing steps (survival steps extracted from the fitted models), d (the original data frame), and model_fit (the model fit type).
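For readers unfamiliar with Kaplan-Meier estimation on dropout data, here is a small sketch using the survival package directly. The toy data and the coding of the event indicator (participants who reached the last question are treated as censored) are assumptions of this example and are not taken from the do_kpm() source.

```r
library(survival)

# Assumed toy data: last answered question out of five items
d <- data.frame(do_idx = c(5, 2, 5, 3, 5, 1, 4, 5))

# Assumption: reaching the last item means no dropout was observed (censored)
d$event <- as.integer(d$do_idx < 5)

# Kaplan-Meier fit for the whole sample
fit <- survfit(Surv(do_idx, event) ~ 1, data = d)

data.frame(time = fit$time, surv = fit$surv,
           lower = fit$lower, upper = fit$upper)
```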
See Also
survival::Surv() used to fit survival object.
Examples
demo_kpm <- do_kpm(df = add_dropout_idx(dropRdemo, 3:54),
                   condition_col = "experimental_condition",
                   model_fit = "total")
head(demo_kpm$steps)
Compute Kolmogorov-Smirnov Test for most extreme conditions
Description
This test is used for survival analysis between the most extreme conditions, so the ones with the most different rates of dropout. This function automatically prepares your data and runs stats::ks.test() on it.
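The idea can be sketched in base R as follows: pick the two conditions with the highest and lowest share of remaining participants at the chosen question, then compare their curves with stats::ks.test(). The curves and condition names below are invented, and the data preparation inside do_ks() may differ from this illustration.

```r
# Invented proportions of remaining participants across five questions
curves <- list(
  cond_a = c(0.98, 0.95, 0.91, 0.88, 0.84),
  cond_b = c(0.97, 0.90, 0.82, 0.75, 0.66),
  cond_c = c(0.99, 0.93, 0.87, 0.80, 0.74)
)
question <- 5

# Share remaining at the chosen question, per condition
at_q  <- vapply(curves, function(p) p[question], numeric(1))
most  <- names(which.max(at_q))   # condition with the least dropout
least <- names(which.min(at_q))   # condition with the most dropout

# Two-sample Kolmogorov-Smirnov test on the two most extreme curves
res <- ks.test(curves[[most]], curves[[least]])
res
```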
Usage
do_ks(do_stats, question)
Arguments
do_stats | A data frame made from compute_stats(). |
question | Index of question to be included in analysis, commonly the last question of the survey. |
Value
Returns result of Kolmogorov-Smirnov test including which conditions have the most different dropout rates.
Examples
do_stats <- compute_stats(df = add_dropout_idx(dropRdemo, 3:54),
                          by_cond = "experimental_condition",
                          no_of_vars = 52)
do_ks(do_stats, 52)
Dropout Odds Ratio Table
Description
This function calculates an Odds Ratio table at a given question for selected experimental conditions. It needs data in the format as created by compute_stats() as input.
Usage
do_or_table(do_stats, chisq_question, sel_cond_chisq)
Arguments
do_stats | data.frame statistics table as computed by compute_stats(). |
chisq_question | numeric Which question to calculate the OR table for |
sel_cond_chisq | character vector naming the experimental conditions to compare |
Value
Returns a matrix containing the odds ratios of dropout between all selected conditions.
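A pairwise odds ratio matrix of this kind can be sketched in base R with outer(). The proportions below are invented, and the orientation (row condition divided by column condition) is an assumption of this example, not necessarily the layout returned by do_or_table().

```r
# Invented proportions of remaining participants at one question
p <- c("11" = 0.80, "12" = 0.75, "21" = 0.60)

# Odds of remaining in each condition
odds <- p / (1 - p)

# Assumption: cell [i, j] holds odds of condition i divided by odds of condition j
or_table <- outer(odds, odds, "/")
round(or_table, 2)
```

The diagonal is 1 by construction, and each cell below the diagonal is the reciprocal of its mirror cell above it.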
Examples
do_stats <- compute_stats(df = add_dropout_idx(dropRdemo, 3:54),
                          by_cond = "experimental_condition",
                          no_of_vars = 52)
do_or_table(do_stats, chisq_question = 51, sel_cond_chisq = c("11", "12", "21", "22"))
Calculate Steps for Uneven Data Points
Description
The do_steps function calculates steps for data points represented by numbers of questions from the original experimental or survey data in x and remaining percent of participants in y.
Usage
do_steps(x, y, return_df = TRUE)
Arguments
x | Numeric vector representing the question numbers |
y | Numeric vector representing the remaining percent of participants |
return_df | Logical. If TRUE, the function returns a data frame; otherwise, it returns a list. |
Details
Due to the nature of dropout/survival data, step functions are necessary to accurately depict participants remaining. Dropout data includes the time until the event (a.k.a. dropout at a certain question or time), so that changes in remaining participants are discrete rather than continuous. This means that changes in survival probability occur at specific points and are better represented as steps than as a continuum.
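One common way to obtain such a staircase is to duplicate points so that each value is held until the next question is reached. The helper make_steps() below is a hypothetical base R sketch of that idea; it is not the do_steps() implementation and its output layout may differ from what the package returns.

```r
# Hypothetical helper (not part of dropR): expand (x, y) into staircase points
make_steps <- function(x, y) {
  n <- length(x)
  stopifnot(n == length(y), n >= 2)
  data.frame(
    x = c(x[1], rep(x[-1], each = 2)),   # every later x appears twice
    y = c(rep(y[-n], each = 2), y[n])    # each y is held until the next x
  )
}

steps <- make_steps(c(1, 2, 3), c(100, 95, 90))
steps
```

Connecting the resulting points with straight lines draws a horizontal segment at each level followed by a vertical drop at the next question.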
Value
Returns a data frame or a list containing the modified x and y values.
Examples
x <- c(1, 2, 3, 4, 5)
y <- c(100, 100, 95, 90, 85)
do_steps(x, y)
# Using the example dataset dropRdemo
do_stats <- compute_stats(df = add_dropout_idx(dropRdemo, 3:54),
                          by_cond = "experimental_condition",
                          no_of_vars = 52)
tot_stats <- do_stats[do_stats$condition == "total", ]
do_steps(tot_stats$q_idx, tot_stats$pct_remain)
Demo Dataset for Dropout in an Online Survey
Description
Simulated demo data set for dropout in a survey.
Format
A data frame with 246 rows and 54 variables (in the order they were presented in the fictional survey).
- obs_id
Observation ID
- experimental_condition
experimental condition
- vi_1
item 1
- vi_2
item 2
- vi_3
item 3
- vi_4
item 4
- vi_5
item 5
- vi_6
item 6
- vi_7
item 7
- vi_8
item 8
- vi_9
item 9
- vi_10
item 10
- vi_11
item 11
- vi_12
item 12
- vi_13
item 13
- vi_14
item 14
- vi_15
item 15
- vi_16
item 16
- vi_17
item 17
- vi_18
item 18
- vi_19
item 19
- vi_20
item 20
- vi_21
item 21
- vi_22
item 22
- vi_23
item 23
- vi_24
item 24
- vi_25
item 25
- vi_26
item 26
- vi_27
item 27
- vi_28
item 28
- vi_29
item 29
- vi_30
item 30
- vi_31
item 31
- vi_32
item 32
- vi_33
item 33
- vi_34
item 34
- vi_35
item 35
- vi_36
item 36
- vi_37
item 37
- vi_38
item 38
- vi_39
item 39
- vi_40
item 40
- vi_41
item 41
- vi_42
item 42
- vi_43
item 43
- vi_44
item 44
- vi_45
item 45
- vi_46
item 46
- vi_47
item 47
- vi_48
item 48
- vi_49
item 49
- vi_50
item 50
- vi_51
item 51
- vi_52
item 52
Source
dropRdemo Demo data for dropout.
Compute Odds From Probabilities
Description
Compute odds from probabilities. The function is vectorized and can handle a vector of probabilities, e.g. remaining percent of participants as calculated by compute_stats().
Usage
get_odds(p)
Arguments
p | vector of probabilities. May not be larger than 1 or smaller than zero. |
Value
Returns numerical vector of the same length as original input reflecting the odds.
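The standard definition of odds for a probability p is p / (1 - p). The helper odds_from_prob() below is a hypothetical base R sketch of that definition with a simple range check; it is shown for illustration and is not claimed to match the get_odds() source.

```r
# Hypothetical helper (not part of dropR): odds = p / (1 - p), vectorized
odds_from_prob <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  p / (1 - p)
}

demo_odds <- odds_from_prob(c(0.5, 0.75))
demo_odds
```

A probability of 0.5 corresponds to even odds of 1, and a probability of 1 yields infinite odds under this definition.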
Examples
get_odds(0.7)
get_odds(c(0.7, 0.2))
Compute Odds Ratio
Description
Computes odds ratio given two probabilities. In this package, the function can be used to compare the percentages of remaining participants between two conditions at a time.
Usage
get_odds_ratio(a, b)
Arguments
a | numeric probability value between 0 and 1. |
b | numeric probability value between 0 and 1. |
Value
Returns numerical vector of the same length as original input reflecting the Odds Ratio (OR).
See Also
get_odds(), as this is the basis for calculation.
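An odds ratio is the quotient of two odds. The sketch below uses a hypothetical helper odds_ratio() and assumes that the odds of a are divided by the odds of b; the direction of the ratio in get_odds_ratio() is not stated in this documentation.

```r
# Hypothetical helper (not part of dropR)
# Assumption: odds of a divided by odds of b
odds_ratio <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b),
            all(a >= 0 & a <= 1), all(b >= 0 & b <= 1))
  (a / (1 - a)) / (b / (1 - b))
}

or_demo <- odds_ratio(0.75, 0.5)
or_demo
```

Under this convention a value above 1 means the odds of remaining are higher for a than for b.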
Examples
get_odds_ratio(0.7, 0.6)
Get Steps Data by Condition
Description
The get_steps_by_cond function calculates steps data based on survival model results. This utility function is used inside the do_kpm() function of dropR.
Usage
get_steps_by_cond(sfit, condition = NULL)
Arguments
sfit | An object representing survival model results (e.g., from a Kaplan-Meier model). |
condition | Optional. An experimental condition to include in the output data frame, defaults to NULL. |
Value
Returns a data frame containing the steps data, including time, survival estimates, upper confidence bounds, and lower confidence bounds.
Test Survival Curve Differences
Description
This function compares survival curves as modeled with do_kpm(). It outputs a contingency table and a Chisq measure of difference.
Usage
get_survdiff(kds, cond, test_type)
Arguments
kds | data set of a survival model such as |
cond | character of experimental condition variable in the data |
test_type | numeric (0 or 1) parameter that controls the type of test (0 means rho = 0, the log-rank test; 1 means rho = 1, the Peto & Peto modification of the Gehan-Wilcoxon test) |
Value
Returns survival test results as called from survival::survdiff().
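The two test types correspond to the rho argument of survival::survdiff(). The sketch below calls survdiff() directly on invented data; the event coding (reaching the last of five questions is treated as censored) is an assumption of this example and not taken from the get_survdiff() source.

```r
library(survival)

# Invented toy data: last answered question out of five items, two conditions
d <- data.frame(
  do_idx = c(5, 2, 5, 3, 5, 1, 4, 5, 2, 1, 3, 5),
  cond   = rep(c("a", "b"), each = 6)
)
# Assumption: reaching the last item means no dropout was observed (censored)
d$event <- as.integer(d$do_idx < 5)

# rho = 0: log-rank test
logrank <- survdiff(Surv(do_idx, event) ~ cond, data = d, rho = 0)
# rho = 1: Peto & Peto modification of the Gehan-Wilcoxon test
peto <- survdiff(Surv(do_idx, event) ~ cond, data = d, rho = 1)

logrank
peto
```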
Examples
kpm_est <- do_kpm(add_dropout_idx(dropRdemo, 3:54))
get_survdiff(kpm_est$d, "experimental_condition", 0)
get_survdiff(kpm_est$d, "experimental_condition", 1)
Plot Dropout Curves
Description
This function uses ggplot2 to create dropout curves. Please note that you should use add_dropout_idx() and compute_stats() on your data before running this function as it needs a certain data structure and variables to work properly.
Usage
plot_do_curve(
  do_stats,
  linetypes = TRUE,
  stroke_width = 1,
  full_scale = TRUE,
  show_points = FALSE,
  show_confbands = FALSE,
  color_palette = "color_blind"
)
Arguments
do_stats | data.frame containing dropout statistics table computed by compute_stats(). |
linetypes | boolean Should different line types be used? Defaults to TRUE. |
stroke_width | numeric stroke width, defaults to 1. |
full_scale | boolean Should y axis range from 0 to 100? Defaults to TRUE, FALSE cuts off at min percent remaining (>0). |
show_points | boolean Should dropout curves show individual data points? Defaults to FALSE. |
show_confbands | boolean Should there be confidence bands added to the plot? Defaults to FALSE. |
color_palette | character indicating which color palette to use. Defaults to 'color_blind', alternatively choose 'gray' or 'default' for the ggplot2 default colors. |
Value
Returns a ggplot object containing the dropout curve plot. Using the Shiny App version of dropR, this plot can easily be downloaded in different formats.
See Also
add_dropout_idx() and compute_stats() which are necessary for the proper data structure.
Examples
do_stats <- compute_stats(add_dropout_idx(dropRdemo, 3:54),
                          by_cond = "experimental_condition",
                          no_of_vars = 52)
plot_do_curve(do_stats)
Plot a Kaplan Meier Survival Estimation
Description
The plot_do_kpm function generates a Kaplan-Meier survival plot based on the output from the do_kpm() function. It allows for customization of conditions to display, confidence intervals, color palettes, and y-axis scaling.
Usage
plot_do_kpm(
  kds,
  sel_conds = c("11", "12", "21", "22"),
  kpm_ci = TRUE,
  full_scale_kpm = FALSE,
  color_palette_kp = "color_blind"
)
Arguments
kds | list object as modeled by do_kpm(). |
sel_conds | character Which experimental conditions to plot. |
kpm_ci | boolean Should there be confidence bands in the plot? Defaults to TRUE. |
full_scale_kpm | boolean Should the Y axis show the full range from 0 to 100? Defaults to FALSE. |
color_palette_kp | character indicating which color palette to use. Defaults to 'color_blind', alternatively choose 'gray' for gray scale values or 'default' for the ggplot2 default colors. |
Value
Returns a ggplot object containing the Kaplan-Meier survival plot. Using the Shiny App version of dropR, this plot can easily be downloaded in different formats.
Examples
plot_do_kpm(do_kpm(df = add_dropout_idx(dropRdemo, 3:54),
                   condition_col = "experimental_condition",
                   model_fit = "total"))
plot_do_kpm(do_kpm(df = add_dropout_idx(dropRdemo, 3:54),
                   condition_col = "experimental_condition",
                   model_fit = "conditions"),
            sel_conds = c("11", "12", "21", "22"))
Plot Most Extreme Conditions to Visualize Kolmogorov-Smirnov Test Results
Description
With this function, you can easily plot the most extreme conditions, a.k.a. those with the most different dropout rates at a certain question. You need to define that question in the function call of do_ks() already, or just call that function directly inside the plot function.
Usage
plot_do_ks(
  do_stats,
  ks,
  linetypes = FALSE,
  show_confbands = FALSE,
  color_palette = c("#E69F00", "#CC79A7")
)
Arguments
do_stats | data.frame containing dropout statistics table computed by compute_stats(). |
ks | List of results from the do_ks() function. |
linetypes | boolean Should different line types be used? Defaults to FALSE. |
show_confbands | boolean Should there be confidence bands added to the plot? Defaults to FALSE. |
color_palette | character indicating which color palette to use. Defaults to color blind friendly values, alternatively choose 'gray' or create your own palette with two colors, e.g. using R |
Value
Returns a ggplot object containing the survival curve plot of the most extreme dropout conditions. Using the Shiny App version of dropR, this plot can easily be downloaded in different formats.
Examples
do_stats <- compute_stats(add_dropout_idx(dropRdemo, 3:54),
                          by_cond = "experimental_condition",
                          no_of_vars = 52)
ks <- do_ks(do_stats, 52)
plot_do_ks(do_stats, ks, color_palette = "gray")
# ... or call the do_ks() function directly inside the plotting function
plot_do_ks(do_stats, do_ks(do_stats, 30))
plot_do_ks(do_stats, ks, linetypes = TRUE, show_confbands = TRUE, color_palette = c("red", "violet"))
Start the dropR Shiny App
Description
Starts the interactive web application to use dropR in your web browser. Make sure to use Google Chrome or Firefox for the best experience.
Usage
start_app()
Details
The app will give less experienced R users or statisticians a good overview of how to conduct dropout analysis. For more experienced analysts, it can still be very helpful in guiding how to use the package, as there are some steps that should be taken in order. These steps are outlined in the app (as well as in the function documentation).
Value
No return value; starts the shiny app as a helper to get started with dropout analysis. All app procedures are available as functions.