| Title: | Risk-Adjusted Regression |
| Version: | 0.0.3 |
| Description: | Perform risk-adjusted regression and sensitivity analysis as developed in "Mitigating Omitted- and Included-Variable Bias in Estimates of Disparate Impact" Jung et al. (2024) <doi:10.48550/arXiv.1809.05651>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| URL: | https://rar.jgaeb.com, https://github.com/jgaeb/rar |
| BugReports: | https://github.com/jgaeb/rar/issues |
| RoxygenNote: | 7.2.3 |
| LinkingTo: | cpp11, testthat |
| Suggests: | broom, forcats, stringr, testthat (≥ 3.0.0), xml2 |
| Config/testthat/edition: | 3 |
| Imports: | dplyr, glue, magrittr, purrr, rlang, tibble, tidyr, tidyselect, vctrs |
| NeedsCompilation: | yes |
| Packaged: | 2024-01-23 19:03:16 UTC; jgaeb |
| Author: | Johann Gaebler |
| Maintainer: | Johann Gaebler <me@jgaeb.com> |
| Repository: | CRAN |
| Date/Publication: | 2024-01-24 17:30:05 UTC |
Perform sensitivity analysis on a risk-adjusted regression
Description
sens() performs sensitivity analysis on a risk-adjusted regression by computing the maximum and minimum regression coefficients consistent with the data and the analyst's prior knowledge, expressed through epsilon, the bound on the mean absolute difference between the true and estimated risks. It can additionally provide bootstrapped pointwise confidence intervals for the regression coefficients.
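To make the role of epsilon concrete, the following base R sketch computes the mean absolute difference between a vector of estimated risks and a vector of true risks. The true risks are never available in practice; `p_true` is simulated here purely for illustration, and both variable names are assumptions rather than anything defined by the package.

```r
# Hypothetical data: `p_hat` plays the role of the estimated risks and
# `p_true` the (normally unknown) true risks. Both are simulated.
set.seed(1)
p_hat <- runif(1000)^2
p_true <- pmin(pmax(p_hat + runif(1000, -0.05, 0.05), 0), 1)

# Mean absolute difference between the true and estimated risks.
mad_risk <- mean(abs(p_true - p_hat))

# An analyst who believes the discrepancy is no larger than this
# would pass a value of at least `mad_risk` as epsilon.
epsilon <- 0.1
mad_risk <= epsilon
```

Since the true risks are unobserved, epsilon encodes a judgment about how far off the estimates could plausibly be, not a quantity computed from the data.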
Usage
sens(
  df,
  group_col,
  obs_col,
  p_col,
  base_group,
  epsilon,
  lwr_col = NULL,
  upr_col = NULL,
  eta = 0.01,
  m = 101L,
  N = 0L,
  alpha = 0.05,
  chunk_size = 100L,
  n_threads = 1L
)
Arguments
df | The data frame containing the data. |
group_col | The name of the column containing the group labels. This column should be a factor or coercible to a factor. |
obs_col | The name of the column containing whether or not the outcome was observed. This column should be a logical or coercible to a logical. |
p_col | The name of the column containing the estimated risks. These risks should be expressed on the probability scale, i.e., be between 0 and 1. |
base_group | The name of the base group. This group will be used as the reference group in the regression. |
epsilon | The bound on the mean absolute difference between the true and estimated risks. |
lwr_col | The name of the column containing the lower bounds on the true risk. (Defaults to 0 for all observations.) |
upr_col | The name of the column containing the upper bounds on the true risk. (Defaults to 1 for all observations.) |
eta | The step size for the grid search. Note that while steps are taken at the group level, the step size is expressed at the level of change in average risk across the entire population. In other words, smaller groups will have proportionally larger steps. (Defaults to 0.01.) |
m | The grid size for the maximization approximation. (Defaults to 101.) |
N | The number of bootstrap resamples to use to compute pointwise confidence intervals. (Defaults to 0, which performs no bootstrap.) |
alpha | The significance level for the pointwise confidence intervals; the intervals have nominal coverage 1 - alpha. (Defaults to 0.05.) |
chunk_size | The number of repetitions to perform in each chunk when run in parallel. Larger chunk sizes make it less likely that separate threads will block on each other, but also make it more likely that the threads will finish at different times. (Defaults to 100.) |
n_threads | The number of threads to use when running in parallel. (Defaults to 1, i.e., serial execution.) |
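The note on eta says that steps are sized at the population level, so smaller groups take proportionally larger steps. One way to read this, offered as an interpretation and not as the package's confirmed implementation, is that a change of `delta` in a group's average risk moves the population average by `delta * n_g / n`. The group sizes below are hypothetical.

```r
# Hypothetical group sizes; not taken from the package or its examples.
n <- c(a = 900, b = 100)
eta <- 0.01

# Assumed relationship: a within-group change of `delta` shifts the
# population average risk by delta * n_g / sum(n). Setting that shift
# equal to eta and solving for delta gives the implied group-level step.
group_step <- eta * sum(n) / n
group_step
```

Under this reading, the smaller group `b` moves in within-group steps nine times larger than those of group `a` for the same eta.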
Value
A data frame containing the following columns:
epsilon: Values of epsilon ranging from 0 to the input value of epsilon in m steps.
beta_min_{group}: The minimum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
beta_max_{group}: The maximum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
(If N > 0) beta_min_{group}_{alpha/2}: The alpha/2 quantile of the bootstrap distribution of the minimum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
(If N > 0) beta_min_{group}_{1 - alpha/2}: The 1 - alpha/2 quantile of the bootstrap distribution of the minimum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
(If N > 0) beta_max_{group}_{alpha/2}: The alpha/2 quantile of the bootstrap distribution of the maximum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
(If N > 0) beta_max_{group}_{1 - alpha/2}: The 1 - alpha/2 quantile of the bootstrap distribution of the maximum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
Details
The sensitivity analysis assumes that every group contains at least one observed and one unobserved individual, and that the estimated risks and upper and lower bounds are "sortable," i.e., that there exists a permutation of the rows such that the estimated risks and upper and lower bounds are all non-decreasing within each group and observation status. If these conditions are not met, the function will throw an error.
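The "sortable" condition can be checked ahead of time. The sketch below is a hypothetical base R helper, not part of rar, and the column names `group`, `obs`, `p`, `lwr`, and `upr` are assumptions. It sorts each group and observation-status cell lexicographically by risk, lower bound, and upper bound, then tests whether all three columns are non-decreasing in that order. If any ordering of the rows works, the lexicographic one does too.

```r
# Hypothetical helper (not part of rar): TRUE if, within every
# group x observation-status cell, one ordering of the rows makes
# p, lwr, and upr all non-decreasing.
is_sortable <- function(df) {
  cells <- split(df, list(df$group, df$obs), drop = TRUE)
  all(vapply(cells, function(cell) {
    ord <- order(cell$p, cell$lwr, cell$upr)
    !is.unsorted(cell$p[ord]) &&
      !is.unsorted(cell$lwr[ord]) &&
      !is.unsorted(cell$upr[ord])
  }, logical(1)))
}

# Illustrative data in which the bounds move together with the risks.
ok <- data.frame(
  group = c("a", "a", "a", "a"),
  obs = c(TRUE, TRUE, FALSE, FALSE),
  p = c(0.2, 0.5, 0.1, 0.3),
  lwr = c(0.1, 0.4, 0.0, 0.2),
  upr = c(0.3, 0.6, 0.2, 0.4)
)

# Same data, but the lower bound now falls as the risk rises.
bad <- ok
bad$lwr[1:2] <- c(0.15, 0.05)

is_sortable(ok)
is_sortable(bad)
```

A check like this only anticipates the error described above; whether it matches the package's internal validation exactly is not confirmed here.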
To ensure that these conditions continue to hold, the bootstrap resamples are stratified by group and observation status. As a result, in small samples, the confidence intervals may be slightly narrowed, since they do not account for uncertainty in the number of individuals in each group, and the number of observed and unobserved individuals within each group.
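A minimal base R sketch of resampling stratified by group and observation status follows. It illustrates the general technique only: the package performs its bootstrap in compiled code, and the helper name and the column names `group` and `obs` are hypothetical.

```r
# Hypothetical helper (not part of rar): draw one bootstrap resample in
# which every group x observation-status cell keeps its original size.
stratified_resample <- function(df) {
  cells <- split(seq_len(nrow(df)), list(df$group, df$obs), drop = TRUE)
  idx <- unlist(
    lapply(cells, function(i) i[sample.int(length(i), replace = TRUE)]),
    use.names = FALSE
  )
  df[idx, , drop = FALSE]
}

set.seed(1)
df <- data.frame(
  group = sample(c("a", "b"), 200, replace = TRUE),
  obs = runif(200) < 0.3
)
boot <- stratified_resample(df)

# Cell counts match between the original data and the resample.
all(table(df$group, df$obs) == table(boot$group, boot$obs))
```

Because the cell counts are held fixed, the resamples carry no variation in group sizes or observation rates, which is the source of the slight narrowing mentioned above.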
Examples
# Generate some data
set.seed(1)
df <- tibble::tibble(
  group = factor(
    sample(c("a", "b"), 1000, replace = TRUE),
    levels = c("a", "b")
  ),
  p = runif(1000)^2,
  frisked = runif(1000) < p + 0.1 * (group != "a")
)

# Compute the sensitivity analysis
sens(df, group, frisked, p, "a", 0.1)

# Search over a finer grid
sens(df, group, frisked, p, "a", 0.1, eta = 0.001)

# Increase the accuracy of the maximization approximation
sens(df, group, frisked, p, "a", 0.1, m = 1001)

# Calculate 90% pointwise confidence intervals
sens(df, group, frisked, p, "a", 0.1, N = 1000, alpha = 0.1)

# Run in parallel, adjusting the chunk size to avoid blocking
sens(df, group, frisked, p, "a", 0.1, n_threads = 2, eta = 0.0001, chunk_size = 1000)