| Title: | Detect Clinical Trial Sites Over- or Under-Reporting Clinical Events |
| Version: | 1.0.0 |
| Description: | Monitoring reporting rates of subject-level clinical events (e.g. adverse events, protocol deviations) reported by clinical trial sites is an important aspect of a risk-based quality monitoring strategy. Sites that are under-reporting or over-reporting events can be detected using bootstrap simulations during which patients are redistributed between sites. Site-specific distributions of event reporting rates are generated and used to assign probabilities to the observed reporting rates (Koneswarakantha 2024 <doi:10.1007/s43441-024-00631-8>). |
| URL: | https://openpharma.github.io/simaerep/, https://github.com/openpharma/simaerep/ |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.0), ggplot2 |
| Imports: | dplyr (≥ 1.1.0), tidyr (≥ 1.1.0), magrittr, purrr, rlang, stringr, forcats, cowplot, RColorBrewer, furrr (≥ 0.2.1), progressr, knitr, tibble, dbplyr, glue |
| Suggests: | testthat, devtools, pkgdown, spelling, haven, vdiffr, lintr, DBI, duckdb, ggExtra |
| RoxygenNote: | 7.3.2 |
| Language: | en-US |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2025-10-28 11:22:11 UTC; koneswab |
| Author: | Bjoern Koneswarakantha |
| Maintainer: | Bjoern Koneswarakantha <bjoern.koneswarakantha@roche.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-28 11:40:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Value
returns output of rhs function
Aggregate duplicated visits.
Description
Internal function called by check_df_visit().
Usage
aggr_duplicated_visits(df_visit, event_names = "ae")
Arguments
df_visit | dataframe with columns: study_id, site_number, patnum, visit, n_ae |
event_names | vector, contains the event names, default = "ae" |
Value
df_visit corrected
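A minimal base R sketch of what aggregating duplicated visits can look like. The helper name collapse_duplicated_visits() is hypothetical, and keeping the maximum cumulative count per patient and visit is an assumption of this sketch, not a statement about the internal implementation.

```r
# Hypothetical helper, not part of simaerep: collapse duplicated
# patient-visit rows by keeping the maximum cumulative event count
# (the choice of max() is an assumption).
collapse_duplicated_visits <- function(df_visit, event_col = "n_ae") {
  keys <- c("study_id", "site_number", "patnum", "visit")
  out <- aggregate(
    df_visit[event_col],
    by = df_visit[keys],
    FUN = max
  )
  # restore a stable ordering by study, site, patient and visit
  out[do.call(order, unname(as.list(out[keys]))), , drop = FALSE]
}

# made-up data: patient P1 has two rows for visit 2
df_visit <- data.frame(
  study_id = "A",
  site_number = "S1",
  patnum = c("P1", "P1", "P1", "P2"),
  visit = c(1, 2, 2, 1),
  n_ae = c(0, 1, 2, 0)
)

df_corr <- collapse_duplicated_visits(df_visit)
df_corr
```

The result has one row per patient and visit, which is the shape the downstream aggregation steps expect.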
Integrity check for df_visit.
Description
Internal function used by all functions that accept df_visit as a parameter. Checks for NA columns, numeric visits and AEs, implicitly missing and duplicated visits.
Usage
check_df_visit(df_visit, event_names = c("event"))
Arguments
df_visit | dataframe with columns: study_id, site_number, patnum, visit, n_ae |
event_names | vector, contains the event names, default = "event" |
Value
corrected df_visit
See Also
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    site_number = site_id,
    patnum = patient_id
  )

df_visit_filt <- df_visit %>%
  dplyr::filter(visit != 3)

df_visit_corr <- check_df_visit(df_visit_filt)
3 %in% df_visit_corr$visit
nrow(df_visit_corr) == nrow(df_visit)

df_visit_corr <- check_df_visit(dplyr::bind_rows(df_visit, df_visit))
nrow(df_visit_corr) == nrow(df_visit)

Evaluate sites.
Description
Correct under-reporting probabilities using p.adjust.
Usage
eval_sites(
  df_sim_sites,
  method = "BH",
  under_only = TRUE,
  visit_med75 = TRUE,
  ...
)
Arguments
df_sim_sites | dataframe generated by |
method | character, passed to stats::p.adjust(), if NULL no multiplicity correction will be made. |
under_only | Logical, compute under-reporting probabilities only. Only applies to the classic algorithm, in which a one-sided evaluation can save computation time. Default: TRUE |
visit_med75 | Logical, should evaluation point visit_med75 be used. Compatible with inframe and classic version of the algorithm. Default: TRUE |
... | use to pass r_sim_sites parameter to eval_sites_deprecated() |
Value
dataframe with the following columns:
- study_id
study identification
- site_number
site identification
- visit_med75
median(max(visit)) * 0.75
- mean_ae_site_med75
mean AE at visit_med75 site level
- mean_ae_study_med75
mean AE at visit_med75 study level
- pval
p-value as returned by poisson.test
- prob
bootstrapped probability
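The multiplicity correction can be illustrated with stats::p.adjust() alone. The site-level values below are made up, and the column name pval_adj is an assumption of this sketch rather than a column returned by eval_sites().

```r
# Illustrative site-level p-values (made-up numbers)
df_sites <- data.frame(
  site_number = c("S1", "S2", "S3", "S4"),
  pval = c(0.001, 0.02, 0.04, 0.60)
)

# Benjamini-Hochberg correction across all sites of one study;
# method = "BH" mirrors the default of the method argument
df_sites$pval_adj <- stats::p.adjust(df_sites$pval, method = "BH")
df_sites
```

Adjusted values are never smaller than the raw values, so fewer sites cross a given threshold after correction.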
See Also
site_aggr, sim_sites, sim_inframe, p.adjust
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)
df_sim_sites <- sim_sites(df_site, df_visit, r = 100)
df_eval <- eval_sites(df_sim_sites)
df_eval

Expose implicitly missing visits.
Description
Internal function called by check_df_visit().
Usage
exp_implicit_missing_visits(df_visit, event_names = "ae")
Arguments
df_visit | dataframe with columns: study_id, site_number, patnum, visit, n_ae |
event_names | vector, contains the event names, default = "ae" |
Value
df_visit corrected
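A minimal base R sketch of exposing implicitly missing visits for a single patient. The helper name fill_missing_visits() is hypothetical, and carrying the last cumulative count forward into the gap is an assumption of this sketch.

```r
# Hypothetical helper, not part of simaerep: add rows for visits that are
# missing from one patient's sequence and carry the last cumulative count
# forward (carry-forward is an assumption). Expects unique visits.
fill_missing_visits <- function(df_pat) {
  df_pat <- df_pat[order(df_pat$visit), ]
  all_visits <- seq(min(df_pat$visit), max(df_pat$visit))
  # index of the last observed visit at or before each expected visit
  idx <- findInterval(all_visits, df_pat$visit)
  out <- df_pat[idx, , drop = FALSE]
  out$visit <- all_visits
  rownames(out) <- NULL
  out
}

# made-up patient with visit 3 missing
df_pat <- data.frame(
  patnum = "P1",
  visit = c(1, 2, 4, 5),
  n_ae = c(0, 1, 3, 3)
)

df_filled <- fill_missing_visits(df_pat)
df_filled
```

Visit 3 appears in the output with the cumulative count observed at visit 2.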
Get cumulative mean event development
Description
Calculate average increase of events per visit and cumulative average increase.
Usage
get_cum_mean_event_dev(
  df_visit,
  group = c("site_number", "study_id"),
  event_names = c("ae")
)
Arguments
df_visit | Data frame with columns: study_id, site_number, patnum, visit, n_ae. |
group | character, grouping variable, one of: c("site_number", "study_id") |
event_names | vector, contains the event names, default = "ae" |
Details
This is more stable than using mean cumulative patient count per visit as only a few patients will contribute to later visits. Here the impact of the later visits is reduced as they can only add or subtract to the results from earlier visits and not shift the mean independently.
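The difference between the two approaches can be seen on a small made-up data set. This is a base R sketch of the idea (mean increase per visit, then cumulative sum), not the package implementation; the column names delta, cum_mean_dev and mean_cum are assumptions.

```r
# Toy visit-level data with cumulative event counts per patient (made up)
df_visit <- data.frame(
  patnum = rep(c("P1", "P2", "P3"), times = c(4, 3, 2)),
  visit  = c(1:4, 1:3, 1:2),
  n_ae   = c(0, 1, 1, 3, 1, 2, 2, 0, 2)
)

# events added at each visit, per patient (first visit counts from zero)
df_visit <- df_visit[order(df_visit$patnum, df_visit$visit), ]
df_visit$delta <- ave(
  df_visit$n_ae, df_visit$patnum,
  FUN = function(x) c(x[1], diff(x))
)

# average increase per visit, then accumulate over visits
mean_dev <- aggregate(delta ~ visit, data = df_visit, FUN = mean)
mean_dev$cum_mean_dev <- cumsum(mean_dev$delta)

# naive alternative: mean cumulative count of the patients seen at each visit
mean_dev$mean_cum <- aggregate(n_ae ~ visit, data = df_visit, FUN = mean)$n_ae

mean_dev
```

In this toy data the naive mean drops at visit 3 because patient P3 no longer contributes, while the cumulative mean development cannot decrease as long as no patient loses events.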
Examples
df_visit <- sim_test_data_study(n_pat = 1000, n_sites = 10) %>%
  dplyr::rename(
    site_number = site_id,
    patnum = patient_id,
    n_ae = n_event
  )

get_cum_mean_event_dev(df_visit)
get_cum_mean_event_dev(df_visit, group = "study_id")

Get df_visit_test
Description
Get df_visit_test
Usage
get_df_visit_test()

Get df_visit_test mapped
Description
Get df_visit_test mapped
Usage
get_df_visit_test_mapped()

replace cowplot::get_legend, to silence warning: Multiple components found; returning the first one. To return all, use 'return_all = TRUE'
Description
replace cowplot::get_legend, to silence warning: Multiple components found; returning the first one. To return all, use 'return_all = TRUE'
Usage
get_legend(p)

Get Portfolio Configuration
Description
Get Portfolio configuration from a df_visit input dataframe. Will filter studies with only a few sites and patients and will anonymize IDs. Portfolio configuration can be used by sim_test_data_portfolio to generate data for an artificial portfolio.
Usage
get_portf_config(
  df_visit,
  check = TRUE,
  min_pat_per_study = 100,
  min_sites_per_study = 10,
  anonymize = TRUE,
  pad_width = 4
)
Arguments
df_visit | input dataframe with columns study_id, site_id, patient_id, visit, n_events. Can also be a lazy database table. |
check | logical, perform standard checks on df_visit, Default: TRUE |
min_pat_per_study | minimum number of patients per study, Default: 100 |
min_sites_per_study | minimum number of sites per study, Default: 10 |
anonymize | logical, Default: TRUE |
pad_width | padding width for newly created IDs, Default: 4 |
Value
dataframe with the following columns:
- study_id
study identification
- event_per_visit_mean
mean event per visit per study
- site_id
site
- max_visit_sd
standard deviation of maximum patient visits per site
- max_visit_mean
mean of maximum patient visits per site
- n_pat
number of patients
See Also
sim_test_data_study, get_portf_config, sim_test_data_portfolio
Examples
df_visit1 <- sim_test_data_study(
  n_pat = 100,
  n_sites = 10,
  ratio_out = 0.4,
  factor_event_rate = - 0.6,
  study_id = "A"
)

df_visit2 <- sim_test_data_study(
  n_pat = 100,
  n_sites = 10,
  ratio_out = 0.2,
  factor_event_rate = - 0.1,
  study_id = "B"
)

df_visit <- dplyr::bind_rows(df_visit1, df_visit2)

get_portf_config(df_visit)

# Database example
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dplyr::copy_to(con, df_visit, "visit")
tbl_visit <- dplyr::tbl(con, "visit")
get_portf_config(tbl_visit)
DBI::dbDisconnect(con)

Get Portfolio Event Rates
Calculates mean event rates per study and visit in a df_visit simaerep input dataframe.
Description
Get Portfolio Event Rates
Calculates mean event rates per study and visit in a df_visit simaerep input dataframe.
Usage
get_portf_event_rates(df_visit, check = TRUE, anonymize = TRUE, pad_width = 4)
Arguments
df_visit | input dataframe with columns study_id, site_id, patient_id, visit, n_events. Can also be a lazy database table. |
check | logical, perform standard checks on df_visit, Default: TRUE |
anonymize | logical, Default: TRUE |
pad_width | padding width for newly created IDs, Default: 4 |
Examples
df_visit1 <- sim_test_data_study(
  n_pat = 100,
  n_sites = 10,
  ratio_out = 0.4,
  factor_event_rate = - 0.6,
  study_id = "A"
)

df_visit2 <- sim_test_data_study(
  n_pat = 100,
  n_sites = 10,
  ratio_out = 0.2,
  factor_event_rate = - 0.1,
  study_id = "B"
)

df_visit <- dplyr::bind_rows(df_visit1, df_visit2)

get_portf_event_rates(df_visit)

# Database example
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dplyr::copy_to(con, df_visit, "visit")
tbl_visit <- dplyr::tbl(con, "visit")
get_portf_event_rates(tbl_visit)
DBI::dbDisconnect(con)

Get site mean ae development.
Description
Internal function used by site_aggr(), returns mean AE development from visit 0 to visit_med75.
Usage
get_site_mean_ae_dev(df_visit, df_pat, df_site, event_names = c("ae"))
Arguments
df_visit | dataframe |
df_pat | dataframe as returned by pat_aggr() |
df_site | dataframe as returned by site_aggr() |
event_names | vector, contains the event names, default = "ae" |
Value
dataframe
Get visit_med75.
Description
Internal function used by site_aggr().
Usage
get_visit_med75(df_pat, method = "med75_adj", min_pat_pool = 0.2)
Arguments
df_pat | dataframe as returned by |
method | character, one of c("med75", "med75_adj", "max"), the method used to determine the evaluation point visit_med75 (see details), Default: "med75_adj" |
min_pat_pool | double, minimum ratio of patients available for sampling. Determines maximum visit_med75 value, see Details. Default: 0.2 |
Value
dataframe
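The basic evaluation point is documented elsewhere in this manual as median(max(visit)) * 0.75. A minimal base R sketch for a single site follows; the helper name visit_med75_site() is hypothetical, rounding to a whole visit is an assumption, and the adjustment made by the "med75_adj" method is not reproduced here.

```r
# Hypothetical helper, not part of simaerep: evaluation point for one site,
# taken as 75% of the median of the patients' maximum visits.
visit_med75_site <- function(max_visit_per_pat) {
  # rounding to a whole visit is an assumption of this sketch
  round(stats::median(max_visit_per_pat) * 0.75)
}

# maximum visit reached by each patient at the site (made-up numbers)
max_visit <- c(20, 18, 24, 16, 22)

med75 <- visit_med75_site(max_visit)
med75

# patients who have reached the evaluation point and can be evaluated
n_pat_with_med75 <- sum(max_visit >= med75)
n_pat_with_med75
```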
is orivisit class
Description
internal function
Usage
is_orivisit(x)
Arguments
x | object |
Value
logical
is simaerep class
Description
internal function
Usage
is_simaerep(x)
Arguments
x | object |
Value
logical
Calculate Max Rank
Description
like rank() with ties.method = "max", works on tbl objects
Usage
max_rank(df, col, col_new)
Arguments
df | dataframe |
col | character, name of the column to rank by |
col_new | character column name for rankings |
Details
This is needed for the Benjamini-Hochberg p-value adjustment. We need to assign the higher rank when multiple sites have the same p-value.
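The role of the maximum rank can be checked in base R. The sketch below computes the rank as a count of values less than or equal to each value, then derives Benjamini-Hochberg adjusted values from it and compares them with stats::p.adjust(). The p-values are made up and the variable names are assumptions of this sketch.

```r
# made-up p-values with a three-way tie
p <- c(0.01, 0.04, 0.04, 0.04, 0.20, 0.50)

# maximum rank: number of values less than or equal to each value
max_rank_manual <- vapply(p, function(x) sum(p <= x), integer(1))
stopifnot(all(max_rank_manual == rank(p, ties.method = "max")))

# Benjamini-Hochberg from ranks: scale by n / rank, then enforce
# monotonicity with a running minimum from the largest p-value downwards
n <- length(p)
scaled <- p * n / max_rank_manual
ord <- order(p, decreasing = TRUE)
p_adj <- numeric(n)
p_adj[ord] <- pmin(1, cummin(scaled[ord]))

# agrees with the reference implementation
all.equal(p_adj, stats::p.adjust(p, method = "BH"))
```

With the maximum rank, tied p-values receive the same scaled value directly, which is convenient when the calculation has to be expressed as table operations.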
Examples
df <- tibble::tibble(s = c(1, 2, 2, 2, 5, 10)) %>%
  dplyr::mutate(
    rank = rank(s, ties.method = "max")
  )

df %>%
  simaerep:::max_rank("s", "max_rank")

# Database
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dplyr::copy_to(con, df, "df")
simaerep:::max_rank(dplyr::tbl(con, "df"), "s", "max_rank")
DBI::dbDisconnect(con)

create orivisit object
Description
Internal S3 object, stores lazy reference to original visit data.
Usage
orivisit(
  df_visit,
  call = NULL,
  env = parent.frame(),
  event_names = c("event"),
  col_names = list(
    study_id = "study_id",
    site_id = "site_id",
    patient_id = "patient_id",
    visit = "visit"
  )
)
Arguments
df_visit | Data frame with columns: study_id, site_number, patnum, visit, n_ae. |
call | optional, provide call, Default: NULL |
env | Optional, provide environment of original visit data. Default: parent.frame(). |
event_names | vector, contains the event names, default = "event" |
col_names | named list, indicate study_id, site_id, patient_id and visit column in df_visit input dataframe. Default: list(study_id = "study_id", site_id = "site_id", patient_id = "patient_id", visit = "visit") |
Details
Saves variable name of original visit data, checks whether it can be retrieved from parent environment and stores summary. Original data can be retrieved using as.data.frame(x).
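The idea of a lazy reference can be sketched in base R: keep the variable name and its environment instead of a copy of the data, and look the data up on demand. The helpers lazy_ref() and deref() are hypothetical and only illustrate the mechanism; they are not the orivisit implementation.

```r
# Hypothetical helper, not part of simaerep: store the name and environment
# of a data frame, plus a small summary, instead of the data itself.
lazy_ref <- function(df, env = parent.frame()) {
  name <- deparse(substitute(df))
  # check that the data can be found again from the stored environment
  stopifnot(exists(name, envir = env))
  structure(
    list(name = name, env = env, dim = dim(df)),
    class = "lazy_ref"
  )
}

# Hypothetical helper: retrieve the original data on demand
deref <- function(x) {
  get(x$name, envir = x$env)
}

df_visit <- data.frame(patnum = c("P1", "P1"), visit = 1:2, n_ae = c(0, 1))

ref <- lazy_ref(df_visit)
ref$name
identical(deref(ref), df_visit)
```

The reference stays small regardless of the size of the visit data, but retrieval fails if the original variable is removed or renamed, which is why the function signature above accepts an explicit env.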
Value
orivisit object
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
)

visit <- orivisit(df_visit)

object.size(df_visit)
object.size(visit)

as.data.frame(visit)

benjamini hochberg p value correction using table operations
Description
benjamini hochberg p value correction using table operations
Usage
p_adjust_bh_inframe(df_eval, cols)

Aggregate visit to patient level.
Description
Internal function used by site_aggr() and plot_visit_med75(), adds the maximum visit for each patient.
Usage
pat_aggr(df_visit)
Arguments
df_visit | dataframe |
Value
dataframe
Create a study specific patient pool for sampling
Description
Internal function for sim_sites, filters all visits greater than max_visit_med75_study, returns dataframe with one column for studies and one column with nested patient data.
Usage
pat_pool(df_visit, df_site)
Arguments
df_visit | dataframe, created by |
df_site | dataframe created by |
Value
dataframe with nested pat_pool column
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)
df_pat_pool <- simaerep:::pat_pool(df_visit, df_site)
df_pat_pool

plot AE under-reporting simulation results
Description
generic plot function for simaerep objects
Usage
## S3 method for class 'simaerep'
plot(
  x,
  ...,
  study = NULL,
  what = c("prob", "med75"),
  n_sites = 16,
  df_visit = NULL,
  env = parent.frame(),
  plot_event = x$event_names[1]
)
Arguments
x | simaerep object |
... | additional parameters passed to plot_study() or plot_visit_med75() |
study | character specifying study to be plotted, Default: NULL |
what | one of c("prob", "med75"), specifying whether to plot site AE under-reporting or visit_med75 values, Default: "prob" |
n_sites | number of sites to plot, Default: 16 |
df_visit | optional, pass original visit data if it cannot be retrieved from parent environment, Default: NULL |
env | optional, pass environment from which to retrieve original visit data, Default: parent.frame() |
plot_event | vector containing the events that should be plotted, default = "ae" |
Details
see plot_study() and plot_visit_med75()
Value
ggplot object
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
)

evrep <- simaerep(df_visit)

plot(evrep, what = "prob", study = "A")
plot(evrep, what = "med75", study = "A")

Plots AE per site as dots.
Description
This plot is meant to supplement the package documentation.
Usage
plot_dots(
  df,
  nrow = 10,
  ncols = 10,
  col_group = "site",
  thresh = NULL,
  color_site_a = "#BDBDBD",
  color_site_b = "#757575",
  color_site_c = "gold3",
  color_high = "#00695C",
  color_low = "#25A69A",
  size_dots = 10
)
Arguments
df | dataframe, cols = c('site', 'patients', 'n_ae') |
nrow | integer, number of rows, Default: 10 |
ncols | integer, number of columns, Default: 10 |
col_group | character, grouping column, Default: 'site' |
thresh | numeric, threshold to determine color of mean_ae annotation, Default: NULL |
color_site_a | character, hex color value, Default: '#BDBDBD' |
color_site_b | character, hex color value, Default: '#757575' |
color_site_c | character, hex color value, Default: 'gold3' |
color_high | character, hex color value, Default: '#00695C' |
color_low | character, hex color value, Default: '#25A69A' |
size_dots | integer, Default: 10 |
Value
ggplot object
Examples
study <- tibble::tibble(
  site = LETTERS[1:3],
  patients = c(list(seq(1, 50, 1)), list(seq(1, 40, 1)), list(seq(1, 10, 1)))
) %>%
  tidyr::unnest(patients) %>%
  dplyr::mutate(n_ae = as.integer(runif(min = 0, max = 10, n = nrow(.))))

plot_dots(study)

Plot simulation example.
Description
This plot supplements the package documentation.
Usage
plot_sim_example(
  substract_ae_per_pat = 0,
  size_dots = 10,
  size_raster_label = 12,
  color_site_a = "#BDBDBD",
  color_site_b = "#757575",
  color_site_c = "gold3",
  color_high = "#00695C",
  color_low = "#25A69A",
  title = TRUE,
  legend = TRUE,
  seed = 5
)
Arguments
substract_ae_per_pat | integer, subtract aes from patients at site C, Default: 0 |
size_dots | integer, Default: 10 |
size_raster_label | integer, Default: 12 |
color_site_a | character, hex color value, Default: '#BDBDBD' |
color_site_b | character, hex color value, Default: '#757575' |
color_site_c | character, hex color value, Default: 'gold3' |
color_high | character, hex color value, Default: '#00695C' |
color_low | character, hex color value, Default: '#25A69A' |
title | logical, include title, Default: T |
legend | logical, include legend, Default: T |
seed | pass seed for simulations Default: 5 |
Details
uses plot_dots() and adds 2 simulation panels, uses made-up site config with three sites A, B, C simulating site C
Value
ggplot
See Also
Examples
plot_sim_example(size_dots = 5)

Plot multiple simulation examples.
Description
This plot is meant to supplement the package documentation.
Usage
plot_sim_examples(substract_ae_per_pat = c(0, 1, 3), ...)
Arguments
substract_ae_per_pat | integer, Default: c(0, 1, 3) |
... | parameters passed to plot_sim_example() |
Details
This function is a wrapper for plot_sim_example()
Value
ggplot
See Also
Examples
plot_sim_examples(size_dot = 3, size_raster_label = 10)
plot_sim_examples()

Plot ae development of study and sites highlighting at risk sites.
Description
Most suitable visual representation of the AE under-reporting statistics.
Usage
plot_study(
  df_visit,
  df_site,
  df_eval,
  study,
  n_sites = 16,
  prob_col = "prob",
  event_names = c("ae"),
  plot_event = "ae",
  mult_corr = FALSE,
  delta = TRUE
)
Arguments
df_visit | dataframe, created by |
df_site | dataframe created by |
df_eval | dataframe created by |
study | study |
n_sites | integer number of most at risk sites, Default: 16 |
prob_col | character, denotes probability column, Default: "prob" |
event_names | vector, contains the event names, default = "ae" |
plot_event | vector containing the events that should be plotted, default = "ae" |
mult_corr | Logical, multiplicity correction, Default: FALSE |
delta | logical, show delta events on plot |
Details
Left panel shows mean AE reporting per site (light blue and dark blue lines) against mean AE reporting of the entire study (golden line). Single sites are plotted in descending order by AE under-reporting probability on the right panel, in which grey lines denote the cumulative AE count of single patients. Grey dots in the left panel plot indicate sites that were picked for single plotting. The AE under-reporting probability of sites shown as dark blue lines crossed the threshold of 95%. Numbers in the upper left corner indicate the ratio of patients that have been used for the analysis against the total number of patients. Patients that have not been on the study long enough to reach the evaluation point (visit_med75) will be ignored.
Value
ggplot
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)
df_sim_sites <- sim_sites(df_site, df_visit, r = 100)
df_eval <- eval_sites(df_sim_sites)

simaerep:::plot_study(df_visit, df_site, df_eval, study = "A")

Plot patient visits against visit_med75.
Description
Plots cumulative AEs against visits for patients at sites of given study and compares against visit_med75.
Usage
plot_visit_med75(
  df_visit,
  df_site = NULL,
  study_id_str,
  n_sites = 6,
  min_pat_pool = 0.2,
  verbose = TRUE,
  event_names = "ae",
  plot_event = "ae",
  ...
)
Arguments
df_visit | dataframe |
df_site | dataframe, as returned by |
study_id_str | character, specify study in study_id column |
n_sites | integer, Default: 6 |
min_pat_pool | double, minimum ratio of patients available for sampling. Determines maximum visit_med75 value, see Details. Default: 0.2 |
verbose | logical, Default: TRUE |
event_names | vector, contains the event names, default = "ae" |
plot_event | vector containing the events that should be plotted, default = "ae" |
... | not used |
Value
ggplot
Examples
df_visit <- sim_test_data_study(
  n_pat = 120,
  n_sites = 6,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
) %>%
  dplyr::rename(
    site_number = site_id,
    patnum = patient_id,
    n_ae = n_event
  )

df_site <- site_aggr(df_visit)

simaerep:::plot_visit_med75(df_visit, df_site, study_id_str = "A", n_site = 6)

Poisson test for vector with site AEs vs vector with study AEs.
Description
Internal function used by simaerep.
Usage
poiss_test_site_ae_vs_study_ae(site_ae, study_ae, visit_med75)
Arguments
site_ae | vector with AE numbers |
study_ae | vector with AE numbers |
visit_med75 | integer |
Details
sets pvalue=1 if mean AE site is greater than mean AE study or ttest gives error
Value
pval
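A minimal base R sketch of such a comparison using stats::poisson.test(). The function name poisson_site_vs_study() is hypothetical, and using patients times visit_med75 as the time base is an assumption of this sketch, not a confirmed detail of the internal function.

```r
# Hypothetical re-implementation for illustration only
poisson_site_vs_study <- function(site_ae, study_ae, visit_med75) {
  # rule from the Details above: no test if the site mean is greater
  if (mean(site_ae) > mean(study_ae)) {
    return(1)
  }
  # exposure as patient-visits up to the evaluation point (assumption)
  time_site <- length(site_ae) * visit_med75
  time_study <- length(study_ae) * visit_med75
  stats::poisson.test(
    x = c(sum(site_ae), sum(study_ae)),
    T = c(time_site, time_study)
  )$p.value
}

study_ae <- c(9, 8, 7, 9, 6, 7, 8)

# site reporting clearly fewer events than the study
pval_low <- poisson_site_vs_study(c(5, 3, 3, 2, 1, 6), study_ae, visit_med75 = 10)

# site reporting more events than the study, test is skipped
pval_high <- poisson_site_vs_study(c(11, 9, 8, 9, 10), study_ae, visit_med75 = 10)

c(pval_low = pval_low, pval_high = pval_high)
```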
See Also
Examples
simaerep:::poiss_test_site_ae_vs_study_ae(
  site_ae = c(5, 3, 3, 2, 1, 6),
  study_ae = c(9, 8, 7, 9, 6, 7, 8),
  visit_med75 = 10
)

simaerep:::poiss_test_site_ae_vs_study_ae(
  site_ae = c(11, 9, 8, 6, 3),
  study_ae = c(9, 8, 7, 9, 6, 7, 8),
  visit_med75 = 10
)

Prepare data for simulation.
Description
Internal function called by sim_sites. Collect AEs per patient at visit_med75 for site and study as a vector of integers.
Usage
prep_for_sim(df_site, df_visit)
Arguments
df_site | dataframe created by |
df_visit | dataframe, created by |
Value
dataframe
See Also
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)
df_prep <- simaerep:::prep_for_sim(df_site, df_visit)
df_prep

Print method for orivisit objects
Description
Print method for orivisit objects
Usage
## S3 method for class 'orivisit'
print(x, ..., n = 10)
Arguments
x | An object of class 'orivisit' |
... | Additional arguments passed to print (not used) |
n | Number of rows to display from the data frame (default: 10) |
Print method for simaerep objects
Description
Print method for simaerep objects
Usage
## S3 method for class 'simaerep'
print(x, ..., n = 10)
Arguments
x | An object of class 'simaerep' |
... | Additional arguments passed to print (not used) |
n | Number of rows to display from df_eval (default: 10) |
Calculate bootstrapped probability for obtaining a lower site mean AE number.
Description
Internal function used by sim_sites()
Usage
prob_lower_site_ae_vs_study_ae(site_ae, study_ae, r = 1000, under_only = TRUE)
Arguments
site_ae | vector with AE numbers |
study_ae | vector with AE numbers |
r | integer, denotes number of simulations, default = 1000 |
under_only | compute under-reporting probabilities only, default = TRUE |
Details
sets pvalue=1 if mean AE site is greater than mean AE study
Value
pval
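The bootstrap idea can be sketched in base R: repeatedly draw as many patients as the site has from a patient pool and count how often the simulated mean is at or below the observed site mean. The function name prob_lower_sketch() is hypothetical, and pooling site and study patients is an assumption of this sketch.

```r
# Hypothetical re-implementation for illustration only
prob_lower_sketch <- function(site_ae, study_ae, r = 1000) {
  mean_site <- mean(site_ae)
  # rule from the Details above: skip the simulation if the site mean is greater
  if (mean_site > mean(study_ae)) {
    return(1)
  }
  # pooling site and study patients is an assumption of this sketch
  pool <- c(site_ae, study_ae)
  sim_means <- replicate(
    r,
    mean(sample(pool, size = length(site_ae), replace = TRUE))
  )
  # share of simulated site means at or below the observed site mean
  mean(sim_means <= mean_site)
}

study_ae <- c(9, 8, 7, 9, 6, 7, 8)

set.seed(1)
prob <- prob_lower_sketch(site_ae = c(5, 3, 3, 2, 1, 6), study_ae = study_ae)
prob

prob_skip <- prob_lower_sketch(site_ae = c(11, 9, 8, 9, 10), study_ae = study_ae)
prob_skip
```

A small value indicates that a site mean this low is rarely obtained when patients are drawn at random, which is what flags a site as potentially under-reporting.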
See Also
Examples
simaerep:::prob_lower_site_ae_vs_study_ae(
  site_ae = c(5, 3, 3, 2, 1, 6),
  study_ae = c(9, 8, 7, 9, 6, 7, 8)
)

prune visits to visit_med75 using table operations
Description
prune visits to visit_med75 using table operations
Usage
prune_to_visit_med75_inframe(df_visit, df_site)
Arguments
df_visit | Data frame with columns: study_id, site_number, patnum, visit, n_ae. |
df_site | dataframe, as returned by |
Execute a purrr or furrr function with a progressbar.
Description
Internal utility function.
Usage
purrr_bar(
  ...,
  .purrr,
  .f,
  .f_args = list(),
  .purrr_args = list(),
  .steps,
  .slow = FALSE,
  .progress = TRUE
)
Arguments
... | iterable arguments passed to .purrr |
.purrr | purrr or furrr function |
.f | function to be executed over iterables |
.f_args | list of arguments passed to .f, Default: list() |
.purrr_args | list of arguments passed to .purrr, Default: list() |
.steps | integer number of iterations |
.slow | logical slows down execution, Default: FALSE |
.progress | logical, show progress bar, Default: TRUE |
Details
Call still needs to be wrapped in with_progress or with_progress_cnd()
Value
result of function passed to .f
Examples
# purrr::map
progressr::with_progress(
  purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5)
)

# purrr::walk
progressr::with_progress(
  purrr_bar(rep(0.25, 5), .purrr = purrr::walk, .f = Sys.sleep, .steps = 5)
)

# progress bar off
progressr::with_progress(
  purrr_bar(
    rep(0.25, 5),
    .purrr = purrr::walk,
    .f = Sys.sleep,
    .steps = 5,
    .progress = FALSE
  )
)

# purrr::map2
progressr::with_progress(
  purrr_bar(
    rep(1, 5), rep(2, 5),
    .purrr = purrr::map2,
    .f = `+`,
    .steps = 5,
    .slow = TRUE
  )
)

# purrr::pmap
progressr::with_progress(
  purrr_bar(
    list(rep(1, 5), rep(2, 5)),
    .purrr = purrr::pmap,
    .f = `+`,
    .steps = 5,
    .slow = TRUE
  )
)

# define function within purrr_bar() call
progressr::with_progress(
  purrr_bar(
    list(rep(1, 5), rep(2, 5)),
    .purrr = purrr::pmap,
    .f = function(x, y) {
      paste0(x, y)
    },
    .steps = 5,
    .slow = TRUE
  )
)

# with mutate
progressr::with_progress(
  tibble::tibble(x = rep(0.25, 5)) %>%
    dplyr::mutate(x = purrr_bar(x, .purrr = purrr::map, .f = Sys.sleep, .steps = 5))
)

renames internal simaerep col_names to externally applied colnames
Description
renames internal simaerep col_names to externally applied colnames
Usage
remap_col_names(df, col_names)

Start simulation after preparation.
Description
Internal function called by sim_sites after prep_for_sim
Usage
sim_after_prep(
  df_sim_prep,
  r = 1000,
  poisson_test = FALSE,
  prob_lower = TRUE,
  progress = FALSE,
  under_only = TRUE
)
Arguments
df_sim_prep | dataframe as returned by |
r | integer, denotes number of simulations, default = 1000 |
poisson_test | logical, calculates poisson.test pvalue |
prob_lower | logical, calculates probability for getting a lower value |
progress | logical, display progress bar, Default = FALSE |
under_only | compute under-reporting probabilities only, default = TRUE |
Value
dataframe
See Also
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)
df_prep <- simaerep:::prep_for_sim(df_site, df_visit)
df_sim <- simaerep:::sim_after_prep(df_prep)
df_sim

Calculate prob for study sites using table operations
Description
Calculate prob for study sites using table operations
Usage
sim_inframe(df_visit, r = 1000, df_site = NULL, event_names = c("ae"))
Arguments
df_visit | Data frame with columns: study_id, site_number, patnum, visit, n_ae. |
r | Integer or tbl_object, number of repetitions for bootstrap simulation. Pass a tbl object referring to a table with one column and as many rows as desired repetitions. Default: 1000. |
df_site | dataframe as returned by |
event_names | vector, contains the event names, default = "ae" |
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
) %>%
  dplyr::rename(
    site_number = site_id,
    patnum = patient_id,
    n_ae = n_event
  )

df_sim <- simaerep:::sim_inframe(df_visit)

simulate under-reporting
Description
we remove a fraction of events from a specific site
Usage
sim_out(df_visit, study_id, site_id, factor_event)
Arguments
df_visit | dataframe |
study_id | character |
site_id | character |
factor_event | double, negative values for under-reporting, positive values for over-reporting. |
Details
We determine the absolute number of events per patient for removal. Then we remove them at the first visit. We intentionally allow fractions.
Examples
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 10)

df_ur <- sim_out(df_visit, "A", site_id = "S0001", factor_event = - 0.35)

# Example cumulated event for first patient with 35% under-reporting
df_ur[df_ur$site_id == "S0001" & df_ur$patient_id == "P000001",]$n_event

# Example cumulated event for first patient with no under-reporting
df_visit[df_visit$site_id == "S0001" & df_visit$patient_id == "P000001",]$n_event

simulate patients and events for sites
supports constant and non-constant event rates
Description
simulate patients and events for sites
supports constant and non-constant event rates
Usage
sim_pat(vs_max, vs_sd, is_out, event_rates, event_names, factor_event_rate)

Calculate prob_lower and poisson.test pvalue for study sites.
Description
Collects the number of AEs of all eligible patients that meet visit_med75 criteria of site. Then calculates poisson.test pvalue and bootstrapped probability of having a lower mean value. Used by simaerep_classic()
Usage
sim_sites(
  df_site,
  df_visit,
  r = 1000,
  poisson_test = TRUE,
  prob_lower = TRUE,
  progress = TRUE,
  under_only = TRUE
)
Arguments
df_site | dataframe created by |
df_visit | dataframe, created by |
r | integer, denotes number of simulations, default = 1000 |
poisson_test | logical, calculates poisson.test pvalue |
prob_lower | logical, calculates probability for getting a lower value |
progress | logical, display progress bar, Default = TRUE |
under_only | compute under-reporting probabilities only, default = TRUE |
Value
dataframe with the following columns:
- study_id
study identification
- site_number
site identification
- n_pat
number of patients at site
- visit_med75
median(max(visit)) * 0.75
- n_pat_with_med75
number of patients at site with med75
- mean_ae_site_med75
mean AE at visit_med75 site level
- mean_ae_study_med75
mean AE at visit_med75 study level
- n_pat_with_med75_study
number of patients at study with med75 excl. site
- pval
p-value as returned by poisson.test
- prob_low
bootstrapped probability for having mean_ae_site_med75 or lower
See Also
sim_sites, site_aggr, pat_pool, prob_lower_site_ae_vs_study_ae, poiss_test_site_ae_vs_study_ae, prep_for_sim, simaerep_classic
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)
df_sim_sites <- sim_sites(df_site, df_visit, r = 100)

df_sim_sites %>%
  knitr::kable(digits = 2)

simulate test data events
Description
generates multi-event data using sim_test_data_study()
Usage
sim_test_data_events(
  n_pat = 100,
  n_sites = 5,
  event_rates = c(NULL),
  event_names = list("event")
)
Arguments
n_pat | integer, number of patients, Default: 100 |
n_sites | integer, number of sites, Default: 5 |
event_rates | vector with visit-specific event rates, Default: NULL |
event_names | vector, contains the event names, default = "event" |
Value
tibble with columns site_id, patient_id, is_ur, max_visit_mean, max_visit_sd, visit, and event data (events_per_visit_mean and n_events)
simulate patient event reporting test data
Description
helper function for sim_test_data_study()
Usage
sim_test_data_patient(
  .f_sample_max_visit = function() rnorm(1, mean = 20, sd = 4),
  .f_sample_event_per_visit = function(max_visit) rpois(max_visit, 0.5)
)
Arguments
.f_sample_max_visit | function used to sample the maximum number of visits, Default: function() rnorm(1, mean = 20, sd = 4) |
.f_sample_event_per_visit | function used to sample the events for each visit, Default: function(max_visit) rpois(max_visit, 0.5) |
Details
""
Value
vector containing cumulative events
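A minimal base R sketch of how such a patient can be simulated from the two sampling functions shown in Usage. The function name sim_patient_sketch() is hypothetical, and rounding the sampled maximum visit and enforcing at least one visit are assumptions of this sketch.

```r
# Hypothetical re-implementation for illustration only
sim_patient_sketch <- function(
    f_max_visit = function() rnorm(1, mean = 20, sd = 4),
    f_event_per_visit = function(max_visit) rpois(max_visit, 0.5)) {
  # whole number of visits, at least one (assumption)
  max_visit <- max(1, round(f_max_visit()))
  # events per visit are accumulated into a cumulative count
  cumsum(f_event_per_visit(max_visit))
}

set.seed(42)
events <- sim_patient_sketch()
events

# higher event rate per visit
events_high <- sim_patient_sketch(
  f_event_per_visit = function(max_visit) rpois(max_visit, 1.2)
)
events_high
```

Each element of the returned vector is the cumulative number of events reported up to that visit, so the vector never decreases.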
Examples
replicate(5, sim_test_data_patient())

replicate(5, sim_test_data_patient(
  .f_sample_event_per_visit = function(x) rpois(x, 1.2))
)

replicate(5, sim_test_data_patient(
  .f_sample_max_visit = function() rnorm(1, mean = 5, sd = 5))
)

Simulate Portfolio Test Data
Description
Simulate visit level data from a portfolio configuration.
Usage
sim_test_data_portfolio(
  df_config,
  df_event_rates = NULL,
  progress = TRUE,
  parallel = TRUE
)
Arguments
df_config | dataframe as returned by |
df_event_rates | dataframe with event rates. Default: NULL |
progress | logical, Default: TRUE |
parallel | logical, activate parallel processing, see details, Default: TRUE |
Details
uses sim_test_data_study. We use the furrr package to implement parallel processing as these simulations can take a long time to run. For this to work we need to specify the plan for how the code should run, e.g. plan(multisession, workers = 3)
Value
dataframe with the following columns:
- study_id
study identification
- event_per_visit_mean
mean event per visit per study
- site_id
site
- max_visit_sd
standard deviation of maximum patient visits per site
- max_visit_mean
mean of maximum patient visits per site
- patient_id
patient identification
- visit
visit number
- n_event
cumulative sum of events
See Also
sim_test_data_study, get_portf_config, sim_test_data_portfolio
Examples
df_visit1 <- sim_test_data_study(
  n_pat = 100,
  n_sites = 10,
  ratio_out = 0.4,
  factor_event_rate = 0.6,
  study_id = "A"
)
df_visit2 <- sim_test_data_study(
  n_pat = 100,
  n_sites = 10,
  ratio_out = 0.2,
  factor_event_rate = 0.1,
  study_id = "B"
)
df_visit <- dplyr::bind_rows(df_visit1, df_visit2)
df_config <- get_portf_config(df_visit)
df_config
df_portf <- sim_test_data_portfolio(df_config)
df_portf
simulate study test data
Description
evenly distributes a number of given patients across a number of given sites. Then simulates event reporting of each patient, reducing the number of reported events for patients distributed to event-under-reporting sites.
Usage
sim_test_data_study(
  n_pat = 1000,
  n_sites = 20,
  ratio_out = 0,
  factor_event_rate = 0,
  max_visit_mean = 20,
  max_visit_sd = 4,
  event_rates = dgamma(seq(1, 20, 0.5), shape = 5, rate = 2) * 5 + 0.1,
  event_names = c("event"),
  study_id = "A"
)
Arguments
n_pat | integer, number of patients, Default: 1000 |
n_sites | integer, number of sites, Default: 20 |
ratio_out | ratio of sites with outlier, Default: 0 |
factor_event_rate | event reporting rate factor for site outlier, will modify mean event per visit rate used for outlier sites. Negative values will simulate under-reporting, positive values over-reporting, e.g. -0.4 -> 40% under-reporting, +0.4 -> 40% over-reporting. Default: 0 |
max_visit_mean | mean of the maximum number of visits of each patient, Default: 20 |
max_visit_sd | standard deviation of maximum number of visits of each patient, Default: 4 |
event_rates | list or vector with visit-specific event rates. Use list for multiple event names, Default: dgamma(seq(1, 20, 0.5), shape = 5, rate = 2) * 5 + 0.1 |
event_names | vector, contains the event names, default = "event" |
study_id | character, Default: "A" |
Details
maximum visit number will be sampled from a normal distribution with characteristics derived from max_visit_mean and max_visit_sd, while the events per visit will be sampled from a Poisson distribution described by events_per_visit_mean.
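The sampling scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name sketch_patient_events() is hypothetical, the event rate is held constant across visits, the maximum visit is rounded and floored at 1, and factor_event_rate is assumed to scale the rate as rate * (1 + factor).

```r
# Hypothetical helper, not part of simaerep: simulate cumulative events
# for a single patient following the description above.
sketch_patient_events <- function(max_visit_mean = 20,
                                  max_visit_sd = 4,
                                  event_rate = 0.5,
                                  factor_event_rate = 0) {
  # maximum visit drawn from a normal distribution, rounded, at least 1
  max_visit <- max(1L, as.integer(round(rnorm(1, max_visit_mean, max_visit_sd))))
  # assumption: the outlier factor scales the per-visit rate,
  # e.g. -0.4 leaves 60% of the original rate
  rate <- event_rate * (1 + factor_event_rate)
  # events per visit drawn from a Poisson distribution, returned cumulatively
  cumsum(rpois(max_visit, rate))
}

set.seed(1)
events_regular <- sketch_patient_events()
events_under <- sketch_patient_events(factor_event_rate = -0.4)
```

Each returned vector has one element per visit and never decreases, which mirrors the cumulative n_event column listed under Value.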
Value
tibble with columns site_id, patient_id, is_out, max_visit_mean, max_visit_sd, event_per_visit_mean, visit, n_event
Examples
set.seed(1)
# no outlier
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5)
df_visit[which(df_visit$patient_id == "P000001"),]
# under-reporting outlier
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.2,
  factor_event_rate = -0.5
)
df_visit[which(df_visit$patient_id == "P000001"),]
# constant event rates
sim_test_data_study(n_pat = 100, n_sites = 5, event_rates = 0.5)
# non-constant event rates for two event types
event_rates_ae <- c(0.7, rep(0.5, 8), rep(0.3, 5))
event_rates_pd <- c(0.3, rep(0.4, 6), rep(0.1, 5))
sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  event_names = c("ae", "pd"),
  event_rates = list(event_rates_ae, event_rates_pd)
)
Create simaerep object
Description
Simulate AE under-reporting probabilities.
Usage
simaerep(
  df_visit,
  r = 1000,
  check = TRUE,
  under_only = FALSE,
  visit_med75 = FALSE,
  inframe = TRUE,
  progress = TRUE,
  mult_corr = TRUE,
  poisson_test = FALSE,
  env = parent.frame(),
  event_names = c("event"),
  col_names = list(study_id = "study_id", site_id = "site_id",
    patient_id = "patient_id", visit = "visit")
)
simaerep_inframe(
  df_visit,
  r = 1000,
  under_only = FALSE,
  visit_med75 = FALSE,
  check = TRUE,
  env = parent.frame(),
  event_names = c("event"),
  mult_corr = FALSE,
  col_names = list(study_id = "study_id", site_id = "site_id",
    patient_id = "patient_id", visit = "visit")
)
simaerep_classic(
  df_visit,
  check = TRUE,
  progress = TRUE,
  env = parent.frame(),
  under_only = TRUE,
  r = 1000,
  mult_corr = FALSE,
  poisson_test = FALSE,
  event_names = "event",
  col_names = list(study_id = "study_id", site_id = "site_id",
    patient_id = "patient_id", visit = "visit")
)
Arguments
df_visit | Data frame with columns: study_id, site_number, patnum, visit, n_ae. |
r | Integer or tbl_object, number of repetitions for bootstrap simulation. Pass a tbl object referring to a table with one column and as many rows as desired repetitions. Default: 1000. |
check | Logical, perform data check and attempt repair with check_df_visit(). Default: TRUE. |
under_only | Logical, compute under-reporting probabilities only. Only applies to the classic algorithm, in which a one-sided evaluation can save computation time. Default: FALSE |
visit_med75 | Logical, should evaluation point visit_med75 be used. Compatible with inframe and classic version of the algorithm. Default: FALSE |
inframe | Logical, when FALSE classic simaerep algorithm will be used. The default inframe method uses only table operations and is compatible with dbplyr supported database backends. Default: TRUE |
progress | Logical, display progress bar. Default: TRUE. |
mult_corr | Logical, multiplicity correction, Default: TRUE |
poisson_test | logical, compute p-value with poisson test, only supported by the classic algorithm using visit_med75. Default: FALSE |
env | Optional, provide environment of original visit data. Default: parent.frame(). |
event_names | vector, contains the event names, default = "event" |
col_names | named list, indicate study_id, site_id, patient_id and visit column in df_visit input dataframe. Default: list(study_id = "study_id", site_id = "site_id", patient_id = "patient_id", visit = "visit") |
Details
Executes site_aggr(), sim_sites(), and eval_sites() on original visit data and stores all intermediate results. Stores lazy reference to original visit data for facilitated plotting using generic plot(x).
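The idea stated in the package description is to redistribute patients between sites in a bootstrap simulation and compare the observed site rate with the simulated distribution. It can be illustrated for a single site in base R. The sketch below is not the simaerep algorithm: the function name sketch_site_prob(), the toy data, and the one-sided lower-tail probability are assumptions made for illustration only.

```r
# Hypothetical illustration, not the simaerep implementation.
# pat: data frame with one row per patient (site_id, n_event, visits).
sketch_site_prob <- function(pat, site, r = 1000) {
  is_site <- pat$site_id == site
  n_site <- sum(is_site)
  # observed events per visit at the site
  obs_rate <- sum(pat$n_event[is_site]) / sum(pat$visits[is_site])
  sim_rate <- replicate(r, {
    # redistribute: draw as many patients as the site has from the whole study
    idx <- sample(nrow(pat), n_site, replace = TRUE)
    sum(pat$n_event[idx]) / sum(pat$visits[idx])
  })
  # share of simulated rates at or below the observed rate
  # (values near 0 point towards under-reporting)
  mean(sim_rate <= obs_rate)
}

set.seed(1)
pat <- data.frame(
  site_id = rep(c("S1", "S2", "S3", "S4"), each = 25),
  visits = 20
)
# toy data: site S1 reports at 40% of the rate of the other sites
pat$n_event <- rpois(nrow(pat), ifelse(pat$site_id == "S1", 4, 10))
prob_s1 <- sketch_site_prob(pat, "S1")
prob_s2 <- sketch_site_prob(pat, "S2")
```

In this toy setup prob_s1 comes out far lower than prob_s2, since the S1 rate sits well below what randomly drawn groups of 25 study patients produce.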
Value
A simaerep object. Results are contained in the attached df_eval dataframe.
| Column Name | Description | Type |
| study_id | The study ID | Character |
| site_id | The site ID | Character |
| (event)_count | Site event count | Numeric |
| (event)_per_visit_site | Site Ratio of event count divided by visits | Numeric |
| visits | Site visit count | Numeric |
| n_pat | Site patient count | Numeric |
| (event)_per_visit_study | Simulated study ratio | Numeric |
| (event)_prob | Site event ratio probability from -1 to 1 | Numeric |
| (event)_delta | Difference expected vs reported events | Numeric |
See Also
site_aggr, sim_sites, eval_sites, orivisit, plot.simaerep, print.simaerep, simaerep_inframe
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
)
evrep <- simaerep(df_visit)
evrep
str(evrep)
# simaerep classic algorithm
evrep <- simaerep(df_visit, inframe = FALSE, under_only = TRUE, mult_corr = TRUE)
evrep
# multiple events
df_visit_events_test <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6,
  event_rates = list(0.5, 0.3),
  event_names = c("ae", "pd")
)
evsrep <- simaerep(df_visit_events_test, inframe = TRUE, event_names = c("ae", "pd"))
evsrep
# Database example
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
df_r <- tibble::tibble(rep = seq(1, 1000))
dplyr::copy_to(con, df_visit, "visit")
dplyr::copy_to(con, df_r, "r")
tbl_visit <- dplyr::tbl(con, "visit")
tbl_r <- dplyr::tbl(con, "r")
simaerep(tbl_visit, r = tbl_r)
DBI::dbDisconnect(con)
Aggregate from visit to site level.
Description
Calculates visit_med75, n_pat_with_med75 and mean_ae_site_med75. Used by simaerep_classic().
Usage
site_aggr(
  df_visit,
  method = "med75_adj",
  min_pat_pool = 0.2,
  event_names = c("ae")
)
Arguments
df_visit | dataframe with columns: study_id, site_number, patnum, visit, n_ae |
method | character, one of c("med75", "med75_adj", "max"), method for defining the evaluation point visit_med75 (see Details), Default: "med75_adj" |
min_pat_pool | double, minimum ratio of patients available for sampling. Determines the maximum visit_med75 value, see Details. Default: 0.2 |
event_names | vector, contains the event names, default = "ae" |
Details
For determining the visit number at which we are going to evaluate AE reporting, we take the maximum visit of each patient at the site and take the median. We then multiply by 0.75, which gives a cut-off point determining which patients will be evaluated. Of the patients we evaluate, we take the minimum of all maximum visits, ensuring that we use the highest visit number possible without excluding more patients from the analysis. To ensure that the sampling pool for that visit is large enough, we limit the visit number by the 80% quantile of maximum visits of all patients in the study. "max" will determine the site max visit, flag patients that concluded max visit, and count patients and patients that concluded max visit.
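The steps described above for the default evaluation point can be sketched in base R. This is an illustration of the description, not the package's code: the helper name sketch_visit_med75() and its inputs are hypothetical, and rounding the study-level cap down to a whole visit is an assumption.

```r
# Hypothetical helper following the steps described above, not package code.
# max_visit_site:  maximum visit of each patient at the site
# max_visit_study: maximum visit of each patient in the study
sketch_visit_med75 <- function(max_visit_site, max_visit_study, min_pat_pool = 0.2) {
  # cut-off: 75% of the median maximum visit at the site
  cutoff <- median(max_visit_site) * 0.75
  # patients that reach the cut-off are evaluated
  evaluated <- max_visit_site[max_visit_site >= cutoff]
  # highest visit that keeps all evaluated patients in the analysis
  visit_med75 <- min(evaluated)
  # cap at the (1 - min_pat_pool) quantile of maximum visits in the study
  # (assumption: rounded down to a whole visit)
  cap <- floor(quantile(max_visit_study, 1 - min_pat_pool, names = FALSE))
  min(visit_med75, cap)
}

max_visit_site <- c(4, 12, 16, 18, 20)
max_visit_study <- c(max_visit_site, 10, 14, 15, 22, 25)
visit_med75 <- sketch_visit_med75(max_visit_site, max_visit_study)
visit_med75_capped <- sketch_visit_med75(c(30, 30, 30), max_visit_study)
```

For the first site the median maximum visit is 16, the cut-off is 12, and four of the five patients are evaluated at visit 12. The second call shows the study-level cap taking effect for a site whose patients all have long follow-up.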
Value
dataframe with the following columns:
- study_id
study identification
- site_number
site identification
- n_pat
number of patients, site level
- visit_med75
adjusted median(max(visit)) * 0.75 see Details
- n_pat_with_med75
number of patients that meet visit_med75 criterion, site level
- mean_ae_site_med75
mean AE at visit_med75, site level
Examples
df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )
df_site <- site_aggr(df_visit)
df_site %>%
  knitr::kable(digits = 2)
Conditional with_progress.
Description
Internal function. Use instead of with_progress within custom functions with progress bars.
Usage
with_progress_cnd(ex, progress = TRUE)
Arguments
ex | expression |
progress | logical, Default: TRUE |
Details
This wrapper adds a progress parameter to with_progress so that we can control the progress bar in the user-facing functions. The progress bar only shows in interactive mode.
Value
No return value, called for side effects
Examples
if (interactive()) {
  with_progress_cnd(
    purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5),
    progress = TRUE
  )
  with_progress_cnd(
    purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5),
    progress = FALSE
  )
  # wrap a function with progress bar with another call with progress bar
  f1 <- function(x, progress = TRUE) {
    with_progress_cnd(
      purrr_bar(x, .purrr = purrr::walk, .f = Sys.sleep, .steps = length(x), .progress = progress),
      progress = progress
    )
  }
  # inner progress bar blocks outer progress bar
  progressr::with_progress(
    purrr_bar(
      rep(rep(1, 3),3), .purrr = purrr::walk, .f = f1, .steps = 3,
      .f_args = list(progress = TRUE)
    )
  )
  # inner progress bar turned off
  progressr::with_progress(
    purrr_bar(
      rep(list(rep(0.25, 3)), 5), .purrr = purrr::walk, .f = f1, .steps = 5,
      .f_args = list(progress = FALSE)
    )
  )
}