The rfacts package is an R interface to the Fixed and Adaptive Clinical Trial Simulator (FACTS) on Unix-like systems. It programmatically invokes FACTS to run clinical trial simulations, and it aggregates simulation output data into tidy data frames. These capabilities provide end-to-end automation for large-scale simulation workflows, and they enhance computational reproducibility. For more information, please visit the documentation website.
rfacts is neither a product of nor supported by Berry Consultants. The code base of rfacts is completely independent from that of FACTS, and the former only invokes the latter through dynamic system calls.
rfacts only works on Unix-like systems. It requires paths to pre-compiled versions of Mono, FLFLL, and the FACTS Linux engines. See the installation instructions below and the configuration guide.

To install the latest release from CRAN, open R and run the following.
    install.packages("rfacts")

To install the latest development version:
    install.packages("remotes")
    remotes::install_github("EliLillyCo/rfacts")

Next, set the RFACTS_PATHS environment variable appropriately. For instructions, please see the configuration guide.
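Before running any simulations, it can help to confirm that the variable is visible to the current R session. The following is a minimal sketch using only base R. `rfacts_paths_is_set()` is a hypothetical helper, not part of rfacts, and it only checks that the variable is non-empty, not that its value is valid (the expected value is described in the configuration guide and is not assumed here).

```r
# Hypothetical helper (not part of rfacts): TRUE if RFACTS_PATHS
# is set to a non-empty value in the current R session.
rfacts_paths_is_set <- function() {
  nzchar(Sys.getenv("RFACTS_PATHS", unset = ""))
}

if (!rfacts_paths_is_set()) {
  message("RFACTS_PATHS is not set. See the configuration guide.")
}
```

One common way to make an environment variable persist across R sessions is a line in `~/.Renviron`, which R reads at startup.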
First, create a *.facts XML file using the FACTS GUI. The rfacts package has several built-in examples, included with permission from Berry Consultants LLC.
    library(rfacts)
    # get_facts_file_example() returns the path to
    # an example FACTS file from rfacts itself.
    # For FACTS files you create yourself in the FACTS GUI,
    # you can skip get_facts_file_example().
    facts_file <- get_facts_file_example("contin.facts")
    basename(facts_file)
    #> [1] "contin.facts"

Then, run trial simulations with run_facts(). By default, the results are written to a temporary directory. Set the output_path argument to customize the path.
    out <- run_facts(
      facts_file,
      n_sims = 2,
      verbose = FALSE
    )
    out
    #> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42"
    head(get_csv_files(out))
    #> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00001.csv"
    #> [2] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00002.csv"
    #> [3] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00001.csv"
    #> [4] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00002.csv"
    #> [5] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00001.csv"
    #> [6] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00002.csv"

Use read_patients() to read and aggregate all the patients*.csv files. rfacts has several such functions, including read_weeks() and read_mcmc().
    read_patients(out)
    #> # A tibble: 2,400 x 15
    #>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
    #>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
    #>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> # … with 2,390 more rows, and 9 more variables: facts_header <chr>,
    #> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
    #> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

run_facts() has two sequential stages:
1. run_flfll(): generate the *.param files and the folder structure for the FACTS Linux engines.
2. run_engine(): execute the instructions in the *.param files to conduct trial simulations and produce CSV output.

    out <- run_flfll(facts_file, verbose = FALSE)
    run_engine(facts_file, param_files = out, n_sims = 4, verbose = FALSE)
    read_patients(out)
    #> # A tibble: 4,800 x 15
    #>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
    #>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
    #>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
    #> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
    #> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

run_engine() automatically detects the Linux engine required for your FACTS file. If you know the engine in advance, you can use a specific engine function such as run_engine_contin() or run_engine_dichot().
    out <- run_flfll(facts_file, verbose = FALSE)
    run_engine_contin(param_files = out, n_sims = 4, verbose = FALSE)
    read_patients(out)
    #> # A tibble: 4,800 x 15
    #>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
    #>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
    #>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
    #> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
    #> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

If you are unsure which engine function to use, call get_facts_engine().
    get_facts_engine(facts_file)
    #> [1] "run_engine_contin"

If we take control of the simulation process, we can pick and choose which FACTS simulation scenarios to run and read.
    # Example FACTS file built into rfacts.
    facts_file <- get_facts_file_example("contin.facts")
    # Set up the files for the scenarios.
    param_files <- run_flfll(facts_file, verbose = FALSE)
    # Each scenario has its own folder with internal parameter files.
    scenarios <- get_param_dirs(param_files) # not in rfacts <= 1.0.0
    scenarios
    #> [1] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp1_params"
    #> [2] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp2_params"
    #> [3] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp1_params"
    #> [4] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp2_params"
    # Let's pick one of those scenarios and run the simulations.
    scenario <- scenarios[1]
    run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
    read_patients(scenario)
    #> # A tibble: 600 x 15
    #>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
    #>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
    #>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> # … with 590 more rows, and 9 more variables: facts_header <chr>,
    #> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
    #> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

rfacts makes it straightforward to parallelize across simulations. First, use run_flfll() to create a directory of param files. Be sure to supply an output_path that all the parallel workers can access (for example, not a tempfile()).
    library(rfacts)
    facts_file <- get_facts_file_example("contin.facts")
    param_files <- file.path(getwd(), "param_files")
    run_flfll(facts_file, param_files)
    #> [1] "/home/c240390/projects/rfacts/param_files"

Next, write a custom function that accepts the param files, runs a single simulation for each param file, and returns the important data in memory. Be sure to set a unique seed for each simulation iteration.
    sim_once <- function(iter, param_files) {
      # Copy param files to a temp file in order to
      # (1) Avoid race conditions in parallel processing, and
      # (2) Make things run faster: temp files are on local node storage.
      out <- tempfile()
      fs::dir_copy(path = param_files, new_path = out)
      # Run the engine once per param file.
      run_engine_contin(out, n_sims = 1L, seed = iter)
      # Return aggregated patients files.
      read_patients(out) # Reads fast because `out` is a tempfile().
    }

At this point, we should test this function locally without parallel computing.
    library(dplyr)
    # All the patients files were named patients00001.csv,
    # so do not trust the facts_sim column.
    # For data post-processing, use the facts_id column instead.
    lapply(seq_len(4), sim_once, param_files = param_files) %>%
      bind_rows()
    #> # A tibble: 4,800 x 15
    #>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
    #>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
    #>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
    #> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
    #> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
    #> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

Parallel computing happens when we call sim_once() repeatedly over several parallel workers. A powerful and convenient parallel computing solution is clustermq. Here is a sketch of how to use it with rfacts. mclapply() from the parallel package is a quick and dirty alternative.
    # Configure clustermq to use your grid and your template file.
    # If you are using a scheduler like SGE, you need to write a template file
    # like clustermq.tmpl. To learn how, visit
    # https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
    options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
    # Run the computation.
    library(clustermq)
    patients <- Q(
      fun = sim_once,
      iter = seq_len(50),
      # Constant arguments must match the argument names of sim_once().
      const = list(param_files = param_files),
      pkgs = c("fs", "rfacts"),
      n_jobs = 4
    ) %>%
      bind_rows()
    # Show aggregated patient data.
    patients

Alternatives to clustermq include parallel::mclapply(), furrr::future_map(), and future.apply::future_lapply().
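As a rough illustration of the `parallel::mclapply()` route, the sketch below uses a hypothetical stand-in, `sim_once_stub()`, in place of `sim_once()` so that it runs without FACTS installed. It only demonstrates the pattern of one unique seed per iteration and combining the per-iteration data frames. The names `sim_once_stub`, `n_cores`, and `patients_stub` are assumptions made for this example.

```r
library(parallel)

# Hypothetical stand-in for sim_once(): returns a small data frame
# whose values depend only on the seed `iter`.
sim_once_stub <- function(iter) {
  set.seed(iter) # unique, reproducible seed per iteration
  data.frame(iter = iter, value = rnorm(3))
}

# Forked workers are only available on Unix-like systems.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L
results <- mclapply(seq_len(4), sim_once_stub, mc.cores = n_cores)

# Combine the per-iteration data frames into one.
patients_stub <- do.call(rbind, results)
```

With the real `sim_once()`, the call would presumably also pass `param_files = param_files` through `mclapply()`, as in the local `lapply()` test above.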
Various get_facts_*() functions interrogate FACTS files.
    get_facts_scenarios(facts_file)
    #> [1] "acc1_drop1_resp1" "acc1_drop1_resp2" "acc2_drop1_resp1" "acc2_drop1_resp2"
    get_facts_version(facts_file)
    #> [1] "6.2.5.22668"
    get_facts_versions()
    #> [1] "6.3.1"   "6.2.5"   "6.0.0.1"
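One possible use of these values is a quick sanity check that a FACTS file's version corresponds to one of the listed versions. The sketch below uses base R only and the strings copied from the output above. `version_prefix()` is a hypothetical helper, and matching on the first three version components is an assumption for illustration, not a description of how rfacts selects an engine.

```r
# Values copied from the example output above.
file_version <- "6.2.5.22668"
engine_versions <- c("6.3.1", "6.2.5", "6.0.0.1")

# Hypothetical helper (not part of rfacts): keep the first `n`
# dot-separated components of a version string.
version_prefix <- function(version, n = 3L) {
  parts <- strsplit(version, ".", fixed = TRUE)[[1]]
  paste(head(parts, n), collapse = ".")
}

version_prefix(file_version) %in% engine_versions
#> [1] TRUE
```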