
The rfacts package is an R interface to the Fixed and Adaptive Clinical Trial Simulator (FACTS) on Unix-like systems. It programmatically invokes FACTS to run clinical trial simulations, and it aggregates simulation output data into tidy data frames. These capabilities provide end-to-end automation for large-scale simulation workflows, and they enhance computational reproducibility. For more information, please visit the documentation website.

Disclaimer

rfacts is not a product of nor supported by Berry Consultants. The code base of rfacts is completely independent from that of FACTS, and the former only invokes the latter through dynamic system calls.

Limitations

Installation

Next, set the RFACTS_PATHS environment variable appropriately. For instructions, please see the configuration guide.
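
Before running simulations, it can help to confirm that the variable is visible to the R session. The check below is a minimal sketch in base R. It makes no assumption about what value RFACTS_PATHS should hold (the configuration guide covers that), and rfacts_paths_is_set() is a hypothetical helper name, not part of rfacts.

```r
# Hypothetical helper (not part of rfacts): report whether the
# RFACTS_PATHS environment variable is set to a non-empty value.
rfacts_paths_is_set <- function() {
  nzchar(Sys.getenv("RFACTS_PATHS", unset = ""))
}

if (!rfacts_paths_is_set()) {
  message("RFACTS_PATHS is not set; see the rfacts configuration guide.")
}
```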

Run FACTS simulations

First, create a *.facts XML file using the FACTS GUI. The rfacts package has several built-in examples, included with permission from Berry Consultants LLC.

library(rfacts)
# get_facts_file_example() returns the path to
# an example FACTS file from rfacts itself.
# For your own FACTS files that you create in the FACTS GUI,
# you can skip get_facts_file_example().
facts_file <- get_facts_file_example("contin.facts")
basename(facts_file)
#> [1] "contin.facts"

Then, run trial simulations with run_facts(). By default, the results are written to a temporary directory. Set the output_path argument to customize the path.

out <- run_facts(
  facts_file,
  n_sims = 2,
  verbose = FALSE
)
out
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42"
head(get_csv_files(out))
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00001.csv"
#> [2] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00002.csv"
#> [3] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00001.csv"
#> [4] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00002.csv"
#> [5] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00001.csv"
#> [6] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00002.csv"
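
To keep results beyond the current R session, output_path can point to a persistent directory instead of the default temporary one. The following is a sketch under assumptions: the folder name "facts_output" is illustrative, and the guard simply skips the run when rfacts is not installed or RFACTS_PATHS is not set.

```r
# Assumption: a folder named "facts_output" under the working directory
# is a suitable place for results. The name is illustrative.
output_path <- file.path(getwd(), "facts_output")

# Only attempt the run if rfacts is installed and configured.
if (requireNamespace("rfacts", quietly = TRUE) &&
    nzchar(Sys.getenv("RFACTS_PATHS"))) {
  facts_file <- rfacts::get_facts_file_example("contin.facts")
  out <- rfacts::run_facts(
    facts_file,
    output_path = output_path,
    n_sims = 2,
    verbose = FALSE
  )
  print(out)
}
```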

Use read_patients() to read and aggregate all the patients*.csv files. rfacts has several such functions, including read_weeks() and read_mcmc().

read_patients(out)
#> # A tibble: 2,400 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 2,390 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
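
The idea of aggregating many per-simulation CSV files into one data frame can be illustrated in base R. This is a conceptual sketch only, not how rfacts implements read_patients(): the files, column names, and the source_csv column are made up for illustration, and real FACTS output may need extra handling that is omitted here.

```r
# Create two small CSV files that stand in for FACTS patients output.
# The directory layout and column names are invented for illustration.
out_dir <- tempfile()
dir.create(out_dir)
write.csv(data.frame(subject = 1:2, dose = c(0L, 1L)),
          file.path(out_dir, "patients00001.csv"), row.names = FALSE)
write.csv(data.frame(subject = 1:3, dose = c(1L, 0L, 1L)),
          file.path(out_dir, "patients00002.csv"), row.names = FALSE)

# Find every patients*.csv file under the directory.
csv_files <- list.files(out_dir, pattern = "^patients.*\\.csv$",
                        recursive = TRUE, full.names = TRUE)

# Read each file, record where it came from, and stack the rows.
pieces <- lapply(csv_files, function(path) {
  data <- read.csv(path)
  data$source_csv <- path
  data
})
patients <- do.call(rbind, pieces)
nrow(patients)
#> [1] 5
```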

The simulation process

# Stage 1: create the param files for each scenario.
out <- run_flfll(facts_file, verbose = FALSE)
# Stage 2: run the simulations on those param files.
run_engine(facts_file, param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

run_engine() automatically detects the Linux engine required for your FACTS file. If you know the engine in advance, you can use a specific engine function such as run_engine_contin() or run_engine_dichot().

out <- run_flfll(facts_file, verbose = FALSE)
run_engine_contin(param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

Run a single scenario

If we take control of the simulation process, we can pick and choose which FACTS simulation scenarios to run and read.

# Example FACTS file built into rfacts.
facts_file <- get_facts_file_example("contin.facts")
# Set up the files for the scenarios.
param_files <- run_flfll(facts_file, verbose = FALSE)
# Each scenario has its own folder with internal parameter files.
scenarios <- get_param_dirs(param_files) # not in rfacts <= 1.0.0
scenarios
#> [1] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp1_params"
#> [2] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp2_params"
#> [3] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp1_params"
#> [4] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp2_params"
# Let's pick one of those scenarios and run the simulations.
scenario <- scenarios[1]
run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
read_patients(scenario)
#> # A tibble: 600 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 590 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
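
Instead of indexing by position, scenarios can also be selected by name. The sketch below uses base R pattern matching on directory names shaped like the ones above. The parent directory is a placeholder, and the "resp2" filter is only an example criterion.

```r
# Scenario directories in the style returned by get_param_dirs().
# The parent directory is a placeholder, not a real output path.
scenarios <- file.path(
  "param_files", "contin",
  c("acc1_drop1_resp1_params", "acc1_drop1_resp2_params",
    "acc2_drop1_resp1_params", "acc2_drop1_resp2_params")
)

# Keep only the scenarios whose folder name contains "resp2".
selected <- scenarios[grepl("resp2", basename(scenarios), fixed = TRUE)]
basename(selected)
#> [1] "acc1_drop1_resp2_params" "acc2_drop1_resp2_params"
```

Each element of selected could then be passed to run_engine_contin() and read_patients() in the same way as scenario above.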

Parallel computing

rfacts makes it straightforward to parallelize across simulations. First, use run_flfll() to create a directory of param files. Be sure to supply an output_path that all the parallel workers can access (for example, not a tempfile()).

library(rfacts)
facts_file <- get_facts_file_example("contin.facts")
param_files <- file.path(getwd(), "param_files")
run_flfll(facts_file, param_files)
#> [1] "/home/c240390/projects/rfacts/param_files"

Next, write a custom function that accepts the param files, runs a single simulation for each param file, and returns the important data in memory. Be sure to set a unique seed for each simulation iteration.

sim_once <- function(iter, param_files) {
  # Copy param files to a temp file in order to
  # (1) Avoid race conditions in parallel processing, and
  # (2) Make things run faster: temp files are on local node storage.
  out <- tempfile()
  fs::dir_copy(path = param_files, new_path = out)
  # Run the engine once per param file.
  run_engine_contin(out, n_sims = 1L, seed = iter)
  # Return aggregated patients files.
  read_patients(out) # Reads fast because `out` is a tempfile().
}

library(dplyr)
# All the patients files were named patients00001.csv,
# so do not trust the facts_sim column.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(4), sim_once, param_files = param_files) %>%
  bind_rows()
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
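
Because facts_sim is always 1 in this setup, a run index for post-processing can be derived from facts_id. The sketch below uses a toy data frame with invented facts_id values, and the column name sim_index is hypothetical. It assumes each call to a function like sim_once() yields its own facts_id, as the distinct temporary directories above suggest.

```r
# Toy stand-in for the combined output: three runs, two rows each.
# The facts_id values are invented for illustration.
patients <- data.frame(
  facts_id = rep(c("file1a", "file2b", "file3c"), each = 2),
  facts_sim = 1L,
  subject = rep(1:2, times = 3)
)

# Number the runs in order of first appearance of each facts_id.
patients$sim_index <- match(patients$facts_id, unique(patients$facts_id))
patients$sim_index
#> [1] 1 1 2 2 3 3
```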

Parallel computing happens when we call sim_once() repeatedly over several parallel workers. A powerful and convenient parallel computing solution is clustermq; mclapply() from the parallel package is a quick and dirty alternative. Here is a sketch of how to use clustermq with rfacts.

# Configure clustermq to use your grid and your template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
# Run the computation.
library(clustermq)
patients <- Q(
  fun = sim_once,
  iter = seq_len(50),
  # The name must match the param_files argument of sim_once().
  const = list(param_files = param_files),
  pkgs = c("fs", "rfacts"),
  n_jobs = 4
) %>%
  bind_rows()
# Show aggregated patient data.
patients

Alternatives to clustermq include parallel::mclapply(), furrr::future_map(), and future.apply::future_lapply().
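
As a rough illustration of the parallel::mclapply() route, the sketch below maps a function over iteration numbers on local cores. To stay runnable without FACTS, it uses a hypothetical stand-in, sim_stub(), with the same arguments as sim_once(); in practice sim_once() would take its place. The core count of 2 is an arbitrary choice.

```r
library(parallel)

# Stand-in for sim_once(): returns a one-row data frame instead of
# running FACTS, so the sketch runs without rfacts installed.
sim_stub <- function(iter, param_files) {
  data.frame(iter = iter, param_files = param_files)
}

# mclapply() relies on forking, so fall back to one core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(
  seq_len(4),
  sim_stub,
  param_files = "param_files",
  mc.cores = cores
)

# Stack the per-iteration results in iteration order.
combined <- do.call(rbind, results)
combined$iter
#> [1] 1 2 3 4
```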

Helpers

get_facts_scenarios(facts_file)
#> [1] "acc1_drop1_resp1" "acc1_drop1_resp2" "acc2_drop1_resp1" "acc2_drop1_resp2"
get_facts_version(facts_file)
#> [1] "6.2.5.22668"
get_facts_versions()
#> [1] "6.3.1"   "6.2.5"   "6.0.0.1"