
rfacts


The rfacts package is an R interface to the Fixed and Adaptive Clinical Trial Simulator (FACTS) on Unix-like systems. It programmatically invokes FACTS to run clinical trial simulations, and it aggregates simulation output data into tidy data frames. These capabilities provide end-to-end automation for large-scale simulation workflows, and they enhance computational reproducibility. For more information, please visit the documentation website.

Disclaimer

rfacts is not a product of, nor supported by, Berry Consultants. The codebase of rfacts is completely independent from that of FACTS, and the former only invokes the latter through dynamic system calls.

Limitations

Installation

To install the latest release from CRAN, open R and run the following.

install.packages("rfacts")

To install the latest development version:

install.packages("remotes")
remotes::install_github("EliLillyCo/rfacts")

Next, set the RFACTS_PATHS environment variable appropriately. For instructions, please see the configuration guide.
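The expected format of RFACTS_PATHS is covered by the configuration guide and is not reproduced here. Without knowing that format, a script can still check that the variable is set at all before it launches long simulations. The helper check_rfacts_paths() below is hypothetical, not part of rfacts; it is a minimal base R sketch.

```r
# Hypothetical helper (not part of rfacts): fail fast when the
# RFACTS_PATHS environment variable has not been set.
check_rfacts_paths <- function() {
  value <- Sys.getenv("RFACTS_PATHS")  # "" when the variable is unset
  if (!nzchar(value)) {
    stop("RFACTS_PATHS is not set; see the rfacts configuration guide.",
         call. = FALSE)
  }
  invisible(value)
}

# Usage: wrap the check in tryCatch() so that a missing variable
# produces a message and a FALSE flag instead of halting the script.
configured <- tryCatch(
  {
    check_rfacts_paths()
    TRUE
  },
  error = function(e) {
    message(conditionMessage(e))
    FALSE
  }
)
```

The check only tests for presence. It does not validate the value, because the valid format is defined by the configuration guide.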

Run FACTS simulations

First, create a *.facts XML file using the FACTS GUI. The rfacts package has several built-in examples, included with permission from Berry Consultants LLC.

library(rfacts)
# get_facts_file_example() returns the path to
# an example FACTS file from rfacts itself.
# For FACTS files you create yourself in the FACTS GUI,
# you can skip get_facts_file_example().
facts_file <- get_facts_file_example("contin.facts")
basename(facts_file)
#> [1] "contin.facts"

Then, run trial simulations with run_facts(). By default, the results are written to a temporary directory. Set the output_path argument to customize the path.

out <- run_facts(
  facts_file,
  n_sims = 2,
  verbose = FALSE
)
out
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42"
head(get_csv_files(out))
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00001.csv"
#> [2] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00002.csv"
#> [3] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00001.csv"
#> [4] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00002.csv"
#> [5] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00001.csv"
#> [6] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00002.csv"

Use read_patients() to read and aggregate all the patients*.csv files. rfacts has several such functions, including read_weeks() and read_mcmc().

read_patients(out)
#> # A tibble: 2,400 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 2,390 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
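read_patients() handles the aggregation for you. To make the underlying idea concrete, here is a minimal base R sketch of collecting every patients*.csv file under a directory into one data frame with a column recording each row's source file. It is not rfacts's implementation; read_csv_family() is a hypothetical helper, and the CSV files are made-up stand-ins for FACTS output.

```r
# Create two small fake "patients" files so the sketch runs without FACTS.
scenario_dir <- file.path(tempdir(), "fake_scenario")
dir.create(scenario_dir, showWarnings = FALSE)
for (i in 1:2) {
  write.csv(
    data.frame(subject = 1:3, dose = c(0L, 1L, 2L)),
    file.path(scenario_dir, sprintf("patients%05d.csv", i)),
    row.names = FALSE
  )
}

# Hypothetical helper: read every <prefix>*.csv file below `path`
# and stack the results into a single data frame.
read_csv_family <- function(path, prefix) {
  files <- list.files(path, pattern = paste0("^", prefix, ".*\\.csv$"),
                      recursive = TRUE, full.names = TRUE)
  tables <- lapply(files, function(file) {
    data <- read.csv(file)
    data$source_csv <- file  # record which file each row came from
    data
  })
  do.call(rbind, tables)
}

patients <- read_csv_family(scenario_dir, "patients")
```

Keeping the source path on every row mirrors the facts_csv column in the read_patients() output, which makes it possible to trace any row back to the file it came from.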

The simulation process

run_facts() has two sequential stages:

  1. run_flfll(): generate the *.param files and the folder structure for the FACTS Linux engines.
  2. run_engine(): execute the instructions in the *.param files to conduct trial simulations and produce CSV output.

out <- run_flfll(facts_file, verbose = FALSE)
run_engine(facts_file, param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

run_engine() automatically detects the Linux engine required for your FACTS file. If you know the engine in advance, you can use a specific engine function such as run_engine_contin() or run_engine_dichot().

out <- run_flfll(facts_file, verbose = FALSE)
run_engine_contin(param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

If you are unsure which engine function to use, call get_facts_engine().

get_facts_engine(facts_file)
#> [1] "run_engine_contin"
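Because get_facts_engine() returns the engine function's name as a string, that name can be turned into a callable function with base R's match.fun(). The sketch below shows the pattern with two hypothetical stand-in functions, fake_engine_contin() and fake_engine_dichot(), so it runs without FACTS. They only format their arguments and are not rfacts functions.

```r
# Hypothetical stand-ins for engine functions; they only echo arguments.
fake_engine_contin <- function(param_files, n_sims) {
  sprintf("contin engine: %d sims in %s", n_sims, param_files)
}
fake_engine_dichot <- function(param_files, n_sims) {
  sprintf("dichot engine: %d sims in %s", n_sims, param_files)
}

# A function name stored as a string, like the value that
# get_facts_engine() returns.
engine_name <- "fake_engine_contin"

# match.fun() looks up the function by name, so the calling code
# does not hard-code which engine to use.
engine <- match.fun(engine_name)
result <- engine(param_files = "params_dir", n_sims = 4L)
```

With rfacts attached, the analogous call would presumably be `match.fun(get_facts_engine(facts_file))(param_files = out, n_sims = 4, verbose = FALSE)`. In most cases, run_engine() already performs this detection for you.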

Run a single scenario

If we take control of the simulation process, we can pick and choose which FACTS simulation scenarios to run and read.

# Example FACTS file built into rfacts.
facts_file <- get_facts_file_example("contin.facts")
# Set up the files for the scenarios.
param_files <- run_flfll(facts_file, verbose = FALSE)
# Each scenario has its own folder with internal parameter files.
scenarios <- get_param_dirs(param_files) # not in rfacts <= 1.0.0
scenarios
#> [1] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp1_params"
#> [2] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp2_params"
#> [3] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp1_params"
#> [4] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp2_params"
# Let's pick one of those scenarios and run the simulations.
scenario <- scenarios[1]
run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
read_patients(scenario)
#> # A tibble: 600 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 590 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
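Picking `scenarios[1]` by position works, but selecting by the scenario name embedded in each folder name is more robust if the order ever changes. Here is a base R sketch, using the directory paths printed by get_param_dirs() in the example above (on your machine they will differ), that keeps every scenario whose folder name contains "resp2".

```r
# Scenario directories copied from the get_param_dirs() output above.
scenarios <- c(
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp1_params",
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp2_params",
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp1_params",
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp2_params"
)

# Match on the folder name (basename) rather than the full path,
# so the temporary prefix does not affect the selection.
selected <- scenarios[grepl("_resp2_params$", basename(scenarios))]
basename(selected)
```

Each selected directory could then be run and read the same way as the single scenario above, for example with `lapply(selected, run_engine_contin, n_sims = 2, verbose = FALSE)` followed by read_patients() on each directory.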

Parallel computing

rfacts makes it straightforward to parallelize across simulations. First, use run_flfll() to create a directory of param files. Be sure to supply an output_path that all the parallel workers can access (e.g., not a tempfile()).

library(rfacts)
facts_file <- get_facts_file_example("contin.facts")
param_files <- file.path(getwd(), "param_files")
run_flfll(facts_file, param_files)
#> [1] "/home/c240390/projects/rfacts/param_files"
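The requirement that output_path not be a tempfile() can be enforced before any work is dispatched. The sketch below uses a hypothetical helper, assert_shared_path(), which is not an rfacts function. It relies on a plain string-prefix test against tempdir(), so it catches the tempfile() case but not every unsuitable location (for example, other node-local directories).

```r
# Hypothetical guard (not part of rfacts): refuse an output path inside
# this R session's temporary directory. That directory lives on local
# storage that other workers cannot see, and R deletes it on exit.
assert_shared_path <- function(path) {
  if (startsWith(path, tempdir())) {
    stop("'", path, "' is inside tempdir(); choose a shared location.",
         call. = FALSE)
  }
  invisible(path)
}

# Passes: a directory under the current working directory.
assert_shared_path(file.path(getwd(), "param_files"))
```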

Next, write a custom function that accepts the param files, runs a single simulation for each param file, and returns the important data in memory. Be sure to set a unique seed for each simulation iteration.

sim_once <- function(iter, param_files) {
  # Copy param files to a temp file in order to
  # (1) avoid race conditions in parallel processing, and
  # (2) make things run faster: temp files are on local node storage.
  out <- tempfile()
  fs::dir_copy(path = param_files, new_path = out)
  # Run the engine once per param file.
  run_engine_contin(out, n_sims = 1L, seed = iter)
  # Return aggregated patients files.
  read_patients(out) # Reads fast because `out` is a tempfile().
}

At this point, we should test this function locally without parallel computing.

library(dplyr)
# All the patients files were named patients00001.csv,
# so do not trust the facts_sim column.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(4), sim_once, param_files = param_files) %>%
  bind_rows()
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
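Since every iteration reports facts_sim = 1, the facts_id column is the recommended way to tell iterations apart. If you control the per-iteration function, a complementary option is to attach the iteration index yourself before binding. The sketch below uses a hypothetical stand-in, fake_sim_once(), that returns a tiny data frame instead of running FACTS.

```r
# Hypothetical stand-in for sim_once(): every call reports facts_sim = 1,
# mimicking output where each run writes patients00001.csv.
fake_sim_once <- function(iter) {
  data.frame(facts_sim = 1L, subject = 1:3)
}

# Tag each iteration's rows with the loop index before binding,
# so iterations remain distinguishable after aggregation.
tagged <- lapply(seq_len(4), function(iter) {
  out <- fake_sim_once(iter)
  out$iter <- iter
  out
})
all_rows <- do.call(rbind, tagged)
table(all_rows$iter)
```

With the real sim_once(), the same wrapper would add an iter column next to the facts_* columns that read_patients() returns.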

Parallel computing happens when we call sim_once() repeatedly over several parallel workers. A powerful and convenient parallel computing solution is clustermq. Here is a sketch of how to use it with rfacts. mclapply() from the parallel package is a quick and dirty alternative.

# Configure clustermq to use your grid and your template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(
  clustermq.scheduler = "sge",
  clustermq.template = "clustermq.tmpl"
)
# Run the computation.
library(clustermq)
library(dplyr)
patients <- Q(
  fun = sim_once,
  iter = seq_len(50),
  # Pass param_files under the argument name sim_once() expects.
  const = list(param_files = param_files),
  pkgs = c("fs", "rfacts"),
  n_jobs = 4
) %>%
  bind_rows()
# Show aggregated patient data.
patients

Alternatives to clustermq include parallel::mclapply(), furrr::future_map(), and future.apply::future_lapply().
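Here is a minimal sketch of the parallel::mclapply() route. It also shows why each iteration should set its own seed: then the results do not depend on which worker ran which iteration. The function fake_sim_once() is a hypothetical stand-in that seeds R's own random number generator and draws fake responses. The real sim_once() instead passes `seed = iter` to the FACTS engine.

```r
library(parallel)

# Hypothetical stand-in for sim_once(): seed with the iteration index,
# then draw fake "responses" instead of running FACTS.
fake_sim_once <- function(iter) {
  set.seed(iter)
  data.frame(iter = iter, response = rnorm(3))
}

# Sequential reference run.
sequential <- do.call(rbind, lapply(seq_len(4), fake_sim_once))

# Forked parallel run. This works on Unix-like systems only;
# mc.cores > 1 is not supported on Windows.
parallel_run <- do.call(
  rbind,
  mclapply(seq_len(4), fake_sim_once, mc.cores = 2)
)

# Per-iteration seeding makes both runs produce identical results.
identical(sequential, parallel_run)
```

mclapply() forks the current R session, so functions and objects defined before the call are visible to the workers without being exported. That convenience is also why it is a "quick and dirty" option compared with a scheduler-aware tool like clustermq.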

Helpers

Various get_facts_*() functions interrogate FACTS files.

get_facts_scenarios(facts_file)
#> [1] "acc1_drop1_resp1" "acc1_drop1_resp2" "acc2_drop1_resp1" "acc2_drop1_resp2"
get_facts_version(facts_file)
#> [1] "6.2.5.22668"
get_facts_versions()
#> [1] "6.3.1"   "6.2.5"   "6.0.0.1"
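Suppose get_facts_versions() lists the FACTS versions available to rfacts. The output above suggests this, but the README does not state it. Under that assumption, a script could check that a file's FACTS release is among them before running simulations. The sketch below compares only the first three version components, because the file reports a four-part build number. The helper major_minor_patch() is hypothetical, and this matching rule is an assumption, not documented rfacts behavior.

```r
# Version strings copied from the output above.
file_version <- "6.2.5.22668"                 # get_facts_version(facts_file)
installed <- c("6.3.1", "6.2.5", "6.0.0.1")   # get_facts_versions()

# Hypothetical helper: keep the first three dot-separated components.
# Assumes every version string has at least three components.
major_minor_patch <- function(version) {
  parts <- strsplit(version, ".", fixed = TRUE)[[1]]
  paste(parts[1:3], collapse = ".")
}

file_release <- major_minor_patch(file_version)
installed_releases <- vapply(installed, major_minor_patch, character(1),
                             USE.NAMES = FALSE)
file_release %in% installed_releases
```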
