rfacts 0.2.1.9000

rfacts

The rfacts package is an R interface to the Fixed and Adaptive Clinical Trial Simulator (FACTS) on Unix-like systems. It programmatically invokes FACTS to run clinical trial simulations, and it aggregates simulation output data into tidy data frames. These capabilities provide end-to-end automation for large-scale simulation workflows, and they enhance computational reproducibility. For more information, please visit the documentation website.

Disclaimer

rfacts is not a product of nor supported by Berry Consultants. The code base of rfacts is completely independent from that of FACTS, and the former only invokes the latter through dynamic system calls.

Limitations

  • FACTS files prior to version 6.2.4 are unsupported.
  • rfacts only works on Unix-like systems.
  • rfacts requires paths to pre-compiled versions of Mono, FLFLL, and the FACTS Linux engines. See the installation instructions below and the configuration guide.

Installation

To install the latest release from CRAN, open R and run the following.

install.packages("rfacts")

To install the latest development version:

install.packages("remotes")
remotes::install_github("EliLillyCo/rfacts")

Next, set the RFACTS_PATHS environment variable appropriately. For instructions, please see the configuration guide.
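Before running any simulations, it can help to confirm that the variable is visible to the current R session. The following is a minimal sketch using only base R; the value RFACTS_PATHS should hold is described in the configuration guide and is not assumed here, and `rfacts_paths_set` is just an illustrative name.

```r
# Read the environment variable; an empty string means it is not set.
paths <- Sys.getenv("RFACTS_PATHS", unset = "")
rfacts_paths_set <- nzchar(paths)

if (!rfacts_paths_set) {
  message("RFACTS_PATHS is not set; see the configuration guide.")
}
```

To make the setting persist across sessions, the variable can be defined in a `.Renviron` file, which R reads at startup.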

Run FACTS simulations

First, create a *.facts XML file using the FACTS GUI. The rfacts package has several built-in examples, included with permission from Berry Consultants LLC.

library(rfacts)
# get_facts_file_example() returns the path to
# an example FACTS file from rfacts itself.
# For FACTS files you create yourself in the FACTS GUI,
# you can skip get_facts_file_example().
facts_file <- get_facts_file_example("contin.facts")
basename(facts_file)
#> [1] "contin.facts"

Then, run trial simulations with run_facts(). By default, the results are written to a temporary directory. Set the output_path argument to customize the path.

out <- run_facts(
  facts_file,
  n_sims = 2,
  verbose = FALSE
)
out
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42"
head(get_csv_files(out))
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00001.csv"
#> [2] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00002.csv"
#> [3] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00001.csv"
#> [4] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00002.csv"
#> [5] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00001.csv"
#> [6] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00002.csv"

Use read_patients() to read and aggregate all the patients*.csv files. rfacts has several such functions, including read_weeks() and read_mcmc().

read_patients(out)
#> # A tibble: 2,400 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 2,390 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

The simulation process

run_facts() has two sequential stages:

  1. run_flfll(): generate the *.param files and the folder structure for the FACTS Linux engines.
  2. run_engine(): execute the instructions in the *.param files to conduct trial simulations and produce CSV output.

out <- run_flfll(facts_file, verbose = FALSE)
run_engine(facts_file, param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

run_engine() automatically detects the Linux engine required for your FACTS file. If you know the engine in advance, you can use a specific engine function such as run_engine_contin() or run_engine_dichot().

out <- run_flfll(facts_file, verbose = FALSE)
run_engine_contin(param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>

If you are unsure which engine function to use, call get_facts_engine().

get_facts_engine(facts_file)
#> [1] "run_engine_contin"

Run a single scenario

If we take control of the simulation process, we can pick and choose which FACTS simulation scenarios to run and read.

# Example FACTS file built into rfacts.
facts_file <- get_facts_file_example("contin.facts")
# Set up the files for the scenarios.
param_files <- run_flfll(facts_file, verbose = FALSE)
# Each scenario has its own folder with internal parameter files.
scenarios <- get_param_dirs(param_files) # not in rfacts <= 1.0.0
scenarios
#> [1] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp1_params"
#> [2] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp2_params"
#> [3] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp1_params"
#> [4] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp2_params"
# Let's pick one of those scenarios and run the simulations.
scenario <- scenarios[1]
run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
read_patients(scenario)
#> # A tibble: 600 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 590 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
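Selecting by position works for a quick test, but scenario folders can also be chosen by name. The sketch below uses only base R and the four scenario directories printed above; the "acc1_" prefix is an arbitrary choice for illustration.

```r
# Scenario directories as printed by get_param_dirs(param_files) above.
scenarios <- c(
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp1_params",
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp2_params",
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp1_params",
  "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp2_params"
)

# Keep only the scenarios whose folder name starts with "acc1_".
selected <- scenarios[startsWith(basename(scenarios), "acc1_")]
basename(selected)
#> [1] "acc1_drop1_resp1_params" "acc1_drop1_resp2_params"

# Each selected folder could then be passed to the engine in turn, e.g.
# for (scenario in selected) run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
```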

Parallel computing

rfacts makes it straightforward to parallelize across simulations. First, use run_flfll() to create a directory of param files. Be sure to supply an output_path that all the parallel workers can access (e.g. not a tempfile()).

library(rfacts)
facts_file <- get_facts_file_example("contin.facts")
param_files <- file.path(getwd(), "param_files")
run_flfll(facts_file, param_files)
#> [1] "/home/c240390/projects/rfacts/param_files"

Next, write a custom function that accepts the param files, runs a single simulation for each param file, and returns the important data in memory. Be sure to set a unique seed for each simulation iteration.

sim_once <- function(iter, param_files) {
  # Copy param files to a temp file in order to
  # (1) Avoid race conditions in parallel processing, and
  # (2) Make things run faster: temp files are on local node storage.
  out <- tempfile()
  fs::dir_copy(path = param_files, new_path = out)
  # Run the engine once per param file.
  run_engine_contin(out, n_sims = 1L, seed = iter)
  # Return aggregated patients files.
  read_patients(out) # Reads fast because `out` is a tempfile().
}

At this point, we should test this function locally without parallel computing.

library(dplyr)
# All the patients files were named patients00001.csv,
# so do not trust the facts_sim column.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(4), sim_once, param_files = param_files) %>%
  bind_rows()
#> # A tibble: 4,800 x 15
#>    facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#>    <chr>      <chr>              <int> <chr>    <chr>        <chr>
#>  1 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  2 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  3 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  4 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  5 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  6 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  7 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  8 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#>  9 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re…         1 file427… patients     /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> #   subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> #   dropout <int>, baseline <lgl>, visit_1 <dbl>
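The advice to rely on facts_id rather than facts_sim can be illustrated with a small post-processing sketch. It assumes that facts_id differs for each temporary directory created by sim_once(), as the truncated `file427…` values suggest. The data frame is a made-up stand-in (two iterations, two scenarios, two patients each) that borrows only its column names from the output above; the values and the name `patients_per_run` are hypothetical.

```r
# Toy stand-in for the combined output of sim_once(); values are made up.
patients <- data.frame(
  facts_scenario = rep(rep(c("acc1_drop1_resp1", "acc1_drop1_resp2"), each = 2), times = 2),
  facts_sim = 1L, # always 1, because every file is named patients00001.csv
  facts_id = rep(c("id_a", "id_b"), each = 4),
  subject = rep(1:2, times = 4)
)

# facts_sim cannot distinguish runs: it has a single unique value.
length(unique(patients$facts_sim))

# Count patients per simulated trial, keyed on scenario and facts_id.
patients_per_run <- aggregate(
  subject ~ facts_scenario + facts_id,
  data = patients,
  FUN = length
)
names(patients_per_run)[3] <- "n_patients"
patients_per_run
```

The same grouping can be written with dplyr's group_by() and summarize() when working with the real tibble.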

Parallel computing happens when we call sim_once() repeatedly over several parallel workers. A powerful and convenient parallel computing solution is clustermq. Here is a sketch of how to use it with rfacts. mclapply() from the parallel package is a quick and dirty alternative.

# Configure clustermq to use your grid and your template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
# Run the computation.
# %>% and bind_rows() come from dplyr, which was loaded above.
library(clustermq)
patients <- Q(
  fun = sim_once,
  iter = seq_len(50),
  # The name must match the param_files argument of sim_once().
  const = list(param_files = param_files),
  pkgs = c("fs", "rfacts"),
  n_jobs = 4
) %>%
  bind_rows()
# Show aggregated patient data.
patients

Alternatives to clustermq include parallel::mclapply(), furrr::future_map(), and future.apply::future_lapply().
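For a single multi-core machine, the parallel::mclapply() route might look like the sketch below. `run_iterations()` is a hypothetical helper written for this illustration, not part of rfacts. It expects a function with the same (iter, param_files) signature as sim_once(). mclapply() relies on process forking, which is available on the Unix-like systems rfacts targets.

```r
library(parallel)

# Call sim_fun(iter, param_files) for iter = 1, ..., n_iter on forked workers
# and return the results as a list.
run_iterations <- function(sim_fun, n_iter, param_files, cores = 2L) {
  mclapply(
    seq_len(n_iter),
    sim_fun,
    param_files = param_files,
    mc.cores = cores
  )
}

# With sim_once() and param_files defined as in the earlier steps:
# patients <- dplyr::bind_rows(run_iterations(sim_once, 50, param_files, cores = 4L))
```

Because sim_once() copies the param files to its own temporary directory and receives its iteration number as the seed, each worker stays independent of the others.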

Helpers

Various get_facts_*() functions interrogate FACTS files.

get_facts_scenarios(facts_file)
#> [1] "acc1_drop1_resp1" "acc1_drop1_resp2" "acc2_drop1_resp1" "acc2_drop1_resp2"
get_facts_version(facts_file)
#> [1] "6.2.5.22668"
get_facts_versions()
#> [1] "6.3.1"   "6.2.5"   "6.0.0.1"
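The version string returned by get_facts_version() can be compared against the 6.2.4 minimum listed under Limitations. The following is a minimal base R sketch that hard-codes the string printed above; rfacts may perform its own checks, and the variable names here are illustrative only.

```r
# Version string as printed by get_facts_version(facts_file) above.
facts_version <- "6.2.5.22668"
# Minimum supported FACTS file version, from the Limitations section.
minimum_version <- "6.2.4"

# numeric_version() compares component by component (6, 2, 5, ... vs 6, 2, 4).
supported <- numeric_version(facts_version) >= numeric_version(minimum_version)
supported
#> [1] TRUE
```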
