| Title: | Intensive Care Unit Data with R |
| Description: | Focused on (but not exclusive to) data sets hosted on PhysioNet (https://physionet.org), 'ricu' provides utilities for download, setup and access of intensive care unit (ICU) data sets. In addition to functions for running arbitrary queries against available data sets, a system for defining clinical concepts and encoding their representations in tabular ICU data is presented. |
| Version: | 0.6.3 |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| Language: | en-US |
| URL: | https://github.com/eth-mds/ricu, https://physionet.org |
| BugReports: | https://github.com/eth-mds/ricu/issues |
| Depends: | R (≥ 3.5.0) |
| Imports: | data.table, curl, assertthat, fst, readr, jsonlite, methods, stats, prt (≥ 0.1.2), tibble, backports, rlang, vctrs, cli (≥ 2.1.0), fansi, openssl, utils |
| Suggests: | xml2, covr, testthat (≥ 3.0.0), withr, mockthat, pkgload, mimic.demo, eicu.demo, progress, knitr, rmarkdown, ggplot2, cowplot, survival, forestmodel, rticles, kableExtra, units, pdftools, magick, pillar |
| RoxygenNote: | 7.3.2 |
| Additional_repositories: | https://eth-mds.github.io/physionet-demo |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2025-09-03 21:17:03 UTC; nbennett |
| Author: | Nicolas Bennett [aut, cre], Drago Plecko [aut], Ida-Fong Ukor [aut] |
| Maintainer: | Nicolas Bennett <r@nbenn.ch> |
| Repository: | CRAN |
| Date/Publication: | 2025-09-03 21:50:09 UTC |
ricu: Intensive Care Unit Data with R
Description
Focused on (but not exclusive to) data sets hosted on PhysioNet (https://physionet.org), 'ricu' provides utilities for download, setup and access of intensive care unit (ICU) data sets. In addition to functions for running arbitrary queries against available data sets, a system for defining clinical concepts and encoding their representations in tabular ICU data is presented.
Author(s)
Maintainer: Nicolas Bennett r@nbenn.ch
Authors:
Drago Plecko drago.plecko@stat.math.ethz.ch
Ida-Fong Ukor ida-fong.ukor@monashhealth.org
See Also
Useful links:
Report bugs at https://github.com/eth-mds/ricu/issues
Internal item callback utilities
Description
The utility function add_concept() is exported for convenience when adding external datasets and integrating concepts that require other concepts. While this could be solved by defining a rec_cncpt, in some scenarios this might not be ideal, as it might only be required that item implementations for certain data sources pull in additional information. Examples for this include vasopressor rates, which might rely on patient weight, and blood cell counts when expressed as ratios. For performance reasons, the pulled-in concept is internally cached, as it might be used unchanged many times when loading several concepts that each need to pull in the given concept. Cache persistence is session-level and this utility is therefore intended to be used somewhat sparingly.
Usage
add_concept(x, env, concept, var_name = concept, aggregate = NULL)

add_weight(x, env, var_name = "weight")

calc_dur(x, val_var, min_var, max_var, grp_var = NULL)

combine_callbacks(...)

Arguments
x | Object in loading |
env | Data source environment as available as |
| concept | String valued concept name that will be loaded from the default dictionary |
var_name | String valued variable name |
aggregate | Forwarded to |
val_var | String valued column name corresponding to the value variable |
min_var,max_var | Column names denoting start and end times |
grp_var | Optional grouping variable (for example linking infusions) |
... | Functions which will be successively applied |
Value
A copy of x with the requested concept merged in.
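As an illustration of the intended use, a custom item callback might pull in patient weight in order to express an absolute infusion rate per kilogram of body weight. The following sketch is hypothetical (the callback signature is simplified and the unit conversion is an assumption, not part of ricu):

```r
# Hypothetical item callback: convert an absolute infusion rate into a
# weight-relative one by pulling in the `weight` concept via add_weight()
norm_rate_per_kg <- function(x, val_var, env, ...) {
  # merges the (cached) `weight` concept into `x`, adding a `weight` column
  x <- add_weight(x, env, var_name = "weight")

  # divide the loaded values by patient weight (units assumed consistent)
  x <- x[, c(val_var) := get(val_var) / weight]

  # drop the helper column again before returning
  x[, weight := NULL]
}
```

Such a callback would then be referenced from an item definition in a concept dictionary, rather than called directly.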
Data attach utilities
Description
Making a dataset available to ricu consists of 3 steps: downloading (download_src()), importing (import_src()) and attaching (attach_src()). While downloading and importing are one-time procedures, attaching of the dataset is repeated every time the package is loaded. Briefly, downloading loads the raw dataset from the internet (most likely in .csv format), importing consists of some preprocessing to make the data available more efficiently and attaching sets up the data for use by the package.
Usage
attach_src(x, ...)

## S3 method for class 'src_cfg'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

## S3 method for class 'character'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

detach_src(x)

setup_src_env(x, ...)

## S3 method for class 'src_cfg'
setup_src_env(x, data_dir = src_data_dir(x), link_env = NULL, ...)

Arguments
x | Data source to attach |
... | Forwarded to further calls to |
| assign_env,link_env | Environment in which the data source will become available |
data_dir | Directory used to look for |
Details
Attaching a dataset sets up two types of S3 classes: a single src_env object, containing as many src_tbl objects as tables are associated with the dataset. A src_env is an environment with an id_cfg attribute, as well as sub-classes as specified by the data source class_prefix configuration setting (see load_src_cfg()). All src_env objects created by calling attach_src() represent environments that are direct descendants of the data environment and are bound to the respective dataset name within that environment. For more information on src_env and src_tbl objects, refer to new_src_tbl().
If set up correctly, it is not necessary for the user to directly call attach_src(). When the package is loaded, the default data sources (see auto_attach_srcs()) are attached automatically. This default can be controlled by setting the environment variable RICU_SRC_LOAD to a comma separated list of data source names before loading the library. Setting this environment variable as
Sys.setenv(RICU_SRC_LOAD = "mimic_demo,eicu_demo")
will change the default of loading both MIMIC-III and eICU, alongside the respective demo datasets, as well as HiRID and AUMC, to just the two demo datasets. For setting an environment variable upon startup of the R session, refer to base::.First.sys().
Attaching a dataset during package namespace loading will both instantiate a corresponding src_env in the data environment and for convenience also assign this object into the package namespace, such that for example the MIMIC-III demo dataset not only is available as ricu::data::mimic_demo, but also as ricu::mimic_demo (or, if the package namespace is attached, simply as mimic_demo). Dataset attaching using attach_src() does not need to happen during namespace loading, but can be triggered by the user at any time. If such a convenience link as described above is desired by the user, an environment such as .GlobalEnv has to be passed as assign_env to attach_src().
Data sets are set up as src_env objects irrespective of whether all (or any) of the required data is available. If some (or all) data is missing, the user is asked for permission to download in interactive sessions and an error is thrown in non-interactive sessions. Downloading demo datasets requires no further information, but access to full-scale datasets (even though they are publicly available) is guarded by access credentials (see download_src()).
While attach_src() provides the main entry point, src_env objects are instantiated by the S3 generic function setup_src_env() and the wrapping function serves to catch errors that might be caused by config file parsing issues, so as not to break attaching of the package namespace. Apart from this, attach_src() also provides the convenience linking into the package namespace (or a user-specified environment) described above.
A src_env object created by setup_src_env() does not directly contain src_tbl objects bound to names, but rather an active binding (see base::makeActiveBinding()) per table. These active bindings check for availability of required files and evaluate to corresponding src_tbl objects if these checks pass, and ask for user input otherwise. As src_tbl objects are intended to be read-only, assignment is not possible except for the value NULL, which resets the internally cached src_tbl that is created on first successful access.
Value
Both attach_src() and setup_src_env() are called for side effects and therefore return invisibly. While attach_src() returns NULL, setup_src_env() returns the newly created src_env object.
Examples
## Not run:
Sys.setenv(RICU_SRC_LOAD = "")
library(ricu)

ls(envir = data)
exists("mimic_demo")

attach_src("mimic_demo", assign_env = .GlobalEnv)

ls(envir = data)
exists("mimic_demo")

mimic_demo
## End(Not run)
ICU class data reshaping
Description
Utilities for reshaping id_tbl and ts_tbl objects.
Usage
cbind_id_tbl(
  ...,
  keep.rownames = FALSE,
  check.names = FALSE,
  key = NULL,
  stringsAsFactors = FALSE
)

rbind_id_tbl(
  ...,
  use.names = TRUE,
  fill = FALSE,
  idcol = NULL,
  ignore.attr = FALSE
)

## S3 method for class 'id_tbl'
merge(x, y, by = NULL, by.x = NULL, by.y = NULL, ...)

## S3 method for class 'id_tbl'
split(x, ...)

rbind_lst(x, ...)

merge_lst(x)

unmerge(x, col_groups = as.list(data_vars(x)), by = meta_vars(x), na_rm = TRUE)

Arguments
... | Objects to combine |
| keep.rownames,check.names,key,stringsAsFactors | Forwarded to data.table::data.table |
| use.names,fill,idcol | Forwarded to data.table::rbindlist |
x,y | Objects to combine |
by,by.x,by.y | Column names used for combining data |
| col_groups | A list of character vectors defining the grouping of non-by columns |
| na_rm | Logical flag indicating whether to remove rows that have all missing entries in the respective |
Value
Either id_tbl or ts_tbl objects (depending on inputs) or lists thereof in case of split() and unmerge().
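As a minimal sketch of the merge behavior, consider two patient-level tables sharing an ID column (this assumes the id_tbl() constructor, which by default treats the first column as the ID; values are illustrative):

```r
library(ricu)

# two patient-level tables with a common ID column
age <- id_tbl(patient_id = 1:3, age = c(71, 58, 64))
wgt <- id_tbl(patient_id = 2:4, weight = c(82, 70, 95))

# the id_tbl merge method joins on the shared meta (ID) columns by default
merge(age, wgt)              # inner join: patients 2 and 3 only
merge(age, wgt, all = TRUE)  # outer join over patients 1-4, with NAs
```

The result is again an id_tbl, so further ricu operations can be chained on it.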
Switch between id types
Description
ICU datasets such as MIMIC-III or eICU typically represent patients by multiple ID systems, such as patient IDs, hospital stay IDs and ICU admission IDs. Even if the raw data is available in only one such ID system, given a mapping of IDs alongside start and end times, it is possible to convert data from one ID system to another. The function change_id() provides such a conversion utility, internally either calling upgrade_id() when moving to an ID system with higher cardinality or downgrade_id() when the target ID system is of lower cardinality.
Usage
change_id(x, target_id, src, ..., keep_old_id = TRUE, id_type = FALSE)

upgrade_id(x, target_id, src, cols = time_vars(x), ...)

downgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'ts_tbl'
upgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'id_tbl'
upgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'ts_tbl'
downgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'id_tbl'
downgrade_id(x, target_id, src, cols = time_vars(x), ...)

Arguments
x |
|
target_id | The destination id name |
src | Passed to |
... | Passed to |
keep_old_id | Logical flag indicating whether to keep the previous IDcolumn |
id_type | Logical flag indicating whether |
cols | Column names that require time-adjustment |
Details
In order to provide ID system conversion for a data source, the (internal) function id_map() must be able to construct an ID mapping for that data source. Constructing such a mapping can be expensive relative to the frequency with which it might be re-used and therefore id_map() provides caching infrastructure. The mapping itself is constructed by the (internal) function id_map_helper(), which is expected to provide source and destination ID columns, as well as start and end columns corresponding to the destination ID, relative to the source ID system. In the following example, we request a mapping for mimic_demo, with ICU stay IDs as source and hospital admissions as destination IDs.
id_map_helper(mimic_demo, "icustay_id", "hadm_id")
#> # An `id_tbl`: 136 x 4
#> # Id var:     `icustay_id`
#>     icustay_id hadm_id hadm_id_start hadm_id_end
#>          <int>   <int>        <drtn>      <drtn>
#>   1     201006  198503    -3291 mins   9113 mins
#>   2     201204  114648       -2 mins   6949 mins
#>   3     203766  126949    -1336 mins   8818 mins
#>   4     204132  157609       -1 mins  10103 mins
#>   5     204201  177678     -369 mins   9444 mins
#> ...
#> 132     295043  170883   -10413 mins  31258 mins
#> 133     295741  176805       -2 mins   3152 mins
#> 134     296804  110244    -1295 mins   4598 mins
#> 135     297782  167612       -1 mins    207 mins
#> 136     298685  151323       -1 mins  19082 mins
#> # i 131 more rows
Both start and end columns encode the hospital admission windows relative to each corresponding ICU stay start time. It therefore comes as no surprise that most start times are negative (hospital admission typically occurs before ICU stay start time), while end times are often days in the future (as hospital discharge typically occurs several days after ICU admission).
In order to use the ID conversion infrastructure offered by ricu for a new dataset, it typically suffices to provide an id_cfg entry in the source configuration (see load_src_cfg()), outlining the available ID systems alongside an ordering, as well as potentially a class-specific implementation of id_map_helper() for the given source class, specifying the corresponding time windows in 1 minute resolution (for every possible pair of IDs).
While both up- and downgrades for id_tbl objects, as well as downgrades for ts_tbl objects, are simple merge operations based on the ID mapping provided by id_map(), ID upgrades for ts_tbl objects are slightly more involved. As an example, consider the following setting: we have data associated with hadm_id IDs and times relative to hospital admission:
               1      2       3        4       5       6        7      8
data        ---*------*-------*--------*-------*-------*--------*------*---
               3h     10h     18h      27h     35h     43h      52h    59h

            0h     7h                26h    37h               53h        62h
hadm_id     |--------------------------------------------------------------|
icustay_id         |------------------|     |------------------|
                   0h                19h    0h                16h
                         ICU_1                     ICU_2
The mapping of data points from hadm_id to icustay_id is created as follows: ICU stay end times mark boundaries and all data that is recorded after the last ICU stay ended is assigned to the last ICU stay. Therefore data points 1-3 are assigned to ICU_1, while 4-8 are assigned to ICU_2. Times have to be shifted as well, as timestamps are expected to be relative to the current ID system. Data points 1-3 therefore are assigned to time stamps -4h, 3h and 11h, while data points 4-8 are assigned to -10h, -2h, 6h, 15h and 22h. Implementation-wise, the mapping is computed using an efficient data.table rolling join.
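The boundary logic described above can be sketched in plain base R, using the toy numbers from the figure (this is only an illustration of the assignment rule; the actual implementation uses a data.table rolling join):

```r
# data point times (hours, relative to hospital admission)
times <- c(3, 10, 18, 27, 35, 43, 52, 59)

# ICU stay windows, also relative to hospital admission
icu_start <- c(ICU_1 = 7,  ICU_2 = 37)
icu_end   <- c(ICU_1 = 26, ICU_2 = 53)

# ICU stay end times act as boundaries; everything after the last end
# time falls through to the last stay
idx <- findInterval(times, head(icu_end, -1)) + 1L

names(icu_start)[idx]   # "ICU_1" for points 1-3, "ICU_2" for points 4-8
times - icu_start[idx]  # -4  3 11 -10  -2   6  15  22
```

The shifted times reproduce the -4h, 3h, 11h and -10h through 22h values given in the text.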
Value
An object of the same type asx with modified IDs.
Examples
if (require(mimic.demo)) {
tbl <- mimic_demo$labevents
dat <- load_difftime(tbl, itemid == 50809, c("charttime", "valuenum"))
dat

change_id(dat, "icustay_id", tbl, keep_old_id = FALSE)
}
Internal utilities for ICU data classes
Description
Internal utilities for ICU data classes
Usage
col_renamer(x, new, old = colnames(x), skip_absent = FALSE, by_ref = FALSE)

Arguments
x | Object to query |
| new,old | Replacement names and existing column names for renaming columns |
skip_absent | Logical flag for ignoring non-existent column names |
| by_ref | Logical flag indicating whether to perform the operation by reference |
ICU datasets
Description
The Laboratory for Computational Physiology (LCP) at MIT hosts several large-scale databases of hospital intensive care units (ICUs), two of which can be either downloaded in full (MIMIC-III and eICU) or as demo subsets (MIMIC-III demo and eICU demo), while a third data set is available only in full (HiRID). While demo data sets are freely available, full download requires credentialed access, which can be gained by applying for an account with PhysioNet. Even though registration is required, the described datasets are all publicly available. With AmsterdamUMCdb, a non-PhysioNet hosted data source is available as well. As with the PhysioNet datasets, access is public but has to be granted by the data collectors.
Usage
data

Format
The exported data environment contains all datasets that have been made available to ricu. For datasets that are attached during package loading (see attach_src()), shortcuts to the datasets are set up in the package namespace, allowing the object ricu::data::mimic_demo to be accessed as ricu::mimic_demo (or, in case the package namespace has been attached, simply as mimic_demo). Datasets that are made available after the package namespace has been sealed will have their proxy object by default located in .GlobalEnv. Datasets are represented by src_env objects, while individual tables are src_tbl objects; neither represents in-memory data, but rather data stored on disk, subsets of which can be loaded into memory.
Details
Setting up a dataset for use with ricu requires a configuration object. For the included datasets, configuration can be loaded from

system.file("extdata", "config", "data-sources.json", package = "ricu")

by calling load_src_cfg() and for datasets that are external to ricu, additional configuration can be made available by setting the environment variable RICU_CONFIG_PATH (for more information, refer to load_src_cfg()). Using the dataset configuration object, data can be downloaded (download_src()), imported (import_src()) and attached (attach_src()). While downloading and importing are one-time procedures, attaching of the dataset is repeated every time the package is loaded. Briefly, downloading loads the raw dataset from the internet (most likely in .csv format), importing consists of some preprocessing to make the data available more efficiently (by converting it to .fst format) and attaching sets up the data for use by the package. For more information on the individual steps, refer to the respective documentation pages.
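Put together, a one-time setup for the MIMIC-III demo dataset might look roughly as follows (a sketch, assuming load_src_cfg() returns a named list of configuration objects keyed by source name):

```r
library(ricu)

# load the configuration object for the MIMIC-III demo dataset
cfg <- load_src_cfg("mimic_demo")[["mimic_demo"]]

download_src(cfg)  # one-time: fetch the raw .csv data
import_src(cfg)    # one-time: convert to .fst for efficient access
attach_src(cfg, assign_env = .GlobalEnv)  # repeated on every package load
```

After this, `mimic_demo` is available for interactive exploration as shown in the following output.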
A dataset that has been successfully made available can interactively be explored by typing its name into the console, and individual tables can be inspected using the $ function. For example, for the MIMIC-III demo dataset and the icustays table, this gives
mimic_demo
#> <mimic_demo_env[25]>
#>   admissions       callout            caregivers       chartevents
#>   [129 x 19]       [77 x 24]          [7,567 x 4]      [758,355 x 15]
#>   cptevents        d_cpt              d_icd_diagnoses  d_icd_procedures
#>   [1,579 x 12]     [134 x 9]          [14,567 x 4]     [3,882 x 4]
#>   d_items          d_labitems         datetimeevents   diagnoses_icd
#>   [12,487 x 10]    [753 x 6]          [15,551 x 14]    [1,761 x 5]
#>   drgcodes         icustays           inputevents_cv   inputevents_mv
#>   [297 x 8]        [136 x 12]         [34,799 x 22]    [13,224 x 31]
#>   labevents        microbiologyevents outputevents     patients
#>   [76,074 x 9]     [2,003 x 16]       [11,320 x 13]    [100 x 8]
#>   prescriptions    procedureevents_mv procedures_icd   services
#>   [10,398 x 19]    [753 x 25]         [506 x 5]        [163 x 6]
#>   transfers
#>   [524 x 13]

mimic_demo$icustays
#> # <mimic_tbl>: [136 x 12]
#> # ID options:  subject_id (patient) < hadm_id (hadm) < icustay_id (icustay)
#> # Defaults:    `intime` (index), `last_careunit` (val)
#> # Time vars:   `intime`, `outtime`
#>     row_id subject_id hadm_id icustay_id dbsource   first_careunit last_careunit
#>      <int>      <int>   <int>      <int> <chr>      <chr>          <chr>
#>   1  12742      10006  142345     206504 carevue    MICU           MICU
#>   2  12747      10011  105331     232110 carevue    MICU           MICU
#>   3  12749      10013  165520     264446 carevue    MICU           MICU
#>   4  12754      10017  199207     204881 carevue    CCU            CCU
#>   5  12755      10019  177759     228977 carevue    MICU           MICU
#> ...
#> 132  42676      44083  198330     286428 metavision CCU            CCU
#> 133  42691      44154  174245     217724 metavision MICU           MICU
#> 134  42709      44212  163189     239396 metavision MICU           MICU
#> 135  42712      44222  192189     238186 metavision CCU            CCU
#> 136  42714      44228  103379     217992 metavision SICU           SICU
#> # i 131 more rows
#> # i 5 more variables: first_wardid <int>, last_wardid <int>, intime <dttm>,
#> #   outtime <dttm>, los <dbl>
Table subsets can be loaded into memory, for example using the base::subset() function, which uses non-standard evaluation (NSE) to determine a row-subsetting. This design choice stems from the fact that some tables can have on the order of 10^8 rows, which makes loading full tables into memory an expensive operation. Table subsets loaded into memory are represented as data.table objects. Extending the above example, if only ICU stays corresponding to the patient with subject_id == 10124 are of interest, the respective data can be loaded as
subset(mimic_demo$icustays, subject_id == 10124)
#>    row_id subject_id hadm_id icustay_id dbsource first_careunit last_careunit
#>     <int>      <int>   <int>      <int>   <char>         <char>        <char>
#> 1:  12863      10124  182664     261764  carevue           MICU          MICU
#> 2:  12864      10124  170883     222779  carevue           MICU          MICU
#> 3:  12865      10124  170883     295043  carevue            CCU           CCU
#> 4:  12866      10124  170883     237528  carevue           MICU          MICU
#>    first_wardid last_wardid              intime             outtime     los
#>           <int>       <int>              <POSc>              <POSc>   <num>
#> 1:           23          23 2192-03-29 10:46:51 2192-04-01 06:36:00  2.8258
#> 2:           50          50 2192-04-16 20:58:32 2192-04-20 08:51:28  3.4951
#> 3:            7           7 2192-04-24 02:29:49 2192-04-26 23:59:45  2.8958
#> 4:           23          23 2192-04-30 14:50:44 2192-05-15 23:34:21 15.3636
Much care has been taken to make ricu extensible to new datasets. For example the publicly available ICU database AmsterdamUMCdb, provided by the Amsterdam University Medical Center, currently is not part of the core datasets of ricu, but code for integrating this dataset is available on GitHub.
MIMIC-III
The Medical Information Mart for Intensive Care (MIMIC) database holds detailed clinical data from roughly 60,000 patient stays in Beth Israel Deaconess Medical Center (BIDMC) intensive care units between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital). For further information, please refer to the MIMIC-III documentation.
The corresponding demo dataset contains the full data of a randomly selected subset of 100 patients from the patient cohort with confirmed in-hospital mortality. The only notable data omission is the noteevents table, which contains unstructured text reports on patients.
eICU
More recently, Philips Healthcare and LCP began assembling the eICU Collaborative Research Database as a multi-center resource for ICU data. Combining data of several critical care units throughout the continental United States from the years 2014 and 2015, this database contains de-identified health data associated with over 200,000 admissions, including vital sign measurements, care plan documentation, severity of illness measures, diagnosis information, and treatment information. For further information, please refer to the eICU documentation.
For the demo subset, data associated with over 2,500 unit stays selected from 20 of the larger hospitals is included. An important caveat that applies to the eICU-based datasets is considerable variability among the large number of hospitals in terms of data availability.
HiRID
Moving to higher time-resolution, HiRID is a freely accessible critical care dataset containing data relating to almost 34,000 patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland. The dataset contains de-identified demographic information and a total of 681 routinely collected physiological variables, diagnostic test results and treatment parameters, collected during the period from January 2008 to June 2016. Depending on the type of measurement, time resolution can be on the order of 2 minutes.
AmsterdamUMCdb
With similar time-resolution (for vital-sign measurements) as HiRID, AmsterdamUMCdb contains data from 23,000 admissions of adult patients from 2003-2016 to the department of Intensive Care of Amsterdam University Medical Center. In total, nearly 10^9 individual observations consisting of vital signs, clinical scoring systems, device data and lab results, as well as nearly 5*10^6 medication entries, alongside de-identified demographic information corresponding to the 20,000 individual patients, are spread over 7 tables.
MIMIC-IV
The latest v2.2 release of MIMIC-IV is available in ricu. Building on the success of MIMIC-III, this next iteration contains data on patients admitted to an ICU or the emergency department between 2008 and 2019 at BIDMC. Therefore, relative to MIMIC-III, patients admitted prior to 2008 (whose data is stored in a CareVue-based system) have been removed, while data from 2012 onward has been added. This simplifies data queries considerably, as the CareVue/MetaVision data split in MIMIC-III no longer applies. While addition of ED data is planned, this is not part of the initial v1.0 release and is currently not supported by ricu. For further information, please refer to the MIMIC-IV documentation.
SICdb
The Salzburg ICU database (SICdb) originates from the University Hospital of Salzburg. In ricu, version v1.0.6 is currently supported. For further information, please refer to the SICdb documentation.
References
Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database(version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
MIMIC-III, a freely accessible critical care database. Johnson AEW, PollardTJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA,and Mark RG. Scientific Data (2016). DOI: 10.1038/sdata.2016.35.
Johnson, A., Pollard, T., Badawi, O., & Raffa, J. (2019). eICUCollaborative Research Database Demo (version 2.0). PhysioNet.https://doi.org/10.13026/gxmm-es70.
The eICU Collaborative Research Database, a freely available multi-centerdatabase for critical care research. Pollard TJ, Johnson AEW, Raffa JD,Celi LA, Mark RG and Badawi O. Scientific Data (2018). DOI:http://dx.doi.org/10.1038/sdata.2018.178.
Faltys, M., Zimmermann, M., Lyu, X., Hüser, M., Hyland, S., Rätsch, G., &Merz, T. (2020). HiRID, a high time-resolution ICU dataset (version 1.0).PhysioNet. https://doi.org/10.13026/hz5m-md48.
Hyland, S.L., Faltys, M., Hüser, M. et al. Early prediction of circulatoryfailure in the intensive care unit using machine learning. Nat Med 26,364–373 (2020). https://doi.org/10.1038/s41591-020-0789-4
Thoral PJ, Peppink JM, Driessen RH, et al (2020) AmsterdamUMCdb: The FirstFreely Accessible European Intensive Care Database from the ESICM DataSharing Initiative. https://www.amsterdammedicaldatascience.nl.
Elbers, Dr. P.W.G. (Amsterdam UMC) (2019): AmsterdamUMCdb v1.0.2. DANS.https://doi.org/10.17026/dans-22u-f8vd
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R.(2021). MIMIC-IV (version 1.0). PhysioNet.https://doi.org/10.13026/s6n6-xd98.
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark,R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet:Components of a new research resource for complex physiologic signals.Circulation (Online). 101 (23), pp. e215–e220.
File system utilities
Description
Determine the location where to place data meant to persist between individual sessions.
Usage
data_dir(subdir = NULL, create = TRUE)

src_data_dir(srcs)

auto_attach_srcs()

config_paths()

get_config(name, cfg_dirs = config_paths(), combine_fun = c, ...)

set_config(x, name, dir = file.path("inst", "extdata", "config"), ...)

Arguments
| subdir | A string specifying a directory that will be made sure to exist below the data directory. |
create | Logical flag indicating whether to create the specifieddirectory |
srcs | Character vector of data source names, an object for which an |
name | File name of the configuration file ( |
cfg_dirs | Character vector of directories searched for config files |
| combine_fun | If multiple files are found, a function for combining returned lists |
... | Passed to |
x | Object to be written |
dir | Directory to write the file to (created if non-existent) |
Details
For data, the default location depends on the operating system as
| Platform | Location |
| Linux | ~/.local/share/ricu |
| macOS | ~/Library/Application Support/ricu |
| Windows | %LOCALAPPDATA%/ricu |
If the default storage directory does not exist, it will only be created upon user consent (requiring an interactive session).
The environment variable RICU_DATA_PATH can be used to overwrite the default location. If desired, this variable can be set in an R startup file to make it apply to all R sessions. For example, it could be set within:
A project-local .Renviron;

The user-level .Renviron;

A file at $(R RHOME)/etc/Renviron.site.
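For example, a user-level .Renviron might contain the following two entries (the path and source names are illustrative, not defaults):

```
RICU_DATA_PATH=/home/user/ricu-data
RICU_SRC_LOAD=mimic_demo,eicu_demo
```

With this in place, every new R session stores data under the given path and attaches only the two demo datasets.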
Any directory specified as environment variable will recursively be created.
Data source directories typically are sub-directories to data_dir(), named the same as the respective dataset. For demo datasets corresponding to mimic and eicu, file location however deviates from this scheme. The function src_data_dir() is used to determine the expected data location of a given dataset.
Configuration files used both for data source configuration, as well as for dictionary definitions, potentially involve multiple files that are read and merged. For that reason, get_config() will iterate over directories passed as cfg_dirs and look for the specified file (with suffix .json appended; the file might be missing in some of the queried directories). All found files are read by jsonlite::read_json() and the resulting lists are combined by reduction with the binary function passed as combine_fun.
With default arguments, get_config() will simply concatenate lists corresponding to files found in the default config locations as returned by config_paths(): first the directory specified by the environment variable RICU_CONFIG_PATH (if set), followed by the directory at
system.file("extdata", "config", package = "ricu")

Further arguments are passed to jsonlite::read_json(), which is called with slightly modified defaults: simplifyVector = TRUE, simplifyDataFrame = FALSE and simplifyMatrix = FALSE.
The utility function set_config() writes the list passed as x to the file dir/name.json, using jsonlite::write_json(), also with slightly modified defaults (which can be overridden by passing arguments as ...): null = "null", auto_unbox = TRUE and pretty = TRUE.
Whenever the package namespace is attached, a summary of dataset availability is printed using the utility functions auto_attach_srcs() and src_data_avail(). While the former simply returns a character vector of data sources that are configured for automatically being set up on package loading, the latter returns a summary of the number of available tables per dataset. Finally, is_data_avail() returns a named logical vector indicating which data sources have all required data available.
Value
Functions data_dir(), src_data_dir() and config_paths() return file paths as character vectors, auto_attach_srcs() returns a character vector of data source names, src_data_avail() returns a data.frame describing availability of data sources and is_data_avail() a named logical vector. Configuration utilities get_config() and set_config() read and write list objects from/to JSON format.
Examples
Sys.setenv(RICU_DATA_PATH = tempdir())
identical(data_dir(), tempdir())

dir.exists(file.path(tempdir(), "some_subdir"))
some_subdir <- data_dir("some_subdir")
dir.exists(some_subdir)

cfg <- get_config("concept-dict")

identical(
  cfg,
  get_config("concept-dict", system.file("extdata", "config", package = "ricu"))
)
Data download utilities
Description
Making a dataset available to ricu consists of 3 steps: downloading (download_src()), importing (import_src()) and attaching (attach_src()). While downloading and importing are one-time procedures, attaching of the dataset is repeated every time the package is loaded. Briefly, downloading loads the raw dataset from the internet (most likely in .csv format), importing consists of some preprocessing to make the data available more efficiently (by converting it to .fst format) and attaching sets up the data for use by the package.
Usage
download_src(x, data_dir = src_data_dir(x), ...)

## S3 method for class 'src_cfg'
download_src(x, data_dir = src_data_dir(x), tables = NULL, force = FALSE, ...)

## S3 method for class 'aumc_cfg'
download_src(
  x,
  data_dir = src_data_dir(x),
  tables = NULL,
  force = FALSE,
  token = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'character'
download_src(
  x,
  data_dir = src_data_dir(x),
  tables = NULL,
  force = FALSE,
  user = NULL,
  pass = NULL,
  verbose = TRUE,
  ...
)

Arguments
x | Object specifying the source configuration |
| data_dir | Destination directory where the downloaded data is written to. |
... | Generic consistency |
tables | Character vector specifying the tables to download. If |
force | Logical flag; if |
token | Download token for AmsterdamUMCdb (see 'Details') |
verbose | Logical flag indicating whether to print progress information |
user,pass | PhysioNet credentials; if NULL and credentials are required, they are queried interactively |
Details
Downloads by ricu focus on data hosted by PhysioNet and tools are currently available for downloading the datasets MIMIC-III, eICU and HiRID (see data). While credentials are required for downloading any of the three datasets, demo datasets for both MIMIC-III and eICU are available without having to log in. Even though access to the full datasets is credentialed, the datasets are in fact publicly available. For setting up an account, please refer to the registration form.
PhysioNet credentials can either be entered in an interactive session, passed as function arguments user/pass or as environment variables RICU_PHYSIONET_USER/RICU_PHYSIONET_PASS. For setting environment variables on session startup, refer to base::.First.sys() and for setting environment variables in general, refer to base::Sys.setenv(). If the openssl package is available, SHA256 hashes of downloaded files are verified using openssl::sha256().
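For non-interactive use, the credentials could for example be set at the top of a script before triggering a download (a sketch; the values shown are placeholders, not real credentials, and the table name is chosen purely for illustration):

```r
# Placeholder credentials -- substitute your own PhysioNet login
Sys.setenv(
  RICU_PHYSIONET_USER = "my-user",
  RICU_PHYSIONET_PASS = "my-pass"
)

# Subsequent downloads of credentialed sources then proceed without
# prompting for user input
download_src("mimic", tables = "admissions")
```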
Demo datasets MIMIC-III demo and eICU demo can either be installed as R packages directly by running
install.packages(
  c("mimic.demo", "eicu.demo"),
  repos = "https://eth-mds.github.io/physionet-demo"
)

or downloaded and imported using download_src() and import_src(). Furthermore, ricu specifies mimic.demo and eicu.demo as Suggests dependencies; therefore, passing dependencies = TRUE when calling install.packages() for installing ricu will automatically install the demo datasets as well.
While the included data downloaders are intended for data hosted by PhysioNet, download_src() is an S3 generic function that can be extended to new classes. Method dispatch is intended to occur on objects that inherit from or can be coerced to src_cfg. For more information on data source configuration, refer to load_src_cfg().
As such, with the addition of the AmsterdamUMCdb dataset, which unfortunately is not hosted on PhysioNet, a separate downloader for that dataset is available as well. Currently this requires both availability of the CRAN package xml2, as well as the command line utility 7zip. Furthermore, data access has to be requested and for non-interactive download the download token has to be made available as environment variable RICU_AUMC_TOKEN or passed as token argument to download_src(). The download token can be retrieved from the URL provided when access is granted, by extracting the string following token=:
https://example.org/?s=download&token=0c27af59-72d1-0349-aa59-00000a8076d9
would translate to
Sys.setenv(RICU_AUMC_TOKEN = "0c27af59-72d1-0349-aa59-00000a8076d9")
If the dependencies outlined above are not fulfilled, download and archive extraction can be carried out manually into the corresponding folder and import_src() can be run.
Value
Called for side effects and returns NULL invisibly.
Examples
## Not run:
dir <- tempdir()
list.files(dir)

download_src("mimic_demo", data_dir = dir)
list.files(dir)

unlink(dir, recursive = TRUE)
## End(Not run)

Time series utility functions
Description
ICU data as handled by ricu is mostly comprised of time series data and as such, several utility functions are available for working with time series data, in addition to a class dedicated to representing time series data (see ts_tbl()). Some terminology to begin with: a time series is considered to have gaps if, per (combination of) ID variable value(s), some time steps are missing. Expanding and collapsing mean to change between representations where time steps are explicit or encoded as interval with start and end times. For sliding window-type operations, slide() means to iterate over time-windows, slide_index() means to iterate over certain time-windows, selected relative to the index, and hop() means to iterate over time-windows selected in absolute terms.
Usage
expand(
  x,
  start_var = index_var(x),
  end_var = NULL,
  step_size = time_step(x),
  new_index = start_var,
  keep_vars = NULL,
  aggregate = FALSE
)

collapse(
  x,
  id_vars = NULL,
  index_var = NULL,
  start_var = "start",
  end_var = "end",
  env = NULL,
  as_win_tbl = TRUE,
  ...
)

has_no_gaps(x)

has_gaps(...)

is_regular(x)

fill_gaps(x, limits = collapse(x), start_var = "start", end_var = "end")

remove_gaps(x)

slide(x, expr, before, after = hours(0L), ...)

slide_index(x, expr, index, before, after = hours(0L), ...)

hop(
  x,
  expr,
  windows,
  full_window = FALSE,
  lwr_col = "min_time",
  upr_col = "max_time",
  left_closed = TRUE,
  right_closed = TRUE,
  eval_env = NULL,
  ...
)

Arguments
x | ts_tbl object to operate on |
start_var,end_var | Name of the columns that represent lower and upperwindows bounds |
step_size | Controls the step size used to interpolate between start_var and end_var |
new_index | Name of the new index column |
keep_vars | Names of the columns to hold onto |
aggregate | Function for aggregating values in overlapping intervals |
id_vars,index_var | ID and index variables |
env | Environment used as parent to the environment used to evaluate expressions passed as ... |
as_win_tbl | Logical flag indicating whether to return a win_tbl |
... | Passed to |
limits | A table with columns for lower and upper window bounds or a length 2 difftime vector |
expr | Expression (quoted) to be evaluated over each window |
before,after | Time span to look back/forward |
index | A vector of times around which windows are spanned (relativeto the index) |
windows | An id_tbl object defining, per ID, the windows over which expr is evaluated |
full_window | Logical flag controlling how the situation is handledwhere the sliding window extends beyond available data |
lwr_col,upr_col | Names of columns (in |
left_closed,right_closed | Logical flag indicating whether intervalsare closed (default) or open. |
eval_env | Environment in which expr is evaluated |
Details
A gap in a ts_tbl object is a missing time step, i.e. a missing entry in the sequence seq(min(index), max(index), by = interval) in at least one group (as defined by id_vars(), where the extrema are calculated per group). In this case, has_gaps() will return TRUE. The function is_regular() checks whether the time series has no gaps, in addition to the object being sorted and unique (see is_sorted() and is_unique()). In order to transform a time series containing gaps into a regular time series, fill_gaps() will fill missing time steps with NA values in all data_vars() columns, while remove_gaps() provides the inverse operation of removing time steps that consist of NA values in data_vars() columns.
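The gap-related functions described above could be used as follows (a sketch, assuming ricu is attached; the column names id, time and val are made up for illustration):

```r
# A ts_tbl with a missing time step (hour 3) for the single patient ID 1;
# the interval is auto-detected as one hour
tbl <- ts_tbl(id = c(1, 1, 1), time = hours(c(1, 2, 4)),
              val = c(0.4, 0.5, 0.7), id_vars = "id")

has_gaps(tbl)          # the hour-3 entry is missing
reg <- fill_gaps(tbl)  # inserts an NA-valued row at hour 3
is_regular(reg)
remove_gaps(reg)       # drops the NA-only row again
```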
An expand() operation performed on an object inheriting from data.table yields a ts_tbl where time-steps encoded by columns start_var and end_var are made explicit, with values in keep_vars being appropriately repeated. The inverse operation is available as collapse(), which groups by id_vars, represents index_var as group-wise extrema in two new columns start_var and end_var and allows for further data summary using .... An aspect to keep in mind when applying expand() to a win_tbl object is that values simply are repeated for all time-steps that fall into a given validity interval. This gives correct results when a win_tbl for example contains data on infusions as rates, but might not lead to correct results when infusions are represented as drug amounts administered over a given time-span. In such a scenario it might be desirable to evenly distribute the total amount over the corresponding time steps (currently not implemented).
Sliding-window type operations are available as slide(), slide_index() and hop() (function naming is inspired by the CRAN package slider). The most flexible of the three, hop() takes as input a ts_tbl object x containing the data, an id_tbl object windows, containing for each ID the desired windows represented by two columns lwr_col and upr_col, as well as an expression expr to be evaluated per window. At the other end of the spectrum, slide() spans windows for every ID and available time-step using the arguments before and after, while slide_index() can be seen as a compromise between the two, where windows are spanned for certain time-points, specified by index.
Value
Most functions return ts_tbl objects, with the exception of has_gaps()/has_no_gaps()/is_regular(), which return logical flags.
Examples
if (FALSE) {

tbl <- ts_tbl(x = 1:5, y = hours(1:5), z = hours(2:6), val = rnorm(5),
              index_var = "y")
exp <- expand(tbl, "y", "z", step_size = 1L, new_index = "y",
              keep_vars = c("x", "val"))
col <- collapse(exp, start_var = "y", end_var = "z", val = unique(val))
all.equal(tbl, col, check.attributes = FALSE)

tbl <- ts_tbl(x = rep(1:5, 1:5), y = hours(sequence(1:5)), z = 1:15)
win <- id_tbl(x = c(3, 4), a = hours(c(2, 1)), b = hours(c(3, 4)))
hop(tbl, list(z = sum(z)), win, lwr_col = "a", upr_col = "b")
slide_index(tbl, list(z = sum(z)), hours(c(4, 5)), before = hours(2))
slide(tbl, list(z = sum(z)), before = hours(2))

tbl <- ts_tbl(x = rep(3:4, 3:4), y = hours(sequence(3:4)), z = 1:7)
has_no_gaps(tbl)
is_regular(tbl)

tbl[1, 2] <- hours(2)
has_no_gaps(tbl)
is_regular(tbl)

tbl[6, 2] <- hours(2)
has_no_gaps(tbl)
is_regular(tbl)
}

Data loading utilities
Description
Two important tools for smoothing out differences among used datasets are id_origin(), which returns origin times for a given ID, and id_map(), which returns a mapping between two ID systems alongside start and end columns of the target ID system relative to the source ID system. As both these functions are called frequently during data loading and might involve somewhat expensive operations, both rely on internal helper functions (id_orig_helper() and id_map_helper()) which perform the heavy lifting, while the exported functions wrap those helpers, providing a memoization layer. When adding a new data source, a class specific implementation of the S3 generic function id_map_helper() might be required, as this is used during data loading using load_id() and load_ts() via change_id().
Usage
id_origin(x, id, origin_name = NULL, copy = TRUE)

id_orig_helper(x, id)

## S3 method for class 'src_env'
id_orig_helper(x, id)

## S3 method for class 'miiv_env'
id_orig_helper(x, id)

id_windows(x, copy = TRUE)

id_win_helper(x)

## S3 method for class 'mimic_env'
id_win_helper(x)

## S3 method for class 'eicu_env'
id_win_helper(x)

## S3 method for class 'sic_env'
id_win_helper(x)

## S3 method for class 'hirid_env'
id_win_helper(x)

## S3 method for class 'aumc_env'
id_win_helper(x)

## S3 method for class 'miiv_env'
id_win_helper(x)

id_map(x, id_var, win_var, in_time = NULL, out_time = NULL)

id_map_helper(x, id_var, win_var)

## S3 method for class 'src_env'
id_map_helper(x, id_var, win_var)

Arguments
x | Object identifying the ID system (passed to as_src_env()) |
id | ID name for which to return origin times |
origin_name | String-valued name which will be used to label the origincolumn |
copy | Logical flag indicating whether to return a copy of the memoized result |
id_var | Type of ID all returned times are relative to |
win_var | Type of ID for which the in/out times is returned |
in_time,out_time | Column names of the returned in/out times |
Details
For the internal datasets, id_map_helper() relies on yet another S3 generic function, id_windows(), which provides a table containing all available ID systems, as well as all ID windows for a given data source. As for the other two functions, the same helper-function approach is in place, with the data loading function id_win_helper(). The function id_map_helper() is then implemented in a data source agnostic manner (dispatching on the src_env class), providing subsetting of this larger ID map table and ensuring timestamps are relative to the correct ID system. For adding a new data source however, this layer can be forgone. Similarly for id_origin(): this is used for the internal datasets in load_difftime(). An implementation of load_difftime(), specific to a new data source, can be provided that does not rely on id_windows(), making this function irrelevant for this specific dataset.
Value
- id_origin()/id_orig_helper(): an id_tbl with admission time stamps corresponding to the selected ID
- id_windows()/id_win_helper(): an id_tbl holding all IDs and their respective start and end times
- id_map()/id_map_helper(): an id_tbl containing the selected IDs and, depending on values passed as in_time and out_time, start and end times of the ID passed as win_var
Tabular ICU data classes
Description
In order to simplify handling of tabular ICU data, ricu provides S3 classes, id_tbl, ts_tbl, and win_tbl. These classes essentially consist of a data.table object, alongside some meta data, and S3 dispatch is used to enable more natural behavior for some data manipulation tasks. For example, when merging two tables, a default for the by argument can be chosen more sensibly if columns representing patient ID and timestamp information can be identified.
Usage
id_tbl(..., id_vars = 1L)

is_id_tbl(x)

as_id_tbl(x, id_vars = NULL, by_ref = FALSE)

ts_tbl(..., id_vars = 1L, index_var = NULL, interval = NULL)

is_ts_tbl(x)

as_ts_tbl(x, id_vars = NULL, index_var = NULL, interval = NULL, by_ref = FALSE)

win_tbl(..., id_vars = NULL, index_var = NULL, interval = NULL, dur_var = NULL)

is_win_tbl(x)

as_win_tbl(
  x,
  id_vars = NULL,
  index_var = NULL,
  interval = NULL,
  dur_var = NULL,
  by_ref = FALSE
)

## S3 method for class 'id_tbl'
as.data.table(x, keep.rownames = FALSE, by_ref = FALSE, ...)

## S3 method for class 'id_tbl'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

validate_tbl(x)

Arguments
... | Forwarded to data.table::data.table() |
id_vars | Column name(s) to be used as id_vars |
x | Object to query/operate on |
by_ref | Logical flag indicating whether to perform the operation byreference |
index_var | Column name of the index column |
interval | Time series interval length specified as scalar-valued difftime object |
dur_var | Column name of the duration column |
keep.rownames | Default is FALSE. |
row.names | NULL or a character vector giving the row names for the data frame. |
optional | logical. If TRUE, setting row names and converting column names (to syntactic names) is optional. |
Details
The introduced classes are designed for several often encountered datascenarios:
- id_tbl objects can be used to represent static (with respect to relevant time scales) patient data such as patient age and such an object is simply a data.table combined with a non-zero length character vector valued attribute marking the columns tracking patient ID information (id_vars). All further columns are considered as data_vars.
- ts_tbl objects are used for grouped time series data. A data.table object again is augmented by attributes, including a non-zero length character vector identifying patient ID columns (id_vars), a string tracking the column holding time-stamps (index_var) and a scalar difftime object determining the time-series step size interval. Again, all further columns are treated as data_vars.
- win_tbl: In addition to representing grouped time-series data as does a ts_tbl, win_tbl objects also encode a validity interval for each time-stamped measurement (as dur_var). This can for example be useful when a drug is administered at a certain infusion rate for a given time period.
Owing to the nested structure of required meta data, ts_tbl inherits from id_tbl and win_tbl from ts_tbl. Furthermore, both classes inherit from data.table. As such, data.table reference semantics are available for some operations, indicated by presence of a by_ref argument. By default, by_ref is set to FALSE as this is in line with base R behavior, at the cost of potentially incurring unnecessary data copies. Some care has to be taken when passing by_ref = TRUE and enabling by reference operations as this can have side effects (see examples).
For instantiating ts_tbl objects, both index_var and interval can be automatically determined if not specified. For the index column, the only requirement is that a single difftime column is present, while for the time step, the minimal difference between two consecutive observations is chosen (and all differences are therefore required to be multiples of the minimum difference). Similarly, for a win_tbl, exactly two difftime columns are required, where the first is assumed to correspond to the index_var and the second to the dur_var.
Upon instantiation, the data might be rearranged: columns are reordered such that ID columns are moved to the front, followed by the index column, and a data.table::key() is set on meta columns, causing rows to be sorted accordingly. Moving meta columns to the front is done for reasons of convenience for printing, while setting a key on meta columns is done to improve efficiency of subsequent transformations such as merging or grouped operations. Furthermore, NA values in either ID or index columns are not allowed and therefore corresponding rows are silently removed.
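Following the auto-detection rules outlined above, a win_tbl could be instantiated from a table with exactly two difftime columns, the first serving as index and the second as duration (a sketch, assuming ricu is attached; all column names are chosen for illustration):

```r
# 'start' (first difftime column) becomes the index_var,
# 'dur' (second difftime column) the dur_var
tbl <- win_tbl(
  id    = c(1, 1, 2),
  start = hours(c(1, 4, 2)),
  dur   = mins(c(30, 60, 90)),
  rate  = c(0.1, 0.3, 0.2),
  id_vars = "id"
)

index_var(tbl)  # "start"
dur_var(tbl)    # "dur"
```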
Coercion between id_tbl and ts_tbl (and win_tbl) by default keeps intersecting attributes fixed and new attributes are by default inferred as for class instantiation. Each class comes with a class-specific implementation of the S3 generic function validate_tbl() which returns TRUE if the object is considered valid, or a string outlining the type of validation failure that was encountered. Validity requires
- inheriting from data.table and unique column names
- for id_tbl, that all columns specified by the non-zero length character vector holding onto the id_vars specification are available
- for ts_tbl, that the string-valued index_var column is available and does not intersect with id_vars and that the index column obeys the specified interval
- for win_tbl, that the string-valued dur_var corresponds to a difftime vector and is not among the columns marked as index or ID variables
Finally, inheritance can be checked by calling is_id_tbl() and is_ts_tbl(). Note that due to ts_tbl inheriting from id_tbl, is_id_tbl() returns TRUE for both id_tbl and ts_tbl objects (and similarly for win_tbl), while is_ts_tbl() only returns TRUE for ts_tbl objects.
Value
Constructors id_tbl()/ts_tbl()/win_tbl(), as well as coercion functions as_id_tbl()/as_ts_tbl()/as_win_tbl() return id_tbl/ts_tbl/win_tbl objects respectively, while inheritance testers is_id_tbl()/is_ts_tbl()/is_win_tbl() return logical flags and validate_tbl() returns either TRUE or a string describing the validation failure.
Relationship todata.table
Both id_tbl and ts_tbl inherit from data.table and as such, functions intended for use with data.table objects can be applied to id_tbl and ts_tbl as well. But there are some caveats: many functions introduced by data.table are not S3 generic and therefore they would have to be masked in order to retain control over how they operate on objects inheriting from data.table. Take for example the function data.table::setnames(), which changes column names by reference. Using this function, the name of an index column of an id_tbl object can be changed without updating the attribute marking the column as such, thus leaving the object in an inconsistent state. Instead of masking the function setnames(), an alternative is provided as rename_cols(). In places where it is possible, the appropriate function (such as base::names<-() or base::colnames<-()) is seamlessly inserted, and the responsibility for not using data.table::setnames() in a way that breaks the id_tbl object is left to the user.
Owing to data.table heritage, one of the functions that is often called on id_tbl and ts_tbl objects is the base S3 generic base::`[`(). As this function is capable of modifying the object in a way that makes it incompatible with attached meta data, an attempt is made at preserving as much as possible and if all fails, a data.table object is returned instead of an object inheriting from id_tbl. If for example the index column is removed (or modified in a way that makes it incompatible with the interval specification) from a ts_tbl, an id_tbl is returned. If however the ID column is removed, the only sensible thing to return is a data.table (see examples).
Examples
tbl <- id_tbl(a = 1:10, b = rnorm(10))
is_id_tbl(tbl)
is_ts_tbl(tbl)

dat <- data.frame(a = 1:10, b = hours(1:10), c = rnorm(10))
tbl <- as_ts_tbl(dat, "a")
is_id_tbl(tbl)
is_ts_tbl(tbl)

tmp <- as_id_tbl(tbl)
is_ts_tbl(tbl)
is_ts_tbl(tmp)

tmp <- as_id_tbl(tbl, by_ref = TRUE)
is_ts_tbl(tbl)
is_ts_tbl(tmp)

tbl <- id_tbl(a = 1:10, b = rnorm(10))
names(tbl) <- c("c", "b")
tbl

tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(data.table::setnames(tbl, c("c", "b")))

tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(rename_cols(tbl, c("c", "b")))

tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))
tbl[, c("a", "c"), with = FALSE]
tbl[, c("b", "c"), with = FALSE]
tbl[, list(a, b = as.double(b), c)]

ICU class meta data utilities
Description
The two data classes id_tbl and ts_tbl, used by ricu to represent ICU patient data, consist of a data.table alongside some meta data. This includes marking columns that have special meaning and, for data representing measurements ordered in time, the step size. The following utility functions can be used to extract columns and column names with special meaning, as well as query a ts_tbl object regarding its time series related meta data.
Usage
id_vars(x)

id_var(x)

id_col(x)

index_var(x)

index_col(x)

dur_var(x)

dur_col(x)

dur_unit(x)

meta_vars(x)

data_vars(x)

data_var(x)

data_col(x)

interval(x)

time_unit(x)

time_step(x)

time_vars(x)

Arguments
x | Object to query |
Details
The following functions can be used to query an object for columns orcolumn names that represent a distinct aspect of the data:
- id_vars(): ID variables are one or more column names with the interaction of corresponding columns identifying a grouping of the data. Most commonly this is some sort of patient identifier.
- id_var(): This function either fails or returns a string and can therefore be used in case only a single column provides grouping information.
- id_col(): Again, in case only a single column provides grouping information, this column can be extracted using this function.
- index_var(): Suitable for use as index variable is a column that encodes a temporal ordering of observations as difftime vector. Only a single column can be marked as index variable and this function queries a ts_tbl object for its name.
- index_col(): Similarly to id_col(), this function extracts the column with the given designation. As a ts_tbl object is required to have exactly one column marked as index, this function always returns for ts_tbl objects (and fails for id_tbl objects).
- dur_var(): For win_tbl objects, this returns the name of the column encoding the data validity interval.
- dur_col(): Similarly to index_col(), this returns the difftime vector corresponding to the dur_var().
- meta_vars(): For ts_tbl objects, meta variables represent the union of ID and index variables (for win_tbl, this also includes the dur_var()), while for id_tbl objects meta variables consist of ID variables.
- data_vars(): Data variables on the other hand are all columns that are not meta variables.
- data_var(): Similarly to id_var(), this function either returns the name of a single data variable or fails.
- data_col(): Building on data_var(), in situations where only a single data variable is present, it is returned, or if multiple data columns exist, an error is thrown.
- time_vars(): Time variables are all columns in an object inheriting from data.frame that are of type difftime. Therefore in a ts_tbl object the index column is one of (potentially) several time variables. For a win_tbl, however, the dur_var() is not among the time_vars().
- interval(): The time series interval length is represented by a scalar-valued difftime object.
- time_unit(): The time unit of the time series interval, represented by a string such as "hours" or "mins" (see difftime).
- time_step(): The time series step size represented by a numeric value in the unit as returned by time_unit().
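As an illustration of the win_tbl-specific behavior described above (a sketch, assuming ricu is attached; column names are chosen for illustration):

```r
# A win_tbl with index column 'start' and duration column 'dur'
tbl <- win_tbl(id = 1:3, start = hours(1:3), dur = mins(c(30, 60, 90)),
               val = rnorm(3), id_vars = "id")

meta_vars(tbl)  # ID, index and duration variables
data_vars(tbl)  # all remaining columns, here "val"
dur_unit(tbl)   # time unit of the dur_var column
time_vars(tbl)  # difftime columns; per the above, excludes the dur_var
```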
Value
Mostly column names as character vectors, in case of id_var(), index_var(), data_var() and time_unit() of length 1, else of variable length. Functions id_col(), index_col() and data_col() return table columns as vectors, while interval() returns a scalar valued difftime object and time_step() a number.
Examples
tbl <- id_tbl(a = rep(1:2, each = 5), b = rep(1:5, 2), c = rnorm(10),
              id_vars = c("a", "b"))
id_vars(tbl)
tryCatch(id_col(tbl), error = function(...) "no luck")
data_vars(tbl)
data_col(tbl)

tmp <- as_id_tbl(tbl, id_vars = "a")
id_vars(tmp)
id_col(tmp)

tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))
index_var(tbl)
index_col(tbl)

identical(index_var(tbl), time_vars(tbl))

interval(tbl)
time_unit(tbl)
time_step(tbl)

Data import utilities
Description
Making a dataset available to ricu consists of 3 steps: downloading (download_src()), importing (import_src()) and attaching (attach_src()). While downloading and importing are one-time procedures, attaching of the dataset is repeated every time the package is loaded. Briefly, downloading loads the raw dataset from the internet (most likely in .csv format), importing consists of some preprocessing to make the data available more efficiently and attaching sets up the data for use by the package.
Usage
import_src(x, ...)

## S3 method for class 'src_cfg'
import_src(
  x,
  data_dir = src_data_dir(x),
  tables = NULL,
  force = FALSE,
  verbose = TRUE,
  ...
)

## S3 method for class 'aumc_cfg'
import_src(x, ...)

## S3 method for class 'character'
import_src(
  x,
  data_dir = src_data_dir(x),
  tables = NULL,
  force = FALSE,
  verbose = TRUE,
  cleanup = FALSE,
  ...
)

import_tbl(x, ...)

## S3 method for class 'tbl_cfg'
import_tbl(
  x,
  data_dir = src_data_dir(x),
  progress = NULL,
  cleanup = FALSE,
  ...
)

Arguments
x | Object specifying the source configuration |
... | Passed to downstream methods (finally to readr::read_csv/readr::read_csv_chunked) / generic consistency |
data_dir | Destination directory where the downloaded data is written to. |
tables | Character vector specifying the tables to import. If NULL, all available tables are used. |
force | Logical flag; if TRUE, existing data is re-imported and overwritten |
verbose | Logical flag indicating whether to print progress information |
cleanup | Logical flag indicating whether to remove raw csv files afterconversion to fst |
progress | Either NULL or a progress bar object as created by progress::progress_bar |
Details
In order to speed up data access operations, ricu does not directly use the PhysioNet provided CSV files, but converts all data to fst::fst() format, which allows for random row and column access. Large tables are split into chunks in order to keep memory requirements reasonably low.
The one-time step per dataset of data import is fairly resource intensive:depending on CPU and available storage system, it will take on the order ofan hour to run to completion and depending on the dataset, somewherebetween 50 GB and 75 GB of temporary disk space are required as tables areuncompressed, in case of partitioned data, rows are reordered and the dataagain is saved to a storage efficient format.
The S3 generic function import_src() performs import of an entire data source, internally calling the S3 generic function import_tbl() in order to perform import of individual tables. Method dispatch is intended to occur on objects inheriting from src_cfg and tbl_cfg respectively. Such objects can be generated from JSON based configuration files which contain information such as table names, column types or row numbers, in order to provide safety in parsing of .csv files. For more information on data source configuration, refer to load_src_cfg().
Current import capabilities include re-saving a .csv file to .fst at once (used for smaller sized tables), reading a large .csv file using the readr::read_csv_chunked() API, while partitioning chunks and reassembling sub-partitions (used for splitting a large file into partitions), as well as re-partitioning an already partitioned table according to a new partitioning scheme. Care has been taken to keep the maximal memory requirements for this reasonably low, such that data import is feasible on laptop class hardware.
Value
Called for side effects and returns NULL invisibly.
Examples
## Not run:
dir <- tempdir()
list.files(dir)

download_src("mimic_demo", dir)
list.files(dir)

import_src("mimic_demo", dir)
list.files(dir)

unlink(dir, recursive = TRUE)
## End(Not run)

Load concept data
Description
Concept objects are used in ricu as a way to specify how a clinical concept, such as heart rate, can be loaded from a data source. Building on this abstraction, load_concepts() powers concise loading of data with data source specific preprocessing hidden away from the user, thereby providing a data source agnostic interface to data loading. At the default value of the argument merge_data, a tabular data structure (either a ts_tbl or an id_tbl, depending on what kind of concepts are requested), inheriting from data.table, is returned, representing the data in wide format (i.e. returning concepts as columns).
Usage
load_concepts(x, ...)

## S3 method for class 'character'
load_concepts(
  x,
  src = NULL,
  concepts = NULL,
  ...,
  dict_name = "concept-dict",
  dict_dirs = NULL
)

## S3 method for class 'integer'
load_concepts(
  x,
  src = NULL,
  concepts = NULL,
  ...,
  dict_name = "concept-dict",
  dict_dirs = NULL
)

## S3 method for class 'numeric'
load_concepts(x, ...)

## S3 method for class 'concept'
load_concepts(
  x,
  src = NULL,
  aggregate = NULL,
  merge_data = TRUE,
  verbose = TRUE,
  ...
)

## S3 method for class 'cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)

## S3 method for class 'num_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)

## S3 method for class 'unt_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)

## S3 method for class 'fct_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)

## S3 method for class 'lgl_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)

## S3 method for class 'rec_cncpt'
load_concepts(
  x,
  aggregate = NULL,
  patient_ids = NULL,
  id_type = "icustay",
  interval = hours(1L),
  ...,
  progress = NULL
)

## S3 method for class 'item'
load_concepts(
  x,
  patient_ids = NULL,
  id_type = "icustay",
  interval = hours(1L),
  progress = NULL,
  ...
)

## S3 method for class 'itm'
load_concepts(
  x,
  patient_ids = NULL,
  id_type = "icustay",
  interval = hours(1L),
  ...
)

Arguments
x | Object specifying the data to be loaded |
... | Passed to downstream methods |
src | A character vector, used to subset the concepts |
concepts | The concepts to be used, or NULL, in which case the full concept dictionary is loaded |
dict_name,dict_dirs | In case no concepts are passed as x, these parameters are used to load a concept dictionary |
aggregate | Controls how data within concepts is aggregated |
merge_data | Logical flag, specifying whether to merge concepts intowide format or return a list, each entry corresponding to a concept |
verbose | Logical flag for muting informational output |
progress | Either NULL or a progress bar object as created by progress::progress_bar |
patient_ids | Optional vector of patient ids to subset the fetched data with |
id_type | String specifying the patient id type to return |
interval | The time interval used to discretize time stamps with, specified as scalar-valued difftime object |
Details
In order to allow for a large degree of flexibility (and extensibility), which is much needed owing to considerable heterogeneity presented by different data sources, several nested S3 classes are involved in representing a concept and load_concepts() follows this hierarchy of classes recursively when resolving a concept. An outline of this hierarchy can be described as
- concept: contains many cncpt objects (of potentially differing sub-types), each comprising of some meta-data and an item object
- item: contains many itm objects (of potentially differing sub-types), each encoding how to retrieve a data item.
The design choice for wrapping a vector of cncpt objects with a container class concept is motivated by the requirement of having several different sub-types of cncpt objects (all inheriting from the parent type cncpt), while retaining control over how this homogeneous w.r.t. parent type, but heterogeneous w.r.t. sub-type vector of objects behaves in terms of S3 generic functions.
Value
An id_tbl/ts_tbl or a list thereof, depending on loaded concepts and the value passed as merge_data.
Concept
Top-level entry points are either a character vector of concept names or an integer vector of concept IDs (matched against omopid fields), which are used to subset a concept object or an entire concept dictionary, or a concept object. When passing a character/integer vector as first argument, the most important further arguments at that level control from where the dictionary is taken (dict_name or dict_dirs). At concept level, the most important additional arguments control the result structure: data merging can be disabled using merge_data and data aggregation is governed by the aggregate argument.
Data aggregation is important for merging several concepts into a wide-format table, as this requires data to be unique per observation (i.e. by either ID or combination of ID and index). Several value types are acceptable as aggregate argument, the most important being FALSE, which disables aggregation, NULL, which auto-determines a suitable aggregation function, or a string which is ultimately passed to dt_gforce() where it identifies a function such as sum(), mean(), min() or max(). More information on aggregation is available as aggregate(). If the object passed as aggregate is scalar, it is applied to all requested concepts in the same way. In order to customize aggregation per concept, a named object (with names corresponding to concepts) of the same length as the number of requested concepts may be passed.
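For example, per-concept aggregation could be requested by passing a named character vector (a sketch; assumes the mimic_demo source is attached and that the concepts hr and map are available in the default dictionary):

```r
# Aggregate heart rate by max and mean arterial pressure by min,
# per ID and time step
load_concepts(c("hr", "map"), "mimic_demo",
              aggregate = c(hr = "max", map = "min"))

# A scalar value applies to all requested concepts alike
load_concepts(c("hr", "map"), "mimic_demo", aggregate = "median")
```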
Under the hood, a concept object comprises several cncpt objects with varying sub-types (for example num_cncpt, representing continuous numeric data, or fct_cncpt, representing categorical data). This implementation detail is of no further importance for understanding concept loading; for more information, please refer to the concept documentation. The only argument that is introduced at cncpt level is progress, which controls progress reporting. If called directly, the default value of NULL yields messages, sent to the terminal. Internally, if called from load_concepts() at concept level (with verbose set to TRUE), a progress::progress_bar is set up in a way that allows nested messages to be captured and not interrupt progress reporting (see msg_progress()).
Item
A single cncpt object contains an item object, which in turn is composed of several itm objects with varying sub-types, the relationship of item to itm being that of concept to cncpt, and the rationale for this implementation choice is the same as previously: a container class used to represent a vector of objects of varying sub-types, all inheriting from a common super-type. For more information on the item class, please refer to the relevant documentation. Arguments introduced at item level include patient_ids, id_type and interval. Acceptable values for interval are scalar-valued base::difftime() objects (see also helper functions such as hours()) and this argument essentially controls the time-resolution of the returned time series. Of course, the limiting factor is the raw time resolution, which is on the order of hours for data sets like MIMIC-III or eICU but can be much higher for a data set like HiRID. The argument id_type is used to specify what kind of ID system should be used to identify different time series in the returned data. A data set like MIMIC-III, for example, makes possible the resolution of data to 3 nested ID systems:
patient (subject_id): identifies a person

hadm (hadm_id): identifies a hospital admission (several of which are possible for a given person)

icustay (icustay_id): identifies an admission to an ICU and again has a one-to-many relationship to hadm.
Acceptable argument values are strings that match ID systems as specified by the data source configuration. Finally, patient_ids is used to define a patient cohort for which data can be requested. Values may either be a vector of IDs (which are assumed to be of the same type as specified by the id_type argument) or a tabular object inheriting from data.frame, which must contain a column named after the data set-specific ID system identifier (for MIMIC-III and an id_type argument of hadm, for example, that would be hadm_id).
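The two accepted shapes of patient_ids can be sketched as follows (the function `resolve_patient_ids` is hypothetical, written here only to illustrate the described behavior, not taken from ricu):

```r
# Hypothetical helper (not ricu code): normalize a `patient_ids` argument
# that may be either a plain ID vector or a data.frame-like cohort table
# containing a column named after the data set-specific ID identifier.
resolve_patient_ids <- function(patient_ids, id_col = "hadm_id") {
  if (inherits(patient_ids, "data.frame")) {
    stopifnot(id_col %in% colnames(patient_ids))
    unique(patient_ids[[id_col]])
  } else {
    unique(patient_ids)
  }
}

cohort <- data.frame(hadm_id = c(1L, 1L, 3L), age = c(70, 70, 55))
resolve_patient_ids(cohort)          # IDs taken from a tabular cohort
resolve_patient_ids(c(5L, 7L, 5L))   # IDs taken from a plain vector
```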
Extensions
The presented hierarchy of S3 classes is designed with extensibility in mind: while the current range of functionality covers settings encountered when dealing with the included concepts and datasets, further data sets and/or clinical concepts might necessitate different behavior for data loading. For this reason, various parts in the cascade of calls to load_concepts() can be adapted for new requirements by defining new sub-classes to cncpt or itm and providing methods for the generic function load_concepts() specific to these new classes. At cncpt level, method dispatch defaults to load_concepts.cncpt() if no method specific to the new class is provided, while at itm level, no default function is available.
Roughly speaking, the semantics for the two functions are as follows:
cncpt: Called with arguments x (the current cncpt object), aggregate (controlling how aggregation per time-point and ID is handled), ... (further arguments passed to downstream methods) and progress (controlling progress reporting), this function should be able to load and aggregate data for the given concept. Usually this involves extracting the item object and calling load_concepts() again, dispatching on the item class with arguments x (the given item), arguments passed as ..., as well as progress.

itm: Called with arguments x (the current object inheriting from itm), patient_ids (NULL or a patient ID selection), id_type (a string specifying what ID system to retrieve), and interval (the time series interval), this function actually carries out the loading of individual data items, using the specified ID system, rounding times to the correct interval and subsetting on patient IDs. As return value, an object of class as specified by the target entry is expected and all data_vars() should be named consistently, as data corresponding to multiple itm objects is concatenated in row-wise fashion as in base::rbind().
Examples
if (require(mimic.demo)) {
dat <- load_concepts("glu", "mimic_demo")

gluc <- concept("gluc",
  item("mimic_demo", "labevents", "itemid", list(c(50809L, 50931L)))
)

identical(load_concepts(gluc), dat)

class(dat)
class(load_concepts(c("sex", "age"), "mimic_demo"))
}
Load concept dictionaries
Description
Data concepts can be specified in JSON format as a concept dictionary, which can be read and parsed into concept/item objects. Dictionary loading can either be performed on the default included dictionary or on a user-specified custom dictionary. Furthermore, a mechanism is provided for adding concepts and/or data sources to the existing dictionary (see the Details section).
Usage
load_dictionary(
  src = NULL,
  concepts = NULL,
  name = "concept-dict",
  cfg_dirs = NULL
)

concept_availability(dict = NULL, include_rec = FALSE, ...)

explain_dictionary(
  dict = NULL,
  cols = c("name", "category", "description"),
  ...
)

Arguments
src |
|
concepts | A character vector used to subset the concept dictionary or |
name | Name of the dictionary to be read |
cfg_dirs | File name of the dictionary |
dict | A dictionary ( |
include_rec | Logical flag indicating whether to include |
... | Forwarded to |
cols | Columns to include in the output of |
Details
A default dictionary is provided at
system.file(file.path("extdata", "config", "concept-dict.json"), package = "ricu")

and can be loaded into an R session by calling get_config("concept-dict"). The default dictionary can be extended by adding a file concept-dict.json to the path specified by the environment variable RICU_CONFIG_PATH. New concepts can be added to this file and existing concepts can be extended (by adding new data sources). Alternatively, load_dictionary() can be called on non-default dictionaries using the file argument.
As an example of specifying a concept as a JSON object, the numeric concept for glucose is given by
{
  "glu": {
    "unit": "mg/dL",
    "min": 0,
    "max": 1000,
    "description": "glucose",
    "category": "chemistry",
    "sources": {
      "mimic_demo": [
        {
          "ids": [50809, 50931],
          "table": "labevents",
          "sub_var": "itemid"
        }
      ]
    }
  }
}

Using such a specification, constructors for cncpt and itm objects are called either using default arguments or as specified by the JSON object, with the above corresponding to a call like
concept(
  name = "glu",
  items = item(
    src = "mimic_demo", table = "labevents", sub_var = "itemid",
    ids = list(c(50809L, 50931L))
  ),
  description = "glucose",
  category = "chemistry",
  unit = "mg/dL",
  min = 0,
  max = 1000
)
The arguments src and concepts can be used to only load a subset of a dictionary by specifying a character vector of data sources and/or concept names.
A summary of item availability for a set of concepts can be created using concept_availability(). This produces a logical matrix with TRUE entries corresponding to concepts where, for the given data source, at least a single item has been defined. If data is loaded for a combination of concept and data source where the corresponding entry is FALSE, this will yield either a zero-row id_tbl object or an object inheriting from id_tbl where the column corresponding to the concept is NA throughout, depending on whether the concept was loaded alongside other concepts where data is available or not.
Whether to include rec_cncpt concepts in the overview produced by concept_availability() can be controlled via the logical flag include_rec. A recursive concept is considered available simply if all its building blocks are available. This can, however, lead to slightly confusing output, as a recursive concept might not strictly depend on one of its sub-concepts but handle such missingness by design. In such a scenario, the availability summary might report FALSE even though data can still be produced.
Value
A concept object containing several data concepts as cncpt objects.
Examples
if (require(mimic.demo)) {
head(load_dictionary("mimic_demo"))
load_dictionary("mimic_demo", c("glu", "lact"))
}
Load data as id_tbl or ts_tbl objects
Description
Building on functionality provided by load_src() and load_difftime(), load_id() and load_ts() load data from disk as id_tbl and ts_tbl objects respectively. Over load_difftime(), both load_id() and load_ts() provide a way to specify meta_vars() (as id_var and index_var arguments), as well as an interval size (as interval argument) for time series data.
Usage
load_id(x, ...)

## S3 method for class 'src_tbl'
load_id(
  x,
  rows,
  cols = colnames(x),
  id_var = id_vars(x),
  interval = hours(1L),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'character'
load_id(x, src, ...)

## S3 method for class 'itm'
load_id(
  x,
  cols = colnames(x),
  id_var = id_vars(x),
  interval = hours(1L),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'fun_itm'
load_id(x, ...)

## Default S3 method:
load_id(x, ...)

load_ts(x, ...)

## S3 method for class 'src_tbl'
load_ts(
  x,
  rows,
  cols = colnames(x),
  id_var = id_vars(x),
  index_var = ricu::index_var(x),
  interval = hours(1L),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'character'
load_ts(x, src, ...)

## S3 method for class 'itm'
load_ts(
  x,
  cols = colnames(x),
  id_var = id_vars(x),
  index_var = ricu::index_var(x),
  interval = hours(1L),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'fun_itm'
load_ts(x, ...)

## Default S3 method:
load_ts(x, ...)

load_win(x, ...)

## S3 method for class 'src_tbl'
load_win(
  x,
  rows,
  cols = colnames(x),
  id_var = id_vars(x),
  index_var = ricu::index_var(x),
  interval = hours(1L),
  dur_var = ricu::dur_var(x),
  dur_is_end = TRUE,
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'character'
load_win(x, src, ...)

## S3 method for class 'itm'
load_win(
  x,
  cols = colnames(x),
  id_var = id_vars(x),
  index_var = ricu::index_var(x),
  interval = hours(1L),
  dur_var = ricu::dur_var(x),
  dur_is_end = TRUE,
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'fun_itm'
load_win(x, ...)

## Default S3 method:
load_win(x, ...)

Arguments
x | Object for which to load data |
... | Generic consistency |
rows | Expression used for row subsetting (NSE) |
cols | Character vector of column names |
id_var | The column defining the id of |
interval | The time interval used to discretize time stamps with, specified as |
time_vars | Character vector enumerating the columns to be treated as timestamps and thus returned as |
src | Passed to |
index_var | The column defining the index of |
dur_var | The column used for determining durations |
dur_is_end | Logical flag indicating whether to use durations as-is or to calculate them by subtracting the |
Details
While for load_difftime() the ID variable can be suggested, the function only returns a best effort at fulfilling this request. In some cases, where the data does not allow for the desired ID type, data is returned using the ID system (among all available ones for the given table) with highest cardinality. Both load_id() and load_ts() are guaranteed to return an object with id_vars() set as requested by the id_var argument. Internally, the change of ID system is performed by change_id().
Additionally, while times returned by load_difftime() are in 1 minute resolution, the time series step size can be specified by the interval argument when calling load_id() or load_ts(). This rounding and potential change of time unit is performed by change_interval() on all columns specified by the time_vars argument. All time stamps are relative to the origin provided by the ID system. This means that for an id_var corresponding to hospital IDs, times are relative to hospital admission.
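The rounding step can be illustrated with a small self-contained sketch (the function `round_to_interval` is hypothetical and only mimics the described behavior of change_interval(); it is not ricu code, and the choice of rounding down is an assumption):

```r
# Hypothetical sketch (not ricu code): round difftime stamps down onto a
# grid defined by a scalar `interval`, as described for change_interval().
round_to_interval <- function(x, interval = as.difftime(1, units = "hours")) {
  stopifnot(inherits(x, "difftime"), inherits(interval, "difftime"))
  # express the interval in the units of x, then bin by flooring
  step  <- as.numeric(interval, units = units(x))
  x_num <- floor(as.numeric(x) / step) * step
  as.difftime(x_num, units = units(x))
}

times <- as.difftime(c(10, 75, 130), units = "mins")
round_to_interval(times)  # 0, 60 and 120 minutes on a 1-hour grid
```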
When load_id() (or load_ts()) is called on itm objects instead of src_tbl (or objects that can be coerced to src_tbl), the row-subsetting is performed according to the specification as provided by the itm object. Furthermore, at default settings, columns are returned as required by the itm object, and id_var (as well as index_var) are set accordingly if specified by the itm, or set to default values for the given src_tbl object if not explicitly specified.
Value
An id_tbl or a ts_tbl object.
Examples
if (require(mimic.demo)) {
load_id("admissions", "mimic_demo", cols = "admission_type")

dat <- load_ts(mimic_demo$labevents, itemid %in% c(50809L, 50931L),
               cols = c("itemid", "valuenum"))

glu <- new_itm(src = "mimic_demo", table = "labevents",
               sub_var = "itemid", ids = c(50809L, 50931L))

identical(load_ts(glu), dat)
}
Low level functions for loading data
Description
Data loading involves a cascade of S3 generic functions, which can individually be adapted to the specifics of individual data sources. At the lowest level, load_src() is called, followed by load_difftime(). Functions further up the chain are described in load_id().
Usage
load_src(x, ...)

## S3 method for class 'src_tbl'
load_src(x, rows, cols = colnames(x), ...)

## S3 method for class 'character'
load_src(x, src, ...)

load_difftime(x, ...)

## S3 method for class 'mimic_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'eicu_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'hirid_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'aumc_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'miiv_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'sic_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'character'
load_difftime(x, src, ...)

Arguments
x | Object for which to load data |
... | Generic consistency |
rows | Expression used for row subsetting (NSE) |
cols | Character vector of column names |
src | Passed to |
id_hint | String valued id column selection (not necessarily honored) |
time_vars | Character vector enumerating the columns to be treated as timestamps and thus returned as |
Details
A function extending the S3 generic load_src() is expected to load a subset of rows/columns from a tabular data source. While the column specification is provided as character vector of column names, the row subsetting involves non-standard evaluation (NSE). Data sets that are included with ricu are represented by prt objects, which use rlang::eval_tidy() to evaluate NSE expressions. Furthermore, prt objects potentially represent tabular data split into partitions and row-subsetting expressions are evaluated per partition (see the part_safe flag in prt::subset.prt()). The return value of load_src() is expected to be of type data.table.
Timestamps are represented differently among the included data sources: while MIMIC-III and HiRID use absolute date/times, eICU provides temporal information as minutes relative to ICU admission. Other data sources, such as the ICU dataset provided by Amsterdam UMC, opt for relative times as well, though in milliseconds rather than minutes since admission. In order to smooth out such discrepancies, the next function in the data loading hierarchy is load_difftime(). This function is expected to call load_src() in order to load a subset of rows/columns from a table stored on disk and convert all columns that represent timestamps (as specified by the argument time_vars) into base::difftime() vectors using mins as time unit.
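For sources with absolute date/times, the conversion step can be sketched in base R as follows (the function `rel_minutes` is hypothetical, shown only to illustrate the described difftime conversion; it is not the package implementation):

```r
# Hypothetical sketch (not ricu code): convert absolute timestamps into
# minute-valued difftime vectors relative to a per-patient origin, such
# as hospital or ICU admission.
rel_minutes <- function(times, origin) {
  difftime(times, origin, units = "mins")
}

adm  <- as.POSIXct("2130-01-01 08:00:00", tz = "UTC")
meas <- as.POSIXct(c("2130-01-01 08:30:00", "2130-01-02 08:00:00"),
                   tz = "UTC")

rel_minutes(meas, adm)  # 30 and 1440 minutes after admission
```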
The returned object should be of type id_tbl, with the ID vars identifying the ID system the times are relative to. If, for example, all times are relative to ICU admission, the ICU stay ID should be returned as ID column. The argument id_hint may suggest an ID type, but if this ID is not available in the raw data, load_difftime() may return data using a different ID system. In MIMIC-III, for example, data in the labevents table is available for subject_id (patient ID) or hadm_id (hospital admission ID). If data is requested for icustay_id (ICU stay ID), this request cannot be fulfilled and data is returned using the ID system with the highest cardinality (among the available ones). Utilities such as change_id() can then be used later to resolve data to icustay_id.
Value
Adata.table object.
Examples
if (require(mimic.demo)) {
tbl <- mimic_demo$labevents
col <- c("charttime", "value")

load_src(tbl, itemid == 50809)

colnames(
  load_src("labevents", "mimic_demo", itemid == 50809, cols = col)
)

load_difftime(tbl, itemid == 50809)

colnames(
  load_difftime(tbl, itemid == 50809, col)
)

id_vars(
  load_difftime(tbl, itemid == 50809, id_hint = "icustay_id")
)

id_vars(
  load_difftime(tbl, itemid == 50809, id_hint = "subject_id")
)
}
Load configuration for a data source
Description
For a data source to be accessible by ricu, a configuration object inheriting from the S3 class src_cfg is required. Such objects can be generated from JSON based configuration files, using load_src_cfg(). Information encoded by this configuration object includes available ID systems (mainly for use in change_id()), default column names per table for columns with special meaning (such as index column, value columns, unit columns, etc.), as well as a specification used for initial setup of the dataset, which includes file names and column names alongside their data types.
Usage
load_src_cfg(src = NULL, name = "data-sources", cfg_dirs = NULL)

Arguments
src | (Optional) name(s) of data sources used for subsetting |
name | String valued name of a config file which will be looked up in the default config directories |
cfg_dirs | Additional directory/ies to look for configuration files |
Details
Configuration files are looked for as files name with added suffix .json, starting with the directory (or directories) supplied as cfg_dirs argument, followed by the directory specified by the environment variable RICU_CONFIG_PATH, and finally in extdata/config of the package install directory. If files with matching names are found in multiple places, they are concatenated such that in case of name clashes, the earlier hits take precedence over the later ones. The following JSON code blocks show excerpts of the config file available at
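The lookup order can be sketched as a small helper (`config_paths` below is hypothetical, written only to illustrate the described precedence; it is not part of ricu):

```r
# Hypothetical sketch (not ricu code): candidate config file paths in
# precedence order -- user-supplied dirs first, then RICU_CONFIG_PATH,
# then the package install directory. Earlier hits win on name clashes.
config_paths <- function(name, cfg_dirs = NULL) {
  dirs <- c(
    cfg_dirs,
    Sys.getenv("RICU_CONFIG_PATH", unset = NA),
    system.file("extdata", "config", package = "ricu")
  )
  # drop unset env vars and missing install dirs
  dirs <- dirs[!is.na(dirs) & nzchar(dirs)]
  file.path(dirs, paste0(name, ".json"))
}

config_paths("concept-dict", cfg_dirs = "/tmp/my-ricu-config")
</imports>
```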
system.file("extdata", "config", "data-sources.json", package = "ricu")

A data source configuration entry in a config file starts with a name, followed by optional entries class_prefix and further (variable) key-value pairs, such as a URL. For more information on class_prefix, please refer to the end of this section. Further entries include id_cfg and tables, which are explained in more detail below. As outline, this gives, for the data source mimic_demo, the following JSON object:
{
  "name": "mimic_demo",
  "class_prefix": ["mimic_demo", "mimic"],
  "url": "https://physionet.org/files/mimiciii-demo/1.4",
  "id_cfg": { ... },
  "tables": { ... }
}

The id_cfg entry is used to specify the available ID systems for a data source and how they relate to each other. An ID system within the context of ricu is a patient identifier, of which typically several are present in a data set. In MIMIC-III, for example, three ID systems are available: patient IDs (subject_id), hospital admission IDs (hadm_id) and ICU stay IDs (icustay_id). Furthermore, there is a one-to-many relationship between subject_id and hadm_id, as well as between hadm_id and icustay_id. Required for defining an ID system are a name, a position entry which orders the ID systems by their cardinality, a table entry, alongside column specifications id, start and end, which define how the IDs themselves, combined with start and end times, can be loaded from a table. This gives the following specification for the ICU stay ID system in MIMIC-III:
{
  "icustay": {
    "id": "icustay_id",
    "position": 3,
    "start": "intime",
    "end": "outtime",
    "table": "icustays"
  }
}

Tables are defined by a name and entries files, defaults, and cols, as well as optional entries num_rows and partitioning. As files entry, a character vector of file names is expected. For all of MIMIC-III, a single .csv file corresponds to a table, but for example for HiRID, some tables are distributed in partitions. The defaults entry consists of key-value pairs, identifying columns in a table with special meaning, such as the default index column or the set of all columns that represent timestamps. This gives as an example for a table entry for the chartevents table in MIMIC-III a JSON object like:
{
  "chartevents": {
    "files": "CHARTEVENTS.csv.gz",
    "defaults": {
      "index_var": "charttime",
      "val_var": "valuenum",
      "unit_var": "valueuom",
      "time_vars": ["charttime", "storetime"]
    },
    "num_rows": 330712483,
    "cols": { ... },
    "partitioning": {
      "col": "itemid",
      "breaks": [127, 210, 425, 549, 643, 741, 1483, 3458, 3695, 8440,
                 8553, 220274, 223921, 224085, 224859, 227629]
    }
  }
}

The optional num_rows entry is used when importing data (see import_src()) as a sanity check, which is not performed if this entry is missing from the data source configuration. The remaining table entry, partitioning, is optional in the sense that if it is missing, the table is not partitioned, and if it is present, the table will be partitioned accordingly when being imported (see import_src()). In order to specify a partitioning, two entries are required, col and breaks, where the former denotes a column and the latter a numeric vector which is used to construct intervals according to which col is binned. As such, currently col is required to be of numeric type. A partitioning entry as in the example above will assign rows corresponding to itemid 1 through 126 to partition 1, 127 through 209 to partition 2, and so on up to partition 17.
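The binning of a numeric column at the configured breaks can be sketched in base R (the helper `assign_partition` is made up for illustration and is not the package's import code):

```r
# Hypothetical sketch (not ricu code): assign rows to partitions by
# binning a numeric column at the configured breaks. Values below the
# first break map to partition 1, values in [breaks[1], breaks[2]) to
# partition 2, and so on.
assign_partition <- function(col, breaks) {
  findInterval(col, breaks) + 1L
}

breaks <- c(127, 210, 425)
assign_partition(c(1, 126, 127, 209, 500), breaks)
```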
Column specifications consist of a name and a spec entry, alongside a name which determines the column name that will be used by ricu. The spec entry is expected to be the name of a column specification function of the readr package (see readr::cols()) and all further entries in a cols object are used as arguments to the readr column specification. For the admissions table of MIMIC-III, the columns hadm_id and admittime are represented by:
{
  ...,
  "hadm_id": {
    "name": "HADM_ID",
    "spec": "col_integer"
  },
  "admittime": {
    "name": "ADMITTIME",
    "spec": "col_datetime",
    "format": "%Y-%m-%d %H:%M:%S"
  },
  ...
}

Internally, a src_cfg object consists of further S3 classes, which are instantiated when loading a JSON source configuration file. Functions for creating and manipulating src_cfg and related objects are marked internal, but a brief overview is given here nevertheless:
src_cfg: wraps objects id_cfg, col_cfg and optionally tbl_cfg

id_cfg: contains information on ID systems and is created from id_cfg entries in config files

col_cfg: contains column default settings represented by defaults entries in table configuration blocks

tbl_cfg: used when importing data and therefore encompasses information in files, num_rows and cols entries of table configuration blocks
A src_cfg can be instantiated without corresponding tbl_cfg but consequently cannot be used for data import (see import_src()). In that sense, table config entries files and cols are optional as well, with the restriction that the data source has to be already available in .fst format.
An example for such a slimmed down config file is available at
system.file("extdata", "config", "demo-sources.json", package = "ricu")

The class_prefix entry in a data source configuration is used to create sub-classes to src_cfg, id_cfg, col_cfg and tbl_cfg classes and is passed on to constructors of src_env (new_src_env()) and src_tbl (new_src_tbl()) objects. As an example, for the above class_prefix value of c("mimic_demo", "mimic"), the corresponding src_cfg will be assigned classes c("mimic_demo_cfg", "mimic_cfg", "src_cfg") and consequently the src_tbl objects will inherit from "mimic_demo_tbl", "mimic_tbl" and "src_tbl". This can be used to adapt the behavior of involved S3 generic functions to specifics of the different data sources. An example for this is how load_difftime() uses these sub-classes to smooth out different time-stamp representations. Furthermore, such a design was chosen with extensibility in mind. Currently, download_src() is designed around data sources hosted on PhysioNet, but in order to include a dataset external to PhysioNet, the download_src() generic can simply be extended for the new class.
Value
A list of data source configurations as src_cfg objects.
Examples
cfg <- load_src_cfg("mimic_demo")
str(cfg, max.level = 1L)

cfg <- cfg[["mimic_demo"]]
str(cfg, max.level = 1L)

cols <- as_col_cfg(cfg)

index_var(head(cols))
time_vars(head(cols))

as_id_cfg(cfg)
Utility functions
Description
Several utility functions exported for convenience.
Usage
min_or_na(x)

max_or_na(x)

is_val(x, val)

not_val(x, val)

is_true(x)

is_false(x)

last_elem(x)

first_elem(x)

Arguments
x | Object to use |
val | Value to compare against or to use as replacement |
Details
The two functions min_or_na() and max_or_na() overcome a design choice of base::min() (or base::max()) that can yield undesirable results: if called on a vector of all missing values with na.rm = TRUE, Inf (and -Inf, respectively) is returned. This is changed to returning a missing value of the same type as x.
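The documented behavior can be illustrated with a small re-implementation (`min_or_na_sketch` below is a hypothetical stand-in, not the package source):

```r
# Hypothetical re-implementation (not the package source) illustrating
# the documented all-NA behavior of min_or_na().
min_or_na_sketch <- function(x) {
  if (all(is.na(x))) {
    x[1L]                 # an NA of the same type as x
  } else {
    min(x, na.rm = TRUE)
  }
}

suppressWarnings(min(rep(NA_real_, 3), na.rm = TRUE))  # Inf
min_or_na_sketch(rep(NA_real_, 3))                     # NA_real_
min_or_na_sketch(c(NA, 2, 1))                          # 1
```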
The functions is_val() and not_val() (as well as, analogously, is_true() and is_false()) return logical vectors of the same length as the value passed as x, with non-base R semantics of comparing against NA: instead of returning c(NA, TRUE) for c(NA, 5) == 5, is_val() will return c(FALSE, TRUE). Passing NA as val might lead to unintended results, but no warning is thrown.
Finally, first_elem() and last_elem() have the same semantics as utils::head() and utils::tail() with n = 1L, and replace_na() will replace all occurrences of NA in x with val; it can be called both on objects inheriting from data.table (in which case data.table::setnafill() is called internally) and on other objects.
Value
min_or_na()/max_or_na(): scalar-valued extrema of a vector

is_val()/not_val()/is_true()/is_false(): logical vector of the same length as the object passed as x

first_elem()/last_elem(): single element of the object passed as x

replace_na(): modified version of the object passed as x
Examples
some_na <- c(NA, sample(1:10, 5), NA)
identical(min(some_na, na.rm = TRUE), min_or_na(some_na))

all_na <- rep(NA, 5)
min(all_na, na.rm = TRUE)
min_or_na(all_na)

is_val(some_na, 5)
some_na == 5

is_val(some_na, NA)

identical(first_elem(letters), head(letters, n = 1L))
identical(last_elem(letters), tail(letters, n = 1L))

replace_na(some_na, 11)
replace_na(all_na, 11)
replace_na(1:5, 11)

tbl <- ts_tbl(a = 1:10, b = hours(1:10), c = c(NA, 1:5, NA, 8:9, NA))
res <- replace_na(tbl, 0)
identical(tbl, res)
Message signaling nested with progress reporting
Description
In order to not interrupt progress reporting by a progress::progress_bar, messages are wrapped with class msg_progress, which causes them to be captured and printed after progress bar completion. This function is intended to be used when signaling messages in callback functions.
Usage
msg_progress(..., envir = parent.frame())

fmt_msg(msg, envir = parent.frame(), indent = 0L, exdent = 0L)

Arguments
... | Passed to |
envir | Environment in which objects from |
msg | String valued message |
indent,exdent | Vector valued and mapped to |
Value
Called for side effects and returns NULL invisibly.
Examples
msg_progress("Foo", "bar")

capt_fun <- function(x) {
  message("captured: ", conditionMessage(x))
}

tryCatch(msg_progress("Foo", "bar"), msg_progress = capt_fun)
Data Concepts
Description
Concept objects are used in ricu as a way to specify how a clinical concept, such as heart rate, can be loaded from a data source and are mainly consumed by load_concepts(). Several functions are available for constructing concept (and related auxiliary) objects either from code or by parsing a JSON formatted concept dictionary using load_dictionary().
Usage
new_cncpt(
  name,
  items,
  description = name,
  omopid = NA_integer_,
  category = NA_character_,
  aggregate = NULL,
  ...,
  target = "ts_tbl",
  class = "num_cncpt"
)

is_cncpt(x)

init_cncpt(x, ...)

## S3 method for class 'num_cncpt'
init_cncpt(x, unit = NULL, min = NULL, max = NULL, ...)

## S3 method for class 'unt_cncpt'
init_cncpt(x, unit = NULL, min = NULL, max = NULL, ...)

## S3 method for class 'fct_cncpt'
init_cncpt(x, levels, ...)

## S3 method for class 'cncpt'
init_cncpt(x, ...)

## S3 method for class 'rec_cncpt'
init_cncpt(
  x,
  callback = paste0("rename_data_var('", x[["name"]], "')"),
  interval = NULL,
  ...
)

new_concept(x)

concept(...)

is_concept(x)

as_concept(x)

Arguments
name | The name of the concept |
items | Zero or more |
description | String-valued concept description |
omopid | OMOP identifier |
category | String-valued category |
aggregate | NULL or a string denoting a function used to aggregate per id and, if applicable, per time step |
... | Further specification of the |
target | The target object yielded by loading |
class |
|
x | Object to query/dispatch on |
unit | A string, specifying the measurement unit of the concept (can be |
min,max | Scalar valued; defines a range of plausible values for anumeric concept |
levels | A vector of possible values a categorical concept may take on |
callback | Name of a function to be called on the returned data, used for data cleanup operations |
interval | Time interval used for data loading; if NULL, the respective interval passed as argument to |
Details
In order to allow for a large degree of flexibility (and extensibility), which is much needed owing to the considerable heterogeneity presented by different data sources, several nested S3 classes are involved in representing a concept. An outline of this hierarchy can be described as
concept: contains many cncpt objects (of potentially differing sub-types), each comprising some meta-data and an item object

item: contains many itm objects (of potentially differing sub-types), each encoding how to retrieve a data item.
The design choice for wrapping a vector of cncpt objects with a container class concept is motivated by the requirement of having several different sub-types of cncpt objects (all inheriting from the parent type cncpt), while retaining control over how this vector of objects (homogeneous w.r.t. parent type, but heterogeneous w.r.t. sub-type) behaves in terms of S3 generic functions.
Each individual cncpt object contains the following information: a string-valued name, an item vector containing itm objects, a string-valued description (can be missing), a string-valued category designation (can be missing), a character vector-valued specification for an aggregation function and a target class specification (e.g. id_tbl or ts_tbl). Additionally, a sub-class to cncpt has to be specified, each representing a different data scenario and holding further class-specific information. The following sub-classes to cncpt are available:
num_cncpt: The most widely used concept type is intended for concepts representing numerical measurements. Additional information that can be specified includes a string-valued unit specification, alongside a plausible range which can be used during data loading.

fct_cncpt: In case of categorical concepts, such as sex, a set of factor levels can be specified, against which the loaded data is checked.

lgl_cncpt: A special case of fct_cncpt, this allows only for logical values (TRUE, FALSE and NA).

rec_cncpt: More involved concepts, such as a SOFA score, can pull in other concepts. Recursive concepts can build on other recursive concepts up to arbitrary recursion depth. Owing to the more complicated nature of such concepts, a callback function can be specified which is used in data loading for concept-specific post-processing steps.

unt_cncpt: A recent (experimental) addition which inherits from num_cncpt but instead of manual unit conversion, leverages
Class instantiation is organized in the same fashion as for item objects: concept() maps vector-valued arguments to new_cncpt(), which internally calls the S3 generic function init_cncpt(), while new_concept() instantiates a concept object from a list of cncpt objects (created by calls to new_cncpt()). Coercion is only possible from list and cncpt, by calling as_concept(), and inheritance can be checked using is_concept() or is_cncpt().
Value
Constructors and coercion functions return cncpt and concept objects, while inheritance tester functions return logical flags.
Examples
if (require(mimic.demo)) {
gluc <- concept("glu",
                item("mimic_demo", "labevents", "itemid",
                     list(c(50809L, 50931L))),
                description = "glucose", category = "chemistry",
                unit = "mg/dL", min = 0, max = 1000)
is_concept(gluc)
identical(gluc, load_dictionary("mimic_demo", "glu"))
gl1 <- new_cncpt("glu",
                 item("mimic_demo", "labevents", "itemid",
                      list(c(50809L, 50931L))),
                 description = "glucose")
is_cncpt(gl1)
is_concept(gl1)
conc <- concept(c("glu", "lact"),
                list(
                  item("mimic_demo", "labevents", "itemid",
                       list(c(50809L, 50931L))),
                  item("mimic_demo", "labevents", "itemid", 50813L)
                ),
                description = c("glucose", "lactate"))
conc
identical(as_concept(gl1), conc[1L])
}

Data items
Description
Item objects are used in ricu as a way to specify how individual data items corresponding to clinical concepts (see also concept()), such as heart rate, can be loaded from a data source. Several functions are available for constructing item (and related auxiliary) objects either from code or by parsing a JSON formatted concept dictionary using load_dictionary().
Usage
new_itm(src, ..., interval = NULL, target = NA_character_, class = "sel_itm")

is_itm(x)

init_itm(x, ...)

## S3 method for class 'sel_itm'
init_itm(x, table, sub_var, ids, callback = "identity_callback", ...)

## S3 method for class 'hrd_itm'
init_itm(x, table, sub_var, ids, callback = "identity_callback", ...)

## S3 method for class 'col_itm'
init_itm(x, table, unit_val = NULL, callback = "identity_callback", ...)

## S3 method for class 'rgx_itm'
init_itm(x, table, sub_var, regex, callback = "identity_callback", ...)

## S3 method for class 'fun_itm'
init_itm(x, callback, ...)

## S3 method for class 'itm'
init_itm(x, ...)

new_item(x)

item(...)

as_item(x)

is_item(x)

Arguments
src | The data source name |
... | Further specification of the |
interval | A default data loading interval (either specified as scalar |
target | Item target class (e.g. "id_tbl"), |
class | Sub class for customizing |
x | Object to query/dispatch on |
table | Name of the table containing the data |
sub_var | Column name used for subsetting |
ids | Vector of ids used to subset table rows. If |
callback | Name of a function to be called on the returned data used for data cleanup operations (or a string that evaluates to a function) |
unit_val | String valued unit to be used in case no |
regex | String-valued regular expression which will be evaluated by |
Details
In order to allow for a large degree of flexibility (and extensibility), which is much needed owing to considerable heterogeneity presented by different data sources, several nested S3 classes are involved in representing a concept. An outline of this hierarchy can be described as
- concept: contains many cncpt objects (of potentially differing sub-types), each comprising some meta data and an item object
- item: contains many itm objects (of potentially differing sub-types), each encoding how to retrieve a data item.
The design choice of wrapping a vector of itm objects in a container class item is motivated by the requirement of having several different sub-types of itm objects (all inheriting from the parent type itm), while retaining control over how this vector of objects, homogeneous w.r.t. parent type but heterogeneous w.r.t. sub-type, behaves in terms of S3 generic functions.
The following sub-classes to itm are available, each representing a different data scenario:
- sel_itm: The most widely used item class is intended for the situation where rows of interest can be identified by looking for occurrences of a set of IDs (ids) in a column (sub_var). An example for this is heart rate hr on mimic, where the IDs 211 and 220045 are looked up in the itemid column of chartevents.
- col_itm: This item class can be used if no row-subsetting is required. An example for this is heart rate (hr) on eicu, where the table vitalperiodic contains an entire column dedicated to heart rate measurements.
- rgx_itm: As an alternative to the value-matching approach of sel_itm objects, this class identifies rows using regular expressions. Used for example for insulin on eicu, where the regular expression ^insulin (250.+)?\\(((ml|units)/hr)?\\)$ is matched against the drugname column of infusiondrug. The regular expression is evaluated by base::grepl() with ignore.case = TRUE.
- fun_itm: Intended for the scenario where data of interest is not directly available from a table, this itm class offers most flexibility. A function can be specified as callback and this function will be called with arguments x (the object itself), patient_ids, id_type and interval (see load_concepts()) and is expected to return an object as specified by the target entry.
- hrd_itm: A special case of sel_itm for HiRID data where measurement units are not available as a separate column, but as a separate table with units fixed per concept.
All itm objects have to specify a data source (src) as well as a sub-class. Further arguments then are specific to the respective sub-class and encode information that defines data loading, such as the table to query, the column name and values to use for identifying relevant rows, etc. The S3 generic function init_itm() is responsible for input validation of class-specific arguments, as well as class initialization. A list of itm objects, created by calls to new_itm(), can be passed to new_item() in order to instantiate an item object. An alternative constructor for item objects is given by item(), which calls new_itm() on the passed arguments (see examples). Finally, as_item() can be used for coercion of related objects such as list, concept, and the like. Several additional S3 generic functions exist for manipulation of item-like objects but are marked internal (see item/concept utilities).
Value
Constructors and coercion functions return itm and item objects, while inheritance tester functions return logical flags.
Examples
if (require(mimic.demo)) {
gluc <- item("mimic_demo", "labevents", "itemid",
             list(c(50809L, 50931L)), unit_var = TRUE,
             target = "ts_tbl")
is_item(gluc)
all.equal(gluc, as_item(load_dictionary("mimic_demo", "glu")))
hr1 <- new_itm(src = "mimic_demo", table = "chartevents",
               sub_var = "itemid", ids = c(211L, 220045L))
hr2 <- item(src = c("mimic_demo", "eicu_demo"),
            table = c("chartevents", "vitalperiodic"),
            sub_var = list("itemid", NULL),
            val_var = list(NULL, "heartrate"),
            ids = list(c(211L, 220045L), FALSE),
            class = c("sel_itm", "col_itm"))
hr3 <- new_itm(src = "eicu_demo", table = "vitalperiodic",
               val_var = "heartrate", class = "col_itm")
identical(as_item(hr1), hr2[1])
identical(new_item(list(hr1)), hr2[1])
identical(hr2, as_item(list(hr1, hr3)))
}

Internal utilities for working with data source configurations
Description
Data source configuration objects store information on data sources used throughout ricu. This includes URLs for data set downloading, column specifications used for data set importing, default values per table for important columns (such as index columns) when loading data, and how different patient identifiers used throughout a dataset relate to one another. Per dataset, a src_cfg object is created from a JSON file (see load_src_cfg()), consisting of several helper classes compartmentalizing the pieces of information outlined above. Alongside constructors for the various classes, several utilities, such as inheritance checks, coercion functions, as well as functions to extract pieces of information from these objects, are provided.
Usage
new_src_cfg(name, id_cfg, col_cfg, tbl_cfg, ..., class_prefix = name)

new_id_cfg(src, name, id, pos = seq_along(name), start = NULL,
           end = NULL, table = NULL, class_prefix = src)

new_col_cfg(src, table, ..., class_prefix = src)

new_tbl_cfg(src, table, files = NULL, cols = NULL, num_rows = NULL,
            partitioning = NULL, ..., class_prefix = src)

is_src_cfg(x)

as_src_cfg(x)

is_id_cfg(x)

as_id_cfg(x)

is_col_cfg(x)

as_col_cfg(x)

is_tbl_cfg(x)

as_tbl_cfg(x)

src_name(x)

tbl_name(x)

src_extra_cfg(x)

src_prefix(x)

src_url(x)

id_var_opts(x)

default_vars(x, type)

Arguments
name | Name of the data source |
id_cfg | An |
col_cfg | A list of |
tbl_cfg | A list of |
... | Further objects to add (such as a URL specification) |
class_prefix | A character vector of class prefixes that are added to the instantiated classes |
src | Data source name |
id,start,end | Name(s) of ID column(s), as well as respective start and end timestamps |
pos | Integer valued position, ordering IDs by their cardinality |
table | Table name |
cols | List containing a list per column, each holding string valued entries |
num_rows | A count indicating the expected number of rows |
partitioning | A table partitioning is defined by a column name and avector of numeric values that are passed as |
x | Object to coerce/query |
Details
The following classes are used to represent data source configurationobjects:
- src_cfg: wraps objects id_cfg, col_cfg and optionally tbl_cfg
- id_cfg: contains information on ID systems and is created from id_cfg entries in config files
- col_cfg: contains column default settings represented by defaults entries in table configuration blocks
- tbl_cfg: used when importing data and therefore encompasses information in files, num_rows and cols entries of table configuration blocks
Represented by a col_cfg, a table can have some of its columns marked as default columns for the following concepts, and further column meanings can be specified via ...:
- id_col: column will be used as id for icu_tbl objects
- index_col: column represents a timestamp variable and will be used as such for ts_tbl objects
- val_col: column contains the measured variable of interest
- unit_col: column specifies the unit of measurement in the corresponding val_col
Alongside constructors (new_*()), inheritance checking functions (is_*()), as well as coercion functions (as_*()), relevant utility functions include:
- src_url(): retrieve the URL of a data source
- id_var_opts(): column name(s) corresponding to ID systems
- src_name(): name of the data source
- tbl_name(): name of a table
Coercion between objects can, under some circumstances, yield list-of-object return types. For example, when coercing src_cfg to tbl_cfg, this will result in a list of tbl_cfg objects, as multiple tables typically correspond to a data source.
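As a brief sketch of this behavior (assuming the mimic_demo source is configured and load_src_cfg() returns a named list of src_cfg objects):

```r
library(ricu)

# load the configuration shipped for the MIMIC-III demo dataset
cfg <- load_src_cfg("mimic_demo")[["mimic_demo"]]

src_name(cfg)  # name of the data source
src_url(cfg)   # download URL stored in the configuration

# coercion to tbl_cfg yields a list-of object: one tbl_cfg per table
tbls <- as_tbl_cfg(cfg)
vapply(tbls, tbl_name, character(1L))
```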
Value
Constructors new_*() as well as coercion functions as_*() return the respective objects, while inheritance tester functions is_*() return a logical flag.
- src_url(): string valued data source URL
- id_var_opts(): character vector of ID variable options
- src_name(): string valued data source name
- tbl_name(): string valued table name
Data source environments
Description
Attaching a data source (see attach_src()) instantiates two types of S3 classes: a single src_env object, representing the data source as a collection of tables, as well as one src_tbl object per table, representing the given table. Upon package loading, src_env objects including the respective src_tbl objects are created for all data sources that are configured for auto-attaching, irrespective of whether data is actually available. If some (or all) data is missing, the user is asked for permission to download in interactive sessions and an error is thrown in non-interactive sessions. See setup_src_env() for manually downloading and setting up data sources.
Usage
new_src_tbl(files, col_cfg, tbl_cfg, prefix, src_env)

is_src_tbl(x)

as_src_tbl(x, ...)

## S3 method for class 'src_env'
as_src_tbl(x, tbl, ...)

new_src_env(x, env = new.env(parent = data_env()), link = NULL)

is_src_env(x)

## S3 method for class 'src_env'
as.list(x, ...)

as_src_env(x)

attached_srcs()

is_tbl_avail(tbl, env)

src_tbl_avail(env, tbls = ls(envir = env))

src_data_avail(src = auto_attach_srcs())

is_data_avail(src = auto_attach_srcs())

Arguments
files | File names of |
col_cfg | Coerced to |
tbl_cfg | Coerced to |
prefix | Character vector valued data source name(s) (used as class prefix) |
src_env | The data source environment (as |
x | Object to test/coerce |
tbl | String-valued table name |
env | Environment used as |
link | |
tbls | Character vector of table names |
src | Character vector of data source names or any other object (orlist thereof) for which an |
Details
A src_env object is an environment with attributes src_name (a string-valued data source name, such as mimic_demo) and id_cfg (describing the possible patient IDs for the given data source). In addition to the src_env class attribute, sub-classes are defined by the source class_prefix configuration setting (see load_src_cfg()). Such data source environments are intended to contain several corresponding src_tbl objects (or rather active bindings that evaluate to src_tbl objects; see setup_src_env()).
The S3 class src_tbl inherits from prt, which represents a partitioned fst file. In addition to the prt object, meta data in the form of col_cfg and tbl_cfg is associated with a src_tbl object (see load_src_cfg()). Furthermore, sub-classes are added as specified by the source configuration class_prefix entry, as with src_env objects. This allows certain functionality, for example data loading, to be adapted to data source-specific requirements.
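A brief sketch of working with these classes (assuming the mimic.demo demo data package is installed):

```r
library(ricu)

# resolve the data source environment by name and check its class
env <- as_src_env("mimic_demo")
is_src_env(env)

# pull out a single src_tbl object; it inherits from prt
tbl <- as_src_tbl(env, "labevents")
is_src_tbl(tbl)
```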
Instantiation and set up of src_env objects is possible irrespective of whether the underlying data is available. If some (or all) data is missing, the user is asked for permission to download in interactive sessions and an error is thrown in non-interactive sessions upon first access of a src_tbl bound as set up by setup_src_env(). Data availability can be checked with the following utilities:
- is_tbl_avail(): Returns a logical flag indicating whether all required data for the table passed as tbl (which may be a string or any object that has a tbl_name() implementation) is available from the environment env (requires an as_src_env() method).
- src_tbl_avail(): Returns a named logical vector, indicating which tables have all required data available. As above, both tbls (arbitrary length) and env (scalar-valued) may be character vectors or objects with corresponding tbl_name() and as_src_env() methods.
- src_data_avail(): The most comprehensive data availability report can be generated by calling src_data_avail(), returning a data.frame with columns name (the data source name), available (logical vector indicating whether all data is available), tables (the number of available tables) and total (the total number of tables). As input, src may be an arbitrary length character vector, an object for which an as_src_env() method is defined, or an arbitrary-length list thereof.
- is_data_avail(): Returns a named logical vector, indicating for which data sources all required data is available. As above, src may be an arbitrary length character vector, an object for which an as_src_env() method is defined, or an arbitrary-length list thereof.
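For illustration, the availability utilities might be used as follows (output depends on which data sets have been set up locally):

```r
library(ricu)

# data.frame summarizing all auto-attached sources
src_data_avail()

# named logical vector for selected sources
is_data_avail(c("mimic_demo", "eicu_demo"))

# per-table availability for a single source
src_tbl_avail(as_src_env("mimic_demo"))
```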
Value
The constructors new_src_env()/new_src_tbl(), as well as coercion functions as_src_env()/as_src_tbl(), return src_env and src_tbl objects respectively, while inheritance testers is_src_env()/is_src_tbl() return logical flags. For data availability utilities, see the Details section.
Concept callback functions
Description
Owing to increased complexity and more diverse applications, recursive concepts (class rec_cncpt) may specify callback functions to be called on corresponding data objects to perform post-processing steps.
Usage
pafi(..., match_win = hours(2L),
     mode = c("match_vals", "extreme_vals", "fill_gaps"),
     fix_na_fio2 = TRUE, interval = NULL)

safi(..., match_win = hours(2L),
     mode = c("match_vals", "extreme_vals", "fill_gaps"),
     fix_na_fio2 = TRUE, interval = NULL)

vent_ind(..., match_win = hours(6L), min_length = mins(30L), interval = NULL)

gcs(..., valid_win = hours(6L),
    sed_impute = c("max", "prev", "none", "verb"),
    set_na_max = TRUE, interval = NULL)

urine24(..., min_win = hours(12L), limits = NULL,
        start_var = "start", end_var = "end", interval = NULL)

vaso60(..., max_gap = mins(5L), interval = NULL)

vaso_ind(..., interval = NULL)

supp_o2(..., interval = NULL)

avpu(..., interval = NULL)

bmi(..., interval = NULL)

norepi_equiv(..., interval = NULL)

Arguments
... | Data input used for concept calculation |
match_win | Time-span during which matching of values is allowed |
mode | Method for matching PaO2 and FiO2 values |
fix_na_fio2 | Logical flag indicating whether to impute missing FiO2 values with 21 |
interval | Expected time series step size (determined from data if |
min_length | Minimal time span between a ventilation start and end time |
valid_win | Maximal time window for which a GCS value is valid if no newer measurement is available |
sed_impute | Imputation scheme for values taken when patient was sedated (i.e. unconscious). |
set_na_max | Logical flag controlling imputation of missing GCS values with the respective maximum values |
min_win | Minimal time span required for calculation of urine/24h |
limits | Passed to |
start_var,end_var | Passed to |
max_gap | Maximum time gap between administration windows that are merged (can be negative). |
Details
Several concept callback functions are exported, mainly for documenting their arguments, as default values oftentimes represent somewhat arbitrary choices and passing non-default values might be of interest for investigating stability with respect to such choices. Furthermore, default values might not be ideal for some datasets and/or analysis tasks.
pafi
In order to calculate the PaO2/FiO2 ratio (or Horowitz index) for a given time point, both a PaO2 and an FiO2 measurement are required. As the two are often not measured at the same time, some form of imputation or matching procedure is required. Several options are available:
- match_vals allows for a time difference of maximally match_win between two measurements for calculating their ratio
- extreme_vals uses the worst PaO2 and FiO2 values within the time window spanned by match_win
- fill_gaps represents a variation of extreme_vals, where ratios are evaluated at every time point as specified by interval, as opposed to only the time points where either a PaO2 or an FiO2 measurement is available
Finally, fix_na_fio2 imputes all remaining missing FiO2 values with 21, the percentage (by volume) of oxygen in (tropospheric) air.
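Assuming the mimic.demo package is installed, non-default matching behavior can be requested by forwarding callback arguments through load_concepts() (a sketch; forwarding of these arguments via ... is assumed here):

```r
library(ricu)

# worst PaO2/FiO2 values within a 1-hour window, instead of the
# default pairwise matching over 2 hours
load_concepts("pafi", "mimic_demo", mode = "extreme_vals",
              match_win = hours(1L), verbose = FALSE)
```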
vent_ind
Building on the atomic concepts vent_start and vent_end, vent_ind determines time windows during which patients are mechanically ventilated by combining start and end events that are separated by at most match_win and at least min_length. Durations are represented by the dur_var column in the returned win_tbl and the data_var column simply indicates the ventilation status with TRUE values. Currently, no clear distinction between invasive and non-invasive ventilation is made.
sed_gcs
In order to construct an indicator for patient sedation (used within the context of gcs), information from the two concepts ett_gcs and rass is pooled: a patient is considered sedated if intubated or if the Richmond Agitation-Sedation Scale value is -2 or lower.
gcs
Aggregating components of the Glasgow Coma Scale into a total score (whenever the total score tgcs is not already available) requires coinciding availability of an eye (egcs), verbal (vgcs) and motor (mgcs) score. In order to match values, a last observation carry forward imputation scheme over the time span specified by valid_win is performed. Furthermore, passing "max" as sed_impute will assume maximal points for time steps where the patient is sedated (as indicated by sed_gcs), passing "prev" will assign the last value observed before the current sedation window, and passing "none" will in turn use raw values. Finally, passing TRUE as set_na_max will assume maximal points for missing values (after matching and potentially applying sed_impute).
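For example, to carry the last pre-sedation value into sedation windows instead of assuming maximal points (a sketch, assuming mimic.demo is installed):

```r
library(ricu)

load_concepts("gcs", "mimic_demo", sed_impute = "prev",
              set_na_max = FALSE, verbose = FALSE)
```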
urine24
Single urine output events are aggregated into a 24 hour moving window sum. At the default value of limits = NULL, moving window evaluation begins with the first and ends with the last available measurement. This can however be extended by passing an id_tbl object, such as is for example returned by stay_windows(), to cover full stay windows. In order to provide data earlier than 24 hours before the evaluation start point, min_win specifies the minimally required data window and the evaluation scheme is adjusted for shorter than 24 hour windows.
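Evaluation over full stay windows can be sketched as follows (assuming stay_windows() accepts a source name and an id_type argument, and that mimic.demo is installed):

```r
library(ricu)

# id_tbl of per-stay windows, used to extend the evaluation range
# beyond the first/last available measurements
win <- stay_windows("mimic_demo", id_type = "icustay")
load_concepts("urine24", "mimic_demo", limits = win, verbose = FALSE)
```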
vaso60
Building on concepts for drug administration rate and drug administration durations, administration events are filtered out if they do not fall into administration windows of at least 1h. The max_gap argument can be used to control how far apart windows can be in order to be merged (negative times are possible as well, meaning that even overlapping windows can be considered as individual windows).
Value
Either an id_tbl or ts_tbl, depending on the type of concept.
Internal utilities for item/concept objects
Description
Several internal utilities for modifying, querying and subsetting item and concept objects, including getters and setters for itm variables, callback functions, cncpt target classes, as well as utilities for data loading, such as prepare_query(), which creates a row-subsetting expression, do_callback(), which applies a callback function to data, or do_itm_load(), which performs data loading corresponding to an itm object.
Usage
prepare_query(x)

try_add_vars(x, ..., var_lst = NULL, type = c("data_vars", "meta_vars"))

get_itm_var(x, var = NULL, type = c("data_vars", "meta_vars"))

set_callback(x, fun)

do_callback(x, ...)

do_itm_load(x, id_type = "icustay", interval = hours(1L))

n_tick(x)

set_target(x, target)

get_target(x)

subset_src(x, src)

## S3 method for class 'item'
subset_src(x, src)

## S3 method for class 'cncpt'
subset_src(x, src)

## S3 method for class 'concept'
subset_src(x, src)

Arguments
x | Object defining the row-subsetting |
... | Variable specification |
var_lst | List-based variable specification |
type | Variable type (either data or meta) |
var | Variable name ( |
fun | Callback function (passed as string) |
id_type | String specifying the patient id type to return |
interval | The time interval used to discretize time stamps with, specified as |
src | Character vector of data source name(s) |
Value
- prepare_query(): an unevaluated expression used for row-subsetting
- try_add_vars(): a (potentially) modified item object with added variables
- get_itm_var(): character vector of itm variables
- set_callback(): a modified object with added callback function
- do_callback(): result of the callback function applied to data, most likely id_tbl/ts_tbl
- do_itm_load(): result of item loading (id_tbl/ts_tbl)
- n_tick(): integer valued number of progress bar ticks
- set_target(): a modified object with newly set target class
- get_target(): string valued target class of an object
- subset_src(): an object of the same type as the object passed as x
ICU class data utilities
Description
Several utility functions for working with id_tbl and ts_tbl objects are available, including functions for changing column names, removing columns, as well as aggregating or removing rows. An important thing to note is that as id_tbl (and consequently ts_tbl) inherits from data.table, there are several functions provided by the data.table package that are capable of modifying id_tbl in a way that results in an object with inconsistent state. An example for this is data.table::setnames(): if an ID column or the index column name is modified without updating the attribute marking the column as such, this leads to an invalid object. As data.table::setnames() is not an S3 generic function, the only way to control its behavior with respect to id_tbl objects is masking the function. As such an approach has its own downsides, a separate function, rename_cols(), is provided, which is able to handle column renaming correctly.
Usage
rename_cols(x, new, old = colnames(x), skip_absent = FALSE,
            by_ref = FALSE, ...)

rm_cols(x, cols, skip_absent = FALSE, by_ref = FALSE)

change_interval(x, new_interval, cols = time_vars(x), by_ref = FALSE)

change_dur_unit(x, new_unit, by_ref = FALSE)

rm_na(x, cols = data_vars(x), mode = c("all", "any"))

## S3 method for class 'id_tbl'
sort(x, decreasing = FALSE, by = meta_vars(x),
     reorder_cols = TRUE, by_ref = FALSE, ...)

is_sorted(x)

## S3 method for class 'id_tbl'
duplicated(x, incomparables = FALSE, by = meta_vars(x), ...)

## S3 method for class 'id_tbl'
anyDuplicated(x, incomparables = FALSE, by = meta_vars(x), ...)

## S3 method for class 'id_tbl'
unique(x, incomparables = FALSE, by = meta_vars(x), ...)

is_unique(x, ...)

## S3 method for class 'id_tbl'
aggregate(x, expr = NULL, by = meta_vars(x), vars = data_vars(x),
          env = NULL, ...)

dt_gforce(x,
          fun = c("mean", "median", "min", "max", "sum", "prod",
                  "var", "sd", "first", "last", "any", "all"),
          by = meta_vars(x), vars = data_vars(x),
          na_rm = !fun %in% c("first", "last"))

replace_na(x, val, type = "const", ...)

Arguments
x | Object to query |
new,old | Replacement names and existing column names for renaming columns |
skip_absent | Logical flag for ignoring non-existent column names |
by_ref | Logical flag indicating whether to perform the operation by reference |
... | Ignored |
cols | Column names of columns to consider |
new_interval | Replacement interval length specified as scalar-valued |
new_unit | New |
mode | Switch between |
decreasing | Logical flag indicating the sort order |
by | Character vector indicating which combinations of columns from |
reorder_cols | Logical flag indicating whether to move the |
incomparables | Not used. Here for S3 method consistency |
expr | Expression to apply over groups |
vars | Column names to apply the function to |
env | Environment to look up names in |
fun | Function name (as string) to apply over groups |
na_rm | Logical flag indicating how to treat |
val | Replacement value (if |
type | Character, one of "const", "locf" or "nocb". Defaults to |
Details
Apart from a function for renaming columns while respecting attributes marking columns as index or ID columns, several other utility functions are provided to make handling of id_tbl and ts_tbl objects more convenient.
Sorting
An id_tbl or ts_tbl object is considered sorted when rows are in ascending order according to columns as specified by meta_vars(). This means that for an id_tbl object rows have to be ordered by id_vars() and for a ts_tbl object rows have to be ordered first by id_vars(), followed by the index_var(). Calling the S3 generic function base::sort() on an object that inherits from id_tbl using default arguments yields an object that is considered sorted. For convenience (mostly in printing), the columns by which the table was sorted are moved to the front (this can be disabled by passing FALSE as reorder_cols argument). Internally, sorting is handled by either setting a data.table::key() in case decreasing = FALSE or by calling data.table::setorder() in case decreasing = TRUE.
Uniqueness
An object inheriting from id_tbl is considered unique if it is unique in terms of the columns as specified by meta_vars(). This means that for an id_tbl object, either zero or a single row is allowed per combination of values in columns id_vars() and consequently for ts_tbl objects a maximum of one row is allowed per combination of time step and ID. In order to create a unique id_tbl object from a non-unique id_tbl object, aggregate() will combine observations that represent repeated measurements within a group.
Aggregating
In order to turn a non-unique id_tbl or ts_tbl object into an object considered unique, the S3 generic function stats::aggregate() is available. This applies the expression (or function specification) passed as expr to each combination of grouping variables. The columns to be aggregated can be controlled using the vars argument and the grouping variables can be changed using the by argument. The argument expr is fairly flexible: it can take an expression that will be evaluated in the context of the data.table in a clean environment inheriting from env, it can be a function, or it can be a string, in which case dt_gforce() is called. The default value NULL chooses a string dependent on data types, where numeric resolves to median, logical to sum and character to first.
As aggregation is used in concept loading (see load_concepts()), performance is important. For this reason, dt_gforce() allows for any of the available functions to be applied using the GForce optimization of data.table (see data.table::datatable.optimize).
Value
Most of the utility functions return an object inheriting from id_tbl, potentially modified by reference, depending on the type of the object passed as x. The functions is_sorted(), anyDuplicated() and is_unique() return logical flags, while duplicated() returns a logical vector of length nrow(x).
Examples
tbl <- id_tbl(a = rep(1:5, 4), b = rep(1:2, each = 10), c = rnorm(20),
              id_vars = c("a", "b"))
is_unique(tbl)
is_sorted(tbl)
is_sorted(tbl[order(c)])
identical(aggregate(tbl, list(c = sum(c))), aggregate(tbl, "sum"))
tbl <- aggregate(tbl, "sum")
is_unique(tbl)
is_sorted(tbl)

Utilities for difftime
Description
As base::difftime() vectors are used throughout ricu, a set of wrapper functions is exported for conveniently instantiating base::difftime() vectors with given time units.
Usage
secs(...)

mins(...)

hours(...)

days(...)

weeks(...)

Arguments
... | Numeric vector to coerce to |
Value
Vector valued time differences as a difftime object.
Examples
hours(1L)
mins(NA_real_)
secs(1:10)
hours(numeric(0L))

Sepsis 3 label
Description
The sepsis 3 label consists of a suspected infection combined with an acuteincrease in SOFA score.
Usage
sep3(..., si_window = c("first", "last", "any"),
     delta_fun = delta_cummin, sofa_thresh = 2L,
     si_lwr = hours(48L), si_upr = hours(24L),
     keep_components = FALSE, interval = NULL)

delta_cummin(x)

delta_start(x)

delta_min(x, shifts = seq.int(0L, 23L))

Arguments
... | Data objects |
si_window | Switch that can be used to filter SI windows |
delta_fun | Function used to determine the SOFA increase during an SI window |
sofa_thresh | Required SOFA increase to trigger Sepsis 3 |
si_lwr,si_upr | Lower/upper extent of SI windows |
keep_components | Logical flag indicating whether to return the individual components alongside the aggregated score |
interval | Time series interval (only used for checking consistency of input data) |
x | Vector of SOFA scores |
shifts | Vector of time shifts (multiples of the current interval) over which |
Details
The Sepsis-3 Consensus (Singer et al.) defines sepsis as an acute increase in the SOFA score (see sofa_score()) of 2 points or more within the suspected infection (SI) window (see susp_inf()).
A patient can potentially have multiple SI windows. The argument si_window is used to control which SI window we focus on (options are "first", "last", "any").
Further, although a 2 or more point increase in the SOFA score is defined, it is not perfectly clear to which value the increase refers. For this, the delta_fun argument is used. If the increase is required to happen with respect to the minimal SOFA value (within the SI window) up to the current time, the delta_cummin function should be used. If, however, we are looking for an increase with respect to the start of the SI window, then the delta_start function should be used. Lastly, the increase might be defined with respect to values of the previous 24 hours, in which case the delta_min function is used.
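The semantics of the first two delta functions can be pictured with plain vector arithmetic (an illustrative sketch of the intended behavior, not the package internals):

```r
sofa <- c(4, 3, 3, 5, 6, 4)

# delta_cummin: increase relative to the running minimum so far
sofa - cummin(sofa)   # 0 0 0 2 3 1

# delta_start: increase relative to the first value in the window
sofa - sofa[1L]       # 0 -1 -1 1 2 0
```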
References
Singer M, Deutschman CS, Seymour CW, et al. The Third InternationalConsensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA.2016;315(8):801–810. doi:10.1001/jama.2016.0287
Data setup
Description
Making a dataset available to ricu consists of 3 steps: downloading (download_src()), importing (import_src()) and attaching (attach_src()). While downloading and importing are one-time procedures, attaching of the dataset is repeated every time the package is loaded. Briefly, downloading retrieves the raw dataset from the internet (most likely in .csv format), importing consists of some preprocessing to make the data available more efficiently, and attaching sets up the data for use by the package. The download and import steps can be combined using setup_src_data().
Usage
setup_src_data(x, ...)

Arguments
x | Object specifying the source configuration |
... | Forwarded to |
Details
If setup_src_data() is called on data sources that have all data available with force = FALSE, nothing happens apart from a message being displayed. If only a subset of tables is missing, only these tables are downloaded (whenever possible) and imported. Passing force = TRUE attempts to re-download and import the entire data set. If the data source is available as a data package (as is the case for the two demo datasets), data is not downloaded and imported; instead, this package is installed.
In most scenarios, setup_src_data() does not need to be called by users, as upon package loading, all configured data sources are set up in a way that enables download of missing data upon first access (given user consent). However, instead of resolving data missingness by accessing each affected data source one by one, setup_src_data() is exported for convenience.
Value
Called for side effects and returns NULL invisibly.
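As a usage sketch (assuming, as with attach_src(), that a source name can be passed as a string; the demo dataset is served from the package repository listed under Additional_repositories):

```r
library(ricu)

# one-off setup: downloads (where needed) and imports all tables of the
# MIMIC-III demo; with all data present and force = FALSE this is a no-op
setup_src_data("mimic_demo")
```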
SIRS score label
Description
The SIRS (Systemic Inflammatory Response Syndrome) score is a commonly used assessment tool for tracking a patient's well-being in an ICU.
Usage
sirs_score(..., win_length = hours(24L), keep_components = FALSE, interval = NULL)

qsofa_score(..., win_length = hours(24L), keep_components = FALSE, interval = NULL)

news_score(..., win_length = hours(24L), keep_components = FALSE, interval = NULL)

mews_score(..., win_length = hours(24L), keep_components = FALSE, interval = NULL)

Arguments
... | Data input used for score evaluation |
win_length | Window used for carry forward |
keep_components | Logical flag indicating whether to return theindividual components alongside the aggregated score |
interval | Time series interval (only used for checking consistencyof input data) |
SOFA score label
Description
The SOFA (Sequential Organ Failure Assessment) score is a commonly used assessment tool for tracking a patient's status during a stay at an ICU. Organ function is quantified by aggregating 6 individual scores, representing the respiratory, cardiovascular, hepatic, coagulation, renal and neurological systems. The function sofa_score() is used as callback function to the sofa concept but is exported as there are a few arguments that can be used to modify some aspects of the presented SOFA implementation. Internally, sofa_score() first calls sofa_window(), followed by sofa_compute(); arguments passed as ... will be forwarded to the respective internally called function.
Usage
sofa_score(..., worst_val_fun = max_or_na, explicit_wins = FALSE, win_length = hours(24L), keep_components = FALSE, interval = NULL)

sofa_resp(..., interval = NULL)

sofa_coag(..., interval = NULL)

sofa_liver(..., interval = NULL)

sofa_cardio(..., interval = NULL)

sofa_cns(..., interval = NULL)

sofa_renal(..., interval = NULL)

Arguments
... | Concept data, either passed as list or individual argument |
worst_val_fun | Function(s) used to calculate worst values over windows |
explicit_wins | The default |
win_length | Time-frame to look back and apply the |
keep_components | Logical flag indicating whether to return theindividual components alongside the aggregated score (with a suffix |
interval | Time series interval (only used for checking consistencyof input data, |
Details
The function sofa_score() calculates, for each component, the worst value over a moving window as specified by win_length, using the function passed as worst_val_fun. The default function max_or_na() returns NA instead of -Inf/Inf in the case where no measurement is available over an entire window. When calculating the overall score by summing up components per time step, a NA value is treated as 0.
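These two aggregation rules can be sketched in base R (max_or_na_sketch() and total_sofa_sketch() are hypothetical stand-ins for the internal behavior described above):

```r
# worst value over a window: a max() that yields NA (not -Inf) when
# every value in the window is missing, mirroring max_or_na()
max_or_na_sketch <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

# total score at one time step: missing component scores count as 0
total_sofa_sketch <- function(components) {
  sum(vapply(components, function(x) if (is.na(x)) 0 else x, numeric(1)))
}

max_or_na_sketch(c(NA, NA))                              # NA
max_or_na_sketch(c(NA, 3, 1))                            # 3
total_sofa_sketch(list(resp = 2, coag = NA, liver = 1))  # 3
```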
Building on separate concepts, measurements for each component are converted to a component score using the definition by Vincent et al.:
| SOFA score | 1 | 2 | 3 | 4 |
| Respiration | | | | |
| PaO2/FiO2 [mmHg] | < 400 | < 300 | < 200 | < 100 |
| and mechanical ventilation | | | yes | yes |
| Coagulation | | | | |
| Platelets [×10³/mm³] | < 150 | < 100 | < 50 | < 20 |
| Liver | | | | |
| Bilirubin [mg/dl] | 1.2-1.9 | 2.0-5.9 | 6.0-11.9 | > 12.0 |
| Cardiovascularᵃ | | | | |
| MAP | < 70 mmHg | | | |
| or dopamine | | ≤ 5 | > 5 | > 15 |
| or dobutamine | | any dose | | |
| or epinephrine | | | ≤ 0.1 | > 0.1 |
| or norepinephrine | | | ≤ 0.1 | > 0.1 |
| Central nervous system | | | | |
| Glasgow Coma Score | 13-14 | 10-12 | 6-9 | < 6 |
| Renal | | | | |
| Creatinine [mg/dl] | 1.2-1.9 | 2.0-3.4 | 3.5-4.9 | > 5.0 |
| or urine output [ml/day] | | | < 500 | < 200 |

ᵃ Adrenergic agents administered for at least 1 h (doses given are in μg/kg·min)
By default, for each patient, a score is calculated for every time step, from the first available measurement to the last. If, instead of a regularly evaluated score, only certain time points are of interest, this can be specified using the explicit_wins argument: passing for example hours(24, 48) will yield for every patient a score at hours 24 and 48 relative to the origin of the current ID system (for example ICU stay).
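A usage sketch against the MIMIC-III demo dataset (forwarding of explicit_wins from load_concepts() to the sofa callback is assumed as described above):

```r
library(ricu)

# hourly SOFA scores for every patient in the demo cohort
sofa_all <- load_concepts("sofa", "mimic_demo", verbose = FALSE)

# scores only at hours 24 and 48 relative to ICU admission
sofa_24_48 <- load_concepts("sofa", "mimic_demo",
                            explicit_wins = hours(24L, 48L),
                            verbose = FALSE)
```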
Value
A ts_tbl object.
References
Vincent, J.-L., Moreno, R., Takala, J. et al. The SOFA (Sepsis-related OrganFailure Assessment) score to describe organ dysfunction/failure. IntensiveCare Med 22, 707–710 (1996). https://doi.org/10.1007/BF01709751
Stays
Description
Building on functionality offered by the (internal) function id_map(), stay windows as well as (in case of differing values being passed as id_type and win_type) an ID mapping are computed.
Usage
stay_windows(x, ...)

## S3 method for class 'src_env'
stay_windows(x, id_type = "icustay", win_type = id_type, in_time = "start", out_time = "end", interval = hours(1L), patient_ids = NULL, ...)

## S3 method for class 'character'
stay_windows(x, ...)

## S3 method for class 'list'
stay_windows(x, ..., patient_ids = NULL)

## Default S3 method:
stay_windows(x, ...)

Arguments
x | Data source (is coerced to |
... | Generic consistency |
id_type | Type of ID all returned times are relative to |
win_type | Type of ID for which the in/out times are returned |
in_time,out_time | Column names of the returned in/out times |
interval | The time interval used to discretize time stamps with,specified as |
patient_ids | Patient IDs used to subset the result |
Value
An id_tbl containing the selected IDs and, depending on the values passed as in_time and out_time, start and end times of the ID passed as win_type.
See Also
change_id
Suspicion of infection label
Description
Suspected infection is defined as the co-occurrence of antibiotic treatment and body-fluid sampling.
Usage
susp_inf(..., abx_count_win = hours(24L), abx_min_count = 1L, positive_cultures = FALSE, si_mode = c("and", "or", "abx", "samp"), abx_win = hours(24L), samp_win = hours(72L), by_ref = TRUE, keep_components = FALSE, interval = NULL)

Arguments
... | Data and further arguments are passed to |
abx_count_win | Time span during which to apply the |
abx_min_count | Minimal number of antibiotic administrations |
positive_cultures | Logical flag indicating whether to requirecultures to be positive |
si_mode | Switch between |
abx_win | Time-span within which sampling has to occur |
samp_win | Time-span within which antibiotic administration has tooccur |
by_ref | Logical flag indicating whether to process data by reference |
keep_components | Logical flag indicating whether to return theindividual components alongside the aggregated score |
interval | Time series interval (only used for checking consistencyof input data) |
Details
Suspected infection can occur in one of the two following ways:
administration of antibiotics followed by a culture sampling within abx_win hours

         abx_win
   |---------------|
  ABX          sampling (last possible)
culture sampling followed by an antibiotic administration within samp_win hours

                      samp_win
   |---------------------------------------------|
  sampling                      ABX (last possible)
The default values of abx_win and samp_win are 24 and 72 hours respectively, as per Singer et al.
The earlier of the two times (fluid sampling, antibiotic treatment) is taken as the time of suspected infection (SI time). The suspected infection window (SI window) is defined to start si_lwr hours before the SI time and end si_upr hours after the SI time. The default values of 48 and 24 hours (respectively) are chosen as used by Seymour et al. (see Supplemental Material).
               48h                     24h
   |------------------------------|---------------|
                               SI time
For some datasets, however, information on body fluid sampling is not available for the majority of patients (eICU data). Therefore, an alternative definition of suspected infection is required. For this, we use administration of multiple antibiotics (the argument abx_min_count determines the required number) within abx_count_win hours. The first time of antibiotic administration is taken as the SI time in this case.
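For a single patient with one antibiotic and one sampling time, the default ("and") logic can be sketched in base R (si_time_sketch() is a hypothetical helper; times are in hours relative to admission):

```r
si_time_sketch <- function(abx_time, samp_time,
                           abx_win = 24, samp_win = 72) {
  # ABX first: sampling must follow within abx_win hours;
  # sampling first: ABX must follow within samp_win hours
  ok <- (abx_time <= samp_time & samp_time - abx_time <= abx_win) |
        (samp_time < abx_time & abx_time - samp_time <= samp_win)
  if (!ok) return(NA_real_)
  min(abx_time, samp_time)  # the earlier event marks the SI time
}

si_time_sketch(abx_time = 10, samp_time = 20)  # 10
si_time_sketch(abx_time = 10, samp_time = 40)  # NA (sampling too late)
si_time_sketch(abx_time = 50, samp_time = 10)  # 10
```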
References
Singer M, Deutschman CS, Seymour CW, et al. The Third InternationalConsensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA.2016;315(8):801–810. doi:10.1001/jama.2016.0287
Seymour CW, Liu VX, Iwashyna TJ, et al. Assessment of Clinical Criteria forSepsis: For the Third International Consensus Definitions for Sepsis andSeptic Shock (Sepsis-3). JAMA. 2016;315(8):762–774.doi:10.1001/jama.2016.0288
Item callback utilities
Description
For concept loading, item callback functions are used in order to handle item-specific post-processing steps, such as converting measurement units, mapping a set of values to another, or more involved data transformations, like turning absolute drug administration rates into rates that are relative to body weight. Item callback functions are called by load_concepts() with arguments x (the data), a variable number of name/string pairs specifying roles of columns for the given item, followed by env, the data source environment as src_env object. Item callback functions can be specified by their name or using function factories such as transform_fun(), apply_map() or convert_unit().
Usage
transform_fun(fun, ...)

binary_op(op, y)

comp_na(op, y)

set_val(val)

apply_map(map, var = "val_var")

convert_unit(fun, new, rgx = NULL, ignore_case = TRUE, ...)

Arguments
fun | Function(s) used for transforming matching values |
... | Further arguments passed to downstream function |
op | Function taking two arguments, such as |
y | Value passed as second argument to function |
val | Value to replace every element of x with |
map | Named atomic vector used for mapping a set of values (the namesof |
var | Argument which is used to determine the column the mapping isapplied to |
new | Name(s) of transformed units |
rgx | Regular expression(s) used for identifying observations based ontheir current unit of measurement, |
ignore_case | Forwarded to |
Details
The most straightforward setting is where a function is simply referred to by its name. For example in eICU, age is available as a character vector due to ages 90 and above being represented by the string "> 89". A function such as the following turns this into a numeric vector, replacing occurrences of "> 89" by the number 90.
eicu_age <- function(x, val_var, ...) {
  data.table::set(
    data.table::set(x, which(x[[val_var]] == "> 89"), j = val_var, value = 90),
    j = val_var, value = as.numeric(x[[val_var]])
  )
}

This function then is specified as item callback function for items corresponding to eICU data sources of the age concept as
item(src = "eicu_demo", table = "patient", val_var = "age",
     callback = "eicu_age", class = "col_itm")
The string passed as callback argument is evaluated, meaning that an expression can be passed which evaluates to a function that in turn can be used as callback. Several function factories are provided which return functions suitable for use as item callbacks: transform_fun() creates a function that transforms the val_var column using the function supplied as fun argument, apply_map() can be used to map one set of values to another (again using the val_var column) and convert_unit() is intended for converting a subset of rows (identified by matching rgx against the unit_var column) by applying fun to the val_var column and setting new as the transformed unit name (arguments are not limited to scalar values). As transformations require unary functions, two utility functions, binary_op() and comp_na(), are provided which can be used to fix the second argument of binary functions such as * or ==. Taking all this together, an item callback function for dividing the val_var column by 2 could be specified as "transform_fun(binary_op(`/`, 2))". The supplied function factories create functions that operate on the data using by-reference semantics. Furthermore, during concept loading, progress is reported by a progress::progress_bar. In order to signal a message without disrupting the current loading status, see msg_progress().
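The factory pattern itself can be sketched at the vector level in base R (the *_sketch names are hypothetical; ricu's binary_op() and comp_na() additionally integrate with the by-reference transform_fun() machinery):

```r
# fix the second argument of a binary function, yielding a unary one
binary_op_sketch <- function(op, y) function(x) op(x, y)

# same, but NA inputs compare to FALSE rather than propagating NA
comp_na_sketch <- function(op, y) function(x) !is.na(x) & op(x, y)

half <- binary_op_sketch(`/`, 2)
half(c(4, 10))             # 2 5

at_least_4 <- comp_na_sketch(`>=`, 4)
at_least_4(c(3, 4, NA))    # FALSE TRUE FALSE
```

The NA handling in comp_na_sketch() is why the documented gte_4 example yields FALSE (not NA) for missing measurements.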
Value
Callback function factories such as transform_fun(), apply_map() or convert_unit() return functions suitable as item callback functions, while transform function generators such as binary_op() and comp_na() return functions that apply a transformation to a vector.
Examples
dat <- ts_tbl(x = rep(1:2, each = 5), y = hours(rep(1:5, 2)), z = 1:10)

subtract_3 <- transform_fun(binary_op(`-`, 3))
subtract_3(data.table::copy(dat), val_var = "z")

gte_4 <- transform_fun(comp_na(`>=`, 4))
gte_4(data.table::copy(dat), val_var = "z")

map_letters <- apply_map(setNames(letters[1:9], 1:9))
res <- map_letters(data.table::copy(dat), val_var = "z")
res

not_b <- transform_fun(comp_na(`!=`, "b"))
not_b(res, val_var = "z")

Internal utilities for ICU data objects
Description
In order to remove all id_tbl/ts_tbl-related attributes, as well as extra class labels, the exported but marked internal function unclass_tbl() can be used. This function provides what one might expect from an id_tbl/ts_tbl-specific implementation of the S3 generic function data.table::as.data.table(). The inverse functionality is provided by reclass_tbl(), which attempts to add attributes as seen in template to the object passed as x. The logical flag stop_on_fail controls how to proceed if the attributes of template are incompatible with the object x. Finally, in order to generate a template, as_ptype() creates an empty object with the appropriate attributes.
Usage
unclass_tbl(x)

reclass_tbl(x, template, stop_on_fail = TRUE)

as_ptype(x)

Arguments
x | Object to modify/query |
template | Object after which to model the object in question |
stop_on_fail | Logical flag indicating whether to consider failedobject validation as error |
Value
unclass_tbl(): a data.table

reclass_tbl(): either an id_tbl or a ts_tbl, depending on the type of the object passed as template

as_ptype(): an object of the same type as x, but with no data
Read and write utilities
Description
Support for reading from and writing to pipe separated values (.psv)files as used for the PhysioNet Sepsis Challenge.
Usage
write_psv(x, dir, na_rows = NULL)

read_psv(dir, col_spec = NULL, id_var = "stay_id", index_var = NULL)

Arguments
x | Object to write to files |
dir | Directory to write the (many) files to or read from |
na_rows | If |
col_spec | A column specification as created by |
id_var | Name of the id column (IDs are generated from file names) |
index_var | Optional name of index column (will be coerced to |
Details
Data for the PhysioNet Sepsis Challenge is distributed as pipe separated values (.psv) files, split into separate files per patient ID, containing time stamped rows with measured variables as columns. Files are named with patient IDs and do not contain any patient identifiers as data. Functions read_psv() and write_psv() can be used to read from and write to such a data format.
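A minimal base-R sketch of this layout (read_psv() additionally handles column specifications and index coercion; the file content and values below are invented):

```r
# one pipe-separated file per patient; the patient ID lives in the
# file name only, not in the data itself
tmp <- file.path(tempdir(), "p000001.psv")
writeLines(c("HR|O2Sat|ICULOS", "86|98|1", "90|97|2"), tmp)

dat <- read.delim(tmp, sep = "|")
dat$stay_id <- sub("\\.psv$", "", basename(tmp))  # recover ID from name
```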
Value
While write_psv() is called for side effects and returns NULL invisibly, read_psv() returns an object inheriting from id_tbl.
References
Reyna, M., Josef, C., Jeter, R., Shashikumar, S., Moody, B., Westover, M.B., Sharma, A., Nemati, S., & Clifford, G. (2019). Early Prediction ofSepsis from Clinical Data – the PhysioNet Computing in CardiologyChallenge 2019 (version 1.0.0). PhysioNet.https://doi.org/10.13026/v64v-d857.