Package overview
The omopgenerics package provides definitions of core classes and methods used by analytic pipelines that query the OMOP common data model.
#> Warning in citation("omopgenerics"): could not determine year for
#> 'omopgenerics' from package DESCRIPTION file
#> To cite package 'omopgenerics' in publications use:
#>
#>   Català M, Burn E (2025). _omopgenerics: Methods and Classes for the
#>   OMOP Common Data Model_. R package version 1.3.2,
#>   <https://darwin-eu.github.io/omopgenerics/>.
#>
#> A BibTeX entry for LaTeX users is
#>
#>   @Manual{,
#>     title = {omopgenerics: Methods and Classes for the OMOP Common Data Model},
#>     author = {Martí Català and Edward Burn},
#>     note = {R package version 1.3.2},
#>     url = {https://darwin-eu.github.io/omopgenerics/},
#>   }
If you find the package useful in supporting your research study, please consider citing this package.
Installation
You can install the development version of omopgenerics from GitHub with:
install.packages("pak")
pak::pkg_install("darwin-eu/omopgenerics")
And load it using the library command:
library(omopgenerics)
Core classes and methods
CDM Reference
A cdm reference is a single R object that represents OMOP CDM data. The tables in the cdm reference may be in a database, but a cdm reference may also contain OMOP CDM tables that are in dataframes/tibbles or in arrow. In the latter case the cdm reference would typically be a subset of an original cdm reference that has been derived as part of a particular analysis.
omopgenerics contains the class definition of a cdm reference and a dataframe implementation. For creating a cdm reference using a database, see the CDMConnector package (https://darwin-eu.github.io/CDMConnector/).
A cdm object can contain four types of tables:
- Standard tables:
omopTables()
#>  [1] "person"                "observation_period"    "visit_occurrence"
#>  [4] "visit_detail"          "condition_occurrence"  "drug_exposure"
#>  [7] "procedure_occurrence"  "device_exposure"       "measurement"
#> [10] "observation"           "death"                 "note"
#> [13] "note_nlp"              "specimen"              "fact_relationship"
#> [16] "location"              "care_site"             "provider"
#> [19] "payer_plan_period"     "cost"                  "drug_era"
#> [22] "dose_era"              "condition_era"         "metadata"
#> [25] "cdm_source"            "concept"               "vocabulary"
#> [28] "domain"                "concept_class"         "concept_relationship"
#> [31] "relationship"          "concept_synonym"       "concept_ancestor"
#> [34] "source_to_concept_map" "drug_strength"         "cohort_definition"
#> [37] "attribute_definition"  "concept_recommended"
Each of these tables has a set of required columns. For example, these are the required columns of the person table:
omopColumns(table = "person")
#>  [1] "person_id"                   "gender_concept_id"
#>  [3] "year_of_birth"               "month_of_birth"
#>  [5] "day_of_birth"                "birth_datetime"
#>  [7] "race_concept_id"             "ethnicity_concept_id"
#>  [9] "location_id"                 "provider_id"
#> [11] "care_site_id"                "person_source_value"
#> [13] "gender_source_value"         "gender_source_concept_id"
#> [15] "race_source_value"           "race_source_concept_id"
#> [17] "ethnicity_source_value"      "ethnicity_source_concept_id"
- Cohort tables: we can see the cohort-related tables and their required columns.
cohortTables()
#> [1] "cohort"           "cohort_set"       "cohort_attrition" "cohort_codelist"
cohortColumns(table = "cohort")
#> [1] "cohort_definition_id" "subject_id"           "cohort_start_date"
#> [4] "cohort_end_date"
In addition, cohorts are defined in terms of a generatedCohortSet class. For more details on this class definition see the corresponding vignette.
- Achilles tables: the Achilles R package generates descriptive statistics about the data contained in the OMOP CDM. Again, we can see the tables created and their required columns.
achillesTables()
#> [1] "achilles_analysis"     "achilles_results"      "achilles_results_dist"
achillesColumns(table = "achilles_results")
#> [1] "analysis_id" "stratum_1"   "stratum_2"   "stratum_3"   "stratum_4"
#> [6] "stratum_5"   "count_value"
- Other tables: these can have any format.
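As a minimal sketch of how these helpers might be used in practice, the snippet below compares a hand-built person table against the column list returned by omopColumns(). The variable names (person, missing_cols, extra_cols) are illustrative only and not part of the package.

```r
library(omopgenerics)

# A deliberately minimal person table (illustrative data)
person <- dplyr::tibble(
  person_id = 1L,
  gender_concept_id = 0L,
  year_of_birth = 1990L,
  race_concept_id = 0L,
  ethnicity_concept_id = 0L
)

# Columns listed for the OMOP person table that this tibble lacks
missing_cols <- setdiff(omopColumns(table = "person"), colnames(person))
missing_cols

# Columns in the tibble that are not part of the OMOP person table
extra_cols <- setdiff(colnames(person), omopColumns(table = "person"))
extra_cols

# Check whether a table name is one of the standard OMOP tables
"person" %in% omopTables()
```

A check like this can be a quick way to see how far a local data frame is from the standard table layout before adding it to a cdm reference.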
For tables to be part of a cdm object, four conditions must hold:
- All tables must share a common source.
- Table names must be lowercase.
- The column names of each table must be lowercase.
- person and observation_period must be present.
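The conditions on names and required tables can be inspected on an existing cdm object. The following is a rough sketch, assuming a small in-memory cdm built with cdmFromTables(). The checks are written by hand for illustration and are not the package's own validation code.

```r
library(omopgenerics)

# Build a small in-memory cdm reference (illustrative data)
person <- dplyr::tibble(
  person_id = 1L,
  gender_concept_id = 0L,
  year_of_birth = 1990L,
  race_concept_id = 0L,
  ethnicity_concept_id = 0L
)
observation_period <- dplyr::tibble(
  observation_period_id = 1L,
  person_id = 1L,
  observation_period_start_date = as.Date("2000-01-01"),
  observation_period_end_date = as.Date("2023-12-31"),
  period_type_concept_id = 0L
)
cdm <- cdmFromTables(
  tables = list("person" = person, "observation_period" = observation_period),
  cdmName = "example_cdm"
)

# Table names are lowercase
table_names <- names(cdm)
all(table_names == tolower(table_names))

# Column names of every table are lowercase
column_names <- unlist(lapply(table_names, function(nm) colnames(cdm[[nm]])))
all(column_names == tolower(column_names))

# person and observation_period are present
all(c("person", "observation_period") %in% table_names)

# The name given to the cdm when it was created
cdmName(cdm)
```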
Concept set
A concept set can be represented as either a codelist or a concept set expression. A codelist is a named list, with each item of the list containing specific concept IDs.
condition_codes <- list(
  "diabetes" = c(201820L, 4087682L, 3655269L),
  "asthma" = 317009L
)
condition_codes <- newCodelist(condition_codes)
condition_codes
#>
#> ── 2 codelists ─────────────────────────────────────────────────────────────────
#>
#> - asthma (1 codes)
#> - diabetes (3 codes)
Meanwhile, a concept set expression provides a high-level definition of concepts that, when applied to a specific OMOP CDM vocabulary version (by making use of the concept hierarchies and relationships), will result in a codelist.
condition_cs <- list(
  "diabetes" = dplyr::tibble(
    "concept_id" = c(201820L, 4087682L),
    "excluded" = c(FALSE, FALSE),
    "descendants" = c(TRUE, FALSE),
    "mapped" = c(FALSE, FALSE)
  ),
  "asthma" = dplyr::tibble(
    "concept_id" = 317009L,
    "excluded" = FALSE,
    "descendants" = FALSE,
    "mapped" = FALSE
  )
)
condition_cs <- newConceptSetExpression(condition_cs)
condition_cs
#>
#> ── 2 concept set expressions ───────────────────────────────────────────────────
#>
#> - asthma (1 concept criteria)
#> - diabetes (2 concept criteria)
A cohort table
A cohort is a set of persons who satisfy one or more inclusion criteria for a duration of time. Once defined, the corresponding table in a cdm reference has the cohort table class. Cohort tables are associated with attributes such as settings and attrition.
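The settings attribute is where cohort names live. When no settings are supplied, a default name is assigned (the README example in this section yields cohort_1). As a hedged sketch, a more descriptive name can be provided through the cohortSetRef argument of newCohortTable(). The data and the name "diabetes" are illustrative.

```r
library(omopgenerics)

# Minimal in-memory tables (illustrative data)
person <- dplyr::tibble(
  person_id = 1L,
  gender_concept_id = 0L,
  year_of_birth = 1990L,
  race_concept_id = 0L,
  ethnicity_concept_id = 0L
)
observation_period <- dplyr::tibble(
  observation_period_id = 1L,
  person_id = 1L,
  observation_period_start_date = as.Date("2000-01-01"),
  observation_period_end_date = as.Date("2023-12-31"),
  period_type_concept_id = 0L
)
diabetes <- dplyr::tibble(
  cohort_definition_id = 1L,
  subject_id = 1L,
  cohort_start_date = as.Date("2020-01-01"),
  cohort_end_date = as.Date("2020-01-10")
)
cdm <- cdmFromTables(
  tables = list(
    "person" = person,
    "observation_period" = observation_period,
    "diabetes" = diabetes
  ),
  cdmName = "example_cdm"
)

# Supply the cohort settings explicitly instead of relying on the default name
cdm$diabetes <- newCohortTable(
  table = cdm$diabetes,
  cohortSetRef = dplyr::tibble(
    cohort_definition_id = 1L,
    cohort_name = "diabetes"
  )
)

# The name given above is now stored in the cohort settings
settings(cdm$diabetes)
```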
person <- dplyr::tibble(
  person_id = 1L,
  gender_concept_id = 0L,
  year_of_birth = 1990L,
  race_concept_id = 0L,
  ethnicity_concept_id = 0L
)
observation_period <- dplyr::tibble(
  observation_period_id = 1L,
  person_id = 1L,
  observation_period_start_date = as.Date("2000-01-01"),
  observation_period_end_date = as.Date("2023-12-31"),
  period_type_concept_id = 0L
)
diabetes <- dplyr::tibble(
  cohort_definition_id = 1L,
  subject_id = 1L,
  cohort_start_date = as.Date("2020-01-01"),
  cohort_end_date = as.Date("2020-01-10")
)
cdm <- cdmFromTables(
  tables = list(
    "person" = person,
    "observation_period" = observation_period,
    "diabetes" = diabetes
  ),
  cdmName = "example_cdm"
)
cdm$diabetes <- newCohortTable(cdm$diabetes)
cdm$diabetes
#> # A tibble: 1 × 4
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                  <int>      <int> <date>            <date>
#> 1                    1          1 2020-01-01        2020-01-10
settings(cdm$diabetes)
#> # A tibble: 1 × 2
#>   cohort_definition_id cohort_name
#>                  <int> <chr>
#> 1                    1 cohort_1
attrition(cdm$diabetes)
#> # A tibble: 1 × 7
#>   cohort_definition_id number_records number_subjects reason_id reason
#>                  <int>          <int>           <int>     <int> <chr>
#> 1                    1              1               1         1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
cohortCount(cdm$diabetes)
#> # A tibble: 1 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1              1               1
Summarised result
A summarised result provides a standard format for the results of an analysis performed against data mapped to the OMOP CDM.
For example, this format is used when we get a summary of the cdm as a whole:
summary(cdm) |> dplyr::glimpse()
#> Rows: 13
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "example_cdm", "example_cdm", "example_cdm", "example…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "snapshot_date", "person_count", "observation_period_…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ estimate_name    <chr> "value", "count", "count", "source_name", "version", …
#> $ estimate_type    <chr> "date", "integer", "integer", "character", "character…
#> $ estimate_value   <chr> "2025-10-12", "1", "1", "", NA, "5.3", "", "", "", ""…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
It is also used when we summarise a cohort:
summary(cdm$diabetes) |> dplyr::glimpse()
#> `cohort_definition_id` casted to character.
#> `cohort_definition_id` casted to character.
#> Rows: 6
#> Columns: 13
#> $ result_id        <int> 1, 1, 2, 2, 2, 2
#> $ cdm_name         <chr> "example_cdm", "example_cdm", "example_cdm", "example…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort_1", "cohort_1", "cohort_1", "cohort_1", "coho…
#> $ strata_name      <chr> "overall", "overall", "reason", "reason", "reason", "…
#> $ strata_level     <chr> "overall", "overall", "Initial qualifying events", "I…
#> $ variable_name    <chr> "number_records", "number_subjects", "number_records"…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count"
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "1", "1", "1", "1", "0", "0"
#> $ additional_name  <chr> "overall", "overall", "reason_id", "reason_id", "reas…
#> $ additional_level <chr> "overall", "overall", "1", "1", "1", "1"
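Because every summarised result shares the same thirteen columns, downstream code can check for the format before using a result. The sketch below assumes a small in-memory cdm like the one used above. The object names (result, missing_cols) are illustrative.

```r
library(omopgenerics)

# Minimal in-memory cdm (illustrative data)
person <- dplyr::tibble(
  person_id = 1L,
  gender_concept_id = 0L,
  year_of_birth = 1990L,
  race_concept_id = 0L,
  ethnicity_concept_id = 0L
)
observation_period <- dplyr::tibble(
  observation_period_id = 1L,
  person_id = 1L,
  observation_period_start_date = as.Date("2000-01-01"),
  observation_period_end_date = as.Date("2023-12-31"),
  period_type_concept_id = 0L
)
cdm <- cdmFromTables(
  tables = list("person" = person, "observation_period" = observation_period),
  cdmName = "example_cdm"
)

result <- summary(cdm)

# The object carries the summarised_result class
inherits(result, "summarised_result")

# Standard columns that are absent from the result (expected to be none)
missing_cols <- setdiff(resultColumns(table = "summarised_result"), colnames(result))
missing_cols

# Each result_id links the rows to one row of settings
settings(result)
```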