R package for working with medical coding schema and identifying records with specific comorbidities


dewittpe/medicalcoder


Project Status: Active – The project has reached a stable, usable state and is being actively developed.

medicalcoder is a lightweight, base-R package for working with ICD-9 and ICD-10 diagnosis and procedure codes. It provides fast, dependency-free tools to look up, validate, and manipulate ICD codes. It also implements widely used comorbidity algorithms such as Charlson, Elixhauser, and the Pediatric Complex Chronic Conditions (PCCC). The package is designed for portability and reproducibility, so it avoids external dependencies and requires only R ≥ 3.5.0. It still offers a rich set of curated ICD code libraries from the United States' Centers for Medicare and Medicaid Services (CMS), the Centers for Disease Control and Prevention (CDC), and the World Health Organization (WHO).

The package balances performance with elegance. Its internal caching, efficient joins, and compact data structures make it practical for large-scale health data analyses, and its clean design makes it easy to extend or audit. Whether you need to flag comorbidities, explore ICD hierarchies, or standardize clinical coding workflows, medicalcoder provides a robust, transparent foundation for research and applied work in biomedical informatics.

The primary objectives of medicalcoder are:

  1. Fully self-contained

    • Minimal Dependencies

      • No dependencies other than base R.
      • Requires R version ≥ 3.5.0 due to a change in data serialization. R 3.5.0 was released in April 2018. The initial public release of medicalcoder was in 2025.
      • Several packages are listed in the Suggests section of the DESCRIPTION file. These are only needed for building vignettes, other documentation, and testing. They are not required to install the package.
    • No Imports

      • medicalcoder does not import any non-base namespaces. This improves ease of maintenance and usability.
      • Suggested packages are needed only for development work and buildingvignettes. They are not required for installation or use.
    • That said, there are non-trivial performance gains when passing a data.table to the comorbidities() function. Passing a tibble is typically faster than passing a base data.frame but slower than passing a data.table. (See Benchmarking.)

    • Internal lookup tables

      • All required data are included in the package. If you have the .tar.gz source file and R ≥ 3.5.0, that is all you need to install and use the package.
  2. Efficient implementation of multiple comorbidity algorithms

    • Implements three general algorithms, each with multiple variants. Details are provided below.
    • Supports flagging of subconditions within PCCC.
    • Supports longitudinal flagging of comorbidities. medicalcoder flags comorbidities based on present-on-admission indicators for the current encounter. It can also look back in time for a patient and flag a comorbidity that was reported in a prior encounter. See examples.
  3. Tools for working with ICD codes

    • Lookup tables.
    • Ability to work with both full codes (ICD codes with decimal points) and compact codes (ICD codes with decimal points omitted).
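The base-R sketch below gives a rough illustration of the full and compact forms by converting ICD-10-CM diagnosis codes between them. It is not the package's icd_compact_to_full(). The helper names are hypothetical. The sketch assumes ICD-10-CM diagnosis codes only, where the decimal point follows the third character. ICD-9 codes place the decimal point differently (for example, E codes and procedure codes).

```r
# Hypothetical helpers for ICD-10-CM diagnosis codes only (an assumption):
# the decimal point, when present, follows the third character.
icd10cm_full_to_compact <- function(x) {
  gsub(".", "", x, fixed = TRUE)  # drop the literal "."
}

icd10cm_compact_to_full <- function(x) {
  # Three-character codes (e.g., "Z94") have no decimal point
  ifelse(nchar(x) > 3L,
         paste0(substr(x, 1L, 3L), ".", substring(x, 4L)),
         x)
}

icd10cm_compact_to_full(c("C001", "Z94", "V877XXA"))
# "C00.1" "Z94" "V87.7XXA"
icd10cm_full_to_compact("C00.1")
# "C001"
```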

Why use medicalcoder

There are several tools for working with ICD codes and comorbidity algorithms. medicalcoder provides novel features:

  • Unified access to multiple comorbidity algorithms through a single function: comorbidities().
  • Support for both ICD-9 and ICD-10 diagnostic and procedure codes.
  • Longitudinal patient-level comorbidity flagging using present-on-admission indicators.
  • Fully self-contained package (no external dependencies).

Install

CRAN

install.packages("medicalcoder")

From GitHub

remotes::install_github("dewittpe/medicalcoder")

From source

If you have the .tar.gz file for version X.Y.Z, e.g., medicalcoder_X.Y.Z.tar.gz, you can install from within R via:

install.packages(
  pkgs = "medicalcoder_X.Y.Z.tar.gz", # replace file name with the file you have
  repos = NULL,
  type = "source"
)

From the command line:

R CMD INSTALL medicalcoder_X.Y.Z.tar.gz

Quick Start:

Example Data

Input data for comorbidities() is expected to be in a 'long' format. Each row is one code, with additional columns for patient and/or encounter id. There are two example data sets in the package: mdcr and mdcr_longitudinal.

data(mdcr,mdcr_longitudinal,package="medicalcoder")

The mdcr data set consists of 319 856 rows. Each row contains one ICD code (code). The column icdv denotes each code as ICD-9 or ICD-10, and the dx column denotes a diagnostic (1) or procedure (0) code. This data set contains diagnostic and procedure codes for 38 262 patients.

str(mdcr)
#> 'data.frame': 319856 obs. of  4 variables:
#>  $ patid: int  71412 71412 71412 71412 71412 17087 64424 64424 84361 84361 ...
#>  $ icdv : int  9 9 9 9 9 10 9 9 9 9 ...
#>  $ code : chr  "99931" "75169" "99591" "V5865" ...
#>  $ dx   : int  1 1 1 1 1 1 1 0 1 1 ...
head(mdcr)
#>   patid icdv  code dx
#> 1 71412    9 99931  1
#> 2 71412    9 75169  1
#> 3 71412    9 99591  1
#> 4 71412    9 V5865  1
#> 5 71412    9  V427  1
#> 6 17087   10  V441  1
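comorbidities() expects one code per row. Some data come in a wide layout instead, with one row per patient and one column per code slot. A base-R reshape along the lines below produces the long layout. This is a sketch only. The input data and the code1/code2 column names are hypothetical, and the output columns simply mirror those of mdcr.

```r
# Hypothetical wide-format data: one row per patient, one column per code slot
wide <- data.frame(
  patid = c(1L, 2L),
  code1 = c("E119", "J45909"),
  code2 = c("I10", NA),
  stringsAsFactors = FALSE
)

# Stack every codeN column into a single 'code' column (one row per code)
slots <- grep("^code[0-9]+$", names(wide), value = TRUE)
long <- do.call(rbind, lapply(slots, function(s) {
  data.frame(patid = wide$patid, code = wide[[s]], stringsAsFactors = FALSE)
}))

# Drop empty slots, then add the ICD version and the dx/procedure indicator
long <- long[!is.na(long$code), ]
long$icdv <- 10L  # assumption: every code is ICD-10
long$dx   <- 1L   # assumption: every code is a diagnosis code
long <- long[order(long$patid), ]
rownames(long) <- NULL
long
```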

The mdcr_longitudinal data set is distinct from the mdcr data set. The major difference is that this data set contains only diagnostic codes and there are only 3 patients. The date column denotes the date of the diagnosis and allows us to look at changes in comorbidities over time.

str(mdcr_longitudinal)
#> 'data.frame': 60 obs. of  4 variables:
#>  $ patid: int  9663901 9663901 9663901 9663901 9663901 9663901 9663901 9663901 9663901 9663901 ...
#>  $ date : IDate, format: "2016-03-18" "2016-03-24" ...
#>  $ icdv : int  10 10 10 10 10 10 10 10 10 10 ...
#>  $ code : chr  "Z77.22" "IMO0002" "V87.7XXA" "J95.851" ...
head(mdcr_longitudinal)
#>     patid       date icdv     code
#> 1 9663901 2016-03-18   10   Z77.22
#> 2 9663901 2016-03-24   10  IMO0002
#> 3 9663901 2016-03-24   10 V87.7XXA
#> 4 9663901 2016-03-25   10  J95.851
#> 5 9663901 2016-03-30   10  IMO0002
#> 6 9663901 2016-03-30   10    Z93.0
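The base-R sketch below contrasts two approaches: flagging a condition only from the current encounter, and carrying a flag forward once it has appeared in an earlier encounter. This roughly corresponds to the "current" versus "cumulative" flag methods. It is a conceptual illustration under simplifying assumptions, not the package's implementation. The per-encounter flag column is hypothetical, and present-on-admission handling is ignored.

```r
# Hypothetical per-encounter indicator for one condition (not package output)
enc <- data.frame(
  patid = c(1L, 1L, 1L, 2L, 2L),
  date  = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01",
                    "2016-01-15", "2016-02-15")),
  flag  = c(0L, 1L, 0L, 0L, 0L)
)
enc <- enc[order(enc$patid, enc$date), ]

# Current: each encounter reflects only its own codes
enc$current <- enc$flag

# Cumulative: once flagged, a patient stays flagged at later encounters
enc$cumulative <- ave(enc$flag, enc$patid, FUN = cummax)
enc
```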

Comorbidity Algorithms

There are three comorbidity methods, each with several variants, available in medicalcoder. All are accessible through the comorbidities() function by specifying the method argument.

General examples and explanations of when conditions are flagged are in the vignette:

vignette(topic="comorbidities",package="medicalcoder")

Pediatric Complex Chronic Conditions (PCCC)

  • Version 2.0

  • Version 2.1

    • Updated code base with the same assessment algorithm as version 2.0.
  • Version 3.0

  • Version 3.1

    • Updated code base with the same assessment algorithm as version 3.0.
  • All variants can flag conditions and subconditions.

# PCCC v2.1 and v3.1 examples
library(medicalcoder)

cmrbs2 <-
  comorbidities(
    data = mdcr,
    id.vars = "patid",   # can use more than one column, e.g., site, patient, encounter
    icd.codes = "code",
    dx.var = "dx",
    poa = 1,             # consider all codes to be present on admission
    method = "pccc_v2.1"
  )

cmrbs3 <-
  comorbidities(
    data = mdcr,
    id.vars = "patid",
    icd.codes = "code",
    dx.var = "dx",
    poa = 1,             # consider all codes to be present on admission
    method = "pccc_v3.1"
  )

str(cmrbs2, max.level = 0)
#> Classes 'medicalcoder_comorbidities' and 'data.frame': 38262 obs. of  16 variables:
#>  - attr(*, "method")= chr "pccc_v2.1"
#>  - attr(*, "id.vars")= chr "patid"
#>  - attr(*, "flag.method")= chr "current"
str(cmrbs3, max.level = 0)
#> Classes 'medicalcoder_comorbidities' and 'data.frame': 38262 obs. of  49 variables:
#>  - attr(*, "method")= chr "pccc_v3.1"
#>  - attr(*, "id.vars")= chr "patid"
#>  - attr(*, "flag.method")= chr "current"

A summary of the flagged conditions is generated with a call to summary().

s2 <- summary(cmrbs2)
str(s2)

For pccc_v2.0 and pccc_v2.1, the data.frame returned by summary() reports the count (unique id.vars with the condition) and percentage.
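The count and percentage described above can be sketched in base R as follows. The flag table and the condition column names are hypothetical. The actual summary() output has its own column names and layout.

```r
# Hypothetical flag table: one row per patient, one 0/1 column per condition
flags <- data.frame(
  patid = 1:5,
  cvd   = c(1L, 0L, 1L, 0L, 0L),
  resp  = c(0L, 0L, 1L, 0L, 0L)
)
conds <- c("cvd", "resp")
n_ids <- length(unique(flags$patid))

# Count distinct ids with each condition, then express as a percentage
smry <- data.frame(
  condition = conds,
  count = vapply(conds, function(cnd) {
    length(unique(flags$patid[flags[[cnd]] == 1L]))
  }, integer(1), USE.NAMES = FALSE)
)
smry$percent <- 100 * smry$count / n_ids
smry
```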

s3 <- summary(cmrbs3)
str(s3)

For pccc_v3.0 and pccc_v3.1, the returned data.frame reports counts and percentages for how the condition was flagged: by diagnosis/procedure codes only, by technology-dependent codes only, or by both. The dxpr_or_tech columns answer the question "did this patient have the condition?"
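The breakdown by source can be illustrated with two hypothetical 0/1 indicators for a single condition. Only dxpr_or_tech is named in the text above. The other category labels here are made up for the example.

```r
# Hypothetical per-patient indicators for one condition
dxpr <- c(1L, 0L, 1L, 0L)  # flagged by diagnosis/procedure codes
tech <- c(0L, 1L, 1L, 0L)  # flagged by technology-dependence codes

# Mutually exclusive categories describing how the flag arose
category <- ifelse(dxpr == 1L & tech == 1L, "both",
            ifelse(dxpr == 1L, "dxpr_only",
            ifelse(tech == 1L, "tech_only", "none")))

# "Did this patient have the condition?" from either source
dxpr_or_tech <- as.integer(dxpr == 1L | tech == 1L)

table(factor(category, levels = c("dxpr_only", "tech_only", "both", "none")))
```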

Further detail, examples, and explanations are in the vignette.

vignette(topic="pccc",package="medicalcoder")

Charlson Comorbidities

There are four variants of Charlson comorbidities implemented in medicalcoder:

# Charlson example
cmrbs <-
  comorbidities(
    data = mdcr,
    id.vars = "patid",
    icd.codes = "code",
    dx.var = "dx",
    poa = 1,         # assume all codes are present on admission
    primarydx = 0L,  # assume all codes are secondary diagnosis codes
    method = "charlson_quan2005"
  )

A summary of the flagged conditions can be generated by calling summary(). For the PCCC methods the summary is a data.frame. For the Charlson comorbidities it is a list of data frames summarizing the conditions, the age category, and the index score.

s <- summary(cmrbs)
str(s, max.level = 1)
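An index score of this kind is, at its core, a weighted sum of the flagged conditions. The sketch below uses made-up condition names and weights, not the published Charlson weights. It also omits the hierarchy rules that Charlson implementations typically apply (for example, not counting both a mild and a severe form of the same disease) and any age component.

```r
# Hypothetical 0/1 condition flags: rows are patients, columns are conditions
flags <- matrix(c(1L, 0L, 1L,
                  0L, 0L, 0L,
                  1L, 1L, 0L),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("cond_a", "cond_b", "cond_c")))

# Hypothetical weights; NOT the published Charlson weights
weights <- c(cond_a = 1, cond_b = 2, cond_c = 3)

# Index score per patient: sum of the weights of the flagged conditions
index_score <- as.vector(flags %*% weights[colnames(flags)])
index_score
# 4 0 3
```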

More details and examples are provided in the vignette:

vignette(topic="charlson",package="medicalcoder")

Elixhauser Comorbidities

# Elixhauser example
cmrbs <-
  comorbidities(
    data = mdcr,
    id.vars = "patid",
    icd.codes = "code",
    dx.var = "dx",
    poa = 1,
    primarydx = 0L,
    method = "elixhauser_ahrq_icd10"
  )

The summary of the results from method = "elixhauser_ahrq_icd10" is similar to that for Charlson. It consists of a data.frame with the counts and percentages of distinct data[id.vars] with the noted condition, and a summary of the index scores.

s <- summary(cmrbs)
str(s, max.level = 1)

More details and examples are provided in the vignette:

vignette(topic="elixhauser",package="medicalcoder")

ICD

The package contains internal data sets with references for US-based ICD-9 and ICD-10 diagnostic and procedure codes. These codes are supplemented with additional codes from the World Health Organization.

You can get a table of ICD codes via get_icd_codes().

str(medicalcoder::get_icd_codes())
#> 'data.frame': 249736 obs. of  9 variables:
#>  $ icdv            : int  9 9 9 9 9 9 9 9 9 9 ...
#>  $ dx              : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ full_code       : chr  "00" "00" "00.0" "00.0" ...
#>  $ code            : chr  "00" "00" "000" "000" ...
#>  $ src             : chr  "cdc" "cms" "cdc" "cms" ...
#>  $ known_start     : int  2003 2006 2003 2006 2003 2006 2003 2006 2003 2006 ...
#>  $ known_end       : int  2012 2015 2012 2015 2012 2015 2012 2015 2012 2015 ...
#>  $ assignable_start: int  NA NA NA NA 2003 2006 2003 2006 2003 2006 ...
#>  $ assignable_end  : int  NA NA NA NA 2012 2015 2012 2015 2012 2015 ...

The columns are:

  • icdv: integer value 9 or 10; for ICD-9 or ICD-10

  • dx: integer 0 or 1; 0 = procedure code, 1 = diagnostic code

  • full_code: character string for the ICD code with any appropriate decimal point.

  • code: character string for the compact ICD code, that is, the ICD code without any decimal point. For example, the full code C00.1 has the compact form C001.

  • src: character string denoting the source of the ICD code information.

    • cms: The ICD-9-CM, ICD-9-PCS, ICD-10-CM, or ICD-10-PCS codes curated by the Centers for Medicare and Medicaid Services (CMS).
    • cdc: CDC mortality coding.
    • who: World Health Organization.
  • known_start: The earliest (fiscal) year for which source data for the code are available in the source code for medicalcoder. Codes from CMS use the United States fiscal year. Codes from the CDC and WHO use the calendar year. The United States fiscal year starts October 1 and concludes September 30. For example, fiscal year 2013 started October 1, 2012 and concluded September 30, 2013.

    Note that the year refers to the data within medicalcoder. ICD-9-CM codes went into effect for fiscal year 1980, but the source code only has documented source files for the codes dating back to 1997.

  • known_end: The latest (fiscal) year when the code was part of the ICD system and/or known within the medicalcoder lookup tables.

  • Assignable codes. Some codes are header codes. For example, the ICD-10-CM three-digit code Z94 is a header code because the four-digit codes Z94.0, Z94.1, Z94.2, Z94.3, Z94.4, Z94.5, Z94.6, Z94.7, Z94.8, and Z94.9 exist. All of these except Z94.8 are assignable codes, because no five-digit codes share their first four digits. Z94.8 is a header code because the five-digit codes Z94.81, Z94.82, Z94.83, Z94.84, and Z94.89 exist.

    • assignable_start: Earliest (fiscal) year when the code was assignable.
    • assignable_end: Latest (fiscal) year when the code was assignable.
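The fiscal-year convention used for CMS codes can be checked with a small base-R helper. The function name is hypothetical. It only encodes the rule stated above: fiscal year N runs from October 1 of year N - 1 through September 30 of year N.

```r
# Hypothetical helper: US federal fiscal year for a date.
# Dates from October onward belong to the next year's fiscal year.
us_fiscal_year <- function(date) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  yr + (mo >= 10L)
}

us_fiscal_year(c("2012-09-30", "2012-10-01", "2013-09-30"))
# 2012 2013 2013
```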
For example, the CMS entries for the Z94 codes show which are header codes and which are assignable:

subset(
  x = lookup_icd_codes("^Z94", regex = TRUE, full.codes = TRUE, compact.codes = FALSE),
  subset = src == "cms",
  select = c("full_code", "known_start", "known_end", "assignable_start", "assignable_end")
)
#>    full_code known_start known_end assignable_start assignable_end
#> 1        Z94        2014      2026               NA             NA
#> 5      Z94.0        2014      2026             2014           2026
#> 9      Z94.1        2014      2026             2014           2026
#> 14     Z94.2        2014      2026             2014           2026
#> 17     Z94.3        2014      2026             2014           2026
#> 22     Z94.4        2014      2026             2014           2026
#> 25     Z94.5        2014      2026             2014           2026
#> 29     Z94.6        2014      2026             2014           2026
#> 33     Z94.7        2014      2026             2014           2026
#> 38     Z94.8        2014      2026               NA             NA
#> 41    Z94.81        2014      2026             2014           2026
#> 42    Z94.82        2014      2026             2014           2026
#> 43    Z94.83        2014      2026             2014           2026
#> 44    Z94.84        2014      2026             2014           2026
#> 45    Z94.89        2014      2026             2014           2026
#> 46     Z94.9        2014      2026             2014           2026
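For a single release, the distinction between header and assignable codes can be approximated from the code list alone: a code is a header if some longer code begins with it. This sketch works on a subset of compact codes and ignores the year columns. The package tracks assignability by year through assignable_start and assignable_end.

```r
# A subset of the compact ICD-10-CM Z94 codes listed above
codes <- c("Z94", "Z940", "Z941", "Z948", "Z9481", "Z9489", "Z949")

# A code is a header if any strictly longer code starts with it
is_header <- vapply(codes, function(cd) {
  any(startsWith(codes, cd) & nchar(codes) > nchar(cd))
}, logical(1))

assignable <- codes[!is_header]
assignable
# "Z940" "Z941" "Z9481" "Z9489" "Z949"
```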

Additionally, the get_icd_codes() function can provide descriptions and the ICD hierarchy through the with.descriptions and/or with.hierarchy arguments.

The functions lookup_icd_codes(), is_icd(), and icd_compact_to_full() are also provided for working with ICD codes.
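For contrast with those lookup-based tools, a purely syntactic check of the shape of an ICD-10-CM full code can be written as a regular expression. The helper below is hypothetical and tests only the format: a letter, a digit, an alphanumeric character, then optionally a decimal point and up to four alphanumeric characters. It cannot tell whether a well-formed string is a real code. That requires a lookup table such as the one returned by get_icd_codes().

```r
# Hypothetical helper: does each string have the shape of an ICD-10-CM
# full diagnosis code? Format only; existence is NOT checked.
looks_like_icd10cm_full <- function(x) {
  grepl("^[A-Z][0-9][0-9A-Z](\\.[0-9A-Z]{1,4})?$", x)
}

# Codes taken from mdcr_longitudinal; "IMO0002" is not an ICD code
looks_like_icd10cm_full(c("Z77.22", "V87.7XXA", "J95.851", "Z93.0", "IMO0002"))
# TRUE TRUE TRUE TRUE FALSE
```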

More details and examples are in the vignette:

vignette(topic="icd",package="medicalcoder")

Benchmarking

The major factors affecting the expected computation time for applying a comorbidity algorithm to a data set are:

  1. Data size: number of subjects/encounters.
  2. Data storage class: medicalcoder is built so that no imports of other namespaces are required. That said, when a data.table is passed to comorbidities() and the data.table namespace is available, S3 dispatch for merge (and some other methods) is used to reduce memory use and computation time. When a tibble is passed and the tidyverse namespaces are available, the tibble-aware paths improve performance over a base data.frame, but data.table remains fastest.
  3. flag.method: "current" will take less time than the "cumulative" method.
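The S3 dispatch described in item 2 can be seen with a toy class. The class my_frame and its method are hypothetical. The example shows that merge() dispatches on the class of its first argument. This mechanism lets a data.table input reach data.table's own merge method without medicalcoder importing data.table.

```r
# Hypothetical S3 method for a made-up class; merge() dispatches on class(x)
merge.my_frame <- function(x, y, ...) {
  message("merge.my_frame() dispatched")
  NextMethod()  # fall through to merge.data.frame()
}

x <- structure(data.frame(id = 1:3, a = c("p", "q", "r")),
               class = c("my_frame", "data.frame"))
y <- data.frame(id = c(2L, 3L), b = c(TRUE, FALSE))

m <- merge(x, y, by = "id")  # prints the message, then joins as usual
m
```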

Details on the benchmarking method, summary graphics, and tables can be found in the benchmarking directory of the medicalcoder GitHub repository.

Testing

In addition to the GitHub Actions and testing on current versions of R, the testing directory in the medicalcoder GitHub repository reports the R CMD check results for all R versions from 3.5.0 to the latest. Several of these versions are checked both with and without the Suggests packages installed.
