qualitycontrol

The goal of qualitycontrol is to provide a data quality control framework.

Installation

You can install qualitycontrol from GitHub with:

# install.packages("devtools")
devtools::install_github("luisgarcez11/qualitycontrol")

Data

The als_data dataset will be used to guide you through the package functionality. This data is not real, but is based on data retrieved from Amyotrophic Lateral Sclerosis patients.

library(qualitycontrol)
als_data
##    subjid p1 p2 p3 p4 p5 p6 p7 p8 p9 x1r x2r x3r age_at_baseline age_at_onset
## 1       1  4  1  1  3  4  3  4  3  4   2   2   1              51           46
## 2       2  4  4  4  1  1  3  3  1  4   1   2   4              82           77
## 3       3  2  3  1  4  3  1  3  1  1   4   3   1              85           80
## 4       4  3  2  1  1  4  1  3  2  4   4   3   3              77           72
## 5       5  3  2  1  3  3  4  4  3  4   1   4   2              85           80
## 6       6  2  2  1  4  1  4  4  3  1   3   5   2              73           68
## 7       7  1  4  2  4  3  3  2  3  4   1   2   2              65           60
## 8       8  2  2  4  4  3  2  1  2  3   3   1   1              50           62
## 9       9  3  1  1  4  4  2  4  1  1   2   2   4              65           46
## 10     10  3  4  1  4  3  2  3  2  1   4   3   1              81           76
## 11     11  1  3  1  3  3  4  1 NA  3   3   2   4              51           46
## 12     12  1  4  3  2  3  2  2 NA  1   3   2   3              50           45
## 13     13  1  1  4  1  1  3  4 NA  2   2   3   1              82           77
## 14     14  3  2  2  4  3  3  3  3  2   3   4   1              76           71
## 15     15  3  4  2  2  2  3  1  3  4   4   1   4              87          376
## 16     16  3  3  2  4  3  3  1  1  2   2   4   1              50           45
## 17     17  3  2  3  1  4  1  3  2  1   4   4   2              85           80
## 18     18  4  1  3  1  3  1  3  2  2   4   3   4              57           52
## 19     19  1  3  3  2  2  2  3  2  3   2   3   2              74           69
## 20     20  2  2  4  2  3  4  2  4  1   4   1   3              59           54
## 21     21  2  3  3  2  3  2  4  4  1   1   3   3              79           74
## 22     22  4  3  1  1  3  4  2  1  4   1   2   3              53           48
## 23     23  3  3  4  3  4  1  3  4  3   2   2   2              45           40
## 24     24  4  1  1  2  4  2  4  4  4   4   2   1              72           67
## 25     25  4  3  1  3  3  4  3  2  3   3   4   2              77           72
## 26     26  2  1  1  2  4  2  4  1  2   3   2   4              65           60
## 27     27  1  1  1  1  1  1  3  3  2   2   1   1              54           49
## 28     28  3  1  1  3  1  4  1  2  2   2   3   4              50          -23
## 29     29  2  3  1  3  1  4  4  1  3   2   4   1              85           80
## 30     30  3  1  2  1  3  1  2  4  1   1   2   4              85           80
## 31     30  3  3  1  4  2  2  1  4  3   3   1   3              53           48
##          onset baseline_date death_date
## 1       bulbar    2003-03-26 2010-10-18
## 2        bulba    2003-07-03 2019-06-24
## 3       spinal    2007-01-27 9999-12-30
## 4       bulbar    2010-11-27 2018-01-04
## 5       bulbar    2006-10-25 2017-10-13
## 6       spinal    2007-04-30 2010-05-08
## 7       spinal    2002-11-15 2019-04-06
## 8       spinal    2002-12-13 2018-05-04
## 9       spinal    2005-06-02 2013-08-11
## 10      bulbar    2004-06-02 2016-05-20
## 11      bulbar    2007-03-09 2016-09-26
## 12      bulbar    2005-01-11 2010-06-20
## 13      bulbar    2010-12-22 2019-07-05
## 14      bulbar    2008-10-14 2013-08-14
## 15      spinal    2005-09-15 2010-07-20
## 16      spinal    2007-07-05 2010-08-28
## 17 respiratory    2002-08-19 2011-10-17
## 18      spinal    2002-06-30 2020-12-17
## 19 respiratory    2010-07-18 2016-05-15
## 20      spinal    2004-08-15 2015-03-15
## 21      bulbar    2006-04-07 2013-03-16
## 22      bulbar    2002-06-01 2016-06-21
## 23      bulbar    2007-08-12 2017-04-01
## 24      bulbar    2006-08-12 2002-12-02
## 25 respiratory    2006-08-11 2016-03-03
## 26      spinal    2005-01-04 2011-10-05
## 27 respiratory    2009-08-25 2015-03-11
## 28      bulbar    2002-05-11 2017-11-09
## 29      bulbar    2004-07-27 2014-03-27
## 30      bulbar    2005-11-11 2015-05-30
## 31      bulbar    2008-02-27 2014-07-05

QC mapping

The als_data_qc_mapping is an R list which contains 3 tables specifying all the tests used for quality control. You can specify your own tests by creating an Excel file and then reading it with the function read_qc_mapping.

Missing

als_data_qc_mapping$missing
## # A tibble: 13 × 3
##    qc_type    variable type
##    <chr>      <chr>    <chr>
##  1 duplicated subjid   text
##  2 missing    p1       numeric
##  3 missing    p2       numeric
##  4 missing    p3       numeric
##  5 missing    p4       numeric
##  6 missing    p5       numeric
##  7 missing    p6       numeric
##  8 missing    p7       numeric
##  9 missing    p8       numeric
## 10 missing    p9       numeric
## 11 missing    x1r      numeric
## 12 missing    x2r      numeric
## 13 missing    x3r      numeric
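
To make the two test types in this table concrete, here is a minimal base R sketch of what a "missing" and a "duplicated" check amount to. It is not the package's implementation: the toy data frame and its values are made up for illustration, and only the column names are borrowed from als_data.

```r
# Toy data (hypothetical values); column names borrowed from als_data.
toy <- data.frame(
  subjid = c(1, 2, 3, 3),
  p8     = c(3, NA, 2, 4)
)

# "missing" test: row indices where the variable is NA.
missing_rows <- which(is.na(toy$p8))

# "duplicated" test: every row whose subjid occurs more than once.
# fromLast = TRUE is needed so the first occurrence is flagged too.
dup_rows <- which(duplicated(toy$subjid) |
                  duplicated(toy$subjid, fromLast = TRUE))

print(missing_rows)
print(dup_rows)
```

Flagging both copies of a duplicated identifier matches the qc_data output shown later, where both rows with subjid 30 are reported.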

Inconsistencies

als_data_qc_mapping$inconsistencies
## # A tibble: 2 × 6
##   qc_type             variable1       type1   relation     variable2    type2
##   <chr>               <chr>           <chr>   <chr>        <chr>        <chr>
## 1 inconsistent_values age_at_baseline numeric greater_than age_at_onset numeric
## 2 inconsistent_values baseline_date   date    lower_than   death_date   date
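
Each row of this table states a relation that should hold between two variables. The base R sketch below shows one way such a rule could be evaluated. It is an illustration only: relation_ok is a hypothetical helper, the toy values are made up, and treating greater_than and lower_than as strict comparisons is an assumption.

```r
# Toy data (hypothetical values); column names borrowed from als_data.
toy <- data.frame(
  subjid          = c(1, 2, 3),
  age_at_baseline = c(51, 50, 87),
  age_at_onset    = c(46, 62, 376),
  baseline_date   = as.Date(c("2003-03-26", "2002-12-13", "2006-08-12")),
  death_date      = as.Date(c("2010-10-18", "2018-05-04", "2002-12-02"))
)

# Hypothetical helper: TRUE where the expected relation holds.
# Strict comparison is an assumption, not confirmed by the package.
relation_ok <- function(x, relation, y) {
  switch(relation,
    greater_than = x > y,
    lower_than   = x < y,
    stop("unknown relation: ", relation)
  )
}

# Subjects violating "age_at_baseline greater_than age_at_onset".
bad_age <- toy$subjid[!relation_ok(toy$age_at_baseline, "greater_than",
                                   toy$age_at_onset)]

# Subjects violating "baseline_date lower_than death_date".
bad_date <- toy$subjid[!relation_ok(toy$baseline_date, "lower_than",
                                    toy$death_date)]

print(bad_age)
print(bad_date)
```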

Out of range values

als_data_qc_mapping$range
## # A tibble: 16 × 6
##    qc_type variable        type        lower_value upper_value categories
##    <chr>   <chr>           <chr>       <chr>       <chr>       <chr>
##  1 range   p1              numeric     1           4           <NA>
##  2 range   p2              numeric     1           4           <NA>
##  3 range   p3              numeric     1           4           <NA>
##  4 range   p4              numeric     1           4           <NA>
##  5 range   p5              numeric     1           4           <NA>
##  6 range   p6              numeric     1           4           <NA>
##  7 range   p7              numeric     1           4           <NA>
##  8 range   p8              numeric     1           4           <NA>
##  9 range   p9              numeric     1           4           <NA>
## 10 range   x1r             numeric     1           4           <NA>
## 11 range   x2r             numeric     1           4           <NA>
## 12 range   x3r             numeric     1           4           <NA>
## 13 range   age_at_baseline numeric     20          100         <NA>
## 14 range   age_at_onset    numeric     20          100         <NA>
## 15 range   death_date      date        2000-01-01  2022-01-01  <NA>
## 16 range   onset           categorical <NA>        <NA>        bulbar, respirat…
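
The range table stores bounds and categories as character strings, so a check has to convert them before comparing. The base R sketch below illustrates the idea and is not the package's code. The toy values are made up, and the category string is hypothetical because the categories column is truncated in the printed output above. Inclusive bounds are assumed, which is consistent with the values 1 and 4 passing the 1 to 4 range in als_data.

```r
# Toy data (hypothetical values); column names borrowed from als_data.
toy <- data.frame(
  subjid       = c(1, 2, 3, 4),
  age_at_onset = c(46, 376, -23, 60),
  onset        = c("bulbar", "bulba", "spinal", "respiratory"),
  stringsAsFactors = FALSE
)

# Bounds arrive as text in the mapping, so convert them first.
lower <- as.numeric("20")
upper <- as.numeric("100")

# Hypothetical category cell: the real one is truncated in the printout.
categories_cell <- "bulbar, respiratory, spinal"
allowed <- strsplit(categories_cell, ",\\s*")[[1]]

# Numeric range test with inclusive bounds (assumption).
out_of_range <- toy$subjid[toy$age_at_onset < lower |
                           toy$age_at_onset > upper]

# Categorical test: value must be one of the allowed categories.
bad_category <- toy$subjid[!(toy$onset %in% allowed)]

print(out_of_range)
print(bad_category)
```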

qc_data function

qc_data takes as arguments the data to be quality controlled and the QC mapping containing the tests to be applied.

qc_data(als_data, als_data_qc_mapping)[,c("subjid","age_at_onset","onset","baseline_date","death_date","finding")]
## # A tibble: 13 × 6
##    subjid age_at_onset onset  baseline_date death_date finding
##    <chr>  <chr>        <chr>  <chr>         <chr>      <chr>
##  1 30     80           bulbar 2005-11-11    2015-05-30 subjid variable is dupli…
##  2 30     48           bulbar 2008-02-27    2014-07-05 subjid variable is dupli…
##  3 11     46           bulbar 2007-03-09    2016-09-26 variable p8 is missing
##  4 12     45           bulbar 2005-01-11    2010-06-20 variable p8 is missing
##  5 13     77           bulbar 2010-12-22    2019-07-05 variable p8 is missing
##  6 6      68           spinal 2007-04-30    2010-05-08 variable x2r is out of r…
##  7 15     376          spinal 2005-09-15    2010-07-20 variable age_at_onset is…
##  8 28     -23          bulbar 2002-05-11    2017-11-09 variable age_at_onset is…
##  9 3      80           spinal 2007-01-27    9999-12-30 variable death_date is o…
## 10 2      77           bulba  2003-07-03    2019-06-24 variable onset is not a …
## 11 8      62           spinal 2002-12-13    2018-05-04 variables age_at_baselin…
## 12 15     376          spinal 2005-09-15    2010-07-20 variables age_at_baselin…
## 13 24     67           bulbar 2006-08-12    2002-12-02 variables baseline_date …

This returns a table with all the findings, one row per finding, so a subject that fails several tests appears more than once. If you want to save it, you can specify the destination path in output_file.

