Formatting Data Exported from REDCap

This vignette covers different methods for formatting the records from REDCap into an analysis-ready data set. It is assumed that the reader is familiar with the process for exporting data from REDCap to R as described in vignette("api", package = "REDCapExporter").

For the purposes of this vignette we will use the example data sets provided in the package from the 2000-2001 National Hockey League Stanley Cup Champion Colorado Avalanche. The data were transcribed from Hockey Reference into a REDCap project hosted at the University of Colorado Denver.

The data sets we will work with in this vignette are:

library(REDCapExporter)
avs_raw_core      # object returned from export_core(format = "csv")
avs_raw_metadata  # object returned from export_content(content = "metadata", format = "csv")
avs_raw_record    # object returned from export_content(content = "record", format = "csv")

There are two conceptual formatting tools provided by REDCapExporter:

  1. as.data.frame

  2. format_record

Coercion to data.frame

The object returned from export_content is a string in either csv or json format. To get that information as a data.frame, call as.data.frame.

This method works for the metadata and records directly.

avs_metadata_DF <- as.data.frame(avs_raw_metadata)
avs_record_DF   <- as.data.frame(avs_raw_record)

For rcer_rccore objects returned by export_core, all the elements can be coerced to data.frames via lapply:

avs_core_DFs <- lapply(avs_raw_core, as.data.frame)

The behavior of as.data.frame for these objects is to return a data.frame with all character columns.

avs_metadata_DF |> sapply(class) |> sapply(is.character) |> all()
## [1] TRUE
avs_record_DF |> sapply(class) |> sapply(is.character) |> all()
## [1] TRUE

All-character columns are not ideal for analysis, but they do give the user a known starting point for formatting the records explicitly. To simplify this task, REDCapExporter provides the format_record method, which uses the metadata from the REDCap project.
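As a rough illustration of what explicit formatting from an all-character starting point involves, the sketch below uses a small toy data.frame rather than the package data. The object name records is hypothetical; the values are copied from the first three roster rows.

```r
# Toy stand-in for a few columns of a records data.frame in which every
# column is character, as as.data.frame() returns them.
records <- data.frame(
  record_id = c("1", "2", "3"),
  height    = c("73", "73", "74"),
  weight    = c("185", "200", "210"),
  position  = c("Goal", "Center", "Defence"),
  stringsAsFactors = FALSE
)

# Explicit, column-by-column coercion chosen by the analyst.
records$height   <- as.integer(records$height)
records$weight   <- as.integer(records$weight)
records$position <- factor(records$position)

# Inspect the resulting classes.
sapply(records, class)
```

Every column needs its own decision, which becomes tedious for a project with dozens of fields.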

format_record

format_record uses the metadata to inform the storage mode of the elements of a data.frame. For example, after exporting the core of a REDCap project we can build a data.frame avsDF via

avsDF <- format_record(avs_raw_core)
str(avsDF, max.level = 0)
## Classes 'rcer_record' and 'data.frame':  32 obs. of  75 variables:

Note: the above uses the core export from REDCap. You can use just the record and metadata to get the same result:

identical(format_record(avs_raw_core), format_record(avs_raw_record, avs_raw_metadata))
## [1] TRUE

Let’s look at the avsDF object (presented as a nice human-readable table):

record_id | uniform_number | firstname | lastname | hof | nationality | position | birthdate | first_nhl_game | last_nhl_game | height | weight | shoots | catches | experience | roster_complete | gp | goals | assists | points | plusmn | pimi | goals_ev | goals_pp | goals_sh | goals_gw | assists_ev | assists_pp | assists_sh | shots | shooting_percentage | toi | atoi | regular_season_scoring_complete | wins | losses | ties_otl | goals_against | shots_against | saves | save_percentage | gaa | so | regular_season_goalies_complete | gp_postseason | goals_postseason | assists_postseason | points_postseason | plusmn_postseason | pimi_postseason | goals_ev_postseason | goals_pp_postseason | goals_sh_postseason | goals_gw_postseason | assists_ev_postseason | assists_pp_postseason | assists_sh_postseason | shots_postseason | shooting_percentage_postseason | toi_postseason | atoi_postseason | post_season_scoring_complete | wins_postseason | losses_postseason | ties_otl_postseason | goals_allowed_postseason | saves_postseason | save_percentage_postseason | gaa_postseason | so_postseason | post_season_goalies_complete | eg_checkbox___cb01 | eg_checkbox___cb02 | eg_checkbox___cb03 | extras_complete
11DavidAebischer0SwissGoal1978-02-072001-04-072007-10-1073185NALeft0Complete26011000000NANANA00.000000139353M 34SComplete1273525384860.90334572.243Complete1000000000NANANA0NA132SComplete00NA000.0000.00Complete100Incomplete
246YuriBabenko0USSRCenter1978-01-022000-11-222000-11-2973200LeftNA0Complete300000000000020.0000003210M 34SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete010Incomplete
345RickBerry0CanadaDefence1978-11-042001-01-072004-04-0474210LeftNA0Complete190445380000400100.00000023112M 8SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete000Incomplete
44RobBlake1CanadaDefence1969-12-101990-03-272010-05-2376220RightNA11Complete1328101181101620444.54545433926M 3SCompleteNANANANANANANANANAComplete23613196163300NANANA837.22891667729M 26SCompleteNANANANANANANANAComplete000Incomplete
577RayBourque1CanadaDefence1960-12-281979-10-112001-06-0971219LeftNA21Complete807525925483220213102163.240741208826M 6SCompleteNANANANANANANANANAComplete2146109121301NANANA498.16326559928M 32SCompleteNANANANANANANANAComplete010Incomplete
67Gregde Vries0CanadaDefence1973-01-041996-01-172009-04-1074205LeftNA5Complete7951217235150001101766.578947135117M 6SCompleteNANANANANANANANANAComplete230115200000NANANA200.00000032814M 17SCompleteNANANANANANANANAComplete100Incomplete
718AdamDeadmarsh0CanadaRight Wing1975-05-101995-01-212002-12-1572205RightNA6Complete39131326-25967027608615.11627968717M 38SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete110Incomplete
811ChrisDingman0CanadaLeft Wing1976-07-061997-10-012006-04-2576235LeftNA3Complete41112-31081000100333.0303032646M 26SCompleteNANANANANANANANANAComplete160443140000NANANA80.0000001016M 18SCompleteNANANANANANANANAComplete110Incomplete
937ChrisDrury0USALeft Wing1976-08-201998-10-102011-04-2370191RightNA2Complete712441656471311052218020411.764706128118M 3SCompleteNANANANANANANANANAComplete2311516549202NANANA6217.74193643919M 6SCompleteNANANANANANANANAComplete000Incomplete
1052AdamFoote0CanadaDefence1971-07-101991-10-192011-04-1074220RightNA9Complete35312156421111750595.08474688825M 22SCompleteNANANANANANANANANAComplete233475472101NANANA2810.71428665228M 22SCompleteNANANANANANANANAComplete110Incomplete
1121PeterForsberg1SweedenCenter1973-07-201995-01-112011-02-1272205LeftNA6Complete7327628923541212253424417815.168539151820M 48SCompleteNANANANANANANANANAComplete1141014563102NANANA2317.39130424121M 55SCompleteNANANANANANANANAComplete000Incomplete
125AlexeiGusarov0USSRDefence1964-07-081990-12-152001-05-2175185LeftNA10Complete901126000010040.00000013514M 59SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete100Incomplete
1323MilanHejduk0CzechoslovakiaRight Wing1976-02-141998-10-102013-04-2772190RightNA2Complete8041387932362812192115221319.248826158919M 52SCompleteNANANANANANANANANAComplete2371623863401NANANA5113.72549049621M 33SCompleteNANANANANANANANAComplete000Incomplete
1413DanHinote0USACenter1977-01-301999-10-052009-04-2172187RightNA1Complete76510151514101820697.24637778710M 21SCompleteNANANANANANANANANAComplete232464212000NANANA1612.5000001928M 22SCompleteNANANANANANANANAComplete010Incomplete
1524JonKlemm0CanadaDefence1970-01-081992-02-232008-04-0374205RightNA8Complete784111522542202632974.123711155419M 56SCompleteNANANANANANANANANAComplete221237161001NANANA147.14285735716M 15SCompleteNANANANANANANANAComplete110Incomplete
169BradLarsen0CanadaLeft Wing1977-06-28NANA72210LeftNA1Incomplete900010000000030.000000849M 17SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete000Incomplete
1729EricMessier0CanadaLeft Wing1973-10-291996-11-112003-11-2174195LeftNA4Complete645712-3265001700608.33333378612M 16SCompleteNANANANANANANANANAComplete232240142000NANANA2010.00000037416M 16SCompleteNANANANANANANANAIncomplete000Incomplete
183AaronMiller0USADefence1971-08-111994-01-152008-03-0675210RightNA7Complete56491319294000801498.163265103218M 25SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete110Incomplete
192BryanMuir0CanadaDefence1973-06-081996-03-082007-04-0775224LeftNA4Unverified800004000000030.000000668M 14SCompleteNANANANANANANANANAComplete3000000000NANANA0NA103M 15SCompleteNANANANANANANANAComplete100Incomplete
2039VilleNieminen0FinlandLeft Wing1977-04-062000-01-292007-04-0571200LeftNA1Complete5014822838122035306820.58823562212M 26SCompleteNANANANANANANANANAComplete234610-1201301NANANA3910.25641032614M 10SCompleteNANANANANANANANAComplete010Incomplete
2127ScottParker0USARight Wing1978-01-291998-11-282008-03-1177240RightNA10Complete69235-21552001300355.7142863945M 42SCompleteNANANANANANANANANAComplete4000020000NANANA0NA92M 12SCompleteNANANANANANANANAComplete000Incomplete
2225ShjonPodein0USARight Wing1968-03-051993-01-092003-04-2274200LeftNA8Complete8215173276815003170013710.948905118014M 23SCompleteNANANANANANANANANAComplete232353142001NANANA1612.50000034514M 59SCompleteNANANANANANANANAComplete100Incomplete
234-44NolanPratt0CandaDefence1975-08-141996-10-052008-04-0375207LeftNA4Complete461232401001200263.8461544529M 50SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete110Incomplete
2463JoelPrpic0CanadaCenter1974-09-25NANA78225LeftNA2Incomplete30000200000000NA299M 47SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete110Incomplete
2514DaveReid0CanadaRight Wing1964-05-151983-12-232001-06-0973217LeftNA17Complete7319101211000801661.5151517219M 53SCompleteNANANANANANANANANAComplete18044260000NANANA80.0000001649M 8SCompleteNANANANANANANANAComplete010Incomplete
2628SteveReinprecht0CanadaCenter1976-05-07NANA72195LeftNA1Incomplete21347-1230003012810.71428632815M 38SCompleteNANANANANANANANANAComplete22235022000NANANA1414.28571426712M 9SCompleteNANANANANANANANAComplete010Incomplete
2733PatrickRoy1CanadaGoal1965-10-051985-02-232003-04-2274185NALeft16Complete620550100000NANANA0NA356557M 30SComplete40137132151312810.84666232.224Complete23011000000NANANA0NA1451NAIncomplete167NA416220.9341.74Complete000Incomplete
2819JoeSakic1CanadaCenter1969-07-071988-10-062008-11-2871195LeftNA12Complete825464118453032193123427333216.265060188723M 1SCompleteNANANANANANANANANAComplete21131326668503NANANA7916.45569645221M 33SCompleteNANANANANANANANAComplete110Incomplete
2944RobShearer0CanadaCenter1976-10-192000-11-112000-11-1370190RightNA0Complete2000-2000000000NA146M 45SCompleteNANANANANANANANANAComplete0NANANANANANANANANANANANANANANANACompleteNANANANANANANANAComplete010Incomplete
3041MartinSkoula0CzechoslovakiaDefence1979-10-281999-10-052010-04-2275226LeftNA11Complete8281725838530211601087.407407169720M 41SCompleteNANANANANANANANANAComplete23145181000NANANA147.14285727611M 59SCompleteNANANANANANANANAComplete100Incomplete
3140AlexTanguay0CanadaLeft Wing1979-11-211999-10-052016-04-1973194LeftNA1Complete822750773537197133911013520.000000146417M 51SCompleteNANANANANANANANANAComplete23615211385102NANANA3716.21621644419M 18SCompleteNANANANANANANANAComplete000Incomplete
3226StephaneYelle0CanadaCenter1974-05-091995-10-062010-04-2474182LeftNA5Complete5041014-32030101000547.40740772314M 28SCompleteNANANANANANANANANAComplete23123281001NANANA234.34782631913M 52SCompleteNANANANANANANANAComplete000Incomplete

Now, consider the classes of the columns. Start by looking at a few columns which look like they hold numeric values: record_id, uniform_number, height, and points.

cols <- c("record_id", "uniform_number", "height", "points")
head(avsDF[, cols], n = 3)
##   record_id uniform_number height points
## 1         1              1     73      1
## 2         2             46     73      0
## 3         3             45     74      4
sapply(avsDF[, cols], class)
##      record_id uniform_number         height         points
##    "character"    "character"      "integer"      "numeric"

Why are record_id and uniform_number stored as characters, whereas height and points (the sum of goals scored and assists) are stored as integer and numeric values, respectively? The answer is in the metadata.

avs_metadata_DF[avs_metadata_DF$field_name %in% cols, ]

field_name     | form_name              | field_type | field_label    | select_choices_or_calculations | field_note | text_validation_type_or_show_slider_number | text_validation_min | text_validation_max
record_id      | roster                 | text       | Record ID      |                                |            |                                            |                     |
uniform_number | roster                 | text       | Uniform Number |                                |            |                                            |                     |
height         | roster                 | text       | Height         |                                | in inches  | integer                                    | 60                  | 84
points         | regular_season_scoring | calc       | Points         | [goals]+[assists]              |            |                                            |                     |

The remaining metadata columns (section_header, identifier, branching_logic, required_field, custom_alignment, question_number, matrix_group_name, matrix_ranking, field_annotation) are empty for these four fields.

Notice that for record_id and uniform_number the field_type is “text”, with no value for “select_choices_or_calculations” and no value for “text_validation_type_or_show_slider_number”. Each is therefore interpreted as a plain text field and stored as a character vector in the data.frame. The user can coerce these columns to integer or numeric if desired and appropriate.

For height, note that the field_type is “text” and the “text_validation_type_or_show_slider_number” is “integer”, hence the coercion from the raw data to integer when building the data.frame. Lastly, points is a calculated field and is set to numeric.
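The three rules just described can be sketched in a few lines of base R. The helper guess_coercion and the shortened column name validation are hypothetical and exist only for this illustration; this is not the REDCapExporter implementation, which handles many more field types.

```r
# Hypothetical helper (NOT the package's code): pick a coercion function name
# from the field type and the text validation type.
guess_coercion <- function(field_type, validation) {
  if (field_type == "calc") {
    "as.numeric"
  } else if (field_type == "text" && validation == "integer") {
    "as.integer"
  } else {
    "as.character"
  }
}

# Toy metadata mirroring the four fields discussed above.
md <- data.frame(
  field_name = c("record_id", "uniform_number", "height", "points"),
  field_type = c("text", "text", "text", "calc"),
  validation = c("", "", "integer", ""),
  stringsAsFactors = FALSE
)

# Apply the rule row by row.
md$coercion <- mapply(guess_coercion, md$field_type, md$validation,
                      USE.NAMES = FALSE)
md[, c("field_name", "coercion")]
```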

REDCapExporter attempts to make reasonable assumptions for the data types based on the metadata. For example, dates in REDCap can be entered and validated in Year-Month-Day, Month-Day-Year, and Day-Month-Year formats. The raw data is all in Year-Month-Day format.

field_name     | field_type | field_label            | field_note    | text_validation_type_or_show_slider_number
birthdate      | text       | Birthdate              | Format: M-D-Y | date_mdy
first_nhl_game | text       | Date of first NHL game |               | date_dmy
last_nhl_game  | text       | Date of last NHL game  |               | date_ymd
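Because the raw values share one layout regardless of the validation type, a single parsing format is enough. The following is a minimal base R sketch with three birthdates copied from the records; the variable names are hypothetical.

```r
# Raw values as exported: Year-Month-Day, even though the field is
# validated as date_mdy in the REDCap project.
raw_birthdate <- c("1978-02-07", "1978-01-02", "1978-11-04")

# One format string parses every date field in the raw export.
birthdate <- as.Date(raw_birthdate, format = "%Y-%m-%d")

# The entry format only matters if the dates are displayed that way again.
birthdate_mdy <- format(birthdate, "%m-%d-%Y")
birthdate_mdy
```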

The coercion that will be used when calling format_record is defined by an implicit call to col_type, which uses the metadata, in raw or formatted form, to determine the coercion.

identical(col_type(avs_raw_metadata), col_type(avs_metadata_DF))
## [1] TRUE
ct <- col_type(avs_metadata_DF)

Each element of ct is applied to the column of the data frame with the same name. For example, the record_id is a character string by default.

ct[["record_id"]]
## as.character(record_id)
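One way to picture how a named set of unevaluated calls can be applied to matching columns is the base R sketch below. The objects records and conversions are hypothetical stand-ins, and the loop illustrates the idea only; it is not a copy of what format_record does internally.

```r
# Toy records with character columns.
records <- data.frame(
  record_id = c("1", "2", "3"),
  height    = c("73", "73", "74"),
  stringsAsFactors = FALSE
)

# A named list of unevaluated calls, one per column (hypothetical stand-in
# for the kind of object col_type() returns).
conversions <- list(
  record_id = quote(as.character(record_id)),
  height    = quote(as.integer(height))
)

# Evaluate each call with the data frame as the environment, so the bare
# column name inside the call resolves to the column of the same name.
for (nm in names(conversions)) {
  records[[nm]] <- eval(conversions[[nm]], envir = records)
}

sapply(records, class)
```

Swapping one element of the list for a different call changes the storage mode of that column only.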

If the user would prefer the record_id to be an integer, we can modify ct and apply it explicitly when calling format_record.

ct[["record_id"]] |> str()
##  language as.character(record_id)
ct[["record_id"]] <- expression(as.integer(record_id))
avsDF2 <- format_record(avs_raw_core, col_type = ct)
## Ignoring metadata, using col_type

Two notes to make here. First, we can see that the storage mode is different between avsDF$record_id and avsDF2$record_id.

class(avsDF$record_id)
## [1] "character"
class(avsDF2$record_id)
## [1] "integer"

Second, there is a message (not a warning) that the metadata that is part of the avs_raw_core object is not being used to define the column types.

If you want to suppress that message you can use

suppressMessages(format_record(avs_raw_core, col_type = ct))

or use the records as the object passed to format_record

format_record(avs_record_DF, col_type = ct)

By default, variables recorded in REDCap via radio buttons or dropdown lists are formatted as factors. For example, the position of the player is a factor.

class(avsDF$position)
## [1] "factor"
summary(avsDF$position)
##       Goal  Left Wing Right Wing     Center    Defence
##          2          6          5          8         11
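To see how a factor with ordered labels can come out of project metadata, here is a minimal sketch under stated assumptions. REDCap data dictionaries describe choices as "code, label" pairs separated by "|"; the specific codes below and the raw values are assumptions made for illustration, not values taken from the Avalanche project, and the parsing is not the package's own code.

```r
# Hypothetical choices string in data-dictionary style: "code, label | ...".
choices <- "1, Goal | 2, Left Wing | 3, Right Wing | 4, Center | 5, Defence"

# Split into "code, label" pairs, then split each pair at the first comma.
pairs  <- strsplit(choices, "|", fixed = TRUE)[[1]]
codes  <- trimws(sub(",.*$", "", pairs))
labels <- trimws(sub("^[^,]*,", "", pairs))

# Assumed raw values: the choice codes, stored as character.
raw_position <- c("1", "4", "5", "5")

# Levels follow the order of the choices, not alphabetical order.
position <- factor(raw_position, levels = codes, labels = labels)
summary(position)
```

Taking the levels from the choice list would explain a non-alphabetical level order in a summary, whereas a table of plain character values sorts alphabetically.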

If you’d prefer to have all these variables stored as characters instead of factors, you can modify the call to col_type

ct <- col_type(avs_raw_metadata, factors = FALSE)
avsDF2 <- format_record(avs_raw_record, col_type = ct)
class(avsDF2$position)
## [1] "character"
summary(avsDF2$position)
##    Length     Class      Mode
##        32 character character
table(avsDF2$position)
##
##     Center    Defence       Goal  Left Wing Right Wing
##          8         11          2          6          5

The default formatting is documented in the manual file. The implemented code is within the S3 method:

?col_type
