Converting dates, times or date-times to ISO 8601

An SDTM DTC variable may include data that is represented in ISO 8601 format as a complete date/time, a partial date/time, or an incomplete date/time. {sdtm.oak} provides the create_iso8601() function that allows flexible mapping of date and time values in various formats to a single date-time ISO 8601 format.

Introduction

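To convert to the ISO 8601 format you need to pass two key arguments:

- At least one character vector of dates, times, or date-times;
- A date/time format, via the .format parameter, that tells create_iso8601() which date/time components to expect.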

create_iso8601("2000 01 05", .format = "y m d")
#> [1] "2000-01-05"
create_iso8601("22:35:05", .format = "H:M:S")
#> [1] "-----T22:35:05"

By default the .format parameter understands a few reserved characters:

- "y" for year
- "m" for month
- "d" for day
- "H" for hours
- "M" for minutes
- "S" for seconds

Besides character vectors of dates and times, you may also pass a single vector of date-times, provided you adjust the format:

create_iso8601("2000-01-05 22:35:05", .format = "y-m-d H:M:S")
#> [1] "2000-01-05T22:35:05"

Multiple inputs

If you have dates and times in separate vectors then you will need to pass a format for each vector:

create_iso8601("2000-01-05", "22:35:05", .format = c("y-m-d", "H:M:S"))
#> [1] "2000-01-05T22:35:05"

In addition, like most R functions that take vectors as input, create_iso8601() is vectorized:

date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- c("00:12:21", "22:35:05", "03:00:15", "07:09:00")
create_iso8601(date, time, .format = c("y-m-d", "H:M:S"))
#> [1] "2000-01-05T00:12:21" "2001-12-25T22:35:05" "1980-06-18T03:00:15"
#> [4] "1979-09-07T07:09:00"

But the number of elements in each of the inputs has to match or you will get an error:

date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- "00:12:21"
try(create_iso8601(date, time, .format = c("y-m-d", "H:M:S")))
#> Error in create_iso8601(date, time, .format = c("y-m-d", "H:M:S")) :
#>   All vectors in `...` must be of the same length.
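If a single time genuinely applies to every date, one way to satisfy the length requirement is to repeat it explicitly before the call. The following is a minimal base R sketch; whether repeating the value is appropriate for your data is an assumption, and time_rep is a name introduced here for illustration only.

```r
date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- "00:12:21"

# Repeat the single time so that both inputs have the same length.
time_rep <- rep(time, length.out = length(date))

# Both vectors now have four elements and could be passed together, e.g.
# create_iso8601(date, time_rep, .format = c("y-m-d", "H:M:S"))
stopifnot(length(time_rep) == length(date))
```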

You can combine individual date and time components coming in as separate inputs; here is a contrived example of year, month and day together, hour, and minute:

year <- c("99", "84", "00", "80", "79", "1944", "1953")
month_and_day <- c("jan 1", "apr 04", "mar 06", "jun 18", "sep 07", "sep 13", "sep 14")
hour <- c("12", "13", "05", "23", "16", "16", "19")
min <- c("0", "60", "59", "42", "44", "10", "13")
create_iso8601(year, month_and_day, hour, min, .format = c("y", "m d", "H", "M"))
#> [1] "1999-01-01T12:00" "1984-04-04T13:60" "2000-03-06T05:59" "1980-06-18T23:42"
#> [5] "1979-09-07T16:44" "1944-09-13T16:10" "1953-09-14T19:13"

The .format argument must always be named; otherwise, its value is treated as one more input and .format itself is reported as missing.

try(create_iso8601("2000-01-05", "y-m-d"))
#> Error in create_iso8601("2000-01-05", "y-m-d") :
#>   argument ".format" is missing, with no default

Format variations

The .format parameter can easily accommodate variations in the format of the inputs:

create_iso8601("2000-01-05", .format = "y-m-d")
#> [1] "2000-01-05"
create_iso8601("2000 01 05", .format = "y m d")
#> [1] "2000-01-05"
create_iso8601("2000/01/05", .format = "y/m/d")
#> [1] "2000-01-05"

Individual components may come in a different order, so adjust the format accordingly:

create_iso8601("2000 01 05", .format = "y m d")
#> [1] "2000-01-05"
create_iso8601("05 01 2000", .format = "d m y")
#> [1] "2000-01-05"
create_iso8601("01 05, 2000", .format = "m d, y")
#> [1] "2000-01-05"

All other individual characters given in the format are taken strictly, e.g. the number of spaces matters:

date <- c("2000 01 05", "2000  01 05", "2000 01  05", "2000   01   05")
create_iso8601(date, .format = "y m d")
#> [1] "2000-01-05" NA           NA           NA
create_iso8601(date, .format = "y  m d")
#> [1] NA           "2000-01-05" NA           NA
create_iso8601(date, .format = "y m  d")
#> [1] NA           NA           "2000-01-05" NA
create_iso8601(date, .format = "y   m   d")
#> [1] NA           NA           NA           "2000-01-05"

The format can include regular expressions though:

create_iso8601(date, .format = "y\\s+m\\s+d")
#> [1] "2000-01-05" "2000-01-05" "2000-01-05" "2000-01-05"

By default, a streak of the reserved characters is treated as if only one was provided, so these formats are equivalent:

date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- c("00:12:21", "22:35:05", "03:00:15", "07:09:00")
create_iso8601(date, time, .format = c("y-m-d", "H:M:S"))
#> [1] "2000-01-05T00:12:21" "2001-12-25T22:35:05" "1980-06-18T03:00:15"
#> [4] "1979-09-07T07:09:00"
create_iso8601(date, time, .format = c("yyyy-mm-dd", "HH:MM:SS"))
#> [1] "2000-01-05T00:12:21" "2001-12-25T22:35:05" "1980-06-18T03:00:15"
#> [4] "1979-09-07T07:09:00"
create_iso8601(date, time, .format = c("yyyyyyyy-m-dddddd", "H:MMMMM:SSSS"))
#> [1] "2000-01-05T00:12:21" "2001-12-25T22:35:05" "1980-06-18T03:00:15"
#> [4] "1979-09-07T07:09:00"
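To see why those three format pairs behave the same, it can help to normalize them by hand. The base R sketch below collapses any run of a repeated reserved character to a single character. It only illustrates the equivalence and is not necessarily how {sdtm.oak} handles streaks internally; collapse_streaks() is a hypothetical helper introduced here.

```r
# Hypothetical helper: collapse runs of a repeated reserved character
# (y, m, d, H, M, S) into a single occurrence of that character.
collapse_streaks <- function(fmt) {
  gsub("([ymdHMS])\\1+", "\\1", fmt, perl = TRUE)
}

collapse_streaks(c("yyyy-mm-dd", "HH:MM:SS"))
#> [1] "y-m-d" "H:M:S"
collapse_streaks(c("yyyyyyyy-m-dddddd", "H:MMMMM:SSSS"))
#> [1] "y-m-d" "H:M:S"
```

All three format pairs reduce to "y-m-d" and "H:M:S", which is consistent with the identical results shown above.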

Multiple alternative formats

When an input vector contains values with varying formats, a single format may not be adequate to encompass all variations. In such situations, it’s advisable to list multiple alternative formats. This approach ensures that each format is tried sequentially until one matches the data in the vector.

date <- c("2000/01/01", "2000-01-02", "2000 01 03", "2000/01/04")
create_iso8601(date, .format = "y-m-d")
#> [1] NA           "2000-01-02" NA           NA
create_iso8601(date, .format = "y m d")
#> [1] NA           NA           "2000-01-03" NA
create_iso8601(date, .format = "y/m/d")
#> [1] "2000-01-01" NA           NA           "2000-01-04"
create_iso8601(date, .format = list(c("y-m-d", "y m d", "y/m/d")))
#> [1] "2000-01-01" "2000-01-02" "2000-01-03" "2000-01-04"

Consider the order in which you supply the formats, as it can be significant. If multiple formats could potentially match, the sequence determines which format is applied first.

create_iso8601("07 04 2000", .format = list(c("d m y", "m d y")))
#> [1] "2000-04-07"
create_iso8601("07 04 2000", .format = list(c("m d y", "d m y")))
#> [1] "2000-07-04"
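The "first matching format wins" behaviour can be mimicked in a few lines of base R. This is a rough sketch under stated assumptions, not the package's implementation: try_formats() is a hypothetical helper, and each format is stood in for by a plain regular expression paired with a replacement template.

```r
# Hypothetical helper: try each pattern in turn and keep the result of the
# first one that matches; values already converted are not touched again.
try_formats <- function(x, patterns, templates) {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(patterns)) {
    hit <- is.na(out) & grepl(patterns[i], x, perl = TRUE)
    out[hit] <- sub(patterns[i], templates[i], x[hit], perl = TRUE)
  }
  out
}

# "07 04 2000" matches the same pattern whether it is read as day-month-year
# or month-day-year; only the template (and hence the order) differs.
two_two_four <- "^(\\d{2}) (\\d{2}) (\\d{4})$"
d_m_y <- "\\3-\\2-\\1" # group 1 is the day, group 2 the month
m_d_y <- "\\3-\\1-\\2" # group 1 is the month, group 2 the day

try_formats("07 04 2000", c(two_two_four, two_two_four), c(d_m_y, m_d_y))
#> [1] "2000-04-07"
try_formats("07 04 2000", c(two_two_four, two_two_four), c(m_d_y, d_m_y))
#> [1] "2000-07-04"
```

Because the value is ambiguous, whichever interpretation is listed first is the one that gets applied, mirroring the two create_iso8601() calls above.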

Note that if you are passing alternative formats, then the .format argument must be a list whose length matches the number of inputs.
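As an illustration of that rule, the sketch below builds a two-element list for two inputs: the first element holds two alternative date formats and the second holds a single time format. The data values are made up for this example, and the call is guarded so that the snippet still runs when {sdtm.oak} is not installed.

```r
date <- c("2000-01-05", "2001/12/25")
time <- c("00:12:21", "22:35:05")

# One list element per input: alternatives for `date`, a single format for `time`.
formats <- list(c("y-m-d", "y/m/d"), "H:M:S")
stopifnot(length(formats) == 2L)

if (requireNamespace("sdtm.oak", quietly = TRUE)) {
  print(sdtm.oak::create_iso8601(date, time, .format = formats))
}
```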

Parsing of date or time components

By default, date or time components are parsed as follows:

# Years: two-digit or four-digit numbers.
years <- c("0", "1", "00", "01", "15", "30", "50", "68", "69", "80", "99")
create_iso8601(years, .format = "y")
#>  [1] NA     NA     "2000" "2001" "2015" "2030" "2050" "2068" "1969" "1980"
#> [11] "1999"

# Adjust the point where two-digit years are mapped to the 2000s or the 1900s.
create_iso8601(years, .format = "y", .cutoff_2000 = 20L)
#>  [1] NA     NA     "2000" "2001" "2015" "1930" "1950" "1968" "1969" "1980"
#> [11] "1999"

# Both numeric months (two-digit only) and abbreviated months work out of the box.
months <- c("0", "00", "1", "01", "Jan", "jan")
create_iso8601(months, .format = "m")
#> [1] NA     "--00" NA     "--01" "--01" "--01"

# Month days: single or two-digit numbers, anything else results in NA.
create_iso8601(c("1", "01", "001", "10", "20", "31"), .format = "d")
#> [1] "----01" "----01" NA       "----10" "----20" "----31"

# Hours
create_iso8601(c("1", "01", "001", "10", "20", "31"), .format = "H")
#> [1] "-----T01" "-----T01" NA         "-----T10" "-----T20" "-----T31"

# Minutes
create_iso8601(c("1", "01", "001", "10", "20", "60"), .format = "M")
#> [1] "-----T-:01" "-----T-:01" NA           "-----T-:10" "-----T-:20"
#> [6] "-----T-:60"

# Seconds
create_iso8601(c("1", "01", "23.04", "001", "10", "20", "60"), .format = "S")
#> [1] "-----T-:-:01"    "-----T-:-:01"    "-----T-:-:23.04" NA
#> [5] "-----T-:-:10"    "-----T-:-:20"    "-----T-:-:60"
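The two-digit year rule visible in the outputs above can be written down as a small base R sketch. It assumes that two-digit years up to and including the cutoff map to the 2000s and the rest to the 1900s, with a default cutoff of 68 inferred from the first output (68 gives 2068, 69 gives 1969). expand_two_digit_year() is a hypothetical helper, not a {sdtm.oak} function, and it handles only two-digit inputs.

```r
# Hypothetical helper: map two-digit years to four-digit years around a cutoff.
expand_two_digit_year <- function(yy, cutoff_2000 = 68L) {
  yy <- as.integer(yy)
  ifelse(yy <= cutoff_2000, 2000L + yy, 1900L + yy)
}

expand_two_digit_year(c("00", "15", "68", "69", "99"))
#> [1] 2000 2015 2068 1969 1999
expand_two_digit_year(c("15", "30", "68"), cutoff_2000 = 20L)
#> [1] 2015 1930 1968
```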

Allowing alternative date or time values

If date or time component values include special values, e.g. values encoding missing values, then you can indicate those values as possible alternatives such that the parsing will tolerate them; use the .na argument:

create_iso8601("U DEC 2019 14:00", .format = "d m y H:M")
#> [1] NA
create_iso8601("U DEC 2019 14:00", .format = "d m y H:M", .na = "U")
#> [1] "2019-12--T14:00"
create_iso8601("U UNK 2019 14:00", .format = "d m y H:M")
#> [1] NA
create_iso8601("U UNK 2019 14:00", .format = "d m y H:M", .na = c("U", "UNK"))
#> [1] "2019----T14:00"

In this case you could achieve the same result using regexps:

create_iso8601("U UNK 2019 14:00", .format = "(d|U) (m|UNK) y H:M")
#> [1] "2019----T14:00"
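The shape of these partial results, such as "2019----T14:00", follows a pattern that can be read off the outputs in this article: a missing component is written as a single hyphen, and trailing missing components are dropped. The base R sketch below reproduces that pattern for a single date-time. It is an inference from the printed outputs, not the package's code, and assemble_iso8601() is a hypothetical helper.

```r
# Hypothetical helper: join date/time components into one string, writing "-"
# for a missing component and dropping trailing missing components.
assemble_iso8601 <- function(year = NA, mon = NA, mday = NA,
                             hour = NA, min = NA, sec = NA) {
  values <- c(year, mon, mday, hour, min, sec)
  seps <- c("-", "-", "T", ":", ":") # separators between consecutive components
  present <- which(!is.na(values))
  if (length(present) == 0L) return(NA_character_)
  last <- max(present)
  values <- values[seq_len(last)]
  values[is.na(values)] <- "-"
  if (last == 1L) return(values)
  # Interleave values and separators, then append the last value.
  parts <- c(rbind(values[-last], seps[seq_len(last - 1L)]), values[last])
  paste0(parts, collapse = "")
}

assemble_iso8601(year = "2019", mon = "12", hour = "14", min = "00")
#> [1] "2019-12--T14:00"
assemble_iso8601(year = "2019", hour = "14", min = "00")
#> [1] "2019----T14:00"
assemble_iso8601(hour = "22", min = "35", sec = "05")
#> [1] "-----T22:35:05"
```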

Changing reserved format characters

There may be cases where the reserved characters ("y", "m", "d", "H", "M", "S") get in the way of specifying an adequate format. For example, you might be tempted to use the format "HHMM" to parse a time such as "14H00M", assuming that the first "H" codes for parsing the hour and the second "H" is a literal "H". In fact, "HH" is taken to mean parsing hours, and "MM" parsing minutes. You can use the function fmt_cmp() to specify alternative format regexps, replacing the default characters.

In the next example, we assign new format strings to the hour and minute components, thus freeing the "H" and "M" patterns from being interpreted as hours and minutes so that they are taken literally:

create_iso8601("14H00M", .format = "HHMM")
#> [1] NA
create_iso8601("14H00M", .format = "xHwM", .fmt_c = fmt_cmp(hour = "x", min = "w"))
#> [1] "-----T14:00"

Note that you need to make sure that the format component regexps are mutually exclusive, i.e. they don’t have overlapping matches; otherwise create_iso8601() will fail with an error. In the next example both months and minutes could be represented by an "m" in the format, resulting in an ambiguous format specification.

fmt_cmp(hour = "h", min = "m")
#> $sec
#> [1] "S+"
#>
#> $min
#> [1] "m"
#>
#> $hour
#> [1] "h"
#>
#> $mday
#> [1] "d+"
#>
#> $mon
#> [1] "m+"
#>
#> $year
#> [1] "y+"
#>
#> attr(,"class")
#> [1] "fmt_c"
try(create_iso8601("14H00M", .format = "hHmM", .fmt_c = fmt_cmp(hour = "h", min = "m")))
#> Error in purrr::map2(dots, .format, ~parse_dttm(dttm = .x, fmt = .y, na = .na,  :
#>   ℹ In index: 1.
#> Caused by error in `purrr::map()`:
#> ℹ In index: 1.
#> Caused by error in `parse_dttm_fmt()`:
#> ! Patterns in `fmt_c` have overlapping matches.
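One way to spot such a clash before calling create_iso8601() is to check, in plain base R, which component regexps match each character of the format string. The sketch below copies the patterns from the fmt_cmp() output above into a named character vector. It is only an illustration of what "overlapping matches" means and is not the check performed inside the package; matched_positions() is a hypothetical helper.

```r
# Component regexps as printed by fmt_cmp(hour = "h", min = "m") above.
fmt_c <- c(sec = "S+", min = "m", hour = "h", mday = "d+", mon = "m+", year = "y+")
fmt <- "hHmM"

# Hypothetical helper: flag the character positions of `fmt` matched by `pattern`.
matched_positions <- function(pattern, fmt) {
  pos <- logical(nchar(fmt))
  m <- gregexpr(pattern, fmt)[[1]]
  len <- attr(m, "match.length")
  for (i in seq_along(m)) {
    if (m[i] > 0L) pos[seq(m[i], length.out = len[i])] <- TRUE
  }
  pos
}

# Rows are character positions in `fmt`, columns are components.
hits <- vapply(fmt_c, matched_positions, logical(nchar(fmt)), fmt = fmt)

# A position claimed by more than one component makes the format ambiguous.
overlap <- rowSums(hits) > 1
which(overlap)
#> [1] 3
clashing <- colnames(hits)[hits[which(overlap)[1], ]]
clashing
#> [1] "min" "mon"
```

The third character of "hHmM" is matched by both the minute and the month patterns, which is the ambiguity the error message reports.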
