The goal of this vignette is to introduce you to clock’s high-level API, which works directly on R’s built-in date-time types, Date and POSIXct. For an overview of all of the functionality in the high-level API, check out the pkgdown reference section, High Level API. One thing you should immediately notice is that every function specific to R’s date and date-time types is prefixed with date_*(). There are also additional functions for arithmetic (add_*()) and getting (get_*()) or setting (set_*()) components that are also used by other types in clock.
As you’ll quickly see in this vignette, one of the main goals of clock is to guard you, the user, from unexpected issues caused by frustrating date manipulation concepts like invalid dates and daylight saving time. It does this by letting you know as soon as one of these issues happens, giving you the power to handle it explicitly with one of a number of different resolution strategies.
To create a vector of dates, you can use date_build(). This allows you to specify the components individually.
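As a minimal sketch of building dates from components, the example below passes a vector of days so that the year and month are recycled against it. The specific values (February 2019, days 1 through 5) are illustrative choices.

```r
library(clock)

# Build February 1st through 5th of 2019; the scalar `year` and `month`
# are recycled against the vector of days
x <- date_build(2019, 2, 1:5)
x
#> [1] "2019-02-01" "2019-02-02" "2019-02-03" "2019-02-04" "2019-02-05"
```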
If you happen to specify an invalid date, you’ll get an error message:

library(clock)

date_build(2019, 1:12, 31)
#> Error in `invalid_resolve()`:
#> ! Invalid date found at location 2.
#> ℹ Resolve invalid date issues by specifying the `invalid` argument.

One way to resolve this is by specifying an invalid date resolution strategy using the invalid argument. There are multiple options, but in this case we’ll ask for the invalid dates to be set to the previous valid moment in time.

date_build(2019, 1:12, 31, invalid = "previous")
#>  [1] "2019-01-31" "2019-02-28" "2019-03-31" "2019-04-30" "2019-05-31"
#>  [6] "2019-06-30" "2019-07-31" "2019-08-31" "2019-09-30" "2019-10-31"
#> [11] "2019-11-30" "2019-12-31"

To learn more about invalid dates, check out the documentation for invalid_resolve().
If we were actually after the “last day of the month”, an easier way to specify this would have been:

date_build(2019, 1:12, "last")
#>  [1] "2019-01-31" "2019-02-28" "2019-03-31" "2019-04-30" "2019-05-31"
#>  [6] "2019-06-30" "2019-07-31" "2019-08-31" "2019-09-30" "2019-10-31"
#> [11] "2019-11-30" "2019-12-31"

You can also create date-times using date_time_build(), which generates a POSIXct. Note that you must supply a time zone!

date_time_build(2019, 1:5, 1, 2, 30, zone = "America/New_York")
#> [1] "2019-01-01 02:30:00 EST" "2019-02-01 02:30:00 EST"
#> [3] "2019-03-01 02:30:00 EST" "2019-04-01 02:30:00 EDT"
#> [5] "2019-05-01 02:30:00 EDT"

If you “build” a time that doesn’t exist, you’ll get an error. For example, on March 8th, 2020, there was a daylight saving time gap of 1 hour in the America/New_York time zone that took us from 01:59:59 directly to 03:00:00, skipping the 2 o’clock hour entirely. Let’s “accidentally” create a time in that gap:
date_time_build(2019:2021, 3, 8, 2, 30, zone = "America/New_York")
#> Error in `as_zoned_time()`:
#> ! Nonexistent time due to daylight saving time at location 2.
#> ℹ Resolve nonexistent time issues by specifying the `nonexistent` argument.

To resolve this issue, we can specify a nonexistent time resolution strategy through the nonexistent argument. There are a number of options, including rolling forward or backward to the next or previous valid moments in time:

zone <- "America/New_York"

date_time_build(2019:2021, 3, 8, 2, 30, zone = zone, nonexistent = "roll-forward")
#> [1] "2019-03-08 02:30:00 EST" "2020-03-08 03:00:00 EDT"
#> [3] "2021-03-08 02:30:00 EST"

date_time_build(2019:2021, 3, 8, 2, 30, zone = zone, nonexistent = "roll-backward")
#> [1] "2019-03-08 02:30:00 EST" "2020-03-08 01:59:59 EST"
#> [3] "2021-03-08 02:30:00 EST"

To parse dates, use date_parse(). Parsing dates requires a format string, a combination of commands that specify where date components are in your string. By default, it assumes that you’re working with dates in the form "%Y-%m-%d" (year-month-day).
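A short sketch of the default behavior: with no format supplied, a year-month-day string parses directly into a Date. The input string is an illustrative value.

```r
library(clock)

# No `format` supplied, so the default "%Y-%m-%d" is used
parsed <- date_parse("2019-01-05")
parsed
#> [1] "2019-01-05"
```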
You can change the format string using format:
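For instance, a date written with a full month name could be parsed as sketched here. The input string is an illustrative value; %B matches the month name, %d the day, and %Y the four-digit year.

```r
library(clock)

# "%B %d, %Y" describes a layout like "January 5, 2020"
parsed <- date_parse("January 5, 2020", format = "%B %d, %Y")
parsed
#> [1] "2020-01-05"
```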
Various different locales are supported for parsing month and weekday names in different languages. To parse a French month:
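A minimal sketch of this, assuming the French labels are requested with clock_locale("fr"); the input string is an illustrative value.

```r
library(clock)

# "juillet" is French for July; the locale tells the parser which
# month names to expect for %B
parsed <- date_parse(
  "juillet 10, 2021",
  format = "%B %d, %Y",
  locale = clock_locale("fr")
)
parsed
#> [1] "2021-07-10"
```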
You can learn about more locale options in the documentation for clock_locale().
If you have heterogeneous dates, you can supply multiple format strings:
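The sketch below mixes two layouts in one character vector and passes both format strings together. The inputs are illustrative; as far as I understand the documentation, the formats are tried in the order supplied until one succeeds for each element.

```r
library(clock)

# Two different layouts in the same vector
x <- c("2020/1/5", "10-03-05", "2020/2/2")

# One format string per layout
formats <- c("%Y/%m/%d", "%y-%m-%d")

parsed <- date_parse(x, format = formats)
parsed
#> [1] "2020-01-05" "2010-03-05" "2020-02-02"
```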
You have four options when parsing date-times:
date_time_parse(): For strings like "2020-01-01 01:02:03" where there is neither a time zone offset nor a full (not abbreviated!) time zone name.
date_time_parse_complete(): For strings like "2020-01-01T01:02:03-05:00[America/New_York]" where there is both a time zone offset and time zone name present in the string.
date_time_parse_abbrev(): For strings like "2020-01-01 01:02:03 EST" where there is a time zone abbreviation in the string.
date_time_parse_RFC_3339(): For strings like "2020-01-01T01:02:03Z" or "2020-01-01T01:02:03-05:00", which are in RFC 3339 format and are intended to be interpreted as UTC.
date_time_parse() requires a zone argument, and will ignore any other zone information in the string (i.e. if you tried to specify %z and %Z). The default format string is "%Y-%m-%d %H:%M:%S".
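A minimal sketch of date_time_parse() with the default format; the input string and zone are illustrative values.

```r
library(clock)

# The string carries no zone information, so `zone` supplies it
parsed <- date_time_parse("2020-01-01 01:02:03", "America/New_York")
parsed
#> [1] "2020-01-01 01:02:03 EST"
```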
If you happen to parse an invalid or ambiguous date-time, you’ll get an error. For example, on November 1st, 2020, there were two 1 o’clock hours in the America/New_York time zone due to a daylight saving time fallback. You can see that if we parse a time right before the fallback, and then shift it forward by 1 second, and then 1 hour and 1 second, respectively:

before <- date_time_parse("2020-11-01 00:59:59", "America/New_York")

# First 1 o'clock
before + 1
#> [1] "2020-11-01 01:00:00 EDT"

# Second 1 o'clock
before + 1 + 3600
#> [1] "2020-11-01 01:00:00 EST"

The following string doesn’t include any information about which of these two 1 o’clocks it belongs to, so it is considered ambiguous. Ambiguous times will error when parsing:

date_time_parse("2020-11-01 01:30:00", "America/New_York")
#> Error in `as_zoned_time()`:
#> ! Ambiguous time due to daylight saving time at location 1.
#> ℹ Resolve ambiguous time issues by specifying the `ambiguous` argument.

To fix that, you can specify an ambiguous time resolution strategy with the ambiguous argument.
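As a sketch of two such strategies, "earliest" picks the first of the two possible instants (still on daylight time) and "latest" picks the second (back on standard time):

```r
library(clock)

zone <- "America/New_York"

# The first 01:30, before the clocks fall back
earliest <- date_time_parse("2020-11-01 01:30:00", zone, ambiguous = "earliest")
earliest
#> [1] "2020-11-01 01:30:00 EDT"

# The second 01:30, after the clocks fall back
latest <- date_time_parse("2020-11-01 01:30:00", zone, ambiguous = "latest")
latest
#> [1] "2020-11-01 01:30:00 EST"
```

The two results print with the same clock time but are one hour apart on the underlying timeline.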
date_time_parse_complete() doesn’t have a zone argument, and doesn’t require ambiguous or nonexistent arguments, since it assumes that the string you are providing is completely unambiguous. The only way this is possible is by having both a time zone offset, specified by %z, and a full time zone name, specified by %Z, in the string.
The following is an example of an “extended” RFC 3339 format used by Java 8’s time library to specify complete date-time strings. This is something that date_time_parse_complete() can parse. The default format string follows this extended format, and is "%Y-%m-%dT%H:%M:%S%z[%Z]".
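A minimal sketch using a string in that extended layout, relying on the default format string:

```r
library(clock)

# Offset (-05:00) and full zone name are both present in the string
x <- "2020-01-01T01:02:03-05:00[America/New_York]"

parsed <- date_time_parse_complete(x)
parsed
#> [1] "2020-01-01 01:02:03 EST"
```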
date_time_parse_abbrev() is useful when your date-time strings contain a time zone abbreviation rather than a time zone offset or full time zone name.

x <- "2020-01-01 01:02:03 EST"

date_time_parse_abbrev(x, "America/New_York")
#> [1] "2020-01-01 01:02:03 EST"

The string is first parsed as a naive time without considering the abbreviation, and is then converted to a zoned-time using the supplied zone. If an ambiguous time is parsed, the abbreviation is used to resolve the ambiguity.

x <- c("1970-10-25 01:30:00 EDT", "1970-10-25 01:30:00 EST")

date_time_parse_abbrev(x, "America/New_York")
#> [1] "1970-10-25 01:30:00 EDT" "1970-10-25 01:30:00 EST"

You might be wondering why you need to supply zone at all. Isn’t the abbreviation enough? Unfortunately, multiple countries use the same time zone abbreviations, even though they have different time zones. This means that, in many cases, the abbreviation alone is ambiguous. For example, both India and Israel use IST for their standard times.
date_time_parse_RFC_3339() is useful when your date-time strings come from an API, which means they are likely in an ISO 8601 or RFC 3339 format, and should be interpreted as UTC.
The default format string parses the typical RFC 3339 format of "%Y-%m-%dT%H:%M:%SZ".
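A minimal sketch of the default case, using an illustrative string that ends in "Z":

```r
library(clock)

# The trailing "Z" marks the time as UTC
x <- "2020-01-01T01:02:03Z"

parsed <- date_time_parse_RFC_3339(x)
parsed
#> [1] "2020-01-01 01:02:03 UTC"
```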
If your date-time strings contain a numeric offset from UTC rather than a "Z", then you’ll need to set the offset argument to one of the following:
"%z" if the offset is of the form "-0500".
"%Ez" if the offset is of the form "-05:00".

When performing time-series related data analysis, you often need to summarize your series at a coarser precision. There are many different ways to do this, and the differences between them are subtle, but meaningful. clock offers three different sets of functions for summarization:
date_group()
date_floor(), date_ceiling(), and date_round()
date_shift()
Grouping allows you to summarize a component of a date or date-time within other components. An example of this is grouping by day of the month, which summarizes the day component within the current year-month.

x <- seq(date_build(2019, 1, 20), date_build(2019, 2, 5), by = 1)
x
#>  [1] "2019-01-20" "2019-01-21" "2019-01-22" "2019-01-23" "2019-01-24"
#>  [6] "2019-01-25" "2019-01-26" "2019-01-27" "2019-01-28" "2019-01-29"
#> [11] "2019-01-30" "2019-01-31" "2019-02-01" "2019-02-02" "2019-02-03"
#> [16] "2019-02-04" "2019-02-05"

# Grouping by 5 days of the current month
date_group(x, "day", n = 5)
#>  [1] "2019-01-16" "2019-01-21" "2019-01-21" "2019-01-21" "2019-01-21"
#>  [6] "2019-01-21" "2019-01-26" "2019-01-26" "2019-01-26" "2019-01-26"
#> [11] "2019-01-26" "2019-01-31" "2019-02-01" "2019-02-01" "2019-02-01"
#> [16] "2019-02-01" "2019-02-01"

The thing to note about grouping by day of the month is that at the end of each month, the groups restart. So this created groups for January of [1, 5], [6, 10], [11, 15], [16, 20], [21, 25], [26, 30], [31].
You can also group by month or year:
date_group(x, "month")
#>  [1] "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01"
#>  [6] "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01"
#> [11] "2019-01-01" "2019-01-01" "2019-02-01" "2019-02-01" "2019-02-01"
#> [16] "2019-02-01" "2019-02-01"

This also works with date-times, adding the ability to group by hour of the day, minute of the hour, and second of the minute.

x <- seq(
  date_time_build(2019, 1, 1, 1, 55, zone = "UTC"),
  date_time_build(2019, 1, 1, 2, 15, zone = "UTC"),
  by = 120
)
x
#>  [1] "2019-01-01 01:55:00 UTC" "2019-01-01 01:57:00 UTC"
#>  [3] "2019-01-01 01:59:00 UTC" "2019-01-01 02:01:00 UTC"
#>  [5] "2019-01-01 02:03:00 UTC" "2019-01-01 02:05:00 UTC"
#>  [7] "2019-01-01 02:07:00 UTC" "2019-01-01 02:09:00 UTC"
#>  [9] "2019-01-01 02:11:00 UTC" "2019-01-01 02:13:00 UTC"
#> [11] "2019-01-01 02:15:00 UTC"

date_group(x, "minute", n = 5)
#>  [1] "2019-01-01 01:55:00 UTC" "2019-01-01 01:55:00 UTC"
#>  [3] "2019-01-01 01:55:00 UTC" "2019-01-01 02:00:00 UTC"
#>  [5] "2019-01-01 02:00:00 UTC" "2019-01-01 02:05:00 UTC"
#>  [7] "2019-01-01 02:05:00 UTC" "2019-01-01 02:05:00 UTC"
#>  [9] "2019-01-01 02:10:00 UTC" "2019-01-01 02:10:00 UTC"
#> [11] "2019-01-01 02:15:00 UTC"

While grouping is useful for summarizing within a component, rounding is useful for summarizing across components. It is great for summarizing by, say, a rolling set of 60 days.

Rounding operates on the underlying count that makes up your date or date-time. To see what I mean by this, try unclassing a date:
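A quick sketch of that, using an illustrative date. Removing the class with unclass() exposes the plain number that R stores for a Date:

```r
library(clock)

# A Date is stored as a number; `unclass()` strips the class to reveal it
count <- unclass(date_build(2020, 1, 1))
count
#> [1] 18262
```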
This is a count of days since the origin that R uses, 1970-01-01, which is considered day 0. If you were to floor by 60 days, this would bundle [1970-01-01, 1970-03-02), [1970-03-02, 1970-05-01), and so on. Equivalently, it bundles counts of [0, 60), [60, 120), etc.

x <- seq(date_build(1970, 01, 01), date_build(1970, 05, 10), by = 20)

date_floor(x, "day", n = 60)
#> [1] "1970-01-01" "1970-01-01" "1970-01-01" "1970-03-02" "1970-03-02"
#> [6] "1970-03-02" "1970-05-01"

date_ceiling(x, "day", n = 60)
#> [1] "1970-01-01" "1970-03-02" "1970-03-02" "1970-03-02" "1970-05-01"
#> [6] "1970-05-01" "1970-05-01"

If you prefer a different origin, you can supply a Date origin to date_floor(), which determines what “day 0” is considered to be. This can be useful for grouping by multiple weeks if you want to control what is considered the start of the week. Since 1970-01-01 is a Thursday, flooring by multiple weeks would normally generate all Thursdays:
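A sketch of that default behavior, flooring by 14 weeks with no origin supplied. The 14-week width is chosen to match the origin example in this section; any multiple of a week would land on Thursdays in the same way.

```r
library(clock)

x <- seq(date_build(1970, 01, 01), date_build(1970, 05, 10), by = 20)

# With the default origin of 1970-01-01 (a Thursday), every floored
# date falls on a Thursday
floored <- date_floor(x, "week", n = 14)

as_weekday(floored)
#> <weekday[7]>
#> [1] Thu Thu Thu Thu Thu Thu Thu
```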
To change this you can supply an origin on the weekday that you’d like to be considered the first day of the week.

sunday <- date_build(1970, 01, 04)

date_floor(x, "week", n = 14, origin = sunday)
#> [1] "1969-09-28" "1970-01-04" "1970-01-04" "1970-01-04" "1970-01-04"
#> [6] "1970-01-04" "1970-04-12"

as_weekday(date_floor(x, "week", n = 14, origin = sunday))
#> <weekday[7]>
#> [1] Sun Sun Sun Sun Sun Sun Sun

If you only need to floor by 1 week, it is often easier to use date_shift(), as seen in the next section.

date_shift() allows you to target a weekday, and then shift a vector of dates forward or backward to the next instance of that target. It requires using one of the new types in clock, weekday, which is supplied as the target.
For example, to shift to the next Tuesday:
x <- date_build(2020, 1, 1:2)

# Wednesday / Thursday
as_weekday(x)
#> <weekday[2]>
#> [1] Wed Thu

# `clock_weekdays` is a helper that returns the code corresponding to
# the requested day of the week
clock_weekdays$tuesday
#> [1] 3

tuesday <- weekday(clock_weekdays$tuesday)
tuesday
#> <weekday[1]>
#> [1] Tue

date_shift(x, target = tuesday)
#> [1] "2020-01-07" "2020-01-07"

Shifting to the previous day of the week is a nice way to floor by 1 week. It allows you to control the start of the week in a way that is slightly easier than using date_floor(origin = ).
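A minimal sketch of shifting backward, reusing the same two dates and setting the which argument of date_shift() to "previous":

```r
library(clock)

# Wednesday 2020-01-01 and Thursday 2020-01-02
x <- date_build(2020, 1, 1:2)

tuesday <- weekday(clock_weekdays$tuesday)

# Move each date back to the most recent Tuesday
shifted <- date_shift(x, target = tuesday, which = "previous")
shifted
#> [1] "2019-12-31" "2019-12-31"
```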
You can do arithmetic with dates and date-times using the family of add_*() functions. With dates, you can add years, months, and days. With date-times, you can additionally add hours, minutes, and seconds.

x <- date_build(2020, 1, 1)

add_years(x, 1:5)
#> [1] "2021-01-01" "2022-01-01" "2023-01-01" "2024-01-01" "2025-01-01"

One of the neat parts about clock is that it requires you to be explicit about how you want to handle invalid dates when doing arithmetic. What is 1 month after January 31st? If you try and create this date, you’ll get an error.

x <- date_build(2020, 1, 31)

add_months(x, 1)
#> Error in `invalid_resolve()`:
#> ! Invalid date found at location 1.
#> ℹ Resolve invalid date issues by specifying the `invalid` argument.

clock gives you the power to handle this through the invalid option:

# The previous valid moment in time
add_months(x, 1, invalid = "previous")
#> [1] "2020-02-29"

# The next valid moment in time
add_months(x, 1, invalid = "next")
#> [1] "2020-03-01"

# Overflow the days. There were 29 days in February, 2020, but we
# specified 31. So this overflows 2 days past day 29.
add_months(x, 1, invalid = "overflow")
#> [1] "2020-03-02"

# If you don't consider it to be a valid date
add_months(x, 1, invalid = "NA")
#> [1] NA

As a teaser, the low level library has a calendar type named year-month-day that powers this operation. It actually gives you more flexibility, allowing "2020-02-31" to exist in the wild:
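A sketch of how such a value can arise: convert the Date to a year-month-day with as_year_month_day() and add a month as a duration. The variable name ymd is chosen here for illustration.

```r
library(clock)

x <- date_build(2020, 1, 31)

# Calendar arithmetic on a year-month-day does not error on invalid dates;
# the invalid value is kept until you decide how to resolve it
ymd <- as_year_month_day(x) + duration_months(1)
ymd
#> <year_month_day<day>[1]>
#> [1] "2020-02-31"
```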
You can use invalid_resolve(invalid =) to resolve this like you did in add_months(), or you can let it hang around if you expect other operations to make it “valid” again.

# Adding 1 more month makes it valid again
ymd + duration_months(1)
#> <year_month_day<day>[1]>
#> [1] "2020-03-31"

When working with date-times, you can additionally add hours, minutes, and seconds.
# magrittr provides the %>% pipe used in the examples that follow
library(magrittr)

x <- date_time_build(2020, 1, 1, 2, 30, zone = "America/New_York")

x %>% add_days(1) %>% add_hours(2:5)
#> [1] "2020-01-02 04:30:00 EST" "2020-01-02 05:30:00 EST"
#> [3] "2020-01-02 06:30:00 EST" "2020-01-02 07:30:00 EST"

When adding units of time to a POSIXct, you have to be very careful with daylight saving time issues. clock tries to help you out by letting you know when you run into an issue:

x <- date_time_build(1970, 04, 25, 02, 30, 00, zone = "America/New_York")
x
#> [1] "1970-04-25 02:30:00 EST"

# Daylight saving time gap on the 26th between 01:59:59 -> 03:00:00
x %>% add_days(1)
#> Error in `as_zoned_time()`:
#> ! Nonexistent time due to daylight saving time at location 1.
#> ℹ Resolve nonexistent time issues by specifying the `nonexistent` argument.

You can solve this using the nonexistent argument to control how these times should be handled.

# Roll forward to the next valid moment in time
x %>% add_days(1, nonexistent = "roll-forward")
#> [1] "1970-04-26 03:00:00 EDT"

# Roll backward to the previous valid moment in time
x %>% add_days(1, nonexistent = "roll-backward")
#> [1] "1970-04-26 01:59:59 EST"

# Shift forward by adding the size of the DST gap
# (this often keeps the time of day,
# but doesn't guarantee that relative ordering in `x` is maintained
# so I don't recommend it)
x %>% add_days(1, nonexistent = "shift-forward")
#> [1] "1970-04-26 03:30:00 EDT"

# Replace nonexistent times with an NA
x %>% add_days(1, nonexistent = "NA")
#> [1] NA

clock provides a family of getters and setters for working with dates and date-times. You can get and set the year, month, or day of a date.
x <- date_build(2019, 5, 6)

get_year(x)
#> [1] 2019
get_month(x)
#> [1] 5
get_day(x)
#> [1] 6

x %>% set_day(22) %>% set_month(10)
#> [1] "2019-10-22"

As you might expect by now, setting the date to an invalid date requires you to explicitly handle this:

x %>% set_day(31) %>% set_month(4)
#> Error in `invalid_resolve()`:
#> ! Invalid date found at location 1.
#> ℹ Resolve invalid date issues by specifying the `invalid` argument.

x %>% set_day(31) %>% set_month(4, invalid = "previous")
#> [1] "2019-04-30"

You can additionally set the hour, minute, and second of a POSIXct.

x <- date_time_build(2020, 1, 2, 3, zone = "America/New_York")
x
#> [1] "2020-01-02 03:00:00 EST"

x %>% set_minute(5) %>% set_second(10)
#> [1] "2020-01-02 03:05:10 EST"

As with other manipulations of POSIXct, you’ll have to be aware of daylight saving time when setting components. You may need to supply the nonexistent or ambiguous arguments of the set_*() functions to handle these issues.
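As a sketch of one such case, setting the hour to 2 on March 8th, 2020 in America/New_York lands in the daylight saving time gap described earlier, so a nonexistent strategy is supplied. The starting time of 01:30 is an illustrative value.

```r
library(clock)

# 01:30 on the morning of the 2020 spring-forward gap
x <- date_time_build(2020, 3, 8, 1, 30, zone = "America/New_York")

# 02:30 does not exist on this day, so roll forward to the next valid
# moment in time
fixed <- set_hour(x, 2, nonexistent = "roll-forward")
fixed
#> [1] "2020-03-08 03:00:00 EDT"
```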