
Why can’t I do day arithmetic on a year-month-day?

It might seem intuitive that since you can do:

x <- year_month_day(2019, 1, 5)

add_months(x, 1)
#> <year_month_day<day>[1]>
#> [1] "2019-02-05"

That you should also be able to do:

add_days(x, 1)
#> Error in `add_days()`:
#> ! Can't perform this operation on a <clock_year_month_day>.
#> ℹ Do you need to convert to a time point first?
#> ℹ Use `as_naive_time()` or `as_sys_time()` to convert to a time point.

Generally, calendars don’t support day-based arithmetic, nor do they support arithmetic at precisions finer than day. Instead, you have to convert to a time point, do the arithmetic there, and then convert back (if you still need a year-month-day after that).

x %>%
  as_naive_time() %>%
  add_days(1) %>%
  as_year_month_day()
#> <year_month_day<day>[1]>
#> [1] "2019-01-06"

The first reason for this is performance. A year-month-day is a field type, implemented as multiple parallel vectors holding the year, month, day, and all other components separately. There are two ways that day-based arithmetic could be implemented for this:

Increment the day field, then check the year and month fields to see if they need to be incremented, accounting for months having a differing number of days, and leap years.
Convert to naive-time, add days, convert back.
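
To make the cost of the first option concrete, here is a rough base R sketch of "increment the day field and carry". The helpers `days_in_month()` and `add_one_day()` are hypothetical and written only for illustration; this is not how clock implements anything, and it handles only a single-day step on valid dates.

```r
# Hypothetical helper: number of days in each (year, month) pair,
# using the Gregorian leap year rule.
days_in_month <- function(year, month) {
  leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
  n <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)[month]
  ifelse(month == 2 & leap, 29, n)
}

# Hypothetical helper: add one day to parallel year/month/day vectors,
# carrying into the month and year fields when needed.
add_one_day <- function(year, month, day) {
  day <- day + 1

  # Carry into the month when the day runs past the end of the month
  overflow <- day > days_in_month(year, month)
  day[overflow] <- 1
  month[overflow] <- month[overflow] + 1

  # Carry into the year when the month runs past December
  new_year <- month > 12
  month[new_year] <- 1
  year[new_year] <- year[new_year] + 1

  list(year = year, month = month, day = day)
}

# 2019-12-31, 2020-02-28 (leap year), 2019-02-28 (not a leap year)
r <- add_one_day(c(2019, 2020, 2019), c(12, 2, 2), c(31, 28, 28))
r
```

Even for a single day, every element needs a month-length lookup and two conditional carries, which is the kind of hidden work the low-level API tries to make visible.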

Both approaches are relatively expensive. One of the goals of the low-level API of clock is to make these expensive operations explicit. This helps make it apparent that when you need to chain together multiple operations, you should try to do all of your calendrical arithmetic steps first, then convert to a time point (i.e. the second bullet point from above) to do all of your chronological arithmetic.
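
As a small sketch of that ordering, the following does the year and month arithmetic on the calendar, converts once, and then does the day arithmetic on the time point. The specific amounts added are arbitrary and chosen only for illustration.

```r
library(clock)

x <- year_month_day(2019, 1, 5)

# Calendrical steps first: these operate directly on the calendar fields
shifted <- add_months(add_years(x, 1), 2)

# One explicit conversion, then the chronological step on the time point
shifted_days <- add_days(as_naive_time(shifted), 10)

# Convert back only if a year-month-day is still needed
result <- as_year_month_day(shifted_days)
result
```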

The second reason for this has to do with invalid dates, such as the three in this vector:

odd_dates <- year_month_day(2019, 2, 28:31)
odd_dates
#> <year_month_day<day>[4]>
#> [1] "2019-02-28" "2019-02-29" "2019-02-30" "2019-02-31"

What does it mean to “add 1 day” to these? There is no obvious answer to this question. Since clock requires that you first convert to a time point to do day-based arithmetic, you’ll be forced to call invalid_resolve() to handle these invalid dates first. After resolving them manually, day-based arithmetic again makes sense.

odd_dates %>%
  invalid_resolve(invalid = "next")
#> <year_month_day<day>[4]>
#> [1] "2019-02-28" "2019-03-01" "2019-03-01" "2019-03-01"

odd_dates %>%
  invalid_resolve(invalid = "next") %>%
  as_naive_time() %>%
  add_days(2)
#> <naive_time<day>[4]>
#> [1] "2019-03-02" "2019-03-03" "2019-03-03" "2019-03-03"

odd_dates %>%
  invalid_resolve(invalid = "overflow")
#> <year_month_day<day>[4]>
#> [1] "2019-02-28" "2019-03-01" "2019-03-02" "2019-03-03"

odd_dates %>%
  invalid_resolve(invalid = "overflow") %>%
  as_naive_time() %>%
  add_days(2)
#> <naive_time<day>[4]>
#> [1] "2019-03-02" "2019-03-03" "2019-03-04" "2019-03-05"
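
Before choosing a resolution strategy, it can help to see which elements are actually invalid. The sketch below uses clock's `invalid_detect()` and `invalid_count()` for that, and also shows the `"previous"` strategy, which is not used in the examples above.

```r
library(clock)

odd_dates <- year_month_day(2019, 2, 28:31)

# Which elements are invalid, and how many are there?
is_invalid <- invalid_detect(odd_dates)
n_invalid <- invalid_count(odd_dates)
is_invalid
n_invalid

# "previous" moves each invalid date back to the last valid day of the month
resolved_previous <- invalid_resolve(odd_dates, invalid = "previous")
resolved_previous
```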

Why can’t I add time to a zoned-time?

If you have a zoned-time, such as:

x <- zoned_time_parse_complete("1970-04-26T01:30:00-05:00[America/New_York]")
x
#> <zoned_time<second><America/New_York>[1]>
#> [1] "1970-04-26T01:30:00-05:00"

You might wonder why you can’t add any units of time to it:

add_days(x, 1)
#> Error in `add_days()`:
#> ! Can't perform this operation on a <clock_zoned_time>.
#> ℹ Do you need to convert to a time point first?
#> ℹ Use `as_naive_time()` or `as_sys_time()` to convert to a time point.

add_seconds(x, 1)
#> Error in `add_seconds()`:
#> ! Can't perform this operation on a <clock_zoned_time>.
#> ℹ Do you need to convert to a time point first?
#> ℹ Use `as_naive_time()` or `as_sys_time()` to convert to a time point.

In clock, you can’t do much with zoned-times directly. The best way to understand this is to think of a zoned-time as containing 3 things: a sys-time, a naive-time, and a time zone name. You can access those things with:

x
#> <zoned_time<second><America/New_York>[1]>
#> [1] "1970-04-26T01:30:00-05:00"

# The printed time with no time zone info
as_naive_time(x)
#> <naive_time<second>[1]>
#> [1] "1970-04-26T01:30:00"

# The equivalent time in UTC
as_sys_time(x)
#> <sys_time<second>[1]>
#> [1] "1970-04-26T06:30:00"

zoned_time_zone(x)
#> [1] "America/New_York"

Calling add_days() on a zoned-time is then an ambiguous operation. Should we add to the sys-time or the naive-time that is contained in the zoned-time? The answer changes depending on the scenario.

Because of this, you have to extract out the relevant time point that you care about, operate on that, and then convert back to zoned-time. This often produces the same result:

x %>%
  as_naive_time() %>%
  add_seconds(1) %>%
  as_zoned_time(zoned_time_zone(x))
#> <zoned_time<second><America/New_York>[1]>
#> [1] "1970-04-26T01:30:01-05:00"

x %>%
  as_sys_time() %>%
  add_seconds(1) %>%
  as_zoned_time(zoned_time_zone(x))
#> <zoned_time<second><America/New_York>[1]>
#> [1] "1970-04-26T01:30:01-05:00"

But not always! When daylight saving time is involved, the choice of sys-time or naive-time matters. Let’s try adding 30 minutes:

# There is a DST gap 1 second after 01:59:59,
# which jumps us straight to 03:00:00,
# skipping the 2 o'clock hour entirely

x %>%
  as_naive_time() %>%
  add_minutes(30) %>%
  as_zoned_time(zoned_time_zone(x))
#> Error in `as_zoned_time()`:
#> ! Nonexistent time due to daylight saving time at location 1.
#> ℹ Resolve nonexistent time issues by specifying the `nonexistent` argument.

x %>%
  as_sys_time() %>%
  add_minutes(30) %>%
  as_zoned_time(zoned_time_zone(x))
#> <zoned_time<second><America/New_York>[1]>
#> [1] "1970-04-26T03:00:00-04:00"

When adding to the naive-time, we got an error. With the sys-time, everything seems okay. What happened?

The sys-time scenario is easy to explain. Technically this converts to UTC, adds the time there, then converts back to your time zone. An easier way to think about this is that you sat in front of your computer for exactly 30 minutes (1800 seconds), then looked at the clock. Assuming that the clock automatically changes itself correctly for daylight saving time, it should read 3 o’clock.
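
The same sys-time path can be written out one step at a time, which makes the intermediate UTC values visible. This is only a restatement of the sys-time example using named intermediate variables.

```r
library(clock)

x <- zoned_time_parse_complete("1970-04-26T01:30:00-05:00[America/New_York]")

# Step 1: the equivalent instant in UTC (06:30:00)
utc <- as_sys_time(x)

# Step 2: 30 minutes of elapsed time, still in UTC (07:00:00)
utc_later <- add_minutes(utc, 30)

# Step 3: back to the original zone, which is now 4 hours behind UTC
back <- as_zoned_time(utc_later, zoned_time_zone(x))
back
```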

The naive-time scenario makes more sense if you break down the steps. First, we convert to naive-time, dropping all time zone information but keeping the printed time:

x
#> <zoned_time<second><America/New_York>[1]>
#> [1] "1970-04-26T01:30:00-05:00"

x %>%
  as_naive_time()
#> <naive_time<second>[1]>
#> [1] "1970-04-26T01:30:00"

We add 30 minutes to this. Because we don’t have any time zone information, this lands us at 2 o’clock, which isn’t an issue when working with naive-time:

x %>%
  as_naive_time() %>%
  add_minutes(30)
#> <naive_time<second>[1]>
#> [1] "1970-04-26T02:00:00"

Finally, we convert back to zoned-time. If possible, this tries to keep the printed time, and just attaches the relevant time zone onto it. However, in this case that isn’t possible, since 2 o’clock didn’t exist in this time zone! This nonexistent time must be handled explicitly by setting the nonexistent argument of as_zoned_time(). We can choose from a variety of strategies to handle nonexistent times, but here we just roll forward to the next valid moment in time.

x %>%
  as_naive_time() %>%
  add_minutes(30) %>%
  as_zoned_time(zoned_time_zone(x), nonexistent = "roll-forward")
#> <zoned_time<second><America/New_York>[1]>
#> [1] "1970-04-26T03:00:00-04:00"
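
For comparison, here is a sketch of a few of the other strategies that the `nonexistent` argument accepts, applied to the same nonexistent 02:00:00. The variable names are only for illustration.

```r
library(clock)

x <- zoned_time_parse_complete("1970-04-26T01:30:00-05:00[America/New_York]")
zone <- zoned_time_zone(x)

# Naive 02:00:00, which does not exist in this zone on this date
nt <- add_minutes(as_naive_time(x), 30)

# Roll back to the last valid moment before the gap
rolled_back <- as_zoned_time(nt, zone, nonexistent = "roll-backward")

# Shift forward by the size of the gap (1 hour here)
shifted_forward <- as_zoned_time(nt, zone, nonexistent = "shift-forward")

# Give up and return a missing value instead of an error
missing_time <- as_zoned_time(nt, zone, nonexistent = "NA")

rolled_back
shifted_forward
missing_time
```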

As a general rule, it often makes the most sense to add:

Years, quarters, and months to a calendar.
Weeks and days to a naive-time.
Hours, minutes, seconds, and subseconds to a sys-time.

This is what the high-level API for POSIXct does. However, this isn’t always what you want, so the low-level API requires you to be more explicit.
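
A short sketch of that difference using the high-level API on a POSIXct: adding one day keeps the printed time of day across the daylight saving change, while adding 24 hours counts elapsed time. The date is chosen as the day before the 1970 gap used above.

```r
library(clock)

# Noon on the day before the DST gap in America/New_York
x <- date_time_build(1970, 4, 25, 12, 0, 0, zone = "America/New_York")

# Days are added through naive-time: the printed time of day is kept
plus_day <- add_days(x, 1)

# Hours are added through sys-time: exactly 24 elapsed hours
plus_24h <- add_hours(x, 24)

format(plus_day, format = "%Y-%m-%d %H:%M")
format(plus_24h, format = "%Y-%m-%d %H:%M")
```

The two results differ by one hour because only 23 hours of wall-clock time elapsed between noon on the 25th and noon on the 26th.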

Where did my POSIXct subseconds go?

old <- options(digits.secs = 6, digits = 22)

Consider the following POSIXct:

x <- as.POSIXct("2019-01-01 01:00:00.2", "America/New_York")
x
#> [1] "2019-01-01 01:00:00.2 EST"

It looks like there is some fractional second information here, but converting it to naive-time drops it:

as_naive_time(x)
#> <naive_time<second>[1]>
#> [1] "2019-01-01T01:00:00"

This is purposeful. clock treats POSIXct as a second precision data type. The reason for this has to do with the fact that POSIXct is implemented as a vector of doubles, which have a limit to how precisely they can store information. For example, try parsing a slightly smaller or larger fractional second:

y <- as.POSIXct(
  c("2019-01-01 01:00:00.1", "2019-01-01 01:00:00.3"),
  "America/New_York"
)

# Oh dear!
y
#> [1] "2019-01-01 01:00:00.0 EST" "2019-01-01 01:00:00.2 EST"

It isn’t printing correctly, at the very least. Let’s look under the hood:

unclass(y)
#> [1] 1546322400.099999904633 1546322400.299999952316
#> attr(,"tzone")
#> [1] "America/New_York"

Double vectors have a limit to how much precision they can represent, and this is bumping up against that limit. So our .1 seconds is instead represented as .099999 etc.

This precision loss gets worse the farther we get from the epoch, 1970-01-01, represented as 0 under the hood. For example, here we’ll use a number of seconds that represents the year 2050, and add 5 microseconds to it:

new_utc <- function(x) {
  class(x) <- c("POSIXct", "POSIXt")
  attr(x, "tzone") <- "UTC"
  x
}

year_2050 <- 2524608000
five_microseconds <- 0.000005

new_utc(year_2050)
#> [1] "2050-01-01 UTC"

# Oh no!
new_utc(year_2050 + five_microseconds)
#> [1] "2050-01-01 00:00:00.000004 UTC"

# Represented internally as:
year_2050 + five_microseconds
#> [1] 2524608000.000004768372
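
The size of the error can be worked out directly. A double has a 53-bit significand, so the gap between adjacent representable values near a positive number x is 2^(floor(log2(x)) - 52). The sketch below uses a hypothetical helper, `double_spacing()`, to compute that gap for the 2019 and 2050 second counts; it uses base R only.

```r
# Hypothetical helper: spacing between adjacent doubles near a positive `x`
double_spacing <- function(x) 2^(floor(log2(x)) - 52)

seconds_2019 <- 1546322400
seconds_2050 <- 2524608000

# About 0.24 microseconds in 2019, about 0.48 microseconds in 2050
double_spacing(seconds_2019)
double_spacing(seconds_2050)

# 5 microseconds is not a whole number of steps in 2050,
# so it is rounded to the nearest whole step
steps <- round(0.000005 / double_spacing(seconds_2050))
stored <- seconds_2050 + steps * double_spacing(seconds_2050)
steps
stored
```

The rounded step count times the spacing gives the 0.000004768372 fraction shown above, which is why the 5 microseconds print as 4.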

Because of these issues, clock treats POSIXct as a second precision data type, dropping all other information. Instead, you should parse directly into a subsecond clock type:

naive_time_parse(
  c("2019-01-01T01:00:00.1", "2019-01-01T01:00:00.3"),
  precision = "millisecond"
) %>%
  as_zoned_time("America/New_York")
#> <zoned_time<millisecond><America/New_York>[2]>
#> [1] "2019-01-01T01:00:00.100-05:00" "2019-01-01T01:00:00.300-05:00"
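
Once the value is in a subsecond clock type, the fractional part is stored exactly and can be read back or used in arithmetic. A brief sketch, with an arbitrary 250 millisecond offset chosen for illustration:

```r
library(clock)

nt <- naive_time_parse("2019-01-01T01:00:00.1", precision = "millisecond")

# Read the millisecond component back through a calendar type
ms <- get_millisecond(as_year_month_day(nt))
ms

# Subsecond arithmetic is exact, unlike with doubles
later <- add_milliseconds(nt, 250)
later
```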

# Reset old options
options(old)

What is the time zone of Date?

In clock, R’s native Date type is actually assumed to be naive, i.e. clock assumes that there is a yet-to-be-specified time zone, like with a naive-time. The other possibility is to assume that Date is UTC (like sys-time), but it is often more intuitive for Dates to be naive when manipulating them and converting them to zoned-time or POSIXct.

R does not consistently treat Dates as naive or UTC. Instead it switches between them, depending on the function.

For example, the Date method of as.POSIXct() does not expose a tz argument. Instead, it assumes that Date is UTC, and that the result should be shown in local time (as defined by Sys.timezone()). This often results in confusing behavior, such as:

x <- as.Date("2019-01-01")
x
#> [1] "2019-01-01"

withr::with_timezone("America/New_York", {
  print(as.POSIXct(x))
})
#> [1] "2019-01-01 UTC"

With clock, converting to zoned-time from Date will always assume that Date is naive, which will keep the printed date (if possible) and show it in the zone you specified.

as_zoned_time(x, "UTC")
#> <zoned_time<second><UTC>[1]>
#> [1] "2019-01-01T00:00:00+00:00"

as_zoned_time(x, "America/New_York")
#> <zoned_time<second><America/New_York>[1]>
#> [1] "2019-01-01T00:00:00-05:00"

as_zoned_time(x, "Europe/London")
#> <zoned_time<second><Europe/London>[1]>
#> [1] "2019-01-01T00:00:00+00:00"

On the other hand, the POSIXct method for as.Date() treats Date as a naive type. This is probably what you want, and this example just shows the inconsistency. It is a bit hard to see this, because the tz argument of the method defaults to "UTC", but if you set the tz argument to the zone of your input, it becomes clear:

x <- as.POSIXct("2019-01-01 23:00:00", "America/New_York")

as.Date(x, tz = date_time_zone(x))
#> [1] "2019-01-01"

If this assumed that Date was UTC, then it would have resulted in something like:

utc <- date_time_set_zone(x, "UTC")
utc
#> [1] "2019-01-02 04:00:00 UTC"

as.Date(utc, tz = date_time_zone(utc))
#> [1] "2019-01-02"
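
clock's own `as_date()` applies the naive interpretation in both directions, so the printed date is kept without having to pass a `tz` argument. A minimal sketch:

```r
library(clock)

# POSIXct to Date: the printed date in the input's own zone is kept
late_evening <- as.POSIXct("2019-01-01 23:00:00", tz = "America/New_York")
from_posixct <- as_date(late_evening)
from_posixct

# Date to zoned-time and back again: the printed date survives the round trip
d <- as.Date("2019-01-01")
round_trip <- as_date(as_zoned_time(d, "America/New_York"))
round_trip
```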

What does clock do with leap seconds?

clock currently handles leap seconds in the same way that base R’s date-time (POSIXct) class does: it ignores them entirely. While strptime() has some very simple capabilities for parsing leap seconds, clock doesn’t allow them at all:

raw <- c(
  "2016-12-31T23:59:59",
  "2016-12-31T23:59:60", # A real leap second!
  "2017-01-01T00:00:00"
)

x <- sys_time_parse(raw)
#> Warning: Failed to parse 1 string at location 2. Returning `NA` at that
#> location.

x
#> <sys_time<second>[3]>
#> [1] "2016-12-31T23:59:59" NA                    "2017-01-01T00:00:00"

# Reported as exactly 1 second apart.
# In real life these are 2 seconds apart because of the leap second.
x[[3]] - x[[1]]
#> <duration<second>[1]>
#> [1] 1

Because none of the clock types handle leap seconds, clock currently doesn’t offer a way to parse them. Your current best option if you really need to parse leap seconds is to use strptime():

# This returns a POSIXlt, which can handle the special 60s field
x <- strptime(raw, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
x
#> [1] "2016-12-31 23:59:59 UTC" "2016-12-31 23:59:60 UTC"
#> [3] "2017-01-01 00:00:00 UTC"

# On conversion to POSIXct, it "rolls" forward
as.POSIXct(x)
#> [1] "2016-12-31 23:59:59 UTC" "2017-01-01 00:00:00 UTC"
#> [3] "2017-01-01 00:00:00 UTC"

strptime() isn’t a great solution though, because the parsing is fairly simple. If you try to use a “fake” leap second, it will still accept it, even though it isn’t a real time:

# 2015-12-31 wasn't a leap second date, but it still tries to parse this fake time
strptime("2015-12-31T23:59:60", format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
#> [1] "2015-12-31 23:59:60 UTC"

A true solution would check this against a database of actual leap seconds, and would only successfully parse it if it matched a real leap second. The C++ library that powers clock does have this capability, through a utc_clock class, and we may expose this in a limited form in the future, with conversion to and from sys-time and naive-time.
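
Until then, the idea of checking against a table can be sketched in base R. The helper `is_known_leap_second()` and its two-entry table are hypothetical and deliberately incomplete (only the leap seconds at the end of 2012-06-30 and 2015-06-30 are listed); a real check would need the full, maintained list of leap seconds.

```r
# Hypothetical, deliberately partial table of UTC days that ended in a leap second
known_leap_days <- c("2012-06-30", "2015-06-30")

# Hypothetical helper: TRUE only for "...T23:59:60" strings on a known leap day
is_known_leap_second <- function(x) {
  ends_in_sixty <- grepl("T23:59:60$", x)
  day <- substr(x, 1, 10)
  ends_in_sixty & day %in% known_leap_days
}

candidates <- c(
  "2015-06-30T23:59:60", # listed in the table
  "2013-12-31T23:59:60", # not a leap second date
  "2015-06-30T23:59:59"  # an ordinary second on a leap day
)

checked <- is_known_leap_second(candidates)
checked
```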

Why doesn’t this work with data.table?

While the entire high-level API for R’s native date (Date) and date-time (POSIXct) types will work fine with data.table, if you try to put any of the major clock types into a data.table, you will probably see this error message:

library(data.table)

data.table(x = year_month_day(2019, 1, 1))
#> Error in dimnames(x) <- dn :
#>   length of 'dimnames' [1] not equal to array extent

You won’t see this issue when working with data.frames or tibbles.
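
A small sketch of both points: a clock calendar as a data.frame column, and, as one possible workaround when a data.table is required, converting to a native Date with `as_date()` before storing it. The workaround is a suggestion, not something the package prescribes, and it only applies when day precision is enough.

```r
library(clock)

ymd <- year_month_day(2019, 1, 1:3)

# Record types work as data.frame columns
df <- data.frame(x = ymd)
nrow(df)

# A native Date is an atomic vector, which data.table handles
dates <- as_date(ymd)
class(dates)
```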

As of now, data.table doesn’t support the concept of record types. These are implemented as a list of vectors of equal length, that together represent a single idea. The length() of these types should be taken from the length of the vectors, not the length of the list. If you unclass any of the clock types, you’ll see that they are implemented in this way:

ymdh <- year_month_day(2019, 1, 1:2, 1)

unclass(ymdh)
#> $year
#> [1] 2019 2019
#>
#> $month
#> [1] 1 1
#>
#> $day
#> [1] 1 2
#>
#> $hour
#> [1] 1 1
#>
#> attr(,"precision")
#> [1] 5

unclass(as_naive_time(ymdh))
#> $lower
#> [1] 2147483648 2147483648
#>
#> $upper
#> [1] 429529 429553
#>
#> attr(,"precision")
#> [1] 5
#> attr(,"clock")
#> [1] 1
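
The two competing notions of length can be compared directly. In this sketch, the record type reports the number of date-times it holds, while the underlying list reports its number of fields:

```r
library(clock)

ymdh <- year_month_day(2019, 1, 1:2, 1)

# Length of the record type: the number of date-times
n_values <- length(ymdh)

# Length of the underlying list: the number of fields
n_fields <- length(unclass(ymdh))
field_names <- names(unclass(ymdh))

n_values
n_fields
field_names
```

Code that measures the list rather than the vectors inside it sees 4 where it should see 2, which is the kind of mismatch behind the dimnames error above.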

I find that record types are extremely useful data structures for building upon R’s basic atomic types in ways that otherwise couldn’t be done. They allow calendar types to hold information about each component, enabling instant access for retrieval, modification, and grouping. They also allow calendars to represent invalid dates, such as 2019-02-31, without any issues. Time points use them to store up to nanosecond precision date-times, which are really C++ int64_t types that don’t nicely fit into any R atomic type (I am aware of the bit64 package, and made a conscious decision to implement as a record type instead. This partly had to do with how missing values are handled, and how that integrates with vctrs).

The idea of a record type actually isn’t new. R’s own POSIXlt type is a record type:

x <- as.POSIXct("2019-01-01", "America/New_York")

# POSIXct is implemented as a double
unclass(x)
#> [1] 1546318800
#> attr(,"tzone")
#> [1] "America/New_York"

# POSIXlt is a record type
unclass(as.POSIXlt(x))
#> $sec
#> [1] 0
#>
#> $min
#> [1] 0
#>
#> $hour
#> [1] 0
#>
#> $mday
#> [1] 1
#>
#> $mon
#> [1] 0
#>
#> $year
#> [1] 119
#>
#> $wday
#> [1] 2
#>
#> $yday
#> [1] 0
#>
#> $isdst
#> [1] 0
#>
#> $zone
#> [1] "EST"
#>
#> $gmtoff
#> [1] -18000
#>
#> attr(,"tzone")
#> [1] "America/New_York" "EST"              "EDT"
#> attr(,"balanced")
#> [1] TRUE

data.table doesn’t truly support POSIXlt either. Instead, you get a warning about them converting it to a POSIXct. This is pretty reasonable considering their focus on performance.

data.table(x = as.POSIXlt("2019-01-01", "America/New_York"))
#>             x
#> 1: 2019-01-01
#> Warning message:
#> In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
#>   POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.

It was previously a bit difficult to create record types in R because there were few examples and no resources to build on. In vctrs, we’ve added a vctrs_rcrd type that serves as a base to build new record types on. Many S3 methods have been written for vctrs_rcrds in a way that should work for any type that builds on top of it, giving you a lot of scaffolding for free.
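
As a minimal sketch of what building on vctrs_rcrd looks like, the following defines a toy record type with two fields using `vctrs::new_rcrd()`. The class name `demo_point` and the constructor `new_point()` are made up for this example and have nothing to do with clock's own types.

```r
library(vctrs)

# Hypothetical constructor: a record type with parallel `x` and `y` fields
new_point <- function(x = double(), y = double()) {
  new_rcrd(list(x = x, y = y), class = "demo_point")
}

# A format() method is the main thing a new record type has to supply
format.demo_point <- function(x, ...) {
  paste0("(", field(x, "x"), ", ", field(x, "y"), ")")
}

p <- new_point(c(1, 2, 3), c(4, 5, 6))

# Length comes from the fields, not from the number of fields
n_points <- length(p)

# Fields can be pulled out individually
ys <- field(p, "y")

# Subsetting slices every field in parallel
first_two <- p[1:2]

n_points
format(p)
```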

I am hopeful that as more record types make their way into the R ecosystem built on this common foundation, it might be possible for data.table to enable this as an approved type in their package.