dtts: Time-series functionality based on `nanotime` and `data.table`.

Motivation

Combining package `nanotime`, for operating with nanosecond time-resolution, with package `data.table` leverages the conciseness, high performance, and memory efficiency of the latter to provide high-resolution, high-performance time series operations.

Our time-series representation is simply a `data.table` with a first column of type `nanotime` and a key on it. This means all the standard `data.table` functions can be used, and this package consolidates this functionality.

Specifically, `dtts` proposes alignment functions that are particularly versatile and that work across time zones.

Usage

Creating a `data.table`-based time-series with a `nanotime` index

Two operations are necessary to create a `data.table`-based time-series for use with the functions defined in this package:

1. Create the time index, i.e. a vector of `nanotime`.
2. Create a `data.table` with the first column being the time index, and specify that column as a key.

For instance, this code creates a time-series of 10 rows spaced every hour with a data column `V1` containing random data:

library(data.table)
library(nanotime)
t1 <- seq(as.nanotime(Sys.time()), by = as.nanoduration("01:00:00"), length.out = 10)
dt1 <- data.table(index = t1, V1 = runif(10), key = "index")

produces:

                               index        V1
 1: 2021-11-21T06:23:12.404650+00:00 0.7206800
 2: 2021-11-21T07:23:12.404650+00:00 0.9677868
 3: 2021-11-21T08:23:12.404650+00:00 0.6211587
 4: 2021-11-21T09:23:12.404650+00:00 0.7669201
 5: 2021-11-21T10:23:12.404650+00:00 0.6426368
 6: 2021-11-21T11:23:12.404650+00:00 0.4026811
 7: 2021-11-21T12:23:12.404650+00:00 0.2512213
 8: 2021-11-21T13:23:12.404650+00:00 0.3476128
 9: 2021-11-21T14:23:12.404650+00:00 0.9663271
10: 2021-11-21T15:23:12.404650+00:00 0.4744729

Note that we can also write this as a single `data.table` statement:

dt1 <- data.table(index = seq(as.nanotime(Sys.time()), by = as.nanoduration("01:00:00"), length.out = 10),
                  V1 = runif(10),
                  key = "index")
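The two requirements above (a first column of type `nanotime` and a key on it) can be checked directly. The following is a minimal sketch; `is_dtts_like` is a hypothetical helper name used only for illustration and is not part of `dtts`, and a fixed start time replaces `Sys.time()` so the example is reproducible.

```r
library(data.table)
library(nanotime)

# Fixed start time so the example is reproducible
t1 <- seq(as.nanotime("2021-11-21T06:00:00+00:00"),
          by = as.nanoduration("01:00:00"), length.out = 10)
dt1 <- data.table(index = t1, V1 = runif(10), key = "index")

# Hypothetical helper (not part of dtts): TRUE when the first column
# is a nanotime vector and the table is keyed on that column
is_dtts_like <- function(dt) {
  is.data.table(dt) &&
    inherits(dt[[1]], "nanotime") &&
    identical(key(dt)[1], names(dt)[1])
}

is_dtts_like(dt1)

# The same columns without a key do not qualify
unkeyed <- data.table(index = t1, V1 = runif(10))
is_dtts_like(unkeyed)
```

An unkeyed table can be brought into shape afterwards with `setkey(unkeyed, index)`.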

Alignment functions

Alignment is the process of matching the time of the observations of one time series to another. All alignment functions in this package work in a similar way. For each point in the vector `y` onto which `x` is aligned, a pair of arguments named `start` and `end` define an interval around this point. As an example, let us take `start` equal to -1 hour and `end` equal to 0 hours. This means that a `y` of 2021-11-20 11:00:00 defines an interval from 2021-11-20 10:00:00 to 2021-11-20 11:00:00. The alignment process will then use that interval to pick points in order to compute one or more statistics on that interval for the corresponding point in `y`.
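The interval in this example can be reproduced with plain `nanotime` arithmetic. This is only a sketch of how the bounds follow from `start` and `end`; the variable names `lower` and `upper` are illustrative and are not arguments of any `dtts` function.

```r
library(nanotime)

y     <- as.nanotime("2021-11-20T11:00:00+00:00")
start <- as.nanoduration("-01:00:00")  # one hour before the alignment point
end   <- as.nanoduration("00:00:00")   # up to the alignment point itself

# Bounds of the interval associated with this point of y
lower <- y + start
upper <- y + end

lower
upper
```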

In addition to the arguments `start` and `end`, two other arguments, booleans named `sopen` and `eopen`, define whether the start and end, respectively, of the interval are open or not.

Finally, note that when the interval is specified with a `nanoperiod` type, the argument `tz` is necessary in order to give meaning to the interval. With `nanoperiod`, alignments are time-zone aware and correct across daylight saving time.
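To see why a `nanoperiod` needs a time zone, it helps to compare a calendar day with a fixed 24-hour duration across a daylight saving transition. The sketch below uses only `nanotime` (its `plus` function and `as.nanoperiod`), not `dtts`; the date and the `America/New_York` zone are assumptions chosen for illustration, since US daylight saving time began on 2021-03-14.

```r
library(nanotime)

# 12:00 local time in New York on the day before the spring transition
x <- as.nanotime("2021-03-13T17:00:00+00:00")

# A fixed duration of 24 hours ignores the time zone entirely
fixed_24h <- x + 24 * as.nanoduration("01:00:00")

# A calendar period of one day is interpreted in the given time zone,
# so it lands on 12:00 local time the next day
one_day <- plus(x, as.nanoperiod("1d"), "America/New_York")

fixed_24h
one_day
fixed_24h - one_day
```

The two results differ by the hour that is skipped by the transition, which is the difference a `nanoperiod`-based alignment interval accounts for.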

This figure shows an alignment using the “closest” point as data:

This figure shows an alignment using a statistic (here simply counting the number of elements in the intervals):

`align_idx`

This function takes two vectors of type `nanotime`. It aligns the first one onto the second one and returns the indices of the first vector that align with the second vector. There is no choice of aggregation function here as this function works only on `nanotime` vectors. The algorithm selects the point in `x` that falls in the interval and is closest to the point of alignment in `y`. The index of that point is returned at the corresponding position of the vector `y`. If no point exists in that interval, `NaN` is returned.

library(dtts)
t1 <- seq(as.nanotime("1970-01-01T00:00:00+00:00"), by = as.nanoduration("00:00:01"), length.out = 100)
t2 <- seq(as.nanotime("1970-01-01T00:00:10+00:00"), by = as.nanoduration("00:00:10"), length.out = 10)
align_idx(t1, t2, start = as.nanoduration("-00:00:10"))

Which produces:

 [1]  10  20  30  40  50  60  70  80  90 100
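The case where no point of `x` falls in the interval can be illustrated with a shorter `x` and an alignment point that lies well past its end. This is a small sketch that relies on the default interval flags behaving as in the example above; the variable names are illustrative.

```r
library(nanotime)
library(dtts)

origin     <- as.nanotime("1970-01-01T00:00:00+00:00")
one_second <- as.nanoduration("00:00:01")

x <- origin + (0:4) * one_second      # observations at 0s, 1s, 2s, 3s, 4s
y <- origin + c(3, 10) * one_second   # alignment points at 3s and 10s

# Each point of y looks back two seconds
idx <- align_idx(x, y, start = as.nanoduration("-00:00:02"))
idx

# Positions of y for which no observation was found
which(is.na(idx))
```

The first alignment point finds an observation in its window, while the window of the second one contains no observation at all, so its entry is missing.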

`align`

This function takes a `data.table` and aligns it onto `y`, a vector of `nanotime`. Like `align_idx`, it uses the arguments `start`, `end`, `sopen` and `eopen` to define the intervals around the points in `y`.

Instead of an index, the result is a new `data.table` time-series whose first `nanotime` column is the vector `y` and whose rows are taken from the `data.table` `x`. If no function is specified (i.e. `func` is `NULL`), the function returns the row of the point in `x` that is in the interval and that is closest to the point in `y` on which the alignment is made. If `func` is defined, it receives, for each point in `y`, all the rows in `x` that are in the defined interval. So `func` must be a statistic that returns one row, but it may return one or more columns. Common examples are means (e.g. using `colMeans`), counts, etc.

In the following example, a time-series `dt1` is created with a data column `V1` which has the integer index as value, and it is aligned onto a `nanotime` vector `t2`:

library(dtts)
t1 <- seq(as.nanotime("1970-01-01T00:00:00+00:00"), by = as.nanoduration("00:00:01"), length.out = 100)
dt1 <- data.table(index = t1, V1 = 0:99)
setkey(dt1, index)
t2 <- seq(as.nanotime("1970-01-01T00:00:10+00:00"), by = as.nanoduration("00:00:10"), length.out = 10)
align(dt1, t2, start = as.nanoduration("-00:00:10"), func = colMeans)

Which produces:

                        index   V1
 1: 1970-01-01T00:00:10+00:00  4.5
 2: 1970-01-01T00:00:20+00:00 14.5
 3: 1970-01-01T00:00:30+00:00 24.5
 4: 1970-01-01T00:00:40+00:00 34.5
 5: 1970-01-01T00:00:50+00:00 44.5
 6: 1970-01-01T00:01:00+00:00 54.5
 7: 1970-01-01T00:01:10+00:00 64.5
 8: 1970-01-01T00:01:20+00:00 74.5
 9: 1970-01-01T00:01:30+00:00 84.5
10: 1970-01-01T00:01:40+00:00 94.5
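For comparison, the same alignment can be run without `func`, in which case the closest row in each interval is returned rather than a statistic over the interval. This is a sketch on the same data as above; it assumes the default interval flags, so that the closest row is the last observation strictly before each point of `t2`.

```r
library(data.table)
library(nanotime)
library(dtts)

t1 <- seq(as.nanotime("1970-01-01T00:00:00+00:00"),
          by = as.nanoduration("00:00:01"), length.out = 100)
dt1 <- data.table(index = t1, V1 = 0:99, key = "index")
t2 <- seq(as.nanotime("1970-01-01T00:00:10+00:00"),
          by = as.nanoduration("00:00:10"), length.out = 10)

# No func: pick the closest row of dt1 in each ten-second window
closest <- align(dt1, t2, start = as.nanoduration("-00:00:10"))
closest
```

Where the `colMeans` version summarises the ten values in each window, this version simply carries one observed row per alignment point.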

`grid_align`

This function adds one more dimension to the function `align`. Instead of taking a vector `y`, it constructs a grid whose spacing is the value supplied in the argument `by`. The interval is controllable (with arguments `ival_start`, `ival_end`, `ival_sopen`, `ival_eopen`), but in most cases the default, which is the grid interval, is likely to be used. As in the case of `align`, the caller can specify `func`. Finally, note that `by` can be either a `nanoduration` or a `nanoperiod`. In the latter case, as for the other functions, the argument `tz` must be supplied so that the `nanoperiod` interval can be anchored to a specific time zone.

The following example is the same as for the `align` function, but shows that the vector `t2` does not need to be supplied as it is instead constructed by `grid_align`:

library(dtts)
t1 <- seq(as.nanotime("1970-01-01T00:00:00+00:00"), by = as.nanoduration("00:00:01"), length.out = 100)
dt1 <- data.table(index = t1, V1 = 0:99)
setkey(dt1, index)
grid_align(dt1, as.nanoduration("00:00:10"), func = colMeans)

Which produces:

                        index   V1
 1: 1970-01-01T00:00:10+00:00  4.5
 2: 1970-01-01T00:00:20+00:00 14.5
 3: 1970-01-01T00:00:30+00:00 24.5
 4: 1970-01-01T00:00:40+00:00 34.5
 5: 1970-01-01T00:00:50+00:00 44.5
 6: 1970-01-01T00:01:00+00:00 54.5
 7: 1970-01-01T00:01:10+00:00 64.5
 8: 1970-01-01T00:01:20+00:00 74.5
 9: 1970-01-01T00:01:30+00:00 84.5
10: 1970-01-01T00:01:40+00:00 94.5
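Since the two examples print the same table, the relationship between `align` and `grid_align` can be checked programmatically. The sketch below runs both calls on the same data and compares the resulting `V1` columns; the names `explicit` and `gridded` are illustrative.

```r
library(data.table)
library(nanotime)
library(dtts)

t1 <- seq(as.nanotime("1970-01-01T00:00:00+00:00"),
          by = as.nanoduration("00:00:01"), length.out = 100)
dt1 <- data.table(index = t1, V1 = 0:99, key = "index")

# Explicit grid passed to align, as in the align example
t2 <- seq(as.nanotime("1970-01-01T00:00:10+00:00"),
          by = as.nanoduration("00:00:10"), length.out = 10)
explicit <- align(dt1, t2, start = as.nanoduration("-00:00:10"), func = colMeans)

# Grid constructed internally by grid_align from the `by` argument
gridded <- grid_align(dt1, as.nanoduration("00:00:10"), func = colMeans)

# Compare the aggregated values of the two results
all.equal(explicit$V1, gridded$V1)
```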

Frequency

Using `grid_align` and `nrow`, it is possible to get the frequency of a time-series, i.e. to count the number of elements in each interval of a grid.

Taking the same example as above, we see that the result is the count of elements of `dt1` that are in each interval:

library(dtts)
t1 <- seq(as.nanotime("1970-01-01T00:00:00+00:00"), by = as.nanoduration("00:00:01"), length.out = 100)
dt1 <- data.table(index = t1, V1 = 0:99)
setkey(dt1, index)
grid_align(dt1, as.nanoduration("00:00:10"), func = nrow)

Which produces:

                        index V1
 1: 1970-01-01T00:00:10+00:00 10
 2: 1970-01-01T00:00:20+00:00 10
 3: 1970-01-01T00:00:30+00:00 10
 4: 1970-01-01T00:00:40+00:00 10
 5: 1970-01-01T00:00:50+00:00 10
 6: 1970-01-01T00:01:00+00:00 10
 7: 1970-01-01T00:01:10+00:00 10
 8: 1970-01-01T00:01:20+00:00 10
 9: 1970-01-01T00:01:30+00:00 10
10: 1970-01-01T00:01:40+00:00 10

`ops`

`ops` performs arithmetic operations between two time-series and has the following signature, where `x` and `y` are time-series and `op_string` is a string denoting an arithmetic operator.

ops(x, y, op_string)

Each entry in the left time-series operand defines an interval from the previous entry, and the value associated with this interval will be applied to all the observations in the right time-series operand that fall in the interval. Note that the interval is closed at the beginning and open at the end. The available values for `op_string` are "*", "/", "+" and "-".

This function is particularly useful to apply a multiplier or to add a constant that changes over time; one example would be the adjustment of stock prices for splits.

Here is a visualization of `ops`:

Here is an example:

one_second_duration <- as.nanoduration("00:00:01")
t1 <- nanotime(1:2 * one_second_duration * 3)
t2 <- nanotime(1:4 * one_second_duration)
dt1 <- data.table(index = t1, data1 = 1:length(t1))
setkey(dt1, index)
dt2 <- data.table(index = t2, data1 = 1:length(t2))
setkey(dt2, index)
ops(dt1, dt2, "+")

Which produces:

                       index data1
1: 1970-01-01T00:00:01+00:00     2
2: 1970-01-01T00:00:02+00:00     3
3: 1970-01-01T00:00:03+00:00     3
4: 1970-01-01T00:00:04+00:00     4

Time-series subsetting

Using `nanoival`, it is possible to do complex subsetting of a time-series:

one_second <- 1e9
index <- seq(nanotime("2022-12-12 12:12:10+00:00"), length.out = 10, by = one_second)
dts <- data.table(index = index, data = 1:length(index), key = "index")
# both interval strings go inside c() so that a two-element nanoival vector is built
ival <- as.nanoival(c("-2022-12-12 12:12:10+00:00 -> 2022-12-12 12:12:14+00:00-",
                      "+2022-12-12 12:12:18+00:00 -> 2022-12-12 12:12:20+00:00+"))
dts[index %in% ival]
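To make the effect of the endpoint markers explicit, the membership test can be computed on its own before subsetting. In a `nanoival` string, a leading or trailing `-` marks an open (excluded) endpoint and `+` a closed (included) one. The following sketch uses the same data as above; the variable name `keep` is illustrative.

```r
library(data.table)
library(nanotime)

index <- seq(nanotime("2022-12-12 12:12:10+00:00"),
             by = as.nanoduration("00:00:01"), length.out = 10)
dts <- data.table(index = index, data = seq_along(index), key = "index")

ival <- as.nanoival(c(
  "-2022-12-12 12:12:10+00:00 -> 2022-12-12 12:12:14+00:00-",  # open at both ends
  "+2022-12-12 12:12:18+00:00 -> 2022-12-12 12:12:20+00:00+"   # closed at both ends
))

# Logical mask: TRUE where the time stamp falls inside any of the intervals
keep <- dts$index %in% ival
dts[keep]
```

The first interval excludes 12:12:10 and 12:12:14 themselves, so only the three points strictly between them are kept; the second interval includes its start, so 12:12:18 and 12:12:19 are kept.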

Status

`dtts` currently proposes only a set of alignment functions, but it is likely that other time-series functions will be implemented so that `nanotime`-based time-series have reasonably complete time-series functionality.

See the issue tickets for an up-to-date list of potentially desirable, possibly planned, or at least discussed items.

Installation

The package is on CRAN and can be installed via a standard

install.packages("dtts")

and development versions can be installed via

remotes::install_github("eddelbuettel/dtts")

Author

Dirk Eddelbuettel, Leonardo Silvestri

License

GPL (>= 2)