Movatterモバイル変換


[0]ホーム

URL:


Skip to contents

Insert time series rows with regularly spaced timestamps

Source:R/dplyr-pad_by_time.R
pad_by_time.Rd

The easiest way to fill in missing timestamps or convert to a moregranular period (e.g. quarter to month). Wraps thepadr::pad() functionfor padding tibbles.

Usage

pad_by_time(.data,.date_var,  .by="auto",  .pad_value=NA,  .fill_na_direction=c("none","down","up","downup","updown"),  .start_date=NULL,  .end_date=NULL)

Arguments

.data

A tibble with a time-based column.

.date_var

A column containing date or date-time values to pad

.by

Either "auto", a time-based frequency like "year", "month", "day", "hour", etc,or a time expression like "5 min", or "7 days". See Details.

.pad_value

Fills in padded values. Default isNA.

.fill_na_direction

Users can provide anNA fill strategy usingtidyr::fill().Possible values:'none','down','up','downup','updown'. Default:'none'

.start_date

Specifies the start of the padded series.If NULL it will use the lowest value of the input variable.

.end_date

Specifies the end of the padded series.If NULL it will use the highest value of the input variable.

Value

Atibble ordata.frame with rows containing missing timestamps added.

Details

Padding Missing Observations

The most common use case forpad_by_time() is to add rows where timestampsare missing. This could be from sales data that have missing values on weekends and holidays.Or it could be high frequency data where observations are irregularly spaced and should bereset to a regular frequency.

Going from Low to High Frequency

The second use case is going from a low frequency (e.g. day) to high frequency (e.g. hour).This is possible by supplying a higher frequency topad_by_time().

Interval, .by

Padding can be applied in the following ways:

  • .by = "auto" -pad_by_time() will detect the time-stamp frequency and apply padding.

  • The eight intervals in are: year, quarter, month, week, day, hour, min, and sec.

  • Intervals like 5 minutes, 6 hours, 10 days are possible.

Pad Value, .pad_value

A pad value can be supplied that fills in missing numeric data. Note that this is only applied to numeric columns.

Fill NA Direction, .fill_na_directions

Usestidyr::fill() to fill missing observations using a fill strategy.

References

  • This function wraps thepadr::pad() function developed by Edwin Thoen.

See also

Imputation:

Time-Based dplyr functions:

Examples

library(dplyr)# Create a quarterly series with 1 missing valuemissing_data_tbl<-tibble::tibble(    date=tk_make_timeseries("2014-01-01","2015-01-01", by="quarter"),    value=1:5)%>%slice(-4)# Lose the 4th quarter on purposemissing_data_tbl#># A tibble: 4 × 2#>   date       value#><date><int>#>1 2014-01-01     1#>2 2014-04-01     2#>3 2014-07-01     3#>4 2015-01-01     5# Detects missing quarter, and pads the missing regularly spaced quarter with NAmissing_data_tbl%>%pad_by_time(date, .by="quarter")#># A tibble: 5 × 2#>   date       value#><date><int>#>1 2014-01-01     1#>2 2014-04-01     2#>3 2014-07-01     3#>4 2014-10-01NA#>5 2015-01-01     5# Can specify a shorter period. This fills monthly.missing_data_tbl%>%pad_by_time(date, .by="month")#># A tibble: 13 × 2#>    date       value#><date><int>#> 1 2014-01-01     1#> 2 2014-02-01NA#> 3 2014-03-01NA#> 4 2014-04-01     2#> 5 2014-05-01NA#> 6 2014-06-01NA#> 7 2014-07-01     3#> 8 2014-08-01NA#> 9 2014-09-01NA#>10 2014-10-01NA#>11 2014-11-01NA#>12 2014-12-01NA#>13 2015-01-01     5# Can let pad_by_time() auto-detect date and periodmissing_data_tbl%>%pad_by_time()#> .date_var is missing. Using: date#> pad applied on the interval: quarter#># A tibble: 5 × 2#>   date       value#><date><int>#>1 2014-01-01     1#>2 2014-04-01     2#>3 2014-07-01     3#>4 2014-10-01NA#>5 2015-01-01     5# Can specify a .pad_valuemissing_data_tbl%>%pad_by_time(date, .by="quarter", .pad_value=0)#># A tibble: 5 × 2#>   date       value#><date><int>#>1 2014-01-01     1#>2 2014-04-01     2#>3 2014-07-01     3#>4 2014-10-01     0#>5 2015-01-01     5# Can then impute missing valuesmissing_data_tbl%>%pad_by_time(date, .by="quarter")%>%mutate(value=ts_impute_vec(value, period=1))#># A tibble: 5 × 2#>   date       value#><date><dbl>#>1 2014-01-01     1#>2 2014-04-01     2#>3 2014-07-01     3#>4 2014-10-01     4#>5 2015-01-01     5# Can specify a custom .start_date and .end_datemissing_data_tbl%>%pad_by_time(date, .by="quarter", .start_date="2013", .end_date="2015-07-01")#># A tibble: 11 × 2#>    date       value#><date><int>#> 1 2013-01-01NA#> 2 2013-04-01NA#> 3 2013-07-01NA#> 4 2013-10-01NA#> 5 2014-01-01     1#> 6 2014-04-01     2#> 7 2014-07-01     3#> 8 2014-10-01NA#> 9 2015-01-01     5#>10 2015-04-01NA#>11 2015-07-01NA# Can specify a tidyr::fill() directionmissing_data_tbl%>%pad_by_time(date, .by="quarter",               .fill_na_direction="downup",               .start_date="2013", .end_date="2015-07-01")#># A tibble: 11 × 2#>    date       value#><date><int>#> 1 2013-01-01     1#> 2 2013-04-01     1#> 3 2013-07-01     1#> 4 2013-10-01     1#> 5 2014-01-01     1#> 6 2014-04-01     2#> 7 2014-07-01     3#> 8 2014-10-01     3#> 9 2015-01-01     5#>10 2015-04-01     5#>11 2015-07-01     5# --- GROUPS ----# Apply standard NA padding to groupsFANG%>%group_by(symbol)%>%pad_by_time(.by="day")#> .date_var is missing. Using: date#># A tibble: 5,836 × 8#># Groups:   symbol [4]#>    symbol date        open  high   low close  volume adjusted#><chr><date><dbl><dbl><dbl><dbl><dbl><dbl>#> 1 AMZN   2013-01-02  256.  258.  253.  257. 3271000     257.#> 2 AMZN   2013-01-03  257.  261.  256.  258. 2750900     258.#> 3 AMZN   2013-01-04  258.  260.  257.  259. 1874200     259.#> 4 AMZN   2013-01-05NANANANANANA#> 5 AMZN   2013-01-06NANANANANANA#> 6 AMZN   2013-01-07  263.  270.  263.  268. 4910000     268.#> 7 AMZN   2013-01-08  267.  269.  264.  266. 3010700     266.#> 8 AMZN   2013-01-09  268.  270.  265.  266. 2265600     266.#> 9 AMZN   2013-01-10  269.  269.  262.  265. 2863400     265.#>10 AMZN   2013-01-11  265.  268.  264.  268. 2413300     268.#># ℹ 5,826 more rows# Apply constant pad valueFANG%>%group_by(symbol)%>%pad_by_time(.by="day", .pad_value=0)#> .date_var is missing. Using: date#># A tibble: 5,836 × 8#># Groups:   symbol [4]#>    symbol date        open  high   low close  volume adjusted#><chr><date><dbl><dbl><dbl><dbl><dbl><dbl>#> 1 AMZN   2013-01-02  256.  258.  253.  257. 3271000     257.#> 2 AMZN   2013-01-03  257.  261.  256.  258. 2750900     258.#> 3 AMZN   2013-01-04  258.  260.  257.  259. 1874200     259.#> 4 AMZN   2013-01-05    0     0     0     0        0       0#> 5 AMZN   2013-01-06    0     0     0     0        0       0#> 6 AMZN   2013-01-07  263.  270.  263.  268. 4910000     268.#> 7 AMZN   2013-01-08  267.  269.  264.  266. 3010700     266.#> 8 AMZN   2013-01-09  268.  270.  265.  266. 2265600     266.#> 9 AMZN   2013-01-10  269.  269.  262.  265. 2863400     265.#>10 AMZN   2013-01-11  265.  268.  264.  268. 2413300     268.#># ℹ 5,826 more rows# Apply filled padding to groupsFANG%>%group_by(symbol)%>%pad_by_time(.by="day", .fill_na_direction="down")#> .date_var is missing. Using: date#># A tibble: 5,836 × 8#># Groups:   symbol [4]#>    symbol date        open  high   low close  volume adjusted#><chr><date><dbl><dbl><dbl><dbl><dbl><dbl>#> 1 AMZN   2013-01-02  256.  258.  253.  257. 3271000     257.#> 2 AMZN   2013-01-03  257.  261.  256.  258. 2750900     258.#> 3 AMZN   2013-01-04  258.  260.  257.  259. 1874200     259.#> 4 AMZN   2013-01-05  258.  260.  257.  259. 1874200     259.#> 5 AMZN   2013-01-06  258.  260.  257.  259. 1874200     259.#> 6 AMZN   2013-01-07  263.  270.  263.  268. 4910000     268.#> 7 AMZN   2013-01-08  267.  269.  264.  266. 3010700     266.#> 8 AMZN   2013-01-09  268.  270.  265.  266. 2265600     266.#> 9 AMZN   2013-01-10  269.  269.  262.  265. 2863400     265.#>10 AMZN   2013-01-11  265.  268.  264.  268. 2413300     268.#># ℹ 5,826 more rows

[8]ページ先頭

©2009-2025 Movatter.jp