The easiest way to fill in missing timestamps or convert to a moregranular period (e.g. quarter to month). Wraps thepadr::pad() functionfor padding tibbles.
Usage
pad_by_time(.data,.date_var, .by="auto", .pad_value=NA, .fill_na_direction=c("none","down","up","downup","updown"), .start_date=NULL, .end_date=NULL)Arguments
- .data
A tibble with a time-based column.
- .date_var
A column containing date or date-time values to pad
- .by
Either "auto", a time-based frequency like "year", "month", "day", "hour", etc,or a time expression like "5 min", or "7 days". See Details.
- .pad_value
Fills in padded values. Default is
NA.- .fill_na_direction
Users can provide an
NAfill strategy usingtidyr::fill().Possible values:'none','down','up','downup','updown'. Default:'none'- .start_date
Specifies the start of the padded series.If NULL it will use the lowest value of the input variable.
- .end_date
Specifies the end of the padded series.If NULL it will use the highest value of the input variable.
Details
Padding Missing Observations
The most common use case forpad_by_time() is to add rows where timestampsare missing. This could be from sales data that have missing values on weekends and holidays.Or it could be high frequency data where observations are irregularly spaced and should bereset to a regular frequency.
Going from Low to High Frequency
The second use case is going from a low frequency (e.g. day) to high frequency (e.g. hour).This is possible by supplying a higher frequency topad_by_time().
Interval, .by
Padding can be applied in the following ways:
.by = "auto"-pad_by_time()will detect the time-stamp frequency and apply padding.The eight intervals in are: year, quarter, month, week, day, hour, min, and sec.
Intervals like 5 minutes, 6 hours, 10 days are possible.
Pad Value, .pad_value
A pad value can be supplied that fills in missing numeric data. Note that this is only applied to numeric columns.
Fill NA Direction, .fill_na_directions
Usestidyr::fill() to fill missing observations using a fill strategy.
References
This function wraps the
padr::pad()function developed by Edwin Thoen.
See also
Imputation:
ts_impute_vec()- Impute missing values for time series.
Time-Based dplyr functions:
summarise_by_time()- Easily summarise using a date column.mutate_by_time()- Simplifies applying mutations by time windows.pad_by_time()- Insert time series rows with regularly spaced timestampsfilter_by_time()- Quickly filter using date ranges.filter_period()- Apply filtering expressions inside periods (windows)slice_period()- Apply slice inside periods (windows)condense_period()- Convert to a different periodicitybetween_time()- Range detection for date or date-time sequences.slidify()- Turn any function into a sliding (rolling) function
Examples
library(dplyr)# Create a quarterly series with 1 missing valuemissing_data_tbl<-tibble::tibble( date=tk_make_timeseries("2014-01-01","2015-01-01", by="quarter"), value=1:5)%>%slice(-4)# Lose the 4th quarter on purposemissing_data_tbl#># A tibble: 4 × 2#> date value#><date><int>#>1 2014-01-01 1#>2 2014-04-01 2#>3 2014-07-01 3#>4 2015-01-01 5# Detects missing quarter, and pads the missing regularly spaced quarter with NAmissing_data_tbl%>%pad_by_time(date, .by="quarter")#># A tibble: 5 × 2#> date value#><date><int>#>1 2014-01-01 1#>2 2014-04-01 2#>3 2014-07-01 3#>4 2014-10-01NA#>5 2015-01-01 5# Can specify a shorter period. This fills monthly.missing_data_tbl%>%pad_by_time(date, .by="month")#># A tibble: 13 × 2#> date value#><date><int>#> 1 2014-01-01 1#> 2 2014-02-01NA#> 3 2014-03-01NA#> 4 2014-04-01 2#> 5 2014-05-01NA#> 6 2014-06-01NA#> 7 2014-07-01 3#> 8 2014-08-01NA#> 9 2014-09-01NA#>10 2014-10-01NA#>11 2014-11-01NA#>12 2014-12-01NA#>13 2015-01-01 5# Can let pad_by_time() auto-detect date and periodmissing_data_tbl%>%pad_by_time()#> .date_var is missing. Using: date#> pad applied on the interval: quarter#># A tibble: 5 × 2#> date value#><date><int>#>1 2014-01-01 1#>2 2014-04-01 2#>3 2014-07-01 3#>4 2014-10-01NA#>5 2015-01-01 5# Can specify a .pad_valuemissing_data_tbl%>%pad_by_time(date, .by="quarter", .pad_value=0)#># A tibble: 5 × 2#> date value#><date><int>#>1 2014-01-01 1#>2 2014-04-01 2#>3 2014-07-01 3#>4 2014-10-01 0#>5 2015-01-01 5# Can then impute missing valuesmissing_data_tbl%>%pad_by_time(date, .by="quarter")%>%mutate(value=ts_impute_vec(value, period=1))#># A tibble: 5 × 2#> date value#><date><dbl>#>1 2014-01-01 1#>2 2014-04-01 2#>3 2014-07-01 3#>4 2014-10-01 4#>5 2015-01-01 5# Can specify a custom .start_date and .end_datemissing_data_tbl%>%pad_by_time(date, .by="quarter", .start_date="2013", .end_date="2015-07-01")#># A tibble: 11 × 2#> date value#><date><int>#> 1 2013-01-01NA#> 2 2013-04-01NA#> 3 2013-07-01NA#> 4 2013-10-01NA#> 5 2014-01-01 1#> 6 2014-04-01 2#> 7 2014-07-01 3#> 8 2014-10-01NA#> 9 2015-01-01 5#>10 2015-04-01NA#>11 2015-07-01NA# Can specify a tidyr::fill() directionmissing_data_tbl%>%pad_by_time(date, .by="quarter", .fill_na_direction="downup", .start_date="2013", .end_date="2015-07-01")#># A tibble: 11 × 2#> date value#><date><int>#> 1 2013-01-01 1#> 2 2013-04-01 1#> 3 2013-07-01 1#> 4 2013-10-01 1#> 5 2014-01-01 1#> 6 2014-04-01 2#> 7 2014-07-01 3#> 8 2014-10-01 3#> 9 2015-01-01 5#>10 2015-04-01 5#>11 2015-07-01 5# --- GROUPS ----# Apply standard NA padding to groupsFANG%>%group_by(symbol)%>%pad_by_time(.by="day")#> .date_var is missing. Using: date#># A tibble: 5,836 × 8#># Groups: symbol [4]#> symbol date open high low close volume adjusted#><chr><date><dbl><dbl><dbl><dbl><dbl><dbl>#> 1 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.#> 2 AMZN 2013-01-03 257. 261. 256. 258. 2750900 258.#> 3 AMZN 2013-01-04 258. 260. 257. 259. 1874200 259.#> 4 AMZN 2013-01-05NANANANANANA#> 5 AMZN 2013-01-06NANANANANANA#> 6 AMZN 2013-01-07 263. 270. 263. 268. 4910000 268.#> 7 AMZN 2013-01-08 267. 269. 264. 266. 3010700 266.#> 8 AMZN 2013-01-09 268. 270. 265. 266. 2265600 266.#> 9 AMZN 2013-01-10 269. 269. 262. 265. 2863400 265.#>10 AMZN 2013-01-11 265. 268. 264. 268. 2413300 268.#># ℹ 5,826 more rows# Apply constant pad valueFANG%>%group_by(symbol)%>%pad_by_time(.by="day", .pad_value=0)#> .date_var is missing. Using: date#># A tibble: 5,836 × 8#># Groups: symbol [4]#> symbol date open high low close volume adjusted#><chr><date><dbl><dbl><dbl><dbl><dbl><dbl>#> 1 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.#> 2 AMZN 2013-01-03 257. 261. 256. 258. 2750900 258.#> 3 AMZN 2013-01-04 258. 260. 257. 259. 1874200 259.#> 4 AMZN 2013-01-05 0 0 0 0 0 0#> 5 AMZN 2013-01-06 0 0 0 0 0 0#> 6 AMZN 2013-01-07 263. 270. 263. 268. 4910000 268.#> 7 AMZN 2013-01-08 267. 269. 264. 266. 3010700 266.#> 8 AMZN 2013-01-09 268. 270. 265. 266. 2265600 266.#> 9 AMZN 2013-01-10 269. 269. 262. 265. 2863400 265.#>10 AMZN 2013-01-11 265. 268. 264. 268. 2413300 268.#># ℹ 5,826 more rows# Apply filled padding to groupsFANG%>%group_by(symbol)%>%pad_by_time(.by="day", .fill_na_direction="down")#> .date_var is missing. Using: date#># A tibble: 5,836 × 8#># Groups: symbol [4]#> symbol date open high low close volume adjusted#><chr><date><dbl><dbl><dbl><dbl><dbl><dbl>#> 1 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.#> 2 AMZN 2013-01-03 257. 261. 256. 258. 2750900 258.#> 3 AMZN 2013-01-04 258. 260. 257. 259. 1874200 259.#> 4 AMZN 2013-01-05 258. 260. 257. 259. 1874200 259.#> 5 AMZN 2013-01-06 258. 260. 257. 259. 1874200 259.#> 6 AMZN 2013-01-07 263. 270. 263. 268. 4910000 268.#> 7 AMZN 2013-01-08 267. 269. 264. 266. 3010700 266.#> 8 AMZN 2013-01-09 268. 270. 265. 266. 2265600 266.#> 9 AMZN 2013-01-10 269. 269. 262. 265. 2863400 265.#>10 AMZN 2013-01-11 265. 268. 264. 268. 2413300 268.#># ℹ 5,826 more rows