- Notifications
You must be signed in to change notification settings - Fork0
dplyr friendly spark style window aggregation for R dataframes and remote dbplyr tbls
License
talegari/tidier
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
tidier package provides ‘Apache Spark’style window aggregation for R dataframes and remotedbplyr tbls via‘mutate’ in‘dplyr’ flavour.
Create a new column with average temp over last seven days in the samemonth.
set.seed(101)airquality|># create date columndplyr::mutate(date_col=lubridate::make_date(1973,Month,Day))|># create gaps by removing some daysdplyr::slice_sample(prop=0.8)|># compute mean temperature over last seven days in the same monthtidier::mutate(avg_temp_over_last_week= mean(Temp,na.rm=TRUE),.order_by=Day,.by=Month,.frame= c(lubridate::days(7),# 7 days before current rowlubridate::days(-1)# do not include current row ),.index=date_col )#> # A tibble: 122 × 8#> Month Ozone Solar.R Wind Temp Day date_col avg_temp_over_last_week#> <int> <int> <int> <dbl> <int> <int> <date> <dbl>#> 1 6 NA 286 8.6 78 1 1973-06-01 NaN#> 2 6 NA 242 16.1 67 3 1973-06-03 78#> 3 6 NA 186 9.2 84 4 1973-06-04 72.5#> 4 6 NA 264 14.3 79 6 1973-06-06 76.3#> 5 6 29 127 9.7 82 7 1973-06-07 77#> 6 6 NA 273 6.9 87 8 1973-06-08 78#> 7 6 NA 259 10.9 93 11 1973-06-11 83#> 8 6 NA 250 9.2 92 12 1973-06-12 85.2#> 9 6 23 148 8 82 13 1973-06-13 86.6#> 10 6 NA 332 13.8 80 14 1973-06-14 87.2#> # ℹ 112 more rows
mutatesupports.by(group by),.order_by(order by),.frame(endpoints of window frame),.index(identify index column like date column, in df versiononly),.complete(whether to compute over incomplete window, in dfversion only).
mutateautomatically uses a future backend (viafurrr, in df version only).
This implementation is inspired by Apache Spark’swindowSpecclass withrangeBetweenandrowsBetween.
dbplyrimplements this viadbplyr::win_overenablingsparklyrusers to writewindow computations. Also see,dbplyr::window_order/dbplyr::window_frame.tidier’smutatewraps this functionality via uniform syntaxacross dataframes and remote tbls.tidypysparkpython package implementsmutatestyle window computation API forpyspark.
- dev:
remotes::install_github("talegari/tidier") - cran:
install.packages("tidier")
tidier package is deeply indebted to three amazing packages and peoplebehind it.
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: AGrammar of Data Manipulation_. R package version 1.1.0,<https://CRAN.R-project.org/package=dplyr>.Vaughan D (2021). _slider: Sliding Window Functions_. R packageversion 0.2.2, <https://CRAN.R-project.org/package=slider>.Wickham H, Girlich M, Ruiz E (2023). _dbplyr: A 'dplyr' Back End for Databases_. R package version 2.3.2, <https://CRAN.R-project.org/package=dbplyr>.About
dplyr friendly spark style window aggregation for R dataframes and remote dbplyr tbls
Topics
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Uh oh!
There was an error while loading.Please reload this page.