dplyr friendly spark style window aggregation for R dataframes and remote dbplyr tbls

talegari/tidier


The tidier package provides 'Apache Spark' style window aggregation for R dataframes and remote dbplyr tbls via 'mutate' in 'dplyr' flavour.

Example

Create a new column with the average temperature over the last seven days in the same month.

    set.seed(101)
    airquality |>
      # create date column
      dplyr::mutate(date_col = lubridate::make_date(1973, Month, Day)) |>
      # create gaps by removing some days
      dplyr::slice_sample(prop = 0.8) |>
      # compute mean temperature over last seven days in the same month
      tidier::mutate(avg_temp_over_last_week = mean(Temp, na.rm = TRUE),
                     .order_by = Day,
                     .by = Month,
                     .frame = c(lubridate::days(7), # 7 days before current row
                                lubridate::days(-1) # do not include current row
                                ),
                     .index = date_col
                     )
    #> # A tibble: 122 × 8
    #>    Month Ozone Solar.R  Wind  Temp   Day date_col   avg_temp_over_last_week
    #>    <int> <int>   <int> <dbl> <int> <int> <date>                       <dbl>
    #>  1     6    NA     286   8.6    78     1 1973-06-01                   NaN
    #>  2     6    NA     242  16.1    67     3 1973-06-03                    78
    #>  3     6    NA     186   9.2    84     4 1973-06-04                    72.5
    #>  4     6    NA     264  14.3    79     6 1973-06-06                    76.3
    #>  5     6    29     127   9.7    82     7 1973-06-07                    77
    #>  6     6    NA     273   6.9    87     8 1973-06-08                    78
    #>  7     6    NA     259  10.9    93    11 1973-06-11                    83
    #>  8     6    NA     250   9.2    92    12 1973-06-12                    85.2
    #>  9     6    23     148   8      82    13 1973-06-13                    86.6
    #> 10     6    NA     332  13.8    80    14 1973-06-14                    87.2
    #> # ℹ 112 more rows

Features

  • mutate supports
    • .by (group by),
    • .order_by (order by),
    • .frame (endpoints of window frame),
    • .index (identify index column like date column, in df version only),
    • .complete (whether to compute over incomplete window, in df version only).
  • mutate automatically uses a future backend (via furrr, in df version only).
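
To make the interplay of .by, .frame and .index concrete, here is a minimal base R sketch of the computation in the example above: for each row, average the values whose index lies between 7 days and 1 day before the current row's index, within the same group. The helper `range_window_mean`, its `lower`/`upper` arguments and the toy data are hypothetical and only illustrate the idea; this is not how tidier is implemented.

```r
# Hypothetical helper (not part of tidier): mean of `value` over rows of the
# same group whose index lies in [index + lower, index + upper].
range_window_mean <- function(value, index, group, lower, upper) {
  out <- numeric(length(value))
  for (i in seq_along(value)) {
    in_window <- group == group[i] &
      index >= index[i] + lower &
      index <= index[i] + upper
    # mean() of an empty selection is NaN, as in the first row of the example
    out[i] <- mean(value[in_window], na.rm = TRUE)
  }
  out
}

# Toy data with gaps between dates (assumed values for illustration)
toy <- data.frame(
  Month    = c(6L, 6L, 6L, 6L, 6L, 7L),
  date_col = as.Date(c("1973-06-01", "1973-06-03", "1973-06-04",
                       "1973-06-06", "1973-06-30", "1973-07-02")),
  Temp     = c(78, 67, 84, 79, 80, 90)
)

# Window: 7 days before the current row up to 1 day before it (current row excluded)
toy$avg_temp_over_last_week <- range_window_mean(
  toy$Temp, toy$date_col, toy$Month, lower = -7, upper = -1
)
print(toy)
```

The first four rows give NaN, 78, 72.5 and about 76.3, the same values as the first four rows of the example output. The last row is NaN because grouping by Month keeps the 30 June reading out of the July window, even though it falls within the previous seven days.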

Motivation

This implementation is inspired by Apache Spark's windowSpec class with rangeBetween and rowsBetween.
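
The difference between the two frame types can be sketched in base R. A row-based frame (as in rowsBetween) counts physical rows relative to the current row, while a range-based frame (as in rangeBetween) selects rows by the value of the ordering column. The vectors below are made-up data, and the code only illustrates the semantics, not Spark's or tidier's implementation.

```r
days  <- c(1, 2, 5, 6)       # ordering values, with a gap between 2 and 5
value <- c(10, 20, 30, 40)

# Row-based frame: the two preceding rows plus the current row
rows_sum <- sapply(seq_along(value), function(i) {
  sum(value[max(1, i - 2):i])
})

# Range-based frame: rows whose day lies within [current day - 2, current day]
range_sum <- sapply(seq_along(value), function(i) {
  sum(value[days >= days[i] - 2 & days <= days[i]])
})

print(rows_sum)   # 10 30 60 90
print(range_sum)  # 10 30 30 70
```

The two results diverge from the third row onward: the gap in `days` means the two preceding rows are more than two days old, so the range-based frame leaves them out.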

Ecosystem

  1. dbplyr implements this via dbplyr::win_over, enabling sparklyr users to write window computations. Also see dbplyr::window_order / dbplyr::window_frame. tidier's mutate wraps this functionality via a uniform syntax across dataframes and remote tbls.

  2. The tidypyspark Python package implements a mutate style window computation API for pyspark.
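
For comparison, here is a small sketch of the dbplyr route mentioned in item 1, using dbplyr::window_order and dbplyr::window_frame on a simulated remote table. The table and column names (`g`, `x`, `y`) are made up for illustration, and the exact SQL text may vary between dbplyr versions.

```r
library(dplyr, warn.conflicts = FALSE)

# A simulated remote table: no database connection is needed to inspect the SQL
remote_tbl <- dbplyr::lazy_frame(g = c("a", "a", "b"), x = 1:3, y = c(10, 20, 30))

query <- remote_tbl |>
  group_by(g) |>                   # becomes PARTITION BY
  dbplyr::window_order(x) |>       # becomes ORDER BY
  dbplyr::window_frame(-2, 0) |>   # two preceding rows up to the current row
  mutate(avg_y = mean(y, na.rm = TRUE))

# Render the SQL that would be sent to the backend
sql_text <- as.character(dbplyr::sql_render(query))
cat(sql_text, "\n")
```

The rendered query should contain an `OVER (...)` clause with a `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` frame. This is the three-step pattern that tidier's mutate folds into the .by, .order_by and .frame arguments.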

Installation

  • dev: remotes::install_github("talegari/tidier")
  • CRAN: install.packages("tidier")

Acknowledgements

The tidier package is deeply indebted to three amazing packages and the people behind them.

  1. dplyr:
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A Grammar of Data Manipulation_. R package version 1.1.0, <https://CRAN.R-project.org/package=dplyr>.
  2. slider:
Vaughan D (2021). _slider: Sliding Window Functions_. R package version 0.2.2, <https://CRAN.R-project.org/package=slider>.
  3. dbplyr:
Wickham H, Girlich M, Ruiz E (2023). _dbplyr: A 'dplyr' Back End for Databases_. R package version 2.3.2, <https://CRAN.R-project.org/package=dbplyr>.
