
Summarise each group down to one row

Source: R/summarise.R
summarise.Rd

summarise() creates a new data frame. It returns one row for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

summarise() and summarize() are synonyms.

Usage

summarise(.data, ..., .by = NULL, .groups = NULL)

summarize(.data, ..., .by = NULL, .groups = NULL)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

  • A vector of length 1, e.g. min(x), n(), or sum(is.na(y)).

  • A data frame, to add multiple columns from a single expression.

[Deprecated] Returning values with size 0 or >1 was deprecated as of 1.1.0. Please use reframe() for this instead.
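As a minimal sketch of the data frame case, the example below returns two columns from a single unnamed expression. The column names disp_min and disp_max are illustrative choices, not part of the original documentation.

```r
library(dplyr)

# An unnamed data frame result is unpacked into several summary columns.
# `disp_min` and `disp_max` are hypothetical names chosen for this sketch.
disp_range <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    tibble(disp_min = min(disp), disp_max = max(disp)),
    n = n()
  )

names(disp_range)
```

Each group still contributes exactly one row, so this form is not affected by the deprecation described above.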

.by

[Experimental]

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
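A short sketch of per-operation grouping with .by, assuming dplyr 1.1.0 or later (the version that introduced the argument). The result name by_cyl is illustrative.

```r
library(dplyr)

# Group by `cyl` for this call only; no group_by() step is needed.
by_cyl <- mtcars %>%
  summarise(mean_disp = mean(disp), n = n(), .by = cyl)

# The grouping does not persist on the result.
is_grouped_df(by_cyl)
```

Because the grouping applies to this call only, no `.groups` choice has to be made for the output.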

.groups

[Experimental] Grouping structure of the result.

  • "drop_last": Drops the last level of grouping. This was the only supported option before version 1.0.0.

  • "drop": All levels of grouping are dropped.

  • "keep": Same grouping structure as .data.

  • "rowwise": Each row is its own group.

When .groups is not specified, it is chosen based on the number of rows of the results:

  • If all the results have 1 row, you get "drop_last".

  • If the number of rows varies, you get "keep" (note that returning a variable number of rows was deprecated in favor of reframe(), which also unconditionally drops all levels of grouping).

In addition, a message informs you of that choice, unless the result is ungrouped, the option "dplyr.summarise.inform" is set to FALSE, or summarise() is called from a function in a package.
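The options above can be compared side by side. This is a sketch on mtcars with two grouping variables; the object names are illustrative.

```r
library(dplyr)

grouped <- mtcars %>% group_by(cyl, vs)

# Setting .groups explicitly also avoids the informational message.
drop_last <- grouped %>% summarise(n = n(), .groups = "drop_last")
dropped   <- grouped %>% summarise(n = n(), .groups = "drop")
kept      <- grouped %>% summarise(n = n(), .groups = "keep")

group_vars(drop_last)  # "cyl"
group_vars(dropped)    # character(0)
group_vars(kept)       # "cyl" "vs"

# Alternatively, silence the message globally via the documented option.
options(dplyr.summarise.inform = FALSE)
```

Stating .groups explicitly is usually the clearer choice in scripts, since the resulting grouping is then visible at the call site.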

Value

An object usually of the same type as .data.

  • The rows come from the underlying group_keys().

  • The columns are a combination of the grouping keys and the summary expressions that you provide.

  • The grouping structure is controlled by the .groups= argument; the output may be another grouped_df, a tibble, or a rowwise data frame.

  • Data frame attributes are not preserved, because summarise() fundamentally creates a new data frame.

Useful functions

Backend variations

The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.

This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
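A minimal sketch of that advice: giving each summary a new name keeps the original disp column available to later expressions. The names disp_mean and disp_sd are illustrative.

```r
library(dplyr)

# Neither summary overwrites `disp`, so sd() still sees the full column
# for each group rather than the length-1 mean.
safe <- mtcars %>%
  group_by(cyl) %>%
  summarise(disp_mean = mean(disp), disp_sd = sd(disp))

safe
```

Had the first summary been named disp, sd(disp) would have been computed on a single value per group and returned NA.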

Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame, grouped_df, rowwise_df).

See also

Other single table verbs: arrange(), filter(), mutate(), reframe(), rename(), select(), slice()

Examples

# A summary applied to ungrouped tbl returns a single row
mtcars %>%
  summarise(mean = mean(disp), n = n())
#>       mean  n
#> 1 230.7219 32

# Usually, you'll want to group first
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())
#> # A tibble: 3 × 3
#>     cyl  mean     n
#>   <dbl> <dbl> <int>
#> 1     4  105.    11
#> 2     6  183.     7
#> 3     8  353.    14

# Each summary call removes one grouping level (since that group
# is now just a single row)
mtcars %>%
  group_by(cyl, vs) %>%
  summarise(cyl_n = n()) %>%
  group_vars()
#> `summarise()` has grouped output by 'cyl'. You can override using the
#> `.groups` argument.
#> [1] "cyl"

# BEWARE: reusing variables may lead to unexpected results
mtcars %>%
  group_by(cyl) %>%
  summarise(disp = mean(disp), sd = sd(disp))
#> # A tibble: 3 × 3
#>     cyl  disp    sd
#>   <dbl> <dbl> <dbl>
#> 1     4  105.    NA
#> 2     6  183.    NA
#> 3     8  353.    NA

# Refer to column names stored as strings with the `.data` pronoun:
var <- "mass"
summarise(starwars, avg = mean(.data[[var]], na.rm = TRUE))
#> # A tibble: 1 × 1
#>     avg
#>   <dbl>
#> 1  97.3
# Learn more in ?rlang::args_data_masking

# In dplyr 1.1.0, returning multiple rows per group was deprecated in favor
# of `reframe()`, which never messages and always returns an ungrouped
# result:
mtcars %>%
  group_by(cyl) %>%
  summarise(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))
#> Warning: Returning more (or less) than 1 row per `summarise()` group was
#> deprecated in dplyr 1.1.0.
#> Please use `reframe()` instead.
#> When switching from `summarise()` to `reframe()`, remember that
#>   `reframe()` always returns an ungrouped data frame and adjust
#>   accordingly.
#> `summarise()` has grouped output by 'cyl'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   cyl [3]
#>     cyl    qs  prob
#>   <dbl> <dbl> <dbl>
#> 1     4  78.8  0.25
#> 2     4 121.   0.75
#> 3     6 160    0.25
#> 4     6 196.   0.75
#> 5     8 302.   0.25
#> 6     8 390    0.75
# ->
mtcars %>%
  group_by(cyl) %>%
  reframe(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))
#> # A tibble: 6 × 3
#>     cyl    qs  prob
#>   <dbl> <dbl> <dbl>
#> 1     4  78.8  0.25
#> 2     4 121.   0.75
#> 3     6 160    0.25
#> 4     6 196.   0.75
#> 5     8 302.   0.25
#> 6     8 390    0.75
