
Title:

Tools for Splitting, Applying and Combining Data

Version:

1.8.9

Description:

A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.

License:

MIT + file LICENSE

URL:

http://had.co.nz/plyr, https://github.com/hadley/plyr

BugReports:

https://github.com/hadley/plyr/issues

Depends:

R (≥ 3.1.0)

Imports:

Rcpp (≥ 0.11.0)

Suggests:

abind, covr, doParallel, foreach, iterators, itertools, tcltk, testthat

LinkingTo:

Rcpp

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.2.3

NeedsCompilation:

yes

Packaged:

2023-09-27 13:58:04 UTC; hadleywickham

Author:

Hadley Wickham [aut, cre]

Maintainer:

Hadley Wickham <hadley@rstudio.com>

Repository:

CRAN

Date/Publication:

2023-10-02 06:50:08 UTC

plyr: the split-apply-combine paradigm for R.

Description

The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.

Details

The plyr functions are named according to what sort of data structure they split up and what sort of data structure they return:

a: array
l: list
d: data.frame
m: multiple inputs
r: repeat multiple times
_: nothing

So ddply takes a data frame as input and returns a data frame as output, and l_ply takes a list as input and returns nothing as output.
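As a small illustration of the naming scheme, the sketch below runs the same grouped computation through functions that differ only in their output letter. The built-in mtcars data set and the helper name mean_mpg are choices made for this example and do not come from the plyr documentation.

```r
library(plyr)

# Hypothetical helper for this example: mean mpg of one piece
mean_mpg <- function(df) mean(df$mpg)

# d -> d: data frame in, data frame out
as_df <- ddply(mtcars, "cyl", function(df) data.frame(mpg = mean_mpg(df)))

# d -> l: data frame in, list out
as_list <- dlply(mtcars, "cyl", mean_mpg)

# d -> a: data frame in, array (here a named vector) out
as_array <- daply(mtcars, "cyl", mean_mpg)

# l -> _: list in, nothing out; called only for its side effect
l_ply(as_list, function(x) cat("mean mpg:", round(x, 1), "\n"))
```

The first letter of each function name matches the input type and the second letter matches the output type, so switching the output format only requires changing one letter.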

Row names

By design, no plyr function will preserve row names - in general it is too hard to know what should be done with them for many of the operations supported by plyr. If you want to preserve row names, use name_rows to convert them into an explicit column in your data frame, perform the plyr operations, and then use name_rows again to convert the column back into row names.
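The round trip described above can be sketched as follows. The example assumes the first six rows of the built-in mtcars data set, and the variable names are illustrative only.

```r
library(plyr)

cars <- head(mtcars)

# Move the row names into an explicit .rownames column
with_names <- name_rows(cars)

# arrange() drops row names, but the .rownames column travels with each row
sorted <- arrange(with_names, cyl, disp)

# Calling name_rows() again turns the column back into row names
restored <- name_rows(sorted)
rownames(restored)
```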

Helpers

Plyr also provides a set of helper functions for common data analysis problems:

arrange: re-order the rows of a data frame by specifying the columns to order by
mutate: add new columns or modify existing columns, like transform, but new columns can refer to other columns that you just created.
summarise: like mutate but creates a new data frame, not preserving any columns in the old data frame.
join: an adaptation of merge which is more similar to SQL, and has a much faster implementation if you only want to find the first match.
match_df: a version of join that, instead of returning the two tables combined together, only returns the rows in the first table that match the second.
colwise: make any function work column-wise on a data frame
rename: easily rename columns in a data frame
round_any: round a number to any degree of precision
count: quickly count unique combinations and return them as a data frame.
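A brief sketch of several of these helpers in use follows. It assumes the built-in mtcars data set; the column names kpl and kpl_per_ton and the conversion factor are invented for the example.

```r
library(plyr)

cars <- mtcars[, c("mpg", "cyl", "wt")]

# mutate: kpl is created first, then reused later in the same call
cars <- mutate(cars, kpl = mpg * 0.425, kpl_per_ton = kpl / wt)

# summarise: collapses to a new data frame holding only the named columns
overall <- summarise(cars, mean_mpg = mean(mpg), n = length(mpg))

# rename: a named character vector maps old names to new names
cars <- rename(cars, c("wt" = "weight"))

# round_any: round to the nearest multiple of the given accuracy
rounded <- round_any(c(137, 142), 10)

# count: one row per value that occurs, with a freq column
by_cyl <- count(cars, "cyl")
```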

Quote variables to create a list of unevaluated expressions for later evaluation.

Description

This function is similar to ~ in that it is used to capture the name of variables, not their current value. This is used throughout plyr to specify the names of variables (or more complicated expressions).

Usage

.(..., .env = parent.frame())

Arguments

...

unevaluated expressions to be recorded. Specify names if you want to set the names of the resultant variables

.env

environment in which unbound symbols in ... should be evaluated. Defaults to the environment in which . was executed.

Details

Similar tricks can be performed with substitute, but when functions can be called in multiple ways it becomes increasingly tricky to ensure that the values are extracted from the correct frame. Substitute tricks also make it difficult to program against the functions that use them, while the quoted class provides as.quoted.character to convert strings to the appropriate data structure.

Value

list of symbol and language primitives

Examples

.(a, b, c)
.(first = a, second = b, third = c)
.(a ^ 2, b - d, log(c))
as.quoted(~ a + b + c)
as.quoted(a ~ b + c)
as.quoted(c("a", "b", "c"))

# Some examples using ddply - look at the column names
ddply(mtcars, "cyl", each(nrow, ncol))
ddply(mtcars, ~ cyl, each(nrow, ncol))
ddply(mtcars, .(cyl), each(nrow, ncol))
ddply(mtcars, .(log(cyl)), each(nrow, ncol))
ddply(mtcars, .(logcyl = log(cyl)), each(nrow, ncol))
ddply(mtcars, .(vs + am), each(nrow, ncol))
ddply(mtcars, .(vsam = vs + am), each(nrow, ncol))

Subset splits.

Description

Subset splits, ensuring that labels keep matching

Usage

## S3 method for class 'split'
x[i, ...]

Arguments

x

split object

i

index

...

unused

Split array, apply function, and discard results.

Description

For each slice of an array, apply function and discard results

Usage

a_ply(
  .data,
  .margins,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .print = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

matrix, array or data frame to be processed

.margins

a vector giving the subscripts to split up data by. 1 splits up by rows, 2 by columns and c(1,2) by rows and columns, and so on for higher dimensions

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

if .data is a data frame, should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.print

automatically print each result? (default: FALSE)

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.

Value

Nothing

Input

This function splits matrices, arrays and data frames by dimensions

Output

All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
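A minimal sketch of calling a_ply purely for a side effect is shown below. The small matrix m and the printing function are assumptions made for illustration.

```r
library(plyr)

m <- matrix(1:6, nrow = 2)

# Print the sum of each row; whatever the function returns is thrown away
result <- a_ply(m, 1, function(row) cat("row sum:", sum(row), "\n"))

# a_ply itself returns nothing useful
is.null(result)
```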

Split array, apply function, and return results in an array.

Description

For each slice of an array, apply function, keeping results as an array.

Usage

aaply(
  .data,
  .margins,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

matrix, array or data frame to be processed

.margins

a vector giving the subscripts to split up data by. 1 splits up by rows, 2 by columns and c(1,2) by rows and columns, and so on for higher dimensions

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

if .data is a data frame, should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop

should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to TRUE

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Details

This function is very similar to apply, except that it will always return an array, and when the function returns data structures with more than one dimension, those dimensions are added on to the highest dimensions, rather than the lowest dimensions. This makes aaply idempotent, so that aaply(input, X, identity) is equivalent to aperm(input, X).
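The equivalence with aperm can be checked on a small case. The sketch below assumes a 2 x 3 matrix and compares only dimensions and values, since aaply may attach dimnames that aperm does not.

```r
library(plyr)

m <- matrix(1:6, nrow = 2)

# Splitting by both margins in the order c(2, 1) and returning each piece
# unchanged rearranges the dimensions in the same way as aperm(m, c(2, 1))
res <- aaply(m, c(2, 1), identity)

dim(res)
dim(aperm(m, c(2, 1)))
```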

Value

if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)

Warning

Contrary to alply and adply, passing a data frame as first argument to aaply may lead to unexpected results such as huge memory allocations.

Input

This function splits matrices, arrays and data frames by dimensions

Output

If there are no results, then this function will return a vector of length 0 (vector()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

dim(ozone)
aaply(ozone, 1, mean)
aaply(ozone, 1, mean, .drop = FALSE)
aaply(ozone, 3, mean)
aaply(ozone, c(1,2), mean)

dim(aaply(ozone, c(1,2), mean))
dim(aaply(ozone, c(1,2), mean, .drop = FALSE))

aaply(ozone, 1, each(min, max))
aaply(ozone, 3, each(min, max))

standardise <- function(x) (x - min(x)) / (max(x) - min(x))
aaply(ozone, 3, standardise)
aaply(ozone, 1:2, standardise)

aaply(ozone, 1:2, diff)

Split array, apply function, and return results in a data frame.

Description

For each slice of an array, apply function then combine results into a data frame.

Usage

adply(
  .data,
  .margins,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL,
  .id = NA
)

Arguments

.data

matrix, array or data frame to be processed

.margins

a vector giving the subscripts to split up data by. 1 splits up by rows, 2 by columns and c(1,2) by rows and columns, and so on for higher dimensions

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

if .data is a data frame, should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

.id

name(s) of the index column(s). Pass NULL to avoid creation of the index column(s). Omit or pass NA to use the default names "X1", "X2", .... Otherwise, this argument must have the same length as .margins.

Value

A data frame, as described in the output section.

Input

This function splits matrices, arrays and data frames by dimensions

Output

The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, it will be rbinded together and converted to a data frame. Any other values will result in an error.

If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
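The two supported return types of .fun, and the effect of .id, can be sketched as follows. The matrix m, the column names total, min and max, and the index name "row" are choices made for this example.

```r
library(plyr)

m <- matrix(1:6, nrow = 2)

# .fun returns a data frame: pieces are combined row-wise
totals <- adply(m, 1, function(row) data.frame(total = sum(row)))

# .fun returns a fixed-length atomic vector: also converted to a data frame
ranges <- adply(m, 1, function(row) c(min = min(row), max = max(row)))

# .id replaces the default index column name X1
named <- adply(m, 1, function(row) data.frame(total = sum(row)), .id = "row")
```

Returning a data frame from .fun is the safer choice when pieces may produce different columns, because the Output section above only guarantees rbind.fill behaviour for that case.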

Split array, apply function, and return results in a list.

Description

For each slice of an array, apply function then combine results into a list.

Usage

alply(
  .data,
  .margins,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL,
  .dims = FALSE
)

Arguments

.data

matrix, array or data frame to be processed

.margins

a vector giving the subscripts to split up data by. 1 splits up by rows, 2 by columns and c(1,2) by rows and columns, and so on for higher dimensions

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

if .data is a data frame, should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

.dims

if TRUE, copy over dimensions and names from input.

Details

The list will have "dims" and "dimnames" corresponding to the margins given. For instance alply(x, c(3,2), ...) where x has dims c(4,3,2) will give a result with dims c(2,3).

alply is somewhat similar to apply for cases where the results are not atomic.
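The dimension example above can be reproduced directly. This sketch assumes .dims = TRUE is what copies the dimensions over, as the argument description suggests, and uses an arbitrary 4 x 3 x 2 array.

```r
library(plyr)

x <- array(1:24, dim = c(4, 3, 2))

# Split by the third and then the second dimension; each piece is a
# length-4 vector, so there are 2 * 3 = 6 pieces
res <- alply(x, c(3, 2), sum, .dims = TRUE)

length(res)
dim(res)
```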

Value

list of results

Input

This function splits matrices, arrays and data frames by dimensions

Output

If there are no results, then this function will return a list of length 0 (list()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

alply(ozone, 3, quantile)
alply(ozone, 3, function(x) table(round(x)))

Dimensions.

Description

Consistent dimensions for vectors, matrices and arrays.

Usage

amv_dim(x)

Arguments

x

array, matrix or vector

Dimension names.

Description

Consistent dimnames for vectors, matrices and arrays.

Usage

amv_dimnames(x)

Arguments

x

array, matrix or vector

Details

Unlike dimnames no part of the output will ever be null. If a component of dimnames is omitted, amv_dimnames will return an integer sequence of the appropriate length.

Order a data frame by its columns.

Description

This function completes the subsetting, transforming and ordering triad with a function that works in a similar way to subset and transform but for reordering a data frame by its columns. This saves a lot of typing!

Usage

arrange(df, ...)

Arguments

df

data frame to reorder

...

expressions evaluated in the context of df and then fed to order

Examples

# sort mtcars data by cylinder and displacement
mtcars[with(mtcars, order(cyl, disp)), ]
# Same result using arrange: no need to use with(), as the context is implicit
# NOTE: plyr functions do NOT preserve row.names
arrange(mtcars, cyl, disp)
# Let's keep the row.names in this example
myCars = cbind(vehicle=row.names(mtcars), mtcars)
arrange(myCars, cyl, disp)
# Sort with displacement in descending order
arrange(myCars, cyl, desc(disp))

Make a function return a data frame.

Description

Create a new function that returns the existing function wrapped in a data.frame with a single column, value.

Usage

## S3 method for class 'function'
as.data.frame(x, row.names, optional, ...)

Arguments

x

function to make return a data frame

row.names

necessary to match the generic, but not used

optional

necessary to match the generic, but not used

...

necessary to match the generic, but not used

Details

This is useful when calling *dply functions with a function that returns a vector, and you want the output in rows, rather than columns. The value column is always created, even for empty inputs.

Convert split list to regular list.

Description

Strip off label-related attributes to turn a split list into a regular list

Usage

## S3 method for class 'split'
as.list(x, ...)

Arguments

x

object to convert to a list

...

unused

Convert input to quoted variables.

Description

Convert characters, formulas and calls to quoted .variables

Usage

as.quoted(x, env = parent.frame())

Arguments

x

input to quote

env

environment in which unbound symbols in expression should be evaluated. Defaults to the environment in which as.quoted was executed.

Details

This method is called by default on all plyr functions that take a.variables argument, so that equivalent forms can be used anywhere.

Currently conversions exist for character vectors, formulas and call objects.

Value

a list of quoted variables

Examples

as.quoted(c("a", "b", "log(d)"))
as.quoted(a ~ b + log(d))

Yearly batting records for all major league baseball players

Description

This data frame contains batting statistics for a subset of players collected from http://www.baseball-databank.org/. There are a total of 21,699 records, covering 1,228 players from 1871 to 2007. Only players with more than 15 seasons of play are included.

Usage

baseball

Format

A 21699 x 22 data frame

Variables

Variables:

id, unique player id
year, year of data
stint
team, team played for
lg, league
g, number of games
ab, number of times at bat
r, number of runs
h, hits, times reached base because of a batted, fair ball without error by the defense
X2b, hits on which the batter reached second base safely
X3b, hits on which the batter reached third base safely
hr, number of home runs
rbi, runs batted in
sb, stolen bases
cs, caught stealing
bb, base on balls (walk)
so, strike outs
ibb, intentional base on balls
hbp, hits by pitch
sh, sacrifice hits
sf, sacrifice flies
gidp, ground into double play

References

http://www.baseball-databank.org/

Examples

baberuth <- subset(baseball, id == "ruthba01")
baberuth$cyear <- baberuth$year - min(baberuth$year) + 1

calculate_cyear <- function(df) {
  mutate(df,
    cyear = year - min(year),
    cpercent = cyear / (max(year) - min(year))
  )
}

baseball <- ddply(baseball, .(id), calculate_cyear)
baseball <- subset(baseball, ab >= 25)

model <- function(df) {
  lm(rbi / ab ~ cyear, data=df)
}
model(baberuth)
models <- dlply(baseball, .(id), model)

Column-wise function.

Description

Turn a function that operates on a vector into a function that operates column-wise on a data.frame.

Usage

colwise(.fun, .cols = true, ...)

catcolwise(.fun, ...)

numcolwise(.fun, ...)

Arguments

.fun

function

.cols

either a function that tests columns for inclusion, or a quoted object giving which columns to process

...

other arguments passed on to .fun

Details

catcolwise and numcolwise provide versions that only operate on discrete and numeric variables respectively.

Examples

# Count number of missing values
nmissing <- function(x) sum(is.na(x))

# Apply to every column in a data frame
colwise(nmissing)(baseball)
# This syntax looks a little different.  It is shorthand for the
# following:
f <- colwise(nmissing)
f(baseball)

# This is particularly useful in conjunction with d*ply
ddply(baseball, .(year), colwise(nmissing))

# To operate only on specified columns, supply them as the second
# argument.  Many different forms are accepted.
ddply(baseball, .(year), colwise(nmissing, .(sb, cs, so)))
ddply(baseball, .(year), colwise(nmissing, c("sb", "cs", "so")))
ddply(baseball, .(year), colwise(nmissing, ~ sb + cs + so))

# Alternatively, you can specify a boolean function that determines
# whether or not a column should be included
ddply(baseball, .(year), colwise(nmissing, is.character))
ddply(baseball, .(year), colwise(nmissing, is.numeric))
ddply(baseball, .(year), colwise(nmissing, is.discrete))

# These last two cases are particularly common, so some shortcuts are
# provided:
ddply(baseball, .(year), numcolwise(nmissing))
ddply(baseball, .(year), catcolwise(nmissing))

# You can supply additional arguments to either colwise, or the function
# it generates:
numcolwise(mean)(baseball, na.rm = TRUE)
numcolwise(mean, na.rm = TRUE)(baseball)

Compact list.

Description

Remove all NULL entries from a list

Usage

compact(l)

Arguments

l

list
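A short sketch of compact on a named list follows; the list contents are arbitrary.

```r
library(plyr)

values <- list(a = 1, b = NULL, c = "x", d = NULL)

# Drop the NULL entries; the remaining elements keep their names
kept <- compact(values)
names(kept)
```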

Count the number of occurrences.

Description

Equivalent to as.data.frame(table(x)), but does not include combinations with zero counts.

Usage

count(df, vars = NULL, wt_var = NULL)

Arguments

df

data frame to be processed

vars

variables to count unique values of

wt_var

optional variable to weight by - if this is non-NULL, count will sum up the value of this variable for each combination of id variables.

Details

Speed-wise count is competitive with table for single variables, but it really comes into its own when summarising multiple dimensions because it only counts combinations that actually occur in the data.

Compared to table + as.data.frame, count also preserves the type of the identifier variables, instead of converting them to characters/factors.
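Both differences from table + as.data.frame can be seen on a small case. The sketch assumes the built-in mtcars data set, in which not every combination of cyl and gear occurs.

```r
library(plyr)

counted <- count(mtcars, c("cyl", "gear"))
tabled <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))

nrow(counted)  # only combinations present in the data
nrow(tabled)   # every cyl x gear combination, including zero counts

class(counted$cyl)  # identifier type is preserved
class(tabled$cyl)   # converted to factor
```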

Value

a data frame with label and freq columns

Examples

# Count of each value of "id" in the first 100 cases
count(baseball[1:100,], vars = "id")
# Count of ids, weighted by their "g" loading
count(baseball[1:100,], vars = "id", wt_var = "g")
count(baseball, "id", "ab")
count(baseball, "lg")
# How many stints do players do?
count(baseball, "stint")
# Count of times each player appeared in each of the years they played
count(baseball[1:100,], c("id", "year"))
# Count of counts
count(count(baseball[1:100,], c("id", "year")), "id", "freq")
count(count(baseball, c("id", "year")), "freq")

Create progress bar.

Description

Create progress bar object from text string.

Usage

create_progress_bar(name = "none", ...)

Arguments

name

type of progress bar to create

...

other arguments passed onto progress bar function

Details

Progress bars give feedback on how the apply step is proceeding. This is mainly useful for long running functions, as for short functions, the time taken up by splitting and combining may be on the same order (or longer) as the apply step. Additionally, for short functions, the time needed to update the progress bar can significantly slow down the process. For the trivial examples below, using the tk progress bar slows things down by a factor of a thousand.

Note that the progress bar is approximate, and if the time taken by individual function applications is highly non-uniform it may not be very informative of the time left.

There are currently four types of progress bar: "none", "text", "tk", and "win". See the individual documentation for more details. In plyr functions, these can either be specified by name, or you can create the progress bar object yourself if you want more control over its appearance. See the examples.

Examples

# No progress bar
l_ply(1:100, identity, .progress = "none")
## Not run: 
# Use the Tcl/Tk interface
l_ply(1:100, identity, .progress = "tk")

## End(Not run)
# Text-based progress (|======|)
l_ply(1:100, identity, .progress = "text")
# Choose a progress character, run a length of time you can see
l_ply(1:10000, identity, .progress = progress_text(char = "."))

Split data frame, apply function, and discard results.

Description

For each subset of a data frame, apply function and discard results. To apply a function for each row, use a_ply with .margins set to 1.

Usage

d_ply(
  .data,
  .variables,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .print = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

data frame to be processed

.variables

variables to split data frame by, as as.quoted variables, a formula or character vector

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop

should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)

.print

automatically print each result? (default: FALSE)

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Value

Nothing

Input

This function splits data frames by variables.

Output

All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
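A minimal sketch of d_ply used only for its side effect is given below. It assumes the built-in mtcars data set and a reporting function written for this example.

```r
library(plyr)

# Report the size of each cylinder group; nothing is collected
result <- d_ply(mtcars, "cyl", function(df) {
  cat("cyl =", df$cyl[1], "has", nrow(df), "cars\n")
})

# The return value carries no results
is.null(result)
```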

Split data frame, apply function, and return results in an array.

Description

For each subset of data frame, apply function then combine results into an array. daply with a function that operates column-wise is similar to aggregate. To apply a function for each row, use aaply with .margins set to 1.

Usage

daply(
  .data,
  .variables,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop_i = TRUE,
  .drop_o = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

data frame to be processed

.variables

variables to split data frame by, as quoted variables, a formula or character vector

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop_i

should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)

.drop_o

should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to TRUE

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Value

if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)

Input

This function splits data frames by variables.

Output

If there are no results, then this function will return a vector of length 0 (vector()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

daply(baseball, .(year), nrow)

# Several different ways of summarising by variables that should not be
# included in the summary

daply(baseball[, c(2, 6:9)], .(year), colwise(mean))
daply(baseball[, 6:9], .(baseball$year), colwise(mean))
daply(baseball, .(year), function(df) colwise(mean)(df[, 6:9]))

Split data frame, apply function, and return results in a data frame.

Description

For each subset of a data frame, apply function then combine results into a data frame. To apply a function for each row, use adply with .margins set to 1.

Usage

ddply(
  .data,
  .variables,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

data frame to be processed

.variables

variables to split data frame by, as as.quoted variables, a formula or character vector

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop

should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Value

A data frame, as described in the output section.

Input

This function splits data frames by variables.

Output

If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

# Summarize a dataset by two variables
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
  mean = round(mean(age), 2),
  sd = round(sd(age), 2))

# An example using a formula for .variables
ddply(baseball[1:100,], ~ year, nrow)
# Applying two functions; nrow and ncol
ddply(baseball, .(lg), c("nrow", "ncol"))

# Calculate mean runs batted in for each year
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))
# Plot a line chart of the result
plot(mean_rbi ~ year, type = "l", data = rbi)

# make new variable career_year based on the
# start year for each player (id)
base2 <- ddply(baseball, .(id), mutate,
  career_year = year - min(year) + 1
)

Set defaults.

Description

Convenient method for combining a list of values with their defaults.

Usage

defaults(x, y)

Arguments

x

list of values

y

defaults
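As a hedged illustration, the sketch below fills in missing entries of a list from a list of defaults. The option names colour and size are invented for the example.

```r
library(plyr)

user <- list(colour = "red")
fallback <- list(colour = "black", size = 1)

# Values in x win; entries missing from x are taken from y
merged <- defaults(user, fallback)
str(merged)
```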

Descending order.

Description

Transform a vector into a format that will be sorted in descending order.

Usage

desc(x)

Arguments

x

vector to transform

Examples

desc(1:10)
desc(factor(letters))
first_day <- seq(as.Date("1910/1/1"), as.Date("1920/1/1"), "years")
desc(first_day)

Number of dimensions.

Description

Number of dimensions of an array or vector

Usage

dims(x)

Arguments

x

array

Split data frame, apply function, and return results in a list.

Description

For each subset of a data frame, apply function then combine results into a list. dlply is similar to by except that the results are returned in a different format. To apply a function for each row, use alply with .margins set to 1.

Usage

dlply(
  .data,
  .variables,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

data frame to be processed

.variables

variables to split data frame by, as as.quoted variables, a formula or character vector

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop

should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Value

list of results

Input

This function splits data frames by variables.

Output

If there are no results, then this function will return a list of length 0 (list()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

linmod <- function(df) {
  lm(rbi ~ year, data = mutate(df, year = year - min(year)))
}
models <- dlply(baseball, .(id), linmod)
models[[1]]

coef <- ldply(models, coef)
with(coef, plot(`(Intercept)`, year))
qual <- laply(models, function(mod) summary(mod)$r.squared)
hist(qual)

Aggregate multiple functions into a single function.

Description

Combine multiple functions into a single function returning a named vector of outputs. Note: you cannot supply additional parameters for the summary functions

Usage

each(...)

Arguments

...

functions to combine. Each function should produce a single number as output

Examples

# Call min() and max() on the vector 1:10
each(min, max)(1:10)
# This syntax looks a little different.  It is shorthand for the
# following:
f <- each(min, max)
f(1:10)
# Three equivalent ways to call min() and max() on the vector 1:10
each("min", "max")(1:10)
each(c("min", "max"))(1:10)
each(c(min, max))(1:10)
# Call length(), mean() and var() on a random normal vector
each(length, mean, var)(rnorm(100))

Check if a data frame is empty.

Description

Empty if it's null or it has 0 rows or columns

Usage

empty(df)

Arguments

df

data frame to check
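The three conditions in the description can be checked one by one. The sketch assumes the built-in mtcars data set for the non-empty case.

```r
library(plyr)

empty(NULL)          # NULL counts as empty
empty(data.frame())  # zero rows and zero columns
empty(mtcars[0, ])   # columns but no rows
empty(mtcars)        # a populated data frame is not empty
```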

Evaluate a quoted list of variables.

Description

Evaluates quoted variables in specified environment

Usage

eval.quoted(exprs, envir = NULL, enclos = NULL, try = FALSE)

Arguments

exprs

quoted object to evaluate

try

if TRUE, return NULL if evaluation unsuccessful

Value

a list
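A small sketch of evaluating quoted variables against a data frame follows. It assumes that a data frame may be passed as envir, and uses the built-in mtcars data set; the name log_mpg is chosen for the example.

```r
library(plyr)

# Quote two expressions without evaluating them
vars <- .(cyl, log_mpg = log(mpg))

# Evaluate each expression using the columns of mtcars
values <- eval.quoted(vars, mtcars)

names(values)
length(values$log_mpg)
```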

Fail with specified value.

Description

Modify a function so that it returns a default value when there is an error.

Usage

failwith(default = NULL, f, quiet = FALSE)

Arguments

default

default value

f

function

quiet

should all error messages be suppressed?

Value

a function

Examples

f <- function(x) if (x == 1) stop("Error!") else 1
## Not run: 
f(1)
f(2)

## End(Not run)

safef <- failwith(NULL, f)
safef(1)
safef(2)

Capture current evaluation context.

Description

This function captures the current context, making it easier to use **ply with functions that do special evaluation and need access to the environment where ddply was called from.

Usage

here(f)

Arguments

f

a function that does non-standard evaluation

Author(s)

Peter Meilstrup, https://github.com/crowding

Examples

df <- data.frame(a = rep(c("a","b"), each = 10), b = 1:20)
f1 <- function(label) {
   ddply(df, "a", mutate, label = paste(label, b))
}
## Not run: f1("name:")
# Doesn't work because mutate can't find label in the current scope

f2 <- function(label) {
   ddply(df, "a", here(mutate), label = paste(label, b))
}
f2("name:")
# Works :)

Compute a unique numeric id for each unique row in a data frame.

Description

Properties:

order(id) is equivalent to do.call(order, df)
rows containing the same data have the same value
if drop = FALSE then there is room for all possibilities

Usage

id(.variables, drop = FALSE)

Arguments

.variables

list of variables

drop

drop unused factor levels?

Value

a numeric vector with attribute n, giving total number of possibilities
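The properties listed in the description can be checked on a small data frame. This is only a sketch on made-up data (the columns g and h are hypothetical).

```r
library(plyr)

df <- data.frame(g = c("a", "b", "a", "b"), h = c(1, 1, 1, 2))

ids <- id(df)
ids                 # rows 1 and 3 hold the same data, so they share an id
attr(ids, "n")      # with drop = FALSE: room for every g x h combination

# Ordering by id matches ordering by the columns themselves
same_order <- identical(order(ids), do.call(order, df))
same_order

# With drop = TRUE only the combinations that actually occur are counted
n_used <- attr(id(df, drop = TRUE), "n")
n_used
```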

Numeric id for a vector.

Description

Numeric id for a vector.

Usage

id_var(x, drop = FALSE)

Construct an immutable data frame.

Description

An immutable data frame works like an ordinary data frame, except that when you subset it, it returns a reference to the original data frame, not a copy. This makes subsetting substantially faster and has a big impact when you are working with large datasets with many groups.

Usage

idata.frame(df)

Arguments

df

a data frame

Details

This method is still a little experimental, so please let me know if you run into any problems.

Value

an immutable data frame

Examples

system.time(dlply(baseball, "id", nrow))
system.time(dlply(idata.frame(baseball), "id", nrow))

An indexed array.

Description

Create an indexed array, a space efficient way of indexing into a large array.

Usage

indexed_array(env, index)

Arguments

env

environment containing data frame

index

list of indices

An indexed data frame.

Description

Create an indexed list, a space efficient way of indexing into a large data frame.

Usage

indexed_df(data, index, vars)

Arguments

data

environment containing data frame

index

list of indices

vars

a character vector giving the variables used for subsetting

Determine if a vector is discrete.

Description

A discrete vector is a factor or a character vector

Usage

is.discrete(x)

Arguments

x

vector to test

Examples

is.discrete(1:10)
is.discrete(c("a", "b", "c"))
is.discrete(factor(c("a", "b", "c")))

Is a formula? Checks if argument is a formula.

Description

Is a formula? Checks if argument is a formula.

Usage

is.formula(x)

Split iterator that returns values, not indices.

Description

Split iterator that returns values, not indices.

Usage

isplit2(x, f, drop = FALSE, ...)

Warning

Deprecated, do not use in new code.

Join two data frames together.

Description

Join, like merge, is designed for the types of problems where you would use a SQL join.

Usage

join(x, y, by = NULL, type = "left", match = "all")

Arguments

x

data frame

y

data frame

by

character vector of variable names to join by. If omitted, will match on all common variables.

type

type of join: left (default), right, inner or full. See details for more information.

match

how should duplicate ids be matched? Either match just the "first" matching row, or match "all" matching rows. Defaults to "all" for compatibility with merge, but "first" is significantly faster.

Details

The four join types return:

inner: only rows with matching keys in both x and y
left: all rows in x, adding matching columns from y
right: all rows in y, adding matching columns from x
full: all rows in x with matching columns in y, then the rows of y that don't match x.

Note that from plyr 1.5, join will (by default) return all matches, not just the first match, as it did previously.

Unlike merge, join preserves the order of x no matter what join type is used. If needed, rows from y will be added to the bottom. Join is often faster than merge, although it is somewhat less featureful - it currently offers no way to rename output or merge on different variables in the x and y data frames.
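The four join types and the effect of the match argument can be compared on a toy example. The data frames below are made up for the illustration; y deliberately contains a duplicated key (k = 3) so that match = "all" and match = "first" give different row counts.

```r
library(plyr)

x <- data.frame(k = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(k = c(2, 3, 3, 4), b = c("y2", "y3a", "y3b", "y4"))

left  <- join(x, y, by = "k", type = "left")    # every row of x, one per match in y
right <- join(x, y, by = "k", type = "right")   # every row of y
inner <- join(x, y, by = "k", type = "inner")   # only keys present in both
full  <- join(x, y, by = "k", type = "full")    # x rows first, then unmatched y rows

# match = "first" keeps only the first matching row of y for each row of x
left_first <- join(x, y, by = "k", type = "left", match = "first")

sapply(list(left = left, right = right, inner = inner, full = full,
            left_first = left_first), nrow)
```

Supplying by explicitly, as here, avoids relying on the default of matching on all common variables.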

Examples

first <- ddply(baseball, "id", summarise, first = min(year))
system.time(b2 <- merge(baseball, first, by = "id", all.x = TRUE))
system.time(b3 <- join(baseball, first, by = "id"))

b2 <- arrange(b2, id, year, stint)
b3 <- arrange(b3, id, year, stint)
stopifnot(all.equal(b2, b3))

Join keys. Given two data frames, create a unique key for each row.

Description

Join keys. Given two data frames, create a unique key for each row.

Usage

join.keys(x, y, by)

Arguments

x

data frame

y

data frame

by

character vector of variable names to join by
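A small sketch of what the keys look like, on made-up data. It assumes the result is a list whose x and y components hold the integer keys for the rows of each input, which is how the function appears to behave; treat that structure as an assumption rather than a documented guarantee.

```r
library(plyr)

x <- data.frame(k = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(k = c(2, 3, 4), b = c("y2", "y3", "y4"))

keys <- join.keys(x, y, by = "k")

# Rows with the same value of k receive the same key in both components
keys$x
keys$y
```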

Recursively join a list of data frames.

Description

Recursively join a list of data frames.

Usage

join_all(dfs, by = NULL, type = "left", match = "all")

Arguments

dfs

A list of data frames.

by

character vector of variable names to join by. If omitted, will match on all common variables.

type

type of join: left (default), right, inner or full. See details for more information.

match

how should duplicate ids be matched? Either match just the "first" matching row, or match "all" matching rows. Defaults to "all" for compatibility with merge, but "first" is significantly faster.

Examples

dfs <- list(
  a = data.frame(x = 1:10, a = runif(10)),
  b = data.frame(x = 1:10, b = runif(10)),
  c = data.frame(x = 1:10, c = runif(10))
)
join_all(dfs)
join_all(dfs, "x")

Split list, apply function, and discard results.

Description

For each element of a list, apply function and discard results

Usage

l_ply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .print = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

list to be processed

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.print

automatically print each result? (default: FALSE)

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Value

Nothing

Input

This function splits lists by elements.

Output

All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

l_ply(llply(mtcars, round), table, .print = TRUE)
l_ply(baseball, function(x) print(summary(x)))

Split list, apply function, and return results in an array.

Description

For each element of a list, apply function then combine results into an array.

Usage

laply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

list to be processed

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop

should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to TRUE

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Details

laply is similar in spirit to sapply except that it will always return an array, and the output is transposed with respect to sapply - each element of the list corresponds to a row, not a column.
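The difference in orientation can be seen by applying the same function with both tools. The helper square_pair is hypothetical and exists only for this sketch.

```r
library(plyr)

# Hypothetical helper returning two values per input element
square_pair <- function(i) c(value = i, squared = i^2)

via_sapply <- sapply(1:3, square_pair)  # one column per input element
via_laply  <- laply(1:3, square_pair)   # one row per input element

dim(via_sapply)
dim(via_laply)
```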

Value

if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)

Input

This function splits lists by elements.

Output

If there are no results, then this function will return a vector of length 0 (vector()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

laply(baseball, is.factor)
# cf
ldply(baseball, is.factor)
colwise(is.factor)(baseball)

laply(seq_len(10), identity)
laply(seq_len(10), rep, times = 4)
laply(seq_len(10), matrix, nrow = 2, ncol = 2)

Split list, apply function, and return results in a data frame.

Description

For each element of a list, apply function then combine results into a data frame.

Usage

ldply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL,
  .id = NA
)

Arguments

.data

list to be processed

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

.id

name of the index column (used if .data is a named list). Pass NULL to avoid creation of the index column. For compatibility, omit this argument or pass NA to avoid converting the index column to a factor; in this case, ".id" is used as column name.

Value

A data frame, as described in the output section.

Input

This function splits lists by elements.

Output

If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
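The behaviour of the .id argument described above can be sketched with a small named list. The list pieces, the helper describe and the column name "group" are all made up for this illustration.

```r
library(plyr)

pieces <- list(a = 1:3, b = 4:6)

# Hypothetical summary function returning a one-row data frame
describe <- function(x) data.frame(mean = mean(x), n = length(x))

default_id <- ldply(pieces, describe)                 # index column named ".id"
named_id   <- ldply(pieces, describe, .id = "group")  # index column named "group"
no_id      <- ldply(pieces, describe, .id = NULL)     # no index column

names(default_id)
names(named_id)
names(no_id)
```

With a custom name the index column is a factor, whereas the default ".id" column is left unconverted, as the argument description states.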

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Experimental iterator based version of llply.

Description

Because iterators do not have known length, liply starts by allocating an output list of length 50, and then doubles that length whenever it runs out of space. This gives O(n ln n) performance rather than the O(n ^ 2) performance from the naive strategy of growing the list each time.
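The doubling strategy can be sketched in plain R. This is not liply's actual implementation: the functions collect_doubling and make_counter are hypothetical, and a simple closure that returns NULL when exhausted stands in for a real iterator object.

```r
# Hypothetical helper: collect values from a generator of unknown length,
# doubling the preallocated output list whenever it fills up.
collect_doubling <- function(next_value, initial = 50) {
  out <- vector("list", initial)
  n <- 0
  repeat {
    value <- next_value()
    if (is.null(value)) break          # generator is exhausted
    n <- n + 1
    if (n > length(out)) {
      length(out) <- 2 * length(out)   # grow by doubling, not by one
    }
    out[[n]] <- value
  }
  out[seq_len(n)]                      # trim the unused slots
}

# Hypothetical stand-in for an iterator: yields 1, 2, ..., limit, then NULL
make_counter <- function(limit) {
  i <- 0
  function() {
    if (i >= limit) return(NULL)
    i <<- i + 1
    i
  }
}

res <- collect_doubling(make_counter(120))
length(res)
```

Collecting 120 values here triggers two reallocations (50 to 100 to 200) instead of one per element.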

Usage

liply(.iterator, .fun = NULL, ...)

Arguments

.iterator

iterator object

.fun

function to apply to each piece

...

other arguments passed on to .fun

Warning

Deprecated, do not use in new code.

List to array.

Description

Reduce/simplify a list of homogeneous objects to an array

Usage

list_to_array(res, labels = NULL, .drop = FALSE)

Arguments

res

list of input data

labels

a data frame of labels, one row for each element of res

.drop

should extra dimensions be dropped (TRUE) or preserved (FALSE)

List to data frame.

Description

Reduce/simplify a list of homogeneous objects to a data frame. All NULL entries are removed. Remaining entries must be all atomic or all data frames.

Usage

list_to_dataframe(res, labels = NULL, id_name = NULL, id_as_factor = FALSE)

Arguments

res

list of input data

labels

a data frame of labels, one row for each element of res

id_name

the name of the index column, NULL for no index column

List to vector.

Description

Reduce/simplify a list of homogeneous objects to a vector

Usage

list_to_vector(res)

Arguments

res

list of input data

Split list, apply function, and return results in a list.

Description

For each element of a list, apply function, keeping results as a list.

Usage

llply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

list to be processed

.fun

function to apply to each piece

...

other arguments passed on to .fun

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Details

llply is equivalent to lapply except that it will preserve labels and can display a progress bar.

Value

list of results

Input

This function splits lists by elements.

Output

If there are no results, then this function will return a list of length 0 (list()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

llply(llply(mtcars, round), table)
llply(baseball, summary)
# Examples from ?lapply
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))

llply(x, mean)
llply(x, quantile, probs = 1:3/4)

Loop apply

Description

An optimised version of lapply for the special case of operating on seq_len(n)

Usage

loop_apply(n, f, env = parent.frame())

Arguments

n

length of sequence

f

function to apply to each integer

env

environment in which to evaluate function

Call function with arguments in array or data frame, discarding results.

Description

Call a multi-argument function with values taken from columns of a data frame or array, and discard the results.

Usage

m_ply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .print = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

matrix or data frame to use as source of arguments

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.print

automatically print each result? (default: FALSE)

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Details

The m*ply functions are the plyr version of mapply, specialised according to the type of output they produce. These functions are just a convenient wrapper around a*ply with margins = 1 and .fun wrapped in splat.
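The wrapper relationship described above can be illustrated by calling mlply and then spelling out the equivalent alply call by hand. The data frame arg_df is made up for the sketch; its column names match the x and times arguments of rep.

```r
library(plyr)

# Each row supplies one set of arguments to rep(): x and times
arg_df <- data.frame(x = 1:3, times = 3:1)

via_mlply <- mlply(arg_df, rep)
via_alply <- alply(arg_df, 1, splat(rep))   # margins = 1, function wrapped in splat

unlist(via_mlply, use.names = FALSE)
unlist(via_alply, use.names = FALSE)
```

Both calls should yield the same values; the m*ply form simply saves writing the margin and the splat wrapper.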

Value

Nothing

Input

Call a multi-argument function with values taken from columns of a data frame or array

Output

All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Call function with arguments in array or data frame, returning an array.

Description

Call a multi-argument function with values taken from columns of a data frame or array, and combine results into an array.

Usage

maply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

matrix or data frame to use as source of arguments

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop

should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to TRUE

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Details

Value

if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)

Input

Call a multi-argument function with values taken from columns of a data frame or array

Output

If there are no results, then this function will return a vector of length 0 (vector()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

maply(cbind(mean = 1:5, sd = 1:5), rnorm, n = 5)
maply(expand.grid(mean = 1:5, sd = 1:5), rnorm, n = 5)
maply(cbind(1:5, 1:5), rnorm, n = 5)

Replace specified values with new values, in a vector or factor.

Description

Items in x that match items in from will be replaced by items in to, matched by position. For example, items in x that match the first element in from will be replaced by the first element of to.

Usage

mapvalues(x, from, to, warn_missing = TRUE)

Arguments

x

the factor or vector to modify

from

a vector of the items to replace

to

a vector of replacement values

warn_missing

print a message if any of the old values are not actually present in x

Details

If x is a factor, the matching levels of the factor will be replaced with the new values.

The related revalue function works only on character vectors and factors, but this function works on vectors of any type and factors.

Examples

x <- c("a", "b", "c")
mapvalues(x, c("a", "c"), c("A", "C"))

# Works on factors
y <- factor(c("a", "b", "c", "a"))
mapvalues(y, c("a", "c"), c("A", "C"))

# Works on numeric vectors
z <- c(1, 4, 5, 9)
mapvalues(z, from = c(1, 5, 9), to = c(10, 50, 90))

Extract matching rows of a data frame.

Description

Match works in the same way as join, but instead of returning the combined dataset, it only returns the matching rows from the first dataset. This is particularly useful when you've summarised the data in some way and want to subset the original data by a characteristic of the subset.

Usage

match_df(x, y, on = NULL)

Arguments

x

data frame to subset.

y

data frame defining matching rows.

on

variables to match on - by default will use all variables common to both data frames.

Details

match_df shares the same semantics as join, not match:

the match criterion is ==, not identical
it doesn't work for columns that are not atomic vectors
if there are no matches, the row will be omitted

Value

a data frame

Examples

# count the occurrences of each id in the baseball dataframe, then get the subset with a freq >25
longterm <- subset(count(baseball, "id"), freq > 25)
# longterm
#             id freq
# 30   ansonca01   27
# 48   baineha01   27
# ...
# Select only rows from these longterm players from the baseball dataframe
# (match would default to match on shared column names, but here was explicitly set "id")
bb_longterm <- match_df(baseball, longterm, on="id")
bb_longterm[1:5,]

Call function with arguments in array or data frame, returning a data frame.

Description

Call a multi-argument function with values taken from columns of a data frame or array, and combine results into a data frame.

Usage

mdply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

matrix or data frame to use as source of arguments

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Details

Value

A data frame, as described in the output section.

Input

Call a multi-argument function with values taken from columns of a data frame or array

Output

If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

mdply(data.frame(mean = 1:5, sd = 1:5), rnorm, n = 2)
mdply(expand.grid(mean = 1:5, sd = 1:5), rnorm, n = 2)
mdply(cbind(mean = 1:5, sd = 1:5), rnorm, n = 5)
mdply(cbind(mean = 1:5, sd = 1:5), as.data.frame(rnorm), n = 5)

Call function with arguments in array or data frame, returning a list.

Description

Call a multi-argument function with values taken from columns of a data frame or array, and combine results into a list.

Usage

mlply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

matrix or data frame to use as source of arguments

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

Details

Value

list of results

Input

Call a multi-argument function with values taken from columns of a data frame or array

Output

If there are no results, then this function will return a list of length 0 (list()).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

mlply(cbind(1:4, 4:1), rep)
mlply(cbind(1:4, times = 4:1), rep)

mlply(cbind(1:4, 4:1), seq)
mlply(cbind(1:4, length = 4:1), seq)
mlply(cbind(1:4, by = 4:1), seq, to = 20)

Mutate a data frame by adding new or replacing existing columns.

Description

This function is very similar to transform but it executes the transformations iteratively so that later transformations can use the columns created by earlier transformations. Like transform, unnamed components are silently dropped.

Usage

mutate(.data, ...)

Arguments

.data

the data frame to transform

...

named parameters giving definitions of new columns.

Details

Mutate seems to be considerably faster than transform for large data frames.

Examples

# Examples from transform
mutate(airquality, Ozone = -Ozone)
mutate(airquality, new = -Ozone, Temp = (Temp - 32) / 1.8)

# Things transform can't do
mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# mutate is rather faster than transform
system.time(transform(baseball, avg_ab = ab / g))
system.time(mutate(baseball, avg_ab = ab / g))

Toggle row names between explicit and implicit.

Description

Plyr functions ignore row names, so this function provides a way to preserve them by converting them to an explicit column in the data frame. After the plyr operation, you can then apply name_rows again to convert back from the explicit column to the implicit rownames.

Usage

name_rows(df)

Arguments

df

a data.frame, with either rownames, or a column called .rownames.

Examples

name_rows(mtcars)
name_rows(name_rows(mtcars))

df <- data.frame(a = sample(10))
arrange(df, a)
arrange(name_rows(df), a)
name_rows(arrange(name_rows(df), a))

Compute names of quoted variables.

Description

Figure out names of quoted variables, using specified names if they exist, otherwise converting the values to character strings. This may create variable names that can only be accessed using ``.

Usage

## S3 method for class 'quoted'
names(x)
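A short sketch of the naming rule: a name supplied by the user is kept, and an unnamed expression is turned into a string. The second half shows why backticks can become necessary; the grouping expression cyl * 2 on mtcars is an arbitrary choice for the illustration.

```r
library(plyr)

vars <- .(a, b = c, d * 2)
quoted_names <- names(vars)
quoted_names

# Grouping by an unnamed expression produces a column named after its text,
# which is not a syntactic R name and so needs backticks to access.
counts <- ddply(mtcars, .(cyl * 2), nrow)
names(counts)
counts$`cyl * 2`
```

Supplying an explicit name, as with b = c above, avoids the need for backticks later.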

Number of unique values.

Description

Calculate number of unique values of a variable as efficiently as possible.

Usage

nunique(x)

Arguments

x

vector

Monthly ozone measurements over Central America.

Description

This data set is a subset of the data from the 2006 ASA Data expo challenge, https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2006. The data are monthly ozone averages on a very coarse 24 by 24 grid covering Central America, from Jan 1995 to Dec 2000. The data is stored in a 3d array with the first two dimensions representing latitude and longitude, and the third representing time.

Usage

ozone

Format

A 24 x 24 x 72 numeric array

References

https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2006

Examples

value <- ozone[1, 1, ]
time <- 1:72
month.abbr <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
month <- factor(rep(month.abbr, length = 72), levels = month.abbr)
year <- rep(1:6, each = 12)
deseasf <- function(value) lm(value ~ month - 1)

models <- alply(ozone, 1:2, deseasf)
coefs <- laply(models, coef)
dimnames(coefs)[[3]] <- month.abbr
names(dimnames(coefs))[3] <- "month"

deseas <- laply(models, resid)
dimnames(deseas)[[3]] <- 1:72
names(dimnames(deseas))[3] <- "time"

dim(coefs)
dim(deseas)

Deprecated Functions in Package plyr

Description

These functions are provided for compatibility with older versions of plyr only, and may be defunct as soon as the next release.

Details

liply
isplit2

Print quoted variables.

Description

Display the structure of quoted variables

Usage

## S3 method for class 'quoted'
print(x, ...)

Print split.

Description

Don't print labels, so it appears like a regular list

Usage

## S3 method for class 'split'
print(x, ...)

Arguments

x

object to print

...

unused

Null progress bar

Description

A progress bar that does nothing

Usage

progress_none()

Details

This is the default progress bar used by plyr functions. It's very simple to understand - it does nothing!

Examples

l_ply(1:100, identity, .progress = "none")

Text progress bar.

Description

A textual progress bar

Usage

progress_text(style = 3, ...)

Arguments

style

style of text bar, see Details section of txtProgressBar

...

other arguments passed on to txtProgressBar

Details

This progress bar displays a textual progress bar that works on all platforms. It is a thin wrapper around the built-in setTxtProgressBar and can be customised in the same way.

Examples

l_ply(1:100, identity, .progress = "text")
l_ply(1:100, identity, .progress = progress_text(char = "-"))

Text progress bar with time.

Description

A textual progress bar that estimates time remaining. It displays the estimated time remaining and, when finished, total duration.

Usage

progress_time()

Examples

l_ply(1:100, function(x) Sys.sleep(.01), .progress = "time")

Graphical progress bar, powered by Tk.

Description

A graphical progress bar displayed in a Tk window

Usage

progress_tk(title = "plyr progress", label = "Working...", ...)

Arguments

title

window title

label

progress bar label (inside window)

...

other arguments passed on to tkProgressBar

Details

This graphical progress bar will appear in a separate window.

Examples

## Not run: 
l_ply(1:100, identity, .progress = "tk")
l_ply(1:100, identity, .progress = progress_tk(width=400))
l_ply(1:100, identity, .progress = progress_tk(label=""))

## End(Not run)

Graphical progress bar, powered by Windows.

Description

A graphical progress bar displayed in a separate window

Usage

progress_win(title = "plyr progress", ...)

Arguments

title

window title

...

other arguments passed on to winProgressBar

Details

This graphical progress bar only works on Windows.

Examples

## Not run: 
l_ply(1:100, identity, .progress = "win")
l_ply(1:100, identity, .progress = progress_win(title="Working..."))

## End(Not run)

Quick data frame.

Description

Experimental version of as.data.frame that converts a list to a data frame, but doesn't do any checks to make sure it's a valid format. Much faster.

Usage

quickdf(list)

Arguments

list

list to convert to data frame
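A minimal usage sketch on a made-up list. Because no validity checks are performed, the caller is responsible for passing components of equal length; the list cols below satisfies that.

```r
library(plyr)

# Hypothetical input: a named list whose components all have the same length
cols <- list(id = 1:3, label = c("a", "b", "c"))

df <- quickdf(cols)
df
class(df)
```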

Replicate expression and discard results.

Description

Evaluate expression n times then discard results

Usage

r_ply(.n, .expr, .progress = "none", .print = FALSE)

Arguments

.n

number of times to evaluate the expression

.expr

expression to evaluate

.progress

name of the progress bar to use, see create_progress_bar

.print

automatically print each result? (default:FALSE)

Details

This function runs an expression multiple times, discarding the results. This function is equivalent to replicate, but never returns anything.

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

r_ply(10, plot(runif(50)))
r_ply(25, hist(runif(1000)))

Replicate expression and return results in an array.

Description

Evaluate expression n times then combine results into an array

Usage

raply(.n, .expr, .progress = "none", .drop = TRUE)

Arguments

.n

number of times to evaluate the expression

.expr

expression to evaluate

.progress

name of the progress bar to use, see create_progress_bar

.drop

should extra dimensions of length 1 be dropped, simplifying the output. Defaults to TRUE

Details

This function runs an expression multiple times, and combines the results into an array. If there are no results, then this function returns a vector of length 0 (vector()). This function is equivalent to replicate, but will always return results as a vector, matrix or array.

Value

if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

raply(100, mean(runif(100)))
raply(100, each(mean, var)(runif(100)))

raply(10, runif(4))
raply(10, matrix(runif(4), nrow=2))

# See the central limit theorem in action
hist(raply(1000, mean(rexp(10))))
hist(raply(1000, mean(rexp(100))))
hist(raply(1000, mean(rexp(1000))))

Combine data.frames by row, filling in missing columns.

Description

rbinds a list of data frames filling missing columns with NA.

Usage

rbind.fill(...)

Arguments

...

input data frames to row bind together. The first argument can be a list of data frames, in which case all other arguments are ignored. Any NULL inputs are silently dropped. If all inputs are NULL, the output is NULL.

Details

This is an enhancement to rbind that adds in columns that are not present in all inputs, accepts a list of data frames, and operates substantially faster.

Column names and types in the output will appear in the order in which they were encountered.

Unordered factor columns will have their levels unified and character data bound with factors will be converted to character. POSIXct data will be converted to be in the same time zone. Array and matrix columns must have identical dimensions after the row count. Aside from these there are no general checks that each column is of consistent data type.

Value

a single data frame

Examples

rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])
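A smaller sketch on made-up data makes the NA filling and the column order easier to see than the mtcars example. The data frames first and second are hypothetical.

```r
library(plyr)

first  <- data.frame(a = 1:2, b = c("x", "y"))
second <- data.frame(b = "z", c = TRUE)

combined <- rbind.fill(first, second)
combined
names(combined)   # columns appear in the order they were first encountered

# A list of data frames is accepted as the first argument
from_list <- rbind.fill(list(first, second))
```

Column a is missing from second and column c is missing from first, so the corresponding cells are filled with NA.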

Bind matrices by row, and fill missing columns with NA.

Description

The matrices are bound together using their column names or the column indices (in that order of precedence). Numeric columns may be converted to character beforehand, e.g. using format. If a matrix doesn't have colnames, the column number is used. Note that this means that a column with name "1" is merged with the first column of a matrix without name and so on. The returned matrix will always have column names.

Usage

rbind.fill.matrix(...)

Arguments

...

the matrices to rbind. The first argument can be a list of matrices, in which case all other arguments are ignored.

Details

Vectors are converted to 1-column matrices.

Matrices of factors are not supported. (They are anyway quite inconvenient.) You may convert them first to either numeric or character matrices. If matrices of different types are merged, then normal conversion precedence will apply.

Row names are ignored.

Value

a matrix with column names

Author(s)

C. Beleites

Examples

A <- matrix (1:4, 2)
B <- matrix (6:11, 2)
A
B
rbind.fill.matrix (A, B)

colnames (A) <- c (3, 1)
A
rbind.fill.matrix (A, B)

rbind.fill.matrix (A, 99)

Replicate expression and return results in a data frame.

Description

Evaluate expression n times then combine results into a data frame

Usage

rdply(.n, .expr, .progress = "none", .id = NA)

Arguments

.n

number of times to evaluate the expression

.expr

expression to evaluate

.progress

name of the progress bar to use, see create_progress_bar

.id

name of the index column. Pass NULL to avoid creation of the index column. For compatibility, omit this argument or pass NA to use ".n" as column name.

Details

This function runs an expression multiple times, and combines the result into a data frame. If there are no results, then this function returns a data frame with zero rows and columns (data.frame()). This function is equivalent to replicate, but will always return results as a data frame.

Value

a data frame

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

rdply(20, mean(runif(100)))
rdply(20, each(mean, var)(runif(100)))
rdply(20, data.frame(x = runif(2)))
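The three documented settings of .id can be compared side by side. The sketch below assumes plyr is installed; the column name "rep" is an arbitrary choice made for illustration.

```r
library(plyr)

# Default (.id = NA): the index column is called ".n".
res_default <- rdply(3, data.frame(x = runif(2)))
names(res_default)

# .id = NULL: no index column is created.
res_noid <- rdply(3, data.frame(x = runif(2)), .id = NULL)
names(res_noid)

# A character value names the index column ("rep" is an arbitrary name).
res_named <- rdply(3, data.frame(x = runif(2)), .id = "rep")
names(res_named)
```

Each replicate returns two rows here, so every result has six rows and the index column repeats each replicate number twice.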

Reduce dimensions.

Description

Remove extraneous dimensions

Usage

reduce_dim(x)

Arguments

x

array

Modify names by name, not position.

Description

Modify names by name, not position.

Usage

rename(x, replace, warn_missing = TRUE, warn_duplicated = TRUE)

Arguments

x

named object to modify

replace

named character vector, with new names as values, and old names as names.

warn_missing

print a message if any of the old names are not actually present in x.

warn_duplicated

print a message if any name appears more than once in x after the operation.

Note: x is not altered. To save the result, you need to copy the returned data into a variable.

Examples

x <- c("a" = 1, "b" = 2, d = 3, 4)
# Rename column d to "c", updating the variable "x" with the result
x <- rename(x, replace = c("d" = "c"))
x
# Rename column "disp" to "displacement"
rename(mtcars, c("disp" = "displacement"))

Replace specified values with new values, in a factor or character vector.

Description

If x is a factor, the named levels of the factor will be replaced with the new values.

Usage

revalue(x, replace = NULL, warn_missing = TRUE)

Arguments

x

factor or character vector to modify

replace

named character vector, with new values as values, and old values as names.

warn_missing

print a message if any of the old values are not actually present in x

Details

This function works only on character vectors and factors, but the related mapvalues function works on vectors of any type and factors, and instead of a named vector specifying the original and replacement values, it takes two separate vectors.

Examples

x <- c("a", "b", "c")
revalue(x, c(a = "A", c = "C"))
revalue(x, c("a" = "A", "c" = "C"))
y <- factor(c("a", "b", "c", "a"))
revalue(y, c(a = "A", c = "C"))
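The difference between the two interfaces mentioned in the Details can be sketched as follows, assuming plyr is installed. The same replacement is written once as a named vector for revalue and once as separate from and to vectors for mapvalues.

```r
library(plyr)

x <- c("a", "b", "c")

# revalue: old values are the names, new values are the elements.
by_name <- revalue(x, c(a = "A", c = "C"))

# mapvalues: old and new values are given as two separate vectors.
by_pair <- mapvalues(x, from = c("a", "c"), to = c("A", "C"))

by_name
by_pair

# mapvalues is not limited to character vectors and factors.
nums <- mapvalues(c(1, 2, 3), from = c(1, 3), to = c(10, 30))
nums
```

Both calls leave "b" untouched because it is not listed among the old values.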

Replicate expression and return results in a list.

Description

Evaluate expression n times then combine results into a list

Usage

rlply(.n, .expr, .progress = "none")

Arguments

.n

number of times to evaluate the expression

.expr

expression to evaluate

.progress

name of the progress bar to use, see create_progress_bar

Details

This function runs an expression multiple times, and combines the result into a list. If there are no results, then this function will return a list of length 0 (list()). This function is equivalent to replicate, but will always return results as a list.

Value

list of results

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.

Examples

mods <- rlply(100, lm(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100))))
hist(laply(mods, function(x) summary(x)$r.squared))

Round to multiple of any number.

Description

Round to multiple of any number.

Usage

round_any(x, accuracy, f = round)

Arguments

x

numeric or date-time (POSIXct) vector to round

accuracy

number to round to; for POSIXct objects, a number of seconds

f

rounding function: floor, ceiling or round

Examples

round_any(135, 10)
round_any(135, 100)
round_any(135, 25)
round_any(135, 10, floor)
round_any(135, 100, floor)
round_any(135, 25, floor)
round_any(135, 10, ceiling)
round_any(135, 100, ceiling)
round_any(135, 25, ceiling)
round_any(Sys.time() + 1:10, 5)
round_any(Sys.time() + 1:10, 5, floor)
round_any(Sys.time(), 3600)
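For numeric input, the results above are consistent with applying f to x / accuracy and multiplying back by accuracy. The sketch below assumes that rule and that plyr is installed; round_to is a hypothetical helper written only to make the arithmetic explicit, not a plyr function.

```r
library(plyr)

# Hypothetical helper: the rule assumed here is f(x / accuracy) * accuracy.
round_to <- function(x, accuracy, f = round) f(x / accuracy) * accuracy

x <- c(135, 42, 7.5)

nearest <- round_any(x, 10)           # 140 40 10
down    <- round_any(x, 10, floor)    # 130 40  0
up      <- round_any(x, 10, ceiling)  # 140 50 10

# The helper agrees with round_any on these inputs.
all.equal(nearest, round_to(x, 10))

# Because base round() sends exact halves to the even neighbour,
# values exactly halfway between multiples do not always round up.
halves <- round_any(c(5, 15, 25), 10)  # 0 20 20
```

If halves must always round up, passing a custom rounding function as f is one option.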

‘Splat’ arguments to a function.

Description

Wraps a function in do.call, so instead of taking multiple arguments, it takes a single named list which will be interpreted as its arguments.

Usage

splat(flat)

Arguments

flat

function to splat

Details

This is useful when you want to pass a function a row of a data frame or array, and don't want to manually pull it apart in your function.

Value

a function

Examples

hp_per_cyl <- function(hp, cyl, ...) hp / cyl
splat(hp_per_cyl)(mtcars[1,])
splat(hp_per_cyl)(mtcars)
f <- function(mpg, wt, ...) data.frame(mw = mpg / wt)
ddply(mtcars, .(cyl), splat(f))
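Since the Description says splat wraps the function in do.call, a splatted call should agree with an explicit do.call on the same named list. A minimal sketch, assuming plyr is installed:

```r
library(plyr)

hp_per_cyl <- function(hp, cyl, ...) hp / cyl

# One row of mtcars, used as a named list of arguments.
row <- mtcars[1, ]

via_splat  <- splat(hp_per_cyl)(row)
via_docall <- do.call(hp_per_cyl, as.list(row))

via_splat
via_docall
```

The ... in hp_per_cyl matters: it absorbs the columns of the row that the function does not use, which would otherwise cause an "unused argument" error.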

Split indices.

Description

An optimised version of split for the special case of splitting row indices into groups, as used by splitter_d.

Usage

split_indices(group, n = 0L)

Arguments

group

integer indices

n

largest integer (may not appear in index). This is a hint: if the largest value of group is bigger than n, the output will silently expand.

Examples

split_indices(sample(10, 100, rep = TRUE))
split_indices(sample(10, 100, rep = TRUE), 10)
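With a fixed input the result is easier to read than with random groups. The sketch below assumes plyr is installed and compares split_indices with base split on a small hand-written grouping vector.

```r
library(plyr)

group <- c(1L, 3L, 1L, 2L, 3L)

# Positions of each group value: group 1 -> 1, 3; group 2 -> 4; group 3 -> 2, 5.
idx <- split_indices(group)
idx

# The same grouping computed with base split, with the names removed.
base_idx <- unname(split(seq_along(group), group))

# With n larger than any observed value, trailing groups come back empty.
padded <- split_indices(group, 5L)
length(padded)
```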

Generate labels for split data frame.

Description

Create data frame giving labels for split data frame.

Usage

split_labels(splits, drop, id = plyr::id(splits, drop = TRUE))

Arguments

splits

list of variables to split up by

drop

whether all possible combinations should be considered, or only those present in the data

Split an array by .margins.

Description

Split a 2d or higher data structure into lower-d pieces based on the subscripts that you supply.

Usage

splitter_a(data, .margins = 1L, .expand = TRUE, .id = NA)

Arguments

data

>1d data structure (matrix, data.frame or array)

.margins

a vector giving the subscripts to split up data by.

.expand

if splitting a data frame by row, should output be 1d (.expand = FALSE), with an element for each row; or nd (.expand = TRUE), with a dimension for each variable.

.id

names of the split label. Pass NULL to avoid creation of split labels. Omit or pass NA to use the default names "X1", "X2", .... Otherwise, this argument must have the same length as .margins.

Details

This is the workhorse of the a*ply functions. Given a >1d data structure (matrix, array, data.frame), it splits it into pieces based on the subscripts that you supply. Each piece is a lower dimensional slice.

The margins are specified in the same way as apply, but splitter_a just splits up the data, while apply also applies a function and combines the pieces back together. This function also includes enough information to recreate the split from attributes on the list of pieces.

Value

a list of lower-d slices, with attributes that record split details

Examples

plyr:::splitter_a(mtcars, 1)
plyr:::splitter_a(mtcars, 2)
plyr:::splitter_a(ozone, 2)
plyr:::splitter_a(ozone, 3)
plyr:::splitter_a(ozone, 1:2)

Split a data frame by variables.

Description

Split a data frame into pieces based on variables contained in that data frame

Usage

splitter_d(data, .variables = NULL, drop = TRUE)

Arguments

data

data frame

.variables

a quoted list of variables

drop

drop unused factor levels?

Details

This is the workhorse of the d*ply functions. Based on the variables you supply, it breaks up a single data frame into a list of data frames, each containing a single combination from the levels of the specified variables.

This is basically a thin wrapper around split which evaluates the variables in the context of the data, and includes enough information to reconstruct the labelling of the data frame after other operations.

Value

a list of data.frames, with attributes that record split details

Examples

plyr:::splitter_d(mtcars, .(cyl))
plyr:::splitter_d(mtcars, .(vs, am))
plyr:::splitter_d(mtcars, .(am, vs))
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(2, 4, 6, 8, 10))
plyr:::splitter_d(mtcars, .(cyl2), drop = TRUE)
plyr:::splitter_d(mtcars, .(cyl2), drop = FALSE)
mtcars$cyl3 <- ifelse(mtcars$vs == 1, NA, mtcars$cyl)
plyr:::splitter_d(mtcars, .(cyl3))
plyr:::splitter_d(mtcars, .(cyl3, vs))
plyr:::splitter_d(mtcars, .(cyl3, vs), drop = FALSE)

Remove splitting variables from a data frame.

Description

This is useful when you want to perform some operation to every column in the data frame, except the variables that you have used to split it. These variables will be automatically added back on to the result when combining all results together.

Usage

strip_splits(df)

Arguments

df

data frame produced by d*ply.

Examples

dlply(mtcars, c("vs", "am"))
dlply(mtcars, c("vs", "am"), strip_splits)
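The use case in the Description, operating on every column except the splitting variables, can be sketched as below. It assumes plyr is installed and a fresh copy of mtcars; the choice of colMeans as the per-piece operation is only an illustration.

```r
library(plyr)

# Each piece has the splitting variables removed.
pieces <- dlply(mtcars, c("vs", "am"), strip_splits)
names(pieces[[1]])

# Column means of the remaining variables for each (vs, am) group.
# ddply adds the splitting variables back as label columns.
means <- ddply(mtcars, c("vs", "am"), function(df) colMeans(strip_splits(df)))
names(means)
```

Without strip_splits, the per-piece means would also include vs and am, duplicating the label columns that ddply adds.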

Summarise a data frame.

Description

Summarise works in an analogous way to mutate, except instead of adding columns to an existing data frame, it creates a new data frame. This is particularly useful in conjunction with ddply as it makes it easy to perform group-wise summaries.

Usage

summarise(.data, ...)

Arguments

.data

the data frame to be summarised

...

further arguments of the form var = value

Note

Be careful when using existing variable names; the corresponding columns will be immediately updated with the new data and this can affect subsequent operations referring to those variables.

Examples

# Let's extract the number of teams and total period of time
# covered by the baseball dataframe
summarise(baseball,
  duration = max(year) - min(year),
  nteams = length(unique(team)))
# Combine with ddply to do that for each separate id
ddply(baseball, "id", summarise,
  duration = max(year) - min(year),
  nteams = length(unique(team)))
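The caution in the Note about reusing existing variable names follows from the arguments being evaluated one after another. The sketch below assumes plyr is installed; df is a made-up data frame used only to keep the numbers easy to check.

```r
library(plyr)

df <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 10))

# Later arguments can refer to columns created by earlier ones.
chained <- summarise(df, total = sum(v), n = length(v), mean_v = total / n)
chained

# Reusing an existing name replaces it for the arguments that follow:
# "doubled" is computed from the new v (the sum), not the original column.
shadowed <- summarise(df, v = sum(v), doubled = v * 2)
shadowed

# Group-wise version of the first summary.
by_group <- ddply(df, "g", summarise, total = sum(v), n = length(v))
by_group
```

Choosing a new name for each summary column avoids the shadowing shown in the second call.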

Take a subset along an arbitrary dimension

Description

Take a subset along an arbitrary dimension

Usage

take(x, along, indices, drop = FALSE)

Arguments

x

matrix or array to subset

along

dimension to subset along

indices

the indices to select

drop

should the dimensions of the array be simplified? Defaults to FALSE which is the opposite of the useful R default.

Examples

x <- array(seq_len(3 * 4 * 5), c(3, 4, 5))
take(x, 3, 1)
take(x, 2, 1)
take(x, 1, 1)
take(x, 3, 1, drop = TRUE)
take(x, 2, 1, drop = TRUE)
take(x, 1, 1, drop = TRUE)
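The effect of drop is easiest to see by comparing dimensions with ordinary bracket indexing. A minimal sketch, assuming plyr is installed:

```r
library(plyr)

x <- array(seq_len(3 * 4 * 5), c(3, 4, 5))

# drop = FALSE (the default for take) keeps the selected dimension.
kept <- take(x, 3, 1)
dim(kept)     # 3 4 1

# drop = TRUE removes dimensions of length one.
dropped <- take(x, 3, 1, drop = TRUE)
dim(dropped)  # 3 4

# The same slices written with base indexing.
base_kept    <- x[, , 1, drop = FALSE]
base_dropped <- x[, , 1]
```

The advantage of take over bracket indexing is that the dimension to subset along is an ordinary argument, so it can be chosen at run time.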

Function that always returns true.

Description

Function that always returns true.

Usage

true(...)

Arguments

...

all input ignored

Value

TRUE

Try, with default in case of error.

Description

try_default wraps try so that it returns a default value in the case of error. tryNULL provides a useful special case when dealing with lists.

Usage

try_default(expr, default, quiet = FALSE)
tryNULL(expr)

Arguments

expr

expression to try

default

default value in case of error

quiet

should errors be suppressed (TRUE) or printed (FALSE, default)
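A short sketch of both functions, assuming plyr is installed. The failing expression stop("boom") is an arbitrary stand-in for any code that raises an error.

```r
library(plyr)

# The expression succeeds, so its value is returned.
ok <- try_default(log(10), default = NA_real_, quiet = TRUE)

# The expression fails, so the default is returned instead.
# quiet = TRUE keeps the error message off the console.
bad <- try_default(stop("boom"), default = NA_real_, quiet = TRUE)

# tryNULL returns NULL on error, which is convenient when
# failed elements of a list are to be dropped afterwards.
nul <- tryNULL(stop("boom"))

ok
bad
is.null(nul)
```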

Apply with built in try. Uses compact, lapply and tryNULL

Description

Apply with built in try. Uses compact, lapply and tryNULL

Usage

tryapply(list, fun, ...)

Arguments

list

list to apply function fun on

fun

function

...

further arguments to fun

Un-rowname.

Description

Strip rownames from an object

Usage

unrowname(x)

Arguments

x

data frame

Vector aggregate.

Description

This function is somewhat similar to tapply, but is designed for use in conjunction with id. It is simpler in that it only accepts a single grouping vector (use id if you have more) and uses vapply internally, using the .default value as the template.

Usage

vaggregate(.value, .group, .fun, ..., .default = NULL, .n = nlevels(.group))

Arguments

.value

vector of values to aggregate

.group

grouping vector

.fun

aggregation function

...

other arguments passed on to .fun

.default

default value used for missing groups. This argument is also used as the template for function output.

.n

total number of groups

Details

vaggregate should be faster than tapply in most situations because it avoids making a copy of the data.

Examples

# Some examples of use borrowed from ?tapply
n <- 17; fac <- factor(rep(1:3, length.out = n), levels = 1:5)
table(fac)
vaggregate(1:n, fac, sum)
vaggregate(1:n, fac, sum, .default = NA_integer_)
vaggregate(1:n, fac, range)
vaggregate(1:n, fac, range, .default = c(NA, NA) + 0)
vaggregate(1:n, fac, quantile)
# Unlike tapply, vaggregate does not support multi-d output:
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
vaggregate(warpbreaks$breaks, id(warpbreaks[,-1]), sum)
# But it is about 10x faster
x <- rnorm(1e6)
y1 <- sample.int(10, 1e6, replace = TRUE)
system.time(tapply(x, y1, mean))
system.time(vaggregate(x, y1, mean))
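The role of .default for missing groups can be seen on a small input where one factor level has no observations. This is a sketch under the assumption that plyr is installed; values and group are made-up inputs.

```r
library(plyr)

values <- c(1, 2, 3, 4, 5)

# Level "b" is declared but never observed.
group <- factor(c("a", "c", "a", "c", "a"), levels = c("a", "b", "c"))

# Without .default, the empty group gets .fun applied to a zero-length
# vector, which for sum is 0.
sums <- vaggregate(values, group, sum)
sums

# An explicit .default marks the empty group as missing instead.
sums_na <- vaggregate(values, group, sum, .default = NA_real_)
sums_na
```

Because .default also serves as the vapply template, it should have the same type and length as the value returned by .fun.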
