| Title: | Tools for Splitting, Applying and Combining Data |
| Version: | 1.8.9 |
| Description: | A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'. |
| License: | MIT + file LICENSE |
| URL: | http://had.co.nz/plyr, https://github.com/hadley/plyr |
| BugReports: | https://github.com/hadley/plyr/issues |
| Depends: | R (≥ 3.1.0) |
| Imports: | Rcpp (≥ 0.11.0) |
| Suggests: | abind, covr, doParallel, foreach, iterators, itertools, tcltk, testthat |
| LinkingTo: | Rcpp |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.2.3 |
| NeedsCompilation: | yes |
| Packaged: | 2023-09-27 13:58:04 UTC; hadleywickham |
| Author: | Hadley Wickham [aut, cre] |
| Maintainer: | Hadley Wickham <hadley@rstudio.com> |
| Repository: | CRAN |
| Date/Publication: | 2023-10-02 06:50:08 UTC |
plyr: the split-apply-combine paradigm for R.
Description
The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.
Details
The plyr functions are named according to what sort of data structure they split up and what sort of data structure they return:
- a: array
- l: list
- d: data.frame
- m: multiple inputs
- r: repeat multiple times
- _: nothing
So ddply takes a data frame as input and returns a data frame as output, and l_ply takes a list as input and returns nothing as output.
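As a rough illustration of the naming scheme, the sketch below calls three functions whose first letter names the input type and whose second letter names the output type. The built-in mtcars data and the object names are chosen here only for illustration.

```r
library(plyr)

# d + d: data frame in, data frame out (one row per value of cyl)
per_cyl <- ddply(mtcars, "cyl", summarise, mean_mpg = mean(mpg))

# l + l: list in, list out
lengths_list <- llply(list(a = 1:3, b = 1:5), length)

# l + _: list in, nothing out; called only for its side effects
nothing <- l_ply(1:3, function(i) message("processing piece ", i))
```

The last call returns nothing useful, which is the point of the `_` suffix.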
Row names
By design, no plyr function will preserve row names - in general it is too hard to know what should be done with them for many of the operations supported by plyr. If you want to preserve row names, use name_rows to convert them into an explicit column in your data frame, perform the plyr operations, and then use name_rows again to convert the column back into row names.
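The round trip described above might look like the following sketch. It assumes the mtcars data set and uses arrange as the intermediate plyr operation; the object names are illustrative.

```r
library(plyr)

# Move row names into an explicit column before using plyr
with_names <- name_rows(mtcars)

# Any plyr operation; here the rows are re-ordered
sorted <- arrange(with_names, cyl, disp)

# Convert the explicit column back into row names
restored <- name_rows(sorted)
head(rownames(restored))
```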
Helpers
Plyr also provides a set of helper functions for common data analysis problems:
- arrange: re-order the rows of a data frame by specifying the columns to order by
- mutate: add new columns or modify existing columns, like transform, but new columns can refer to other columns that you just created.
- summarise: like mutate but create a new data frame, not preserving any columns in the old data frame.
- join: an adaptation of merge which is more similar to SQL, and has a much faster implementation if you only want to find the first match.
- match_df: a version of join that instead of returning the two tables combined together, only returns the rows in the first table that match the second.
- colwise: make any function work colwise on a data frame
- rename: easily rename columns in a data frame
- round_any: round a number to any degree of precision
- count: quickly count unique combinations and return as a data frame.
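A small sketch of several of these helpers on a toy data frame. The data and object names are made up for illustration and are not part of the package.

```r
library(plyr)

df <- data.frame(g = c("a", "b", "a", "b"), x = c(4, 3, 2, 1))

# arrange: order rows by g, then by x
ordered <- arrange(df, g, x)

# mutate: z can refer to y, which was created in the same call
extended <- mutate(df, y = x * 2, z = y + 1)

# summarise: a new data frame holding only the summary column
totals <- summarise(df, total = sum(x))

# round_any: round to the nearest multiple of 10
rounded <- round_any(137, 10)

# count: one row per observed value of g, with a freq column
counts <- count(df, "g")
```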
Quote variables to create a list of unevaluated expressions for laterevaluation.
Description
This function is similar to ~ in that it is used to capture the name of variables, not their current value. This is used throughout plyr to specify the names of variables (or more complicated expressions).
Usage
.(..., .env = parent.frame())
Arguments
... | unevaluated expressions to be recorded. Specify names if you want to set the names of the resultant variables |
.env | environment in which unbound symbols in |
Details
Similar tricks can be performed with substitute, but when functions can be called in multiple ways it becomes increasingly tricky to ensure that the values are extracted from the correct frame. Substitute tricks also make it difficult to program against the functions that use them, while the quoted class provides as.quoted.character to convert strings to the appropriate data structure.
Value
list of symbol and language primitives
Examples
.(a, b, c)
.(first = a, second = b, third = c)
.(a ^ 2, b - d, log(c))
as.quoted(~ a + b + c)
as.quoted(a ~ b + c)
as.quoted(c("a", "b", "c"))
# Some examples using ddply - look at the column names
ddply(mtcars, "cyl", each(nrow, ncol))
ddply(mtcars, ~ cyl, each(nrow, ncol))
ddply(mtcars, .(cyl), each(nrow, ncol))
ddply(mtcars, .(log(cyl)), each(nrow, ncol))
ddply(mtcars, .(logcyl = log(cyl)), each(nrow, ncol))
ddply(mtcars, .(vs + am), each(nrow, ncol))
ddply(mtcars, .(vsam = vs + am), each(nrow, ncol))
Subset splits.
Description
Subset splits, ensuring that labels keep matching
Usage
## S3 method for class 'split'
x[i, ...]
Arguments
x | split object |
i | index |
... | unused |
Split array, apply function, and discard results.
Description
For each slice of an array, apply function and discard results
Usage
a_ply(.data, .margins, .fun = NULL, ..., .expand = TRUE, .progress = "none", .inform = FALSE, .print = FALSE, .parallel = FALSE, .paropts = NULL)
Arguments
.data | matrix, array or data frame to be processed |
.margins | a vector giving the subscripts to split up |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | if |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.print | automatically print each result? (default: |
.parallel | if |
.paropts | a list of additional options passed into the |
Value
Nothing
Input
This function splits matrices, arrays and data frames by dimensions
Output
All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.
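A minimal sketch of a side-effect-only call: each row of a small matrix is passed to a function that appends to a vector defined outside it. The matrix and the accumulator name row_totals are illustrative assumptions.

```r
library(plyr)

m <- matrix(1:6, nrow = 2)
row_totals <- numeric(0)

# The function is called once per row; only its side effect is kept
result <- a_ply(m, 1, function(row) row_totals <<- c(row_totals, sum(row)))

row_totals
```

The return value of a_ply itself carries no information, so anything worth keeping has to be written somewhere by the function.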
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other array input: aaply(), adply(), alply()
Other no output: d_ply(), l_ply(), m_ply()
Split array, apply function, and return results in an array.
Description
For each slice of an array, apply function, keeping results as an array.
Usage
aaply(.data, .margins, .fun = NULL, ..., .expand = TRUE, .progress = "none", .inform = FALSE, .drop = TRUE, .parallel = FALSE, .paropts = NULL)
Arguments
.data | matrix, array or data frame to be processed |
.margins | a vector giving the subscripts to split up |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | if |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop | should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to |
.parallel | if |
.paropts | a list of additional options passed into the |
Details
This function is very similar to apply, except that it will always return an array, and when the function returns >1 d data structures, those dimensions are added on to the highest dimensions, rather than the lowest dimensions. This makes aaply idempotent, so that aaply(input, X, identity) is equivalent to aperm(input, X).
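The difference in where result dimensions end up can be seen on a small array. This is a sketch with made-up data: base apply puts the dimension of each result first, whereas aaply appends it after the split dimensions.

```r
library(plyr)

x <- array(1:24, dim = c(2, 3, 4))

# Base R: the length-4 result of each piece becomes the FIRST dimension
via_apply <- apply(x, c(1, 2), identity)
dim(via_apply)

# plyr: the split dimensions come first, the result dimension is added last
via_aaply <- aaply(x, c(1, 2), identity)
dim(via_aaply)
```

With identity as the function, the aaply result holds the same values in the same layout as the input.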
Value
if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)
Warning
Contrary to alply and adply, passing a data frame as first argument to aaply may lead to unexpected results such as huge memory allocations.
Input
This function splits matrices, arrays and data frames by dimensions
Output
If there are no results, then this function will return a vector of length 0 (vector()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other array input: a_ply(), adply(), alply()
Other array output: daply(), laply(), maply()
Examples
dim(ozone)
aaply(ozone, 1, mean)
aaply(ozone, 1, mean, .drop = FALSE)
aaply(ozone, 3, mean)
aaply(ozone, c(1,2), mean)
dim(aaply(ozone, c(1,2), mean))
dim(aaply(ozone, c(1,2), mean, .drop = FALSE))
aaply(ozone, 1, each(min, max))
aaply(ozone, 3, each(min, max))
standardise <- function(x) (x - min(x)) / (max(x) - min(x))
aaply(ozone, 3, standardise)
aaply(ozone, 1:2, standardise)
aaply(ozone, 1:2, diff)
Split array, apply function, and return results in a data frame.
Description
For each slice of an array, apply function then combine results into a data frame.
Usage
adply(.data, .margins, .fun = NULL, ..., .expand = TRUE, .progress = "none", .inform = FALSE, .parallel = FALSE, .paropts = NULL, .id = NA)
Arguments
.data | matrix, array or data frame to be processed |
.margins | a vector giving the subscripts to split up |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | if |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.parallel | if |
.paropts | a list of additional options passed into the |
.id | name(s) of the index column(s). Pass |
Value
A data frame, as described in the output section.
Input
This function splits matrices, arrays and data frames by dimensions
Output
The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, it will be rbinded together and converted to a data frame. Any other values will result in an error.
If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
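The two accepted return types can be sketched on a small matrix split by rows. The matrix and the column name total are illustrative.

```r
library(plyr)

m <- matrix(1:6, nrow = 2)

# .fun returns a data frame: the pieces are stacked row by row
by_row <- adply(m, 1, function(row) data.frame(total = sum(row)))

# .fun returns an atomic vector of fixed length: one output row per piece
ranges <- adply(m, 1, range)
```

In both results the leading column identifies the row of m that each output row came from.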
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other array input: a_ply(), aaply(), alply()
Other data frame output: ddply(), ldply(), mdply()
Split array, apply function, and return results in a list.
Description
For each slice of an array, apply function then combine results into a list.
Usage
alply(.data, .margins, .fun = NULL, ..., .expand = TRUE, .progress = "none", .inform = FALSE, .parallel = FALSE, .paropts = NULL, .dims = FALSE)
Arguments
.data | matrix, array or data frame to be processed |
.margins | a vector giving the subscripts to split up |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | if |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.parallel | if |
.paropts | a list of additional options passed into the |
.dims | if |
Details
The list will have "dims" and "dimnames" corresponding to the margins given. For instance alply(x, c(3,2), ...) where x has dims c(4,3,2) will give a result with dims c(2,3).
alply is somewhat similar to apply for cases where the results are not atomic.
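The dims example in the text can be tried as follows. This sketch assumes that .dims = TRUE is what requests the "dims" and "dimnames" on the result, since the usage above shows .dims = FALSE as the default; the array contents are made up.

```r
library(plyr)

x <- array(1:24, dim = c(4, 3, 2))

# Split over the 3rd and 2nd dimensions: 2 * 3 = 6 pieces of length 4
res <- alply(x, c(3, 2), sum, .dims = TRUE)

length(res)
dim(res)
```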
Value
list of results
Input
This function splits matrices, arrays and data frames by dimensions
Output
If there are no results, then this function will return a list of length 0 (list()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other array input: a_ply(), aaply(), adply()
Other list output: dlply(), llply(), mlply()
Examples
alply(ozone, 3, quantile)
alply(ozone, 3, function(x) table(round(x)))
Dimensions.
Description
Consistent dimensions for vectors, matrices and arrays.
Usage
amv_dim(x)
Arguments
x | array, matrix or vector |
Dimension names.
Description
Consistent dimnames for vectors, matrices and arrays.
Usage
amv_dimnames(x)
Arguments
x | array, matrix or vector |
Details
Unlike dimnames, no part of the output will ever be null. If a component of dimnames is omitted, amv_dimnames will return an integer sequence of the appropriate length.
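A brief sketch of the difference from dimnames, using a matrix that has row names but no column names. The function is reached with ::: here on the assumption that it may be internal to the package; the matrix is illustrative.

```r
library(plyr)

m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), NULL))

# Base R leaves the missing component as NULL
base_names <- dimnames(m)

# amv_dimnames fills the missing component with an integer sequence
full_names <- plyr:::amv_dimnames(m)
```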
Order a data frame by its columns.
Description
This function completes the subsetting, transforming and ordering triad with a function that works in a similar way to subset and transform but for reordering a data frame by its columns. This saves a lot of typing!
Usage
arrange(df, ...)
Arguments
df | data frame to reorder |
... | expressions evaluated in the context of |
See Also
order for sorting function in the base package
Examples
# sort mtcars data by cylinder and displacement
mtcars[with(mtcars, order(cyl, disp)), ]
# Same result using arrange: no need to use with(), as the context is implicit
# NOTE: plyr functions do NOT preserve row.names
arrange(mtcars, cyl, disp)
# Let's keep the row.names in this example
myCars = cbind(vehicle=row.names(mtcars), mtcars)
arrange(myCars, cyl, disp)
# Sort with displacement in descending order
arrange(myCars, cyl, desc(disp))
Make a function return a data frame.
Description
Create a new function that returns the existing function wrapped in a data.frame with a single column, value.
Usage
## S3 method for class 'function'
as.data.frame(x, row.names, optional, ...)
Arguments
x | function to make return a data frame |
row.names | necessary to match the generic, but not used |
optional | necessary to match the generic, but not used |
... | necessary to match the generic, but not used |
Details
This is useful when calling *dply functions with a function that returns a vector, and you want the output in rows, rather than columns. The value column is always created, even for empty inputs.
Convert split list to regular list.
Description
Strip off label-related attributes to turn a split list into a regular list.
Usage
## S3 method for class 'split'
as.list(x, ...)
Arguments
x | object to convert to a list |
... | unused |
Convert input to quoted variables.
Description
Convert characters, formulas and calls to quoted .variables
Usage
as.quoted(x, env = parent.frame())
Arguments
x | input to quote |
env | environment in which unbound symbols in expression should be evaluated. Defaults to the environment in which |
Details
This method is called by default on all plyr functions that take a .variables argument, so that equivalent forms can be used anywhere.
Currently conversions exist for character vectors, formulas and call objects.
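A short sketch of what "equivalent forms" means in practice: the same grouping variable given as a character vector, a formula, and a quoted expression. The mtcars data and object names are used only for illustration.

```r
library(plyr)

# Three spellings of the same set of variables
q_chr <- as.quoted(c("a", "b", "log(d)"))
q_frm <- as.quoted(~ a + b + log(d))
q_dot <- .(a, b, log(d))
names(q_chr)

# Any of the forms can be passed as .variables
by_chr <- ddply(mtcars, "cyl", nrow)
by_frm <- ddply(mtcars, ~ cyl, nrow)
by_dot <- ddply(mtcars, .(cyl), nrow)
```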
Value
a list of quoted variables
See Also
Examples
as.quoted(c("a", "b", "log(d)"))
as.quoted(a ~ b + log(d))
Yearly batting records for all major league baseball players
Description
This data frame contains batting statistics for a subset of players collected from http://www.baseball-databank.org/. There are a total of 21,699 records, covering 1,228 players from 1871 to 2007. Only players with more than 15 seasons of play are included.
Usage
baseball
Format
A 21699 x 22 data frame
Variables
Variables:
id, unique player id
year, year of data
stint
team, team played for
lg, league
g, number of games
ab, number of times at bat
r, number of runs
h, hits, times reached base because of a batted, fair ball without error by the defense
X2b, hits on which the batter reached second base safely
X3b, hits on which the batter reached third base safely
hr, number of home runs
rbi, runs batted in
sb, stolen bases
cs, caught stealing
bb, base on balls (walk)
so, strike outs
ibb, intentional base on balls
hbp, hits by pitch
sh, sacrifice hits
sf, sacrifice flies
gidp, ground into double play
References
http://www.baseball-databank.org/
Examples
baberuth <- subset(baseball, id == "ruthba01")
baberuth$cyear <- baberuth$year - min(baberuth$year) + 1
calculate_cyear <- function(df) {
  mutate(df,
    cyear = year - min(year),
    cpercent = cyear / (max(year) - min(year))
  )
}
baseball <- ddply(baseball, .(id), calculate_cyear)
baseball <- subset(baseball, ab >= 25)
model <- function(df) {
  lm(rbi / ab ~ cyear, data=df)
}
model(baberuth)
models <- dlply(baseball, .(id), model)
Column-wise function.
Description
Turn a function that operates on a vector into a function that operates column-wise on a data.frame.
Usage
colwise(.fun, .cols = true, ...)
catcolwise(.fun, ...)
numcolwise(.fun, ...)
Arguments
.fun | function |
.cols | either a function that tests columns for inclusion, or a quoted object giving which columns to process |
... | other arguments passed on to |
Details
catcolwise and numcolwise provide versions that only operate on discrete and numeric variables respectively.
Examples
# Count number of missing values
nmissing <- function(x) sum(is.na(x))
# Apply to every column in a data frame
colwise(nmissing)(baseball)
# This syntax looks a little different. It is shorthand for
# the following:
f <- colwise(nmissing)
f(baseball)
# This is particularly useful in conjunction with d*ply
ddply(baseball, .(year), colwise(nmissing))
# To operate only on specified columns, supply them as the second
# argument. Many different forms are accepted.
ddply(baseball, .(year), colwise(nmissing, .(sb, cs, so)))
ddply(baseball, .(year), colwise(nmissing, c("sb", "cs", "so")))
ddply(baseball, .(year), colwise(nmissing, ~ sb + cs + so))
# Alternatively, you can specify a boolean function that determines
# whether or not a column should be included
ddply(baseball, .(year), colwise(nmissing, is.character))
ddply(baseball, .(year), colwise(nmissing, is.numeric))
ddply(baseball, .(year), colwise(nmissing, is.discrete))
# These last two cases are particularly common, so some shortcuts are
# provided:
ddply(baseball, .(year), numcolwise(nmissing))
ddply(baseball, .(year), catcolwise(nmissing))
# You can supply additional arguments to either colwise, or the function
# it generates:
numcolwise(mean)(baseball, na.rm = TRUE)
numcolwise(mean, na.rm = TRUE)(baseball)
Compact list.
Description
Remove all NULL entries from a list
Usage
compact(l)
Arguments
l | list |
Count the number of occurrences.
Description
Equivalent to as.data.frame(table(x)), but does not include combinations with zero counts.
Usage
count(df, vars = NULL, wt_var = NULL)
Arguments
df | data frame to be processed |
vars | variables to count unique values of |
wt_var | optional variable to weight by - if this is non-NULL, count will sum up the value of this variable for each combination of id variables. |
Details
Speed-wise count is competitive with table for single variables, but it really comes into its own when summarising multiple dimensions because it only counts combinations that actually occur in the data.
Compared to table + as.data.frame, count also preserves the type of the identifier variables, instead of converting them to characters/factors.
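Both points can be checked on a toy data frame in which only two of the four possible combinations occur. The data and object names are invented for this sketch.

```r
library(plyr)

df <- data.frame(x = c(1L, 1L, 2L), y = c("a", "a", "b"))

# count: one row per combination that actually occurs
observed <- count(df, c("x", "y"))

# table + as.data.frame: one row per possible combination, zeros included
all_combos <- as.data.frame(table(df))

# Identifier types: kept by count, converted to factors by table
class(observed$x)
class(all_combos$x)
```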
Value
a data frame with label and freq columns
See Also
table for related functionality in the base package
Examples
# Count of each value of "id" in the first 100 cases
count(baseball[1:100,], vars = "id")
# Count of ids, weighted by their "g" loading
count(baseball[1:100,], vars = "id", wt_var = "g")
count(baseball, "id", "ab")
count(baseball, "lg")
# How many stints do players do?
count(baseball, "stint")
# Count of times each player appeared in each of the years they played
count(baseball[1:100,], c("id", "year"))
# Count of counts
count(count(baseball[1:100,], c("id", "year")), "id", "freq")
count(count(baseball, c("id", "year")), "freq")
Create progress bar.
Description
Create progress bar object from text string.
Usage
create_progress_bar(name = "none", ...)
Arguments
name | type of progress bar to create |
... | other arguments passed onto progress bar function |
Details
Progress bars give feedback on how the apply step is proceeding. This is mainly useful for long running functions, as for short functions, the time taken up by splitting and combining may be on the same order (or longer) as the apply step. Additionally, for short functions, the time needed to update the progress bar can significantly slow down the process. For the trivial examples below, using the tk progress bar slows things down by a factor of a thousand.
Note that the progress bar is approximate, and if the time taken by individual function applications is highly non-uniform it may not be very informative of the time left.
There are currently four types of progress bar: "none", "text", "tk", and "win". See the individual documentation for more details. In plyr functions, these can either be specified by name, or you can create the progress bar object yourself if you want more control over its appearance. See the examples.
See Also
progress_none, progress_text, progress_tk, progress_win
Examples
# No progress bar
l_ply(1:100, identity, .progress = "none")
## Not run: 
# Use the Tcl/Tk interface
l_ply(1:100, identity, .progress = "tk")
## End(Not run)
# Text-based progress (|======|)
l_ply(1:100, identity, .progress = "text")
# Choose a progress character, run a length of time you can see
l_ply(1:10000, identity, .progress = progress_text(char = "."))
Split data frame, apply function, and discard results.
Description
For each subset of a data frame, apply function and discard results. To apply a function for each row, use a_ply with .margins set to 1.
Usage
d_ply(.data, .variables, .fun = NULL, ..., .progress = "none", .inform = FALSE, .drop = TRUE, .print = FALSE, .parallel = FALSE, .paropts = NULL)
Arguments
.data | data frame to be processed |
.variables | variables to split data frame by, as |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop | should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default) |
.print | automatically print each result? (default: |
.parallel | if |
.paropts | a list of additional options passed into the |
Value
Nothing
Input
This function splits data frames by variables.
Output
All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other data frame input: daply(), ddply(), dlply()
Other no output: a_ply(), l_ply(), m_ply()
Split data frame, apply function, and return results in an array.
Description
For each subset of data frame, apply function then combine results into an array. daply with a function that operates column-wise is similar to aggregate. To apply a function for each row, use aaply with .margins set to 1.
Usage
daply(.data, .variables, .fun = NULL, ..., .progress = "none", .inform = FALSE, .drop_i = TRUE, .drop_o = TRUE, .parallel = FALSE, .paropts = NULL)
Arguments
.data | data frame to be processed |
.variables | variables to split data frame by, as quoted variables, a formula or character vector |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop_i | should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default) |
.drop_o | should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to |
.parallel | if |
.paropts | a list of additional options passed into the |
Value
if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)
Input
This function splits data frames by variables.
Output
If there are no results, then this function will return a vector of length 0 (vector()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other array output: aaply(), laply(), maply()
Other data frame input: d_ply(), ddply(), dlply()
Examples
daply(baseball, .(year), nrow)
# Several different ways of summarising by variables that should not be
# included in the summary
daply(baseball[, c(2, 6:9)], .(year), colwise(mean))
daply(baseball[, 6:9], .(baseball$year), colwise(mean))
daply(baseball, .(year), function(df) colwise(mean)(df[, 6:9]))
Split data frame, apply function, and return results in a data frame.
Description
For each subset of a data frame, apply function then combine results into a data frame. To apply a function for each row, use adply with .margins set to 1.
Usage
ddply(.data, .variables, .fun = NULL, ..., .progress = "none", .inform = FALSE, .drop = TRUE, .parallel = FALSE, .paropts = NULL)
Arguments
.data | data frame to be processed |
.variables | variables to split data frame by, as |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop | should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default) |
.parallel | if |
.paropts | a list of additional options passed into the |
Value
A data frame, as described in the output section.
Input
This function splits data frames by variables.
Output
The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, it will be rbinded together and converted to a data frame. Any other values will result in an error.
If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
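The two accepted return types might be exercised as in this sketch. The toy data frame and the column names n, mean_x, lo and hi are illustrative.

```r
library(plyr)

df <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# .fun returns a data frame: one block of rows per group, stacked together
from_df <- ddply(df, "g", function(d) {
  data.frame(n = nrow(d), mean_x = mean(d$x))
})

# .fun returns an atomic vector of fixed length: one output row per group
from_vec <- ddply(df, "g", function(d) c(lo = min(d$x), hi = max(d$x)))
```

Returning a data frame is the safer choice when the column names and types of the result matter.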
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
tapply for similar functionality in the base package
Other data frame input: d_ply(), daply(), dlply()
Other data frame output: adply(), ldply(), mdply()
Examples
# Summarize a dataset by two variables
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)
# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
  mean = round(mean(age), 2),
  sd = round(sd(age), 2))
# An example using a formula for .variables
ddply(baseball[1:100,], ~ year, nrow)
# Applying two functions; nrow and ncol
ddply(baseball, .(lg), c("nrow", "ncol"))
# Calculate mean runs batted in for each year
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))
# Plot a line chart of the result
plot(mean_rbi ~ year, type = "l", data = rbi)
# make new variable career_year based on the
# start year for each player (id)
base2 <- ddply(baseball, .(id), mutate,
  career_year = year - min(year) + 1
)
Set defaults.
Description
Convenient method for combining a list of values with their defaults.
Usage
defaults(x, y)
Arguments
x | list of values |
y | defaults |
Descending order.
Description
Transform a vector into a format that will be sorted in descending order.
Usage
desc(x)
Arguments
x | vector to transform |
Examples
desc(1:10)
desc(factor(letters))
first_day <- seq(as.Date("1910/1/1"), as.Date("1920/1/1"), "years")
desc(first_day)
Number of dimensions.
Description
Number of dimensions of an array or vector
Usage
dims(x)
Arguments
x | array |
Split data frame, apply function, and return results in a list.
Description
For each subset of a data frame, apply function then combine results into a list. dlply is similar to by except that the results are returned in a different format. To apply a function for each row, use alply with .margins set to 1.
Usage
dlply(.data, .variables, .fun = NULL, ..., .progress = "none", .inform = FALSE, .drop = TRUE, .parallel = FALSE, .paropts = NULL)
Arguments
.data | data frame to be processed |
.variables | variables to split data frame by, as |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop | should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default) |
.parallel | if |
.paropts | a list of additional options passed into the |
Value
list of results
Input
This function splits data frames by variables.
Output
If there are no results, then this function will return a list of length 0 (list()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other data frame input: d_ply(), daply(), ddply()
Other list output: alply(), llply(), mlply()
Examples
linmod <- function(df) {
  lm(rbi ~ year, data = mutate(df, year = year - min(year)))
}
models <- dlply(baseball, .(id), linmod)
models[[1]]
coef <- ldply(models, coef)
with(coef, plot(`(Intercept)`, year))
qual <- laply(models, function(mod) summary(mod)$r.squared)
hist(qual)
Aggregate multiple functions into a single function.
Description
Combine multiple functions into a single function returning a named vector of outputs. Note: you cannot supply additional parameters for the summary functions
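One way to work within this restriction, sketched here as an assumption rather than a documented recipe, is to wrap each summary function so that the extra parameter is fixed in advance. The helper names min_na and max_na are hypothetical.

```r
library(plyr)

# Hypothetical wrappers that fix na.rm = TRUE ahead of time
min_na <- function(x) min(x, na.rm = TRUE)
max_na <- function(x) max(x, na.rm = TRUE)

# Combine the wrappers; the result is a named vector
range_na <- each(min_na, max_na)
res <- range_na(c(4, NA, 9))
res
```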
Usage
each(...)
Arguments
... | functions to combine. Each function should produce a single number as output |
See Also
summarise for applying summary functions to data
Examples
# Call min() and max() on the vector 1:10
each(min, max)(1:10)
# This syntax looks a little different. It is shorthand for
# the following:
f <- each(min, max)
f(1:10)
# Three equivalent ways to call min() and max() on the vector 1:10
each("min", "max")(1:10)
each(c("min", "max"))(1:10)
each(c(min, max))(1:10)
# Call length(), mean() and var() on a random normal vector
each(length, mean, var)(rnorm(100))
Check if a data frame is empty.
Description
Empty if it's null or it has 0 rows or columns
Usage
empty(df)
Arguments
df | data frame to check |
Evaluate a quoted list of variables.
Description
Evaluates quoted variables in specified environment
Usage
eval.quoted(exprs, envir = NULL, enclos = NULL, try = FALSE)
Arguments
exprs | quoted object to evaluate |
try | if TRUE, return |
Value
a list
Fail with specified value.
Description
Modify a function so that it returns a default value when there is an error.
Usage
failwith(default = NULL, f, quiet = FALSE)
Arguments
default | default value |
f | function |
quiet | should all error messages be suppressed? |
Value
a function
See Also
Examples
f <- function(x) if (x == 1) stop("Error!") else 1
## Not run: 
f(1)
f(2)
## End(Not run)
safef <- failwith(NULL, f)
safef(1)
safef(2)
Capture current evaluation context.
Description
This function captures the current context, making it easier to use **ply with functions that do special evaluation and need access to the environment where ddply was called from.
Usage
here(f)
Arguments
f | a function that does non-standard evaluation |
Author(s)
Peter Meilstrup, https://github.com/crowding
Examples
df <- data.frame(a = rep(c("a","b"), each = 10), b = 1:20)
f1 <- function(label) {
  ddply(df, "a", mutate, label = paste(label, b))
}
## Not run: f1("name:")
# Doesn't work because mutate can't find label in the current scope
f2 <- function(label) {
  ddply(df, "a", here(mutate), label = paste(label, b))
}
f2("name:")
# Works :)
Compute a unique numeric id for each unique row in a data frame.
Description
Properties:
- order(id) is equivalent to do.call(order, df)
- rows containing the same data have the same value
- if drop = FALSE then room for all possibilities
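These properties can be checked on a small data frame. The data is made up for this sketch, and the column names are stripped before calling order only to keep the call unambiguous.

```r
library(plyr)

df <- data.frame(a = c(2, 1, 2, 1), b = c(1, 1, 1, 3))
ids <- id(df)

# Property 1: ordering by id matches ordering by the columns in turn
by_id <- order(ids)
by_columns <- do.call(order, unname(as.list(df)))

# Property 2: rows 1 and 3 hold the same data, so they share an id
same_id <- ids[1] == ids[3]

# Property 3: with drop = FALSE, n counts every possible combination
n_possible <- attr(ids, "n")
```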
Usage
id(.variables, drop = FALSE)
Arguments
.variables | list of variables |
drop | drop unused factor levels? |
Value
a numeric vector with attribute n, giving total number of possibilities
See Also
Numeric id for a vector.
Description
Numeric id for a vector.
Usage
id_var(x, drop = FALSE)
Construct an immutable data frame.
Description
An immutable data frame works like an ordinary data frame, except that when you subset it, it returns a reference to the original data frame, not a copy. This makes subsetting substantially faster and has a big impact when you are working with large datasets with many groups.
Usage
idata.frame(df)
Arguments
df | a data frame |
Details
This method is still a little experimental, so please let me know if you run into any problems.
Value
an immutable data frame
Examples
system.time(dlply(baseball, "id", nrow))
system.time(dlply(idata.frame(baseball), "id", nrow))
An indexed array.
Description
Create an indexed array, a space-efficient way of indexing into a large array.
Usage
indexed_array(env, index)
Arguments
env | environment containing data frame |
index | list of indices |
An indexed data frame.
Description
Create an indexed list, a space-efficient way of indexing into a large data frame.
Usage
indexed_df(data, index, vars)
Arguments
data | environment containing data frame |
index | list of indices |
vars | a character vector giving the variables used for subsetting |
Determine if a vector is discrete.
Description
A discrete vector is a factor or a character vector
Usage
is.discrete(x)
Arguments
x | vector to test |
Examples
is.discrete(1:10)
is.discrete(c("a", "b", "c"))
is.discrete(factor(c("a", "b", "c")))
Is a formula? Checks if argument is a formula
Description
Is a formula? Checks if argument is a formula
Usage
is.formula(x)
Split iterator that returns values, not indices.
Description
Split iterator that returns values, not indices.
Usage
isplit2(x, f, drop = FALSE, ...)
Warning
Deprecated, do not use in new code.
See Also
Join two data frames together.
Description
Join, like merge, is designed for the types of problems where you would use a SQL join.
Usage
join(x, y, by = NULL, type = "left", match = "all")
Arguments
x | data frame |
y | data frame |
by | character vector of variable names to join by. If omitted, will match on all common variables. |
type | type of join: left (default), right, inner or full. See details for more information. |
match | how should duplicate ids be matched? Either match just the |
Details
The four join types return:
- inner: only rows with matching keys in both x and y
- left: all rows in x, adding matching columns from y
- right: all rows in y, adding matching columns from x
- full: all rows in x with matching columns in y, then the rows of y that don't match x.
Note that from plyr 1.5, join will (by default) return all matches, not just the first match, as it did previously.
Unlike merge, join preserves the order of x no matter what join type is used. If needed, rows from y will be added to the bottom. Join is often faster than merge, although it is somewhat less featureful - it currently offers no way to rename output or merge on different variables in the x and y data frames.
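The join types are easiest to compare on two tiny tables that only partly overlap. The data frames `x` and `y` below are invented for illustration; the calls use only the `join()` arguments documented above.

```r
library(plyr)

# Made-up tables sharing ids 2 and 3
x <- data.frame(id = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(id = c(2, 3, 4), b = c("y2", "y3", "y4"))

left_j  <- join(x, y, by = "id", type = "left")   # every row of x
right_j <- join(x, y, by = "id", type = "right")  # every row of y
inner_j <- join(x, y, by = "id", type = "inner")  # ids present in both
full_j  <- join(x, y, by = "id", type = "full")   # rows of x, then unmatched y

nrow(left_j)
nrow(right_j)
nrow(inner_j)
nrow(full_j)
```

In the left join, the row for id 1 has no partner in `y`, so its `b` value is filled with NA.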
Examples
first <- ddply(baseball, "id", summarise, first = min(year))
system.time(b2 <- merge(baseball, first, by = "id", all.x = TRUE))
system.time(b3 <- join(baseball, first, by = "id"))
b2 <- arrange(b2, id, year, stint)
b3 <- arrange(b3, id, year, stint)
stopifnot(all.equal(b2, b3))
Join keys. Given two data frames, create a unique key for each row.
Description
Join keys. Given two data frames, create a unique key for each row.
Usage
join.keys(x, y, by)
Arguments
x | data frame |
y | data frame |
by | character vector of variable names to join by |
Recursively join a list of data frames.
Description
Recursively join a list of data frames.
Usage
join_all(dfs, by = NULL, type = "left", match = "all")
Arguments
dfs | A list of data frames. |
by | character vector of variable names to join by. If omitted, will match on all common variables. |
type | type of join: left (default), right, inner or full. See details for more information. |
match | how should duplicate ids be matched? Either match just the |
Examples
dfs <- list(
  a = data.frame(x = 1:10, a = runif(10)),
  b = data.frame(x = 1:10, b = runif(10)),
  c = data.frame(x = 1:10, c = runif(10))
)
join_all(dfs)
join_all(dfs, "x")
Split list, apply function, and discard results.
Description
For each element of a list, apply function and discard results
Usage
l_ply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .print = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)
Arguments
.data | list to be processed |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.print | automatically print each result? (default: |
.parallel | if |
.paropts | a list of additional options passed into the |
Value
Nothing
Input
This function splits lists by elements.
Output
All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other list input: laply(), ldply(), llply()
Other no output: a_ply(), d_ply(), m_ply()
Examples
l_ply(llply(mtcars, round), table, .print = TRUE)
l_ply(baseball, function(x) print(summary(x)))
Split list, apply function, and return results in an array.
Description
For each element of a list, apply function then combine results into an array.
Usage
laply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)
Arguments
.data | list to be processed |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop | should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to |
.parallel | if |
.paropts | a list of additional options passed into the |
Details
laply is similar in spirit to sapply except that it will always return an array, and the output is transposed with respect to sapply - each element of the list corresponds to a row, not a column.
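The transposition relative to sapply can be seen by applying the same function both ways. The list `pieces` is a made-up input for this sketch.

```r
library(plyr)

# Three list elements, each summarised by a length-2 result
pieces <- list(1:3, 4:6, 7:9)

by_row <- laply(pieces, range)   # one row per list element
by_col <- sapply(pieces, range)  # one column per list element

dim(by_row)
dim(by_col)
```

Dropping the dimension names, `by_row` holds the same values as `t(by_col)`.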
Value
if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)
Input
This function splits lists by elements.
Output
If there are no results, then this function will return a vector of length 0 (vector()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other list input: l_ply(), ldply(), llply()
Other array output: aaply(), daply(), maply()
Examples
laply(baseball, is.factor)
# cf
ldply(baseball, is.factor)
colwise(is.factor)(baseball)
laply(seq_len(10), identity)
laply(seq_len(10), rep, times = 4)
laply(seq_len(10), matrix, nrow = 2, ncol = 2)
Split list, apply function, and return results in a data frame.
Description
For each element of a list, apply function then combine results into a data frame.
Usage
ldply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL,
  .id = NA
)
Arguments
.data | list to be processed |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.parallel | if |
.paropts | a list of additional options passed into the |
.id | name of the index column (used if |
Value
A data frame, as described in the output section.
Input
This function splits lists by elements.
Output
The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, it will be rbinded together and converted to a data frame. Any other values will result in an error.
If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
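A minimal sketch of the data-frame-returning case, using a made-up named list `scores`. It relies on the default `.id = NA`, under which the names of a named list are expected to appear in a `.id` column.

```r
library(plyr)

# Made-up named list of numeric vectors
scores <- list(a = 1:3, b = 4:6)

# .fun returns a one-row data frame per element; the pieces are stacked
res <- ldply(scores, function(v) data.frame(mean = mean(v)))
res
```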
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other list input: l_ply(), laply(), llply()
Other data frame output: adply(), ddply(), mdply()
Experimental iterator based version of llply.
Description
Because iterators do not have known length, liply starts by allocating an output list of length 50, and then doubles that length whenever it runs out of space. This gives O(n ln n) performance rather than the O(n ^ 2) performance from the naive strategy of growing the list each time.
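The doubling strategy can be sketched in plain R. This is not plyr's implementation, only an illustration of the idea: `collect_doubling` and `make_counter` are hypothetical helpers, and the "iterator" is modelled as a closure that returns NULL once it is exhausted.

```r
# Hypothetical helper: drain a source of unknown length into a list,
# doubling the list's capacity whenever it fills up.
collect_doubling <- function(next_value, initial = 50L) {
  out <- vector("list", initial)
  n <- 0L
  repeat {
    value <- next_value()
    if (is.null(value)) break          # source exhausted
    n <- n + 1L
    if (n > length(out)) {
      length(out) <- 2L * length(out)  # grow by doubling, not by one
    }
    out[[n]] <- value
  }
  out[seq_len(n)]                      # trim unused capacity
}

# Hypothetical source: yields 1, 2, ..., limit and then NULL
make_counter <- function(limit) {
  i <- 0L
  function() {
    if (i >= limit) return(NULL)
    i <<- i + 1L
    i
  }
}

res <- collect_doubling(make_counter(120L))
length(res)
```

With 120 values and an initial capacity of 50, the list is resized only twice (to 100, then 200) instead of once per element.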
Usage
liply(.iterator, .fun = NULL, ...)
Arguments
.iterator | iterator object |
.fun | function to apply to each piece |
... | other arguments passed on to |
Warning
Deprecated, do not use in new code.
See Also
List to array.
Description
Reduce/simplify a list of homogenous objects to an array
Usage
list_to_array(res, labels = NULL, .drop = FALSE)
Arguments
res | list of input data |
labels | a data frame of labels, one row for each element of res |
.drop | should extra dimensions be dropped (TRUE) or preserved (FALSE) |
See Also
Other list simplification functions: list_to_dataframe(), list_to_vector()
List to data frame.
Description
Reduce/simplify a list of homogenous objects to a data frame. All NULL entries are removed. Remaining entries must be all atomic or all data frames.
Usage
list_to_dataframe(res, labels = NULL, id_name = NULL, id_as_factor = FALSE)
Arguments
res | list of input data |
labels | a data frame of labels, one row for each element of res |
id_name | the name of the index column, |
See Also
Other list simplification functions: list_to_array(), list_to_vector()
List to vector.
Description
Reduce/simplify a list of homogenous objects to a vector
Usage
list_to_vector(res)
Arguments
res | list of input data |
See Also
Other list simplification functions: list_to_array(), list_to_dataframe()
Split list, apply function, and return results in a list.
Description
For each element of a list, apply function, keeping results as a list.
Usage
llply(
  .data,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)
Arguments
.data | list to be processed |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.parallel | if |
.paropts | a list of additional options passed into the |
Details
llply is equivalent to lapply except that it will preserve labels and can display a progress bar.
Value
list of results
Input
This function splits lists by elements.
Output
If there are no results, then this function will return a list of length 0 (list()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other list input: l_ply(), laply(), ldply()
Other list output: alply(), dlply(), mlply()
Examples
llply(llply(mtcars, round), table)
llply(baseball, summary)
# Examples from ?lapply
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
llply(x, mean)
llply(x, quantile, probs = 1:3/4)
Loop apply
Description
An optimised version of lapply for the special case of operating on seq_len(n)
Usage
loop_apply(n, f, env = parent.frame())
Arguments
n | length of sequence |
f | function to apply to each integer |
env | environment in which to evaluate function |
Call function with arguments in array or data frame, discarding results.
Description
Call a multi-argument function with values taken from columns of a data frame or array, and discard the results.
Usage
m_ply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .print = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)
Arguments
.data | matrix or data frame to use as source of arguments |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable. |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.print | automatically print each result? (default: |
.parallel | if |
.paropts | a list of additional options passed into the |
Details
The m*ply functions are the plyr version of mapply, specialised according to the type of output they produce. These functions are just a convenient wrapper around a*ply with margins = 1 and .fun wrapped in splat.
Value
Nothing
Input
Call a multi-argument function with values taken from columns of a data frame or array
Output
All output is discarded. This is useful for functions that you are calling purely for their side effects like displaying plots or saving output.
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other multiple arguments input: maply(), mdply(), mlply()
Other no output: a_ply(), d_ply(), l_ply()
Call function with arguments in array or data frame, returning an array.
Description
Call a multi-argument function with values taken from columns of a data frame or array, and combine results into an array
Usage
maply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)
Arguments
.data | matrix or data frame to use as source of arguments |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable. |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop | should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to |
.parallel | if |
.paropts | a list of additional options passed into the |
Details
The m*ply functions are the plyr version of mapply, specialised according to the type of output they produce. These functions are just a convenient wrapper around a*ply with margins = 1 and .fun wrapped in splat.
Value
if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)
Input
Call a multi-argument function with values taken from columns of a data frame or array
Output
If there are no results, then this function will return a vector of length 0 (vector()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other multiple arguments input: m_ply(), mdply(), mlply()
Other array output: aaply(), daply(), laply()
Examples
maply(cbind(mean = 1:5, sd = 1:5), rnorm, n = 5)
maply(expand.grid(mean = 1:5, sd = 1:5), rnorm, n = 5)
maply(cbind(1:5, 1:5), rnorm, n = 5)
Replace specified values with new values, in a vector or factor.
Description
Items in x that match items in from will be replaced by items in to, matched by position. For example, items in x that match the first element in from will be replaced by the first element of to.
Usage
mapvalues(x, from, to, warn_missing = TRUE)
Arguments
x | the factor or vector to modify |
from | a vector of the items to replace |
to | a vector of replacement values |
warn_missing | print a message if any of the old values are not actually present in |
Details
If x is a factor, the matching levels of the factor will be replaced with the new values.
The related revalue function works only on character vectors and factors, but this function works on vectors of any type and factors.
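The difference between the two interfaces can be sketched side by side. The vectors `codes` and `labels` are invented for the example.

```r
library(plyr)

# mapvalues(): two parallel vectors, works on numeric input
codes <- c(1, 2, 3)
recoded <- mapvalues(codes, from = c(1, 3), to = c(10, 30))

# revalue(): one named vector (old = "new"), character or factor input only
labels <- c("a", "b", "a")
relabelled <- revalue(labels, c(a = "A"))

recoded
relabelled
```

Values that do not appear in `from` (or among the names passed to `revalue`) are left unchanged.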
See Also
revalue to do the same thing but with a single named vector instead of two separate vectors.
Examples
x <- c("a", "b", "c")
mapvalues(x, c("a", "c"), c("A", "C"))
# Works on factors
y <- factor(c("a", "b", "c", "a"))
mapvalues(y, c("a", "c"), c("A", "C"))
# Works on numeric vectors
z <- c(1, 4, 5, 9)
mapvalues(z, from = c(1, 5, 9), to = c(10, 50, 90))
Extract matching rows of a data frame.
Description
Match works in the same way as join, but instead of returning the combined dataset, it only returns the matching rows from the first dataset. This is particularly useful when you've summarised the data in some way and want to subset the original data by a characteristic of the subset.
Usage
match_df(x, y, on = NULL)
Arguments
x | data frame to subset. |
y | data frame defining matching rows. |
on | variables to match on - by default will use all variables common to both data frames. |
Details
match_df shares the same semantics as join, not match:
- the match criterion is ==, not identical
- it doesn't work for columns that are not atomic vectors
- if there are no matches, the row will be omitted
Value
a data frame
See Also
join to combine the columns from both x and y and match for the base function selecting matching items
Examples
# count the occurrences of each id in the baseball dataframe, then get the subset with a freq >25
longterm <- subset(count(baseball, "id"), freq > 25)
# longterm
# id freq
# 30 ansonca01 27
# 48 baineha01 27
# ...
# Select only rows from these longterm players from the baseball dataframe
# (match would default to match on shared column names, but here was explicitly set "id")
bb_longterm <- match_df(baseball, longterm, on="id")
bb_longterm[1:5,]
Call function with arguments in array or data frame, returning a data frame.
Description
Call a multi-argument function with values taken from columns of a data frame or array, and combine results into a data frame
Usage
mdply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)
Arguments
.data | matrix or data frame to use as source of arguments |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable. |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.parallel | if |
.paropts | a list of additional options passed into the |
Details
The m*ply functions are the plyr version of mapply, specialised according to the type of output they produce. These functions are just a convenient wrapper around a*ply with margins = 1 and .fun wrapped in splat.
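To make the "row of arguments" idea concrete, the sketch below calls a two-argument function once per row with mdply, then shows what splat does to the same function on its own. The data frame `args_df` and the function `add_pair` are made up for the example.

```r
library(plyr)

# Each row supplies one set of named arguments
args_df <- data.frame(x = 1:3, y = 4:6)

# Hypothetical two-argument function returning a one-row data frame
add_pair <- function(x, y) data.frame(total = x + y)

res <- mdply(args_df, add_pair)
res

# splat() turns f(x, y) into a function of a single list of arguments
splatted_sum <- splat(function(x, y) x + y)
single <- splatted_sum(list(x = 1, y = 2))
single
```

The argument columns are kept alongside the results, so each output row records which inputs produced it.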
Value
A data frame, as described in the output section.
Input
Call a multi-argument function with values taken from columns of a data frame or array
Output
The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, it will be rbinded together and converted to a data frame. Any other values will result in an error.
If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other multiple arguments input: m_ply(), maply(), mlply()
Other data frame output: adply(), ddply(), ldply()
Examples
mdply(data.frame(mean = 1:5, sd = 1:5), rnorm, n = 2)
mdply(expand.grid(mean = 1:5, sd = 1:5), rnorm, n = 2)
mdply(cbind(mean = 1:5, sd = 1:5), rnorm, n = 5)
mdply(cbind(mean = 1:5, sd = 1:5), as.data.frame(rnorm), n = 5)
Call function with arguments in array or data frame, returning a list.
Description
Call a multi-argument function with values taken from columns of a data frame or array, and combine results into a list.
Usage
mlply(
  .data,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL
)
Arguments
.data | matrix or data frame to use as source of arguments |
.fun | function to apply to each piece |
... | other arguments passed on to |
.expand | should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable. |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.parallel | if |
.paropts | a list of additional options passed into the |
Details
The m*ply functions are the plyr version of mapply, specialised according to the type of output they produce. These functions are just a convenient wrapper around a*ply with margins = 1 and .fun wrapped in splat.
Value
list of results
Input
Call a multi-argument function with values taken from columns of a data frame or array
Output
If there are no results, then this function will return a list of length 0 (list()).
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
See Also
Other multiple arguments input: m_ply(), maply(), mdply()
Other list output: alply(), dlply(), llply()
Examples
mlply(cbind(1:4, 4:1), rep)
mlply(cbind(1:4, times = 4:1), rep)
mlply(cbind(1:4, 4:1), seq)
mlply(cbind(1:4, length = 4:1), seq)
mlply(cbind(1:4, by = 4:1), seq, to = 20)
Mutate a data frame by adding new or replacing existing columns.
Description
This function is very similar to transform but it executes the transformations iteratively so that later transformations can use the columns created by earlier transformations. Like transform, unnamed components are silently dropped.
Usage
mutate(.data, ...)
Arguments
.data | the data frame to transform |
... | named parameters giving definitions of new columns. |
Details
Mutate seems to be considerably faster than transform for large data frames.
See Also
subset, summarise, arrange. For another somewhat different approach to solving the same problem, see within.
Examples
# Examples from transform
mutate(airquality, Ozone = -Ozone)
mutate(airquality, new = -Ozone, Temp = (Temp - 32) / 1.8)
# Things transform can't do
mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)
# mutate is rather faster than transform
system.time(transform(baseball, avg_ab = ab / g))
system.time(mutate(baseball, avg_ab = ab / g))
Toggle row names between explicit and implicit.
Description
Plyr functions ignore row names, so this function provides a way to preserve them by converting them to an explicit column in the data frame. After the plyr operation, you can then apply name_rows again to convert back from the explicit column to the implicit rownames.
Usage
name_rows(df)
Arguments
df | a data.frame, with either |
Examples
name_rows(mtcars)
name_rows(name_rows(mtcars))
df <- data.frame(a = sample(10))
arrange(df, a)
arrange(name_rows(df), a)
name_rows(arrange(name_rows(df), a))
Compute names of quoted variables.
Description
Figure out names of quoted variables, using specified names if they exist, otherwise converting the values to character strings. This may create variable names that can only be accessed using ``.
Usage
## S3 method for class 'quoted'
names(x)
Number of unique values.
Description
Calculate number of unique values of a variable as efficiently as possible.
Usage
nunique(x)
Arguments
x | vector |
Monthly ozone measurements over Central America.
Description
This data set is a subset of the data from the 2006 ASA Data expo challenge, https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2006. The data are monthly ozone averages on a very coarse 24 by 24 grid covering Central America, from Jan 1995 to Dec 2000. The data is stored in a 3d array with the first two dimensions representing latitude and longitude, and the third representing time.
Usage
ozone
Format
A 24 x 24 x 72 numeric array
References
https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2006
Examples
value <- ozone[1, 1, ]
time <- 1:72
month.abbr <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
month <- factor(rep(month.abbr, length = 72), levels = month.abbr)
year <- rep(1:6, each = 12)
deseasf <- function(value) lm(value ~ month - 1)
models <- alply(ozone, 1:2, deseasf)
coefs <- laply(models, coef)
dimnames(coefs)[[3]] <- month.abbr
names(dimnames(coefs))[3] <- "month"
deseas <- laply(models, resid)
dimnames(deseas)[[3]] <- 1:72
names(dimnames(deseas))[3] <- "time"
dim(coefs)
dim(deseas)
Deprecated Functions in Package plyr
Description
These functions are provided for compatibility with older versions of plyr only, and may be defunct as soon as the next release.
Details
Print quoted variables.
Description
Display the structure of quoted variables
Usage
## S3 method for class 'quoted'
print(x, ...)
Print split.
Description
Don't print labels, so it appears like a regular list
Usage
## S3 method for class 'split'
print(x, ...)
Arguments
x | object to print |
... | unused |
Null progress bar
Description
A progress bar that does nothing
Usage
progress_none()
Details
This is the default progress bar used by plyr functions. It's very simple to understand - it does nothing!
See Also
Other progress bars: progress_text(), progress_time(), progress_tk(), progress_win()
Examples
l_ply(1:100, identity, .progress = "none")
Text progress bar.
Description
A textual progress bar
Usage
progress_text(style = 3, ...)
Arguments
style | style of text bar, see Details section of |
... | other arguments passed on to |
Details
This progress bar displays a textual progress bar that works on all platforms. It is a thin wrapper around the built-in setTxtProgressBar and can be customised in the same way.
See Also
Other progress bars: progress_none(), progress_time(), progress_tk(), progress_win()
Examples
l_ply(1:100, identity, .progress = "text")
l_ply(1:100, identity, .progress = progress_text(char = "-"))
Text progress bar with time.
Description
A textual progress bar that estimates time remaining. It displays the estimated time remaining and, when finished, total duration.
Usage
progress_time()
See Also
Other progress bars: progress_none(), progress_text(), progress_tk(), progress_win()
Examples
l_ply(1:100, function(x) Sys.sleep(.01), .progress = "time")
Graphical progress bar, powered by Tk.
Description
A graphical progress bar displayed in a Tk window
Usage
progress_tk(title = "plyr progress", label = "Working...", ...)
Arguments
title | window title |
label | progress bar label (inside window) |
... | other arguments passed on to |
Details
This graphical progress bar will appear in a separate window.
See Also
tkProgressBar for the function that powers this progress bar
Other progress bars: progress_none(), progress_text(), progress_time(), progress_win()
Examples
## Not run: 
l_ply(1:100, identity, .progress = "tk")
l_ply(1:100, identity, .progress = progress_tk(width=400))
l_ply(1:100, identity, .progress = progress_tk(label=""))
## End(Not run)
Graphical progress bar, powered by Windows.
Description
A graphical progress bar displayed in a separate window
Usage
progress_win(title = "plyr progress", ...)
Arguments
title | window title |
... | other arguments passed on to |
Details
This graphical progress bar only works on Windows.
See Also
winProgressBar for the function that powers this progress bar
Other progress bars: progress_none(), progress_text(), progress_time(), progress_tk()
Examples
## Not run: 
l_ply(1:100, identity, .progress = "win")
l_ply(1:100, identity, .progress = progress_win(title="Working..."))
## End(Not run)
Quick data frame.
Description
Experimental version of as.data.frame that converts a list to a data frame, but doesn't do any checks to make sure it's a valid format. Much faster.
Usage
quickdf(list)
Arguments
list | list to convert to data frame |
Replicate expression and discard results.
Description
Evaluate expression n times then discard results
Usage
r_ply(.n, .expr, .progress = "none", .print = FALSE)
Arguments
.n | number of times to evaluate the expression |
.expr | expression to evaluate |
.progress | name of the progress bar to use, see |
.print | automatically print each result? (default: |
Details
This function runs an expression multiple times, discarding the results. This function is equivalent to replicate, but never returns anything.
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
Examples
r_ply(10, plot(runif(50)))
r_ply(25, hist(runif(1000)))
Replicate expression and return results in an array.
Description
Evaluate expression n times then combine results into an array
Usage
raply(.n, .expr, .progress = "none", .drop = TRUE)
Arguments
.n | number of times to evaluate the expression |
.expr | expression to evaluate |
.progress | name of the progress bar to use, see |
.drop | should extra dimensions of length 1 be dropped, simplifying the output. Defaults to |
Details
This function runs an expression multiple times, and combines the result into an array. If there are no results, then this function returns a vector of length 0 (vector()). This function is equivalent to replicate, but will always return results as a vector, matrix or array.
Value
if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
Examples
raply(100, mean(runif(100)))
raply(100, each(mean, var)(runif(100)))
raply(10, runif(4))
raply(10, matrix(runif(4), nrow=2))
# See the central limit theorem in action
hist(raply(1000, mean(rexp(10))))
hist(raply(1000, mean(rexp(100))))
hist(raply(1000, mean(rexp(1000))))
Combine data.frames by row, filling in missing columns.
Description
rbinds a list of data frames filling missing columns with NA.
Usage
rbind.fill(...)
Arguments
... | input data frames to row bind together. The first argument can be a list of data frames, in which case all other arguments are ignored. Any NULL inputs are silently dropped. If all inputs are NULL, the output is NULL. |
Details
This is an enhancement to rbind that adds in columns that are not present in all inputs, accepts a list of data frames, and operates substantially faster.
Column names and types in the output will appear in the order in which they were encountered.
Unordered factor columns will have their levels unified and character data bound with factors will be converted to character. POSIXct data will be converted to be in the same time zone. Array and matrix columns must have identical dimensions after the row count. Aside from these there are no general checks that each column is of consistent data type.
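A small sketch of the column-filling behaviour, using two made-up data frames where only the second has a column `b`.

```r
library(plyr)

first  <- data.frame(a = 1:2)
second <- data.frame(a = 3, b = "x")

# Column b is missing from `first`, so its rows are filled with NA
res <- rbind.fill(first, second)
res

# The same inputs can also be supplied as a single list
res_list <- rbind.fill(list(first, second))
```

Columns appear in the order they are first encountered, so `a` comes before `b` here.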
Value
a single data frame
See Also
Other binding functions:rbind.fill.matrix()
Examples
rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])Bind matrices by row, and fill missing columns with NA.
Description
The matrices are bound together using their column names or the columnindices (in that order of precedence.) Numeric columns may be converted tocharacter beforehand, e.g. using format. If a matrix doesn't havecolnames, the column number is used. Note that this means that acolumn with name"1" is merged with the first column of a matrixwithout name and so on. The returned matrix will always have column names.
Usage
rbind.fill.matrix(...)Arguments
... | the matrices to rbind. The first argument can be a list ofmatrices, in which case all other arguments are ignored. |
Details
Vectors are converted to 1-column matrices.
Matrices of factors are not supported. (They are quite inconvenient anyway.) You may convert them first to either numeric or character matrices. If matrices of different types are merged, then normal conversion precedence will apply.
Row names are ignored.
Value
a matrix with column names
Author(s)
C. Beleites
See Also
Other binding functions: rbind.fill()
Examples
A <- matrix (1:4, 2)
B <- matrix (6:11, 2)
A
B
rbind.fill.matrix (A, B)
colnames (A) <- c (3, 1)
A
rbind.fill.matrix (A, B)
rbind.fill.matrix (A, 99)
Replicate expression and return results in a data frame.
Description
Evaluate expression n times then combine results into a data frame
Usage
rdply(.n, .expr, .progress = "none", .id = NA)
Arguments
.n | number of times to evaluate the expression |
.expr | expression to evaluate |
.progress | name of the progress bar to use, see |
.id | name of the index column. Pass |
Details
This function runs an expression multiple times, and combines the result into a data frame. If there are no results, then this function returns a data frame with zero rows and columns (data.frame()). This function is equivalent to replicate, but will always return results as a data frame.
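A minimal sketch of the contrast with replicate, using a made-up expression that returns a one-row data frame on each evaluation:

```r
library(plyr)

# Each evaluation yields a one-row data frame; rdply stacks the three results
draws <- rdply(3, data.frame(x = runif(1)))

is.data.frame(draws)   # always a data frame
nrow(draws)            # one row per evaluation

# replicate() on a comparable expression simplifies to a plain vector instead
is.data.frame(replicate(3, runif(1)))
```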
Value
a data frame
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
Examples
rdply(20, mean(runif(100)))
rdply(20, each(mean, var)(runif(100)))
rdply(20, data.frame(x = runif(2)))
Reduce dimensions.
Description
Remove extraneous dimensions
Usage
reduce_dim(x)
Arguments
x | array |
Modify names by name, not position.
Description
Modify names by name, not position.
Usage
rename(x, replace, warn_missing = TRUE, warn_duplicated = TRUE)
Arguments
x | named object to modify |
replace | named character vector, with new names as values, and old names as names. |
warn_missing | print a message if any of the old names are not actually present in |
warn_duplicated | print a message if any name appears more than once in |
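As a rough illustration of what renaming "by name, not position" means, the sketch below compares rename with a hand-written base R equivalent. The data frame `df` and the new name `beta` are invented for the example:

```r
library(plyr)

df <- data.frame(a = 1, b = 2, c = 3)

# plyr: old name on the left, new name on the right
by_name <- rename(df, c("b" = "beta"))

# Base R equivalent: look the column up by its name rather than its index
by_hand <- df
names(by_hand)[names(by_hand) == "b"] <- "beta"

names(by_name)
identical(by_name, by_hand)
```

Because the lookup is by name, the call keeps working if columns are later reordered.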
Examples
x <- c("a" = 1, "b" = 2, d = 3, 4)
# Rename column d to "c", updating the variable "x" with the result
x <- rename(x, replace = c("d" = "c"))
x
# Rename column "disp" to "displacement"
rename(mtcars, c("disp" = "displacement"))
Replace specified values with new values, in a factor or character vector.
Description
If x is a factor, the named levels of the factor will be replaced with the new values.
Usage
revalue(x, replace = NULL, warn_missing = TRUE)
Arguments
x | factor or character vector to modify |
replace | named character vector, with new values as values, and old values as names. |
warn_missing | print a message if any of the old values are not actually present in |
Details
This function works only on character vectors and factors, but the related mapvalues function works on vectors of any type and factors, and instead of a named vector specifying the original and replacement values, it takes two separate vectors.
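The difference between the two interfaces can be sketched with a small made-up vector: revalue takes one named vector, while mapvalues takes separate `from` and `to` vectors and also accepts non-character input.

```r
library(plyr)

letters_in <- c("a", "b", "c")

# revalue: a single named vector, old values as names and new values as values
via_revalue <- revalue(letters_in, c(a = "A", c = "C"))

# mapvalues: two parallel vectors
via_mapvalues <- mapvalues(letters_in, from = c("a", "c"), to = c("A", "C"))

identical(via_revalue, via_mapvalues)

# mapvalues also works on numeric vectors, which revalue does not accept
mapped_numbers <- mapvalues(c(1, 2, 3), from = c(1, 3), to = c(10, 30))
mapped_numbers
```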
See Also
mapvalues to replace values with vectors of any type
Examples
x <- c("a", "b", "c")
revalue(x, c(a = "A", c = "C"))
revalue(x, c("a" = "A", "c" = "C"))
y <- factor(c("a", "b", "c", "a"))
revalue(y, c(a = "A", c = "C"))
Replicate expression and return results in a list.
Description
Evaluate expression n times then combine results into a list
Usage
rlply(.n, .expr, .progress = "none")
Arguments
.n | number of times to evaluate the expression |
.expr | expression to evaluate |
.progress | name of the progress bar to use, see |
Details
This function runs an expression multiple times, and combines the result into a list. If there are no results, then this function will return a list of length 0 (list()). This function is equivalent to replicate, but will always return results as a list.
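A short sketch of the "always a list" guarantee, using an arbitrary expression that returns two random numbers:

```r
library(plyr)

# rlply keeps each evaluation as a separate list element
as_list <- rlply(3, runif(2))
is.list(as_list)
length(as_list)

# replicate() simplifies the same results into a 2 x 3 matrix
as_matrix <- replicate(3, runif(2))
dim(as_matrix)
```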
Value
list of results
References
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. https://www.jstatsoft.org/v40/i01/.
Examples
mods <- rlply(100, lm(y ~ x, data=data.frame(x=rnorm(100), y=rnorm(100))))
hist(laply(mods, function(x) summary(x)$r.squared))
Round to multiple of any number.
Description
Round to multiple of any number.
Usage
round_any(x, accuracy, f = round)
Arguments
x | numeric or date-time (POSIXct) vector to round |
accuracy | number to round to; for POSIXct objects, a number of seconds |
f |
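For numeric input, the result can be thought of as `f(x / accuracy) * accuracy`. The helper `round_to_multiple` below is a hypothetical re-implementation written for illustration only; it is not taken from plyr's source:

```r
library(plyr)

# Hypothetical helper: scale down, apply the rounding function, scale back up
round_to_multiple <- function(x, accuracy, f = round) {
  f(x / accuracy) * accuracy
}

round_to_multiple(135, 25)            # 135 / 25 = 5.4, rounds to 5, gives 125
round_to_multiple(135, 25, ceiling)   # ceiling(5.4) = 6, gives 150

# Compare against plyr's own function
round_any(135, 25)
round_any(135, 25, ceiling)
```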
Examples
round_any(135, 10)
round_any(135, 100)
round_any(135, 25)
round_any(135, 10, floor)
round_any(135, 100, floor)
round_any(135, 25, floor)
round_any(135, 10, ceiling)
round_any(135, 100, ceiling)
round_any(135, 25, ceiling)
round_any(Sys.time() + 1:10, 5)
round_any(Sys.time() + 1:10, 5, floor)
round_any(Sys.time(), 3600)
‘Splat’ arguments to a function.
Description
Wraps a function in do.call, so instead of taking multiple arguments, it takes a single named list which will be interpreted as its arguments.
Usage
splat(flat)
Arguments
flat | function to splat |
Details
This is useful when you want to pass a function a row of a data frame or array, and don't want to manually pull it apart in your function.
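A minimal sketch of the idea, using a made-up function `area` and a named list standing in for one row of data:

```r
library(plyr)

# Hypothetical function of two named arguments; `...` absorbs any extras
area <- function(width, height, ...) width * height

dims <- list(width = 3, height = 4, label = "ignored")

# The splatted function takes the whole list and matches elements by name
splatted_result <- splat(area)(dims)

# Equivalent call written out with do.call
direct_result <- do.call(area, dims)

splatted_result
identical(splatted_result, direct_result)
```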
Value
a function
Examples
hp_per_cyl <- function(hp, cyl, ...) hp / cyl
splat(hp_per_cyl)(mtcars[1,])
splat(hp_per_cyl)(mtcars)
f <- function(mpg, wt, ...) data.frame(mw = mpg / wt)
ddply(mtcars, .(cyl), splat(f))
Split indices.
Description
An optimised version of split for the special case of splitting row indices into groups, as used by splitter_d.
Usage
split_indices(group, n = 0L)
Arguments
group | integer indices |
n | largest integer (may not appear in index). This is a hint: if the largest value of |
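To make the relationship with base split concrete, the sketch below groups the positions of a small invented integer vector both ways:

```r
library(plyr)

# Group membership for five hypothetical rows
group <- c(1L, 2L, 1L, 3L, 2L)

# plyr: positions of each group, returned as an unnamed list
by_plyr <- split_indices(group, 3L)

# Base R: split the row positions by group, then drop the names
by_base <- unname(split(seq_along(group), group))

by_plyr
isTRUE(all.equal(by_plyr, by_base))
```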
Examples
split_indices(sample(10, 100, rep = TRUE))
split_indices(sample(10, 100, rep = TRUE), 10)
Generate labels for split data frame.
Description
Create data frame giving labels for split data frame.
Usage
split_labels(splits, drop, id = plyr::id(splits, drop = TRUE))
Arguments
splits | list of variables to split up by |
drop | whether all possible combinations should be considered, or only those present in the data |
Split an array by .margins.
Description
Split a 2d or higher data structure into lower-d pieces based
Usage
splitter_a(data, .margins = 1L, .expand = TRUE, .id = NA)
Arguments
data | >1d data structure (matrix, data.frame or array) |
.margins | a vector giving the subscripts to split up |
.expand | if splitting a dataframe by row, should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable. |
.id | names of the split label. Pass |
Details
This is the workhorse of the a*ply functions. Given a >1 d data structure (matrix, array, data.frame), it splits it into pieces based on the subscripts that you supply. Each piece is a lower dimensional slice.
The margins are specified in the same way as apply, but splitter_a just splits up the data, while apply also applies a function and combines the pieces back together. This function also includes enough information to recreate the split from attributes on the list of pieces.
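Since splitter_a is internal, the margin convention is perhaps easiest to see through the exported alply, which relies on it. The matrix `m` below is invented for the sketch:

```r
library(plyr)

m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns

# Margin 1 splits by row, margin 2 by column, as with apply()
row_pieces <- alply(m, 1, function(piece) piece)
col_pieces <- alply(m, 2, function(piece) piece)

length(row_pieces)   # one piece per row
length(col_pieces)   # one piece per column

# Applying a function to each row piece matches apply() over the same margin
row_sums <- unname(unlist(alply(m, 1, sum)))
all(row_sums == apply(m, 1, sum))
```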
Value
a list of lower-d slices, with attributes that record split details
See Also
Other splitter functions: splitter_d()
Examples
plyr:::splitter_a(mtcars, 1)
plyr:::splitter_a(mtcars, 2)
plyr:::splitter_a(ozone, 2)
plyr:::splitter_a(ozone, 3)
plyr:::splitter_a(ozone, 1:2)
Split a data frame by variables.
Description
Split a data frame into pieces based on variables contained in that data frame
Usage
splitter_d(data, .variables = NULL, drop = TRUE)
Arguments
data | data frame |
.variables | a quoted list of variables |
drop | drop unused factor levels? |
Details
This is the workhorse of the d*ply functions. Based on the variables you supply, it breaks up a single data frame into a list of data frames, each containing a single combination from the levels of the specified variables.
This is basically a thin wrapper around split which evaluates the variables in the context of the data, and includes enough information to reconstruct the labelling of the data frame after other operations.
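The splitting behaviour can be observed through the exported dlply, which uses this splitter internally. The data frame `scores` is made up for the sketch:

```r
library(plyr)

scores <- data.frame(g = c("a", "b", "a"), v = 1:3)

# Returning each piece unchanged exposes what the splitter produced
pieces <- dlply(scores, .(g), function(piece) piece)

length(pieces)      # one piece per distinct value of g
names(pieces)       # pieces are labelled by the splitting variable
pieces[["a"]]$v     # rows 1 and 3 belong to group "a"
```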
Value
a list of data.frames, with attributes that record split details
See Also
Other splitter functions: splitter_a()
Examples
plyr:::splitter_d(mtcars, .(cyl))
plyr:::splitter_d(mtcars, .(vs, am))
plyr:::splitter_d(mtcars, .(am, vs))
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(2, 4, 6, 8, 10))
plyr:::splitter_d(mtcars, .(cyl2), drop = TRUE)
plyr:::splitter_d(mtcars, .(cyl2), drop = FALSE)
mtcars$cyl3 <- ifelse(mtcars$vs == 1, NA, mtcars$cyl)
plyr:::splitter_d(mtcars, .(cyl3))
plyr:::splitter_d(mtcars, .(cyl3, vs))
plyr:::splitter_d(mtcars, .(cyl3, vs), drop = FALSE)
Remove splitting variables from a data frame.
Description
This is useful when you want to perform some operation to every column in the data frame, except the variables that you have used to split it. These variables will be automatically added back on to the result when combining all results together.
Usage
strip_splits(df)
Arguments
df | data frame produced by |
Examples
dlply(mtcars, c("vs", "am"))
dlply(mtcars, c("vs", "am"), strip_splits)
Summarise a data frame.
Description
Summarise works in an analogous way to mutate, except instead of adding columns to an existing data frame, it creates a new data frame. This is particularly useful in conjunction with ddply as it makes it easy to perform group-wise summaries.
Usage
summarise(.data, ...)
Arguments
.data | the data frame to be summarised |
... | further arguments of the form var = value |
Note
Be careful when using existing variable names; the corresponding columns will be immediately updated with the new data and this can affect subsequent operations referring to those variables.
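The pitfall described in this note can be sketched with a small invented data frame. The explicit `plyr::` prefix is used on the assumption that another attached package (such as dplyr) might otherwise mask summarise:

```r
library(plyr)

measurements <- data.frame(x = c(1, 2, 3))

# Reusing the name `x`: the second expression sees the already-summarised x
reused <- plyr::summarise(measurements,
                          x = mean(x),
                          spread = max(x) - min(x))

# Using a fresh name keeps the original column available to later expressions
fresh <- plyr::summarise(measurements,
                         x_mean = mean(x),
                         spread = max(x) - min(x))

reused$spread   # computed from the single summarised value
fresh$spread    # computed from the original three values
```

Choosing output names that do not clash with input columns avoids the surprise.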
Examples
# Let's extract the number of teams and total period of time
# covered by the baseball dataframe
summarise(baseball, duration = max(year) - min(year), nteams = length(unique(team)))
# Combine with ddply to do that for each separate id
ddply(baseball, "id", summarise, duration = max(year) - min(year), nteams = length(unique(team)))
Take a subset along an arbitrary dimension
Description
Take a subset along an arbitrary dimension
Usage
take(x, along, indices, drop = FALSE)
Arguments
x | matrix or array to subset |
along | dimension to subset along |
indices | the indices to select |
drop | should the dimensions of the array be simplified? Defaults to |
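As a sketch of what "along an arbitrary dimension" buys you, the call below is compared with the equivalent hand-written subscript on a small invented array. With take, the dimension is an ordinary argument, so it can be chosen at run time:

```r
library(plyr)

cube <- array(seq_len(2 * 3 * 4), c(2, 3, 4))

# Select index 1 along the second dimension
taken <- take(cube, along = 2, indices = 1)

# The same subset written with a fixed subscript position
by_hand <- cube[, 1, , drop = FALSE]

dim(taken)                  # the selected dimension is kept with length 1
identical(taken, by_hand)
```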
Examples
x <- array(seq_len(3 * 4 * 5), c(3, 4, 5))
take(x, 3, 1)
take(x, 2, 1)
take(x, 1, 1)
take(x, 3, 1, drop = TRUE)
take(x, 2, 1, drop = TRUE)
take(x, 1, 1, drop = TRUE)
Function that always returns true.
Description
Function that always returns true.
Usage
true(...)
Arguments
... | all input ignored |
Value
TRUE
See Also
colwise which uses it
Try, with default in case of error.
Description
try_default wraps try so that it returns a default value in the case of error. tryNULL provides a useful special case when dealing with lists.
Usage
try_default(expr, default, quiet = FALSE)
tryNULL(expr)
Arguments
expr | expression to try |
default | default value in case of error |
quiet | should errors be suppressed (TRUE) or printed (FALSE, default) |
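A minimal sketch of both functions, using deliberately failing expressions made up for the example:

```r
library(plyr)

# The expression succeeds, so its value is returned
ok <- try_default(log(10), default = NA, quiet = TRUE)

# The expression fails, so the default is returned instead
fallback <- try_default(stop("boom"), default = NA, quiet = TRUE)

# tryNULL is the special case where the fallback value is NULL
nothing <- tryNULL(stop("boom"))

ok
fallback
is.null(nothing)
```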
See Also
Apply with built in try. Uses compact, lapply and tryNULL
Description
Apply with built in try. Uses compact, lapply and tryNULL
Usage
tryapply(list, fun, ...)
Arguments
list | list to apply function |
fun | function |
... | further arguments to |
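The combined effect of compact, lapply and tryNULL is that elements whose call fails are dropped from the result. A small sketch with an invented input list:

```r
library(plyr)

# sqrt() fails on the character element, so only two results survive
inputs <- list(1, "a", 4)
roots <- tryapply(inputs, sqrt)

length(roots)
unlist(roots)
```

Note that the output can be shorter than the input, so positions no longer line up with the original list.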
Un-rowname.
Description
Strip rownames from an object
Usage
unrowname(x)
Arguments
x | data frame |
Vector aggregate.
Description
This function is somewhat similar to tapply, but is designed for use in conjunction with id. It is simpler in that it only accepts a single grouping vector (use id if you have more) and uses vapply internally, using the .default value as the template.
Usage
vaggregate(.value, .group, .fun, ..., .default = NULL, .n = nlevels(.group))
Arguments
.value | vector of values to aggregate |
.group | grouping vector |
.fun | aggregation function |
... | other arguments passed on to |
.default | default value used for missing groups. This argument is also used as the template for function output. |
.n | total number of groups |
Details
vaggregate should be faster than tapply in most situations because it avoids making a copy of the data.
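For a case where every group is present, the results can be compared directly with tapply. The values and grouping factor below are invented for the sketch:

```r
library(plyr)

values <- c(1, 2, 3, 4)
groups <- factor(c("a", "b", "a", "b"))

# One aggregated value per group, returned as a plain unnamed vector
by_vaggregate <- vaggregate(values, groups, sum)

# tapply gives the same numbers, but as a named array
by_tapply <- tapply(values, groups, sum)

by_vaggregate
all(as.numeric(by_vaggregate) == as.numeric(by_tapply))
```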
Examples
# Some examples of use borrowed from ?tapply
n <- 17; fac <- factor(rep(1:3, length.out = n), levels = 1:5)
table(fac)
vaggregate(1:n, fac, sum)
vaggregate(1:n, fac, sum, .default = NA_integer_)
vaggregate(1:n, fac, range)
vaggregate(1:n, fac, range, .default = c(NA, NA) + 0)
vaggregate(1:n, fac, quantile)
# Unlike tapply, vaggregate does not support multi-d output:
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
vaggregate(warpbreaks$breaks, id(warpbreaks[,-1]), sum)
# But it is about 10x faster
x <- rnorm(1e6)
y1 <- sample.int(10, 1e6, replace = TRUE)
system.time(tapply(x, y1, mean))
system.time(vaggregate(x, y1, mean))