R CMD check passes cleanly in future R-devel.
R CMD check passes cleanly on R and R-devel.
Revert to R version of loop_apply() as the Rcpp version appears to be having PROTECTion problems. (Also fixes #256)
Update for changes in R namespace best-practices.
New parameter .id to adply() that specifies the name(s) of the index column(s). (Thanks to Kirill Müller, #191)
Fix bug in split_indices() when n isn’t supplied.
Fix bug in .id parameter to ldply() and rdply() allowing for .id = NULL to work as described in the help. (Thanks to Doug Mitarotonda, #207, and Marek, #224 and #225)
Deprecate exotic functions liply() and isplit2(), remove unused and unexported functions dots() and parallel_fe(). (Thanks to Kirill Müller, #242, #248)
Warn on duplicate names that cause certain array functions to fail. (Thanks to Kirill Müller, #211)
Parameter .inform is now honored for ?_ply() calls. (Thanks to Kirill Müller, #209)
New parameter .id to ldply() and rdply() that specifies the name of the index column. (Thanks to Kirill Müller, #107, #140, #142)
The .id column in ldply() is generated as a factor to preserve the sort order, but only if the new .id parameter is set. (Thanks to Kirill Müller, #137)
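As a rough illustration of the .id entries above, the sketch below names the index column when ldply() combines a named list. The list contents and the column name "group" are made up for the example.

```r
library(plyr)

# Hypothetical input: a named list of integer vectors
pieces <- list(a = 1:2, b = 3:4)

# .id names the column that records which list element each row came from
res <- ldply(pieces, function(x) data.frame(value = x), .id = "group")

names(res)            # the index column is placed before the result columns
is.factor(res$group)  # with .id set explicitly, the index column is a factor
```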
rbind.fill now silently drops NULL inputs (#138)
rbind.fill avoids array copying which had produced quadratic time complexity. *dply of large numbers of groups should be faster. (Contributed by Peter Meilstrup)
rbind.fill handles non-numeric matrix columns (i.e. factor arrays, character arrays, list arrays); also arrays with more than 2 dimensions can be used. Dimnames of array columns are now preserved. (Contributed by Peter Meilstrup)
rbind.fill(x,y) converts factor columns of Y to character when columns of X are character. join(x,y) and match_df(x,y) now work when the key column in X is character and Y is factor. (Contributed by Peter Meilstrup)
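The rbind.fill entries above can be sketched with a small example. The data frames here are invented for illustration; the sketch only shows that a NULL input is skipped and that columns missing from one input are filled with NA.

```r
library(plyr)

# Hypothetical inputs with only partly overlapping columns
first  <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
second <- data.frame(a = 3L, c = TRUE)

# The NULL input is dropped; columns absent from an input are filled with NA
combined <- rbind.fill(first, NULL, second)

names(combined)
nrow(combined)
```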
Fix faulty array allocation which caused problems when using split_indices with large (> 2^24) vectors. (Fixes #131)
list_to_array() incorrectly determined dimensions if column of labels contained any missing values (#169).
r*ply expression is evaluated exactly .n times, evaluation results are consistent with side effects. (#158, thanks to Kirill Müller)
**ply gains a .inform argument (previously only available in llply) - this gives more useful debugging information at the cost of some speed. (Thanks to Brian Diggs, #57)
If .dims = TRUE, alply’s output gains dimensions and dimnames, similar to apply. Sequential indexing of a list produced by alply should be unaffected. (Peter Meilstrup)
colwise, numcolwise and catcolwise now all accept additional arguments in …. (Thanks to Stavros Macrakis, #62)
here makes it possible to use **ply + a function that uses non-standard evaluation (e.g. summarise, mutate, subset, arrange) inside a function. (Thanks to Peter Meilstrup, #3)
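A minimal sketch of the here entry above, assuming a wrapper function whose local variable must be visible to summarise. The function name scale_totals and the data are hypothetical.

```r
library(plyr)

# Hypothetical wrapper: k is local to the function, so summarise needs
# here() to look it up when called through ddply
scale_totals <- function(df, k) {
  ddply(df, "g", here(summarise), total = sum(x) * k)
}

df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))
res <- scale_totals(df, 10)
res
```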
join_all recursively joins a list of data frames. (Fixes #29)
name_rows provides a convenient way of saving and then restoring row names so that you can preserve them if you need to. (#61)
progress_time (used with .progress = "time") estimates the amount of time remaining before the job is completed. (Thanks to Mike Lawrence, #78)
summarise now works iteratively so that later columns can refer to earlier. (Thanks to Jim Hester, #44)
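To make the iterative behaviour of summarise concrete, the sketch below computes a column from two columns defined earlier in the same call. The data frame and column names are invented for the example.

```r
library(plyr)

# Hypothetical data
df <- data.frame(x = c(2, 4, 6))

# avg refers to total and n, which are created earlier in the same call
res <- summarise(df, total = sum(x), n = length(x), avg = total / n)
res
```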
take makes it easy to subset along an arbitrary dimension.
Improved documentation thanks to patches from Tim Bates.
**ply gains a .paropts argument, a list of options that is passed onto foreach for controlling parallel computation.
*_ply now accepts .parallel argument to enable parallel processing. (Fixes #60)
Progress bars are disabled when using parallel plyr (Fixes #32)
a*ply: 25x speedup when indexing array objects, 3x speedup when indexing data frames. This should substantially reduce the overhead of using a*ply
d*ply subsetting has been considerably optimised: this will have a small impact unless you have a very large number of groups, in which case it will be considerably faster.
idata.frame: Subsetting immutable data frames with [.idf is now faster (Peter Meilstrup)
quickdf is around 20% faster
split_indices, which powers much internal splitting code (like vaggregate, join and d*ply) is about 2x faster. It was already incredibly fast ~0.2s for 1,000,000 obs, so this won’t have much impact on overall performance
*aply functions now bind list mode results into a list-array (Peter Meilstrup)
*aply now accepts 0-dimension arrays as inputs. (#88)
count now works correctly for factor and Date inputs. (Fixes #130)
*dply now deals better with matrix results, converting them to data frames, rather than vectors. (Fixes #12)
d*ply will now preserve factor levels input if drop = FALSE (#81)
join works correctly when there are no common rows (Fixes #74), or when one input has no rows (Fixes #48). It also consistently orders the columns: common columns, then x cols, then y cols (Fixes #40).
quickdf correctly handles NA variable names. (Fixes #66. Thanks to Scott Kostyshak)
rbind.fill and rbind.fill.matrix work consistently with matrices and data frames with zero rows. Fixes #79. (Peter Meilstrup)
rbind.fill now stops if inputs are not data frames.(Fixes #51)
rbind.fill now works consistently with 0 column data frames
round_any now works with POSIXct objects, thanks to Jean-Olivier Irisson (#76)
rbind.fill: if a column contains both factors and characters (in different inputs), the resulting column will be coerced to character
When there are more than 2^31 distinct combinations, id switches to a slower fallback strategy using strings (inspired by merge) that guarantees correct results. This fixes problems with join when joining across many columns. (Fixes #63)
split_indices checks input more aggressively to prevent segfaults. Fixes #43.
fix small bug in loop_apply which led to segfaults in certain circumstances. (Thanks to Pål Westermark for patch)
itertools and iterators moved to suggests from imports so that plyr now only depends on base R.
documentation improved using new features of roxygen2
fixed namespacing issue which led to lost labels when subsetting the results of *lply
colwise automatically strips off split variables.
rlply now correctly deals with rlply(4, NULL) (thanks to bug report from Eric Goldlust)
rbind.fill tries harder to keep attributes, retaining the attributes from the first occurrence of each column it finds. It also now works with variables of class POSIXlt and preserves the ordered status of factors.
arrange now works with one column data frames
d*ply returns correct number of rows when function returns vector
fix NAMESPACE bug which was causing problems with ggplot2
rbind.fill now treats 1d arrays in the same way as rbind (i.e. it turns them into ordinary vectors)
fix bug in rename when renaming multiple columns
new strip_splits function removes splitting variables from the data frames returned by ddply.
rename moved in from reshape, and rewritten.
new match_df function makes it easy to subset a data frame to only contain values matching another data frame. Inspired by http://stackoverflow.com/questions/4693849.
**ply now works when passed a list of functions
*dply now correctly names output even when some output combinations are missing (NULL) (Thanks to bug report from Karl Ove Hufthammer)
*dply preserves the class of many more object types.
a*ply now correctly works with zero length margins, operating on the entire object (Thanks to bug report from Stavros Macrakis)
join now implements joins in a more SQL like way, returning all possible matches, not just the first one. It is still a (little) faster than merge. The previous behaviour is accessible with match = "first".
join is now more symmetric so that join(x, y, "left") is closer to join(y, x, "right"), modulo column ordering
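The difference between the SQL-like default and match = "first" described in the join entries above can be sketched as follows. The two data frames are invented for the example; the key value 1 appears twice in y.

```r
library(plyr)

# Hypothetical inputs: key k = 1 has two matching rows in y
x <- data.frame(k = c(1, 2), a = c("p", "q"))
y <- data.frame(k = c(1, 1, 2), b = c("u", "v", "w"))

# Default (match = "all"): every matching row of y is returned
all_matches <- join(x, y, by = "k")

# Previous behaviour: only the first match in y for each row of x
first_matches <- join(x, y, by = "k", match = "first")

nrow(all_matches)
nrow(first_matches)
```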
names.quoted failed when quoted expressions were longer than 50 characters. (Thanks to bug report from Eric Goldlust)
rbind.fill now correctly maintains POSIXct tzone attributes and preserves missing factor levels
split_labels correctly preserves empty factor levels, which means that drop = FALSE should work in more places. Use base::droplevels to remove levels that don’t occur in the data, and drop = T to remove combinations of levels that don’t occur.
vaggregate now passes ... to the aggregation function when working out the output type (thanks to bug report by Pavan Racherla)
count now takes an additional parameter wt_var which allows you to compute weighted sums. This is as fast, or faster than, tapply or xtabs.
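A small sketch of the wt_var parameter of count mentioned above, comparing plain counts with weighted sums. The data frame and its column names are made up for illustration.

```r
library(plyr)

# Hypothetical data: group labels g with weights w
df <- data.frame(g = c("a", "a", "b"), w = c(2, 3, 5))

# Without weights, freq is the number of rows per group
plain <- count(df, "g")

# With wt_var, freq is the sum of the weights per group
weighted <- count(df, "g", "w")

plain$freq
weighted$freq
```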
Really fix bug in names.quoted
. now captures the environment in which it was evaluated. This should fix an esoteric class of bugs which no-one probably ever encountered, but will form the basis for an improved version of ggplot2::aes.
Fix bug in names.quoted that interfered with ggplot2
mutate works like transform to add new columns or overwrite existing columns, but computes new columns iteratively so later transformations can use columns created by earlier transformations. (It’s also about 10x faster) (Fixes #21)
split column names are no longer coerced to valid R names.
quickdf now adds names if missing
summarise preserves variable names if explicit names not provided (Fixes #17)
arrays with names should be sorted correctly once again (also fixed a bug in the test case that prevented me from catching this automatically)
m_ply no longer possesses .parallel argument (mistakenly added)
ldply (and hence adply and ddply) now correctly passes on .parallel argument (Fixes #16)
id uses a better strategy for converting to integers, making it possible to use for cases with larger potential numbers of combinations
l*ply, d*ply, a*ply and m*ply all gain a .parallel argument that when TRUE, applies functions in parallel using a parallel backend registered with the foreach package:
    x <- seq_len(20)
    wait <- function(i) Sys.sleep(0.1)
    system.time(llply(x, wait))
    #  user  system elapsed
    # 0.007   0.005   2.005

    doParallel::registerDoParallel(2)
    system.time(llply(x, wait, .parallel = TRUE))
    #  user  system elapsed
    # 0.020   0.011   1.038

This work has been generously supported by BD (Becton Dickinson).
aply and mply gain an .expand argument that controls whether data frames produce a single output dimension (one element for each row), or an output dimension for each variable.
new vaggregate (vector aggregate) function, which is equivalent to tapply, but much faster (~ 10x), since it avoids copying the data.
llply: for simple lists and vectors, with no progress bar, no extra info, and no parallelisation, llply calls lapply directly to avoid all the overhead associated with those unused extra features.
llply: in serial case, for loop replaced with custom C function that takes about 40% less time (or about 20% less time than lapply). Note that as a whole, llply still has much more overhead than lapply.
round_any now lives in plyr instead of reshape
list_to_array works correctly even when there are missing values in the array. This is particularly important for daply.
*dply deals more gracefully with the case when all results are NULL (fixes #10)
*aply correctly orders output regardless of dimension names (fixes #11)
join gains type = "full" which preserves all x and y rows
experimental immutable data frame (idata.frame) that vastly speeds up subsetting - for large datasets with large numbers of groups, this can yield 10-fold speed ups. See examples in ?idata.frame to see how to use it.
rbind.fill rewritten again to increase speed and work with more data types
d*ply now much faster with nested groups
This work has been generously supported by BD (Becton Dickinson).
Fix bug in d*ply when .drop = FALSE
a*ply now works correctly with array-lists
r*ply now works with …