pivot_wider now uses .by and |> syntax for the dplyr helper message to identify duplicates (@boshek, #1516).
New family of consistent string separating functions: separate_wider_delim(), separate_wider_position(), separate_wider_regex(), separate_longer_delim(), and separate_longer_position(). These functions are thorough refreshes of separate() and extract(), featuring improved performance, greater consistency, a polished API, and a new approach for handling problems. They use stringr and supersede extract(), separate(), and separate_rows() (#1304).
The named character vector interface used in separate_wider_regex() is very similar to the nc package by Toby Dylan Hocking.
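As a rough illustration of the new separating functions, the sketch below splits one column two ways. The data frame and the output column names (letter, number) are made up for the example. In separate_wider_regex(), named pattern components become columns and unnamed components are matched but dropped.

```r
library(tidyr)

# Hypothetical input: one string column holding two pieces of information
df <- tibble(id = c("a-1", "b-22"))

# Split on a fixed delimiter
by_delim <- separate_wider_delim(
  df, id,
  delim = "-",
  names = c("letter", "number")
)

# Split with a named character vector of regular expressions;
# the unnamed "-" component is matched but not kept
by_regex <- separate_wider_regex(
  df, id,
  patterns = c(letter = "[a-z]+", "-", number = "[0-9]+")
)
```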
nest() gains a .by argument which allows you to specify the columns to nest by (rather than the columns to nest, i.e. through ...). Additionally, the .key argument is no longer deprecated, and is used whenever ... isn’t specified (#1458).
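A minimal sketch of the two ways to describe the same nesting, using a made-up data frame: naming the columns to nest by with .by, or naming the columns to nest through ....

```r
library(tidyr)

# Hypothetical data: g is the grouping column
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6)

# Name the columns to nest by; the list-column takes its name from .key ("data")
nested_by <- nest(df, .by = g)

# Name the columns to nest
nested_cols <- nest(df, data = c(x, y))
```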
unnest_longer() gains a keep_empty argument like unnest() (#1339).
pivot_longer() gains a cols_vary argument for controlling the ordering of the output rows relative to their original row number (#1312).
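The effect of cols_vary is easiest to see on a tiny example. In this sketch (the data frame is invented), the default "fastest" keeps the values from one input row together, while "slowest" keeps the values from one input column together.

```r
library(tidyr)

df <- tibble(id = 1:2, x = c("a", "b"), y = c("c", "d"))

# Default: the pivoted columns vary fastest, so rows from the same id stay adjacent
fastest <- pivot_longer(df, c(x, y))

# Columns vary slowest: all of x comes first, then all of y
slowest <- pivot_longer(df, c(x, y), cols_vary = "slowest")
```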
New datasets who2, household, cms_patient_experience, and cms_patient_care to demonstrate various tidying challenges (#1333).
The ... argument of both pivot_longer() and pivot_wider() has been moved to the front of the function signature, after the required arguments but before the optional ones. Additionally, pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), and build_wider_spec() have all gained ... arguments in a similar location. This change allows us to more easily add new features to the pivoting functions without breaking existing CRAN packages and user scripts.
pivot_wider() provides temporary backwards compatible support for the case of a single unnamed argument that previously was being positionally matched to id_cols. This one special case still works, but will throw a warning encouraging you to explicitly name the id_cols argument.
To read more about this pattern, see https://design.tidyverse.org/dots-after-required.html (#1350).
_ and various arguments to unnest()) now warn on every use. They will be made defunct in 2024 (#1406).
unnest_longer() now consistently drops rows with either NULL or empty vectors (like integer()) by default. Set the new keep_empty argument to TRUE to retain them. Previously, keep_empty = TRUE was implicitly being used for NULL, while keep_empty = FALSE was being used for empty vectors, which was inconsistent with all other tidyr verbs with this argument (#1363).
unnest_longer() now uses "" in the index column for fully unnamed vectors. It also now consistently uses NA in the index column for empty vectors that are “kept” by keep_empty = TRUE (#1442).
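The following sketch, on an invented list-column, shows the default dropping behaviour of unnest_longer() next to keep_empty = TRUE.

```r
library(tidyr)

# One NULL, one empty vector, one vector with two elements
df <- tibble(id = 1:3, y = list(NULL, integer(), 1:2))

# Default: rows whose element is NULL or empty are dropped
dropped <- unnest_longer(df, y)

# keep_empty = TRUE: each of those rows is kept once, with a missing value
kept <- unnest_longer(df, y, keep_empty = TRUE)
```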
unnest_wider() now errors if any values being unnested are unnamed and names_sep is not provided (#1367).
unnest_wider() now generates automatic names for partially unnamed vectors. Previously it only generated them for fully unnamed vectors, resulting in a strange mix of automatic names and name-repaired names (#1367).
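A small sketch of the unnamed case, with an invented list-column: supplying names_sep lets unnest_wider() build column names from the outer name and the element position, while omitting it is expected to error.

```r
library(tidyr)

# A list-column of unnamed vectors
df <- tibble(x = list(1:2, 3:4))

# With names_sep, automatic names are generated from the position
wide <- unnest_wider(df, x, names_sep = "_")

# Without names_sep, unnamed values are an error; try() captures it for inspection
err <- try(unnest_wider(df, x), silent = TRUE)
```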
Most tidyr functions now consistently disallow renaming during tidy-selection. Renaming was never meaningful in these functions, and previously either had no effect or caused problems (#1449, #1104).
tidyr errors (including input validation) have been thoroughly reviewed and should generally be more likely to point you in the right direction (#1313, #1400).
uncount() is now generic so implementations can be provided for objects other than data frames (
uncount() gains a ... argument. It comes between the required and the optional arguments (
nest(), complete(), expand(), and fill() now document their support for grouped data frames created by dplyr::group_by() (#952).
All built in datasets are now standard tibbles (#1459).
R >= 3.4.0 is now required, in line with the tidyverse standard of supporting the previous 5 minor releases of R.
rlang >= 1.0.4 and vctrs >= 0.5.2 are now required (#1344, #1470).
Removed dependency on ellipsis in favor of equivalent functions in rlang (#1314).
unnest(), unchop(), unnest_longer(), and unnest_wider() better handle lists with additional classes (#1327).
pack(), unpack(), chop(), and unchop() all gain an error_call argument, which in turn improves some of the error calls shown in nest() and various unnest() adjacent functions (#1446).
chop(), unpack(), and unchop() all gain ..., which must be empty (#1447).
unpack() does a better job of reporting column name duplication issues and gives better advice about how to resolve them using names_sep. This also improves errors from functions that use unpack(), like unnest() and unnest_wider() (#1425, #1367).
pivot_longer() no longer supports interpreting values_ptypes = list() and names_ptypes = list() as NULL. An empty list() is now interpreted as a <list> prototype to apply to all columns, which is consistent with how any other 0-length value is interpreted (#1296).
pivot_longer(values_drop_na = TRUE) is faster when there aren’t any missing values to drop (#1392,
pivot_longer() is now more memory efficient due to the usage of vctrs::vec_interleave() (#1310,
pivot_longer() now throws a slightly better error message when values_ptypes or names_ptypes is provided and the coercion can’t be made (#1364).
pivot_wider() now throws a better error message when a column selected by names_from or values_from is also selected by id_cols (#1318).
pivot_wider() is now faster when names_sep is provided (
pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), and build_wider_spec() all gain an error_call argument, resulting in better error reporting in pivot_longer() and pivot_wider() (#1408).
fill() now works correctly when there is a column named .direction in data (#1319,
replace_na() is faster when there aren’t any missing values to replace (#1392,
The documentation of the replace argument of replace_na() now mentions that replace is always cast to the type of data (#1317).
complete() and expand() no longer allow you to complete or expand on a grouping column. This was never well-defined since completion/expansion on a grouped data frame happens “within” each group and otherwise has the potential to produce erroneous results (#1299).
replace_na() no longer allows the type of data to change when the replacement is applied. replace will now always be cast to the type of data before the replacement is made. For example, this means that using a replacement value of 1.5 on an integer column is no longer allowed. Similarly, replacing missing values in a list-column must now be done with list("foo") rather than just "foo".
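The casting rule for replace_na() can be sketched as follows. The inputs are invented; the failing call is wrapped in try() so the example runs to completion.

```r
library(tidyr)

# A replacement that can be cast to integer without loss is accepted
ok <- replace_na(c(1L, NA), 0)

# 1.5 cannot be cast to integer, so this is expected to error
bad <- try(replace_na(c(1L, NA), 1.5), silent = TRUE)

# In a list-column the replacement must itself be a list
df <- tibble(x = list("a", NULL))
filled <- replace_na(df, list(x = list("foo")))
```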
pivot_wider() gains new names_expand and id_expand arguments for turning implicit missing factor levels and variable combinations into explicit ones. This is similar to the drop argument from spread() (#770).
pivot_wider() gains a new names_vary argument for controlling the ordering when combining names_from values with values_from column names (#839).
pivot_wider() gains a new unused_fn argument for controlling how to summarize unused columns that aren’t involved in the pivoting process (#990, thanks to
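A short sketch of names_expand and unused_fn together, on invented data: the factor level "c" never occurs but still gets a column, and the column when, which is not involved in the pivot, is summarised with max() instead of being dropped.

```r
library(tidyr)

df <- tibble(
  id = c(1, 1, 2),
  when = c(10, 20, 30),
  name = factor(c("a", "b", "a"), levels = c("a", "b", "c")),
  value = 1:3
)

wide <- pivot_wider(
  df,
  id_cols = id,
  names_from = name,
  values_from = value,
  names_expand = TRUE,  # make the unused level "c" explicit
  unused_fn = max       # summarise the otherwise unused column `when`
)
```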
pivot_longer()’s names_transform and values_transform arguments now accept a single function which will be applied to all of the columns (#1284, thanks to
pivot_longer()’s names_ptypes and values_ptypes arguments now accept a single empty ptype which will be applied to all of the columns (#1284).
unnest() and unchop()’s ptype argument now accepts a single empty ptype which will be applied to all cols (#1284).
unpack() now silently skips over any non-data frame columns specified by cols. This matches the existing behavior of unchop() and unnest() (#1153).
unnest_wider() and unnest_longer() can now unnest multiple columns at once (#740).
unnest_longer()’s indices_to and values_to arguments now accept a glue specification, which is useful when unnesting multiple columns.
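As an illustration of unnesting several columns at once with a glue specification, the sketch below uses an invented data frame and the "{col}" placeholder to derive one index column per unnested column.

```r
library(tidyr)

df <- tibble(
  id = 1,
  x = list(c(a = 1, b = 2)),
  y = list(c(a = 3, b = 4))
)

# "{col}" is replaced by the name of each column being unnested
long <- unnest_longer(df, c(x, y), indices_to = "{col}_name")
```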
For hoist(), unnest_longer(), and unnest_wider(), if a ptype is supplied, but that column can’t be simplified, the result will be a list-of column where each element has type ptype (#998).
unnest_wider() gains a new strict argument which controls whether or not strict vctrs typing rules should be applied. It defaults to FALSE for backwards compatibility, and because it is often more useful to be lax when unnesting JSON, which doesn’t always map one-to-one with R’s types (#1125).
hoist(), unnest_longer(), and unnest_wider()’s simplify argument now accepts a named list of TRUE or FALSE to control simplification on a per column basis (#995).
hoist(), unnest_longer(), and unnest_wider()’s transform argument now accepts a single function which will be applied to all components (#1284).
hoist(), unnest_longer(), and unnest_wider()’s ptype argument now accepts a single empty ptype which will be applied to all components (#1284).
complete() gains a new explicit argument for limiting fill to only implicit missing values. This is useful if you don’t want to fill in pre-existing missing values (#1270).
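The difference between explicit and implicit missing values can be sketched on a small invented data frame: the NA already present in v is explicit, while the combinations that complete() adds are implicit.

```r
library(tidyr)

df <- tibble(x = c(1, 2), y = c("a", "b"), v = c(NA, 2))

# Default: fill applies to both pre-existing and newly created missing values
all_filled <- complete(df, x, y, fill = list(v = 0))

# explicit = FALSE: only rows created by complete() are filled
implicit_only <- complete(df, x, y, fill = list(v = 0), explicit = FALSE)
```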
complete() gains a grouped data frame method. This generates a more correct completed data frame when groups are involved (#396, #966).
drop_na(), replace_na(), and fill() have been updated to utilize vctrs. This means that you can use these functions on a wider variety of column types, including lubridate’s Period types (#1094), data frame columns, and the rcrd type from vctrs.
replace_na() no longer replaces empty atomic elements in list-columns (like integer(0)). The only value that is replaced in a list-column is NULL (#1168).
drop_na() no longer drops empty atomic elements from list-columns (like integer(0)). The only value that is dropped in a list-column is NULL (#1228).
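A minimal sketch of the list-column rule for drop_na(), using an invented column: the NULL element counts as missing, while the empty integer vector does not.

```r
library(tidyr)

df <- tibble(x = list(1, NULL, integer(0)))

# Only the row holding NULL is dropped; integer(0) is kept
kept <- drop_na(df)
```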
@mgirlich is now a tidyr author in recognition of his significant and sustained contributions.
All lazyeval variants of tidyr verbs have been soft-deprecated. Expect them to move to the defunct stage in the next minor release of tidyr (#1294).
any_of() and all_of() from tidyselect are now re-exported (#1217).
dplyr >= 1.0.0 is now required.
pivot_wider() now gives better advice about how to identify duplicates when values are not uniquely identified (#1113).
pivot_wider() now throws a more informative error when values_fn doesn’t result in a single summary value (#1238).
pivot_wider() and pivot_longer() now generate more informative errors related to name repair (#987).
pivot_wider() now works correctly when values_fill is a data frame.
pivot_wider() no longer accidentally retains values_from when pivoting a zero row data frame (#1249).
pivot_wider() now correctly handles the case where an id column name collides with a value from names_from (#1107).
pivot_wider() and pivot_longer() now both check that the spec columns .name and .value are character vectors. Additionally, the .name column must be unique (#1107).
pivot_wider()’s names_from and values_from arguments are now required if their default values of name and value don’t correspond to columns in data. Additionally, they must identify at least 1 column in data (#1240).
pivot_wider()’s values_fn argument now correctly allows anonymous functions (#1114).
pivot_wider_spec() now works correctly with a 0-row data frame and a spec that doesn’t identify any rows (#1250, #1252).
pivot_longer()’s names_ptypes argument is now applied after names_transform for consistency with the rectangling functions (i.e. hoist()) (#1233).
check_pivot_spec() is a new developer facing function for validating a pivot spec argument. This is only useful if you are extending pivot_longer() or pivot_wider() with new S3 methods (#1087).
The nest() generic now avoids computing on .data, making it more compatible with lazy tibbles (#1134).
The .names_sep argument of the data.frame method for nest() is now actually used (#1174).
unnest()’s ptype argument now works as expected (#1158).
unpack() no longer drops empty columns specified through cols (#1191).
unpack() now works correctly with data frame columns containing 1 row but 0 columns (#1189).
chop() now works correctly with data frames with 0 rows (#1206).
chop()’s cols argument is no longer optional. This matches the behavior of cols seen elsewhere in tidyr (#1205).
unchop() now respects ptype when unnesting a non-list column (#1211).
hoist() no longer accidentally removes elements that have duplicated names (#1259).
The grouped data frame methods for complete() and expand() now move the group columns to the front of the result (in addition to the columns you completed on or expanded, which were already moved to the front). This should make more intuitive sense, as you are completing or expanding “within” each group, so the group columns should be the first thing you see (#1289).
complete() now applies fill even when no columns to complete are specified (#1272).
expand(), crossing(), and nesting() now correctly retain NA values of factors (#1275).
expand_grid(), expand(), nesting(), and crossing() now silently apply name repair to automatically named inputs. This avoids a number of issues resulting from duplicate truncated names (#1116, #1221, #1092, #1037, #992).
expand_grid(), expand(), nesting(), and crossing() now allow columns from unnamed data frames to be used in expressions after that data frame was specified, like expand_grid(tibble(x = 1), y = x). This is more consistent with how tibble() behaves.
expand_grid(), expand(), nesting(), and crossing() now work correctly with data frames containing 0 columns but >0 rows (#1189).
expand_grid(), expand(), nesting(), and crossing() now return a 1 row data frame when no inputs are supplied, which is more consistent with prod() == 1L and the idea that computations involving the number of combinations computed from an empty set should return 1 (#1258).
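Two of the expand_grid() behaviours described above can be sketched briefly, assuming a tidyr version that includes these changes. The inputs are invented for the example.

```r
library(tidyr)

# A column from an unnamed data frame can be used in a later expression
grid <- expand_grid(tibble(x = 1:2), y = x * 2)

# With no inputs there is exactly one (empty) combination
empty <- expand_grid()
```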
drop_na() no longer drops missing values from all columns when a tidyselect expression that results in 0 columns being selected is used (#1227).
fill() now treats NaN like any other missing value (#982).
expand_grid() is now about twice as fast and pivot_wider() is a bit faster (
unchop() is now much faster, which propagates through to various functions, such as unnest(), unnest_longer(), unnest_wider(), and separate_rows() (
unnest() is now much faster (
unnest() no longer allows unnesting a list-col containing a mix of vector and data frame elements. Previously, this only worked by accident, and is considered an off-label usage of unnest() that has now become an error.
tidyr verbs no longer have “default” methods for lazyeval fallbacks. This means that you’ll get clearer error messages (#1036).
uncount() errors for non-integer weights and gives a clearer error message for negative weights (
You can once again unnest dates (#1021, #1089).
pivot_wider() works with data.table and empty key variables (
separate_rows() works for factor columns (
separate_rows() returns to 1.1.0 behaviour for empty strings (@rjpat, #1014).
New tidyr logo!
stringi dependency has been removed; this was a substantial dependency that made tidyr hard to compile in resource constrained environments (@rjpat, #936).
Replace Rcpp with cpp11. See https://cpp11.r-lib.org/articles/motivations.html for reasons why.
pivot_longer(), hoist(), unnest_wider(), and unnest_longer() gain new transform arguments; these allow you to transform values “in flight”. They are partly needed because vctrs coercion rules have become stricter, but they give you greater flexibility than was available previously (#921).
Arguments that use tidy selection syntax are now clearly documented and have been updated to use tidyselect 1.1.0 (#872).
Both pivot_wider() and pivot_longer() are considerably more performant, thanks largely to improvements in the underlying vctrs code (#790,
pivot_longer() now supports names_to = character() which prevents the name column from being created (#961).
    df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
    df %>% pivot_longer(-id, names_to = character())
pivot_longer() no longer creates a .copy variable in the presence of duplicate column names. This makes it more consistent with the handling of non-unique specs.
pivot_longer() automatically disambiguates non-unique outputs, which can occur when the input variables include some additional component that you don’t care about and want to discard (#792, #793).
    df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
    df %>% pivot_longer(-id, names_pattern = "(.)_.")
    df %>% pivot_longer(-id, names_sep = "_", names_to = c("name", NA))
    df %>% pivot_longer(-id, names_sep = "_", names_to = c(".value", NA))
pivot_wider() gains a names_sort argument which allows you to sort column names in order. The default, FALSE, orders columns by their first appearance (#839). In a future version, I’ll consider changing the default to TRUE.
pivot_wider() gains a names_glue argument that allows you to construct output column names with a glue specification.
pivot_wider() arguments values_fn and values_fill can now be single values; you now only need to use a named list if you want to use different values for different value columns (#739, #746). They also get improved errors if they’re not of the expected type.
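The sketch below combines names_glue with single-value values_fn and values_fill on an invented data frame. The "total_" prefix is only an example of a glue specification.

```r
library(tidyr)

# id 1 has two values for "a", so an aggregation is needed
df <- tibble(id = c(1, 1, 2), name = c("a", "a", "b"), value = c(1, 2, 3))

wide <- pivot_wider(
  df,
  names_from = name,
  values_from = value,
  values_fn = sum,             # one function applied to every value column
  values_fill = 0,             # one fill value applied to every value column
  names_glue = "total_{name}"  # build output names from the names_from column
)
```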
hoist() now automatically names pluckers that are a single string (#837). It errors if you use duplicated column names (rlang::list2() behind the scenes (which means that you can now use !!! and :=) (#801).
unnest_longer(), unnest_wider(), and hoist() do a better job simplifying list-cols. They no longer add unneeded unspecified() when the result is still a list (#806), and work when the list contains non-vectors (#810, #848).
unnest_wider(names_sep = "") now provides default names for unnamed inputs, suppressing the many previous name repair messages (#742).
pack() and nest() gain a .names_sep argument that allows you to strip outer names from inner names, in a symmetrical way to how the same argument to unpack() and unnest() combines inner and outer names (#795, #797).
unnest_wider() and unnest_longer() can now unnest list_of columns. This is important for unnesting columns created from nest() and with pivot_wider(), which will create list_of columns if the id columns are non-unique (#741).
chop() now creates list-columns of class vctrs::list_of(). This helps keep track of the type in case the chopped data frame is empty, allowing unchop() to reconstitute a data frame with the correct number and types of column even when there are no observations.
drop_na() now preserves attributes of unclassed vectors (#905).
expand(), expand_grid(), crossing(), and nesting() once again evaluate their inputs iteratively, so you can refer to freshly created columns, e.g. crossing(x = seq(-2, 2), y = x) (#820).
expand(), expand_grid(), crossing(), and nesting() gain a .name_repair argument giving you control over their name repair strategy (
extract() lets you use NA in into, as documented (#793).
extract(), separate(), hoist(), unnest_longer(), and unnest_wider() give a better error message if col is missing (#805).
pack()’s first argument is now .data instead of data (#759).
pivot_longer() now errors if values_to is not a length-1 character vector (#949).
pivot_longer() and pivot_wider() are now generic so implementations can be provided for objects other than data frames (#800).
pivot_wider() can now pivot data frame columns (#926).
unite(na.rm = TRUE) now works for all types of variable, not just character vectors (#765).
unnest_wider() gives a better error message if you attempt to unnest multiple columns (#740).
unnest_auto() works when the input data contains a column called col (#959).
See vignette("in-packages") for a detailed transition guide.
nest() and unnest() have new syntax. The majority of existing usage should be automatically translated to the new syntax with a warning. If that doesn’t work, put this in your script to use the old versions until you can take a closer look and update your code:
    library(tidyr)
    nest <- nest_legacy
    unnest <- unnest_legacy
nest() now preserves grouping, which has implications for downstream calls to group-aware functions, such as dplyr::mutate() and filter().
The first argument of nest() has changed from data to .data.
unnest() uses the emerging tidyverse standard to disambiguate unique names. Use names_repair = tidyr_legacy to request the previous approach.
unnest_()/nest_() and the lazyeval methods for unnest()/nest() are now defunct. They have been deprecated for some time, and, since the interface has changed, package authors will need to update to avoid deprecation warnings. I think one clean break should be less work for everyone.
All other lazyeval functions have been formally deprecated, and will be made defunct in the next major release. (See lifecycle vignette for details on deprecation stages).
crossing() and nesting() now return 0-row outputs if any input is a length-0 vector. If you want to preserve the previous behaviour which silently dropped these inputs, you should convert empty vectors to NULL. (More discussion on this general pattern at https://github.com/tidyverse/principles/issues/24)
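The length-0 rule for crossing() is quick to check with invented inputs: an empty vector yields no combinations, whereas NULL is simply dropped.

```r
library(tidyr)

# A length-0 input means there are no combinations at all
none <- crossing(x = 1:2, y = integer())

# NULL inputs are dropped, so only x contributes
only_x <- crossing(x = 1:2, y = NULL)
```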
New pivot_longer() and pivot_wider() provide modern alternatives to spread() and gather(). They have been carefully redesigned to be easier to learn and remember, and include many new features. Learn more in vignette("pivot").
These functions resolve multiple existing issues with spread()/gather(). Both functions now handle multiple value columns (#149/#150), support more vector types (#333), use tidyverse conventions for duplicated column names (#496, #478), and are symmetric (#453). pivot_longer() gracefully handles duplicated column names (#472), and can directly split column names into multiple variables. pivot_wider() can now aggregate (#474), select keys (#572), and has control over generated column names (#208).
To demonstrate how these functions work in practice, tidyr has gained several new datasets: relig_income, construction, billboard, us_rent_income, fish_encounters and world_bank_pop.
Finally, tidyr demos have been removed. They are dated, and have been superseded by vignette("pivot").
tidyr contains four new functions to support rectangling, turning a deeply nested list into a tidy tibble: unnest_longer(), unnest_wider(), unnest_auto(), and hoist(). They are documented in a new vignette: vignette("rectangle").
unnest_longer() and unnest_wider() make it easier to unnest list-columns of vectors into either rows or columns (#418). unnest_auto() automatically picks between _longer() and _wider() using heuristics based on the presence of common names.
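A minimal sketch of the two directions of unnesting, on an invented list-column of named numeric vectors:

```r
library(tidyr)

df <- tibble(
  id = 1:2,
  info = list(c(a = 1, b = 2), c(a = 3, b = 4))
)

# Each element of the vector becomes its own column
wide <- unnest_wider(df, info)

# Each element of the vector becomes its own row, with the names in info_id
long <- unnest_longer(df, info)
```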
New hoist() provides a convenient way of plucking components of a list-column out into their own top-level columns (#341). This is particularly useful when you are working with deeply nested JSON, because it provides a convenient shortcut for the mutate() + map() pattern:
    df %>% hoist(metadata, name = "name")
    # shortcut for
    df %>% mutate(name = map_chr(metadata, "name"))
nest() and unnest() have been updated with new interfaces that are more closely aligned to evolving tidyverse conventions. They use the theory developed in vctrs to more consistently handle mixtures of input types, and their arguments have been overhauled based on the last few years of experience. They are supported by a new vignette("nest"), which outlines some of the main ideas of nested data (it’s still very rough, but will get better over time).
The biggest change is to their operation with multiple columns: df %>% unnest(x, y, z) becomes df %>% unnest(c(x, y, z)) and df %>% nest(x, y, z) becomes df %>% nest(data = c(x, y, z)).
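The multi-column syntax can be tried on small invented data frames; the sketch below nests two columns into one list-column and, separately, unnests two list-columns in parallel.

```r
library(tidyr)

# Nest x and y into a single list-column called data
df <- tibble(g = c(1, 1, 2), x = 1:3, y = c("a", "b", "c"))
nested <- nest(df, data = c(x, y))

# Unnest two list-columns together; elements are matched row by row
lists <- tibble(id = 1:2, x = list(1:2, 3L), y = list(c("a", "b"), "c"))
flat <- unnest(lists, c(x, y))
```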
I have done my best to ensure that common uses of nest() and unnest() will continue to work, generating an informative warning telling you precisely how you need to update your code. Please file an issue if I’ve missed an important use case.
unnest() has been overhauled:
New keep_empty parameter ensures that every row in the input gets at least one row in the output, inserting missing values as needed (#358).
Provides names_sep argument to control how inner and outer column names are combined.
Uses standard tidyverse name-repair rules, so by default you will get an error if the output would contain multiple columns with the same name. You can override by using name_repair (#514).
Now supports NULL entries (#436).
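The points above can be sketched together on an invented data frame whose nested column shares a name (id) with an outer column and contains a NULL entry.

```r
library(tidyr)

df <- tibble(id = 1:2, d = list(tibble(id = 10), NULL))

# The inner id clashes with the outer id, so the default is expected to error
clash <- try(unnest(df, d), silent = TRUE)

# names_sep combines outer and inner names; the NULL row is dropped
sep <- unnest(df, d, names_sep = "_")

# keep_empty = TRUE keeps the NULL row, filled with a missing value
kept <- unnest(df, d, names_sep = "_", keep_empty = TRUE)
```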
Under the hood, nest() and unnest() are implemented with chop(), pack(), unchop(), and unpack():
pack() and unpack() allow you to pack and unpack columns into data frame columns (#523).
chop() and unchop() chop up rows into sets of list-columns.
Packing and chopping are interesting primarily because they are the atomic operations underlying nesting (and similarly, unchopping and unpacking underlie unnesting), and I don’t expect them to be used directly very often.
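For readers who want to see the building blocks directly, here is a small sketch on an invented data frame. The packed column name xy is arbitrary.

```r
library(tidyr)

df <- tibble(g = c(1, 1, 2), x = 1:3, y = 4:6)

# chop(): one row per g, with x and y collapsed into list-columns
chopped <- chop(df, c(x, y))
# unchop(): expand the list-columns back into rows
unchopped <- unchop(chopped, c(x, y))

# pack(): x and y become a single data frame column
packed <- pack(df, xy = c(x, y))
# unpack(): spread the data frame column back into top-level columns
unpacked <- unpack(packed, xy)
```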
New expand_grid(), a tidy version of expand.grid(), is lower-level than the existing expand() and crossing() functions, as it takes individual vectors, and does not sort or uniquify them.
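The difference from crossing() shows up as soon as the input is unsorted or contains duplicates, as in this sketch with invented vectors:

```r
library(tidyr)

# expand_grid() keeps the inputs as given, including the duplicated 1
grid <- expand_grid(x = c(2, 1, 1), y = c("a", "b"))

# crossing() de-duplicates and sorts its inputs first
crossed <- crossing(x = c(2, 1, 1), y = c("a", "b"))
```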
crossing(), nesting(), and expand() have been rewritten to use the vctrs package. This should not affect much existing code, but considerably simplifies the implementation and ensures that these functions work consistently across all generalised vectors (#557). As part of this alignment, these functions now only drop NULL inputs, not any 0-length vector.
full_seq() now also works when gaps between observations are shorter than the given period, but are within the tolerance given by tol. Previously, gaps between consecutive observations had to be in the range [period, period + tol]; gaps can now be in the range [period - tol, period + tol] (
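A small sketch of full_seq() with invented observations. In the second call the gap of 0.95 is shorter than the period of 1 but within the tolerance of 0.1, so the input should be accepted as a regular sequence.

```r
library(tidyr)

# Exact multiples of the period: missing steps are filled in
exact <- full_seq(c(1, 2, 4, 5, 10), period = 1)

# A gap slightly shorter than the period, within the stated tolerance
near <- full_seq(c(0, 0.95, 3), period = 1, tol = 0.1)
```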
tidyr now re-exports tibble(), as_tibble(), and tribble(), as well as the tidyselect helpers (starts_with(), ends_with(), …). This makes generating documentation, reprexes, and tests easier, and makes tidyr easier to use without also attaching dplyr.
All functions that take ... have been instrumented with functions from the ellipsis package to warn if you’ve supplied arguments that are ignored (typically because you’ve misspelled an argument name) (#573).
complete() now uses full_join() so that all levels are preserved even when not all levels are specified (
crossing() now takes the unique values of data frame inputs, not just vector inputs (#490).
gather() throws an error if a column is a data frame (#553).
extract() (and hence pivot_longer()) can extract multiple input values into a single output column (#619).
fill() is now implemented using dplyr::mutate_at(). This radically simplifies the implementation and considerably improves performance when working with grouped data (#520).
fill() now accepts downup and updown as fill directions (
unite() gains na.rm argument, making it easier to remove missing values prior to uniting values together (#203).
crossing() preserves factor levels (#410), now works with list-columns (#446, expand() which is built on top of crossing())
nest() is compatible with dplyr 0.8.0.
spread() works when the id variable has names (#525).
unnest() preserves column being unnested when input is zero-length (#483), using list_of() attribute to correctly restore columns, where possible.
unnest() will run with named and unnamed list-columns of same length (
separate() now accepts NA as a column name in the into argument to denote columns which are omitted from the result. (
Minor updates to ensure compatibility with dependencies.
unnest() weakens test of “atomicity” to restore previous behaviour when unnesting factors and dates (#407).
There are no deliberate breaking changes in this release. However, a number of packages are failing with errors related to numbers of elements in columns, and row names. It is possible that these are accidental API changes or new bugs. If you see such an error in your package, I would sincerely appreciate a minimal reprex.
separate() now correctly uses -1 to refer to the far right position, instead of -2. If you depended on this behaviour, you’ll need to switch on packageVersion("tidyr") > "0.7.2".
Increased test coverage from 84% to 99%.
uncount() performs the inverse operation of dplyr::count() (#279).
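As a brief illustration with an invented count table, uncount() repeats each row according to its weight and removes the weight column:

```r
library(tidyr)

counts <- tibble(x = c("a", "b"), n = c(1, 2))

# "a" appears once and "b" twice; the n column is dropped
expanded <- uncount(counts, n)
```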
complete(data) now returns data rather than throwing an error (#390). complete() with zero-length completions returns original input (#331).
crossing() preserves NAs (#364).
expand() with empty input gives empty data frame instead of NULL (#331).
expand(), crossing(), and complete() now complete empty factors instead of dropping them (#270, #285).
extract() has a better error message if regex does not contain the expected number of groups (#313).
drop_na() no longer drops columns (NA in a list column is any empty (length 0) data structure.
nest() is now faster, especially when a long data frame is collapsed into a nested data frame with few rows.
nest() on a zero-row data frame works as expected (#320).
replace_na() no longer complains if you try and replace missing values in variables not present in the data (#356).
replace_na() now also works with vectors (#342, @flying-sheep), and can replace NULL in list-columns. It throws a better error message if you attempt to replace with something other than length 1.
separate() no longer checks that ... is empty, allowing methods to make use of it. This check was added in tidyr 0.4.0 (2016-02-02) to deprecate previous behaviour where ... was passed to strsplit().
separate() and extract() now insert columns in correct position when drop = TRUE (#394).
separate() now correctly counts from RHS when using negative integer sep values (
separate() gets improved warning message when pieces aren’t as expected (#375).
separate_rows() supports list columns (#321), and works with empty tibbles.
spread() now consistently returns 0 row outputs for 0 row inputs (#269).
spread() now works when key column includes NA and drop is FALSE (#254).
spread() no longer returns tibbles with row names (#322).
spread(), separate(), extract() (#255), and gather() (#347) now replace existing variables rather than creating an invalid data frame with duplicated variable names (matching the semantics of mutate).
unite() now works (as documented) if you don’t supply any variables (#355).
unnest() gains preserve argument which allows you to preserve list columns without unnesting them (#328).
unnest() can unnest list-columns containing lists of lists (#278).
unnest(df) now works if df contains no list-cols (#344).
The SE variants gather_(), spread_() and nest_() now treat non-syntactic names in the same way as pre tidy eval versions of tidyr (#361).
Fix tidyr bug revealed by R-devel.
This is a hotfix release to account for some tidyselect changes in the unit tests.
Note that the upcoming version of tidyselect backtracks on some of the changes announced for 0.7.0. The special evaluation semantics for selection have been changed back to the old behaviour because the new rules were causing too much trouble and confusion. From now on data expressions (symbols and calls to : and c()) can refer to both registered variables and to objects from the context.
However the semantics for context expressions (any calls other than to : and c()) remain the same. Those expressions are evaluated in the context only and cannot refer to registered variables. If you’re writing functions and refer to contextual objects, it is still a good idea to avoid data expressions by following the advice of the 0.7.0 release notes.
This release includes important changes to tidyr internals. Tidyr now supports the new tidy evaluation framework for quoting (NSE) functions. It also uses the new tidyselect package as selecting backend.
If you see error messages about objects or functions not found, it is likely because the selecting functions are now stricter in their arguments. An example of a selecting function is gather() and its ... argument. This change makes the code more robust by disallowing ambiguous scoping. Consider the following code:
    x <- 3
    df <- tibble(w = 1, x = 2, y = 3)
    gather(df, "variable", "value", 1:x)
Does it select the first three columns (using the x defined in the global environment), or does it select the first two columns (using the column named x)?
To solve this ambiguity, we now make a strict distinction between data and context expressions. A data expression is either a bare name or an expression like x:y or c(x, y). In a data expression, you can only refer to columns from the data frame. Everything else is a context expression in which you can only refer to objects that you have defined with <-.
In practice this means that you can no longer refer to contextualobjects like this:
mtcars %>% gather(var, value, 1:ncol(mtcars))x <- 3mtcars %>% gather(var, value, 1:x)mtcars %>% gather(var, value, -(1:x))You now have to be explicit about where to find objects. To do so,you can use the quasiquotation operator!! which willevaluate its argument early and inline the result:
mtcars %>% gather(var, value, !! 1:ncol(mtcars))mtcars %>% gather(var, value, !! 1:x)mtcars %>% gather(var, value, !! -(1:x))An alternative is to turn your data expression into a contextexpression by usingseq() orseq_len() insteadof:. See the section on tidyselect for more informationabout these semantics.
Following the switch to tidy evaluation, you might see warnings about the “variable context not set”. This is most likely caused by supplying helpers like everything() to underscored versions of tidyr verbs. Helpers should always be evaluated lazily. To fix this, just quote the helper with a formula: drop_na(df, ~everything()).
The selecting functions are now stricter when you supply integer positions. If you see an error along the lines of
    `-0.949999999999999`, `-0.940000000000001`, ... must resolve to integer column positions, not a double vector
please round the positions before supplying them to tidyr. Double vectors are fine as long as they are rounded.
tidyr is now a tidy evaluation grammar. See the programming vignette in dplyr for practical information about tidy evaluation.
The tidyr port is a bit special. While the philosophy of tidy evaluation is that R code should refer to real objects (from the data frame or from the context), we had to make some exceptions to this rule for tidyr. The reason is that several functions accept bare symbols to specify the names of new columns to create (gather() being a prime example). This is not tidy because the symbol does not represent any actual object. Our workaround is to capture these arguments using rlang::quo_name() (so they still support quasiquotation and you can unquote symbols or strings). This type of NSE is now discouraged in the tidyverse: symbols in R code should represent real objects.
Following the switch to tidy eval, the underscored variants are softly deprecated. However, they will remain around for some time and without warning for backward compatibility.
The selecting backend of dplyr has been extracted in a standalone package tidyselect which tidyr now uses for selecting variables. It is used for selecting multiple variables (in drop_na()) as well as single variables (the col argument of extract() and separate(), and the key and value arguments of spread()). This implies the following changes:
The arguments for selecting a single variable now support all features from dplyr::pull(). You can supply a name or a position, including negative positions.
Multiple variables are now selected a bit differently. We now make a strict distinction between data and context expressions. A data expression is either a bare name or an expression like x:y or c(x, y). In a data expression, you can only refer to columns from the data frame. Everything else is a context expression in which you can only refer to objects that you have defined with <-.
You can still refer to contextual objects in a data expression by being explicit. One way of being explicit is to unquote a variable from the environment with the tidy eval operator !!:
    x <- 2
    drop_na(df, 2)     # Works fine
    drop_na(df, x)     # Object 'x' not found
    drop_na(df, !! x)  # Works as if you had supplied 2
On the other hand, select helpers like starts_with() are context expressions. It is therefore easy to refer to objects and they will never be ambiguous with data columns:
    x <- "d"
    drop_na(df, starts_with(x))
While these special rules are in contrast to most dplyr and tidyr verbs (where both the data and the context are in scope), they make sense for selecting functions and should provide more robust and helpful semantics.
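As a small runnable illustration of a select helper acting as a context expression, the sketch below passes a prefix stored in a variable to starts_with(). The data frame and the name prefix are made up for the example.

```r
library(tidyr)

# Hypothetical data: row 1 is missing d1, row 2 is missing a.
df <- tibble::tibble(a = c(1, NA), d1 = c(NA, 2), d2 = c(3, 4))
prefix <- "d"  # context variable, looked up outside the data frame

# Only columns whose names start with "d" are checked for NA,
# so the missing value in `a` does not cause row 2 to be dropped.
out <- drop_na(df, starts_with(prefix))
```

Only the second row survives, because it is complete in d1 and d2 even though a is missing.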
Register C functions
Added package docs
Patch tests to be compatible with dev dplyr.
Patch test to be compatible with dev tibble
Changed deprecation message of extract_numeric() to point to readr::parse_number() rather than readr::parse_numeric()
drop_na() removes observations which have NA in the given variables. If no variables are given, all variables are considered (#194,
extract_numeric() has been deprecated (#213).
Renamed table4 and table5 to table4a and table4b to make their connection more clear. The key and value variables in table2 have been renamed to type and count.
expand(), crossing(), and nesting() now silently drop zero-length inputs.
crossing_() and nesting_() are versions of crossing() and nesting() that take a list as input.
full_seq() works correctly for dates and date/times.
getS3method(envir = ) (#205,
separate_rows() separates observations with multiple delimited values into separate rows (#69,
complete() preserves grouping created by dplyr (#168).
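To make the separate_rows() behaviour noted above concrete, here is a minimal sketch with made-up data. The column names id and tags are hypothetical.

```r
library(tidyr)

# Hypothetical input: one row holds two comma-delimited values.
df <- tibble::tibble(id = 1:2, tags = c("a,b", "c"))

# Each delimited value gets its own row; other columns are repeated.
long <- separate_rows(df, tags, sep = ",")
```

The row with "a,b" becomes two rows that share the same id.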
expand() (and hence complete()) preserves the ordered attribute of factors (#165).
full_seq() preserves attributes for dates and date/times (#156), and sequences no longer need to start at 0.
gather() can now gather together list columns (#175), and gather_.data.frame(na.rm = TRUE) now only removes missing values if they’re actually present (#173).
nest() returns correct output if every variable is nested (#186).
separate() fills from right-to-left (not left-to-right!) when fill = “left” (#170,
separate() and unite() now automatically drop removed variables from grouping (#159, #177).
spread() gains a sep argument. If not-null, this will name columns as “key<sep>value”. Additionally, if sep is NULL missing values will be converted to <NA> (#68).
spread() works in the presence of list-columns (#199).
unnest() works with non-syntactic names (#190).
unnest() gains a sep argument. If non-null, this will rename the columns of nested data frames to include both the original column name, and the nested column name, separated by .sep (#184).
unnest() gains .id argument that works the same way as bind_rows(). This is useful if you have a named list of data frames or vectors (#125).
Moved in useful sample datasets from the DSR package.
Made compatible with both dplyr 0.4 and 0.5.
tidyr functions that create new columns are more aggressive about re-encoding the column names as UTF-8.
Fixed bug in nest() where nested data was ending up in the wrong row (#158).
nest() and unnest() have been overhauled to support a useful way of structuring data frames: the nested data frame. In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.
nest() now produces a single list of data frames called “data” rather than a list column for each variable. Nesting variables are not included in nested data frames. It also works with grouped data frames made by dplyr::group_by(). You can override the default column name with .key.
unnest() gains a .drop argument which controls what happens to other list columns. By default, they’re kept if the output doesn’t require row duplication; otherwise they’re dropped.
unnest() now has mutate() semantics for ... - this allows you to unnest transformed columns more easily. (Previously it used select semantics).
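The nested data frame structure described above can be sketched as a round trip. Note that this example assumes tidyr 1.0.0 or later and uses the newer nest(data = c(...)) call form, which differs from the interface that was current when these notes were written; the data is made up.

```r
library(tidyr)

# Hypothetical data: two groups, three observations.
df <- tibble::tibble(g = c("a", "a", "b"), x = 1:3)

# One row per group; the observations move into a list-column of
# data frames named "data". The grouping column g is not nested.
nested <- nest(df, data = c(x))

# unnest() reverses the operation.
roundtrip <- unnest(nested, data)
```

Here nested has one row per value of g, and each element of nested$data is a data frame holding that group's x values.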
expand() once again allows you to evaluate arbitrary expressions like full_seq(year). If you were previously using c() to create nested combinations, you’ll now need to use nesting() (#85, #121).
nesting() and crossing() allow you to create nested and crossed data frames from individual vectors. crossing() is similar to base::expand.grid().
full_seq(x, period) creates the full sequence of values from min(x) to max(x) every period values.
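A short sketch of how crossing(), nesting(), and full_seq() differ, using made-up vectors, may help here. The column names are hypothetical.

```r
library(tidyr)

# crossing(): every combination of the inputs (2 x 2 = 4 rows).
crossed <- crossing(x = 1:2, y = c("a", "b"))

# nesting(): only the combinations that actually occur together.
nested <- nesting(x = c(1, 1, 2), y = c("a", "a", "b"))

# full_seq(): fill in the gaps between min and max in steps of `period`.
steps <- full_seq(c(1, 4, 10), 3)

# expand() can evaluate such an expression against a data frame.
years <- tibble::tibble(year = c(2010, 2012))
all_years <- expand(years, year = full_seq(year, 1))
```

crossing() returns all four pairs, nesting() only the two observed pairs, and full_seq() adds the missing value 7 to the sequence.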
fill() fills in NULLs in list-columns.
fill() gains a direction argument so that it can fill either upwards or downwards (#114).
gather() now stores the key column as character, by default. To revert to the previous behaviour of using a factor (which allows you to preserve the ordering of the columns), use key_factor = TRUE (#96).
All tidyr verbs do the right thing for grouped data frames created by group_by() (#122, #129, #81).
seq_range() has been removed. It was never used or announced.
spread() once again creates columns of mixed type when convert = TRUE (#118,
spread() with drop = FALSE handles zero-length factors (#56).
spread()ing a data frame with only key and value columns creates a one row output (#41).
unite() now removes old columns before adding new (#89,
separate() now warns if defunct … argument is used (#151,
New complete() provides a wrapper around expand(), left_join() and replace_na() for a common task: completing a data frame with missing combinations of variables.
fill() fills in missing values in a column with the last non-missing value (#4).
New replace_na() makes it easy to replace missing values with something meaningful for your data.
nest() is the complement of unnest() (#3).
unnest() can now work with multiple list-columns at the same time. If you don’t supply any column names, it will unlist all list-columns (#44). unnest() can also handle columns that are lists of data frames (#58).
tidyr no longer depends on reshape2. This should fix issues if you also try to load reshape (#88).
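The three helpers introduced above can be illustrated together. This is a minimal sketch with made-up data; the column names g, t, and v are hypothetical.

```r
library(tidyr)

# Hypothetical data: the combination (g = "b", t = 2) is absent.
df <- tibble::tibble(g = c("a", "a", "b"), t = c(1, 2, 1), v = c(10, 20, 30))

# complete(): add the missing combination, with NA for v.
completed <- complete(df, g, t)

# replace_na(): swap the NA for a value that is meaningful for the data.
zeroed <- replace_na(completed, list(v = 0))

# fill(): carry the last non-missing value forward.
gaps <- tibble::tibble(x = c(1, NA, NA, 2))
filled <- fill(gaps, x)
```

complete() makes the implicit missing row explicit, after which replace_na() or fill() decides what value it should carry.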
%>% is re-exported from magrittr.
expand() now supports nesting and crossing (see examples for details). This comes at the expense of creating new variables inline (#46).
expand_ does SE evaluation correctly so you can pass it a character vector of column names (or list of formulas etc) (#70).
extract() is 10x faster because it now uses stringi instead of base R regular expressions. It also returns NA instead of throwing an error if the regular expression doesn’t match (#72).
extract() and separate() preserve character vectors when convert is TRUE (#99).
The internals of spread() have been rewritten, and now preserve all attributes of the input value column. This means that you can now spread date (#62) and factor (#35) inputs.
spread() gives a more informative error message if key or value don’t exist in the input data (#36).
separate() only displays the first 20 failures (#50). It has finer control over what happens if there are too few matches: you can fill with missing values on either the “left” or the “right” (#49). separate() no longer throws an error if the number of pieces isn’t as expected; instead it drops extra values, fills on the right, and gives a warning.
If the input is NA, separate() and extract() both silently return NA outputs, rather than throwing an error (#77).
Experimental unnest() method for lists has been removed.
Experimental expand() function (#21).
Experimental unnest() function for converting named lists into data frames. (#3, #22)
extract_numeric() preserves negative signs (#20).
gather() has better defaults if key and value are not supplied. If ... is omitted, gather() selects all columns (#28). Performance is now comparable to reshape2::melt() (#18).
separate() gains extra argument which lets you control what happens to extra pieces. The default is to throw an “error”, but you can also “merge” or “drop”.
spread() gains drop argument, which allows you to preserve missing factor levels (#25). It converts factor value variables to character vectors, instead of embedding a matrix inside the data frame (#35).
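As an illustration of the extra argument of separate() mentioned above (together with the fill argument noted in the later release), here is a minimal sketch with made-up data. Defaults have changed across tidyr versions, so both arguments are set explicitly.

```r
library(tidyr)

# Hypothetical input: one value has too many pieces, one has too few.
df <- tibble::tibble(x = c("a-b-c", "d"))

# extra = "merge" keeps surplus pieces in the last column;
# fill = "right" pads short rows with NA on the right.
out <- separate(df, x, into = c("left", "right"), sep = "-",
                extra = "merge", fill = "right")
```

With these settings "a-b-c" splits into "a" and "b-c", while "d" gets NA in the second column.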