Title:

A Grammar of Nested Data Manipulation

Version:

0.3.0

Author:

Mark Rieke [aut], Bolívar Aponte Rolón [cre], Joran Elias [ctb]

Maintainer:

Bolívar Aponte Rolón <bolaponte@pm.me>

Description:

Provides functions for manipulating nested data frames in a list-column using 'dplyr' <https://dplyr.tidyverse.org/> syntax. Rather than unnesting, then manipulating a data frame, 'nplyr' allows users to manipulate each nested data frame directly. 'nplyr' is a wrapper for 'dplyr' functions that provide tools for common data manipulation steps: filtering rows, selecting columns, summarising grouped data, among others.

License:

MIT + file LICENSE

URL:

https://github.com/jibarozzo/nplyr, https://jibarozzo.github.io/nplyr/

BugReports:

https://github.com/jibarozzo/nplyr/issues

Depends:

R (≥ 3.5.0)

Imports:

assertthat, dplyr, magrittr, purrr, rlang, tidyr

Encoding:

UTF-8

RoxygenNote:

7.3.2

Suggests:

gapminder, knitr, readr, rmarkdown, stringr, testthat (≥ 3.0.0), tibble

Config/testthat/edition:

VignetteBuilder:

knitr

LazyData:

true

NeedsCompilation:

Packaged:

2025-05-28 22:15:12 UTC; baponte

Repository:

CRAN

Date/Publication:

2025-05-29 14:50:02 UTC

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).

Example survey data regarding job satisfaction

Description

A toy dataset containing 500 responses to a job satisfaction survey. The responses were randomly generated using the Qualtrics survey platform.

Usage

job_survey

Format

A data frame with 500 rows and 6 variables:

survey_name: name of survey
Q1: respondent age
Q2: city the respondent resides in
Q3: field that the respondent works in
Q4: respondent's job satisfaction (on a scale from extremely satisfied to extremely dissatisfied)
Q5: respondent's annual salary, in thousands of dollars

Nested filtering joins

Description

Nested filtering joins filter rows from .nest_data based on the presence or absence of matches in y:

nest_semi_join() returns all rows from .nest_data with a match in y.
nest_anti_join() returns all rows from .nest_data without a match in y.

Usage

nest_semi_join(.data, .nest_data, y, by = NULL, copy = FALSE, ...)
nest_anti_join(.data, .nest_data, y, by = NULL, copy = FALSE, ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

y

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

by

A character vector of variables to join by or a join specification created with join_by().

If NULL, the default, nest_*_join() will perform a natural join, using all variables in common across each object in .nest_data and y. A message lists the variables so you can check they're correct; suppress the message by supplying by explicitly.

To join on different variables between the objects in .nest_data and y, use a named vector. For example, by = c("a" = "b") will match .nest_data$a to y$b for each object in .nest_data.

To join by multiple variables, use a vector with length > 1. For example, by = c("a", "b") will match .nest_data$a to y$a and .nest_data$b to y$b for each object in .nest_data. Use a named vector to match different variables in .nest_data and y. For example, by = c("a" = "b", "c" = "d") will match .nest_data$a to y$b and .nest_data$c to y$d for each object in .nest_data.

To perform a cross-join, generating all combinations of each object in .nest_data and y, use by = character().

copy

If .nest_data and y are not from the same data source and copy = TRUE, then y will be copied into the same src as .nest_data. (Need to review this parameter in more detail for applicability with nplyr)

...

One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

Details

nest_semi_join() and nest_anti_join() are largely wrappers for dplyr::semi_join() and dplyr::anti_join() and maintain the functionality of semi_join() and anti_join() within each nested data frame. For more information on semi_join() or anti_join(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

Rows are a subset of the input, but appear in the same order.
Columns are not modified.
Data frame attributes are preserved.
Groups are taken from .nest_data. The number of groups may be reduced.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes %>% dplyr::slice_sample(n = 10)
gm_nest %>% nest_semi_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_anti_join(country_data, gm_codes, by = "country")
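
The named-vector form of by described under Arguments can be tried on a small toy input. This is a minimal sketch only: the tibbles nested and lookup, and their columns grp, a, and b, are made up for illustration and are not part of the package.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: values of `a` nested by `grp`
nested <- tibble(
  grp = c("g1", "g1", "g2", "g2"),
  a   = c(1, 2, 3, 4)
) %>%
  tidyr::nest(inner = -grp)

# Hypothetical lookup table whose key column is named `b`, not `a`
lookup <- tibble(b = c(1, 3))

# keep rows of each nested tibble whose `a` has a match in lookup$b
kept <- nested %>% nest_semi_join(inner, lookup, by = c("a" = "b"))

# keep rows of each nested tibble whose `a` has no match in lookup$b
dropped <- nested %>% nest_anti_join(inner, lookup, by = c("a" = "b"))

kept$inner[[1]]$a     # 1
dropped$inner[[1]]$a  # 2
```

Supplying by explicitly, as here, also avoids the natural-join message.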

Nested Mutating joins

Description

Nested mutating joins add columns from y to each of the nested data frames in .nest_data, matching observations based on the keys. There are four nested mutating joins:

Inner join

nest_inner_join() only keeps observations from .nest_data that have a matching key in y.

The most important property of an inner join is that unmatched rows in either input are not included in the result.

Outer joins

There are three outer joins that keep observations that appear in at least one of the data frames:

nest_left_join() keeps all observations in .nest_data.
nest_right_join() keeps all observations in y.
nest_full_join() keeps all observations in .nest_data and y.

Usage

nest_inner_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

nest_left_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

nest_right_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

nest_full_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

y

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

by

A character vector of variables to join by or a join specification created with join_by().

To join on different variables between the objects in .nest_data and y, use a named vector. For example, by = c("a" = "b") will match .nest_data$a to y$b for each object in .nest_data.

To perform a cross-join, generating all combinations of each object in .nest_data and y, use by = character().

copy

suffix

If there are non-joined duplicate variables in .nest_data and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.

...

Other parameters passed onto methods. Includes:

na_matches : Should two NA or two NaN values match?
- "na", the default, treats two NA or two NaN values as equal.
- "never" treats two NA or two NaN values as different, and will never match them together or to any other values.
multiple : Handling of rows in .nest_data with multiple matches in y.
- "all" returns every match detected in y.
- "any" returns one match detected in y, with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
- "first" returns the first match detected in y.
- "last" returns the last match detected in y.
- "warning" throws a warning if multiple matches are detected, and then falls back to "all".
- "error" throws an error if multiple matches are detected.
unmatched : How should unmatched keys that would result in dropped rows be handled?
- "drop" drops unmatched keys from the result.
- "error" throws an error if unmatched keys are detected.

keep

Should the join keys from both .nest_data and y be preserved in the output?

Details

nest_inner_join(), nest_left_join(), nest_right_join(), and nest_full_join() are largely wrappers for dplyr::inner_join(), dplyr::left_join(), dplyr::right_join(), and dplyr::full_join() and maintain the functionality of these verbs within each nested data frame. For more information on inner_join(), left_join(), right_join(), or full_join(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. The order of the rows and columns of each object in .nest_data is preserved as much as possible. Each object in .nest_data has the following properties:

For nest_inner_join(), a subset of rows in each object in .nest_data. For nest_left_join(), all rows in each object in .nest_data. For nest_right_join(), a subset of rows in each object in .nest_data, followed by unmatched y rows. For nest_full_join(), all rows in each object in .nest_data, followed by unmatched y rows.
Output columns include all columns from each .nest_data and all non-key columns from y. If keep = TRUE, the key columns from y are included as well.
If non-key columns in any object in .nest_data and y have the same name, suffixes are added to disambiguate. If keep = TRUE and key columns in .nest_data and y have the same name, suffixes are added to disambiguate these as well.
If keep = FALSE, output columns included in by are coerced to their common type between the objects in .nest_data and y.
Groups are taken from .nest_data.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes
gm_nest %>% nest_inner_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_left_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_right_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_full_join(country_data, gm_codes, by = "country")
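
The difference between an inner and a left join is easiest to see when some keys have no match. The sketch below uses made-up toy data (the names nested, codes, grp, key, and label are hypothetical) in which only keys 1 and 3 appear in y.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: keys 1..4 nested by `grp`
nested <- tibble(
  grp = c("g1", "g1", "g2", "g2"),
  key = c(1, 2, 3, 4)
) %>%
  tidyr::nest(inner = -grp)

# Hypothetical lookup table covering only keys 1 and 3
codes <- tibble(key = c(1, 3), label = c("one", "three"))

# inner join: unmatched rows in each nested tibble are dropped
inner_res <- nested %>% nest_inner_join(inner, codes, by = "key")

# left join: all rows in each nested tibble are kept, unmatched labels are NA
left_res <- nested %>% nest_left_join(inner, codes, by = "key")

nrow(inner_res$inner[[1]])  # 1
nrow(left_res$inner[[1]])   # 2
left_res$inner[[1]]$label   # "one" NA
```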

Arrange rows within nested data frames by column values

Description

nest_arrange() orders the rows of nested data frames by the values of selected columns.

Usage

nest_arrange(.data, .nest_data, ..., .by_group = FALSE)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Variables, or functions of variables. Use dplyr::desc() to sort a variable in descending order.

.by_group

If TRUE, will sort first by grouping variable. Applies to grouped data frames only.

Details

nest_arrange() is largely a wrapper for dplyr::arrange() and maintains the functionality of arrange() within each nested data frame. For more information on arrange(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

All rows appear in the output, but (usually) in a different place.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_nest %>%
  nest_arrange(country_data, pop)
gm_nest %>%
  nest_arrange(country_data, desc(pop))

Count observations in a nested data frame by group

Description

nest_count() lets you quickly count the unique values of one or more variables within each nested data frame. nest_count() results in a summary with one row per each set of variables to count by. nest_add_count() is equivalent with the exception that it retains all rows and adds a new column with group-wise counts.

Usage

nest_count(.data, .nest_data, ..., wt = NULL, sort = FALSE, name = NULL)
nest_add_count(.data, .nest_data, ..., wt = NULL, sort = FALSE, name = NULL)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Variables to group by.

wt

Frequency weights. Can be NULL or a variable:

If NULL (the default), counts the number of rows in each group.
If a variable, computes sum(wt) for each group.

sort

If TRUE, will show the largest groups at the top.

name

The name of the new column in the output.

Details

nest_count() and nest_add_count() are largely wrappers for dplyr::count() and dplyr::add_count() and maintain the functionality of count() and add_count() within each nested data frame. For more information on count() and add_count(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. nest_count() and nest_add_count() group each object in .nest_data transiently, so the output returned in .nest_data will have the same groups as the input.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# count the number of times each country appears in each nested tibble
gm_nest %>% nest_count(country_data, country)
gm_nest %>% nest_add_count(country_data, country)

# count the sum of population for each country in each nested tibble
gm_nest %>% nest_count(country_data, country, wt = pop)
gm_nest %>% nest_add_count(country_data, country, wt = pop)
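
On a tiny input the difference between row counts, weighted counts, and nest_add_count() can be checked by hand. The following is a sketch under assumed toy data; the tibble nested and its columns grp, item, and w are hypothetical.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: items with weights, nested by `grp`
nested <- tibble(
  grp  = c("g1", "g1", "g1", "g2"),
  item = c("x", "x", "y", "x"),
  w    = c(1, 2, 3, 4)
) %>%
  tidyr::nest(inner = -grp)

# one row per item, n = number of rows
counts <- nested %>% nest_count(inner, item)

# one row per item, n = sum(w)
weighted <- nested %>% nest_count(inner, item, wt = w)

# all rows retained, with a new column n of group-wise counts
added <- nested %>% nest_add_count(inner, item)

counts$inner[[1]]$n    # 2 1
weighted$inner[[1]]$n  # 3 3
added$inner[[1]]$n     # 2 2 1
```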

Subset distinct/unique rows within a nested data frame

Description

nest_distinct() selects only unique/distinct rows in a nested data frame.

Usage

nest_distinct(.data, .nest_data, ..., .keep_all = FALSE)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables.

.keep_all

If TRUE, keep all variables in .nest_data. If a combination of ... is not distinct, this keeps the first row of values.

Details

nest_distinct() is largely a wrapper for dplyr::distinct() and maintains the functionality of distinct() within each nested data frame. For more information on distinct(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

Rows are a subset of the input but appear in the same order.
Columns are not modified if ... is empty or .keep_all is TRUE. Otherwise, nest_distinct() first calls dplyr::mutate() to create new columns within each object in .nest_data.
Groups are not modified.
Data frame attributes are preserved.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_nest %>% nest_distinct(country_data, country)
gm_nest %>% nest_distinct(country_data, country, year)

Drop rows containing missing values in a column of nested data frames

Description

nest_drop_na() is used to drop rows containing missing values from each data frame in a column of nested data frames.

Usage

nest_drop_na(.data, .nest_data, ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Columns within .nest_data to inspect for missing values. If empty, all columns within each data frame in .nest_data are used.

Details

nest_drop_na() is a wrapper for tidyr::drop_na() and maintains the functionality of drop_na() within each nested data frame. For more information on drop_na(), please refer to the documentation in 'tidyr'.

Value

An object of the same type as .data. Each object in the column .nest_data will have rows dropped according to the presence of NAs.

Examples

gm <- gapminder::gapminder

# randomly insert NAs into the dataframe & nest
set.seed(123)
gm <-
  gm %>%
  dplyr::mutate(pop = dplyr::if_else(runif(nrow(gm)) >= 0.9,
                                     NA_integer_,
                                     pop))

gm_nest <- gm %>% tidyr::nest(country_data = -continent)

# drop rows where an NA exists in column `pop`
gm_nest %>%
  nest_drop_na(country_data, pop)

Extract a character column into multiple columns using regex groups in a column of nested data frames

Description

nest_extract() is used to extract capturing groups from a column in a nested data frame using regular expressions into a new column. If the groups don't match, or the input is NA, the output will be NA.

Usage

nest_extract(
  .data,
  .nest_data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

col

Column name or position within .nest_data (must be present within all nested data frames in .nest_data). This is passed to tidyselect::vars_pull().

This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).

into

Names of new variables to create as a character vector. Use NA to omit the variable in the output.

regex

A string representing a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.

remove

If TRUE, remove input column from output data frame.

convert

If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string "NA"s to be converted to NAs.

...

Additional arguments passed on to tidyr::extract() methods.

Details

nest_extract() is a wrapper for tidyr::extract() and maintains the functionality of extract() within each nested data frame. For more information on extract(), please refer to the documentation in 'tidyr'.

Value

An object of the same type as .data. Each object in the column .nest_data will have new columns created according to the capture groups specified in the regular expression.

Examples

set.seed(123)
gm <- gapminder::gapminder
gm <-
  gm %>%
  dplyr::mutate(comb = sample(c(NA, "a-b", "a-d", "b-c", "d-e"),
                              size = nrow(gm),
                              replace = TRUE))

gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>%
  nest_extract(country_data,
               col = comb,
               into = c("var1","var2"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)")

Fill missing values in a column of nested data frames

Description

nest_fill() is used to fill missing values in selected columns of nested data frames using the next or previous entry in each column.

Usage

nest_fill(
  .data,
  .nest_data,
  ...,
  .direction = c("down", "up", "downup", "updown")
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

<tidy-select> Columns to fill.

.direction

Direction in which to fill missing values. Currently either "down" (the default), "up", "downup" (i.e. first down and then up) or "updown" (first up and then down).

Details

nest_fill() is a wrapper for tidyr::fill() and maintains the functionality of fill() within each nested data frame. For more information on fill(), please refer to the documentation in 'tidyr'.

Value

An object of the same type as .data. Each object in the column .nest_data will have the chosen columns filled in the direction specified by .direction.

Examples

set.seed(123)
gm <-
  gapminder::gapminder %>%
  dplyr::mutate(pop = dplyr::if_else(runif(dplyr::n()) >= 0.9,
                                     NA_integer_,
                                     pop))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)
gm_nest %>%
  nest_fill(country_data, pop, .direction = "down")
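
A small hand-checkable case may help show how .direction behaves, and that values are filled within each nested data frame rather than across them. This is a sketch with made-up data; the names nested, grp, and x are hypothetical.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: each nested tibble starts and ends with an NA
nested <- tibble(
  grp = c("g1", "g1", "g1", "g2", "g2", "g2"),
  x   = c(NA, 1, NA, NA, 2, NA)
) %>%
  tidyr::nest(inner = -grp)

# "down": a leading NA stays NA because there is no earlier value
# inside the same nested tibble
down <- nested %>% nest_fill(inner, x, .direction = "down")

# "downup": fill down first, then fill the remaining leading NA upwards
downup <- nested %>% nest_fill(inner, x, .direction = "downup")

down$inner[[2]]$x    # NA 2 2
downup$inner[[2]]$x  # 2 2 2
```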

Subset rows in nested data frames using column values.

Description

nest_filter() is used to subset nested data frames, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [.

nest_filter() subsets the rows within .nest_data, applying the expressions in ... to the column values to determine which rows should be retained. It can be applied to both grouped and ungrouped data.

Usage

nest_filter(.data, .nest_data, ..., .preserve = FALSE)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Expressions that return a logical value, and are defined in terms of the variables in .nest_data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.

.preserve

Relevant when .nest_data is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is.

Details

nest_filter() is largely a wrapper for dplyr::filter() and maintains the functionality of filter() within each nested data frame. For more information on filter(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

Rows are a subset of the input, but appear in the same order.
Columns are not modified.
The number of groups may be reduced (if .preserve is not TRUE).
Data frame attributes are preserved.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# apply a filter
gm_nest %>%
  nest_filter(country_data, year > 1972)

# apply multiple filters
gm_nest %>%
  nest_filter(country_data, year > 1972, pop < 10000000)

# apply a filter on grouped data
gm_nest %>%
  nest_group_by(country_data, country) %>%
  nest_filter(country_data, pop > mean(pop))
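
To illustrate the NA behaviour noted in the Description, and what "wrapper" means in practice, the sketch below compares nest_filter() with a hand-written purrr::map() over the list-column. The map() version is only an assumed rough equivalent for illustration, not a statement about the package's internals; the toy tibble nested and its columns grp and value are hypothetical.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: one nested tibble contains an NA in `value`
nested <- tibble(
  grp   = c("g1", "g1", "g1", "g2", "g2"),
  value = c(1, 2, NA, 3, 0)
) %>%
  tidyr::nest(inner = -grp)

# filter inside each nested tibble with nplyr
via_nplyr <- nested %>% nest_filter(inner, value > 1)

# roughly the same result written by hand (illustrative assumption)
via_map <- nested %>%
  mutate(inner = purrr::map(inner, ~ dplyr::filter(.x, value > 1)))

# the row where `value` is NA is dropped, since NA > 1 is not TRUE
via_nplyr$inner[[1]]$value  # 2
via_map$inner[[1]]$value    # 2
```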

Group nested data frames by one or more variables

Description

nest_group_by() takes a set of nested tbls and converts it to a set of nested grouped tbls where operations are performed "by group". nest_ungroup() removes grouping.

Usage

nest_group_by(.data, .nest_data, ..., .add = FALSE, .drop = TRUE)
nest_ungroup(.data, .nest_data, ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

In nest_group_by(), variables or computations to group by. Computations are always done on the ungrouped data frames. To perform computations on the grouped data, you need to use a separate mutate() step after the group_by().

In nest_ungroup(), variables to remove from the grouping.

.add

When FALSE (the default), nest_group_by() will override the existing groups. To add to the existing groups, use .add = TRUE.

.drop

Drop groups formed by factor levels that don't appear in the data? The default is TRUE except when .nest_data has been previously grouped with .drop = FALSE. See dplyr::group_by_drop_default() for details.

Details

nest_group_by() and nest_ungroup() are largely wrappers for dplyr::group_by() and dplyr::ungroup() and maintain the functionality of group_by() and ungroup() within each nested data frame. For more information on group_by() or ungroup(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will be returned as a grouped data frame with class grouped_df, unless the combination of ... and .add yields an empty set of grouping columns, in which case a tibble will be returned.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# grouping doesn't change .nest_data, just .nest_data class:
gm_nest_grouped <-
  gm_nest %>%
  nest_group_by(country_data, year)

gm_nest_grouped

# It changes how it acts with other nplyr verbs:
gm_nest_grouped %>%
  nest_summarise(
    country_data,
    lifeExp = mean(lifeExp),
    pop = mean(pop),
    gdpPercap = mean(gdpPercap)
  )

# ungrouping removes variable groups:
gm_nest_grouped %>% nest_ungroup(country_data)
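
The class change described under Value can be inspected directly on the elements of the list-column. This is a minimal sketch on made-up data; the names nested, grp, key, and value are hypothetical.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data nested by `grp`
nested <- tibble(
  grp   = c("g1", "g1", "g1", "g2"),
  key   = c("a", "a", "b", "a"),
  value = c(1, 2, 3, 4)
) %>%
  tidyr::nest(inner = -grp)

# each nested tibble becomes a grouped_df grouped by `key`
grouped <- nested %>% nest_group_by(inner, key)
inherits(grouped$inner[[1]], "grouped_df")  # TRUE
dplyr::group_vars(grouped$inner[[1]])       # "key"

# summaries are then computed per `key` inside each nested tibble
totals <- grouped %>% nest_summarise(inner, total = sum(value))
totals$inner[[1]]$total                     # 3 3

# ungrouping returns plain tibbles again
ungrouped <- grouped %>% nest_ungroup(inner)
inherits(ungrouped$inner[[1]], "grouped_df")  # FALSE
```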

Create, modify, and delete columns in nested data frames

Description

nest_mutate() adds new variables to and preserves existing ones within the nested data frames in .nest_data. nest_transmute() adds new variables to and drops existing ones from the nested data frames in .nest_data.

Usage

nest_mutate(.data, .nest_data, ...)
nest_transmute(.data, .nest_data, ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Name-value pairs. The name gives the name of the column in the output.

The value can be:

A vector of length 1, which will be recycled to the correct length.
NULL, to remove the column.
A data frame or tibble, to create multiple columns in the output.

Details

nest_mutate() and nest_transmute() are largely wrappers for dplyr::mutate() and dplyr::transmute() and maintain the functionality of mutate() and transmute() within each nested data frame. For more information on mutate() or transmute(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

For nest_mutate():
- Columns from each object in .nest_data will be preserved according to the .keep argument.
- Existing columns that are modified by ... will always be returned in their original location.
- New columns created through ... will be placed according to the .before and .after arguments.
For nest_transmute():
- Columns created or modified through ... will be returned in the order specified by ....
- Unmodified grouping columns will be placed at the front.
The number of rows is not affected.
Columns given the value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes will be preserved.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# add or modify columns:
gm_nest %>%
  nest_mutate(
    country_data,
    lifeExp = NULL,
    gdp = gdpPercap * pop,
    pop = pop/1000000
  )

# use dplyr::across() to apply transformation to multiple columns
gm_nest %>%
  nest_mutate(
    country_data,
    across(c(lifeExp:gdpPercap), mean)
  )

# nest_transmute() drops unused columns when mutating:
gm_nest %>%
  nest_transmute(
    country_data,
    country = country,
    year = year,
    pop = pop/1000000
  )

Nested nest join

Description

nest_nest_join() returns all rows and columns in .nest_data with a new nested-df column that contains all matches from y. When there is no match, the list contains a 0-row tibble.

Usage

nest_nest_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  keep = FALSE,
  name = NULL,
  ...
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

y

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

by

A character vector of variables to join by or a join specification created with join_by().

To join on different variables between the objects in .nest_data and y, use a named vector. For example, by = c("a" = "b") will match .nest_data$a to y$b for each object in .nest_data.

To perform a cross-join, generating all combinations of each object in .nest_data and y, use by = character().

copy

keep

Should the join keys from both .nest_data and y be preserved in the output?

name

The name of the list column nesting joins create. If NULL, the name of y is used.

...

One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

Details

nest_nest_join() is largely a wrapper around dplyr::nest_join() and maintains the functionality of nest_join() within each nested data frame. For more information on nest_join(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes
gm_nest %>% nest_nest_join(country_data, gm_codes, by = "country")

Change column order within a nested data frame

Description

nest_relocate() changes column positions within a nested data frame, using the same syntax as nest_select() or dplyr::select() to make it easy to move blocks of columns at once.

Usage

nest_relocate(.data, .nest_data, ..., .before = NULL, .after = NULL)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Columns to move.

.before,.after

Destination of columns selected by .... Supplying neither will move columns to the left-hand side; specifying both is an error.

Details

nest_relocate() is largely a wrapper for dplyr::relocate() and maintains the functionality of relocate() within each nested data frame. For more information on relocate(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

Rows are not affected.
The same columns appear in the output, but (usually) in a different place.
Data frame attributes are preserved.
Groups are not affected.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_nest %>% nest_relocate(country_data, year)
gm_nest %>% nest_relocate(country_data, pop, .after = year)

Rename columns in nested data frames

Description

nest_rename() changes the names of individual variables using new_name = old_name syntax; nest_rename_with() renames columns using a function.

Usage

nest_rename(.data, .nest_data, ...)
nest_rename_with(.data, .nest_data, .fn, .cols = dplyr::everything(), ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

For nest_rename(): Use new_name = old_name to rename selected variables.

For nest_rename_with(): additional arguments passed onto .fn.

.fn

A function used to transform the selected .cols. Should return a character vector the same length as the input.

.cols

Columns to rename; defaults to all columns.

Details

nest_rename() and nest_rename_with() are largely wrappers for dplyr::rename() and dplyr::rename_with() and maintain the functionality of rename() and rename_with() within each nested data frame. For more information on rename() or rename_with(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

Rows are not affected.
Column names are changed; column order is preserved.
Data frame attributes are preserved.
Groups are updated to reflect new names.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_nest %>% nest_rename(country_data, population = pop)
gm_nest %>% nest_rename_with(country_data, stringr::str_to_lower)

Replace NAs with specified values in a column of nested data frames

Description

nest_replace_na() is used to replace missing values in selected columns of nested data frames using values specified by column.

Usage

nest_replace_na(.data, .nest_data, replace, ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

replace

A list of values, with one value for each column in that hasNA valuesto be replaced.

...

Additional arguments for tidyr::replace_na() methods. Currently unused.

Details

nest_replace_na() is a wrapper for tidyr::replace_na() and maintains the functionality of replace_na() within each nested data frame. For more information on replace_na(), please refer to the documentation in 'tidyr'.

Value

An object of the same type as .data. Each object in the column .nest_data will have NAs replaced in the specified columns.

Examples

set.seed(123)
gm <- gapminder::gapminder %>%
  dplyr::mutate(pop = dplyr::if_else(runif(dplyr::n()) >= 0.9, NA_integer_, pop))

gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>%
  nest_replace_na(.nest_data = country_data, replace = list(pop = -500))
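
As a smaller illustration of the replace argument, the sketch below supplies one replacement value per column, for a numeric and a character column at once. The `scores` tibble is hypothetical; the sketch assumes that nplyr, dplyr, and tidyr are installed.

```r
library(nplyr)

# Hypothetical toy data with missing values in two columns
scores <- dplyr::tibble(
  team   = c("a", "a", "b", "b"),
  points = c(1, NA, 3, NA),
  note   = c("x", NA, NA, "y")
)

scores_nest <- scores %>% tidyr::nest(team_data = -team)

# One replacement value per column that contains NAs
filled <- scores_nest %>%
  nest_replace_na(
    .nest_data = team_data,
    replace = list(points = 0, note = "none")
  )

filled$team_data[[1]]$points  # 1 0
filled$team_data[[1]]$note    # "x" "none"
```

Columns that are not named in replace are left untouched, so any NAs they contain remain.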

Subset columns in nested data frames using their names and types

Description

nest_select() selects (and optionally renames) variables in nested data frames, using a concise mini-language that makes it easy to refer to variables based on their name (e.g., a:f selects all columns from a on the left to f on the right). You can also use predicate functions like is.numeric to select variables based on their properties.

Usage

nest_select(.data, .nest_data, ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

Details

nest_select() is largely a wrapper for dplyr::select() and maintains the functionality of select() within each nested data frame. For more information on select(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

Rows are not affected.
Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if the new_name = old_name form is used.
Data frame attributes are preserved.
Groups are maintained; you can't select off grouping variables.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

gm_nest %>% nest_select(country_data, country, year, pop)
gm_nest %>% nest_select(country_data, dplyr::where(is.numeric))
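
The range and renaming forms described above can be sketched on a small data set as follows. The `boxes` tibble and its column names are hypothetical; the sketch assumes that nplyr, dplyr, and tidyr are installed.

```r
library(nplyr)

# Hypothetical toy data
boxes <- dplyr::tibble(
  shelf  = c("top", "top", "bottom"),
  height = 1:3,
  width  = 4:6,
  depth  = 7:9,
  label  = c("p", "q", "r")
)

boxes_nest <- boxes %>% tidyr::nest(box_data = -shelf)

# Select a contiguous range of columns by name
ranged <- boxes_nest %>% nest_select(box_data, height:depth)

# Select and rename in one step; the output follows the order given
relabelled <- boxes_nest %>% nest_select(box_data, tag = label, height)

names(ranged$box_data[[1]])      # "height" "width" "depth"
names(relabelled$box_data[[1]])  # "tag" "height"
```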

Separate a character column into multiple columns in a column of nested data frames

Description

nest_separate() is used to separate a single character column into multiple columns using a regular expression or a vector of character positions in a list of nested data frames.

Usage

nest_separate(
  .data,
  .nest_data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

col

Column name or position within each nested data frame. Must be present in all data frames in .nest_data. This is passed to tidyselect::vars_pull().

This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).

into

Names of new variables to create as character vector. Use NA to omit the variable in the output.

sep

Separator between columns.

If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

If numeric, sep is interpreted as character positions to split at. Positive values start at 1 at the far-left of the string; negative values start at -1 at the far-right of the string. The length of sep should be one less than the length of into.

remove

If TRUE, remove input column from output data frame.

convert

If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string "NA"s to be converted to NAs.

extra

If sep is a character vector, this controls what happens when there are too many pieces. There are three valid options:

"warn" (the default): emit a warning and drop extra values.
"drop": drop any extra values without a warning.
"merge": only splits at most length(into) times.

fill

If sep is a character vector, this controls what happens when there are not enough pieces. There are three valid options:

"warn" (the default): emit a warning and fill from the right
"right": fill with missing values on the right
"left": fill with missing values on the left

...

Additional arguments passed on to tidyr::separate() methods.

Details

nest_separate() is a wrapper for tidyr::separate() and maintains the functionality of separate() within each nested data frame. For more information on separate(), please refer to the documentation in 'tidyr'.

Value

An object of the same type as .data. Each object in the column .nest_data will have the specified column split according to the regular expression or the vector of character positions.

Examples

set.seed(123)
gm <- gapminder::gapminder %>%
  dplyr::mutate(comb = paste(continent, year, sep = "-"))

gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>%
  nest_separate(country_data, col = comb, into = c("var1", "var2"), sep = "-")
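
The difference between a numeric sep and the extra = "merge" option may be easier to see on a small example. The sketch below uses a hypothetical `codes` tibble and assumes that nplyr, dplyr, and tidyr are installed.

```r
library(nplyr)

# Hypothetical toy data: each code contains two "-" separators
codes <- dplyr::tibble(
  batch = c("b1", "b1", "b2"),
  code  = c("AB-2020-x", "CD-2021-y", "EF-2022-z")
)

codes_nest <- codes %>% tidyr::nest(batch_data = -batch)

# Numeric sep: split by position, after the second character
by_position <- codes_nest %>%
  nest_separate(batch_data, col = code, into = c("prefix", "rest"), sep = 2)

# Character sep with extra = "merge": split at the first "-" only
merged <- codes_nest %>%
  nest_separate(
    batch_data,
    col = code,
    into = c("prefix", "rest"),
    sep = "-",
    extra = "merge"
  )

by_position$batch_data[[1]]$rest  # "-2020-x" "-2021-y"
merged$batch_data[[1]]$rest       # "2020-x" "2021-y"
```

With the default extra = "warn", the second call would instead emit a warning and drop the third piece of each code.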

Subset rows in nested data frames using their positions.

Description

nest_slice() lets you index rows in nested data frames by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases:

nest_slice_head() and nest_slice_tail() select the first or last rows of each nested data frame in .nest_data.
nest_slice_sample() randomly selects rows from each data frame in .nest_data.
nest_slice_min() and nest_slice_max() select the rows with the lowest or highest values of a variable within each nested data frame in .nest_data.

If the data frames in .nest_data are grouped, the operation will be performed on each group, so that (e.g.) nest_slice_head(df, nested_dfs, n = 5) will return the first five rows in each group for each nested data frame.

Usage

nest_slice(.data, .nest_data, ..., .preserve = FALSE)

nest_slice_head(.data, .nest_data, ...)

nest_slice_tail(.data, .nest_data, ...)

nest_slice_min(.data, .nest_data, order_by, ..., with_ties = TRUE)

nest_slice_max(.data, .nest_data, order_by, ..., with_ties = TRUE)

nest_slice_sample(.data, .nest_data, ..., weight_by = NULL, replace = FALSE)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

For nest_slice(): Integer row values.

Provide either positive values to keep, or negative values to drop. The values provided must be either all positive or all negative. Indices beyond the number of rows in the input are silently ignored.

For the nest_slice_*() helpers, these arguments are passed on to methods.

Additionally:

n, prop: Provide either n, the number of rows, or prop, the proportion of rows to select. If neither is supplied, n = 1 will be used.
If a negative value of n or prop is provided, the specified number or proportion of rows will be removed.
If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size. If the proportion of a group size does not yield an integer number of rows, the absolute value of prop*nrow(.nest_data) is rounded down.

.preserve

Relevant when .nest_data is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping data is kept as is.

order_by

Variable or function of variables to order by.

with_ties

Should ties be kept together? The default, TRUE, may return more rows than you request. Use FALSE to ignore ties and return the first n rows.

weight_by

Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.

replace

Should sampling be performed with (TRUE) or without (FALSE, the default) replacement?

Details

nest_slice() and its helpers are largely wrappers for dplyr::slice() and its helpers and maintain the functionality of slice() and its helpers within each nested data frame. For more information on slice() or its helpers, please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

Each row may appear 0, 1, or many times in the output.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# select the 1st, 3rd, and 5th rows in each data frame in country_data
gm_nest %>% nest_slice(country_data, 1, 3, 5)

# or select all but the 1st, 3rd, and 5th rows:
gm_nest %>% nest_slice(country_data, -1, -3, -5)

# first and last rows based on existing order:
gm_nest %>% nest_slice_head(country_data, n = 5)
gm_nest %>% nest_slice_tail(country_data, n = 5)

# rows with minimum and maximum values of a variable:
gm_nest %>% nest_slice_min(country_data, lifeExp, n = 5)
gm_nest %>% nest_slice_max(country_data, lifeExp, n = 5)

# randomly select rows with or without replacement:
gm_nest %>% nest_slice_sample(country_data, n = 5)
gm_nest %>% nest_slice_sample(country_data, n = 5, replace = TRUE)
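
The n and prop rules described under Arguments can be checked on a small data set where the row counts are easy to follow. The sketch below uses a hypothetical `obs` tibble with four rows per site and assumes that nplyr, dplyr, and tidyr are installed.

```r
library(nplyr)

# Hypothetical toy data: four rows per site
obs <- dplyr::tibble(
  site  = rep(c("s1", "s2"), each = 4),
  value = c(5, 1, 3, 2, 9, 7, 8, 6)
)

obs_nest <- obs %>% tidyr::nest(site_data = -site)

# The two largest values within each nested data frame
top2 <- obs_nest %>% nest_slice_max(site_data, value, n = 2)

# prop = 0.5 keeps the first half of each nested data frame (2 of 4 rows)
half <- obs_nest %>% nest_slice_head(site_data, prop = 0.5)

# A negative n removes rows: here the last row is dropped (3 of 4 remain)
trimmed <- obs_nest %>% nest_slice_head(site_data, n = -1)

top2$site_data[[1]]$value    # 5 3
nrow(half$site_data[[1]])    # 2
nrow(trimmed$site_data[[1]]) # 3
```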

Summarise each group in nested data frames to fewer rows

Description

nest_summarise() creates a new set of nested data frames. Each will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in .nest_data. Each nested data frame will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

nest_summarise() and nest_summarize() are synonyms.

Usage

nest_summarise(.data, .nest_data, ..., .groups = NULL)

nest_summarize(.data, .nest_data, ..., .groups = NULL)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

...

Name-value pairs of functions. The name will be the name of the variable in the result.

The value can be:

A vector of length 1, e.g., min(x), n(), or sum(is.na(y)).
A vector of length n, e.g., quantile().
A data frame, to add multiple columns from a single expression.

.groups

Grouping structure of the result. Refer to dplyr::summarise() for more up-to-date information.

Details

nest_summarise() is largely a wrapper for dplyr::summarise() and maintains the functionality of summarise() within each nested data frame. For more information on summarise(), please refer to the documentation in dplyr.

Value

An object of the same type as .data. Each object in the column .nest_data will usually be of the same type as the input. Each object in .nest_data has the following properties:

The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups argument; the output may be another grouped_df, a tibble, or a rowwise data frame.
Data frame attributes are not preserved, because nest_summarise() fundamentally creates a new data frame for each object in .nest_data.

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# a summary applied to an ungrouped tbl returns a single row
gm_nest %>%
  nest_summarise(
    country_data,
    n = dplyr::n(),
    median_pop = median(pop)
  )

# usually, you'll want to group first
gm_nest %>%
  nest_group_by(country_data, country) %>%
  nest_summarise(
    country_data,
    n = dplyr::n(),
    median_pop = median(pop)
  )
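
To make the effect of grouping and of the .groups argument concrete, here is a small sketch with values that can be summed by hand. The `orders` tibble is hypothetical; the sketch assumes that nplyr, dplyr, and tidyr are installed, and it relies on nest_group_by() as used in the examples above.

```r
library(nplyr)

# Hypothetical toy data
orders <- dplyr::tibble(
  store = c("s1", "s1", "s1", "s2", "s2"),
  item  = c("tea", "tea", "milk", "tea", "milk"),
  qty   = c(1, 2, 4, 8, 16)
)

orders_nest <- orders %>% tidyr::nest(store_data = -store)

# Ungrouped: one summary row per nested data frame
totals <- orders_nest %>%
  nest_summarise(store_data, total = sum(qty))

# Grouped: one row per item, with the grouping dropped afterwards
by_item <- orders_nest %>%
  nest_group_by(store_data, item) %>%
  nest_summarise(store_data, total = sum(qty), .groups = "drop")

totals$store_data[[1]]$total   # 7
by_item$store_data[[1]]$item   # "milk" "tea"
by_item$store_data[[1]]$total  # 4 3
```

Because .groups = "drop" is passed, each summarised data frame in by_item is an ungrouped tibble.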

Unite multiple columns into one in a column of nested data frames

Description

nest_unite() is used to combine multiple columns into one in a column of nested data frames.

Usage

nest_unite(
  .data,
  .nest_data,
  col,
  ...,
  sep = "_",
  remove = TRUE,
  na.rm = FALSE
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames

col

The name of the new column, as a string or symbol.

This argument is passed by expression and supports quasiquotation (you can unquote strings and symbols). The name is captured from the expression with rlang::ensym() (note that this kind of interface where symbols do not represent actual objects is now discouraged in the tidyverse; we support it here for backward compatibility).

...

Columns to unite.

sep

Separator to use between values.

remove

If TRUE, remove input columns from output data frame.

na.rm

If TRUE, missing values will be removed prior to uniting each value.

Details

nest_unite() is a wrapper for tidyr::unite() and maintains the functionality of unite() within each nested data frame. For more information on unite(), please refer to the documentation in 'tidyr'.

Value

An object of the same type as .data. Each object in the column .nest_data will have a new column created as a combination of existing columns.

Examples

set.seed(123)
gm <- gapminder::gapminder
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>%
  nest_unite(country_data, col = comb, year, pop)
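
The sep and remove arguments can be illustrated with a short sketch in which the input columns are kept alongside the united one. The `people` tibble and its column names are hypothetical; the sketch assumes that nplyr, dplyr, and tidyr are installed.

```r
library(nplyr)

# Hypothetical toy data
people <- dplyr::tibble(
  team   = c("a", "a", "b"),
  given  = c("Ada", "Alan", "Grace"),
  family = c("Lovelace", "Turing", "Hopper")
)

people_nest <- people %>% tidyr::nest(team_data = -team)

# Join two columns with a space and keep the originals
united <- people_nest %>%
  nest_unite(
    team_data,
    col = full_name,
    given,
    family,
    sep = " ",
    remove = FALSE
  )

united$team_data[[1]]$full_name  # "Ada Lovelace" "Alan Turing"
names(united$team_data[[1]])     # "full_name" "given" "family"
```

With the default remove = TRUE, only full_name would remain in each nested data frame.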

Example survey data regarding personal life satisfaction

Description

A toy dataset containing 750 responses to a personal satisfaction survey. The responses were randomly generated using the Qualtrics survey platform.

Usage

personal_survey

Format

A data frame with 750 rows and 6 variables:

survey_name: name of survey
Q1: respondent age
Q2: city the respondent resides in
Q3: field that the respondent works in
Q4: respondent's personal life satisfaction (on a scale from extremely satisfied to extremely dissatisfied)
Q5: open text response elaborating on personal life satisfaction

