Package 'vctrs'

Title: Vector Helpers
Description: Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.
Authors: Hadley Wickham [aut], Lionel Henry [aut], Davis Vaughan [aut, cre], data.table team [cph] (Radix sort based on data.table's forder() and their contribution to R's order()), Posit Software, PBC [cph, fnd]
Maintainer: Davis Vaughan <[email protected]>
License: MIT + file LICENSE
Version: 0.6.5.9000
Built: 2025-03-27 05:35:53 UTC
Source: https://github.com/r-lib/vctrs

Help Index


Default value for empty vectors

Description

Use this inline operator when you need to provide a default value for empty (as defined by vec_is_empty()) vectors.

Usage

x %0% y

Arguments

x

A vector

y

Value to use if x is empty. To preserve type-stability, should be the same type as x.

Examples

1:10 %0% 5
integer() %0% 5

Construct a data frame

Description

data_frame() constructs a data frame. It is similar to base::data.frame(), but there are a few notable differences that make it more in line with vctrs principles. The Properties section outlines these.

Usage

data_frame(
  ...,
  .size = NULL,
  .name_repair = c("check_unique", "unique", "universal", "minimal", "unique_quiet", "universal_quiet"),
  .error_call = current_env()
)

Arguments

...

Vectors to become columns in the data frame. When inputs are named, those names are used for column names.

.size

The number of rows in the data frame. If NULL, this will be computed as the common size of the inputs.

.name_repair

One of "check_unique", "unique", "universal", "minimal", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

.error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Details

If no column names are supplied, "" will be used as a default name for all columns. This is applied before name repair occurs, so the default name repair of "check_unique" will error if any unnamed inputs are supplied and "unique" (or "unique_quiet") will repair the empty string column names appropriately. If the column names don't matter, use a "minimal" name repair for convenience and performance.
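As a small illustration of the name-repair behaviour described above, the following sketch compares the repair options on unnamed inputs. It assumes vctrs is installed; the variable names are made up for the example.

```r
library(vctrs)

# Unnamed inputs get "" as their name before repair happens.
# "minimal" leaves the empty names untouched.
df_min <- data_frame(1, 2, .name_repair = "minimal")
names(df_min)

# "unique_quiet" repairs the empty names without emitting a message.
df_unique <- data_frame(1, 2, .name_repair = "unique_quiet")
names(df_unique)

# The default "check_unique" refuses the empty names and errors.
res <- tryCatch(data_frame(1, 2), error = function(cnd) "error")
res
```

The "minimal" option is the cheapest of the three because no repair work is done at all, which is why it suits cases where the names are discarded afterwards.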

Properties

  • Inputs are recycled to a common size with vec_recycle_common().

  • With the exception of data frames, inputs are not modified in any way. Character vectors are never converted to factors, and lists are stored as-is for easy creation of list-columns.

  • Unnamed data frame inputs are automatically unpacked. Named data frame inputs are stored unmodified as data frame columns.

  • NULL inputs are completely ignored.

  • The dots are dynamic, allowing for splicing of lists with !!! and unquoting.

See Also

df_list() for safely creating a data frame's underlying data structure from individual columns. new_data_frame() for constructing the actual data frame from that underlying data structure. Together, these can be useful for developers when creating new data frame subclasses supporting standard evaluation.

Examples

data_frame(x = 1, y = 2)

# Inputs are recycled using tidyverse recycling rules
data_frame(x = 1, y = 1:3)

# Strings are never converted to factors
class(data_frame(x = "foo")$x)

# List columns can be easily created
df <- data_frame(x = list(1:2, 2, 3:4), y = 3:1)

# However, the base print method is suboptimal for displaying them,
# so it is recommended to convert them to tibble
if (rlang::is_installed("tibble")) {
  tibble::as_tibble(df)
}

# Named data frame inputs create data frame columns
df <- data_frame(x = data_frame(y = 1:2, z = "a"))

# The `x` column itself is another data frame
df$x

# Again, it is recommended to convert these to tibbles for a better
# print method
if (rlang::is_installed("tibble")) {
  tibble::as_tibble(df)
}

# Unnamed data frame input is automatically unpacked
data_frame(x = 1, data_frame(y = 1:2, z = "a"))

Collect columns for data frame construction

Description

df_list() constructs the data structure underlying a data frame, a named list of equal-length vectors. It is often used in combination with new_data_frame() to safely and consistently create a helper function for data frame subclasses.

Usage

df_list(
  ...,
  .size = NULL,
  .unpack = TRUE,
  .name_repair = c("check_unique", "unique", "universal", "minimal", "unique_quiet", "universal_quiet"),
  .error_call = current_env()
)

Arguments

...

Vectors of equal-length. When inputs are named, those names are used for names of the resulting list.

.size

The common size of vectors supplied in .... If NULL, this will be computed as the common size of the inputs.

.unpack

Should unnamed data frame inputs be unpacked? Defaults to TRUE.

.name_repair

One of "check_unique", "unique", "universal", "minimal", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

.error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Properties

  • Inputs are recycled to a common size with vec_recycle_common().

  • With the exception of data frames, inputs are not modified in any way. Character vectors are never converted to factors, and lists are stored as-is for easy creation of list-columns.

  • Unnamed data frame inputs are automatically unpacked. Named data frame inputs are stored unmodified as data frame columns.

  • NULL inputs are completely ignored.

  • The dots are dynamic, allowing for splicing of lists with !!! and unquoting.
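The unpacking rule interacts with the .unpack argument documented above. The sketch below, which assumes vctrs is installed and uses made-up variable names, contrasts the default with .unpack = FALSE. A "minimal" name repair is used in the second call because the unnamed data frame is kept as one element and would otherwise fail the default name check.

```r
library(vctrs)

inner <- data_frame(y = 1:2, z = "a")

# Default: the unnamed data frame is unpacked into its columns,
# and `x` is recycled to the common size of 2.
unpacked <- df_list(x = 1, inner)
names(unpacked)

# With `.unpack = FALSE` the data frame is stored as a single element.
packed <- df_list(x = 1, inner, .unpack = FALSE, .name_repair = "minimal")
length(packed)
is.data.frame(packed[[2]])
```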

See Also

new_data_frame() for constructing data frame subclasses from a validated input. data_frame() for a fast data frame creation helper.

Examples

# `new_data_frame()` can be used to create custom data frame constructors
new_fancy_df <- function(x = list(), n = NULL, ..., class = NULL) {
  new_data_frame(x, n = n, ..., class = c(class, "fancy_df"))
}

# Combine this constructor with `df_list()` to create a safe,
# consistent helper function for your data frame subclass
fancy_df <- function(...) {
  data <- df_list(...)
  new_fancy_df(data)
}

df <- fancy_df(x = 1)
class(df)

Coercion between two data frames

Description

df_ptype2() and df_cast() are the two functions you need to call from vec_ptype2() and vec_cast() methods for data frame subclasses. See ?howto-faq-coercion-data-frame. Their main job is to determine the common type of two data frames, adding and coercing columns as needed, or throwing an incompatible type error when the columns are not compatible.

Usage

df_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())

df_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())

tib_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())

tib_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())

Arguments

x, y, to

Subclasses of data frame.

...

If you call df_ptype2() or df_cast() from a vec_ptype2() or vec_cast() method, you must forward the dots passed to your method on to df_ptype2() or df_cast().

x_arg, y_arg

Argument names for x and y. These are used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

to_arg

Argument name for to, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

Value

  • When x and y are not compatible, an error of class vctrs_error_incompatible_type is thrown.

  • When x and y are compatible, df_ptype2() returns the common type as a bare data frame. tib_ptype2() returns the common type as a bare tibble.
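To make the "adding and coercing columns" behaviour concrete, here is a minimal sketch that calls the helpers directly on two bare data frames. It assumes vctrs is installed; the column names and values are invented for the example.

```r
library(vctrs)

x <- data.frame(a = 1L, b = "foo")
y <- data.frame(a = 2.5, c = TRUE)

# The common type is a zero-row data frame holding the union of the
# columns. The shared column `a` is promoted from integer to double.
ptype <- df_ptype2(x, y)
names(ptype)
nrow(ptype)

# `df_cast()` converts `x` to that common type. The column `c`, which
# `x` does not have, is filled with missing values.
out <- df_cast(x, ptype)
out
```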


FAQ - How is the compatibility of vector types decided?

Description

Two vectors are compatible when you can safely:

  • Combine them into one larger vector.

  • Assign values from one of the vectors into the other vector.

Examples of compatible types are integer and double vectors. On the other hand, integer and character vectors are not compatible.
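Both cases can be checked directly with vec_ptype2() and vec_c(). This is a small sketch assuming vctrs is installed; the error is captured with tryCatch() so that the snippet runs to completion.

```r
library(vctrs)

# Integer and double are compatible: the common type is double.
common <- vec_ptype2(1L, 1.5)
combined <- vec_c(1L, 1.5)
typeof(combined)

# Integer and character are not compatible, so an incompatible
# type error is thrown.
err <- tryCatch(vec_ptype2(1L, "a"), error = function(cnd) cnd)
inherits(err, "vctrs_error_incompatible_type")
```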

Common type of multiple vectors

There are two possible outcomes when multiple vectors of different types are combined into a larger vector:

  • An incompatible type error is thrown because some of the types are not compatible:

    df1 <- data.frame(x = 1:3)
    df2 <- data.frame(x = "foo")
    dplyr::bind_rows(df1, df2)
    #> Error in `dplyr::bind_rows()`:
    #> ! Can't combine `..1$x` <integer> and `..2$x` <character>.

  • The vectors are combined into a vector that has the common type of all inputs. In this example, the common type of integer and logical is integer:

    df1 <- data.frame(x = 1:3)
    df2 <- data.frame(x = FALSE)
    dplyr::bind_rows(df1, df2)
    #>   x
    #> 1 1
    #> 2 2
    #> 3 3
    #> 4 0

In general, the common type is the richer type, in other words the type that can represent the most values. Logical vectors are at the bottom of the hierarchy of numeric types because they can only represent two values (not counting missing values). Then come integer vectors, and then doubles. Here is the vctrs type hierarchy for the fundamental vectors:

[Figure: coerce.png, the vctrs type hierarchy for the fundamental vectors]

Type conversion and lossy cast errors

Type compatibility does not necessarily mean that you can convert one type to the other type. That’s because one of the types might support a larger set of possible values. For instance, integer and double vectors are compatible, but double vectors can’t be converted to integer if they contain fractional values.

When vctrs can’t convert a vector because the target type is not as rich as the source type, it throws a lossy cast error. Assigning a fractional number to an integer vector is a typical example of a lossy cast error:

int_vector <- 1:3
vec_assign(int_vector, 2, 0.001)
#> Error in `vec_assign()`:
#> ! Can't convert from <double> to <integer> due to loss of precision.
#> * Locations: 1
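The same distinction shows up when calling vec_cast() directly. In this sketch (assuming vctrs is installed), a double vector holding only whole numbers converts cleanly, whereas a fractional value triggers the lossy cast error, which is captured here so the snippet keeps running.

```r
library(vctrs)

# Whole-number doubles can be represented as integers without loss.
ok <- vec_cast(c(1, 2, 3), to = integer())
ok

# A fractional value cannot, so a lossy cast error is signalled.
err <- tryCatch(vec_cast(1.5, to = integer()), error = function(cnd) cnd)
inherits(err, "vctrs_error_cast_lossy")
```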

How to make two vector classes compatible?

If you encounter two vector types that you think should be compatible, they might need to implement coercion methods. Reach out to the author(s) of the classes and ask them if it makes sense for their classes to be compatible.

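These developer FAQ items provide guides for implementing coercion methods:

  • For simple vectors, see ?howto-faq-coercion.

  • For data frame subclasses, see ?howto-faq-coercion-data-frame.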


FAQ - Error/Warning: Some attributes are incompatible

Description

This error occurs when vec_ptype2() or vec_cast() are supplied vectors of the same classes with different attributes. In this case, vctrs doesn't know how to combine the inputs.

To fix this error, the maintainer of the class should implement self-to-self coercion methods for vec_ptype2() and vec_cast().

Implementing coercion methods

  • For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.

  • For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.

  • For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.

  • For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").


FAQ - Error: Input must be a vector

Description

This error occurs when a function expects a vector and gets a scalar object instead. This commonly happens when some code attempts to assign a scalar object as a column in a data frame:

fn <- function() NULL
tibble::tibble(x = fn)
#> Error in `tibble::tibble()`:
#> ! All columns in a tibble must be vectors.
#> x Column `x` is a function.

fit <- lm(1:3 ~ 1)
tibble::tibble(x = fit)
#> Error in `tibble::tibble()`:
#> ! All columns in a tibble must be vectors.
#> x Column `x` is a `lm` object.

Vectorness in base R and in the tidyverse

In base R, almost everything is a vector or behaves like a vector. In the tidyverse we have chosen to be a bit stricter about what is considered a vector. The main question we ask ourselves to decide on the vectorness of a type is whether it makes sense to include that object as a column in a data frame.

The main difference is that S3 lists are considered vectors by base R but in the tidyverse that’s not the case by default:

fit <- lm(1:3 ~ 1)

typeof(fit)
#> [1] "list"
class(fit)
#> [1] "lm"

# S3 lists can be subset like a vector using base R:
fit[c(1, 4)]
#> $coefficients
#> (Intercept) 
#>           2 
#> 
#> $rank
#> [1] 1

# But not in vctrs
vctrs::vec_slice(fit, c(1, 4))
#> Error in `vctrs::vec_slice()`:
#> ! `x` must be a vector, not a <lm> object.

Defused function calls are another (more esoteric) example:

call <- quote(foo(bar = TRUE, baz = FALSE))
call
#> foo(bar = TRUE, baz = FALSE)

# They can be subset like a vector using base R:
call[1:2]
#> foo(bar = TRUE)

lapply(call, function(x) x)
#> [[1]]
#> foo
#> 
#> $bar
#> [1] TRUE
#> 
#> $baz
#> [1] FALSE

# But not with vctrs:
vctrs::vec_slice(call, 1:2)
#> Error in `vctrs::vec_slice()`:
#> ! `x` must be a vector, not a call.
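A quick way to see where vctrs draws the line is to ask vec_is() about a few objects. This is a sketch assuming vctrs is installed; the objects are arbitrary examples chosen for illustration.

```r
library(vctrs)

is_vec <- c(
  atomic     = vec_is(1:3),                # atomic vectors are vectors
  bare_list  = vec_is(list(1, "a")),       # so are bare lists
  data_frame = vec_is(data.frame(x = 1)),  # and data frames
  s3_list    = vec_is(lm(1:3 ~ 1)),        # S3 lists are not, by default
  func       = vec_is(mean),               # functions are scalar objects
  call       = vec_is(quote(foo(bar)))     # as are defused calls
)
is_vec
```

The first three entries are vectors in the vctrs sense; the last three are the scalar objects that produce the error discussed in this topic.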

I get a scalar type error but I think this is a bug

It’s possible the author of the class needs to do some work to declare their class a vector. Consider reaching out to the author. We have written a developer FAQ page to help them fix the issue.


FAQ - How to implement ptype2 and cast methods?

Description

This guide illustrates how to implement vec_ptype2() and vec_cast() methods for existing classes. Related topics:

  • For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.

  • For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.

  • For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

The natural number class

We’ll illustrate how to implement coercion methods with a simple class that represents natural numbers. In this scenario we have an existing class that already features a constructor and methods for print() and subset.

#' @export
new_natural <- function(x) {
  if (is.numeric(x) || is.logical(x)) {
    stopifnot(is_whole(x))
    x <- as.integer(x)
  } else {
    stop("Can't construct natural from unknown type.")
  }
  structure(x, class = "my_natural")
}
is_whole <- function(x) {
  all(x %% 1 == 0 | is.na(x))
}

#' @export
print.my_natural <- function(x, ...) {
  cat("<natural>\n")
  x <- unclass(x)
  NextMethod()
}
#' @export
`[.my_natural` <- function(x, i, ...) {
  new_natural(NextMethod())
}

new_natural(1:3)
#> <natural>
#> [1] 1 2 3
new_natural(c(1, NA))
#> <natural>
#> [1]  1 NA

Roxygen workflow

To implement methods for generics, first import the generics in your namespace and redocument:

#' @importFrom vctrs vec_ptype2 vec_cast
NULL

Note that for each batch of methods that you add to your package, you need to export the methods and redocument immediately, even during development. Otherwise they won’t be in scope when you run unit tests, e.g. with testthat.

Implementing double dispatch methods is very similar to implementing regular S3 methods. In these examples we are using roxygen2 tags to register the methods, but you can also register the methods manually in your NAMESPACE file or lazily with s3_register().

Implementing vec_ptype2()

The self-self method

The first method to implement is the one that signals that your class is compatible with itself:

#' @export
vec_ptype2.my_natural.my_natural <- function(x, y, ...) {
  x
}

vec_ptype2(new_natural(1), new_natural(2:3))
#> <natural>
#> integer(0)

vec_ptype2() implements a fallback to try and be compatible with simple classes, so it may seem that you don’t need to implement the self-self coercion method. However, you must implement it explicitly because this is how vctrs knows that a class is implementing vctrs methods (for instance, this disables fallbacks to base::c()). Also, it makes your class a bit more efficient.

The parent and children methods

Our natural number class is conceptually a parent of <logical> and a child of <integer>, but the class is not compatible with logical, integer, or double vectors yet:

vec_ptype2(TRUE, new_natural(2:3))
#> Error:
#> ! Can't combine `TRUE` <logical> and `new_natural(2:3)` <my_natural>.

vec_ptype2(new_natural(1), 2:3)
#> Error:
#> ! Can't combine `new_natural(1)` <my_natural> and `2:3` <integer>.

We’ll specify the twin methods for each of these classes, returning the richer class in each case.

#' @export
vec_ptype2.my_natural.logical <- function(x, y, ...) {
  # The order of the classes in the method name follows the order of
  # the arguments in the function signature, so `x` is the natural
  # number and `y` is the logical
  x
}
#' @export
vec_ptype2.logical.my_natural <- function(x, y, ...) {
  # In this case `y` is the richer natural number
  y
}

Between a natural number and an integer, the latter is the richer class:

#' @export
vec_ptype2.my_natural.integer <- function(x, y, ...) {
  y
}
#' @export
vec_ptype2.integer.my_natural <- function(x, y, ...) {
  x
}

We no longer get common type errors for logical and integer:

vec_ptype2(TRUE, new_natural(2:3))
#> <natural>
#> integer(0)

vec_ptype2(new_natural(1), 2:3)
#> integer(0)

We are not done yet. Pairwise coercion methods must be implemented for all the connected nodes in the coercion hierarchy, which include double vectors further up. The coercion methods for grand-parent types must be implemented separately:

#' @export
vec_ptype2.my_natural.double <- function(x, y, ...) {
  y
}
#' @export
vec_ptype2.double.my_natural <- function(x, y, ...) {
  x
}

Incompatible attributes

Most of the time, inputs are incompatible because they have different classes for which no vec_ptype2() method is implemented. More rarely, inputs could be incompatible because of their attributes. In that case incompatibility is signalled by calling stop_incompatible_type().

In the following example, we implement a self-self ptype2 method for a hypothetical subclass of <factor> that has stricter combination semantics. The method throws an error when the levels of the two factors are not compatible.

#' @export
vec_ptype2.my_strict_factor.my_strict_factor <- function(x, y, ..., x_arg = "", y_arg = "") {
  if (!setequal(levels(x), levels(y))) {
    stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
  }

  x
}

Note how the methods need to take x_arg and y_arg parameters and pass them on to stop_incompatible_type(). These argument tags help create more informative error messages when the common type determination is for a column of a data frame. They are part of the generic signature but can usually be left out if not used.
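The effect of the argument tags is easiest to see on base types. In this sketch (assuming vctrs is installed), the tags "left" and "right" are arbitrary labels chosen for the example, and the error is captured so its message can be inspected.

```r
library(vctrs)

err <- tryCatch(
  vec_ptype2(1L, "a", x_arg = "left", y_arg = "right"),
  error = function(cnd) cnd
)

# The message refers to the inputs by the tags that were supplied.
msg <- conditionMessage(err)
msg
```

A method that drops x_arg and y_arg instead of forwarding them would lose this information, and the error would no longer say which input or column was at fault.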

Implementing vec_cast()

Corresponding vec_cast() methods must be implemented for all vec_ptype2() methods. The general pattern is to convert the argument x to the type of to. The methods should validate the values in x and make sure they conform to the values of to.

Please note that for historical reasons, the order of the classes in the method name is in reverse order of the arguments in the function signature. The first class represents to, whereas the second class represents x.

The self-self method is easy in this case. It just returns the input:

#' @export
vec_cast.my_natural.my_natural <- function(x, to, ...) {
  x
}

The other types need to be validated. We perform input validation in the new_natural() constructor, so that’s a good fit for our vec_cast() implementations.

#' @export
vec_cast.my_natural.logical <- function(x, to, ...) {
  # The order of the classes in the method name is in reverse order
  # of the arguments in the function signature, so `to` is the natural
  # number and `x` is the logical
  new_natural(x)
}
#' @export
vec_cast.my_natural.integer <- function(x, to, ...) {
  new_natural(x)
}
#' @export
vec_cast.my_natural.double <- function(x, to, ...) {
  new_natural(x)
}

With these methods, vctrs is now able to combine logical and natural vectors. It properly returns the richer type of the two, a natural vector:

vec_c(TRUE, new_natural(1), FALSE)
#> <natural>
#> [1] 1 1 0

Because we haven’t implemented conversions from natural, it still doesn’t know how to combine natural with the richer integer and double types:

vec_c(new_natural(1), 10L)
#> Error in `vec_c()`:
#> ! Can't convert `..1` <my_natural> to <integer>.

vec_c(1.5, new_natural(1))
#> Error in `vec_c()`:
#> ! Can't convert `..2` <my_natural> to <double>.

This is quick work which completes the implementation of coercion methods for vctrs:

#' @export
vec_cast.logical.my_natural <- function(x, to, ...) {
  # In this case `to` is the logical and `x` is the natural number
  attributes(x) <- NULL
  as.logical(x)
}
#' @export
vec_cast.integer.my_natural <- function(x, to, ...) {
  attributes(x) <- NULL
  as.integer(x)
}
#' @export
vec_cast.double.my_natural <- function(x, to, ...) {
  attributes(x) <- NULL
  as.double(x)
}

And we now get the expected combinations.

vec_c(new_natural(1), 10L)
#> [1]  1 10

vec_c(1.5, new_natural(1))
#> [1] 1.5 1.0

FAQ - How to implement ptype2 and cast methods? (Data frames)

Description

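This guide provides a practical recipe for implementing vec_ptype2() and vec_cast() methods for coercions of data frame subclasses. Related topics:

  • For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.

  • For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.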

Coercion of data frames occurs when different data frame classes are combined in some way. The two main methods of combination are currently row-binding with vec_rbind() and col-binding with vec_cbind() (which are in turn used by a number of dplyr and tidyr functions). These functions take multiple data frame inputs and automatically coerce them to their common type.

vctrs is generally strict about the kind of automatic coercions that are performed when combining inputs. In the case of data frames we have decided to be a bit less strict for convenience. Instead of throwing an incompatible type error, we fall back to a base data frame or a tibble if we don’t know how to combine two data frame subclasses. It is still a good idea to specify the proper coercion behaviour for your data frame subclasses as soon as possible.

We will see two examples in this guide. The first example is about a data frame subclass that has no particular attributes to manage. In the second example, we implement coercion methods for a tibble subclass that includes potentially incompatible attributes.

Roxygen workflow

To implement methods for generics, first import the generics in your namespace and redocument:

#' @importFrom vctrs vec_ptype2 vec_cast
NULL

Note that for each batch of methods that you add to your package, you need to export the methods and redocument immediately, even during development. Otherwise they won’t be in scope when you run unit tests, e.g. with testthat.

Implementing double dispatch methods is very similar to implementing regular S3 methods. In these examples we are using roxygen2 tags to register the methods, but you can also register the methods manually in your NAMESPACE file or lazily with s3_register().

Parent methods

Most of the common type determination should be performed by the parent class. In vctrs, double dispatch is implemented in such a way that you need to call the methods for the parent class manually. For vec_ptype2() this means you need to call df_ptype2() (for data frame subclasses) or tib_ptype2() (for tibble subclasses). Similarly, df_cast() and tib_cast() are the workhorses for vec_cast() methods of subtypes of data.frame and tbl_df. These functions take the union of the columns in x and y, and ensure shared columns have the same type.

These functions are much less strict than vec_ptype2() and vec_cast() as they accept any subclass of data frame as input. They always return a data.frame or a tbl_df. You will probably want to write similar functions for your subclass to avoid repetition in your code. You may want to export them as well if you are expecting other people to derive from your class.
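The following sketch illustrates that the parent helpers accept a subclass but hand back a bare data frame, which is why a subclass needs its own wrapper to restore the class. It assumes vctrs is installed; the class name "my_df" is hypothetical and has no coercion methods defined.

```r
library(vctrs)

# A data frame subclass created with the low-level constructor.
sub <- new_data_frame(list(x = 1L), class = "my_df")
class(sub)

# `df_ptype2()` accepts the subclass but returns a bare data frame,
# so the "my_df" class is dropped from the common type.
ptype <- df_ptype2(sub, data.frame(y = "a"))
class(ptype)
names(ptype)
```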

A data.table example

This example is the actual implementation of vctrs coercion methods for data.table. This is a simple example because we don’t have to keep track of attributes for this class or manage incompatibilities. See the tibble section for a more complicated example.

We first create the dt_ptype2() and dt_cast() helpers. They wrap around the parent methods df_ptype2() and df_cast(), and transform the common type or converted input to a data table. You may want to export these helpers if you expect other packages to derive from your data frame class.

These helpers should always return data tables. To this end we use the conversion generic as.data.table(). Depending on the tools available for the particular class at hand, a constructor might be appropriate as well.

dt_ptype2 <- function(x, y, ...) {
  as.data.table(df_ptype2(x, y, ...))
}
dt_cast <- function(x, to, ...) {
  as.data.table(df_cast(x, to, ...))
}

We start with the self-self method:

#' @export
vec_ptype2.data.table.data.table <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}

Between a data frame and a data table, we consider the richer type to be data table. This decision is not based on the value coverage of each data structure, but on the idea that data tables have richer behaviour. Since data tables are the richer type, we call dt_ptype2() from the vec_ptype2() method. It always returns a data table, no matter the order of arguments:

#' @export
vec_ptype2.data.table.data.frame <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.data.table <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}

The vec_cast() methods follow the same pattern, but note how the method for coercing to data frame uses df_cast() rather than dt_cast().

Also, please note that for historical reasons, the order of the classes in the method name is in reverse order of the arguments in the function signature. The first class represents to, whereas the second class represents x.

#' @export
vec_cast.data.table.data.table <- function(x, to, ...) {
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.table.data.frame <- function(x, to, ...) {
  # `x` is a data.frame to be converted to a data.table
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.data.table <- function(x, to, ...) {
  # `x` is a data.table to be converted to a data.frame
  df_cast(x, to, ...)
}

With these methods vctrs is now able to combine data tables with data frames:

vec_cbind(data.frame(x = 1:3), data.table(y = "foo"))
#>        x      y
#>    <int> <char>
#> 1:     1    foo
#> 2:     2    foo
#> 3:     3    foo

A tibble example

In this example we implement coercion methods for a tibble subclass that carries a colour as scalar metadata:

# User constructor
my_tibble <- function(colour = NULL, ...) {
  new_my_tibble(tibble::tibble(...), colour = colour)
}

# Developer constructor
new_my_tibble <- function(x, colour = NULL) {
  stopifnot(is.data.frame(x))
  tibble::new_tibble(
    x,
    colour = colour,
    class = "my_tibble",
    nrow = nrow(x)
  )
}

df_colour <- function(x) {
  if (inherits(x, "my_tibble")) {
    attr(x, "colour")
  } else {
    NULL
  }
}

#' @export
print.my_tibble <- function(x, ...) {
  cat(sprintf("<%s: %s>\n", class(x)[[1]], df_colour(x)))
  cli::cat_line(format(x)[-1])
}

This subclass is very simple. All it does is modify the header.

red <- my_tibble("red", x = 1, y = 1:2)
red
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2

red[2]
#> <my_tibble: red>
#>       y
#>   <int>
#> 1     1
#> 2     2

green <- my_tibble("green", z = TRUE)
green
#> <my_tibble: green>
#>   z    
#>   <lgl>
#> 1 TRUE

Combinations do not work properly out of the box. Instead, vctrs falls back to a bare tibble:

vec_rbind(red, tibble::tibble(x = 10:12))
#> # A tibble: 5 x 2
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA

Instead of falling back to a data frame, we would like to return a <my_tibble> when combined with a data frame or a tibble. Because this subclass has more metadata than normal data frames (it has a colour), it is a supertype of tibble and data frame, i.e. it is the richer type. This is similar to how a grouped tibble is a more general type than a tibble or a data frame. Conceptually, the latter are pinned to a single constant group.

The coercion methods for data frames operate in two steps:

  • They check for compatible subclass attributes. In our case the tibble colour has to be the same, or be undefined.

  • They call their parent methods, in this case tib_ptype2() and tib_cast() because we have a subclass of tibble. This eventually calls the data frame methods df_ptype2() and df_cast() which match the columns and their types.

This process should usually be wrapped in two functions to avoid repetition. Consider exporting these if you expect your class to be derived by other subclasses.

We first implement a helper to determine if two data frames have compatible colours. We use the df_colour() accessor which returns NULL when the data frame colour is undefined.

has_compatible_colours <- function(x, y) {
  x_colour <- df_colour(x) %||% df_colour(y)
  y_colour <- df_colour(y) %||% x_colour
  identical(x_colour, y_colour)
}
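To see how this helper treats undefined colours, here is a self-contained sketch that runs without tibble or rlang. The local `%||%` definition and the simplified df_colour() accessor are stand-ins written for this example, and the test objects are plain data frames carrying a colour attribute rather than real my_tibble instances.

```r
# Stand-ins for illustration only (assumptions, not the guide's versions).
`%||%` <- function(x, y) if (is.null(x)) y else x
df_colour <- function(x) attr(x, "colour", exact = TRUE)

has_compatible_colours <- function(x, y) {
  # An undefined colour adopts the colour of the other input.
  x_colour <- df_colour(x) %||% df_colour(y)
  y_colour <- df_colour(y) %||% x_colour
  identical(x_colour, y_colour)
}

red   <- structure(data.frame(x = 1), colour = "red")
green <- structure(data.frame(x = 1), colour = "green")
plain <- data.frame(x = 1)

checks <- c(
  same      = has_compatible_colours(red, red),
  undefined = has_compatible_colours(red, plain),
  both_null = has_compatible_colours(plain, plain),
  conflict  = has_compatible_colours(red, green)
)
checks
```

Only the conflicting pair is rejected. An input without a colour is compatible with anything, which is what lets plain tibbles and data frames combine with the subclass.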

Next we implement the coercion helpers. If the colours are notcompatible, we callstop_incompatible_cast() orstop_incompatible_type(). These strict coercion semantics arejustified because in this class colour is adata attribute. If it werea non essentialdetail attribute, like the timezone in a datetime, wewould just standardise it to the value of the left-hand side.

In simpler cases (like the data.table example), these methods do notneed to take the arguments suffixed in⁠_arg⁠. Here we do need to takethese arguments so we can pass them to thestop_ functions when wedetect an incompatibility. They also should be passed to the parentmethods.

#' @exportmy_tib_cast <- function(x, to, ..., x_arg = "", to_arg = "") {  out <- tib_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg)  if (!has_compatible_colours(x, to)) {    stop_incompatible_cast(      x,      to,      x_arg = x_arg,      to_arg = to_arg,      details = "Can't combine colours."    )  }  colour <- df_colour(x) %||% df_colour(to)  new_my_tibble(out, colour = colour)}#' @exportmy_tib_ptype2 <- function(x, y, ..., x_arg = "", y_arg = "") {  out <- tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg)  if (!has_compatible_colours(x, y)) {    stop_incompatible_type(      x,      y,      x_arg = x_arg,      y_arg = y_arg,      details = "Can't combine colours."    )  }  colour <- df_colour(x) %||% df_colour(y)  new_my_tibble(out, colour = colour)}

Let’s now implement the coercion methods, starting with the self-selfmethods.

#' @exportvec_ptype2.my_tibble.my_tibble <- function(x, y, ...) {  my_tib_ptype2(x, y, ...)}#' @exportvec_cast.my_tibble.my_tibble <- function(x, to, ...) {  my_tib_cast(x, to, ...)}

We can now combine compatible instances of our class!

vec_rbind(red, red)#> <my_tibble: red>#>       x     y#>   <dbl> <int>#> 1     1     1#> 2     1     2#> 3     1     1#> 4     1     2vec_rbind(green, green)#> <my_tibble: green>#>   z    #>   <lgl>#> 1 TRUE #> 2 TRUEvec_rbind(green, red)#> Error in `my_tib_ptype2()`:#> ! Can't combine `..1` <my_tibble> and `..2` <my_tibble>.#> Can't combine colours.

The methods for combining our class with tibbles follow the samepattern. For ptype2 we return our class in both cases because it is thericher type:

#' @exportvec_ptype2.my_tibble.tbl_df <- function(x, y, ...) {  my_tib_ptype2(x, y, ...)}#' @exportvec_ptype2.tbl_df.my_tibble <- function(x, y, ...) {  my_tib_ptype2(x, y, ...)}

For cast are careful about returning a tibble when casting to a tibble.Note the call tovctrs::tib_cast():

#' @exportvec_cast.my_tibble.tbl_df <- function(x, to, ...) {  my_tib_cast(x, to, ...)}#' @exportvec_cast.tbl_df.my_tibble <- function(x, to, ...) {  tib_cast(x, to, ...)}

From this point, we get correct combinations with tibbles:

vec_rbind(red, tibble::tibble(x = 10:12))#> <my_tibble: red>#>       x     y#>   <dbl> <int>#> 1     1     1#> 2     1     2#> 3    10    NA#> 4    11    NA#> 5    12    NA

However we are not done yet. Because the coercion hierarchy is differentfrom the class hierarchy, there is no inheritance of coercion methods.We’re not getting correct behaviour for data frames yet because wehaven’t explicitly specified the methods for this class:

vec_rbind(red, data.frame(x = 10:12))#> # A tibble: 5 x 2#>       x     y#>   <dbl> <int>#> 1     1     1#> 2     1     2#> 3    10    NA#> 4    11    NA#> 5    12    NA

Let’s finish up the boilerplate:

#' @export
vec_ptype2.my_tibble.data.frame <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}

#' @export
vec_ptype2.data.frame.my_tibble <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}

#' @export
vec_cast.my_tibble.data.frame <- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}

#' @export
vec_cast.data.frame.my_tibble <- function(x, to, ...) {
  df_cast(x, to, ...)
}

This completes the implementation:

vec_rbind(red, data.frame(x = 10:12))
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA

FAQ - Why isn't my class treated as a vector?

Description

The tidyverse is a bit stricter than base R regarding what kind of objects are considered as vectors (see the user FAQ about this topic). Sometimes vctrs won’t treat your class as a vector when it should.

Why isn’t my list class considered a vector?

By default, S3 lists are not considered to be vectors by vctrs:

my_list <- structure(list(), class = "my_class")
vctrs::vec_is(my_list)
#> [1] FALSE

To be treated as a vector, the class must either inherit from "list" explicitly:

my_explicit_list <- structure(list(), class = c("my_class", "list"))
vctrs::vec_is(my_explicit_list)
#> [1] TRUE

Or, if explicit inheritance is not possible or troublesome, it should implement a vec_proxy() method that returns its input:

#' @export
vec_proxy.my_class <- function(x, ...) x

vctrs::vec_is(my_list)
#> [1] TRUE

Note that explicit inheritance is the preferred way because this makes it possible for your class to dispatch on list methods of S3 generics:

my_generic <- function(x) UseMethod("my_generic")
my_generic.list <- function(x) "dispatched!"

my_generic(my_list)
#> Error in UseMethod("my_generic"): no applicable method for 'my_generic' applied to an object of class "my_class"

my_generic(my_explicit_list)
#> [1] "dispatched!"

Why isn’t my data frame class considered a vector?

If you get an “Input must be a vector” error with a data frame subclass, it probably means that the data frame has not been properly constructed. The main cause of these errors is data frames whose base class is not "data.frame":

my_df <- data.frame(x = 1)
class(my_df) <- c("data.frame", "my_class")
vctrs::obj_check_vector(my_df)
#> Error:
#> ! `my_df` must be a vector, not a <data.frame/my_class> object.

This is problematic as many tidyverse functions won’t work properly:

dplyr::slice(my_df, 1)
#> Error in `vec_slice()`:
#> ! `x` must be a vector, not a <data.frame/my_class> object.

It is generally not appropriate to declare your class to be a superclass of another class. We generally consider this undefined behaviour (UB). To fix these errors, you can simply change the construction of your data frame class so that "data.frame" is a base class, i.e. it should come last in the class vector:

class(my_df) <- c("my_class", "data.frame")
vctrs::obj_check_vector(my_df)
dplyr::slice(my_df, 1)
#>   x
#> 1 1

Internal FAQ - Implementation of vec_locate_matches()

Description

vec_locate_matches() is similar to vec_match(), but detects all matches by default, and can match on conditions other than equality (like >= and <). There are also various other arguments to limit or adjust exactly which kinds of matches are returned. Here is an example:

x <- c("a", "b", "a", "c", "d")
y <- c("d", "b", "a", "d", "a", "e")

# For each value of `x`, find all matches in `y`
# - The "c" in `x` doesn't have a match, so it gets an NA location by default
# - The "e" in `y` isn't matched by anything in `x`, so it is dropped by default
vec_locate_matches(x, y)
#>   needles haystack
#> 1       1        3
#> 2       1        5
#> 3       2        2
#> 4       3        3
#> 5       3        5
#> 6       4       NA
#> 7       5        1
#> 8       5        4

Algorithm description

Overview and ==

The simplest (approximate) way to think about the algorithm that df_locate_matches_recurse() uses is that it sorts both inputs, and then starts at the midpoint in needles and uses a binary search to find each needle in haystack. Since there might be multiple of the same needle, we find the location of the lower and upper duplicate of that needle to handle all duplicates of that needle at once. Similarly, if there are duplicates of a matching haystack value, we find the lower and upper duplicates of the match.

If the condition is ==, that is pretty much all we have to do. For each needle, we then record 3 things: the location of the needle, the location of the lower match in the haystack, and the match size (i.e. loc_upper_match - loc_lower_match + 1). This later gets expanded in expand_compact_indices() into the actual output.

After recording the matches for a single needle, we perform the same procedure on the LHS and RHS of that needle (remember we started on the midpoint needle), i.e. from [1, loc_needle-1] and [loc_needle+1, size_needles], again taking the midpoint of those two ranges, finding their respective needle in the haystack, recording matches, and continuing on to the next needle. This iteration proceeds until we run out of needles.
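
As a rough illustration of the single-column == case described above, here is a minimal base R sketch. The helper names (lower_bound(), upper_bound(), locate_equal(), expand_compact()) are hypothetical and are not the vctrs internals. The sketch assumes both inputs are already sorted and contain no missing values, and for simplicity it searches the whole haystack for every needle instead of narrowing the haystack bounds.

```r
# Hypothetical helpers for illustration only; not the vctrs implementation.

# First location in [lo, hi] of sorted `x` whose value is >= `value`.
lower_bound <- function(x, value, lo = 1L, hi = length(x)) {
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (x[[mid]] < value) lo <- mid + 1L else hi <- mid - 1L
  }
  lo
}

# Last location in [lo, hi] of sorted `x` whose value is <= `value`.
upper_bound <- function(x, value, lo = 1L, hi = length(x)) {
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (x[[mid]] > value) hi <- mid - 1L else lo <- mid + 1L
  }
  hi
}

# Record (needle location, lower match location, match size) for each needle.
locate_equal <- function(needles, haystack) {
  recurse <- function(lo, hi) {
    if (lo > hi) return(NULL)
    mid <- (lo + hi) %/% 2L
    value <- needles[[mid]]
    # All duplicates of the midpoint needle are handled at once.
    dup_lo <- lower_bound(needles, value, lo, hi)
    dup_hi <- upper_bound(needles, value, lo, hi)
    # Lower and upper exact match in the haystack (size 0 if no match).
    match_lo <- lower_bound(haystack, value)
    match_hi <- upper_bound(haystack, value)
    here <- data.frame(
      needle = dup_lo:dup_hi,
      lower = match_lo,
      size = match_hi - match_lo + 1L
    )
    # Then do the same on the LHS and RHS of the duplicate range.
    rbind(recurse(lo, dup_lo - 1L), here, recurse(dup_hi + 1L, hi))
  }
  recurse(1L, length(needles))
}

# Expand the compact triples into one row per match (NA when unmatched).
expand_compact <- function(compact) {
  locs <- Map(
    function(lower, size) {
      if (size == 0L) NA_integer_ else lower + seq_len(size) - 1L
    },
    compact$lower,
    compact$size
  )
  data.frame(
    needles = rep(compact$needle, lengths(locs)),
    haystack = unlist(locs)
  )
}

needles <- c(1L, 1L, 2L, 2L, 2L, 3L)
haystack <- c(1L, 1L, 2L, 2L, 3L)

compact <- locate_equal(needles, haystack)
matches <- expand_compact(compact)
matches
```

With these inputs the first call starts at needle location 3 (value 2), finds the duplicate range [3, 5] in needles and the match range [3, 4] in haystack, and only then moves on to [1, 2] and [6, 6].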

When we have a data frame with multiple columns, we add a layer of recursion to this. For the first column, we find the locations of the lower/upper duplicate of the current needle, and we find the locations of the lower/upper matches in the haystack. If we are on the final column in the data frame, we record the matches, otherwise we pass this information on to another call to df_locate_matches_recurse(), bumping the column index and using these refined lower/upper bounds as the starting bounds for the next column.

I think an example would be useful here, so below I step through this process for a few iterations:

# these are sorted already for simplicity
needles <- data_frame(x = c(1, 1, 2, 2, 2, 3), y = c(1, 2, 3, 4, 5, 3))
haystack <- data_frame(x = c(1, 1, 2, 2, 3), y = c(2, 3, 4, 4, 1))

needles
#>   x y
#> 1 1 1
#> 2 1 2
#> 3 2 3
#> 4 2 4
#> 5 2 5
#> 6 3 3

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1

## Column 1, iteration 1

# start at midpoint in needles
# this corresponds to x==2
loc_mid_needles <- 3L

# finding all x==2 values in needles gives us:
loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 5L

# finding matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# compute LHS/RHS bounds for next needle
lhs_loc_lower_bound_needles <- 1L # original lower bound
lhs_loc_upper_bound_needles <- 2L # lower_duplicate-1

rhs_loc_lower_bound_needles <- 6L # upper_duplicate+1
rhs_loc_upper_bound_needles <- 6L # original upper bound

# We still have a 2nd column to check. So recurse and pass on the current
# duplicate and match bounds to start the 2nd column with.

## Column 2, iteration 1

# midpoint of [3, 5]
# value y==4
loc_mid_needles <- 4L

loc_lower_duplicate_needles <- 4L
loc_upper_duplicate_needles <- 4L

loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# last column, so record matches
# - this was location 4 in needles
# - lower match in haystack is at loc 3
# - match size is 2

# Now handle LHS and RHS of needle midpoint
lhs_loc_lower_bound_needles <- 3L # original lower bound
lhs_loc_upper_bound_needles <- 3L # lower_duplicate-1

rhs_loc_lower_bound_needles <- 5L # upper_duplicate+1
rhs_loc_upper_bound_needles <- 5L # original upper bound

## Column 2, iteration 2 (using LHS bounds)

# midpoint of [3,3]
# value of y==3
loc_mid_needles <- 3L

loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 3L

# no match! no y==3 in haystack for x==2
# lower-match will always end up > upper-match in this case
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 2L

# no LHS or RHS needle values to do, so we are done here

## Column 2, iteration 3 (using RHS bounds)

# same as above, range of [5,5], value of y==5, which has no match in haystack

## Column 1, iteration 2 (LHS of first x needle)

# Now we are done with the x needles from [3,5], so move on to the LHS and RHS
# of that. Here we would do the LHS:

# midpoint of [1,2]
loc_mid_needles <- 1L

# ...

## Column 1, iteration 3 (RHS of first x needle)

# midpoint of [6,6]
loc_mid_needles <- 6L

# ...

In the real code, rather than comparing the double values of the columns directly, we replace each column with pseudo "joint ranks" computed between the i-th column of needles and the i-th column of haystack. It is approximately like doing vec_rank(vec_c(needles$x, haystack$x), ties = "dense"), then splitting the resulting ranks back up into their corresponding needle/haystack columns. This keeps the recursion code simpler, because we only have to worry about comparing integers.
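
To make the idea of joint ranks concrete, here is a small base R sketch. It is only an approximation of what the real code does, the variable names are made up for the example, and match() against the sorted unique values is used as a stand-in for a dense rank.

```r
# Illustrative values; any sortable type would work the same way.
needles_x <- c(0.5, 2.25, 10)
haystack_x <- c(2.25, 7)

# Rank both columns jointly so equal values share the same integer.
combined <- c(needles_x, haystack_x)
joint_rank <- match(combined, sort(unique(combined)))

# Split the ranks back into their needle and haystack parts.
needles_rank <- joint_rank[seq_along(needles_x)]
haystack_rank <- joint_rank[length(needles_x) + seq_along(haystack_x)]

needles_rank
haystack_rank
```

Comparisons between needles_rank and haystack_rank give the same answers as comparisons between the original columns, but only integers are involved.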

Non-equi conditions and containers

At this point we can talk about non-equi conditions like < or >=. The general idea is pretty simple, and just builds on the above algorithm. For example, start with the x column from needles/haystack above:

needles$x
#> [1] 1 1 2 2 2 3

haystack$x
#> [1] 1 1 2 2 3

If we used a condition of <=, then we'd do everything the same as before:

  • Midpoint in needles is location 3, value x==2

  • Find lower/upper duplicates in needles, giving locations [3, 5]

  • Find lower/upper exact match in haystack, giving locations [3, 4]

At this point, we need to "adjust" the haystack match bounds to account for the condition. Since haystack is ordered, our "rule" for <= is to keep the lower match location the same, but extend the upper match location to the upper bound, so we end up with [3, 5]. We know we can extend the upper match location because every haystack value after the exact match must be greater than the needle. Then we just record the matches and continue on normally.

This approach is really nice, because we only have to exactly match the needle in haystack. We don't have to compare each needle against every value in haystack, which would take a massive amount of time.
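
The adjustment for <= can be checked on the x column with a few lines of base R. This is only a sketch of the rule: which() stands in for the binary search, and the variable names are invented for the example.

```r
haystack_x <- c(1, 1, 2, 2, 3)
needle <- 2

# Exact match bounds (the real code finds these with a binary search).
exact <- which(haystack_x == needle)
loc_lower <- min(exact)
loc_upper <- max(exact)

# Rule for `needle <= haystack`: keep the lower bound, extend the upper
# bound to the end of the (sorted) haystack.
loc_upper <- length(haystack_x)

# Brute-force comparison against every haystack value, for reference.
brute_force <- which(needle <= haystack_x)

identical(loc_lower:loc_upper, brute_force)
```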

However, it gets slightly more complex with data frames with multiple columns. Let's go back to our original needles and haystack data frames and apply the condition <= to each column. Here is another worked example, which shows a case where our "rule" falls apart on the second column.

needles
#>   x y
#> 1 1 1
#> 2 1 2
#> 3 2 3
#> 4 2 4
#> 5 2 5
#> 6 3 3

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1

# `condition = c("<=", "<=")`

## Column 1, iteration 1

# x == 2
loc_mid_needles <- 3L

loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 5L

# finding exact matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# because haystack is ordered we know we can expand the upper bound automatically
# to include everything past the match. i.e. needle of x==2 must be less than
# the haystack value at loc 5, which we can check by seeing that it is x==3.
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 5L

## Column 2, iteration 1

# needles range of [3, 5]
# y == 4
loc_mid_needles <- 4L

loc_lower_duplicate_needles <- 4L
loc_upper_duplicate_needles <- 4L

# finding exact matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# lets try using our rule, which tells us we should be able to extend the upper
# bound:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 5L

# but the haystack value of y at location 5 is y==1, and the needle value y==4
# is not less than or equal to it! looks like our rule failed us.

If you read through the above example, you'll see that the rule didn't work here. The problem is that while haystack is ordered (by vec_order()'s standards), each column isn't ordered independently of the others. Instead, each column is ordered within the "group" created by previous columns. Concretely, haystack here has an ordered x column, but if you look at haystack$y by itself, it isn't ordered (because of that 1 at the end). That is what causes the rule to fail.

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1
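
The ordering claim above can be verified directly with base R. This is a small check, not part of the vctrs code; it rebuilds haystack as a plain data.frame so that it runs on its own.

```r
haystack <- data.frame(x = c(1, 1, 2, 2, 3), y = c(2, 3, 4, 4, 1))

# `x` is ordered on its own, `y` is not.
x_unsorted <- is.unsorted(haystack$x)
y_unsorted <- is.unsorted(haystack$y)

# But `y` is ordered within each group of `x`.
y_unsorted_by_group <- vapply(
  split(haystack$y, haystack$x),
  is.unsorted,
  logical(1)
)

x_unsorted
y_unsorted
y_unsorted_by_group
```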

To fix this, we need to create haystack "containers" where the values within each container are all totally ordered. For haystack that would create 2 containers and look like:

haystack[1:4,]
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     2
#> 2     1     3
#> 3     2     4
#> 4     2     4

haystack[5,]
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     3     1

This is essentially what computing_nesting_container_ids() does. You can actually see these ids with the helper, compute_nesting_container_info():

haystack2 <- haystack

# we really pass along the integer ranks, but in this case that is equivalent
# to converting our double columns to integers
haystack2$x <- as.integer(haystack2$x)
haystack2$y <- as.integer(haystack2$y)

info <- compute_nesting_container_info(haystack2, condition = c("<=", "<="))

# the ids are in the second slot.
# container ids break haystack into [1, 4] and [5, 5].
info[[2]]
#> [1] 0 0 0 0 1

So the idea is that for each needle, we look in each haystack container and find all the matches, then we aggregate all of the matches once at the end. df_locate_matches_with_containers() has the job of iterating over the containers.

Computing totally ordered containers can be expensive, but luckily it doesn't happen very often in normal usage.

  • If there are all == conditions, we don't need containers (i.e. any equi join)

  • If there is only 1 non-equi condition and no conditions after it, we don't need containers (i.e. most rolling joins)

  • Otherwise the typical case where we need containers is if we have something like date >= lower, date <= upper. Even so, the computation cost generally scales with the number of columns in haystack you compute containers with (here 2), and it only really slows down around 4 columns or so, which I haven't ever seen a real life example of.


Internal FAQ - vec_ptype2(), NULL, and unspecified vectors

Description

Promotion monoid

Promotions (i.e. automatic coercions) should always transform inputs to their richer type to avoid losing values or precision. vec_ptype2() returns the richer type of two vectors, or throws an incompatible type error if neither of the two vector types includes the other. For example, the richer type of integer and double is the latter because double covers a larger range of values than integer.

vec_ptype2() is a monoid over vectors, which in practical terms means that it is a well behaved operation for reduction. Reduction is an important operation for promotions because that is how the richer type of multiple elements is computed. As a monoid, vec_ptype2() needs an identity element, i.e. a value that doesn’t change the result of the reduction. vctrs has two identity values, NULL and unspecified vectors.

The NULL identity

As an identity element that shouldn’t influence the determination of the common type of a set of vectors, NULL is promoted to any type:

vec_ptype2(NULL, "")
#> character(0)

vec_ptype2(1L, NULL)
#> integer(0)

The common type of NULL and NULL is the identity NULL:

vec_ptype2(NULL, NULL)
#> NULL

This way the result of vec_ptype2(NULL, NULL) does not influence subsequent promotions:

vec_ptype2(
  vec_ptype2(NULL, NULL),
  ""
)
#> character(0)
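
The reduction itself can be written with base::Reduce(). The following is a minimal sketch, assuming vctrs is installed; the input list is made up for the example and shows that NULL elements anywhere in the sequence leave the result unchanged.

```r
library(vctrs)

# NULL acts as the identity: it can appear anywhere without changing
# the richer type found by the reduction.
with_null <- Reduce(vec_ptype2, list(NULL, 1L, NULL, 2.5))
without_null <- Reduce(vec_ptype2, list(1L, 2.5))

with_null
identical(with_null, without_null)
```

In practice vec_ptype_common() performs this reduction for you and also finalises the result.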

Unspecified vectors

In the vctrs coercion system, logical vectors of missing values are also automatically promoted to the type of any other vector, just like NULL. We call these vectors unspecified. The special coercion semantics of unspecified vectors serve two purposes:

  1. It makes it possible to assign vectors of NA inside any type of vectors, even when they are not coercible with logical:

    x <- letters[1:5]
    vec_assign(x, 1:2, c(NA, NA))
    #> [1] NA  NA  "c" "d" "e"

  2. We can’t put NULL in a data frame, so we need an identity element that behaves more like a vector. Logical vectors of NA seem a natural fit for this.

Unspecified vectors are thus promoted to any other type, just like NULL:

vec_ptype2(NA, "")
#> character(0)

vec_ptype2(1L, c(NA, NA))
#> integer(0)

Finalising common types

vctrs has an internal vector type of class vctrs_unspecified. Users normally don’t see such vectors in the wild, but they do come up when taking the common type of an unspecified vector with another identity value:

vec_ptype2(NA, NA)
#> <unspecified> [0]

vec_ptype2(NA, NULL)
#> <unspecified> [0]

vec_ptype2(NULL, NA)
#> <unspecified> [0]

We can’t return NA here because vec_ptype2() normally returns empty vectors. We also can’t return NULL because unspecified vectors need to be recognised as logical vectors if they haven’t been promoted at the end of the reduction.

vec_ptype_finalise(vec_ptype2(NULL, NA))
#> logical(0)

See the output of vec_ptype_common() which performs the reduction and finalises the type, ready to be used by the caller:

vec_ptype_common(NULL, NULL)
#> NULL

vec_ptype_common(NA, NULL)
#> logical(0)

Note that partial types in vctrs make use of the same mechanism. They are finalised with vec_ptype_finalise().


Drop empty elements from a list

Description

list_drop_empty() removes empty elements from a list. This includes NULL elements along with empty vectors, like integer(0). This is equivalent to, but faster than, vec_slice(x, list_sizes(x) != 0L).

Usage

list_drop_empty(x)

Arguments

x

A list.

Dependencies

Examples

x <- list(1, NULL, integer(), 2)
list_drop_empty(x)
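
The equivalence mentioned in the description can be checked directly. This is a small sketch assuming vctrs is installed; the input list is invented for the example.

```r
library(vctrs)

x <- list(1:2, NULL, character(), "z")

dropped <- list_drop_empty(x)
sliced <- vec_slice(x, list_sizes(x) != 0L)

dropped
identical(dropped, sliced)
```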

list_of S3 class for homogeneous lists

Description

A list_of object is a list where each element has the same type. Modifying the list with $, [, and [[ preserves the constraint by coercing all input items.

Usage

list_of(..., .ptype = NULL)

as_list_of(x, ...)

is_list_of(x)

## S3 method for class 'vctrs_list_of'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'vctrs_list_of'
vec_cast(x, to, ...)

Arguments

...

Vectors to coerce.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

Alternatively, you can supply .ptype to give the output a known type. If getOption("vctrs.no_guessing") is TRUE you must supply this value: this is a convenient way to make production code demand fixed types.

x

For as_list_of(), a vector to be coerced to list_of.

y, to

Arguments to vec_ptype2() and vec_cast().

x_arg, y_arg

Argument names for x and y. These are used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

Details

Unlike regular lists, setting a list element to NULL using [[ does not remove it.
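
A short comparison with a bare list illustrates the difference. This sketch assumes vctrs is installed; the values are made up for the example.

```r
library(vctrs)

# A bare list shrinks when an element is set to NULL with `[[`.
bare <- list(1:2, 3L)
bare[[1]] <- NULL
length(bare)

# A list_of keeps the slot and stores NULL in it.
typed <- list_of(1:2, 3L)
typed[[1]] <- NULL
vec_size(typed)
```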

Examples

x <- list_of(1:3, 5:6, 10:15)
if (requireNamespace("tibble", quietly = TRUE)) {
  tibble::tibble(x = x)
}

vec_c(list_of(1, 2), list_of(FALSE, TRUE))

Missing values

Description

  • vec_detect_missing() returns a logical vector the same size as x. For each element of x, it returns TRUE if the element is missing, and FALSE otherwise.

  • vec_any_missing() returns a single TRUE or FALSE depending on whether or not x has any missing values.

Differences with is.na()

Data frame rows are only considered missing if every element in the row is missing. Similarly, record vector elements are only considered missing if every field in the record is missing. Put another way, rows with any missing values are considered incomplete, but only rows with all missing values are considered missing.

List elements are only considered missing if they are NULL.
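
The following sketch illustrates both rules, assuming vctrs is installed. The data frame and list are invented for the example.

```r
library(vctrs)

df <- data_frame(x = c(NA, NA, 1), y = c(NA, "a", "b"))

# Only the first row is entirely missing; the second is merely incomplete.
row_missing <- vec_detect_missing(df)
row_complete <- vec_detect_complete(df)

# For lists, only NULL elements count as missing. An NA element does not.
lst <- list(NULL, NA, 1)
lst_missing <- vec_detect_missing(lst)

row_missing
row_complete
lst_missing
```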

Usage

vec_detect_missing(x)

vec_any_missing(x)

Arguments

x

A vector

Value

  • vec_detect_missing() returns a logical vector the same size as x.

  • vec_any_missing() returns a single TRUE or FALSE.

Dependencies

See Also

vec_detect_complete()

Examples

x <- c(1, 2, NA, 4, NA)

vec_detect_missing(x)
vec_any_missing(x)

# Data frames are iterated over rowwise, and only report a row as missing
# if every element of that row is missing. If a row is only partially
# missing, it is said to be incomplete, but not missing.
y <- c("a", "b", NA, "d", "e")
df <- data_frame(x = x, y = y)

df$missing <- vec_detect_missing(df)
df$incomplete <- !vec_detect_complete(df)
df

Name specifications

Description

A name specification describes how to combine inner and outer names. This sort of name combination arises when concatenating vectors or flattening lists. There are two possible cases:

  • Named vector:

    vec_c(outer = c(inner1 = 1, inner2 = 2))
  • Unnamed vector:

    vec_c(outer = 1:2)

In r-lib and tidyverse packages, these cases are errors by default, because there's no behaviour that works well for every case. Instead, you can provide a name specification that describes how to combine the inner and outer names of inputs. Name specifications can refer to:

  • outer: The external name recycled to the size of the input vector.

  • inner: Either the names of the input vector, or a sequence of integers from 1 to the size of the vector if it is unnamed.

Arguments

name_spec, .name_spec

A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like outer = c(inner = 1), or when they have length greater than 1: outer = 1:2. By default, these cases trigger an error. You can resolve the error by providing a specification that describes how to combine the names or the indices of the inner vector with the name of the input. This specification can be:

  • A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.

  • An anonymous function as a purrr-style formula.

  • A glue specification of the form "{outer}_{inner}".

  • An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

Examples

# By default, named inputs must be length 1:
vec_c(name = 1) # ok
try(vec_c(name = 1:3)) # bad

# They also can't have internal names, even if scalar:
try(vec_c(name = c(internal = 1))) # bad

# Pass a name specification to work around this. A specification
# can be a glue string referring to `outer` and `inner`:
vec_c(name = 1:3, other = 4:5, .name_spec = "{outer}")
vec_c(name = 1:3, other = 4:5, .name_spec = "{outer}_{inner}")

# They can also be functions:
my_spec <- function(outer, inner) paste(outer, inner, sep = "_")
vec_c(name = 1:3, other = 4:5, .name_spec = my_spec)

# Or purrr-style formulas for anonymous functions:
vec_c(name = 1:3, other = 4:5, .name_spec = ~ paste0(.x, .y))

Assemble attributes for data frame construction

Description

new_data_frame() constructs a new data frame from an existing list. It is meant to be performant, and does not check the inputs for correctness in any way. It is only safe to use after a call to df_list(), which collects and validates the columns used to construct the data frame.

Usage

new_data_frame(x = list(), n = NULL, ..., class = NULL)

Arguments

x

A named list of equal-length vectors. The lengths are not checked; it is the responsibility of the caller to make sure they are equal.

n

Number of rows. If NULL, will be computed from the length of the first element of x.

..., class

Additional arguments for creating subclasses.

The following attributes have special behavior:

  • "names" is preferred if provided, overriding existing names in x.

  • "row.names" is preferred if provided, overriding both n and the size implied by x.

See Also

df_list() for a way to safely construct a data frame's underlying data structure from individual columns. This can be used to create a named list for further use by new_data_frame().

Examples

new_data_frame(list(x = 1:10, y = 10:1))
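
Since new_data_frame() performs no checks, a common pattern is to validate the columns with df_list() first. The sketch below assumes vctrs is installed; the "my_df" subclass name is hypothetical and only used for illustration.

```r
library(vctrs)

# df_list() validates the columns and recycles them to a common size.
cols <- df_list(x = 1:3, y = "a")

# new_data_frame() then assembles the data frame without further checks.
# "my_df" is a made-up subclass name.
df <- new_data_frame(cols, class = "my_df")

class(df)
vec_size(df)
```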

List checks

Description

  • obj_is_list() tests if x is considered a list in the vctrs sense. It returns TRUE if:

    • x is a bare list with no class.

    • x is a list explicitly inheriting from "list".

  • list_all_vectors() takes a list and returns TRUE if all elements of that list are vectors.

  • list_all_size() takes a list and returns TRUE if all elements of that list have the same size.

  • obj_check_list(), list_check_all_vectors(), and list_check_all_size() use the above functions, but throw a standardized and informative error if they return FALSE.

Usage

obj_is_list(x)

obj_check_list(x, ..., arg = caller_arg(x), call = caller_env())

list_all_vectors(x)

list_check_all_vectors(x, ..., arg = caller_arg(x), call = caller_env())

list_all_size(x, size)

list_check_all_size(x, size, ..., arg = caller_arg(x), call = caller_env())

Arguments

x

For vec_*() functions, an object. For list_*() functions, a list.

...

These dots are for future extensions and must be empty.

arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

size

The size to check each element for.

Details

Notably, data frames and S3 record style classes like POSIXlt are not considered lists.

See Also

list_sizes()

Examples

obj_is_list(list())
obj_is_list(list_of(1))
obj_is_list(data.frame())

list_all_vectors(list(1, mtcars))
list_all_vectors(list(1, environment()))

list_all_size(list(1:2, 2:3), 2)
list_all_size(list(1:2, 2:4), 2)

# `list_`-prefixed functions assume a list:
try(list_all_vectors(environment()))

FAQ - Is my class compatible with vctrs?

Description

vctrs provides a framework for working with vector classes in a generic way. However, it implements several compatibility fallbacks to base R methods. In this reference you will find how vctrs tries to be compatible with your vector class, and what base methods you need to implement for compatibility.

If you’re starting from scratch, we think you’ll find it easier to start using new_vctr() as documented in vignette("s3-vector"). This guide is aimed at developers with existing vector classes.

Aggregate operations with fallbacks

All vctrs operations are based on four primitive generics described in the next section. However there are many higher level operations. The most important ones implement fallbacks to base generics for maximum compatibility with existing classes.

  • vec_slice() falls back to the base [ generic if no vec_proxy() method is implemented. This way foreign classes that do not implement vec_restore() can restore attributes based on the new subsetted contents.

  • vec_c() and vec_rbind() now fall back to base::c() if the inputs have a common parent class with a c() method (only if they have no self-to-self vec_ptype2() method).

    vctrs works hard to make your c() method succeed in various situations (with NULL and NA inputs, even as first input which would normally prevent dispatch to your method). The main downside compared to using vctrs primitives is that you can’t combine vectors of different classes since there is no extensible mechanism of coercion in c(), and it is less efficient in some cases.

The vctrs primitives

Most functions in vctrs are aggregate operations: they call other vctrs functions which themselves call other vctrs functions. The dependencies of a vctrs function are listed in the Dependencies section of its documentation page. Take a look at vec_count() for an example.

These dependencies form a tree whose leaves are the four vctrs primitives. Here is the diagram for vec_count():

[Figure: vec-count-deps.png]

The coercion generics

The coercion mechanism in vctrs is based on two generics, vec_ptype2() and vec_cast().

See the theory overview.

Two objects with the same class and the same attributes are always considered compatible by ptype2 and cast. If the attributes or classes differ, they throw an incompatible type error.

Coercion errors are the main source of incompatibility with vctrs. See the howto guide if you need to implement methods for these generics.

The proxy and restoration generics

These generics are essential for vctrs but mostly optional. vec_proxy() defaults to an identity function and you normally don’t need to implement it. The proxy of a vector must be one of the atomic vector types, a list, or a data frame. By default, S3 lists that do not inherit from "list" do not have an identity proxy. In that case, you need to explicitly implement vec_proxy() or make your class inherit from list.


Runs

Description

  • vec_identify_runs() returns a vector of identifiers for the elements of x that indicate which run of repeated values they fall in. The number of runs is also returned as an attribute, n.

  • vec_run_sizes() returns an integer vector corresponding to the size of each run. This is identical to the times column from vec_unrep(), but is faster if you don't need the run keys.

  • vec_unrep() is a generalized base::rle(). It is documented alongside the "repeat" functions of vec_rep() and vec_rep_each(); look there for more information.

Usage

vec_identify_runs(x)

vec_run_sizes(x)

Arguments

x

A vector.

Details

Unlike base::rle(), adjacent missing values are considered identical when constructing runs. For example, vec_identify_runs(c(NA, NA)) will return c(1, 1), not c(1, 2).
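
The difference with base::rle() shows up in the run sizes. This is a small sketch assuming vctrs is installed; the input vector is made up for the example.

```r
library(vctrs)

x <- c(NA, NA, 1, 1)

# vctrs treats the two adjacent NAs as a single run.
vctrs_sizes <- vec_run_sizes(x)

# base::rle() starts a new run at every NA.
rle_sizes <- rle(x)$lengths

vctrs_sizes
rle_sizes
```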

Value

  • For vec_identify_runs(), an integer vector with the same size as x. A scalar integer attribute, n, is attached.

  • For vec_run_sizes(), an integer vector with size equal to the number of runs in x.

See Also

vec_unrep() for a generalized base::rle().

Examples

x <- c("a", "z", "z", "c", "a", "a")

vec_identify_runs(x)
vec_run_sizes(x)
vec_unrep(x)

y <- c(1, 1, 1, 2, 2, 3)

# With multiple columns, the runs are constructed rowwise
df <- data_frame(
  x = x,
  y = y
)

vec_identify_runs(df)
vec_run_sizes(df)
vec_unrep(df)

FAQ - How does coercion work in vctrs?

Description

This is an overview of the usage of vec_ptype2() and vec_cast() and their role in the vctrs coercion mechanism. Related topics:

  • For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.

  • For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.

  • For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

Combination mechanism in vctrs

The coercion system in vctrs is designed to make combination of multiple inputs consistent and extensible. Combinations occur in many places, such as row-binding, joins, subset-assignment, or grouped summary functions that use the split-apply-combine strategy. For example:

vec_c(TRUE, 1)
#> [1] 1 1

vec_c("a", 1)
#> Error in `vec_c()`:
#> ! Can't combine `..1` <character> and `..2` <double>.

vec_rbind(
  data.frame(x = TRUE),
  data.frame(x = 1, y = 2)
)
#>   x  y
#> 1 1 NA
#> 2 1  2

vec_rbind(
  data.frame(x = "a"),
  data.frame(x = 1, y = 2)
)
#> Error in `vec_rbind()`:
#> ! Can't combine `..1$x` <character> and `..2$x` <double>.

One major goal of vctrs is to provide a central place for implementing the coercion methods that make generic combinations possible. The two relevant generics are vec_ptype2() and vec_cast(). They both take two arguments and perform double dispatch, meaning that a method is selected based on the classes of both inputs.

The general mechanism for combining multiple inputs is:

  1. Find the common type of a set of inputs by reducing (as in base::Reduce() or purrr::reduce()) the vec_ptype2() binary function over the set.

  2. Convert all inputs to the common type with vec_cast().

  3. Initialise the output vector as an instance of this common type with vec_init().

  4. Fill the output vector with the elements of the inputs using vec_assign().

The last two steps may require vec_proxy() and vec_restore() implementations, unless the attributes of your class are constant and do not depend on the contents of the vector. We focus here on the first two steps, which require vec_ptype2() and vec_cast() implementations.
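
The four steps can be spelled out by hand for a few base vectors. This is a simplified sketch, assuming vctrs is installed, and not how vec_c() is actually implemented; the inputs and loop variables are made up for the example.

```r
library(vctrs)

inputs <- list(TRUE, 2L, c(3.5, 4))

# 1. Find the common type by reducing vec_ptype2() over the inputs.
ptype <- Reduce(vec_ptype2, inputs)

# 2. Convert all inputs to the common type.
casted <- lapply(inputs, vec_cast, to = ptype)

# 3. Initialise an output vector of the common type and total size.
sizes <- vapply(casted, vec_size, integer(1))
out <- vec_init(ptype, n = sum(sizes))

# 4. Fill the output with the elements of each input in turn.
start <- 1L
for (k in seq_along(casted)) {
  loc <- start + seq_len(sizes[[k]]) - 1L
  out <- vec_assign(out, loc, casted[[k]])
  start <- start + sizes[[k]]
}

out
identical(out, vec_c(TRUE, 2L, c(3.5, 4)))
```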

vec_ptype2()

Methods for vec_ptype2() are passed two prototypes, i.e. two inputs emptied of their elements. They implement two behaviours:

  • If the types of their inputs are compatible, indicate which of them is the richer type by returning it. If the types are of equal resolution, return either of the two.

  • Throw an error with stop_incompatible_type() when it can be determined from the attributes that the types of the inputs are not compatible.

Type compatibility

A type iscompatible with another type if the values it representsare a subset or a superset of the values of the other type. The notionof “value” is to be interpreted at a high level, in particular it is notthe same as the memory representation. For example, factors arerepresented in memory with integers but their values are more related tocharacter vectors than to round numbers:

# Two factors are compatible
vec_ptype2(factor("a"), factor("b"))
#> factor()
#> Levels: a b

# Factors are compatible with a character
vec_ptype2(factor("a"), "b")
#> character(0)

# But they are incompatible with integers
vec_ptype2(factor("a"), 1L)
#> Error:
#> ! Can't combine `factor("a")` <factor<4d52a>> and `1L` <integer>.

Richness of type

Richness of type is not a very precise notion. It can be about richer data (for instance a double vector covers more values than an integer vector), richer behaviour (a data.table has richer behaviour than a data.frame), or both. If you have trouble determining which one of the two types is richer, it probably means they shouldn’t be automatically coercible.

Let’s look again at what happens when we combine a factor and a character:

vec_ptype2(factor("a"), "b")
#> character(0)

The ptype2 method for <character> and <factor<"a">> returns <character> because the former is a richer type. The factor can only contain "a" strings, whereas the character can contain any strings. In this sense, factors are a subset of character.

Note that another valid behaviour would be to throw an incompatible type error. This is what a strict factor implementation would do. We have decided to be laxer in vctrs because it is easy to inadvertently create factors instead of character vectors, especially with older versions of R where stringsAsFactors is still true by default.

Consistency and symmetry on permutation

Each ptype2 method should strive to have exactly the same behaviour when the inputs are permuted. This is not always possible, for example factor levels are aggregated in order:

vec_ptype2(factor(c("a", "c")), factor("b"))
#> factor()
#> Levels: a c b

vec_ptype2(factor("b"), factor(c("a", "c")))
#> factor()
#> Levels: b a c

In any case, permuting the input should not return a fundamentally different type or introduce an incompatible type error.

Coercion hierarchy

The classes that you can coerce together form a coercion (or subtyping) hierarchy. Below is a schema of the hierarchy for the base types like integer and factor. In this diagram the directions of the arrows express which type is richer. They flow from the bottom (more constrained types) to the top (richer types).

[Figure: coercion hierarchy of the base types (coerce.png)]

A coercion hierarchy is distinct from the structural hierarchy implied by memory types and classes. For instance, in a structural hierarchy, factors are built on top of integers. But in the coercion hierarchy they are more related to character vectors. Similarly, subclasses are not necessarily coercible with their superclasses because the coercion and structural hierarchies are separate.

Implementing a coercion hierarchy

As a class implementor, you have two options. The simplest is to create an entirely separate hierarchy. The date and date-time classes are an example of an S3-based hierarchy that is completely separate. Alternatively, you can integrate your class in an existing hierarchy, typically by adding parent nodes on top of the hierarchy (your class is richer), by adding child nodes at the root of the hierarchy (your class is more constrained), or by inserting a node in the tree.

These coercion hierarchies are implicit, in the sense that they are implied by the vec_ptype2() implementations. There is no structured way to create or modify a hierarchy; instead you need to implement the appropriate coercion methods for all the types in your hierarchy, and diligently return the richer type in each case. The vec_ptype2() implementations are neither transitive nor inherited, so all pairwise methods between classes lying on a given path must be implemented manually. This is something we might make easier in the future.

vec_cast()

The second generic, vec_cast(), is the one that looks at the data and actually performs the conversion. Because it has access to more information than vec_ptype2(), it may be stricter and cause an error in more cases. vec_cast() has three possible behaviours:

  • Determine that the prototypes of the two inputs are not compatible. This must be decided in exactly the same way as for vec_ptype2(). Call stop_incompatible_cast() if you can determine from the attributes that the types are not compatible.

  • Detect incompatible values. Usually this is because the target type is too restricted for the values supported by the input type. For example, a fractional number can’t be converted to an integer. The method should throw an error in that case.

  • Return the input vector converted to the target type if all values are compatible. Whereas vec_ptype2() must return the same type when the inputs are permuted, vec_cast() is directional. It always returns the type of the right-hand side, or dies trying.
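The three behaviours can be observed with base types. The snippet below is a small sketch; the variable names are illustrative only, and errors are caught with a generic handler so the script keeps running.

```r
library(vctrs)

# Compatible values: the result always has the type of `to`.
as_int <- vec_cast(c(1, 2), to = integer())
as_dbl <- vec_cast(1L, to = double())

# Incompatible values: a fractional number can't be converted to an integer.
lossy_msg <- tryCatch(
  vec_cast(1.5, to = integer()),
  error = function(cnd) conditionMessage(cnd)
)

# Incompatible prototypes: integers and factors can't be combined at all.
incompatible_msg <- tryCatch(
  vec_cast(1L, to = factor("a")),
  error = function(cnd) conditionMessage(cnd)
)

as_int
as_dbl
lossy_msg
incompatible_msg
```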

Double dispatch

The dispatch mechanism for vec_ptype2() and vec_cast() looks like S3 but is actually a custom mechanism. Compared to S3, it has the following differences:

  • It dispatches on the classes of the first two inputs.

  • There is no inheritance of ptype2 and cast methods. This is because the S3 class hierarchy is not necessarily the same as the coercion hierarchy.

  • NextMethod() does not work. Parent methods must be called explicitly if necessary.

  • The default method is hard-coded.
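As a rough sketch of what double dispatch looks like in practice, the following defines coercion methods between a made-up class, sketch_meter, and doubles. The class and its constructor are hypothetical. The methods are defined at the top level of a script here; in a package they would be registered as S3 methods instead.

```r
library(vctrs)

# Hypothetical class used only for illustration.
new_meter <- function(x = double()) new_vctr(x, class = "sketch_meter")

# ptype2 methods are named vec_ptype2.<class of x>.<class of y>.
# Both orderings must be supplied because there is no inheritance.
vec_ptype2.sketch_meter.sketch_meter <- function(x, y, ...) new_meter()
vec_ptype2.sketch_meter.double <- function(x, y, ...) new_meter()
vec_ptype2.double.sketch_meter <- function(x, y, ...) new_meter()

# cast methods are named vec_cast.<class of to>.<class of x>.
vec_cast.sketch_meter.sketch_meter <- function(x, to, ...) x
vec_cast.sketch_meter.double <- function(x, to, ...) new_meter(x)
vec_cast.double.sketch_meter <- function(x, to, ...) vec_data(x)

combined <- vec_c(new_meter(1), 2)
back_to_double <- vec_cast(new_meter(3), double())

combined
back_to_double
```

Here the sketch treats sketch_meter as the richer type, so combining it with a double yields a sketch_meter vector.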

Data frames

The determination of the common type of data frames with vec_ptype2() happens in three steps:

  1. Match the columns of the two input data frames. If some columns don’t exist, they are created and filled with adequately typed NA values.

  2. Find the common type for each column by calling vec_ptype2() on each pair of matched columns.

  3. Find the common data frame type. For example the common type of a grouped tibble and a tibble is a grouped tibble because the former is the richer type. The common type of a data table and a data frame is a data table.

vec_cast() operates similarly. If a data frame is cast to a target type that has fewer columns, this is an error.

If you are implementing coercion methods for data frames, you will need to explicitly call the parent methods that perform the common type determination or the type conversion described above. These are exported as df_ptype2() and df_cast().
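A short sketch with base data frames shows the column matching and the per-column common type. The variable names are illustrative only.

```r
library(vctrs)

narrow <- data.frame(x = 1L)
wide <- data.frame(x = 1.5, y = "a")

# Columns are matched by name; `x` becomes double, and `y` is added.
ptype <- vec_ptype2(narrow, wide)
str(ptype)

# Casting to the common type fills the missing column with NA.
filled <- vec_cast(narrow, to = ptype)
filled

# Casting to a type with fewer columns is an error.
dropped <- tryCatch(
  vec_cast(wide, to = narrow),
  error = function(cnd) conditionMessage(cnd)
)
dropped
```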

Data frame fallbacks

Being too strict with data frame combinations would cause too much pain because there are many data frame subclasses in the wild that don’t implement vctrs methods. We have decided to implement a special fallback behaviour for foreign data frames. Incompatible data frames fall back to a base data frame:

df1 <- data.frame(x = 1)
df2 <- structure(df1, class = c("foreign_df", "data.frame"))

vec_rbind(df1, df2)
#>   x
#> 1 1
#> 2 1

When a tibble is involved, we fall back to tibble:

df3 <- tibble::as_tibble(df1)

vec_rbind(df1, df3)
#> # A tibble: 2 x 1
#>       x
#>   <dbl>
#> 1     1
#> 2     1

These fallbacks are not ideal but they make sense because all data frames share a common data structure. This is not generally the case for vectors. For example factors and characters have different representations, and it is not possible to find a fallback type mechanically.

However this fallback has a big downside: implementing vctrs methods for your data frame subclass is a breaking behaviour change. The proper coercion behaviour for your data frame class should be specified as soon as possible to limit the consequences of changing the behaviour of your class in R scripts.


FAQ - How does recycling work in vctrs and the tidyverse?

Description

Recycling describes the concept of repeating elements of one vector to match the size of another. There are two rules that underlie the “tidyverse” recycling rules:

  • Vectors of size 1 will be recycled to the size of any other vector

  • Otherwise, all vectors must have the same size

Examples

Vectors of size 1 are recycled to the size of any other vector:

tibble(x = 1:3, y = 1L)
#> # A tibble: 3 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     1
#> 3     3     1

This includes vectors of size 0:

tibble(x = integer(), y = 1L)
#> # A tibble: 0 x 2
#> # i 2 variables: x <int>, y <int>

If vectors aren’t size 1, they must all be the same size. Otherwise, an error is thrown:

tibble(x = 1:3, y = 4:7)
#> Error in `tibble()`:
#> ! Tibble columns must have compatible sizes.
#> * Size 3: Existing data.
#> * Size 4: Column `y`.
#> i Only values of size one are recycled.

vctrs backend

Packages in r-lib and the tidyverse generally use vec_size_common() and vec_recycle_common() as the backends for handling recycling rules.

  • vec_size_common() returns the common size of multiple vectors, after applying the recycling rules

  • vec_recycle_common() goes one step further, and actually recycles the vectors to their common size

vec_size_common(1:3, "x")
#> [1] 3

vec_recycle_common(1:3, "x")
#> [[1]]
#> [1] 1 2 3
#> 
#> [[2]]
#> [1] "x" "x" "x"

vec_size_common(1:3, c("x", "y"))
#> Error:
#> ! Can't recycle `..1` (size 3) to match `..2` (size 2).

Base R recycling rules

The recycling rules described here are stricter than the ones generally used by base R, which are:

  • If any vector is length 0, the output will be length 0

  • Otherwise, the output will be length max(length_x, length_y), and a warning will be thrown if the length of the longer vector is not an integer multiple of the length of the shorter vector.

We explore the base R rules in detail in vignette("type-size").
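The difference between the two sets of rules can be seen side by side. This is a small sketch; the variable names are illustrative, and the vctrs error is caught with a generic handler.

```r
library(vctrs)

# Base R recycles a length-2 vector against a length-4 vector without warning.
base_sum <- 1:4 + 1:2
base_sum

# vctrs only recycles size-1 vectors, so the same sizes are an error.
vctrs_result <- tryCatch(
  vec_recycle_common(1:4, 1:2),
  error = function(cnd) "incompatible sizes"
)
vctrs_result

# Both agree that size 1 against size 0 gives size 0.
zero_size <- vec_size_common(integer(), 1L)
zero_length <- length(integer() + 1L)
```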


Retrieve and repair names

Description

vec_as_names() takes a character vector of names and repairs it according to the repair argument. It is the r-lib and tidyverse equivalent of base::make.names().

vctrs deals with a few levels of name repair:

  • minimal names exist. The names attribute is not NULL. The name of an unnamed element is "" and never NA. For instance, vec_as_names() always returns minimal names and data frames created by the tibble package have names that are, at least, minimal.

  • unique names are minimal, have no duplicates, and can be used where a variable name is expected. Empty names, ..., and .. followed by a sequence of digits are banned.

    • All columns can be accessed by name via df[["name"]] and df$`name` and with(df, `name`).

  • universal names are unique and syntactic (see Details for more).

    • Names work everywhere, without quoting: df$name and with(df, name) and lm(name1 ~ name2, data = df) and dplyr::select(df, name) all work.

universal implies unique, unique implies minimal. These levels are nested.

Usage

vec_as_names(
  names,
  ...,
  repair = c("minimal", "unique", "universal", "check_unique", "unique_quiet", "universal_quiet"),
  repair_arg = NULL,
  quiet = FALSE,
  call = caller_env()
)

Arguments

names

A character vector.

...

These dots are for future extensions and must be empty.

repair

Either a string or a function. If a string, it must be one of "check_unique", "minimal", "unique", "universal", "unique_quiet", or "universal_quiet". If a function, it is invoked with a vector of minimal names and must return minimal names, otherwise an error is thrown.

  • Minimal names are never NULL or NA. When an element doesn't have a name, its minimal name is an empty string.

  • Unique names are unique. A suffix is appended to duplicate names to make them unique.

  • Universal names are unique and syntactic, meaning that you can safely use the names as variables without causing a syntax error.

The "check_unique" option doesn't perform any name repair. Instead, an error is raised if the names don't suit the "unique" criteria.

The options "unique_quiet" and "universal_quiet" are here to help the user who calls this function indirectly, via another function which exposes repair but not quiet. Specifying repair = "unique_quiet" is like specifying repair = "unique", quiet = TRUE. When the "*_quiet" options are used, any setting of quiet is silently overridden.

repair_arg

If specified and repair = "check_unique", any errors will include a hint to set the repair_arg.

quiet

By default, the user is informed of any renaming caused by repairing the names. This only concerns unique and universal repairing. Set quiet to TRUE to silence the messages.

Users can silence the name repair messages by setting the "rlib_name_repair_verbosity" global option to "quiet".

call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

minimal names

minimal names exist. The names attribute is not NULL. The name of an unnamed element is "" and never NA.

Examples:

Original names of a vector with length 3: NULL
                           minimal names: "" "" ""

                          Original names: "x" NA
                           minimal names: "x" ""

unique names

unique names are minimal, have no duplicates, and can be used (possibly with backticks) in contexts where a variable is expected. Empty names, ..., and .. followed by a sequence of digits are banned. If a data frame has unique names, you can index it by name, and also access the columns by name. In particular, df[["name"]] and df$`name` and also with(df, `name`) always work.

There are many ways to make names unique. We append a suffix of the form ...j to any name that is "" or a duplicate, where j is the position. We also change ..# and ... to ...#.

Example:

Original names:     ""     "x"     "" "y"     "x"  "..2"  "..."
  unique names: "...1" "x...2" "...3" "y" "x...5" "...6" "...7"

Pre-existing suffixes of the form ...j are always stripped, prior to making names unique, i.e. reconstructing the suffixes. If this interacts poorly with your names, you should take control of name repair.

universal names

universal names are unique and syntactic, meaning they:

  • Are never empty (inherited from unique).

  • Have no duplicates (inherited from unique).

  • Are not `...`. Do not have the form ..i, where i is a number (inherited from unique).

  • Consist of letters, numbers, and the dot . or underscore _ characters.

  • Start with a letter or start with the dot . not followed by a number.

  • Are not a reserved word, e.g., if or function or TRUE.

If a vector has universal names, variable names can be used "as is" in code. They work well with nonstandard evaluation, e.g., df$name works.

vctrs has a different method of making names syntactic than base::make.names(). In general, vctrs prepends one or more dots . until the name is syntactic.

Examples:

 Original names:     ""     "x"    NA      "x"
universal names: "...1" "x...2" "...3" "x...4"

 Original names: "(y)"  "_z"  ".2fa"  "FALSE"
universal names: ".y." "._z" "..2fa" ".FALSE"

See Also

rlang::names2() returns the names of an object, after making them minimal.

Examples

# By default, `vec_as_names()` returns minimal names:
vec_as_names(c(NA, NA, "foo"))

# You can make them unique:
vec_as_names(c(NA, NA, "foo"), repair = "unique")

# Universal repairing fixes any non-syntactic name:
vec_as_names(c("_foo", "+"), repair = "universal")
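The repair levels can also be checked against the tables shown in the sections above. The snippet below is a sketch that reuses those inputs, uses the quiet variants to avoid renaming messages, and adds one illustrative custom repair function (toupper(), chosen only because it maps minimal names to minimal names).

```r
library(vctrs)

# Inputs taken from the "unique names" and "universal names" sections.
unique_names <- vec_as_names(
  c("", "x", "", "y", "x", "..2", "..."),
  repair = "unique_quiet"
)
universal_names <- vec_as_names(
  c("(y)", "_z", ".2fa", "FALSE"),
  repair = "universal_quiet"
)

# A function is called on the minimal names and must return minimal names.
upper_names <- vec_as_names(c("a", NA, "b"), repair = toupper)

unique_names
universal_names
upper_names
```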

Combine many data frames into one data frame

Description

This pair of functions binds together data frames (and vectors), either row-wise or column-wise. Row-binding creates a data frame with common type across all arguments. Column-binding creates a data frame with common length across all arguments.

Usage

vec_rbind(
  ...,
  .ptype = NULL,
  .names_to = rlang::zap(),
  .name_repair = c("unique", "universal", "check_unique", "unique_quiet", "universal_quiet"),
  .name_spec = NULL,
  .error_call = current_env()
)

vec_cbind(
  ...,
  .ptype = NULL,
  .size = NULL,
  .name_repair = c("unique", "universal", "check_unique", "minimal", "unique_quiet", "universal_quiet"),
  .error_call = current_env()
)

Arguments

...

Data frames or vectors.

When the inputs are named:

  • vec_rbind() assigns names to row names unless .names_to is supplied. In that case the names are assigned in the column defined by .names_to.

  • vec_cbind() creates packed data frame columns with named inputs.

NULL inputs are silently ignored. Empty (e.g. zero row) inputs will not appear in the output, but will affect the derived .ptype.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

Alternatively, you can supply .ptype to give the output a known type. If getOption("vctrs.no_guessing") is TRUE you must supply this value: this is a convenient way to make production code demand fixed types.

.names_to

This controls what to do with input names supplied in ....

  • By default, input names are zapped.

  • If a string, specifies a column where the input names will be copied. These names are often useful to identify rows with their original input. If a column name is supplied and ... is not named, an integer column is used instead.

  • If NULL, the input names are used as row names.

.name_repair

One of "unique", "universal", "check_unique", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

With vec_rbind(), the repair function is applied to all inputs separately. This is because vec_rbind() needs to align their columns before binding the rows, and thus needs all inputs to have unique names. On the other hand, vec_cbind() applies the repair function after all inputs have been concatenated together in a final data frame. Hence vec_cbind() allows the more permissive minimal names repair.

.name_spec

A name specification (as documented in vec_c()) for combining the outer inputs names in ... and the inner row names of the inputs. This only has an effect when .names_to is set to NULL, which causes the input names to be assigned as row names.

.error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

.size

If NULL, the default, the number of rows in the vec_cbind() output is determined using the tidyverse recycling rules.

Alternatively, specify the desired number of rows, and any inputs of length 1 will be recycled appropriately.

Value

A data frame, or subclass of data frame.

If ... is a mix of different data frame subclasses, vec_ptype2() will be used to determine the output type. For vec_rbind(), this will determine the type of the container and the type of each column; for vec_cbind() it only determines the type of the output container. If there are no non-NULL inputs, the result will be data.frame().

Invariants

All inputs are first converted to a data frame. The conversion for 1d vectors depends on the direction of binding:

  • For vec_rbind(), each element of the vector becomes a column in a single row.

  • For vec_cbind(), each element of the vector becomes a row in a single column.

Once the inputs have all become data frames, the followinginvariants are observed for row-binding:

  • vec_size(vec_rbind(x, y)) == vec_size(x) + vec_size(y)

  • vec_ptype(vec_rbind(x, y)) == vec_ptype_common(x, y)

Note that if an input is an empty vector, it is first converted to a 1-row data frame with 0 columns. Despite being empty, its effective size for the total number of rows is 1.

For column-binding, the following invariants apply:

  • vec_size(vec_cbind(x, y)) == vec_size_common(x, y)

  • vec_ptype(vec_cbind(x, y)) == vec_cbind(vec_ptype(x), vec_ptype(y))
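The size invariants can be checked directly on small data frames. This is a sketch with illustrative variable names; the type comparison is shown without asserting its result.

```r
library(vctrs)

# Row-binding: sizes add up, columns take their common type.
x <- data.frame(a = 1:2)
y <- data.frame(a = 3.5, b = "z")
rows <- vec_rbind(x, y)

vec_size(rows) == vec_size(x) + vec_size(y)
identical(vec_ptype(rows), vec_ptype_common(x, y))

# Column-binding: inputs are recycled to their common size.
p <- data.frame(a = 1)
q <- data.frame(b = 1:3)
cols <- vec_cbind(p, q)

vec_size(cols) == vec_size_common(p, q)
```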

Dependencies

vctrs dependencies

base dependencies of vec_rbind()

If columns to combine inherit from a common class, vec_rbind() falls back to base::c() if there exists a c() method implemented for this class hierarchy.

See Also

vec_c() for combining 1d vectors.

Examples

# row binding -----------------------------------------

# common columns are coerced to common class
vec_rbind(
  data.frame(x = 1),
  data.frame(x = FALSE)
)

# unique columns are filled with NAs
vec_rbind(
  data.frame(x = 1),
  data.frame(y = "x")
)

# null inputs are ignored
vec_rbind(
  data.frame(x = 1),
  NULL,
  data.frame(x = 2)
)

# bare vectors are treated as rows
vec_rbind(
  c(x = 1, y = 2),
  c(x = 3)
)

# default names will be supplied if arguments are not named
vec_rbind(
  1:2,
  1:3,
  1:4
)

# column binding --------------------------------------

# each input is recycled to have common length
vec_cbind(
  data.frame(x = 1),
  data.frame(y = 1:3)
)

# bare vectors are treated as columns
vec_cbind(
  data.frame(x = 1),
  y = letters[1:3]
)

# if you supply a named data frame, it is packed in a single column
data <- vec_cbind(
  x = data.frame(a = 1, b = 2),
  y = 1
)
data

# Packed data frames are nested in a single column. This makes it
# possible to access it through a single name:
data$x

# since the base print method is suboptimal with packed data
# frames, it is recommended to use tibble to work with these:
if (rlang::is_installed("tibble")) {
  vec_cbind(x = tibble::tibble(a = 1, b = 2), y = 1)
}

# duplicate names are flagged
vec_cbind(x = 1, x = 2)

Combine many vectors into one vector

Description

Combine all arguments into a new vector of common type.

Usage

vec_c(
  ...,
  .ptype = NULL,
  .name_spec = NULL,
  .name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet", "universal_quiet"),
  .error_arg = "",
  .error_call = current_env()
)

Arguments

...

Vectors to coerce.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

Alternatively, you can supply .ptype to give the output a known type. If getOption("vctrs.no_guessing") is TRUE you must supply this value: this is a convenient way to make production code demand fixed types.

.name_spec

A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like outer = c(inner = 1), or when they have length greater than 1: outer = 1:2. By default, these cases trigger an error. You can resolve the error by providing a specification that describes how to combine the names or the indices of the inner vector with the name of the input. This specification can be:

  • A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.

  • An anonymous function as a purrr-style formula.

  • A glue specification of the form "{outer}_{inner}".

  • An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.
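The following sketch shows three of the specification forms listed above, applied to a named input of length greater than 1. The variable names are illustrative, and the default error is caught with a generic handler.

```r
library(vctrs)

# Without a specification, an outer name on a length-3 input is an error.
default_result <- tryCatch(
  vec_c(name = 1:3),
  error = function(cnd) "error"
)

# A glue specification combines the outer name with inner positions.
glue_named <- vec_c(name = 1:3, .name_spec = "{outer}_{inner}")

# A function receives the outer name first, then the inner names or positions.
fn_named <- vec_c(
  name = 1:3,
  .name_spec = function(outer, inner) paste0(outer, ".", inner)
)

# zap() drops both outer and inner names.
unnamed <- vec_c(outer = c(inner = 1), .name_spec = rlang::zap())

names(glue_named)
names(fn_named)
names(unnamed)
```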

.name_repair

How to repair names, see repair options in vec_as_names().

.error_arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

.error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Value

A vector with class given by .ptype, and length equal to the sum of the vec_size() of the contents of ....

The vector will have names if the individual components have names (inner names) or if the arguments are named (outer names). If both inner and outer names are present, an error is thrown unless a .name_spec is provided.

Invariants

  • vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)

  • vec_ptype(vec_c(x, y)) == vec_ptype_common(x, y).

Dependencies

vctrs dependencies

base dependencies

If inputs inherit from a common class hierarchy, vec_c() falls back to base::c() if there exists a c() method implemented for this class hierarchy.

See Also

vec_cbind()/vec_rbind() for combining data frames by rows or columns.

Examples

vec_c(FALSE, 1L, 1.5)

# Date/times --------------------------
c(Sys.Date(), Sys.time())
c(Sys.time(), Sys.Date())

vec_c(Sys.Date(), Sys.time())
vec_c(Sys.time(), Sys.Date())

# Factors -----------------------------
c(factor("a"), factor("b"))
vec_c(factor("a"), factor("b"))

# By default, named inputs must be length 1:
vec_c(name = 1)
try(vec_c(name = 1:3))

# Pass a name specification to work around this:
vec_c(name = 1:3, .name_spec = "{outer}_{inner}")

# See `?name_spec` for more examples of name specifications.

Cast a vector to a specified type

Description

vec_cast() provides directional conversions from one type of vector to another. Along with vec_ptype2(), this generic forms the foundation of type coercions in vctrs.

Usage

vec_cast(x, to, ..., x_arg = caller_arg(x), to_arg = "", call = caller_env())

vec_cast_common(..., .to = NULL, .arg = "", .call = caller_env())

## S3 method for class 'logical'
vec_cast(x, to, ...)

## S3 method for class 'integer'
vec_cast(x, to, ...)

## S3 method for class 'double'
vec_cast(x, to, ...)

## S3 method for class 'complex'
vec_cast(x, to, ...)

## S3 method for class 'raw'
vec_cast(x, to, ...)

## S3 method for class 'character'
vec_cast(x, to, ...)

## S3 method for class 'list'
vec_cast(x, to, ...)

Arguments

x

Vectors to cast.

to, .to

Type to cast to. If NULL, x will be returned as is.

...

For vec_cast_common(), vectors to cast. For vec_cast(), vec_cast_default(), and vec_restore(), these dots are only for future extensions and should be empty.

x_arg

Argument name for x, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

to_arg

Argument name for to, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

call, .call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

Value

A vector the same length as x with the same type as to, or an error if the cast is not possible. An error is generated if information is lost when casting between compatible types (i.e. when there is no 1-to-1 mapping for a specific value).
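As a brief sketch of vec_cast_common(), which casts several vectors at once and returns them as a list (variable names are illustrative):

```r
library(vctrs)

# Without `.to`, the inputs are cast to their common type (double here).
common <- vec_cast_common(1L, 2.5)

# With `.to`, every input is cast to the supplied type instead.
fixed <- vec_cast_common(1L, 2L, .to = double())

str(common)
str(fixed)
```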

Implementing coercion methods

  • For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.

  • For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.

  • For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.

  • For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

Dependencies of vec_cast_common()

vctrs dependencies

base dependencies

Some functions enable a base-class fallback for vec_cast_common(). In that case the inputs are deemed compatible when they have the same base type and inherit from the same base class.

See Also

Call stop_incompatible_cast() when you determine from the attributes that an input can't be cast to the target type.

Examples

# x is a double, but no information is lost
vec_cast(1, integer())

# When information is lost the cast fails
try(vec_cast(c(1, 1.5), integer()))
try(vec_cast(c(1, 2), logical()))

# You can suppress this error and get the partial results
allow_lossy_cast(vec_cast(c(1, 1.5), integer()))
allow_lossy_cast(vec_cast(c(1, 2), logical()))

# By default this suppresses all lossy cast errors without
# distinction, but you can be specific about what cast is allowed
# by supplying prototypes
allow_lossy_cast(vec_cast(c(1, 1.5), integer()), to_ptype = integer())
try(allow_lossy_cast(vec_cast(c(1, 2), logical()), to_ptype = integer()))

# No sensible coercion is possible so an error is generated
try(vec_cast(1.5, factor("a")))

# Cast to common type
vec_cast_common(factor("a"), factor(c("a", "b")))

Chopping

Description

  • vec_chop() provides an efficient method to repeatedly slice a vector. It captures the pattern of map(indices, vec_slice, x = x). When no indices are supplied, it is generally equivalent to as.list().

  • list_unchop() combines a list of vectors into a single vector, placing elements in the output according to the locations specified by indices. It is similar to vec_c(), but gives greater control over how the elements are combined. When no indices are supplied, it is identical to vec_c(), but typically a little faster.

If indices selects every value in x exactly once, in any order, then list_unchop() is the inverse of vec_chop() and the following invariant holds:

list_unchop(vec_chop(x, indices = indices), indices = indices) == x

Usage

vec_chop(x, ..., indices = NULL, sizes = NULL)

list_unchop(
  x,
  ...,
  indices = NULL,
  ptype = NULL,
  name_spec = NULL,
  name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet", "universal_quiet"),
  error_arg = "x",
  error_call = current_env()
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

indices

For vec_chop(), a list of positive integer vectors to slice x with, or NULL. Can't be used if sizes is already specified. If both indices and sizes are NULL, x is split into its individual elements, equivalent to using an indices of as.list(vec_seq_along(x)).

For list_unchop(), a list of positive integer vectors specifying the locations to place elements of x in. Each element of x is recycled to the size of the corresponding index vector. The size of indices must match the size of x. If NULL, x is combined in the order it is provided in, which is equivalent to using vec_c().

sizes

An integer vector of non-negative sizes representing sequential indices to slice x with, or NULL. Can't be used if indices is already specified.

For example, sizes = c(2, 4) is equivalent to indices = list(1:2, 3:6), but is typically faster.

sum(sizes) must be equal to vec_size(x), i.e. sizes must completely partition x, but an individual size is allowed to be 0.
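The equivalence between sizes and sequential indices can be checked directly. This is a minimal sketch with illustrative variable names.

```r
library(vctrs)

x <- 1:6

# `sizes = c(2, 4)` partitions `x` like `indices = list(1:2, 3:6)`.
by_sizes <- vec_chop(x, sizes = c(2, 4))
by_indices <- vec_chop(x, indices = list(1:2, 3:6))
identical(by_sizes, by_indices)

# An individual size may be 0, as long as the sizes sum to vec_size(x).
with_empty <- vec_chop(x, sizes = c(0, 6))
vec_size(with_empty[[1]])

# Without indices, list_unchop() puts the pieces back in order.
restored <- list_unchop(by_sizes)
restored
```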

ptype

If NULL, the default, the output type is determined by computing the common type across all elements of x. Alternatively, you can supply ptype to give the output a known type.

name_spec

A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like outer = c(inner = 1), or when they have length greater than 1: outer = 1:2. By default, these cases trigger an error. You can resolve the error by providing a specification that describes how to combine the names or the indices of the inner vector with the name of the input. This specification can be:

  • A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.

  • An anonymous function as a purrr-style formula.

  • A glue specification of the form "{outer}_{inner}".

  • An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

name_repair

How to repair names, see repair options in vec_as_names().

error_arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Value

  • vec_chop(): A list where each element has the same type as x. The size of the list is equal to vec_size(indices), vec_size(sizes), or vec_size(x) depending on whether or not indices or sizes is provided.

  • list_unchop(): A vector of type vec_ptype_common(!!!x), or ptype, if specified. The size is computed as vec_size_common(!!!indices) unless the indices are NULL, in which case the size is vec_size_common(!!!x).

Dependencies of vec_chop()

Dependencies of list_unchop()

Examples

vec_chop(1:5)

# These two are equivalent
vec_chop(1:5, indices = list(1:2, 3:5))
vec_chop(1:5, sizes = c(2, 3))

# Can also be used on data frames
vec_chop(mtcars, indices = list(1:3, 4:6))

# If `indices` selects every value in `x` exactly once,
# in any order, then `list_unchop()` inverts `vec_chop()`
x <- c("a", "b", "c", "d")
indices <- list(2, c(3, 1), 4)
vec_chop(x, indices = indices)
list_unchop(vec_chop(x, indices = indices), indices = indices)

# When unchopping, size 1 elements of `x` are recycled
# to the size of the corresponding index
list_unchop(list(1, 2:3), indices = list(c(1, 3, 5), c(2, 4)))

# Names are retained, and outer names can be combined with inner
# names through the use of a `name_spec`
lst <- list(x = c(a = 1, b = 2), y = 1)
list_unchop(lst, indices = list(c(3, 2), c(1, 4)), name_spec = "{outer}_{inner}")

# An alternative implementation of `ave()` can be constructed using
# `vec_chop()` and `list_unchop()` in combination with `vec_group_loc()`
ave2 <- function(.x, .by, .f, ...) {
  indices <- vec_group_loc(.by)$loc
  chopped <- vec_chop(.x, indices = indices)
  out <- lapply(chopped, .f, ...)
  list_unchop(out, indices = indices)
}

breaks <- warpbreaks$breaks
wool <- warpbreaks$wool

ave2(breaks, wool, mean)

identical(
  ave2(breaks, wool, mean),
  ave(breaks, wool, FUN = mean)
)

# If you know your input is sorted and you'd like to split on the groups,
# `vec_run_sizes()` can be efficiently combined with `sizes`
df <- data_frame(
  g = c(2, 5, 5, 6, 6, 6, 6, 8, 9, 9),
  x = 1:10
)
vec_chop(df, sizes = vec_run_sizes(df$g))

# If you have a list of homogeneous vectors, sometimes it can be useful to
# unchop, apply a function to the flattened vector, and then rechop according
# to the original indices. This can be done efficiently with `list_sizes()`.
x <- list(c(1, 2, 1), c(3, 1), 5, double())
x_flat <- list_unchop(x)
x_flat <- x_flat + max(x_flat)
vec_chop(x_flat, sizes = list_sizes(x))

Compare two vectors

Description

Compare two vectors

Usage

vec_compare(x, y, na_equal=FALSE, .ptype=NULL)

Arguments

x,y

Vectors with compatible types and lengths.

na_equal

Should NA values be considered equal?

.ptype

Override to optionally specify common type

Value

An integer vector with values -1 for x < y, 0 if x == y, and 1 if x > y. If na_equal is FALSE, the result will be NA if either x or y is NA.
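The return values described above can be illustrated with a small sketch. The input values are arbitrary and chosen only so that each of the three outcomes, plus a missing value, appears once.

```r
library(vctrs)

# Compare each element against a single value, which is recycled
res <- vec_compare(c(1, 5, 10, NA), 5)
res
# The first three results are -1, 0 and 1; the last is NA because
# `na_equal` defaults to FALSE

# With `na_equal = TRUE`, the result no longer contains NA
res_na <- vec_compare(c(1, 5, 10, NA), 5, na_equal = TRUE)
anyNA(res_na)
```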

S3 dispatch

vec_compare() is not generic for performance; instead it uses vec_proxy_compare() to create a proxy that is used in the comparison.

Dependencies

Examples

vec_compare(c(TRUE, FALSE, NA), FALSE)
vec_compare(c(TRUE, FALSE, NA), FALSE, na_equal = TRUE)

vec_compare(1:10, 5)
vec_compare(runif(10), 0.5)
vec_compare(letters[1:10], "d")

df <- data.frame(x = c(1, 1, 1, 2), y = c(0, 1, 2, 1))
vec_compare(df, data.frame(x = 1, y = 1))

Count unique values in a vector

Description

Count the number of unique values in a vector. vec_count() has two important differences to table(): it returns a data frame, and when given multiple inputs (as a data frame), it only counts combinations that appear in the input.

Usage

vec_count(x, sort= c("count","key","location","none"))

Arguments

x

A vector (including a data frame).

sort

One of "count", "key", "location", or "none".

  • "count", the default, puts most frequent values at top

  • "key", orders by the output key column (i.e. unique values of x)

  • "location", orders by location where key first seen. This is useful if you want to match the counts up to other unique/duplicated functions.

  • "none", leaves unordered. This is not guaranteed to produce the same ordering across R sessions, but is the fastest method.

Value

A data frame with columns key (same type as x) and count (an integer vector).
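As a minimal sketch of the return value described above, the following counts a small character vector and inspects the two columns. The input is illustrative only.

```r
library(vctrs)

counts <- vec_count(c("a", "b", "a"))

# The result is a data frame with a `key` and a `count` column
names(counts)

# With the default `sort = "count"`, the most frequent value comes first
counts$key
counts$count
```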

Dependencies

Examples

vec_count(mtcars$vs)
vec_count(iris$Species)

# If you count a data frame you'll get a data frame
# column in the output
str(vec_count(mtcars[c("vs", "am")]))

# Sorting ---------------------------------------

x <- letters[rpois(100, 6)]

# default is to sort by frequency
vec_count(x)

# can sort by key
vec_count(x, sort = "key")

# or location of first value
vec_count(x, sort = "location")
head(x)

# or not at all
vec_count(x, sort = "none")

Complete

Description

vec_detect_complete() detects "complete" observations. An observation is considered complete if it is non-missing. For most vectors, this implies that vec_detect_complete(x) == !vec_detect_missing(x).

For data frames and matrices, a row is only considered complete if all elements of that row are non-missing. To compare, !vec_detect_missing(x) detects rows that are partially complete (they have at least one non-missing value).

Usage

vec_detect_complete(x)

Arguments

x

A vector

Details

A record type vector is similar to a data frame, and is only considered complete if all fields are non-missing.

Value

A logical vector with the same size as x.

See Also

stats::complete.cases()

Examples

x <- c(1, 2, NA, 4, NA)

# For most vectors, this is identical to `!vec_detect_missing(x)`
vec_detect_complete(x)
!vec_detect_missing(x)

df <- data_frame(
  x = x,
  y = c("a", "b", NA, "d", "e")
)

# This returns `TRUE` where all elements of the row are non-missing.
# Compare that with `!vec_detect_missing()`, which detects rows that have at
# least one non-missing value.
df2 <- df
df2$all_non_missing <- vec_detect_complete(df)
df2$any_non_missing <- !vec_detect_missing(df)
df2

Find duplicated values

Description

  • vec_duplicate_any(): detects the presence of duplicated values, similar to anyDuplicated().

  • vec_duplicate_detect(): returns a logical vector describing if each element of the vector is duplicated elsewhere. Unlike duplicated(), it reports all duplicated values, not just the second and subsequent repetitions.

  • vec_duplicate_id(): returns an integer vector giving the location of the first occurrence of the value.

Usage

vec_duplicate_any(x)

vec_duplicate_detect(x)

vec_duplicate_id(x)

Arguments

x

A vector (including a data frame).

Value

  • vec_duplicate_any(): a logical vector of length 1.

  • vec_duplicate_detect(): a logical vector the same length as x.

  • vec_duplicate_id(): an integer vector the same length as x.

Missing values

In most cases, missing values are not considered to be equal, i.e. NA == NA is not TRUE. This behaviour would be unappealing here, so these functions consider all NAs to be equal. (Similarly, all NaN are also considered to be equal.)
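A short sketch of this behaviour, using two illustrative vectors (one containing NA, one containing NaN), contrasts `==` with the duplicate helpers:

```r
library(vctrs)

x <- c(NA, 1, NA)
y <- c(NaN, 1, NaN)

# `==` never reports two missing values as equal
NA == NA

# The duplicate helpers treat all `NA`s as equal to each other ...
dup_na <- vec_duplicate_detect(x)
dup_na

# ... and likewise all `NaN`s
id_nan <- vec_duplicate_id(y)
id_nan
```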

Dependencies

See Also

vec_unique() for functions that work with the dual of duplicated values: unique values.

Examples

vec_duplicate_any(1:10)
vec_duplicate_any(c(1, 1:10))

x <- c(10, 10, 20, 30, 30, 40)
vec_duplicate_detect(x)

# Note that `duplicated()` doesn't consider the first instance to
# be a duplicate
duplicated(x)

# Identify elements of a vector by the location of the first element that
# they're equal to:
vec_duplicate_id(x)

# Location of the unique values:
vec_unique_loc(x)

# Equivalent to `!duplicated()`:
vec_duplicate_id(x) == seq_along(x)

Equality

Description

vec_equal() tests if two vectors are equal.

Usage

vec_equal(x, y, na_equal=FALSE, .ptype=NULL)

Arguments

x,y

Vectors with compatible types and lengths.

na_equal

Should NA values be considered equal?

.ptype

Override to optionally specify common type

Value

A logical vector the same size as the common size of x and y. Will only contain NAs if na_equal is FALSE.

Dependencies

See Also

vec_detect_missing()

Examples

vec_equal(c(TRUE, FALSE, NA), FALSE)
vec_equal(c(TRUE, FALSE, NA), FALSE, na_equal = TRUE)

vec_equal(5, 1:10)
vec_equal("d", letters[1:10])

df <- data.frame(x = c(1, 1, 2, 1), y = c(1, 2, 1, NA))
vec_equal(df, data.frame(x = 1, y = 2))

Create a data frame from all combinations of the inputs

Description

vec_expand_grid() creates a new data frame by creating a grid of all possible combinations of the input vectors. It is inspired by expand.grid(). Compared with expand.grid(), it:

  • Produces sorted output by default by varying the first column the slowest, rather than the fastest. Control this with .vary.

  • Never converts strings to factors.

  • Does not add additional attributes.

  • Drops NULL inputs.

  • Can expand any vector type, including data frames and records.

Usage

vec_expand_grid(
  ...,
  .vary = "slowest",
  .name_repair = "check_unique",
  .error_call = current_env()
)

Arguments

...

Name-value pairs. The name will become the column name in the resulting data frame.

.vary

One of:

  • "slowest" to vary the first column slowest. This produces sorted output and is generally the most useful.

  • "fastest" to vary the first column fastest. This matches the behavior of expand.grid().

.name_repair

One of "check_unique", "unique", "universal", "minimal", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

.error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Details

If any input is empty (i.e. size 0), then the result will have 0 rows.

If no inputs are provided, the result is a 1 row data frame with 0 columns. This is consistent with the fact that prod() with no inputs returns 1.
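The two edge cases above can be checked with a small sketch. The column names `x` and `y` are illustrative only.

```r
library(vctrs)

# One empty input is enough to produce a 0 row result
empty <- vec_expand_grid(x = 1:3, y = integer())
dim(empty)

# No inputs at all gives 1 row and 0 columns, mirroring `prod()`
none <- vec_expand_grid()
dim(none)
prod()
```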

Value

A data frame with as many columns as there are inputs in ... and as many rows as the prod() of the sizes of the inputs.

Examples

vec_expand_grid(x = 1:2, y = 1:3)

# Use `.vary` to match `expand.grid()`:
vec_expand_grid(x = 1:2, y = 1:3, .vary = "fastest")

# Can also expand data frames
vec_expand_grid(
  x = data_frame(a = 1:2, b = 3:4),
  y = 1:4
)

Fill in missing values with the previous or following value

Description

[Experimental]

vec_fill_missing() fills gaps of missing values with the previous or following non-missing value.

Usage

vec_fill_missing(
  x,
  direction = c("down", "up", "downup", "updown"),
  max_fill = NULL
)

Arguments

x

A vector

direction

Direction in which to fill missing values. Must be either "down", "up", "downup", or "updown".

max_fill

A single positive integer specifying the maximum number of sequential missing values that will be filled. If NULL, there is no limit.

Examples

x <- c(NA, NA, 1, NA, NA, NA, 3, NA, NA)

# Filling down replaces missing values with the previous non-missing value
vec_fill_missing(x, direction = "down")

# To also fill leading missing values, use `"downup"`
vec_fill_missing(x, direction = "downup")

# Limit the number of sequential missing values to fill with `max_fill`
vec_fill_missing(x, max_fill = 1)

# Data frames are filled rowwise. Rows are only considered missing
# if all elements of that row are missing.
y <- c(1, NA, 2, NA, NA, 3, 4, NA, 5)
df <- data_frame(x = x, y = y)
df

vec_fill_missing(df)

Initialize a vector

Description

Initialize a vector

Usage

vec_init(x, n=1L)

Arguments

x

Template of vector to initialize.

n

Desired size of result.

Dependencies

  • vec_slice()

Examples

vec_init(1:10, 3)
vec_init(Sys.Date(), 5)

# The "missing" value for a data frame is a row that is entirely missing
vec_init(mtcars, 2)

# The "missing" value for a list is `NULL`
vec_init(list(), 3)

Interleave many vectors into one vector

Description

vec_interleave() combines multiple vectors together, much like vec_c(), but does so in such a way that the elements of each vector are interleaved together.

It is a more efficient equivalent to the following usage of vec_c():

vec_interleave(x, y) == vec_c(x[1], y[1], x[2], y[2], ..., x[n], y[n])
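The equivalence above can be spelled out for two length-3 inputs. This is only an illustrative check with arbitrary values, not a statement about how vec_interleave() is implemented.

```r
library(vctrs)

x <- 1:3
y <- 4:6

interleaved <- vec_interleave(x, y)

# The same result built element by element with `vec_c()`
manual <- vec_c(x[1], y[1], x[2], y[2], x[3], y[3])

identical(interleaved, manual)
```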

Usage

vec_interleave(
  ...,
  .ptype = NULL,
  .name_spec = NULL,
  .name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet", "universal_quiet")
)

Arguments

...

Vectors to interleave. These will be recycled to a common size.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

Alternatively, you can supply .ptype to give the output known type. If getOption("vctrs.no_guessing") is TRUE you must supply this value: this is a convenient way to make production code demand fixed types.

.name_spec

A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like outer = c(inner = 1), or when they have length greater than 1: outer = 1:2. By default, these cases trigger an error. You can resolve the error by providing a specification that describes how to combine the names or the indices of the inner vector with the name of the input. This specification can be:

  • A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.

  • An anonymous function as a purrr-style formula.

  • A glue specification of the form "{outer}_{inner}".

  • An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

.name_repair

How to repair names, see repair options in vec_as_names().

Dependencies

vctrs dependencies

Examples

# The most common case is to interleave two vectors
vec_interleave(1:3, 4:6)

# But you aren't restricted to just two
vec_interleave(1:3, 4:6, 7:9, 10:12)

# You can also interleave data frames
x <- data_frame(x = 1:2, y = c("a", "b"))
y <- data_frame(x = 3:4, y = c("c", "d"))

vec_interleave(x, y)

Locate observations matching specified conditions

Description

[Experimental]

vec_locate_matches() is a more flexible version of vec_match() used to identify locations where each value of needles matches one or multiple values in haystack. Unlike vec_match(), vec_locate_matches() returns all matches by default, and can match on binary conditions other than equality, such as >, >=, <, and <=.

Usage

vec_locate_matches(
  needles,
  haystack,
  ...,
  condition = "==",
  filter = "none",
  incomplete = "compare",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL,
  needles_arg = "needles",
  haystack_arg = "haystack",
  error_call = current_env()
)

Arguments

needles, haystack

Vectors used for matching.

  • needles represents the vector to search for.

  • haystack represents the vector to search in.

Prior to comparison, needles and haystack are coerced to the same type.

...

These dots are for future extensions and must be empty.

condition

Condition controlling how needles should be compared against haystack to identify a successful match.

  • One of: "==", ">", ">=", "<", or "<=".

  • For data frames, a length 1 or ncol(needles) character vector containing only the above options, specifying how matching is determined for each column.

filter

Filter to be applied to the matched results.

  • "none" doesn't apply any filter.

  • "min" returns only the minimum haystack value matching the current needle.

  • "max" returns only the maximum haystack value matching the current needle.

  • For data frames, a length 1 or ncol(needles) character vector containing only the above options, specifying a filter to apply to each column.

Filters don't have any effect on "==" conditions, but are useful for computing "rolling" matches with other conditions.

A filter can return multiple haystack matches for a particular needle if the maximum or minimum haystack value is duplicated in haystack. These can be further controlled with multiple.

incomplete

Handling of missing and incomplete values in needles.

  • "compare" uses condition to determine whether or not a missing value in needles matches a missing value in haystack. If condition is ==, >=, or <=, then missing values will match.

  • "match" always allows missing values in needles to match missing values in haystack, regardless of the condition.

  • "drop" drops incomplete values in needles from the result.

  • "error" throws an error if any needles are incomplete.

  • If a single integer is provided, this represents the value returned in the haystack column for values of needles that are incomplete. If no_match = NA, setting incomplete = NA forces incomplete values in needles to be treated like unmatched values.

nan_distinct determines whether an NA is allowed to match a NaN.

no_match

Handling of needles without a match.

  • "drop" drops needles with zero matches from the result.

  • "error" throws an error if any needles have zero matches.

  • If a single integer is provided, this represents the value returned in the haystack column for values of needles that have zero matches. The default represents an unmatched needle with NA.

remaining

Handling of haystack values that needles never matched.

  • "drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care when needles has a match.

  • "error" throws an error if there are any remaining haystack values.

  • If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.

multiple

Handling of needles with multiple matches. For each needle:

  • "all" returns all matches detected in haystack.

  • "any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.

  • "first" returns the first match detected in haystack.

  • "last" returns the last match detected in haystack.

relationship

Handling of the expected relationship between needles and haystack. If the expectations chosen from the list below are invalidated, an error is thrown.

  • "none" doesn't perform any relationship checks.

  • "one-to-one" expects:

    • Each value in needles matches at most 1 value in haystack.

    • Each value in haystack matches at most 1 value in needles.

  • "one-to-many" expects:

    • Each value in needles matches any number of values in haystack.

    • Each value in haystack matches at most 1 value in needles.

  • "many-to-one" expects:

    • Each value in needles matches at most 1 value in haystack.

    • Each value in haystack matches any number of values in needles.

  • "many-to-many" expects:

    • Each value in needles matches any number of values in haystack.

    • Each value in haystack matches any number of values in needles.

    This performs no checks, and is identical to "none", but is provided to allow you to be explicit about this relationship if you know it exists.

  • "warn-many-to-many" doesn't assume there is any known relationship, but will warn if needles and haystack have a many-to-many relationship (which is typically unexpected), encouraging you to either take a closer look at your inputs or make this relationship explicit by specifying "many-to-many".

relationship is applied after filter and multiple to allow potential multiple matches to be filtered out first.

relationship doesn't handle cases where there are zero matches. For that, see no_match and remaining.

nan_distinct

A single logical specifying whether or not NaN should be considered distinct from NA for double and complex vectors. If TRUE, NaN will always be ordered between NA and non-missing numbers.

chr_proxy_collate

A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.

  • If NULL, no transformation is done.

  • Otherwise, this must be a function of one argument. If the input contains a character vector, it will be passed to this function after it has been translated to UTF-8. This function should return a character vector with the same length as the input. The result should sort as expected in the C-locale, regardless of encoding.

For data frames, chr_proxy_collate will be applied to all character columns.

Common transformation functions include: tolower() for case-insensitive ordering and stringi::stri_sort_key() for locale-aware ordering.

needles_arg, haystack_arg

Argument tags for needles and haystack used in error messages.

error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Details

vec_match() is identical to (but often slightly faster than):

vec_locate_matches(
  needles,
  haystack,
  condition = "==",
  multiple = "first",
  nan_distinct = TRUE
)

vec_locate_matches() is extremely similar to a SQL join between needles and haystack, with the default being most similar to a left join.

Be very careful when specifying match conditions. If a condition is misspecified, it is very easy to accidentally generate an exponentially large number of matches.

Value

A two column data frame containing the locations of the matches.

  • needles is an integer vector containing the location of the needle currently being matched.

  • haystack is an integer vector containing the location of the corresponding match in the haystack for the current needle.

Dependencies of vec_locate_matches()

Examples

x <- c(1, 2, NA, 3, NaN)
y <- c(2, 1, 4, NA, 1, 2, NaN)

# By default, for each value of `x`, all matching locations in `y` are
# returned
matches <- vec_locate_matches(x, y)
matches

# The result can be used to slice the inputs to align them
data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)

# If multiple matches are present, control which is returned with `multiple`
vec_locate_matches(x, y, multiple = "first")
vec_locate_matches(x, y, multiple = "last")
vec_locate_matches(x, y, multiple = "any")

# Use `relationship` to add constraints and error on multiple matches if
# they aren't expected
try(vec_locate_matches(x, y, relationship = "one-to-one"))

# In this case, the `NA` in `y` matches two rows in `x`
try(vec_locate_matches(x, y, relationship = "one-to-many"))

# By default, `NA` is treated as being identical to `NaN`.
# Using `nan_distinct = TRUE` treats `NA` and `NaN` as different values, so
# `NA` can only match `NA`, and `NaN` can only match `NaN`.
vec_locate_matches(x, y, nan_distinct = TRUE)

# If you never want missing values to match, set `incomplete = NA` to return
# `NA` in the `haystack` column anytime there was an incomplete value
# in `needles`.
vec_locate_matches(x, y, incomplete = NA)

# Using `incomplete = NA` allows us to enforce the one-to-many relationship
# that we couldn't before
vec_locate_matches(x, y, relationship = "one-to-many", incomplete = NA)

# `no_match` allows you to specify the returned value for a needle with
# zero matches. Note that this is different from an incomplete value,
# so specifying `no_match` allows you to differentiate between incomplete
# values and unmatched values.
vec_locate_matches(x, y, incomplete = NA, no_match = 0L)

# If you want to require that every `needle` has at least 1 match, set
# `no_match` to `"error"`:
try(vec_locate_matches(x, y, incomplete = NA, no_match = "error"))

# By default, `vec_locate_matches()` detects equality between `needles` and
# `haystack`. Using `condition`, you can detect where an inequality holds
# true instead. For example, to find every location where `x[[i]] >= y`:
matches <- vec_locate_matches(x, y, condition = ">=")

data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)

# You can limit which matches are returned with a `filter`. For example,
# with the above example you can filter the matches returned by `x[[i]] >= y`
# down to only the ones containing the maximum `y` value of those matches.
matches <- vec_locate_matches(x, y, condition = ">=", filter = "max")

# Here, the matches for the `3` needle value have been filtered down to
# only include the maximum haystack value of those matches, `2`. This is
# often referred to as a rolling join.
data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)

# In the very rare case that you need to generate locations for a
# cross match, where every value of `x` is forced to match every
# value of `y` regardless of what the actual values are, you can
# replace `x` and `y` with integer vectors of the same size that contain
# a single value and match on those instead.
x_proxy <- vec_rep(1L, vec_size(x))
y_proxy <- vec_rep(1L, vec_size(y))
nrow(vec_locate_matches(x_proxy, y_proxy))
vec_size(x) * vec_size(y)

# By default, missing values will match other missing values when using
# `==`, `>=`, or `<=` conditions, but not when using `>` or `<` conditions.
# This is similar to how `vec_compare(x, y, na_equal = TRUE)` works.
x <- c(1, NA)
y <- c(NA, 2)
vec_locate_matches(x, y, condition = "<=")
vec_locate_matches(x, y, condition = "<")

# You can force missing values to match regardless of the `condition`
# by using `incomplete = "match"`
vec_locate_matches(x, y, condition = "<", incomplete = "match")

# You can also use data frames for `needles` and `haystack`. The
# `condition` will be recycled to the number of columns in `needles`, or
# you can specify varying conditions per column. In this example, we take
# a vector of date `values` and find all locations where each value is
# between lower and upper bounds specified by the `haystack`.
values <- as.Date("2019-01-01") + 0:9
needles <- data_frame(lower = values, upper = values)

set.seed(123)
lower <- as.Date("2019-01-01") + sample(10, 10, replace = TRUE)
upper <- lower + sample(3, 10, replace = TRUE)
haystack <- data_frame(lower = lower, upper = upper)

# (values >= lower) & (values <= upper)
matches <- vec_locate_matches(needles, haystack, condition = c(">=", "<="))

data_frame(
  lower = vec_slice(lower, matches$haystack),
  value = vec_slice(values, matches$needles),
  upper = vec_slice(upper, matches$haystack)
)

Find matching observations across vectors

Description

vec_in() returns a logical vector based on whether needle is found in haystack. vec_match() returns an integer vector giving location of needle in haystack, or NA if it's not found.

Usage

vec_match(
  needles,
  haystack,
  ...,
  na_equal = TRUE,
  needles_arg = "",
  haystack_arg = ""
)

vec_in(
  needles,
  haystack,
  ...,
  na_equal = TRUE,
  needles_arg = "",
  haystack_arg = ""
)

Arguments

needles, haystack

Vector of needles to search for in vector haystack. haystack should usually be unique; if not, vec_match() will only return the location of the first match.

needles and haystack are coerced to the same type prior to comparison.

...

These dots are for future extensions and must be empty.

na_equal

If TRUE, missing values in needles can be matched to missing values in haystack. If FALSE, they propagate: missing values in needles are represented as NA in the return value.

needles_arg, haystack_arg

Argument tags for needles and haystack used in error messages.

Details

vec_in() is equivalent to %in%; vec_match() is equivalent to match().

Value

A vector the same length as needles. vec_in() returns a logical vector; vec_match() returns an integer vector.

Missing values

In most places in R, missing values are not considered to be equal, i.e. NA == NA is not TRUE. The exception is in matching functions like match() and merge(), where an NA will match another NA. By default, vec_match() and vec_in() will match NAs, but you can control this behaviour with the na_equal argument.
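The following sketch illustrates both the default matching of missing values and the effect of `na_equal = FALSE`, alongside the base R equivalents. The `needles` and `haystack` values are illustrative only.

```r
library(vctrs)

needles <- c("a", NA, "z")
haystack <- c(NA, "a", "b")

# By default the missing needle matches the missing haystack value
m_default <- vec_match(needles, haystack)
in_default <- vec_in(needles, haystack)

# With `na_equal = FALSE` the missing needle propagates as `NA`
m_strict <- vec_match(needles, haystack, na_equal = FALSE)
in_strict <- vec_in(needles, haystack, na_equal = FALSE)

# The defaults agree with the base R equivalents
identical(m_default, match(needles, haystack))
identical(in_default, needles %in% haystack)
```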

Dependencies

Examples

hadley <- strsplit("hadley", "")[[1]]
vec_match(hadley, letters)

vowels <- c("a", "e", "i", "o", "u")
vec_match(hadley, vowels)
vec_in(hadley, vowels)

# Only the first index of duplicates is returned
vec_match(c("a", "b"), c("a", "b", "a", "b"))

Get or set the names of a vector

Description

These functions work like rlang::names2(), names() and names<-(), except that they return or modify the rowwise names of the vector. These are:

  • The usual names() for atomic vectors and lists

  • The row names for data frames and matrices

  • The names of the first dimension for arrays

Rowwise names are size consistent: the length of the names always equals vec_size().

vec_names2() returns the repaired names from a vector, even if it is unnamed. See vec_as_names() for details on name repair.

vec_names() is a bare-bones version that returns NULL if the vector is unnamed.

vec_set_names() sets the names or removes them.
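As a brief sketch of size consistency, the following compares rowwise names with names() on a data frame, and shows the difference between vec_names() and vec_names2() on an unnamed vector. The built-in mtcars dataset is used purely for illustration.

```r
library(vctrs)

# Rowwise names of a data frame are its row names: one per row
row_nms <- vec_names(mtcars)
length(row_nms) == vec_size(mtcars)

# `names()` returns column names instead, so its length is `ncol()`
length(names(mtcars)) == ncol(mtcars)

# Unnamed vectors: `vec_names()` gives NULL, `vec_names2()` gives
# minimal names (empty strings)
unnamed <- vec_names(1:3)
minimal <- vec_names2(1:3)

# Setting names on a data frame sets its row names
renamed <- vec_set_names(data.frame(a = 1:3), letters[1:3])
rownames(renamed)
```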

Usage

vec_names2(
  x,
  ...,
  repair = c("minimal", "unique", "universal", "check_unique", "unique_quiet", "universal_quiet"),
  quiet = FALSE
)

vec_names(x)

vec_set_names(x, names)

Arguments

x

A vector with names

...

These dots are for future extensions and must be empty.

repair

Either a string or a function. If a string, it must be one of "check_unique", "minimal", "unique", "universal", "unique_quiet", or "universal_quiet". If a function, it is invoked with a vector of minimal names and must return minimal names, otherwise an error is thrown.

  • Minimal names are never NULL or NA. When an element doesn't have a name, its minimal name is an empty string.

  • Unique names are unique. A suffix is appended to duplicate names to make them unique.

  • Universal names are unique and syntactic, meaning that you can safely use the names as variables without causing a syntax error.

The "check_unique" option doesn't perform any name repair. Instead, an error is raised if the names don't suit the "unique" criteria.

The options "unique_quiet" and "universal_quiet" are here to help the user who calls this function indirectly, via another function which exposes repair but not quiet. Specifying repair = "unique_quiet" is like specifying repair = "unique", quiet = TRUE. When the "*_quiet" options are used, any setting of quiet is silently overridden.

quiet

By default, the user is informed of any renaming caused by repairing the names. This only concerns unique and universal repairing. Set quiet to TRUE to silence the messages.

Users can silence the name repair messages by setting the "rlib_name_repair_verbosity" global option to "quiet".

names

A character vector, or NULL.

Value

vec_names2() returns the names of x, repaired. vec_names() returns the names of x or NULL if unnamed. vec_set_names() returns x with names updated.

Examples

vec_names2(1:3)
vec_names2(1:3, repair = "unique")
vec_names2(c(a = 1, b = 2))

# `vec_names()` consistently returns the rowwise names of data frames and arrays:
vec_names(data.frame(a = 1, b = 2))
names(data.frame(a = 1, b = 2))
vec_names(mtcars)
names(mtcars)
vec_names(Titanic)
names(Titanic)

vec_set_names(1:3, letters[1:3])
vec_set_names(data.frame(a = 1:3), letters[1:3])

Order and sort vectors

Description

Order and sort vectors

Usage

vec_order(
  x,
  ...,
  direction = c("asc", "desc"),
  na_value = c("largest", "smallest")
)

vec_sort(
  x,
  ...,
  direction = c("asc", "desc"),
  na_value = c("largest", "smallest")
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

direction

Direction to sort in. Defaults to ascending.

na_value

Should NAs be treated as the largest or smallest values?

Value

  • vec_order(): an integer vector the same size as x.

  • vec_sort(): a vector with the same size and type as x.

Differences with order()

Unlike the na.last argument of order() which decides the positions of missing values irrespective of the decreasing argument, the na_value argument of vec_order() interacts with direction. If missing values are considered the largest value, they will appear last in ascending order, and first in descending order.
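A small sketch contrasts the two behaviours on an illustrative vector with one missing value in the second position:

```r
library(vctrs)

x <- c(1, NA, 3)

# Base R: with the default `na.last = TRUE`, the missing value stays last
# whichever way the values are sorted
base_asc <- order(x, decreasing = FALSE)
base_desc <- order(x, decreasing = TRUE)

# vctrs: with the default `na_value = "largest"`, the missing value is last
# in ascending order but first in descending order
vctrs_asc <- vec_order(x, direction = "asc")
vctrs_desc <- vec_order(x, direction = "desc")

# Treating missing values as the smallest reverses their placement
vctrs_desc_small <- vec_order(x, direction = "desc", na_value = "smallest")
```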

Dependencies of vec_order()

Dependencies of vec_sort()

Examples

x <- round(c(runif(9), NA), 3)
vec_order(x)
vec_sort(x)
vec_sort(x, direction = "desc")

# Can also handle data frames
df <- data.frame(g = sample(2, 10, replace = TRUE), x = x)
vec_order(df)
vec_sort(df)
vec_sort(df, direction = "desc")

# Missing values interpreted as largest values are last when
# in increasing order:
vec_order(c(1, NA), na_value = "largest", direction = "asc")
vec_order(c(1, NA), na_value = "largest", direction = "desc")

Find the prototype of a set of vectors

Description

vec_ptype() returns the unfinalised prototype of a single vector. vec_ptype_common() finds the common type of multiple vectors. vec_ptype_show() nicely prints the common type of any number of inputs, and is designed for interactive exploration.

Usage

vec_ptype(x, ..., x_arg = "", call = caller_env())

vec_ptype_common(..., .ptype = NULL, .arg = "", .call = caller_env())

vec_ptype_show(...)

Arguments

x

A vector

...

For vec_ptype(), these dots are for future extensions and must be empty.

For vec_ptype_common() and vec_ptype_show(), vector inputs.

x_arg

Argument name for x. This is used in error messages to inform the user about the locations of incompatible types.

call, .call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

Alternatively, you can supply .ptype to give the output known type. If getOption("vctrs.no_guessing") is TRUE you must supply this value: this is a convenient way to make production code demand fixed types.

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

Value

vec_ptype() and vec_ptype_common() return a prototype (a size-0 vector).

vec_ptype()

vec_ptype() returns size 0 vectors potentially containing attributes but no data. Generally, this is just vec_slice(x, 0L), but some inputs require special handling.

  • While you can't slice NULL, the prototype of NULL is itself. This is because we treat NULL as an identity value in the vec_ptype2() monoid.

  • The prototype of logical vectors that only contain missing values is the special unspecified type, which can be coerced to any other 1d type. This allows bare NAs to represent missing values for any 1d vector type.

See internal-faq-ptype2-identity for more information about identity values.

vec_ptype() is a performance generic. It is not necessary to implement it because the default method will work for any vctrs type. However the default method builds around other vctrs primitives like vec_slice() which incurs performance costs. If your class has a static prototype, you might consider implementing a custom vec_ptype() method that returns a constant. This will improve the performance of your class in many cases (common type imputation in particular).

Because it may contain unspecified vectors, the prototype returned by vec_ptype() is said to be unfinalised. Call vec_ptype_finalise() to finalise it. Commonly you will need the finalised prototype as returned by vec_slice(x, 0L).
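The special cases above can be sketched as follows. The check on the class name "vctrs_unspecified" is an assumption about how the unspecified type is represented, and is included only to make the unfinalised state visible.

```r
library(vctrs)

# The usual case: a size 0 vector of the same type
identical(vec_ptype(1:10), integer())

# `NULL` is its own prototype
is.null(vec_ptype(NULL))

# A bare `NA` has an unfinalised, unspecified prototype
# (assumption: the class is named "vctrs_unspecified")
p <- vec_ptype(NA)
inherits(p, "vctrs_unspecified")

# Finalising gives the same result as slicing to size 0
finalised <- vec_ptype_finalise(p)
identical(finalised, vec_slice(NA, 0L))
```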

vec_ptype_common()

vec_ptype_common() first finds the prototype of each input, then successively calls vec_ptype2() to find a common type. It returns a finalised prototype.
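A minimal sketch of common type computation, using illustrative inputs:

```r
library(vctrs)

# logical, integer and double inputs share the common type double
common_num <- vec_ptype_common(TRUE, 1L, 2.5)

# A bare `NA` is unspecified, so it takes on the type of the other input
common_chr <- vec_ptype_common(NA, "a")

# The result is finalised: with only `NA` inputs it is logical
common_na <- vec_ptype_common(NA, NA)

# Incompatible inputs are an error
incompatible <- try(vec_ptype_common(1L, "a"), silent = TRUE)
inherits(incompatible, "try-error")
```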

Dependencies of vec_ptype()

Dependencies of vec_ptype_common()

Examples

# Unknown types ------------------------------------------
vec_ptype_show()
vec_ptype_show(NA)
vec_ptype_show(NULL)

# Vectors ------------------------------------------------
vec_ptype_show(1:10)
vec_ptype_show(letters)
vec_ptype_show(TRUE)

vec_ptype_show(Sys.Date())
vec_ptype_show(Sys.time())
vec_ptype_show(factor("a"))
vec_ptype_show(ordered("a"))

# Matrices -----------------------------------------------
# The prototype of a matrix includes the number of columns
vec_ptype_show(array(1, dim = c(1, 2)))
vec_ptype_show(array("x", dim = c(1, 2)))

# Data frames --------------------------------------------
# The prototype of a data frame includes the prototype of
# every column
vec_ptype_show(iris)

# The prototype of multiple data frames includes the prototype
# of every column that is in any data frame
vec_ptype_show(
  data.frame(x = TRUE),
  data.frame(y = 2),
  data.frame(z = "a")
)

Find the common type for a pair of vectors

Description

vec_ptype2() defines the coercion hierarchy for a set of related vector types. Along with vec_cast(), this generic forms the foundation of type coercions in vctrs.

vec_ptype2() is relevant when you are implementing vctrs methods for your class, but it should not usually be called directly. If you need to find the common type of a set of inputs, call vec_ptype_common() instead. This function supports multiple inputs and finalises the common type.
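As a small illustration of the coercion hierarchy for base types, the sketch below calls vec_ptype2() directly, which is fine for exploration even though vec_ptype_common() is the recommended entry point. The variable names are chosen for the example only.

```r
library(vctrs)

# Integer and double have a common type: double
int_dbl <- vec_ptype2(1L, 1.5)

# Logical and integer have a common type: integer
lgl_int <- vec_ptype2(TRUE, 1L)

# Character and double have no common type, so an error is signalled.
# The condition is caught here so that the script keeps running.
chr_dbl <- tryCatch(
  vec_ptype2("a", 1),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)

int_dbl
lgl_int
chr_dbl
```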

Usage

## S3 method for class 'logical'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'integer'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'double'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'complex'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'character'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'raw'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'list'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

vec_ptype2(
  x,
  y,
  ...,
  x_arg = caller_arg(x),
  y_arg = caller_arg(y),
  call = caller_env()
)

Arguments

x, y

Vector types.

...

These dots are for future extensions and must be empty.

x_arg, y_arg

Argument names for x and y. These are used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Implementing coercion methods

  • For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.

  • For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.

  • For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.

  • For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

Dependencies

See Also

stop_incompatible_type() when you determine from the attributes that an input can't be cast to the target type.


Compute ranks

Description

vec_rank() computes the sample ranks of a vector. For data frames, ranks are computed along the rows, using all columns after the first to break ties.

Usage

vec_rank(
  x,
  ...,
  ties = c("min", "max", "sequential", "dense"),
  incomplete = c("rank", "na"),
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

ties

Ranking of duplicate values.

  • "min": Use the current rank for all duplicates. The next non-duplicate value will have a rank incremented by the number of duplicates present.

  • "max": Use the current rank + n_duplicates - 1 for all duplicates. The next non-duplicate value will have a rank incremented by the number of duplicates present.

  • "sequential": Use an increasing sequence of ranks starting at the current rank, applied to duplicates in order of appearance.

  • "dense": Use the current rank for all duplicates. The next non-duplicate value will have a rank incremented by 1, effectively removing any gaps in the ranking.
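The four tie-breaking rules are easiest to compare side by side. The following sketch uses a small made-up vector with one duplicated value; the expected ranks in the comments follow from the definitions above.

```r
library(vctrs)

# Hypothetical input: 10 appears twice
x <- c(10, 20, 10, 30)

rank_min <- vec_rank(x, ties = "min")         # 1 3 1 4
rank_max <- vec_rank(x, ties = "max")         # 2 3 2 4
rank_seq <- vec_rank(x, ties = "sequential")  # 1 3 2 4
rank_dns <- vec_rank(x, ties = "dense")       # 1 2 1 3

rank_min
rank_max
rank_seq
rank_dns
```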

incomplete

Ranking of missing and incomplete observations.

  • "rank": Rank incomplete observations normally. Missing values within incomplete observations will be affected by na_value and nan_distinct.

  • "na": Don't rank incomplete observations at all. Instead, they are given a rank of NA. In this case, na_value and nan_distinct have no effect.

direction

Direction to sort in.

  • A single "asc" or "desc" for ascending or descending order respectively.

  • For data frames, a length 1 or ncol(x) character vector containing only "asc" or "desc", specifying the direction for each column.

na_value

Ordering of missing values.

  • A single "largest" or "smallest" for ordering missing values as the largest or smallest values respectively.

  • For data frames, a length 1 or ncol(x) character vector containing only "largest" or "smallest", specifying how missing values should be ordered within each column.

nan_distinct

A single logical specifying whether or not NaN should be considered distinct from NA for double and complex vectors. If TRUE, NaN will always be ordered between NA and non-missing numbers.

chr_proxy_collate

A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.

  • If NULL, no transformation is done.

  • Otherwise, this must be a function of one argument. If the input contains a character vector, it will be passed to this function after it has been translated to UTF-8. This function should return a character vector with the same length as the input. The result should sort as expected in the C-locale, regardless of encoding.

For data frames, chr_proxy_collate will be applied to all character columns.

Common transformation functions include: tolower() for case-insensitive ordering and stringi::stri_sort_key() for locale-aware ordering.
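A short sketch of the effect of chr_proxy_collate, using tolower() as suggested above. The input vector is made up for the example. In the C-locale, uppercase letters sort before lowercase ones, so "A" and "a" receive different ranks unless a proxy is supplied.

```r
library(vctrs)

# Hypothetical input mixing upper and lower case
x <- c("b", "A", "a")

# C-locale ordering: "A" < "a" < "b"
rank_c <- vec_rank(x)

# Case-insensitive ordering: "A" and "a" become ties
rank_ci <- vec_rank(x, chr_proxy_collate = tolower)

rank_c   # 3 1 2
rank_ci  # 3 1 1
```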

Details

Unlike base::rank(), when incomplete = "rank" all missing values are given the same rank, rather than an increasing sequence of ranks. When nan_distinct = FALSE, NaN values are given the same rank as NA, otherwise they are given a rank that differentiates them from NA.
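The difference from base::rank() can be seen on a small made-up vector with two missing values. Note that base::rank() needs na.last = TRUE to rank missing values at all, and returns doubles with its default ties.method.

```r
library(vctrs)

# Hypothetical input with two missing values
y <- c(NA, 2, NA, 1)

# vctrs: both missing values share the same rank
rank_vctrs <- vec_rank(y)

# base R: missing values get increasing ranks in order of appearance
rank_base <- rank(y, na.last = TRUE)

rank_vctrs  # 3 2 3 1
rank_base   # 3 2 4 1
```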

Like vec_order_radix(), ordering is done in the C-locale. This can affect the ranks of character vectors, especially regarding how uppercase and lowercase letters are ranked. See the documentation of vec_order_radix() for more information.

Dependencies

Examples

x <- c(5L, 6L, 3L, 3L, 5L, 3L)

vec_rank(x, ties = "min")
vec_rank(x, ties = "max")

# Sequential ranks use an increasing sequence for duplicates
vec_rank(x, ties = "sequential")

# Dense ranks remove gaps between distinct values,
# even if there are duplicates
vec_rank(x, ties = "dense")

y <- c(NA, x, NA, NaN)

# Incomplete values match other incomplete values by default, and their
# overall position can be adjusted with `na_value`
vec_rank(y, na_value = "largest")
vec_rank(y, na_value = "smallest")

# NaN can be ranked separately from NA if required
vec_rank(y, nan_distinct = TRUE)

# Rank in descending order. Since missing values are the largest value,
# they are given a rank of `1` when ranking in descending order.
vec_rank(y, direction = "desc", na_value = "largest")

# Give incomplete values a rank of `NA` by setting `incomplete = "na"`
vec_rank(y, incomplete = "na")

# Can also rank data frames, using columns after the first to break ties
z <- c(2L, 3L, 4L, 4L, 5L, 2L)
df <- data_frame(x = x, z = z)
df

vec_rank(df)

Vector recycling

Description

vec_recycle(x, size) recycles a single vector to a given size. vec_recycle_common(...) recycles multiple vectors to their common size. All functions obey the vctrs recycling rules, and will throw an error if recycling is not possible. See vec_size() for the precise definition of size.
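A minimal sketch of both functions on made-up inputs. Only size-1 vectors are recycled; sizes 2 and 3 are incompatible, so the last call signals an error, which is caught here to keep the script running.

```r
library(vctrs)

# Recycle a single size-1 vector to size 3
one <- vec_recycle(1L, 3)

# Recycle several vectors to their common size (3)
both <- vec_recycle_common(1:3, "a")

# Sizes 2 and 3 cannot be recycled to a common size
failed <- tryCatch(
  vec_recycle_common(1:2, 1:3),
  error = function(cnd) "cannot recycle"
)

one
both
failed
```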

Usage

vec_recycle(x, size, ..., x_arg = "", call = caller_env())

vec_recycle_common(..., .size = NULL, .arg = "", .call = caller_env())

Arguments

x

A vector to recycle.

size

Desired output size.

...

Depending on the function used:

  • For vec_recycle_common(), vectors to recycle.

  • For vec_recycle(), these dots should be empty.

x_arg

Argument name for x. This is used in error messages to inform the user about which argument has an incompatible size.

call, .call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

.size

Desired output size. If omitted, will use the common size from vec_size_common().

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

Dependencies

Examples

# Inputs with 1 observation are recycled
vec_recycle_common(1:5, 5)
vec_recycle_common(integer(), 5)
## Not run: 
vec_recycle_common(1:5, 1:2)

## End(Not run)

# Data frames and matrices are recycled along their rows
vec_recycle_common(data.frame(x = 1), 1:5)
vec_recycle_common(array(1:2, c(1, 2)), 1:5)
vec_recycle_common(array(1:3, c(1, 3, 1)), 1:5)

Useful sequences

Description

vec_seq_along() is equivalent to seq_along() but uses size, not length. vec_init_along() creates a vector of missing values with size matching an existing object.

Usage

vec_seq_along(x)

vec_init_along(x, y = x)

Arguments

x, y

Vectors

Value

  • vec_seq_along() returns an integer vector with the same size as x.

  • vec_init_along() returns a vector with the same type as x and the same size as y.
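The distinction between size and length matters most for data frames. The sketch below uses a small made-up data frame with three rows and two columns.

```r
library(vctrs)

# Hypothetical data frame: 3 rows (size 3), 2 columns (length 2)
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# seq_along() counts columns, vec_seq_along() counts rows
by_length <- seq_along(df)
by_size <- vec_seq_along(df)

# Missing values with the type of `1L` and the size of `df`
init <- vec_init_along(1L, df)

by_length  # 1 2
by_size    # 1 2 3
init       # NA NA NA
```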

Examples

vec_seq_along(mtcars)
vec_init_along(head(mtcars))

Number of observations

Description

vec_size(x) returns the size of a vector. vec_is_empty() returns TRUE if the size is zero, FALSE otherwise.

The size is distinct from the length() of a vector because it generalises to the "number of observations" for 2d structures, i.e. it's the number of rows in a matrix or a data frame. This definition has the important property that every column of a data frame (even data frame and matrix columns) has the same size. vec_size_common(...) returns the common size of multiple vectors.

list_sizes() returns an integer vector containing the size of each element of a list. It is nearly equivalent to, but faster than, map_int(x, vec_size), with the exception that list_sizes() will error on non-list inputs, as defined by obj_is_list(). list_sizes() is to vec_size() as lengths() is to length().

Usage

vec_size(x)

vec_size_common(
  ...,
  .size = NULL,
  .absent = 0L,
  .arg = "",
  .call = caller_env()
)

list_sizes(x)

vec_is_empty(x)

Arguments

x, ...

Vector inputs or NULL.

.size

If NULL, the default, the output size is determined by recycling the lengths of all elements of .... Alternatively, you can supply .size to force a known size; in this case, x and ... are ignored.

.absent

The size used when no input is provided, or when all input is NULL. If left as NULL when no input is supplied, an error is thrown.

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

.call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Details

There is no vctrs helper that retrieves the number of columns, as this is a property of the type.

vec_size() is equivalent to NROW() but has a name that is easier to pronounce, and throws an error when passed non-vector inputs.

Value

An integer (or double for long vectors).

vec_size_common() returns .absent if all inputs are NULL or absent, 0L by default.

Invariants

  • vec_size(dataframe) == vec_size(dataframe[[i]])

  • vec_size(matrix) == vec_size(matrix[, i, drop = FALSE])

  • vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)
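The invariants can be checked directly on small made-up objects. The names df, m, x and y below are chosen for the example.

```r
library(vctrs)

# Hypothetical data frame and matrix, each with 3 observations
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
m <- matrix(1:6, nrow = 3)

# A data frame has the same size as each of its columns
inv_df <- vec_size(df) == vec_size(df[["x"]])

# A matrix has the same size as any of its column slices
inv_mat <- vec_size(m) == vec_size(m[, 1, drop = FALSE])

# Sizes add up under combination
x <- 1:3
y <- 4:5
inv_c <- vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)

# In contrast, length() of a data frame counts columns
c(size = vec_size(df), length = length(df))
c(inv_df, inv_mat, inv_c)
```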

The size of NULL

The size of NULL is hard-coded to 0L in vec_size(). vec_size_common() returns .absent when all inputs are NULL (if only some inputs are NULL, they are simply ignored).

A default size of 0 makes sense because sizes are most often queried in order to compute a total size while assembling a collection of vectors. Since we treat NULL as an absent input by principle, we return the identity of sizes under addition to reflect that an absent input doesn't take up any size.

Note that other defaults might make sense under different circumstances. For instance, a default size of 1 makes sense for finding the common size because 1 is the identity of the recycling rules.
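A brief sketch of how NULL inputs interact with vec_size() and vec_size_common(), including the alternative default of 1 mentioned above. The variable names are illustrative only.

```r
library(vctrs)

# NULL has size 0
size_null <- vec_size(NULL)

# All inputs NULL: the result is `.absent`, 0L by default
all_null <- vec_size_common(NULL, NULL)

# A different default, e.g. the identity of the recycling rules
all_null_one <- vec_size_common(NULL, .absent = 1L)

# NULL inputs are ignored when other inputs are present
some_null <- vec_size_common(NULL, 1:4)

c(size_null, all_null, all_null_one, some_null)
```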

Dependencies

See Also

vec_slice() for a variation of [ compatible with vec_size(), and vec_recycle() to recycle vectors to common length.

Examples

vec_size(1:100)
vec_size(mtcars)
vec_size(array(dim = c(3, 5, 10)))

vec_size_common(1:10, 1:10)
vec_size_common(1:10, 1)
vec_size_common(integer(), 1)

list_sizes(list("a", 1:5, letters))

Split a vector into groups

Description

This is a generalisation of split() that can split by any type of vector, not just factors. Instead of returning the keys in the character names, they are returned in a separate parallel vector.

Usage

vec_split(x, by)

Arguments

x

Vector to divide into groups.

by

Vector whose unique values define the groups.

Value

A data frame with two columns and size equal to vec_size(vec_unique(by)). The key column has the same type as by, and the val column is a list containing elements of type vec_ptype(x).

Note that for complex types, the default data.frame print method will be suboptimal, and you will want to coerce into a tibble to better understand the output.
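A minimal sketch of the return value on made-up inputs, showing the parallel key and val columns.

```r
library(vctrs)

# Hypothetical values and grouping vector
x <- c(10, 20, 30, 40)
by <- c("a", "b", "a", "b")

out <- vec_split(x, by)

# One row per unique value of `by`, in order of first appearance
out$key

# Each element of `val` holds the values of `x` for that key
out$val[[1]]
out$val[[2]]
```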

Dependencies

Examples

vec_split(mtcars$cyl, mtcars$vs)
vec_split(mtcars$cyl, mtcars[c("vs", "am")])

if (require("tibble")) {
  as_tibble(vec_split(mtcars$cyl, mtcars[c("vs", "am")]))
  as_tibble(vec_split(mtcars, mtcars[c("vs", "am")]))
}

Find and count unique values

Description

  • vec_unique(): the unique values. Equivalent to unique().

  • vec_unique_loc(): the locations of the unique values.

  • vec_unique_count(): the number of unique values.

Usage

vec_unique(x)

vec_unique_loc(x)

vec_unique_count(x)

Arguments

x

A vector (including a data frame).

Value

  • vec_unique(): a vector the same type as x containing only unique values.

  • vec_unique_loc(): an integer vector, giving locations of unique values.

  • vec_unique_count(): an integer vector of length 1, giving the number of unique values.

Dependencies

Missing values

In most cases, missing values are not considered to be equal, i.e. NA == NA is not TRUE. This behaviour would be unappealing here, so these functions consider all NAs to be equal. (Similarly, all NaN are also considered to be equal.)
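A short sketch on a made-up double vector containing both NA and NaN. It relies on the reading that repeated NAs collapse to one value and repeated NaNs collapse to another, so the two remain distinct from each other.

```r
library(vctrs)

# Hypothetical input with repeated NA and NaN
x <- c(NA, 1, NA, NaN, NaN)

# Locations of the first occurrence of each unique value
loc <- vec_unique_loc(x)

# Number of unique values: NA, 1 and NaN
n <- vec_unique_count(x)

loc
n
vec_unique(x)
```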

See Also

vec_duplicate for functions that work with the dual of unique values: duplicated values.

Examples

x <- rpois(100, 8)
vec_unique(x)
vec_unique_loc(x)
vec_unique_count(x)

# `vec_unique()` returns values in the order that it encounters them
# use sort = "location" to match to the result of `vec_count()`
head(vec_unique(x))
head(vec_count(x, sort = "location"))

# Normally missing values are not considered to be equal
NA == NA

# But they are for the purposes of considering uniqueness
vec_unique(c(NA, NA, NA, NA, 1, 2, 1))

Repeat a vector

Description

  • vec_rep() repeats an entire vector a set number of times.

  • vec_rep_each() repeats each element of a vector a set number of times.

  • vec_unrep() compresses a vector with repeated values. The repeated values are returned as a key alongside the number of times each key is repeated.

Usage

vec_rep(
  x,
  times,
  ...,
  error_call = current_env(),
  x_arg = "x",
  times_arg = "times"
)

vec_rep_each(
  x,
  times,
  ...,
  error_call = current_env(),
  x_arg = "x",
  times_arg = "times"
)

vec_unrep(x)

Arguments

x

A vector.

times

For vec_rep(), a single integer for the number of times to repeat the entire vector.

For vec_rep_each(), an integer vector of the number of times to repeat each element of x. times will be recycled to the size of x.

...

These dots are for future extensions and must be empty.

error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

x_arg, times_arg

Argument names for errors.

Details

Using vec_unrep() and vec_rep_each() together is similar to using base::rle() and base::inverse.rle(). The following invariant shows the relationship between the two functions:

compressed <- vec_unrep(x)
identical(x, vec_rep_each(compressed$key, compressed$times))

There are two main differences between vec_unrep() and base::rle():

  • vec_unrep() treats adjacent missing values as equivalent, while rle() treats them as different values.

  • vec_unrep() works along the size of x, while rle() works along its length. This means that vec_unrep() works on data frames by compressing repeated rows.

Value

For vec_rep(), a vector the same type as x with size vec_size(x) * times.

For vec_rep_each(), a vector the same type as x with size sum(vec_recycle(times, vec_size(x))).

For vec_unrep(), a data frame with two columns, key and times. key is a vector with the same type as x, and times is an integer vector.
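The round trip between vec_unrep() and vec_rep_each() can be sketched on a made-up vector that includes adjacent missing values, which vec_unrep() compresses into a single key.

```r
library(vctrs)

# Hypothetical input with runs of repeated values, including NA
x <- c(1, 1, 2, NA, NA)

compressed <- vec_unrep(x)
compressed$key    # 1 2 NA
compressed$times  # 2 1 2

# Expanding the compressed form recovers the original vector
round_trip <- vec_rep_each(compressed$key, compressed$times)
identical(round_trip, x)

# For comparison, vec_rep() repeats the whole vector
whole <- vec_rep(1:2, 2)  # 1 2 1 2
whole
```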

Dependencies

Examples

# Repeat the entire vector
vec_rep(1:2, 3)

# Repeat within each vector
vec_rep_each(1:2, 3)
x <- vec_rep_each(1:2, c(3, 4))
x

# After using `vec_rep_each()`, you can recover the original vector
# with `vec_unrep()`
vec_unrep(x)

df <- data.frame(x = 1:2, y = 3:4)

# `rep()` repeats columns of data frames, and returns lists
rep(df, each = 2)

# `vec_rep()` and `vec_rep_each()` repeat rows, and return data frames
vec_rep(df, 2)
vec_rep_each(df, 2)

# `rle()` treats adjacent missing values as different
y <- c(1, NA, NA, 2)
rle(y)

# `vec_unrep()` treats them as equivalent
vec_unrep(y)

Set operations

Description

  • vec_set_intersect() returns all values in both x and y.

  • vec_set_difference() returns all values in x but not y. Note that this is an asymmetric set difference, meaning it is not commutative.

  • vec_set_union() returns all values in either x or y.

  • vec_set_symmetric_difference() returns all values in either x or y but not both. This is a commutative difference.

Because these are set operations, these functions only return unique values from x and y, returned in the order they first appeared in the original input. Names of x and y are retained on the result, but names are always taken from x if the value appears in both inputs.

These functions work similarly to intersect(), setdiff(), and union(), but don't strip attributes and can be used with data frames.
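The uniqueness and ordering rules can be seen on two small made-up vectors. The expected results in the comments follow from the description above: duplicates are dropped and values are returned in order of first appearance.

```r
library(vctrs)

# Hypothetical inputs with duplicates
x <- c(3, 1, 3, 2)
y <- c(2, 4, 2)

in_both <- vec_set_intersect(x, y)             # 2
only_x <- vec_set_difference(x, y)             # 3 1
either <- vec_set_union(x, y)                  # 3 1 2 4
not_both <- vec_set_symmetric_difference(x, y) # 3 1 4

in_both
only_x
either
not_both
```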

Usage

vec_set_intersect(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

vec_set_difference(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

vec_set_union(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

vec_set_symmetric_difference(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

Arguments

x, y

A pair of vectors.

...

These dots are for future extensions and must be empty.

ptype

If NULL, the default, the output type is determined by computing the common type between x and y. If supplied, both x and y will be cast to this type.

x_arg, y_arg

Argument names for x and y. These are used in error messages.

error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Details

Missing values are treated as equal to other missing values. For doubles and complexes, NaN are equal to other NaN, but not to NA.
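A small sketch of the missing-value rules on made-up double vectors. Since NaN matches NaN but not NA, only NaN is shared between the two inputs below.

```r
library(vctrs)

# Hypothetical inputs: `x` has both NA and NaN, `y` has only NaN
x <- c(NA, NaN, 1)
y <- c(NaN, 2)

# NaN is in both inputs; NA is not matched by NaN
shared <- vec_set_intersect(x, y)

# NA and 1 remain in `x` after removing values found in `y`
remaining <- vec_set_difference(x, y)

shared
remaining
```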

Value

A vector of the common type of x and y (or ptype, if supplied) containing the result of the corresponding set function.

Dependencies

vec_set_intersect()

vec_set_difference()

vec_set_union()

vec_set_symmetric_difference()

Examples

x <- c(1, 2, 1, 4, 3)
y <- c(2, 5, 5, 1)

# All unique values in both `x` and `y`.
# Duplicates in `x` and `y` are always removed.
vec_set_intersect(x, y)

# All unique values in `x` but not `y`
vec_set_difference(x, y)

# All unique values in either `x` or `y`
vec_set_union(x, y)

# All unique values in either `x` or `y` but not both
vec_set_symmetric_difference(x, y)

# These functions can also be used with data frames
x <- data_frame(
  a = c(2, 3, 2, 2),
  b = c("j", "k", "j", "l")
)
y <- data_frame(
  a = c(1, 2, 2, 2, 3),
  b = c("j", "l", "j", "l", "j")
)

vec_set_intersect(x, y)
vec_set_difference(x, y)
vec_set_union(x, y)
vec_set_symmetric_difference(x, y)

# Vector names don't affect set membership, but if you'd like to force
# them to, you can transform the vector into a two column data frame
x <- c(a = 1, b = 2, c = 2, d = 3)
y <- c(c = 2, b = 1, a = 3, d = 3)

vec_set_intersect(x, y)

x <- data_frame(name = names(x), value = unname(x))
y <- data_frame(name = names(y), value = unname(y))

vec_set_intersect(x, y)

Vector checks

Description

  • obj_is_vector() tests if x is considered a vector in the vctrs sense. See Vectors and scalars below for the exact details.

  • obj_check_vector() uses obj_is_vector() and throws a standardized and informative error if it returns FALSE.

  • vec_check_size() tests if x has size size, and throws an informative error if it doesn't.

Usage

obj_is_vector(x)

obj_check_vector(x, ..., arg = caller_arg(x), call = caller_env())

vec_check_size(x, size, ..., arg = caller_arg(x), call = caller_env())

Arguments

x

For obj_*() functions, an object. For vec_*() functions, a vector.

...

These dots are for future extensions and must be empty.

arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

size

The size to check for.

Value

  • obj_is_vector() returns a single TRUE or FALSE.

  • obj_check_vector() returns NULL invisibly, or errors.

  • vec_check_size() returns NULL invisibly, or errors.

Vectors and scalars

Informally, a vector is a collection that makes sense to use as a column in a data frame. The following rules define whether or not x is considered a vector.

If no vec_proxy() method has been registered, x is a vector if:

If a vec_proxy() method has been registered, x is a vector if:

  • The proxy satisfies one of the above conditions.

  • The base type of the proxy is "list", regardless of its class. S3 lists are thus treated as scalars unless they implement a vec_proxy() method.

Otherwise an object is treated as scalar and cannot be used as a vector. In particular:

  • NULL is not a vector.

  • S3 lists like lm objects are treated as scalars by default.

  • Objects of type expression are not treated as vectors.
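The cases listed above can be checked with obj_is_vector(). In this sketch, "demo_model" is a made-up class name standing in for an S3 list such as an lm object.

```r
library(vctrs)

# Vectors in the vctrs sense
is_atomic <- obj_is_vector(1:3)
is_df <- obj_is_vector(data.frame())
is_bare_list <- obj_is_vector(list())

# Scalars in the vctrs sense
is_null <- obj_is_vector(NULL)
model_like <- structure(list(coef = 1), class = "demo_model")  # hypothetical S3 list
is_s3_list <- obj_is_vector(model_like)
is_expr <- obj_is_vector(expression(x + 1))

# obj_check_vector() errors on scalars; the error is caught with try()
checked <- try(obj_check_vector(model_like), silent = TRUE)

c(is_atomic, is_df, is_bare_list)
c(is_null, is_s3_list, is_expr)
inherits(checked, "try-error")
```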

Technical limitations

  • Support for S4 vectors is currently limited to objects that inherit from an atomic type.

  • Subclasses of data.frame that append their class to the back of the "class" attribute are not treated as vectors. If you inherit from an S3 class, always prepend your class to the front of the "class" attribute for correct dispatch. This matches our general principle of allowing subclasses but not mixins.

Examples

obj_is_vector(1)

# Data frames are vectors
obj_is_vector(data_frame())

# Bare lists are vectors
obj_is_vector(list())

# S3 lists are vectors if they explicitly inherit from `"list"`
x <- structure(list(), class = c("my_list", "list"))
obj_is_list(x)
obj_is_vector(x)

# But if they don't explicitly inherit from `"list"`, they aren't
# automatically considered to be vectors. Instead, vctrs considers this
# to be a scalar object, like a linear model returned from `lm()`.
y <- structure(list(), class = "my_list")
obj_is_list(y)
obj_is_vector(y)

# `obj_check_vector()` throws an informative error if the input
# isn't a vector
try(obj_check_vector(y))

# `vec_check_size()` throws an informative error if the size of the
# input doesn't match `size`
vec_check_size(1:5, size = 5)
try(vec_check_size(1:5, size = 4))

