| Title: | Vector Helpers |
| Version: | 0.6.5 |
| Description: | Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces. |
| License: | MIT + file LICENSE |
| URL: | https://vctrs.r-lib.org/, https://github.com/r-lib/vctrs |
| BugReports: | https://github.com/r-lib/vctrs/issues |
| Depends: | R (≥ 3.5.0) |
| Imports: | cli (≥ 3.4.0), glue, lifecycle (≥ 1.0.3), rlang (≥ 1.1.0) |
| Suggests: | bit64, covr, crayon, dplyr (≥ 0.8.5), generics, knitr, pillar (≥ 1.4.4), pkgdown (≥ 2.0.1), rmarkdown, testthat (≥ 3.0.0), tibble (≥ 3.1.3), waldo (≥ 0.2.0), withr, xml2, zeallot |
| VignetteBuilder: | knitr |
| Config/Needs/website: | tidyverse/tidytemplate |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| Language: | en-GB |
| RoxygenNote: | 7.2.3 |
| NeedsCompilation: | yes |
| Packaged: | 2023-12-01 16:27:12 UTC; davis |
| Author: | Hadley Wickham [aut], Lionel Henry [aut], Davis Vaughan [aut, cre], data.table team [cph] (Radix sort based on data.table's forder() and their contribution to R's order()), Posit Software, PBC [cph, fnd] |
| Maintainer: | Davis Vaughan <davis@posit.co> |
| Repository: | CRAN |
| Date/Publication: | 2023-12-01 23:50:02 UTC |
vctrs: Vector Helpers
Description
Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.
Author(s)
Maintainer: Davis Vaughan davis@posit.co
Authors:
Hadley Wickham hadley@posit.co
Lionel Henry lionel@posit.co
Other contributors:
data.table team (Radix sort based on data.table's forder() and their contribution to R's order()) [copyright holder]
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/r-lib/vctrs/issues
Default value for empty vectors
Description
Use this inline operator when you need to provide a default value for empty (as defined by vec_is_empty()) vectors.
Usage
x %0% y
Arguments
x | A vector |
y | Value to use if |
Examples
1:10 %0% 5
integer() %0% 5
AsIs S3 class
Description
These functions help the base AsIs class fit into the vctrs type system by providing coercion and casting functions.
Usage
## S3 method for class 'AsIs'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
Construct a data frame
Description
data_frame() constructs a data frame. It is similar to base::data.frame(), but there are a few notable differences that make it more in line with vctrs principles. The Properties section outlines these.
Usage
data_frame(
  ...,
  .size = NULL,
  .name_repair = c("check_unique", "unique", "universal", "minimal", "unique_quiet", "universal_quiet"),
  .error_call = current_env()
)
Arguments
... | Vectors to become columns in the data frame. When inputs are named, those names are used for column names. |
.size | The number of rows in the data frame. If |
.name_repair | One of |
.error_call | The execution environment of a currently running function, e.g. |
Details
If no column names are supplied, "" will be used as a default name for all columns. This is applied before name repair occurs, so the default name repair of "check_unique" will error if any unnamed inputs are supplied and "unique" (or "unique_quiet") will repair the empty string column names appropriately. If the column names don't matter, use a "minimal" name repair for convenience and performance.
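The interaction between unnamed inputs and name repair described above can be sketched as follows. This is a minimal illustration that assumes vctrs is attached; the variable names are illustrative only.

```r
library(vctrs)

# Unnamed inputs get "" as their name, so the default "check_unique"
# repair refuses them and signals an error.
default_result <- tryCatch(
  data_frame(1, 2),
  error = function(cnd) "error"
)

# "unique_quiet" repairs the empty names without emitting a message.
repaired <- data_frame(1, 2, .name_repair = "unique_quiet")
names(repaired)

# "minimal" leaves the empty names untouched.
minimal <- data_frame(1, 2, .name_repair = "minimal")
names(minimal)
```

The "minimal" variant skips the repair work entirely, which is why it is the convenient choice when the names are discarded afterwards.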
Properties
Inputs are recycled to a common size with vec_recycle_common().
With the exception of data frames, inputs are not modified in any way. Character vectors are never converted to factors, and lists are stored as-is for easy creation of list-columns.
Unnamed data frame inputs are automatically unpacked. Named data frame inputs are stored unmodified as data frame columns.
NULL inputs are completely ignored.
The dots are dynamic, allowing for splicing of lists with !!! and unquoting.
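As a small sketch of the last two properties, assuming vctrs is attached (the list `cols` is a made-up input):

```r
library(vctrs)

cols <- list(x = 1:2, y = c("a", "b"))

# `!!!` splices the list elements in as individual columns,
# the `NULL` input is dropped, and `z` is recycled to size 2.
df <- data_frame(!!!cols, NULL, z = TRUE)

names(df)
vec_size(df)
```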
See Also
df_list() for safely creating a data frame's underlying data structure from individual columns. new_data_frame() for constructing the actual data frame from that underlying data structure. Together, these can be useful for developers when creating new data frame subclasses supporting standard evaluation.
Examples
data_frame(x = 1, y = 2)

# Inputs are recycled using tidyverse recycling rules
data_frame(x = 1, y = 1:3)

# Strings are never converted to factors
class(data_frame(x = "foo")$x)

# List columns can be easily created
df <- data_frame(x = list(1:2, 2, 3:4), y = 3:1)

# However, the base print method is suboptimal for displaying them,
# so it is recommended to convert them to tibble
if (rlang::is_installed("tibble")) {
  tibble::as_tibble(df)
}

# Named data frame inputs create data frame columns
df <- data_frame(x = data_frame(y = 1:2, z = "a"))

# The `x` column itself is another data frame
df$x

# Again, it is recommended to convert these to tibbles for a better
# print method
if (rlang::is_installed("tibble")) {
  tibble::as_tibble(df)
}

# Unnamed data frame input is automatically unpacked
data_frame(x = 1, data_frame(y = 1:2, z = "a"))
Collect columns for data frame construction
Description
df_list() constructs the data structure underlying a data frame, a named list of equal-length vectors. It is often used in combination with new_data_frame() to safely and consistently create a helper function for data frame subclasses.
Usage
df_list(
  ...,
  .size = NULL,
  .unpack = TRUE,
  .name_repair = c("check_unique", "unique", "universal", "minimal", "unique_quiet", "universal_quiet"),
  .error_call = current_env()
)
Arguments
... | Vectors of equal-length. When inputs are named, those names are used for names of the resulting list. |
.size | The common size of vectors supplied in |
.unpack | Should unnamed data frame inputs be unpacked? Defaults to |
.name_repair | One of |
.error_call | The execution environment of a currently running function, e.g. |
Properties
Inputs are recycled to a common size with vec_recycle_common().
With the exception of data frames, inputs are not modified in any way. Character vectors are never converted to factors, and lists are stored as-is for easy creation of list-columns.
Unnamed data frame inputs are automatically unpacked. Named data frame inputs are stored unmodified as data frame columns.
NULL inputs are completely ignored.
The dots are dynamic, allowing for splicing of lists with !!! and unquoting.
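The following sketch shows these properties on the plain list that df_list() returns, assuming vctrs is attached. The inputs are arbitrary illustrative values.

```r
library(vctrs)

# `x` is recycled to the size of `y`; the result is a plain named list,
# not a data frame.
cols <- df_list(x = 1, y = 1:3)
str(cols)

# The unnamed data frame is unpacked into its columns `a` and `b`.
unpacked <- df_list(x = 1:2, data_frame(a = 3:4, b = c("u", "v")))
names(unpacked)

# `new_data_frame()` turns the validated list into a data frame.
df <- new_data_frame(unpacked)
```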
See Also
new_data_frame() for constructing data frame subclasses from a validated input. data_frame() for a fast data frame creation helper.
Examples
# `new_data_frame()` can be used to create custom data frame constructors
new_fancy_df <- function(x = list(), n = NULL, ..., class = NULL) {
  new_data_frame(x, n = n, ..., class = c(class, "fancy_df"))
}

# Combine this constructor with `df_list()` to create a safe,
# consistent helper function for your data frame subclass
fancy_df <- function(...) {
  data <- df_list(...)
  new_fancy_df(data)
}

df <- fancy_df(x = 1)
class(df)
Coercion between two data frames
Description
df_ptype2() and df_cast() are the two functions you need to call from vec_ptype2() and vec_cast() methods for data frame subclasses. See ?howto-faq-coercion-data-frame. Their main job is to determine the common type of two data frames, adding and coercing columns as needed, or throwing an incompatible type error when the columns are not compatible.
Usage
df_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())
df_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())
tib_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())
tib_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())
Arguments
x, y, to | Subclasses of data frame. |
... | If you call |
x_arg, y_arg | Argument names for |
call | The execution environment of a currently running function, e.g. |
to_arg | Argument name |
Value
When x and y are not compatible, an error of class vctrs_error_incompatible_type is thrown.
When x and y are compatible, df_ptype2() returns the common type as a bare data frame. tib_ptype2() returns the common type as a bare tibble.
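Both outcomes can be sketched by calling df_ptype2() directly on bare data frames. This assumes vctrs is attached; the column names and the "incompatible" marker string are illustrative.

```r
library(vctrs)

x <- data.frame(a = 1L, b = "u")
y <- data.frame(a = 2.5, c = TRUE)

# The common type has zero rows and the union of the columns.
# The shared column `a` takes the common type of integer and double.
ptype <- df_ptype2(x, y)
names(ptype)
typeof(ptype$a)

# Incompatible shared columns signal a classed error that can be
# caught by its class.
incompatible <- tryCatch(
  df_ptype2(data.frame(a = 1L), data.frame(a = "u")),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)
```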
FAQ - How is the compatibility of vector types decided?
Description
Two vectors are compatible when you can safely:
Combine them into one larger vector.
Assign values from one of the vectors into the other vector.
Examples of compatible types are integer and double vectors. On the other hand, integer and character vectors are not compatible.
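A short sketch of both criteria, assuming vctrs is attached (the values are arbitrary):

```r
library(vctrs)

# Integer and double are compatible: they can be combined...
combined <- vec_c(1L, 2.5)

# ...and values of one can be assigned into the other.
assigned <- vec_assign(c(1.5, 2.5), 1, 10L)

# Integer and character are not compatible, so combining them fails
# with an incompatible type error.
outcome <- tryCatch(
  vec_c(1L, "a"),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)
```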
Common type of multiple vectors
There are two possible outcomes when multiple vectors of different types are combined into a larger vector:
An incompatible type error is thrown because some of the types are not compatible:
df1 <- data.frame(x = 1:3)
df2 <- data.frame(x = "foo")
dplyr::bind_rows(df1, df2)
#> Error in `dplyr::bind_rows()`:
#> ! Can't combine `..1$x` <integer> and `..2$x` <character>.
The vectors are combined into a vector that has the common type of all inputs. In this example, the common type of integer and logical is integer:
df1 <- data.frame(x = 1:3)
df2 <- data.frame(x = FALSE)
dplyr::bind_rows(df1, df2)
#>   x
#> 1 1
#> 2 2
#> 3 3
#> 4 0
In general, the common type is the richer type, in other words the type that can represent the most values. Logical vectors are at the bottom of the hierarchy of numeric types because they can only represent two values (not counting missing values). Then come integer vectors, and then doubles. Here is the vctrs type hierarchy for the fundamental vectors:
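The numeric part of the hierarchy (logical, then integer, then double) can be checked directly with vec_ptype2(). This is a minimal sketch that assumes vctrs is attached; the variable names are illustrative.

```r
library(vctrs)

lgl_int <- vec_ptype2(TRUE, 1L)   # logical + integer
int_dbl <- vec_ptype2(1L, 1.5)    # integer + double
lgl_dbl <- vec_ptype2(TRUE, 1.5)  # logical + double

typeof(lgl_int)
typeof(int_dbl)
typeof(lgl_dbl)

# The order of the arguments does not change the common type.
symmetric <- identical(vec_ptype2(1.5, TRUE), lgl_dbl)
```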

Type conversion and lossy cast errors
Type compatibility does not necessarily mean that you can convert one type to the other type. That’s because one of the types might support a larger set of possible values. For instance, integer and double vectors are compatible, but double vectors can’t be converted to integer if they contain fractional values.
When vctrs can’t convert a vector because the target type is not as rich as the source type, it throws a lossy cast error. Assigning a fractional number to an integer vector is a typical example of a lossy cast error:
int_vector <- 1:3
vec_assign(int_vector, 2, 0.001)
#> Error in `vec_assign()`:
#> ! Can't convert from <double> to <integer> due to loss of precision.
#> * Locations: 1
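The same distinction shows up when calling vec_cast() directly. In this sketch, which assumes vctrs is attached, the lossy cast is caught by its condition class; the "lossy" marker string is illustrative.

```r
library(vctrs)

# Whole-number doubles convert to integer without loss.
ok <- vec_cast(c(1, 2, 3), to = integer())

# A fractional value cannot be represented as an integer, so the
# cast signals a lossy cast error.
lossy <- tryCatch(
  vec_cast(c(1, 2.5), to = integer()),
  vctrs_error_cast_lossy = function(cnd) "lossy"
)
```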
How to make two vector classes compatible?
If you encounter two vector types that you think should be compatible, they might need to implement coercion methods. Reach out to the author(s) of the classes and ask them if it makes sense for their classes to be compatible.
These developer FAQ items provide guides for implementing coercion methods:
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
FAQ - Error/Warning: Some attributes are incompatible
Description
This error occurs when vec_ptype2() or vec_cast() is supplied vectors of the same class with different attributes. In this case, vctrs doesn't know how to combine the inputs.
To fix this error, the maintainer of the class should implement self-to-self coercion methods for vec_ptype2() and vec_cast().
Implementing coercion methods
For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").
FAQ - Error: Input must be a vector
Description
This error occurs when a function expects a vector and gets a scalar object instead. This commonly happens when some code attempts to assign a scalar object as a column in a data frame:
fn <- function() NULL
tibble::tibble(x = fn)
#> Error in `tibble::tibble()`:
#> ! All columns in a tibble must be vectors.
#> x Column `x` is a function.

fit <- lm(1:3 ~ 1)
tibble::tibble(x = fit)
#> Error in `tibble::tibble()`:
#> ! All columns in a tibble must be vectors.
#> x Column `x` is a `lm` object.
Vectorness in base R and in the tidyverse
In base R, almost everything is a vector or behaves like a vector. In the tidyverse we have chosen to be a bit stricter about what is considered a vector. The main question we ask ourselves to decide on the vectorness of a type is whether it makes sense to include that object as a column in a data frame.
The main difference is that S3 lists are considered vectors by base R but in the tidyverse that’s not the case by default:
fit <- lm(1:3 ~ 1)

typeof(fit)
#> [1] "list"
class(fit)
#> [1] "lm"

# S3 lists can be subset like a vector using base R:
fit[c(1, 4)]
#> $coefficients
#> (Intercept)
#>           2
#>
#> $rank
#> [1] 1

# But not in vctrs
vctrs::vec_slice(fit, c(1, 4))
#> Error in `vctrs::vec_slice()`:
#> ! `x` must be a vector, not a <lm> object.
Defused function calls are another (more esoteric) example:
call <- quote(foo(bar = TRUE, baz = FALSE))
call
#> foo(bar = TRUE, baz = FALSE)

# They can be subset like a vector using base R:
call[1:2]
#> foo(bar = TRUE)

lapply(call, function(x) x)
#> [[1]]
#> foo
#>
#> $bar
#> [1] TRUE
#>
#> $baz
#> [1] FALSE

# But not with vctrs:
vctrs::vec_slice(call, 1:2)
#> Error in `vctrs::vec_slice()`:
#> ! `x` must be a vector, not a call.
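To check ahead of time whether vctrs will treat an object as a vector, one option is vec_is(). The sketch below assumes vctrs is attached and applies it to the kinds of objects discussed above; the element names of `checks` are illustrative labels.

```r
library(vctrs)

fit <- lm(1:3 ~ 1)

checks <- c(
  integer    = vec_is(1:3),
  bare_list  = vec_is(list(1, "a")),
  data_frame = vec_is(data.frame(x = 1)),
  lm_object  = vec_is(fit),
  call       = vec_is(quote(foo(bar))),
  fn         = vec_is(mean)
)
checks
```

Atomic vectors, bare lists and data frames pass the check, whereas the fitted model, the defused call and the function do not.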
I get a scalar type error but I think this is a bug
It’s possible the author of the class needs to do some work to declare their class a vector. Consider reaching out to the author. We have written a developer FAQ page to help them fix the issue.
Tools for accessing the fields of a record.
Description
A rcrd behaves like a vector, so length(), names(), and $ cannot provide access to the fields of the underlying list. These helpers do: fields() is equivalent to names(); n_fields() is equivalent to length(); field() is equivalent to $.
Usage
fields(x)
n_fields(x)
field(x, i)
field(x, i) <- value
Arguments
x | A rcrd, i.e. a list of equal length vectors with unique names. |
Examples
x <- new_rcrd(list(x = 1:3, y = 3:1, z = letters[1:3]))
n_fields(x)
fields(x)

field(x, "y")
field(x, "y") <- runif(3)
field(x, "y")
FAQ - How to implement ptype2 and cast methods?
Description
This guide illustrates how to implement vec_ptype2() and vec_cast() methods for existing classes. Related topics:
For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").
The natural number class
We’ll illustrate how to implement coercion methods with a simple class that represents natural numbers. In this scenario we have an existing class that already features a constructor and methods for print() and subset.
#' @export
new_natural <- function(x) {
  if (is.numeric(x) || is.logical(x)) {
    stopifnot(is_whole(x))
    x <- as.integer(x)
  } else {
    stop("Can't construct natural from unknown type.")
  }
  structure(x, class = "my_natural")
}
is_whole <- function(x) {
  all(x %% 1 == 0 | is.na(x))
}

#' @export
print.my_natural <- function(x, ...) {
  cat("<natural>\n")
  x <- unclass(x)
  NextMethod()
}
#' @export
`[.my_natural` <- function(x, i, ...) {
  new_natural(NextMethod())
}

new_natural(1:3)
#> <natural>
#> [1] 1 2 3
new_natural(c(1, NA))
#> <natural>
#> [1]  1 NA
Roxygen workflow
To implement methods for generics, first import the generics in your namespace and redocument:
#' @importFrom vctrs vec_ptype2 vec_cast
NULL
Note that for each batch of methods that you add to your package, you need to export the methods and redocument immediately, even during development. Otherwise they won’t be in scope when you run unit tests, e.g. with testthat.
Implementing double dispatch methods is very similar to implementing regular S3 methods. In these examples we are using roxygen2 tags to register the methods, but you can also register the methods manually in your NAMESPACE file or lazily with s3_register().
Implementing vec_ptype2()
The self-self method
The first method to implement is the one that signals that your class is compatible with itself:
#' @export
vec_ptype2.my_natural.my_natural <- function(x, y, ...) {
  x
}

vec_ptype2(new_natural(1), new_natural(2:3))
#> <natural>
#> integer(0)
vec_ptype2() implements a fallback to try and be compatible with simple classes, so it may seem that you don’t need to implement the self-self coercion method. However, you must implement it explicitly because this is how vctrs knows that a class is implementing vctrs methods (for instance, this disables fallbacks to base::c()). Also, it makes your class a bit more efficient.
The parent and children methods
Our natural number class is conceptually a parent of <logical> and a child of <integer>, but the class is not compatible with logical, integer, or double vectors yet:
vec_ptype2(TRUE, new_natural(2:3))
#> Error:
#> ! Can't combine `TRUE` <logical> and `new_natural(2:3)` <my_natural>.
vec_ptype2(new_natural(1), 2:3)
#> Error:
#> ! Can't combine `new_natural(1)` <my_natural> and `2:3` <integer>.
We’ll specify the twin methods for each of these classes, returning the richer class in each case.
#' @export
vec_ptype2.my_natural.logical <- function(x, y, ...) {
  # The order of the classes in the method name follows the order of
  # the arguments in the function signature, so `x` is the natural
  # number and `y` is the logical
  x
}
#' @export
vec_ptype2.logical.my_natural <- function(x, y, ...) {
  # In this case `y` is the richer natural number
  y
}
Between a natural number and an integer, the latter is the richer class:
#' @export
vec_ptype2.my_natural.integer <- function(x, y, ...) {
  y
}
#' @export
vec_ptype2.integer.my_natural <- function(x, y, ...) {
  x
}
We no longer get common type errors for logical and integer:
vec_ptype2(TRUE, new_natural(2:3))
#> <natural>
#> integer(0)
vec_ptype2(new_natural(1), 2:3)
#> integer(0)
We are not done yet. Pairwise coercion methods must be implemented for all the connected nodes in the coercion hierarchy, which include double vectors further up. The coercion methods for grand-parent types must be implemented separately:
#' @export
vec_ptype2.my_natural.double <- function(x, y, ...) {
  y
}
#' @export
vec_ptype2.double.my_natural <- function(x, y, ...) {
  x
}
Incompatible attributes
Most of the time, inputs are incompatible because they have different classes for which no vec_ptype2() method is implemented. More rarely, inputs could be incompatible because of their attributes. In that case incompatibility is signalled by calling stop_incompatible_type().
In the following example, we implement a self-self ptype2 method for a hypothetical subclass of <factor> that has stricter combination semantics. The method throws an error when the levels of the two factors are not compatible.
#' @export
vec_ptype2.my_strict_factor.my_strict_factor <- function(x, y, ..., x_arg = "", y_arg = "") {
  if (!setequal(levels(x), levels(y))) {
    stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
  }

  x
}
Note how the methods need to take x_arg and y_arg parameters and pass them on to stop_incompatible_type(). These argument tags help create more informative error messages when the common type determination is for a column of a data frame. They are part of the generic signature but can usually be left out if not used.
Implementing vec_cast()
Corresponding vec_cast() methods must be implemented for all vec_ptype2() methods. The general pattern is to convert the argument x to the type of to. The methods should validate the values in x and make sure they conform to the values of to.
Please note that for historical reasons, the order of the classes in the method name is in reverse order of the arguments in the function signature. The first class represents to, whereas the second class represents x.
The self-self method is easy in this case; it just returns the target input:
#' @export
vec_cast.my_natural.my_natural <- function(x, to, ...) {
  x
}
The other types need to be validated. We perform input validation in the new_natural() constructor, so that’s a good fit for our vec_cast() implementations.
#' @export
vec_cast.my_natural.logical <- function(x, to, ...) {
  # The order of the classes in the method name is in reverse order
  # of the arguments in the function signature, so `to` is the natural
  # number and `x` is the logical
  new_natural(x)
}
#' @export
vec_cast.my_natural.integer <- function(x, to, ...) {
  new_natural(x)
}
#' @export
vec_cast.my_natural.double <- function(x, to, ...) {
  new_natural(x)
}
With these methods, vctrs is now able to combine logical and natural vectors. It properly returns the richer type of the two, a natural vector:
vec_c(TRUE, new_natural(1), FALSE)
#> <natural>
#> [1] 1 1 0
Because we haven’t implemented conversions from natural, it still doesn’t know how to combine natural with the richer integer and double types:
vec_c(new_natural(1), 10L)
#> Error in `vec_c()`:
#> ! Can't convert `..1` <my_natural> to <integer>.

vec_c(1.5, new_natural(1))
#> Error in `vec_c()`:
#> ! Can't convert `..2` <my_natural> to <double>.
This is quick work which completes the implementation of coercion methods for vctrs:
#' @export
vec_cast.logical.my_natural <- function(x, to, ...) {
  # In this case `to` is the logical and `x` is the natural number
  attributes(x) <- NULL
  as.logical(x)
}
#' @export
vec_cast.integer.my_natural <- function(x, to, ...) {
  attributes(x) <- NULL
  as.integer(x)
}
#' @export
vec_cast.double.my_natural <- function(x, to, ...) {
  attributes(x) <- NULL
  as.double(x)
}
And we now get the expected combinations.
vec_c(new_natural(1), 10L)
#> [1]  1 10

vec_c(1.5, new_natural(1))
#> [1] 1.5 1.0
FAQ - How to implement ptype2 and cast methods? (Data frames)
Description
This guide provides a practical recipe for implementing vec_ptype2() and vec_cast() methods for coercions of data frame subclasses. Related topics:
For an overview of the coercion mechanism in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
Coercion of data frames occurs when different data frame classes are combined in some way. The two main methods of combination are currently row-binding with vec_rbind() and col-binding with vec_cbind() (which are in turn used by a number of dplyr and tidyr functions). These functions take multiple data frame inputs and automatically coerce them to their common type.
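As a minimal sketch of this automatic coercion on bare data frames, assuming vctrs is attached (the inputs are arbitrary):

```r
library(vctrs)

# Row-binding: `x` becomes double, and the column `y`, which is
# missing from the first input, is filled with missing values there.
rows <- vec_rbind(
  data.frame(x = 1L),
  data.frame(x = 2.5, y = "a")
)

# Column-binding: the size-1 input is recycled to the common size.
cols <- vec_cbind(
  data.frame(x = 1:3),
  data.frame(y = "foo")
)
```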
vctrs is generally strict about the kind of automatic coercions that are performed when combining inputs. In the case of data frames we have decided to be a bit less strict for convenience. Instead of throwing an incompatible type error, we fall back to a base data frame or a tibble if we don’t know how to combine two data frame subclasses. It is still a good idea to specify the proper coercion behaviour for your data frame subclasses as soon as possible.
We will see two examples in this guide. The first example is about a data frame subclass that has no particular attributes to manage. In the second example, we implement coercion methods for a tibble subclass that includes potentially incompatible attributes.
Roxygen workflow
To implement methods for generics, first import the generics in your namespace and redocument:
#' @importFrom vctrs vec_ptype2 vec_cast
NULL
Note that for each batch of methods that you add to your package, you need to export the methods and redocument immediately, even during development. Otherwise they won’t be in scope when you run unit tests, e.g. with testthat.
Implementing double dispatch methods is very similar to implementing regular S3 methods. In these examples we are using roxygen2 tags to register the methods, but you can also register the methods manually in your NAMESPACE file or lazily with s3_register().
Parent methods
Most of the common type determination should be performed by the parent class. In vctrs, double dispatch is implemented in such a way that you need to call the methods for the parent class manually. For vec_ptype2() this means you need to call df_ptype2() (for data frame subclasses) or tib_ptype2() (for tibble subclasses). Similarly, df_cast() and tib_cast() are the workhorses for vec_cast() methods of subtypes of data.frame and tbl_df. These functions take the union of the columns in x and y, and ensure shared columns have the same type.
These functions are much less strict than vec_ptype2() and vec_cast() as they accept any subclass of data frame as input. They always return a data.frame or a tbl_df. You will probably want to write similar functions for your subclass to avoid repetition in your code. You may want to export them as well if you are expecting other people to derive from your class.
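The behaviour of the parent helper can be sketched with df_cast() on bare data frames. This assumes vctrs is attached; the column names are illustrative.

```r
library(vctrs)

x <- data.frame(a = 1:2)
to <- data.frame(a = double(), b = character())

# `a` is converted to double. `b` is missing from `x`, so it is
# added and filled with missing values.
out <- df_cast(x, to)

class(out)
typeof(out$a)
```

A wrapper for a subclass would call df_cast() like this and then restore its own class on the result, which is the pattern used in the examples that follow.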
A data.table example
This example is the actual implementation of vctrs coercion methods for data.table. This is a simple example because we don’t have to keep track of attributes for this class or manage incompatibilities. See the tibble section for a more complicated example.
We first create the dt_ptype2() and dt_cast() helpers. They wrap around the parent methods df_ptype2() and df_cast(), and transform the common type or converted input to a data table. You may want to export these helpers if you expect other packages to derive from your data frame class.
These helpers should always return data tables. To this end we use the conversion generic as.data.table(). Depending on the tools available for the particular class at hand, a constructor might be appropriate as well.
dt_ptype2 <- function(x, y, ...) {
  as.data.table(df_ptype2(x, y, ...))
}
dt_cast <- function(x, to, ...) {
  as.data.table(df_cast(x, to, ...))
}
We start with the self-self method:
#' @export
vec_ptype2.data.table.data.table <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
Between a data frame and a data table, we consider the richer type to be data table. This decision is not based on the value coverage of each data structure, but on the idea that data tables have richer behaviour. Since data tables are the richer type, we call dt_ptype2() from the vec_ptype2() method. It always returns a data table, no matter the order of arguments:
#' @export
vec_ptype2.data.table.data.frame <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.data.table <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
The vec_cast() methods follow the same pattern, but note how the method for coercing to data frame uses df_cast() rather than dt_cast().
Also, please note that for historical reasons, the order of the classes in the method name is in reverse order of the arguments in the function signature. The first class represents to, whereas the second class represents x.
#' @export
vec_cast.data.table.data.table <- function(x, to, ...) {
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.table.data.frame <- function(x, to, ...) {
  # `x` is a data.frame to be converted to a data.table
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.data.table <- function(x, to, ...) {
  # `x` is a data.table to be converted to a data.frame
  df_cast(x, to, ...)
}
With these methods vctrs is now able to combine data tables with data frames:
vec_cbind(data.frame(x = 1:3), data.table(y = "foo"))
#>    x   y
#> 1: 1 foo
#> 2: 2 foo
#> 3: 3 foo
A tibble example
In this example we implement coercion methods for a tibble subclass that carries a colour as scalar metadata:
# User constructor
my_tibble <- function(colour = NULL, ...) {
  new_my_tibble(tibble::tibble(...), colour = colour)
}
# Developer constructor
new_my_tibble <- function(x, colour = NULL) {
  stopifnot(is.data.frame(x))
  tibble::new_tibble(
    x,
    colour = colour,
    class = "my_tibble",
    nrow = nrow(x)
  )
}

df_colour <- function(x) {
  if (inherits(x, "my_tibble")) {
    attr(x, "colour")
  } else {
    NULL
  }
}

#' @export
print.my_tibble <- function(x, ...) {
  cat(sprintf("<%s: %s>\n", class(x)[[1]], df_colour(x)))
  cli::cat_line(format(x)[-1])
}
This subclass is very simple. All it does is modify the header.
red <- my_tibble("red", x = 1, y = 1:2)
red
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2

red[2]
#> <my_tibble: red>
#>       y
#>   <int>
#> 1     1
#> 2     2

green <- my_tibble("green", z = TRUE)
green
#> <my_tibble: green>
#>   z
#>   <lgl>
#> 1 TRUE
Combinations do not work properly out of the box; instead, vctrs falls back to a bare tibble:
vec_rbind(red, tibble::tibble(x = 10:12))
#> # A tibble: 5 x 2
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA
Instead of falling back to a data frame, we would like to return a <my_tibble> when combined with a data frame or a tibble. Because this subclass has more metadata than normal data frames (it has a colour), it is a supertype of tibble and data frame, i.e. it is the richer type. This is similar to how a grouped tibble is a more general type than a tibble or a data frame. Conceptually, the latter are pinned to a single constant group.
The coercion methods for data frames operate in two steps:
They check for compatible subclass attributes. In our case the tibble colour has to be the same, or be undefined.
They call their parent methods, in this case tib_ptype2() and tib_cast() because we have a subclass of tibble. This eventually calls the data frame methods df_ptype2() and df_cast() which match the columns and their types.
This process should usually be wrapped in two functions to avoid repetition. Consider exporting these if you expect your class to be derived by other subclasses.
We first implement a helper to determine if two data frames have compatible colours. We use the df_colour() accessor which returns NULL when the data frame colour is undefined.
has_compatible_colours <- function(x, y) {
  x_colour <- df_colour(x) %||% df_colour(y)
  y_colour <- df_colour(y) %||% x_colour
  identical(x_colour, y_colour)
}
Next we implement the coercion helpers. If the colours are not compatible, we call stop_incompatible_cast() or stop_incompatible_type(). These strict coercion semantics are justified because in this class colour is a data attribute. If it were a non-essential detail attribute, like the timezone in a datetime, we would just standardise it to the value of the left-hand side.
In simpler cases (like the data.table example), these methods do not need to take the arguments suffixed in _arg. Here we do need to take these arguments so we can pass them to the stop_ functions when we detect an incompatibility. They also should be passed to the parent methods.
#' @export
my_tib_cast <- function(x, to, ..., x_arg = "", to_arg = "") {
  out <- tib_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg)

  if (!has_compatible_colours(x, to)) {
    stop_incompatible_cast(
      x,
      to,
      x_arg = x_arg,
      to_arg = to_arg,
      details = "Can't combine colours."
    )
  }

  colour <- df_colour(x) %||% df_colour(to)
  new_my_tibble(out, colour = colour)
}
#' @export
my_tib_ptype2 <- function(x, y, ..., x_arg = "", y_arg = "") {
  out <- tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg)

  if (!has_compatible_colours(x, y)) {
    stop_incompatible_type(
      x,
      y,
      x_arg = x_arg,
      y_arg = y_arg,
      details = "Can't combine colours."
    )
  }

  colour <- df_colour(x) %||% df_colour(y)
  new_my_tibble(out, colour = colour)
}
Let’s now implement the coercion methods, starting with the self-self methods.
#' @export
vec_ptype2.my_tibble.my_tibble <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_cast.my_tibble.my_tibble <- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
We can now combine compatible instances of our class!
vec_rbind(red, red)
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     1
#> 4     1     2

vec_rbind(green, green)
#> <my_tibble: green>
#>   z
#>   <lgl>
#> 1 TRUE
#> 2 TRUE

vec_rbind(green, red)
#> Error in `my_tib_ptype2()`:
#> ! Can't combine `..1` <my_tibble> and `..2` <my_tibble>.
#> Can't combine colours.
The methods for combining our class with tibbles follow the samepattern. For ptype2 we return our class in both cases because it is thericher type:
#' @exportvec_ptype2.my_tibble.tbl_df <- function(x, y, ...) { my_tib_ptype2(x, y, ...)}#' @exportvec_ptype2.tbl_df.my_tibble <- function(x, y, ...) { my_tib_ptype2(x, y, ...)}For cast are careful about returning a tibble when casting to a tibble.Note the call tovctrs::tib_cast():
#' @exportvec_cast.my_tibble.tbl_df <- function(x, to, ...) { my_tib_cast(x, to, ...)}#' @exportvec_cast.tbl_df.my_tibble <- function(x, to, ...) { tib_cast(x, to, ...)}From this point, we get correct combinations with tibbles:
vec_rbind(red, tibble::tibble(x = 10:12))
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA
However, we are not done yet. Because the coercion hierarchy is different from the class hierarchy, there is no inheritance of coercion methods. We’re not getting correct behaviour for data frames yet because we haven’t explicitly specified the methods for this class:
vec_rbind(red, data.frame(x = 10:12))
#> # A tibble: 5 x 2
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA
Let’s finish up the boilerplate:
#' @export
vec_ptype2.my_tibble.data.frame <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.my_tibble <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}

#' @export
vec_cast.my_tibble.data.frame <- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.my_tibble <- function(x, to, ...) {
  df_cast(x, to, ...)
}

This completes the implementation:
vec_rbind(red, data.frame(x = 10:12))
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA
FAQ - Why isn't my class treated as a vector?
Description
The tidyverse is a bit stricter than base R regarding what kind of objects are considered as vectors (see the user FAQ about this topic). Sometimes vctrs won’t treat your class as a vector when it should.
Why isn’t my list class considered a vector?
By default, S3 lists are not considered to be vectors by vctrs:
my_list <- structure(list(), class = "my_class")
vctrs::vec_is(my_list)
#> [1] FALSE
To be treated as a vector, the class must either inherit from "list" explicitly:
my_explicit_list <- structure(list(), class = c("my_class", "list"))
vctrs::vec_is(my_explicit_list)
#> [1] TRUE

Or it should implement a vec_proxy() method that returns its input if explicit inheritance is not possible or troublesome:
#' @export
vec_proxy.my_class <- function(x, ...) x

vctrs::vec_is(my_list)
#> [1] TRUE
Note that explicit inheritance is the preferred way because this makes it possible for your class to dispatch on list methods of S3 generics:
my_generic <- function(x) UseMethod("my_generic")
my_generic.list <- function(x) "dispatched!"

my_generic(my_list)
#> Error in UseMethod("my_generic"): no applicable method for 'my_generic' applied to an object of class "my_class"

my_generic(my_explicit_list)
#> [1] "dispatched!"

Why isn’t my data frame class considered a vector?
If you get an “Input must be a vector” error with a data frame subclass, the most likely explanation is that the data frame has not been properly constructed. The main cause of these errors is data frames whose base class is not "data.frame":
my_df <- data.frame(x = 1)
class(my_df) <- c("data.frame", "my_class")

vctrs::obj_check_vector(my_df)
#> Error:
#> ! `my_df` must be a vector, not a <data.frame/my_class> object.

This is problematic as many tidyverse functions won’t work properly:
dplyr::slice(my_df, 1)
#> Error in `vec_slice()`:
#> ! `x` must be a vector, not a <data.frame/my_class> object.
It is generally not appropriate to declare your class to be a superclass of another class. We generally consider this undefined behaviour (UB). To fix these errors, you can simply change the construction of your data frame class so that "data.frame" is a base class, i.e. it should come last in the class vector:
class(my_df) <- c("my_class", "data.frame")

vctrs::obj_check_vector(my_df)

dplyr::slice(my_df, 1)
#>   x
#> 1 1

Internal FAQ - Implementation of vec_locate_matches()
Description
vec_locate_matches() is similar to vec_match(), but detects all matches by default, and can match on conditions other than equality (like >= and <). There are also various other arguments to limit or adjust exactly which kinds of matches are returned. Here is an example:
x <- c("a", "b", "a", "c", "d")
y <- c("d", "b", "a", "d", "a", "e")

# For each value of `x`, find all matches in `y`
# - The "c" in `x` doesn't have a match, so it gets an NA location by default
# - The "e" in `y` isn't matched by anything in `x`, so it is dropped by default
vec_locate_matches(x, y)
#>   needles haystack
#> 1       1        3
#> 2       1        5
#> 3       2        2
#> 4       3        3
#> 5       3        5
#> 6       4       NA
#> 7       5        1
#> 8       5        4

Algorithm description
Overview and ==
The simplest (approximate) way to think about the algorithm that df_locate_matches_recurse() uses is that it sorts both inputs, and then starts at the midpoint in needles and uses a binary search to find each needle in haystack. Since there might be multiple of the same needle, we find the location of the lower and upper duplicate of that needle to handle all duplicates of that needle at once. Similarly, if there are duplicates of a matching haystack value, we find the lower and upper duplicates of the match.
If the condition is ==, that is pretty much all we have to do. For each needle, we then record 3 things: the location of the needle, the location of the lower match in the haystack, and the match size (i.e. loc_upper_match - loc_lower_match + 1). This later gets expanded in expand_compact_indices() into the actual output.
After recording the matches for a single needle, we perform the same procedure on the LHS and RHS of that needle (remember we started on the midpoint needle), i.e. from [1, loc_needle-1] and [loc_needle+1, size_needles], again taking the midpoint of those two ranges, finding their respective needle in the haystack, recording matches, and continuing on to the next needle. This iteration proceeds until we run out of needles.
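The single-column == case described above can be sketched in base R. This is only an illustration under stated assumptions: both inputs are already sorted and non-empty, and the helper names locate_bounds() and locate_matches_eq() are hypothetical, not functions from vctrs.

```r
# Hypothetical helper: first and last location of `value` in sorted `x`,
# searching only within [lo, hi]. With no match, `lower` ends up > `upper`.
locate_bounds <- function(x, value, lo, hi) {
  # Lower bound: first location whose value is >= `value`
  l <- lo
  h <- hi
  while (l <= h) {
    mid <- (l + h) %/% 2L
    if (x[mid] < value) l <- mid + 1L else h <- mid - 1L
  }
  lower <- l

  # Upper bound: last location whose value is <= `value`
  l <- lo
  h <- hi
  while (l <= h) {
    mid <- (l + h) %/% 2L
    if (x[mid] <= value) l <- mid + 1L else h <- mid - 1L
  }

  list(lower = lower, upper = h)
}

# Hypothetical driver: start at the midpoint needle, record
# (needle, lower match, match size), then recurse on the LHS and RHS.
locate_matches_eq <- function(needles, haystack) {
  records <- list()

  recurse <- function(lo, hi) {
    if (lo > hi) {
      return(invisible(NULL))
    }
    mid <- (lo + hi) %/% 2L
    dup <- locate_bounds(needles, needles[mid], lo, hi)
    hit <- locate_bounds(haystack, needles[mid], 1L, length(haystack))
    size <- hit$upper - hit$lower + 1L

    # All duplicates of this needle share the same haystack bounds
    for (loc in seq(dup$lower, dup$upper)) {
      records[[length(records) + 1L]] <<- c(
        needle = loc,
        lower_match = if (size > 0L) hit$lower else NA_integer_,
        size = size
      )
    }

    recurse(lo, dup$lower - 1L)
    recurse(dup$upper + 1L, hi)
  }

  recurse(1L, length(needles))
  out <- do.call(rbind, records)
  out[order(out[, "needle"]), , drop = FALSE]
}

res <- locate_matches_eq(
  needles = c(1L, 1L, 2L, 2L, 2L, 3L),
  haystack = c(1L, 1L, 2L, 2L, 3L)
)
res
```

Each row is a compact record; expanding `lower_match` and `size` into one row per match is the job the text assigns to expand_compact_indices().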
When we have a data frame with multiple columns, we add a layer of recursion to this. For the first column, we find the locations of the lower/upper duplicate of the current needle, and we find the locations of the lower/upper matches in the haystack. If we are on the final column in the data frame, we record the matches, otherwise we pass this information on to another call to df_locate_matches_recurse(), bumping the column index and using these refined lower/upper bounds as the starting bounds for the next column.
I think an example would be useful here, so below I step through this process for a few iterations:
# these are sorted already for simplicity
needles <- data_frame(x = c(1, 1, 2, 2, 2, 3), y = c(1, 2, 3, 4, 5, 3))
haystack <- data_frame(x = c(1, 1, 2, 2, 3), y = c(2, 3, 4, 4, 1))

needles
#>   x y
#> 1 1 1
#> 2 1 2
#> 3 2 3
#> 4 2 4
#> 5 2 5
#> 6 3 3

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1

## Column 1, iteration 1

# start at midpoint in needles
# this corresponds to x==2
loc_mid_needles <- 3L

# finding all x==2 values in needles gives us:
loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 5L

# finding matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# compute LHS/RHS bounds for next needle
lhs_loc_lower_bound_needles <- 1L # original lower bound
lhs_loc_upper_bound_needles <- 2L # lower_duplicate-1

rhs_loc_lower_bound_needles <- 6L # upper_duplicate+1
rhs_loc_upper_bound_needles <- 6L # original upper bound

# We still have a 2nd column to check. So recurse and pass on the current
# duplicate and match bounds to start the 2nd column with.

## Column 2, iteration 1

# midpoint of [3, 5]
# value y==4
loc_mid_needles <- 4L

loc_lower_duplicate_needles <- 4L
loc_upper_duplicate_needles <- 4L

loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# last column, so record matches
# - this was location 4 in needles
# - lower match in haystack is at loc 3
# - match size is 2

# Now handle LHS and RHS of needle midpoint
lhs_loc_lower_bound_needles <- 3L # original lower bound
lhs_loc_upper_bound_needles <- 3L # lower_duplicate-1

rhs_loc_lower_bound_needles <- 5L # upper_duplicate+1
rhs_loc_upper_bound_needles <- 5L # original upper bound

## Column 2, iteration 2 (using LHS bounds)

# midpoint of [3,3]
# value of y==3
loc_mid_needles <- 3L

loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 3L

# no match! no y==3 in haystack for x==2
# lower-match will always end up > upper-match in this case
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 2L

# no LHS or RHS needle values to do, so we are done here

## Column 2, iteration 3 (using RHS bounds)

# same as above, range of [5,5], value of y==5, which has no match in haystack

## Column 1, iteration 2 (LHS of first x needle)

# Now we are done with the x needles from [3,5], so move on to the LHS and RHS
# of that. Here we would do the LHS:

# midpoint of [1,2]
loc_mid_needles <- 1L

# ...

## Column 1, iteration 3 (RHS of first x needle)

# midpoint of [6,6]
loc_mid_needles <- 6L

# ...
In the real code, rather than comparing the double values of the columns directly, we replace each column with pseudo "joint ranks" computed between the i-th column of needles and the i-th column of haystack. It is approximately like doing vec_rank(vec_c(needles$x, haystack$x), type = "dense"), then splitting the resulting ranks back up into their corresponding needle/haystack columns. This keeps the recursion code simpler, because we only have to worry about comparing integers.
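The joint-rank idea can be approximated in base R as follows. This is a rough sketch rather than the internal implementation: it uses match() against the sorted unique values in place of vec_rank(), and the haystack values are made up for illustration.

```r
needles_x <- c(1, 1, 2, 2, 2, 3)
haystack_x <- c(1.5, 2, 3, 10)  # illustrative values, not from the text

# Dense ranks over the combined values: equal values share a rank,
# and ranks increase by exactly 1 between distinct values.
combined <- c(needles_x, haystack_x)
joint_rank <- match(combined, sort(unique(combined)))

# Split the ranks back into their needle/haystack parts
needles_rank <- joint_rank[seq_along(needles_x)]
haystack_rank <- joint_rank[length(needles_x) + seq_along(haystack_x)]

needles_rank
haystack_rank
```

Comparing `needles_rank` with `haystack_rank` gives the same answers as comparing the original doubles, but only integers are involved.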
Non-equi conditions and containers
At this point we can talk about non-equi conditions like < or >=. The general idea is pretty simple, and just builds on the above algorithm. For example, start with the x column from needles/haystack above:
needles$x
#> [1] 1 1 2 2 2 3

haystack$x
#> [1] 1 1 2 2 3
If we used a condition of <=, then we'd do everything the same as before:
Midpoint in needles is location 3, value x==2
Find lower/upper duplicates in needles, giving locations [3, 5]
Find lower/upper exact match in haystack, giving locations [3, 4]
At this point, we need to "adjust" the haystack match bounds to account for the condition. Since haystack is ordered, our "rule" for <= is to keep the lower match location the same, but extend the upper match location to the upper bound, so we end up with [3, 5]. We know we can extend the upper match location because every haystack value after the exact match must be greater than the needle, so the condition needle <= haystack still holds. Then we just record the matches and continue on normally.
This approach is really nice, because we only have to exactly match the needle in haystack. We don't have to compare each needle against every value in haystack, which would take a massive amount of time.
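As a small check of the <= rule on the single x column, the adjusted bounds can be compared with a brute-force scan. This is a base R sketch that assumes an exact match exists; which() is used for brevity where the real code would use a binary search.

```r
haystack_x <- c(1, 1, 2, 2, 3)
needle <- 2

# Exact match bounds
exact <- which(haystack_x == needle)
lower <- min(exact)
upper <- max(exact)

# Rule for `<=`: keep the lower match, extend the upper match to the upper bound
upper_adjusted <- length(haystack_x)
matched <- seq(lower, upper_adjusted)

# Brute force: compare the needle against every haystack value
brute <- which(needle <= haystack_x)

identical(matched, brute)
```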
However, it gets slightly more complex with data frames with multiple columns. Let's go back to our original needles and haystack data frames and apply the condition <= to each column. Here is another worked example, which shows a case where our "rule" falls apart on the second column.
needles
#>   x y
#> 1 1 1
#> 2 1 2
#> 3 2 3
#> 4 2 4
#> 5 2 5
#> 6 3 3

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1

# `condition = c("<=", "<=")`

## Column 1, iteration 1

# x == 2
loc_mid_needles <- 3L

loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 5L

# finding exact matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# because haystack is ordered we know we can expand the upper bound automatically
# to include everything past the match. i.e. needle of x==2 must be less than
# the haystack value at loc 5, which we can check by seeing that it is x==3.
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 5L

## Column 2, iteration 1

# needles range of [3, 5]
# y == 4
loc_mid_needles <- 4L

loc_lower_duplicate_needles <- 4L
loc_upper_duplicate_needles <- 4L

# finding exact matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# lets try using our rule, which tells us we should be able to extend the upper
# bound:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 5L

# but the haystack value of y at location 5 is y==1, which is not greater than
# or equal to y==4 in the needles! looks like our rule failed us.

If you read through the above example, you'll see that the rule didn't work here. The problem is that while haystack is ordered (by vec_order()'s standards), each column isn't ordered independently of the others. Instead, each column is ordered within the "group" created by previous columns. Concretely, haystack here has an ordered x column, but if you look at haystack$y by itself, it isn't ordered (because of that 1 at the end). That is what causes the rule to fail.
haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1
To fix this, we need to create haystack "containers" where the values within each container are all totally ordered. For haystack that would create 2 containers and look like:
haystack[1:4,]
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     2
#> 2     1     3
#> 3     2     4
#> 4     2     4

haystack[5,]
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     3     1
This is essentially what computing_nesting_container_ids() does. You can actually see these ids with the helper, compute_nesting_container_info():
haystack2 <- haystack

# we really pass along the integer ranks, but in this case that is equivalent
# to converting our double columns to integers
haystack2$x <- as.integer(haystack2$x)
haystack2$y <- as.integer(haystack2$y)

info <- compute_nesting_container_info(haystack2, condition = c("<=", "<="))

# the ids are in the second slot.
# container ids break haystack into [1, 4] and [5, 5].
info[[2]]
#> [1] 0 0 0 0 1

So the idea is that for each needle, we look in each haystack container and find all the matches, then we aggregate all of the matches once at the end. df_locate_matches_with_containers() has the job of iterating over the containers.
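To make the container idea concrete, here is one simple way to split the y column into totally ordered groups. It is a greedy base R sketch and not necessarily how compute_nesting_container_info() works internally; assign_containers() is a hypothetical name, and the rows are assumed to be already sorted by x and then y.

```r
# Hypothetical helper: put each row in the first container whose last `y`
# value is <= the current `y`, opening a new container when none fits.
assign_containers <- function(y) {
  last_y <- numeric()           # last `y` value seen in each container
  ids <- integer(length(y))

  for (i in seq_along(y)) {
    fits <- which(last_y <= y[i])
    if (length(fits) == 0L) {
      last_y <- c(last_y, y[i])  # open a new container
      id <- length(last_y)
    } else {
      id <- fits[1L]
      last_y[id] <- y[i]
    }
    ids[i] <- id - 1L            # 0-based, like the ids printed above
  }

  ids
}

haystack <- data.frame(x = c(1, 1, 2, 2, 3), y = c(2, 3, 4, 4, 1))
ids <- assign_containers(haystack$y)
ids

# Within each container, `y` is now non-decreasing
split(haystack$y, ids)
```

Because the rows are sorted by x, every container is ordered in both columns, which is the property the upper-bound rule relies on.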
Computing totally ordered containers can be expensive, but luckily it doesn't happen very often in normal usage.
If there are all == conditions, we don't need containers (i.e. any equi join)
If there is only 1 non-equi condition and no conditions after it, we don't need containers (i.e. most rolling joins)
Otherwise the typical case where we need containers is if we have something like date >= lower, date <= upper. Even so, the computation cost generally scales with the number of columns in haystack you compute containers with (here 2), and it only really slows down around 4 columns or so, which I haven't ever seen a real life example of.
Internal FAQ - vec_ptype2(), NULL, and unspecified vectors
Description
Promotion monoid
Promotions (i.e. automatic coercions) should always transform inputs to their richer type to avoid losing values or precision. vec_ptype2() returns the richer type of two vectors, or throws an incompatible type error if neither of the two vector types includes the other. For example, the richer type of integer and double is the latter because double covers a larger range of values than integer.
vec_ptype2() is a monoid over vectors, which in practical terms means that it is a well behaved operation for reduction. Reduction is an important operation for promotions because that is how the richer type of multiple elements is computed. As a monoid, vec_ptype2() needs an identity element, i.e. a value that doesn’t change the result of the reduction. vctrs has two identity values, NULL and unspecified vectors.
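The reduction can be written out with base R's Reduce(), using NULL as the starting (identity) value. This is a small sketch, assuming vctrs is installed; the inputs are arbitrary.

```r
library(vctrs)

inputs <- list(1L, NULL, 2.5, NA)

# Fold vec_ptype2() over the inputs, starting from the identity `NULL`.
# The `NULL` and `NA` inputs do not change the result.
common <- Reduce(vec_ptype2, inputs, NULL)
common

# vec_ptype_common() performs the same reduction and then finalises the type
identical(common, do.call(vec_ptype_common, inputs))
```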
The NULL identity
As an identity element that shouldn’t influence the determination of the common type of a set of vectors, NULL is promoted to any type:
vec_ptype2(NULL, "")
#> character(0)
vec_ptype2(1L, NULL)
#> integer(0)
The common type of NULL and NULL is the identity NULL:
vec_ptype2(NULL, NULL)
#> NULL
This way the result of vec_ptype2(NULL, NULL) does not influence subsequent promotions:
vec_ptype2(
  vec_ptype2(NULL, NULL),
  ""
)
#> character(0)
Unspecified vectors
In the vctrs coercion system, logical vectors of missing values are also automatically promoted to the type of any other vector, just like NULL. We call these vectors unspecified. The special coercion semantics of unspecified vectors serve two purposes:
It makes it possible to assign vectors of NA inside any type of vectors, even when they are not coercible with logical:
x <- letters[1:5]
vec_assign(x, 1:2, c(NA, NA))
#> [1] NA  NA  "c" "d" "e"
We can’t put NULL in a data frame, so we need an identity element that behaves more like a vector. Logical vectors of NA seem a natural fit for this.
Unspecified vectors are thus promoted to any other type, just like NULL:
vec_ptype2(NA, "")
#> character(0)
vec_ptype2(1L, c(NA, NA))
#> integer(0)
Finalising common types
vctrs has an internal vector type of class vctrs_unspecified. Users normally don’t see such vectors in the wild, but they do come up when taking the common type of an unspecified vector with another identity value:
vec_ptype2(NA, NA)
#> <unspecified> [0]
vec_ptype2(NA, NULL)
#> <unspecified> [0]
vec_ptype2(NULL, NA)
#> <unspecified> [0]
We can’t return NA here because vec_ptype2() normally returns empty vectors. We also can’t return NULL because unspecified vectors need to be recognised as logical vectors if they haven’t been promoted at the end of the reduction.
vec_ptype_finalise(vec_ptype2(NULL, NA))
#> logical(0)
See the output of vec_ptype_common() which performs the reduction and finalises the type, ready to be used by the caller:
vec_ptype_common(NULL, NULL)
#> NULL
vec_ptype_common(NA, NULL)
#> logical(0)
Note that partial types in vctrs make use of the same mechanism. They are finalised with vec_ptype_finalise().
Drop empty elements from a list
Description
list_drop_empty() removes empty elements from a list. This includes NULL elements along with empty vectors, like integer(0). This is equivalent to, but faster than, vec_slice(x, list_sizes(x) != 0L).
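The stated equivalence can be checked directly. A minimal sketch, assuming vctrs is installed:

```r
library(vctrs)

x <- list(1, NULL, integer(), 2)

# `NULL` and `integer()` both have size 0, so both are dropped
list_sizes(x)

kept <- vec_slice(x, list_sizes(x) != 0L)
identical(list_drop_empty(x), kept)
```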
Usage
list_drop_empty(x)

Arguments
x | A list. |
Dependencies
Examples
x <- list(1, NULL, integer(), 2)
list_drop_empty(x)

list_of S3 class for homogenous lists
Description
A list_of object is a list where each element has the same type. Modifying the list with $, [, and [[ preserves the constraint by coercing all input items.
Usage
list_of(..., .ptype = NULL)

as_list_of(x, ...)

is_list_of(x)

## S3 method for class 'vctrs_list_of'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'vctrs_list_of'
vec_cast(x, to, ...)

Arguments
... | Vectors to coerce. |
.ptype | If Alternatively, you can supply |
x | For |
y,to | Arguments to |
x_arg,y_arg | Argument names for |
Details
Unlike regular lists, setting a list element to NULL using [[ does not remove it.
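The difference from a regular list can be seen by comparing lengths after the assignment. A short sketch, assuming vctrs is installed:

```r
library(vctrs)

x <- list_of(1:3, 5:6, 10:15)
x[[2]] <- NULL
length(x)      # the element is kept, so the length is unchanged

plain <- list(1:3, 5:6, 10:15)
plain[[2]] <- NULL
length(plain)  # the element is removed from a regular list
```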
Examples
x <- list_of(1:3, 5:6, 10:15)
if (requireNamespace("tibble", quietly = TRUE)) {
  tibble::tibble(x = x)
}

vec_c(list_of(1, 2), list_of(FALSE, TRUE))

Lossy cast error
Description
By default, lossy casts are an error. Use allow_lossy_cast() to silence these errors and continue with the partial results. In this case the lost values are typically set to NA or to a lower value resolution, depending on the type of cast.
Lossy cast errors are thrown by maybe_lossy_cast(). Unlike functions prefixed with stop_, maybe_lossy_cast() usually returns a result. If a lossy cast is detected, it throws an error, unless it's been wrapped in allow_lossy_cast(). In that case, it returns the result silently.
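Both behaviours can be seen with a double-to-integer cast that would lose the fractional part. A brief sketch, assuming vctrs is installed; the input values are arbitrary.

```r
library(vctrs)

x <- c(1, 2.5)

# By default the lossy cast is an error, which we catch here by class
err <- tryCatch(
  vec_cast(x, integer()),
  vctrs_error_cast_lossy = function(cnd) cnd
)
inherits(err, "vctrs_error_cast_lossy")

# Wrapping the cast silences the error and returns the partial result
res <- allow_lossy_cast(vec_cast(x, integer()))
res
```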
Usage
maybe_lossy_cast(
  result,
  x,
  to,
  lossy = NULL,
  locations = NULL,
  ...,
  loss_type = c("precision", "generality"),
  x_arg,
  to_arg,
  call = caller_env(),
  details = NULL,
  message = NULL,
  class = NULL,
  .deprecation = FALSE
)

Arguments
result | The result of a potentially lossy cast. |
x | Vectors to cast. |
to | Type to cast to. |
lossy | A logical vector indicating which elements of Can also be a single |
locations | An optional integer vector giving thelocations where |
...,class | Only use these fields when creating a subclass. |
loss_type | The kind of lossy cast to be mentioned in error messages. Can be loss of precision (for instance from double to integer) or loss of generality (from character to factor). |
x_arg | Argument name for |
to_arg | Argument name |
call | The execution environment of a currentlyrunning function, e.g. |
details | Any additional human readable details. |
message | An overriding message for the error. |
.deprecation | If |
Missing values
Description
vec_detect_missing() returns a logical vector the same size as x. For each element of x, it returns TRUE if the element is missing, and FALSE otherwise.
vec_any_missing() returns a single TRUE or FALSE depending on whether or not x has any missing values.
Differences with is.na()
Data frame rows are only considered missing if every element in the row is missing. Similarly, record vector elements are only considered missing if every field in the record is missing. Put another way, rows with any missing values are considered incomplete, but only rows with all missing values are considered missing.
List elements are only considered missing if they are NULL.
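The list rule is where the contrast with is.na() is easiest to see. A small sketch, assuming vctrs is installed:

```r
library(vctrs)

l <- list(1, NULL, NA)

# vctrs: only the `NULL` element counts as missing
list_missing <- vec_detect_missing(l)
list_missing

# base R: is.na() flags the length-1 `NA` element instead
base_missing <- is.na(l)
base_missing

# Data frame rows are missing only when every column is missing
df <- data_frame(x = c(1, NA, NA), y = c("a", "b", NA))
row_missing <- vec_detect_missing(df)
row_missing
```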
Usage
vec_detect_missing(x)

vec_any_missing(x)

Arguments
x | A vector |
Value
vec_detect_missing() returns a logical vector the same size as x.
vec_any_missing() returns a single TRUE or FALSE.
Dependencies
See Also
Examples
x <- c(1, 2, NA, 4, NA)

vec_detect_missing(x)
vec_any_missing(x)

# Data frames are iterated over rowwise, and only report a row as missing
# if every element of that row is missing. If a row is only partially
# missing, it is said to be incomplete, but not missing.
y <- c("a", "b", NA, "d", "e")
df <- data_frame(x = x, y = y)

df$missing <- vec_detect_missing(df)
df$incomplete <- !vec_detect_complete(df)
df

Name specifications
Description
A name specification describes how to combine inner and outer names. This sort of name combination arises when concatenating vectors or flattening lists. There are two possible cases:
Named vector:
vec_c(outer = c(inner1 = 1, inner2 = 2))
Unnamed vector:
vec_c(outer = 1:2)
In r-lib and tidyverse packages, these cases are errors by default, because there's no behaviour that works well for every case. Instead, you can provide a name specification that describes how to combine the inner and outer names of inputs. Name specifications can refer to:
outer: The external name recycled to the size of the input vector.
inner: Either the names of the input vector, or a sequence of integers from 1 to the size of the vector if it is unnamed.
Arguments
name_spec,.name_spec | A name specification for combininginner and outer names. This is relevant for inputs passed with aname, when these inputs are themselves named, like
See thename specification topic. |
Examples
# By default, named inputs must be length 1:
vec_c(name = 1)         # ok
try(vec_c(name = 1:3))  # bad

# They also can't have internal names, even if scalar:
try(vec_c(name = c(internal = 1)))  # bad

# Pass a name specification to work around this. A specification
# can be a glue string referring to `outer` and `inner`:
vec_c(name = 1:3, other = 4:5, .name_spec = "{outer}")
vec_c(name = 1:3, other = 4:5, .name_spec = "{outer}_{inner}")

# They can also be functions:
my_spec <- function(outer, inner) paste(outer, inner, sep = "_")
vec_c(name = 1:3, other = 4:5, .name_spec = my_spec)

# Or purrr-style formulas for anonymous functions:
vec_c(name = 1:3, other = 4:5, .name_spec = ~ paste0(.x, .y))

Assemble attributes for data frame construction
Description
new_data_frame() constructs a new data frame from an existing list. It is meant to be performant, and does not check the inputs for correctness in any way. It is only safe to use after a call to df_list(), which collects and validates the columns used to construct the data frame.
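The intended pairing of the two functions looks roughly like this. A minimal sketch, assuming vctrs is installed; the "my_df" subclass name is made up for illustration.

```r
library(vctrs)

# df_list() validates the columns and recycles them to a common size
cols <- df_list(x = 1:3, y = "a")

# new_data_frame() then only has to attach the attributes
df <- new_data_frame(cols)
df

# `class` is for subclasses; "my_df" is a hypothetical class name
df2 <- new_data_frame(cols, class = "my_df")
class(df2)
```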
Usage
new_data_frame(x = list(), n = NULL, ..., class = NULL)

Arguments
x | A named list of equal-length vectors. The lengths are not checked; it is the responsibility of the caller to make sure they are equal. |
n | Number of rows. If |
...,class | Additional arguments for creating subclasses. The following attributes have special behavior:
|
See Also
df_list() for a way to safely construct a data frame's underlying data structure from individual columns. This can be used to create a named list for further use by new_data_frame().
Examples
new_data_frame(list(x = 1:10, y = 10:1))

Date, date-time, and duration S3 classes
Description
A date (Date) is a double vector. Its value represents the number of days since the Unix "epoch", 1970-01-01. It has no attributes.
A datetime (POSIXct) is a double vector. Its value represents the number of seconds since the Unix "epoch", 1970-01-01. It has a single attribute: the timezone (tzone).
A duration (difftime)
Usage
new_date(x = double())

new_datetime(x = double(), tzone = "")

new_duration(x = double(), units = c("secs", "mins", "hours", "days", "weeks"))

## S3 method for class 'Date'
vec_ptype2(x, y, ...)

## S3 method for class 'POSIXct'
vec_ptype2(x, y, ...)

## S3 method for class 'POSIXlt'
vec_ptype2(x, y, ...)

## S3 method for class 'difftime'
vec_ptype2(x, y, ...)

## S3 method for class 'Date'
vec_cast(x, to, ...)

## S3 method for class 'POSIXct'
vec_cast(x, to, ...)

## S3 method for class 'POSIXlt'
vec_cast(x, to, ...)

## S3 method for class 'difftime'
vec_cast(x, to, ...)

## S3 method for class 'Date'
vec_arith(op, x, y, ...)

## S3 method for class 'POSIXct'
vec_arith(op, x, y, ...)

## S3 method for class 'POSIXlt'
vec_arith(op, x, y, ...)

## S3 method for class 'difftime'
vec_arith(op, x, y, ...)

Arguments
x | A double vector representing the number of days since UNIXepoch for |
tzone | Time zone. A character vector of length 1. Either |
units | Units of duration. |
Details
These functions help the base Date, POSIXct, and difftime classes fit into the vctrs type system by providing constructors, coercion functions, and casting functions.
Examples
new_date(0)
new_datetime(0, tzone = "UTC")
new_duration(1, "hours")

Factor/ordered factor S3 class
Description
A factor is an integer with attribute levels, a character vector. There should be one level for each integer between 1 and max(x). An ordered factor has the same properties as a factor, but possesses an extra class that marks levels as having a total ordering.
Usage
new_factor(x = integer(), levels = character(), ..., class = character())

new_ordered(x = integer(), levels = character())

## S3 method for class 'factor'
vec_ptype2(x, y, ...)

## S3 method for class 'ordered'
vec_ptype2(x, y, ...)

## S3 method for class 'factor'
vec_cast(x, to, ...)

## S3 method for class 'ordered'
vec_cast(x, to, ...)

Arguments
x | Integer values which index in to |
levels | Character vector of labels. |
...,class | Used for subclasses. |
Details
These functions help the base factor and ordered factor classes fit in to the vctrs type system by providing constructors, coercion functions, and casting functions. new_factor() and new_ordered() are low-level constructors - they only check that types, but not values, are valid, so are for expert use only.
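The low-level constructors take the integer codes and the levels directly. A short sketch, assuming vctrs is installed; the codes and levels are arbitrary.

```r
library(vctrs)

# Integer codes index into `levels`
f <- new_factor(c(1L, 2L, 1L), levels = c("a", "b"))
as.character(f)

# Same structure as a factor built by base R
identical(f, factor(c("a", "b", "a")))

# An ordered factor additionally supports comparisons between levels
o <- new_ordered(c(2L, 1L), levels = c("low", "high"))
o[1] > o[2]
```

Since values are not validated, codes outside the range of `levels` are not caught by these constructors.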
Create list_of subclass
Description
Create list_of subclass
Usage
new_list_of(x = list(), ptype = logical(), ..., class = character())

Arguments
x | A list |
ptype | The prototype which every element of |
... | Additional attributes used by subclass |
class | Optional subclass name |
Partial type
Description
Use new_partial() when constructing a new partial type subclass; and use is_partial() to test if a type is partial. All subclasses need to provide a vec_ptype_finalise() method.
Usage
new_partial(..., class = character())

is_partial(x)

vec_ptype_finalise(x, ...)

Arguments
... | Attributes of the partial type |
class | Name of subclass. |
Details
As the name suggests, a partial type partially specifies a type, and it must be combined with data to yield a full type. A useful example of a partial type is partial_frame(), which makes it possible to specify the type of just a few columns in a data frame. Use this constructor if you're making your own partial type.
rcrd (record) S3 class
Description
The rcrd class extends vctr. A rcrd is composed of 1 or more fields, which must be vectors of the same length. It is designed specifically for classes that can naturally be decomposed into multiple vectors of the same length, like POSIXlt, but where the organisation should be considered an implementation detail invisible to the user (unlike a data.frame).
Usage
new_rcrd(fields, ..., class = character())

Arguments
fields | A list or a data frame. Lists must be rectangular (same sizes), and contain uniquely named vectors (at least one). |
... | Additional attributes |
class | Name of subclass. |
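A small record class shows how the fields stay hidden behind a single vector. This is a sketch assuming vctrs is installed; the "my_ratio" class and its field names are hypothetical.

```r
library(vctrs)

# Two equal-length fields make up one vector of three ratios
r <- new_rcrd(
  list(num = 1:3, den = c(2L, 4L, 8L)),
  class = "my_ratio"
)

vec_size(r)      # one element per row of fields
fields(r)
field(r, "den")

# A format() method decides how the record is shown to the user
format.my_ratio <- function(x, ...) {
  paste0(field(x, "num"), "/", field(x, "den"))
}
format(r)
```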
vctr (vector) S3 class
Description
This abstract class provides a set of useful default methods that makes it considerably easier to get started with a new S3 vector class. See vignette("s3-vector") to learn how to use it to create your own S3 vector classes.
Usage
new_vctr(.data, ..., class = character(), inherit_base_type = NULL)

Arguments
Details
List vctrs are special cases. When created through new_vctr(), the resulting list vctr should always be recognized as a list by obj_is_list(). Because of this, if inherit_base_type is FALSE an error is thrown.
Base methods
The vctr class provides methods for many base generics using a smaller set of generics defined by this package. Generally, you should think carefully before overriding any of the methods that vctrs implements for you as they've been carefully planned to be internally consistent.
[[ and [ use NextMethod() dispatch to the underlying base function, then restore attributes with vec_restore(). rep() and length<- work similarly.
[[<- and [<- cast value to same type as x, then call NextMethod().
as.logical(), as.integer(), as.numeric(), as.character(), as.Date() and as.POSIXct() methods call vec_cast(). The as.list() method calls [[ repeatedly, and the as.data.frame() method uses a standard technique to wrap a vector in a data frame.
as.factor(), as.ordered() and as.difftime() are not generic functions in base R, but have been reimplemented as generics in the generics package. vctrs extends these and calls vec_cast(). To inherit this behaviour in a package, import and re-export the generic of interest from generics.
==, !=, unique(), anyDuplicated(), and is.na() use vec_proxy().
<, <=, >=, >, min(), max(), range(), median(), quantile(), and xtfrm() methods use vec_proxy_compare().
+, -, /, *, ^, %%, %/%, !, &, and | operators use vec_arith().
Mathematical operations including the Summary group generics (prod(), sum(), any(), all()), the Math group generics (abs(), sign(), etc), mean(), is.nan(), is.finite(), and is.infinite() use vec_math().
dims(), dims<-, dimnames(), dimnames<-, levels(), and levels<- methods throw errors.
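The first item in the list above, subsetting that restores attributes, can be seen with a bare-bones class. A minimal sketch, assuming vctrs is installed; "my_num" is a hypothetical class name with no methods of its own.

```r
library(vctrs)

x <- new_vctr(c(1.5, 2.5, 3.5), class = "my_num")

# `[` dispatches to the base method, then the class is restored
sub <- x[2:3]
class(sub)

# The underlying data is sliced as usual
vec_data(sub)
```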
List checks
Description
obj_is_list() tests if x is considered a list in the vctrs sense. It returns TRUE if:
x is a bare list with no class.
x is a list explicitly inheriting from "list".
list_all_vectors() takes a list and returns TRUE if all elements of that list are vectors.
list_all_size() takes a list and returns TRUE if all elements of that list have the same size.
obj_check_list(), list_check_all_vectors(), and list_check_all_size() use the above functions, but throw a standardized and informative error if they return FALSE.
Usage
obj_is_list(x)

obj_check_list(x, ..., arg = caller_arg(x), call = caller_env())

list_all_vectors(x)

list_check_all_vectors(x, ..., arg = caller_arg(x), call = caller_env())

list_all_size(x, size)

list_check_all_size(x, size, ..., arg = caller_arg(x), call = caller_env())

Arguments
x | For |
... | These dots are for future extensions and must be empty. |
arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
call | The execution environment of a currentlyrunning function, e.g. |
size | The size to check each element for. |
Details
Notably, data frames and S3 record style classes like POSIXlt are not considered lists.
See Also
Examples
obj_is_list(list())
obj_is_list(list_of(1))
obj_is_list(data.frame())

list_all_vectors(list(1, mtcars))
list_all_vectors(list(1, environment()))

list_all_size(list(1:2, 2:3), 2)
list_all_size(list(1:2, 2:4), 2)

# `list_`-prefixed functions assume a list:
try(list_all_vectors(environment()))

print() and str() generics.
Description
These are constructed to be more easily extensible since you can override the _header(), _data() or _footer() components individually. The default methods are built on top of format().
Usage
obj_print(x, ...)

obj_print_header(x, ...)

obj_print_data(x, ...)

obj_print_footer(x, ...)

obj_str(x, ...)

obj_str_header(x, ...)

obj_str_data(x, ...)

obj_str_footer(x, ...)

Arguments
x | A vector |
... | Additional arguments passed on to methods. See |
Order and sort vectors
Description
vec_order_radix() computes the order of x. For data frames, the order is computed along the rows by computing the order of the first column and using subsequent columns to break ties.
vec_sort_radix() sorts x. It is equivalent to vec_slice(x, vec_order_radix(x)).
Usage
vec_order_radix(
  x,
  ...,
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)

vec_sort_radix(
  x,
  ...,
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)

Arguments
x | A vector |
... | These dots are for future extensions and must be empty. |
direction | Direction to sort in.
|
na_value | Ordering of missing values.
|
nan_distinct | A single logical specifying whether or not |
chr_proxy_collate | A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.
For data frames, Common transformation functions include: |
Value
vec_order_radix() an integer vector the same size as x.
vec_sort_radix() a vector with the same size and type as x.
Differences with order()
Unlike the na.last argument of order() which decides the positions of missing values irrespective of the decreasing argument, the na_value argument of vec_order_radix() interacts with direction. If missing values are considered the largest value, they will appear last in ascending order, and first in descending order.
Character vectors are ordered in the C-locale. This is different from base::order(), which respects base::Sys.setlocale(). Sorting in a consistent locale can produce more reproducible results between different sessions and platforms, however, the results of sorting in the C-locale can be surprising. For example, capital letters sort before lower case letters. Sorting c("b", "C", "a") with vec_sort_radix() will return c("C", "a", "b"), but with base::order() will return c("a", "b", "C") unless base::order(method = "radix") is explicitly set, which also uses the C-locale. While sorting with the C-locale can be useful for algorithmic efficiency, in many real world uses it can be the cause of data analysis mistakes. To balance these trade-offs, you can supply a chr_proxy_collate function to transform character vectors into an alternative representation that orders in the C-locale in a less surprising way. For example, providing base::tolower() as a transform will order the original vector in a case-insensitive manner. Locale-aware ordering can be achieved by providing stringi::stri_sort_key() as a transform, setting the collation options as appropriate for your locale.
Character vectors are always translated to UTF-8 before ordering, and before any transform is applied by chr_proxy_collate.
For complex vectors, if either the real or imaginary component is NA or NaN, then the entire observation is considered missing.
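The C-locale behaviour described above can be reproduced with base R alone. The following is a minimal sketch that uses base::order(method = "radix"), which the text notes also orders in the C-locale, as a stand-in for vec_order_radix(). The variable names are illustrative only.

```r
x <- c("b", "C", "a")

# C-locale ordering: capital letters sort before lowercase ones
c_locale <- x[order(x, method = "radix")]
c_locale
#> [1] "C" "a" "b"

# Ordering by a lowercased proxy (the idea behind `chr_proxy_collate = tolower`)
# gives a case-insensitive result while still sorting in the C-locale
case_insensitive <- x[order(tolower(x), method = "radix")]
case_insensitive
#> [1] "a" "b" "C"
```

The proxy is only used to compute the order; the original strings are what get returned.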
Dependencies of vec_order_radix()
Dependencies of vec_sort_radix()
Examples
if (FALSE) {
  x <- round(sample(runif(5), 9, replace = TRUE), 3)
  x <- c(x, NA)
  vec_order_radix(x)
  vec_sort_radix(x)
  vec_sort_radix(x, direction = "desc")
  # Can also handle data frames
  df <- data.frame(g = sample(2, 10, replace = TRUE), x = x)
  vec_order_radix(df)
  vec_sort_radix(df)
  vec_sort_radix(df, direction = "desc")
  # For data frames, `direction` and `na_value` are allowed to be vectors
  # with length equal to the number of columns in the data frame
  vec_sort_radix(
    df,
    direction = c("desc", "asc"),
    na_value = c("largest", "smallest")
  )
  # Character vectors are ordered in the C locale, which orders capital letters
  # below lowercase ones
  y <- c("B", "A", "a")
  vec_sort_radix(y)
  # To order in a case-insensitive manner, provide a `chr_proxy_collate`
  # function that transforms the strings to all lowercase
  vec_sort_radix(y, chr_proxy_collate = tolower)
}
Partially specify a factor
Description
This special class can be passed as a ptype in order to specify that the result should be a factor that contains at least the specified levels.
Usage
partial_factor(levels = character())
Arguments
levels | Character vector of labels. |
Examples
pf <- partial_factor(levels = c("x", "y"))
pf
vec_ptype_common(factor("v"), factor("w"), .ptype = pf)
Partially specify columns of a data frame
Description
This special class can be passed to .ptype in order to specify the types of only some of the columns in a data frame.
Usage
partial_frame(...)
Arguments
... | Attributes of subclass |
Examples
pf <- partial_frame(x = double())
pf
vec_rbind(
  data.frame(x = 1L, y = "a"),
  data.frame(x = FALSE, z = 10),
  .ptype = partial_frame(x = double(), a = character())
)
FAQ - Is my class compatible with vctrs?
Description
vctrs provides a framework for working with vector classes in a generic way. However, it implements several compatibility fallbacks to base R methods. In this reference you will find how vctrs tries to be compatible with your vector class, and what base methods you need to implement for compatibility.
If you’re starting from scratch, we think you’ll find it easier to start using new_vctr() as documented in vignette("s3-vector"). This guide is aimed at developers with existing vector classes.
Aggregate operations with fallbacks
All vctrs operations are based on four primitive generics described in the next section. However there are many higher level operations. The most important ones implement fallbacks to base generics for maximum compatibility with existing classes.
vec_slice() falls back to the base [ generic if no vec_proxy() method is implemented. This way foreign classes that do not implement vec_restore() can restore attributes based on the new subsetted contents.
vec_c() and vec_rbind() now fall back to base::c() if the inputs have a common parent class with a c() method (only if they have no self-to-self vec_ptype2() method).
vctrs works hard to make your c() method succeed in various situations (with NULL and NA inputs, even as first input which would normally prevent dispatch to your method). The main downside compared to using vctrs primitives is that you can’t combine vectors of different classes since there is no extensible mechanism of coercion in c(), and it is less efficient in some cases.
The vctrs primitives
Most functions in vctrs are aggregate operations: they call other vctrs functions which themselves call other vctrs functions. The dependencies of a vctrs function are listed in the Dependencies section of its documentation page. Take a look at vec_count() for an example.
These dependencies form a tree whose leaves are the four vctrs primitives. Here is the diagram for vec_count():

The coercion generics
The coercion mechanism in vctrs is based on two generics:
See the theory overview.
Two objects with the same class and the same attributes are always considered compatible by ptype2 and cast. If the attributes or classes differ, they throw an incompatible type error.
Coercion errors are the main source of incompatibility with vctrs. See the howto guide if you need to implement methods for these generics.
The proxy and restoration generics
These generics are essential for vctrs but mostly optional. vec_proxy() defaults to an identity function and you normally don’t need to implement it. The proxy of a vector must be one of the atomic vector types, a list, or a data frame. By default, S3 lists that do not inherit from "list" do not have an identity proxy. In that case, you need to explicitly implement vec_proxy() or make your class inherit from list.
Runs
Description
vec_identify_runs() returns a vector of identifiers for the elements of x that indicate which run of repeated values they fall in. The number of runs is also returned as an attribute, n.
vec_run_sizes() returns an integer vector corresponding to the size of each run. This is identical to the times column from vec_unrep(), but is faster if you don't need the run keys.
vec_unrep() is a generalized base::rle(). It is documented alongside the "repeat" functions of vec_rep() and vec_rep_each(); look there for more information.
Usage
vec_identify_runs(x)
vec_run_sizes(x)
Arguments
x | A vector. |
Details
Unlike base::rle(), adjacent missing values are considered identical when constructing runs. For example, vec_identify_runs(c(NA, NA)) will return c(1, 1), not c(1, 2).
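As a small illustration of this difference, the sketch below compares run detection on a vector with adjacent missing values. It assumes a version of vctrs that exports both vec_identify_runs() and vec_run_sizes(); the variable names are illustrative.

```r
library(vctrs)

x <- c(1, NA, NA, 2)

# The two adjacent NAs share one run identifier
ids <- vec_identify_runs(x)
as.integer(ids)
#> [1] 1 2 2 3

# The number of runs is attached as the `n` attribute
attr(ids, "n")
#> [1] 3

# One size per run; the NA run has size 2
sizes <- vec_run_sizes(x)
sizes
#> [1] 1 2 1

# base::rle() treats each NA as its own run
rle_lengths <- rle(x)$lengths
rle_lengths
#> [1] 1 1 1 1
```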
Value
For vec_identify_runs(), an integer vector with the same size as x. A scalar integer attribute, n, is attached.
For vec_run_sizes(), an integer vector with size equal to the number of runs in x.
See Also
vec_unrep() for a generalized base::rle().
Examples
x <- c("a", "z", "z", "c", "a", "a")
vec_identify_runs(x)
vec_run_sizes(x)
vec_unrep(x)
y <- c(1, 1, 1, 2, 2, 3)
# With multiple columns, the runs are constructed rowwise
df <- data_frame(
  x = x,
  y = y
)
vec_identify_runs(df)
vec_run_sizes(df)
vec_unrep(df)
Register a method for a suggested dependency
Description
Generally, the recommended way to register an S3 method is to use the S3Method() namespace directive (often generated automatically by the @export roxygen2 tag). However, this technique requires that the generic be in an imported package, and sometimes you want to suggest a package, and only provide a method when that package is loaded. s3_register() can be called from your package's .onLoad() to dynamically register a method only if the generic's package is loaded.
Arguments
generic | Name of the generic in the form |
class | Name of the class |
method | Optionally, the implementation of the method. By default,this will be found by looking for a function called Note that providing |
Details
For R 3.5.0 and later, s3_register() is also useful when demonstrating class creation in a vignette, since method lookup no longer always involves the lexical scope. For R 3.6.0 and later, you can achieve a similar effect by using "delayed method registration", i.e. placing the following in your NAMESPACE file:
if (getRversion() >= "3.6.0") {
  S3method(package::generic, class)
}
Usage in other packages
To avoid taking a dependency on vctrs, you can copy the source of s3_register() into your own package. It is licensed under the permissive unlicense to make it crystal clear that we're happy for you to do this. There's no need to include the license or even credit us when using this function.
Examples
# A typical use case is to dynamically register tibble/pillar methods
# for your class. That way you avoid creating a hard dependency on packages
# that are not essential, while still providing finer control over
# printing when they are used.
.onLoad <- function(...) {
  s3_register("pillar::pillar_shaft", "vctrs_vctr")
  s3_register("tibble::type_sum", "vctrs_vctr")
}
Table S3 class
Description
These functions help the base table class fit into the vctrs type system by providing coercion and casting functions.
FAQ - How does coercion work in vctrs?
Description
This is an overview of the usage of vec_ptype2() and vec_cast() and their role in the vctrs coercion mechanism. Related topics:
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").
Combination mechanism in vctrs
The coercion system in vctrs is designed to make combination of multiple inputs consistent and extensible. Combinations occur in many places, such as row-binding, joins, subset-assignment, or grouped summary functions that use the split-apply-combine strategy. For example:
vec_c(TRUE, 1)
#> [1] 1 1
vec_c("a", 1)
#> Error in `vec_c()`:
#> ! Can't combine `..1` <character> and `..2` <double>.
vec_rbind(
  data.frame(x = TRUE),
  data.frame(x = 1, y = 2)
)
#>   x  y
#> 1 1 NA
#> 2 1  2
vec_rbind(
  data.frame(x = "a"),
  data.frame(x = 1, y = 2)
)
#> Error in `vec_rbind()`:
#> ! Can't combine `..1$x` <character> and `..2$x` <double>.
One major goal of vctrs is to provide a central place for implementing the coercion methods that make generic combinations possible. The two relevant generics are vec_ptype2() and vec_cast(). They both take two arguments and perform double dispatch, meaning that a method is selected based on the classes of both inputs.
The general mechanism for combining multiple inputs is:
Find the common type of a set of inputs by reducing (as in base::Reduce() or purrr::reduce()) the vec_ptype2() binary function over the set.
Convert all inputs to the common type with vec_cast().
Initialise the output vector as an instance of this common type with vec_init().
Fill the output vector with the elements of the inputs using vec_assign().
The last two steps may require vec_proxy() and vec_restore() implementations, unless the attributes of your class are constant and do not depend on the contents of the vector. We focus here on the first two steps, which require vec_ptype2() and vec_cast() implementations.
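The four steps of the general mechanism can be written out by hand for base types. This is a sketch for illustration only, not how vec_c() is implemented internally; the variable names (inputs, converted, out) are hypothetical.

```r
library(vctrs)

inputs <- list(TRUE, 2L, 3.5)

# Step 1: reduce vec_ptype2() over the inputs to find the common type
ptype <- Reduce(vec_ptype2, inputs)

# Step 2: convert every input to the common type
converted <- lapply(inputs, vec_cast, to = ptype)

# Step 3: initialise an output of the common type, sized for all inputs
sizes <- vapply(converted, vec_size, integer(1))
out <- vec_init(ptype, sum(sizes))

# Step 4: fill the output with the elements of each input in turn
start <- 0L
for (i in seq_along(converted)) {
  idx <- start + seq_len(sizes[[i]])
  out <- vec_assign(out, idx, converted[[i]])
  start <- start + sizes[[i]]
}

out
#> [1] 1.0 2.0 3.5
```

For these inputs the result matches vec_c(TRUE, 2L, 3.5).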
vec_ptype2()
Methods for vec_ptype2() are passed two prototypes, i.e. two inputs emptied of their elements. They implement two behaviours:
If the types of their inputs are compatible, indicate which of them is the richer type by returning it. If the types are of equal resolution, return either of the two.
Throw an error with stop_incompatible_type() when it can be determined from the attributes that the types of the inputs are not compatible.
Type compatibility
A type is compatible with another type if the values it represents are a subset or a superset of the values of the other type. The notion of “value” is to be interpreted at a high level, in particular it is not the same as the memory representation. For example, factors are represented in memory with integers but their values are more related to character vectors than to round numbers:
# Two factors are compatible
vec_ptype2(factor("a"), factor("b"))
#> factor()
#> Levels: a b
# Factors are compatible with a character
vec_ptype2(factor("a"), "b")
#> character(0)
# But they are incompatible with integers
vec_ptype2(factor("a"), 1L)
#> Error:
#> ! Can't combine `factor("a")` <factor<4d52a>> and `1L` <integer>.
Richness of type
Richness of type is not a very precise notion. It can be about richer data (for instance a double vector covers more values than an integer vector), richer behaviour (a data.table has richer behaviour than a data.frame), or both. If you have trouble determining which one of the two types is richer, it probably means they shouldn’t be automatically coercible.
Let’s look again at what happens when we combine a factor and a character:
vec_ptype2(factor("a"), "b")
#> character(0)
The ptype2 method for <character> and <factor<"a">> returns <character> because the former is a richer type. The factor can only contain "a" strings, whereas the character can contain any strings. In this sense, factors are a subset of character.
Note that another valid behaviour would be to throw an incompatible type error. This is what a strict factor implementation would do. We have decided to be laxer in vctrs because it is easy to inadvertently create factors instead of character vectors, especially with older versions of R where stringsAsFactors is still true by default.
Consistency and symmetry on permutation
Each ptype2 method should strive to have exactly the same behaviour when the inputs are permuted. This is not always possible, for example factor levels are aggregated in order:
vec_ptype2(factor(c("a", "c")), factor("b"))
#> factor()
#> Levels: a c b
vec_ptype2(factor("b"), factor(c("a", "c")))
#> factor()
#> Levels: b a c
In any case, permuting the input should not return a fundamentally different type or introduce an incompatible type error.
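The symmetry requirement can be checked directly for base types. The sketch below is illustrative only: it confirms that swapping an integer and a double gives the same common type, and that the two factor results above differ in level order but not in the set of levels.

```r
library(vctrs)

# Integer and double: the common type is double in both directions
int_dbl <- vec_ptype2(1L, 2.5)
dbl_int <- vec_ptype2(2.5, 1L)
identical(int_dbl, dbl_int)
#> [1] TRUE

# Factors: level order depends on argument order, the set of levels does not
ac_b <- vec_ptype2(factor(c("a", "c")), factor("b"))
b_ac <- vec_ptype2(factor("b"), factor(c("a", "c")))
setequal(levels(ac_b), levels(b_ac))
#> [1] TRUE
```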
Coercion hierarchy
The classes that you can coerce together form a coercion (or subtyping) hierarchy. Below is a schema of the hierarchy for the base types like integer and factor. In this diagram the directions of the arrows express which type is richer. They flow from the bottom (more constrained types) to the top (richer types).

A coercion hierarchy is distinct from the structural hierarchy implied by memory types and classes. For instance, in a structural hierarchy, factors are built on top of integers. But in the coercion hierarchy they are more related to character vectors. Similarly, subclasses are not necessarily coercible with their superclasses because the coercion and structural hierarchies are separate.
Implementing a coercion hierarchy
As a class implementor, you have two options. The simplest is to create an entirely separate hierarchy. The date and date-time classes are an example of an S3-based hierarchy that is completely separate. Alternatively, you can integrate your class in an existing hierarchy, typically by adding parent nodes on top of the hierarchy (your class is richer), by adding child nodes at the root of the hierarchy (your class is more constrained), or by inserting a node in the tree.
These coercion hierarchies are implicit, in the sense that they are implied by the vec_ptype2() implementations. There is no structured way to create or modify a hierarchy, instead you need to implement the appropriate coercion methods for all the types in your hierarchy, and diligently return the richer type in each case. The vec_ptype2() implementations are neither transitive nor inherited, so all pairwise methods between classes lying on a given path must be implemented manually. This is something we might make easier in the future.
vec_cast()
The second generic, vec_cast(), is the one that looks at the data and actually performs the conversion. Because it has access to more information than vec_ptype2(), it may be stricter and cause an error in more cases. vec_cast() has three possible behaviours:
Determine that the prototypes of the two inputs are not compatible. This must be decided in exactly the same way as for vec_ptype2(). Call stop_incompatible_cast() if you can determine from the attributes that the types are not compatible.
Detect incompatible values. Usually this is because the target type is too restricted for the values supported by the input type. For example, a fractional number can’t be converted to an integer. The method should throw an error in that case.
Return the input vector converted to the target type if all values are compatible. Whereas vec_ptype2() must return the same type when the inputs are permuted, vec_cast() is directional. It always returns the type of the right-hand side, or dies trying.
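The three behaviours can be observed with base types. This is a minimal sketch; the strings returned by the handlers are illustrative, and the condition classes used are "vctrs_error_incompatible" for incompatible types and "vctrs_error_cast_lossy" for incompatible values.

```r
library(vctrs)

# Incompatible prototypes: character can't be converted to integer
incompatible <- tryCatch(
  vec_cast("a", integer()),
  vctrs_error_incompatible = function(cnd) "incompatible types"
)

# Compatible prototypes but incompatible values: 1.5 has a fractional part
lossy <- tryCatch(
  vec_cast(1.5, integer()),
  vctrs_error_cast_lossy = function(cnd) "lossy cast"
)

# Compatible values: the input is returned with the type of `to`
ok <- vec_cast(c(1, 2), integer())
ok
#> [1] 1 2

# A lossy cast can be explicitly allowed by the caller
forced <- allow_lossy_cast(vec_cast(1.5, integer()))
forced
#> [1] 1
```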
Double dispatch
The dispatch mechanism for vec_ptype2() and vec_cast() looks like S3 but is actually a custom mechanism. Compared to S3, it has the following differences:
It dispatches on the classes of the first two inputs.
There is no inheritance of ptype2 and cast methods. This is because the S3 class hierarchy is not necessarily the same as the coercion hierarchy.
NextMethod() does not work. Parent methods must be called explicitly if necessary.
The default method is hard-coded.
Data frames
The determination of the common type of data frames with vec_ptype2() happens in three steps:
Match the columns of the two input data frames. If some columns don’t exist, they are created and filled with adequately typed NA values.
Find the common type for each column by calling vec_ptype2() on each pair of matched columns.
Find the common data frame type. For example the common type of a grouped tibble and a tibble is a grouped tibble because a grouped tibble is the richer type. The common type of a data table and a data frame is a data table.
vec_cast() operates similarly. If a data frame is cast to a target type that has fewer columns, this is an error.
If you are implementing coercion methods for data frames, you will need to explicitly call the parent methods that perform the common type determination or the type conversion described above. These are exported as df_ptype2() and df_cast().
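The column matching described above can be seen on two plain data frames. The sketch below is illustrative; the data frame names are hypothetical and stringsAsFactors = FALSE is set so that it behaves the same on older versions of R.

```r
library(vctrs)

df1 <- data.frame(x = 1L)
df2 <- data.frame(x = 1.5, y = "a", stringsAsFactors = FALSE)

# Columns are matched by name; `x` becomes double and `y` is added
ptype <- vec_ptype2(df1, df2)
names(ptype)
#> [1] "x" "y"

# Casting to the common type fills the missing column with typed NAs
casted <- vec_cast(df1, ptype)
is.na(casted$y)
#> [1] TRUE

# Casting to a target with fewer columns is an error
fewer <- tryCatch(
  vec_cast(data.frame(x = 1L, y = 2L), data.frame(x = integer())),
  error = function(cnd) "error"
)
fewer
#> [1] "error"
```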
Data frame fallbacks
Being too strict with data frame combinations would cause too much pain because there are many data frame subclasses in the wild that don’t implement vctrs methods. We have decided to implement a special fallback behaviour for foreign data frames. Incompatible data frames fall back to a base data frame:
df1 <- data.frame(x = 1)
df2 <- structure(df1, class = c("foreign_df", "data.frame"))
vec_rbind(df1, df2)
#>   x
#> 1 1
#> 2 1
When a tibble is involved, we fall back to tibble:
df3 <- tibble::as_tibble(df1)
vec_rbind(df1, df3)
#> # A tibble: 2 x 1
#>       x
#>   <dbl>
#> 1     1
#> 2     1
These fallbacks are not ideal but they make sense because all data frames share a common data structure. This is not generally the case for vectors. For example factors and characters have different representations, and it is not possible to find a fallback type mechanically.
However this fallback has a big downside: implementing vctrs methods for your data frame subclass is a breaking behaviour change. The proper coercion behaviour for your data frame class should be specified as soon as possible to limit the consequences of changing the behaviour of your class in R scripts.
FAQ - How does recycling work in vctrs and the tidyverse?
Description
Recycling describes the concept of repeating elements of one vector to match the size of another. There are two rules that underlie the “tidyverse” recycling rules:
Vectors of size 1 will be recycled to the size of any other vector
Otherwise, all vectors must have the same size
Examples
Vectors of size 1 are recycled to the size of any other vector:
tibble(x = 1:3, y = 1L)
#> # A tibble: 3 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     1
#> 3     3     1
This includes vectors of size 0:
tibble(x = integer(), y = 1L)
#> # A tibble: 0 x 2
#> # i 2 variables: x <int>, y <int>
If vectors aren’t size 1, they must all be the same size. Otherwise, an error is thrown:
tibble(x = 1:3, y = 4:7)
#> Error in `tibble()`:
#> ! Tibble columns must have compatible sizes.
#> * Size 3: Existing data.
#> * Size 4: Column `y`.
#> i Only values of size one are recycled.
vctrs backend
Packages in r-lib and the tidyverse generally use vec_size_common() and vec_recycle_common() as the backends for handling recycling rules.
vec_size_common() returns the common size of multiple vectors, after applying the recycling rules
vec_recycle_common() goes one step further, and actually recycles the vectors to their common size
vec_size_common(1:3, "x")
#> [1] 3
vec_recycle_common(1:3, "x")
#> [[1]]
#> [1] 1 2 3
#> 
#> [[2]]
#> [1] "x" "x" "x"
vec_size_common(1:3, c("x", "y"))
#> Error:
#> ! Can't recycle `..1` (size 3) to match `..2` (size 2).
Base R recycling rules
The recycling rules described here are stricter than the ones generally used by base R, which are:
If any vector is length 0, the output will be length 0
Otherwise, the output will be length max(length_x, length_y), and a warning will be thrown if the length of the longer vector is not an integer multiple of the length of the shorter vector.
We explore the base R rules in detail in vignette("type-size").
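The difference between the two sets of rules can be seen side by side. The following sketch is illustrative only; the handler strings are hypothetical and the error class caught is "vctrs_error_incompatible_size".

```r
library(vctrs)

# Base R silently recycles a length-2 vector against a length-4 vector
base_sum <- 1:4 + 1:2
base_sum
#> [1] 2 4 4 6

# The tidyverse rules reject the same combination
strict <- tryCatch(
  vec_recycle_common(1:4, 1:2),
  vctrs_error_incompatible_size = function(cnd) "incompatible sizes"
)
strict
#> [1] "incompatible sizes"

# Base R: any length-0 input gives a length-0 output
length(1:3 + integer())
#> [1] 0

# vctrs: only size 1 is recycled to size 0; size 3 is not
size_one_zero <- vec_size_common(1L, integer())
size_three_zero <- tryCatch(
  vec_size_common(1:3, integer()),
  vctrs_error_incompatible_size = function(cnd) "incompatible sizes"
)
```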
A 1d vector of unspecified type
Description
This is a partial type used to represent logical vectors that only contain NA. These require special handling because we want to allow NA to specify missingness without requiring a type.
Usage
unspecified(n = 0)
Arguments
n | Length of vector |
Examples
vec_ptype_show()
vec_ptype_show(NA)
vec_c(NA, factor("x"))
vec_c(NA, Sys.Date())
vec_c(NA, Sys.time())
vec_c(NA, list(1:3, 4:5))
Custom conditions for vctrs package
Description
These functions are called for their side effect of raising errors and warnings. These conditions have custom classes and structures to make testing easier.
Usage
stop_incompatible_type(
  x,
  y,
  ...,
  x_arg,
  y_arg,
  action = c("combine", "convert"),
  details = NULL,
  message = NULL,
  class = NULL,
  call = caller_env()
)
stop_incompatible_cast(
  x,
  to,
  ...,
  x_arg,
  to_arg,
  details = NULL,
  message = NULL,
  class = NULL,
  call = caller_env()
)
stop_incompatible_op(
  op,
  x,
  y,
  details = NULL,
  ...,
  message = NULL,
  class = NULL,
  call = caller_env()
)
stop_incompatible_size(
  x,
  y,
  x_size,
  y_size,
  ...,
  x_arg,
  y_arg,
  details = NULL,
  message = NULL,
  class = NULL,
  call = caller_env()
)
allow_lossy_cast(expr, x_ptype = NULL, to_ptype = NULL)
Arguments
x,y,to | Vectors |
...,class | Only use these fields when creating a subclass. |
x_arg,y_arg,to_arg | Argument names for |
action | An option to customize the incompatible type message dependingon the context. Errors thrown from |
details | Any additional human readable details. |
message | An overriding message for the error. |
call | The execution environment of a currently running function, e.g. |
x_ptype,to_ptype | Suppress only the casting errors where |
Value
stop_incompatible_*() unconditionally raise an error of class"vctrs_error_incompatible_*" and"vctrs_error_incompatible".
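Because these errors carry classes, callers can handle them by class instead of matching on the message text. A minimal sketch, with illustrative variable names:

```r
library(vctrs)

# Catch a failed combination by its condition class
caught <- tryCatch(
  vec_c("a", 1),
  vctrs_error_incompatible_type = function(cnd) cnd
)
inherits(caught, "vctrs_error_incompatible")
#> [1] TRUE

# Signalling the error directly produces a condition of the same class
direct <- tryCatch(
  stop_incompatible_type(1L, "a", x_arg = "x", y_arg = "y"),
  vctrs_error_incompatible_type = function(cnd) cnd
)
inherits(direct, "vctrs_error_incompatible_type")
#> [1] TRUE
```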
Examples
# Most of the time, `maybe_lossy_cast()` returns its input normally:
maybe_lossy_cast(
  c("foo", "bar"),
  NA,
  "",
  lossy = c(FALSE, FALSE),
  x_arg = "",
  to_arg = ""
)
# If `lossy` has any `TRUE`, an error is thrown:
try(maybe_lossy_cast(
  c("foo", "bar"),
  NA,
  "",
  lossy = c(FALSE, TRUE),
  x_arg = "",
  to_arg = ""
))
# Unless lossy casts are allowed:
allow_lossy_cast(
  maybe_lossy_cast(
    c("foo", "bar"),
    NA,
    "",
    lossy = c(FALSE, TRUE),
    x_arg = "",
    to_arg = ""
  )
)
vctrs methods for data frames
Description
These functions help the base data.frame class fit into the vctrs type system by providing coercion and casting functions.
Usage
## S3 method for class 'data.frame'
vec_ptype2(x, y, ...)
## S3 method for class 'data.frame'
vec_cast(x, to, ...)
Repeat a vector
Description
vec_rep() repeats an entire vector a set number of times.
vec_rep_each() repeats each element of a vector a set number of times.
vec_unrep() compresses a vector with repeated values. The repeated values are returned as a key alongside the number of times each key is repeated.
Usage
vec_rep(
  x,
  times,
  ...,
  error_call = current_env(),
  x_arg = "x",
  times_arg = "times"
)
vec_rep_each(
  x,
  times,
  ...,
  error_call = current_env(),
  x_arg = "x",
  times_arg = "times"
)
vec_unrep(x)
Arguments
x | A vector. |
times | For For |
... | These dots are for future extensions and must be empty. |
error_call | The execution environment of a currently running function, e.g. |
x_arg,times_arg | Argument names for errors. |
Details
Using vec_unrep() and vec_rep_each() together is similar to using base::rle() and base::inverse.rle(). The following invariant shows the relationship between the two functions:
compressed <- vec_unrep(x)
identical(x, vec_rep_each(compressed$key, compressed$times))
There are two main differences between vec_unrep() and base::rle():
vec_unrep() treats adjacent missing values as equivalent, while rle() treats them as different values.
vec_unrep() works along the size of x, while rle() works along its length. This means that vec_unrep() works on data frames by compressing repeated rows.
Value
For vec_rep(), a vector the same type as x with size vec_size(x) * times.
For vec_rep_each(), a vector the same type as x with size sum(vec_recycle(times, vec_size(x))).
For vec_unrep(), a data frame with two columns, key and times. key is a vector with the same type as x, and times is an integer vector.
Dependencies
Examples
# Repeat the entire vector
vec_rep(1:2, 3)
# Repeat within each vector
vec_rep_each(1:2, 3)
x <- vec_rep_each(1:2, c(3, 4))
x
# After using `vec_rep_each()`, you can recover the original vector
# with `vec_unrep()`
vec_unrep(x)
df <- data.frame(x = 1:2, y = 3:4)
# `rep()` repeats columns of data frames, and returns lists
rep(df, each = 2)
# `vec_rep()` and `vec_rep_each()` repeat rows, and return data frames
vec_rep(df, 2)
vec_rep_each(df, 2)
# `rle()` treats adjacent missing values as different
y <- c(1, NA, NA, 2)
rle(y)
# `vec_unrep()` treats them as equivalent
vec_unrep(y)
Set operations
Description
vec_set_intersect() returns all values in both x and y.
vec_set_difference() returns all values in x but not y. Note that this is an asymmetric set difference, meaning it is not commutative.
vec_set_union() returns all values in either x or y.
vec_set_symmetric_difference() returns all values in either x or y but not both. This is a commutative difference.
Because these are set operations, these functions only return unique values from x and y, returned in the order they first appeared in the original input. Names of x and y are retained on the result, but names are always taken from x if the value appears in both inputs.
These functions work similarly to intersect(), setdiff(), and union(), but don't strip attributes and can be used with data frames.
Usage
vec_set_intersect(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)
vec_set_difference(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)
vec_set_union(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)
vec_set_symmetric_difference(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)
Arguments
x,y | A pair of vectors. |
... | These dots are for future extensions and must be empty. |
ptype | If |
x_arg,y_arg | Argument names for |
error_call | The execution environment of a currently running function, e.g. |
Details
Missing values are treated as equal to other missing values. For doubles and complexes, NaN are equal to other NaN, but not to NA.
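The treatment of NA and NaN, and the name handling described earlier, can be illustrated with a short sketch. It assumes a version of vctrs that exports the vec_set_*() functions; the variable names are illustrative.

```r
library(vctrs)

x <- c(1, NA, NaN, NA)

# NA matches NA, but not NaN
only_na <- vec_set_intersect(x, NA_real_)
only_na
#> [1] NA

# NaN matches NaN, but not NA
only_nan <- vec_set_intersect(x, NaN)
only_nan
#> [1] NaN

# Removing NA leaves NaN in place; duplicates are dropped
without_na <- vec_set_difference(x, NA_real_)
without_na
#> [1]   1 NaN

# Names come from `x` when a value appears in both inputs
named <- vec_set_union(c(a = 1, b = 2), c(z = 2, w = 3))
named
#> a b w 
#> 1 2 3
```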
Value
A vector of the common type of x and y (or ptype, if supplied) containing the result of the corresponding set function.
Dependencies
vec_set_intersect()
vec_set_difference()
vec_set_union()
vec_set_symmetric_difference()
Examples
x <- c(1, 2, 1, 4, 3)
y <- c(2, 5, 5, 1)
# All unique values in both `x` and `y`.
# Duplicates in `x` and `y` are always removed.
vec_set_intersect(x, y)
# All unique values in `x` but not `y`
vec_set_difference(x, y)
# All unique values in either `x` or `y`
vec_set_union(x, y)
# All unique values in either `x` or `y` but not both
vec_set_symmetric_difference(x, y)
# These functions can also be used with data frames
x <- data_frame(
  a = c(2, 3, 2, 2),
  b = c("j", "k", "j", "l")
)
y <- data_frame(
  a = c(1, 2, 2, 2, 3),
  b = c("j", "l", "j", "l", "j")
)
vec_set_intersect(x, y)
vec_set_difference(x, y)
vec_set_union(x, y)
vec_set_symmetric_difference(x, y)
# Vector names don't affect set membership, but if you'd like to force
# them to, you can transform the vector into a two column data frame
x <- c(a = 1, b = 2, c = 2, d = 3)
y <- c(c = 2, b = 1, a = 3, d = 3)
vec_set_intersect(x, y)
x <- data_frame(name = names(x), value = unname(x))
y <- data_frame(name = names(y), value = unname(y))
vec_set_intersect(x, y)
Arithmetic operations
Description
This generic provides a common double dispatch mechanism for all infix operators (+, -, /, *, ^, %%, %/%, !, &, |). It is used to power the default arithmetic and boolean operators for vctrs objects, overcoming the limitations of the base Ops generic.
Usage
vec_arith(op, x, y, ...)
## Default S3 method:
vec_arith(op, x, y, ...)
## S3 method for class 'logical'
vec_arith(op, x, y, ...)
## S3 method for class 'numeric'
vec_arith(op, x, y, ...)
vec_arith_base(op, x, y)
MISSING()
Arguments
op | An arithmetic operator as a string |
x,y | A pair of vectors. For |
... | These dots are for future extensions and must be empty. |
Details
vec_arith_base() is provided as a convenience for writing methods. It recycles x and y to common length then calls the base operator with the underlying vec_data().
vec_arith() is also used in diff.vctrs_vctr() method via -.
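The recycling performed by vec_arith_base() follows the vctrs size rules rather than the base R ones. The sketch below is illustrative only and uses plain numeric inputs; the handler string is hypothetical.

```r
library(vctrs)

# A size-1 input is recycled to the size of the other input
plus_one <- vec_arith_base("+", 1:3, 1L)
plus_one
#> [1] 2 3 4

# vec_arith() on two numeric inputs gives the same result as the base operator
doubled <- vec_arith("*", 2, 1:3)
doubled
#> [1] 2 4 6

# Sizes 3 and 2 have no common size, so this is an error rather than
# silent recycling
mismatch <- tryCatch(
  vec_arith_base("+", 1:3, 1:2),
  vctrs_error_incompatible_size = function(cnd) "incompatible sizes"
)
mismatch
#> [1] "incompatible sizes"
```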
See Also
stop_incompatible_op() for signalling that an arithmetic operation is not permitted/supported.
See vec_math() for the equivalent for the unary mathematical functions.
Examples
d <- as.Date("2018-01-01")
dt <- as.POSIXct("2018-01-02 12:00")
t <- as.difftime(12, unit = "hours")
vec_arith("-", dt, 1)
vec_arith("-", dt, t)
vec_arith("-", dt, d)
vec_arith("+", dt, 86400)
vec_arith("+", dt, t)
vec_arith("+", t, t)
vec_arith("/", t, t)
vec_arith("/", t, 2)
vec_arith("*", t, 2)
Convert to an index vector
Description
vec_as_index() has been renamed to vec_as_location() and is deprecated as of vctrs 0.2.2.
Usage
vec_as_index(i, n, names = NULL)
Arguments
i | An integer, character or logical vector specifying the locations or names of the observations to get/set. Specify |
n | A single integer representing the total size of the object that |
names | If |
Create a vector of locations
Description
These helpers provide a means of standardizing common indexing methods such as integer, character or logical indexing.
vec_as_location() accepts integer, character, or logical vectors of any size. The output is always an integer vector that is suitable for subsetting with [ or vec_slice(). It might be a different size than the input because negative selections are transformed to positive ones and logical vectors are transformed to a vector of indices for the TRUE locations.
vec_as_location2() accepts a single number or string. It returns a single location as an integer vector of size 1. This is suitable for extracting with [[.
num_as_location() and num_as_location2() are specialized variants that have extra options for numeric indices.
Usage
vec_as_location(
  i,
  n,
  names = NULL,
  ...,
  missing = c("propagate", "remove", "error"),
  arg = caller_arg(i),
  call = caller_env()
)
num_as_location(
  i,
  n,
  ...,
  missing = c("propagate", "remove", "error"),
  negative = c("invert", "error", "ignore"),
  oob = c("error", "remove", "extend"),
  zero = c("remove", "error", "ignore"),
  arg = caller_arg(i),
  call = caller_env()
)
vec_as_location2(
  i,
  n,
  names = NULL,
  ...,
  missing = c("error", "propagate"),
  arg = caller_arg(i),
  call = caller_env()
)
num_as_location2(
  i,
  n,
  ...,
  negative = c("error", "ignore"),
  missing = c("error", "propagate"),
  arg = caller_arg(i),
  call = caller_env()
)
Arguments
i | An integer, character or logical vector specifying the locations or names of the observations to get/set. Specify |
n | A single integer representing the total size of the object that |
names | If |
... | These dots are for future extensions and must be empty. |
missing | How should missing
By default, vector subscripts propagate missing values but scalar subscripts error on them. Propagated missing values can't be combined with negative indices when |
arg | The argument name to be displayed in error messages. |
call | The execution environment of a currently running function, e.g. |
negative | How should negative
|
oob | How should out-of-bounds
|
zero | How should zero
|
Value
vec_as_location() and num_as_location() return an integer vector that can be used as an index in a subsetting operation.
vec_as_location2() and num_as_location2() return an integer of size 1 that can be used as a scalar index for extracting an element.
Examples
x <- array(1:6, c(2, 3))
dimnames(x) <- list(c("r1", "r2"), c("c1", "c2", "c3"))

# The most common use case validates row indices
vec_as_location(1, vec_size(x))

# Negative indices are inverted: they select every location except
# the ones specified
vec_as_location(-1, vec_size(x))

# Character vectors can be used if `names` are provided
vec_as_location("r2", vec_size(x), rownames(x))

# You can also construct an index for dimensions other than the first
vec_as_location(c("c2", "c1"), ncol(x), colnames(x))
Retrieve and repair names
Description
vec_as_names() takes a character vector of names and repairs it according to the repair argument. It is the r-lib and tidyverse equivalent of base::make.names().
vctrs deals with a few levels of name repair:
minimal names exist. The names attribute is not NULL. The name of an unnamed element is "" and never NA. For instance, vec_as_names() always returns minimal names and data frames created by the tibble package have names that are, at least, minimal.
unique names are minimal, have no duplicates, and can be used where a variable name is expected. Empty names, ..., and .. followed by a sequence of digits are banned. All columns can be accessed by name via df[["name"]] and df$`name` and with(df, `name`).
universal names are unique and syntactic (see Details for more). Names work everywhere, without quoting: df$name and with(df, name) and lm(name1 ~ name2, data = df) and dplyr::select(df, name) all work.
universal implies unique, unique implies minimal. These levels are nested.
Usage
vec_as_names(
  names,
  ...,
  repair = c("minimal", "unique", "universal", "check_unique", "unique_quiet", "universal_quiet"),
  repair_arg = NULL,
  quiet = FALSE,
  call = caller_env()
)
Arguments
names | A character vector. |
... | These dots are for future extensions and must be empty. |
repair | Either a string or a function. If a string, it must be one of
The The options |
repair_arg | If specified and |
quiet | By default, the user is informed of any renaming caused by repairing the names. This only concerns unique and universal repairing. Set Users can silence the name repair messages by setting the |
call | The execution environment of a currently running function, e.g. |
minimal names
minimal names exist. The names attribute is not NULL. The name of an unnamed element is "" and never NA.
Examples:
Original names of a vector with length 3: NULL
                           minimal names: "" "" ""

                          Original names: "x" NA
                           minimal names: "x" ""
unique names
unique names are minimal, have no duplicates, and can be used (possibly with backticks) in contexts where a variable is expected. Empty names, ..., and .. followed by a sequence of digits are banned. If a data frame has unique names, you can index it by name, and also access the columns by name. In particular, df[["name"]] and df$`name` and also with(df, `name`) always work.
There are many ways to make names unique. We append a suffix of the form ...j to any name that is "" or a duplicate, where j is the position. We also change ..# and ... to ...#.
Example:
Original names:     ""     "x"     ""   "y"     "x"  "..2"  "..."
  unique names: "...1" "x...2" "...3"   "y" "x...5" "...6" "...7"
Pre-existing suffixes of the form ...j are always stripped, prior to making names unique, i.e. reconstructing the suffixes. If this interacts poorly with your names, you should take control of name repair.
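The unique repair above can be reproduced directly. This is a small sketch assuming vctrs is installed; `quiet = TRUE` only silences the renaming messages.

```r
library(vctrs)

nms <- c("", "x", "", "y", "x", "..2", "...")

# Empty and duplicated names receive a `...j` suffix based on position
repaired <- vec_as_names(nms, repair = "unique", quiet = TRUE)
repaired

# Pre-existing `...j` suffixes are stripped first, then rebuilt
rebuilt <- vec_as_names(c("a...5", "a"), repair = "unique", quiet = TRUE)
rebuilt
```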
universal names
universal names are unique and syntactic, meaning they:
Are never empty (inherited from unique).
Have no duplicates (inherited from unique).
Are not .... Do not have the form ..i, where i is a number (inherited from unique).
Consist of letters, numbers, and the dot . or underscore _ characters.
Start with a letter or start with the dot . not followed by a number.
Are not a reserved word, e.g., if or function or TRUE.
If a vector has universal names, variable names can be used "as is" in code. They work well with nonstandard evaluation, e.g., df$name works.
vctrs has a different method of making names syntactic than base::make.names(). In general, vctrs prepends one or more dots . until the name is syntactic.
Examples:
 Original names:     ""     "x"     NA     "x"
universal names: "...1" "x...2" "...3" "x...4"

 Original names: "(y)"  "_z" ".2fa" "FALSE"
universal names: ".y." "._z" "..2fa" ".FALSE"
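To see how the dot-prepending strategy differs from base R, the second example above can be run next to base::make.names(). This is an illustrative sketch, assuming vctrs is installed.

```r
library(vctrs)

nms <- c("(y)", "_z", ".2fa", "FALSE")

# vctrs replaces invalid characters with dots and prepends dots where needed
universal <- vec_as_names(nms, repair = "universal", quiet = TRUE)
universal

# base R follows a different scheme for the same input
base_names <- make.names(nms)
base_names
```

Both results are syntactic, but they need not be the same strings.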
See Also
rlang::names2() returns the names of an object, after making them minimal.
Examples
# By default, `vec_as_names()` returns minimal names:
vec_as_names(c(NA, NA, "foo"))

# You can make them unique:
vec_as_names(c(NA, NA, "foo"), repair = "unique")

# Universal repairing fixes any non-syntactic name:
vec_as_names(c("_foo", "+"), repair = "universal")
Repair names with legacy method
Description
This standardises names with the legacy approach that was used in tidyverse packages (such as tibble, tidyr, and readxl) before vec_as_names() was implemented. This tool is meant to help with transitioning to the new name repairing standard and will be deprecated and removed from the package some time in the future.
Usage
vec_as_names_legacy(names, prefix = "V", sep = "")
Arguments
names | A character vector. |
prefix,sep | Prefix and separator for repaired names. |
Examples
if (rlang::is_installed("tibble")) {

library(tibble)

# Names repair is turned off by default in tibble:
try(tibble(a = 1, a = 2))

# You can turn it on by supplying a repair method:
tibble(a = 1, a = 2, .name_repair = "universal")

# If you prefer the legacy method, use `vec_as_names_legacy()`:
tibble(a = 1, a = 2, .name_repair = vec_as_names_legacy)

}
Convert to a base subscript type
Description
Convert i to the base type expected by vec_as_location() or vec_as_location2(). The values of the subscript type are not checked in any way (length, missingness, negative elements).
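The following sketch shows the type conversion on its own, before any location checks take place. It assumes vctrs is installed and relies on the default "cast" behaviour.

```r
library(vctrs)

# Whole-number doubles are cast to integer subscripts
typeof(vec_as_subscript(2))

# Character and logical subscripts keep their base type
typeof(vec_as_subscript("a"))
typeof(vec_as_subscript(c(TRUE, NA)))

# Values are not validated here: negative, zero and missing
# elements pass through unchanged
vec_as_subscript(c(-1, 0, NA))
```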
Usage
vec_as_subscript(
  i,
  ...,
  logical = c("cast", "error"),
  numeric = c("cast", "error"),
  character = c("cast", "error"),
  arg = NULL,
  call = caller_env()
)

vec_as_subscript2(
  i,
  ...,
  numeric = c("cast", "error"),
  character = c("cast", "error"),
  arg = NULL,
  call = caller_env()
)
Arguments
i | An integer, character or logical vector specifying the locations or names of the observations to get/set. Specify |
... | These dots are for future extensions and must be empty. |
logical, numeric, character | How to handle logical, numeric, and character subscripts. If If |
arg | The argument name to be displayed in error messages. |
call | The execution environment of a currently running function, e.g. |
Assert an argument has known prototype and/or size
Description
vec_is() is a predicate that checks if its input is a vector that conforms to a prototype and/or a size.
vec_assert() throws an error when the input is not a vector or doesn't conform.
Usage
vec_assert(
  x,
  ptype = NULL,
  size = NULL,
  arg = caller_arg(x),
  call = caller_env()
)

vec_is(x, ptype = NULL, size = NULL)
Arguments
x | A vector argument to check. |
ptype | Prototype to compare against. If the prototype has a class, its |
size | A single integer size against which to compare. |
arg | Name of argument being checked. This is used in error messages. The label of the expression passed as |
call | The execution environment of a currently running function, e.g. |
Value
vec_is() returns TRUE or FALSE. vec_assert() either throws a typed error (see section on error types) or returns x, invisibly.
Error types
vec_is() never throws. vec_assert() throws the following errors:
If the input is not a vector, an error of class "vctrs_error_scalar_type" is raised.
If the prototype doesn't match, an error of class "vctrs_error_assert_ptype" is raised.
If the size doesn't match, an error of class "vctrs_error_assert_size" is raised.
The prototype and size errors both inherit from "vctrs_error_assert".
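Because the errors are typed, callers can branch on the class with tryCatch(). The sketch below assumes vctrs is installed; `check()` is a hypothetical helper written for this illustration, not part of vctrs.

```r
library(vctrs)

# Hypothetical helper: report which assertion failed, if any
check <- function(x) {
  tryCatch(
    {
      vec_assert(x, ptype = integer(), size = 3)
      "ok"
    },
    vctrs_error_assert_ptype = function(cnd) "wrong prototype",
    vctrs_error_assert_size = function(cnd) "wrong size",
    vctrs_error_scalar_type = function(cnd) "not a vector"
  )
}

check(1:3)         # conforms
check(c(1, 2, 3))  # double, not integer
check(1:2)         # size 2, not 3
check(mean)        # a function is not a vector

# The predicate form never throws
vec_is(1:2, ptype = integer(), size = 3)
```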
Lifecycle
Both vec_is() and vec_assert() are questioning because their ptype arguments have semantics that are challenging to define clearly and are rarely useful.
Use obj_is_vector() or obj_check_vector() for vector checks
Use vec_check_size() for size checks
Use vec_cast(), inherits(), or simple type predicates like rlang::is_logical() for specific type checks
Vectors and scalars
Informally, a vector is a collection that makes sense to use as a column in a data frame. The following rules define whether or not x is considered a vector.
If no vec_proxy() method has been registered, x is a vector if:
The base type of the object is atomic: "logical", "integer", "double", "complex", "character", or "raw".
x is a list, as defined by obj_is_list().
x is a data.frame.
If a vec_proxy() method has been registered, x is a vector if:
The proxy satisfies one of the above conditions.
The base type of the proxy is "list", regardless of its class. S3 lists are thus treated as scalars unless they implement a vec_proxy() method.
Otherwise an object is treated as scalar and cannot be used as a vector. In particular:
NULL is not a vector.
S3 lists like lm objects are treated as scalars by default.
Objects of type expression are not treated as vectors.
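These rules can be checked with obj_is_vector(), the predicate recommended in the Lifecycle section. This is a short sketch assuming vctrs 0.6.0 or later is installed.

```r
library(vctrs)

# Vectors: atomic types, bare lists and data frames
obj_is_vector(1:3)
obj_is_vector(list(1, "a"))
obj_is_vector(mtcars)

# Scalars: NULL, expressions and S3 lists without a vec_proxy() method
obj_is_vector(NULL)
obj_is_vector(expression(x))
obj_is_vector(lm(mpg ~ wt, data = mtcars))
```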
Combine many data frames into one data frame
Description
This pair of functions binds together data frames (and vectors), either row-wise or column-wise. Row-binding creates a data frame with common type across all arguments. Column-binding creates a data frame with common length across all arguments.
Usage
vec_rbind(
  ...,
  .ptype = NULL,
  .names_to = rlang::zap(),
  .name_repair = c("unique", "universal", "check_unique", "unique_quiet", "universal_quiet"),
  .name_spec = NULL,
  .error_call = current_env()
)

vec_cbind(
  ...,
  .ptype = NULL,
  .size = NULL,
  .name_repair = c("unique", "universal", "check_unique", "minimal", "unique_quiet", "universal_quiet"),
  .error_call = current_env()
)
Arguments
... | Data frames or vectors. When the inputs are named:
|
.ptype | If Alternatively, you can supply |
.names_to | This controls what to do with input names supplied in
|
.name_repair | One of With |
.name_spec | A name specification (as documented in |
.error_call | The execution environment of a currently running function, e.g. |
.size | If, Alternatively, specify the desired number of rows, and any inputs of length 1 will be recycled appropriately. |
Value
A data frame, or subclass of data frame.
If ... is a mix of different data frame subclasses, vec_ptype2() will be used to determine the output type. For vec_rbind(), this will determine the type of the container and the type of each column; for vec_cbind() it only determines the type of the output container. If there are no non-NULL inputs, the result will be data.frame().
Invariants
All inputs are first converted to a data frame. The conversion for 1d vectors depends on the direction of binding:
For vec_rbind(), each element of the vector becomes a column in a single row.
For vec_cbind(), each element of the vector becomes a row in a single column.
Once the inputs have all become data frames, the following invariants are observed for row-binding:
vec_size(vec_rbind(x, y)) == vec_size(x) + vec_size(y)
vec_ptype(vec_rbind(x, y)) == vec_ptype_common(x, y)
Note that if an input is an empty vector, it is first converted to a 1-row data frame with 0 columns. Despite being empty, its effective size for the total number of rows is 1.
For column-binding, the following invariants apply:
vec_size(vec_cbind(x, y)) == vec_size_common(x, y)
vec_ptype(vec_cbind(x, y)) == vec_cbind(vec_ptype(x), vec_ptype(y))
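The size invariants can be spot-checked on small inputs. The data frames below are made up for the illustration, and the sketch assumes vctrs is installed.

```r
library(vctrs)

# Row-binding: sizes add up and columns take the common type
x <- data.frame(a = 1:2)
y <- data.frame(a = c(0.5, 1.5, 2.5))
rows <- vec_rbind(x, y)
vec_size(rows) == vec_size(x) + vec_size(y)
typeof(rows$a)

# Column-binding: inputs are recycled to their common size
p <- data.frame(a = 1)
q <- data.frame(b = 1:3)
cols <- vec_cbind(p, q)
vec_size(cols) == vec_size_common(p, q)
```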
Dependencies
vctrs dependencies
base dependencies of vec_rbind()
If columns to combine inherit from a common class, vec_rbind() falls back to base::c() if there exists a c() method implemented for this class hierarchy.
See Also
vec_c() for combining 1d vectors.
Examples
# row binding -----------------------------------------

# common columns are coerced to common class
vec_rbind(
  data.frame(x = 1),
  data.frame(x = FALSE)
)

# unique columns are filled with NAs
vec_rbind(
  data.frame(x = 1),
  data.frame(y = "x")
)

# null inputs are ignored
vec_rbind(
  data.frame(x = 1),
  NULL,
  data.frame(x = 2)
)

# bare vectors are treated as rows
vec_rbind(
  c(x = 1, y = 2),
  c(x = 3)
)

# default names will be supplied if arguments are not named
vec_rbind(
  1:2,
  1:3,
  1:4
)

# column binding --------------------------------------

# each input is recycled to have common length
vec_cbind(
  data.frame(x = 1),
  data.frame(y = 1:3)
)

# bare vectors are treated as columns
vec_cbind(
  data.frame(x = 1),
  y = letters[1:3]
)

# if you supply a named data frame, it is packed in a single column
data <- vec_cbind(
  x = data.frame(a = 1, b = 2),
  y = 1
)
data

# Packed data frames are nested in a single column. This makes it
# possible to access it through a single name:
data$x

# since the base print method is suboptimal with packed data
# frames, it is recommended to use tibble to work with these:
if (rlang::is_installed("tibble")) {
  vec_cbind(x = tibble::tibble(a = 1, b = 2), y = 1)
}

# duplicate names are flagged
vec_cbind(x = 1, x = 2)
Combine many vectors into one vector
Description
Combine all arguments into a new vector of common type.
Usage
vec_c(
  ...,
  .ptype = NULL,
  .name_spec = NULL,
  .name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet", "universal_quiet"),
  .error_arg = "",
  .error_call = current_env()
)
Arguments
... | Vectors to coerce. |
.ptype | If Alternatively, you can supply |
.name_spec | A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like
See the name specification topic. |
.name_repair | How to repair names, see |
.error_arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
.error_call | The execution environment of a currently running function, e.g. |
Value
A vector with class given by .ptype, and length equal to the sum of the vec_size() of the contents of ....
The vector will have names if the individual components have names (inner names) or if the arguments are named (outer names). If both inner and outer names are present, an error is thrown unless a .name_spec is provided.
Invariants
vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)
vec_ptype(vec_c(x, y)) == vec_ptype_common(x, y).
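A quick check of both invariants, plus the naming rule from the Value section. The inputs are arbitrary and the sketch assumes vctrs is installed.

```r
library(vctrs)

x <- c(TRUE, FALSE)
y <- c(1.5, 2.5, 3.5)
out <- vec_c(x, y)

# Size is the sum of the input sizes; type is the common type
vec_size(out) == vec_size(x) + vec_size(y)
identical(vec_ptype(out), vec_ptype_common(x, y))

# An outer name on an input of size > 1 needs a name specification
named <- vec_c(name = 1:3, .name_spec = "{outer}_{inner}")
names(named)
```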
Dependencies
vctrs dependencies
vec_cast_common() with fallback
base dependencies
If inputs inherit from a common class hierarchy, vec_c() falls back to base::c() if there exists a c() method implemented for this class hierarchy.
See Also
vec_cbind()/vec_rbind() for combining data frames by rows or columns.
Examples
vec_c(FALSE, 1L, 1.5)

# Date/times --------------------------
c(Sys.Date(), Sys.time())
c(Sys.time(), Sys.Date())

vec_c(Sys.Date(), Sys.time())
vec_c(Sys.time(), Sys.Date())

# Factors -----------------------------
c(factor("a"), factor("b"))
vec_c(factor("a"), factor("b"))

# By default, named inputs must be length 1:
vec_c(name = 1)
try(vec_c(name = 1:3))

# Pass a name specification to work around this:
vec_c(name = 1:3, .name_spec = "{outer}_{inner}")

# See `?name_spec` for more examples of name specifications.
Cast a vector to a specified type
Description
vec_cast() provides directional conversions from one type of vector to another. Along with vec_ptype2(), this generic forms the foundation of type coercions in vctrs.
Usage
vec_cast(x, to, ..., x_arg = caller_arg(x), to_arg = "", call = caller_env())

vec_cast_common(..., .to = NULL, .arg = "", .call = caller_env())

## S3 method for class 'logical'
vec_cast(x, to, ...)

## S3 method for class 'integer'
vec_cast(x, to, ...)

## S3 method for class 'double'
vec_cast(x, to, ...)

## S3 method for class 'complex'
vec_cast(x, to, ...)

## S3 method for class 'raw'
vec_cast(x, to, ...)

## S3 method for class 'character'
vec_cast(x, to, ...)

## S3 method for class 'list'
vec_cast(x, to, ...)
Arguments
x | Vectors to cast. |
to,.to | Type to cast to. If |
... | For |
x_arg | Argument name for |
to_arg | Argument name |
call, .call | The execution environment of a currently running function, e.g. |
.arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
Value
A vector the same length as x with the same type as to, or an error if the cast is not possible. An error is generated if information is lost when casting between compatible types (i.e. when there is no 1-to-1 mapping for a specific value).
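The lossless and lossy cases can be contrasted in a few lines. This is a sketch assuming vctrs is installed; the string returned by the error handler is arbitrary.

```r
library(vctrs)

# Every value has an exact integer counterpart, so the cast succeeds
exact <- vec_cast(c(1, 2), integer())
exact

# 1.5 has no integer counterpart, so the cast is refused
refused <- tryCatch(
  vec_cast(c(1, 1.5), integer()),
  error = function(cnd) "lossy cast refused"
)
refused

# Opting in to the lossy result
forced <- allow_lossy_cast(vec_cast(c(1, 1.5), integer()))
forced
```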
Implementing coercion methods
For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").
Dependencies of vec_cast_common()
vctrs dependencies
base dependencies
Some functions enable a base-class fallback for vec_cast_common(). In that case the inputs are deemed compatible when they have the same base type and inherit from the same base class.
See Also
Call stop_incompatible_cast() when you determine from the attributes that an input can't be cast to the target type.
Examples
# x is a double, but no information is lost
vec_cast(1, integer())

# When information is lost the cast fails
try(vec_cast(c(1, 1.5), integer()))
try(vec_cast(c(1, 2), logical()))

# You can suppress this error and get the partial results
allow_lossy_cast(vec_cast(c(1, 1.5), integer()))
allow_lossy_cast(vec_cast(c(1, 2), logical()))

# By default this suppresses all lossy cast errors without
# distinction, but you can be specific about what cast is allowed
# by supplying prototypes
allow_lossy_cast(vec_cast(c(1, 1.5), integer()), to_ptype = integer())
try(allow_lossy_cast(vec_cast(c(1, 2), logical()), to_ptype = integer()))

# No sensible coercion is possible so an error is generated
try(vec_cast(1.5, factor("a")))

# Cast to common type
vec_cast_common(factor("a"), factor(c("a", "b")))
Frame prototype
Description
This is an experimental generic that returns zero-column variants of a data frame. It is needed for vec_cbind(), to work around the lack of colwise primitives in vctrs. Expect changes.
Usage
vec_cbind_frame_ptype(x, ...)
Arguments
x | A data frame. |
... | These dots are for future extensions and must be empty. |
Chopping
Description
vec_chop() provides an efficient method to repeatedly slice a vector. It captures the pattern of map(indices, vec_slice, x = x). When no indices are supplied, it is generally equivalent to as.list().
list_unchop() combines a list of vectors into a single vector, placing elements in the output according to the locations specified by indices. It is similar to vec_c(), but gives greater control over how the elements are combined. When no indices are supplied, it is identical to vec_c(), but typically a little faster.
If indices selects every value in x exactly once, in any order, then list_unchop() is the inverse of vec_chop() and the following invariant holds:
list_unchop(vec_chop(x, indices = indices), indices = indices) == x
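The round trip can be verified on a small vector. This is a sketch assuming vctrs is installed; the indices are arbitrary but cover each location exactly once.

```r
library(vctrs)

x <- c("a", "b", "c", "d")
indices <- list(2, c(3, 1), 4)

# Slice `x` into pieces in the order given by `indices`
pieces <- vec_chop(x, indices = indices)
pieces

# Placing each piece back at its original locations recovers `x`
roundtrip <- list_unchop(pieces, indices = indices)
identical(roundtrip, x)

# `sizes` is shorthand for consecutive runs of locations
identical(
  vec_chop(1:5, sizes = c(2, 3)),
  vec_chop(1:5, indices = list(1:2, 3:5))
)
```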
Usage
vec_chop(x, ..., indices = NULL, sizes = NULL)

list_unchop(
  x,
  ...,
  indices = NULL,
  ptype = NULL,
  name_spec = NULL,
  name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet", "universal_quiet"),
  error_arg = "x",
  error_call = current_env()
)
Arguments
x | A vector |
... | These dots are for future extensions and must be empty. |
indices | For For |
sizes | An integer vector of non-negative sizes representing sequential indices to slice For example,
|
ptype | If |
name_spec | A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like
See the name specification topic. |
name_repair | How to repair names, see |
error_arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
error_call | The execution environment of a currently running function, e.g. |
Value
vec_chop(): A list where each element has the same type as x. The size of the list is equal to vec_size(indices), vec_size(sizes), or vec_size(x) depending on whether or not indices or sizes is provided.
list_unchop(): A vector of type vec_ptype_common(!!!x), or ptype, if specified. The size is computed as vec_size_common(!!!indices) unless the indices are NULL, in which case the size is vec_size_common(!!!x).
Dependencies of vec_chop()
Dependencies of list_unchop()
Examples
vec_chop(1:5)

# These two are equivalent
vec_chop(1:5, indices = list(1:2, 3:5))
vec_chop(1:5, sizes = c(2, 3))

# Can also be used on data frames
vec_chop(mtcars, indices = list(1:3, 4:6))

# If `indices` selects every value in `x` exactly once,
# in any order, then `list_unchop()` inverts `vec_chop()`
x <- c("a", "b", "c", "d")
indices <- list(2, c(3, 1), 4)
vec_chop(x, indices = indices)
list_unchop(vec_chop(x, indices = indices), indices = indices)

# When unchopping, size 1 elements of `x` are recycled
# to the size of the corresponding index
list_unchop(list(1, 2:3), indices = list(c(1, 3, 5), c(2, 4)))

# Names are retained, and outer names can be combined with inner
# names through the use of a `name_spec`
lst <- list(x = c(a = 1, b = 2), y = 1)
list_unchop(lst, indices = list(c(3, 2), c(1, 4)), name_spec = "{outer}_{inner}")

# An alternative implementation of `ave()` can be constructed using
# `vec_chop()` and `list_unchop()` in combination with `vec_group_loc()`
ave2 <- function(.x, .by, .f, ...) {
  indices <- vec_group_loc(.by)$loc
  chopped <- vec_chop(.x, indices = indices)
  out <- lapply(chopped, .f, ...)
  list_unchop(out, indices = indices)
}

breaks <- warpbreaks$breaks
wool <- warpbreaks$wool

ave2(breaks, wool, mean)

identical(
  ave2(breaks, wool, mean),
  ave(breaks, wool, FUN = mean)
)

# If you know your input is sorted and you'd like to split on the groups,
# `vec_run_sizes()` can be efficiently combined with `sizes`
df <- data_frame(
  g = c(2, 5, 5, 6, 6, 6, 6, 8, 9, 9),
  x = 1:10
)
vec_chop(df, sizes = vec_run_sizes(df$g))

# If you have a list of homogeneous vectors, sometimes it can be useful to
# unchop, apply a function to the flattened vector, and then rechop according
# to the original indices. This can be done efficiently with `list_sizes()`.
x <- list(c(1, 2, 1), c(3, 1), 5, double())
x_flat <- list_unchop(x)
x_flat <- x_flat + max(x_flat)
vec_chop(x_flat, sizes = list_sizes(x))
Compare two vectors
Description
Compare two vectors
Usage
vec_compare(x, y, na_equal = FALSE, .ptype = NULL)
Arguments
x,y | Vectors with compatible types and lengths. |
na_equal | Should |
.ptype | Override to optionally specify common type |
Value
An integer vector with values -1 for x < y, 0 if x == y, and 1 if x > y. If na_equal is FALSE, the result will be NA if either x or y is NA.
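The three-way result can be seen on a small numeric input. This is a sketch assuming vctrs is installed; the values are arbitrary.

```r
library(vctrs)

# One element below, equal to, and above the recycled value 5
three_way <- vec_compare(c(1, 5, 9), 5)
three_way

# Missing values propagate by default
with_na <- vec_compare(c(1, NA), 1)
with_na

# With `na_equal = TRUE`, missing values take part in the comparison,
# so the result contains no NA
no_na <- vec_compare(c(1, NA), 1, na_equal = TRUE)
anyNA(no_na)
```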
S3 dispatch
vec_compare() is not generic for performance; instead it uses vec_proxy_compare() to create a proxy that is used in the comparison.
Dependencies
vec_cast_common() with fallback
Examples
vec_compare(c(TRUE, FALSE, NA), FALSE)
vec_compare(c(TRUE, FALSE, NA), FALSE, na_equal = TRUE)

vec_compare(1:10, 5)
vec_compare(runif(10), 0.5)
vec_compare(letters[1:10], "d")

df <- data.frame(x = c(1, 1, 1, 2), y = c(0, 1, 2, 1))
vec_compare(df, data.frame(x = 1, y = 1))
Count unique values in a vector
Description
Count the number of unique values in a vector. vec_count() has two important differences to table(): it returns a data frame, and when given multiple inputs (as a data frame), it only counts combinations that appear in the input.
Usage
vec_count(x, sort = c("count", "key", "location", "none"))
Arguments
x | A vector (including a data frame). |
sort | One of "count", "key", "location", or "none".
|
Value
A data frame with columns key (same type as x) and count (an integer vector).
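A minimal look at the returned data frame, assuming vctrs is installed. The input is arbitrary and uses the default sort by count.

```r
library(vctrs)

x <- c("a", "b", "a", "c", "a", "b")
counts <- vec_count(x)

# One row per unique value, most frequent first
counts
names(counts)
typeof(counts$count)
```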
Dependencies
Examples
vec_count(mtcars$vs)
vec_count(iris$Species)

# If you count a data frame you'll get a data frame
# column in the output
str(vec_count(mtcars[c("vs", "am")]))

# Sorting ---------------------------------------

x <- letters[rpois(100, 6)]
# default is to sort by frequency
vec_count(x)

# but you can sort by key
vec_count(x, sort = "key")

# or location of first value
vec_count(x, sort = "location")
head(x)

# or not at all
vec_count(x, sort = "none")
Extract underlying data
Description
Extract the data underlying an S3 vector object, i.e. the underlying (named) atomic vector, data frame, or list.
Usage
vec_data(x)
Arguments
x | A vector or object implementing |
Value
The data underlying x, free from any attributes except the names.
Difference with vec_proxy()
vec_data() returns unstructured data. The only attributes preserved are names, dims, and dimnames. Currently, due to the underlying memory architecture of R, this creates a full copy of the data for atomic vectors.
vec_proxy() may return structured data. This generic is the main customisation point for accessing memory values in vctrs, along with vec_restore(). Methods must return a vector type. Records and data frames will be processed rowwise.
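For common base S3 classes, stripping the attributes exposes the stored values. This is an illustrative sketch, assuming vctrs is installed.

```r
library(vctrs)

# A factor stores integer codes; the levels and class are dropped
codes <- vec_data(factor(c("b", "a")))
codes

# A Date stores the number of days since 1970-01-01
days <- vec_data(as.Date("1970-01-02"))
days

# Names are kept
vec_data(c(first = as.Date("1970-01-02")))
```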
Default cast and ptype2 methods
Description
These functions are automatically called when no vec_ptype2() or vec_cast() method is implemented for a pair of types.
They apply special handling if one of the inputs is of type AsIs or sfc.
They attempt a number of fallbacks in cases where it would be too inconvenient to be strict:
If the class and attributes are the same they are considered compatible. vec_default_cast() returns x in this case.
In case of incompatible data frame classes, they fall back to data.frame. If an incompatible subclass of tibble is involved, they fall back to tbl_df.
Otherwise, an error is thrown with stop_incompatible_type() or stop_incompatible_cast().
Usage
vec_default_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())

vec_default_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())
Arguments
x | Vectors to cast. |
to | Type to cast to. If |
... | For |
x_arg | Argument name for |
to_arg | Argument name |
call | The execution environment of a currently running function, e.g. |
Complete
Description
vec_detect_complete() detects "complete" observations. An observation is considered complete if it is non-missing. For most vectors, this implies that vec_detect_complete(x) == !vec_detect_missing(x).
For data frames and matrices, a row is only considered complete if all elements of that row are non-missing. To compare, !vec_detect_missing(x) detects rows that are partially complete (they have at least one non-missing value).
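The difference between the two predicates only shows up on rectangular inputs. The sketch below assumes vctrs is installed and uses a made-up three-row data frame.

```r
library(vctrs)

df <- data_frame(
  x = c(1, NA, NA),
  y = c("a", "b", NA)
)

# Row 1 is fully observed, row 2 partially, row 3 not at all
complete <- vec_detect_complete(df)
complete

partially <- !vec_detect_missing(df)
partially
```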
Usage
vec_detect_complete(x)
Arguments
x | A vector |
Details
A record type vector is similar to a data frame, and is only considered complete if all fields are non-missing.
Value
A logical vector with the same size as x.
See Also
Examples
x <- c(1, 2, NA, 4, NA)

# For most vectors, this is identical to `!vec_detect_missing(x)`
vec_detect_complete(x)
!vec_detect_missing(x)

df <- data_frame(
  x = x,
  y = c("a", "b", NA, "d", "e")
)

# This returns `TRUE` where all elements of the row are non-missing.
# Compare that with `!vec_detect_missing()`, which detects rows that have at
# least one non-missing value.
df2 <- df
df2$all_non_missing <- vec_detect_complete(df)
df2$any_non_missing <- !vec_detect_missing(df)
df2
Find duplicated values
Description
vec_duplicate_any(): detects the presence of duplicated values, similar to anyDuplicated().
vec_duplicate_detect(): returns a logical vector describing if each element of the vector is duplicated elsewhere. Unlike duplicated(), it reports all duplicated values, not just the second and subsequent repetitions.
vec_duplicate_id(): returns an integer vector giving the location of the first occurrence of the value.
Usage
vec_duplicate_any(x)

vec_duplicate_detect(x)

vec_duplicate_id(x)
Arguments
x | A vector (including a data frame). |
Value
vec_duplicate_any(): a logical vector of length 1.
vec_duplicate_detect(): a logical vector the same length as x.
vec_duplicate_id(): an integer vector the same length as x.
Missing values
In most cases, missing values are not considered to be equal, i.e. NA == NA is not TRUE. This behaviour would be unappealing here, so these functions consider all NAs to be equal. (Similarly, all NaN are also considered to be equal.)
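A short demonstration of the missing-value rule, assuming vctrs is installed. The inputs are arbitrary.

```r
library(vctrs)

# Two NAs count as duplicates of each other
vec_duplicate_any(c(NA, NA))
vec_duplicate_detect(c(NA, 1, NA))

# The same holds for NaN: both map to the first occurrence
vec_duplicate_id(c(NaN, 1, NaN))
```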
Dependencies
See Also
vec_unique() for functions that work with the dual of duplicated values: unique values.
Examples
vec_duplicate_any(1:10)
vec_duplicate_any(c(1, 1:10))

x <- c(10, 10, 20, 30, 30, 40)
vec_duplicate_detect(x)
# Note that `duplicated()` doesn't consider the first instance to
# be a duplicate
duplicated(x)

# Identify elements of a vector by the location of the first element that
# they're equal to:
vec_duplicate_id(x)
# Location of the unique values:
vec_unique_loc(x)
# Equivalent to `duplicated()`:
vec_duplicate_id(x) == seq_along(x)
Is a vector empty
Description
This function is defunct, please use vec_is_empty().
Usage
vec_empty(x)
Arguments
x | An object. |
Equality
Description
vec_equal() tests if two vectors are equal.
Usage
vec_equal(x, y, na_equal = FALSE, .ptype = NULL)
Arguments
x,y | Vectors with compatible types and lengths. |
na_equal | Should |
.ptype | Override to optionally specify common type |
Value
A logical vector the same size as the common size of x and y. Will only contain NAs if na_equal is FALSE.
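The effect of na_equal on the result can be sketched as follows, assuming vctrs is installed.

```r
library(vctrs)

x <- c(1, NA, 3)
y <- c(1, NA, 4)

# By default a comparison involving NA is itself NA
default <- vec_equal(x, y)
default

# With `na_equal = TRUE`, two missing values are treated as equal
strict <- vec_equal(x, y, na_equal = TRUE)
strict
```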
Dependencies
vec_cast_common() with fallback
See Also
Examples
vec_equal(c(TRUE, FALSE, NA), FALSE)
vec_equal(c(TRUE, FALSE, NA), FALSE, na_equal = TRUE)

vec_equal(5, 1:10)
vec_equal("d", letters[1:10])

df <- data.frame(x = c(1, 1, 2, 1), y = c(1, 2, 1, NA))
vec_equal(df, data.frame(x = 1, y = 2))
Missing values
Description
vec_equal_na() has been renamed to vec_detect_missing() and is deprecated as of vctrs 0.5.0.
Usage
vec_equal_na(x)
Arguments
x | A vector |
Value
A logical vector the same size as x.
Create a data frame from all combinations of the inputs
Description
vec_expand_grid() creates a new data frame by creating a grid of all possible combinations of the input vectors. It is inspired by expand.grid(). Compared with expand.grid(), it:
Produces sorted output by default by varying the first column the slowest, rather than the fastest. Control this with .vary.
Never converts strings to factors.
Does not add additional attributes.
Drops NULL inputs.
Can expand any vector type, including data frames and records.
Usage
vec_expand_grid(
  ...,
  .vary = "slowest",
  .name_repair = "check_unique",
  .error_call = current_env()
)
Arguments
... | Name-value pairs. The name will become the column name in theresulting data frame. |
.vary | One of:
|
.name_repair | One of |
.error_call | The execution environment of a currently running function, e.g. |
Details
If any input is empty (i.e. size 0), then the result will have 0 rows.
If no inputs are provided, the result is a 1 row data frame with 0 columns. This is consistent with the fact that prod() with no inputs returns 1.
Value
A data frame with as many columns as there are inputs in ... and as many rows as the prod() of the sizes of the inputs.
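The row count rule, including the two edge cases from the Details section, can be checked directly. This sketch assumes vctrs is installed.

```r
library(vctrs)

grid <- vec_expand_grid(x = 1:2, y = 1:3)

# Rows equal the product of the input sizes
vec_size(grid) == prod(2, 3)

# By default the first column varies slowest
grid$x

# One empty input yields zero rows
empty <- vec_expand_grid(x = 1:2, y = integer())
vec_size(empty)

# No inputs yield one row and zero columns, matching prod() == 1
none <- vec_expand_grid()
c(vec_size(none), ncol(none))
```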
Examples
vec_expand_grid(x = 1:2, y = 1:3)

# Use `.vary` to match `expand.grid()`:
vec_expand_grid(x = 1:2, y = 1:3, .vary = "fastest")

# Can also expand data frames
vec_expand_grid(
  x = data_frame(a = 1:2, b = 3:4),
  y = 1:4
)
Fill in missing values with the previous or following value
Description
vec_fill_missing() fills gaps of missing values with the previous or following non-missing value.
Usage
vec_fill_missing(
  x,
  direction = c("down", "up", "downup", "updown"),
  max_fill = NULL
)
Arguments
x | A vector |
direction | Direction in which to fill missing values. Must be either |
max_fill | A single positive integer specifying the maximum number of sequential missing values that will be filled. If |
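How direction and max_fill interact is easiest to see on a short vector. This is a sketch with arbitrary values, assuming vctrs is installed.

```r
library(vctrs)

x <- c(NA, 1, NA, NA, 3, NA)

# "down" carries the previous value forward; the leading NA has
# no previous value and stays missing
down <- vec_fill_missing(x, direction = "down")
down

# "downup" fills down first, then fills what is left upwards
downup <- vec_fill_missing(x, direction = "downup")
downup

# `max_fill = 1` fills at most one value per run of missing values
capped <- vec_fill_missing(x, direction = "down", max_fill = 1)
capped
```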
Examples
x <- c(NA, NA, 1, NA, NA, NA, 3, NA, NA)

# Filling down replaces missing values with the previous non-missing value
vec_fill_missing(x, direction = "down")

# To also fill leading missing values, use `"downup"`
vec_fill_missing(x, direction = "downup")

# Limit the number of sequential missing values to fill with `max_fill`
vec_fill_missing(x, max_fill = 1)

# Data frames are filled rowwise. Rows are only considered missing
# if all elements of that row are missing.
y <- c(1, NA, 2, NA, NA, 3, 4, NA, 5)
df <- data_frame(x = x, y = y)
df

vec_fill_missing(df)
Identify groups
Description
vec_group_id() returns an identifier for the group that each element of x falls in, constructed in the order that they appear. The number of groups is also returned as an attribute, n.
vec_group_loc() returns a data frame containing a key column with the unique groups, and a loc column with the locations of each group in x.
vec_group_rle() locates groups in x and returns them run length encoded in the order that they appear. The return value is a rcrd object with fields for the group identifiers and the run length of the corresponding group. The number of groups is also returned as an attribute, n.
Usage
vec_group_id(x)
vec_group_loc(x)
vec_group_rle(x)
Arguments
x | A vector |
Value
vec_group_id(): An integer vector with the same size as x.
vec_group_loc(): A two column data frame with size equal to vec_size(vec_unique(x)).
  A key column of type vec_ptype(x)
  A loc column of type list, with elements of type integer.
vec_group_rle(): A vctrs_group_rle rcrd object with two integer vector fields: group and length.
Note that when using vec_group_loc() for complex types, the default data.frame print method will be suboptimal, and you will want to coerce into a tibble to better understand the output.
Dependencies
Examples
purrr <- c("p", "u", "r", "r", "r")
vec_group_id(purrr)
vec_group_rle(purrr)
groups <- mtcars[c("vs", "am")]
vec_group_id(groups)
group_rle <- vec_group_rle(groups)
group_rle
# Access fields with `field()`
field(group_rle, "group")
field(group_rle, "length")
# `vec_group_id()` is equivalent to
vec_match(groups, vec_unique(groups))
vec_group_loc(mtcars$vs)
vec_group_loc(mtcars[c("vs", "am")])
if (require("tibble")) {
  as_tibble(vec_group_loc(mtcars[c("vs", "am")]))
}
Initialize a vector
Description
Initialize a vector
Usage
vec_init(x, n = 1L)
Arguments
x | Template of vector to initialize. |
n | Desired size of result. |
Dependencies
vec_slice()
Examples
vec_init(1:10, 3)
vec_init(Sys.Date(), 5)
vec_init(mtcars, 2)
Interleave many vectors into one vector
Description
vec_interleave() combines multiple vectors together, much like vec_c(), but does so in such a way that the elements of each vector are interleaved together.
It is a more efficient equivalent to the following usage of vec_c():
vec_interleave(x, y) == vec_c(x[1], y[1], x[2], y[2], ..., x[n], y[n])
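The equivalence above can be spelled out for two short vectors. This is a minimal sketch, assuming vctrs is attached; `interleaved` and `combined` are illustrative names, not part of the package.

```r
library(vctrs)

x <- 1:3
y <- 4:6

# Interleave directly
interleaved <- vec_interleave(x, y)

# Build the same result by hand, alternating elements of x and y
combined <- vec_c(x[1], y[1], x[2], y[2], x[3], y[3])

identical(interleaved, combined)
```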
Usage
vec_interleave(
  ...,
  .ptype = NULL,
  .name_spec = NULL,
  .name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet", "universal_quiet")
)
Arguments
... | Vectors to interleave. These will be recycled to a common size. |
.ptype | If Alternatively, you can supply |
.name_spec | A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like
See the name specification topic. |
.name_repair | How to repair names, see |
Dependencies
vctrs dependencies
Examples
# The most common case is to interleave two vectors
vec_interleave(1:3, 4:6)
# But you aren't restricted to just two
vec_interleave(1:3, 4:6, 7:9, 10:12)
# You can also interleave data frames
x <- data_frame(x = 1:2, y = c("a", "b"))
y <- data_frame(x = 3:4, y = c("c", "d"))
vec_interleave(x, y)
List checks
Description
These functions have been deprecated as of vctrs 0.6.0.
vec_is_list() has been renamed to obj_is_list().
vec_check_list() has been renamed to obj_check_list().
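New code would call the replacement functions directly. This is a small sketch of the renamed predicate, assuming vctrs 0.6.0 or later is attached; note that data frames are not lists in the vctrs sense.

```r
library(vctrs)

# Bare lists are lists in the vctrs sense
obj_is_list(list(1, "a"))

# Atomic vectors and data frames are not
obj_is_list(1:3)
obj_is_list(data.frame(x = 1))

# The checking variant passes silently on a list
obj_check_list(list(1, "a"))
```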
Usage
vec_is_list(x)
vec_check_list(x, ..., arg = caller_arg(x), call = caller_env())
Arguments
x | For |
... | These dots are for future extensions and must be empty. |
arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
call | The execution environment of a currently running function, e.g. |
Locate observations matching specified conditions
Description
vec_locate_matches() is a more flexible version of vec_match() used to identify locations where each value of needles matches one or multiple values in haystack. Unlike vec_match(), vec_locate_matches() returns all matches by default, and can match on binary conditions other than equality, such as >, >=, <, and <=.
Usage
vec_locate_matches(
  needles,
  haystack,
  ...,
  condition = "==",
  filter = "none",
  incomplete = "compare",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL,
  needles_arg = "needles",
  haystack_arg = "haystack",
  error_call = current_env()
)
Arguments
needles,haystack | Vectors used for matching.
Prior to comparison, |
... | These dots are for future extensions and must be empty. |
condition | Condition controlling how
|
filter | Filter to be applied to the matched results.
Filters don't have any effect on A filter can return multiple haystack matches for a particular needle if the maximum or minimum haystack value is duplicated in |
incomplete | Handling of missing and incomplete values in
|
no_match | Handling of
|
remaining | Handling of
|
multiple | Handling of
|
relationship | Handling of the expected relationship between
|
nan_distinct | A single logical specifying whether or not |
chr_proxy_collate | A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.
For data frames, Common transformation functions include: |
needles_arg,haystack_arg | Argument tags for |
error_call | The execution environment of a currently running function, e.g. |
Details
vec_match() is identical to (but often slightly faster than):
vec_locate_matches( needles, haystack, condition = "==", multiple = "first", nan_distinct = TRUE)
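The stated equivalence can be checked on a small input. This is a minimal sketch, assuming vctrs is attached; the input values and the name `loc` are made up for illustration.

```r
library(vctrs)

needles <- c(2, NA, 5)
haystack <- c(5, 2, 2, NA)

# First match for each needle, with NA and NaN treated as distinct
loc <- vec_locate_matches(
  needles,
  haystack,
  condition = "==",
  multiple = "first",
  nan_distinct = TRUE
)

# The `haystack` column lines up with what vec_match() returns
loc$haystack
vec_match(needles, haystack)
```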
vec_locate_matches() is extremely similar to a SQL join between needles and haystack, with the default being most similar to a left join.
Be very careful when specifying match conditions. If a condition is misspecified, it is very easy to accidentally generate an exponentially large number of matches.
Value
A two column data frame containing the locations of the matches.
needles is an integer vector containing the location of the needle currently being matched.
haystack is an integer vector containing the location of the corresponding match in the haystack for the current needle.
Dependencies of vec_locate_matches()
Examples
x <- c(1, 2, NA, 3, NaN)
y <- c(2, 1, 4, NA, 1, 2, NaN)
# By default, for each value of `x`, all matching locations in `y` are
# returned
matches <- vec_locate_matches(x, y)
matches
# The result can be used to slice the inputs to align them
data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)
# If multiple matches are present, control which is returned with `multiple`
vec_locate_matches(x, y, multiple = "first")
vec_locate_matches(x, y, multiple = "last")
vec_locate_matches(x, y, multiple = "any")
# Use `relationship` to add constraints and error on multiple matches if
# they aren't expected
try(vec_locate_matches(x, y, relationship = "one-to-one"))
# In this case, the `NA` in `y` matches two rows in `x`
try(vec_locate_matches(x, y, relationship = "one-to-many"))
# By default, `NA` is treated as being identical to `NaN`.
# Using `nan_distinct = TRUE` treats `NA` and `NaN` as different values, so
# `NA` can only match `NA`, and `NaN` can only match `NaN`.
vec_locate_matches(x, y, nan_distinct = TRUE)
# If you never want missing values to match, set `incomplete = NA` to return
# `NA` in the `haystack` column anytime there was an incomplete value
# in `needles`.
vec_locate_matches(x, y, incomplete = NA)
# Using `incomplete = NA` allows us to enforce the one-to-many relationship
# that we couldn't before
vec_locate_matches(x, y, relationship = "one-to-many", incomplete = NA)
# `no_match` allows you to specify the returned value for a needle with
# zero matches. Note that this is different from an incomplete value,
# so specifying `no_match` allows you to differentiate between incomplete
# values and unmatched values.
vec_locate_matches(x, y, incomplete = NA, no_match = 0L)
# If you want to require that every `needle` has at least 1 match, set
# `no_match` to `"error"`:
try(vec_locate_matches(x, y, incomplete = NA, no_match = "error"))
# By default, `vec_locate_matches()` detects equality between `needles` and
# `haystack`. Using `condition`, you can detect where an inequality holds
# true instead. For example, to find every location where `x[[i]] >= y`:
matches <- vec_locate_matches(x, y, condition = ">=")
data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)
# You can limit which matches are returned with a `filter`. For example,
# with the above example you can filter the matches returned by `x[[i]] >= y`
# down to only the ones containing the maximum `y` value of those matches.
matches <- vec_locate_matches(x, y, condition = ">=", filter = "max")
# Here, the matches for the `3` needle value have been filtered down to
# only include the maximum haystack value of those matches, `2`. This is
# often referred to as a rolling join.
data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)
# In the very rare case that you need to generate locations for a
# cross match, where every value of `x` is forced to match every
# value of `y` regardless of what the actual values are, you can
# replace `x` and `y` with integer vectors of the same size that contain
# a single value and match on those instead.
x_proxy <- vec_rep(1L, vec_size(x))
y_proxy <- vec_rep(1L, vec_size(y))
nrow(vec_locate_matches(x_proxy, y_proxy))
vec_size(x) * vec_size(y)
# By default, missing values will match other missing values when using
# `==`, `>=`, or `<=` conditions, but not when using `>` or `<` conditions.
# This is similar to how `vec_compare(x, y, na_equal = TRUE)` works.
x <- c(1, NA)
y <- c(NA, 2)
vec_locate_matches(x, y, condition = "<=")
vec_locate_matches(x, y, condition = "<")
# You can force missing values to match regardless of the `condition`
# by using `incomplete = "match"`
vec_locate_matches(x, y, condition = "<", incomplete = "match")
# You can also use data frames for `needles` and `haystack`. The
# `condition` will be recycled to the number of columns in `needles`, or
# you can specify varying conditions per column. In this example, we take
# a vector of date `values` and find all locations where each value is
# between lower and upper bounds specified by the `haystack`.
values <- as.Date("2019-01-01") + 0:9
needles <- data_frame(lower = values, upper = values)
set.seed(123)
lower <- as.Date("2019-01-01") + sample(10, 10, replace = TRUE)
upper <- lower + sample(3, 10, replace = TRUE)
haystack <- data_frame(lower = lower, upper = upper)
# (values >= lower) & (values <= upper)
matches <- vec_locate_matches(needles, haystack, condition = c(">=", "<="))
data_frame(
  lower = vec_slice(lower, matches$haystack),
  value = vec_slice(values, matches$needles),
  upper = vec_slice(upper, matches$haystack)
)
Locate sorted groups
Description
vec_locate_sorted_groups() returns a data frame containing a key column with sorted unique groups, and a loc column with the locations of each group in x. It is similar to vec_group_loc(), except the groups are returned sorted rather than by first appearance.
Usage
vec_locate_sorted_groups(
  x,
  ...,
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)
Arguments
x | A vector |
... | These dots are for future extensions and must be empty. |
direction | Direction to sort in.
|
na_value | Ordering of missing values.
|
nan_distinct | A single logical specifying whether or not |
chr_proxy_collate | A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.
For data frames, Common transformation functions include: |
Details
vec_locate_sorted_groups(x) is equivalent to, but faster than:
info <- vec_group_loc(x)
vec_slice(info, vec_order(info$key))
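To make the equivalence concrete, the two approaches can be compared on a small character vector. This is a sketch only, assuming vctrs is attached; `sorted` and `manual` are illustrative names.

```r
library(vctrs)

x <- c("b", "a", "b", "c", "a")

# Sorted groups in one step
sorted <- vec_locate_sorted_groups(x)

# The slower two-step version described above
info <- vec_group_loc(x)
manual <- vec_slice(info, vec_order(info$key))

# Both give the same keys and the same locations per key
sorted$key
manual$key
sorted$loc
```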
Value
A two column data frame with size equal to vec_size(vec_unique(x)).
A key column of type vec_ptype(x).
A loc column of type list, with elements of type integer.
Dependencies of vec_locate_sorted_groups()
Examples
df <- data.frame(
  g = sample(2, 10, replace = TRUE),
  x = c(NA, sample(5, 9, replace = TRUE))
)
# `vec_locate_sorted_groups()` is similar to `vec_group_loc()`, except keys
# are returned ordered rather than by first appearance.
vec_locate_sorted_groups(df)
vec_group_loc(df)
Find matching observations across vectors
Description
vec_in() returns a logical vector based on whether needle is found in haystack.
vec_match() returns an integer vector giving location of needle in haystack, or NA if it's not found.
Usage
vec_match(
  needles,
  haystack,
  ...,
  na_equal = TRUE,
  needles_arg = "",
  haystack_arg = ""
)
vec_in(
  needles,
  haystack,
  ...,
  na_equal = TRUE,
  needles_arg = "",
  haystack_arg = ""
)
Arguments
needles,haystack | Vector of
|
... | These dots are for future extensions and must be empty. |
na_equal | If |
needles_arg,haystack_arg | Argument tags for |
Details
vec_in() is equivalent to %in%; vec_match() is equivalent to match().
Value
A vector the same length as needles. vec_in() returns a logical vector; vec_match() returns an integer vector.
Missing values
In most places in R, missing values are not considered to be equal, i.e. NA == NA is not TRUE. The exception is in matching functions like match() and merge(), where an NA will match another NA. By default, vec_match() and vec_in() will match NAs; but you can control this behaviour with the na_equal argument.
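The effect of na_equal can be seen by toggling it on the same inputs. This is a minimal sketch with made-up values, assuming vctrs is attached.

```r
library(vctrs)

needles <- c(1, NA)
haystack <- c(NA, 1)

# Default: the missing needle matches the missing haystack value
vec_match(needles, haystack)
vec_in(needles, haystack)

# With `na_equal = FALSE`, missing needles propagate as NA instead
vec_match(needles, haystack, na_equal = FALSE)
vec_in(needles, haystack, na_equal = FALSE)
```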
Dependencies
vec_cast_common() with fallback
Examples
hadley <- strsplit("hadley", "")[[1]]
vec_match(hadley, letters)
vowels <- c("a", "e", "i", "o", "u")
vec_match(hadley, vowels)
vec_in(hadley, vowels)
# Only the first index of duplicates is returned
vec_match(c("a", "b"), c("a", "b", "a", "b"))
Mathematical operations
Description
This generic provides a common dispatch mechanism for all regular unary mathematical functions. It is used as a common wrapper around many of the Summary group generics, the Math group generics, and a handful of other mathematical functions like mean() (but not var() or sd()).
Usage
vec_math(.fn, .x, ...)
vec_math_base(.fn, .x, ...)
Arguments
.fn | A mathematical function from the base package, as a string. |
.x | A vector. |
... | Additional arguments passed to |
Details
vec_math_base() is provided as a convenience for writing methods. It calls the base .fn on the underlying vec_data().
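As a rough illustration of what vec_math_base() hands back, the sketch below calls it directly on a bare vctr. It assumes vctrs is attached; in practice the call would sit inside a vec_math() method for your own class.

```r
library(vctrs)

x <- new_vctr(c(1, 2.5, 10))

# The base function is applied to the underlying data, so the results
# are plain doubles rather than objects of the original class
total <- vec_math_base("sum", x)
running <- vec_math_base("cumsum", x)

total
running
class(running)
```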
Included functions
From the Summary group generic: prod(), sum(), any(), all().
From the Math group generic: abs(), sign(), sqrt(), ceiling(), floor(), trunc(), cummax(), cummin(), cumprod(), cumsum(), log(), log10(), log2(), log1p(), acos(), acosh(), asin(), asinh(), atan(), atanh(), exp(), expm1(), cos(), cosh(), cospi(), sin(), sinh(), sinpi(), tan(), tanh(), tanpi(), gamma(), lgamma(), digamma(), trigamma().
Additional generics: mean(), is.nan(), is.finite(), is.infinite().
Note that median() is currently not implemented, and sd() and var() are currently not generic and so do not support custom classes.
See Also
vec_arith() for the equivalent for the arithmetic infix operators.
Examples
x <- new_vctr(c(1, 2.5, 10))
x
abs(x)
sum(x)
cumsum(x)
Get or set the names of a vector
Description
These functions work like rlang::names2(), names() and names<-(), except that they return or modify the rowwise names of the vector. These are:
The usual names() for atomic vectors and lists
The row names for data frames and matrices
The names of the first dimension for arrays
Rowwise names are size consistent: the length of the names always equals vec_size().
vec_names2() returns the repaired names from a vector, even if it is unnamed. See vec_as_names() for details on name repair.
vec_names() is a bare-bones version that returns NULL if the vector is unnamed.
vec_set_names() sets the names or removes them.
Usage
vec_names2(
  x,
  ...,
  repair = c("minimal", "unique", "universal", "check_unique", "unique_quiet", "universal_quiet"),
  quiet = FALSE
)
vec_names(x)
vec_set_names(x, names)
Arguments
x | A vector with names |
... | These dots are for future extensions and must be empty. |
repair | Either a string or a function. If a string, it must be one of
The The options |
quiet | By default, the user is informed of any renaming caused by repairing the names. This only concerns unique and universal repairing. Set Users can silence the name repair messages by setting the |
names | A character vector, or |
Value
vec_names2() returns the names of x, repaired.
vec_names() returns the names of x or NULL if unnamed.
vec_set_names() returns x with names updated.
Examples
vec_names2(1:3)
vec_names2(1:3, repair = "unique")
vec_names2(c(a = 1, b = 2))
# `vec_names()` consistently returns the rowwise names of data frames and arrays:
vec_names(data.frame(a = 1, b = 2))
names(data.frame(a = 1, b = 2))
vec_names(mtcars)
names(mtcars)
vec_names(Titanic)
names(Titanic)
vec_set_names(1:3, letters[1:3])
vec_set_names(data.frame(a = 1:3), letters[1:3])
Order and sort vectors
Description
Order and sort vectors
Usage
vec_order(
  x,
  ...,
  direction = c("asc", "desc"),
  na_value = c("largest", "smallest")
)
vec_sort(
  x,
  ...,
  direction = c("asc", "desc"),
  na_value = c("largest", "smallest")
)
Arguments
x | A vector |
... | These dots are for future extensions and must be empty. |
direction | Direction to sort in. Defaults to |
na_value | Should |
Value
vec_order() an integer vector the same size as x.
vec_sort() a vector with the same size and type as x.
Differences with order()
Unlike the na.last argument of order() which decides the positions of missing values irrespective of the decreasing argument, the na_value argument of vec_order() interacts with direction. If missing values are considered the largest value, they will appear last in ascending order, and first in descending order.
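The difference shows up as soon as the direction is reversed. This is a minimal side-by-side sketch on made-up data, assuming vctrs is attached.

```r
library(vctrs)

x <- c(2, NA, 1)

# base::order() keeps the missing value last even when decreasing
base_desc <- order(x, decreasing = TRUE)

# vec_order() treats NA as the largest value by default, so it comes
# first in descending order
vctrs_desc <- vec_order(x, direction = "desc")

base_desc
vctrs_desc
```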
Dependencies of vec_order()
Dependencies of vec_sort()
Examples
x <- round(c(runif(9), NA), 3)
vec_order(x)
vec_sort(x)
vec_sort(x, direction = "desc")
# Can also handle data frames
df <- data.frame(g = sample(2, 10, replace = TRUE), x = x)
vec_order(df)
vec_sort(df)
vec_sort(df, direction = "desc")
# Missing values interpreted as largest values are last when
# in increasing order:
vec_order(c(1, NA), na_value = "largest", direction = "asc")
vec_order(c(1, NA), na_value = "largest", direction = "desc")
Proxy and restore
Description
vec_proxy() returns the data structure containing the values of a vector. This data structure is usually the vector itself. In this case the proxy is the identity function, which is the default vec_proxy() method.
Only experts should implement special vec_proxy() methods, for these cases:
A vector has vectorised attributes, i.e. metadata for each element of the vector. These record types are implemented in vctrs by returning a data frame in the proxy method. If you're starting your class from scratch, consider deriving from the rcrd class. It implements the appropriate data frame proxy and is generally the preferred way to create a record class.
When you're implementing a vector on top of a non-vector type, like an environment or an S4 object. This is currently only partially supported.
S3 lists are considered scalars by default. This is the safe choice for list objects such as returned by stats::lm(). To declare that your S3 list class is a vector, you normally add "list" to the right of your class vector. Explicit inheritance from list is generally the preferred way to declare an S3 list in R, for instance it makes it possible to dispatch on generic.list S3 methods.
If you can't modify your class vector, you can implement an identity proxy (i.e. a proxy method that just returns its input) to let vctrs know this is a vector list and not a scalar.
vec_restore() is the inverse operation of vec_proxy(). It should only be called on vector proxies.
It undoes the transformations of vec_proxy().
It restores attributes and classes. These may be lost when the memory values are manipulated. For example slicing a subset of a vector's proxy causes a new proxy to be allocated.
By default vctrs restores all attributes and classes automatically. You only need to implement a vec_restore() method if your class has attributes that depend on the data.
Usage
vec_proxy(x, ...)
vec_restore(x, to, ...)
Arguments
x | A vector. |
... | These dots are for future extensions and must be empty. |
to | The original vector to restore to. |
Proxying
You should only implement vec_proxy() when your type is designed around a non-vector class. I.e. anything that is not either:
An atomic vector
A bare list
A data frame
In this case, implement vec_proxy() to return such a vector class. The vctrs operations such as vec_slice() are applied on the proxy and vec_restore() is called to restore the original representation of your type.
The most common case where you need to implement vec_proxy() is for S3 lists. In vctrs, S3 lists are treated as scalars by default. This way we don't treat objects like model fits as vectors. To prevent vctrs from treating your S3 list as a scalar, unclass it in the vec_proxy() method. For instance, here is the definition for list_of:
vec_proxy.vctrs_list_of <- function(x) {
  unclass(x)
}
Another case where you need to implement a proxy is record types. Record types should return a data frame, as in the POSIXlt method:
vec_proxy.POSIXlt <- function(x) {
  new_data_frame(unclass(x))
}
Note that you don't need to implement vec_proxy() when your class inherits from vctrs_vctr or vctrs_rcrd.
Restoring
A restore is a specialised type of cast, primarily used in conjunction with NextMethod() or a C-level function that works on the underlying data structure. A vec_restore() method can make the following assumptions about x:
It has the correct type.
It has the correct names.
It has the correct dim and dimnames attributes.
It is unclassed. This way you can call vctrs generics with x without triggering an infinite loop of restoration.
The length may be different (for example after vec_slice() has been called), and all other attributes may have been lost. The method should restore all attributes so that after restoration, vec_restore(vec_data(x), x) yields x.
To understand the difference between vec_cast() and vec_restore() think about factors: it doesn't make sense to cast an integer to a factor, but if NextMethod() or another low-level function has stripped attributes, you still need to be able to restore them.
The default method copies across all attributes so you only need to provide your own method if your attributes require special care (i.e. they are dependent on the data in some way). When implementing your own method, bear in mind that many R users add attributes to track additional metadata that is important to them, so you should preserve any attributes that don't require special handling for your class.
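The round trip described above can be tried with the default method. This is a sketch under stated assumptions: vctrs is attached, and the class name "example_id" and the `note` attribute are hypothetical, chosen only for illustration.

```r
library(vctrs)

# A simple vctr with one extra, data-independent attribute (hypothetical)
x <- new_vctr(1:3, class = "example_id", note = "illustrative metadata")

# Strip the class and custom attributes, then restore them from `x`
bare <- vec_data(x)
restored <- vec_restore(bare, x)
identical(restored, x)

# Slicing goes through the proxy and restores the attribute afterwards
sliced <- vec_slice(x, 2:3)
attr(sliced, "note")
```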
Dependencies
x must be a vector in the vctrs sense (see vec_is())
By default the underlying data is returned as is (identity proxy)
All vector classes have a proxy, even those that don't implement any vctrs methods. The exception is S3 lists that don't inherit from "list" explicitly. These might have to implement an identity proxy for compatibility with vctrs (see discussion above).
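The S3 list exception can be observed with vec_is(). This is a minimal sketch, assuming vctrs is attached; the class names "example_fit" and "example_items" are hypothetical.

```r
library(vctrs)

# An S3 list that does not inherit from "list" is treated as a scalar
scalar_like <- structure(list(1, 2), class = "example_fit")
vec_is(scalar_like)

# Adding "list" to the right of the class vector declares it a vector
vector_like <- structure(list(1, 2), class = c("example_items", "list"))
vec_is(vector_like)
vec_size(vector_like)
```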
Comparison and order proxy
Description
vec_proxy_compare() and vec_proxy_order() return proxy objects, i.e. an atomic vector or data frame of atomic vectors.
For vctrs_vctr objects:
vec_proxy_compare() determines the behavior of <, >, >= and <= (via vec_compare()); and min(), max(), median(), and quantile().
vec_proxy_order() determines the behavior of order() and sort() (via xtfrm()).
Usage
vec_proxy_compare(x, ...)
vec_proxy_order(x, ...)
Arguments
x | A vector x. |
... | These dots are for future extensions and must be empty. |
Details
The default method of vec_proxy_compare() assumes that all classes built on top of atomic vectors or records are comparable. Internally the default calls vec_proxy_equal(). If your class is not comparable, you will need to provide a vec_proxy_compare() method that throws an error.
The behavior of vec_proxy_order() is identical to vec_proxy_compare(), with the exception of lists. Lists are not comparable, as comparing elements of different types is undefined. However, to allow ordering of data frames containing list-columns, the ordering proxy of a list is generated as an integer vector that can be used to order list elements by first appearance.
If a class implements a vec_proxy_compare() method, it usually doesn't need to provide a vec_proxy_order() method, because the latter is implemented by forwarding to vec_proxy_compare() by default. Classes inheriting from list are an exception: due to the default vec_proxy_order() implementation, vec_proxy_compare() and vec_proxy_order() should be provided for such classes (with identical implementations) to avoid mismatches between comparison and sorting.
Value
A 1d atomic vector or a data frame.
Dependencies
vec_proxy_equal() called by default in vec_proxy_compare()
vec_proxy_compare() called by default in vec_proxy_order()
Data frames
If the proxy for x is a data frame, the proxy function is automatically recursively applied on all columns as well. After applying the proxy recursively, if there are any data frame columns present in the proxy, then they are unpacked. Finally, if the resulting data frame only has a single column, then it is unwrapped and a vector is returned as the proxy.
Examples
# Lists are not comparable
x <- list(1:2, 1, 1:2, 3)
try(vec_compare(x, x))
# But lists are orderable by first appearance to allow for
# ordering data frames with list-cols
df <- new_data_frame(list(x = x))
vec_sort(df)
Equality proxy
Description
Returns a proxy object (i.e. an atomic vector or data frame of atomic vectors). For vctrs, this determines the behaviour of == and != (via vec_equal()); unique(), duplicated() (via vec_unique() and vec_duplicate_detect()); is.na() and anyNA() (via vec_detect_missing()).
Usage
vec_proxy_equal(x, ...)
Arguments
x | A vector x. |
... | These dots are for future extensions and must be empty. |
Details
The default method calls vec_proxy(), as the default underlying vector data should be equal-able in most cases. If your class is not equal-able, provide a vec_proxy_equal() method that throws an error.
Value
A 1d atomic vector or a data frame.
Data frames
If the proxy for x is a data frame, the proxy function is automatically recursively applied on all columns as well. After applying the proxy recursively, if there are any data frame columns present in the proxy, then they are unpacked. Finally, if the resulting data frame only has a single column, then it is unwrapped and a vector is returned as the proxy.
Dependencies
vec_proxy() called by default
Find the prototype of a set of vectors
Description
vec_ptype() returns the unfinalised prototype of a single vector.
vec_ptype_common() finds the common type of multiple vectors.
vec_ptype_show() nicely prints the common type of any number of inputs, and is designed for interactive exploration.
Usage
vec_ptype(x, ..., x_arg = "", call = caller_env())
vec_ptype_common(..., .ptype = NULL, .arg = "", .call = caller_env())
vec_ptype_show(...)
Arguments
x | A vector |
... | For For |
x_arg | Argument name for |
call,.call | The execution environment of a currently running function, e.g. |
.ptype | If Alternatively, you can supply |
.arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
Value
vec_ptype() and vec_ptype_common() return a prototype (a size-0 vector)
vec_ptype()
vec_ptype() returns size 0 vectors potentially containing attributes but no data. Generally, this is just vec_slice(x, 0L), but some inputs require special handling.
While you can't slice NULL, the prototype of NULL is itself. This is because we treat NULL as an identity value in the vec_ptype2() monoid.
The prototype of logical vectors that only contain missing values is the special unspecified type, which can be coerced to any other 1d type. This allows bare NAs to represent missing values for any 1d vector type.
See internal-faq-ptype2-identity for more information about identity values.
vec_ptype() is a performance generic. It is not necessary to implement it because the default method will work for any vctrs type. However the default method builds around other vctrs primitives like vec_slice() which incurs performance costs. If your class has a static prototype, you might consider implementing a custom vec_ptype() method that returns a constant. This will improve the performance of your class in many cases (common type imputation in particular).
Because it may contain unspecified vectors, the prototype returned by vec_ptype() is said to be unfinalised. Call vec_ptype_finalise() to finalise it. Commonly you will need the finalised prototype as returned by vec_slice(x, 0L).
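The special cases above can be inspected interactively. This is a minimal sketch, assuming vctrs is attached; `unfinalised` and `finalised` are illustrative names.

```r
library(vctrs)

# A bare NA has the special unspecified prototype
unfinalised <- vec_ptype(NA)
class(unfinalised)

# Finalising it falls back to an empty logical vector
finalised <- vec_ptype_finalise(unfinalised)
finalised

# The prototype of NULL is NULL itself
vec_ptype(NULL)

# Because unspecified coerces to any 1d type, NA adopts the integer type here
vec_ptype_common(NA, 1L)
```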
vec_ptype_common()
vec_ptype_common() first finds the prototype of each input, then successively calls vec_ptype2() to find a common type. It returns a finalised prototype.
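The procedure can be imitated by hand with Reduce(). This is only a sketch of the idea, assuming vctrs is attached; it is not how the function is implemented internally, and `inputs` and `manual` are illustrative names.

```r
library(vctrs)

inputs <- list(TRUE, 1L, 2.5)

# Take the prototype of each input, fold them pairwise with vec_ptype2(),
# then finalise the result
manual <- vec_ptype_finalise(Reduce(vec_ptype2, lapply(inputs, vec_ptype)))

# Same answer as the dedicated helper
common <- vec_ptype_common(TRUE, 1L, 2.5)

manual
common
```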
Dependencies of vec_ptype()
vec_slice() for returning an empty slice
Dependencies of vec_ptype_common()
Examples
# Unknown types ------------------------------------------
vec_ptype_show()
vec_ptype_show(NA)
vec_ptype_show(NULL)
# Vectors ------------------------------------------------
vec_ptype_show(1:10)
vec_ptype_show(letters)
vec_ptype_show(TRUE)
vec_ptype_show(Sys.Date())
vec_ptype_show(Sys.time())
vec_ptype_show(factor("a"))
vec_ptype_show(ordered("a"))
# Matrices -----------------------------------------------
# The prototype of a matrix includes the number of columns
vec_ptype_show(array(1, dim = c(1, 2)))
vec_ptype_show(array("x", dim = c(1, 2)))
# Data frames --------------------------------------------
# The prototype of a data frame includes the prototype of
# every column
vec_ptype_show(iris)
# The prototype of multiple data frames includes the prototype
# of every column that appears in any data frame
vec_ptype_show(
  data.frame(x = TRUE),
  data.frame(y = 2),
  data.frame(z = "a")
)
Find the common type for a pair of vectors
Description
vec_ptype2() defines the coercion hierarchy for a set of related vector types. Along with vec_cast(), this generic forms the foundation of type coercions in vctrs.
vec_ptype2() is relevant when you are implementing vctrs methods for your class, but it should not usually be called directly. If you need to find the common type of a set of inputs, call vec_ptype_common() instead. This function supports multiple inputs and finalises the common type.
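Although vec_ptype2() is rarely called directly, a few direct calls help to see the hierarchy at work. This is an exploratory sketch with base types only, assuming vctrs is attached; `err` is an illustrative name.

```r
library(vctrs)

# Logical < integer < double: the richer type wins
vec_ptype2(TRUE, 1L)
vec_ptype2(1L, 2.5)

# Types with no common type signal an incompatible type error
err <- tryCatch(vec_ptype2(1L, "a"), error = function(e) e)
class(err)[1]
```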
Usage
## S3 method for class 'logical'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
## S3 method for class 'integer'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
## S3 method for class 'double'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
## S3 method for class 'complex'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
## S3 method for class 'character'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
## S3 method for class 'raw'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
## S3 method for class 'list'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")
vec_ptype2(
  x,
  y,
  ...,
  x_arg = caller_arg(x),
  y_arg = caller_arg(y),
  call = caller_env()
)
Arguments
x,y | Vector types. |
... | These dots are for future extensions and must be empty. |
x_arg,y_arg | Argument names for |
call | The execution environment of a currently running function, e.g. |
Implementing coercion methods
For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").
Dependencies
vec_ptype() is applied to x and y
See Also
stop_incompatible_type() when you determine from the attributes that an input can't be cast to the target type.
Vector type as a string
Description
vec_ptype_full() displays the full type of the vector.
vec_ptype_abbr() provides an abbreviated summary suitable for use in a column heading.
Usage
vec_ptype_full(x, ...)
vec_ptype_abbr(x, ..., prefix_named = FALSE, suffix_shape = TRUE)
Arguments
x | A vector. |
... | These dots are for future extensions and must be empty. |
prefix_named | If |
suffix_shape | If |
Value
A string.
S3 dispatch
The default method for vec_ptype_full() uses the first element of the class vector. Override this method if your class has parameters that should be prominently displayed.
The default method for vec_ptype_abbr() abbreviate()s vec_ptype_full() to 8 characters. You should almost always override it, aiming for 4-6 characters where possible.
These arguments are handled by the generic and not passed to methods:
prefix_named
suffix_shape
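Overriding both methods for a custom class might look like the sketch below. The class name my_percent and the constructor new_percent() are hypothetical and exist only for this illustration. The methods are defined at the top level of a script, whereas in a package they would be exported as S3 methods.

```r
library(vctrs)

# `my_percent` is a hypothetical class used only for this illustration
new_percent <- function(x = double()) {
  new_vctr(x, class = "my_percent")
}

# The generic handles `prefix_named` and `suffix_shape` itself,
# so the methods only need to accept `x` and `...`
vec_ptype_full.my_percent <- function(x, ...) "percent"
vec_ptype_abbr.my_percent <- function(x, ...) "pct"

p <- new_percent(c(0.1, 0.25))
full <- vec_ptype_full(p)
abbr <- vec_ptype_abbr(p)
```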
Examples
cat(vec_ptype_full(1:10))
cat(vec_ptype_full(iris))
cat(vec_ptype_abbr(1:10))
64 bit integers
Description
An integer64 is a 64-bit integer vector, implemented in the bit64 package.
Usage
## S3 method for class 'integer64'
vec_ptype_full(x, ...)
## S3 method for class 'integer64'
vec_ptype_abbr(x, ...)
## S3 method for class 'integer64'
vec_ptype2(x, y, ...)
## S3 method for class 'integer64'
vec_cast(x, to, ...)
Details
These functions integrate the integer64 class from bit64 into the vctrs type system by providing coercion functions and casting functions.
Compute ranks
Description
vec_rank() computes the sample ranks of a vector. For data frames, ranks are computed along the rows, using all columns after the first to break ties.
Usage
vec_rank(
  x,
  ...,
  ties = c("min", "max", "sequential", "dense"),
  incomplete = c("rank", "na"),
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)
Arguments
x | A vector |
... | These dots are for future extensions and must be empty. |
ties | Ranking of duplicate values.
|
incomplete | Ranking of missing and incomplete observations.
|
direction | Direction to sort in.
|
na_value | Ordering of missing values.
|
nan_distinct | A single logical specifying whether or not |
chr_proxy_collate | A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.
For data frames, Common transformation functions include: |
Details
Unlike base::rank(), when incomplete = "rank" all missing values are given the same rank, rather than an increasing sequence of ranks. When nan_distinct = FALSE, NaN values are given the same rank as NA, otherwise they are given a rank that differentiates them from NA.
Like vec_order_radix(), ordering is done in the C-locale. This can affect the ranks of character vectors, especially regarding how uppercase and lowercase letters are ranked. See the documentation of vec_order_radix() for more information.
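The two points above, missing-value ranks and C-locale ordering, can be seen in a small sketch. It assumes vctrs is attached; the input vectors are made up for illustration, and tolower() is used as one possible collation proxy.

```r
library(vctrs)

x <- c(NA, 3, NA, 1)

# base::rank() with `na.last = TRUE` gives each NA its own increasing rank
base_ranks <- rank(x, na.last = TRUE)

# vec_rank() with the default `incomplete = "rank"` gives all NAs the same rank
vctrs_ranks <- vec_rank(x)

# C-locale ordering places uppercase letters before lowercase ones
chr <- c("a", "B")
c_locale_ranks <- vec_rank(chr)

# A collation proxy such as tolower() gives case-insensitive ranks
folded_ranks <- vec_rank(chr, chr_proxy_collate = tolower)
```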
Dependencies
Examples
x <- c(5L, 6L, 3L, 3L, 5L, 3L)
vec_rank(x, ties = "min")
vec_rank(x, ties = "max")
# Sequential ranks use an increasing sequence for duplicates
vec_rank(x, ties = "sequential")
# Dense ranks remove gaps between distinct values,
# even if there are duplicates
vec_rank(x, ties = "dense")
y <- c(NA, x, NA, NaN)
# Incomplete values match other incomplete values by default, and their
# overall position can be adjusted with `na_value`
vec_rank(y, na_value = "largest")
vec_rank(y, na_value = "smallest")
# NaN can be ranked separately from NA if required
vec_rank(y, nan_distinct = TRUE)
# Rank in descending order. Since missing values are the largest value,
# they are given a rank of `1` when ranking in descending order.
vec_rank(y, direction = "desc", na_value = "largest")
# Give incomplete values a rank of `NA` by setting `incomplete = "na"`
vec_rank(y, incomplete = "na")
# Can also rank data frames, using columns after the first to break ties
z <- c(2L, 3L, 4L, 4L, 5L, 2L)
df <- data_frame(x = x, z = z)
df
vec_rank(df)
Vector recycling
Description
vec_recycle(x, size) recycles a single vector to a given size. vec_recycle_common(...) recycles multiple vectors to their common size. All functions obey the vctrs recycling rules, and will throw an error if recycling is not possible. See vec_size() for the precise definition of size.
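A minimal sketch of the single-vector variant, assuming vctrs is attached and using illustrative inputs: size-1 inputs are recycled, inputs already at the target size are left alone, and anything else is an error.

```r
library(vctrs)

# Size 1 inputs are recycled to the requested size
recycled <- vec_recycle(1L, 3)

# Inputs that already have the requested size are returned unchanged
unchanged <- vec_recycle(1:3, 3)

# Any other size cannot be recycled and signals an error
err <- tryCatch(vec_recycle(1:2, 3), error = function(cnd) cnd)

# vec_recycle_common() first finds the common size, here 3
common <- vec_recycle_common(1:3, "a")
```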
Usage
vec_recycle(x, size, ..., x_arg = "", call = caller_env())
vec_recycle_common(..., .size = NULL, .arg = "", .call = caller_env())
Arguments
x | A vector to recycle. |
size | Desired output size. |
... | Depending on the function used:
|
x_arg | Argument name for |
call,.call | The execution environment of a currently running function, e.g. |
.size | Desired output size. If omitted, will use the common size from |
.arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
Dependencies
Examples
# Inputs with 1 observation are recycled
vec_recycle_common(1:5, 5)
vec_recycle_common(integer(), 5)
## Not run:
vec_recycle_common(1:5, 1:2)
## End(Not run)
# Data frames and matrices are recycled along their rows
vec_recycle_common(data.frame(x = 1), 1:5)
vec_recycle_common(array(1:2, c(1, 2)), 1:5)
vec_recycle_common(array(1:3, c(1, 3, 1)), 1:5)
Expand the length of a vector
Description
vec_repeat() has been replaced with vec_rep() and vec_rep_each() and is deprecated as of vctrs 0.3.0.
Usage
vec_repeat(x, each = 1L, times = 1L)
Arguments
x | A vector. |
each | Number of times to repeat each element of |
times | Number of times to repeat the whole vector of |
Value
A vector the same type as x with size vec_size(x) * times * each.
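Since vec_repeat() is deprecated, the sketch below shows how its two arguments map onto the replacement functions. It assumes vctrs is attached; the input vector is illustrative.

```r
library(vctrs)

x <- c("a", "b")

# Replacement for `each`: repeat every element in place
by_each <- vec_rep_each(x, times = 2)

# Replacement for `times`: repeat the whole vector
by_times <- vec_rep(x, times = 3)

# Combining both gives a result of size vec_size(x) * times * each
both <- vec_rep(vec_rep_each(x, times = 2), times = 3)
size_matches <- vec_size(both) == vec_size(x) * 3 * 2
```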
Useful sequences
Description
vec_seq_along() is equivalent to seq_along() but uses size, not length. vec_init_along() creates a vector of missing values with size matching an existing object.
Usage
vec_seq_along(x)
vec_init_along(x, y = x)
Arguments
x,y | Vectors |
Value
vec_seq_along(): an integer vector with the same size as x.
vec_init_along(): a vector with the same type as x and the same size as y.
Examples
vec_seq_along(mtcars)
vec_init_along(head(mtcars))
Number of observations
Description
vec_size(x) returns the size of a vector. vec_is_empty() returns TRUE if the size is zero, FALSE otherwise.
The size is distinct from the length() of a vector because it generalises to the "number of observations" for 2d structures, i.e. it's the number of rows in a matrix or a data frame. This definition has the important property that every column of a data frame (even data frame and matrix columns) has the same size. vec_size_common(...) returns the common size of multiple vectors.
list_sizes() returns an integer vector containing the size of each element of a list. It is nearly equivalent to, but faster than, map_int(x, vec_size), with the exception that list_sizes() will error on non-list inputs, as defined by obj_is_list(). list_sizes() is to vec_size() as lengths() is to length().
Usage
vec_size(x)
vec_size_common(
  ...,
  .size = NULL,
  .absent = 0L,
  .arg = "",
  .call = caller_env()
)
list_sizes(x)
vec_is_empty(x)
Arguments
x,... | Vector inputs or |
.size | If |
.absent | The size used when no input is provided, or when all input is |
.arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
.call | The execution environment of a currently running function, e.g. |
Details
There is no vctrs helper that retrieves the number of columns, as this is a property of the type.
vec_size() is equivalent to NROW() but has a name that is easier to pronounce, and throws an error when passed non-vector inputs.
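The difference from NROW() on non-vector inputs can be sketched as follows, assuming vctrs is attached. The function mean is used only as a convenient non-vector object.

```r
library(vctrs)

# For vectors and data frames, the two functions agree
same_for_vector <- vec_size(letters) == NROW(letters)
same_for_df <- vec_size(mtcars) == NROW(mtcars)

# NROW() accepts a function and reports a length of 1
nrow_of_function <- NROW(mean)

# vec_size() rejects non-vector inputs with an error instead
err <- tryCatch(vec_size(mean), error = function(cnd) cnd)
```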
Value
An integer (or double for long vectors).
vec_size_common() returns .absent if all inputs are NULL or absent, 0L by default.
Invariants
vec_size(dataframe) == vec_size(dataframe[[i]])
vec_size(matrix) == vec_size(matrix[, i, drop = FALSE])
vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)
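These invariants can be checked directly on small inputs. The sketch below assumes vctrs is attached; the data frame, matrix, and vectors are made up for illustration.

```r
library(vctrs)

df <- data.frame(a = 1:3, b = c("x", "y", "z"))
m <- matrix(1:6, nrow = 3)
x <- 1:2
y <- 3:5

# Every column of a data frame has the size of the data frame
all_cols <- all(vapply(df, vec_size, integer(1)) == vec_size(df))

# A single matrix column, kept as a matrix, has the size of the matrix
mat_col <- vec_size(m) == vec_size(m[, 1, drop = FALSE])

# Sizes add up under concatenation
concat <- vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)
```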
The size of NULL
The size of NULL is hard-coded to 0L in vec_size(). vec_size_common() returns .absent when all inputs are NULL (if only some inputs are NULL, they are simply ignored).
A default size of 0 makes sense because sizes are most often queried in order to compute a total size while assembling a collection of vectors. Since we treat NULL as an absent input by principle, we return the identity of sizes under addition to reflect that an absent input doesn't take up any size.
Note that other defaults might make sense under different circumstances. For instance, a default size of 1 makes sense for finding the common size because 1 is the identity of the recycling rules.
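The handling of NULL described above can be sketched as follows, assuming vctrs is attached.

```r
library(vctrs)

# NULL has size 0, the identity of sizes under addition
size_null <- vec_size(NULL)

# When every input is NULL, the common size falls back to `.absent`
common_all_null <- vec_size_common(NULL, NULL)
common_absent_1 <- vec_size_common(NULL, .absent = 1L)

# When only some inputs are NULL, they are ignored
common_some_null <- vec_size_common(NULL, 1:3)
```

Passing .absent = 1L is one way to get the recycling identity as the fallback when that suits the caller better.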
Dependencies
See Also
vec_slice() for a variation of [ compatible with vec_size(), and vec_recycle() to recycle vectors to common length.
Examples
vec_size(1:100)
vec_size(mtcars)
vec_size(array(dim = c(3, 5, 10)))
vec_size_common(1:10, 1:10)
vec_size_common(1:10, 1)
vec_size_common(integer(), 1)
list_sizes(list("a", 1:5, letters))
Get or set observations in a vector
Description
This provides a common interface to extracting and modifying observations for all vector types, regardless of dimensionality. It is an analog to [ that matches vec_size() instead of length().
Usage
vec_slice(x, i, ..., error_call = current_env())
vec_slice(x, i) <- value
vec_assign(x, i, value, ..., x_arg = "", value_arg = "")
Arguments
x | A vector |
i | An integer, character or logical vector specifying the locations or names of the observations to get/set. Specify |
... | These dots are for future extensions and must be empty. |
error_call | The execution environment of a currently running function, e.g. |
value | Replacement values. |
x_arg,value_arg | Argument names for |
Value
A vector of the same type as x.
Genericity
Support for S3 objects depends on whether the object implements a vec_proxy() method.
When a vec_proxy() method exists, the proxy is sliced and vec_restore() is called on the result.
Otherwise vec_slice() falls back to the base generic [.
Note that S3 lists are treated as scalars by default, and will cause an error if they don't implement a vec_proxy() method.
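A small sketch of the scalar rule, assuming vctrs is attached. The classes my_record and my_number are hypothetical and defined only for this illustration.

```r
library(vctrs)

# `my_record` is a hypothetical S3 list class without a vec_proxy() method,
# so vctrs treats it as a scalar and refuses to slice it
scalar_like <- structure(list(a = 1, b = 2), class = "my_record")
err <- tryCatch(vec_slice(scalar_like, 1), error = function(cnd) cnd)

# `my_number` is a hypothetical atomic class built with new_vctr();
# it can be sliced and keeps its class
v <- new_vctr(c(10, 20, 30), class = "my_number")
sliced <- vec_slice(v, 2:3)
```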
Differences with base R subsetting
vec_slice() only slices along one dimension. For two-dimensional types, the first dimension is subsetted.
vec_slice() preserves attributes by default.
vec_slice<-() is type-stable and always returns the same type as the LHS.
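The first difference is easiest to see on a matrix. The sketch below assumes vctrs is attached and uses a made-up 3 x 2 matrix.

```r
library(vctrs)

m <- matrix(1:6, nrow = 3)

# Base `[` with a single index treats the matrix as a flat vector
base_result <- m[1:2]

# vec_slice() subsets the first dimension (rows) and keeps the shape
vctrs_result <- vec_slice(m, 1:2)
result_dim <- dim(vctrs_result)
```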
Dependencies
vctrs dependencies
base dependencies
base::`[`
If a non-data-frame vector class doesn't have a vec_proxy() method, the vector is sliced with [ instead.
Examples
x <- sample(10)
x
vec_slice(x, 1:3)
# You can assign with the infix variant:
vec_slice(x, 2) <- 100
x
# Or with the regular variant that doesn't modify the original input:
y <- vec_assign(x, 3, 500)
y
x
# Slicing objects of higher dimension:
vec_slice(mtcars, 1:3)
# Type stability --------------------------------------------------
# The assign variant is type stable. It always returns the same
# type as the input.
x <- 1:5
vec_slice(x, 2) <- 20.0
# `x` is still an integer vector because the RHS was cast to the
# type of the LHS:
vec_ptype(x)
# Compare to `[<-`:
x[2] <- 20.0
vec_ptype(x)
# Note that the types must be coercible for the cast to happen.
# For instance, you can cast a double vector of whole numbers to an
# integer vector:
vec_cast(1, integer())
# But not fractional doubles:
try(vec_cast(1.5, integer()))
# For this reason you can't assign fractional values in an integer
# vector:
x <- 1:3
try(vec_slice(x, 2) <- 1.5)
Split a vector into groups
Description
This is a generalisation of split() that can split by any type of vector, not just factors. Instead of returning the keys in the character names, they are returned in a separate parallel vector.
Usage
vec_split(x, by)
Arguments
x | Vector to divide into groups. |
by | Vector whose unique values define the groups. |
Value
A data frame with two columns and size equal to vec_size(vec_unique(by)). The key column has the same type as by, and the val column is a list containing elements of type vec_ptype(x).
Note for complex types, the default data.frame print method will be suboptimal, and you will want to coerce into a tibble to better understand the output.
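The shape of the return value can be sketched on a tiny input, assuming vctrs is attached. The vectors are made up for illustration.

```r
library(vctrs)

x <- c("a", "b", "c", "d")
by <- c(1, 2, 1, 2)

out <- vec_split(x, by)

# One row per unique value of `by`, with the same type as `by`
keys <- out$key

# A list column holding the elements of `x` that belong to each key
first_group <- out$val[[1]]
second_group <- out$val[[2]]
```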
Dependencies
Examples
vec_split(mtcars$cyl, mtcars$vs)
vec_split(mtcars$cyl, mtcars[c("vs", "am")])
if (require("tibble")) {
  as_tibble(vec_split(mtcars$cyl, mtcars[c("vs", "am")]))
  as_tibble(vec_split(mtcars, mtcars[c("vs", "am")]))
}
Deprecated type functions
Description
These functions have been renamed:
vec_type() => vec_ptype()
vec_type2() => vec_ptype2()
vec_type_common() => vec_ptype_common()
Usage
vec_type(x)
vec_type_common(..., .ptype = NULL)
vec_type2(x, y, ...)
Arguments
x,y,...,.ptype | Arguments for deprecated functions. |
Chopping
Description
vec_unchop() has been renamed to list_unchop() and is deprecated as of vctrs 0.5.0.
Usage
vec_unchop(
  x,
  indices = NULL,
  ptype = NULL,
  name_spec = NULL,
  name_repair = c("minimal", "unique", "check_unique", "universal")
)
Arguments
x | A vector |
indices | For For |
ptype | If |
name_spec | A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like
See the name specification topic. |
name_repair | How to repair names, see |
Value
vec_chop(): A list where each element has the same type as x. The size of the list is equal to vec_size(indices), vec_size(sizes), or vec_size(x) depending on whether or not indices or sizes is provided.
list_unchop(): A vector of type vec_ptype_common(!!!x), or ptype, if specified. The size is computed as vec_size_common(!!!indices) unless the indices are NULL, in which case the size is vec_size_common(!!!x).
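The relationship between the two return values can be sketched with the current names, vec_chop() and list_unchop(). The sketch assumes vctrs is attached; the input and indices are illustrative.

```r
library(vctrs)

x <- c("a", "b", "c")

# Without indices, vec_chop() yields one list element per observation
pieces <- vec_chop(x)

# With indices, each list element holds the selected observations
grouped <- vec_chop(x, indices = list(c(1L, 3L), 2L))

# list_unchop() with the same indices puts every value back in place
restored <- list_unchop(grouped, indices = list(c(1L, 3L), 2L))
```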
Find and count unique values
Description
vec_unique(): the unique values. Equivalent to unique().
vec_unique_loc(): the locations of the unique values.
vec_unique_count(): the number of unique values.
Usage
vec_unique(x)
vec_unique_loc(x)
vec_unique_count(x)
Arguments
x | A vector (including a data frame). |
Value
vec_unique(): a vector the same type as x containing only unique values.
vec_unique_loc(): an integer vector, giving locations of unique values.
vec_unique_count(): an integer vector of length 1, giving the number of unique values.
Dependencies
Missing values
In most cases, missing values are not considered to be equal, i.e. NA == NA is not TRUE. This behaviour would be unappealing here, so these functions consider all NAs to be equal. (Similarly, all NaN are also considered to be equal.)
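A short sketch of this behaviour with both NA and NaN present, assuming vctrs is attached. The input is made up, and the observation that NA and NaN stay distinct from each other is an addition here rather than something stated above.

```r
library(vctrs)

x <- c(NA, NaN, 1, NA, NaN, 1)

# `==` cannot say whether two missing values are equal
na_compare <- NA == NA

# All NAs collapse to a single value, as do all NaNs;
# NA and NaN are still kept apart from each other
u <- vec_unique(x)
loc <- vec_unique_loc(x)
n <- vec_unique_count(x)
```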
See Also
vec_duplicate for functions that work with the dual of unique values: duplicated values.
Examples
x <- rpois(100, 8)
vec_unique(x)
vec_unique_loc(x)
vec_unique_count(x)
# `vec_unique()` returns values in the order that it encounters them
# use sort = "location" to match to the result of `vec_count()`
head(vec_unique(x))
head(vec_count(x, sort = "location"))
# Normally missing values are not considered to be equal
NA == NA
# But they are for the purposes of considering uniqueness
vec_unique(c(NA, NA, NA, NA, 1, 2, 1))
Vector checks
Description
obj_is_vector() tests if x is considered a vector in the vctrs sense. See Vectors and scalars below for the exact details.
obj_check_vector() uses obj_is_vector() and throws a standardized and informative error if it returns FALSE.
vec_check_size() tests if x has size size, and throws an informative error if it doesn't.
Usage
obj_is_vector(x)
obj_check_vector(x, ..., arg = caller_arg(x), call = caller_env())
vec_check_size(x, size, ..., arg = caller_arg(x), call = caller_env())
Arguments
x | For |
... | These dots are for future extensions and must be empty. |
arg | An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem. |
call | The execution environment of a currently running function, e.g. |
size | The size to check for. |
Value
obj_is_vector() returns a single TRUE or FALSE.
obj_check_vector() returns NULL invisibly, or errors.
vec_check_size() returns NULL invisibly, or errors.
Vectors and scalars
Informally, a vector is a collection that makes sense to use as a column in a data frame. The following rules define whether or not x is considered a vector.
If no vec_proxy() method has been registered, x is a vector if:
The base type of the object is atomic: "logical", "integer", "double", "complex", "character", or "raw".
x is a list, as defined by obj_is_list().
x is a data.frame.
If a vec_proxy() method has been registered, x is a vector if:
The proxy satisfies one of the above conditions.
The base type of the proxy is "list", regardless of its class. S3 lists are thus treated as scalars unless they implement a vec_proxy() method.
Otherwise an object is treated as scalar and cannot be used as a vector. In particular:
NULL is not a vector.
S3 lists like lm objects are treated as scalars by default.
Objects of type expression are not treated as vectors.
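The scalar cases listed above can be checked directly. The sketch assumes vctrs is attached; the lm() fit on mtcars is used only as a convenient S3 list.

```r
library(vctrs)

# Atomic vectors, bare lists and data frames are vectors
atomic_ok <- obj_is_vector(1:3)
list_ok <- obj_is_vector(list(1, "a"))
df_ok <- obj_is_vector(mtcars)

# NULL, expressions and S3 lists such as lm objects are not
null_ok <- obj_is_vector(NULL)
expr_ok <- obj_is_vector(expression(x + 1))
fit <- lm(mpg ~ wt, data = mtcars)
lm_ok <- obj_is_vector(fit)
```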
Technical limitations
Support for S4 vectors is currently limited to objects that inherit from an atomic type.
Subclasses of data.frame that append their class to the back of the "class" attribute are not treated as vectors. If you inherit from an S3 class, always prepend your class to the front of the "class" attribute for correct dispatch. This matches our general principle of allowing subclasses but not mixins.
Examples
obj_is_vector(1)
# Data frames are vectors
obj_is_vector(data_frame())
# Bare lists are vectors
obj_is_vector(list())
# S3 lists are vectors if they explicitly inherit from `"list"`
x <- structure(list(), class = c("my_list", "list"))
obj_is_list(x)
obj_is_vector(x)
# But if they don't explicitly inherit from `"list"`, they aren't
# automatically considered to be vectors. Instead, vctrs considers this
# to be a scalar object, like a linear model returned from `lm()`.
y <- structure(list(), class = "my_list")
obj_is_list(y)
obj_is_vector(y)
# `obj_check_vector()` throws an informative error if the input
# isn't a vector
try(obj_check_vector(y))
# `vec_check_size()` throws an informative error if the size of the
# input doesn't match `size`
vec_check_size(1:5, size = 5)
try(vec_check_size(1:5, size = 4))