This vignette compares dplyr functions to their base R equivalents. This helps those familiar with base R understand better what dplyr does, and shows dplyr users how they might express the same ideas in base R code. We’ll start with a rough overview of the major differences, then discuss the one-table verbs in more detail, followed by the two-table verbs.

- The core dplyr verbs input and output data frames. This contrasts with base R functions, which more frequently work with individual vectors.
- dplyr relies heavily on “non-standard evaluation” so that you don’t need to use `$` to refer to columns in the “current” data frame. This behaviour is inspired by the base functions `subset()` and `transform()`.
- dplyr solutions tend to use a variety of single-purpose verbs, while base R solutions typically use `[` in a variety of ways, depending on the task at hand.
- Multiple dplyr verbs are often strung together into a pipeline by `%>%`. In base R, you’ll typically save intermediate results to a variable that you either discard or repeatedly overwrite.
- All dplyr verbs handle “grouped” data frames, so the code to perform a computation per group looks very similar to code that works on a whole data frame. In base R, per-group operations tend to have varied forms.

The following table shows a condensed translation between dplyr verbs and their base R equivalents. The following sections describe each operation in more detail. You’ll learn more about the dplyr verbs in their documentation and in `vignette("dplyr")`.
| dplyr | base |
|---|---|
| `arrange(df, x)` | `df[order(x), , drop = FALSE]` |
| `distinct(df, x)` | `df[!duplicated(x), , drop = FALSE]`, `unique()` |
| `filter(df, x)` | `df[which(x), , drop = FALSE]`, `subset()` |
| `mutate(df, z = x + y)` | `df$z <- df$x + df$y`, `transform()` |
| `pull(df, 1)` | `df[[1]]` |
| `pull(df, x)` | `df$x` |
| `rename(df, y = x)` | `names(df)[names(df) == "x"] <- "y"` |
| `relocate(df, y)` | `df[union("y", names(df))]` |
| `select(df, x, y)` | `df[c("x", "y")]`, `subset()` |
| `select(df, starts_with("x"))` | `df[grepl("^x", names(df))]` |
| `summarise(df, mean(x))` | `mean(df$x)`, `tapply()`, `aggregate()`, `by()` |
| `slice(df, c(1, 2, 5))` | `df[c(1, 2, 5), , drop = FALSE]` |
To begin, we’ll load dplyr and convert `mtcars` and `iris` to tibbles so that we can easily show only abbreviated output for each operation.
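A minimal sketch of that setup, assuming dplyr is installed. `as_tibble()` is re-exported by dplyr, and the assignments create tibble copies of the built-in datasets in the current session:

```r
library(dplyr)

# Convert the built-in data frames to tibbles so that printing is abbreviated.
mtcars <- as_tibble(mtcars)
iris <- as_tibble(iris)
```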
## `arrange()`: Arrange rows by variables

`dplyr::arrange()` orders the rows of a data frame by the values of one or more columns:
```r
mtcars %>% arrange(cyl, disp)
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#> 2  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#> 3  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#> 4  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#> # ℹ 28 more rows
```

The `desc()` helper allows you to order selected variables in descending order:
```r
mtcars %>% arrange(desc(cyl), desc(disp))
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  10.4     8   472   205  2.93  5.25  18.0     0     0     3     4
#> 2  10.4     8   460   215  3     5.42  17.8     0     0     3     4
#> 3  14.7     8   440   230  3.23  5.34  17.4     0     0     3     4
#> 4  19.2     8   400   175  3.08  3.84  17.0     0     0     3     2
#> # ℹ 28 more rows
```

We can replicate this in base R by using `[` with `order()`:
```r
mtcars[order(mtcars$cyl, mtcars$disp), , drop = FALSE]
#> # A tibble: 32 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#> 2  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#> 3  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#> 4  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#> # ℹ 28 more rows
```

Note the use of `drop = FALSE`. If you forget this, and the input is a data frame with a single column, the output will be a vector, not a data frame. This is a source of subtle bugs.
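The effect of `drop = FALSE` is easiest to see on a plain one-column `data.frame`. The sketch below uses a small made-up data frame (`one_col` is a hypothetical name). It deliberately avoids tibbles, because `[` on a tibble does not simplify to a vector.

```r
# A one-column base data frame (hypothetical example data).
one_col <- data.frame(x = c(3, 1, 2))

# Without drop = FALSE, `[` simplifies the single column to a plain vector.
sorted_vec <- one_col[order(one_col$x), ]

# With drop = FALSE, the result stays a data frame.
sorted_df <- one_col[order(one_col$x), , drop = FALSE]

is.data.frame(sorted_vec)
is.data.frame(sorted_df)
```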
Base R does not provide a convenient and general way to sort individual variables in descending order, so you have two options:

- For numeric variables, you can use `-x`.
- You can ask `order()` to sort all variables in descending order (`decreasing = TRUE`).

## `distinct()`: Select distinct/unique rows

`dplyr::distinct()` selects unique rows:
```r
df <- tibble(
  x = sample(10, 100, rep = TRUE),
  y = sample(10, 100, rep = TRUE)
)

df %>% distinct(x) # selected columns
#> # A tibble: 10 × 1
#>       x
#>   <int>
#> 1     3
#> 2     5
#> 3     4
#> 4     7
#> # ℹ 6 more rows
df %>% distinct(x, .keep_all = TRUE) # whole data frame
#> # A tibble: 10 × 2
#>       x     y
#>   <int> <int>
#> 1     3     6
#> 2     5     2
#> 3     4     1
#> 4     7     1
#> # ℹ 6 more rows
```

There are two equivalents in base R, depending on whether you want the whole data frame, or just selected variables:
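A minimal sketch of the two base R forms, following the `unique()` and `!duplicated()` entries in the translation table. The small `df` below is made-up example data rather than the random sample used above.

```r
# Hypothetical example data standing in for `df`.
df <- data.frame(
  x = c(3, 5, 3, 4, 5),
  y = c(6, 2, 1, 1, 7)
)

# Selected variables only: unique() on a one-column data frame.
distinct_x <- unique(df["x"])

# Whole data frame: keep the first row seen for each value of x.
distinct_rows <- df[!duplicated(df$x), , drop = FALSE]

distinct_x
distinct_rows
```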
## `filter()`: Return rows with matching conditions

`dplyr::filter()` selects rows where an expression is `TRUE`:
```r
starwars %>% filter(species == "Human")
#> # A tibble: 35 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
#> 2 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
#> 3 Leia Org…    150    49 brown      light      brown           19   fema… femin…
#> 4 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
#> # ℹ 31 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
starwars %>% filter(mass > 1000)
#> # A tibble: 1 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Jabba De…    175  1358 <NA>       green-tan… orange           600 herm… mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
starwars %>% filter(hair_color == "none" & eye_color == "black")
#> # A tibble: 9 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Nien Nunb    160    68 none       grey       black             NA male  mascu…
#> 2 Gasgano      122    NA none       white, bl… black             NA male  mascu…
#> 3 Kit Fisto    196    87 none       green      black             NA male  mascu…
#> 4 Plo Koon     188    80 none       orange     black             22 male  mascu…
#> # ℹ 5 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
```

The closest base equivalent (and the inspiration for `filter()`) is `subset()`:
```r
subset(starwars, species == "Human")
#> # A tibble: 35 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
#> 2 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
#> 3 Leia Org…    150    49 brown      light      brown           19   fema… femin…
#> 4 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
#> # ℹ 31 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
subset(starwars, mass > 1000)
#> # A tibble: 1 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Jabba De…    175  1358 <NA>       green-tan… orange           600 herm… mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
subset(starwars, hair_color == "none" & eye_color == "black")
#> # A tibble: 9 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Nien Nunb    160    68 none       grey       black             NA male  mascu…
#> 2 Gasgano      122    NA none       white, bl… black             NA male  mascu…
#> 3 Kit Fisto    196    87 none       green      black             NA male  mascu…
#> 4 Plo Koon     188    80 none       orange     black             22 male  mascu…
#> # ℹ 5 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
```

You can also use `[`, but this also requires the use of `which()` to remove `NA`s:
```r
starwars[which(starwars$species == "Human"), , drop = FALSE]
#> # A tibble: 35 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
#> 2 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
#> 3 Leia Org…    150    49 brown      light      brown           19   fema… femin…
#> 4 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
#> # ℹ 31 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
starwars[which(starwars$mass > 1000), , drop = FALSE]
#> # A tibble: 1 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Jabba De…    175  1358 <NA>       green-tan… orange           600 herm… mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
starwars[which(starwars$hair_color == "none" & starwars$eye_color == "black"), , drop = FALSE]
#> # A tibble: 9 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
#> 1 Nien Nunb    160    68 none       grey       black             NA male  mascu…
#> 2 Gasgano      122    NA none       white, bl… black             NA male  mascu…
#> 3 Kit Fisto    196    87 none       green      black             NA male  mascu…
#> 4 Plo Koon     188    80 none       orange     black             22 male  mascu…
#> # ℹ 5 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>
```

## `mutate()`: Create or transform variables

`dplyr::mutate()` creates new variables from existing variables:
```r
df %>% mutate(z = x + y, z2 = z ^ 2)
#> # A tibble: 100 × 4
#>       x     y     z    z2
#>   <int> <int> <int> <dbl>
#> 1     3     6     9    81
#> 2     5     2     7    49
#> 3     4     1     5    25
#> 4     7     1     8    64
#> # ℹ 96 more rows
```

The closest base equivalent is `transform()`, but note that it cannot use freshly created variables:
```r
head(transform(df, z = x + y, z2 = (x + y) ^ 2))
#>    x y  z  z2
#> 1  3 6  9  81
#> 2  5 2  7  49
#> 3  4 1  5  25
#> 4  7 1  8  64
#> 5 10 7 17 289
#> 6  7 3 10 100
```

Alternatively, you can use `$<-`:
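A minimal sketch of the `$<-` approach. The column names `cyl2` and `cyl4` are chosen to match the extra columns visible in later `mtcars` output. The exact expressions are an assumption, picked so that a `cyl` of 6 gives 12 and 24.

```r
# Each assignment adds (or overwrites) one column on a copy of mtcars.
mtcars$cyl2 <- mtcars$cyl * 2
# Unlike transform(), a later assignment can use a freshly created column.
mtcars$cyl4 <- mtcars$cyl2 * 2

head(mtcars[c("cyl", "cyl2", "cyl4")], 3)
```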
When applied to a grouped data frame, `dplyr::mutate()` computes the new variable once per group:
```r
gf <- tibble(g = c(1, 1, 2, 2), x = c(0.5, 1.5, 2.5, 3.5))
gf %>%
  group_by(g) %>%
  mutate(x_mean = mean(x), x_rank = rank(x))
#> # A tibble: 4 × 4
#> # Groups:   g [2]
#>       g     x x_mean x_rank
#>   <dbl> <dbl>  <dbl>  <dbl>
#> 1     1   0.5      1      1
#> 2     1   1.5      1      2
#> 3     2   2.5      3      1
#> 4     2   3.5      3      2
```

To replicate this in base R, you can use `ave()`:
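One way this might look, as a sketch. It rebuilds `gf` as a plain `data.frame` so the block is self-contained, and it combines `ave()` with `transform()`, which is only one of several possible arrangements. `gf_base` is a hypothetical name.

```r
gf <- data.frame(g = c(1, 1, 2, 2), x = c(0.5, 1.5, 2.5, 3.5))

# ave() applies FUN within each level of g and returns a vector
# with the same length and row order as x.
gf_base <- transform(gf,
  x_mean = ave(x, g, FUN = mean),
  x_rank = ave(x, g, FUN = rank)
)

gf_base
```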
## `pull()`: Pull out a single variable

`dplyr::pull()` extracts a variable either by name or position:
```r
mtcars %>% pull(1)
#>  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
#> [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
#> [31] 15.0 21.4
mtcars %>% pull(cyl)
#>  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
```

This is equivalent to `[[` for positions and `$` for names:
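A short sketch of both forms on the built-in `mtcars` data. The result names `mpg_by_position` and `cyl_by_name` are hypothetical.

```r
# By position: [[ extracts the first column as a vector.
mpg_by_position <- mtcars[[1]]

# By name: $ extracts the named column as a vector.
cyl_by_name <- mtcars$cyl

head(mpg_by_position)
head(cyl_by_name)
```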
## `relocate()`: Change column order

`dplyr::relocate()` makes it easy to move a set of columns to a new position (by default, the front):
```r
# to front
mtcars %>% relocate(gear, carb)
#> # A tibble: 32 × 13
#>    gear  carb   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  cyl2  cyl4
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     4  21       6   160   110  3.9   2.62  16.5     0     1    12    24
#> 2     4     4  21       6   160   110  3.9   2.88  17.0     0     1    12    24
#> 3     4     1  22.8     4   108    93  3.85  2.32  18.6     1     1     8    16
#> 4     3     1  21.4     6   258   110  3.08  3.22  19.4     1     0    12    24
#> # ℹ 28 more rows

# to back
mtcars %>% relocate(mpg, cyl, .after = last_col())
#> # A tibble: 32 × 13
#>    disp    hp  drat    wt  qsec    vs    am  gear  carb  cyl2  cyl4   mpg   cyl
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   160   110  3.9   2.62  16.5     0     1     4     4    12    24  21       6
#> 2   160   110  3.9   2.88  17.0     0     1     4     4    12    24  21       6
#> 3   108    93  3.85  2.32  18.6     1     1     4     1     8    16  22.8     4
#> 4   258   110  3.08  3.22  19.4     1     0     3     1    12    24  21.4     6
#> # ℹ 28 more rows
```

We can replicate this in base R with a little set manipulation:
```r
mtcars[union(c("gear", "carb"), names(mtcars))]
#> # A tibble: 32 × 13
#>    gear  carb   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  cyl2  cyl4
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     4  21       6   160   110  3.9   2.62  16.5     0     1    12    24
#> 2     4     4  21       6   160   110  3.9   2.88  17.0     0     1    12    24
#> 3     4     1  22.8     4   108    93  3.85  2.32  18.6     1     1     8    16
#> 4     3     1  21.4     6   258   110  3.08  3.22  19.4     1     0    12    24
#> # ℹ 28 more rows

to_back <- c("mpg", "cyl")
mtcars[c(setdiff(names(mtcars), to_back), to_back)]
#> # A tibble: 32 × 13
#>    disp    hp  drat    wt  qsec    vs    am  gear  carb  cyl2  cyl4   mpg   cyl
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   160   110  3.9   2.62  16.5     0     1     4     4    12    24  21       6
#> 2   160   110  3.9   2.88  17.0     0     1     4     4    12    24  21       6
#> 3   108    93  3.85  2.32  18.6     1     1     4     1     8    16  22.8     4
#> 4   258   110  3.08  3.22  19.4     1     0     3     1    12    24  21.4     6
#> # ℹ 28 more rows
```

Moving columns to somewhere in the middle requires a little more set twiddling.
## `rename()`: Rename variables by name

`dplyr::rename()` allows you to rename variables by name or position:
```r
iris %>% rename(sepal_length = Sepal.Length, sepal_width = 2)
#> # A tibble: 150 × 5
#>   sepal_length sepal_width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
#> 4          4.6         3.1          1.5         0.2 setosa
#> # ℹ 146 more rows
```

Renaming variables by position is straightforward in base R:
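A minimal sketch, working on a copy (the hypothetical `iris2`) so that the built-in `iris` is left untouched:

```r
iris2 <- iris

# Replace the name of the second column in place.
names(iris2)[2] <- "sepal_width"

names(iris2)
```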
Renaming variables by name requires a bit more work:
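This sketch follows the `names(df)[names(df) == "x"] <- "y"` pattern from the translation table, again on a hypothetical copy called `iris2`:

```r
iris2 <- iris

# Find the position of the old name with a logical comparison, then assign the new one.
names(iris2)[names(iris2) == "Sepal.Length"] <- "sepal_length"

names(iris2)
```

If no column has the old name, the comparison selects nothing and the assignment silently leaves the names unchanged, so it can be worth checking the result.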
## `rename_with()`: Rename variables with a function

`dplyr::rename_with()` transforms column names with a function:
```r
iris %>% rename_with(toupper)
#> # A tibble: 150 × 5
#>   SEPAL.LENGTH SEPAL.WIDTH PETAL.LENGTH PETAL.WIDTH SPECIES
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
#> 4          4.6         3.1          1.5         0.2 setosa
#> # ℹ 146 more rows
```

A similar effect can be achieved with `setNames()` in base R:
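A minimal sketch of the `setNames()` form; `iris_upper` is a hypothetical name for the result:

```r
# setNames() returns a copy of the data frame with the transformed names.
iris_upper <- setNames(iris, toupper(names(iris)))

names(iris_upper)
```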
## `select()`: Select variables by name

`dplyr::select()` subsets columns by position, name, function of name, or other property:
```r
iris %>% select(1:3)
#> # A tibble: 150 × 3
#>   Sepal.Length Sepal.Width Petal.Length
#>          <dbl>       <dbl>        <dbl>
#> 1          5.1         3.5          1.4
#> 2          4.9         3            1.4
#> 3          4.7         3.2          1.3
#> 4          4.6         3.1          1.5
#> # ℹ 146 more rows
iris %>% select(Species, Sepal.Length)
#> # A tibble: 150 × 2
#>   Species Sepal.Length
#>   <fct>          <dbl>
#> 1 setosa           5.1
#> 2 setosa           4.9
#> 3 setosa           4.7
#> 4 setosa           4.6
#> # ℹ 146 more rows
iris %>% select(starts_with("Petal"))
#> # A tibble: 150 × 2
#>   Petal.Length Petal.Width
#>          <dbl>       <dbl>
#> 1          1.4         0.2
#> 2          1.4         0.2
#> 3          1.3         0.2
#> 4          1.5         0.2
#> # ℹ 146 more rows
iris %>% select(where(is.factor))
#> # A tibble: 150 × 1
#>   Species
#>   <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> # ℹ 146 more rows
```

Subsetting variables by position is straightforward in base R:
```r
iris[1:3] # single argument selects columns; never drops
#> # A tibble: 150 × 3
#>   Sepal.Length Sepal.Width Petal.Length
#>          <dbl>       <dbl>        <dbl>
#> 1          5.1         3.5          1.4
#> 2          4.9         3            1.4
#> 3          4.7         3.2          1.3
#> 4          4.6         3.1          1.5
#> # ℹ 146 more rows
# With two arguments the first index selects rows, so this returns rows 1 to 3;
# the two-argument form for columns is iris[, 1:3, drop = FALSE].
iris[1:3, , drop = FALSE]
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa
```

You have two options to subset by name:
```r
iris[c("Species", "Sepal.Length")]
#> # A tibble: 150 × 2
#>   Species Sepal.Length
#>   <fct>          <dbl>
#> 1 setosa           5.1
#> 2 setosa           4.9
#> 3 setosa           4.7
#> 4 setosa           4.6
#> # ℹ 146 more rows
subset(iris, select = c(Species, Sepal.Length))
#> # A tibble: 150 × 2
#>   Species Sepal.Length
#>   <fct>          <dbl>
#> 1 setosa           5.1
#> 2 setosa           4.9
#> 3 setosa           4.7
#> 4 setosa           4.6
#> # ℹ 146 more rows
```

Subsetting by function of name requires a bit of work with `grep()`:
```r
iris[grep("^Petal", names(iris))]
#> # A tibble: 150 × 2
#>   Petal.Length Petal.Width
#>          <dbl>       <dbl>
#> 1          1.4         0.2
#> 2          1.4         0.2
#> 3          1.3         0.2
#> 4          1.5         0.2
#> # ℹ 146 more rows
```

And you can use `Filter()` to subset by type:
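A minimal sketch: `Filter()` applies the predicate to each column and keeps the columns for which it returns `TRUE`. `factor_cols` is a hypothetical name.

```r
# Keep only the factor columns of iris.
factor_cols <- Filter(is.factor, iris)

names(factor_cols)
```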
## `summarise()`: Reduce multiple values down to a single value

`dplyr::summarise()` computes one or more summaries for each group:
```r
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())
#> # A tibble: 3 × 3
#>     cyl  mean     n
#>   <dbl> <dbl> <int>
#> 1     4  105.    11
#> 2     6  183.     7
#> 3     8  353.    14
```

I think the closest base R equivalent uses `by()`. Unfortunately `by()` returns a list of data frames, but you can combine them back together again with `do.call()` and `rbind()`:
```r
mtcars_by <- by(mtcars, mtcars$cyl, function(df) {
  with(df, data.frame(cyl = cyl[[1]], mean = mean(disp), n = nrow(df)))
})
do.call(rbind, mtcars_by)
#>   cyl     mean  n
#> 4   4 105.1364 11
#> 6   6 183.3143  7
#> 8   8 353.1000 14
```

`aggregate()` comes very close to providing an elegant answer:
```r
agg <- aggregate(disp ~ cyl, mtcars, function(x) c(mean = mean(x), n = length(x)))
agg
#>   cyl disp.mean   disp.n
#> 1   4  105.1364  11.0000
#> 2   6  183.3143   7.0000
#> 3   8  353.1000  14.0000
```

But unfortunately, while it looks like there are `disp.mean` and `disp.n` columns, it’s actually a single matrix column:
```r
str(agg)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ cyl : num  4 6 8
#>  $ disp: num [1:3, 1:2] 105 183 353 11 7 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "mean" "n"
```

You can see a variety of other options at https://gist.github.com/hadley/c430501804349d382ce90754936ab8ec.
## `slice()`: Choose rows by position

`slice()` selects rows by their location:
```r
slice(mtcars, 25:n())
#> # A tibble: 8 × 13
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  cyl2  cyl4
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  19.2     8 400     175  3.08  3.84  17.0     0     0     3     2    16    32
#> 2  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1     8    16
#> 3  26       4 120.     91  4.43  2.14  16.7     0     1     5     2     8    16
#> 4  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2     8    16
#> # ℹ 4 more rows
```

This is straightforward to replicate with `[`:
```r
mtcars[25:nrow(mtcars), , drop = FALSE]
#> # A tibble: 8 × 13
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  cyl2  cyl4
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  19.2     8 400     175  3.08  3.84  17.0     0     0     3     2    16    32
#> 2  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1     8    16
#> 3  26       4 120.     91  4.43  2.14  16.7     0     1     5     2     8    16
#> 4  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2     8    16
#> # ℹ 4 more rows
```

When we want to merge two data frames (`x` and `y`), we have a variety of different ways to bring them together. Various base R `merge()` calls are replaced by a variety of dplyr `join()` functions.
| dplyr | base |
|---|---|
| `inner_join(df1, df2)` | `merge(df1, df2)` |
| `left_join(df1, df2)` | `merge(df1, df2, all.x = TRUE)` |
| `right_join(df1, df2)` | `merge(df1, df2, all.y = TRUE)` |
| `full_join(df1, df2)` | `merge(df1, df2, all = TRUE)` |
| `semi_join(df1, df2)` | `df1[df1$x %in% df2$x, , drop = FALSE]` |
| `anti_join(df1, df2)` | `df1[!df1$x %in% df2$x, , drop = FALSE]` |
For more information about two-table verbs, see `vignette("two-table")`.
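To make the first four rows of the table concrete, here is a small sketch using only `merge()`. The data frames `df1` and `df2` and their columns are made up for illustration. They share a single key column `x`, which `merge()` picks up by default because it is the only column name they have in common.

```r
# Hypothetical example data: two data frames sharing the key column `x`.
df1 <- data.frame(x = c(1, 2, 3), a = c("a1", "a2", "a3"))
df2 <- data.frame(x = c(2, 3, 4), b = c("b2", "b3", "b4"))

inner <- merge(df1, df2)               # like inner_join(): keys 2, 3
left  <- merge(df1, df2, all.x = TRUE) # like left_join():  keys 1, 2, 3
right <- merge(df1, df2, all.y = TRUE) # like right_join(): keys 2, 3, 4
full  <- merge(df1, df2, all = TRUE)   # like full_join():  keys 1, 2, 3, 4

full
```

Unmatched rows are filled with `NA` in the columns that come from the other data frame.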
dplyr’s `inner_join()`, `left_join()`, `right_join()`, and `full_join()` add new columns from `y` to `x`, matching rows based on a set of “keys”, and differ only in how missing matches are handled. They are equivalent to calls to `merge()` with various settings of the `all`, `all.x`, and `all.y` arguments. The main difference is the order of the rows:
- dplyr preserves the order of the `x` data frame.
- `merge()` sorts the key columns.

dplyr’s `semi_join()` and `anti_join()` affect only the rows, not the columns:
```r
band_members %>% semi_join(band_instruments)
#> Joining with `by = join_by(name)`
#> # A tibble: 2 × 2
#>   name  band
#>   <chr> <chr>
#> 1 John  Beatles
#> 2 Paul  Beatles
band_members %>% anti_join(band_instruments)
#> Joining with `by = join_by(name)`
#> # A tibble: 1 × 2
#>   name  band
#>   <chr> <chr>
#> 1 Mick  Stones
```

They can be replicated in base R with `[` and `%in%`:
```r
band_members[band_members$name %in% band_instruments$name, , drop = FALSE]
#> # A tibble: 2 × 2
#>   name  band
#>   <chr> <chr>
#> 1 John  Beatles
#> 2 Paul  Beatles
band_members[!band_members$name %in% band_instruments$name, , drop = FALSE]
#> # A tibble: 1 × 2
#>   name  band
#>   <chr> <chr>
#> 1 Mick  Stones
```

Semi and anti joins with multiple key variables are considerably more challenging to implement.
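One rough way to approach the multi-key case is to paste the key columns into a single string per row and reuse `%in%`. This is only a sketch under stated assumptions: `row_key()` is a hypothetical helper, the data is made up, and the trick relies on the separator never appearing in the key values and on keys converting to the same text in both tables.

```r
# Hypothetical helper: build one string key per row from the columns in `by`.
row_key <- function(df, by) {
  do.call(paste, c(unname(as.list(df[by])), sep = "\r"))
}

# Hypothetical example data with a two-column key (a, b).
x <- data.frame(a = c(1, 1, 2), b = c("u", "v", "u"), value = 1:3)
y <- data.frame(a = c(1, 2), b = c("v", "u"))
keys <- c("a", "b")

matched <- row_key(x, keys) %in% row_key(y, keys)

semi <- x[matched, , drop = FALSE]  # rows of x with a match in y
anti <- x[!matched, , drop = FALSE] # rows of x without a match in y

semi
anti
```

Because the comparison happens on text, values that print identically are treated as equal, so this is less strict than dplyr’s type-aware matching.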