Data table backend for dplyr

License

MIT (LICENSE.md); a second file, LICENSE, is of an unrecognized type.

tidyverse/dtplyr


Overview

dtplyr provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.

See vignette("translation") for details of the current translations, and table.express and rqdatatable for related work.

Installation

You can install from CRAN with:

install.packages("dtplyr")

Or try the development version from GitHub with:

# install.packages("pak")
pak::pak("tidyverse/dtplyr")

Usage

To use dtplyr, you must at least load dtplyr and dplyr. You may also want to load data.table so you can access the other goodies that it provides:

library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

Then use lazy_dt() to create a “lazy” data table that tracks the operations performed on it.

mtcars2 <- lazy_dt(mtcars)

You can preview the transformation (including the generated data.table code) by printing the result:
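
As a minimal sketch of what “lazy” means here (the variable names are illustrative, and the class name "dtplyr_step" is an assumption about dtplyr's internals rather than something stated above), the object returned by lazy_dt() and by each verb is a description of pending work, not a data frame:

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

lazy <- lazy_dt(mtcars)
step <- lazy %>% filter(wt < 5)

# Both objects only record the operations to run later.
inherits(lazy, "dtplyr_step")
inherits(step, "dtplyr_step")
is.data.frame(step)

# The recorded data.table call runs only when the result is requested.
nrow(as.data.frame(step))
```
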

mtcars2 %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>%
  summarise(l100k = mean(l100k))
#> Source: local data table [3 x 2]
#> Call:   `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)),
#>     keyby = .(cyl)]
#>
#>     cyl l100k
#>   <dbl> <dbl>
#> 1     4  9.05
#> 2     6 12.0
#> 3     8 14.9
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

But generally you should reserve this only for debugging, and use as.data.table(), as.data.frame(), or as_tibble() to indicate that you’re done with the transformation and want to access the results:

mtcars2 %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>%
  summarise(l100k = mean(l100k)) %>%
  as_tibble()
#> # A tibble: 3 × 2
#>     cyl l100k
#>   <dbl> <dbl>
#> 1     4  9.05
#> 2     6 12.0
#> 3     8 14.9
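
The three accessors differ only in the type of object they return. The sketch below, which uses illustrative variable names, materialises the same pipeline three ways and inspects the resulting classes:

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# A lazy pipeline; nothing has been computed yet.
summary_step <- lazy_dt(mtcars) %>%
  group_by(cyl) %>%
  summarise(l100k = mean(235.21 / mpg))

# Each call evaluates the pipeline and returns a different container.
as_dt  <- as.data.table(summary_step)
as_df  <- as.data.frame(summary_step)
as_tbl <- as_tibble(summary_step)

class(as_dt)
class(as_df)
class(as_tbl)
```

Which accessor to use depends on what the downstream code expects: as.data.table() if you want to continue with data.table syntax, as_tibble() if you want to stay in the tidyverse.
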

Why is dtplyr slower than data.table?

There are two primary reasons that dtplyr will always be somewhat slower than data.table:

  • Each dplyr verb must do some work to convert dplyr syntax to data.table syntax. This takes time proportional to the complexity of the input code, not the input data, so should be a negligible overhead for large datasets. Initial benchmarks suggest that the overhead should be under 1ms per dplyr call.

  • To match dplyr semantics, mutate() does not modify in place by default. This means that most expressions involving mutate() must make a copy that would not be necessary if you were using data.table directly. (You can opt out of this behaviour in lazy_dt() with immutable = FALSE).
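
To make the second point concrete, here is a hedged sketch comparing the code generated with and without the copy. It assumes that show_query() is available for lazy data tables and prints the translated data.table call; the variable names are illustrative.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

dt <- as.data.table(mtcars)

# Default (immutable = TRUE): the generated call is expected to copy
# the input before adding the column.
lazy_dt(dt) %>%
  mutate(l100k = 235.21 / mpg) %>%
  show_query()

# Opting out: the generated call is expected to use := directly on the input.
in_place <- lazy_dt(dt, immutable = FALSE) %>%
  mutate(l100k = 235.21 / mpg)
show_query(in_place)

# Evaluating the in-place pipeline returns the new column, and is also
# expected to add it to `dt` itself, because no copy was made.
res <- as.data.table(in_place)
"l100k" %in% names(res)
"l100k" %in% names(dt)
```

Because immutable = FALSE lets the pipeline change the original table, it is best reserved for cases where no other code relies on that table staying as it was.
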

Code of Conduct

Please note that the dtplyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
