For maximum raw speed, use data.table directly. While the learning curve might be longer, the improvement in computational performance pays off if you frequently deal with large datasets. There are several ways to drop into data.table syntax to gain higher performance in tidyfst. A convenient way is to use the DT[I,J,BY] syntax after the pipe (%>%).
library(tidyfst)
#> Thank you for using tidyfst!
#> To acknowledge our work, please cite the package:
#> Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388, https://doi.org/10.21105/joss.02388

iris %>%
  as_dt() %>%  # coerce a data.frame to data.table
  .[, .SD[1], by = Species]
#>       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>        <fctr>        <num>       <num>        <num>       <num>
#> 1:     setosa          5.1         3.5          1.4         0.2
#> 2: versicolor          7.0         3.2          4.7         1.4
#> 3:  virginica          6.3         3.3          6.0         2.5

This syntax is not very consistent with the tidy syntax, so in_dt is also provided as a shortcut to the data.table method. It can be used as:
iris %>% in_dt(, .SD[1], by = Species)
#>       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>        <fctr>        <num>       <num>        <num>       <num>
#> 1:     setosa          5.1         3.5          1.4         0.2
#> 2: versicolor          7.0         3.2          4.7         1.4
#> 3:  virginica          6.3         3.3          6.0         2.5

in_dt follows the basic principles of tidyfst, which include: (1) Never use in-place replacement. Therefore, in-place functions like := will still return the results. (2) Always receive a data frame (data.frame/tibble/data.table) and return a data.table. This means you don’t have to write as.data.table or as_dt all the time as long as you are working on data frames in R.
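
The two principles can be checked with a small sketch. It assumes that in_dt forwards := to data.table as described above. The names res and ratio are hypothetical, chosen only for illustration.

```r
library(tidyfst)

# Add a column with data.table's := inside in_dt().
# `ratio` is a made-up column name for this sketch.
res <- iris %>% in_dt(, ratio := Sepal.Length / Sepal.Width)

# Principle (2): a plain data.frame went in; check the class of what came out.
inherits(res, "data.table")

# Principle (1): compare column names to see whether the input was modified.
"ratio" %in% names(res)
"ratio" %in% names(iris)
```

If the principles hold as stated, the new column appears only in res, and iris keeps its original five columns.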