- Notifications
You must be signed in to change notification settings - Fork7
Tidy Verbs for Fast Data Manipulation
License
Unknown, MIT licenses found
Licenses found
hope-data-science/tidyfst
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
tidyfst is a toolkit of tidy data manipulation verbs withdata.table as the backend . Combining the merits of syntax elegance fromdplyr and computing performance fromdata.table,tidyfst intends to provide users with state-of-the-art data manipulation tools with least pain. This package is an extension ofdata.table, while enjoying a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently-used data operations. Also,tidyfst would introduce more tidy data verbs from other packages, including but not limited totidyverse anddata.table. If you are adplyr user but have to usedata.table for speedy computation, ordata.table user looking for readable coding syntax,tidyfst is designed for you (and me of course). For further details and tutorials, seevignettes. BothChinese andEnglish tutorials could be found there.
Till now,tidyfst has an API that might even transcend its predecessors (e.g.select_dt could accept nearly anything for super column selection). Enjoy the efficient data operations intidyfst !
PS: For extreme performance in tidy syntax, trytidyfst's mirror packagetidyft.
- Receives any data.frame (tibble/data.table/data.frame) and returns a data.table.
- Show the variable class of data.table as default.
- Never use in place replacement (also known as modification by reference, which means the original variable would not be modified without notification).
- Use suffix ("_dt") rather than prefix to increase the efficiency (especially when you have IDE with automatic code completion).
- More flexible verbs (e.g.pairwise_count_dt) for big data manipulation.
- Supporting data importing and parsing withfst, which saves both time and memory. Details seeparse_fst/select_fst/filter_fst andimport_fst/export_fst.
- Low and stable dependency on mature packages (data.table, fst, stringr)
install.packages("tidyfst")library(tidyfst)iris %>% mutate_dt(group=Species,sl=Sepal.Length,sw=Sepal.Width) %>% select_dt(group,sl,sw) %>% filter_dt(sl>5) %>% arrange_dt(group,sl) %>% distinct_dt(sl,.keep_all=T) %>% summarise_dt(sw= max(sw),by=group)#> group sw#> <fctr> <num>#> 1: setosa 4.4#> 2: versicolor 3.4#> 3: virginica 3.8iris %>% count_dt(Species) %>% add_prop()#> Species n prop prop_label#> <fctr> <int> <num> <char>#> 1: setosa 50 0.3333333 33.3%#> 2: versicolor 50 0.3333333 33.3%#> 3: virginica 50 0.3333333 33.3%iris[3:8,] %>% mutate_when(Petal.Width==.2,one=1,Sepal.Length=2)#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species one#> <num> <num> <num> <num> <fctr> <num>#> 1: 2.0 3.2 1.3 0.2 setosa 1#> 2: 2.0 3.1 1.5 0.2 setosa 1#> 3: 2.0 3.6 1.4 0.2 setosa 1#> 4: 5.4 3.9 1.7 0.4 setosa NA#> 5: 4.6 3.4 1.4 0.3 setosa NA#> 6: 2.0 3.4 1.5 0.2 setosa 1
tidyfst will keep up with theupdates ofdata.table , in the next step would introduce more new features to improve the performance and flexibility to facilitate fast data manipulation in tidy syntax.
- Example 1: Basic usage
- Example 2: Join tables
- Example 3: Reshape
- Example 4: Nest
- Example 5: Fst
- Example 6: Dt
Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388,https://doi.org/10.21105/joss.02388
The author ofmaditr,Gregory Demin and the author offst,Marcus Klik have helped me a lot in the development of this work. It is so lucky to have them (and many other selfless contributors) in the same open source community of R.
About
Tidy Verbs for Fast Data Manipulation
Resources
License
Unknown, MIT licenses found
Licenses found
Contributing
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Packages0
Contributors3
Uh oh!
There was an error while loading.Please reload this page.

