A Grammar of Data Manipulation in python
pwwang/datar
Documentation | Reference Maps | Notebook Examples | API
`datar` is a re-imagining of APIs for data manipulation in Python, with multiple backends supported. Those APIs are aligned with the tidyverse packages in R as much as possible.

    pip install -U datar

    # install with a backend
    pip install -U datar[pandas]

    # More backends support coming soon

Repo |
---|
datar-numpy |
datar-pandas |
datar-arrow |

    # with pandas backend
    from datar import f
    from datar.dplyr import mutate, filter_, if_else
    from datar.tibble import tibble
    # or
    # from datar.all import f, mutate, filter_, if_else, tibble

    df = tibble(
        x=range(4),  # or c[:4] (from datar.base import c)
        y=['zero', 'one', 'two', 'three']
    )
    df >> mutate(z=f.x)
    """# output
            x        y       z
      <int64> <object> <int64>
    0       0     zero       0
    1       1      one       1
    2       2      two       2
    3       3    three       3
    """

    df >> mutate(z=if_else(f.x > 1, 1, 0))
    """# output:
            x        y       z
      <int64> <object> <int64>
    0       0     zero       0
    1       1      one       0
    2       2      two       1
    3       3    three       1
    """

    df >> filter_(f.x > 1)
    """# output:
            x        y
      <int64> <object>
    0       2      two
    1       3    three
    """

    df >> mutate(z=if_else(f.x > 1, 1, 0)) >> filter_(f.z == 1)
    """# output:
            x        y       z
      <int64> <object> <int64>
    0       2      two       1
    1       3    three       1
    """

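
Expressions such as `f.x > 1` in the example above are not evaluated where they are written. They are resolved later, against the data frame that a verb receives. The following is a minimal sketch of that deferred-evaluation idea using only the standard library. It is not how `datar` or `pipda` implement it: `Expr`, `ColumnProxy`, `col`, and `filter_rows` are hypothetical names, and a "frame" here is assumed to be a plain dict of lists.

```python
# Illustrative sketch only: NOT datar's or pipda's real implementation.
# A "frame" is assumed to be a dict mapping column names to lists.
import operator


class Expr:
    """A deferred computation that is evaluated later against a frame."""

    def __init__(self, func):
        self._func = func

    def evaluate(self, frame):
        return self._func(frame)

    def _compare(self, other, op):
        def run(frame):
            left = self.evaluate(frame)
            if isinstance(other, Expr):
                right = other.evaluate(frame)
            else:
                # Repeat a scalar so it lines up with every row.
                right = [other] * len(left)
            return [op(a, b) for a, b in zip(left, right)]

        return Expr(run)

    def __gt__(self, other):
        return self._compare(other, operator.gt)

    def __ge__(self, other):
        return self._compare(other, operator.ge)


class ColumnProxy:
    """Hypothetical stand-in for `f`: attribute access builds a column lookup."""

    def __getattr__(self, name):
        return Expr(lambda frame: list(frame[name]))


col = ColumnProxy()


def filter_rows(frame, condition):
    """Keep the rows for which the deferred condition is true."""
    mask = condition.evaluate(frame)
    return {
        name: [value for value, keep in zip(values, mask) if keep]
        for name, values in frame.items()
    }


frame = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
result = filter_rows(frame, col.x > 1)
print(result)
```

In this sketch, `col.x > 1` only builds an `Expr`; no column is looked up until `filter_rows` calls `evaluate` with a concrete frame.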

    # works with plotnine
    # example grabbed from https://github.com/has2k1/plydata
    import numpy
    from datar import f
    from datar.base import sin, pi
    from datar.tibble import tibble
    from datar.dplyr import mutate, if_else
    from plotnine import ggplot, aes, geom_line, theme_classic

    df = tibble(x=numpy.linspace(0, 2 * pi, 500))
    (
        df
        >> mutate(y=sin(f.x), sign=if_else(f.y >= 0, "positive", "negative"))
        >> ggplot(aes(x="x", y="y"))
        + theme_classic()
        + geom_line(aes(color="sign"), size=1.2)
    )


    # very easy to integrate with other libraries
    # for example: klib
    import klib
    from pipda import register_verb
    from datar import f
    from datar.data import iris
    from datar.dplyr import pull

    dist_plot = register_verb(func=klib.dist_plot)
    iris >> pull(f.Sepal_Length) >> dist_plot()

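
To give a feel for why wrapping a function with `register_verb` lets it sit on the right-hand side of `>>`, here is a minimal, self-contained sketch of the general idea based on Python's reflected right-shift operator. It is an illustration under stated assumptions, not `pipda`'s actual implementation: `BoundVerb` and `make_verb` are hypothetical names, and plain lists stand in for data frames.

```python
# Illustrative sketch only: NOT pipda's real implementation.
import functools
import statistics


class BoundVerb:
    """A verb call that is still waiting for its data (the left side of `>>`)."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python falls back to this reflected method for `data >> bound`
        # when type(data) does not handle `>>` itself.
        return self.func(data, *self.args, **self.kwargs)


def make_verb(func):
    """Hypothetical helper: make `func(data, ...)` usable as `data >> verb(...)`."""

    @functools.wraps(func)
    def verb(*args, **kwargs):
        return BoundVerb(func, args, kwargs)

    return verb


# Wrap ordinary functions, including one from another library
# (here the standard library's statistics module).
head = make_verb(lambda data, n: data[:n])
mean = make_verb(statistics.mean)

piped = [2, 4, 6, 8] >> head(2) >> mean()
print(piped)
```

Calling `head(2)` does no work by itself; it only records the arguments. The wrapped function runs once the data arrives from the left of `>>`, which is what allows several verbs to be chained in one expression.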
Thanks for your excellent package to port R (`dplyr`) flow of processing to Python. I have been using other alternatives, and yours is the one that offers the most extensive and equivalent to what is possible now with `dplyr`.