R package for data manipulation — inspired by Stata's API


statar

This package contains R functions corresponding to useful Stata commands.

The package includes:

panel data functions (monthly/quarterly dates, lead/lag, fillin)
data.frame functions (tabulate, merge)
vector functions (xtile, pctile, winsorize)
graph functions (binscatter)

Data Frame Functions

sum_up = summarize

sum_up prints detailed summary statistics (corresponds to Stata's summarize).

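```
# dplyr provides tibble(), group_by(), %>% and starts_with()
library(dplyr)
library(statar)

N <- 100
df <- tibble(id = 1:N, v1 = sample(5, N, TRUE), v2 = sample(1e6, N, TRUE))
sum_up(df)
df %>% sum_up(starts_with("v"), d = TRUE)
df %>% group_by(v1) %>% sum_up()
```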

tab = tabulate

tab prints distinct rows with their count. Compared to the dplyr function count, this command adds frequency, percent, and cumulative percent.

```
N <- 1e2; K <- 10
df <- tibble(
  id = sample(c(NA, 1:5), N / K, TRUE),
  v1 = sample(1:5, N / K, TRUE)
)
tab(df, id)
tab(df, id, na.rm = TRUE)
tab(df, id, v1)
```
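To make the added columns concrete, here is a small base R sketch that builds the same kind of table by hand: counts, percent, and cumulative percent. It only illustrates what such a table contains and is not how statar implements tab; the column names `Freq`, `Percent`, and `Cum` are assumptions chosen for this example.

```r
# Base R sketch of a one-way frequency table.
# Column names are illustrative, not necessarily those printed by tab().
id <- c(1, 1, 2, 3, 3, 3, NA, 2)

counts <- table(id, useNA = "ifany")   # count per distinct value, NA kept
freq_table <- data.frame(
  id      = names(counts),
  Freq    = as.vector(counts),
  Percent = 100 * as.vector(counts) / sum(counts)
)
freq_table$Cum <- cumsum(freq_table$Percent)   # cumulative percent
freq_table
```

The last row of `Cum` always reaches 100, which is a quick sanity check on such a table.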

join = merge

join is a wrapper for dplyr merge functionalities, with three added options:

The option check checks that there are no duplicates in the master or using dataset (as in Stata).
```
# merge m:1 v1
join(x, y, kind = "full", check = m~1)
```
The option gen specifies the name of a new variable that identifies unmatched and matched rows (as in Stata).
```
# merge m:1 v1, gen(_merge)
join(x, y, kind = "full", gen = "_merge")
```
The option update replaces missing values in the master dataset with the corresponding values from the using dataset.
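The idea behind update can be sketched in base R. This is an illustration of the behavior described above under simple assumptions (one key column `id`, one value column `v`), not statar's implementation; the data frames `master` and `using` are made up for the example.

```r
# Base R sketch of "update missing master values from the using dataset".
master <- data.frame(id = c(1, 2, 3), v = c(10, NA, NA))
using  <- data.frame(id = c(2, 3, 4), v = c(20, 30, 40))

# Left join on id; the shared column v becomes v.x (master) and v.y (using)
merged <- merge(master, using, by = "id", all.x = TRUE)

# Keep the master value when present, otherwise fall back to the using value
merged$v <- ifelse(is.na(merged$v.x), merged$v.y, merged$v.x)
merged <- merged[, c("id", "v")]
merged
```

Only the missing master values are filled in; the non-missing value for `id == 1` is left untouched.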

Vector Functions

```
# pctile computes quantiles and weighted quantiles of type 2 (similar to Stata _pctile)
v <- c(NA, 1:10)
pctile(v, probs = c(0.3, 0.7), na.rm = TRUE)

# xtile creates an integer variable for quantile categories (corresponds to Stata xtile)
v <- c(NA, 1:10)
xtile(v, n_quantiles = 3)        # 3 groups based on terciles
xtile(v, probs = c(0.3, 0.7))    # 3 groups based on two quantiles
xtile(v, cutpoints = c(2, 3))    # 3 groups based on two cutpoints

# winsorize (default based on 5 x interquartile range)
v <- c(1:4, 99)
winsorize(v)
winsorize(v, replace = NA)
winsorize(v, probs = c(0.01, 0.99))
winsorize(v, cutpoints = c(1, 50))
```

Panel Data Functions

Elapsed dates

The classes "monthly" and "quarterly" print as dates and are compatible with the usual time extraction functions (i.e. month, year, etc.). Yet they are stored as integers representing the number of elapsed periods since 1970/01/01 (in months and quarters, respectively). This is particularly handy for simple algebra:

```
# elapsed dates
library(lubridate)
date <- mdy(c("04/03/1992", "01/04/1992", "03/15/1992"))
datem <- as.monthly(date)
# displays as a period
datem
#> [1] "1992m04" "1992m01" "1992m03"
# behaves as an integer for numerical operations:
datem + 1
#> [1] "1992m05" "1992m02" "1992m04"
# behaves as a date for period extractions:
year(datem)
#> [1] 1992 1992 1992
```
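The integer storage can be illustrated without the package. The base R sketch below assumes the encoding described above, months elapsed since January 1970, and reproduces the arithmetic by hand. It is a sketch of the idea, not statar's source code, and the `1992m04` formatting is rebuilt here with `sprintf`.

```r
# Base R sketch: months elapsed since January 1970 (assumed encoding).
dates <- as.Date(c("1992-04-03", "1992-01-04", "1992-03-15"))
lt <- as.POSIXlt(dates)

# POSIXlt stores year as years since 1900 and mon as 0-11
elapsed_months <- (lt$year + 1900 - 1970) * 12 + lt$mon
elapsed_months
#> [1] 267 264 266

# Adding 1 moves every date forward by one month
shifted <- elapsed_months + 1
sprintf("%dm%02d", 1970 + shifted %/% 12, shifted %% 12 + 1)
#> [1] "1992m05" "1992m02" "1992m04"
```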

lag / lead

tlag and tlead lag or lead a vector with respect to a number of periods, not a number of rows.

```
year <- c(1989, 1991, 1992)
value <- c(4.1, 4.5, 3.3)
tlag(value, 1, time = year)

library(lubridate)
date <- mdy(c("01/04/1992", "03/15/1992", "04/03/1992"))
datem <- as.monthly(date)
value <- c(4.1, 4.5, 3.3)
tlag(value, time = datem)
```

In contrast to comparable functions in zoo and xts, these functions can be applied to any vector and used within a dplyr chain:

```
df <- tibble(
  id    = c(1, 1, 1, 2, 2),
  year  = c(1989, 1991, 1992, 1991, 1992),
  value = c(4.1, 4.5, 3.3, 3.2, 5.2)
)
df %>% group_by(id) %>% mutate(value_l = tlag(value, time = year))
```
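The difference between a period-based lag and a row-based lag can be seen in a few lines of base R. `time_lag` below is a hypothetical helper written for this illustration; it is not a statar function and only sketches the idea of looking up the value observed `n` periods earlier.

```r
# Base R sketch of a time-based lag (time_lag is a hypothetical helper).
time_lag <- function(x, n = 1, time) {
  # For each observation, look up the value recorded at time - n;
  # match() returns NA when that period is absent from the data.
  x[match(time - n, time)]
}

year  <- c(1989, 1991, 1992)
value <- c(4.1, 4.5, 3.3)

time_lag(value, 1, time = year)
#> [1]  NA  NA 4.5

# A row-based lag would instead pair 1991 with the 1989 value:
c(NA, head(value, -1))
#> [1]  NA 4.1 4.5
```

Because 1990 is missing, the period-based lag for 1991 is `NA`, whereas the row-based lag silently uses the 1989 value.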

is.panel

is.panel checks whether a dataset is a panel, i.e. the time variable is never missing and the combinations (id, time) are unique.

```
df <- tibble(
  id1   = c(1, 1, 1, 2, 2),
  id2   = 1:5,
  year  = c(1991, 1993, NA, 1992, 1992),
  value = c(4.1, 4.5, 3.3, 3.2, 5.2)
)
df %>% group_by(id1) %>% is.panel(year)
df1 <- df %>% filter(!is.na(year))
df1 %>% is.panel(year)
df1 %>% group_by(id1) %>% is.panel(year)
df1 %>% group_by(id1, id2) %>% is.panel(year)
```
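The two conditions in that definition are easy to check by hand. The base R sketch below uses a hypothetical helper, `looks_like_panel`, that tests exactly those two conditions on plain vectors; it illustrates the definition and is not statar's implementation.

```r
# Base R sketch of the panel check (looks_like_panel is a hypothetical helper).
looks_like_panel <- function(id, time) {
  no_missing_time <- !anyNA(time)                           # time never missing
  unique_pairs    <- !any(duplicated(data.frame(id, time))) # (id, time) unique
  no_missing_time && unique_pairs
}

id1  <- c(1, 1, 1, 2, 2)
year <- c(1991, 1993, NA, 1992, 1992)

looks_like_panel(id1, year)               # FALSE: year has a missing value
keep <- !is.na(year)
looks_like_panel(id1[keep], year[keep])   # FALSE: (2, 1992) appears twice
looks_like_panel(seq_len(sum(keep)), year[keep])  # TRUE: every id is distinct
```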

fill_gap

fill_gap transforms an unbalanced panel into a balanced panel. It corresponds to the Stata command tsfill. Missing observations are added as rows with missing values.

```
df <- tibble(
  id    = c(1, 1, 1, 2),
  datem = as.monthly(mdy(c("04/03/1992", "01/04/1992", "03/15/1992", "05/11/1992"))),
  value = c(4.1, 4.5, 3.3, 3.2)
)
df %>% group_by(id) %>% fill_gap(datem)
df %>% group_by(id) %>% fill_gap(datem, full = TRUE)
df %>% group_by(id) %>% fill_gap(datem, roll = "nearest")
```
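Gap filling within each id can be sketched in base R as follows. `fill_gaps` is a hypothetical helper for this illustration, and the integer `period` column stands in for an elapsed-date variable (the values are assumed to be months since January 1970). It covers only the default behavior described above, not the `full` or `roll` options.

```r
# Base R sketch of gap filling within each id (fill_gaps is hypothetical).
fill_gaps <- function(df) {
  pieces <- lapply(split(df, df$id), function(d) {
    # Every period between this id's first and last observation
    full <- data.frame(id = d$id[1], period = seq(min(d$period), max(d$period)))
    # Left join: periods with no observation get NA in the other columns
    merge(full, d, by = c("id", "period"), all.x = TRUE)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

df <- data.frame(
  id     = c(1, 1, 1, 2),
  period = c(267, 264, 266, 268),
  value  = c(4.1, 4.5, 3.3, 3.2)
)
filled <- fill_gaps(df)
filled   # id 1 gains a row for period 265 with value NA
```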

Graph Functions

stat_binmean

stat_binmean() (a stat for ggplot2) returns the mean of y and x within 20 bins of x. It is a bare-bones version of the Stata command binscatter.

```
library(ggplot2)

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) + stat_binmean()
# change number of bins
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) + stat_binmean(n = 10)
# add regression line
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) + stat_binmean() + stat_smooth(method = "lm", se = FALSE)
```

Installation

You can install:

The latest released version from CRAN with
```
install.packages("statar")
```

The current version from GitHub with
```
devtools::install_github("matthieugomez/statar")
```
