FixedEffects/FixedEffectModels.jl


Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables


This package estimates linear models with high dimensional categorical variables, potentially including instrumental variables.

Installation

The package is registered in the General registry and so can be installed at the REPL with `] add FixedEffectModels`.

Benchmarks

The objective of the package is similar to the Stata command `reghdfe` and the R packages `lfe` and `fixest`. The package is much faster than `reghdfe` or `lfe`. It also tends to be a bit faster than the more recent `fixest` (depending on the exact command). For complicated models, `FixedEffectModels` can also run on Nvidia GPUs for even faster performance (see below).

Syntax

```
using DataFrames, RDatasets, FixedEffectModels
df = dataset("plm", "Cigar")
reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), Vcov.cluster(:State), weights = :Pop)
#                             FixedEffectModel
# =========================================================================
# Number of obs:                 1380   Converged:                     true
# dof (model):                      1   dof (residuals):                 45
# R²:                           0.803   R² adjusted:                  0.798
# F-statistic:                13.3382   P-value:                      0.001
# R² within:                    0.139   Iterations:                       5
# =========================================================================
#         Estimate  Std. Error    t-stat  Pr(>|t|)   Lower 95%    Upper 95%
# ─────────────────────────────────────────────────────────────────────────
# NDI  -0.00526264  0.00144097  -3.65216    0.0007  -0.0081649  -0.00236038
# =========================================================================
```

A typical formula is composed of one dependent variable, exogenous variables, endogenous variables, instrumental variables, and a set of high-dimensional fixed effects.
```
dependent variable ~ exogenous variables + (endogenous variables ~ instrumental variables) + fe(fixedeffect variable)
```
High-dimensional fixed effect variables are indicated with the function `fe`. You can add an arbitrary number of high-dimensional fixed effects, separated with `+`. You can also interact fixed effects using `&` or `*`.
For instance, to add state fixed effects, use `fe(State)`. To add both state and year fixed effects, use `fe(State) + fe(Year)`. To add state-year fixed effects, use `fe(State)&fe(Year)`. To add state-specific slopes for year, use `fe(State)&Year`. To add both state fixed effects and state-specific slopes for year, use `fe(State)*Year`.
```
reg(df, @formula(Sales ~ Price + fe(State) + fe(Year)))
reg(df, @formula(Sales ~ NDI + fe(State) + fe(State)&Year))
reg(df, @formula(Sales ~ NDI + fe(State)&fe(Year)))
# for illustration only (this will not run here)
reg(df, @formula(Sales ~ (Price ~ Pimin)))
```
To construct a formula programmatically, use
```
reg(df, term(:Sales) ~ term(:NDI) + fe(:State) + fe(:Year))
```
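
When the list of regressors is only known at run time, the same building blocks can be combined in a loop or a fold. The following is a minimal sketch, assuming that `+` composes terms the way it does in the one-line example above; the names `controls`, `rhs` and `m` are illustrative and not part of the package.

```julia
using DataFrames, RDatasets, FixedEffectModels

df = dataset("plm", "Cigar")

# Illustrative choice of regressors, e.g. read from a configuration file
controls = [:NDI, :Price]

# Fold the symbols into a sum of terms, then append the fixed effects
rhs = foldl(+, term.(controls)) + fe(:State) + fe(:Year)

m = reg(df, term(:Sales) ~ rhs)
```

Building the right-hand side separately keeps the call to `reg` unchanged when the set of controls changes.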

The option `contrasts` specifies that a column should be understood as a set of dummy variables:

```
reg(df, @formula(Sales ~ Price + Year); contrasts = Dict(:Year => DummyCoding()))
```

You can specify different base levels:

```
reg(df, @formula(Sales ~ Price + Year); contrasts = Dict(:Year => DummyCoding(base = 80)))
```

The option `weights` specifies a variable for weights
```
weights = :Pop
```
Standard errors are indicated with the prefix `Vcov` (with the package Vcov)
```
Vcov.robust()
Vcov.cluster(:State)
Vcov.cluster(:State, :Year)
```
The option `save` can be set to one of the following: `:none` (default) to save nothing, `:residuals` to save residuals, `:fe` to save fixed effects, and `:all` to save both. Once saved, they can be accessed using `residuals(m)` or `fe(m)`, where `m` is the estimated model (the object returned by the function `reg`). Both residuals and fixed effects are aligned with the original dataframe used to estimate the model.
The option `method` can be set to one of the following: `:cpu`, `:CUDA`, or `:Metal` (see Performances below).
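
As a small sketch of the `save` option, the snippet below estimates the model from the Syntax section with `save = :all` and then retrieves what was stored. The variable names `m`, `res` and `fes` are illustrative.

```julia
using DataFrames, RDatasets, FixedEffectModels

df = dataset("plm", "Cigar")

# Keep both residuals and fixed-effect estimates on the returned model
m = reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), save = :all)

res = residuals(m)   # residuals, aligned with the rows of df
fes = fe(m)          # fixed-effect estimates, aligned with the rows of df
```

Because both results are aligned with `df`, they can be attached to it as new columns without any join.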

Output

`reg` returns a light object. It is composed of

- the vector of coefficients and the covariance matrix (use `coef`, `coefnames`, `vcov` on the output of `reg`)
- a boolean vector reporting rows used in the estimation
- a set of scalars (number of observations, degrees of freedom, R², etc.)
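
The accessors named in the list can be used as follows. This is a minimal sketch that reuses the model from the Syntax section; the variable name `m` is illustrative.

```julia
using DataFrames, RDatasets, FixedEffectModels

df = dataset("plm", "Cigar")
m = reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), Vcov.cluster(:State), weights = :Pop)

coefnames(m)   # names of the estimated coefficients (fixed effects are not listed)
coef(m)        # point estimates, in the same order as coefnames(m)
vcov(m)        # covariance matrix of the estimates
```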

Methods such as `predict` and `residuals` are still defined, but they require a dataframe as a second argument. The problematic size of `lm` and `glm` model objects in R or Julia, and its sometimes absurd consequences, has been discussed in several places.

You may use RegressionTables.jl to get publication-quality regression tables.

Performances

MultiThreads

`FixedEffectModels` is multi-threaded. Use the option `nthreads` to select the number of threads to use in the estimation (defaults to `Threads.nthreads()`).

GPUs

The package has experimental support for GPUs. This can make the package an order of magnitude faster for complicated problems.

If you have an Nvidia GPU, run `using CUDA` before `using FixedEffectModels`. Then, estimate a model with `method = :CUDA`.

```
using CUDA, RDatasets, FixedEffectModels
@assert CUDA.functional()
df = dataset("plm", "Cigar")
reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), method = :CUDA)
```

The package also supports Apple GPUs with Metal.jl, although I could not find a way to get better performance.

```
using Metal, RDatasets, FixedEffectModels
@assert Metal.functional()
df = dataset("plm", "Cigar")
reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), method = :Metal)
```

Solution Method

Denote the model `y = X β + D θ + e`, where `X` is a matrix with few columns and `D` is the design matrix from categorical variables. Estimates for `β`, along with their standard errors, are obtained in two steps:

1. `y` and `X` are regressed on `D` using the package FixedEffects.jl.
2. Estimates for `β`, along with their standard errors, are obtained by regressing the projected `y` on the projected `X` (an application of the Frisch-Waugh-Lovell theorem).

With the option `save = :fe` or `save = :all`, estimates for the high-dimensional fixed effects are then obtained by regressing the residuals of the full model minus the residuals of the partialed-out model on `D`, again using FixedEffects.jl.
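
The steps above can be checked on a small simulated example using only the standard library. This is a sketch, not the package's implementation: it builds `D` as a dense dummy matrix and solves each regression with `\`, whereas the package relies on FixedEffects.jl. The data-generating process and all names (`g`, `residualize`, `β_fwl`, `θ_hat`, and so on) are assumptions made for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n, G = 200, 10
g = repeat(1:G, inner = n ÷ G)      # hypothetical group id, 20 rows per group
D = Float64.(g .== (1:G)')          # n × G matrix of group dummies
X = randn(n, 1) .+ 0.5 .* g         # one regressor, correlated with the groups
y = X * [2.0] .+ D * randn(G) .+ 0.1 .* randn(n)

# Step 1: residualize on D (dense least squares, for illustration only)
residualize(v) = v - D * (D \ v)
X_res, y_res = residualize(X), residualize(y)

# Step 2: regress the projected y on the projected X
β_fwl = X_res \ y_res

# Reference: the full regression of y on [X D]
full = [X D] \ y
β_full, θ_full = full[1:1], full[2:end]

# Fixed effects: regress (full residuals minus partialed-out residuals) on D
θ_hat = D \ ((y - X * β_fwl) - (y_res - X_res * β_fwl))
```

Here `β_fwl` agrees with `β_full` and `θ_hat` agrees with `θ_full` up to floating-point error, which is what the Frisch-Waugh-Lovell theorem predicts. The dense approach only works for small `G`; with many categories `D` becomes too large to store as a dense matrix.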

References

Baum, C. and Schaffer, M. (2013) AVAR: Stata module to perform asymptotic covariance estimation for iid and non-iid data robust to heteroskedasticity, autocorrelation, 1- and 2-way clustering, and common cross-panel autocorrelated disturbances. Statistical Software Components, Boston College Department of Economics.

Correia, S. (2014) REGHDFE: Stata module to perform linear or instrumental-variable regression absorbing any number of high-dimensional fixed effects. Statistical Software Components, Boston College Department of Economics.

Fong, DC. and Saunders, M. (2011) LSMR: An Iterative Algorithm for Sparse Least-Squares Problems. SIAM Journal on Scientific Computing.

Gaure, S. (2013) OLS with Multiple High Dimensional Category Variables. Computational Statistics and Data Analysis.