| Type: | Package |
| Title: | Fast Fixed-Effects Estimations |
| Version: | 0.13.2 |
| Imports: | stats, graphics, grDevices, tools, utils, methods, numDeriv, nlme, sandwich, Rcpp (≥ 1.0.5), dreamerr (≥ 1.4.0), stringmagic (≥ 1.2.0) |
| Suggests: | knitr, rmarkdown, data.table, plm, MASS, pander, ggplot2, lfe, tinytex, pdftools, emmeans, estimability, AER, Matrix |
| LinkingTo: | Rcpp |
| Depends: | R (≥ 3.5.0) |
| Description: | Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors. |
| License: | GPL-3 |
| BugReports: | https://github.com/lrberge/fixest/issues |
| URL: | https://lrberge.github.io/fixest/, https://github.com/lrberge/fixest |
| VignetteBuilder: | knitr |
| LazyData: | true |
| RoxygenNote: | 7.3.2.9000 |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| Packaged: | 2025-09-05 09:44:37 UTC; berge028 |
| Author: | Laurent Berge [aut, cre], Sebastian Krantz [ctb], Grant McDermott |
| Maintainer: | Laurent Berge <laurent.berge@u-bordeaux.fr> |
| Repository: | CRAN |
| Date/Publication: | 2025-09-08 07:30:02 UTC |
Fast and User-Friendly Fixed-Effects Estimations
Description
The package fixest provides a family of functions to perform estimations with multiple fixed-effects. Standard-errors can be easily and intuitively clustered. It also includes tools to seamlessly export the results of various estimations.
To get started, look at the introduction.
Details
The main features are:
- Estimation. The core functions are: feols, feglm and femlm to estimate, respectively, linear models, generalized linear models and maximum likelihood models with multiple fixed-effects. The function feNmlm allows the inclusion of non-linear in parameters right hand sides. Finally fepois and fenegbin are shorthands to estimate Poisson and Negative Binomial models.
- Multiple estimations: You can perform multiple estimations at once with the stepwise functions. It's then very easy to manipulate multiple results with the associated methods. See an introduction in the dedicated vignette: Multiple estimations.
- Easy and flexible clustering of standard-errors. By using the arguments vcov and ssc (see summary.fixest). To have a sense of how the standard errors are computed, see the vignette On standard-errors.
- Visualization and exportation of results. You can visualize the results of multiple estimations in R, or export them in Latex using the function etable. This vignette details how to customize the Latex tables: Exporting estimation tables.
- Plot multiple results. You can plot the coefficients and confidence intervals of estimations easily with the function coefplot. This function also offers a specific layout for interactions.
Author(s)
Maintainer: Laurent Berge <laurent.berge@u-bordeaux.fr>
Other contributors:
Sebastian Krantz [contributor]
Grant McDermott <grantmcd@uoregon.edu> (ORCID) [contributor]
Russell Lenth <russell-lenth@uiowa.edu> [contributor]
Kyle Butts <buttskyle96@gmail.com> [contributor]
References
Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13.
See Also
Useful links:
Report bugs at https://github.com/lrberge/fixest/issues
Akaike's An Information Criterion
Description
This function computes the AIC (Akaike's An Information Criterion) from a fixest estimation.
Usage
## S3 method for class 'fixest'
AIC(object, ..., k = 2)
Arguments
object | A |
... | Optionally, more fitted objects. |
k | A numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC (i.e. |
Details
The AIC is computed as:
AIC = -2\times LogLikelihood + k\times nbParams
with k the penalty parameter.
You can have more information on this criterion on AIC.
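As a minimal sketch of the formula above, the following uses base R (`lm` and `logLik`) rather than fixest, so that it runs without the package. The names `ll`, `nb_params` and `aic_manual` are illustrative only, and how fixest itself counts the parameters (for instance the fixed-effects) is not covered here.

```r
# Base-R illustration of AIC = -2 * LogLikelihood + k * nbParams.
# Assumption: a plain lm() fit stands in for a fixest estimation.
fit = lm(Sepal.Length ~ Sepal.Width + Petal.Length, iris)

ll = logLik(fit)             # log-likelihood of the fitted model
nb_params = attr(ll, "df")   # parameters counted by logLik (for lm: coefficients + error variance)
k = 2                        # penalty per parameter (classical AIC)

aic_manual = -2 * as.numeric(ll) + k * nb_params

# The manual computation agrees with the generic
all.equal(aic_manual, AIC(fit))
```

Changing `k` only changes the weight put on the number of parameters; the log-likelihood term is untouched.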
Value
It returns a numeric vector, with length the same as the number of objects taken as arguments.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. Other statistics methods: BIC.fixest, logLik.fixest, nobs.fixest.
Examples
# two fitted models with different expl. variables:
res1 = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris)
res2 = femlm(Sepal.Length ~ Petal.Width | Species, iris)
AIC(res1, res2)
BIC(res1, res2)
Bayesian information criterion
Description
This function computes the BIC (Bayesian information criterion) from a fixest estimation.
Usage
## S3 method for class 'fixest'
BIC(object, ...)
Arguments
object | A |
... | Optionally, more fitted objects. |
Details
The BIC is computed as follows:
BIC = -2\times LogLikelihood + \log(nobs)\times nbParams
with nobs the number of observations and nbParams the number of parameters.
You can have more information on this criterion on AIC.
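The BIC formula can be checked by hand in the same spirit. The sketch below again relies on base R only (a Poisson `glm` on `mtcars`, chosen purely for illustration) and is not the package's internal code; `n_obs`, `nb_params` and `bic_manual` are made-up names.

```r
# Base-R illustration of BIC = -2 * LogLikelihood + log(nobs) * nbParams.
# Assumption: a Poisson glm() stands in for a fixest estimation.
fit = glm(carb ~ hp + wt, data = mtcars, family = poisson())

ll = logLik(fit)
nb_params = attr(ll, "df")   # number of estimated coefficients
n_obs = nobs(fit)            # number of observations used in the fit

bic_manual = -2 * as.numeric(ll) + log(n_obs) * nb_params

# Agrees with the generic, and with an AIC whose penalty is k = log(nobs)
all.equal(bic_manual, BIC(fit))
all.equal(bic_manual, AIC(fit, k = log(n_obs)))
```

The second comparison shows that the BIC is the AIC formula with the penalty per parameter set to log(nobs).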
Value
It returns a numeric vector, with length the same as the number of objects taken as arguments.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. Other statistics functions: AIC.fixest, logLik.fixest.
Examples
# two fitted models with different expl. variables:
res1 = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris)
res2 = femlm(Sepal.Length ~ Petal.Width | Species, iris)
AIC(res1, res2)
BIC(res1, res2)
Subsets a fixest_multi object
Description
Subsets a fixest_multi object using different keys.
Usage
## S3 method for class 'fixest_multi'
x[i, sample, lhs, rhs, fixef, iv, I, reorder = TRUE, drop = FALSE]
Arguments
x | A |
i | An integer vector. Represents the estimations to extract. |
sample | An integer vector, a logical scalar, or a character vector. It represents the |
lhs | An integer vector, a logical scalar, or a character vector. It represents the left-hand-sides identifiers for which the results should be extracted. Only valid when the |
rhs | An integer vector or a logical scalar. It represents the right-hand-sides identifiers for which the results should be extracted. Only valid when the |
fixef | An integer vector or a logical scalar. It represents the fixed-effects identifiers for which the results should be extracted. Only valid when the |
iv | An integer vector or a logical scalar. It represents the stages of the IV. Note that the length can be greater than 2 when there are multiple endogenous regressors (the first stage corresponding to multiple estimations). Note that the order of the stages depends on the |
I | An integer vector. Represents the root element to extract. |
reorder | Logical, default is |
drop | Logical, default is |
Details
The order in which we use the keys matters. Every time a key sample, lhs, rhs, fixef or iv is used, a reordering is performed to consider the leftmost-side key to be the new root.
Use logical keys to easily reorder. For example, say the object res contains a multiple estimation with multiple left-hand-sides, right-hand-sides and fixed-effects. By default the results are ordered as follows: lhs, fixef, rhs. If you use res[lhs = FALSE], then the new order is: fixef, rhs, lhs. With res[rhs = TRUE, lhs = FALSE] it becomes: rhs, fixef, lhs. In both cases you keep all estimations.
Value
It returns a fixest_multi object. If there is only one estimation left in the object, then the result is simplified into a fixest object only with drop = TRUE.
See Also
The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.
Examples
# Estimation with multiple samples/LHS/RHS
aq = airquality[airquality$Month %in% 5:6, ]
est_split = feols(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)), aq, split = ~ Month)
# By default: sample is the root
etable(est_split)
# Let's reorder, by considering lhs the root
etable(est_split[lhs = 1:.N])
# Selecting only one LHS and RHS
etable(est_split[lhs = "Ozone", rhs = 1])
# Taking the first root (here sample = 5)
etable(est_split[I = 1])
# The first and last estimations
etable(est_split[i = c(1, .N)])
Method to subselect from a fixest_panel
Description
Subselection from a fixest_panel which has been created with the function panel. Also allows to create lag/lead variables with functions l/f if the fixest_panel is also a data.table::data.table.
Usage
## S3 method for class 'fixest_panel'
x[i, j, ...]
Arguments
x | A |
i | Row subselection. Allows |
j | Variable selection. Allows |
... | Other arguments to be passed to |
Details
If the original data was also a data.table, some calls to [.fixest_panel may dissolve the fixest_panel object and return a regular data.table. This is the case for subselections with additional arguments. If so, a note is displayed on the console.
Value
It returns a fixest_panel data base, in which the attributes used to create lags/leads are properly maintained.
Author(s)
Laurent Berge
See Also
Alternatively, the function panel changes a data.frame into a panel from which the functions l and f (creating lags and leads) can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).
Examples
data(base_did)
# Creating a fixest_panel object
pdat = panel(base_did, ~id+period)
# Subselections of fixest_panel objects bookkeep the leads/lags engine
pdat_small = pdat[!pdat$period %in% c(2, 4), ]
a = feols(y~l(x1, 0:1), pdat_small)
# we obtain the same results, had we created the lags "on the fly"
base_small = base_did[!base_did$period %in% c(2, 4), ]
b = feols(y~l(x1, 0:1), base_small, panel.id = ~id+period)
etable(a, b)
# Using data.table to create new lead/lag variables
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Variable creation
  pdat_dt[, x_l1 := l(x1)]
  pdat_dt[, c("x_l1", "x_f1_2") := .(l(x1), f(x1)**2)]
  # Estimation on a subset of the data
  # (the lead/lags work appropriately)
  feols(y~l(x1, 0:1), pdat_dt[!period %in% c(2, 4)])
}
Extracts one element from a fixest_multi object
Description
Extracts single elements from multiple fixest estimations.
Usage
## S3 method for class 'fixest_multi'
x[[i]]
Arguments
x | A |
i | An integer scalar. The identifier of the estimation to extract. |
Value
A fixest object is returned.
See Also
The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)
# The first estimation
res[[1]]
# The second one, etc
res[[2]]
Aggregates the values of DiD coefficients a la Sun and Abraham
Description
Simple tool that aggregates the value of CATT coefficients in staggered difference-in-difference setups (see details).
Usage
## S3 method for class 'fixest'
aggregate(x, agg, full = FALSE, use_weights = TRUE, ...)
Arguments
x | A |
agg | A character scalar describing the variable names to be aggregated, it is pattern-based. For |
full | Logical scalar, defaults to |
use_weights | Logical, default is |
... | Arguments to be passed to |
Details
This is a function helping to replicate the estimator from Sun and Abraham (2021). You first need to perform an estimation with cohort and relative periods dummies (typically using the function i), this leads to estimators of the cohort average treatment effect on the treated (CATT). Then you can use this function to retrieve the average treatment effect on each relative period, or for any other way you wish to aggregate the CATT.
Note that contrary to the SA article, here the cohort share in the sample is considered to be a perfect measure for the cohort share in the population.
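To make the aggregation idea concrete, here is a small base-R sketch with entirely hypothetical numbers: three cohort-specific CATT estimates for one relative period, their (made-up) covariance matrix, and the number of treated observations in each cohort. It illustrates a share-weighted average with the shares treated as fixed, which is one reading of the note above; it is not the code used by `aggregate`.

```r
# Hypothetical CATT estimates at relative period 0, one per cohort
catt = c(cohort_2 = 1.0, cohort_3 = 2.0, cohort_4 = 4.0)

# Hypothetical covariance matrix of these three estimates
V = diag(c(0.04, 0.01, 0.09))

# Hypothetical number of treated observations per cohort
n_cohort = c(cohort_2 = 10, cohort_3 = 30, cohort_4 = 10)

# Sample shares, taken as non-random weights
share = n_cohort / sum(n_cohort)

# Aggregated effect for the period and its standard error
att_period0 = sum(share * catt)
se_period0 = sqrt(drop(t(share) %*% V %*% share))

c(estimate = att_period0, se = se_period0)
```

Because the weights are treated as known constants, the variance of the aggregate is simply the quadratic form of the weights in the covariance matrix; no extra term accounts for sampling noise in the shares.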
Value
It returns a matrix representing a table of coefficients.
Author(s)
Laurent Berge
References
Liyang Sun and Sarah Abraham, 2021, "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects". Journal of Econometrics.
Examples
#
# DiD example
#
data(base_stagg)
# 2 kinds of estimations:
# - regular TWFE model
# - estimation with cohort x time_to_treatment interactions, later aggregated
# Note: the never treated have a time_to_treatment equal to -1000
# Now we perform the estimation
res_twfe = feols(y ~ x1 + i(time_to_treatment, treated, ref = c(-1, -1000)) | id + year, base_stagg)
# we use the "i." prefix to force year_treated to be considered as a factor
res_cohort = feols(y ~ x1 + i(time_to_treatment, i.year_treated, ref = c(-1, -1000)) | id + year, base_stagg)
# Displaying the results
iplot(res_twfe, ylim = c(-6, 8))
att_true = tapply(base_stagg$treatment_effect_true, base_stagg$time_to_treatment, mean)[-1]
points(-9:8 + 0.15, att_true, pch = 15, col = 2)
# The aggregate effect for each period
agg_coef = aggregate(res_cohort, "(ti.*nt)::(-?[[:digit:]]+)")
x = c(-9:-2, 0:8) + .35
points(x, agg_coef[, 1], pch = 17, col = 4)
ci_low = agg_coef[, 1] - 1.96 * agg_coef[, 2]
ci_up = agg_coef[, 1] + 1.96 * agg_coef[, 2]
segments(x0 = x, y0 = ci_low, x1 = x, y1 = ci_up, col = 4)
legend("topleft", col = c(1, 2, 4), pch = c(20, 15, 17), legend = c("TWFE", "True", "Sun & Abraham"))
# The ATT
aggregate(res_cohort, c("ATT" = "treatment::[^-]"))
with(base_stagg, mean(treatment_effect_true[time_to_treatment >= 0]))
# The total effect for each cohort
aggregate(res_cohort, c("cohort" = "::[^-].*year_treated::([[:digit:]]+)"))
Transforms a character string into a dictionary
Description
Transforms a single character string containing a dictionary in a textual format into a proper dictionary, that is, a named character vector.
Usage
as.dict(x)
Arguments
x | A character scalar of the form |
Details
This function is mostly used in combination with setFixest_dict to set the dictionary to be used in the function etable.
Value
It returns a named character vector.
Author(s)
Laurent Berge
See Also
Examples
x = "# Main vars
mpg: Miles per gallon
hp: Horsepower

# Categorical variables
cyl: Number of cylinders; vs: Engine"
as.dict(x)
Transforms a fixest_multi object into a list
Description
Extracts the results from a fixest_multi object and places them into a list.
Usage
## S3 method for class 'fixest_multi'
as.list(x, ...)
Arguments
x | A |
... | Not currently used. |
Value
Returns a list containing all the results of the multiple estimations.
See Also
The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)
# All the results at once
as.list(res)
Sample data for difference in difference
Description
This data has been generated to illustrate the use of difference in difference functions in package fixest. This is a balanced panel of 104 individuals and 10 periods. About half the individuals are treated, the treatment having a positive effect on the dependent variable y after the 5th period. The effect of the treatment on y is gradual.
Usage
data(base_did, package = "fixest")
Format
base_did is a data frame with 1,040 observations and 6 variables named y, x1, id, period, post and treat.
- y
The dependent variable affected by the treatment.
- x1
An explanatory variable.
- id
Identifier of the individual.
- period
From 1 to 10
- post
Indicator taking value 1 if the period is strictly greater than 5, 0 otherwise.
- treat
Indicator taking value 1 if the individual is treated, 0 otherwise.
Source
This data has been generated from R.
Publication data sample
Description
This data reports the publication output (number of articles and number of citations received) for a few scientists from the start of their career to 2000. Most of the variables are processed from the Microsoft Academic Graph (MAG) data set. A few variables are randomly generated.
Usage
data(base_pub, package = "fixest")
Format
base_pub is a data frame with 4,024 observations and 10 variables. There are 200 different scientists and 51 different years (ends in 2000).
author_id: scientist identifier
year: current year
affil_id: affiliation ID of the scientist's current affiliation
affil_name: affiliation name of the scientist's current affiliation (character)
field: field name of the scientist (character), time invariant
nb_pub: number of publications of the scientist for the current year
nb_cites: number of citations received by the publications of the scientist in the current year. Accounts for the citations received from articles published up to 2020.
birth_year: birth year of the scientist (this is randomly generated)
is_woman: 1 if the scientist is a woman, 0 otherwise (this is randomly generated)
age: current age of the scientist (formally year - birth_year)
Source
The source of this data set is the Microsoft Academic Graph data set, extracted in 2020. Now a defunct project, you can find similar data on OpenAlex.
The variables birth_year, is_woman and age were randomly generated. All other variables have been created from the raw MAG files.
Sample data for staggered difference in difference
Description
This data has been generated to illustrate the Sun and Abraham (Journal of Econometrics, 2021) method for staggered difference-in-difference. This is a balanced panel of 95 individuals and 10 periods. Half the individuals are treated. For those treated, the treatment date can vary from the second to the last period. The effect of the treatment depends on the time since the treatment: it is first negative and then increasing.
Usage
data(base_stagg, package = "fixest")
Format
base_stagg is a data frame with 950 observations and 7 variables:
id: panel identifier.
year: from 1 to 10.
year_treated: the period at which the individual is treated.
time_to_treatment: difference between the year and the treatment year.
treated: indicator taking value 1 if the individual is treated, 0 otherwise.
treatment_effect_true: true effect of the treatment.
x1: explanatory variable, correlated with the period.
y: the dependent variable affected by the treatment.
Source
This data has been generated from R.
Bins the values of a variable (typically a factor)
Description
Tool to easily group the values of a given variable.
Usage
bin(x, bin)
Arguments
x | A vector whose values have to be grouped. Can be of any type but must be atomic. |
bin | A list of values to be grouped, a vector, a formula, or the special values |
Value
It returns a vector of the same length as x.
"Cutting" a numeric vector
Numeric vectors can be cut easily into: a) equal parts, b) user-specified bins.
Use "cut::n" to cut the vector into n (roughly) equal parts. Percentiles are used to partition the data, hence some data distributions can lead to fewer than n parts being created (for example if P0 is the same as P50).
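The remark on percentiles can be illustrated without fixest. The sketch below uses base R's `quantile` on hypothetical data; it only mimics the idea of percentile-based breakpoints and makes no claim about how `bin` computes them internally.

```r
# Hypothetical data in which more than half of the values are equal to 0
x = c(rep(0, 6), 1:4)

# Percentile-based breakpoints for two parts: P0, P50 and P100
breaks = quantile(x, probs = c(0, 0.5, 1))

# P0 and P50 coincide, so only one distinct interval remains
n_parts = length(unique(breaks)) - 1
n_parts
```

Two parts were requested, but the repeated breakpoint leaves a single usable interval.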
The user can specify custom bins with the following syntax: "cut::a]b]c]". Here the numbers a, b, c, etc, are a sequence of increasing numbers, each followed by an open or closed square bracket. The numbers can be specified as either plain numbers (e.g. "cut::5]12[32["), quartiles (e.g. "cut::q1]q3["), or percentiles (e.g. "cut::p10]p15]p90]"). Values of different types can be mixed: "cut::5]q2[p80[" is valid provided the median (q2) is indeed greater than 5, otherwise an error is thrown.
The square bracket right of each number tells whether the numbers should be included or excluded from the current bin. For example, say x ranges from 0 to 100, then "cut::5]" will create two bins: one from 0 to 5 and a second from 6 to 100. With "cut::5[" the bins would have been 0-4 and 5-100.
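The inclusion rule for the brackets has a close analogue in base R's `cut`, through its `right` argument. The following sketch reproduces the 0 to 100 example with `cut` instead of `bin`; the mapping between the two syntaxes is an illustration, not a statement about the implementation of `bin`.

```r
x = 0:100

# Analogue of "cut::5]": 5 belongs to the first bin (intervals closed on the right)
bins_closed = cut(x, breaks = c(-Inf, 5, Inf), right = TRUE)

# Analogue of "cut::5[": 5 belongs to the second bin (intervals closed on the left)
bins_open = cut(x, breaks = c(-Inf, 5, Inf), right = FALSE)

# Smallest and largest value falling in each bin
tapply(x, bins_closed, range)
tapply(x, bins_open, range)
```

The first call groups 0 to 5 and 6 to 100, the second groups 0 to 4 and 5 to 100, matching the description above.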
A factor is always returned. The labels always report the min and max values in each bin.
To have user-specified bin labels, just add them in the character vector following 'cut::values'. You don't need to provide all of them, and NA values fall back to the default label. For example, bin = c("cut::4", "Q1", NA, "Q3") will modify only the first and third label that will be displayed as "Q1" and "Q3".
bin vs ref
The functions bin and ref are able to do the same thing, then why use one instead of the other? Here are the differences:
- ref always returns a factor. This is in contrast with bin which returns, when possible, a vector of the same type as the vector in input.
- ref always places the values modified in the first place of the factor levels. On the other hand, bin tries to not modify the ordering of the levels. It is possible to make bin mimic the behavior of ref by adding an "@" as the first element of the list in the argument bin.
- when a vector (and not a list) is given in input, ref will place each element of the vector in the first place of the factor levels. The behavior of bin is totally different, bin will transform all the values in the vector into a single value in x (i.e. it's binning).
Author(s)
Laurent Berge
See Also
To re-factor variables: ref.
Examples
data(airquality)
month_num = airquality$Month
table(month_num)
# Grouping the first two values
table(bin(month_num, 5:6))
# ... plus changing the name to '10'
table(bin(month_num, list("10" = 5:6)))
# ... and grouping 7 to 9
table(bin(month_num, list("g1" = 5:6, "g2" = 7:9)))
# Grouping every two months
table(bin(month_num, "bin::2"))
# ... every 2 consecutive elements
table(bin(month_num, "!bin::2"))
# ... idem starting from the last one
table(bin(month_num, "!!bin::2"))
# Using .() for list():
table(bin(month_num, .("g1" = 5:6)))
#
# with non numeric data
#
month_lab = c("may", "june", "july", "august", "september")
month_fact = factor(month_num, labels = month_lab)
# Grouping the first two elements
table(bin(month_fact, c("may", "jun")))
# ... using regex
table(bin(month_fact, "@may|jun"))
# ...changing the name
table(bin(month_fact, list("spring" = "@may|jun")))
# Grouping every 2 consecutive months
table(bin(month_fact, "!bin::2"))
# ...idem but starting from the last
table(bin(month_fact, "!!bin::2"))
# Relocating the months using "@d" in the name
table(bin(month_fact, .("@5" = "may", "@1 summer" = "@aug|jul")))
# Putting "@" as first item means subsequent items will be placed first
table(bin(month_fact, .("@", "aug", "july")))
#
# "Cutting" numeric data
#
data(iris)
plen = iris$Petal.Length
# 3 parts of (roughly) equal size
table(bin(plen, "cut::3"))
# Three custom bins
table(bin(plen, "cut::2]5]"))
# .. same, excluding 5 in the 2nd bin
table(bin(plen, "cut::2]5["))
# Using quartiles
table(bin(plen, "cut::q1]q2]q3]"))
# Using percentiles
table(bin(plen, "cut::p20]p50]p70]p90]"))
# Mixing all
table(bin(plen, "cut::2[q2]p90]"))
# NOTA:
# -> the labels always contain the min/max values in each bin
# Custom labels can be provided, just give them in the char. vector
# NA values lead to the default label
table(bin(plen, c("cut::2[q2]p90]", "<2", "]2; Q2]", NA, ">90%")))
#
# With a formula
#
data(iris)
plen = iris$Petal.Length
# We need to use "x"
table(bin(plen, list("< 2" = ~x < 2, ">= 2" = ~x >= 2)))
Extracts the bread matrix from fixest objects
Description
Extracts the bread matrix from fixest objects to be used to compute sandwich variance-covariance matrices.
Usage
## S3 method for class 'fixest'
bread(x, ...)
Arguments
x | A |
... | Not currently used. |
Value
Returns a matrix of the same dimension as the number of variables used in the estimation.
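For readers unfamiliar with the "bread" terminology, the sketch below builds a sandwich variance by hand for an ordinary least squares fit, using base R only. It follows the usual convention in which the bread is the inverse of X'X / n; whether `bread.fixest` applies exactly this scaling is an assumption not confirmed by this section, and the object names are illustrative.

```r
# Assumption: a plain lm() fit stands in for a fixest estimation
fit = lm(Petal.Length ~ Petal.Width + Sepal.Width, iris)

X = model.matrix(fit)   # regressors, including the intercept
e = resid(fit)          # residuals
n = nrow(X)

# Bread: inverse of X'X / n, a square matrix with one row per coefficient
bread_mat = n * solve(crossprod(X))

# Meat: average outer product of the scores (heteroskedasticity-robust, HC0)
meat_mat = crossprod(X * e) / n

# Sandwich: bread %*% meat %*% bread, scaled by 1 / n
vcov_hc0 = bread_mat %*% meat_mat %*% bread_mat / n

# Same result as White's formula written directly
XtX_inv = solve(crossprod(X))
vcov_direct = XtX_inv %*% crossprod(X * e) %*% XtX_inv
all.equal(vcov_hc0, vcov_direct)
```

Separating the bread from the meat is what lets different robust estimators reuse the same bread while swapping the middle term.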
Examples
est = feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
bread(est)
Check the fixed-effects convergence of a feols estimation
Description
Checks the convergence of a feols estimation by computing the first-order conditions of all fixed-effects (all should be close to 0).
Usage
check_conv_feols(x)
## S3 method for class 'fixest_check_conv'
summary(object, type = "short", ...)
Arguments
x | A |
object | An object returned by |
type | Either "short" (default) or "detail". If "short", only the maximum absolute FOC is displayed, otherwise the 2 smallest and the 2 largest FOC are reported for each fixed-effect and each variable. |
... | Not currently used. Note that this function first re-demeans the variables, thus possibly incurring some extra computation time. |
Value
It returns a list of N elements, N being the number of variables in the estimation (dependent variable + explanatory variables +, if IV, endogenous variables and instruments). For each variable, all the first-order conditions for each fixed-effect are returned.
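The meaning of these first-order conditions can be seen on a small base-R analogue. Demeaning a variable with respect to fixed-effects amounts to regressing it on the corresponding dummies, and at the exact solution the residuals sum to zero within every fixed-effect group. The sketch below uses `lm` with explicit dummies rather than the fixest demeaning algorithm, and `max_abs_foc` is a hypothetical helper written for this illustration.

```r
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
base$FE = rep(1:30, 5)

# Hypothetical helper: demean a variable on both sets of dummies, then
# report the largest absolute within-group residual sum per fixed-effect
max_abs_foc = function(v) {
  r = resid(lm(v ~ factor(species) + factor(FE), base))
  c(species = max(abs(tapply(r, base$species, sum))),
    FE      = max(abs(tapply(r, base$FE, sum))))
}

# One column per variable, one row per fixed-effect
foc = sapply(base[c("y", "x1")], max_abs_foc)
foc
```

All entries are zero up to numerical precision here because `lm` solves the problem exactly; with an iterative demeaning algorithm, values noticeably away from zero would signal incomplete convergence.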
Examples
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
base$FE = rep(1:30, 5)
# one estimation with fixed-effects + varying slopes
est = feols(y ~ x1 | species[x2] + FE[x3], base)
# Checking the convergence
conv = check_conv_feols(est)
# We can check that all values are close to 0
summary(conv)
summary(conv, "detail")
Extracts the coefficients from a fixest estimation
Description
This function extracts the coefficients obtained from a model estimated with femlm, feols or feglm.
Usage
## S3 method for class 'fixest'
coef(object, keep, drop, order, collin = FALSE, agg = TRUE, ...)
## S3 method for class 'fixest'
coefficients(object, keep, drop, order, collin = FALSE, agg = TRUE, ...)
Arguments
object | A |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
order | Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see |
collin | Logical, default is |
agg | Logical scalar, default is |
... | Not currently used. |
Details
The coefficients are the ones that have been found to maximize the log-likelihood of the specified model. More information can be found on the models from the estimations help pages: femlm, feols or feglm.
Note that if the model has been estimated with fixed-effects, to obtain the fixed-effect coefficients, you need to use the function fixef.fixest.
Value
This function returns a named numeric vector.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. summary.fixest, confint.fixest, vcov.fixest, etable, fixef.fixest.
Examples
# simple estimation on iris data, using "Species" fixed-effects
res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris)
# the coefficients of the variables:
coef(res)
# the fixed-effects coefficients:
fixef(res)
Extracts the coefficients of fixest_multi objects
Description
Utility to extract the coefficients of multiple estimations and rearrange them into a matrix.
Usage
## S3 method for class 'fixest_multi'
coef(object, keep, drop, order, collin = FALSE, long = FALSE, na.rm = TRUE, ...)
## S3 method for class 'fixest_multi'
coefficients(object, keep, drop, order, collin = FALSE, long = FALSE, na.rm = TRUE, ...)
Arguments
object | A |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
order | Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see |
collin | Logical, default is |
long | Logical, default is |
na.rm | Logical, default is |
... | Not currently used. |
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
# A multiple estimation
est = feols(y ~ x1 + csw0(x2, x3), base)
# Getting all the coefficients at once,
# each row is a model
coef(est)
# Example of keep/drop/order
coef(est, keep = "Int|x1", order = "x1")
# To change the order of the model, use fixest_multi
# extraction tools:
coef(est[rhs = .N:1])
# collin + long + na.rm
base$x1_bis = base$x1 # => collinear
est = feols(y ~ x1_bis + csw0(x1, x2, x3), base, split = ~species)
# does not display x1 since it is always collinear
coef(est)
# now it does
coef(est, collin = TRUE)
# long
coef(est, long = TRUE)
# long but balanced (with NAs then)
coef(est, long = TRUE, na.rm = FALSE)
Plots confidence intervals and point estimates
Description
This function plots the results of estimations (coefficients and confidence intervals). The function iplot restricts the output to variables created with i, either interactions with factors or raw factors.
Usage
coefplot(
  ..., objects = NULL, style = NULL, se, ci_low, ci_high, df.t = NULL,
  vcov = NULL, cluster = NULL, x, x.shift = 0, horiz = FALSE, dict = NULL,
  keep, drop, order, ci.width = "1%", ci_level = 0.95, add = FALSE,
  plot_prms = list(), pch = c(20, 17, 15, 21, 24, 22), col = 1:8, cex = 1,
  lty = 1, lwd = 1, ylim = NULL, xlim = NULL, pt.pch = pch, pt.bg = NULL,
  pt.cex = cex, pt.col = col, ci.col = col, pt.lwd = lwd, ci.lwd = lwd,
  ci.lty = lty, grid = TRUE, grid.par = list(lty = 3, col = "gray"),
  zero = TRUE, zero.par = list(col = "black", lwd = 1), pt.join = FALSE,
  pt.join.par = list(col = pt.col, lwd = lwd), ci.join = FALSE,
  ci.join.par = list(lwd = lwd, col = col, lty = 2), ci.fill = FALSE,
  ci.fill.par = list(col = "lightgray", alpha = 0.5), ref = "auto",
  ref.line = "auto", ref.line.par = list(col = "black", lty = 2), lab.cex,
  lab.min.cex = 0.85, lab.max.mar = 0.25, lab.fit = "auto", xlim.add,
  ylim.add, only.params = FALSE, sep, as.multiple = FALSE, bg,
  group = "auto", group.par = list(lwd = 2, line = 3, tcl = 0.75),
  main = "Effect on __depvar__",
  value.lab = "Estimate and __ci__ Conf. Int.", ylab = NULL, xlab = NULL,
  sub = NULL, i.select = NULL, do_iplot = NULL
)
iplot(
  ..., i.select = 1, objects = NULL, style = NULL, se, ci_low, ci_high,
  df.t = NULL, vcov = NULL, cluster = NULL, x, x.shift = 0, horiz = FALSE,
  dict = NULL, keep, drop, order, ci.width = "1%", ci_level = 0.95,
  add = FALSE, plot_prms = list(), pch = c(20, 17, 15, 21, 24, 22),
  col = 1:8, cex = 1, lty = 1, lwd = 1, ylim = NULL, xlim = NULL,
  pt.pch = pch, pt.bg = NULL, pt.cex = cex, pt.col = col, ci.col = col,
  pt.lwd = lwd, ci.lwd = lwd, ci.lty = lty, grid = TRUE,
  grid.par = list(lty = 3, col = "gray"), zero = TRUE,
  zero.par = list(col = "black", lwd = 1), pt.join = FALSE,
  pt.join.par = list(col = pt.col, lwd = lwd), ci.join = FALSE,
  ci.join.par = list(lwd = lwd, col = col, lty = 2), ci.fill = FALSE,
  ci.fill.par = list(col = "lightgray", alpha = 0.5), ref = "auto",
  ref.line = "auto", ref.line.par = list(col = "black", lty = 2), lab.cex,
  lab.min.cex = 0.85, lab.max.mar = 0.25, lab.fit = "auto", xlim.add,
  ylim.add, only.params = FALSE, sep, as.multiple = FALSE, bg,
  group = "auto", group.par = list(lwd = 2, line = 3, tcl = 0.75),
  main = "Effect on __depvar__",
  value.lab = "Estimate and __ci__ Conf. Int.", ylab = NULL, xlab = NULL,
  sub = NULL
)
Arguments
... | Other arguments to be passed to |
objects | A list of |
style | A character scalar giving the style of the plot to be used. Youcan set styles with the function |
se | The standard errors of the estimates. It may be missing. |
ci_low | If |
ci_high | If |
df.t | Integer scalar or |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette. You can pass several VCOVs (as above) if you nest them into a list. If the number of VCOVs equals the number of models, each VCOV is mapped to the appropriate model. If there is one model and several VCOVs, or if the first element of the list is equal to |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
x | The value of the x-axis. If missing, the names of the argument |
x.shift | Shifts the confidence interval bars to the left or right, depending on the value of |
horiz | A logical scalar, default is |
dict | A named character vector or a logical scalar. It changes the original variable names to the ones contained in the |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
order | Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see |
ci.width | The width of the extremities of the confidence intervals. Default is |
ci_level | Scalar between 0 and 1: the level of the CI. By default it is equal to 0.95. |
add | Default is |
plot_prms | A named list. It may contain additional parameters to be passed to the plot. |
pch | The point symbol of the coefficient estimates. Default is 1 (circle). This is an alias to the argument |
col | The color of the points and the confidence intervals. Default is 1 ("black"). Note that you can set the colors separately for each of them with |
cex | Numeric, default is 1. Expansion factor for the points |
lty | The line type of the confidence intervals. Default is 1. This is an alias to the argument |
lwd | General line width. Default is 1. |
ylim | Numeric vector of length 2 which gives the limits of the plotting region for the y-axis. The default is |
xlim | Numeric vector of length 2 which gives the limits of the plotting region for the x-axis. The default is |
pt.pch | The point symbol of the coefficient estimates. Default is 1 (circle). |
pt.bg | The background color of the point estimate (when the |
pt.cex | The size of the coefficient estimates. Default is the other argument |
pt.col | The color of the coefficient estimates. Default is equal to the argument |
ci.col | The color of the confidence intervals. Default is equal to the argument |
pt.lwd | The line width of the coefficient estimates. Default is equal to the other argument |
ci.lwd | The line width of the confidence intervals. Default is equal to the other argument |
ci.lty | The line type of the confidence intervals. Default is 1. |
grid | Logical, default is |
grid.par | List. Parameters of the grid. The default values are: |
zero | Logical, default is |
zero.par | List. Parameters of the zero-line. The default values are |
pt.join | Logical, default is |
pt.join.par | List. Parameters of the line joining the coefficients. The default values are: |
ci.join | Logical default to |
ci.join.par | A list of parameters to be passed to |
ci.fill | Logical default to |
ci.fill.par | A list of parameters to be passed to |
ref | Used to add points at |
ref.line | Logical or numeric, default is "auto", whose behavior depends on the situation. It is |
ref.line.par | List. Parameters of the vertical line on the reference. The default values are: |
lab.cex | The size of the labels of the coefficients. Default is missing. It is automatically set by an internal algorithm which can go as low as |
lab.min.cex | The minimum size of the coefficients labels, as set by the internal algorithm. Default is 0.85. |
lab.max.mar | The maximum size the left margin can take when trying to fit the coefficient labels into it (only when |
lab.fit | The method to fit the coefficient labels into the plotting region (only when |
xlim.add | A numeric vector of length 1 or 2. It represents an extension factor of xlim, in percentage. Eg: |
ylim.add | A numeric vector of length 1 or 2. It represents an extension factor of ylim, in percentage. Eg: |
only.params | Logical, default is |
sep | The distance between two estimates – only when argument |
as.multiple | Logical: default is |
bg | Background color for the plot. By default it is white. |
group | A list, default is missing. Each element of the list reports the coefficients to be grouped while the name of the element is the group name. Each element of the list can be either: i) a character vector of length 1, ii) a character vector of length 2, or iii) a numeric vector. If equal to: i) then it is interpreted as a pattern: all elements matching the regular expression will be grouped (note that you can use the special character "^^" to clean the beginning of the names, see example), if ii) it corresponds to the first and last elements to be grouped, if iii) it corresponds to the coefficients numbers to be grouped. If equal to a character vector, you can use a percentage to tell the algorithm to look at the coefficients before aliasing (e.g. |
group.par | A list of parameters controlling the display of the group. The parameters controlling the line are: |
main | The title of the plot. Default is |
value.lab | The label to appear on the side of the coefficient values. If |
ylab | The label of the y-axis, default is |
xlab | The label of the x-axis, default is |
sub | A subtitle, default is |
i.select | Integer scalar, default is 1. In |
do_iplot | Logical, default is |
Functions
iplot(): Plots the coefficients generated with i()
Setting custom default values
The function coefplot has many arguments to parametrize the plots. Most of these arguments can be set once and for all using the function setFixest_coefplot. See Example 3 below for a demonstration.
iplot
The function iplot restricts coefplot to interactions or factors created with the function i. Only one of the i-variables will be plotted at a time. If you have several i-variables, you can navigate through them with the i.select argument.
The argument i.select is an index that will go through all the i-variables. It will work well if the variables are pure, meaning not interacted with other variables. If the i-variables are interacted, the index may have an odd behavior but will (in most cases) work all the same, just try some numbers up until you (hopefully) obtain the graph you want.
Note, importantly, that interactions of two factor variables are (in general) disregarded since they would require a 3-D plot to be properly represented.
Arguments keep, drop and order
The arguments keep, drop and order use regular expressions. If you are not familiar with regular expressions, I urge you to learn them, since they are an extremely powerful way to manipulate character strings (and they exist across most programming languages).
For example drop = "Wind" would drop any variable whose name contains "Wind". Note that variables such as "Temp:Wind" or "StrongWind" do contain "Wind", so would be dropped. To drop only the variable named "Wind", you need to use drop = "^Wind$" (with "^" meaning beginning, resp. "$" meaning end, of the string => this is the language of regular expressions).
Although you can combine several regular expressions in a single character string using pipes, drop also accepts a vector of regular expressions.
You can use the special character "!" (exclamation mark) to reverse the effect of the regular expression (this feature is specific to this function). For example drop = "!Wind" would drop any variable that does not contain "Wind".
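The matching rules above can be tried outside of fixest with base R's grepl. The sketch below only mimics the documented semantics on a hypothetical vector of coefficient names; the object names and the way "!" is emulated are illustrative assumptions, not fixest internals.

```r
# Hypothetical coefficient names, for illustration only
vars <- c("(Intercept)", "Temp:Wind", "Wind", "StrongWind", "Temp")

# drop = "Wind": removes every name that contains "Wind"
drop_any <- vars[!grepl("Wind", vars)]

# drop = "^Wind$": removes only the name that is exactly "Wind"
drop_exact <- vars[!grepl("^Wind$", vars)]

# drop = c("Temp", "Inter"): assumed here to act like the pattern "Temp|Inter"
drop_vec <- vars[!grepl("Temp|Inter", vars)]

# drop = "!Wind": the leading "!" negates the match, so every name
# that does NOT contain "Wind" is removed
drop_neg <- vars[grepl("Wind", vars)]

print(drop_any)
print(drop_exact)
print(drop_vec)
print(drop_neg)
```

Printing the four vectors side by side makes it easy to check a pattern before passing it to keep or drop.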
You can use the special character "%" (percentage) to make reference to the original variable name instead of the aliased name. For example, you have a variable named "Month6", and use a dictionary dict = c(Month6="June"). Thus the variable will be displayed as "June". If you want to delete that variable, you can use either drop="June", or drop="%Month6" (which makes reference to its original name).
The argument order takes in a vector of regular expressions, the order will follow the elements of this vector. The vector gives a list of priorities, on the left the elements with highest priority. For example, order = c("Wind", "!Inter", "!Temp") would give highest priorities to the variables containing "Wind" (which would then appear first), second highest priority is the variables not containing "Inter", last, with lowest priority, the variables not containing "Temp". If you had the following variables: (Intercept), Temp:Wind, Wind, Temp you would end up with the following order: Wind, Temp:Wind, Temp, (Intercept).
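One way to reproduce the ordering of this example in base R is a sequence of stable partitions, applied from the lowest to the highest priority. The helper order_sketch below is hypothetical and written only for illustration; it is not claimed to be how fixest implements the argument.

```r
# Hypothetical helper: NOT a fixest function, just a sketch of the rule
order_sketch <- function(vars, order) {
  # Go from lowest to highest priority; each pass moves the matching
  # names to the front while preserving their relative order
  for (pattern in rev(order)) {
    negate <- startsWith(pattern, "!")
    if (negate) pattern <- substring(pattern, 2)
    hit <- grepl(pattern, vars)
    if (negate) hit <- !hit
    vars <- c(vars[hit], vars[!hit])
  }
  vars
}

vars <- c("(Intercept)", "Temp:Wind", "Wind", "Temp")
res <- order_sketch(vars, c("Wind", "!Inter", "!Temp"))
print(res)
```

On the four variables of the example, this sketch returns the order stated above: Wind, Temp:Wind, Temp, (Intercept).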
Author(s)
Laurent Berge
See Also
See setFixest_coefplot to set the default values of coefplot, and the estimation functions: e.g. feols, fepois, feglm, fenegbin.
Examples
#
# Example 1: Stacking two sets of results on the same graph
#

# Estimation on Iris data with one fixed-effect (Species)
# + we cluster the standard-errors
est = feols(Petal.Length ~ Petal.Width + Sepal.Width | Species, iris, vcov = "cluster")

# Now with "regular" standard-errors
est_std = summary(est, vcov = "iid")

# You can plot the two results at once
coefplot(est, est_std)

# You could also use the argument vcov
coefplot(est, vcov = list("cluster", "iid"))

# Alternatively, you can use the argument x.shift
# to do it sequentially:

# First graph with clustered standard-errors
coefplot(est, x.shift = -.2)

# 'x.shift' was used to shift the coefficients to the left.

# Second set of results: this time with
# standard-errors that are not clustered.
coefplot(est, vcov = "iid", x.shift = .2, add = TRUE, col = 2, ci.lty = 2, pch = 15)

legend("topright", col = 1:2, pch = 20, lwd = 1, lty = 1:2, legend = c("Clustered", "IID"), title = "Standard-Errors")

#
# Example 2: Interactions
#

# Now we estimate and plot the "yearly" treatment effects
data(base_did)
base_inter = base_did

# We interact the variable 'period' with the variable 'treat'
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_inter)

# In the estimation, the variable treat is interacted
# with each value of period but 5, set as a reference

# coefplot will show all the coefficients:
coefplot(est_did)

# Note that the grouping of the coefficients is due to 'group = "auto"'

# If you want to keep only the coefficients
# created with i() (ie the interactions), use iplot
iplot(est_did)

# We can see that the graph is different from before:
# - only interactions are shown,
# - the reference is present,
# => this is fully flexible
iplot(est_did, ref.line = FALSE, pt.join = TRUE)

#
# What if the interacted variable is not numeric?

# Let's create a "month" variable
all_months = c("aug", "sept", "oct", "nov", "dec", "jan", "feb", "mar", "apr", "may", "jun", "jul")
base_inter$period_month = all_months[base_inter$period]

# The new estimation
est = feols(y ~ x1 + i(period_month, treat, "oct") | id+period, base_inter)

# Since 'period_month' of type character, coefplot sorts it
iplot(est)

# To respect a plotting order, use a factor
base_inter$month_factor = factor(base_inter$period_month, levels = all_months)
est = feols(y ~ x1 + i(month_factor, treat, "oct") | id + period, base_inter)
iplot(est)

#
# Example 3: Setting defaults
#

# coefplot has many arguments, which makes it highly flexible.
# If you don't like the default style of coefplot. No worries,
# you can set *your* default by using the function
# setFixest_coefplot()
dict = c("Petal.Length"="Length (Petal)", "Petal.Width"="Width (Petal)", "Sepal.Length"="Length (Sepal)", "Sepal.Width"="Width (Sepal)")
setFixest_coefplot(ci.col = 2, pt.col = "darkblue", ci.lwd = 3, pt.cex = 2, pt.pch = 15, ci.width = 0, dict = dict)
est = feols(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width + i(Species), iris)

# And that's it
coefplot(est)

# You can set separate default values for iplot
setFixest_coefplot("iplot", pt.join = TRUE, pt.join.par = list(lwd = 2, lty = 2))
iplot(est)

# To reset to the default settings:
setFixest_coefplot("all", reset = TRUE)
coefplot(est)

#
# Example 4: group + cleaning
#

# You can use the argument group to group variables
# You can further use the special character "^^" to clean
# the beginning of the coef. name: particularly useful for factors
est = feols(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width + Species, iris)

# No grouping:
coefplot(est)

# now we group by Sepal and Species
coefplot(est, group = list(Sepal = "Sepal", Species = "Species"))

# now we group + clean the beginning of the names using the special character ^^
coefplot(est, group = list(Sepal = "^^Sepal.", Species = "^^Species"))
Extracts the coefficients table from an estimation
Description
Methods to extract the coefficients table and its sub-components from an estimation.
Usage
coeftable(object, ...)
se(object, ...)
pvalue(object, ...)
tstat(object, ...)
Arguments
object | An estimation (fitted model object), e.g. a |
... | Other arguments to the methods. |
Value
Returns a matrix (coeftable) or vectors.
See Also
Please look at the coeftable.fixest page for more detailed information.
Examples
est = lm(mpg ~ cyl, mtcars)
coeftable(est)
Extracts the coefficients table from an estimation
Description
Default method to extract the coefficients table and its sub-components from an estimation.
Usage
## Default S3 method:
coeftable(object, keep, drop, order, ...)
## Default S3 method:
se(object, keep, drop, order, ...)
## Default S3 method:
tstat(object, keep, drop, order, ...)
## Default S3 method:
pvalue(object, keep, drop, order, ...)
## S3 method for class 'matrix'
se(object, keep, drop, order, ...)
Arguments
object | The result of an estimation (a fitted model object). Note that this function is made to work with |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
order | Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see |
... | Other arguments that will be passed to First the method summary is applied if needed, then the coefficients table is extracted from its output. The default method is very naive and hopes that the resulting coefficients table contained in the summary of the fitted model is well formed: this assumption is very often wrong. Anyway, there is no development intended since the coeftable/se/pvalue/tstat series of methods is only intended to work well with |
Value
Returns a matrix (coeftable) or vectors.
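As a rough illustration of what the default method relies on, the sketch below pulls the same quantities from an lm fit using base R only. It assumes the summary of the model stores a well-formed coefficients table, which is exactly the assumption the default method makes; the object names are illustrative.

```r
est <- lm(mpg ~ cyl, mtcars)

# Coefficients table as stored in the summary of the fitted model
ct <- coef(summary(est))

# Standard errors: read from the table, or rebuilt from the VCOV matrix
se_table <- ct[, "Std. Error"]
se_vcov <- sqrt(diag(vcov(est)))

# t-statistics and two-sided p-values rebuilt from the estimates
t_manual <- ct[, "Estimate"] / se_table
p_manual <- 2 * pt(abs(t_manual), df = df.residual(est), lower.tail = FALSE)

print(cbind(se_table, se_vcov, t_manual, p_manual))
```

For models whose summary does not follow this layout, the extraction would need to be adapted.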
Functions
se(default): Extracts the standard-errors from an estimation
tstat(default): Extracts the t-statistics from an estimation
pvalue(default): Extracts the p-values from an estimation
se(matrix): Extracts the standard-errors from a VCOV matrix
Examples
# NOTA: This function is really made to handle fixest objects
# The default methods works for simple structures, but you'd be
# likely better off with broom::tidy for other models
est = lm(mpg ~ cyl, mtcars)
coeftable(est)
se(est)
Obtain various statistics from an estimation
Description
Set of functions to directly extract some commonly used statistics, like the p-value or the table of coefficients, from estimations. This was first implemented for fixest estimations, but has some support for other models.
Usage
## S3 method for class 'fixest'
coeftable(object, vcov = NULL, ssc = NULL, cluster = NULL, keep = NULL, drop = NULL, order = NULL, list = FALSE, ...)
## S3 method for class 'fixest'
se(object, vcov = NULL, ssc = NULL, cluster = NULL, keep = NULL, drop = NULL, order = NULL, ...)
## S3 method for class 'fixest'
tstat(object, vcov = NULL, ssc = NULL, cluster = NULL, keep = NULL, drop = NULL, order = NULL, ...)
## S3 method for class 'fixest'
pvalue(object, vcov = NULL, ssc = NULL, cluster = NULL, keep = NULL, drop = NULL, order = NULL, ...)
Arguments
object | A |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette. You can pass several VCOVs (as above) if you nest them into a list. If the number of VCOVs equals the number of models, each VCOV is mapped to the appropriate model. If there is one model and several VCOVs, or if the first element of the list is equal to |
ssc | An object of class |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
order | Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see |
list | Logical, default is |
... | Other arguments to be passed to |
Details
This set of tiny functions is primarily constructed for fixest estimations.
Value
Returns a table of coefficients, with the variables in rows and four columns: the estimate, the standard-error, the t-statistic and the p-value.
If list = TRUE then a nested list is returned, the first layer is accessed with the coefficients names; the second layer with the following values: coef, se, tstat, pvalue. For example, with res = coeftable(est, list = TRUE) you can access the SE of the coefficient x1 with res$x1$se; and its coefficient with res$x1$coef, etc.
Functions
se(fixest): Extracts the standard-error of an estimation
tstat(fixest): Extracts the t-statistics of an estimation
pvalue(fixest): Extracts the p-value of an estimation
Examples
# Some data and estimation
data(trade)
est = fepois(Euros ~ log(dist_km) | Origin^Product + Year, trade)

#
# Coeftable/se/tstat/pvalue
#

coeftable(est)
se(est)
tstat(est)
pvalue(est)

# Now with two-way clustered standard-errors
# and using coeftable()
coeftable(est, cluster = ~Origin + Product)
se(est, cluster = ~Origin + Product)
pvalue(est, cluster = ~Origin + Product)
tstat(est, cluster = ~Origin + Product)

# Or you can cluster only once using summary:
est_sum = summary(est, cluster = ~Origin + Product)
coeftable(est_sum)
se(est_sum)
tstat(est_sum)
pvalue(est_sum)

# You can use the arguments keep, drop, order
# to rearrange the results
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
est_iv = feols(y ~ x1 | x2 ~ x3, base)
tstat(est_iv, keep = "x1")
coeftable(est_iv, keep = "x1|Int")
coeftable(est_iv, order = "!Int")

#
# Using lists
#

# Returning the coefficients table as a list can be useful for quick
# reference in markdown documents.
# Note that the "(Intercept)" is renamed into "constant"
res = coeftable(est_iv, list = TRUE)

# coefficient of the constant:
res$constant$coef

# pvalue of x1
res$x1$pvalue
Extracts the coefficients tables from fixest_multi estimations
Description
Series of methods to extract the coefficients table or its sub-components from a fixest_multi object (i.e. the outcome of multiple estimations).
Usage
## S3 method for class 'fixest_multi'
coeftable(object, vcov = NULL, keep = NULL, drop = NULL, order = NULL, long = FALSE, wide = FALSE, ...)
## S3 method for class 'fixest_multi'
se(object, vcov = NULL, keep = NULL, drop = NULL, order = NULL, long = FALSE, ...)
## S3 method for class 'fixest_multi'
tstat(object, vcov = NULL, keep = NULL, drop = NULL, order = NULL, long = FALSE, ...)
## S3 method for class 'fixest_multi'
pvalue(object, vcov = NULL, keep = NULL, drop = NULL, order = NULL, long = FALSE, ...)
Arguments
object | A |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette. You can pass several VCOVs (as above) if you nest them into a list. If the number of VCOVs equals the number of models, each VCOV is mapped to the appropriate model. If there is one model and several VCOVs, or if the first element of the list is equal to |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
order | Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see |
long | Logical scalar, default is |
wide | A logical scalar, default is |
... | Other arguments to be passed to |
Value
It returns a data.frame containing the coefficients tables (or just the se/pvalue/tstat) along with the information on which model was estimated.
If wide = TRUE, then a list is returned. The elements of the list are coef/se/tstat/pvalue. Each element of the list is a wide table with a column per coefficient.
If long = TRUE, then all the information is stacked. This removes the 4 columns containing the coefficient estimates to the p-values, and replaces them with two new columns: "param" and "value". The column param contains the values coef/se/tstat/pvalue, and the column value the associated numerical information.
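The stacking performed by long = TRUE can be pictured with a base R sketch. The example uses an lm fit as a stand-in for a coefficients table; the column name coefficient, the renamed headers and the row ordering are assumptions made for illustration, and the actual fixest output also carries columns identifying the model.

```r
# Base R stand-in for a coefficients table (columns renamed for the sketch)
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, iris)
ct <- coef(summary(fit))
colnames(ct) <- c("coef", "se", "tstat", "pvalue")

# Stack the four numeric columns into "param" / "value" pairs.
# as.vector() reads the matrix column by column, hence the rep() pattern.
long <- data.frame(
  coefficient = rep(rownames(ct), times = ncol(ct)),
  param = rep(colnames(ct), each = nrow(ct)),
  value = as.vector(ct)
)
print(long)
```

Each coefficient now occupies four rows, one per statistic, which is convenient for filtering or plotting by param.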
Functions
se(fixest_multi): Extracts the standard-errors from fixest_multi estimations
tstat(fixest_multi): Extracts the t-stats from fixest_multi estimations
pvalue(fixest_multi): Extracts the p-values from fixest_multi estimations
Examples
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est_multi = feols(y ~ csw(x.[,1:3]), base, split = ~species)

# we get all the coefficient tables at once
coeftable(est_multi)

# Now just the standard-errors
se(est_multi)

# wide = TRUE => leads to a list of wide tables
coeftable(est_multi, wide = TRUE)

# long = TRUE, all the information is stacked
coeftable(est_multi, long = TRUE)
Collinearity diagnostics for fixest objects
Description
On some occasions, the optimization algorithm of femlm may fail to converge, or the variance-covariance matrix may not be available. The most common reason why this happens is collinearity among variables. This function helps to find out which set of variables is problematic.
Usage
collinearity(x, verbose)
Arguments
x | A |
verbose | An integer. If higher than or equal to 1, then a note is prompted at each step of the algorithm. By default |
Details
This function tests: 1) collinearity with the fixed-effect variables, 2) perfect multi-collinearity between the variables, 3) perfect multi-collinearity between several variables and the fixed-effects, and 4) identification issues when there are non-linear in parameters parts.
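To see what collinearity with a fixed-effect looks like numerically, the base R sketch below expands a fixed-effect into dummies and checks the rank of the design matrix. It is a generic rank check written for illustration, not the algorithm used by collinearity(); the variable names are illustrative.

```r
set.seed(1)
fe_1 <- sample(3, 100, TRUE)
x <- rnorm(100)

# v1 and v2 are dummies for two of the three fixed-effect levels
v1 <- as.numeric(fe_1 == 1)
v2 <- as.numeric(fe_1 == 2)

# Design matrix with the fixed-effect expanded into dummies
X <- model.matrix(~ x + v1 + v2 + factor(fe_1))

# A rank lower than the number of columns signals perfect collinearity
rank_X <- qr(X)$rank
print(c(rank = rank_X, columns = ncol(X)))

# lm() flags the redundant columns with NA coefficients
y <- rnorm(100)
print(names(which(is.na(coef(lm(y ~ x + v1 + v2 + factor(fe_1)))))))
```

Here v1 and v2 are fully explained by the fixed-effect dummies, so two columns of the design matrix are redundant.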
Value
It returns a text message with the identified diagnostics.
Author(s)
Laurent Berge
Examples
# Creating an example data base:
set.seed(1)
fe_1 = sample(3, 100, TRUE)
fe_2 = sample(20, 100, TRUE)
x = rnorm(100, fe_1)**2
y = rnorm(100, fe_2)**2
z = rnorm(100, 3)**2
dep = rpois(100, x*y*z)
base = data.frame(fe_1, fe_2, x, y, z, dep)

# creating collinearity problems:
base$v1 = base$v2 = base$v3 = base$v4 = 0
base$v1[base$fe_1 == 1] = 1
base$v2[base$fe_1 == 2] = 1
base$v3[base$fe_1 == 3] = 1
base$v4[base$fe_2 == 1] = 1

# Estimations:

# Collinearity with the fixed-effects:
res_1 = femlm(dep ~ log(x) + v1 + v2 + v4 | fe_1 + fe_2, base)
collinearity(res_1)

# => collinearity with the first fixed-effect identified, we drop v1 and v2
res_1bis = femlm(dep ~ log(x) + v4 | fe_1 + fe_2, base)
collinearity(res_1bis)

# Multi-Collinearity:
res_2 = femlm(dep ~ log(x) + v1 + v2 + v3 + v4, base)
collinearity(res_2)
Confidence interval for parameters estimated with fixest
Description
This function computes the confidence interval of parameter estimates obtained from a model estimated with femlm, feols or feglm.
Usage
## S3 method for class 'fixest'
confint(object, parm, level = 0.95, vcov, se, cluster, ssc = NULL, coef.col = FALSE, ...)
Arguments
object | A |
parm | The parameters for which to compute the confidence interval (either an integer vector OR a character vector with the parameter name). If missing, all parameters are used. |
level | The confidence level. Default is 0.95. |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
ssc | An object of class |
coef.col | Logical, default is |
... | Not currently used. |
Value
Returns a data.frame with two columns giving respectively the lower and upper bound of the confidence interval. There are as many rows as parameters.
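The two bounds follow the usual construction: estimate plus or minus a critical value times the standard error. The base R sketch below checks this on an lm fit, where the critical value comes from the Student t distribution; which distribution and degrees of freedom fixest uses depends on the model and the VCOV, so treat this only as an illustration of the principle.

```r
fit <- lm(mpg ~ cyl + wt, mtcars)
level <- 0.95

est <- coef(fit)
se <- sqrt(diag(vcov(fit)))

# Two-sided critical value from the Student t distribution
crit <- qt(1 - (1 - level) / 2, df = df.residual(fit))

ci_manual <- cbind(lower = est - crit * se, upper = est + crit * se)
print(ci_manual)

# Same numbers as the built-in method for lm objects
print(confint(fit, level = level))
```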
Author(s)
Laurent Berge
Examples
# Load trade data
data(trade)

# We estimate the effect of distance on trade (with 3 fixed-effects)
est_pois = femlm(Euros ~ log(dist_km) + log(Year) | Origin + Destination + Product, trade)

# confidence interval with "normal" VCOV
confint(est_pois)

# confidence interval with "clustered" VCOV (w.r.t. the Origin factor)
confint(est_pois, se = "cluster")
Confidence intervals for fixest_multi objects
Description
Computes the confidence intervals of parameter estimates for fixest's multiple estimation objects (aka fixest_multi).
Usage
## S3 method for class 'fixest_multi'
confint(object, parm, level = 0.95, vcov = NULL, se = NULL, cluster = NULL, ssc = NULL, ...)
Arguments
object | A |
parm | The parameters for which to compute the confidence interval (either an integer vector OR a character vector with the parameter name). If missing, all parameters are used. |
level | The confidence level. Default is 0.95. |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
ssc | An object of class |
... | Not currently used. |
Value
It returns a data frame whose first columns indicate which model has been estimated. The last three columns indicate the coefficient name, and the lower and upper bounds of the confidence interval.
Examples
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x.[,1:3]) | sw0(species), base, vcov = "iid")
confint(est)

# focusing only on the coefficient 'x3'
confint(est, "x3")

# the 'id' provides the index of the estimation
est[c(3, 6)]
Gets the degrees of freedom of a fixest estimation
Description
Simple utility to extract the degrees of freedom from a fixest estimation.
Usage
degrees_freedom(x, type, vars = NULL, vcov = NULL, se = NULL, cluster = NULL, ssc = NULL, stage = 2)
degrees_freedom_iid(x, type)
Arguments
x | A |
type | Character scalar, equal to "k", "resid", "t". If "k", then the number of regressors is returned. If "resid", then it is the "residuals degree of freedom", i.e. the number of observations minus the number of regressors. If "t", it is the degrees of freedom used in the t-test. Note that these values are affected by how the VCOV of |
vars | A vector of variable names, of the regressors. This is optional. If provided, then |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
ssc | An object of class |
stage | Either 1 or 2. Only concerns IV regressions, which stage to look at. The type of VCOV can have an influence on the degrees of freedom. In particular, when the VCOV is clustered, the DoF returned will be in accordance with the way the small sample correction was performed when computing the VCOV. That type of value is in general not what we have in mind when we think of "degrees of freedom". To obtain the ones that are more intuitive, please use |
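For intuition on the "k" and "resid" types, here is the textbook computation on an lm fit in base R, with the species fixed-effect entered as dummy variables. This is only the intuitive, IID-style count; as noted above, the values returned for a fixest object can differ once the VCOV is clustered.

```r
base <- iris
names(base) <- c("y", "x1", "x2", "x3", "species")

# lm() analogue: the species fixed-effect enters as dummy variables
fit <- lm(y ~ x1 + x2 + species, base)

k <- length(coef(fit))      # number of estimated coefficients
df_resid <- nobs(fit) - k   # observations minus regressors

print(c(k = k, resid = df_resid, lm = df.residual(fit)))
```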
Functions
degrees_freedom_iid(): Gets the degrees of freedom of a fixest estimation
Examples
# First: an estimation
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
est = feols(y ~ x1 + x2 | species, base)

# "Normal" standard-errors (SE)
est_standard = summary(est, se = "st")

# Clustered SEs
est_clustered = summary(est, se = "clu")

# The different degrees of freedom

# => different type 1 DoF (because of the clustering)
degrees_freedom(est_standard, type = "k")
degrees_freedom(est_clustered, type = "k") # fixed-effects are excluded

# => different type 2 DoF (because of the clustering)
degrees_freedom(est_standard, type = "resid") # => equivalent to the df.residual from lm
degrees_freedom(est_clustered, type = "resid")
Centers a set of variables around a set of factors
Description
User-level access to the internal demeaning algorithm of fixest.
Usage
demean(X, f, slope.vars, slope.flag, data, weights, sample = "estimation", nthreads = getFixest_nthreads(), notes = getFixest_notes(), iter = 2000, tol = 1e-06, fixef.reorder = TRUE, fixef.algo = NULL, na.rm = TRUE, as.matrix = is.atomic(X), im_confident = FALSE, ...)
Arguments
X | A matrix, vector, data.frame or a list OR a formula OR a |
f | A matrix, vector, data.frame or list. The factors used to center the variables in argument |
slope.vars | A vector, matrix or list representing the variables with varying slopes. Matrices will be coerced using |
slope.flag | An integer vector of the same length as the number of variables in |
data | A data.frame containing all variables in the argument |
weights | Vector, can be missing or NULL. If present, it must contain the same number of observations as in |
sample | Character scalar equal to "estimation" (default) or "original". Only used when the argument By default, only the observations used in the estimation are demeaned. This will return a matrix with the same number of rows as the number of observations in the estimation. You can safely use the resulting matrix to recompute the coefficients from the estimation 'by hand'. To demean all the observations of the original sample, use |
nthreads | Number of threads to be used. By default it is equal to |
notes | Logical, whether to display a message when NA values are removed. By default it is equal to |
iter | Number of iterations, default is 2000. |
tol | Stopping criterion of the algorithm. Default is |
fixef.reorder | Logical, default is |
fixef.algo |
|
na.rm | Logical, default is |
as.matrix | Logical, if |
im_confident | Logical, default is |
... | Not currently used. |
Value
It returns a data.frame with as many columns as there are variables to be centered.
If na.rm = TRUE, then the number of rows is equal to the number of rows in input minus the number of observations with NA values (contained in X, f, slope.vars or weights). Otherwise the output has the same number of observations as the input (filled with NAs where appropriate).
A matrix can be returned if as.matrix = TRUE.
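With a single factor, centering amounts to subtracting group means, which can be done in base R with ave(). The sketch below is only meant to show what the centered output represents; it does not reproduce the iterative algorithm needed for several factors, and the object names are illustrative.

```r
# Centering on one factor amounts to subtracting group means
y_dm <- iris$Sepal.Length - ave(iris$Sepal.Length, iris$Species)
x_dm <- iris$Sepal.Width - ave(iris$Sepal.Width, iris$Species)

# FWL: the slope on the centered data equals the slope with Species dummies
b_dm <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]
b_fe <- coef(lm(Sepal.Length ~ Sepal.Width + Species, iris))[["Sepal.Width"]]

print(c(centered = b_dm, dummies = b_fe))
```

Both regressions give the same slope, which is the Frisch-Waugh-Lovell result that the examples of this page also illustrate.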
Varying slopes
You can add variables with varying slopes in the fixed-effect part of the formula. The syntax is as follows: fixef_var[var1, var2]. Here the variables var1 and var2 will have varying slopes (one slope per value in fixef_var) and the fixed-effect fixef_var will also be added.
To add only the variables with varying slopes and not the fixed-effect, use double square brackets: fixef_var[[var1, var2]].
In other words:
- fixef_var[var1, var2] is equivalent to fixef_var + fixef_var[[var1]] + fixef_var[[var2]]
- fixef_var[[var1, var2]] is equivalent to fixef_var[[var1]] + fixef_var[[var2]]
In general, for convergence reasons, it is recommended to always add the fixed-effect and avoid using only the variable with varying slope (i.e. use single square brackets).
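To make the single versus double bracket distinction concrete, here is a minimal base R sketch. It is not fixest's implementation, and the helper `center_vs` is hypothetical. It relies on the fact that centering on a factor plus a factor-specific slope is the same as taking, within each level, the residuals of a regression on an intercept and the slope variable.

```r
# Hypothetical helper (not part of fixest).
# with_fe = TRUE  mimics f[x]:   one intercept and one slope per level of f
# with_fe = FALSE mimics f[[x]]: one slope per level of f, no intercept
center_vs = function(y, f, x, with_fe = TRUE){
  f = as.factor(f)
  out = numeric(length(y))
  for(lev in levels(f)){
    idx = f == lev
    if(with_fe){
      fit = lm.fit(cbind(1, x[idx]), y[idx])
    } else {
      fit = lm.fit(cbind(x[idx]), y[idx])
    }
    out[idx] = fit$residuals
  }
  out
}

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

y_dm = center_vs(base$y, base$species, base$x2)

# Same values as the residuals of a regression with species dummies
# and species-specific slopes on x2
y_lm = resid(lm(y ~ species + species:x2, base))
all.equal(y_dm, unname(y_lm))
```

With `with_fe = FALSE` the centered variable is no longer guaranteed to have a zero mean within each level, which is one way to see why the single bracket form is the safer default.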
Examples
# Illustration of the FWL theorem
data(trade)

base = trade
base$ln_dist = log(base$dist_km)
base$ln_euros = log(base$Euros)

# We center the two variables ln_dist and ln_euros
# on the factors Origin and Destination
X_demean = demean(X = base[, c("ln_dist", "ln_euros")],
                  f = base[, c("Origin", "Destination")])
base[, c("ln_dist_dm", "ln_euros_dm")] = X_demean

est = feols(ln_euros_dm ~ ln_dist_dm, base)
est_fe = feols(ln_euros ~ ln_dist | Origin + Destination, base)

# The results are the same as if we used the two factors
# as fixed-effects
etable(est, est_fe, se = "st")

#
# Variables with varying slopes
#

# You can center on factors but also on variables with varying slopes

# Let's have an illustration
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

#
# We center y and x1 on species and x2 * species

# using a formula
base_dm = demean(y + x1 ~ species[x2], data = base)

# using vectors
base_dm_bis = demean(X = base[, c("y", "x1")], f = base$species,
                     slope.vars = base$x2, slope.flag = 1)

# Let's look at the equivalences
res_vs_1 = feols(y ~ x1 + species + x2:species, base)
res_vs_2 = feols(y ~ x1, base_dm)
res_vs_3 = feols(y ~ x1, base_dm_bis)

# only the small sample adj. differ in the SEs
etable(res_vs_1, res_vs_2, res_vs_3, keep = "x1")

#
# center on x2 * species and on another FE

base$fe = rep(1:5, 10)

# using a formula => double square brackets!
base_dm = demean(y + x1 ~ fe + species[[x2]], data = base)

# using vectors => note slope.flag!
base_dm_bis = demean(X = base[, c("y", "x1")], f = base[, c("fe", "species")],
                     slope.vars = base$x2, slope.flag = c(0, -1))

# Explanations slope.flag = c(0, -1):
# - the first 0: the first factor (fe) is associated to no variable
# - the "-1":
#    * |-1| = 1: the second factor (species) is associated to ONE variable
#    *   -1 < 0: the second factor should not be included as such

# Let's look at the equivalences
res_vs_1 = feols(y ~ x1 + i(fe) + x2:species, base)
res_vs_2 = feols(y ~ x1, base_dm)
res_vs_3 = feols(y ~ x1, base_dm_bis)

# only the small sample adj. differ in the SEs
etable(res_vs_1, res_vs_2, res_vs_3, keep = "x1")
Controls the parameters of the demeaning procedure
Description
Fine control of the demeaning procedure. Since the defaults are sensible, only use this function in case of difficult convergence (e.g. in feols or demean). That is, look at the slot $iterations of the returned object: if it is high (over 50), then it might be worth experimenting with these settings.
Usage
demeaning_algo(
  extraProj = 0,
  iter_warmup = 15,
  iter_projAfterAcc = 40,
  iter_grandAcc = 4,
  internal = FALSE
)
Arguments
extraProj | Integer scalar, default is 0. Should there be more plain projection steps in between two accelerations? By default there is not. Each integer value adds 3 simple projections. This can be useful in cases where the acceleration algorithm does not work well but simple projections do. |
iter_warmup | Integer scalar, default is 15. Only used in the presence of 3 or more fixed-effects (FE), ignored otherwise. For 3+ FEs, the algorithm is as follows:
|
iter_projAfterAcc | Integer scalar, default is 40. After |
iter_grandAcc | Integer scalar, default is 4. The regular fixed-point algorithm applies an acceleration at each iteration. This acceleration is for |
internal | Logical scalar, default is |
Details
The demeaning algorithm is a fixed-point algorithm. Basically a function f is applied until |f(X) - X| = 0, i.e. there is no difference between X and its image. For terminology, let's call the application of f a "projection".
For well behaved problems, the algorithm in its simplest form, i.e. just applying f until convergence, works fine and you only need a few iterations to reach convergence.
The problems arise for non well behaved problems. In these cases, simply applying the function f can lead to extremely slow convergence. To handle these cases, this algorithm applies a fixed-point acceleration algorithm, namely the "Irons and Tuck" acceleration.
The main algorithm combines regular projections with accelerations. Unfortunately sometimes this is not enough, so we also resort to some internal cuisine, detailed below.
Sometimes the acceleration in its simplest form does not work well, and garbles the convergence properties. In those cases:
- the argument extraProj adds several standard projections in between two accelerations, which can improve the performance of the algorithm. By default there are no extra projections. Note that while it can reduce the total number of iterations until convergence, each iteration is almost twice as expensive in terms of computing time.
- the argument iter_projAfterAcc controls whether, and when, to apply a simple projection right after the acceleration step. This projection adds roughly a 33% increase in computing time per iteration but can improve the convergence properties and speed. By default this step starts at iteration 40 (when the convergence rate is already not great).
On top of this, in case of very difficult convergence, a "grand" acceleration is added to the algorithm. The regular acceleration is over f. Say g is the function equivalent to the application of one regular iteration (which is a combination of one acceleration with several projections). By default the grand acceleration is over h = g o g o g o g, i.e. g applied four times. The grand acceleration is controlled with the argument iter_grandAcc, which corresponds to the number of iterations of the regular algorithm defining h.
Finally in case of 3+ fixed-effects (FE), the convergence in general takes more iterations. In the absence of quick convergence, applying a first demeaning over the two largest FEs before applying the demeaning over all FEs can improve convergence speed. This is controlled with the argument iter_warmup which gives the number of iterations over all the FEs to run before going to the 2 FEs demeaning. By default, the demeaning over all FEs is run for 15 iterations before switching to the 2 FEs case.
The above defaults are the outcome of extended empirical applications, and try to strike a balance across a majority of cases. Of course you can always get better results by tailoring the settings to your problem at hand.
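The simplest form of the fixed-point iteration described above can be sketched in base R as follows. This is only an illustration under simplifying assumptions: two fixed-effects, plain projections, and none of the accelerations discussed in this section. The helper `demean_2fe` is hypothetical and not part of fixest.

```r
# Plain alternating projections for two fixed-effects (no acceleration).
# 'demean_2fe' is a hypothetical helper, not fixest's algorithm.
demean_2fe = function(x, f1, f2, tol = 1e-8, iter = 2000){
  for(i in 1:iter){
    x_old = x
    x = x - ave(x, f1)   # remove the group means of the first factor
    x = x - ave(x, f2)   # remove the group means of the second factor
    # stop when applying the projection does not change x any more
    if(max(abs(x - x_old)) < tol) break
  }
  list(x = x, iterations = i)
}

set.seed(1)
n = 1000
f1 = sample(20, n, replace = TRUE)
f2 = sample(30, n, replace = TRUE)
x = rnorm(n) + f1 / 10 - f2 / 20

res = demean_2fe(x, f1, f2)
res$iterations

# The fixed point is the residual of a regression on both sets of dummies
x_lm = resid(lm(x ~ factor(f1) + factor(f2)))
max(abs(res$x - x_lm))
```

On a well behaved problem like this one, a handful of projections is enough. The accelerations matter when the two factors are strongly entangled and the plain loop would need many more passes.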
Value
This function returns a list of 4 integers, equal to the arguments passed by the user. That list is of class demeaning_algo.
References
B. M. Irons, R. Tuck, "A version of the Aitken accelerator for computer iteration", International Journal for Numerical Methods in Engineering 1 (1969) 275–277.
Extracts the deviance of a fixest estimation
Description
Returns the deviance from a fixest estimation.
Usage
## S3 method for class 'fixest'
deviance(object, ...)
Arguments
object | A |
... | Not currently used. |
Value
Returns a numeric scalar equal to the deviance.
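As a reminder of what this scalar represents, the base R sketch below recomputes the deviance by hand for a linear and a Poisson model. It assumes the usual definitions (residual sum of squares for OLS, twice the log-likelihood gap to the saturated model for Poisson); that the fixest method follows the same conventions is an assumption here, and the last lines only print both values side by side when fixest is installed.

```r
# OLS: the deviance is the residual sum of squares
fit = lm(Petal.Length ~ Petal.Width, iris)
rss = sum(resid(fit)^2)
all.equal(deviance(fit), rss)

# Poisson: twice the log-likelihood gap to the saturated model
# (carb is a strictly positive count, so log(y / mu) is always defined)
fit_pois = glm(carb ~ wt, data = mtcars, family = poisson())
y = mtcars$carb
mu = fitted(fit_pois)
dev_pois = 2 * sum(y * log(y / mu) - (y - mu))
all.equal(deviance(fit_pois), dev_pois)

# Optional comparison, only run if fixest is available
if(requireNamespace("fixest", quietly = TRUE)){
  est = fixest::feols(Petal.Length ~ Petal.Width, iris)
  print(c(fixest = deviance(est), lm = deviance(fit)))
}
```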
See Also
feols, fepois, feglm, fenegbin, feNmlm.
Examples
est = feols(Petal.Length ~ Petal.Width, iris)
deviance(est)

est_pois = fepois(Petal.Length ~ Petal.Width, iris)
deviance(est_pois)
Residual degrees-of-freedom for fixest objects
Description
Returns the residual degrees of freedom for a fitted fixest object.
Usage
## S3 method for class 'fixest'
df.residual(object, ...)
Arguments
object | |
... | Not currently used |
Value
It returns an integer scalar giving the residual degrees of freedom of the estimation.
See Also
The function degrees_freedom in fixest.
Examples
est = feols(mpg ~ hp, mtcars)
df.residual(est)
Treated and control sample descriptives
Description
This function shows the means and standard-deviations of several variables conditional on whether they are from the treated or the control group. The groups can further be split according to a pre/post variable. Results can be seamlessly exported to Latex.
Usage
did_means(
  fml,
  base,
  treat_var,
  post_var,
  tex = FALSE,
  treat_dict,
  dict = getFixest_dict(),
  file,
  replace = FALSE,
  title,
  label,
  raw = FALSE,
  indiv,
  treat_first,
  prepostnames = c("Before", "After"),
  diff.inv = FALSE
)
Arguments
fml | Either a formula of the type |
base | A data base containing all the variables in the formula |
treat_var | Only if argument |
post_var | Only if argument |
tex | Should the result be displayed in Latex? Default is |
treat_dict | A character vector of length two. What are the names of the treated and the control? This should be a dictionary: e.g. |
dict | A named character vector. A dictionary between the variables names and an alias. For instance |
file | A file path. If given, the table is written in Latex into this file. |
replace | Default is |
title | Character string giving the Latex title of the table. (Only if exported.) |
label | Character string giving the Latex label of the table. (Only if exported.) |
raw | Logical, default is |
indiv | Either the variable name of individual identifiers, a one sided formula, or a vector. If the data is that of a panel, this can be used to track the number of individuals per group. |
treat_first | Which value of the 'treatment' vector should appear on the left? By default the max value appears first (e.g. if the treatment variable is a 0/1 vector, 1 appears first). |
prepostnames | Only if there is a 'post' variable. The names of the pre and post periods to be displayed in Latex. Default is |
diff.inv | Logical, default to |
Details
By default, when the user tries to apply this function to non-numeric variables, an error is raised. The exception is when all the variables are selected with the dot (as in . ~ treat). In this case, non-numeric variables are automatically omitted (with a message).
NAs are removed automatically: if the data contains NAs an information message will be prompted. First all observations containing NAs relating to the treatment or post variables are removed. Then if there are still NAs for the variables, they are excluded separately for each variable, and a new message detailing the NA breakup is prompted.
Value
It returns a data.frame or a Latex table with the conditional means and statistical differences between the groups.
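The kind of quantities reported can be illustrated with a minimal base R sketch on simulated data. This is not the function's implementation: the data, the 0.5 shift and the Welch-type t-statistic are assumptions made for illustration, and the exact statistic reported by did_means may differ.

```r
set.seed(1)
n = 200
df = data.frame(treat = rep(0:1, each = n / 2), y = rnorm(n))
df$y = df$y + 0.5 * df$treat   # hypothetical treatment shift

y1 = df$y[df$treat == 1]   # treated
y0 = df$y[df$treat == 0]   # control

# Difference in means (treated minus control) and a Welch-type t-statistic
diff_means = mean(y1) - mean(y0)
t_stat = diff_means / sqrt(var(y1) / length(y1) + var(y0) / length(y0))

data.frame(mean_treated = mean(y1), sd_treated = sd(y1),
           mean_control = mean(y0), sd_control = sd(y0),
           difference = diff_means, t_stat = t_stat)
```

Swapping the roles of `y1` and `y0` flips the sign of the difference, which is the kind of inversion the argument diff.inv refers to.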
Examples
# Playing around with the DiD data
data(base_did)

# means of treat/control
did_means(y+x1+period~treat, base_did)

# same but inverting the difference
did_means(y+x1+period~treat, base_did, diff.inv = TRUE)

# now treat/control, before/after
did_means(y+x1+period~treat|post, base_did)

# same but with a new line giving the number of unique "indiv" for each case
did_means(y+x1+period~treat|post, base_did, indiv = "id")

# same but with the treat case "0" coming first
did_means(y+x1+period~treat|post, base_did, indiv = ~id, treat_first = 0)

# Selecting all the variables with "."
did_means(.~treat|post, base_did, indiv = "id")
Simple and powerful string manipulation with the dot square bracket operator
Description
Compactly performs many low level string operations. Advanced support for pluralization.
Usage
dsb(
  ...,
  frame = parent.frame(),
  sep = "",
  vectorize = FALSE,
  nest = TRUE,
  collapse = NULL
)
Arguments
... | Character scalars that will be collapsed with the argument |
frame | An environment used to evaluate the variables in |
sep | Character scalar, default is |
vectorize | Logical, default is |
nest | Logical, default is |
collapse | Character scalar or There are over 30 basic string operations, it supports pluralization, it's fast (e.g. faster than See detailed help on the console with |
Value
It returns a character vector whose length depends on the elements and operations in ".[]".
Examples
#
# BASIC USAGE ####
#

x = c("Romeo", "Juliet")

# .[x] inserts x
dsb("Hello .[x]!")

# elements in ... are collapsed with "" (default)
dsb("Hello .[x[1]], ",
    "how is .[x[2]] doing?")

# Splitting a comma separated string
# The mechanism is explained later
dsb("/J. Mills, David, Agnes, Dr Strong")

# Nota: this is equivalent to (explained later)
dsb("', *'S !J. Mills, David, Agnes, Dr Strong")

#
# Applying low level operations to strings
#

# Two main syntax:

# A) expression evaluation
# .[operation ? x]
#             | |
#             |  \-> the expression to be evaluated
#              \-> ? means that the expression will be evaluated

# B) verbatim
# .[operation ! x]
#             | |
#             |  \-> the expression taken as verbatim (here ' x')
#              \-> ! means that the expression is taken as verbatim

# operation: usually 'arg'op with op an operation code.

# Example: splitting
x = "hello dear"
dsb(".[' 's ? x]")
# x is split by ' '

dsb(".[' 's !hello dear]")
# 'hello dear' is split by ' '
# had we used ?, there would have been an error

# By default, the string is nested in .[], so in that case no need to use .[]:
dsb("' 's ? x")
dsb("' 's !hello dear")

# There are 35 string operators
# Operators usually have a default value
# Operations can be chained by separating them with a comma

# Example: default of 's' is ' ' + chaining with collapse
dsb("s, ' my 'c!hello dear")

#
# Nesting
#

# .[operations ! s1.[expr]s2]
#              |    |
#              |     \-> expr will be evaluated then added to the string
#               \-> nesting requires verbatim evaluation: '!'

dsb("The variables are: .[C!x.[1:4]].")

# This one is a bit ugly but it shows triple nesting
dsb("The variables are: .[w, C!.[2* ! x.[1:4]].[S, 4** ! , _sq]].")

#
# Splitting
#

# s: split with fixed pattern, default is ' '
dsb("s !a b c")
dsb("' b 's !a b c")

# S: split with regex pattern, default is ', *'
dsb("S !a, b, c")
dsb("'[[:punct:] ]'S !a! b; c")

#
# Collapsing
#

# c and C do the same, their default is different
# syntax: 's1||s2' with
# - s1 the string used for collapsing
# - s2 (optional) the string used for the last collapse

# c: default is ' '
dsb("c?1:3")

# C: default is ', || and '
dsb("C?1:3")

dsb("', || or 'c?1:4")

#
# Extraction
#

# x: extracts the first pattern
# X: extracts all patterns
# syntax: 'pattern'x
# Default is '[[:alnum:]]+'

x = "This years is... 2020"
dsb("x ? x")
dsb("X ? x")

dsb("'\\d+'x ? x")

#
# STRING FORMATTING ####
#

#
# u, U: uppercase first/all letters

# first letter
dsb("u!julia mills")

# title case: split -> upper first letter -> collapse
dsb("s, u, c!julia mills")

# upper all letters
dsb("U!julia mills")

#
# L: lowercase

dsb("L!JULIA MILLS")

#
# q, Q: single or double quote

dsb("S, q, C!Julia, David, Wilkins")
dsb("S, Q, C!Julia, David, Wilkins")

#
# f, F: formats the string to fit the same length

score = c(-10, 2050)
nm = c("Wilkins", "David")
dsb("Monopoly scores:\n.['\n'c ! - .[f ? nm]: .[F ? score] US$]")

# OK that example may have been a bit too complex,
# let's make it simple:

dsb("Scores: .[f ? score]")
dsb("Names: .[F ? nm]")

#
# w, W: reformat the white spaces
# w: suppresses trimming white spaces + normalizes successive white spaces
# W: same but also includes punctuation

dsb("w ! The white spaces are now clean. ")

dsb("W ! I, really -- truly; love punctuation!!!")

#
# %: applies sprintf formatting

dsb("pi = .['.2f'% ? pi]")

#
# a: appends text on each item
# syntax: 's1|s2'a, adds s1 at the beginning and s2 at the end of the string
# It accepts the special values :1:, :i:, :I:, :a:, :A:
# These values create enumerations (only one such value is accepted)

# appending square brackets
dsb("'[|]'a, ' + 'c!x.[1:4]")

# Enumerations
acad = dsb("/you like admin, you enjoy working on weekends, you really love emails")
dsb("Main reasons to pursue an academic career:\n .[':i:) 'a, C ? acad].")

#
# A: same as 'a' but adds at the beginning/end of the full string (not on the elements)
# special values: :n:, :N:, give the number of elements

characters = dsb("/David, Wilkins, Dora, Agnes")
dsb("There are .[':N: characters: 'A, C ? characters].")

#
# stop: removes basic English stopwords
# the list is from the Snowball project: http://snowball.tartarus.org/algorithms/english/stop.txt

dsb("stop, w!It is a tale told by an idiot, full of sound and fury, signifying nothing.")

#
# k: keeps the first n characters
# syntax: nk: keeps the first n characters
#         'n|s'k: same + adds 's' at the end of shortened strings
#         'n||s'k: same but 's' counts in the n characters kept

words = dsb("/short, constitutional")
dsb("5k ? words")

dsb("'5|..'k ? words")

dsb("'5||..'k ? words")

#
# K: keeps the first n elements
# syntax: nK: keeps the first n elements
#         'n|s'K: same + adds the element 's' at the end
#         'n||s'K: same but 's' counts in the n elements kept
#
# Special values :rest: and :REST:, give the number of items dropped

bx = dsb("/Pessac Leognan, Saint Emilion, Marguaux, Saint Julien, Pauillac")
dsb("Bordeaux wines I like: .[3K, ', 'C ? bx].")

dsb("Bordeaux wines I like: .['3|etc..'K, ', 'C ? bx].")

dsb("Bordeaux wines I like: .['3||etc..'K, ', 'C ? bx].")

dsb("Bordeaux wines I like: .['3|and at least :REST: others'K, ', 'C ? bx].")

#
# Ko, KO: special operator which keeps the first n elements and adds "others"
# syntax: nKo
# KO gives the rest in letters

dsb("Bordeaux wines I like: .[4KO, C ? bx].")

#
# r, R: string replacement
# syntax: 's'R: deletes the content in 's' (replaces with the empty string)
#         's1 => s2'R replaces s1 into s2
# r: fixed / R: perl = TRUE

dsb("'e'r !The letter e is deleted")

# adding a perl look-behind
dsb("'(?<! )e'R !The letter e is deleted")

dsb("'e => a'r !The letter e becomes a")

dsb("'([[:alpha:]]{3})[[:alpha:]]+ => \\1.'R !Trimming the words")

#
# *, *c, **, **c: replication, replication + collapse
# syntax: n* or n*c
# ** is the same as * but uses "each" in the replication

dsb("N.[10*c!o]!")

dsb("3*c ? 1:3")
dsb("3**c ? 1:3")

#
# d: replaces the items by the empty string
# -> useful in conditions

dsb("d!I am going to be annihilated")

#
# ELEMENT MANIPULATION ####
#

#
# D: deletes all elements
# -> useful in conditions

x = dsb("/I'll, be, deleted")
dsb("D ? x")

#
# i, I: inserts an item
# syntax: 's1|s2'i: inserts s1 first and s2 last
# I: is the same as i but is 'invisibly' included

characters = dsb("/David, Wilkins, Dora, Agnes, Trotwood")
dsb("'Heep|Spenlow'i, C ? characters")

dsb("'Heep|Spenlow'I, C ? characters")

#
# PLURALIZATION ####
#

# There is support for pluralization

#
# *s, *s_: adds 's' or 's ' depending on the number of elements

nb = 1:5
dsb("Number.[*s, D ? nb]: .[C ? nb]")
dsb("Number.[*s, D ? 2 ]: .[C ? 2 ]")

# or
dsb("Number.[*s, ': 'A, C ? nb]")

#
# v, V: adds a verb at the beginning/end of the string
# syntax: 'verb'v

# Unpopular opinion?
brand = c("Apple", "Samsung")
dsb(".[V, C ? brand] overrated.")
dsb(".[V, C ? brand[1]] overrated.")

win = dsb("/Peggoty, Agnes, Emily")
dsb("The winner.[*s_, v, C ? win].")
dsb("The winner.[*s_, v, C ? win[1]].")

# Other verbs
dsb(".[' have'V, C ? win] won a prize.")
dsb(".[' have'V, C ? win[1]] won a prize.")

dsb(".[' was'V, C ? win] unable to come.")
dsb(".[' was'V, C ? win[1]] unable to come.")

#
# *A: appends text depending on the length of the vector
# syntax: 's1|s2 / s3|s4'
#         if length == 1: applies 's1|s2'A
#         if length >  1: applies 's3|s4'A

win = dsb("/Barkis, Micawber, Murdstone")
dsb("The winner.[' is /s are '*A, C ? win].")
dsb("The winner.[' is /s are '*A, C ? win[1]].")

#
# CONDITIONS ####
#

# Conditions can be applied with 'if' statements.
# The syntax is 'type comp value'if(true : false), with
# - type: either 'len', 'char', 'fixed' or 'regex'
#   + len: number of elements in the vector
#   + char: number of characters
#   + fixed: fixed pattern
#   + regex: regular expression pattern
# - comp: a comparator:
#   + valid for len/char: >, <, >=, <=, !=, ==
#   + valid for fixed/regex: !=, ==
# - value: a value for which the comparison is applied.
# - true: operations to be applied if true (can be void)
# - false: operations to be applied if false (can be void)

dsb("'char <= 2'if('(|)'a : '[|]'a), ' + 'c ? c(1, 12, 123)")

sentence = "This is a sentence with some longish words."
dsb("s, 'char<=4'if(D), c ? sentence")

dsb("s, 'fixed == e'if(:D), c ! Only words with an e are selected.")

#
# ARGUMENTS FROM THE FRAME ####
#

# Arguments can be evaluated from the calling frame.
# Simply use backticks instead of quotes.

dollar = 6
reason = "glory"
dsb("Why do you develop packages? For .[`dollar`*c!$]?",
    "For money? No... for .[U,''s, c?reason]!", sep = "\n")
Support for emmeans package
Description
If emmeans is installed, its functionality is supported for fixest or fixest_multi objects. Its reference grid is based on the main part of the model, and does not include fixed effects or instrumental variables. Note that any desired arguments to vcov() may be passed as optional arguments in emmeans::emmeans() or emmeans::ref_grid().
Note
When fixed effects are present, estimated marginal means (EMMs) are estimated correctly, provided equal weighting is used. However, the SEs of these EMMs will be incorrect, often dramatically, because the estimated variance of the intercept is not available. That said, contrasts among EMMs can be estimated and tested with no issues, because these do not involve the intercept.
Author(s)
Russell V. Lenth
Examples
if(requireNamespace("emmeans") && requireNamespace("AER")) {
    data(Fatalities, package = "AER")
    Fatalities$frate = with(Fatalities, fatal/pop * 10000)
    fat.mod = feols(frate ~ breath * jail * beertax | state + year, data = Fatalities)
    emm = emmeans::emmeans(fat.mod, ~ breath*jail, cluster = ~ state + year)
    emm   ### SEs and CIs are incorrect

    emmeans::contrast(emm, "consec", by = "breath")   ### results are reliable
}
Estimates a fixest estimation from a fixest environment
Description
This is a function for advanced users, which allows estimating any fixest estimation from a fixest environment obtained with only.env = TRUE in a fixest estimation.
Usage
est_env(env, y, X, weights, endo, inst)
Arguments
env | An environment obtained from a |
y | A vector representing the dependent variable. Should be of the same length as the number of observations in the initial estimation. |
X | A matrix representing the independent variables. Should be of the same dimension as in the initial estimation. |
weights | A vector of weights (i.e. with only positive values). Should be of the same length as the number of observations in the initial estimation. If identical to the scalar 1, this will mean that no weights will be used in the estimation. |
endo | A matrix representing the endogenous regressors in IV estimations. It should be of the same dimension as the original endogenous regressors. |
inst | A matrix representing the instruments in IV estimations. It should be of the same dimension as the original instruments. |
Details
This function has been created for advanced users, mostly to avoid overheads when making simulations with fixest.
How can it help you make simulations? First make a core estimation with only.env = TRUE, and usually with only.coef = TRUE (to avoid having extra things that take time to compute). Then loop while modifying the appropriate things directly in the environment. Beware that if you make a mistake here (typically giving stuff of the wrong length), then you can make the R session crash because there is no more error-handling! Finally estimate with est_env(env = core_env) and store the results.
Instead of est_env, you could use directly fixest estimations too, like feols, since they accept the env argument. The function est_env is only here to add a bit of generality and to spare the user the trouble of writing conditions (look at the source, it's just a one liner).
Objects of main interest in the environment are:
- lhs
The left hand side, or dependent variable.
- linear.mat
The matrix of the right-hand-side, or explanatory variables.
- iv_lhs
The matrix of the endogenous variables in IV regressions.
- iv.mat
The matrix of the instruments in IV regressions.
- weights.value
The vector of weights.
I strongly discourage changing the dimension of any of these elements, or else a crash can occur. However, you can change their values at will (given the dimensions stay the same). The only exception is the weights, which tolerates changing its dimension: it can be identical to the scalar 1 (meaning no weights), or to something whose length is the number of observations.
I also discourage changing anything in the fixed-effects (even their value) since this will almost surely lead to a crash.
Note that this function is mostly useful when the overheads/estimation ratio is high. This means that OLS will benefit the most from this function. For GLM/Max.Lik. estimations, the ratio is small since the overheads are only a tiny portion of the total estimation time. Hence this function will be less useful for these models.
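The workflow described above (core estimation, loop, re-estimation) can be sketched as follows for a simulation that perturbs the dependent variable. This is a minimal illustration assuming fixest is installed; the noise model and the number of simulations are arbitrary choices, and the new dependent variable is deliberately kept at the original length.

```r
if(requireNamespace("fixest", quietly = TRUE)){
  library(fixest)

  # Core estimation: returns the environment, later runs return coefficients only
  core_env = feols(mpg ~ wt + hp, mtcars, only.env = TRUE, only.coef = TRUE)

  set.seed(1)
  n_sim = 50
  n_obs = nrow(mtcars)   # mtcars has no NA, so this is the estimation sample size

  res_all = vector("list", n_sim)
  for(i in 1:n_sim){
    # New dependent variable of the SAME length as the original one
    y_new = mtcars$mpg + rnorm(n_obs)
    res_all[[i]] = est_env(env = core_env, y = y_new)
  }

  coef_sim = do.call(rbind, res_all)
  colMeans(coef_sim)
}
```

Passing `y` through est_env is used here in place of assigning the element lhs in the environment by hand.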
Value
It returns the results of a fixest estimation: the one that was summoned when obtaining the environment.
Author(s)
Laurent Berge
Examples
# Let's make a short simulation
# Inspired from Grant McDermott bboot function
# See https://twitter.com/grant_mcdermott/status/1487528757418102787

# Simple function that computes a Bayesian bootstrap
bboot = function(x, n_sim = 100){
  # We bootstrap on the weights
  # Works with fixed-effects/IVs
  #  and with any fixest function that accepts weights

  core_env = update(x, only.coef = TRUE, only.env = TRUE)
  n_obs = x$nobs

  res_all = vector("list", n_sim)
  for(i in 1:n_sim){
    ## begin: NOT RUN
    ## We could directly assign in the environment:
    # assign("weights.value", rexp(n_obs, rate = 1), core_env)
    # res_all[[i]] = est_env(env = core_env)
    ## end: NOT RUN

    ## Instead we can use the argument weights, which does the same
    res_all[[i]] = est_env(env = core_env, weights = rexp(n_obs, rate = 1))
  }

  do.call(rbind, res_all)
}

est = feols(mpg ~ wt + hp, mtcars)

boot_res = bboot(est)
coef = colMeans(boot_res)
std_err = apply(boot_res, 2, sd)

# Comparing the results with the main estimation
coeftable(est)
cbind(coef, std_err)
Extracts the scores from a fixest estimation
Description
Extracts the scores from a fixest estimation.
Usage
## S3 method for class 'fixest'
estfun(x, ...)
Arguments
x | A |
... | Not currently used. |
Value
Returns a matrix with as many rows as there are observations used for the estimation, and as many columns as there were variables.
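For intuition about the shape and content of this matrix, here is a base R sketch for a linear model, where the score of observation i is the regressor row x_i times the residual e_i. That the fixest method returns exactly these values for feols is an assumption here; the last lines only print the first rows for a visual comparison when fixest is installed.

```r
fit = lm(Petal.Length ~ Petal.Width + Sepal.Width, iris)
X = model.matrix(fit)

# Row i of 'scores' is x_i * e_i (the residual vector is recycled down the columns)
scores = X * resid(fit)

dim(scores)       # one row per observation, one column per coefficient
colSums(scores)   # numerically zero: the OLS first-order conditions

# Optional visual comparison, only run if fixest is available
if(requireNamespace("fixest", quietly = TRUE)){
  est = fixest::feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
  print(head(sandwich::estfun(est)))
  print(head(scores))
}
```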
Examples
data(iris)
est = feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
head(estfun(est))
Estimations table (export the results of multiple estimations to a DF or to Latex)
Description
Aggregates the results of multiple estimations and displays them in the form of either a Latex table or a data.frame. Note that you will need the booktabs package for the Latex table to render properly. See setFixest_etable to set the default values, and style.tex to customize Latex output.
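A minimal usage sketch, assuming fixest is installed: two estimations on the iris data are displayed side by side, first as a data.frame and then as Latex. The models and the choice of heteroskedasticity-robust standard errors are arbitrary illustrations.

```r
if(requireNamespace("fixest", quietly = TRUE)){
  library(fixest)

  est1 = feols(Petal.Length ~ Petal.Width, iris)
  est2 = feols(Petal.Length ~ Petal.Width + Sepal.Width | Species, iris)

  # data.frame output, keeping only the variables matching "Width"
  tab = etable(est1, est2, vcov = "hetero", keep = "Width")
  print(tab)

  # Latex output of the same table
  etable(est1, est2, vcov = "hetero", tex = TRUE)
}
```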
Usage
esttable(
  ..., vcov = NULL, stage = 2, agg = NULL, se = NULL, ssc = NULL, cluster = NULL,
  .vcov_args = NULL, digits = 4, digits.stats = 5, fitstat = NULL, coefstat = "se",
  ci = 0.95, se.row = NULL, se.below = NULL, keep = NULL, drop = NULL, order = NULL,
  dict = TRUE, file = NULL, replace = TRUE, create_dirs = FALSE, convergence = NULL,
  signif.code = NULL, headers = list("auto"), fixef_sizes = FALSE,
  fixef_sizes.simplify = TRUE, keepFactors = TRUE, family = NULL, powerBelow = -5,
  interaction.combine = NULL, interaction.order = NULL, i.equal = NULL, depvar = TRUE,
  style.df = NULL, group = NULL, extralines = NULL, fixef.group = NULL,
  drop.section = NULL, poly_dict = c("", " square", " cube"), postprocess.df = NULL,
  fit_format = "__var__", coef.just = NULL, highlight = NULL, coef.style = NULL,
  export = NULL, page.width = "fit", div.class = "etable"
)

esttex(
  ..., vcov = NULL, stage = 2, agg = NULL, se = NULL, ssc = NULL, cluster = NULL,
  .vcov_args = NULL, digits = 4, digits.stats = 5, fitstat = NULL, caption = NULL,
  coefstat = "se", ci = 0.95, se.row = NULL, se.below = NULL, keep = NULL, drop = NULL,
  order = NULL, dict = TRUE, file = NULL, replace = TRUE, create_dirs = FALSE,
  convergence = NULL, signif.code = NULL, label = NULL, float = NULL,
  headers = list("auto"), fixef_sizes = FALSE, fixef_sizes.simplify = TRUE,
  keepFactors = TRUE, family = NULL, powerBelow = -5, interaction.combine = NULL,
  interaction.order = NULL, i.equal = NULL, depvar = TRUE, style.tex = NULL,
  notes = NULL, group = NULL, extralines = NULL, fixef.group = NULL,
  placement = "htbp", drop.section = NULL, poly_dict = c("", " square", " cube"),
  postprocess.tex = NULL, tpt = FALSE, arraystretch = NULL, adjustbox = NULL,
  fontsize = NULL, fit_format = "__var__", tabular = "normal", highlight = NULL,
  coef.style = NULL, meta = NULL, meta.time = NULL, meta.author = NULL,
  meta.sys = NULL, meta.call = NULL, meta.comment = NULL, view = FALSE,
  export = NULL, markdown = NULL, page.width = "fit", div.class = "etable"
)

etable(
  ..., vcov = NULL, stage = 2, agg = NULL, se = NULL, ssc = NULL, cluster = NULL,
  .vcov_args = NULL, digits = 4, digits.stats = 5, tex, fitstat = NULL,
  caption = NULL, coefstat = "se", ci = 0.95, se.row = NULL, se.below = NULL,
  keep = NULL, drop = NULL, order = NULL, dict = TRUE, file = NULL, replace = TRUE,
  create_dirs = FALSE, convergence = NULL, signif.code = NULL, label = NULL,
  float = NULL, headers = list("auto"), fixef_sizes = FALSE,
  fixef_sizes.simplify = TRUE, keepFactors = TRUE, family = NULL, powerBelow = -5,
  interaction.combine = NULL, interaction.order = NULL, i.equal = NULL, depvar = TRUE,
  style.tex = NULL, style.df = NULL, notes = NULL, group = NULL, extralines = NULL,
  fixef.group = NULL, placement = "htbp", drop.section = NULL,
  poly_dict = c("", " square", " cube"), postprocess.tex = NULL,
  postprocess.df = NULL, tpt = FALSE, arraystretch = NULL, adjustbox = NULL,
  fontsize = NULL, fit_format = "__var__", coef.just = NULL, tabular = "normal",
  highlight = NULL, coef.style = NULL, meta = NULL, meta.time = NULL,
  meta.author = NULL, meta.sys = NULL, meta.call = NULL, meta.comment = NULL,
  view = FALSE, export = NULL, markdown = NULL, page.width = "fit",
  div.class = "etable"
)

setFixest_etable(
  digits = 4, digits.stats = 5, fitstat,
  coefstat = c("se", "tstat", "confint", "pvalue"), ci = 0.95, se.below = TRUE,
  keep, drop, order, dict, float, signif.code = NULL, fixef_sizes = FALSE,
  fixef_sizes.simplify = TRUE, family, powerBelow = -5, interaction.order = NULL,
  depvar, style.tex = NULL, style.df = NULL, notes = NULL, group = NULL,
  extralines = NULL, fixef.group = NULL, placement = "htbp", drop.section = NULL,
  view = FALSE, markdown = NULL, view.cache = TRUE, page.width = "fit",
  div.class = "etable", postprocess.tex = NULL, postprocess.df = NULL,
  fit_format = "__var__", meta.time = NULL, meta.author = NULL, meta.sys = NULL,
  meta.call = NULL, meta.comment = NULL, reset = FALSE, save = FALSE
)

getFixest_etable()

## S3 method for class 'etable_tex'
print(x, ...)

## S3 method for class 'etable_df'
print(x, ...)

log_etable(type = "pdflatex")
Arguments
... | Used to capture different |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette. You can pass several VCOVs (as above) if you nest them into a list. If the number of VCOVs equals the number of models, each VCOV is mapped to the appropriate model. If there is one model and several VCOVs, or if the first element of the list is equal to |
stage | Can be equal to |
agg | A character scalar describing the variable names to be aggregated, it is pattern-based. For |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
ssc | An object of class |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
.vcov_args | A list containing arguments to be passed to the function |
digits | Integer or character scalar. Default is 4 and represents the number of significant digits to be displayed for the coefficients and standard-errors. To apply rounding instead of significance use, e.g., |
digits.stats | Integer or character scalar. Default is 5 and represents the number of significant digits to be displayed for the fit statistics. To apply rounding instead of significance use, e.g., |
fitstat | A character vector or a one sided formula (both with only lowercase letters). A vector listing which fit statistics to display. The valid types are 'n', 'll', 'aic', 'bic' and r2 types like 'r2', 'pr2', 'war2', etc (see all valid types in |
coefstat | One of |
ci | Level of the confidence interval, defaults to |
se.row | Logical scalar, default is |
se.below | Logical or |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
order | Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see |
dict | A named character vector or a logical scalar. It changes the original variable names to the ones contained in the |
file | A character scalar. If provided, the Latex (or data frame) table will be saved in a file whose path is |
replace | Logical, default is |
create_dirs | Logical, default is |
convergence | Logical, default is missing. Should the convergence state of the algorithm be displayed? By default, convergence information is displayed if at least one model did not converge. |
signif.code | Named numeric vector, used to provide the significance codes with respect to the p-value of the coefficients. Default is |
headers | Character vector or list. Adds one or more header lines in the table. A header line can be represented by a character vector or a named list of numbers where the names are the cell values and the numbers are the span. Example: |
fixef_sizes | (Tex only.) Logical, default is |
fixef_sizes.simplify | Logical, default is |
keepFactors | Logical, default is |
family | Logical, default is missing. Whether to display the families of the models. By default this line is displayed when at least two models are from different families. |
powerBelow | (Tex only.) Integer, default is -5. A coefficient whose value is below |
interaction.combine | Character scalar, defaults to |
interaction.order | Character vector of regular expressions. Only affects variables that are interacted like x1 and x2 in |
i.equal | Character scalar, defaults to |
depvar | Logical, default is |
style.df | An object created by the function |
group | A list. The list elements should be vectors of regular expressions. For each element of this list: A new line in the table is created, all variables that are matched by the regular expressions are discarded (same effect as the argument |
extralines | A vector, a list or a one sided formula. The list elements should be either a vector representing the value of each cell, a list of the form |
fixef.group | Logical scalar or list (default is |
drop.section | Character vector which can be of length 0 (i.e. equal to |
poly_dict | Character vector, default is |
postprocess.df | A function that will postprocess the resulting data.frame. Only when |
fit_format | Character scalar, default is |
coef.just | (DF only.) Either |
highlight | List containing coefficients to highlight. Highlighting is of the form To be able to use the highlighting feature, you need the following lines in your latex preamble: |
coef.style | Named list containing styles to be applied to the coefficients. It must be of the form |
export | Character scalar giving the path to a PNG file to be created, default is |
page.width | Character scalar equal to |
div.class | Character scalar, default is |
caption | (Tex only.) Character scalar. The caption of the Latex table. |
label | (Tex only.) Character scalar. The label of the Latex table. |
float | (Tex only.) Logical. By default, if the argument |
style.tex | An object created by the function |
notes | (Tex only.) Character vector. If provided, a |
placement | (Tex only.) Character string giving the position of the float in Latex. Default is "htbp". It must consist of only the characters 'h', 't', 'b', 'p', 'H' and '!'. Reminder: h: here; t: top; b: bottom; p: float page; H: definitely here; !: prevents Latex from looking for other positions. Note that it can be equal to the empty string (and you'll get the default placement). |
postprocess.tex | A function that will postprocess the character vector defining the latex table. Only when |
tpt | (Tex only.) Logical scalar, default is FALSE. Whether to use the |
arraystretch | (Tex only.) A numeric scalar, default is |
adjustbox | (Tex only.) A logical, numeric or character scalar, default is |
fontsize | (Tex only.) A character scalar, default is |
tabular | (Tex only.) Character scalar equal to "normal" (default), |
meta | (Tex only.) A one-sided formula that shall contain the following elements: date or time, sys, author, comment and call. Default is |
meta.time | (Tex only.) Either a logical scalar (default is |
meta.author | (Tex only.) A logical scalar (default is |
meta.sys | (Tex only.) A logical scalar, default is |
meta.call | (Tex only.) Logical scalar, default is |
meta.comment | (Tex only.) A character vector containing free-form comments to be inserted right before the table. |
view | Logical, default is |
markdown | Character scalar giving the location of a directory, or a logical scalar. Default is |
tex | Logical: whether the results should be a data.frame or a Latex table. By default, this argument is |
view.cache | Logical, default is |
reset | ( |
save | Either a logical or equal to |
x | An object returned by |
type | Character scalar equal to 'pdflatex' (default), 'magick', 'dir' or 'tex'. Which log file to report; if 'tex', the full source code of the tex file is returned, if 'dir': the directory of the log files is returned. |
Details
The function esttex is equivalent to the function etable with argument tex = TRUE.
The function esttable is equivalent to the function etable with argument tex = FALSE.
To display the table, you will need the Latex package booktabs which contains the \toprule, \midrule and \bottomrule commands.
You can permanently change the way your table looks in Latex by using setFixest_etable. The following vignette gives an example as well as illustrates how to use the style and postprocessing functions: Exporting estimation tables.
When the argument postprocess.tex is not missing, two additional tags will be included in the character vector returned by etable: "%start:tab\\n" and "%end:tab\\n". These can be used to identify the start and end of the tabular and are useful to insert code within the table environment.
Value
If tex = TRUE, the lines composing the Latex table are returned invisibly while the table is directly prompted on the console.
If tex = FALSE, the data.frame is directly returned. If the argument file is not missing, the data.frame is printed and returned invisibly.
Functions
esttable(): Exports the results of multiple fixest estimations in a data.frame.
esttex(): Exports the results of multiple fixest estimations in a Latex table.
Latex dependencies
Some features require specific Latex dependencies, these are:
always needed: \usepackage{booktabs}, \usepackage{array}, \usepackage{multirow}, \usepackage{amsmath}, \usepackage{amssymb}
if there are line breaks within cells: \usepackage{makecell}
if the tabularx environment is used: \usepackage{tabularx}
if threeparttable notes are used: \usepackage[flushleft]{threeparttable}
if you use adjustbox: \usepackage{adjustbox}
if you use any kind of colors in the table: \usepackage[dvipsnames,table]{xcolor}
if you highlight cells with a box: \usepackage{tikz} and \usetikzlibrary{matrix, shapes, arrows, fit, tikzmark}
if you highlight rows using the background color: \usepackage{colortbl}
Here is a summary:
% required
\usepackage{booktabs}
\usepackage{array}
\usepackage{multirow}
\usepackage{amsmath}
\usepackage{amssymb}

% optional, dependent on context
\usepackage{makecell}
\usepackage{tabularx}
\usepackage[flushleft]{threeparttable}
\usepackage{adjustbox}
\usepackage[dvipsnames,table]{xcolor}
\usepackage{tikz}
\usetikzlibrary{matrix, shapes, arrows, fit, tikzmark}
\usepackage{colortbl}

How does digits handle the number of decimals displayed?
The default display of decimals is the outcome of an algorithm. Let's take the example of digits = 3 which "kind of" requires 3 significant digits to be displayed.
For numbers greater than 1 (in absolute terms), their integral part is always displayed and the number of decimals shown is equal to digits minus the number of digits in the integral part. This means that 12.345 will be displayed as 12.3. If the number of decimals should be 0, then a single decimal is displayed to suggest that the number is not whole. This means that 1234.56 will be displayed as 1234.5. Note that if the number is whole, no decimals are shown.
For numbers lower than 1 (in absolute terms), the number of decimals displayed is equal to digits except if there are only 0s in which case the first significant digit is shown. This means that 0.01234 will be displayed as 0.012 (first rule), and that 0.000123 will be displayed as 0.0001 (second rule).
Arguments keep, drop and order
The arguments keep, drop and order use regular expressions. If you are not familiar with regular expressions, I urge you to learn them, since they are an extremely powerful way to manipulate character strings (and they exist across most programming languages).
For example drop = "Wind" would drop any variable whose name contains "Wind". Note that variables such as "Temp:Wind" or "StrongWind" do contain "Wind", so would be dropped. To drop only the variable named "Wind", you need to use drop = "^Wind$" (with "^" meaning beginning, resp. "$" meaning end, of the string => this is the language of regular expressions).
Although you can combine several regular expressions in a single character string using pipes, drop also accepts a vector of regular expressions.
You can use the special character "!" (exclamation mark) to reverse the effect of the regular expression (this feature is specific to this function). For example drop = "!Wind" would drop any variable that does not contain "Wind".
You can use the special character "%" (percentage) to make reference to the original variable name instead of the aliased name. For example, you have a variable named "Month6", and use a dictionary dict = c(Month6="June"). Thus the variable will be displayed as "June". If you want to delete that variable, you can use either drop="June", or drop="%Month6" (which makes reference to its original name).
The argument order takes in a vector of regular expressions, the order will follow the elements of this vector. The vector gives a list of priorities, on the left the elements with highest priority. For example, order = c("Wind", "!Inter", "!Temp") would give highest priorities to the variables containing "Wind" (which would then appear first), second highest priority is the variables not containing "Inter", last, with lowest priority, the variables not containing "Temp". If you had the following variables: (Intercept), Temp:Wind, Wind, Temp you would end up with the following order: Wind, Temp:Wind, Temp, (Intercept).
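The rules above can be tried out in base R without running any estimation. The sketch below is only an illustration: the helpers matches and sort_by_priority are hypothetical names (they are not fixest functions), and the lexicographic sort is one plausible reading of the priority rule, chosen because it reproduces the order example given above.

```r
# Hypothetical helper (not part of fixest): TRUE where a pattern selects a name.
# A leading "!" reverses the match, as described for keep/drop/order.
matches = function(pattern, x) {
  negate = startsWith(pattern, "!")
  hit = grepl(sub("^!", "", pattern), x)
  if (negate) !hit else hit
}

vars = c("(Intercept)", "Temp:Wind", "Wind", "Temp", "StrongWind")

# drop = "Wind": removes every name containing "Wind"
drop_any = vars[!matches("Wind", vars)]
# drop = "^Wind$": removes only the variable named exactly "Wind"
drop_exact = vars[!matches("^Wind$", vars)]
# drop = "!Wind": removes the names that do NOT contain "Wind"
drop_not = vars[!matches("!Wind", vars)]

# order: assumed here to be a sort on successive priorities, the first
# pattern being the most important key (matching names come first).
sort_by_priority = function(x, patterns) {
  keys = lapply(patterns, function(p) !matches(p, x))
  x[do.call(order, keys)]
}

ordered = sort_by_priority(c("(Intercept)", "Temp:Wind", "Wind", "Temp"),
                           c("Wind", "!Inter", "!Temp"))
print(ordered)
```

The special character "%" is not covered here since it requires the dictionary of aliases.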
The argument extralines
The argument extralines adds well... extra lines to the table. It accepts either a list, or a one-sided formula.
For each line, you can define the values taken by each cell using 4 different ways: a) a vector, b) a list, c) a function, and d) a formula.
If a vector, it should represent the values taken by each cell. Note that if the length of the vector is smaller than the number of models, its values are recycled across models, but the length of the vector is required to be a divisor of the number of models.
If a list, it should be of the form list("item1" = #item1, "item2" = #item2, etc). For example list("A"=2, "B"=3) leads to c("A", "A", "B", "B", "B"). Note that if the number of items is 1, you don't need to add = 1. For example list("A"=2, "B") is valid and leads to c("A", "A", "B"). As for the vector the values are recycled if necessary.
If a function, it will be applied to each model and should return a scalar (NA values returned are accepted).
If a formula, it must be one-sided and the elements in the formula must represent either extralines macros, or fit statistics (i.e. valid types of the function fitstat). One new line will be added for each element of the formula. To use extralines macros, you must first register them with extralines_register.
Finally, you can combine as many lines as wished by nesting them in a list. The names of the nesting list are the row titles (values in the leftmost cell). For example extralines = list(~r2, Controls = TRUE, Group = list("A"=2, "B")) will add three lines, the titles of which are "R2", "Controls" and "Group".
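To make the list form concrete, here is a small base R sketch of how a list such as list("A"=2, "B") can be expanded into one value per cell. The function expand_spans is a hypothetical name used for illustration only; it is not the code used by etable and it ignores recycling.

```r
# Hypothetical helper (not part of fixest): expands list("A" = 2, "B" = 3)
# into c("A", "A", "B", "B", "B"). An unnamed element counts once.
expand_spans = function(x) {
  nm = names(x)
  if (is.null(nm)) nm = rep("", length(x))
  out = character(0)
  for (i in seq_along(x)) {
    if (nm[i] == "") {
      # unnamed element: the value itself is the cell content, repeated once
      out = c(out, as.character(x[[i]]))
    } else {
      # named element: the name is the cell content, the value is the count
      out = c(out, rep(nm[i], x[[i]]))
    }
  }
  out
}

cells_full = expand_spans(list("A" = 2, "B" = 3))
cells_short = expand_spans(list("A" = 2, "B"))
print(cells_full)
print(cells_short)
```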
Controlling the placement of extra lines
The arguments group, extralines and fixef.group allow adding customized lines in the table. They can be defined via a list where the list name will be the row name. By default, the placement of the extra line is right after the coefficients (except for fixef.group, covered in the last paragraph). For instance, group = list("Controls" = "x[[:digit:]]") will create a line right after the coefficients telling which models contain the control variables.
But the placement can be customized. The previous example (of the controls) will be used for illustration (the mechanism for extralines and fixef.group is identical).
The row names accept 2 special characters at the very start. The first character tells in which section the line should appear: it can be equal to "^", "-", or "_", meaning respectively the coefficients, the fixed-effects and the statistics section (which typically appear at the top, mid and bottom of the table). The second one governs the placement of the new line within the section: it can be equal to "^", meaning first line, or "_", meaning last line.
Let's have some examples. Using the previous example, writing "_^Controls" would place the new line at the top of the statistics section. Writing "-_Controls" places it as the last row of the fixed-effects section; "^^Controls" at the top row of the coefficients section; etc...
The second character is optional, the default placement being in the bottom. This means that "_Controls" would place it at the bottom of the statistics section.
The placement in fixef.group is defined similarly, only the default placement is different. Its default placement is at the top of the fixed-effects section.
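The two-character prefix can be decoded mechanically. The following base R sketch shows one way to read it; parse_placement is a hypothetical helper written for this illustration (not a fixest function), and the default for a row name without prefix (coefficients section, last line) is an assumption taken from the text above for group and extralines, not for fixef.group.

```r
# Hypothetical helper (not part of fixest): decodes the placement prefix
# of a row name into a section, a position and the remaining title.
parse_placement = function(row_name) {
  sections = c("^" = "coefficients", "-" = "fixed-effects", "_" = "statistics")
  positions = c("^" = "first", "_" = "last")

  # assumed defaults for group/extralines: right after the coefficients
  section = "coefficients"
  position = "last"

  first = substr(row_name, 1, 1)
  if (first %in% names(sections)) {
    section = sections[[first]]
    row_name = substr(row_name, 2, nchar(row_name))

    # the second special character is optional
    second = substr(row_name, 1, 1)
    if (second %in% names(positions)) {
      position = positions[[second]]
      row_name = substr(row_name, 2, nchar(row_name))
    }
  }

  list(section = section, position = position, title = row_name)
}

p1 = parse_placement("_^Controls")
p2 = parse_placement("-_Controls")
p3 = parse_placement("_Controls")
str(p1)
```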
Escaping special Latex characters
By default on all instances (with the notable exception of the elements of style.tex) special Latex characters are escaped. This means that caption="Exports in million $." will be exported as "Exports in million \\$.": the dollar sign will be escaped. This is true for the following characters: &, $, %, _, ^ and #.
Note, importantly, that equations are NOT escaped. This means that caption="Functional form $a_i \\times x^b$, variation in %." will be displayed as: "Functional form $a_i \\times x^b$, variation in \\%.": only the last percentage will be escaped.
If for some reason you don't want the escaping to take place, the arguments headers and extralines are the only ones allowing that. To disable escaping, add the special token ":tex:" in the row names. Example: in headers=list(":tex:Row title"="weird & & %\\n tex stuff\\\\"), the elements will be displayed verbatim. Of course, since it can easily ruin your table, it is only recommended to super users.
Markdown markup
Within anything that is Latex-escaped (see previous section), you can use a markdown-style markup to put the text in italic and/or bold. Use *text*, **text** or ***text*** to put some text in, respectively, italic (with \textit), bold (with \textbf) and italic-bold.
The markup can be escaped by using a backslash first. For example "***This: \\***, are three stars***" will leave the three stars in the middle untouched.
Author(s)
Laurent Berge
See Also
For styling the table: setFixest_etable, style.tex, style.df.
See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients.
Examples
est1 = feols(Ozone ~ i(Month) / Wind + Temp, data = airquality)
est2 = feols(Ozone ~ i(Month, Wind) + Temp | Month, data = airquality)

# Displaying the two results in a single table
etable(est1, est2)

# keep/drop: keeping only interactions
etable(est1, est2, keep = " x ")
# or using drop (see regexp help):
etable(est1, est2, drop = "^(Month|Temp|\\()")

# keep/drop: dropping interactions
etable(est1, est2, drop = " x ")
# or using keep ("!" reverses the effect):
etable(est1, est2, keep = "! x ")

# order: Wind variable first, intercept last (note the "!" to reverse the effect)
etable(est1, est2, order = c("Wind", "!Inter"))
# Month, then interactions, then the rest
etable(est1, est2, order = c("^Month", " x "))

#
# dict
#

# You can rename variables with dict = c(var1 = alias1, var2 = alias2, etc)
# You can also rename values taken by factors.
# Here's a full example:
dict = c(Temp = "Temperature", "Month::5"="May", "6"="Jun")
etable(est1, est2, dict = dict)
# Note the difference of treatment between Jun and May

# Assume the following dictionary:
dict = c("Month::5"="May", "Month::6"="Jun", "Month::7"="Jul",
         "Month::8"="Aug", "Month::9"="Sep")

# We would like to keep only the Months, but now the names are all changed...
# How to do?
# We can use the special character '%' to make reference to the original names.
etable(est1, est2, dict = dict, keep = "%Month")

#
# signif.code
#

etable(est1, est2, signif.code = c(" A"=0.01, " B"=0.05, " C"=0.1, " D"=0.15, " F"=1))

#
# Using the argument style to customize Latex exports
#

# If you don't like the default layout of the table, no worries!
# You can modify many parameters with the argument style

# To drop the headers before each section, use:
# Note that a space adds an extra line
style_noHeaders = style.tex(var.title = "", fixef.title = "", stats.title = " ")
etable(est1, est2, dict = dict, tex = TRUE, style.tex = style_noHeaders)

# To change the lines of the table + dropping the table footer
style_lines = style.tex(line.top = "\\toprule", line.bottom = "\\bottomrule",
                        tablefoot = FALSE)
etable(est1, est2, dict = dict, tex = TRUE, style.tex = style_lines)

# Or you have the predefined type "aer"
etable(est1, est2, dict = dict, tex = TRUE, style.tex = style.tex("aer"))

#
# Group and extralines
#

# Sometimes it's useful to group control variables into a single line
# You can achieve that with the group argument
setFixest_fml(..ctrl = ~ poly(Wind, 2) + poly(Temp, 2))
est_c0 = feols(Ozone ~ Solar.R, data = airquality)
est_c1 = feols(Ozone ~ Solar.R + ..ctrl, data = airquality)
est_c2 = feols(Ozone ~ Solar.R + Solar.R^2 + ..ctrl, data = airquality)

etable(est_c0, est_c1, est_c2, group = list(Controls = "poly"))

# 'group' here does the same as drop = "poly", but adds an extra line
# with TRUE/FALSE where the variables were found

# 'extralines' adds an extra line, where you can add the value for each model
est_all = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality)
est_sub1 = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality,
                 subset = ~ Month %in% 5:6)
est_sub2 = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality,
                 subset = ~ Month %in% 7:8)
est_sub3 = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality,
                 subset = ~ Month == 9)

etable(est_all, est_sub1, est_sub2, est_sub3,
       extralines = list("Sub-sample" = c("All", "May-June", "Jul.-Aug.", "Sept.")))

# You can monitor the placement of the new lines with two special characters
# at the beginning of the row name.
# 1) "^", "-" or "_" which mean the coefficients, the fixed-effects or the
#    statistics section.
# 2) "^" or "_" which mean first or last line of the section
#
# Ex: starting with "_^" will place the line at the top of the stat. section
#     starting with "-_" will place the line at the bottom of the FEs section
#     etc.
#
# You can use a single character which will represent the section,
# the line would then appear at the bottom of the section.

# Examples
etable(est_c0, est_c1, est_c2, group = list("_Controls" = "poly"))
etable(est_all, est_sub1, est_sub2, est_sub3,
       extralines = list("^^Sub-sample" = c("All", "May-June", "Jul.-Aug.", "Sept.")))

#
# headers
#

# You can add header lines with 'headers'
# These lines will appear at the top of the table

# first, 3 estimations
est_header = feols(c(Ozone, Solar.R, Wind) ~ poly(Temp, 2), airquality)

# header => vector: adds a line w/t title
etable(est_header, headers = c("A", "A", "B"))

# header => list: identical way to do the previous header
# The form is: list(item1 = #item1, item2 = #item2, etc)
etable(est_header, headers = list("A" = 2, "B" = 1))

# Adding a title +
# when an element is to be repeated only once, you can avoid the "= 1":
etable(est_header, headers = list(Group = list("A" = 2, "B")))

# To change the placement, add as first character:
# - "^" => top
# - "-" => mid (default)
# - "_" => bottom
# Note that "mid" and "top" are only distinguished when tex = TRUE

# Placing the new header line at the bottom
etable(est_header, headers = list("_Group" = c("A", "A", "B"),
                                  "^Currency" = list("US $" = 2, "CA $" = 1)))

# In Latex, you can add "grouped underlines" (cmidrule from the booktabs package)
# by adding ":_:" in the title:
etable(est_header, tex = TRUE,
       headers = list("^:_:Group" = c("A", "A", "B")))

#
# extralines and headers: .() for list()
#

# In the two arguments extralines and headers, .() can be used for list()
# For example:
etable(est_header, headers = .("^Currency" = .("US $" = 2, "CA $" = 1)))

#
# fixef.group
#

# You can group the fixed-effects line with fixef.group
est_0fe = feols(Ozone ~ Solar.R + Temp + Wind, airquality)
est_1fe = feols(Ozone ~ Solar.R + Temp + Wind | Month, airquality)
est_2fe = feols(Ozone ~ Solar.R + Temp + Wind | Month + Day, airquality)

# A) automatic way => simply use fixef.group = TRUE
etable(est_0fe, est_2fe, fixef.group = TRUE)

# Note that when grouping would lead to inconsistencies across models,
# it is avoided
etable(est_0fe, est_1fe, est_2fe, fixef.group = TRUE)

# B) customized way => use a list
etable(est_0fe, est_2fe, fixef.group = list("Dates" = "Month|Day"))

# Note that when a user grouping would lead to inconsistencies,
# the term partial replaces yes/no and the fixed-effects are not removed.
etable(est_0fe, est_1fe, est_2fe, fixef.group = list("Dates" = "Month|Day"))

# Using customized placement => as with 'group' and 'extralines',
# the user can control the placement of the new line.
# See the previous 'group' examples and the dedicated section in the help.

# On top of the coefficients:
etable(est_0fe, est_2fe, fixef.group = list("^^Dates" = "Month|Day"))

# Last line of the statistics
etable(est_0fe, est_2fe, fixef.group = list("_Dates" = "Month|Day"))

#
# Using custom functions to compute the standard errors
#

# You can use external functions to compute the VCOVs
# by feeding functions in the 'vcov' argument.
# Let's use some covariances from the sandwich package
etable(est_c0, est_c1, est_c2, vcov = sandwich::vcovHC)

# To add extra arguments to vcovHC, you need to write your wrapper:
etable(est_c0, est_c1, est_c2, vcov = function(x) sandwich::vcovHC(x, type = "HC0"))

#
# Customize which fit statistic to display
#

# You can change the fit statistics with the argument fitstat
# and you can rename them with the dictionary
etable(est1, est2, fitstat = ~ r2 + n + G)

# If you use a formula, '.' means the default:
etable(est1, est2, fitstat = ~ ll + .)

#
# Computing a different SE for each model
#

est = feols(Ozone ~ Solar.R + Wind + Temp, data = airquality)

#
# Method 1: use summary
s1 = summary(est, "iid")
s2 = summary(est, cluster = ~ Month)
s3 = summary(est, cluster = ~ Day)
s4 = summary(est, cluster = ~ Day + Month)

etable(list(s1, s2, s3, s4))

#
# Method 2: using a list in the argument 'vcov'
est_bis = feols(Ozone ~ Solar.R + Wind + Temp | Month, data = airquality)
etable(est, est_bis, vcov = list("hetero", ~ Month))

# When you have only one model, this model is replicated
# along the elements of the vcov list.
etable(est, vcov = list("hetero", ~ Month))

#
# Method 3: Using "each" or "times" in vcov
# If the first element of the list in 'vcov' is "each" or "times",
# then all models will be replicated and all the VCOVs will be
# applied to each model. The order in which they are replicated
# is governed by the each/times keywords.

# each
etable(est, est_bis, vcov = list("each", "iid", ~ Month, ~ Day))

# times
etable(est, est_bis, vcov = list("times", "iid", ~ Month, ~ Day))

#
# Notes and markup
#

# Notes can also be set in a dictionary
# You can use markdown markup to put text into italic/bold
dict = c("note 1" = "*Notes:* This data is not really random.",
         "source 1" = "**Source:** the internet?")

est = feols(Ozone ~ csw(Solar.R, Wind, Temp), data = airquality)
etable(est, dict = dict, tex = TRUE, notes = c("note 1", "source 1"))

Register extralines macros to be used in etable
Description
This function is used to create extralines (which is an argument of etable) macros that can be easily summoned in etable.
Usage
extralines_register(type, fun, alias)

Arguments
type | A character scalar giving the type-name. |
fun | A function to be applied to a |
alias | A character scalar. This is the alias to be used in lieu of the type name to form the row name. |
Details
You can register as many macros as you wish, the only constraint is that the type name should not conflict with a fitstat type name.
Examples
# We register a function computing the standard-deviation of the dependent variable
my_fun = function(x) sd(model.matrix(x, type = "lhs"))
extralines_register("sdy", my_fun, "SD(y)")

# An estimation
data(iris)
est = feols(Petal.Length ~ Sepal.Length | Species, iris)

# Now we can easily create a row with the SD of y.
# We just "summon" it in a one-sided formula
etable(est, extralines = ~ sdy)

# We can change the alias on the fly:
etable(est, extralines = list("_Standard deviation of the dep. var." = ~ sdy))

Lags a variable in a fixest estimation
Description
Produce lags or leads in the formulas of fixest estimations or when creating variables in a data.table::data.table. The data must be set as a panel beforehand (either with the function panel or with the argument panel.id in the estimation).
Usage
f(x, k = 1, fill = NA)

d(x, k = 1, fill = NA)

l(x, k = 1, fill = NA)

Arguments
x | The variable. |
k | A vector of integers giving the number of lags (for |
fill | A scalar, default is |
Value
These functions can only be used i) in a formula of a fixest estimation, or ii) when creating variables within a fixest_panel object (obtained with function panel) which is also a data.table::data.table.
Functions
f(): Forwards a variable (inverse of lagging) in a fixest estimation
d(): Creates differences (i.e. x - lag(x)) in a fixest estimation
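To fix ideas on what lag, lead and difference mean within a panel, here is a base R sketch that does not use fixest. The function panel_lag is a hypothetical helper written for this illustration; it assumes a time step of 1, a scalar k and unique (id, time) pairs, and it is not how l(), f() and d() are implemented.

```r
# Hypothetical helper (not part of fixest): value of x for the same id,
# k periods earlier. A negative k gives a lead (forward).
# Assumptions: time step of 1, scalar k, unique (id, time) pairs.
panel_lag = function(x, id, time, k = 1, fill = NA) {
  key = paste(id, time)
  src = match(paste(id, time - k), key)
  out = x[src]
  # no observation k periods earlier for that id: use the fill value
  out[is.na(src)] = fill
  out
}

id = c(1, 1, 1, 2, 2)
time = c(1, 2, 3, 1, 2)
x = c(10, 20, 30, 5, 7)

x_lag = panel_lag(x, id, time)           # analogue of l(x)
x_lead = panel_lag(x, id, time, k = -1)  # analogue of f(x)
x_diff = x - panel_lag(x, id, time)      # analogue of d(x): x - lag(x)

print(cbind(id, time, x, x_lag, x_lead, x_diff))
```

Note that the lag never crosses individuals: the first period of each id has no lagged value.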
See Also
The function panel changes data.frames into a panel from which the functions l and f can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).
Examples
data(base_did)
# Setting a data set as a panel...
pdat = panel(base_did, ~ id + period)

# ...then using the functions l and f
est1 = feols(y ~ l(x1, 0:1), pdat)
est2 = feols(f(y) ~ l(x1, -1:1), pdat)
est3 = feols(l(y) ~ l(x1, 0:3), pdat)
etable(est1, est2, est3, order = c("f", "^x"), drop = "Int")

# or using the argument panel.id
feols(f(y) ~ l(x1, -1:1), base_did, panel.id = ~id + period)
feols(d(y) ~ d(x1), base_did, panel.id = ~id + period)

# l() and f() can also be used within a data.table:
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Now since pdat_dt is also a data.table
  # you can create lags/leads directly
  pdat_dt[, x1_l1 := l(x1)]
  pdat_dt[, x1_d1 := d(x1)]
  pdat_dt[, c("x1_l1_fill0", "y_f2") := .(l(x1, fill = 0), f(y, 2))]
}

Formatted dimension
Description
Prints the dimension of a data set, in a user-readable way.
Usage
fdim(x)

Arguments
x | An R object, usually a data.frame (but can also be a vector). |
Value
It does not return anything, the output is directly printed on the console.
Author(s)
Laurent Berge
Examples
fdim(iris)
fdim(iris$Species)

Fixed effects nonlinear maximum likelihood models
Description
This function estimates maximum likelihood models (e.g., Poisson or Logit) with non-linear in parameters right-hand-sides and is efficient to handle any number of fixed effects. If you do not use non-linear in parameters right-hand-side, use femlm or feglm instead (their design is simpler).
Usage
feNmlm(
  fml, data, family = c("poisson", "negbin", "logit", "gaussian"),
  NL.fml, vcov, fixef, fixef.rm = "perfect_fit", NL.start, lower, upper,
  NL.start.init, offset, subset, split, fsplit, split.keep, split.drop,
  cluster, se, ssc, panel.id, panel.time.step = NULL,
  panel.duplicate.method = "none", start = 0, jacobian.method = "simple",
  useHessian = TRUE, hessian.args = NULL, opt.control = list(),
  nthreads = getFixest_nthreads(), lean = FALSE, verbose = 0, theta.init,
  fixef.tol = 1e-05, fixef.iter = 10000, deriv.tol = 1e-04,
  deriv.iter = 1000, warn = TRUE, notes = getFixest_notes(),
  fixef.keep_names = NULL, mem.clean = FALSE, only.env = FALSE,
  only.coef = FALSE, data.save = FALSE, env, ...
)

Arguments
fml | A formula. This formula gives the linear formula to be estimated (it is similar to a |
data | A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with this |
family | Character scalar. It should provide the family. The possible values are "poisson" (Poisson model with log-link, the default), "negbin" (Negative Binomial model with log-link), "logit" (LOGIT model with logit-link), "gaussian" (Gaussian model). |
NL.fml | A formula. If provided, this formula represents the non-linear part of the right hand side (RHS). Note that contrary to the |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
fixef | Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula. |
fixef.rm | Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none". This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it). The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). The value If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed. If "none": no observation is removed. Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors). The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining. |
NL.start | (For NL models only) A list of starting values for the non-linear parameters. ALL the parameters are to be named and given a starting value. Example: |
lower | (For NL models only) A list. The lower bound for each of the non-linear parameters that requires one. Example: |
upper | (For NL models only) A list. The upper bound for each of the non-linear parameters that requires one. Example: |
NL.start.init | (For NL models only) Numeric scalar. If the argument |
offset | A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) |
subset | A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument. |
split | A one sided formula representing a variable (eg |
fsplit | A one sided formula representing a variable (eg |
split.keep | A character vector. Only used when |
split.drop | A character vector. Only used when |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
ssc | An object of class |
panel.id | The panel identifiers. Can either be: i) a one sided formula (e.g. |
panel.time.step | The method to compute the lags, default is |
panel.duplicate.method | If several observations have the same id and time values,then the notion of lag is not defined for them. If |
start | Starting values for the coefficients in the linear part (for the non-linear part, use NL.start). Can be: i) a numeric of length 1 (e.g. |
jacobian.method | (For NL models only) Character scalar. Provides the method used to numerically compute the Jacobian of the non-linear part. Can be either |
useHessian | Logical. Should the Hessian be computed in the optimization stage?Default is |
hessian.args | List of arguments to be passed to function |
opt.control | List of elements to be passed to the optimization method |
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function |
lean | Logical scalar, default is |
verbose | Integer, default is 0. It represents the level of information that should be reported during the optimisation process. If |
theta.init | Positive numeric scalar. The starting value of the dispersion parameter if |
fixef.tol | Precision used to obtain the fixed-effects. Defaults to |
fixef.iter | Maximum number of iterations in fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000. |
deriv.tol | Precision used to obtain the fixed-effects derivatives. Defaults to |
deriv.iter | Maximum number of iterations in the algorithm to obtain the derivative of the fixed-effects (only in use for 2+ fixed-effects). Default is 1000. |
warn | Logical, default is |
notes | Logical. By default, two notes are displayed: when NAs are removed (to show additional information) and when some observations are removed because of only 0 (or 0/1) outcomes in a fixed-effect setup (in Poisson/Neg. Bin./Logit models). To avoid displaying these messages, you can set |
fixef.keep_names | Logical or |
mem.clean | Logical scalar, default is |
only.env | (Advanced users.) Logical scalar, default is |
only.coef | Logical scalar, default is |
data.save | Logical scalar, default is |
env | (Advanced users.) A |
... | Not currently used. |
Details
This function estimates maximum likelihood models where the conditional expectations are as follows:
Gaussian likelihood:
E(Y|X)=X\beta
Poisson and Negative Binomial likelihoods:
E(Y|X)=\exp(X\beta)
where in the Negative Binomial there is the parameter \theta used to model the variance as \mu+\mu^2/\theta, with \mu the conditional expectation.
Logit likelihood:
E(Y|X)=\frac{\exp(X\beta)}{1+\exp(X\beta)}
When there are one or more fixed-effects, the conditional expectation can be written as:
E(Y|X) = h(X\beta+\sum_{k}\sum_{m}\gamma_{m}^{k}\times C_{im}^{k}),
where h(.) is the function corresponding to the likelihood function as shown before. C^k is the matrix associated to fixed-effect dimension k such that C^k_{im} is equal to 1 if observation i is of category m in the fixed-effect dimension k and 0 otherwise.
When there are non linear in parameters functions, we can schematically split the set of regressors in two:
f(X,\beta)=X^1\beta^1 + g(X^2,\beta^2)
with first a linear term and then a non linear part expressed by the function g. That is, we add a non-linear term to the linear terms (which are X*beta and the fixed-effects coefficients). It is always better (more efficient) to put into the argument NL.fml only the non-linear in parameter terms, and add all linear terms in the fml argument.
To estimate only a non-linear formula without even the intercept, you must exclude the intercept from the linear formula by using, e.g., fml = z~0.
The over-dispersion parameter of the Negative Binomial family, theta, is capped at 10,000. If theta reaches this value, the variance is almost equal to the mean, which means that there is no overdispersion.
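The variance relation \mu+\mu^2/\theta can be checked numerically with base R alone. The sketch below is only an illustration: the values of mu and theta are arbitrary, and it relies on the fact that rnbinom() with size = theta and mu = mu uses the same parametrization.

```r
# Base R sketch of the Negative Binomial variance: Var(Y) = mu + mu^2 / theta
set.seed(1)
mu = 4      # conditional expectation (illustrative value)
theta = 2   # dispersion parameter (illustrative value)

# rnbinom() parametrized by `mu` and `size` has variance mu + mu^2 / size
y = rnbinom(1e5, size = theta, mu = mu)

var_theory = mu + mu^2 / theta   # theoretical variance
var_sample = var(y)              # empirical counterpart

# With theta at the cap of 10,000 the extra term is negligible:
# the variance is close to mu, as in the Poisson model
var_large_theta = mu + mu^2 / 10000
```

The smaller theta is, the larger the gap between the variance and the mean.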
Value
A fixest object. Note that fixest objects contain many elements and most of them are for internal use, they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc) or functions (like for instance fitstat to access any fit statistic).
coefficients | The named vector of coefficients. |
coeftable | The table of the coefficients with their standard errors, z-values and p-values. |
loglik | The loglikelihood. |
iterations | Number of iterations of the algorithm. |
nobs | The number of observations. |
nparams | The number of parameters of the model. |
call | The call. |
fml | The linear formula of the call. |
fml_all | A list containing different parts of the formula. Always contains the linear formula. Then, if relevant: |
ll_null | Log-likelihood of the null model (i.e. with the intercept only). |
pseudo_r2 | The adjusted pseudo R2. |
message | The convergence message from the optimization procedures. |
sq.cor | Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation. |
hessian | The Hessian of the parameters. |
fitted.values | The fitted values are the expected value of the dependent variable for the fitted model: that is |
cov.iid | The variance-covariance matrix of the parameters. |
se | The standard-error of the parameters. |
scores | The matrix of the scores (first derivative for each observation). |
family | The ML family that was used for the estimation. |
data | The original data set used when calling the function. Only available when the estimation was called with |
residuals | The difference between the dependent variable and the expected predictor. |
sumFE | The sum of the fixed-effects for each observation. |
offset | The offset formula. |
NL.fml | The nonlinear formula of the call. |
bounds | Whether the coefficients were upper or lower bounded. This can only be the case when a non-linear formula is included and the arguments 'lower' or 'upper' are provided. |
isBounded | The logical vector that gives for each coefficient whether it was bounded or not. This can only be the case when a non-linear formula is included and the arguments 'lower' or 'upper' are provided. |
fixef_vars | The names of each fixed-effect dimension. |
fixef_id | The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation. |
fixef_sizes | The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension). |
obs_selection | (When relevant.) List containing vectors of integers. It represents the sequential selection of observations vis-a-vis the original data set. |
fixef_removed | In the case there were fixed-effects and some observations were removed because of only 0/1 outcome within a fixed-effect, it gives the list (for each fixed-effect dimension) of the fixed-effect identifiers that were removed. |
theta | In the case of a negative binomial estimation: the overdispersion parameter. |
See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.
And other estimation methods: feols, femlm, feglm, fepois, fenegbin.
Lagging variables
To use leads/lags of variables in the estimation, you can either: i) provide the argument panel.id, or ii) set your data set as a panel with the function panel. Either approach gives access to the lagging functions l, f and d.
You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
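As a hedged illustration of the two routes described above, the sketch below uses the base_did data set shipped with fixest (variables y, x1, id and period). It uses feols for brevity; the same formula syntax is assumed to carry over to the other estimation functions.

```r
library(fixest)
data(base_did, package = "fixest")

# Route i): declare the panel directly in the call with `panel.id`
est_lag = feols(y ~ l(x1, -1:1), base_did, panel.id = ~id + period)

# -1:1 requests the lead, the contemporaneous value and the lag of x1
names(coef(est_lag))

# Route ii): set the data up as a panel first, then estimate
pdat = panel(base_did, ~id + period)
est_lag2 = feols(y ~ l(x1, -1:1), pdat)
```

Observations for which a lead or a lag is not available are dropped from the estimation sample, so both fits use fewer rows than base_did contains.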
Interactions
You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).
Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.
It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula, it will be incomparably faster (using the syntax factor_var[continuous_var], as explained in the section “Varying slopes”).
The function i has in fact more arguments, please see details in its associated help page.
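A minimal sketch of the i() syntax, using the built-in iris data (the choice of variables is purely illustrative): Petal.Length is interacted with each value of Species, with "setosa" taken as the reference.

```r
library(fixest)

# Baseline slope on Petal.Length plus one extra slope per non-reference species
est_i = feols(Sepal.Length ~ Petal.Length + i(Species, Petal.Length, ref = "setosa"),
              iris)

# The interaction terms carry the species name in their coefficient names
names(coef(est_i))
```

The coefficients on the interacted terms are read as slope differences relative to the reference species.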
On standard-errors
Standard-errors can be computed in different ways, you can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.
The following vignette: On standard-errors describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.
You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.
Multiple estimations
Multiple estimations can be performed at once, they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is ‘kind of’ a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.
To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.
To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. sw stands for stepwise and csw for cumulative stepwise; mvsw is a bit special and stands for multiverse stepwise. Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated. The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3. Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially (2^n models for n variables): so be cautious!
Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ sw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).
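The counting rules above can be illustrated with a short sketch on the built-in iris data. The variable choices are arbitrary, and as.list() is used here only as a convenient way to count the models in the fixest_multi object.

```r
library(fixest)

# 2 dependent variables x 2 stepwise elements = 4 estimations
est_multi = feols(c(Sepal.Length, Sepal.Width) ~ Petal.Length + sw(Petal.Width, Species),
                  iris)
n_multi = length(as.list(est_multi))

# csw0: the model without stepwise element, then cumulative additions = 3 estimations
est_csw0 = feols(Sepal.Length ~ Petal.Length + csw0(Petal.Width, Sepal.Width), iris)
n_csw0 = length(as.list(est_csw0))

# All models of a multiple estimation can be displayed side by side
etable(est_csw0)
```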
You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).
If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.
A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models using the formula is roughly similar to using a loop performance-wise.
Argument sliding
When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.
Piping
Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).
Tricks to estimate multiple LHS
To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).
First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).
Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).
Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left hand sides.
Dot square bracket operator in formulas
In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.
Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.
To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.
You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.
The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.
By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas: the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).
In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.
One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.
You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.
When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
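The expansions described in this section can be tried directly with xpd, which returns a regular formula. The sketch below replays three of the cases from the text; the object names vars and empty are only illustrative.

```r
library(fixest)

# i) create several variables at once: x1, x2 and x3
f1 = xpd(y ~ x.[1:3])

# ii) insert a character vector taken from the environment (expanded as a sum)
vars = c("a", "b")
f2 = xpd(y ~ .[vars])

# iii) an empty string is replaced with the neutral element 1
empty = ""
f3 = xpd(y ~ .[empty])

print(f1)
print(f2)
print(f3)
```

Printing the expanded formula before estimating is a cheap way to check that the operator did what was intended.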
Author(s)
Laurent Berge
References
Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13.
For models with multiple fixed-effects:
Gaure, Simen, 2013, "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis 66 pp. 8–18
On the unconditional Negative Binomial model:
Allison, Paul D and Waterman, Richard P, 2002, "Fixed-Effects Negative Binomial Regression Models", Sociological Methodology 32(1) pp. 247–265
Examples
# This section covers only non-linear in parameters examples
# For linear relationships: use femlm or feglm instead

# Generating data for a simple example
set.seed(1)
n = 100
x = rnorm(n, 1, 5)**2
y = rnorm(n, -1, 5)**2
z1 = rpois(n, x*y) + rpois(n, 2)
base = data.frame(x, y, z1)

# Estimating a 'linear' relation:
est1_L = femlm(z1 ~ log(x) + log(y), base)
# Estimating the same 'linear' relation using a 'non-linear' call
est1_NL = feNmlm(z1 ~ 1, base, NL.fml = ~a*log(x)+b*log(y), NL.start = list(a=0, b=0))
# we compare the estimates with the function etable (they are identical)
etable(est1_L, est1_NL)

# Now generating a non-linear relation (E(z2) = x + y + 1):
z2 = rpois(n, x + y) + rpois(n, 1)
base$z2 = z2

# Estimation using this non-linear form
est2_NL = feNmlm(z2 ~ 0, base, NL.fml = ~log(a*x + b*y),
                 NL.start = 2, lower = list(a=0, b=0))
# we can't estimate this relation linearly
# => closest we can do:
est2_L = femlm(z2 ~ log(x) + log(y), base)

# Difference between the two models:
etable(est2_L, est2_NL)

# Plotting the fits:
plot(x, z2, pch = 18)
points(x, fitted(est2_L), col = 2, pch = 1)
points(x, fitted(est2_NL), col = 4, pch = 2)
Fixed-effects GLM estimations
Description
Estimates GLM models with any number of fixed-effects.
Usage
feglm(
  fml, data, family = "gaussian", vcov, offset, weights, subset,
  split, fsplit, split.keep, split.drop, cluster, se, ssc,
  panel.id, panel.time.step = NULL, panel.duplicate.method = "none",
  start = NULL, etastart = NULL, mustart = NULL,
  fixef, fixef.rm = "perfect_fit", fixef.tol = 1e-06, fixef.iter = 10000,
  fixef.algo = NULL, collin.tol = 1e-09, glm.iter = 25, glm.tol = 1e-08,
  nthreads = getFixest_nthreads(), lean = FALSE, warn = TRUE,
  notes = getFixest_notes(), verbose = 0, only.coef = FALSE,
  data.save = FALSE, fixef.keep_names = NULL, mem.clean = FALSE,
  only.env = FALSE, env, ...
)
feglm.fit(
  y, X, fixef_df, family = "gaussian", vcov, offset,
  split, fsplit, split.keep, split.drop, cluster, se, ssc,
  weights, subset, start = NULL, etastart = NULL, mustart = NULL,
  fixef.rm = "perfect_fit", fixef.tol = 1e-06, fixef.iter = 10000,
  fixef.algo = NULL, collin.tol = 1e-09, glm.iter = 25, glm.tol = 1e-08,
  nthreads = getFixest_nthreads(), lean = FALSE, warn = TRUE,
  notes = getFixest_notes(), mem.clean = FALSE, verbose = 0,
  only.env = FALSE, only.coef = FALSE, env, ...
)
fepois(
  fml, data, vcov, offset, weights, subset,
  split, fsplit, split.keep, split.drop, cluster, se, ssc,
  panel.id, panel.time.step = NULL, panel.duplicate.method = "none",
  start = NULL, etastart = NULL, mustart = NULL,
  fixef, fixef.rm = "perfect_fit", fixef.tol = 1e-06, fixef.iter = 10000,
  fixef.algo = NULL, collin.tol = 1e-09, glm.iter = 25, glm.tol = 1e-08,
  nthreads = getFixest_nthreads(), lean = FALSE, warn = TRUE,
  notes = getFixest_notes(), verbose = 0, fixef.keep_names = NULL,
  mem.clean = FALSE, only.env = FALSE, only.coef = FALSE,
  data.save = FALSE, env, ...
)
Arguments
fml | A formula representing the relation to be estimated. For example: |
data | A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with this |
family | Family to be used for the estimation. Defaults to |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
offset | A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) |
weights | A formula or a numeric vector. Each observation can be weighted, the weights must be greater than 0. If equal to a formula, it should be one-sided: for example |
subset | A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument. |
split | A one sided formula representing a variable (eg |
fsplit | A one sided formula representing a variable (eg |
split.keep | A character vector. Only used when |
split.drop | A character vector. Only used when |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
ssc | An object of class |
panel.id | The panel identifiers. Can either be: i) a one sided formula (e.g. |
panel.time.step | The method to compute the lags, default is |
panel.duplicate.method | If several observations have the same id and time values,then the notion of lag is not defined for them. If |
start | Starting values for the coefficients. Can be: i) a numeric of length 1 (e.g. |
etastart | Numeric vector of the same length as the data. Starting values for the linear predictor. Default is missing. |
mustart | Numeric vector of the same length as the data. Starting values for the vector of means. Default is missing. |
fixef | Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula. |
fixef.rm | Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none". This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it). The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). The value "perfect_fit" is equivalent to "singletons" and "infinite_coef" combined: all observations that are perfectly explained by the FEs are removed. If "none", no observation is removed. Note that whatever the value of this option, the coefficient estimates remain the same: it only affects inference (the standard-errors). The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining. |
fixef.tol | Precision used to obtain the fixed-effects. Defaults to |
fixef.iter | Maximum number of iterations in fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000. |
fixef.algo |
|
collin.tol | Numeric scalar, default is |
glm.iter | Number of iterations of the glm algorithm. Default is 25. |
glm.tol | Tolerance level for the glm algorithm. Default is |
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function |
lean | Logical scalar, default is |
warn | Logical, default is |
notes | Logical. By default, three notes are displayed: when NAs are removed, when some fixed-effects are removed because of only 0 (or 0/1) outcomes, or when a variable is dropped because of collinearity. To avoid displaying these messages, you can set |
verbose | Integer. Higher values give more information. In particular, it can detail the number of iterations in the demeaning algorithm (the first number is the left-hand-side, the other numbers are the right-hand-side variables). It can also detail the step-halving algorithm. |
only.coef | Logical scalar, default is |
data.save | Logical scalar, default is |
fixef.keep_names | Logical or |
mem.clean | Logical scalar, default is |
only.env | (Advanced users.) Logical scalar, default is |
env | (Advanced users.) A |
... | Not currently used. |
y | Numeric vector/matrix/data.frame of the dependent variable(s). Multiple dependent variables will return a |
X | Numeric matrix of the regressors. |
fixef_df | Matrix/data.frame of the fixed-effects. |
Details
The core of the GLM are the weighted OLS estimations. These estimations are performed with feols. The method used to demean each variable along the fixed-effects is based on Berge (2018), since this is the same problem to solve as for the Gaussian case in a ML setup.
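To make the "weighted OLS" idea concrete, here is a textbook iteratively reweighted least squares (IRLS) loop for a Poisson model without fixed-effects, written in base R and checked against stats::glm. It is a sketch of the general technique only, not the fixest implementation: the simulated data, the starting values and the stopping rule are assumptions, and fixest additionally demeans the variables along the fixed-effects at each weighted OLS step.

```r
# Textbook IRLS for a Poisson GLM with log link (no fixed-effects)
set.seed(1)
n = 500
x = rnorm(n)
y = rpois(n, exp(0.5 + 0.3 * x))
X = cbind(1, x)

beta = c(log(mean(y)), 0)   # simple starting values
for (iter in 1:25) {
  eta = drop(X %*% beta)     # linear predictor
  mu = exp(eta)              # conditional mean
  z = eta + (y - mu) / mu    # working response
  w = mu                     # working weights
  # weighted OLS step: solve (X'WX) beta = X'Wz
  beta_new = drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged = max(abs(beta_new - beta)) < 1e-8
  beta = beta_new
  if (converged) break
}

# Reference fit with the standard GLM routine
ref = unname(coef(glm(y ~ x, family = poisson())))
```

The cap of 25 iterations and the 1e-8 tolerance mirror the default values of glm.iter and glm.tol in the Usage section, although the convergence criterion used here (change in coefficients) is chosen for simplicity.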
Value
A fixest object. Note that fixest objects contain many elements and most of them are for internal use, they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc) or functions (like for instance fitstat to access any fit statistic).
nobs | The number of observations. |
fml | The linear formula of the call. |
call | The call of the function. |
method | The method used to estimate the model. |
family | The family used to estimate the model. |
data | The original data set used when calling the function. Only available when the estimation was called with |
fml_all | A list containing different parts of the formula. Always contains the linear formula. Then, if relevant: |
nparams | The number of parameters of the model. |
fixef_vars | The names of each fixed-effect dimension. |
fixef_id | The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation. |
fixef_sizes | The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension). |
y | (When relevant.) The dependent variable (used to compute the within-R2 when fixed-effects are present). |
convStatus | Logical, convergence status of the IRWLS algorithm. |
irls_weights | The weights of the last iteration of the IRWLS algorithm. |
obs_selection | (When relevant.) List containing vectors of integers. It represents the sequential selection of observations vis-a-vis the original data set. |
fixef_removed | (When relevant.) In the case there were fixed-effects and some observations were removed because of only 0/1 outcome within a fixed-effect, it gives the list (for each fixed-effect dimension) of the fixed-effect identifiers that were removed. |
coefficients | The named vector of estimated coefficients. |
coeftable | The table of the coefficients with their standard errors, z-values and p-values. |
loglik | The loglikelihood. |
deviance | Deviance of the fitted model. |
iterations | Number of iterations of the algorithm. |
ll_null | Log-likelihood of the null model (i.e. with the intercept only). |
ssr_null | Sum of the squared residuals of the null model (containing only the intercept). |
pseudo_r2 | The adjusted pseudo R2. |
fitted.values | The fitted values are the expected value of the dependent variable for the fitted model: that is |
linear.predictors | The linear predictors. |
residuals | The residuals (y minus the fitted values). |
sq.cor | Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation. |
hessian | The Hessian of the parameters. |
cov.iid | The variance-covariance matrix of the parameters. |
se | The standard-error of the parameters. |
scores | The matrix of the scores (first derivative for each observation). |
residuals | The difference between the dependent variable and the expected predictor. |
sumFE | The sum of the fixed-effects coefficients for each observation. |
offset | (When relevant.) The offset formula. |
weights | (When relevant.) The weights formula. |
collin.var | (When relevant.) Vector containing the variables removed because of collinearity. |
collin.coef | (When relevant.) Vector of coefficients, where the values of the variables removed because of collinearity are NA. |
Combining the fixed-effects
You can combine two variables to make it a new fixed-effect using ^. The syntax is as follows: fe_1^fe_2. Here you created a new variable which is the combination of the two variables fe_1 and fe_2. This is identical to doing paste0(fe_1, "_", fe_2) but more convenient.
Note that pasting is a costly operation, especially for large data sets. Hence, by default this paste is done only when the number of unique values is lower than 50,000.
In case you are using a large data set and want to keep the identity of the fixed-effects, you need to use the argument fixef.keep_names = TRUE.
Note that these “identities” are useful only if you're interested in the value of the fixed-effects (that you can extract with fixef.fixest).
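The equivalence between ^ and paste0 can be checked on the trade data set shipped with fixest (variables Euros, dist_km, Origin, Destination and Year). This is only an illustrative sketch: the model specification and the name of the pasted column, origin_year, are arbitrary.

```r
library(fixest)
data(trade, package = "fixest")

# Combined fixed-effect created on the fly with ^
est_comb = fepois(Euros ~ log(dist_km) | Origin^Year + Destination, trade)

# Same fixed-effect built by hand with paste0 (column name is illustrative)
trade$origin_year = paste0(trade$Origin, "_", trade$Year)
est_paste = fepois(Euros ~ log(dist_km) | origin_year + Destination, trade)

# The two slope estimates should agree up to numerical precision
c(combined = unname(coef(est_comb)), pasted = unname(coef(est_paste)))
```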
Varying slopes
You can add variables with varying slopes in the fixed-effect part of the formula. The syntax is as follows: fixef_var[var1, var2]. Here the variables var1 and var2 will be with varying slopes (one slope per value in fixef_var) and the fixed-effect fixef_var will also be added.
To add only the variables with varying slopes and not the fixed-effect, use double square brackets: fixef_var[[var1, var2]].
In other words:
fixef_var[var1, var2] is equivalent to fixef_var + fixef_var[[var1]] + fixef_var[[var2]]
fixef_var[[var1, var2]] is equivalent to fixef_var[[var1]] + fixef_var[[var2]]
In general, for convergence reasons, it is recommended to always add the fixed-effect and avoid using only the variable with varying slope (i.e. use single square brackets).
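A small sketch of the two equivalent spellings, using the built-in iris data (the variables are chosen only for illustration): one slope of Sepal.Width per species, together with the Species fixed-effect.

```r
library(fixest)

# Single brackets: the Species fixed-effect plus one Sepal.Width slope per species
est_vs = feols(Sepal.Length ~ Petal.Length | Species[Sepal.Width], iris)

# Explicit form: the fixed-effect, then the slopes only (double brackets)
est_vs2 = feols(Sepal.Length ~ Petal.Length | Species + Species[[Sepal.Width]], iris)

# Both spellings should give the same coefficient on Petal.Length
c(coef(est_vs), coef(est_vs2))
```

The varying slopes themselves are not reported in the coefficient table; like the fixed-effects, they can be extracted afterwards with fixef.fixest.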
Lagging variables
To use leads/lags of variables in the estimation, you can either: i) provide the argument panel.id, or ii) set your data set as a panel with the function panel. Either approach gives access to the lagging functions l, f and d.
You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
Interactions
You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).
Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.
It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula, it will be incomparably faster (using the syntax factor_var[continuous_var], as explained in the section “Varying slopes”).
The function i has in fact more arguments, please see details in its associated help page.
On standard-errors
Standard-errors can be computed in different ways, you can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.
The vignette On standard-errors describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.
You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.
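For instance, the same estimation can be summarized with several kinds of standard-errors without being re-estimated. This is a sketch on the trade data set. When no cluster is given, the clustering is assumed here to fall back on the fixed-effects, as in the package examples.

```r
library(fixest)
data(trade)

est = fepois(Euros ~ log(dist_km) | Origin + Destination, trade)

# Same point estimates, different standard-errors
sum_hetero = summary(est, se = "hetero")
sum_oneway = summary(est, se = "cluster")
sum_twoway = summary(est, se = "twoway")

etable(sum_hetero, sum_oneway, sum_twoway)
```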
Multiple estimations
Multiple estimations can be performed at once, they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is ‘kind of’ a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.
To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.
To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. Of course sw stands for stepwise, and csw for cumulative stepwise. Finally mvsw is a bit special, it stands for multiverse stepwise. Let's explain that.
Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated.
The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.
Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially with the number of variables (2^n models for n variables): so be cautious!
Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ csw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).
You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).
If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.
A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models using the formula is roughly similar to using a loop performance-wise.
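To make the counting of models concrete, here is a sketch on the airquality data set. The models are counted with as.list, which is assumed here to return one element per estimated model.

```r
library(fixest)

# 2 dependent variables x 3 right-hand sides (csw0: none, Temp, Temp + Day)
est_multi = feols(c(Ozone, Solar.R) ~ Wind + csw0(Temp, Day), airquality)
length(as.list(est_multi))

# mvsw with 3 variables: all combinations, including the empty one
est_mv = feols(Ozone ~ Wind + mvsw(Temp, Day, Month), airquality)
length(as.list(est_mv))

# Results can be selected by left-hand side
etable(est_multi[lhs = "Ozone"])
```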
Argument sliding
When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.
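A sketch of argument sliding with mtcars as the global data set. Restoring the defaults with reset = TRUE is done here as a precaution and is an assumption about how you want to leave the session.

```r
library(fixest)

# Set the data set globally
setFixest_estimation(data = mtcars)

# The second argument is now interpreted as vcov
est_hc1 = feols(mpg ~ cyl, "HC1")
est_cl  = feols(mpg ~ cyl, ~gear)

# Restore the default estimation settings
setFixest_estimation(reset = TRUE)

etable(est_hc1, est_cl)
```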
Piping
Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).
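The equivalence stated above can be checked directly. This is a small sketch that requires R >= 4.1 for the native pipe.

```r
library(fixest)

est_pipe   = mtcars |> feols(mpg ~ cyl)
est_direct = feols(mpg ~ cyl, mtcars)

# Both calls lead to the same coefficients
all.equal(coef(est_pipe), coef(est_direct))
```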
Tricks to estimate multiple LHS
To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).
First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).
Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).
Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left hand sides.
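The three ways of writing the left-hand side can be compared side by side. This is a sketch on iris, where the vector name my_lhs is only illustrative.

```r
library(fixest)

# 1) Names stored in a character vector, inserted with the DSB operator
my_lhs = c("Petal.Length", "Petal.Width")
est_dsb = feols(.[my_lhs] ~ Sepal.Length, iris)

# 2) Regular expression on the left-hand side
est_regex = feols(..("Pe") ~ Sepal.Length, iris)

# 3) Explicit vector of dependent variables
est_vec = feols(c(Petal.Length, Petal.Width) ~ Sepal.Length, iris)

etable(est_dsb)
etable(est_vec)
```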
Dot square bracket operator in formulas
In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.
Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.
To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.
You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.
The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.
By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas: the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).
In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.
One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.
You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.
When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
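The expansions described in this section can be tried interactively with xpd. In this sketch the variable names vars and empty are hypothetical.

```r
library(fixest)

# i) Create several variables at once
xpd(y ~ x.[1:3])

# ii) Insert names taken from the environment (expanded as a sum)
vars = c("sepal", "petal")
xpd(y ~ .[vars])

# Comma expansion, so that the content can be used within a function
xpd(c(y.[, 1:2]) ~ x)

# An empty element is replaced with the neutral element 1
empty = ""
xpd(y ~ .[empty])
```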
Author(s)
Laurent Berge
References
Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13.
For models with multiple fixed-effects:
Gaure, Simen, 2013, "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis 66 pp. 8–18
See Also
See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations. And other estimation methods: feols, femlm, fenegbin, feNmlm.
Examples
# Poisson estimation
res = feglm(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris, "poisson")
# You could also use fepois
res_pois = fepois(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)
# With the fit method:
res_fit = feglm.fit(iris$Sepal.Length, iris[, 2:3], iris$Species, "poisson")
# All results are identical:
etable(res, res_pois, res_fit)
# Note that you have many more examples in feols
#
# Multiple estimations:
#
# 6 estimations
est_mult = fepois(c(Ozone, Solar.R) ~ Wind + Temp + csw0(Wind:Temp, Day), airquality)
# We can display the results for the first lhs:
etable(est_mult[lhs = 1])
# And now the second (access can be made by name)
etable(est_mult[lhs = "Solar.R"])
# Now we focus on the two last right hand sides
# (note that .N can be used to specify the last item)
etable(est_mult[rhs = 2:.N])
# Combining with split
est_split = fepois(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)), airquality, split = ~ Month)
# You can display everything at once with the print method
est_split
# Different way of displaying the results with "compact"
summary(est_split, "compact")
# You can still select which sample/LHS/RHS to display
est_split[sample = 1:2, lhs = 1, rhs = 1]
Fixed-effects maximum likelihood models
Description
This function estimates maximum likelihood models with any number of fixed-effects.
Usage
femlm(
  fml, data, family = c("poisson", "negbin", "logit", "gaussian"), vcov,
  start = 0, fixef, fixef.rm = "perfect_fit", offset, subset,
  split, fsplit, split.keep, split.drop, cluster, se, ssc,
  panel.id, panel.time.step = NULL, panel.duplicate.method = "none",
  fixef.tol = 1e-05, fixef.iter = 10000, nthreads = getFixest_nthreads(),
  lean = FALSE, verbose = 0, warn = TRUE, notes = getFixest_notes(),
  theta.init, fixef.keep_names = NULL, mem.clean = FALSE, only.env = FALSE,
  only.coef = FALSE, data.save = FALSE, env, ...
)
fenegbin(
  fml, data, vcov, theta.init,
  start = 0, fixef, fixef.rm = "perfect_fit", offset, subset,
  split, fsplit, split.keep, split.drop, cluster, se, ssc,
  panel.id, panel.time.step = NULL, panel.duplicate.method = "none",
  fixef.tol = 1e-05, fixef.iter = 10000, nthreads = getFixest_nthreads(),
  lean = FALSE, verbose = 0, warn = TRUE, notes = getFixest_notes(),
  fixef.keep_names = NULL, mem.clean = FALSE, only.env = FALSE,
  only.coef = FALSE, data.save = FALSE, env, ...
)
Arguments
fml | A formula representing the relation to be estimated. For example: |
data | A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with this |
family | Character scalar. It should provide the family. The possible values are "poisson" (Poisson model with log-link, the default), "negbin" (Negative Binomial model with log-link), "logit" (Logit model with logit link), "gaussian" (Gaussian model). |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
start | Starting values for the coefficients. Can be: i) a numeric of length 1 (e.g. |
fixef | Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula. |
fixef.rm | Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none". This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it). The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed. If "none": no observation is removed. Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors). The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining. |
offset | A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) |
subset | A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument. |
split | A one sided formula representing a variable (eg |
fsplit | A one sided formula representing a variable (eg |
split.keep | A character vector. Only used when |
split.drop | A character vector. Only used when |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
ssc | An object of class |
panel.id | The panel identifiers. Can either be: i) a one sided formula (e.g. |
panel.time.step | The method to compute the lags, default is |
panel.duplicate.method | If several observations have the same id and time values, then the notion of lag is not defined for them. If |
fixef.tol | Precision used to obtain the fixed-effects. Defaults to |
fixef.iter | Maximum number of iterations in fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000. |
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can set permanently the number of threads used within this package using the function |
lean | Logical scalar, default is |
verbose | Integer, default is 0. It represents the level of information that should be reported during the optimisation process. If |
warn | Logical, default is |
notes | Logical. By default, two notes are displayed: when NAs are removed (to show additional information) and when some observations are removed because of only 0 (or 0/1) outcomes in a fixed-effect setup (in Poisson/Neg. Bin./Logit models). To avoid displaying these messages, you can set |
theta.init | Positive numeric scalar. The starting value of the dispersion parameter if |
fixef.keep_names | Logical or |
mem.clean | Logical scalar, default is |
only.env | (Advanced users.) Logical scalar, default is |
only.coef | Logical scalar, default is |
data.save | Logical scalar, default is |
env | (Advanced users.) A |
... | Not currently used. |
Details
Note that the functions feglm and femlm provide the same results when using the same families but differ in that the latter is a direct maximum likelihood optimization (so the two can really have different convergence rates).
Value
A fixest object. Note that fixest objects contain many elements and most of them are for internal use, they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc) or functions (like for instance fitstat to access any fit statistic).
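As a quick check of this equivalence, the sketch below fits the same Poisson model with both functions on the trade data set. The specification is chosen for illustration only.

```r
library(fixest)
data(trade)

fml = Euros ~ log(dist_km) | Origin + Destination

est_ml  = femlm(fml, trade, family = "poisson")
est_glm = feglm(fml, trade, family = "poisson")

# The coefficients should agree up to the numerical tolerance of each algorithm
cbind(femlm = coef(est_ml), feglm = coef(est_glm))
```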
nobs | The number of observations. |
fml | The linear formula of the call. |
call | The call of the function. |
method | The method used to estimate the model. |
family | The family used to estimate the model. |
data | The original data set used when calling the function. Only available when the estimation was called with |
fml_all | A list containing different parts of the formula. Always contains the linear formula. Then, if relevant: |
nparams | The number of parameters of the model. |
fixef_vars | The names of each fixed-effect dimension. |
fixef_id | The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation. |
fixef_sizes | The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension). |
convStatus | Logical, convergence status. |
message | The convergence message from the optimization procedures. |
obs_selection | (When relevant.) List containing vectors of integers. It represents the sequential selection of observations vis-a-vis the original data set. |
fixef_removed | (When relevant.) In the case there were fixed-effects and some observations were removed because of only 0/1 outcome within a fixed-effect, it gives the list (for each fixed-effect dimension) of the fixed-effect identifiers that were removed. |
coefficients | The named vector of estimated coefficients. |
coeftable | The table of the coefficients with their standard errors, z-values and p-values. |
loglik | The log-likelihood. |
iterations | Number of iterations of the algorithm. |
ll_null | Log-likelihood of the null model (i.e. with the intercept only). |
ll_fe_only | Log-likelihood of the model with only the fixed-effects. |
ssr_null | Sum of the squared residuals of the null model (containing only the intercept). |
pseudo_r2 | The adjusted pseudo R2. |
fitted.values | The fitted values are the expected value of the dependent variable for the fitted model: that is |
residuals | The residuals (y minus the fitted values). |
sq.cor | Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation. |
hessian | The Hessian of the parameters. |
cov.iid | The variance-covariance matrix of the parameters. |
se | The standard-error of the parameters. |
scores | The matrix of the scores (first derivative for each observation). |
residuals | The difference between the dependent variable and the expected predictor. |
sumFE | The sum of the fixed-effects coefficients for each observation. |
offset | (When relevant.) The offset formula. |
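Rather than reading these elements directly, the user-level methods can be used. The sketch below shows a few of them on a Poisson estimation. The fit statistic "pr2" is one example among those accepted by fitstat.

```r
library(fixest)
data(trade)

est = femlm(Euros ~ log(dist_km) | Origin + Destination, trade)

coef(est)                # estimated coefficients
vcov(est)                # variance-covariance matrix
head(resid(est))         # residuals
nobs(est)                # number of observations
logLik(est)              # log-likelihood
fitstat(est, "pr2")      # a fit statistic, here the pseudo R2
head(fixef(est)$Origin)  # fixed-effects coefficients of the first dimension
```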
Combining the fixed-effects
You can combine two variables to make a new fixed-effect using ^. The syntax is as follows: fe_1^fe_2. Here you created a new variable which is the combination of the two variables fe_1 and fe_2. This is identical to doing paste0(fe_1, "_", fe_2) but more convenient.
Note that pasting is a costly operation, especially for large data sets. Hence, by default this paste is done only when the number of unique values is lower than 50,000.
In case you are using a large data set and want to keep the identity of the fixed-effects, you need to use the argument fixef.keep_names = TRUE.
Note that these “identities” are useful only if you're interested in the value of the fixed-effects (that you can extract with fixef.fixest).
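A sketch of a combined fixed-effect on the trade data set. Combining Origin with Product is an arbitrary choice made for the example.

```r
library(fixest)
data(trade)

# Origin^Product is a single fixed-effect built from the two variables
est_comb = femlm(Euros ~ log(dist_km) | Origin^Product + Destination, trade)

# Explicitly keep the identity of the combined fixed-effects, then extract them
est_named = femlm(Euros ~ log(dist_km) | Origin^Product + Destination, trade,
                  fixef.keep_names = TRUE)
head(fixef(est_named)[[1]])
```

Keeping the names only matters for the extracted fixed-effects: the estimated coefficients are the same in both calls.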
Lagging variables
To use leads/lags of variables in the estimation, you can either: i) provide the argument panel.id, or ii) set your data set as a panel with the function panel. Leads, lags and differences are then created with the functions f, l and d.
You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
Interactions
You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).
Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.
It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula, it will be incomparably faster (using the syntax factor_var[continuous_var], as explained in the section “Varying slopes”).
The function i has in fact more arguments, please see details in its associated help page.
On standard-errors
Standard-errors can be computed in different ways, you can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.
The vignette On standard-errors describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.
You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.
Multiple estimations
Multiple estimations can be performed at once, they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is ‘kind of’ a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.
To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.
To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. Of course sw stands for stepwise, and csw for cumulative stepwise. Finally mvsw is a bit special, it stands for multiverse stepwise. Let's explain that.
Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated.
The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.
Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially with the number of variables (2^n models for n variables): so be cautious!
Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ csw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).
You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).
If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.
A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models using the formula is roughly similar to using a loop performance-wise.
Argument sliding
When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.
Piping
Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).
Tricks to estimate multiple LHS
To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).
First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).
Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).
Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left hand sides.
Dot square bracket operator in formulas
In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.
Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.
To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.
You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.
The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.
By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas: the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).
In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.
One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.
You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.
When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
Author(s)
Laurent Berge
References
Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13.
For models with multiple fixed-effects:
Gaure, Simen, 2013, "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis 66 pp. 8–18
On the unconditional Negative Binomial model:
Allison, Paul D and Waterman, Richard P, 2002, "Fixed-Effects Negative Binomial Regression Models", Sociological Methodology 32(1) pp. 247–265
See Also
See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations. And other estimation methods: feols, feglm, fepois, feNmlm.
Examples
# Load trade data
data(trade)
# We estimate the effect of distance on trade => we account for 3 fixed-effects
# 1) Poisson estimation
est_pois = femlm(Euros ~ log(dist_km) | Origin + Destination + Product, trade)
# 2) Log-Log Gaussian estimation (with same FEs)
est_gaus = update(est_pois, log(Euros+1) ~ ., family = "gaussian")
# Comparison of the results using the function etable
etable(est_pois, est_gaus)
# Now using two way clustered standard-errors
etable(est_pois, est_gaus, se = "twoway")
# Comparing different types of standard errors
sum_hetero = summary(est_pois, se = "hetero")
sum_oneway = summary(est_pois, se = "cluster")
sum_twoway = summary(est_pois, se = "twoway")
sum_threeway = summary(est_pois, se = "threeway")
etable(sum_hetero, sum_oneway, sum_twoway, sum_threeway)
#
# Multiple estimations:
#
# 6 estimations
est_mult = femlm(c(Ozone, Solar.R) ~ Wind + Temp + csw0(Wind:Temp, Day), airquality)
# We can display the results for the first lhs:
etable(est_mult[lhs = 1])
# And now the second (access can be made by name)
etable(est_mult[lhs = "Solar.R"])
# Now we focus on the two last right hand sides
# (note that .N can be used to specify the last item)
etable(est_mult[rhs = 2:.N])
# Combining with split
est_split = fepois(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)), airquality, split = ~ Month)
# You can display everything at once with the print method
est_split
# Different way of displaying the results with "compact"
summary(est_split, "compact")
# You can still select which sample/LHS/RHS to display
est_split[sample = 1:2, lhs = 1, rhs = 1]
Fixed-effects OLS estimation
Description
Estimates OLS with any number of fixed-effects.
Usage
feols(
  fml, data, vcov, weights, offset, subset,
  split, fsplit, split.keep, split.drop, cluster, se, ssc,
  panel.id, panel.time.step = NULL, panel.duplicate.method = "none",
  fixef, fixef.rm = "perfect_fit", fixef.tol = 1e-06, fixef.iter = 10000,
  fixef.algo = NULL, collin.tol = 1e-09, nthreads = getFixest_nthreads(),
  lean = FALSE, verbose = 0, warn = TRUE, notes = getFixest_notes(),
  only.coef = FALSE, data.save = FALSE, fixef.keep_names = NULL,
  demeaned = FALSE, mem.clean = FALSE, only.env = FALSE, env, ...
)
feols.fit(
  y, X, fixef_df, vcov, offset,
  split, fsplit, split.keep, split.drop, cluster, se, ssc,
  weights, subset, fixef.rm = "perfect_fit", fixef.tol = 1e-06,
  fixef.iter = 10000, fixef.algo = NULL, collin.tol = 1e-09,
  nthreads = getFixest_nthreads(), lean = FALSE, warn = TRUE,
  notes = getFixest_notes(), mem.clean = FALSE, verbose = 0,
  only.env = FALSE, only.coef = FALSE, env, ...
)
Arguments
fml | A formula representing the relation to be estimated. For example: |
data | A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with this |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
weights | A formula or a numeric vector. Each observation can be weighted, the weights must be greater than 0. If equal to a formula, it should be one-sided: for example |
offset | A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) |
subset | A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument. |
split | A one sided formula representing a variable (eg |
fsplit | A one sided formula representing a variable (eg |
split.keep | A character vector. Only used when |
split.drop | A character vector. Only used when |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
ssc | An object of class |
panel.id | The panel identifiers. Can either be: i) a one sided formula(e.g. |
panel.time.step | The method to compute the lags, default is |
panel.duplicate.method | If several observations have the same id and time values, then the notion of lag is not defined for them. If |
fixef | Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula. |
fixef.rm | Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none". This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it). The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family, for which the LHS cannot be lower than 0, or the logit family, for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). The value "perfect_fit" is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed. If "none": no observation is removed. Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors). The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining. |
fixef.tol | Precision used to obtain the fixed-effects. Defaults to |
fixef.iter | Maximum number of iterations in the fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000. |
fixef.algo |
|
collin.tol | Numeric scalar, default is |
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can set permanently the number of threads used within this package using the function |
lean | Logical scalar, default is |
verbose | Integer. Higher values give more information. In particular, it can detail the number of iterations in the demeaning algorithm (the first number is the left-hand-side, the other numbers are the right-hand-side variables). |
warn | Logical, default is |
notes | Logical. By default, two notes are displayed: when NAs are removed (to show additional information) and when some observations are removed because of collinearity. To avoid displaying these messages, you can set |
only.coef | Logical scalar, default is |
data.save | Logical scalar, default is |
fixef.keep_names | Logical or |
demeaned | Logical, default is |
mem.clean | Logical scalar, default is |
only.env | (Advanced users.) Logical scalar, default is |
env | (Advanced users.) A |
... | Not currently used. |
y | Numeric vector/matrix/data.frame of the dependent variable(s). Multiple dependent variables will return a |
X | Numeric matrix of the regressors. |
fixef_df | Matrix/data.frame of the fixed-effects. |
Details
The method used to demean each variable along the fixed-effects is based on Berge (2018), since this is the same problem to solve as for the Gaussian case in a ML setup.
Value
A fixest object. Note that fixest objects contain many elements and most of them are for internal use; they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc) or functions (like for instance fitstat to access any fit statistic).
nobs | The number of observations. |
fml | The linear formula of the call. |
call | The call of the function. |
method | The method used to estimate the model. |
data | The original data set used when calling the function. Only available when the estimation was called with |
fml_all | A list containing different parts of the formula. Always contains the linear formula. Then depending on the cases: |
fixef_vars | The names of each fixed-effect dimension. |
fixef_id | The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation. |
fixef_sizes | The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension). |
coefficients | The named vector of estimated coefficients. |
multicol | Logical, if multicollinearity was found. |
coeftable | The table of the coefficients with their standard errors, z-values and p-values. |
loglik | The loglikelihood. |
ssr_null | Sum of the squared residuals of the null model (containing only the intercept). |
ssr_fe_only | Sum of the squared residuals of the model estimated with fixed-effects only. |
ll_null | The log-likelihood of the null model (containing only the intercept). |
ll_fe_only | The log-likelihood of the model estimated with fixed-effects only. |
fitted.values | The fitted values. |
linear.predictors | The linear predictors. |
residuals | The residuals (y minus the fitted values). |
sq.cor | Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation. |
hessian | The Hessian of the parameters. |
cov.iid | The variance-covariance matrix of the parameters. |
se | The standard-error of the parameters. |
scores | The matrix of the scores (first derivative for each observation). |
residuals | The difference between the dependent variable and the expected predictor. |
sumFE | The sum of the fixed-effects coefficients for each observation. |
offset | (When relevant.) The offset formula. |
weights | (When relevant.) The weights formula. |
obs_selection | (When relevant.) List containing vectors of integers. It represents the sequential selection of observations relative to the original data set. |
collin.var | (When relevant.) Vector containing the variables removed because of collinearity. |
collin.coef | (When relevant.) Vector of coefficients, where the values of the variables removed because of collinearity are NA. |
collin.min_norm | The minimal diagonal value of the Cholesky decomposition. Small values indicate the possible presence of collinearity. |
y_demeaned | Only when |
X_demeaned | Only when |
Combining the fixed-effects
You can combine two variables to make a new fixed-effect using ^. The syntax is as follows: fe_1^fe_2. Here you create a new variable which is the combination of the two variables fe_1 and fe_2. This is identical to doing paste0(fe_1, "_", fe_2) but more convenient.
Note that pasting is a costly operation, especially for large data sets. Hence, by default this paste is done only when the number of unique values is lower than 50,000.
In case you are using a large data set and want to keep the identity of the fixed-effects, you need to use the argument fixef.keep_names = TRUE.
Note that these “identities” are useful only if you're interested in the value of the fixed-effects (that you can extract with fixef.fixest).
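As a minimal sketch of the equivalence described above (assuming the iris data and an artificial second identifier fe_2, both chosen here only for illustration), the ^ operator and a manually pasted variable should lead to the same slope estimates:

```r
library(fixest)

# Illustrative data: iris plus an artificial second identifier
base = iris
base$fe_2 = rep(1:10, 15)

# Combined fixed-effect created on the fly with ^
est_hat = feols(Sepal.Length ~ Petal.Length | Species^fe_2, base)

# Same fixed-effect built by hand with paste0
base$sp_fe2 = paste0(base$Species, "_", base$fe_2)
est_paste = feols(Sepal.Length ~ Petal.Length | sp_fe2, base)

# The estimated slopes should coincide
all.equal(unname(coef(est_hat)), unname(coef(est_paste)))
```

The ^ version avoids creating the extra column in the data set, which is the convenience mentioned above.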
Varying slopes
You can add variables with varying slopes in the fixed-effect part of the formula. The syntax is as follows: fixef_var[var1, var2]. Here the variables var1 and var2 will be with varying slopes (one slope per value in fixef_var) and the fixed-effect fixef_var will also be added.
To add only the variables with varying slopes and not the fixed-effect, use double square brackets: fixef_var[[var1, var2]].
In other words:
fixef_var[var1, var2] is equivalent to fixef_var + fixef_var[[var1]] + fixef_var[[var2]]
fixef_var[[var1, var2]] is equivalent to fixef_var[[var1]] + fixef_var[[var2]]
In general, for convergence reasons, it is recommended to always add the fixed-effect and avoid using only the variable with varying slope (i.e. use single square brackets).
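The equivalence between single and double square brackets can be checked with a short sketch. The data set (iris) and the variable choices are assumptions made for illustration only:

```r
library(fixest)

# Single brackets: the Species fixed-effect plus one Sepal.Width slope per species
est_single = feols(Sepal.Length ~ Petal.Length | Species[Sepal.Width], iris)

# Explicit form: the fixed-effect and the varying slope are written separately
est_double = feols(Sepal.Length ~ Petal.Length | Species + Species[[Sepal.Width]], iris)

# Both specifications should give the same coefficient on Petal.Length
all.equal(unname(coef(est_single)), unname(coef(est_double)), tolerance = 1e-4)
```

The loose tolerance is deliberate: the fixed-effects are obtained numerically, so tiny differences between the two calls are possible.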
Lagging variables
To use leads/lags of variables in the estimation, you can: i) provide the argument panel.id, or ii) set your data set as a panel with the function panel. Leads, lags and differences are then created with the functions l, f and d.
You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
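To make the range notation concrete, here is a small sketch using the base_did data shipped with the package (the variable names id, period, y and x1 are those used in the Examples section). It compares l(x1, -1:1) with the same three terms written out one by one:

```r
library(fixest)
data(base_did)

# Lead, contemporaneous value and lag of x1 requested in a single call
est_range = feols(y ~ l(x1, -1:1), base_did, panel.id = ~id + period)

# The same three terms written out explicitly
est_explicit = feols(y ~ f(x1) + x1 + l(x1), base_did, panel.id = ~id + period)

# One intercept plus three slopes in both cases
names(coef(est_range))
all.equal(sort(unname(coef(est_range))), sort(unname(coef(est_explicit))))
```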
Interactions
You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).
Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.
It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula; it will be incomparably faster (using the syntax factor_var[continuous_var], as explained in the section “Varying slopes”).
The function i has in fact more arguments, please see details in its associated help page.
On standard-errors
Standard-errors can be computed in different ways; you can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.
The following vignette: On standard-errors describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.
You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.
Instrumental variables
To estimate two stage least squares regressions, insert the relationship between the endogenous regressor(s) and the instruments in a formula, after a pipe.
For example, fml = y ~ x1 | x_endo ~ x_inst will use the variables x1 and x_inst in the first stage to explain x_endo. It will then use the fitted value of x_endo (which will be named fit_x_endo) and x1 to explain y. To include several endogenous regressors, just use "+", like in: fml = y ~ x1 | x_endo1 + x_endo2 ~ x_inst1 + x_inst2.
Of course you can still add the fixed-effects, but the IV formula must always come last, like in fml = y ~ x1 | fe1 + fe2 | x_endo ~ x_inst.
If you want to estimate a model without exogenous variables, use "1" as a placeholder: e.g. fml = y ~ 1 | x_endo ~ x_inst.
By default, the second stage regression is returned. You can access the first stage(s) regressions either directly in the slot iv_first_stage (not recommended), or using the argument stage = 1 from the function summary.fixest. For example summary(iv_est, stage = 1) will give the first stage(s). Note that using summary you can display both the second and first stages at the same time using, e.g., stage = 1:2 (using 2:1 would reverse the order).
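The two-stage logic described above can be reproduced by hand. The following is only a sketch: it assumes the iris data with columns renamed to y, x1, x_endo and x_inst purely for illustration (the variables are not economically meaningful). The point estimates of the manual second stage should match the IV estimation; the standard-errors will not, since the manual version ignores that fit_x_endo is itself estimated.

```r
library(fixest)

# Illustrative data: iris with hypothetical names matching the text above
base = setNames(iris, c("y", "x1", "x_endo", "x_inst", "species"))

# IV estimation in one call
est_iv = feols(y ~ x1 | x_endo ~ x_inst, base)

# Manual first stage: the endogenous variable on the exogenous one and the instrument
first = feols(x_endo ~ x1 + x_inst, base)
base$fit_x_endo = fitted(first)

# Manual second stage: the fitted value replaces the endogenous variable
second = feols(y ~ fit_x_endo + x1, base)

# Same coefficients (not the same standard-errors)
rbind(iv = coef(est_iv), manual = coef(second)[names(coef(est_iv))])
```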
Multiple estimations
Multiple estimations can be performed at once; they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is ‘kind of’ a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.
To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.
To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. Of course sw stands for stepwise, and csw for cumulative stepwise. Finally mvsw is a bit special, it stands for multiverse stepwise. Let's explain that.
Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated.
The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements being estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.
Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially with the number of variables (2^n models for n variables): so be cautious!
Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ csw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).
You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).
If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.
A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models, using the formula is roughly similar to using a loop performance-wise.
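The number of models generated by each stepwise function can be counted with a short sketch. It assumes the iris data with renamed columns (illustration only) and relies on as.list() turning a fixest_multi object into a plain list with one element per estimation:

```r
library(fixest)

# Illustrative data: iris with short variable names
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

# csw0: y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3
est_csw0 = feols(y ~ x1 + csw0(x2, x3), base)

# Two dependent variables times three stepwise elements
est_lhs_sw = feols(c(y, x1) ~ sw(x2, x3, species), base)

# mvsw: every combination of x1, x2, x3, including the empty one
est_mvsw = feols(y ~ mvsw(x1, x2, x3), base)

# Number of estimations in each object
n_models = c(
  csw0 = length(as.list(est_csw0)),
  lhs_sw = length(as.list(est_lhs_sw)),
  mvsw = length(as.list(est_mvsw))
)
n_models
```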
Tricks to estimate multiple LHS
To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).
First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say, feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).
Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).
Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left hand sides.
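A small sketch of the first trick, assuming the iris data and a hypothetical vector my_lhs holding the names of the dependent variables. As before, as.list() is used to access the individual estimations:

```r
library(fixest)

# Names of the dependent variables stored in a character vector
my_lhs = c("Sepal.Length", "Sepal.Width")

# The dot square bracket operator inserts them as multiple LHS
est_dsb = feols(.[my_lhs] ~ Petal.Length, iris)

# Explicit version with c()
est_c = feols(c(Sepal.Length, Sepal.Width) ~ Petal.Length, iris)

# Both objects contain the same two estimations
all.equal(coef(as.list(est_dsb)[[2]]), coef(as.list(est_c)[[2]]))
```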
Argument sliding
When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.
Piping
Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).
Dot square bracket operator in formulas
In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.
Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.
To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.
You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.
The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.
By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas; the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).
In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.
One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.
You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.
When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
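The main expansions described in this section can be tried directly with xpd. This is a sketch only; the variable names vars and empty are hypothetical and introduced for the illustration:

```r
library(fixest)

# i) Creating several variables at once
fml_range = xpd(y ~ x.[1:3])

# ii) Capturing a character vector from the current environment
vars = c("a", "b")
fml_env = xpd(y ~ .[vars])

# An empty string is replaced with the neutral element 1
empty = ""
fml_empty = xpd(y ~ .[empty])

# Display the three resulting formulas
fml_range
fml_env
fml_empty
```

Since the same parsing is enabled in all fixest estimations, the left-hand-side formulas above can be passed as-is to feols and the other estimation functions.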
Author(s)
Laurent Berge
References
Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13.
For models with multiple fixed-effects:
Gaure, Simen, 2013, "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis 66, pp. 8–18.
See Also
See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations. For plotting coefficients: see coefplot.
And other estimation methods: femlm, feglm, fepois, fenegbin, feNmlm.
Examples
#
# Basic estimation
#

res = feols(Sepal.Length ~ Sepal.Width + Petal.Length, iris)
# You can specify clustered standard-errors in summary:
summary(res, cluster = ~Species)

#
# Just one set of fixed-effects:
#

res = feols(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)
# Here we have "default" SEs
summary(res)

#
# Varying slopes:
#

res = feols(Sepal.Length ~ Petal.Length | Species[Sepal.Width], iris)
summary(res)

#
# Combining the FEs:
#

base = iris
base$fe_2 = rep(1:10, 15)
res_comb = feols(Sepal.Length ~ Petal.Length | Species^fe_2, base)
summary(res_comb)
fixef(res_comb)[[1]]

#
# Using leads/lags:
#

data(base_did)
# We need to set up the panel with the arg. panel.id
est1 = feols(y ~ l(x1, 0:1), base_did, panel.id = ~id+period)
est2 = feols(f(y) ~ l(x1, -1:1), base_did, panel.id = ~id+period)
etable(est1, est2, order = "f", drop = "Int")

#
# Using interactions:
#

data(base_did)
# We interact the variable 'period' with the variable 'treat'
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_did)

# Now we can plot the result of the interaction with coefplot
coefplot(est_did)
# You have many more examples in coefplot help

#
# Instrumental variables
#

# To estimate Two stage least squares,
# insert a formula describing the endo. vars./instr. relation after a pipe:

data(fulton)

# Using exogenous control, 1 endogenous var. and 1 instrument
res_iv = feols(qty ~ t | price ~ speed2, fulton)

# The second stage is the default
summary(res_iv)

# To show the first stage:
summary(res_iv, stage = 1)

# To show both the first and second stages:
summary(res_iv, stage = 1:2)

# Adding a fixed-effect => IV formula always last!
res_iv_fe = feols(qty ~ t | day | price ~ speed2, fulton)

# With two instruments
res_iv2 = feols(qty ~ t | day | price ~ speed2 + wave2, fulton)

# Now there's two first stages => a fixest_multi object is returned
sum_res_iv2 = summary(res_iv2, stage = 1)

# You can navigate through it by subsetting:
sum_res_iv2[iv = 1]

# The stage argument also works in etable:
etable(res_iv, res_iv_fe, res_iv2, order = "endo")

etable(res_iv, res_iv_fe, res_iv2, stage = 1:2, order = c("endo", "inst"),
       group = list(control = "!endo|inst"))

#
# Multiple estimations:
#

# 6 estimations
est_mult = feols(c(Ozone, Solar.R) ~ Wind + Temp + csw0(Wind:Temp, Day), airquality)

# We can display the results for the first lhs:
etable(est_mult[lhs = 1])

# And now the second (access can be made by name)
etable(est_mult[lhs = "Solar.R"])

# Now we focus on the two last right hand sides
# (note that .N can be used to specify the last item)
etable(est_mult[rhs = 2:.N])

# Combining with split
est_split = feols(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)),
                  airquality, split = ~ Month)

# You can display everything at once with the print method
est_split

# Different way of displaying the results with "compact"
summary(est_split, "compact")

# You can still select which sample/LHS/RHS to display
est_split[sample = 1:2, lhs = 1, rhs = 1]

#
# Split sample estimations
#

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ x.[1:3], base, split = ~species)
etable(est)

# You can select specific values with the %keep% and %drop% operators
# By default, partial matching is enabled. It should refer to a single variable.
est = feols(y ~ x.[1:3], base, split = ~species %keep% c("set", "vers"))
etable(est)

# You can supply regular expression by using an @ first.
# regex can match several values.
est = feols(y ~ x.[1:3], base, split = ~species %keep% c("@set|vers"))
etable(est)

#
# Argument sliding
#

# When the data set is set up globally, you can use the vcov argument implicitly

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

no_sliding = feols(y ~ x1 + x2, base, ~species)

# With sliding
setFixest_estimation(data = base)

# ~species is implicitly deduced to be equal to 'vcov'
sliding = feols(y ~ x1 + x2, ~species)

etable(no_sliding, sliding)

# Resetting the global options
setFixest_estimation(data = NULL)

#
# Formula expansions
#

# By default, the features of the xpd function are enabled in
# all fixest estimations
# Here's a few examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

# dot square bracket operator
feols(y ~ x.[1:3], base)

# fetching variables via regular expressions: ..("regex")
feols(y ~ ..("1|2"), base)

# NOTA: it also works for multiple LHS
mult1 = feols(x.[1:2] ~ y + species, base)
mult2 = feols(..("y|3") ~ x.[1:2] + species, base)
etable(mult1, mult2)

# Use .[, stuff] to include variables in functions:
feols(y ~ csw(x.[, 1:3]), base)

# Same for ..(, "regex")
feols(y ~ csw(..(,"x")), base)

Computes fit statistics of fixest objects
Description
Computes various fit statistics for fixest estimations.
Usage
fitstat(
  x,
  type,
  vcov = NULL,
  cluster = NULL,
  ssc = NULL,
  simplify = FALSE,
  verbose = TRUE,
  show_types = FALSE,
  frame = parent.frame(),
  ...
)

Arguments
x | A |
type | Character vector or one sided formula. The type of fit statistic to be computed. The classic ones are: n, rmse, r2, pr2, f, wald, ivf, ivwald. You have the full list in the details section or use |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
ssc | An object of class |
simplify | Logical, default is |
verbose | Logical, default is |
show_types | Logical, default is |
frame | An environment in which to evaluate variables, default is |
... | For internal use. |
Value
By default an object of class fixest_fitstat is returned. Using verbose = FALSE returns a simple list. Finally, if only one type is selected, simplify = TRUE leads to the selected type being returned.
Registering your own types
You can register custom fit statistics with the function fitstat_register.
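The three return forms can be compared with a minimal sketch. The model below (iris, with Species as fixed-effect) is an assumption chosen only for illustration:

```r
library(fixest)

est = feols(Sepal.Length ~ Sepal.Width | Species, iris)

# Default: an object of class fixest_fitstat, with its own print method
fs_default = fitstat(est, ~ rmse + r2)
class(fs_default)

# verbose = FALSE: a plain list
fs_list = fitstat(est, ~ rmse + r2, verbose = FALSE)
is.list(fs_list)

# simplify = TRUE with a single type: the value itself
n_obs = fitstat(est, "n", simplify = TRUE)
n_obs
```

The simplified form is convenient when the statistic feeds into further computations rather than being displayed.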
Available types
The types are case sensitive, please use lower case only. The types available are:
n, ll, aic, bic, rmse: The number of observations, the log-likelihood, the AIC, the BIC and the root mean squared error, respectively.
my: Mean of the dependent variable.
g: The degrees of freedom used to compute the t-test (it influences the p-values of the coefficients). When the VCOV is clustered, this value is equal to the minimum cluster size, otherwise, it is equal to the sample size minus the number of variables.
r2, ar2, wr2, awr2, pr2, apr2, wpr2, awpr2: All r2 that can be obtained with the function r2. The a stands for 'adjusted', the w for 'within' and the p for 'pseudo'. Note that the order of the letters a, w and p does not matter. The pseudo R2s are McFadden's R2s (ratios of log-likelihoods).
theta: The over-dispersion parameter in Negative Binomial models. Low values mean high overdispersion.
f, wf: The F-tests of nullity of the coefficients. The w stands for 'within'. These types return the following values: stat, p, df1 and df2. If you want to display only one of these, use their name after a dot: e.g. f.stat will give the statistic of the F-test, or wf.p will give the p-values of the F-test on the projected model (i.e. projected onto the fixed-effects).
wald: Wald test of joint nullity of the coefficients. This test always excludes the intercept and the fixed-effects. This type returns the following values: stat, p, df1, df2 and vcov. The element vcov reports the way the VCOV matrix was computed since it directly influences this statistic.
ivf, ivf1, ivf2, ivfall: These statistics are specific to IV estimations. They report either the IV F-test (namely the Cragg-Donald F statistic in the presence of only one endogenous regressor) of the first stage (ivf or ivf1), of the second stage (ivf2) or of both (ivfall). The F-test of the first stage is commonly named weak instrument test. The value of ivfall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage F-test(s) being displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation; otherwise, ivf1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1 and df2.
ivwald, ivwald1, ivwald2, ivwaldall: These statistics are specific to IV estimations. They report either the IV Wald-test of the first stage (ivwald or ivwald1), of the second stage (ivwald2) or of both (ivwaldall). The Wald-test of the first stage is commonly named weak instrument test. Note that if the estimation was done with a robust VCOV and there is only one endogenous regressor, this is equivalent to the Kleibergen-Paap statistic. The value of ivwaldall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage Wald-test(s) being displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation; otherwise, ivwald1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1, df2, and vcov.
cd: The Cragg-Donald test for weak instruments.
kpr: The Kleibergen-Paap test for weak instruments.
wh: This statistic is specific to IV estimations. Wu-Hausman endogeneity test. H0 is the absence of endogeneity of the instrumented variables. It returns the following values: stat, p, df1, df2.
sargan: Sargan test of overidentifying restrictions. H0: the instruments are not correlated with the second stage residuals. It returns the following values: stat, p, df.
lr, wlr: Likelihood ratio and within likelihood ratio tests. They return the following elements: stat, p, df. Concerning the within-LR test, note that, contrary to estimations with femlm or feNmlm, estimations with feglm/fepois need to estimate the model with fixed-effects only, which may prove time-consuming (depending on your model). Bottom line, if you really need the within-LR and estimate a Poisson model, use femlm instead of fepois (the former uses direct ML maximization for which the FE-only model is a by-product).
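As a quick sketch of how the type names combine (the model on iris is an assumption for illustration), the prefixes and the dot notation for sub-elements can be used in a single formula, and the R2 types should agree with the values returned by the function r2:

```r
library(fixest)

est = feols(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)

# Overall R2, within R2, and only the statistic of the F-test
fitstat(est, ~ r2 + wr2 + f.stat)

# The within R2 obtained through fitstat and through r2()
wr2_fitstat = fitstat(est, "wr2", simplify = TRUE)
wr2_direct = r2(est, "wr2")
all.equal(as.numeric(wr2_fitstat), as.numeric(wr2_direct))
```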
Examples
data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Extracting the 'working' number of observations used to compute the pvalues
fitstat(gravity, "g", simplify = TRUE)

# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)

# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)

# For wald and wf, you could show the pvalue instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)

# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them

# We need: a) type name, b) the function to be applied
#          c) (optional) an alias

fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")

# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)

# Note that the custom stats we created can easily lead
# to errors, but that's another story!

Register custom fit statistics
Description
Enables the registration of custom fit statistics that can be easily summoned with the function fitstat.
Usage
fitstat_register(type, fun, alias = NULL, subtypes = NULL)

Arguments
type | A character scalar giving the type-name. |
fun | A function to be applied to a |
alias | A (named) character vector. An alias to be used in lieu of the type name in the display methods (ie when used in |
subtypes | A character vector giving the name of each element returned by the function |
Details
If there are several components to the computed statistics (i.e. the function returns several elements), then using the argument subtypes, giving the names of each of these components, is mandatory. This is to ensure that the statistic can be used as any other built-in statistic (and there are too many edge cases impeding automatic deduction).
Author(s)
Laurent Berge
Examples
# An estimation
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
est = feols(y ~ x1 + x2 | species, base)

#
# single valued tests
#

# say you want to add the coefficient of variation of the dependent variable
cv = function(est){
  y = model.matrix(est, type = "lhs")
  sd(y)/mean(y)
}

# Now we register the routine
fitstat_register("cvy", cv, "Coef. of Variation (dep. var.)")

# now we can summon the registered routine with its type ("cvy")
fitstat(est, "cvy")

#
# Multi valued tests
#

# Let's say you want a Wald test with a heteroskedasticity robust variance

# First we create the function
hc_wald = function(est){
  w = wald(est, keep = "!Intercept", print = FALSE, se = "hetero")
  head(w, 4)
}
# This test returns a vector of 4 elements: stat, p, df1 and df2

# Now we register the routine
fitstat_register("hc_wald", hc_wald, "Wald (HC1)", "test2")

# You can access the statistic, as before
fitstat(est, "hc_wald")

# But you can also access the sub elements
fitstat(est, "hc_wald.p")

Extracts fitted values from a fixest fit
Description
This function extracts the fitted values from a model estimated with femlm, feols or feglm. The fitted values that are returned are the expected predictor.
Usage
## S3 method for class 'fixest'
fitted(object, type = c("response", "link"), na.rm = TRUE, ...)

## S3 method for class 'fixest'
fitted.values(object, type = c("response", "link"), na.rm = TRUE, ...)

Arguments
object | A |
type | Character either equal to |
na.rm | Logical, default is |
... | Not currently used. |
Details
This function returns the expected predictor of a fixest fit. The likelihood functions are detailed in the femlm help page.
Value
It returns a numeric vector whose length is the number of observations used to estimate the model.
If type = "response", the value returned is the expected predictor, i.e. the expected value of the dependent variable for the fitted model: $E(Y|X)$. If type = "link", the value returned is the linear predictor of the fitted model, that is $X \cdot \beta$ (recall that $E(Y|X) = f(X \cdot \beta)$).
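The relation between the two types can be checked on a Poisson model, for which the function f is the exponential. This is a sketch assuming the iris data, chosen only for illustration:

```r
library(fixest)

# Poisson model (the default family of femlm) with Species fixed-effects
res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)

# Expected value of the dependent variable
mu = fitted(res, type = "response")

# Linear predictor
eta = fitted(res, type = "link")

# With the log link of the Poisson model, mu is the exponential of eta
all.equal(unname(mu), unname(exp(eta)))
```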
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. resid.fixest, predict.fixest, summary.fixest, vcov.fixest, fixef.fixest.
Examples
# simple estimation on iris data, using "Species" fixed-effects
res_poisson = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris)

# we extract the fitted values
y_fitted_poisson = fitted(res_poisson)

# Same estimation but in OLS (Gaussian family)
res_gaussian = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris, family = "gaussian")
y_fitted_gaussian = fitted(res_gaussian)

# comparison of the fit for the two families
plot(iris$Sepal.Length, y_fitted_poisson)
points(iris$Sepal.Length, y_fitted_gaussian, col = 2, pch = 2)
Extract the Fixed-Effects from a fixest estimation.
Description
This function retrieves the fixed effects from a fixest estimation. It is useful only when there are one or more fixed-effect dimensions.
Usage
## S3 method for class 'fixest'
fixef(
  object,
  notes = getFixest_notes(),
  sorted = TRUE,
  nthreads = getFixest_nthreads(),
  fixef.tol = 1e-05,
  fixef.iter = 10000,
  ...
)
Arguments
object | |
notes | Logical. Whether to display a note when the fixed-effects coefficients are not regular. |
sorted | Logical, default is |
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can set permanently the number of threads used within this package using the function |
fixef.tol | Precision used to obtain the fixed-effects. Defaults to |
fixef.iter | Maximum number of iterations in fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000. |
... | Not currently used. |
Details
If the fixed-effect coefficients are not regular, then several reference points need to be set: this means that the fixed-effects coefficients cannot be directly interpreted. If this is the case, then a warning is raised.
Value
A list containing the vectors of the fixed effects.
If there is more than 1 fixed-effect, then the attribute “references” is created. This is a vector of length the number of fixed-effects, each element contains the number of coefficients set as references. By construction, the elements of the first fixed-effect dimension are never set as references. In the presence of regular fixed-effects, there should be Q-1 references (with Q the number of fixed-effects).
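The “references” attribute can be inspected directly. The sketch below assumes a two-way specification on the trade data (chosen for illustration, not taken from the examples of this page) and checks the regularity condition stated above.

```r
library(fixest)
data(trade)

# two fixed-effect dimensions (illustrative specification)
est = fepois(Euros ~ log(dist_km) | Origin + Destination, trade)
fe = fixef(est)

# number of coefficients set as references, one entry per dimension
refs = attr(fe, "references")
refs

# regular fixed-effects: Q - 1 references in total (Q = number of dimensions)
sum(refs) == length(refs) - 1
```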
Author(s)
Laurent Berge
See Also
plot.fixest.fixef. See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effect coefficients, and the function etable to visualize the results of multiple estimations.
Examples
data(trade)

# We estimate the effect of distance on trade => we account for 3 fixed-effects
est_pois = femlm(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# Obtaining the fixed-effects coefficients:
fe_trade = fixef(est_pois)

# The fixed-effects of the first fixed-effect dimension:
head(fe_trade$Origin)

# Summary information:
summary(fe_trade)

# Plotting them:
plot(fe_trade)
Functions exported from nlme to implement fixest methods
Description
The package fixest uses the fixef method from nlme. Unfortunately, re-exporting this method is required in order not to attach package nlme.
Details
Here is the help from package nlme: fixef. The help from package fixest is here: fixef.fixest.
Note
I could find this workaround thanks to the package plm.
Retrieves the data set used for a fixest estimation
Description
Retrieves the original data set used to estimate a fixest or fixest_multi model. Note that this is the original data set and not the data used for the estimation (i.e. it can have more rows).
Usage
fixest_data(x, sample = "original")
Arguments
x | An object of class |
sample | Either "original" (default) or "estimation". If equal to "original", it matches the original data set. If equal to "estimation", the rows of the data set returned match the observations used for the estimation. |
Value
It returns a data.frame equal to the original data set used for the estimation, when the function was called.
If sample = "estimation", only the lines used for the estimation are returned.
In case of a fixest_multi object, it returns the data set of the first estimation object. So in that case it does not make sense to use sample = "estimation" since the samples may be inconsistent across the different estimations.
Examples
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
base$y[1:5] = NA
est = feols(y ~ x1 + x2, base)

# the original data set
head(fixest_data(est))

# the data set, with only the lines used for the estimation
head(fixest_data(est, sample = "est"))
Permanently removes the fixest package startup message
Description
Package startup messages can be very annoying, although sometimes they can be necessary. Use this function to prevent fixest's package startup message from popping up when loading. This will be specific to your current project.
Usage
fixest_startup_msg(x)
Arguments
x | Logical, no default. If |
Details
Note that this function is introduced to cope with the first fixest startup message (in version 0.9.0).
This function works only with R >= 4.0.0. There are no startup messages for R < 4.0.0.
Extract the formula of a fixest fit
Description
This function extracts the formula from a fixest estimation (obtained with femlm, feols or feglm). If the estimation was done with fixed-effects, they are added in the formula after a pipe (“|”). If the estimation was done with a nonlinear in parameters part, then this will be added in the formula in between I().
Usage
## S3 method for class 'fixest'
formula(x, type = "full", fml.update = NULL, fml.build = NULL, ...)

## S3 method for class 'fixest_multi'
formula(x, type = "full", fml.update = NULL, fml.build = NULL, ...)
Arguments
x | An object of class |
type | A character scalar. Default is
|
fml.update | A formula representing the changes to be made to the original formula. By default it is |
fml.build | A formula or
Example, the original estimation was |
... | Not currently used. |
Details
The arguments type, fml.update and fml.build are exclusive: they cannot be used at the same time.
Value
It returns either a one-sided formula or a two-sided formula.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. model.matrix.fixest, update.fixest, summary.fixest, vcov.fixest.
Examples
# example estimation with IVs and FEs
base = setNames(iris, c("y", "x1", "endo", "instr", "species"))
est = feols(y ~ x1 | species | endo ~ instr, base)

# the full formula
formula(est)

# idem without the IVs nor the FEs
formula(est, "full.nofixef.noiv")

# the reduced form
formula(est, "iv.reduced")

# the IV relation only
formula(est, "iv")

# the dependent variable => one-sided formula
formula(est, "lhs")

# using update, we add x1^2 as an independent variable:
formula(est, fml.update = . ~ . + x1^2)

# using build, see the difference => the FEs and the IVs are not inherited
formula(est, fml.build = . ~ . + x1^2)

# we can use some special variables
formula(est, fml.build = . ~ .endo + .indep)
Fulton Fish Market data
Description
This dataset has been taken from Jeff Wooldridge's textbook. It is a modified version of the data set that appears in the wooldridge package.
Usage
data(fulton)
Format
fulton is a data frame with 97 observations and 12 variables named t, day, price, qty, speed2, wave2, speed3, wave3, price_asian, price_white, qty_asian, qty_white. Each row is a recording of the Fulton fish market sales on a given day.
t: Time-trend
day: Day of the week
price: Average price of fish (calculated as (qty_white * price_white + qty_asian * price_asian) / (qty_white + qty_asian))
qty: Quantity of fish sold (calculated as qty_white + qty_asian)
speed2: Wind speeds (minimum of past 2 days)
wave2: Maximum wave height (average of last 2 days)
speed3: Wind speed (3 day lag)
wave3: Maximum wave height (average of last 3 and 4 day lag)
price_asian: Average price of fish sold to Asian customers
price_white: Average price of fish sold to White customers
qty_asian: Quantity of fish sold to Asian customers
qty_white: Quantity of fish sold to White customers
Details
Source: K Graddy (1995), “Testing for Imperfect Competition at the Fulton Fish Market,” RAND Journal of Economics 26, 75-92.
Source
https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041
Hat values for fixest objects
Description
Computes the hat values for feols or feglm estimations.
Usage
## S3 method for class 'fixest'
hatvalues(model, exact = TRUE, boot.size = 1000, ...)
Arguments
model | A fixest object. For instance from feols or feglm. |
exact | Logical scalar, default is |
boot.size | Integer scalar or |
... | Not currently used. |
Details
Hat values are not available for fenegbin, femlm and feNmlm estimations.
Hat values for generalized linear models are discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.
When exact == FALSE, the Johnson-Lindenstrauss approximation (JLA) algorithm is used, which approximates the diagonals of the projection matrix. For more precision (but longer time), increase the value of boot.size. See Kline, Saggio, and Sølvsten (2020) for details.
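For a linear model, the exact hat values are the diagonal of the projection matrix $X (X'X)^{-1} X'$. The sketch below computes that diagonal by hand in base R; it uses lm as a stand-in for the feols fit of the example further down (an assumption made here only to keep the block dependency-free) and does not illustrate the JLA approximation.

```r
# design matrix of the model Petal.Length ~ Petal.Width + Sepal.Width
X = cbind(1, iris$Petal.Width, iris$Sepal.Width)

# diagonal of X (X'X)^{-1} X', computed without forming the n x n matrix
h = rowSums((X %*% solve(crossprod(X))) * X)

# cross-check against base R's hatvalues() on the equivalent lm fit
fit_lm = lm(Petal.Length ~ Petal.Width + Sepal.Width, iris)
all.equal(unname(hatvalues(fit_lm)), h)

# the hat values sum to the number of coefficients (trace of a projection)
sum(h)
```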
Value
Returns a vector of the same length as the number of observations used in the estimation.
References
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.
Kline, P., Saggio, R., and Sølvsten, M. (2020). Leave‐Out Estimation of Variance Components. Econometrica.
Examples
est = feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
head(hatvalues(est))
Create, or interact variables with, factors
Description
Treats a variable as a factor, or interacts a variable with a factor. Values to be dropped/kept from the factor can be easily set. Note that to interact fixed-effects, this function should not be used: instead use directly the syntax fe1^fe2.
Usage
i(factor_var, var, ref, keep, bin, ref2, keep2, bin2, ...)
Arguments
factor_var | A vector (of any type) that will be treated as a factor. You can set references (i.e. exclude values for which to create dummies) with the |
var | A variable of the same length as |
ref | A vector of values to be taken as references from |
keep | A vector of values to be kept from |
bin | A list of values to be grouped, a vector, a formula, or the special values |
ref2 | A vector of values to be dropped from |
keep2 | A vector of values to be kept from |
bin2 | A list or vector defining the binning of the second variable.See help for the argument |
... | Not currently used. |
Details
To interact fixed-effects, this function should not be used: instead use directly the syntax fe1^fe2 in the fixed-effects part of the formula. Please see the details and examples in the help page of feols.
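A minimal sketch of the fe1^fe2 syntax, assuming the airquality data with a week variable built as in the examples of this page. The manually pasted identifier (month_week, a name introduced here for illustration) is only there to show what the combined fixed-effect amounts to.

```r
library(fixest)

aq = airquality
aq$week = aq$Day %/% 7 + 1

# combined fixed-effect: one coefficient per Month x week cell
est_combined = feols(Ozone ~ Solar.R | Month^week, aq)

# same model with the combination built by hand
aq$month_week = paste(aq$Month, aq$week, sep = "_")
est_manual = feols(Ozone ~ Solar.R | month_week, aq)

# both estimations give the same slope
all.equal(coef(est_combined), coef(est_manual))
```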
Value
It returns a matrix with number of rows the length of factor_var. If there is no interacted variable or it is interacted with a numeric variable, the number of columns is equal to the number of cases contained in factor_var minus the reference(s). If the interacted variable is a factor, the number of columns is the number of combined cases between factor_var and var.
Author(s)
Laurent Berge
See Also
iplot to plot interactions or factors created with i(), feols for OLS estimation with multiple fixed-effects.
See the function bin for binning variables.
Examples
#
# Simple illustration
#

x = rep(letters[1:4], 3)[1:10]
y = rep(1:4, c(1, 2, 3, 4))

# interaction
data.frame(x, y, i(x, y, ref = TRUE))

# without interaction
data.frame(x, i(x, "b"))

# you can interact factors too
z = rep(c("e", "f", "g"), c(5, 3, 2))
data.frame(x, z, i(x, z))

# to force a numeric variable to be treated as a factor: use i.
data.frame(x, y, i(x, i.y))

# Binning
data.frame(x, i(x, bin = list(ab = c("a", "b"))))

# Same as before but using .() for list() and a regular expression
# note that to trigger a regex, you need to use an @ first
data.frame(x, i(x, bin = .(ab = "@a|b")))

#
# In fixest estimations
#

data(base_did)
# We interact the variable 'period' with the variable 'treat'
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_did)

# => plot only interactions with iplot
iplot(est_did)

# Using i() for factors
est_bis = feols(y ~ x1 + i(period, keep = 3:6) + i(period, treat, 5) | id, base_did)

# we plot the second set of variables created with i()
# => we need to use keep (otherwise only the first one is represented)
coefplot(est_bis, keep = "trea")

# => special treatment in etable
etable(est_bis, dict = c("6" = "six"))

#
# Interact two factors
#

# We use the i. prefix to consider week as a factor
data(airquality)
aq = airquality
aq$week = aq$Day %/% 7 + 1

# Interacting Month and week:
res_2F = feols(Ozone ~ Solar.R + i(Month, i.week), aq)

# Same but dropping the 5th Month and 1st week
res_2F_bis = feols(Ozone ~ Solar.R + i(Month, i.week, ref = 5, ref2 = 1), aq)

etable(res_2F, res_2F_bis)

#
# Binning
#

data(airquality)

feols(Ozone ~ i(Month, bin = "bin::2"), airquality)

feols(Ozone ~ i(Month, bin = list(summer = 7:9)), airquality)
Lags a variable using a formula
Description
Lags a variable using panel id + time identifiers in a formula.
Usage
## S3 method for class 'formula'
lag(
  x,
  k = 1,
  data,
  time.step = NULL,
  fill = NA,
  duplicate.method = "none",
  ...
)

lag_fml(
  x,
  k = 1,
  data,
  time.step = NULL,
  fill = NA,
  duplicate.method = "none",
  ...
)
Arguments
x | A formula of the type |
k | An integer giving the number of lags. Default is 1. For leads, just use a negative number. |
data | Optional, the data.frame in which to evaluate the formula. If not provided, variables will be fetched in the current environment. |
time.step | The method to compute the lags, default is |
fill | Scalar. How to fill the observations without defined lead/lag values. Default is |
duplicate.method | If several observations have the same id and time values, then the notion of lag is not defined for them. If |
... | Not currently used. |
Value
It returns a vector of the same type and length as the variable to be lagged in the formula.
Functions
lag_fml(): Lags a variable using a formula syntax
Author(s)
Laurent Berge
See Also
Alternatively, the function panel changes a data.frame into a panel from which the functions l and f (creating lags and leads, respectively) can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).
Examples
# simple example with an unbalanced panel
base = data.frame(id = rep(1:2, each = 4), time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)

base$lag1 = lag(x~id+time, 1, base) # lag 1
base$lead1 = lag(x~id+time, -1, base) # lead 1
base$lag2_fill0 = lag(x~id+time, 2, base, fill = 0)

# with time.step = "consecutive"
base$lag1_consecutive = lag(x~id+time, 1, base, time.step = "consecutive")
# => works for indiv. 2 because 9 (resp. 6) is consecutive to 6 (resp. 4)
base$lag1_within.consecutive = lag(x~id+time, 1, base, time.step = "within")
# => now two consecutive years within each indiv is one lag

print(base)

# Argument time.step = "consecutive" is
# mostly useful when the time variable is not a number:
# e.g. c("1991q1", "1991q2", "1991q3") etc

# with duplicates
base_dup = data.frame(id = rep(1:2, each = 4), time = c(1, 1, 1, 2, 1, 2, 2, 3), x = 1:8)

# Error because of duplicate values for (id, time)
try(lag(x~id+time, 1, base_dup))

# Error is bypassed, lag corresponds to first occurrence of (id, time)
lag(x~id+time, 1, base_dup, duplicate.method = "first")

# Playing with time steps
base = data.frame(id = rep(1:2, each = 4), time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)

# time step: 0.5 (here equivalent to lag of 1)
lag(x~id+time, 2, base, time.step = 0.5)

# Error: wrong time step
try(lag(x~id+time, 2, base, time.step = 7))

# Adding NAs + unsorted IDs
base = data.frame(id = rep(1:2, each = 4), time = c(4, NA, 3, 1, 2, NA, 1, 3), x = 1:8)

base$lag1 = lag(x~id+time, 1, base)
base$lag1_within = lag(x~id+time, 1, base, time.step = "w")
base_bis = base[order(base$id, base$time),]

print(base_bis)

# You can create variables without specifying the data within data.table:
if(require("data.table")){
  base = data.table(id = rep(1:2, each = 3), year = 1990 + rep(1:3, 2), x = 1:6)
  base[, x.l1 := lag(x~id+year, 1)]
}
Extracts the log-likelihood
Description
This function extracts the log-likelihood from a fixest estimation.
Usage
## S3 method for class 'fixest'
logLik(object, ...)
Arguments
object | A |
... | Not currently used. |
Details
This function extracts the log-likelihood based on the model fit. You can have more information on the likelihoods in the details of the function femlm.
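As a rough sketch of what "based on the model fit" means, the Poisson log-likelihood can be recomputed from the fitted means and compared with the value returned by logLik. This assumes a Poisson model estimated with femlm on count data; warpbreaks is a base R data set and the specification is illustrative only.

```r
library(fixest)

# Poisson ML estimation on count data (illustrative specification)
res = femlm(breaks ~ wool | tension, warpbreaks)

# Poisson log-likelihood recomputed from the fitted means
mu = fitted(res)
ll_manual = sum(dpois(warpbreaks$breaks, mu, log = TRUE))

# the two values should agree up to the convergence tolerance
c(logLik = as.numeric(logLik(res)), manual = ll_manual)
```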
Value
It returns a numeric scalar.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. Other statistics functions: AIC.fixest, BIC.fixest.
Examples
# simple estimation on iris data with "Species" fixed-effects
res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris)

nobs(res)
logLik(res)
Design matrix of a fixest object
Description
This function creates the left-hand-side or the right-hand-side(s) of a femlm, feols or feglm estimation.
Usage
## S3 method for class 'fixest'
model.matrix(
  object,
  data = NULL,
  type = "rhs",
  sample = "estimation",
  na.rm = FALSE,
  subset = FALSE,
  as.matrix = FALSE,
  as.df = FALSE,
  collin.rm = TRUE,
  ...
)
Arguments
object | A |
data | A data.frame or |
type | Character vector or one sided formula, default is "rhs". Contains the type of matrix/data.frame to be returned. Possible values are: "lhs", "rhs", "fixef", "iv.rhs1" (1st stage RHS), "iv.rhs2" (2nd stage RHS), "iv.endo" (endogenous vars.), "iv.exo" (exogenous vars), "iv.inst" (instruments). |
sample | Character scalar equal to "estimation" (default) or "original". Only used when If |
na.rm | Logical scalar, default is |
subset | Logical scalar or character vector. Default is |
as.matrix | Logical scalar, default is |
as.df | Logical scalar, default is |
collin.rm | Logical scalar, default is |
... | Not currently used. |
Value
It returns either a vector, a matrix or a data.frame. It returns a vector for the dependent variable ("lhs"), a data.frame for the fixed-effects ("fixef") and a matrix for any other type.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. formula.fixest, update.fixest, summary.fixest, vcov.fixest.
Examples
# we use a data set with NAs and fixed-effect singletons
base = setNames(iris, c("y", "x1", "x2", "x3", "fe"))

# adding NAs
base$x1[1:4] = NA

# adding singletons
base$fe = as.character(base$fe)
base$fe[10 + 1:5] = letters[1:5]

# OLS estimation where we remove singletons
est = feols(y ~ x1 + poly(x2, 2) | fe, base, fixef.rm = "singleton")

# by default, we have the data set used in the estimation
head(model.matrix(est))
nrow(model.matrix(est))

# to have the original data set: we need to use sample="original"
head(model.matrix(est, sample = "original"))
nrow(model.matrix(est, sample = "original"))

# we can drop only the NA values (and not the singletons) with na.rm=TRUE
head(model.matrix(est, sample = "original", na.rm = TRUE))
nrow(model.matrix(est, sample = "original", na.rm = TRUE))

#
# Illustration of subset
#

# subset => character vector
head(model.matrix(est, subset = "x1"))

# subset => TRUE, only works with data argument!!
head(model.matrix(est, data = base[, "x1", drop = FALSE], subset = TRUE))
Extracts the models tree from a fixest_multi object
Description
Extracts the meta information on all the models contained in a fixest_multi estimation.
Usage
models(x, simplify = FALSE)
Arguments
x | A |
simplify | Logical, default is |
Value
It returns a data.frame whose first column (named id) is the index of the models and the other columns contain the information specific to each model (e.g. which sample, which RHS, which dependent variable, etc).
See Also
multiple estimations in feols, n_models
Examples
# a multiple estimation
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x.[, 1:3]), base, fsplit = ~species)

# All the meta information
models(est)

# Illustration: Why use simplify
est_sub = est[sample = 2]
models(est_sub)
models(est_sub, simplify = TRUE)
Gets the dimension of fixest_multi objects
Description
Obtain the number of unique models of a fixest_multi object, depending on the type requested.
Usage
n_models(
  x,
  lhs = FALSE,
  rhs = FALSE,
  sample = FALSE,
  fixef = FALSE,
  iv = FALSE
)
Arguments
x | A |
lhs | Logical scalar, default is |
rhs | Logical scalar, default is |
sample | Logical scalar, default is |
fixef | Logical scalar, default is |
iv | Logical scalar, default is |
Value
It returns an integer scalar. If no argument is provided, the total number of models is returned.
See Also
Multiple estimations in feols, models
Examples
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x1, x2, x3), base, fsplit = ~species)

# there are 3 different RHSs and 4 different samples
models(est)

# We can obtain these numbers with n_models
n_models(est, rhs = TRUE)
n_models(est, sample = TRUE)
Prints the number of unique elements in a data set
Description
This utility tool displays the number of unique elements in one or multiple data.frames as well as their number of NA values.
Usage
n_unik(x)

## S3 method for class 'vec_n_unik'
print(x, ...)

## S3 method for class 'list_n_unik'
print(x, ...)
Arguments
x | A formula, with data set names on the LHS and variables on the RHS,like |
... | Not currently used. |
Value
It returns a vector containing the number of unique values per element. If several data sets were provided, a list is returned, as long as the number of data sets, each element being a vector of unique values.
Special values and functions
In the formula, you can use the following special values: ".", ".N", ".U", and ".NA".
".": Accesses the default values. If there is only one data set and the data set is not a data.table, then the default is to display the number of observations and the number of unique rows. If the data is a data.table, the number of unique items in the key(s) is displayed instead of the number of unique rows (if the table has keys of course). If there are two or more data sets, then the default is to display the unique items for: a) the variables common across all data sets, if there's less than 4, and b) if no variable is shown in a), the number of variables common across at least two data sets, provided there are less than 5. If the data sets are data tables, the keys are also displayed on top of the common variables. In any case, the number of observations is always displayed.
".N": Displays the number of observations.
".U": Displays the number of unique rows.
".NA": Displays the number of rows with at least one NA.
The NA function
The special function NA is an equivalent to is.na but can handle several variables. For instance, NA(x, y) is equivalent to is.na(x) | is.na(y). You can add as many variables as you want as arguments. If no argument is provided, as in NA(), it is identical to having all the variables of the data set as argument.
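The equivalences above can be spelled out in base R. This is only a sketch of the selections that NA(x, y) and NA() correspond to, on a small made-up data frame; inside n_unik the special function itself would be used.

```r
# toy data with missing values in x and y only
df = data.frame(x = c(1, NA, 3, 4), y = c(NA, NA, 2, 1), z = 1:4)

# what NA(x, y) selects: rows where x or y is missing
na_xy = is.na(df$x) | is.na(df$y)

# what NA() selects: rows with a missing value in any variable
na_any = !complete.cases(df)

sum(na_xy)
sum(na_any)
```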
Combining variables
Use the "hat", "^", operator to combine several variables. For example id^period will display the number of unique values of id x period combinations.
Use the "super hat", "%^%", operator to also include the terms on both sides. For example, instead of writing id + period + id^period, you can simply write id%^%period.
Alternatively, you can use : for ^ and * for %^%.
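To make the counts concrete, here is a base R sketch of the quantities that id, period and id^period refer to, on a small made-up data frame. It mirrors what the operators count and is not the package's implementation.

```r
df = data.frame(id = c(1, 1, 2, 2, 3), period = c(1, 2, 1, 1, 2))

n_id     = length(unique(df$id))                  # what `id` counts
n_period = length(unique(df$period))              # what `period` counts
n_pairs  = nrow(unique(df[, c("id", "period")]))  # what `id^period` counts

# id %^% period reports the three numbers together
c(id = n_id, period = n_period, "id^period" = n_pairs)
```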
Sub-selections
To show the number of unique values for sub samples, simply use []. For example, id[x > 10] will display the number of unique id for which x > 10.
Simple square brackets lead to the inclusion of both the variable and its subset. For example id[x > 10] is equivalent to id + id[[x > 10]]. To include only the sub selection, use double square brackets, as in id[[x > 10]].
You can add multiple sub selections at once, only separate them with a comma. For example id[x > 10, NA(y)] is equivalent to id[x > 10] + id[NA(y)].
Use the double negative operator, i.e. !!, to include both a condition and its opposite at once. For example id[!!x > 10] is equivalent to id[x > 10, !x > 10]. Double negative operators can be chained, like in id[!!cond1 & !!cond2], then the Cartesian product of all double negated conditions is returned.
Author(s)
Laurent Berge
Examples
data = base_did
data$x1.L1 = round(lag(x1~id+period, 1, data))

# By default, just the formatted number of observations
n_unik(data)

# Or the number of unique elements of a vector
n_unik(data$id)

# number of unique id values and id x period pairs
n_unik(data ~.N + id + id^period)

# use the %^% operator to include the terms on the two sides at once
# => same as id*period
n_unik(data ~.N + id %^% period)

# using sub selection with []
n_unik(data ~.N + period[!NA(x1.L1)])

# to show only the sub selection: [[]]
n_unik(data ~.N + period[[!NA(x1.L1)]])

# you can have multiple values in [],
# just separate them with a comma
n_unik(data ~.N + period[!NA(x1.L1), x1 > 7])

# to have both a condition and its opposite,
# use the !! operator
n_unik(data ~.N[!!NA(x1.L1)])

# the !! operator works within condition chains
n_unik(data ~.N[!!NA(x1.L1) & !!x1 > 7])

# Conditions can be distributed
n_unik(data ~ (id + period)[x1 > 7])

#
# Several data sets
#

# Typical use case: merging
# Let's create two data sets and merge them

data(base_did)
base_main = base_did
base_extra = sample_df(base_main[, c("id", "period")], 100)
base_extra$id[1:10] = 111:120
base_extra$period[11:20] = 11:20
base_extra$z = rnorm(100)

# You can use db1:db2 to compare the common keys in two data sets
n_unik(base_main:base_extra)

tmp = merge(base_main, base_extra, all.x = TRUE, by = c("id", "period"))

# You can show unique values for any variable, as before
n_unik(tmp + base_main + base_extra ~ id[!!NA(z)] + id^period)
Extracts the number of observations from a fixest object
Description
This function simply extracts the number of observations from a fixest object, obtained using the functions femlm, feols or feglm.
Usage
## S3 method for class 'fixest'
nobs(object, ...)
Arguments
object | A |
... | Not currently used. |
Value
It returns an integer.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.
Examples
# simple estimation on iris data with "Species" fixed-effects
res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris)

nobs(res)
logLik(res)
Extracts the observations used for the estimation
Description
This function extracts the observations used in a fixest estimation. The stats::case.names S3 method calls this function.
Usage
obs(x)

## S3 method for class 'fixest'
case.names(object, ...)
Arguments
x | A |
object | A |
... | Ignored |
Value
It returns a simple vector of integers.
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
base$y[1:5] = NA

# Split sample estimations
est_split = feols(y ~ x1, base, split = ~species)
(obs_setosa = obs(est_split[[1]]))
(obs_versi = obs(est_split[sample = "versi", drop = TRUE]))

est_versi = feols(y ~ x1, base, subset = obs_versi)

etable(est_split, est_versi)
Formatted object size
Description
Tool that returns a formatted object size, where the appropriate unit is automatically chosen.
Usage
osize(x)

## S3 method for class 'osize'
print(x, ...)
Arguments
x | Any R object. |
... | Not currently used. |
Value
Returns a character scalar.
Author(s)
Laurent Berge
Examples
osize(iris)

data(trade)
osize(trade)
Constructs a fixest panel data base
Description
Constructs a fixest panel data base out of a data.frame. This allows the use of leads and lags in fixest estimations and, if the data.frame was also a data.table::data.table, the creation of new variables from leads and lags.
Usage
panel(data, panel.id, time.step = NULL, duplicate.method = "none")
Arguments
data | A data.frame. |
panel.id | The panel identifiers. Can either be: i) a one sided formula (e.g. |
time.step | The method to compute the lags, default is |
duplicate.method | If several observations have the same id and time values, then the notion of lag is not defined for them. If |
Details
This function allows you to use leads and lags in a fixest estimation without having to provide the argument panel.id. It also offers more options on how to set the panel (with the additional arguments 'time.step' and 'duplicate.method').
When the initial data set was also a data.table, not all operations are supported and some may dissolve the fixest_panel. This is the case when creating subselections of the initial data with additional attributes (e.g. pdt[x>0, .(x, y, z)] would dissolve the fixest_panel, meaning only a data.table would be the result of the call).
If the initial data set was also a data.table, then you can create new variables from lags and leads using the functions l and f. See the example.
Value
It returns a data base identical to the one given in input, but with an additional attribute: “panel_info”. This attribute contains vectors used to efficiently create lags/leads of the data. When the data is subselected, some bookkeeping is performed on the attribute “panel_info”.
Author(s)
Laurent Berge
See Also
The estimation methods feols, fepois and feglm.
The functions l and f to create lags and leads within fixest_panel objects.
Examples
data(base_did)

# Setting a data set as a panel...
pdat = panel(base_did, ~id+period)

# ...then using the functions l and f
est1 = feols(y~l(x1, 0:1), pdat)
est2 = feols(f(y)~l(x1, -1:1), pdat)
est3 = feols(l(y)~l(x1, 0:3), pdat)
etable(est1, est2, est3, order = c("f", "^x"), drop="Int")

# or using the argument panel.id
feols(f(y)~l(x1, -1:1), base_did, panel.id = ~id+period)

# You can use panel.id in various ways:
pdat = panel(base_did, ~id+period)
# is identical to:
pdat = panel(base_did, c("id", "period"))
# and also to:
pdat = panel(base_did, "id,period")

# l() and f() can also be used within a data.table:
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Now since pdat_dt is also a data.table
  # you can create lags/leads directly
  pdat_dt[, x1_l1 := l(x1)]
  pdat_dt[, c("x1_l1_fill0", "y_f2") := .(l(x1, fill = 0), f(y, 2))]
}
Displaying the most notable fixed-effects
Description
This function plots the 5 fixed-effects with the highest and lowest values, for each of the fixed-effect dimensions. It takes as an argument the fixed-effects obtained from the function fixef.fixest after an estimation using femlm, feols or feglm.
Usage
## S3 method for class 'fixest.fixef'
plot(x, n = 5, ...)
Arguments
x | An object obtained from the function |
n | The number of fixed-effects to be drawn. Defaults to 5. |
... | Not currently used. |
Note that the fixed-effect coefficients might NOT be interpretable. This function is useful only for fully regular panels. If the data are not regular in the fixed-effect coefficients, this means that several ‘reference points’ are set to obtain the fixed-effects, thereby impeding their interpretation. In this case a warning is raised.
Author(s)
Laurent Berge
See Also
fixef.fixest to extract the fixed-effect coefficients. See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, and the function etable to visualize the results of multiple estimations.
Examples
data(trade)

# We estimate the effect of distance on trade
# => we account for 3 fixed-effects
est_pois = femlm(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# obtaining the fixed-effects coefficients
fe_trade = fixef(est_pois)

# plotting them
plot(fe_trade)
Predict method for fixest fits
Description
This function obtains predictions from a fitted model estimated with femlm, feols or feglm.
Usage
## S3 method for class 'fixest'
predict(
  object,
  newdata,
  type = c("response", "link"),
  se.fit = FALSE,
  interval = "none",
  level = 0.95,
  fixef = FALSE,
  vs.coef = FALSE,
  sample = c("estimation", "original"),
  vcov = NULL,
  ssc = NULL,
  ...
)
Arguments
object | A |
newdata | A data.frame containing the variables used to make the prediction.If not provided, the fitted expected (or linear if |
type | Character either equal to |
se.fit | Logical, default is |
interval | Either "none" (default), "confidence" or "prediction". What type of confidence interval to compute. Note that this feature is only available for OLS models not containing fixed-effects (GLM/ML models are not covered). |
level | A numeric scalar in between 0.5 and 1, defaults to 0.95. Only used when the argument 'interval' is requested, it corresponds to the width of the confidence interval. |
fixef | Logical scalar, default is |
vs.coef | Logical scalar, default is |
sample | Either "estimation" (default) or "original". This argument is only used when arg. 'newdata' is missing, and is ignored otherwise. If equal to "estimation", the vector returned matches the sample used for the estimation. If equal to "original", it matches the original data set (the observations not used for the estimation being filled with NAs). |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, either a formula of the form: |
ssc | An object of class |
... | Not currently used. |
Value
It returns a numeric vector of length equal to the number of observations in argument newdata. If newdata is missing, it returns a vector of the same length as the estimation sample, except if sample = "original", in which case the length of the vector will match that of the original data set (which may or may not coincide with the estimation sample).
If fixef = TRUE, a data.frame is returned.
If se.fit = TRUE or interval != "none", the object returned is a data.frame with the following columns: fit, se.fit, and, if CIs are requested, ci_low and ci_high.
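The return shapes described above can be checked on a small example. This is a minimal sketch, assuming the fixest package is installed; the choice of the airquality data (whose Ozone column has missing values) and the object names are illustrative only.

```r
library(fixest)

# airquality has missing values in Ozone, so the estimation sample
# is smaller than the original data set
est = feols(Ozone ~ Wind + Temp, airquality)

# Default: one prediction per observation used in the estimation
p_est = predict(est)

# sample = "original": one value per row of airquality, NA where dropped
p_orig = predict(est, sample = "original")

c(estimation = length(p_est), original = length(p_orig))

# se.fit / interval switch the return value to a data.frame
p_se = predict(est, se.fit = TRUE)
p_ci = predict(est, interval = "confidence")
names(p_se)
names(p_ci)
```

The model has no fixed-effects on purpose, since the interval argument is documented as available only for OLS models without fixed-effects.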
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. update.fixest, summary.fixest, vcov.fixest, fixef.fixest.
Examples
# Estimation on iris data
res = fepois(Sepal.Length ~ Petal.Length | Species, iris)
# what would be the prediction if the data was all setosa?
newdata = data.frame(Petal.Length = iris$Petal.Length, Species = "setosa")
pred_setosa = predict(res, newdata = newdata)
# Let's look at it graphically
plot(c(1, 7), c(3, 11), type = "n", xlab = "Petal.Length", ylab = "Sepal.Length")
newdata = iris[order(iris$Petal.Length), ]
newdata$Species = "setosa"
lines(newdata$Petal.Length, predict(res, newdata))
# versicolor
newdata$Species = "versicolor"
lines(newdata$Petal.Length, predict(res, newdata), col=2)
# virginica
newdata$Species = "virginica"
lines(newdata$Petal.Length, predict(res, newdata), col=3)
# The original data
points(iris$Petal.Length, iris$Sepal.Length, col = iris$Species, pch = 18)
legend("topleft", lty = 1, col = 1:3, legend = levels(iris$Species))
#
# Getting the fixed-effect coefficients for each obs.
#
data(trade)
est_trade = fepois(Euros ~ log(dist_km) | Destination^Product + Origin^Product + Year, trade)
obs_fe = predict(est_trade, fixef = TRUE)
head(obs_fe)
# can we check we get the right sum of fixed-effects
head(cbind(rowSums(obs_fe), est_trade$sumFE))
#
# Standard-error of the prediction
#
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ x1 + species, base)
head(predict(est, se.fit = TRUE))
# regular confidence interval
head(predict(est, interval = "conf"))
# adding the residual to the CI
head(predict(est, interval = "predi"))
# You can change the type of SE on the fly
head(predict(est, interval = "conf", vcov = ~species))
A print facility for fixest objects.
Description
This function is very similar to usual summary functions as it provides the table of coefficients along with other information on the fit of the estimation. The type of output can be customized by the user (using function setFixest_print).
Usage
## S3 method for class 'fixest'
print(x, n, type = "table", fitstat = NULL, ...)
setFixest_print(type = "table", fitstat = NULL)
getFixest_print()
Arguments
x | A |
n | Integer, number of coefficients to display. By default, only the first 8 coefficients are displayed if |
type | Either |
fitstat | A formula or a character vector representing which fit statistic to display. The types must be valid types of the function |
... | Other arguments to be passed to |
Details
It is possible to set the default values for the arguments type and fitstat by using the function setFixest_print.
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.
Examples
# Load trade data
data(trade)
# We estimate the effect of distance on trade
# => we account for 3 fixed-effects (FEs)
est_pois = fepois(Euros ~ log(dist_km)|Origin+Destination+Product, trade)
# displaying the results
print(est_pois)
# By default the coefficient table is displayed.
# If the user wishes to display only the coefficients, use option type:
print(est_pois, type = "coef")
# To permanently display coef. only, use setFixest_print:
setFixest_print(type = "coef")
est_pois
# back to default:
setFixest_print(type = "table")
#
# fitstat
#
# We modify which fit statistic to display
print(est_pois, fitstat = ~ . + lr)
# We add the LR test to the default (represented by the ".")
# to show only the LR stat:
print(est_pois, fitstat = ~ . + lr.stat)
# To modify the defaults:
setFixest_print(fitstat = ~ . + lr.stat + rmse)
est_pois
# Back to default (NULL == default)
setFixest_print(fitstat = NULL)
Print method for fit statistics of fixest estimations
Description
Displays a brief summary of selected fit statistics from the function fitstat.
Usage
## S3 method for class 'fixest_fitstat'
print(x, na.rm = FALSE, ...)
Arguments
x | An object resulting from the |
na.rm | Logical, default is |
... | Not currently used. |
Examples
data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)
# Extracting the 'working' number of observations used to compute the pvalues
fitstat(gravity, "g", simplify = TRUE)
# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)
# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)
# For wald and wf, you could show the pvalue instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)
# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them
# We need: a) type name, b) the function to be applied
# c) (optional) an alias
fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")
# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)
# Note that the custom stats we created can easily lead
# to errors, but that's another story!
Print method for fixest_multi objects
Description
Displays summary information on fixest_multi objects in the R console.
Usage
## S3 method for class 'fixest_multi'
print(x, type = "etable", ...)
Arguments
x | A |
type | A character either equal to |
... | Other arguments to be passed to |
See Also
The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)
# Let's print all that
res
R2s of fixest models
Description
Reports different R2s for fixest estimations (e.g. feglm or feols).
Usage
r2(x, type = "all", full_names = FALSE)
Arguments
x | |
type | A character vector representing the R2 to compute. The R2 codes are of the form: "wapr2" with letters "w" (within), "a" (adjusted) and "p" (pseudo) possibly missing. E.g. to get the regular R2: use |
full_names | Logical scalar, default is |
Details
The pseudo R2s are McFadden's R2s, that is, one minus the ratio of the log-likelihood of the estimated model to that of the null (constant-only) model.
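As an illustration of McFadden's measure, here is a minimal base R sketch (it does not call fixest). It uses the usual convention of one minus the ratio of the model log-likelihood to the log-likelihood of a constant-only model; the mtcars data and the Poisson specification are arbitrary choices made for the example.

```r
# Base R sketch: McFadden's pseudo R2 for a Poisson GLM
fit  = glm(carb ~ wt, family = poisson, data = mtcars)
null = glm(carb ~ 1,  family = poisson, data = mtcars)

ll_fit  = as.numeric(logLik(fit))
ll_null = as.numeric(logLik(null))

# One minus the ratio of log-likelihoods
pseudo_r2 = 1 - ll_fit / ll_null
pseudo_r2
```

Because both log-likelihoods are negative here and the fitted model nests the null model, the value lies between 0 and 1.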
For R2s with no theoretical justification (e.g. regular R2s for maximum likelihood models, or within R2s for models without fixed-effects), NA is returned. The single measure that can be used to compare all kinds of models is the squared correlation between the dependent variable and the expected predictor.
The pseudo-R2 is also returned in the OLS case; it corresponds to the pseudo-R2 of the equivalent GLM model with a Gaussian family.
For the adjusted within-R2s, the adjustment factor is (n - nb_fe) / (n - nb_fe - K), with n the number of observations, nb_fe the number of fixed-effects and K the number of variables.
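To make the adjustment factor concrete, here is a small numeric sketch in base R. All values are hypothetical, and the way the factor is applied (scaling the unexplained share, as in the usual adjusted R2) is an assumption made for illustration, not a statement about the package internals.

```r
# Hypothetical values, for illustration only
n     = 1000   # number of observations
nb_fe = 50     # number of fixed-effects
K     = 3      # number of variables

# Adjustment factor as given in the text
adj_factor = (n - nb_fe) / (n - nb_fe - K)
adj_factor

# Assumption: the factor scales the unexplained share of the within variation
within_r2     = 0.40
within_r2_adj = 1 - (1 - within_r2) * adj_factor
within_r2_adj
```

With many observations relative to K, the factor is close to 1 and the adjusted value stays close to the unadjusted one.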
Value
Returns a named vector.
Author(s)
Laurent Berge
Examples
# Load trade data
data(trade)
# We estimate the effect of distance on trade (with 3 fixed-effects)
est = feols(log(Euros) ~ log(dist_km) | Origin + Destination + Product, trade)
# Squared correlation:
r2(est, "cor2")
# "regular" r2:
r2(est, "r2")
# pseudo r2 (equivalent to GLM with Gaussian family)
r2(est, "pr2")
# adjusted within r2
r2(est, "war2")
# all four at once
r2(est, c("cor2", "r2", "pr2", "war2"))
# same with full names instead of codes
r2(est, c("cor2", "r2", "pr2", "war2"), full_names = TRUE)
Refactors a variable
Description
Takes a variable of any type, transforms it into a factor, and modifies the values of the factor. Useful in estimations when you want to set some value of a vector as a reference.
Usage
ref(x, ref)
Arguments
x | A vector of any type (must be atomic though). |
ref | A vector or a list, or special binning values (explained later). If a vector, it must correspond to (partially matched) values of the vector |
Value
It returns a factor of the same length as x, where levels have been modified according to the argument ref.
"Cutting" a numeric vector
Numeric vectors can be cut easily into: a) equal parts, b) user-specified bins.
Use "cut::n" to cut the vector into n (roughly) equal parts. Percentiles are used to partition the data, hence some data distributions can lead to fewer than n parts (for example if P0 is the same as P50).
The user can specify custom bins with the following syntax: "cut::a]b]c]". Here the numbers a, b, c, etc, are a sequence of increasing numbers, each followed by an open or closed square bracket. The numbers can be specified as either plain numbers (e.g. "cut::5]12[32["), quartiles (e.g. "cut::q1]q3["), or percentiles (e.g. "cut::p10]p15]p90]"). Values of different types can be mixed: "cut::5]q2[p80[" is valid provided the median (q2) is indeed greater than 5, otherwise an error is thrown.
The square bracket to the right of each number tells whether the number should be included in or excluded from the current bin. For example, say x ranges from 0 to 100, then "cut::5]" will create two bins: one from 0 to 5 and a second from 6 to 100. With "cut::5[" the bins would have been 0-4 and 5-100.
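The effect of the closing bracket can be checked directly. This is a minimal sketch, assuming the fixest package is installed; it uses the function bin (which shares the "cut::" syntax) on the integers 0 to 100 described above. The object names are illustrative.

```r
library(fixest)

x = 0:100

# "5]": the value 5 is included in the first bin (0 to 5, then 6 to 100)
b_closed = bin(x, "cut::5]")
table(b_closed)

# "5[": the value 5 is excluded from the first bin (0 to 4, then 5 to 100)
b_open = bin(x, "cut::5[")
table(b_open)
```

The two tables differ only in where the observation x = 5 is counted.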
A factor is always returned. The labels always report the min and max values in each bin.
To have user-specified bin labels, just add them in the character vector following 'cut::values'. You don't need to provide all of them, and NA values fall back to the default label. For example, bin = c("cut::4", "Q1", NA, "Q3") will modify only the first and third labels, which will be displayed as "Q1" and "Q3".
bin vs ref
The functions bin and ref are able to do the same thing, so why use one instead of the other? Here are the differences:
ref always returns a factor. This is in contrast with bin which returns, when possible, a vector of the same type as the input vector.
ref always places the modified values first in the factor levels. On the other hand, bin tries not to modify the ordering of the levels. It is possible to make bin mimic the behavior of ref by adding an "@" as the first element of the list in the argument bin.
When a vector (and not a list) is given in input, ref will place each element of the vector first in the factor levels. The behavior of bin is totally different: bin will transform all the values in the vector into a single value in x (i.e. it's binning).
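The difference for vector inputs can be seen on a short example. This is a minimal sketch, assuming the fixest package is installed; the use of the month column of airquality (values 5 to 9) and the object names are illustrative.

```r
library(fixest)

m = airquality$Month   # integer vector with values 5 to 9

# ref: returns a factor, the values 8 and 9 are moved to the front of the levels
r = ref(m, 8:9)
levels(r)

# bin: the values 5 and 6 are merged into a single value
b = bin(m, 5:6)
table(b)
```

With ref the number of distinct values is unchanged (only the level order moves), whereas bin reduces the five months to four groups.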
Author(s)
Laurent Berge
See Also
To bin the values of a vector: bin.
Examples
data(airquality)
# A vector of months
month_num = airquality$Month
month_lab = c("may", "june", "july", "august", "september")
month_fact = factor(month_num, labels = month_lab)
table(month_num)
table(month_fact)
#
# Main use
#
# Without argument: equivalent to as.factor
ref(month_num)
# Main usage: to set a level first:
# (Note that partial matching is enabled.)
table(ref(month_fact, "aug"))
# You can rename the level on-the-fly
# (Northern hemisphere specific!)
table(ref(month_fact, .("Hot month"="aug", "Late summer" = "sept")))
# Main use is in estimations:
a = feols(Petal.Width ~ Petal.Length + Species, iris)
# We change the reference
b = feols(Petal.Width ~ Petal.Length + ref(Species, "vers"), iris)
etable(a, b)
#
# Binning
#
# You can also bin factor values on the fly
# Using @ first means a regular expression will be used to match the values.
# Note that the value created is placed first.
# To avoid that behavior => use the function "bin"
table(ref(month_fact, .(summer = "@jul|aug|sep")))
# Please refer to the example in the bin help page for more examples.
# The syntax is the same.
#
# Precise relocation
#
# You can place a factor at the location you want
# by adding "@digit" in the name first:
table(ref(month_num, .("@5"=5)))
# Same with renaming
table(ref(month_num, .("@5 five"=5)))
Replicates fixest objects
Description
Simple function that replicates fixest objects while (optionally) computing different standard-errors. Useful mostly in combination with etable or coefplot.
Usage
## S3 method for class 'fixest'
rep(x, times = 1, each = 1, vcov, ...)
## S3 method for class 'fixest_list'
rep(x, times = 1, each = 1, vcov, ...)
.l(...)
Arguments
x | Either a |
times | Integer vector giving the number of repetitions of the vector of elements. By default |
each | Integer scalar indicating the repetition of each element. Default is 1. |
vcov | A list containing the types of standard-error to be computed, default is missing. If not missing, it must be of the same length as |
... | In |
Details
To apply rep.fixest on a list of fixest objects, it is absolutely necessary to use .l() and not list().
Value
Returns a list of the appropriate length. Each element of the list is a fixest object.
Examples
# Let's show results with different standard-errors
est = feols(Ozone ~ Solar.R + Wind + Temp, data = airquality)
my_vcov = list(~ Month, ~ Day, ~ Day + Month)
etable(rep(est, vcov = my_vcov))
coefplot(rep(est, vcov = my_vcov), drop = "Int")
#
# To rep multiple objects, you need to use .l()
#
est_bis = feols(Ozone ~ Solar.R + Wind + Temp | Month, airquality)
etable(rep(.l(est, est_bis), vcov = my_vcov))
# using each
etable(rep(.l(est, est_bis), each = 3, vcov = my_vcov))
Extracts residuals from a fixest object
Description
This function extracts residuals from a fitted model estimated with femlm, feols or feglm.
Usage
## S3 method for class 'fixest'
resid(object, type = c("response", "deviance", "pearson", "working"), na.rm = TRUE, ...)
## S3 method for class 'fixest'
residuals(object, type = c("response", "deviance", "pearson", "working"), na.rm = TRUE, ...)
Arguments
object | A |
type | A character scalar, either |
na.rm | Logical, default is |
... | Not currently used. |
Value
It returns a numeric vector whose length is the number of observations used for the estimation (if na.rm = TRUE) or the length of the original data set (if na.rm = FALSE).
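The two possible lengths can be compared on a data set with missing values. This is a minimal sketch, assuming the fixest package is installed; the airquality data (whose Ozone column contains NAs) and the object names are illustrative.

```r
library(fixest)

# Observations with a missing Ozone value are dropped from the estimation
est = feols(Ozone ~ Wind, airquality)

# Default (na.rm = TRUE): one residual per observation used in the estimation
r_used = resid(est)

# na.rm = FALSE: one value per row of the original data set
r_full = resid(est, na.rm = FALSE)

c(used = length(r_used), full = length(r_full))

# Number of rows of the original data that carry no residual
sum(is.na(r_full))
```

The second form is convenient when the residuals must be stored back as a column of the original data.frame.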
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. fitted.fixest, predict.fixest, summary.fixest, vcov.fixest, fixef.fixest.
Examples
# simple estimation on iris data, using "Species" fixed-effects
res_poisson = femlm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width | Species, iris)
# we plot the residuals
plot(resid(res_poisson))
Extracts the residuals from a fixest_multi object
Description
Utility to extract the residuals from multiple fixest estimations. If possible, all the residuals are coerced into a matrix.
Usage
## S3 method for class 'fixest_multi'
resid(object, type = c("response", "deviance", "pearson", "working"), na.rm = FALSE, ...)
## S3 method for class 'fixest_multi'
residuals(object, type = c("response", "deviance", "pearson", "working"), na.rm = FALSE, ...)
Arguments
object | A |
type | A character scalar, either |
na.rm | Logical, default is |
... | Not currently used. |
Value
If all the models return residuals of the same length, a matrix is returned. Otherwise, a list is returned.
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
# A multiple estimation
est = feols(y ~ x1 + csw0(x2, x3), base)
# We can get all the residuals at once,
# each column is a model
head(resid(est))
# We can select/order the model using fixest_multi extraction
head(resid(est[rhs = .N:1]))
Randomly draws observations from a data set
Description
This function is useful to check a data set. It returns a random sample of rows of the input data set.
Usage
sample_df(x, n = 10, previous = FALSE)
Arguments
x | A data set: either a vector, a matrix or a data frame. |
n | The number of rows/elements to sample randomly. |
previous | Logical scalar. Whether the results of the previous draw should be returned. |
Value
A data set (resp. vector) with n rows (resp. elements).
Author(s)
Laurent Berge
Examples
sample_df(iris)
sample_df(iris, previous = TRUE)
Functions exported from sandwich to implement fixest methods
Description
The package fixest does not use estfun or bread from sandwich, but these methods have been implemented to allow users to leverage the variances from sandwich.
Details
Here is the help from package sandwich: estfun and bread. The help from package fixest is here: estfun.fixest and bread.fixest.
Sets the defaults of coefplot
Description
You can set the default values of most arguments of coefplot with this function.
Usage
setFixest_coefplot(
  style, horiz = FALSE, dict = getFixest_dict(), keep,
  ci.width = "1%", ci_level = 0.95, pt.pch = 20, pt.bg = NULL,
  cex = 1, pt.cex = cex, col = 1:8, pt.col = col, ci.col = col,
  lwd = 1, pt.lwd = lwd, ci.lwd = lwd, ci.lty = 1,
  grid = TRUE, grid.par = list(lty = 3, col = "gray"),
  zero = TRUE, zero.par = list(col = "black", lwd = 1),
  pt.join = FALSE, pt.join.par = list(col = pt.col, lwd = lwd),
  ci.join = FALSE, ci.join.par = list(lwd = lwd, col = col, lty = 2),
  ci.fill = FALSE, ci.fill.par = list(col = "lightgray", alpha = 0.5),
  ref.line = "auto", ref.line.par = list(col = "black", lty = 2),
  lab.cex, lab.min.cex = 0.85, lab.max.mar = 0.25, lab.fit = "auto",
  xlim.add, ylim.add, sep, bg,
  group = "auto", group.par = list(lwd = 2, line = 3, tcl = 0.75),
  main = "Effect on __depvar__", value.lab = "Estimate and __ci__ Conf. Int.",
  ylab = NULL, xlab = NULL, sub = NULL, reset = FALSE
)
getFixest_coefplot()
Arguments
style | A character scalar giving the style of the plot to be used. You can set styles with the function |
horiz | A logical scalar, default is |
dict | A named character vector or a logical scalar. It changes the original variable names to the ones contained in the |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
ci.width | The width of the extremities of the confidence intervals. Default is |
ci_level | Scalar between 0 and 1: the level of the CI. By default it is equal to 0.95. |
pt.pch | The plotting symbol (pch) of the coefficient estimates. Default is 1 (circle). |
pt.bg | The background color of the point estimate (when the |
cex | Numeric, default is 1. Expansion factor for the points |
pt.cex | The size of the coefficient estimates. Default is the other argument |
col | The color of the points and the confidence intervals. Default is 1 ("black"). Note that you can set the colors separately for each of them with |
pt.col | The color of the coefficient estimates. Default is equal to the argument |
ci.col | The color of the confidence intervals. Default is equal to the argument |
lwd | General line width. Default is 1. |
pt.lwd | The line width of the coefficient estimates. Default is equal to the other argument |
ci.lwd | The line width of the confidence intervals. Default is equal to the other argument |
ci.lty | The line type of the confidence intervals. Default is 1. |
grid | Logical, default is |
grid.par | List. Parameters of the grid. The default values are: |
zero | Logical, default is |
zero.par | List. Parameters of the zero-line. The default values are |
pt.join | Logical, default is |
pt.join.par | List. Parameters of the line joining the coefficients. The default values are: |
ci.join | Logical default to |
ci.join.par | A list of parameters to be passed to |
ci.fill | Logical default to |
ci.fill.par | A list of parameters to be passed to |
ref.line | Logical or numeric, default is "auto", whose behavior depends on the situation. It is |
ref.line.par | List. Parameters of the vertical line on the reference. The default values are: |
lab.cex | The size of the labels of the coefficients. Default is missing. It is automatically set by an internal algorithm which can go as low as |
lab.min.cex | The minimum size of the coefficients labels, as set by the internal algorithm. Default is 0.85. |
lab.max.mar | The maximum size the left margin can take when trying to fit the coefficient labels into it (only when |
lab.fit | The method to fit the coefficient labels into the plotting region (only when |
xlim.add | A numeric vector of length 1 or 2. It represents an extension factor of xlim, in percentage. Eg: |
ylim.add | A numeric vector of length 1 or 2. It represents an extension factor of ylim, in percentage. Eg: |
sep | The distance between two estimates – only when argument |
bg | Background color for the plot. By default it is white. |
group | A list, default is missing. Each element of the list reports the coefficients to be grouped while the name of the element is the group name. Each element of the list can be either: i) a character vector of length 1, ii) of length 2, or iii) a numeric vector. If equal to: i) then it is interpreted as a pattern: all elements matching the regular expression will be grouped (note that you can use the special character "^^" to clean the beginning of the names, see example), if ii) it corresponds to the first and last elements to be grouped, if iii) it corresponds to the coefficients numbers to be grouped. If equal to a character vector, you can use a percentage to tell the algorithm to look at the coefficients before aliasing (e.g. |
group.par | A list of parameters controlling the display of the group. The parameters controlling the line are: |
main | The title of the plot. Default is |
value.lab | The label to appear on the side of the coefficient values. If |
ylab | The label of the y-axis, default is |
xlab | The label of the x-axis, default is |
sub | A subtitle, default is |
reset | Logical, default is |
Value
Doesn't return anything.
See Also
Examples
# coefplot has many arguments, which makes it highly flexible.
# If you don't like the default style of coefplot. No worries,
# you can set *your* default by using the function
# setFixest_coefplot()
# Estimation
est = feols(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width | Species, iris)
# Plot with default style
coefplot(est)
# Now we permanently change some arguments
dict = c("Petal.Length"="Length (Petal)", "Petal.Width"="Width (Petal)",
         "Sepal.Length"="Length (Sepal)", "Sepal.Width"="Width (Sepal)")
setFixest_coefplot(ci.col = 2, pt.col = "darkblue", ci.lwd = 3, pt.cex = 2,
                   pt.pch = 15, ci.width = 0, dict = dict)
# Tadaaa!
coefplot(est)
# To reset to the default settings:
setFixest_coefplot("all", reset = TRUE)
coefplot(est)
Sets/gets the dictionary relabeling the variables
Description
Sets/gets the default dictionary used in the functions etable, did_means and coefplot. The dictionaries are used to relabel variables (usually towards a fancier, more explicit formatting) when exporting them into a Latex table or displaying them in graphs. By setting the dictionary with setFixest_dict, you can avoid providing the argument dict.
Usage
setFixest_dict(dict = NULL, ..., reset = FALSE)
getFixest_dict()
Arguments
dict | A named character vector or a character scalar. E.g. to change my variables named "a" and "b" to (resp.) "$log(a)$" and "$bonus^3$", then use |
... | You can add arguments of the form: |
reset | Logical, default is |
Details
By default the dictionary only grows. This means that successive calls will not erase the previous definitions unless the argument reset has been set to TRUE.
The default dictionary is equivalent to having setFixest_dict("(Intercept)" = "Constant"). To change this default, you need to provide a new definition to "(Intercept)" explicitly.
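The growing behavior can be verified by inspecting the dictionary between calls. This is a minimal sketch, assuming the fixest package is installed; the variable names and labels are illustrative.

```r
library(fixest)

# Start from a clean state
setFixest_dict(reset = TRUE)

# Two successive calls: the second one does not erase the first
setFixest_dict(c(Origin = "Country of Origin"))
setFixest_dict(Destination = "Country of Destination")

d = getFixest_dict()
d

# Clean up: back to the default dictionary
setFixest_dict(reset = TRUE)
```

Resetting at the end avoids carrying the custom labels over to later tables and graphs in the same session.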
Author(s)
Laurent Berge
Examples
data(trade)
est = feols(log(Euros) ~ log(dist_km)|Origin+Destination+Product, trade)
# we export the result & rename some variables
etable(est, dict = c("log(Euros)"="Euros (ln)", Origin="Country of Origin"))
# If you export many tables, it can be more convenient to use setFixest_dict:
setFixest_dict(c("log(Euros)"="Euros (ln)", Origin="Country of Origin"))
etable(est) # variables are properly relabeled
# The dictionary only 'grows'
# Here you get the previous two variables + the new one that are relabeled
# Btw you can set the dictionary directly using the argument names:
setFixest_dict(Destination = "Country of Destination")
etable(est)
# Another way to set a dictionary: with a character string:
# See the help page of as.dict
dict = "log(dist_km): Distance (ln); Product: Type of Good"
setFixest_dict(dict)
etable(est)
# And now we reset:
setFixest_dict(reset = TRUE)
etable(est)
Default arguments for fixest estimations
Description
This function sets globally the default arguments of fixest estimations.
Usage
setFixest_estimation(
  data = NULL, panel.id = NULL, fixef.rm = "perfect_fit",
  fixef.tol = 1e-06, fixef.iter = 10000, collin.tol = 1e-10,
  lean = FALSE, verbose = 0, warn = TRUE, fixef.keep_names = NULL,
  demeaned = FALSE, mem.clean = FALSE, glm.iter = 25, glm.tol = 1e-08,
  data.save = FALSE, reset = FALSE
)
getFixest_estimation()
Arguments
data | A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with this |
panel.id | The panel identifiers. Can either be: i) a one sided formula (e.g. |
fixef.rm | Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none". This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it). The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed. If "none": no observation is removed. Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors). The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining. |
fixef.tol | Precision used to obtain the fixed-effects. Defaults to |
fixef.iter | Maximum number of iterations in fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000. |
collin.tol | Numeric scalar, default is |
lean | Logical scalar, default is |
verbose | Integer. Higher values give more information. In particular, it can detail the number of iterations in the demeaning algorithm (the first number is the left-hand-side, the other numbers are the right-hand-side variables). |
warn | Logical, default is |
fixef.keep_names | Logical or |
demeaned | Logical, default is |
mem.clean | Logical scalar, default is |
glm.iter | Number of iterations of the glm algorithm. Default is 25. |
glm.tol | Tolerance level for the glm algorithm. Default is |
data.save | Logical scalar, default is |
reset | Logical scalar, default is |
Value
The function getFixest_estimation returns the currently set global defaults.
Examples
#
# Example: removing singletons is FALSE by default
#
# => changing this default
# Let's create data with singletons
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
base$fe_singletons = as.character(base$species)
base$fe_singletons[1:5] = letters[1:5]
res = feols(y ~ x1 + x2 | fe_singletons, base)
res_noSingle = feols(y ~ x1 + x2 | fe_singletons, base, fixef.rm = "single")
# New defaults
setFixest_estimation(fixef.rm = "single")
res_newDefault = feols(y ~ x1 + x2 | fe_singletons, base)
etable(res, res_noSingle, res_newDefault)
# Resetting the defaults
setFixest_estimation(reset = TRUE)
Sets/gets formula macros
Description
You can set formula macros globally with setFixest_fml. These macros can then be used in fixest estimations or when using the function xpd.
Usage
setFixest_fml(..., reset = FALSE)
getFixest_fml()
Arguments
... | Definition of the macro variables. Each argument name corresponds to the name of the macro variable. It is required that each macro variable name starts with two dots (e.g. |
reset | A logical scalar, defaults to |
Details
In xpd, the default macro variables are taken from getFixest_fml. Any value in the ... argument of xpd will replace these default values.
The definitions of the macro variables replace the macro variables verbatim. Therefore, you can include multipart formulas if you wish, but then beware of the position of the macro variables in the formula. For example, using the airquality data, say you want to set as controls the variable Temp and Day fixed-effects, you can do setFixest_fml(..ctrl = ~Temp | Day), but then feols(Ozone ~ Wind + ..ctrl, airquality) will be quite different from feols(Ozone ~ ..ctrl + Wind, airquality), so beware!
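The order issue can be made visible by expanding both formulas with xpd before any estimation. This is a minimal sketch, assuming the fixest package is installed; the object names f1 and f2 are illustrative.

```r
library(fixest)

# A multipart macro: Temp as a control and Day as a fixed-effect
setFixest_fml(..ctrl = ~ Temp | Day)

# The macro is pasted verbatim, so its position matters
f1 = xpd(Ozone ~ Wind + ..ctrl)
f2 = xpd(Ozone ~ ..ctrl + Wind)

f1
f2

# Clean up the global macros
setFixest_fml(reset = TRUE)
```

Printing the two expanded formulas shows on which side of the pipe the variable Wind ends up in each case.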
Value
The function getFixest_fml() returns a list of character strings, the names corresponding to the macro variable names, the character strings corresponding to their definition.
See Also
xpd to make use of formula macros.
Examples
# Small examples with airquality data
data(airquality)
# we set two macro variables
setFixest_fml(..ctrl = ~ Temp + Day, ..ctrl_long = ~ poly(Temp, 2) + poly(Day, 2))

# Using the macro in lm with xpd:
lm(xpd(Ozone ~ Wind + ..ctrl), airquality)
lm(xpd(Ozone ~ Wind + ..ctrl_long), airquality)

# You can use the macros without xpd() in fixest estimations
a = feols(Ozone ~ Wind + ..ctrl, airquality)
b = feols(Ozone ~ Wind + ..ctrl_long, airquality)
etable(a, b, keep = "Int|Win")

# Using .[]
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
i = 2:3
z = "species"
lm(xpd(y ~ x.[2:3] + .[z]), base)

# No xpd() needed in feols
feols(y ~ x.[2:3] + .[z], base)

#
# Auto completion with '..' suffix
#

# You can trigger variables autocompletion with the '..' suffix
# You need to provide the argument data
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
xpd(y ~ x.., data = base)

# In fixest estimations, this is automatically taken care of
feols(y ~ x.., data = base)

#
# You can use xpd for stepwise estimations
#

# Note that for stepwise estimations in fixest, you can use
# the stepwise functions: sw, sw0, csw, csw0
# -> see help in feols or in the dedicated vignette

# we want to look at the effect of x1 on y
# controlling for different variables
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# We first create a matrix with all possible combinations of variables
my_args = lapply(names(base)[-(1:2)], function(x) c("", x))
(all_combs = as.matrix(do.call("expand.grid", my_args)))

res_all = list()
for(i in 1:nrow(all_combs)){
  res_all[[i]] = feols(xpd(y ~ x1 + ..v, ..v = all_combs[i, ]), base)
}

etable(res_all)
coefplot(res_all, group = list(Species = "^^species"))

#
# You can use macros to grep variables in your data set
#

# Example 1: setting a macro variable globally
data(longley)
setFixest_fml(..many_vars = grep("GNP|ployed", names(longley), value = TRUE))
feols(Armed.Forces ~ Population + ..many_vars, longley)

# Example 2: using ..("regex") or regex("regex") to grep the variables "live"
feols(Armed.Forces ~ Population + ..("GNP|ployed"), longley)

# Example 3: same as Ex.2 but without using a fixest estimation
# Here we need to use xpd():
lm(xpd(Armed.Forces ~ Population + regex("GNP|ployed"), data = longley), longley)

# Stepwise estimation with regex: use a comma after the parenthesis
feols(Armed.Forces ~ Population + sw(regex(,"GNP|ployed")), longley)

# Multiple LHS
etable(feols(..("GNP|ployed") ~ Population, longley))

#
# lhs and rhs arguments
#

# to create a one sided formula from a character vector
vars = letters[1:5]
xpd(rhs = vars)

# Alternatively, to replace the RHS
xpd(y ~ 1, rhs = vars)

# To create a two sided formula
xpd(lhs = "y", rhs = vars)

#
# argument 'add'
#

xpd(~x1, add = ~ x2 + x3)

# also works with character vectors
xpd(~x1, add = c("x2", "x3"))

# only adds to the RHS
xpd(y ~ x, add = ~bon + jour)

#
# argument add.after_pipe
#

xpd(~x1, add.after_pipe = ~ x2 + x3)

# we can add a two sided formula
xpd(~x1, add.after_pipe = x2 ~ x3)

#
# Dot square bracket operator
#

# The basic use is to add variables in the formula
x = c("x1", "x2")
xpd(y ~ .[x])

# Alternatively, one-sided formulas can be used and their content will be inserted verbatim
x = ~x1 + x2
xpd(y ~ .[x])

# You can create multiple variables at once
xpd(y ~ x.[1:5] + z.[2:3])

# You can summon variables from the environment to complete variables names
var = "a"
xpd(y ~ x.[var])

# ... the variables can be multiple
vars = LETTERS[1:3]
xpd(y ~ x.[vars])

# You can have "complex" variable names but they must be nested in character form
xpd(y ~ .["x.[vars]_sq"])

# DSB can be used within regular expressions
re = c("GNP", "Pop")
xpd(Unemployed ~ regex(".[re]"), data = longley)
# => equivalent to regex("GNP|Pop")

# Use .[,var] (NOTE THE COMMA!) to expand with commas
# !! can break the formula if misused
vars = c("wage", "unemp")
xpd(c(y.[,1:3]) ~ csw(.[,vars]))

# Example of use of .[] within a loop
res_all = list()
for(p in 1:3){
  res_all[[p]] = feols(Ozone ~ Wind + poly(Temp, .[p]), airquality)
}

etable(res_all)

# The former can be compactly estimated with:
res_compact = feols(Ozone ~ Wind + sw(.[, "poly(Temp, .[1:3])"]), airquality)

etable(res_compact)

# How does it work?
# 1) .[, stuff] evaluates stuff and, if a vector, aggregates it with commas
#    Comma aggregation is done thanks to the comma placed after the square bracket
#    If .[stuff], then aggregation is with sums.
# 2) stuff is evaluated, and if it is a character string, it is evaluated with
#    the function dsb which expands values in .[]
#
# Wrapping up:
# 2) evaluation of dsb("poly(Temp, .[1:3])") leads to the vector:
#    c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")
# 1) .[, c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")] leads to
#    poly(Temp, 1), poly(Temp, 2), poly(Temp, 3)
#
# Hence sw(.[, "poly(Temp, .[1:3])"]) becomes:
#       sw(poly(Temp, 1), poly(Temp, 2), poly(Temp, 3))

#
# In non-fixest functions: guessing the data allows to use regex
#

# When used in non-fixest functions, the algorithm tries to "guess" the data
# so that ..("regex") can be directly evaluated without passing the argument 'data'
data(longley)
lm(xpd(Armed.Forces ~ Population + ..("GNP|ployed")), longley)

# same for the auto completion with '..'
lm(xpd(Armed.Forces ~ Population + GN..), longley)
Sets properties of fixest_multi objects
Description
Use this function to change the default behavior of fixest_multi objects.
Usage
setFixest_multi(drop = FALSE)

getFixest_multi()
Arguments
drop | Logical scalar, default is |
Value
The function getFixest_multi() returns the list of settings.
Examples
# 1) let's run a multiple estimation
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x1, x2, x3), base)

# 2) let's pick a single estimation => by default we have a `fixest_multi` object
class(est[rhs = 2])

# `drop = TRUE` would have led to a `fixest` object
class(est[rhs = 2, drop = TRUE])

# 3) change the default behavior
setFixest_multi(drop = TRUE)
class(est[rhs = 2])
Sets/gets whether to display notes in fixest estimation functions
Description
Sets/gets the default for whether notes (reporting NA values and removed observations) should be displayed in fixest estimation functions.
Usage
setFixest_notes(x)

getFixest_notes()
Arguments
x | A logical. If |
Author(s)
Laurent Berge
Examples
# Change default with
setFixest_notes(FALSE)
feols(Ozone ~ Solar.R, airquality)

# Back to default which is TRUE
setFixest_notes(TRUE)
feols(Ozone ~ Solar.R, airquality)
Sets/gets the number of threads to use in fixest functions
Description
Sets/gets the default number of threads to use in fixest estimation functions. The default is the maximum number of threads minus two.
Usage
setFixest_nthreads(nthreads, save = FALSE)

getFixest_nthreads()
Arguments
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. If missing, the default is to use 50% of all threads. |
save | Either a logical or equal to |
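The three accepted forms of nthreads can be illustrated with a small base R sketch. The helper resolve_nthreads is hypothetical and is not part of fixest; the rounding rule for fractions and the capping of large integers are assumptions made for the illustration only.

```r
# Hypothetical helper (NOT a fixest function): maps the documented forms of
# `nthreads` to a thread count, given a maximum number of threads.
resolve_nthreads <- function(nthreads, max_threads) {
  stopifnot(is.numeric(nthreads), length(nthreads) == 1, nthreads >= 0)
  if (nthreads == 0) {
    # b) 0 means that all available threads are used
    return(as.integer(max_threads))
  }
  if (nthreads < 1) {
    # c) fraction of all threads; rounding up is an assumption
    return(as.integer(max(1, ceiling(nthreads * max_threads))))
  }
  # a) an integer; capping at the maximum is an assumption
  as.integer(min(nthreads, max_threads))
}

resolve_nthreads(0, 8)    # all threads
resolve_nthreads(0.5, 8)  # half of the threads
resolve_nthreads(2, 8)    # two threads
```

In practice, the value would simply be passed to setFixest_nthreads(), which handles these cases itself.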
Author(s)
Laurent Berge
Examples
# Gets the current number of threads
(nthreads_origin = getFixest_nthreads())

# To set multi-threading off:
setFixest_nthreads(1)

# To set it back to default at startup:
setFixest_nthreads()

# And back to the original value
setFixest_nthreads(nthreads_origin)
Sets the default type of standard errors to be used
Description
This function defines or extracts the default type of standard-errors to be computed in fixest summary and vcov.
Usage
setFixest_vcov(
  no_FE = "iid",
  one_FE = "iid",
  two_FE = "iid",
  panel = "iid",
  all = NULL,
  reset = FALSE
)

getFixest_vcov()
Arguments
no_FE | Character scalar equal to either: |
one_FE | Character scalar equal to either: |
two_FE | Character scalar equal to either: |
panel | Character scalar equal to either: |
all | Character scalar equal to either: |
reset | Logical, default is |
Value
The function getFixest_vcov() returns a list with three elements containing the default for estimations i) without, ii) with one, or iii) with two or more fixed-effects.
Examples
# By default: 'standard' VCOVs
data(base_did)
est_no_FE = feols(y ~ x1, base_did)
est_one_FE = feols(y ~ x1 | id, base_did)
est_two_FE = feols(y ~ x1 | id + period, base_did)
est_panel = feols(y ~ x1 | id + period, base_did, panel.id = ~id + period)

etable(est_no_FE, est_one_FE, est_two_FE)

# Changing the default standard-errors
setFixest_vcov(no_FE = "hetero", one_FE = "cluster", two_FE = "twoway", panel = "drisc")
etable(est_no_FE, est_one_FE, est_two_FE, est_panel)

# Resetting the defaults
setFixest_vcov(reset = TRUE)
Residual standard deviation of fixest estimations
Description
Extract the estimated standard deviation of the errors from fixest estimations.
Usage
## S3 method for class 'fixest'
sigma(object, ...)
Arguments
object | A |
... | Not currently used. |
Value
Returns a numeric scalar.
See Also
feols, fepois, feglm, fenegbin, feNmlm.
Examples
est = feols(Petal.Length ~ Petal.Width, iris)
sigma(est)
Design matrix of a fixest object returned in sparse format
Description
This function creates the left-hand-side or the right-hand-side(s) of a femlm, feols or feglm estimation.
Usage
sparse_model_matrix(
  object,
  data,
  type = "rhs",
  sample = "estimation",
  na.rm = FALSE,
  collin.rm = NULL,
  combine = TRUE,
  ...
)
Arguments
object | A |
data | If missing (default) then the original data is obtained by evaluating the |
type | Character vector or one sided formula, default is "rhs". Contains the type of matrix/data.frame to be returned. Possible values are: "lhs", "rhs", "fixef", "iv.rhs1" (1st stage RHS), "iv.rhs2" (2nd stage RHS), "iv.endo" (endogenous vars.), "iv.exo" (exogenous vars), "iv.inst" (instruments). |
sample | Character scalar equal to "estimation" (default) or "original". Only used when If |
na.rm | Default is |
collin.rm | Logical scalar. Whether to remove variables that were found to be collinear during the estimation. Beware: it does not perform a collinearity check and bases on the |
combine | Logical scalar, default is |
... | Not currently used. |
Value
It returns either a single sparse matrix or a list of matrices, depending on whether combine is TRUE or FALSE. The sparse matrix is of class dgCMatrix from the Matrix package.
Author(s)
Laurent Berge, Kyle Butts
See Also
See also the main estimation functions femlm, feols or feglm. formula.fixest, update.fixest, summary.fixest, vcov.fixest.
Examples
est = feols(wt ~ i(vs) + hp | cyl, mtcars)
sparse_model_matrix(est)
sparse_model_matrix(wt ~ i(vs) + hp | cyl, mtcars)
Governs the small sample correction in fixest VCOVs
Description
Specifies how the small sample correction should be calculated in vcov.fixest/summary.fixest.
Usage
ssc(
  K.adj = TRUE,
  K.fixef = "nonnested",
  K.exact = FALSE,
  G.adj = TRUE,
  G.df = "min",
  t.df = "min",
  ...
)

setFixest_ssc(ssc.type = ssc())

getFixest_ssc()
Arguments
K.adj | Logical scalar, defaults to |
K.fixef | Character scalar equal to |
K.exact | Logical, default is |
G.adj | Logical scalar, default is |
G.df | Either "conventional" or "min" (default). Only relevant when the variance-covariance matrix is two-way clustered (or higher). It governs how the small sample adjustment for the clusters is to be performed. [Sorry for the jargon that follows.] By default a unique adjustment is made, of the form G_min/(G_min-1) with G_min the smallest G_i. If |
t.df | Either "conventional", "min" (default) or an integer scalar. Only relevant when the variance-covariance matrix is clustered. It governs how the p-values should be computed. By default, the degrees of freedom of the Student t distribution is equal to the minimum size of the clusters with which the VCOV has been clustered minus one. If |
... | Only used internally (to catch deprecated parameters). |
ssc.type | An object of class |
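The default "min" rules for G.df and t.df can be illustrated with plain arithmetic. The sketch below uses base R only and made-up cluster counts; it reads "minimum size of the clusters" as the smallest number of clusters across the clustering dimensions, which is an assumption, and it does not call fixest.

```r
# Hypothetical cluster counts G_i for a two-way clustered VCOV (illustration only)
G <- c(id = 50, period = 10)

# G.df = "min": a single adjustment factor based on the smallest G_i
G_min <- min(G)
adj_min <- G_min / (G_min - 1)

# t.df = "min": degrees of freedom of the Student t distribution
t_df <- G_min - 1

# Two-sided p-value of a t-statistic of 2.1 under these degrees of freedom
p_value <- 2 * pt(-abs(2.1), df = t_df)

c(adjustment = adj_min, t_df = t_df, p_value = p_value)
```

With few clusters in one dimension, both the adjustment factor and the p-values move noticeably away from their large-sample counterparts, which is why these settings matter in small samples.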
Details
The vignette "On standard-errors" describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.
Value
It returns a ssc.type object.
Author(s)
Laurent Berge
See Also
Examples
#
# Equivalence with lm/glm standard-errors
#

# LM
# In the absence of fixed-effects,
# by default, the standard-errors are computed in the same way
res = feols(Petal.Length ~ Petal.Width + Species, iris)
res_lm = lm(Petal.Length ~ Petal.Width + Species, iris)
vcov(res) / vcov(res_lm)

# GLM
# By default, there is no small sample adjustment in glm, as opposed to feglm.
# To get the same SEs, we need to use ssc(K.adj = FALSE)
res_pois = fepois(round(Petal.Length) ~ Petal.Width + Species, iris)
res_glm = glm(round(Petal.Length) ~ Petal.Width + Species, iris, family = poisson())
vcov(res_pois, ssc = ssc(K.adj = FALSE)) / vcov(res_glm)

# Same example with the Gamma
res_gamma = feglm(round(Petal.Length) ~ Petal.Width + Species, iris, family = Gamma())
res_glm_gamma = glm(round(Petal.Length) ~ Petal.Width + Species, iris, family = Gamma())
vcov(res_gamma, ssc = ssc(K.adj = FALSE)) / vcov(res_glm_gamma)

#
# Fixed-effects corrections
#

# We create "irregular" FEs
base = data.frame(x = rnorm(10))
base$y = base$x + rnorm(10)
base$fe1 = rep(1:3, c(4, 3, 3))
base$fe2 = rep(1:5, each = 2)

est = feols(y ~ x | fe1 + fe2, base)

# fe1: 3 FEs
# fe2: 5 FEs

#
# Clustered standard-errors: by fe1
#

# Default: K.fixef = "nonnested"
#  => adjustment K = 1 + 5 (i.e. x + fe2)
summary(est)
attributes(vcov(est, attr = TRUE))[c("ssc", "df.K")]

# K.fixef = "none"
#  => adjustment K = 1 (i.e. only x)
summary(est, ssc = ssc(K.fixef = "none"))
attr(vcov(est, ssc = ssc(K.fixef = "none"), attr = TRUE), "df.K")

# K.fixef = "full"
#  => adjustment K = 1 + 3 + 5 - 1 (i.e. x + fe1 + fe2 - 1 restriction)
summary(est, ssc = ssc(K.fixef = "full"))
attr(vcov(est, ssc = ssc(K.fixef = "full"), attr = TRUE), "df.K")

# K.fixef = "full" & K.exact = TRUE
#  => adjustment K = 1 + 3 + 5 - 2 (i.e. x + fe1 + fe2 - 2 restrictions)
summary(est, ssc = ssc(K.fixef = "full", K.exact = TRUE))
attr(vcov(est, ssc = ssc(K.fixef = "full", K.exact = TRUE), attr = TRUE), "df.K")

# There are two restrictions:
attr(fixef(est), "references")

#
# To permanently set the default ssc:
#

# eg no small sample adjustment:
setFixest_ssc(ssc(K.adj = FALSE))

# Factory default
setFixest_ssc()
Stepwise estimation tools
Description
Functions to perform stepwise estimations in fixest models.
Usage
sw(...)

csw(...)

sw0(...)

csw0(...)

mvsw(...)
Arguments
... | Represents formula variables to be added in a stepwise fashion to an estimation. |
Details
To include multiple independent variables in a stepwise fashion, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0 and mvsw, explained below.
Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 instead of sw, the model y ~ x1 would also have been estimated. The 0 in the name implies that the model without any stepwise element will also be estimated.
The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning: the model without the stepwise elements is also estimated. In other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.
The last stepwise function, mvsw, refers to 'multiverse' stepwise. It will estimate as many models as there are unique combinations of stepwise variables. For example fml = y ~ x1 + mvsw(x2, x3) will estimate y ~ x1, y ~ x1 + x2, y ~ x1 + x3, y ~ x1 + x2 + x3. Beware that the number of estimations grows quickly (2^n, with n the number of stepwise variables)!
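The expansion rules described above can be sketched in base R. The helper expand_stepwise below is hypothetical and only lists the right-hand sides that each stepwise function implies; it is not how fixest implements them.

```r
# Hypothetical helper (NOT a fixest function): lists the right-hand sides
# implied by each stepwise function for a base term and stepwise variables.
expand_stepwise <- function(base, vars,
                            type = c("sw", "sw0", "csw", "csw0", "mvsw")) {
  type <- match.arg(type)
  n <- length(vars)
  cumulative <- lapply(seq_len(n), function(i) vars[seq_len(i)])
  sets <- switch(type,
    sw   = as.list(vars),
    sw0  = c(list(character(0)), as.list(vars)),
    csw  = cumulative,
    csw0 = c(list(character(0)), cumulative),
    mvsw = {
      # all 2^n subsets of the stepwise variables, the empty set included
      grid <- expand.grid(rep(list(c(FALSE, TRUE)), n))
      lapply(seq_len(nrow(grid)), function(i) vars[unlist(grid[i, ])])
    }
  )
  vapply(sets, function(s) paste(c(base, s), collapse = " + "), character(1))
}

expand_stepwise("x1", c("x2", "x3"), "sw")
expand_stepwise("x1", c("x2", "x3"), "csw0")
expand_stepwise("x1", c("x2", "x3"), "mvsw")
```

The mvsw case makes the 2^n growth concrete: three stepwise variables already give eight right-hand sides.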
Examples
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

# Regular stepwise
feols(y ~ sw(x1, x2, x3), base)

# Cumulative stepwise
feols(y ~ csw(x1, x2, x3), base)

# Using the 0
feols(y ~ x1 + x2 + sw0(x3), base)

# Multiverse stepwise
feols(y ~ x1 + mvsw(x2, x3), base)
Style of data.frames created by etable
Description
This function describes the style of data.frames created with the function etable.
Usage
style.df(
  depvar.title = "Dependent Var.:",
  fixef.title = "Fixed-Effects:",
  fixef.line = "-",
  fixef.prefix = "",
  fixef.suffix = "",
  slopes.title = "Varying Slopes:",
  slopes.line = "-",
  slopes.format = "__var__ (__slope__)",
  stats.title = "_",
  stats.line = "_",
  yesNo = c("Yes", "No"),
  headers.sep = TRUE,
  signif.code = c(`***` = 0.001, `**` = 0.01, `*` = 0.05, . = 0.1),
  interaction.combine = " x ",
  i.equal = " = ",
  default = FALSE
)
Arguments
depvar.title | Character scalar. Default is |
fixef.title | Character scalar. Default is |
fixef.line | A single character. Default is |
fixef.prefix | Character scalar. Default is |
fixef.suffix | Character scalar. Default is |
slopes.title | Character scalar. Default is |
slopes.line | Character scalar. Default is |
slopes.format | Character scalar. Default is |
stats.title | Character scalar. Default is |
stats.line | Character scalar. Default is |
yesNo | Character vector of length 1 or 2. Default is |
headers.sep | Logical, default is |
signif.code | Named numeric vector, used to provide the significance codes with respect to the p-value of the coefficients. Default is |
interaction.combine | Character scalar, defaults to |
i.equal | Character scalar, defaults to |
default | Logical, default is |
Details
The title elements (depvar.title, fixef.title, slopes.title and stats.title) will be the row names of the returned data.frame. Therefore keep in mind that no two of them should be identical (since identical row names are forbidden in data.frames).
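The row-name constraint can be checked with base R alone. The sketch below does not call etable; the helper make_table and the title vectors are made up for the illustration.

```r
# Titles that are all distinct can serve as row names
titles_ok  <- c("Dependent Var.:", "Fixed-Effects:", "Varying Slopes:", "_")
# Two empty titles collide: row names must be unique
titles_bad <- c("Dependent Var.:", "", "", "_")

# Hypothetical stand-in for a table whose row names are the titles
make_table <- function(titles) {
  data.frame(model = seq_along(titles), row.names = titles)
}

ok  <- make_table(titles_ok)
bad <- tryCatch(make_table(titles_bad), error = function(e) conditionMessage(e))

rownames(ok)
bad  # the error message raised for the duplicated row names
```

This is why setting, say, both fixef.title and slopes.title to the empty string is best avoided.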
Value
It returns an object of class fixest_style_df.
Examples
# Multiple estimations => see details in feols
aq = airquality
est = feols(c(Ozone, Solar.R) ~ Wind + csw(Temp, Temp^2, Temp^3) | Month + Day, data = aq)

# Default result
etable(est)

# Playing a bit with the styles
etable(est, style.df = style.df(fixef.title = "", fixef.suffix = " FE", stats.line = " ", yesNo = "yes"))
Style definitions for Latex tables
Description
This function describes the style of Latex tables to be exported with the function etable.
Usage
style.tex(
  main = "base",
  depvar.title,
  model.title,
  model.format,
  line.top,
  line.bottom,
  var.title,
  fixef.title,
  fixef.prefix,
  fixef.suffix,
  fixef.where,
  slopes.title,
  slopes.format,
  fixef_sizes.prefix,
  fixef_sizes.suffix,
  stats.title,
  notes.intro,
  notes.tpt.intro,
  tablefoot,
  tablefoot.value,
  yesNo,
  tabular = "normal",
  depvar.style,
  no_border,
  caption.after,
  rules_width,
  signif.code,
  tpt,
  arraystretch,
  adjustbox = NULL,
  fontsize,
  interaction.combine = " $\\times$ ",
  i.equal = " $=$ "
)
Arguments
main | Either "base", "aer" or "qje". Defines the basic style to start from. The styles"aer" and "qje" are almost identical and only differ on the top/bottom lines. |
depvar.title | A character scalar. The title of the line of the dependent variables (defaults to |
model.title | A character scalar. The title of the line of the models (defaults to |
model.format | A character scalar. The value to appear on top of each column. It defaults to |
line.top | A character scalar equal to |
line.bottom | A character scalar equal to |
var.title | A character scalar. The title line appearing before the variables (defaults to |
fixef.title | A character scalar. The title line appearing before the fixed-effects (defaults to |
fixef.prefix | A prefix to add to the fixed-effects names. Defaults to |
fixef.suffix | A suffix to add to the fixed-effects names. Defaults to |
fixef.where | Either "var" or "stats". Where to place the fixed-effects lines? Defaults to |
slopes.title | A character scalar. The title line appearing before the variables with varying slopes (defaults to |
slopes.format | Character scalar representing the format of the slope variable name. There are two special characters: "var" and "slope", placeholders for the variable and slope names. Defaults to |
fixef_sizes.prefix | A prefix to add to the fixed-effects names. Defaults to |
fixef_sizes.suffix | A suffix to add to the fixed-effects names. Defaults to |
stats.title | A character scalar. The title line appearing before the statistics (defaults to |
notes.intro | A character scalar. Some tex code appearing just before the notes, defaults to |
notes.tpt.intro | Character scalar. Only used if |
tablefoot | A logical scalar. Whether or not to display a footer within the table. Defaults to |
tablefoot.value | A character scalar. The notes to be displayed in the footer. Defaults to |
yesNo | A character vector of length 1 or 2. Defaults to |
tabular | (Tex only.) Character scalar equal to "normal" (default), |
depvar.style | Character scalar equal to either |
no_border | Logical, default is |
caption.after | Character scalar. Tex code that will be placed right after the caption. Defaults to |
rules_width | Character vector of length 1 or 2. This vector gives the width of the |
signif.code | Named numeric vector, used to provide the significance codes with respect to the p-value of the coefficients. Default is |
tpt | (Tex only.) Logical scalar, default is FALSE. Whether to use the |
arraystretch | (Tex only.) A numeric scalar, default is |
adjustbox | (Tex only.) A logical, numeric or character scalar, default is |
fontsize | (Tex only.) A character scalar, default is |
interaction.combine | Character scalar, defaults to |
i.equal | Character scalar, defaults to |
Details
The \\checkmark command, used in the "aer" style (in argument yesNo), is in the amssymb package.
The commands \\toprule, \\midrule and \\bottomrule are in the booktabs package. You can set the width of the top/bottom rules with \\setlength\\heavyrulewidth{wd}, and of the midrule with \\setlength\\lightrulewidth{wd}.
Note that all titles (depvar.title, fixef.title, etc) are not escaped, so they must be valid Latex expressions.
Value
Returns a list containing the style parameters.
See Also
Examples
# Multiple estimations => see details in feols
aq = airquality
est = feols(c(Ozone, Solar.R) ~ Wind + csw(Temp, Temp^2, Temp^3) | Month + Day, data = aq)

# Playing a bit with the styles
etable(est, tex = TRUE)
etable(est, tex = TRUE, style.tex = style.tex("aer"))

etable(est, tex = TRUE, style.tex = style.tex("aer", var.title = "\\emph{Expl. Vars.}", model.format = "[i]", yesNo = "x", tabular = "*"))
Summary of a fixest object. Computes different types of standard errors.
Description
This function is similar to print.fixest. It provides the table of coefficients along with other information on the fit of the estimation. It can compute different types of standard errors. The new variance-covariance matrix is included in the returned object.
Usage
## S3 method for class 'fixest'
summary(
  object,
  vcov = NULL,
  cluster = NULL,
  ssc = NULL,
  stage = NULL,
  lean = FALSE,
  agg = NULL,
  forceCovariance = FALSE,
  se = NULL,
  keepBounded = FALSE,
  n = 1000,
  vcov_fix = TRUE,
  nthreads = getFixest_nthreads(),
  ...
)

## S3 method for class 'fixest_list'
summary(
  object,
  se,
  cluster,
  ssc = getFixest_ssc(),
  vcov = NULL,
  stage = 2,
  lean = FALSE,
  n,
  ...
)
Arguments
object | A |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
ssc | An object of class |
stage | Can be equal to |
lean | Logical, default is |
agg | A character scalar describing the variable names to be aggregated, it is pattern-based. For |
forceCovariance | (Advanced users.) Logical, default is |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
keepBounded | (Advanced users – |
n | Integer, default is 1000. Number of coefficients to display when the print method is used. |
vcov_fix | Logical scalar, default is |
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can set permanently the number of threads used within this package using the function |
... | Only used if the argument |
Value
It returns a fixest object with:
cov.scaled | The new variance-covariance matrix (computed according to the argument |
se | The new standard-errors (computed according to the argument |
coeftable | The table of coefficients with the new standard errors. |
Compatibility with sandwich package
The VCOVs from sandwich can be used with feols, feglm and fepois estimations. If you want to have a sandwich VCOV when using summary.fixest, you can use the argument vcov to specify the VCOV function to use (see examples). Note that if you do so and you use a formula in the cluster argument, an innocuous warning can pop up if you used several non-numeric fixed-effects in the estimation (this is due to the function expand.model.frame used in sandwich).
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. Use fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.
Examples
# Load trade data
data(trade)

# We estimate the effect of distance on trade (with 3 fixed-effects)
est_pois = fepois(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# Comparing different types of standard errors
sum_standard = summary(est_pois, vcov = "iid")
sum_hetero = summary(est_pois, vcov = "hetero")
sum_oneway = summary(est_pois, vcov = "cluster")
sum_twoway = summary(est_pois, vcov = "twoway")

etable(sum_standard, sum_hetero, sum_oneway, sum_twoway)

# Alternative ways to cluster the SE:
summary(est_pois, vcov = cluster ~ Product + Origin)
summary(est_pois, vcov = ~Product + Origin)
summary(est_pois, cluster = ~Product + Origin)

# You can interact the clustering variables "live" using the var1 ^ var2 syntax.
summary(est_pois, vcov = ~Destination^Product)

#
# Newey-West and Driscoll-Kraay SEs
#

data(base_did)
# Simple estimation on a panel
est = feols(y ~ x1, base_did)

# --
# Newey-West
# Use the syntax NW ~ unit + time
summary(est, NW ~ id + period)

# Now take a lag of 3:
summary(est, NW(3) ~ id + period)

# --
# Driscoll-Kraay
# Use the syntax DK ~ time
summary(est, DK ~ period)

# Now take a lag of 3:
summary(est, DK(3) ~ period)

# --
# Implicit deductions
# When the estimation is done with a panel.id, you don't need to
# specify these values.
est_panel = feols(y ~ x1, base_did, panel.id = ~id + period)

# Both methods, NW and DK, now work automatically
summary(est_panel, "NW")
summary(est_panel, "DK")

#
# VCOVs robust to spatial correlation
#

data(quakes)
est_geo = feols(depth ~ mag, quakes)

# --
# Conley
# Use the syntax: conley(cutoff) ~ lat + lon
# with lat/lon the latitude/longitude variable names in the data set
summary(est_geo, conley(100) ~ lat + long)

# Change the cutoff, and how the distance is computed
summary(est_geo, conley(200, distance = "spherical") ~ lat + long)

# --
# Implicit deduction
# By default the latitude and longitude are directly fetched in the data based
# on pattern matching. So you don't have to specify them.
# Further an automatic cutoff is computed by default.

# The following works
summary(est_geo, "conley")

#
# Compatibility with sandwich
#

# You can use the VCOVs from sandwich by using the argument vcov:
library(sandwich)
summary(est_pois, vcov = vcovCL, cluster = trade[, c("Destination", "Product")])
Summary method for fixed-effects coefficients
Description
This function summarizes the main characteristics of the fixed-effects coefficients.It shows the number of fixed-effects that have been set as references and the firstelements of the fixed-effects.
Usage
## S3 method for class 'fixest.fixef'
summary(object, n = 5, ...)
Arguments
object | An object returned by the function |
n | Positive integer, defaults to 5. The |
... | Not currently used. |
Value
It prints the number of fixed-effect coefficients per fixed-effect dimension, as well as the number of fixed-effects used as references for each dimension, and the mean and variance of the fixed-effect coefficients. Finally, it reports the first 5 (arg. n) elements of each fixed-effect.
Author(s)
Laurent Berge
See Also
femlm, fixef.fixest, plot.fixest.fixef.
Examples
data(trade)

# We estimate the effect of distance on trade
# => we account for 3 fixed-effects
est_pois = femlm(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# obtaining the fixed-effects coefficients
fe_trade = fixef(est_pois)

# printing some summary information on the fixed-effects coefficients:
summary(fe_trade)
Summary for fixest_multi objects
Description
Summary information for fixest_multi objects. In particular, this is used to specify the type of standard-errors to be computed.
Usage
## S3 method for class 'fixest_multi'
summary(
  object,
  type = "etable",
  vcov = NULL,
  se = NULL,
  cluster = NULL,
  ssc = NULL,
  stage = 2,
  lean = FALSE,
  n = 1000,
  ...
)
Arguments
object | A |
type | A character either equal to |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
ssc | An object of class |
stage | Can be equal to |
lean | Logical, default is |
n | Integer, default is 1000. Number of coefficients to display when the print method is used. |
... | Not currently used. |
Value
It returns either an object of class fixest_multi (if type equals short or long), or a data.frame (if type equals compact or se_compact).
See Also
The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)

# By default, the type is "etable"
# You can still use the arguments from summary.fixest
summary(res, se = "hetero")

summary(res, type = "long")

summary(res, type = "compact")

summary(res, type = "se_compact")

summary(res, type = "se_long")
Sun and Abraham interactions
Description
User-level method to implement staggered difference-in-difference estimations a la Sun and Abraham (Journal of Econometrics, 2021).
Usage
sunab(
  cohort,
  period,
  ref.c = NULL,
  ref.p = -1,
  bin,
  bin.rel,
  bin.c,
  bin.p,
  att = FALSE,
  no_agg = FALSE
)

sunab_att(cohort, period, ref.c = NULL, ref.p = -1)
Arguments
cohort | A vector representing the cohort. It should represent the period at which the treatment has been received (and thus be fixed for each unit). |
period | A vector representing the period. It can be either a relative time period (with negative values representing the periods before the treatment and positive values the periods after the treatment), or a regular time period. In the latter case, the relative time period will be created from the cohort information (which represents the time at which the treatment has been received). |
ref.c | A vector of references for the cohort. By default the never treated cohorts are taken as reference and the always treated are excluded from the estimation. You can add more references with this argument, which means that dummies will not be created for them (but they will remain in the estimation). |
ref.p | A vector of references for the (relative!) period. By default the first relative period (RP) before the treatment, i.e. -1, is taken as reference. You can instead use your own references (i.e. RPs for which dummies will not be created – but these observations remain in the sample). Please note that you will need at least two references. You can use the special variables |
bin | A list of values to be grouped, a vector, or the special value |
bin.rel | A list or a vector defining which values to bin. Only applies to the relative periods and not the cohorts. Please refer to the help of the argument |
bin.c | A list or a vector defining which values to bin. Only applies to the cohort. Please refer to the help of the argument |
bin.p | A list or a vector defining which values to bin. Only applies to the period. Please refer to the help of the argument |
att | Logical, default is |
no_agg | Logical, default is |
Details
This function creates a matrix of cohort x relative_period interactions, and if used within a fixest estimation, the coefficients will automatically be aggregated to obtain the ATT for each relative period. In practice, the coefficients are aggregated with the aggregate.fixest function whose argument agg is automatically set to the appropriate value.
The SA method requires relative periods (negative/positive for before/after the treatment). Either the user computes the relative periods (RPs) themselves, or the RPs are computed on the fly from the periods and the cohorts (which then should represent the treatment period).
The never treated, which are the cohorts displaying only negative RPs, are used as references (i.e. no dummy will be constructed for them). On the other hand, the always treated are removed from the estimation, by means of adding NAs for each of their observations.
If the RPs have to be constructed on the fly, any cohort that is not present in the period is considered as never treated. This means that if the period ranges from 1995 to 2005, cohort = 1994 will be considered as never treated, although it should be considered as always treated: so be careful.
If you construct your own relative periods, the control cohorts should have only negative RPs.
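The on-the-fly construction of relative periods, and the caveat about cohorts lying outside the observed periods, can be illustrated with a base R sketch. The toy data are made up and the code only mirrors the rule stated above; it is not the internal code of sunab.

```r
# Hypothetical toy data: three units observed every year from 1995 to 2005
period <- rep(1995:2005, times = 3)
cohort <- rep(c(2000, 1994, 3000), each = 11)  # treatment year of each unit

# Relative period = calendar period minus treatment period
rel_period <- period - cohort

# Rule described above: a cohort value that never appears among the
# periods is considered as never treated
never_treated <- !(cohort %in% period)

# The 1994 cohort is flagged as never treated although it is always treated
table(cohort, never_treated)
range(rel_period[cohort == 2000])
```

A quick tabulation like this one, run on your own data before the estimation, helps catch cohorts that would be silently used as controls.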
Value
If not used within a fixest estimation, this function will return a matrix of interacted coefficients.
Binning
You can bin periods with the arguments bin, bin.c, bin.p and/or bin.rel.
The argument bin applies both to the original periods and cohorts (the cohorts will also be binned!). This argument only works when the period represents "calendar" periods (not relative ones!).
Alternatively you can bin the periods with bin.p (either "calendar" or relative); or the cohorts with bin.c.
The argument bin.rel applies only to the relative periods (hence not to the cohorts) once they have been created.
To understand how binning works, please have a look at the help and examples of the function bin.
Binning can be done in many different ways: just remember that a binning being possible does not mean that it makes sense!
Author(s)
Laurent Berge
Examples
# Simple DiD example
data(base_stagg)
head(base_stagg)

# Note that the year_treated is set to 1000 for the never treated
table(base_stagg$year_treated)
table(base_stagg$time_to_treatment)

# The DiD estimation
res_sunab = feols(y ~ x1 + sunab(year_treated, year) | id + year, base_stagg)
etable(res_sunab)

# By default the reference periods are the first year and the year before the treatment
# i.e. ref.p = c(-1, .F); where .F is a shortcut for the first period.
# Say you want to set as references the first three periods on top of -1
res_sunab_3ref = feols(y ~ x1 + sunab(year_treated, year, ref.p = c(.F + 0:2, -1)) | id + year, base_stagg)

# Display the two results
iplot(list(res_sunab, res_sunab_3ref))

# ... + show all refs
iplot(list(res_sunab, res_sunab_3ref), ref = "all")

#
# ATT
#

# To get the total ATT, you can use summary with the agg argument:
summary(res_sunab, agg = "ATT")

# You can also look at the total effect per cohort
summary(res_sunab, agg = "cohort")

#
# Binning
#

# Binning can be done in many different ways

# binning the cohort
est_bin.c = feols(y ~ x1 + sunab(year_treated, year, bin.c = 3:2) | id + year, base_stagg)

# binning the period
est_bin.p = feols(y ~ x1 + sunab(year_treated, year, bin.p = 3:1) | id + year, base_stagg)

# binning both the cohort and the period
est_bin = feols(y ~ x1 + sunab(year_treated, year, bin = 3:1) | id + year, base_stagg)

# binning the relative period, grouping every two years
est_bin.rel = feols(y ~ x1 + sunab(year_treated, year, bin.rel = "bin::2") | id + year, base_stagg)

etable(est_bin.c, est_bin.p, est_bin, est_bin.rel, keep = "year")
Extract the terms
Description
This function extracts the terms of a fixest estimation, excluding the fixed-effects part.
Usage
## S3 method for class 'fixest'
terms(x, ...)
Arguments
x | A |
... | Not currently used. |
Value
An object of class c("terms", "formula") which contains the terms representation of a symbolic model.
Examples
# simple estimation on iris data, using "Species" fixed-effects
res = feols(Sepal.Length ~ Sepal.Width*Petal.Length + Petal.Width | Species, iris)

# Terms of the linear part
terms(res)
Fast transform of any type of vector(s) into an integer vector
Description
Tool to transform any type of vector, or even a combination of vectors, into an integer vector ranging from 1 to the number of unique values. This effectively creates a unique identifier vector.
Usage
to_integer(
  ...,
  inputs = NULL,
  sorted = FALSE,
  add_items = FALSE,
  items.list = FALSE,
  multi.df = FALSE,
  multi.join = "_",
  na.valid = FALSE,
  internal = FALSE
)
Arguments
... | Vectors of any type, to be transformed into a single integer vector ranging from 1 to the number of unique elements. |
inputs | A list of inputs, by default it is |
sorted | Logical, default is |
add_items | Logical, default is |
items.list | Logical, default is |
multi.df | Logical, default is |
multi.join | Character scalar used to join the items of multiple vectors.The default is |
na.valid | Logical, default is |
internal | Logical, default is |
Value
Returns a vector of the same length as the input vectors. If add_items=TRUE and items.list=TRUE, a list of two elements is returned: x being the integer vector and items being the unique values to which the values in x make reference.
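To make the relation between the integer vector and its items concrete, here is a base R sketch of the same idea using unique and match. It is not the implementation of to_integer (which is compiled code), and the assumption that the unsorted numbering follows the order of first appearance is made only for this illustration.

```r
x = c("b", "a", "b", "c", "a")

# Unsorted version: items taken in order of first appearance (assumption)
items = unique(x)
idx = match(x, items)

# Sorted version: the integers refer to the sorted unique values
items_sorted = sort(unique(x))
idx_sorted = match(x, items_sorted)

# In both cases the items indexed by the integer vector give back the input
all(items[idx] == x)
all(items_sorted[idx_sorted] == x)
```

The last two checks mirror the property used in the examples of this page: the result can safely be used as an index into the items.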
Author(s)
Laurent Berge
Examples
x1 = iris$Species
x2 = as.integer(iris$Sepal.Length)

# transforms the species vector into integers
to_integer(x1)

# To obtain the "items":
to_integer(x1, add_items = TRUE)
# same but in list form
to_integer(x1, add_items = TRUE, items.list = TRUE)

# transforms x2 into an integer vector from 1 to 4
to_integer(x2, add_items = TRUE)

# To have the sorted items:
to_integer(x2, add_items = TRUE, sorted = TRUE)

# placing the three side by side
head(cbind(x2, as_index = to_integer(x2),
           as_index_sorted = to_integer(x2, sorted = TRUE)))

# The result can safely be used as an index
res = to_integer(x2, add_items = TRUE, sorted = TRUE, items.list = TRUE)
all(res$items[res$x] == x2)

#
# Multiple vectors
#

to_integer(x1, x2, add_items = TRUE)

# You can use multi.join to handle the join of the items:
to_integer(x1, x2, add_items = TRUE, multi.join = "; ")

# alternatively, return the items as a data.frame
to_integer(x1, x2, add_items = TRUE, multi.df = TRUE)

#
# NA values
#

x1_na = c("a", "a", "b", NA, NA, "b", "a", "c", NA)
x2_na = c(NA, 1, NA, 1, 1, 1, 2, 2, 2)

# by default the NAs are propagated
to_integer(x1_na, x2_na, add_items = TRUE)

# but you can treat them as valid values with na.valid = TRUE
to_integer(x1_na, x2_na, add_items = TRUE, na.valid = TRUE)

#
# programmatic use
#

# the argument `inputs` can be used for easy programmatic use
all_vars = list(x1_na, x2_na)
to_integer(inputs = all_vars)
Trade data sample
Description
This data reports trade information between countries of the European Union (EU15).
Usage
data(trade, package = "fixest")
Format
trade is a data frame with 38,325 observations and 6 variables named Destination, Origin, Product, Year, dist_km and Euros.
Origin: 2-digit codes of the countries of origin of the trade flow.
Destination: 2-digit codes of the countries of destination of the trade flow.
Product: Number representing the product categories (from 1 to 20).
Year: Years from 2007 to 2016.
dist_km: Geographic distance in km between the centers of the countries of origin and destination.
Euros: The total amount in euros of the trade flow for the specific year/product category/origin-destination country pair.
Source
This data was extracted from Eurostat in October 2017.
Dissolves a fixest panel
Description
Transforms a fixest_panel object into a regular data.frame.
Usage
unpanel(x)
Arguments
x | A |
Value
Returns a data set of the exact same dimension. Only the attribute 'panel_info' is erased.
Author(s)
Laurent Berge
See Also
Alternatively, the function panel changes a data.frame into a panel from which the functions l and f (creating leads and lags) can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).
Examples
data(base_did)

# Setting a data set as a panel
pdat = panel(base_did, ~id+period)

# ... allows you to use leads and lags in estimations
feols(y~l(x1, 0:1), pdat)

# Now unpanel => returns the initial data set
class(pdat) ; dim(pdat)
new_base = unpanel(pdat)
class(new_base) ; dim(new_base)
Updates a fixest estimation
Description
Updates and re-estimates a fixest model (estimated with femlm, feols or feglm). This function updates the formulas and uses the previous starting values to estimate a new fixest model. The data is obtained from the original call.
Usage
## S3 method for class 'fixest'
update(
  object,
  fml.update = NULL,
  fml = NULL,
  nframes = 1,
  use_calling_env = TRUE,
  evaluate = TRUE,
  ...
)

## S3 method for class 'fixest_multi'
update(
  object,
  fml.update = NULL,
  fml = NULL,
  nframes = 1,
  use_calling_env = TRUE,
  evaluate = TRUE,
  ...
)
Arguments
object | A |
fml.update | A formula representing the changes to be made to the original formula. By default it is |
fml | A formula, default is |
nframes | (Advanced users.) Defaults to 1. Only used if the argument |
use_calling_env | Logical scalar, default is |
evaluate | Logical, default is |
... | Other arguments to be passed to the functions |
Value
It returns a fixest object (see details in femlm, feols or feglm).
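The dot notation used in the update formula (a dot standing for "what was there before") is the same convention as in base R's formula updating. The following sketch uses only base R on a plain formula, with variable names borrowed from the trade data; the fixed-effects part after the pipe is specific to this package and is not covered by this base R illustration.

```r
# Base R illustration of the dot convention in an update formula
fml_old = Euros ~ log(dist_km)

# ". ~ . + log(Year)" keeps the left- and right-hand sides and adds a term
fml_new = update(fml_old, . ~ . + log(Year))
fml_new

# ". ~ . - log(dist_km)" keeps everything but removes a term
fml_less = update(fml_new, . ~ . - log(dist_km))
fml_less
```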
Author(s)
Laurent Berge
See Also
See also the main estimation functions femlm, feols or feglm. predict.fixest, summary.fixest, vcov.fixest, fixef.fixest.
Examples
# Example using trade data
data(trade)

# main estimation
est_pois = fepois(Euros ~ log(dist_km) | Origin + Destination, trade)

# we add the variable log(Year)
est_2 = update(est_pois, . ~ . + log(Year))

# we add another fixed-effect: "Product"
est_3 = update(est_2, . ~ . | . + Product)

# we remove the fixed-effect "Origin" and the variable log(dist_km)
est_4 = update(est_3, . ~ . - log(dist_km) | . - Origin)

# Quick look at the 4 estimations
etable(est_pois, est_2, est_3, est_4)
Computes the variance/covariance of a fixest object
Description
This function extracts the variance-covariance of estimated parameters from a model estimated with femlm, feols or feglm.
Usage
## S3 method for class 'fixest'
vcov(
  object,
  vcov = NULL,
  se = NULL,
  cluster,
  ssc = NULL,
  attr = FALSE,
  forceCovariance = FALSE,
  keepBounded = FALSE,
  nthreads = getFixest_nthreads(),
  vcov_fix = TRUE,
  ...
)
Arguments
object | A |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
ssc | An object of class |
attr | Logical, defaults to |
forceCovariance | (Advanced users.) Logical, default is |
keepBounded | (Advanced users – |
nthreads | The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function |
vcov_fix | Logical scalar, default is |
... | Other arguments to be passed to The computation of the VCOV matrix is first done in |
Details
For an explanation of how the standard-errors are computed and what the exact meaning of the arguments is, please have a look at the dedicated vignette: On standard-errors.
Value
It returns a $K \times K$ square matrix where $K$ is the number of variables of the fitted model. If attr = TRUE, this matrix has an attribute “type” specifying how this variance/covariance matrix has been computed.
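As a reminder of how such a matrix relates to standard errors, the sketch below uses a plain lm fit from base R rather than a fixest object (an assumption made so that the snippet is self-contained): the standard errors are the square roots of the diagonal of the VCOV.

```r
# Base R illustration with lm (not a fixest estimation)
fit = lm(Sepal.Length ~ Sepal.Width + Petal.Length, iris)

V = vcov(fit)
dim(V)  # K x K, with K the number of coefficients

# Standard errors = square root of the diagonal of the VCOV
se_manual = sqrt(diag(V))
se_manual
```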
Author(s)
Laurent Berge
References
Ding, Peng, 2021, "The Frisch–Waugh–Lovell theorem for standard errors." Statistics & Probability Letters 168.
See Also
You can also compute VCOVs with the following functions: vcov_cluster, vcov_hac, vcov_conley.
See also the main estimation functions femlm, feols or feglm. summary.fixest, confint.fixest, resid.fixest, predict.fixest, fixef.fixest.
Examples
# Load panel data
data(base_did)

# Simple estimation on a panel
est = feols(y ~ x1, base_did)

# ======== #
# IID VCOV #
# ======== #

# By default the VCOV assumes iid errors:
se(vcov(est))

# You can make the call for an iid VCOV explicitly:
se(vcov(est, "iid"))

#
# Heteroskedasticity-robust VCOV
#

# Request a heteroskedasticity-robust VCOV with vcov = "hetero":
se(vcov(est, "hetero"))
# => note that it also accepts vcov = "White" and vcov = "HC1" as aliases.

# =============== #
# Clustered VCOVs #
# =============== #

# To cluster the VCOV, you can use a formula of the form cluster ~ var1 + var2 etc
# Let's cluster by the panel ID:
se(vcov(est, cluster ~ id))

# Alternative ways:

# -> cluster is implicitly assumed when a one-sided formula is provided
se(vcov(est, ~ id))

# -> using the argument cluster instead of vcov
se(vcov(est, cluster = ~ id))

# For two-/three- way clustering, just add more variables:
se(vcov(est, ~ id + period))

# -------------------|
# Implicit deduction |
# -------------------|

# When the estimation contains FEs, the dimension on which to cluster
# is directly inferred from the FEs used in the estimation, so you don't need
# to explicitly add them.

est_fe = feols(y ~ x1 | id + period, base_did)

# Clustered along "id"
se(vcov(est_fe, "cluster"))

# Clustered along "id" and "period"
se(vcov(est_fe, "twoway"))

# =========== #
# Panel VCOVs #
# =========== #

# ---------------------|
# Newey West (NW) VCOV |
# ---------------------|

# To obtain NW VCOVs, use a formula of the form NW ~ id + period
se(vcov(est, NW ~ id + period))

# If you want to change the lag:
se(vcov(est, NW(3) ~ id + period))

# Alternative way:

# -> using the vcov_NW function
se(vcov(est, vcov_NW(unit = "id", time = "period", lag = 3)))

# -------------------------|
# Driscoll-Kraay (DK) VCOV |
# -------------------------|

# To obtain DK VCOVs, use a formula of the form DK ~ period
se(vcov(est, DK ~ period))

# If you want to change the lag:
se(vcov(est, DK(3) ~ period))

# Alternative way:

# -> using the vcov_DK function
se(vcov(est, vcov_DK(time = "period", lag = 3)))

# -------------------|
# Implicit deduction |
# -------------------|

# When the estimation contains a panel identifier, you don't need
# to re-write them later on

est_panel = feols(y ~ x1, base_did, panel.id = ~id + period)

# Both methods, NW and DK, now work automatically
se(vcov(est_panel, "NW"))
se(vcov(est_panel, "DK"))

# =================================== #
# VCOVs robust to spatial correlation #
# =================================== #

data(quakes)
est_geo = feols(depth ~ mag, quakes)

# ------------|
# Conley VCOV |
# ------------|

# To obtain a Conley VCOV, use a formula of the form conley(cutoff) ~ lat + lon
# with lat/lon the latitude/longitude variable names in the data set
se(vcov(est_geo, conley(100) ~ lat + long))

# Alternative way:

# -> using the vcov_conley function
se(vcov(est_geo, vcov_conley(lat = "lat", lon = "long", cutoff = 100)))

# -------------------|
# Implicit deduction |
# -------------------|

# By default the latitude and longitude are directly fetched in the data based
# on pattern matching. So you don't have to specify them.
# Further, an automatic cutoff is deduced by default.

# The following works:
se(vcov(est_geo, "conley"))

# ======================== #
# Small Sample Corrections #
# ======================== #

# You can change the way the small sample corrections are done with the argument ssc.
# The argument ssc must be created by the ssc function
se(vcov(est, ssc = ssc(K.adj = FALSE)))

# You can add directly the call to ssc in the vcov formula.
# You need to add it like a variable:
se(vcov(est, iid ~ ssc(K.adj = FALSE)))
se(vcov(est, DK ~ period + ssc(K.adj = FALSE)))
Clustered VCOV
Description
Computes the clustered VCOV of fixest objects.
Usage
vcov_cluster(x, cluster = NULL, ssc = NULL, vcov_fix = TRUE)
Arguments
x | A |
cluster | Either i) a character vector giving the names of the variables onto which to cluster, or ii) a formula giving those names, or iii) a vector/list/data.frame giving the hard values of the clusters. Note that in cases i) and ii) the variables are fetched directly in the data set used for the estimation. |
ssc | An object returned by the function |
vcov_fix | Logical scalar, default is |
Value
If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).
If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_cluster(~var1 + var2) is equivalent to vcov_cluster(cluster = ~var1 + var2)) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).
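For readers who want to see what a one-way clustered VCOV looks like, here is a minimal base R sketch of the textbook sandwich formula on an lm fit. It is an illustration only: it applies no small sample correction, so its numbers are not expected to match the default output of vcov_cluster, and it is not the code used by this package.

```r
# Textbook one-way clustered sandwich, base R only (no small sample correction)
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

fit = lm(y ~ x1, base)
X = model.matrix(fit)
u = residuals(fit)

# "Bread": inverse of X'X
bread = solve(crossprod(X))

# "Meat": sum the scores X_i * u_i within each cluster,
# then accumulate the outer products of these cluster sums
scores = rowsum(X * u, group = base$species)
meat = crossprod(scores)

V_cluster = bread %*% meat %*% bread
V_cluster
```

The result is a symmetric matrix, as stated above for the case where a VCOV (and not a VCOV-request) is returned.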
Author(s)
Laurent Berge
References
Cameron AC, Gelbach JB, Miller DL (2011). "Robust Inference with Multiway Clustering."Journal of Business & Economic Statistics, 29(2), 238-249. doi:10.1198/jbes.2010.07136.
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
base$clu = rep(1:5, 30)

est = feols(y ~ x1, base)

# VCOV: using a formula giving the name of the clusters
vcov_cluster(est, ~species + clu)

# works as well with a character vector
vcov_cluster(est, c("species", "clu"))

# you can also combine the two with '^'
vcov_cluster(est, ~species^clu)

#
# Using VCOV requests
#

# per se: pretty useless...
vcov_cluster(~species)

# ...but VCOV-requests can be used at estimation time:
# it may be more explicit than...
feols(y ~ x1, base, vcov = vcov_cluster("species"))

# ...the equivalent, built-in way:
feols(y ~ x1, base, vcov = ~species)

# The argument vcov does not accept hard values,
# so you can feed them with a VCOV-request:
feols(y ~ x1, base, vcov = vcov_cluster(rep(1:5, 30)))
Conley VCOV
Description
Compute VCOVs robust to spatial correlation, a la Conley (1999).
Usage
vcov_conley(
  x,
  lat = NULL,
  lon = NULL,
  cutoff = NULL,
  pixel = 0,
  distance = "triangular",
  ssc = NULL,
  vcov_fix = TRUE
)

conley(cutoff = NULL, pixel = NULL, distance = NULL)
Arguments
x | A |
lat | A character scalar or a one sided formula giving the name of the variable representing the latitude. The latitude must lie in [-90, 90], [0, 180] or [-180, 0]. |
lon | A character scalar or a one sided formula giving the name of the variable representing the longitude. The longitude must be in [-180, 180], [0, 360] or [-360, 0]. |
cutoff | The distance cutoff, in km. You can express the cutoff in miles by writing the number in character form and adding "mi" as a suffix: cutoff = "100mi" would be 100 miles. If missing, a rule of thumb is used to deduce the cutoff, see details. |
pixel | A positive numeric scalar, default is 0. If a positive number, the coordinates of each observation are pooled into |
distance | How to compute the distance between points. It can be equal to "triangular" (default) or "spherical". The latter case corresponds to the great circle distance and is more precise than triangular but is a bit more intensive computationally. |
ssc | An object returned by the function |
vcov_fix | Logical scalar, default is |
Details
This function computes VCOVs that are robust to spatial correlations by assuming a correlation between the units that are at a geographic distance lower than a given cutoff.
The kernel is uniform.
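To illustrate what a uniform kernel means here, the sketch below computes great circle distances (the "spherical" option) between a few points of the quakes data and gives a weight of 1 to every pair of units closer than the cutoff, 0 otherwise. The helper haversine_km, the Earth radius of 6371 km and the 100 km cutoff are assumptions of this illustration; how the package uses such weights internally is not shown.

```r
# Hypothetical helper: great circle (haversine) distance in km
haversine_km = function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad = pi / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

data(quakes)
pts = quakes[1:5, c("lat", "long")]
n = nrow(pts)

# Pairwise distance matrix between the five points
d = outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_km(pts$lat[i], pts$long[i], pts$lat[j], pts$long[j])
})

# Uniform kernel: weight 1 within the cutoff, 0 beyond it
cutoff = 100
w = (d <= cutoff) * 1
w
```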
If the cutoff is not provided, an estimate of it is used. This cutoff ensures that a minimum number of units lie within it and is robust to sub-sampling. This automatic cutoff is only here for convenience: the most appropriate cutoff depends on the application and should be provided by the user.
The function conley does not compute VCOVs directly but is meant to be used in the argument vcov of fixest functions (e.g. in vcov.fixest or even in the estimation calls).
If the cutoff is missing, a rule of thumb is used to deduce a sensible cutoff. The algorithm is as follows:
all observations are sorted according to their latitude and their longitude (latitude major)
for each observation we take the minimum distance across the three units with the closest latitude
we do the same when sorting this time by longitude first and latitude second (longitude major)
the cutoff is the sum of the median of these two distances (lat. major and lon. major)
This cutoff is provided only for convenience but should be an appropriate first guess. With this cutoff, about 50% of units should have at least around 8 neighbors.
Value
If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).
If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_conley("lat", "long") is equivalent to vcov_conley(lat = "lat", lon = "long")) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).
References
Conley TG (1999). "GMM Estimation with Cross Sectional Dependence." Journal of Econometrics, 92, 1-45.
Examples
data(quakes)

# We use conley() in the vcov argument of the estimation
feols(depth ~ mag, quakes, conley(100))

# Post estimation
est = feols(depth ~ mag, quakes)
vcov_conley(est, cutoff = 100)
HAC VCOVs
Description
Set of functions to compute the VCOVs robust to different forms of correlation in panel or time series settings.
Usage
vcov_DK(x, time = NULL, lag = NULL, ssc = NULL, vcov_fix = TRUE)

vcov_NW(x, unit = NULL, time = NULL, lag = NULL, ssc = NULL, vcov_fix = TRUE)

NW(lag = NULL)

newey_west(lag = NULL)

DK(lag = NULL)

driscoll_kraay(lag = NULL)
Arguments
x | A |
time | A character scalar or a one sided formula giving the name of the variable representing the time. |
lag | An integer scalar, default is |
ssc | An object returned by the function |
vcov_fix | Logical scalar, default is |
unit | A character scalar or a one sided formula giving the name of the variable representing the units of the panel. |
Details
There are currently three VCOV types: Newey-West applied to time series, Newey-West applied to a panel setting (when the argument 'unit' is not missing), and Driscoll-Kraay.
The functions on this page without the prefix "vcov_" do not compute VCOVs directly but are meant to be used in the argument vcov of fixest functions (e.g. in vcov.fixest or even in the estimation calls).
Note that for the properties of Driscoll-Kraay VCOVs to hold, the number of periods should be large enough (a minimum of 20 periods or so).
Value
If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).
If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_DK(~year) is equivalent to vcov_DK(time = ~year)) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).
Lag selection
The default lag selection depends on whether the VCOV applies to a panel or a time series.
For panels, i.e. panel Newey-West or Driscoll-Kraay VCOV, the default lag is $n_t^{0.25}$ with $n_t$ the number of time periods. This is based on Newey and West (1987).
For time series Newey-West, the default lag is found thanks to the bwNeweyWest function from the sandwich package. It is based on Newey and West (1994).
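To give a sense of the magnitudes implied by the panel rule, the short sketch below evaluates the fourth root of the number of periods for a few hypothetical panel lengths. How the value is turned into an integer lag is not stated in this page, so the raw value is shown without rounding to an integer.

```r
# Raw value of the panel rule n_t^0.25 for a few hypothetical panel lengths
n_t = c(10, 20, 50, 100)
raw_lag = n_t^0.25

data.frame(n_t = n_t, raw_lag = round(raw_lag, 2))
```

The rule grows very slowly: going from 10 to 100 periods raises the raw value from below 2 to just above 3.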
References
Newey WK, West KD (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica, 55(3), 703-708. doi:10.2307/1913610.
Driscoll JC, Kraay AC (1998). "Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data." The Review of Economics and Statistics, 80(4), 549-560. doi:10.1162/003465398557825.
Millo G (2017). "Robust Standard Error Estimators for Panel Models: A Unifying Approach." Journal of Statistical Software, 82(3). doi:10.18637/jss.v082.i03.
Examples
data(base_did)

#
# During the estimation
#

# Panel Newey-West, lag = 2
feols(y ~ x1, base_did, NW(2) ~ id + period)

# Driscoll-Kraay
feols(y ~ x1, base_did, DK ~ period)

# If the estimation is made with a panel.id, the dimensions are
# automatically deduced:
est = feols(y ~ x1, base_did, "NW", panel.id = ~id + period)
est

#
# Post estimation
#

# If missing, the unit and time are automatically deduced from
# the panel.id used in the estimation
vcov_NW(est, lag = 2)
Heteroskedasticity-Robust VCOV
Description
Computes the heteroskedasticity-robust VCOV of fixest objects.
Usage
vcov_hetero(
  x,
  type = "hc1",
  exact = TRUE,
  boot.size = NULL,
  ssc = NULL,
  vcov_fix = TRUE
)
Arguments
x | A |
type | A string scalar. Either "HC1"/"HC2"/"HC3" |
exact | Logical scalar, default is |
boot.size | Integer scalar or |
ssc | An object returned by the function |
vcov_fix | Logical scalar, default is |
Value
If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).
If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_hetero("HC3") is equivalent to vcov_hetero(type = "HC3")) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).
Author(s)
Laurent Berge and Kyle Butts
References
MacKinnon, J. G. (2012). "Thirty years of heteroscedasticity-robust inference." Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis, pp. 437–461. https://doi.org/10.1007/978-1-4614-1653-1_17
Examples
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
est = feols(y ~ x1 | species, base)

vcov_hetero(est, "hc1")
vcov_hetero(est, "hc2", ssc = ssc(K.adj = FALSE))
vcov_hetero(est, "hc3", ssc = ssc(K.adj = FALSE))

# Using approximate hatvalues
vcov_hetero(est, "hc3", exact = FALSE, boot.size = 500)
Wald test of nullity of coefficients
Description
Wald test used to test the joint nullity of a set of coefficients.
Usage
wald(x, keep = NULL, drop = NULL, print = TRUE, vcov, se, cluster, ...)
Arguments
x | A |
keep | Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see |
drop | Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see |
print | Logical, default is |
vcov | Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: |
se | Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: |
cluster | Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over |
... | Any other element to be passed to |
Details
The type of VCOV matrix plays a crucial role in this test. Use the arguments se and cluster to change the type of VCOV for the test.
Value
A named vector containing the following elements is returned: stat, p, df1, and df2. They correspond to the test statistic, the p-value, and the first and second degrees of freedom.
If no valid coefficient is found, the value NA is returned.
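The four elements of the returned vector can be related to the textbook Wald F statistic. The sketch below reproduces that textbook computation in base R on an lm fit with an iid VCOV; whether the wald function uses exactly this scaling and these degrees of freedom is an assumption of the illustration, not something stated on this page.

```r
# Textbook Wald F test of joint nullity, base R only (iid VCOV)
data(airquality)
aq = na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit = lm(Ozone ~ Solar.R + Wind + Temp, aq)

# Coefficients whose joint nullity is tested
tested = c("Solar.R", "Wind")
b = coef(fit)[tested]
V = vcov(fit)[tested, tested]

# Quadratic form b' V^{-1} b, scaled by the number of restrictions
q = length(b)
stat = as.numeric(t(b) %*% solve(V) %*% b) / q

df1 = q
df2 = df.residual(fit)
p = pf(stat, df1, df2, lower.tail = FALSE)

c(stat = stat, p = p, df1 = df1, df2 = df2)
```

With an iid VCOV this statistic coincides with the usual F test comparing the restricted and unrestricted models, which is why the choice of VCOV matters so much for the result.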
Examples
data(airquality)
est = feols(Ozone ~ Solar.R + Wind + poly(Temp, 3), airquality)

# Testing the joint nullity of the Temp polynomial
wald(est, "poly")

# Same but with clustered SEs
wald(est, "poly", cluster = "Month")

# Now: all vars but the polynomial and the intercept
wald(est, drop = "Inte|poly")

#
# Toy example: testing pre-trends
#

data(base_did)
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_did)

# The graph of the coefficients
coefplot(est_did)

# The pre-trend test
wald(est_did, "period::[1234]$")

# If "period::[1234]$" looks weird to you, check out
# regular expressions: e.g. see ?regex.
# Learn it, you won't regret it!
Extracts the weights from a fixest object
Description
Simply extracts the weights used to estimate a fixest model.
Usage
## S3 method for class 'fixest'
weights(object, ...)
Arguments
object | A |
... | Not currently used. |
Value
Returns a vector of the same length as the number of observations in the original data set. Observations ignored because of NA values or perfect fit are re-introduced and their weights are set to NA.
See Also
feols, fepois, feglm, fenegbin, feNmlm.
Examples
est = feols(Petal.Length ~ Petal.Width, iris, weights = ~as.integer(Sepal.Length) - 3.99)
weights(est)
Expands formula macros
Description
Create macros within formulas and expand them with character vectors or other formulas.
Usage
xpd(
  fml,
  ...,
  add = NULL,
  lhs = NULL,
  rhs = NULL,
  add.after_pipe = NULL,
  data = NULL,
  frame = parent.frame()
)
Arguments
fml | A formula containing macro variables. Each macro variable must start with two dots. The macro variables can be set globally using |
... | Definition of the macro variables. Each argument name corresponds to the name of the macro variable. It is required that each macro variable name starts with two dots (e.g. |
add | A character vector or a one-sided formula. The elements will be added to the right-hand-side of the formula, before any macro expansion is applied. |
lhs | If present then a formula will be constructed with |
rhs | If present, then a formula will be constructed with |
add.after_pipe | A character vector or a one-sided or two-sided formula. The elements will be added to the right-hand-side of the formula, just after a pipe ( |
data | Either a character vector or a data.frame. This argument will only be used if a macro of the type |
frame | The environment containing the values to be expanded with the dot square bracket operator. Default is |
Details
In xpd, the default macro variables are taken from getFixest_fml. Any value in the ... argument of xpd will replace these default values.
The definitions of the macro variables replace the macro variables verbatim. Therefore, you can include multi-part formulas if you wish, but then beware of the order of the macro variables in the formula. For example, using the airquality data, say you want to set as controls the variable Temp and Day fixed-effects: you can do setFixest_fml(..ctrl = ~Temp | Day), but then feols(Ozone ~ Wind + ..ctrl, airquality) will be quite different from feols(Ozone ~ ..ctrl + Wind, airquality), so beware!
Value
It returns a formula where all macros have been expanded.
Dot square bracket operator in formulas
In a formula, the dot square bracket (DSB) operator can: i) create many variables at once, or ii) capture values from the current environment and put them verbatim in the formula.
Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.
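For comparison, the formula obtained in this example can also be built with base R tools alone. The sketch below uses paste0 and reformulate; it is shown only to make the result of the expansion explicit and is not how the DSB operator is implemented.

```r
# Base R equivalent of the expansion y ~ x.[1:3]
vars = paste0("x", 1:3)
fml = reformulate(vars, response = "y")
fml
```

The DSB operator saves this intermediate step by letting the expansion happen directly inside the formula.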
To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.
You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.
The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.
By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas: the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).
In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.
One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.
You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.
When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
Regular expressions
You can catch several variable names at once by using regular expressions. To use regularexpressions, you need to enclose it in the dot-dot or the regex function:..("regex") orregex("regex"). For example,regex("Sepal") will catch both the variablesSepal.Length andSepal.Width from theiris data set.In afixest estimation, the variables names from which the regex willbe applied come from the data set. If you usexpd, you need to provideeither a data set or a vector of names in the argumentdata.
By default the variables are aggregated with a sum. For example in a data setwith the variables x1 to x10,regex("x(1|2)" will yieldx1 + x2 + x10. You can instead ask for "comma"aggregation by using a comma first, just before the regular expression:y ~ sw(regex(,"x(1|2)")) would lead toy ~ sw(x1, x2, x10).
Note that the dot square bracket operator (DSB, see above) is applied before the regular expression is evaluated. This means that regex("x.[3:4]_sq") will lead, after evaluation of the DSB, to regex("x3_sq|x4_sq"). It is a handy way to insert a range of numbers in a regular expression.
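As an illustration of this ordering, the sketch below applies regex("x.[3:4]_sq") to a hypothetical set of names x1_sq to x5_sq. It assumes fixest is installed; the names are made up for the example.

```r
library(fixest)

# Hypothetical variable names x1_sq, ..., x5_sq
var_names = paste0("x", 1:5, "_sq")

# The DSB is evaluated first, giving regex("x3_sq|x4_sq"),
# and only then is the regular expression matched against var_names
f_range = xpd(y ~ regex("x.[3:4]_sq"), data = var_names)

print(f_range)
```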
Author(s)
Laurent Berge
See Also
setFixest_fml to set formula macros, and dsb to modify character strings with the DSB operator.
Examples
# Small examples with airquality data
data(airquality)
# we set two macro variables
setFixest_fml(..ctrl = ~ Temp + Day,
              ..ctrl_long = ~ poly(Temp, 2) + poly(Day, 2))

# Using the macro in lm with xpd:
lm(xpd(Ozone ~ Wind + ..ctrl), airquality)
lm(xpd(Ozone ~ Wind + ..ctrl_long), airquality)

# You can use the macros without xpd() in fixest estimations
a = feols(Ozone ~ Wind + ..ctrl, airquality)
b = feols(Ozone ~ Wind + ..ctrl_long, airquality)
etable(a, b, keep = "Int|Win")

# Using .[]

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
i = 2:3
z = "species"
lm(xpd(y ~ x.[2:3] + .[z]), base)

# No xpd() needed in feols
feols(y ~ x.[2:3] + .[z], base)

#
# Auto completion with '..' suffix
#

# You can trigger variables autocompletion with the '..' suffix
# You need to provide the argument data
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
xpd(y ~ x.., data = base)

# In fixest estimations, this is automatically taken care of
feols(y ~ x.., data = base)

#
# You can use xpd for stepwise estimations
#

# Note that for stepwise estimations in fixest, you can use
# the stepwise functions: sw, sw0, csw, csw0
# -> see help in feols or in the dedicated vignette

# we want to look at the effect of x1 on y
# controlling for different variables

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# We first create a matrix with all possible combinations of variables
my_args = lapply(names(base)[-(1:2)], function(x) c("", x))
(all_combs = as.matrix(do.call("expand.grid", my_args)))

res_all = list()
for(i in 1:nrow(all_combs)){
  res_all[[i]] = feols(xpd(y ~ x1 + ..v, ..v = all_combs[i, ]), base)
}

etable(res_all)
coefplot(res_all, group = list(Species = "^^species"))

#
# You can use macros to grep variables in your data set
#

# Example 1: setting a macro variable globally

data(longley)
setFixest_fml(..many_vars = grep("GNP|ployed", names(longley), value = TRUE))
feols(Armed.Forces ~ Population + ..many_vars, longley)

# Example 2: using ..("regex") or regex("regex") to grep the variables "live"

feols(Armed.Forces ~ Population + ..("GNP|ployed"), longley)

# Example 3: same as Ex.2 but without using a fixest estimation

# Here we need to use xpd():
lm(xpd(Armed.Forces ~ Population + regex("GNP|ployed"), data = longley), longley)

# Stepwise estimation with regex: use a comma after the parenthesis
feols(Armed.Forces ~ Population + sw(regex(,"GNP|ployed")), longley)

# Multiple LHS
etable(feols(..("GNP|ployed") ~ Population, longley))

#
# lhs and rhs arguments
#

# to create a one sided formula from a character vector
vars = letters[1:5]
xpd(rhs = vars)

# Alternatively, to replace the RHS
xpd(y ~ 1, rhs = vars)

# To create a two sided formula
xpd(lhs = "y", rhs = vars)

#
# argument 'add'
#

xpd(~x1, add = ~ x2 + x3)

# also works with character vectors
xpd(~x1, add = c("x2", "x3"))

# only adds to the RHS
xpd(y ~ x, add = ~bon + jour)

#
# argument add.after_pipe
#

xpd(~x1, add.after_pipe = ~ x2 + x3)

# we can add a two sided formula
xpd(~x1, add.after_pipe = x2 ~ x3)

#
# Dot square bracket operator
#

# The basic use is to add variables in the formula
x = c("x1", "x2")
xpd(y ~ .[x])

# Alternatively, one-sided formulas can be used and their content will be inserted verbatim
x = ~x1 + x2
xpd(y ~ .[x])

# You can create multiple variables at once
xpd(y ~ x.[1:5] + z.[2:3])

# You can summon variables from the environment to complete variables names
var = "a"
xpd(y ~ x.[var])

# ... the variables can be multiple
vars = LETTERS[1:3]
xpd(y ~ x.[vars])

# You can have "complex" variable names but they must be nested in character form
xpd(y ~ .["x.[vars]_sq"])

# DSB can be used within regular expressions
re = c("GNP", "Pop")
xpd(Unemployed ~ regex(".[re]"), data = longley)

# => equivalent to regex("GNP|Pop")

# Use .[,var] (NOTE THE COMMA!) to expand with commas
# !! can break the formula if misused
vars = c("wage", "unemp")
xpd(c(y.[,1:3]) ~ csw(.[,vars]))

# Example of use of .[] within a loop
res_all = list()
for(p in 1:3){
  res_all[[p]] = feols(Ozone ~ Wind + poly(Temp, .[p]), airquality)
}

etable(res_all)

# The former can be compactly estimated with:
res_compact = feols(Ozone ~ Wind + sw(.[, "poly(Temp, .[1:3])"]), airquality)

etable(res_compact)

# How does it work?
# 1) .[, stuff] evaluates stuff and, if a vector, aggregates it with commas
#    Comma aggregation is done thanks to the comma placed after the opening square bracket
#    If .[stuff], then aggregation is with sums.
# 2) stuff is evaluated, and if it is a character string, it is evaluated with
#    the function dsb which expands values in .[]
#
# Wrapping up:
# 2) evaluation of dsb("poly(Temp, .[1:3])") leads to the vector:
#    c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")
# 1) .[, c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")] leads to
#    poly(Temp, 1), poly(Temp, 2), poly(Temp, 3)
#
# Hence sw(.[, "poly(Temp, .[1:3])"]) becomes:
#       sw(poly(Temp, 1), poly(Temp, 2), poly(Temp, 3))

#
# In non-fixest functions: guessing the data allows to use regex
#

# When used in non-fixest functions, the algorithm tries to "guess" the data
# so that ..("regex") can be directly evaluated without passing the argument 'data'
data(longley)
lm(xpd(Armed.Forces ~ Population + ..("GNP|ployed")), longley)

# same for the auto completion with '..'
lm(xpd(Armed.Forces ~ Population + GN..), longley)