Type: Package
Title: Fast Fixed-Effects Estimations
Version: 0.13.2
Imports: stats, graphics, grDevices, tools, utils, methods, numDeriv, nlme, sandwich, Rcpp (≥ 1.0.5), dreamerr (≥ 1.4.0), stringmagic (≥ 1.2.0)
Suggests: knitr, rmarkdown, data.table, plm, MASS, pander, ggplot2, lfe, tinytex, pdftools, emmeans, estimability, AER, Matrix
LinkingTo: Rcpp
Depends: R (≥ 3.5.0)
Description: Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.
License: GPL-3
BugReports: https://github.com/lrberge/fixest/issues
URL: https://lrberge.github.io/fixest/, https://github.com/lrberge/fixest
VignetteBuilder: knitr
LazyData: true
RoxygenNote: 7.3.2.9000
Encoding: UTF-8
NeedsCompilation: yes
Packaged: 2025-09-05 09:44:37 UTC; berge028
Author: Laurent Berge [aut, cre], Sebastian Krantz [ctb], Grant McDermott [ctb], Russell Lenth [ctb], Kyle Butts [ctb]
Maintainer: Laurent Berge <laurent.berge@u-bordeaux.fr>
Repository: CRAN
Date/Publication: 2025-09-08 07:30:02 UTC

Fast and User-Friendly Fixed-Effects Estimations

Description

The package fixest provides a family of functions to perform estimations with multiple fixed-effects. Standard-errors can be easily and intuitively clustered. It also includes tools to seamlessly export the results of various estimations.

Details

The main features are:

Author(s)

Maintainer: Laurent Berge <laurent.berge@u-bordeaux.fr>

Other contributors:

References

Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13 ().

See Also

Useful links:


Akaike's an information criterion

Description

This function computes the AIC (Akaike's an information criterion) from a fixest estimation.

Usage

## S3 method for class 'fixest'
AIC(object, ..., k = 2)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

...

Optionally, more fitted objects.

k

A numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC (i.e. AIC = -2*LL + k*nparams).

Details

The AIC is computed as:

AIC = -2\times LogLikelihood + k\times nbParams

with k the penalty parameter.

You can find more information on this criterion in the help page of AIC.
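The formula above can be checked by hand. The sketch below is only an illustration: it uses a plain base-R lm() fit instead of a fixest estimation (an assumption made so that the snippet runs without any extra package), and the names fit, ll and nb_params are illustrative.

```r
# Base-R sketch: 'fit' is a plain lm() model, used here in place of a
# fixest estimation so that the snippet runs without any extra package.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

ll <- logLik(fit)             # log-likelihood of the fitted model
nb_params <- attr(ll, "df")   # number of estimated parameters reported by logLik
k <- 2                        # penalty per parameter (classical AIC)

# AIC = -2 * LogLikelihood + k * nbParams
aic_by_hand <- -2 * as.numeric(ll) + k * nb_params

# Same quantity computed by the generic
aic_builtin <- AIC(fit, k = k)

all.equal(aic_by_hand, aic_builtin)
```

Changing k changes only the penalty term, which is why the same function can return other penalized criteria.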

Value

It returns a numeric vector whose length equals the number of objects passed as arguments.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. Other statistics methods: BIC.fixest, logLik.fixest, nobs.fixest.

Examples

# two fitted models with different expl. variables:
res1 = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
             Petal.Width | Species, iris)
res2 = femlm(Sepal.Length ~ Petal.Width | Species, iris)

AIC(res1, res2)
BIC(res1, res2)

Bayesian information criterion

Description

This function computes the BIC (Bayesian information criterion) from a fixest estimation.

Usage

## S3 method for class 'fixest'
BIC(object, ...)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

...

Optionally, more fitted objects.

Details

The BIC is computed as follows:

BIC = -2\times LogLikelihood + \log(nobs)\times nbParams

with nobs the number of observations and nbParams the number of parameters.

You can find more information on this criterion in the help page of AIC.
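The BIC has the same form as the AIC, with the penalty per parameter set to log(nobs). The sketch below illustrates this with a base-R lm() fit rather than a fixest estimation (an assumption made so that the snippet runs without the package); the object names are illustrative.

```r
# Base-R sketch: a plain lm() fit stands in for a fixest estimation.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

ll <- logLik(fit)
n_obs <- nobs(fit)            # number of observations used in the fit
nb_params <- attr(ll, "df")   # number of estimated parameters reported by logLik

# BIC = -2 * LogLikelihood + log(nobs) * nbParams
bic_by_hand <- -2 * as.numeric(ll) + log(n_obs) * nb_params

# The same value is obtained from AIC() with k = log(nobs), and from BIC()
bic_via_aic <- AIC(fit, k = log(n_obs))
bic_builtin <- BIC(fit)

all.equal(bic_by_hand, bic_builtin)
```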

Value

It returns a numeric vector whose length equals the number of objects passed as arguments.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. Other statistics functions: AIC.fixest, logLik.fixest.

Examples

# two fitted models with different expl. variables:
res1 = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
             Petal.Width | Species, iris)
res2 = femlm(Sepal.Length ~ Petal.Width | Species, iris)

AIC(res1, res2)
BIC(res1, res2)

Subsets a fixest_multi object

Description

Subsets a fixest_multi object using different keys.

Usage

## S3 method for class 'fixest_multi'
x[i, sample, lhs, rhs, fixef, iv, I, reorder = TRUE, drop = FALSE]

Arguments

x

A fixest_multi object, obtained from a fixest estimation leading to multiple results.

i

An integer vector. Represents the estimations to extract.

sample

An integer vector, a logical scalar, or a character vector. It represents the sample identifiers for which the results should be extracted. Only valid when the fixest estimation was a split sample. You can use .N to refer to the last element. If logical, all elements are selected in both cases, but FALSE leads sample to become the rightmost key (just try it out).

lhs

An integer vector, a logical scalar, or a character vector. It represents the left-hand-side identifiers for which the results should be extracted. Only valid when the fixest estimation contained multiple left-hand-sides. You can use .N to refer to the last element. If logical, all elements are selected in both cases, but FALSE leads lhs to become the rightmost key (just try it out).

rhs

An integer vector or a logical scalar. It represents the right-hand-side identifiers for which the results should be extracted. Only valid when the fixest estimation contained multiple right-hand-sides. You can use .N to refer to the last element. If logical, all elements are selected in both cases, but FALSE leads rhs to become the rightmost key (just try it out).

fixef

An integer vector or a logical scalar. It represents the fixed-effects identifiers for which the results should be extracted. Only valid when the fixest estimation contained fixed-effects in a stepwise fashion. You can use .N to refer to the last element. If logical, all elements are selected in both cases, but FALSE leads fixef to become the rightmost key (just try it out).

iv

An integer vector or a logical scalar. It represents the stages of the IV. Note that the length can be greater than 2 when there are multiple endogenous regressors (the first stage corresponding to multiple estimations). Note that the order of the stages depends on the stage argument from summary.fixest. If logical, all elements are selected in both cases, but FALSE leads iv to become the rightmost key (just try it out).

I

An integer vector. Represents the root element to extract.

reorder

Logical, default is TRUE. Indicates whether reordering of the results should be performed depending on the user input.

drop

Logical, default is FALSE. If the result contains only one estimation, then if drop = TRUE it will be transformed into a fixest object (instead of fixest_multi). Its default value can be modified with the function setFixest_multi.

Details

The order in which we use the keys matters. Every time a key sample, lhs, rhs, fixef or iv is used, a reordering is performed to consider the leftmost-side key to be the new root.

Use logical keys to easily reorder. For example, say the object res contains a multiple estimation with multiple left-hand-sides, right-hand-sides and fixed-effects. By default the results are ordered as follows: lhs, fixef, rhs. If you use res[lhs = FALSE], then the new order is: fixef, rhs, lhs. With res[rhs = TRUE, lhs = FALSE] it becomes: rhs, fixef, lhs. In both cases you keep all estimations.

Value

It returns a fixest_multi object. If there is only one estimation left in the object, then the result is simplified into a fixest object only with drop = TRUE.

See Also

The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.

Examples

# Estimation with multiple samples/LHS/RHS
aq = airquality[airquality$Month %in% 5:6, ]
est_split = feols(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)),
                  aq, split = ~ Month)

# By default: sample is the root
etable(est_split)

# Let's reorder, by considering lhs the root
etable(est_split[lhs = 1:.N])

# Selecting only one LHS and RHS
etable(est_split[lhs = "Ozone", rhs = 1])

# Taking the first root (here sample = 5)
etable(est_split[I = 1])

# The first and last estimations
etable(est_split[i = c(1, .N)])

Method to subselect from a fixest_panel

Description

Subselection from a fixest_panel which has been created with the function panel. Also allows creating lag/lead variables with the functions l/f if the fixest_panel is also a data.table::data.table.

Usage

## S3 method for class 'fixest_panel'
x[i, j, ...]

Arguments

x

A fixest_panel object, created with the function panel.

i

Row subselection. Allows data.table::data.table style selection (provided the data is also a data.table).

j

Variable selection. Allows data.table::data.table style selection/variable creation (provided the data is also a data.table).

...

Other arguments to be passed to [.data.frame or data.table::data.table (or whatever the class of the initial data).

Details

If the original data was also a data.table, some calls to [.fixest_panel may dissolve the fixest_panel object and return a regular data.table. This is the case for subselections with additional arguments. If so, a note is displayed on the console.

Value

It returns a fixest_panel data base in which the attributes needed to create lags/leads are properly kept up to date.

Author(s)

Laurent Berge

See Also

Alternatively, the function panel changes a data.frame into a panel from which the functions l and f (creating lags and leads) can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).

Examples

data(base_did)

# Creating a fixest_panel object
pdat = panel(base_did, ~id+period)

# Subselections of fixest_panel objects bookkeeps the leads/lags engine
pdat_small = pdat[!pdat$period %in% c(2, 4), ]
a = feols(y~l(x1, 0:1), pdat_small)

# we obtain the same results, had we created the lags "on the fly"
base_small = base_did[!base_did$period %in% c(2, 4), ]
b = feols(y~l(x1, 0:1), base_small, panel.id = ~id+period)
etable(a, b)

# Using data.table to create new lead/lag variables
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)

  # Variable creation
  pdat_dt[, x_l1 := l(x1)]
  pdat_dt[, c("x_l1", "x_f1_2") := .(l(x1), f(x1)**2)]

  # Estimation on a subset of the data
  #  (the lead/lags work appropriately)
  feols(y~l(x1, 0:1), pdat_dt[!period %in% c(2, 4)])
}

Extracts one element from a fixest_multi object

Description

Extracts single elements from multiple fixest estimations.

Usage

## S3 method for class 'fixest_multi'
x[[i]]

Arguments

x

A fixest_multi object, obtained from a fixest estimation leading to multiple results.

i

An integer scalar. The identifier of the estimation to extract.

Value

A fixest object is returned.

See Also

The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)

# The first estimation
res[[1]]

# The second one, etc
res[[2]]

Aggregates the values of DiD coefficients a la Sun and Abraham

Description

Simple tool that aggregates the value of CATT coefficients in staggered difference-in-difference setups (see details).

Usage

## S3 method for class 'fixest'
aggregate(x, agg, full = FALSE, use_weights = TRUE, ...)

Arguments

x

A fixest object.

agg

A character scalar describing the variable names to be aggregated, it is pattern-based. For sunab estimations, the following keywords work: "att", "period", "cohort" and FALSE (to have full disaggregation). All variables that match the pattern will be aggregated. It must be of the form "(root)", the parentheses must be there and the resulting variable name will be "root". You can add another root with parentheses: "(root1)regex(root2)", in which case the resulting name is "root1::root2". To name the resulting variable differently you can pass a named vector: c("name" = "pattern") or c("name" = "pattern(root2)"). It's a bit intricate sorry, please see the examples.

full

Logical scalar, defaults to FALSE. If TRUE, then all coefficients are returned, not only the aggregated coefficients.

use_weights

Logical, default is TRUE. If the estimation was weighted, whether the aggregation should take into account the weights. Basically if the weights reflected frequency it should be TRUE.

...

Arguments to be passed to summary.fixest.

Details

This function helps replicate the estimator from Sun and Abraham (2021). You first need to perform an estimation with cohort and relative period dummies (typically using the function i), which leads to estimators of the cohort average treatment effect on the treated (CATT). Then you can use this function to retrieve the average treatment effect on each relative period, or to aggregate the CATT in any other way you wish.

Note that contrary to the SA article, here the cohort share in the sample is considered to be a perfect measure for the cohort share in the population.
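As a rough illustration of the aggregation step, the sketch below averages CATT estimates for a single relative period, using the cohort shares in the sample as weights. All numbers and names (catt, n_cohort, share) are hypothetical and chosen for illustration only; the sketch covers the point estimate, not the standard errors, and is not the package's implementation.

```r
# Hypothetical CATT estimates at one relative period, one per treatment cohort
catt <- c(cohort_2003 = 1.2, cohort_2005 = 0.8, cohort_2007 = 0.5)

# Hypothetical number of observations of each cohort at that relative period
n_cohort <- c(cohort_2003 = 50, cohort_2005 = 30, cohort_2007 = 20)

# Cohort shares in the sample, taken as the cohort shares in the population
share <- n_cohort / sum(n_cohort)

# Aggregated effect for the period: share-weighted average of the CATTs
att_period <- sum(share * catt)

# Equivalent formulation with weighted.mean()
all.equal(att_period, weighted.mean(catt, n_cohort))
```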

Value

It returns a matrix representing a table of coefficients.

Author(s)

Laurent Berge

References

Liyang Sun and Sarah Abraham, 2021, "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects". Journal of Econometrics.

Examples

#
# DiD example
#

data(base_stagg)

# 2 kind of estimations:
# - regular TWFE model
# - estimation with cohort x time_to_treatment interactions, later aggregated

# Note: the never treated have a time_to_treatment equal to -1000

# Now we perform the estimation
res_twfe = feols(y ~ x1 + i(time_to_treatment, treated,
                            ref = c(-1, -1000)) | id + year, base_stagg)

# we use the "i." prefix to force year_treated to be considered as a factor
res_cohort = feols(y ~ x1 + i(time_to_treatment, i.year_treated,
                              ref = c(-1, -1000)) | id + year, base_stagg)

# Displaying the results
iplot(res_twfe, ylim = c(-6, 8))
att_true = tapply(base_stagg$treatment_effect_true,
                  base_stagg$time_to_treatment, mean)[-1]
points(-9:8 + 0.15, att_true, pch = 15, col = 2)

# The aggregate effect for each period
agg_coef = aggregate(res_cohort, "(ti.*nt)::(-?[[:digit:]]+)")
x = c(-9:-2, 0:8) + .35
points(x, agg_coef[, 1], pch = 17, col = 4)
ci_low = agg_coef[, 1] - 1.96 * agg_coef[, 2]
ci_up = agg_coef[, 1] + 1.96 * agg_coef[, 2]
segments(x0 = x, y0 = ci_low, x1 = x, y1 = ci_up, col = 4)

legend("topleft", col = c(1, 2, 4), pch = c(20, 15, 17),
       legend = c("TWFE", "True", "Sun & Abraham"))

# The ATT
aggregate(res_cohort, c("ATT" = "treatment::[^-]"))
with(base_stagg, mean(treatment_effect_true[time_to_treatment >= 0]))

# The total effect for each cohort
aggregate(res_cohort, c("cohort" = "::[^-].*year_treated::([[:digit:]]+)"))

Transforms a character string into a dictionary

Description

Transforms a single character string containing a dictionary in a textual format into a proper dictionary, that is, a named character vector.

Usage

as.dict(x)

Arguments

x

A character scalar of the form "variable 1: definition \n variable 2: definition" etc. Each line of this character must contain at most one definition with, on the left the variable name, and on the right its definition. The separation between the variable and its definition must be a colon followed by a single space (i.e. ": "). You can stack definitions within a single line by making use of a semicolon: "var1: def; var2: def". White spaces on the left and right are ignored. You can add commented lines with a "#". Non-empty, non-commented lines that don't have the proper format will raise an error.

Details

This function is mostly used in combination with setFixest_dict to set the dictionary to be used in the function etable.
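To make the textual format concrete, here is a minimal base-R sketch of a parser that follows the rules described in the arguments. The helper parse_dict_sketch is hypothetical and is not the package's as.dict; it only mirrors the documented format (one definition per line, ": " as separator, ";" to stack definitions, "#" for comments).

```r
# Hypothetical helper, NOT fixest's as.dict(): a base-R sketch of the format
parse_dict_sketch <- function(x) {
  # one candidate item per line; surrounding white space is ignored
  items <- trimws(unlist(strsplit(x, "\n", fixed = TRUE)))
  # drop empty lines and commented lines
  items <- items[nzchar(items) & !startsWith(items, "#")]
  # stacked definitions are separated by a semicolon
  items <- trimws(unlist(strsplit(items, ";", fixed = TRUE)))
  items <- items[nzchar(items)]
  # every remaining item must contain the separator ": "
  bad <- !grepl(": ", items, fixed = TRUE)
  if (any(bad)) stop("Badly formatted line: ", items[which(bad)[1]])
  # split each item on the first ": " only
  pos <- regexpr(": ", items, fixed = TRUE)
  keys <- trimws(substr(items, 1, pos - 1))
  values <- trimws(substr(items, pos + 2, nchar(items)))
  setNames(values, keys)
}

x <- "# Main vars
      mpg: Miles per gallon
      hp: Horsepower
      # Categorical variables
      cyl: Number of cylinders; vs: Engine"

dict <- parse_dict_sketch(x)
dict
```

The result is a named character vector, the same kind of object as.dict is documented to return.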

Value

It returns a named character vector.

Author(s)

Laurent Berge

See Also

etable, setFixest_dict

Examples

x = "# Main vars
     mpg: Miles per gallon
     hp: Horsepower
     # Categorical variables
     cyl: Number of cylinders; vs: Engine"

as.dict(x)

Transforms a fixest_multi object into a list

Description

Extracts the results from a fixest_multi object and places them into a list.

Usage

## S3 method for class 'fixest_multi'
as.list(x, ...)

Arguments

x

A fixest_multi object, obtained from a fixest estimation leading to multiple results.

...

Not currently used.

Value

Returns a list containing all the results of the multiple estimations.

See Also

The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)

# All the results at once
as.list(res)

Sample data for difference in difference

Description

This data has been generated to illustrate the use of difference in difference functions in package fixest. This is a balanced panel of 104 individuals and 10 periods. About half the individuals are treated, the treatment having a positive effect on the dependent variable y after the 5th period. The effect of the treatment on y is gradual.

Usage

data(base_did, package = "fixest")

Format

base_did is a data frame with 1,040 observations and 6 variables named y, x1, id, period, post and treat.

y

The dependent variable affected by the treatment.

x1

An explanatory variable.

id

Identifier of the individual.

period

From 1 to 10

post

Indicator taking value 1 if the period is strictly greater than 5, 0 otherwise.

treat

Indicator taking value 1 if the individual is treated, 0 otherwise.

Source

This data has been generated from R.


Publication data sample

Description

This data reports the publication output (number of articles and number of citations received) for a few scientists from the start of their career to 2000. Most of the variables are processed from the Microsoft Academic Graph (MAG) data set. A few variables are randomly generated.

Usage

data(base_pub, package = "fixest")

Format

base_pub is a data frame with 4,024 observations and 10 variables. There are 200 different scientists and 51 different years (ends in 2000).

Source

The source of this data set is the Microsoft Academic Graph data set, extracted in 2020. The project is now defunct; similar data can be found on OpenAlex.

The variables birth_year, is_woman and age were randomly generated. All other variables have been created from the raw MAG files.


Sample data for staggered difference in difference

Description

This data has been generated to illustrate the Sun and Abraham (Journal of Econometrics, 2021) method for staggered difference-in-difference. This is a balanced panel of 95 individuals and 10 periods. Half the individuals are treated. For those treated, the treatment date can vary from the second to the last period. The effect of the treatment depends on the time since the treatment: it is first negative and then increasing.

Usage

data(base_stagg, package = "fixest")

Format

base_stagg is a data frame with 950 observations and 7 variables:

Source

This data has been generated from R.


Bins the values of a variable (typically a factor)

Description

Tool to easily group the values of a given variable.

Usage

bin(x, bin)

Arguments

x

A vector whose values have to be grouped. Can be of any type but must be atomic.

bin

A list of values to be grouped, a vector, a formula, or the special values "bin::digit" or "cut::values". To create a new value from old values, use bin = list("new_value"=old_values) with old_values a vector of existing values. You can use .() for list(). It accepts regular expressions, but they must start with an "@", like in bin="@Aug|Dec". It accepts one-sided formulas which must contain the variable x, e.g. bin=list("<2" = ~x < 2). The names of the list are the new names. If the new name is missing, the first value matched becomes the new name. In the name, adding "@d", with d a digit, will relocate the value in position d: useful to change the position of factors. Use "@" as first item to make subsequent items be located first in the factor. Feeding in a vector is like using a list without name and only a single element. If the vector is numeric, you can use the special value "bin::digit" to group every digit element. For example if x represents years, using bin="bin::2" creates bins of two years. With any data, using "!bin::digit" groups every digit consecutive values starting from the first value. Using "!!bin::digit" is the same but starting from the last value. With numeric vectors you can: a) use "cut::n" to cut the vector into n equal parts, b) use "cut::a]b[" to create the following bins: [min, a], ]a, b[, [b, max]. The latter syntax is a sequence of number/quartile (q0 to q4)/percentile (p0 to p100) followed by an open or closed square bracket. You can add custom bin names by adding them in the character vector after 'cut::values'. See details and examples. Dot square bracket expansion (see dsb) is enabled.

Value

It returns a vector of the same length as x.

"Cutting" a numeric vector

Numeric vectors can be cut easily into: a) equal parts, b) user-specified bins.

Use "cut::n" to cut the vector into n (roughly) equal parts. Percentiles are used to partition the data, hence some data distributions can lead to fewer than n parts (for example if P0 is the same as P50).

The user can specify custom bins with the following syntax: "cut::a]b]c]". Here the numbers a, b, c, etc, are a sequence of increasing numbers, each followed by an open or closed square bracket. The numbers can be specified as either plain numbers (e.g. "cut::5]12[32["), quartiles (e.g. "cut::q1]q3["), or percentiles (e.g. "cut::p10]p15]p90]"). Values of different types can be mixed: "cut::5]q2[p80[" is valid provided the median (q2) is indeed greater than 5, otherwise an error is thrown.

The square bracket right of each number tells whether the numbers should be included or excluded from the current bin. For example, say x ranges from 0 to 100, then "cut::5]" will create two bins: one from 0 to 5 and a second from 6 to 100. With "cut::5[" the bins would have been 0-4 and 5-100.
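The effect of the bracket can be mimicked in base R. The sketch below does not call bin; it only reproduces, with plain comparisons on an integer vector, the two partitions described above. The names bins_closed and bins_open are illustrative.

```r
x <- 0:100  # integer values ranging from 0 to 100

# "cut::5]": the closed bracket keeps 5 in the first bin (bins 0-5 and 6-100)
bins_closed <- ifelse(x <= 5, "first", "second")

# "cut::5[": the open bracket moves 5 to the second bin (bins 0-4 and 5-100)
bins_open <- ifelse(x < 5, "first", "second")

# Min and max of each bin under the two conventions
range(x[bins_closed == "first"])
range(x[bins_closed == "second"])
range(x[bins_open == "first"])
range(x[bins_open == "second"])
```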

A factor is always returned. The labels always report the min and max values in each bin.

To have user-specified bin labels, just add them in the character vector following 'cut::values'. You don't need to provide all of them, and NA values fall back to the default label. For example, bin = c("cut::4", "Q1", NA, "Q3") will modify only the first and third label that will be displayed as "Q1" and "Q3".

bin vs ref

The functions bin and ref are able to do the same thing, so why use one instead of the other? Here are the differences:

Author(s)

Laurent Berge

See Also

To re-factor variables: ref.

Examples

data(airquality)
month_num = airquality$Month
table(month_num)

# Grouping the first two values
table(bin(month_num, 5:6))

# ... plus changing the name to '10'
table(bin(month_num, list("10" = 5:6)))

# ... and grouping 7 to 9
table(bin(month_num, list("g1" = 5:6, "g2" = 7:9)))

# Grouping every two months
table(bin(month_num, "bin::2"))

# ... every 2 consecutive elements
table(bin(month_num, "!bin::2"))

# ... idem starting from the last one
table(bin(month_num, "!!bin::2"))

# Using .() for list():
table(bin(month_num, .("g1" = 5:6)))

#
# with non numeric data
#

month_lab = c("may", "june", "july", "august", "september")
month_fact = factor(month_num, labels = month_lab)

# Grouping the first two elements
table(bin(month_fact, c("may", "jun")))

# ... using regex
table(bin(month_fact, "@may|jun"))

# ...changing the name
table(bin(month_fact, list("spring" = "@may|jun")))

# Grouping every 2 consecutive months
table(bin(month_fact, "!bin::2"))

# ...idem but starting from the last
table(bin(month_fact, "!!bin::2"))

# Relocating the months using "@d" in the name
table(bin(month_fact, .("@5" = "may", "@1 summer" = "@aug|jul")))

# Putting "@" as first item means subsequent items will be placed first
table(bin(month_fact, .("@", "aug", "july")))

#
# "Cutting" numeric data
#

data(iris)
plen = iris$Petal.Length

# 3 parts of (roughly) equal size
table(bin(plen, "cut::3"))

# Three custom bins
table(bin(plen, "cut::2]5]"))

# .. same, excluding 5 in the 2nd bin
table(bin(plen, "cut::2]5["))

# Using quartiles
table(bin(plen, "cut::q1]q2]q3]"))

# Using percentiles
table(bin(plen, "cut::p20]p50]p70]p90]"))

# Mixing all
table(bin(plen, "cut::2[q2]p90]"))

# NOTA:
# -> the labels always contain the min/max values in each bin

# Custom labels can be provided, just give them in the char. vector
# NA values lead to the default label
table(bin(plen, c("cut::2[q2]p90]", "<2", "]2; Q2]", NA, ">90%")))

#
# With a formula
#

data(iris)
plen = iris$Petal.Length

# We need to use "x"
table(bin(plen, list("< 2" = ~x < 2, ">= 2" = ~x >= 2)))

Extracts the bread matrix from fixest objects

Description

Extracts the bread matrix from fixest objects to be used to compute sandwich variance-covariance matrices.

Usage

## S3 method for class 'fixest'
bread(x, ...)

Arguments

x

A fixest object, obtained for instance from feols.

...

Not currently used.

Value

Returns a matrix of the same dimension as the number of variables used in the estimation.

Examples

est = feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
bread(est)

Check the fixed-effects convergence of a feols estimation

Description

Checks the convergence of a feols estimation by computing the first-order conditions of all fixed-effects (all should be close to 0).

Usage

check_conv_feols(x)

## S3 method for class 'fixest_check_conv'
summary(object, type = "short", ...)

Arguments

x

A feols estimation that should contain fixed-effects.

object

An object returned by check_conv_feols.

type

Either "short" (default) or "detail". If "short", only the maximum absolute FOC is displayed, otherwise the 2 smallest and the 2 largest FOC are reported for each fixed-effect and each variable.

...

Not currently used.

Note that this function first re-demeans the variables, thus possibly incurring some extra computation time.
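The idea behind the check can be sketched in base R: once a variable has been demeaned with respect to several fixed-effects, its sum within every level of every fixed-effect should be numerically zero. The sketch below uses simulated data and a simple alternating demeaning loop; both are assumptions made for illustration and do not describe the package's own demeaning algorithm. The names fe1, fe2 and x_dm are hypothetical.

```r
set.seed(1)
n <- 200
fe1 <- sample(1:10, n, replace = TRUE)  # first fixed-effect identifier
fe2 <- sample(1:8, n, replace = TRUE)   # second fixed-effect identifier
x <- rnorm(n)                           # variable to demean

# Alternating demeaning: remove the group means of each fixed-effect in turn
x_dm <- x
for (iter in 1:50) {
  x_dm <- x_dm - ave(x_dm, fe1)
  x_dm <- x_dm - ave(x_dm, fe2)
}

# First-order conditions: within-level sums of the demeaned variable
foc_fe1 <- tapply(x_dm, fe1, sum)
foc_fe2 <- tapply(x_dm, fe2, sum)

# Both maxima should be close to 0 when the demeaning has converged
max(abs(foc_fe1))
max(abs(foc_fe2))
```

A large value for one of the fixed-effects would signal that the demeaning stopped before reaching convergence.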

Value

It returns a list of N elements, N being the number of variables in the estimation (dependent variable + explanatory variables +, if IV, endogenous variables and instruments). For each variable, all the first-order conditions for each fixed-effect are returned.

Examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
base$FE = rep(1:30, 5)

# one estimation with fixed-effects + varying slopes
est = feols(y ~ x1 | species[x2] + FE[x3], base)

# Checking the convergence
conv = check_conv_feols(est)

# We can check that all values are close to 0
summary(conv)

summary(conv, "detail")

Extracts the coefficients from a fixest estimation

Description

This function extracts the coefficients obtained from a model estimated with femlm, feols or feglm.

Usage

## S3 method for class 'fixest'
coef(object, keep, drop, order, collin = FALSE, agg = TRUE, ...)

## S3 method for class 'fixest'
coefficients(object, keep, drop, order, collin = FALSE, agg = TRUE, ...)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variable x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variable x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.

order

Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see base::regex help for more info). The variables satisfying the first regular expression will be placed first, then the order follows the sequence of regular expressions. This argument is applied post aliasing (see argument dict). Example: you have the following variables: month1 to month6, then x1 to x5, then year1 to year6. If you want to display first the x's, then the years, then the months you could use: order = c("x", "year"). If the first character is an exclamation mark, the effect is reversed (e.g. order = "!Intercept" means: every variable that does not contain “Intercept” goes first). See details.

collin

Logical, default is FALSE. Whether the coefficients removed because of collinearity should also be returned as NA. It cannot be used when coefficients aggregation is also used.

agg

Logical scalar, default is TRUE. If the coefficients of the estimation have been aggregated, whether to report the aggregated coefficients. If FALSE, the raw coefficients will be returned.

...

Not currently used.
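The patterns given as examples for keep and drop can be tried out with base R's grepl. The sketch below only illustrates the regular expressions and the documented effect of a leading exclamation mark; it is not the package's selection code, and the variable names are made up.

```r
vars <- paste0("x", 1:55)  # hypothetical variable names x1 to x55

# keep = "x[[:digit:]]$": "x" followed by a single digit at the end of the name
kept <- vars[grepl("x[[:digit:]]$", vars)]

# drop = "x[[:digit:]]{2}": names with "x" followed by two digits are discarded
not_dropped <- vars[!grepl("x[[:digit:]]{2}", vars)]

# Both selections leave x1 to x9
identical(kept, not_dropped)

# A leading "!" reverses the effect: keep = "!Intercept" keeps every
# variable that does not contain "Intercept"
all_vars <- c("(Intercept)", "x1", "x2")
pattern <- "!Intercept"
reversed <- startsWith(pattern, "!")
hits <- grepl(sub("^!", "", pattern), all_vars)
kept_neg <- all_vars[if (reversed) !hits else hits]
kept_neg
```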

Details

The coefficients are the ones that have been found to maximize the log-likelihood of the specified model. More information can be found on the models from the estimations help pages: femlm, feols or feglm.

Note that if the model has been estimated with fixed-effects, to obtain the fixed-effect coefficients, you need to use the function fixef.fixest.

Value

This function returns a named numeric vector.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. summary.fixest, confint.fixest, vcov.fixest, etable, fixef.fixest.

Examples

# simple estimation on iris data, using "Species" fixed-effects
res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
            Petal.Width | Species, iris)

# the coefficients of the variables:
coef(res)

# the fixed-effects coefficients:
fixef(res)

Extracts the coefficients of fixest_multi objects

Description

Utility to extract the coefficients of multiple estimations and rearrange them into a matrix.

Usage

## S3 method for class 'fixest_multi'
coef(
  object,
  keep,
  drop,
  order,
  collin = FALSE,
  long = FALSE,
  na.rm = TRUE,
  ...
)

## S3 method for class 'fixest_multi'
coefficients(
  object,
  keep,
  drop,
  order,
  collin = FALSE,
  long = FALSE,
  na.rm = TRUE,
  ...
)

Arguments

object

A fixest_multi object. Obtained from a multiple estimation.

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variable x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variable x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.

order

Character vector. This element is used if the user wants the variables to beordered in a certain way. This should be a vector of regular expressions (seebase::regexhelp for more info). The variables satisfying the first regular expression will be placed first,then the order follows the sequence of regular expressions. This argument is applied postaliasing (see argumentdict). Example: you have the following variables:month1 tomonth6,thenx1 tox5, thenyear1 toyear6. If you want to display first the x's, then theyears, then the months you could use:order = c("x", "year"). If the first character is anexclamation mark, the effect is reversed (e.g. order = "!Intercept" means: every variable thatdoes not contain “Intercept” goes first). See details.

collin

Logical, default is FALSE. Whether the coefficients removed because of collinearity should also be returned as NA. It cannot be used when coefficients aggregation is also used.

long

Logical, default is FALSE. Whether the results should be displayed in a long format.

na.rm

Logical, default is TRUE. Only applies when long = TRUE: whether to remove the coefficients with NA values.

...

Not currently used.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# A multiple estimation
est = feols(y ~ x1 + csw0(x2, x3), base)

# Getting all the coefficients at once,
# each row is a model
coef(est)

# Example of keep/drop/order
coef(est, keep = "Int|x1", order = "x1")

# To change the order of the model, use fixest_multi
# extraction tools:
coef(est[rhs = .N:1])

# collin + long + na.rm
base$x1_bis = base$x1 # => collinear
est = feols(y ~ x1_bis + csw0(x1, x2, x3), base, split = ~species)

# does not display x1 since it is always collinear
coef(est)

# now it does
coef(est, collin = TRUE)

# long
coef(est, long = TRUE)

# long but balanced (with NAs then)
coef(est, long = TRUE, na.rm = FALSE)

Plots confidence intervals and point estimates

Description

This function plots the results of estimations (coefficients and confidence intervals). The function iplot restricts the output to variables created with i, either interactions with factors or raw factors.

Usage

coefplot(
  ...,
  objects = NULL,
  style = NULL,
  se,
  ci_low,
  ci_high,
  df.t = NULL,
  vcov = NULL,
  cluster = NULL,
  x,
  x.shift = 0,
  horiz = FALSE,
  dict = NULL,
  keep,
  drop,
  order,
  ci.width = "1%",
  ci_level = 0.95,
  add = FALSE,
  plot_prms = list(),
  pch = c(20, 17, 15, 21, 24, 22),
  col = 1:8,
  cex = 1,
  lty = 1,
  lwd = 1,
  ylim = NULL,
  xlim = NULL,
  pt.pch = pch,
  pt.bg = NULL,
  pt.cex = cex,
  pt.col = col,
  ci.col = col,
  pt.lwd = lwd,
  ci.lwd = lwd,
  ci.lty = lty,
  grid = TRUE,
  grid.par = list(lty = 3, col = "gray"),
  zero = TRUE,
  zero.par = list(col = "black", lwd = 1),
  pt.join = FALSE,
  pt.join.par = list(col = pt.col, lwd = lwd),
  ci.join = FALSE,
  ci.join.par = list(lwd = lwd, col = col, lty = 2),
  ci.fill = FALSE,
  ci.fill.par = list(col = "lightgray", alpha = 0.5),
  ref = "auto",
  ref.line = "auto",
  ref.line.par = list(col = "black", lty = 2),
  lab.cex,
  lab.min.cex = 0.85,
  lab.max.mar = 0.25,
  lab.fit = "auto",
  xlim.add,
  ylim.add,
  only.params = FALSE,
  sep,
  as.multiple = FALSE,
  bg,
  group = "auto",
  group.par = list(lwd = 2, line = 3, tcl = 0.75),
  main = "Effect on __depvar__",
  value.lab = "Estimate and __ci__ Conf. Int.",
  ylab = NULL,
  xlab = NULL,
  sub = NULL,
  i.select = NULL,
  do_iplot = NULL
)

iplot(
  ...,
  i.select = 1,
  objects = NULL,
  style = NULL,
  se,
  ci_low,
  ci_high,
  df.t = NULL,
  vcov = NULL,
  cluster = NULL,
  x,
  x.shift = 0,
  horiz = FALSE,
  dict = NULL,
  keep,
  drop,
  order,
  ci.width = "1%",
  ci_level = 0.95,
  add = FALSE,
  plot_prms = list(),
  pch = c(20, 17, 15, 21, 24, 22),
  col = 1:8,
  cex = 1,
  lty = 1,
  lwd = 1,
  ylim = NULL,
  xlim = NULL,
  pt.pch = pch,
  pt.bg = NULL,
  pt.cex = cex,
  pt.col = col,
  ci.col = col,
  pt.lwd = lwd,
  ci.lwd = lwd,
  ci.lty = lty,
  grid = TRUE,
  grid.par = list(lty = 3, col = "gray"),
  zero = TRUE,
  zero.par = list(col = "black", lwd = 1),
  pt.join = FALSE,
  pt.join.par = list(col = pt.col, lwd = lwd),
  ci.join = FALSE,
  ci.join.par = list(lwd = lwd, col = col, lty = 2),
  ci.fill = FALSE,
  ci.fill.par = list(col = "lightgray", alpha = 0.5),
  ref = "auto",
  ref.line = "auto",
  ref.line.par = list(col = "black", lty = 2),
  lab.cex,
  lab.min.cex = 0.85,
  lab.max.mar = 0.25,
  lab.fit = "auto",
  xlim.add,
  ylim.add,
  only.params = FALSE,
  sep,
  as.multiple = FALSE,
  bg,
  group = "auto",
  group.par = list(lwd = 2, line = 3, tcl = 0.75),
  main = "Effect on __depvar__",
  value.lab = "Estimate and __ci__ Conf. Int.",
  ylab = NULL,
  xlab = NULL,
  sub = NULL
)

Arguments

...

Other arguments to be passed to summary, if object is an estimation, and/or to the function plot or lines (if add = TRUE).

objects

A list of fixest estimation objects, or NULL (default). If provided, the objects in ... are ignored and the only coefficients reported are the ones in the argument objects.

style

A character scalar giving the style of the plot to be used. You can set styles with the function setFixest_coefplot, setting all the default values of the function. If missing, then it switches to either "default" or "iplot", depending on the calling function.

se

The standard errors of the estimates. It may be missing.

ci_low

If se is not provided, the lower bound of the confidence interval, for each estimate.

ci_high

If se is not provided, the upper bound of the confidence interval, for each estimate.

df.t

Integer scalar or NULL (default). The degrees of freedom (DoF) to use when computing the confidence intervals with the Student t. By default it tries to capture the DoF from the estimation. To use a Normal law to compute the confidence interval, use df.t = Inf.
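The role of df.t can be sketched in base R. This is only an illustration of the usual Student t interval, not the package's internal code; the values of estimate, se_est and df_t below are hypothetical.

```r
# Hypothetical inputs: a point estimate, its standard error and the DoF
estimate <- 1.5
se_est   <- 0.4
ci_level <- 0.95
df_t     <- 10

# Two-sided critical values
alpha  <- 1 - ci_level
crit_t <- qt(1 - alpha / 2, df = df_t)  # Student t with df_t degrees of freedom
crit_n <- qt(1 - alpha / 2, df = Inf)   # df = Inf gives the Normal quantile

# Confidence intervals: estimate +/- critical value * standard error
ci_t <- estimate + c(-1, 1) * crit_t * se_est
ci_n <- estimate + c(-1, 1) * crit_n * se_est

print(rbind(student = ci_t, normal = ci_n))
```

With few degrees of freedom the Student t interval is wider than the Normal one, and the two coincide as the DoF grow.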

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette. You can pass several VCOVs (as above) if you nest them into a list. If the number of VCOVs equals the number of models, each VCOV is mapped to the appropriate model. If there is one model and several VCOVs, or if the first element of the list is equal to "each" or "times", then the estimations will be replicated and the results for each estimation and each VCOV will be reported.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

x

The value of the x-axis. If missing, the names of the argument estimate are used.

x.shift

Shifts the confidence intervals bars to the left or right, depending on the value of x.shift. Default is 0.

horiz

A logical scalar, default is FALSE. Whether to display the confidence intervals horizontally instead of vertically.

dict

A named character vector or a logical scalar. It changes the original variable names to the ones contained in the dictionary. E.g. to change the variables named a and b3 to (resp.) “$log(a)$” and to “$bonus^3$”, use dict=c(a="$log(a)$",b3="$bonus^3$"). By default, it is equal to getFixest_dict(), a default dictionary which can be set with setFixest_dict. You can use dict = FALSE to disable it. By default dict modifies the entries in the global dictionary, to disable this behavior, use "reset" as the first element (ex: dict=c("reset", mpg="Miles per gallon")).

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.

order

Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see base::regex help for more info). The variables satisfying the first regular expression will be placed first, then the order follows the sequence of regular expressions. This argument is applied post aliasing (see argument dict). Example: you have the following variables: month1 to month6, then x1 to x5, then year1 to year6. If you want to display first the x's, then the years, then the months you could use: order = c("x", "year"). If the first character is an exclamation mark, the effect is reversed (e.g. order = "!Intercept" means: every variable that does not contain “Intercept” goes first). See details.

ci.width

The width of the extremities of the confidence intervals. Default is "1%" (see Usage).

ci_level

Scalar between 0 and 1: the level of the CI. By default it is equal to 0.95.

add

Logical, default is FALSE. Whether the intervals are to be added to an existing graph. Note that in that case, the argument x MUST be numeric.

plot_prms

A named list. It may contain additional parameters to be passed to the plot.

pch

The point symbol (pch) of the coefficient estimates. Default is c(20, 17, 15, 21, 24, 22) (see Usage). This is an alias to the argument pt.pch.

col

The color of the points and the confidence intervals. Default is 1:8 (see Usage), where 1 is "black". Note that you can set the colors separately for each of them with pt.col and ci.col.

cex

Numeric, default is 1. Expansion factor for the points.

lty

The line type of the confidence intervals. Default is 1. This is an alias to the argument ci.lty.

lwd

General line width. Default is 1.

ylim

Numeric vector of length 2 which gives the limits of the plotting region for the y-axis. The default is NULL, which means that it is automatically defined. Use the argument ylim.add to simply increase or decrease the default limits.

xlim

Numeric vector of length 2 which gives the limits of the plotting region for the x-axis. The default is NULL, which means that it is automatically defined. Use the argument xlim.add to simply increase or decrease the default limits.

pt.pch

The point symbol (pch) of the coefficient estimates. Default is equal to the argument pch.

pt.bg

The background color of the point estimate (when pt.pch is in 21 to 25). Defaults to NULL.

pt.cex

The size of the coefficient estimates. Default is the other argument cex.

pt.col

The color of the coefficient estimates. Default is equal to the argument col.

ci.col

The color of the confidence intervals. Default is equal to the argument col.

pt.lwd

The line width of the coefficient estimates. Default is equal to the other argument lwd.

ci.lwd

The line width of the confidence intervals. Default is equal to the other argument lwd.

ci.lty

The line type of the confidence intervals. Default is equal to the argument lty, i.e. 1.

grid

Logical, default is TRUE. Whether a grid should be displayed. You can set the display of the grid with the argument grid.par.

grid.par

List. Parameters of the grid. The default values are: lty = 3 and col = "gray". You can add any graphical parameter that will be passed to graphics::abline. You also have two additional arguments: use horiz = FALSE to disable the horizontal lines, and use vert = FALSE to disable the vertical lines. Eg: grid.par = list(vert = FALSE, col = "red", lwd = 2).

zero

Logical, default is TRUE. Whether the 0-line should be emphasized. You can set the parameters of that line with the argument zero.par.

zero.par

List. Parameters of the zero-line. The default values are col = "black" and lwd = 1. You can add any graphical parameter that will be passed to graphics::abline. Example: zero.par = list(col = "darkblue", lwd = 3).

pt.join

Logical, default is FALSE. If TRUE, then the coefficient estimates are joined with a line.

pt.join.par

List. Parameters of the line joining the coefficients. The default values are: col = pt.col and lwd = lwd. You can add any graphical parameter that will be passed to lines. Eg: pt.join.par = list(lty = 2).

ci.join

Logical, default is FALSE. Whether to join the extremities of the confidence intervals. If TRUE, then you can set the graphical parameters with the argument ci.join.par.

ci.join.par

A list of parameters to be passed to graphics::lines. Only used if ci.join=TRUE. By default it is equal to list(lwd = lwd, col = col, lty = 2).

ci.fill

Logical, default is FALSE. Whether to fill the confidence intervals with a color. If TRUE, then you can set the graphical parameters with the argument ci.fill.par.

ci.fill.par

A list of parameters to be passed to graphics::polygon. Only used if ci.fill=TRUE. By default it is equal to list(col = "lightgray", alpha = 0.5). Note that alpha is a special parameter that adds transparency to the color (ranges from 0 to 1).
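To give an idea of what a filled confidence band with transparency looks like in base graphics, here is a minimal sketch. It assumes that the alpha parameter behaves like the alpha.f argument of grDevices::adjustcolor, which may differ from what the package does internally; the vectors x, lo and hi are hypothetical.

```r
# Hypothetical x positions and confidence bounds
x  <- 1:5
lo <- c(-1.0, -0.5, 0.0, 0.2, 0.4)
hi <- c( 0.5,  1.0, 1.5, 1.8, 2.2)

# Semi-transparent fill color (assumption: alpha works like alpha.f)
fill_col <- grDevices::adjustcolor("lightgray", alpha.f = 0.5)

grDevices::pdf(NULL)  # draw off-screen so that the sketch runs anywhere
plot(range(x), range(lo, hi), type = "n", xlab = "x", ylab = "Estimate")
# The band is a closed polygon: lower bound left to right, upper bound back
polygon(c(x, rev(x)), c(lo, rev(hi)), col = fill_col, border = NA)
invisible(grDevices::dev.off())
```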

ref

Used to add points at y = 0 (typically to visualize reference points). Either: i) "auto" (default), ii) a character vector of length 1, iii) a list of length 1, iv) a named integer vector of length 1, or v) a numeric vector. By default, in iplot, if the argument ref has been used in the estimation, these references are automatically added. If ii), i.e. a character scalar, then that coefficient equal to zero is added as the first coefficient. If a list or a named integer vector of length 1, then the integer gives the position of the reference among the coefficients and the name gives the coefficient name. A non-named numeric value of ref only works if the x-axis is also numeric (which can happen in iplot).

ref.line

Logical or numeric, default is "auto", whose behavior depends on the situation. It is TRUE only if: i) interactions are plotted, ii) the x values are numeric and iii) a reference is found. If TRUE, then a vertical line is drawn at the level of the reference value. Otherwise, if numeric a vertical line will be drawn at that specific value.

ref.line.par

List. Parameters of the vertical line on the reference. The default values are: col = "black" and lty = 2. You can add any graphical parameter that will be passed to graphics::abline. Eg: ref.line.par = list(lty = 1, lwd = 3).

lab.cex

The size of the labels of the coefficients. Default is missing. It is automatically set by an internal algorithm which can go as low as lab.min.cex (another argument).

lab.min.cex

The minimum size of the coefficients labels, as set by the internal algorithm. Default is 0.85.

lab.max.mar

The maximum size the left margin can take when trying to fit the coefficient labels into it (only when horiz = TRUE). This is used in the internal algorithm fitting the coefficient labels. Default is 0.25.

lab.fit

The method to fit the coefficient labels into the plotting region (only when horiz = FALSE). Can be "auto" (the default), "simple", "multi" or "tilted". If "simple", then the classic axis is drawn. If "multi", then the coefficient labels are fit horizontally across several lines, such that they don't collide. If "tilted", then the labels are tilted. If "auto", an automatic choice between the three is made.

xlim.add

A numeric vector of length 1 or 2. It represents an extension factor of xlim, in percentage. Eg: xlim.add = c(0, 0.5) extends xlim of 50% on the right. If of length 1, positive values represent the right, and negative values the left (Eg: xlim.add = -0.5 is equivalent to xlim.add = c(0.5, 0)).

ylim.add

A numeric vector of length 1 or 2. It represents an extension factor of ylim, in percentage. Eg: ylim.add = c(0, 0.5) extends ylim of 50% on the top. If of length 1, positive values represent the top, and negative values the bottom (Eg: ylim.add = -0.5 is equivalent to ylim.add = c(0.5, 0)).
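The arithmetic behind xlim.add and ylim.add can be sketched as follows. The helper extend_lim is hypothetical (it is not a function of the package) and it assumes that the extension is expressed as a share of the current range of the axis.

```r
# Hypothetical helper: extends a length-2 limit by a share of its range
extend_lim <- function(lim, add) {
  if (length(add) == 1) {
    # a single value: positive = right/top side, negative = left/bottom side
    add <- if (add >= 0) c(0, add) else c(-add, 0)
  }
  width <- diff(lim)
  c(lim[1] - add[1] * width, lim[2] + add[2] * width)
}

# 50% more room on the right (or on the top)
extend_lim(c(0, 10), c(0, 0.5))

# a single negative value extends the left (or bottom) side,
# which is the same as using c(0.5, 0)
extend_lim(c(0, 10), -0.5)
extend_lim(c(0, 10), c(0.5, 0))
```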

only.params

Logical, default is FALSE. If TRUE no graphic is displayed, only the values of x and y used in the plot are returned.

sep

The distance between two estimates – only when argument object is a list of estimation results.

as.multiple

Logical: default is FALSE. Only when object is a single estimation result: whether each coefficient should have a different color, line type, etc. By default they all get the same style.

bg

Background color for the plot. By default it is white.

group

A list, default is missing. Each element of the list reports the coefficients to be grouped while the name of the element is the group name. Each element of the list can be either: i) a character vector of length 1, ii) of length 2, or iii) a numeric vector. If equal to: i) then it is interpreted as a pattern: all elements fitting the regular expression will be grouped (note that you can use the special character "^^" to clean the beginning of the names, see example), if ii) it corresponds to the first and last elements to be grouped, if iii) it corresponds to the coefficients numbers to be grouped. If equal to a character vector, you can use a percentage sign to tell the algorithm to look at the coefficients before aliasing (e.g. "%varname"). Example of valid uses: group=list(group_name="pattern"), group=list(group_name=c("var_start", "var_end")), group=list(group_name=1:2). See details.

group.par

A list of parameters controlling the display of the group. The parameters controlling the line are: lwd, tcl (length of the tick), line.adj (adjustment of the position, default is 0), tick (whether to add the ticks), lwd.ticks, col.ticks. Then the parameters controlling the text: text.adj (adjustment of the position, default is 0), text.cex, text.font, text.col.

main

The title of the plot. Default is "Effect on __depvar__". You can use the special variable __depvar__ to set the title (useful when you set the plot default with setFixest_coefplot).

value.lab

The label to appear on the side of the coefficient values. If horiz = FALSE, the label appears in the y-axis. If horiz = TRUE, then it appears on the x-axis. The default is equal to "Estimate and __ci__ Conf. Int.", with __ci__ a special variable giving the value of the confidence interval.

ylab

The label of the y-axis, default is NULL. Note that if horiz = FALSE, it overrides the value of the argument value.lab.

xlab

The label of the x-axis, default is NULL. Note that if horiz = TRUE, it overrides the value of the argument value.lab.

sub

A subtitle, default is NULL.

i.select

Integer scalar, default is 1. In iplot, used to select which variable created with i() to select. Only used when there are several variables created with i. This is an index, just try increasing numbers to hopefully obtain what you want. Note that it works much better when the variables are "pure" i() and not interacted with other variables. For example: i(species, x1) is good while i(species):x1 isn't. The latter will also work but the index may feel weird in case there are many i() variables.

do_iplot

Logical, default is FALSE. For internal use only. If TRUE, then iplot is run instead of coefplot.

Setting custom default values

The function coefplot has many arguments to parametrize the plots. Most of these arguments can be set once and for all using the function setFixest_coefplot. See Example 3 below for a demonstration.

iplot

The function iplot restricts coefplot to interactions or factors created with the function i. Only one of the i-variables will be plotted at a time. If you have several i-variables, you can navigate through them with the i.select argument.

The argument i.select is an index that will go through all the i-variables. It will work well if the variables are pure, meaning not interacted with other variables. If the i-variables are interacted, the index may have an odd behavior but will (in most cases) work all the same, just try some numbers up until you (hopefully) obtain the graph you want.

Note, importantly, that interactions of two factor variables are (in general) disregarded since they would require a 3-D plot to be properly represented.

Arguments keep, drop and order

The arguments keep, drop and order use regular expressions. If you are not familiar with regular expressions, I urge you to learn them, since they are an extremely powerful way to manipulate character strings (and they exist across most programming languages).

For example drop = "Wind" would drop any variable whose name contains "Wind". Note that variables such as "Temp:Wind" or "StrongWind" do contain "Wind", so would be dropped. To drop only the variable named "Wind", you need to use drop = "^Wind$" (with "^" meaning beginning, resp. "$" meaning end, of the string => this is the language of regular expressions).

Although you can combine several regular expressions in a single character string using pipes, drop also accepts a vector of regular expressions.

You can use the special character "!" (exclamation mark) to reverse the effect of the regular expression (this feature is specific to this function). For example drop = "!Wind" would drop any variable that does not contain "Wind".
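The matching rules described so far can be reproduced with base R regular expressions. The helpers matches_any, sketch_keep and sketch_drop below are hypothetical: they only mimic the documented behavior (any pattern matches, a leading "!" reverses the pattern) and are not the package's implementation.

```r
# TRUE for each variable matching at least one pattern;
# a leading "!" reverses the effect of that pattern
matches_any <- function(vars, patterns) {
  hits <- lapply(patterns, function(p) {
    negate <- startsWith(p, "!")
    if (negate) p <- substring(p, 2)
    hit <- grepl(p, vars)
    if (negate) !hit else hit
  })
  Reduce(`|`, hits)
}

sketch_keep <- function(vars, keep) vars[matches_any(vars, keep)]
sketch_drop <- function(vars, drop) vars[!matches_any(vars, drop)]

vars <- c("(Intercept)", "Wind", "Temp:Wind", "StrongWind", "Temp")

sketch_drop(vars, "Wind")    # every name containing "Wind" is dropped
sketch_drop(vars, "^Wind$")  # only the variable named exactly "Wind" is dropped
sketch_drop(vars, "!Wind")   # names NOT containing "Wind" are dropped

# The x1 to x55 example from the argument descriptions
xvars <- paste0("x", 1:55)
sketch_keep(xvars, "x[[:digit:]]$")    # x1 to x9
sketch_drop(xvars, "x[[:digit:]]{2}")  # x1 to x9 as well
```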

You can use the special character "%" (percentage) to make reference to the original variable name instead of the aliased name. For example, you have a variable named "Month6", and use a dictionary dict = c(Month6="June"). Thus the variable will be displayed as "June". If you want to delete that variable, you can use either drop="June", or drop="%Month6" (which makes reference to its original name).

The argument order takes in a vector of regular expressions, the order will follow the elements of this vector. The vector gives a list of priorities, on the left the elements with highest priority. For example, order = c("Wind", "!Inter", "!Temp") would give highest priorities to the variables containing "Wind" (which would then appear first), second highest priority is the variables not containing "Inter", last, with lowest priority, the variables not containing "Temp". If you had the following variables: (Intercept), Temp:Wind, Wind, Temp you would end up with the following order: Wind, Temp:Wind, Temp, (Intercept).
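The priority rule of order can be read as a sort on several keys, one key per regular expression, the first key being the most important. The helper sketch_order below is hypothetical and only reproduces the example above with base R; the package may proceed differently.

```r
sketch_order <- function(vars, patterns) {
  # one logical key per pattern: TRUE when the variable has the priority
  keys <- lapply(patterns, function(p) {
    negate <- startsWith(p, "!")
    if (negate) p <- substring(p, 2)
    hit <- grepl(p, vars)
    if (negate) !hit else hit
  })
  # FALSE sorts before TRUE, so each key is negated to put matches first;
  # ties are broken by the next key, then by the original position
  idx <- do.call(base::order, lapply(keys, `!`))
  vars[idx]
}

vars <- c("(Intercept)", "Temp:Wind", "Wind", "Temp")
sketch_order(vars, c("Wind", "!Inter", "!Temp"))
# Wind, Temp:Wind, Temp, (Intercept)
```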

Author(s)

Laurent Berge

See Also

See setFixest_coefplot to set the default values of coefplot, and the estimation functions: e.g. feols, fepois, feglm, fenegbin.

Examples

#
# Example 1: Stacking two sets of results on the same graph
#

# Estimation on Iris data with one fixed-effect (Species)
# + we cluster the standard-errors
est = feols(Petal.Length ~ Petal.Width + Sepal.Width | Species,
            iris, vcov = "cluster")

# Now with "regular" standard-errors
est_std = summary(est, vcov = "iid")

# You can plot the two results at once
coefplot(est, est_std)

# You could also use the argument vcov
coefplot(est, vcov = list("cluster", "iid"))

# Alternatively, you can use the argument x.shift
# to do it sequentially:

# First graph with clustered standard-errors
coefplot(est, x.shift = -.2)

# 'x.shift' was used to shift the coefficients to the left.

# Second set of results: this time with
#  standard-errors that are not clustered.
coefplot(est, vcov = "iid", x.shift = .2,
         add = TRUE, col = 2, ci.lty = 2, pch = 15)

legend("topright", col = 1:2, pch = 20, lwd = 1, lty = 1:2,
       legend = c("Clustered", "IID"), title = "Standard-Errors")

#
# Example 2: Interactions
#

# Now we estimate and plot the "yearly" treatment effects
data(base_did)
base_inter = base_did

# We interact the variable 'period' with the variable 'treat'
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_inter)

# In the estimation, the variable treat is interacted
#  with each value of period but 5, set as a reference

# coefplot will show all the coefficients:
coefplot(est_did)

# Note that the grouping of the coefficients is due to 'group = "auto"'

# If you want to keep only the coefficients
# created with i() (ie the interactions), use iplot
iplot(est_did)

# We can see that the graph is different from before:
#  - only interactions are shown,
#  - the reference is present,
# => this is fully flexible
iplot(est_did, ref.line = FALSE, pt.join = TRUE)

#
# What if the interacted variable is not numeric?

# Let's create a "month" variable
all_months = c("aug", "sept", "oct", "nov", "dec", "jan",
               "feb", "mar", "apr", "may", "jun", "jul")
base_inter$period_month = all_months[base_inter$period]

# The new estimation
est = feols(y ~ x1 + i(period_month, treat, "oct") | id+period, base_inter)

# Since 'period_month' of type character, coefplot sorts it
iplot(est)

# To respect a plotting order, use a factor
base_inter$month_factor = factor(base_inter$period_month, levels = all_months)
est = feols(y ~ x1 + i(month_factor, treat, "oct") | id + period, base_inter)
iplot(est)

#
# Example 3: Setting defaults
#

# coefplot has many arguments, which makes it highly flexible.
# If you don't like the default style of coefplot. No worries,
# you can set *your* default by using the function
# setFixest_coefplot()

dict = c("Petal.Length"="Length (Petal)", "Petal.Width"="Width (Petal)",
         "Sepal.Length"="Length (Sepal)", "Sepal.Width"="Width (Sepal)")

setFixest_coefplot(ci.col = 2, pt.col = "darkblue", ci.lwd = 3,
                   pt.cex = 2, pt.pch = 15, ci.width = 0, dict = dict)

est = feols(Petal.Length ~ Petal.Width + Sepal.Length +
                Sepal.Width + i(Species), iris)

# And that's it
coefplot(est)

# You can set separate default values for iplot
setFixest_coefplot("iplot", pt.join = TRUE, pt.join.par = list(lwd = 2, lty = 2))
iplot(est)

# To reset to the default settings:
setFixest_coefplot("all", reset = TRUE)
coefplot(est)

#
# Example 4: group + cleaning
#

# You can use the argument group to group variables
# You can further use the special character "^^" to clean
#  the beginning of the coef. name: particularly useful for factors

est = feols(Petal.Length ~ Petal.Width + Sepal.Length +
                Sepal.Width + Species, iris)

# No grouping:
coefplot(est)

# now we group by Sepal and Species
coefplot(est, group = list(Sepal = "Sepal", Species = "Species"))

# now we group + clean the beginning of the names using the special character ^^
coefplot(est, group = list(Sepal = "^^Sepal.", Species = "^^Species"))

Extracts the coefficients table from an estimation

Description

Methods to extract the coefficients table and its sub-components from an estimation.

Usage

coeftable(object, ...)

se(object, ...)

pvalue(object, ...)

tstat(object, ...)

Arguments

object

An estimation (fitted model object), e.g. a fixest object.

...

Other arguments to the methods.

Value

Returns a matrix (coeftable) or vectors.

See Also

Please look at the coeftable.fixest page for more detailed information.

Examples

est = lm(mpg ~ cyl, mtcars)
coeftable(est)

Extracts the coefficients table from an estimation

Description

Default method to extract the coefficients table and its sub-components from an estimation.

Usage

## Default S3 method:
coeftable(object, keep, drop, order, ...)

## Default S3 method:
se(object, keep, drop, order, ...)

## Default S3 method:
tstat(object, keep, drop, order, ...)

## Default S3 method:
pvalue(object, keep, drop, order, ...)

## S3 method for class 'matrix'
se(object, keep, drop, order, ...)

Arguments

object

The result of an estimation (a fitted model object). Note that this function is made to work with fixest objects so it may not work for the specific model you provide.

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.

order

Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see base::regex help for more info). The variables satisfying the first regular expression will be placed first, then the order follows the sequence of regular expressions. This argument is applied post aliasing (see argument dict). Example: you have the following variables: month1 to month6, then x1 to x5, then year1 to year6. If you want to display first the x's, then the years, then the months you could use: order = c("x", "year"). If the first character is an exclamation mark, the effect is reversed (e.g. order = "!Intercept" means: every variable that does not contain “Intercept” goes first). See details.

...

Other arguments that will be passed to summary.

First the method summary is applied if needed, then the coefficients table is extracted from its output.

The default method is very naive and hopes that the resulting coefficients table contained in the summary of the fitted model is well formed: this assumption is very often wrong. In any case, no further development is planned since the coeftable/se/pvalue/tstat series of methods is only intended to work well with fixest objects. To extract the coefficients table from fitted models in a general way, it's better to use tidy from broom.
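For a model whose summary is well formed, such as lm, the two steps above can be sketched in base R. This only illustrates the idea (apply summary, then read its coefficients table); it is not the code of the default method, and the column names used below are the ones produced by summary.lm.

```r
est <- lm(mpg ~ cyl, mtcars)

# Step 1: apply summary; step 2: extract its coefficients table
ct <- summary(est)$coefficients

# The sub-components are the columns of that table
se_vec    <- ct[, "Std. Error"]
tstat_vec <- ct[, "t value"]
pval_vec  <- ct[, "Pr(>|t|)"]

# How the columns relate to each other for lm
all.equal(tstat_vec, ct[, "Estimate"] / se_vec)
all.equal(pval_vec, 2 * pt(-abs(tstat_vec), df = df.residual(est)))
```

Models whose summary stores the table under another name, or with other column names, would break this sketch, which is the fragility the paragraph above warns about.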

Value

Returns a matrix (coeftable) or vectors.

Examples

# NOTE: This function is really made to handle fixest objects
# The default method works for simple structures, but you'd be
# likely better off with broom::tidy for other models

est = lm(mpg ~ cyl, mtcars)
coeftable(est)

se(est)

Obtain various statistics from an estimation

Description

Set of functions to directly extract some commonly used statistics, like the p-value or the table of coefficients, from estimations. This was first implemented for fixest estimations, but has some support for other models.

Usage

## S3 method for class 'fixest'
coeftable(
  object,
  vcov = NULL,
  ssc = NULL,
  cluster = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  list = FALSE,
  ...
)

## S3 method for class 'fixest'
se(
  object,
  vcov = NULL,
  ssc = NULL,
  cluster = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  ...
)

## S3 method for class 'fixest'
tstat(
  object,
  vcov = NULL,
  ssc = NULL,
  cluster = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  ...
)

## S3 method for class 'fixest'
pvalue(
  object,
  vcov = NULL,
  ssc = NULL,
  cluster = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  ...
)

Arguments

object

A fixest object. For example an estimation obtained from feols.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette. You can pass several VCOVs (as above) if you nest them into a list. If the number of VCOVs equals the number of models, each VCOV is mapped to the appropriate model. If there is one model and several VCOVs, or if the first element of the list is equal to "each" or "times", then the estimations will be replicated and the results for each estimation and each VCOV will be reported.

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as clusters in the estimation, you could further use cluster = 1:2 or leave it blank with se = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] cluster).

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.

order

Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see base::regex help for more info). The variables satisfying the first regular expression will be placed first, then the order follows the sequence of regular expressions. This argument is applied post aliasing (see argument dict). Example: you have the following variables: month1 to month6, then x1 to x5, then year1 to year6. If you want to display first the x's, then the years, then the months you could use: order = c("x", "year"). If the first character is an exclamation mark, the effect is reversed (e.g. order = "!Intercept" means: every variable that does not contain “Intercept” goes first). See details.

list

Logical, default is FALSE. If TRUE, then a nested list is returned, the first layer is accessed with the coefficients names; the second layer with the following values: coef, se, tstat, pvalue. Note that the variable "(Intercept)" is renamed into "constant".

...

Other arguments to be passed to summary.fixest.

Details

This set of tiny functions is primarily constructed for fixest estimations.
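The regular expressions given as examples for keep, drop and order can be checked with base R alone. The sketch below is only an illustration: match_any and order_vars are hypothetical helpers written for this example (they are not fixest functions), and it assumes the patterns behave as they do with base::grepl.

```r
# Hypothetical helper (not part of fixest): TRUE for each name selected by
# at least one pattern. A leading "!" reverses the effect of that pattern.
match_any <- function(vars, patterns) {
  selected <- rep(FALSE, length(vars))
  for (p in patterns) {
    negate <- startsWith(p, "!")
    hit <- grepl(sub("^!", "", p), vars)
    selected <- selected | (if (negate) !hit else hit)
  }
  selected
}

vars <- paste0("x", 1:55)

# keep = "x[[:digit:]]$": names ending with "x" followed by a single digit
kept <- vars[match_any(vars, "x[[:digit:]]$")]

# drop = "x[[:digit:]]{2}": discard names with "x" followed by two digits
not_dropped <- vars[!match_any(vars, "x[[:digit:]]{2}")]

# Reversed pattern: selects every name that does not contain "Intercept"
reversed <- match_any(c("(Intercept)", "x1"), "!Intercept")

# Hypothetical helper mimicking `order`: names matching the first pattern
# come first, then those matching the second, then everything else
order_vars <- function(vars, patterns) {
  rank <- rep(length(patterns) + 1L, length(vars))
  for (i in rev(seq_along(patterns))) {
    rank[grepl(patterns[i], vars)] <- i
  }
  vars[order(rank)]
}

all_vars <- c(paste0("month", 1:6), paste0("x", 1:5), paste0("year", 1:6))
ordered <- order_vars(all_vars, c("x", "year"))
```

Both kept and not_dropped contain x1 to x9, which is why the two patterns are interchangeable in the example from the argument descriptions.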

Value

Returns a table of coefficients, with the variables in rows and four columns: the estimate, the standard-error, the t-statistic and the p-value.

If list = TRUE then a nested list is returned, the first layer is accessed with the coefficients names; the second layer with the following values: coef, se, tstat, pvalue. For example, with res = coeftable(est, list = TRUE) you can access the SE of the coefficient x1 with res$x1$se; and its coefficient with res$x1$coef, etc.

Functions

Examples

# Some data and estimation
data(trade)
est = fepois(Euros ~ log(dist_km) | Origin^Product + Year, trade)
#
# Coeftable/se/tstat/pvalue
#
coeftable(est)
se(est)
tstat(est)
pvalue(est)
# Now with two-way clustered standard-errors
#  and using coeftable()
coeftable(est, cluster = ~Origin + Product)
se(est, cluster = ~Origin + Product)
pvalue(est, cluster = ~Origin + Product)
tstat(est, cluster = ~Origin + Product)
# Or you can cluster only once using summary:
est_sum = summary(est, cluster = ~Origin + Product)
coeftable(est_sum)
se(est_sum)
tstat(est_sum)
pvalue(est_sum)
# You can use the arguments keep, drop, order
# to rearrange the results
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
est_iv = feols(y ~ x1 | x2 ~ x3, base)
tstat(est_iv, keep = "x1")
coeftable(est_iv, keep = "x1|Int")
coeftable(est_iv, order = "!Int")
#
# Using lists
#
# Returning the coefficients table as a list can be useful for quick
# reference in markdown documents.
# Note that the "(Intercept)" is renamed into "constant"
res = coeftable(est_iv, list = TRUE)
# coefficient of the constant:
res$constant$coef
# pvalue of x1
res$x1$pvalue

Extracts the coefficients tables from fixest_multi estimations

Description

Series of methods to extract the coefficients table or its sub-components from a fixest_multi object (i.e. the outcome of multiple estimations).

Usage

## S3 method for class 'fixest_multi'
coeftable(
  object,
  vcov = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  long = FALSE,
  wide = FALSE,
  ...
)

## S3 method for class 'fixest_multi'
se(
  object,
  vcov = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  long = FALSE,
  ...
)

## S3 method for class 'fixest_multi'
tstat(
  object,
  vcov = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  long = FALSE,
  ...
)

## S3 method for class 'fixest_multi'
pvalue(
  object,
  vcov = NULL,
  keep = NULL,
  drop = NULL,
  order = NULL,
  long = FALSE,
  ...
)

Arguments

object

A fixest_multi object, coming from a fixest multiple estimation.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette. You can pass several VCOVs (as above) if you nest them into a list. If the number of VCOVs equals the number of models, each VCOV is mapped to the appropriate model. If there is one model and several VCOVs, or if the first element of the list is equal to "each" or "times", then the estimations will be replicated and the results for each estimation and each VCOV will be reported.

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.

order

Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see base::regex help for more info). The variables satisfying the first regular expression will be placed first, then the order follows the sequence of regular expressions. This argument is applied post aliasing (see argument dict). Example: you have the following variables: month1 to month6, then x1 to x5, then year1 to year6. If you want to display first the x's, then the years, then the months you could use: order = c("x", "year"). If the first character is an exclamation mark, the effect is reversed (e.g. order = "!Intercept" means: every variable that does not contain “Intercept” goes first). See details.

long

Logical scalar, default is FALSE. If TRUE, then all the information is stacked, with two columns containing the information: "param" and "value". The column param contains the values coef/se/tstat/pvalue.

wide

A logical scalar, default is FALSE. If TRUE, then a list is returned: the elements of the list are coef/se/tstat/pvalue. Each element of the list is a wide table with a column per coefficient.

...

Other arguments to be passed to summary.fixest.

Value

It returns a data.frame containing the coefficients tables (or just the se/pvalue/tstat) along with the information on which model was estimated.

If wide = TRUE, then a list is returned. The elements of the list are coef/se/tstat/pvalue. Each element of the list is a wide table with a column per coefficient.

If long = TRUE, then all the information is stacked. This removes the 4 columns containing the coefficient estimates to the p-values, and replaces them with two new columns: "param" and "value". The column param contains the values coef/se/tstat/pvalue, and the column value the associated numerical information.
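To make the difference between the default, long and wide layouts concrete, here is a base R sketch on a mock table. It only mimics the shapes described above: the numbers are made up, the column names (id, coefficient, coef, se, tstat, pvalue) are assumptions chosen for the illustration, and the reshaping code is not how fixest builds its output.

```r
# Mock coefficient table for two models (made-up values, hypothetical names)
tab <- data.frame(
  id = c(1, 1, 2, 2),
  coefficient = c("(Intercept)", "x1", "(Intercept)", "x1"),
  coef = c(1.0, 0.5, 1.2, 0.4),
  se = c(0.2, 0.1, 0.3, 0.1),
  stringsAsFactors = FALSE
)
tab$tstat <- tab$coef / tab$se
tab$pvalue <- 2 * pnorm(-abs(tab$tstat))

stat_cols <- c("coef", "se", "tstat", "pvalue")

# "long" layout: the four statistic columns are stacked into param/value
long <- do.call(rbind, lapply(stat_cols, function(p) {
  data.frame(tab[c("id", "coefficient")], param = p, value = tab[[p]],
             stringsAsFactors = FALSE)
}))

# "wide" layout: one table per statistic, with one column per coefficient
wide <- lapply(setNames(stat_cols, stat_cols), function(p) {
  tapply(tab[[p]], list(tab$id, tab$coefficient), function(v) v[1])
})
```

In this mock, long has four times as many rows as tab, while each element of wide has one row per model and one column per coefficient.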

Functions

Examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est_multi = feols(y ~ csw(x.[,1:3]), base, split = ~species)
# we get all the coefficient tables at once
coeftable(est_multi)
# Now just the standard-errors
se(est_multi)
# wide = TRUE => leads to a list of wide tables
coeftable(est_multi, wide = TRUE)
# long = TRUE, all the information is stacked
coeftable(est_multi, long = TRUE)

Collinearity diagnostics for fixest objects

Description

On some occasions, the optimization algorithm of femlm may fail to converge, or the variance-covariance matrix may not be available. The most common reason for this is collinearity among variables. This function helps to find out which set of variables is problematic.

Usage

collinearity(x, verbose)

Arguments

x

A fixest object obtained from, e.g. functions femlm, feols or feglm.

verbose

An integer. If higher than or equal to 1, then a note is prompted at each step of the algorithm. By default verbose = 0 for small problems and verbose = 1 for large problems.

Details

This function tests: 1) collinearity with the fixed-effect variables, 2) perfect multi-collinearity between the variables, 3) perfect multi-collinearity between several variables and the fixed-effects, and 4) identification issues when parts of the model are non-linear in parameters.
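As a rough illustration of what perfect multi-collinearity means in test 2), the base R sketch below builds three dummies that sum to the intercept and detects the rank deficiency with a QR decomposition. This is a generic linear algebra check under assumed toy data, not the procedure used by collinearity().

```r
set.seed(1)
n <- 100
fe <- sample(3, n, TRUE)

# v1 + v2 + v3 equals the intercept column: perfect multi-collinearity
X <- cbind(
  intercept = 1,
  v1 = as.numeric(fe == 1),
  v2 = as.numeric(fe == 2),
  v3 = as.numeric(fe == 3),
  z = rnorm(n)
)

qr_X <- qr(X)
rank_X <- qr_X$rank

# Columns that the pivoting moved past the numerical rank are the ones
# that can be written as a combination of the others
redundant <- colnames(X)[qr_X$pivot[-seq_len(rank_X)]]
```

Here rank_X is lower than ncol(X), and redundant names the column that would have to be removed to restore full rank.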

Value

It returns a text message with the identified diagnostics.

Author(s)

Laurent Berge

Examples

# Creating an example data base:
set.seed(1)
fe_1 = sample(3, 100, TRUE)
fe_2 = sample(20, 100, TRUE)
x = rnorm(100, fe_1)**2
y = rnorm(100, fe_2)**2
z = rnorm(100, 3)**2
dep = rpois(100, x*y*z)
base = data.frame(fe_1, fe_2, x, y, z, dep)
# creating collinearity problems:
base$v1 = base$v2 = base$v3 = base$v4 = 0
base$v1[base$fe_1 == 1] = 1
base$v2[base$fe_1 == 2] = 1
base$v3[base$fe_1 == 3] = 1
base$v4[base$fe_2 == 1] = 1
# Estimations:
# Collinearity with the fixed-effects:
res_1 = femlm(dep ~ log(x) + v1 + v2 + v4 | fe_1 + fe_2, base)
collinearity(res_1)
# => collinearity with the first fixed-effect identified, we drop v1 and v2
res_1bis = femlm(dep ~ log(x) + v4 | fe_1 + fe_2, base)
collinearity(res_1bis)
# Multi-Collinearity:
res_2 = femlm(dep ~ log(x) + v1 + v2 + v3 + v4, base)
collinearity(res_2)

Confidence interval for parameters estimated with fixest

Description

This function computes the confidence interval of parameter estimates obtained from a model estimated with femlm, feols or feglm.

Usage

## S3 method for class 'fixest'
confint(
  object,
  parm,
  level = 0.95,
  vcov,
  se,
  cluster,
  ssc = NULL,
  coef.col = FALSE,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

parm

The parameters for which to compute the confidence interval (either an integer vector OR a character vector with the parameter names). If missing, all parameters are used.

level

The confidence level. Default is 0.95.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

coef.col

Logical, default is FALSE. If TRUE the column coefficient is inserted in the first position containing the coefficient names.

...

Not currently used.

Value

Returns a data.frame with two columns giving respectively the lower and upper bound of the confidence interval. There are as many rows as parameters.
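The two bounds follow the usual construction: estimate plus or minus a critical value times the standard error. The base R sketch below reproduces it by hand for an lm fit and compares the result with stats::confint. It assumes a Student t reference distribution with the residual degrees of freedom, which is what confint uses for lm; which reference distribution and degrees of freedom fixest applies to a given model and VCOV is not stated here, so treat that part as an assumption.

```r
fit <- lm(mpg ~ cyl + wt, data = mtcars)

level <- 0.95
alpha <- 1 - level

est <- coef(fit)
std_err <- sqrt(diag(vcov(fit)))

# Two-sided critical value of the t distribution (assumption: t with the
# residual degrees of freedom, as stats::confint does for lm)
crit <- qt(1 - alpha / 2, df = df.residual(fit))

ci <- data.frame(lower = est - crit * std_err, upper = est + crit * std_err)

# Reference values from base R
ci_ref <- confint(fit, level = level)
```

Changing the VCOV only changes std_err (and possibly the degrees of freedom), so a clustered interval is obtained by plugging in the clustered standard errors.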

Author(s)

Laurent Berge

Examples

# Load trade data
data(trade)
# We estimate the effect of distance on trade (with 3 fixed-effects)
est_pois = femlm(Euros ~ log(dist_km) + log(Year) | Origin + Destination +
                 Product, trade)
# confidence interval with "normal" VCOV
confint(est_pois)
# confidence interval with "clustered" VCOV (w.r.t. the Origin factor)
confint(est_pois, se = "cluster")

Confidence intervals for fixest_multi objects

Description

Computes the confidence intervals of parameter estimates for fixest's multiple estimation objects (aka fixest_multi).

Usage

## S3 method for class 'fixest_multi'
confint(
  object,
  parm,
  level = 0.95,
  vcov = NULL,
  se = NULL,
  cluster = NULL,
  ssc = NULL,
  ...
)

Arguments

object

A fixest_multi object obtained from a multiple estimation in fixest.

parm

The parameters for which to compute the confidence interval (either an integer vector OR a character vector with the parameter names). If missing, all parameters are used.

level

The confidence level. Default is 0.95.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

...

Not currently used.

Value

It returns a data frame whose first columns indicate which model has been estimated. The last three columns indicate the coefficient name, and the lower and upper bounds of the confidence interval.

Examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x.[,1:3]) | sw0(species), base, vcov = "iid")
confint(est)
# focusing only on the coefficient 'x3'
confint(est, "x3")
# the 'id' provides the index of the estimation
est[c(3, 6)]

Gets the degrees of freedom of a fixest estimation

Description

Simple utility to extract the degrees of freedom from a fixest estimation.

Usage

degrees_freedom(
  x,
  type,
  vars = NULL,
  vcov = NULL,
  se = NULL,
  cluster = NULL,
  ssc = NULL,
  stage = 2
)

degrees_freedom_iid(x, type)

Arguments

x

A fixest estimation.

type

Character scalar, equal to "k", "resid", "t". If "k", then the number of regressors is returned. If "resid", then it is the "residuals degree of freedom", i.e. the number of observations minus the number of regressors. If "t", it is the degrees of freedom used in the t-test. Note that these values are affected by how the VCOV of x is computed, in particular when the VCOV is clustered.

vars

A vector of variable names, of the regressors. This is optional. If provided, then type is set to 1 by default and the number of regressors contained in vars is returned. This is only useful when there is collinearity and only a subset of the regressors is of interest. (Mostly for internal use.)

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

stage

Either 1 or 2. Only concerns IV regressions, which stage to look at.

The type of VCOV can have an influence on the degrees of freedom. In particular, when the VCOV is clustered, the DoF returned will be in accordance with the way the small sample correction was performed when computing the VCOV. That type of value is in general not what we have in mind when we think of "degrees of freedom". To obtain the ones that are more intuitive, please use degrees_freedom_iid instead.
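For reference, the "intuitive" quantities can be written down with base R on an lm fit: the number of regressors, and the number of observations minus the number of regressors. This sketch only illustrates those textbook definitions, with the fixed-effect entered as explicit dummies; it does not reproduce the adjustments fixest makes when the VCOV is clustered.

```r
# A model without and with a set of dummies playing the role of a fixed-effect
fit_plain <- lm(mpg ~ wt, data = mtcars)
fit_fe <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Number of estimated coefficients (dummies included in the second model)
k_plain <- length(coef(fit_plain))
k_fe <- length(coef(fit_fe))

# Residual degrees of freedom: observations minus regressors
resid_plain <- nobs(fit_plain) - k_plain
resid_fe <- nobs(fit_fe) - k_fe
```

Both residual values agree with df.residual() applied to the corresponding lm fit.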

Functions

Examples

# First: an estimation
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
est = feols(y ~ x1 + x2 | species, base)
# "Normal" standard-errors (SE)
est_standard = summary(est, se = "st")
# Clustered SEs
est_clustered = summary(est, se = "clu")
# The different degrees of freedom
# => different type 1 DoF (because of the clustering)
degrees_freedom(est_standard, type = "k")
degrees_freedom(est_clustered, type = "k") # fixed-effects are excluded
# => different type 2 DoF (because of the clustering)
degrees_freedom(est_standard, type = "resid") # => equivalent to the df.residual from lm
degrees_freedom(est_clustered, type = "resid")

Centers a set of variables around a set of factors

Description

User-level access to the internal demeaning algorithm of fixest.
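With a single factor, centering a variable simply means subtracting its group mean. The base R sketch below shows this special case and checks it against the residuals of a regression on the factor. It is a conceptual illustration only, not the algorithm fixest runs (which also handles several factors, weights and varying slopes).

```r
x <- iris$Sepal.Length
f <- iris$Species

# Subtract from each observation the mean of its group
x_dm <- x - ave(x, f)

# After centering, every group mean is (numerically) zero
group_means <- tapply(x_dm, f, mean)

# Same result as the residuals of a regression on the factor dummies
x_resid <- unname(resid(lm(x ~ f)))
```

With two or more factors there is no such closed form in general, which is why an iterative algorithm with a stopping criterion is needed.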

Usage

demean(
  X,
  f,
  slope.vars,
  slope.flag,
  data,
  weights,
  sample = "estimation",
  nthreads = getFixest_nthreads(),
  notes = getFixest_notes(),
  iter = 2000,
  tol = 1e-06,
  fixef.reorder = TRUE,
  fixef.algo = NULL,
  na.rm = TRUE,
  as.matrix = is.atomic(X),
  im_confident = FALSE,
  ...
)

Arguments

X

A matrix, vector, data.frame or a list OR a formula OR a feols estimation. If equal to a formula, then the argument data is required, and it must be of the type: x1 + x2 ~ f1 + fe2 with on the LHS the variables to be centered, and on the RHS the factors used for centering. Note that you can use variables with varying slopes with the syntax fe[v1, v2] (see details in feols). If a feols estimation, all variables (LHS+RHS) are demeaned and then returned (only if it was estimated with fixed-effects). Otherwise, it must represent the data to be centered. The number of observations of that data must be the same as that of the factors used for centering (argument f).

f

A matrix, vector, data.frame or list. The factors used to center the variables in argument X. Matrices will be coerced using as.data.frame.

slope.vars

A vector, matrix or list representing the variables with varying slopes. Matrices will be coerced using as.data.frame. Note that if this argument is used it MUST be in conjunction with the argument slope.flag that maps the factors to which the varying slopes are attached. See examples.

slope.flag

An integer vector of the same length as the number of variables in f (the factors used for centering). It indicates for each factor the number of variables with varying slopes to which it is associated. Positive values mean that the raw factor should also be included in the centering, negative values that it should be excluded. This is admittedly complicated, but the examples should make it clearer.

data

A data.frame containing all variables in the argument X. Only used if X is a formula, in which case data is mandatory.

weights

Vector, can be missing or NULL. If present, it must contain the same number of observations as in X.

sample

Character scalar equal to "estimation" (default) or "original". Only used when the argument X is a fixest estimation.

By default, only the observations used in the estimation are demeaned. This will return a matrix with the same number of rows as the number of observations in the estimation. You can safely use the resulting matrix to recompute the coefficients from the estimation 'by hand'.

To demean all the observations of the original sample, use sample = "original".

nthreads

Number of threads to be used. By default it is equal to getFixest_nthreads().

notes

Logical, whether to display a message when NA values are removed. By default it is equal to getFixest_notes().

iter

Number of iterations, default is 2000.

tol

Stopping criterion of the algorithm. Default is 1e-6. The algorithm stops when the maximum absolute increase in the coefficients values is lower than tol.

fixef.reorder

Logical, default is TRUE. Whether to reorder the fixed-effects by frequencies before feeding them into the algorithm. If FALSE, the original fixed-effects order provided by the user is maintained. In general, reordering leads to faster and more precise performance.

fixef.algo

NULL (default) or an object of class demeaning_algo obtained with the function demeaning_algo. If NULL, it falls back to the defaults of demeaning_algo. This argument controls the settings of the demeaning algorithm. Only play with it if the convergence is slow, i.e. look at the slot $iterations, and if any is over 50, it may be worth playing around with it. Please read the documentation of the function demeaning_algo. Be aware that there is no clear guidance on how to change the settings, it's more a matter of try-and-see.

na.rm

Logical, default is TRUE. If TRUE and the input data contains any NA value, then any observation with NA will be discarded, leading to an output with fewer observations than the input. If FALSE and NAs are present, the output will also be filled with NAs for each NA observation in input.

as.matrix

Logical, if TRUE a matrix is returned, if FALSE it will be a data.frame. The default depends on the input, if atomic then a matrix will be returned.

im_confident

Logical, default is FALSE. FOR EXPERT USERS ONLY! This argument allows skipping some of the preprocessing of the arguments given in input. If TRUE, then X MUST be a numeric vector/matrix/list (not a formula!), f MUST be a list, slope.vars MUST be a list, slope.vars MUST be consistent with slope.flag, and weights, if given, MUST be numeric (not integer!). Further there MUST NOT be any NA value, and the number of observations of each element MUST be consistent. Non-compliance with these rules may simply crash your R session.

...

Not currently used.

Value

It returns a data.frame with as many columns as the number of variables to be centered.

If na.rm = TRUE (the default in the usage above), then the number of rows is equal to the number of rows in input minus the number of observations with NA values (contained in X, f, slope.vars or weights). If na.rm = FALSE, the output has the same number of observations as the input (filled with NAs where appropriate).

A matrix can be returned if as.matrix = TRUE.

Varying slopes

You can add variables with varying slopes in the fixed-effect part of the formula. The syntax is as follows: fixef_var[var1, var2]. Here the variables var1 and var2 will be with varying slopes (one slope per value in fixef_var) and the fixed-effect fixef_var will also be added.

To add only the variables with varying slopes and not the fixed-effect, use double square brackets: fixef_var[[var1, var2]].

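In other words: fixef_var[var1, var2] combines the fixed-effect fixef_var with the varying slopes of var1 and var2, while fixef_var[[var1, var2]] contains the varying slopes only.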

In general, for convergence reasons, it is recommended to always add the fixed-effect and avoid using only the variable with varying slope (i.e. use single square brackets).
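The two bracket forms can be mimicked in base R by regressing each variable on the corresponding terms and keeping the residuals. The sketch below does this for species[x2] and species[[x2]] on the iris data and verifies the FWL equivalence for the single-bracket case. The helpers center_single and center_double are hypothetical names used for the illustration, and lm residuals stand in for the demeaning algorithm.

```r
base <- setNames(iris, c("y", "x1", "x2", "x3", "species"))

# species[x2]: remove the species means AND one x2 slope per species
center_single <- function(v) {
  unname(resid(lm(v ~ species + species:x2, data = base)))
}

# species[[x2]]: remove only the species-specific x2 slopes
center_double <- function(v) {
  unname(resid(lm(v ~ 0 + species:x2, data = base)))
}

y_dm <- center_single(base$y)
x1_dm <- center_single(base$x1)

# FWL: the slope on x1 is the same in the full model and in the
# regression of the centered y on the centered x1
full <- lm(y ~ x1 + species + species:x2, data = base)
fwl <- lm(y_dm ~ 0 + x1_dm)

y_dm_double <- center_double(base$y)
```

With double brackets the centered variable is orthogonal to each species-specific x2 column, but its species means are in general not zero.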

Examples

# Illustration of the FWL theorem
data(trade)
base = trade
base$ln_dist = log(base$dist_km)
base$ln_euros = log(base$Euros)
# We center the two variables ln_dist and ln_euros
#  on the factors Origin and Destination
X_demean = demean(X = base[, c("ln_dist", "ln_euros")],
                  f = base[, c("Origin", "Destination")])
base[, c("ln_dist_dm", "ln_euros_dm")] = X_demean
est = feols(ln_euros_dm ~ ln_dist_dm, base)
est_fe = feols(ln_euros ~ ln_dist | Origin + Destination, base)
# The results are the same as if we used the two factors
# as fixed-effects
etable(est, est_fe, se = "st")
#
# Variables with varying slopes
#
# You can center on factors but also on variables with varying slopes
# Let's have an illustration
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
#
# We center y and x1 on species and x2 * species
# using a formula
base_dm = demean(y + x1 ~ species[x2], data = base)
# using vectors
base_dm_bis = demean(X = base[, c("y", "x1")], f = base$species,
                     slope.vars = base$x2, slope.flag = 1)
# Let's look at the equivalences
res_vs_1 = feols(y ~ x1 + species + x2:species, base)
res_vs_2 = feols(y ~ x1, base_dm)
res_vs_3 = feols(y ~ x1, base_dm_bis)
# only the small sample adj. differs in the SEs
etable(res_vs_1, res_vs_2, res_vs_3, keep = "x1")
#
# center on x2 * species and on another FE
base$fe = rep(1:5, 10)
# using a formula => double square brackets!
base_dm = demean(y + x1 ~ fe + species[[x2]], data = base)
# using vectors => note slope.flag!
base_dm_bis = demean(X = base[, c("y", "x1")], f = base[, c("fe", "species")],
                     slope.vars = base$x2, slope.flag = c(0, -1))
# Explanations slope.flag = c(0, -1):
# - the first 0: the first factor (fe) is associated to no variable
# - the "-1":
#    * |-1| = 1: the second factor (species) is associated to ONE variable
#    *   -1 < 0: the second factor should not be included as such
# Let's look at the equivalences
res_vs_1 = feols(y ~ x1 + i(fe) + x2:species, base)
res_vs_2 = feols(y ~ x1, base_dm)
res_vs_3 = feols(y ~ x1, base_dm_bis)
# only the small sample adj. differs in the SEs
etable(res_vs_1, res_vs_2, res_vs_3, keep = "x1")

Controls the parameters of the demeaning procedure

Description

Fine control of the demeaning procedure. Since the defaults are sensible, only use this function in case of difficult convergence (e.g. in feols or demean). That is, look at the slot $iterations of the returned object: if it's high (over 50), then it might be worth playing around with these settings.

Usage

demeaning_algo(
  extraProj = 0,
  iter_warmup = 15,
  iter_projAfterAcc = 40,
  iter_grandAcc = 4,
  internal = FALSE
)

Arguments

extraProj

Integer scalar, default is 0. Should there be more plain projection steps in between two accelerations? By default there is not. Each integer value adds 3 simple projections. This can be useful in cases where the acceleration algorithm does not work well but simple projections do.

iter_warmup

Integer scalar, default is 15. Only used in the presence of 3 or more fixed-effects (FE), ignored otherwise. For 3+ FEs, the algorithm is as follows:

  1. iter_warmup iterations on all FEs. If convergence: end of the algorithm.
  2. Otherwise:
     a) demeaning over the first two largest FEs only, until convergence, then
     b) demeaning over all FEs until convergence.

To skip the demeaning over 2 FEs, use a very high value of iter_warmup. To go directly to the demeaning over 2 FEs, set iter_warmup to a value lower than or equal to 0.

iter_projAfterAcc

Integer scalar, default is 40. After iter_projAfterAcc iterations of the standard algorithm, a simple projection is performed right after the acceleration step. Use very high values to skip this step, or low values to apply this procedure right from the start.

iter_grandAcc

Integer scalar, default is 4. The regular fixed-point algorithm applies an acceleration at each iteration. This acceleration is for f(X) (with f the projection). This setting controls a grand acceleration, which is instead for f^k(X), where k is the value of iter_grandAcc and f^k denotes the function f applied k times (e.g. f^2(X) is defined as f(f(X))). By default, an additional acceleration is performed for h(X) = f^4(X) every 8 iterations (2 times 4, i.e. the number of iterations needed to gather h(X) and h(h(X))).

internal

Logical scalar, default is FALSE. If TRUE, no check on the arguments is performed and the returned object is a plain list. For internal use only.

Details

The demeaning algorithm is a fixed-point algorithm. Basically a function f is applied until |f(X) - X| = 0, i.e. there is no difference between X and its image. For terminology, let's call the application of f a "projection".

For well behaved problems, the algorithm in its simplest form, i.e. just applying f until convergence, works fine and you only need a few iterations to reach convergence.
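To make the plain fixed-point idea concrete, here is a minimal sketch in base R. It is not the package's C++ implementation: the toy data, the variable names and the stopping rule are assumptions chosen for illustration. One "projection" removes the group means of two factors in turn, and it is applied until the vector stops changing.

```r
# Toy data: a variable and two crossed factors (all names are illustrative)
set.seed(1)
n  = 200
f1 = sample(1:10, n, replace = TRUE)
f2 = sample(1:8,  n, replace = TRUE)
x  = rnorm(n) + f1 / 5 - f2 / 3

# One "projection": center on f1, then on f2
project = function(x) {
  x = x - ave(x, f1)   # remove the f1 group means
  x - ave(x, f2)       # remove the f2 group means
}

# Plain fixed-point iteration: apply the projection until nothing changes
x_dm = x
n_iter = 0
repeat {
  x_new  = project(x_dm)
  n_iter = n_iter + 1
  done   = max(abs(x_new - x_dm)) < 1e-10
  x_dm   = x_new
  if (done || n_iter >= 1000) break
}

# Reference: residuals of a regression on both sets of dummies
x_ref = unname(resid(lm(x ~ factor(f1) + factor(f2))))
max(abs(x_dm - x_ref))   # should be numerically negligible
n_iter
```

On such a well connected toy design, the number of iterations stays small; the difficult cases discussed next are the ones where this simple loop becomes very slow.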

The problems arise for non well-behaved problems. In these cases, simply applying the function f can lead to extremely slow convergence. To handle these cases, this algorithm applies a fixed-point acceleration algorithm, namely the "Irons and Tuck" acceleration.
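The following base R sketch illustrates what such an acceleration does. The update rule is the usual vector form of the Irons and Tuck accelerator; that the package's internal code follows exactly these steps is an assumption, and the linear map f, the matrix A and the helper names (irons_tuck_step, run) are purely illustrative.

```r
# One Irons-Tuck step for a fixed-point map f (vector version)
irons_tuck_step = function(f, X) {
  GX  = f(X)
  GGX = f(GX)
  delta_GX = GGX - GX
  delta2_X = delta_GX - (GX - X)   # second difference: GGX - 2 * GX + X
  ssq = sum(delta2_X^2)
  if (ssq == 0) return(GGX)        # already at the fixed point
  coef = sum(delta_GX * delta2_X) / ssq
  GGX - coef * delta_GX
}

# Illustrative contraction f(X) = A X + b, very slow along the first direction
A = diag(c(0.99, 0.5))
b = c(1, 1)
f = function(X) as.vector(A %*% X + b)
X_star = solve(diag(2) - A, b)     # exact fixed point, for comparison

# Same stopping rule for the plain and the accelerated iterations
run = function(step, X = c(0, 0), tol = 1e-8, max_iter = 1e5) {
  for (i in seq_len(max_iter)) {
    X_new = step(X)
    if (max(abs(X_new - X)) < tol) return(list(X = X_new, iter = i))
    X = X_new
  }
  list(X = X, iter = max_iter)
}

plain = run(f)
accel = run(function(X) irons_tuck_step(f, X))

# Number of steps taken by each scheme
# (each accelerated step costs two evaluations of f)
c(plain = plain$iter, accelerated = accel$iter)
```

For a scalar linear map the accelerated step lands on the fixed point in a single step, which is the Aitken extrapolation that the method generalizes.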

The main algorithm combines regular projections with accelerations. Unfortunately sometimes this is not enough, so we also resort to internal cuisine, detailed below.

Sometimes the acceleration in its simplest form does not work well, and garbles the convergence properties. In those cases:

On top of this, in case of very difficult convergence, a "grand" acceleration is added to the algorithm. The regular acceleration is over f. Say g is the function equivalent to the application of one regular iteration (which is a combination of one acceleration with several projections). By default the grand acceleration is over h = g o g o g o g, in other words g applied four times. The grand acceleration is controlled with the argument iter_grandAcc which corresponds to the number of iterations of the regular algorithm defining h.

Finally in case of 3+ fixed-effects (FE), the convergence in general takes more iterations. In the absence of quick convergence, applying a first demeaning over the first two largest FEs before applying the demeaning over all FEs can improve convergence speed. This is controlled with the argument iter_warmup which gives the number of iterations over all the FEs to run before going to the 2 FEs demeaning. By default, the demeaning over all FEs is run for 15 iterations before switching to the 2 FEs case.

The above defaults are the outcome of extended empirical applications, and try to strike a balance across a majority of cases. Of course you can always get better results by tailoring the settings to your problem at hand.

Value

This function returns a list of 4 integers, equal to the arguments passed by the user. That list is of class demeaning_algo.

References

B. M. Irons, R. Tuck, "A version of the Aitken accelerator for computer iteration", International Journal for Numerical Methods in Engineering 1 (1969) 275–277.


Extracts the deviance of a fixest estimation

Description

Returns the deviance from a fixest estimation.

Usage

## S3 method for class 'fixest'
deviance(object, ...)

Arguments

object

A fixest object.

...

Not currently used.

Value

Returns a numeric scalar equal to the deviance.
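As a reminder of what the deviance measures, here is a short base R sketch using lm and glm. That the fixest method follows the same definitions for OLS and Poisson models is an assumption, not something stated on this page; the data sets and model formulas are illustrative.

```r
# Gaussian case: the deviance equals the residual sum of squares
fit_ols = lm(Petal.Length ~ Petal.Width, iris)
rss = sum(resid(fit_ols)^2)
all.equal(deviance(fit_ols), rss)

# Poisson case: 2 * sum(y * log(y / mu) - (y - mu)), here with y > 0
fit_pois = glm(carb ~ hp, data = mtcars, family = poisson())
y  = mtcars$carb
mu = fitted(fit_pois)
dev_pois = 2 * sum(y * log(y / mu) - (y - mu))
all.equal(deviance(fit_pois), dev_pois)
```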

See Also

feols, fepois, feglm, fenegbin, feNmlm.

Examples

est = feols(Petal.Length ~ Petal.Width, iris)
deviance(est)

est_pois = fepois(Petal.Length ~ Petal.Width, iris)
deviance(est_pois)

Residual degrees-of-freedom for fixest objects

Description

Returns the residual degrees of freedom for a fitted fixest object.

Usage

## S3 method for class 'fixest'
df.residual(object, ...)

Arguments

object

A fixest estimation, e.g. from feols or feglm.

...

Not currently used

Value

It returns an integer scalar giving the residual degrees of freedom of the estimation.
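For a model without fixed-effects, the residual degrees of freedom are simply the number of observations minus the number of estimated coefficients. The base R sketch below shows this with lm; how fixed-effects coefficients are counted in a fixest estimation depends on the package's degrees-of-freedom conventions and is not reproduced here.

```r
fit = lm(mpg ~ hp, mtcars)

n = nobs(fit)            # number of observations
k = length(coef(fit))    # intercept + slope

# Residual degrees of freedom = n - k
df.residual(fit) == n - k
```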

See Also

The function degrees_freedom in fixest.

Examples

est = feols(mpg ~ hp, mtcars)
df.residual(est)

Treated and control sample descriptives

Description

This function shows the means and standard-deviations of several variables conditional on whether they are from the treated or the control group. The groups can further be split according to a pre/post variable. Results can be seamlessly exported to Latex.

Usage

did_means(
  fml,
  base,
  treat_var,
  post_var,
  tex = FALSE,
  treat_dict,
  dict = getFixest_dict(),
  file,
  replace = FALSE,
  title,
  label,
  raw = FALSE,
  indiv,
  treat_first,
  prepostnames = c("Before", "After"),
  diff.inv = FALSE
)

Arguments

fml

Either a formula of the type var1 + ... + varN ~ treat or var1 + ... + varN ~ treat | post, or a data.frame/matrix containing all the variables for which the means are to be computed (they must be numeric of course). Both the treatment and the post variables must contain exactly two values. You can use a point to select all the variables of the data set: . ~ treat.

base

A data base containing all the variables in the formula fml.

treat_var

Only if argument fml is not a formula. The vector identifying the treated and the control observations (the vector can be of any type but must contain only two possible values). Must be of the same length as the data.

post_var

Only if argument fml is not a formula. The vector identifying the periods (pre/post) of the observations (the vector can be of any type but must contain only two possible values). The first value (in the sorted sense) of the vector is taken as the pre period. Must be of the same length as the data.

tex

Should the result be displayed in Latex? Default is FALSE. Automatically set to TRUE if the table is to be saved in a file using the argument file.

treat_dict

A character vector of length two. What are the names of the treated and the control? This should be a dictionary: e.g. c("1"="Treated", "0" = "Control").

dict

A named character vector. A dictionary between the variables names and an alias. For instance dict=c("x"="Inflation Rate") would replace the variable name x by “Inflation Rate”.

file

A file path. If given, the table is written in Latex into this file.

replace

Default is FALSE, which means that when the table is exported, the existing file is not erased.

title

Character string giving the Latex title of the table. (Only if exported.)

label

Character string giving the Latex label of the table. (Only if exported.)

raw

Logical, default is FALSE. If TRUE, it returns the information without formatting.

indiv

Either the variable name of individual identifiers, a one sided formula, or a vector. If the data is that of a panel, this can be used to track the number of individuals per group.

treat_first

Which value of the 'treatment' vector should appear on the left? By default the max value appears first (e.g. if the treatment variable is a 0/1 vector, 1 appears first).

prepostnames

Only if there is a 'post' variable. The names of the pre and post periods to be displayed in Latex. Default is c("Before", "After").

diff.inv

Logical, default is FALSE. Whether to invert the difference.

Details

By default, when the user tries to apply this function to non-numeric variables, an error is raised. The exception is when all the variables are selected with the dot (like in . ~ treat). In this case, non-numeric variables are automatically omitted (with a message).

NAs are removed automatically: if the data contains NAs an information message will be prompted. First all observations containing NAs relating to the treatment or post variables are removed. Then if there are still NAs for the variables, they are excluded separately for each variable, and a new message detailing the NA breakup is prompted.

Value

It returns a data.frame or a Latex table with the conditional means and statistical differences between the groups.
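To give an idea of the quantities involved, the base R sketch below computes group means, standard-deviations and a difference in means on simulated data. It is only an illustration of the concept: the data frame toy is made up, and using Welch's t-test (the default of t.test) for the statistical difference is an assumption, since this page does not state which test did_means relies on.

```r
# Simulated data: a 0/1 treatment and two numeric variables (illustrative)
set.seed(42)
toy = data.frame(
  treat = rep(c(0, 1), each = 50),
  y     = c(rnorm(50, mean = 1), rnorm(50, mean = 2)),
  x1    = rnorm(100)
)

# Means and standard-deviations conditional on the treatment status
means = aggregate(cbind(y, x1) ~ treat, data = toy, FUN = mean)
sds   = aggregate(cbind(y, x1) ~ treat, data = toy, FUN = sd)
means
sds

# Difference in means for y (treated minus control) and a two-sample test
diff_y = means$y[means$treat == 1] - means$y[means$treat == 0]
tt = t.test(y ~ treat, data = toy)
c(difference = diff_y, p_value = tt$p.value)
```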

Examples

# Playing around with the DiD data
data(base_did)

# means of treat/control
did_means(y+x1+period~treat, base_did)

# same but inverting the difference
did_means(y+x1+period~treat, base_did, diff.inv = TRUE)

# now treat/control, before/after
did_means(y+x1+period~treat|post, base_did)

# same but with a new line giving the number of unique "indiv" for each case
did_means(y+x1+period~treat|post, base_did, indiv = "id")

# same but with the treat case "0" coming first
did_means(y+x1+period~treat|post, base_did, indiv = ~id, treat_first = 0)

# Selecting all the variables with "."
did_means(.~treat|post, base_did, indiv = "id")

Simple and powerful string manipulation with the dot square bracket operator

Description

Compactly performs many low level string operations. Advanced support for pluralization.

Usage

dsb(
  ...,
  frame = parent.frame(),
  sep = "",
  vectorize = FALSE,
  nest = TRUE,
  collapse = NULL
)

Arguments

...

Character scalars that will be collapsed with the argument sep. You can use ".[x]" within each character string to insert the value of x in the string. You can add string operations in each ".[]" instance with the syntax "'arg'op ? x" (resp. "'arg'op ! x") to apply the operation 'op' with the argument 'arg' to x (resp. the verbatim of x). Otherwise, what to say? Ah, nesting is enabled, and since there are over 30 operators, it's a bit complicated to sort you out in this small space. But type dsb("--help") to prompt an (almost) extensive help.

frame

An environment used to evaluate the variables in ".[]".

sep

Character scalar, default is "". It is used to collapse all the elements in '...'.

vectorize

Logical, default is FALSE. If TRUE, elements in '...' are NOT collapsed together, but instead vectorised.

nest

Logical, default is TRUE. Whether the original character strings should be nested into a ".[]". If TRUE, then things like dsb("S!one, two") are equivalent to dsb(".[S!one, two]") and hence create the vector c("one", "two").

collapse

Character scalar or NULL (default). If provided, the resulting character vector will be collapsed into a character scalar using this value as a separator.

There are over 30 basic string operations, it supports pluralization, it's fast (e.g. faster than glue in the benchmarks), string operations can be nested (it may be the most powerful feature), operators have sensible defaults.

See detailed help on the console with dsb("--help"). The real help is in fact in the "Examples" section.

Value

It returns a character vector whose length depends on the elements and operations in ".[]".

Examples

## BASIC USAGE #####x = c("Romeo", "Juliet")# .[x] inserts xdsb("Hello .[x]!")# elements in ... are collapsed with "" (default)dsb("Hello .[x[1]], ",    "how is .[x[2]] doing?")# Splitting a comma separated string# The mechanism is explained laterdsb("/J. Mills, David, Agnes, Dr Strong")# Nota: this is equivalent to (explained later)dsb("', *'S !J. Mills, David, Agnes, Dr Strong")## Applying low level operations to strings## Two main syntax:# A) expression evaluation# .[operation ? x]#             | |#             |  \-> the expression to be evaluated#              \-> ? means that the expression will be evaluated# B) verbatim# .[operation ! x]#             | |#             |  \-> the expression taken as verbatim (here ' x')#              \-> ! means that the expression is taken as verbatim# operation: usually 'arg'op with op an operation code.# Example: splittingx = "hello dear"dsb(".[' 's ? x]")# x is split by ' 'dsb(".[' 's !hello dear]")# 'hello dear' is split by ' '# had we used ?, there would have been an error# By default, the string is nested in .[], so in that case no need to use .[]:dsb("' 's ? x")dsb("' 's !hello dear")# There are 35 string operators# Operators usually have a default value# Operations can be chained by separating them with a comma# Example: default of 's' is ' ' + chaining with collapsedsb("s, ' my 'c!hello dear")## Nesting## .[operations ! s1.[expr]s2]#              |    |#              |     \-> expr will be evaluated then added to the string#               \-> nesting requires verbatim evaluation: '!'dsb("The variables are: .[C!x.[1:4]].")# This one is a bit ugly but it shows triple nestingdsb("The variables are: .[w, C!.[2* ! x.[1:4]].[S, 4** ! , _sq]].")## Splitting## s: split with fixed pattern, default is ' 'dsb("s !a b c")dsb("' b 's !a b c")# S: split with regex pattern, default is ', *'dsb("S !a, b, c")dsb("'[[:punct:] ]'S !a! 
b; c")## Collapsing## c and C do the same, their default is different# syntax: 's1||s2' with# - s1 the string used for collapsing# - s2 (optional) the string used for the last collapse# c: default is ' 'dsb("c?1:3")# C: default is ', || and 'dsb("C?1:3")dsb("', || or 'c?1:4")## Extraction## x: extracts the first pattern# X: extracts all patterns# syntax: 'pattern'x# Default is '[[:alnum:]]+'x = "This years is... 2020"dsb("x ? x")dsb("X ? x")dsb("'\\d+'x ? x")## STRING FORMATTING ####### u, U: uppercase first/all letters# first letterdsb("u!julia mills")# title case: split -> upper first letter -> collapsedsb("s, u, c!julia mills")# upper all lettersdsb("U!julia mills")## L: lowercasedsb("L!JULIA MILLS")## q, Q: single or double quotedsb("S, q, C!Julia, David, Wilkins")dsb("S, Q, C!Julia, David, Wilkins")## f, F: formats the string to fit the same lengthscore = c(-10, 2050)nm = c("Wilkins", "David")dsb("Monopoly scores:\n.['\n'c ! - .[f ? nm]: .[F ? score] US$]")# OK that example may have been a bit too complex,# let's make it simple:dsb("Scores: .[f ? score]")dsb("Names: .[F ? nm]")## w, W: reformat the white spaces# w: suppresses trimming white spaces + normalizes successive white spaces# W: same but also includes punctuationdsb("w ! The   white  spaces are now clean.  ")dsb("W ! I, really -- truly; love punctuation!!!")## %: applies sprintf formattingdsb("pi = .['.2f'% ? pi]")## a: appends text on each item# syntax: 's1|s2'a, adds s1 at the beginning and s2 at the end of the string# It accepts the special values :1:, :i:, :I:, :a:, :A:# These values create enumerations (only one such value is accepted)# appending square bracketsdsb("'[|]'a, ' + 'c!x.[1:4]")# Enumerationsacad = dsb("/you like admin, you enjoy working on weekends, you really love emails")dsb("Main reasons to pursue an academic career:\n .[':i:) 'a, C ? 
acad].")## A: same as 'a' but adds at the begging/end of the full string (not on the elements)# special values: :n:, :N:, give the number of elementscharacters = dsb("/David, Wilkins, Dora, Agnes")dsb("There are .[':N: characters: 'A, C ? characters].")## stop: removes basic English stopwords# the list is from the Snowball project: http://snowball.tartarus.org/algorithms/english/stop.txtdsb("stop, w!It is a tale told by an idiot, full of sound and fury, signifying nothing.")## k: keeps the first n characters# syntax: nk: keeps the first n characters#         'n|s'k: same + adds 's' at the end of shortened strings#         'n||s'k: same but 's' counts in the n characters keptwords = dsb("/short, constitutional")dsb("5k ? words")dsb("'5|..'k ? words")dsb("'5||..'k ? words")## K: keeps the first n elements# syntax: nK: keeps the first n elements#         'n|s'K: same + adds the element 's' at the end#         'n||s'K: same but 's' counts in the n elements kept## Special values :rest: and :REST:, give the number of items droppedbx = dsb("/Pessac Leognan, Saint Emilion, Marguaux, Saint Julien, Pauillac")dsb("Bordeaux wines I like: .[3K, ', 'C ? bx].")dsb("Bordeaux wines I like: .['3|etc..'K, ', 'C ? bx].")dsb("Bordeaux wines I like: .['3||etc..'K, ', 'C ? bx].")dsb("Bordeaux wines I like: .['3|and at least :REST: others'K, ', 'C ? bx].")## Ko, KO: special operator which keeps the first n elements and adds "others"# syntax: nKo# KO gives the rest in lettersdsb("Bordeaux wines I like: .[4KO, C ? bx].")## r, R: string replacement# syntax: 's'R: deletes the content in 's' (replaces with the empty string)#         's1 => s2'R replaces s1 into s2# r: fixed / R: perl = TRUEdsb("'e'r !The letter e is deleted")# adding a perl look-behinddsb("'(?<! 
)e'R !The letter e is deleted")dsb("'e => a'r !The letter e becomes a")dsb("'([[:alpha:]]{3})[[:alpha:]]+ => \\1.'R !Trimming the words")## *, *c, **, **c: replication, replication + collapse# syntax: n* or n*c# ** is the same as * but uses "each" in the replicationdsb("N.[10*c!o]!")dsb("3*c ? 1:3")dsb("3**c ? 1:3")## d: replaces the items by the empty string# -> useful in conditionsdsb("d!I am going to be annihilated")## ELEMENT MANIPULATION ####### D: deletes all elements# -> useful in conditionsx = dsb("/I'll, be, deleted")dsb("D ? x")## i, I: inserts an item# syntax: 's1|s2'i: inserts s1 first and s2 last# I: is the same as i but is 'invisibly' includedcharacters = dsb("/David, Wilkins, Dora, Agnes, Trotwood")dsb("'Heep|Spenlow'i, C ? characters")dsb("'Heep|Spenlow'I, C ? characters")## PLURALIZATION ###### There is support for pluralization## *s, *s_: adds 's' or 's ' depending on the number of elementsnb = 1:5dsb("Number.[*s, D ? nb]: .[C ? nb]")dsb("Number.[*s, D ? 2 ]: .[C ? 2 ]")# ordsb("Number.[*s, ': 'A, C ? nb]")## v, V: adds a verb at the beginning/end of the string# syntax: 'verb'v# Unpopular opinion?brand = c("Apple", "Samsung")dsb(".[V, C ? brand] overrated.")dsb(".[V, C ? brand[1]] overrated.")win = dsb("/Peggoty, Agnes, Emily")dsb("The winner.[*s_, v, C ? win].")dsb("The winner.[*s_, v, C ? win[1]].")# Other verbsdsb(".[' have'V, C ? win] won a prize.")dsb(".[' have'V, C ? win[1]] won a prize.")dsb(".[' was'V, C ? win] unable to come.")dsb(".[' was'V, C ? win[1]] unable to come.")## *A: appends text depending on the length of the vector# syntax: 's1|s2 / s3|s4'#         if length == 1: applies 's1|s2'A#         if length >  1: applies 's3|s4'Awin = dsb("/Barkis, Micawber, Murdstone")dsb("The winner.[' is /s are '*A, C ? win].")dsb("The winner.[' is /s are '*A, C ? 
win[1]].")## CONDITIONS ###### Conditions can be applied with 'if' statements.",# The syntax is 'type comp value'if(true : false), with# - type: either 'len', 'char', 'fixed' or 'regex'#   + len: number of elements in the vector#   + char: number of characters#   + fixed: fixed pattern#   + regex: regular expression pattern# - comp: a comparator:#   + valid for len/char: >, <, >=, <=, !=, ==#   + valid for fixed/regex: !=, ==# - value: a value for which the comparison is applied.# - true: operations to be applied if true (can be void)# - false: operations to be applied if false (can be void)dsb("'char <= 2'if('(|)'a : '[|]'a), ' + 'c ? c(1, 12, 123)")sentence = "This is a sentence with some longish words."dsb("s, 'char<=4'if(D), c ? sentence")dsb("s, 'fixed == e'if(:D), c ! Only words with an e are selected.")## ARGUMENTS FROM THE FRAME ###### Arguments can be evaluated from the calling frame.# Simply use backticks instead of quotes.dollar = 6reason = "glory"dsb("Why do you develop packages? For .[`dollar`*c!$]?",    "For money? No... for .[U,''s, c?reason]!", sep = "\n")

Support for emmeans package

Description

If emmeans is installed, its functionality is supported for fixest or fixest_multi objects. Its reference grid is based on the main part of the model, and does not include fixed effects or instrumental variables. Note that any desired arguments to vcov() may be passed as optional arguments in emmeans::emmeans() or emmeans::ref_grid().

Note

When fixed effects are present, estimated marginal means (EMMs) are estimated correctly, provided equal weighting is used. However, the SEs of these EMMs will be incorrect - often dramatically - because the estimated variance of the intercept is not available. However, contrasts among EMMs can be estimated and tested with no issues, because these do not involve the intercept.

Author(s)

Russell V. Lenth

Examples

if(requireNamespace("emmeans") && requireNamespace("AER")) {
    data(Fatalities, package = "AER")
    Fatalities$frate = with(Fatalities, fatal/pop * 10000)
    fat.mod = feols(frate ~ breath * jail * beertax | state + year, data = Fatalities)
    emm = emmeans::emmeans(fat.mod, ~ breath*jail, cluster = ~ state + year)
    emm   ### SEs and CIs are incorrect
    emmeans::contrast(emm, "consec", by = "breath")   ### results are reliable
}

Estimates a fixest estimation from a fixest environment

Description

This function, intended for advanced users, allows estimating any fixest estimation from a fixest environment obtained with only.env = TRUE in a fixest estimation.

Usage

est_env(env, y, X, weights, endo, inst)

Arguments

env

An environment obtained from a fixest estimation with only.env = TRUE. This is intended for advanced users so there is no error handling: any other kind of input will fail with a poor error message.

y

A vector representing the dependent variable. Should be of the same length as the number of observations in the initial estimation.

X

A matrix representing the independent variables. Should be of the same dimension as in the initial estimation.

weights

A vector of weights (i.e. with only positive values). Should be of the same length as the number of observations in the initial estimation. If identical to the scalar 1, this will mean that no weights will be used in the estimation.

endo

A matrix representing the endogenous regressors in IV estimations. It should be of the same dimension as the original endogenous regressors.

inst

A matrix representing the instruments in IV estimations. It should be of the same dimension as the original instruments.

Details

This function has been created for advanced users, mostly to avoid overheads when making simulations with fixest.

How can it help you make simulations? First make a core estimation with only.env = TRUE, and usually with only.coef = TRUE (to avoid having extra things that take time to compute). Then loop while modifying the appropriate things directly in the environment. Beware that if you make a mistake here (typically giving stuff of the wrong length), then you can make the R session crash because there is no more error-handling! Finally estimate with est_env(env = core_env) and store the results.

Instead of est_env, you could directly use fixest estimations too, like feols, since they accept the env argument. The function est_env is only here to add a bit of generality and to spare the user the trouble of writing conditions (look at the source, it's just a one liner).

Objects of main interest in the environment are:

lhs

The left hand side, or dependent variable.

linear.mat

The matrix of the right-hand-side, or explanatory variables.

iv_lhs

The matrix of the endogenous variables in IV regressions.

iv.mat

The matrix of the instruments in IV regressions.

weights.value

The vector of weights.

I strongly discourage changing the dimension of any of these elements, or else a crash can occur. However, you can change their values at will (given the dimensions stay the same). The only exception is the weights, which tolerate a change of dimension: they can be identical to the scalar 1 (meaning no weights), or to something of the same length as the number of observations.

I also discourage changing anything in the fixed-effects (even their value) since this will almost surely lead to a crash.

Note that this function is mostly useful when the overheads/estimation ratio is high. This means that OLS will benefit the most from this function. For GLM/Max.Lik. estimations, the ratio is small since the overheads are only a tiny portion of the total estimation time. Hence this function will be less useful for these models.

Value

It returns the results of a fixest estimation: the one that was summoned when obtaining the environment.

Author(s)

Laurent Berge

Examples

# Let's make a short simulation
# Inspired from Grant McDermott bboot function
# See https://twitter.com/grant_mcdermott/status/1487528757418102787

# Simple function that computes a Bayesian bootstrap
bboot = function(x, n_sim = 100){
  # We bootstrap on the weights
  # Works with fixed-effects/IVs
  #  and with any fixest function that accepts weights

  core_env = update(x, only.coef = TRUE, only.env = TRUE)
  n_obs = x$nobs

  res_all = vector("list", n_sim)
  for(i in 1:n_sim){
    ## begin: NOT RUN
    ## We could directly assign in the environment:
    # assign("weights.value", rexp(n_obs, rate = 1), core_env)
    # res_all[[i]] = est_env(env = core_env)
    ##   end: NOT RUN

    ## Instead we can use the argument weights, which does the same
    res_all[[i]] = est_env(env = core_env, weights = rexp(n_obs, rate = 1))
  }

  do.call(rbind, res_all)
}

est = feols(mpg ~ wt + hp, mtcars)

boot_res = bboot(est)
coef = colMeans(boot_res)
std_err = apply(boot_res, 2, sd)

# Comparing the results with the main estimation
coeftable(est)
cbind(coef, std_err)

Extracts the scores from a fixest estimation

Description

Extracts the scores from a fixest estimation.

Usage

## S3 method for class 'fixest'estfun(x, ...)

Arguments

x

A fixest object, obtained for instance from feols.

...

Not currently used.

Value

Returns a matrix of the same number of rows as the number of observations used for the estimation, and the same number of columns as there were variables.
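For intuition, the scores of an OLS model are the regressors multiplied row by row by the residuals, so the matrix has one row per observation and one column per coefficient. The base R sketch below builds them by hand for an lm fit; any additional scaling applied by the fixest method is not documented here, so treating both as identical is an assumption.

```r
fit = lm(Petal.Length ~ Petal.Width + Sepal.Width, iris)

X = model.matrix(fit)
scores = X * resid(fit)   # row i is x_i times the residual of observation i

dim(scores)               # observations x coefficients
colSums(scores)           # numerically zero: the OLS first-order conditions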

Examples

data(iris)
est = feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
head(estfun(est))

Estimations table (export the results of multiple estimations to a DF or to Latex)

Description

Aggregates the results of multiple estimations and displays them in the form of either a Latex table or a data.frame. Note that you will need the booktabs package for the Latex table to render properly. See setFixest_etable to set the default values, and style.tex to customize Latex output.

Usage

esttable(  ...,  vcov = NULL,  stage = 2,  agg = NULL,  se = NULL,  ssc = NULL,  cluster = NULL,  .vcov_args = NULL,  digits = 4,  digits.stats = 5,  fitstat = NULL,  coefstat = "se",  ci = 0.95,  se.row = NULL,  se.below = NULL,  keep = NULL,  drop = NULL,  order = NULL,  dict = TRUE,  file = NULL,  replace = TRUE,  create_dirs = FALSE,  convergence = NULL,  signif.code = NULL,  headers = list("auto"),  fixef_sizes = FALSE,  fixef_sizes.simplify = TRUE,  keepFactors = TRUE,  family = NULL,  powerBelow = -5,  interaction.combine = NULL,  interaction.order = NULL,  i.equal = NULL,  depvar = TRUE,  style.df = NULL,  group = NULL,  extralines = NULL,  fixef.group = NULL,  drop.section = NULL,  poly_dict = c("", " square", " cube"),  postprocess.df = NULL,  fit_format = "__var__",  coef.just = NULL,  highlight = NULL,  coef.style = NULL,  export = NULL,  page.width = "fit",  div.class = "etable")esttex(  ...,  vcov = NULL,  stage = 2,  agg = NULL,  se = NULL,  ssc = NULL,  cluster = NULL,  .vcov_args = NULL,  digits = 4,  digits.stats = 5,  fitstat = NULL,  caption = NULL,  coefstat = "se",  ci = 0.95,  se.row = NULL,  se.below = NULL,  keep = NULL,  drop = NULL,  order = NULL,  dict = TRUE,  file = NULL,  replace = TRUE,  create_dirs = FALSE,  convergence = NULL,  signif.code = NULL,  label = NULL,  float = NULL,  headers = list("auto"),  fixef_sizes = FALSE,  fixef_sizes.simplify = TRUE,  keepFactors = TRUE,  family = NULL,  powerBelow = -5,  interaction.combine = NULL,  interaction.order = NULL,  i.equal = NULL,  depvar = TRUE,  style.tex = NULL,  notes = NULL,  group = NULL,  extralines = NULL,  fixef.group = NULL,  placement = "htbp",  drop.section = NULL,  poly_dict = c("", " square", " cube"),  postprocess.tex = NULL,  tpt = FALSE,  arraystretch = NULL,  adjustbox = NULL,  fontsize = NULL,  fit_format = "__var__",  tabular = "normal",  highlight = NULL,  coef.style = NULL,  meta = NULL,  meta.time = NULL,  meta.author = NULL,  meta.sys = NULL,  meta.call = NULL, 
 meta.comment = NULL,  view = FALSE,  export = NULL,  markdown = NULL,  page.width = "fit",  div.class = "etable")etable(  ...,  vcov = NULL,  stage = 2,  agg = NULL,  se = NULL,  ssc = NULL,  cluster = NULL,  .vcov_args = NULL,  digits = 4,  digits.stats = 5,  tex,  fitstat = NULL,  caption = NULL,  coefstat = "se",  ci = 0.95,  se.row = NULL,  se.below = NULL,  keep = NULL,  drop = NULL,  order = NULL,  dict = TRUE,  file = NULL,  replace = TRUE,  create_dirs = FALSE,  convergence = NULL,  signif.code = NULL,  label = NULL,  float = NULL,  headers = list("auto"),  fixef_sizes = FALSE,  fixef_sizes.simplify = TRUE,  keepFactors = TRUE,  family = NULL,  powerBelow = -5,  interaction.combine = NULL,  interaction.order = NULL,  i.equal = NULL,  depvar = TRUE,  style.tex = NULL,  style.df = NULL,  notes = NULL,  group = NULL,  extralines = NULL,  fixef.group = NULL,  placement = "htbp",  drop.section = NULL,  poly_dict = c("", " square", " cube"),  postprocess.tex = NULL,  postprocess.df = NULL,  tpt = FALSE,  arraystretch = NULL,  adjustbox = NULL,  fontsize = NULL,  fit_format = "__var__",  coef.just = NULL,  tabular = "normal",  highlight = NULL,  coef.style = NULL,  meta = NULL,  meta.time = NULL,  meta.author = NULL,  meta.sys = NULL,  meta.call = NULL,  meta.comment = NULL,  view = FALSE,  export = NULL,  markdown = NULL,  page.width = "fit",  div.class = "etable")setFixest_etable(  digits = 4,  digits.stats = 5,  fitstat,  coefstat = c("se", "tstat", "confint", "pvalue"),  ci = 0.95,  se.below = TRUE,  keep,  drop,  order,  dict,  float,  signif.code = NULL,  fixef_sizes = FALSE,  fixef_sizes.simplify = TRUE,  family,  powerBelow = -5,  interaction.order = NULL,  depvar,  style.tex = NULL,  style.df = NULL,  notes = NULL,  group = NULL,  extralines = NULL,  fixef.group = NULL,  placement = "htbp",  drop.section = NULL,  view = FALSE,  markdown = NULL,  view.cache = TRUE,  page.width = "fit",  div.class = "etable",  postprocess.tex = NULL,  postprocess.df = 
NULL,  fit_format = "__var__",  meta.time = NULL,  meta.author = NULL,  meta.sys = NULL,  meta.call = NULL,  meta.comment = NULL,  reset = FALSE,  save = FALSE)getFixest_etable()## S3 method for class 'etable_tex'print(x, ...)## S3 method for class 'etable_df'print(x, ...)log_etable(type = "pdflatex")

Arguments

...

Used to capture differentfixest estimation objects (obtained withfemlm,feols orfeglm). Note that any other type of element is discarded. Note that you cangive a list offixest objects.

vcov

Versatile argument to specify the VCOV.In general, it is either a character scalar equal to a VCOV type, either a formula of the form:vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"),"cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley".It also accepts object from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley.It also accepts covariance matrices computed externally.Finally it accepts functions to compute the covariances.See the vcov documentation in the vignette.You can pass several VCOVs (as above) if you nest them into a list.If the number of VCOVs equals the number of models, eahc VCOV is mapped to the appropriate model.If there is one model and several VCOVs, or if the first element of the list is equal to"each" or"times", then the estimations will be replicated and the resultsfor each estimation and each VCOV will be reported.

stage

Can be equal to 2 (default), 1, 1:2 or 2:1. Only used if the object is an IV estimation: defines the stage to which summary should be applied. If stage = 1 and there are multiple endogenous regressors or if stage is of length 2, then an object of class fixest_multi is returned.

agg

A character scalar describing the variable names to be aggregated; it is pattern-based. For sunab estimations, the following keywords work: "att", "period", "cohort" and FALSE (to have full disaggregation). All variables that match the pattern will be aggregated. It must be of the form "(root)", the parentheses must be there and the resulting variable name will be "root". You can add another root with parentheses: "(root1)regex(root2)", in which case the resulting name is "root1::root2". To name the resulting variable differently you can pass a named vector: c("name" = "pattern") or c("name" = "pattern(root2)"). It's a bit intricate sorry, please see the examples.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

.vcov_args

A list containing arguments to be passed to the function vcov.

digits

Integer or character scalar. Default is 4 and represents the number of significant digits to be displayed for the coefficients and standard-errors. To apply rounding instead of significance use, e.g., digits = "r3" which will round at the first 3 decimals. If character, it must be of the form "rd" or "sd" with d a digit (r is for round and s is for significance). For the number of digits for the fit statistics, use digits.stats. Note that when significance is used it does not exactly display the number of significant digits: see details for its exact meaning.

digits.stats

Integer or character scalar. Default is 5 and represents the number of significant digits to be displayed for the fit statistics. To apply rounding instead of significance use, e.g., digits.stats = "r3" which will round at the first 3 decimals. If character, it must be of the form "rd" or "sd" with d a digit (r is for round and s is for significance). Note that when significance is used it does not exactly display the number of significant digits: see details for its exact meaning.

fitstat

A character vector or a one sided formula (both with only lowercase letters). A vector listing which fit statistics to display. The valid types are 'n', 'll', 'aic', 'bic' and r2 types like 'r2', 'pr2', 'war2', etc (see all valid types in r2). Also accepts valid types from the function fitstat. The default value depends on the models to display. Example of use: fitstat=c('n', 'cor2', 'ar2', 'war2'), or fitstat=~n+cor2+ar2+war2 using a formula. You can use the dot to refer to default values: ~ . + ll would add the log-likelihood to the default fit statistics.

coefstat

One of "se" (default), "tstat", "pvalue", or "confint". The statistic to report for each coefficient: the standard-error, the t-statistic, the p-value, or the confidence interval. You can adjust the confidence interval with the argument ci.

ci

Level of the confidence interval, defaults to 0.95. Only used if coefstat = "confint".

se.row

Logical scalar, default is NULL. Whether to display the row with the type of standard-error for each model. When tex = FALSE, the default is TRUE. When tex = TRUE, the row is shown only when there is a table-footer and the types of standard-errors differ across models.

se.below

Logical or NULL (default). Should the standard-errors be displayed below the coefficients? If NULL, then this is TRUE for Latex and FALSE otherwise.

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.
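The two regular expressions given as examples for keep and drop can be checked with base R's grepl. This is only an illustrative sketch: it assumes variables named x1 to x55, applies the patterns directly to the names, and does not call etable or reproduce how fixest performs the selection internally.

```r
# Hypothetical variable names x1 ... x55 (assumption, for illustration only)
vars <- paste0("x", 1:55)

# keep = "x[[:digit:]]$": "x" followed by a single digit that ends the name
kept <- vars[grepl("x[[:digit:]]$", vars)]

# drop = "x[[:digit:]]{2}": discard names where "x" is followed by two digits
remaining <- vars[!grepl("x[[:digit:]]{2}", vars)]

print(kept)
print(identical(kept, remaining))
```

Both patterns leave the same nine variables, x1 to x9.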

order

Character vector. This element is used if the user wants the variables to be ordered in a certain way. This should be a vector of regular expressions (see base::regex help for more info). The variables satisfying the first regular expression will be placed first, then the order follows the sequence of regular expressions. This argument is applied post aliasing (see argument dict). Example: you have the following variables: month1 to month6, then x1 to x5, then year1 to year6. If you want to display first the x's, then the years, then the months you could use: order = c("x", "year"). If the first character is an exclamation mark, the effect is reversed (e.g. order = "!Intercept" means: every variable that does not contain “Intercept” goes first). See details.

dict

A named character vector or a logical scalar. It changes the original variable names to the ones contained in the dictionary. E.g. to change the variables named a and b3 to (resp.) “$log(a)$” and to “$bonus^3$”, use dict=c(a="$log(a)$",b3="$bonus^3$"). By default, it is equal to getFixest_dict(), a default dictionary which can be set with setFixest_dict. You can use dict = FALSE to disable it. By default dict modifies the entries in the global dictionary; to disable this behavior, use "reset" as the first element (ex: dict=c("reset", mpg="Miles per gallon")).

file

A character scalar. If provided, the Latex (or data frame) table will be saved in a file whose path is file. If you provide this argument, then a Latex table will be exported; to export a regular data.frame, use argument tex = FALSE.

replace

Logical, default is TRUE. Only used if option file is used. Should the exported table be written in a new file that replaces any existing file?

create_dirs

Logical, default is FALSE. Only used when some file needs to be created (e.g. when file or export is used). By default, i.e. when FALSE, if the parent directory does not exist, the containing folders are created up to the grand parent. If TRUE, all containing folders are recursively created.

convergence

Logical, default is missing. Should the convergence state of the algorithm be displayed? By default, convergence information is displayed if at least one model did not converge.

signif.code

Named numeric vector, used to provide the significance codes with respect to the p-value of the coefficients. Default is c("***"=0.01, "**"=0.05, "*"=0.10) for a Latex table and c("***"=0.001, "**"=0.01, "*"=0.05, "."=0.10) for a data.frame (to conform with R's default). To suppress the significance codes, use signif.code=NA or signif.code=NULL. Can also be equal to "letters", then the default becomes c("a"=0.01, "b"=0.05, "c"=0.10).
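To make the meaning of such a named vector concrete, here is a minimal base R sketch that maps a p-value to a code. The helper signif_stars is hypothetical and not part of fixest, and the use of a strict inequality at the thresholds is an assumption.

```r
# Hypothetical helper (not a fixest function): returns the code attached to
# the smallest threshold that the p-value falls below, or "" if none applies.
signif_stars <- function(p, codes = c("***" = 0.01, "**" = 0.05, "*" = 0.10)) {
  codes <- sort(codes)       # smallest threshold first, names are preserved
  hit <- which(p < codes)    # ASSUMPTION: strict inequality at the boundary
  if (length(hit) == 0) "" else names(codes)[hit[1]]
}

print(signif_stars(0.003))
print(signif_stars(0.03))
print(signif_stars(0.03, codes = c("a" = 0.01, "b" = 0.05, "c" = 0.10)))
print(signif_stars(0.2))
```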

headers

Character vector or list. Adds one or more header lines in the table. A header line can be represented by a character vector or a named list of numbers where the names are the cell values and the numbers are the span. Example: headers=list("M"=2, "F"=3) will create a row with two times "M" and three times "F" (this is identical to headers=rep(c("M", "F"), c(2, 3))). You can stack header lines within a list, in that case the list names will be displayed in the leftmost cell. Example: headers=list(Gender=list("M"=2, "F"=3), Country="US") will create two header lines. When tex = TRUE, you can add a rule to separate groups by using ":_:" somewhere in the row name (ex: headers=list(":_:Gender"=list("M"=2, "F"=3))). You can control the placement by inserting a special character in the row name: "^" means at the top, "-" means in the middle (default) and "_" means at the bottom. Example: headers=list("_Country"="US") will add the country row as the very last header row (after the model row). Finally, you can use the special value "auto" to include automatic headers when the data contains split sample estimations. By default it is equal to list("auto"). You can use .() instead of list().

fixef_sizes

(Tex only.) Logical, default is FALSE. If TRUE and fixed-effects were used in the models, then the number of "units" per fixed-effect dimension is also displayed.

fixef_sizes.simplify

Logical, default is TRUE. Only used if fixef_sizes = TRUE. If TRUE, the fixed-effects sizes will be displayed in parentheses instead of in a separate line if there is no ambiguity (i.e. if the size is constant across models).

keepFactors

Logical, default is TRUE. If FALSE, then factor variables are displayed as fixed-effects and no coefficient is shown.

family

Logical, default is missing. Whether to display the families of the models. By default this line is displayed when at least two models are from different families.

powerBelow

(Tex only.) Integer, default is -5. A coefficient whose value is below 10**(powerBelow+1) is written with a power in Latex. For example 0.0000456 would be written 4.56$\times 10^{-5}$ by default. Setting powerBelow = -6 would lead to 0.00004 in Latex.
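The threshold rule can be checked with a one-line comparison. The helper below is hypothetical and only encodes the stated condition (absolute value below 10**(powerBelow+1)); taking the absolute value is an assumption, and this is not the formatting code used by etable.

```r
# Hypothetical helper: TRUE when a coefficient would be written with a power
# of ten according to the rule stated for powerBelow.
uses_power_notation <- function(x, powerBelow = -5) {
  abs(x) < 10^(powerBelow + 1)   # ASSUMPTION: the rule applies to |x|
}

print(uses_power_notation(0.0000456))                   # threshold is 1e-4
print(uses_power_notation(0.0000456, powerBelow = -6))  # threshold is 1e-5
```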

interaction.combine

Character scalar, defaults to " $\\times$ " for Tex and to " x " otherwise. When the estimation contains interactions, then the variables names (after aliasing) are combined with this argument. For example: if dict = c(x1="Wind", x2="Rain") and you have the following interaction x1:x2, then it will be renamed (by default) Wind $\times$ Rain; using interaction.combine = "*" would lead to Wind*Rain.
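The renaming described above amounts to aliasing each side of the interaction and then joining the pieces. A minimal base R sketch follows; the helper alias_term is hypothetical (not part of fixest) and assumes that interacted variables are separated by a colon.

```r
# Hypothetical helper: alias each variable of an interaction term with the
# dictionary, then join the pieces with the `combine` string.
alias_term <- function(term, dict, combine = " x ") {
  parts <- strsplit(term, ":", fixed = TRUE)[[1]]
  known <- parts %in% names(dict)
  parts[known] <- unname(dict[parts[known]])  # variables not in dict are kept
  paste(parts, collapse = combine)
}

dict <- c(x1 = "Wind", x2 = "Rain")
print(alias_term("x1:x2", dict))                 # non-Tex default " x "
print(alias_term("x1:x2", dict, combine = "*"))
```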

interaction.order

Character vector of regular expressions. Only affects variables that are interacted like x1 and x2 in feols(y ~ x1*x2, data). You can change the order in which the interacted variables are displayed: e.g. interaction.order = "x2" would lead to "x2 x x1" instead of "x1 x x2". Please look at the argument 'order' and the dedicated section in the help page for more information.

i.equal

Character scalar, defaults to " $=$ " when tex = TRUE and " = " otherwise. Only affects factor variables created with the function i, tells how the variable should be linked to its value. For example if you have the Species factor from the iris data set, by default the display of the variable is Species = Setosa, etc. If i.equal = ": " the display becomes Species: Setosa.

depvar

Logical, default is TRUE. Whether a first line containing the dependent variables should be shown.

style.df

An object created by the function style.df. It represents the style of the data frame returned (if tex = FALSE), see the documentation of style.df.

group

A list. The list elements should be vectors of regular expressions. For each element of this list: a new line in the table is created, all variables that are matched by the regular expressions are discarded (same effect as the argument drop) and TRUE or FALSE will appear in the model cell, depending on whether some of the previous variables were found in the model. Example: group=list("Controls: personal traits"=c("gender", "height", "weight")) will create a new line with "Controls: personal traits" in the leftmost cell, all three variables gender, height and weight are discarded, TRUE appearing in each model containing at least one of the three variables (the style of TRUE/FALSE is governed by the argument yesNo). You can control the placement of the new row by using 1 or 2 special characters at the start of the row name. The meanings of these special characters are: 1) "^": coef., "-": fixed-effect, "_": stats, section; 2) "^": 1st, "_": last, row. For example: group=list("_^Controls"=stuff) will place the line at the top of the 'stats' section, and using group=list("^_Controls"=stuff) will make the row appear at the bottom of the coefficients section. For details, see the dedicated section.

extralines

A vector, a list or a one sided formula. The list elements should be either a vector representing the value of each cell, a list of the form list("item1" = #item1, "item2" = #item2, etc), or a function. This argument can be many things, please have a look at the dedicated help section; a simplified description follows. For each element of this list: a new line in the table is created, the list name being the row name and the vector being the content of the cells. Example: extralines=list("Sub-sample"=c("<20 yo", "all", ">50 yo")) will create a new line with "Sub-sample" in the leftmost cell, the vector filling the content of the cells for the three models. You can control the placement of the new row by using 1 or 2 special characters at the start of the row name. The meanings of these special characters are:

  1. "^": coef., "-": fixed-effect, "_": stats, section;

  2. "^": 1st, "_": last, row.

For example: extralines=list("__Controls"=stuff) will place the line at the bottom of the stats section, and using extralines=list("^^Controls"=stuff) will make the row appear at the top of the 'coefficients' section. For details, see the dedicated section. You can use .() instead of list().
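The two-character placement prefix can be decoded as follows. This is a base R sketch with a hypothetical helper, parse_placement, that only handles row names starting with exactly two special characters; the single-character form and fixest's actual parsing are not covered.

```r
# Hypothetical helper: decode a two-character placement prefix.
# 1st character = section, 2nd character = position within the section.
parse_placement <- function(row_name) {
  sections  <- c("^" = "coef", "-" = "fixef", "_" = "stats")
  positions <- c("^" = "first", "_" = "last")
  s <- substr(row_name, 1, 1)
  p <- substr(row_name, 2, 2)
  stopifnot(s %in% names(sections), p %in% names(positions))
  list(
    section  = unname(sections[s]),
    position = unname(positions[p]),
    label    = substring(row_name, 3)   # what remains is the displayed row name
  )
}

str(parse_placement("__Controls"))  # bottom of the stats section
str(parse_placement("^^Controls"))  # top of the coefficients section
```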

fixef.group

Logical scalar or list (default is NULL). If equal to TRUE, then all fixed-effects always appearing jointly in models will be grouped in one row. If a list, its elements must be character vectors of regular expressions and the list names will be the row names. For ex. fixef.group=list("Dates fixed-effects"="Month|Day") will remove the "Month" and "Day" fixed effects from the display and replace them with a single row named "Dates fixed-effects". You can control the placement of the new row with two special characters telling where to place the row within a section: first in which section it should appear: "^" (coef.), "-" (fixed-effects), or "_" (stat.) section; then whether the row should be "^" (first), or "_" (last). These two special characters must appear first in the row names. Please see the dedicated section.

drop.section

Character vector which can be of length 0 (i.e. equal to NULL). Can contain the values "coef", "fixef", "slopes" or "stats". It would drop, respectively, the coefficients section, fixed-effects section, the variables with varying slopes section or the fit statistics section.

poly_dict

Character vector, default is c("", " square", " cube"). When raw polynomials (x^2, etc) are used, the variables are automatically renamed and poly_dict rules the display of the power. For powers greater than the number of elements of the vector, the value displayed is $^{pow}$ in Latex and ^ pow in the R console.

postprocess.df

A function that will postprocess the resulting data.frame. Only when tex = FALSE. By default it is equal to NULL, meaning that there is no postprocessing. When tex = TRUE, see the argument postprocess.tex.

fit_format

Character scalar, default is "__var__". Only used in the presence of IVs. By default the endogenous regressors are named fit_varname in the second stage. The format of the endogenous regressor to appear in the table is governed by fit_format. For instance, by default, the prefix "fit_" is removed, leading to only varname to appear. If fit_format = "$\\hat{__var__}$", then "$\hat{varname}$" will appear in the table.

coef.just

(DF only.) Either ".", "(", "l", "c" or "r", default is NULL. How the coefficients should be justified. If NULL then they are right aligned if se.below = FALSE and aligned to the dot if se.below = TRUE. The keywords stand respectively for dot-, parenthesis-, left-, center- and right-aligned.

highlight

List containing coefficients to highlight. Highlighting is of the form .("options1" = "coefs1", "options2" = "coefs2", etc). The coefficients to be highlighted can be written in three forms: 1) row, eg "x1" will highlight the full row of the variable x1; 2) cells, use '@' after the coefficient name to give the column, it accepts ranges, eg "x1@2, 4-6, 8" will highlight only the columns 2, 4, 5, 6, and 8 of the variable x1; 3) range, by giving the top-left and bottom-right values separated with a semi-colon, eg "x1@2 ; x3@5" will highlight from the column 2 of x1 to the 5th column of x3. Coefficient names are partially matched, use a '%' first to refer to the original name (before dictionary) and use '@' first to use a regular expression. You can add a vector of row/cell/range. The options are a comma-separated list of items. By default the highlighting is done with a frame (a thick box) around the coefficient, use 'rowcol' to highlight with a row color instead. Here are the other options: 'se' to highlight the standard-errors too; 'square' to have a square box (instead of rounded); 'thick1' to 'thick6' to control the width of the box; 'sep0' to 'sep9' to control the inner spacing. Finally the remaining option is the color: simply add an R color (it must be a valid R color!). You can use "color!alpha" with "alpha" a number between 0 and 100 to change the alpha channel of the color.

To be able to use the highlighting feature, you need the following lines in your latex preamble: \usepackage{tikz} and \usetikzlibrary{matrix, shapes, arrows, fit, tikzmark}

coef.style

Named list containing styles to be applied to the coefficients. It must be of the form .("style1" = "coefs1", "style2" = "coefs2", etc). The style must contain the string ":coef:" (or ":coef_se:" to style both the coefficient and its standard-error). The string :coef: will be replaced verbatim by the coefficient value. For example use "\\textbf{:coef:}" to put the coefficient in bold. Note that markdown markup is enabled so "**:coef:**" would also put it in bold. The coefficients to be styled can be written in three forms: 1) row, eg "x1" will style the full row of the variable x1; 2) cells, use '@' after the coefficient name to give the column, it accepts ranges, eg "x1@2, 4-6, 8" will style only the columns 2, 4, 5, 6, and 8 of the variable x1; 3) range, by giving the top-left and bottom-right values separated with a semi-colon, eg "x1@2 ; x3@5" will style from the column 2 of x1 to the 5th column of x3. Coefficient names are partially matched, use a '%' first to refer to the original name (before dictionary) and use '@' first to use a regular expression. You can add a vector of row/cell/range.
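The "cells" form shared by highlight and coef.style, such as "x1@2, 4-6, 8", can be read as a coefficient name followed by a list of columns and column ranges. The base R sketch below uses a hypothetical helper, parse_cells, to expand such a specification. It ignores the '%' and leading '@' prefixes as well as the "range" form with a semi-colon, and it is not fixest's parser.

```r
# Hypothetical helper: expand a "coef@cols" specification into column indices.
parse_cells <- function(spec) {
  pieces <- strsplit(spec, "@", fixed = TRUE)[[1]]
  coef <- trimws(pieces[1])
  if (length(pieces) < 2) {
    return(list(coef = coef, columns = NULL))  # no '@': the full row
  }
  cols <- integer(0)
  for (item in trimws(strsplit(pieces[2], ",", fixed = TRUE)[[1]])) {
    bounds <- as.integer(strsplit(item, "-", fixed = TRUE)[[1]])
    # "4-6" expands to 4, 5, 6; a single number is kept as is
    cols <- c(cols, if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds)
  }
  list(coef = coef, columns = cols)
}

print(parse_cells("x1@2, 4-6, 8")$columns)
```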

export

Character scalar giving the path to a PNG file to be created, default is NULL. If provided, the Latex table will be converted to PNG and copied to the export location. Note that for this option to work you need a working distribution of pdflatex, imagemagick and ghostscript, or the R packages tinytex and pdftools.

page.width

Character scalar equal to 'fit' (default), 'a4' or 'us'; or a single Latex measure (like '17cm') or a double one (like "21, 2cm"). Only used when the Latex table is to be viewed (view = TRUE), exported (export != NULL) or displayed in Rmarkdown (markdown != NULL). It represents the text width of the page in which the Latex table will be inserted. By default, 'fit', the page fits exactly the table (i.e. text width = table width). If 'a4' or 'us', two times 2cm is removed from the page width to account for margins. Providing a page width and a margin width, like in "17in, 1in", enables a correct display of the argument adjustbox. Note that the margin width represents the width of a single side margin (and hence will be doubled).

div.class

Character scalar, default is "etable". Only used in Rmarkdown documents when markdown = TRUE. The table in an image format is embedded in a <div> container, and that container is of class div.class.

caption

(Tex only.) Character scalar. The caption of the Latex table.

label

(Tex only.) Character scalar. The label of the Latex table.

float

(Tex only.) Logical. By default, if the argument caption or label is provided, it is set to TRUE. Otherwise, it is set to FALSE.

style.tex

An object created by the function style.tex. It represents the style of the Latex table, see the documentation of style.tex.

notes

(Tex only.) Character vector. If provided, a "notes" section will be added at the end right after the end of the table, containing the text of this argument. If it is a vector, it will be collapsed with new lines. If tpt = TRUE, the behavior is different: each element of the vector is an item. If the first element of the vector starts with "@", then it will be included verbatim, and in case of tpt = TRUE, right before the first item. If that element is provided, it will replace the value defined in style.tex(notes.intro) or style.tex(notes.tpt.intro).

placement

(Tex only.) Character string giving the position of the float in Latex. Default is "htbp". It must consist of only the characters 'h', 't', 'b', 'p', 'H' and '!'. Reminder: h: here; t: top; b: bottom; p: float page; H: definitely here; !: prevents Latex from looking for other positions. Note that it can be equal to the empty string (and you'll get the default placement).

postprocess.tex

A function that will postprocess the character vector defining the latex table. Only when tex = TRUE. By default it is equal to NULL, meaning that there is no postprocessing. When tex = FALSE, see the argument postprocess.df. See details.

tpt

(Tex only.) Logical scalar, default is FALSE. Whether to use the threeparttable environment. If so, the notes will be integrated into the tablenotes environment.

arraystretch

(Tex only.) A numeric scalar, default is NULL. If provided, the command \renewcommand*{\arraystretch}{x} is inserted, replacing x by the value of arraystretch. The changes are specific to the current table and do not affect the rest of the document.

adjustbox

(Tex only.) A logical, numeric or character scalar, default is NULL. If not NULL, the table is inserted within the adjustbox environment. By default the options are width = 1\textwidth, center (if TRUE). A numeric value changes the value before \textwidth. You can also add a character of the form "x tw" or "x th" with x a number and where tw (th) stands for text-width (text-height). Finally any other character value is passed verbatim as an adjustbox option.

fontsize

(Tex only.) A character scalar, default is NULL. Can be equal to tiny, scriptsize, footnotesize, small, normalsize, large, or Large. The change affects the table only (and not the rest of the document).

tabular

(Tex only.) Character scalar equal to "normal" (default), "*" or "X". Represents the type of tabular environment to use: either tabular, tabular* or tabularx.

meta

(Tex only.) A one-sided formula that shall contain the following elements: date or time, sys, author, comment and call. Default is NULL. This argument is a shortcut to controlling the meta information that can be displayed in comments before the table. Typically if the element is in the formula, it means that the argument will be equal to TRUE. Example: meta = ~time+call is equivalent to meta.time = TRUE and meta.call = TRUE. The "author" and "comment" elements are a bit special. Using meta = ~author("Mark") is equivalent to meta.author = "Mark" while meta = ~author is equivalent to meta.author = TRUE. The "comment" must be used with a character string inside: meta = ~comment("this is a comment"). The order in the formula controls the order of appearance of the meta elements. It also has precedence over the meta.XX arguments.

meta.time

(Tex only.) Either a logical scalar (default is FALSE) or "time" or "date". Whether to include the time (if TRUE or "time") or the date (if "date") of creation of the table in a comment right before the table.

meta.author

(Tex only.) A logical scalar (default is FALSE) or a character vector. If TRUE then the identity of the author (deduced from the system user in Sys.info()) is inserted in a comment right before the table. If a character vector, then it should contain author names that will be inserted as comments before the table, prefixed with "Created by:". For free-form comments see the argument meta.comment.

meta.sys

(Tex only.) A logical scalar, default is FALSE. Whether to include system information (from Sys.info()) in a comment right before the table.

meta.call

(Tex only.) Logical scalar, default is FALSE. If TRUE then the call to the function is inserted right before the table in a comment.

meta.comment

(Tex only.) A character vector containing free-form comments to be inserted right before the table.

view

Logical, default is FALSE. If TRUE, then the table is generated in Latex by etable and then displayed in the viewer pane. Note that for this option to work you need i) pdflatex or the R package tinytex, ii) imagemagick and ghostscript, or the R package pdftools. All the required software must be installed and on the path.

markdown

Character scalar giving the location of a directory, or a logical scalar. Default is NULL. This argument only works in Rmarkdown documents, when knitting the document. If provided: two behaviors depending on context. A) if the output document is Latex, the table is exported in Latex. B) if the output document is not Latex, the table will be exported to PNG at the desired location and inserted in the document via a markdown link. If equal to TRUE, the default location of the PNGs is a temporary folder for R > 4.0.0, or "images/etable/" for earlier versions.

tex

Logical: whether the results should be a data.frame or a Latex table. By default, this argument is TRUE if the argument file (used for exportation) is not missing; it is equal to FALSE otherwise.

view.cache

Logical, default is TRUE. Only used when view = TRUE. Whether the PNGs of the tables should be cached.

reset

(setFixest_etable only.) Logical, default is FALSE. If TRUE, this will reset all the default values that were already set by the user in previous calls.

save

Either a logical or equal to "reset". Default is FALSE. If TRUE then the value is set permanently at the project level, this means that if you restart R, you will still obtain the previously saved defaults. This is done by writing in the ".Renviron" file, located in the project's working directory, hence we must have write permission there for this to work, and it only works with Rstudio. If equal to "reset", the default at the project level is erased. Since there is writing in a file involved, permission is asked to the user.

x

An object returned by etable.

type

Character scalar equal to 'pdflatex' (default), 'magick', 'dir' or 'tex'. Which log file to report; if 'tex', the full source code of the tex file is returned, if 'dir': the directory of the log files is returned.

Details

The function esttex is equivalent to the function etable with argument tex = TRUE.

The function esttable is equivalent to the function etable with argument tex = FALSE.

To display the table, you will need the Latex package booktabs which contains the \toprule, \midrule and \bottomrule commands.

You can permanently change the way your table looks in Latex by using setFixest_etable. The following vignette gives an example as well as illustrates how to use the style and postprocessing functions: Exporting estimation tables.

When the argument postprocess.tex is not missing, two additional tags will be included in the character vector returned by etable: "%start:tab\\n" and "%end:tab\\n". These can be used to identify the start and end of the tabular and are useful to insert code within the table environment.

Value

If tex = TRUE, the lines composing the Latex table are returned invisibly while the table is directly printed on the console.

If tex = FALSE, the data.frame is directly returned. If the argument file is not missing, the data.frame is printed and returned invisibly.

Functions

Latex dependencies

Some features require specific Latex dependencies, these are:

Here is a summary:

% required
\usepackage{booktabs}
\usepackage{array}
\usepackage{multirow}
\usepackage{amsmath}
\usepackage{amssymb}

% optional, dependent on context
\usepackage{makecell}
\usepackage{tabularx}
\usepackage[flushleft]{threeparttable}
\usepackage{adjustbox}
\usepackage[dvipsnames,table]{xcolor}
\usepackage{tikz}
\usetikzlibrary{matrix, shapes, arrows, fit, tikzmark}
\usepackage{colortbl}

How does digits handle the number of decimals displayed?

The default display of decimals is the outcome of an algorithm. Let's take the example of digits = 3 which "kind of" requires 3 significant digits to be displayed.

For numbers greater than 1 (in absolute terms), their integral part is always displayed and the number of decimals shown is equal to digits minus the number of digits in the integral part. This means that 12.345 will be displayed as 12.3. If the number of decimals should be 0, then a single decimal is displayed to suggest that the number is not whole. This means that 1234.56 will be displayed as 1234.5. Note that if the number is whole, no decimals are shown.

For numbers lower than 1 (in absolute terms), the number of decimals displayed is equal to digits except if there are only 0s in which case the first significant digit is shown. This means that 0.01234 will be displayed as 0.012 (first rule), and that 0.000123 will be displayed as 0.0001 (second rule).
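The two rules can be summarized by the number of decimals they imply. The base R sketch below uses a hypothetical helper, n_decimals, that transcribes the rules as stated; it only counts decimals, does not format numbers, and is not the code used by etable.

```r
# Hypothetical helper: number of decimals displayed for x under the rules
# described above, for a given value of `digits`.
n_decimals <- function(x, digits = 3) {
  ax <- abs(x)
  if (ax == floor(ax)) return(0L)            # whole numbers: no decimals
  if (ax >= 1) {
    n_int <- floor(log10(ax)) + 1            # digits in the integral part
    return(as.integer(max(1, digits - n_int)))  # at least one decimal
  }
  first_sig <- -floor(log10(ax))             # position of the first non-zero decimal
  as.integer(max(digits, first_sig))
}

print(n_decimals(12.345))    # one decimal, as in 12.3
print(n_decimals(1234.56))   # a single decimal is still shown
print(n_decimals(0.01234))   # three decimals, as in 0.012
print(n_decimals(0.000123))  # four decimals, as in 0.0001
```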

Arguments keep, drop and order

The arguments keep, drop and order use regular expressions. If you are not familiar with regular expressions, I urge you to learn them, since they are an extremely powerful way to manipulate character strings (and they exist across most programming languages).

For example drop = "Wind" would drop any variable whose name contains "Wind". Note that variables such as "Temp:Wind" or "StrongWind" do contain "Wind", so would be dropped. To drop only the variable named "Wind", you need to use drop = "^Wind$" (with "^" meaning beginning, resp. "$" meaning end, of the string => this is the language of regular expressions).

Although you can combine several regular expressions in a single character string using pipes, drop also accepts a vector of regular expressions.

You can use the special character "!" (exclamation mark) to reverse the effect of the regular expression (this feature is specific to this function). For example drop = "!Wind" would drop any variable that does not contain "Wind".

You can use the special character "%" (percentage) to make reference to the original variable name instead of the aliased name. For example, you have a variable named "Month6", and use a dictionary dict = c(Month6="June"). Thus the variable will be displayed as "June". If you want to delete that variable, you can use either drop="June", or drop="%Month6" (which makes reference to its original name).

The argument order takes in a vector of regular expressions, the order will follow the elements of this vector. The vector gives a list of priorities, on the left the elements with highest priority. For example, order = c("Wind", "!Inter", "!Temp") would give highest priority to the variables containing "Wind" (which would then appear first), second highest priority to the variables not containing "Inter", and last, with lowest priority, the variables not containing "Temp". If you had the following variables: (Intercept), Temp:Wind, Wind, Temp you would end up with the following order: Wind, Temp:Wind, Temp, (Intercept).
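The behavior of drop, including the "!" prefix, can be imitated in base R. The helper drop_vars below is hypothetical: it applies each pattern with grepl, treats a leading "!" as a negation, does not support the "%" prefix, and is not the code used by etable.

```r
# Hypothetical helper imitating `drop`: remove variables matching any pattern;
# a leading "!" reverses the match.
drop_vars <- function(vars, patterns) {
  for (p in patterns) {
    negate <- startsWith(p, "!")
    if (negate) p <- substring(p, 2)
    hit <- grepl(p, vars)
    if (negate) hit <- !hit
    vars <- vars[!hit]
  }
  vars
}

vars <- c("Wind", "Temp:Wind", "StrongWind", "Temp")
print(drop_vars(vars, "Wind"))     # everything containing "Wind" is dropped
print(drop_vars(vars, "^Wind$"))   # only the variable named exactly "Wind"
print(drop_vars(vars, "!Wind"))    # everything NOT containing "Wind" is dropped
```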
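One way to reproduce the ordering of this example in base R is a sort with one key per pattern, the first pattern being the most important key and ties keeping their original order. The helper apply_order is hypothetical; it is a sketch consistent with the example above, not fixest's implementation.

```r
# Hypothetical helper: order variables by a vector of prioritized patterns.
apply_order <- function(vars, patterns) {
  keys <- lapply(patterns, function(p) {
    negate <- startsWith(p, "!")
    if (negate) p <- substring(p, 2)
    hit <- grepl(p, vars)
    if (negate) hit <- !hit
    !hit   # FALSE sorts before TRUE, so matching variables come first
  })
  # order() uses later keys to break ties and keeps unresolved ties in place
  vars[do.call(order, keys)]
}

vars <- c("(Intercept)", "Temp:Wind", "Wind", "Temp")
print(apply_order(vars, c("Wind", "!Inter", "!Temp")))
```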

The argument extralines

The argument extralines adds, well, extra lines to the table. It accepts either a list or a one-sided formula.

For each line, you can define the values taken by each cell in 4 different ways: a) a vector, b) a list, c) a function, and d) a formula.

If a vector, it should represent the values taken by each cell. Note that if the length of the vector is smaller than the number of models, its values are recycled across models, but the length of the vector is required to be a divisor of the number of models.

If a list, it should be of the form list("item1" = #item1, "item2" = #item2, etc). For example, list("A"=2, "B"=3) leads to c("A", "A", "B", "B", "B"). Note that if the number of items is 1, you don't need to add = 1. For example, list("A"=2, "B") is valid and leads to c("A", "A", "B"). As for the vector, the values are recycled if necessary.
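The vector and list forms can be mimicked in base R as follows. The helpers expand_items() and recycle_values() are hypothetical names written for this illustration; they are not part of fixest and only reproduce the rules stated above.

```r
# Hypothetical helper: expands list("A" = 2, "B" = 3) into one label per model
expand_items <- function(x) {
  item_names <- names(x)
  if (is.null(item_names)) item_names <- rep("", length(x))
  out <- character(0)
  for (i in seq_along(x)) {
    if (item_names[i] == "") {
      # Unnamed item: the value itself is the label, used once
      out <- c(out, as.character(x[[i]]))
    } else {
      # Named item: the name is the label, the value is the repeat count
      out <- c(out, rep(item_names[i], x[[i]]))
    }
  }
  out
}

# Hypothetical helper: recycles values across n_models columns and requires
# the length of the vector to be a divisor of the number of models
recycle_values <- function(values, n_models) {
  stopifnot(n_models %% length(values) == 0)
  rep(values, length.out = n_models)
}

full     <- expand_items(list("A" = 2, "B" = 3))
short    <- expand_items(list("A" = 2, "B"))
recycled <- recycle_values(c("Yes", "No"), n_models = 4)

print(full)
print(short)
print(recycled)
```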

If a function, it will be applied to each model and should return a scalar (returned NA values are accepted).

If a formula, it must be one-sided and the elements in the formula must represent either extralines macros or fit statistics (i.e. valid types of the function fitstat). One new line will be added for each element of the formula. To use extralines macros, you must first register them with extralines_register.

Finally, you can combine as many lines as you wish by nesting them in a list. The names of the nesting list are the row titles (values in the leftmost cell). For example, extralines = list(~r2, Controls = TRUE, Group = list("A"=2, "B")) will add three lines, the titles of which are "R2", "Controls" and "Group".

Controlling the placement of extra lines

The arguments group, extralines and fixef.group allow you to add customized lines to the table. They can be defined via a list where the list name will be the row name. By default, the extra line is placed right after the coefficients (except for fixef.group, covered in the last paragraph). For instance, group = list("Controls" = "x[[:digit:]]") will create a line right after the coefficients telling which models contain the control variables.

But the placement can be customized. The previous example (of the controls) will be used for illustration (the mechanism for extralines and fixef.group is identical).

The row names accept 2 special characters at the very start. The first character tells in which section the line should appear: it can be equal to "^", "-", or "_", meaning respectively the coefficients, the fixed-effects and the statistics section (which typically appear at the top, middle and bottom of the table). The second one governs the placement of the new line within the section: it can be equal to "^", meaning first line, or "_", meaning last line.

Here are some examples. Using the previous example, writing "_^Controls" would place the new line at the top of the statistics section. Writing "-_Controls" places it as the last row of the fixed-effects section; "^^Controls" as the top row of the coefficients section; etc.

The second character is optional, the default placement being at the bottom. This means that "_Controls" would place it at the bottom of the statistics section.

The placement in fixef.group is defined similarly; only the default placement is different. Its default placement is at the top of the fixed-effects section.
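The prefix rules can be summarized by a small parser. The function parse_placement() below is a hypothetical helper written for this illustration (it is not a fixest function); it decodes the one or two leading characters of a row name and does not model the different default used by fixef.group.

```r
# Hypothetical helper: decodes the placement prefix of a row name
parse_placement <- function(title) {
  # 1st character: section; optional 2nd character: position; rest: title
  m <- regmatches(title, regexec("^([_^-])([_^]?)(.*)$", title))[[1]]
  if (length(m) == 0) {
    # No prefix: nothing to decode
    return(list(section = NA_character_, position = NA_character_, title = title))
  }
  sections <- c("^" = "coefficients", "-" = "fixed-effects", "_" = "statistics")
  list(
    section  = unname(sections[m[2]]),
    # Without a second character, the line goes to the bottom of the section
    position = if (m[3] == "^") "first" else "last",
    title    = m[4]
  )
}

p_top_stats <- parse_placement("_^Controls")
p_last_fe   <- parse_placement("-_Controls")
p_default   <- parse_placement("_Controls")
p_plain     <- parse_placement("Controls")

str(p_top_stats)
str(p_default)
```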

Escaping special Latex characters

By default, in all instances (with the notable exception of the elements of style.tex), special Latex characters are escaped. This means that caption="Exports in million $." will be exported as "Exports in million \\$.": the dollar sign will be escaped. This is true for the following characters: &, $, %, _, ^ and #.

Note, importantly, that equations are NOT escaped. This means that caption="Functional form $a_i \\times x^b$, variation in %." will be displayed as: "Functional form $a_i \\times x^b$, variation in \\%.": only the last percent sign will be escaped.

If for some reason you don't want the escaping to take place, the arguments headers and extralines are the only ones allowing that. To disable escaping, add the special token ":tex:" in the row names. Example: in headers=list(":tex:Row title"="weird & & %\\n tex stuff\\\\"), the elements will be displayed verbatim. Of course, since it can easily ruin your table, it is only recommended for super users.

Markdown markup

Within anything that is Latex-escaped (see previous section), you can use a markdown-style markup to put the text in italic and/or bold. Use *text*, **text** or ***text*** to put some text in, respectively, italic (with \\textit), bold (with \\textbf) and italic-bold.

The markup can be escaped by using a backslash first. For example, "***This: \\***, are three stars***" will leave the three stars in the middle untouched.

Author(s)

Laurent Berge

See Also

For styling the table: setFixest_etable, style.tex, style.df.

See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, and fixef.fixest to extract the fixed-effects coefficients.

Examples

est1 = feols(Ozone ~ i(Month) / Wind + Temp, data = airquality)
est2 = feols(Ozone ~ i(Month, Wind) + Temp | Month, data = airquality)

# Displaying the two results in a single table
etable(est1, est2)

# keep/drop: keeping only interactions
etable(est1, est2, keep = " x ")
# or using drop (see regexp help):
etable(est1, est2, drop = "^(Month|Temp|\\()")

# keep/drop: dropping interactions
etable(est1, est2, drop = " x ")
# or using keep ("!" reverses the effect):
etable(est1, est2, keep = "! x ")

# order: Wind variable first, intercept last (note the "!" to reverse the effect)
etable(est1, est2, order = c("Wind", "!Inter"))
# Month, then interactions, then the rest
etable(est1, est2, order = c("^Month", " x "))

#
# dict
#

# You can rename variables with dict = c(var1 = alias1, var2 = alias2, etc)
# You can also rename values taken by factors.
# Here's a full example:
dict = c(Temp = "Temperature", "Month::5"="May", "6"="Jun")
etable(est1, est2, dict = dict)
# Note the difference of treatment between Jun and May

# Assume the following dictionary:
dict = c("Month::5"="May", "Month::6"="Jun", "Month::7"="Jul",
         "Month::8"="Aug", "Month::9"="Sep")

# We would like to keep only the Months, but now the names are all changed...
# What to do?
# We can use the special character '%' to make reference to the original names.

etable(est1, est2, dict = dict, keep = "%Month")

#
# signif.code
#

etable(est1, est2, signif.code = c(" A"=0.01, " B"=0.05, " C"=0.1, " D"=0.15, " F"=1))

#
# Using the argument style to customize Latex exports
#

# If you don't like the default layout of the table, no worries!
# You can modify many parameters with the argument style

# To drop the headers before each section, use:
# Note that a space adds an extra line
style_noHeaders = style.tex(var.title = "", fixef.title = "", stats.title = " ")
etable(est1, est2, dict = dict, tex = TRUE, style.tex = style_noHeaders)

# To change the lines of the table + dropping the table footer
style_lines = style.tex(line.top = "\\toprule", line.bottom = "\\bottomrule",
                    tablefoot = FALSE)
etable(est1, est2, dict = dict, tex = TRUE, style.tex = style_lines)

# Or you have the predefined type "aer"
etable(est1, est2, dict = dict, tex = TRUE, style.tex = style.tex("aer"))

#
# Group and extralines
#

# Sometimes it's useful to group control variables into a single line
# You can achieve that with the group argument

setFixest_fml(..ctrl = ~ poly(Wind, 2) + poly(Temp, 2))
est_c0 = feols(Ozone ~ Solar.R, data = airquality)
est_c1 = feols(Ozone ~ Solar.R + ..ctrl, data = airquality)
est_c2 = feols(Ozone ~ Solar.R + Solar.R^2 + ..ctrl, data = airquality)

etable(est_c0, est_c1, est_c2, group = list(Controls = "poly"))

# 'group' here does the same as drop = "poly", but adds an extra line
# with TRUE/FALSE where the variables were found

# 'extralines' adds an extra line, where you can add the value for each model
est_all  = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality)
est_sub1 = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality,
                 subset = ~ Month %in% 5:6)
est_sub2 = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality,
                 subset = ~ Month %in% 7:8)
est_sub3 = feols(Ozone ~ Solar.R + Temp + Wind, data = airquality,
                 subset = ~ Month == 9)

etable(est_all, est_sub1, est_sub2, est_sub3,
       extralines = list("Sub-sample" = c("All", "May-June", "Jul.-Aug.", "Sept.")))

# You can monitor the placement of the new lines with two special characters
# at the beginning of the row name.
# 1) "^", "-" or "_" which mean the coefficients, the fixed-effects or the
# statistics section.
# 2) "^" or "_" which mean first or last line of the section
#
# Ex: starting with "_^" will place the line at the top of the stat. section
#     starting with "-_" will place the line at the bottom of the FEs section
#     etc.
#
# You can use a single character which will represent the section,
# the line would then appear at the bottom of the section.

# Examples
etable(est_c0, est_c1, est_c2, group = list("_Controls" = "poly"))
etable(est_all, est_sub1, est_sub2, est_sub3,
       extralines = list("^^Sub-sample" = c("All", "May-June", "Jul.-Aug.", "Sept.")))

#
# headers
#

# You can add header lines with 'headers'
# These lines will appear at the top of the table

# first, 3 estimations
est_header = feols(c(Ozone, Solar.R, Wind) ~  poly(Temp, 2), airquality)

# header => vector: adds a line w/t title
etable(est_header, headers = c("A", "A", "B"))

# header => list: identical way to do the previous header
# The form is: list(item1 = #item1, item2 = #item2,  etc)
etable(est_header, headers = list("A" = 2, "B" = 1))

# Adding a title +
# when an element is to be repeated only once, you can avoid the "= 1":
etable(est_header, headers = list(Group = list("A" = 2, "B")))

# To change the placement, add as first character:
# - "^" => top
# - "-" => mid (default)
# - "_" => bottom
# Note that "mid" and "top" are only distinguished when tex = TRUE

# Placing the new header line at the bottom
etable(est_header, headers = list("_Group" = c("A", "A", "B"),
                                  "^Currency" = list("US $" = 2, "CA $" = 1)))

# In Latex, you can add "grouped underlines" (cmidrule from the booktabs package)
# by adding ":_:" in the title:
etable(est_header, tex = TRUE,
       headers = list("^:_:Group" = c("A", "A", "B")))

#
# extralines and headers: .() for list()
#

# In the two arguments extralines and headers, .() can be used for list()
# For example:
etable(est_header, headers = .("^Currency" = .("US $" = 2, "CA $" = 1)))

#
# fixef.group
#

# You can group the fixed-effects line with fixef.group

est_0fe = feols(Ozone ~ Solar.R + Temp + Wind, airquality)
est_1fe = feols(Ozone ~ Solar.R + Temp + Wind | Month, airquality)
est_2fe = feols(Ozone ~ Solar.R + Temp + Wind | Month + Day, airquality)

# A) automatic way => simply use fixef.group = TRUE

etable(est_0fe, est_2fe, fixef.group = TRUE)

# Note that when grouping would lead to inconsistencies across models,
# it is avoided

etable(est_0fe, est_1fe, est_2fe, fixef.group = TRUE)

# B) customized way => use a list

etable(est_0fe, est_2fe, fixef.group = list("Dates" = "Month|Day"))

# Note that when a user grouping would lead to inconsistencies,
# the term partial replaces yes/no and the fixed-effects are not removed.

etable(est_0fe, est_1fe, est_2fe, fixef.group = list("Dates" = "Month|Day"))

# Using customized placement => as with 'group' and 'extralines',
# the user can control the placement of the new line.
# See the previous 'group' examples and the dedicated section in the help.

# On top of the coefficients:
etable(est_0fe, est_2fe, fixef.group = list("^^Dates" = "Month|Day"))

# Last line of the statistics
etable(est_0fe, est_2fe, fixef.group = list("_Dates" = "Month|Day"))

#
# Using custom functions to compute the standard errors
#

# You can use external functions to compute the VCOVs
# by feeding functions in the 'vcov' argument.
# Let's use some covariances from the sandwich package

etable(est_c0, est_c1, est_c2, vcov = sandwich::vcovHC)

# To add extra arguments to vcovHC, you need to write your wrapper:
etable(est_c0, est_c1, est_c2, vcov = function(x) sandwich::vcovHC(x, type = "HC0"))

#
# Customize which fit statistic to display
#

# You can change the fit statistics with the argument fitstat
# and you can rename them with the dictionary
etable(est1, est2, fitstat = ~ r2 + n + G)

# If you use a formula, '.' means the default:
etable(est1, est2, fitstat = ~ ll + .)

#
# Computing a different SE for each model
#

est = feols(Ozone ~ Solar.R + Wind + Temp, data = airquality)

#
# Method 1: use summary

s1 = summary(est, "iid")
s2 = summary(est, cluster = ~ Month)
s3 = summary(est, cluster = ~ Day)
s4 = summary(est, cluster = ~ Day + Month)

etable(list(s1, s2, s3, s4))

#
# Method 2: using a list in the argument 'vcov'

est_bis = feols(Ozone ~ Solar.R + Wind + Temp | Month, data = airquality)
etable(est, est_bis, vcov = list("hetero", ~ Month))

# When you have only one model, this model is replicated
# along the elements of the vcov list.
etable(est, vcov = list("hetero", ~ Month))

#
# Method 3: Using "each" or "times" in vcov

# If the first element of the list in 'vcov' is "each" or "times",
# then all models will be replicated and all the VCOVs will be
# applied to each model. The order in which they are replicated
# is governed by the each/times keywords.

# each
etable(est, est_bis, vcov = list("each", "iid", ~ Month, ~ Day))

# times
etable(est, est_bis, vcov = list("times", "iid", ~ Month, ~ Day))

#
# Notes and markup
#

# Notes can also be set in a dictionary
# You can use markdown markup to put text into italic/bold

dict = c("note 1" = "*Notes:* This data is not really random.",
         "source 1" = "**Source:** the internet?")

est = feols(Ozone ~ csw(Solar.R, Wind, Temp), data = airquality)
etable(est, dict = dict, tex = TRUE, notes = c("note 1", "source 1"))

Register extralines macros to be used in etable

Description

This function is used to create extralines macros (extralines being an argument of etable) that can be easily summoned in etable.

Usage

extralines_register(type, fun, alias)

Arguments

type

A character scalar giving the type-name.

fun

A function to be applied to a fixest estimation. It must return a scalar.

alias

A character scalar. This is the alias to be used in lieu of the type name to form the row name.

Details

You can register as many macros as you wish; the only constraint is that the type name should not conflict with a fitstat type name.

Examples

# We register a function computing the standard-deviation of the dependent variable
my_fun = function(x) sd(model.matrix(x, type = "lhs"))
extralines_register("sdy", my_fun, "SD(y)")

# An estimation
data(iris)
est = feols(Petal.Length ~ Sepal.Length | Species, iris)

# Now we can easily create a row with the SD of y.
# We just "summon" it in a one-sided formula
etable(est, extralines = ~ sdy)

# We can change the alias on the fly:
etable(est, extralines = list("_Standard deviation of the dep. var." = ~ sdy))

Lags a variable in a fixest estimation

Description

Produce lags or leads in the formulas of fixest estimations or when creating variables in a data.table::data.table. The data must be set as a panel beforehand (either with the function panel or with the argument panel.id in the estimation).

Usage

f(x, k = 1, fill = NA)

d(x, k = 1, fill = NA)

l(x, k = 1, fill = NA)

Arguments

x

The variable.

k

A vector of integers giving the number of lags (for l() and d()) or leads (for f()). For l() and d(), negative values lead to leads. For f(), negative values lead to lags. This argument can be a vector when used in fixest estimations. When creating variables in a data.table::data.table, it must be of length one.

fill

A scalar, default is NA. How to fill the missing values due to the lag/lead? Note that in a fixest estimation, 'fill' must be numeric (not required when creating new variables).
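To make the roles of k and fill concrete, here is a base R sketch of what lagging within a panel means. It does not use fixest: the helpers shift_back() and shift_forward() are hypothetical, the toy panel is assumed to be sorted by id and period with consecutive periods, and the difference is assumed to be the variable minus its lag.

```r
# Toy panel, assumed sorted by id then period, with consecutive periods
toy <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  period = c(1, 2, 3, 1, 2),
  x      = c(10, 11, 12, 20, 21)
)

# Hypothetical helpers: shift a vector by k positions and pad with 'fill'
shift_back    <- function(x, k = 1, fill = NA) c(rep(fill, k), head(x, -k))
shift_forward <- function(x, k = 1, fill = NA) c(tail(x, -k), rep(fill, k))

# Apply the shifts within each id, so values never cross individuals
toy$x_lag  <- ave(toy$x, toy$id, FUN = shift_back)     # counterpart of l(x)
toy$x_lead <- ave(toy$x, toy$id, FUN = shift_forward)  # counterpart of f(x)
toy$x_diff <- toy$x - toy$x_lag                        # assumed counterpart of d(x)

print(toy)
```

The first period of each id has no lag and the last one has no lead, which is where the value of fill (NA here) appears.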

Value

These functions can only be used i) in a formula of a fixest estimation, or ii) when creating variables within a fixest_panel object (obtained with the function panel) which is also a data.table::data.table.

Functions

See Also

The function panel changes data.frames into a panel from which the functions l and f can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).

Examples

data(base_did)

# Setting a data set as a panel...
pdat = panel(base_did, ~ id + period)

# ...then using the functions l and f
est1 = feols(y ~ l(x1, 0:1), pdat)
est2 = feols(f(y) ~ l(x1, -1:1), pdat)
est3 = feols(l(y) ~ l(x1, 0:3), pdat)
etable(est1, est2, est3, order = c("f", "^x"), drop = "Int")

# or using the argument panel.id
feols(f(y) ~ l(x1, -1:1), base_did, panel.id = ~id + period)
feols(d(y) ~ d(x1), base_did, panel.id = ~id + period)

# l() and f() can also be used within a data.table:
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Now since pdat_dt is also a data.table
  #   you can create lags/leads directly
  pdat_dt[, x1_l1 := l(x1)]
  pdat_dt[, x1_d1 := d(x1)]
  pdat_dt[, c("x1_l1_fill0", "y_f2") := .(l(x1, fill = 0), f(y, 2))]
}

Formatted dimension

Description

Prints the dimension of a data set in a user-readable way.

Usage

fdim(x)

Arguments

x

An R object, usually a data.frame (but can also be a vector).

Value

It does not return anything; the output is printed directly on the console.

Author(s)

Laurent Berge

Examples

fdim(iris)

fdim(iris$Species)

Fixed effects nonlinear maximum likelihood models

Description

This function estimates maximum likelihood models (e.g., Poisson or Logit) whose right-hand sides are non-linear in parameters, and it efficiently handles any number of fixed effects. If your right-hand side is not non-linear in parameters, use femlm or feglm instead (their design is simpler).

Usage

feNmlm(
  fml,
  data,
  family = c("poisson", "negbin", "logit", "gaussian"),
  NL.fml,
  vcov,
  fixef,
  fixef.rm = "perfect_fit",
  NL.start,
  lower,
  upper,
  NL.start.init,
  offset,
  subset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  panel.id,
  panel.time.step = NULL,
  panel.duplicate.method = "none",
  start = 0,
  jacobian.method = "simple",
  useHessian = TRUE,
  hessian.args = NULL,
  opt.control = list(),
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  verbose = 0,
  theta.init,
  fixef.tol = 1e-05,
  fixef.iter = 10000,
  deriv.tol = 1e-04,
  deriv.iter = 1000,
  warn = TRUE,
  notes = getFixest_notes(),
  fixef.keep_names = NULL,
  mem.clean = FALSE,
  only.env = FALSE,
  only.coef = FALSE,
  data.save = FALSE,
  env,
  ...
)

Arguments

fml

A formula. This formula gives the linear formula to be estimated (it is similar to an lm formula), for example: fml = z~x+y. To include fixed-effects variables, insert them in this formula using a pipe (e.g. fml = z~x+y|fixef_1+fixef_2). To include a non-linear in parameters element, you must use the argument NL.fml. Multiple estimations can be performed at once: for multiple dep. vars, wrap them in c(): ex c(y1, y2). For multiple indep. vars, use the stepwise functions: ex x1 + csw(x2, x3). The formula fml = c(y1, y2) ~ x1 + csw0(x2, x3) leads to 6 estimations. See details. Square brackets starting with a dot can be used to call global variables: y.[i] ~ x.[1:2] will lead to y3 ~ x1 + x2 if i is equal to 3 in the current environment (see details in xpd).

data

A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with the names of this data.frame. Can also be a matrix.

family

Character scalar. It should provide the family. The possible values are "poisson" (Poisson model with log-link, the default), "negbin" (Negative Binomial model with log-link), "logit" (Logit model with logit-link), "gaussian" (Gaussian model).

NL.fml

A formula. If provided, this formula represents the non-linear part of the right hand side (RHS). Note that contrary to the fml argument, the coefficients must explicitly appear in this formula. For instance, it can be ~a*log(b*x + c*x^3), where a, b, and c are the coefficients to be estimated. Note that only the RHS of the formula is to be provided, and NOT the left hand side.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally, it accepts functions to compute the covariances. See the vcov documentation in the vignette.

fixef

Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula.

fixef.rm

Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none".

This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated with a single observation are removed (since they perfectly explain it).

The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance, the Poisson family, for which the LHS cannot be lower than 0, or the logit family, for which the LHS lies between 0 and 1. In that case, the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would perfectly explain the LHS). The value fixef.rm="infinite_coef" removes all observations associated with FEs with infinite coefficients.

If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed.

If "none": no observation is removed.

Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors).

The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining.
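The two removal criteria can be illustrated on a toy data set with base R. This sketch only flags the observations for a single fixed-effect and a Poisson-like outcome; it is not the package's algorithm and does not reproduce the recursion across several fixed-effects. The data and variable names are made up.

```r
# Toy data: one fixed-effect 'fe' and a count outcome 'y'
toy <- data.frame(
  fe = c("a", "a", "b", "c", "c", "d", "d"),
  y  = c(1, 0, 3, 0, 0, 2, 5)
)

# "singletons": fixed-effect values observed only once
n_per_fe <- ave(rep(1, nrow(toy)), toy$fe, FUN = sum)
is_singleton <- n_per_fe == 1

# "infinite_coef" (Poisson-like case): fixed-effect values whose outcome is always 0
max_per_fe <- ave(toy$y, toy$fe, FUN = max)
is_only_zero <- max_per_fe == 0

# "perfect_fit": both criteria combined
toy$removed <- is_singleton | is_only_zero
print(toy)
```

Here the observation of fixed-effect "b" is a singleton and the two observations of "c" have only-0 outcomes, so these three rows are the ones flagged.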

NL.start

(For NL models only) A list of starting values for the non-linear parameters. ALL the parameters are to be named and given a starting value. Example: NL.start=list(a=1,b=5,c=0). There is one exception: if all parameters are to be given the same starting value, you can use a numeric scalar.
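As an illustration of how fml, NL.fml and NL.start fit together, here is a minimal sketch on simulated data. It assumes the fixest package is installed; the data and the variable names x, y and z are made up, and the model is deliberately simple (the non-linear formula is in fact linear in a and b).

```r
library(fixest)

# Simulated data, for illustration only
set.seed(1)
n <- 100
x <- rnorm(n, 1, 5)^2
y <- rnorm(n, -1, 5)^2
z <- rpois(n, x * y) + rpois(n, 2)
base <- data.frame(x, y, z)

# The linear part (fml) contains only the intercept; the coefficients a and b
# appear explicitly in NL.fml and each receives a starting value in NL.start
est_nl <- feNmlm(z ~ 1, base,
                 NL.fml = ~ a * log(x) + b * log(y),
                 NL.start = list(a = 0, b = 0))

print(coef(est_nl))
```

Since this particular right-hand side is linear in parameters, femlm(z ~ log(x) + log(y), base) would be the simpler call; the non-linear syntax is shown here only to make the arguments concrete.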

lower

(For NL models only) A list. The lower bound for each of the non-linear parameters that requires one. Example: lower=list(b=0,c=0). Beware: if the estimated parameter is at its lower bound, then asymptotic theory cannot be applied and the standard-error of the parameter cannot be estimated because the gradient will not be null. In other words, when at its upper/lower bound, the parameter is considered as 'fixed'.

upper

(For NL models only) A list. The upper bound for each of the non-linear parameters that requires one. Example: upper=list(a=10,c=50). Beware: if the estimated parameter is at its upper bound, then asymptotic theory cannot be applied and the standard-error of the parameter cannot be estimated because the gradient will not be null. In other words, when at its upper/lower bound, the parameter is considered as 'fixed'.

NL.start.init

(For NL models only) Numeric scalar. If the argument NL.start is not provided, or only partially filled (i.e. there remain non-linear parameters with no starting value), then the starting value of all remaining non-linear parameters is set to NL.start.init.

offset

A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) ~0.5*x**2. This offset is linearly added to the elements of the main formula 'fml'.

subset

A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument.

split

A one-sided formula representing a variable (e.g. split = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. If you also want to include the estimation for the full sample, use the argument fsplit instead. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. split = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value; you can trigger a regular expression evaluation by adding a '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).

fsplit

A one-sided formula representing a variable (e.g. fsplit = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. This argument is the same as split but also includes the full sample as the first estimation. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. fsplit = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value; you can trigger a regular expression evaluation by adding a '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).
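The difference between split and fsplit can be pictured with base R alone. The sketch below uses lm on the airquality data purely as a stand-in for the estimation function; fit_one() is a hypothetical helper and none of this is fixest code.

```r
# Stand-in estimation: coefficients of a simple OLS fit
fit_one <- function(d) coef(lm(Ozone ~ Wind, data = d))

# What split = ~Month requests: one estimation per value of Month
by_month <- lapply(split(airquality, airquality$Month), fit_one)

# What fsplit = ~Month requests: the same, with the full sample first
with_full <- c(list("Full sample" = fit_one(airquality)), by_month)

print(names(by_month))
print(names(with_full))
```

The keep/drop selection then amounts to restricting the values of Month over which the split is performed.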

split.keep

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values of split.keep. The values in split.keep will be partially matched to the values of split. To enable regular expressions, you need to add a '@' first. For example, split.keep = c("v1", "@other|var") will keep only the values in split partially matched by "v1" or the values containing "other" or "var".

split.drop

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values that are not in split.drop. The values in split.drop will be partially matched to the values of split. To enable regular expressions, you need to add a '@' first. For example, split.drop = c("v1", "@other|var") will drop only the values in split partially matched by "v1" or the values containing "other" or "var".

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

se

Character scalar. Which kind of standard error should be computed: "standard", "hetero", "cluster", "twoway", "threeway" or "fourway"? By default, if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated; you should use vcov instead.

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

panel.id

The panel identifiers. Can either be: i) a one-sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id=c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id='id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).

panel.time.step

The method to compute the lags, default is NULL (which means automatically set). Can be equal to: "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the largest common divisor between consecutive time periods is used (typically if the time variable represents years, it will be 1). This method can apply only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive", then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.
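The difference between the "unitary" and "consecutive" methods shows up when the time variable has gaps. The base R sketch below follows the definitions given above on a made-up time variable; it is not the code used by the package.

```r
# Greatest common divisor of two non-negative integers (Euclid's algorithm)
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Hypothetical time variable: observed every two years, with one gap
time <- c(2000, 2002, 2004, 2008)

# "unitary": the step is the largest common divisor of consecutive differences
step_unitary <- Reduce(gcd, diff(sort(unique(time))))
index_unitary <- (time - min(time)) / step_unitary

# "consecutive": two successive observed periods are always one lag apart
index_consecutive <- match(time, sort(unique(time))) - 1

print(step_unitary)
print(index_unitary)
print(index_consecutive)
```

With the unitary step of 2, the lag of the 2008 observation would be 2006, which is not observed, whereas with the consecutive method its lag is the 2004 observation.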

panel.duplicate.method

If several observations have the same id and time values, then the notion of lag is not defined for them. If panel.duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use panel.duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.

start

Starting values for the coefficients in the linear part (for the non-linear part, use NL.start). Can be: i) a numeric of length 1 (e.g. start = 0, the default), ii) a numeric vector of the exact same length as the number of variables, or iii) a named vector of any length (the names will be used to initialize the appropriate coefficients).

jacobian.method

(For NL models only) Character scalar. Provides the method used to numerically compute the Jacobian of the non-linear part. Can be either "simple" or "Richardson". Default is "simple". See the help of numDeriv::jacobian() for more information.

useHessian

Logical. Should the Hessian be computed in the optimization stage? Default is TRUE.

hessian.args

List of arguments to be passed to the function numDeriv::genD(). Default is missing. Only used in the presence of NL.fml.

opt.control

List of elements to be passed to the optimization method nlminb. See the help page of nlminb for more information.

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function setFixest_nthreads.

lean

Logical scalar, default is FALSE. If TRUE then all large objects are removed from the returned result: this will save memory but will block the possibility to use many methods. It is recommended to use the arguments se or cluster to obtain the appropriate standard-errors at estimation time, since obtaining different SEs won't be possible afterwards.

verbose

Integer, default is 0. It represents the level of information that should be reported during the optimisation process. If verbose=0: nothing is reported. If verbose=1: the value of the coefficients and the likelihood are reported. If verbose=2: 1 + information on the computing time of the null model, the fixed-effects coefficients and the hessian are reported.

theta.init

Positive numeric scalar. The starting value of the dispersion parameter if family="negbin". By default, the algorithm uses as a starting value the theta obtained from the model with only the intercept.

fixef.tol

Precision used to obtain the fixed-effects. Defaults to1e-5.It corresponds to the maximum absolute difference allowed between two coefficientsof successive iterations. Argumentfixef.tol cannot be lowerthan10000*.Machine$double.eps. Note that this parameter is dynamicallycontrolled by the algorithm.

fixef.iter

Maximum number of iterations in fixed-effects algorithm(only in use for 2+ fixed-effects). Default is 10000.

deriv.tol

Precision used to obtain the fixed-effects derivatives. Defaults to 1e-4. It corresponds to the maximum absolute difference allowed between two coefficients of successive iterations. Argument deriv.tol cannot be lower than 10000*.Machine$double.eps.

deriv.iter

Maximum number of iterations in the algorithm to obtain the derivative of the fixed-effects (only in use for 2+ fixed-effects). Default is 1000.

warn

Logical, default is TRUE. Whether warnings should be displayed (concerns warnings relating to convergence state).

notes

Logical. By default, two notes are displayed: when NAs are removed (to show additional information) and when some observations are removed because of only 0 (or 0/1) outcomes in a fixed-effect setup (in Poisson/Neg. Bin./Logit models). To avoid displaying these messages, you can set notes = FALSE. You can remove these messages permanently by using setFixest_notes(FALSE).

fixef.keep_names

Logical or NULL (default). When you combine different variables to transform them into a single fixed-effect you can do e.g. y ~ x | paste(var1, var2). The algorithm provides a shorthand to do the same operation: y ~ x | var1^var2. Because pasting variables is a costly operation, the internal algorithm may use a numerical trick to hasten the process. The cost of doing so is that you lose the labels. If you are interested in getting the value of the fixed-effects coefficients after the estimation, you should use fixef.keep_names = TRUE. By default it is equal to TRUE if the number of unique values is lower than 50,000, and to FALSE otherwise.

mem.clean

Logical scalar, default is FALSE. Only to be used if the data set is large compared to the available RAM. If TRUE then intermediary objects are removed as much as possible and gc is run before each substantial C++ section in the internal code to avoid memory issues.

only.env

(Advanced users.) Logical scalar, default is FALSE. If TRUE, then only the environment used to make the estimation is returned.

only.coef

Logical scalar, default is FALSE. If TRUE, then only the estimated coefficients are returned. Note that the length of the vector returned is always the number of coefficients to be estimated: this means that the variables found to be collinear are returned with an NA value.

data.save

Logical scalar, default is FALSE. If TRUE, the data used for the estimation is saved within the returned object. Hence later calls to predict(), vcov(), etc., will be consistent even if the original data has been modified in the meantime. This is especially useful for estimations within loops, where the data changes at each iteration, such that postprocessing can be done outside the loop without issue.

env

(Advanced users.) A fixest environment created by a fixest estimation with only.env = TRUE. Default is missing. If provided, the data from this environment will be used to perform the estimation.

...

Not currently used.

Details

This function estimates maximum likelihood models where the conditional expectations are as follows:

Gaussian likelihood:

E(Y|X)=X\beta

Poisson and Negative Binomial likelihoods:

E(Y|X)=\exp(X\beta)

where in the Negative Binomial there is the parameter \theta used to model the variance as \mu+\mu^2/\theta, with \mu the conditional expectation.

Logit likelihood:

E(Y|X)=\frac{\exp(X\beta)}{1+\exp(X\beta)}

When there are one or more fixed-effects, the conditional expectation can be written as:

E(Y|X) = h(X\beta+\sum_{k}\sum_{m}\gamma_{m}^{k}\times C_{im}^{k}),

where h(.) is the function corresponding to the likelihood function as shown before. C^k is the matrix associated to fixed-effect dimension k such that C^k_{im} is equal to 1 if observation i is of category m in the fixed-effect dimension k and 0 otherwise.
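The notation above can be made concrete with a small base R sketch. It builds the indicator matrices C^k for two fixed-effect dimensions and evaluates the conditional expectation with h(.) = exp, as in the Poisson case. The data, the dimension names (firm, year) and all parameter values are hypothetical and chosen only for illustration; this is not how the package computes the fixed-effects internally.

```r
# Toy data: 6 observations, one regressor, two fixed-effect dimensions
x    <- c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)
firm <- factor(c("a", "a", "b", "b", "c", "c"))        # dimension k = 1
year <- factor(c("y1", "y2", "y1", "y2", "y1", "y2"))  # dimension k = 2

# C^k: one indicator column per category m (no intercept, no dropped level)
C1 <- model.matrix(~ firm - 1)
C2 <- model.matrix(~ year - 1)

# Hypothetical parameter values
beta   <- 0.3
gamma1 <- c(a = 0.1, b = -0.2, c = 0.4)
gamma2 <- c(y1 = 0.0, y2 = 0.5)

# Linear index: X beta + sum_k sum_m gamma^k_m * C^k_im
eta <- x * beta + drop(C1 %*% gamma1) + drop(C2 %*% gamma2)

# h(.) = exp for the Poisson and Negative Binomial likelihoods
mu <- exp(eta)

# Same index obtained by looking up each observation's category directly
eta_indexed <- x * beta + gamma1[as.character(firm)] + gamma2[as.character(year)]
```

Each row of C1 and C2 contains exactly one 1, so the double sum simply picks one coefficient per dimension for each observation, which is what the indexed version does.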

When there are non linear in parameters functions, we can schematically split the set of regressors in two:

f(X,\beta)=X^1\beta^1 + g(X^2,\beta^2)

with first a linear term and then a non linear part expressed by the function g. That is, we add a non-linear term to the linear terms (which are X*beta and the fixed-effects coefficients). It is always better (more efficient) to put into the argument NL.fml only the non-linear in parameter terms, and add all linear terms in the fml argument.

To estimate only a non-linear formula without even the intercept, you must exclude the intercept from the linear formula by using, e.g., fml = z~0.

The over-dispersion parameter of the Negative Binomial family, theta, is capped at 10,000. If theta reaches this high value, it means that there is no overdispersion.
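A short base R sketch of why a theta at the cap signals no overdispersion: with the variance written as \mu+\mu^2/\theta, the second term vanishes as theta grows and the variance approaches the Poisson variance \mu. The helper name negbin_var is hypothetical.

```r
# Variance of the Negative Binomial as parameterised above: mu + mu^2 / theta
negbin_var <- function(mu, theta) mu + mu^2 / theta

mu <- 4

negbin_var(mu, theta = 1)      # 4 + 16 = 20: strong overdispersion
negbin_var(mu, theta = 10000)  # 4 + 16/10000 = 4.0016: nearly the Poisson variance
```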

Value

A fixest object. Note that fixest objects contain many elements and most of them are for internal use; they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc.) or functions (for instance fitstat to access any fit statistic).

coefficients

The named vector of coefficients.

coeftable

The table of the coefficients with their standard errors, z-values and p-values.

loglik

The loglikelihood.

iterations

Number of iterations of the algorithm.

nobs

The number of observations.

nparams

The number of parameters of the model.

call

The call.

fml

The linear formula of the call.

fml_all

A list containing different parts of the formula. Always contains the linear formula. Then, if relevant: fixef: the fixed-effects; NL: the non-linear part of the formula.

ll_null

Log-likelihood of the null model (i.e. with the intercept only).

pseudo_r2

The adjusted pseudo R2.

message

The convergence message from the optimization procedures.

sq.cor

Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation.

hessian

The Hessian of the parameters.

fitted.values

The fitted values are the expected value of the dependent variable for the fitted model: that is E(Y|X).

cov.iid

The variance-covariance matrix of the parameters.

se

The standard-error of the parameters.

scores

The matrix of the scores (first derivative for each observation).

family

The ML family that was used for the estimation.

data

The original data set used when calling the function. Only available when the estimation was called with data.save = TRUE.

residuals

The difference between the dependent variable and the expected predictor.

sumFE

The sum of the fixed-effects for each observation.

offset

The offset formula.

NL.fml

The nonlinear formula of the call.

bounds

Whether the coefficients were upper or lower bounded. This can only be the case when a non-linear formula is included and the arguments 'lower' or 'upper' are provided.

isBounded

The logical vector that gives for each coefficient whether it was bounded or not. This can only be the case when a non-linear formula is included and the arguments 'lower' or 'upper' are provided.

fixef_vars

The names of each fixed-effect dimension.

fixef_id

The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation.

fixef_sizes

The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension).

obs_selection

(When relevant.) List containing vectors of integers. It represents the sequential selection of observations relative to the original data set.

fixef_removed

In the case there were fixed-effects and some observations were removed because of only 0/1 outcome within a fixed-effect, it gives the list (for each fixed-effect dimension) of the fixed-effect identifiers that were removed.

theta

In the case of a negative binomial estimation: the overdispersion parameter.

See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.

And other estimation methods: feols, femlm, feglm, fepois, fenegbin.

Lagging variables

To use leads/lags of variables in the estimation, you can either: i) provide the argument panel.id, or ii) set your data set as a panel with the function panel. The lagging functions are l, f and d.

You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
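A minimal sketch of both routes, assuming the fixest package is attached. The panel is simulated and the names id, time, x and y are illustrative; feols is used only because it is the simplest estimation function, and the same panel arguments appear in the other estimation functions.

```r
library(fixest)

# Simulated balanced panel: 10 units observed over 6 periods
set.seed(1)
df <- data.frame(id   = rep(1:10, each = 6),
                 time = rep(1:6, times = 10),
                 x    = rnorm(60))
df$y <- 1 + 0.5 * df$x + rnorm(60)

# i) Supplying panel.id directly in the estimation call
est_a <- feols(y ~ l(x, -1:1), df, panel.id = ~ id + time)

# ii) Declaring the panel first, then estimating
pdat  <- panel(df, panel.id = ~ id + time)
est_b <- feols(y ~ l(x, -1:1), pdat)

# Both fits use the lead of x, x itself and the lag of x
coef(est_a)
```

The first and last period of each unit have no lag or lead, so those observations drop out of the estimation sample.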

Interactions

You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).

Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.

It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula, which will be much faster (using the syntax factor_var[continuous_var], as explained in the section “Varying slopes”).

The function i has in fact more arguments, please see details in its associated help page.

On standard-errors

Standard-errors can be computed in different ways; you can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.

The vignette "On standard-errors" describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.

You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.

Multiple estimations

Multiple estimations can be performed at once, they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is ‘kind of’ a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.

To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.

To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. Of course sw stands for stepwise, and csw for cumulative stepwise. Finally mvsw is a bit special, it stands for multiverse stepwise. Let's explain that.

Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated.

The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.

Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially (2^n models for n variables): so be cautious!
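The expansions described above can be mimicked in base R to see how many models each stepwise function generates. The helpers expand_sw, expand_csw and expand_mvsw are hypothetical and written only for this illustration; they are not part of the package and do not reflect its internal implementation. An empty string stands for the model without any stepwise element.

```r
# sw / sw0: each element is added separately
expand_sw <- function(vars, zero = FALSE) c(if (zero) "", vars)

# csw / csw0: each element is added to the previous ones
expand_csw <- function(vars, zero = FALSE) {
  cumul <- vapply(seq_along(vars),
                  function(i) paste(vars[1:i], collapse = " + "),
                  character(1))
  c(if (zero) "", cumul)
}

# mvsw: every combination of the variables, plus the empty model
expand_mvsw <- function(vars) {
  combos <- unlist(lapply(seq_along(vars), function(k) {
    combn(vars, k, FUN = paste, collapse = " + ")
  }))
  c("", combos)
}

expand_sw(c("x2", "x3"))                   # "x2" "x3"
expand_csw(c("x2", "x3"), zero = TRUE)     # "" "x2" "x2 + x3"
length(expand_mvsw(c("x1", "x2", "x3")))   # 8, i.e. 2^3
length(expand_mvsw(paste0("x", 1:5)))      # 32, i.e. 2^5
```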

Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ csw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).

You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).

If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.

A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models, using the formula is roughly similar to using a loop performance-wise.

Argument sliding

When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.

Piping

Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).

Tricks to estimate multiple LHS

To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).

First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say, feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).

Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).

Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left hand sides.

Dot square bracket operator in formulas

In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.

Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.

To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.

You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.

The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.

By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas; the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).

In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.

One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.

You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.

When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
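The cases described in this section can be tried directly with xpd, assuming the fixest package is attached. The variable names rhs and empty are illustrative; the expected formulas in the comments are the ones stated in the text above.

```r
library(fixest)

# i) Create several variables at once
f1 <- xpd(y ~ x.[1:3])
# y ~ x1 + x2 + x3

# ii) Capture a value from the current environment
i <- 3
f2 <- xpd(y.[i] ~ x)
# y3 ~ x

# Expand a one-sided formula stored in a variable
rhs <- ~ sepal + petal
f3 <- xpd(y ~ .[rhs])
# y ~ sepal + petal

# An empty string is replaced by the neutral element 1
empty <- ""
f4 <- xpd(y ~ .[empty])
# y ~ 1
```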

Author(s)

Laurent Berge

References

Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13.

For models with multiple fixed-effects:

Gaure, Simen, 2013, "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis 66 pp. 8–18

On the unconditional Negative Binomial model:

Allison, Paul D and Waterman, Richard P, 2002, "Fixed-Effects Negative Binomial Regression Models", Sociological Methodology 32(1) pp. 247–265

Examples

# This section covers only non-linear in parameters examples
# For linear relationships: use femlm or feglm instead

# Generating data for a simple example
set.seed(1)
n = 100
x = rnorm(n, 1, 5)**2
y = rnorm(n, -1, 5)**2
z1 = rpois(n, x*y) + rpois(n, 2)
base = data.frame(x, y, z1)

# Estimating a 'linear' relation:
est1_L = femlm(z1 ~ log(x) + log(y), base)
# Estimating the same 'linear' relation using a 'non-linear' call
est1_NL = feNmlm(z1 ~ 1, base, NL.fml = ~a*log(x)+b*log(y), NL.start = list(a=0, b=0))
# we compare the estimates with the function etable (they are identical)
etable(est1_L, est1_NL)

# Now generating a non-linear relation (E(z2) = x + y + 1):
z2 = rpois(n, x + y) + rpois(n, 1)
base$z2 = z2

# Estimation using this non-linear form
est2_NL = feNmlm(z2 ~ 0, base, NL.fml = ~log(a*x + b*y),
               NL.start = 2, lower = list(a=0, b=0))
# we can't estimate this relation linearly
# => closest we can do:
est2_L = femlm(z2 ~ log(x) + log(y), base)

# Difference between the two models:
etable(est2_L, est2_NL)

# Plotting the fits:
plot(x, z2, pch = 18)
points(x, fitted(est2_L), col = 2, pch = 1)
points(x, fitted(est2_NL), col = 4, pch = 2)

Fixed-effects GLM estimations

Description

Estimates GLM models with any number of fixed-effects.

Usage

feglm(
  fml,
  data,
  family = "gaussian",
  vcov,
  offset,
  weights,
  subset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  panel.id,
  panel.time.step = NULL,
  panel.duplicate.method = "none",
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  fixef,
  fixef.rm = "perfect_fit",
  fixef.tol = 1e-06,
  fixef.iter = 10000,
  fixef.algo = NULL,
  collin.tol = 1e-09,
  glm.iter = 25,
  glm.tol = 1e-08,
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  warn = TRUE,
  notes = getFixest_notes(),
  verbose = 0,
  only.coef = FALSE,
  data.save = FALSE,
  fixef.keep_names = NULL,
  mem.clean = FALSE,
  only.env = FALSE,
  env,
  ...
)

feglm.fit(
  y,
  X,
  fixef_df,
  family = "gaussian",
  vcov,
  offset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  weights,
  subset,
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  fixef.rm = "perfect_fit",
  fixef.tol = 1e-06,
  fixef.iter = 10000,
  fixef.algo = NULL,
  collin.tol = 1e-09,
  glm.iter = 25,
  glm.tol = 1e-08,
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  warn = TRUE,
  notes = getFixest_notes(),
  mem.clean = FALSE,
  verbose = 0,
  only.env = FALSE,
  only.coef = FALSE,
  env,
  ...
)

fepois(
  fml,
  data,
  vcov,
  offset,
  weights,
  subset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  panel.id,
  panel.time.step = NULL,
  panel.duplicate.method = "none",
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  fixef,
  fixef.rm = "perfect_fit",
  fixef.tol = 1e-06,
  fixef.iter = 10000,
  fixef.algo = NULL,
  collin.tol = 1e-09,
  glm.iter = 25,
  glm.tol = 1e-08,
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  warn = TRUE,
  notes = getFixest_notes(),
  verbose = 0,
  fixef.keep_names = NULL,
  mem.clean = FALSE,
  only.env = FALSE,
  only.coef = FALSE,
  data.save = FALSE,
  env,
  ...
)

Arguments

fml

A formula representing the relation to be estimated. For example: fml = z~x+y. To include fixed-effects, insert them in this formula using a pipe: e.g. fml = z~x+y|fixef_1+fixef_2. Multiple estimations can be performed at once: for multiple dep. vars, wrap them in c(): e.g. c(y1, y2). For multiple indep. vars, use the stepwise functions: e.g. x1 + csw(x2, x3). The formula fml = c(y1, y2) ~ x1 + csw0(x2, x3) leads to 6 estimations, see details. Square brackets starting with a dot can be used to call global variables: y.[i] ~ x.[1:2] will lead to y3 ~ x1 + x2 if i is equal to 3 in the current environment (see details in xpd).

data

A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with the names of this data.frame. Can also be a matrix.

family

Family to be used for the estimation. Defaults to gaussian(). See family for details of family functions.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

offset

A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) ~0.5*x**2. This offset is linearly added to the elements of the main formula 'fml'.

weights

A formula or a numeric vector. Each observation can be weighted, the weights must be greater than 0. If equal to a formula, it should be one-sided: for example ~ var_weight.

subset

A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument.

split

A one sided formula representing a variable (e.g. split = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. If you also want to include the estimation for the full sample, use the argument fsplit instead. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. split = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value, you can trigger a regular expression evaluation by adding an '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).

fsplit

A one sided formula representing a variable (e.g. fsplit = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. This argument is the same as split but also includes the full sample as the first estimation. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. fsplit = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value, you can trigger a regular expression evaluation by adding an '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).

split.keep

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values of split.keep. The values in split.keep will be partially matched to the values of split. To enable regular expressions, you need to add an '@' first. For example split.keep = c("v1", "@other|var") will keep only the value in split partially matched by "v1" or the values containing "other" or "var".

split.drop

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values that are not in split.drop. The values in split.drop will be partially matched to the values of split. To enable regular expressions, you need to add an '@' first. For example split.drop = c("v1", "@other|var") will drop only the value in split partially matched by "v1" or the values containing "other" or "var".
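A minimal sketch of the sample-splitting arguments, assuming the fixest package is attached and using the iris data set shipped with R. The models are illustrative; feols is used for brevity, and the split arguments are the same as those listed in the Usage above.

```r
library(fixest)

# One estimation per species
est_split <- feols(Sepal.Length ~ Sepal.Width, iris, split = ~ Species)

# Same, with the full sample added as the first estimation
est_fsplit <- feols(Sepal.Length ~ Sepal.Width, iris, fsplit = ~ Species)

# Restrict the split to two of the three species
est_keep <- feols(Sepal.Length ~ Sepal.Width, iris,
                  split = ~ Species %keep% c("setosa", "virginica"))

# Side-by-side view of the full sample and the three sub-samples
etable(est_fsplit)
```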

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

panel.id

The panel identifiers. Can either be: i) a one sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id=c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id='id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).

panel.time.step

The method to compute the lags, default is NULL (which means automatically set). Can be equal to: "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the greatest common divisor between consecutive time periods is used (typically if the time variable represents years, it will be 1). This method can apply only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive" then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.
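The "unitary" rule can be illustrated in base R. The sketch below takes the greatest common divisor of the gaps between the distinct time values; the helpers gcd2 and unitary_step are hypothetical, and pooling the time values across all ids is an assumption made for this illustration, not a description of the package's internal code.

```r
# Greatest common divisor of two integers (Euclid's algorithm)
gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)

# Time step implied by a vector of integer time values
unitary_step <- function(time) {
  gaps <- diff(sort(unique(time)))
  Reduce(gcd2, gaps)
}

unitary_step(c(2000, 2001, 2003))  # 1: yearly data with one missing year
unitary_step(c(2000, 2004, 2010))  # 2: gaps of 4 and 6 share the divisor 2
```

With a step of 2 in the second case, the observation from 2004 would have no lag of order 1 (2002 is absent), whereas under "consecutive" 2000 would count as its lag.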

panel.duplicate.method

If several observations have the same id and time values, then the notion of lag is not defined for them. If panel.duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use panel.duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.

start

Starting values for the coefficients. Can be: i) a numeric of length 1 (e.g. start = 0), ii) a numeric vector of the exact same length as the number of variables, or iii) a named vector of any length (the names will be used to initialize the appropriate coefficients). Default is missing.

etastart

Numeric vector of the same length as the data. Starting values for the linear predictor. Default is missing.

mustart

Numeric vector of the same length as the data. Starting values for the vector of means. Default is missing.

fixef

Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula.

fixef.rm

Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none".

This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it).

The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). The value fixef.rm="infinite_coef" removes all observations associated to FEs with infinite coefficients.

If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed.

If "none": no observation is removed.

Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors).

The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining.
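The cascading effect described above can be reproduced with a small base R sketch. The helper drop_singletons is hypothetical and handles only the "singletons" case; it is meant to show why repeated passes are needed, not to mirror the package's implementation.

```r
# Repeatedly drop observations that are alone in any fixed-effect category,
# until a full pass over all dimensions removes nothing.
drop_singletons <- function(fe_list) {
  keep <- rep(TRUE, length(fe_list[[1]]))
  repeat {
    changed <- FALSE
    for (fe in fe_list) {
      # Number of still-kept observations in each observation's category
      n_in_group <- ave(as.numeric(keep), fe, FUN = sum)
      singleton  <- keep & n_in_group == 1
      if (any(singleton)) {
        keep[singleton] <- FALSE
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  keep
}

fe1 <- c("a", "b", "b", "c", "c")
fe2 <- c("u", "u", "v", "v", "v")
drop_singletons(list(fe1, fe2))
# FALSE FALSE FALSE TRUE TRUE
```

Only observation 1 is a singleton at the start (category "a"). Removing it leaves observation 2 alone in "u", and removing that one leaves observation 3 alone in "b", so three observations are dropped in total.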

fixef.tol

Precision used to obtain the fixed-effects. Defaults to 1e-6. It corresponds to the maximum absolute difference allowed between two coefficients of successive iterations.

fixef.iter

Maximum number of iterations in the fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000.

fixef.algo

NULL (default) or an object of class demeaning_algo obtained with the function demeaning_algo. If NULL, it falls back to the defaults of demeaning_algo. This argument controls the settings of the demeaning algorithm. Only play with it if the convergence is slow, i.e. look at the slot $iterations, and if any is over 50, it may be worth playing around with it. Please read the documentation of the function demeaning_algo. Be aware that there is no clear guidance on how to change the settings, it's more a matter of try-and-see.

collin.tol

Numeric scalar, default is 1e-9. Threshold deciding when variables should be considered collinear and subsequently removed from the estimation. Higher values mean more variables will be removed (if collinearity is present). One sign of collinearity is extremely low t-stats (for instance when t-stats < 1e-3).

glm.iter

Number of iterations of the glm algorithm. Default is 25.

glm.tol

Tolerance level for the glm algorithm. Default is 1e-8.

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function setFixest_nthreads.

lean

Logical scalar, default is FALSE. If TRUE then all large objects are removed from the returned result: this will save memory but will block the possibility to use many methods. It is recommended to use the arguments se or cluster to obtain the appropriate standard-errors at estimation time, since obtaining different SEs won't be possible afterwards.

warn

Logical, default is TRUE. Whether warnings should be displayed (concerns warnings relating to convergence state).

notes

Logical. By default, three notes are displayed: when NAs are removed, when some fixed-effects are removed because of only 0 (or 0/1) outcomes, or when a variable is dropped because of collinearity. To avoid displaying these messages, you can set notes = FALSE. You can remove these messages permanently by using setFixest_notes(FALSE).

verbose

Integer. Higher values give more information. In particular, it can detail the number of iterations in the demeaning algorithm (the first number is the left-hand-side, the other numbers are the right-hand-side variables). It can also detail the step-halving algorithm.

only.coef

Logical scalar, default is FALSE. If TRUE, then only the estimated coefficients are returned. Note that the length of the returned vector always equals the number of coefficients to be estimated: this means that the variables found to be collinear are returned with an NA value.
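As a minimal sketch of only.coef, assuming the built-in iris data and the Poisson specification used in the Examples of this page, the call below returns a bare named numeric vector instead of a fixest object. The object names cf and est_full are illustrative only.

```r
library(fixest)

# Full estimation: returns a fixest object
est_full = feglm(Sepal.Length ~ Sepal.Width + Petal.Length | Species,
                 iris, family = "poisson")

# Same model, but only the coefficient vector is returned
cf = feglm(Sepal.Length ~ Sepal.Width + Petal.Length | Species,
           iris, family = "poisson", only.coef = TRUE)

# cf is a plain numeric vector with one entry per coefficient
cf
```

This can be convenient in simulations where only the point estimates are needed.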

data.save

Logical scalar, default is FALSE. If TRUE, the data used for the estimation is saved within the returned object. Hence later calls to predict(), vcov(), etc., will be consistent even if the original data has been modified in the meantime. This is especially useful for estimations within loops, where the data changes at each iteration, such that postprocessing can be done outside the loop without issue.
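The loop use case described above can be sketched as follows. This is an illustration under stated assumptions: the data is the built-in iris data set, and the column grp is a hypothetical clustering variable created only for this example.

```r
library(fixest)

all_est = list()
for (sp in levels(iris$Species)) {
  # The data set changes at each iteration
  base = iris[iris$Species != sp, ]
  # Hypothetical grouping variable, used later for clustering
  base$grp = rep(1:10, length.out = nrow(base))
  all_est[[sp]] = feglm(Sepal.Length ~ Sepal.Width, base,
                        family = "poisson", data.save = TRUE)
}

# Post-processing outside the loop: each estimation relies on
# the data saved in its own object, not on the current value of `base`
all_se = lapply(all_est, function(est) se(est, vcov = ~grp))
```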

fixef.keep_names

Logical or NULL (default). When you combine different variables to transform them into a single fixed-effect you can do e.g. y ~ x | paste(var1, var2). The algorithm provides a shorthand to do the same operation: y ~ x | var1^var2. Because pasting variables is a costly operation, the internal algorithm may use a numerical trick to speed up the process. The cost of doing so is that you lose the labels. If you are interested in getting the value of the fixed-effects coefficients after the estimation, you should use fixef.keep_names = TRUE. By default it is equal to TRUE if the number of unique values is lower than 50,000, and to FALSE otherwise.

mem.clean

Logical scalar, default is FALSE. Only to be used if the data set is large compared to the available RAM. If TRUE then intermediary objects are removed as much as possible and gc is run before each substantial C++ section in the internal code to avoid memory issues.

only.env

(Advanced users.) Logical scalar, default is FALSE. If TRUE, then only the environment used to make the estimation is returned.

env

(Advanced users.) A fixest environment created by a fixest estimation with only.env = TRUE. Default is missing. If provided, the data from this environment will be used to perform the estimation.

...

Not currently used.

y

Numeric vector/matrix/data.frame of the dependent variable(s). Multiple dependent variables will return a fixest_multi object.

X

Numeric matrix of the regressors.

fixef_df

Matrix/data.frame of the fixed-effects.

Details

The core of the GLM are the weighted OLS estimations. These estimations are performed with feols. The method used to demean each variable along the fixed-effects is based on Berge (2018), since this is the same problem to solve as for the Gaussian case in a ML setup.

Value

A fixest object. Note that fixest objects contain many elements and most of them are for internal use; they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc.) or functions (like for instance fitstat to access any fit statistic).

nobs

The number of observations.

fml

The linear formula of the call.

call

The call of the function.

method

The method used to estimate the model.

family

The family used to estimate the model.

data

The original data set used when calling the function. Only available when the estimation was called with data.save = TRUE.

fml_all

A list containing different parts of the formula. Always contains the linear formula. Then, if relevant: fixef: the fixed-effects.

nparams

The number of parameters of the model.

fixef_vars

The names of each fixed-effect dimension.

fixef_id

The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation.

fixef_sizes

The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension).

y

(When relevant.) The dependent variable (used to compute the within-R2 when fixed-effects are present).

convStatus

Logical, convergence status of the IRWLS algorithm.

irls_weights

The weights of the last iteration of the IRWLS algorithm.

obs_selection

(When relevant.) List containing vectors of integers. It represents the sequential selection of observations vis-a-vis the original data set.

fixef_removed

(When relevant.) In the case there were fixed-effects and some observations were removed because of only 0/1 outcome within a fixed-effect, it gives the list (for each fixed-effect dimension) of the fixed-effect identifiers that were removed.

coefficients

The named vector of estimated coefficients.

coeftable

The table of the coefficients with their standard errors, z-values and p-values.

loglik

The loglikelihood.

deviance

Deviance of the fitted model.

iterations

Number of iterations of the algorithm.

ll_null

Log-likelihood of the null model (i.e. with the intercept only).

ssr_null

Sum of the squared residuals of the null model (containing only the intercept).

pseudo_r2

The adjusted pseudo R2.

fitted.values

The fitted values are the expected value of the dependent variable for the fitted model: that is E(Y|X).

linear.predictors

The linear predictors.

residuals

The residuals (y minus the fitted values).

sq.cor

Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation.

hessian

The Hessian of the parameters.

cov.iid

The variance-covariance matrix of the parameters.

se

The standard-error of the parameters.

scores

The matrix of the scores (first derivative for each observation).

residuals

The difference between the dependent variable and the expected predictor.

sumFE

The sum of the fixed-effects coefficients for each observation.

offset

(When relevant.) The offset formula.

weights

(When relevant.) The weights formula.

collin.var

(When relevant.) Vector containing the variables removed because of collinearity.

collin.coef

(When relevant.) Vector of coefficients, where the values of the variables removed because of collinearity are NA.

Combining the fixed-effects

You can combine two variables to make a new fixed-effect using ^. The syntax is as follows: fe_1^fe_2. Here you create a new variable which is the combination of the two variables fe_1 and fe_2. This is identical to doing paste0(fe_1, "_", fe_2) but more convenient.

Note that pasting is a costly operation, especially for large data sets. Hence, by default this paste is done only when the number of unique values is lower than 50,000.

In case you are using a large data set and want to keep the identity of the fixed-effects, you need to use the argument fixef.keep_names = TRUE.

Note that these "identities" are useful only if you're interested in the value of the fixed-effects (that you can extract with fixef.fixest).
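The equivalence between ^ and a manual paste can be checked with a short sketch. It assumes the built-in mtcars data; the column cyl_am and the object names are illustrative only.

```r
library(fixest)

base = mtcars

# Shorthand: combine cyl and am into a single fixed-effect with ^
est_hat = feols(mpg ~ wt | cyl^am, base)

# Manual equivalent: paste the two variables into a new column
base$cyl_am = paste0(base$cyl, "_", base$am)
est_paste = feols(mpg ~ wt | cyl_am, base)

# Both approaches should yield the same slope on wt
all.equal(coef(est_hat), coef(est_paste))

# Extract the fixed-effects coefficients of the combined dimension
fixef(est_hat)
```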

Varying slopes

You can add variables with varying slopes in the fixed-effect part of the formula. The syntax is as follows: fixef_var[var1, var2]. Here the variables var1 and var2 will have varying slopes (one slope per value in fixef_var) and the fixed-effect fixef_var will also be added.

To add only the variables with varying slopes and not the fixed-effect, use double square brackets: fixef_var[[var1, var2]].

In other words: fixef_var[var1, var2] is equivalent to fixef_var + fixef_var[[var1]] + fixef_var[[var2]].

In general, for convergence reasons, it is recommended to always add the fixed-effect and avoid using only the variable with varying slope (i.e. use single square brackets).
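A minimal sketch of the two bracket forms, assuming the built-in iris data. The comparison with lm is an illustration added here, not part of the package documentation.

```r
library(fixest)

# Species fixed-effect plus one slope of Petal.Length per species
est_vs = feols(Sepal.Length ~ Sepal.Width | Species[Petal.Length], iris)

# Only the varying slopes, without the Species fixed-effect
est_vs_only = feols(Sepal.Length ~ Sepal.Width | Species[[Petal.Length]], iris)

# Reference model with explicit dummies and interactions in base R
est_lm = lm(Sepal.Length ~ Sepal.Width + Species + Species:Petal.Length, iris)

# The slope on Sepal.Width should match between est_vs and est_lm
coef(est_vs)["Sepal.Width"]
coef(est_lm)["Sepal.Width"]
```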

Lagging variables

To use leads/lags of variables in the estimation, you can: i) either provide the argument panel.id, or ii) set your data set as a panel with the function panel (see also the functions f and d).

You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
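Both ways of declaring the panel can be sketched as follows, assuming the base_did data set shipped with fixest (with columns y, x1, id and period). The object names are illustrative.

```r
library(fixest)
data(base_did, package = "fixest")

# i) Declare the panel at estimation time with panel.id
est_lag = feols(f(y) ~ l(x1, -1:1), base_did, panel.id = ~id + period)

# ii) Declare the panel once with panel(), then estimate
pdat = panel(base_did, ~id + period)
est_panel = feols(f(y) ~ l(x1, -1:1), pdat)

# Intercept, lead of x1, x1 and lag of x1
coef(est_lag)
```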

Interactions

You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).

Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.

It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula; it will be incomparably faster (using the syntax factor_var[continuous_var], as explained in the section "Varying slopes").

The function i has in fact more arguments, please see details in its associated help page.
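As an illustration, assuming the base_did data set shipped with fixest, the sketch below interacts the treatment indicator with each period and takes period 5 as the reference.

```r
library(fixest)
data(base_did, package = "fixest")

# One coefficient of treat per period, period 5 being the reference
est_i = feols(y ~ x1 + i(period, treat, ref = 5) | id + period, base_did)

# x1 plus one interaction per non-reference period
coef(est_i)

# The interacted values get a dedicated display in etable
etable(est_i)
```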

On standard-errors

Standard-errors can be computed in different ways; you can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.

The vignette "On standard-errors" describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.

You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.
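A short sketch of changing the standard-errors after the estimation, assuming the built-in iris data. It uses the argument vcov, which the documentation of the se argument recommends over se, and the ssc argument K.adj, taken from the defaults of ssc listed in the ssc argument entry.

```r
library(fixest)

est = feglm(Sepal.Length ~ Sepal.Width + Petal.Length | Species,
            iris, family = "poisson")

# Same coefficients, different standard-errors
s_iid = summary(est, vcov = "iid")
s_het = summary(est, vcov = "hetero")
s_clu = summary(est, vcov = ~Species)

# Changing the small sample correction
s_noadj = summary(est, vcov = "iid", ssc = ssc(K.adj = FALSE))

se(s_iid)
se(s_het)
```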

Multiple estimations

Multiple estimations can be performed at once, they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is 'kind of' a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.

To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.

To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. Of course sw stands for stepwise, and csw for cumulative stepwise. Finally mvsw is a bit special: it stands for multiverse stepwise. Let's explain that.

Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated.

The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.

Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially (2^n models for n variables): so be cautious!

Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ csw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).

You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).

If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.

A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models, using the formula is roughly similar to using a loop performance-wise.
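The number of models produced by each stepwise function can be checked with a small sketch, assuming the built-in airquality data. Converting the fixest_multi object with as.list is used here only to count the models.

```r
library(fixest)

# sw: each element is added separately (2 models)
est_sw = feols(Ozone ~ Wind + sw(Temp, Solar.R), airquality)

# csw0: cumulative, plus the model without stepwise elements (3 models)
est_csw0 = feols(Ozone ~ Wind + csw0(Temp, Solar.R), airquality)

# mvsw: all combinations of the three variables (8 models)
est_mv = feols(Ozone ~ mvsw(Wind, Temp, Solar.R), airquality)

# Two dependent variables times three cumulative steps (6 models)
est_all = feols(c(Ozone, Solar.R) ~ csw(Wind, Temp, Day), airquality)

n_models = sapply(list(est_sw, est_csw0, est_mv, est_all),
                  function(x) length(as.list(x)))
n_models
```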

Argument sliding

When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.
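A minimal sketch of argument sliding, assuming the built-in iris data. The global setting is reset at the end so that later estimations are not affected.

```r
library(fixest)

# Set the data set globally
setFixest_estimation(data = iris)

# The second argument is now interpreted as the vcov
est_slide = feols(Sepal.Length ~ Sepal.Width, ~Species)

# Back to the default settings
setFixest_estimation(reset = TRUE)

# Explicit equivalent
est_explicit = feols(Sepal.Length ~ Sepal.Width, iris, vcov = ~Species)
```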

Piping

Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).

Tricks to estimate multiple LHS

To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).

First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).

Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).

Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left-hand-sides.
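The three ways of writing the same pair of estimations can be sketched as follows, assuming the built-in iris data. The vector name my_lhs follows the text above; the other object names are illustrative.

```r
library(fixest)

# Explicit vector of dependent variables
est_c = feols(c(Petal.Length, Petal.Width) ~ Sepal.Length, iris)

# Names stored in a character vector, inserted with the DSB operator
my_lhs = c("Petal.Length", "Petal.Width")
est_dsb = feols(.[my_lhs] ~ Sepal.Length, iris)

# Regular expression on the variable names
est_regex = feols(..("Pe") ~ Sepal.Length, iris)

# Compare the first model of each set of estimations
first_coef = lapply(list(est_c, est_dsb, est_regex),
                    function(x) coef(as.list(x)[[1]]))
first_coef
```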

Dot square bracket operator in formulas

In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.

Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.

To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.

You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.

The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.

By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas: the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).

In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.

One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.

You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.

When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
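A few of the expansions described in this section can be reproduced directly with xpd. The variable names rhs and empty are illustrative; the expected formulas in the comments are the ones stated in the text above.

```r
library(fixest)

# Create several variables at once: y ~ x1 + x2 + x3
f_seq = xpd(y ~ x.[1:3])

# Expand a one-sided formula: y ~ sepal + petal
rhs = ~sepal + petal
f_rhs = xpd(y ~ .[rhs])

# An empty string is replaced with the neutral element: y ~ 1
empty = ""
f_empty = xpd(y ~ .[empty])

# Capture a value from the environment inside a loop
for (i in 1:3) print(xpd(y.[i] ~ x))
```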

Author(s)

Laurent Berge

References

Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13.

For models with multiple fixed-effects:

Gaure, Simen, 2013, "OLS with multiple high dimensional category variables",Computational Statistics & Data Analysis 66 pp. 8–18

See Also

See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations. And other estimation methods: feols, femlm, fenegbin, feNmlm.

Examples

# Poisson estimation
res = feglm(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris, "poisson")

# You could also use fepois
res_pois = fepois(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)

# With the fit method:
res_fit = feglm.fit(iris$Sepal.Length, iris[, 2:3], iris$Species, "poisson")

# All results are identical:
etable(res, res_pois, res_fit)

# Note that you have many more examples in feols

#
# Multiple estimations:
#

# 6 estimations
est_mult = fepois(c(Ozone, Solar.R) ~ Wind + Temp + csw0(Wind:Temp, Day), airquality)

# We can display the results for the first lhs:
etable(est_mult[lhs = 1])

# And now the second (access can be made by name)
etable(est_mult[lhs = "Solar.R"])

# Now we focus on the two last right hand sides
# (note that .N can be used to specify the last item)
etable(est_mult[rhs = 2:.N])

# Combining with split
est_split = fepois(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)),
                   airquality, split = ~ Month)

# You can display everything at once with the print method
est_split

# Different way of displaying the results with "compact"
summary(est_split, "compact")

# You can still select which sample/LHS/RHS to display
est_split[sample = 1:2, lhs = 1, rhs = 1]

Fixed-effects maximum likelihood models

Description

This function estimates maximum likelihood models with any number of fixed-effects.

Usage

femlm(
  fml,
  data,
  family = c("poisson", "negbin", "logit", "gaussian"),
  vcov,
  start = 0,
  fixef,
  fixef.rm = "perfect_fit",
  offset,
  subset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  panel.id,
  panel.time.step = NULL,
  panel.duplicate.method = "none",
  fixef.tol = 1e-05,
  fixef.iter = 10000,
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  verbose = 0,
  warn = TRUE,
  notes = getFixest_notes(),
  theta.init,
  fixef.keep_names = NULL,
  mem.clean = FALSE,
  only.env = FALSE,
  only.coef = FALSE,
  data.save = FALSE,
  env,
  ...
)

fenegbin(
  fml,
  data,
  vcov,
  theta.init,
  start = 0,
  fixef,
  fixef.rm = "perfect_fit",
  offset,
  subset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  panel.id,
  panel.time.step = NULL,
  panel.duplicate.method = "none",
  fixef.tol = 1e-05,
  fixef.iter = 10000,
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  verbose = 0,
  warn = TRUE,
  notes = getFixest_notes(),
  fixef.keep_names = NULL,
  mem.clean = FALSE,
  only.env = FALSE,
  only.coef = FALSE,
  data.save = FALSE,
  env,
  ...
)

Arguments

fml

A formula representing the relation to be estimated. For example: fml = z~x+y. To include fixed-effects, insert them in this formula using a pipe: e.g. fml = z~x+y|fixef_1+fixef_2. Multiple estimations can be performed at once: for multiple dep. vars, wrap them in c(): e.g. c(y1, y2). For multiple indep. vars, use the stepwise functions: e.g. x1 + csw(x2, x3). The formula fml = c(y1, y2) ~ x1 + csw0(x2, x3) leads to 6 estimations, see details. Square brackets starting with a dot can be used to call global variables: y.[i] ~ x.[1:2] will lead to y3 ~ x1 + x2 if i is equal to 3 in the current environment (see details in xpd).

data

A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with the names of this data.frame. Can also be a matrix.

family

Character scalar. It should provide the family. The possible values are "poisson" (Poisson model with log-link, the default), "negbin" (Negative Binomial model with log-link), "logit" (Logit model with logit-link), "gaussian" (Gaussian model).

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

start

Starting values for the coefficients. Can be: i) a numeric of length 1 (e.g. start = 0, the default), ii) a numeric vector of the exact same length as the number of variables, or iii) a named vector of any length (the names will be used to initialize the appropriate coefficients).

fixef

Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula.

fixef.rm

Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none".

This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it).

The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). The value fixef.rm = "infinite_coef" removes all observations associated to FEs with infinite coefficients.

If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed.

If "none": no observation is removed.

Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors).

The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining.
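The removal of perfectly explained observations can be illustrated with a small artificial data set. Everything in this sketch is an assumption made for the example: the data frame base, its columns, and the way the outcome is generated.

```r
library(fixest)

# Artificial count data: 5 groups of 10 observations
base = data.frame(id = rep(1:5, each = 10), x = sin(1:50))
base$y = (2 * (1:50)) %% 5

# Group 1 has only-0 outcomes: its fixed-effect would be -Inf in a Poisson model
base$y[base$id == 1] = 0

# Default fixef.rm = "perfect_fit": the observations of group 1 are expected to be dropped
est = femlm(y ~ x | id, base)

nobs(est)
est$fixef_removed
```

The slot fixef_removed, described in the Value section, reports which fixed-effect identifiers were removed.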

offset

A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) ~0.5*x**2. This offset is linearly added to the elements of the main formula 'fml'.

subset

A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument.

split

A one-sided formula representing a variable (e.g. split = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. If you also want to include the estimation for the full sample, use the argument fsplit instead. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. split = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value; you can trigger a regular expression evaluation by adding a '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).

fsplit

A one-sided formula representing a variable (e.g. fsplit = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. This argument is the same as split but also includes the full sample as the first estimation. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. fsplit = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value; you can trigger a regular expression evaluation by adding a '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).

split.keep

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values of split.keep. The values in split.keep will be partially matched to the values of split. To enable regular expressions, you need to add a '@' first. For example split.keep = c("v1", "@other|var") will keep only the value in split partially matched by "v1" or the values containing "other" or "var".

split.drop

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values that are not in split.drop. The values in split.drop will be partially matched to the values of split. To enable regular expressions, you need to add a '@' first. For example split.drop = c("v1", "@other|var") will drop only the value in split partially matched by "v1" or the values containing "other" or "var".
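The four split-related arguments can be sketched together, assuming the built-in airquality data, in which Month takes the values 5 to 9. The helper n_est is hypothetical and only counts the estimations.

```r
library(fixest)

# Hypothetical helper: number of estimations in a fixest_multi object
n_est = function(x) length(as.list(x))

# One estimation per month
est_split = femlm(Ozone ~ Wind, airquality, split = ~Month)

# Full sample first, then one estimation per month
est_fsplit = femlm(Ozone ~ Wind, airquality, fsplit = ~Month)

# Only the months partially matched by "5" or "6"
est_keep = femlm(Ozone ~ Wind, airquality, split = ~Month,
                 split.keep = c("5", "6"))

# All months except the one partially matched by "9"
est_drop = femlm(Ozone ~ Wind, airquality, split = ~Month,
                 split.drop = "9")

c(n_est(est_split), n_est(est_fsplit), n_est(est_keep), n_est(est_drop))
```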

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".
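Two of the equivalent ways of passing the cluster can be compared in a short sketch, assuming the built-in iris data with Species as the clustering variable.

```r
library(fixest)

# Cluster given as a formula
est_fml = femlm(Sepal.Length ~ Sepal.Width, iris, cluster = ~Species)

# Cluster given as a character vector of variable names
est_chr = femlm(Sepal.Length ~ Sepal.Width, iris, cluster = "Species")

# Both calls should report the same clustered standard-errors
se(est_fml)
se(est_chr)
```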

se

Character scalar. Which kind of standard error should be computed: "standard", "hetero", "cluster", "twoway", "threeway" or "fourway"? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

panel.id

The panel identifiers. Can either be: i) a one-sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id = c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id = 'id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).

panel.time.step

The method to compute the lags, default is NULL (which means automatically set). Can be equal to: "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the largest common divisor between consecutive time periods is used (typically if the time variable represents years, it will be 1). This method can apply only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive" then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.

panel.duplicate.method

If several observations have the same id and time values, then the notion of lag is not defined for them. If panel.duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use panel.duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.

fixef.tol

Precision used to obtain the fixed-effects. Defaults to 1e-5. It corresponds to the maximum absolute difference allowed between two coefficients of successive iterations. Argument fixef.tol cannot be lower than 10000*.Machine$double.eps. Note that this parameter is dynamically controlled by the algorithm.

fixef.iter

Maximum number of iterations in the fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000.

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function setFixest_nthreads.

lean

Logical scalar, default is FALSE. If TRUE then all large objects are removed from the returned result: this will save memory but will block the possibility to use many methods. It is recommended to use the arguments se or cluster to obtain the appropriate standard-errors at estimation time, since obtaining different SEs won't be possible afterwards.

verbose

Integer, default is 0. It represents the level of information that should be reported during the optimisation process. If verbose = 0: nothing is reported. If verbose = 1: the value of the coefficients and the likelihood are reported. If verbose = 2: same as 1, plus information on the computing time of the null model, the fixed-effects coefficients and the hessian.

warn

Logical, default is TRUE. Whether warnings should be displayed (concerns warnings relating to convergence state).

notes

Logical. By default, two notes are displayed: when NAs are removed (to show additional information) and when some observations are removed because of only 0 (or 0/1) outcomes in a fixed-effect setup (in Poisson/Neg. Bin./Logit models). To avoid displaying these messages, you can set notes = FALSE. You can remove these messages permanently by using setFixest_notes(FALSE).

theta.init

Positive numeric scalar. The starting value of the dispersion parameter if family = "negbin". By default, the algorithm uses as a starting value the theta obtained from the model with only the intercept.

fixef.keep_names

Logical or NULL (default). When you combine different variables to transform them into a single fixed-effect you can do e.g. y ~ x | paste(var1, var2). The algorithm provides a shorthand to do the same operation: y ~ x | var1^var2. Because pasting variables is a costly operation, the internal algorithm may use a numerical trick to speed up the process. The cost of doing so is that you lose the labels. If you are interested in getting the value of the fixed-effects coefficients after the estimation, you should use fixef.keep_names = TRUE. By default it is equal to TRUE if the number of unique values is lower than 50,000, and to FALSE otherwise.

mem.clean

Logical scalar, default is FALSE. Only to be used if the data set is large compared to the available RAM. If TRUE then intermediary objects are removed as much as possible and gc is run before each substantial C++ section in the internal code to avoid memory issues.

only.env

(Advanced users.) Logical scalar, default is FALSE. If TRUE, then only the environment used to make the estimation is returned.

only.coef

Logical scalar, default is FALSE. If TRUE, then only the estimated coefficients are returned. Note that the length of the returned vector always equals the number of coefficients to be estimated: this means that the variables found to be collinear are returned with an NA value.

data.save

Logical scalar, default is FALSE. If TRUE, the data used for the estimation is saved within the returned object. Hence later calls to predict(), vcov(), etc., will be consistent even if the original data has been modified in the meantime. This is especially useful for estimations within loops, where the data changes at each iteration, such that postprocessing can be done outside the loop without issue.

env

(Advanced users.) A fixest environment created by a fixest estimation with only.env = TRUE. Default is missing. If provided, the data from this environment will be used to perform the estimation.

...

Not currently used.

Details

Note that the functions feglm and femlm provide the same results when using the same families but differ in that the latter is a direct maximum likelihood optimization (so the two can really have different convergence rates).
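The equivalence stated above can be checked on a small example. The following is only a sketch on simulated data: the data set and the variable names (df, y, x, id) are hypothetical and not part of the package.

```r
library(fixest)

# Simulated count data (hypothetical variables: y, x, id)
set.seed(1)
n = 500
df = data.frame(id = rep(1:50, each = 10), x = rnorm(n))
df$y = rpois(n, exp(1 + 0.3 * df$x))

# Same Poisson model with one fixed-effect, two estimation routines
est_glm = feglm(y ~ x | id, df, family = "poisson")
est_mlm = femlm(y ~ x | id, df, family = "poisson")

# The point estimates should agree up to numerical tolerance
cbind(feglm = coef(est_glm), femlm = coef(est_mlm))

# The number of iterations may differ between the two routines
c(feglm = est_glm$iterations, femlm = est_mlm$iterations)
```

The comparison is done on coefficients only; the convergence paths of the two routines are not expected to match.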

Value

A fixest object. Note that fixest objects contain many elements and most of them are for internal use; they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc.) or functions (for instance fitstat to access any fit statistic).

nobs

The number of observations.

fml

The linear formula of the call.

call

The call of the function.

method

The method used to estimate the model.

family

The family used to estimate the model.

data

The original data set used when calling the function. Only available when the estimation was called with data.save = TRUE.

fml_all

A list containing different parts of the formula. Always contains the linear formula. Then, if relevant: fixef: the fixed-effects; NL: the non linear part of the formula.

nparams

The number of parameters of the model.

fixef_vars

The names of each fixed-effect dimension.

fixef_id

The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation.

fixef_sizes

The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension).

convStatus

Logical, convergence status.

message

The convergence message from the optimization procedures.

obs_selection

(When relevant.) List containing vectors of integers. It represents the sequential selection of observations vis-à-vis the original data set.

fixef_removed

(When relevant.) In the case there were fixed-effects and some observations were removed because of only 0/1 outcome within a fixed-effect, it gives the list (for each fixed-effect dimension) of the fixed-effect identifiers that were removed.

coefficients

The named vector of estimated coefficients.

coeftable

The table of the coefficients with their standard errors, z-values and p-values.

loglik

The log-likelihood.

iterations

Number of iterations of the algorithm.

ll_null

Log-likelihood of the null model (i.e. with the intercept only).

ll_fe_only

Log-likelihood of the model with only the fixed-effects.

ssr_null

Sum of the squared residuals of the null model (containing only the intercept).

pseudo_r2

The adjusted pseudo R2.

fitted.values

The fitted values are the expected value of the dependent variable for the fitted model: that is E(Y|X).

residuals

The residuals (y minus the fitted values).

sq.cor

Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation.

hessian

The Hessian of the parameters.

cov.iid

The variance-covariance matrix of the parameters.

se

The standard-error of the parameters.

scores

The matrix of the scores (first derivative for each observation).

residuals

The difference between the dependent variable and the expected predictor.

sumFE

The sum of the fixed-effects coefficients for each observation.

offset

(When relevant.) The offset formula.

Combining the fixed-effects

You can combine two variables to make a new fixed-effect using ^. The syntax is as follows: fe_1^fe_2. Here you created a new variable which is the combination of the two variables fe_1 and fe_2. This is identical to doing paste0(fe_1, "_", fe_2) but more convenient.

Note that pasting is a costly operation, especially for large data sets. Hence, by default this paste is done only when the number of unique values is lower than 50,000.

In case you are using a large data set and want to keep the identity of the fixed-effects, you need to use the argument fixef.keep_names = TRUE.

Note that these “identities” are useful only if you're interested in the value of the fixed-effects (that you can extract with fixef.fixest).
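As an illustration of the ^ syntax, here is a minimal sketch on simulated data. The data set and the variable names (df, g, time, x, y) are hypothetical; feols is used for brevity, the formula syntax being shared across the fixest estimation functions.

```r
library(fixest)

# Simulated data: 4 regions observed over 5 periods (hypothetical variables)
set.seed(1)
df = data.frame(g = rep(c("north", "south", "east", "west"), each = 25),
                time = rep(1:5, times = 20))
df$x = rnorm(100)
df$y = 1 + 0.5 * df$x + rnorm(100)

# g^time creates one fixed-effect per region-period pair
est_comb = feols(y ~ x | g^time, df)
est_comb$fixef_sizes

# Explicitly keep the labels of the combined fixed-effect, then extract the coefficients
est_named = feols(y ~ x | g^time, df, fixef.keep_names = TRUE)
head(fixef(est_named)[[1]])
```

On a data set this small the labels are kept by default; setting fixef.keep_names = TRUE only matters once the number of unique values becomes large.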

Lagging variables

To use leads/lags of variables in the estimation, you can either: i) provide the argument panel.id, or ii) set your data set as a panel with the function panel. Either way, the functions l (lag), f (lead) and d (difference) can then be used in the formula.

You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
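A minimal sketch of the panel.id route, on a simulated balanced panel. The data set and the variable names (df, id, time, x, y) are hypothetical.

```r
library(fixest)

# Simulated balanced panel: 20 units observed over 5 periods (hypothetical variables)
set.seed(1)
df = data.frame(id = rep(1:20, each = 5), time = rep(1:5, times = 20))
df$x = rnorm(100)
df$y = 1 + 0.5 * df$x + rnorm(100)

# Contemporaneous value and first lag of x; the panel is declared with panel.id
est_lag = feols(y ~ l(x, 0:1), df, panel.id = ~id + time)
coef(est_lag)

# The first period of each unit has no lag, so these observations are dropped
nobs(est_lag)
```

Observations for which the lag is undefined become NA and are removed from the estimation sample, which is why the number of observations is lower than the number of rows in df.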

Interactions

You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).

Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.

It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula, it will be incomparably faster (using the syntax factor_var[continuous_var], as explained in the section “Varying slopes”).

The function i has in fact more arguments, please see details in its associated help page.
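The two ways of interacting a continuous variable with a factor-like variable can be sketched as follows. The data set and the variable names (df, g, x, z, y) are hypothetical and simulated for the purpose of the example.

```r
library(fixest)

# Simulated data: g is the factor-like variable, x and z are continuous (hypothetical variables)
set.seed(1)
df = data.frame(g = rep(c("north", "south", "east", "west"), each = 25),
                x = rnorm(100), z = rnorm(100))
df$y = 1 + 0.5 * df$x + 0.2 * df$z + rnorm(100)

# x interacted with each value of g, with "north" taken as the reference
est_i = feols(y ~ i(g, x, ref = "north"), df)
coef(est_i)

# Varying slopes: the slopes of x by g are absorbed in the fixed-effects part,
# so only the coefficient of z is reported
est_vs = feols(y ~ z | g[x], df)
coef(est_vs)
```

The second form does not report coefficients or standard-errors for the slopes, which is the trade-off described above.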

On standard-errors

Standard-errors can be computed in different ways. You can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.

The vignette On standard-errors describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.

You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.
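As a sketch, the same estimation can be summarized with several types of standard-errors without re-estimating the model. The example uses the vcov argument, which is the replacement for se mentioned in the arguments; the data set and variable names (df, id, x, y) are hypothetical.

```r
library(fixest)

# Simulated data with a grouping variable id (hypothetical variables)
set.seed(1)
df = data.frame(id = rep(1:20, each = 5), x = rnorm(100))
df$y = 1 + 0.5 * df$x + rnorm(100)

est = feols(y ~ x | id, df)

# The coefficients are unchanged; only the standard-errors differ
summary(est, vcov = "iid")
summary(est, vcov = "hetero")
summary(est, vcov = ~id)      # clustered by id

# Standard-errors only
se_iid = se(est, vcov = "iid")
se_clu = se(est, vcov = ~id)
rbind(iid = se_iid, cluster = se_clu)
```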

Multiple estimations

Multiple estimations can be performed at once, they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is ‘kind of’ a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.

To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.

To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. Of course sw stands for stepwise, and csw for cumulative stepwise. Finally mvsw is a bit special, it stands for multiverse stepwise. Let's explain that.

Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated.

The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.

Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially with the number of variables (2^n models for n variables): so be cautious!

Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ sw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).

You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).

If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.

A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models using the formula is roughly similar to using a loop performance-wise.
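The number of models produced by each stepwise function can be checked on a small example. This is a sketch on simulated data; the data set and the variable names (df, y, y2, x1, x2, x3) are hypothetical.

```r
library(fixest)

# Simulated data with two outcomes and three regressors (hypothetical variables)
set.seed(1)
n = 100
df = data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y  = 1 + 0.5 * df$x1 + 0.3 * df$x2 + rnorm(n)
df$y2 = 2 - 0.5 * df$x1 + 0.1 * df$x3 + rnorm(n)

est_sw   = feols(y ~ x1 + sw(x2, x3), df)         # y ~ x1 + x2 ; y ~ x1 + x3
est_csw0 = feols(y ~ x1 + csw0(x2, x3), df)       # y ~ x1 ; + x2 ; + x2 + x3
est_mvsw = feols(y ~ mvsw(x1, x2, x3), df)        # all combinations, empty model included
est_lhs  = feols(c(y, y2) ~ x1 + sw(x2, x3), df)  # 2 outcomes times 2 right-hand sides

# Number of estimations in each fixest_multi object
n_models = sapply(list(sw = est_sw, csw0 = est_csw0, mvsw = est_mvsw, lhs = est_lhs),
                  function(e) length(as.list(e)))
n_models
```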

Argument sliding

When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.
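A short sketch of argument sliding on simulated data. The data set and variable names (df, id, x, y) are hypothetical, and the call setFixest_estimation(reset = TRUE) is assumed to restore the default global settings at the end.

```r
library(fixest)

# Simulated data (hypothetical variables)
set.seed(1)
df = data.frame(id = rep(1:20, each = 5), x = rnorm(100))
df$y = 1 + 0.5 * df$x + rnorm(100)

# Set the data globally: the second argument is then read as the vcov
setFixest_estimation(data = df)
est_het = feols(y ~ x, "hetero")
est_clu = feols(y ~ x, ~id)

# Restore the default settings so that later calls are not affected
setFixest_estimation(reset = TRUE)

# Same estimation with everything spelled out
est_ref = feols(y ~ x, df, vcov = "hetero")
rbind(sliding = se(est_het), explicit = se(est_ref))
```

Global settings persist for the whole session, so resetting them after use avoids surprises in unrelated estimations.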

Piping

Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).

Tricks to estimate multiple LHS

To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).

First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say, feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).

Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).

Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left hand sides.
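The tricks above can be sketched as follows. The simulated data set and its variable names (df, y1, y2, x1) are hypothetical; the regular expression example uses the iris data set shipped with R, as in the text.

```r
library(fixest)

# Simulated data with two outcomes (hypothetical variables)
set.seed(1)
df = data.frame(x1 = rnorm(100))
df$y1 = 1 + 0.5 * df$x1 + rnorm(100)
df$y2 = 2 - 0.5 * df$x1 + rnorm(100)

# 1) Names stored in a vector, inserted with the dot square bracket operator
my_lhs = c("y1", "y2")
est_dsb = feols(.[my_lhs] ~ x1, df)

# 2) Left-hand-sides selected with a regular expression
est_regex = feols(..("Pe") ~ Sepal.Length, iris)

# 3) The dependent variable is dropped from the right-hand-side
est_auto  = feols(y1 ~ y1 + x1, df)
est_plain = feols(y1 ~ x1, df)

c(dsb = length(as.list(est_dsb)), regex = length(as.list(est_regex)))
```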

Dot square bracket operator in formulas

In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.

Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.

To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.

You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.

The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.

By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas: the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).

In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.

One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.

You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.

When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
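The expansions described in this section can be reproduced directly with xpd. The sketch below only uses cases stated above; the object names (f_seq, rhs, f_rhs, empty, f_empty) are hypothetical.

```r
library(fixest)

# i) Creating several variables at once
f_seq = xpd(y ~ x.[1:3])
f_seq

# ii) Capturing a value from the current environment
for (v in c("a", "b")) print(xpd(.[v] ~ x))

# One-sided formulas are expanded into the right-hand side
rhs = ~sepal + petal
f_rhs = xpd(y ~ .[rhs])
f_rhs

# Empty elements are replaced with the neutral element 1
empty = ""
f_empty = xpd(y ~ .[empty])
f_empty
```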

Author(s)

Laurent Berge

References

Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13 ().

For models with multiple fixed-effects:

Gaure, Simen, 2013, "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis 66 pp. 8–18

On the unconditional Negative Binomial model:

Allison, Paul D and Waterman, Richard P, 2002, "Fixed-Effects Negative Binomial Regression Models", Sociological Methodology 32(1) pp. 247–265

See Also

See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations. And other estimation methods: feols, feglm, fepois, feNmlm.

Examples

# Load trade data
data(trade)

# We estimate the effect of distance on trade => we account for 3 fixed-effects
# 1) Poisson estimation
est_pois = femlm(Euros ~ log(dist_km) | Origin + Destination + Product, trade)

# 2) Log-Log Gaussian estimation (with same FEs)
est_gaus = update(est_pois, log(Euros+1) ~ ., family = "gaussian")

# Comparison of the results using the function etable
etable(est_pois, est_gaus)

# Now using two way clustered standard-errors
etable(est_pois, est_gaus, se = "twoway")

# Comparing different types of standard errors
sum_hetero   = summary(est_pois, se = "hetero")
sum_oneway   = summary(est_pois, se = "cluster")
sum_twoway   = summary(est_pois, se = "twoway")
sum_threeway = summary(est_pois, se = "threeway")

etable(sum_hetero, sum_oneway, sum_twoway, sum_threeway)

#
# Multiple estimations:
#

# 6 estimations
est_mult = femlm(c(Ozone, Solar.R) ~ Wind + Temp + csw0(Wind:Temp, Day), airquality)

# We can display the results for the first lhs:
etable(est_mult[lhs = 1])

# And now the second (access can be made by name)
etable(est_mult[lhs = "Solar.R"])

# Now we focus on the two last right hand sides
# (note that .N can be used to specify the last item)
etable(est_mult[rhs = 2:.N])

# Combining with split
est_split = fepois(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)),
                  airquality, split = ~ Month)

# You can display everything at once with the print method
est_split

# Different way of displaying the results with "compact"
summary(est_split, "compact")

# You can still select which sample/LHS/RHS to display
est_split[sample = 1:2, lhs = 1, rhs = 1]

Fixed-effects OLS estimation

Description

Estimates OLS with any number of fixed-effects.

Usage

feols(
  fml,
  data,
  vcov,
  weights,
  offset,
  subset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  panel.id,
  panel.time.step = NULL,
  panel.duplicate.method = "none",
  fixef,
  fixef.rm = "perfect_fit",
  fixef.tol = 1e-06,
  fixef.iter = 10000,
  fixef.algo = NULL,
  collin.tol = 1e-09,
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  verbose = 0,
  warn = TRUE,
  notes = getFixest_notes(),
  only.coef = FALSE,
  data.save = FALSE,
  fixef.keep_names = NULL,
  demeaned = FALSE,
  mem.clean = FALSE,
  only.env = FALSE,
  env,
  ...
)

feols.fit(
  y,
  X,
  fixef_df,
  vcov,
  offset,
  split,
  fsplit,
  split.keep,
  split.drop,
  cluster,
  se,
  ssc,
  weights,
  subset,
  fixef.rm = "perfect_fit",
  fixef.tol = 1e-06,
  fixef.iter = 10000,
  fixef.algo = NULL,
  collin.tol = 1e-09,
  nthreads = getFixest_nthreads(),
  lean = FALSE,
  warn = TRUE,
  notes = getFixest_notes(),
  mem.clean = FALSE,
  verbose = 0,
  only.env = FALSE,
  only.coef = FALSE,
  env,
  ...
)

Arguments

fml

A formula representing the relation to be estimated. For example: fml = z~x+y. To include fixed-effects, insert them in this formula using a pipe: e.g. fml = z~x+y | fe_1+fe_2. You can combine two fixed-effects with ^: e.g. fml = z~x+y|fe_1^fe_2, see details. You can also use variables with varying slopes using square brackets: e.g. in fml = z~y|fe_1[x] + fe_2, see details. To add IVs, insert the endogenous vars./instruments after a pipe, like in y ~ x | x_endo1 + x_endo2 ~ x_inst1 + x_inst2. Note that it should always be the last element, see details. Multiple estimations can be performed at once: for multiple dep. vars, wrap them in c(): e.g. c(y1, y2). For multiple indep. vars, use the stepwise functions: e.g. x1 + csw(x2, x3). The formula fml = c(y1, y2) ~ x1 + csw0(x2, x3) leads to 6 estimations, see details. Square brackets starting with a dot can be used to call global variables: y.[i] ~ x.[1:2] will lead to y3 ~ x1 + x2 if i is equal to 3 in the current environment (see details in xpd).

data

A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with this data.frame names. Can also be a matrix.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

weights

A formula or a numeric vector. Each observation can be weighted, the weights must be greater than 0. If equal to a formula, it should be one-sided: for example ~ var_weight.

offset

A formula or a numeric vector. An offset can be added to the estimation. If equal to a formula, it should be of the form (for example) ~0.5*x**2. This offset is linearly added to the elements of the main formula 'fml'.

subset

A vector (logical or numeric) or a one-sided formula. If provided, then the estimation will be performed only on the observations defined by this argument.

split

A one sided formula representing a variable (e.g. split = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. If you also want to include the estimation for the full sample, use the argument fsplit instead. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. split = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value, you can trigger a regular expression evaluation by adding an '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).

fsplit

A one sided formula representing a variable (e.g. fsplit = ~var) or a vector. If provided, the sample is split according to the variable and one estimation is performed for each value of that variable. This argument is the same as split but also includes the full sample as the first estimation. You can use the special operators %keep% and %drop% to select only a subset of values for which to split the sample. E.g. fsplit = ~var %keep% c("v1", "v2") will split the sample only according to the values v1 and v2 of the variable var; it is equivalent to supplying the argument split.keep = c("v1", "v2"). By default there is partial matching on each value, you can trigger a regular expression evaluation by adding an '@' first, as in: ~var %drop% "@^v[12]" which will drop values starting with "v1" or "v2" (of course you need to know regexes!).

split.keep

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values of split.keep. The values in split.keep will be partially matched to the values of split. To enable regular expressions, you need to add an '@' first. For example split.keep = c("v1", "@other|var") will keep only the values in split partially matched by "v1" or the values containing "other" or "var".

split.drop

A character vector. Only used when split, or fsplit, is supplied. If provided, then the sample will be split only on the values that are not in split.drop. The values in split.drop will be partially matched to the values of split. To enable regular expressions, you need to add an '@' first. For example split.drop = c("v1", "@other|var") will drop only the values in split partially matched by "v1" or the values containing "other" or "var".

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

panel.id

The panel identifiers. Can either be: i) a one sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id = c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id = 'id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).

panel.time.step

The method to compute the lags, default is NULL (which means automatically set). Can be equal to: "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the greatest common divisor between consecutive time periods is used (typically if the time variable represents years, it will be 1). This method can apply only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive" then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.

panel.duplicate.method

If several observations have the same id and time values, then the notion of lag is not defined for them. If panel.duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use panel.duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.

fixef

Character vector. The names of variables to be used as fixed-effects. These variables should contain the identifier of each observation (e.g., think of it as a panel identifier). Note that the recommended way to include fixed-effects is to insert them directly in the formula.

fixef.rm

Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none".

This option controls which observations should be removed prior to the estimation. If "singletons", fixed-effects associated to a single observation are removed (since they perfectly explain it).

The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). The value fixef.rm = "infinite_coef" removes all observations associated to FEs with infinite coefficients.

If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed.

If "none": no observation is removed.

Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors).

The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining.

fixef.tol

Precision used to obtain the fixed-effects. Defaults to 1e-6. It corresponds to the maximum absolute difference allowed between two coefficients of successive iterations. Argument fixef.tol cannot be lower than 10000*.Machine$double.eps. Note that this parameter is dynamically controlled by the algorithm.

fixef.iter

Maximum number of iterations in the fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000.

fixef.algo

NULL (default) or an object of class demeaning_algo obtained with the function demeaning_algo. If NULL, it falls to the defaults of demeaning_algo. This argument controls the settings of the demeaning algorithm. Only play with it if the convergence is slow, i.e. look at the slot $iterations, and if any is over 50, it may be worth playing around with it. Please read the documentation of the function demeaning_algo. Be aware that there is no clear guidance on how to change the settings, it's more a matter of try-and-see.

collin.tol

Numeric scalar, default is 1e-9. Threshold deciding when variables should be considered collinear and subsequently removed from the estimation. Higher values mean more variables will be removed (if there is presence of collinearity). One signal of presence of collinearity is t-stats that are extremely low (for instance when t-stats < 1e-3).

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can set permanently the number of threads used within this package using the function setFixest_nthreads.

lean

Logical scalar, default is FALSE. If TRUE then all large objects are removed from the returned result: this will save memory but will block the possibility to use many methods. It is recommended to use the arguments se or cluster to obtain the appropriate standard-errors at estimation time, since obtaining different SEs won't be possible afterwards.

verbose

Integer. Higher values give more information. In particular, it can detail the number of iterations in the demeaning algorithm (the first number is the left-hand-side, the other numbers are the right-hand-side variables).

warn

Logical, default is TRUE. Whether warnings should be displayed (concerns warnings relating to convergence state).

notes

Logical. By default, two notes are displayed: when NAs are removed (to show additional information) and when some variables are removed because of collinearity. To avoid displaying these messages, you can set notes = FALSE. You can remove these messages permanently by using setFixest_notes(FALSE).

only.coef

Logical scalar, default is FALSE. If TRUE, then only the estimated coefficients are returned. Note that the length of the vector returned is always the number of coefficients to be estimated: this means that the variables found to be collinear are returned with an NA value.

data.save

Logical scalar, default is FALSE. If TRUE, the data used for the estimation is saved within the returned object. Hence later calls to predict(), vcov(), etc., will be consistent even if the original data has been modified in the meantime. This is especially useful for estimations within loops, where the data changes at each iteration, such that postprocessing can be done outside the loop without issue.

fixef.keep_names

Logical or NULL (default). When you combine different variables to transform them into a single fixed-effect, you can do e.g. y ~ x | paste(var1, var2). The algorithm provides a shorthand to do the same operation: y ~ x | var1^var2. Because pasting variables is a costly operation, the internal algorithm may use a numerical trick to hasten the process. The cost of doing so is that you lose the labels. If you are interested in getting the value of the fixed-effects coefficients after the estimation, you should use fixef.keep_names = TRUE. By default it is equal to TRUE if the number of unique values is lower than 50,000, and to FALSE otherwise.

demeaned

Logical, default is FALSE. Only used in the presence of fixed-effects: should the centered variables be returned? If TRUE, it creates the items y_demeaned and X_demeaned.

mem.clean

Logical scalar, default is FALSE. Only to be used if the data set is large compared to the available RAM. If TRUE then intermediary objects are removed as much as possible and gc is run before each substantial C++ section in the internal code to avoid memory issues.

only.env

(Advanced users.) Logical scalar, default is FALSE. If TRUE, then only the environment used to make the estimation is returned.

env

(Advanced users.) A fixest environment created by a fixest estimation with only.env = TRUE. Default is missing. If provided, the data from this environment will be used to perform the estimation.

...

Not currently used.

y

Numeric vector/matrix/data.frame of the dependent variable(s). Multiple dependent variables will return a fixest_multi object.

X

Numeric matrix of the regressors.

fixef_df

Matrix/data.frame of the fixed-effects.
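Some of the arguments above can be sketched on simulated data: the IV part of the formula, split and fsplit, demeaned, and the matrix interface feols.fit. The data set and all variable names (df, id, g, x1, x_inst, x_endo, y) are hypothetical.

```r
library(fixest)

# Simulated data: 20 units, 4 regions, one endogenous regressor and its instrument
set.seed(1)
n = 200
df = data.frame(id = rep(1:20, each = 10),
                g  = rep(c("north", "south", "east", "west"), times = 50),
                x1 = rnorm(n), x_inst = rnorm(n))
u = rnorm(n)
df$x_endo = 0.8 * df$x_inst + u + rnorm(n)
df$y = 1 + 0.5 * df$x1 + df$x_endo + u + rnorm(n)

# IV: the endogenous variable and its instrument form the last part of the formula
est_iv = feols(y ~ x1 | id | x_endo ~ x_inst, df)
coef(est_iv)

# One estimation per value of g; fsplit adds the full sample as first estimation
est_split  = feols(y ~ x1, df, split  = ~g)
est_fsplit = feols(y ~ x1, df, fsplit = ~g)
est_keep   = feols(y ~ x1, df, split = ~g, split.keep = c("north", "south"))

# demeaned = TRUE stores the centered variables; the group means should be close to 0
est_dm = feols(y ~ x1 | id, df, demeaned = TRUE)
max_group_mean = max(abs(tapply(est_dm$y_demeaned, df$id, mean)))

# Same model through the matrix interface
est_fit = feols.fit(y = df$y, X = as.matrix(df["x1"]), fixef_df = df["id"])
```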

Details

The method used to demean each variable along the fixed-effects is based on Berge (2018), since this is the same problem to solve as for the Gaussian case in a ML setup.

Value

A fixest object. Note that fixest objects contain many elements and most of them are for internal use; they are presented here only for information. To access them, it is safer to use the user-level methods (e.g. vcov.fixest, resid.fixest, etc.) or functions (for instance fitstat to access any fit statistic).

nobs

The number of observations.

fml

The linear formula of the call.

call

The call of the function.

method

The method used to estimate the model.

data

The original data set used when calling the function. Only available when the estimation was called with data.save = TRUE.

fml_all

A list containing different parts of the formula. Always contains the linear formula. Then, depending on the case: fixef: the fixed-effects; iv: the IV part of the formula.

fixef_vars

The names of each fixed-effect dimension.

fixef_id

The list (of length the number of fixed-effects) of the fixed-effects identifiers for each observation.

fixef_sizes

The size of each fixed-effect (i.e. the number of unique identifiers for each fixed-effect dimension).

coefficients

The named vector of estimated coefficients.

multicol

Logical, if multicollinearity was found.

coeftable

The table of the coefficients with their standard errors, z-values and p-values.

loglik

The loglikelihood.

ssr_null

Sum of the squared residuals of the null model (containing only the intercept).

ssr_fe_only

Sum of the squared residuals of the model estimated with fixed-effects only.

ll_null

The log-likelihood of the null model (containing only the intercept).

ll_fe_only

The log-likelihood of the model estimated with fixed-effects only.

fitted.values

The fitted values.

linear.predictors

The linear predictors.

residuals

The residuals (y minus the fitted values).

sq.cor

Squared correlation between the dependent variable and the expected predictor (i.e. fitted.values) obtained by the estimation.

hessian

The Hessian of the parameters.

cov.iid

The variance-covariance matrix of the parameters.

se

The standard-error of the parameters.

scores

The matrix of the scores (first derivative for each observation).

residuals

The difference between the dependent variable and the expected predictor.

sumFE

The sum of the fixed-effects coefficients for each observation.

offset

(When relevant.) The offset formula.

weights

(When relevant.) The weights formula.

obs_selection

(When relevant.) List containing vectors of integers. It represents the sequential selection of observations vis-a-vis the original data set.

collin.var

(When relevant.) Vector containing the variables removed because of collinearity.

collin.coef

(When relevant.) Vector of coefficients, where the values of the variables removed because of collinearity are NA.

collin.min_norm

The minimal diagonal value of the Cholesky decomposition. Small values indicate the possible presence of collinearity.

y_demeaned

Only when demeaned = TRUE: the centered dependent variable.

X_demeaned

Only when demeaned = TRUE: the centered explanatory variables.

Combining the fixed-effects

You can combine two variables into a new fixed-effect using ^. The syntax is as follows: fe_1^fe_2. This creates a new variable which is the combination of the two variables fe_1 and fe_2. It is identical to doing paste0(fe_1, "_", fe_2) but more convenient.

Note that pasting is a costly operation, especially for large data sets. Hence, by default this paste is done only when the number of unique values is lower than 50,000.

If you are using a large data set and want to keep the identity of the fixed-effects, you need to use the argument fixef.keep_names = TRUE.

Note that these “identities” are useful only if you're interested in the value of the fixed-effects (that you can extract with fixef.fixest).
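The equivalence between ^ and paste0 can be checked with a minimal sketch on the built-in iris data. The variables fe_2 and fe_comb are made up for illustration and are not part of the package.

```r
library(fixest)

base = iris
# Hypothetical second fixed-effect, for illustration only
base$fe_2 = rep(1:10, 15)

# Combination with the ^ operator
est_hat = feols(Sepal.Length ~ Petal.Length | Species^fe_2, base)

# Same combination built by hand with paste0()
base$fe_comb = paste0(base$Species, "_", base$fe_2)
est_paste = feols(Sepal.Length ~ Petal.Length | fe_comb, base)

# Both approaches should lead to the same slope
all.equal(unname(coef(est_hat)), unname(coef(est_paste)))
```

The ^ form avoids creating the extra column in the data set, which is the convenience the text refers to.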

Varying slopes

You can add variables with varying slopes in the fixed-effect part of the formula. The syntax is as follows: fixef_var[var1, var2]. Here the variables var1 and var2 will be with varying slopes (one slope per value in fixef_var) and the fixed-effect fixef_var will also be added.

To add only the variables with varying slopes and not the fixed-effect, use double square brackets: fixef_var[[var1, var2]].

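In other words: fixef_var[var1, var2] is equivalent to fixef_var + fixef_var[[var1]] + fixef_var[[var2]], while fixef_var[[var1, var2]] is equivalent to fixef_var[[var1]] + fixef_var[[var2]].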

In general, for convergence reasons, it is recommended to always add the fixed-effect and avoid using only the variable with varying slope (i.e. use single square brackets).
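As an illustration of the two bracket forms, here is a small sketch on iris. The choice of Sepal.Width as the varying-slope variable is arbitrary, and the third model spells out what the single-bracket syntax is understood to mean.

```r
library(fixest)

# Species fixed-effect plus one slope of Sepal.Width per species
est_single = feols(Sepal.Length ~ Petal.Length | Species[Sepal.Width], iris)

# Only the species-specific slopes, without the Species fixed-effect
est_double = feols(Sepal.Length ~ Petal.Length | Species[[Sepal.Width]], iris)

# Explicit form of the single-bracket syntax
est_explicit = feols(Sepal.Length ~ Petal.Length |
                       Species + Species[[Sepal.Width]], iris)

# est_single and est_explicit should report the same coefficient
etable(est_single, est_explicit, est_double)
```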

Lagging variables

To use leads/lags of variables in the estimation, you can either: i) provide the argument panel.id, or ii) set your data set as a panel with the function panel. Either way, the lagging functions l, f and d become available.

You can provide several leads/lags/differences at once: e.g. if your formula is equal to f(y) ~ l(x, -1:1), it means that the dependent variable is equal to the lead of y, and you will have as explanatory variables the lead of x, x and the lag of x. See the examples in function l for more details.
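The two ways of declaring the panel can be compared with a short sketch using the base_did data set shipped with the package. The object names est_id, pdat and est_panel are chosen here for illustration.

```r
library(fixest)
data(base_did)

# Option i): pass the panel identifiers with panel.id
est_id = feols(f(y) ~ l(x1, -1:1), base_did, panel.id = ~id + period)

# Option ii): declare the panel once with panel(), then estimate
pdat = panel(base_did, ~id + period)
est_panel = feols(f(y) ~ l(x1, -1:1), pdat)

# Both estimations should report the same coefficients:
# the lead of x1, x1 itself and the lag of x1
etable(est_id, est_panel)
```

Declaring the panel once is convenient when several estimations reuse the same identifiers.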

Interactions

You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional).

Using this specific way to create interactions leads to a different display of the interacted values in etable. See examples.

It is important to note that if you do not care about the standard-errors of the interactions, then you can add interactions in the fixed-effects part of the formula; it will be incomparably faster (using the syntax factor_var[continuous_var], as explained in the section “Varying slopes”).

The function i has in fact more arguments, please see details in its associated help page.

On standard-errors

Standard-errors can be computed in different ways; you can use the arguments se and ssc in summary.fixest to define how to compute them. By default, the VCOV is the "standard" one.

The following vignette: On standard-errors describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.

You can use the functions setFixest_vcov and setFixest_ssc to permanently set the way the standard-errors are computed.

Instrumental variables

To estimate two stage least squares regressions, insert the relationship between the endogenous regressor(s) and the instruments in a formula, after a pipe.

For example, fml = y ~ x1 | x_endo ~ x_inst will use the variables x1 and x_inst in the first stage to explain x_endo. It will then use the fitted value of x_endo (which will be named fit_x_endo) and x1 to explain y. To include several endogenous regressors, just use "+", like in: fml = y ~ x1 | x_endo1 + x_endo2 ~ x_inst1 + x_inst2.

Of course you can still add the fixed-effects, but the IV formula must always come last, like in fml = y ~ x1 | fe1 + fe2 | x_endo ~ x_inst.

If you want to estimate a model without exogenous variables, use "1" as a placeholder: e.g. fml = y ~ 1 | x_endo ~ x_inst.

By default, the second stage regression is returned. You can access the first stage(s) regressions either directly in the slot iv_first_stage (not recommended), or using the argument stage = 1 from the function summary.fixest. For example summary(iv_est, stage = 1) will give the first stage(s). Note that using summary you can display both the second and first stages at the same time using, e.g., stage = 1:2 (using 2:1 would reverse the order).
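The formula layout described above can be tried with a minimal sketch. The variable names (y, x1, x_endo, x_inst, fe) are simply mapped onto the iris columns for illustration, so the estimates carry no economic meaning.

```r
library(fixest)

# Hypothetical variable names mapped onto iris, for illustration only
base = setNames(iris, c("y", "x1", "x_endo", "x_inst", "fe"))

# One exogenous control (x1), one endogenous regressor, one instrument
est_iv = feols(y ~ x1 | x_endo ~ x_inst, base)

# The second stage is returned by default; the endogenous regressor
# appears under the name fit_x_endo
names(coef(est_iv))

# Same model with a fixed-effect: the IV part stays last
est_iv_fe = feols(y ~ x1 | fe | x_endo ~ x_inst, base)

# First stage only, then both stages
summary(est_iv, stage = 1)
summary(est_iv, stage = 1:2)
```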

Multiple estimations

Multiple estimations can be performed at once; they just have to be specified in the formula. Multiple estimations yield a fixest_multi object which is ‘kind of’ a list of all the results but includes specific methods to access the results in a handy way. Please have a look at the dedicated vignette: Multiple estimations.

To include multiple dependent variables, wrap them in c() (list() also works). For instance fml = c(y1, y2) ~ x1 would estimate the model fml = y1 ~ x1 and then the model fml = y2 ~ x1.

To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0, and mvsw. Of course sw stands for stepwise, and csw for cumulative stepwise. Finally mvsw is a bit special, it stands for multiverse stepwise. Let's explain that.

Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name means that the model without any stepwise element also needs to be estimated.

The prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.

Finally mvsw will add, in a stepwise fashion, all possible combinations of the variables in its arguments. For example mvsw(x1, x2, x3) is equivalent to sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3). The number of models to estimate grows exponentially (2^n models for n variables): so be cautious!

Multiple independent variables can be combined with multiple dependent variables, as in fml = c(y1, y2) ~ csw(x1, x2, x3) which would lead to 6 estimations. Multiple estimations can also be combined with split samples (with the arguments split, fsplit).

You can also add fixed-effects in a stepwise fashion. Note that you cannot perform stepwise estimations on the IV part of the formula (feols only).

If NAs are present in the sample, to avoid too many messages, only NA removal concerning the variables common to all estimations is reported.

A note on performance. The feature of multiple estimations has been highly optimized for feols, in particular in the presence of fixed-effects. It is faster to estimate multiple models using the formula rather than with a loop. For non-feols models, using the formula is roughly similar to using a loop performance-wise.
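To make the behavior of the stepwise functions concrete, the sketch below counts the models produced by each of them on a renamed copy of iris. The helper n_models is a hypothetical convenience defined here; it relies on the as.list method for fixest_multi objects.

```r
library(fixest)

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

# Hypothetical helper: number of estimations in a fixest_multi object
n_models = function(x) length(as.list(x))

est_sw   = feols(y ~ x1 + sw(x2, x3), base)    # y ~ x1 + x2 ; y ~ x1 + x3
est_sw0  = feols(y ~ x1 + sw0(x2, x3), base)   # adds y ~ x1
est_csw  = feols(y ~ x1 + csw(x2, x3), base)   # y ~ x1 + x2 ; y ~ x1 + x2 + x3
est_csw0 = feols(y ~ x1 + csw0(x2, x3), base)  # adds y ~ x1
est_mvsw = feols(y ~ mvsw(x1, x2, x3), base)   # all combinations of x1, x2, x3

counts = sapply(list(sw = est_sw, sw0 = est_sw0, csw = est_csw,
                     csw0 = est_csw0, mvsw = est_mvsw), n_models)
counts
```

With three variables, mvsw already produces 2^3 = 8 models, which illustrates why caution is advised with longer lists.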

Tricks to estimate multiple LHS

To use multiple dependent variables in fixest estimations, you need to include them in a vector: like in c(y1, y2, y3).

First, if names are stored in a vector, they can readily be inserted in a formula to perform multiple estimations using the dot square bracket operator. For instance if my_lhs = c("y1", "y2"), calling fixest with, say feols(.[my_lhs] ~ x1, etc) is equivalent to using feols(c(y1, y2) ~ x1, etc). Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of the DSB operator is to aggregate with sums, see xpd).

Second, you can use a regular expression to grep the left-hand-sides on the fly. When the ..("regex") (or regex("regex")) feature is used naked on the LHS, the variables grepped are inserted into c(). For example ..("Pe") ~ Sepal.Length, iris is equivalent to c(Petal.Length, Petal.Width) ~ Sepal.Length, iris. Beware that this is a special feature unique to the left-hand-side of fixest estimations (the default behavior of ..("regex") is to aggregate with sums, see xpd).

Note that if the dependent variable is also on the right-hand-side, it is automatically removed from the set of explanatory variables. For example, feols(y ~ y + x, base) works as feols(y ~ x, base). This is particularly useful to batch multiple estimations with multiple left-hand-sides.

Argument sliding

When the data set has been set up globally using setFixest_estimation(data = data_set), the argument vcov can be used implicitly. This means that calls such as feols(y ~ x, "HC1"), or feols(y ~ x, ~id), are valid: i) the data is automatically deduced from the global settings, and ii) the vcov is deduced to be the second argument.

Piping

Although the argument 'data' is placed in second position, the data can be piped to the estimation functions. For example, with R >= 4.1, mtcars |> feols(mpg ~ cyl) works as feols(mpg ~ cyl, mtcars).

Dot square bracket operator in formulas

In a formula, the dot square bracket (DSB) operator can: i) create manifold variables at once, or ii) capture values from the current environment and put them verbatim in the formula.

Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.

To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.

You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.

The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.

By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas: the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).

In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.

One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.

You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.

When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.
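The main DSB behaviors described in this section can be reproduced with xpd alone. The sketch below is illustrative only; the object names (f1 to f4, vars, empty) are arbitrary.

```r
library(fixest)

# i) Create several variables at once
f1 = xpd(y ~ x.[1:3])

# ii) Insert a character vector from the environment (expanded as a sum)
vars = c("a", "b")
f2 = xpd(y ~ .[vars])

# Comma form: expand with commas, for use inside functions
f3 = xpd(y ~ csw(x.[, 1:2]))

# Empty elements are replaced with the neutral element 1
empty = ""
f4 = xpd(y ~ .[empty])

print(list(f1, f2, f3, f4))
```

Printing the formulas before estimation is a cheap way to check that the expansion is the intended one.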

Author(s)

Laurent Berge

References

Berge, Laurent, 2018, "Efficient estimation of maximum likelihood models withmultiple fixed-effects: the R package FENmlm." CREA Discussion Papers, 13 ().

For models with multiple fixed-effects:

Gaure, Simen, 2013, "OLS with multiple high dimensional category variables",Computational Statistics & Data Analysis 66 pp. 8–18

See Also

See also summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations. For plotting coefficients: see coefplot.

And other estimation methods: femlm, feglm, fepois, fenegbin, feNmlm.

Examples

#
# Basic estimation
#

res = feols(Sepal.Length ~ Sepal.Width + Petal.Length, iris)
# You can specify clustered standard-errors in summary:
summary(res, cluster = ~Species)

#
# Just one set of fixed-effects:
#

res = feols(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)
# Here we have "default" SEs
summary(res)

#
# Varying slopes:
#

res = feols(Sepal.Length ~ Petal.Length | Species[Sepal.Width], iris)
summary(res)

#
# Combining the FEs:
#

base = iris
base$fe_2 = rep(1:10, 15)
res_comb = feols(Sepal.Length ~ Petal.Length | Species^fe_2, base)
summary(res_comb)
fixef(res_comb)[[1]]

#
# Using leads/lags:
#

data(base_did)
# We need to set up the panel with the arg. panel.id
est1 = feols(y ~ l(x1, 0:1), base_did, panel.id = ~id+period)
est2 = feols(f(y) ~ l(x1, -1:1), base_did, panel.id = ~id+period)
etable(est1, est2, order = "f", drop = "Int")

#
# Using interactions:
#

data(base_did)
# We interact the variable 'period' with the variable 'treat'
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_did)

# Now we can plot the result of the interaction with coefplot
coefplot(est_did)
# You have many more example in coefplot help

#
# Instrumental variables
#

# To estimate Two stage least squares,
# insert a formula describing the endo. vars./instr. relation after a pipe:

data(fulton)

# Using exogenous control, 1 endogenous var. and 1 instrument
res_iv = feols(qty ~ t | price ~ speed2, fulton)

# The second stage is the default
summary(res_iv)

# To show the first stage:
summary(res_iv, stage = 1)

# To show both the first and second stages:
summary(res_iv, stage = 1:2)

# Adding a fixed-effect => IV formula always last!
res_iv_fe = feols(qty ~ t | day | price ~ speed2, fulton)

# With two instruments
res_iv2 = feols(qty ~ t | day | price ~ speed2 + wave2, fulton)

# Now there's two first stages => a fixest_multi object is returned
sum_res_iv2 = summary(res_iv2, stage = 1)

# You can navigate through it by subsetting:
sum_res_iv2[iv = 1]

# The stage argument also works in etable:
etable(res_iv, res_iv_fe, res_iv2, order = "endo")

etable(res_iv, res_iv_fe, res_iv2, stage = 1:2, order = c("endo", "inst"),
       group = list(control = "!endo|inst"))

#
# Multiple estimations:
#

# 6 estimations
est_mult = feols(c(Ozone, Solar.R) ~ Wind + Temp + csw0(Wind:Temp, Day), airquality)

# We can display the results for the first lhs:
etable(est_mult[lhs = 1])

# And now the second (access can be made by name)
etable(est_mult[lhs = "Solar.R"])

# Now we focus on the two last right hand sides
# (note that .N can be used to specify the last item)
etable(est_mult[rhs = 2:.N])

# Combining with split
est_split = feols(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)),
                  airquality, split = ~ Month)

# You can display everything at once with the print method
est_split

# Different way of displaying the results with "compact"
summary(est_split, "compact")

# You can still select which sample/LHS/RHS to display
est_split[sample = 1:2, lhs = 1, rhs = 1]

#
# Split sample estimations
#

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est  = feols(y ~ x.[1:3], base, split = ~species)
etable(est)

# You can select specific values with the %keep% and %drop% operators
# By default, partial matching is enabled. It should refer to a single variable.
est  = feols(y ~ x.[1:3], base, split = ~species %keep% c("set", "vers"))
etable(est)

# You can supply regular expression by using an @ first.
# regex can match several values.
est  = feols(y ~ x.[1:3], base, split = ~species %keep% c("@set|vers"))
etable(est)

#
# Argument sliding
#

# When the data set is set up globally, you can use the vcov argument implicitly

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

no_sliding = feols(y ~ x1 + x2, base, ~species)

# With sliding
setFixest_estimation(data = base)

# ~species is implicitly deduced to be equal to 'vcov'
sliding = feols(y ~ x1 + x2, ~species)

etable(no_sliding, sliding)

# Resetting the global options
setFixest_estimation(data = NULL)

#
# Formula expansions
#

# By default, the features of the xpd function are enabled in
# all fixest estimations
# Here's a few examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

# dot square bracket operator
feols(y ~ x.[1:3], base)

# fetching variables via regular expressions: ..("regex")
feols(y ~ ..("1|2"), base)

# NOTA: it also works for multiple LHS
mult1 = feols(x.[1:2] ~ y + species, base)
mult2 = feols(..("y|3") ~ x.[1:2] + species, base)
etable(mult1, mult2)

# Use .[, stuff] to include variables in functions:
feols(y ~ csw(x.[, 1:3]), base)

# Same for ..(, "regex")
feols(y ~ csw(..(,"x")), base)

Computes fit statistics of fixest objects

Description

Computes various fit statistics for fixest estimations.

Usage

fitstat(
  x,
  type,
  vcov = NULL,
  cluster = NULL,
  ssc = NULL,
  simplify = FALSE,
  verbose = TRUE,
  show_types = FALSE,
  frame = parent.frame(),
  ...
)

Arguments

x

A fixest estimation.

type

Character vector or one sided formula. The type of fit statistic to be computed. The classic ones are: n, rmse, r2, pr2, f, wald, ivf, ivwald. You have the full list in the details section or use show_types = TRUE. Further, you can register your own types with fitstat_register.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley, as well as covariance matrices computed externally. Finally, it accepts functions to compute the covariances. See the vcov documentation in the vignette.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

simplify

Logical, default is FALSE. By default a list is returned whose names are the selected types. If simplify = TRUE and only one type is selected, then the element is directly returned (i.e. will not be nested in a list).

verbose

Logical, default is TRUE. If TRUE, an object of class fixest_fitstat is returned (so its associated print method will be triggered). If FALSE a simple list is returned instead.

show_types

Logical, default is FALSE. If TRUE, only prompts all available types.

frame

An environment in which to evaluate variables, default is parent.frame(). Only used if the argument type is a formula and some values in the formula have to be extended with the dot square bracket operator. Mostly for internal use.

...

For internal use.

Value

By default an object of class fixest_fitstat is returned. Using verbose = FALSE returns a simple list. Finally, if only one type is selected, simplify = TRUE leads to the selected type being returned directly.
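The three return modes can be compared with a short sketch. The model is an arbitrary estimation on iris, and the object names (fs, fs_list, rmse_val) are chosen here for illustration.

```r
library(fixest)

est = feols(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)

# Default: an object of class fixest_fitstat, with its own print method
fs = fitstat(est, ~ rmse + r2)
class(fs)

# verbose = FALSE: a plain list named after the selected types
fs_list = fitstat(est, ~ rmse + r2, verbose = FALSE)
names(fs_list)

# simplify = TRUE with a single type: the element itself
rmse_val = fitstat(est, "rmse", simplify = TRUE)
rmse_val
```

The simplified form is the most convenient one when the statistic feeds further computations.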

Registering your own types

You can register custom fit statistics with the function fitstat_register.

Available types

The types are case sensitive, please use lower case only. The types available are:

n,ll,aic,bic,rmse:

The number of observations, the log-likelihood,the AIC, the BIC and the root mean squared error, respectively.

my:

Mean of the dependent variable.

g:

The degrees of freedom used to compute the t-test (it influences the p-values of the coefficients). When the VCOV is clustered, this value is equal to the minimum cluster size, otherwise, it is equal to the sample size minus the number of variables.

r2,ar2,wr2,awr2,pr2,apr2,wpr2,awpr2:

All r2 that can be obtained with the function r2. The a stands for 'adjusted', the w for 'within' and the p for 'pseudo'. Note that the order of the letters a, w and p does not matter. The pseudo R2s are McFadden's R2s (ratios of log-likelihoods).

theta:

The over-dispersion parameter in Negative Binomial models. Low values mean high overdispersion.

f,wf:

The F-tests of nullity of the coefficients. The w stands for 'within'. These types return the following values: stat, p, df1 and df2. If you want to display only one of these, use their name after a dot: e.g. f.stat will give the statistic of the F-test, or wf.p will give the p-values of the F-test on the projected model (i.e. projected onto the fixed-effects).

wald:

Wald test of joint nullity of the coefficients. This test always excludes the intercept and the fixed-effects. This type returns the following values: stat, p, df1, df2 and vcov. The element vcov reports the way the VCOV matrix was computed since it directly influences this statistic.

ivf,ivf1,ivf2,ivfall:

These statistics are specific to IV estimations. They report either the IV F-test (namely the Cragg-Donald F statistic in the presence of only one endogenous regressor) of the first stage (ivf or ivf1), of the second stage (ivf2) or of both (ivfall). The F-test of the first stage is commonly named weak instrument test. The value of ivfall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage F-test(s) to be displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation; otherwise, ivf1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1 and df2.

ivwald,ivwald1,ivwald2,ivwaldall:

These statistics are specific to IV estimations. They report either the IV Wald-test of the first stage (ivwald or ivwald1), of the second stage (ivwald2) or of both (ivwaldall). The Wald-test of the first stage is commonly named weak instrument test. Note that if the estimation was done with a robust VCOV and there is only one endogenous regressor, this is equivalent to the Kleibergen-Paap statistic. The value of ivwaldall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage Wald-test(s) to be displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation; otherwise, ivwald1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1, df2, and vcov.

cd:

The Cragg-Donald test for weak instruments.

kpr:

The Kleibergen-Paap test for weak instruments.

wh:

This statistic is specific to IV estimations. Wu-Hausman endogeneity test. H0 is the absence of endogeneity of the instrumented variables. It returns the following values: stat, p, df1, df2.

sargan:

Sargan test of overidentifying restrictions. H0: the instruments are not correlated with the second stage residuals. It returns the following values: stat, p, df.

lr,wlr:

Likelihood ratio and within likelihood ratio tests. It returns the following elements: stat, p, df. Concerning the within-LR test, note that, contrary to estimations with femlm or feNmlm, estimations with feglm/fepois need to estimate the model with fixed-effects only, which may prove time-consuming (depending on your model). Bottom line, if you really need the within-LR and estimate a Poisson model, use femlm instead of fepois (the former uses direct ML maximization, for which the fixed-effects-only model is a by-product).

Examples

data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Extracting the 'working' number of observations used to compute the pvalues
fitstat(gravity, "g", simplify = TRUE)

# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)

# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)

# For wald and wf, you could show the pvalue instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)

# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them

# We need: a) type name, b) the function to be applied
#          c) (optional) an alias

fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")

# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)

# Note that the custom stats we created can easily lead
# to errors, but that's another story!

Register custom fit statistics

Description

Enables the registration of custom fit statistics that can be easily summoned with the function fitstat.

Usage

fitstat_register(type, fun, alias = NULL, subtypes = NULL)

Arguments

type

A character scalar giving the type-name.

fun

A function to be applied to a fixest estimation. It must return either a scalar, or a list of unitary elements. If the number of elements returned is greater than 1, then each element must be named! If the fit statistic is not valid for a given estimation, a plain NA value should be returned.

alias

A (named) character vector. An alias to be used in lieu of the type name in the display methods (i.e. when used in print.fixest_fitstat or etable). If the function returns several values, i.e. sub-types, you can give an alias to these sub-types. The syntax is c("type" = "alias", "subtype_i" = "alias_i"), with "type" (resp. "subtype") the value of the argument type (resp. subtypes). You can also give an alias encompassing the type and sub-type with the syntax c("type.subtype_i" = "alias").

subtypes

A character vector giving the name of each element returned by the function fun. This is only used when the function returns more than one value. Note that you can use the shortcut "test" when the sub-types are "stat", "p" and "df"; and "test2" when these are "stat", "p", "df1" and "df2".

Details

If there are several components to the computed statistics (i.e. the function returns several elements), then using the argument subtypes, giving the names of each of these components, is mandatory. This is to ensure that the statistic can be used as any other built-in statistic (and there are too many edge cases impeding automatic deduction).

Author(s)

Laurent Berge

Examples

# An estimation
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
est = feols(y ~ x1 + x2 | species, base)

#
# single valued tests
#

# say you want to add the coefficient of variation of the dependent variable
cv = function(est){
  y = model.matrix(est, type = "lhs")
  sd(y)/mean(y)
}

# Now we register the routine
fitstat_register("cvy", cv, "Coef. of Variation (dep. var.)")

# now we can summon the registered routine with its type ("cvy")
fitstat(est, "cvy")

#
# Multi valued tests
#

# Let's say you want a Wald test with an heteroskedasticity robust variance

# First we create the function
hc_wald = function(est){
  w = wald(est, keep = "!Intercept", print = FALSE, se = "hetero")
  head(w, 4)
}
# This test returns a vector of 4 elements: stat, p, df1 and df2

# Now we register the routine
fitstat_register("hc_wald", hc_wald, "Wald (HC1)", "test2")

# You can access the statistic, as before
fitstat(est, "hc_wald")

# But you can also access the sub elements
fitstat(est, "hc_wald.p")

Extracts fitted values from a fixest fit

Description

This function extracts the fitted values from a model estimated with femlm, feols or feglm. The fitted values that are returned are the expected predictor.

Usage

## S3 method for class 'fixest'
fitted(object, type = c("response", "link"), na.rm = TRUE, ...)

## S3 method for class 'fixest'
fitted.values(object, type = c("response", "link"), na.rm = TRUE, ...)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

type

Character either equal to "response" (default) or "link". If type = "response", then the output is at the level of the response variable, i.e. it is the expected predictor $E(Y|X)$. If "link", then the output is at the level of the explanatory variables, i.e. the linear predictor $X \cdot \beta$.

na.rm

Logical, default is TRUE. If FALSE the number of observations returned will be the number of observations in the original data set, otherwise it will be the number of observations used in the estimation.

...

Not currently used.

Details

This function returns the expected predictor of a fixest fit. The likelihood functions are detailed in the femlm help page.

Value

It returns a numeric vector of length the number of observations used to estimate the model.

If type = "response", the value returned is the expected predictor, i.e. the expected value of the dependent variable for the fitted model: $E(Y|X)$. If type = "link", the value returned is the linear predictor of the fitted model, that is $X \cdot \beta$ (recall that $E(Y|X) = f(X \cdot \beta)$).
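The relation between the two types can be checked on a Poisson fit, for which the function f is the exponential. This is a minimal sketch on iris; the object names are chosen here for illustration.

```r
library(fixest)

# Poisson estimation (the default family of femlm) with Species fixed-effects
est_pois = femlm(Sepal.Length ~ Sepal.Width + Petal.Length | Species, iris)

# Expected predictor E(Y|X) and linear predictor X.beta
mu  = fitted(est_pois, type = "response")
eta = fitted(est_pois, type = "link")

# For the Poisson model with its log link, E(Y|X) = exp(X.beta)
all.equal(as.numeric(mu), as.numeric(exp(eta)), tolerance = 1e-6)
```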

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. resid.fixest, predict.fixest, summary.fixest, vcov.fixest, fixef.fixest.

Examples

# simple estimation on iris data, using "Species" fixed-effects
res_poisson = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
                    Petal.Width | Species, iris)

# we extract the fitted values
y_fitted_poisson = fitted(res_poisson)

# Same estimation but in OLS (Gaussian family)
res_gaussian = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
                    Petal.Width | Species, iris, family = "gaussian")

y_fitted_gaussian = fitted(res_gaussian)

# comparison of the fit for the two families
plot(iris$Sepal.Length, y_fitted_poisson)
points(iris$Sepal.Length, y_fitted_gaussian, col = 2, pch = 2)

Extract the Fixed-Effects from a fixest estimation.

Description

This function retrieves the fixed effects from a fixest estimation. It is useful only when there are one or more fixed-effect dimensions.

Usage

## S3 method for class 'fixest'
fixef(
  object,
  notes = getFixest_notes(),
  sorted = TRUE,
  nthreads = getFixest_nthreads(),
  fixef.tol = 1e-05,
  fixef.iter = 10000,
  ...
)

Arguments

object

A fixest estimation (e.g. obtained using feols or feglm).

notes

Logical. Whether to display a note when the fixed-effects coefficients are not regular.

sorted

Logical, default is TRUE. Whether to order the fixed-effects by their names. If FALSE, then the order used in the demeaning algorithm is used.

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function setFixest_nthreads.

fixef.tol

Precision used to obtain the fixed-effects. Defaults to 1e-5. It corresponds to the maximum absolute difference allowed between two coefficients of successive iterations. Argument fixef.tol cannot be lower than 10000*.Machine$double.eps. Note that this parameter is dynamically controlled by the algorithm.

fixef.iter

Maximum number of iterations in fixed-effects algorithm(only in use for 2+ fixed-effects). Default is 10000.

...

Not currently used.

Details

If the fixed-effect coefficients are not regular, then several reference points need to be set: this means that the fixed-effects coefficients cannot be directly interpreted. If this is the case, then a warning is raised.

Value

A list containing the vectors of the fixed effects.

If there is more than 1 fixed-effect, then the attribute “references” is created. This is a vector whose length is the number of fixed-effects; each element contains the number of coefficients set as references. By construction, the elements of the first fixed-effect dimension are never set as references. In the presence of regular fixed-effects, there should be Q-1 references (with Q the number of fixed-effects).

Author(s)

Laurent Berge

See Also

plot.fixest.fixef. See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effect coefficients, and the function etable to visualize the results of multiple estimations.

Examples

data(trade)
# We estimate the effect of distance on trade => we account for 3 fixed-effects
est_pois = femlm(Euros ~ log(dist_km)|Origin+Destination+Product, trade)
# Obtaining the fixed-effects coefficients:
fe_trade = fixef(est_pois)
# The fixed-effects of the first fixed-effect dimension:
head(fe_trade$Origin)
# Summary information:
summary(fe_trade)
# Plotting them:
plot(fe_trade)
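The “references” attribute described in the Value section can be inspected directly. This is a small sketch under the assumption that the attribute is attached to the returned list, as stated above; the name refs is illustrative.

```r
library(fixest)
data(trade)

est_pois = femlm(Euros ~ log(dist_km) | Origin + Destination + Product, trade)
fe_trade = fixef(est_pois)

# one vector of coefficients per fixed-effect dimension
names(fe_trade)

# number of coefficients set as references in each dimension;
# the first dimension should never contain references
refs = attr(fe_trade, "references")
refs
```

If the fixed-effects are regular, the elements of refs should sum to Q-1, here 2.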

Functions exported from nlme to implement fixest methods

Description

The package fixest uses the fixef method from nlme. Unfortunately, re-exporting this method is required in order not to attach package nlme.

Details

Note

I could find this workaround thanks to the package plm.


Retrieves the data set used for a fixest estimation

Description

Retrieves the original data set used to estimate a fixest or fixest_multi model. Note that, by default, this is the original data set and not the data used for the estimation (i.e. it can have more rows).

Usage

fixest_data(x, sample = "original")

Arguments

x

An object of class fixest or fixest_multi. For example obtained from feols or feglm.

sample

Either "original" (default) or "estimation". If equal to "original", it matches the original data set. If equal to "estimation", the rows of the data set returned match the observations used for the estimation.

Value

It returns a data.frame equal to the original data set used for the estimation, when the function was called.

If sample = "estimation", only the lines used for the estimation are returned.

In case of a fixest_multi object, it returns the data set of the first estimation object. So in that case it does not make sense to use sample = "estimation" since the samples may be inconsistent across the different estimations.

Examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
base$y[1:5] = NA
est = feols(y ~ x1 + x2, base)
# the original data set
head(fixest_data(est))
# the data set, with only the lines used for the estimation
head(fixest_data(est, sample = "est"))
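The difference between the two values of sample shows up in the number of rows. A minimal sketch, reusing the setup of the example above (the names n_original and n_estimation are illustrative):

```r
library(fixest)

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
base$y[1:5] = NA

# the 5 observations with a missing dependent variable are dropped
est = feols(y ~ x1 + x2, base)

# all the rows of the data set passed to feols
n_original = nrow(fixest_data(est))

# only the rows used in the estimation
n_estimation = nrow(fixest_data(est, sample = "estimation"))

c(original = n_original, estimation = n_estimation, nobs = nobs(est))
```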

Permanently removes the fixest package startup message

Description

Package startup messages can be very annoying, although sometimes they can be necessary. Use this function to prevent fixest's package startup message from popping up when loading. This will be specific to your current project.

Usage

fixest_startup_msg(x)

Arguments

x

Logical, no default. If FALSE, the package startup message is removed.

Details

Note that this function is introduced to cope with the first fixest startup message (in version 0.9.0).

This function works only with R >= 4.0.0. There are no startup messages for R < 4.0.0.


Extract the formula of a fixest fit

Description

This function extracts the formula from a fixest estimation (obtained with femlm, feols or feglm). If the estimation was done with fixed-effects, they are added in the formula after a pipe (“|”). If the estimation was done with a non-linear in parameters part, then this will be added in the formula in between I().

Usage

## S3 method for class 'fixest'
formula(x, type = "full", fml.update = NULL, fml.build = NULL, ...)
## S3 method for class 'fixest_multi'
formula(x, type = "full", fml.update = NULL, fml.build = NULL, ...)

Arguments

x

An object of class fixest. Typically the result of a femlm, feols or feglm estimation.

type

A character scalar. Default is type = "full" which gives back a formula containing the linear part of the model along with the fixed-effects (if any) and the IV part (if any). Here is a description of the other types:

  • full.noiv: the full formula without the IV part

  • full.nofixef.noiv: the full formula without the IV nor the fixed-effects part

  • lhs: a one-sided formula with the dependent variable

  • rhs: a one-sided formula of the right hand side without the IVs (if any)

  • rhs.nofixef or indep: a one-sided formula of the right hand side without the fixed-effects nor IVs (if any), it is equivalent to the independent variables

  • NL: a one-sided formula with the non-linear part (if any)

  • fixef: a one-sided formula containing the fixed-effects

  • iv: a two-sided formula containing the endogenous variables (left) and the instruments (right)

  • iv.endo: a one-sided formula of the endogenous variables

  • iv.inst: a one-sided formula of the instruments

  • iv.reduced: a two-sided formula representing the reduced form, that is y ~ exo + inst

fml.update

A formula representing the changes to be made to the original formula. By default it is NULL. Use a dot to refer to the previous variables in the current part. For example: . ~ . + xnew will add the variable xnew as an explanatory variable. Note that the previous fixed-effects (FEs) and IVs are implicitly forwarded. To rerun without the FEs or the IVs, you need to set them to 0 in their respective slot. For example, assume the original formula is: y ~ x | fe | endo ~ inst. Passing . ~ . + xnew to fml.update leads to y ~ x + xnew | fe | endo ~ inst (FEs and IVs are forwarded). To add xnew and remove the IV part: use . ~ . + xnew | . | 0 which leads to y ~ x + xnew | fe.

fml.build

A formula or NULL (default). You can create a new formula based on the parts of the formula of the object in x. In this argument you have access to these specific variables:

  • .: to refer to the part of the original formula

  • .lhs: to refer to the dependent variable

  • .indep: to refer to the independent variables (excluding the fixed-effects)

  • .fixef: to refer to the fixed-effects

  • .endo: to refer to endogenous variables in an IV estimation

  • .inst: to refer to instruments in an IV estimation

For example, if the original estimation was y ~ x1 | z ~ inst, then fml.build = . ~ .endo + . leads to y ~ z + x1.

...

Not currently used.

Details

The arguments type, fml.update and fml.build are exclusive: they cannot be used at the same time.

Value

It returns either a one-sided or a two-sided formula.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. model.matrix.fixest, update.fixest, summary.fixest, vcov.fixest.

Examples

# example estimation with IVs and FEs
base = setNames(iris, c("y", "x1", "endo", "instr", "species"))
est = feols(y ~ x1 | species | endo ~ instr, base)
# the full formula
formula(est)
# idem without the IVs nor the FEs
formula(est, "full.nofixef.noiv")
# the reduced form
formula(est, "iv.reduced")
# the IV relation only
formula(est, "iv")
# the dependent variable => one-sided formula
formula(est, "lhs")
# using update, we add x1^2 as an independent variable:
formula(est, fml.update = . ~ . + x1^2)
# using build, see the difference => the FEs and the IVs are not inherited
formula(est, fml.build = . ~ . + x1^2)
# we can use some special variables
formula(est, fml.build = . ~ .endo + .indep)

Fulton Fish Market data

Description

This dataset has been taken from Jeff Wooldridge's textbook, in a modified version that appears in the wooldridge package.

Usage

data(fulton)

Format

fulton is a data frame with 97 observations and 12 variables named t, day, price, qty, speed2, wave2, speed3, wave3, price_asian, price_white, qty_asian, qty_white. Each row is a recording of the Fulton fish market sales on a given day.

Details

Source: K Graddy (1995), “Testing for Imperfect Competition at the Fulton Fish Market,” RAND Journal of Economics 26, 75-92.

Source

https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041


Hat values for fixest objects

Description

Computes the hat values for feols or feglm estimations.

Usage

## S3 method for class 'fixest'
hatvalues(model, exact = TRUE, boot.size = 1000, ...)

Arguments

model

A fixest object. For instance from feols or feglm.

exact

Logical scalar, default is TRUE. Whether the diagonals of the projection matrix should be calculated exactly. If FALSE, then it will be approximated using a JLA algorithm. See details. Unless you have a very large number of observations, it is recommended to keep the default value.

boot.size

Integer scalar or NULL, default is 1000. This is only used when exact == FALSE. This determines the number of bootstrap samples used to estimate the projection matrix. If equal to NULL, it falls back to the default value of 1000.

...

Not currently used.

Details

Hat values are not available for fenegbin, femlm and feNmlm estimations.

Hat values for generalized linear models are discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.

When exact == FALSE, the Johnson-Lindenstrauss approximation (JLA) algorithm is used, which approximates the diagonals of the projection matrix. For more precision (but a longer computing time), increase the value of boot.size. See Kline, Saggio, and Sølvsten (2020) for details.

Value

Returns a vector of the same length as the number of observations used in the estimation.

References

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Kline, P., Saggio, R., and Sølvsten, M. (2020). Leave‐Out Estimation of Variance Components. Econometrica.

Examples

est = feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
head(hatvalues(est))
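For an OLS model, the hat values are the diagonal of the projection matrix X (X'X)^{-1} X'. As a sanity check, the sketch below compares the exact values returned for a feols fit with those of the base R method applied to the equivalent lm fit; the names fit_lm, h_fixest and h_lm are illustrative.

```r
library(fixest)

# the same linear model, estimated with fixest and with base R
est = feols(Petal.Length ~ Petal.Width + Sepal.Width, iris)
fit_lm = lm(Petal.Length ~ Petal.Width + Sepal.Width, iris)

h_fixest = as.numeric(hatvalues(est))
h_lm = as.numeric(hatvalues(fit_lm))

# both methods should agree up to numerical precision
max(abs(h_fixest - h_lm))

# the hat values sum to the number of estimated coefficients (here 3)
sum(h_fixest)
```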

Create, or interact variables with, factors

Description

Treats a variable as a factor, or interacts a variable with a factor. Values to be dropped/kept from the factor can be easily set. Note that to interact fixed-effects, this function should not be used: instead use directly the syntax fe1^fe2.

Usage

i(factor_var, var, ref, keep, bin, ref2, keep2, bin2, ...)

Arguments

factor_var

A vector (of any type) that will be treated as a factor. You can set references (i.e. exclude values for which to create dummies) with the ref argument.

var

A variable of the same length as factor_var. This variable will be interacted with the factor in factor_var. It can be numeric or factor-like. To force a numeric variable to be treated as a factor, you can add the i. prefix to a variable name. For instance take a numeric variable x_num: i(x_fact, x_num) will treat x_num as numeric while i(x_fact, i.x_num) will treat x_num as a factor (it's a shortcut to as.factor(x_num)).

ref

A vector of values to be taken as references from factor_var. Can also be a logical: if TRUE, then the first value of factor_var will be removed. If ref is a character vector, partial matching is applied to values; use "@" as the first character to enable regular expression matching. See examples.

keep

A vector of values to be kept from factor_var (all others are dropped). By default they should be values from factor_var and if keep is a character vector partial matching is applied. Use "@" as the first character to enable regular expression matching instead.

bin

A list of values to be grouped, a vector, a formula, or the special values "bin::digit" or "cut::values". To create a new value from old values, use bin = list("new_value"=old_values) with old_values a vector of existing values. You can use .() for list(). It accepts regular expressions, but they must start with an "@", like in bin="@Aug|Dec". It accepts one-sided formulas which must contain the variable x, e.g. bin=list("<2" = ~x < 2). The names of the list are the new names. If the new name is missing, the first value matched becomes the new name. In the name, adding "@d", with d a digit, will relocate the value in position d: useful to change the position of factors. Use "@" as first item to make subsequent items be located first in the factor. Feeding in a vector is like using a list without name and only a single element.

If the vector is numeric, you can use the special value "bin::digit" to group every digit element. For example if x represents years, using bin="bin::2" creates bins of two years. With any data, using "!bin::digit" groups every digit consecutive values starting from the first value. Using "!!bin::digit" is the same but starting from the last value.

With numeric vectors you can: a) use "cut::n" to cut the vector into n equal parts, b) use "cut::a]b[" to create the following bins: [min, a], ]a, b[, [b, max]. The latter syntax is a sequence of number/quartile (q0 to q4)/percentile (p0 to p100) followed by an open or closed square bracket. You can add custom bin names by adding them in the character vector after 'cut::values'. See details and examples. Dot square bracket expansion (see dsb) is enabled.

ref2

A vector of values to be dropped from var. By default they should be values from var and if ref2 is a character vector partial matching is applied. Use "@" as the first character to enable regular expression matching instead.

keep2

A vector of values to be kept from var (all others are dropped). By default they should be values from var and if keep2 is a character vector partial matching is applied. Use "@" as the first character to enable regular expression matching instead.

bin2

A list or vector defining the binning of the second variable. See help for the argument bin for details (or look at the help of the function bin). You can use .() for list().

...

Not currently used.

Details

To interact fixed-effects, this function should not be used: instead use directly the syntax fe1^fe2 in the fixed-effects part of the formula. Please see the details and examples in the help page of feols.

Value

It returns a matrix with number of rows the length of factor_var. If there is no interacted variable or it is interacted with a numeric variable, the number of columns is equal to the number of cases contained in factor_var minus the reference(s). If the interacted variable is a factor, the number of columns is the number of combined cases between factor_var and var.

Author(s)

Laurent Berge

See Also

iplot to plot interactions or factors created with i(), feols for OLS estimation with multiple fixed-effects.

See the function bin for binning variables.

Examples

#
# Simple illustration
#
x = rep(letters[1:4], 3)[1:10]
y = rep(1:4, c(1, 2, 3, 4))
# interaction
data.frame(x, y, i(x, y, ref = TRUE))
# without interaction
data.frame(x, i(x, "b"))
# you can interact factors too
z = rep(c("e", "f", "g"), c(5, 3, 2))
data.frame(x, z, i(x, z))
# to force a numeric variable to be treated as a factor: use i.
data.frame(x, y, i(x, i.y))
# Binning
data.frame(x, i(x, bin = list(ab = c("a", "b"))))
# Same as before but using .() for list() and a regular expression
# note that to trigger a regex, you need to use an @ first
data.frame(x, i(x, bin = .(ab = "@a|b")))
#
# In fixest estimations
#
data(base_did)
# We interact the variable 'period' with the variable 'treat'
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_did)
# => plot only interactions with iplot
iplot(est_did)
# Using i() for factors
est_bis = feols(y ~ x1 + i(period, keep = 3:6) + i(period, treat, 5) | id, base_did)
# we plot the second set of variables created with i()
# => we need to use keep (otherwise only the first one is represented)
coefplot(est_bis, keep = "trea")
# => special treatment in etable
etable(est_bis, dict = c("6" = "six"))
#
# Interact two factors
#
# We use the i. prefix to consider week as a factor
data(airquality)
aq = airquality
aq$week = aq$Day %/% 7 + 1
# Interacting Month and week:
res_2F = feols(Ozone ~ Solar.R + i(Month, i.week), aq)
# Same but dropping the 5th Month and 1st week
res_2F_bis = feols(Ozone ~ Solar.R + i(Month, i.week, ref = 5, ref2 = 1), aq)
etable(res_2F, res_2F_bis)
#
# Binning
#
data(airquality)
feols(Ozone ~ i(Month, bin = "bin::2"), airquality)
feols(Ozone ~ i(Month, bin = list(summer = 7:9)), airquality)
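The Value section states that, without an interacted variable, the number of columns equals the number of cases in factor_var minus the reference(s). The sketch below checks this on a small vector when i() is called outside of an estimation; the object names (m_all, m_ref, m_keep) are illustrative.

```r
library(fixest)

# a character vector with 4 distinct values: "a", "b", "c", "d"
x = rep(letters[1:4], 3)[1:10]

# no reference: one column per value
m_all = i(x)

# "a" taken as reference: one column fewer
m_ref = i(x, ref = "a")

# only "b" and "c" are kept: two columns
m_keep = i(x, keep = c("b", "c"))

dim(m_all)
dim(m_ref)
dim(m_keep)
```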

Lags a variable using a formula

Description

Lags a variable using panel id + time identifiers in a formula.

Usage

## S3 method for class 'formula'
lag(
  x,
  k = 1,
  data,
  time.step = NULL,
  fill = NA,
  duplicate.method = "none",
  ...
)
lag_fml(
  x,
  k = 1,
  data,
  time.step = NULL,
  fill = NA,
  duplicate.method = "none",
  ...
)

Arguments

x

A formula of the type var ~ id + time where var is the variable to be lagged, id is a variable representing the panel id, and time is the time variable of the panel.

k

An integer giving the number of lags. Default is 1. For leads, just use a negative number.

data

Optional, the data.frame in which to evaluate the formula. If not provided, variables will be fetched in the current environment.

time.step

The method to compute the lags, default is NULL (which means automatically set). Can be equal to: "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the largest common divisor between consecutive time periods is used (typically if the time variable represents years, it will be 1). This method can apply only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive" then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.

fill

Scalar. How to fill the observations without defined lead/lag values. Default is NA.

duplicate.method

If several observations have the same id and time values, then the notion of lag is not defined for them. If duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.

...

Not currently used.

Value

It returns a vector of the same type and length as the variable to be lagged in the formula.

Functions

Author(s)

Laurent Berge

See Also

Alternatively, the function panel changes a data.frame into a panel from which the functions l and f (creating lags and leads) can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).

Examples

# simple example with an unbalanced panel
base = data.frame(id = rep(1:2, each = 4),
                  time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)
base$lag1 = lag(x~id+time,  1, base) # lag 1
base$lead1 = lag(x~id+time, -1, base) # lead 1
base$lag2_fill0 = lag(x~id+time, 2, base, fill = 0)
# with time.step = "consecutive"
base$lag1_consecutive = lag(x~id+time, 1, base, time.step = "consecutive")
#   => works for indiv. 2 because 9 (resp. 6) is consecutive to 6 (resp. 4)
base$lag1_within.consecutive = lag(x~id+time, 1, base, time.step = "within")
#   => now two consecutive years within each indiv is one lag
print(base)
# Argument time.step = "consecutive" is
# mostly useful when the time variable is not a number:
# e.g. c("1991q1", "1991q2", "1991q3") etc
# with duplicates
base_dup = data.frame(id = rep(1:2, each = 4),
                      time = c(1, 1, 1, 2, 1, 2, 2, 3), x = 1:8)
# Error because of duplicate values for (id, time)
try(lag(x~id+time, 1, base_dup))
# Error is bypassed, lag corresponds to first occurrence of (id, time)
lag(x~id+time, 1, base_dup, duplicate.method = "first")
# Playing with time steps
base = data.frame(id = rep(1:2, each = 4),
                  time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)
# time step: 0.5 (here equivalent to lag of 1)
lag(x~id+time, 2, base, time.step = 0.5)
# Error: wrong time step
try(lag(x~id+time, 2, base, time.step = 7))
# Adding NAs + unsorted IDs
base = data.frame(id = rep(1:2, each = 4),
                  time = c(4, NA, 3, 1, 2, NA, 1, 3), x = 1:8)
base$lag1 = lag(x~id+time, 1, base)
base$lag1_within = lag(x~id+time, 1, base, time.step = "w")
base_bis = base[order(base$id, base$time),]
print(base_bis)
# You can create variables without specifying the data within data.table:
if(require("data.table")){
  base = data.table(id = rep(1:2, each = 3), year = 1990 + rep(1:3, 2), x = 1:6)
  base[, x.l1 := lag(x~id+year, 1)]
}
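To make the difference between time.step = "unitary" and time.step = "consecutive" concrete, here is a small sketch on a panel with a gap in the time variable (periods 1, 2 and 4 for each individual). The data and the column names are made up for the illustration.

```r
library(fixest)

# two individuals observed at times 1, 2 and 4 (time 3 is missing)
base = data.frame(id = rep(1:2, each = 3),
                  time = rep(c(1, 2, 4), 2),
                  x = 1:6)

# unitary step: the lag of time 4 is time 3, which is not observed
lag_unit = lag_fml(x ~ id + time, 1, base, time.step = "unitary")

# consecutive: the lag of time 4 is the previous observed period, time 2
lag_cons = lag_fml(x ~ id + time, 1, base, time.step = "consecutive")

# leads are obtained with a negative k
lead_unit = lag_fml(x ~ id + time, -1, base, time.step = "unitary")

cbind(base, lag_unit, lag_cons, lead_unit)
```

With the unitary step the observation at time 4 has no defined lag, while with "consecutive" it takes the value observed at time 2.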

Extracts the log-likelihood

Description

This function extracts the log-likelihood from a fixest estimation.

Usage

## S3 method for class 'fixest'
logLik(object, ...)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

...

Not currently used.

Details

This function extracts the log-likelihood based on the model fit. You can have more information on the likelihoods in the details of the function femlm.

Value

It returns a numeric scalar.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. Other statistics functions: AIC.fixest, BIC.fixest.

Examples

# simple estimation on iris data with "Species" fixed-effects
res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
            Petal.Width | Species, iris)
nobs(res)
logLik(res)
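The log-likelihood is the building block of the information criteria mentioned in See Also. The sketch below recomputes them by hand, assuming that the number of parameters (fixed-effects included) is stored in the element nparams of the estimation object and that AIC.fixest and BIC.fixest follow the usual definitions; treat both points as assumptions to verify against your installed version.

```r
library(fixest)

res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
            Petal.Width | Species, iris)

ll = as.numeric(logLik(res))

# assumption: number of parameters as counted by fixest
k = res$nparams

# textbook definitions of the two criteria
aic_manual = -2 * ll + 2 * k
bic_manual = -2 * ll + log(nobs(res)) * k

c(manual = aic_manual, fixest = AIC(res))
c(manual = bic_manual, fixest = BIC(res))
```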

Design matrix of a fixest object

Description

This function creates the left-hand-side or the right-hand-side(s) of a femlm, feols or feglm estimation.

Usage

## S3 method for class 'fixest'
model.matrix(
  object,
  data = NULL,
  type = "rhs",
  sample = "estimation",
  na.rm = FALSE,
  subset = FALSE,
  as.matrix = FALSE,
  as.df = FALSE,
  collin.rm = TRUE,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

data

A data.frame or NULL (the default). If missing or NULL, then the original data is obtained by evaluating the call.

type

Character vector or one sided formula, default is "rhs". Contains the type of matrix/data.frame to be returned. Possible values are: "lhs", "rhs", "fixef", "iv.rhs1" (1st stage RHS), "iv.rhs2" (2nd stage RHS), "iv.endo" (endogenous vars.), "iv.exo" (exogenous vars), "iv.inst" (instruments).

sample

Character scalar equal to "estimation" (default) or "original". Only used when data=NULL (i.e. the original data is requested). By default, only the observations effectively used in the estimation are returned (i.e. it excludes the observations with NA values, those fully explained by the fixed-effects (FE), and those dropped due to NAs in the weights).

If sample="original", all the observations are returned. In that case, if you use na.rm=TRUE (which is not the default), you can withdraw the observations with NA values (and keep the ones fully explained by the FEs).

na.rm

Logical scalar, default is FALSE. Should observations with NAs be removed from the resulting matrix or data.frame? Note that if data=NULL

subset

Logical scalar or character vector. Default is FALSE. If TRUE, then the matrix created will be restricted only to the variables contained in the argument data, which can then contain a subset of the variables used in the estimation. If a character vector, then only the variables matching the elements of the vector via regular expressions will be created.

as.matrix

Logical scalar, default is FALSE. Whether to coerce the result to a matrix.

as.df

Logical scalar, default is FALSE. Whether to coerce the result to a data.frame.

collin.rm

Logical scalar, default is TRUE. Only used when data=NULL (i.e. the data used in the estimation is requested). Whether to remove variables that were found to be collinear during the estimation. Beware: it does not perform a collinearity check.

...

Not currently used.

Value

It returns either a vector, a matrix or a data.frame. It returns a vector for the dependent variable ("lhs"), a data.frame for the fixed-effects ("fixef") and a matrix for any other type.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. formula.fixest, update.fixest, summary.fixest, vcov.fixest.

Examples

# we use a data set with NAs and fixed-effect singletons
base = setNames(iris, c("y", "x1", "x2", "x3", "fe"))
# adding NAs
base$x1[1:4] = NA
# adding singletons
base$fe = as.character(base$fe)
base$fe[10 + 1:5] = letters[1:5]
# OLS estimation where we remove singletons
est = feols(y ~ x1 + poly(x2, 2) | fe, base, fixef.rm = "singleton")
# by default, we have the data set used in the estimation
head(model.matrix(est))
nrow(model.matrix(est))
# to have the original data set: we need to use sample="original"
head(model.matrix(est, sample = "original"))
nrow(model.matrix(est, sample = "original"))
# we can drop only the NA values (and not the singletons) with na.rm=TRUE
head(model.matrix(est, sample = "original", na.rm = TRUE))
nrow(model.matrix(est, sample = "original", na.rm = TRUE))
#
# Illustration of subset
#
# subset => character vector
head(model.matrix(est, subset = "x1"))
# subset => TRUE, only works with data argument!!
head(model.matrix(est, data = base[, "x1", drop = FALSE], subset = TRUE))

Extracts the models tree from a fixest_multi object

Description

Extracts the meta information on all the models contained in a fixest_multi estimation.

Usage

models(x, simplify = FALSE)

Arguments

x

A fixest_multi object, obtained from a fixest estimation leading to multiple results.

simplify

Logical, default is FALSE. The default behavior is to display all the meta information, even if they are identical across models. By using simplify = TRUE, only the information with some variation is kept.

Value

It returns a data.frame whose first column (named id) is the index of the models and the other columns contain the information specific to each model (e.g. which sample, which RHS, which dependent variable, etc).

See Also

Multiple estimations in feols, n_models

Examples

# a multiple estimation
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x.[, 1:3]), base, fsplit = ~species)
# All the meta information
models(est)
# Illustration: Why use simplify
est_sub = est[sample = 2]
models(est_sub)
models(est_sub, simplify = TRUE)

Gets the dimension of fixest_multi objects

Description

Obtains the number of unique models of a fixest_multi object, depending on the type requested.

Usage

n_models(
  x,
  lhs = FALSE,
  rhs = FALSE,
  sample = FALSE,
  fixef = FALSE,
  iv = FALSE
)

Arguments

x

A fixest_multi object, obtained e.g. from feols.

lhs

Logical scalar, default is FALSE. If TRUE, the number of different left hand sides is returned.

rhs

Logical scalar, default is FALSE. If TRUE, the number of different right hand sides is returned.

sample

Logical scalar, default is FALSE. If TRUE, the number of different samples is returned.

fixef

Logical scalar, default is FALSE. If TRUE, the number of different types of fixed-effects is returned.

iv

Logical scalar, default is FALSE. If TRUE, the number of different IV stages is returned.

Value

It returns an integer scalar. If no argument is provided, the total number ofmodels is returned.

See Also

Multiple estimations in feols, models

Examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x1, x2, x3), base, fsplit = ~species)
# there are 3 different RHSs and 4 different samples
models(est)
# We can obtain these numbers with n_models
n_models(est, rhs = TRUE)
n_models(est, sample = TRUE)
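When no argument is provided, the total number of models is returned. In the setting of the example above, this total should match the number of rows of models(est), i.e. the 3 right hand sides times the 4 samples. A minimal sketch (the names n_total, n_rhs and n_sample are illustrative):

```r
library(fixest)

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

# 3 cumulative right hand sides x 4 samples (full sample + 3 species)
est = feols(y ~ csw(x1, x2, x3), base, fsplit = ~species)

n_total = n_models(est)
n_rhs = n_models(est, rhs = TRUE)
n_sample = n_models(est, sample = TRUE)

c(total = n_total, rhs = n_rhs, sample = n_sample)
```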

Prints the number of unique elements in a data set

Description

This utility tool displays the number of unique elements in one or multiple data.frames as well as their number of NA values.

Usage

n_unik(x)
## S3 method for class 'vec_n_unik'
print(x, ...)
## S3 method for class 'list_n_unik'
print(x, ...)

Arguments

x

A formula, with data set names on the LHS and variables on the RHS, like data1 + data2 ~ var1 + var2. The following special variables are admitted: "." to get default values, ".N" for the number of observations, ".U" for the number of unique rows, ".NA" for the number of rows with at least one NA. Variables can be combined with "^", e.g. df~id^period; use id%^%period to also include the terms on both sides. Note that using : and * is equivalent to ^ and %^%. Sub select with id[cond], when doing so id is automatically included. Conditions can be chained, as in id[cond1, cond2]. Use NA(x, y) in conditions instead of is.na(x) | is.na(y). Use the !! operator to have both a condition and its opposite. To compare the keys in two data sets, use data1:data2. If not a formula, x can be: a vector (displays the number of unique values); a data.frame (default values are displayed), or a "sum" of data sets like in x = data1 + data2, in which case it is equivalent to the formula data1 + data2 ~ . (with a dot on the right hand side).

...

Not currently used.

Value

It returns a vector containing the number of unique values per element. If several data sets were provided, a list of the same length as the number of data sets is returned, each element being a vector of unique values.

Special values and functions

In the formula, you can use the following special values:".",".N",".U", and".NA".

"."

Accesses the default values. If there is only one data set and the data set is not a data.table, then the default is to display the number of observations and the number of unique rows. If the data is a data.table, the number of unique items in the key(s) is displayed instead of the number of unique rows (if the table has keys of course). If there are two or more data sets, then the default is to display the unique items for: a) the variables common across all data sets, if there are fewer than 4, and b) if no variable is shown in a), the variables common across at least two data sets, provided there are fewer than 5. If the data sets are data tables, the keys are also displayed on top of the common variables. In any case, the number of observations is always displayed.

".N"

Displays the number of observations.

".U"

Displays the number of unique rows.

".NA"

Displays the number of rows with at least one NA.

The NA function

The special function NA is an equivalent to is.na but can handle several variables. For instance, NA(x, y) is equivalent to is.na(x) | is.na(y). You can add as many variables as you want as arguments. If no argument is provided, as in NA(), it is identical to having all the variables of the data set as argument.

Combining variables

Use the "hat", "^", operator to combine several variables. For example id^period will display the number of unique values of id x period combinations.

Use the "super hat", "%^%", operator to also include the terms on both sides. For example, instead of writing id + period + id^period, you can simply write id%^%period.

Alternatively, you can use : for ^ and * for %^%.

Sub-selections

To show the number of unique values for sub samples, simply use []. For example, id[x > 10] will display the number of unique id for which x > 10.

Simple square brackets lead to the inclusion of both the variable and its subset. For example id[x > 10] is equivalent to id + id[x > 10]. To include only the sub selection, use double square brackets, as in id[[x > 10]].

You can add multiple sub selections at once, only separate them with a comma. For example id[x > 10, NA(y)] is equivalent to id[x > 10] + id[NA(y)].

Use the double negative operator, i.e. !!, to include both a condition and its opposite at once. For example id[!!x > 10] is equivalent to id[x > 10, !x > 10]. Double negative operators can be chained, like in id[!!cond1 & !!cond2], in which case the Cartesian product of all double negated conditions is returned.

Author(s)

Laurent Berge

Examples

data = base_did
data$x1.L1 = round(lag(x1~id+period, 1, data))
# By default, just the formatted number of observations
n_unik(data)
# Or the number of unique elements of a vector
n_unik(data$id)
# number of unique id values and id x period pairs
n_unik(data ~.N + id + id^period)
# use the %^% operator to include the terms on the two sides at once
# => same as id*period
n_unik(data ~.N + id %^% period)
# using sub selection with []
n_unik(data ~.N + period[!NA(x1.L1)])
# to show only the sub selection: [[]]
n_unik(data ~.N + period[[!NA(x1.L1)]])
# you can have multiple values in [],
# just separate them with a comma
n_unik(data ~.N + period[!NA(x1.L1), x1 > 7])
# to have both a condition and its opposite,
# use the !! operator
n_unik(data ~.N[!!NA(x1.L1)])
# the !! operator works within condition chains
n_unik(data ~.N[!!NA(x1.L1) & !!x1 > 7])
# Conditions can be distributed
n_unik(data ~ (id + period)[x1 > 7])
#
# Several data sets
#
# Typical use case: merging
# Let's create two data sets and merge them
data(base_did)
base_main = base_did
base_extra = sample_df(base_main[, c("id", "period")], 100)
base_extra$id[1:10] = 111:120
base_extra$period[11:20] = 11:20
base_extra$z = rnorm(100)
# You can use db1:db2 to compare the common keys in two data sets
n_unik(base_main:base_extra)
tmp = merge(base_main, base_extra, all.x = TRUE, by = c("id", "period"))
# You can show unique values for any variable, as before
n_unik(tmp + base_main + base_extra ~ id[!!NA(z)] + id^period)

Extracts the number of observations from a fixest object

Description

This function simply extracts the number of observations from a fixest object, obtained using the functions femlm, feols or feglm.

Usage

## S3 method for class 'fixest'
nobs(object, ...)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

...

Not currently used.

Value

It returns an integer.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.

Examples

# simple estimation on iris data with "Species" fixed-effects
res = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
            Petal.Width | Species, iris)

nobs(res)
logLik(res)

Extracts the observations used for the estimation

Description

This function extracts the observations used in a fixest estimation. The stats::case.names S3 method calls this function.

Usage

obs(x)

## S3 method for class 'fixest'
case.names(object, ...)

Arguments

x

A fixest object.

object

A fixest object.

...

Ignored

Value

It returns a simple vector of integers.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
base$y[1:5] = NA

# Split sample estimations
est_split = feols(y ~ x1, base, split = ~species)
(obs_setosa = obs(est_split[[1]]))
(obs_versi = obs(est_split[sample = "versi", drop = TRUE]))

est_versi = feols(y ~ x1, base, subset = obs_versi)

etable(est_split, est_versi)

Formatted object size

Description

Tool that returns a formatted object size, where the appropriate unit is automatically chosen.

Usage

osize(x)

## S3 method for class 'osize'
print(x, ...)

Arguments

x

Any R object.

...

Not currently used.

Value

Returns a character scalar.
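A comparable result can be obtained in base R with utils::object.size. The sketch below is a base R analog given for comparison only; the exact formatting produced by osize may differ.

```r
# Base R analog: size of an object with an automatically chosen unit
sz <- format(utils::object.size(iris), units = "auto")
sz
```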

Author(s)

Laurent Berge

Examples

osize(iris)

data(trade)
osize(trade)

Constructs a fixest panel data base

Description

Constructs a fixest panel data base out of a data.frame, which makes it possible to use leads and lags in fixest estimations and, if the data.frame is also a data.table::data.table, to create new variables from leads and lags.

Usage

panel(data, panel.id, time.step = NULL, duplicate.method = "none")

Arguments

data

A data.frame.

panel.id

The panel identifiers. Can either be: i) a one sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id=c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id='id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).

time.step

The method to compute the lags, default is NULL (which means automatically set). Can be equal to: "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the largest common divisor between consecutive time periods is used (typically if the time variable represents years, it will be 1). This method can apply only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive" then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.

duplicate.method

If several observations have the same id and time values, then the notion of lag is not defined for them. If duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.
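The difference between the "unitary" and "consecutive" time steps can be illustrated in base R. This is a minimal sketch of the idea as described for the argument time.step, not the package's implementation; the vector time and the helper gcd2 are hypothetical.

```r
# Hypothetical time variable with irregular gaps
time <- c(2000, 2002, 2004, 2010)
periods <- sort(unique(time))

# Greatest common divisor of two numbers (Euclid's algorithm)
gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)

# "unitary": the step is the largest common divisor of the gaps
# between consecutive time periods
step_unitary <- Reduce(gcd2, diff(periods))
pos_unitary  <- (time - min(time)) / step_unitary

# "consecutive": two successive periods are one lag apart, whatever the gap
pos_consecutive <- match(time, periods)

rbind(time, pos_unitary, pos_consecutive)
```

Under the "unitary" reading, 2010 sits three steps after 2004, so its first lag would point to a period that is absent from the data. Under the "consecutive" reading, the first lag of 2010 is 2004.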

Details

This function allows you to use leads and lags in a fixest estimation without having to provide the argument panel.id. It also offers more options on how to set the panel (with the additional arguments 'time.step' and 'duplicate.method').

When the initial data set was also a data.table, not all operations are supported and some may dissolve the fixest_panel. This is the case when creating subselections of the initial data with additional attributes (e.g. pdt[x>0, .(x, y, z)] would dissolve the fixest_panel, meaning only a data.table would be the result of the call).

If the initial data set was also a data.table, then you can create new variables from lags and leads using the functions l and f. See the example.

Value

It returns a data base identical to the one given in input, but with an additional attribute: "panel_info". This attribute contains vectors used to efficiently create lags/leads of the data. When the data is subselected, some bookkeeping is performed on the attribute "panel_info".

Author(s)

Laurent Berge

See Also

The estimation methods feols, fepois and feglm.

The functions l and f to create lags and leads within fixest_panel objects.

Examples

data(base_did)

# Setting a data set as a panel...
pdat = panel(base_did, ~id+period)

# ...then using the functions l and f
est1 = feols(y~l(x1, 0:1), pdat)
est2 = feols(f(y)~l(x1, -1:1), pdat)
est3 = feols(l(y)~l(x1, 0:3), pdat)
etable(est1, est2, est3, order = c("f", "^x"), drop="Int")

# or using the argument panel.id
feols(f(y)~l(x1, -1:1), base_did, panel.id = ~id+period)

# You can use panel.id in various ways:
pdat = panel(base_did, ~id+period)
# is identical to:
pdat = panel(base_did, c("id", "period"))
# and also to:
pdat = panel(base_did, "id,period")

# l() and f() can also be used within a data.table:
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Now since pdat_dt is also a data.table
  #   you can create lags/leads directly
  pdat_dt[, x1_l1 := l(x1)]
  pdat_dt[, c("x1_l1_fill0", "y_f2") := .(l(x1, fill = 0), f(y, 2))]
}

Displaying the most notable fixed-effects

Description

This function plots the 5 fixed-effects with the highest and lowest values, for each of the fixed-effect dimensions. It takes as an argument the fixed-effects obtained from the function fixef.fixest after an estimation using femlm, feols or feglm.

Usage

## S3 method for class 'fixest.fixef'
plot(x, n = 5, ...)

Arguments

x

An object obtained from the function fixef.fixest.

n

The number of fixed-effects to be drawn. Defaults to 5.

...

Not currently used.

Note that the fixed-effect coefficients might NOT be interpretable. This function is useful only for fully regular panels.

If the data are not regular in the fixed-effect coefficients, this means that several 'reference points' are set to obtain the fixed-effects, thereby impeding their interpretation. In this case a warning is raised.

Author(s)

Laurent Berge

See Also

fixef.fixest to extract the fixed-effects coefficients. See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, and the function etable to visualize the results of multiple estimations.

Examples

data(trade)

# We estimate the effect of distance on trade
# => we account for 3 fixed-effects
est_pois = femlm(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# obtaining the fixed-effects coefficients
fe_trade = fixef(est_pois)

# plotting them
plot(fe_trade)

Predict method forfixest fits

Description

This function obtains predictions from a fitted model estimated with femlm, feols or feglm.

Usage

## S3 method for class 'fixest'
predict(
  object,
  newdata,
  type = c("response", "link"),
  se.fit = FALSE,
  interval = "none",
  level = 0.95,
  fixef = FALSE,
  vs.coef = FALSE,
  sample = c("estimation", "original"),
  vcov = NULL,
  ssc = NULL,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

newdata

A data.frame containing the variables used to make the prediction. If not provided, the fitted expected (or linear if type = "link") predictors are returned.

type

Character either equal to "response" (default) or "link". If type="response", then the output is at the level of the response variable, i.e. it is the expected predictor E(Y|X). If "link", then the output is at the level of the explanatory variables, i.e. the linear predictor X · β.

se.fit

Logical, default is FALSE. If TRUE, the standard-error of the predicted value is computed and returned in a column named se.fit. This feature is only available for OLS models not containing fixed-effects.

interval

Either "none" (default), "confidence" or "prediction". What type of confidence interval to compute. Note that this feature is only available for OLS models not containing fixed-effects (GLM/ML models are not covered).

level

A numeric scalar in between 0.5 and 1, defaults to 0.95. Only used when the argument 'interval' is requested, it corresponds to the width of the confidence interval.

fixef

Logical scalar, default is FALSE. If TRUE, a data.frame is returned, with each column representing the fixed-effects coefficients for each observation in newdata, with as many columns as fixed-effects. Note that when there are variables with varying slopes, the slope coefficients are returned (i.e. they are not multiplied by the variable).

vs.coef

Logical scalar, default is FALSE. Only used when fixef = TRUE and when variables with varying slopes are present. If TRUE, the coefficients of the variables with varying slopes are returned instead of the coefficient multiplied by the value of the variables (default).

sample

Either "estimation" (default) or "original". This argument is only used when the argument 'newdata' is missing, and is ignored otherwise. If equal to "estimation", the vector returned matches the sample used for the estimation. If equal to "original", it matches the original data set (the observations not used for the estimation being filled with NAs).

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

...

Not currently used.
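The distinction between type = "response" and type = "link" can be checked with a base R model. The sketch below uses stats::glm as an analog, on the assumption that the two types carry the same meaning as in predict.fixest; it does not call fixest.

```r
# Poisson regression with base R (stats::glm), used here as an analog
fit <- glm(round(Sepal.Length) ~ Petal.Length, family = poisson, data = iris)

eta <- predict(fit, type = "link")      # linear predictor X * beta
mu  <- predict(fit, type = "response")  # expected predictor E(Y|X)

# With the log link of the Poisson family, E(Y|X) = exp(X * beta)
all.equal(exp(eta), mu)
```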

Value

It returns a numeric vector of length equal to the number of observations in argument newdata. If newdata is missing, it returns a vector of the same length as the estimation sample, except if sample = "original", in which case the length of the vector will match that of the original data set (which may or may not coincide with the estimation sample). If fixef = TRUE, a data.frame is returned. If se.fit = TRUE or interval != "none", the object returned is a data.frame with the following columns: fit, se.fit, and, if CIs are requested, ci_low and ci_high.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. update.fixest, summary.fixest, vcov.fixest, fixef.fixest.

Examples

# Estimation on iris data
res = fepois(Sepal.Length ~ Petal.Length | Species, iris)

# what would be the prediction if the data was all setosa?
newdata = data.frame(Petal.Length = iris$Petal.Length, Species = "setosa")
pred_setosa = predict(res, newdata = newdata)

# Let's look at it graphically
plot(c(1, 7), c(3, 11), type = "n", xlab = "Petal.Length",
     ylab = "Sepal.Length")

newdata = iris[order(iris$Petal.Length), ]
newdata$Species = "setosa"
lines(newdata$Petal.Length, predict(res, newdata))

# versicolor
newdata$Species = "versicolor"
lines(newdata$Petal.Length, predict(res, newdata), col=2)

# virginica
newdata$Species = "virginica"
lines(newdata$Petal.Length, predict(res, newdata), col=3)

# The original data
points(iris$Petal.Length, iris$Sepal.Length, col = iris$Species, pch = 18)
legend("topleft", lty = 1, col = 1:3, legend = levels(iris$Species))

#
# Getting the fixed-effect coefficients for each obs.
#

data(trade)
est_trade = fepois(Euros ~ log(dist_km) | Destination^Product +
                                           Origin^Product + Year, trade)
obs_fe = predict(est_trade, fixef = TRUE)
head(obs_fe)

# can we check we get the right sum of fixed-effects
head(cbind(rowSums(obs_fe), est_trade$sumFE))

#
# Standard-error of the prediction
#

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

est = feols(y ~ x1 + species, base)

head(predict(est, se.fit = TRUE))

# regular confidence interval
head(predict(est, interval = "conf"))

# adding the residual to the CI
head(predict(est, interval = "predi"))

# You can change the type of SE on the fly
head(predict(est, interval = "conf", vcov = ~species))

A print facility for fixest objects.

Description

This function is very similar to usual summary functions as it provides the table of coefficients along with other information on the fit of the estimation. The type of output can be customized by the user (using function setFixest_print).

Usage

## S3 method for class 'fixest'
print(x, n, type = "table", fitstat = NULL, ...)

setFixest_print(type = "table", fitstat = NULL)

getFixest_print()

Arguments

x

A fixest object. Obtained using the methods femlm, feols or feglm.

n

Integer, number of coefficients to display. By default, only the first 8 coefficients are displayed if x does not come from summary.fixest.

type

Either "table" (default) to display the coefficients table or "coef" to display only the coefficients.

fitstat

A formula or a character vector representing which fit statistic to display. The types must be valid types of the function fitstat. The default fit statistics depend on the type of estimation (OLS, GLM, IV, with/without fixed-effect). Providing the argument fitstat overrides the default fit statistics, you can however use the point "." to summon them back. Ex 1: fitstat = ~ . + ll adds the log-likelihood to the default values. Ex 2: fitstat = ~ ll + pr2 only displays the log-likelihood and the pseudo-R2.

...

Other arguments to be passed to vcov.fixest.

Details

It is possible to set the default values for the arguments type and fitstat by using the function setFixest_print.

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. Use summary.fixest to see the results with the appropriate standard-errors, fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.

Examples

# Load trade data
data(trade)

# We estimate the effect of distance on trade
#   => we account for 3 fixed-effects (FEs)
est_pois = fepois(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# displaying the results
print(est_pois)

# By default the coefficient table is displayed.
#  If the user wishes to display only the coefficients, use option type:
print(est_pois, type = "coef")

# To permanently display coef. only, use setFixest_print:
setFixest_print(type = "coef")
est_pois
# back to default:
setFixest_print(type = "table")

#
# fitstat
#

# We modify which fit statistic to display
print(est_pois, fitstat = ~ . + lr)

# We add the LR test to the default (represented by the ".")

# to show only the LR stat:
print(est_pois, fitstat = ~ . + lr.stat)

# To modify the defaults:
setFixest_print(fitstat = ~ . + lr.stat + rmse)
est_pois

# Back to default (NULL == default)
setFixest_print(fitstat = NULL)

Print method for fit statistics of fixest estimations

Description

Displays a brief summary of selected fit statistics from the function fitstat.

Usage

## S3 method for class 'fixest_fitstat'
print(x, na.rm = FALSE, ...)

Arguments

x

An object resulting from the fitstat function.

na.rm

Logical, default is FALSE. If TRUE, the statistics that are missing are not displayed.

...

Not currently used.

Examples

data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Extracting the 'working' number of observations used to compute the pvalues
fitstat(gravity, "g", simplify = TRUE)

# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)

# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)

# For wald and wf, you could show the pvalue instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)

# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them

# We need: a) type name, b) the function to be applied
#          c) (optional) an alias

fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")

# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)

# Note that the custom stats we created can easily lead
# to errors, but that's another story!

Print method for fixest_multi objects

Description

Displays summary information on fixest_multi objects in the R console.

Usage

## S3 method for class 'fixest_multi'print(x, type = "etable", ...)

Arguments

x

A fixest_multi object, obtained from a fixest estimation leading to multiple results.

type

A character either equal to "etable", "short", "long", "compact", "se_compact" or "se_long". If etable, the function etable is used to print the result. If short, only the table of coefficients is displayed for each estimation. If long, then the full results are displayed for each estimation. If compact, a data.frame is returned with one line per model and the formatted coefficients + standard-errors in the columns. If se_compact, a data.frame is returned with one line per model, one numeric column for each coefficient and one numeric column for each standard-error. If "se_long", same as "se_compact" but the data is in a long format instead of wide.

...

Other arguments to be passed to summary.fixest_multi.

See Also

The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)

# Let's print all that
res

R2s of fixest models

Description

Reports different R2s for fixest estimations (e.g. feglm or feols).

Usage

r2(x, type = "all", full_names = FALSE)

Arguments

x

A fixest object, e.g. obtained with function feglm or feols.

type

A character vector representing the R2 to compute. The R2 codes are of the form: "wapr2" with letters "w" (within), "a" (adjusted) and "p" (pseudo) possibly missing. E.g. to get the regular R2: use type = "r2", the within adjusted R2: use type = "war2", the pseudo R2: use type = "pr2", etc. Use "cor2" for the squared correlation. By default, all R2s are computed.

full_names

Logical scalar, default is FALSE. If TRUE then names of the vector in output will have full names instead of keywords (e.g. Squared Correlation instead of cor2, etc).

Details

The pseudo R2s are McFadden's R2s, which are based on the ratio of log-likelihoods (one minus the ratio of the log-likelihood of the model to that of the null model).

For R2s with no theoretical justification (e.g. regular R2s for maximum likelihood models, or within R2s for models without fixed-effects), NA is returned. The single measure to possibly compare all kinds of models is the squared correlation between the dependent variable and the expected predictor.

The pseudo-R2 is also returned in the OLS case, it corresponds to the pseudo-R2 of the equivalent GLM model with a Gaussian family.

For the adjusted within-R2s, the adjustment factor is (n - nb_fe) / (n - nb_fe - K) with n the number of observations, nb_fe the number of fixed-effects and K the number of variables.
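The quantities discussed above can be reproduced by hand with base R models. This is a minimal sketch using stats::lm and stats::glm as stand-ins; it shows the textbook definitions (squared correlation, McFadden's pseudo R2, and the adjustment factor) and is not a description of how r2 computes them internally. The values chosen for n, nb_fe and K are purely illustrative.

```r
# Squared correlation between the dependent variable and the fitted values.
# For OLS with an intercept it coincides with the regular R2.
m <- lm(Sepal.Length ~ Petal.Length + Species, iris)
cor2   <- cor(iris$Sepal.Length, fitted(m))^2
r2_ols <- summary(m)$r.squared

# McFadden's pseudo R2 for a Poisson GLM: 1 - logLik(model) / logLik(null)
g  <- glm(round(Sepal.Length) ~ Petal.Length, family = poisson, data = iris)
g0 <- update(g, . ~ 1)   # null model: intercept only
pr2 <- 1 - as.numeric(logLik(g)) / as.numeric(logLik(g0))

# Adjustment factor (n - nb_fe) / (n - nb_fe - K), illustrative values
n <- 150; nb_fe <- 3; K <- 1
adj_factor <- (n - nb_fe) / (n - nb_fe - K)

c(cor2 = cor2, r2 = r2_ols, pr2 = pr2, adj_factor = adj_factor)
```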

Value

Returns a named vector.

Author(s)

Laurent Berge

Examples

# Load trade data
data(trade)

# We estimate the effect of distance on trade (with 3 fixed-effects)
est = feols(log(Euros) ~ log(dist_km) | Origin + Destination + Product, trade)

# Squared correlation:
r2(est, "cor2")

# "regular" r2:
r2(est, "r2")

# pseudo r2 (equivalent to GLM with Gaussian family)
r2(est, "pr2")

# adjusted within r2
r2(est, "war2")

# all four at once
r2(est, c("cor2", "r2", "pr2", "war2"))

# same with full names instead of codes
r2(est, c("cor2", "r2", "pr2", "war2"), full_names = TRUE)

Refactors a variable

Description

Takes a variable of any type, transforms it into a factor, and modifies the levels of the factor. Useful in estimations when you want to set some value of a vector as a reference.

Usage

ref(x, ref)

Arguments

x

A vector of any type (must be atomic though).

ref

A vector or a list, or special binning values (explained later). If a vector, it must correspond to (partially matched) values of the vector x. The vector x will be transformed into a factor and these values will be placed first in the levels. That's the main usage of this function. You can also bin the values of x on-the-fly, using the same syntax as the function bin. To create a new value from old values, use ref = list("new_value"=old_values) with old_values a vector of existing values. You can use .() for list(). It accepts regular expressions, but they must start with an "@", like in ref="@Aug|Dec". It accepts one-sided formulas which must contain the variable x, e.g. ref=list("<2" = ~x < 2). The names of the list are the new names. If the new name is missing, the first value matched becomes the new name. In the name, adding "@d", with d a digit, will relocate the value in position d: useful to change the position of factors. If the vector x is numeric, you can use the special value "bin::digit" to group every digit element. For example if x represents years, using ref="bin::2" creates bins of two years. With any data, using "!bin::digit" groups every digit consecutive values starting from the first value. Using "!!bin::digit" is the same but starting from the last value. With numeric vectors you can: a) use "cut::n" to cut the vector into n equal parts, b) use "cut::a]b[" to create the following bins: [min, a], ]a, b[, [b, max]. The latter syntax is a sequence of number/quartile (q0 to q4)/percentile (p0 to p100) followed by an open or closed square bracket. You can add custom bin names by adding them in the character vector after 'cut::values'. See details and examples. Dot square bracket expansion (see dsb) is enabled.

Value

It returns a factor of the same length as x, where levels have been modified according to the argument ref.
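For the main use case, placing one level first, base R offers stats::relevel. The sketch below is a base R analog shown for comparison; unlike ref, it requires the exact level name (no partial matching) and offers no binning. The factor month_fact is a made-up example.

```r
# A small hypothetical factor
month_fact <- factor(c("may", "june", "august", "june"),
                     levels = c("may", "june", "august"))

# Base R analog of ref(month_fact, "aug"): the chosen level comes first,
# the other levels keep their relative order
releveled <- relevel(month_fact, ref = "august")
levels(releveled)
```

The values of the factor are unchanged; only the order of the levels, and hence the reference category in an estimation, is modified.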

"Cutting" a numeric vector

Numeric vectors can be cut easily into: a) equal parts, b) user-specified bins.

Use "cut::n" to cut the vector into n (roughly) equal parts. Percentiles are used to partition the data, hence some data distributions can lead to fewer than n parts (for example if P0 is the same as P50).

The user can specify custom bins with the following syntax: "cut::a]b]c]". Here the numbers a, b, c, etc, are a sequence of increasing numbers, each followed by an open or closed square bracket. The numbers can be specified as either plain numbers (e.g. "cut::5]12[32["), quartiles (e.g. "cut::q1]q3["), or percentiles (e.g. "cut::p10]p15]p90]"). Values of different types can be mixed: "cut::5]q2[p80[" is valid provided the median (q2) is indeed greater than 5, otherwise an error is thrown.

The square bracket right of each number tells whether the numbers should be included or excluded from the current bin. For example, say x ranges from 0 to 100, then "cut::5]" will create two bins: one from 0 to 5 and a second from 6 to 100. With "cut::5[" the bins would have been 0-4 and 5-100.

A factor is always returned. The labels always report the min and max values in each bin.

To have user-specified bin labels, just add them in the character vector following 'cut::values'. You don't need to provide all of them, and NA values fall back to the default label. For example, bin = c("cut::4", "Q1", NA, "Q3") will modify only the first and third label that will be displayed as "Q1" and "Q3".
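The inclusion rule carried by the bracket can be reproduced with base::cut, whose argument right plays a similar role. This is an analog for illustration only: it does not use the "cut::" syntax, and the labels produced by base::cut differ from those of ref.

```r
x <- 0:100

# Analog of "cut::5]": 5 belongs to the first bin -> 0-5 and 6-100
closed <- cut(x, breaks = c(-Inf, 5, Inf), right = TRUE)

# Analog of "cut::5[": 5 belongs to the second bin -> 0-4 and 5-100
open <- cut(x, breaks = c(-Inf, 5, Inf), right = FALSE)

# Min and max of x within each bin
tapply(x, closed, range)
tapply(x, open, range)
```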

bin vs ref

The functions bin and ref are able to do the same thing, then why use one instead of the other? Here are the differences:

Author(s)

Laurent Berge

See Also

To bin the values of a vector: bin.

Examples

data(airquality)

# A vector of months
month_num = airquality$Month
month_lab = c("may", "june", "july", "august", "september")
month_fact = factor(month_num, labels = month_lab)
table(month_num)
table(month_fact)

#
# Main use
#

# Without argument: equivalent to as.factor
ref(month_num)

# Main usage: to set a level first:
# (Note that partial matching is enabled.)
table(ref(month_fact, "aug"))

# You can rename the level on-the-fly
# (Northern hemisphere specific!)
table(ref(month_fact, .("Hot month"="aug",
                        "Late summer" = "sept")))

# Main use is in estimations:
a = feols(Petal.Width ~ Petal.Length + Species, iris)

# We change the reference
b = feols(Petal.Width ~ Petal.Length + ref(Species, "vers"), iris)

etable(a, b)

#
# Binning
#

# You can also bin factor values on the fly
# Using @ first means a regular expression will be used to match the values.
# Note that the value created is placed first.
# To avoid that behavior => use the function "bin"
table(ref(month_fact, .(summer = "@jul|aug|sep")))

# Please refer to the example in the bin help page for more example.
# The syntax is the same.

#
# Precise relocation
#

# You can place a factor at the location you want
#  by adding "@digit" in the name first:
table(ref(month_num, .("@5"=5)))

# Same with renaming
table(ref(month_num, .("@5 five"=5)))

Replicates fixest objects

Description

Simple function that replicates fixest objects while (optionally) computing different standard-errors. Useful mostly in combination with etable or coefplot.

Usage

## S3 method for class 'fixest'
rep(x, times = 1, each = 1, vcov, ...)

## S3 method for class 'fixest_list'
rep(x, times = 1, each = 1, vcov, ...)

.l(...)

Arguments

x

Either a fixest object, or a list of fixest objects created with .l().

times

Integer vector giving the number of repetitions of the vector of elements. By default times = 1. It must be either of length 1, or of the same length as the argument x.

each

Integer scalar indicating the repetition of each element. Default is 1.

vcov

A list containing the types of standard-error to be computed, default is missing. If not missing, it must be of the same length as times, each, or the final vector. Note that if the arguments times and each are missing, then times becomes equal to the length of vcov. To see how to summon a VCOV, see the dedicated section in the vignette.

...

In .l(): fixest objects. In rep(): not currently used.

Details

To apply rep.fixest on a list of fixest objects, it is absolutely necessary to use .l() and not list().

Value

Returns a list of the appropriate length. Each element of the list is a fixest object.
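The arguments times and each carry the same names as in base::rep. The sketch below shows the base R ordering on a character vector standing in for two fixest objects; that rep.fixest orders its output the same way is an assumption, not something stated on this page.

```r
# Stand-ins for two fixest objects (hypothetical labels)
models <- c("est", "est_bis")

# times: the whole vector is repeated
by_times <- rep(models, times = 3)

# each: every element is repeated before moving to the next one
by_each <- rep(models, each = 3)

by_times
by_each
```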

Examples

# Let's show results with different standard-errors

est = feols(Ozone ~ Solar.R + Wind + Temp, data = airquality)

my_vcov = list(~ Month, ~ Day, ~ Day + Month)

etable(rep(est, vcov = my_vcov))

coefplot(rep(est, vcov = my_vcov), drop = "Int")

#
# To rep multiple objects, you need to use .l()
#

est_bis = feols(Ozone ~ Solar.R + Wind + Temp | Month, airquality)

etable(rep(.l(est, est_bis), vcov = my_vcov))

# using each
etable(rep(.l(est, est_bis), each = 3, vcov = my_vcov))

Extracts residuals from a fixest object

Description

This function extracts residuals from a fitted model estimated with femlm, feols or feglm.

Usage

## S3 method for class 'fixest'
resid(
  object,
  type = c("response", "deviance", "pearson", "working"),
  na.rm = TRUE,
  ...
)

## S3 method for class 'fixest'
residuals(
  object,
  type = c("response", "deviance", "pearson", "working"),
  na.rm = TRUE,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

type

A character scalar, either "response" (default), "deviance", "pearson", or "working". Note that "working" corresponds to the residuals from the weighted least squares and only applies to feglm models.

na.rm

Logical, default is TRUE. Whether to remove the observations with NAs from the original data set. If FALSE, then the vector returned is always of the same length as the original data set.

...

Not currently used.

Value

It returns a numeric vector whose length is the number of observations used for the estimation (if na.rm = TRUE) or the length of the original data set (if na.rm = FALSE).
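The two possible lengths of the residual vector have a close counterpart in base R, where the na.action of stats::lm decides whether residuals are padded with NAs. The sketch below is an analog for illustration and does not call fixest.

```r
# airquality$Ozone contains missing values
# na.omit: residuals only for the observations used in the estimation
m_omit <- lm(Ozone ~ Wind, airquality, na.action = na.omit)

# na.exclude: residuals padded with NAs to the length of the data set
m_exclude <- lm(Ozone ~ Wind, airquality, na.action = na.exclude)

length(resid(m_omit))     # comparable to na.rm = TRUE
length(resid(m_exclude))  # comparable to na.rm = FALSE
```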

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. fitted.fixest, predict.fixest, summary.fixest, vcov.fixest, fixef.fixest.

Examples

# simple estimation on iris data, using "Species" fixed-effects
res_poisson = femlm(Sepal.Length ~ Sepal.Width + Petal.Length +
                    Petal.Width | Species, iris)

# we plot the residuals
plot(resid(res_poisson))

Extracts the residuals from a fixest_multi object

Description

Utility to extract the residuals from multiple fixest estimations. If possible, all the residuals are coerced into a matrix.

Usage

## S3 method for class 'fixest_multi'
resid(
  object,
  type = c("response", "deviance", "pearson", "working"),
  na.rm = FALSE,
  ...
)

## S3 method for class 'fixest_multi'
residuals(
  object,
  type = c("response", "deviance", "pearson", "working"),
  na.rm = FALSE,
  ...
)

Arguments

object

A fixest_multi object.

type

A character scalar, either "response" (default), "deviance", "pearson", or "working". Note that "working" corresponds to the residuals from the weighted least squares and only applies to feglm models.

na.rm

Logical, default is FALSE. Should the NAs be kept? If TRUE, they are removed.

...

Not currently used.

Value

If all the models return residuals of the same length, a matrix is returned. Otherwise, a list is returned.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# A multiple estimation
est = feols(y ~ x1 + csw0(x2, x3), base)

# We can get all the residuals at once,
# each column is a model
head(resid(est))

# We can select/order the model using fixest_multi extraction
head(resid(est[rhs = .N:1]))

Randomly draws observations from a data set

Description

This function is useful to check a data set. It returns n randomly drawn rows of the input data set.

Usage

sample_df(x, n = 10, previous = FALSE)

Arguments

x

A data set: either a vector, a matrix or a data frame.

n

The number of rows/elements to sample randomly. Default is 10.

previous

Logical scalar. Whether the results of the previous draw should be returned.

Value

A data base (resp. vector) with n rows (resp. elements).
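The core operation is easy to write in base R. This is a rough equivalent for a data frame, given as a sketch only: it does not reproduce the previous argument and makes no claim about the order in which sample_df returns the rows.

```r
set.seed(1)   # for a reproducible draw
n <- 10

# Draw n row indices without replacement, then subset the data frame
idx <- sample(nrow(iris), n)
iris_sample <- iris[idx, ]
iris_sample
```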

Author(s)

Laurent Berge

Examples

sample_df(iris)

sample_df(iris, previous = TRUE)

Functions exported from sandwich to implement fixest methods

Description

The package fixest does not use estfun or bread from sandwich, but these methods have been implemented to allow users to leverage the variances from sandwich.

Details


Sets the defaults of coefplot

Description

You can set the default values of most arguments of coefplot with this function.

Usage

setFixest_coefplot(
  style,
  horiz = FALSE,
  dict = getFixest_dict(),
  keep,
  ci.width = "1%",
  ci_level = 0.95,
  pt.pch = 20,
  pt.bg = NULL,
  cex = 1,
  pt.cex = cex,
  col = 1:8,
  pt.col = col,
  ci.col = col,
  lwd = 1,
  pt.lwd = lwd,
  ci.lwd = lwd,
  ci.lty = 1,
  grid = TRUE,
  grid.par = list(lty = 3, col = "gray"),
  zero = TRUE,
  zero.par = list(col = "black", lwd = 1),
  pt.join = FALSE,
  pt.join.par = list(col = pt.col, lwd = lwd),
  ci.join = FALSE,
  ci.join.par = list(lwd = lwd, col = col, lty = 2),
  ci.fill = FALSE,
  ci.fill.par = list(col = "lightgray", alpha = 0.5),
  ref.line = "auto",
  ref.line.par = list(col = "black", lty = 2),
  lab.cex,
  lab.min.cex = 0.85,
  lab.max.mar = 0.25,
  lab.fit = "auto",
  xlim.add,
  ylim.add,
  sep,
  bg,
  group = "auto",
  group.par = list(lwd = 2, line = 3, tcl = 0.75),
  main = "Effect on __depvar__",
  value.lab = "Estimate and __ci__ Conf. Int.",
  ylab = NULL,
  xlab = NULL,
  sub = NULL,
  reset = FALSE
)

getFixest_coefplot()

Arguments

style

A character scalar giving the style of the plot to be used. You can set styles with the function setFixest_coefplot, setting all the default values of the function. If missing, then it switches to either "default" or "iplot", depending on the calling function.

horiz

A logical scalar, default is FALSE. Whether to display the confidence intervals horizontally instead of vertically.

dict

A named character vector or a logical scalar. It changes the original variable names to the ones contained in the dictionary. E.g. to change the variables named a and b3 to (resp.) “$log(a)$” and to “$bonus^3$”, use dict=c(a="$log(a)$",b3="$bonus^3$"). By default, it is equal to getFixest_dict(), a default dictionary which can be set with setFixest_dict. You can use dict = FALSE to disable it. By default dict modifies the entries in the global dictionary, to disable this behavior, use "reset" as the first element (ex: dict=c("reset", mpg="Miles per gallon")).

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variables x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

ci.width

The width of the extremities of the confidence intervals. Default is "1%".

ci_level

Scalar between 0 and 1: the level of the CI. By default it is equal to 0.95.

pt.pch

The plotting symbol (pch) of the coefficient estimates. Default is 20 (a small filled circle).

pt.bg

The background color of the point estimate (when pt.pch is in 21 to 25). Defaults to NULL.

cex

Numeric, default is 1. Expansion factor for the points.

pt.cex

The size of the coefficient estimates. Default is the other argument cex.

col

The color of the points and the confidence intervals. Default is 1:8 (the first color, 1, is "black"). Note that you can set the colors separately for each of them with pt.col and ci.col.

pt.col

The color of the coefficient estimates. Default is equal to the argument col.

ci.col

The color of the confidence intervals. Default is equal to the argument col.

lwd

General line width. Default is 1.

pt.lwd

The line width of the coefficient estimates. Default is equal to the other argument lwd.

ci.lwd

The line width of the confidence intervals. Default is equal to the other argument lwd.

ci.lty

The line type of the confidence intervals. Default is 1.

grid

Logical, default is TRUE. Whether a grid should be displayed. You can set the display of the grid with the argument grid.par.

grid.par

List. Parameters of the grid. The default values are: lty = 3 and col = "gray". You can add any graphical parameter that will be passed to graphics::abline. You also have two additional arguments: use horiz = FALSE to disable the horizontal lines, and use vert = FALSE to disable the vertical lines. Eg: grid.par = list(vert = FALSE, col = "red", lwd = 2).

zero

Logical, default is TRUE. Whether the 0-line should be emphasized. You can set the parameters of that line with the argument zero.par.

zero.par

List. Parameters of the zero-line. The default values are col = "black" and lwd = 1. You can add any graphical parameter that will be passed to graphics::abline. Example: zero.par = list(col = "darkblue", lwd = 3).

pt.join

Logical, default is FALSE. If TRUE, then the coefficient estimates are joined with a line.

pt.join.par

List. Parameters of the line joining the coefficients. The default values are: col = pt.col and lwd = lwd. You can add any graphical parameter that will be passed to lines. Eg: pt.join.par = list(lty = 2).

ci.join

Logical, default is FALSE. Whether to join the extremities of the confidence intervals. If TRUE, then you can set the graphical parameters with the argument ci.join.par.

ci.join.par

A list of parameters to be passed to graphics::lines. Only used if ci.join=TRUE. By default it is equal to list(lwd = lwd, col = col, lty = 2).

ci.fill

Logical, default is FALSE. Whether to fill the confidence intervals with a color. If TRUE, then you can set the graphical parameters with the argument ci.fill.par.

ci.fill.par

A list of parameters to be passed to graphics::polygon. Only used if ci.fill=TRUE. By default it is equal to list(col = "lightgray", alpha = 0.5). Note that alpha is a special parameter that adds transparency to the color (ranges from 0 to 1).

ref.line

Logical or numeric, default is "auto", whose behavior depends on the situation. It is TRUE only if: i) interactions are plotted, ii) the x values are numeric and iii) a reference is found. If TRUE, then a vertical line is drawn at the level of the reference value. Otherwise, if numeric, a vertical line will be drawn at that specific value.

ref.line.par

List. Parameters of the vertical line on the reference. The default values are: col = "black" and lty = 2. You can add any graphical parameter that will be passed to graphics::abline. Eg: ref.line.par = list(lty = 1, lwd = 3).

lab.cex

The size of the labels of the coefficients. Default is missing. It is automatically set by an internal algorithm which can go as low as lab.min.cex (another argument).

lab.min.cex

The minimum size of the coefficient labels, as set by the internal algorithm. Default is 0.85.

lab.max.mar

The maximum size the left margin can take when trying to fit the coefficient labels into it (only when horiz = TRUE). This is used in the internal algorithm fitting the coefficient labels. Default is 0.25.

lab.fit

The method to fit the coefficient labels into the plotting region (only when horiz = FALSE). Can be "auto" (the default), "simple", "multi" or "tilted". If "simple", then the classic axis is drawn. If "multi", then the coefficient labels are fit horizontally across several lines, such that they don't collide. If "tilted", then the labels are tilted. If "auto", an automatic choice between the three is made.

xlim.add

A numeric vector of length 1 or 2. It represents an extension factor of xlim, in percentage. Eg: xlim.add = c(0, 0.5) extends xlim by 50% on the right. If of length 1, positive values represent the right, and negative values the left (Eg: xlim.add = -0.5 is equivalent to xlim.add = c(0.5, 0)).

ylim.add

A numeric vector of length 1 or 2. It represents an extension factor of ylim, in percentage. Eg: ylim.add = c(0, 0.5) extends ylim by 50% on the top. If of length 1, positive values represent the top, and negative values the bottom (Eg: ylim.add = -0.5 is equivalent to ylim.add = c(0.5, 0)).

sep

The distance between two estimates; only used when the argument object is a list of estimation results.

bg

Background color for the plot. By default it is white.

group

A list, default is missing. Each element of the list reports the coefficients to be grouped while the name of the element is the group name. Each element of the list can be either: i) a character vector of length 1, ii) a character vector of length 2, or iii) a numeric vector. If equal to: i) then it is interpreted as a pattern: all elements fitting the regular expression will be grouped (note that you can use the special character "^^" to clean the beginning of the names, see example), if ii) it corresponds to the first and last elements to be grouped, if iii) it corresponds to the coefficient numbers to be grouped. If equal to a character vector, you can use a percentage sign to tell the algorithm to look at the coefficients before aliasing (e.g. "%varname"). Example of valid uses: group=list(group_name="pattern"), group=list(group_name=c("var_start", "var_end")), group=list(group_name=1:2). See details.

group.par

A list of parameters controlling the display of the group. The parameters controlling the line are: lwd, tcl (length of the tick), line.adj (adjustment of the position, default is 0), tick (whether to add the ticks), lwd.ticks, col.ticks. Then the parameters controlling the text: text.adj (adjustment of the position, default is 0), text.cex, text.font, text.col.

main

The title of the plot. Default is "Effect on __depvar__". You can use the special variable __depvar__ to set the title (useful when you set the plot default with setFixest_coefplot).

value.lab

The label to appear on the side of the coefficient values. If horiz = FALSE, the label appears on the y-axis. If horiz = TRUE, then it appears on the x-axis. The default is equal to "Estimate and __ci__ Conf. Int.", with __ci__ a special variable giving the value of the confidence interval.

ylab

The label of the y-axis, default is NULL. Note that if horiz = FALSE, it overrides the value of the argument value.lab.

xlab

The label of the x-axis, default is NULL. Note that if horiz = TRUE, it overrides the value of the argument value.lab.

sub

A subtitle, default is NULL.

reset

Logical, default is FALSE. If TRUE, then the arguments that are not set during the call are reset to their "factory"-default values. If FALSE, on the other hand, arguments that have already been modified are not changed.
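
The matching rule described for the argument keep (regular expressions, with a leading exclamation mark reversing the selection) can be illustrated in base R. This is only a sketch: select_vars is a hypothetical helper written for this illustration, not a fixest function, and the internal matching of fixest may differ in its details.

```r
# Hypothetical helper (not part of fixest): keeps the names matching any
# of the patterns; a leading "!" keeps the names that do NOT match instead.
select_vars = function(var_names, keep){
  selected = rep(FALSE, length(var_names))
  for(pattern in keep){
    if(startsWith(pattern, "!")){
      # reversed effect: drop the "!" and keep the non-matching names
      selected = selected | !grepl(substring(pattern, 2), var_names)
    } else {
      selected = selected | grepl(pattern, var_names)
    }
  }
  var_names[selected]
}

vars = c("(Intercept)", paste0("x", 1:55))

# Only x1 to x9: an "x" followed by a single digit at the end of the name
kept_digits = select_vars(vars, "x[[:digit:]]$")

# Every variable that does not contain "Intercept"
kept_no_int = select_vars(vars, "!Intercept")
```

The trailing "$" is what excludes x10 to x55: in those names the "x" is followed by two digits, so no single digit sits between the "x" and the end of the string.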

Value

Doesn't return anything.

See Also

coefplot

Examples

# coefplot has many arguments, which makes it highly flexible.
# If you don't like the default style of coefplot. No worries,
# you can set *your* default by using the function
# setFixest_coefplot()

# Estimation
est = feols(Petal.Length ~ Petal.Width + Sepal.Length +
                Sepal.Width | Species, iris)

# Plot with default style
coefplot(est)

# Now we permanently change some arguments
dict = c("Petal.Length"="Length (Petal)", "Petal.Width"="Width (Petal)",
         "Sepal.Length"="Length (Sepal)", "Sepal.Width"="Width (Sepal)")

setFixest_coefplot(ci.col = 2, pt.col = "darkblue", ci.lwd = 3,
                   pt.cex = 2, pt.pch = 15, ci.width = 0, dict = dict)

# Tadaaa!
coefplot(est)

# To reset to the default settings:
setFixest_coefplot("all", reset = TRUE)
coefplot(est)

Sets/gets the dictionary relabeling the variables

Description

Sets/gets the default dictionary used in the functions etable, did_means and coefplot. The dictionaries are used to relabel variables (usually towards a fancier, more explicit formatting) when exporting them into a Latex table or displaying in graphs. By setting the dictionary with setFixest_dict, you can avoid providing the argument dict.

Usage

setFixest_dict(dict = NULL, ..., reset = FALSE)

getFixest_dict()

Arguments

dict

A named character vector or a character scalar. E.g. to change the variables named "a" and "b3" to (resp.) "$log(a)$" and "$bonus^3$", use dict = c(a="$log(a)$", b3="$bonus^3$"). Alternatively you can feed a character scalar containing the dictionary in the form "variable 1: definition \n variable 2: definition". In that case the function as.dict will be applied to get a proper dictionary. This dictionary is used in Latex tables or in graphs by the function coefplot. If you want to separate Latex rendering from rendering in graphs, use an ampersand first to make the variable specific to coefplot.

...

You can add arguments of the form: variable_name = "Definition". This is an alternative to using a named vector in the argument dict.

reset

Logical, default is FALSE. If TRUE, then the dictionary is reset. Note that the default dictionary always relabels the variable "(Intercept)" into "Constant". To overwrite it, you need to add "(Intercept)" explicitly in your dictionary.

Details

By default the dictionary only grows. This means that successive calls will not erase the previous definitions unless the argument reset has been set to TRUE.

The default dictionary is equivalent to having setFixest_dict("(Intercept)" = "Constant"). To change this default, you need to provide a new definition to "(Intercept)" explicitly.
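
The "only grows" behavior can be checked by inspecting the global dictionary after successive calls. The sketch below assumes that fixest is installed and that getFixest_dict() returns a named character vector; the variable names x1 and x2 and their labels are made up for the illustration.

```r
library(fixest)

# Start from a clean state
setFixest_dict(reset = TRUE)

# First call: one definition, given as a named vector
setFixest_dict(c(x1 = "Price"))

# Second call: another definition, given through the argument names.
# The definition of x1 is expected to be kept.
setFixest_dict(x2 = "Quantity")
dict_now = getFixest_dict()
dict_now

# Resetting erases the user-provided definitions
setFixest_dict(reset = TRUE)
dict_reset = getFixest_dict()
dict_reset
```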

Author(s)

Laurent Berge

Examples

data(trade)
est = feols(log(Euros) ~ log(dist_km)|Origin+Destination+Product, trade)

# we export the result & rename some variables
etable(est, dict = c("log(Euros)"="Euros (ln)", Origin="Country of Origin"))

# If you export many tables, it can be more convenient to use setFixest_dict:
setFixest_dict(c("log(Euros)"="Euros (ln)", Origin="Country of Origin"))
etable(est) # variables are properly relabeled

# The dictionary only 'grows'
# Here you get the previous two variables + the new one that are relabeled
# Btw you can set the dictionary directly using the argument names:
setFixest_dict(Destination = "Country of Destination")
etable(est)

# Another way to set a dictionary: with a character string:
# See the help page of as.dict
dict = "log(dist_km): Distance (ln); Product: Type of Good"
setFixest_dict(dict)
etable(est)

# And now we reset:
setFixest_dict(reset = TRUE)
etable(est)

Default arguments for fixest estimations

Description

This function sets globally the default arguments of fixest estimations.

Usage

setFixest_estimation(
  data = NULL,
  panel.id = NULL,
  fixef.rm = "perfect_fit",
  fixef.tol = 1e-06,
  fixef.iter = 10000,
  collin.tol = 1e-10,
  lean = FALSE,
  verbose = 0,
  warn = TRUE,
  fixef.keep_names = NULL,
  demeaned = FALSE,
  mem.clean = FALSE,
  glm.iter = 25,
  glm.tol = 1e-08,
  data.save = FALSE,
  reset = FALSE
)

getFixest_estimation()

Arguments

data

A data.frame containing the necessary variables to run the model. The variables of the non-linear right hand side of the formula are identified with this data.frame names. Can also be a matrix.

panel.id

The panel identifiers. Can either be: i) a one sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id=c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id='id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).

fixef.rm

Can be equal to "perfect_fit" (default), "singletons", "infinite_coef" or "none".

This option controls which observations should be removed prior to the estimation.If "singletons", fixed-effects associated to a single observation are removed(since they perfectly explain it).

The value "infinite_coef" only works with GLM families with limited left hand sides (LHS) and exponential link. For instance the Poisson family for which the LHS cannot be lower than 0, or the logit family for which the LHS lies within 0 and 1. In that case the fixed-effects (FEs) with only-0 LHS would lead to infinite coefficients (FE = -Inf would explain perfectly the LHS). The value fixef.rm="infinite_coef" removes all observations associated to FEs with infinite coefficients.

If "perfect_fit", it is equivalent to "singletons" and "infinite_coef" combined. That means all observations that are perfectly explained by the FEs are removed.

If "none": no observation is removed.

Note that whatever the value of this option, the coefficient estimates will remain the same. It only affects inference (the standard-errors).

The algorithm is recursive, meaning that, e.g. in the presence of several fixed-effects (FEs), removing singletons in one FE can create singletons (or perfect fits) in another FE. The algorithm continues until there is no singleton/perfect-fit remaining.

fixef.tol

Precision used to obtain the fixed-effects. Defaults to 1e-6. It corresponds to the maximum absolute difference allowed between two coefficients of successive iterations. Argument fixef.tol cannot be lower than 10000*.Machine$double.eps. Note that this parameter is dynamically controlled by the algorithm.

fixef.iter

Maximum number of iterations in fixed-effects algorithm (only in use for 2+ fixed-effects). Default is 10000.

collin.tol

Numeric scalar, default is 1e-10. Threshold deciding when variables should be considered collinear and subsequently removed from the estimation. Higher values mean more variables will be removed (if there is presence of collinearity). One signal of presence of collinearity is t-stats that are extremely low (for instance when t-stats < 1e-3).

lean

Logical scalar, default is FALSE. If TRUE then all large objects are removed from the returned result: this will save memory but will block the possibility to use many methods. It is recommended to use the arguments se or cluster to obtain the appropriate standard-errors at estimation time, since obtaining different SEs won't be possible afterwards.

verbose

Integer. Higher values give more information. In particular, it can detail the number of iterations in the demeaning algorithm (the first number is the left-hand-side, the other numbers are the right-hand-side variables).

warn

Logical, default is TRUE. Whether warnings should be displayed (concerns warnings relating to convergence state).

fixef.keep_names

Logical or NULL (default). When you combine different variables to transform them into a single fixed-effect you can do e.g. y ~ x | paste(var1, var2). The algorithm provides a shorthand to do the same operation: y ~ x | var1^var2. Because pasting variables is a costly operation, the internal algorithm may use a numerical trick to hasten the process. The cost of doing so is that you lose the labels. If you are interested in getting the value of the fixed-effects coefficients after the estimation, you should use fixef.keep_names = TRUE. By default it is equal to TRUE if the number of unique values is lower than 50,000, and to FALSE otherwise.

demeaned

Logical, default is FALSE. Only used in the presence of fixed-effects: should the centered variables be returned? If TRUE, it creates the items y_demeaned and X_demeaned.

mem.clean

Logical scalar, default is FALSE. Only to be used if the data set is large compared to the available RAM. If TRUE then intermediary objects are removed as much as possible and gc is run before each substantial C++ section in the internal code to avoid memory issues.

glm.iter

Number of iterations of the glm algorithm. Default is 25.

glm.tol

Tolerance level for the glm algorithm. Default is 1e-8.

data.save

Logical scalar, default is FALSE. If TRUE, the data used for the estimation is saved within the returned object. Hence later calls to predict(), vcov(), etc., will be consistent even if the original data has been modified in the meantime. This is especially useful for estimations within loops, where the data changes at each iteration, such that postprocessing can be done outside the loop without issue.

reset

Logical scalar, default is FALSE. Whether to reset all values.
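
The recursive removal described for the argument fixef.rm (removing singletons in one fixed-effect can create singletons in another) can be sketched in base R. This is an illustration only: drop_singletons is a hypothetical helper, it handles singletons but not infinite coefficients, and it is not the algorithm implemented in fixest.

```r
# Hypothetical helper (not part of fixest): returns a logical vector flagging
# the observations that remain once all singletons have been removed.
drop_singletons = function(fe_list){
  keep = rep(TRUE, length(fe_list[[1]]))
  repeat {
    changed = FALSE
    for(fe in fe_list){
      # number of remaining observations for each fixed-effect value
      counts = table(fe[keep])
      single = names(counts)[counts == 1]
      is_single = keep & (as.character(fe) %in% single)
      if(any(is_single)){
        keep[is_single] = FALSE
        changed = TRUE
      }
    }
    # stop once a full pass over all fixed-effects removes nothing
    if(!changed) break
  }
  keep
}

# Toy data: "b" is a singleton in fe1 (row 4). Removing row 4 leaves "y"
# with a single observation in fe2 (row 3), which is then removed in turn.
fe1 = c("a", "a", "a", "b")
fe2 = c("x", "x", "y", "y")
keep = drop_singletons(list(fe1, fe2))
keep
```

Row 3 is not a singleton in the original data. It only becomes one after row 4 is dropped, which is why a single pass over each fixed-effect is not enough in general.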

Value

The function getFixest_estimation returns the currently set global defaults.

Examples

#
# Example: removing singletons is FALSE by default
# => changing this default
#

# Let's create data with singletons
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
base$fe_singletons = as.character(base$species)
base$fe_singletons[1:5] = letters[1:5]

res          = feols(y ~ x1 + x2 | fe_singletons, base)
res_noSingle = feols(y ~ x1 + x2 | fe_singletons, base, fixef.rm = "single")

# New defaults
setFixest_estimation(fixef.rm = "single")
res_newDefault = feols(y ~ x1 + x2 | fe_singletons, base)

etable(res, res_noSingle, res_newDefault)

# Resetting the defaults
setFixest_estimation(reset = TRUE)

Sets/gets formula macros

Description

You can set formula macros globally with setFixest_fml. These macros can then be used in fixest estimations or when using the function xpd.

Usage

setFixest_fml(..., reset = FALSE)

getFixest_fml()

Arguments

...

Definition of the macro variables. Each argument name corresponds to the name of the macro variable. It is required that each macro variable name starts with two dots (e.g. ..ctrl). The value of each argument must be a one-sided formula or a character vector, it is the definition of the macro variable. Example of a valid call: setFixest_fml(..ctrl = ~ var1 + var2). In the function xpd, the default macro variables are taken from getFixest_fml, any variable in ... will replace these values. You can enclose values in .[], if so they will be evaluated from the current environment. For example ..ctrl = ~ x.[1:2] + .[z] will lead to ~x1 + x2 + var if z is equal to "var".

reset

A logical scalar, defaults to FALSE. If TRUE, all macro variables are first reset (i.e. deleted).

Details

In xpd, the default macro variables are taken from getFixest_fml. Any value in the ... argument of xpd will replace these default values.

The definitions of the macro variables will replace the macro variables verbatim. Therefore, you can include multipart formulas if you wish but then beware of the order of the macro variables in the formula. For example, using the airquality data, say you want to set as controls the variable Temp and Day fixed-effects, you can do setFixest_fml(..ctrl = ~Temp | Day), but then feols(Ozone ~ Wind + ..ctrl, airquality) will be quite different from feols(Ozone ~ ..ctrl + Wind, airquality), so beware!
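
The effect of the position of a multipart macro can be seen by expanding the two formulas with xpd before running any estimation. The sketch below assumes fixest is installed and only relies on the verbatim replacement described above; the object names f_macro_last and f_macro_first are made up for the illustration.

```r
library(fixest)

# A multipart macro: Temp as a control, Day after the pipe (fixed-effects)
setFixest_fml(..ctrl = ~ Temp | Day)

# Macro placed last: Wind and Temp are before the pipe, Day is after it
f_macro_last = xpd(Ozone ~ Wind + ..ctrl)
print(f_macro_last)

# Macro placed first: the verbatim replacement puts Wind after the pipe,
# next to Day, so it would be treated as a fixed-effect in feols
f_macro_first = xpd(Ozone ~ ..ctrl + Wind)
print(f_macro_first)

# Cleaning up the global macros
setFixest_fml(reset = TRUE)
```

Printing the expanded formula first is a cheap way to verify which terms end up on which side of the pipe.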

Value

The function getFixest_fml() returns a list of character strings, the names corresponding to the macro variable names, the character strings corresponding to their definition.

See Also

xpd to make use of formula macros.

Examples

# Small examples with airquality data
data(airquality)
# we set two macro variables
setFixest_fml(..ctrl = ~ Temp + Day,
              ..ctrl_long = ~ poly(Temp, 2) + poly(Day, 2))

# Using the macro in lm with xpd:
lm(xpd(Ozone ~ Wind + ..ctrl), airquality)
lm(xpd(Ozone ~ Wind + ..ctrl_long), airquality)

# You can use the macros without xpd() in fixest estimations
a = feols(Ozone ~ Wind + ..ctrl, airquality)
b = feols(Ozone ~ Wind + ..ctrl_long, airquality)
etable(a, b, keep = "Int|Win")

# Using .[]
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
i = 2:3
z = "species"
lm(xpd(y ~ x.[2:3] + .[z]), base)

# No xpd() needed in feols
feols(y ~ x.[2:3] + .[z], base)

#
# Auto completion with '..' suffix
#

# You can trigger variables autocompletion with the '..' suffix
# You need to provide the argument data
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
xpd(y ~ x.., data = base)

# In fixest estimations, this is automatically taken care of
feols(y ~ x.., data = base)

#
# You can use xpd for stepwise estimations
#

# Note that for stepwise estimations in fixest, you can use
# the stepwise functions: sw, sw0, csw, csw0
# -> see help in feols or in the dedicated vignette

# we want to look at the effect of x1 on y
# controlling for different variables
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# We first create a matrix with all possible combinations of variables
my_args = lapply(names(base)[-(1:2)], function(x) c("", x))
(all_combs = as.matrix(do.call("expand.grid", my_args)))

res_all = list()
for(i in 1:nrow(all_combs)){
  res_all[[i]] = feols(xpd(y ~ x1 + ..v, ..v = all_combs[i, ]), base)
}

etable(res_all)
coefplot(res_all, group = list(Species = "^^species"))

#
# You can use macros to grep variables in your data set
#

# Example 1: setting a macro variable globally
data(longley)
setFixest_fml(..many_vars = grep("GNP|ployed", names(longley), value = TRUE))
feols(Armed.Forces ~ Population + ..many_vars, longley)

# Example 2: using ..("regex") or regex("regex") to grep the variables "live"
feols(Armed.Forces ~ Population + ..("GNP|ployed"), longley)

# Example 3: same as Ex.2 but without using a fixest estimation
# Here we need to use xpd():
lm(xpd(Armed.Forces ~ Population + regex("GNP|ployed"), data = longley), longley)

# Stepwise estimation with regex: use a comma after the parenthesis
feols(Armed.Forces ~ Population + sw(regex(,"GNP|ployed")), longley)

# Multiple LHS
etable(feols(..("GNP|ployed") ~ Population, longley))

#
# lhs and rhs arguments
#

# to create a one sided formula from a character vector
vars = letters[1:5]
xpd(rhs = vars)

# Alternatively, to replace the RHS
xpd(y ~ 1, rhs = vars)

# To create a two sided formula
xpd(lhs = "y", rhs = vars)

#
# argument 'add'
#

xpd(~x1, add = ~ x2 + x3)

# also works with character vectors
xpd(~x1, add = c("x2", "x3"))

# only adds to the RHS
xpd(y ~ x, add = ~bon + jour)

#
# argument add.after_pipe
#

xpd(~x1, add.after_pipe = ~ x2 + x3)

# we can add a two sided formula
xpd(~x1, add.after_pipe = x2 ~ x3)

#
# Dot square bracket operator
#

# The basic use is to add variables in the formula
x = c("x1", "x2")
xpd(y ~ .[x])

# Alternatively, one-sided formulas can be used and their content will be inserted verbatim
x = ~x1 + x2
xpd(y ~ .[x])

# You can create multiple variables at once
xpd(y ~ x.[1:5] + z.[2:3])

# You can summon variables from the environment to complete variables names
var = "a"
xpd(y ~ x.[var])

# ... the variables can be multiple
vars = LETTERS[1:3]
xpd(y ~ x.[vars])

# You can have "complex" variable names but they must be nested in character form
xpd(y ~ .["x.[vars]_sq"])

# DSB can be used within regular expressions
re = c("GNP", "Pop")
xpd(Unemployed ~ regex(".[re]"), data = longley)
# => equivalent to regex("GNP|Pop")

# Use .[,var] (NOTE THE COMMA!) to expand with commas
# !! can break the formula if misused
vars = c("wage", "unemp")
xpd(c(y.[,1:3]) ~ csw(.[,vars]))

# Example of use of .[] within a loop
res_all = list()
for(p in 1:3){
  res_all[[p]] = feols(Ozone ~ Wind + poly(Temp, .[p]), airquality)
}

etable(res_all)

# The former can be compactly estimated with:
res_compact = feols(Ozone ~ Wind + sw(.[, "poly(Temp, .[1:3])"]), airquality)

etable(res_compact)

# How does it work?
# 1)  .[, stuff] evaluates stuff and, if a vector, aggregates it with commas
#     Comma aggregation is done thanks to the comma placed after the square bracket
#     If .[stuff], then aggregation is with sums.
# 2) stuff is evaluated, and if it is a character string, it is evaluated with
#    the function dsb which expands values in .[]
#
# Wrapping up:
# 2) evaluation of dsb("poly(Temp, .[1:3])") leads to the vector:
#    c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")
# 1) .[, c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")] leads to
#    poly(Temp, 1), poly(Temp, 2), poly(Temp, 3)
#
# Hence sw(.[, "poly(Temp, .[1:3])"]) becomes:
#       sw(poly(Temp, 1), poly(Temp, 2), poly(Temp, 3))

#
# In non-fixest functions: guessing the data allows to use regex
#

# When used in non-fixest functions, the algorithm tries to "guess" the data
# so that ..("regex") can be directly evaluated without passing the argument 'data'
data(longley)
lm(xpd(Armed.Forces ~ Population + ..("GNP|ployed")), longley)

# same for the auto completion with '..'
lm(xpd(Armed.Forces ~ Population + GN..), longley)

Sets properties of fixest_multi objects

Description

Use this function to change the default behavior of fixest_multi objects.

Usage

setFixest_multi(drop = FALSE)

getFixest_multi()

Arguments

drop

Logical scalar, default is FALSE. Provides the default value of the argument drop when subsetting fixest_multi objects.

Value

The function getFixest_multi() returns the list of settings.

Examples

# 1) let's run a multiple estimation
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
est = feols(y ~ csw(x1, x2, x3), base)

# 2) let's pick a single estimation => by default we have a `fixest_multi` object
class(est[rhs = 2])

# `drop = TRUE` would have led to a `fixest` object
class(est[rhs = 2, drop = TRUE])

# 3) change the default behavior
setFixest_multi(drop = TRUE)
class(est[rhs = 2])

Sets/gets whether to display notes in fixest estimation functions

Description

Sets/gets the default value of whether notes (informing about NA values and removed observations) should be displayed in fixest estimation functions.

Usage

setFixest_notes(x)

getFixest_notes()

Arguments

x

A logical. If FALSE, then notes are permanently removed.

Author(s)

Laurent Berge

Examples

# Change default with
setFixest_notes(FALSE)
feols(Ozone ~ Solar.R, airquality)

# Back to default which is TRUE
setFixest_notes(TRUE)
feols(Ozone ~ Solar.R, airquality)

Sets/gets the number of threads to use in fixest functions

Description

Sets/gets the default number of threads to use in fixest estimation functions. The default is the maximum number of threads minus two.

Usage

setFixest_nthreads(nthreads, save = FALSE)

getFixest_nthreads()

Arguments

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. If missing, the default is to use 50% of all threads.

save

Either a logical or equal to "reset". Default is FALSE. If TRUE then the value is set permanently at the project level, this means that if you restart R, you will still obtain the previously saved defaults. This is done by writing in the ".Renviron" file, located in the project's working directory, hence you must have write permission there for this to work, and it only works with Rstudio. If equal to "reset", the default at the project level is erased. Since writing to a file is involved, the user is asked for permission.

Author(s)

Laurent Berge

Examples

# Gets the current number of threads
(nthreads_origin = getFixest_nthreads())

# To set multi-threading off:
setFixest_nthreads(1)

# To set it back to default at startup:
setFixest_nthreads()

# And back to the original value
setFixest_nthreads(nthreads_origin)

Sets the default type of standard errors to be used

Description

This function defines or extracts the default type of standard-errors to be computed in fixest summary and vcov.

Usage

setFixest_vcov(
  no_FE = "iid",
  one_FE = "iid",
  two_FE = "iid",
  panel = "iid",
  all = NULL,
  reset = FALSE
)

getFixest_vcov()

Arguments

no_FE

Character scalar equal to either: "iid" (default), or "hetero". The type of standard-errors to use by default for estimations without fixed-effects.

one_FE

Character scalar equal to either: "iid" (default), "hetero", or "cluster". The type of standard-errors to use by default for estimations with one fixed-effect.

two_FE

Character scalar equal to either: "iid" (default), "hetero", "cluster", or "twoway". The type of standard-errors to use by default for estimations with two or more fixed-effects.

panel

Character scalar equal to either: "iid" (default), "hetero", "cluster", or "driscoll_kraay". The type of standard-errors to use by default for estimations with the argument panel.id set up. Note that panel has precedence over the presence of fixed-effects.

all

Character scalar equal to either: "iid", or "hetero" (or "cluster" if the argument no_FE is provided). By default it is NULL. If provided, it sets all the SEs to that value.

reset

Logical, default is FALSE. Whether to reset to the default values.

Value

The function getFixest_vcov() returns a list with three elements containing the default for estimations i) without, ii) with one, or iii) with two or more fixed-effects.

Examples

# By default: 'standard' VCOVs
data(base_did)
est_no_FE  = feols(y ~ x1, base_did)
est_one_FE = feols(y ~ x1 | id, base_did)
est_two_FE = feols(y ~ x1 | id + period, base_did)
est_panel  = feols(y ~ x1 | id + period, base_did, panel.id = ~id + period)

etable(est_no_FE, est_one_FE, est_two_FE)

# Changing the default standard-errors
setFixest_vcov(no_FE = "hetero", one_FE = "cluster",
               two_FE = "twoway", panel = "drisc")
etable(est_no_FE, est_one_FE, est_two_FE, est_panel)

# Resetting the defaults
setFixest_vcov(reset = TRUE)

Residual standard deviation of fixest estimations

Description

Extract the estimated standard deviation of the errors from fixest estimations.

Usage

## S3 method for class 'fixest'
sigma(object, ...)

Arguments

object

A fixest object.

...

Not currently used.

Value

Returns a numeric scalar.

See Also

feols, fepois, feglm, fenegbin, feNmlm.

Examples

est = feols(Petal.Length ~ Petal.Width, iris)
sigma(est)
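
For a plain OLS fit, the residual standard deviation is conventionally the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below illustrates this convention with stats::lm only. Whether the fixest method uses exactly the same degrees of freedom, in particular in the presence of fixed-effects, is not stated here, so this should be read as background rather than as a description of sigma for fixest objects.

```r
# Same model as in the example above, estimated with stats::lm
fit = lm(Petal.Length ~ Petal.Width, iris)

# Residual sum of squares and residual degrees of freedom (n - K)
rss = sum(resid(fit)^2)
df_resid = df.residual(fit)

# Manual computation of the residual standard deviation
sigma_manual = sqrt(rss / df_resid)

# Agrees with the sigma method for lm objects
all.equal(sigma_manual, sigma(fit))
```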

Design matrix of a fixest object returned in sparse format

Description

This function creates the left-hand-side or the right-hand-side(s) of a femlm, feols or feglm estimation.

Usage

sparse_model_matrix(
  object,
  data,
  type = "rhs",
  sample = "estimation",
  na.rm = FALSE,
  collin.rm = NULL,
  combine = TRUE,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

data

If missing (default) then the original data is obtained by evaluating the call. Otherwise, it should be a data.frame.

type

Character vector or one sided formula, default is "rhs". Contains the type of matrix/data.frame to be returned. Possible values are: "lhs", "rhs", "fixef", "iv.rhs1" (1st stage RHS), "iv.rhs2" (2nd stage RHS), "iv.endo" (endogenous vars.), "iv.exo" (exogenous vars), "iv.inst" (instruments).

sample

Character scalar equal to "estimation" (default) or "original". Onlyused whendata=NULL (i.e. the original data is requested). By default,only the observations effectively used in the estimation are returned (it includesthe observations with NA values or the fully explained by the fixed-effects (FE), ordue to NAs in the weights).

Ifsample="original", all the observations are returned. In that case, ifyou usena.rm=TRUE (which is not the default), you can withdraw the observationswith NA values (and keep the ones fully explained by the FEs).

na.rm

Default isFALSE. Should observations with NAs be removed from the matrix?

collin.rm

Logical scalar. Whether to remove variables that werefound to be collinear during the estimation. Beware: it does not perform acollinearity check and bases on thecoef(object).Default is TRUE if object is afixest object, orFALSE if object is a formula.

combine

Logical scalar, default isTRUE. Whether to combine eachresulting sparse matrix.

...

Not currently used.

Value

It returns either a single sparse matrix or a list of matrices, depending on whether combine is TRUE or FALSE. The sparse matrix is of class dgCMatrix from the Matrix package.

Author(s)

Laurent Berge, Kyle Butts

See Also

See also the main estimation functions femlm, feols or feglm. formula.fixest, update.fixest, summary.fixest, vcov.fixest.

Examples

est = feols(wt ~ i(vs) + hp | cyl, mtcars)
sparse_model_matrix(est)
sparse_model_matrix(wt ~ i(vs) + hp | cyl, mtcars)
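To see why a sparse format suits fixed-effects, the sketch below builds the dummy matrix of a single factor directly with the Matrix package. It only illustrates the dgCMatrix class mentioned in the Value section; it is an assumption-free use of Matrix::sparseMatrix, not a description of how sparse_model_matrix works internally.

```r
library(Matrix)

# One fixed-effect dimension: the number of cylinders in mtcars
f <- factor(mtcars$cyl)

# One row per observation, one column per level, a single 1 per row
X <- sparseMatrix(
  i = seq_along(f),        # row index: the observation
  j = as.integer(f),       # column index: the level of the factor
  x = 1,
  dims = c(length(f), nlevels(f)),
  dimnames = list(NULL, levels(f))
)

class(X)   # a column-compressed sparse matrix
dim(X)
```

Only the non-zero entries are stored, so memory grows with the number of observations rather than with observations times levels.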

Governs the small sample correction in fixest VCOVs

Description

Specifies how the small sample correction should be calculated in vcov.fixest/summary.fixest.

Usage

ssc(
  K.adj = TRUE,
  K.fixef = "nonnested",
  K.exact = FALSE,
  G.adj = TRUE,
  G.df = "min",
  t.df = "min",
  ...
)

setFixest_ssc(ssc.type = ssc())

getFixest_ssc()

Arguments

K.adj

Logical scalar, defaults to TRUE. Whether to apply a small sample adjustment of the form (n - 1) / (n - K), with K the number of estimated parameters. If FALSE, then no adjustment is made.

K.fixef

Character scalar equal to "nonnested" (default), "none" or "full". In the small sample adjustment, how to account for the fixed-effects parameters. If "none", the fixed-effects parameters are discarded, meaning the number of parameters (K) is only equal to the number of variables. If "full", then the number of parameters is equal to the number of variables plus the number of fixed-effects. Finally, if "nonnested", then the number of parameters is equal to the number of variables plus the number of fixed-effects that are not nested in the clusters used to cluster the standard-errors.

K.exact

Logical, default is FALSE. If there are 2 or more fixed-effects, these fixed-effects can be irregular, meaning they can provide the same information. If so, the "real" number of parameters should be lower than the total number of fixed-effects. If K.exact = TRUE, then fixef.fixest is first run to determine the exact number of parameters among the fixed-effects. Mostly, panels of the type individual-firm require K.exact = TRUE (but it adds computational costs).

G.adj

Logical scalar, default is TRUE. How to make the small sample correction when clustering the standard-errors? If TRUE a G/(G-1) correction is performed with G the number of cluster values.

G.df

Either "conventional" or "min" (default). Only relevant when the variance-covariance matrix is two-way clustered (or higher). It governs how the small sample adjustment for the clusters is to be performed. [Sorry for the jargon that follows.] By default a unique adjustment is made, of the form G_min/(G_min-1) with G_min the smallest G_i. If G.df="conventional" then the i-th "sandwich" matrix is adjusted with G_i/(G_i-1) with G_i the number of unique clusters.

t.df

Either "conventional", "min" (default) or an integer scalar. Only relevant when the variance-covariance matrix is clustered. It governs how the p-values should be computed. By default, the degrees of freedom of the Student t distribution is equal to the minimum size of the clusters with which the VCOV has been clustered minus one. If t.df="conventional", then the degrees of freedom of the Student t distribution is equal to the number of observations minus the number of estimated variables. You can also pass a number to manually specify the DoF of the t-distribution.

...

Only used internally (to catch deprecated parameters).

ssc.type

An object of class ssc.type obtained with the function ssc.
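The adjustment factors quoted in the argument descriptions are simple ratios. The sketch below evaluates them in base R for made-up sample sizes; the helper names k_adj_factor and g_adj_factor are hypothetical and are not fixest functions.

```r
# Hypothetical helpers mirroring the formulas quoted in the argument descriptions
k_adj_factor <- function(n, K) (n - 1) / (n - K)   # K.adj = TRUE
g_adj_factor <- function(G) G / (G - 1)            # G.adj = TRUE

# Made-up example: 150 observations and 4 estimated parameters
k_fac <- k_adj_factor(n = 150, K = 4)

# Made-up two-way clustering with 10 and 25 clusters
G <- c(10, 25)
g_min  <- g_adj_factor(min(G))   # G.df = "min": a single factor based on the smallest G_i
g_conv <- g_adj_factor(G)        # G.df = "conventional": one factor per clustering dimension

k_fac
g_min
g_conv
```

With few clusters the G/(G-1) factor matters much more than the (n - 1)/(n - K) one, which is why the choice between "min" and "conventional" is only visible in multi-way clustering.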

Details

The vignette "On standard-errors" describes in detail how the standard-errors are computed in fixest and how you can replicate standard-errors from other software.
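The effect of the t.df argument on p-values can be sketched with base R's pt. The numbers below are made up, and the sketch assumes that "min" means the smallest number of clusters across the clustering dimensions, minus one; this reading of the argument description is an assumption.

```r
# Made-up inputs: a t-statistic, the sample size, the number of
# estimated variables and the number of clusters in each dimension
t_stat <- 2.1
n <- 500
K <- 3
G <- c(40, 12)

df_min  <- min(G) - 1   # t.df = "min" (assumed interpretation)
df_conv <- n - K        # t.df = "conventional"

# Two-sided p-values from the Student t distribution
p_min  <- 2 * pt(-abs(t_stat), df = df_min)
p_conv <- 2 * pt(-abs(t_stat), df = df_conv)

c(min = p_min, conventional = p_conv)
```

For a given t-statistic, fewer degrees of freedom give a larger p-value, so the default is the more conservative choice.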

Value

It returns a ssc.type object.

Author(s)

Laurent Berge

See Also

summary.fixest, vcov.fixest

Examples

#
# Equivalence with lm/glm standard-errors
#

# LM
# In the absence of fixed-effects,
# by default, the standard-errors are computed in the same way

res = feols(Petal.Length ~ Petal.Width + Species, iris)
res_lm = lm(Petal.Length ~ Petal.Width + Species, iris)
vcov(res) / vcov(res_lm)

# GLM
# By default, there is no small sample adjustment in glm, as opposed to feglm.
# To get the same SEs, we need to use ssc(K.adj = FALSE)

res_pois = fepois(round(Petal.Length) ~ Petal.Width + Species, iris)
res_glm = glm(round(Petal.Length) ~ Petal.Width + Species, iris, family = poisson())
vcov(res_pois, ssc = ssc(K.adj = FALSE)) / vcov(res_glm)

# Same example with the Gamma
res_gamma = feglm(round(Petal.Length) ~ Petal.Width + Species, iris, family = Gamma())
res_glm_gamma = glm(round(Petal.Length) ~ Petal.Width + Species, iris, family = Gamma())
vcov(res_gamma, ssc = ssc(K.adj = FALSE)) / vcov(res_glm_gamma)

#
# Fixed-effects corrections
#

# We create "irregular" FEs
base = data.frame(x = rnorm(10))
base$y = base$x + rnorm(10)
base$fe1 = rep(1:3, c(4, 3, 3))
base$fe2 = rep(1:5, each = 2)

est = feols(y ~ x | fe1 + fe2, base)

# fe1: 3 FEs
# fe2: 5 FEs

#
# Clustered standard-errors: by fe1
#

# Default: K.fixef = "nonnested"
#  => adjustment K = 1 + 5 (i.e. x + fe2)
summary(est)
attributes(vcov(est, attr = TRUE))[c("ssc", "df.K")]

# K.fixef = "none"
#  => adjustment K = 1 (i.e. only x)
summary(est, ssc = ssc(K.fixef = "none"))
attr(vcov(est, ssc = ssc(K.fixef = "none"), attr = TRUE), "df.K")

# K.fixef = "full"
#  => adjustment K = 1 + 3 + 5 - 1 (i.e. x + fe1 + fe2 - 1 restriction)
summary(est, ssc = ssc(K.fixef = "full"))
attr(vcov(est, ssc = ssc(K.fixef = "full"), attr = TRUE), "df.K")

# K.fixef = "full" & K.exact = TRUE
#  => adjustment K = 1 + 3 + 5 - 2 (i.e. x + fe1 + fe2 - 2 restrictions)
summary(est, ssc = ssc(K.fixef = "full", K.exact = TRUE))
attr(vcov(est, ssc = ssc(K.fixef = "full", K.exact = TRUE), attr = TRUE), "df.K")

# There are two restrictions:
attr(fixef(est), "references")

#
# To permanently set the default ssc:
#

# eg no small sample adjustment:
setFixest_ssc(ssc(K.adj = FALSE))

# Factory default
setFixest_ssc()
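The count of two restrictions in the "irregular" example can be checked without fixest: the number of identifiable fixed-effects coefficients is the rank of the matrix of dummies. The sketch below is a base R check of that arithmetic, not the algorithm used by fixef.fixest.

```r
# Same fixed-effects identifiers as in the example above
fe1 <- rep(1:3, c(4, 3, 3))
fe2 <- rep(1:5, each = 2)

# Full set of dummies for both dimensions (3 + 5 columns, no intercept)
D <- cbind(model.matrix(~ 0 + factor(fe1)),
           model.matrix(~ 0 + factor(fe2)))

n_coef <- ncol(D)        # 3 + 5 fixed-effects coefficients
rank_D <- qr(D)$rank     # number of linearly independent columns

# Each lost rank is one coefficient that must be set as a reference
n_restrictions <- n_coef - rank_D
n_restrictions
```

Here fe1 = 1 only meets fe2 = 1 and 2, while fe1 = 2 and 3 share fe2 = 4, so the data split into two disconnected groups and each group needs its own reference.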

Stepwise estimation tools

Description

Functions to perform stepwise estimations in fixest models.

Usage

sw(...)

csw(...)

sw0(...)

csw0(...)

mvsw(...)

Arguments

...

Represents formula variables to be added in a stepwise fashion to an estimation.

Details

To include multiple independent variables, you need to use the stepwise functions. There are 5 stepwise functions: sw, sw0, csw, csw0 and mvsw. Let's explain that.

Assume you have the following formula: fml = y ~ x1 + sw(x2, x3). The stepwise function sw will estimate the following two models: y ~ x1 + x2 and y ~ x1 + x3. That is, each element in sw() is sequentially, and separately, added to the formula. Had you used sw0 in lieu of sw, then the model y ~ x1 would also have been estimated. The 0 in the name implies that the model without any stepwise element will also be estimated.

Finally, the prefix c means cumulative: each stepwise element is added to the next. That is, fml = y ~ x1 + csw(x2, x3) would lead to the following models: y ~ x1 + x2 and y ~ x1 + x2 + x3. The 0 has the same meaning and would also lead to the model without the stepwise elements to be estimated: in other words, fml = y ~ x1 + csw0(x2, x3) leads to the following three models: y ~ x1, y ~ x1 + x2 and y ~ x1 + x2 + x3.

The last stepwise function, mvsw, refers to 'multiverse' stepwise. It will estimate as many models as there are unique combinations of stepwise variables. For example fml = y ~ x1 + mvsw(x2, x3) will estimate y ~ x1, y ~ x1 + x2, y ~ x1 + x3, y ~ x1 + x2 + x3. Beware that the number of estimations grows pretty fast (2^n, with n the number of stepwise variables)!
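The expansion rules described above can be written down as a small base R function that lists the formulas each stepwise function stands for. The helper expand_stepwise is hypothetical and only mimics the semantics in the text; fixest does its own expansion internally.

```r
# Hypothetical helper (not part of fixest): lists the models implied by a
# stepwise function applied to `vars`, appended to the fixed part `base`.
expand_stepwise <- function(base, vars,
                            type = c("sw", "sw0", "csw", "csw0", "mvsw")) {
  type <- match.arg(type)
  n <- length(vars)
  cumulative <- lapply(seq_len(n), function(i) vars[seq_len(i)])

  sets <- switch(type,
    sw   = as.list(vars),                              # one variable at a time
    sw0  = c(list(character(0)), as.list(vars)),       # same, plus the empty model
    csw  = cumulative,                                 # variables accumulate
    csw0 = c(list(character(0)), cumulative),          # same, plus the empty model
    # every subset of vars, encoded by the bits of 0, 1, ..., 2^n - 1
    mvsw = lapply(0:(2^n - 1), function(m) {
      vars[bitwAnd(m, 2^(seq_len(n) - 1)) > 0]
    })
  )

  vapply(sets, function(s) paste(c(base, s), collapse = " + "), character(1))
}

expand_stepwise("y ~ x1", c("x2", "x3"), "sw")
expand_stepwise("y ~ x1", c("x2", "x3"), "csw0")
expand_stepwise("y ~ x1", c("x2", "x3"), "mvsw")
```

Calling the helper with three variables and type = "mvsw" returns 2^3 = 8 formulas, which shows how quickly the multiverse grows.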

Examples

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))

# Regular stepwise
feols(y ~ sw(x1, x2, x3), base)

# Cumulative stepwise
feols(y ~ csw(x1, x2, x3), base)

# Using the 0
feols(y ~ x1 + x2 + sw0(x3), base)

# Multiverse stepwise
feols(y ~ x1 + mvsw(x2, x3), base)

Style of data.frames created by etable

Description

This function describes the style of data.frames created with the function etable.

Usage

style.df(
  depvar.title = "Dependent Var.:",
  fixef.title = "Fixed-Effects:",
  fixef.line = "-",
  fixef.prefix = "",
  fixef.suffix = "",
  slopes.title = "Varying Slopes:",
  slopes.line = "-",
  slopes.format = "__var__ (__slope__)",
  stats.title = "_",
  stats.line = "_",
  yesNo = c("Yes", "No"),
  headers.sep = TRUE,
  signif.code = c(`***` = 0.001, `**` = 0.01, `*` = 0.05, . = 0.1),
  interaction.combine = " x ",
  i.equal = " = ",
  default = FALSE
)

Arguments

depvar.title

Character scalar. Default is "Dependent Var.:". The row name of the dependent variables.

fixef.title

Character scalar. Default is "Fixed-Effects:". The header preceding the fixed-effects. If equal to the empty string, then this line is removed.

fixef.line

A single character. Default is "-". A character that will be used to create a line of separation for the fixed-effects header. Used only if fixef.title is not the empty string.

fixef.prefix

Character scalar. Default is "". A prefix to appear before each fixed-effect name.

fixef.suffix

Character scalar. Default is "". A suffix to appear after each fixed-effect name.

slopes.title

Character scalar. Default is "Varying Slopes:". The header preceding the variables with varying slopes. If equal to the empty string, then this line is removed.

slopes.line

Character scalar. Default is "-". A character that will be used to create a line of separation for the variables with varying slopes header. Used only if slopes.title is not the empty string.

slopes.format

Character scalar. Default is "__var__ (__slope__)". The format of the name of the varying slopes. The values __var__ and __slope__ are special characters that will be replaced by the value of the variable name and slope name, respectively.

stats.title

Character scalar. Default is "_". The header preceding the statistics section. If equal to the empty string, then this line is removed. If equal to a single character (like in the default), then this character will be expanded to take the full column width.

stats.line

Character scalar. Default is "_". A character that will be used to create a line of separation for the statistics header. Used only if stats.title is not the empty string.

yesNo

Character vector of length 1 or 2. Default is c("Yes", "No"). Used to inform on the presence or absence of fixed-effects in the estimation. If of length 1, then automatically the second value is considered as the empty string.

headers.sep

Logical, default is TRUE. Whether to add a line of separation between the headers and the coefficients.

signif.code

Named numeric vector, used to provide the significance codes with respect to the p-value of the coefficients. Default is c("***"=0.001, "**"=0.01, "*"=0.05, "."=0.10). To suppress the significance codes, use signif.code=NA or signif.code=NULL. Can also be equal to "letters", then the default becomes c("a"=0.01, "b"=0.05, "c"=0.10).

interaction.combine

Character scalar, defaults to " x ". When the estimation contains interactions, then the variables names (after aliasing) are combined with this argument. For example: if dict = c(x1="Wind", x2="Rain") and you have the following interaction x1:x2, then it will be renamed (by default) Wind x Rain; using interaction.combine = "*" would lead to Wind*Rain.

i.equal

Character scalar, defaults to " = ". Only affects factor variables created with the function i, tells how the variable should be linked to its value. For example if you have the Species factor from the iris data set, by default the display of the variable is Species = Setosa, etc. If i.equal = ": " the display becomes Species: Setosa.

default

Logical, default is FALSE. If TRUE, all the values not provided by the user are set to their default.
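The signif.code argument is a named vector of thresholds. The base R sketch below shows one way such a vector can be turned into codes for a set of p-values. The helper p_to_code is hypothetical, and the rule it uses (p-value lower than or equal to the threshold) is an assumption, not a statement about how etable compares them.

```r
# Default significance codes of style.df, as listed in the Usage section
signif_code <- c("***" = 0.001, "**" = 0.01, "*" = 0.05, "." = 0.1)

# Hypothetical helper (not part of fixest): returns the code attached to the
# smallest threshold that the p-value does not exceed, or "" if none applies.
# Assumption: the comparison is p <= threshold.
p_to_code <- function(p, codes) {
  codes <- sort(codes)   # strictest threshold first, names are kept
  vapply(p, function(p_i) {
    hit <- names(codes)[p_i <= codes]
    if (length(hit) > 0) hit[1] else ""
  }, character(1), USE.NAMES = FALSE)
}

p_to_code(c(0.0005, 0.03, 0.2), signif_code)
```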

Details

The title elements (depvar.title, fixef.title, slopes.title and stats.title) will be the row names of the returned data.frame. Therefore keep in mind that any two of them should not be identical (since identical row names are forbidden in data.frames).
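The constraint on row names comes from base R itself and can be reproduced without fixest. The sketch below tries to build a data.frame with two identical row names and catches the resulting error.

```r
# Two identical titles used as row names: base R refuses to build the data.frame
res <- tryCatch(
  data.frame(value = 1:2, row.names = c("Fixed-Effects:", "Fixed-Effects:")),
  error = function(e) e
)

inherits(res, "error")   # TRUE: duplicated row names are rejected
conditionMessage(res)
```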

Value

It returns an object of class fixest_style_df.

Examples

# Multiple estimations => see details in feols
aq = airquality
est = feols(c(Ozone, Solar.R) ~
                Wind + csw(Temp, Temp^2, Temp^3) | Month + Day,
            data = aq)

# Default result
etable(est)

# Playing a bit with the styles
etable(est, style.df = style.df(fixef.title = "", fixef.suffix = " FE",
                                 stats.line = " ", yesNo = "yes"))

Style definitions for Latex tables

Description

This function describes the style of Latex tables to be exported with the function etable.

Usage

style.tex(
  main = "base",
  depvar.title,
  model.title,
  model.format,
  line.top,
  line.bottom,
  var.title,
  fixef.title,
  fixef.prefix,
  fixef.suffix,
  fixef.where,
  slopes.title,
  slopes.format,
  fixef_sizes.prefix,
  fixef_sizes.suffix,
  stats.title,
  notes.intro,
  notes.tpt.intro,
  tablefoot,
  tablefoot.value,
  yesNo,
  tabular = "normal",
  depvar.style,
  no_border,
  caption.after,
  rules_width,
  signif.code,
  tpt,
  arraystretch,
  adjustbox = NULL,
  fontsize,
  interaction.combine = " $\\times$ ",
  i.equal = " $=$ "
)

Arguments

main

Either "base", "aer" or "qje". Defines the basic style to start from. The styles "aer" and "qje" are almost identical and only differ on the top/bottom lines.

depvar.title

A character scalar. The title of the line of the dependent variables (defaults to "Dependent variable(s):" if main = "base" (the 's' appears only if there is more than one variable) and to "" if main = "aer").

model.title

A character scalar. The title of the line of the models (defaults to "Model:" if main = "base" and to "" if main = "aer").

model.format

A character scalar. The value to appear on top of each column. It defaults to "(1)". Note that 1, i, I, a and A are special characters: if found, their values will be automatically incremented across columns.

line.top

A character scalar equal to "simple", "double", or anything else. The line at the top of the table (defaults to "double" if main = "base" and to "simple" if main = "aer"). "simple" is equivalent to "\\toprule", and "double" to "\\tabularnewline \\midrule \\midrule".

line.bottom

A character scalar equal to "simple", "double", or anything else. The line at the bottom of the table (defaults to "double" if main = "base" and to "simple" if main = "aer"). "simple" is equivalent to "\\bottomrule", and "double" to "\\midrule \\midrule & \\tabularnewline".

var.title

A character scalar. The title line appearing before the variables (defaults to "\\midrule \\emph{Variables}" if main = "base" and to "\\midrule" if main = "aer"). Note that the behavior of var.title = " " (a space) is different from var.title = "" (the empty string): in the first case you will get an empty row, while in the second case you get no empty row. To get a line without an empty row, use "\\midrule" (and not "\\midrule " with a trailing space).

fixef.title

A character scalar. The title line appearing before the fixed-effects (defaults to "\\midrule \\emph{Fixed-effects}" if main = "base" and to " " if main = "aer"). Note that the behavior of fixef.title = " " (a space) is different from fixef.title = "" (the empty string): in the first case you will get an empty row, while in the second case you get no empty row. To get a line without an empty row, use "\\midrule" (and not "\\midrule " with a trailing space).

fixef.prefix

A prefix to add to the fixed-effects names. Defaults to "" (i.e. no prefix).

fixef.suffix

A suffix to add to the fixed-effects names. Defaults to "" if main = "base" and to "fixed-effects" if main = "aer".

fixef.where

Either "var" or "stats". Where to place the fixed-effects lines? Defaults to "var", i.e. just after the variables, if main = "base" and to "stats", i.e. just after the statistics, if main = "aer".

slopes.title

A character scalar. The title line appearing before the variables with varying slopes (defaults to "\\midrule \\emph{Varying Slopes}" if main = "base" and to "" if main = "aer"). Note that the behavior of slopes.title = " " (a space) is different from slopes.title = "" (the empty string): in the first case you will get an empty row, while in the second case you get no empty row. To get a line without an empty row, use "\\midrule" (and not "\\midrule " with a trailing space).

slopes.format

Character scalar representing the format of the slope variable name. There are two special values: "__var__" and "__slope__", placeholders for the variable and slope names. Defaults to "__var__ (__slope__)" if main = "base" and to "__var__ $\\times $ __slope__" if main = "aer".

fixef_sizes.prefix

A prefix to add to the fixed-effects names. Defaults to "# ".

fixef_sizes.suffix

A suffix to add to the fixed-effects names. Defaults to "" (i.e. no suffix).

stats.title

A character scalar. The title line appearing before the statistics (defaults to "\\midrule \\emph{Fit statistics}" if main = "base" and to " " if main = "aer"). Note that the behavior of stats.title = " " (a space) is different from stats.title = "" (the empty string): in the first case you will get an empty row, while in the second case you get no empty row. To get a line without an empty row, use "\\midrule" (and not "\\midrule " with a trailing space).

notes.intro

A character scalar. Some tex code appearing just before the notes, defaults to "\\par \\raggedright \n".

notes.tpt.intro

Character scalar. Only used if tpt = TRUE, it is some tex code that is passed before any threeparttable item (can be used for, typically, the font size). Default is the empty string.

tablefoot

A logical scalar. Whether or not to display a footer within the table. Defaults to TRUE if main = "base" and FALSE if main = "aer".

tablefoot.value

A character scalar. The notes to be displayed in the footer. Defaults to "default" if main = "base", which leads to custom footers informing on the type of standard-error and significance codes, depending on the estimations.

yesNo

A character vector of length 1 or 2. Defaults to "Yes" if main = "base" and to "$\\checkmark$" if main = "aer" (from package amssymb). This is the message displayed when a given fixed-effect is (or is not) included in a regression. If yesNo is of length 1, then the second element is the empty string.

tabular

(Tex only.) Character scalar equal to "normal" (default), "*" or "X". Represents the type of tabular environment to use: either tabular, tabular* or tabularx.

depvar.style

Character scalar equal to either " " (default), "*" (italic), "**" (bold), "***" (italic-bold). How the name of the dependent variable should be displayed.

no_border

Logical, default is FALSE. Whether to remove any side border to the table (typically adds @{} to the sides of the tabular).

caption.after

Character scalar. Tex code that will be placed right after the caption. Defaults to "" for main = "base" and "\\medskip" for main = "aer".

rules_width

Character vector of length 1 or 2. This vector gives the width of the booktabs rules: the first element the heavy-width, the second element the light-width. NA values mean no modification. If of length 1, only the heavy rules are modified. The widths are in Latex units (e.g. "0.1 em").

signif.code

Named numeric vector, used to provide the significance codes with respect to the p-value of the coefficients. Default is c("***"=0.01, "**"=0.05, "*"=0.10). To suppress the significance codes, use signif.code=NA or signif.code=NULL. Can also be equal to "letters", then the default becomes c("a"=0.01, "b"=0.05, "c"=0.10).

tpt

(Tex only.) Logical scalar, default is FALSE. Whether to use the threeparttable environment. If so, the notes will be integrated into the tablenotes environment.

arraystretch

(Tex only.) A numeric scalar, default is NULL. If provided, the command \\renewcommand*{\\arraystretch}{x} is inserted, replacing x by the value of arraystretch. The changes are specific to the current table and do not affect the rest of the document.

adjustbox

(Tex only.) A logical, numeric or character scalar, default is NULL. If not NULL, the table is inserted within the adjustbox environment. By default the options are width = 1\\textwidth, center (if TRUE). A numeric value changes the value before \\textwidth. You can also add a character of the form "x tw" or "x th" with x a number and where tw (th) stands for text-width (text-height). Finally any other character value is passed verbatim as an adjustbox option.

fontsize

(Tex only.) A character scalar, default is NULL. Can be equal to tiny, scriptsize, footnotesize, small, normalsize, large, or Large. The change affects the table only (and not the rest of the document).

interaction.combine

Character scalar, defaults to " $\\times$ ". When the estimation contains interactions, then the variables names (after aliasing) are combined with this argument. For example: if dict = c(x1="Wind", x2="Rain") and you have the following interaction x1:x2, then it will be renamed (by default) Wind $\\times$ Rain; using interaction.combine = "*" would lead to Wind*Rain.

i.equal

Character scalar, defaults to " $=$ ". Only affects factor variables created with the function i, tells how the variable should be linked to its value. For example if you have the Species factor from the iris data set, by default the display of the variable is Species $=$ Setosa, etc. If i.equal = ": " the display becomes Species: Setosa.

Details

The \\checkmark command, used in the "aer" style (in argument yesNo), is in the amssymb package.

The commands \\toprule, \\midrule and \\bottomrule are in the booktabs package. You can set the width of the top/bottom rules with \\setlength\\heavyrulewidth{wd}, and of the midrule with \\setlength\\lightrulewidth{wd}.

Note that all titles (depvar.title, model.title, etc) are not escaped, so they must be valid Latex expressions.
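Since the titles are passed to Latex as they are, any special character in a custom title has to be escaped by the user. The base R sketch below shows one way to do so. The helper escape_latex is hypothetical (not a fixest function) and only handles a handful of characters; it also shows that "\\midrule" in R source is the single-backslash Latex command.

```r
# Hypothetical helper (not part of fixest): escapes a few Latex special
# characters (%, &, _, #, $) by prefixing them with a backslash.
escape_latex <- function(x) gsub("([%&_#$])", "\\\\\\1", x)

# In R source, "\\" is one backslash: this string is the 8-character \midrule
nchar("\\midrule")

# A custom title made of Latex code plus an escaped label
title <- paste0("\\midrule \\emph{", escape_latex("R&D share (50%)"), "}")
cat(title, "\n")
```

The resulting string could then be supplied to a title argument such as var.title; the markup itself is left untouched because only the label goes through the helper.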

Value

Returns a list containing the style parameters.

See Also

etable

Examples

# Multiple estimations => see details in feols
aq = airquality
est = feols(c(Ozone, Solar.R) ~
                Wind + csw(Temp, Temp^2, Temp^3) | Month + Day,
            data = aq)

# Playing a bit with the styles
etable(est, tex = TRUE)
etable(est, tex = TRUE, style.tex = style.tex("aer"))

etable(est, tex = TRUE, style.tex = style.tex("aer",
                                      var.title = "\\emph{Expl. Vars.}",
                                      model.format = "[i]",
                                      yesNo = "x",
                                      tabular = "*"))

Summary of a fixest object. Computes different types of standard errors.

Description

This function is similar to print.fixest. It provides the table of coefficients along with other information on the fit of the estimation. It can compute different types of standard errors. The new variance-covariance matrix is part of the returned object.

Usage

## S3 method for class 'fixest'
summary(
  object,
  vcov = NULL,
  cluster = NULL,
  ssc = NULL,
  stage = NULL,
  lean = FALSE,
  agg = NULL,
  forceCovariance = FALSE,
  se = NULL,
  keepBounded = FALSE,
  n = 1000,
  vcov_fix = TRUE,
  nthreads = getFixest_nthreads(),
  ...
)

## S3 method for class 'fixest_list'
summary(
  object,
  se,
  cluster,
  ssc = getFixest_ssc(),
  vcov = NULL,
  stage = 2,
  lean = FALSE,
  n,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

stage

Can be equal to 2 (default), 1, 1:2 or 2:1. Only used if the object is an IV estimation: defines the stage to which summary should be applied. If stage = 1 and there are multiple endogenous regressors or if stage is of length 2, then an object of class fixest_multi is returned.

lean

Logical, default is FALSE. Used to reduce the (memory) size of the summary object. If TRUE, then all objects of length N (the number of observations) are removed from the result. Note that some fixest methods may consequently not work when applied to the summary.

agg

A character scalar describing the variable names to be aggregated, it is pattern-based. For sunab estimations, the following keywords work: "att", "period", "cohort" and FALSE (to have full disaggregation). All variables that match the pattern will be aggregated. It must be of the form "(root)", the parentheses must be there and the resulting variable name will be "root". You can add another root with parentheses: "(root1)regex(root2)", in which case the resulting name is "root1::root2". To name the resulting variable differently you can pass a named vector: c("name" = "pattern") or c("name" = "pattern(root2)"). It's a bit intricate sorry, please see the examples.

forceCovariance

(Advanced users.) Logical, default is FALSE. In the peculiar case where the obtained Hessian is not invertible (usually because of collinearity of some variables), use this option to force the covariance matrix, by using a generalized inverse of the Hessian. This can be useful to spot where possible problems come from.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

keepBounded

(Advanced users: feNmlm with non-linear part and bounded coefficients only.) Logical, default is FALSE. If TRUE, then the bounded coefficients (if any) are treated as unrestricted coefficients and their S.E. is computed (otherwise it is not).

n

Integer, default is 1000. Number of coefficients to display when the print method is used.

vcov_fix

Logical scalar, default is TRUE. If the VCOV ends up not being positive definite, whether to "fix" it using an eigenvalue decomposition (a la Cameron, Gelbach & Miller 2011). Since the VCOV should be PSD asymptotically, this might be a sign of a problem with using the asymptotic approximation (e.g. too few units in clusters). If a problem is detected, the function will print a message to inform you.

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can set permanently the number of threads used within this package using the function setFixest_nthreads.

...

Only used if the argument vcov is provided and is a function: extra arguments to be passed to that function.

Value

It returns a fixest object with:

cov.scaled

The new variance-covariance matrix (computed according to the argument se).

se

The new standard-errors (computed according to the argument se).

coeftable

The table of coefficients with the new standard errors.
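The three returned elements are linked in the usual way: the standard-errors are the square roots of the diagonal of the variance-covariance matrix, and the coefficient table is derived from them. The sketch below illustrates this relation with base R's lm, since it does not depend on fixest; the degrees of freedom used for the p-values are those of lm and are not meant to describe the fixest defaults.

```r
# Base R illustration with lm: from a VCOV to a table of coefficients
fit <- lm(Petal.Length ~ Petal.Width, iris)

V  <- vcov(fit)          # variance-covariance matrix of the coefficients
se <- sqrt(diag(V))      # standard-errors

t_value <- coef(fit) / se
p_value <- 2 * pt(-abs(t_value), df = df.residual(fit))

coeftable <- cbind(Estimate = coef(fit), "Std. Error" = se,
                   "t value" = t_value, "Pr(>|t|)" = p_value)
coeftable
```

Changing the VCOV (for instance to a clustered one) changes se, and therefore every other column except the estimates.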

Compatibility with sandwich package

The VCOVs from sandwich can be used with feols, feglm and fepois estimations. If you want to have a sandwich VCOV when using summary.fixest, you can use the argument vcov to specify the VCOV function to use (see examples). Note that if you do so and you use a formula in the cluster argument, an innocuous warning can pop up if you used several non-numeric fixed-effects in the estimation (this is due to the function expand.model.frame used in sandwich).

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. Use fixef.fixest to extract the fixed-effects coefficients, and the function etable to visualize the results of multiple estimations.

Examples

# Load trade data
data(trade)

# We estimate the effect of distance on trade (with 3 fixed-effects)
est_pois = fepois(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# Comparing different types of standard errors
sum_standard = summary(est_pois, vcov = "iid")
sum_hetero   = summary(est_pois, vcov = "hetero")
sum_oneway   = summary(est_pois, vcov = "cluster")
sum_twoway   = summary(est_pois, vcov = "twoway")

etable(sum_standard, sum_hetero, sum_oneway, sum_twoway)

# Alternative ways to cluster the SE:
summary(est_pois, vcov = cluster ~ Product + Origin)
summary(est_pois, vcov = ~Product + Origin)
summary(est_pois, cluster = ~Product + Origin)

# You can interact the clustering variables "live" using the var1 ^ var2 syntax.
summary(est_pois, vcov = ~Destination^Product)

#
# Newey-West and Driscoll-Kraay SEs
#

data(base_did)
# Simple estimation on a panel
est = feols(y ~ x1, base_did)

# --
# Newey-West
# Use the syntax NW ~ unit + time
summary(est, NW ~ id + period)

# Now take a lag of 3:
summary(est, NW(3) ~ id + period)

# --
# Driscoll-Kraay
# Use the syntax DK ~ time
summary(est, DK ~ period)

# Now take a lag of 3:
summary(est, DK(3) ~ period)

#--
# Implicit deductions
# When the estimation is done with a panel.id, you don't need to
# specify these values.

est_panel = feols(y ~ x1, base_did, panel.id = ~id + period)

# Both methods, NW and DK, now work automatically
summary(est_panel, "NW")
summary(est_panel, "DK")

#
# VCOVs robust to spatial correlation
#

data(quakes)
est_geo = feols(depth ~ mag, quakes)

# --
# Conley
# Use the syntax: conley(cutoff) ~ lat + lon
# with lat/lon the latitude/longitude variable names in the data set
summary(est_geo, conley(100) ~ lat + long)

# Change the cutoff, and how the distance is computed
summary(est_geo, conley(200, distance = "spherical") ~ lat + long)

# --
# Implicit deduction
# By default the latitude and longitude are directly fetched in the data based
# on pattern matching. So you don't have to specify them.
# Further an automatic cutoff is computed by default.

# The following works
summary(est_geo, "conley")

#
# Compatibility with sandwich
#

# You can use the VCOVs from sandwich by using the argument vcov:
library(sandwich)
summary(est_pois, vcov = vcovCL, cluster = trade[, c("Destination", "Product")])

Summary method for fixed-effects coefficients

Description

This function summarizes the main characteristics of the fixed-effects coefficients.It shows the number of fixed-effects that have been set as references and the firstelements of the fixed-effects.

Usage

## S3 method for class 'fixest.fixef'
summary(object, n = 5, ...)

Arguments

object

An object returned by the function fixef.fixest.

n

Positive integer, defaults to 5. The n first fixed-effects for each fixed-effect dimension are reported.

...

Not currently used.

Value

It prints the number of fixed-effect coefficients per fixed-effect dimension, as well as the number of fixed-effects used as references for each dimension, and the mean and variance of the fixed-effect coefficients. Finally, it reports the first 5 (arg. n) elements of each fixed-effect.

Author(s)

Laurent Berge

See Also

femlm, fixef.fixest, plot.fixest.fixef.

Examples

data(trade)

# We estimate the effect of distance on trade
# => we account for 3 fixed-effects
est_pois = femlm(Euros ~ log(dist_km)|Origin+Destination+Product, trade)

# obtaining the fixed-effects coefficients
fe_trade = fixef(est_pois)

# printing some summary information on the fixed-effects coefficients:
summary(fe_trade)
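The argument n can be illustrated with a short sketch. It assumes the fixest package is installed and reuses the trade data; the object name fe_names is only illustrative.

```r
library(fixest)
data(trade)

est_pois = fepois(Euros ~ log(dist_km) | Origin + Destination + Product, trade)
fe_trade = fixef(est_pois)

# The fixef object holds one element per fixed-effect dimension
fe_names = names(fe_trade)

# Report only the first 3 coefficients of each dimension instead of the default 5
summary(fe_trade, n = 3)
```

A smaller n keeps the printed output short when there are many fixed-effect dimensions.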

Summary for fixest_multi objects

Description

Summary information for fixest_multi objects. In particular, this is used to specify the type of standard-errors to be computed.

Usage

## S3 method for class 'fixest_multi'
summary(
  object,
  type = "etable",
  vcov = NULL,
  se = NULL,
  cluster = NULL,
  ssc = NULL,
  stage = 2,
  lean = FALSE,
  n = 1000,
  ...
)

Arguments

object

A fixest_multi object, obtained from a fixest estimation leading to multiple results.

type

A character either equal to "etable", "short", "long", "compact", "se_compact" or "se_long". If etable, the function etable is used to print the result. If short, only the table of coefficients is displayed for each estimation. If long, then the full results are displayed for each estimation. If compact, a data.frame is returned with one line per model and the formatted coefficients + standard-errors in the columns. If se_compact, a data.frame is returned with one line per model, one numeric column for each coefficient and one numeric column for each standard-error. If "se_long", same as "se_compact" but the data is in a long format instead of wide.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally, it accepts functions to compute the covariances. See the vcov documentation in the vignette.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default, if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated; you should use vcov instead.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

stage

Can be equal to 2 (default), 1, 1:2 or 2:1. Only used if the object is an IV estimation: defines the stage to which summary should be applied. If stage = 1 and there are multiple endogenous regressors or if stage is of length 2, then an object of class fixest_multi is returned.

lean

Logical, default is FALSE. Used to reduce the (memory) size of the summary object. If TRUE, then all objects of length N (the number of observations) are removed from the result. Note that some fixest methods may consequently not work when applied to the summary.

n

Integer, default is 1000. Number of coefficients to display when the print method is used.

...

Not currently used.

Value

It returns either an object of class fixest_multi (if type equals short or long), or a data.frame (if type equals compact or se_compact).

See Also

The main fixest estimation functions: feols, fepois, fenegbin, feglm, feNmlm. Tools for multiple fixest estimations: summary.fixest_multi, print.fixest_multi, as.list.fixest_multi, sub-sub-.fixest_multi, sub-.fixest_multi.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# Multiple estimation
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)

# By default, the type is "etable"
# You can still use the arguments from summary.fixest
summary(res, se = "hetero")

summary(res, type = "long")

summary(res, type = "compact")

summary(res, type = "se_compact")

summary(res, type = "se_long")
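As a hedged illustration of the return types described in the Value section, the sketch below stores the "compact" and "se_compact" results and treats them as regular data.frames. It assumes fixest is installed; the names tab_compact and tab_se are illustrative only.

```r
library(fixest)

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# Multiple estimation: stepwise regressors, one sample per species
res = feols(y ~ csw(x1, x2, x3), base, split = ~species)

# "compact": one row per model, formatted coefficients and standard-errors
tab_compact = summary(res, type = "compact")

# "se_compact": one row per model, numeric columns for coefficients and standard-errors
tab_se = summary(res, type = "se_compact")

# Both results can be manipulated like any data.frame
head(tab_compact)
str(tab_se)
```

The numeric "se_compact" form is the more convenient one when the results feed further computations rather than a display.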

Sun and Abraham interactions

Description

User-level method to implement staggered difference-in-difference estimations a la Sun and Abraham (Journal of Econometrics, 2021).

Usage

sunab(
  cohort,
  period,
  ref.c = NULL,
  ref.p = -1,
  bin,
  bin.rel,
  bin.c,
  bin.p,
  att = FALSE,
  no_agg = FALSE
)

sunab_att(cohort, period, ref.c = NULL, ref.p = -1)

Arguments

cohort

A vector representing the cohort. It should represent the period at which the treatment has been received (and thus be fixed for each unit).

period

A vector representing the period. It can be either a relative time period (with negative values representing the periods before the treatment and positive values the periods after the treatment), or a regular time period. In the latter case, the relative time period will be created from the cohort information (which represents the time at which the treatment has been received).

ref.c

A vector of references for the cohort. By default the never treated cohorts are taken as reference and the always treated are excluded from the estimation. You can add more references with this argument, which means that dummies will not be created for them (but they will remain in the estimation).

ref.p

A vector of references for the (relative!) period. By default the first relative period (RP) before the treatment, i.e. -1, is taken as reference. You can instead use your own references (i.e. RPs for which dummies will not be created – but these observations remain in the sample). Please note that you will need at least two references. You can use the special variables .F and .L to access the first and the last relative periods.

bin

A list of values to be grouped, a vector, or the special value "bin::digit". The binning will be applied to both the cohort and the period (to bin them separately, see bin.c and bin.p). To create a new value from old values, use bin = list("new_value"=old_values) with old_values a vector of existing values. It accepts regular expressions, but they must start with an "@", like in bin="@Aug|Dec". The names of the list are the new names. If the new name is missing, the first value matched becomes the new name. Feeding in a vector is like using a list without name and only a single element. If the vector is numeric, you can use the special value "bin::digit" to group every digit element. For example, if x represents years, using bin="bin::2" creates bins of two years. Using "!bin::digit" groups every digit consecutive values starting from the first value. Using "!!bin::digit" is the same but starting from the last value. In both cases, x is not required to be numeric.

bin.rel

A list or a vector defining which values to bin. Only applies to the relative periods and not the cohorts. Please refer to the help of the argument bin to understand the different ways to do the binning (or look at the help of bin).

bin.c

A list or a vector defining which values to bin. Only applies to the cohort. Please refer to the help of the argument bin to understand the different ways to do the binning (or look at the help of bin).

bin.p

A list or a vector defining which values to bin. Only applies to the period. Please refer to the help of the argument bin to understand the different ways to do the binning (or look at the help of bin).

att

Logical, default is FALSE. If TRUE: then the total average treatment effect for the treated is computed (instead of the ATT for each relative period).

no_agg

Logical, default is FALSE. If TRUE: then there is no aggregation, leading to the estimation of all cohort x time to treatment coefficients.

Details

This function creates a matrix of cohort x relative_period interactions, and if used within a fixest estimation, the coefficients will automatically be aggregated to obtain the ATT for each relative period. In practice, the coefficients are aggregated with the aggregate.fixest function whose argument agg is automatically set to the appropriate value.

The SA method requires relative periods (negative/positive for before/after the treatment). Either the user computes the RPs (relative periods) on their own, or the RPs are computed on the fly from the periods and the cohorts (which then should represent the treatment period).

The never treated, which are the cohorts displaying only negative RPs, are used as references (i.e. no dummy will be constructed for them). On the other hand, the always treated are removed from the estimation, by means of adding NAs for each of their observations.

If the RPs have to be constructed on the fly, any cohort that is not present in the period is considered as never treated. This means that if the period ranges from 1995 to 2005, cohort = 1994 will be considered as never treated, although it should be considered as always treated: so be careful.

If you construct your own relative periods, the control cohorts should have only negative RPs.

Value

If not used within a fixest estimation, this function will return a matrix of interacted coefficients.

Binning

You can bin periods with the arguments bin, bin.c, bin.p and/or bin.rel.

The argument bin applies both to the original periods and cohorts (the cohorts will also be binned!). This argument only works when the period represents "calendar" periods (not relative ones!).

Alternatively you can bin the periods with bin.p (either "calendar" or relative); or the cohorts with bin.c.

The argument bin.rel applies only to the relative periods (hence not to the cohorts) once they have been created.

To understand how binning works, please have a look at the help and examples of the function bin.

Binning can be done in many different ways: just remember that the fact that a binning is possible does not mean that it makes sense!

Author(s)

Laurent Berge

Examples

# Simple DiD example
data(base_stagg)
head(base_stagg)

# Note that the year_treated is set to 1000 for the never treated
table(base_stagg$year_treated)
table(base_stagg$time_to_treatment)

# The DiD estimation
res_sunab = feols(y ~ x1 + sunab(year_treated, year) | id + year, base_stagg)
etable(res_sunab)

# By default the reference periods are the first year and the year before the treatment
# i.e. ref.p = c(-1, .F); where .F is a shortcut for the first period.
# Say you want to set as references the first three periods on top of -1
res_sunab_3ref = feols(y ~ x1 + sunab(year_treated, year, ref.p = c(.F + 0:2, -1)) |
                         id + year, base_stagg)

# Display the two results
iplot(list(res_sunab, res_sunab_3ref))

# ... + show all refs
iplot(list(res_sunab, res_sunab_3ref), ref = "all")

#
# ATT
#

# To get the total ATT, you can use summary with the agg argument:
summary(res_sunab, agg = "ATT")

# You can also look at the total effect per cohort
summary(res_sunab, agg = "cohort")

#
# Binning
#

# Binning can be done in many different ways

# binning the cohort
est_bin.c   = feols(y ~ x1 + sunab(year_treated, year, bin.c = 3:2) | id + year, base_stagg)

# binning the period
est_bin.p   = feols(y ~ x1 + sunab(year_treated, year, bin.p = 3:1) | id + year, base_stagg)

# binning both the cohort and the period
est_bin     = feols(y ~ x1 + sunab(year_treated, year, bin = 3:1) | id + year, base_stagg)

# binning the relative period, grouping every two years
est_bin.rel = feols(y ~ x1 + sunab(year_treated, year, bin.rel = "bin::2") | id + year, base_stagg)

etable(est_bin.c, est_bin.p, est_bin, est_bin.rel, keep = "year")
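The argument att offers a second route to the total ATT. The sketch below is illustrative only: it assumes fixest is installed and that the aggregated coefficient is labelled "ATT" in the coefficient table, and the names att_estim and att_agg are made up for the example. It compares the total ATT requested at estimation time with the one obtained afterwards through summary.

```r
library(fixest)
data(base_stagg)

# Event-study specification: one aggregated coefficient per relative period
res_sunab = feols(y ~ x1 + sunab(year_treated, year) | id + year, base_stagg)

# Total ATT requested at estimation time with att = TRUE
res_att = feols(y ~ x1 + sunab(year_treated, year, att = TRUE) | id + year, base_stagg)

# Assumption: the aggregated row is named "ATT" in both coefficient tables
att_estim = coeftable(res_att)["ATT", "Estimate"]
att_agg   = coeftable(summary(res_sunab, agg = "ATT"))["ATT", "Estimate"]

c(att_estim = att_estim, att_agg = att_agg)
```

Both routes rely on the same underlying cohort x relative period coefficients, so the two estimates are expected to coincide.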

Extract the terms

Description

This function extracts the terms of a fixest estimation, excluding the fixed-effects part.

Usage

## S3 method for class 'fixest'
terms(x, ...)

Arguments

x

A fixest object. Obtained using the functions femlm, feols or feglm.

...

Not currently used.

Value

An object of class c("terms", "formula") which contains the terms representation of a symbolic model.

Examples

# simple estimation on iris data, using "Species" fixed-effects
res = feols(Sepal.Length ~ Sepal.Width*Petal.Length +
            Petal.Width | Species, iris)

# Terms of the linear part
terms(res)
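To make the "excluding the fixed-effects part" statement concrete, the following sketch inspects the term labels of the returned object. It assumes fixest is installed; labels_lin is an illustrative name.

```r
library(fixest)

res = feols(Sepal.Length ~ Sepal.Width*Petal.Length +
            Petal.Width | Species, iris)

tt = terms(res)

# Labels of the terms of the linear part: the fixed-effect Species is absent
labels_lin = attr(tt, "term.labels")
labels_lin
```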

Fast transform of any type of vector(s) into an integer vector

Description

Tool to transform any type of vector, or even combination of vectors, into an integer vector ranging from 1 to the number of unique values. This actually creates a unique identifier vector.

Usage

to_integer(
  ...,
  inputs = NULL,
  sorted = FALSE,
  add_items = FALSE,
  items.list = FALSE,
  multi.df = FALSE,
  multi.join = "_",
  na.valid = FALSE,
  internal = FALSE
)

Arguments

...

Vectors of any type, to be transformed into a single integer vector ranging from 1 to the number of unique elements.

inputs

A list of inputs, by default it is NULL. If provided, it completely replaces the elements in ....

sorted

Logical, default is FALSE. Whether the integer vector should make reference to sorted values.

add_items

Logical, default is FALSE. Whether to add the unique values of the original vector(s). If requested, an attribute items is created containing the values (alternatively, they can appear in a list if items.list=TRUE).

items.list

Logical, default is FALSE. Only used if add_items=TRUE. If TRUE, then a list of length 2 is returned with x the integer vector and items the vector of items.

multi.df

Logical, default is FALSE. If TRUE, the unique elements are returned in the form of a data.frame. Ignored if add_items = FALSE.

multi.join

Character scalar used to join the items of multiple vectors. The default is "_". Ignored if add_items = FALSE.

na.valid

Logical, default is FALSE. Whether to consider NAs as regular values. If TRUE, the returned index will not contain any NA value.

internal

Logical, default is FALSE. For programming only. If this function is used within another function, setting internal = TRUE is needed to make the evaluation of ... valid. End users of to_integer should not care.

Value

Returns a vector of the same length as the input vectors. If add_items=TRUE and items.list=TRUE, a list of two elements is returned: x being the integer vector and items being the unique values to which the values in x make reference.

Author(s)

Laurent Berge

Examples

x1 = iris$Species
x2 = as.integer(iris$Sepal.Length)

# transforms the species vector into integers
to_integer(x1)

# To obtain the "items":
to_integer(x1, add_items = TRUE)
# same but in list form
to_integer(x1, add_items = TRUE, items.list = TRUE)

# transforms x2 into an integer vector from 1 to 4
to_integer(x2, add_items = TRUE)

# To have the sorted items:
to_integer(x2, add_items = TRUE, sorted = TRUE)

# placing the three side by side
head(cbind(x2, as_index = to_integer(x2),
           as_index_sorted = to_integer(x2, sorted = TRUE)))

# The result can safely be used as an index
res = to_integer(x2, add_items = TRUE, sorted = TRUE, items.list = TRUE)
all(res$items[res$x] == x2)

#
# Multiple vectors
#

to_integer(x1, x2, add_items = TRUE)

# You can use multi.join to handle the join of the items:
to_integer(x1, x2, add_items = TRUE, multi.join = "; ")

# alternatively, return the items as a data.frame
to_integer(x1, x2, add_items = TRUE, multi.df = TRUE)

#
# NA values
#

x1_na = c("a", "a", "b", NA, NA, "b", "a", "c", NA)
x2_na = c(NA,    1,  NA,  1,  1,   1,   2,   2,  2)

# by default the NAs are propagated
to_integer(x1_na, x2_na, add_items = TRUE)

# but you can treat them as valid values with na.valid = TRUE
to_integer(x1_na, x2_na, add_items = TRUE, na.valid = TRUE)

#
# programmatic use
#

# the argument `inputs` can be used for easy programmatic use
all_vars = list(x1_na, x2_na)
to_integer(inputs = all_vars)
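For readers more familiar with base R, the following sketch relates to_integer to a base R idiom on a tiny character vector. It assumes fixest is installed; the comparison with match(x, sort(unique(x))) is only claimed for this small lowercase example, and all object names are illustrative.

```r
library(fixest)

x = c("b", "a", "b", "c", "a")

# Identifier vector: one integer per distinct value of x
id = to_integer(x)

# With sorted = TRUE the integers follow the sorted unique values,
# which here coincides with the base R idiom below
id_sorted = to_integer(x, sorted = TRUE)
id_base   = match(x, sort(unique(x)))

# The items let you map the identifiers back to the original values
res = to_integer(x, add_items = TRUE, items.list = TRUE, sorted = TRUE)
x_back = res$items[res$x]
```

Requesting the items in list form keeps the identifier vector and its dictionary together, which is convenient when the mapping has to be reversed later.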

Trade data sample

Description

This data reports trade information between countries of the European Union (EU15).

Usage

data(trade, package = "fixest")

Format

trade is a data frame with 38,325 observations and 6 variables named Destination, Origin, Product, Year, dist_km and Euros.

Source

This data has been extracted from Eurostat in October 2017.


Dissolves a fixest panel

Description

Transforms a fixest_panel object into a regular data.frame.

Usage

unpanel(x)

Arguments

x

A fixest_panel object (obtained from function panel).

Value

Returns a data set of the exact same dimension. Only the attribute 'panel_info' is erased.

Author(s)

Laurent Berge

See Also

Alternatively, the function panel changes a data.frame into a panel from which the functions l and f (creating lags and leads) can be called. Otherwise you can set the panel 'live' during the estimation using the argument panel.id (see for example in the function feols).

Examples

data(base_did)

# Setting a data set as a panel
pdat = panel(base_did, ~id+period)

# ... allows you to use leads and lags in estimations
feols(y~l(x1, 0:1), pdat)

# Now unpanel => returns the initial data set
class(pdat) ; dim(pdat)
new_base = unpanel(pdat)
class(new_base) ; dim(new_base)
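The two ways of declaring a panel mentioned in See Also can be compared directly. This is a minimal sketch assuming fixest is installed; est_panel and est_live are illustrative names.

```r
library(fixest)
data(base_did)

# Option 1: declare the panel once, then estimate
pdat = panel(base_did, ~id + period)
est_panel = feols(y ~ l(x1, 0:1), pdat)

# Option 2: declare the panel "live" at estimation time
est_live = feols(y ~ l(x1, 0:1), base_did, panel.id = ~id + period)

# Dissolving the panel gives back a data set of the same dimension
new_base = unpanel(pdat)
dim(new_base)
```

Both options should lead to the same coefficients; the first one avoids repeating panel.id across many estimations.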

Updates a fixest estimation

Description

Updates and re-estimates a fixest model (estimated with femlm, feols or feglm). This function updates the formulas and uses previous starting values to estimate a new fixest model. The data is obtained from the original call.

Usage

## S3 method for class 'fixest'
update(
  object,
  fml.update = NULL,
  fml = NULL,
  nframes = 1,
  use_calling_env = TRUE,
  evaluate = TRUE,
  ...
)

## S3 method for class 'fixest_multi'
update(
  object,
  fml.update = NULL,
  fml = NULL,
  nframes = 1,
  use_calling_env = TRUE,
  evaluate = TRUE,
  ...
)

Arguments

object

A fixest or fixest_multi object. These are obtained from feols, or feglm estimations, for example.

fml.update

A formula representing the changes to be made to the original formula. By default it is NULL. Use a dot to refer to the previous variables in the current part. For example: . ~ . + xnew will add the variable xnew as an explanatory variable. Note that the previous fixed-effects (FEs) and IVs are implicitly forwarded. To rerun without the FEs or the IVs, you need to set them to 0 in their respective slot. For example, assume the original formula is: y ~ x | fe | endo ~ inst. Passing . ~ . + xnew to fml.update leads to y ~ x + xnew | fe | endo ~ inst (FEs and IVs are forwarded). To add xnew and remove the IV part: use . ~ . + xnew | . | 0 which leads to y ~ x + xnew | fe.

fml

A formula, default is NULL. If provided, it will completely override the value in fml.update, which will be ignored. Note that this formula will be used for the new estimation, without any modification.

nframes

(Advanced users.) Defaults to 1. Only used if the argument use_calling_env is FALSE. Number of frames up the stack where to perform the evaluation of the updated call. By default, this is the parent frame.

use_calling_env

Logical scalar, default is TRUE. If TRUE then the evaluation of the call will be done within the environment that called the initial estimation. This is mostly useful when the fixest object has been created through a custom function, so that the new evaluation can use the variables within the enclosure of the function.

evaluate

Logical, default is TRUE. If FALSE, only the updated call is returned.

...

Other arguments to be passed to the functions femlm, feols or feglm.

Value

It returns a fixest object (see details in femlm, feols or feglm).

Author(s)

Laurent Berge

See Also

See also the main estimation functions femlm, feols or feglm. predict.fixest, summary.fixest, vcov.fixest, fixef.fixest.

Examples

# Example using trade data
data(trade)

# main estimation
est_pois = fepois(Euros ~ log(dist_km) | Origin + Destination, trade)

# we add the variable log(Year)
est_2 = update(est_pois, . ~ . + log(Year))

# we add another fixed-effect: "Product"
est_3 = update(est_2, . ~ . | . + Product)

# we remove the fixed-effect "Origin" and the variable log(dist_km)
est_4 = update(est_3, . ~ . - log(dist_km) | . - Origin)

# Quick look at the 4 estimations
etable(est_pois, est_2, est_3, est_4)
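Two points from the argument descriptions can be sketched in code: fixed-effects are forwarded unless their slot is set to 0, and evaluate = FALSE returns the updated call without running it. The sketch assumes fixest is installed and follows the forwarding rules described for fml.update; object names are illustrative.

```r
library(fixest)
data(trade)

est = fepois(Euros ~ log(dist_km) | Origin + Destination, trade)

# Adding a variable: the fixed-effects are implicitly forwarded
est_add = update(est, . ~ . + log(Year))

# Dropping the fixed-effects: set the fixed-effects slot to 0
est_nofe = update(est, . ~ . | 0)

# Only build the updated call, without re-estimating
new_call = update(est, . ~ . + log(Year), evaluate = FALSE)
new_call
```

Inspecting the call first can help check that the updated formula is the intended one before launching a costly estimation.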

Computes the variance/covariance of a fixest object

Description

This function extracts the variance-covariance of estimated parameters from a model estimated with femlm, feols or feglm.

Usage

## S3 method for class 'fixest'
vcov(
  object,
  vcov = NULL,
  se = NULL,
  cluster,
  ssc = NULL,
  attr = FALSE,
  forceCovariance = FALSE,
  keepBounded = FALSE,
  nthreads = getFixest_nthreads(),
  vcov_fix = TRUE,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally, it accepts functions to compute the covariances. See the vcov documentation in the vignette.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default, if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated; you should use vcov instead.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.

attr

Logical, defaults to FALSE. Whether to include the attributes describing how the VCOV was computed.

forceCovariance

(Advanced users.) Logical, default is FALSE. In the peculiar case where the obtained Hessian is not invertible (usually because of collinearity of some variables), use this option to force the covariance matrix, by using a generalized inverse of the Hessian. This can be useful to spot where possible problems come from.

keepBounded

(Advanced users – feNmlm with non-linear part and bounded coefficients only.) Logical, default is FALSE. If TRUE, then the bounded coefficients (if any) are treated as unrestricted coefficients and their S.E. is computed (otherwise it is not).

nthreads

The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 50% of all threads. You can permanently set the number of threads used within this package using the function setFixest_nthreads.

vcov_fix

Logical scalar, default is TRUE (as shown in the Usage section). If the VCOV ends up not being positive definite, whether to "fix" it using an eigenvalue decomposition (a la Cameron, Gelbach & Miller 2011). Since the VCOV should be PSD asymptotically, this might be a sign of a problem with using the asymptotic approximation (e.g. too few units in clusters). If a problem is detected, the function will print a message to inform you.

...

Other arguments to be passed to summary.fixest.

The computation of the VCOV matrix is first done in summary.fixest.

Details

For an explanation on how the standard-errors are computed and what is the exact meaning of the arguments, please have a look at the dedicated vignette: On standard-errors.

Value

It returns a K × K square matrix where K is the number of variables of the fitted model. If attr = TRUE, this matrix has an attribute “type” specifying how this variance/covariance matrix has been computed.

Author(s)

Laurent Berge

References

Ding, Peng, 2021, "The Frisch–Waugh–Lovell theorem for standard errors." Statistics & Probability Letters 168.

See Also

You can also compute VCOVs with the following functions: vcov_cluster, vcov_hac, vcov_conley.

See also the main estimation functions femlm, feols or feglm. summary.fixest, confint.fixest, resid.fixest, predict.fixest, fixef.fixest.

Examples

# Load panel data
data(base_did)

# Simple estimation on a panel
est = feols(y ~ x1, base_did)

# ======== #
# IID VCOV #
# ======== #

# By default the VCOV assumes iid errors:
se(vcov(est))

# You can make the call for an iid VCOV explicitly:
se(vcov(est, "iid"))

#
# Heteroskedasticity-robust VCOV
#

# Request a heteroskedasticity-robust VCOV with "hetero":
se(vcov(est, "hetero"))
# => note that it also accepts vcov = "White" and vcov = "HC1" as aliases.

# =============== #
# Clustered VCOVs #
# =============== #

# To cluster the VCOV, you can use a formula of the form cluster ~ var1 + var2 etc
# Let's cluster by the panel ID:
se(vcov(est, cluster ~ id))

# Alternative ways:

# -> cluster is implicitly assumed when a one-sided formula is provided
se(vcov(est, ~ id))

# -> using the argument cluster instead of vcov
se(vcov(est, cluster = ~ id))

# For two-/three- way clustering, just add more variables:
se(vcov(est, ~ id + period))

# -------------------|
# Implicit deduction |
# -------------------|

# When the estimation contains FEs, the dimension on which to cluster
# is directly inferred from the FEs used in the estimation, so you don't need
# to explicitly add them.

est_fe = feols(y ~ x1 | id + period, base_did)

# Clustered along "id"
se(vcov(est_fe, "cluster"))

# Clustered along "id" and "period"
se(vcov(est_fe, "twoway"))

# =========== #
# Panel VCOVs #
# =========== #

# ---------------------|
# Newey West (NW) VCOV |
# ---------------------|

# To obtain NW VCOVs, use a formula of the form NW ~ id + period
se(vcov(est, NW ~ id + period))

# If you want to change the lag:
se(vcov(est, NW(3) ~ id + period))

# Alternative way:

# -> using the vcov_NW function
se(vcov(est, vcov_NW(unit = "id", time = "period", lag = 3)))

# -------------------------|
# Driscoll-Kraay (DK) VCOV |
# -------------------------|

# To obtain DK VCOVs, use a formula of the form DK ~ period
se(vcov(est, DK ~ period))

# If you want to change the lag:
se(vcov(est, DK(3) ~ period))

# Alternative way:

# -> using the vcov_DK function
se(vcov(est, vcov_DK(time = "period", lag = 3)))

# -------------------|
# Implicit deduction |
# -------------------|

# When the estimation contains a panel identifier, you don't need
# to re-write them later on

est_panel = feols(y ~ x1, base_did, panel.id = ~id + period)

# Both methods, NW and DK, now work automatically
se(vcov(est_panel, "NW"))
se(vcov(est_panel, "DK"))

# =================================== #
# VCOVs robust to spatial correlation #
# =================================== #

data(quakes)
est_geo = feols(depth ~ mag, quakes)

# ------------|
# Conley VCOV |
# ------------|

# To obtain a Conley VCOV, use a formula of the form conley(cutoff) ~ lat + lon
# with lat/lon the latitude/longitude variable names in the data set
se(vcov(est_geo, conley(100) ~ lat + long))

# Alternative way:

# -> using the vcov_conley function
se(vcov(est_geo, vcov_conley(lat = "lat", lon = "long", cutoff = 100)))

# -------------------|
# Implicit deduction |
# -------------------|

# By default the latitude and longitude are directly fetched in the data based
# on pattern matching. So you don't have to specify them.
# Further, an automatic cutoff is deduced by default.

# The following works:
se(vcov(est_geo, "conley"))

# ======================== #
# Small Sample Corrections #
# ======================== #

# You can change the way the small sample corrections are done with the argument ssc.
# The argument ssc must be created by the ssc function
se(vcov(est, ssc = ssc(K.adj = FALSE)))

# You can add directly the call to ssc in the vcov formula.
# You need to add it like a variable:
se(vcov(est, iid ~ ssc(K.adj = FALSE)))
se(vcov(est, DK ~ period + ssc(K.adj = FALSE)))
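The link between the returned matrix and the reported standard-errors can be checked by hand. The sketch below assumes fixest is installed; se_manual and se_fixest are illustrative names.

```r
library(fixest)
data(base_did)

est = feols(y ~ x1, base_did)

# VCOV clustered by the panel identifier: a K x K matrix, here K = 2
V = vcov(est, ~id)
dim(V)

# Standard-errors are the square roots of the diagonal of the VCOV
se_manual = sqrt(diag(V))
se_fixest = se(est, vcov = ~id)

cbind(se_manual, se_fixest)
```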

Clustered VCOV

Description

Computes the clustered VCOV of fixest objects.

Usage

vcov_cluster(x, cluster = NULL, ssc = NULL, vcov_fix = TRUE)

Arguments

x

A fixest object.

cluster

Either i) a character vector giving the names of the variables onto which to cluster, or ii) a formula giving those names, or iii) a vector/list/data.frame giving the hard values of the clusters. Note that in cases i) and ii) the variables are fetched directly in the data set used for the estimation.

ssc

An object returned by the function ssc. It specifies how to perform the small sample correction.

vcov_fix

Logical scalar, default is TRUE (as shown in the Usage section). If the VCOV ends up not being positive definite, whether to "fix" it using an eigenvalue decomposition (a la Cameron, Gelbach & Miller 2011). Since the VCOV should be PSD asymptotically, this might be a sign of a problem with using the asymptotic approximation (e.g. too few units in clusters). If a problem is detected, the function will print a message to inform you.

Value

If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).

If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_cluster(~var1 + var2) is equivalent to vcov_cluster(cluster = ~var1 + var2)) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).

Author(s)

Laurent Berge

References

Cameron AC, Gelbach JB, Miller DL (2011). "Robust Inference with Multiway Clustering." Journal of Business & Economic Statistics, 29(2), 238-249. doi:10.1198/jbes.2010.07136.

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
base$clu = rep(1:5, 30)

est = feols(y ~ x1, base)

# VCOV: using a formula giving the name of the clusters
vcov_cluster(est, ~species + clu)

# works as well with a character vector
vcov_cluster(est, c("species", "clu"))

# you can also combine the two with '^'
vcov_cluster(est, ~species^clu)

#
# Using VCOV requests
#

# per se: pretty useless...
vcov_cluster(~species)

# ...but VCOV-requests can be used at estimation time:
# it may be more explicit than...
feols(y ~ x1, base, vcov = vcov_cluster("species"))

# ...the equivalent, built-in way:
feols(y ~ x1, base, vcov = ~species)

# The argument vcov does not accept hard values,
# so you can feed them with a VCOV-request:
feols(y ~ x1, base, vcov = vcov_cluster(rep(1:5, 30)))
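The distinction between a VCOV and a VCOV-request, described in the Value section, can be sketched as follows. The example assumes fixest is installed and that all three routes use the same default small sample correction; object names are illustrative.

```r
library(fixest)

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

est = feols(y ~ x1, base)

# Direct computation: first argument is a fixest object, a matrix is returned
V_direct = vcov_cluster(est, ~species)

# VCOV-request: no fixest object given, the request is passed on to vcov()
V_request = vcov(est, vcov = vcov_cluster(~species))

# Built-in formula shortcut
V_formula = vcov(est, ~species)
```

Under that assumption the three matrices should agree, so the choice between the forms is mostly a matter of readability.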

Conley VCOV

Description

Compute VCOVs robust to spatial correlation, a la Conley (1999).

Usage

vcov_conley(
  x,
  lat = NULL,
  lon = NULL,
  cutoff = NULL,
  pixel = 0,
  distance = "triangular",
  ssc = NULL,
  vcov_fix = TRUE
)

conley(cutoff = NULL, pixel = NULL, distance = NULL)

Arguments

x

A fixest object.

lat

A character scalar or a one sided formula giving the name of the variable representing the latitude. The latitude must lie in [-90, 90], [0, 180] or [-180, 0].

lon

A character scalar or a one sided formula giving the name of the variable representing the longitude. The longitude must be in [-180, 180], [0, 360] or [-360, 0].

cutoff

The distance cutoff, in km. You can express the cutoff in miles by writing the number in character form and adding "mi" as a suffix: cutoff = "100mi" would be 100 miles. If missing, a rule of thumb is used to deduce the cutoff, see details.

pixel

A non-negative numeric scalar, default is 0. If a positive number, the coordinates of each observation are pooled into pixel x pixel km squares. This can (depending on the cases) greatly improve computational speed at a low precision cost. Note that if the cutoff was expressed in miles, then pixel will also be in miles.

distance

How to compute the distance between points. It can be equal to "triangular" (default) or "spherical". The latter case corresponds to the great circle distance and is more precise than triangular but is a bit more intensive computationally.

ssc

An object returned by the function ssc. It specifies how to perform the small sample correction.

vcov_fix

Logical scalar, default is TRUE. If the VCOV ends up not being positive definite, whether to "fix" it using an eigenvalue decomposition (a la Cameron, Gelbach & Miller 2011). Since the VCOV should be PSD asymptotically, a non-PSD VCOV might be a sign of a problem with using the asymptotic approximation (e.g. too few units in clusters). If a problem is detected, the function will print a message to inform you.

Details

This function computes VCOVs that are robust to spatial correlations by assuming a correlation between the units that are at a geographic distance lower than a given cutoff.

The kernel is uniform.
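To make the idea of a uniform kernel concrete, here is a minimal base R sketch of a Conley-type sandwich estimator for an lm fit. It is an illustration under stated assumptions, not the fixest implementation: the helpers haversine_km and conley_sketch are hypothetical, distances are great-circle (haversine) distances with an Earth radius of 6371 km, the data are simulated, and no small sample correction is applied.

```r
set.seed(1)
n = 60
dat = data.frame(lat = runif(n, 40, 45), lon = runif(n, 0, 5), x = rnorm(n))
dat$y = 1 + 0.5 * dat$x + rnorm(n)

fit = lm(y ~ x, dat)
X = model.matrix(fit)
u = resid(fit)

# Hypothetical helper: great-circle distance (km) between all pairs of points
haversine_km = function(lat, lon, radius = 6371){
  phi = lat * pi / 180
  lambda = lon * pi / 180
  d_phi = outer(phi, phi, "-")
  d_lambda = outer(lambda, lambda, "-")
  a = sin(d_phi / 2)^2 + outer(cos(phi), cos(phi)) * sin(d_lambda / 2)^2
  2 * radius * asin(pmin(sqrt(a), 1))
}

# Hypothetical helper: sandwich VCOV with a uniform spatial kernel
conley_sketch = function(X, u, D, cutoff){
  W = (D <= cutoff) * 1        # uniform kernel: weight 1 within the cutoff, 0 beyond
  scores = X * u               # row i is x_i * u_i
  bread = solve(crossprod(X))
  meat = t(scores) %*% W %*% scores
  bread %*% meat %*% bread
}

D = haversine_km(dat$lat, dat$lon)
V = conley_sketch(X, u, D, cutoff = 100)
V
```

With a cutoff of 0 only each unit's own score enters the meat, so the sketch collapses to the heteroskedasticity-robust (HC0) VCOV. Note that a uniform kernel does not guarantee a positive semi-definite result, which is where an argument like vcov_fix becomes relevant.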

If the cutoff is not provided, an estimate of it is computed. This cutoff ensures that a minimum number of units lie within it and is robust to sub-sampling. This automatic cutoff is only here for convenience: the most appropriate cutoff depends on the application and should be provided by the user.

The function conley does not compute VCOVs directly but is meant to be used in the argument vcov of fixest functions (e.g. in vcov.fixest or even in the estimation calls).

If the cutoff is missing, a rule of thumb is used to deduce a sensible cutoff. The algorithm is as follows:

This cutoff is provided only for convenience but should be an appropriate first guess. With this cutoff, about 50% of units should have at least around 8 neighbors.

Value

If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).

If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_conley("lat", "long") is equivalent to vcov_conley(lat = "lat", lon = "long")) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).

References

Conley TG (1999). "GMM Estimation with Cross Sectional Dependence", Journal of Econometrics, 92, 1-45.

Examples

data(quakes)

# We use conley() in the vcov argument of the estimation
feols(depth ~ mag, quakes, conley(100))

# Post estimation
est = feols(depth ~ mag, quakes)
vcov_conley(est, cutoff = 100)

HAC VCOVs

Description

Set of functions to compute VCOVs robust to different forms of correlation in panel or time series settings.

Usage

vcov_DK(x, time = NULL, lag = NULL, ssc = NULL, vcov_fix = TRUE)

vcov_NW(x, unit = NULL, time = NULL, lag = NULL, ssc = NULL, vcov_fix = TRUE)

NW(lag = NULL)

newey_west(lag = NULL)

DK(lag = NULL)

driscoll_kraay(lag = NULL)

Arguments

x

A fixest object.

time

A character scalar or a one sided formula giving the name of the variable representing the time.

lag

An integer scalar, default is NULL. If NULL, then the default lag is equal to n_t^0.25 with n_t the number of time periods (following Newey and West 1987) for panel Newey-West and Driscoll-Kraay. The default for the time series Newey-West is computed via bwNeweyWest which implements the Newey and West 1994 method.

ssc

An object returned by the function ssc. It specifies how to perform the small sample correction.

vcov_fix

Logical scalar, default is TRUE. If the VCOV ends up not being positive definite, whether to "fix" it using an eigenvalue decomposition (a la Cameron, Gelbach & Miller 2011). Since the VCOV should be PSD asymptotically, a non-PSD VCOV might be a sign of a problem with using the asymptotic approximation (e.g. too few units in clusters). If a problem is detected, the function will print a message to inform you.

unit

A character scalar or a one sided formula giving the name of the variable representing the units of the panel.

Details

There are currently three VCOV types: Newey-West applied to time series, Newey-West applied to a panel setting (when the argument 'unit' is not missing), and Driscoll-Kraay.

The functions on this page without the prefix "vcov_" do not compute VCOVs directly but are meant to be used in the argument vcov of fixest functions (e.g. in vcov.fixest or even in the estimation calls).

Note that for Driscoll-Kraay VCOVs, the number of periods should be large enough for their properties to hold (a minimum of 20 periods or so).

Value

If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).

If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_DK(~year) is equivalent to vcov_DK(time = ~year)) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).

Lag selection

The default lag selection depends on whether the VCOV applies to a panel or a time series.

For panels, i.e. panel Newey-West or Driscoll-Kraay VCOV, the default lag is n_t^0.25 with n_t the number of time periods. This is based on Newey and West 1987.

For time series Newey-West, the default lag is found thanks to the bwNeweyWest function from the sandwich package. It is based on Newey and West 1994.
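As an illustration of how a lag enters a Newey-West type VCOV, the base R sketch below applies the n^0.25 rule of thumb to a simulated time series and combines the lagged score cross-products with the Bartlett weights 1 - l/(lag + 1) proposed in Newey and West (1987). Several points are assumptions of this sketch rather than documented fixest behavior: the helper nw_sketch is hypothetical, the lag is rounded down, the data are assumed sorted by time, and no small sample correction is applied.

```r
set.seed(3)
n = 200
x = rnorm(n)
e = as.numeric(arima.sim(list(ar = 0.5), n = n))   # autocorrelated errors
y = 1 + 0.5 * x + e

fit = lm(y ~ x)
X = model.matrix(fit)
u = resid(fit)

# Rule of thumb n^0.25; rounding down is an assumption of this sketch
lag = floor(n^0.25)

# Hypothetical helper: HAC sandwich with Bartlett weights
nw_sketch = function(X, u, lag){
  scores = X * u                     # row t is x_t * u_t
  n = nrow(scores)
  meat = crossprod(scores)           # lag 0 term
  for(l in seq_len(lag)){
    w = 1 - l / (lag + 1)            # Bartlett weight for lag l
    gamma_l = t(scores[(l + 1):n, , drop = FALSE]) %*% scores[1:(n - l), , drop = FALSE]
    meat = meat + w * (gamma_l + t(gamma_l))
  }
  bread = solve(crossprod(X))
  bread %*% meat %*% bread
}

V_nw = nw_sketch(X, u, lag)
V_nw
```

With lag = 0 the loop is skipped and the result is the heteroskedasticity-robust (HC0) VCOV. The declining Bartlett weights are what keep the estimate positive semi-definite.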

References

Newey WK, West KD (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica, 55(3), 703-708. doi:10.2307/1913610.

Driscoll JC, Kraay AC (1998). "Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data." The Review of Economics and Statistics, 80(4), 549-560. doi:10.1162/003465398557825.

Millo G (2017). "Robust Standard Error Estimators for Panel Models: A Unifying Approach." Journal of Statistical Software, 82(3). doi:10.18637/jss.v082.i03.

Examples

data(base_did)

#
# During the estimation
#

# Panel Newey-West, lag = 2
feols(y ~ x1, base_did, NW(2) ~ id + period)

# Driscoll-Kraay
feols(y ~ x1, base_did, DK ~ period)

# If the estimation is made with a panel.id, the dimensions are
# automatically deduced:
est = feols(y ~ x1, base_did, "NW", panel.id = ~id + period)
est

#
# Post estimation
#

# If missing, the unit and time are automatically deduced from
# the panel.id used in the estimation
vcov_NW(est, lag = 2)

Heteroskedasticity-Robust VCOV

Description

Computes the heteroskedasticity-robust VCOV of fixest objects.

Usage

vcov_hetero(
  x,
  type = "hc1",
  exact = TRUE,
  boot.size = NULL,
  ssc = NULL,
  vcov_fix = TRUE
)

Arguments

x

A fixest object.

type

A string scalar, default is "hc1". Either "HC1", "HC2" or "HC3" (the examples also pass these values in lowercase, e.g. "hc3").

exact

Logical scalar, default is TRUE. Whether the diagonals of the projection matrix should be calculated exactly. If FALSE, then it will be approximated using a JLA algorithm. See details. Unless you have a very large number of observations, it is recommended to keep the default value.

boot.size

Integer scalar or NULL, default is NULL. This is only used when exact == FALSE. It determines the number of bootstrap samples used to estimate the projection matrix. If equal to NULL, it falls back to the value of 1000.

ssc

An object returned by the function ssc. It specifies how to perform the small sample correction.

vcov_fix

Logical scalar, default is TRUE. If the VCOV ends up not being positive definite, whether to "fix" it using an eigenvalue decomposition (a la Cameron, Gelbach & Miller 2011). Since the VCOV should be PSD asymptotically, a non-PSD VCOV might be a sign of a problem with using the asymptotic approximation (e.g. too few units in clusters). If a problem is detected, the function will print a message to inform you.
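Regarding the arguments exact and boot.size: the HC2 and HC3 corrections rely on the diagonal of the projection ("hat") matrix, and that diagonal can be approximated with random draws instead of being computed exactly. The base R sketch below shows one generic random-projection scheme of this kind, using Rademacher (+1/-1) draws. It is only meant to convey the idea; whether fixest uses exactly this scheme is not stated here, and the names n_draws, h_exact and h_approx are specific to this sketch.

```r
set.seed(42)
fit = lm(Sepal.Length ~ Sepal.Width + Petal.Length, iris)
X = model.matrix(fit)
n = nrow(X)

# Exact diagonal of the projection matrix P = X (X'X)^-1 X'
h_exact = unname(hatvalues(fit))

# Approximation: since P is symmetric and idempotent, E[(P r)_i^2] = P_ii
# when r is a vector of independent +1/-1 draws
n_draws = 2000                     # plays the role of a "boot.size"
R = matrix(sample(c(-1, 1), n * n_draws, replace = TRUE), nrow = n)

# P %*% R computed without ever forming the n x n matrix P
PR = X %*% solve(crossprod(X), crossprod(X, R))
h_approx = rowMeans(PR^2)

# largest absolute approximation error
max(abs(h_approx - h_exact))

# the (approximate) hat values then enter, e.g., the HC3 weights
u = resid(fit)
w_hc3 = u^2 / (1 - h_approx)^2
```

More draws reduce the approximation error at a higher computational cost, which is the trade-off the boot.size argument controls.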

Value

If the first argument is a fixest object, then a VCOV is returned (i.e. a symmetric matrix).

If the first argument is not a fixest object, then a) implicitly the arguments are shifted to the left (i.e. vcov_hetero("HC3") is equivalent to vcov_hetero(type = "HC3")) and b) a VCOV-request is returned and NOT a VCOV. That VCOV-request can then be used in the argument vcov of various fixest functions (e.g. vcov.fixest or even in the estimation calls).

Author(s)

Laurent Berge and Kyle Butts

References

MacKinnon, J. G. (2012). "Thirty years of heteroscedasticity-robust inference." Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis, pp. 437–461. https://doi.org/10.1007/978-1-4614-1653-1_17

Examples

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

est = feols(y ~ x1 | species, base)

vcov_hetero(est, "hc1")
vcov_hetero(est, "hc2", ssc = ssc(K.adj = FALSE))
vcov_hetero(est, "hc3", ssc = ssc(K.adj = FALSE))

# Using approximate hatvalues
vcov_hetero(est, "hc3", exact = FALSE, boot.size = 500)

Wald test of nullity of coefficients

Description

Wald test used to test the joint nullity of a set of coefficients.

Usage

wald(x, keep = NULL, drop = NULL, print = TRUE, vcov, se, cluster, ...)

Arguments

x

A fixest object. Obtained using the methods femlm, feols or feglm.

keep

Character vector. This element is used to display only a subset of variables. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be kept. This argument is applied post aliasing (see argument dict). Example: you have the variable x1 to x55 and want to display only x1 to x9, then you could use keep = "x[[:digit:]]$". If the first character is an exclamation mark, the effect is reversed (e.g. keep = "!Intercept" means: every variable that does not contain “Intercept” is kept). See details.

drop

Character vector. This element is used if some variables are not to be displayed. This should be a vector of regular expressions (see base::regex help for more info). Each variable satisfying any of the regular expressions will be discarded. This argument is applied post aliasing (see argument dict). Example: you have the variable x1 to x55 and want to display only x1 to x9, then you could use drop = "x[[:digit:]]{2}". If the first character is an exclamation mark, the effect is reversed (e.g. drop = "!Intercept" means: every variable that does not contain “Intercept” is dropped). See details.

print

Logical, default is TRUE. If TRUE, then a verbose description of the test is printed on the R console. Otherwise only a named vector containing the test statistics is returned.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally it accepts functions to compute the covariances. See the vcov documentation in the vignette.

se

Character scalar. Which kind of standard error should be computed: “standard”, “hetero”, “cluster”, “twoway”, “threeway” or “fourway”? By default if there are clusters in the estimation: se = "cluster", otherwise se = "iid". Note that this argument is deprecated, you should use vcov instead.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

...

Any other element to be passed to summary.fixest.

Details

The type of VCOV matrix plays a crucial role in this test. Use the argument vcov (or the deprecated se, together with cluster) to change the type of VCOV for the test.

Value

A named vector containing the following elements is returned: stat, p, df1, and df2. They correspond to the test statistic, the p-value, and the first and second degrees of freedom.

If no valid coefficient is found, the value NA is returned.
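To clarify what the returned elements represent, here is a minimal base R sketch of a Wald test of joint nullity, computed by hand on an lm fit with the standard (iid) VCOV. It illustrates the textbook F-form of the statistic; the exact degrees of freedom and VCOV used by wald depend on the fixest estimation and are not reproduced here. The variable names (is_tested, q, df2) are specific to this sketch.

```r
data(airquality)
fit = lm(Ozone ~ Solar.R + Wind + poly(Temp, 3), airquality)

# Coefficients selected with a regular expression, in the spirit of `keep`
b_all = coef(fit)
is_tested = grepl("poly", names(b_all))
b = b_all[is_tested]
V = vcov(fit)[is_tested, is_tested]

# Wald statistic in F form: b' V^-1 b / q
q = length(b)
stat = drop(t(b) %*% solve(V, b)) / q
df2 = df.residual(fit)
p = pf(stat, q, df2, lower.tail = FALSE)

c(stat = stat, p = p, df1 = q, df2 = df2)
```

With the iid VCOV of a linear model, this statistic coincides with the classical F-test comparing the model with and without the tested terms. Swapping V for a robust or clustered VCOV changes the statistic, which is why the choice of VCOV matters for this test.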

Examples

data(airquality)

est = feols(Ozone ~ Solar.R + Wind + poly(Temp, 3), airquality)

# Testing the joint nullity of the Temp polynomial
wald(est, "poly")

# Same but with clustered SEs
wald(est, "poly", cluster = "Month")

# Now: all vars but the polynomial and the intercept
wald(est, drop = "Inte|poly")

#
# Toy example: testing pre-trends
#

data(base_did)

est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_did)

# The graph of the coefficients
coefplot(est_did)

# The pre-trend test
wald(est_did, "period::[1234]$")

# If "period::[1234]$" looks weird to you, check out
# regular expressions: e.g. see ?regex.
# Learn it, you won't regret it!

Extracts the weights from a fixest object

Description

Simply extracts the weights used to estimate a fixest model.

Usage

## S3 method for class 'fixest'
weights(object, ...)

Arguments

object

A fixest object.

...

Not currently used.

Value

Returns a vector of the same length as the number of observations in the original data set. Ignored observations due to NA or perfect fit are re-introduced and their weights set to NA.

See Also

feols, fepois, feglm, fenegbin, feNmlm.

Examples

est = feols(Petal.Length ~ Petal.Width, iris, weights = ~as.integer(Sepal.Length) - 3.99)
weights(est)

Expands formula macros

Description

Create macros within formulas and expand them with character vectors or other formulas.

Usage

xpd(
  fml,
  ...,
  add = NULL,
  lhs = NULL,
  rhs = NULL,
  add.after_pipe = NULL,
  data = NULL,
  frame = parent.frame()
)

Arguments

fml

A formula containing macro variables. Each macro variable must start with two dots. The macro variables can be set globally using setFixest_fml, or can be defined in the ... argument. Special macros of the form ..("regex") can be used to fetch, through a regular expression, variables directly in a character vector (or in column names) given in the argument data (note that the algorithm tries to "guess" the argument data when nested in function calls [see example]). You can negate the regex by starting with a "!". Square brackets have a special meaning: values in them are evaluated and parsed accordingly. Example: y~x.[1:2] + z.[i] will lead to y~x1+x2+z3 if i==3. You can trigger the auto-completion of variables by using the '..' suffix, like in y ~ x.. which would include x1 and x2, etc. See examples.

...

Definition of the macro variables. Each argument name corresponds to the name of the macro variable. It is required that each macro variable name starts with two dots (e.g. ..ctrl). The value of each argument must be a one-sided formula or a character vector, it is the definition of the macro variable. Example of a valid call: setFixest_fml(..ctrl = ~ var1 + var2). In the function xpd, the default macro variables are taken from getFixest_fml, any variable in ... will replace these values. You can enclose values in .[], if so they will be evaluated from the current environment. For example ..ctrl = ~ x.[1:2] + .[z] will lead to ~x1 + x2 + var if z is equal to "var".

add

A character vector or a one-sided formula. The elements will be added to the right-hand-side of the formula, before any macro expansion is applied.

lhs

If present, then a formula will be constructed with lhs as the full left-hand-side. The value of lhs can be a one-sided formula, a call, or a character vector. Note that the macro variables won't be applied. You can use it in combination with the argument rhs. Note that if fml is not missing, its LHS will be replaced by lhs.

rhs

If present, then a formula will be constructed with rhs as the full right-hand-side. The value of rhs can be a one-sided formula, a call, or a character vector. Note that the macro variables won't be applied. You can use it in combination with the argument lhs. Note that if fml is not missing, its RHS will be replaced by rhs.

add.after_pipe

A character vector or a one-sided or two-sided formula. The elements will be added to the right-hand-side of the formula, just after a pipe (|), before any macro expansion is applied.

data

Either a character vector or a data.frame. This argument will only be used if a macro of the type ..("regex") is used in the formula of the argument fml. If so, any variable name from data that matches the regular expression will be added to the formula.

frame

The environment containing the values to be expanded with the dot square bracket operator. Default is parent.frame().

Details

In xpd, the default macro variables are taken from getFixest_fml. Any value in the ... argument of xpd will replace these default values.

The definitions of the macro variables are substituted verbatim for the macro variables. Therefore, you can include multi-part formulas if you wish, but then beware of the order of the macro variables in the formula. For example, using the airquality data, say you want to set as controls the variable Temp and Day fixed-effects, you can do setFixest_fml(..ctrl = ~Temp | Day), but then feols(Ozone ~ Wind + ..ctrl, airquality) will be quite different from feols(Ozone ~ ..ctrl + Wind, airquality), so beware!
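The effect of a verbatim substitution can be mimicked in base R with plain text replacement. The sketch below does not call xpd or setFixest_fml; it only uses sub() and as.formula() to show where Wind ends up relative to the pipe in the two orderings mentioned above.

```r
ctrl = "Temp | Day"   # text of the macro definition ~Temp | Day

# textual substitution of the macro, then conversion to a formula
fml_1 = as.formula(sub("..ctrl", ctrl, "Ozone ~ Wind + ..ctrl", fixed = TRUE))
fml_2 = as.formula(sub("..ctrl", ctrl, "Ozone ~ ..ctrl + Wind", fixed = TRUE))

fml_1   # Ozone ~ Wind + Temp | Day
fml_2   # Ozone ~ Temp | Day + Wind

# variables located after the pipe in each case
all.vars(fml_1[[3]][[3]])   # "Day"
all.vars(fml_2[[3]][[3]])   # "Day" "Wind"
```

In the second ordering, Wind lands after the pipe, i.e. in the fixed-effects part of a fixest formula rather than among the regressors.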

Value

It returns a formula where all macros have been expanded.

Dot square bracket operator in formulas

In a formula, the dot square bracket (DSB) operator can: i) create multiple variables at once, or ii) capture values from the current environment and put them verbatim in the formula.

Say you want to include the variables x1 to x3 in your formula. You can use xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.

To summon values from the environment, simply put the variable in square brackets. For example: for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the value of i.

You can include a full variable from the environment in the same way: for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.
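For readers more familiar with base R, the formulas described above can be reproduced with paste0() and reformulate(). This is only a base R equivalent given for comparison; it does not use the DSB operator, and the object names f_sum, f_loop and f_var are specific to this sketch.

```r
# Counterpart of xpd(y ~ x.[1:3])
f_sum = reformulate(paste0("x", 1:3), response = "y")
f_sum   # y ~ x1 + x2 + x3

# Counterpart of for(i in 1:3) xpd(y.[i] ~ x)
f_loop = lapply(1:3, function(i) reformulate("x", response = paste0("y", i)))
f_loop[[2]]   # y2 ~ x

# Counterpart of for(y in c("a", "b")) xpd(.[y] ~ x)
f_var = lapply(c("a", "b"), function(y) reformulate("x", response = y))
f_var[[1]]   # a ~ x
```

The DSB syntax expresses the same constructions directly inside the formula, which avoids the string manipulation.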

The DSB can even be used within variable names, but then the variable must be nested in character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the character form is important to avoid a formula parsing error. Double quotes must be used. Note that the character string that is nested will be parsed with the function dsb, and thus it will return a vector.

By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x], to expand with commas; the content can then be used within functions. For instance: c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).

In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.

One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then xpd(y ~ .[x]) leads to y ~ sepal + petal.

You can even use multiple square brackets within a single variable, but then the use of nesting is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb, which explains this behavior.

When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to y ~ 1.

Regular expressions

You can catch several variable names at once by using regular expressions. To use a regular expression, enclose it in the dot-dot or the regex function: ..("regex") or regex("regex"). For example, regex("Sepal") will catch both the variables Sepal.Length and Sepal.Width from the iris data set. In a fixest estimation, the variable names to which the regex is applied come from the data set. If you use xpd, you need to provide either a data set or a vector of names in the argument data.

By default the variables are aggregated with a sum. For example in a data set with the variables x1 to x10, regex("x(1|2)") will yield x1 + x2 + x10. You can instead ask for "comma" aggregation by using a comma first, just before the regular expression: y ~ sw(regex(,"x(1|2)")) would lead to y ~ sw(x1, x2, x10).

Note that the dot square bracket operator (DSB, see before) is applied before the regular expression is evaluated. This means that regex("x.[3:4]_sq") will lead, after evaluation of the DSB, to regex("x3_sq|x4_sq"). It is a handy way to insert a range of numbers in a regular expression.
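The matching behavior described above can be checked with base R alone. The sketch below uses grep() on a hypothetical vector of variable names x1 to x10 to show why x10 is caught by "x(1|2)", and paste0() to show the pattern obtained once the range 3:4 has been expanded. It does not call xpd or regex from fixest.

```r
vars = paste0("x", 1:10)

# "x(1|2)" is not anchored, so x10 matches as well
matched = grep("x(1|2)", vars, value = TRUE)
matched                                # "x1"  "x2"  "x10"

# default aggregation: a sum of the matched variables
reformulate(matched, response = "y")   # y ~ x1 + x2 + x10

# anchoring the pattern with "$" excludes x10
grep("x(1|2)$", vars, value = TRUE)    # "x1" "x2"

# expansion of the range 3:4 into alternatives joined with "|"
pattern = paste0("x", 3:4, "_sq", collapse = "|")
pattern                                # "x3_sq|x4_sq"
```

Anchoring the regular expression is therefore the simplest way to avoid catching unintended variables.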

Author(s)

Laurent Berge

See Also

setFixest_fml to set formula macros, and dsb to modify character strings with the DSB operator.

Examples

# Small examples with airquality data
data(airquality)
# we set two macro variables
setFixest_fml(..ctrl = ~ Temp + Day,
              ..ctrl_long = ~ poly(Temp, 2) + poly(Day, 2))

# Using the macro in lm with xpd:
lm(xpd(Ozone ~ Wind + ..ctrl), airquality)
lm(xpd(Ozone ~ Wind + ..ctrl_long), airquality)

# You can use the macros without xpd() in fixest estimations
a = feols(Ozone ~ Wind + ..ctrl, airquality)
b = feols(Ozone ~ Wind + ..ctrl_long, airquality)
etable(a, b, keep = "Int|Win")

# Using .[]

base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
i = 2:3
z = "species"
lm(xpd(y ~ x.[2:3] + .[z]), base)

# No xpd() needed in feols
feols(y ~ x.[2:3] + .[z], base)

#
# Auto completion with '..' suffix
#

# You can trigger variables autocompletion with the '..' suffix
# You need to provide the argument data
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
xpd(y ~ x.., data = base)

# In fixest estimations, this is automatically taken care of
feols(y ~ x.., data = base)

#
# You can use xpd for stepwise estimations
#

# Note that for stepwise estimations in fixest, you can use
# the stepwise functions: sw, sw0, csw, csw0
# -> see help in feols or in the dedicated vignette

# we want to look at the effect of x1 on y
# controlling for different variables

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# We first create a matrix with all possible combinations of variables
my_args = lapply(names(base)[-(1:2)], function(x) c("", x))
(all_combs = as.matrix(do.call("expand.grid", my_args)))

res_all = list()
for(i in 1:nrow(all_combs)){
  res_all[[i]] = feols(xpd(y ~ x1 + ..v, ..v = all_combs[i, ]), base)
}

etable(res_all)
coefplot(res_all, group = list(Species = "^^species"))

#
# You can use macros to grep variables in your data set
#

# Example 1: setting a macro variable globally

data(longley)
setFixest_fml(..many_vars = grep("GNP|ployed", names(longley), value = TRUE))
feols(Armed.Forces ~ Population + ..many_vars, longley)

# Example 2: using ..("regex") or regex("regex") to grep the variables "live"

feols(Armed.Forces ~ Population + ..("GNP|ployed"), longley)

# Example 3: same as Ex.2 but without using a fixest estimation

# Here we need to use xpd():
lm(xpd(Armed.Forces ~ Population + regex("GNP|ployed"), data = longley), longley)

# Stepwise estimation with regex: use a comma after the parenthesis
feols(Armed.Forces ~ Population + sw(regex(,"GNP|ployed")), longley)

# Multiple LHS
etable(feols(..("GNP|ployed") ~ Population, longley))

#
# lhs and rhs arguments
#

# to create a one sided formula from a character vector
vars = letters[1:5]
xpd(rhs = vars)

# Alternatively, to replace the RHS
xpd(y ~ 1, rhs = vars)

# To create a two sided formula
xpd(lhs = "y", rhs = vars)

#
# argument 'add'
#

xpd(~x1, add = ~ x2 + x3)

# also works with character vectors
xpd(~x1, add = c("x2", "x3"))

# only adds to the RHS
xpd(y ~ x, add = ~bon + jour)

#
# argument add.after_pipe
#

xpd(~x1, add.after_pipe = ~ x2 + x3)

# we can add a two sided formula
xpd(~x1, add.after_pipe = x2 ~ x3)

#
# Dot square bracket operator
#

# The basic use is to add variables in the formula
x = c("x1", "x2")
xpd(y ~ .[x])

# Alternatively, one-sided formulas can be used and their content will be inserted verbatim
x = ~x1 + x2
xpd(y ~ .[x])

# You can create multiple variables at once
xpd(y ~ x.[1:5] + z.[2:3])

# You can summon variables from the environment to complete variables names
var = "a"
xpd(y ~ x.[var])

# ... the variables can be multiple
vars = LETTERS[1:3]
xpd(y ~ x.[vars])

# You can have "complex" variable names but they must be nested in character form
xpd(y ~ .["x.[vars]_sq"])

# DSB can be used within regular expressions
re = c("GNP", "Pop")
xpd(Unemployed ~ regex(".[re]"), data = longley)

# => equivalent to regex("GNP|Pop")

# Use .[,var] (NOTE THE COMMA!) to expand with commas
# !! can break the formula if misused
vars = c("wage", "unemp")
xpd(c(y.[,1:3]) ~ csw(.[,vars]))

# Example of use of .[] within a loop
res_all = list()
for(p in 1:3){
  res_all[[p]] = feols(Ozone ~ Wind + poly(Temp, .[p]), airquality)
}

etable(res_all)

# The former can be compactly estimated with:
res_compact = feols(Ozone ~ Wind + sw(.[, "poly(Temp, .[1:3])"]), airquality)

etable(res_compact)

# How does it work?
# 1)  .[, stuff] evaluates stuff and, if a vector, aggregates it with commas
#     Comma aggregation is done thanks to the comma placed after the square bracket
#     If .[stuff], then aggregation is with sums.
# 2) stuff is evaluated, and if it is a character string, it is evaluated with
# the function dsb which expands values in .[]
#
# Wrapping up:
# 2) evaluation of dsb("poly(Temp, .[1:3])") leads to the vector:
#    c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")
# 1) .[, c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")] leads to
#    poly(Temp, 1), poly(Temp, 2), poly(Temp, 3)
#
# Hence sw(.[, "poly(Temp, .[1:3])"]) becomes:
#       sw(poly(Temp, 1), poly(Temp, 2), poly(Temp, 3))

#
# In non-fixest functions: guessing the data allows to use regex
#

# When used in non-fixest functions, the algorithm tries to "guess" the data
# so that ..("regex") can be directly evaluated without passing the argument 'data'
data(longley)
lm(xpd(Armed.Forces ~ Population + ..("GNP|ployed")), longley)

# same for the auto completion with '..'
lm(xpd(Armed.Forces ~ Population + GN..), longley)
