The general linear model or general multivariate regression model is a compact way of simultaneously writing several multiple linear regression models. In that sense it is not a separate statistical linear model. The various multiple linear regression models may be compactly written as[1]

Y = XB + U,

where Y is a matrix with series of multivariate measurements (each column being a set of measurements on one of the dependent variables), X is a matrix of observations on independent variables that might be a design matrix (each column being a set of observations on one of the independent variables), B is a matrix containing parameters that are usually to be estimated, and U is a matrix containing errors (noise). The errors are usually assumed to be uncorrelated across measurements and to follow a multivariate normal distribution. If the errors do not follow a multivariate normal distribution, generalized linear models may be used to relax assumptions about Y and U.
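The matrix formulation can be sketched with made-up data in NumPy: all names and dimensions below are illustrative, and the coefficient matrix B is estimated by ordinary least squares in a single call.

```python
# A minimal sketch of Y = XB + U with simulated data (all values are made up).
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 100, 3, 2          # observations, predictors, dependent variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with intercept
B_true = rng.normal(size=(p, m))                                # true coefficient matrix
U = rng.normal(scale=0.1, size=(n, m))                          # Gaussian errors
Y = X @ B_true + U

# Ordinary least squares estimate of B; one call handles all m outcomes at once
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B_hat.shape)           # one column of coefficients per dependent variable
```

Because Y has m columns, B_hat also has m columns, matching the article's description of B as a matrix of parameters rather than a single coefficient vector.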
The general linear model (GLM) encompasses several statistical models, including ANOVA, ANCOVA, MANOVA, MANCOVA, and ordinary linear regression. Within this framework, both the t-test and the F-test can be applied. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If Y, B, and U were column vectors, the matrix equation above would represent multiple linear regression.
Hypothesis tests with the general linear model can be made in two ways: multivariate or as several independent univariate tests. In multivariate tests the columns of Y are tested together, whereas in univariate tests the columns of Y are tested independently, i.e., as multiple univariate tests with the same design matrix.
Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ... + β_p X_ip + ε_i

for each observation i = 1, ..., n.

In the formula above we consider n observations of one dependent variable and p independent variables. Thus, Y_i is the i-th observation of the dependent variable and X_ik is the i-th observation of the k-th independent variable, for k = 1, 2, ..., p. The values β_k represent parameters to be estimated, and ε_i is the i-th independent, identically distributed normal error.
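The parameters β_k can be estimated from data via the normal equations, β̂ = (XᵀX)⁻¹XᵀY. A sketch with simulated numbers (the specific values are made up for illustration):

```python
# Estimating beta_0, beta_1, beta_2 by solving the normal equations (simulated data).
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # column of 1s carries beta_0
beta = np.array([1.0, 2.0, -1.0])                           # beta_0, beta_1, beta_2
eps = rng.normal(scale=0.1, size=n)                         # i.i.d. normal errors epsilon_i
Y = X @ beta + eps                                          # Y_i = beta_0 + beta_1*X_i1 + beta_2*X_i2 + eps_i

# Normal equations: solve (X'X) beta_hat = X'Y rather than inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.round(beta_hat, 2))
```

Solving the linear system directly is numerically preferable to forming the inverse of XᵀX.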
In the more general multivariate linear regression, there is one equation of the above form for each of m > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:

Y_ij = β_0j + β_1j X_i1 + β_2j X_i2 + ... + β_pj X_ip + ε_ij

for all observations indexed as i = 1, ..., n and for all dependent variables indexed as j = 1, ..., m.
Note that, since each dependent variable has its own set of regression parameters to be fitted, from a computational point of view the general multivariate regression is simply a sequence of standard multiple linear regressions using the same explanatory variables.
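This equivalence is easy to check numerically: fitting all m outcomes jointly gives the same coefficients as m separate univariate regressions on the same design matrix. A sketch with arbitrary simulated data:

```python
# The joint multivariate fit equals m separate per-column fits (simulated data).
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 80, 3, 4
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, m))

# One joint least-squares solve for all m dependent variables
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# m separate univariate regressions, one per column of Y
B_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
print(np.allclose(B_joint, B_cols))  # True
```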
The general linear model and the generalized linear model (GLM)[2][3] are two commonly used families of statistical methods to relate some number of continuous and/or categorical predictors to a single outcome variable.
The main difference between the two approaches is that the general linear model strictly assumes that the residuals will follow a conditionally normal distribution,[4] while the GLM loosens this assumption and allows for a variety of other distributions from the exponential family for the residuals.[2] The general linear model is thus the special case of the GLM in which the distribution of the residuals is conditionally normal.
The distribution of the residuals largely depends on the type and distribution of the outcome variable; different types of outcome variables lead to the variety of models within the GLM family. Commonly used models in the GLM family include binary logistic regression[5] for binary or dichotomous outcomes, Poisson regression[6] for count outcomes, and linear regression for continuous, normally distributed outcomes. This means that GLM may be spoken of either as a general family of statistical models or as specific models for specific outcome types.
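As one concrete member of the GLM family, Poisson regression with a log link can be fitted by iteratively reweighted least squares (IRLS), the standard GLM fitting algorithm. The sketch below uses plain NumPy on simulated count data; the data, coefficients, and iteration count are all illustrative.

```python
# A sketch of Poisson regression (log link) fitted by IRLS on simulated counts.
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ beta_true))                 # count outcome

beta = np.zeros(2)
for _ in range(25):                     # IRLS iterations
    mu = np.exp(X @ beta)               # mean under the log link
    W = mu                              # Poisson working weights: Var(y) = mu
    z = X @ beta + (y - mu) / mu        # working response
    WX = X * W[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ z)  # weighted least-squares step
print(np.round(beta, 2))
```

With an identity link, unit weights, and normal errors, the same loop reduces to ordinary least squares after one step, which is another way of seeing the general linear model as a special case of the GLM.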
| | General linear model | Generalized linear model |
|---|---|---|
| Typical estimation method | Least squares, best linear unbiased prediction | Maximum likelihood or Bayesian |
| Examples | ANOVA, ANCOVA, linear regression | Linear regression, logistic regression, Poisson regression, gamma regression,[7] general linear model |
| Extensions and related methods | MANOVA, MANCOVA, linear mixed model | Generalized linear mixed model (GLMM), generalized estimating equations (GEE) |
| R package and function | lm(), manova() in stats package (base R) | glm() in stats package (base R) |
| MATLAB function | mvregress() | glmfit() |
| SAS procedures | PROC GLM, PROC REG | PROC GENMOD, PROC LOGISTIC (for binary & ordered or unordered categorical outcomes) |
| Stata command | regress | glm |
| SPSS command | regression, glm | genlin, logistic |
| Wolfram Language & Mathematica function | LinearModelFit[][8] | GeneralizedLinearModelFit[][9] |
| EViews command | ls[10] | glm[11] |
| statsmodels Python Package | regression-and-linear-models | GLM |
An application of the general linear model appears in the analysis of multiple brain scans in scientific experiments, where Y contains data from brain scanners and X contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as mass-univariate in this setting) and is often referred to as statistical parametric mapping.[12]
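The mass-univariate approach amounts to fitting the same design matrix X independently at every voxel, i.e. to every column of Y, and computing a test statistic per voxel. A simplified sketch with simulated "voxel" data (dimensions, effect size, and layout are all made up):

```python
# Mass-univariate sketch: one design matrix, one OLS fit and t statistic per voxel.
import numpy as np

rng = np.random.default_rng(4)
n, p, n_voxels = 120, 2, 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + task regressor
Y = rng.normal(size=(n, n_voxels))                      # simulated scanner data
Y[:, :100] += 0.5 * X[:, 1:2]                           # first 100 voxels respond to the task

B, *_ = np.linalg.lstsq(X, Y, rcond=None)               # all voxel fits in one call
resid = Y - X @ B
sigma2 = (resid ** 2).sum(axis=0) / (n - p)             # per-voxel error variance
XtX_inv = np.linalg.inv(X.T @ X)
se = np.sqrt(sigma2 * XtX_inv[1, 1])                    # SE of the task coefficient per voxel
t = B[1] / se                                           # one t value per voxel
print(t[:100].mean() > t[100:].mean())                  # responding voxels score higher
```

In practice, packages such as SPM perform this columnwise fitting and then address the multiple-comparisons problem across voxels, which the sketch above omits.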