Movatterモバイル変換

[0]ホーム

Jump to content

Linear regression

Edit links

From Wikipedia, the free encyclopedia

Statistical modeling method

For other uses, seeLinear regression (disambiguation).

Regression analysis
Part of a series on
Models
Linear regression Simple regression Polynomial regression General linear model
Generalized linear model Vector generalized linear model Discrete choice Binomial regression Binary regression Logistic regression Multinomial logistic regression Mixed logit Probit Multinomial probit Ordered logit Ordered probit Poisson
Multilevel model Fixed effects Random effects Linear mixed-effects model Nonlinear mixed-effects model
Nonlinear regression Nonparametric Semiparametric Robust Quantile Isotonic Principal components Least angle Local Segmented
Errors-in-variables
Estimation
Least squares Linear Non-linear
Ordinary Weighted Generalized Generalized estimating equation
Partial Total Non-negative Ridge regression Regularized
Least absolute deviations Iteratively reweighted Bayesian Bayesian multivariate Least-squares spectral analysis
Background
Regression validation Mean and predicted response Errors and residuals Goodness of fit Studentized residual Gauss–Markov theorem
Mathematics portal
v t e

Instatistics,linear regression is amodel that estimates the relationship between ascalar response (dependent variable) and one or more explanatory variables (regressor orindependent variable). A model with exactly one explanatory variable is asimple linear regression; a model with two or more explanatory variables is amultiple linear regression.^[1] This term is distinct frommultivariate linear regression, which predicts multiplecorrelated dependent variables rather than a single dependent variable.^[2]

In linear regression, the relationships are modeled usinglinear predictor functions whose unknown modelparameters areestimated from thedata. Most commonly, theconditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be anaffine function of those values; less commonly, the conditionalmedian or some otherquantile is used. Like all forms ofregression analysis, linear regression focuses on theconditional probability distribution of the response given the values of the predictors, rather than on thejoint probability distribution of all of these variables, which is the domain ofmultivariate analysis.

Linear regression is also a type ofmachine learning algorithm, more specifically asupervised algorithm, that learns from the labelled datasets and maps the data points to the most optimized linear functions that can be used for prediction on new datasets.^[3]

Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications.^[4] This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

If the goal is to reduce error, i.e.variance inprediction orforecasting, linear regression can be used to fit a predictive model to an observeddata set of values of the response and explanatory variables. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make a prediction of the response.
If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.

Linear regression models are often fitted using theleast squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some othernorm (as withleast absolute deviations regression), or by minimizing a penalized version of the least squarescost function as inridge regression (L²-norm penalty) andlasso (L¹-norm penalty). Use of theMean Squared Error (MSE) as the cost on a dataset that has many large outliers, can result in a model that fits the outliers more than the true data due to the higher importance assigned by MSE to large errors. So, cost functions that are robust to outliers should be used if the dataset has many largeoutliers. Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.

Movatterモバイル変換

Formulation

Notation and terminology

Example

Assumptions

Interpretation

Extensions

Simple and multiple linear regression

General linear models

Heteroscedastic models

Generalized linear models

Hierarchical linear models

Errors-in-variables

Group effects

Others

Estimation methods

Least-squares estimation and related techniques

Maximum-likelihood estimation and related techniques

Maximum likelihood estimation

Regularized Regression

Least Absolute Deviation

Adaptive Estimation

Other estimation techniques

Applications

Trend line

Epidemiology

Finance

Economics

Environmental science

Building science

Machine learning

History

See also

References

Citations

Sources

Further reading

External links