Regularization (mathematics)

From Wikipedia, the free encyclopedia
Technique to make a model more generalizable and transferable
The green and blue functions both incur zero loss on the given data points. A learned model can be induced to prefer the green function, which may generalize better to more points drawn from the underlying unknown distribution, by adjusting $\lambda$, the weight of the regularization term.

In mathematics, statistics, finance,[1] and computer science, particularly in machine learning and inverse problems, regularization is a process that converts the answer to a problem to a simpler one. It is often used in solving ill-posed problems or to prevent overfitting.[2]

Although regularization procedures can be divided in many ways, the following delineation is particularly helpful:

  • Explicit regularization is regularization whenever one explicitly adds a term to the optimization problem. These terms could be priors, penalties, or constraints. Explicit regularization is commonly employed with ill-posed optimization problems. The regularization term, or penalty, imposes a cost on the optimization function to make the optimal solution unique.
  • Implicit regularization is all other forms of regularization. This includes, for example, early stopping, using a robust loss function, and discarding outliers. Implicit regularization is essentially ubiquitous in modern machine learning approaches, including stochastic gradient descent for training deep neural networks, and ensemble methods (such as random forests and gradient boosted trees).

In explicit regularization, independent of the problem or model, there is always a data term that corresponds to a likelihood of the measurement, and a regularization term that corresponds to a prior. By combining both using Bayesian statistics, one can compute a posterior that includes both information sources and therefore stabilizes the estimation process. By trading off both objectives, one chooses to be more aligned to the data or to enforce regularization (to prevent overfitting). There is a whole research branch dealing with all possible regularizations. In practice, one usually tries a specific regularization and then figures out the probability density that corresponds to that regularization to justify the choice. The choice can also be physically motivated by common sense or intuition.

In machine learning, the data term corresponds to the training data and the regularization is either the choice of the model or modifications to the algorithm. It is always intended to reduce the generalization error, i.e. the error score with the trained model on the evaluation set (testing data) and not the training data.[3]

One of the earliest uses of regularization is Tikhonov regularization (ridge regression), related to the method of least squares.

Regularization in machine learning


In machine learning, a key challenge is enabling models to accurately predict outcomes on unseen data, not just on familiar training data. Regularization is crucial for addressing overfitting, where a model memorizes training data details but cannot generalize to new data. The goal of regularization is to encourage models to learn the broader patterns within the data rather than memorizing them. Techniques like early stopping, L1 and L2 regularization, and dropout are designed to prevent overfitting and underfitting, thereby improving the model's ability to generalize to new data.[4]

Early stopping


Stops training when validation performance deteriorates, preventing overfitting by halting before the model memorizes training data.[4]
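A minimal sketch of the idea in Python (the training loop, the `step` and `val_loss` callables, and the patience value are illustrative assumptions, not a specific library's API):

```python
def train_with_early_stopping(step, val_loss, max_epochs=500, patience=10):
    """Run step() (one training epoch) until val_loss() stops improving.

    step and val_loss are callables supplied by the caller; patience is
    how many epochs without improvement are tolerated before halting.
    """
    best, since_best = float("inf"), 0
    for _ in range(max_epochs):
        step()                      # one pass over the training data
        loss = val_loss()           # loss on the held-out validation set
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
        if since_best >= patience:  # validation stopped improving: halt
            break
    return best
```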

L1 and L2 regularization


Adds penalty terms to the cost function to discourage complex models (a sketch follows the list):

  • L1 regularization (also called LASSO) leads to sparse models by adding a penalty based on the absolute value of coefficients.
  • L2 regularization (also called ridge regression) encourages smaller, more evenly distributed weights by adding a penalty based on the square of the coefficients.[4]
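A minimal NumPy sketch of how the two penalties modify a least-squares cost (the data, weights, and penalty weight `lam` are assumed placeholders):

```python
import numpy as np

def cost(w, X, y, lam, kind="l2"):
    """Least-squares cost with an optional L1 (lasso) or L2 (ridge) penalty."""
    residual = X @ w - y
    data_term = residual @ residual / len(y)
    if kind == "l1":
        penalty = lam * np.sum(np.abs(w))  # absolute values: encourages exact zeros
    else:
        penalty = lam * np.sum(w ** 2)     # squares: shrinks all weights smoothly
    return data_term + penalty
```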

Dropout


In the context of neural networks, the Dropout technique repeatedly ignores random subsets of neurons during training, which simulates the training of multiple neural network architectures at once to improve generalization.[4]
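A sketch of the common "inverted" variant in NumPy (the drop probability and the scaling convention are one standard choice, assumed here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Zero each unit with probability p during training (inverted dropout).

    Scaling the survivors by 1/(1-p) keeps the expected activation
    unchanged, so no adjustment is needed at test time.
    """
    if not training or p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)
```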

Classification


Empirical learning of classifiers (from a finite data set) is always an underdetermined problem, because it attempts to infer a function of any $x$ given only examples $x_1, x_2, \dots, x_n$.

A regularization term (or regularizer) $R(f)$ is added to a loss function:
$$\min_f \sum_{i=1}^n V(f(x_i), y_i) + \lambda R(f)$$
where $V$ is an underlying loss function that describes the cost of predicting $f(x)$ when the label is $y$, such as the square loss or hinge loss; and $\lambda$ is a parameter which controls the importance of the regularization term. $R(f)$ is typically chosen to impose a penalty on the complexity of $f$. Concrete notions of complexity used include restrictions for smoothness and bounds on the vector space norm.[5][page needed]

A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution (as depicted in the figure above, where the green function, the simpler one, may be preferred). From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.[6]

Regularization can serve multiple purposes, including learning simpler models, inducing models to be sparse, and introducing group structure[clarification needed] into the learning problem.

The same idea arose in many fields of science. A simple form of regularization applied to integral equations (Tikhonov regularization) is essentially a trade-off between fitting the data and reducing a norm of the solution. More recently, non-linear regularization methods, including total variation regularization, have become popular.

Generalization

Main article: Generalization error

Regularization can be motivated as a technique to improve the generalizability of a learned model.

The goal of this learning problem is to find a function that fits or predicts the outcome (label) and minimizes the expected error over all possible inputs and labels. The expected error of a function $f_n$ is:
$$I[f_n] = \int_{X \times Y} V(f_n(x), y) \, \rho(x, y) \, dx \, dy$$
where $X$ and $Y$ are the domains of input data $x$ and their labels $y$ respectively.

Typically in learning problems, only a subset of input data and labels are available, measured with some noise. Therefore, the expected error is unmeasurable, and the best surrogate available is the empirical error over the $n$ available samples:
$$I_S[f_n] = \frac{1}{n} \sum_{i=1}^n V(f_n(\hat{x}_i), \hat{y}_i)$$
Without bounds on the complexity of the function space (formally, the reproducing kernel Hilbert space) available, a model will be learned that incurs zero loss on the surrogate empirical error. If measurements (e.g. of $x_i$) were made with noise, this model may suffer from overfitting and display poor expected error. Regularization introduces a penalty for exploring certain regions of the function space used to build the model, which can improve generalization.

Tikhonov regularization (ridge regression)

Main article: Tikhonov regularization

These techniques are named for Andrey Nikolayevich Tikhonov, who applied regularization to integral equations and made important contributions in many other areas.

When learning a linear function $f$, characterized by an unknown vector $w$ such that $f(x) = w \cdot x$, one can add the $L_2$-norm of the vector $w$ to the loss expression in order to prefer solutions with smaller norms. Tikhonov regularization is one of the most common forms. It is also known as ridge regression. It is expressed as:
$$\min_w \sum_{i=1}^n V(\hat{x}_i \cdot w, \hat{y}_i) + \lambda \|w\|_2^2,$$
where $(\hat{x}_i, \hat{y}_i), \, 1 \leq i \leq n,$ would represent samples used for training.

In the case of a general function, the norm of the function in its reproducing kernel Hilbert space is:
$$\min_f \sum_{i=1}^n V(f(\hat{x}_i), \hat{y}_i) + \lambda \|f\|_{\mathcal{H}}^2$$

As the $L_2$ norm is differentiable, learning can be advanced by gradient descent.

Tikhonov-regularized least squares


The learning problem with the least squares loss function and Tikhonov regularization can be solved analytically. Written in matrix form, the optimal $w$ is the one for which the gradient of the loss function with respect to $w$ is 0.
$$\min_w \frac{1}{n} (\hat{X} w - Y)^\mathsf{T} (\hat{X} w - Y) + \lambda \|w\|_2^2$$
$$\nabla_w = \frac{2}{n} \hat{X}^\mathsf{T} (\hat{X} w - Y) + 2 \lambda w$$
$$0 = \hat{X}^\mathsf{T} (\hat{X} w - Y) + n \lambda w$$
$$w = \left( \hat{X}^\mathsf{T} \hat{X} + \lambda n I \right)^{-1} \left( \hat{X}^\mathsf{T} Y \right)$$
where the third statement is a first-order condition.

By construction of the optimization problem, other values of $w$ give larger values for the loss function. This can be verified by examining the second derivative $\nabla_{ww}$.

During training, this algorithm takes $O(d^3 + nd^2)$ time. The terms correspond to the matrix inversion and calculating $X^\mathsf{T} X$, respectively. Testing takes $O(nd)$ time.
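The closed form above is straightforward to implement. A NumPy sketch (the synthetic data is an assumed illustration; solving the linear system is preferred over forming the inverse explicitly):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Tikhonov-regularized least squares in closed form:
    w = (X^T X + lam * n * I)^(-1) X^T y, as derived above."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

# Illustrative data, not from the article
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=100)
print(ridge_fit(X, y, lam=0.1))  # regularized estimate of w_true
```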

Early stopping

Main article: Early stopping

Early stopping can be viewed as regularization in time. Intuitively, a training procedure such as gradient descent tends to learn more and more complex functions with increasing iterations. By regularizing for time, model complexity can be controlled, improving generalization.

Early stopping is implemented using one data set for training, one statistically independent data set for validation and another for testing. The model is trained until performance on the validation set no longer improves and then applied to the test set.

Theoretical motivation in least squares


Consider the finite approximation of the Neumann series for an invertible matrix $A$ where $\|I - A\| < 1$:
$$\sum_{i=0}^{T-1} (I - A)^i \approx A^{-1}$$

This can be used to approximate the analytical solution of unregularized least squares, if $\gamma$ is introduced to ensure the norm is less than one:
$$w_T = \frac{\gamma}{n} \sum_{i=0}^{T-1} \left( I - \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{X} \right)^i \hat{X}^\mathsf{T} \hat{Y}$$

The exact solution to the unregularized least squares learning problem minimizes the empirical error, but may fail. By limitingT, the only free parameter in the algorithm above, the problem is regularized for time, which may improve its generalization.

The algorithm above is equivalent to restricting the number of gradient descent iterations for the empirical risk
$$I_s[w] = \frac{1}{2n} \left\| \hat{X} w - \hat{Y} \right\|_{\mathbb{R}^n}^2$$
with the gradient descent update:
$$\begin{aligned} w_0 &= 0 \\ w_{t+1} &= \left( I - \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{X} \right) w_t + \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{Y} \end{aligned}$$

The base case is trivial. The inductive case is proved as follows:
$$\begin{aligned} w_T &= \left( I - \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{X} \right) \frac{\gamma}{n} \sum_{i=0}^{T-2} \left( I - \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{X} \right)^i \hat{X}^\mathsf{T} \hat{Y} + \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{Y} \\ &= \frac{\gamma}{n} \sum_{i=1}^{T-1} \left( I - \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{X} \right)^i \hat{X}^\mathsf{T} \hat{Y} + \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{Y} \\ &= \frac{\gamma}{n} \sum_{i=0}^{T-1} \left( I - \frac{\gamma}{n} \hat{X}^\mathsf{T} \hat{X} \right)^i \hat{X}^\mathsf{T} \hat{Y} \end{aligned}$$
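The equivalence is easy to check numerically. A small NumPy sketch (the dimensions, step size $\gamma$, and data are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T, gamma = 50, 3, 25, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.05 * rng.normal(size=n)

# T gradient descent steps on the empirical risk, starting from w = 0
w = np.zeros(d)
for _ in range(T):
    w = (np.eye(d) - (gamma / n) * X.T @ X) @ w + (gamma / n) * X.T @ y

# Truncated Neumann series with the same T and gamma
M = np.eye(d) - (gamma / n) * X.T @ X
w_series = sum(np.linalg.matrix_power(M, i) for i in range(T)) @ ((gamma / n) * X.T @ y)

print(np.allclose(w, w_series))  # True: the two computations coincide
```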

Regularizers for sparsity


Assume that a dictionary $\phi_j$ with dimension $p$ is given such that a function in the function space can be expressed as:
$$f(x) = \sum_{j=1}^p \phi_j(x) w_j$$

A comparison between the L1 ball and the L2 ball in two dimensions gives an intuition on how L1 regularization achieves sparsity.

Enforcing a sparsity constraint on $w$ can lead to simpler and more interpretable models. This is useful in many real-life applications such as computational biology. An example is developing a simple predictive test for a disease in order to minimize the cost of performing medical tests while maximizing predictive power.

A sensible sparsity constraint is the $L_0$ norm $\|w\|_0$, defined as the number of non-zero elements in $w$. Solving an $L_0$ regularized learning problem, however, has been demonstrated to be NP-hard.[7]

The $L_1$ norm (see also Norms) can be used to approximate the optimal $L_0$ norm via convex relaxation. It can be shown that the $L_1$ norm induces sparsity. In the case of least squares, this problem is known as LASSO in statistics and basis pursuit in signal processing.
$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \left\| \hat{X} w - \hat{Y} \right\|^2 + \lambda \|w\|_1$$

Elastic net regularization

$L_1$ regularization can occasionally produce non-unique solutions. A simple example is provided in the figure when the space of possible solutions lies on a 45 degree line. This can be problematic for certain applications, and is overcome by combining $L_1$ with $L_2$ regularization in elastic net regularization, which takes the following form:
$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \left\| \hat{X} w - \hat{Y} \right\|^2 + \lambda \left( \alpha \|w\|_1 + (1 - \alpha) \|w\|_2^2 \right), \quad \alpha \in [0, 1]$$

Elastic net regularization tends to have a grouping effect, where correlated input features are assigned equal weights.

Elastic net regularization is commonly used in practice and is implemented in many machine learning libraries.
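For instance, scikit-learn exposes this as ElasticNet (its parameterization scales the penalty terms slightly differently from the formula above; the data here is an assumed illustration):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]               # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=200)

# l1_ratio plays the role of alpha in the formula above
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)                          # most spurious coefficients near zero
```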

Proximal methods

Main article: Proximal gradient method

While the $L_1$ norm does not result in an NP-hard problem, it is convex but not strictly differentiable due to the kink at $x = 0$. Subgradient methods, which rely on the subderivative, can be used to solve $L_1$ regularized learning problems. However, faster convergence can be achieved through proximal methods.

For a problem $\min_{w \in H} F(w) + R(w)$ such that $F$ is convex, continuous, differentiable, with Lipschitz continuous gradient (such as the least squares loss function), and $R$ is convex, continuous, and proper, the proximal method to solve the problem is as follows. First define the proximal operator
$$\operatorname{prox}_R(v) = \operatorname{argmin}_{w \in \mathbb{R}^D} \left\{ R(w) + \frac{1}{2} \|w - v\|^2 \right\},$$
and then iterate
$$w_{k+1} = \operatorname{prox}_{\gamma, R} \left( w_k - \gamma \nabla F(w_k) \right)$$

The proximal method iteratively performs gradient descent and then projects the result back into the space permitted byR{\displaystyle R}.

When $R$ is the $L_1$ regularizer, the proximal operator is equivalent to the soft-thresholding operator,
$$S_\lambda(v)_i = \begin{cases} v_i - \lambda, & \text{if } v_i > \lambda \\ 0, & \text{if } v_i \in [-\lambda, \lambda] \\ v_i + \lambda, & \text{if } v_i < -\lambda \end{cases}$$

This allows for efficient computation.
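A NumPy sketch of the resulting algorithm (ISTA) for the lasso problem above; the step size $\gamma$ and the iteration count are assumed choices:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: the soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista(X, y, lam, gamma, iters=500):
    """Proximal gradient for lasso: a gradient step on the least-squares
    term, then soft-thresholding with threshold gamma * lam."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w = soft_threshold(w - gamma * grad, gamma * lam)
    return w
```

For convergence, $\gamma$ would typically be chosen no larger than the reciprocal of the Lipschitz constant of the gradient, here the largest eigenvalue of $\tfrac{2}{n} X^\mathsf{T} X$.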

Group sparsity without overlaps


Groups of features can be regularized by a sparsity constraint, which can be useful for expressing certain prior knowledge in an optimization problem.

In the case of a linear model with non-overlapping known groups, a regularizer can be defined:
$$R(w) = \sum_{g=1}^G \|w_g\|_2, \quad \text{where} \quad \|w_g\|_2 = \sqrt{\sum_{j=1}^{|G_g|} \left( w_g^j \right)^2}$$

This can be viewed as inducing an $L_2$ norm over the members of each group followed by an $L_1$ norm over the groups.

This can be solved by the proximal method, where the proximal operator is a block-wise soft-thresholding function:

$$\operatorname{prox}_{\lambda, R, g}(w_g) = \begin{cases} \left( 1 - \dfrac{\lambda}{\|w_g\|_2} \right) w_g, & \text{if } \|w_g\|_2 > \lambda \\ 0, & \text{if } \|w_g\|_2 \leq \lambda \end{cases}$$
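A NumPy sketch of this block-wise operator, assuming the groups are given as lists of indices that partition the coordinates (non-overlapping, as above):

```python
import numpy as np

def group_prox(w, groups, lam):
    """Block-wise soft-thresholding: shrink each group's subvector toward
    zero by lam in norm, zeroing the whole group when its norm <= lam."""
    out = np.zeros_like(w)
    for idx in groups:                  # e.g. groups = [[0, 1], [2, 3, 4]]
        norm = np.linalg.norm(w[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * w[idx]
        # else: the whole group stays zero
    return out
```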

Group sparsity with overlaps


The algorithm described for group sparsity without overlaps can be applied to the case where groups do overlap, in certain situations. This will likely result in some groups with all zero elements, and other groups with some non-zero and some zero elements.

If it is desired to preserve the group structure, a new regularizer can be defined:
$$R(w) = \inf \left\{ \sum_{g=1}^G \|\bar{w}_g\|_2 : w = \sum_{g=1}^G \bar{w}_g \right\}$$

For each $w_g$, $\bar{w}_g$ is defined as the vector such that the restriction of $\bar{w}_g$ to the group $g$ equals $w_g$ and all other entries of $\bar{w}_g$ are zero. The regularizer finds the optimal decomposition of $w$ into parts; it can be viewed as duplicating all elements that exist in multiple groups. Learning problems with this regularizer can also be solved with the proximal method, with a complication: the proximal operator cannot be computed in closed form, but it can be solved effectively in an iterative manner, inducing an inner iteration within the proximal method iteration.

Regularizers for semi-supervised learning

Main article: Semi-supervised learning

When labels are more expensive to gather than input examples, semi-supervised learning can be useful. Regularizers have been designed to guide learning algorithms to learn models that respect the structure of unsupervised training samples. If a symmetric weight matrix $W$ is given, a regularizer can be defined:
$$R(f) = \sum_{i,j} w_{ij} \left( f(x_i) - f(x_j) \right)^2$$

If $W_{ij}$ encodes the result of some distance metric for points $x_i$ and $x_j$, it is desirable that $f(x_i) \approx f(x_j)$. This regularizer captures this intuition, and is equivalent to:
$$R(f) = \bar{f}^\mathsf{T} L \bar{f}$$
where $L = D - W$ is the Laplacian matrix of the graph induced by $W$.

The optimization problem $\min_{f \in \mathbb{R}^m} R(f), \, m = u + l$ can be solved analytically if the constraint $f(x_i) = y_i$ is applied for all supervised samples. The labeled part of the vector $f$ is therefore fixed by the constraint. The unlabeled part of $f$ is solved for by:
$$\min_{f_u \in \mathbb{R}^u} f^\mathsf{T} L f = \min_{f_u \in \mathbb{R}^u} \left\{ f_u^\mathsf{T} L_{uu} f_u + f_l^\mathsf{T} L_{lu} f_u + f_u^\mathsf{T} L_{ul} f_l \right\}$$
$$\nabla_{f_u} = 2 L_{uu} f_u + 2 L_{ul} Y$$
$$f_u = -L_{uu}^\dagger \left( L_{ul} Y \right)$$
(setting the gradient to zero introduces the minus sign). The pseudo-inverse can be taken because $L_{ul}$ has the same range as $L_{uu}$.
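A NumPy sketch of this solution (the similarity matrix and the boolean labeled mask are assumed inputs for illustration):

```python
import numpy as np

def laplacian_interpolate(W, y_labeled, labeled):
    """Minimize f^T L f with f fixed to the given labels on labeled points.

    W: symmetric similarity matrix; labeled: boolean mask of labeled points;
    y_labeled: labels for the points where labeled is True.
    """
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian L = D - W
    u = ~labeled
    L_uu = L[np.ix_(u, u)]
    L_ul = L[np.ix_(u, labeled)]
    f = np.empty(len(W), dtype=float)
    f[labeled] = y_labeled
    f[u] = -np.linalg.pinv(L_uu) @ (L_ul @ y_labeled)  # f_u = -L_uu^+ L_ul y
    return f
```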

Regularizers for multitask learning

Main article: Multi-task learning

In the case of multitask learning, $T$ problems are considered simultaneously, each related in some way. The goal is to learn $T$ functions with predictive power, ideally borrowing strength from the relatedness of the tasks. This is equivalent to learning the matrix $W: T \times D$.

Sparse regularizer on columns


$$R(W) = \|W\|_{2,1} = \sum_{i=1}^D \|w_i\|_2$$
where $w_i$ denotes the $i$-th column of $W$.

This regularizer defines an L2 norm on each column and an L1 norm over all columns. It can be solved by proximal methods.
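Computing this penalty is a one-liner in NumPy (assuming tasks along the rows of $W$, as above):

```python
import numpy as np

def l21_norm(W):
    """L2 norm of each column of W, summed (an L1 norm across columns)."""
    return float(np.sum(np.linalg.norm(W, axis=0)))
```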

Nuclear norm regularization


$$R(W) = \|\sigma(W)\|_1$$
where $\sigma(W)$ is the vector of singular values in the singular value decomposition of $W$; this sum of singular values is known as the nuclear norm.

Mean-constrained regularization


R(f1fT)=t=1Tft1Ts=1TfsHk2{\displaystyle R(f_{1}\cdots f_{T})=\sum _{t=1}^{T}\left\|f_{t}-{\frac {1}{T}}\sum _{s=1}^{T}f_{s}\right\|_{H_{k}}^{2}}

This regularizer constrains the functions learned for each task to be similar to the overall average of the functions across all tasks. This is useful for expressing prior information that each task is expected to share similarities with each other task. An example is predicting blood iron levels measured at different times of the day, where each task represents an individual.
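If each task's function is represented by a finite parameter vector (an assumption for this sketch), the penalty is just the summed squared deviation from the across-task mean:

```python
import numpy as np

def mean_constrained_penalty(F):
    """F has one row of parameters per task; penalize each row's squared
    distance from the mean row, as in the formula above."""
    return float(np.sum((F - F.mean(axis=0)) ** 2))
```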

Clustered mean-constrained regularization


R(f1fT)=r=1CtI(r)ft1I(r)sI(r)fsHk2{\displaystyle R(f_{1}\cdots f_{T})=\sum _{r=1}^{C}\sum _{t\in I(r)}\left\|f_{t}-{\frac {1}{I(r)}}\sum _{s\in I(r)}f_{s}\right\|_{H_{k}}^{2}} whereI(r){\displaystyle I(r)} is a cluster of tasks.

This regularizer is similar to the mean-constrained regularizer, but instead enforces similarity between tasks within the same cluster. This can capture more complex prior information. This technique has been used to predict Netflix recommendations. A cluster would correspond to a group of people who share similar preferences.

Graph-based similarity


More generally than above, similarity between tasks can be defined by a function. The regularizer encourages the model to learn similar functions for similar tasks.
$$R(f_1 \cdots f_T) = \sum_{t, s = 1, \, t \neq s}^T \|f_t - f_s\|^2 M_{ts}$$
for a given symmetric similarity matrix $M$.

Other uses of regularization in statistics and machine learning


Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting not involving regularization include cross-validation.

Examples of applications of different methods of regularization to the linear model are:

Model | Fit measure | Entropy measure[5][8]
AIC/BIC | $\|Y - X\beta\|_2$ | $\|\beta\|_0$
Lasso[9] | $\|Y - X\beta\|_2$ | $\|\beta\|_1$
Ridge regression[10] | $\|Y - X\beta\|_2$ | $\|\beta\|_2$
Basis pursuit denoising | $\|Y - X\beta\|_2$ | $\lambda \|\beta\|_1$
Rudin–Osher–Fatemi model (TV) | $\|Y - X\beta\|_2$ | $\lambda \|\nabla \beta\|_1$
Potts model | $\|Y - X\beta\|_2$ | $\lambda \|\nabla \beta\|_0$
RLAD[11] | $\|Y - X\beta\|_1$ | $\|\beta\|_1$
Dantzig Selector[12] | $\|X^\mathsf{T}(Y - X\beta)\|_\infty$ | $\|\beta\|_1$
SLOPE[13] | $\|Y - X\beta\|_2$ | $\sum_{i=1}^p \lambda_i |\beta|_{(i)}$


Notes

  1. ^ Kratsios, Anastasis (2020). "Deep Arbitrage-Free Learning in a Generalized HJM Framework via Arbitrage-Regularization Data". Risks. 8 (2). arXiv:1710.05114. doi:10.3390/risks8020040. hdl:20.500.11850/456375. "Term structure models can be regularized to remove arbitrage opportunities."
  2. ^ Bühlmann, Peter; Van De Geer, Sara (2011). Statistics for High-Dimensional Data. Springer Series in Statistics. p. 9. doi:10.1007/978-3-642-20192-9. ISBN 978-3-642-20191-2. "If p > n, the ordinary least squares estimator is not unique and will heavily overfit the data. Thus, a form of complexity regularization will be necessary."
  3. ^ Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron. Deep Learning Book. Retrieved 2021-01-29.
  4. ^ a b c d Guo, Jingru. "AI Notes: Regularizing neural networks". deeplearning.ai. Retrieved 2024-02-04.
  5. ^ a b Bishop, Christopher M. (2007). Pattern Recognition and Machine Learning (Corr. printing ed.). New York: Springer. ISBN 978-0-387-31073-2.
  6. ^ For the connection between maximum a posteriori estimation and ridge regression, see Weinberger, Kilian (July 11, 2018). "Linear / Ridge Regression". CS4780 Machine Learning Lecture 13. Cornell.
  7. ^ Natarajan, B. (1995-04-01). "Sparse Approximate Solutions to Linear Systems". SIAM Journal on Computing. 24 (2): 227–234. doi:10.1137/S0097539792240406. ISSN 0097-5397. S2CID 2072045.
  8. ^ Duda, Richard O. (2004). Pattern Classification + Computer Manual: Hardcover Set (2nd ed.). New York: Wiley. ISBN 978-0-471-70350-1.
  9. ^ Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B. 58 (1): 267–288. doi:10.1111/j.2517-6161.1996.tb02080.x. MR 1379242. Archived from the original (PostScript) on 2008-10-31. Retrieved 2009-03-19.
  10. ^ Hoerl, Arthur E.; Kennard, Robert W. (1970). "Ridge regression: Biased estimation for nonorthogonal problems". Technometrics. 12 (1): 55–67. doi:10.2307/1267351. JSTOR 1267351.
  11. ^ Wang, Li; Gordon, Michael D.; Zhu, Ji (2006). "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning". Sixth International Conference on Data Mining. pp. 690–700. doi:10.1109/ICDM.2006.134. ISBN 978-0-7695-2701-7.
  12. ^ Candes, Emmanuel; Tao, Terence (2007). "The Dantzig selector: Statistical estimation when p is much larger than n". Annals of Statistics. 35 (6): 2313–2351. arXiv:math/0506081. doi:10.1214/009053606000001523. MR 2382644. S2CID 88524200.
  13. ^ Bogdan, Małgorzata; van den Berg, Ewout; Su, Weijie; Candes, Emmanuel J. (2013). "Statistical estimation and testing via the ordered L1 norm". arXiv:1310.1969 [stat.ME].
