In statistics, multivariate adaptive regression splines (MARS) is a form of regression analysis introduced by Jerome H. Friedman in 1991.[1] It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.
The term "MARS" is trademarked and licensed to Salford Systems. In order to avoid trademark infringements, many open-source implementations of MARS are called "Earth".[2][3]
This section introduces MARS using a few examples. We start with a set of data: a matrix of input variables x, and a vector of the observed responses y, with a response for each row in x. For example, the data could be:
| x | y |
|---|---|
| 10.5 | 16.4 |
| 10.7 | 18.8 |
| 10.8 | 19.7 |
| ... | ... |
| 20.6 | 77.0 |
Here there is only one independent variable, so the x matrix is just a single column. Given these measurements, we would like to build a model which predicts the expected y for a given x.

A linear model for the above data is

$$\hat{y} = -37 + 5.1x$$

The hat on the $\hat{y}$ indicates that $\hat{y}$ is estimated from the data. The figure on the right shows a plot of this function: a line giving the predicted $\hat{y}$ versus x, with the original values of y shown as red dots.
The data at the extremes of x indicates that the relationship between y and x may be non-linear (look at the red dots relative to the regression line at low and high values of x). We thus turn to MARS to automatically build a model taking into account non-linearities. MARS software constructs a model from the given x and y as follows:

$$\hat{y} = 25 + 6.1\max(0, x - 13) - 3.1\max(0, 13 - x)$$
The figure on the right shows a plot of this function: the predicted $\hat{y}$ versus x, with the original values of y once again shown as red dots. The predicted response is now a better fit to the original y values.
MARS has automatically produced a kink in the predicted y to take into account non-linearity. The kink is produced by hinge functions. The hinge functions are the expressions starting with $\max$ (where $\max(a, b)$ is $a$ if $a > b$, else $b$). Hinge functions are described in more detail below.
In this simple example, we can easily see from the plot that y has a non-linear relationship with x (and might perhaps guess that y varies with the square of x). However, in general there will be multiple independent variables, and the relationship between y and these variables will be unclear and not easily visible by plotting. We can use MARS to discover that non-linear relationship.
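An example MARS expression with multiple variables is

$$
\begin{aligned}
\mathrm{ozone} ={} & 5.2 \\
& + 0.93 \max(0, \mathrm{temp} - 58) \\
& - 0.64 \max(0, \mathrm{temp} - 68) \\
& - 0.046 \max(0, 234 - \mathrm{ibt}) \\
& - 0.016 \max(0, \mathrm{wind} - 7) \max(0, 200 - \mathrm{vis})
\end{aligned}
$$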
This expression models air pollution (the ozone level) as a function of the temperature and a few other variables. Note that the last term in the formula (on the last line) incorporates an interaction between wind and vis.
The figure on the right plots the predicted ozone as wind and vis vary, with the other variables fixed at their median values. The figure shows that wind does not affect the ozone level unless visibility is low. We see that MARS can build quite flexible regression surfaces by combining hinge functions.
To obtain the above expression, the MARS model building procedure automatically selects which variables to use (some variables are important, others not), the positions of the kinks in the hinge functions, and how the hinge functions are combined.
MARS builds models of the form

$$\hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x)$$

The model is a weighted sum of basis functions $B_i(x)$. Each $c_i$ is a constant coefficient. For example, each line in the formula for ozone above is one basis function multiplied by its coefficient.
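Each basis function $B_i(x)$ takes one of the following three forms:

1. A constant 1. There is just one such term, the intercept.
2. A hinge function, of the form $\max(0, x - c)$ or $\max(0, c - x)$ for a constant $c$. MARS automatically selects the variables and the values of those variables that serve as knots of the hinge functions.
3. A product of two or more hinge functions. Such basis functions can model interactions between two or more variables. An example is the last line of the ozone formula.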
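
The weighted-sum structure can be illustrated with a minimal Python sketch. The representation of a term as a coefficient plus a list of hinge factors, the names `MODEL` and `predict`, and all numeric values are hypothetical choices made for this illustration; they are not taken from any MARS implementation or from the ozone model.

```python
from math import prod


def hinge(value, knot, sign=+1):
    """sign=+1 gives max(0, value - knot); sign=-1 gives max(0, knot - value)."""
    return max(0.0, sign * (value - knot))


# Each term is (coefficient, list of hinge factors); an empty list is the constant 1.
# Each hinge factor is (variable name, knot, sign). All numbers are hypothetical.
MODEL = [
    (2.0, []),                                   # intercept
    (1.5, [("x1", 1.0, +1)]),                    # a single hinge function
    (-0.5, [("x1", 1.0, +1), ("x2", 3.0, -1)]),  # product of two hinges (interaction)
]


def predict(model, point):
    """Evaluate the weighted sum of basis functions at one observation (a dict)."""
    total = 0.0
    for coefficient, factors in model:
        # The product over an empty list of factors is 1, which yields the intercept.
        basis = prod(hinge(point[name], knot, sign) for name, knot, sign in factors)
        total += coefficient * basis
    return total


print(predict(MODEL, {"x1": 3.0, "x2": 1.0}))
```

Storing each term as a list of factors lets the three kinds of basis function (constant, single hinge, product of hinges) share one evaluation path.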

A key part of MARS models are hinge functions taking the form

$$\max(0, x - c)$$

or

$$\max(0, c - x)$$

where $c$ is a constant, called the knot. The figure on the right shows a mirrored pair of hinge functions with a knot at 3.1.
A hinge function is zero for part of its range, so can be used to partition the data into disjoint regions, each of which can be treated independently. Thus for example a mirrored pair of hinge functions in the expression

$$6.1\max(0, x - 13) - 3.1\max(0, 13 - x)$$

creates the piecewise linear graph shown for the simple MARS model in the previous section.
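
As a small illustration, a mirrored pair of hinge functions might be written in Python as follows. The function names and the sample x values are assumptions made for this sketch; the knot of 3.1 matches the figure described above.

```python
def hinge(x, knot):
    """Right-hand hinge: max(0, x - knot)."""
    return max(0.0, x - knot)


def mirrored_hinge(x, knot):
    """Left-hand hinge: max(0, knot - x)."""
    return max(0.0, knot - x)


def piecewise_linear(x, knot, intercept, right_coefficient, left_coefficient):
    """Weighted mirrored pair plus an intercept: the slope changes at the knot."""
    return (intercept
            + right_coefficient * hinge(x, knot)
            + left_coefficient * mirrored_hinge(x, knot))


KNOT = 3.1
for x in (1.1, 3.1, 5.1):
    # At most one member of the pair is nonzero for any x.
    print(x, hinge(x, KNOT), mirrored_hinge(x, KNOT))
```

Because each hinge is zero on one side of the knot, the two coefficients in `piecewise_linear` control the fit on the two sides separately.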
One might assume that only piecewise linear functions can be formed from hinge functions, but hinge functions can be multiplied together to form non-linear functions.
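
A short Python sketch shows why a product of hinge functions is no longer piecewise linear. The knots (1 and 2) and the function name `interaction` are hypothetical values chosen for illustration only.

```python
def hinge(value, knot):
    """max(0, value - knot)."""
    return max(0.0, value - knot)


def interaction(x1, x2):
    """Product of two hinge functions on different variables (hypothetical knots)."""
    return hinge(x1, 1.0) * hinge(x2, 2.0)


# Along the straight line (x1, x2) = (1 + t, 2 + t) the product equals t squared
# for t >= 0, so equally spaced points on the line do not give collinear values.
values = [interaction(1.0 + t, 2.0 + t) for t in (1.0, 2.0, 3.0)]

# The second difference of a linear function is zero; here it is not.
second_difference = values[0] - 2 * values[1] + values[2]
print(values, second_difference)
```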
Hinge functions are also called ramp, hockey stick, or rectifier functions. Instead of the $\max$ notation used in this article, hinge functions are often represented by $[\pm(x_i - c)]_+$ where $[\cdot]_+$ means take the positive part.
MARS builds a model in two phases: the forward and the backward pass. This two-stage approach is the same as that used by recursive partitioning trees.
MARS starts with a model which consists of just the intercept term (which is the mean of the response values).
MARS then repeatedly adds basis functions in pairs to the model. At each step it finds the pair of basis functions that gives the maximum reduction in sum-of-squares residual error (it is a greedy algorithm). The two basis functions in the pair are identical except that a different side of a mirrored hinge function is used for each function. Each new basis function consists of a term already in the model (which could perhaps be the intercept term) multiplied by a new hinge function. A hinge function is defined by a variable and a knot, so to add a new basis function, MARS must search over all combinations of the following:

1. existing terms (called parent terms in this context);
2. all variables (to select one for the new basis function);
3. all values of each variable (for the knot of the new hinge function).
To calculate the coefficient of each term, MARS applies a linear regression over the terms.
This process of adding terms continues until the change in residual error is too small to continue or until the maximum number of terms is reached. The maximum number of terms is specified by the user before model building starts.
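
One step of the forward pass described above can be sketched in plain Python. This is a brute-force illustration under several assumptions: every candidate is refitted from scratch via the normal equations (not the fast least-squares update used in practice), knots are tried at every observed value, and a tiny ridge term is added purely to keep the small linear solve stable when a candidate column is all zeros. The function names and the toy data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(columns, y, ridge=1e-9):
    """Regress y on the basis columns; return (coefficients, residual sum of squares)."""
    k = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    for i in range(k):
        gram[i][i] += ridge  # stabiliser for this sketch only, not part of MARS
    rhs = [sum(a * t for a, t in zip(col, y)) for col in columns]
    coefs = solve(gram, rhs)
    fitted = [sum(c * col[r] for c, col in zip(coefs, columns)) for r in range(len(y))]
    return coefs, sum((t - f) ** 2 for t, f in zip(y, fitted))


def forward_step(rows, y, basis):
    """Find the mirrored hinge pair whose addition gives the lowest RSS.

    rows: list of observations (each a list of variable values).
    basis: list of columns already in the model (the intercept is a column of ones).
    Returns (rss, variable index, knot, coefficients of the enlarged model).
    """
    best = None
    for parent in basis:                               # every existing (parent) term
        for var in range(len(rows[0])):                # every variable
            for knot in sorted({r[var] for r in rows}):  # every observed value
                plus = [p * max(0.0, r[var] - knot) for p, r in zip(parent, rows)]
                minus = [p * max(0.0, knot - r[var]) for p, r in zip(parent, rows)]
                coefs, rss = least_squares(basis + [plus, minus], y)
                if best is None or rss < best[0]:
                    best = (rss, var, knot, coefs)
    return best


# Toy data generated from a known hinge pair with its knot at 5 (hypothetical values).
rows = [[float(i)] for i in range(11)]
y = [1.0 + 2.0 * max(0.0, r[0] - 5.0) - 1.0 * max(0.0, 5.0 - r[0]) for r in rows]
rss, var, knot, coefs = forward_step(rows, y, [[1.0] * len(y)])
print(var, knot, rss)
```

A full forward pass would append the winning pair to `basis` and repeat until the stopping conditions described above are met.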
The search at each step is usually done in a brute-force fashion, but a key aspect of MARS is that because of the nature of hinge functions, the search can be done quickly using a fast least-squares update technique. Brute-force search can be sped up by using a heuristic that reduces the number of parent terms considered at each step ("Fast MARS"[4]).
The forward pass usually overfits the model. To build a model with better generalization ability, the backward pass prunes the model, deleting the least effective term at each step until it finds the best submodel. Model subsets are compared using the generalized cross-validation (GCV) criterion described below.
The backward pass has an advantage over the forward pass: at any step it can choose any term to delete, whereas the forward pass at each step can only see the next pair of terms.
The forward pass adds terms in pairs, but the backward pass typically discards one side of the pair and so terms are often not seen in pairs in the final model. A paired hinge can be seen in the equation for $\hat{y}$ in the first MARS example above; there are no complete pairs retained in the ozone example.
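
The pruning loop of the backward pass can be sketched as follows. The `score` argument is a hypothetical callable that returns a criterion for a subset of terms, with lower values being better; in a real MARS fit it would refit the model on that subset and compute the GCV. The toy criterion below is invented only so that the example runs.

```python
def backward_pass(terms, score):
    """Greedy backward deletion.

    terms: list of term labels with the intercept first (the intercept is never deleted).
    score: callable mapping a list of terms to a criterion, lower is better.
    Returns (best subset seen, its score).
    """
    current = list(terms)
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        # Try deleting each non-intercept term and keep the least harmful deletion.
        candidates = [[t for t in current if t != drop] for drop in current[1:]]
        current = min(candidates, key=score)
        current_score = score(current)
        if current_score < best_score:
            best_subset, best_score = list(current), current_score
    return best_subset, best_score


# Hypothetical criterion: a heavy cost for each missing useful term,
# plus a small cost per term kept in the model.
USEFUL = {"h1", "h2"}


def toy_score(subset):
    return 10 * len(USEFUL - set(subset)) + len(subset)


subset, value = backward_pass(["1", "h1", "h2", "h3", "h4"], toy_score)
print(subset, value)
```

The loop keeps deleting all the way down to the intercept and remembers the best subset seen, so a temporary rise in the criterion does not end the search early.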
The backward pass compares the performance of different models using generalized cross-validation (GCV), a minor variant on the Akaike information criterion that approximates the leave-one-out cross-validation score in the special case where errors are Gaussian, or where the squared error loss function is used. GCV was introduced by Craven and Wahba and extended by Friedman for MARS; lower values of GCV indicate better models. The formula for the GCV is

$$\mathrm{GCV} = \frac{\mathrm{RSS}}{N \cdot \left(1 - \dfrac{\text{effective number of parameters}}{N}\right)^{2}}$$

where RSS is the residual sum-of-squares measured on the training data and N is the number of observations (the number of rows in the x matrix).
The effective number of parameters is defined as

$$\text{effective number of parameters} = \text{number of MARS terms} + \text{penalty} \cdot \frac{\text{number of MARS terms} - 1}{2}$$

where penalty is typically 2 (giving results equivalent to the Akaike information criterion) but can be increased by the user.
Note that

$$\frac{\text{number of MARS terms} - 1}{2}$$

is the number of hinge-function knots, so the formula penalizes the addition of knots. Thus the GCV formula adjusts (i.e. increases) the training RSS to penalize more complex models. We penalize flexibility because models that are too flexible will model the specific realization of noise in the data instead of just the systematic structure of the data.
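
The GCV calculation can be written out in a few lines of Python. The function and parameter names are assumptions of this sketch, and the input values in the example are made up to show the arithmetic.

```python
def effective_parameters(n_terms, penalty=2.0):
    """Number of MARS terms plus the penalty times the number of knots."""
    return n_terms + penalty * (n_terms - 1) / 2


def gcv(rss, n_obs, n_terms, penalty=2.0):
    """Generalized cross-validation score; lower values indicate better models."""
    enp = effective_parameters(n_terms, penalty)
    if enp >= n_obs:
        raise ValueError("effective number of parameters must be below n_obs")
    return rss / (n_obs * (1 - enp / n_obs) ** 2)


# Made-up example: RSS of 100 on 100 observations with a 5-term model.
# Effective parameters = 5 + 2 * (5 - 1) / 2 = 9.
print(effective_parameters(5), gcv(100.0, 100, 5))
```

For a fixed RSS, adding terms or raising the penalty shrinks the denominator and so raises the GCV, which is how the criterion discourages overly flexible models.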
One constraint has already been mentioned: the user can specify the maximum number of terms in the forward pass.
A further constraint can be placed on the forward pass by specifying a maximum allowable degree of interaction. Typically only one or two degrees of interaction are allowed, but higher degrees can be used when the data warrants it. The maximum degree of interaction in the first MARS example above is one (i.e. no interactions or an additive model); in the ozone example it is two.
Other constraints on the forward pass are possible. For example, the user can specify that interactions are allowed only for certain input variables. Such constraints could make sense because of knowledge of the process that generated the data.