Linear regression: Loss
Page Summary
- Loss is a numerical value indicating the difference between a model's predictions and the actual values.
- The goal of model training is to minimize loss, bringing it as close to zero as possible.
- Two common methods for calculating loss are Mean Absolute Error (MAE) and Mean Squared Error (MSE), which differ in their sensitivity to outliers.
- Choosing between MAE and MSE depends on the dataset and how you want the model to handle outliers, with MSE penalizing them more heavily.
Loss is a numerical metric that describes how wrong a model's predictions are. Loss measures the distance between the model's predictions and the actual labels. The goal of training a model is to minimize the loss, reducing it to its lowest possible value.
In the following image, you can visualize loss as arrows drawn from the data points to the model. The arrows show how far the model's predictions are from the actual values.

Figure 8. Loss is measured from the actual value to the predicted value.
Distance of loss
In statistics and machine learning, loss measures the difference between the predicted and actual values. Loss focuses on the distance between the values, not the direction. For example, if a model predicts 2, but the actual value is 5, we don't care that the loss is negative (2 – 5 = –3). Instead, we care that the distance between the values is 3. Thus, all methods for calculating loss remove the sign.
The two most common methods to remove the sign are the following:
- Take the absolute value of the difference between the actual value and theprediction.
- Square the difference between the actual value and the prediction.
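For example, using the prediction of 2 and the actual value of 5 from above, a minimal Python sketch of both approaches looks like this:

```python
# Example from above: the model predicts 2, but the actual value is 5.
actual = 5
predicted = 2

# Method 1: absolute value of the difference.
absolute_difference = abs(actual - predicted)   # 3

# Method 2: square of the difference.
squared_difference = (actual - predicted) ** 2  # 9

print(absolute_difference, squared_difference)  # 3 9
```

Either way, the sign of the raw difference disappears; only the distance remains.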
Types of loss
In linear regression, there are five main types of loss, which are outlined in the following table.
| Loss type | Definition | Equation |
|---|---|---|
| L1 loss | The sum of the absolute values of the difference between the predicted values and the actual values. | $ \sum \lvert actual\ value - predicted\ value \rvert $ |
| Mean absolute error (MAE) | The average of L1 losses across a set of N examples. | $ \frac{1}{N} \sum \lvert actual\ value - predicted\ value \rvert $ |
| L2 loss | The sum of the squared differences between the predicted values and the actual values. | $ \sum (actual\ value - predicted\ value)^2 $ |
| Mean squared error (MSE) | The average of L2 losses across a set of N examples. | $ \frac{1}{N} \sum (actual\ value - predicted\ value)^2 $ |
| Root mean squared error (RMSE) | The square root of the mean squared error (MSE). | $ \sqrt{\frac{1}{N} \sum (actual\ value - predicted\ value)^2} $ |
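As a rough sketch (not any particular library's API), the loss types in the table can be written in plain Python as follows; the function names are illustrative:

```python
import math

def l1_loss(actual, predicted):
    """Sum of absolute differences between actual and predicted values (L1 loss)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))

def mean_absolute_error(actual, predicted):
    """Average of the L1 losses across N examples (MAE)."""
    return l1_loss(actual, predicted) / len(actual)

def l2_loss(actual, predicted):
    """Sum of squared differences between actual and predicted values (L2 loss)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def mean_squared_error(actual, predicted):
    """Average of the L2 losses across N examples (MSE)."""
    return l2_loss(actual, predicted) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the MSE (RMSE)."""
    return math.sqrt(mean_squared_error(actual, predicted))
```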
The functional difference between L1 loss and L2 loss (or between MAE/RMSE and MSE) is squaring. When the difference between the prediction and label is large, squaring makes the loss even larger. When the difference is small (less than 1), squaring makes the loss even smaller.
Loss metrics like MAE and RMSE may be preferable to L2 loss or MSE in some use cases because they tend to be more human-interpretable, as they measure error using the same scale as the model's predicted value.
Note: MAE and RMSE can differ quite widely. MAE represents the average prediction error, whereas RMSE represents the "spread" of the errors, and is more skewed by larger errors.
When processing multiple examples at once, we recommend averaging the losses across all the examples, whether using MAE, MSE, or RMSE.
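To make that note concrete, the following sketch uses made-up error values to show how a single large error pulls RMSE up much more than MAE:

```python
import math

# Hypothetical predictions where one example is badly mispredicted.
actual    = [10.0, 12.0, 14.0, 16.0]
predicted = [ 9.0, 13.0, 15.0, 26.0]   # last error is 10, the rest are 1

errors = [a - p for a, p in zip(actual, predicted)]

mae  = sum(abs(e) for e in errors) / len(errors)             # (1+1+1+10)/4 = 3.25
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # sqrt(103/4) ≈ 5.07

print(mae, rmse)  # RMSE is pulled much further up by the single large error.
```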
Calculating loss example
In the previous section, we created the following model to predict fuel efficiency based on car heaviness:
- Model: $ y' = 34 + (-4.6)(x_1) $
- Weight: $ -4.6 $
- Bias: $ 34 $
If the model predicts that a 2,370-pound car gets 23.1 miles per gallon, but it actually gets 24 miles per gallon, we would calculate the L2 loss as follows:
Note: The formula uses 2.37 because the graphs are scaled to 1000s of pounds.
| Value | Equation | Result |
|---|---|---|
| Prediction | $\small{bias + (weight * feature\ value)}$ $\small{34 + (-4.6*2.37)}$ | $\small{23.1}$ |
| Actual value | $ \small{ label } $ | $ \small{ 24 } $ |
| L2 loss | $ \small{ (actual\ value - predicted\ value)^2 } $ $\small{ (24 - 23.1)^2 }$ | $\small{0.81}$ |
In this example, the L2 loss for that single data point is 0.81.
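The same calculation can be checked with a few lines of Python, using the weight, bias, and feature value given above:

```python
# Model from the previous section: y' = 34 + (-4.6)(x1),
# where x1 is the car's weight in thousands of pounds.
bias = 34
weight = -4.6

feature_value = 2.37   # 2,370 pounds, scaled to 1000s of pounds
label = 24             # actual miles per gallon

prediction = bias + weight * feature_value      # 34 + (-4.6 * 2.37) ≈ 23.1
l2_loss = (label - prediction) ** 2             # (24 - 23.1)^2 ≈ 0.81

print(round(prediction, 1), round(l2_loss, 2))  # 23.1 0.81
```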
Choosing a loss
Deciding whether to use MAE or MSE can depend on the dataset and the way you want to handle certain predictions. Most feature values in a dataset typically fall within a distinct range. For example, cars are normally between 2,000 and 5,000 pounds and get between 8 and 50 miles per gallon. An 8,000-pound car, or a car that gets 100 miles per gallon, is outside the typical range and would be considered an outlier.
An outlier can also refer to how far off a model's predictions are from the real values. For instance, 3,000 pounds is within the typical car-weight range, and 40 miles per gallon is within the typical fuel-efficiency range. However, a 3,000-pound car that gets 40 miles per gallon would be an outlier in terms of the model's prediction because the model would predict that a 3,000-pound car would get around 20 miles per gallon.
When choosing the best loss function, consider how you want the model to treat outliers. For instance, MSE moves the model more toward the outliers, while MAE doesn't. L2 loss incurs a much higher penalty for an outlier than L1 loss. For example, the following images show a model trained using MAE and a model trained using MSE. The red line represents a fully trained model that will be used to make predictions. The outliers are closer to the model trained with MSE than to the model trained with MAE.

Figure 9. MSE loss moves the model closer to the outliers.

Figure 10. MAE loss keeps the model farther from the outliers.
Note the relationship between the model and the data:
- MSE. The model is closer to the outliers but further away from most of the other data points.
- MAE. The model is further away from the outliers but closer to most of the other data points.
More guidelines on choosing a loss metric:
Choose MSE:
- If you want to heavily penalize large errors.
- If you believe the outliers are important and indicative of true data variance that the model should account for.
Choose MAE:
- If your dataset has significant outliers that you don't want to overly influence the model. MAE is more robust.
- If you prefer a loss function that is more directly interpretable as the average error magnitude.
In practice, your metric choice can also depend on the specific business problem and what kind of errors are more costly.
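As an illustration (with hypothetical prediction errors, not the dataset shown in the figures), the following sketch shows how much more a single outlier contributes to MSE than to MAE:

```python
# Hypothetical residuals (actual - predicted) for a candidate model,
# where the last example is an outlier.
errors = [0.5, -0.5, 1.0, -1.0, 8.0]

mae = sum(abs(e) for e in errors) / len(errors)   # 11.0 / 5 = 2.2
mse = sum(e ** 2 for e in errors) / len(errors)   # 66.5 / 5 = 13.3

# The outlier contributes 8.0 out of 11.0 to the summed absolute error,
# but 64.0 out of 66.5 to the summed squared error, so a model trained
# with MSE is pushed much harder to reduce that one error.
print(mae, mse)
```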