Linear regression

  • This module introduces linear regression, a statistical method used to predict a label value based on its features.

  • The linear regression model uses an equation (y' = b + w₁x₁ + ...) to represent the relationship between features and the label.

  • During training, the model adjusts its bias (b) and weights (w) to minimize the difference between predicted and actual values.

  • Linear regression can be applied to models with multiple features, each with its own weight, to improve prediction accuracy.

  • Gradient descent and hyperparameter tuning are key techniques used to optimize the performance of a linear regression model.

Estimated module length: 80 minutes

This module introduces linear regression concepts.

Learning objectives:
  • Explain a loss function and how it works.
  • Define and describe how gradient descent finds the optimal model parameters.
  • Describe how to tune hyperparameters to efficiently train a linear model.
Prerequisites:

This module assumes you are familiar with the concepts covered in the following module:

Linear regression is a statistical technique used to find the relationship between variables. In an ML context, linear regression finds the relationship between features and a label.

For example, suppose we want to predict a car's fuel efficiency in miles per gallon based on how heavy the car is, and we have the following dataset:

| Pounds in 1000s (feature) | Miles per gallon (label) |
|---|---|
| 3.5 | 18 |
| 3.69 | 15 |
| 3.44 | 18 |
| 3.43 | 16 |
| 4.34 | 15 |
| 4.42 | 14 |
| 2.37 | 24 |

If we plotted these points, we'd get the following graph:

Figure 1. Car heaviness (in pounds) versus miles per gallon rating. As a car gets heavier, its miles per gallon rating generally decreases, producing a downward-sloping trend from left to right.

We could create our own model by drawing a best fit line through the points:

Figure 2. A best fit line drawn through the data points from the previous figure, representing the model.
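
Rather than eyeballing the line, we could also estimate it numerically. The following is a minimal sketch (assuming Python with NumPy) that fits a least-squares line to the dataset above; the variable names are illustrative, and the fitted slope and intercept come out close to the hand-drawn values used later in this section.

```python
import numpy as np

# Dataset from the table above: car weight (in thousands of pounds) and
# fuel efficiency (miles per gallon).
pounds_1000s = np.array([3.5, 3.69, 3.44, 3.43, 4.34, 4.42, 2.37])  # feature
mpg = np.array([18, 15, 18, 16, 15, 14, 24])                        # label

# Fit a degree-1 polynomial (a straight line) by least squares.
# np.polyfit returns the coefficients highest degree first: [slope, intercept].
slope, intercept = np.polyfit(pounds_1000s, mpg, deg=1)
print(f"slope (weight): {slope:.2f}, intercept (bias): {intercept:.2f}")
```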

Linear regression equation

In algebraic terms, the model would be defined as $ y = mx + b $, where

  • $ y $ is miles per gallon—the value we want to predict.
  • $ m $ is the slope of the line.
  • $ x $ is pounds—our input value.
  • $ b $ is the y-intercept.

In ML, we write the equation for a linear regression model as follows:

$$ y' = b + w_1x_1 $$

where:

  • $ y' $ is the predicted label—the output.
  • $ b $ is the bias of the model. Bias is the same concept as the y-intercept in the algebraic equation for a line. In ML, bias is sometimes referred to as $ w_0 $. Bias is a parameter of the model and is calculated during training.
  • $ w_1 $ is the weight of the feature. Weight is the same concept as the slope $ m $ in the algebraic equation for a line. Weight is a parameter of the model and is calculated during training.
  • $ x_1 $ is a feature—the input.

During training, the model calculates the weight and bias that produce the best model.

Figure 3. Mathematical representation of a linear model: the equation y' = b + w₁x₁, with each component annotated with its purpose.

In our example, we'd calculate the weight and bias from the line we drew. The bias is 34 (where the line intersects the y-axis), and the weight is –4.6 (the slope of the line). The model would be defined as $ y' = 34 + (-4.6)(x_1) $, and we could use it to make predictions. For instance, using this model, a 4,000-pound car would have a predicted fuel efficiency of 15.6 miles per gallon.
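
As a quick check, here is a minimal sketch (assuming Python) of that prediction. The function name `predict_mpg` is illustrative, and the feature is expressed in thousands of pounds, matching the dataset above.

```python
# Model from this section: y' = 34 + (-4.6)(x1)
bias = 34.0
weight = -4.6

def predict_mpg(pounds_in_1000s: float) -> float:
    """Predicted miles per gallon for a car of the given weight."""
    return bias + weight * pounds_in_1000s

# A 4,000-pound car: 34 + (-4.6)(4.0) = 15.6 miles per gallon
print(predict_mpg(4.0))
```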

Figure 4. Same graph as Figure 2, with the point (4, 15.6) highlighted: using the model, a 4,000-pound car has a predicted fuel efficiency of 15.6 miles per gallon.

Models with multiple features

Although the example in this section uses only one feature—the heaviness of the car—a more sophisticated model might rely on multiple features, each having a separate weight ($ w_1 $, $ w_2 $, etc.). For example, a model that relies on five features would be written as follows:

$$ y' = b + w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + w_5x_5 $$

For instance, a model that predicts gas mileage could additionally use features such as the following:

  • Engine displacement
  • Acceleration
  • Number of cylinders
  • Horsepower

This model would be written as follows:

Figure 5. The linear regression equation with five features: a model to predict a car's miles per gallon rating.
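
To make the five-feature equation concrete, here is a minimal sketch (assuming Python with NumPy) that computes y' as the bias plus a dot product of weights and features. The specific weight and feature values are made up for illustration; in practice, the model learns the bias and weights during training.

```python
import numpy as np

# Hypothetical parameters (a real model learns these during training).
bias = 30.0
weights = np.array([-4.0, -0.002, 0.3, -0.5, -0.02])   # w1..w5

# Hypothetical feature values for one car:
# weight (1000s of lb), displacement (cc), acceleration (s),
# number of cylinders, horsepower.
features = np.array([3.5, 2200.0, 14.0, 4.0, 90.0])

# y' = b + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5
prediction = bias + np.dot(weights, features)
print(f"Predicted miles per gallon: {prediction:.1f}")
```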

By graphing a couple of these additional features, we can see that they also have a linear relationship to the label, miles per gallon:

Figure 6. A car's displacement in cubic centimeters and its miles per gallon rating, showing a negative linear relationship. As a car's engine gets bigger, its miles per gallon rating generally decreases.

Figure 7. A car's acceleration (time from zero to sixty, in seconds) and its miles per gallon rating, showing a positive linear relationship. As a car's acceleration takes longer, its miles per gallon rating generally increases.

Exercise: Check your understanding

What parts of the linear regression equation are updated during training?

  • The bias and weights. Correct: during training, the model updates the bias and weights.
  • The prediction. Incorrect: predictions are not updated during training.
  • The feature values. Incorrect: feature values are part of the dataset, so they're not updated during training.
