Classification: Prediction bias

  • Prediction bias, calculated as the difference between the average prediction and the average ground truth, is a quick check for model or data issues.

  • A model with zero prediction bias predicts, on average, the same outcome rate as is observed in the ground-truth data. For example, a spam detection model with zero prediction bias predicts the same percentage of spam emails as is actually present in the dataset.

  • Significant prediction bias can indicate problems in the training data, the model itself, or the new data being applied to the model.

  • Common causes of prediction bias include biased data, excessive regularization, bugs in the training process, and insufficient features provided to the model.

Calculating prediction bias is a quick check that can flag issues with the model or training data early on.

Prediction bias is the difference between the mean of a model's predictions and the mean of the ground-truth labels in the data. A model trained on a dataset where 5% of the emails are spam should predict, on average, that 5% of the emails it classifies are spam. In other words, the mean of the labels in the ground-truth dataset is 0.05, and the mean of the model's predictions should also be 0.05. If this is the case, the model has zero prediction bias. Of course, the model might still have other problems.
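As a rough illustration of the definition above, the following Python sketch computes prediction bias as the mean prediction minus the mean label. The function name `prediction_bias` and the sample data are hypothetical and chosen only to mirror the 5% spam example; this is not code from any particular library.

```python
from statistics import mean


def prediction_bias(predictions, labels):
    """Return mean(predictions) - mean(labels).

    `predictions` are predicted probabilities (or 0/1 predictions) and
    `labels` are ground-truth labels (1 = spam, 0 = not spam).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return mean(predictions) - mean(labels)


# Hypothetical dataset: 100 emails, 5 of which are spam.
labels = [1] * 5 + [0] * 95

# Hypothetical model output: predicted spam probabilities averaging 0.05.
predictions = [0.05] * 100

bias = prediction_bias(predictions, labels)
print(f"mean label:      {mean(labels):.2f}")
print(f"mean prediction: {mean(predictions):.2f}")
print(f"prediction bias: {bias:.2f}")
```

A bias near zero here only says that the two averages agree; it says nothing about whether individual emails are classified correctly.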

If the model instead predicts 50% of the time that an email is spam, then something is wrong with the training dataset, the new dataset the model is applied to, or with the model itself. Any significant difference between the two means suggests that the model has some prediction bias.
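The 50% case can be sketched the same way. In the Python example below, the helper `has_significant_bias` and the `TOLERANCE` value of 0.01 are assumptions made for illustration; the text does not define what counts as a "significant" difference, so a suitable threshold would depend on the task.

```python
from statistics import mean

# Hypothetical threshold for a "significant" difference between the means.
TOLERANCE = 0.01


def has_significant_bias(predictions, labels, tolerance=TOLERANCE):
    """Return (flagged, bias), where bias = mean(predictions) - mean(labels)."""
    bias = mean(predictions) - mean(labels)
    return abs(bias) > tolerance, bias


# Hypothetical dataset: 5% of 100 emails are actually spam.
labels = [1] * 5 + [0] * 95

# Hypothetical model that labels half of the emails as spam.
overpredicting = [1] * 50 + [0] * 50

flagged, bias = has_significant_bias(overpredicting, labels)
print(f"prediction bias: {bias:.2f}, flagged: {flagged}")
```

A flagged result does not identify the cause; it only signals that the training data, the new data, or the model deserves a closer look.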

Prediction bias can be caused by:

  • Biases or noise in the data, including biased sampling for the training set
  • Too-strong regularization, meaning that the model was oversimplified and lost some necessary complexity
  • Bugs in the model training pipeline
  • The set of features provided to the model being insufficient for the task


Last updated 2025-10-17 UTC.