-
Regarding the problem of class imbalance, it seems the consensus is now to use cost-sensitive learning, that is, to use the cost imbalance (instead of the class imbalance) as weights in the evaluation metric and then use those weights for learning. The idea is to get closer to real-world metrics ($$$). I understand and agree with this. However, there is also the problem of probability calibration. To me it seems that using cost imbalance would break the probability calibration. Am I right in thinking this? I would be tempted to fit a model with weights and then use a probability calibration approach. But I am not sure that it would work as expected: typically, wouldn't the probability calibration approach need to be weighted too?
Edit: does the weighting change the ranking of the first step? Maybe we don't need to weight the first step and only the second step should be weighted?
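To make the question concrete, here is a minimal sketch of the two-step pipeline I have in mind. The dataset and the 10:1 cost ratio are made up for illustration, and it assumes scikit-learn >= 1.6 where FrozenEstimator is available (on older versions cv="prefit" plays the same role):

```python
# Made-up data and costs, just to illustrate the two steps in the question.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.frozen import FrozenEstimator
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=0)

# Step 1: fit with cost-based sample weights (hypothetical 10:1 cost ratio).
cost_weights = np.where(y_train == 1, 10.0, 1.0)
clf = HistGradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train, sample_weight=cost_weights)

# Step 2: post-hoc probability calibration on held-out data, here *without* weights.
# Whether this second step should also receive the cost weights is exactly the question.
calibrated = CalibratedClassifierCV(FrozenEstimator(clf), method="isotonic")
calibrated.fit(X_calib, y_calib)
```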
Replies: 1 comment 5 replies
-
Maybe @glemaitre or @jeremiedbb have an input on this?
-
Because cost-sensitive learning is a post-hoc operation, it will not affect the calibration of the predictive model. What will affect the calibration of the model are:
So the above are general considerations. We think there are details to refine when it comes to tuning the hyperparameters, with something related to https://arxiv.org/pdf/2501.19195. In short, it might be better to tune a ranking metric and have an internal calibration as well (so "refine"). So to the larger question of calibrating with weights: if you are adding weights, then you calibrate the model on the weighted target probability rather than the original one. So if the aim is to get the true probability estimate of the original target, then you don't want to apply any weights. Weights could be useful if you applied some sampling in the process and want to shift back to the original distribution.
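To illustrate the post-hoc view, here is a minimal sketch in the spirit of the cost-sensitive example in the documentation. The cost values are made up, and TunedThresholdClassifierCV requires scikit-learn >= 1.5:

```python
# The probabilistic model is trained on the unweighted log loss; the business
# costs only enter when turning probabilities into decisions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import TunedThresholdClassifierCV

def business_gain(y_true, y_pred):
    # Hypothetical costs: a false negative costs 10, a false positive costs 1.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return -(10 * fn + 1 * fp)

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)

model = LogisticRegression(max_iter=1_000)  # minimizes a proper scoring rule, unweighted
tuned = TunedThresholdClassifierCV(model, scoring=make_scorer(business_gain))
tuned.fit(X, y)
print(tuned.best_threshold_)  # only the decision cut-off moves; calibration is untouched
```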
-
But it changes what it is calibrated to? (Natural probabilities or weighted probabilities.) Thanks for the links @glemaitre (although it seems a bit like the Deep Learning community reinventing the wheel). In my field it is usual to split the modelling into a ranking step and a calibration step. I think I can reformulate my question as: which step should be weighted?
-
No, because the post-hoc operation is just finding the cut-off point to go from probability estimates to a classification decision. Since the original estimator is found by minimizing a proper scoring rule, you cannot get better probability estimates. Potentially, you can refine those estimates with an additional calibration step. However, that step comes prior to any cost-sensitive learning, as presented in the example. In short, the perfectly calibrated classifier provides probability estimates and we find the threshold that transforms them into a classification decision, and thus it does not change the calibration of the estimator.
Therefore, no step should be weighted if the aim is to stick to the "natural" probabilities. Weighting should only be used if, somewhere in the process, you do not minimize the original problem but an altered one (e.g. after resampling).
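As an aside, and with symbols introduced here just for illustration (they are not from the thread): if $p(x)$ is the calibrated probability of the positive class and $c_{\mathrm{FP}}$, $c_{\mathrm{FN}}$ are the per-error costs, minimizing the expected cost only moves the decision threshold:

$$
\text{predict } 1 \iff c_{\mathrm{FP}}\,\bigl(1 - p(x)\bigr) < c_{\mathrm{FN}}\,p(x) \iff p(x) > \frac{c_{\mathrm{FP}}}{c_{\mathrm{FP}} + c_{\mathrm{FN}}},
$$

so the costs enter only through the cut-off; the probability estimates themselves are left untouched.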
-
Thanks @glemaitre for your answer. I think I was confused by the post-hoc operation you mention, as you are discussing the binarisation threshold, while I was discussing the calibration step (typically isotonic regression or splines). I'd say we don't use the binarisation step that much in practice. The level of the threshold would mainly depend on the ranks (and change if the metric used is weighted by costs). I generally need both a weighted and an unweighted metric (probability and expected loss) and was wondering if there is an option to have only one model for ranking and two 'calibration steps', but I guess it is not the way to go. (The decomposition of risk in the paper you mentioned would then decompose a cost-weighted metric into two weighted terms.)
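For what it's worth, here is a minimal sketch of the "one ranking model, two calibrations" idea I had in mind, with the same hypothetical 10:1 costs and assuming scikit-learn >= 1.6 for FrozenEstimator; I am not certain the weights reach the calibrator exactly as intended when the underlying estimator is frozen:

```python
# One fitted ranker, reused by two calibrators: one on the natural target
# distribution, one on the cost-weighted one. Costs are illustrative only.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=0)

ranker = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
frozen = FrozenEstimator(ranker)  # keep the ranking model fixed

# Calibration 1: natural probabilities (no weights).
cal_natural = CalibratedClassifierCV(frozen, method="isotonic")
cal_natural.fit(X_calib, y_calib)

# Calibration 2: cost-weighted probabilities (weights passed to the calibrator).
cost_weights = np.where(y_calib == 1, 10.0, 1.0)
cal_weighted = CalibratedClassifierCV(frozen, method="isotonic")
cal_weighted.fit(X_calib, y_calib, sample_weight=cost_weights)
```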
-
OK, so to be sure I get it properly: here you refer to cost-sensitive learning as weighting directly the loss that is minimized by the estimator? If that is the case, then I agree that your estimator is not calibrated with respect to the original target distribution. If you recalibrate the model without weights, it should make the estimator predict in the original target distribution. If you pass weights during the calibration, then you are forcing your model to predict in the weighted target distribution, I guess. Then I'm wondering if it is actually needed to recalibrate the model, because minimizing the cost-sensitive loss in the first step would have been enough. But it might be similar to the non-sensitive case, where you can get a model that is not perfectly calibrated and still want to reduce the calibration loss. So in the end, if the aim is to predict in the reweighted target distribution, then I think I agree with you that you need to weight both the estimator loss and the calibration score.
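Concretely, for the "predict in the reweighted target distribution" case, a minimal sketch would pass the weights to both stages in one go. The 10:1 costs are again made up, and this assumes CalibratedClassifierCV forwards sample_weight to both the inner estimator fits and the calibrator, which I believe it does when the estimator supports sample weights:

```python
# Hypothetical 10:1 cost weights applied to both the estimator loss and the
# calibration score via cross-validated calibration.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
cost_weights = np.where(y == 1, 10.0, 1.0)

cal = CalibratedClassifierCV(LogisticRegression(max_iter=1_000), method="isotonic", cv=5)
cal.fit(X, y, sample_weight=cost_weights)  # predicts in the reweighted distribution
```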