If bias auditing has detected some form of bias, we are often interested in obtaining fair(er) models. There are several ways to achieve this, such as collecting additional data or finding and fixing errors in the data. Assuming there are no biases in the data and labels, one other option is to debias models using preprocessing, postprocessing, or inprocessing methods. mlr3fairness provides some operators as PipeOps for mlr3pipelines. If you are not familiar with mlr3pipelines, the mlr3book contains an introduction.
We again showcase debiasing using the adult_train task:
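The task can be loaded roughly as sketched below. This assumes the mlr3 and mlr3fairness packages are installed; the variable name task is chosen to match the one used in the benchmark further down, and whether the original example subsets the task first is not stated here.

```r
library(mlr3)
library(mlr3fairness)

# Load the example task shipped with mlr3fairness.
task = tsk("adult_train")

# The protected attribute is stored in the "pta" column role.
print(task$col_roles$pta)
```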
mlr3fairness implements two reweighing-based algorithms: reweighing_wts and reweighing_os. reweighing_wts adds observation weights to a Task that can counteract imbalances between the conditional probabilities \(P(Y | pta)\).
| key | output.num | input.type.train | input.type.predict | output.type.train |
|---|---|---|---|---|
| EOd | 1 | TaskClassif | TaskClassif | NULL |
| reweighing_os | 1 | TaskClassif | TaskClassif | TaskClassif |
| reweighing_wts | 1 | TaskClassif | TaskClassif | TaskClassif |
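To make the idea behind weight-based reweighing concrete, here is a small base-R sketch of the classic scheme in which each observation receives the weight P(A = a) P(Y = y) / P(A = a, Y = y) for its group a and label y. The toy data are made up, and it is an assumption that reweighing_wts computes its weights in exactly this way; the sketch only illustrates how such weights equalize the weighted positive rate across groups.

```r
# Toy data: protected attribute `sex` and binary label `y` (made-up values).
d = data.frame(
  sex = c("F", "F", "F", "F", "M", "M", "M", "M", "M", "M"),
  y   = c(1, 0, 0, 0, 1, 1, 1, 1, 0, 0)
)
n = nrow(d)

# Per-row counts of observations sharing the same group, label, and (group, label) cell.
n_a  = ave(rep(1, n), d$sex, FUN = sum)
n_y  = ave(rep(1, n), d$y, FUN = sum)
n_ay = ave(rep(1, n), d$sex, d$y, FUN = sum)

# Weight = expected cell probability under independence / observed cell probability.
d$w = (n_a / n) * (n_y / n) / (n_ay / n)

# Weighted positive rate per group; after reweighing these coincide.
wpr = sapply(split(d, d$sex), function(g) weighted.mean(g$y, g$w))
print(wpr)
```

Under-represented combinations (here, positive labels in group F) receive weights above 1, over-represented ones receive weights below 1, and the weights sum to the number of observations.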
We first instantiate the PipeOp:
and directly add the weights:
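The two steps above might look as follows. This is a minimal sketch: the variable names p1 and t1 are hypothetical, and the last line simply lists any columns that were added to the task rather than assuming a particular name for the weight column.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3fairness)

task = tsk("adult_train")

# Instantiate the reweighing PipeOp.
p1 = po("reweighing_wts")

# Training a PipeOp returns a list of outputs; the first element is the modified task.
t1 = p1$train(list(task))[[1]]

# Compare column roles before and after to see which columns were attached.
print(setdiff(unlist(t1$col_roles), unlist(task$col_roles)))
```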
Often we directly combine the PipeOp with a Learner to automate the preprocessing (see learner_rw). Below we set up a small benchmark:
    set.seed(4321)
    # Baseline learner and the same learner preceded by the reweighing PipeOp
    learner = lrn("classif.rpart", cp = 0.005)
    learner_rw = as_learner(po("reweighing_wts") %>>% learner)
    grd = benchmark_grid(list(task), list(learner, learner_rw), rsmp("cv", folds = 3))
    bmr = benchmark(grd)
    #> INFO  [13:56:47.585] [mlr3] Running benchmark with 6 resampling iterations
    #> INFO  [13:56:47.625] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 1/3)
    #> INFO  [13:56:47.692] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 2/3)
    #> INFO  [13:56:47.746] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 3/3)
    #> INFO  [13:56:47.798] [mlr3] Applying learner 'reweighing_wts.classif.rpart' on task 'adult_train' (iter 1/3)
    #> INFO  [13:56:47.897] [mlr3] Applying learner 'reweighing_wts.classif.rpart' on task 'adult_train' (iter 2/3)
    #> INFO  [13:56:48.005] [mlr3] Applying learner 'reweighing_wts.classif.rpart' on task 'adult_train' (iter 3/3)
    #> INFO  [13:56:48.102] [mlr3] Finished benchmark

We can now compute the metrics for our benchmark and see if reweighing actually improved fairness, measured via True Positive Rate (TPR) and classification accuracy (ACC):
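    bmr$aggregate(msrs(c("fairness.tpr", "fairness.acc")))
    #>       nr     task_id                   learner_id resampling_id iters
    #>    <int>      <char>                       <char>        <char> <int>
    #> 1:     1 adult_train                classif.rpart            cv     3
    #> 2:     2 adult_train reweighing_wts.classif.rpart            cv     3
    #>    fairness.tpr fairness.acc
    #>           <num>        <num>
    #> 1:   0.07494903    0.1162688
    #> 2:   0.01151982    0.1054431
    #> Hidden columns: resample_result

Both measures report differences between the protected groups, so smaller values indicate a fairer model. The reweighed model is considerably fairer with respect to TPR: the group difference drops from about 0.075 to about 0.012. The group difference in accuracy also shrinks slightly, from about 0.116 to about 0.105. Note that this output does not show the overall predictive accuracy of either model.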
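To check whether the fairness gain comes at a cost in predictive performance, the overall accuracy can be aggregated alongside the fairness measure. The sketch below re-runs the benchmark so that it is self-contained; it assumes the rpart package is installed and uses the full adult_train task, so the numbers need not match the output shown above.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3fairness)

set.seed(4321)
task = tsk("adult_train")
learner = lrn("classif.rpart", cp = 0.005)
learner_rw = as_learner(po("reweighing_wts") %>>% learner)

grd = benchmark_grid(list(task), list(learner, learner_rw), rsmp("cv", folds = 3))
bmr = benchmark(grd)

# classif.acc is the overall accuracy; fairness.tpr is the group difference in TPR.
res = bmr$aggregate(msrs(c("classif.acc", "fairness.tpr")))
print(res[, c("learner_id", "classif.acc", "fairness.tpr")])
```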