Author Contributions
Conceptualization, F.S. and M.S.; methodology, P.M. and M.S.; software, P.M. and J.K.; validation, P.M.; formal analysis, P.M.; investigation, J.K. and F.S.; resources, F.S.; data curation, P.M., J.K. and F.S.; writing – original draft preparation, P.M.; writing – review and editing, P.M.; visualization, J.K.; supervision, P.M. and J.M.; project administration, F.S. and M.S.; funding acquisition, F.S. and M.S. All authors have read and agreed to the published version of the manuscript.
Figure 1. Workflow of data, machine learning models, and bias mitigation techniques used in this research.
Figure 2. Balanced accuracy and disparate impact error versus classification threshold for a logistic regression classifier with no bias mitigation. The dotted vertical line is the threshold that maximizes balanced accuracy. The plot shown corresponds to one of the folds of cross-validation. Disparate impact error, equal to 1-min(DI, 1/DI), where DI is the disparate impact, is the difference between disparate impact and its ideal value of 1.
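The disparate impact error defined in this caption can be written as a one-line function. The following is a minimal sketch; the function name `disparate_impact_error` is an illustrative assumption and is not taken from the study's code.

```python
def disparate_impact_error(di: float) -> float:
    """Return 1 - min(DI, 1/DI), the distance of DI from its ideal value of 1."""
    if di <= 0:
        raise ValueError("disparate impact must be positive")
    return 1.0 - min(di, 1.0 / di)


# DI = 0.8 and DI = 1.25 are equally far from parity under this measure.
print(round(disparate_impact_error(0.8), 3))
print(round(disparate_impact_error(1.25), 3))
print(disparate_impact_error(1.0))
```

Taking the minimum of DI and 1/DI makes the error symmetric: a ratio that favours either group by the same factor gives the same error.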
Figure 3. Balanced accuracy and average odds difference versus classification threshold for a logistic regression classifier with no bias mitigation. The dotted vertical line is the threshold that maximizes balanced accuracy. The plot shown corresponds to one of the folds of cross-validation.
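The dotted vertical line in these plots marks the threshold that maximizes balanced accuracy. One way such a threshold could be found is a simple sweep over candidate values, sketched below. The helper names, the threshold grid, and the toy data are assumptions for illustration, not the study's implementation or data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def best_threshold(y_true, scores, thresholds):
    """Return the (threshold, balanced accuracy) pair with the highest balanced accuracy."""
    best = None
    for thr in thresholds:
        # Predict the positive class when the score reaches the threshold.
        y_pred = [1 if s >= thr else 0 for s in scores]
        acc = balanced_accuracy(y_true, y_pred)
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or acc > best[1]:
            best = (thr, acc)
    return best


# Toy labels and predicted probabilities (illustrative, not study data).
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.30, 0.55, 0.45, 0.70, 0.90]
thresholds = [i / 100 for i in range(1, 100)]
thr, acc = best_threshold(y_true, scores, thresholds)
print(thr, round(acc, 3))
```

In a cross-validation setting, the sweep would be repeated per fold, which is consistent with each plot showing a single fold.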
Figure 4. Balanced accuracy and disparate impact error versus classification threshold for a logistic regression classifier with reweighing. The dotted vertical line is the threshold that maximizes balanced accuracy. The plot shown corresponds to one of the folds of cross-validation. Disparate impact error, equal to 1-min(DI, 1/DI), where DI is the disparate impact, is the difference between disparate impact and its ideal value of 1.
Figure 5. Balanced accuracy and average odds difference versus classification threshold for a logistic regression classifier with reweighing. The dotted vertical line is the threshold that maximizes balanced accuracy. The plot shown corresponds to one of the folds of cross-validation.
Figure 6. Balanced accuracy and disparate impact error versus classification threshold for a logistic regression classifier with prejudice remover. The dotted vertical line is the threshold that maximizes balanced accuracy. The plot shown corresponds to one of the folds of cross-validation. Disparate impact error, equal to 1-min(DI, 1/DI), where DI is the disparate impact, is the difference between disparate impact and its ideal value of 1.
Figure 7. Balanced accuracy and average odds difference versus classification threshold for a logistic regression classifier with prejudice remover. The dotted vertical line is the threshold that maximizes balanced accuracy. The plot shown corresponds to one of the folds of cross-validation.
Table 1. Datasets retrieved from the psychiatry department of the UMC Utrecht, with the variables present in each dataset that are used for this study. Psychiatry is divided into four nursing wards. For the “medication” dataset, the “Administered” and “Not administered” variables contain, in principle, the same information; however, sometimes only one of them is filled.
Dataset | Variable | Type |
---|---|---|
Admissions | Admission ID | Identifier |
| Patient ID | Identifier |
| Nursing ward ID | Identifier |
| Admission date | Date |
| Discharge date | Date |
| Admission time | Time |
| Discharge time | Time |
| Emergency | Boolean |
| First admission | Boolean |
| Gender | Man/Woman |
| Age at admission | Integer |
| Admission status | Ongoing/Discharged |
| Duration in days | Integer |
Medication | Patient ID | Identifier |
| Prescription ID | Identifier |
| ATC code (medication ID) | String |
| Medication name | String |
| Dose | Float |
| Unit (for dose) | String |
| Administration date | Date |
| Administration time | Time |
| Administered | Boolean |
| Dose used | Float |
| Original dose | Float |
| Continuation After Suspension | Boolean |
| Not administered | Boolean |
Diagnoses | Patient ID | Identifier |
| Diagnosis number | Identifier |
| Start date | Date |
| End date | Date |
| Main diagnosis group | Categorical |
| Level of care demand | Numeric |
| Multiple problem | Boolean |
| Personality disorder | Boolean |
| Admission | Boolean |
| Diagnosis date | Date |
Aggression | Patient ID | Identifier |
| Date of incident | Date |
| Start time | Time |
Patient | Patient ID | Identifier |
| Age at start of dossier | Integer |
Table 2. List of tranquillizers considered in this study, along with the multipliers used for scaling the doses of those tranquillizers to a diazepam-equivalent dose. The last column is the inverse of the centre column.
Tranquillizer | Multiplier | mg/(mg Diazepam) |
---|---|---|
Diazepam | 1.0 | 1.00 |
Alprazolam | 10.0 | 0.10 |
Bromazepam | 1.0 | 1.00 |
Brotizolam | 40.0 | 0.03 |
Chlordiazepoxide | 0.5 | 2.00 |
Clobazam | 0.5 | 2.00 |
Clorazepate potassium | 0.75 | 1.33 |
Flunitrazepam | 0.1 | 10 |
Flurazepam | 0.33 | 3.03 |
Lorazepam | 5.0 | 0.20 |
Lormetazepam | 10.0 | 0.10 |
Midazolam | 1.33 | 0.10 |
Nitrazepam | 1.0 | 1.00 |
Oxazepam | 0.33 | 3.03 |
Temazepam | 1.0 | 1.00 |
Zolpidem | 1.0 | 1.00 |
Zopiclone | 1.33 | 0.75 |
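As an illustration of how the multipliers in Table 2 could be applied, the sketch below scales each administered dose by its multiplier and sums the results into a single diazepam-equivalent dose. It assumes all doses are expressed in mg; the function names and the example administrations are hypothetical and are not taken from the study's code or data.

```python
# Multipliers copied from the "Multiplier" column of Table 2.
DIAZEPAM_MULTIPLIER = {
    "Diazepam": 1.0,
    "Alprazolam": 10.0,
    "Bromazepam": 1.0,
    "Brotizolam": 40.0,
    "Chlordiazepoxide": 0.5,
    "Clobazam": 0.5,
    "Clorazepate potassium": 0.75,
    "Flunitrazepam": 0.1,
    "Flurazepam": 0.33,
    "Lorazepam": 5.0,
    "Lormetazepam": 10.0,
    "Midazolam": 1.33,
    "Nitrazepam": 1.0,
    "Oxazepam": 0.33,
    "Temazepam": 1.0,
    "Zolpidem": 1.0,
    "Zopiclone": 1.33,
}


def diazepam_equivalent(name: str, dose_mg: float) -> float:
    """Scale a dose in mg to its diazepam-equivalent dose in mg."""
    return dose_mg * DIAZEPAM_MULTIPLIER[name]


def total_diazepam_equivalent(administrations) -> float:
    """Sum diazepam-equivalent doses over (name, dose in mg) pairs."""
    return sum(diazepam_equivalent(name, dose) for name, dose in administrations)


# Hypothetical administrations for one patient (not study data).
administrations = [("Lorazepam", 2.0), ("Oxazepam", 10.0), ("Diazepam", 5.0)]
total = total_diazepam_equivalent(administrations)  # 2*5.0 + 10*0.33 + 5*1.0
print(round(total, 2))
```

A tranquillizer that is not listed in Table 2 raises a `KeyError` here, which makes unmapped medication names visible instead of silently ignoring them.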
Table 3. List of variables in the final dataset.
Variable | Type |
---|---|
Patient ID | Numeric |
Emergency | Binary |
First admission | Binary |
Gender | Binary |
Age at admission | Numeric |
Duration in days | Numeric |
Age at start of dossier | Numeric |
Incidents during admission | Numeric |
Incidents before admission | Numeric |
Multiple problem | Binary |
Personality disorder | Binary |
Minimum level of care demand | Numeric |
Maximum level of care demand | Numeric |
Past diazepam-equivalent dose | Numeric |
Future diazepam-equivalent dose | Numeric |
Nursing ward: Clinical Affective and Psychotic Disorders | Binary |
Nursing ward: Clinical Acute and Intensive Care | Binary |
Nursing ward: Clinical Acute and Intensive Care Youth | Binary |
Nursing ward: Clinical Diagnosis and Early Psychosis | Binary |
Diagnosis: Attention Deficit Disorder | Binary |
Diagnosis: Other issues that may be a cause for concern | Binary |
Diagnosis: Anxiety disorders | Binary |
Diagnosis: Autism spectrum disorder | Binary |
Diagnosis: Bipolar Disorders | Binary |
Diagnosis: Cognitive disorders | Binary |
Diagnosis: Depressive Disorders | Binary |
Diagnosis: Dissociative Disorders | Binary |
Diagnosis: Behavioural disorders | Binary |
Diagnosis: Substance-Related and Addiction Disorders | Binary |
Diagnosis: Obsessive Compulsive and Related Disorders | Binary |
Diagnosis: Other mental disorders | Binary |
Diagnosis: Other Infant or Childhood Disorders | Binary |
Diagnosis: Personality Disorders | Binary |
Diagnosis: Psychiatric disorders due to a general medical condition | Binary |
Diagnosis: Schizophrenia and other psychotic disorders | Binary |
Diagnosis: Somatic Symptom Disorder and Related Disorders | Binary |
Diagnosis: Trauma- and stressor-related disorders | Binary |
Diagnosis: Nutrition and Eating Disorders | Binary |
Table 4. Classification metrics for logistic regression (LR) and random forest (RF) classifiers including bias mitigation strategies reweighing (RW) and prejudice remover (PR). The classification metrics are balanced accuracy (Accbal) and F1 score. The errors shown are standard deviations.
Clf. | Mit. | Accbal | F1 |
---|---|---|---|
LR | | 0.834 ± 0.015 | 0.843 ± 0.014 |
RF | | 0.843 ± 0.018 | 0.835 ± 0.020 |
LR | RW | 0.830 ± 0.014 | 0.839 ± 0.011 |
RF | RW | 0.847 ± 0.019 | 0.840 ± 0.020 |
LR | PR | 0.793 ± 0.020 | 0.802 ± 0.029 |
Table 5. Fairness metrics for logistic regression (LR) and random forest (RF) classifiers including bias mitigation strategies reweighing (RW) and prejudice remover (PR). The fairness metrics are disparate impact (DI), average odds difference (AOD), statistical parity difference (SPD), and equal opportunity difference (EOD). The errors shown are standard deviations.
Clf. | Mit. | DI | AOD | SPD | EOD |
---|---|---|---|---|---|
LR | | 0.793 ± 0.074 | −0.046 ± 0.021 | −0.110 ± 0.038 | −0.038 ± 0.028 |
RF | | 0.796 ± 0.071 | −0.018 ± 0.017 | −0.083 ± 0.031 | −0.013 ± 0.035 |
LR | RW | 0.869 ± 0.066 | −0.003 ± 0.013 | −0.066 ± 0.035 | 0.004 ± 0.034 |
RF | RW | 0.830 ± 0.077 | −0.004 ± 0.023 | −0.070 ± 0.034 | 0.001 ± 0.043 |
LR | PR | 0.886 ± 0.056 | −0.008 ± 0.003 | −0.060 ± 0.034 | −0.020 ± 0.045 |
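The four fairness metrics in Table 5 can be computed from predictions, true labels, and group membership. The sketch below uses the commonly used definitions (ratios and differences of selection rates, true positive rates, and false positive rates between an unprivileged and a privileged group). The function names, the choice of label 1 as the favourable outcome, and the toy data are assumptions for illustration; they do not reproduce the study's code or results.

```python
def rates(y_true, y_pred):
    """Return (selection rate, true positive rate, false positive rate)."""
    n = len(y_true)
    pos = sum(y_true)
    neg = n - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (tp + fp) / n, tp / pos, fp / neg


def fairness_metrics(y_true, y_pred, privileged):
    """Compute DI, SPD, EOD and AOD as unprivileged versus privileged."""
    unpriv = [(t, p) for t, p, g in zip(y_true, y_pred, privileged) if not g]
    priv = [(t, p) for t, p, g in zip(y_true, y_pred, privileged) if g]
    sel_u, tpr_u, fpr_u = rates(*zip(*unpriv))
    sel_p, tpr_p, fpr_p = rates(*zip(*priv))
    return {
        "DI": sel_u / sel_p,    # ideal value 1
        "SPD": sel_u - sel_p,   # ideal value 0
        "EOD": tpr_u - tpr_p,   # ideal value 0
        "AOD": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),  # ideal value 0
    }


# Toy data (illustrative only): first four samples privileged, last four not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
privileged = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = fairness_metrics(y_true, y_pred, privileged)
print(metrics)
```

Under these definitions, negative SPD, EOD, and AOD values, as in most rows of Table 5, indicate that the unprivileged group receives the positive prediction less often than the privileged group.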
Table 6. Classification metric differences of models with bias mitigators reweighing (RW) and prejudice remover (PR) compared to a baseline without bias mitigation, for logistic regression (LR) and random forest (RF) classifiers. The classification metrics are balanced accuracy (Accbal) and F1 score. The errors shown are standard deviations. Differences significant at 95% confidence level are shown in bold.
Clf. | Mit. | ΔAccbal | ΔF1 |
---|---|---|---|
LR | PR | −0.040 ± 0.013 | −0.041 ± 0.025 |
LR | RW | −0.003 ± 0.013 | −0.005 ± 0.013 |
RF | RW | 0.003 ± 0.002 | 0.005 ± 0.001 |
Table 7. Fairness metric differences of models with bias mitigators reweighing (RW) and prejudice remover (PR) compared to a baseline without bias mitigation, for logistic regression (LR) and random forest (RF) classifiers. The fairness metrics are disparate impact (DI), average odds difference (AOD), statistical parity difference (SPD) and equal opportunity difference (EOD). The errors shown are standard deviations. Differences significant at 95% confidence level are shown in bold.
Clf. | Mit. | ΔDI | ΔAOD | ΔSPD | ΔEOD |
---|---|---|---|---|---|
LR | PR | 0.092 ± 0.036 | 0.038 ± 0.021 | 0.050 ± 0.019 | 0.018 ± 0.042 |
LR | RW | 0.075 ± 0.021 | 0.043 ± 0.017 | 0.043 ± 0.014 | 0.042 ± 0.034 |
RF | RW | 0.034 ± 0.013 | 0.014 ± 0.006 | 0.013 ± 0.006 | 0.014 ± 0.011 |