*`Adila` (عادلة): a feminine Arabic given name, meaning just and fair.*
*Team formation* aims to automate forming teams of experts who can successfully solve difficult tasks. While state-of-the-art neural team formation methods can efficiently analyze massive collections of experts to form effective collaborative teams, they largely ignore fairness in the recommended teams of experts. Fairness breeds innovation and increases teams' success by enabling a stronger sense of community, reducing conflict, and stimulating more creative thinking. In `Adila`, we study the application of *fairness-aware* team formation algorithms to mitigate the potential popularity bias in neural team formation models. Our experiments show that, first, neural team formation models are biased toward *popular* and *male* experts. Second, although deterministic re-ranking algorithms substantially mitigate *popularity* XOR *gender* bias, they severely hurt the efficacy of teams. On the other hand, probabilistic greedy re-ranking algorithms significantly mitigate *popularity* bias while maintaining utility. Finally, due to the extreme *gender* imbalance in the dataset, probabilistic greedy re-ranking algorithms also fail to achieve teams that are both fair and efficient with respect to gender.
We have studied the application of state-of-the-art *deterministic greedy re-ranking methods* [Geyik et al. KDD'19] in addition to *probabilistic greedy re-ranking methods* [Zehlike et al. IP&M'22] to mitigate *popularity bias* and *gender bias*, based on the *equality of opportunity* and *demographic parity* notions of fairness, for state-of-the-art neural team formation methods from `OpeNTF`. Our experiments show that:
- Neural team formation models are biased toward popular experts;
- Although deterministic re-ranking algorithms mitigate bias substantially, they severely hurt the efficacy of teams;
- Probabilistic greedy re-ranking methods are able to mitigate bias while also maintaining the utility of the teams.
Currently, we are investigating:
- Other fairness factors like demographic attributes, including age, race, and gender;
- Machine learning-based models that use learning-to-rank (L2R) techniques to mitigate bias, as opposed to deterministic greedy algorithms.
`Adila` needs `Python=3.8` and the other packages listed in `requirements.txt`.

By `pip`, clone the codebase and install the required packages:
```
git clone https://github.com/Fani-Lab/Adila
cd Adila
pip install -r requirements.txt
```
By `conda`:

```
git clone https://github.com/Fani-Lab/Adila
cd Adila
conda env create -f environment.yml
conda activate adila
```
To run `Adila`, you can use `./src/main.py`:

```
cd src
python -u main.py \
  -fteamsvecs ../data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl \
  -fsplit ../output/toy.dblp.v12.json/splits.json \
  -fpred ../output/toy.dblp.v12.json/bnn/ \
  -np_ratio 0.5 \
  -reranker det_cons \
  -output ../output/toy.dblp.v12.json/
```
Where the arguments are:

- `fteamsvecs`: the sparse matrix representation of all teams in a pickle file, including the teams whose members are predicted in `-fpred`. It should contain a dictionary of three `lil_matrix` objects with keys `id` of size `[#teams × 1]`, `skill` of size `[#teams × #skills]`, and `member` of size `[#teams × #experts]`. Simply put, each row of a matrix is the occurrence vector of skills and experts in a team. For a toy example, try:
```python
import pickle

with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teams = pickle.load(f)
```
- `fsplit`: the `splits.json` file that indicates the index (rowid) of teams whose members are predicted in `-fpred`. For a toy example, see `output/toy.dblp.v12.json/splits.json`.
- `fpred`: a file or folder that includes the prediction file(s) of a neural team formation method, stored as a `torch` tensor. The file name(s) should be `*.pred`, and the content is a `[#test × #experts]` matrix of probabilities giving each expert's membership probability for each team in the test set. For a toy example, try:
```python
import torch

torch.load('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/f0.test.pred')
```
- `np_ratio`: the desired *nonpopular* ratio among members of predicted teams after the mitigation process by re-ranking algorithms, e.g., `0.5`.
- `reranker`: the fairness-aware re-ranking algorithm from {`det_greedy`, `det_cons`, `det_relaxed`, `fa-ir`}, e.g., `det_cons`.
- `output`: the path to the re-ranked predictions of members for teams, as well as the teams' success and fairness evaluation *before* and *after* the mitigation process.
`Adila` needs preprocessed information about the teams in the form of a sparse matrix representation (`-fteamsvecs`) and neural team formation prediction file(s) (`-fpred`), obtained from `OpeNTF`; a quick sanity check on these inputs is sketched after the tree below:
```
├── data
│   └── preprocessed
│       └── dblp
│           └── toy.dblp.v12.json
│               └── teamsvecs.pkl   # sparse matrix representation of teams
├── output
│   └── toy.dblp.v12.json
│       ├── bnn
│       │   └── t31.s11.m13.l[100].lr0.1.b4096.e20.s1
│       │       ├── f0.test.pred
│       │       ├── f1.test.pred
│       │       └── f2.test.pred
│       └── splits.json             # rowids of team instances in n-fold train-valid splits, and a final test split
```
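As a quick sanity check on these two inputs (a minimal sketch using the toy files above, not part of `Adila` itself), the prediction matrix and the sparse team vectors should agree on the number of experts:

```python
import pickle
import torch

# Hedged sanity check: the #experts dimension must match between the OpeNTF artifacts.
with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teamsvecs = pickle.load(f)
pred = torch.load('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/f0.test.pred')

# teamsvecs['member'] is [#teams x #experts]; pred is [#test x #experts]
assert teamsvecs['member'].shape[1] == pred.shape[1]
print(pred.shape)
```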
`Adila` has three main steps: labeling experts with the sensitive attribute (popularity or gender), re-ranking the predicted teams, and evaluating fairness and utility before and after re-ranking.
Based on the distribution of experts over teams, which follows a power law (long tail) as shown in the figure, we label those in the *tail* as *nonpopular* and those in the *head* as *popular*.

To find the cutoff between *head* and *tail*, we calculate the average number of teams per expert over the whole dataset. As seen in the table below for `imdb`, this number is `62.45`, and the popular/nonpopular ratio is `0.426/0.574`. The result is a Boolean value in `{popular: True, nonpopular: False}` for each expert and is saved in `{output}/popularity.csv`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv`. A minimal sketch of this labeling step follows the table.
| imdb | raw | filtered |
|---|---|---|
| #movies | 507,034 | 32,059 |
| #unique casts and crews | 876,981 | 2,011 |
| #unique genres | 28 | 23 |
| average #casts and crews per team | 1.88 | 3.98 |
| average #genres per team | 1.54 | 1.76 |
| average #movie per cast and crew | 1.09 | 62.45 |
| average #genre per cast and crew | 1.59 | 10.85 |
| #team w/ single cast and crew | 322,918 | 0 |
| #team w/ single genre | 315,503 | 15,180 |
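For illustration, here is a minimal sketch of this labeling step under the `teamsvecs` layout described above; it is not `Adila`'s exact implementation, and the CSV column names (`memberidx`, `popularity`) are assumptions:

```python
import pickle
import pandas as pd

with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teamsvecs = pickle.load(f)

member = teamsvecs['member']                # lil_matrix of size [#teams x #experts]
teams_per_expert = member.sum(axis=0).A1    # number of teams each expert appears in
threshold = teams_per_expert.mean()         # cutoff: average #teams per expert (62.45 on filtered imdb)

labels = pd.DataFrame({'memberidx': range(member.shape[1]),
                       'popularity': teams_per_expert >= threshold})  # {popular: True, nonpopular: False}
labels.to_csv('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv',
              index=False)
```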
**Future:** We will consider equal area under the curve for the cutoff.
The following figures demonstrate the gender distributions in the `imdb`, `dblp`, and `uspt` datasets.
We apply rerankers from the deterministic greedy re-ranking methods [Geyik et al. KDD'19], namely {`det_greedy`, `det_cons`, `det_relaxed`}, to mitigate popularity bias. The reranker needs a cutoff `k_max`, which is set to `10` by default; a minimal sketch of the greedy re-ranking logic is given below.

The re-ranked predictions are saved in `{output}/rerank/{fpred}.{reranker}.{k_max}.rerank.pred`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f0.test.pred.det_cons.10.rerank.pred`.
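To make the greedy re-ranking idea concrete, here is a minimal, self-contained sketch in the spirit of `det_greedy` [Geyik et al. KDD'19] for a binary popular/nonpopular attribute; all names (`det_greedy_sketch`, `ranked`, `labels`, `dist`) are hypothetical and this is not `Adila`'s exact implementation (`det_cons` and `det_relaxed` differ mainly in how they pick which under-represented group to serve at each position):

```python
import math

def det_greedy_sketch(ranked, labels, dist, k_max=10):
    """Greedy fairness-aware re-ranking in the spirit of det_greedy [Geyik et al. KDD'19].
    ranked: list of (expert_id, score) sorted by score, descending
    labels: dict expert_id -> group, e.g. 'popular' / 'nonpopular'
    dist:   dict group -> desired proportion, e.g. {'nonpopular': 0.5, 'popular': 0.5}
    Returns the top-k_max expert ids after re-ranking."""
    rank = {e: i for i, (e, _) in enumerate(ranked)}                      # original positions
    queues = {g: [e for e, _ in ranked if labels[e] == g] for g in dist}  # per-group queues, best first
    counts = {g: 0 for g in dist}
    out = []
    for k in range(1, k_max + 1):
        below_min = [g for g in dist if queues[g] and counts[g] < math.floor(k * dist[g])]
        below_max = [g for g in dist if queues[g] and counts[g] < math.ceil(k * dist[g])]
        eligible = below_min or below_max or [g for g in dist if queues[g]]
        best = min(eligible, key=lambda g: rank[queues[g][0]])            # group whose next expert ranks highest
        out.append(queues[best].pop(0))
        counts[best] += 1
    return out

# toy usage: with a 50/50 popular/nonpopular target, the two groups get interleaved
ranked = [('e3', .9), ('e1', .8), ('e7', .7), ('e2', .6)]
labels = {'e3': 'popular', 'e1': 'popular', 'e7': 'nonpopular', 'e2': 'nonpopular'}
print(det_greedy_sketch(ranked, labels, {'popular': 0.5, 'nonpopular': 0.5}, k_max=4))
# ['e3', 'e7', 'e1', 'e2']
```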
We evaluate *fairness* and *utility* metrics *before* and *after* applying rerankers on team predictions to answer two research questions (RQs):

- **RQ1:** Do state-of-the-art neural team formation models produce fair teams of experts in terms of popularity bias? To this end, we measure the fairness scores of predicted teams *before* applying rerankers.
- **RQ2:** Do state-of-the-art deterministic greedy re-ranking algorithms improve the fairness of neural team formation models while maintaining their accuracy? To this end, we measure the *fairness* and *utility* metrics *before* and *after* applying rerankers.
The *fairness* metrics *before* and *after* re-ranking are stored in `{output}.{algorithm}.{k_max}.faireval.csv`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f2.test.pred.det_cons.10.faireval.csv`.

The *utility* metrics *before* and *after* re-ranking are stored in `{output}.{algorithm}.{k_max}.utileval.csv`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f1.test.pred.det_cons.10.utileval.csv`.

A minimal sketch of the `ndkl` fairness metric reported in the result tables is given below.
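For reference, `ndkl` (normalized discounted cumulative KL-divergence [Geyik et al. KDD'19]) compares the attribute distribution of every top-i prefix of a ranked list against the desired distribution, so lower is better and 0 means the desired proportions hold at every cutoff. A minimal sketch, assuming a binary popular/nonpopular attribute and log base 2 (not `Adila`'s exact implementation):

```python
import math

def ndkl_sketch(ranked_labels, desired, eps=1e-12):
    """NDKL: position-discounted KL divergence between top-i prefixes and the desired distribution.
    ranked_labels: group label of each recommended expert, in rank order
    desired:       dict group -> desired proportion, e.g. {'popular': 0.5, 'nonpopular': 0.5}"""
    counts = {g: 0 for g in desired}
    z = total = 0.0
    for i, g in enumerate(ranked_labels, start=1):
        counts[g] += 1
        top_i = {a: counts[a] / i for a in desired}              # attribute distribution of the top-i prefix
        dkl = sum(top_i[a] * math.log2((top_i[a] + eps) / (desired[a] + eps))
                  for a in desired if top_i[a] > 0)              # KL(top-i || desired)
        w = 1 / math.log2(i + 1)                                 # position discount
        total += w * dkl
        z += w
    return total / z

# an all-popular top-10 is maximally unfair w.r.t. a 50/50 target (~1.0),
# while a perfectly alternating list is close to fair (~0.23, mostly from the first position)
print(ndkl_sketch(['popular'] * 10, {'popular': 0.5, 'nonpopular': 0.5}))
print(ndkl_sketch(['popular', 'nonpopular'] * 5, {'popular': 0.5, 'nonpopular': 0.5}))
```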
**Future:** We will consider other fairness metrics.
After a successful run of all steps, `./output` contains:
```
├── output
│   └── toy.dblp.v12.json
│       ├── bnn
│       │   └── t31.s11.m13.l[100].lr0.1.b4096.e20.s1
│       │       ├── f0.test.pred
│       │       ├── f1.test.pred
│       │       ├── f2.test.pred
│       │       └── rerank/{popularity, gender}
│       │           ├── f0.test.pred.det_cons.10.faireval.csv
│       │           ├── f0.test.pred.det_cons.10.utileval.csv
│       │           ├── f0.test.pred.det_cons.10.rerank.csv
│       │           ├── f0.test.pred.det_cons.10.rerank.pred
│       │           ├── f1.test.pred.det_cons.10.faireval.csv
│       │           ├── f1.test.pred.det_cons.10.utileval.csv
│       │           ├── f1.test.pred.det_cons.10.rerank.csv
│       │           ├── f1.test.pred.det_cons.10.rerank.pred
│       │           ├── f2.test.pred.det_cons.10.faireval.csv
│       │           ├── f2.test.pred.det_cons.10.utileval.csv
│       │           ├── f2.test.pred.det_cons.10.rerank.csv
│       │           ├── f2.test.pred.det_cons.10.rerank.pred
│       │           ├── labels.csv
│       │           ├── rerank.time
│       │           └── stats.pkl
│       └── splits.json
```
Our results show that although we improve fairness significantly, our utility metric drops extensively. Part of this phenomenon is described in *Fairness in Ranking, Part I: Score-Based Ranking* [Zehlike et al. ACM Computing Surveys'22]: when we apply representation constraints on individual attributes, like race, popularity, and gender, and we want to maximize a score with respect to these constraints, the utility loss can be particularly significant for historically disadvantaged intersectional groups. The following tables contain the results of our experiments on the `bnn`, `bnn_emb`, and `random` baselines with the `greedy`, `conservative`, and `relaxed` re-ranking algorithms under the `demographic parity` notion of fairness. In each table, *before* is the metric on the original predictions, *after* is the metric after re-ranking, and Δ is the change (after - before).
| bnn (3.8 GB) | before | greedy (after) | Δ | conservative (after) | Δ | relaxed (after) | Δ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.695% | 0.126% | -0.569% | 0.091% | -0.604% | 0.146% | -0.550% |
| ndcg5 ↑ | 0.767% | 0.141% | -0.626% | 0.130% | -0.637% | 0.130% | -0.637% |
| ndcg10 ↑ | 1.058% | 0.247% | -0.811% | 0.232% | -0.826% | 0.246% | -0.812% |
| map2 ↑ | 0.248% | 0.060% | -0.188% | 0.041% | -0.207% | 0.063% | -0.185% |
| map5 ↑ | 0.381% | 0.083% | -0.298% | 0.068% | -0.313% | 0.079% | -0.302% |
| map10 ↑ | 0.467% | 0.115% | -0.352% | 0.101% | -0.366% | 0.115% | -0.352% |
| ndkl ↓ | 0.2317 | 0.0276 | -0.2041 | 0.0276 | -0.2041 | 0.0273 | -0.2043 |
| bnn_emb (3.79 GB) | before | greedy (after) | Δ | conservative (after) | Δ | relaxed (after) | Δ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.921% | 0.087% | -0.834% | 0.121% | -0.799% | 0.087% | -0.834% |
| ndcg5 ↑ | 0.927% | 0.117% | -0.810% | 0.150% | -0.777% | 0.117% | -0.810% |
| ndcg10 ↑ | 1.266% | 0.223% | -1.043% | 0.241% | -1.025% | 0.223% | -1.043% |
| map2 ↑ | 0.327% | 0.034% | -0.293% | 0.057% | -0.270% | 0.034% | -0.293% |
| map5 ↑ | 0.469% | 0.059% | -0.410% | 0.084% | -0.386% | 0.059% | -0.410% |
| map10 ↑ | 0.573% | 0.093% | -0.480% | 0.111% | -0.461% | 0.093% | -0.480% |
| ndkl ↓ | 0.2779 | 0.0244 | -0.2535 | 0.0244 | -0.2535 | 0.0241 | -0.2539 |
| random (2.41 GB) | before | greedy (after) | Δ | conservative (after) | Δ | relaxed (after) | Δ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.1711% | 0.136% | -0.035% | 0.205% | 0.034% | 0.205% | 0.034% |
| ndcg5 ↑ | 0.1809% | 0.170% | -0.011% | 0.190% | 0.009% | 0.190% | 0.009% |
| ndcg10 ↑ | 0.3086% | 0.258% | -0.051% | 0.283% | -0.026% | 0.283% | -0.026% |
| map2 ↑ | 0.0617% | 0.059% | -0.003% | 0.089% | 0.028% | 0.089% | 0.028% |
| map5 ↑ | 0.0889% | 0.095% | 0.006% | 0.110% | 0.021% | 0.110% | 0.021% |
| map10 ↑ | 0.1244% | 0.121% | -0.003% | 0.140% | 0.016% | 0.140% | 0.016% |
| ndkl ↓ | 0.0072 | 0.0369 | 0.0296 | 0.0366 | 0.0293 | 0.0366 | 0.0294 |
The files containing the rest of our experiment results, with various fairness notions, datasets, and algorithms, are as follows:
We benefit from `pytrec`, `reranking`, and other libraries. We would like to thank the authors of these libraries and helpful resources.
©2024. This work is licensed under a CC BY-NC-SA 4.0 license.
```
@inproceedings{DBLP:conf/bias/LoghmaniF23,
  author    = {Hamed Loghmani and Hossein Fani},
  title     = {Bootless Application of Greedy Re-ranking Algorithms in Fair Neural Team Formation},
  booktitle = {Advances in Bias and Fairness in Information Retrieval - Fourth International Workshop, {BIAS} 2023, Dublin, Ireland, April 2, 2023, Revised Selected Papers},
  pages     = {108--118},
  publisher = {Springer Nature Switzerland},
  year      = {2023},
  url       = {https://doi.org/10.1007/978-3-031-37249-0_9},
  doi       = {10.1007/978-3-031-37249-0_9},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```