Adila: Fairness-Aware Team Formation


* Adila (عادلة): a feminine Arabic given name, meaning just and fair.

Team formation aims to automate forming teams of experts who can successfully solve difficult tasks. While state-of-the-art neural team formation methods can efficiently analyze massive collections of experts to form effective collaborative teams, they largely ignore fairness in the recommended teams of experts. Fairness breeds innovation and increases teams' success by enabling a stronger sense of community, reducing conflict, and stimulating more creative thinking. In Adila, we study the application of fairness-aware team formation algorithms to mitigate the potential popularity bias in neural team formation models. Our experiments show that, first, neural team formation models are biased toward popular and male experts. Second, although deterministic re-ranking algorithms substantially mitigate popularity or gender bias, they severely hurt the efficacy of teams. On the other hand, probabilistic greedy re-ranking algorithms mitigate popularity bias significantly while maintaining utility. Finally, due to the extreme gender bias in the dataset, probabilistic greedy re-ranking algorithms still fail to achieve teams that are both fair and effective with respect to gender.

We have studied the application of state-of-the-art deterministic greedy re-ranking methods [Geyik et al., KDD'19] as well as probabilistic greedy re-ranking methods [Zehlike et al., IP&M'22] to mitigate popularity bias and gender bias, based on the equality of opportunity and demographic parity notions of fairness, for state-of-the-art neural team formation methods from OpeNTF. Our experiments show that:

  • Neural team formation models are biased toward popular experts;
  • Although deterministic re-ranking algorithms mitigate bias substantially, they severely hurt the efficacy of teams;
  • Probabilistic greedy re-ranking methods are able to mitigate bias while maintaining the utility of the teams.

Currently, we are investigating:

  • Other fairness factors based on demographic attributes, including age, race, and gender;
  • Machine learning-based models using learning-to-rank (L2R) techniques to mitigate bias, as opposed to deterministic greedy algorithms.

1. Setup

Adila needs Python 3.8 and the other packages listed in requirements.txt:

By pip, clone the codebase and install the required packages:

git clone https://github.com/Fani-Lab/Adila
cd Adila
pip install -r requirements.txt

By conda:

git clone https://github.com/Fani-Lab/Adila
cd Adila
conda env create -f environment.yml
conda activate adila

2. Quickstart

To run Adila, you can use ./src/main.py:

cd src
python -u main.py \
  -fteamsvecs ../data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl \
  -fsplit ../output/toy.dblp.v12.json/splits.json \
  -fpred ../output/toy.dblp.v12.json/bnn/ \
  -np_ratio 0.5 \
  -reranker det_cons \
  -output ../output/toy.dblp.v12.json/

where the arguments are:

fteamsvecs: the sparse matrix representation of all teams in a pickle file, including the teams whose members are predicted in -fpred. It should contain a dictionary of three lil_matrix objects with keys [id] of size [#teams × 1], [skill] of size [#teams × #skills], and [member] of size [#teams × #experts]. Simply, each row of a matrix is the occurrence vector of skills or experts in a team. For a toy example, try

import pickle

# load the sparse matrix representation of the toy dataset's teams
with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teams = pickle.load(f)  # dict with 'id', 'skill', and 'member' lil_matrix entries

fsplit: the splits.json file that indicates the indexes (rowids) of teams whose members are predicted in -fpred. For a toy example, see output/toy.dblp.v12.json/splits.json.

fpred: a file or folder that includes the prediction file(s) of a neural team formation method, saved as torch tensors. The file name(s) should match *.pred, and the content is a [#test × #experts] matrix of probabilities giving the membership probability of each expert in each test team. For a toy example, try

import torch

# [#test × #experts] membership probabilities for one fold of the toy dataset
pred = torch.load('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/f0.test.pred')

np_ratio: the desired nonpopular ratio among the members of predicted teams after the mitigation process by re-ranking algorithms, e.g., 0.5.

reranker: the fairness-aware re-ranking algorithm from {det_greedy, det_cons, det_relaxed, fa-ir}, e.g., det_cons.

output: the path to the re-ranked predictions of members for teams, as well as the teams' success and fairness evaluations before and after the mitigation process.

3. Pipeline

Adila needs preprocessed information about the teams in the form of sparse matrix representations (-fteamsvecs) and neural team formation prediction file(s) (-fpred), obtained from OpeNTF:

├── data
│   └── preprocessed
│       └── dblp
│           └── toy.dblp.v12.json
│               └── teamsvecs.pkl   # sparse matrix representation of teams
├── output
    └── toy.dblp.v12.json
        ├── bnn
        │   └── t31.s11.m13.l[100].lr0.1.b4096.e20.s1
        │       ├── f0.test.pred
        │       ├── f1.test.pred
        │       ├── f2.test.pred
        └── splits.json             # rowids of team instances in n-fold train-valid splits, and a final test split

Adila has three main steps:

3.1. Labeling

Based on the distribution of experts over teams, which follows a power law (long tail) as shown in the figure, we label the experts in the tail as nonpopular and those in the head as popular.

To find the cutoff between head and tail, we calculate the average number of teams per expert over the whole dataset. As seen in the table, this number is 62.45 and the popular/nonpopular ratio is 0.426/0.574. The result is a Boolean value in {popular: True, nonpopular: False} for each expert and is saved in {output}/popularity.csv, like ./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv

| imdb | raw | filtered |
|---|---|---|
| #movies | 507,034 | 32,059 |
| #unique casts and crews | 876,981 | 2,011 |
| #unique genres | 28 | 23 |
| average #casts and crews per team | 1.88 | 3.98 |
| average #genres per team | 1.54 | 1.76 |
| average #movie per cast and crew | 1.09 | 62.45 |
| average #genre per cast and crew | 1.59 | 10.85 |
| #team w/ single cast and crew | 322,918 | 0 |
| #team w/ single genre | 315,503 | 15,180 |

Future: We will consider equal area under the curve for the cutoff.
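
As an illustration of this labeling step, here is a minimal Python sketch assuming the teamsvecs.pkl layout described in Section 2; the variable names, output path, and column names are illustrative only, not Adila's exact implementation.

import pickle
import pandas as pd

# load the sparse matrix representation of teams (see Section 2, -fteamsvecs)
with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teamsvecs = pickle.load(f)

member = teamsvecs['member']              # lil_matrix of size [#teams × #experts]
teams_per_expert = member.sum(axis=0).A1  # number of teams each expert appears in
cutoff = teams_per_expert.mean()          # average #teams per expert, used as the head/tail cutoff

# True: popular (head), False: nonpopular (tail); names are hypothetical
labels = pd.DataFrame({'memberidx': range(member.shape[1]),
                       'popularity': teams_per_expert > cutoff})
labels.to_csv('./popularity.csv', index=False)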

3.2. Gender

The following figures show the gender distributions in the imdb, dblp, and uspt datasets.

3.3. Reranking

We apply re-rankers from the deterministic greedy re-ranking methods [Geyik et al., KDD'19], including {'det_greedy', 'det_cons', 'det_relaxed'}, to mitigate popularity bias. The re-ranker needs a cutoff k_max, which is set to 10 by default (see the sketch after the next paragraph).

The re-ranked predictions are saved in {output}/rerank/{fpred}.{reranker}.{k_max}.rerank.pred, like ./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f0.test.pred.det_cons.10.rerank.pred.
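
For intuition, the following is a minimal standalone sketch of the deterministic greedy (det_greedy) re-ranking idea [Geyik et al., KDD'19] for a single team prediction with a binary popular/nonpopular attribute. Adila relies on the reranking library for the actual algorithms (see Acknowledgement), so the function below and its signature are illustrative only.

import math

def det_greedy(scores, is_popular, np_ratio=0.5, k_max=10):
    """Greedily build a top-k_max list whose every prefix respects the desired
    nonpopular/popular ratio as closely as possible, preferring higher scores.
    Illustrative sketch, not Adila's implementation."""
    desired = {'nonpopular': np_ratio, 'popular': 1.0 - np_ratio}
    # per-attribute candidate pools, best score first
    pools = {
        'popular':    sorted([i for i, p in enumerate(is_popular) if p],     key=lambda i: -scores[i]),
        'nonpopular': sorted([i for i, p in enumerate(is_popular) if not p], key=lambda i: -scores[i]),
    }
    counts = {'popular': 0, 'nonpopular': 0}
    reranked = []
    for k in range(1, k_max + 1):
        # attributes whose minimum (floor) or maximum (ceil) representation is not yet met
        below_min = [a for a in pools if pools[a] and counts[a] < math.floor(desired[a] * k)]
        below_max = [a for a in pools if pools[a] and counts[a] < math.ceil(desired[a] * k)]
        candidates = below_min or below_max or [a for a in pools if pools[a]]
        best = max(candidates, key=lambda a: scores[pools[a][0]])  # highest-scored next expert
        reranked.append(pools[best].pop(0))
        counts[best] += 1
    return reranked

# e.g., 12 experts with membership probabilities and popularity labels
scores = [0.9, 0.8, 0.7, 0.65, 0.6, 0.5, 0.45, 0.4, 0.3, 0.2, 0.1, 0.05]
is_popular = [True, True, True, False, True, True, False, True, False, False, True, False]
print(det_greedy(scores, is_popular, np_ratio=0.5, k_max=10))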

3.4. Evaluations

We evaluate fairness and utility metrics before and after applying the re-rankers on team predictions to answer two research questions (RQs):

RQ1: Do state-of-the-art neural team formation models produce fair teams of experts in terms of popularity bias? To this end, we measure the fairness scores of predicted teams before applying the re-rankers.

RQ2: Do state-of-the-art deterministic greedy re-ranking algorithms improve the fairness of neural team formation models while maintaining their accuracy? To this end, we measure the fairness and utility metrics before and after applying the re-rankers.

The results of the fairness metrics before and after are stored in {output}.{algorithm}.{k_max}.faireval.csv, like ./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f2.test.pred.det_cons.10.faireval.csv.

The results of the utility metrics before and after are stored in {output}.{algorithm}.{k_max}.utileval.csv, like ./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f1.test.pred.det_cons.10.utileval.csv.
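
As a reference for the fairness numbers reported in Section 4, here is a minimal sketch of the ndkl metric (normalized discounted KL-divergence) of Geyik et al. for a binary popular/nonpopular attribute; it illustrates the formula only and is not Adila's evaluation code.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def ndkl(ranked_is_popular, desired_nonpopular_ratio=0.5):
    """Position-discounted average of the KL-divergence between the popular/nonpopular
    distribution of each top-i prefix and the desired distribution; lower is fairer
    (0 means every prefix matches the desired distribution)."""
    desired = [1.0 - desired_nonpopular_ratio, desired_nonpopular_ratio]  # [popular, nonpopular]
    n = len(ranked_is_popular)
    z = sum(1.0 / np.log2(i + 1) for i in range(1, n + 1))
    total = 0.0
    for i in range(1, n + 1):
        popular_ratio = sum(ranked_is_popular[:i]) / i
        total += kl_divergence([popular_ratio, 1.0 - popular_ratio], desired) / np.log2(i + 1)
    return total / z

# e.g., a re-ranked top-10 list of experts labeled popular (True) / nonpopular (False)
print(ndkl([True, False, True, False, True, False, True, False, True, False]))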

Future: We will consider other fairness metrics.

After a successful run of all steps, ./output contains:

├── output
    └── toy.dblp.v12.json
        ├── bnn
        │   └── t31.s11.m13.l[100].lr0.1.b4096.e20.s1
        │       ├── f0.test.pred
        │       ├── f1.test.pred
        │       ├── f2.test.pred
        │       └── rerank/{popularity, gender}
        │           ├── f0.test.pred.det_cons.10.faireval.csv
        │           ├── f0.test.pred.det_cons.10.utileval.csv
        │           ├── f0.test.pred.det_cons.10.rerank.csv
        │           ├── f0.test.pred.det_cons.10.rerank.pred
        │           ├── f1.test.pred.det_cons.10.faireval.csv
        │           ├── f1.test.pred.det_cons.10.utileval.csv
        │           ├── f1.test.pred.det_cons.10.rerank.csv
        │           ├── f1.test.pred.det_cons.10.rerank.pred
        │           ├── f2.test.pred.det_cons.10.faireval.csv
        │           ├── f2.test.pred.det_cons.10.utileval.csv
        │           ├── f2.test.pred.det_cons.10.rerank.csv
        │           ├── f2.test.pred.det_cons.10.rerank.pred
        │           ├── labels.csv
        │           ├── rerank.time
        │           └── stats.pkl
        └── splits.json

4. Result

Our results show that although we improve fairness significantly, our utility metrics drop extensively. Part of this phenomenon is described in Fairness in Ranking, Part I: Score-Based Ranking [Zehlike et al., ACM Computing Surveys'22]: when we apply representation constraints on individual attributes, like race, popularity, or gender, and want to maximize a score subject to these constraints, the utility loss can be particularly significant for historically disadvantaged intersectional groups. The following tables contain the results of our experiments on the bnn, bnn_emb, and random baselines with the greedy, conservative, and relaxed re-ranking algorithms under the demographic parity fairness notion.

bnn (3.8 GB)

| metric | before | greedy after | greedy $\Delta$ | conservative after | conservative $\Delta$ | relaxed after | relaxed $\Delta$ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.695% | 0.126% | -0.569% | 0.091% | -0.604% | 0.146% | -0.550% |
| ndcg5 ↑ | 0.767% | 0.141% | -0.626% | 0.130% | -0.637% | 0.130% | -0.637% |
| ndcg10 ↑ | 1.058% | 0.247% | -0.811% | 0.232% | -0.826% | 0.246% | -0.812% |
| map2 ↑ | 0.248% | 0.060% | -0.188% | 0.041% | -0.207% | 0.063% | -0.185% |
| map5 ↑ | 0.381% | 0.083% | -0.298% | 0.068% | -0.313% | 0.079% | -0.302% |
| map10 ↑ | 0.467% | 0.115% | -0.352% | 0.101% | -0.366% | 0.115% | -0.352% |
| ndkl ↓ | 0.2317 | 0.0276 | -0.2041 | 0.0276 | -0.2041 | 0.0273 | -0.2043 |

bnn_emb (3.79 GB)

| metric | before | greedy after | greedy $\Delta$ | conservative after | conservative $\Delta$ | relaxed after | relaxed $\Delta$ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.921% | 0.087% | -0.834% | 0.121% | -0.799% | 0.087% | -0.834% |
| ndcg5 ↑ | 0.927% | 0.117% | -0.810% | 0.150% | -0.777% | 0.117% | -0.810% |
| ndcg10 ↑ | 1.266% | 0.223% | -1.043% | 0.241% | -1.025% | 0.223% | -1.043% |
| map2 ↑ | 0.327% | 0.034% | -0.293% | 0.057% | -0.270% | 0.034% | -0.293% |
| map5 ↑ | 0.469% | 0.059% | -0.410% | 0.084% | -0.386% | 0.059% | -0.410% |
| map10 ↑ | 0.573% | 0.093% | -0.480% | 0.111% | -0.461% | 0.093% | -0.480% |
| ndkl ↓ | 0.2779 | 0.0244 | -0.2535 | 0.0244 | -0.2535 | 0.0241 | -0.2539 |

random (2.41 GB)

| metric | before | greedy after | greedy $\Delta$ | conservative after | conservative $\Delta$ | relaxed after | relaxed $\Delta$ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.1711% | 0.136% | -0.035% | 0.205% | 0.034% | 0.205% | 0.034% |
| ndcg5 ↑ | 0.1809% | 0.170% | -0.011% | 0.190% | 0.009% | 0.190% | 0.009% |
| ndcg10 ↑ | 0.3086% | 0.258% | -0.051% | 0.283% | -0.026% | 0.283% | -0.026% |
| map2 ↑ | 0.0617% | 0.059% | -0.003% | 0.089% | 0.028% | 0.089% | 0.028% |
| map5 ↑ | 0.0889% | 0.095% | 0.006% | 0.110% | 0.021% | 0.110% | 0.021% |
| map10 ↑ | 0.1244% | 0.121% | -0.003% | 0.140% | 0.016% | 0.140% | 0.016% |
| ndkl ↓ | 0.0072 | 0.0369 | 0.0296 | 0.0366 | 0.0293 | 0.0366 | 0.0294 |

The files containing the rest of our experimental results with various notions, datasets, and algorithms are as follows:

| # | file |
|---|---|
| 1 | Demographic Parity.Popularity.Conservative.DBLP |
| 2 | Demographic Parity.Popularity.Greedy.DBLP |
| 3 | Demographic Parity.Popularity.Relaxed.DBLP |
| 4 | Equality of Opportunity.Popularity.Greedy.IMDB |
| 5 | Equality of Opportunity.Popularity.Conservative.IMDB |
| 6 | Equality of Opportunity.Popularity.Relaxed.IMDB |
| 7 | Equality of Opportunity.Popularity.Greedy.DBLP |
| 8 | Equality of Opportunity.Popularity.Relaxed.DBLP |
| 9 | Equality of Opportunity.Popularity.Conservative.DBLP |

5. Acknowledgement

We benefit from pytrec, reranking, and other libraries. We would like to thank the authors of these libraries and helpful resources.

6. License

©2024. This work is licensed under a CC BY-NC-SA 4.0 license.

7. Citation

@inproceedings{DBLP:conf/bias/LoghmaniF23,
  author    = {Hamed Loghmani and Hossein Fani},
  title     = {Bootless Application of Greedy Re-ranking Algorithms in Fair Neural Team Formation},
  booktitle = {Advances in Bias and Fairness in Information Retrieval - Fourth International Workshop, {BIAS} 2023, Dublin, Ireland, April 2, 2023, Revised Selected Papers},
  pages     = {108--118},
  publisher = {Springer Nature Switzerland},
  year      = {2023},
  url       = {https://doi.org/10.1007/978-3-031-37249-0_9},
  doi       = {10.1007/978-3-031-37249-0_9},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
