*`Adila` (عادلة): a feminine Arabic given name, meaning just and fair.*
*Team formation* aims to automate forming teams of experts who can successfully solve difficult tasks. While state-of-the-art neural team formation methods can efficiently analyze massive collections of experts to form effective collaborative teams, they largely ignore fairness in the recommended teams of experts. Fairness breeds innovation and increases teams' success by enabling a stronger sense of community, reducing conflict, and stimulating more creative thinking. In `Adila`, we study the application of *fairness-aware* team formation algorithms to mitigate the potential popularity bias in neural team formation models. Our experiments show that, first, neural team formation models are biased toward *popular* and *male* experts. Second, although deterministic re-ranking algorithms substantially mitigate *popularity* XOR *gender* bias, they severely hurt the efficacy of teams. On the other hand, probabilistic greedy re-ranking algorithms significantly mitigate *popularity* bias while maintaining utility. Finally, due to the extreme *gender* imbalance in the dataset, probabilistic greedy re-ranking algorithms also fail to achieve teams that are both fair and efficient with respect to gender.
We have studied the application of state-of-the-art *deterministic greedy re-ranking methods* [Geyik et al. KDD'19] in addition to *probabilistic greedy re-ranking methods* [Zehlike et al. IP&M'22] to mitigate *popularity bias* and *gender bias*, based on the *equality of opportunity* and *demographic parity* notions of fairness, for state-of-the-art neural team formation methods from `OpeNTF`. Our experiments show that:
- Neural team formation models are biased toward popular experts;
- Although deterministic re-ranking algorithms mitigate bias substantially, they severely hurt the efficacy of teams;
- Probabilistic greedy re-ranking methods are able to mitigate bias while also maintaining the utility of the teams.
Currently, we are investigating:
- Other fairness factors like demographic attributes, including age, race, and gender;
- Machine learning-based models that use learning-to-rank (L2R) techniques to mitigate bias, as opposed to deterministic greedy algorithms.
`Adila` needs `Python=3.8` and the other packages listed in `requirements.txt`.

By `pip`, clone the codebase and install the required packages:
```
git clone https://github.com/Fani-Lab/Adila
cd Adila
pip install -r requirements.txt
```
By `conda`:

```
git clone https://github.com/Fani-Lab/Adila
cd Adila
conda env create -f environment.yml
conda activate adila
```
To run `Adila`, you can use `./src/main.py`:

```
cd src
python -u main.py \
  -fteamsvecs ../data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl \
  -fsplit ../output/toy.dblp.v12.json/splits.json \
  -fpred ../output/toy.dblp.v12.json/bnn/ \
  -np_ratio 0.5 \
  -reranker det_cons \
  -output ../output/toy.dblp.v12.json/
```
Where the arguments are:

- `fteamsvecs`: the sparse matrix representation of all teams in a pickle file, including the teams whose members are predicted in `-fpred`. It should contain a dictionary of three `lil_matrix` objects with keys `id` of size `[#teams × 1]`, `skill` of size `[#teams × #skills]`, and `member` of size `[#teams × #experts]`. Simply put, each row of a matrix is the occurrence vector of skills and experts in a team. For a toy example, try:
```python
import pickle

with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teams = pickle.load(f)
```
- `fsplit`: the `splits.json` file that indicates the index (rowid) of teams whose members are predicted in `-fpred`. For a toy example, see `output/toy.dblp.v12.json/splits.json`.
- `fpred`: a file or folder that includes the prediction file(s) of a neural team formation method, stored as a `torch` tensor. The file name(s) should be `*.pred`, and the content is a `[#test × #experts]` matrix of probabilities giving each expert's membership probability for each team in the test set. For a toy example, try:
```python
import torch

torch.load('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/f0.test.pred')
```
- `np_ratio`: the desired *nonpopular* ratio among members of predicted teams after the mitigation process by re-ranking algorithms, e.g., `0.5`.
- `reranker`: the fairness-aware re-ranking algorithm from {`det_greedy`, `det_cons`, `det_relaxed`, `fa-ir`}, e.g., `det_cons`.
- `output`: the path to the re-ranked predictions of members for teams, as well as the teams' success and fairness evaluation *before* and *after* the mitigation process.
`Adila` needs preprocessed information about the teams in the form of a sparse matrix representation (`-fteamsvecs`) and neural team formation prediction file(s) (`-fpred`), obtained from `OpeNTF`; a quick sanity check on these inputs is sketched after the tree below:
```
├── data
│   └── preprocessed
│       └── dblp
│           └── toy.dblp.v12.json
│               └── teamsvecs.pkl   # sparse matrix representation of teams
├── output
│   └── toy.dblp.v12.json
│       ├── bnn
│       │   └── t31.s11.m13.l[100].lr0.1.b4096.e20.s1
│       │       ├── f0.test.pred
│       │       ├── f1.test.pred
│       │       └── f2.test.pred
│       └── splits.json             # rowids of team instances in n-fold train-valid splits, and a final test split
```
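As a quick sanity check on these two inputs (a minimal sketch using the toy files above, not part of `Adila` itself), the prediction matrix and the sparse team vectors should agree on the number of experts:

```python
import pickle
import torch

# Hedged sanity check: the #experts dimension must match between the OpeNTF artifacts.
with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teamsvecs = pickle.load(f)
pred = torch.load('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/f0.test.pred')

# teamsvecs['member'] is [#teams x #experts]; pred is [#test x #experts]
assert teamsvecs['member'].shape[1] == pred.shape[1]
print(pred.shape)
```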
`Adila` has three main steps: labeling experts with the sensitive attribute (popularity or gender), re-ranking the predicted teams, and evaluating fairness and utility before and after re-ranking.
Based on the distribution of experts over teams, which follows a power law (long tail) as shown in the figure, we label those in the *tail* as *nonpopular* and those in the *head* as *popular*.

To find the cutoff between *head* and *tail*, we calculate the average number of teams per expert over the whole dataset. As seen in the table below for `imdb`, this number is `62.45`, and the popular/nonpopular ratio is `0.426/0.574`. The result is a Boolean value in `{popular: True, nonpopular: False}` for each expert and is saved in `{output}/popularity.csv`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv`. A minimal sketch of this labeling step follows the table.
| imdb | raw | filtered |
|---|---|---|
| #movies | 507,034 | 32,059 |
| #unique casts and crews | 876,981 | 2,011 |
| #unique genres | 28 | 23 |
| average #casts and crews per team | 1.88 | 3.98 |
| average #genres per team | 1.54 | 1.76 |
| average #movie per cast and crew | 1.09 | 62.45 |
| average #genre per cast and crew | 1.59 | 10.85 |
| #team w/ single cast and crew | 322,918 | 0 |
| #team w/ single genre | 315,503 | 15,180 |
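For illustration, here is a minimal sketch of this labeling step under the `teamsvecs` layout described above; it is not `Adila`'s exact implementation, and the CSV column names (`memberidx`, `popularity`) are assumptions:

```python
import pickle
import pandas as pd

with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teamsvecs = pickle.load(f)

member = teamsvecs['member']                # lil_matrix of size [#teams x #experts]
teams_per_expert = member.sum(axis=0).A1    # number of teams each expert appears in
threshold = teams_per_expert.mean()         # cutoff: average #teams per expert (62.45 on filtered imdb)

labels = pd.DataFrame({'memberidx': range(member.shape[1]),
                       'popularity': teams_per_expert >= threshold})  # {popular: True, nonpopular: False}
labels.to_csv('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv',
              index=False)
```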
**Future:** We will consider equal area under the curve for the cutoff.
The following figures demonstrate the gender distributions in the `imdb`, `dblp`, and `uspt` datasets.
We apply rerankers from the deterministic greedy re-ranking methods [Geyik et al. KDD'19], namely {`det_greedy`, `det_cons`, `det_relaxed`}, to mitigate popularity bias. The reranker needs a cutoff `k_max`, which is set to `10` by default; a minimal sketch of the greedy re-ranking logic is given below.

The re-ranked predictions are saved in `{output}/rerank/{fpred}.{reranker}.{k_max}.rerank.pred`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f0.test.pred.det_cons.10.rerank.pred`.
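To make the greedy re-ranking idea concrete, here is a minimal, self-contained sketch in the spirit of `det_greedy` [Geyik et al. KDD'19] for a binary popular/nonpopular attribute; all names (`det_greedy_sketch`, `ranked`, `labels`, `dist`) are hypothetical and this is not `Adila`'s exact implementation (`det_cons` and `det_relaxed` differ mainly in how they pick which under-represented group to serve at each position):

```python
import math

def det_greedy_sketch(ranked, labels, dist, k_max=10):
    """Greedy fairness-aware re-ranking in the spirit of det_greedy [Geyik et al. KDD'19].
    ranked: list of (expert_id, score) sorted by score, descending
    labels: dict expert_id -> group, e.g. 'popular' / 'nonpopular'
    dist:   dict group -> desired proportion, e.g. {'nonpopular': 0.5, 'popular': 0.5}
    Returns the top-k_max expert ids after re-ranking."""
    rank = {e: i for i, (e, _) in enumerate(ranked)}                      # original positions
    queues = {g: [e for e, _ in ranked if labels[e] == g] for g in dist}  # per-group queues, best first
    counts = {g: 0 for g in dist}
    out = []
    for k in range(1, k_max + 1):
        below_min = [g for g in dist if queues[g] and counts[g] < math.floor(k * dist[g])]
        below_max = [g for g in dist if queues[g] and counts[g] < math.ceil(k * dist[g])]
        eligible = below_min or below_max or [g for g in dist if queues[g]]
        best = min(eligible, key=lambda g: rank[queues[g][0]])            # group whose next expert ranks highest
        out.append(queues[best].pop(0))
        counts[best] += 1
    return out

# toy usage: with a 50/50 popular/nonpopular target, the two groups get interleaved
ranked = [('e3', .9), ('e1', .8), ('e7', .7), ('e2', .6)]
labels = {'e3': 'popular', 'e1': 'popular', 'e7': 'nonpopular', 'e2': 'nonpopular'}
print(det_greedy_sketch(ranked, labels, {'popular': 0.5, 'nonpopular': 0.5}, k_max=4))
# ['e3', 'e7', 'e1', 'e2']
```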
We evaluate *fairness* and *utility* metrics *before* and *after* applying rerankers on team predictions to answer two research questions (RQs):

- **RQ1:** Do state-of-the-art neural team formation models produce fair teams of experts in terms of popularity bias? To this end, we measure the fairness scores of predicted teams *before* applying rerankers.
- **RQ2:** Do state-of-the-art deterministic greedy re-ranking algorithms improve the fairness of neural team formation models while maintaining their accuracy? To this end, we measure the *fairness* and *utility* metrics *before* and *after* applying rerankers.
The *fairness* metrics *before* and *after* re-ranking are stored in `{output}.{algorithm}.{k_max}.faireval.csv`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f2.test.pred.det_cons.10.faireval.csv`.

The *utility* metrics *before* and *after* re-ranking are stored in `{output}.{algorithm}.{k_max}.utileval.csv`, like `./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/f1.test.pred.det_cons.10.utileval.csv`.

A minimal sketch of the `ndkl` fairness metric reported in the result tables is given below.
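For reference, `ndkl` (normalized discounted cumulative KL-divergence [Geyik et al. KDD'19]) compares the attribute distribution of every top-i prefix of a ranked list against the desired distribution, so lower is better and 0 means the desired proportions hold at every cutoff. A minimal sketch, assuming a binary popular/nonpopular attribute and log base 2 (not `Adila`'s exact implementation):

```python
import math

def ndkl_sketch(ranked_labels, desired, eps=1e-12):
    """NDKL: position-discounted KL divergence between top-i prefixes and the desired distribution.
    ranked_labels: group label of each recommended expert, in rank order
    desired:       dict group -> desired proportion, e.g. {'popular': 0.5, 'nonpopular': 0.5}"""
    counts = {g: 0 for g in desired}
    z = total = 0.0
    for i, g in enumerate(ranked_labels, start=1):
        counts[g] += 1
        top_i = {a: counts[a] / i for a in desired}              # attribute distribution of the top-i prefix
        dkl = sum(top_i[a] * math.log2((top_i[a] + eps) / (desired[a] + eps))
                  for a in desired if top_i[a] > 0)              # KL(top-i || desired)
        w = 1 / math.log2(i + 1)                                 # position discount
        total += w * dkl
        z += w
    return total / z

# an all-popular top-10 is maximally unfair w.r.t. a 50/50 target (~1.0),
# while a perfectly alternating list is close to fair (~0.23, mostly from the first position)
print(ndkl_sketch(['popular'] * 10, {'popular': 0.5, 'nonpopular': 0.5}))
print(ndkl_sketch(['popular', 'nonpopular'] * 5, {'popular': 0.5, 'nonpopular': 0.5}))
```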
**Future:** We will consider other fairness metrics.
After a successful run of all steps, `./output` contains:
```
├── output
│   └── toy.dblp.v12.json
│       ├── bnn
│       │   └── t31.s11.m13.l[100].lr0.1.b4096.e20.s1
│       │       ├── f0.test.pred
│       │       ├── f1.test.pred
│       │       ├── f2.test.pred
│       │       └── rerank/{popularity, gender}
│       │           ├── f0.test.pred.det_cons.10.faireval.csv
│       │           ├── f0.test.pred.det_cons.10.utileval.csv
│       │           ├── f0.test.pred.det_cons.10.rerank.csv
│       │           ├── f0.test.pred.det_cons.10.rerank.pred
│       │           ├── f1.test.pred.det_cons.10.faireval.csv
│       │           ├── f1.test.pred.det_cons.10.utileval.csv
│       │           ├── f1.test.pred.det_cons.10.rerank.csv
│       │           ├── f1.test.pred.det_cons.10.rerank.pred
│       │           ├── f2.test.pred.det_cons.10.faireval.csv
│       │           ├── f2.test.pred.det_cons.10.utileval.csv
│       │           ├── f2.test.pred.det_cons.10.rerank.csv
│       │           ├── f2.test.pred.det_cons.10.rerank.pred
│       │           ├── labels.csv
│       │           ├── rerank.time
│       │           └── stats.pkl
│       └── splits.json
```
Our results show that although we improve fairness significantly, our utility metric drops extensively. Part of this phenomenon is described in *Fairness in Ranking, Part I: Score-Based Ranking* [Zehlike et al. ACM Computing Surveys'22]: when we apply representation constraints on individual attributes, like race, popularity, and gender, and we want to maximize a score with respect to these constraints, the utility loss can be particularly significant for historically disadvantaged intersectional groups. The following tables contain the results of our experiments on the `bnn`, `bnn_emb`, and `random` baselines with the `greedy`, `conservative`, and `relaxed` re-ranking algorithms under the `demographic parity` notion of fairness. In each table, *before* is the metric on the original predictions, *after* is the metric after re-ranking, and Δ is the change (after - before).
| bnn (3.8 GB) | before | greedy (after) | Δ | conservative (after) | Δ | relaxed (after) | Δ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.695% | 0.126% | -0.569% | 0.091% | -0.604% | 0.146% | -0.550% |
| ndcg5 ↑ | 0.767% | 0.141% | -0.626% | 0.130% | -0.637% | 0.130% | -0.637% |
| ndcg10 ↑ | 1.058% | 0.247% | -0.811% | 0.232% | -0.826% | 0.246% | -0.812% |
| map2 ↑ | 0.248% | 0.060% | -0.188% | 0.041% | -0.207% | 0.063% | -0.185% |
| map5 ↑ | 0.381% | 0.083% | -0.298% | 0.068% | -0.313% | 0.079% | -0.302% |
| map10 ↑ | 0.467% | 0.115% | -0.352% | 0.101% | -0.366% | 0.115% | -0.352% |
| ndkl ↓ | 0.2317 | 0.0276 | -0.2041 | 0.0276 | -0.2041 | 0.0273 | -0.2043 |
| bnn_emb (3.79 GB) | before | greedy (after) | Δ | conservative (after) | Δ | relaxed (after) | Δ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.921% | 0.087% | -0.834% | 0.121% | -0.799% | 0.087% | -0.834% |
| ndcg5 ↑ | 0.927% | 0.117% | -0.810% | 0.150% | -0.777% | 0.117% | -0.810% |
| ndcg10 ↑ | 1.266% | 0.223% | -1.043% | 0.241% | -1.025% | 0.223% | -1.043% |
| map2 ↑ | 0.327% | 0.034% | -0.293% | 0.057% | -0.270% | 0.034% | -0.293% |
| map5 ↑ | 0.469% | 0.059% | -0.410% | 0.084% | -0.386% | 0.059% | -0.410% |
| map10 ↑ | 0.573% | 0.093% | -0.480% | 0.111% | -0.461% | 0.093% | -0.480% |
| ndkl ↓ | 0.2779 | 0.0244 | -0.2535 | 0.0244 | -0.2535 | 0.0241 | -0.2539 |
| random (2.41 GB) | before | greedy (after) | Δ | conservative (after) | Δ | relaxed (after) | Δ |
|---|---|---|---|---|---|---|---|
| ndcg2 ↑ | 0.1711% | 0.136% | -0.035% | 0.205% | 0.034% | 0.205% | 0.034% |
| ndcg5 ↑ | 0.1809% | 0.170% | -0.011% | 0.190% | 0.009% | 0.190% | 0.009% |
| ndcg10 ↑ | 0.3086% | 0.258% | -0.051% | 0.283% | -0.026% | 0.283% | -0.026% |
| map2 ↑ | 0.0617% | 0.059% | -0.003% | 0.089% | 0.028% | 0.089% | 0.028% |
| map5 ↑ | 0.0889% | 0.095% | 0.006% | 0.110% | 0.021% | 0.110% | 0.021% |
| map10 ↑ | 0.1244% | 0.121% | -0.003% | 0.140% | 0.016% | 0.140% | 0.016% |
| ndkl ↓ | 0.0072 | 0.0369 | 0.0296 | 0.0366 | 0.0293 | 0.0366 | 0.0294 |
The files containing the rest of our experiment results, with various fairness notions, datasets, and algorithms, are as follows:
We benefit from `pytrec`, `reranking`, and other libraries. We would like to thank the authors of these libraries and helpful resources.
©2024. This work is licensed under a CC BY-NC-SA 4.0 license.
```
@inproceedings{DBLP:conf/bias/LoghmaniF23,
  author    = {Hamed Loghmani and Hossein Fani},
  title     = {Bootless Application of Greedy Re-ranking Algorithms in Fair Neural Team Formation},
  booktitle = {Advances in Bias and Fairness in Information Retrieval - Fourth International Workshop, {BIAS} 2023, Dublin, Ireland, April 2, 2023, Revised Selected Papers},
  pages     = {108--118},
  publisher = {Springer Nature Switzerland},
  year      = {2023},
  url       = {https://doi.org/10.1007/978-3-031-37249-0_9},
  doi       = {10.1007/978-3-031-37249-0_9},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```