# Detection of MBTI-type personality with state-of-the-art language models
**Software versions:** Ubuntu 16.04 | TensorFlow-GPU 1.11 | Keras 1.0.8 | CUDA 9.0 | Python 3.6.10
**Minimum server requirements:** 30 GB of CPU memory and an NVIDIA P100 GPU with 15 GB of memory
**Step-by-step instructions for setting up the environment:**

- Download `libcudnn7-dev_7.4.2.24-1+cuda9.0_amd64.deb`, `libcudnn7-doc_7.4.2.24-1+cuda9.0_amd64.deb`, and `libcudnn7_7.4.2.24-1+cuda9.0_amd64.deb` to your home folder from the NVIDIA developer website. Please note the versions; the module may not be compatible with any other version.
- Clone the git repository to your home folder and move into the setup directory: `cd /$repository/setup/`
- Run the setup script from the setup dir: `sh setup.sh`
- Navigate back to the home dir: `cd ../`
- Install the requirements: `sh requirement.sh`
- Reload your bash profile: `source ~/.bashrc`
- Copy the `mbti9k_comments.csv` file (~2 GB) to `./data/`. (This file can be obtained on request.)
**Step-by-step instructions for running the code:**

First, set up the config file. Please follow the instructions below file by file, according to your experiment; you may want to change the config parameters from script to script to reduce the runtime. All scripts are run from the source directory (`cd ./src`).
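The exact contents of the config file are repository-specific; purely as an illustration, the parameters referenced later in this README (`FLAG`, `BERT_NUM`) might live in a Python config module along these lines (all values here are hypothetical):

```python
# config.py -- hypothetical sketch only; see the actual config file in
# the repository for the real parameter names and values.

DATA_DIR = './data/'           # where mbti9k_comments.csv lives
MODEL_DIR = './model_folder/'  # checkpoints land in model_folder/outputs
MAX_SEQ_LEN = 150              # words per training example (see Results)

FLAG = 'H'       # 'H' -> build grouped embeddings for RoBERT;
                 # otherwise run the fine-tuned BERT on a test set
BERT_NUM = 10000 # NUM from the model.ckpt-NUM file written by train_bert.py
```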
- To quickly check a summary of the data, run `python3 data_summary.py`. This will output a file in the data directory with the sentence length for each user; the sketch below illustrates the idea.
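A rough sketch of the kind of per-user summary `data_summary.py` produces, assuming the CSV has `author` and `comment` columns (the actual column names and output format may differ):

```python
import pandas as pd

# Hypothetical sketch: total word count per user from the raw comments.
df = pd.read_csv('./data/mbti9k_comments.csv')

summary = (
    df.assign(n_words=df['comment'].str.split().str.len())
      .groupby('author')['n_words']
      .sum()
      .reset_index()
)
summary.to_csv('./data/user_sentence_lengths.csv', index=False)
```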
- Run `python3 src/data_prep.py` to create the training samples for both models. (A rough sketch of the chunking this involves follows below.)
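As context for what `data_prep.py` has to do: the Results section notes that each training example is 150 words long, so each user's document must be split into fixed-length chunks. A minimal sketch of that chunking, with a hypothetical helper name (the real script may chunk differently, e.g. respecting sentence boundaries):

```python
def chunk_words(text, chunk_len=150):
    """Split a document into consecutive chunks of chunk_len words.

    Hypothetical helper, not the repository's actual code.
    """
    words = text.split()
    return [' '.join(words[i:i + chunk_len])
            for i in range(0, len(words), chunk_len)]

# Each chunk becomes one (text, MBTI-label) training example.
examples = [(chunk, 'INTP') for chunk in chunk_words('...long document...')]
```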
- Run `python3 train_bert.py`. This will fine-tune the BERT model for the classification task and create `model.ckpt-NUM` files in the `model_folder/outputs` directory; the snippet below shows one way to read off `NUM`.
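Since the next step needs the checkpoint number, here is a small convenience sketch for finding the latest `model.ckpt-NUM`. The directory name follows this README; the snippet itself is an assumption about how you might locate the file, not part of the repository:

```python
import glob
import os
import re

# Find the highest checkpoint number written by train_bert.py.
ckpts = glob.glob(os.path.join('model_folder', 'outputs', 'model.ckpt-*.index'))
nums = [int(re.search(r'model\.ckpt-(\d+)', c).group(1)) for c in ckpts]
print('Set BERT_NUM to:', max(nums))
```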
- Run `python3 predict_bert.py`. If you set the parameter `FLAG = 'H'`, it will prepare a dataset with grouped embeddings for the entire document of each individual user and pass it to the RoBERT model; otherwise it will run the fine-tuned BERT model on a test dataset. You also need to set the `BERT_NUM` parameter from the `ckpt` file produced in the previous step. Please refer to the script for the other parameters. (See the grouping sketch below.)
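A rough sketch of what "grouped embeddings" might look like: per-chunk BERT vectors stacked into one padded sequence per user, so a recurrent model can consume whole documents. The 768-dimension assumption follows BERT-base; the grouping logic is illustrative, not the repository's exact code:

```python
import numpy as np

def group_embeddings(user_chunk_embs, max_chunks=50, dim=768):
    """Stack each user's per-chunk BERT embeddings into one padded
    (max_chunks, dim) array. Illustrative only; predict_bert.py may
    group differently."""
    grouped = {}
    for user, embs in user_chunk_embs.items():
        seq = np.zeros((max_chunks, dim), dtype=np.float32)
        embs = np.asarray(embs, dtype=np.float32)[:max_chunks]
        seq[:len(embs)] = embs
        grouped[user] = seq
    return grouped
```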
- Run `python3 run_model_roBert.py` to run the RoBERT model; a minimal sketch of such a model follows.
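RoBERT (recurrence over BERT) runs a recurrent network over the sequence of per-chunk BERT embeddings to classify the whole document. A minimal Keras sketch of this idea, with hypothetical layer sizes (the repository's actual architecture and hyperparameters may differ):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Masking

# Hypothetical RoBERT-style model: an LSTM over a user's sequence of
# chunk embeddings (max_chunks x 768), ending in a 4-class softmax.
model = Sequential([
    Masking(mask_value=0.0, input_shape=(50, 768)),  # skip padded chunks
    LSTM(128),
    Dense(4, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```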
Please note that some of the functions in `./utils/` of this repository were taken from Google Research's BERT repository.
**Results:** In our research we achieved 41% accuracy on an 8-class and 43% accuracy on a 4-class MBTI personality classification task. We also ran the 4-class classification task with RoBERT on full documents, where the best accuracy was 29.8%.
**4-Class**

| Class | Precision | Recall | F1 |
|---|---|---|---|
| INTP | 0.49 | 0.42 | 0.45 |
| INF | 0.41 | 0.64 | 0.50 |
| INTJ | 0.41 | 0.41 | 0.41 |
| EN | 0.44 | 0.18 | 0.26 |
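The four class labels above suggest that the eight MBTI types in the 8-class table were collapsed into coarser groups. Exactly how is not stated here, but a purely hypothetical mapping consistent with the two tables would be:

```python
# Hypothetical grouping, assuming only the eight types from the 8-class
# table occur in the data; the repository's actual mapping may differ.
FOUR_CLASS = {
    'INTP': 'INTP', 'INTJ': 'INTJ',
    'INFP': 'INF',  'INFJ': 'INF',
    'ENTP': 'EN',   'ENFP': 'EN',
    'ENTJ': 'EN',   'ENFJ': 'EN',
}
```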
**8-Class**

| Class | Precision | Recall | F1 |
|---|---|---|---|
| INTP | 0.47 | 0.45 | 0.46 |
| INTJ | 0.39 | 0.69 | 0.50 |
| INFP | 0.35 | 0.27 | 0.31 |
| INFJ | 0.40 | 0.24 | 0.30 |
| ENTP | 0.55 | 0.12 | 0.20 |
| ENFP | 0.37 | 0.10 | 0.15 |
| ENTJ | 0.38 | 0.05 | 0.09 |
| ENFJ | 0.28 | 0.04 | 0.07 |
For the 4-class classification task, fine-tuning the BERT-base model took 2 hr 25 min on 230,428 examples (each 150 words long). The same task with RoBERT (5 epochs, batch size 5) took 2 hr 43 min on 7,218 samples (whole documents). Computation times for data preparation and the other experiments can be found in the log files.
This repository has been developed and maintained by Samrat Halder as part of ELEN6040: Deep Learning Research, a course offered by Professor Zoran Kostic at Columbia University. For any communication related to this repository, please contact sh3970@columbia.edu.
This repository has been released under GPL-3.0