labadier/EVALITA_TAG_it

Predicting age, gender and topic from Italian texts

Folders and files

Ensembler
bert
data
test_data/tagit-testsets
README.md
_pyfreeling.so
bertTL.py
feature_extract.py
feature_extract_IG_ITFIDF.py
main.py
pyfreeling.py
toolsIT.py
toolsIT_bert.py

EVALITA_TAG_it

This project consists of a neural network model used to participate in the TAG-it Author Profiling task at EVALITA 2020. The task aims to predict the age and gender of blog users from their posts, as well as the topic they wrote about. The model combines representations learned by RNNs at the word and sentence levels, a Transformer neural network (specifically the BERT architecture), and hand-crafted stylistic features. All these representations are combined and fed into the fully connected layer of a feedforward neural network to make predictions for the addressed subtasks.
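As a toy illustration of the fusion step described above (not the repository's Keras code), the sketch below concatenates three representation vectors and passes them through a single fully connected layer with a softmax. The vector sizes, the weights and the helper names `concatenate` and `dense_softmax` are hypothetical; in the real model these vectors come from the trained RNN, BERT and feature-extraction components, and the weights are learned.

```python
import math
from typing import List, Sequence


def concatenate(*vectors: Sequence[float]) -> List[float]:
    """Join several representation vectors into one feature vector."""
    fused: List[float] = []
    for vec in vectors:
        fused.extend(vec)
    return fused


def dense_softmax(x: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """One fully connected layer followed by a softmax over the classes.

    `weights` has one row per output class; each row must be as long as `x`.
    """
    if len(weights) != len(bias):
        raise ValueError("need one bias value per weight row")
    for row in weights:
        if len(row) != len(x):
            raise ValueError("each weight row must match the input length")
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    # Subtract the largest logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy inputs standing in for the learned representations.
rnn_vec = [0.2, -0.1]   # word/sentence-level RNN representation
bert_vec = [0.5, 0.3]   # BERT representation
style_vec = [1.0]       # hand-crafted stylistic feature(s)

x = concatenate(rnn_vec, bert_vec, style_vec)
# Two output classes (e.g. the gender subtask), with made-up weights.
W = [[0.1, 0.0, 0.2, 0.0, 0.1],
     [0.0, 0.1, 0.0, 0.2, -0.1]]
b = [0.0, 0.0]
probs = dense_softmax(x, W, b)
print(probs)
```

Concatenation keeps each representation intact and lets the dense layer learn how much each source contributes, which makes it a simple way to mix heterogeneous features such as learned embeddings and hand-crafted statistics.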

A description of the models is available here.

To run this code you need:

Python 3.8
tensorflow 2.0
Keras 2.4.3
FreeLing 4.1 and its Python API
Italian word embeddings, available here
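To confirm that the interpreter and the main packages are visible before running anything, a small check like the following can help. It is only a sketch: `tensorflow`, `keras` and `pyfreeling` are assumed import names for the requirements above, and the script checks a minimum Python version and whether each module can be found, not the exact versions listed.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple

# Assumed import names for the requirements; "pyfreeling" would come from
# pyfreeling.py / _pyfreeling.so when run from the repository root.
REQUIRED_MODULES = ("tensorflow", "keras", "pyfreeling")


def python_ok(required: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter is at least the required version."""
    return sys.version_info[:2] >= required


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```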

Steps for using the model

Download the word embedding file (wiki-it.vec) and place it in the data folder.
Download the weights of the BERT model and place them in the data folder.
Train the models.
Make predictions on the test files.
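A quick way to confirm that the embedding file is in place and readable is to load a few vectors from it. The sketch below assumes wiki-it.vec uses the fastText-style text format (a header line with vocabulary size and dimension, then one word followed by its values per line); the function name `load_vec` is hypothetical and not part of this repository.

```python
from typing import Dict, List, Optional


def load_vec(path: str, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Load word vectors from a fastText-style .vec text file.

    Assumes a header "<num_words> <dim>" followed by "word v1 ... v_dim"
    lines. `limit` keeps only the first `limit` words (handy for quick checks).
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        header = f.readline().split()
        dim = int(header[1])
        for line in f:
            if limit is not None and len(vectors) >= limit:
                break
            parts = line.rstrip().split(" ")
            if len(parts) < dim + 1:
                continue  # skip malformed or truncated lines
            # The last `dim` fields are the values; everything before is the word.
            word = " ".join(parts[:-dim])
            vectors[word] = [float(v) for v in parts[-dim:]]
    return vectors
```

For example, `load_vec("./data/wiki-it.vec", limit=1000)` reads only the first thousand entries, which is enough to catch a wrong path or format without loading the whole file.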

Training models of the ensemble

The code of the models for each subtask is located in the Ensembler folder, which also contains a train.py script that, when run, saves the weights learned from the provided training data. So the first step to use this classifier is to run, from the command line:

 python ./Ensembler/train.py

The training files are located in the data folder; they are the ones provided by the contest organizers. To change the training file, edit the source variable in this train.py file:

source="./data/training.txt"

Making Predictions

To make predictions, run:

 python main.py

Provide the test files with the -dp option. The test data provided by the organizers is in the test_data folder.
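For readers unfamiliar with how such an option is consumed, the sketch below shows one way a script could accept a `-dp` option with Python's standard argparse module, so that an invocation looks like `python main.py -dp <path to test data>`. This is a hypothetical illustration: the `data_path` destination name is an assumption, main.py may parse its arguments differently, and whether `-dp` expects a single file or a folder is not stated here.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser exposing the -dp option named in this README."""
    parser = argparse.ArgumentParser(
        description="Predict age, gender and topic for TAG-it test data.")
    # argparse accepts single-dash multi-letter options such as -dp.
    parser.add_argument("-dp", dest="data_path", required=True,
                        help="path to the test data to predict")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("Reading test data from:", args.data_path)
```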

Data Format

The datasets consist of texts written by multiple users, possibly with multiple posts per user. The data is distributed as one XML-like file per genre, with one sample per element and attributes specifying an id, the topic, the gender (male|female) and the age range ([0, 19], [20, 29], [30, 39], [40, 49] or [50, 100]). This is a sample:

 <doc id="3046" topic="orologi" age="30-39" gender="male">
   <post>Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?</post>
   <post>Cosa vi ha spinto a rendervi appassionati così bramosi?</post>
 </doc>
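A minimal way to read files in this format is sketched below. Because the files are described as XML-like rather than strict XML, the sketch uses regular expressions instead of an XML parser; it assumes UTF-8 files, double-quoted attributes and `<doc>`/`<post>` tags written as in the sample. The names `parse_tagit`, `load_tagit` and `build_label_index` are hypothetical and not taken from this repository.

```python
import re
from typing import Dict, List

# One <doc ...> ... </doc> element; DOTALL lets the body span several lines.
DOC_RE = re.compile(r"<doc\b([^>]*)>(.*?)</doc>", re.DOTALL)
# key="value" attribute pairs inside the opening <doc> tag.
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')
# The text of each <post> ... </post> element.
POST_RE = re.compile(r"<post>(.*?)</post>", re.DOTALL)


def parse_tagit(text: str) -> List[Dict]:
    """Return one dict per <doc>: its attributes plus a "posts" list."""
    samples = []
    for attrs, body in DOC_RE.findall(text):
        sample = dict(ATTR_RE.findall(attrs))
        sample["posts"] = [p.strip() for p in POST_RE.findall(body)]
        samples.append(sample)
    return samples


def load_tagit(path: str) -> List[Dict]:
    """Read and parse a whole file (UTF-8 encoding is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return parse_tagit(f.read())


def build_label_index(samples: List[Dict], key: str) -> Dict[str, int]:
    """Map each distinct value of an attribute (e.g. "age") to an integer id."""
    return {value: i for i, value in enumerate(sorted({s[key] for s in samples}))}


SAMPLE = """<doc id="3046" topic="orologi" age="30-39" gender="male">
  <post>Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?</post>
  <post>Cosa vi ha spinto a rendervi appassionati così bramosi?</post>
</doc>"""

docs = parse_tagit(SAMPLE)
gender_index = build_label_index(docs, "gender")
print(docs[0]["topic"], len(docs[0]["posts"]), gender_index)
```

Building the label indices from the observed attribute values avoids hard-coding the exact strings used for age ranges or topics; entities or markup inside posts are left untouched by this sketch.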
