labadier/EVALITA_TAG_it
This project consists of a neural network model used to participate in the TAG-it Author Profiling task at EVALITA 2020. The task aims to predict the age and gender of blog users from their posts, as well as the topic they wrote about. The model combines representations learned by an RNN at the word and sentence levels, a Transformer neural network (specifically the BERT architecture), and hand-crafted stylistic features. All these representations are combined and fed into a fully connected layer of a feedforward neural network to make predictions for the addressed subtasks.
The model description is available here.
For this code to work, the following are required:
- Python 3.8
- tensorflow 2.0
- Keras 2.4.3
- Freeling 4.1 and its Python API
- Italian word embeddings, available here
- Once the word embedding file (`wiki-it.vec`) is downloaded, place it in the `data` folder.
- Download the weights of the BERT model and place them in the `data` folder.
- Train the models.
- Make predictions on the test files.
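Before training, it can help to check that the embedding file is readable. The following is a minimal sketch, assuming `wiki-it.vec` uses the usual fastText text layout (a header line with the vocabulary size and the dimension, then one `word v1 v2 ...` row per line). The function name `load_vec` and the tiny in-memory sample are hypothetical and are not part of this repository.

```python
import io


def load_vec(stream, limit=None):
    """Read word vectors from a text stream in the assumed .vec layout."""
    header = stream.readline().split()
    dim = int(header[1])  # header is assumed to be "<vocab_size> <dimension>"
    vectors = {}
    for i, line in enumerate(stream):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory stand-in for the real file (hypothetical values).
sample = io.StringIO("2 3\nciao 0.1 0.2 0.3\nmondo 0.4 0.5 0.6\n")
dim, vectors = load_vec(sample)
print(dim, sorted(vectors))
```

For the real file, the same function could be called on `open("./data/wiki-it.vec", encoding="utf-8")`, optionally with `limit` set to read only the first rows.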
The model code for predicting each task is located in the `Ensemble` folder. That folder also contains a file, `train.py`, which, when run, saves the weights learned from the provided training data. So the first step in using this classifier is to run the following on the command line:
python ./Ensemble/train.py
The training files are located in the `data` folder and are the ones provided by the contest organizers. To change the training file, change the `source` variable in `train.py`:
source="./data/training.txt"
To make predictions, run:
python main.py
Provide the test files with the `-dp` option. The `test_data` folder contains the test data provided by the organizers.
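How `main.py` reads the `-dp` option is not shown here. As a rough illustration only, an option of this kind could be declared with `argparse` as below. The destination name `data_path`, the help text, and the assumption that the option takes a folder path are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a single -dp option (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Run predictions on test files.")
    # Assumption: -dp takes the path of the folder holding the test files.
    parser.add_argument("-dp", dest="data_path", required=True,
                        help="path to the test data")
    return parser


# Equivalent to invoking the script with: -dp test_data
args = build_parser().parse_args(["-dp", "test_data"])
print(args.data_path)
```

Under that assumption, the prediction command would look like `python main.py -dp test_data`.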
The datasets are composed of texts written by multiple users, with possibly multiple posts per user. The data is distributed as one XML-like file per genre, with one sample per `<doc>` element and attributes specifying an id, the topic, the gender (`male` or `female`), and the age range ([0-19], [20-29], [30-39], [40-49], [50-100]). This is a sample:
<doc id="3046" topic="orologi" age="30-39" gender="male">
    <post> Per quale motivo oggi, il mondo dell'orologeria è così importante per voi? </post>
    <post> Cosa vi ha spinto a rendervi appassionati così bramosi? </post>
</doc>
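A sample like the one above can be read with a few regular expressions. This is a minimal sketch, not the loader used in this repository. It assumes each sample is wrapped in `<doc ...> ... </doc>` with space-separated, double-quoted attributes, and it uses regular expressions instead of an XML parser because the files are only described as XML-like. The name `parse_docs` is hypothetical.

```python
import re

# Patterns for the assumed layout: <doc attr="..."> <post>...</post> </doc>
DOC_RE = re.compile(r"<doc\s+([^>]*)>(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
POST_RE = re.compile(r"<post>(.*?)</post>", re.DOTALL)


def parse_docs(text):
    """Return one dict per <doc>, holding its attributes and a list of posts."""
    docs = []
    for attrs, body in DOC_RE.findall(text):
        record = dict(ATTR_RE.findall(attrs))
        record["posts"] = [post.strip() for post in POST_RE.findall(body)]
        docs.append(record)
    return docs


sample = """<doc id="3046" topic="orologi" age="30-39" gender="male">
<post> Per quale motivo oggi, il mondo dell'orologeria è così importante per voi? </post>
<post> Cosa vi ha spinto a rendervi appassionati così bramosi? </post>
</doc>"""

docs = parse_docs(sample)
print(docs[0]["topic"], docs[0]["age"], docs[0]["gender"], len(docs[0]["posts"]))
```

Each record keeps the attribute values as strings, so the age range stays in the form found in the file (here `30-39`).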