Predicting age, gender and topic from Italian texts

labadier/EVALITA_TAG_it

This project consists of a neural network model used to participate in the TAG-it Author Profiling task at EVALITA 2020. The task aims to predict the age and gender of blog users from their posts, as well as the topic they wrote about. The model combines representations learned by RNNs at the word and sentence levels, a Transformer network (specifically the BERT architecture), and hand-crafted stylistic features. All these representations are combined and fed into a fully connected layer of a feedforward neural network to make predictions for each subtask.

The model description is available here.
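As a rough illustration of the fusion step described above, the following standard-library sketch joins three placeholder feature vectors and passes them through a single fully connected layer with a softmax output. It is not the repository's Keras code: the vector sizes, the random weights, the use of plain concatenation as the way the representations are combined, and the names `concatenate` and `dense_softmax` are all assumptions made for the example.

```python
import math
import random
from typing import List, Sequence


def concatenate(*vectors: Sequence[float]) -> List[float]:
    """Join several feature vectors into one flat vector."""
    fused: List[float] = []
    for vector in vectors:
        fused.extend(vector)
    return fused


def dense_softmax(x: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """Apply one fully connected layer, then a numerically stable softmax."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    largest = max(logits)
    exps = [math.exp(logit - largest) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Placeholder vectors standing in for the three kinds of representation.
rnn_repr = [rng.uniform(-1, 1) for _ in range(4)]     # RNN (word/sentence level)
bert_repr = [rng.uniform(-1, 1) for _ in range(6)]    # BERT
style_feats = [rng.uniform(-1, 1) for _ in range(3)]  # hand-crafted style features

fused = concatenate(rnn_repr, bert_repr, style_feats)

n_classes = 5  # assumed here: one class per age range
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(n_classes)]
bias = [0.0] * n_classes

probs = dense_softmax(fused, weights, bias)
print(len(fused), len(probs))
```

In a real model the weights would be learned during training, and each subtask (age, gender, topic) would have its own output size.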

To run this code you need:

  • Python 3.8
  • TensorFlow 2.0
  • Keras 2.4.3
  • FreeLing 4.1 and its Python API
  • Italian word embeddings, available here

Steps for using the model

Training models of the ensemble

The code of the models for each task is located in the Ensemble folder, along with a train.py file which, when run, saves the weights learned from the provided training data. So the first step to use this classifier is to run on the command line:

 python ./Ensemble/train.py

The training files are located in the data folder; these are the ones provided by the contest organizers. If you want to change the training file, change the source variable in this train.py file.

source="./data/training.txt"

Making Predictions

For making predictions run:

 python main.py

Provide the test files with the -dp option. The test_data folder contains the test data provided by the organizers.

Data Format

The datasets are composed of texts written by multiple users, with possibly multiple posts per user. The data is distributed as one XML-like file per genre, with one sample per element and attributes specifying an id, the topic, the gender (male|female), and the age range (0-19, 20-29, 30-39, 40-49, 50-100). This is a sample:

<doc id="3046" topic="orologi" age="30-39" gender="male">
  <post>
    Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?
  </post>
  <post>
    Cosa vi ha spinto a rendervi appassionati così bramosi?
  </post>
</doc>
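To make the format concrete, here is a minimal sketch of how such a file could be read into per-user records. It is not the loader used by train.py or main.py. The names `UserSample` and `parse_samples` are hypothetical, and regular expressions are used instead of an XML parser on the assumption that an "XML-like" file may not be strictly well-formed (for example, it may have no single root element).

```python
import re
from dataclasses import dataclass
from typing import List

# One <doc ...> ... </doc> element per user; attributes are key="value" pairs.
DOC_RE = re.compile(r"<doc\s+([^>]*)>(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
POST_RE = re.compile(r"<post>(.*?)</post>", re.DOTALL)


@dataclass
class UserSample:
    """Hypothetical container for one user and their posts."""
    doc_id: str
    topic: str
    age: str
    gender: str
    posts: List[str]


def parse_samples(text: str) -> List[UserSample]:
    """Extract every doc element and its posts from the raw file content."""
    samples = []
    for attr_text, body in DOC_RE.findall(text):
        attrs = dict(ATTR_RE.findall(attr_text))
        posts = [post.strip() for post in POST_RE.findall(body)]
        samples.append(UserSample(
            doc_id=attrs.get("id", ""),
            topic=attrs.get("topic", ""),
            age=attrs.get("age", ""),
            gender=attrs.get("gender", ""),
            posts=posts,
        ))
    return samples


SAMPLE = """
<doc id="3046" topic="orologi" age="30-39" gender="male" >
  <post>
    Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?
  </post>
  <post>
    Cosa vi ha spinto a rendervi appassionati così bramosi?
  </post>
</doc>
"""

samples = parse_samples(SAMPLE)
for sample in samples:
    print(sample.doc_id, sample.topic, sample.age, sample.gender, len(sample.posts))
```

To read a whole file, pass its decoded content to `parse_samples`; the encoding of the distributed files is not stated here, so check it before opening them.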
