rafaelsandroni/author-profiling-models

Models from the master's dissertation "Author profiling from texts using artificial neural networks", EACH-USP 2019

Folders and files

dissertation
notebooks
src
README.md
requeriments.txt
run.py
setup.py

Author Profiling

Author Profiling (AP) is the computational task of recognizing characteristics of text authors based on their linguistic patterns. Computational models allow us to infer social characteristics from a text even when the authors do not consciously choose to place indicators of these characteristics in it. The AP task matters for many practical applications, such as forensic analysis, criminal investigation, and marketing. Traditional AP approaches often rely on linguistic knowledge, which requires prior expertise and manual effort to extract features. Recently, artificial neural networks have shown satisfactory results in natural language processing (NLP) problems; for author profiling, however, they have shown varied levels of success. This dissertation aims to organize, define and explore various author characterization tasks from the textual corpora considered, covering three languages (i.e., Portuguese, English and Spanish) and five textual domains (e.g., social networks, questionnaires, SMS). Six models based on neural networks and word embeddings are proposed, and their performance is compared with baseline systems.

Master's dissertation

Download the latest version of the master's dissertation

Implementation models

Here you can find the implemented models, each containing both a data pipeline and a machine learning pipeline.

lr_tfidf: logistic regression + tfidf, /src/models/baseline1
cnn_tfidf: 1D conv net + tfidf, /src/models/baseline2
cnn_wv: multichannel 1D conv net + word vectors, /src/models/baseline3
cnn_wv, Kim implementation: multichannel 1D conv net + word vectors, /src/models/baseline4
lstm_wv: LSTM + word vectors, /baseline5
lstm_attention_wv: LSTM self attention mechanism + word vectors, /src/models/baseline6
gru_wv: GRU + word vectors, /src/models/baseline7
cnn_char: multichannel 1D conv net + char vectors, /src/models/baseline9
lstm_attention_char: LSTM self attention mechanism + char vectors, /src/models/baseline9
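
As a rough illustration of the simplest entry in the list above (lr_tfidf, logistic regression over TF-IDF features), here is a minimal sketch that uses only the Python standard library. It is not the code in /src/models/baseline1: the function names (`tokenize`, `fit_idf`, `tfidf`, `train_logreg`, `predict`), the toy texts, the binary labels and the hyperparameters are all assumptions made for illustration, and a real run would more likely rely on an established machine learning library.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, not the repository's)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(vec, weights, bias):
    """Raw linear score of a sparse vector under the current model."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression fitted with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            p = 1.0 / (1.0 + math.exp(-score(vec, weights, bias)))
            error = p - y  # gradient of the log loss w.r.t. the raw score
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * error * v
            bias -= lr * error
    return weights, bias


def predict(text, idf, weights, bias):
    """Return 1 if the score is non-negative, else 0."""
    return 1 if score(tfidf(tokenize(text), idf), weights, bias) >= 0 else 0


# Toy, made-up texts with hypothetical binary labels (0 and 1).
texts = [
    "homework exam school tomorrow",
    "school exam was hard",
    "office meeting invoice today",
    "invoice due after the meeting",
]
labels = [0, 0, 1, 1]

docs = [tokenize(t) for t in texts]
idf = fit_idf(docs)
weights, bias = train_logreg([tfidf(d, idf) for d in docs], labels)
print([predict(t, idf, weights, bias) for t in texts])
```

The neural models in the list replace the linear classifier (and, for the word-vector and char-vector variants, the TF-IDF features) with learned representations, but the overall shape of vectorize, train, predict stays the same.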

Corpus

These textual datasets support six author profiling tasks: gender, age, education level, religiosity, IT background and political position, in three languages: Portuguese, English and Spanish.

The dissertation structures these datasets for author profiling tasks, defining each problem and its class distribution.

b5-post
BRMoral
BlogSet-BR
Nus-SMS
The Blog Authorship
PAN 2013 (PAN-CLEF)

The datasets are split into stratified training and test subsets.

You can request access to the structured datasets from the author.
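
A stratified split, as mentioned above, keeps the class proportions roughly equal in the training and test subsets. The sketch below shows the idea using only the standard library. The function name `stratified_split`, the 80/20 ratio, the fixed seed and the toy labels are assumptions for illustration, not the procedure used to build the structured datasets.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, labels, test_ratio=0.2, seed=42):
    """Split samples so each class keeps roughly the same share in both subsets."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one test example per class, even for rare classes.
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


# Toy imbalanced data: 15 samples of class "a" and 5 of class "b".
samples = [f"text {i}" for i in range(20)]
classes = ["a"] * 15 + ["b"] * 5

train, test = stratified_split(samples, classes)
print(Counter(label for _, label in train))
print(Counter(label for _, label in test))
```

Splitting per class rather than over the whole dataset matters most for the rarer classes, which a purely random split could leave out of the test subset entirely.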

Utils evaluation functions

Utility functions built to support the implementations, pre-built models, reports, etc.

/src/functions/

utils: helper functions
plot: plotting functions using matplotlib, plus metric calculations
word vectors: embedding algorithms, including training and loading pre-trained models
etc
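
The list above does not say which metrics are calculated. As an assumption, the sketch below shows macro-averaged F1, a common choice for classification tasks with imbalanced classes. `macro_f1` is a hypothetical helper name for illustration, not a function taken from /src/functions/.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the classes seen in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical class labels "a" and "b".
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally to the average, a model that ignores a minority class is penalized more than it would be under plain accuracy.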

Reference

@MASTERSDISSERTATION{sandroni-dias,
  title   = "Author profiling from texts using artificial neural networks",
  author  = "Rafael Felipe Sandroni Dias",
  year    = "2019",
  type    = "Master's Dissertation",
  school  = "University of São Paulo",
  address = "São Paulo, SP, Brazil",
}
