rafaelsandroni/author-profiling-models
Author Profiling (AP) is the computational task of recognizing the characteristics of text authors based on their linguistic patterns. Computational models allow us to infer social characteristics from a text even when the authors do not consciously choose to place indicators of these characteristics in it. The AP task can be important for many practical applications, such as forensic analysis, criminal investigation, and marketing. Traditional AP approaches often rely on linguistic knowledge, which demands prior expertise and manual effort to extract features. Recently, artificial neural networks have shown satisfactory results in natural language processing (NLP) problems; for author profiling, however, they have shown varying levels of success. This paper aims to organize, define, and explore various author profiling tasks over the corpora considered, covering three languages (i.e., Portuguese, English, and Spanish) and five textual domains (e.g., social networks, questionnaires, and SMS). Six models based on neural networks and word embeddings are proposed, and their performance is compared with baseline systems.
Download the latest version of the master's dissertation
Here you can find the implemented models, each containing both a data pipeline and a machine learning pipeline.
lr_tfidf: logistic regression + tfidf, /src/models/baseline1
cnn_tfidf: 1D conv net + tfidf, /src/models/baseline2
cnn_wv: multichannel 1D conv net + word vectors, /src/models/baseline3
cnn_wv, Kim implementation: multichannel 1D conv net + word vectors, /src/models/baseline4
lstm_wv: LSTM + word vectors, /baseline5
lstm_attention_wv: LSTM self attention mechanism + word vectors, /src/models/baseline6
gru_wv: GRU + word vectors, /src/models/baseline7
cnn_char: multichannel 1D conv net + char vectors, /src/models/baseline9
lstm_attention_char: LSTM self attention mechanism + char vectors, /src/models/baseline9
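As a rough illustration of the simplest entry in the list above (lr_tfidf), the following is a minimal sketch of a TF-IDF plus logistic regression text classifier built with scikit-learn. It is not the code in /src/models/baseline1: the toy texts, the labels, the n-gram range, and the `max_iter` value are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up texts and labels, used only to make the sketch runnable.
# They are NOT taken from the datasets used in the dissertation.
texts = [
    "i spent the whole weekend fixing my motorbike",
    "my sister and i went shopping for dresses",
    "watching the football match with the guys tonight",
    "baked a cake with my girlfriends this afternoon",
]
labels = ["male", "female", "male", "female"]

# TF-IDF turns each text into a sparse weighted bag of word n-grams;
# logistic regression then learns one weight per n-gram and class.
# The hyperparameters below are illustrative assumptions, not the repository's.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Predict the label and the class probabilities for an unseen text.
new_texts = ["going to the match with my sister"]
predictions = model.predict(new_texts)
probabilities = model.predict_proba(new_texts)
```

The neural models in the list replace the TF-IDF features with word or character vectors and the linear classifier with convolutional or recurrent layers, but the overall fit and predict flow is similar.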
These textual datasets support six author profiling tasks: gender, age, education level, religiosity, IT background, and political position, in three languages: Portuguese, English, and Spanish.
This dissertation structures and defines the datasets for author profiling tasks, including their class distributions and the definition of each problem.
- b5-post
- BRMoral
- BlogSet-BR
- Nus-SMS
- The Blog Authorship
- PAN 2013 (PAN-CLEF)
Datasets are split into stratified training and test subsets.
You can request access to the structured datasets from the author.
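To make the idea of a stratified split concrete, here is a minimal sketch using only the Python standard library. The function name `stratified_split`, the 80/20 ratio, the seed, and the toy labels are assumptions for illustration; the split actually used in the repository may differ (scikit-learn's `train_test_split` with its `stratify` argument is a common alternative).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that each class keeps roughly
    the same proportion in the training and test subsets."""
    # Group the sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test subset.
        # Note: max(1, ...) sends a class with a single sample to the test subset.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy, imbalanced label list (60% / 40%), invented for this example.
labels = ["female"] * 60 + ["male"] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

# Count how many test samples fall in each class.
test_female = sum(1 for i in test_idx if labels[i] == "female")
test_male = sum(1 for i in test_idx if labels[i] == "male")
```

With these toy labels, the test subset keeps the original 60/40 class ratio, which is the property a stratified split is meant to preserve.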
Utility functions built to help with implementations, pre-built models, reports, etc.
/src/functions/
- utils: general helper functions
- plot: plotting functions (using matplotlib) and metrics calculation
- word vectors: embedding algorithms, including training and loading pre-trained models
- etc
@MASTERSDISSERTATION{sandroni-dias,
  title = "Author profiling from texts using artificial neural networks",
  author = "Rafael Felipe Sandroni Dias",
  year = "2019",
  type = "Master's Dissertation",
  school = "University of São Paulo",
  address = "São Paulo, SP, Brazil",
}
About
Models from the master's dissertation: Author profiling from texts using artificial neural networks, EACH-USP 2019