DPigeon/NLP-Language-Classifier
https://github.com/DPigeon/NLP-Language-Classifier
A Naive Bayes classifier for NLP that determines the most likely language of a tweet.
First, install Miniconda with Python 3.7 at
https://docs.conda.io/en/latest/miniconda.html
You also need NumPy to run the project.
Install NumPy with
conda install numpy
To run the program, first create an output folder in the root of the project. Then edit the input.txt file in the input folder. The input file is formatted as follows:
vocabulary size_of_ngram smoothing_value training_file testing_file
Where vocabulary is one of:
0: Fold the corpus to lowercase and use only the 26 letters of the alphabet [a-z]
1: Distinguish upper and lower case and use only the 26 letters of the alphabet in both cases [a-z, A-Z]
2: Distinguish upper and lower case and use all characters accepted by the built-in isalpha() method
Where size_of_ngram is one of:
1: character unigrams
2: character bigrams
3: character trigrams
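The vocabulary and n-gram options above can be illustrated with a minimal sketch. The function names `in_vocabulary` and `extract_ngrams` are hypothetical and are not taken from the project. The sketch also assumes that any n-gram containing a character outside the chosen vocabulary is skipped, which the project may handle differently.

```python
def in_vocabulary(ch, vocabulary):
    """Return True if the character belongs to the chosen vocabulary (0, 1 or 2)."""
    if vocabulary == 0:
        # The caller is expected to lowercase the text first.
        return "a" <= ch <= "z"
    if vocabulary == 1:
        return ("a" <= ch <= "z") or ("A" <= ch <= "Z")
    if vocabulary == 2:
        return ch.isalpha()
    raise ValueError("vocabulary must be 0, 1 or 2")


def extract_ngrams(text, vocabulary, n):
    """Return the character n-grams of text made only of in-vocabulary characters."""
    if vocabulary == 0:
        text = text.lower()
    ngrams = []
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        # Assumption: drop n-grams that contain an out-of-vocabulary character.
        if all(in_vocabulary(c, vocabulary) for c in gram):
            ngrams.append(gram)
    return ngrams
```

For example, `extract_ngrams("Hi you", 0, 2)` lowercases the text and drops the two bigrams that contain the space.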
The smoothing value is a real number in the range [0, 1].
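This README does not say how the smoothing value is applied. A common choice for Naive Bayes is additive (add-delta) smoothing, and the sketch below assumes that. The function name `smoothed_log_prob` and its parameters are hypothetical, as is the use of base-10 logarithms.

```python
import math


def smoothed_log_prob(count, total, vocab_size, delta):
    """Return log10 of (count + delta) / (total + delta * vocab_size).

    count:      occurrences of the n-gram in one language's training data
    total:      occurrences of all n-grams in that language's training data
    vocab_size: number of distinct n-grams possible under the vocabulary
    delta:      smoothing value in [0, 1]
    """
    numerator = count + delta
    denominator = total + delta * vocab_size
    # With delta = 0, an unseen n-gram has probability zero.
    if numerator == 0 or denominator == 0:
        return float("-inf")
    return math.log10(numerator / denominator)
```

Under this assumption, a smoothing value of 0 gives unseen n-grams a score of negative infinity, while any positive value keeps every score finite.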
The trace file will give an output as follows:
tweet_id most_likely_class score_most_likely_class correct_class correct_wrong_label
Where correct_wrong_label is either correct or wrong, depending on whether most_likely_class matches correct_class.
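A trace line in this layout could be produced as sketched below. The function name `format_trace_line` is hypothetical. The sketch assumes the fields are separated by a single space and that the final field is the literal word `correct` or `wrong`. The project's actual separator and score formatting may differ.

```python
def format_trace_line(tweet_id, predicted, score, actual):
    """Build one trace line: id, predicted class, its score, true class, label."""
    # Assumption: the label is the literal word "correct" or "wrong".
    label = "correct" if predicted == actual else "wrong"
    return f"{tweet_id} {predicted} {score} {actual} {label}"
```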
The evaluation file will give an output as follows:
accuracy
eu_precision ca_precision gl_precision es_precision en_precision pt_precision
eu_recall ca_recall gl_recall es_recall en_recall pt_recall
eu_f1_measure ca_f1_measure gl_f1_measure es_f1_measure en_f1_measure pt_f1_measure
macro_f1 weighted_average_f1
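The metrics in the evaluation file can be computed from the predicted and correct classes with their standard definitions. The sketch below is one way to do it, not necessarily the project's code. The name `evaluate` is hypothetical, and the sketch assumes a metric is reported as 0.0 when its denominator is zero.

```python
LANGUAGES = ["eu", "ca", "gl", "es", "en", "pt"]


def evaluate(predicted, actual, classes=LANGUAGES):
    """Return accuracy, per-class precision/recall/F1, macro F1 and weighted F1."""
    total = len(actual)
    pairs = list(zip(predicted, actual))
    accuracy = sum(p == a for p, a in pairs) / total

    precision, recall, f1, support = {}, {}, {}, {}
    for c in classes:
        true_pos = sum(p == c and a == c for p, a in pairs)
        predicted_c = sum(p == c for p in predicted)
        actual_c = sum(a == c for a in actual)
        # Assumption: report 0.0 when a denominator is zero.
        precision[c] = true_pos / predicted_c if predicted_c else 0.0
        recall[c] = true_pos / actual_c if actual_c else 0.0
        denom = precision[c] + recall[c]
        f1[c] = 2 * precision[c] * recall[c] / denom if denom else 0.0
        support[c] = actual_c

    # Macro F1 is the unweighted mean over classes; weighted F1 weighs each
    # class by its number of test tweets.
    macro_f1 = sum(f1.values()) / len(classes)
    weighted_f1 = sum(f1[c] * support[c] for c in classes) / total
    return accuracy, precision, recall, f1, macro_f1, weighted_f1
```

The per-class dictionaries can then be written out in the order eu, ca, gl, es, en, pt to match the layout above.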