DPigeon/NLP-Language-Classifier

https://github.com/DPigeon/NLP-Language-Classifier

A Naive Bayes classifier for NLP that determines the most likely language of a tweet.

First, install Miniconda with Python 3.7 from:

https://docs.conda.io/en/latest/miniconda.html

You also need NumPy to run the project.

Install NumPy with

conda install numpy

Run

To run the program, first create an output folder in the root of the project. Then edit the input.txt file in the input folder. The input file has the following format:

vocabulary size_of_ngram smoothing_value training_file testing_file

where vocabulary is one of:

0: Fold the corpus to lowercase and use only the 26 letters of the alphabet [a-z]
1: Distinguish uppercase and lowercase and use only the 26 letters of the alphabet in both cases [a-z, A-Z]
2: Distinguish uppercase and lowercase and use all characters accepted by the built-in isalpha() method
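The three vocabulary options can be illustrated with a short sketch. The helper name normalize is hypothetical and not taken from the project, and it assumes that characters outside the chosen vocabulary are simply dropped; the project may handle them differently.

```python
import string


def normalize(text: str, vocabulary: int) -> str:
    """Hypothetical helper: keep only the characters allowed by `vocabulary`.

    Assumption: out-of-vocabulary characters are dropped, not replaced.
    """
    if vocabulary == 0:
        # Fold to lowercase, then keep only a-z.
        return "".join(c for c in text.lower() if c in string.ascii_lowercase)
    if vocabulary == 1:
        # Keep the original case; allow only a-z and A-Z.
        return "".join(c for c in text if c in string.ascii_letters)
    if vocabulary == 2:
        # Keep the original case; allow anything str.isalpha() accepts,
        # including accented letters such as "á" or "ç".
        return "".join(c for c in text if c.isalpha())
    raise ValueError(f"unknown vocabulary option: {vocabulary}")
```

For example, "Olá, Mundo!" becomes "olmundo" with vocabulary 0, "OlMundo" with vocabulary 1 and "OláMundo" with vocabulary 2, which shows why option 2 matters for languages such as Portuguese or Catalan.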

and size_of_ngram is one of:

1: character unigrams
2: character bigrams
3: character trigrams
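Character n-gram counting can be sketched as follows. The function names in_vocabulary and char_ngrams are hypothetical. The sketch assumes a window of n characters is counted only if every character in it belongs to the vocabulary, so n-grams never span spaces or punctuation. This is a design choice made for illustration and may differ from the project.

```python
import string
from collections import Counter


def in_vocabulary(c: str, vocabulary: int) -> bool:
    """Hypothetical membership test for a single character."""
    if vocabulary == 0:
        return c in string.ascii_lowercase  # text is lowercased beforehand
    if vocabulary == 1:
        return c in string.ascii_letters
    if vocabulary == 2:
        return c.isalpha()
    raise ValueError(f"unknown vocabulary option: {vocabulary}")


def char_ngrams(text: str, n: int, vocabulary: int) -> Counter:
    """Count the character n-grams (n = 1, 2 or 3) of `text`.

    Assumption: a window is counted only if all n characters are in the
    vocabulary, so n-grams never cross spaces or punctuation.
    """
    if vocabulary == 0:
        text = text.lower()
    grams = Counter()
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        if all(in_vocabulary(c, vocabulary) for c in window):
            grams[window] += 1
    return grams
```

With size_of_ngram = 2 and vocabulary 0, "Hola hola" yields the bigrams "ho", "ol" and "la", each counted twice; the windows "a " and " h" are skipped because they contain a space.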

smoothing_value is the smoothing value to apply, a real number in the range [0, 1].
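A line of input.txt can be read and validated against the ranges above with a small sketch like the one below. RunConfig, parse_input_line and read_config are hypothetical names. The sketch assumes the five fields are whitespace-separated on a single line and that file names contain no spaces.

```python
from typing import NamedTuple


class RunConfig(NamedTuple):
    vocabulary: int
    ngram_size: int
    smoothing: float
    training_file: str
    testing_file: str


def parse_input_line(line: str) -> RunConfig:
    """Parse 'vocabulary size_of_ngram smoothing_value training_file testing_file'."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    vocabulary, ngram_size = int(fields[0]), int(fields[1])
    smoothing = float(fields[2])
    # Reject values outside the options documented in this README.
    if vocabulary not in (0, 1, 2):
        raise ValueError(f"vocabulary must be 0, 1 or 2, got {vocabulary}")
    if ngram_size not in (1, 2, 3):
        raise ValueError(f"size_of_ngram must be 1, 2 or 3, got {ngram_size}")
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError(f"smoothing_value must be in [0, 1], got {smoothing}")
    return RunConfig(vocabulary, ngram_size, smoothing, fields[3], fields[4])


def read_config(path: str = "input/input.txt") -> RunConfig:
    """Read the first line of the input file (path assumed from this README)."""
    with open(path, encoding="utf-8") as f:
        return parse_input_line(f.readline())
```

For instance, the hypothetical line "0 2 0.5 training-tweets.txt test-tweets.txt" selects lowercase [a-z], character bigrams and a smoothing value of 0.5.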

Output Files

The trace file contains one line per tested tweet, in the following format:

tweet_id  most_likely_class  score_most_likely_class  correct_class  correct_wrong_label

where correct_wrong_label is "correct" if most_likely_class matches correct_class, and "wrong" otherwise.
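One way the most likely class, its score and a trace line could be produced is sketched below. This is not necessarily the project's method. The sketch assumes add-delta smoothing with delta = smoothing_value, a score of log10 P(class) plus the sum of log10 P(n-gram | class), and a score printed in %.2E notation. The names log_score, classify and trace_line are hypothetical, as are priors, and num_possible_grams: 26**n or 52**n for vocabularies 0 and 1, while for vocabulary 2 there is no fixed alphabet size and the caller has to choose one.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def log_score(tweet_grams: Counter, class_grams: Counter, prior: float,
              delta: float, num_possible_grams: int) -> float:
    """Return log10 P(class) + sum of count * log10 P(gram | class).

    Assumes add-delta smoothing:
    P(gram | class) = (count(gram, class) + delta) / (total(class) + delta * num_possible_grams)
    """
    denom = sum(class_grams.values()) + delta * num_possible_grams
    score = math.log10(prior)
    for gram, count in tweet_grams.items():
        p = (class_grams[gram] + delta) / denom if denom > 0 else 0.0
        if p == 0.0:
            # With delta = 0 an unseen n-gram has probability 0; log10(0) is undefined.
            return -math.inf
        score += count * math.log10(p)
    return score


def classify(tweet_grams: Counter, models: Dict[str, Counter], priors: Dict[str, float],
             delta: float, num_possible_grams: int) -> Tuple[str, float]:
    """Return (most_likely_class, score_most_likely_class)."""
    scores = {lang: log_score(tweet_grams, grams, priors[lang], delta, num_possible_grams)
              for lang, grams in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def trace_line(tweet_id: str, predicted: str, score: float, correct_class: str) -> str:
    """Format one trace-file line with fields separated by two spaces."""
    label = "correct" if predicted == correct_class else "wrong"
    return "  ".join([tweet_id, predicted, f"{score:.2E}", correct_class, label])
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow on longer tweets, and it does not change which class has the highest score.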

The evaluation file contains the following lines, where eu, ca, gl, es, en and pt are the six language classes (Basque, Catalan, Galician, Spanish, English and Portuguese):

accuracy
eu_precision  ca_precision  gl_precision  es_precision  en_precision  pt_precision
eu_recall  ca_recall  gl_recall  es_recall  en_recall  pt_recall
eu_f1_measure  ca_f1_measure  gl_f1_measure  es_f1_measure  en_f1_measure  pt_f1_measure
macro_f1  weighted_average_f1
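The metrics in the evaluation file follow their standard definitions. The sketch below computes them from lists of correct and predicted classes. The names evaluate and format_evaluation are hypothetical, and the four-decimal formatting is an assumption. Weighted-average F1 weights each class's F1 by its support, that is, the number of tweets whose correct class is that class.

```python
from typing import Dict, List


def evaluate(y_true: List[str], y_pred: List[str], classes: List[str]) -> Dict[str, object]:
    """Accuracy, per-class precision/recall/F1, macro-F1 and weighted-average F1."""
    pairs = list(zip(y_true, y_pred))
    precision, recall, f1, support = {}, {}, {}, {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for _, p in pairs)
        actual = sum(t == c for t, _ in pairs)
        # Define a metric as 0.0 when its denominator is zero.
        precision[c] = tp / predicted if predicted else 0.0
        recall[c] = tp / actual if actual else 0.0
        total = precision[c] + recall[c]
        f1[c] = 2 * precision[c] * recall[c] / total if total else 0.0
        support[c] = actual
    total_support = sum(support.values())
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_f1": sum(f1.values()) / len(classes),
        "weighted_average_f1": sum(f1[c] * support[c] for c in classes) / total_support,
    }


def format_evaluation(m: Dict[str, object], classes: List[str]) -> List[str]:
    """Return the five lines of the evaluation file, fields separated by two spaces."""
    def row(values: Dict[str, float]) -> str:
        return "  ".join(f"{values[c]:.4f}" for c in classes)

    return [
        f"{m['accuracy']:.4f}",
        row(m["precision"]),
        row(m["recall"]),
        row(m["f1"]),
        f"{m['macro_f1']:.4f}  {m['weighted_average_f1']:.4f}",
    ]
```

Macro-F1 treats every language equally. Weighted-average F1 leans towards the languages with the most test tweets, so the two can differ noticeably when the test set is imbalanced.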
