VETURISRIRAM/YELP_REVIEWS_SENTIMENT_ANALYSIS_FASTTEXT_AUTOTUNE

Sentiment Analysis of Kaggle Yelp Reviews using FastText.


Description

This project aims to classify the Kaggle Yelp reviews into three classes.

  1. Positive (if the star rating is above 3).
  2. Neutral (if the star rating is equal to 3).
  3. Negative (if the star rating is below 3).
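The labelling rule above can be sketched as a small function. This is a minimal sketch that assumes the star rating arrives as a number; the name star_to_label is hypothetical and not taken from preprocess_data.py.

```python
def star_to_label(stars: float) -> str:
    """Map a Yelp star rating to one of the three sentiment classes."""
    if stars > 3:
        return "positive"   # 4 or 5 stars
    if stars == 3:
        return "neutral"    # exactly 3 stars
    return "negative"       # 1 or 2 stars
```

With pandas, assuming the reviews sit in a DataFrame with a "stars" column, the labels could be created with df["stars"].map(star_to_label).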

FastText Introduction

FastText is a library for efficient learning of word representations and sentence classification. It is written in C++ and supports multiprocessing during training. FastText lets you train supervised and unsupervised representations of words and sentences. These representations (embeddings) can be used for numerous applications, such as data compression, features for additional models, candidate selection, or initializers for transfer learning.

Get the Data

The data used in this project, from the original Kaggle dataset to the intermediate FastText files, can be downloaded from here.

In this repository, the ./data/ directory is kept empty. Extract the downloaded folder into ./data/ and follow the instructions below.

How to run?

Set up the project. I used the latest version of FastText from GitHub.

I wanted to explore the AutoTune feature of FastText, which tunes hyperparameters automatically. With AutoTune, the final model is trained with the best hyperparameters found on the validation set. As I understand it, this is somewhat similar to scikit-learn's GridSearchCV, although AutoTune explores the hyperparameter space within a time budget rather than exhaustively over a fixed grid.

The script below reads the data, creates the labels, and does some minor text preprocessing using multiprocessing.
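The preprocessing step could look roughly like the sketch below. The exact cleaning done by preprocess_data.py is not documented here, so the rules in clean_text (lowercasing, padding punctuation with spaces, collapsing whitespace) are assumptions, and clean_text and clean_all are hypothetical names. The work is spread over a multiprocessing.Pool, as the script description suggests.

```python
import re
from multiprocessing import Pool
from typing import List, Optional

# Illustrative cleaning rules (an assumption, not the project's exact steps).
_PUNCT = re.compile(r'([.!?,;:()"])')
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, pad punctuation with spaces, and collapse all whitespace (including newlines)."""
    text = text.lower()
    text = _PUNCT.sub(r" \1 ", text)
    return _WHITESPACE.sub(" ", text).strip()


def clean_all(texts: List[str], processes: Optional[int] = None) -> List[str]:
    """Clean every review, using a process pool unless processes == 1."""
    if processes == 1:
        return [clean_text(t) for t in texts]
    with Pool(processes=processes) as pool:
        # chunksize hands each worker a batch of reviews instead of one at a time
        return pool.map(clean_text, texts, chunksize=1000)
```

Collapsing newlines matters because FastText reads one example per line. When using the pool, call clean_all under an `if __name__ == "__main__":` guard, since platforms that start workers with the spawn method re-import the script in every worker.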

python preprocess_data.py

After preprocessing, the train, validation, and test files for the FastText model need to be created.

FastText expects each line to look like this: __label__positive The restaurant was great.

Notice the __label__ prefix. It is how FastText knows that positive is the label for the text "The restaurant was great."

The label and the text can be separated by either a space or a tab.

The file extension does not matter: .txt, .tsv, .csv, or any other extension for a plain-text file will work.
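A minimal sketch of building one line in this format; to_fasttext_line is a hypothetical helper, not a function from create_files.py. It assumes the label itself contains no whitespace and flattens the review onto a single line.

```python
LABEL_PREFIX = "__label__"  # FastText's default label prefix


def to_fasttext_line(label: str, text: str, sep: str = " ") -> str:
    """Build one training example in FastText's supervised format."""
    # FastText reads one example per line, so line breaks inside the review are removed.
    flat_text = " ".join(text.split())
    return f"{LABEL_PREFIX}{label}{sep}{flat_text}"
```

Passing sep="\t" produces the tab-separated variant instead.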

The following command creates the FastText input files described above.

python create_files.py
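One way to produce the three files is to shuffle the formatted lines once and cut them into train, validation, and test parts. This is a sketch under assumptions: the 80/10/10 proportions, the fixed seed, and the helper names split_dataset and write_lines are hypothetical, and the split is not stratified by class.

```python
import random
from pathlib import Path
from typing import List, Sequence, Tuple


def split_dataset(
    lines: Sequence[str], val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 42
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into train/validation/test parts."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def write_lines(path: str, lines: Sequence[str]) -> None:
    """Write one FastText example per line as UTF-8."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
```

For example, write_lines("./data/train.txt", train) would write the training part; the file name is only illustrative.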

Now that we have the input files, we can go ahead and start training our classifier.

Model training and testing are fairly simple in FastText.

I am using the AutoTune functionality to tune my model's hyperparameters. It is enabled by passing the validation file to the autotuneValidationFile argument when training starts.

The final model is trained with the best hyperparameters found and saved as a .bin file in the ./models/ directory.
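A sketch of AutoTune training with the fasttext Python package; it is not the project's modelling.py. The file paths are hypothetical, and fasttext is imported inside train_and_save so the parameter helper can be used without the package installed. fastText also accepts an autotuneMetric argument (F1 by default), which this sketch leaves at its default.

```python
from pathlib import Path
from typing import Any, Dict

# Hypothetical locations; adjust them to the files create_files.py actually writes.
TRAIN_FILE = "./data/train.txt"
VALID_FILE = "./data/valid.txt"
MODEL_FILE = "./models/yelp_autotuned.bin"


def autotune_params(train_file: str, valid_file: str, duration_s: int = 300) -> Dict[str, Any]:
    """Keyword arguments for fasttext.train_supervised with AutoTune enabled."""
    return {
        "input": train_file,
        # Supplying a validation file is what switches AutoTune on.
        "autotuneValidationFile": valid_file,
        # Search budget in seconds (fastText's default is 300).
        "autotuneDuration": duration_s,
    }


def train_and_save(train_file: str = TRAIN_FILE, valid_file: str = VALID_FILE,
                   model_file: str = MODEL_FILE, duration_s: int = 300):
    """Run the AutoTune search, then save the best model as a .bin file."""
    import fasttext  # lazy import: only needed when actually training

    model = fasttext.train_supervised(**autotune_params(train_file, valid_file, duration_s))
    Path(model_file).parent.mkdir(parents=True, exist_ok=True)
    model.save_model(model_file)
    return model
```

After training, model.test(path) returns a tuple (number of examples, precision@1, recall@1) for a labelled file.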

python modelling.py

I chose recall as my evaluation metric so that as many reviews as possible end up in the correct class.

Evaluation Results on the Test Set

Recall@1: 0.863
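Assuming every review carries exactly one label and the model returns one prediction per review, recall@1 is simply the share of reviews whose top prediction matches the true class (in that setting FastText's precision@1 and recall@1 coincide). Per-class recall shows how well each bucket, for example neutral, is filled. The helper names below are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def recall_at_1(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of reviews whose top-1 predicted label equals the true label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


def recall_per_class(gold: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Recall per class: correctly predicted reviews of a class / all reviews of that class."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {label: hits[label] / n for label, n in totals.items()}
```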

You can also get predictions for your own review by running the following script.

python test_model.py

Example Testing

I gave the model some test inputs and got the following predictions.

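import fasttext
model = fasttext.load_model("./models/model.bin")  # hypothetical file name: use the .bin saved in ./models/
print(model.predict("the food was really great"))
print(model.predict("the restaurant was horrible"))
print(model.predict("the salon was okay. Not bad!"))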

Output/Predictions:

(('__label__positive',), array([0.99909163]))
(('__label__negative',), array([1.00000417]))
(('__label__neutral',), array([0.99479502]))
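The raw output pairs prefixed labels with score arrays. A small sketch for making it easier to read; prepare_query and readable are hypothetical helpers. prepare_query folds newlines into spaces because FastText's predict() handles one line at a time.

```python
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"


def prepare_query(text: str) -> str:
    """Fold newlines and repeated whitespace into single spaces before calling predict()."""
    return " ".join(text.split())


def readable(prediction: Tuple[Sequence[str], Sequence[float]]) -> List[Tuple[str, float]]:
    """Turn (('__label__positive',), array([0.99])) into [('positive', 0.99)]."""
    labels, scores = prediction
    return [
        (label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label, float(score))
        for label, score in zip(labels, scores)
    ]
```

For example, readable(model.predict(prepare_query(review), k=3)) lists the three classes with their scores. A score can slightly exceed 1, as in the negative example above; this is likely because FastText adds a small constant (1e-5) inside its log-probabilities, so the scores are best read as confidences rather than exact probabilities.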

References and Sources

Thanks to the authors of these articles!

  1. FastText: Under the Hood (Medium Article).
  2. Python for NLP: Working with Facebook FastText Library (StackAbuse Article).
  3. FastText Official Documentation.
