VETURISRIRAM/YELP_REVIEWS_SENTIMENT_ANALYSIS_FASTTEXT_AUTOTUNE

Sentiment Analysis of Kaggle Yelp Reviews using FastText.

Yelp Reviews Sentiment Analysis using FastText

Description

This project classifies the Kaggle Yelp reviews into three classes:

Positive (If the stars are above 3).
Neutral (If the stars are equal to 3).
Negative (If the stars are below 3).
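The labeling rule above can be sketched as a small function. This is a minimal illustration, not the repository's code; the function name `stars_to_label` is hypothetical.

```python
def stars_to_label(stars: float) -> str:
    """Map a Yelp star rating to one of the three sentiment classes."""
    if stars > 3:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"


# Example: ratings 1 through 5 mapped to their classes.
labels = [stars_to_label(s) for s in (1, 2, 3, 4, 5)]
```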

FastText Introduction

FastText is a library for efficient learning of word representations and sentence classification. It is written in C++ and supports multiprocessing during training. FastText allows you to train supervised and unsupervised representations of words and sentences. These representations (embeddings) can be used in many applications: data compression, features for downstream models, candidate selection, or initializers for transfer learning.

Get the Data.

The data used in this project, from the initial Kaggle dataset to the intermediate FastText files, can be downloaded from here.

In this repository, I have kept the ./data/ directory empty. Place the extracted download folder in ./data/ and follow the instructions below.

How to run?

Set up the project. I used the latest FastText from GitHub.

I wanted to explore the AutoTune feature of FastText, which enables automatic hyperparameter tuning. With AutoTune, the model is trained with the best hyperparameters found during the search. As far as I understand, it is somewhat similar to Sklearn's GridSearchCV module.

The script below reads the data, creates the labels, and does some minor text preprocessing using multiprocessing.

python preprocess_data.py
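As a rough idea of what parallel preprocessing can look like, here is a minimal sketch. The cleaning rules and the names `clean_text` and `clean_all` are assumptions for illustration; the actual steps in preprocess_data.py may differ. Flattening newlines matters because FastText reads one example per line.

```python
import re
from multiprocessing import Pool


def clean_text(text: str) -> str:
    """Hypothetical minimal cleaning: lowercase, flatten newlines, squeeze spaces."""
    text = text.lower().replace("\n", " ")
    # Put spaces around punctuation so "great!" becomes the tokens "great" and "!".
    text = re.sub(r"([.!?,'/()])", r" \1 ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_all(texts, processes=2):
    """Clean many reviews in parallel across worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(clean_text, texts)


if __name__ == "__main__":
    print(clean_all(["The food was GREAT!\nLoved it.", "Not  bad, I guess"]))
```

The `if __name__ == "__main__":` guard keeps worker processes from re-running the demo on platforms that start workers by re-importing the script.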

After the preprocessing is done, train-val-test files are to be created for the FastText model.

The format required by FastText looks like this: __label__positive The restaurant was great.

Notice the __label__ prefix. It is how FastText understands that positive is the label for the text The restaurant was great.

The label and the text can be separated by either a space or a tab.

The file extension does not matter. It could be any of the TXT/TSV/CSV or other extensions which can hold textual data.

The command below creates the input files for FastText in the format described above.

python create_files.py
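For illustration, the line format described above could be produced as follows. This is a sketch under assumptions: the helper names `to_fasttext_line` and `write_fasttext_file` are hypothetical and are not taken from create_files.py.

```python
def to_fasttext_line(label: str, text: str) -> str:
    """Build one training line: '__label__<label> <text>'."""
    # A review must stay on one line, so collapse any internal whitespace.
    single_line = " ".join(text.split())
    return f"__label__{label} {single_line}"


def write_fasttext_file(path, rows):
    """Write (label, text) pairs to `path`, one example per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in rows:
            handle.write(to_fasttext_line(label, text) + "\n")


example = to_fasttext_line("positive", "The restaurant was great.")
```

The same helper can be called three times, once each for the train, validation, and test splits.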

Now that we have the input files, we can go ahead and start training our classifier.

Model Training and Testing are fairly simple in FastText.

I am using the AutoTune functionality to tune the hyperparameters of my model. It is enabled by passing the validation file to the autotuneValidationFile argument when you start the training.

The bin model is trained with the best hyperparameters found and saved in the ./models/ directory.

python modelling.py
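A minimal sketch of AutoTune training with the FastText Python bindings is shown below. It assumes the `fasttext` package is installed; the file paths and the helper names `autotune_kwargs` and `train_and_save` are hypothetical, and modelling.py may be organized differently. `autotuneDuration` is the search budget in seconds.

```python
def autotune_kwargs(train_path, valid_path, duration_seconds=300):
    """Collect the arguments passed to fasttext.train_supervised."""
    return {
        "input": train_path,
        "autotuneValidationFile": valid_path,  # switches AutoTune on
        "autotuneDuration": duration_seconds,  # search budget in seconds
    }


def train_and_save(train_path, valid_path, model_path, duration_seconds=300):
    """Train with AutoTune and save the resulting binary model."""
    import fasttext  # imported lazily; requires the fasttext package

    model = fasttext.train_supervised(
        **autotune_kwargs(train_path, valid_path, duration_seconds)
    )
    model.save_model(model_path)
    return model


# Hypothetical paths; the repository's actual file names may differ.
kwargs = autotune_kwargs("data/train.txt", "data/val.txt")
```

Calling `train_and_save("data/train.txt", "data/val.txt", "models/model.bin")` would then run the search and write the model file.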

I chose recall as my evaluation metric in order to fit more reviews in the correct buckets or classes.

Evaluation Results on the Test Set.

Recall@1: 0.863
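To make the metric concrete: when every review has exactly one gold label and only the top prediction is kept, recall at 1 is the fraction of reviews whose predicted label matches the gold label. The sketch below computes it by hand on made-up labels; the function name `recall_at_1` is hypothetical. In the FastText Python bindings, `model.test(path)` returns the number of examples together with precision and recall, which is the more likely source of the figure above.

```python
def recall_at_1(gold_labels, predicted_labels):
    """Fraction of examples whose top prediction matches the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted lists must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return hits / len(gold_labels)


# Made-up labels for illustration only.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
score = recall_at_1(gold, pred)  # 3 of the 4 predictions match
```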

You can also get predictions for your own test review by running the file below.

python test_model.py

Example Testing

I gave some test inputs to the model and got the predictions as follows.

print(model.predict("the food was really great"))
print(model.predict("the restaurant was horrible"))
print(model.predict("the salon was okay. Not bad!"))

Output/Predictions:

(('__label__positive',), array([0.99909163]))
(('__label__negative',), array([1.00000417]))
(('__label__neutral',), array([0.99479502]))
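The raw predictions carry the __label__ prefix, and a reported probability can land slightly above 1, as in the second result. A small post-processing helper, sketched here with the hypothetical name `parse_prediction`, strips the prefix and clamps the probability to 1.0.

```python
LABEL_PREFIX = "__label__"


def parse_prediction(prediction):
    """Turn (labels, probabilities) into (label, probability) pairs."""
    labels, probabilities = prediction
    pairs = []
    for label, probability in zip(labels, probabilities):
        name = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        pairs.append((name, min(float(probability), 1.0)))
    return pairs


# A plain list stands in for the NumPy array that model.predict returns.
parsed = parse_prediction((("__label__negative",), [1.00000417]))
```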

References and Sources

Thanks to the authors of these articles!

FastText: Under the Hood (Medium Article).
Python for NLP: Working with Facebook FastText Library (StackAbuse Article).
FastText Official Documentation
