mjmaher987/Sentiment-Analysis-Project


A project for a machine learning course.

License

MIT license


Folders and files

Source Codes
Final Edition.ipynb
LICENSE
README.md
project documentation.pdf


Sentiment Analysis of Movie Reviews

This project performs sentiment analysis on the Large Movie Review Dataset using various machine learning techniques.

About

This project was built for a machine learning course. It cleans the data, then trains, tests, and evaluates several models for sentiment analysis, covering both traditional and newer approaches. Ideas and suggestions are welcome.

Contributors

Mohammad Javad Maheronnaghsh

Data

The dataset contains 50,000 reviews from IMDB. There are 25,000 training reviews and 25,000 testing reviews. The sentiment labels are balanced between positive and negative.

The data is loaded and preprocessed by:

Removing stopwords
Converting to lowercase
Removing punctuation
Tokenizing
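
The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the function name `preprocess` and the tiny `STOPWORDS` set are assumptions made for the example, and lowercasing is done first so that stopword matching is case-insensitive.

```python
import string

# Tiny illustrative stopword list (an assumption; a real run would likely
# use a fuller list from an NLP library).
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "of", "and", "to", "in"}

def preprocess(review):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = review.lower()
    # Remove every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOPWORDS]

print(preprocess("This movie was GREAT, and the acting is superb!"))
```

Whitespace tokenization is the simplest choice here; a dedicated tokenizer would handle contractions and markup remnants more carefully.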

Models

The following models are implemented and evaluated:

Traditional ML Models:
Logistic Regression
Decision Tree
AdaBoost with Decision Tree base estimator

Neural Network Models:
Simple feedforward network
Deep network with dropout and regularization

Word Embedding Models

TF-IDF vectors
FastText word embeddings trained in unsupervised mode
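
To make the TF-IDF representation concrete, here is a small standard-library sketch. It assumes the plain definitions tf = count / document length and idf = log(N / df); libraries typically apply smoothed variants, so their numbers will differ. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors

vocab, vectors = tfidf_vectors([["great", "movie"], ["bad", "movie"]])
print(vocab)
print(vectors)
```

In this toy corpus "movie" appears in every document, so its weight is zero, while "great" and "bad" carry all of the signal.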

Training

Models are trained on 80% of the dataset and validated on 10% for hyperparameter tuning.
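
An 80/10/10 split like the one described can be sketched as follows. The helper `split_80_10_10` and the fixed seed are assumptions for illustration, as is the idea that all reviews are pooled and shuffled before splitting; the text does not say how the split was actually drawn.

```python
import random

def split_80_10_10(examples, seed=0):
    """Shuffle and split into train (80%), validation (10%), and test (rest)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_80_10_10(range(50_000))
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the split sizes so that no example is lost or duplicated through floating-point rounding.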

The TensorFlow models use the Adam optimizer and sparse categorical crossentropy loss. They are trained for 30 epochs with a batch size of 32.
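
For readers unfamiliar with the loss named above, sparse categorical crossentropy takes integer class labels and averages the negative log-probability assigned to the true class. The sketch below is a plain-Python restatement of that definition, not the TensorFlow implementation; the label encoding (0 = negative, 1 = positive) is an assumption.

```python
import math

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(p[true_class]) over a batch.

    labels: integer class ids (assumed here: 0 = negative, 1 = positive)
    probs:  per-example lists of predicted class probabilities
    """
    eps = 1e-7  # guard against log(0)
    losses = [-math.log(max(p[y], eps)) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)

loss = sparse_categorical_crossentropy([1, 0], [[0.2, 0.8], [0.5, 0.5]])
print(loss)
```

If the 80% training portion is 40,000 reviews (an assumption based on the 50,000-review dataset), a batch size of 32 corresponds to 1,250 weight updates per epoch.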

The FastText model is trained with supervised labeling on the training set sentences.

Evaluation

All models are evaluated on the remaining 10% test set using accuracy and F1 score.

Classification reports are printed showing precision, recall and F1 for each sentiment class.
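
The metrics above can be computed by hand as a sanity check. The following sketch assumes integer labels (0 = negative, 1 = positive); `per_class_report` is a hypothetical helper name, not a function from the project.

```python
def per_class_report(y_true, y_pred, classes=(0, 1)):
    """Return accuracy and per-class precision, recall, and F1."""
    report = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, report

accuracy, report = per_class_report([1, 1, 0, 0], [1, 0, 0, 0])
print(accuracy)
print(report)
```

Because the sentiment labels are balanced, accuracy is a reasonable headline number here, and the per-class figures show whether errors are concentrated in one class.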

Results

The deep neural network achieves the best performance with 63% test accuracy. Logistic regression, decision tree, and FastText also perform reasonably well.

In general, the neural network models outperform the traditional ML models. The word embedding models achieve better performance compared to pure TF-IDF vectors.

There is still room for improvement by using more advanced architectures, pretrained embeddings, and regularization techniques.

Usage

The main scripts are:

train.py - Train a model
evaluate.py - Evaluate on test set
predict.py - Make predictions on new data

Example:

# Train FastText model
python train.py --model fasttext
# Evaluate LSTM model
python evaluate.py --model lstm
# Make predictions with ensemble
python predict.py --model ensemble



Releases

v1.0 (latest), Jun 29, 2023


Languages

Jupyter Notebook 100.0%
