This is a project related to machine learning course
mjmaher987/Sentiment-Analysis-Project

This project performs sentiment analysis on the Large Movie Review Dataset using various machine learning techniques.
This project was built for a machine learning course. We clean the data, then train, test, and evaluate several models for sentiment analysis, covering both traditional and newer approaches. Ideas and suggestions are welcome.
- Mohammad Javad Maheronnaghsh
The dataset contains 50,000 reviews from IMDB. There are 25,000 training reviews and 25,000 testing reviews. The sentiment labels are balanced between positive and negative.
The data is loaded and preprocessed by:
- Removing stopwords
- Converting to lowercase
- Removing punctuation
- Tokenizing
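The steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `preprocess` name and the tiny `STOPWORDS` set are assumptions, and lowercasing is done first so that stopword matching is case-insensitive.

```python
import string

# Hypothetical, deliberately tiny stopword list; a real run would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "in"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = text.lower()
    # Delete every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("This movie was GREAT, and the ending is a surprise!"))
# ['movie', 'great', 'ending', 'surprise']
```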
The following models are implemented and evaluated:
- Traditional ML Models
  - Logistic Regression
  - Decision Tree
  - AdaBoost with Decision Tree base estimator
- Neural Network Models
  - Simple feedforward network
  - Deep network with dropout and regularization
- Feature Representations
  - TF-IDF vectors
  - FastText word embeddings trained in unsupervised mode
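To make the TF-IDF representation concrete, here is a small sketch of one common variant (term frequency times `log(N / df)`). The `tfidf` function is hypothetical, and the weighting used in the project may differ. For instance, scikit-learn's `TfidfVectorizer` applies a smoothed IDF by default.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


vecs = tfidf([["great", "movie"], ["terrible", "movie"]])
print(vecs[0])
```

A term that appears in every document ("movie" here) gets weight 0, while terms that distinguish a review ("great", "terrible") get a positive weight.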
Models are trained on 80% of the dataset and validated on 10% for hyperparameter tuning.
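A split like this could be produced as follows. This is a sketch under assumptions: the `split_80_10_10` helper and the fixed seed are illustrative, and the project may shuffle or stratify differently.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle, then cut into 80% train, 10% validation, and the rest."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    held_out = shuffled[n_train + n_val:]
    return train, val, held_out


train, val, held_out = split_80_10_10(range(50_000))
print(len(train), len(val), len(held_out))  # 40000 5000 5000
```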
The TensorFlow models use the Adam optimizer and sparse categorical crossentropy loss. They are trained for 30 epochs with a batch size of 32.
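For intuition, sparse categorical crossentropy takes an integer class label and penalizes the negative log of the probability the model assigned to that class. The sketch below computes it by hand for a single example and works out the number of weight updates, assuming the training split is 80% of the 50,000 reviews. The class encoding (0 = negative, 1 = positive) is also an assumption.

```python
import math


def sparse_categorical_crossentropy(probs, label):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probs[label])


# True class is 1 (positive) in both cases.
confident_right = sparse_categorical_crossentropy([0.1, 0.9], 1)
confident_wrong = sparse_categorical_crossentropy([0.9, 0.1], 1)
print(round(confident_right, 4), round(confident_wrong, 4))  # 0.1054 2.3026

# Assumed training-set size: 80% of 50,000 reviews.
steps_per_epoch = math.ceil(40_000 / 32)
total_updates = steps_per_epoch * 30
print(steps_per_epoch, total_updates)  # 1250 37500
```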
The FastText model is trained with supervised labeling on the training set sentences.
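If the official fastText library is used (an assumption), supervised training reads a plain-text file with one example per line, where each label carries the `__label__` prefix by default. A hypothetical helper for producing such lines might look like this. The label names `positive` and `negative` are illustrative.

```python
def to_fasttext_line(tokens, label):
    """Format one example as '__label__<label> tok1 tok2 ...'."""
    return f"__label__{label} " + " ".join(tokens)


print(to_fasttext_line(["great", "movie"], "positive"))
# __label__positive great movie
```

A file of such lines could then be passed to `fasttext.train_supervised(input=...)`.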
All models are evaluated on the remaining 10% test set using accuracy and F1 score.
Classification reports are printed showing precision, recall and F1 for each sentiment class.
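The metrics in such a report can be computed directly from true and predicted labels. The following is an illustrative sketch with made-up labels, not output from the project. The `classification_metrics` function is hypothetical.

```python
def classification_metrics(y_true, y_pred, labels=(0, 1)):
    """Return overall accuracy and per-class precision, recall, and F1."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    report = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Toy labels for illustration only: 0 = negative, 1 = positive.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
accuracy, report = classification_metrics(y_true, y_pred)
print(accuracy, report)
```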
The deep neural network achieves the best performance with 63% test accuracy. Logistic regression, decision tree, and FastText also perform reasonably well.
In general, the neural network models outperform the traditional ML models. The word embedding models achieve better performance compared to pure TF-IDF vectors.
There is still room for improvement by using more advanced architectures, pretrained embeddings, and regularization techniques.
The main scripts are:
- train.py - Train a model
- evaluate.py - Evaluate on test set
- predict.py - Make predictions on new data

Example:

```
# Train FastText model
python train.py --model fasttext

# Evaluate LSTM model
python evaluate.py --model lstm

# Make predictions with ensemble
python predict.py --model ensemble
```