tadiusfrank2001/Sentiment_Analyst_Machine_Learning_Project

Design a Decision Tree classifier and a Logistic Regression classifier and compare their performance in sentiment analysis on Twitter tweet data.

License

MIT license

Folders and files

LICENSE
README.md
classifiers.py
decision_tree.png
experiments.py
twittersentimentdata_aa
twittersentimentdata_ab
twittersentimentdata_ac
twittersentimentdata_ad
twittersentimentdata_ae

Sentiment Analysis Machine Learning Project

Overview

The goal of this project was to perform sentiment analysis on a Twitter dataset using two machine learning models, Decision Tree and Logistic Regression, and to evaluate which of the two classified sentiment more accurately.

Sentiment Classification

Sentiment classification is a type of text classification problem where, instead of classifying text by topic, the focus is on the sentiment or opinion lexicon that indicates whether an opinion is positive, negative, or neutral. To reduce complexity and improve classification accuracy, I filtered out neutral lexicons, or “stopwords,” using Python’s nltk library, as they are less informative. This approach allows us to treat sentiment classification as a binary classification problem, ignoring the neutral class and classifying text as either positive or negative.
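The stopword filtering step can be sketched as follows. This is a minimal illustration, not the project's code: the `STOPWORDS` set and the `remove_stopwords` helper are hypothetical names, and the small hand-written set stands in for NLTK's English stopword list so the snippet runs without downloading any corpus.

```python
# Illustrative subset only (assumption). The project uses NLTK's English list,
# available via nltk.corpus.stopwords.words("english") after a one-time download.
STOPWORDS = {"the", "a", "an", "is", "it", "this", "to", "and", "of", "i"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split on whitespace, and drop stopword tokens."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


tokens = remove_stopwords("I love this game and the new update")
print(tokens)
```

Swapping the hand-written set for the full NLTK list should only require changing how `STOPWORDS` is built; the filtering function itself stays the same.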

After vectorizing the text and counting the frequency of sentiment lexicons, I fitted a Logistic Regression model on the preprocessed data with a 70-30 train-test split. I then evaluated whether removing neutral lexicons affects the results and implemented a Decision Tree classifier to compare performance.
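A rough sketch of this pipeline with scikit-learn is shown below. It is an assumption-laden illustration rather than the project's actual code: the toy `texts` and `labels` are made up, `CountVectorizer` is assumed as the vectorizer (the README does not name one), and the hyperparameters are arbitrary defaults.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in data (hypothetical); 1 = positive, 0 = negative.
texts = [
    "love great game", "awesome fun update", "happy best day",
    "great fun love", "best awesome win",
    "hate awful bug", "terrible slow crash", "sad worst day",
    "awful crash hate", "worst terrible loss",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# 70-30 split, stratified so both classes appear in each part.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

# Bag-of-words term counts; the vocabulary is learned from the training part only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

print(scores)
```

Fitting both models on the same split keeps the accuracy comparison like-for-like. On a dataset this small the numbers are not meaningful; the sketch only shows the shape of the comparison.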

Challenges and Adjustments

Initially, the project code was structured to handle non-NLP datasets containing only numerical data. To address this, I extended the code to handle text data by reading the CSV file, removing stop words, and filtering out characters such as emojis and "@" mentions. Additionally, the dataset was contextually labeled based on a specific scenario (e.g., a video game Twitter thread), where logically negative actions might be labeled as positive.
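The cleaning described above might look roughly like the sketch below. The `clean_tweet` helper, the regular expressions, and the two-column `label,text` layout are all assumptions made for illustration; the real dataset's columns and the project's exact filtering rules may differ.

```python
import csv
import io
import re

# "@" followed by word characters, e.g. "@dev".
MENTION_RE = re.compile(r"@\w+")
# Anything that is not an ASCII letter, digit, or whitespace (emojis, punctuation).
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_tweet(text):
    """Strip mentions, emojis, and punctuation, then normalize case and spacing."""
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.lower().split())


# Hypothetical in-memory CSV standing in for the real data file.
sample_csv = io.StringIO(
    'label,text\n'
    '1,"@dev Loving the new map \U0001F600!!"\n'
    '0,"@support game keeps crashing :("\n'
)

rows = [(int(r["label"]), clean_tweet(r["text"])) for r in csv.DictReader(sample_csv)]
print(rows)
```

Restricting the text to ASCII letters and digits is a blunt filter: it removes emojis reliably but also drops accented characters, which may or may not be acceptable for a given dataset.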

The dataset consisted of over 1.6 million entries, which posed challenges in terms of runtime and computational resources. To manage this, we trained and tested our models on physical university lab machines with greater GPU and CPU capacity.

Technologies Used

Python: Programming language used.

scikit-learn: For implementing the Decision Tree and Logistic Regression models.

Pandas: For data manipulation and analysis.

NumPy: For numerical computations.

NLTK: For natural language processing tasks.

Project Structure

experiments.py: Cleans and preprocesses the Twitter dataset, including removing stop words and extra punctuation.

classifiers.py: Contains the Logistic Regression and Decision Tree classifiers used to train and evaluate the models.

twittersentimentdata_aa through twittersentimentdata_ae: The Twitter tweet data, split into multiple 25MB files for easier management.
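Because the data ships as several chunk files, it presumably has to be joined back into one file before loading. The sketch below assumes the chunks are plain byte-wise splits whose suffixes (aa, ab, ...) sort in the original order; the `join_chunks` helper and the `combined.csv` output name are hypothetical, and the demo uses tiny stand-in files in a temporary directory rather than the real data.

```python
import shutil
import tempfile
from pathlib import Path


def join_chunks(directory, prefix, output_name):
    """Concatenate all files starting with `prefix`, in sorted order, into one file."""
    directory = Path(directory)
    parts = sorted(p for p in directory.glob(prefix + "*") if p.name != output_name)
    out_path = directory / output_name
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream bytes without loading the whole chunk
    return out_path


# Demo with small stand-in chunks (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_parts = [("aa", b"label,text\n"), ("ab", b"1,good\n"), ("ac", b"0,bad\n")]
    for suffix, body in demo_parts:
        (Path(tmp) / f"twittersentimentdata_{suffix}").write_bytes(body)
    joined = join_chunks(tmp, "twittersentimentdata_", "combined.csv").read_bytes()

print(joined)
```

Streaming each chunk with `shutil.copyfileobj` avoids holding a full 25MB chunk in memory, which matters little here but scales to larger splits.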
