Twitter Sentiment Analysis is the automated classification of the emotions or opinions expressed in tweets. By analyzing the text, we can label tweets as positive, negative or neutral. This helps businesses and researchers track public mood, brand reputation or reactions to events in real time. Python libraries like TextBlob, Tweepy and NLTK help with collecting tweets and processing text. This walkthrough uses pandas and scikit-learn to train classifiers on a labeled dataset.

How is Twitter Sentiment Analysis Useful?

Twitter Sentiment Analysis is important because it helps people and businesses understand what the public thinks in real time.
Millions of tweets are posted every day, sharing opinions about brands, products, events or social issues. By analyzing this huge stream of data, companies can measure customer satisfaction, spot trends early, handle negative feedback quickly and make better decisions based on how people actually feel.
It’s also useful for researchers and governments to monitor public mood during elections, crises or big events as it turns raw tweets into valuable insights.

Step by Step Implementation

Step 1: Install Necessary Libraries

This block installs and imports the required libraries. It uses pandas to load and handle data, TfidfVectorizer to turn text into numbers and scikit-learn to train and evaluate the models.

Python

# Install once from a shell (in a notebook, prefix the command with "!"):
#   pip install pandas scikit-learn

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report

Step 2: Load Dataset

Here we load the Sentiment140 dataset from a zipped CSV file, which you can download from Kaggle.
We keep only the polarity and tweet text columns, rename them for clarity and print the first few rows to check the data.

Python

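# The file has no header row, so columns are numbered 0 to 5
df = pd.read_csv('training.1600000.processed.noemoticon.csv.zip', encoding='latin-1', header=None)

# Column 0 holds the polarity label, column 5 the tweet text
df = df[[0, 5]]
df.columns = ['polarity', 'text']
print(df.head())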
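
As an alternative, the columns can be named while the file is read. The sketch below is only an illustration: it assumes the commonly documented Sentiment140 column order (polarity, id, date, query, user, text), and it reads two made-up rows from memory instead of the real file. The names in COLUMNS and the sample rows are our own, not part of the dataset.

```python
import io

import pandas as pd

# Assumed Sentiment140 column order; the names themselves are our own choice.
COLUMNS = ["polarity", "id", "date", "query", "user", "text"]

# Two made-up rows standing in for the real CSV file.
sample_csv = io.StringIO(
    '"0","1","Mon Apr 06 2009","NO_QUERY","user_a","this is awful"\n'
    '"4","2","Mon Apr 06 2009","NO_QUERY","user_b","this is great"\n'
)

# usecols keeps only the two columns needed later, which saves memory.
sample_df = pd.read_csv(
    sample_csv,
    header=None,
    names=COLUMNS,
    usecols=["polarity", "text"],
)
print(sample_df)
```

With the real file, the same names and usecols arguments could be passed alongside encoding='latin-1'.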

Step 3: Keep Only Positive and Negative Sentiments

Here we remove neutral tweets (polarity 2) and map the labels so that 0 stays 0 for negative and 4 becomes 1 for positive.
Then we print how many positive and negative tweets are left in the data.

Python

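# Drop neutral tweets; .copy() avoids pandas' SettingWithCopyWarning on the next assignment
df = df[df.polarity != 2].copy()

# Relabel: 0 stays negative, 4 becomes 1 (positive)
df['polarity'] = df['polarity'].map({0: 0, 4: 1})
print(df['polarity'].value_counts())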
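
The order of the two operations matters, because map() returns NaN for any value missing from the dictionary. The toy frame below is made up for illustration and shows what happens with and without the filter.

```python
import pandas as pd

# Made-up toy data with one neutral row (polarity 2).
toy = pd.DataFrame(
    {"polarity": [0, 2, 4, 4], "text": ["bad", "meh", "good", "nice"]}
)

# Mapping without filtering leaves NaN for the neutral label.
unfiltered = toy["polarity"].map({0: 0, 4: 1})
print("NaN labels without filtering:", unfiltered.isna().sum())

# Filtering first, then mapping, gives clean integer labels.
binary = toy[toy["polarity"] != 2].copy()
binary["polarity"] = binary["polarity"].map({0: 0, 4: 1})
print(binary["polarity"].value_counts().sort_index())
```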

Step 4: Clean the Tweets

Here we define a simple function that converts text to lowercase for consistency and apply it to every tweet in the dataset.
Then we print the original and cleaned versions of the first few tweets.

Python

def clean_text(text):
    # Lowercase so that "Good" and "good" count as the same token
    return text.lower()

df['clean_text'] = df['text'].apply(clean_text)
print(df[['text', 'clean_text']].head())
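
Lowercasing is the only cleaning step used here. If more cleaning is wanted, one possible extension is sketched below. It is an assumption on our part, not part of the original pipeline, and the function name clean_tweet and the regular expressions are illustrative choices: it also strips URLs and @mentions and collapses repeated whitespace.

```python
import re

# Illustrative patterns; real tweets may need more careful handling.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase, drop URLs and @mentions, and collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


print(clean_tweet("@Bob Loving the new phone!  http://example.com/x"))
# prints: loving the new phone!
```

Whether the extra steps help accuracy depends on the data, so it is worth comparing results with and without them.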

Step 5: Train Test Split

This code splits the clean_text and polarity columns into training and test sets using an 80/20 split.
Setting random_state=42 makes the split reproducible.

Python

# Hold out 20% of the tweets for testing
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_text'],
    df['polarity'],
    test_size=0.2,
    random_state=42
)

print("Train size:", len(X_train))
print("Test size:", len(X_test))

Output:

Train size: 1280000
Test size: 320000
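
The split above is purely random. When the classes are imbalanced, train_test_split also accepts a stratify argument that keeps the class ratio the same in both parts. The sketch below uses made-up data with an 80/20 class ratio to show the effect. It is an optional variation, not something the walkthrough requires.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 80 negative and 20 positive examples.
texts = [f"tweet {i}" for i in range(100)]
labels = [0] * 80 + [1] * 20

# stratify=labels preserves the 80/20 class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

print("Train:", Counter(y_tr))
print("Test:", Counter(y_te))
```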

Step 6: Perform Vectorization

This code creates a TF-IDF vectorizer that converts text into numerical features using unigrams and bigrams, limited to the 5000 most frequent terms.
It fits the vectorizer on the training data only, transforms both the training and test data with it and then prints the shapes of the resulting TF-IDF matrices.

Python

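# Unigrams and bigrams, capped at 5000 features
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))

# Learn the vocabulary from the training set, then reuse it for the test set
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

print("TF-IDF shape (train):", X_train_tfidf.shape)
print("TF-IDF shape (test):", X_test_tfidf.shape)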

Output:

TF-IDF shape (train): (1280000, 5000)
TF-IDF shape (test): (320000, 5000)

Step 7: Train Bernoulli Naive Bayes model

Here we train a Bernoulli Naive Bayes classifier on the TF-IDF features from the training data.
We then predict sentiments for the test data and print the accuracy and a detailed classification report.

Python

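# Fit Bernoulli Naive Bayes on the TF-IDF training features
bnb = BernoulliNB()
bnb.fit(X_train_tfidf, y_train)

# Predict on the held-out test set and report metrics
bnb_pred = bnb.predict(X_test_tfidf)
print("Bernoulli Naive Bayes Accuracy:", accuracy_score(y_test, bnb_pred))
print("\nBernoulliNB Classification Report:\n", classification_report(y_test, bnb_pred))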
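
One detail worth knowing: BernoulliNB binarizes its input by default (binarize=0.0), so it only uses whether a term is present and ignores the TF-IDF weight. The sketch below, on made-up documents, checks this by comparing a model trained on TF-IDF features with one trained on binary presence features from CountVectorizer(binary=True).

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Made-up training documents and labels (1 = positive, 0 = negative).
docs = ["love it", "love love this", "hate it", "hate this so much"]
y = [1, 1, 0, 0]

# Same tokenization, different weighting: TF-IDF weights vs. 0/1 presence.
X_tfidf = TfidfVectorizer().fit_transform(docs)
X_binary = CountVectorizer(binary=True).fit_transform(docs)

p_tfidf = BernoulliNB().fit(X_tfidf, y).predict_proba(X_tfidf)
p_binary = BernoulliNB().fit(X_binary, y).predict_proba(X_binary)

# Both models see the same binarized matrix, so the probabilities agree.
print(np.allclose(p_tfidf, p_binary))
```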

Step 9: Train Support Vector Machine (SVM) model

This code trains a linear Support Vector Machine (SVM), using LinearSVC with a maximum of 1000 iterations, on the TF-IDF features.
It predicts test labels, then prints the accuracy and a detailed classification report showing how well the SVM performed.

Python

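# Fit a linear SVM on the TF-IDF training features
svm = LinearSVC(max_iter=1000)
svm.fit(X_train_tfidf, y_train)

# Predict on the held-out test set and report metrics
svm_pred = svm.predict(X_test_tfidf)
print("SVM Accuracy:", accuracy_score(y_test, svm_pred))
print("\nSVM Classification Report:\n", classification_report(y_test, svm_pred))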
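
LinearSVC does not output probabilities, but its decision_function gives a signed score for each sample: positive scores map to class 1 and the rest to class 0. A minimal sketch on made-up data:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Made-up training documents and labels (1 = positive, 0 = negative).
docs = ["love it", "great day", "hate it", "awful day"]
y = [1, 1, 0, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
clf = LinearSVC(max_iter=1000).fit(X, y)

# Signed scores: the sign picks the class, the magnitude reflects the margin.
scores = clf.decision_function(X)
preds = clf.predict(X)

# predict() matches thresholding the scores at zero.
print(np.array_equal(preds, (scores > 0).astype(int)))
```

The magnitude of the score can serve as a rough confidence signal, for example to flag tweets close to the decision boundary.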

Step 10: Train Logistic Regression model

This code trains a Logistic Regression model with up to 100 iterations on the TF-IDF features.
It predicts sentiment labels for the test data and prints the accuracy and a detailed classification report for model evaluation.

Python

# Fit logistic regression on the TF-IDF training features
logreg = LogisticRegression(max_iter=100)
logreg.fit(X_train_tfidf, y_train)

# Predict on the held-out test set and report metrics
logreg_pred = logreg.predict(X_test_tfidf)
print("Logistic Regression Accuracy:", accuracy_score(y_test, logreg_pred))
print("\nLogistic Regression Classification Report:\n", classification_report(y_test, logreg_pred))

Step 11: Make Predictions on sample Tweets

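This code takes three sample tweets and transforms them into TF-IDF features using the same fitted vectorizer.
It then predicts their sentiment using the trained BernoulliNB, SVM and Logistic Regression models and prints the results for each classifier.
In the output, 1 stands for Positive and 0 for Negative. Because the models were trained on only these two classes, a lukewarm tweet such as the third sample is still assigned one of the two labels.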

Python

sample_tweets = ["I love this!", "I hate that!", "It was okay, not great."]

# Reuse the fitted vectorizer; do not refit it on new text
sample_vec = vectorizer.transform(sample_tweets)

print("\nSample Predictions:")
print("BernoulliNB:", bnb.predict(sample_vec))
print("SVM:", svm.predict(sample_vec))
print("Logistic Regression:", logreg.predict(sample_vec))
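
For later reuse, the vectorizer and a classifier can be bundled into a single scikit-learn pipeline, so raw text goes in and a label comes out. The sketch below is one possible arrangement, trained on a few made-up tweets rather than Sentiment140. The LABEL_NAMES mapping is our own helper for readable output.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data (1 = positive, 0 = negative).
train_texts = ["i love this", "what a great day", "i hate that", "what an awful day"]
train_labels = [1, 1, 0, 0]

# The pipeline vectorizes and classifies in one call.
pipe = make_pipeline(TfidfVectorizer(lowercase=True), LogisticRegression())
pipe.fit(train_texts, train_labels)

# Hypothetical helper mapping numeric labels to names.
LABEL_NAMES = {0: "Negative", 1: "Positive"}

for tweet in ["I love this!", "I hate that!"]:
    label = int(pipe.predict([tweet])[0])
    print(tweet, "->", LABEL_NAMES[label])
```

Keeping both steps in one object avoids accidentally transforming new text with a different vectorizer than the one used in training.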