Twitter sentiment analysis is the process of automatically identifying the emotions or opinions expressed in tweets. By analyzing the text, we can classify tweets as positive, negative or neutral. This helps businesses and researchers track public mood, brand reputation or reactions to events in real time. Python libraries such as TextBlob, Tweepy and NLTK make it easy to collect tweets, process the text and run sentiment analysis; the implementation below uses pandas and scikit-learn.
How is Twitter Sentiment Analysis Useful?
- Twitter Sentiment Analysis is important because it helps people and businesses understand what the public thinks in real time.
- Millions of tweets are posted every day, sharing opinions about brands, products, events or social issues. By analyzing this huge stream of data, companies can measure customer satisfaction, spot trends early, handle negative feedback quickly and make better decisions based on how people actually feel.
- It’s also useful for researchers and governments to monitor public mood during elections, crises or big events as it turns raw tweets into valuable insights.
Step by Step Implementation
Step 1: Install Necessary Libraries
This block installs and imports the required libraries. It uses pandas to load and handle the data, TfidfVectorizer to turn text into numbers and scikit-learn to train the models.
```python
# Install the dependencies first (run in a shell, or prefix with "!" in a notebook):
#   pip install pandas scikit-learn

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report
```
Step 2: Load Dataset
- Here we load the Sentiment140 dataset from a zipped CSV file, which can be downloaded from Kaggle.
- We keep only the polarity and tweet text columns, rename them for clarity and print the first few rows to check the data.
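```python
# Load the zipped CSV; the file has no header row and is Latin-1 encoded
df = pd.read_csv('training.1600000.processed.noemoticon.csv.zip', encoding='latin-1', header=None)

# Keep column 0 (polarity) and column 5 (tweet text)
df = df[[0, 5]]
df.columns = ['polarity', 'text']
print(df.head())
```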
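Selecting columns by position (0 and 5) works, but naming all six columns at load time can make the intent easier to follow. The sketch below is an illustration only: it reads two made-up rows from an in-memory string rather than the real file, and the column names (`target`, `ids`, `date`, `flag`, `user`, `text`) are labels assumed here for readability, since the CSV itself has no header row.

```python
import io
import pandas as pd

# Two made-up rows in the Sentiment140 layout (no header row, six columns).
sample_csv = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","so tired today"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","what a great day"\n'
)

# Assumed column names for readability; the file itself carries no header.
columns = ["target", "ids", "date", "flag", "user", "text"]

# usecols loads only the two columns we need, which also saves memory.
sample_df = pd.read_csv(
    sample_csv,
    header=None,
    names=columns,
    usecols=["target", "text"],
)
sample_df = sample_df.rename(columns={"target": "polarity"})
print(sample_df)
```

With the real file, the same `names` and `usecols` arguments could be passed alongside `encoding='latin-1'`.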
Step 3: Keep Only Positive and Negative Sentiments
- Here we remove neutral tweets (polarity 2) and map the labels so that 0 stays negative and 4 becomes 1 for positive.
- Then we print how many positive and negative tweets are left in the data.
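```python
# Drop neutral tweets (polarity 2); .copy() avoids a SettingWithCopyWarning on the assignment below
df = df[df.polarity != 2].copy()

# Map labels: 0 stays negative, 4 becomes 1 (positive)
df['polarity'] = df['polarity'].map({0: 0, 4: 1})
print(df['polarity'].value_counts())
```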
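The filter-and-map step can be checked on a toy frame. This is a minimal sketch with made-up rows; the extra `notna()` check is an addition for illustration, relying on the fact that `Series.map` yields NaN for any value missing from the dictionary.

```python
import pandas as pd

# Toy frame using Sentiment140-style labels: 0 = negative, 2 = neutral, 4 = positive.
toy = pd.DataFrame({
    "polarity": [0, 4, 2, 4, 0],
    "text": ["bad", "good", "meh", "great", "awful"],
})

# Filter out neutral rows; .copy() gives an independent frame to modify.
binary = toy[toy["polarity"] != 2].copy()

# Series.map returns NaN for values not in the dict,
# so an unexpected label would surface as a missing value here.
binary["polarity"] = binary["polarity"].map({0: 0, 4: 1})
assert binary["polarity"].notna().all(), "found a label outside {0, 4}"

counts = binary["polarity"].value_counts().sort_index()
print(counts)
```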
Step 4: Clean the Tweets
- Here we define a simple function that converts text to lowercase for consistency and apply it to every tweet in the dataset.
- Then we show the original and cleaned versions of the first few tweets.
```python
def clean_text(text):
    # Lowercase the tweet for consistency
    return text.lower()

df['clean_text'] = df['text'].apply(clean_text)
print(df[['text', 'clean_text']].head())
```
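Lowercasing is the only cleaning done in this walkthrough. If more cleaning is wanted, one possible extension is to strip URLs and @mentions as well. The function below, `clean_tweet`, is a hypothetical helper written for illustration and is not part of the original code; the regular expressions are simple approximations.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions and extra whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                    # remove @mentions
    text = re.sub(r"\s+", " ", text)                     # collapse whitespace
    return text.strip()

example = "@Bob I LOVE this phone!!! http://example.com/review  #happy"
print(clean_tweet(example))  # i love this phone!!! #happy
```

Whether this helps accuracy depends on the data, so it is worth comparing against the lowercase-only baseline.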
Step 5: Train Test Split
- This code splits the clean_text and polarity columns into training and testing sets using an 80/20 split.
- random_state=42 ensures reproducibility.
```python
# 80/20 split; a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_text'], df['polarity'], test_size=0.2, random_state=42
)
print("Train size:", len(X_train))
print("Test size:", len(X_test))
```
Output:
Train size: 1280000
Test size: 320000
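These sizes follow directly from the 80/20 split of 1,600,000 tweets (1,600,000 × 0.2 = 320,000). The same behaviour can be seen on a toy list. Note that the `stratify` argument below is an optional addition not used in the code above; it keeps the class ratio the same in both splits.

```python
from sklearn.model_selection import train_test_split

# Ten made-up tweets, five per class.
texts = [f"tweet {i}" for i in range(10)]
labels = [0] * 5 + [1] * 5

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
print(len(X_tr), len(X_te))  # 8 2
print(sorted(y_te))          # one example from each class
```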
Step 6: Perform Vectorization
- This code creates a TF-IDF vectorizer that converts text into numerical features using unigrams and bigrams, keeping only the 5000 most frequent terms.
- It fits and transforms the training data, transforms the test data with the same fitted vocabulary and then prints the shapes of the resulting TF-IDF matrices.
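```python
# Unigrams and bigrams, capped at 5000 features
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))

# Fit on the training data only, then reuse the fitted vocabulary on the test data
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
print("TF-IDF shape (train):", X_train_tfidf.shape)
print("TF-IDF shape (test):", X_test_tfidf.shape)
```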
Output:
TF-IDF shape (train): (1280000, 5000)
TF-IDF shape (test): (320000, 5000)
Step 7: Train Bernoulli Naive Bayes model
- Here we train a Bernoulli Naive Bayes classifier on the TF-IDF features from the training data.
- It predicts sentiments for the test data and then prints the accuracy and a detailed classification report.
```python
# Train Bernoulli Naive Bayes and evaluate on the held-out test set
bnb = BernoulliNB()
bnb.fit(X_train_tfidf, y_train)
bnb_pred = bnb.predict(X_test_tfidf)
print("Bernoulli Naive Bayes Accuracy:", accuracy_score(y_test, bnb_pred))
print("\nBernoulliNB Classification Report:\n", classification_report(y_test, bnb_pred))
```
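One detail worth knowing: `BernoulliNB` binarizes its input (the `binarize` parameter defaults to 0.0), so with TF-IDF features it only uses whether a term is present, not its weight. The sketch below, on made-up data, checks this by comparing a model fitted on TF-IDF values with one fitted on a 0/1 presence matrix.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

docs = ["love this", "love it so much", "hate this", "hate it so much"]
y = [1, 1, 0, 0]

tfidf = TfidfVectorizer().fit_transform(docs)
presence = (tfidf > 0).astype(float)  # 1.0 where a term occurs, else 0.0

# Default binarize=0.0 maps every positive TF-IDF weight to 1 internally.
nb_tfidf = BernoulliNB().fit(tfidf, y)
nb_presence = BernoulliNB().fit(presence, y)

same = np.allclose(
    nb_tfidf.predict_proba(tfidf),
    nb_presence.predict_proba(presence),
)
print(same)  # True
```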
Step 9: Train Support Vector Machine (SVM) model
- This code trains a linear Support Vector Machine (SVM), scikit-learn's LinearSVC, with a maximum of 1000 iterations on the TF-IDF features.
- It predicts test labels then prints the accuracy and a detailed classification report showing how well the SVM performed.
```python
# Train a linear SVM and evaluate on the held-out test set
svm = LinearSVC(max_iter=1000)
svm.fit(X_train_tfidf, y_train)
svm_pred = svm.predict(X_test_tfidf)
print("SVM Accuracy:", accuracy_score(y_test, svm_pred))
print("\nSVM Classification Report:\n", classification_report(y_test, svm_pred))
```
Step 10: Train Logistic Regression model
- This code trains a Logistic Regression model with up to 100 iterations on the TF-IDF features.
- It predicts sentiment labels for the test data and prints the accuracy and detailed classification report for model evaluation.
```python
# Train Logistic Regression and evaluate on the held-out test set
logreg = LogisticRegression(max_iter=100)
logreg.fit(X_train_tfidf, y_train)
logreg_pred = logreg.predict(X_test_tfidf)
print("Logistic Regression Accuracy:", accuracy_score(y_test, logreg_pred))
print("\nLogistic Regression Classification Report:\n", classification_report(y_test, logreg_pred))
```
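Unlike a hard 0/1 prediction, Logistic Regression can also report class probabilities through `predict_proba`, which is useful for spotting tweets the model is unsure about. The following is a minimal sketch on a made-up four-tweet training set, not the Sentiment140 model trained above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["love it", "really love this", "hate it", "really hate this"]
train_labels = [1, 1, 0, 0]

vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=100)
clf.fit(vec.fit_transform(train_texts), train_labels)

# predict_proba returns one column per class, ordered as in clf.classes_.
proba = clf.predict_proba(vec.transform(["love this"]))[0]
for label, p in zip(clf.classes_, proba):
    print(f"class {label}: {p:.3f}")
```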
Step 11: Make Predictions on sample Tweets
- This code takes three sample tweets and transforms them into TF-IDF features using the same vectorizer.
- It then predicts their sentiment using the trained BernoulliNB, SVM and Logistic Regression models and prints the results for each classifier.
- In the output, 1 stands for positive and 0 for negative.
```python
sample_tweets = ["I love this!", "I hate that!", "It was okay, not great."]

# Reuse the vectorizer fitted on the training data
sample_vec = vectorizer.transform(sample_tweets)

print("\nSample Predictions:")
print("BernoulliNB:", bnb.predict(sample_vec))
print("SVM:", svm.predict(sample_vec))
print("Logistic Regression:", logreg.predict(sample_vec))
```
In the original run, all three models gave the same predictions for these tweets, even though they use different approaches.
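For reuse on new tweets, the vectorizer and a classifier can be bundled into a single scikit-learn pipeline so that raw text goes in and a readable label comes out. This is a sketch under stated assumptions: it trains on four made-up sentences rather than Sentiment140, and `LABEL_NAMES` is a hypothetical helper mapping introduced here for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical mapping from numeric labels to readable names.
LABEL_NAMES = {0: "Negative", 1: "Positive"}

train_texts = ["i love this", "love it so much", "i hate that", "hate it so much"]
train_labels = [1, 1, 0, 0]

# The pipeline applies TF-IDF and the classifier in one call.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=100),
)
model.fit(train_texts, train_labels)

for tweet in ["I love this!", "I hate that!"]:
    label = int(model.predict([tweet])[0])
    print(f"{tweet!r} -> {LABEL_NAMES[label]}")
```

Because the pipeline stores the fitted vectorizer, there is no risk of accidentally transforming new text with a different vocabulary.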